@@ -8,11 +8,11 @@
 </indexterm>
 
 <para>
-<productname>PostgreSQL</productname> can devise query plans which can leverage
+<productname>PostgreSQL</productname> can devise query plans that can leverage
 multiple CPUs in order to answer queries faster. This feature is known
 as parallel query. Many queries cannot benefit from parallel query, either
 due to limitations of the current implementation or because there is no
-imaginable query plan which is any faster than the serial query plan.
+imaginable query plan that is any faster than the serial query plan.
 However, for queries that can benefit, the speedup from parallel query
 is often very significant. Many queries can run more than twice as fast
 when using parallel query, and some queries can run four times faster or
@@ -27,7 +27,7 @@
 
 <para>
 When the optimizer determines that parallel query is the fastest execution
-strategy for a particular query, it will create a query plan which includes
+strategy for a particular query, it will create a query plan that includes
 a <firstterm>Gather</firstterm> or <firstterm>Gather Merge</firstterm>
 node. Here is a simple example:
 
@@ -59,7 +59,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 <para>
 <link linkend="using-explain">Using EXPLAIN</link>, you can see the number of
 workers chosen by the planner. When the <literal>Gather</literal> node is reached
-during query execution, the process which is implementing the user's
+during query execution, the process that is implementing the user's
 session will request a number of <link linkend="bgworker">background
 worker processes</link> equal to the number
 of workers chosen by the planner. The number of background workers that
@@ -79,7 +79,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 </para>
 
 <para>
-Every background worker process which is successfully started for a given
+Every background worker process that is successfully started for a given
 parallel query will execute the parallel portion of the plan. The leader
 will also execute that portion of the plan, but it has an additional
 responsibility: it must also read all of the tuples generated by the
@@ -88,7 +88,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 worker, speeding up query execution. Conversely, when the parallel portion
 of the plan generates a large number of tuples, the leader may be almost
 entirely occupied with reading the tuples generated by the workers and
-performing any further processing steps which are required by plan nodes
+performing any further processing steps that are required by plan nodes
 above the level of the <literal>Gather</literal> node or
 <literal>Gather Merge</literal> node. In such cases, the leader will
 do very little of the work of executing the parallel portion of the plan.
@@ -109,7 +109,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 <title>When Can Parallel Query Be Used?</title>
 
 <para>
-There are several settings which can cause the query planner not to
+There are several settings that can cause the query planner not to
 generate a parallel query plan under any circumstances. In order for
 any parallel query plans whatsoever to be generated, the following
 settings must be configured as indicated.
@@ -119,7 +119,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 <listitem>
 <para>
 <xref linkend="guc-max-parallel-workers-per-gather"/> must be set to a
-value which is greater than zero. This is a special case of the more
+value that is greater than zero. This is a special case of the more
 general principle that no more workers should be used than the number
 configured via <varname>max_parallel_workers_per_gather</varname>.
 </para>
@@ -144,8 +144,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 The query writes any data or locks any database rows. If a query
 contains a data-modifying operation either at the top level or within
 a CTE, no parallel plans for that query will be generated. As an
-exception, the following commands which create a new table and populate
-it can use a parallel plan for the underlying <literal>SELECT</literal>
+exception, the following commands, which create a new table and populate
+it, can use a parallel plan for the underlying <literal>SELECT</literal>
 part of the query:
 
 <itemizedlist>
@@ -255,7 +255,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 than normal but would produce incorrect results. Instead, the parallel
 portion of the plan must be what is known internally to the query
 optimizer as a <firstterm>partial plan</firstterm>; that is, it must be constructed
-so that each process which executes the plan will generate only a
+so that each process that executes the plan will generate only a
 subset of the output rows in such a way that each required output row
 is guaranteed to be generated by exactly one of the cooperating processes.
 Generally, this means that the scan on the driving table of the query
@@ -365,11 +365,11 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 
 <para>
 Because the <literal>Finalize Aggregate</literal> node runs on the leader
-process, queries which produce a relatively large number of groups in
+process, queries that produce a relatively large number of groups in
 comparison to the number of input rows will appear less favorable to the
 query planner. For example, in the worst-case scenario the number of
 groups seen by the <literal>Finalize Aggregate</literal> node could be as many as
-the number of input rows which were seen by all worker processes in the
+the number of input rows that were seen by all worker processes in the
 <literal>Partial Aggregate</literal> stage. For such cases, there is clearly
 going to be no performance benefit to using parallel aggregation. The
 query planner takes this into account during the planning process and is
@@ -425,7 +425,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 involve appending multiple results sets can therefore achieve
 coarse-grained parallelism even when efficient partial plans are not
 available. For example, consider a query against a partitioned table
-which can only be implemented efficiently by using an index that does
+that can only be implemented efficiently by using an index that does
 not support parallel scans. The planner might choose a <literal>Parallel
 Append</literal> of regular <literal>Index Scan</literal> plans; each
 individual index scan would have to be executed to completion by a single
@@ -446,7 +446,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 If a query that is expected to do so does not produce a parallel plan,
 you can try reducing <xref linkend="guc-parallel-setup-cost"/> or
 <xref linkend="guc-parallel-tuple-cost"/>. Of course, this plan may turn
-out to be slower than the serial plan which the planner preferred, but
+out to be slower than the serial plan that the planner preferred, but
 this will not always be the case. If you don't get a parallel
 plan even with very small values of these settings (e.g., after setting
 them both to zero), there may be some reason why the query planner is
@@ -473,15 +473,15 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 <para>
 The planner classifies operations involved in a query as either
 <firstterm>parallel safe</firstterm>, <firstterm>parallel restricted</firstterm>,
-or <firstterm>parallel unsafe</firstterm>. A parallel safe operation is one which
+or <firstterm>parallel unsafe</firstterm>. A parallel safe operation is one that
 does not conflict with the use of parallel query. A parallel restricted
-operation is one which cannot be performed in a parallel worker, but which
+operation is one that cannot be performed in a parallel worker, but that
 can be performed in the leader while parallel query is in use. Therefore,
 parallel restricted operations can never occur below a <literal>Gather</literal>
-or <literal>Gather Merge</literal> node, but can occur elsewhere in a plan which
-contains such a node. A parallel unsafe operation is one which cannot
+or <literal>Gather Merge</literal> node, but can occur elsewhere in a plan that
+contains such a node. A parallel unsafe operation is one that cannot
 be performed while parallel query is in use, not even in the leader.
-When a query contains anything which is parallel unsafe, parallel query
+When a query contains anything that is parallel unsafe, parallel query
 is completely disabled for that query.
 </para>
 
@@ -505,7 +505,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 <listitem>
 <para>
 Scans of foreign tables, unless the foreign data wrapper has
-an <literal>IsForeignScanParallelSafe</literal> API which indicates otherwise.
+an <literal>IsForeignScanParallelSafe</literal> API that indicates otherwise.
 </para>
 </listitem>
 
@@ -517,7 +517,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 
 <listitem>
 <para>
-Plan nodes which reference a correlated <literal>SubPlan</literal>.
+Plan nodes that reference a correlated <literal>SubPlan</literal>.
 </para>
 </listitem>
 </itemizedlist>
@@ -528,7 +528,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 <para>
 The planner cannot automatically determine whether a user-defined
 function or aggregate is parallel safe, parallel restricted, or parallel
-unsafe, because this would require predicting every operation which the
+unsafe, because this would require predicting every operation that the
 function could possibly perform. In general, this is equivalent to the
 Halting Problem and therefore impossible. Even for simple functions
 where it could conceivably be done, we do not try, since this would be expensive
@@ -546,11 +546,11 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 <para>
 Functions and aggregates must be marked <literal>PARALLEL UNSAFE</literal> if
 they write to the database, access sequences, change the transaction state
-even temporarily (e.g., a PL/pgSQL function which establishes an
+even temporarily (e.g., a PL/pgSQL function that establishes an
 <literal>EXCEPTION</literal> block to catch errors), or make persistent changes to
 settings. Similarly, functions must be marked <literal>PARALLEL
 RESTRICTED</literal> if they access temporary tables, client connection state,
-cursors, prepared statements, or miscellaneous backend-local state which
+cursors, prepared statements, or miscellaneous backend-local state that
 the system cannot synchronize across workers. For example,
 <literal>setseed</literal> and <literal>random</literal> are parallel restricted for
 this last reason.
@@ -568,10 +568,10 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 </para>
 
 <para>
-If a function executed within a parallel worker acquires locks which are
+If a function executed within a parallel worker acquires locks that are
 not held by the leader, for example by querying a table not referenced in
 the query, those locks will be released at worker exit, not end of
-transaction. If you write a function which does this, and this behavior
+transaction. If you write a function that does this, and this behavior
 difference is important to you, mark such functions as
 <literal>PARALLEL RESTRICTED</literal>
 to ensure that they execute only in the leader.