/*-------------------------------------------------------------------------
 *
 * nodes.h
 *	  Definitions for tagged nodes.
 *
 *
 * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * src/include/nodes/nodes.h
 *
 *-------------------------------------------------------------------------
 */
#ifndef NODES_H
#define NODES_H

/*
 * The first field of every node is NodeTag. Each node created (with makeNode)
 * will have one of the following tags as the value of its first field.
 *
 * Note that inserting or deleting node types changes the numbers of other
 * node types later in the list. This is no problem during development, since
 * the node numbers are never stored on disk. But don't do it in a released
 * branch, because that would represent an ABI break for extensions.
 */
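
/*
 * A minimal usage sketch (assuming the makeNode(), IsA() and nodeTag()
 * macros defined later in this file): makeNode() palloc's a zeroed struct
 * of the named type and sets its tag, and callers dispatch on that tag:
 *
 *		FuncExpr   *f = makeNode(FuncExpr);		// tag set to T_FuncExpr
 *
 *		if (IsA(node, FuncExpr))				// nodeTag(node) == T_FuncExpr
 *			f = (FuncExpr *) node;
 *
 *		switch (nodeTag(node))
 *		{
 *			case T_FuncExpr:
 *				...
 *		}
 */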
typedef enum NodeTag
{
	T_Invalid = 0,

	/*
	 * TAGS FOR EXECUTOR NODES (execnodes.h)
	 */
	T_IndexInfo,
	T_ExprContext,
	T_ProjectionInfo,
	T_JunkFilter,
	T_OnConflictSetState,
	T_ResultRelInfo,
	T_EState,
	T_TupleTableSlot,

	/*
	 * TAGS FOR PLAN NODES (plannodes.h)
	 */
	T_Plan,
	T_Result,
	T_ProjectSet,
	T_ModifyTable,
	T_Append,
	T_MergeAppend,
	T_RecursiveUnion,
	T_BitmapAnd,
	T_BitmapOr,
	T_Scan,
	T_SeqScan,
	T_SampleScan,
	T_IndexScan,
	T_IndexOnlyScan,
	T_BitmapIndexScan,
	T_BitmapHeapScan,
	T_TidScan,
	T_SubqueryScan,
	T_FunctionScan,
	T_ValuesScan,
	T_TableFuncScan,
	T_CteScan,
	T_NamedTuplestoreScan,
	T_WorkTableScan,
	T_ForeignScan,
	T_CustomScan,
	T_Join,
	T_NestLoop,
	T_MergeJoin,
	T_HashJoin,
	T_Material,
	T_Sort,
	T_Group,
	T_Agg,
	T_WindowAgg,
	T_Unique,
	T_Gather,
	T_GatherMerge,
	T_Hash,
	T_SetOp,
	T_LockRows,
	T_Limit,
	/* these aren't subclasses of Plan: */
	T_NestLoopParam,
	T_PlanRowMark,
	T_PlanInvalItem,

	/*
	 * TAGS FOR PLAN STATE NODES (execnodes.h)
	 *
	 * These should correspond one-to-one with Plan node types.
	 */
	T_PlanState,
	T_ResultState,
	T_ProjectSetState,
	T_ModifyTableState,
	T_AppendState,
	T_MergeAppendState,
	T_RecursiveUnionState,
	T_BitmapAndState,
	T_BitmapOrState,
	T_ScanState,
	T_SeqScanState,
	T_SampleScanState,
	T_IndexScanState,
	T_IndexOnlyScanState,
	T_BitmapIndexScanState,
	T_BitmapHeapScanState,
	T_TidScanState,
	T_SubqueryScanState,
	T_FunctionScanState,
	T_TableFuncScanState,
	T_ValuesScanState,
	T_CteScanState,
	T_NamedTuplestoreScanState,
	T_WorkTableScanState,
	T_ForeignScanState,
	T_CustomScanState,
	T_JoinState,
	T_NestLoopState,
	T_MergeJoinState,
	T_HashJoinState,
	T_MaterialState,
	T_SortState,
	T_GroupState,
	T_AggState,
	T_WindowAggState,
	T_UniqueState,
	T_GatherState,
	T_GatherMergeState,
	T_HashState,
	T_SetOpState,
	T_LockRowsState,
	T_LimitState,

	/*
	 * TAGS FOR PRIMITIVE NODES (primnodes.h)
	 */
	T_Alias,
	T_RangeVar,
	T_TableFunc,
	T_Expr,
	T_Var,
	T_Const,
	T_Param,
	T_Aggref,
	T_GroupingFunc,
	T_WindowFunc,
	T_ArrayRef,
	T_FuncExpr,
	T_NamedArgExpr,
	T_OpExpr,
	T_DistinctExpr,
	T_NullIfExpr,
	T_ScalarArrayOpExpr,
	T_BoolExpr,
	T_SubLink,
	T_SubPlan,
	T_AlternativeSubPlan,
	T_FieldSelect,
	T_FieldStore,
	T_RelabelType,
	T_CoerceViaIO,
	T_ArrayCoerceExpr,
	T_ConvertRowtypeExpr,
	T_CollateExpr,
	T_CaseExpr,
	T_CaseWhen,
	T_CaseTestExpr,
	T_ArrayExpr,
	T_RowExpr,
	T_RowCompareExpr,
	T_CoalesceExpr,
	T_MinMaxExpr,
	T_SQLValueFunction,
	T_XmlExpr,
	T_NullTest,
	T_BooleanTest,
	T_CoerceToDomain,
	T_CoerceToDomainValue,
	T_SetToDefault,
	T_CurrentOfExpr,
	T_NextValueExpr,
	T_InferenceElem,
	T_TargetEntry,
	T_RangeTblRef,
	T_JoinExpr,
	T_FromExpr,
	T_OnConflictExpr,
	T_IntoClause,
	T_PartitionPruneStep,
	T_PartitionPruneStepOp,
	T_PartitionPruneStepCombine,
	T_PartitionPruneInfo,

	/*
	 * TAGS FOR EXPRESSION STATE NODES (execnodes.h)
	 *
	 * ExprState represents the evaluation state for a whole expression tree.
	 * Most Expr-based plan nodes do not have a corresponding expression state
	 * node, they're fully handled within execExpr* - but sometimes the state
	 * needs to be shared with other parts of the executor, as for example
	 * with AggrefExprState, which nodeAgg.c has to modify.
	 */
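
	/*
	 * Illustrative sketch of the usual ExprState lifecycle (assuming the
	 * executor entry points declared in executor.h; variable names are
	 * only examples): an expression tree is compiled once into an
	 * ExprState and then evaluated repeatedly against an ExprContext:
	 *
	 *		ExprState  *exprstate = ExecInitExpr(expr, parent_planstate);
	 *		...
	 *		value = ExecEvalExpr(exprstate, econtext, &isnull);
	 */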
	T_ExprState,
	T_AggrefExprState,
	T_WindowFuncExprState,
	T_SetExprState,
	T_SubPlanState,
	T_AlternativeSubPlanState,
	T_DomainConstraintState,

	/*
	 * TAGS FOR PLANNER NODES (relation.h)
	 */
	T_PlannerInfo,
	T_PlannerGlobal,
	T_RelOptInfo,
	T_IndexOptInfo,
	T_ForeignKeyOptInfo,
	T_ParamPathInfo,
	T_Path,
	T_IndexPath,
	T_BitmapHeapPath,
	T_BitmapAndPath,
	T_BitmapOrPath,
	T_TidPath,
	T_SubqueryScanPath,
	T_ForeignPath,
	T_CustomPath,
	T_NestPath,
	T_MergePath,
	T_HashPath,
	T_AppendPath,
	T_MergeAppendPath,
	T_ResultPath,
	T_MaterialPath,
	T_UniquePath,
	T_GatherPath,
	T_GatherMergePath,
	T_ProjectionPath,
	T_ProjectSetPath,
	T_SortPath,
	T_GroupPath,
	T_UpperUniquePath,
	T_AggPath,
	T_GroupingSetsPath,
	T_MinMaxAggPath,
	T_WindowAggPath,
	T_SetOpPath,
	T_RecursiveUnionPath,
	T_LockRowsPath,
	T_ModifyTablePath,
	T_LimitPath,
	/* these aren't subclasses of Path: */
	T_EquivalenceClass,
	T_EquivalenceMember,
	T_PathKey,
	T_PathTarget,
	T_RestrictInfo,
	T_PlaceHolderVar,
	T_SpecialJoinInfo,
	T_AppendRelInfo,
	T_PlaceHolderInfo,
	T_MinMaxAggInfo,
	T_PlannerParamItem,
	T_RollupData,
	T_GroupingSetData,
	T_StatisticExtInfo,

	/*
	 * TAGS FOR MEMORY NODES (memnodes.h)
	 */
	T_MemoryContext,
	T_AllocSetContext,
	T_SlabContext,
	T_GenerationContext,

	/*
	 * TAGS FOR VALUE NODES (value.h)
	 */
	T_Value,
	T_Integer,
	T_Float,
	T_String,
	T_BitString,
	T_Null,

	/*
	 * TAGS FOR LIST NODES (pg_list.h)
	 */
	T_List,
	T_IntList,
	T_OidList,
Introduce extensible node types.
An extensible node is always tagged T_Extensible, but the extnodename
field identifies it more specifically; it may also include arbitrary
private data. Extensible nodes can be copied, tested for equality,
serialized, and deserialized, but the core system doesn't know
anything about them otherwise. Some extensions may find it useful to
include these nodes in fdw_private or custom_private lists in lieu of
arm-wrestling their data into a format that the core code can
understand.
Along the way, so as not to burden the authors of such extensible
node types too much, expose the functions for writing serialized
tokens, and for serializing and deserializing bitmapsets.
KaiGai Kohei, per a design suggested by me. Reviewed by Andres Freund
and by me, and further edited by me.
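A rough sketch of how an extension might use this; the struct layout follows the ExtensibleNode described above, while the node name and payload fields are invented for illustration.

/* Hypothetical private node an FDW could stash in its fdw_private list. */
typedef struct MyPlanData
{
	ExtensibleNode node;		/* must be first; node.extnodename identifies it */
	int			shard_count;	/* made-up payload fields */
	double		est_rows;
} MyPlanData;

/*
 * Registration sketch: the extension fills in an ExtensibleNodeMethods struct
 * (node name, size, and copy/equal/out/read callbacks) and registers it once,
 * e.g. from _PG_init(), via RegisterExtensibleNodeMethods().
 */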
	/*
	 * TAGS FOR EXTENSIBLE NODES (extensible.h)
	 */
	T_ExtensibleNode,

	/*
	 * TAGS FOR STATEMENT NODES (mostly in parsenodes.h)
	 */
Change representation of statement lists, and add statement location info.
This patch makes several changes that improve the consistency of
representation of lists of statements. It's always been the case
that the output of parse analysis is a list of Query nodes, whatever
the types of the individual statements in the list. This patch brings
similar consistency to the outputs of raw parsing and planning steps:
* The output of raw parsing is now always a list of RawStmt nodes;
the statement-type-dependent nodes are one level down from that.
* The output of pg_plan_queries() is now always a list of PlannedStmt
nodes, even for utility statements. In the case of a utility statement,
"planning" just consists of wrapping a CMD_UTILITY PlannedStmt around
the utility node. This list representation is now used in Portal and
CachedPlan plan lists, replacing the former convention of intermixing
PlannedStmts with bare utility-statement nodes.
Now, every list of statements has a consistent head-node type depending
on how far along it is in processing. This allows changing many places
that formerly used generic "Node *" pointers to use a more specific
pointer type, thus reducing the number of IsA() tests and casts needed,
as well as improving code clarity.
Also, the post-parse-analysis representation of DECLARE CURSOR is changed
so that it looks more like EXPLAIN, PREPARE, etc. That is, the contained
SELECT remains a child of the DeclareCursorStmt rather than getting flipped
around to be the other way. It's now true for both Query and PlannedStmt
that utilityStmt is non-null if and only if commandType is CMD_UTILITY.
That allows simplifying a lot of places that were testing both fields.
(I think some of those were just defensive programming, but in many places,
it was actually necessary to avoid confusing DECLARE CURSOR with SELECT.)
Because PlannedStmt carries a canSetTag field, we're also able to get rid
of some ad-hoc rules about how to reconstruct canSetTag for a bare utility
statement; specifically, the assumption that a utility is canSetTag if and
only if it's the only one in its list. While I see no near-term need for
relaxing that restriction, it's nice to get rid of the ad-hocery.
The API of ProcessUtility() is changed so that what it's passed is the
wrapper PlannedStmt not just the bare utility statement. This will affect
all users of ProcessUtility_hook, but the changes are pretty trivial; see
the affected contrib modules for examples of the minimum change needed.
(Most compilers should give pointer-type-mismatch warnings for uncorrected
code.)
There's also a change in the API of ExplainOneQuery_hook, to pass through
cursorOptions instead of expecting hook functions to know what to pick.
This is needed because of the DECLARE CURSOR changes, but really should
have been done in 9.6; it's unlikely that any extant hook functions
know about using CURSOR_OPT_PARALLEL_OK.
Finally, teach gram.y to save statement boundary locations in RawStmt
nodes, and pass those through to Query and PlannedStmt nodes. This allows
more intelligent handling of cases where a source query string contains
multiple statements. This patch doesn't actually do anything with the
information, but a follow-on patch will. (Passing this information through
cleanly is the true motivation for these changes; while I think this is all
good cleanup, it's unlikely we'd have bothered without this end goal.)
catversion bump because addition of location fields to struct Query
affects stored rules.
This patch is by me, but it owes a good deal to Fabien Coelho who did
a lot of preliminary work on the problem, and also reviewed the patch.
Discussion: https://postgr.es/m/alpine.DEB.2.20.1612200926310.29821@lancre
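A hedged sketch of what the new convention means for code that walks one of these lists: every element is a PlannedStmt, and a utility statement sits one level down. The helper name and return strings are illustrative, not part of this commit.

static const char *
classify_planned_stmt(PlannedStmt *pstmt)
{
	if (pstmt->commandType == CMD_UTILITY)
	{
		/* utilityStmt is non-null exactly when commandType is CMD_UTILITY */
		Node	   *utility = pstmt->utilityStmt;

		return IsA(utility, DeclareCursorStmt) ? "DECLARE CURSOR" : "other utility";
	}

	return "optimizable statement";	/* SELECT/INSERT/UPDATE/DELETE plan */
}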
	T_RawStmt,
	T_Query,
	T_PlannedStmt,
	T_InsertStmt,
	T_DeleteStmt,
	T_UpdateStmt,
	T_SelectStmt,
	T_AlterTableStmt,
	T_AlterTableCmd,
	T_AlterDomainStmt,
	T_SetOperationStmt,
	T_GrantStmt,
	T_GrantRoleStmt,
	T_AlterDefaultPrivilegesStmt,
	T_ClosePortalStmt,
	T_ClusterStmt,
	T_CopyStmt,
	T_CreateStmt,
	T_DefineStmt,
	T_DropStmt,
	T_TruncateStmt,
	T_CommentStmt,
	T_FetchStmt,
	T_IndexStmt,
	T_CreateFunctionStmt,
	T_AlterFunctionStmt,
	T_DoStmt,
	T_RenameStmt,
	T_RuleStmt,
	T_NotifyStmt,
	T_ListenStmt,
	T_UnlistenStmt,
	T_TransactionStmt,
	T_ViewStmt,
	T_LoadStmt,
	T_CreateDomainStmt,
	T_CreatedbStmt,
	T_DropdbStmt,
	T_VacuumStmt,
	T_ExplainStmt,
Restructure SELECT INTO's parsetree representation into CreateTableAsStmt.
Making this operation look like a utility statement seems generally a good
idea, and particularly so in light of the desire to provide command
triggers for utility statements. The original choice of representing it as
SELECT with an IntoClause appendage had metastasized into rather a lot of
places, unfortunately, so that this patch is a great deal more complicated
than one might at first expect.
In particular, keeping EXPLAIN working for SELECT INTO and CREATE TABLE AS
subcommands required restructuring some EXPLAIN-related APIs. Add-on code
that calls ExplainOnePlan or ExplainOneUtility, or uses
ExplainOneQuery_hook, will need adjustment.
Also, the cases PREPARE ... SELECT INTO and CREATE RULE ... SELECT INTO,
which formerly were accepted though undocumented, are no longer accepted.
The PREPARE case can be replaced with use of CREATE TABLE AS EXECUTE.
The CREATE RULE case doesn't seem to have much real-world use (since the
rule would work only once before failing with "table already exists"),
so we'll not bother with that one.
Both SELECT INTO and CREATE TABLE AS still return a command tag of
"SELECT nnnn". There was some discussion of returning "CREATE TABLE nnnn",
but for the moment backwards compatibility wins the day.
Andres Freund and Tom Lane
	T_CreateTableAsStmt,
	T_CreateSeqStmt,
	T_AlterSeqStmt,
	T_VariableSetStmt,
	T_VariableShowStmt,
	T_DiscardStmt,
	T_CreateTrigStmt,
	T_CreatePLangStmt,
	T_CreateRoleStmt,
	T_AlterRoleStmt,
	T_DropRoleStmt,
	T_LockStmt,
	T_ConstraintsSetStmt,
	T_ReindexStmt,
	T_CheckPointStmt,
	T_CreateSchemaStmt,
	T_AlterDatabaseStmt,
	T_AlterDatabaseSetStmt,
	T_AlterRoleSetStmt,
	T_CreateConversionStmt,
	T_CreateCastStmt,
	T_CreateOpClassStmt,
	T_CreateOpFamilyStmt,
	T_AlterOpFamilyStmt,
	T_PrepareStmt,
	T_ExecuteStmt,
	T_DeallocateStmt,
	T_DeclareCursorStmt,
	T_CreateTableSpaceStmt,
	T_DropTableSpaceStmt,
	T_AlterObjectDependsStmt,
	T_AlterObjectSchemaStmt,
	T_AlterOwnerStmt,
	T_AlterOperatorStmt,
	T_DropOwnedStmt,
	T_ReassignOwnedStmt,
	T_CompositeTypeStmt,
	T_CreateEnumStmt,
	T_CreateRangeStmt,
	T_AlterEnumStmt,
	T_AlterTSDictionaryStmt,
	T_AlterTSConfigurationStmt,
	T_CreateFdwStmt,
	T_AlterFdwStmt,
	T_CreateForeignServerStmt,
	T_AlterForeignServerStmt,
	T_CreateUserMappingStmt,
	T_AlterUserMappingStmt,
	T_DropUserMappingStmt,
	T_AlterTableSpaceOptionsStmt,
	T_AlterTableMoveAllStmt,
	T_SecLabelStmt,
	T_CreateForeignTableStmt,
	T_ImportForeignSchemaStmt,
	T_CreateExtensionStmt,
	T_AlterExtensionStmt,
	T_AlterExtensionContentsStmt,
	T_CreateEventTrigStmt,
	T_AlterEventTrigStmt,
	T_RefreshMatViewStmt,
	T_ReplicaIdentityStmt,
	T_AlterSystemStmt,
Row-Level Security Policies (RLS)
Building on the updatable security-barrier views work, add the
ability to define policies on tables to limit the set of rows
which are returned from a query and which are allowed to be added
to a table. Expressions defined by the policy for filtering are
added to the security barrier quals of the query, while expressions
defined to check records being added to a table are added to the
with-check options of the query.
New top-level commands are CREATE/ALTER/DROP POLICY and are
controlled by the table owner. Row security can be enabled
and disabled by the owner on a per-table basis using
ALTER TABLE .. ENABLE/DISABLE ROW SECURITY.
Per discussion, ROW SECURITY is disabled on tables by default and
must be enabled for policies on the table to be used. If no
policies exist on a table with ROW SECURITY enabled, a default-deny
policy is used and no records will be visible.
By default, row security is applied at all times except for the
table owner and the superuser. A new GUC, row_security, is added
which can be set to ON, OFF, or FORCE. When set to FORCE, row
security will be applied even for the table owner and superusers.
When set to OFF, row security will be disabled when allowed and an
error will be thrown if the user does not have rights to bypass row
security.
Per discussion, pg_dump sets row_security = OFF by default to ensure
that exports and backups will have all data in the table or will
error if there are insufficient privileges to bypass row security.
A new option has been added to pg_dump, --enable-row-security, to
ask pg_dump to export with row security enabled.
A new role capability, BYPASSRLS, which can only be set by the
superuser, is added to allow other users to be able to bypass row
security using row_security = OFF.
Many thanks to the various individuals who have helped with the
design, particularly Robert Haas for his feedback.
Authors include Craig Ringer, KaiGai Kohei, Adam Brightwell, Dean
Rasheed, with additional changes and rework by me.
Reviewers have included all of the above, Greg Smith,
Jeff McCormick, and Robert Haas.
	T_CreatePolicyStmt,
	T_AlterPolicyStmt,
	T_CreateTransformStmt,
	T_CreateAmStmt,
	T_CreatePublicationStmt,
	T_AlterPublicationStmt,
	T_CreateSubscriptionStmt,
	T_AlterSubscriptionStmt,
	T_DropSubscriptionStmt,
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations than individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficients
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
	T_CreateStatsStmt,
	T_AlterCollationStmt,
	T_CallStmt,

	/*
	 * TAGS FOR PARSE TREE NODES (parsenodes.h)
	 */
	T_A_Expr,
	T_ColumnRef,
	T_ParamRef,
	T_A_Const,
	T_FuncCall,
	T_A_Star,
	T_A_Indices,
	T_A_Indirection,
	T_A_ArrayExpr,
	T_ResTarget,
Implement UPDATE tab SET (col1,col2,...) = (SELECT ...), ...
This SQL-standard feature allows a sub-SELECT yielding multiple columns
(but only one row) to be used to compute the new values of several columns
to be updated. While the same results can be had with an independent
sub-SELECT per column, such a workaround can require a great deal of
duplicated computation.
The standard actually says that the source for a multi-column assignment
could be any row-valued expression. The implementation used here is
tightly tied to our existing sub-SELECT support and can't handle other
cases; the Bison grammar would have some issues with them too. However,
I don't feel too bad about this since other cases can be converted into
sub-SELECTs. For instance, "SET (a,b,c) = row_valued_function(x)" could
be written "SET (a,b,c) = (SELECT * FROM row_valued_function(x))".
	T_MultiAssignRef,
	T_TypeCast,
	T_CollateClause,
	T_SortBy,
	T_WindowDef,
	T_RangeSubselect,
	T_RangeFunction,
Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
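A minimal sketch of the handler-function pattern this commit adopts; the function name is hypothetical, the per-scan callback assignments are elided, and the two repeatability flags shown are the ones discussed above.

Datum
my_tsm_handler(PG_FUNCTION_ARGS)
{
	/* A tablesample-method handler just returns a filled-in TsmRoutine node. */
	TsmRoutine *tsm = makeNode(TsmRoutine);

	tsm->repeatable_across_queries = true;	/* honor REPEATABLE(seed) */
	tsm->repeatable_across_scans = false;
	/* ... assign the scan-setup and next-block/next-tuple callbacks here ... */

	PG_RETURN_POINTER(tsm);
}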
	T_RangeTableSample,
	T_RangeTableFunc,
	T_RangeTableFuncCol,
	T_TypeName,
	T_ColumnDef,
	T_IndexElem,
	T_Constraint,
	T_DefElem,
	T_RangeTblEntry,
Support multi-argument UNNEST(), and TABLE() syntax for multiple functions.
This patch adds the ability to write TABLE( function1(), function2(), ...)
as a single FROM-clause entry. The result is the concatenation of the
first row from each function, followed by the second row from each
function, etc; with NULLs inserted if any function produces fewer rows than
others. This is believed to be a much more useful behavior than what
Postgres currently does with multiple SRFs in a SELECT list.
This syntax also provides a reasonable way to combine use of column
definition lists with WITH ORDINALITY: put the column definition list
inside TABLE(), where it's clear that it doesn't control the ordinality
column as well.
Also implement SQL-compliant multiple-argument UNNEST(), by turning
UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)).
The SQL standard specifies TABLE() with only a single function, not
multiple functions, and it seems to require an implicit UNNEST() which is
not what this patch does. There may be something wrong with that reading
of the spec, though, because if it's right then the spec's TABLE() is just
a pointless alternative spelling of UNNEST(). After further review of
that, we might choose to adopt a different syntax for what this patch does,
but in any case this functionality seems clearly worthwhile.
Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and
significantly revised by me
	T_RangeTblFunction,
	T_TableSampleClause,
	T_WithCheckOption,
	T_SortGroupClause,
Support GROUPING SETS, CUBE and ROLLUP.
This SQL-standard functionality allows data to be aggregated by different
GROUP BY clauses at once. Each grouping set returns rows with columns
grouped by in other sets set to NULL.
This could previously be achieved by doing each grouping as a separate
query, conjoined by UNION ALLs. Besides being considerably more concise,
grouping sets will in many cases be faster, requiring only one scan over
the underlying data.
The current implementation of grouping sets only supports using sorting
for input. Individual sets that share a sort order are computed in one
pass. If there are sets that don't share a sort order, additional sort &
aggregation steps are performed. These additional passes are sourced by
the previous sort step; thus avoiding repeated scans of the source data.
The code is structured in a way that adding support for purely using
hash aggregation or a mix of hashing and sorting is possible. Sorting
was chosen to be supported first, as it is the most generic method of
implementation.
Instead of, as in earlier versions of the patch, representing the
chain of sort and aggregation steps as full blown planner and executor
nodes, all but the first sort are performed inside the aggregation node
itself. This avoids the need to do some unusual gymnastics to handle
having to return aggregated and non-aggregated tuples from underlying
nodes, as well as having to shut down underlying nodes early to limit
memory usage. The optimizer still builds Sort/Agg nodes to describe each
phase; they're not part of the plan tree, but instead additional
data for the aggregation node. They're a convenient and preexisting way
to describe aggregation and sorting. The first (and possibly only) sort
step is still performed as a separate execution step. That retains
similarity with existing group by plans, makes rescans fairly simple,
avoids very deep plans (leading to slow explains) and easily allows to
avoid the sorting step if the underlying data is sorted by other means.
A somewhat ugly side of this patch is having to deal with a grammar
ambiguity between the new CUBE keyword and the cube extension/functions
named cube (and rollup). To avoid breaking existing deployments of the
cube extension it has not been renamed, neither has cube been made a
reserved keyword. Instead precedence hacking is used to make GROUP BY
cube(..) refer to the CUBE grouping sets feature, and not the function
cube(). To actually group by a function cube(), unlikely as that might
be, the function name has to be quoted.
Needs a catversion bump because stored rules may change.
Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
	T_GroupingSet,
	T_WindowClause,
	T_ObjectWithArgs,
	T_AccessPriv,
	T_CreateOpClassItem,
	T_TableLikeClause,
	T_FunctionParameter,
	T_LockingClause,
	T_RowMarkClause,
	T_XmlSerialize,
	T_WithClause,
Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.
The newly added ON CONFLICT clause allows specifying an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using an
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint. DO NOTHING avoids the
constraint violation, without touching the pre-existing row. DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed. The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.
This feature is often referred to as upsert.
This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert. If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made. If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.
To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO now can alias its target table.
Bumps catversion as stored rules change.
Author: Peter Geoghegan, with significant contributions from Heikki
Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
Dean Rasheed, Stephen Frost and many others.
	T_InferClause,
	T_OnConflictClause,
	T_CommonTableExpr,
Allow CURRENT/SESSION_USER to be used in certain commands
Commands such as ALTER USER, ALTER GROUP, ALTER ROLE, GRANT, and the
various ALTER OBJECT / OWNER TO, as well as ad-hoc clauses related to
roles such as the AUTHORIZATION clause of CREATE SCHEMA, the FOR clause
of CREATE USER MAPPING, and the FOR ROLE clause of ALTER DEFAULT
PRIVILEGES can now take the keywords CURRENT_USER and SESSION_USER as
user specifiers in place of an explicit user name.
This commit also fixes some quite ugly handling of special standards-
mandated syntax in CREATE USER MAPPING, which in particular would fail
to work in presence of a role named "current_user".
The special role specifiers PUBLIC and NONE also have more consistent
handling now.
Also take the opportunity to add location tracking to user specifiers.
Authors: Kyotaro Horiguchi. Heavily reworked by Álvaro Herrera.
Reviewed by: Rushabh Lathia, Adam Brightwell, Marti Raudsepp.
	T_RoleSpec,
	T_TriggerTransition,
Implement table partitioning.
Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own. The children are called
partitions and contain all of the actual data. Each partition has an
implicit partitioning constraint. Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed. Partitions
can't have extra columns and may not allow nulls unless the parent
does. Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.
Currently, tables can be range-partitioned or list-partitioned. List
partitioning is limited to a single column, but range partitioning can
involve multiple columns. A partitioning "column" can be an
expression.
Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations. The tuple routing that this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.
Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others. Minor revisions by me.
	T_PartitionElem,
	T_PartitionSpec,
	T_PartitionBoundSpec,
	T_PartitionRangeDatum,
Code review focused on new node types added by partitioning support.
Fix failure to check that we got a plain Const from const-simplification of
a coercion request. This is the cause of bug #14666 from Tian Bing: there
is an int4 to money cast, but it's only stable not immutable (because of
dependence on lc_monetary), resulting in a FuncExpr that the code was
miserably unequipped to deal with, or indeed even to notice that it was
failing to deal with. Add test cases around this coercion behavior.
In view of the above, sprinkle the code liberally with castNode() macros,
in hope of catching the next such bug a bit sooner. Also, change some
functions that were randomly declared to take Node* to take more specific
pointer types. And change some struct fields that were declared Node*
but could be given more specific types, allowing removal of assorted
explicit casts.
Place PARTITION_MAX_KEYS check a bit closer to the code it's protecting.
Likewise check only-one-key-for-list-partitioning restriction in a less
random place.
Avoid not-per-project-style usages like !strcmp(...).
Fix assorted failures to avoid scribbling on the input of parse
transformation. I'm not sure how necessary this is, but it's entirely
silly for these functions to be expending cycles to avoid that and not
getting it right.
Add guards against partitioning on system columns.
Put backend/nodes/ support code into an order that matches handling
of these node types elsewhere.
Annotate the fact that somebody added location fields to PartitionBoundSpec
and PartitionRangeDatum but forgot to handle them in
outfuncs.c/readfuncs.c. This is fairly harmless for production purposes
(since readfuncs.c would just substitute -1 anyway) but it's still bogus.
It's not worth forcing a post-beta1 initdb just to fix this, but if we
have another reason to force initdb before 10.0, we should go back and
clean this up.
Contrariwise, somebody added location fields to PartitionElem and
PartitionSpec but forgot to teach exprLocation() about them.
Consolidate duplicative code in transformPartitionBound().
Improve a couple of error messages.
Improve assorted commentary.
Re-pgindent the files touched by this patch; this affects a few comment
blocks that must have been added quite recently.
Report: https://postgr.es/m/20170524024550.29935.14396@wrigleys.postgresql.org
	T_PartitionCmd,
	T_VacuumRelation,

	/*
	 * TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
	 */
	T_IdentifySystemCmd,
	T_BaseBackupCmd,
	T_CreateReplicationSlotCmd,
	T_DropReplicationSlotCmd,
	T_StartReplicationCmd,
Allow a streaming replication standby to follow a timeline switch.
Before this patch, streaming replication would refuse to start replicating
if the timeline in the primary doesn't exactly match the standby. The
situation where it doesn't match is when you have a master, and two
standbys, and you promote one of the standbys to become new master.
Promoting bumps up the timeline ID, and after that bump, the other standby
would refuse to continue.
There's significantly more timeline related logic in streaming replication
now. First of all, when a standby connects to primary, it will ask the
primary for any timeline history files that are missing from the standby.
The missing files are sent using a new replication command TIMELINE_HISTORY,
and stored in standby's pg_xlog directory. Using the timeline history files,
the standby can follow the latest timeline present in the primary
(recovery_target_timeline='latest'), just as it can follow new timelines
appearing in an archive directory.
START_REPLICATION now takes a TIMELINE parameter, to specify exactly which
timeline to stream WAL from. This allows the standby to request the primary
to send over WAL that precedes the promotion. The replication protocol is
changed slightly (in a backwards-compatible way although there's little hope
of streaming replication working across major versions anyway), to allow
replication to stop when the end of the timeline is reached, putting the walsender
back into accepting a replication command.
Many thanks to Amit Kapila for testing and reviewing various versions of
this patch.
	T_TimeLineHistoryCmd,
	T_SQLCmd,

	/*
	 * TAGS FOR RANDOM OTHER STUFF
	 *
	 * These are objects that aren't part of parse/plan/execute node tree
	 * structures, but we give them NodeTags anyway for identification
	 * purposes (usually because they are involved in APIs where we want to
	 * pass multiple object types through the same pointer).
	 */
	T_TriggerData,				/* in commands/trigger.h */
	T_EventTriggerData,			/* in commands/event_trigger.h */
	T_ReturnSetInfo,			/* in nodes/execnodes.h */
	T_WindowObjectData,			/* private in nodeWindowAgg.c */
	T_TIDBitmap,				/* in nodes/tidbitmap.h */
	T_InlineCodeBlock,			/* in nodes/parsenodes.h */
	T_FdwRoutine,				/* in foreign/fdwapi.h */
Restructure index access method API to hide most of it at the C level.
This patch reduces pg_am to just two columns, a name and a handler
function. All the data formerly obtained from pg_am is now provided
in a C struct returned by the handler function. This is similar to
the designs we've adopted for FDWs and tablesample methods. There
are multiple advantages. For one, the index AM's support functions
are now simple C functions, making them faster to call and much less
error-prone, since the C compiler can now check function signatures.
For another, this will make it far more practical to define index access
methods in installable extensions.
A disadvantage is that SQL-level code can no longer see attributes
of index AMs; in particular, some of the crosschecks in the opr_sanity
regression test are no longer possible from SQL. We've addressed that
by adding a facility for the index AM to perform such checks instead.
(Much more could be done in that line, but for now we're content if the
amvalidate functions more or less replace what opr_sanity used to do.)
We might also want to expose some sort of reporting functionality, but
this patch doesn't do that.
Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
editorialized on by me.
	T_IndexAmRoutine,			/* in access/amapi.h */
	T_TsmRoutine,				/* in access/tsmapi.h */
Transaction control in PL procedures
In each of the supplied procedural languages (PL/pgSQL, PL/Perl,
PL/Python, PL/Tcl), add language-specific commit and rollback
functions/commands to control transactions in procedures in that
language. Add similar underlying functions to SPI. Some additional
cleanup so that transaction commit or abort doesn't blow away data
structures still used by the procedure call. Add execution context
tracking to CALL and DO statements so that transaction control commands
can only be issued in top-level procedure and block calls, not function
calls or other procedure or block calls.
- SPI
Add a new function SPI_connect_ext() that is like SPI_connect() but
allows passing option flags. The only option flag right now is
SPI_OPT_NONATOMIC. A nonatomic SPI connection can execute transaction
control commands, otherwise it's not allowed. This is meant to be
passed down from CALL and DO statements which themselves know in which
context they are called. A nonatomic SPI connection uses different
memory management. A normal SPI connection allocates its memory in
TopTransactionContext. For nonatomic connections we use PortalContext
instead. As the comment in SPI_connect_ext() (previously SPI_connect())
indicates, one could potentially use PortalContext in all cases, but it
seems safest to leave the existing uses alone, because this stuff is
complicated enough already.
SPI also gets new functions SPI_start_transaction(), SPI_commit(), and
SPI_rollback(), which can be used by PLs to implement their transaction
control logic.
- portalmem.c
Some adjustments were made in the code that cleans up portals at
transaction abort. The portal code could already handle a command
*committing* a transaction and continuing (e.g., VACUUM), but it was not
quite prepared for a command *aborting* a transaction and continuing.
In AtAbort_Portals(), remove the code that marks an active portal as
failed. As the comment there already predicted, this doesn't work if
the running command wants to keep running after transaction abort. And
it's actually not necessary, because pquery.c is careful to run all
portal code in a PG_TRY block and explicitly runs MarkPortalFailed() if
there is an exception. So the code in AtAbort_Portals() is never used
anyway.
In AtAbort_Portals() and AtCleanup_Portals(), we need to be careful not
to clean up active portals too much. This mirrors similar code in
PreCommit_Portals().
- PL/Perl
Gets new functions spi_commit() and spi_rollback()
- PL/pgSQL
Gets new commands COMMIT and ROLLBACK.
Update the PL/SQL porting example in the documentation to reflect that
transactions are now possible in procedures.
- PL/Python
Gets new functions plpy.commit and plpy.rollback.
- PL/Tcl
Gets new commands commit and rollback.
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
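A hedged sketch of the nonatomic SPI pattern described above, roughly as a PL handler servicing a CALL might use it; the function name, table name, and error handling are illustrative only.

static void
run_procedure_body(void)
{
	/* A nonatomic connection is what permits transaction control commands. */
	if (SPI_connect_ext(SPI_OPT_NONATOMIC) != SPI_OK_CONNECT)
		elog(ERROR, "SPI_connect_ext failed");

	SPI_execute("UPDATE work_queue SET done = true", false, 0);
	SPI_commit();				/* only legal on a nonatomic connection */

	SPI_finish();
}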
	T_ForeignKeyCacheInfo,		/* in utils/rel.h */
	T_CallContext				/* in nodes/parsenodes.h */
} NodeTag;

/*
 * The first field of a node of any type is guaranteed to be the NodeTag.
 * Hence the type of any node can be gotten by casting it to Node. Declaring
 * a variable to be of Node * (instead of void *) can also facilitate
 * debugging.
 */
typedef struct Node
{
	NodeTag		type;
} Node;

#define nodeTag(nodeptr)		(((const Node*)(nodeptr))->type)
/*
 * newNode -
 *	  create a new node of the specified size and tag the node with the
 *	  specified tag.
 *
 * !WARNING!: Avoid using newNode directly. You should be using the
 *	  macro makeNode.  eg. to create a Query node, use makeNode(Query)
 *
 * Note: the size argument should always be a compile-time constant, so the
 * apparent risk of multiple evaluation doesn't matter in practice.
 */
#ifdef __GNUC__

/* With GCC, we can use a compound statement within an expression */
#define newNode(size, tag) \
({	Node   *_result; \
	AssertMacro((size) >= sizeof(Node));	/* need the tag, at least */ \
	_result = (Node *) palloc0fast(size); \
	_result->type = (tag); \
	_result; \
})
#else

/*
 * There is no way to dereference the palloc'ed pointer to assign the
 * tag, and also return the pointer itself, so we need a holder variable.
 * Fortunately, this macro isn't recursive so we just define
 * a global variable for this purpose.
 */
extern PGDLLIMPORT Node *newNodeMacroHolder;

#define newNode(size, tag) \
( \
	AssertMacro((size) >= sizeof(Node)),	/* need the tag, at least */ \
	newNodeMacroHolder = (Node *) palloc0fast(size), \
	newNodeMacroHolder->type = (tag), \
	newNodeMacroHolder \
)
Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
#endif							/* __GNUC__ */


#define makeNode(_type_)		((_type_ *) newNode(sizeof(_type_),T_##_type_))
#define NodeSetTag(nodeptr,t)	(((Node*)(nodeptr))->type = (t))
#define IsA(nodeptr,_type_)		(nodeTag(nodeptr) == T_##_type_)
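As an illustrative aside (not part of this header), the makeNode/IsA pair is typically used like this:

static void
makeNode_example(void)
{
	/* Allocate a zeroed Query node that is already tagged T_Query. */
	Query	   *query = makeNode(Query);

	Assert(IsA(query, Query));
	NodeSetTag(query, T_Query);		/* redundant here; shown only for the macro */
}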
/*
 * castNode(type, ptr) casts ptr to "type *", and if assertions are enabled,
 * verifies that the node has the appropriate type (using its nodeTag()).
 *
 * Use an inline function when assertions are enabled, to avoid multiple
 * evaluations of the ptr argument (which could e.g. be a function call).
 */
#ifdef USE_ASSERT_CHECKING
static inline Node *
castNodeImpl(NodeTag type, void *ptr)
{
	Assert(ptr == NULL || nodeTag(ptr) == type);
	return (Node *) ptr;
}
#define castNode(_type_, nodeptr) ((_type_ *) castNodeImpl(T_##_type_, nodeptr))
#else
#define castNode(_type_, nodeptr) ((_type_ *) (nodeptr))
#endif							/* USE_ASSERT_CHECKING */
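An illustrative aside (hypothetical list and variable names) showing why the checked cast is preferred over a bare cast when pulling nodes out of generic containers:

static void
castNode_example(List *targetlist)
{
	ListCell   *lc;

	foreach(lc, targetlist)
	{
		/* With assertions enabled, this fails loudly on a wrong node type. */
		TargetEntry *tle = castNode(TargetEntry, lfirst(lc));

		(void) tle;
	}
}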
/* ----------------------------------------------------------------
 *					  extern declarations follow
 * ----------------------------------------------------------------
 */

/*
 * nodes/{outfuncs.c,print.c}
 */
struct Bitmapset;				/* not to include bitmapset.h here */
struct StringInfoData;			/* not to include stringinfo.h here */

extern void outNode(struct StringInfoData *str, const void *obj);
extern void outToken(struct StringInfoData *str, const char *s);
extern void outBitmapset(struct StringInfoData *str,
			 const struct Bitmapset *bms);
extern void outDatum(struct StringInfoData *str, uintptr_t value,
		 int typlen, bool typbyval);
extern char *nodeToString(const void *obj);
extern char *bmsToString(const struct Bitmapset *bms);
/*
 * nodes/{readfuncs.c,read.c}
 */
extern void *stringToNode(char *str);
extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
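The out and read functions above are designed to round-trip; an illustrative aside (the tree argument and helper name are hypothetical):

static Node *
node_roundtrip_example(Node *tree)
{
	/* Serialize the tree to text, then rebuild an equivalent copy from it. */
	char	   *serialized = nodeToString(tree);
	Node	   *rebuilt = (Node *) stringToNode(serialized);

	pfree(serialized);
	return rebuilt;
}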
/*
 * nodes/copyfuncs.c
 */
extern void *copyObjectImpl(const void *obj);

/* cast result back to argument type, if supported by compiler */
#ifdef HAVE_TYPEOF
#define copyObject(obj) ((typeof(obj)) copyObjectImpl(obj))
#else
#define copyObject(obj) copyObjectImpl(obj)
#endif

/*
 * nodes/equalfuncs.c
 */
extern bool equal(const void *a, const void *b);
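Together with equal(), copyObject() supports the usual copy-then-verify idiom; an illustrative aside (helper and variable names are hypothetical):

static Query *
copy_query_example(Query *original)
{
	Query	   *copy = copyObject(original);	/* deep copy via copyObjectImpl */

	Assert(equal(copy, original));				/* structural equality check */
	return copy;
}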
/*
 * Typedefs for identifying qualifier selectivities and plan costs as such.
 * These are just plain "double"s, but declaring a variable as Selectivity
 * or Cost makes the intent more obvious.
 *
 * These could have gone into plannodes.h or some such, but many files
 * depend on them...
 */
typedef double Selectivity;		/* fraction of tuples a qualifier will pass */
typedef double Cost;			/* execution cost (in page-access units) */
/*
 * CmdType -
 *	  enums for type of operation represented by a Query or PlannedStmt
 *
 * This is needed in both parsenodes.h and plannodes.h, so put it here...
 */
typedef enum CmdType
{
	CMD_UNKNOWN,
	CMD_SELECT,					/* select stmt */
	CMD_UPDATE,					/* update stmt */
	CMD_INSERT,					/* insert stmt */
	CMD_DELETE,
	CMD_UTILITY,				/* cmds like create, destroy, copy, vacuum,
								 * etc. */
	CMD_NOTHING					/* dummy command for instead nothing rules
								 * with qual */
} CmdType;
/*
 * JoinType -
 *	  enums for types of relation joins
 *
 * JoinType determines the exact semantics of joining two relations using
 * a matching qualification.  For example, it tells what to do with a tuple
 * that has no match in the other relation.
 *
 * This is needed in both parsenodes.h and plannodes.h, so put it here...
 */
typedef enum JoinType
{
	/*
	 * The canonical kinds of joins according to the SQL JOIN syntax. Only
	 * these codes can appear in parser output (e.g., JoinExpr nodes).
	 */
	JOIN_INNER,					/* matching tuple pairs only */
	JOIN_LEFT,					/* pairs + unmatched LHS tuples */
	JOIN_FULL,					/* pairs + unmatched LHS + unmatched RHS */
	JOIN_RIGHT,					/* pairs + unmatched RHS tuples */

	/*
	 * Semijoins and anti-semijoins (as defined in relational theory) do not
	 * appear in the SQL JOIN syntax, but there are standard idioms for
	 * representing them (e.g., using EXISTS).  The planner recognizes these
	 * cases and converts them to joins.  So the planner and executor must
	 * support these codes.  NOTE: in JOIN_SEMI output, it is unspecified
	 * which matching RHS row is joined to.  In JOIN_ANTI output, the row is
	 * guaranteed to be null-extended.
	 */
	JOIN_SEMI,					/* 1 copy of each LHS row that has match(es) */
	JOIN_ANTI,					/* 1 copy of each LHS row that has no match */

	/*
	 * These codes are used internally in the planner, but are not supported
	 * by the executor (nor, indeed, by most of the planner).
	 */
	JOIN_UNIQUE_OUTER,			/* LHS path must be made unique */
	JOIN_UNIQUE_INNER			/* RHS path must be made unique */

	/*
	 * We might need additional join types someday.
	 */
} JoinType;
/*
 * OUTER joins are those for which pushed-down quals must behave differently
 * from the join's own quals.  This is in fact everything except INNER and
 * SEMI joins.  However, this macro must also exclude the JOIN_UNIQUE symbols
 * since those are temporary proxies for what will eventually be an INNER
 * join.
 *
 * Note: semijoins are a hybrid case, but we choose to treat them as not
 * being outer joins.  This is okay principally because the SQL syntax makes
 * it impossible to have a pushed-down qual that refers to the inner relation
 * of a semijoin; so there is no strong need to distinguish join quals from
 * pushed-down quals.  This is convenient because for almost all purposes,
 * quals attached to a semijoin can be treated the same as innerjoin quals.
 */
#define IS_OUTER_JOIN(jointype) \
	(((1 << (jointype)) & \
	  ((1 << JOIN_LEFT) | \
	   (1 << JOIN_FULL) | \
	   (1 << JOIN_RIGHT) | \
	   (1 << JOIN_ANTI))) != 0)
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
/*
 * AggStrategy -
 *	  overall execution strategies for Agg plan nodes
 *
 * This is needed in both plannodes.h and relation.h, so put it here...
 */
typedef enum AggStrategy
{
	AGG_PLAIN,					/* simple agg across all input rows */
	AGG_SORTED,					/* grouped agg, input must be sorted */
	AGG_HASHED,					/* grouped agg, use internal hashtable */
	AGG_MIXED					/* grouped agg, hash and sort both used */
|
|
|
} AggStrategy;
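
Illustration only, not part of nodes.h: the per-strategy comments above mostly
come down to what ordering the node expects of its input. The helper below is a
hypothetical sketch of that reading; it assumes that AGG_MIXED, which uses both
hashing and sorting, still needs ordered input for its sort-based side.

/* Hypothetical sketch, not part of the header: which strategies expect
 * their input to arrive pre-sorted, per the comments on AggStrategy. */
static inline bool
agg_strategy_expects_sorted_input(AggStrategy strategy)
{
    /* Assumption: AGG_MIXED's sort-based phases also need ordered input. */
    return strategy == AGG_SORTED || strategy == AGG_MIXED;
}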

/*
 * AggSplit -
 *    splitting (partial aggregation) modes for Agg plan nodes
 *
 * This is needed in both plannodes.h and relation.h, so put it here...
 */

/* Primitive options supported by nodeAgg.c: */
#define AGGSPLITOP_COMBINE      0x01    /* substitute combinefn for transfn */
#define AGGSPLITOP_SKIPFINAL    0x02    /* skip finalfn, return state as-is */
#define AGGSPLITOP_SERIALIZE    0x04    /* apply serializefn to output */
#define AGGSPLITOP_DESERIALIZE  0x08    /* apply deserializefn to input */

/* Supported operating modes (i.e., useful combinations of these options): */
typedef enum AggSplit
{
    /* Basic, non-split aggregation: */
    AGGSPLIT_SIMPLE = 0,
    /* Initial phase of partial aggregation, with serialization: */
    AGGSPLIT_INITIAL_SERIAL = AGGSPLITOP_SKIPFINAL | AGGSPLITOP_SERIALIZE,
    /* Final phase of partial aggregation, with deserialization: */
    AGGSPLIT_FINAL_DESERIAL = AGGSPLITOP_COMBINE | AGGSPLITOP_DESERIALIZE
} AggSplit;

/* Test whether an AggSplit value selects each primitive option: */
#define DO_AGGSPLIT_COMBINE(as)     (((as) & AGGSPLITOP_COMBINE) != 0)
#define DO_AGGSPLIT_SKIPFINAL(as)   (((as) & AGGSPLITOP_SKIPFINAL) != 0)
#define DO_AGGSPLIT_SERIALIZE(as)   (((as) & AGGSPLITOP_SERIALIZE) != 0)
#define DO_AGGSPLIT_DESERIALIZE(as) (((as) & AGGSPLITOP_DESERIALIZE) != 0)
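
For orientation (illustration only, not part of the header): each operating
mode above is simply an OR of the primitive AGGSPLITOP_* bits, so the
DO_AGGSPLIT_* macros decompose a mode directly. A minimal sketch using a
hypothetical check function:

/* Illustration only: how AggSplit modes decompose into primitive options. */
#include <assert.h>

static void
check_aggsplit_bits(void)
{
    /* The initial phase of partial aggregation skips the finalfn and
     * serializes its output, but neither combines nor deserializes. */
    assert(DO_AGGSPLIT_SKIPFINAL(AGGSPLIT_INITIAL_SERIAL));
    assert(DO_AGGSPLIT_SERIALIZE(AGGSPLIT_INITIAL_SERIAL));
    assert(!DO_AGGSPLIT_COMBINE(AGGSPLIT_INITIAL_SERIAL));
    assert(!DO_AGGSPLIT_DESERIALIZE(AGGSPLIT_INITIAL_SERIAL));

    /* The final phase combines partial states and deserializes its input. */
    assert(DO_AGGSPLIT_COMBINE(AGGSPLIT_FINAL_DESERIAL));
    assert(DO_AGGSPLIT_DESERIALIZE(AGGSPLIT_FINAL_DESERIAL));

    /* Plain aggregation selects none of the primitive options. */
    assert(!DO_AGGSPLIT_COMBINE(AGGSPLIT_SIMPLE));
    assert(!DO_AGGSPLIT_SKIPFINAL(AGGSPLIT_SIMPLE));
}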

/*
 * SetOpCmd and SetOpStrategy -
 *    overall semantics and execution strategies for SetOp plan nodes
 *
 * This is needed in both plannodes.h and relation.h, so put it here...
 */
typedef enum SetOpCmd
{
    SETOPCMD_INTERSECT,
    SETOPCMD_INTERSECT_ALL,
    SETOPCMD_EXCEPT,
    SETOPCMD_EXCEPT_ALL
} SetOpCmd;

typedef enum SetOpStrategy
{
    SETOP_SORTED,               /* input must be sorted */
    SETOP_HASHED                /* use internal hashtable */
} SetOpStrategy;
Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.
The newly added ON CONFLICT clause allows specifying an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using an
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint. DO NOTHING avoids the
constraint violation, without touching the pre-existing row. DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed. The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.
This feature is often referred to as upsert.
This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert. If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made. If the pre-check finds a
matching tuple, the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.
To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO now can alias its target table.
Bumps catversion as stored rules change.
Author: Peter Geoghegan, with significant contributions from Heikki
Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
Dean Rasheed, Stephen Frost and many others.

/*
 * OnConflictAction -
 *    "ON CONFLICT" clause type of query
 *
 * This is needed in both parsenodes.h and plannodes.h, so put it here...
 */
typedef enum OnConflictAction
{
    ONCONFLICT_NONE,            /* No "ON CONFLICT" clause */
    ONCONFLICT_NOTHING,         /* ON CONFLICT ... DO NOTHING */
    ONCONFLICT_UPDATE           /* ON CONFLICT ... DO UPDATE */
} OnConflictAction;
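
A minimal, purely illustrative sketch (not part of the header): a consumer of a
parsed query can branch on OnConflictAction to decide how a conflicting insert
is handled, mirroring the commit message above.

/* Hypothetical illustration: branching on the ON CONFLICT action. */
static inline const char *
on_conflict_action_description(OnConflictAction action)
{
    switch (action)
    {
        case ONCONFLICT_NONE:
            return "error out on a unique/exclusion conflict (no ON CONFLICT clause)";
        case ONCONFLICT_NOTHING:
            return "skip the conflicting row (ON CONFLICT ... DO NOTHING)";
        case ONCONFLICT_UPDATE:
            return "update the pre-existing row (ON CONFLICT ... DO UPDATE)";
    }
    return "unrecognized OnConflictAction";
}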
Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us

#endif                          /* NODES_H */