There was a grammar ambiguity between SET PUBLICATION name REFRESH and
SET PUBLICATION SKIP REFRESH, because SKIP is not a reserved word. To
resolve that, fold the refresh choice into the WITH options. Refreshing
is the default now.
Reported-by: tushar <tushar.ahuja@enterprisedb.com>
Fix failure to check that we got a plain Const from const-simplification of
a coercion request. This is the cause of bug #14666 from Tian Bing: there
is an int4 to money cast, but it's only stable not immutable (because of
dependence on lc_monetary), resulting in a FuncExpr that the code was
miserably unequipped to deal with, or indeed even to notice that it was
failing to deal with. Add test cases around this coercion behavior.
In view of the above, sprinkle the code liberally with castNode() macros,
in hope of catching the next such bug a bit sooner. Also, change some
functions that were randomly declared to take Node* to take more specific
pointer types. And change some struct fields that were declared Node*
but could be given more specific types, allowing removal of assorted
explicit casts.
Place PARTITION_MAX_KEYS check a bit closer to the code it's protecting.
Likewise check only-one-key-for-list-partitioning restriction in a less
random place.
Avoid not-per-project-style usages like !strcmp(...).
Fix assorted failures to avoid scribbling on the input of parse
transformation. I'm not sure how necessary this is, but it's entirely
silly for these functions to be expending cycles to avoid that and not
getting it right.
Add guards against partitioning on system columns.
Put backend/nodes/ support code into an order that matches handling
of these node types elsewhere.
Annotate the fact that somebody added location fields to PartitionBoundSpec
and PartitionRangeDatum but forgot to handle them in
outfuncs.c/readfuncs.c. This is fairly harmless for production purposes
(since readfuncs.c would just substitute -1 anyway) but it's still bogus.
It's not worth forcing a post-beta1 initdb just to fix this, but if we
have another reason to force initdb before 10.0, we should go back and
clean this up.
Contrariwise, somebody added location fields to PartitionElem and
PartitionSpec but forgot to teach exprLocation() about them.
Consolidate duplicative code in transformPartitionBound().
Improve a couple of error messages.
Improve assorted commentary.
Re-pgindent the files touched by this patch; this affects a few comment
blocks that must have been added quite recently.
Report: https://postgr.es/m/20170524024550.29935.14396@wrigleys.postgresql.org
Previously, we had the WITH clause in the middle of the command, where
you'd specify both generic options as well as statistic types. Few
people liked this, so this commit changes it to remove the WITH keyword
from that clause and makes it accept statistic types only. (We
currently don't have any generic options, but if we invent in the
future, we will gain a new WITH clause, probably at the end of the
command).
Also, the column list is now specified without parens, which makes the
whole command look more similar to a SELECT command. This change will
let us expand the command to supporting expressions (not just columns
names) as well as multiple tables and their join conditions.
Tom added lots of code comments and fixed some parts of the CREATE
STATISTICS reference page, too; more changes in this area are
forthcoming. He also fixed a potential problem in the alter_generic
regression test, reducing verbosity on a cascaded drop to avoid
dependency on message ordering, as we do in other tests.
Tom also closed a security bug: we documented that table ownership was
required in order to create a statistics object on it, but didn't
actually implement it.
Implement tab-completion for statistics objects. This can stand some
more improvement.
Authors: Alvaro Herrera, with lots of cleanup by Tom Lane
Discussion: https://postgr.es/m/20170420212426.ltvgyhnefvhixm6i@alvherre.pgsql
It turned out this approach had problems, because a DROP command should
not have any options other than CASCADE and RESTRICT. Instead, always
attempt to drop the slot if there is one configured, but also add an
ALTER SUBSCRIPTION action to set the slot to NONE.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/29431.1493730652@sss.pgh.pa.us
Even though no actual tuples are ever inserted into a partitioned
table (the actual tuples are in the partitions, not the partitioned
table itself), we still need to have a ResultRelInfo for the
partitioned table, or per-statement triggers won't get fired.
Amit Langote, per a report from Rajkumar Raghuwanshi. Reviewed by me.
Discussion: http://postgr.es/m/CAKcux6%3DwYospCRY2J4XEFuVy0L41S%3Dfic7rmkbsU-GXhhSbmBg%40mail.gmail.com
is_parallel_safe() supposed that the only relevant property of a SubPlan
was the parallel safety of the referenced subplan tree. This is wrong:
the testexpr or args subtrees might contain parallel-unsafe stuff, as
demonstrated by the test case added here. However, just recursing into the
subtrees fails in a different way: we'll typically find PARAM_EXEC Params
representing the subplan's output columns in the testexpr. The previous
coding supposed that any Param must be treated as parallel-restricted, so
that a naive attempt at fixing this disabled parallel pushdown of SubPlans
altogether. We must instead determine, for any visited Param, whether it
is one that would be computed by a surrounding SubPlan node; if so, it's
safe to push down along with the SubPlan node.
We might later be able to extend this logic to cope with Params used for
correlated subplans and other cases; but that's a task for v11 or beyond.
Tom Lane and Amit Kapila
Discussion: https://postgr.es/m/7064.1492022469@sss.pgh.pa.us
Commit 5e6d8d2bb allowed parallel workers to execute parallel-safe
subplans, but it transmitted the query's entire list of subplans to
the worker(s). Since execMain.c blindly does ExecInitNode and later
ExecEndNode on every list element, this resulted in parallel-unsafe plan
nodes nonetheless getting started up and shut down in parallel workers.
That seems mostly harmless as far as core plan node types go (but
maybe not so much for Gather?). But it resulted in postgres_fdw
opening and then closing extra remote connections, and it's likely
that other non-parallel-safe FDWs or custom scan providers would have
worse reactions.
To fix, just make ExecSerializePlan replace parallel-unsafe subplans
with NULLs in the cut-down plan tree that it transmits to workers.
This relies on ExecInitNode and ExecEndNode to do nothing on NULL
input, but they do anyway. If anything else is touching the dropped
subplans in a parallel worker, that would be a bug to be fixed.
(This thus provides a strong guarantee that we won't try to do
something with a parallel-unsafe subplan in a worker.)
This is, I think, the last fix directly occasioned by Andreas Seltenreich's
bug report of a few days ago.
Tom Lane and Amit Kapila
Discussion: https://postgr.es/m/87tw5x4vcu.fsf@credativ.de
We'd managed to avoid doing this so far, but it seems pretty obvious
that it would be forced on us some day, and this is much the cleanest
way of approaching the open problem that parallel-unsafe subplans are
being transmitted to parallel workers. Anyway there's no space cost
due to alignment considerations, and the time cost is pretty minimal
since we're just copying the flag from the corresponding Path node.
(At least in most cases ... some of the klugier spots in createplan.c
have to work a bit harder.)
In principle we could perhaps get rid of SubPlan.parallel_safe,
but I thought it better to keep that in case there are reasons to
consider a SubPlan unsafe even when its child plan is parallel-safe.
This patch doesn't actually do anything with the new flags, but
I thought I'd commit it separately anyway.
Note: although this touches outfuncs/readfuncs, there's no need for
a catversion bump because Plan trees aren't stored on disk.
Discussion: https://postgr.es/m/87tw5x4vcu.fsf@credativ.de
This extends the castNode() notation introduced by commit 5bcab1114 to
provide, in one step, extraction of a list cell's pointer and coercion to
a concrete node type. For example, "lfirst_node(Foo, lc)" is the same
as "castNode(Foo, lfirst(lc))". Almost half of the uses of castNode
that have appeared so far include a list extraction call, so this is
pretty widely useful, and it saves a few more keystrokes compared to the
old way.
As with the previous patch, back-patch the addition of these macros to
pg_list.h, so that the notation will be available when back-patching.
Patch by me, after an idea of Andrew Gierth's.
Discussion: https://postgr.es/m/14197.1491841216@sss.pgh.pa.us
If there can certainly be no more than one matching inner row for a given
outer row, then the executor can move on to the next outer row as soon as
it's found one match; there's no need to continue scanning the inner
relation for this outer row. This saves useless scanning in nestloop
and hash joins. In merge joins, it offers the opportunity to skip
mark/restore processing, because we know we have not advanced past the
first possible match for the next outer row.
Of course, the devil is in the details: the proof of uniqueness must
depend only on joinquals (not otherquals), and if we want to skip
mergejoin mark/restore then it must depend only on merge clauses.
To avoid adding more planning overhead than absolutely necessary,
the present patch errs in the conservative direction: there are cases
where inner_unique or skip_mark_restore processing could be used, but
it will not do so because it's not sure that the uniqueness proof
depended only on "safe" clauses. This could be improved later.
David Rowley, reviewed and rather heavily editorialized on by me
Discussion: https://postgr.es/m/CAApHDvqF6Sw-TK98bW48TdtFJ+3a7D2mFyZ7++=D-RyPsL76gw@mail.gmail.com
This is the SQL standard-conforming variant of PostgreSQL's serial
columns. It fixes a few usability issues that serial columns have:
- CREATE TABLE / LIKE copies default but refers to same sequence
- cannot add/drop serialness with ALTER TABLE
- dropping default does not drop sequence
- need to grant separate privileges to sequence
- other slight weirdnesses because serial is some kind of special macro
Reviewed-by: Vitaly Burovoy <vitaly.burovoy@gmail.com>
Currently, the only type of child relation is an "other member rel",
which is the child of a baserel, but in the future joins and even
upper relations may have child rels. To facilitate that, introduce
macros that test to test for particular RelOptKind values, and use
them in various places where they help to clarify the sense of a test.
(For example, a test may allow RELOPT_OTHER_MEMBER_REL either because
it intends to allow child rels, or because it intends to allow simple
rels.)
Also, remove find_childrel_top_parent, which will not work for a
child rel that is not a baserel. Instead, add a new RelOptInfo
member top_parent_relids to track the same kind of information in a
more generic manner.
Ashutosh Bapat, slightly tweaked by me. Review and testing of the
patch set from which this was taken by Rajkumar Raghuwanshi and Rafia
Sabih.
Discussion: http://postgr.es/m/CA+TgmoagTnF2yqR3PT2rv=om=wJiZ4-A+ATwdnriTGku1CLYxA@mail.gmail.com
A QueryEnvironment concept is added, which allows new types of
objects to be passed into queries from parsing on through
execution. At this point, the only thing implemented is a
collection of EphemeralNamedRelation objects -- relations which
can be referenced by name in queries, but do not exist in the
catalogs. The only type of ENR implemented is NamedTuplestore, but
provision is made to add more types fairly easily.
An ENR can carry its own TupleDesc or reference a relation in the
catalogs by relid.
Although these features can be used without SPI, convenience
functions are added to SPI so that ENRs can easily be used by code
run through SPI.
The initial use of all this is going to be transition tables in
AFTER triggers, but that will be added to each PL as a separate
commit.
An incidental effect of this patch is to produce a more informative
error message if an attempt is made to modify the contents of a CTE
from a referencing DML statement. No tests previously covered that
possibility, so one is added.
Kevin Grittner and Thomas Munro
Reviewed by Heikki Linnakangas, David Fetter, and Thomas Munro
with valuable comments and suggestions from many others
Commit 45be99f8cd removed GatherPath's
num_workers field, but this is entirely bogus. Normally, a path's
parallel_workers flag is supposed to indicate the number of workers
that it wants, and should be 0 for a non-partial path. In that
commit, I mistakenly thought that GatherPath could also use that field
to indicate the number of workers that it would try to start, but
that's disastrous, because then it can propagate up to higher nodes in
the plan tree, which will then get incorrect rowcounts because the
parallel_workers flag is involved in computing those values. Repair
by putting the separate field back.
Report by Tomas Vondra. Patch by me, reviewed by Amit Kapila.
Discussion: http://postgr.es/m/f91b4a44-f739-04bd-c4b6-f135bd643669@2ndquadrant.com
copyObject() is declared to return void *, which allows easily assigning
the result independent of the input, but it loses all type checking.
If the compiler supports typeof or something similar, cast the result to
the input type. This creates a greater amount of type safety. In some
cases, where the result is assigned to a generic type such as Node * or
Expr *, new casts are now necessary, but in general casts are now
unnecessary in the normal case and indicate that something unusual is
happening.
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
This extends the Aggregate node with two new features: HashAggregate
can now run multiple hashtables concurrently, and a new strategy
MixedAggregate populates hashtables while doing sorted grouping.
The planner will now attempt to save as many sorts as possible when
planning grouping sets queries, while not exceeding work_mem for the
estimated combined sizes of all hashtables used. No SQL-level changes
are required. There should be no user-visible impact other than the
new EXPLAIN output and possible changes to result ordering when ORDER
BY was not used (which affected a few regression tests). The
enable_hashagg option is respected.
Author: Andrew Gierth
Reviewers: Mark Dilger, Andres Freund
Discussion: https://postgr.es/m/87vatszyhj.fsf@news-spur.riddles.org.uk
This replaces the old, recursive tree-walk based evaluation, with
non-recursive, opcode dispatch based, expression evaluation.
Projection is now implemented as part of expression evaluation.
This both leads to significant performance improvements, and makes
future just-in-time compilation of expressions easier.
The speed gains primarily come from:
- non-recursive implementation reduces stack usage / overhead
- simple sub-expressions are implemented with a single jump, without
function calls
- sharing some state between different sub-expressions
- reduced amount of indirect/hard to predict memory accesses by laying
out operation metadata sequentially; including the avoidance of
nearly all of the previously used linked lists
- more code has been moved to expression initialization, avoiding
constant re-checks at evaluation time
Future just-in-time compilation (JIT) has become easier, as
demonstrated by released patches intended to be merged in a later
release, for primarily two reasons: Firstly, due to a stricter split
between expression initialization and evaluation, less code has to be
handled by the JIT. Secondly, due to the non-recursive nature of the
generated "instructions", less performance-critical code-paths can
easily be shared between interpreted and compiled evaluation.
The new framework allows for significant future optimizations. E.g.:
- basic infrastructure for to later reduce the per executor-startup
overhead of expression evaluation, by caching state in prepared
statements. That'd be helpful in OLTPish scenarios where
initialization overhead is measurable.
- optimizing the generated "code". A number of proposals for potential
work has already been made.
- optimizing the interpreter. Similarly a number of proposals have
been made here too.
The move of logic into the expression initialization step leads to some
backward-incompatible changes:
- Function permission checks are now done during expression
initialization, whereas previously they were done during
execution. In edge cases this can lead to errors being raised that
previously wouldn't have been, e.g. a NULL array being coerced to a
different array type previously didn't perform checks.
- The set of domain constraints to be checked, is now evaluated once
during expression initialization, previously it was re-built
every time a domain check was evaluated. For normal queries this
doesn't change much, but e.g. for plpgsql functions, which caches
ExprStates, the old set could stick around longer. The behavior
around might still change.
Author: Andres Freund, with significant changes by Tom Lane,
changes by Heikki Linnakangas
Reviewed-By: Tom Lane, Heikki Linnakangas
Discussion: https://postgr.es/m/20161206034955.bh33paeralxbtluv@alap3.anarazel.de
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.czhttps://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
Add a column collprovider to pg_collation that determines which library
provides the collation data. The existing choices are default and libc,
and this adds an icu choice, which uses the ICU4C library.
The pg_locale_t type is changed to a union that contains the
provider-specific locale handles. Users of locale information are
changed to look into that struct for the appropriate handle to use.
Also add a collversion column that records the version of the collation
when it is created, and check at run time whether it is still the same.
This detects potentially incompatible library upgrades that can corrupt
indexes and other structures. This is currently only supported by
ICU-provided collations.
initdb initializes the default collation set as before from the `locale
-a` output but also adds all available ICU locales with a "-x-icu"
appended.
Currently, ICU-provided collations can only be explicitly named
collations. The global database locales are still always libc-provided.
ICU support is enabled by configure --with-icu.
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Add functionality for a new subscription to copy the initial data in the
tables and then sync with the ongoing apply process.
For the copying, add a new internal COPY option to have the COPY source
data provided by a callback function. The initial data copy works on
the subscriber by receiving COPY data from the publisher and then
providing it locally into a COPY that writes to the destination table.
A WAL receiver can now execute full SQL commands. This is used here to
obtain information about tables and publications.
Several new options were added to CREATE and ALTER SUBSCRIPTION to
control whether and when initial table syncing happens.
Change pg_dump option --no-create-subscription-slots to
--no-subscription-connect and use the new CREATE SUBSCRIPTION
... NOCONNECT option for that.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Tested-by: Erik Rijkers <er@xs4all.nl>
Partitioned tables do not contain any data; only their unpartitioned
descendents need to be scanned. However, the partitioned tables still
need to be locked, even though they're not scanned. To make that
work, Append and MergeAppend relations now need to carry a list of
(unscanned) partitioned relations that must be locked, and InitPlan
must lock all partitioned result relations.
Aside from the obvious advantage of avoiding some work at execution
time, this has two other advantages. First, it may improve the
planner's decision-making in some cases since the empty relation
might throw things off. Second, it paves the way to getting rid of
the storage for partitioned tables altogether.
Amit Langote, reviewed by me.
Discussion: http://postgr.es/m/6837c359-45c4-8044-34d1-736756335a15@lab.ntt.co.jp
There is still some inconsistency with the error messages surrounding
foreign servers. Some use the word "foreign" and some don't. My
inclination is to remove all such uses of "foreign" on the basis that
the CREATE/ALTER/DROP SERVER commands don't use the word. However, that
is left for another day. In this patch I have kept to the existing usage
in the affected commands, which omits "foreign".
Anastasia Lubennikova, reviewed by Arthur Zakirov and Ashtosh Bapat.
Discussion: http://postgr.es/m/7c2ab9b8-388a-1ce0-23a3-7acf2a0ed3c6@postgrespro.ru
In DDL commands referring to an existing function, allow omitting the
argument list if the function name is unique in its schema, per SQL
standard.
This uses the same logic that the regproc type uses for finding
functions by name only.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
We used to export snapshots unconditionally in CREATE_REPLICATION_SLOT
in the replication protocol, but several upcoming patches want more
control over what happens.
Suppress snapshot export in pg_recvlogical, which neither needs nor can
use the exported snapshot. Since snapshot exporting can fail this
improves reliability.
This also paves the way for allowing the creation of replication slots
on standbys, which cannot export snapshots because they cannot allocate
new XIDs.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Like Gather, we spawn multiple workers and run the same plan in each
one; however, Gather Merge is used when each worker produces the same
output ordering and we want to preserve that output ordering while
merging together the streams of tuples from various workers. (In a
way, Gather Merge is like a hybrid of Gather and MergeAppend.)
This works out to a win if it saves us from having to perform an
expensive Sort. In cases where only a small amount of data would need
to be sorted, it may actually be faster to use a regular Gather node
and then sort the results afterward, because Gather Merge sometimes
needs to wait synchronously for tuples whereas a pure Gather generally
doesn't. But if this avoids an expensive sort then it's a win.
Rushabh Lathia, reviewed and tested by Amit Kapila, Thomas Munro,
and Neha Sharma, and reviewed and revised by me.
Discussion: http://postgr.es/m/CAGPqQf09oPX-cQRpBKS0Gq49Z+m6KBxgxd_p9gX8CKk_d75HoQ@mail.gmail.com
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
XMLTABLE is defined by the SQL/XML standard as a feature that allows
turning XML-formatted data into relational form, so that it can be used
as a <table primary> in the FROM clause of a query.
This new construct provides significant simplicity and performance
benefit for XML data processing; what in a client-side custom
implementation was reported to take 20 minutes can be executed in 400ms
using XMLTABLE. (The same functionality was said to take 10 seconds
using nested PostgreSQL XPath function calls, and 5 seconds using
XMLReader under PL/Python).
The implemented syntax deviates slightly from what the standard
requires. First, the standard indicates that the PASSING clause is
optional and that multiple XML input documents may be given to it; we
make it mandatory and accept a single document only. Second, we don't
currently support a default namespace to be specified.
This implementation relies on a new executor node based on a hardcoded
method table. (Because the grammar is fixed, there is no extensibility
in the current approach; further constructs can be implemented on top of
this such as JSON_TABLE, but they require changes to core code.)
Author: Pavel Stehule, Álvaro Herrera
Extensively reviewed by: Craig Ringer
Discussion: https://postgr.es/m/CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com
When a shared iterator is used, each call to tbm_shared_iterate()
returns a result that has not yet been returned to any process
attached to the shared iterator. In other words, each cooperating
processes gets a disjoint subset of the full result set, but all
results are returned exactly once.
This is infrastructure for parallel bitmap heap scan.
Dilip Kumar. The larger patch set of which this is a part has been
reviewed and tested by (at least) Andres Freund, Amit Khandekar,
Tushar Ahuja, Rafia Sabih, Haribabu Kommi, and Thomas Munro.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
The old function took function name and function argument list as
separate arguments. Now that all function signatures are passed around
as ObjectWithArgs structs, this is no longer necessary and can be
replaced by a function that takes ObjectWithArgs directly. Similarly
for aggregates and operators.
Reviewed-by: Jim Nasby <Jim.Nasby@BlueTreble.com>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
In simpler times, it might have worked to refer to all kinds of objects
by a list of name components and an optional argument list. But this
doesn't work for all objects, which has resulted in a collection of
hacks to place various other nodes types into these fields, which have
to be unpacked at the other end. This makes it also weird to represent
lists of such things in the grammar, because they would have to be lists
of singleton lists, to make the unpacking work consistently. The other
problem is that keeping separate name and args fields makes it awkward
to deal with lists of functions.
Change that by dropping the objargs field and have objname, renamed to
object, be a generic Node, which can then be flexibly assigned and
managed using the normal Node mechanisms. In many cases it will still
be a List of names, in some cases it will be a string Value, for types
it will be the existing Typename, for functions it will now use the
existing ObjectWithArgs node type. Some of the more obscure object
types still use somewhat arbitrary nested lists.
Reviewed-by: Jim Nasby <Jim.Nasby@BlueTreble.com>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
This makes the handling of operators similar to that of functions and
aggregates.
Rename node FuncWithArgs to ObjectWithArgs, to reflect the expanded use.
Reviewed-by: Jim Nasby <Jim.Nasby@BlueTreble.com>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
This makes it consistent with the usage in opclass_item.
Reviewed-by: Jim Nasby <Jim.Nasby@BlueTreble.com>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
The default general purpose aset.c style memory context is not a great
choice for allocations that are all going to be evenly sized,
especially when those objects aren't small, and have varying
lifetimes. There tends to be a lot of fragmentation, larger
allocations always directly go to libc rather than have their cost
amortized over several pallocs.
These problems lead to the introduction of ad-hoc slab allocators in
reorderbuffer.c. But it turns out that the simplistic implementation
leads to problems when a lot of objects are allocated and freed, as
aset.c is still the underlying implementation. Especially freeing can
easily run into O(n^2) behavior in aset.c.
While the O(n^2) behavior in aset.c can, and probably will, be
addressed, custom allocators for this behavior are more efficient
both in space and time.
This allocator is for evenly sized allocations, and supports both
cheap allocations and freeing, without fragmenting significantly. It
does so by allocating evenly sized blocks via malloc(), and carves
them into chunks that can be used for allocations. In order to
release blocks to the OS as early as possible, chunks are allocated
from the fullest block that still has free objects, increasing the
likelihood of a block being entirely unused.
A subsequent commit uses this in reorderbuffer.c, but a further
allocator is needed to resolve the performance problems triggering
this work.
There likely are further potentialy uses of this allocator besides
reorderbuffer.c.
There's potential further optimizations of the new slab.c, in
particular the array of freelists could be replaced by a more
intelligent structure - but for now this looks more than good enough.
Author: Tomas Vondra, editorialized by Andres Freund
Reviewed-By: Andres Freund, Petr Jelinek, Robert Haas, Jim Nasby
Discussion: https://postgr.es/m/d15dff83-0b37-28ed-0809-95a5cc7292ad@2ndquadrant.com
This is expected to be useful mostly when performing such scans in
parallel, because in that case it allows (in combination with commit
acf555bc53) nodes below a Gather to get
control just before the DSM segment goes away.
KaiGai Kohei, except that I rewrote the documentation. Reviewed by
Claudio Freire.
Discussion: http://postgr.es/m/CADyhKSXJK0jUJ8rWv4AmKDhsUh124_rEn39eqgfC5D8fu6xVuw@mail.gmail.com
With this change, you can see the query that a parallel worker is
executing in pg_stat_activity, and if the worker crashes you can
see what query it was executing when it crashed.
Rafia Sabih, reviewed by Kuntal Ghosh and Amit Kapila and slightly
revised by me.
Commit 5262f7a4fc added similar support
for parallel index scans; this extends that work to index-only scans.
As with parallel index scans, this requires support from the index AM,
so currently parallel index-only scans will only be possible for btree
indexes.
Rafia Sabih, reviewed and tested by Rahila Syed, Tushar Ahuja,
and Amit Kapila
Discussion: http://postgr.es/m/CAOGQiiPEAs4C=TBp0XShxBvnWXuzGL2u++Hm1=qnCpd6_Mf8Fw@mail.gmail.com
In combination with 569174f1be, which
taught the btree AM how to perform parallel index scans, this allows
parallel index scan plans on btree indexes. This infrastructure
should be general enough to support parallel index scans for other
index AMs as well, if someone updates them to support parallel
scans.
Amit Kapila, reviewed and tested by Anastasia Lubennikova, Tushar
Ahuja, and Haribabu Kommi, and me.
The core of the functionality was already implemented when
pg_import_system_collations was added. This just exposes it as an
option in the SQL command.
This doesn't do anything to make Param nodes anything other than
parallel-restricted, so this only helps with uncorrelated subplans,
and it's not necessarily very cheap because each worker will run the
subplan separately (just as a Hash Join will build a separate copy of
the hash table in each participating process), but it's a first step
toward supporting cases that are more likely to help in practice, and
is occasionally useful on its own.
Amit Kapila, reviewed and tested by Rafia Sabih, Dilip Kumar, and
me.
Discussion: http://postgr.es/m/CAA4eK1+e8Z45D2n+rnDMDYsVEb5iW7jqaCH_tvPMYau=1Rru9w@mail.gmail.com
It's always been possible for index AMs to cache data across successive
amgettuple calls within a single SQL command: the IndexScanDesc.opaque
field is meant for precisely that. However, no comparable facility
exists for amortizing setup work across successive aminsert calls.
This patch adds such a feature and teaches GIN, GIST, and BRIN to use it
to amortize catalog lookups they'd previously been doing on every call.
(The other standard index AMs keep everything they need in the relcache,
so there's little to improve there.)
For GIN, the overall improvement in a statement that inserts many rows
can be as much as 10%, though it seems a bit less for the other two.
In addition, this makes a really significant difference in runtime
for CLOBBER_CACHE_ALWAYS tests, since in those builds the repeated
catalog lookups are vastly more expensive.
The reason this has been hard up to now is that the aminsert function is
not passed any useful place to cache per-statement data. What I chose to
do is to add suitable fields to struct IndexInfo and pass that to aminsert.
That's not widening the index AM API very much because IndexInfo is already
within the ken of ambuild; in fact, by passing the same info to aminsert
as to ambuild, this is really removing an inconsistency in the AM API.
Discussion: https://postgr.es/m/27568.1486508680@sss.pgh.pa.us
The new function allows to cast from one NodeTag based type to
another, while asserting that the conversion is valid. This replaces
the common pattern of doing a cast and a Assert(IsA(ptr, type))
close-by.
As this seems likely to be used pervasively, we decided to backpatch
this change the addition of this macro. Otherwise backpatched fixes
are more likely not to work on back-branches.
On branches before 9.6, where we do not yet rely on inline functions
being available, the type assertion is only performed if PG_USE_INLINE
support is detected. The cast obviously is performed regardless.
For the benefit of verifying the macro compiles in the back-branches,
this commit contains a single use of the new macro. On master, a
somewhat larger conversion will be committed separately.
Author: Peter Eisentraut and Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/c5d387d9-3440-f5e0-f9d4-71d53b9fbe52@2ndquadrant.com
Backpatch: 9.2-
We've accumulated quite a bit of stuff with which pgindent is not
quite happy in this code; clean it up to provide a less-annoying base
for future pgindent runs.