I removed the duplicate command tags for START_REPLICATION inadvertently
in commit 07082b08cc, but the replication protocol requires them. The
fact that the replication protocol was broken was not noticed because
all our test cases use an optimized code path that exits early, failing
to verify that the behavior is correct for non-optimized cases. Put
them back.
Also document this protocol quirk.
Add a test case that shows the failure. It might still succeed even
without the patch when run on a fast enough server, but it suffices to
show the bug in enough cases that it would be noticed in buildfarm.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Henry Hinze <henry.hinze@gmail.com>
Reviewed-by: Petr Jelínek <petr.jelinek@2ndquadrant.com>
Discussion: https://postgr.es/m/16643-eaadeb2a1a58d28c@postgresql.org
This covers the functionality tests for streaming in-progress
subtransactions, streaming transactions containing rollback to savepoints,
and streaming transactions having DDLs.
Author: Tomas Vondra, Amit Kapila and Dilip Kumar
Reviewed-by: Dilip Kumar
Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
To add support for streaming of in-progress transactions into the
built-in logical replication, we need to do three things:
* Extend the logical replication protocol, so identify in-progress
transactions, and allow adding additional bits of information (e.g.
XID of subtransactions).
* Modify the output plugin (pgoutput) to implement the new stream
API callbacks, by leveraging the extended replication protocol.
* Modify the replication apply worker, to properly handle streamed
in-progress transaction by spilling the data to disk and then
replaying them on commit.
We however must explicitly disable streaming replication during
replication slot creation, even if the plugin supports it. We
don't need to replicate the changes accumulated during this phase,
and moreover we don't have a replication connection open so we
don't have where to send the data anyway.
Author: Tomas Vondra, Dilip Kumar and Amit Kapila
Reviewed-by: Amit Kapila, Kuntal Ghosh and Ajin Cherian
Tested-by: Neha Sharma, Mahendra Singh Thalor and Ajin Cherian
Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
This is like CVE-2018-1058 commit
582edc369c. Today, a malicious user of a
publisher or subscriber database can invoke arbitrary SQL functions
under an identity running replication, often a superuser. This fix may
cause "does not exist" or "no schema has been selected to create in"
errors in a replication process. After upgrading, consider watching
server logs for these errors. Objects accruing schema qualification in
the wake of the earlier commit are unlikely to need further correction.
Back-patch to v10, which introduced logical replication.
Security: CVE-2020-14349
Commit b9c130a1f failed to apply the publisher-to-subscriber column
mapping while checking which columns were updated. Perhaps less
significantly, it didn't exclude dropped columns either. This could
result in an incorrect updated-columns bitmap and thus wrong decisions
about whether to fire column-specific triggers on the subscriber while
applying updates. In HEAD (since commit 9de77b545), it could also
result in accesses off the end of the colstatus array, as detected by
buildfarm member skink. Fix the logic, and adjust 003_constraints.pl
so that the problem is exposed in unpatched code.
In HEAD, also add some assertions to check that we don't access off
the ends of these newly variable-sized arrays.
Back-patch to v10, as b9c130a1f was.
Discussion: https://postgr.es/m/CAH2-Wz=79hKQ4++c5A060RYbjTHgiYTHz=fw6mptCtgghH2gJA@mail.gmail.com
This patch adds a "binary" option to CREATE/ALTER SUBSCRIPTION.
When that's set, the publisher will send data using the data type's
typsend function if any, rather than typoutput. This is generally
faster, if slightly less robust.
As committed, we won't try to transfer user-defined array or composite
types in binary, for fear that type OIDs won't match at the subscriber.
This might be changed later, but it seems like fit material for a
follow-on patch.
Dave Cramer, reviewed by Daniel Gustafsson, Petr Jelinek, and others;
adjusted some by me
Discussion: https://postgr.es/m/CADK3HH+R3xMn=8t3Ct+uD+qJ1KD=Hbif5NFMJ+d5DkoCzp6Vgw@mail.gmail.com
In a logical replication subscriber, a table using REPLICA IDENTITY FULL
which has a primary key would try to use the primary key's index
available to scan for a tuple, but an assertion only assumed as correct
the case of an index associated to REPLICA IDENTITY USING INDEX. This
commit corrects the assertion so as the use of a primary key index is a
valid case.
Reported-by: Dilip Kumar
Analyzed-by: Dilip Kumar
Author: Euler Taveira
Reviewed-by: Michael Paquier, Masahiko Sawada
Discussion: https://postgr.es/m/CAFiTN-u64S5bUiPL1q5kwpHNd0hRnf1OE-bzxNiOs5zo84i51w@mail.gmail.com
Backpatch-through: 10
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.
Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
To control whether partition changes are replicated using their own
identity and schema or an ancestor's, add a new parameter that can be
set per publication named 'publish_via_partition_root'.
This allows replicating a partitioned table into a different partition
structure on the subscriber.
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Petr Jelinek <petr@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com
Mainly, this adds support code in logical/worker.c for applying
replicated operations whose target is a partitioned table to its
relevant partitions.
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Petr Jelinek <petr@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com
When a partitioned table is added to a publication, changes of all of
its partitions (current or future) are published via that publication.
This change only affects which tables a publication considers as its
members. The receiving side still sees the data coming from the
individual leaf partitions. So existing restrictions that partition
hierarchies can only be replicated one-to-one are not changed by this.
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com
On the publisher, it was assumed that an INSERT change cannot happen for
a relation with no replica identity. However this is true only for a
change that needs references to old rows, aka UPDATE or DELETE, so
trying to use logical replication with a relation that has no replica
identity led to an assertion failure in the publisher when issuing an
INSERT. This commit removes the incorrect assertion, and adds more
regression tests to provide coverage for relations without replica
identity.
Reported-by: Neha Sharma
Author: Dilip Kumar, Michael Paquier
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CANiYTQsL1Hb8_Km08qd32svrqNumXLJeoGo014O7VZymgOhZEA@mail.gmail.com
Backpatch-through: 10
It seems pretty unacceptable to have no regression test coverage
for this aspect of the logical replication protocol, especially
given the bugs we've found in related code.
Discussion: https://postgr.es/m/16129-a0c0f48e71741e5f@postgresql.org
slot_modify_cstrings seriously abused the TupleTableSlot API by relying
on a slot's underlying data to stay valid across ExecClearTuple. Since
this abuse was also quite undocumented, it's little surprise that the
case got broken during the v12 slot rewrites. As reported in bug #16129
from Ondřej Jirman, this could lead to crashes or data corruption when
a logical replication subscriber processes a row update. Problems would
only arise if the subscriber's table contained columns of pass-by-ref
types that were not being copied from the publisher.
Fix by explicitly copying the datum/isnull arrays from the source slot
that the old row was in already. This ends up being about the same
thing that happened pre-v12, but hopefully in a less opaque and
fragile way.
We might've caught the problem sooner if there were any test cases
dealing with updates involving non-replicated or dropped columns.
Now there are.
Back-patch to v10 where this code came in. Even though the failure
does not manifest before v12, IMO this code is too fragile to leave
as-is. In any case we certainly want the additional test coverage.
Patch by me; thanks to Tomas Vondra for initial investigation.
Discussion: https://postgr.es/m/16129-a0c0f48e71741e5f@postgresql.org
This happens when we add a replica identity column on a subscriber
that does not yet exist on the publisher, according to the mapping
maintained by the subscriber. Code that checks whether the target
relation on the subscriber is updatable would check the replica
identity attribute bitmap with a column number -1, which would result
in an error. To fix, skip such columns in the bitmap lookup and
consider the relation not updatable. The result is consistent with
the rule that the replica identity columns on the subscriber must be a
subset of those on the publisher, since if the column doesn't exist on
the publisher, the column set on the subscriber can't be a subset.
Reported-by: Tim Clarke <tim.clarke@minerva.info>
Analyzed-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Discussion: https://www.postgresql.org/message-id/flat/a9139c29-7ddd-973b-aa7f-71fed9c38d75%40minerva.info
If a FOR ALL TABLES publication exists, temporary and unlogged tables
are ignored for publishing changes. But CheckCmdReplicaIdentity()
would still check in that case that such a table has a replica
identity set before accepting updates. To fix, have
GetRelationPublicationActions() return that such a table publishes no
actions.
Discussion: https://www.postgresql.org/message-id/f3f151f7-c4dd-1646-b998-f60bd6217dd3@2ndquadrant.com
This is an SQL-standard feature that allows creating columns that are
computed from expressions rather than assigned, similar to a view or
materialized view but on a column basis.
This implements one kind of generated column: stored (computed on
write). Another kind, virtual (computed on read), is planned for the
future, and some room is left for it.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com
This adds a flag "deterministic" to collations. If that is false,
such a collation disables various optimizations that assume that
strings are equal only if they are byte-wise equal. That then allows
use cases such as case-insensitive or accent-insensitive comparisons
or handling of strings with different Unicode normal forms.
This functionality is only supported with the ICU provider. At least
glibc doesn't appear to have any locales that work in a
nondeterministic way, so it's not worth supporting this for the libc
provider.
The term "deterministic comparison" in this context is from Unicode
Technical Standard #10
(https://unicode.org/reports/tr10/#Deterministic_Comparison).
This patch makes changes in three areas:
- CREATE COLLATION DDL changes and system catalog changes to support
this new flag.
- Many executor nodes and auxiliary code are extended to track
collations. Previously, this code would just throw away collation
information, because the eventually-called user-defined functions
didn't use it since they only cared about equality, which didn't
need collation information.
- String data type functions that do equality comparisons and hashing
are changed to take the (non-)deterministic flag into account. For
comparison, this just means skipping various shortcuts and tie
breakers that use byte-wise comparison. For hashing, we first need
to convert the input string to a canonical "sort key" using the ICU
analogue of strxfrm().
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
When libpq is loaded in the server (for instance, by
libpqwalreceiver), it may use libpq environment variables set in the
postmaster environment for connection parameter defaults. This has
some confusing effects in our test suites. For example, the TAP test
infrastructure sets PGAPPNAME to allow identifying clients in the
server log. But this environment variable is also inherited by
temporary servers started with pg_ctl and is then in turn used by
libpqwalreceiver as the application_name for connecting to remote
servers where it then shows up in pg_stat_replication and is relevant
for things like synchronous_standby_names. Replication already has a
suitable default for application_name, and overriding that
accidentally then requires the individual test cases to re-override
that, which is all very confusing and unnecessary.
To fix, unset PGAPPNAME temporarily before running pg_ctl start or
restart in the tests.
More comprehensive approaches like unsetting all environment variables
in pg_ctl were considered but might be too complicated to achieve
portably.
The now unnecessary re-overriding of application_name by test cases is
also removed.
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://www.postgresql.org/message-id/flat/33383613-690e-6f1b-d5ba-4957ff40f6ce@2ndquadrant.com
The bug was that determining which columns are part of the replica
identity index using RelationGetIndexAttrBitmap() would run
eval_const_expressions() on index expressions and predicates across
all indexes of the table, which in turn might require a snapshot, but
there wasn't one set, so it crashes. There were actually two separate
bugs, one on the publisher and one on the subscriber.
To trigger the bug, a table that is part of a publication or
subscription needs to have an index with a predicate or expression
that lends itself to constant expressions simplification.
The fix is to avoid the constant expressions simplification in
RelationGetIndexAttrBitmap(), so that it becomes safe to call in these
contexts. The constant expressions simplification comes from the
calls to RelationGetIndexExpressions()/RelationGetIndexPredicate() via
BuildIndexInfo(). But RelationGetIndexAttrBitmap() calling
BuildIndexInfo() is overkill. The latter just takes pg_index catalog
information, packs it into the IndexInfo structure, which former then
just unpacks again and throws away. We can just do this directly with
less overhead and skip the troublesome calls to
eval_const_expressions(). This also removes the awkward
cross-dependency between relcache.c and index.c.
Bug: #15114
Reported-by: Петър Славов <pet.slavov@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/152110589574.1223.17983600132321618383@wrigleys.postgresql.org/
WAL senders sending logically-decoded data fail to properly report in
"streaming" state when starting up, hence as long as one extra record is
not replayed, such WAL senders would remain in a "catchup" state, which
is inconsistent with the physical cousin.
This can be easily reproduced by for example using pg_recvlogical and
restarting the upstream server. The TAP tests have been slightly
modified to detect the failure and strengthened so as future tests also
make sure that a node is in streaming state when waiting for its
catchup.
Backpatch down to 9.4 where this code has been introduced.
Reported-by: Sawada Masahiko
Author: Simon Riggs, Sawada Masahiko
Reviewed-by: Petr Jelinek, Michael Paquier, Vaishnavi Prabakaran
Discussion: https://postgr.es/m/CAD21AoB2ZbCCqOx=bgKMcLrAvs1V0ZMqzs7wBTuDySezTGtMZA@mail.gmail.com
Explain the difference between "make check" and "make installcheck".
Mention the need for --enable-tap-tests (only some of these did so
before). Standardize their wording about how to run the tests.
This patch introduces INCLUDE clause to index definition. This clause
specifies a list of columns which will be included as a non-key part in
the index. The INCLUDE columns exist solely to allow more queries to
benefit from index-only scans. Also, such columns don't need to have
appropriate operator classes. Expressions are not supported as INCLUDE
columns since they cannot be used in index-only scans.
Index access methods supporting INCLUDE are indicated by amcaninclude flag
in IndexAmRoutine. For now, only B-tree indexes support INCLUDE clause.
In B-tree indexes INCLUDE columns are truncated from pivot index tuples
(tuples located in non-leaf pages and high keys). Therefore, B-tree indexes
now might have variable number of attributes. This patch also provides
generic facility to support that: pivot tuples contain number of their
attributes in t_tid.ip_posid. Free 13th bit of t_info is used for indicating
that. This facility will simplify further support of index suffix truncation.
The changes of above are backward-compatible, pg_upgrade doesn't need special
handling of B-tree indexes for that.
Bump catalog version
Author: Anastasia Lubennikova with contribition by Alexander Korotkov and me
Reviewed by: Peter Geoghegan, Tomas Vondra, Antonin Houska, Jeff Janes,
David Rowley, Alexander Korotkov
Discussion: https://www.postgresql.org/message-id/flat/56168952.4010101@postgrespro.ru
We need to wait for the initial sync of all subscriptions. On
some (faster?) machines, this didn't make a difference, but
the (slower?) buildfarm machines are upset.
Update the built-in logical replication system to make use of the
previously added logical decoding for TRUNCATE support. Add the
required truncate callback to pgoutput and a new logical replication
protocol message.
Publications get a new attribute to determine whether to replicate
truncate actions. When updating a publication via pg_dump from an older
version, this is not set, thus preserving the previous behavior.
Author: Simon Riggs <simon@2ndquadrant.com>
Author: Marco Nenciarini <marco.nenciarini@2ndquadrant.it>
Author: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
In the pgoutput plugin, skip changes for relations that are not
publishable, per is_publishable_class(). This concerns in particular
materialized views and information_schema tables. While those relations
cannot be part of a publication, per existing checks, they will be
considered by a FOR ALL TABLES publication. A subscription would not
actually apply changes for those relations, again per existing checks,
but trying to match incoming changes to local tables on the subscriber
would lead to errors if no matching local table exists. Skipping those
changes on the publisher avoids sending useless changes and eliminates
the error.
Bug: #15044
Reported-by: Chad Trabant <chad@iris.washington.edu>
Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Investigation of 2d2d06b7e2 revealed that
identity values were not applied in some further cases, including
logical replication subscribers, VALUES RTEs, and ALTER TABLE ... ADD
COLUMN. To fix all that, apply the identity column expression in
build_column_default() instead of repeating the same logic at each call
site.
For ALTER TABLE ... ADD COLUMN ... IDENTITY, the previous coding
completely ignored that existing rows for the new column should have
values filled in from the identity sequence. The coding using
build_column_default() fails for this because the sequence ownership
isn't registered until after ALTER TABLE, and we can't do it before
because we don't have the column in the catalog yet. So we specially
remember in ColumnDef the sequence name that we decided on and build a
custom NextValueExpr using that.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
This was nearly the same code. Extend wait_for_catchup to allow waiting
for pg_current_wal_lsn() and use that in the subscription tests. Also
change one use in the pg_rewind tests to use this.
Also remove some broken code in wait_for_catchup and
wait_for_slot_catchup. The error message in case the waiting failed
wanted to show the current LSN, but the way it was written never
worked. So since nobody ever cared, just remove it.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
When a publisher table has fewer columns than a subscriber, the update
of a row on the publisher should result in updating of only the columns
in common. The previous coding mistakenly reset the values of
additional columns on the subscriber to NULL because it failed to skip
updates of columns not found in the attribute map.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
A FOR ALL TABLES publication naturally considers all base tables to be a
candidate for replication. This includes transient heaps that are
created during a table rewrite during DDL. This causes failures on the
subscriber side because it will not have a table like pg_temp_16386 to
receive data (and if it did, it would be the wrong table).
The prevent this problem, we filter out any tables that match this
naming pattern and match an actual table from FOR ALL TABLES
publications. This is only a heuristic, meaning that user tables that
match that naming could accidentally be omitted. A more robust solution
might require an explicit marking of such tables in pg_class somehow.
Reported-by: yxq <yxq@o2.pl>
Bug: #14785
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com>
005_encoding.pl neglected to wait for the subscriber's initial
synchronization to happen. While we have not seen this fail in
the buildfarm, it's pretty easy to demonstrate there's an issue
by hacking logicalrep_worker_launch() to fail most of the time.
Michael Paquier
Discussion: https://postgr.es/m/27032.1505749806@sss.pgh.pa.us
When ALTER SUBSCRIPTION DISABLE is run in the same transaction before
DROP SUBSCRIPTION, the latter will hang because workers will still be
running, not having seen the DISABLE committed, and DROP SUBSCRIPTION
will wait until the workers have vacated the replication origin slots.
Previously, DROP SUBSCRIPTION killed the logical replication workers
immediately only if it was going to drop the replication slot, otherwise
it scheduled the worker killing for the end of the transaction, as a
result of 7e174fa793. This, however,
causes the present problem. To fix, kill the workers immediately in all
cases. This covers all cases: A subscription that doesn't have a
replication slot must be disabled. It was either disabled in the same
transaction, or it was already disabled before the current transaction,
but then there shouldn't be any workers left and this won't make a
difference.
Reported-by: Arseny Sher <a.sher@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/87mv6av84w.fsf%40ars-thinkpad