As one might guess, this function dumps the sequence data. It is
called once per sequence, and each such call executes a query to
retrieve the relevant data for a single sequence. This can cause
pg_dump to take significantly longer, especially when there are
many sequences.
This commit improves the performance of this function by gathering
all the sequence data with a single query at the beginning of
pg_dump. This information is stored in a sorted array that
dumpSequenceData() can bsearch() for what it needs. This follows an
approach similar to that of previous commits, which introduced sorted
arrays for role information, pg_class information, and sequence
metadata.
As with those commits, this patch will cause pg_dump to use more
memory, but that isn't expected to be too egregious.
Note that we use the brand new function pg_sequence_read_tuple() in
the query that gathers all sequence data, so we must continue to
use the preexisting query-per-sequence approach for versions older
than 18.
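A minimal standalone sketch of the sorted-array-plus-bsearch() pattern
described above (struct and field names are illustrative, not the
actual pg_dump code):

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical row of sequence data fetched by the single upfront query. */
    typedef struct SequenceDataItem
    {
        unsigned int oid;           /* sequence OID; the sort/search key */
        long long    last_value;
        int          is_called;
    } SequenceDataItem;

    static int
    seq_item_cmp(const void *a, const void *b)
    {
        unsigned int x = ((const SequenceDataItem *) a)->oid;
        unsigned int y = ((const SequenceDataItem *) b)->oid;

        return (x > y) - (x < y);
    }

    int
    main(void)
    {
        /* Pretend these rows came back from one query at pg_dump startup. */
        SequenceDataItem items[] = {{642, 17, 1}, {58, 1, 0}, {901, 4096, 1}};
        size_t      nitems = sizeof(items) / sizeof(items[0]);
        SequenceDataItem key = {.oid = 642};
        SequenceDataItem *hit;

        qsort(items, nitems, sizeof(SequenceDataItem), seq_item_cmp);

        /* Each per-sequence dump call is now an O(log n) lookup. */
        hit = bsearch(&key, items, nitems, sizeof(SequenceDataItem),
                      seq_item_cmp);
        if (hit)
            printf("oid=%u last_value=%lld is_called=%d\n",
                   hit->oid, hit->last_value, hit->is_called);
        return 0;
    }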
Reviewed-by: Euler Taveira, Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/20240503025140.GA1227404%40nathanxps13
This function dumps the sequence definitions. It is called once
per sequence, and each such call executes a query to retrieve the
metadata for a single sequence. This can cause pg_dump to take
significantly longer, especially when there are many sequences.
This commit improves the performance of this function by gathering
all the sequence metadata with a single query at the beginning of
pg_dump. This information is stored in a sorted array that
dumpSequence() can bsearch() for what it needs. This follows an
approach similar to that of commits d5e8930f50 and 2329cad1b9, which
introduced sorted arrays for role information and pg_class
information, respectively. As with those commits, this patch will
cause pg_dump to use more memory, but that isn't expected to be too
egregious.
Note that before version 10, the sequence metadata was stored in
the sequence relation itself, which makes it difficult to gather
all the sequence metadata with a single query. For those older
versions, we continue to use the preexisting query-per-sequence
approach.
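For illustration, the one-query-up-front half of the pattern might
look like this with libpq (a hedged sketch; the query and columns used
by pg_dump differ):

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Fetch metadata for every sequence in one round trip. */
    static void
    collect_sequence_metadata(PGconn *conn)
    {
        PGresult   *res = PQexec(conn,
                                 "SELECT seqrelid, seqstart, seqincrement, "
                                 "seqmax, seqmin, seqcache, seqcycle "
                                 "FROM pg_catalog.pg_sequence "
                                 "ORDER BY seqrelid");

        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
            PQclear(res);
            return;
        }
        /* ORDER BY yields a sorted array for later bsearch() lookups. */
        for (int i = 0; i < PQntuples(res); i++)
        {
            /* ... parse and stash each row into the sorted array ... */
        }
        PQclear(res);
    }

    int
    main(void)
    {
        PGconn *conn = PQconnectdb("");     /* parameters from environment */

        if (PQstatus(conn) == CONNECTION_OK)
            collect_sequence_metadata(conn);
        PQfinish(conn);
        return 0;
    }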
Reviewed-by: Euler Taveira
Discussion: https://postgr.es/m/20240503025140.GA1227404%40nathanxps13
This commit modifies dumpSequence() to parse all the sequence
metadata into the appropriate types instead of carting around
string pointers to the PGresult data. Besides allowing us to free
the PGresult storage earlier in the function, this eliminates the
need to compare min_value and max_value to their respective
defaults as strings.
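A small sketch of the difference (the value and default here are
illustrative):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int
    main(void)
    {
        const char *min_value_str = "1";    /* as it sits in the PGresult */

        /* Before: compare against the default as a string. */
        int         before = (strcmp(min_value_str, "1") == 0);

        /*
         * After: parse once into a proper type, then compare numerically.
         * The PGresult can be freed as soon as parsing is done.
         */
        long long   min_value = strtoll(min_value_str, NULL, 10);
        int         after = (min_value == 1);

        printf("%d %d\n", before, after);
        return 0;
    }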
This is preparatory work for a follow-up commit that will improve
the performance of dumpSequence() in a similar manner to how commit
2329cad1b9 optimized binary_upgrade_set_pg_class_oids().
Reviewed-by: Euler Taveira
Discussion: https://postgr.es/m/20240503025140.GA1227404%40nathanxps13
The problem is that the tool is using the LSN returned by
pg_create_logical_replication_slot() as recovery_target_lsn. This LSN is
ahead of the current WAL position, so recovery waits until the publisher
writes a WAL record that reaches the target, at which point recovery ends.
On idle systems, this wait time is unpredictable and could lead to failure
in promoting the subscriber. To avoid that, insert a harmless WAL record.
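One existing server function that writes such a record is
pg_log_standby_snapshot(); whether the tool calls exactly that is an
assumption of this sketch:

    #include <libpq-fe.h>

    /*
     * Sketch: force the publisher to write a small WAL record so the
     * recovery target LSN can be reached even on an idle system.
     * Assumes a server version providing pg_log_standby_snapshot().
     */
    static void
    emit_harmless_wal_record(PGconn *conn)
    {
        PGresult   *res = PQexec(conn, "SELECT pg_log_standby_snapshot()");

        PQclear(res);           /* the returned LSN is not needed here */
    }

    int
    main(void)
    {
        PGconn *conn = PQconnectdb("");     /* parameters from environment */

        if (PQstatus(conn) == CONNECTION_OK)
            emit_harmless_wal_record(conn);
        PQfinish(conn);
        return 0;
    }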
Reported-by: Alexander Lakhin and Tom Lane
Diagnosed-by: Hayato Kuroda
Author: Euler Taveira
Reviewed-by: Hayato Kuroda, Amit Kapila
Backpatch-through: 17
Discussion: https://postgr.es/m/2377319.1719766794%40sss.pgh.pa.us
Discussion: https://postgr.es/m/CA+TgmoYcY+Wb67NAwaHT7MvxCSeV86oSc+va9hHKaasE42ukyw@mail.gmail.com
The initial implementation in commit 959b38d77 counted one action
per TOC entry (except for some special cases for multi-blob BLOBS
entries). This assumes that TOC entries are all about equally
complex, but it turns out that that assumption doesn't hold up very
well in binary-upgrade mode. For example, even after the previous
commit I was able to cause backend bloat with tables having many
inherited constraints. There may be other cases too. (Since no
serious problems have been reported with --single-transaction mode,
we can conclude that the backend copes well with psql's regular
restore scripts; but before 959b38d77 we never ran binary-upgrade
restores with multi-command transactions.)
To fix, count multi-command TOC entries as N actions, allowing the
transaction size to be scaled down when we hit a complex TOC entry.
Rather than add a SQL parser to pg_restore, approximate "multi
command" by counting semicolons in the TOC entry's defn string.
This will be fooled by semicolons appearing in string literals ---
but the error is in the conservative direction, so it doesn't seem
worth working harder. The biggest risk is with function/procedure
TOC entries, but we can just explicitly skip those.
(This is undoubtedly a hack, and maybe someday we'll be able to
revert it after fixing the backend's bloat issues or rethinking
what pg_dump emits in binary upgrade mode. But that surely isn't
a project for v17.)
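A standalone sketch of the semicolon-counting heuristic (not the
pg_restore code itself):

    #include <stdio.h>

    /* Count semicolons as a cheap proxy for the number of SQL commands. */
    static int
    count_actions(const char *defn)
    {
        int         n = 0;

        for (const char *p = defn; *p; p++)
            if (*p == ';')
                n++;
        return (n > 0) ? n : 1; /* every TOC entry is at least one action */
    }

    int
    main(void)
    {
        /* The semicolon inside the literal is overcounted -- conservatively. */
        printf("%d\n", count_actions(
            "ALTER TABLE t ADD CONSTRAINT c CHECK (x <> ';'); "
            "ALTER TABLE t OWNER TO alice;"));
        return 0;
    }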
Thanks to Alexander Korotkov for the let's-count-semicolons idea.
Per report from Justin Pryzby. Back-patch to v17 where txn_size mode
was introduced.
Discussion: https://postgr.es/m/ZqEND4ZcTDBmcv31@pryzbyj2023
Avoid issuing a separate SQL UPDATE command for each column when
directly manipulating pg_attribute contents in binary upgrade mode.
With the separate updates, we triggered a relcache invalidation with
each update. For a table with N columns, that causes O(N^2) relcache
bloat in txn_size mode because the table's newly-created relcache
entry can't be flushed till end of transaction. Reducing the number
of commands should make it marginally faster as well as avoid that
problem.
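A hedged sketch of collapsing the per-column updates into one command
(the column list and values here are illustrative, not what pg_dump
emits):

    #include <stdio.h>

    int
    main(void)
    {
        /*
         * Before: one UPDATE per affected column, each triggering a
         * relcache invalidation.  After: a single UPDATE covering all
         * affected rows, so only one invalidation for the whole table.
         */
        char        sql[512];

        snprintf(sql, sizeof(sql),
                 "UPDATE pg_catalog.pg_attribute "
                 "SET attisdropped = 't', attlen = %d, attalign = '%c' "
                 "WHERE attrelid = %u AND attnum IN (%s);",
                 -1, 'i', 16384u, "2, 5, 7");
        puts(sql);
        return 0;
    }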
While at it, likewise avoid issuing a separate UPDATE on pg_constraint
for each inherited constraint. This is less exciting, first because
inherited (non-partitioned) constraints are relatively rare, and
second because the backend has a good deal of trouble anyway with
restoring tables containing many such constraints, due to
MergeConstraintsIntoExisting being horribly inefficient. But it seems
more consistent to do it this way here too, and it surely can't hurt.
In passing, fix one place in dumpTableSchema that failed to use ONLY
in ALTER TABLE. That's not a live bug, but it's inconsistent.
Also avoid silently casting away const from string literals.
Per report from Justin Pryzby. Back-patch to v17 where txn_size mode
was introduced.
Discussion: https://postgr.es/m/ZqEND4ZcTDBmcv31@pryzbyj2023
When a standby is promoted, CleanupAfterArchiveRecovery() may decide
to rename the final WAL file from the old timeline by adding ".partial"
to the name. If WAL summarization is enabled and this file is renamed
before its partial contents are summarized, WAL summarization breaks:
the summarizer gets stuck at that point in the WAL stream and just
errors out.
To fix that, first make the startup process wait for WAL summarization
to catch up before renaming the file. Generally, this should be quick,
and if it's not, the user can shut off summarize_wal and try again.
To make this fix work, also teach the WAL summarizer that after a
promotion has occurred, no more WAL can appear on the previous
timeline: previously, the WAL summarizer wouldn't switch to the new
timeline until we actually started writing WAL there, but that meant
that when the startup process was waiting for the WAL summarizer, it
was waiting for an action that the summarizer wasn't yet prepared to
take.
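A hypothetical sketch of that wait (the helper names here are invented
for illustration, not the real coordination primitives):

    #include <stdint.h>
    #include <unistd.h>

    /* Stub standing in for the summarizer's progress report. */
    static uint64_t
    get_summarized_lsn(void)
    {
        static uint64_t lsn = 0;

        return lsn += 4096;     /* pretend progress is being made */
    }

    /*
     * The startup process must not rename the final old-timeline WAL
     * file to ".partial" until the summarizer has consumed everything
     * up to end_of_wal.
     */
    static void
    wait_for_wal_summarization(uint64_t end_of_wal)
    {
        while (get_summarized_lsn() < end_of_wal)
            usleep(100 * 1000);     /* 100ms between checks */
    }

    int
    main(void)
    {
        wait_for_wal_summarization(65536);
        return 0;
    }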
In the process of fixing these bugs, I realized that the logic to wait
for WAL summarization to catch up was spread out in a way that made
it difficult to reuse properly, so this commit refactors things to make
it easier.
Finally, add a test case that would have caught this bug and the
previously-fixed bug that WAL summarization sometimes needs to back up
when the timeline changes.
Discussion: https://postgr.es/m/CA+TgmoZGEsZodXC4f=XZNkAeyuDmWTSkpkjCEOcF19Am0mt_OA@mail.gmail.com
At the moment, pg_upgrade stores whether it is doing a "live check"
(i.e., the user specified --check and the old server is still
running) in a local variable scoped to main(). This live_check
variable is passed to several functions. To further complicate
matters, a few call sites provide a hard-coded "false" as the
live_check argument. Specifically, this is done when calling these
functions for the new cluster, for which any live-check-only paths
won't apply.
This commit moves the live_check variable to the global user_opts
variable, which stores information about the options the user
specified on the command line. This allows us to remove the
live_check parameter from several functions. For the functions
with callers that provide a hard-coded "false" as the live_check
argument (e.g., get_control_data()), we verify the given cluster is
the old cluster before taking any live-check-only paths.
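A before/after sketch of the refactoring (structs trimmed down to the
relevant fields; the real pg_upgrade signatures take a ClusterInfo):

    #include <stdbool.h>
    #include <stdio.h>

    /* Options the user specified on the command line (trimmed). */
    typedef struct UserOpts
    {
        bool        check;
        bool        live_check;     /* moved here from a local in main() */
    } UserOpts;

    static UserOpts user_opts;

    /*
     * Before: get_control_data(cluster, live_check), with some callers
     * passing a hard-coded false.  After: the parameter is gone; the
     * function consults the global and checks which cluster it was given.
     */
    static void
    get_control_data(bool is_old_cluster)
    {
        /* Only the old cluster can be running during a live check. */
        if (is_old_cluster && user_opts.live_check)
            puts("taking live-check-only path");
    }

    int
    main(void)
    {
        user_opts.live_check = true;
        get_control_data(true);
        return 0;
    }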
This small refactoring effort helps simplify some proposed changes
that would parallelize many of pg_upgrade's once-in-each-database
tasks using libpq's asynchronous APIs. By removing the live_check
parameter, we can more easily convert the functions to callbacks
for the new parallel system.
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20240516211638.GA1688936%40nathanxps13
Commit f06b1c598 removed validate_exec from pg_upgrade and instead
exported it from src/common, but the macro for checking the executable
suffix on Windows was accidentally left behind. Fix by removing it.
Author: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/c1d63754-cb85-2d8a-8409-bde2c4d2d04b@gmail.com
Presently, pg_upgrade obtains the number of subscriptions in the
to-be-upgraded cluster by first querying pg_subscription in every
database for the number of subscriptions in only that database.
Then, in count_old_cluster_subscriptions(), it adds all the values
collected in the first step. This is expensive, especially when
there are many databases.
Fortunately, there is a better way to retrieve the subscription
count. Since pg_subscription is a shared catalog, we only need to
connect to a single database and query it once. This commit
modifies pg_upgrade to use that approach, which also allows us to
trim several lines of code. In passing, move the call to
get_db_subscription_count(), which has been renamed to
get_subscription_count(), from get_db_rel_and_slot_infos() to the
dedicated >= v17 section in check_and_dump_old_cluster().
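For illustration, the single-connection lookup can be as simple as
this (a sketch; pg_upgrade's actual query text may differ):

    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    /*
     * pg_subscription is a shared catalog, so one connection to any
     * database sees every subscription in the cluster.
     */
    static int
    get_subscription_count(PGconn *conn)
    {
        PGresult   *res = PQexec(conn,
                                 "SELECT count(*) FROM pg_catalog.pg_subscription");
        int         count = -1;

        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            count = atoi(PQgetvalue(res, 0, 0));
        PQclear(res);
        return count;
    }

    int
    main(void)
    {
        PGconn *conn = PQconnectdb("");     /* parameters from environment */

        if (PQstatus(conn) == CONNECTION_OK)
            printf("subscriptions: %d\n", get_subscription_count(conn));
        PQfinish(conn);
        return 0;
    }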
We may be able to make similar improvements to
get_old_cluster_logical_slot_infos(), but that is left as a future
exercise.
Reviewed-by: Michael Paquier, Amit Kapila
Discussion: https://postgr.es/m/ZprQJv_TxccN3tkr%40nathan
Backpatch-through: 17
The two_phase option is controlled by both the publisher (as a slot
option) and the subscriber (as a subscription option), so the slot option
must also be modified.
Changing the 'two_phase' option for a subscription from 'true' to 'false'
is permitted only when there are no pending prepared transactions
corresponding to that subscription. Otherwise, the changes of already
prepared transactions can be replicated again along with their
corresponding commits, leading to duplicate data or errors.
To avoid data loss, the 'two_phase' option for a subscription can only be
changed from 'false' to 'true' once the initial data synchronization is
completed. Therefore, this change is performed later by the logical
replication worker.
Author: Hayato Kuroda, Ajin Cherian, Amit Kapila
Reviewed-by: Peter Smith, Hou Zhijie, Amit Kapila, Vitaly Davydov, Vignesh C
Discussion: https://postgr.es/m/8fab8-65d74c80-1-2f28e880@39088166
If pg_ctl tries to start the postmaster, but the postmaster shuts down
because it completed a point-in-time recovery, pg_ctl used to report
a message that indicated a failure. It's not really a failure, so
instead say "server shut down because of recovery target settings".
Zhao Junwang, Crisp Lee, Laurenz Albe
Discussion: https://postgr.es/m/CAGHPtV7GttPZ-HvxZuYRy70jLGQMEm5=LQc4fKGa=J74m2VZbg@mail.gmail.com
To do this, we must include the wal_level in the first WAL record
covered by each summary file; so add wal_level to struct Checkpoint
and the payload of XLOG_CHECKPOINT_REDO and XLOG_END_OF_RECOVERY.
This, in turn, requires bumping XLOG_PAGE_MAGIC and, since the
Checkpoint is also stored in the control file, also
PG_CONTROL_VERSION. It's not great to do that so late in the release
cycle, but the alternative seems to be shipping v17 without robust
protections against this scenario, which could result in corrupted
incremental backups.
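A trimmed-down illustration of the struct change (field names and
placement are approximate; see pg_control.h for the real definition):

    #include <stdint.h>

    typedef struct CheckPoint
    {
        uint64_t    redo;           /* LSN where redo begins */
        uint32_t    ThisTimeLineID; /* current timeline */
        /* ... other fields elided ... */
        int         wal_level;      /* NEW: wal_level in effect */
    } CheckPoint;

    int
    main(void)
    {
        CheckPoint  cp = {0};

        return cp.wal_level;
    }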
A side effect of this patch is that, when a server with
wal_level=replica is started with summarize_wal=on for the first time,
summarization will no longer begin with the oldest WAL that still
exists in pg_wal, but rather from the first checkpoint after that.
This change should be harmless, because a WAL summary for a partial
checkpoint cycle can never make an incremental backup possible when
it would otherwise not have been.
Report by Fujii Masao. Patch by me. Review and/or testing by Jakub
Wartak and Fujii Masao.
Discussion: http://postgr.es/m/6e30082e-041b-4e31-9633-95a66de76f5d@oss.nttdata.com
The current code can have pg_isready unexpectedly succeed if there is a
server running on the default port. To avoid this, we delay running the
test until after a node has been created but before it starts, and then
use that node's port, so we are fairly sure there is nothing running on
the port.
Backpatch to all live branches.
The slot synchronization failed because the catalog_xmin of the local
slot (created during slot synchronization) on the standby was ahead of
that of the remote slot. This happens because the INSERT before slot
synchronization results in the generation of a new xid that could be
replicated to the standby. Then, before the xmin of the physical slot
on the primary catches up via hot_standby_feedback, the test creates a
logical slot that picks up some prior value of catalog_xmin.
To fix this, we could try to ensure that the physical slot's
catalog_xmin is caught up to the latest value before creating a logical
slot, but we took a simpler path: move the INSERT to after the logical
slot is synchronized.
Reported-by: Alexander Lakhin as per buildfarm
Diagnosed-by: Amit Kapila, Hou Zhijie, Alexander Lakhin
Author: Hou Zhijie
Backpatch-through: 17
Discussion: https://postgr.es/m/bde6ac67-69cc-c104-5ab6-dd4f5deadf24@gmail.com
This reverts commit e9f15bc9. Instead of a hacky solution that didn't
work on Windows, we now avoid trying to move the directory (possibly
across drives) and simply remove it and recreate it in the new location.
Discussion: https://postgr.es/m/20240707070243.sb77kp4ubowauctz@awork3.anarazel.de
Backpatch to release 14 like the previous patch.
While the FILE_COPY strategy is ordinarily quite costly because it
requires performing two checkpoints, testing shows that it tends to be
a faster choice than WAL_LOG during pg_upgrade, presumably because
fsync is turned off. Furthermore, we can skip the checkpoints
altogether because the problems they are intended to prevent don't
apply to pg_upgrade. Instead, we just need to CHECKPOINT once in
the new cluster after making any changes to template0 and before
restoring the rest of the databases. This ensures that said
template0 changes are written out to disk prior to creating the
databases via FILE_COPY.
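A hedged sketch of that ordering over libpq (the real pg_upgrade code
path differs):

    #include <libpq-fe.h>

    /*
     * After modifying template0 and before creating the remaining
     * databases with FILE_COPY, force the changes out to disk once.
     */
    static void
    checkpoint_new_cluster(PGconn *conn)
    {
        PGresult   *res = PQexec(conn, "CHECKPOINT");

        PQclear(res);
    }

    int
    main(void)
    {
        PGconn *conn = PQconnectdb("");     /* parameters from environment */

        if (PQstatus(conn) == CONNECTION_OK)
            checkpoint_new_cluster(conn);
        PQfinish(conn);
        return 0;
    }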
Co-authored-by: Matthias van de Meent
Reviewed-by: Ranier Vilela, Dilip Kumar, Robert Haas, Michael Paquier
Discussion: https://postgr.es/m/Zl9ta3FtgdjizkJ5%40nathan
This acts as a revert of b83747a8a6 and 9886744a36. As pointed out
by Noah, HEAD and REL_17_STABLE are in a weird state where the code
paths adding /D would limit the spawn of child processes, but we still
have code paths where more than one child process could be spawned.
Let's remove these /D switches for now, to bring back the code into a
state consistent with how autorun is configured on a Windows host.
Reported-by: Noah Misch
Discussion: https://postgr.es/m/20240630021211.f3.nmisch@google.com
Backpatch-through: 17
Various buildfarm critters were complaining about
pgbench.c:304:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
Evidently a thinko in 720b0eaae.
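The warning fires when a storage-class specifier does not lead the
declaration, for example:

    /* warns: 'static' is not at beginning of declaration */
    int static  bad_counter = 0;

    /* fixed */
    static int  good_counter = 0;

    int
    main(void)
    {
        return bad_counter + good_counter;
    }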
This function generates the commands that preserve the OIDs and
relfilenodes of relations during pg_upgrade. It is called once per
relevant relation, and each such call executes a relatively
expensive query to retrieve information for a single pg_class_oid.
This can cause pg_dump to take significantly longer when
--binary-upgrade is specified, especially when there are many
tables.
This commit improves the performance of this function by gathering
all the required pg_class information with a single query at the
beginning of pg_dump. This information is stored in a sorted array
that binary_upgrade_set_pg_class_oids() can bsearch() for what it
needs. This follows an approach similar to that of commit d5e8930f50,
which introduced a sorted array for role information.
With this patch, 'pg_dump --binary-upgrade' will use more memory,
but that isn't expected to be too egregious. Per the mailing list
discussion, folks feel that this is worth the trade-off.
Reviewed-by: Corey Huinker, Michael Paquier, Daniel Gustafsson
Discussion: https://postgr.es/m/20240418041712.GA3441570%40nathanxps13
Since commit 9a974cbcba, this function retrieves the relkind before
it needs to know whether the relation is an index, so we no longer
need callers to provide this information.
Suggested-by: Daniel Gustafsson
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20240418041712.GA3441570%40nathanxps13
The failed test was syncing failover replication slot to standby to test
that we remove such slots after the standby is converted to subscriber by
pg_createsubscriber.
In one of the buildfarm members, the sync of the slot failed because the
LSN on the standby was before the syncslot's LSN. We need to wait for
the standby to catch up before trying to sync the slot with
pg_sync_replication_slots().
The other buildfarm member failed because autovacuum generated an xid
that was replicated to the standby at some random point, making the
slots on the primary lag behind the standby during slot sync.
Both these failures wouldn't have occurred if we had used the built-in
slotsync worker, as it would have waited for the standby to sync with
the primary; but for this test, it is sufficient to use
pg_sync_replication_slots().
Reported-by: Alexander Lakhin as per buildfarm
Author: Kuroda Hayato
Reviewed-by: Amit Kapila
Backpatch-through: 17
Discussion: https://postgr.es/m/0dffca12-bf17-4a7a-334d-225569de5e6e@gmail.com
Discussion: https://postgr.es/m/OSBPR01MB25528300C71FDD83EA1DCA12F5DD2@OSBPR01MB2552.jpnprd01.prod.outlook.com
getSchemaData() does not use the return values of many of its get*
helper functions because they store the data elsewhere. For
example, commit 92316a4582 introduced a separate hash table for
dumpable objects that said helper functions populate. This commit
changes these functions to return void and removes their "int *"
parameters that returned the number of objects found.
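A sketch of the signature change (names simplified from the real
getSchemaData() helpers):

    #include <stdio.h>

    /*
     * Before (illustrative):
     *     TableInfo *getTables(Archive *fout, int *numTables);
     * The return value and count went unused because the objects are
     * registered in a hash table as a side effect.  After: return void
     * and drop the out-parameter.
     */
    static void
    getTables(void)
    {
        /* ... populate the dumpable-objects hash table ... */
        puts("tables registered");
    }

    int
    main(void)
    {
        getTables();
        return 0;
    }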
Reviewed-by: Neil Conway, Tom Lane, Daniel Gustafsson
Discussion: https://postgr.es/m/ZmCAtVaOrHpf31PJ%40nathan
We don't need the pre-existing subscriptions on the subscriber newly
formed by pg_createsubscriber. The apply workers corresponding to these
subscriptions can connect to other publisher nodes and either fetch
some unwarranted data or run into ERRORs when connecting to such nodes.
Author: Kuroda Hayato
Reviewed-by: Amit Kapila, Shlok Kyal, Vignesh C
Backpatch-through: 17
Discussion: https://postgr.es/m/OSBPR01MB25526A30A1FBF863ACCDDA3AF5C92@OSBPR01MB2552.jpnprd01.prod.outlook.com
This commit removes unused variables and routines from some perl code
that have accumulated across the years. This touches the following
areas:
- Wait event generation script.
- AdjustUpgrade.pm.
- TAP perl code.
Author: Alexander Lakhin
Reviewed-by: Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/70b340bc-244a-589d-ef8b-d8aebb707a84@gmail.com
Since 374c7a2290, it is possible to set a table AM on a partitioned
table. This information was showing up already in psql with \d+, while
\dP+ provided no information.
This commit extends \dP+ to show the access method used by a partitioned
table or index, if set.
Author: Justin Pryzby
Discussion: https://postgr.es/m/ZkyivySXnbvOogZz@pryzbyj2023
There's no reason to ensure that the files pg_upgrade generates
with pg_dump and pg_dumpall have been written safely to disk. If
there is a crash during pg_upgrade, the upgrade must be restarted
from the beginning; dump files left behind by previous pg_upgrade
attempts cannot be reused.
Reviewed-by: Peter Eisentraut, Tom Lane, Michael Paquier, Daniel Gustafsson
Discussion: https://postgr.es/m/20240503171348.GA1731524%40nathanxps13
Also omit backslashes (\) in the generated database names on Windows.
As before, perhaps we can revert this after updating affected
buildfarm animals.
Discussion: https://postgr.es/m/2509767.1719773880@sss.pgh.pa.us
This is required before the creation of a new branch. pgindent is
clean, as well as is reformat-dat-files.
perltidy version is v20230309, as documented in pgindent's README.
Don't include double-quotes (") in the generated database names
on Windows. Doing so tickles a bug in older versions of IPC::Run,
which fail to quote command line arguments correctly for that
platform. Possibly we can revert this after updating affected
buildfarm animals.
Discussion: https://postgr.es/m/2509767.1719773880@sss.pgh.pa.us
Introduces an environment variable PG_TEST_PG_COMBINEBACKUP_MODE that
determines the copy mode used by pg_combinebackup in TAP tests. It
defaults to "--copy" but may be set to "--clone" or "--copy-file-range"
to use the alternative strategies.
Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/48da4a1f-ccd9-4988-9622-24f37b1de2b4%40eisentraut.org
Introduces --copy as an alternative to --clone and --copy-file-range.
This option simply picks the default mode to copy files, as if none of
the options was specified. This makes pg_combinebackup options more
consistent with pg_upgrade, and it makes testing simpler.
Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/48da4a1f-ccd9-4988-9622-24f37b1de2b4%40eisentraut.org
The code for file cloning existed, but was not reachable as it relied on
constants from missing headers. Due to that, on Linux --clone always
failed with
error: file cloning not supported on this platform
Fixed by including the missing headers in the relevant places. Adding
the headers revealed a couple of compile errors in copy_file_clone(),
so fix those too.
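On Linux, the clone path comes down to an ioctl whose constant lives
in one of those headers; a minimal sketch, assuming Linux and FICLONE
support on the filesystem:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #ifdef __linux__
    #include <linux/fs.h>           /* defines FICLONE */
    #endif

    /* Clone src into dst via reflink; returns 0 on success. */
    static int
    clone_file(const char *src, const char *dst)
    {
    #ifdef FICLONE
        int         srcfd = open(src, O_RDONLY);
        int         dstfd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        int         rc = -1;

        if (srcfd >= 0 && dstfd >= 0)
            rc = ioctl(dstfd, FICLONE, srcfd);
        if (srcfd >= 0)
            close(srcfd);
        if (dstfd >= 0)
            close(dstfd);
        return rc;
    #else
        fprintf(stderr, "error: file cloning not supported on this platform\n");
        return -1;
    #endif
    }

    int
    main(void)
    {
        return clone_file("a.bin", "b.bin") == 0 ? 0 : 1;
    }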
Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/48da4a1f-ccd9-4988-9622-24f37b1de2b4%40eisentraut.org
pg_createsubscriber currently always sets up logical replication
with two-phase commit disabled. Improving that is not going to
happen for v17. In the meantime, document the deficiency, and
adjust pg_createsubscriber so that it will emit a warning if
the source installation has max_prepared_transactions > 0.
Hayato Kuroda (some mods by Amit Kapila and me), per complaint from
Noah Misch
Discussion: https://postgr.es/m/20240623062157.97.nmisch@google.com
The original coding here could fail with database names, user names,
etc. that contain spaces or other special characters.
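One safe way to embed such names from C is libpq's escaping helpers (a
sketch; whether pg_createsubscriber uses exactly these calls is an
assumption):

    #include <stdio.h>
    #include <string.h>
    #include <libpq-fe.h>

    /* Build a command with a database name that may contain spaces, etc. */
    static void
    print_safe_command(PGconn *conn, const char *dbname)
    {
        char       *quoted = PQescapeIdentifier(conn, dbname, strlen(dbname));

        if (quoted)
        {
            printf("ALTER DATABASE %s ALLOW_CONNECTIONS false;\n", quoted);
            PQfreemem(quoted);
        }
    }

    int
    main(void)
    {
        PGconn *conn = PQconnectdb("");     /* parameters from environment */

        if (PQstatus(conn) == CONNECTION_OK)
            print_safe_command(conn, "odd \"name\" with spaces");
        PQfinish(conn);
        return 0;
    }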
As partial test coverage, extend the 040_pg_createsubscriber.pl
test script so that it uses a generated database name containing
funny characters.
Hayato Kuroda (some mods by me), per complaint from Noah Misch
Discussion: https://postgr.es/m/20240623062157.97.nmisch@google.com
This covers both regular and inplace changes, since bugs arise at their
intersection. Where marked, these witness extant bugs. Back-patch to
v12 (all supported versions).
Reviewed (in an earlier version) by Robert Haas.
Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com