Use replay position as floor for WAIT FOR LSN standby_(write|flush)

GetCurrentLSNForWaitType() for standby_write and standby_flush modes
returned only the walreceiver position, which may lag behind WAL
already present on the standby from a base backup, archive restore,
or prior streaming.  This could cause unnecessary blocking if the
target LSN falls between the walreceiver's tracked position and the
replay position.

Fix by returning the maximum of the walreceiver position and the
replay position.  WAL up to the replay point is physically on disk
regardless of its origin, so there is no reason to wait for the
walreceiver to re-receive it.

This complements 29e7dbf5e4, which seeded writtenUpto to
receiveStart in RequestXLogStreaming() to fix the most common
hang scenario.  The getter-level floor handles the remaining edge
cases: targets between receiveStart and the replay position, and
standbys running with archive recovery only (no walreceiver).

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1957514.1775526774%40sss.pgh.pa.us
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
master
Alexander Korotkov 1 week ago
parent df9f938ca2
commit cba67b5b87
  1. 38
      doc/src/sgml/ref/wait_for.sgml
  2. 21
      src/backend/access/transam/xlogwait.c
  3. 73
      src/test/recovery/t/049_wait_for_lsn.pl

@ -105,30 +105,25 @@ WAIT FOR LSN '<replaceable class="parameter">lsn</replaceable>'
<listitem>
<para>
<literal>standby_write</literal>: Wait for the WAL containing the
LSN to be received from the primary and written to disk on a
standby server, but not yet flushed. This is faster than
LSN to be written to disk on a standby server, but not yet
necessarily flushed. This is faster than
<literal>standby_flush</literal> but provides weaker durability
guarantees since the data may still be in operating system
buffers. After successful completion, the
<structfield>written_lsn</structfield> column in
<link linkend="monitoring-pg-stat-wal-receiver-view">
<structname>pg_stat_wal_receiver</structname></link> will show
a value greater than or equal to the target LSN. This mode can
only be used during recovery.
buffers. This is satisfied by WAL already present on the
standby from a base backup, archive restore, or prior
streaming, as well as WAL newly received from the primary.
This mode can only be used during recovery.
</para>
</listitem>
<listitem>
<para>
<literal>standby_flush</literal>: Wait for the WAL containing the
LSN to be received from the primary and flushed to disk on a
standby server. This provides a durability guarantee without
waiting for the WAL to be applied. After successful completion,
<function>pg_last_wal_receive_lsn()</function> will return a
value greater than or equal to the target LSN. This value is
also available as the <structfield>flushed_lsn</structfield>
column in <link linkend="monitoring-pg-stat-wal-receiver-view">
<structname>pg_stat_wal_receiver</structname></link>. This mode
can only be used during recovery.
LSN to be flushed to disk on a standby server. This provides
a durability guarantee without waiting for the WAL to be
applied. This is satisfied by WAL already present on the
standby from a base backup, archive restore, or prior
streaming, as well as WAL newly received from the primary.
This mode can only be used during recovery.
</para>
</listitem>
<listitem>
@ -238,10 +233,11 @@ WAIT FOR LSN '<replaceable class="parameter">lsn</replaceable>'
useful to achieve read-your-writes consistency while using an async
replica for reads and the primary for writes. The
<literal>standby_flush</literal> mode waits for the WAL to be flushed
to durable storage on the replica, providing a durability guarantee
without waiting for replay. The <literal>standby_write</literal> mode
waits for the WAL to be written to the operating system, which is
faster than flush but provides weaker durability guarantees. The
to durable storage on the replica, or to have already been replayed
from WAL present on the standby. The <literal>standby_write</literal> mode
waits for the WAL to be written to the operating system, or to have
already been replayed, which is faster than flush for newly received
WAL but provides weaker durability guarantees. The
<literal>primary_flush</literal> mode waits for WAL to be flushed on
a primary server. In all cases, the <acronym>LSN</acronym> of the last
modification should be stored on the client application side or the

@ -111,10 +111,27 @@ GetCurrentLSNForWaitType(WaitLSNType lsnType)
return GetXLogReplayRecPtr(NULL);
case WAIT_LSN_TYPE_STANDBY_WRITE:
return GetWalRcvWriteRecPtr();
{
XLogRecPtr recptr = GetWalRcvWriteRecPtr();
XLogRecPtr replay = GetXLogReplayRecPtr(NULL);
/*
* Use the replay position as a floor. WAL up to the replay
* point is already on disk from a base backup, archive
* restore, or prior streaming, so there is no reason to wait
* for the walreceiver to re-receive it.
*/
return Max(recptr, replay);
}
case WAIT_LSN_TYPE_STANDBY_FLUSH:
return GetWalRcvFlushRecPtr(NULL, NULL);
{
XLogRecPtr recptr = GetWalRcvFlushRecPtr(NULL, NULL);
XLogRecPtr replay = GetXLogReplayRecPtr(NULL);
/* Same floor as standby_write; see comment above. */
return Max(recptr, replay);
}
case WAIT_LSN_TYPE_PRIMARY_FLUSH:
return GetFlushRecPtr(NULL);

@ -674,4 +674,77 @@ for (my $i = 0; $i < 3; $i++)
$wait_sessions[$i]->{run}->finish;
}
# 9. Archive-only standby tests: verify standby_write/standby_flush work
# without a walreceiver. These exercises the replay-position floor in
# GetCurrentLSNForWaitType().
#
# We set up a separate primary with archiving and an archive-only standby
# (has_restoring, no has_streaming), so no walreceiver ever starts and the
# shared walreceiver positions (writtenUpto, flushedUpto) stay at their
# zero-initialized values.
my $arc_primary = PostgreSQL::Test::Cluster->new('arc_primary');
$arc_primary->init(has_archiving => 1, allows_streaming => 1);
$arc_primary->start;
$arc_primary->safe_psql('postgres',
"CREATE TABLE arc_test AS SELECT generate_series(1,10) AS a");
my $arc_backup_name = 'arc_backup';
$arc_primary->backup($arc_backup_name);
# Generate WAL that will be archived and replayed on the standby.
$arc_primary->safe_psql('postgres',
"INSERT INTO arc_test VALUES (generate_series(11, 20))");
my $arc_target_lsn =
$arc_primary->safe_psql('postgres', "SELECT pg_current_wal_insert_lsn()");
# Force WAL to be archived by switching segments, then wait for archiving.
my $arc_segment = $arc_primary->safe_psql('postgres',
"SELECT pg_walfile_name(pg_current_wal_lsn())");
$arc_primary->safe_psql('postgres', "SELECT pg_switch_wal()");
$arc_primary->poll_query_until('postgres',
qq{SELECT last_archived_wal >= '$arc_segment' FROM pg_stat_archiver}, 't')
or die "Timed out waiting for WAL archiving on arc_primary";
# Create an archive-only standby: has_restoring but NOT has_streaming.
# No primary_conninfo means no walreceiver will start.
my $arc_standby = PostgreSQL::Test::Cluster->new('arc_standby');
$arc_standby->init_from_backup($arc_primary, $arc_backup_name,
has_restoring => 1);
$arc_standby->start;
# Wait for the standby to replay past our target LSN via archive recovery.
$arc_standby->poll_query_until('postgres',
qq{SELECT pg_wal_lsn_diff(pg_last_wal_replay_lsn(), '$arc_target_lsn') >= 0}
) or die "Timed out waiting for archive replay on arc_standby";
# Sanity: verify no walreceiver is running.
$output = $arc_standby->safe_psql('postgres',
"SELECT count(*) FROM pg_stat_wal_receiver");
is($output, '0', "arc_standby has no walreceiver");
# 9a. Getter fallback: standby_write/standby_flush succeed immediately when
# the target LSN has already been replayed, even though writtenUpto and
# flushedUpto are zero. GetCurrentLSNForWaitType() returns
# Max(walrcv_pos, replay), so replay >= target satisfies the check on the
# first loop iteration without ever sleeping.
$output = $arc_standby->safe_psql(
'postgres', qq[
WAIT FOR LSN '${arc_target_lsn}'
WITH (MODE 'standby_write', timeout '3s', no_throw);]);
ok($output eq "success",
"standby_write succeeds on archive-only standby (getter fallback)");
$output = $arc_standby->safe_psql(
'postgres', qq[
WAIT FOR LSN '${arc_target_lsn}'
WITH (MODE 'standby_flush', timeout '3s', no_throw);]);
ok($output eq "success",
"standby_flush succeeds on archive-only standby (getter fallback)");
$arc_standby->stop;
$arc_primary->stop;
done_testing();

Loading…
Cancel
Save