Enhance standby documentation.

Original patch by Fujii Masao, with heavy editing and bitrot-fixing after my other commit.
16 years ago · ec9ee9381f
parent 259f60e9b6
commit ec9ee9381f
1 changed files with 111 additions and 18 deletions
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/high-availability.sgml,v 1.55 2010/03/31 19:13:01 heikki Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/high-availability.sgml,v 1.56 2010/03/31 20:35:09 heikki Exp $ -->
 <chapter id="high-availability">
 <title>High Availability, Load Balancing, and Replication</title>
@@ -622,7 +622,8 @@ protocol to make nodes agree on a serializable transactional order.
   <title>Preparing Master for Standby Servers</title>
   <para>
-    Set up continuous archiving to a WAL archive on the master, as described
+    Set up continuous archiving on the primary to an archive directory
    accessible from the standby, as described
    in <xref linkend="continuous-archiving">. The archive location should be
    accessible from the standby even when the master is down, i.e. it should
    reside on the standby server itself or another trusted server, not on
@@ -646,11 +647,11 @@ protocol to make nodes agree on a serializable transactional order.
   <para>
    To set up the standby server, restore the base backup taken from primary
-    server (see <xref linkend="backup-pitr-recovery">). In the recovery command file
+    server (see <xref linkend="backup-pitr-recovery">). Create a recovery
-    <filename>recovery.conf</> in the standby's cluster data directory,
+    command file <filename>recovery.conf</> in the standby's cluster data
-    turn on <varname>standby_mode</>. Set <varname>restore_command</> to
+    directory, and turn on <varname>standby_mode</>. Set
-    a simple command to copy files from the WAL archive. If you want to
+    <varname>restore_command</> to a simple command to copy files from
-    use streaming replication, set <varname>primary_conninfo</>.
+    the WAL archive.
   </para>
   <note>
@@ -664,17 +665,38 @@ protocol to make nodes agree on a serializable transactional order.
   </note>
   <para>
-    You can use restartpoint_command to prune the archive of files no longer
+     If you want to use streaming replication, fill in
-    needed by the standby.
+     <varname>primary_conninfo</> with a libpq connection string, including
     the host name (or IP address) and any additional details needed to
     connect to the primary server. If the primary needs a password for
     authentication, the password needs to be specified in
     <varname>primary_conninfo</> as well.
   </para>
   <para>
    You can use <varname>restartpoint_command</> to prune the archive of
    files no longer needed by the standby.
   </para>
   <para>
    If you're setting up the standby server for high availability purposes,
    set up WAL archiving, connections and authentication like the primary
    server, because the standby server will work as a primary server after
-    failover. If you're setting up the standby server for reporting
+    failover. You will also need to set <varname>trigger_file</> to make
-    purposes, with no plans to fail over to it, configure the standby
+    it possible to fail over.
-    accordingly.
+    If you're setting up the standby server for reporting
    purposes, with no plans to fail over to it, <varname>trigger_file</>
    is not required.
   </para>
   <para>
    A simple example of a <filename>recovery.conf</> is:
 <programlisting>
 standby_mode = 'on'
 primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
 restore_command = 'cp /path/to/archive/%f %p'
 trigger_file = '/path/to/trigger_file'
 </programlisting>
   </para>
   <para>
@@ -731,7 +753,7 @@ protocol to make nodes agree on a serializable transactional order.
    On systems that support the keepalive socket option, setting
    <xref linkend="guc-tcp-keepalives-idle">,
    <xref linkend="guc-tcp-keepalives-interval"> and
-    <xref linkend="guc-tcp-keepalives-count"> helps the master promptly
+    <xref linkend="guc-tcp-keepalives-count"> helps the primary promptly
    notice a broken connection.
   </para>
@@ -798,6 +820,29 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
      <varname>primary_conninfo</varname> then a FATAL error will be raised.
    </para>
   </sect3>
   <sect3 id="streaming-replication-monitoring">
    <title>Monitoring</title>
    <para>
     The WAL files required for the standby's recovery are not deleted from
     the <filename>pg_xlog</> directory on the primary while the standby is
     connected. If the standby lags far behind the primary, many WAL files
     will accumulate in there, and can fill up the disk. It is therefore
     important to monitor the lag to ensure the health of the standby and
     to avoid disk full situations in the primary.
     You can calculate the lag by comparing the current WAL write
     location on the primary with the last WAL location received by the
     standby. They can be retrieved using
     <function>pg_current_xlog_location</> on the primary and the
     <function>pg_last_xlog_receive_location</> on the standby,
     respectively (see <xref linkend="functions-admin-backup-table"> and
     <xref linkend="functions-recovery-info-table"> for details).
     The last WAL receive location in the standby is also displayed in the
     process status of the WAL receiver process, displayed using the
     <command>ps</> command (see <xref linkend="monitoring-ps"> for details).
    </para>
   </sect3>
  </sect2>
  </sect1>
@@ -1898,16 +1943,64 @@ LOG:  database system is ready to accept read only connections
    updated backup than from the original base backup.
   </para>
   <para>
    The procedure for taking a file system backup of the standby server's
    data directory while it's processing logs shipped from the primary is:
   <orderedlist>
    <listitem>
     <para>
      Perform the backup, without using <function>pg_start_backup</> and
      <function>pg_stop_backup</>. Note that the <filename>pg_control</>
      file must be backed up <emphasis>first</>, as in:
 <programlisting>
 cp /var/lib/pgsql/data/global/pg_control /tmp
 cp -r /var/lib/pgsql/data /path/to/backup
 mv /tmp/pg_control /path/to/backup/data/global
 </programlisting>
      <filename>pg_control</> contains the location where WAL replay will
      begin after restoring from the backup; backing it up first ensures
      that it points to the last restartpoint when the backup started, not
      some later restartpoint that happened while files were copied to the 
      backup.
     </para>
    </listitem>
    <listitem>
     <para>
      Make note of the backup ending WAL location by calling the <function>
      pg_last_xlog_replay_location</> function at the end of the backup,
      and keep it with the backup.
 <programlisting>
 psql -c "select pg_last_xlog_replay_location();" > /path/to/backup/end_location
 </programlisting>
      When recovering from the incrementally updated backup, the server
      can begin accepting connections and complete the recovery successfully
      before the database has become consistent. To avoid that, you must
      ensure the database is consistent before users try to connect to the
      server and when the recovery ends. You can do that by comparing the
      progress of the recovery with the stored backup ending WAL location:
      the server is not consistent until recovery has reached the backup end
      location. The progress of the recovery can also be observed with the
      <function>pg_last_xlog_replay_location</> function, but that requires
      connecting to the server while it might not be consistent yet, so
      care should be taken with that method.
     </para>
     <para>
     </para>
    </listitem>
   </orderedlist>
   </para>
   <para>
    Since the standby server is not <quote>live</>, it is not possible to
    use <function>pg_start_backup()</> and <function>pg_stop_backup()</>
    to manage the backup process; it will be up to you to determine how
    far back you need to keep WAL segment files to have a recoverable
-    backup.  You can do this by running <application>pg_controldata</>
+    backup. That is determined by the last restartpoint when the backup
-    on the standby server to inspect the control file and determine the
+    was taken; any WAL older than that can be deleted from the archive
-    current checkpoint WAL location, or by using the
+    once the backup is complete. You can determine the last restartpoint
-    <varname>log_checkpoints</> option to print values to the standby's
+    by running <application>pg_controldata</> on the standby server before
-    server log.
+    taking the backup, or by using the <varname>log_checkpoints</> option
    to print values to the standby's server log.
   </para>
  </sect1>
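
The lag calculation described in the new Monitoring section could be scripted roughly as follows. This is only a sketch, not part of the patch: the function names `wal_location_to_bytes` and `wal_lag_bytes` are hypothetical, and the default of 4278190080 (0xFF000000) bytes per log ID is an assumption about how WAL locations of the form `XLOGID/XRECOFF` map to byte positions in this release with the default 16 MB segment size. It can be overridden with an optional extra argument.

```shell
#!/bin/sh
# Hypothetical helpers (not part of PostgreSQL or of this patch).

# Convert a WAL location 'XLOGID/XRECOFF' (both parts hexadecimal) into a
# single integer. $2 optionally overrides the number of bytes per log ID.
# ASSUMPTION: 4278190080 (0xFF000000) bytes per log ID by default.
wal_location_to_bytes() {
    xlogid=${1%%/*}     # part before the slash
    xrecoff=${1##*/}    # part after the slash
    echo $(( 0x$xlogid * ${2:-4278190080} + 0x$xrecoff ))
}

# Print how many bytes the standby's received location ($2) is behind the
# primary's current write location ($1).
wal_lag_bytes() {
    echo $(( $(wal_location_to_bytes "$1" "$3") - $(wal_location_to_bytes "$2" "$3") ))
}

# Example with literal locations; in practice the two values would come from
#   select pg_current_xlog_location();        -- on the primary
#   select pg_last_xlog_receive_location();   -- on the standby
wal_lag_bytes 0/3000000 0/2000000
```

A monitoring job could compare the printed value against a threshold chosen for the disk space available in `pg_xlog` on the primary.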