mirror of https://github.com/postgres/postgres
the Admin Guide about routine maintenance tasks. Currently this only discusses the various reasons for doing VACUUM, but perhaps it could be fleshed out with topics like log rotation.REL7_2_STABLE
parent
3d162c0d1a
commit
8e0d642c9b
@ -0,0 +1,371 @@ |
||||
<!-- |
||||
$Header: /cvsroot/pgsql/doc/src/sgml/maintenance.sgml,v 1.1 2001/08/26 21:17:12 tgl Exp $ |
||||
--> |
||||
|
||||
<chapter id="maintenance"> |
||||
<title>Routine Database Maintenance Tasks</title> |
||||
|
||||
<sect1 id="maintenance-general"> |
||||
<title>General Discussion</Title> |
||||
|
||||
<para> |
||||
There are a few routine maintenance chores that must be performed on |
||||
a regular basis to keep a <productname>PostgreSQL</productname> |
||||
installation running smoothly. The tasks discussed here are repetitive |
||||
in nature and can easily be automated using standard Unix tools such |
||||
as <filename>cron</filename> scripts. But it is the database |
||||
administrator's responsibility to set up appropriate scripts, and to |
||||
check that they execute successfully. |
||||
</para> |
||||
|
||||
<para> |
||||
One obvious maintenance task is creation of backup copies of the data on a |
||||
regular schedule. Without a recent backup, you have no chance of recovery |
||||
after a catastrophe (disk failure, fire, mistakenly dropping a critical |
||||
table, etc). The backup and recovery mechanisms available in |
||||
<productname>PostgreSQL</productname> are discussed at length in |
||||
<xref linkend="backup">. |
||||
</para> |
||||
|
||||
<para> |
||||
The other main category of maintenance task is periodic <quote>vacuuming</> |
||||
of the database. This activity is discussed in |
||||
<xref linkend="routine-vacuuming">. |
||||
</para> |
||||
|
||||
<!-- probably ought to have something about log rotation here; |
||||
any other important maintenance tasks? |
||||
--> |
||||
|
||||
<para> |
||||
<productname>PostgreSQL</productname> is low-maintenance compared to |
||||
some other database products. Nonetheless, appropriate attention to |
||||
these tasks will go far towards ensuring a pleasant and productive |
||||
experience with the system. |
||||
</para> |
||||
|
||||
</sect1> |
||||
|
||||
<sect1 id="routine-vacuuming"> |
||||
<title>Routine Vacuuming</title> |
||||
|
||||
<para> |
||||
<productname>PostgreSQL</productname>'s <command>VACUUM</> command must be |
||||
run on a regular basis for several reasons: |
||||
|
||||
<orderedlist> |
||||
<listitem> |
||||
<simpara>To recover disk space occupied by updated or deleted |
||||
rows.</simpara> |
||||
</listitem> |
||||
|
||||
<listitem> |
||||
<simpara>To update data statistics used by the |
||||
<productname>PostgreSQL</productname> query planner.</simpara> |
||||
</listitem> |
||||
|
||||
<listitem> |
||||
<simpara>To protect against loss of very old data due to |
||||
<firstterm>transaction ID wraparound</>.</simpara> |
||||
</listitem> |
||||
</orderedlist> |
||||
|
||||
The frequency and scope of <command>VACUUM</>s performed for each of |
||||
these reasons will vary depending on the needs of each installation. |
||||
Therefore, database administrators must understand these issues and |
||||
develop an appropriate maintenance strategy. This section concentrates |
||||
on explaining the high-level issues; for details about command syntax |
||||
and so on, see the <command>VACUUM</> command reference page. |
||||
</para> |
||||
|
||||
<para> |
||||
Beginning in <productname>PostgreSQL</productname> 7.2, the standard form |
||||
of <command>VACUUM</> can run in parallel with normal database operations |
||||
(selects, inserts, updates, deletes, but not changes to table schemas). |
||||
Routine vacuuming is therefore not nearly as intrusive as it was in prior |
||||
releases, and it's not as critical to try to schedule it at low-usage |
||||
times of day. |
||||
</para> |
||||
|
||||
<sect2 id="vacuum-for-space-recovery"> |
||||
<title>Recovering disk space</title> |
||||
|
||||
<para> |
||||
In normal <productname>PostgreSQL</productname> operation, an UPDATE or |
||||
DELETE of a row does not immediately remove the old <firstterm>tuple</> |
||||
(version of the row). This approach is necessary to gain the benefits |
||||
of multi-version concurrency control (see the User's Guide): the tuple |
||||
must not be deleted while |
||||
it is still potentially visible to other transactions. But eventually, |
||||
an outdated or deleted tuple is no longer of interest to any transaction. |
||||
The space it occupies must be reclaimed for reuse by new tuples, to avoid |
||||
infinite growth of disk space requirements. This is done by running |
||||
<command>VACUUM</>. |
||||
</para> |
||||
|
||||
<para> |
||||
Clearly, a table that receives frequent updates or deletes will need |
||||
to be vacuumed more often than tables that are seldom updated. It may |
||||
be useful to set up periodic cron tasks that vacuum only selected tables, |
||||
skipping tables that are known not to change often. This is only likely |
||||
to be helpful if you have both large heavily-updated tables and large |
||||
seldom-updated tables --- the extra cost of vacuuming a small table |
||||
isn't enough to be worth worrying about. |
||||
</para> |
||||
|
||||
<para> |
||||
The standard form of <command>VACUUM</> is best used with the goal of |
||||
maintaining a fairly level steady-state usage of disk space. The standard |
||||
form finds old tuples and makes their space available for re-use within |
||||
the table, but it does not try very hard to shorten the table file and |
||||
return disk space to the operating system. If you need to return disk |
||||
space to the operating system you can use <command>VACUUM FULL</> --- |
||||
but what's the point of releasing disk space that will only have to be |
||||
allocated again soon? Moderately frequent standard <command>VACUUM</>s |
||||
are a better approach than infrequent <command>VACUUM FULL</>s for |
||||
maintaining heavily-updated tables. |
||||
</para> |
||||
|
||||
<para> |
||||
Recommended practice for most sites is to schedule a database-wide |
||||
<command>VACUUM</> once a day at a low-usage time of day, supplemented |
||||
by more frequent vacuuming of heavily-updated tables if necessary. |
||||
(If you have multiple databases in an installation, don't forget to |
||||
vacuum each one; the <filename>vacuumdb</> script may be helpful.) |
||||
Use plain <command>VACUUM</>, not <command>VACUUM FULL</>, for routine |
||||
vacuuming for space recovery. |
||||
</para> |
||||
|
||||
<para> |
||||
<command>VACUUM FULL</> is recommended for cases where you know you have |
||||
deleted the majority of tuples in a table, so that the steady-state size |
||||
of the table can be shrunk substantially with <command>VACUUM FULL</>'s |
||||
more aggressive approach. |
||||
</para> |
||||
|
||||
<para> |
||||
If you have a table whose contents are deleted completely every so often, |
||||
consider doing it with <command>TRUNCATE</> rather than using |
||||
<command>DELETE</> followed by <command>VACUUM</>. |
||||
</para> |
||||
</sect2> |
||||
|
||||
<sect2 id="vacuum-for-statistics"> |
||||
<title>Updating planner statistics</title> |
||||
|
||||
<para> |
||||
The <productname>PostgreSQL</productname> query planner relies on |
||||
statistical information about the contents of tables in order to |
||||
generate good plans for queries. These statistics are gathered by |
||||
the <command>ANALYZE</> command, which can be invoked by itself or |
||||
as an optional step in <command>VACUUM</>. It is important to have |
||||
reasonably accurate statistics, otherwise poor choices of plans may |
||||
degrade database performance. |
||||
</para> |
||||
|
||||
<para> |
||||
As with vacuuming for space recovery, frequent updates of statistics |
||||
are more useful for heavily-updated tables than for seldom-updated ones. |
||||
But even for a heavily-updated table, there may be no need for |
||||
statistics updates if the statistical distribution of the data is not |
||||
changing much. A simple rule of thumb is to think about how much |
||||
the minimum and maximum values of the columns in the table change. |
||||
For example, a timestamp column that contains the time of row update |
||||
will have a constantly-increasing maximum value as rows are added and |
||||
updated; such a column will probably need more frequent statistics |
||||
updates than, say, a column containing URLs for pages accessed on a |
||||
website. The URL column may receive changes just as often, but the |
||||
statistical distribution of its values probably changes relatively |
||||
slowly. |
||||
</para> |
||||
|
||||
<para> |
||||
It is possible to run <command>ANALYZE</> on specific tables and even |
||||
just specific columns of a table, so the flexibility exists to update some |
||||
statistics more frequently than others if your application requires it. |
||||
In practice, however, the usefulness of this feature is doubtful. |
||||
Beginning in <productname>PostgreSQL</productname> 7.2, |
||||
<command>ANALYZE</> is a fairly fast operation even on large tables, |
||||
because it uses a statistical random sampling of the rows of a table |
||||
rather than reading every single row. So it's probably much simpler |
||||
to just run it over the whole database every so often. |
||||
</para> |
||||
|
||||
<tip> |
||||
<para> |
||||
Although per-column tweaking of <command>ANALYZE</> frequency may not be |
||||
very productive, you may well find it worthwhile to do per-column |
||||
adjustment of the level of detail of the statistics collected by |
||||
<command>ANALYZE</>. Columns that are heavily used in WHERE clauses |
||||
and have highly irregular data distributions may require a finer-grain |
||||
data histogram than other columns. See <command>ALTER TABLE SET |
||||
STATISTICS</>. |
||||
</para> |
||||
</tip> |
||||
|
||||
<para> |
||||
Recommended practice for most sites is to schedule a database-wide |
||||
<command>ANALYZE</> once a day at a low-usage time of day; this can |
||||
usefully be combined with a nightly <command>VACUUM</>. However, |
||||
sites with relatively slowly changing table statistics may find that |
||||
this is overkill, and that less-frequent <command>ANALYZE</> runs |
||||
are sufficient. |
||||
</para> |
||||
</sect2> |
||||
|
||||
<sect2 id="vacuum-for-wraparound"> |
||||
<title>Preventing transaction ID wraparound failures</title> |
||||
|
||||
<para> |
||||
<productname>PostgreSQL</productname>'s MVCC transaction semantics |
||||
depend on being able to compare transaction ID (<firstterm>XID</>) |
||||
numbers: a tuple with an insertion XID newer than the current |
||||
transaction's XID is <quote>in the future</> and should not be visible |
||||
to the current transaction. But since transaction IDs have limited size |
||||
(32 bits at this writing) an installation that runs for a long time (more |
||||
than 4 billion transactions) will suffer <firstterm>transaction ID |
||||
wraparound</>: the XID counter wraps around to zero, and all of a sudden |
||||
transactions that were in the past appear to be in the future --- which |
||||
means their outputs become invisible. In short, catatrophic data loss. |
||||
(Actually the data is still there, but that's cold comfort if you can't |
||||
get at it.) |
||||
</para> |
||||
|
||||
<para> |
||||
Prior to <productname>PostgreSQL</productname> 7.2, the only defense |
||||
against XID wraparound was to re-initdb at least every 4 billion |
||||
transactions. This of course was not very satisfactory for high-traffic |
||||
sites, so a better solution has been devised. The new approach allows an |
||||
installation to remain up indefinitely, without initdb or any sort of |
||||
restart. The price is this maintenance requirement: <emphasis>every table |
||||
in the database must be VACUUMed more often than once every billion |
||||
transactions</emphasis>. |
||||
</para> |
||||
|
||||
<para> |
||||
In practice this isn't an onerous requirement, but since the consequences |
||||
of failing to meet it can be complete data loss (not just wasted disk |
||||
space or slow performance), some special provisions have been made to help |
||||
database administrators keep track of the time since the last |
||||
<command>VACUUM</>. The remainder of this section gives the details. |
||||
</para> |
||||
|
||||
<para> |
||||
The new approach to XID comparison distinguishes two special XIDs, numbers |
||||
1 and 2 (<literal>BootstrapXID</> and <literal>FrozenXID</>). These two |
||||
XIDs are always considered older than every normal XID. Normal XIDs (those |
||||
greater than 2) are compared using modulo-2^31 arithmetic. This means |
||||
that for every normal XID, there are two billion XIDs that are |
||||
<quote>older</> and two billion that are <quote>newer</>; another way to |
||||
say it is that the normal XID space is circular with no endpoint. |
||||
Therefore, once a tuple has been created with a particular normal XID, the |
||||
tuple will appear to be <quote>in the past</> for the next two billion |
||||
transactions, no matter which normal XID we are talking about. If the |
||||
tuple still exists after more than two billion transactions, it would |
||||
suddenly appear to be in the future. To prevent data loss, old tuples |
||||
must be reassigned the XID <literal>FrozenXID</> sometime before they reach |
||||
the two-billion-transactions-old mark. Once they are assigned this |
||||
special XID, they will appear to be <quote>in the past</> to all normal |
||||
transactions regardless of wraparound issues, and so such tuples will be |
||||
good until deleted, no matter how long that is. This reassignment of |
||||
XID is handled by <command>VACUUM</>. |
||||
</para> |
||||
|
||||
<para> |
||||
<command>VACUUM</>'s normal policy is to reassign <literal>FrozenXID</> |
||||
to any tuple with a normal XID more than one billion transactions in the |
||||
past. This policy preserves the original insertion XID until it is not |
||||
likely to be of interest anymore (in fact, most tuples will probably |
||||
live and die without ever being <quote>frozen</>). This policy means |
||||
that the maximum safe interval between <command>VACUUM</>s of any table |
||||
is exactly one billion transactions: if you wait longer, it's possible |
||||
that a tuple that was not quite old enough to be reassigned last time |
||||
is now more than two billion transactions old and has wrapped around |
||||
into the future --- ie, is lost to you. (Of course, it'll reappear |
||||
after another two billion transactions, but that's no help.) |
||||
</para> |
||||
|
||||
<para> |
||||
Since periodic <command>VACUUM</>s are needed anyway for the reasons |
||||
described earlier, it's unlikely that any table would not be vacuumed |
||||
for as long as a billion transactions. But to help administrators ensure |
||||
this constraint is met, <command>VACUUM</> stores transaction ID |
||||
statistics in the system table <filename>pg_database</>. In particular, |
||||
the <filename>datfrozenxid</> field of a database's |
||||
<filename>pg_database</> row is updated at the completion of any |
||||
database-wide vacuum operation (ie, <command>VACUUM</> that does not |
||||
name a specific table). The value stored in this field is the freeze |
||||
cutoff XID that was used by that <command>VACUUM</> command. All normal |
||||
XIDs older than this cutoff XID are guaranteed to have been replaced by |
||||
<literal>FrozenXID</> within that database. A convenient way to |
||||
examine this information is to execute the query |
||||
<informalexample> |
||||
<programlisting> |
||||
select datname, age(datfrozenxid) from pg_database; |
||||
</programlisting> |
||||
</informalexample> |
||||
The <literal>age</> column measures the number of transactions from the |
||||
cutoff XID to the current transaction's XID. |
||||
</para> |
||||
|
||||
<para> |
||||
With the standard freezing policy, the <literal>age</> column will start |
||||
at one billion for a freshly-vacuumed database. When the <literal>age</> |
||||
approaches two billion, the database must be vacuumed again to avoid |
||||
risk of wraparound failures. Recommended practice is to vacuum each |
||||
database at least once every half-a-billion (500 million) transactions, |
||||
so as to provide plenty of safety margin. To help meet this rule, |
||||
each database-wide <command>VACUUM</> automatically delivers a warning |
||||
if there are any <filename>pg_database</> entries showing an |
||||
<literal>age</> of more than 1.5 billion transactions, for example: |
||||
<informalexample> |
||||
<programlisting> |
||||
play=# vacuum; |
||||
NOTICE: Some databases have not been vacuumed in 1613770184 transactions. |
||||
Better vacuum them within 533713463 transactions, |
||||
or you may have a wraparound failure. |
||||
VACUUM |
||||
</programlisting> |
||||
</informalexample> |
||||
</para> |
||||
|
||||
<para> |
||||
<command>VACUUM</> with the <command>FREEZE</> option uses a more |
||||
aggressive freezing policy: tuples are frozen if they are old enough |
||||
to be considered good by all open transactions. In particular, if |
||||
a <command>VACUUM FREEZE</> is performed in an otherwise-idle database, |
||||
it is guaranteed that <emphasis>all</> tuples in that database will be |
||||
frozen. Hence, as long the database is not modified in any way, it |
||||
will not need subsequent vacuuming to avoid transaction ID wraparound |
||||
problems. This technique is used by <filename>initdb</> to prepare the |
||||
<filename>template0</> database. It should also be used to prepare any |
||||
user-created databases that are to be marked <literal>datallowconn</> = |
||||
<literal>false</> in <filename>pg_database</>, since there isn't any |
||||
convenient way to vacuum a database that you can't connect to. Note |
||||
that VACUUM's automatic warning message about unvacuumed databases will |
||||
ignore <filename>pg_database</> entries with <literal>datallowconn</> = |
||||
<literal>false</>, so as to avoid giving false warnings about these |
||||
databases; therefore it's up to you to ensure that such databases are |
||||
frozen correctly. |
||||
</para> |
||||
|
||||
</sect2> |
||||
</sect1> |
||||
</chapter> |
||||
|
||||
<!-- Keep this comment at the end of the file |
||||
Local variables: |
||||
mode:sgml |
||||
sgml-omittag:nil |
||||
sgml-shorttag:t |
||||
sgml-minimize-attributes:nil |
||||
sgml-always-quote-attributes:t |
||||
sgml-indent-step:1 |
||||
sgml-indent-data:t |
||||
sgml-parent-document:nil |
||||
sgml-default-dtd-file:"./reference.ced" |
||||
sgml-exposed-tags:nil |
||||
sgml-local-catalogs:("/usr/lib/sgml/catalog") |
||||
sgml-local-ecat-files:nil |
||||
End: |
||||
--> |
Loading…
Reference in new issue