Add new documentation on page format.

Martijn van Ooster
24 years ago · d1fcd337e0
parent 42ef2c9cb7
commit d1fcd337e0
1 changed files with 234 additions and 88 deletions
--- a/doc/src/sgml/page.sgml
+++ b/doc/src/sgml/page.sgml
@@ -22,9 +22,13 @@ refers to data that is stored in <productname>PostgreSQL</productname> tables.
 </para>

 <para>
-<xref linkend="page-table"> shows how pages in both normal <productname>PostgreSQL</productname> tables
- and <productname>PostgreSQL</productname> indexes
-(e.g., a B-tree index) are structured.
+
+<xref linkend="page-table"> shows how pages in both normal
+ <productname>PostgreSQL</productname> tables and
+ <productname>PostgreSQL</productname> indexes (e.g., a B-tree index)
+are structured. This structure is also used for toast tables and sequences.
+There are five parts to each page.
+
 </para>

 <table tocentry="1" id="page-table">
@@ -43,113 +47,255 @@ Item
 <tbody>

 <row>
-<entry>itemPointerData</entry>
-</row>
-
-<row>
-<entry>filler</entry>
+ <entry>PageHeaderData</entry>
+ <entry>20 bytes long. Contains general information about the page that is needed to access it.</entry>
 </row>

 <row>
-<entry>itemData...</entry>
+<entry>itemPointerData</entry>
+<entry>List of (offset,length) pairs pointing to the actual items.</entry>
 </row>

 <row>
-<entry>Unallocated Space</entry>
+<entry>Free space</entry>
+<entry>The unallocated space. All new tuples are allocated from here, generally from the end.</entry>
 </row>

 <row>
-<entry>ItemContinuationData</entry>
+<entry>items</entry>
+<entry>The actual items themselves. Different access methods store different data here.</entry>
 </row>

 <row>
 <entry>Special Space</entry>
+<entry>Access method specific data. Different methods store different data. Unused by normal tables.</entry>
 </row>

-<row>
-<entry><quote>ItemData 2</quote></entry>
-</row>
+</tbody>
+</tgroup>
+</table>

-<row>
-<entry><quote>ItemData 1</quote></entry>
-</row>
+ <para>

-<row>
-<entry>ItemIdData</entry>
-</row>
+  The first 20 bytes of each page consist of a page header
+  (PageHeaderData). Its format is detailed in <xref
+  linkend="pageheaderdata-table">. The first two fields hold
+  WAL-related information. This is followed by three 2-byte integer fields
+  (<firstterm>lower</firstterm>, <firstterm>upper</firstterm>, and
+  <firstterm>special</firstterm>). These represent byte offsets to the start
+  of unallocated space, to the end of unallocated space, and to the start of
+  the special space. 
+  
+ </para>
+ 
+ <table tocentry="1" id="pageheaderdata-table">
+ <title>PageHeaderData Layout</title>
+ <titleabbrev>PageHeaderData Layout</titleabbrev>
+ <tgroup cols="4">   
+ <thead>
+  <row> 
+   <entry>Field</entry>
+   <entry>Type</entry>
+   <entry>Length</entry>
+   <entry>Description</entry>
+  </row>
+ </thead>
+ <tbody>
+  <row>
+   <entry>pd_lsn</entry>
+   <entry>XLogRecPtr</entry>
+   <entry>8 bytes</entry>
+   <entry>LSN: next byte after last byte of xlog</entry>
+  </row>
+  <row>
+   <entry>pd_sui</entry>
+   <entry>StartUpID</entry>
+   <entry>4 bytes</entry>
+   <entry>SUI of last changes (currently it's used by heap AM only)</entry>
+  </row>
+  <row>
+   <entry>pd_lower</entry>
+   <entry>LocationIndex</entry>
+   <entry>2 bytes</entry>
+   <entry>Offset to start of free space.</entry>
+  </row>
+  <row>
+   <entry>pd_upper</entry>
+   <entry>LocationIndex</entry>
+   <entry>2 bytes</entry>
+   <entry>Offset to end of free space.</entry>
+  </row>
+  <row>
+   <entry>pd_special</entry>
+   <entry>LocationIndex</entry>
+   <entry>2 bytes</entry>
+   <entry>Offset to start of special space.</entry>
+  </row>
+  <row>
+   <entry>pd_opaque</entry>
+   <entry>OpaqueData</entry>
+   <entry>2 bytes</entry>
+   <entry>AM-generic information. Currently just stores the page size.</entry>
+  </row>
+ </tbody>
+ </tgroup>
+ </table>

-<row>
-<entry>PageHeaderData</entry>
-</row>
+ <para>  
+  Special space is a region at the end of the page that is allocated at page
+  initialization time and contains information specific to an access method. 
+  The last 2 bytes of the page header, <firstterm>opaque</firstterm>,
+  currently only stores the page size.  Page size is stored in each page
+  because frames in the buffer pool may be subdivided into equal sized pages
+  on a frame by frame basis within a table (is this true? - mvo).

-</tbody>
-</tgroup>
-</table>
+ </para>

-<!--
-.\" Running
-.\" .q .../bin/dumpbpages
-.\" or
-.\" .q .../src/support/dumpbpages
-.\" as the postgres superuser
-.\" with the file paths associated with
-.\" (heap or B-tree index) classes,
-.\" .q .../data/base/<database-name>/<class-name>,
-.\" will display the page structure used by the classes.
-.\" Specifying the
-.\" .q -r
-.\" flag will cause the classes to be
-.\" treated as heap classes and for more information to be displayed.
-->
+ <para>

-<para>
-The first 8 bytes of each page consists of a page header
-(PageHeaderData).
-Within the header, the first three 2-byte integer fields
-(<firstterm>lower</firstterm>,
-<firstterm>upper</firstterm>,
-and
-<firstterm>special</firstterm>)
-represent byte offsets to the start of unallocated space, to the end
-of unallocated space, and to the start of <firstterm>special space</firstterm>.
-Special space is a region at the end of the page that is allocated at
-page initialization time and contains information specific to an
-access method.  The last 2 bytes of the page header,
-<firstterm>opaque</firstterm>,
-encode the page size and information on the internal fragmentation of
-the page.  Page size is stored in each page because frames in the
-buffer pool may be subdivided into equal sized pages on a frame by
-frame basis within a table.  The internal fragmentation information is
-used to aid in determining when page reorganization should occur.
-</para>
+  Following the page header are item identifiers
+  (<firstterm>ItemIdData</firstterm>).  New item identifiers are allocated
+  from the first four bytes of unallocated space.  Because an item
+  identifier is never moved until it is freed, its index may be used to
+  indicate the location of an item on a page.  In fact, every pointer to an
+  item (<firstterm>ItemPointer</firstterm>, also known as
+  <firstterm>CTID</firstterm>) created by
+  <productname>PostgreSQL</productname> consists of a frame number and an
+  index of an item identifier.  An item identifier contains a byte-offset to
+  the start of an item, its length in bytes, and a set of attribute bits
+  which affect its interpretation.

-<para>
-Following the page header are item identifiers
-(<firstterm>ItemIdData</firstterm>).
-New item identifiers are allocated from the first four bytes of
-unallocated space.  Because an item identifier is never moved until it
-is freed, its index may be used to indicate the location of an item on
-a page.  In fact, every pointer to an item
-(<firstterm>ItemPointer</firstterm>)
-created by <productname>PostgreSQL</productname> consists of a frame number and an index of an item
-identifier.  An item identifier contains a byte-offset to the start of
-an item, its length in bytes, and a set of attribute bits which affect
-its interpretation.
-</para>
+ </para>

-<para>
-The items themselves are stored in space allocated backwards from
-the end of unallocated space.  Usually, the items are not interpreted.
-However when the item is too long to be placed on a single page or
-when fragmentation of the item is desired, the item is divided and
-each piece is handled as distinct items in the following manner.  The
-first through the next to last piece are placed in an item
-continuation structure
-(<firstterm>ItemContinuationData</firstterm>).
-This structure contains
-itemPointerData
-which points to the next piece and the piece itself.  The last piece
-is handled normally.
-</para>
+ <para>
+ 
+  The items themselves are stored in space allocated backwards from the end
+  of unallocated space.  The exact structure varies depending on what the
+  table is to contain. Sequences and tables both use a structure named
+  <firstterm>HeapTupleHeaderData</firstterm>, described below.
+
+ </para>
+ 
+ <para>
+ 
+  The final section is the "special section" which may contain anything the
+  access method wishes to store. Ordinary tables do not use this at all
+  (indicated by setting the offset to the pagesize).
+  
+ </para>
+ 
+ <para>
+
+  All tuples are structured the same way: a header of around 31 bytes
+  followed by an optional null bitmask and the data. The header is detailed
+  below in <xref linkend="heaptupleheaderdata-table">.  The null bitmask is
+  only present if the <firstterm>HEAP_HASNULL</firstterm> bit is set in the
+  <firstterm>t_infomask</firstterm>. If it is present it takes up the space
+  between the end of the header and the beginning of the data, as indicated
+  by the <firstterm>t_hoff</firstterm> field. In this list of bits, a 1 bit
+  indicates not-null, a 0 bit is a null.
+  
+ </para>
+ 
+ <table tocentry="1" id="heaptupleheaderdata-table">
+ <title>HeapTupleHeaderData Layout</title>
+ <titleabbrev>HeapTupleHeaderData Layout</titleabbrev>
+ <tgroup cols="4">   
+ <thead>
+  <row> 
+   <entry>Field</entry>
+   <entry>Type</entry>
+   <entry>Length</entry>
+   <entry>Description</entry>
+  </row>
+ </thead>
+ <tbody>
+  <row>
+   <entry>t_oid</entry>
+   <entry>Oid</entry>
+   <entry>4 bytes</entry>
+   <entry>OID of this tuple</entry>
+  </row>
+  <row>
+   <entry>t_cmin</entry>
+   <entry>CommandId</entry>
+   <entry>4 bytes</entry>
+   <entry>insert CID stamp</entry>
+  </row>
+  <row>
+   <entry>t_cmax</entry>
+   <entry>CommandId</entry>
+   <entry>4 bytes</entry>
+   <entry>delete CID stamp</entry>
+  </row>
+  <row>
+   <entry>t_xmin</entry>
+   <entry>TransactionId</entry>
+   <entry>4 bytes</entry>
+   <entry>insert XID stamp</entry>
+  </row>
+  <row>
+   <entry>t_xmax</entry>
+   <entry>TransactionId</entry>
+   <entry>4 bytes</entry>
+   <entry>delete XID stamp</entry>
+  </row>
+  <row>
+   <entry>t_ctid</entry>
+   <entry>ItemPointerData</entry>
+   <entry>6 bytes</entry>
+   <entry>current TID of this or newer tuple</entry>
+  </row>
+  <row>
+   <entry>t_natts</entry>
+   <entry>int16</entry>
+   <entry>2 bytes</entry>
+   <entry>number of attributes</entry>
+  </row>
+  <row>
+   <entry>t_infomask</entry>
+   <entry>uint16</entry>
+   <entry>2 bytes</entry>
+   <entry>Various flags</entry>
+  </row>
+  <row>
+   <entry>t_hoff</entry>
+   <entry>uint8</entry>
+   <entry>1 byte</entry>
+   <entry>length of tuple header. Also offset of data.</entry>
+  </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ 
+  All the details may be found in src/include/storage/bufpage.h.
+  
+ </para>
+
+ <para>
+ 
+  Interpreting the actual data can only be done with information obtained
+  from other tables, mostly <firstterm>pg_attribute</firstterm>. The
+  particular fields are <firstterm>attlen</firstterm> and
+  <firstterm>attalign</firstterm>. There is no way to directly get a
+  particular attribute, except when there are only fixed width fields and no
+  NULLs. All this trickery is wrapped up in the functions
+  <firstterm>heap_getattr</firstterm>, <firstterm>fastgetattr</firstterm>
+  and <firstterm>heap_getsysattr</firstterm>.
+  
+ </para>
+ <para>

+  To read the data you need to examine each attribute in turn. First check
+  whether the field is NULL according to the null bitmap. If it is, go to
+  the next. Then make sure you have the right alignment.  If the field is a
+  fixed width field, then its bytes are simply stored as-is. If it's a
+  variable length field (attlen == -1) then it's a bit more complicated,
+  using the variable length structure <firstterm>varattrib</firstterm>.
+  Depending on the flags, the data may be either inline, compressed or in
+  another table (TOAST).
+  
+ </para>
 </chapter>
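
A rough sketch of how the three offsets documented in this patch (lower, upper, special) could be read and used, for anyone reviewing the layout. It assumes an 8-byte pd_lsn and a 4-byte pd_sui ahead of the offsets, so that the header totals the stated 20 bytes, and it assumes 4-byte item identifiers as the text describes. All names here (PageOffsets, page_init, page_offsets, page_free_space, page_item_count, page_has_special) are hypothetical and are not taken from bufpage.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed byte positions: pd_lsn (8) + pd_sui (4) come before the offsets. */
enum {
    PAGE_HEADER_SIZE = 20,
    OFF_PD_LOWER     = 12,
    OFF_PD_UPPER     = 14,
    OFF_PD_SPECIAL   = 16,
    ITEM_ID_SIZE     = 4   /* one item identifier, per the patch text */
};

/* Hypothetical holder for the three 2-byte offsets of a page header. */
typedef struct {
    uint16_t lower;    /* start of free space */
    uint16_t upper;    /* end of free space */
    uint16_t special;  /* start of special space */
} PageOffsets;

/* Read and write 2-byte values in native byte order without alignment traps. */
uint16_t read_u16(const unsigned char *p) {
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

void write_u16(unsigned char *p, uint16_t v) {
    memcpy(p, &v, sizeof v);
}

/* Set up an empty page: no item identifiers, special space at the end. */
void page_init(unsigned char *page, uint16_t page_size, uint16_t special_size) {
    memset(page, 0, page_size);
    write_u16(page + OFF_PD_LOWER, PAGE_HEADER_SIZE);
    write_u16(page + OFF_PD_UPPER, (uint16_t)(page_size - special_size));
    write_u16(page + OFF_PD_SPECIAL, (uint16_t)(page_size - special_size));
}

PageOffsets page_offsets(const unsigned char *page) {
    PageOffsets o;
    o.lower   = read_u16(page + OFF_PD_LOWER);
    o.upper   = read_u16(page + OFF_PD_UPPER);
    o.special = read_u16(page + OFF_PD_SPECIAL);
    return o;
}

/* Free space is the gap between the item identifiers and the items. */
unsigned page_free_space(PageOffsets o) {
    assert(o.upper >= o.lower);
    return (unsigned)(o.upper - o.lower);
}

/* Item identifiers fill the region between the header and lower. */
unsigned page_item_count(PageOffsets o) {
    return (unsigned)(o.lower - PAGE_HEADER_SIZE) / ITEM_ID_SIZE;
}

/* Ordinary tables set special to the page size, meaning "no special space". */
int page_has_special(PageOffsets o, uint16_t page_size) {
    return o.special < page_size;
}
```

Calling page_init on an 8192-byte buffer with a special size of 0 and then page_offsets on it gives a page with no item identifiers and no special space, which is how the patch describes ordinary tables. A non-zero special size moves both upper and special down from the end of the page.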