@ -300,13 +300,14 @@ test=# SELECT t_ctid, raw_flags, combined_flags
test=# SELECT * FROM bt_metap('pg_cast_oid_index');
-[ RECORD 1 ]-----------+-------
magic | 340322
version | 3
version | 4
root | 1
level | 0
fastroot | 1
fastlevel | 0
oldest_xact | 582
last_cleanup_num_tuples | 1000
allequalimage | f
</screen>
</para>
</listitem>
@ -329,11 +330,11 @@ test=# SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
-[ RECORD 1 ]-+-----
blkno | 1
type | l
live_items | 256
live_items | 224
dead_items | 0
avg_item_size | 12
avg_item_size | 16
page_size | 8192
free_size | 4056
free_size | 3668
btpo_prev | 0
btpo_next | 0
btpo | 0
@ -356,33 +357,75 @@ btpo_flags | 3
<function>bt_page_items</function> returns detailed information about
all of the items on a B-tree index page. For example:
<screen>
test=# SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------
1 | (0,1) | 12 | f | f | 23 27 00 00
2 | (0,2) | 12 | f | f | 24 27 00 00
3 | (0,3) | 12 | f | f | 25 27 00 00
4 | (0,4) | 12 | f | f | 26 27 00 00
5 | (0,5) | 12 | f | f | 27 27 00 00
6 | (0,6) | 12 | f | f | 28 27 00 00
7 | (0,7) | 12 | f | f | 29 27 00 00
8 | (0,8) | 12 | f | f | 2a 27 00 00
test=# SELECT itemoffset, ctid, itemlen, nulls, vars, data, dead, htid, tids[0:2] AS some_tids
FROM bt_page_items(get_raw_page('tenk2_hundred', 5));
itemoffset | ctid | itemlen | nulls | vars | data | dead | htid | some_tids
------------+-----------+---------+-------+------+-------------------------+------+--------+---------------------
1 | (16,1) | 16 | f | f | 30 00 00 00 00 00 00 00 | | |
2 | (16,8292) | 616 | f | f | 24 00 00 00 00 00 00 00 | f | (1,6) | {"(1,6)","(10,22)"}
3 | (16,8292) | 616 | f | f | 25 00 00 00 00 00 00 00 | f | (1,18) | {"(1,18)","(4,22)"}
4 | (16,8292) | 616 | f | f | 26 00 00 00 00 00 00 00 | f | (4,18) | {"(4,18)","(6,17)"}
5 | (16,8292) | 616 | f | f | 27 00 00 00 00 00 00 00 | f | (1,2) | {"(1,2)","(1,19)"}
6 | (16,8292) | 616 | f | f | 28 00 00 00 00 00 00 00 | f | (2,24) | {"(2,24)","(4,11)"}
7 | (16,8292) | 616 | f | f | 29 00 00 00 00 00 00 00 | f | (2,17) | {"(2,17)","(11,2)"}
8 | (16,8292) | 616 | f | f | 2a 00 00 00 00 00 00 00 | f | (0,25) | {"(0,25)","(3,20)"}
9 | (16,8292) | 616 | f | f | 2b 00 00 00 00 00 00 00 | f | (0,10) | {"(0,10)","(0,14)"}
10 | (16,8292) | 616 | f | f | 2c 00 00 00 00 00 00 00 | f | (1,3) | {"(1,3)","(3,9)"}
11 | (16,8292) | 616 | f | f | 2d 00 00 00 00 00 00 00 | f | (6,28) | {"(6,28)","(11,1)"}
12 | (16,8292) | 616 | f | f | 2e 00 00 00 00 00 00 00 | f | (0,27) | {"(0,27)","(1,13)"}
13 | (16,8292) | 616 | f | f | 2f 00 00 00 00 00 00 00 | f | (4,17) | {"(4,17)","(4,21)"}
(13 rows)
</screen>
In a B-tree leaf page, <structfield>ctid</structfield> points to a heap tuple.
In an internal page, the block number part of <structfield>ctid</structfield>
points to another page in the index itself, while the offset part
(the second number) is ignored and is usually 1.
This is a B-tree leaf page. All tuples that point to the table
happen to be posting list tuples (all of which store a total of
100 6 byte TIDs). There is also a <quote>high key</quote> tuple
at <literal>itemoffset</literal> number 1.
<structfield>ctid</structfield> is used to store encoded
information about each tuple in this example, though leaf page
tuples often store a heap TID directly in the
<structfield>ctid</structfield> field instead.
<structfield>tids</structfield> is the list of TIDs stored as a
posting list.
</para>
<para>
In an internal page (not shown), the block number part of
<structfield>ctid</structfield> is a <quote>downlink</quote>,
which is a block number of another page in the index itself.
The offset part (the second number) of
<structfield>ctid</structfield> stores encoded information about
the tuple, such as the number of columns present (suffix
truncation may have removed unneeded suffix columns). Truncated
columns are treated as having the value <quote>minus
infinity</quote>.
</para>
<para>
<structfield>htid</structfield> shows a heap TID for the tuple,
regardless of the underlying tuple representation. This value
may match <structfield>ctid</structfield>, or may be decoded
from the alternative representations used by posting list tuples
and tuples from internal pages. Tuples in internal pages
usually have the implementation level heap TID column truncated
away, which is represented as a NULL
<structfield>htid</structfield> value.
</para>
<para>
Note that the first item on any non-rightmost page (any page with
a non-zero value in the <structfield>btpo_next</structfield> field) is the
page's <quote>high key</quote>, meaning its <structfield>data</structfield>
serves as an upper bound on all items appearing on the page, while
its <structfield>ctid</structfield> field is meaningless. Also, on non-leaf
pages, the first real data item (the first item that is not a high
key) is a <quote>minus infinity</quote> item, with no actual value
in its <structfield>data</structfield> field. Such an item does have a valid
downlink in its <structfield>ctid</structfield> field, however.
its <structfield>ctid</structfield> field does not point to
another block. Also, on internal pages, the first real data
item (the first item that is not a high key) reliably has every
column truncated away, leaving no actual value in its
<structfield>data</structfield> field. Such an item does have a
valid downlink in its <structfield>ctid</structfield> field,
however.
</para>
<para>
For more details about the structure of B-tree indexes, see
<xref linkend="btree-structure"/>. For more details about
deduplication and posting lists, see <xref
linkend="btree-deduplication"/>.
</para>
</listitem>
</varlistentry>
@ -402,17 +445,24 @@ test=# SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
with <function>get_raw_page</function> should be passed as argument. So
the last example could also be rewritten like this:
<screen>
test=# SELECT * FROM bt_page_items(get_raw_page('pg_cast_oid_index', 1));
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------
1 | (0,1) | 12 | f | f | 23 27 00 00
2 | (0,2) | 12 | f | f | 24 27 00 00
3 | (0,3) | 12 | f | f | 25 27 00 00
4 | (0,4) | 12 | f | f | 26 27 00 00
5 | (0,5) | 12 | f | f | 27 27 00 00
6 | (0,6) | 12 | f | f | 28 27 00 00
7 | (0,7) | 12 | f | f | 29 27 00 00
8 | (0,8) | 12 | f | f | 2a 27 00 00
test=# SELECT itemoffset, ctid, itemlen, nulls, vars, data, dead, htid, tids[0:2] AS some_tids
FROM bt_page_items(get_raw_page('tenk2_hundred', 5));
itemoffset | ctid | itemlen | nulls | vars | data | dead | htid | some_tids
------------+-----------+---------+-------+------+-------------------------+------+--------+---------------------
1 | (16,1) | 16 | f | f | 30 00 00 00 00 00 00 00 | | |
2 | (16,8292) | 616 | f | f | 24 00 00 00 00 00 00 00 | f | (1,6) | {"(1,6)","(10,22)"}
3 | (16,8292) | 616 | f | f | 25 00 00 00 00 00 00 00 | f | (1,18) | {"(1,18)","(4,22)"}
4 | (16,8292) | 616 | f | f | 26 00 00 00 00 00 00 00 | f | (4,18) | {"(4,18)","(6,17)"}
5 | (16,8292) | 616 | f | f | 27 00 00 00 00 00 00 00 | f | (1,2) | {"(1,2)","(1,19)"}
6 | (16,8292) | 616 | f | f | 28 00 00 00 00 00 00 00 | f | (2,24) | {"(2,24)","(4,11)"}
7 | (16,8292) | 616 | f | f | 29 00 00 00 00 00 00 00 | f | (2,17) | {"(2,17)","(11,2)"}
8 | (16,8292) | 616 | f | f | 2a 00 00 00 00 00 00 00 | f | (0,25) | {"(0,25)","(3,20)"}
9 | (16,8292) | 616 | f | f | 2b 00 00 00 00 00 00 00 | f | (0,10) | {"(0,10)","(0,14)"}
10 | (16,8292) | 616 | f | f | 2c 00 00 00 00 00 00 00 | f | (1,3) | {"(1,3)","(3,9)"}
11 | (16,8292) | 616 | f | f | 2d 00 00 00 00 00 00 00 | f | (6,28) | {"(6,28)","(11,1)"}
12 | (16,8292) | 616 | f | f | 2e 00 00 00 00 00 00 00 | f | (0,27) | {"(0,27)","(1,13)"}
13 | (16,8292) | 616 | f | f | 2f 00 00 00 00 00 00 00 | f | (4,17) | {"(4,17)","(4,21)"}
(13 rows)
</screen>
All the other details are the same as explained in the previous item.
</para>