mirror of https://github.com/postgres/postgres
Up to now, useful info for writing a new btree opclass has been buried in the backend's nbtree/README file. Let's move it into the SGML docs, in preparation for extending it with info about "in_range" functions in the upcoming window RANGE patch. To do this, I chose to create a new chapter for btree indexes in Part VII (Internals), parallel to the chapters that exist for the newer index AMs. This is a pretty short chapter as-is. At some point somebody might care to flesh it out with more detail about btree internals, but that is beyond the scope of my ambition for today. Discussion: https://postgr.es/m/23141.1517874668@sss.pgh.pa.uspull/31/merge
parent
f069c91a57
commit
3785f7eee3
@ -0,0 +1,267 @@ |
||||
<!-- doc/src/sgml/btree.sgml --> |
||||
|
||||
<chapter id="btree"> |
||||
<title>B-Tree Indexes</title> |
||||
|
||||
<indexterm> |
||||
<primary>index</primary> |
||||
<secondary>B-Tree</secondary> |
||||
</indexterm> |
||||
|
||||
<sect1 id="btree-intro"> |
||||
<title>Introduction</title> |
||||
|
||||
<para> |
||||
<productname>PostgreSQL</productname> includes an implementation of the |
||||
standard <acronym>btree</acronym> (multi-way binary tree) index data |
||||
structure. Any data type that can be sorted into a well-defined linear |
||||
order can be indexed by a btree index. The only limitation is that an |
||||
index entry cannot exceed approximately one-third of a page (after TOAST |
||||
compression, if applicable). |
||||
</para> |
||||
|
||||
<para> |
||||
Because each btree operator class imposes a sort order on its data type, |
||||
btree operator classes (or, really, operator families) have come to be |
||||
used as <productname>PostgreSQL</productname>'s general representation |
||||
and understanding of sorting semantics. Therefore, they've acquired |
||||
some features that go beyond what would be needed just to support btree |
||||
indexes, and parts of the system that are quite distant from the |
||||
btree AM make use of them. |
||||
</para> |
||||
|
||||
</sect1> |
||||
|
||||
<sect1 id="btree-behavior"> |
||||
<title>Behavior of B-Tree Operator Classes</title> |
||||
|
||||
<para> |
||||
As shown in <xref linkend="xindex-btree-strat-table"/>, a btree operator |
||||
class must provide five comparison operators, |
||||
<literal><</literal>, |
||||
<literal><=</literal>, |
||||
<literal>=</literal>, |
||||
<literal>>=</literal> and |
||||
<literal>></literal>. |
||||
One might expect that <literal><></literal> should also be part of |
||||
the operator class, but it is not, because it would almost never be |
||||
useful to use a <literal><></literal> WHERE clause in an index |
||||
search. (For some purposes, the planner treats <literal><></literal> |
||||
as associated with a btree operator class; but it finds that operator via |
||||
the <literal>=</literal> operator's negator link, rather than |
||||
from <structname>pg_amop</structname>.) |
||||
</para> |
||||
|
||||
<para> |
||||
When several data types share near-identical sorting semantics, their |
||||
operator classes can be grouped into an operator family. Doing so is |
||||
advantageous because it allows the planner to make deductions about |
||||
cross-type comparisons. Each operator class within the family should |
||||
contain the single-type operators (and associated support functions) |
||||
for its input data type, while cross-type comparison operators and |
||||
support functions are <quote>loose</quote> in the family. It is |
||||
recommendable that a complete set of cross-type operators be included |
||||
in the family, thus ensuring that the planner can represent any |
||||
comparison conditions that it deduces from transitivity. |
||||
</para> |
||||
|
||||
<para> |
||||
There are some basic assumptions that a btree operator family must |
||||
satisfy: |
||||
</para> |
||||
|
||||
<itemizedlist> |
||||
<listitem> |
||||
<para> |
||||
An <literal>=</literal> operator must be an equivalence relation; that |
||||
is, for all non-null values <replaceable>A</replaceable>, |
||||
<replaceable>B</replaceable>, <replaceable>C</replaceable> of the |
||||
data type: |
||||
|
||||
<itemizedlist> |
||||
<listitem> |
||||
<para> |
||||
<replaceable>A</replaceable> <literal>=</literal> |
||||
<replaceable>A</replaceable> is true |
||||
(<firstterm>reflexive law</firstterm>) |
||||
</para> |
||||
</listitem> |
||||
<listitem> |
||||
<para> |
||||
if <replaceable>A</replaceable> <literal>=</literal> |
||||
<replaceable>B</replaceable>, |
||||
then <replaceable>B</replaceable> <literal>=</literal> |
||||
<replaceable>A</replaceable> |
||||
(<firstterm>symmetric law</firstterm>) |
||||
</para> |
||||
</listitem> |
||||
<listitem> |
||||
<para> |
||||
if <replaceable>A</replaceable> <literal>=</literal> |
||||
<replaceable>B</replaceable> and <replaceable>B</replaceable> |
||||
<literal>=</literal> <replaceable>C</replaceable>, |
||||
then <replaceable>A</replaceable> <literal>=</literal> |
||||
<replaceable>C</replaceable> |
||||
(<firstterm>transitive law</firstterm>) |
||||
</para> |
||||
</listitem> |
||||
</itemizedlist> |
||||
</para> |
||||
</listitem> |
||||
|
||||
<listitem> |
||||
<para> |
||||
A <literal><</literal> operator must be a strong ordering relation; |
||||
that is, for all non-null values <replaceable>A</replaceable>, |
||||
<replaceable>B</replaceable>, <replaceable>C</replaceable>: |
||||
|
||||
<itemizedlist> |
||||
<listitem> |
||||
<para> |
||||
<replaceable>A</replaceable> <literal><</literal> |
||||
<replaceable>A</replaceable> is false |
||||
(<firstterm>irreflexive law</firstterm>) |
||||
</para> |
||||
</listitem> |
||||
<listitem> |
||||
<para> |
||||
if <replaceable>A</replaceable> <literal><</literal> |
||||
<replaceable>B</replaceable> |
||||
and <replaceable>B</replaceable> <literal><</literal> |
||||
<replaceable>C</replaceable>, |
||||
then <replaceable>A</replaceable> <literal><</literal> |
||||
<replaceable>C</replaceable> |
||||
(<firstterm>transitive law</firstterm>) |
||||
</para> |
||||
</listitem> |
||||
</itemizedlist> |
||||
</para> |
||||
</listitem> |
||||
|
||||
<listitem> |
||||
<para> |
||||
Furthermore, the ordering is total; that is, for all non-null |
||||
values <replaceable>A</replaceable>, <replaceable>B</replaceable>: |
||||
|
||||
<itemizedlist> |
||||
<listitem> |
||||
<para> |
||||
exactly one of <replaceable>A</replaceable> <literal><</literal> |
||||
<replaceable>B</replaceable>, <replaceable>A</replaceable> |
||||
<literal>=</literal> <replaceable>B</replaceable>, and |
||||
<replaceable>B</replaceable> <literal><</literal> |
||||
<replaceable>A</replaceable> is true |
||||
(<firstterm>trichotomy law</firstterm>) |
||||
</para> |
||||
</listitem> |
||||
</itemizedlist> |
||||
|
||||
(The trichotomy law justifies the definition of the comparison support |
||||
function, of course.) |
||||
</para> |
||||
</listitem> |
||||
</itemizedlist> |
||||
|
||||
<para> |
||||
The other three operators are defined in terms of <literal>=</literal> |
||||
and <literal><</literal> in the obvious way, and must act consistently |
||||
with them. |
||||
</para> |
||||
|
||||
<para> |
||||
For an operator family supporting multiple data types, the above laws must |
||||
hold when <replaceable>A</replaceable>, <replaceable>B</replaceable>, |
||||
<replaceable>C</replaceable> are taken from any data types in the family. |
||||
The transitive laws are the trickiest to ensure, as in cross-type |
||||
situations they represent statements that the behaviors of two or three |
||||
different operators are consistent. |
||||
As an example, it would not work to put <type>float8</type> |
||||
and <type>numeric</type> into the same operator family, at least not with |
||||
the current semantics that <type>numeric</type> values are converted |
||||
to <type>float8</type> for comparison to a <type>float8</type>. Because |
||||
of the limited accuracy of <type>float8</type>, this means there are |
||||
distinct <type>numeric</type> values that will compare equal to the |
||||
same <type>float8</type> value, and thus the transitive law would fail. |
||||
</para> |
||||
|
||||
<para> |
||||
Another requirement for a multiple-data-type family is that any implicit |
||||
or binary-coercion casts that are defined between data types included in |
||||
the operator family must not change the associated sort ordering. |
||||
</para> |
||||
|
||||
<para> |
||||
It should be fairly clear why a btree index requires these laws to hold |
||||
within a single data type: without them there is no ordering to arrange |
||||
the keys with. Also, index searches using a comparison key of a |
||||
different data type require comparisons to behave sanely across two |
||||
data types. The extensions to three or more data types within a family |
||||
are not strictly required by the btree index mechanism itself, but the |
||||
planner relies on them for optimization purposes. |
||||
</para> |
||||
|
||||
</sect1> |
||||
|
||||
<sect1 id="btree-support-funcs"> |
||||
<title>B-Tree Support Functions</title> |
||||
|
||||
<para> |
||||
As shown in <xref linkend="xindex-btree-support-table"/>, btree defines |
||||
one required and one optional support function. |
||||
</para> |
||||
|
||||
<para> |
||||
For each combination of data types that a btree operator family provides |
||||
comparison operators for, it must provide a comparison support function, |
||||
registered in <structname>pg_amproc</structname> with support function |
||||
number 1 and |
||||
<structfield>amproclefttype</structfield>/<structfield>amprocrighttype</structfield> |
||||
equal to the left and right data types for the comparison (i.e., the |
||||
same data types that the matching operators are registered with |
||||
in <structname>pg_amop</structname>). |
||||
The comparison function must take two non-null values |
||||
<replaceable>A</replaceable> and <replaceable>B</replaceable> and |
||||
return an <type>int32</type> value that |
||||
is <literal><</literal> <literal>0</literal>, <literal>0</literal>, |
||||
or <literal>></literal> <literal>0</literal> |
||||
when <replaceable>A</replaceable> <literal><</literal> |
||||
<replaceable>B</replaceable>, <replaceable>A</replaceable> |
||||
<literal>=</literal> <replaceable>B</replaceable>, |
||||
or <replaceable>A</replaceable> <literal>></literal> |
||||
<replaceable>B</replaceable>, respectively. The function must not |
||||
return <literal>INT_MIN</literal> for the <replaceable>A</replaceable> |
||||
<literal><</literal> <replaceable>B</replaceable> case, |
||||
since the value may be negated before being tested for sign. A null |
||||
result is disallowed, too. |
||||
See <filename>src/backend/access/nbtree/nbtcompare.c</filename> for |
||||
examples. |
||||
</para> |
||||
|
||||
<para> |
||||
If the compared values are of a collatable data type, the appropriate |
||||
collation OID will be passed to the comparison support function, using |
||||
the standard <function>PG_GET_COLLATION()</function> mechanism. |
||||
</para> |
||||
|
||||
<para> |
||||
Optionally, a btree operator family may provide <firstterm>sort |
||||
support</firstterm> function(s), registered under support function number |
||||
2. These functions allow implementing comparisons for sorting purposes |
||||
in a more efficient way than naively calling the comparison support |
||||
function. The APIs involved in this are defined in |
||||
<filename>src/include/utils/sortsupport.h</filename>. |
||||
</para> |
||||
|
||||
</sect1> |
||||
|
||||
<sect1 id="btree-implementation"> |
||||
<title>Implementation</title> |
||||
|
||||
<para> |
||||
An introduction to the btree index implementation can be found in |
||||
<filename>src/backend/access/nbtree/README</filename>. |
||||
</para> |
||||
|
||||
</sect1> |
||||
|
||||
</chapter> |
||||
Loading…
Reference in new issue