@@ -3192,7 +3192,7 @@ SELECT plainto_tsquery('supernovae stars');
 </sect1>
 
 <sect1 id="textsearch-indexes">
-<title>GiST and GIN Index Types</title>
+<title>GIN and GiST Index Types</title>
 
 <indexterm zone="textsearch-indexes">
 <primary>text search</primary>
@@ -3213,18 +3213,17 @@ SELECT plainto_tsquery('supernovae stars');
 <term>
 <indexterm zone="textsearch-indexes">
 <primary>index</primary>
-<secondary>GiST</secondary>
+<secondary>GIN</secondary>
 <tertiary>text search</tertiary>
 </indexterm>
 
-<literal>CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING GIST (<replaceable>column</replaceable>);</literal>
+<literal>CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING GIN (<replaceable>column</replaceable>);</literal>
 </term>
 
 <listitem>
 <para>
-Creates a GiST (Generalized Search Tree)-based index.
-The <replaceable>column</replaceable> can be of <type>tsvector</> or
-<type>tsquery</> type.
+Creates a GIN (Generalized Inverted Index)-based index.
+The <replaceable>column</replaceable> must be of <type>tsvector</> type.
 </para>
 </listitem>
 </varlistentry>
@@ -3234,17 +3233,18 @@ SELECT plainto_tsquery('supernovae stars');
 <term>
 <indexterm zone="textsearch-indexes">
 <primary>index</primary>
-<secondary>GIN</secondary>
+<secondary>GiST</secondary>
 <tertiary>text search</tertiary>
 </indexterm>
 
-<literal>CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING GIN (<replaceable>column</replaceable>);</literal>
+<literal>CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING GIST (<replaceable>column</replaceable>);</literal>
 </term>
 
 <listitem>
 <para>
-Creates a GIN (Generalized Inverted Index)-based index.
-The <replaceable>column</replaceable> must be of <type>tsvector</> type.
+Creates a GiST (Generalized Search Tree)-based index.
+The <replaceable>column</replaceable> can be of <type>tsvector</> or
+<type>tsquery</> type.
 </para>
 </listitem>
 </varlistentry>
@@ -3253,13 +3253,18 @@ SELECT plainto_tsquery('supernovae stars');
 </para>
 
 <para>
-There are substantial performance differences between the two index types,
-so it is important to understand their characteristics.
+GIN indexes are the preferred text search index type. As inverted
+indexes, they contain an index entry for each word (lexeme), with a
+compressed list of matching locations. Multi-word searches can find
+the first match, then use the index to remove rows that are lacking
+additional words. GIN indexes store only the words (lexemes) of
+<type>tsvector</> values, and not their weight labels. Thus a table
+row recheck is needed when using a query that involves weights.
 </para>
 
 <para>
 A GiST index is <firstterm>lossy</firstterm>, meaning that the index
-may produce false matches, and it is necessary
+might produce false matches, and it is necessary
 to check the actual table row to eliminate such false matches.
 (<productname>PostgreSQL</productname> does this automatically when needed.)
 GiST indexes are lossy because each document is represented in the
@@ -3280,53 +3285,6 @@ SELECT plainto_tsquery('supernovae stars');
 recommended.
 </para>
 
-<para>
-GIN indexes are not lossy for standard queries, but their performance
-depends logarithmically on the number of unique words.
-(However, GIN indexes store only the words (lexemes) of <type>tsvector</>
-values, and not their weight labels. Thus a table row recheck is needed
-when using a query that involves weights.)
-</para>
-
-<para>
-In choosing which index type to use, GiST or GIN, consider these
-performance differences:
-
-<itemizedlist spacing="compact" mark="bullet">
-<listitem>
-<para>
-GIN index lookups are about three times faster than GiST
-</para>
-</listitem>
-<listitem>
-<para>
-GIN indexes take about three times longer to build than GiST
-</para>
-</listitem>
-<listitem>
-<para>
-GIN indexes are moderately slower to update than GiST indexes, but
-about 10 times slower if fast-update support was disabled
-(see <xref linkend="gin-fast-update"> for details)
-</para>
-</listitem>
-<listitem>
-<para>
-GIN indexes are two-to-three times larger than GiST indexes
-</para>
-</listitem>
-</itemizedlist>
-</para>
-
-<para>
-As a rule of thumb, <acronym>GIN</acronym> indexes are best for static data
-because lookups are faster. For dynamic data, GiST indexes are
-faster to update. Specifically, <acronym>GiST</acronym> indexes are very
-good for dynamic data and fast if the number of unique words (lexemes) is
-under 100,000, while <acronym>GIN</acronym> indexes will handle 100,000+
-lexemes better but are slower to update.
-</para>
-
 <para>
 Note that <acronym>GIN</acronym> index build time can often be improved
 by increasing <xref linkend="guc-maintenance-work-mem">, while
@@ -3335,7 +3293,7 @@ SELECT plainto_tsquery('supernovae stars');
 </para>
 
 <para>
-Partitioning of big collections and the proper use of GiST and GIN indexes
+Partitioning of big collections and the proper use of GIN and GiST indexes
 allows the implementation of very fast searches with online update.
 Partitioning can be done at the database level using table inheritance,
 or by distributing documents over