mirror of https://github.com/postgres/postgres
parent
43da837eda
commit
ec2ff52045
@ -0,0 +1,78 @@ |
||||
<sect1 id="dict-int"> |
||||
<title>dict_int</title> |
||||
|
||||
<indexterm zone="dict-int"> |
||||
<primary>dict_int</primary> |
||||
</indexterm> |
||||
|
||||
<para> |
||||
The motivation for this example dictionary is to control the indexing of |
||||
integers (signed and unsigned), and, consequently, to minimize the number of |
||||
unique words which greatly affect the performance of searching. |
||||
</para> |
||||
|
||||
<sect2> |
||||
<title>Configuration</title> |
||||
<para> |
||||
The dictionary accepts two options: |
||||
</para> |
||||
|
||||
<itemizedlist> |
||||
<listitem> |
||||
<para> |
||||
The MAXLEN parameter specifies the maximum length (number of digits) |
||||
allowed in an integer word. The default value is 6. |
||||
</para> |
||||
</listitem> |
||||
<listitem> |
||||
<para> |
||||
The REJECTLONG parameter specifies if an overlength integer should be |
||||
truncated or ignored. If REJECTLONG=FALSE (default), the dictionary returns |
||||
the first MAXLEN digits of the integer. If REJECTLONG=TRUE, the |
||||
dictionary treats an overlength integer as a stop word, so that it will |
||||
not be indexed. |
||||
</para> |
||||
</listitem> |
||||
</itemizedlist> |
||||
</sect2> |
||||
|
||||
<sect2> |
||||
<title>Usage</title> |
||||
|
||||
<para> |
||||
Running the installation script creates a text search template |
||||
<literal>intdict_template</> and a dictionary <literal>intdict</> |
||||
based on it, with the default parameters. You can alter the |
||||
parameters, for example |
||||
|
||||
<programlisting> |
||||
mydb# ALTER TEXT SEARCH DICTIONARY intdict (MAXLEN = 4, REJECTLONG = true); |
||||
ALTER TEXT SEARCH DICTIONARY |
||||
</programlisting> |
||||
|
||||
or create new dictionaries based on the template. |
||||
</para> |
||||
|
||||
<para> |
||||
To test the dictionary, you can try |
||||
|
||||
<programlisting> |
||||
mydb# select ts_lexize('intdict', '12345678'); |
||||
ts_lexize |
||||
----------- |
||||
{123456} |
||||
</programlisting> |
||||
|
||||
but real-world usage will involve including it in a text search |
||||
configuration as described in <xref linkend="textsearch">. |
||||
That might look like this: |
||||
|
||||
<programlisting> |
||||
ALTER TEXT SEARCH CONFIGURATION english |
||||
ALTER MAPPING FOR int, uint WITH intdict; |
||||
</programlisting> |
||||
|
||||
</para> |
||||
</sect2> |
||||
|
||||
</sect1> |
||||
@ -0,0 +1,78 @@ |
||||
<sect1 id="dict-xsyn"> |
||||
<title>dict_xsyn</title> |
||||
|
||||
<indexterm zone="dict-xsyn"> |
||||
<primary>dict_xsyn</primary> |
||||
</indexterm> |
||||
|
||||
<para> |
||||
The Extended Synonym Dictionary module replaces words with groups of their |
||||
synonyms, and so makes it possible to search for a word using any of its |
||||
synonyms. |
||||
</para> |
||||
|
||||
<sect2> |
||||
<title>Configuration</title> |
||||
<para> |
||||
A <literal>dict_xsyn</> dictionary accepts the following options: |
||||
</para> |
||||
<itemizedlist> |
||||
<listitem> |
||||
<para> |
||||
KEEPORIG controls whether the original word is included, or only its |
||||
synonyms. Default is 'true'. |
||||
</para> |
||||
</listitem> |
||||
<listitem> |
||||
<para> |
||||
RULES is the base name of the file containing the list of synonyms. |
||||
This file must be in $(prefix)/share/tsearch_data/, and its name must |
||||
end in ".rules" (which is not included in the RULES parameter). |
||||
</para> |
||||
</listitem> |
||||
</itemizedlist> |
||||
<para> |
||||
The rules file has the following format: |
||||
</para> |
||||
<itemizedlist> |
||||
<listitem> |
||||
<para> |
||||
Each line represents a group of synonyms for a single word, which is |
||||
given first on the line. Synonyms are separated by whitespace: |
||||
</para> |
||||
<programlisting> |
||||
word syn1 syn2 syn3 |
||||
</programlisting> |
||||
</listitem> |
||||
<listitem> |
||||
<para> |
||||
Sharp ('#') sign is a comment delimiter. It may appear at any position |
||||
inside the line. The rest of the line will be skipped. |
||||
</para> |
||||
</listitem> |
||||
</itemizedlist> |
||||
|
||||
<para> |
||||
Look at xsyn_sample.rules, which is installed in $(prefix)/share/tsearch_data/, |
||||
for an example. |
||||
</para> |
||||
</sect2> |
||||
|
||||
<sect2> |
||||
<title>Usage</title> |
||||
<programlisting> |
||||
mydb=# SELECT ts_lexize('xsyn','word'); |
||||
ts_lexize |
||||
---------------- |
||||
{word,syn1,syn2,syn3) |
||||
</programlisting> |
||||
<para> |
||||
Change dictionary options: |
||||
</para> |
||||
<programlisting> |
||||
mydb# ALTER TEXT SEARCH DICTIONARY xsyn (KEEPORIG=false); |
||||
ALTER TEXT SEARCH DICTIONARY |
||||
</programlisting> |
||||
</sect2> |
||||
|
||||
</sect1> |
||||
Loading…
Reference in new issue