@ -1,3 +1,5 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/unaccent.sgml,v 1.6 2010/08/25 02:12:00 tgl Exp $ -->

<sect1 id="unaccent">
<title>unaccent</title>
@ -6,24 +8,24 @@
</indexterm>

<para>
<filename>unaccent</> is a text search dictionary that removes accents
(diacritic signs) from lexemes.
It is a filtering dictionary, which means its output is
always passed to the next dictionary (if any). This is unlike the normal
behavior of dictionaries, where the first dictionary that recognizes a
token ends its processing. This allows accent-insensitive processing
for full text search.
</para>

<para>
The current implementation of <filename>unaccent</> cannot be used as a
normalizing dictionary for the <filename>thesaurus</filename> dictionary.
</para>

<sect2>
<title>Configuration</title>

<para>
An <literal>unaccent</> dictionary accepts the following options:
</para>
<itemizedlist>
<listitem>
@ -43,23 +45,27 @@
<itemizedlist>
<listitem>
<para>
Each line represents a pair, consisting of a character with accent
followed by a character without accent. The first is translated into
the second. For example,
<programlisting>
À A
Á A
Â A
Ã A
Ä A
Å A
Æ A
</programlisting>
</para>
</listitem>
</itemizedlist>

<para>
A more complete example, which is directly useful for most European
languages, can be found in <filename>unaccent.rules</>, which is installed
in <filename>$SHAREDIR/tsearch_data/</> when the <filename>unaccent</>
module is installed.
</para>
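<para>
The following is a minimal sketch of a custom rules file. It is a hypothetical
<filename>my_rules.rules</>, placed in <filename>$SHAREDIR/tsearch_data/</>,
that extends the mapping to some lowercase letters. The file name and the
particular lines are illustrative assumptions, not the contents of the
shipped <filename>unaccent.rules</>. Each line follows the format
described above: the accented character, whitespace, then its
unaccented replacement.
<programlisting>
à a
á a
â a
ç c
è e
é e
ê e
</programlisting>
</para>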

</sect2>

@ -67,66 +73,66 @@
<title>Usage</title>

<para>
Running the installation script <filename>unaccent.sql</> creates a text
search template <literal>unaccent</> and a dictionary <literal>unaccent</>
based on it, with default parameters. You can alter the
parameters, for example

<programlisting>
mydb=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules');
</programlisting>

or create new dictionaries based on the template.
</para>

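<para>
The following is a minimal sketch of the second approach. It creates a separate
dictionary from the <literal>unaccent</> template instead of altering
the default one. The dictionary name <literal>my_unaccent</> is
hypothetical. The sketch assumes that a rules file
<filename>my_rules.rules</> exists in <filename>$SHAREDIR/tsearch_data/</>,
and that, as in the <command>ALTER</> example above, the
<literal>RULES</> value names that file without its
<filename>.rules</> suffix:
<programlisting>
mydb=# CREATE TEXT SEARCH DICTIONARY my_unaccent (
           TEMPLATE = unaccent,
           RULES = 'my_rules'
       );
mydb=# select ts_lexize('my_unaccent','Hôtel');
</programlisting>
</para>
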
<para>
To test the dictionary, you can try:
<programlisting>
mydb=# select ts_lexize('unaccent','Hôtel');
 ts_lexize
-----------
 {Hotel}
(1 row)
</programlisting>
</para>

<para>
Here is an example showing how to insert the
<filename>unaccent</> dictionary into a text search configuration:
<programlisting>
mydb=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
mydb=# ALTER TEXT SEARCH CONFIGURATION fr
        ALTER MAPPING FOR hword, hword_part, word
        WITH unaccent, french_stem;
mydb=# select to_tsvector('fr','Hôtels de la Mer');
    to_tsvector
-------------------
 'hotel':1 'mer':4
(1 row)

mydb=# select to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels');
 ?column?
----------
 t
(1 row)

mydb=# select ts_headline('fr','Hôtel de la Mer',to_tsquery('fr','Hotels'));
      ts_headline
------------------------
 &lt;b&gt;Hôtel&lt;/b&gt; de la Mer
(1 row)
</programlisting>
</para>
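<para>
After changing a configuration such as <literal>fr</>, you may want to confirm
that words now pass through <literal>unaccent</> before stemming. One
option is the built-in <function>ts_debug</> function, which reports
the dictionaries consulted for each token and the resulting lexemes.
The following is a sketch; its output depends on the installed dictionaries,
so it is not reproduced here:
<programlisting>
mydb=# select alias, token, dictionaries, lexemes
       from ts_debug('fr','Hôtels de la Mer');
</programlisting>
</para>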

</sect2>

<sect2>
<title>Functions</title>

<para>
The <function>unaccent()</> function removes accents (diacritic signs) from
a given string. It is essentially a wrapper around the
<filename>unaccent</> dictionary, but it can be used outside normal
text search contexts.
</para>

<indexterm>
@ -134,14 +140,14 @@
</indexterm>

<synopsis>
unaccent(<optional><replaceable class="PARAMETER">dictionary</replaceable>, </optional> <replaceable class="PARAMETER">string</replaceable>) returns <type>text</type>
</synopsis>

<para>
For example:
<programlisting>
SELECT unaccent('unaccent', 'Hôtel');
SELECT unaccent('Hôtel');
</programlisting>
</para>
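<para>
Because <function>unaccent()</> works on plain strings, it can also be
used for accent-insensitive comparisons in ordinary queries. The following
sketch assumes a hypothetical table <literal>hotels</> with a
<type>text</> column <literal>name</>. Both sides of the comparison are
passed through <function>unaccent()</> so that accented and unaccented
spellings compare equal:
<programlisting>
mydb=# CREATE TABLE hotels (name text);
mydb=# INSERT INTO hotels VALUES ('Hôtel de la Mer'), ('Hotel Central');
mydb=# select name from hotels
       where unaccent(name) = unaccent('Hotel de la Mer');
</programlisting>
</para>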

</sect2>