mirror of https://github.com/postgres/postgres
and for src/data directories, and one minor patch for doc/README.locale. Please apply. Oleg.
REL7_0_PATCHES
parent c5d0a1bc42
commit 972124091d
@@ -0,0 +1,113 @@

PostgreSQL Charsets README
Josef Balatka, <balatka@email.cz>
Draft v0.1, Tue Jul 20 15:49:07 CEST 1999

This document is a brief overview of the national charset support
implemented in PostgreSQL ver. 6.5. It describes the relevant
compilation options and gives setup tips for particular uses.

---------------------------------------------------------------------------

Table of Contents

1. Locale awareness

2. Single-byte charsets recoding

3. Multi-byte support/recoding

4. Credits

---------------------------------------------------------------------------

1. Locale awareness

The PostgreSQL server supports both locale-aware and non-locale-aware
(default) operational modes. You select the mode at the configuration
stage of the installation with the --enable-locale option.

If you don't use --enable-locale, the multi-language code will not be
compiled and PostgreSQL will behave as an ASCII-compliant application.
This mode is faster, but it is suitable only if you don't have to
handle national-specific chars.

With --enable-locale you get a locale-aware server that uses the LC_*
environment variables to determine how to process national specifics.
In this case strcoll(3) and similar functions are used internally,
so speed is somewhat lower.

Note that --enable-locale is sufficient when all your clients use the
same single-byte encoding as the database server.

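The difference between the two modes can be illustrated outside the
server. The following Python sketch is not PostgreSQL code, and its
function names are hypothetical. It contrasts plain code-point ordering
with ordering through strcoll(3), which Python exposes as
locale.strcoll. Which locale names are installed depends on the system,
so the example only relies on the "C" locale.

```python
import functools
import locale


def sort_plain(words):
    """Order strings by code point, with no locale involved."""
    return sorted(words)


def sort_with_locale(words, locale_name):
    """Order strings with strcoll(3) under the given LC_COLLATE locale."""
    previous = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        # Raises locale.Error if locale_name is not installed on this system.
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(words, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore it


if __name__ == "__main__":
    words = ["b", "a", "C"]
    print(sort_plain(words))
    print(sort_with_locale(words, "C"))
```

Under the "C" locale both functions give the same order. With a national
locale (if one is installed) the second function may order accented and
mixed-case words differently, which is the behaviour --enable-locale
buys at the cost of slower comparisons.
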
When your clients use an encoding different from the server's, you
also have to use either the --enable-recode or --with-mb=<encoding>
option on the server side, or a client that does the recoding itself
(e.g. there is a PostgreSQL ODBC driver for Win32 that supports
various Cyrillic encodings). The --with-mb=<encoding> option is
necessary for multi-byte charset support.


2. Single-byte charsets recoding

You can set up this feature with the --enable-recode option. The option
is described as 'enable Cyrillic recode support', which doesn't express
all its power: it can be used for *any* single-byte charset recoding.

This method uses the charset.conf file located in the $PGDATA directory.
It's a typical text configuration file in which spaces and newlines
separate items and records, and # introduces a comment. Three keywords
with the following syntax are recognized:

    BaseCharset <server_charset>
    RecodeTable <from_charset> <to_charset> <file_name>
    HostCharset <host_spec> <host_charset>

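The record syntax above can be sketched as a small Python parser. This
is only an illustration, not the server's actual code: the function name
parse_charset_conf, the one-record-per-line layout, the case-sensitive
keywords, and the sample charset and file names are all assumptions.

```python
# Number of arguments each keyword takes, per the syntax listed above.
KEYWORD_ARITY = {"BaseCharset": 1, "RecodeTable": 3, "HostCharset": 2}


def parse_charset_conf(text):
    """Return a list of (keyword, [arguments]) tuples in file order."""
    records = []
    for line_no, raw_line in enumerate(text.splitlines(), start=1):
        # Drop everything after '#', then split the rest on whitespace.
        fields = raw_line.split("#", 1)[0].split()
        if not fields:
            continue  # blank or comment-only line
        keyword, args = fields[0], fields[1:]
        if keyword not in KEYWORD_ARITY:
            raise ValueError(f"line {line_no}: unknown keyword {keyword!r}")
        if len(args) != KEYWORD_ARITY[keyword]:
            raise ValueError(
                f"line {line_no}: {keyword} takes "
                f"{KEYWORD_ARITY[keyword]} argument(s), got {len(args)}"
            )
        records.append((keyword, args))
    return records


# Hypothetical example file; "iso", "win" and "iso-win.tab" are made-up names.
SAMPLE = """\
# server stores data in ISO-8859-2
BaseCharset iso
RecodeTable iso win iso-win.tab
HostCharset 192.168.1.20-192.168.1.40 win   # Windows clients
"""

if __name__ == "__main__":
    for keyword, args in parse_charset_conf(SAMPLE):
        print(keyword, args)
```

Keeping the records in file order matters because later records are
meant to be able to override earlier ones.
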
BaseCharset defines the encoding of the database server. Charset names
are used only for mapping inside charset.conf, so you can freely choose
names that are easy to type.

RecodeTable records specify the translation table between server and
client. The file name is relative to the $PGDATA directory. The table
file format is very simple: there are no keywords, and each line holds
a pair of decimal or hexadecimal (0x-prefixed) character values:

    <char_value> <translated_char_value>

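As a rough illustration of the table format, the Python sketch below
reads such pairs into a 256-entry translation table and applies it to a
byte string. It is not the server's implementation. The function names
are hypothetical, and two things are assumed rather than stated in this
README: bytes not listed in the table are passed through unchanged, and
# starts a comment in table files (as in the Czech table shipped with
this patch).

```python
def parse_value(field):
    """Parse one table value: decimal, or hexadecimal with a 0x prefix."""
    if field.lower().startswith("0x"):
        return int(field, 16)
    return int(field, 10)


def parse_recode_table(text):
    """Build a 256-byte translation table from recode table text."""
    table = bytearray(range(256))  # assumption: unlisted bytes stay as they are
    for line_no, raw_line in enumerate(text.splitlines(), start=1):
        fields = raw_line.split("#", 1)[0].split()  # drop comments
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected two values")
        source, target = parse_value(fields[0]), parse_value(fields[1])
        if not (0 <= source <= 255 and 0 <= target <= 255):
            raise ValueError(f"line {line_no}: value out of single-byte range")
        table[source] = target
    return bytes(table)


# The Czech ISO-8859-2 -> WIN-1250 table added by this patch.
CZECH_ISO_TO_WIN = """\
# Czech ISO-8859-2 -> WIN-1250 translation table
165 188
169 138
171 141
174 142
181 190
185 154
187 157
190 158
"""

if __name__ == "__main__":
    table = parse_recode_table(CZECH_ISO_TO_WIN)
    server_bytes = bytes([169, 65])          # 169 is listed, 65 ("A") is not
    print(list(server_bytes.translate(table)))
```

bytes.translate does the per-byte lookup in one call, which matches the
idea of a fixed single-byte recoding table.
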
HostCharset records map an IP address specification to a charset. You
can use a single IP address, an IP mask range starting from the given
address, or an IP interval (e.g. 127.0.0.1, 192.168.1.100/24,
192.168.1.20-192.168.1.40).

The charset.conf file is always processed to the end, so you can easily
specify exceptions to earlier rules. In src/data you will find an
example charset.conf and a few recoding tables.

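The matching rules can be sketched in Python as follows. This is an
illustration under stated assumptions, not the server's code: the
function names are hypothetical, an address/NN spec is treated as an
ordinary CIDR network containing the given address (the server's exact
interpretation of the mask may differ), and "processed to the end" is
read as "the last matching record wins".

```python
import ipaddress


def spec_matches(host_spec, client):
    """Return True if the client address falls under the host spec."""
    addr = ipaddress.ip_address(client)
    if "-" in host_spec:
        # IP interval: low-high, both ends inclusive.
        low, high = (ipaddress.ip_address(p.strip())
                     for p in host_spec.split("-", 1))
        return low <= addr <= high
    if "/" in host_spec:
        # Assumption: /NN is a CIDR prefix length; host bits are ignored.
        return addr in ipaddress.ip_network(host_spec, strict=False)
    return addr == ipaddress.ip_address(host_spec)


def charset_for_host(rules, client, default=None):
    """Scan all (host_spec, charset) rules; later matches override earlier ones."""
    chosen = default
    for host_spec, charset in rules:
        if spec_matches(host_spec, client):
            chosen = charset
    return chosen


# Hypothetical rules; the charset names are made up.
RULES = [
    ("192.168.1.100/24", "win"),           # whole subnet
    ("192.168.1.20-192.168.1.40", "koi"),  # exception for part of it
    ("127.0.0.1", "iso"),
]

if __name__ == "__main__":
    for client in ("192.168.1.30", "192.168.1.200", "127.0.0.1", "10.0.0.1"):
        print(client, charset_for_host(RULES, client))
```

Putting the broad rule first and the narrower interval after it shows
how an exception to an earlier rule can be expressed.
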
As this solution is based on mapping the client's IP address to a
charset, there are obviously some restrictions as well. You can't use
different encodings on the same host at the same time. It's also
inconvenient when you boot your client hosts into several operating
systems. Nevertheless, when these restrictions are not limiting and you
don't need multi-byte chars, it's a simple and effective solution.


3. Multi-byte support/recoding

This is a new generation of charset encoding support in PostgreSQL,
designed as a more comprehensive solution that handles both single-byte
and multi-byte chars. You can set up this feature with the
--with-mb=<encoding> option.

There is no IP mapping file; recoding is controlled through new SQL
statements. Recoding tables are included in the code. Many national
charsets are already supported and more will follow.

See doc/README.mb and doc/README.mb.jp for detailed instructions on how
to use the multibyte support. The file doc/README.locale contains
specific instructions on using the multibyte support with Cyrillic.


4. Credits

I'd like to thank the PostgreSQL development team and all contributors
for creating PostgreSQL. Thanks to Oleg Bartunov, Oleg Broytmann and
Tatsuo Ishii for opening the door into the multi-language world.

@@ -0,0 +1,12 @@
#
# Czech ISO-8859-2 -> WIN-1250 translation table
#
165 188
169 138
171 141
174 142
181 190
185 154
187 157
190 158
