mirror of https://github.com/postgres/postgres
parent
333cbc2dab
commit
0ba77c14aa
@ -1,113 +0,0 @@

PostgreSQL Charsets README
Josef Balatka, <balatka@email.cz>
Draft v0.1, Tue Jul 20 15:49:07 CEST 1999

This document is a brief overview of the national charset support
implemented in PostgreSQL ver. 6.5. It describes the relevant compilation
options and gives setup tips for particular use cases.

---------------------------------------------------------------------------

Table of Contents

1. Locale awareness

2. Single-byte charsets recoding

3. Multi-byte support/recoding

4. Credits

---------------------------------------------------------------------------

1. Locale awareness

The PostgreSQL server supports two operational modes: locale-aware and
non-locale-aware (the default). You select the mode during the
configuration stage of the installation with the --enable-locale option.

If you don't use --enable-locale, the multi-language code will not be
compiled and PostgreSQL will behave as an ASCII-compliant application.
This mode is useful for its speed, but only if you don't have to
handle national-specific characters.

With --enable-locale you get a locale-aware server that uses the LC_*
environment variables to determine how to process national specifics.
In this case strcoll(3) and similar functions are used internally,
so speed is somewhat lower.

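To illustrate what locale-aware processing changes, here is a minimal
Python sketch (not PostgreSQL code) that sorts strings with the C library's
strcoll(), as exposed by Python's locale module. The helper name
sort_with_collation is made up for this example. Only the "C" locale is
used because it is always available; a national locale (for instance a
Czech one) has to be installed on the system before it can be selected.

```python
import functools
import locale


def sort_with_collation(words, locale_name):
    """Sort words with strcoll() under locale_name, then restore the old locale."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(words, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


words = ["b", "a", "B"]
c_order = sort_with_collation(words, "C")
print(c_order)
```

In the "C" locale the result is plain code order (uppercase before
lowercase), which roughly corresponds to a server built without
--enable-locale. Under a national locale the order would follow that
language's collation rules instead.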

Note that --enable-locale is sufficient when all your clients
use the same single-byte encoding as the database server does.

When your clients use an encoding different from the server's, you
additionally have to use either the --enable-recode or the
--with-mb=<encoding> option on the server side, or a particular client
that does the recoding itself (e.g. there exists a PostgreSQL ODBC driver
for Win32 that supports various Cyrillic encodings). The
--with-mb=<encoding> option is necessary for multi-byte charset support.


2. Single-byte charsets recoding

You can set up this feature with the --enable-recode option. This option
is described as 'enable Cyrillic recode support', which doesn't express
all its power. It can be used for *any* single-byte charset recoding.

This method uses the charset.conf file located in the $PGDATA directory.
It's a typical configuration text file where spaces and newlines
separate items and records and # specifies comments. Three keywords
with the following syntax are recognized here:

    BaseCharset <server_charset>
    RecodeTable <from_charset> <to_charset> <file_name>
    HostCharset <host_spec> <host_charset>

BaseCharset defines the encoding of the database server. All charset
names are only used for mapping inside charset.conf, so you can
freely use typing-friendly names.

RecodeTable records specify a translation table between server and client.
The file name is relative to the $PGDATA directory. The table file format
is very simple. There are no keywords, and each character mapping is
represented by a pair of decimal or hexadecimal (0x-prefixed) values on a
single line:

    <char_value> <translated_char_value>

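The table format described above can be read with a few lines of code. The
following Python sketch is only an illustration, not the parser PostgreSQL
uses. The function names and the sample mappings are made up, and two
behaviours are assumptions: bytes not listed in the table map to
themselves, and '#' starts a comment in table files as it does in
charset.conf.

```python
def parse_value(token):
    """Parse one table value: decimal ("193") or 0x-prefixed hex ("0xC1")."""
    if token.lower().startswith("0x"):
        return int(token, 16)
    return int(token, 10)


def parse_recode_table(text):
    """Return a 256-byte translation table built from recode-table text."""
    table = list(range(256))  # assumption: unlisted bytes map to themselves
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        source, target = (parse_value(tok) for tok in line.split())
        table[source] = target
    return bytes(table)


# Made-up sample mappings, not taken from any real charset table.
SAMPLE_TABLE = """\
0xE1 0xC0
226 193
"""

table = parse_recode_table(SAMPLE_TABLE)
recoded = bytes([0xE1, 0xE2, 0x41]).translate(table)
print(recoded.hex())  # c0c141
```

Because every value is a single byte, the whole table fits into a 256-entry
lookup array, which is why this approach cannot handle multi-byte charsets.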

HostCharset records define a client IP address and its charset. You can use
a single IP address, an IP mask range starting from the given address, or an
IP interval (e.g. 127.0.0.1, 192.168.1.100/24, 192.168.1.20-192.168.1.40).

The charset.conf file is always processed up to the end, so you can easily
specify exceptions to the previous rules. In src/data you will
find an example charset.conf and a few recoding tables.

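The host matching and the "processed up to the end" rule can be sketched in
Python as follows. This is a hedged illustration, not the server's code.
The function names, the charset names and the rule list are hypothetical.
The /24 form is treated as an ordinary CIDR prefix, which is an assumption:
the wording "starting from the given address" may describe a different
range. Reading "processed up to the end" as "the last matching record wins"
is likewise an interpretation.

```python
import ipaddress


def host_matches(spec, addr):
    """Check addr against a single address, an a.b.c.d/n mask or an a-b interval."""
    ip = ipaddress.ip_address(addr)
    if "-" in spec:
        low, high = (ipaddress.ip_address(part) for part in spec.split("-", 1))
        return low <= ip <= high
    if "/" in spec:
        # assumption: plain CIDR semantics; host bits in the spec are ignored
        return ip in ipaddress.ip_network(spec, strict=False)
    return ip == ipaddress.ip_address(spec)


def charset_for(rules, addr, default):
    """Examine every rule in order; a later match overrides an earlier one."""
    chosen = default
    for spec, charset in rules:
        if host_matches(spec, addr):
            chosen = charset
    return chosen


# Hypothetical HostCharset records: a general rule followed by an exception.
RULES = [
    ("192.168.1.100/24", "win"),
    ("192.168.1.20-192.168.1.40", "koi"),
    ("127.0.0.1", "iso"),
]

print(charset_for(RULES, "192.168.1.30", "iso"))   # koi
print(charset_for(RULES, "192.168.1.200", "iso"))  # win
```

Putting the broad rule first and the narrower interval after it is what
makes the exception take effect under this reading.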

As this solution is based on a mapping from the client's IP address to a
charset, there are obviously some restrictions as well. You can't use
different encodings on the same host at the same time. It's also
inconvenient when you boot your client hosts into several operating systems.
Nevertheless, when these restrictions are not limiting and you don't
need multi-byte chars, then it's a simple and effective solution.


3. Multi-byte support/recoding

It's a new generation of charset encoding in PostgreSQL, designed as a
more comprehensive solution supporting both single-byte and multi-byte chars.
You can set up this feature with the --with-mb=<encoding> option.

There is no IP mapping file, and recoding is controlled through new
SQL statements. Recoding tables are included in the code. Many national
charsets are already supported and more will follow.

See doc/README.mb and doc/README.mb.jp for detailed instructions on how
to use the multibyte support. The file doc/README.locale contains
specific instructions on using the multibyte support with Cyrillic.


4. Credits

I'd like to thank the PostgreSQL development team and all contributors
for creating PostgreSQL. Thanks to Oleg Bartunov, Oleg Broytmann and
Tatsuo Ishii for opening the door into the multi-language world.

@ -1,107 +0,0 @@

===========
1999 Jul 21
===========

Josef Balatka, <balatka@email.cz> asked us not to remove RECODE and sent me
a Czech ISO-8859-2 -> WIN-1250 translation table.
RECODE no longer contains just Cyrillic recoding and will stay in
PostgreSQL.

He also created some bits of documentation, mostly concerning RECODE -
see README.Charsets.


===========
1999 Apr 14
===========

Tatsuo Ishii <t-ishii@sra.co.jp> updated Multibyte support, extending it
to Cyrillic. Now PostgreSQL supports the KOI8-R, WIN-1251, ISO8859-5
and CP866 (ALT) encodings.

A short instruction on using this feature follows. A longer discussion of
Multibyte support is in README.mb.

WARNING! Now, with Multibyte support, Cyrillic RECODE is declared obsolete
and will be removed from Postgres. If you are using RECODE, consider
switching to Multibyte support.

Instructions on how to prepare Postgres for Cyrillic Multibyte support.
----------------------------------------------------------------------

First, you need to back up all your databases. I recommend backing up the
entire Postgres directory, including binaries and libraries, so that you can
easily restore it if something goes wrong.

Dump your data: pg_dumpall > dump.db

Stop postmaster.

Configure, compile and install Postgres. (I'll mostly talk about the KOI8-R
encoding; this is just to make the examples a little clearer. You can use
any supported encoding.)

    cd src
    ./configure --enable-locale --with-mb=KOI8
    make
    make install

Make sure you've backed up your databases. Double-check your backup. I
really mean it - make regular backups and test them from time to time with
a trial restore.

Remove your data directory (better, rename or move it).

Run initdb, specifying your primary encoding: initdb -e KOI8. If you omit
the encoding, the primary encoding from configure will be used.

Start postmaster.

Create databases: createdb -e KOI8. Again, you can omit the encoding - the
default encoding will be used. You are not forced to use the same encoding
for all your databases - you can create different databases with different
encodings.

Load your data from the dump you've created: psql < dump.db

That's all! Now you are ready to enjoy the full power of Multibyte
support.

To use Multibyte support you do not need to do anything special - just
execute your queries. If the client program does not set an encoding, it
will get the data in the database encoding. But a client may ask Postgres to
do automatic server-to-client and client-to-server conversions. There are
2 (two) ways for a client program to declare its encoding:
    1) the client explicitly executes the query SET CLIENT_ENCODING TO 'win';
    2) the client is started with an environment variable set. Examples -
        using sh syntax:
            PGCLIENTENCODING='win'; export PGCLIENTENCODING
        using csh syntax:
            setenv PGCLIENTENCODING 'win'

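What the automatic conversion amounts to can be shown outside the server
with Python's built-in codecs. This is only a sketch of the idea, not of
the server's implementation. The sample word and the helper name recode are
made up, and the mapping of 'win' to WIN-1251 (Python codec cp1251) is an
assumption.

```python
TEXT = "Привет"

# Encodings named in this document, mapped to Python codec names.
CODECS = {
    "KOI8-R": "koi8_r",
    "WIN-1251": "cp1251",
    "ISO8859-5": "iso8859_5",
    "CP866 (ALT)": "cp866",
}


def recode(data, from_codec, to_codec):
    """Convert bytes from one single-byte encoding to another."""
    return data.decode(from_codec).encode(to_codec)


# Bytes as a KOI8-R database would hold them.
stored = TEXT.encode("koi8_r")
# Bytes as a client that declared 'win' would receive them (assumed WIN-1251).
for_client = recode(stored, "koi8_r", "cp1251")

# The same text has a different byte sequence in each encoding.
for name, codec in CODECS.items():
    print(name, TEXT.encode(codec).hex())
```

The text stays the same while its bytes differ per encoding, which is why a
client that does not declare its encoding simply receives the database's
bytes unchanged.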

Setting PGCLIENTENCODING even if you use the same client encoding as the
database avoids the overhead of asking for the database encoding while
initiating the connection, so it is a good idea to set it in any case.

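When the client is launched from a script rather than an interactive shell,
the variable can be placed in the child's environment. A minimal Python
sketch, in which the helper name client_environment and the database name
mydb are hypothetical:

```python
import os


def client_environment(encoding, base=None):
    """Return a copy of the environment with PGCLIENTENCODING set."""
    env = dict(os.environ if base is None else base)
    env["PGCLIENTENCODING"] = encoding
    return env


env = client_environment("win")
# The result can be passed to a child process, for example:
#   subprocess.run(["psql", "mydb"], env=env)   # "mydb" is a hypothetical database
```

Working on a copy keeps the setting local to the launched client instead of
changing the environment of the calling program.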

Now you may run the test suite and see Multibyte support in action. Go to
.../src/test/locale and run
    make clean all test-koi2win


===========
1998 Nov 20
===========

I extended locale support, originally written by Oleg Bartunov
<oleg@sai.msu.su>. Now ORDER BY (if PostgreSQL is configured with
--enable-locale) uses strcoll() for all text fields: char(n), varchar(n),
text.

I included the test suite .../src/test/locale. I didn't include it in
the regression tests because not many people require locale support. Read
.../src/test/locale/README for details on the test suite.

Many thanks to Oleg Bartunov (oleg@sai.msu.su) and Thomas G. Lockhart
(lockhart@alumni.caltech.edu) for hints, tips, help and discussion.

Oleg.