mirror of https://github.com/postgres/postgres
parent
333cbc2dab
commit
0ba77c14aa
@ -1,113 +0,0 @@ |
|||||||
|
|
||||||
PostgreSQL Charsets README |
|
||||||
Josef Balatka, <balatka@email.cz> |
|
||||||
Draft v0.1, Tue Jul 20 15:49:07 CEST 1999 |
|
||||||
|
|
||||||
This document is a brief overview of the national charset support
implemented in PostgreSQL ver. 6.5. It covers the relevant compilation
options and gives setup tips for particular use cases.
|
||||||
|
|
||||||
--------------------------------------------------------------------------- |
|
||||||
|
|
||||||
Table of Contents

1. Locale awareness

2. Single-byte charsets recoding

3. Multi-byte support/recoding

4. Credits
|
||||||
|
|
||||||
--------------------------------------------------------------------------- |
|
||||||
|
|
||||||
1. Locale awareness |
|
||||||
|
|
||||||
The PostgreSQL server supports both locale-aware and locale-unaware
(default) modes of operation. You select the mode at the configuration
stage of the installation with the --enable-locale option.

If you don't use --enable-locale, the multi-language code will not be
compiled and PostgreSQL will behave as an ASCII-compliant application.
This mode is faster, but it is only suitable if you don't have to
handle national-specific characters.
|
||||||
|
|
||||||
With --enable-locale you get a locale-aware server that uses the LC_*
environment variables to determine how to process national specifics.
In this case strcoll(3) and similar functions are used internally,
so speed is somewhat lower.
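The difference between the two modes can be illustrated outside the
server. The following is a minimal sketch in Python, not PostgreSQL code:
the function names are made up for this example, and locale.strxfrm
stands in for the strcoll(3)-style comparison mentioned above. Only the
"C" locale is used in the demonstration, because it is the only locale
guaranteed to exist on every system.

```python
import locale

def sort_bytewise(words):
    """Sort by code point, roughly what a locale-unaware server does."""
    return sorted(words)

def sort_locale_aware(words, locale_name=""):
    """Sort using the collation rules of locale_name (strcoll-like).

    An empty name means "take the locale from the LC_* environment
    variables". Raises locale.Error if the locale is not installed.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # always restore

words = ["zebra", "Apple", "apple", "Zoo"]
print(sort_bytewise(words))           # ASCII uppercase sorts before lowercase
print(sort_locale_aware(words, "C"))  # the "C" locale gives the same order
```

With a national locale installed (for example a Czech or Russian one),
passing its name to sort_locale_aware would typically give a different
order than the bytewise sort. The extra transformation of each string is
the kind of work behind the speed penalty noted above.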
|
||||||
|
|
||||||
Note that --enable-locale is sufficient when all your clients
use the same single-byte encoding as the database server.

When your clients use an encoding different from the server's, you
additionally need either the --enable-recode or --with-mb=<encoding>
option on the server side, or a client that does the recoding itself
(e.g. there is a PostgreSQL ODBC driver for Win32 that handles various
Cyrillic encodings). The --with-mb=<encoding> option is required for
multi-byte charset support.
|
||||||
|
|
||||||
|
|
||||||
2. Single-byte charsets recoding |
|
||||||
|
|
||||||
You can enable this feature with the --enable-recode option. The option
is described as 'enable Cyrillic recode support', which doesn't express
all its power: it can be used for *any* single-byte charset recoding.

This method uses the charset.conf file located in the $PGDATA directory.
It's a typical text configuration file in which spaces and newlines
separate items and records, and # starts a comment. Three keywords
with the following syntax are recognized:
|
||||||
|
|
||||||
    BaseCharset <server_charset>
    RecodeTable <from_charset> <to_charset> <file_name>
    HostCharset <host_spec> <host_charset>
|
||||||
|
|
||||||
BaseCharset defines the encoding of the database server. Charset
names are used only for mapping inside charset.conf, so you can
freely choose typing-friendly names.

RecodeTable records specify a translation table between server and client.
The file name is relative to the $PGDATA directory. The table file format
is very simple: there are no keywords, and each line holds one character
mapping as a pair of decimal or hexadecimal (0x-prefixed) values:
|
||||||
|
|
||||||
<char_value> <translated_char_value> |
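To make the table format concrete, here is a minimal sketch in Python of
how such a file could be read and applied. It is an illustration only,
not the server's implementation: the function names are hypothetical,
and the choice to leave unlisted bytes unchanged is an assumption, since
the format description above does not say what happens to them. The
sample entries are four real ISO-8859-2 -> WIN-1250 differences.

```python
def parse_number(field):
    """Accept decimal or 0x-prefixed hexadecimal, as the format allows."""
    if field.lower().startswith("0x"):
        return int(field, 16)
    return int(field)

def parse_recode_table(text):
    """Build a 256-entry translation table from recode-table text.

    Bytes not listed in the table map to themselves (an assumption).
    """
    table = bytearray(range(256))  # start from the identity mapping
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        src, dst = (parse_number(f) for f in fields)
        table[src] = dst
    return bytes(table)

def recode(data, table):
    """Translate a byte string through the table, one byte at a time."""
    return data.translate(table)

# Sample table: a few ISO-8859-2 -> WIN-1250 mappings, in mixed notation.
sample = """
0xA9 0x8A
0xB9 0x9A
174 142
190 158
"""

table = parse_recode_table(sample)
latin2 = "Šžš abc".encode("iso8859_2")
print(recode(latin2, table) == "Šžš abc".encode("cp1250"))
```

Because every character is a single byte, the whole recoding is a table
lookup per byte, which is why this method cannot handle multi-byte
charsets.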
|
||||||
|
|
||||||
HostCharset records map a client IP address to a charset. You can use a
single IP address, an IP mask range starting from the given address, or an
IP interval (e.g. 127.0.0.1, 192.168.1.100/24, 192.168.1.20-192.168.1.40).

charset.conf is always processed to the end, so you can easily
specify exceptions to earlier rules. In src/data you will
find an example charset.conf and a few recoding tables.
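The host matching described above can be sketched as follows. This is a
hedged illustration in Python and rests on two assumptions that the text
does not spell out: an address/mask spec is treated as an ordinary CIDR
network, and when several HostCharset records match, the last one wins
(one reading of "processed up to the end"). The function names and the
charset names in the rule list are hypothetical.

```python
import ipaddress

def host_matches(spec, addr):
    """Check a client address against one HostCharset host spec."""
    ip = ipaddress.ip_address(addr)
    if "-" in spec:
        # Interval form: first-last, both ends inclusive.
        first, last = (ipaddress.ip_address(p) for p in spec.split("-"))
        return first <= ip <= last
    if "/" in spec:
        # Mask form, interpreted here as a CIDR network (assumption).
        return ip in ipaddress.ip_network(spec, strict=False)
    # Single address form.
    return ip == ipaddress.ip_address(spec)

def client_charset(rules, addr, default=None):
    """Return the charset of the last rule that matches addr."""
    chosen = default
    for spec, charset in rules:
        if host_matches(spec, addr):
            chosen = charset  # keep scanning: a later rule overrides
    return chosen

# Hypothetical rules, in file order: a broad rule, then an exception.
rules = [
    ("192.168.1.100/24", "win"),
    ("192.168.1.20-192.168.1.40", "koi"),
    ("127.0.0.1", "iso"),
]

print(client_charset(rules, "192.168.1.30"))   # inside the interval
print(client_charset(rules, "192.168.1.200"))  # only the mask rule matches
```

Under these assumptions, placing the narrower interval after the broad
mask rule is what makes it act as an exception.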
|
||||||
|
|
||||||
As this solution is based on mapping the client's IP address to a charset,
there are obviously some restrictions as well. You can't use different
encodings on the same host at the same time. It's also inconvenient when
you boot your client hosts into more than one operating system.
Nevertheless, when these restrictions are not limiting and you don't
need multi-byte chars, then it's a simple and effective solution.
|
||||||
|
|
||||||
|
|
||||||
3. Multi-byte support/recoding |
|
||||||
|
|
||||||
This is a new generation of charset encoding support in PostgreSQL,
designed as a more comprehensive solution that handles both single-byte
and multi-byte chars. You can enable this feature with the
--with-mb=<encoding> option.

There is no IP mapping file; recoding is controlled through new
SQL statements. Recoding tables are included in the code. Many national
charsets are already supported and more will follow.

See doc/README.mb and doc/README.mb.jp for detailed instructions on how
to use the multibyte support. The file doc/README.locale contains
specific instructions on using the multibyte support with Cyrillic.
|
||||||
|
|
||||||
|
|
||||||
4. Credits |
|
||||||
|
|
||||||
I'd like to thank the PostgreSQL development team and all contributors |
|
||||||
for creating PostgreSQL. Thanks to Oleg Bartunov, Oleg Broytmann and |
|
||||||
Tatsuo Ishii for opening the door into the multi-language world. |
|
||||||
|
|
@ -1,107 +0,0 @@ |
|||||||
=========== |
|
||||||
1999 Jul 21 |
|
||||||
=========== |
|
||||||
|
|
||||||
Josef Balatka, <balatka@email.cz> asked us not to remove RECODE and sent me
a Czech ISO-8859-2 -> WIN-1250 translation table.
RECODE no longer contains just Cyrillic RECODE and will stay in
PostgreSQL.

He also created some bits of documentation, mostly concerning RECODE -
see README.Charsets.
|
||||||
|
|
||||||
|
|
||||||
=========== |
|
||||||
1999 Apr 14 |
|
||||||
=========== |
|
||||||
|
|
||||||
Tatsuo Ishii <t-ishii@sra.co.jp> updated Multibyte support, extending it
to the Cyrillic encodings. PostgreSQL now supports the KOI8-R, WIN-1251,
ISO8859-5 and CP866 (ALT) encodings.

Short instructions on using this feature follow. A longer discussion of
Multibyte support is in README.mb.

WARNING! With Multibyte support available, Cyrillic RECODE is declared
obsolete and will be removed from Postgres. If you are using RECODE,
consider switching to Multibyte support.
|
||||||
|
|
||||||
Instructions on how to prepare Postgres for Cyrillic Multibyte support. |
|
||||||
---------------------------------------------------------------------- |
|
||||||
|
|
||||||
First, you need to back up all your databases. I recommend backing up the
entire Postgres directory, including binaries and libraries, so that you can
easily restore it if something goes wrong.

Dump your data: pg_dumpall > dump.db
|
||||||
|
|
||||||
Stop postmaster. |
|
||||||
|
|
||||||
Configure, compile and install Postgres. (I'll mostly talk about the KOI8-R
encoding; this is just to make the examples a little clearer. You can use
any supported encoding.)

   cd src
   ./configure --enable-locale --with-mb=KOI8
   make
   make install
|
||||||
|
|
||||||
Make sure you've backed up your databases. Double-check your backup. I
really mean it - make regular backups and test them from time to time with a
trial restore.
|
||||||
|
|
||||||
Remove your data directory (better, rename or move it). |
|
||||||
|
|
||||||
Run initdb, specifying your primary encoding: initdb -e KOI8. If you omit
the encoding, the primary encoding given to configure will be used.

Start postmaster.

Create databases: createdb -e KOI8. Again, you can omit the encoding - the
default encoding will be used. You are not forced to use the same encoding
for all your databases - you can create different databases with different
encodings.
|
||||||
|
|
||||||
Load your data from the dump you've created: psql < dump.db |
|
||||||
|
|
||||||
That's all! Now you are ready to enjoy the full power of Multibyte |
|
||||||
support. |
|
||||||
|
|
||||||
To use Multibyte support you do not need to do anything special - just
execute your queries. If the client program does not set an encoding, it
gets the data in the database encoding. But the client may ask Postgres to
do automatic server-to-client and client-to-server conversions. There are
two ways for a client program to declare its encoding:
   1) the client explicitly executes the query SET CLIENT_ENCODING TO 'win';
   2) the client is started with an environment variable set. Examples -
      using sh syntax:
         PGCLIENTENCODING='win'; export PGCLIENTENCODING
      using csh syntax:
         setenv PGCLIENTENCODING 'win'
|
||||||
|
|
||||||
Setting PGCLIENTENCODING even when the client uses the same encoding as the
database avoids the overhead of asking for the database encoding while
initiating the connection, so it is a good idea to set it in any case.
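What the automatic conversion amounts to can be shown with a small
sketch in Python. This is not how the server does it (Postgres carries
its own conversion tables); Python's koi8_r and cp1251 codecs are used
here only as stand-ins for the KOI8 database encoding and the 'win'
client encoding, and the function name is hypothetical.

```python
def convert(data, from_encoding, to_encoding):
    """Re-encode a byte string from one charset to another."""
    return data.decode(from_encoding).encode(to_encoding)

# Bytes as they would be stored in a KOI8-R database.
stored = "Привет".encode("koi8_r")

# Server-to-client direction: what a 'win' (WIN-1251) client would receive.
sent = convert(stored, "koi8_r", "cp1251")

# Client-to-server direction: the reverse conversion restores the bytes.
received = convert(sent, "cp1251", "koi8_r")

print(sent == "Привет".encode("cp1251"))
print(received == stored)
```

The same text has different byte values in the two encodings, so a
client that does not declare its encoding and simply displays the
database bytes in its own charset will show garbage.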
|
||||||
|
|
||||||
Now you may run the test suite and see Multibyte support in action. Go to
.../src/test/locale and run
   make clean all test-koi2win
|
||||||
|
|
||||||
|
|
||||||
=========== |
|
||||||
1998 Nov 20 |
|
||||||
=========== |
|
||||||
|
|
||||||
I extended locale support, originally written by Oleg Bartunov
<oleg@sai.msu.su>. Now ORDER BY (if PostgreSQL is configured with
--enable-locale) uses strcoll() for all text fields: char(n), varchar(n),
text.

I included a test suite in .../src/test/locale. I didn't include it in
the regression tests because not many people require locale support. Read
.../src/test/locale/README for details on the test suite.
|
||||||
|
|
||||||
Many thanks to Oleg Bartunov (oleg@sai.msu.su) and Thomas G. Lockhart |
|
||||||
(lockhart@alumni.caltech.edu) for hints, tips, help and discussion. |
|
||||||
|
|
||||||
Oleg. |
|