@@ -21,22 +21,22 @@
 #include "utils/pg_locale.h"
 
 /*
- * To provide as much functionality as possible on a variety of platforms,
- * without going so far as to implement everything from scratch, we use
- * several implementation strategies depending on the situation:
+ * For the libc provider, to provide as much functionality as possible on a
+ * variety of platforms without going so far as to implement everything from
+ * scratch, we use several implementation strategies depending on the
+ * situation:
  *
  * 1. In C/POSIX collations, we use hard-wired code. We can't depend on
  * the <ctype.h> functions since those will obey LC_CTYPE. Note that these
  * collations don't give a fig about multibyte characters.
  *
- * 2. In the "default" collation (which is supposed to obey LC_CTYPE):
- *
- * 2a. When working in UTF8 encoding, we use the <wctype.h> functions.
+ * 2. When working in UTF8 encoding, we use the <wctype.h> functions.
  * This assumes that every platform uses Unicode codepoints directly
- * as the wchar_t representation of Unicode. On some platforms
+ * as the wchar_t representation of Unicode. (XXX: ICU makes this assumption
+ * even for non-UTF8 encodings, which may be a problem.) On some platforms
  * wchar_t is only 16 bits wide, so we have to punt for codepoints > 0xFFFF.
  *
- * 2b. In all other encodings, we use the <ctype.h> functions for pg_wchar
+ * 3. In all other encodings, we use the <ctype.h> functions for pg_wchar
  * values up to 255, and punt for values above that. This is 100% correct
  * only in single-byte encodings such as LATINn. However, non-Unicode
  * multibyte encodings are mostly Far Eastern character sets for which the
@@ -46,14 +46,11 @@
  * the platform's wchar_t representation matches what we do in pg_wchar
  * conversions.
  *
- * 3. Here, we use the locale_t-extended forms of the <wctype.h> and <ctype.h>
- * functions, under exactly the same cases as #2.
- *
- * There is one notable difference between cases 2 and 3: in the "default"
- * collation we force ASCII letters to follow ASCII upcase/downcase rules,
- * while in a non-default collation we just let the library functions do what
- * they will. The case where this matters is treatment of I/i in Turkish,
- * and the behavior is meant to match the upper()/lower() SQL functions.
+ * As a special case, in the "default" collation, (2) and (3) force ASCII
+ * letters to follow ASCII upcase/downcase rules, while in a non-default
+ * collation we just let the library functions do what they will. The case
+ * where this matters is treatment of I/i in Turkish, and the behavior is
+ * meant to match the upper()/lower() SQL functions.
  *
  * We store the active collation setting in static variables. In principle
  * it could be passed down to here via the regex library's "struct vars" data