postgres

Commit Graph

Author	SHA1	Message	Date
Tom Lane	94de3a679b	Improve ispell dictionary's defenses against bad affix files. Don't crash if an ispell dictionary definition contains flags but not any compound affixes. (This isn't a security issue since only superusers can install affix files, but still it's a bad thing.) Also, be more careful about detecting whether an affix-file FLAG command is old-format (ispell) or new-format (myspell/hunspell). And change the error message about mixed old-format and new-format commands into something intelligible. Per bug #11770 from Emre Hasegeli. Back-patch to all supported branches.	11 years ago
Tom Lane	4741e31600	Prevent potential overruns of fixed-size buffers. Coverity identified a number of places in which it couldn't prove that a string being copied into a fixed-size buffer would fit. We believe that most, perhaps all of these are in fact safe, or are copying data that is coming from a trusted source so that any overrun is not really a security issue. Nonetheless it seems prudent to forestall any risk by using strlcpy() and similar functions. Fixes by Peter Eisentraut and Jozef Mlich based on Coverity reports. In addition, fix a potential null-pointer-dereference crash in contrib/chkpass. The crypt(3) function is defined to return NULL on failure, but chkpass.c didn't check for that before using the result. The main practical case in which this could be an issue is if libc is configured to refuse to execute unapproved hashing algorithms (e.g., "FIPS mode"). This ideally should've been a separate commit, but since it touches code adjacent to one of the buffer overrun changes, I included it in this commit to avoid last-minute merge issues. This issue was reported by Honza Horak. Security: CVE-2014-0065 for buffer overruns, CVE-2014-0066 for crypt()	12 years ago
Tom Lane	6755558b92	Improve aset.c's space management in contexts with small maxBlockSize. The previous coding would allow requests up to half of maxBlockSize to be treated as "chunks", but when that actually did happen, we'd waste nearly half of the space in the malloc block containing the chunk, if no smaller requests came along to fill it. Avoid this scenario by limiting the maximum size of a chunk to 1/8th maxBlockSize, so that we can waste no more than 1/8th of the allocated space. This will not change the behavior at all for the default context size parameters (with large maxBlockSize), but it will change the behavior when using ALLOCSET_SMALL_MAXSIZE. In particular, there's no longer a need for spell.c to be overly concerned about the request size parameters it uses, so remove a rather unhelpful comment about that. Merlin Moncure, per an idea of Tom Lane's	15 years ago
Tom Lane	1e16a8107d	Teach regular expression operators to honor collations. This involves getting the character classification and case-folding functions in the regex library to use the collations infrastructure. Most of this work had been done already in connection with the upper/lower and LIKE logic, so it was a simple matter of transposition. While at it, split out these functions into a separate source file regc_pg_locale.c, so that they can be correctly labeled with the Postgres project's license rather than the Scriptics license. These functions are 100% Postgres-written code whereas what remains in regc_locale.c is still mostly not ours, so lumping them both under the same copyright notice was getting more and more misleading.	15 years ago
Bruce Momjian	bf50caf105	pgindent run before PG 9.1 beta 1.	15 years ago
Bruce Momjian	5d950e3b0c	Stamp copyrights for year 2011.	15 years ago
Tom Lane	3e5f9412d0	Reduce the memory requirement for large ispell dictionaries. This patch eliminates per-chunk palloc overhead for most small allocations needed in the representation of an ispell dictionary. This saves close to a factor of 2 on the current Czech ispell data. While it doesn't cover every last small allocation in the ispell code, we are at the point of diminishing returns, because about 95% of the allocations are covered already. Pavel Stehule, rather heavily revised by Tom	15 years ago
Tom Lane	9b910def24	Clean up temporary-memory management during ispell dictionary loading. Add explicit initialization and cleanup functions to spell.c, and keep all working state in the already-existing ISpellDict struct. This lets us get rid of a static variable along with some extremely shaky assumptions about usage of child memory contexts. This commit is just code beautification and has no impact on functionality or performance, but it opens the way to a less-grotty implementation of Pavel's memory-saving hack, which will follow shortly.	15 years ago
Magnus Hagander	9f2e211386	Remove cvs keywords from all files.	16 years ago
Bruce Momjian	0239800893	Update copyright for the year 2010.	16 years ago
Bruce Momjian	d747140279	8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list provided by Andrew.	17 years ago
Teodor Sigaev	b5b3134813	Fix incorrect dereferencing of char* to array's index. Per Tommy Gildseth <tommy.gildseth@usit.uio.no> report	17 years ago
Bruce Momjian	511db38ace	Update copyright for 2009.	17 years ago
Tom Lane	30dc388a0d	Fix a few places that were non-multibyte-safe in tsearch configuration file parsing. Per bug #4253 from Giorgio Valoti.	18 years ago
Tom Lane	fbeb9da22b	Improve error reporting for problems in text search configuration files by installing an error context subroutine that will provide the file name and line number for all errors detected while reading a config file. Some of the reader routines were already doing that in an ad-hoc way for errors detected directly in the reader, but it didn't help for problems detected in subroutines, such as encoding violations. Back-patch to 8.3 because 8.3 is where people will be trying to debug configuration files.	18 years ago
Tom Lane	716e8b8374	Fix RS_isRegis() to agree exactly with RS_compile()'s idea of what's a valid regis. Correct the latter's oversight that a bracket-expression needs to be terminated. Reduce the ereports to elogs, since they are now not expected to ever be hit (thus addressing Alvaro's original complaint). In passing, const-ify the string argument to RS_compile.	18 years ago
Teodor Sigaev	cd42dd5a17	Fix core dump with buffer-overrun by too long infinitive. Add checking of using fixed length arrays to prevent array's overrun. Per report by Hannes Dorbath <light@theendofthetunnel.de> and comments by Tom.	18 years ago
Bruce Momjian	9098ab9e32	Update copyrights in source tree to 2008.	18 years ago
Tom Lane	bb0e3011f8	Make a cleanup pass over error reports in tsearch code. Use ereport for user-facing errors, fix some poor choices of errcode, adhere to message style guide.	18 years ago
Bruce Momjian	f6e8730d11	Re-run pgindent with updated list of typedefs. (Updated README should avoid this problem in the future.)	18 years ago
Bruce Momjian	fdf5a5efb7	pgindent run for 8.3.	18 years ago
Teodor Sigaev	13553cbbff	Fix header's size of structs defines in ispell. Backpatch is needed for contrib version.	19 years ago
Teodor Sigaev	53ef36cb4a	Fix recently introduced bugs about parsing ispell/hunspell files. In most cases it cause because of unneeded lowercasing of flags. Per experiment with regression checks with ispell dictionary.	19 years ago
Teodor Sigaev	83d0b9f3ca	Fixes from Heikki Linnakangas <heikki@enterprisedb.com>: Apparently it's a bug I introduced when I refactored spell.c to use the readline function for reading and recoding the input file. I didn't notice that some calls to STRNCMP used the non-lowercased version of the input line.	19 years ago
Tom Lane	7351b5fa17	Cleanup for some problems in tsearch patch: - ispell initialization crashed on empty dictionary file - ispell initialization crashed on affix file with prefixes but no suffixes - stop words file was run through pg_verify_mbstr, with database encoding, but it's supposed to be UTF-8; similar bug for synonym files - bunch of comments added, typos fixed, and other cleanup Introduced consistent encoding checking/conversion of data read from tsearch configuration files, by doing this in a single t_readline() subroutine (replacing direct usages of fgets). Cleaned up API for readstopwords too. Heikki Linnakangas	19 years ago
Tom Lane	140d4ebcb4	Tsearch2 functionality migrates to core. The bulk of this work is by Oleg Bartunov and Teodor Sigaev, but I did a lot of editorializing, so anything that's broken is probably my fault. Documentation is nonexistent as yet, but let's land the patch so we can get some portability testing done.	19 years ago
Teodor Sigaev	6cd9a58480	Fix core dump of ispell for case of non-successful initialization. Previous versions aren't affected. Fix synonym dictionary init: string should be malloc'ed, not palloc'ed. Bug introduced recently while fixing lowerstr().	19 years ago
Teodor Sigaev	3de2682a1e	Fix lowercasing while parse OO dictionary	19 years ago
Teodor Sigaev	419fe7cd1b	Fix bug http://archives.postgresql.org/pgsql-bugs/2006-10/msg00258.php . Fix string's length calculation for recoding, fix strlower() to avoid wrong assumption about length of recoded string (was: recoded string is no greater than source, which may not be true for multibyte encodings) Thanks to Thomas H. <me@alternize.com> and Magnus Hagander <mha@sollentuna.net>	19 years ago
Bruce Momjian	f99a569a2e	pgindent run for 8.2.	20 years ago
Tom Lane	ae643747b1	Fix a passel of recently-committed violations of the rule 'thou shalt have no other gods before c.h'. Also remove some demonstrably redundant #include lines, mostly of <errno.h> which was added to c.h years ago.	20 years ago
Teodor Sigaev	04e9704b9e	Now ispell dictionary can eat dictionaries in MySpell format, used by OpenOffice. Dictionaries are placed at http://lingucomponent.openoffice.org/spell_dic.html Dictionary automatically recognizes format of files. Warning. MySpell's format has limitation with compound word support: it's impossible to mark affix as compound-only affix. So for norwegian, german etc languages it's recommended to use original ispell format. For that reason I don't want to remove my2ispell scripts, it has a workaround at least for norwegian language.	20 years ago
Neil Conway	8e5a10d46c	This patch makes the error message strings throughout the backend more compliant with the error message style guide. In particular, errdetail should begin with a capital letter and end with a period, whereas errmsg should not. I also fixed a few related issues in passing, such as fixing the repeated misspelling of "lexeme" in contrib/tsearch2 (per Tom's suggestion).	20 years ago
Teodor Sigaev	dde9457294	Fixing and improve compound word support. These changes cannot be applied to previous version without recreating tsvector fields... Thanks to Alexander Presber <aljoscha@weisshuhn.de> to discover a problem.	20 years ago
Teodor Sigaev	01f2172ec1	Allow "'" symbol in affixes ("'s" affix in english): it was disallowed during multibyte support work. Add line number to error output during affix file parsing.	20 years ago
Teodor Sigaev	46a25ce6a9	1 Fix bug with very short word: prefix and suffix might be overlapped, sorry but fix can't be applied to previous version: it requires refilling tsvector... 2 Small optimize of load time for huge dictionaries 3 use palloc instead of malloc during load dict file	20 years ago
Teodor Sigaev	a6fefc866c	Check number of affixes to prevent core dump with zero number of affixes	20 years ago
Teodor Sigaev	7ac8a4be89	Multibyte encodings support for ISpell dictionary	20 years ago
Teodor Sigaev	cb4ea994c6	Improve support of multibyte encoding: - tsvector_(in|out) - tsquery_(in|out) - to_tsvector - to_tsquery, plainto_tsquery - 'simple' dictionary	20 years ago
Bruce Momjian	1dc3498251	Standard pgindent run for 8.1.	21 years ago
Tom Lane	8a65b820e2	Suppress signed-vs-unsigned-char warnings in contrib.	21 years ago
Bruce Momjian	21634e513f	Add extra argument for new pg_regexec API.	21 years ago
Tom Lane	c0e0d3e2e9	Avoid unnecessary dependence on u_int16_t, per buildfarm failure. (It doesn't compile on HPUX either...)	21 years ago
Teodor Sigaev	324300bc7c	improve support of agglutinative languages (query with compound words). regression=# select to_tsquery( '\'fotballklubber\''); to_tsquery ------------------------------------------------ 'fotball' & 'klubb' | 'fot' & 'ball' & 'klubb' (1 row) So, changed interface to dictionaries, lexize method of dictionary should return pointer to array of TSLexeme structs instead of char*. Last element should have TSLexeme->lexeme == NULL. typedef struct { /* number of variant of split word, for example Word 'fotballklubber' (norwegian) has two variants to split: ( fotball, klubb ) and ( fot, ball, klubb ). So, dictionary should return: nvariant lexeme 1 fotball 1 klubb 2 fot 2 ball 2 klubb */ uint16 nvariant; /* currently unused */ uint16 flags; /* C-string */ char *lexeme; } TSLexeme;	21 years ago
Teodor Sigaev	5b354d2c7e	Fixes: 1 Report error message instead of do nothing in case of error in regex 2 Malloced storage for mask, find and repl part of Affix. This parts may be large enough in real life (for example in czech, thanks to moje <moje@kalhotky.net>)	21 years ago
Bruce Momjian	b6b71b85bc	Pgindent run for 8.0.	22 years ago
Teodor Sigaev	df9d87f608	Previous commit wasn't full...	22 years ago
Teodor Sigaev	de55c0cef6	1 Fix affixes with void replacement (AFAIK, it's only russian) 2 Optimize regex execution	22 years ago
Teodor Sigaev	7cb55d21ed	Fix memory leak with pg_regexec	22 years ago
Teodor Sigaev	d222bb4d5e	Fix memory leak with pg_regcomp	22 years ago

26 Commits (94de3a679bf0afe9bb15ffb7af066b7df58859af)
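Commit 324300bc7c above changed the dictionary lexize interface to return an array of TSLexeme structs, terminated by an entry whose lexeme is NULL, with nvariant grouping the lexemes of each alternative split. The following is a minimal C sketch of that convention, not PostgreSQL code: the struct layout follows the commit message, uint16 is a local stand-in for the typedef in PostgreSQL's c.h, and format_variants() and fotballklubber_split are hypothetical names. It assumes the lexemes of one variant are stored contiguously and renders them in the same shape as the to_tsquery output quoted in that commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Local stand-in for PostgreSQL's uint16 typedef (assumption). */
typedef unsigned short uint16;

/* Layout as described in commit 324300bc7c. */
typedef struct
{
    uint16 nvariant;  /* which split variant this lexeme belongs to */
    uint16 flags;     /* currently unused */
    char  *lexeme;    /* C-string; NULL marks the end of the array */
} TSLexeme;

/* Hypothetical lexize result for 'fotballklubber', using the two
 * splits listed in the commit message. */
const TSLexeme fotballklubber_split[] = {
    {1, 0, "fotball"},
    {1, 0, "klubb"},
    {2, 0, "fot"},
    {2, 0, "ball"},
    {2, 0, "klubb"},
    {0, 0, NULL}      /* terminator: lexeme == NULL */
};

/*
 * Hypothetical helper: write the variants into buf as an OR of AND-groups,
 * joining lexemes of the same variant with " & " and different variants
 * with " | ". Assumes entries of one variant are contiguous. Returns the
 * number of characters written, or -1 if buf is too small.
 */
int
format_variants(const TSLexeme *res, char *buf, size_t bufsize)
{
    size_t used = 0;

    assert(res != NULL && buf != NULL && bufsize > 0);
    buf[0] = '\0';

    /* Walk the array until the NULL-lexeme terminator. */
    for (const TSLexeme *p = res; p->lexeme != NULL; p++)
    {
        const char *sep = "";
        int n;

        /* Choose the separator by comparing with the previous entry. */
        if (p != res)
            sep = (p->nvariant == (p - 1)->nvariant) ? " & " : " | ";

        n = snprintf(buf + used, bufsize - used, "%s'%s'", sep, p->lexeme);
        if (n < 0 || (size_t) n >= bufsize - used)
            return -1;          /* output would be truncated */
        used += (size_t) n;
    }
    return (int) used;
}
```

With a 128-byte buffer, format_variants(fotballklubber_split, buf, sizeof buf) fills buf with 'fotball' & 'klubb' | 'fot' & 'ball' & 'klubb'. Because the end of the array is signaled by lexeme == NULL, the caller needs no separate length count, which is the "Last element should have TSLexeme->lexeme == NULL" rule from the commit.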