* Defined new struct WordEntryPosVector that holds a uint16 length and a
variable size array of WordEntries. This replaces the previous
convention of a variable size uint16 array, with the first element
implying the length. WordEntryPosVector has the same layout in memory,
but is more readable in source code. The POSDATAPTR and POSDATALEN
macros are still used, though it would now be more readable to access
the fields in WordEntryPosVector directly.
* Removed needfree field from DocRepresentation. It was always set to false.
* Miscellaneous other commenting and refactoring
- change the alignment requirement of lexemes in TSVector slightly.
Lexeme strings were always padded to 2-byte aligned length to make sure
that if there's position array (uint16[]) it has the right alignment.
The patch changes that so that the padding is not done when there's no
positions. That makes the storage of tsvectors without positions
slightly more compact.
- added some #include "miscadmin.h" lines I missed in the earlier when I
added calls to check_stack_depth().
- Reimplement the send/recv functions, and added a comment
above them describing the on-wire format. The CRC is now recalculated in
tsquery as well per previous discussion.
- add code to check that the query tree is well-formed. It was indeed
possible to send malformed queries in binary mode, which produced all
kinds of strange results.
- make the left-field a uint32. There's no reason to
arbitrarily limit it to 16-bits, and it won't increase the disk/memory
footprint either now that QueryOperator and QueryOperand are separate
structs.
- add check_stack_depth() call to all recursive functions I found.
Some of them might have a natural limit so that you can't force
arbitrarily deep recursions, but check_stack_depth() is cheap enough
that seems best to just stick it into anything that might be a problem.
small editorization by me
- Brake the QueryItem struct into QueryOperator and QueryOperand.
Type was really the only common field between them. QueryItem still
exists, and is used in the TSQuery struct as before, but it's now a
union of the two. Many other changes fell from that, like separation
of pushval_asis function into pushValue, pushOperator and pushStop.
- Moved some structs that were for internal use only from header files
to the right .c-files.
- Moved tsvector parser to a new tsvector_parser.c file. Parser code was
about half of the size of tsvector.c, it's also used from tsquery.c, and
it has some data structures of its own, so it seems better to separate
it. Cleaned up the API so that TSVectorParserState is not accessed from
outside tsvector_parser.c.
- Separated enumerations (#defines, really) used for QueryItem.type
field and as return codes from gettoken_query. It was just accidental
code sharing.
- Removed ParseQueryNode struct used internally by makepol and friends.
push*-functions now construct QueryItems directly.
- Changed int4 variables to just ints for variables like "i" or "array
size", where the storage-size was not significant.
Oleg Bartunov and Teodor Sigaev, but I did a lot of editorializing,
so anything that's broken is probably my fault.
Documentation is nonexistent as yet, but let's land the patch so we can
get some portability testing done.
Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with
VARSIZE and VARDATA, and as a consequence almost no code was using the
longer names. Rename the length fields of struct varlena and various
derived structures to catch anyplace that was accessing them directly;
and clean up various places so caught. In itself this patch doesn't
change any behavior at all, but it is necessary infrastructure if we hope
to play any games with the representation of varlena headers.
Greg Stark and Tom Lane
suffix, to distinguish them from doubles. Make some function declarations
and definitions use the "const" qualifier for arguments consistently.
Ignore warning 4102 ("unreferenced label"), because such warnings
are always emitted by bison-generated code. Patch from Magnus Hagander.
the 8.1 SQL function definition for it. Per report from Rajesh Kumar Mallah,
such a DBA error doesn't seem at all improbable, and the cost of checking for
it is not very high compared to the cost of running this function. (It would
have been better to change the C name of the function so it wouldn't be called
by the old SQL definition, but it's too late for that now in the 8.2 branch.)
static variables. This avoids any risk of potential non-reentrancy,
and in particular offers a much cleaner workaround for the Intel compiler
bug that was affecting ginutil.c.
1) rank_cd now use weight of lexemes
2) rank_cd and rank can use any combination of normalization methods:
no normalization
normalization by log(length of document)
-----/------- by length of document
-----/------- by number of unique word in document
-----/------- by log(number of unique word in document)
-----/------- by number of covers (only rank_cd)
Improve cover's search.
TODO: changes in documentation
comment line where output as too long, and update typedefs for /lib
directory. Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).
Backpatch to 8.1.X.
1 Comparison operation for tsquery
2 Btree index on tsquery
3 numnode(tsquery) - returns 'length' of tsquery
4 tsquery @ tsquery, tsquery ~ tsquery - contains, contained for tsquery.
Note: They don't gurantee exact result, only MAY BE, so it
useful only for speed up rewrite functions
5 GiST index support for @,~
6 rewrite():
select rewrite(orig, what, to);
select rewrite(ARRAY[orig, what, to]) from tsquery_table;
select rewrite(orig, 'select what, to from tsquery_table;');
7 significantly improve cover algorithm
typedef struct {} WordEntryPos;
to
typedef uint16 WordEntryPos
according to http://www.pgsql.ru/db/mw/msg.html?mid=2035188
Require re-fill all tsvector fields and reindex tsvector indexes.