distribution, by creating a special fast path for the (first few) most common
values of the outer relation. Tuples having hashvalues matching the MCVs
are effectively forced to be in the first batch, so that we never write
them out to the batch temp files.
Bryce Cutt and Ramon Lawrence, with some editorialization by me.
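The routing test can be pictured roughly as below. This is only an illustrative sketch, not the actual executor code; the MCV hash values and all function names are made up.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hash values of the outer relation's most common values (illustrative data). */
    #define NUM_SKEW_MCVS 3
    static const uint32_t mcv_hashes[NUM_SKEW_MCVS] = {0x1a2b3c4d, 0x9e8d7c6b, 0xdeadbeef};

    static bool
    hash_matches_mcv(uint32_t hashvalue)
    {
        for (int i = 0; i < NUM_SKEW_MCVS; i++)
            if (mcv_hashes[i] == hashvalue)
                return true;
        return false;
    }

    /*
     * Assign a tuple to a batch.  Tuples whose hash matches one of the outer
     * relation's MCV hashes are forced into batch 0, the batch that stays in
     * memory, so they are never written to a batch temp file.
     */
    static uint32_t
    choose_batch(uint32_t hashvalue, uint32_t nbatch)
    {
        if (hash_matches_mcv(hashvalue))
            return 0;
        return hashvalue % nbatch;
    }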
for each temp file, rather than once per sort or hashjoin; this allows
spreading the data of a large sort or join across multiple tablespaces.
(I remain dubious that this will make any difference in practice, but certain
people insisted.) Arrange to cache the results of parsing the GUC variable
instead of recomputing from scratch on every demand, and push usage of the
cache down to the bottommost fd.c level.
tablespace(s) in which to store temp tables and temporary files. This is a
list to allow spreading the load across multiple tablespaces (a random list
element is chosen each time a temp object is to be created). Temp files are
not stored in per-database pgsql_tmp/ directories anymore, but in per-tablespace
directories.
Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.
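A minimal sketch of the per-file selection step, assuming the GUC string has already been parsed into an array of tablespace OIDs; the names and the use of random() here are illustrative only, not the fd.c code.

    #include <stdlib.h>

    typedef unsigned int Oid;   /* stand-in for PostgreSQL's Oid */

    /*
     * Pick one element of the parsed temp_tablespaces list at random, so that
     * successive temp objects spread across the listed tablespaces.
     */
    static Oid
    pick_temp_tablespace(const Oid *tablespaces, int ntablespaces, Oid default_tablespace)
    {
        if (ntablespaces <= 0)
            return default_tablespace;  /* list empty: fall back to the default */
        return tablespaces[random() % ntablespaces];
    }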
selecting power-of-2, rather than prime, numbers of buckets in hash joins.
If the hash functions are doing their jobs properly by making all hash bits
equally random, this is good enough, and it saves expensive integer division
and modulus operations.
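Concretely, with a power-of-2 bucket count the bucket number can be taken from the low-order hash bits with a single mask; a minimal sketch (variable names are illustrative):

    #include <stdint.h>

    /* nbuckets must be a power of 2 for the mask trick to be valid. */
    static inline uint32_t
    bucket_for_hash(uint32_t hashvalue, uint32_t nbuckets)
    {
        /* Equivalent to hashvalue % nbuckets, but avoids the integer division. */
        return hashvalue & (nbuckets - 1);
    }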
Hashing for aggregation purposes still needs work, so it's not time to
mark any cross-type operators as hashable for general use, but these cases
work if the operators are so marked by hand in the system catalogs.
match because they contain a null join key (and the join operator is
known strict). Improves performance significantly when the inner
relation contains a lot of nulls, as per bug #2930.
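The early test amounts to something like the following sketch (illustrative names only): with strict operators, any NULL among the hashed join keys means the tuple can never produce a match, so it can be skipped for matching purposes without being stored in, or used to probe, the hash table.

    #include <stdbool.h>

    /*
     * Sketch: decide whether a tuple can be discarded quickly.  keys_isnull[]
     * marks NULL join keys; all join operators are assumed strict.
     */
    static bool
    tuple_cannot_match(const bool *keys_isnull, int nkeys)
    {
        for (int i = 0; i < nkeys; i++)
            if (keys_isnull[i])
                return true;    /* a strict operator with a NULL input never returns true */
        return false;
    }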
return just a single tuple at a time. Currently the only such node
type is Hash, but I expect we will soon have indexscans that can return
tuple bitmaps. A side benefit is that EXPLAIN ANALYZE now shows the
correct tuple count for a Hash node.
on-the-fly, and thereby avoid blowing out memory when the planner has
underestimated the hash table size. Hash join will now obey the
work_mem limit with some faithfulness. Per my recent proposal
(hash aggregate part isn't done yet though).
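In outline, the growth step looks like the sketch below (names are made up, and the repartitioning itself is only described in the comment): once the in-memory tuples exceed the budget, the batch count is doubled and tuples whose new batch number no longer matches the current batch get spilled to temp files.

    #include <stddef.h>
    #include <stdint.h>

    /*
     * Sketch: called after adding a tuple to the in-memory hash table.  If the
     * memory budget is exceeded, double the number of batches; the caller then
     * repartitions, keeping tuples whose (hash & (2*nbatch - 1)) still equals
     * the current batch and writing the rest to their batch temp files.
     */
    static uint32_t
    maybe_double_nbatch(size_t space_used, size_t space_allowed, uint32_t nbatch)
    {
        if (space_used > space_allowed)
            return nbatch * 2;
        return nbatch;
    }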
specific hash functions used by hash indexes, rather than the old
not-datatype-aware ComputeHashFunc routine. This makes it safe to do
hash joining on several datatypes that previously couldn't use hashing.
The sets of datatypes that are hash indexable and hash joinable are now
exactly the same, whereas before each had some that weren't in the other.
instead of only one. This should speed up planning (only one hash path
to consider for a given pair of relations) as well as allow more effective
hashing, when there are multiple hashable joinclauses.
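Using several clauses means the per-key hash values must be folded into a single hash; rotate-and-XOR is one simple combining scheme, sketched below (not necessarily the exact code used):

    #include <stdint.h>

    /* Combine the hash values of several join keys into one 32-bit hash. */
    static uint32_t
    combine_key_hashes(const uint32_t *key_hashes, int nkeys)
    {
        uint32_t    hash = 0;

        for (int i = 0; i < nkeys; i++)
        {
            hash = (hash << 1) | (hash >> 31);  /* rotate left by one bit */
            hash ^= key_hashes[i];
        }
        return hash;
    }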
maintained for each cache entry. A cache entry will not be freed until
the matching ReleaseSysCache call has been executed. This eliminates
worries about cache entries getting dropped while still in use. See
my posting to pg-hackers of even date for more info.
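The discipline this imposes is that every successful lookup must be paired with a release. The toy model below (not the real syscache code) shows why a reference count makes it safe to defer freeing an entry that gets invalidated while someone is still using it.

    #include <stdlib.h>

    /* Toy model of a reference-counted cache entry. */
    typedef struct CacheEntry
    {
        int     refcount;   /* incremented by lookup, decremented by release */
        int     dead;       /* marked for removal, e.g. by cache invalidation */
        /* ... cached tuple data would live here ... */
    } CacheEntry;

    static CacheEntry *
    cache_search(CacheEntry *entry)
    {
        entry->refcount++;          /* caller now holds a pin on the entry */
        return entry;
    }

    static void
    cache_release(CacheEntry *entry)
    {
        entry->refcount--;
        /* only when no one holds a pin can an invalidated entry be freed */
        if (entry->refcount == 0 && entry->dead)
            free(entry);
    }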
right circumstances a hash join executed as a DECLARE CURSOR/FETCH
query would crash the backend. Problem as seen in current sources was
that the hash tables were stored in a context that was a child of
TransactionCommandContext, which got zapped at completion of the FETCH
command --- but cursor cleanup executed at COMMIT expected the tables
to still be valid. I haven't chased down the details as seen in 7.0.*
but I'm sure it's the same general problem.
memory contexts. Currently, only leaks in expressions executed as
quals or projections are handled. Clean up some old dead cruft in
executor while at it --- unused fields in state nodes, that sort of thing.
for details). It doesn't really do that much yet, since there are no
short-term memory contexts in the executor, but the infrastructure is
in place and long-term contexts are handled reasonably. A few long-
standing bugs have been fixed, such as 'VACUUM; anything' in a single
query string crashing. Also, out-of-memory is now considered a
recoverable ERROR, not FATAL.
Eliminate a large amount of crufty, now-dead code in and around
memory management.
Fix problem with holding off SIGTRAP, SIGSEGV, etc in postmaster and
backend startup.
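The core idea can be modeled as below: allocations are chained to a context, and resetting the context frees them all at once, which is what makes wholesale cleanup of short-term allocations cheap. This is only a toy model; the real allocator is far more elaborate.

    #include <stdlib.h>

    typedef struct Chunk { struct Chunk *next; } Chunk;
    typedef struct ToyContext { Chunk *chunks; } ToyContext;

    /* Allocate memory owned by the context. */
    static void *
    toy_alloc(ToyContext *cxt, size_t size)
    {
        Chunk *c = malloc(sizeof(Chunk) + size);

        if (c == NULL)
            return NULL;
        c->next = cxt->chunks;
        cxt->chunks = c;
        return (void *) (c + 1);    /* usable memory follows the header */
    }

    /* Free everything ever allocated in the context, in one call. */
    static void
    toy_reset(ToyContext *cxt)
    {
        while (cxt->chunks)
        {
            Chunk *next = cxt->chunks->next;
            free(cxt->chunks);
            cxt->chunks = next;
        }
    }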
BufFile so that it handles multi-segment temporary files transparently.
This allows sorts and hashes to work with data exceeding 2Gig (or whatever
the local limit on file size is). Change psort.c to use relative seeks
instead of absolute seeks for backwards scanning, so that it won't fail
when the data volume exceeds 2Gig.
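The multi-segment handling boils down to mapping one logical offset onto a (segment number, offset within segment) pair; a minimal sketch, assuming a 1GB per-segment cap (the real cap and the names are placeholders):

    #include <stdint.h>

    #define SEGMENT_SIZE ((int64_t) 1 << 30)    /* assumed per-file size cap */

    /* Map a logical offset in the multi-segment temp file onto a physical file. */
    static void
    locate_segment(int64_t logical_offset, int *segno, int64_t *offset_in_segment)
    {
        *segno = (int) (logical_offset / SEGMENT_SIZE);
        *offset_in_segment = logical_offset % SEGMENT_SIZE;
    }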
fixed-size hashtable. This should prevent 'hashtable out of memory' errors,
unless you really do run out of memory. Note: target size for hashtable
is now taken from -S postmaster switch, not -B, since it is local memory
in the backend rather than shared memory.
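Deriving the target table size from the -S budget is simple arithmetic; the sketch below uses a made-up per-tuple overhead figure, not the real code's accounting.

    #include <stddef.h>

    /*
     * Estimate how many tuples fit in the local sort-memory budget, which is
     * given in kilobytes as the -S switch is.
     */
    static long
    estimate_hash_capacity(long sort_mem_kb, size_t avg_tuple_width)
    {
        size_t  per_tuple = avg_tuple_width + 48;   /* assumed bookkeeping overhead */

        return (long) (((size_t) sort_mem_kb * 1024) / per_tuple);
    }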
about certain to fail anytime it decided the relation to be hashed was
too big to fit in memory --- the code for 'batching' a series of hashjoins
had multiple errors. I've fixed the easier problems. A remaining big
problem is that you can get 'hashtable out of memory' if the code's
guesstimate about how much overflow space it will need turns out wrong.
That will require much more extensive revisions to fix, so I'm committing
these fixes now before I start on that problem.
Note: all include files that have been hit so far have had extraneous
include files cleaned out and are reduced to the lowest common
"include file", based on 'cc -Wall -I. test.c', where test.c is:
#include "postgres.h"
#include "<top of branches>" (ie. top of branches this time was utils/fcache2.h)