When committing abd94bcac4, I tried to make
it decide what kind of abbreviation to use based only on SIZEOF_DATUM,
without regard to USE_FLOAT8_BYVAL. That attempt was a few bricks short
of a load, so try to fix it, and add a comment explaining what we're
about.
Patch by me; review (but not a full endorsement) by Andrew Gierth.
Be more aggressive about aborting early on if it looks like it's not
helping, but be less aggressive about aborting later on, since it's
more expensive at that point, and also since we're currently aborting
in some cases where abbreviation can still deliver a substantial win.
Peter Geoghegan. Extensive testing by Tomas Vondra.
This commit also adds pg_malloc_extended for frontend. These interfaces
can be used to control at a lower level memory allocation using an interface
similar to MemoryContextAllocExtended. For example, the callers can specify
MCXT_ALLOC_NO_OOM if they want to suppress the "out of memory" error while
allocating the memory and handle a NULL return value.
Michael Paquier, reviewed by me.
While a new backend nominally participates in sinval signaling starting
from the SharedInvalBackendInit call near the top of InitPostgres, it
cannot recognize sinval messages for unshared catalogs of its database
until it has set up MyDatabaseId. This is not problematic for the catcache
or relcache, which by definition won't have loaded any data from or about
such catalogs before that point. However, commit 568d4138c6
introduced a mechanism for re-using MVCC snapshots for catalog scans, and
made invalidation of those depend on recognizing relevant sinval messages.
So it's possible to establish a catalog snapshot to read pg_authid and
pg_database, then before we set MyDatabaseId, receive sinval messages that
should result in invalidating that snapshot --- but do not, because we
don't realize they are for our database. This mechanism explains the
intermittent buildfarm failures we've seen since commit 31eae6028e.
That commit was not itself at fault, but it introduced a new regression
test that does reconnections concurrently with the "vacuum full pg_am"
command in vacuum.sql. This allowed the pre-existing error to be exposed,
given just the right timing, because we'd fail to update our information
about how to access pg_am. In principle any VACUUM FULL on a system
catalog could have created a similar hazard for concurrent incoming
connections. Perhaps there are more subtle failure cases as well.
To fix, force invalidation of the catalog snapshot as soon as we've
set MyDatabaseId.
Back-patch to 9.4 where the error was introduced.
In 83ff1618 we defined integer limits iff they're not provided by the
system. That turns out not to be the greatest idea because there's
different ways some datatypes can be represented. E.g. on OSX PG's 64bit
datatype will be a 'long int', but OSX unconditionally uses 'long
long'. That disparity then can lead to warnings, e.g. around printf
formats.
One way to fix that would be to back int64 using stdint.h's
int64_t. While a good idea it's not that easy to implement. We would
e.g. need to include stdint.h in our external headers, which we don't
today. Also computing the correct int64 printf formats in that case is
nontrivial.
Instead simply prefix the integer limits with PG_ and define them
unconditionally. I've adjusted all the references to them in code, but
not the ones in comments; the latter seems unnecessary to me.
Discussion: 20150331141423.GK4878@alap3.anarazel.de
This patch fills in the formerly-stub networksel() and networkjoinsel()
estimation functions. Those are used for << <<= >> >>= and && operators
on inet/cidr types. The estimation is not perfect, certainly, because
we rely on the existing statistics collected for the inet btree operators.
But it's a long way better than nothing, and it's not clear that asking
ANALYZE to collect separate stats for these operators would be a win.
Emre Hasegeli, with reviews from Dilip Kumar and Heikki Linnakangas,
and some further hacking by me
... and rename it and its sibling array_offsets to array_position and
array_positions, to account for the changed behavior.
Having the functions return subscripts better matches existing practice,
and is better suited to using the result value as a subscript into the
array directly. For one-based arrays, the new definition is identical
to what was originally committed.
(We use the term "subscript" in the documentation, which is what we use
whenever we talk about arrays; but the functions themselves are named
using the word "position" to match the standard-defined POSITION()
functions.)
Author: Pavel Stěhule
Behavioral problem noted by Dean Rasheed.
The previous coding in get_const_expr() tried to avoid quoting integer,
float, and numeric literals if at all possible. While that looks nice,
it means that dumped expressions might re-parse to something that's
semantically equivalent but not the exact same parsetree; for example
a FLOAT8 constant would re-parse as a NUMERIC constant with a cast to
FLOAT8. Though the result would be the same after constant-folding,
this is problematic in certain contexts. In particular, Jeff Davis
pointed out that this could cause unexpected failures in ALTER INHERIT
operations because of child tables having not-exactly-equivalent CHECK
expressions. Therefore, favor correctness over legibility and dump
such constants in quotes except in the limited cases where they'll
be interpreted as the same type even without any casting.
This results in assorted small changes in the regression test outputs,
and will affect display of user-defined views and rules similarly.
The odds of that causing problems in the field seem non-negligible;
given the lack of previous complaints, it seems best not to change
this in the back branches.
This improves on commit bbfd7edae5 by
making two simple changes:
* pg_attribute_noreturn now takes parentheses, ie pg_attribute_noreturn().
Likewise pg_attribute_unused(), pg_attribute_packed(). This reduces
pgindent's tendency to misformat declarations involving them.
* attributes are now always attached to function declarations, not
definitions. Previously some places were taking creative shortcuts,
which were not merely candidates for bad misformatting by pgindent
but often were outright wrong anyway. (It does little good to put a
noreturn annotation where callers can't see it.) In any case, if
we would like to believe that these macros can be used with non-gcc
compilers, we should avoid gratuitous variance in usage patterns.
I also went through and manually improved the formatting of a lot of
declarations, and got rid of excessively repetitive (and now obsolete
anyway) comments informing the reader what pg_attribute_printf is for.
Several submitted and even committed patches have run into the problem
that C89, our baseline, does not provide minimum/maximum values for
various integer datatypes. C99's stdint.h does, but we can't rely on
it.
Several parts of the code defined limits locally, so instead centralize
the definitions to c.h.
This patch also changes the more obvious usages of literal limit values;
there's more places that could be changed, but it's less clear whether
it's beneficial to change those.
Author: Andrew Gierth
Discussion: 87619tc5wc.fsf@news-spur.riddles.org.uk
Previously, CHECK constraints of the same scope were checked in whatever
order they happened to be read from pg_constraint. (Usually, but not
reliably, this would be creation order for domain constraints and reverse
creation order for table constraints, because of differing implementation
details.) Nondeterministic results of this sort are problematic at least
for testing purposes, and in discussion it was agreed to be a violation of
the principle of least astonishment. Therefore, borrow the principle
already established for triggers, and apply such checks in name order
(using strcmp() sort rules). This lets users control the check order
if they have a mind to.
Domain CHECK constraints still follow the rule of checking lower nested
domains' constraints first; the name sort only applies to multiple
constraints attached to the same domain.
In passing, I failed to resist the temptation to wordsmith a bit in
create_domain.sgml.
Apply to HEAD only, since this could result in a behavioral change in
existing applications, and the potential regression test failures have
not actually been observed in our buildfarm.
Revert "to_char(float4/8): zero pad to specified length". There are
too many platform-specific problems, and the proper rounding is missing.
Also revert companion patch 9d61b9953c.
Previously, zero padding was limited to the internal length, rather than
the specified length. This allows it to match to_char(int/numeric), which
always padded to the specified length.
Regression tests added.
BACKWARD INCOMPATIBILITY
On platforms where we support 128bit integers, use them to implement
faster transition functions for sum(int8), avg(int8),
var_*(int2/int4),stdev_*(int2/int4). Where not supported continue to use
numeric as a transition type.
In some synthetic benchmarks this has been shown to provide significant
speedups.
Bumps catversion.
Discussion: 544BB5F1.50709@proxel.se
Author: Andreas Karlsson
Reviewed-By: Peter Geoghegan, Petr Jelinek, Andres Freund,
Oskari Saarenmaa, David Rowley
The pg_stat and pg_signal-related functions have been using GetUserId()
instead of has_privs_of_role() for checking if the current user should
be able to see details in pg_stat_activity or signal other processes,
requiring a user to do 'SET ROLE' for inheirited roles for a permissions
check, unlike other permissions checks.
This patch changes that behavior to, instead, act like most other
permission checks and use has_privs_of_role(), removing the 'SET ROLE'
need. Documentation and error messages updated accordingly.
Per discussion with Alvaro, Peter, Adam (though not using Adam's patch),
and Robert.
Reviewed by Jeevan Chalke.
This patch fixes two inadequacies of the PlanRowMark representation.
First, that the original LockingClauseStrength isn't stored (and cannot be
inferred for foreign tables, which always get ROW_MARK_COPY). Since some
PlanRowMarks are created out of whole cloth and don't actually have an
ancestral RowMarkClause, this requires adding a dummy LCS_NONE value to
enum LockingClauseStrength, which is fairly annoying but the alternatives
seem worse. This fix allows getting rid of the use of get_parse_rowmark()
in FDWs (as per the discussion around commits 462bd95705 and
8ec8760fc8), and it simplifies some things elsewhere.
Second, that the representation assumed that all child tables in an
inheritance hierarchy would use the same RowMarkType. That's true today
but will soon not be true. We add an "allMarkTypes" field that identifies
the union of mark types used in all a parent table's children, and use
that where appropriate (currently, only in preprocess_targetlist()).
In passing fix a couple of minor infelicities left over from the SKIP
LOCKED patch, notably that _outPlanRowMark still thought waitPolicy
is a bool.
Catversion bump is required because the numeric values of enum
LockingClauseStrength can appear in on-disk rules.
Extracted from a much larger patch to support foreign table inheritance;
it seemed worth breaking this out, since it's a separable concern.
Shigeru Hanada and Etsuro Fujita, somewhat modified by me
The introduction of min_wal_size & max_wal_size in 88e9823026 makes it
feasible to increase the default upper bound in checkpoint
size. Previously raising the default would lead to a increased disk
footprint, even if more segments weren't beneficial. The low default of
checkpoint size is one of common performance problem users have thus
increasing the default makes sense. Setups where the increase in
maximum disk usage is a problem will very likely have to run with a
modified configuration anyway.
Discussion: 54F4EFB8.40202@agliodbs.com,
CA+TgmoZEAgX5oMGJOHVj8L7XOkAe05Gnf45rP40m-K3FhZRVKg@mail.gmail.com
Author: Josh Berkus, after a discussion involving lots of people.
While the SQL standard is pretty vague on the overall topic of operator
precedence (because it never presents a unified BNF for all expressions),
it does seem reasonable to conclude from the spec for <boolean value
expression> that OR has the lowest precedence, then AND, then NOT, then IS
tests, then the six standard comparison operators, then everything else
(since any non-boolean operator in a WHERE clause would need to be an
argument of one of these).
We were only sort of on board with that: most notably, while "<" ">" and
"=" had properly low precedence, "<=" ">=" and "<>" were treated as generic
operators and so had significantly higher precedence. And "IS" tests were
even higher precedence than those, which is very clearly wrong per spec.
Another problem was that "foo NOT SOMETHING bar" constructs, such as
"x NOT LIKE y", were treated inconsistently because of a bison
implementation artifact: they had the documented precedence with respect
to operators to their right, but behaved like NOT (i.e., very low priority)
with respect to operators to their left.
Fixing the precedence issues is just a small matter of rearranging the
precedence declarations in gram.y, except for the NOT problem, which
requires adding an additional lookahead case in base_yylex() so that we
can attach a different token precedence to NOT LIKE and allied two-word
operators.
The bulk of this patch is not the bug fix per se, but adding logic to
parse_expr.c to allow giving warnings if an expression has changed meaning
because of these precedence changes. These warnings are off by default
and are enabled by the new GUC operator_precedence_warning. It's believed
that very few applications will be affected by these changes, but it was
agreed that a warning mechanism is essential to help debug any that are.
Error messages informing the user that no such column exists can
sometimes provoke a perplexed response. This often happens due to
a subtle typo in the column name or, perhaps less likely, in the
alias name. To speed discovery of what the real issue is in such
cases, we'll now search the range table for approximate matches.
If there are one or two such matches that are good enough to think
that they might be what the user intended to type, and better than
all other approximate matches, we'll issue a hint suggesting that
the user might have intended to reference those columns.
Peter Geoghegan and Robert Haas
Until now __attribute__() was defined to be empty for all compilers but
gcc. That's problematic because it prevents using it in other compilers;
which is necessary e.g. for atomics portability. It's also just
generally dubious to do so in a header as widely included as c.h.
Instead add pg_attribute_format_arg, pg_attribute_printf,
pg_attribute_noreturn macros which are implemented in the compilers that
understand them. Also add pg_attribute_noreturn and pg_attribute_packed,
but don't provide fallbacks, since they can affect functionality.
This means that external code that, possibly unwittingly, relied on
__attribute__ defined to be empty on !gcc compilers may now run into
warnings or errors on those compilers. But there shouldn't be many
occurances of that and it's hard to work around...
Discussion: 54B58BA3.8040302@ohmu.fi
Author: Oskari Saarenmaa, with some minor changes by me.
When newly-added GUC parameter, wal_compression, is on, the PostgreSQL server
compresses a full page image written to WAL when full_page_writes is on or
during a base backup. A compressed page image will be decompressed during WAL
replay. Turning this parameter on can reduce the WAL volume without increasing
the risk of unrecoverable data corruption, but at the cost of some extra CPU
spent on the compression during WAL logging and on the decompression during
WAL replay.
This commit changes the WAL format (so bumping WAL version number) so that
the one-byte flag indicating whether a full page image is compressed or not is
included in its header information. This means that the commit increases the
WAL volume one-byte per a full page image even if WAL compression is not used
at all. We can save that one-byte by borrowing one-bit from the existing field
like hole_offset in the header and using it as the flag, for example. But which
would reduce the code readability and the extensibility of the feature.
Per discussion, it's not worth paying those prices to save only one-byte, so we
decided to add the one-byte flag to the header.
This commit doesn't introduce any new compression algorithm like lz4.
Currently a full page image is compressed using the existing PGLZ algorithm.
Per discussion, we decided to use it at least in the first version of the
feature because there were no performance reports showing that its compression
ratio is unacceptably lower than that of other algorithm. Of course,
in the future, it's worth considering the support of other compression
algorithm for the better compression.
Rahila Syed and Michael Paquier, reviewed in various versions by myself,
Andres Freund, Robert Haas, Abhijit Menon-Sen and many others.
Commands such as ALTER USER, ALTER GROUP, ALTER ROLE, GRANT, and the
various ALTER OBJECT / OWNER TO, as well as ad-hoc clauses related to
roles such as the AUTHORIZATION clause of CREATE SCHEMA, the FOR clause
of CREATE USER MAPPING, and the FOR ROLE clause of ALTER DEFAULT
PRIVILEGES can now take the keywords CURRENT_USER and SESSION_USER as
user specifiers in place of an explicit user name.
This commit also fixes some quite ugly handling of special standards-
mandated syntax in CREATE USER MAPPING, which in particular would fail
to work in presence of a role named "current_user".
The special role specifiers PUBLIC and NONE also have more consistent
handling now.
Also take the opportunity to add location tracking to user specifiers.
Authors: Kyotaro Horiguchi. Heavily reworked by Álvaro Herrera.
Reviewed by: Rushabh Lathia, Adam Brightwell, Marti Raudsepp.
Commit 5cefbf5a6c introduced an
assumption that this field would always be non-NULL when doing a merge
pass, but that's not true. Without this fix, you can crash the server
by building a hash index that is sufficiently large relative to
maintenance_work_mem, or by triggering a large datum sort.
Commit 5ea86e6e65 changed the comments
for that field to say that it would be set in all cases except for the
hash index case, but that wasn't (and still isn't) true.
The datum-sort failure was spotted by Tomas Vondra; initial analysis
of that failure was by Peter Geoghegan. The remaining issues were
spotted by me during review of the surrounding code, and the patch is
all my fault.
genericcostestimate() and friends used the cost of the entire indexqual
expressions as the charge for initial evaluation of indexscan arguments.
But of course the index column is not evaluated, only the other side
of the qual expression, so this was a bad overestimate if the index
column was an expensive expression.
To fix, refactor the logic in this area so that there's a single routine
charged with deconstructing index quals and figuring out what is the index
column and what is the comparison expression. This is more or less free in
the case of btree indexes, since btcostestimate() was doing equivalent
deconstruction already. It probably adds a bit of new overhead in the cases
of other index types, but not a lot. (In the case of GIN I think I saved
something by getting rid of code that wasn't aware that the index column
associations were already available "for free".)
Per recent gripe from Jeff Janes.
Arguably this is a bug fix, but I'm hesitant to back-patch because of the
possibility of destabilizing plan choices that people may be happy with.
Previously, we cached domain constraints for the life of a query, or
really for the life of the FmgrInfo struct that was used to invoke
domain_in() or domain_check(). But plpgsql (and probably other places)
are set up to cache such FmgrInfos for the whole lifespan of a session,
which meant they could be enforcing really stale sets of constraints.
On the other hand, searching pg_constraint once per query gets kind of
expensive too: testing says that as much as half the runtime of a
trivial query such as "SELECT 0::domaintype" went into that.
To fix this, delegate the responsibility for tracking a domain's
constraints to the typcache, which has the infrastructure needed to
detect syscache invalidation events that signal possible changes.
This not only removes unnecessary repeat reads of pg_constraint,
but ensures that we never apply stale constraint data: whatever we
use is the current data according to syscache rules.
Unfortunately, the current configuration of the system catalogs means
we have to flush cached domain-constraint data whenever either pg_type
or pg_constraint changes, which happens rather a lot (eg, creation or
deletion of a temp table will do it). It might be worth rearranging
things to split pg_constraint into two catalogs, of which the domain
constraint one would probably be very low-traffic. That's a job for
another patch though, and in any case this patch should improve matters
materially even with that handicap.
This patch makes use of the recently-added memory context reset callback
feature to manage the lifespan of domain constraint caches, so that we
don't risk deleting a cache that might be in the midst of evaluation.
Although this is a bug fix as well as a performance improvement, no
back-patch. There haven't been many if any field complaints about
stale domain constraint checks, so it doesn't seem worth taking the
risk of modifying data structures as basic as MemoryContexts in back
branches.
This makes "ALTER TABLE tabname ALTER tscol TYPE ... USING tscol AT TIME
ZONE 'UTC'" skip rewriting the table when altering from "timestamp" to
"timestamptz" or vice versa. While it would be nicer still to optimize
this in the absence of the USING clause given timezone==UTC, transform
functions must consult IMMUTABLE facts only.
Add documentation about the new reset callback mechanism.
Also, at long last, recast the existing text so that it describes the
current context mechanisms as established fact rather than something
we're going to implement. Shoulda done that in 2001 or so ...
The type variable must get set on first iteration of the while loop,
but there are reasonably modern gcc versions that don't realize that.
Initialize it with a dummy value. This undoes a removal of initialization
in commit 654809e770.
That is, MemoryContextReset() now means what was formerly meant by
MemoryContextResetAndDeleteChildren(), and the latter is now just a macro
alias for the former. If you really want the functionality that was
formerly provided by MemoryContextReset(), what you have to do is
MemoryContextResetChildren() plus MemoryContextResetOnly() (which is a
new API to reset *only* the named context and not touch its children).
The reason for this change is that near fifteen years of experience has
proven that there is noplace where old-style MemoryContextReset() is
actually what you want. Making that the default behavior has led to lots
of context-leakage bugs, while we've not found anyplace where it's actually
necessary to keep the child contexts; at least the standard regression
tests do not reveal anyplace where this change breaks anything. And there
are upcoming patches that will introduce additional reasons why child
contexts need to be removed.
We could change existing calls of MemoryContextResetAndDeleteChildren to be
just MemoryContextReset, but for the moment I'll leave them alone; they're
not costing anything.
This allows cleanup actions to be registered to be called just before a
particular memory context's contents are flushed (either by deletion or
MemoryContextReset). The patch in itself has no use-cases for this, but
several likely reasons for wanting this exist.
In passing, per discussion, rearrange some boolean fields in struct
MemoryContextData so as to avoid wasted padding space. For safety,
this requires making allowInCritSection's existence unconditional;
but I think that's a better approach than what was there anyway.
Typo "aggreagate" appeared three times, and the return value of function
JsonbIteratorNext() was being assigned to an int variable in a bunch of
places.
Commit ab14a73a6c raised an error in these cases and later the
behaviour was copied to jsonb. This is what the XML code, which we
then adopted, does, as the XSD types don't accept infinite values.
However, json dates and timestamps are just strings as far as json is
concerned, so there is no reason not to render these values as
'infinity'.
The json portion of this is backpatched to 9.4 where the behaviour was
introduced. The jsonb portion only affects the development branch.
Per gripe on pgsql-general.
When I rewrote this in commit 56a79a869b,
I forgot that it's possible for the input array type to change from one
call to the next (this can happen when applying the function to
pg_statistic columns, for instance). Fix that.
The "simple" path for printing VALUES clauses doesn't work if we need
to attach nondefault column aliases, because there's noplace to do that
in the minimal VALUES() syntax. So modify get_simple_values_rte() to
detect nondefault aliases and treat that as a non-simple case. This
further exposes that the "non-simple" path never actually worked;
it didn't produce valid syntax. Fix that too. Per bug #12789 from
Curtis McEnroe, and analysis by Andrew Gierth.
Back-patch to all supported branches. Before 9.3, this also requires
back-patching the part of commit 092d7ded29
that created get_simple_values_rte() to begin with; inserting the extra
test into the old factorization of that logic would've been too messy.
Instead of having a single knob (checkpoint_segments) that both triggers
checkpoints, and determines how many checkpoints to recycle, they are now
separate concerns. There is still an internal variable called
CheckpointSegments, which triggers checkpoints. But it no longer determines
how many segments to recycle at a checkpoint. That is now auto-tuned by
keeping a moving average of the distance between checkpoints (in bytes),
and trying to keep that many segments in reserve. The advantage of this is
that you can set max_wal_size very high, but the system won't actually
consume that much space if there isn't any need for it. The min_wal_size
sets a floor for that; you can effectively disable the auto-tuning behavior
by setting min_wal_size equal to max_wal_size.
The max_wal_size setting is now the actual target size of WAL at which a
new checkpoint is triggered, instead of the distance between checkpoints.
Previously, you could calculate the actual WAL usage with the formula
"(2 + checkpoint_completion_target) * checkpoint_segments + 1". With this
patch, you set the desired WAL usage with max_wal_size, and the system
calculates the appropriate CheckpointSegments with the reverse of that
formula. That's a lot more intuitive for administrators to set.
Reviewed by Amit Kapila and Venkata Balaji N.
Replace the if-switch-case constructs with two conversion tables,
containing all the supported conversions between human-readable unit
strings and the base units used in GUC variables. This makes the code
easier to read, and makes adding new units simpler.
Previously when the standby server failed to retrieve WAL files from any sources
(i.e., streaming replication, local pg_xlog directory or WAL archive), it always
waited for five seconds (hard-coded) before the next attempt. For example,
this is problematic in warm-standby because restore_command can fail
every five seconds even while new WAL file is expected to be unavailable for
a long time and flood the log files with its error messages.
This commit adds new parameter, wal_retrieve_retry_interval, to control that
wait time.
Alexey Vasiliev and Michael Paquier, reviewed by Andres Freund and me.
Everywhere else in the file, "context" is of type MemoryContext and
"set" is of type AllocSet. AllocSetContextCreate uses a variable of
type AllocSet, so rename it from "context" to "set".
Previously, each new array created a new memory context that started
out at 8kB. This is incredibly wasteful when there are lots of small
groups of just a few elements each.
Change initArrayResult() and friends to accept a "subcontext" argument
to indicate whether the caller wants the ArrayBuildState allocated in
a new subcontext or not. If not, it can no longer be released
separately from the rest of the memory context.
Fixes bug report by Frank van Vugt on 2013-10-19.
Tomas Vondra. Reviewed by Ali Akbar, Tom Lane, and me.