When key-share locking a tuple that has been not-key-updated, and the
update is a committed transaction, in some cases we raised
serializability errors:
ERROR: could not serialize access due to concurrent update
Because the key-share doesn't conflict with the update, the error is
unnecessary and inconsistent with the case that the update hasn't
committed yet. This causes problems for some usage patterns, even if it
can be claimed that it's sufficient to retry the aborted transaction:
given a steady stream of updating transactions and a long locking
transaction, the long transaction can be starved indefinitely despite
multiple retries.
To fix, we recognize that HeapTupleSatisfiesUpdate can return
HeapTupleUpdated when an updating transaction has committed, and that we
need to deal with that case exactly as if it were a non-committed
update: verify whether the two operations conflict, and if not, carry on
normally. If they do conflict, however, there is a difference: in the
HeapTupleBeingUpdated case we can just sleep until the concurrent
transaction is gone, while in the HeapTupleUpdated case this is not
possible and we must raise an error instead.
Per trouble report from Olivier Dony.
In addition to a couple of test cases that verify the changed behavior,
I added a test case to verify the behavior that remains unchanged,
namely that errors are raised when a update that modifies the key is
used. That must still generate serializability errors. One
pre-existing test case changes behavior; per discussion, the new
behavior is actually the desired one.
Discussion: https://www.postgresql.org/message-id/560AA479.4080807@odoo.comhttps://www.postgresql.org/message-id/20151014164844.3019.25750@wrigleys.postgresql.org
Backpatch to 9.3, where the problem appeared.
In commit afb9249d06, we (probably I) made ExecLockRows assign
null test tuples to all relations of the query while setting up to do an
EvalPlanQual recheck for a newly-updated locked row. This was sheerest
brain fade: we should only set test tuples for relations that are lockable
by the LockRows node, and in particular empty test tuples are only sensible
for inheritance child relations that weren't the source of the current
tuple from their inheritance tree. Setting a null test tuple for an
unrelated table causes it to return NULLs when it should not, as exhibited
in bug #14034 from Bronislav Houdek. To add insult to injury, doing it the
wrong way required two loops where one would suffice; so the corrected code
is even a bit shorter and faster.
Add a regression test case based on his example, and back-patch to 9.5
where the bug was introduced.
Commit 2834855cb added a not-very-carefully-thought-out isolation test
to check a BRIN index bug fix. The test depended on the availability
of the pageinspect contrib module, which meant it did not work in
several common testing scenarios such as "make check-world". It's not
clear whether we want a core test depending on a contrib module like
that, but in any case, failing to deal with the possibility that the
module isn't present in the installation-under-test is not acceptable.
Remove that test pending some better solution.
For correctness of summarization results, it is critical that the
snapshot used during the summarization scan is able to see all tuples
that are live to all transactions -- including tuples inserted or
deleted by in-progress transactions. Otherwise, it would be possible
for a transaction to insert a tuple, then idle for a long time while a
concurrent transaction executes summarization of the range: this would
result in the inserted value not being considered in the summary.
Previously we were trying to use a MVCC snapshot in conjunction with
adding a "placeholder" tuple in the index: the snapshot would see all
committed tuples, and the placeholder tuple would catch insertions by
any new inserters. The hole is that prior insertions by transactions
that are still in progress by the time the MVCC snapshot was taken were
ignored.
Kevin Grittner reported this as a bogus error message during vacuum with
default transaction isolation mode set to repeatable read (because the
error report mentioned a function name not being invoked during), but
the problem is larger than that.
To fix, tweak IndexBuildHeapRangeScan to have a new mode that behaves
the way we need using SnapshotAny visibility rules. This change
simplifies the BRIN code a bit, mainly by removing large comments that
were mistaken. Instead, rely on the SnapshotAny semantics to provide
what it needs. (The business about a placeholder tuple needs to remain:
that covers the case that a transaction inserts a a tuple in a page that
summarization already scanned.)
Discussion: https://www.postgresql.org/message-id/20150731175700.GX2441@postgresql.org
In passing, remove a couple of unused declarations from brin.h and
reword a comment to be proper English. This part submitted by Kevin
Grittner.
Backpatch to 9.5, where BRIN was introduced.
A Salesforce colleague of mine griped that the regression tests don't
exercise EvalPlanQualFetchRowMarks() and allied routines. Which is
a fair complaint. Add test cases that go through the REFERENCE and COPY
code paths. Unfortunately we don't have sufficient infrastructure right
now to exercise the FDW code path in the isolation tests, but this is
surely better than before.
specparse.y and specscanner.l used "string" as a token name. Now, bison
likes to define each token name as a macro for the token code it assigns,
which means those names are basically off-limits for any other use within
the grammar file or included headers. So names as generic as "string" are
dangerous. This is what was causing the recent failures on protosciurus:
some versions of Solaris' sys/kstat.h use "string" as a field name.
With late-model bison we don't see this problem because the token macros
aren't defined till later (that is why castoroides didn't show the problem
even though it's on the same machine). But protosciurus uses bison 1.875
which defines the token macros up front.
This land mine has been there from day one; we'd have found it sooner
except that protosciurus wasn't trying to run the isolation tests till
recently.
To fix, rename the token to "string_literal" which is hopefully less
likely to collide with names used by system headers. Back-patch to
all branches containing the isolation tests.
The newly added ON CONFLICT clause allows to specify an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using a
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint. DO NOTHING avoids the
constraint violation, without touching the pre-existing row. DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed. The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.
This feature is often referred to as upsert.
This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert. If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made. If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.
To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO now can alias its target table.
Bumps catversion as stored rules change.
Author: Peter Geoghegan, with significant contributions from Heikki
Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
Dean Rasheed, Stephen Frost and many others.
Before, make check-world would create a new temporary installation for
each test suite, which is slow and wasteful. Instead, we now create one
test installation that is used by all test suites that are part of a
make run.
The management of the temporary installation is removed from pg_regress
and handled in the makefiles. This allows for better control, and
unifies the code with that of test suites not run through pg_regress.
review and msvc support by Michael Paquier <michael.paquier@gmail.com>
more review by Fabien Coelho <coelho@cri.ensmp.fr>
Reduce lock levels to ShareRowExclusive for the following SQL
CREATE TRIGGER (but not DROP or ALTER)
ALTER TABLE ENABLE TRIGGER
ALTER TABLE DISABLE TRIGGER
ALTER TABLE … ADD CONSTRAINT FOREIGN KEY
Original work by Simon Riggs, extracted and refreshed by Andreas Karlsson
New test cases added by Andreas Karlsson
Reviewed by Noah Misch, Andres Freund, Michael Paquier and Simon Riggs
Up to now, the "child" executor state trees generated for EvalPlanQual
rechecks have simply shared the ResultRelInfo arrays used for the original
execution tree. However, this leads to dangling-pointer problems, because
ExecInitModifyTable() is all too willing to scribble on some fields of the
ResultRelInfo(s) even when it's being run in one of those child trees.
This trashes those fields from the perspective of the parent tree, because
even if the generated subtree is logically identical to what was in use in
the parent, it's in a memory context that will go away when we're done
with the child state tree.
We do however want to share information in the direction from the parent
down to the children; in particular, fields such as es_instrument *must*
be shared or we'll lose the stats arising from execution of the children.
So the simplest fix is to make a copy of the parent's ResultRelInfo array,
but not copy any fields back at end of child execution.
Per report from Manuel Kniep. The added isolation test is based on his
example. In an unpatched memory-clobber-enabled build it will reliably
fail with "ctid is NULL" errors in all branches back to 9.1, as a
consequence of junkfilter->jf_junkAttNo being overwritten with $7f7f.
This test cannot be run as-is before that for lack of WITH syntax; but
I have no doubt that some variant of this problem can arise in older
branches, so apply the code change all the way back.
Commit 0e5680f473 contained a thinko
mixing LOCKMODE with LockTupleMode. This caused misbehavior in the case
where a tuple is marked with a multixact with at most a FOR SHARE lock,
and another transaction tries to acquire a FOR NO KEY EXCLUSIVE lock;
this case should block but doesn't.
Include a new isolation tester spec file to explicitely try all the
tuple lock combinations; without the fix it shows the problem:
starting permutation: s1_begin s1_lcksvpt s1_tuplock2 s2_tuplock3 s1_commit
step s1_begin: BEGIN;
step s1_lcksvpt: SELECT * FROM multixact_conflict FOR KEY SHARE; SAVEPOINT foo;
a
1
step s1_tuplock2: SELECT * FROM multixact_conflict FOR SHARE;
a
1
step s2_tuplock3: SELECT * FROM multixact_conflict FOR NO KEY UPDATE;
a
1
step s1_commit: COMMIT;
With the fixed code, step s2_tuplock3 blocks until session 1 commits,
which is the correct behavior.
All other cases behave correctly.
Backpatch to 9.3, like the commit that introduced the problem.
In READ COMMITTED mode, if a SELECT FOR UPDATE discovers it has to redo
WHERE-clause checking on rows that have been updated since the SELECT's
snapshot, it invokes EvalPlanQual processing to do that. If this first
occurs within a non-first child table of an inheritance tree, the previous
coding could accidentally re-return a matching row from an earlier,
already-scanned child table. (And, to add insult to injury, I think this
could make it miss returning a row that should have been returned, if the
updated row that this happens on should still have passed the WHERE qual.)
Per report from Kyotaro Horiguchi; the added isolation test is based on his
test case.
This has been broken for quite awhile, so back-patch to all supported
branches.
This clause changes the behavior of SELECT locking clauses in the
presence of locked rows: instead of causing a process to block waiting
for the locks held by other processes (or raise an error, with NOWAIT),
SKIP LOCKED makes the new reader skip over such rows. While this is not
appropriate behavior for general purposes, there are some cases in which
it is useful, such as queue-like tables.
Catalog version bumped because this patch changes the representation of
stored rules.
Reviewed by Craig Ringer (based on a previous attempt at an
implementation by Simon Riggs, who also provided input on the syntax
used in the current patch), David Rowley, and Álvaro Herrera.
Author: Thomas Munro
If SELECT FOR UPDATE NOWAIT tries to lock a tuple that is concurrently
being updated, it might fail to honor its NOWAIT specification and block
instead of raising an error.
Fix by adding a no-wait flag to EvalPlanQualFetch which it can pass down
to heap_lock_tuple; also use it in EvalPlanQualFetch itself to avoid
blocking while waiting for a concurrent transaction.
Authors: Craig Ringer and Thomas Munro, tweaked by Álvaro
http://www.postgresql.org/message-id/51FB6703.9090801@2ndquadrant.com
Per Thomas Munro in the course of his SKIP LOCKED feature submission,
who also provided one of the isolation test specs.
Backpatch to 9.4, because that's as far back as it applies without
conflicts (although the bug goes all the way back). To that branch also
backpatch Thomas Munro's new NOWAIT test cases, committed in master by
Heikki as commit 9ee16b49f0 .
In support of this, have the MSVC build follow GNU make in preferring
GNUmakefile over Makefile when a directory contains both.
Michael Paquier, reviewed by MauMau.
This reverts commit 45b7abe59e.
It turns out that the %name-prefix syntax without "=" does not work
at all in pre-2.4 Bison. We are not prepared to make such a large
jump in minimum required Bison version just to suppress a warning
message in a version hardly any developers are using yet.
When 3.0 gets more popular, we'll figure out a way to deal with this.
In the meantime, BISONFLAGS=-Wno-deprecated is recommendable for
anyone using 3.0 who doesn't want to see the warning.
%name-prefix doesn't use an "=" sign according to the Bison docs, but it
silently accepted one anyway, until Bison 3.0. This was originally a
typo of mine in commit 012abebab1, and we
seem to have slavishly copied the error into all the other grammar files.
Per report from Vik Fearing; analysis by Peter Eisentraut.
Back-patch to all active branches, since somebody might try to build
a back branch with up-to-date tools.
It's easy to forget using SYSTEMQUOTEs when constructing command strings
for system() or popen(). Even if we fix all the places missing it now, it is
bound to be forgotten again in the future. Introduce wrapper functions that
do the the extra quoting for you, and get rid of SYSTEMQUOTEs in all the
callers.
We previosly used SYSTEMQUOTEs in all the hard-coded command strings, and
this doesn't change the behavior of those. But user-supplied commands, like
archive_command, restore_command, COPY TO/FROM PROGRAM calls, as well as
pgbench's \shell, will now gain an extra pair of quotes. That is desirable,
but if you have existing scripts or config files that include an extra
pair of quotes, those might need to be adjusted.
Reviewed by Amit Kapila and Tom Lane
VALIDATE CONSTRAINT
CLUSTER ON
SET WITHOUT CLUSTER
ALTER COLUMN SET STATISTICS
ALTER COLUMN SET ()
ALTER COLUMN RESET ()
All other sub-commands use AccessExclusiveLock
Simon Riggs and Noah Misch
Reviews by Robert Haas and Andres Freund
We used to have externs for getopt() and its API variables scattered
all over the place. Now that we find we're going to need to tweak the
variable declarations for Cygwin, it seems like a good idea to have
just one place to tweak.
In this commit, the variables are declared "#ifndef HAVE_GETOPT_H".
That may or may not work everywhere, but we'll soon find out.
Andres Freund
This ensures that all stdout output is flushed immediately, to match
stderr. This eliminates the need for fflush(stdout) calls sprinkled all
over the place.
Per Daniel Wood in message 519A79C6.90308@salesforce.com
If a tuple was locked by transaction A, and transaction B updated it,
the new version of the tuple created by B would be locked by A, yet
visible only to B; due to an oversight in HeapTupleSatisfiesUpdate, the
lock held by A wouldn't get checked if transaction B later deleted (or
key-updated) the new version of the tuple. This might cause referential
integrity checks to give false positives (that is, allow deletes that
should have been rejected).
This is an easy oversight to have made, because prior to improved tuple
locks in commit 0ac5ad5134 it wasn't possible to have tuples created by
our own transaction that were also locked by remote transactions, and so
locks weren't even considered in that code path.
It is recommended that foreign keys be rechecked manually in bulk after
installing this update, in case some referenced rows are missing with
some referencing row remaining.
Per bug reported by Daniel Wood in
CAPweHKe5QQ1747X2c0tA=5zf4YnS2xcvGf13Opd-1Mq24rF1cQ@mail.gmail.com
In 247c76a989, I added some code to do fine-grained checking of
MultiXact status of locking/updating transactions when traversing an
update chain. There was a thinko in that patch which would have the
traversing abort, that is return HeapTupleUpdated, when the other
transaction is a committed lock-only. In this case we should ignore it
and return success instead. Of course, in the case where there is a
committed update, HeapTupleUpdated is the correct return value.
A user-visible symptom of this bug is that in REPEATABLE READ and
SERIALIZABLE transaction isolation modes spurious serializability errors
can occur:
ERROR: could not serialize access due to concurrent update
In order for this to happen, there needs to be a tuple that's key-share-
locked and also updated, and the update must abort; a subsequent
transaction trying to acquire a new lock on that tuple would abort with
the above error. The reason is that the initial FOR KEY SHARE is seen
as committed by the new locking transaction, which triggers this bug.
(If the UPDATE commits, then the serialization error is correctly
reported.)
When running a query in READ COMMITTED mode, what happens is that the
locking is aborted by the HeapTupleUpdated return value, then
EvalPlanQual fetches the newest version of the tuple, which is then the
only version that gets locked. (The second time the tuple is checked
there is no misbehavior on the committed lock-only, because it's not
checked by the code that traverses update chains; so no bug.) Only the
newest version of the tuple is locked, not older ones, but this is
harmless.
The isolation test added by this commit illustrates the desired
behavior, including the proper serialization errors that get thrown.
Backpatch to 9.3.
HeapTupleSatisfiesUpdate can very easily "forget" tuple locks while
checking the contents of a multixact and finding it contains an aborted
update, by setting the HEAP_XMAX_INVALID bit. This would lead to
concurrent transactions not noticing any previous locks held by
transactions that might still be running, and thus being able to acquire
subsequent locks they wouldn't be normally able to acquire.
This bug was introduced in commit 1ce150b7bb; backpatch this fix to 9.3,
like that commit.
This change reverts the change to the delete-abort-savept isolation test
in 1ce150b7bb, because that behavior change was caused by this bug.
Noticed by Andres Freund while investigating a different issue reported
by Noah Misch.
It is dangerous to do so, because some code expects to be able to see what's
the true Xmax even if it is aborted (particularly while traversing HOT
chains). So don't do it, and instead rely on the callers to verify for
abortedness, if necessary.
Several race conditions and bugs fixed in the process. One isolation test
changes the expected output due to these.
This also reverts commit c235a6a589, which is no longer necessary.
Backpatch to 9.3, where this function was introduced.
Andres Freund
Commit 061b88c732 saved argv0 to a
global buffer without ensuring that it was zero terminated,
allowing references to it to overrun the buffer and access other
memory. This probably would not have presented any security risk,
but could have resulted in very confusing failures if the path to
the executable was very long.
Reported by David Rowley
We can't search for the isolationtester binary until after we've set
up the environment, because otherwise when find_other_exec() tries
to invoke it with the -V option, it might fail for inability to
locate a working libpq. So postpone that step.
Andres Freund
Add asprintf(), pg_asprintf(), and psprintf() to simplify string
allocation and composition. Replacement implementations taken from
NetBSD.
Reviewed-by: Álvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Asif Naeem <anaeem.it@gmail.com>
Previously, isolationtester would forbid returning tuples in
session-specific teardown (but not global teardown), as well as in
global setup. Allow these places to return tuples, too.
Notice and complain about PQcancel() failures. Also, don't dump core if
an error PGresult doesn't contain severity and message subfields, as it
might not if it was generated by libpq itself. (We have a longstanding
TODO item to improve that, but in the meantime isolationtester had better
cope.)
I tripped across the latter item while investigating a trouble report on
buildfarm member spoonbill. As for the former, there's no evidence that
PQcancel failure is actually involved in spoonbill's problem, but it still
seems like a bad idea to ignore an error return code.
The buildfarm members using -DCLOBBER_CACHE_ALWAYS still don't like this
test. Some experimentation shows that on my machine, isolationtester's
query to check for "waiting" state takes 2 to 2.5 seconds to bind+execute
under -DCLOBBER_CACHE_ALWAYS. Set the timeouts to 5 seconds to leave some
headroom for possibly-slower buildfarm critters.
Really we ought to fix the "waiting" query, which is not only horridly
slow but outright wrong in detail; and then maybe we can back off these
timeouts. But right now I'm just trying to get the buildfarm green again.
Buildfarm member friarbird doesn't like this test as-committed, evidently
because it's so slow that the test framework doesn't reliably notice that
the backend is waiting before the timeout goes off. (This is not totally
surprising, since friarbird builds with -DCLOBBER_CACHE_ALWAYS.) Increase
the timeout delay from 1 second to 2 in hopes of resolving that problem.
This GUC allows limiting the time spent waiting to acquire any one
heavyweight lock.
In support of this, improve the recently-added timeout infrastructure
to permit efficiently enabling or disabling multiple timeouts at once.
That reduces the performance hit from turning on lock_timeout, though
it's still not zero.
Zoltán Böszörményi, reviewed by Tom Lane,
Stephen Frost, and Hari Babu