You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
postgres/src/backend/utils/adt/lockfuncs.c

829 lines
20 KiB

/*-------------------------------------------------------------------------
*
23 years ago
* lockfuncs.c
* Functions for SQL access to various lock-manager capabilities.
23 years ago
*
* Copyright (c) 2002-2016, PostgreSQL Global Development Group
23 years ago
*
* IDENTIFICATION
* src/backend/utils/adt/lockfuncs.c
*
*-------------------------------------------------------------------------
23 years ago
*/
#include "postgres.h"
#include "access/htup_details.h"
#include "access/xact.h"
#include "catalog/pg_type.h"
#include "funcapi.h"
#include "miscadmin.h"
Implement genuine serializable isolation level. Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run like they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, ie. it sometimes aborts transactions even though there is no anomaly. To track reads we implement predicate locking, see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment. A predicate lock doesn't conflict with any regular locks or with another predicate locks in the normal sense. They're only used by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for for other transactions. Predicate locks can't be released at commit, but must be remembered until all the transactions that overlapped with it have completed. That means that we need to remember an unbounded amount of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool. We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur. Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had. Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen
15 years ago
#include "storage/predicate_internals.h"
#include "utils/builtins.h"
23 years ago
/* This must match enum LockTagType! */
static const char *const LockTagTypeNames[] = {
"relation",
"extend",
"page",
"tuple",
"transactionid",
"virtualxid",
Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE. The newly added ON CONFLICT clause allows to specify an alternative to raising a unique or exclusion constraint violation error when inserting. ON CONFLICT refers to constraints that can either be specified using a inference clause (by specifying the columns of a unique constraint) or by naming a unique or exclusion constraint. DO NOTHING avoids the constraint violation, without touching the pre-existing row. DO UPDATE SET ... [WHERE ...] updates the pre-existing tuple, and has access to both the tuple proposed for insertion and the existing tuple; the optional WHERE clause can be used to prevent an update from being executed. The UPDATE SET and WHERE clauses have access to the tuple proposed for insertion using the "magic" EXCLUDED alias, and to the pre-existing tuple using the table name or its alias. This feature is often referred to as upsert. This is implemented using a new infrastructure called "speculative insertion". It is an optimistic variant of regular insertion that first does a pre-check for existing tuples and then attempts an insert. If a violating tuple was inserted concurrently, the speculatively inserted tuple is deleted and a new attempt is made. If the pre-check finds a matching tuple the alternative DO NOTHING or DO UPDATE action is taken. If the insertion succeeds without detecting a conflict, the tuple is deemed inserted. To handle the possible ambiguity between the excluded alias and a table named excluded, and for convenience with long relation names, INSERT INTO now can alias its target table. Bumps catversion as stored rules change. Author: Peter Geoghegan, with significant contributions from Heikki Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes. Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs, Dean Rasheed, Stephen Frost and many others.
11 years ago
"speculative token",
"object",
"userlock",
"advisory"
};
Implement genuine serializable isolation level. Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run like they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, ie. it sometimes aborts transactions even though there is no anomaly. To track reads we implement predicate locking, see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment. A predicate lock doesn't conflict with any regular locks or with another predicate locks in the normal sense. They're only used by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for for other transactions. Predicate locks can't be released at commit, but must be remembered until all the transactions that overlapped with it have completed. That means that we need to remember an unbounded amount of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool. We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur. Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had. Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen
15 years ago
/* This must match enum PredicateLockTargetType (predicate_internals.h) */
static const char *const PredicateLockTagTypeNames[] = {
"relation",
"page",
"tuple"
};
/* Working status for pg_lock_status */
typedef struct
{
LockData *lockData; /* state data from lmgr */
int currIdx; /* current PROCLOCK index */
Implement genuine serializable isolation level. Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run like they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, ie. it sometimes aborts transactions even though there is no anomaly. To track reads we implement predicate locking, see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment. A predicate lock doesn't conflict with any regular locks or with another predicate locks in the normal sense. They're only used by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for for other transactions. Predicate locks can't be released at commit, but must be remembered until all the transactions that overlapped with it have completed. That means that we need to remember an unbounded amount of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool. We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur. Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had. Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen
15 years ago
PredicateLockData *predLockData; /* state data for pred locks */
int predLockIdx; /* current index for pred lock */
} PG_Lock_Status;
/* Number of columns in pg_locks output */
#define NUM_LOCK_STATUS_COLUMNS 15
/*
* VXIDGetDatum - Construct a text representation of a VXID
*
* This is currently only used in pg_lock_status, so we put it here.
*/
static Datum
VXIDGetDatum(BackendId bid, LocalTransactionId lxid)
{
/*
* The representation is "<bid>/<lxid>", decimal and unsigned decimal
* respectively. Note that elog.c also knows how to format a vxid.
*/
char vxidstr[32];
snprintf(vxidstr, sizeof(vxidstr), "%d/%u", bid, lxid);
return CStringGetTextDatum(vxidstr);
}
/*
* pg_lock_status - produce a view with one row per held or awaited lock mode
*/
23 years ago
Datum
pg_lock_status(PG_FUNCTION_ARGS)
23 years ago
{
23 years ago
FuncCallContext *funcctx;
PG_Lock_Status *mystatus;
LockData *lockData;
Implement genuine serializable isolation level. Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run like they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, ie. it sometimes aborts transactions even though there is no anomaly. To track reads we implement predicate locking, see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment. A predicate lock doesn't conflict with any regular locks or with another predicate locks in the normal sense. They're only used by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for for other transactions. Predicate locks can't be released at commit, but must be remembered until all the transactions that overlapped with it have completed. That means that we need to remember an unbounded amount of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool. We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur. Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had. Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen
15 years ago
PredicateLockData *predLockData;
23 years ago
if (SRF_IS_FIRSTCALL())
{
23 years ago
TupleDesc tupdesc;
MemoryContext oldcontext;
23 years ago
/* create a function context for cross-call persistence */
funcctx = SRF_FIRSTCALL_INIT();
23 years ago
/*
* switch to memory context appropriate for multiple function calls
23 years ago
*/
oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
/* build tupdesc for result tuples */
/* this had better match pg_locks view in system_views.sql */
tupdesc = CreateTemplateTupleDesc(NUM_LOCK_STATUS_COLUMNS, false);
TupleDescInitEntry(tupdesc, (AttrNumber) 1, "locktype",
TEXTOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 2, "database",
OIDOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 3, "relation",
OIDOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 4, "page",
INT4OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 5, "tuple",
INT2OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 6, "virtualxid",
TEXTOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 7, "transactionid",
XIDOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 8, "classid",
OIDOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 9, "objid",
OIDOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 10, "objsubid",
INT2OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 11, "virtualtransaction",
TEXTOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 12, "pid",
INT4OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 13, "mode",
TEXTOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 14, "granted",
BOOLOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 15, "fastpath",
BOOLOID, -1, 0);
funcctx->tuple_desc = BlessTupleDesc(tupdesc);
23 years ago
/*
* Collect all the locking information that we will format and send
* out as a result set.
23 years ago
*/
mystatus = (PG_Lock_Status *) palloc(sizeof(PG_Lock_Status));
funcctx->user_fctx = (void *) mystatus;
23 years ago
mystatus->lockData = GetLockStatusData();
mystatus->currIdx = 0;
Implement genuine serializable isolation level. Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run like they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, ie. it sometimes aborts transactions even though there is no anomaly. To track reads we implement predicate locking, see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment. A predicate lock doesn't conflict with any regular locks or with another predicate locks in the normal sense. They're only used by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for for other transactions. Predicate locks can't be released at commit, but must be remembered until all the transactions that overlapped with it have completed. That means that we need to remember an unbounded amount of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool. We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur. Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had. Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen
15 years ago
mystatus->predLockData = GetPredicateLockStatusData();
mystatus->predLockIdx = 0;
23 years ago
MemoryContextSwitchTo(oldcontext);
23 years ago
}
23 years ago
funcctx = SRF_PERCALL_SETUP();
mystatus = (PG_Lock_Status *) funcctx->user_fctx;
lockData = mystatus->lockData;
23 years ago
while (mystatus->currIdx < lockData->nelements)
23 years ago
{
23 years ago
bool granted;
LOCKMODE mode = 0;
const char *locktypename;
char tnbuf[32];
Datum values[NUM_LOCK_STATUS_COLUMNS];
bool nulls[NUM_LOCK_STATUS_COLUMNS];
23 years ago
HeapTuple tuple;
Datum result;
LockInstanceData *instance;
23 years ago
instance = &(lockData->locks[mystatus->currIdx]);
23 years ago
/*
* Look to see if there are any held lock modes in this PROCLOCK. If
* so, report, and destructively modify lockData so we don't report
* again.
23 years ago
*/
granted = false;
if (instance->holdMask)
23 years ago
{
for (mode = 0; mode < MAX_LOCKMODES; mode++)
{
if (instance->holdMask & LOCKBIT_ON(mode))
{
granted = true;
instance->holdMask &= LOCKBIT_OFF(mode);
break;
}
}
23 years ago
}
/*
23 years ago
* If no (more) held modes to report, see if PROC is waiting for a
* lock on this lock.
*/
if (!granted)
23 years ago
{
if (instance->waitLockMode != NoLock)
{
/* Yes, so report it with proper mode */
mode = instance->waitLockMode;
23 years ago
/*
* We are now done with this PROCLOCK, so advance pointer to
* continue with next one on next call.
*/
mystatus->currIdx++;
}
else
{
/*
* Okay, we've displayed all the locks associated with this
* PROCLOCK, proceed to the next one.
*/
mystatus->currIdx++;
continue;
}
}
23 years ago
/*
* Form tuple with appropriate data.
*/
MemSet(values, 0, sizeof(values));
MemSet(nulls, false, sizeof(nulls));
if (instance->locktag.locktag_type <= LOCKTAG_LAST_TYPE)
locktypename = LockTagTypeNames[instance->locktag.locktag_type];
else
{
snprintf(tnbuf, sizeof(tnbuf), "unknown %d",
(int) instance->locktag.locktag_type);
locktypename = tnbuf;
}
values[0] = CStringGetTextDatum(locktypename);
switch ((LockTagType) instance->locktag.locktag_type)
{
case LOCKTAG_RELATION:
case LOCKTAG_RELATION_EXTEND:
values[1] = ObjectIdGetDatum(instance->locktag.locktag_field1);
values[2] = ObjectIdGetDatum(instance->locktag.locktag_field2);
nulls[3] = true;
nulls[4] = true;
nulls[5] = true;
nulls[6] = true;
nulls[7] = true;
nulls[8] = true;
nulls[9] = true;
break;
case LOCKTAG_PAGE:
values[1] = ObjectIdGetDatum(instance->locktag.locktag_field1);
values[2] = ObjectIdGetDatum(instance->locktag.locktag_field2);
values[3] = UInt32GetDatum(instance->locktag.locktag_field3);
nulls[4] = true;
nulls[5] = true;
nulls[6] = true;
nulls[7] = true;
nulls[8] = true;
nulls[9] = true;
break;
case LOCKTAG_TUPLE:
values[1] = ObjectIdGetDatum(instance->locktag.locktag_field1);
values[2] = ObjectIdGetDatum(instance->locktag.locktag_field2);
values[3] = UInt32GetDatum(instance->locktag.locktag_field3);
values[4] = UInt16GetDatum(instance->locktag.locktag_field4);
nulls[5] = true;
nulls[6] = true;
nulls[7] = true;
nulls[8] = true;
nulls[9] = true;
break;
case LOCKTAG_TRANSACTION:
values[6] =
TransactionIdGetDatum(instance->locktag.locktag_field1);
nulls[1] = true;
nulls[2] = true;
nulls[3] = true;
nulls[4] = true;
nulls[5] = true;
nulls[7] = true;
nulls[8] = true;
nulls[9] = true;
break;
case LOCKTAG_VIRTUALTRANSACTION:
values[5] = VXIDGetDatum(instance->locktag.locktag_field1,
instance->locktag.locktag_field2);
nulls[1] = true;
nulls[2] = true;
nulls[3] = true;
nulls[4] = true;
nulls[6] = true;
nulls[7] = true;
nulls[8] = true;
nulls[9] = true;
break;
case LOCKTAG_OBJECT:
case LOCKTAG_USERLOCK:
case LOCKTAG_ADVISORY:
default: /* treat unknown locktags like OBJECT */
values[1] = ObjectIdGetDatum(instance->locktag.locktag_field1);
values[7] = ObjectIdGetDatum(instance->locktag.locktag_field2);
values[8] = ObjectIdGetDatum(instance->locktag.locktag_field3);
values[9] = Int16GetDatum(instance->locktag.locktag_field4);
nulls[2] = true;
nulls[3] = true;
nulls[4] = true;
nulls[5] = true;
nulls[6] = true;
break;
23 years ago
}
values[10] = VXIDGetDatum(instance->backend, instance->lxid);
if (instance->pid != 0)
values[11] = Int32GetDatum(instance->pid);
else
nulls[11] = true;
values[12] = CStringGetTextDatum(GetLockmodeName(instance->locktag.locktag_lockmethodid, mode));
values[13] = BoolGetDatum(granted);
values[14] = BoolGetDatum(instance->fastpath);
23 years ago
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
SRF_RETURN_NEXT(funcctx, result);
23 years ago
}
Implement genuine serializable isolation level. Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run like they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, ie. it sometimes aborts transactions even though there is no anomaly. To track reads we implement predicate locking, see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment. A predicate lock doesn't conflict with any regular locks or with another predicate locks in the normal sense. They're only used by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for for other transactions. Predicate locks can't be released at commit, but must be remembered until all the transactions that overlapped with it have completed. That means that we need to remember an unbounded amount of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool. We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur. Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had. Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen
15 years ago
/*
* Have returned all regular locks. Now start on the SIREAD predicate
* locks.
*/
predLockData = mystatus->predLockData;
if (mystatus->predLockIdx < predLockData->nelements)
{
PredicateLockTargetType lockType;
PREDICATELOCKTARGETTAG *predTag = &(predLockData->locktags[mystatus->predLockIdx]);
SERIALIZABLEXACT *xact = &(predLockData->xacts[mystatus->predLockIdx]);
Datum values[NUM_LOCK_STATUS_COLUMNS];
bool nulls[NUM_LOCK_STATUS_COLUMNS];
Implement genuine serializable isolation level. Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run like they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, ie. it sometimes aborts transactions even though there is no anomaly. To track reads we implement predicate locking, see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment. A predicate lock doesn't conflict with any regular locks or with another predicate locks in the normal sense. They're only used by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for for other transactions. Predicate locks can't be released at commit, but must be remembered until all the transactions that overlapped with it have completed. That means that we need to remember an unbounded amount of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool. We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur. Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had. Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen
15 years ago
HeapTuple tuple;
Datum result;
mystatus->predLockIdx++;
/*
* Form tuple with appropriate data.
*/
MemSet(values, 0, sizeof(values));
MemSet(nulls, false, sizeof(nulls));
/* lock type */
lockType = GET_PREDICATELOCKTARGETTAG_TYPE(*predTag);
values[0] = CStringGetTextDatum(PredicateLockTagTypeNames[lockType]);
/* lock target */
values[1] = GET_PREDICATELOCKTARGETTAG_DB(*predTag);
values[2] = GET_PREDICATELOCKTARGETTAG_RELATION(*predTag);
if (lockType == PREDLOCKTAG_TUPLE)
values[4] = GET_PREDICATELOCKTARGETTAG_OFFSET(*predTag);
else
nulls[4] = true;
if ((lockType == PREDLOCKTAG_TUPLE) ||
(lockType == PREDLOCKTAG_PAGE))
values[3] = GET_PREDICATELOCKTARGETTAG_PAGE(*predTag);
else
nulls[3] = true;
/* these fields are targets for other types of locks */
nulls[5] = true; /* virtualxid */
nulls[6] = true; /* transactionid */
nulls[7] = true; /* classid */
nulls[8] = true; /* objid */
nulls[9] = true; /* objsubid */
/* lock holder */
values[10] = VXIDGetDatum(xact->vxid.backendId,
xact->vxid.localTransactionId);
if (xact->pid != 0)
values[11] = Int32GetDatum(xact->pid);
else
nulls[11] = true;
Implement genuine serializable isolation level. Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run like they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, ie. it sometimes aborts transactions even though there is no anomaly. To track reads we implement predicate locking, see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment. A predicate lock doesn't conflict with any regular locks or with another predicate locks in the normal sense. They're only used by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for for other transactions. Predicate locks can't be released at commit, but must be remembered until all the transactions that overlapped with it have completed. That means that we need to remember an unbounded amount of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool. We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur. Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had. Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen
15 years ago
/*
* Lock mode. Currently all predicate locks are SIReadLocks, which are
* always held (never waiting) and have no fast path
Implement genuine serializable isolation level. Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run like they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, ie. it sometimes aborts transactions even though there is no anomaly. To track reads we implement predicate locking, see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment. A predicate lock doesn't conflict with any regular locks or with another predicate locks in the normal sense. They're only used by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for for other transactions. Predicate locks can't be released at commit, but must be remembered until all the transactions that overlapped with it have completed. That means that we need to remember an unbounded amount of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool. We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur. Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had. Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen
15 years ago
*/
values[12] = CStringGetTextDatum("SIReadLock");
values[13] = BoolGetDatum(true);
values[14] = BoolGetDatum(false);
Implement genuine serializable isolation level. Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run like they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, ie. it sometimes aborts transactions even though there is no anomaly. To track reads we implement predicate locking, see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment. A predicate lock doesn't conflict with any regular locks or with another predicate locks in the normal sense. They're only used by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for for other transactions. Predicate locks can't be released at commit, but must be remembered until all the transactions that overlapped with it have completed. That means that we need to remember an unbounded amount of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool. We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur. Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had. Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen
15 years ago
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
SRF_RETURN_NEXT(funcctx, result);
}
SRF_RETURN_DONE(funcctx);
23 years ago
}
/*
* Functions for manipulating advisory locks
*
* We make use of the locktag fields as follows:
*
* field1: MyDatabaseId ... ensures locks are local to each database
* field2: first of 2 int4 keys, or high-order half of an int8 key
* field3: second of 2 int4 keys, or low-order half of an int8 key
* field4: 1 if using an int8 key, 2 if using 2 int4 keys
*/
#define SET_LOCKTAG_INT64(tag, key64) \
SET_LOCKTAG_ADVISORY(tag, \
MyDatabaseId, \
(uint32) ((key64) >> 32), \
(uint32) (key64), \
1)
#define SET_LOCKTAG_INT32(tag, key1, key2) \
SET_LOCKTAG_ADVISORY(tag, MyDatabaseId, key1, key2, 2)
static void
PreventAdvisoryLocksInParallelMode(void)
{
if (IsInParallelMode())
ereport(ERROR,
(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
errmsg("cannot use advisory locks during a parallel operation")));
}
/*
* pg_advisory_lock(int8) - acquire exclusive lock on an int8 key
*/
Datum
pg_advisory_lock_int8(PG_FUNCTION_ARGS)
{
int64 key = PG_GETARG_INT64(0);
LOCKTAG tag;
PreventAdvisoryLocksInParallelMode();
SET_LOCKTAG_INT64(tag, key);
(void) LockAcquire(&tag, ExclusiveLock, true, false);
PG_RETURN_VOID();
}
/*
* pg_advisory_xact_lock(int8) - acquire xact scoped
* exclusive lock on an int8 key
*/
Datum
pg_advisory_xact_lock_int8(PG_FUNCTION_ARGS)
{
int64 key = PG_GETARG_INT64(0);
LOCKTAG tag;
PreventAdvisoryLocksInParallelMode();
SET_LOCKTAG_INT64(tag, key);
(void) LockAcquire(&tag, ExclusiveLock, false, false);
PG_RETURN_VOID();
}
/*
* pg_advisory_lock_shared(int8) - acquire share lock on an int8 key
*/
Datum
pg_advisory_lock_shared_int8(PG_FUNCTION_ARGS)
{
int64 key = PG_GETARG_INT64(0);
LOCKTAG tag;
PreventAdvisoryLocksInParallelMode();
SET_LOCKTAG_INT64(tag, key);
(void) LockAcquire(&tag, ShareLock, true, false);
PG_RETURN_VOID();
}
/*
* pg_advisory_xact_lock_shared(int8) - acquire xact scoped
* share lock on an int8 key
*/
Datum
pg_advisory_xact_lock_shared_int8(PG_FUNCTION_ARGS)
{
int64 key = PG_GETARG_INT64(0);
LOCKTAG tag;
PreventAdvisoryLocksInParallelMode();
SET_LOCKTAG_INT64(tag, key);
(void) LockAcquire(&tag, ShareLock, false, false);
PG_RETURN_VOID();
}
/*
* pg_try_advisory_lock(int8) - acquire exclusive lock on an int8 key, no wait
*
* Returns true if successful, false if lock not available
*/
Datum
pg_try_advisory_lock_int8(PG_FUNCTION_ARGS)
{
int64 key = PG_GETARG_INT64(0);
LOCKTAG tag;
LockAcquireResult res;
PreventAdvisoryLocksInParallelMode();
SET_LOCKTAG_INT64(tag, key);
res = LockAcquire(&tag, ExclusiveLock, true, true);
PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL);
}
/*
* pg_try_advisory_xact_lock(int8) - acquire xact scoped
* exclusive lock on an int8 key, no wait
*
* Returns true if successful, false if lock not available
*/
Datum
pg_try_advisory_xact_lock_int8(PG_FUNCTION_ARGS)
{
int64 key = PG_GETARG_INT64(0);
LOCKTAG tag;
LockAcquireResult res;
PreventAdvisoryLocksInParallelMode();
SET_LOCKTAG_INT64(tag, key);
res = LockAcquire(&tag, ExclusiveLock, false, true);
PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL);
}
/*
* pg_try_advisory_lock_shared(int8) - acquire share lock on an int8 key, no wait
*
* Returns true if successful, false if lock not available
*/
Datum
pg_try_advisory_lock_shared_int8(PG_FUNCTION_ARGS)
{
int64 key = PG_GETARG_INT64(0);
LOCKTAG tag;
LockAcquireResult res;
PreventAdvisoryLocksInParallelMode();
SET_LOCKTAG_INT64(tag, key);
res = LockAcquire(&tag, ShareLock, true, true);
PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL);
}
/*
* pg_try_advisory_xact_lock_shared(int8) - acquire xact scoped
* share lock on an int8 key, no wait
*
* Returns true if successful, false if lock not available
*/
Datum
pg_try_advisory_xact_lock_shared_int8(PG_FUNCTION_ARGS)
{
int64 key = PG_GETARG_INT64(0);
LOCKTAG tag;
LockAcquireResult res;
PreventAdvisoryLocksInParallelMode();
SET_LOCKTAG_INT64(tag, key);
res = LockAcquire(&tag, ShareLock, false, true);
PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL);
}
/*
* pg_advisory_unlock(int8) - release exclusive lock on an int8 key
*
* Returns true if successful, false if lock was not held
*/
Datum
pg_advisory_unlock_int8(PG_FUNCTION_ARGS)
{
int64 key = PG_GETARG_INT64(0);
LOCKTAG tag;
bool res;
PreventAdvisoryLocksInParallelMode();
SET_LOCKTAG_INT64(tag, key);
res = LockRelease(&tag, ExclusiveLock, true);
PG_RETURN_BOOL(res);
}
/*
* pg_advisory_unlock_shared(int8) - release share lock on an int8 key
*
* Returns true if successful, false if lock was not held
*/
Datum
pg_advisory_unlock_shared_int8(PG_FUNCTION_ARGS)
{
int64 key = PG_GETARG_INT64(0);
LOCKTAG tag;
bool res;
PreventAdvisoryLocksInParallelMode();
SET_LOCKTAG_INT64(tag, key);
res = LockRelease(&tag, ShareLock, true);
PG_RETURN_BOOL(res);
}
/*
* pg_advisory_lock(int4, int4) - acquire exclusive lock on 2 int4 keys
*/
Datum
pg_advisory_lock_int4(PG_FUNCTION_ARGS)
{
int32 key1 = PG_GETARG_INT32(0);
int32 key2 = PG_GETARG_INT32(1);
LOCKTAG tag;
PreventAdvisoryLocksInParallelMode();
SET_LOCKTAG_INT32(tag, key1, key2);
(void) LockAcquire(&tag, ExclusiveLock, true, false);
PG_RETURN_VOID();
}
/*
* pg_advisory_xact_lock(int4, int4) - acquire xact scoped
* exclusive lock on 2 int4 keys
*/
Datum
pg_advisory_xact_lock_int4(PG_FUNCTION_ARGS)
{
int32 key1 = PG_GETARG_INT32(0);
int32 key2 = PG_GETARG_INT32(1);
LOCKTAG tag;
PreventAdvisoryLocksInParallelMode();
SET_LOCKTAG_INT32(tag, key1, key2);
(void) LockAcquire(&tag, ExclusiveLock, false, false);
PG_RETURN_VOID();
}
/*
* pg_advisory_lock_shared(int4, int4) - acquire share lock on 2 int4 keys
*/
Datum
pg_advisory_lock_shared_int4(PG_FUNCTION_ARGS)
{
int32 key1 = PG_GETARG_INT32(0);
int32 key2 = PG_GETARG_INT32(1);
LOCKTAG tag;
PreventAdvisoryLocksInParallelMode();
SET_LOCKTAG_INT32(tag, key1, key2);
(void) LockAcquire(&tag, ShareLock, true, false);
PG_RETURN_VOID();
}
/*
* pg_advisory_xact_lock_shared(int4, int4) - acquire xact scoped
* share lock on 2 int4 keys
*/
Datum
pg_advisory_xact_lock_shared_int4(PG_FUNCTION_ARGS)
{
int32 key1 = PG_GETARG_INT32(0);
int32 key2 = PG_GETARG_INT32(1);
LOCKTAG tag;
PreventAdvisoryLocksInParallelMode();
SET_LOCKTAG_INT32(tag, key1, key2);
(void) LockAcquire(&tag, ShareLock, false, false);
PG_RETURN_VOID();
}
/*
* pg_try_advisory_lock(int4, int4) - acquire exclusive lock on 2 int4 keys, no wait
*
* Returns true if successful, false if lock not available
*/
Datum
pg_try_advisory_lock_int4(PG_FUNCTION_ARGS)
{
int32 key1 = PG_GETARG_INT32(0);
int32 key2 = PG_GETARG_INT32(1);
LOCKTAG tag;
LockAcquireResult res;
PreventAdvisoryLocksInParallelMode();
SET_LOCKTAG_INT32(tag, key1, key2);
res = LockAcquire(&tag, ExclusiveLock, true, true);
PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL);
}
/*
* pg_try_advisory_xact_lock(int4, int4) - acquire xact scoped
* exclusive lock on 2 int4 keys, no wait
*
* Returns true if successful, false if lock not available
*/
Datum
pg_try_advisory_xact_lock_int4(PG_FUNCTION_ARGS)
{
int32 key1 = PG_GETARG_INT32(0);
int32 key2 = PG_GETARG_INT32(1);
LOCKTAG tag;
LockAcquireResult res;
PreventAdvisoryLocksInParallelMode();
SET_LOCKTAG_INT32(tag, key1, key2);
res = LockAcquire(&tag, ExclusiveLock, false, true);
PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL);
}
/*
* pg_try_advisory_lock_shared(int4, int4) - acquire share lock on 2 int4 keys, no wait
*
* Returns true if successful, false if lock not available
*/
Datum
pg_try_advisory_lock_shared_int4(PG_FUNCTION_ARGS)
{
int32 key1 = PG_GETARG_INT32(0);
int32 key2 = PG_GETARG_INT32(1);
LOCKTAG tag;
LockAcquireResult res;
PreventAdvisoryLocksInParallelMode();
SET_LOCKTAG_INT32(tag, key1, key2);
res = LockAcquire(&tag, ShareLock, true, true);
PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL);
}
/*
* pg_try_advisory_xact_lock_shared(int4, int4) - acquire xact scoped
* share lock on 2 int4 keys, no wait
*
* Returns true if successful, false if lock not available
*/
Datum
pg_try_advisory_xact_lock_shared_int4(PG_FUNCTION_ARGS)
{
int32 key1 = PG_GETARG_INT32(0);
int32 key2 = PG_GETARG_INT32(1);
LOCKTAG tag;
LockAcquireResult res;
PreventAdvisoryLocksInParallelMode();
SET_LOCKTAG_INT32(tag, key1, key2);
res = LockAcquire(&tag, ShareLock, false, true);
PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL);
}
/*
* pg_advisory_unlock(int4, int4) - release exclusive lock on 2 int4 keys
*
* Returns true if successful, false if lock was not held
*/
Datum
pg_advisory_unlock_int4(PG_FUNCTION_ARGS)
{
int32 key1 = PG_GETARG_INT32(0);
int32 key2 = PG_GETARG_INT32(1);
LOCKTAG tag;
bool res;
PreventAdvisoryLocksInParallelMode();
SET_LOCKTAG_INT32(tag, key1, key2);
res = LockRelease(&tag, ExclusiveLock, true);
PG_RETURN_BOOL(res);
}
/*
* pg_advisory_unlock_shared(int4, int4) - release share lock on 2 int4 keys
*
* Returns true if successful, false if lock was not held
*/
Datum
pg_advisory_unlock_shared_int4(PG_FUNCTION_ARGS)
{
int32 key1 = PG_GETARG_INT32(0);
int32 key2 = PG_GETARG_INT32(1);
LOCKTAG tag;
bool res;
PreventAdvisoryLocksInParallelMode();
SET_LOCKTAG_INT32(tag, key1, key2);
res = LockRelease(&tag, ShareLock, true);
PG_RETURN_BOOL(res);
}
/*
* pg_advisory_unlock_all() - release all advisory locks
*/
Datum
pg_advisory_unlock_all(PG_FUNCTION_ARGS)
{
LockReleaseSession(USER_LOCKMETHOD);
PG_RETURN_VOID();
}