/*-------------------------------------------------------------------------
 *
 * shmqueue.c
 *	  shared memory linked lists
 *
 * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *
 * IDENTIFICATION
 *	  src/backend/storage/ipc/shmqueue.c
 *
 * NOTES
 *
 * Package for managing doubly-linked lists in shared memory.
 * The only tricky thing is that SHM_QUEUE will usually be a field
 * in a larger record.  SHMQueueNext has to return a pointer
 * to the record itself instead of a pointer to the SHMQueue field
 * of the record.  It takes an extra parameter and does some extra
 * pointer arithmetic to do this correctly.
 *
 * NOTE: These are set up so they can be turned into macros some day.
 *
 *-------------------------------------------------------------------------
 */
#include "postgres.h"

#include "storage/shmem.h"

/*
 * SHMQueueInit -- make the head of a new queue point
 *		to itself
 */
void
SHMQueueInit(SHM_QUEUE *queue)
{
	Assert(ShmemAddrIsValid(queue));
	queue->prev = queue->next = queue;
}

/*
 * SHMQueueIsDetached -- TRUE if element is not currently
 *		in a queue.
 */
bool
SHMQueueIsDetached(const SHM_QUEUE *queue)
{
	Assert(ShmemAddrIsValid(queue));
	return (queue->prev == NULL);
}

/*
 * SHMQueueElemInit -- clear an element's links
 */
void
SHMQueueElemInit(SHM_QUEUE *queue)
{
	Assert(ShmemAddrIsValid(queue));
	queue->prev = queue->next = NULL;
}

/*
 * SHMQueueDelete -- remove an element from the queue and
 *		close the links
 */
void
SHMQueueDelete(SHM_QUEUE *queue)
{
	SHM_QUEUE  *nextElem = queue->next;
	SHM_QUEUE  *prevElem = queue->prev;

	Assert(ShmemAddrIsValid(queue));
	Assert(ShmemAddrIsValid(nextElem));
	Assert(ShmemAddrIsValid(prevElem));

	prevElem->next = queue->next;
	nextElem->prev = queue->prev;

	queue->prev = queue->next = NULL;
}

/*
 * SHMQueueInsertBefore -- put elem in queue before the given queue
 *		element.  Inserting "before" the queue head puts the elem
 *		at the tail of the queue.
 */
void
SHMQueueInsertBefore(SHM_QUEUE *queue, SHM_QUEUE *elem)
{
	SHM_QUEUE  *prevPtr = queue->prev;

	Assert(ShmemAddrIsValid(queue));
	Assert(ShmemAddrIsValid(elem));

	elem->next = prevPtr->next;
	elem->prev = queue->prev;
	queue->prev = elem;
	prevPtr->next = elem;
}

/*
 * SHMQueueInsertAfter -- put elem in queue after the given queue
 *		element.  Inserting "after" the queue head puts the elem
 *		at the head of the queue.
 */
void
SHMQueueInsertAfter(SHM_QUEUE *queue, SHM_QUEUE *elem)
{
	SHM_QUEUE  *nextPtr = queue->next;

	Assert(ShmemAddrIsValid(queue));
	Assert(ShmemAddrIsValid(elem));

	elem->prev = nextPtr->prev;
	elem->next = queue->next;
	queue->next = elem;
	nextPtr->prev = elem;
}
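
/*
 * Illustration only, not part of shmqueue.c.  The sketch below is a
 * standalone approximation of how inserting "after" and "before" the queue
 * head determines element order.  DemoLink, DemoItem and the demo_*
 * functions are hypothetical stand-ins so the example builds without
 * postgres.h; the pointer updates mirror SHMQueueInsertBefore and
 * SHMQueueInsertAfter above, minus the ShmemAddrIsValid assertions.
 */
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SHM_QUEUE. */
typedef struct DemoLink
{
	struct DemoLink *prev;
	struct DemoLink *next;
} DemoLink;

/* Hypothetical record that embeds the link as its second field. */
typedef struct
{
	int			id;
	DemoLink	link;
} DemoItem;

static void
demo_init(DemoLink *head)
{
	head->prev = head->next = head;
}

static void
demo_insert_before(DemoLink *queue, DemoLink *elem)
{
	DemoLink   *prevPtr = queue->prev;

	elem->next = prevPtr->next;
	elem->prev = queue->prev;
	queue->prev = elem;
	prevPtr->next = elem;
}

static void
demo_insert_after(DemoLink *queue, DemoLink *elem)
{
	DemoLink   *nextPtr = queue->next;

	elem->prev = nextPtr->prev;
	elem->next = queue->next;
	queue->next = elem;
	nextPtr->prev = elem;
}

/* Returns the ids in head-to-tail order, packed as decimal digits. */
static int
demo_order(void)
{
	DemoLink	head;
	DemoItem	a = {1, {NULL, NULL}};
	DemoItem	b = {2, {NULL, NULL}};
	DemoItem	c = {3, {NULL, NULL}};
	DemoLink   *cur;
	int			digits = 0;

	demo_init(&head);
	demo_insert_after(&head, &a.link);	/* queue: 1 */
	demo_insert_before(&head, &b.link);	/* queue: 1 2 (b goes to the tail) */
	demo_insert_after(&head, &c.link);	/* queue: 3 1 2 (c goes to the head) */

	assert(head.next == &c.link);
	assert(head.prev == &b.link);

	for (cur = head.next; cur != &head; cur = cur->next)
	{
		/* Step back from the embedded link to the enclosing record. */
		DemoItem   *item = (DemoItem *) ((char *) cur - offsetof(DemoItem, link));

		digits = digits * 10 + item->id;
	}
	return digits;
}
```
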
/*--------------------
 * SHMQueueNext -- Get the next element from a queue
 *
 * To start the iteration, pass the queue head as both queue and curElem.
 * Returns NULL if no more elements.
 *
 * Next element is at curElem->next.  If SHM_QUEUE is part of
 * a larger structure, we want to return a pointer to the
 * whole structure rather than a pointer to its SHM_QUEUE field.
 * For example,
 *		typedef struct {
 *			int			stuff;
 *			SHM_QUEUE	elem;
 *		} ELEMType;
 * When this element is in a queue, prevElem->next points at struct.elem.
 * We subtract linkOffset to get the correct start address of the structure.
 *
 * Calls to SHMQueueNext should take these parameters:
 *	 &(queueHead), &(queueHead), offsetof(ELEMType, elem)
 * or
 *	 &(queueHead), &(curElem->elem), offsetof(ELEMType, elem)
 *--------------------
 */
Pointer
SHMQueueNext(const SHM_QUEUE *queue, const SHM_QUEUE *curElem, Size linkOffset)
{
	SHM_QUEUE  *elemPtr = curElem->next;

	Assert(ShmemAddrIsValid(curElem));

	if (elemPtr == queue)		/* back to the queue head? */
		return NULL;

	return (Pointer) (((char *) elemPtr) - linkOffset);
}
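
/*
 * Illustration only, not part of shmqueue.c.  A minimal standalone sketch
 * of the calling convention described for SHMQueueNext: start with the
 * queue head as both arguments, then pass the link of the element just
 * returned.  DemoQueue, DemoElem and the demo_* functions are hypothetical
 * stand-ins for SHM_QUEUE, ELEMType and the functions in this file, so the
 * sketch compiles without postgres.h.
 */
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SHM_QUEUE. */
typedef struct DemoQueue
{
	struct DemoQueue *prev;
	struct DemoQueue *next;
} DemoQueue;

/* Hypothetical counterpart of ELEMType: the link is not the first field. */
typedef struct
{
	int			stuff;
	DemoQueue	elem;
} DemoElem;

static void
demo_queue_init(DemoQueue *head)
{
	head->prev = head->next = head;
}

/* Append at the tail, as inserting "before" the head does. */
static void
demo_append(DemoQueue *head, DemoQueue *elem)
{
	DemoQueue  *prevPtr = head->prev;

	elem->next = prevPtr->next;
	elem->prev = head->prev;
	head->prev = elem;
	prevPtr->next = elem;
}

/* Same logic as SHMQueueNext: NULL once we are back at the head. */
static void *
demo_next(const DemoQueue *queue, const DemoQueue *curElem, size_t linkOffset)
{
	DemoQueue  *elemPtr = curElem->next;

	if (elemPtr == queue)
		return NULL;
	return (void *) (((char *) elemPtr) - linkOffset);
}

/* Walks a three-element queue and returns the sum of the "stuff" fields. */
static int
demo_sum_queue(void)
{
	DemoQueue	head;
	DemoElem	a = {10, {NULL, NULL}};
	DemoElem	b = {20, {NULL, NULL}};
	DemoElem	c = {30, {NULL, NULL}};
	DemoElem   *cur;
	int			sum = 0;

	demo_queue_init(&head);
	demo_append(&head, &a.elem);
	demo_append(&head, &b.elem);
	demo_append(&head, &c.elem);

	/* First call: queue head is passed as both queue and curElem. */
	cur = (DemoElem *) demo_next(&head, &head, offsetof(DemoElem, elem));
	assert(cur == &a);

	while (cur != NULL)
	{
		sum += cur->stuff;
		/* Later calls: pass the link of the current element. */
		cur = (DemoElem *) demo_next(&head, &cur->elem, offsetof(DemoElem, elem));
	}
	return sum;
}
```
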

/*--------------------
 * SHMQueuePrev -- Get the previous element from a queue
 *
 * Same as SHMQueueNext, but starts at the tail and moves towards the head.
 * All other comments and usage notes for SHMQueueNext apply.
 *--------------------
 */
Pointer
SHMQueuePrev(const SHM_QUEUE *queue, const SHM_QUEUE *curElem, Size linkOffset)
{
	SHM_QUEUE  *elemPtr = curElem->prev;

	Assert(ShmemAddrIsValid(curElem));

	if (elemPtr == queue)		/* back to the queue head? */
		return NULL;

	return (Pointer) (((char *) elemPtr) - linkOffset);
}

/*
 * SHMQueueEmpty -- TRUE if queue head is only element, FALSE otherwise
 */
bool
SHMQueueEmpty(const SHM_QUEUE *queue)
{
	Assert(ShmemAddrIsValid(queue));

	if (queue->prev == queue)
	{
		Assert(queue->next == queue);
		return TRUE;
	}
	return FALSE;
}
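
/*
 * Illustration only, not part of shmqueue.c.  A minimal standalone sketch
 * of how the "empty" and "detached" states relate over an element's
 * lifetime: a freshly initialized head is empty, an element with cleared
 * links is detached, and deleting the only element restores both states.
 * LifeLink and the life_* functions are hypothetical stand-ins for
 * SHM_QUEUE and the functions in this file, so the sketch compiles without
 * postgres.h.
 */
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for SHM_QUEUE. */
typedef struct LifeLink
{
	struct LifeLink *prev;
	struct LifeLink *next;
} LifeLink;

static void
life_init(LifeLink *head)
{
	head->prev = head->next = head;
}

static void
life_elem_init(LifeLink *elem)
{
	elem->prev = elem->next = NULL;
}

static bool
life_is_detached(const LifeLink *elem)
{
	return elem->prev == NULL;
}

static bool
life_is_empty(const LifeLink *head)
{
	return head->prev == head;
}

/* Insert directly after the head, i.e. at the front of the queue. */
static void
life_insert_after(LifeLink *queue, LifeLink *elem)
{
	LifeLink   *nextPtr = queue->next;

	elem->prev = nextPtr->prev;
	elem->next = queue->next;
	queue->next = elem;
	nextPtr->prev = elem;
}

/* Unlink the element and clear its links, as SHMQueueDelete does. */
static void
life_delete(LifeLink *elem)
{
	LifeLink   *nextElem = elem->next;
	LifeLink   *prevElem = elem->prev;

	prevElem->next = nextElem;
	nextElem->prev = prevElem;
	elem->prev = elem->next = NULL;
}

/* Returns 1 if every state check along the lifecycle holds, else 0. */
static int
life_cycle_ok(void)
{
	LifeLink	head;
	LifeLink	elem;
	bool		ok = true;

	life_init(&head);
	life_elem_init(&elem);
	ok = ok && life_is_empty(&head) && life_is_detached(&elem);

	life_insert_after(&head, &elem);
	ok = ok && !life_is_empty(&head) && !life_is_detached(&elem);

	life_delete(&elem);
	ok = ok && life_is_empty(&head) && life_is_detached(&elem);

	assert(head.next == &head);
	return ok ? 1 : 0;
}
```
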