src/backend/storage/lmgr/README-SSI

Serializable Snapshot Isolation (SSI) and Predicate Locking
===========================================================

This is currently sitting in the lmgr directory because about 90% of
the code is an implementation of predicate locking, which is required
for SSI, rather than being directly related to SSI itself.  When
another use for predicate locking justifies the effort to tease these
two things apart, this README file should probably be split.


Credits
-------

This feature was developed by Kevin Grittner and Dan R. K. Ports,
with review and suggestions from Joe Conway, Heikki Linnakangas, and
Jeff Davis.  It is based on work published in these papers:

	Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. 2008.
	Serializable isolation for snapshot databases.
	In SIGMOD '08: Proceedings of the 2008 ACM SIGMOD
	international conference on Management of data,
	pages 729-738, New York, NY, USA. ACM.
	http://doi.acm.org/10.1145/1376616.1376690

	Michael James Cahill. 2009.
	Serializable Isolation for Snapshot Databases.
	Sydney Digital Theses.
	University of Sydney, School of Information Technologies.
	http://hdl.handle.net/2123/5353

Overview
--------

With true serializable transactions, if you can show that your
transaction will do the right thing if there are no concurrent
transactions, it will do the right thing in any mix of serializable
transactions or be rolled back with a serialization failure.  This
feature has been implemented in PostgreSQL using SSI.

Serializable and Snapshot Transaction Isolation Levels
------------------------------------------------------

Serializable transaction isolation is attractive for shops with
active development by many programmers against a complex schema
because it guarantees data integrity with very little staff time --
if a transaction can be shown to always do the right thing when it is
run alone (before or after any other transaction), it will always do
the right thing in any mix of concurrent serializable transactions.
Where conflicts with other transactions would result in an
inconsistent state within the database, or an inconsistent view of
the data, a serializable transaction will block or roll back to
prevent the anomaly.  The SQL standard provides a specific SQLSTATE
for errors generated when a transaction rolls back for this reason,
so that transactions can be retried automatically.
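
The retry behavior described above can be sketched in application
code.  The following is an illustrative Python sketch, not
driver-specific code: the SerializationFailure class and the
run_serializable() helper are invented stand-ins for how a real client
library would surface and handle SQLSTATE 40001.

```python
class SerializationFailure(Exception):
    """Simulates an error carrying SQLSTATE 40001, 'serialization failure'."""
    sqlstate = "40001"

def run_serializable(txn_fn, max_retries=5):
    """Run txn_fn, retrying whenever it is rolled back with SQLSTATE 40001.

    A serialization failure means the transaction had no effect, so an
    immediate retry is always safe."""
    for _attempt in range(max_retries):
        try:
            return txn_fn()
        except SerializationFailure:
            continue  # safe to retry: the rolled-back transaction did nothing
    raise RuntimeError("giving up after %d serialization failures" % max_retries)

# A transaction that fails twice before succeeding, as might happen
# under concurrent serializable load.
attempts = {"n": 0}
def transfer():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationFailure()
    return "committed"

print(run_serializable(transfer))  # prints "committed" on the third attempt
```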

Before version 9.1, PostgreSQL did not support a full serializable
isolation level.  A request for serializable transaction isolation
actually provided snapshot isolation.  This has well-known anomalies
which can allow data corruption or inconsistent views of the data
during concurrent transactions, although these anomalies only occur
when certain patterns of read-write dependencies exist within a set
of concurrent transactions.  Where these patterns exist, the anomalies
can be prevented by introducing conflicts through explicitly
programmed locks or otherwise-unnecessary writes to the database.
Snapshot isolation is popular because performance is better than
serializable isolation and the integrity guarantees which it does
provide allow anomalies to be avoided or managed with reasonable
effort in many environments.

Serializable Isolation Implementation Strategies
------------------------------------------------

Techniques for implementing full serializable isolation have been
published and in use in many database products for decades.  The
primary technique which has been used is Strict Two-Phase Locking
(S2PL), which operates by blocking writes against data which has been
read by concurrent transactions and blocking any access (read or
write) against data which has been written by concurrent
transactions.  A cycle in a graph of blocking indicates a deadlock,
requiring a rollback.  Blocking and deadlocks under S2PL in high
contention workloads can be debilitating, crippling throughput and
response time.

A new technique for implementing full serializable isolation in an
MVCC database appears in the literature beginning in 2008.  This
technique, known as Serializable Snapshot Isolation (SSI), has many of
the advantages of snapshot isolation.  In particular, reads don't
block anything and writes don't block reads.  Essentially, it runs
snapshot isolation but monitors the read-write conflicts between
transactions to identify dangerous structures in the transaction
graph which indicate that a set of concurrent transactions might
produce an anomaly, and rolls back transactions to ensure that no
anomalies occur.  It will produce some false positives (where a
transaction is rolled back even though there would not have been an
anomaly), but will never let an anomaly occur.  In the two known
prototype implementations, performance for many workloads (even with
the need to restart transactions which are rolled back) is very close
to snapshot isolation and generally far better than an S2PL
implementation.

Apparent Serial Order of Execution
----------------------------------

One way to understand when snapshot anomalies can occur, and to
visualize the difference between the serializable implementations
described above, is to consider that among transactions executing at
the serializable transaction isolation level, the results are
required to be consistent with some serial (one-at-a-time) execution
of the transactions[1].  How is that order determined in each?

S2PL locks rows used by the transaction in a way which blocks
conflicting access, so that at the moment of a successful commit it
is certain that no conflicting access has occurred.  Some transactions
may have blocked, essentially being partially serialized with the
committing transaction, to allow this.  Some transactions may have
been rolled back, due to cycles in the blocking.  But with S2PL,
transactions can always be viewed as having occurred serially, in the
order of successful commit.

With snapshot isolation, reads never block writes, nor vice versa, so
there is much less actual serialization.  The order in which
transactions appear to have executed is determined by something more
subtle than in S2PL: read/write dependencies.  If a transaction
attempts to read data which is not visible to it because the
transaction which wrote it (or will later write it) is concurrent
(one of them was running when the other acquired its snapshot), then
the reading transaction appears to have executed first, regardless of
the actual sequence of transaction starts or commits (since it sees a
database state prior to that in which the other transaction leaves
it).  If one transaction has both rw-dependencies in (meaning that a
concurrent transaction attempts to read data it writes) and out
(meaning it attempts to read data a concurrent transaction writes),
and a couple of other conditions are met, there can appear to be a
cycle in the execution order of the transactions.  This is when the
anomalies occur.
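
The "both in and out" condition above can be illustrated with a toy
model.  This Python sketch is not PostgreSQL code; it simply records
rw-dependencies as (reader, writer) pairs and flags any transaction
that sits on the write side of one dependency and the read side of
another.  The transaction names are invented.

```python
def find_pivots(rw_edges):
    """rw_edges: set of (reader, writer) pairs, each meaning the reader
    did not see the writer's concurrent write, so the reader appears to
    have executed first.  Returns the transactions that have both an
    incoming and an outgoing rw-dependency -- the candidates for the
    dangerous structure described in the text."""
    readers = {r for r, w in rw_edges}
    writers = {w for r, w in rw_edges}
    return readers & writers

# Classic write skew: T1 reads what T2 writes, and T2 reads what T1
# writes.  Each transaction then has both an in- and an out-conflict,
# and the apparent execution order contains a cycle.
edges = {("T1", "T2"), ("T2", "T1")}
print(find_pivots(edges))  # both T1 and T2 qualify
```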

SSI works by watching for the conditions mentioned above, and rolling
back a transaction when needed to prevent any anomaly.  The apparent
order of execution will always be consistent with any actual
serialization (i.e., a transaction which runs by itself can always be
considered to have run after any transactions committed before it
started and before any transaction which starts after it commits); but
among concurrent transactions it will appear that the transaction on
the read side of a rw-dependency executed before the transaction on
the write side.

PostgreSQL Implementation
-------------------------

The implementation of serializable transactions for PostgreSQL is
accomplished through Serializable Snapshot Isolation (SSI), based on
the work of Cahill, et al.  Fundamentally, this allows snapshot
isolation to run as it has, while monitoring for conditions which
could create a serialization anomaly.

* Since this technique is based on Snapshot Isolation (SI), those
areas in PostgreSQL which don't use SI can't be brought under SSI.
This includes system tables, temporary tables, sequences, hint bit
rewrites, etc.  SSI cannot eliminate existing anomalies in these
areas.

* Any transaction which is run at a transaction isolation level
other than SERIALIZABLE will not be affected by SSI.  If you want to
enforce business rules through SSI, all transactions should be run at
the SERIALIZABLE transaction isolation level, and that should
probably be set as the default.

* If all transactions are run at the SERIALIZABLE transaction
isolation level, business rules can be enforced in triggers or
application code without ever having a need to acquire an explicit
lock or to use SELECT FOR SHARE or SELECT FOR UPDATE.

* Those who want to continue to use snapshot isolation without
the additional protections of SSI (and the associated costs of
enforcing those protections) can use the REPEATABLE READ transaction
isolation level.  This level will retain its legacy behavior, which
is identical to the old SERIALIZABLE implementation and fully
consistent with the standard's requirements for the REPEATABLE READ
transaction isolation level.

* Performance under this SSI implementation will be significantly
improved if transactions which don't modify permanent tables are
declared to be READ ONLY before they begin reading data.

* Performance under SSI will tend to degrade more rapidly with a
large number of active database transactions than under less strict
isolation levels.  Limiting the number of active transactions through
use of a connection pool or similar techniques may be necessary to
maintain good performance.

* Any transaction which must be rolled back to prevent
serialization anomalies will fail with SQLSTATE 40001, which has a
standard meaning of "serialization failure".

* This SSI implementation makes an effort to choose the
transaction to be cancelled such that an immediate retry of the
transaction will not fail due to conflicts with exactly the same
transactions.  Pursuant to this goal, no transaction is cancelled
until one of the other transactions in the set of conflicts which
could generate an anomaly has successfully committed.  This is
conceptually similar to how write conflicts are handled.  To fully
implement this guarantee there needs to be a way to roll back the
active transaction for another process with a serialization failure
SQLSTATE, even if it is "idle in transaction".

Predicate Locking
-----------------

Both S2PL and SSI require some form of predicate locking to handle
situations where reads conflict with later inserts or with later
updates which move data into the selected range.  PostgreSQL didn't
already have predicate locking, so it needed to be added to support
full serializable transactions under either strategy.  Practical
implementations of predicate locking generally involve acquiring
locks against data as it is accessed, using multiple granularities
(tuple, page, table, etc.) with escalation as needed to keep the lock
count to a number which can be tracked within RAM structures, and
this approach was used in PostgreSQL.  Coarse granularities can cause
some false positive indications of conflict.  The number of false
positives can be influenced by plan choice.

Implementation overview
-----------------------

New RAM structures, inspired by those used to track traditional locks
in PostgreSQL, but tailored to the needs of SIREAD predicate locking,
are used.  These refer to physical objects actually accessed in the
course of executing the query, to model the predicates through
inference.  Anyone interested in this subject should review the
Hellerstein, Stonebraker and Hamilton paper[2], along with the
locking papers referenced from that and the Cahill papers.

Because the SIREAD locks don't block, traditional locking techniques
had to be modified.  Intent locking (locking higher-level objects
before locking lower-level objects) doesn't work with non-blocking
"locks" (which are, in some respects, more like flags than locks).

A configurable amount of shared memory is reserved at postmaster
start-up to track predicate locks.  This size cannot be changed
without a restart.

* To prevent resource exhaustion, multiple fine-grained locks may
be promoted to a single coarser-grained lock as needed.

* An attempt to acquire an SIREAD lock on a tuple when the same
transaction already holds an SIREAD lock on the page or the relation
will be ignored.  Likewise, an attempt to lock a page when the
relation is locked will be ignored, and the acquisition of a coarser
lock will result in the automatic release of all finer-grained locks
it covers.
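
The covering and promotion rules above can be sketched with a toy
lock table.  This Python model is purely illustrative (targets are
represented as hierarchical tuples, and the class name is invented);
it is not the actual shared-memory structure, but it follows the same
rules: a request covered by a coarser held lock is ignored, and taking
a coarser lock releases the finer locks it covers.

```python
class PredicateLockTable:
    """Targets are hierarchical paths, e.g. (rel,), (rel, page),
    (rel, page, tuple).  A shorter path covers any longer path that
    extends it."""
    def __init__(self):
        self.held = set()

    def acquire(self, target):
        # Ignore the request if a coarser (or identical) lock covers it.
        for h in self.held:
            if len(h) <= len(target) and target[:len(h)] == h:
                return
        # Taking a coarser lock releases all finer locks it covers.
        self.held = {h for h in self.held
                     if not (len(target) < len(h) and h[:len(target)] == target)}
        self.held.add(target)

locks = PredicateLockTable()
locks.acquire(("rel1", "page4", "tuple9"))
locks.acquire(("rel1", "page4"))            # promotion: tuple lock released
locks.acquire(("rel1", "page4", "tuple2"))  # ignored: page lock covers it
print(locks.held)  # {('rel1', 'page4')}
```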

Heap locking
------------

Predicate locks will be acquired for the heap based on the following:

* For a table scan, the entire relation will be locked.

* Each tuple read which is visible to the reading transaction
will be locked, whether or not it meets selection criteria; except
that there is no need to acquire an SIREAD lock on a tuple when the
transaction already holds a write lock on any tuple representing the
row, since a rw-dependency would also create a ww-dependency which
has more aggressive enforcement and will thus prevent any anomaly.

Index AM implementations
------------------------

Since predicate locks only exist to detect writes which conflict with
earlier reads, and heap tuple locks are acquired to cover all heap
tuples actually read, including those read through indexes, the index
tuples which were actually scanned are not of interest in themselves;
we only care about their "new neighbors" -- later inserts into the
index which would have been included in the scan had they existed at
the time.  Conceptually, we want to lock the gaps between and
surrounding index entries within the scanned range.

Correctness requires that any insert into an index generates a
rw-conflict with a concurrent serializable transaction if, after that
insert, re-execution of any index scan of the other transaction would
access the heap for a row not accessed during the previous execution.
Note that a non-HOT update which expires an old index entry covered
by the scan and adds a new entry for the modified row's new tuple
need not generate a conflict, although an update which "moves" a row
into the scan must generate a conflict.  While correctness allows
false positives, they should be minimized for performance reasons.
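
For an ordered (btree-like) index, the correctness rule above amounts
to a key-range check.  The following Python sketch is purely
illustrative, with invented names; it models a scan predicate-locking
its key range, and a later insert conflicting exactly when its key
falls inside a range some earlier scan covered.

```python
scanned_ranges = []   # (low, high, transaction) for each completed scan

def record_scan(lo, hi, xact):
    """Predicate-lock the key range [lo, hi] scanned by xact."""
    scanned_ranges.append((lo, hi, xact))

def conflicting_scans(key):
    """Transactions whose earlier scan would have returned a row with
    this key -- i.e., inserting it must generate a rw-conflict."""
    return {x for lo, hi, x in scanned_ranges if lo <= key <= hi}

record_scan(10, 20, "T1")      # T1 scanned keys in [10, 20]
print(conflicting_scans(15))   # {'T1'}: a "new neighbor" inside the range
print(conflicting_scans(25))   # set(): outside the range, no conflict
```

An update that "moves" a row so its new index key lands inside [10, 20]
conflicts under this check, while a non-HOT update whose new key stays
outside does not, matching the rule in the text.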

Several optimizations are possible:

* An index scan which is just finding the right position for an
index insertion or deletion need not acquire a predicate lock.

* An index scan which is comparing for equality on the entire key
for a unique index need not acquire a predicate lock as long as a key
is found corresponding to a visible tuple which has not been modified
by another transaction -- there are no "between or around" gaps to
cover.

* As long as built-in foreign key enforcement continues to use
its current "special tricks" to deal with MVCC issues, predicate
locks should not be needed for scans done by enforcement code.

* If a search determines that no rows can be found regardless of
index contents because the search conditions are contradictory (e.g.,
x = 1 AND x = 2), then no predicate lock is needed.

Other index AM implementation considerations:

* If a btree search discovers that no root page has yet been
created, a predicate lock on the index relation is required;
otherwise btree searches must get to the leaf level to determine
which tuples match, so predicate locks go there.

* GiST searches can determine that there are no matches at any
level of the index, so there must be a predicate lock at each index
level during a GiST search.  An index insert at the leaf level can
then be trusted to ripple up to all levels and locations where
conflicting predicate locks may exist.

* The effects of page splits, overflows, consolidations, and
removals must be carefully reviewed to ensure that predicate locks
aren't "lost" during those operations, or kept with pages which could
get re-used for different parts of the index.

Innovations
-----------

The PostgreSQL implementation of Serializable Snapshot Isolation
differs from what is described in the cited papers for several
reasons:

1. PostgreSQL didn't have any existing predicate locking.  It had
to be added from scratch.

2. The existing in-memory lock structures were not suitable for
tracking SIREAD locks.
  * The database products used for the prototype
  implementations for the papers used update-in-place with a rollback
  log for their MVCC implementations, while PostgreSQL leaves the old
  version of a row in place and adds a new tuple to represent the row
  at a new location.

  * In PostgreSQL, tuple-level locks are not held in RAM for
  any length of time; lock information is written to the tuples
  involved in the transactions.

  * In PostgreSQL, existing lock structures have pointers to
  memory which is related to a connection.  SIREAD locks need to
  persist past the end of the originating transaction and even the
  connection which ran it.

  * PostgreSQL needs to be able to tolerate a large number of
  transactions executing while one long-running transaction stays open
  -- the in-RAM techniques discussed in the papers wouldn't support
  that.

3. Unlike the database products used for the prototypes described
in the papers, PostgreSQL didn't already have a true serializable
isolation level distinct from snapshot isolation.

4. PostgreSQL supports subtransactions -- an issue not mentioned
in the papers.

5. PostgreSQL doesn't assign a transaction number to a database
transaction until and unless necessary.

6. PostgreSQL has pluggable data types with user-definable
operators, as well as pluggable index types, not all of which are
based around data types which support ordering.

7. Some possible optimizations became apparent during development
and testing.

Differences from the implementation described in the papers are
listed below.

* New structures needed to be created in shared memory to track
the proper information for serializable transactions and their SIREAD
locks.

* Because PostgreSQL does not have the same concept of an "oldest
transaction ID" for all serializable transactions as assumed in the
Cahill thesis, we track the oldest snapshot xmin among serializable
transactions, and a count of how many active transactions use that
xmin.  When the count hits zero we find the new oldest xmin and run a
clean-up based on that.
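
The xmin bookkeeping above can be sketched as a reference count per
snapshot xmin.  The Python model below is illustrative only; the
register/deregister names and the dictionary representation are
invented, not PostgreSQL internals.

```python
active_xmins = {}   # xmin -> number of active serializable transactions using it

def register(xmin):
    """A serializable transaction starts with this snapshot xmin."""
    active_xmins[xmin] = active_xmins.get(xmin, 0) + 1

def deregister(xmin):
    """A transaction using this xmin finishes.  If it was the last user
    of the oldest xmin, report the new oldest xmin (or None) so clean-up
    of data older than it can run."""
    active_xmins[xmin] -= 1
    if active_xmins[xmin] == 0 and xmin == min(active_xmins):
        del active_xmins[xmin]
        return min(active_xmins, default=None)  # new oldest xmin, if any
    if active_xmins[xmin] == 0:
        del active_xmins[xmin]
    return None

register(100); register(100); register(105)
assert deregister(100) is None   # one transaction still uses xmin 100
print(deregister(100))           # prints 105: the oldest xmin advanced
```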

* Because reads in a subtransaction may cause that subtransaction
to roll back, thereby affecting what is written by the top level
transaction, predicate locks must survive a subtransaction rollback.
As a consequence, all xid usage in SSI, including predicate locking,
is based on the top level xid.  When looking at an xid that comes
from a tuple's xmin or xmax, for example, we always call
SubTransGetTopmostTransaction() before doing much else with it.
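
The resolution step above can be sketched as a walk up the parent
chain.  In this Python illustration the parent table is an invented
stand-in for the real subtransaction-to-parent mapping; only
SubTransGetTopmostTransaction() is an actual PostgreSQL function name.

```python
parent = {103: 101, 101: 100}   # subxact xid -> parent xid; 100 is top level

def topmost_xid(xid):
    """Follow the parent chain to the top-level xid, roughly what
    SubTransGetTopmostTransaction() does, so all SSI bookkeeping is
    keyed by the top-level transaction."""
    while xid in parent:
        xid = parent[xid]
    return xid

print(topmost_xid(103))  # prints 100: locks are recorded under the top xid
print(topmost_xid(200))  # prints 200: already a top-level xid
```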

* Predicate locking in PostgreSQL will start at the tuple level
when possible, with automatic conversion of multiple fine-grained
locks to coarser granularity as needed to avoid resource exhaustion.
The amount of memory used for these structures will be configurable,
to balance RAM usage against SIREAD lock granularity.
* A process-local copy of the locks held by a process, along with
the coarser covering locks and their counts, is kept to support
granularity promotion decisions with low CPU and locking overhead.

* Conflicts will be identified by looking for predicate locks
when tuples are written and looking at the MVCC information when
tuples are read.  There is no matching between two RAM-based locks.

* Because write locks are stored in the heap tuples rather than a
RAM-based lock table, the optimization described in the Cahill thesis
which eliminates an SIREAD lock where there is a write lock is
implemented by the following:

  1. When checking a heap write for conflicts against existing
  predicate locks, a tuple lock on the tuple being written is removed.

  2. When acquiring a predicate lock on a heap tuple, we
  return quickly without doing anything if it is a tuple written by
  the reading transaction.

* Rather than using conflictIn and conflictOut pointers which use
NULL to indicate no conflict and a self-reference to indicate
multiple conflicts or conflicts with committed transactions, we use a
list of rw-conflicts.  With the more complete information, false
positives are reduced and we have sufficient data for more aggressive
clean-up and other optimizations.

  o We can avoid ever rolling back a transaction until and
  unless there is a pivot where a transaction on the conflict *out*
  side of the pivot committed before either of the other transactions.

  o We can avoid ever rolling back a transaction when the
  transaction on the conflict *in* side of the pivot is explicitly or
  implicitly READ ONLY, unless the transaction on the conflict *out*
  side of the pivot committed before the READ ONLY transaction
  acquired its snapshot.  (An implicit READ ONLY transaction is one
  which committed without writing, even though it was not explicitly
  declared to be READ ONLY.)

  o We can more aggressively clean up conflicts, predicate
  locks, and SSI transaction information.
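
The first two rules above can be sketched as a single check.  This
Python function is an illustrative reading of those conditions with
invented field names (commit and snapshot as comparable timestamps, a
read_only flag); it is a sketch of the decision logic, not the actual
SSI cancellation code.

```python
def must_cancel(t_in, pivot, t_out):
    """Given the dangerous structure t_in -rw-> pivot -rw-> t_out, decide
    whether a rollback is required yet.  Each argument is a dict with
    'commit' (time or None) and 'snapshot' time; t_in may carry a
    'read_only' flag."""
    if t_out["commit"] is None:
        return False        # nothing committed yet: no rollback needed so far
    committed_others = [t for t in (t_in, pivot) if t["commit"] is not None]
    if any(t["commit"] < t_out["commit"] for t in committed_others):
        return False        # t_out did not commit before the others
    if t_in.get("read_only") and not t_out["commit"] < t_in["snapshot"]:
        return False        # READ ONLY optimization: t_in cannot see the anomaly
    return True

t_in  = {"snapshot": 5, "commit": None, "read_only": False}
pivot = {"snapshot": 1, "commit": None}
t_out = {"snapshot": 2, "commit": 4}
print(must_cancel(t_in, pivot, t_out))  # prints True: t_out committed first
```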

* Allow a READ ONLY transaction to "opt out" of SSI if there are
no READ WRITE transactions which could cause the READ ONLY
transaction to ever become part of a "dangerous structure" of
overlapping transaction dependencies.

* Allow the user to request that a READ ONLY transaction wait
until the conditions are right for it to start in the "opt out" state
described above.  We add a DEFERRABLE state to transactions, which is
specified and maintained in a way similar to READ ONLY.  It is
ignored for transactions which are not SERIALIZABLE and READ ONLY.

* When a transaction must be rolled back, we pick among the
active transactions such that an immediate retry will not fail again
on conflicts with the same transactions.

* We use the PostgreSQL SLRU system to hold summarized
information about older committed transactions to put an upper bound
on RAM used.  Beyond that limit, information spills to disk.
Performance can degrade in a pessimal situation, but it should be
tolerable, and transactions won't need to be cancelled or blocked
from starting.

R&D Issues
----------

This is intended to be the place to record specific issues which need
more detailed review or analysis.

* WAL file replay.  While serializable implementations using S2PL
can guarantee that the write-ahead log contains commits in a sequence
consistent with some serial execution of serializable transactions,
SSI cannot make that guarantee.  While the WAL replay is no less
consistent than under snapshot isolation, it is possible that under
PITR recovery or hot standby a database could reach a readable state
where some transactions appear before other transactions which would
have had to precede them to maintain serializable consistency.  In
essence, if we do nothing, WAL replay will be at snapshot isolation
even for serializable transactions.  Is this OK?  If not, how do we
address it?

* External replication.  Look at how this impacts external
replication solutions, like Postgres-R, Slony, pgpool, HS/SR, etc.
This is related to the "WAL file replay" issue.

* Weak-memory-ordering machines.  Make sure that shared memory
access which involves visibility across multiple transactions uses
locks as needed to avoid problems.  On the other hand, ensure that we
really need volatile where we're using it.
http://archives.postgresql.org/pgsql-committers/2008-06/msg00228.php

* UNIQUE btree search for equality on all columns.  Since a search
of a UNIQUE index using equality tests on all columns will lock the
heap tuple if an entry is found, it appears that there is no need to
get a predicate lock on the index in that case.  A predicate lock is
still needed for such a search if a matching index entry which points
to a visible tuple is not found.

* Planner index probes.  To avoid problems with data skew at the
ends of an index which have historically caused bad plans, the
planner now probes the end of an index to see what the maximum or
minimum value is when a query appears to be requesting a range of
data outside what the statistics show is present.  These planner
checks don't require predicate locking, but there's currently no easy
way to avoid it.  What can we do to avoid predicate locking for such
planner activity?

* Minimize touching of shared memory.  Should lists in shared
memory push entries which have just been returned to the front of the
available list, so they will be popped back off soon and some memory
might never be touched, or should we keep adding returned items to
the end of the available list?

Footnotes
---------

[1] http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
Search for "serial execution" to find the relevant section.

[2] http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf
Joseph M. Hellerstein, Michael Stonebraker and James Hamilton. 2007.
Architecture of a Database System.  Foundations and Trends(R) in
Databases Vol. 1, No. 2 (2007) 141-259.
Of particular interest:
  * 6.1 A Note on ACID
  * 6.2 A Brief Review of Serializability
  * 6.3 Locking and Latching
  * 6.3.1 Transaction Isolation Levels
  * 6.5.3 Next-Key Locking: Physical Surrogates for Logical Properties