mirror of https://github.com/postgres/postgres
parent
a4155d3bbd
commit
7809cbf7e6
@ -0,0 +1,99 @@ |
||||
$Header: /cvsroot/pgsql/src/backend/executor/README,v 1.1 2001/05/15 00:35:50 tgl Exp $ |
||||
|
||||
The Postgres Executor |
||||
--------------------- |
||||
|
||||
The executor processes a tree of "plan nodes". The plan tree is essentially |
||||
a demand-pull pipeline of tuple processing operations. Each node, when |
||||
called, will produce the next tuple in its output sequence, or NULL if no |
||||
more tuples are available. If the node is not a primitive relation-scanning |
||||
node, it will have child node(s) that it calls in turn to obtain input |
||||
tuples. |
||||
|
||||
Refinements on this basic model include: |
||||
|
||||
* Choice of scan direction (forwards or backwards). Caution: this is not |
||||
currently well-supported. It works for primitive scan nodes, but not very |
||||
well for joins, aggregates, etc. |
||||
|
||||
* Rescan command to reset a node and make it generate its output sequence |
||||
over again. |
||||
|
||||
* Parameters that can alter a node's results. After adjusting a parameter, |
||||
the rescan command must be applied to that node and all nodes above it. |
||||
There is a moderately intelligent scheme to avoid rescanning nodes |
||||
unnecessarily (for example, Sort does not rescan its input if no parameters |
||||
of the input have changed, since it can just reread its stored sorted data). |
||||
|
||||
The plan tree concept implements SELECT directly: it is only necessary to |
||||
deliver the top-level result tuples to the client, or insert them into |
||||
another table in the case of INSERT ... SELECT. (INSERT ... VALUES is |
||||
handled similarly, but the plan tree is just a Result node with no source |
||||
tables.) For UPDATE, the plan tree selects the tuples that need to be |
||||
updated (WHERE condition) and delivers a new calculated tuple value for each |
||||
such tuple, plus a "junk" (hidden) tuple CTID identifying the target tuple. |
||||
The executor's top level then uses this information to update the correct |
||||
tuple. DELETE is similar to UPDATE except that only a CTID need be |
||||
delivered by the plan tree. |
||||
|
||||
XXX a great deal more documentation needs to be written here... |
||||
|
||||
|
||||
EvalPlanQual (READ COMMITTED update checking) |
||||
--------------------------------------------- |
||||
|
||||
For simple SELECTs, the executor need only pay attention to tuples that are |
||||
valid according to the snapshot seen by the current transaction (ie, they |
||||
were inserted by a previously committed transaction, and not deleted by any |
||||
previously committed transaction). However, for UPDATE and DELETE it is not |
||||
cool to modify or delete a tuple that's been modified by an open or |
||||
concurrently-committed transaction. If we are running in SERIALIZABLE |
||||
isolation level then we just raise an error when this condition is seen to |
||||
occur. In READ COMMITTED isolation level, we must work a lot harder. |
||||
|
||||
The basic idea in READ COMMITTED mode is to take the modified tuple |
||||
committed by the concurrent transaction (after waiting for it to commit, |
||||
if need be) and re-evaluate the query qualifications to see if it would |
||||
still meet the quals. If so, we regenerate the updated tuple (if we are |
||||
doing an UPDATE) from the modified tuple, and finally update/delete the |
||||
modified tuple. SELECT FOR UPDATE behaves similarly, except that its action |
||||
is just to mark the modified tuple for update by the current transaction. |
||||
|
||||
To implement this checking, we actually re-run the entire query from scratch |
||||
for each modified tuple, but with the scan node that sourced the original |
||||
tuple set to return only the modified tuple, not the original tuple or any |
||||
of the rest of the relation. If this query returns a tuple, then the |
||||
modified tuple passes the quals (and the query output is the suitably |
||||
modified update tuple, if we're doing UPDATE). If no tuple is returned, |
||||
then the modified tuple fails the quals, so we ignore it and continue the |
||||
original query. (This is reasonably efficient for simple queries, but may |
||||
be horribly slow for joins. A better design would be nice; one thought for |
||||
future investigation is to treat the tuple substitution like a parameter, |
||||
so that we can avoid rescanning unrelated nodes.) |
||||
|
||||
Note a fundamental bogosity of this approach: if the relation containing |
||||
the original tuple is being used in a self-join, the other instance(s) of |
||||
the relation will be treated as still containing the original tuple, whereas |
||||
logical consistency would demand that the modified tuple appear in them too. |
||||
But we'd have to actually substitute the modified tuple for the original, |
||||
while still returning all the rest of the relation, to ensure consistent |
||||
answers. Implementing this correctly is a task for future work. |
||||
|
||||
In UPDATE/DELETE, only the target relation needs to be handled this way, |
||||
so only one special recheck query needs to execute at a time. In SELECT FOR |
||||
UPDATE, there may be multiple relations flagged FOR UPDATE, so it's possible |
||||
that while we are executing a recheck query for one modified tuple, we will |
||||
hit another modified tuple in another relation. In this case we "stack up" |
||||
recheck queries: a sub-recheck query is spawned in which both the first and |
||||
second modified tuples will be returned as the only components of their |
||||
relations. (In event of success, all these modified tuples will be marked |
||||
for update.) Again, this isn't necessarily quite the right thing ... but in |
||||
simple cases it works. Potentially, recheck queries could get nested to the |
||||
depth of the number of FOR UPDATE relations in the query. |
||||
|
||||
It should be noted also that UPDATE/DELETE expect at most one tuple to |
||||
result from the modified query, whereas in the FOR UPDATE case it's possible |
||||
for multiple tuples to result (since we could be dealing with a join in |
||||
which multiple tuples join to the modified tuple). We want FOR UPDATE to |
||||
mark all relevant tuples, so we pass all tuples output by all the stacked |
||||
recheck queries back to the executor toplevel for marking. |
Loading…
Reference in new issue