From owner-pgsql-hackers@hub.org Wed Sep 22 20:31:02 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id UAA15611
	for <maillist@candle.pha.pa.us>; Wed, 22 Sep 1999 20:31:01 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id UAA02926 for <maillist@candle.pha.pa.us>; Wed, 22 Sep 1999 20:21:24 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1])
	by hub.org (8.9.3/8.9.3) with ESMTP id UAA75413;
	Wed, 22 Sep 1999 20:09:35 -0400 (EDT)
	(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 22 Sep 1999 20:08:50 +0000 (EDT)
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id UAA75058
	for pgsql-hackers-outgoing; Wed, 22 Sep 1999 20:06:58 -0400 (EDT)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
	by hub.org (8.9.3/8.9.3) with ESMTP id UAA74982
	for <pgsql-hackers@postgreSQL.org>; Wed, 22 Sep 1999 20:06:25 -0400 (EDT)
	(envelope-from tgl@sss.pgh.pa.us)
Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
	by sss.sss.pgh.pa.us (8.9.1/8.9.1) with ESMTP id UAA06411
	for <pgsql-hackers@postgreSQL.org>; Wed, 22 Sep 1999 20:05:40 -0400 (EDT)
To: pgsql-hackers@postgreSQL.org
Subject: [HACKERS] Progress report: buffer refcount bugs and SQL functions
Date: Wed, 22 Sep 1999 20:05:39 -0400
Message-ID: <6408.938045139@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: RO

I have been finding a lot of interesting stuff while looking into
the buffer reference count/leakage issue.

It turns out that there were two specific things that were camouflaging
the existence of bugs in this area:

1. The BufferLeakCheck routine that's run at transaction commit was
only looking for nonzero PrivateRefCount to indicate a missing unpin.
It failed to notice nonzero LastRefCount --- which meant that an
error in refcount save/restore usage could leave a buffer pinned,
and BufferLeakCheck wouldn't notice.

2. The BufferIsValid macro, which you'd think just checks whether
it's handed a valid buffer identifier or not, actually did more:
it only returned true if the buffer ID was valid *and* the buffer
had positive PrivateRefCount. That meant that the common pattern
	if (BufferIsValid(buf))
		ReleaseBuffer(buf);
wouldn't complain if it were handed a valid but already unpinned buffer.
And that behavior masks bugs that result in buffers being unpinned too
early. For example, consider a sequence like

1. ReadBuffer (buffer now has refcount 1). Store reference to
a tuple on that buffer page in a tuple table slot.
2. Copy buffer reference to a second tuple table slot, but forget to
increment buffer's refcount.
3. Release second tuple table slot. Buffer refcount drops to 0,
so it's unpinned.
4. Release original tuple table slot. Because of BufferIsValid behavior,
no assert happens here; in fact nothing at all happens.

This is, of course, buggy code: during the interval from 3 to 4 you
still have an apparently valid tuple reference in the original slot,
which someone might try to use; but the buffer it points to is unpinned
and could be replaced at any time by another backend.

In short, we had errors that would mask both missing-pin bugs and
missing-unpin bugs. And naturally there were a few such bugs lurking
behind them...

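A toy model makes the two masking effects easy to see. This is a minimal sketch under stated assumptions, not the actual bufmgr code: the two arrays only mimic PrivateRefCount and LastRefCount, and BufferIsValid_old, BufferIsValid_new, release_if_valid, leak_check_old and leak_check_new are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NBUFFERS 8

/* Hypothetical per-backend pin counts, modelled on the PrivateRefCount
 * and LastRefCount arrays discussed above (index = buffer id - 1). */
static int PrivateRefCount[NBUFFERS];
static int LastRefCount[NBUFFERS];

/* Buffer ids are 1-based; 0 stands for "invalid buffer". */
static bool buffer_id_ok(int buf) { return buf >= 1 && buf <= NBUFFERS; }

/* Old behaviour: true only if the id is in range AND still pinned. */
static bool BufferIsValid_old(int buf)
{
    return buffer_id_ok(buf) && PrivateRefCount[buf - 1] > 0;
}

/* New behaviour: a plain validity check on the id. */
static bool BufferIsValid_new(int buf) { return buffer_id_ok(buf); }

static void pin(int buf) { PrivateRefCount[buf - 1]++; }

static void release(int buf)
{
    assert(PrivateRefCount[buf - 1] > 0);   /* unpinning an unpinned buffer */
    PrivateRefCount[buf - 1]--;
}

/* The common "if (BufferIsValid(buf)) ReleaseBuffer(buf);" pattern. */
static void release_if_valid(int buf, bool (*is_valid)(int))
{
    if (is_valid(buf))
        release(buf);
}

/* Old leak check: looks at PrivateRefCount only. */
static int leak_check_old(void)
{
    int leaks = 0;
    for (int i = 0; i < NBUFFERS; i++)
        if (PrivateRefCount[i] != 0)
            leaks++;
    return leaks;
}

/* Stricter leak check: a pin parked in LastRefCount also counts. */
static int leak_check_new(void)
{
    int leaks = 0;
    for (int i = 0; i < NBUFFERS; i++)
        if (PrivateRefCount[i] != 0 || LastRefCount[i] != 0)
            leaks++;
    return leaks;
}
```

Replaying steps 3 and 4 with release_if_valid(buf, BufferIsValid_old) unpins once and then silently does nothing; with BufferIsValid_new, step 4 reaches release() and trips its assertion. Similarly, a pin left in LastRefCount passes leak_check_old but is reported by leak_check_new.
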
3. The buffer refcount save/restore stuff, which I had suspected
was useless, is not only useless but also buggy. The reason it's
buggy is that it only works if used in a nested fashion. You could
save state A, pin some buffers, save state B, pin some more
buffers, restore state B (thereby unpinning what you pinned since
the save), and finally restore state A (unpinning the earlier stuff).
What you could not do is save state A, pin, save B, pin more, then
restore state A --- that might unpin some of A's buffers, or some
of B's buffers, or some unforeseen combination thereof. If you
restore A and then restore B, you do not necessarily return to a
zero-pins state, either. And it turns out the actual usage pattern was a
nearly random sequence of saves and restores, compounded by a failure to
do all of the restores reliably (which was masked by the oversight in
BufferLeakCheck).

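The nesting requirement can be illustrated with a deliberately simple snapshot model. This is an assumption made for illustration, not the removed save/restore code: save_state() records every pin count, restore_state() forces the counts back to the recorded values, and all names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define NBUFFERS 4

/* Hypothetical model: pins currently held by this backend, per buffer. */
static int refcount[NBUFFERS];

typedef struct
{
    int saved[NBUFFERS];
} RefSnapshot;

static void pin(int buf)
{
    assert(buf >= 0 && buf < NBUFFERS);
    refcount[buf]++;
}

/* "Save": remember the current pin counts. */
static void save_state(RefSnapshot *s)
{
    memcpy(s->saved, refcount, sizeof refcount);
}

/* "Restore": force every count back to the remembered value.  Used in
 * LIFO order this releases exactly what was pinned since the save; used
 * out of order it can drop someone else's pins or resurrect old ones. */
static void restore_state(const RefSnapshot *s)
{
    memcpy(refcount, s->saved, sizeof refcount);
}

static int total_pins(void)
{
    int n = 0;
    for (int i = 0; i < NBUFFERS; i++)
        n += refcount[i];
    return n;
}
```

In the out-of-order case, restoring A drops B's pins along with A's, and the later restore of B brings back a pin on buffer 0 that nothing will ever release.
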
What I have done so far is to rip out the buffer refcount save/restore
support (including LastRefCount), change BufferIsValid to a simple
validity check (so that you get an assert if you unpin something that
wasn't pinned), change ExecStoreTuple so that it increments the refcount
when it is handed a buffer reference (for symmetry with ExecClearTuple's
decrement of the refcount), and fix about a dozen bugs exposed by these
changes.

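The ExecStoreTuple/ExecClearTuple symmetry can be sketched with a toy slot. The functions below are hypothetical stand-ins (toy_store_tuple, toy_clear_tuple, toy_pin, toy_release), not the executor's real signatures; the point is only that every store takes its own pin and every clear drops one.

```c
#include <assert.h>

#define NBUFFERS      4
#define InvalidBuffer 0

/* Hypothetical pin counts, indexed by 1-based buffer id (slot 0 unused). */
static int PrivateRefCount[NBUFFERS + 1];

typedef struct
{
    int tuple;    /* stand-in for a tuple pointer; 0 means empty */
    int buffer;   /* buffer holding the tuple, or InvalidBuffer */
} ToySlot;

static void toy_pin(int buf)
{
    assert(buf > InvalidBuffer && buf <= NBUFFERS);
    PrivateRefCount[buf]++;
}

static void toy_release(int buf)
{
    assert(PrivateRefCount[buf] > 0);   /* would catch an early unpin */
    PrivateRefCount[buf]--;
}

/* Clearing a slot drops the pin the slot owns. */
static void toy_clear_tuple(ToySlot *slot)
{
    if (slot->buffer != InvalidBuffer)
        toy_release(slot->buffer);
    slot->tuple = 0;
    slot->buffer = InvalidBuffer;
}

/* Storing a tuple takes a pin of the slot's own, so every store is
 * balanced by exactly one clear. */
static void toy_store_tuple(ToySlot *slot, int tuple, int buf)
{
    if (buf != InvalidBuffer)
        toy_pin(buf);          /* pin first, in case buf is already held */
    toy_clear_tuple(slot);
    slot->tuple = tuple;
    slot->buffer = buf;
}
```

With this rule, copying a slot's buffer reference into a second slot (step 2 of the earlier example) can no longer unpin the buffer out from under the first slot.
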
I am still getting Buffer Leak notices in the "misc" regression test,
specifically in the queries that invoke more than one SQL function.
What I find there is that SQL functions are not always run to
completion. Apparently, when a function can return multiple tuples,
it won't necessarily be asked to produce them all. And when it isn't,
postquel_end() isn't invoked for the function's current query, so its
tuple table isn't cleared, so we have dangling refcounts if any of the
tuples involved are in disk buffers.

It may be that the save/restore code was a misguided attempt to fix
this problem. I can't tell. But I think what we really need to do is
find some way of ensuring that Postquel function execution contexts
always get shut down by the end of the query, so that they don't leak
resources.

I suppose a straightforward approach would be to keep a list of open
function contexts somewhere (attached to the outer execution context,
perhaps), and clean them up at outer-plan shutdown.

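One possible shape for the list of open function contexts, sketched under assumptions: FuncContext, OuterContext, register_func and shutdown_all are hypothetical names, and shutdown_func() merely stands in for whatever postquel_end() would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical execution state of one SQL (Postquel) function call. */
typedef struct FuncContext
{
    const char         *name;
    bool                open;   /* query started, postquel_end() not yet run */
    struct FuncContext *next;   /* link in the outer context's list */
} FuncContext;

/* Hypothetical outer execution context that owns the list. */
typedef struct
{
    FuncContext *open_funcs;
} OuterContext;

/* Called when a function's query is started. */
static void register_func(OuterContext *outer, FuncContext *fc)
{
    fc->open = true;
    fc->next = outer->open_funcs;
    outer->open_funcs = fc;
}

/* Stand-in for postquel_end(): clearing the function's tuple table is
 * what would release any buffer pins it still holds. */
static void shutdown_func(FuncContext *fc)
{
    assert(fc->open);           /* shutting down twice would be a bug */
    fc->open = false;
}

/* At outer-plan shutdown, close every context that was not run to
 * completion; returns how many had to be closed here. */
static int shutdown_all(OuterContext *outer)
{
    int closed = 0;

    for (FuncContext *fc = outer->open_funcs; fc != NULL; fc = fc->next)
        if (fc->open)
        {
            shutdown_func(fc);
            closed++;
        }
    outer->open_funcs = NULL;
    return closed;
}
```

Contexts that ran to completion have already shut themselves down, so the sweep at outer-plan shutdown only touches the ones that were abandoned early.
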
What I am wondering, though, is whether this addition is actually
necessary, or is it a bug that the functions aren't run to completion
in the first place? I don't really understand the semantics of this
"nested dot notation". I suppose it is a Berkeleyism; I can't find
anything about it in the SQL92 document. The test cases shown in the
misc regress test seem peculiar, not to say wrong. For example:

regression=> SELECT p.hobbies.equipment.name, p.hobbies.name, p.name FROM person p;
name         |name       |name
-------------+-----------+-----
advil        |posthacking|mike
peet's coffee|basketball |joe
hightops     |basketball |sally
(3 rows)

which doesn't appear to agree with the contents of the underlying
relations:

regression=> SELECT * FROM hobbies_r;
name       |person
-----------+------
posthacking|mike
posthacking|jeff
basketball |joe
basketball |sally
skywalking |
(5 rows)

regression=> SELECT * FROM equipment_r;
name         |hobby
-------------+-----------
advil        |posthacking
peet's coffee|posthacking
hightops     |basketball
guts         |skywalking
(4 rows)

I'd have expected an output along the lines of

advil        |posthacking|mike
peet's coffee|posthacking|mike
hightops     |basketball |joe
hightops     |basketball |sally

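For reference, the expected rows can be derived mechanically by expanding the nested dot notation as nested loops over the two listings. A sketch, assuming that person contains just the three people appearing in the output (so jeff's posthacking row never matches) and treating the blank person of skywalking as a NULL; Row and expand are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Rows copied from the hobbies_r and equipment_r listings above;
 * the empty person for skywalking stands in for a NULL. */
static const char *hobbies_r[][2] = {
    {"posthacking", "mike"}, {"posthacking", "jeff"},
    {"basketball", "joe"},   {"basketball", "sally"},
    {"skywalking", ""},
};
static const char *equipment_r[][2] = {
    {"advil", "posthacking"}, {"peet's coffee", "posthacking"},
    {"hightops", "basketball"}, {"guts", "skywalking"},
};
/* Assumption: person holds the three people seen in the query output. */
static const char *person[] = {"mike", "joe", "sally"};

#define LEN(a) (sizeof(a) / sizeof((a)[0]))

typedef struct
{
    const char *equipment, *hobby, *name;
} Row;

/* p.hobbies.equipment.name, p.hobbies.name, p.name as nested loops:
 * for each person, each of their hobbies, each item that hobby needs. */
static int expand(Row *out, int cap)
{
    int n = 0;

    for (size_t p = 0; p < LEN(person); p++)
        for (size_t h = 0; h < LEN(hobbies_r); h++)
        {
            if (strcmp(hobbies_r[h][1], person[p]) != 0)
                continue;
            for (size_t e = 0; e < LEN(equipment_r); e++)
                if (strcmp(equipment_r[e][1], hobbies_r[h][0]) == 0)
                {
                    assert(n < cap);   /* caller must supply enough room */
                    out[n].equipment = equipment_r[e][0];
                    out[n].hobby = hobbies_r[h][0];
                    out[n].name = person[p];
                    n++;
                }
        }
    return n;
}
```

Printing each Row with a format such as "%-13s|%-11s|%s" reproduces the four expected lines, in the same order.
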
Is the regression test's expected output wrong, or am I misunderstanding
what this query is supposed to do? Is there any documentation anywhere
about how SQL functions returning multiple tuples are supposed to
behave?

regards, tom lane

************

From owner-pgsql-hackers@hub.org Thu Sep 23 11:03:19 1999
Received: from hub.org (hub.org [216.126.84.1])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA16211
	for <maillist@candle.pha.pa.us>; Thu, 23 Sep 1999 11:03:17 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1])
	by hub.org (8.9.3/8.9.3) with ESMTP id KAA58151;
	Thu, 23 Sep 1999 10:53:46 -0400 (EDT)
	(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 23 Sep 1999 10:53:05 +0000 (EDT)
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id KAA57948
	for pgsql-hackers-outgoing; Thu, 23 Sep 1999 10:52:23 -0400 (EDT)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
	by hub.org (8.9.3/8.9.3) with ESMTP id KAA57841
	for <hackers@postgreSQL.org>; Thu, 23 Sep 1999 10:51:50 -0400 (EDT)
	(envelope-from tgl@sss.pgh.pa.us)
Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
	by sss.sss.pgh.pa.us (8.9.1/8.9.1) with ESMTP id KAA14211;
	Thu, 23 Sep 1999 10:51:10 -0400 (EDT)
To: Andreas Zeugswetter <andreas.zeugswetter@telecom.at>
cc: hackers@postgreSQL.org
Subject: Re: [HACKERS] Progress report: buffer refcount bugs and SQL functions
In-reply-to: Your message of Thu, 23 Sep 1999 10:07:24 +0200
	<37E9DFBC.5C0978F@telecom.at>
Date: Thu, 23 Sep 1999 10:51:10 -0400
Message-ID: <14209.938098270@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: RO

Andreas Zeugswetter <andreas.zeugswetter@telecom.at> writes:
> That is what I use it for. I have never used it with a
> returns setof function, but reading the comments in the regression test,
> -- mike needs advil and peet's coffee,
> -- joe and sally need hightops, and
> -- everyone else is fine.
> it looks like the results you expected are correct, and currently the
> wrong result is given.

Yes, I have concluded the same (and partially fixed it, per my previous
message).

> Those that don't have a hobby should return name|NULL|NULL. A hobby
> that doesn't need equipment: name|hobby|NULL.

That's a good point. Currently (both with and without my uncommitted
fix) you get *no* rows out from ExecTargetList if there are any Iters
that return empty result sets. It might be more reasonable to treat an
empty result set as if it were NULL, which would give the behavior you
suggest.

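A sketch of the proposed rule, with hypothetical names (ToyIter, iter_next, count_rows): when empty_as_null is set, a set-valued column that produces no values contributes one NULL instead of suppressing the whole output row.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical set-valued result for one input row. */
typedef struct
{
    const char **vals;
    size_t       n;
    size_t       pos;
} ToyIter;

/* Fetch the next value; *isnull flags a NULL.  Returns false when the
 * iteration is finished.  With empty_as_null, an empty set yields one
 * NULL instead of nothing. */
static bool iter_next(ToyIter *it, bool empty_as_null,
                      const char **val, bool *isnull)
{
    assert(val != NULL && isnull != NULL);
    if (it->n == 0)
    {
        if (!empty_as_null || it->pos > 0)
            return false;
        it->pos = 1;              /* the single NULL has been handed out */
        *val = NULL;
        *isnull = true;
        return true;
    }
    if (it->pos >= it->n)
        return false;
    *val = it->vals[it->pos++];
    *isnull = false;
    return true;
}

/* How many output rows one person contributes for this column. */
static int count_rows(ToyIter it, bool empty_as_null)
{
    const char *v;
    bool        isnull;
    int         rows = 0;

    while (iter_next(&it, empty_as_null, &v, &isnull))
        rows++;
    return rows;
}
```

Under the current behavior a person without hobbies contributes zero rows; with the flag set, the same person would come out as name|NULL|NULL.
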
This would be an easy change to my current patch, and I'm prepared to
make it before committing what I have, if people agree that that's a
more reasonable definition. Comments?

regards, tom lane

************

From owner-pgsql-hackers@hub.org Thu Sep 23 04:31:15 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id EAA11344
	for <maillist@candle.pha.pa.us>; Thu, 23 Sep 1999 04:31:15 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id EAA05350 for <maillist@candle.pha.pa.us>; Thu, 23 Sep 1999 04:24:29 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1])
	by hub.org (8.9.3/8.9.3) with ESMTP id EAA85679;
	Thu, 23 Sep 1999 04:16:26 -0400 (EDT)
	(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 23 Sep 1999 04:09:52 +0000 (EDT)
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id EAA84708
	for pgsql-hackers-outgoing; Thu, 23 Sep 1999 04:08:57 -0400 (EDT)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from gandalf.telecom.at (gandalf.telecom.at [194.118.26.84])
	by hub.org (8.9.3/8.9.3) with ESMTP id EAA84632
	for <hackers@postgresql.org>; Thu, 23 Sep 1999 04:08:03 -0400 (EDT)
	(envelope-from andreas.zeugswetter@telecom.at)
Received: from telecom.at (w0188000580.f000.d0188.sd.spardat.at [172.18.65.249])
	by gandalf.telecom.at (xxx/xxx) with ESMTP id KAA195294
	for <hackers@postgresql.org>; Thu, 23 Sep 1999 10:07:27 +0200
Message-ID: <37E9DFBC.5C0978F@telecom.at>
Date: Thu, 23 Sep 1999 10:07:24 +0200
From: Andreas Zeugswetter <andreas.zeugswetter@telecom.at>
X-Mailer: Mozilla 4.61 [en] (Win95; I)
X-Accept-Language: en
MIME-Version: 1.0
To: hackers@postgreSQL.org
Subject: Re: [HACKERS] Progress report: buffer refcount bugs and SQL functions
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: RO

> Is the regression test's expected output wrong, or am I misunderstanding
> what this query is supposed to do? Is there any documentation anywhere
> about how SQL functions returning multiple tuples are supposed to
> behave?

They are supposed to behave somewhat like a view.
Not all rows are necessarily fetched.
If used in a context that needs a single-row answer,
and the answer has multiple rows, it is supposed to
elog at runtime. Like in:

select * from tbl where col=funcreturningmultipleresults();
-- this must elog

while this is ok:
select * from tbl where col in (select funcreturningmultipleresults());

But the caller could fetch only the first row if he wanted.

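The distinction between a single-row context and a set context could be modeled as below. This is an illustrative sketch with hypothetical names (RowSource, fetch, scalar_value, in_set); a backend would raise an error via elog where the sketch returns SCALAR_TOO_MANY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SCALAR_OK, SCALAR_EMPTY, SCALAR_TOO_MANY } ScalarStatus;

/* Hypothetical pull-style row source: one value per fetch. */
typedef struct
{
    const int *rows;
    size_t     n, pos;
} RowSource;

static bool fetch(RowSource *src, int *out)
{
    if (src->pos >= src->n)
        return false;
    *out = src->rows[src->pos++];
    return true;
}

/* Single-row context, e.g. "col = func()": take the first row, then
 * fetch once more only to make sure there is no second one.  Where this
 * returns SCALAR_TOO_MANY, a backend would elog instead. */
static ScalarStatus scalar_value(RowSource *src, int *out)
{
    int extra;

    assert(out != NULL);
    if (!fetch(src, out))
        return SCALAR_EMPTY;
    if (fetch(src, &extra))
        return SCALAR_TOO_MANY;
    return SCALAR_OK;
}

/* Set context, e.g. "col IN (SELECT func())": stop at the first match,
 * so the remaining rows are never fetched. */
static bool in_set(RowSource *src, int key)
{
    int v;

    while (fetch(src, &v))
        if (v == key)
            return true;
    return false;
}
```

The scalar path fetches a second row only to prove that it does not exist, while the IN path may stop after the first match, which is one way a set function ends up not being run to completion.
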
The nested notation is supposed to call the function passing it the tuple
as the first argument. This is what can be used to "fake" a column
onto a table (computed column).
That is what I use it for. I have never used it with a
returns setof function, but reading the comments in the regression test,
-- mike needs advil and peet's coffee,
-- joe and sally need hightops, and
-- everyone else is fine.
it looks like the results you expected are correct, and currently the
wrong result is given.

But I think this query could also elog without removing substantial
functionality.

SELECT p.name, p.hobbies.name, p.hobbies.equipment.name FROM person p;

Actually, for me it would be intuitive for this query to return one row per
person, but to elog on those that have more than one hobby or a hobby that
needs more than one piece of equipment. Those that don't have a hobby should
return name|NULL|NULL. A hobby that doesn't need equipment: name|hobby|NULL.

Andreas

************

From owner-pgsql-hackers@hub.org Wed Sep 22 22:01:07 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA16360
	for <maillist@candle.pha.pa.us>; Wed, 22 Sep 1999 22:01:05 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id VAA08386 for <maillist@candle.pha.pa.us>; Wed, 22 Sep 1999 21:37:24 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1])
	by hub.org (8.9.3/8.9.3) with ESMTP id VAA88083;
	Wed, 22 Sep 1999 21:28:11 -0400 (EDT)
	(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 22 Sep 1999 21:27:48 +0000 (EDT)
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id VAA87938
	for pgsql-hackers-outgoing; Wed, 22 Sep 1999 21:26:52 -0400 (EDT)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from orion.SAPserv.Hamburg.dsh.de (Tpolaris2.sapham.debis.de [53.2.131.8])
	by hub.org (8.9.3/8.9.3) with SMTP id VAA87909
	for <pgsql-hackers@postgresql.org>; Wed, 22 Sep 1999 21:26:36 -0400 (EDT)
	(envelope-from wieck@debis.com)
Received: by orion.SAPserv.Hamburg.dsh.de
	for pgsql-hackers@postgresql.org
	id m11TxXw-0003kLC; Thu, 23 Sep 99 03:19 MET DST
Message-Id: <m11TxXw-0003kLC@orion.SAPserv.Hamburg.dsh.de>
From: wieck@debis.com (Jan Wieck)
Subject: Re: [HACKERS] Progress report: buffer refcount bugs and SQL functions
To: tgl@sss.pgh.pa.us (Tom Lane)
Date: Thu, 23 Sep 1999 03:19:39 +0200 (MET DST)
Cc: pgsql-hackers@postgreSQL.org
Reply-To: wieck@debis.com (Jan Wieck)
In-Reply-To: <6408.938045139@sss.pgh.pa.us> from "Tom Lane" at Sep 22, 99 08:05:39 pm
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: RO

Tom Lane wrote:

> [...]
>
> What I am wondering, though, is whether this addition is actually
> necessary, or is it a bug that the functions aren't run to completion
> in the first place? I don't really understand the semantics of this
> "nested dot notation". I suppose it is a Berkeleyism; I can't find
> anything about it in the SQL92 document. The test cases shown in the
> misc regress test seem peculiar, not to say wrong. For example:
>
> [...]
>
> Is the regression test's expected output wrong, or am I misunderstanding
> what this query is supposed to do? Is there any documentation anywhere
> about how SQL functions returning multiple tuples are supposed to
> behave?

I've said some time (maybe too long) ago, that SQL functions
returning tuple sets are broken in general. This nested dot
notation (which I think is an artefact from the postquel
query language) is implemented via set functions.

Set functions have totally different semantics from all other
functions. First, they don't really return a tuple set as
one might think; instead, all that screwed-up code
simulates returning something you could consider a
scan of the last SQL statement in the function. Then, on
each subsequent call inside the same command, they return
a "tupletable slot" containing the next found tuple (that's
why their Func node is mangled after the first call).

Second, they have a targetlist that I think was originally
intended to extract attributes out of the tuples returned
when the above scan is asked to get the next tuple. But as I
read the code, it invokes the function again, and this might
cause the resource leakage you see.

Third, all this seems never to have been implemented
(or thought through?) to the end. A targetlist doesn't make sense at
this place because it could contain at most a single attribute,
so a single attno would have the same power. And if set
functions could appear in the rangetable (FROM clause), then
they would be treated as such and regular Var nodes in the
query would do the job.

I think you shouldn't really care about that regression test,
and maybe we should disable set functions until we really
implement stored procedures returning sets in the rangetable.

Set functions were planned by Stonebraker's team as
something that today is called stored procedures. But AFAIK
they never reached a useful state, because even in Postgres
4.2 you weren't able to get more than one attribute out
of a set function. It was a feature of the postquel
query language that you could get one attribute from a set
function via

    RETRIEVE (attributename(setfuncname()))

While working on the constraint triggers I came across
another regression test (triggers :-) that's erroneous too.
The funny_dup17 trigger proc executes an INSERT into the same
relation it was fired for by a previous INSERT. And it
stops this recursion only if it reaches a nesting level of
17, which could only occur if it is fired DURING the
execution of its own SPI_exec(). After Vadim quoted some
SQL92 definitions about when constraint checks and triggers
are to be executed, I decided to fire regular triggers at the
end of a query too. Thus, no nesting at all is possible for
AFTER triggers, the nesting level never reaches 17, and the
test ends up in an endless loop.

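The funny_dup17 problem can be reproduced with a small model. It is a hypothetical simplification (fire_immediate and fire_deferred are invented names, and max_events merely bounds the simulation): fired immediately, each nested INSERT sees a deeper nesting level; fired at the end of the query, every trigger runs at level 1, so the stop condition is never met.

```c
#include <assert.h>
#include <stdbool.h>

#define STOP_LEVEL 17   /* funny_dup17 stops recursing at this level */

/* Immediate firing: the trigger runs DURING the SPI_exec() of the INSERT
 * that fired it, so each nested INSERT sees a deeper nesting level. */
static int fire_immediate(int level)
{
    if (level >= STOP_LEVEL)
        return level;                  /* recursion ends here */
    return fire_immediate(level + 1);  /* trigger's INSERT fires it again */
}

/* Deferred firing: an INSERT only queues an event, and queued events run
 * after the query ends, always at nesting level 1.  max_events bounds the
 * simulation; returns true only if the queue ever drained. */
static bool fire_deferred(int max_events, int *fired)
{
    int pending = 1;                   /* event queued by the first INSERT */

    assert(max_events > 0);
    *fired = 0;
    while (pending > 0 && *fired < max_events)
    {
        const int level = 1;           /* never nested inside SPI_exec() */

        pending--;
        (*fired)++;
        if (level < STOP_LEVEL)
            pending++;                 /* its INSERT queues another event */
    }
    return pending == 0;
}
```

fire_deferred() reports that the queue never drained, which in a real backend would be the endless loop described above.
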
Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

************

From owner-pgsql-hackers@hub.org Thu Sep 23 11:01:06 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA16162
	for <maillist@candle.pha.pa.us>; Thu, 23 Sep 1999 11:01:04 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id KAA28544 for <maillist@candle.pha.pa.us>; Thu, 23 Sep 1999 10:45:54 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1])
	by hub.org (8.9.3/8.9.3) with ESMTP id KAA52943;
	Thu, 23 Sep 1999 10:20:51 -0400 (EDT)
	(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 23 Sep 1999 10:19:58 +0000 (EDT)
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id KAA52472
	for pgsql-hackers-outgoing; Thu, 23 Sep 1999 10:19:03 -0400 (EDT)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
	by hub.org (8.9.3/8.9.3) with ESMTP id KAA52431
	for <pgsql-hackers@postgresql.org>; Thu, 23 Sep 1999 10:18:47 -0400 (EDT)
	(envelope-from tgl@sss.pgh.pa.us)
Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
	by sss.sss.pgh.pa.us (8.9.1/8.9.1) with ESMTP id KAA13253;
	Thu, 23 Sep 1999 10:18:02 -0400 (EDT)
To: wieck@debis.com (Jan Wieck)
cc: pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] Progress report: buffer refcount bugs and SQL functions
In-reply-to: Your message of Thu, 23 Sep 1999 03:19:39 +0200 (MET DST)
	<m11TxXw-0003kLC@orion.SAPserv.Hamburg.dsh.de>
Date: Thu, 23 Sep 1999 10:18:01 -0400
Message-ID: <13251.938096281@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: RO

wieck@debis.com (Jan Wieck) writes:
> Tom Lane wrote:
>> What I am wondering, though, is whether this addition is actually
>> necessary, or is it a bug that the functions aren't run to completion
>> in the first place?

> I've said some time (maybe too long) ago, that SQL functions
> returning tuple sets are broken in general.

Indeed they are. Try this on for size (using the regression database):

	SELECT p.name, p.hobbies.equipment.name FROM person p;
	SELECT p.hobbies.equipment.name, p.name FROM person p;

You get different result sets!?

The problem in this example is that ExecTargetList returns the isDone
flag from the last targetlist entry, regardless of whether there are
incomplete iterations in previous entries. More generally, the buffer
leak problem that I started with only occurs if some Iter nodes are not
run to completion --- but execQual.c has no mechanism to make sure that
they have all reached completion simultaneously.

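The order dependence can be reproduced with a toy model of a lockstep target list whose done flag comes only from the last entry. This is a sketch under assumptions, not the actual ExecTargetList code: entry i yields len[i] values per cycle, a plain column such as p.name is treated as a one-value set, and TlistResult and eval_last_entry_done are hypothetical names. The numbers correspond to mike, whose one hobby needs two pieces of equipment.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ENTRIES 8

typedef struct
{
    int nrows;        /* rows emitted for one input tuple */
    int unfinished;   /* entries left in the middle of a cycle */
} TlistResult;

/* Lockstep evaluation where the loop's "done" comes from the LAST
 * entry only, mimicking the behaviour described above. */
static TlistResult eval_last_entry_done(const int *len, int nentries)
{
    int         pos[MAX_ENTRIES] = {0};
    TlistResult r = {0, 0};
    bool        done;

    assert(nentries > 0 && nentries <= MAX_ENTRIES);
    do
    {
        for (int i = 0; i < nentries; i++)
            pos[i]++;                       /* each entry yields a value */
        r.nrows++;
        done = (pos[nentries - 1] == len[nentries - 1]);
        for (int i = 0; i < nentries; i++)
            if (pos[i] == len[i])
                pos[i] = 0;                 /* finished a full cycle */
    } while (!done);

    for (int i = 0; i < nentries; i++)
        if (pos[i] != 0)
            r.unfinished++;                 /* abandoned mid-iteration */
    return r;
}
```

Swapping the two columns halves mike's rows and leaves the equipment iteration stopped mid-cycle, which is exactly the situation in which an Iter's tuple table is never cleared.
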
What we really need to make functions-returning-sets work properly is
an implementation somewhat like aggregate functions. We need to make
a list of all the Iter nodes present in a targetlist and cycle through
the values returned by each in a methodical fashion (run the rightmost
through its full cycle, then advance the next-to-rightmost one value,
run the rightmost through its cycle again, etc etc). Also there needs
to be an understanding of the hierarchy when an Iter appears in the
arguments of another Iter's function. (You cycle the upper one for
*each* set of arguments created by cycling its sub-Iters.)

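The methodical cycling described above is an odometer over the Iter nodes. A sketch with hypothetical names (odometer_next, count_combinations) that handles only Iters at the top level of the targetlist; the nested Iter-within-Iter case is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ITERS 8

/* Advance an odometer over nitems Iters, where Iter i has len[i] values.
 * The rightmost Iter moves fastest; when it wraps, the next one to the
 * left advances by one value, and so on.  Returns false once every
 * combination has been produced. */
static bool odometer_next(int *pos, const int *len, int nitems)
{
    for (int i = nitems - 1; i >= 0; i--)
    {
        if (++pos[i] < len[i])
            return true;      /* no carry needed */
        pos[i] = 0;           /* wrap and carry to the left */
    }
    return false;             /* leftmost wrapped: all combinations done */
}

/* Count the rows produced for one input tuple. */
static int count_combinations(const int *len, int nitems)
{
    int pos[MAX_ITERS] = {0};
    int rows = 0;

    assert(nitems > 0 && nitems <= MAX_ITERS);
    for (int i = 0; i < nitems; i++)
        if (len[i] == 0)
            return 0;         /* an empty set yields no rows */
    do
        rows++;               /* emit a row built from pos[0..nitems-1] */
    while (odometer_next(pos, len, nitems));
    return rows;
}
```

Unlike a lockstep scheme, the number of rows no longer depends on the column order: it is the product of the set sizes.
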
I am not particularly interested in working on this feature right now,
since AFAIK it's a Berkeleyism not found in SQL92. What I've done
is to hack ExecTargetList so that it behaves semi-sanely when there's
more than one Iter at the top level of the target list --- it still
doesn't really give the right answer, but at least it will keep
generating tuples until all the Iters are done at the same time.
It happens that that's enough to give correct answers for the examples
shown in the misc regress test. Even when it fails to generate all
the possible combinations, there will be no buffer leaks.

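The effect of that hack can be approximated in a toy model: every Iter yields one value per row and restarts when exhausted, and generation stops only on a row where all Iters finish together. This is a hypothetical sketch (rows_until_all_done is an invented name), not the patched ExecTargetList; under these assumptions the row count is the least common multiple of the set sizes.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ITERS 8

/* Lockstep evaluation: every Iter yields one value per row and restarts
 * when exhausted; stop only on a row where ALL Iters finish together. */
static int rows_until_all_done(const int *len, int nitems)
{
    int  pos[MAX_ITERS] = {0};
    int  rows = 0;
    bool all_done;

    assert(nitems > 0 && nitems <= MAX_ITERS);
    do
    {
        all_done = true;
        for (int i = 0; i < nitems; i++)
        {
            assert(len[i] > 0);
            if (++pos[i] == len[i])
                pos[i] = 0;           /* this Iter just finished a cycle */
            else
                all_done = false;
        }
        rows++;
    } while (!all_done);
    return rows;                      /* lcm of len[0..nitems-1] */
}
```

Every Iter ends exactly at the end of a cycle, so none is abandoned mid-iteration, but for set sizes 2 and 2 only 2 of the 4 combinations are produced.
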
So, I'm going to declare victory and go home ;-). We ought to add a
TODO item along the lines of
	* Functions returning sets don't really work right
in hopes that someone will feel like tackling this someday.

regards, tom lane

************

From owner-pgsql-hackers@hub.org Fri Nov 13 13:24:37 1998
Received: from hub.org (majordom@hub.org [209.47.148.200])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA13457
	for <maillist@candle.pha.pa.us>; Fri, 13 Nov 1998 13:24:35 -0500 (EST)
Received: from localhost (majordom@localhost)
	by hub.org (8.9.1/8.9.1) with SMTP id NAA02464;
	Fri, 13 Nov 1998 13:22:52 -0500 (EST)
	(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 13 Nov 1998 13:21:14 +0000 (EST)
Received: (from majordom@localhost)
	by hub.org (8.9.1/8.9.1) id NAA02331
	for pgsql-hackers-outgoing; Fri, 13 Nov 1998 13:21:12 -0500 (EST)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from orion.SAPserv.Hamburg.dsh.de (Tpolaris2.sapham.debis.de [53.2.131.8])
	by hub.org (8.9.1/8.9.1) with SMTP id NAA02316
	for <pgsql-hackers@postgreSQL.org>; Fri, 13 Nov 1998 13:21:06 -0500 (EST)
	(envelope-from wieck@sapserv.debis.de)
Received: by orion.SAPserv.Hamburg.dsh.de
	for pgsql-hackers@postgreSQL.org
	id m0zeOEf-000EBPC; Fri, 13 Nov 98 19:46 MET
Message-Id: <m0zeOEf-000EBPC@orion.SAPserv.Hamburg.dsh.de>
From: jwieck@debis.com (Jan Wieck)
Subject: [HACKERS] shmem limits and redolog
To: pgsql-hackers@postgreSQL.org (PostgreSQL HACKERS)
Date: Fri, 13 Nov 1998 19:46:20 +0100 (MET)
Reply-To: jwieck@debis.com (Jan Wieck)
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: ROr

Hi,

I'm currently hacking around on a solution for logging all
database operations at query level that can recover a crashed
database from the last successful backup by redoing all the
commands.

Well, I wanted it to be as flexible as possible. So I decided to
make it configurable per database. One could say which
databases are logged and, for a logged database, whether it is
logged sync or async (in sync mode, every COMMIT forces an fsync of
the current logfile and controlfiles).

To make async mode as fast as possible, I'm using a shared memory
segment of 32K per database (not per backend) that is used as a
wraparound buffer into which the backends place their query
information. So the log writer can fall a little behind if
there are many backends doing different things that don't
lock each other.

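A wraparound buffer of the kind described can be sketched as follows. Assumptions: a single process and a plain struct stand in for the shared memory segment and its semaphores, and RedoRing, ring_put and ring_get are hypothetical names; head and tail are running byte totals, so the writer may fall behind by up to the buffer size.

```c
#include <assert.h>
#include <string.h>

#define REDO_BUF_SIZE 32768        /* 32K per database, as described */

/* Hypothetical wraparound buffer.  In the design above it would live in
 * a shared memory segment guarded by semaphores; here it is a plain
 * struct so that the index arithmetic can be checked. */
typedef struct
{
    char   data[REDO_BUF_SIZE];
    size_t head;                   /* total bytes written by backends */
    size_t tail;                   /* total bytes consumed by the writer */
} RedoRing;

static size_t ring_free(const RedoRing *r)
{
    return REDO_BUF_SIZE - (r->head - r->tail);
}

/* Backend side: append a record; returns 0 if it does not fit yet
 * (the caller would wait for the log writer to catch up). */
static int ring_put(RedoRing *r, const void *rec, size_t len)
{
    size_t off, first;

    if (len > ring_free(r))
        return 0;
    off = r->head % REDO_BUF_SIZE;
    first = len < REDO_BUF_SIZE - off ? len : REDO_BUF_SIZE - off;
    memcpy(r->data + off, rec, first);                        /* to the end */
    memcpy(r->data, (const char *) rec + first, len - first); /* wrapped */
    r->head += len;
    return 1;
}

/* Log writer side: copy out up to max bytes of pending data. */
static size_t ring_get(RedoRing *r, void *out, size_t max)
{
    size_t n = r->head - r->tail, off, first;

    if (n > max)
        n = max;
    off = r->tail % REDO_BUF_SIZE;
    first = n < REDO_BUF_SIZE - off ? n : REDO_BUF_SIZE - off;
    memcpy(out, r->data + off, first);
    memcpy((char *) out + first, r->data, n - first);
    r->tail += n;
    assert(r->tail <= r->head);
    return n;
}
```

At 32K per database, 32 databases need 32 * 32K = 1 MB of shared memory, which matches the estimate given further on.
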
Now I'm a little in doubt about the shared memory limits
reported. Was it a good decision to use shared memory? Am I
better off using sockets?

The bad thing in what I have up to now (it's far from
complete) is that even if a database isn't currently logged,
a redolog writer is started and creates the 32K shmem segment
(plus a semaphore set with 5 semaphores). This is because I
plan to create commands like

    ALTER DATABASE LOG MODE=ASYNC LOGDIR='/somewhere/dbname';

and the like that can be used at runtime (while more than one
backend is connected to the database) to turn logging on/off,
switch to/from backup mode (all other activity is stopped)
etc.

So every 32 databases will require another megabyte of shared
memory. The logging master keeps track of which databases have
activity and kills redolog writers after some time of
inactivity, and the shmem is then freed. But it can hurt if
someone really has many, many databases that are all in use at
the same time.

What do the others say?

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

From owner-pgsql-hackers@hub.org Wed Dec 16 15:46:41 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id PAA00521
	for <maillist@candle.pha.pa.us>; Wed, 16 Dec 1998 15:46:40 -0500 (EST)
Received: from hub.org (majordom@hub.org [209.47.145.100]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id PAA08772 for <maillist@candle.pha.pa.us>; Wed, 16 Dec 1998 15:10:01 -0500 (EST)
Received: from localhost (majordom@localhost)
	by hub.org (8.9.1/8.9.1) with SMTP id PAA01254;
	Wed, 16 Dec 1998 15:06:56 -0500 (EST)
	(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 16 Dec 1998 14:58:11 +0000 (EST)
Received: (from majordom@localhost)
	by hub.org (8.9.1/8.9.1) id OAA00660
	for pgsql-hackers-outgoing; Wed, 16 Dec 1998 14:58:10 -0500 (EST)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from orion.SAPserv.Hamburg.dsh.de (Tpolaris2.sapham.debis.de [53.2.131.8])
	by hub.org (8.9.1/8.9.1) with SMTP id OAA00643
	for <pgsql-hackers@postgreSQL.org>; Wed, 16 Dec 1998 14:58:05 -0500 (EST)
	(envelope-from wieck@sapserv.debis.de)
Received: by orion.SAPserv.Hamburg.dsh.de
	for pgsql-hackers@postgreSQL.org
	id m0zqNDo-000EBTC; Wed, 16 Dec 98 21:07 MET
Message-Id: <m0zqNDo-000EBTC@orion.SAPserv.Hamburg.dsh.de>
From: jwieck@debis.com (Jan Wieck)
Subject: Re: [HACKERS] redolog - for discussion
To: vadim@krs.ru (Vadim Mikheev)
Date: Wed, 16 Dec 1998 21:07:00 +0100 (MET)
Cc: jwieck@debis.com, pgsql-hackers@postgreSQL.org
Reply-To: jwieck@debis.com (Jan Wieck)
In-Reply-To: <3677B71D.C67462B3@krs.ru> from "Vadim Mikheev" at Dec 16, 98 08:35:25 pm
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: RO

Vadim wrote:

>
> Jan Wieck wrote:
> >
> >     RECOVER DATABASE {ALL | UNTIL 'datetime' | RESET};
> >
> ...
> >
> >     For the others, the backend starts the recovery program
> >     which reads the redolog files, establishes database
> >     connections as required and reruns all the commands in
>                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
> >     them. If a required logfile isn't found, it tells the
>       ^^^^^
>
> I foresee problems with using _commands_ logging for
> recovery/replication -:((
>
> Let's consider two concurrent updates in READ COMMITTED mode:
>
> update test set x = 2 where y = 1;
>
> and
>
> update test set x = 3 where y = 1;
>
> The result of both committed transactions will be x = 2
> if the 1st transaction updated the row _after_ the 2nd transaction,
> and x = 3 if the 2nd transaction gets the row after the 1st one.
> The order of updates is not defined by the order in which the
> commands began, so the order in which the commands should be rerun
> will be unknown...
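
Vadim's point can be made concrete with a toy replay. The timeline here is a hypothetical assumption: T1 (x = 2) begins first, but T2 (x = 3) happens to lock the row first, so T1 waits and then overwrites the row after T2 commits. Replaying in begin order gives a different final row than the live run. Replaying in the order the row was actually changed reproduces it.

```python
def replay(values, row):
    """Apply 'update test set x = <v> where y = 1' for each v, in order."""
    for v in values:
        if row["y"] == 1:
            row["x"] = v
    return row


# Hypothetical timeline, see the lead-in above.
begin_order = [2, 3]     # order in which the commands began
update_order = [3, 2]    # order in which the row was actually updated

live = replay(update_order, {"y": 1, "x": 0})
rerun_by_begin = replay(begin_order, {"y": 1, "x": 0})
rerun_by_update = replay(update_order, {"y": 1, "x": 0})
```

Here the waiting transaction also commits last, so the update order coincides with the commit order.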

    Yep, the order in which the commands began is of no interest
    at all. Locking could already delay the execution of one
    command until another one started later has finished and
    released the lock. It's a classic race condition.

    Thus, my plan was to log the queries just before the call to
    CommitTransactionCommand() in tcop. This has the advantage
    that queries which bail out with errors never get into the
    log and need not be rerun. And I can set a static flag to
    false before starting the command; the buffer manager sets it
    to true when a buffer is written (marked dirty), so filtering
    out queries that do no updates at all is easy.
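
The commit-time logging and the "buffer dirtied" flag can be sketched like this. This is a hypothetical Python model. `Backend`, `mark_buffer_dirty()` and the list used as the redolog are illustrative stand-ins, not PostgreSQL functions. The only point taken from the text is the ordering: reset the flag before the command, and log just before the commit only if the command succeeded and dirtied a buffer.

```python
class Backend:
    """Hypothetical backend that logs a query only at commit time."""

    def __init__(self, redolog):
        self.redolog = redolog          # shared list standing in for the log
        self.buffer_dirtied = False     # the "static flag" from the text

    def mark_buffer_dirty(self):
        # Stand-in for the buffer-manager hook that sets the flag.
        self.buffer_dirtied = True

    def run(self, query, execute):
        self.buffer_dirtied = False     # reset before starting the command
        try:
            execute(self)
        except Exception:
            return "ABORT"              # failed queries never reach the log
        # Corresponds to "just before CommitTransactionCommand()".
        if self.buffer_dirtied:
            self.redolog.append(query)  # only queries that changed something
        return "COMMIT"


def failing_insert(backend):
    backend.mark_buffer_dirty()
    raise RuntimeError("aborted by a trigger")


log = []
be = Backend(log)
statuses = [
    be.run("select * from test", lambda b: None),
    be.run("update test set x = 2 where y = 1", lambda b: b.mark_buffer_dirty()),
    be.run("insert into test values (1, 1)", failing_insert),
]
```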

    Unfortunately, query level logging gets hit by the current
    implementation of sequence numbers. If a query that gets
    aborted somewhere in the middle (maybe by a trigger) called
    nextval() for rows processed earlier, the sequence number
    isn't advanced at recovery time, because the query is
    suppressed entirely. And sequences aren't locked, so for
    concurrently running queries getting numbers from the same
    sequence, the results aren't reproducible. If some
    application selects a value resulting from a sequence and
    uses it later in another query, how could the redolog know
    that this has changed? It's a Const in the logged query, and
    all that corrupts the whole thing.

    All that is painful, and so far I see no solution other than
    to hook into nextval(), log the numbers generated during
    normal operation, and hand back the same numbers in redo
    mode.
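
The proposed nextval() hook can be modelled with a small wrapper that has two modes. This is a hypothetical sketch. `Sequence`, `LoggedNextval` and the `mode`/`recorded` parameters are illustrative, not PostgreSQL APIs. In the example, query A gets 1 and 3 while the aborted query B gets 2. Replaying only A must hand back 1 and 3, where a naive rerun would produce 1 and 2.

```python
class Sequence:
    def __init__(self, start=1):
        self.next_value = start


class LoggedNextval:
    """Hypothetical per-query nextval() wrapper for query-level redo.

    normal mode: hand out fresh numbers and record them with the query.
    redo mode:   hand back exactly the numbers recorded in the redolog.
    """

    def __init__(self, seq, mode="normal", recorded=()):
        self.seq = seq
        self.mode = mode
        self.recorded = list(recorded)
        self.log = []

    def nextval(self):
        if self.mode == "redo":
            value = self.recorded.pop(0)
            # Keep the sequence at least as far advanced as in the live run.
            self.seq.next_value = max(self.seq.next_value, value + 1)
        else:
            value = self.seq.next_value
            self.seq.next_value += 1
            self.log.append(value)
        return value


# Normal operation: queries A and B draw from one sequence concurrently.
seq = Sequence()
qa, qb = LoggedNextval(seq), LoggedNextval(seq)
qa.nextval()   # A gets 1
qb.nextval()   # B gets 2 (B later aborts and is never logged)
qa.nextval()   # A gets 3

# Redo: start from the backup's sequence state and replay only query A.
restored = Sequence()
redo_a = LoggedNextval(restored, mode="redo", recorded=qa.log)
replayed = [redo_a.nextval(), redo_a.nextval()]
```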

    The whole thing gets more and more complicated :-(


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

From owner-pgsql-hackers@hub.org Wed Jun 16 09:29:31 1999
Received: from hub.org (hub.org [209.167.229.1])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id JAA22504
	for <maillist@candle.pha.pa.us>; Wed, 16 Jun 1999 09:29:29 -0400 (EDT)
Received: from hub.org (hub.org [209.167.229.1])
	by hub.org (8.9.3/8.9.3) with ESMTP id JAA02132;
	Wed, 16 Jun 1999 09:18:20 -0400 (EDT)
	(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 16 Jun 1999 09:14:07 +0000 (EDT)
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id JAA01318
	for pgsql-hackers-outgoing; Wed, 16 Jun 1999 09:14:06 -0400 (EDT)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
X-Authentication-Warning: hub.org: majordom set sender to owner-pgsql-hackers@postgreSQL.org using -f
Received: from sunpine.krs.ru (SunPine.krs.ru [195.161.16.37])
	by hub.org (8.9.3/8.9.3) with ESMTP id JAA01278
	for <hackers@postgreSQL.org>; Wed, 16 Jun 1999 09:13:48 -0400 (EDT)
	(envelope-from vadim@krs.ru)
Received: from krs.ru (dune.krs.ru [195.161.16.38])
	by sunpine.krs.ru (8.8.8/8.8.8) with ESMTP id VAA06276
	for <hackers@postgreSQL.org>; Wed, 16 Jun 1999 21:12:49 +0800 (KRSS)
Message-ID: <3767A2CF.E6E4A5F9@krs.ru>
Date: Wed, 16 Jun 1999 21:12:47 +0800
From: Vadim Mikheev <vadim@krs.ru>
Organization: OJSC Rostelecom (Krasnoyarsk)
X-Mailer: Mozilla 4.5 [en] (X11; I; FreeBSD 3.0-RELEASE i386)
X-Accept-Language: ru, en
MIME-Version: 1.0
To: PostgreSQL Developers List <hackers@postgreSQL.org>
Subject: [HACKERS] Savepoints...
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: ROr

To have them I need to add a tuple id (6 bytes) to the heap tuple
header. Are there objections? Though it's not good to increase the
tuple header size, the subject is, imho, a very nice feature...

Implementation is, hm, "easy":

- heap_insert/heap_delete/heap_replace/heap_mark4update will
  remember the updated tid (and current command id) in the relation
  cache and store the previously updated tid (remembered in the
  relation cache) in the additional heap header tid;
- lmgr will remember the command id when a lock was acquired;
- for a savepoint we will just store the command id when
  the savepoint was set;
- when going to sleep due to a concurrent update of the same row,
  a backend will store MyProc and the tuple id in a shmem hash table.

When rolling back to a savepoint, the backend will:

- release locks acquired after the savepoint;
- for a relation updated after the savepoint, get the last updated tid
  from the relation cache, walk through the relation, set
  HEAP_XMIN_INVALID/HEAP_XMAX_INVALID in all tuples updated
  after the savepoint and wake up concurrent writers blocked
  on these tuples (using the shmem hash table mentioned above).
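
The tid chain and the rollback walk can be sketched in Python roughly as follows. This is a hypothetical model, not Vadim's implementation, and it makes several assumptions:

- each tuple is inserted or deleted at most once by the transaction;
- the two invalid flags are plain booleans standing in for HEAP_XMIN_INVALID and HEAP_XMAX_INVALID;
- lock release and the waking of blocked writers are left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class HeapTuple:
    data: str
    inserted_by: Optional[int] = None  # command id of the insert in this xact
    deleted_by: Optional[int] = None   # command id of the delete in this xact
    prev_tid: Optional[int] = None     # the proposed extra header tid
    xmin_invalid: bool = False         # stands in for HEAP_XMIN_INVALID
    xmax_invalid: bool = False         # stands in for HEAP_XMAX_INVALID


class Relation:
    def __init__(self):
        self.heap: Dict[int, HeapTuple] = {}
        self.last_tid: Optional[int] = None   # the "relation cache" entry

    def _remember(self, tid):
        # Chain this tuple to the previously updated one, then remember it.
        self.heap[tid].prev_tid = self.last_tid
        self.last_tid = tid

    def insert(self, tid, data, cmd):
        self.heap[tid] = HeapTuple(data, inserted_by=cmd)
        self._remember(tid)

    def delete(self, tid, cmd):
        self.heap[tid].deleted_by = cmd
        self._remember(tid)

    def rollback_to(self, savepoint_cmd):
        """Walk the chain backwards, undoing changes made after the savepoint."""
        tid = self.last_tid
        while tid is not None:
            tup = self.heap[tid]
            cmd = tup.deleted_by if tup.deleted_by is not None else tup.inserted_by
            if cmd <= savepoint_cmd:
                break                    # older changes predate the savepoint
            if tup.deleted_by is not None:
                tup.xmax_invalid = True  # undo the delete
                tup.deleted_by = None
            else:
                tup.xmin_invalid = True  # undo the insert
            tid = tup.prev_tid
        self.last_tid = tid

    def visible(self):
        return sorted(t.data for t in self.heap.values()
                      if not t.xmin_invalid and t.deleted_by is None)


rel = Relation()
rel.heap[0] = HeapTuple("old row")  # existed before the transaction
rel.insert(1, "a", cmd=1)
savepoint = 1                       # savepoint taken after command 1
rel.insert(2, "b", cmd=2)
rel.delete(0, cmd=3)
before = rel.visible()
rel.rollback_to(savepoint)
after = rel.visible()
```

Because the walk stops at the first change made at or before the savepoint's command id, a rollback only touches tuples changed after the savepoint rather than scanning the whole relation.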

The last feature (waking up concurrent writers) is the hardest
part to implement. AFAIK, Oracle 7.3 was not able to do it.
Can someone comment on whether this feature is implemented in
Oracle 8.X or other DBMSes?

Now about implicit savepoints. The backend will place one before
executing each user statement. In the case of failure, the transaction
state will be rolled back to what it was before the query executed.
As a side effect, this means we'll get rid of complaints about
the entire transaction aborting when a typo causes a parser
error...
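
The implicit-savepoint behaviour can be illustrated with a tiny model. This is a hypothetical sketch: `Transaction` and `execute()` are illustrative names, and the transaction's effects are just a list. A savepoint is recorded before each statement, and a failing statement is rolled back to it, so earlier work in the transaction survives a typo.

```python
class Transaction:
    """Hypothetical transaction with an implicit savepoint per statement."""

    def __init__(self):
        self.changes = []   # effects of statements executed so far

    def execute(self, statement):
        savepoint = len(self.changes)     # implicit savepoint before the statement
        try:
            statement(self.changes)
        except Exception as err:
            del self.changes[savepoint:]  # roll back this statement only
            return f"ERROR: {err}"
        return "OK"


def typo(changes):
    raise SyntaxError('parser error at or near "inesrt"')


def fails_midway(changes):
    changes.append("row 2")               # partial effect ...
    raise RuntimeError("duplicate key")   # ... then an error


tx = Transaction()
results = [
    tx.execute(lambda c: c.append("row 1")),
    tx.execute(typo),
    tx.execute(fails_midway),
    tx.execute(lambda c: c.append("row 3")),
]
```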

Comments?

Vadim

From lockhart@alumni.caltech.edu Thu Jan 7 13:31:08 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA07771
	for <maillist@candle.pha.pa.us>; Thu, 7 Jan 1999 13:31:06 -0500 (EST)
Received: from golem.jpl.nasa.gov (IDENT:root@hectic-2.jpl.nasa.gov [128.149.68.204]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id NAA14597 for <maillist@candle.pha.pa.us>; Thu, 7 Jan 1999 13:27:37 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id SAA13416;
	Thu, 7 Jan 1999 18:26:56 GMT
Sender: tgl@mythos.jpl.nasa.gov
Message-ID: <3694FC70.FAD67BC3@alumni.caltech.edu>
Date: Thu, 07 Jan 1999 18:26:56 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.07 [en] (X11; I; Linux 2.0.30 i686)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: Postgres Hackers List <hackers@postgresql.org>
Subject: Outer Joins (and need CASE help)
References: <199901071747.MAA07054@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: RO

> Thomas, do you need help on outer joins?

Yes. I'm going slowly, partly because I get distracted with other
Postgres stuff like docs, and partly because I don't understand all of
the pieces I'm working with.

I've identified the place in the MergeJoin code where the null filling
for outer joins needs to happen, and have the "merge walk" code done.
But I don't have the supporting code which would actually know how to
null-fill a result tuple from the left or right. I thought you might be
interested in that?

I've done some work in the parser, and can now do things like:

  postgres=> select * from t1 join t2 using (i);
  NOTICE: JOIN not yet implemented
  i|j|i|k
  -+-+-+-
  1|2|1|3
  (1 row)

But this is just an inner join, and the result isn't quite right since
the second "i" column should probably be omitted. At the moment I
transform it from the syntax above into existing parse nodes, and
everything from there on works.

I don't yet pass an explicit join node into the planner/optimizer, and
that will be the hardest part I assume. Perhaps we can work on that
together.

So, what I'll try to do (soon, in the next few days?) is put in

  #ifdef ENABLE_OUTER_JOINS

conditional code into the parser area (already there for the executor)
and commit everything to the development tree. Does that sound OK?

Oh, and if anyone is looking for something to do, I've got a couple of
CASE statements in the case.sql regression test which are commented out
because they crash the backend. They involve references to multiple
tables within a single result column, and in other contexts that
construct works. It would be great if someone had time to track it
down...

- Tom

From lockhart@alumni.caltech.edu Mon Feb 22 02:01:13 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA22073
	for <maillist@candle.pha.pa.us>; Mon, 22 Feb 1999 02:01:12 -0500 (EST)
Received: from golem.jpl.nasa.gov (IDENT:root@hectic-2.jpl.nasa.gov [128.149.68.204]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id BAA26054 for <maillist@candle.pha.pa.us>; Mon, 22 Feb 1999 01:57:00 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id GAA04715;
	Mon, 22 Feb 1999 06:56:36 GMT
Sender: tgl@mythos.jpl.nasa.gov
Message-ID: <36D0FFA4.32ADB75C@alumni.caltech.edu>
Date: Mon, 22 Feb 1999 06:56:36 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.07 [en] (X11; I; Linux 2.0.36 i686)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org
Subject: Re: start on outer join
References: <199902220304.WAA10066@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: ROr

Bruce Momjian wrote:
>
> > Will apply ... some other changes laying a bit of
> > groundwork for outer joins so you can start on the planner/optimizer
> > parts :)
> Those will be a cinch now that I understand the optimizer. In fact, I
> think it all will happen in the executor.

I've modified executor/nodeMergeJoin.c to walk a left/right/both outer
join, but didn't fill in the part which actually creates the result
tuple (which will be the current left- or right-side tuple plus nulls
for filler). I hope this is up your alley :)
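
The missing null-fill step can be illustrated with a list-based merge join. This is a hypothetical Python sketch, not the executor's state machine in nodeMergeJoin.c. It assumes both inputs are lists of tuples already sorted on column 0, the join key. `None` plays the role of NULL, and the `kind` values are illustrative.

```python
def merge_join(left, right, kind="inner", lwidth=None, rwidth=None):
    """Merge join on column 0 of two key-sorted lists of tuples.

    kind is 'inner', 'left', 'right' or 'full'; unmatched tuples from the
    preserved side(s) are null-filled with None.
    """
    if lwidth is None:
        lwidth = len(left[0]) if left else 0
    if rwidth is None:
        rwidth = len(right[0]) if right else 0
    out, i, j = [], 0, 0
    while i < len(left) or j < len(right):
        if j == len(right) or (i < len(left) and left[i][0] < right[j][0]):
            if kind in ("left", "full"):     # left tuple has no partner
                out.append(left[i] + (None,) * rwidth)
            i += 1
        elif i == len(left) or right[j][0] < left[i][0]:
            if kind in ("right", "full"):    # right tuple has no partner
                out.append((None,) * lwidth + right[j])
            j += 1
        else:                                # equal keys: join both groups
            key = left[i][0]
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][0] == key:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == key:
                j_end += 1
            out.extend(lt + rt for lt in left[i:i_end] for rt in right[j:j_end])
            i, j = i_end, j_end
    return out


t1 = [(1, 11), (2, 12), (3, 13), (4, 14)]
t2 = [(1, 21), (3, 23)]
inner_rows = merge_join(t1, t2)
left_rows = merge_join(t1, t2, "left")
full_rows = merge_join([(1, "a")], [(0, "x"), (1, "b")], "full")
```

The "merge walk" decides on each step whether the current tuple has a partner. The null-fill is simply the two `append` calls in the unmatched branches, guarded by which side is preserved.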

So far, I'm not certain what to pass to the planner. The syntax leads me
to pass a select structure from gram.y with a "JoinExpr" structure in
the "fromClause" list. I need to expand that with a combination of
column names and qualifications, but at the time I see the JoinExpr I
don't have access to the top query structure itself. So I may just keep
a modestly transformed JoinExpr to expand later or to pass to the
planner.

btw, the EXCEPT/INTERSECT stuff from Stefan has some ugliness in gram.y
which needs to be fixed (the shift/reduce conflict is not acceptable for
our release version), and some of that code clearly needs to move to
analyze.c or some other module.

- Tom

From maillist Wed Feb 24 05:27:08 1999
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.9.0/8.9.0) id FAA09648;
	Wed, 24 Feb 1999 05:27:08 -0500 (EST)
From: Bruce Momjian <maillist>
Message-Id: <199902241027.FAA09648@candle.pha.pa.us>
Subject: Re: [HACKERS] OUTER joins
In-Reply-To: <199902240953.EAA08561@candle.pha.pa.us> from Bruce Momjian at "Feb 24, 1999 4:53:21 am"
To: maillist@candle.pha.pa.us (Bruce Momjian)
Date: Wed, 24 Feb 1999 05:27:07 -0500 (EST)
Cc: lockhart@alumni.caltech.edu, hackers@postgreSQL.org
X-Mailer: ELM [version 2.4ME+ PL47 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Status: RO

>
> How do you propose doing outer joins in non-mergejoin situations?
> Mergejoins can only be used currently in equal joins.

Is your solution going to be to make sure the OUTER table is always a
MergeJoin, or on the outside of a join loop? That could work.

That could get tricky if the table is joined to _two_ other tables.
With the cleaned-up optimizer, we can disable non-merge joins in certain
circumstances, and prevent OUTER tables from being inner in the others.
Is that the plan?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

From lockhart@alumni.caltech.edu Mon Mar 1 13:01:08 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA21672
	for <maillist@candle.pha.pa.us>; Mon, 1 Mar 1999 13:01:06 -0500 (EST)
Received: from golem.jpl.nasa.gov (IDENT:root@hectic-2.jpl.nasa.gov [128.149.68.204]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id MAA12756 for <maillist@candle.pha.pa.us>; Mon, 1 Mar 1999 12:14:16 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id RAA09406;
	Mon, 1 Mar 1999 17:10:49 GMT
Sender: tgl@mythos.jpl.nasa.gov
Message-ID: <36DACA19.E6DBE7D8@alumni.caltech.edu>
Date: Mon, 01 Mar 1999 17:10:49 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.07 [en] (X11; I; Linux 2.0.36 i686)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: PostgreSQL-development <hackers@postgreSQL.org>
Subject: Re: OUTER joins
References: <199902240953.EAA08561@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: ROr

(back from a short vacation...)

> How do you propose doing outer joins in non-mergejoin situations?
> Mergejoins can only be used currently in equal joins.

Hadn't thought about it, other than figuring that implementing the
equi-join first was a good start. There is a class of outer join syntax
(the USING clause) which is implicitly an equi-join...

- Tom

From lockhart@alumni.caltech.edu Mon Mar 8 21:55:02 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA15978
	for <maillist@candle.pha.pa.us>; Mon, 8 Mar 1999 21:54:57 -0500 (EST)
Received: from golem.jpl.nasa.gov (IDENT:root@hectic-1.jpl.nasa.gov [128.149.68.203]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id VAA15837 for <maillist@candle.pha.pa.us>; Mon, 8 Mar 1999 21:48:33 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id CAA06996;
	Tue, 9 Mar 1999 02:46:40 GMT
Sender: tgl@mythos.jpl.nasa.gov
Message-ID: <36E48B90.F3E902B7@alumni.caltech.edu>
Date: Tue, 09 Mar 1999 02:46:40 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.07 [en] (X11; I; Linux 2.0.36 i686)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org
Subject: Re: OUTER joins
References: <199903070325.WAA10357@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: ROr

> > Hadn't thought about it, other than figuring that implementing the
> > equi-join first was a good start. There is a class of outer join
> > syntax (the USING clause) which is implicitly an equi-join...
> Not that easy. You don't automatically get a mergejoin from an
> equijoin. I will have to force outers to be either mergejoins, or
> inners of non-merge joins. Can you add code to non-merge joins in the
> executor to throw out a null row if it does not find an inner match
> for the outer row, and I will handle the optimizer so it doesn't throw
> a non-conforming plan to the executor.
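
The executor change Bruce asks for can be shown with a nested-loop sketch. This is a hypothetical Python model, not PostgreSQL code, and the function and parameter names are illustrative. Because the null-filled row is emitted once per outer row, after the whole inner side has been scanned for it, the preserved table must be the outer side of the loop. Unlike a mergejoin, the join qual can be any predicate, not just equality.

```python
def nestloop_left_join(outer, inner, qual, inner_width):
    """Nested-loop left join: the preserved table is the outer loop."""
    out = []
    for outer_row in outer:
        matched = False
        for inner_row in inner:
            if qual(outer_row, inner_row):   # any join qual, not just equality
                out.append(outer_row + inner_row)
                matched = True
        if not matched:                      # no inner match: emit a null row
            out.append(outer_row + (None,) * inner_width)
    return out


rows = nestloop_left_join([(1,), (5,)], [(3,), (4,)],
                          qual=lambda o, i: o[0] < i[0], inner_width=1)
```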

So far I don't have enough info in the parser to get the
planner/optimizer going. Should we work from the front to the back, or
should I go ahead and look at the non-merge joins? It's painfully
obvious that I don't know enough about the middle parts of this to
proceed without lots more research.

- Tom

From lockhart@alumni.caltech.edu Tue Mar 9 22:47:57 1999
Received: from golem.jpl.nasa.gov (IDENT:root@hectic-1.jpl.nasa.gov [128.149.68.203])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA07869
	for <maillist@candle.pha.pa.us>; Tue, 9 Mar 1999 22:47:54 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id DAA14761;
	Wed, 10 Mar 1999 03:46:43 GMT
Sender: tgl@mythos.jpl.nasa.gov
Message-ID: <36E5EB23.F5CD959B@alumni.caltech.edu>
Date: Wed, 10 Mar 1999 03:46:43 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.07 [en] (X11; I; Linux 2.0.36 i686)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>, tgl@mythos.jpl.nasa.gov
Subject: Re: SQL outer
References: <199903100112.UAA05772@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: RO

> select *
> from outer tab1, tab2, tab3
> where tab1.col1 = tab2.col1 and
>       tab1.col1 = tab3.col1

select *
  from t1 left join t2 using (c1)
          join t3 on (c1 = t3.c1)

Result:

  t1.c1  t1.c2  t2.c2  t3.c2
      2     12   NULL     32

t1:

  c1  c2
   1  11
   2  12
   3  13
   4  14

t2:

  c1  c2
   1  21
   3  23

t3:

  c1  c2
   2  32
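
The Result row can be checked by evaluating the query by hand over the three tables. This Python sketch uses exactly the data listed above and produces the four columns shown in the Result. The last value, 32, comes from t3's c2 column. A real `select *` would return more columns than these four; that is not modelled here.

```python
t1 = [(1, 11), (2, 12), (3, 13), (4, 14)]   # (c1, c2)
t2 = [(1, 21), (3, 23)]                     # (c1, c2)
t3 = [(2, 32)]                              # (c1, c2)

# t1 LEFT JOIN t2 USING (c1): keep every t1 row, null-fill t2.c2 if unmatched.
step1 = []
for c1, t1_c2 in t1:
    matches = [t2_c2 for k, t2_c2 in t2 if k == c1]
    if matches:
        step1.extend((c1, t1_c2, m) for m in matches)
    else:
        step1.append((c1, t1_c2, None))

# ... JOIN t3 ON (c1 = t3.c1): an ordinary inner join, keeping t3.c2.
result = [row + (t3_c2,) for row in step1
          for k, t3_c2 in t3 if row[0] == k]
```

Only c1 = 2 survives the inner join with t3, and that is exactly a t1 row with no partner in t2. This is why the t2.c2 column is NULL in the result.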

From lockhart@alumni.caltech.edu Wed Mar 10 10:48:54 1999
Received: from golem.jpl.nasa.gov (IDENT:root@hectic-1.jpl.nasa.gov [128.149.68.203])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA16741
	for <maillist@candle.pha.pa.us>; Wed, 10 Mar 1999 10:48:51 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id PAA17723;
	Wed, 10 Mar 1999 15:48:31 GMT
Sender: tgl@mythos.jpl.nasa.gov
Message-ID: <36E6944F.1F93B08@alumni.caltech.edu>
Date: Wed, 10 Mar 1999 15:48:31 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.07 [en] (X11; I; Linux 2.0.36 i686)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: Thomas Lockhart <lockhart@alumni.caltech.edu>
Subject: Re: SQL outer
References: <199903100112.UAA05772@candle.pha.pa.us> <36E5EB23.F5CD959B@alumni.caltech.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: ROr

Just thinking...

If the initial RelOptInfo groupings are derived from the WHERE clause
expressions, how about marking the "outer" property in those expressions
in the parser? istm that is where the parser knows about two tables in
one place, and I'm generating those expressions anyway. We could add a
field(s) to the expression structure, or pass along a slightly different
structure...

- Tom

From owner-pgsql-hackers@hub.org Wed Jul 21 02:35:13 1999
Received: from hub.org (hub.org [216.126.84.1])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA13837
	for <maillist@candle.pha.pa.us>; Wed, 21 Jul 1999 02:35:12 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1])
	by hub.org (8.9.3/8.9.3) with ESMTP id CAA88539;
	Wed, 21 Jul 1999 02:27:41 -0400 (EDT)
	(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 21 Jul 1999 02:24:08 +0000 (EDT)
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id CAA87850
	for pgsql-hackers-outgoing; Wed, 21 Jul 1999 02:23:13 -0400 (EDT)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from localhost (IDENT:root@hectic-2.jpl.nasa.gov [128.149.68.204])
	by hub.org (8.9.3/8.9.3) with ESMTP id CAA87810
	for <pgsql-hackers@postgreSQL.org>; Wed, 21 Jul 1999 02:22:52 -0400 (EDT)
	(envelope-from lockhart@alumni.caltech.edu)
Received: from alumni.caltech.edu (lockhart@localhost [127.0.0.1])
	by localhost (8.8.7/8.8.7) with ESMTP id GAA14480;
	Wed, 21 Jul 1999 06:20:22 GMT
Message-ID: <379566A6.A4CDF97F@alumni.caltech.edu>
Date: Wed, 21 Jul 1999 06:20:22 +0000
From: Thomas Lockhart <lockhart@alumni.caltech.edu>
X-Mailer: Mozilla 4.6 [en] (X11; I; Linux 2.0.36 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Tom Lane <tgl@sss.pgh.pa.us>
CC: Bruce Momjian <maillist@candle.pha.pa.us>, pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] Another reason to redesign querytree representation
References: <591.932505751@sss.pgh.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: RO

> Thomas, what do you think is needed for outer joins?

Bruce and I have talked about it some already:

For outer joins, tables must be combined in a particular order. For
example, a left outer join requires that any entries in the left-side
table which do not have a corresponding entry in the right-side table
be expanded with nulls during the join. The information on the outer
join can't be carried by the rte since the same table can appear twice
in an outer join expression:

  select * from t1 left join t2 using (i)
    left join t1 on (i = t1.j);

For a query like

  select * from t1 left join t2 using (i) where t2.j = 3;

istm that the outer join must be done before the t2 qualification is
applied, and that another ordering may produce the wrong result.
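
Why the order matters can be shown with two tiny hypothetical tables. The data below is invented for illustration: t1 has column i, and t2 has columns i and j. Applying `t2.j = 3` after the left join returns one row. Pushing the same qualification down into t2 before the join brings back an extra null-extended t1 row.

```python
t1 = [(1,), (2,)]          # columns: i
t2 = [(1, 3), (2, 4)]      # columns: i, j


def left_join_using_i(left, right):
    out = []
    for (i,) in left:
        js = [j for ri, j in right if ri == i]
        if js:
            out.extend((i, j) for j in js)
        else:
            out.append((i, None))   # null-extend unmatched t1 rows
    return out


# Join first, then apply WHERE t2.j = 3 (the ordering argued for above).
join_then_filter = [row for row in left_join_using_i(t1, t2) if row[1] == 3]

# Filter t2 first, then join: the t1 row with i = 2 comes back null-extended.
filter_then_join = left_join_using_i(t1, [r for r in t2 if r[1] == 3])
```

In the correct result, the qualification also removes every null-extended row. This is one example of the "poorly-qualified outer joins" mentioned below that reduce to inner joins.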

>From what I understand Bruce to say, the planner/optimizer is allowed
to try all kinds of permutations of plans, choosing the one with the
lowest cost. But if the info for the join is carried in a
qualification node, then the planner/optimizer must know that it can't
reorder the query as freely as it does now.

I was thinking of having a new qualification node to carry this info,
and it could be transformed into a mergejoin node which has a couple
of new fields indicating left and/or right outer join behavior.

A hashjoin method may be possible for queries which are structured as
a left outer join; other outer joins will need to use the mergejoin
method. Also, some poorly-qualified outer joins reduce to inner joins,
and perhaps the optimizer can be smart enough to realize this.

- Thomas

--
Thomas Lockhart                         lockhart@alumni.caltech.edu
South Pasadena, California
