apply_scanjoin_target_to_paths wants to avoid useless work and
platform-specific dependencies by throwing away the path list created
prior to applying the final scan/join target and constructing a whole
new one using the final scan/join target. However, this is only valid
when we'll consider all the same strategies after the pathlist reset
as before.

After resetting the path list, we reconsider Append and MergeAppend
paths with the modified target list; therefore, it's only valid for a
partitioned relation. However, what the previous coding missed is that
it cannot be a partitioned join relation, because that also has paths
that are not Append or MergeAppend paths and will not be reconsidered.
Thus, before this patch, we'd sometimes choose a partitionwise strategy
with a higher total cost than the cheapest non-partitionwise strategy,
which is not good.
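
The failure mode can be illustrated with a toy model. This is not
PostgreSQL code: the Path class, the retarget function, and the costs are
all made up for illustration, and the sketch assumes the fix amounts to
skipping the reset when the partitioned relation is a join relation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    kind: str          # e.g. "Append", "MergeAppend", "MergeJoin"
    total_cost: float  # hypothetical planner cost estimate


APPEND_KINDS = {"Append", "MergeAppend"}


def cheapest(paths):
    """Return the path with the lowest total cost."""
    return min(paths, key=lambda p: p.total_cost)


def retarget(paths, is_partitioned, is_join, fixed):
    """Toy stand-in for applying the final scan/join target.

    When the path list is reset, only Append-style paths are rebuilt,
    so every other kind of path is lost.
    """
    if fixed:
        reset = is_partitioned and not is_join
    else:
        reset = is_partitioned  # old behavior: reset joinrels too
    if reset:
        return [p for p in paths if p.kind in APPEND_KINDS]
    return list(paths)


# A partitioned join relation with one partitionwise path (Append) and
# one cheaper non-partitionwise path (MergeJoin).
joinrel_paths = [Path("Append", 105.0), Path("MergeJoin", 100.0)]

before = cheapest(retarget(joinrel_paths, True, True, fixed=False))
after = cheapest(retarget(joinrel_paths, True, True, fixed=True))
print(before.kind, after.kind)
```

With the old rule the cheaper MergeJoin path is discarded and never
rebuilt, so the more expensive Append path wins by default.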

We had a surprising number of test cases that were relying on this
bug to work as they did. A big part of the reason for this is that row
counts in regression test cases tend to be low, which brings the cost
of partitionwise and non-partitionwise strategies very close together,
especially for merge joins, where the real and perceived advantages of
a partitionwise approach are minimal. In addition, one test case
included a row-count-inflating join. In such cases, a partitionwise
join can easily be a loser on cost, because the total number of tuples
passing through an Append node is much higher than it is with a
non-partitionwise strategy. That test case is adjusted by adding
additional join clauses to avoid the row count inflation.
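
The row count inflation can be sketched with a small model. The table
contents below are an assumption, chosen only to be consistent with the
result rows visible in the expected output; the real plt*_adv tables may
differ. The FULL JOIN is modeled as a left join, which is equivalent here
only because the toy t3 has no rows that fail to match t1.

```python
def left_join(left, right, match):
    """Nested-loop left join: unmatched left rows are kept with None."""
    out = []
    for l in left:
        hits = [r for r in right if match(l, r)]
        if hits:
            out.extend((l, r) for r in hits)
        else:
            out.append((l, None))
    return out


# Rows are (a, c) tuples. Hypothetical data after the WHERE filter.
t1 = [(a, f"{a % 5:04d}") for a in range(25) if a % 5 <= 2]
t2 = [(a, "0002") for a in (2, 7, 12, 17, 22)]
t3 = [(a, "0001") for a in (1, 6, 11, 16, 21)]


def result_rows(match):
    """Count rows of (t1 LEFT JOIN t2) joined to t3 under one join clause."""
    first = left_join(t1, t2, match)
    second = left_join(first, t3, lambda pair, r: match(pair[0], r))
    return len(second)


def by_c(l, r):
    # Old join clause: t1.c = tN.c
    return l[1] == r[1]


def by_a_and_c(l, r):
    # Adjusted join clause: t1.a = tN.a AND t1.c = tN.c
    return l[0] == r[0] and l[1] == r[1]


inflated = result_rows(by_c)
tight = result_rows(by_a_and_c)
print(inflated, tight)
```

Joining on c alone pairs every row of a list value with every matching
row on the other side, while the extra clause on a keeps the join close
to one-to-one.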

Although the failure of the planner to choose the lowest-cost path is a
bug, we generally do not back-patch fixes of this type, because planning
is not an exact science and there is always a possibility that some user
will end up with a plan that has a lower estimated cost but actually
runs more slowly. Hence, no backpatch here, either.

The code change here is exactly what was originally proposed by
Ashutosh, but the changes to the comments and test cases have been
very heavily rewritten by me, helped along by some very useful advice
from Richard Guo.

Reported-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Author: Robert Haas <rhaas@postgresql.org>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Arne Roland <arne.roland@malkut.net>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: http://postgr.es/m/CAExHW5toze58+jL-454J3ty11sqJyU13Sz5rJPQZDmASwZgWiA@mail.gmail.com
@ -89,7 +92,7 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.a AND t1.a =
-> Hash
-> Seq Scan on prt2_p3 t2_3
Filter: (a = b)
(22 rows)
(21 rows)
SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.a AND t1.a = t2.b ORDER BY t1.a, t2.b;
a | c | b | c
@ -101,6 +104,7 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.a AND t1.a =
24 | 0024 | 24 | 0024
(5 rows)
COMMIT;
-- left outer join, 3-way
EXPLAIN (COSTS OFF)
SELECT COUNT(*) FROM prt1 t1
@ -1244,11 +1248,12 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (
450 | 0 | 0450
(4 rows)
-- test merge joins
-- test merge joins, slightly modifying the query to ensure that we still
-- get a fully partitionwise join
SET enable_hashjoin TO off;
SET enable_nestloop TO off;
EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) ORDER BY t1.a;
@ -1298,9 +1300,9 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (
Sort Key: (((t1_11.a + t1_11.b) / 2))
-> Seq Scan on prt1_e_p3 t1_11
Filter: (c = 0)
(47 rows)
(44 rows)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) ORDER BY t1.a;
a | b | c
-----+---+------
0 | 0 | 0000
@ -4922,27 +4924,27 @@ ANALYZE plt3_adv;
-- merged partition when re-called with plt1_adv_p1 for the second list value
-- '0001' of that partition
EXPLAIN (COSTS OFF)
SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c = t3.c) WHERE coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY t1.c, t1.a, t2.a, t3.a;
SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.a = t3.a AND t1.c = t3.c) WHERE coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY t1.c, t1.a, t2.a, t3.a;
Hash Cond: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
-> Seq Scan on plt1_adv_p2 t1_2
-> Hash
-> Seq Scan on plt2_adv_p2 t2_2
@ -4950,7 +4952,7 @@ SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT JOIN plt2_adv t
-> Seq Scan on plt3_adv_p2 t3_2
(23 rows)
SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c = t3.c) WHERE coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY t1.c, t1.a, t2.a, t3.a;
SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.a = t3.a AND t1.c = t3.c) WHERE coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY t1.c, t1.a, t2.a, t3.a;
a | c | a | c | a | c
----+------+----+------+----+------
0 | 0000 | | | |
@ -4959,56 +4961,16 @@ SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT JOIN plt2_adv t
15 | 0000 | | | |
20 | 0000 | | | |
1 | 0001 | | | 1 | 0001
1 | 0001 | | | 6 | 0001
1 | 0001 | | | 11 | 0001
1 | 0001 | | | 16 | 0001
1 | 0001 | | | 21 | 0001
6 | 0001 | | | 1 | 0001
6 | 0001 | | | 6 | 0001
6 | 0001 | | | 11 | 0001
6 | 0001 | | | 16 | 0001
6 | 0001 | | | 21 | 0001
11 | 0001 | | | 1 | 0001
11 | 0001 | | | 6 | 0001
11 | 0001 | | | 11 | 0001
11 | 0001 | | | 16 | 0001
11 | 0001 | | | 21 | 0001
16 | 0001 | | | 1 | 0001
16 | 0001 | | | 6 | 0001
16 | 0001 | | | 11 | 0001
16 | 0001 | | | 16 | 0001
16 | 0001 | | | 21 | 0001
21 | 0001 | | | 1 | 0001
21 | 0001 | | | 6 | 0001
21 | 0001 | | | 11 | 0001
21 | 0001 | | | 16 | 0001
21 | 0001 | | | 21 | 0001
2 | 0002 | 2 | 0002 | |
2 | 0002 | 7 | 0002 | |
2 | 0002 | 12 | 0002 | |
2 | 0002 | 17 | 0002 | |
2 | 0002 | 22 | 0002 | |
7 | 0002 | 2 | 0002 | |
7 | 0002 | 7 | 0002 | |
7 | 0002 | 12 | 0002 | |
7 | 0002 | 17 | 0002 | |
7 | 0002 | 22 | 0002 | |
12 | 0002 | 2 | 0002 | |
12 | 0002 | 7 | 0002 | |
12 | 0002 | 12 | 0002 | |
12 | 0002 | 17 | 0002 | |
12 | 0002 | 22 | 0002 | |
17 | 0002 | 2 | 0002 | |
17 | 0002 | 7 | 0002 | |
17 | 0002 | 12 | 0002 | |
17 | 0002 | 17 | 0002 | |
17 | 0002 | 22 | 0002 | |
22 | 0002 | 2 | 0002 | |
22 | 0002 | 7 | 0002 | |
22 | 0002 | 12 | 0002 | |
22 | 0002 | 17 | 0002 | |
22 | 0002 | 22 | 0002 | |
(55 rows)
(15 rows)
DROP TABLE plt1_adv;
DROP TABLE plt2_adv;
@ -5233,8 +5195,11 @@ CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO ('2000');
INSERT INTO fract_t (id) (SELECT generate_series(0, 1999));
ANALYZE fract_t;
-- verify plan; nested index only scans
-- (avoid merge joins, because the costs of partitionwise and non-partitionwise
-- merge joins tend to be almost equal, and we want this test to be stable)
SET max_parallel_workers_per_gather = 0;
SET enable_partitionwise_join = on;
SET enable_mergejoin = off;
EXPLAIN (COSTS OFF)
SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY x.id ASC LIMIT 10;
QUERY PLAN
@ -5242,14 +5207,14 @@ SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY x.id AS
Limit
-> Merge Append
Sort Key: x.id
-> Merge Left Join
Merge Cond: (x_1.id = y_1.id)
-> Nested Loop Left Join
-> Index Only Scan using fract_t0_pkey on fract_t0 x_1
-> Index Only Scan using fract_t0_pkey on fract_t0 y_1
-> Merge Left Join
Merge Cond: (x_2.id = y_2.id)
Index Cond: (id = x_1.id)
-> Nested Loop Left Join
-> Index Only Scan using fract_t1_pkey on fract_t1 x_2
-> Index Only Scan using fract_t1_pkey on fract_t1 y_2
Index Cond: (id = x_2.id)
(11 rows)
EXPLAIN (COSTS OFF)
@ -5366,6 +5331,7 @@ EXPLAIN (COSTS OFF) SELECT * FROM pht1 p1 JOIN pht1 p2 USING (c) LIMIT 1000;
-> Seq Scan on pht1_p3 p2_3
(17 rows)
RESET enable_mergejoin;
SET max_parallel_workers_per_gather = 1;
SET debug_parallel_query = on;
-- Partial paths should also be smart enough to employ limits