If an error is thrown after calling CreateWaitEventSet(), the memory
of the WaitEventSet is freed, as it's allocated in the short-lived
memory context, but the file descriptor (on epoll- or kqueue-based
systems) or handles (on Windows) that it contains are leaked.
Use PG_TRY/PG_FINALLY to ensure it gets freed. (On master, I will apply a
better fix, using ResourceOwners to track the WaitEventSet, but that's
not backpatchable.)
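A minimal sketch of the pattern, assuming a call site that creates a
short-lived set (the surrounding calls are illustrative):

    WaitEventSet *set = CreateWaitEventSet(CurrentMemoryContext, 2);

    PG_TRY();
    {
        /* AddWaitEventToSet() / WaitEventSetWait() calls that may elog() */
    }
    PG_FINALLY();
    {
        /* releases the epoll/kqueue fd or Windows handles, not just memory */
        FreeWaitEventSet(set);
    }
    PG_END_TRY();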
The added test doesn't check for leaking resources, so it passed even
before this commit. But at least it covers the code path.
In passing, fix a misleading comment about what the 'nevents' argument
to WaitEventSetWait() means.
Report by Alexander Lakhin, analysis and suggestion for the fix by Tom
Lane. Fixes bug #17828. Backpatch to v14 where async execution was
introduced, but master gets a different fix.
Discussion: https://www.postgresql.org/message-id/17828-122da8cba23236be@postgresql.org
Discussion: https://www.postgresql.org/message-id/472235.1678387869@sss.pgh.pa.us
The COPY command formed for initial sync used parentheses for tables
with no columns, leading to a syntax error. This patch avoids adding
parentheses for such tables.
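A hedged sketch of the shape of the fix (names illustrative, not the
actual tablesync.c code):

    appendStringInfo(&cmd, "COPY %s",
                     quote_qualified_identifier(nspname, relname));
    if (ncols > 0)          /* previously emitted an empty "()" here */
        appendStringInfo(&cmd, " (%s)", collist);
    appendStringInfoString(&cmd, " TO STDOUT");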
Reported-by: Justin G
Author: Vignesh C
Reviewed-by: Peter Smith, Amit Kapila
Backpatch-through: 15
Discussion: http://postgr.es/m/18203-df37fe354b626670@postgresql.org
The DROP STATISTICS code failed to properly lock the table, leading to
    ERROR:  tuple concurrently deleted
when executed concurrently with ANALYZE.
Fixed by modifying RemoveStatisticsById() to acquire the same lock as
ANALYZE. This function is called only by DROP STATISTICS, as ANALYZE
calls RemoveStatisticsDataById() directly.
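A hedged sketch of the added locking (illustrative; the actual
RemoveStatisticsById() code differs in detail):

    Oid      relid;
    Relation rel;

    /* find the table the statistics object is defined on */
    relid = ((Form_pg_statistic_ext) GETSTRUCT(tup))->stxrelid;

    /* take the same lock ANALYZE does, so the two serialize */
    rel = relation_open(relid, ShareUpdateExclusiveLock);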
Reported by Justin Pryzby, fix by me. Backpatch through 12. The code has
been like this since it was introduced in 10, but older releases are EOL.
Reported-by: Justin Pryzby
Reviewed-by: Tom Lane
Backpatch-through: 12
Discussion: https://postgr.es/m/ZUuk-8CfbYeq6g_u@pryzbyj2023
Commits 146604ec43 and a898b409f6 added overflow checks to
interval_mul(), but not to interval_div(), which contains almost
identical code, and so is susceptible to the same kinds of
overflows. In addition, those checks did not catch all possible
overflow conditions.
Add additional checks to the "cascade down" code in interval_mul(),
and copy all the overflow checks over to the corresponding code in
interval_div(), so that they both generate "interval out of range"
errors, rather than returning bogus results.
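A hedged sketch of the kind of check involved (variable names
illustrative, not the exact code; since the factor is a float8, the
checks go through double):

    result_double = span->month / factor;
    if (isnan(result_double) || !FLOAT8_FITS_IN_INT32(result_double))
        ereport(ERROR,
                (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
                 errmsg("interval out of range")));
    result->month = (int32) result_double;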
Given that these errors are relatively easy to hit, back-patch to all
supported branches.
Per bug #18200 from Alexander Lakhin, and subsequent investigation.
Discussion: https://postgr.es/m/18200-5ea288c7b2d504b1%40postgresql.org
When performing inlining, LLVM unfortunately "leaks" types (the
types survive and are usable, but a new round of inlining will
recreate new structurally equivalent types). Over time, this
accumulation can amount to a memory leak which, for some queries,
can be large enough to trigger the OOM process killer.
To avoid accumulation of types, all IR-related data is stored
in an LLVMContextRef which is dropped and recreated in order
to release all types. Dropping and recreating incurs overhead,
so it will be done only after 100 queries. This is a heuristic
which might be revisited, but until we can get the size of the
context from LLVM we are flying a bit blind.
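A hedged sketch of the heuristic (the counter and threshold names are
hypothetical, not necessarily the committed ones):

    #define LLVMJIT_CONTEXT_REUSE_MAX 100   /* hypothetical threshold */

    if (llvm_context_reuse_count++ >= LLVMJIT_CONTEXT_REUSE_MAX)
    {
        /* dropping the context releases all types accumulated by inlining */
        LLVMContextDispose(llvm_context);
        llvm_context = LLVMContextCreate();
        llvm_context_reuse_count = 0;
    }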
This issue has been reported several times, there may be more
references to it in the archives on top of the threads linked
below.
This is a backpatch of 9dce22033d to all supported branches.
Reported-By: Justin Pryzby <pryzby@telsasoft.com>
Reported-By: Kurt Roeckx <kurt@roeckx.be>
Reported-By: Jaime Casanova <jcasanov@systemguards.com.ec>
Reported-By: Lauri Laanmets <pcspets@gmail.com>
Author: Andres Freund and Daniel Gustafsson
Discussion: https://postgr.es/m/7acc8678-df5f-4923-9cf6-e843131ae89d@www.fastmail.com
Discussion: https://postgr.es/m/20201218235607.GC30237@telsasoft.com
Discussion: https://postgr.es/m/CAPH-tTxLf44s3CvUUtQpkDr1D8Hxqc2NGDzGXS1ODsfiJ6WSqA@mail.gmail.com
Backpatch-through: v12
contain_mutable_functions and contain_volatile_functions give
reliable answers only after expression preprocessing (specifically
eval_const_expressions). Some places understand this, but some did
not get the memo --- which is not entirely their fault, because the
problem is documented only in places far away from those functions.
Introduce wrapper functions that allow doing the right thing easily,
and add commentary in hopes of preventing future mistakes from
copy-and-paste of code that's only conditionally safe.
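A hedged sketch of what such a wrapper can look like (the wrapper
actually added may differ in detail):

    bool
    contain_mutable_functions_after_planning(Expr *expr)
    {
        /* preprocess the expression before inspecting it */
        expr = expression_planner(expr);

        return contain_mutable_functions((Node *) expr);
    }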
Two actual bugs of this ilk are fixed here. We failed to preprocess
column GENERATED expressions before checking mutability, so that the
code could fail to detect the use of a volatile function
default-argument expression, or it could reject a polymorphic function
that is actually immutable on the datatype of interest. Likewise,
column DEFAULT expressions weren't preprocessed before determining if
it's safe to apply the attmissingval mechanism. A false negative
would just result in an unnecessary table rewrite, but a false
positive could allow the attmissingval mechanism to be used in a case
where it should not be, resulting in unexpected initial values in a
new column.
In passing, re-order the steps in ComputePartitionAttrs so that its
checks for invalid column references are done before applying
expression_planner, rather than after. The previous coding would
not complain if a partition expression contains a disallowed column
reference that gets optimized away by constant folding, which seems
to me to be a behavior we do not want.
Per bug #18097 from Jim Keener. Back-patch to all supported versions.
Discussion: https://postgr.es/m/18097-ebb179674f22932f@postgresql.org
It's clearly stated in the comments that ginFindParents() must keep
the pin on the index's root page that's associated with the topmost
GinBtreeStack item. However, the code path for the case that the
desired downlink has been pushed down to the next index level
ignored this proviso, and would release the pin anyway if we were
still examining the root level. That led to an assertion failure
or "buffer NNNN is not owned by resource owner" error later, when
we try to release the pin again at the end of the insertion.
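A hedged sketch of the proviso being enforced (illustrative, not the
actual ginbtree.c code):

    if (!isRoot)
        UnlockReleaseBuffer(buffer);    /* safe: not the pinned root */
    else
        LockBuffer(buffer, GIN_UNLOCK); /* keep the topmost stack item's pin */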
This is quite hard to reproduce, since it can only happen if an
index root page split occurs concurrently with our own insertion.
Thanks to Jeff Janes for finding a test case that triggers it
often enough to allow investigation.
This has been there since the beginning of GIN, so back-patch
to all supported branches.
Discussion: https://postgr.es/m/CAMkU=1yCAKtv86dMrD__Ja-7KzjE=uMeKX8y__cx5W-OEWy2ow@mail.gmail.com
Commit b7eda3e0e moved XidInMVCCSnapshot() from tqual.c into snapmgr.c,
but follow-up commit c91560def incorrectly updated this reference. We
could fix it, but as pointed out by Daniel Gustafsson, 1) the reader can
easily find the file that contains the definition of that function, e.g.
by grepping, and 2) this kind of reference is prone to going stale; so
let's just remove it.
Back-patch to all supported branches.
Reviewed by Daniel Gustafsson.
Discussion: https://postgr.es/m/CAPmGK145VdKkPBLWS2urwhgsfidbSexwY-9zCL6xSUJH%2BBTUUg%40mail.gmail.com
We can simplify FieldSelect on a whole-row Var into a plain Var
for the selected field. However, we should copy the whole-row Var's
varnullingrels when we do so, because the new Var is clearly nullable
by exactly the same rels as the original. Failure to do this led to
errors like "wrong varnullingrels (b) (expected (b 3)) for Var 2/2".
Richard Guo, per bug #18184 from Marian Krucina. Back-patch to
v16 where varnullingrels was introduced.
Discussion: https://postgr.es/m/18184-5868dd258782058e@postgresql.org
When executing a MERGE UPDATE action, if the UPDATE is turned into a
cross-partition DELETE then INSERT, do not attempt to invoke AFTER
UPDATE ROW triggers, or any of the other post-update actions in
ExecUpdateEpilogue().
For consistency with a plain UPDATE command, such triggers should not
be fired (and they typically fail anyway); similarly, other post-update
actions, such as WCO/RLS checks, should not be executed, as they might
also lead to unexpected failures.
Therefore, as with ExecUpdate(), make ExecMergeMatched() return
immediately if ExecUpdateAct() reports that a cross-partition update
was done, to be sure that no further processing is done for that
tuple.
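A hedged sketch of the early return (illustrative of ExecMergeMatched();
the argument list follows ExecUpdateAct()'s update context):

    ExecUpdateAct(context, resultRelInfo, tupleid, NULL, newslot,
                  canSetTag, &updateCxt);
    if (updateCxt.crossPartUpdate)
    {
        /* tuple already routed as DELETE + INSERT; skip the epilogue */
        return true;
    }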
Back-patch to v15, where MERGE was introduced.
Discussion: https://postgr.es/m/CAEZATCWjBgagyNZs02vgDF0DvASYj-iHTFtXG2-nP3orZhmtcw%40mail.gmail.com
false_positive_rate is a parameter that can be set with the bloom
opclass in BRIN, and setting it to a value of exactly 0.25 would trigger
an assertion in the first INSERT done on an index with this parameter set.
The assertion changed here relied on BLOOM_{MIN|MAX}_FALSE_POSITIVE_RATE,
which are somewhat arbitrary values, and specifying an out-of-range value
would already trigger a failure when defining such an index. So, as-is,
the assertion merely duplicated the reloption's min-max check.
It is now relaxed to check that the value is a correct percentage,
instead, based on a suggestion by Tom Lane.
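A hedged sketch of the relaxed check (illustrative):

    /* accept any valid fraction rather than the reloption's arbitrary bounds */
    Assert(false_positive_rate > 0.0 && false_positive_rate < 1.0);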
Author: Alexander Lakhin
Reviewed-by: Tom Lane, Shihao Zhong
Discussion: https://postgr.es/m/17969-a6c54de48026d694@postgresql.org
Backpatch-through: 14
array_set_element() and related functions allow an array to be
enlarged by assigning to subscripts outside the current array bounds.
While these places were careful to check that the new bounds are
allowable, they neglected to consider the risk of integer overflow
in computing the new bounds. In edge cases, we could compute new
bounds that are invalid but get past the subsequent checks,
allowing bad things to happen. Memory stomps that are potentially
exploitable for arbitrary code execution are possible, and so is
disclosure of server memory.
To fix, perform the hazardous computations using overflow-detecting
arithmetic routines, which fortunately exist in all still-supported
branches.
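A hedged sketch of the style of check (pg_add_s32_overflow() and
friends are from src/include/common/int.h; the bound names and error
message are illustrative):

    int new_ub;     /* hypothetical new upper bound */

    if (pg_add_s32_overflow(lb, dim, &new_ub) ||
        pg_sub_s32_overflow(new_ub, 1, &new_ub))
        ereport(ERROR,
                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                 errmsg("array upper bound is too large")));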
The test cases added for this generate (after patching) errors that
mention the value of MaxArraySize, which is platform-dependent.
Rather than introduce multiple expected-files, use psql's VERBOSITY
parameter to suppress the printing of the message text. v11 psql
lacks that parameter, so omit the tests in that branch.
Our thanks to Pedro Gallegos for reporting this problem.
Security: CVE-2023-5869
transformAggregateCall() captures the datatypes of the aggregate's
arguments immediately to construct the Aggref.aggargtypes list.
This seems reasonable because the arguments have already been
transformed --- but there is an edge case where they haven't been.
Specifically, if we have an unknown-type literal in an ANY argument
position, nothing will have been done with it earlier. But if we
also have DISTINCT, then addTargetToGroupList() converts the literal
to "text" type, resulting in the aggargtypes list not matching the
actual runtime type of the argument. The end result is that the
aggregate tries to interpret a "text" value as being of type
"unknown", that is a zero-terminated C string. If the text value
contains no zero bytes, this could result in disclosure of server
memory following the text literal value.
To fix, move the collection of the aggargtypes list to the end
of transformAggregateCall(), after DISTINCT has been handled.
This requires slightly more code, but not a great deal.
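A hedged sketch of collecting the types at the end (illustrative; the
point is that DISTINCT processing has already run):

    List     *argtypes = NIL;
    ListCell *lc;

    /* DISTINCT handling may have coerced arguments; capture types now */
    foreach(lc, tlist)
    {
        TargetEntry *tle = (TargetEntry *) lfirst(lc);

        if (!tle->resjunk)
            argtypes = lappend_oid(argtypes, exprType((Node *) tle->expr));
    }
    agg->aggargtypes = argtypes;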
Our thanks to Jingzhou Fu for reporting this problem.
Security: CVE-2023-5868
It was always false in single-user mode, in autovacuum workers, and in
background workers. This had no specifically-identified security
consequences, but non-core code or future work might make it
security-relevant. Back-patch to v11 (all supported versions).
Jelte Fennema-Nio. Reported by Jelte Fennema-Nio.
Documentation says it cannot signal "a backend owned by a superuser".
On the contrary, it could signal background workers, including the
logical replication launcher. It could signal autovacuum workers and
the autovacuum launcher. Block all that. Signaling autovacuum workers
and those two launchers doesn't stall progress beyond what one could
achieve other ways. If a cluster uses a non-core extension with a
background worker that does not auto-restart, this could create a denial
of service with respect to that background worker. A background worker
with bugs in its code for responding to terminations or cancellations
could experience those bugs at a time the pg_signal_backend member
chooses. Back-patch to v11 (all supported versions).
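A hedged sketch of the kind of gate involved (names from
signalfuncs.c; the exact placement is illustrative):

    /* processes with no role (autovacuum, launchers, bgworkers) are not
     * "backends owned by a superuser" that members may signal */
    if (!OidIsValid(proc->roleId) || superuser_arg(proc->roleId))
        return SIGNAL_BACKEND_NOSUPERUSER;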
Reviewed by Jelte Fennema-Nio. Reported by Hemanth Sandrana and
Mahendrakar Srinivasarao.
Security: CVE-2023-5870
get_explain_guc_options() crashed if a string GUC marked GUC_EXPLAIN
has a NULL boot_val. Nosing around found a couple of other places
that seemed insufficiently cautious about NULL string values, although
those are likely unreachable in practice. Add some commentary
defining the expectations for NULL values of string variables,
in hopes of forestalling future additions of more such bugs.
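A hedged sketch of the sort of guard involved (illustrative; the point
is that string GUC values, including boot_val, may legitimately be NULL):

    if (conf->vartype == PGC_STRING)
    {
        struct config_string *sconf = (struct config_string *) conf;

        /* don't let a NULL boot_val reach strcmp() or output code */
        if (sconf->boot_val == NULL)
            continue;
    }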
Xing Guo, Aleksander Alekseev, Tom Lane
Discussion: https://postgr.es/m/CACpMh+AyDx5YUpPaAgzVwC1d8zfOL4JoD-uyFDnNSa1z0EsDQQ@mail.gmail.com
pgstatindex failed with ERRCODE_DATA_CORRUPTED, of the "can't-happen"
class XX. The other functions succeeded on an empty index; they might
have malfunctioned if the failed index build left torn I/O or other
complex state. Report an ERROR in statistics functions pgstatindex,
pgstatginindex, pgstathashindex, and pgstattuple. Report DEBUG1 and
skip all index I/O in maintenance functions brin_desummarize_range,
brin_summarize_new_values, brin_summarize_range, and
gin_clean_pending_list. Back-patch to v11 (all supported versions).
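A hedged sketch, assuming the gate is on the index validity flag (the
text above doesn't name the exact condition):

    if (!rel->rd_index->indisvalid)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("index \"%s\" is not valid",
                        RelationGetRelationName(rel))));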
Discussion: https://postgr.es/m/20231001195309.a3@google.com
When looping around after finding that the set-returning function
returned zero rows for the current input tuple, ExecProjectSet
neglected to reset either of the two memory contexts it's
responsible for cleaning out. Typically this wouldn't cause much of
a problem, because once the SRF does return at least one row, the
contexts would get reset on the next call. However, if the SRF
returns no rows for many input tuples in succession, quite a lot
of memory could be transiently consumed.
To fix, make sure we reset both contexts while looping around.
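A hedged sketch of the reset (argcontext and the per-tuple expression
context are ExecProjectSet's two contexts):

    /* discard this cycle's garbage before fetching the next input tuple */
    ResetExprContext(econtext);
    MemoryContextReset(node->argcontext);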
Per bug #18172 from Sergei Kornilov. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/18172-9b8c5fc1d676ded3@postgresql.org
When calculating distance for interval values, the code mostly mimicked
interval_mi, i.e. it built a new interval value for the difference.
That however does not work for sufficiently distant interval values,
when the difference overflows the interval range.
Instead, we can calculate the distance directly, without constructing
the intermediate (and unnecessary) interval value.
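A hedged sketch of the direct computation (illustrative; each interval
is flattened to a double value in microseconds, using the usual 30-day
month convention):

    double v1, v2;

    v1 = (double) ia->time +
         (double) ia->day * USECS_PER_DAY +
         (double) ia->month * (30.0 * USECS_PER_DAY);
    v2 = (double) ib->time +
         (double) ib->day * USECS_PER_DAY +
         (double) ib->month * (30.0 * USECS_PER_DAY);

    PG_RETURN_FLOAT8(v2 - v1);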
Backpatch to 14, where minmax-multi indexes were introduced.
Reported-by: Dean Rasheed
Reviewed-by: Ashutosh Bapat, Dean Rasheed
Backpatch-through: 14
Discussion: https://postgr.es/m/eef0ea8c-4aaa-8d0d-027f-58b1f35dd170@enterprisedb.com
Make sure that infinite values in date/timestamp columns are treated as
if in infinite distance. Infinite values should not be merged with other
values, leaving them as outliers. The code however returned distance 0
in this case, so that infinite values were merged first. While this does
not break the index (i.e. it still produces correct query results), it
may make it much less efficient.
We don't need explicit handling of infinite date/timestamp values when
calculating distances, because those values are represented as extreme
but regular values (e.g. INT64_MIN/MAX for the timestamp type).
We don't need an exact distance, just a value that is much larger than
the distances between regular values. With the added cast to double
values, we can simply subtract the values.
The regression test queries a value in the "gap" and checks the range
was properly eliminated by the BRIN index.
This only affects minmax-multi indexes on timestamp/date columns with
infinite values, which is not very common in practice. The affected
indexes may need to be rebuilt.
Backpatch to 14, where minmax-multi indexes were introduced.
Reported-by: Ashutosh Bapat
Reviewed-by: Ashutosh Bapat, Dean Rasheed
Backpatch-through: 14
Discussion: https://postgr.es/m/eef0ea8c-4aaa-8d0d-027f-58b1f35dd170@enterprisedb.com
When calculating the distance between date values, make sure to subtract
them in the right order, i.e. (larger - smaller).
The distance is used to determine which values to merge, and is expected
to be a positive value. The code unfortunately did the subtraction in
the opposite order, i.e. (smaller - larger), thus producing negative
values and merging the most distant values first.
The resulting index is correct (i.e. produces correct results), but may
be significantly less efficient. This affects all minmax-multi indexes
on date columns.
Backpatch to 14, where minmax-multi indexes were introduced.
Reported-by: Ashutosh Bapat
Reviewed-by: Ashutosh Bapat, Dean Rasheed
Backpatch-through: 14
Discussion: https://postgr.es/m/eef0ea8c-4aaa-8d0d-027f-58b1f35dd170@enterprisedb.com
When calculating distances for timestamp values for BRIN minmax-multi
indexes, we need to be careful about overflows for extreme values. If
the distance overflows into a negative value, the index may be inefficient.
The new regression test checks this for the timestamp type by adding a
table with enough values to force range compaction/merging. The values
are close to min/max, which means a risk of overflow.
Fixed by converting the int64 values to double first, before calculating
the distance. This prevents the overflow. We may lose some precision, of
course, but that's good enough. In the worst case we build a slightly
less efficient index, but for large distances this won't matter.
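A hedged sketch (dt1 and dt2 are the int64 timestamp values, with
dt2 >= dt1):

    /* cast before subtracting, so the int64 difference cannot overflow */
    delta = (double) dt2 - (double) dt1;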
This only affects minmax-multi indexes on timestamp columns, with ranges
containing values sufficiently distant to cause an overflow. That seems
like a fairly rare case in practice.
Backpatch to 14, where minmax-multi indexes were introduced.
Reported-by: Ashutosh Bapat
Reviewed-by: Ashutosh Bapat, Dean Rasheed
Backpatch-through: 14
Discussion: https://postgr.es/m/eef0ea8c-4aaa-8d0d-027f-58b1f35dd170@enterprisedb.com
Currently, expand_single_inheritance_child() doesn't reset
perminfoindex in a plain-inheritance parent's child RTE, because
prior to 387f9ed0a0, the executor would use the first child RTE to
locate the parent's RTEPermissionInfo. That in turn causes
add_rte_to_flat_rtable() to create an extra RTEPermissionInfo
belonging to the parent's child RTE with the same content as the one
belonging to the parent's original ("root") RTE.
In 387f9ed0a0, we changed things so that the executor can now use the
parent's "root" RTE for locating its RTEPermissionInfo instead of the
child RTE, so the latter's perminfoindex no longer needs to be set;
make it so.
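A hedged sketch of the change in expand_single_inheritance_child()
(illustrative):

    /* the executor now finds permissions via the root RTE */
    childrte->perminfoindex = 0;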
Reported-by: Tom Lane
Discussion: https://postgr.es/m/839708.1698174464@sss.pgh.pa.us
Backpatch-through: 16
When an UPDATE/DELETE/MERGE's target table is an old-style
inheritance tree, it's possible for the parent to get excluded
from the plan while some children are not. (I believe this is
only possible if we can prove that a CHECK ... NO INHERIT
constraint on the parent contradicts the query WHERE clause,
so it's a very unusual case.) In such a case, ExecInitModifyTable
mistakenly concluded that the first surviving child is the target
table, leading to at least two bugs:
1. The wrong table's statement-level triggers would get fired.
2. In v16 and up, it was possible to fail with "invalid perminfoindex
0 in RTE with relid nnnn" due to the child RTE not having permissions
data included in the query plan. This was hard to reproduce reliably
because it did not occur unless the update triggered some non-HOT
index updates.
In v14 and up, this is easy to fix by defining ModifyTable.rootRelation
to be the parent RTE in plain inheritance as well as partitioned cases.
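A hedged sketch of the idea (rootRelation is the existing ModifyTable
field; the variable names are illustrative):

    /* point at the inheritance root even for old-style inheritance, so
     * ExecInitModifyTable() doesn't mistake the first surviving child
     * for the target table */
    plan->rootRelation = parentRTindex;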
While the wrong-triggers bug also appears in older branches, the
relevant code in both the planner and executor is quite a bit
different, so it would take a good deal of effort to develop and
test a suitable patch. Given the lack of field complaints about the
trigger issue, I'll desist for now. (Patching v11 for this seems
unwise anyway, given that it will have no more releases after next
month.)
Per bug #18147 from Hans Buschmann.
Amit Langote and Tom Lane
Discussion: https://postgr.es/m/18147-6fc796538913ee88@postgresql.org
When min_dynamic_shared_memory is set above 0, we try to find space in a
pre-allocated region of the main shared memory area instead of calling
dsm_impl_XXX() routines to allocate more. The dsm_pin_segment() and
dsm_unpin_segment() routines had a bug: they called dsm_impl_XXX()
routines even for main region segments. Nobody noticed before now
because those routines do nothing on Unix, but on Windows they'd fail
while attempting to duplicate an invalid Windows HANDLE. Add the
missing gating.
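A hedged sketch of the gating (is_main_region_dsm_handle() is dsm.c's
existing test for such segments; the surrounding code is illustrative):

    /* main-region segments have no impl-level handle to pin */
    if (!is_main_region_dsm_handle(seg->handle))
        dsm_impl_pin_segment(seg->handle, seg->impl_private, &handle);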
Back-patch to 14, where commit 84b1c63a added this feature. Fixes
pgsql-bugs bug #18165.
Reported-by: Maxime Boyer <maxime.boyer@cra-arc.gc.ca>
Tested-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/18165-bf4f525cea6e51de%40postgresql.org
This avoids a compiler bug occurring in AIX's xlc, even in pretty
late-model revisions. Buildfarm testing has now confirmed that
only 64-bit xlc is affected. Although we are contemplating
dropping support for xlc in v17, it's still supported in the
back branches, so we need this fix.
Back-patch of code changes from HEAD commit 19fa97731.
(The test cases were already back-patched, in 4a427b82c et al.)
Discussion: https://postgr.es/m/CA+hUKGK=DOC+hE-62FKfZy=Ybt5uLkrg3zCZD-jFykM-iPn8yw@mail.gmail.com
Remove use of LLVMGetElementType() and provide the type of all pointers
to LLVMBuildXXX() functions when emitting IR, as required by modern LLVM
versions[1].
* For LLVM <= 14, we'll still use the old LLVMBuildXXX() functions.
* For LLVM == 15, we'll continue to do the same, explicitly opting
out of opaque pointer mode.
* For LLVM >= 16, we'll use the new LLVMBuildXXX2() functions that take
the extra type argument.
The difference is hidden behind some new IR emitting wrapper functions
l_load(), l_gep(), l_call() etc. The change is mostly mechanical,
except that at each site the correct type had to be provided.
In some places we needed to do some extra work to get function types,
including some new wrappers for C++ APIs that are not yet exposed in
LLVM's C API, and some new "example" functions in llvmjit_types.c
because it's no longer possible to start from the function pointer type
and ask for the function type.
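A hedged sketch of one such wrapper (illustrative; the committed
helpers may differ in detail):

    static inline LLVMValueRef
    l_load(LLVMBuilderRef b, LLVMTypeRef t, LLVMValueRef v, const char *name)
    {
    #if LLVM_VERSION_MAJOR < 16
        return LLVMBuildLoad(b, v, name);       /* pointee type still implied */
    #else
        return LLVMBuildLoad2(b, t, v, name);   /* type passed explicitly */
    #endif
    }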
Back-patch to 12, because it's a little trickier in 11 and we agreed not
to put the latest LLVM support into the upcoming final release of 11.
[1] https://llvm.org/docs/OpaquePointers.html
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGKNX_%3Df%2B1C4r06WETKTq0G4Z_7q4L4Fxn5WWpMycDj9Fw%40mail.gmail.com
When relation files were extended multiple blocks at a time, the number
of blocks written was counted, but the write time was not incremented:
writes and relation extensions are treated as two different I/O
operations in the shared stats (IOOP_EXTEND vs IOOP_WRITE), and the
timing of IOOP_EXTEND was forgotten for normal (non-temporary)
relations.
Write times are tracked when track_io_timing is enabled, which is not
the case by default.
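A hedged sketch of the accounting (pgstat_count_io_op_time() is the
v16 API; the surrounding code is illustrative):

    io_start = pgstat_prepare_io_time();
    /* write out the newly extended blocks here */
    pgstat_count_io_op_time(IOOBJECT_RELATION, io_context,
                            IOOP_EXTEND, io_start, extend_by);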
Author: Nazir Bilal Yavuz
Reviewed-by: Robert Haas, Melanie Plageman
Discussion: https://postgr.es/m/CAN55FZ19Ss279mZuqGbuUNxka0iPbLgYuOQXqAKewrjNrp27VA@mail.gmail.com
Backpatch-through: 16
For the duration of the restore_command, i.e., a call to system(), the
startup process's SIGTERM handler calls proc_exit() immediately. This
system() call forks a new process to execute the shell command, and
this child process inherits the parent's signal
handlers. If both the parent and child processes receive SIGTERM,
both will attempt to call proc_exit(). This can end badly. For
example, both processes will try to remove themselves from the
PGPROC shared array.
To fix this problem, this commit adds a check in
StartupProcShutdownHandler() to see whether MyProcPid == getpid().
If they match, this is the parent process, and we can proc_exit()
like before. If they do not match, this is a child process, and we
just emit a message to STDERR (in a signal safe manner) and
_exit(), thereby skipping any problematic exit callbacks.
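A hedged sketch of the handler logic (write_stderr_signal_safe() is the
signal-safe reporting mentioned above; the exact message is illustrative):

    if (MyProcPid == (int) getpid())
        proc_exit(1);       /* the startup process itself */
    else
    {
        /* forked child of system(): skip exit callbacks entirely */
        write_stderr_signal_safe("StartupProcShutdownHandler() called in child process\n");
        _exit(1);
    }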
This commit also adds checks in proc_exit(), ProcKill(), and
AuxiliaryProcKill() that verify they are not being called within
such child processes.
Suggested-by: Andres Freund
Reviewed-by: Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/Y9nGDSgIm83FHcad%40paquier.xyz
Discussion: https://postgr.es/m/20230223231503.GA743455%40nathanxps13
Backpatch-through: 11
Dropping a temp table could entail TOAST table access to clean out
toasted catalog entries, such as large pg_constraint.conbin strings
for complex CHECK constraints. If we did that via ON COMMIT DROP,
we triggered the assertion in init_toast_snapshot(), because
there was no provision for setting up a snapshot for the drop
actions. Fix that.
(I assume here that the adjacent truncation actions for ON COMMIT
DELETE ROWS don't have a similar problem: it doesn't seem like
nontransactional truncations would need to touch any toasted fields.
If that proves wrong, we could refactor a bit to have the same
snapshot acquisition cover that too.)
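A hedged sketch of the fix (illustrative; the snapshot brackets the
ON COMMIT DROP processing):

    PushActiveSnapshot(GetTransactionSnapshot());
    /* TOAST access during the drop now has a snapshot to work with */
    performDeletion(&object, DROP_CASCADE, PERFORM_DELETION_INTERNAL);
    PopActiveSnapshot();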
The test case added here does not fail before v15, because that
assertion was added in 277692220 which was not back-patched.
However, the race condition the assertion warns of surely
exists further back, so back-patch to all supported branches.
Per report from Richard Guo.
Discussion: https://postgr.es/m/CAMbWs4-x26=_QxxgdJyNbiCDzvtr2WV5ZDso_v-CukKEe6cBZw@mail.gmail.com
If SIGTERM is received within this section, the startup process
will immediately proc_exit() in the signal handler, so it is
inadvisable to include any more code than is required there (as
such code is unlikely to be compatible with doing proc_exit() in a
signal handler). This commit moves the code recently added to this
section (see 1b06d7bac9 and 7fed801135) to outside of the section.
This ensures that the startup process only calls proc_exit() in its
SIGTERM handler for the duration of the system() call, which is how
this code worked from v8.4 to v14.
Reported-by: Michael Paquier, Thomas Munro
Analyzed-by: Andres Freund
Suggested-by: Tom Lane
Reviewed-by: Michael Paquier, Robert Haas, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/Y9nGDSgIm83FHcad%40paquier.xyz
Discussion: https://postgr.es/m/20230223231503.GA743455%40nathanxps13
Backpatch-through: 15
Commit dc7d70ea added functions that read the control file, but didn't
acquire ControlFileLock. With unlucky timing, file systems that have
weak interlocking like ext4 and ntfs could expose partially overwritten
contents, and the checksum would fail.
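A hedged sketch of the added interlock (illustrative):

    LWLockAcquire(ControlFileLock, LW_SHARED);
    /* read and checksum the control file contents */
    LWLockRelease(ControlFileLock);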
Back-patch to all supported releases.
Reviewed-by: David Steele <david@pgmasters.net>
Reviewed-by: Anton A. Melnikov <aamelnikov@inbox.ru>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
Under interval_ops, some equal values are distinguishable. One such
pair is '24:00:00' and '1 day'. With that being so, btequalimage()
breaches the documented contract for the "equalimage" btree support
function. This can cause incorrect results from index-only scans.
Users should REINDEX any btree indexes having interval-type columns.
After updating, pg_amcheck will report an error for almost all such
indexes. This fix makes interval_ops simply omit the support function,
like numeric_ops does. Back-patch to v13, where btequalimage() first
appeared. In back branches, for the benefit of old catalog content,
btequalimage() code will return false for type "interval". Going
forward, back-branch initdb will include the catalog change.
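A hedged sketch of the back-branch behavior (btequalimage() receives
the opclass input type as its first argument):

    Datum
    btequalimage(PG_FUNCTION_ARGS)
    {
        Oid opcintype = PG_GETARG_OID(0);

        /* old catalogs may still attach this to interval_ops; say no there */
        PG_RETURN_BOOL(opcintype != INTERVALOID);
    }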
Reviewed by Peter Geoghegan.
Discussion: https://postgr.es/m/20231011013317.22.nmisch@google.com
When COPYing into a partitioned table that does not permit the use of
table_multi_insert(), we could error out with
    ERROR: could not read block NN in file "base/...": read only 0 of 8192 bytes
because BulkInsertState->next_free was not reset between partitions. This
problem occurred only when table_multi_insert() could not be used, as a
dedicated BulkInsertState for each partition is used in that case.
The bug was introduced in 00d1e02be2, but it was hard to hit at that point,
as commonly bulk relation extension is not used when not using
table_multi_insert(). It became more likely after 82a4edabd2, which expanded
the use of bulk extension.
To fix the bug, reset the bulk relation extension state in BulkInsertState in
ReleaseBulkInsertStatePin(). That was added (in b1ecb9b3fc) to tackle a very
similar issue. Obviously the name is not quite correct, but there might be
external callers, and the bulk insert state needs to be reset precisely in
the situations in which ReleaseBulkInsertStatePin() already needed to be called.
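A hedged sketch of the reset (next_free and last_free are the
bulk-extension fields added in 00d1e02be2):

    /* don't carry free-block hints over to the next partition's relation */
    bistate->next_free = InvalidBlockNumber;
    bistate->last_free = InvalidBlockNumber;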
Medium term the better fix likely is to disallow reusing BulkInsertState
across relations.
Add a test that, without the fix, reproduces #18130 in most
configurations. The test also catches the problem fixed in b1ecb9b3fc when
run with small shared_buffers.
Reported-by: Ivan Kolombet <enderstd@gmail.com>
Analyzed-by: Tom Lane <tgl@sss.pgh.pa.us>
Analyzed-by: Andres Freund <andres@anarazel.de>
Bug: #18130
Discussion: https://postgr.es/m/18130-7a86a7356a75209d%40postgresql.org
Discussion: https://postgr.es/m/257696.1695670946%40sss.pgh.pa.us
Backpatch: 16-
This could only affect HASH partitioned tables with at least 2 partition
key columns.
If partition pruning was delayed until execution and the query contained
an IS NULL qual on one of the partitioned keys, and some subsequent
partitioned key was being compared to a non-Const, then this could result
in a crash due to the incorrect keyno being used to calculate the
stateidx for the expression evaluation code.
Here we fix this by properly skipping partitioned keys which have a
nullkey set. Effectively, this must be the same as what's going on
inside perform_pruning_base_step().
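A hedged sketch of the skip (illustrative, mirroring
perform_pruning_base_step()):

    /* keys pruned with IS NULL have no expression, hence no stateidx slot */
    if (bms_is_member(keyno, step->nullkeys))
        continue;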
Sergei Glukhov also provided a patch, but that's not what's being used
here.
Reported-by: Sergei Glukhov
Reviewed-by: tender wang, Sergei Glukhov
Discussion: https://postgr.es/m/d05b26fa-af54-27e1-f693-6c31590802fa@postgrespro.ru
Backpatch-through: 11, where runtime partition pruning was added.
get_steps_using_prefix_recurse() incorrectly assumed that it could stop
recursive processing of the 'prefix' list when cur_keyno was one before
the step_lastkeyno. Since hash partition pruning can prune using IS
NULL quals, and these IS NULL quals are not present in the 'prefix'
list, that logic could cause more levels of recursion than
needed, leaving no more items in the 'prefix' list to
process. This would manifest itself as a crash in some code that
expected the 'start' ListCell not to be NULL.
Here we adjust the logic so that instead of stopping recursion at 1 key
before the step_lastkeyno, we just look at the llast(prefix) item and
ensure we only recursively process up until just before whichever the last
key is. This effectively allows keys to be missing in the 'prefix' list.
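A hedged sketch of the new stopping rule (PartClauseInfo is the
element type of 'prefix'; the surrounding logic is illustrative):

    final_keyno = ((PartClauseInfo *) llast(prefix))->keyno;
    if (cur_keyno < final_keyno)
    {
        /* more prefix keys remain: recurse to the next level */
    }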
This change does mean that step_lastkeyno is no longer needed, so we
remove that from the static functions. I also spent quite some time
reading this code and testing it to try to convince myself that there
are no other issues. That resulted in the irresistible temptation of
rewriting some comments, many of which were just not true or were unclear.
Reported-by: Sergei Glukhov
Reviewed-by: Sergei Glukhov, tender wang
Discussion: https://postgr.es/m/2f09ce72-315e-2a33-589a-8519ada8df61@postgrespro.ru
Backpatch-through: 11, where partition pruning was introduced.
1349d2790 added code to adjust the PlannerInfo.group_pathkeys so that
ORDER BY / DISTINCT aggregate functions could obtain pre-sorted inputs
to allow faster execution. That commit forgot to adjust the pathkeys in
create_agg_path(). Some code in there assumed that it was always fine
to make the AggPath's pathkeys the same as its subpath's. That seems to
have been ok up until 1349d2790, but since that commit adds pathkeys for
columns which are within the aggregate function, those columns won't be
available above the aggregate node. This can result in "could not find
pathkey item to sort" during create_plan().
The fix here is to strip off the additional pathkeys added by
adjust_group_pathkeys_for_groupagg(). It seems that the pathkeys here
will only ever be group_pathkeys, so all we need to do is check if the
pathkey list is longer than num_groupby_pathkeys and strip off the
additional ones if so.
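A hedged sketch of the trimming (list_copy_head() and
num_groupby_pathkeys both exist in v16):

    if (list_length(pathkeys) > root->num_groupby_pathkeys)
        pathkeys = list_copy_head(pathkeys, root->num_groupby_pathkeys);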
Reported-by: Justin Pryzby
Reviewed-by: Richard Guo
Discussion: https://postgr.es/m/ZQhYYRhUxpW3PSf9%40telsasoft.com
Backpatch-through: 16, where 1349d2790 was introduced
Ensure we switch to the per-tuple memory context to prevent any memory
leaks of detoasted Datums in MemoizeHash_hash() and MemoizeHash_equal().
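A hedged sketch of the pattern (the evaluation in the actual hash and
equality functions differs; the point is the context switch):

    MemoryContext oldcontext;

    oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
    /* evaluate and hash the cache key expressions here; any detoasted
     * Datums now live in per-tuple memory and are reclaimed by the
     * regular per-tuple reset instead of accumulating */
    MemoryContextSwitchTo(oldcontext);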
Reported-by: Orlov Aleksej
Author: Orlov Aleksej, David Rowley
Discussion: https://postgr.es/m/83281eed63c74e4f940317186372abfd%40cft.ru
Backpatch-through: 14, where Memoize was added
The code in charge of copying the contents of PgBackendStatus to local
memory could fail on memory allocation because of an overflow in the
amount of memory to use. The overflow can happen when combining a high
track_activity_query_size value (max at 1MB) with a large
max_connections: their product can exceed INT32_MAX when both
parameters are treated as signed integers. This could for example be
triggered by the following functions, all calling pgstat_read_current_status():
- pg_stat_get_backend_subxact()
- pg_stat_get_backend_idset()
- pg_stat_get_progress_info()
- pg_stat_get_activity()
- pg_stat_get_db_numbackends()
The change to use MemoryContextAllocHuge() has been introduced in
8d0ddccec6, so backpatch down to 12.
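A hedged sketch (the context variable is hypothetical; the point is
doing the multiplication in Size arithmetic and using the huge
allocator):

    localactivity = (char *)
        MemoryContextAllocHuge(localcontext,
                               (Size) pgstat_track_activity_query_size *
                               NumBackendStatSlots);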
Author: Jakub Wartak
Discussion: https://postgr.es/m/CAKZiRmw8QSNVw2qNK-dznsatQqz+9DkCquxP0GHbbv1jMkGHMA@mail.gmail.com
Backpatch-through: 12