While we already test that our headers are valid C++ using
headerscheck, it turns out that the macros we define might still
expand to invalid C++ code. This adds a minimal test extension that
is compiled using C++ to test that it's actually possible to build and
run extensions written in C++. Future commits will improve C++
compatibility of some of our macros and add usage of them to this
extension to make sure that they don't regress in the future.
The test module is for the moment disabled when using MSVC. In
particular, the use of designated initializers in PG_MODULE_MAGIC
would require C++20, for which we are currently not set up. (GCC and
Clang support it as extensions.) It is planned to fix this.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/CAGECzQR21OnnKiZO_1rLWO0-16kg1JBxnVq-wymYW0-_1cUNtg@mail.gmail.com
This way we don't have to walk the entire process type array and
strcmp() the string with the names therein. The integer value can be
directly used as array index instead.
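For illustration, a minimal sketch of the before/after lookup (the enum
and array names here are hypothetical, not the actual code):

    #include <string.h>

    typedef enum { B_CHECKPOINTER, B_WAL_WRITER, NUM_PROC_TYPES } ProcType;

    static const char *const proc_type_names[NUM_PROC_TYPES] =
        { "checkpointer", "walwriter" };

    /* Before: walk the array and strcmp() every name */
    static int
    proc_type_by_name(const char *name)
    {
        for (int i = 0; i < NUM_PROC_TYPES; i++)
            if (strcmp(name, proc_type_names[i]) == 0)
                return i;
        return -1;
    }

    /* After: the integer value is itself the array index */
    static const char *
    proc_type_name(ProcType type)
    {
        return proc_type_names[type];
    }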
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/202512090935.k3xrtr44hxkn@alvherre.pgsql
A recent BF failure showed that commit 7a485bd641 did not handle the case
where a sequence is dropped concurrently during sequence synchronization
on the subscriber. Previously, pg_get_sequence_data() would ERROR out if
the sequence was dropped concurrently. After 7a485bd641, it instead
returns NULL, which leads to an assertion failure on the subscriber.
To handle this change, update sequence synchronization to skip sequences
for which pg_get_sequence_data() returns NULL.
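A hedged sketch of the subscriber-side change; the helper below is
illustrative only, not the committed code:

    #include "executor/spi.h"

    /* NULL from pg_get_sequence_data() means a concurrent drop */
    static bool
    sequence_was_dropped(HeapTuple tuple, TupleDesc tupdesc)
    {
        bool        isnull;

        (void) SPI_getbinval(tuple, tupdesc, 1, &isnull);
        return isnull;      /* caller skips this sequence if true */
    }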
Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CALDaNm0FoGdt+1mzua0t-=wYdup5_zmFrvfNf-L=MGBnj9HAcg@mail.gmail.com
This addition is in the same spirit as 32e27bd320 for MVNDistinct and
MVDependencies, except that we were missing a free routine for the third
type of extended statistics, MCVList. I was not sure if we needed an
equivalent for MCVList, but after more review of the main patch set for
the import of extended statistics, it has become clear that we do.
This is introduced as its own change as this routine can be useful on
its own. This piece was not written by Corey Huinker; I just noticed
the need for it myself along the way.
Author: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
When IN/ANY clauses contain both constants and variable expressions, the
optimizer transforms them into separate structures: constants become
an array expression while variables become individual OR conditions.
This transformation created overlapping token locations, causing
pg_stat_statements query normalization to crash because it could not
calculate the number of bytes remaining to write for the normalized
query.
This commit disables squashing for mixed IN list expressions when
constructing a scalar array op, by setting list_start and list_end
to -1 when both variables and non-variables are present. Some
regression tests are added to PGSS to verify these patterns.
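A sketch of the shape of the fix (list_start and list_end are the
fields named above; the surrounding code is illustrative):

    /* after classifying the IN/ANY list elements... */
    if (has_variables && has_constants)
    {
        /* mixed list: disable squashing so token locations cannot overlap */
        expr->list_start = -1;
        expr->list_end = -1;
    }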
Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0ts9qiONnHjjHxPxtePs22GBo4d3jZ_s2BQC59AN7XbAA@mail.gmail.com
Backpatch-through: 18
When faced with a relation containing more than 1 physical segment
(i.e. >1GB, with normal settings), the previous code could compute a
truncation block length greater than RELSEG_SIZE, which could lead to
restore failures of this form:
file "%s" has truncation block length %u in excess of segment size %u
The fix is simply to clamp the maximum computed truncation_block_length
to RELSEG_SIZE. I have also added some comments to clarify the logic.
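In code form, the clamp amounts to something like this (sketch):

    if (truncation_block_length > RELSEG_SIZE)
        truncation_block_length = RELSEG_SIZE;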
The test case was written by Oleg Tkachenko, but I have rewritten its
comments.
Reported-by: Oleg Tkachenko <oatkachenko@gmail.com>
Diagnosed-by: Oleg Tkachenko <oatkachenko@gmail.com>
Co-authored-by: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Oleg Tkachenko <oatkachenko@gmail.com>
Reviewed-by: Amul Sul <sulamul@gmail.com>
Backpatch-through: 17
Discussion: http://postgr.es/m/00FEFC88-EA1D-4271-B38F-EB741733A84A@gmail.com
When checking a subquery's output expressions to see if it's safe to
push down an upper-level qual, check_output_expressions() previously
treated grouping Vars as opaque Vars. This implicitly assumed they
were stable and scalar.
However, a grouping Var's underlying expression corresponds to the
grouping clause, which may be volatile or set-returning. If an
upper-level qual references such an output column, pushing it down
into the subquery is unsafe. This can cause strange results due to
multiple evaluation of a volatile function, or introduce SRFs into
the subquery's WHERE/HAVING quals.
This patch teaches check_output_expressions() to look through grouping
Vars to their underlying expressions. This ensures that any
volatility or set-returning properties in the grouping clause are
detected, preventing the unsafe pushdown.
We do not need to recursively examine the Vars contained in these
underlying expressions. Even if they reference outputs from
lower-level subqueries (at any depth), those references are guaranteed
not to expand to volatile or set-returning functions, because
subqueries containing such functions in their targetlists are never
pulled up.
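A hedged sketch of the check; contain_volatile_functions() and
expression_returns_set() are existing planner helpers, but the
surrounding code here is illustrative rather than the committed patch:

    /* expr is the grouping Var's underlying grouping-clause expression */
    if (contain_volatile_functions(expr) ||
        expression_returns_set(expr))
        mark_output_column_unsafe(safetyInfo, tle->resno);  /* hypothetical */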
Backpatch to v18, where this issue was introduced.
Reported-by: Eric Ridge <eebbrr@gmail.com>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/7900964C-F99E-481E-BEE5-4338774CEB9F@gmail.com
Backpatch-through: 18
This is pretty pro-forma for our purposes, as the only change
is a historical correction for pre-1976 DST laws in
Baja California. (Upstream made this release mostly to update
their leap-second data, which we don't use.) But with minor
releases coming up, we should be up-to-date.
Backpatch-through: 14
The code adding the WAL information included in a backup manifest is
cross-checked with the contents of the timeline history file of the end
timeline. When a check based on the end timeline failed, the error
message reported the value of the start timeline instead. This is
fixed so that the report shows the correct timeline number.
If seen, this error report would confuse users by providing incorrect
information, so backpatch all the way down.
Oversight in 0d8c9c1210.
Author: Man Zeng <zengman@halodbtech.com>
Discussion: https://postgr.es/m/tencent_0F2949C4594556F672CF4658@qq.com
Backpatch-through: 14
An assertion is used in this routine to check that a valid namespace OID
is given by the caller, but it was checked redundantly: once at the
top of the routine and again in each branch of a switch/case. This
commit removes the assertions within the switch/case.
Thinko in commit 765cbfdc92.
Author: Man Zeng <zengman@halodbtech.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/tencent_40F8C1D82E2EE28065009AAA@qq.com
The function is part of the injection_points test module and only used
in tests. None of the current tests call it with a NULL argument, but
it is supposed to work.
Backpatch-through: 17
If auto-analyze kicks in at just the right moment, it can hold a
snapshot and prevent the VACUUM command in the test from removing the
deleted tuples. The test needs the tuples to be removed, otherwise no
half-dead page is generated. To fix, introduce a helper procedure to
wait for the removable cutoff to advance, like the one used in the
syscache-update-pruned test for similar purposes.
Thanks to Alexander Lakhin for reproducing and analyzing the test
failure, and Tom Lane for the report.
Discussion: https://www.postgresql.org/message-id/307198.1767408023@sss.pgh.pa.us
Some compilers, e.g. gcc with -Og or -O1, warn about the wait_event in
BufferLockAcquire() possibly being uninitialized. That can't actually happen,
as the switch() covers all legal lock mode values, but we still need to
silence the warning. We could add a default:, but we'd like to get a warning
if we were to get a new lock mode in the future. So just initialize
wait_event to 0.
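The pattern looks roughly like this (the wait-event names are made up
for the sketch):

    uint32      wait_event = 0;     /* silence -Og/-O1 maybe-uninitialized */

    switch (mode)
    {
        case BUFFER_LOCK_SHARE:
            wait_event = WAIT_EVENT_BUFFER_SHARE;
            break;
        case BUFFER_LOCK_EXCLUSIVE:
            wait_event = WAIT_EVENT_BUFFER_EXCLUSIVE;
            break;
            /* deliberately no default: new modes should warn here */
    }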
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/934395.1768518154@sss.pgh.pa.us
Issue fat-fingered in d756fa1019, noticed while doing more review of
the main patch set proposed. I had missed the fact that this can be
triggered by specifying an extended stats object that does not match
the relation specified and already locked. Like the cases where
an object defined in input is missing, the code is changed to issue a
WARNING instead of a confusing cache lookup failure.
A regression test is added to cover this case.
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
Commit cbc127917e introduced tracking of unpruned relids to skip
processing of pruned partitions. PlannedStmt.unprunableRelids is
computed as the difference between PlannerGlobal.allRelids and
prunableRelids, but allRelids only contains RTE_RELATION entries.
This means non-relation RTEs (VALUES, subqueries, CTEs, etc.) are
never included in unprunableRelids, and consequently not in
es_unpruned_relids at runtime.
As a result, rowmarks attached to non-relation RTEs were incorrectly
skipped during executor initialization. This affects any DML statement
that has rowmarks on such RTEs, including MERGE with a VALUES or
subquery source, and UPDATE/DELETE with joins against subqueries or
CTEs. When a concurrent update triggers an EPQ recheck, the missing
rowmark leads to incorrect results.
Fix by restricting the es_unpruned_relids membership check to
RTE_RELATION entries only, since partition pruning only applies to
actual relations. Rowmarks for other RTE kinds are now always
processed.
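A sketch of the fix's shape (illustrative; the field names follow the
executor structures mentioned above):

    /* Only relation RTEs can be pruned; all other rowmarks are processed. */
    if (rte->rtekind == RTE_RELATION &&
        !bms_is_member(rc->rti, estate->es_unpruned_relids))
        continue;       /* pruned partition: skip this rowmark */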
Bug: #19355
Reported-by: Bihua Wang <wangbihua.cn@gmail.com>
Diagnosed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/19355-57d7d52ea4980dc6@postgresql.org
Backpatch-through: 18
If a FATAL error occurs while holding a lock in a DSM segment (such
as a dshash lock) and the process is not in a transaction, a
segmentation fault can occur during process exit.
The problem sequence is:
1. Process acquires a lock in a DSM segment (e.g., via dshash)
2. FATAL error occurs outside transaction context
3. proc_exit() begins, calling before_shmem_exit callbacks
4. dsm_backend_shutdown() detaches all DSM segments
5. Later, on_shmem_exit callbacks run
6. ProcKill() calls LWLockReleaseAll()
7. Segfault: the lock being released is in unmapped memory
This only manifests outside transaction contexts because
AbortTransaction() calls LWLockReleaseAll() during transaction
abort, releasing locks before DSM cleanup. Background workers and
other non-transactional code paths are vulnerable.
Fix by calling LWLockReleaseAll() unconditionally at the start of
shmem_exit(), before any callbacks run. Releasing locks before
callbacks prevents the segfault - locks must be released before
dsm_backend_shutdown() detaches their memory. This is safe because
after an error, held locks are protecting potentially inconsistent
data anyway, and callbacks can acquire fresh locks if needed.
Also add a comment noting that LWLockReleaseAll() must be safe to
call before LWLock initialization (which it is, since
num_held_lwlocks will be 0), plus an Assert for the post-condition.
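In outline, the resulting ordering in shmem_exit() is (sketch, not the
exact code):

    void
    shmem_exit(int code)
    {
        /*
         * Release lwlocks first: callbacks below may unmap DSM segments
         * containing locks we still hold.  Safe even before LWLock
         * initialization, since num_held_lwlocks is then 0.
         */
        LWLockReleaseAll();

        /* ... before_shmem_exit callbacks ... */
        /* ... dsm_backend_shutdown() detaches DSM segments ... */
        /* ... on_shmem_exit callbacks ... */
    }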
This fix aligns with the original design intent from commit
001a573a2, which noted that backends must clean up shared memory
state (including releasing lwlocks) before unmapping dynamic shared
memory segments.
Reported-by: Rahila Syed <rahilasyed90@gmail.com>
Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAH2L28uSvyiosL+kaic9249jRVoQiQF6JOnaCitKFq=xiFzX3g@mail.gmail.com
Backpatch-through: 14
Commit 1e2fddfa33 changed OutputFsync() so that it always returns true.
However, pg_recvlogical.c still contained checks of its boolean return
value, which are now redundant.
This commit removes those checks and changes the return type of
OutputFsync() to void, simplifying the code.
Suggested-by: Yilin Zhang <jiezhilove@126.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwFeTymZQ7RLvMU6WuDGar8bUQCazg=VOfA-9GeBkg-FzA@mail.gmail.com
This commit adds a test to verify that data already received and flushed by
pg_recvlogical is not streamed again even after the connection is lost,
reestablished, and logical replication is restarted.
Author: Mircea Cadariu <cadariu.mircea@gmail.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwFeTymZQ7RLvMU6WuDGar8bUQCazg=VOfA-9GeBkg-FzA@mail.gmail.com
wait_for_file() waits for the contents of a specified file, starting at an
optional offset, to match a given regular expression. If no offset is
provided, the entire file is checked. The function times out after
$PostgreSQL::Test::Utils::timeout_default seconds. It returns the total
file length on success.
The existing wait_for_log() function contains almost identical logic, but
is limited to reading the cluster's log file. This commit also refactors
wait_for_log() to call wait_for_file() instead, avoiding code duplication.
This helper will be used by upcoming changes.
Suggested-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwFeTymZQ7RLvMU6WuDGar8bUQCazg=VOfA-9GeBkg-FzA@mail.gmail.com
Previously, when pg_recvlogical lost connection, reconnected, and restarted
replication, data that had already been flushed could be streamed again.
This happened because the replication start position used when restarting
replication was taken from the last standby status message, which could be
older than the position of the last flushed data. As a result, some flushed
data newer than the replication start position could exist and be re-sent.
This commit fixes the issue by ensuring all written data is flushed to disk
before restarting replication, and by using the last flushed position as
the replication start point. This prevents already flushed data from being
re-sent.
Additionally, previously when the --no-loop option was used, pg_recvlogical
could exit without flushing written data, potentially losing data. To fix
this issue, this commit also ensures all data is flushed to disk before
exiting due to --no-loop.
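A sketch of the restart logic after this change (output_fsync_lsn is
used here as an illustrative name for the last flushed position):

    /* before reconnecting, flush everything written so far */
    OutputFsync(feGetCurrentTimestamp());

    /* restart replication from the flushed position, not from the
     * position in the last standby status message */
    startpos = output_fsync_lsn;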
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Yilin Zhang <jiezhilove@126.com>
Reviewed-by: Dewei Dai <daidewei1970@163.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwFeTymZQ7RLvMU6WuDGar8bUQCazg=VOfA-9GeBkg-FzA@mail.gmail.com
The test introduced in 639352d904 has added a direct pg_ctl command to
start a node, a method that is incompatible with the teardown() routine
used at the end of the test, as the PID saved in the Cluster object would
prevent the node from being shut down. This can ultimately prevent the
test from performing its cleanup, failing on timeout.
Like pg_ctl's 001_start_stop or ssl_passphrase_callback's 001_testfunc,
this commit changes the test so a direct pg_ctl command is used to stop
the rogue node. That should hopefully be enough to cool down the
buildfarm.
Per report from buildfarm member fairywren, which is the only animal
that is showing this issue.
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/TY7PR01MB1455452AE9053DD2B77B74FEAF58CA@TY7PR01MB14554.jpnprd01.prod.outlook.com
This function clears the data associated with an extended statistics
object, making the object look as if it were newly created.
The caller of this function needs the following arguments for the
extended stats to clear:
- The name of the relation.
- The schema name of the relation.
- The name of the extended stats object.
- The schema name of the extended stats object.
- Whether the stats are inherited or not.
The first two parameters are especially important to ensure a consistent
lookup and ACL checks for the relation on which the extended stats
object to be cleared is based, relying first on a RangeVar lookup
where permissions are checked without locking the relation, which is
critical to prevent denial-of-service attacks when using this kind of
function (see also 688dc6299a for a similar concern). The third to
fifth arguments give a way to target the extended stats records to clear.
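As an illustration of the lookup pattern (RangeVarGetRelidExtended() is
the real API; the callback named here is one plausible choice, not
necessarily the committed one):

    /* permissions are checked by the callback before any lock is taken */
    relid = RangeVarGetRelidExtended(relrv, ShareUpdateExclusiveLock, 0,
                                     RangeVarCallbackOwnsRelation, NULL);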
This has been extracted from a larger patch by the same author, for a
piece which is again useful on its own. I have rewritten large portions
of it. The tests have been extended while discussing this piece,
resulting in what this commit includes. The intention behind this
feature is to add support for the import of extended statistics across
dumps and upgrades, this change building one piece that we will be able
to rely on for the rest of the changes.
Bump catalog version.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
This reverts commit f8d7f29b3e, plus parts of
subsequent commits fixing a typo in a parameter name.
Support for disowned lwlocks was added for the benefit of AIO, to be able to
have content locks "owned" by the AIO subsystem. But as of commit fcb9c977aa,
content locks do not use lwlocks anymore.
It does not seem particularly likely that we need this facility outside of the
AIO use case; therefore, remove the now-unused functions.
I did choose to keep the comment added in the aforementioned commit about
lock->owner intentionally being left pointing to the last owner.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/cj5mcjdpucvw4a54hehslr3ctukavrbnxltvuzzhqnimvpju5e@cy3g3mnsefwz
As of commit fcb9c977aa, ForEachLWLockHeldByMe(), introduced in f4ece891fc,
is not used anymore, as content locks are now implemented in bufmgr.c. It
doesn't seem that likely that a new user of the functionality will appear all
that soon, making removal of the function seem like the most sensible path. It
can easily be added back if necessary.
Discussion: https://postgr.es/m/lneuyxqxamqoayd2ntau3lqjblzdckw6tjgeu4574ezwh4tzlg%40noioxkquezdw
Until now buffer content locks were implemented using lwlocks. That has the
obvious advantage of not needing a separate efficient implementation of
locks. However, the time for a dedicated buffer content lock implementation
has come:
1) Hint bits are currently set while holding only a share lock. This leads to
having to copy pages while they are being written out if checksums are
enabled, which is not cheap. We would like to add AIO writes, however once
many buffers can be written out at the same time, it gets a lot more
expensive to copy them, particularly because that copy needs to reside in
shared buffers (for worker mode to have access to the buffer).
In addition, modifying buffers while they are being written out can cause
issues with unbuffered/direct-IO, as some filesystems (like btrfs) do not
like that, due to filesystem internal checksums getting corrupted.
The solution to this is to require a new share-exclusive lock-level to set
hint bits and to write out buffers, making those operations mutually
exclusive. We could introduce such a lock-level into the generic lwlock
implementation; however, it does not look like there would be other users,
and it does add some overhead into important code paths.
2) For AIO writes we need to be able to race-freely check whether a buffer is
undergoing IO and whether an exclusive lock on the page can be acquired. That
is rather hard to do efficiently when the buffer state and the lock state
are separate atomic variables. This is a major hindrance to allowing writes
to be done asynchronously.
3) Buffer locks are by far the most frequently taken locks. Optimizing them
specifically for their use case is worth the effort. E.g. by merging
content locks into buffer locks we will be able to release a buffer lock
and pin in one atomic operation.
4) There are more complicated optimizations, like long-lived "super pinned &
locked" pages, that cannot realistically be implemented with the generic
lwlock implementation.
Therefore implement content locks inside bufmgr.c. The lock state is stored as
part of BufferDesc.state. The implementation of buffer content locks is fairly
similar to lwlocks, with a few important differences:
1) An additional lock-level share-exclusive has been added. This lock-level
conflicts with exclusive locks and itself, but not share locks.
2) Error recovery for content locks is implemented as part of the already
existing private-refcount tracking mechanism in combination with resowners,
instead of a bespoke mechanism, as is the case for lwlocks. This means we do
not need to add dedicated error-recovery code paths to release all content
locks (like done with LWLockReleaseAll() for lwlocks).
3) The lock state is embedded in BufferDesc.state instead of having its own
struct.
4) The wakeup logic is a tad more complicated due to needing to support the
additional lock-level.
This commit unfortunately introduces some code that is very similar to the
code in lwlock.c; however, the code is not equivalent enough to easily merge
it. The future wins that this commit makes possible seem worth the cost.
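To make the idea concrete, here is a heavily simplified sketch of how a
lock embedded in the buffer state word can be taken with a single
compare-and-exchange (the bit assignments are invented for the example):

    #define BUF_LOCK_SHARE_ONE      1           /* one share-lock holder */
    #define BUF_LOCK_EXCLUSIVE      (1U << 20)  /* exclusive held */

    /* try to take a share lock; conflicts only with exclusive */
    uint32      old = pg_atomic_read_u32(&buf->state);

    if (!(old & BUF_LOCK_EXCLUSIVE) &&
        pg_atomic_compare_exchange_u32(&buf->state, &old,
                                       old + BUF_LOCK_SHARE_ONE))
        /* lock acquired, in the same word as the pin count and flags */ ;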
As of this commit nothing uses the new share-exclusive lock mode. It will be
used in a future commit. It seemed too complicated to introduce the lock-level
in a separate commit.
It's worth calling out one wart in this commit: Despite content locks not
being lwlocks anymore, they continue to use PGPROC->lw* - that seemed better
than duplicating the relevant infrastructure.
Another thing worth pointing out is that, after this change, content locks are
not reported as LWLock wait events anymore, but as new wait events in the
"Buffer" wait event class (see also 6c5c393b74). The old BufferContent lwlock
tranche has been removed.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
This is motivated by wanting to merge buffer content locks into
BufferDesc.state in a future commit, rather than having a separate lwlock (see
commit c75ebc657f for more details). As this change is rather mechanical, it
seems to make sense to split it out into a separate commit, for easier review.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
This patch reworks LISTEN/NOTIFY to avoid waking backends that have
no need to process the notification messages we just sent.
The primary change is to create a shared hash table that tracks
which processes are listening to which channels (where a "channel" is
defined by a database OID and channel name). This allows a notifying
process to accurately determine which listeners are interested,
replacing the previous weak approximation that listeners in other
databases couldn't be interested.
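A sketch of the shared hash key described above (the struct name and
exact layout are illustrative):

    typedef struct ListenChannelKey
    {
        Oid     dboid;                      /* database OID */
        char    channel[NAMEDATALEN];       /* channel name */
    } ListenChannelKey;

    /* hash value: the set of backends listening on this channel */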
Secondly, if a listener is known not to be interested and is
currently stopped at the old queue head, we avoid waking it at all
and just directly advance its queue pointer past the notifications
we inserted.
These changes permit very significant improvements (integer multiples)
in NOTIFY throughput, as well as a noticeable reduction in latency,
when there are many listeners but only a few are interested in any
specific message. There is no improvement for the simplest case where
every listener reads every message, but any loss seems below the noise
level.
Author: Joel Jacobson <joel@compiler.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/6899c044-4a82-49be-8117-e6f669765f7e@app.fastmail.com
On restart, a replica can fail with an error like 'unexpected data
beyond EOF in block 200 of relation T/D/R'. These are the steps to
reproduce it:
- A relation has a size of 400 blocks.
- Blocks 201 to 400 are empty.
- Block 200 has two rows.
- Blocks 100 to 199 are empty.
- A restartpoint is done
- Vacuum truncates the relation to 200 blocks
- An FPW deletes a row in block 200
- A checkpoint is done
- An FPW deletes the last row in block 200
- Vacuum truncates the relation to 100 blocks
- The replica restarts
When the replica restarts:
- The relation on disk starts at 100 blocks, because all the
truncations were applied before restart.
- The first truncate to 200 blocks is replayed. It silently fails, but
it will still (incorrectly!) update the cache size to 200 blocks
- The first FPW on block 200 is applied. XLogReadBufferForRedo relies
on the cached size and incorrectly assumes that the page already
exists in the file, and thus won't extend the relation.
- The online checkpoint record is replayed, calling smgrdestroyall,
which causes the cached size to be discarded
- The second FPW on block 200 is applied. This time, the detected size
is 100 blocks, an extend is attempted. However, the block 200 is
already present in the buffer cache due to the first FPW. This
triggers the 'unexpected data beyond EOF'.
To fix, update the cached size in SmgrRelation with the current size
rather than the requested new size, when the requested new size is
greater.
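Expressed as code, the fix is roughly (variable names illustrative; the
clamp itself is the point):

    /* nblocks: requested new size; curnblk: size actually on disk */
    reln->smgr_cached_nblocks[forknum] = Min(nblocks, curnblk);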
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://www.postgresql.org/message-id/CAO6_Xqrv-snNJNhbj1KjQmWiWHX3nYGDgAc=vxaZP3qc4g1Siw@mail.gmail.com
Backpatch-through: 14
Liujinyang reported the one in binaryheap.c; I then found and analyzed
the rest.
For future patches, we require git archaeological analysis before we
accept patches of this nature.
Co-authored-by: liujinyang <21043272@qq.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/tencent_6B302BFCAF6F010E00AB5C2C0ECB7AA3F205@qq.com
We called io_uring_cqe_seen(..., cqe) before reading cqe->res. That allows the
completion to be reused, which in turn could lead to cqe->res being
overwritten. The window for that is very narrow and the likelihood of it
happening is very low, as we should never actually utilize all CQEs, but the
consequences would be bad.
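The safe ordering is simply to copy the result out first (sketch;
'ring' is the io_uring instance in use):

    int     res = cqe->res;         /* read before releasing the CQE */

    io_uring_cqe_seen(ring, cqe);   /* only now may the slot be reused */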
This bug was reported to me privately.
Backpatch-through: 18
Discussion: https://postgr.es/m/bwo3e5lj2dgi2wzq4yvbyzu7nmwueczvvzioqsqo6azu6lm5oy@pbx75g2ach3p
When an autovacuum worker exits, the launcher needs to be notified
with SIGUSR2, so that it can rebalance and possibly launch a new
worker. The launcher must be notified only after the worker has
finished ProcKill(), so that the worker slot is available for a new
worker. Before this commit, the autovacuum worker was responsible for
that, which required a slightly complicated dance to pass the
launcher's PID from FreeWorkerInfo() to ProcKill() in a global
variable.
Simplify that by moving the responsibility of the signaling to the
postmaster. The postmaster was already doing it when it failed to fork
a worker process, so it seems logical to make it responsible for
notifying the launcher on worker exit too. That's also how the
notification on background worker exit is done.
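A sketch of the postmaster-side notification (the launcher-PID variable
name is illustrative):

    /* in the postmaster, after reaping an autovacuum worker */
    if (AutoVacLauncherPid != 0)
        kill(AutoVacLauncherPid, SIGUSR2);  /* rebalance, maybe launch anew */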
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: li carol <carol.li2025@outlook.com>
Discussion: https://www.postgresql.org/message-id/a5e27d25-c7e7-45d5-9bac-a17c8f462def@iki.fi
If a multixid with zero offset is left behind after a crash, and that
multixid later becomes the oldest multixid, truncation might try to
look up its offset and read the zero value. In the worst case, we
might incorrectly use the zero offset to truncate valid SLRU segments
that are still needed. I'm not sure if that can happen in practice, or
if there are some other lower-level safeguards or incidental reasons
that prevent the caller from passing an unwritten multixid as the
oldest multi. But better safe than sorry, so let's add an explicit
check for it.
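The guard amounts to something like this (sketch; the offset-lookup
helper is hypothetical):

    if (read_multixact_offset(oldestMulti) == 0)
        ereport(ERROR,
                (errmsg("multixact %u has invalid zero offset, refusing to truncate",
                        oldestMulti)));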
In stable branches, we should perhaps do the same check for
'oldestOffset', i.e. the offset of the old oldest multixid (in master,
'oldestOffset' is gone). But if the old oldest multixid has an invalid
offset, the damage has been done already, and we would never advance
past that point. It's not clear what we should do in that case. The
check that this commit adds will prevent such a multixid with invalid
offset from becoming the oldest multixid in the first place, which
seems enough for now.
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://www.postgresql.org/message-id/000301b2-5b81-4938-bdac-90f6eb660843@iki.fi
Backpatch-through: 14
With 64-bit multixact offsets, PerformMembersTruncation() doesn't need
the starting offset anymore. The 'oldestOffset' value that
TruncateMultiXact() calculates is no longer used for anything. Remove
it, and the code to calculate it.
'oldestOffset' was included in the WAL record as 'startTruncMemb',
which sounds nice if you e.g. look at the WAL with pg_waldump, but it
was also confusing because we didn't actually use the value for
determining what to truncate. Replaying the WAL would remove all
segments older than 'endTruncMemb', regardless of
'startTruncMemb'. The 'startTruncOff' stored in the WAL record was
similarly unnecessary even before 64-bit multixid offsets; it was
stored just for the sake of symmetry with 'startTruncMemb'. Remove
both from the WAL record, and rename the remaining 'endTruncOff' to
'oldestMulti' and 'endTruncMemb' to 'oldestOffset', for consistency
with the variable names used for them in other places.
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://www.postgresql.org/message-id/000301b2-5b81-4938-bdac-90f6eb660843@iki.fi
The initialization of PL/Python (the Python interpreter, the global
state, the plpy module) was arranged confusingly across different
functions with unclear and confusing boundaries. For example,
PLy_init_interp() said "Initialize the Python interpreter ..." but it
didn't actually do this, and PLy_init_plpy() said "initialize plpy
module" but it didn't do that either. After this change, all the
global initialization is called directly from _PG_init(), and the plpy
module initialization is all called from its registered initialization
function PyInit_plpy().
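In outline, the arrangement now looks like this (simplified; the real
code does more):

    /* module init is registered with Python and runs in PyInit_plpy() */
    void
    _PG_init(void)
    {
        PyImport_AppendInittab("plpy", PyInit_plpy);
        Py_Initialize();
        /* ... remaining global initialization ... */
    }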
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: li carol <carol.li2025@outlook.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/f31333f1-fbb7-4098-b209-bf2d71fbd4f3%40eisentraut.org
This seems to have existed like this since Python 3 support was
added (commit dd4cd55c15), but it's unclear what this second call is
supposed to accomplish.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: li carol <carol.li2025@outlook.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/f31333f1-fbb7-4098-b209-bf2d71fbd4f3%40eisentraut.org
The comments "PyModule_AddObject does not add a refcount to the
object, for some odd reason" seem distracting. Arguably, this
behavior is expected, not odd. Also, the additional references
created by the existing code are apparently not necessary. But we
should clean up the reference in the error case, as suggested by the
Python documentation.
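The cleanup pattern from the Python documentation looks like this
(illustrative values):

    PyObject   *obj = PyLong_FromLong(42);

    if (obj == NULL)
        return NULL;
    if (PyModule_AddObject(module, "answer", obj) < 0)
    {
        Py_DECREF(obj);     /* the reference is only stolen on success */
        return NULL;
    }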
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: li carol <carol.li2025@outlook.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/f31333f1-fbb7-4098-b209-bf2d71fbd4f3%40eisentraut.org
This code has been commented out since the first commit of plpython.
It doesn't seem worth keeping.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: li carol <carol.li2025@outlook.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/f31333f1-fbb7-4098-b209-bf2d71fbd4f3%40eisentraut.org
These routines are useful to perform some basic validation checks on
each object structure; they currently check the attribute numbers for
non-expression and expression attnums. These checks could be extended
in the future.
Note that this code is not used yet in the tree, and that these
functions will come in handy for an upcoming patch for the import of
extended statistics data. However, they are worth their own independent
change as they are actually useful by themselves, with at least the
extension code argument in mind (or perhaps I am just feeling more
pedantic today).
Extracted from a larger patch by the same author, with many adjustments
and fixes by me.
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
In CreateWorkExprContext(), maxBlockSize is initialized to
ALLOCSET_DEFAULT_MAXSIZE, and it is then immediately reassigned, so
the initialization is redundant.
Author: Andreas Karlsson <andreas@proxel.se>
Reported-by: Chao Li <lic@highgo.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/83a14f3c-f347-4769-9c01-30030b31f1eb@gmail.com
This reverts commit f0f2c0c1ae.
The original problem that led to the use of pg_restrict was that MSVC
couldn't handle plain restrict, and defining it to something else
would conflict with its __declspec(restrict) that is used in system
header files. In C11 mode, this is no longer a problem, as MSVC
handles plain restrict. This led to the commit to replace pg_restrict
with restrict. But this did not take C++ into account. Standard C++
does not have restrict, so we defined it as something else (for
example, MSVC supports __restrict). But this then again conflicts
with __declspec(restrict) in system header files. So we have to
revert this attempt. The comments are updated to clarify that the
reason for this is now C++ only.
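The resulting arrangement is roughly (simplified from the real header):

    #if defined(__cplusplus)
    /* C++ has no standard restrict; use the compiler extension, but keep
     * the pg_restrict spelling so MSVC's __declspec(restrict) in system
     * headers is unaffected. */
    #define pg_restrict __restrict
    #else
    #define pg_restrict restrict        /* C11 keyword, fine even on MSVC */
    #endif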
Reported-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/CAGECzQRoD7chJP1-dneSrhxUJv%2BBRcigoGOO4UwGzaShLot2Yw%40mail.gmail.com
Previously, the Python Limited API was disabled on MSVC due to build
failures caused by Meson not knowing to link against python3.lib
instead of python3XX.lib when using the Limited API.
This commit works around the Meson limitation by explicitly finding
and linking against python3.lib on MSVC, and removes the preprocessor
guard that was disabling the Limited API on MSVC in plpython.h.
This requires python3.lib to be present in the Python installation,
which is the case for standard installations.
Author: Bryan Green <dbryan.green@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/ee410de1-1e0b-4770-b125-eeefd4726a24%40eisentraut.org
This patch switches some code paths to use GetDatum() macros more in
line with the data types of the variables they manipulate. This set of
changes does not fix a problem, but it is always nice to be more
consistent across the board.
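For example (illustrative, not a specific changed call site):

    int32       count = 0;
    Oid         relid = InvalidOid;
    Datum       values[2];

    values[0] = Int32GetDatum(count);       /* int32 -> Int32GetDatum() */
    values[1] = ObjectIdGetDatum(relid);    /* Oid   -> ObjectIdGetDatum() */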
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Roman Khapov <rkhapov@yandex-team.ru>
Reviewed-by: Yuan Li <carol.li2025@outlook.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Man Zeng <zengman@halodbtech.com>
Discussion: https://postgr.es/m/CALdSSPidtC7j3MwhkqRj0K2hyp36ztnnjSt6qzGxQtiePR1dzw@mail.gmail.com
Commit 5b148706c5 exposed functionality that allows multiple processes to
use the same replication origin, enabling non-builtin logical replication
solutions to implement parallel apply for large transactions.
With this functionality, if two backends acquire the same replication
origin and one of them resets it first, the acquired_by flag is cleared
without acknowledging that another backend is still actively using the
origin. This can lead to the origin being unintentionally dropped. If the
shared memory for that dropped origin is later reused for a newly created
origin, the remaining backend that still holds a pointer to the old memory
may inadvertently advance the LSN of a completely different origin,
causing unpredictable behavior.
Although the underlying issue predates commit 5b148706c5, it did not
surface earlier because the internal parallel apply worker mechanism
correctly coordinated origin resets and drops.
This commit resolves the problem by introducing a reference counter for
replication origins. The reference count increases when a backend sets the
origin and decreases when it resets it. Additionally, the backend that
first acquires the origin will not release it until all other backends
using the origin have released it as well.
The patch also prevents dropping a replication origin when acquired_by is
zero but the reference counter is nonzero, covering the scenario where the
first session exits without properly releasing the origin.
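A sketch of the rules this introduces (field names illustrative):

    /* setting the origin: */
    state->refcount++;

    /* resetting it: */
    if (--state->refcount == 0)
        state->acquired_by = 0;     /* last user frees the slot */

    /* dropping it: */
    if (state->refcount != 0)
        ereport(ERROR,
                (errmsg("replication origin is still in use")));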
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/TY4PR01MB169077EE72ABE9E55BAF162D494B5A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/CAMPB6wfe4zLjJL8jiZV5kjjpwBM2=rTRme0UCL7Ra4L8MTVdOg@mail.gmail.com