postgres

Commit Graph

Author	SHA1	Message	Date
Amit Kapila	f2818868ae	Fix LOCK_TIMEOUT handling in slotsync worker. Previously, the slotsync worker relied on SIGINT for graceful shutdown during promotion. However, SIGINT is also used by the LOCK_TIMEOUT handler to cancel queries. Since the slotsync worker can lock catalog tables while parsing libpq tuples, this overlap caused it to ignore LOCK_TIMEOUT signals and potentially wait indefinitely on locks. This patch replaces the slotsync worker's SIGINT handler with StatementCancelHandler to correctly process query-cancel interrupts. Additionally, the startup process now uses SIGUSR1 to signal the slotsync worker to stop during promotion. The worker exits after detecting that the shared memory flag stopSignaled is set. Author: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 17, here it was introduced Discussion: https://postgr.es/m/TY4PR01MB169078F33846E9568412D878C94A2A@TY4PR01MB16907.jpnprd01.prod.outlook.com	1 week ago
Michael Paquier	f3fb6bc9fe	Fix regression with slot invalidation checks This commit reverts `818fefd8fd`, that has been introduced to address a an instability in some of the TAP tests due to the presence of random standby snapshot WAL records, when slots are invalidated by InvalidatePossiblyObsoleteSlot(). Anyway, this commit had also the consequence of introducing a behavior regression. After `818fefd8fd`, the code may determine that a slot needs to be invalidated while it may not require one: the slot may have moved from a conflicting state to a non-conflicting state between the moment when the mutex is released and the moment when we recheck the slot, in InvalidatePossiblyObsoleteSlot(). Hence, the invalidations may be more aggressive than they actually have to. `105b2cb336` has tackled the test instability in a way that should be hopefully sufficient for the buildfarm, even for slow members: - In v18, the test relies on an injection point that bypasses the creation of the random records generated for standby snapshots, eliminating the random factor that impacted the test. This option was not available when `818fefd8fd` was discussed. - In v16 and v17, the problem was bypassed by disallowing a slot to become active in some of the scenarios tested. While on it, this commit adds a comment to document that it is fine for a recheck to use xmin and LSN values stored in the slot, without storing and reusing them across multiple checks. Reported-by: "suyu.cmj" <mengjuan.cmj@alibaba-inc.com> Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/f492465f-657e-49af-8317-987460cb68b0.mengjuan.cmj@alibaba-inc.com Backpatch-through: 16	2 months ago
Amit Kapila	0024f5a102	Fix GUC check_hook validation for synchronized_standby_slots. Previously, the check_hook for synchronized_standby_slots attempted to validate that each specified slot existed and was physical. However, these checks were not performed during server startup. As a result, if users configured non-existent slots before startup, the misconfiguration would go undetected initially. This could later cause parallel query failures, as newly launched workers would detect the issue and raise an ERROR. This patch improves the check_hook by validating the syntax and format of slot names. Validation of slot existence and type is deferred to the WAL sender process, aligning with the behavior of the check_hook for primary_slot_name. Reported-by: Fabrice Chapuis <fabrice636861@gmail.com> Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com> Reviewed-by: Rahila Syed <rahilasyed90@gmail.com> Backpatch-through: 17, where it was introduced Discussion: https://postgr.es/m/CAA5-nLCeO4MQzWipCXH58qf0arruiw0OeUc1+Q=Z=4GM+=v1NQ@mail.gmail.com	2 months ago
Fujii Masao	8ceab82ca5	Add comments explaining overflow entries in the replication lag tracker. Commit `883a95646a` introduced overflow entries in the replication lag tracker to fix an issue where lag columns in pg_stat_replication could stall when the replay LSN stopped advancing. This commit adds comments clarifying the purpose and behavior of overflow entries to improve code readability and understanding. Since commit `883a95646a` was recently applied and backpatched to all supported branches, this follow-up commit is also backpatched accordingly. Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/CABPTF7VxqQA_DePxyZ7Y8V+ErYyXkmwJ1P6NC+YC+cvxMipWKw@mail.gmail.com Backpatch-through: 13	2 months ago
Fujii Masao	1db2870bb5	Make invalid primary_slot_name follow standard GUC error reporting. Previously, if primary_slot_name was set to an invalid slot name and the configuration file was reloaded, both the postmaster and all other backend processes reported a WARNING. With many processes running, this could produce a flood of duplicate messages. The problem was that the GUC check hook for primary_slot_name reported errors at WARNING level via ereport(). This commit changes the check hook to use GUC_check_errdetail() and GUC_check_errhint() for error reporting. As with other GUC parameters, this causes non-postmaster processes to log the message at DEBUG3, so by default, only the postmaster's message appears in the log file. Backpatch to all supported versions. Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Chao Li <lic@highgo.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Discussion: https://postgr.es/m/CAHGQGwFud-cvthCTfusBfKHBS6Jj6kdAPTdLWKvP2qjUX6L_wA@mail.gmail.com Backpatch-through: 13	2 months ago
Fujii Masao	62d5ee75bb	Fix stalled lag columns in pg_stat_replication when replay LSN stops advancing. Previously, when the replay LSN reported in feedback messages from a standby stopped advancing, for example, due to a recovery conflict, the write_lag and flush_lag columns in pg_stat_replication would initially update but then stop progressing. This prevented users from correctly monitoring replication lag. The problem occurred because when any LSN stopped updating, the lag tracker's cyclic buffer became full (the write head reached the slowest read head). In that state, the lag tracker could no longer compute round-trip lag values correctly. This commit fixes the issue by handling the slowest read entry (the one causing the buffer to fill) as a separate overflow entry and freeing space so the write and other read heads can continue advancing in the buffer. As a result, write_lag and flush_lag now continue updating even if the reported replay LSN remains stalled. Backpatch to all supported versions. Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Chao Li <lic@highgo.com> Reviewed-by: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/CAHGQGwGdGQ=1-X-71Caee-LREBUXSzyohkoQJd4yZZCMt24C0g@mail.gmail.com Backpatch-through: 13	2 months ago
Michael Paquier	42348839d8	Remove state.tmp when failing to save a replication slot An error happening while a slot data is saved on disk in SaveSlotToPath() could cause a state.tmp file (temporary file holding the slot state data, renamed to its permanent name at the end of the function) to remain around after it has been created. This temporary file is created with O_EXCL, meaning that if an existing state.tmp is found, its creation would fail. This would prevent the slot data to be saved, requiring a manual intervention to remove state.tmp before being able to save again a slot. Possible scenarios where this temporary file could remain on disk is for example a ENOSPC case (no disk space) while writing, syncing or renaming it. The bug reports point to a write failure as the principal cause of the problems. Using O_TRUNC has been argued back in 2019 as a potential solution to discard any temporary file that could exist. This solution was rejected as O_EXCL can also act as a safety measure when saving the slot state, crash recovery offering cleanup guarantees post-crash. This commit uses the alternative approach that has been suggested by Andres Freund back in 2019. When the temporary state file cannot be written, synced, closed or renamed (note: not when created!), an unlink() is used to remove the temporary state file while holding the in-progress I/O LWLock, so as any follow-up attempts to save a slot's data would not choke on an existing file that remained around because of a previous failure. This problem has been reported a few times across the years, going back to 2019, but for some reason I have never come back to do something about it and it has been forgotten. A recent report has reminded me that this was still a problem. Reported-by: Kevin K Biju <kevinkbiju@gmail.com> Reported-by: Sergei Kornilov <sk@zsrv.org> Reported-by: Grigory Smolkin <g.smolkin@postgrespro.ru> Discussion: https://postgr.es/m/CAM45KeHa32soKL_G8Vk38CWvTBeOOXcsxAPAs7Jt7yPRf2mbVA@mail.gmail.com Discussion: https://postgr.es/m/3559061693910326@qy4q4a6esb2lebnz.sas.yp-c.yandex.net Discussion: https://postgr.es/m/08bbfab1-a61d-3750-fc18-4ab2c1aa7f09@postgrespro.ru Backpatch-through: 13	2 months ago
Masahiko Sawada	a61592253e	Fix access-to-already-freed-memory issue in pgoutput. While pgoutput caches relation synchronization information in RelationSyncCache that resides in CacheMemoryContext, each entry's information (such as row filter expressions and column lists) is stored in the entry's private memory context (entry_cxt in RelationSyncEntry), which is a descendant memory context of the decoding context. If a logical decoding invoked via SQL functions like pg_logical_slot_get_binary_changes fails with an error, subsequent logical decoding executions could access already-freed memory of the entry's cache, resulting in a crash. With this change, it's ensured that RelationSyncCache is cleaned up even in error cases by using a memory context reset callback function. Backpatch to 15, where entry_cxt was introduced for column filtering and row filtering. While the backbranches v13 and v14 have a similar issue where RelationSyncCache persists even after an error when pgoutput is used via SQL API, we decided not to backport this fix. This decision was made because v13 is approaching its final minor release, and we won't have an chance to fix any new issues that might arise. Additionally, since using pgoutput via SQL API is not a common use case, the risk outwights the benefit. If we receive bug reports, we can consider backporting the fixes then. Author: vignesh C <vignesh21@gmail.com> Co-authored-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Discussion: https://postgr.es/m/CALDaNm0x-aCehgt8Bevs2cm=uhmwS28MvbYq1=s2Ekf0aDPkOA@mail.gmail.com Backpatch-through: 15	2 months ago
Amit Kapila	2f6e1a4906	Fix LOCK_TIMEOUT handling during parallel apply. Previously, the parallel apply worker used SIGINT to receive a graceful shutdown signal from the leader apply worker. However, SIGINT is also used by the LOCK_TIMEOUT handler to trigger a query-cancel interrupt. This overlap caused the parallel apply worker to miss LOCK_TIMEOUT signals, leading to incorrect behavior during lock wait/contention. This patch resolves the conflict by switching the graceful shutdown signal from SIGINT to SIGUSR2. Reported-by: Zane Duffield <duffieldzane@gmail.com> Diagnosed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 16, where it was introduced Discussion: https://postgr.es/m/CACMiCkXyC4au74kvE2g6Y=mCEF8X6r-Ne_ty4r7qWkUjRE4+oQ@mail.gmail.com	3 months ago
Peter Eisentraut	591aeb5068	Remove stray semicolon at global scope The Sun Studio compiler complains about an empty declaration here. Note for future historians: This does not mean that this compiler is still of current interest for anyone using PostgreSQL. But we can let this small fix be its parting gift. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/a0f817ee-fb86-483a-8a14-b6f7f5991b6e%40eisentraut.org	3 months ago
Michael Paquier	07a3023871	Fix assertion failure with replication slot release in single-user mode Some replication slot manipulations (logical decoding via SQL, advancing) were failing an assertion when releasing a slot in single-user mode, because active_pid was not set in a ReplicationSlot when its slot is acquired. ReplicationSlotAcquire() has some logic to be able to work with the single-user mode. This commit sets ReplicationSlot->active_pid to MyProcPid, to let the slot-related logic fall-through, considering the single process as the one holding the slot. Some TAP tests are added for various replication slot functions with the single-user mode, while on it, for slot creation, drop, advancing, copy and logical decoding with multiple slot types (temporary, physical vs logical). These tests are skipped on Windows, as direct calls of postgres --single would fail on permission failures. There is no platform-specific behavior that needs to be checked, so living with this restriction should be fine. The CI is OK with that, now let's see what the buildfarm tells. Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Mutaamba Maasha <maasha@gmail.com> Discussion: https://postgr.es/m/OSCPR01MB14966ED588A0328DAEBE8CB25F5FA2@OSCPR01MB14966.jpnprd01.prod.outlook.com Backpatch-through: 13	4 months ago
Amit Kapila	288a817bcb	Fix self-deadlock during DROP SUBSCRIPTION. The DROP SUBSCRIPTION command performs several operations: it stops the subscription workers, removes subscription-related entries from system catalogs, and deletes the replication slot on the publisher server. Previously, this command acquired an AccessExclusiveLock on pg_subscription before initiating these steps. However, while holding this lock, the command attempts to connect to the publisher to remove the replication slot. In cases where the connection is made to a newly created database on the same server as subscriber, the cache-building process during connection tries to acquire an AccessShareLock on pg_subscription, resulting in a self-deadlock. To resolve this issue, we reduce the lock level on pg_subscription during DROP SUBSCRIPTION from AccessExclusiveLock to RowExclusiveLock. Earlier, the higher lock level was used to prevent the launcher from starting a new worker during the drop operation, as a restarted worker could become orphaned. Now, instead of relying on a strict lock, we acquire an AccessShareLock on the specific subscription being dropped and re-validate its existence after acquiring the lock. If the subscription is no longer valid, the worker exits gracefully. This approach avoids the deadlock while still ensuring that orphan workers are not created. Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 13 Discussion: https://postgr.es/m/18988-7312c868be2d467f@postgresql.org	4 months ago
Peter Eisentraut	be26c77ef9	Message improvements Backpatch of the relevant parts of commit `50fd428b2b` for consistency.	4 months ago
Fujii Masao	f71fa981c9	Avoid unexpected shutdown when sync_replication_slots is enabled. Previously, enabling sync_replication_slots while wal_level was not set to logical could cause the server to shut down. This was because the postmaster performed a configuration check before launching the slot synchronization worker and raised an ERROR if the settings were incompatible. Since ERROR is treated as FATAL in the postmaster, this resulted in the entire server shutting down unexpectedly. This commit changes the postmaster to log that message with a LOG-level instead of raising an ERROR, allowing the server to continue running even with the misconfiguration. Back-patch to v17, where slot synchronization was introduced. Reported-by: Hugo DUBOIS <hdubois@scaleway.com> Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Hugo DUBOIS <hdubois@scaleway.com> Reviewed-by: Shveta Malik <shveta.malik@gmail.com> Discussion: https://postgr.es/m/CAH0PTU_pc3oHi__XESF9ZigCyzai1Mo3LsOdFyQA4aUDkm01RA@mail.gmail.com Backpatch-through: 17	5 months ago
Michael Paquier	9e0b4b1ab5	Fix use-after-free with INSERT ON CONFLICT changes in reorderbuffer.c In ReorderBufferProcessTXN(), used to send the data of a transaction to an output plugin, INSERT ON CONFLICT changes (INTERNAL_SPEC_INSERT) are delayed until a confirmation record arrives (INTERNAL_SPEC_CONFIRM), updating the change being processed. `8c58624df4` has added an extra step after processing a change to update the progress of the transaction, by calling the callback update_progress_txn() based on the LSN stored in a change after a threshold of CHANGES_THRESHOLD (100) is reached. This logic has missed the fact that for an INSERT ON CONFLICT change the data is freed once processed, hence update_progress_txn() could be called pointing to a LSN value that's already been freed. This could result in random crashes, depending on the workload. Per discussion, this issue is fixed by reusing in update_progress_txn() the LSN from the change processed found at the beginning of the loop, meaning that for a INTERNAL_SPEC_CONFIRM change the progress is updated using the LSN of the INTERNAL_SPEC_CONFIRM change, and not the LSN from its INTERNAL_SPEC_INSERT change. This is actually more correct, as we want to update the progress to point to the INTERNAL_SPEC_CONFIRM change. Masahiko Sawada has found a nice trick to reproduce the issue: hardcode CHANGES_THRESHOLD at 1 and run test_decoding (test "ddl" being enough) on an instance running valgrind. The bug has been analyzed by Ethan Mertz, who also originally suggested the solution used in this patch. Issue introduced by `8c58624df4`, so backpatch down to v16. Author: Ethan Mertz <ethan.mertz@gmail.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/aIsQqDZ7x4LAQ6u1@paquier.xyz Backpatch-through: 16	5 months ago
Amit Kapila	8c298324a4	Fix a deadlock during ALTER SUBSCRIPTION ... DROP PUBLICATION. A deadlock can occur when the DDL command and the apply worker acquire catalog locks in different orders while dropping replication origins. The issue is rare in PG16 and higher branches because, in most cases, the tablesync worker performs the origin drop in those branches, and its locking sequence does not conflict with DDL operations. This patch ensures consistent lock acquisition to prevent such deadlocks. As per buildfarm. Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Ajin Cherian <itsajin@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 14, where it was introduced Discussion: https://postgr.es/m/bab95e12-6cc5-4ebb-80a8-3e41956aa297@gmail.com	5 months ago
Amit Kapila	24f6c1bd41	Fix the handling of two GUCs during upgrade. Previously, the check_hook functions for max_slot_wal_keep_size and idle_replication_slot_timeout would incorrectly raise an ERROR for values set in postgresql.conf during upgrade, even though those values were not actively used in the upgrade process. To prevent logical slot invalidation during upgrade, we used to set special values for these GUCs. Now, instead of relying on those values, we directly prevent WAL removal and logical slot invalidation caused by max_slot_wal_keep_size and idle_replication_slot_timeout. Note: PostgreSQL 17 does not include the idle_replication_slot_timeout GUC, so related changes were not backported. BUG #18979 Reported-by: jorsol <jorsol@gmail.com> Author: Dilip Kumar <dilipbalaut@gmail.com> Reviewed by: vignesh C <vignesh21@gmail.com> Reviewed by: Alvaro Herrera <alvherre@alvh.no-ip.org> Backpatch-through: 17, where it was introduced Discussion: https://postgr.es/m/219561.1751826409@sss.pgh.pa.us Discussion: https://postgr.es/m/18979-a1b7fdbb7cd181c6@postgresql.org	5 months ago
Tom Lane	9f33300e69	Prevent excessive delays before launching new logrep workers. The logical replication launcher process would sometimes sleep for as much as 3 minutes before noticing that it is supposed to launch a new worker. This could happen if (1) WaitForReplicationWorkerAttach absorbed a process latch wakeup that was meant to cause ApplyLauncherMain to do work, or (2) logicalrep_worker_launch reported failure, either because of resource limits or because the new worker terminated immediately. In case (2), the expected behavior is that we retry the launch after wal_retrieve_retry_interval, but that didn't reliably happen. It's not clear how often such conditions would occur in the field, but in our subscription test suite they are somewhat common, especially in tests that exercise cases that cause quick worker failure. That causes the tests to take substantially longer than they ought to do on typical setups. To fix (1), make WaitForReplicationWorkerAttach re-set the latch before returning if it cleared it while looping. To fix (2), ensure that we reduce wait_time to no more than wal_retrieve_retry_interval when logicalrep_worker_launch reports failure. In passing, fix a couple of perhaps-hypothetical race conditions, e.g. examining worker->in_use without a lock. Backpatch to v16. Problem (2) didn't exist before commit `5a3a95385` because the previous code always set wait_time to wal_retrieve_retry_interval when launching a worker, regardless of success or failure of the launch. That behavior also greatly mitigated problem (1), so I'm not excited about adapting the remainder of the patch to the substantially-different code in older branches. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/817604.1750723007@sss.pgh.pa.us Backpatch-through: 16	6 months ago
Amit Kapila	25505082f0	Improve log messages and docs for slot synchronization. Improve the clarity of LOG messages when a failover logical slot synchronization fails, making the reasons more explicit for easier debugging. Update the documentation to outline scenarios where slot synchronization can fail, especially during the initial sync, and emphasize that pg_sync_replication_slot() is primarily intended for testing and debugging purposes. We also discussed improving the functionality of pg_sync_replication_slot() so that it can be used reliably, but we would take up that work for next version after some more discussion and review. Reported-by: Suraj Kharage <suraj.kharage@enterprisedb.com> Author: shveta malik <shveta.malik@gmail.com> Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 17, where it was introduced Discussion: https://postgr.es/m/CAF1DzPWTcg+m+x+oVVB=y4q9=PYYsL_mujVp7uJr-_oUtWNGbA@mail.gmail.com	6 months ago
Masahiko Sawada	45c357e0e8	Fix re-distributing previously distributed invalidation messages during logical decoding. Commit `4909b38af0` introduced logic to distribute invalidation messages from catalog-modifying transactions to all concurrent in-progress transactions. However, since each transaction distributes not only its original invalidation messages but also previously distributed messages to other transactions, this leads to an exponential increase in allocation request size for invalidation messages, ultimately causing memory allocation failure. This commit fixes this issue by tracking distributed invalidation messages separately per decoded transaction and not redistributing these messages to other in-progress transactions. The maximum size of distributed invalidation messages that one transaction can store is limited to MAX_DISTR_INVAL_MSG_PER_TXN (8MB). Once the size of the distributed invalidation messages exceeds this threshold, we invalidate all caches in locations where distributed invalidation messages need to be executed. Back-patch to all supported versions where we introduced the fix by commit `4909b38af0`. Note that this commit adds two new fields to ReorderBufferTXN to store the distributed transactions. This change breaks ABI compatibility in back branches, affecting third-party extensions that depend on the size of the ReorderBufferTXN struct, though this scenario seems unlikely. Additionally, it adds a new flag to the txn_flags field of ReorderBufferTXN to indicate distributed invalidation message overflow. This should not affect existing implementations, as it is unlikely that third-party extensions use unused bits in the txn_flags field. Bug: #18938 #18942 Author: vignesh C <vignesh21@gmail.com> Reported-by: Duncan Sands <duncan.sands@deepbluecap.com> Reported-by: John Hutchins <john.hutchins@wicourts.gov> Reported-by: Laurence Parry <greenreaper@hotmail.com> Reported-by: Max Madden <maxmmadden@gmail.com> Reported-by: Braulio Fdo Gonzalez <brauliofg@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Discussion: https://postgr.es/m/680bdaf6-f7d1-4536-b580-05c2760c67c6@deepbluecap.com Discussion: https://postgr.es/m/18942-0ab1e5ae156613ad@postgresql.org Discussion: https://postgr.es/m/18938-57c9a1c463b68ce0@postgresql.org Discussion: https://postgr.es/m/CAD1FGCT2sYrP_70RTuo56QTizyc+J3wJdtn2gtO3VttQFpdMZg@mail.gmail.com Discussion: https://postgr.es/m/CANO2=B=2BT1hSYCE=nuuTnVTnjidMg0+-FfnRnqM6kd23qoygg@mail.gmail.com Backpatch-through: 13	6 months ago
Alexander Korotkov	32ab0fd55d	Add TAP tests to check replication slot advance during the checkpoint The new tests verify that logical and physical replication slots are still valid after an immediate restart on checkpoint completion when the slot was advanced during the checkpoint. This commit introduces two new injection points to make these tests possible: * checkpoint-before-old-wal-removal - triggered in the checkpointer process just before old WAL segments cleanup; * logical-replication-slot-advance-segment - triggered in LogicalConfirmReceivedLocation() when restart_lsn was changed enough to point to the next WAL segment. Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497 Author: Vitaly Davydov <v.davydov@postgrespro.ru> Author: Tomas Vondra <tomas@vondra.me> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 17	6 months ago
Alexander Korotkov	2090edc6f3	Keep WAL segments by the flushed value of the slot's restart LSN The patch fixes the issue with the unexpected removal of old WAL segments after checkpoint, followed by an immediate restart. The issue occurs when a slot is advanced after the start of the checkpoint and before old WAL segments are removed at the end of the checkpoint. The idea of the patch is to get the minimal restart_lsn at the beginning of checkpoint (or restart point) creation and use this value when calculating the oldest LSN for WAL segments removal at the end of checkpoint. This idea was proposed by Tomas Vondra in the discussion. Unlike 291221c46575, this fix doesn't affect ABI and is intended for back branches. Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497 Author: Vitaly Davydov <v.davydov@postgrespro.ru> Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 13	6 months ago
Michael Paquier	87be749c71	Use replay LSN as target for cascading logical WAL senders A cascading WAL sender doing logical decoding (as known as doing its work on a standby) has been using as flush LSN the value returned by GetStandbyFlushRecPtr() (last position safely flushed to disk). This is incorrect as such processes are only able to decode changes up to the LSN that has been replayed by the startup process. This commit changes cascading logical WAL senders to use the replay LSN, as returned by GetXLogReplayRecPtr(). This distinction is important particularly during shutdown, when WAL senders need to send any remaining available data to their clients, switching WAL senders to a caught-up state. Using the latest flush LSN rather than the replay LSN could cause the WAL senders to be stuck in an infinite loop preventing them to shut down, as the startup process does not run when WAL senders attempt to catch up, so they could keep waiting for work that would never happen. Backpatch down to v16, where logical decoding on standbys has been introduced. Author: Alexey Makhmutov <a.makhmutov@postgrespro.ru> Reviewed-by: Ajin Cherian <itsajin@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/52138028-7246-421c-9161-4fa108b88070@postgrespro.ru Backpatch-through: 16	7 months ago
Nathan Bossart	fe8ea7a2a8	Ensure we have a snapshot when updating various system catalogs. A few places that access system catalogs don't set up an active snapshot before potentially accessing their TOAST tables. To fix, push an active snapshot just before each section of code that might require accessing one of these TOAST tables, and pop it shortly afterwards. While at it, this commit adds some rather strict assertions in an attempt to prevent such issues in the future. Commit `16bf24e0e4` recently removed pg_replication_origin's TOAST table in order to fix the same problem for that catalog. On the back-branches, those bugs are left in place. We cannot easily remove a catalog's TOAST table on released major versions, and only replication origins with extremely long names are affected. Given the low severity of the issue, fixing older versions doesn't seem worth the trouble of significantly modifying the patch. Also, on v13 and v14, the aforementioned strict assertions have been omitted because commit `2776922201`, which added HaveRegisteredOrActiveSnapshot(), was not back-patched. While we could probably back-patch it now, I've opted against it because it seems unlikely that new TOAST snapshot issues will be introduced in the oldest supported versions. Reported-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/18127-fe54b6a667f29658%40postgresql.org Discussion: https://postgr.es/m/18309-c0bf914950c46692%40postgresql.org Discussion: https://postgr.es/m/ZvMSUPOqUU-VNADN%40nathan Backpatch-through: 13	7 months ago
Amit Kapila	7318f241d2	Don't retreat slot's confirmed_flush LSN. Prevent moving the confirmed_flush backwards, as this could lead to data duplication issues caused by replicating already replicated changes. This can happen when a client acknowledges an LSN it doesn't have to do anything for, and thus didn't store persistently. After a restart, the client can send the prior LSN that it stored persistently as an acknowledgement, but we need to ignore such an LSN to avoid retreating confirm_flush LSN. Diagnosed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Author: shveta malik <shveta.malik@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Tested-by: Nisha Moond <nisha.moond412@gmail.com> Backpatch-through: 13 Discussion: https://postgr.es/m/CAJpy0uDZ29P=BYB1JDWMCh-6wXaNqMwG1u1mB4=10Ly0x7HhwQ@mail.gmail.com Discussion: https://postgr.es/m/OS0PR01MB57164AB5716AF2E477D53F6F9489A@OS0PR01MB5716.jpnprd01.prod.outlook.com	7 months ago
Amit Kapila	36148b22ee	Fix xmin advancement during fast_forward decoding. During logical decoding, we advance catalog_xmin of logical too early in fast_forward mode, resulting in required catalog data being removed by vacuum. This mode is normally used to advance the slot without processing the changes, but we still can't let the slot's xmin to advance to an incorrect value. Commit `f49a80c481` fixed a similar issue where the logical slot's catalog_xmin was getting advanced prematurely during non-fast-forward mode. During xl_running_xacts processing, instead of directly advancing the slot's xmin to the oldest running xid in the record, it allowed the xmin to be held back for snapshots that can be used for not-yet-replayed transactions, as those might consider older txns as running too. However, it missed the fact that the same problem can happen during fast_forward mode decoding, as we won't build a base snapshot in that mode, and the future call to get_changes from the same slot can miss seeing the required catalog changes leading to incorrect reslts. This commit allows building the base snapshot even in fast_forward mode to prevent the early advancement of xmin. Reported-by: Amit Kapila <amit.kapila16@gmail.com> Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 13 Discussion: https://postgr.es/m/CAA4eK1LqWncUOqKijiafe+Ypt1gQAQRjctKLMY953J79xDBgAg@mail.gmail.com Discussion: https://postgr.es/m/OS0PR01MB57163087F86621D44D9A72BF94BB2@OS0PR01MB5716.jpnprd01.prod.outlook.com	8 months ago
Amit Kapila	05676d87e2	Fix an oversight in `3f28b2fcac`. Commit `3f28b2fcac` tried to ensure that the replication origin shouldn't be advanced in case of an ERROR in the apply worker, so that it can request the same data again after restart. However, it is possible that an ERROR was caught and handled by a (say PL/pgSQL) function, and the apply worker continues to apply further changes, in which case, we shouldn't reset the replication origin. Ensure to reset the origin only when the apply worker exits after an ERROR. Commit `3f28b2fcac` added new function geterrlevel, which we removed in HEAD as part of this commit, but kept it in backbranches to avoid breaking any applications. A separate case can be made to have such a function even for HEAD. Reported-by: Shawn McCoy <shawn.the.mccoy@gmail.com> Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 16, where it was introduced Discussion: https://postgr.es/m/CALsgZNCGARa2mcYNVTSj9uoPcJo-tPuWUGECReKpNgTpo31_Pw@mail.gmail.com	8 months ago
Michael Paquier	3339847ccd	Fix race with synchronous_standby_names at startup synchronous_standby_names cannot be reloaded safely by backends, and the checkpointer is in charge of updating a state in shared memory if the GUC is enabled in WalSndCtl, to let the backends know if they should wait or not for a given LSN. This provides a strict control on the timing of the waiting queues if the GUC is enabled or disabled, then reloaded. The checkpointer is also in charge of waking up the backends that could be waiting for a LSN when the GUC is disabled. This logic had a race condition at startup, where it would be possible for backends to not wait for a LSN even if synchronous_standby_names is enabled. This would cause visibility issues with transactions that we should be waiting for but they were not. The problem lasts until the checkpointer does its initial update of the shared memory state when it loads synchronous_standby_names. In order to take care of this problem, the shared memory state in WalSndCtl is extended to detect if it has been initialized by the checkpointer, and not only check if synchronous_standby_names is defined. In WalSndCtlData, sync_standbys_defined is renamed to sync_standbys_status, a bits8 able to know about two states: - If the shared memory state has been initialized. This flag is set by the checkpointer at startup once, and never removed. - If synchronous_standby_names is known as defined in the shared memory state. This is the same as the previous sync_standbys_defined in WalSndCtl. This method gives a way for backends to decide what they should do until the shared memory area is initialized, and they now ultimately fall back to a check on the GUC value in this case, which is the best thing that can be done. Fortunately, SyncRepUpdateSyncStandbysDefined() is called immediately by the checkpointer when this process starts, so the window is very narrow. It is possible to enlarge the problematic window by making the checkpointer wait at the beginning of SyncRepUpdateSyncStandbysDefined() with a hardcoded sleep for example, and doing so has showed that a 2PC visibility test is indeed failing. On machines slow enough, this bug would cause spurious failures. In 17~, we have looked at the possibility of adding an injection point to have a reproducible test, but as the problematic window happens at early startup, we would need to invent a way to make an injection point optionally persistent across restarts when attached, something that would be fine for this case as it would involve the checkpointer. This issue is quite old, and can be reproduced on all the stable branches. Author: Melnikov Maksim <m.melnikov@postgrespro.ru> Co-authored-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/163fcbec-900b-4b07-beaa-d2ead8634bec@postgrespro.ru Backpatch-through: 13	8 months ago
Amit Kapila	cadaf0ac46	Fix data loss in logical replication. Data loss can happen when the DDLs like ALTER PUBLICATION ... ADD TABLE ... or ALTER TYPE ... that don't take a strong lock on table happens concurrently to DMLs on the tables involved in the DDL. This happens because logical decoding doesn't distribute invalidations to concurrent transactions and those transactions use stale cache data to decode the changes. The problem becomes bigger because we keep using the stale cache even after those in-progress transactions are finished and skip the changes required to be sent to the client. This commit fixes the issue by distributing invalidation messages from catalog-modifying transactions to all concurrent in-progress transactions. This allows the necessary rebuild of the catalog cache when decoding new changes after concurrent DDL. We observed performance regression primarily during frequent execution of publication DDL statements that modify the published tables. The regression is minor or nearly nonexistent for DDLs that do not affect the published tables or occur infrequently, making this a worthwhile cost to resolve a longstanding data loss issue. An alternative approach considered was to take a strong lock on each affected table during publication modification. However, this would only address issues related to publication DDLs (but not the ALTER TYPE ...) and require locking every relation in the database for publications created as FOR ALL TABLES, which is impractical. The bug exists in all supported branches, but we are backpatching till 14. The fix for 13 requires somewhat bigger changes than this fix, so the fix for that branch is still under discussion. Reported-by: hubert depesz lubaczewski <depesz@depesz.com> Reported-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Tested-by: Benoit Lobréau <benoit.lobreau@dalibo.com> Backpatch-through: 14 Discussion: https://postgr.es/m/de52b282-1166-1180-45a2-8d8917ca74c6@enterprisedb.com Discussion: https://postgr.es/m/CAD21AoAenVqiMjpN-PvGHL1N9DWnHSq673bfgr6phmBUzx=kLQ@mail.gmail.com	8 months ago
Michael Paquier	5cbbe70a9c	Flush the IO statistics of active WAL senders more frequently WAL senders do not flush their statistics until they exit, limiting the monitoring possible for live processes. This is penalizing when WAL senders are running for a long time, like in streaming or logical replication setups, because it is not possible to know the amount of IO they generate while running. This commit makes WAL senders more aggressive with their statistics flush, using an internal of 1 second, with the flush timing calculated based on the existing GetCurrentTimestamp() done before the sleeps done to wait for some activity. Note that the sleep done for logical and physical WAL senders happens in two different code paths, so the stats flushes need to happen in these two places. One test is added for the physical WAL sender case, and one for the logical WAL sender case. This can be done in a stable fashion by relying on the WAL generated by the TAP tests in combination with a stats reset while a server is running, but only on HEAD as WAL data has been added to pg_stat_io in `a051e71e28`. This issue exists since `a9c70b46db` and the introduction of pg_stat_io, so backpatch down to v16. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/Z73IsKBceoVd4t55@ip-10-97-1-34.eu-west-3.compute.internal Backpatch-through: 16	8 months ago
Masahiko Sawada	a4309e85f4	Restrict copying of invalidated replication slots. Previously, invalidated logical and physical replication slots could be copied using the pg_copy_logical_replication_slot and pg_copy_physical_replication_slot functions. Replication slots that were invalidated for reasons other than WAL removal retained their restart_lsn. This meant that a new slot copied from an invalidated slot could have a restart_lsn pointing to a WAL segment that might have already been removed. This commit restricts the copying of invalidated replication slots. Backpatch to v16, where slots could retain their restart_lsn when invalidated for reasons other than WAL removal. For v15 and earlier, this check is not required since slots can only be invalidated due to WAL removal, and existing checks already handle this issue. Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CANhcyEU65aH0VYnLiu%3DOhNNxhnhNhwcXBeT-jvRe1OiJTo_Ayg%40mail.gmail.com Backpatch-through: 16	9 months ago
Daniel Gustafsson	8afec4ef67	Fix guc_malloc calls for consistency and OOM checks check_createrole_self_grant and check_synchronized_standby_slots were allocating memory on a LOG elevel without checking if the allocation succeeded or not, which would have led to a segfault on allocation failure. On top of that, a number of callsites were using the ERROR level, relying on erroring out rather than returning false to allow the GUC machinery handle it gracefully. Other callsites used WARNING instead of LOG. While neither being not wrong, this changes all check_ functions do it consistently with LOG. init_custom_variable gets a promoted elevel to FATAL to keep the guc_malloc error handling in line with the rest of the error handling in that function which already call FATAL. If we encounter an OOM in this callsite there is no graceful handling to be had, better to error out hard. Backpatch the fix to check_createrole_self_grant down to v16 and the fix to check_synchronized_standby_slots down to v17 where they were introduced. Author: Daniel Gustafsson <daniel@yesql.se> Reported-by: Nikita <pm91.arapov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Bug: #18845 Discussion: https://postgr.es/m/18845-582c6e10247377ec@postgresql.org Backpatch-through: 16	9 months ago
Amit Kapila	7c906c5b46	Doc: Fix pg_copy_logical_replication_slot description. This commit documents that the failover option is not copied when using the pg_copy_logical_replication_slot function. In passing, we modify the comments in the function clarifying the reason for this behavior. Reported-by: <duffieldzane@gmail.com> Author: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 17, where it was introduced Discussion: https://postgr.es/m/173976850802.682632.11315364077431550250@wrigleys.postgresql.org	10 months ago
Masahiko Sawada	174952ece1	Fix assertion when decoding XLOG_PARAMETER_CHANGE on promoted primary. When a standby replays an XLOG_PARAMETER_CHANGE record that lowers wal_level below logical, we invalidate all logical slots in hot standby mode. However, if this record was replayed while not in hot standby mode, logical slots could remain valid even after promotion, potentially causing an assertion failure during WAL record decoding. To fix this issue, this commit adds a check for hot_standby status when restoring a logical replication slot on standbys. This check ensures that logical slots are invalidated when they become incompatible due to insufficient wal_level during recovery. Backpatch to v16 where logical decoding on standby was introduced. Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/CAD21AoABoFwGY_Rh2aeE6tEq3HkJxf0c6UeOXn4VV9v6BAQPSw%40mail.gmail.com Backpatch-through: 16	10 months ago
Tom Lane	788baa9a25	Fix crash in brininsertcleanup during logical replication. Logical replication crashes if the subscriber's partitioned table has a BRIN index. There are two independently blamable causes, and this patch fixes both: 1. brininsertcleanup fails if called twice for the same IndexInfo, because it half-destroys its BrinInsertState but leaves it still linked from ii_AmCache. brininsert would also fail in that state, so it's pretty hard to see any advantage to this coding. Fully remove the BrinInsertState, instead, so that a new brininsert call would create a new cache. 2. A logical replication subscriber sometimes does ExecOpenIndices twice on the same ResultRelInfo, followed by doing ExecCloseIndices twice; the second call reaches the brininsertcleanup bug. Quite aside from tickling unexpected cases in aminsertcleanup methods, this seems very wasteful, because the IndexInfos built in the first ExecOpenIndices call are just lost during the second call, and have to be rebuilt at possibly-nontrivial cost. We should establish a coding rule that you don't do that. The problematic coding is that when the target table is partitioned, apply_handle_tuple_routing calls ExecFindPartition which does ExecOpenIndices (and expects that ExecCleanupTupleRouting will close the indexes again). Using the ResultRelInfo made by ExecFindPartition, it calls apply_handle_delete_internal or apply_handle_insert_internal, both of which think they need to do ExecOpenIndices/ExecCloseIndices for themselves. They do in the main non-partitioned code paths, but not here. The simplest fix is to pull their ExecOpenIndices/ExecCloseIndices calls out and put them in the call sites for the non-partitioned cases. (We could have refactored apply_handle_update_internal similarly, but I did not do so today because there's no bug there: the partitioned code path doesn't call it.) Also, remove the always-duplicative open/close calls within apply_handle_tuple_routing itself. Since brininsertcleanup and indeed the whole aminsertcleanup mechanism are new in v17, there's no observable bug in older branches. A case could be made for trying to avoid these duplicative open/close calls in the older branches, but for now it seems not worth the trouble and risk of new bugs. Bug: #18815 Reported-by: Sergey Belyashov <sergey.belyashov@gmail.com> Discussion: https://postgr.es/m/18815-2a0407cc7f40b327@postgresql.org Backpatch-through: 17	10 months ago
Michael Paquier	836435424b	Fix memory leak in pgoutput with relation attribute map pgoutput caches the attribute map of a relation, that is free()'d only when validating a RelationSyncEntry. However, this code path is not taken when calling any of the SQL functions able to do some logical decoding, like pg_logical_slot_{get,peek}_changes(), leaking some memory into CacheMemoryContext on repeated calls. To address this, a relation's attribute map is allocated in PGOutputData's cachectx, free()'d at the end of the execution of these SQL functions when logical decoding ends. This is available down to 15. v13 and v14 have a similar leak, which will be dealt with later. Reported-by: Masahiko Sawada Author: Vignesh C Reviewed-by: Hou Zhijie Discussion: https://postgr.es/m/CAD21AoDkAhQVSukOfH3_reuF-j4EU0-HxMqU3dU+bSTxsqT14Q@mail.gmail.com Discussion: https://postgr.es/m/CALDaNm1hewNAsZ_e6FF52a=9drmkRJxtEPrzCB6-9mkJyeBBqA@mail.gmail.com Backpatch-through: 15	12 months ago
Michael Paquier	bbe68c13ab	Fix memory leak in pgoutput with publication list cache The pgoutput module caches publication names in a list and frees it upon invalidation. However, the code forgot to free the actual publication names within the list elements, as publication names are pstrdup()'d in GetPublication(). This would cause memory to leak in CacheMemoryContext, bloating it over time as this context is not cleaned. This is a problem for WAL senders running for a long time, as an accumulation of invalidation requests would bloat its cache memory usage. A second case, where this leak is easier to see, involves a backend calling SQL functions like pg_logical_slot_{get,peek}_changes() which create a new decoding context with each execution. More publications create more bloat. To address this, this commit adds a new memory context within the logical decoding context and resets it each time the publication names cache is invalidated, based on a suggestion from Amit Kapila. This ensures that the lifespan of the publication names aligns with that of the logical decoding context. Contrary to the HEAD-only commit `f0c569d715` that has changed PGOutputData to track this new child memory context, the context is tracked with a static variable whose state is reset with a MemoryContext reset callback attached to PGOutputData->context, so as ABI compatibility is preserved in stable branches. This approach is based on an suggestion from Amit Kapila. Analyzed-by: Michael Paquier, Jeff Davis Author: Masahiko Sawada Reviewed-by: Amit Kapila, Michael Paquier, Euler Taveira, Hou Zhijie Discussion: https://postgr.es/m/Z0khf9EVMVLOc_YY@paquier.xyz Backpatch-through: 13	1 year ago
Álvaro Herrera	9abdc1841e	Fix synchronized_standby_slots GUC check hook The validate_sync_standby_slots subroutine requires an LWLock, so it cannot run in processes without PGPROC; skip it there to avoid a crash. This replaces the current test for ReplicationSlotCtl being not null, which appears to be a solution for the same problem but less general. I also rewrote a related comment that mentioned ReplicationSlotCtl in StandbySlotsHaveCaughtup. This code came in with commit bf279ddd1c28; backpatch to 17. Reported-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Discussion: https://postgr.es/m/202411281216.sutbxtr6idnn@alvherre.pgsql	1 year ago
Amit Kapila	5f46439d59	Doc: Clarify the `inactive_since` field description. Updated to specify that it represents the exact time a slot became inactive, rather than the period of inactivity. Reported-by: Peter Smith Author: Bruce Momjian, Nisha Moond Reviewed-by: Amit Kapila, Peter Smith Backpatch-through: 17 Discussion: https://postgr.es/m/CAHut+PuvsyA5v8y7rYoY9mkDQzUhwaESM05yCByTMaDoRh30tA@mail.gmail.com	1 year ago
Michael Paquier	afe9b0d9fe	Fix memory leak in pgoutput for the WAL sender RelationSyncCache, the hash table in charge of tracking the relation schemas sent through pgoutput, was forgetting to free the TupleDesc associated to the two slots used to store the new and old tuples, causing some memory to be leaked each time a relation is invalidated when the slots of an existing relation entry are cleaned up. This is rather hard to notice as the bloat is pretty minimal, but a long-running WAL sender would be in trouble over time depending on the workload. sysbench has proved to be pretty good at showing the problem, coupled with some memory monitoring of the WAL sender. Issue introduced in `52e4f0cd47`, that has added row filters for tables logically replicated. Author: Boyu Yang Reviewed-by: Michael Paquier, Hou Zhijie Discussion: https://postgr.es/m/DM3PR84MB3442E14B340E553313B5C816E3252@DM3PR84MB3442.NAMPRD84.PROD.OUTLOOK.COM Backpatch-through: 15	1 year ago
Masahiko Sawada	568e78a653	Fix a possibility of logical replication slot's restart_lsn going backwards. Previously LogicalIncreaseRestartDecodingForSlot() accidentally accepted any LSN as the candidate_lsn and candidate_valid after the restart_lsn of the replication slot was updated, so it potentially caused the restart_lsn to move backwards. A scenario where this could happen in logical replication is: after a logical replication restart, based on previous candidate_lsn and candidate_valid values in memory, the restart_lsn advances upon receiving a subscriber acknowledgment. Then, logical decoding restarts from an older point, setting candidate_lsn and candidate_valid based on an old RUNNING_XACTS record. Subsequent subscriber acknowledgments then update the restart_lsn to an LSN older than the current value. In the reported case, after WAL files were removed by a checkpoint, the retreated restart_lsn prevented logical replication from restarting due to missing WAL segments. This change essentially modifies the 'if' condition to 'else if' condition within the function. The previous code had an asymmetry in this regard compared to LogicalIncreaseXminForSlot(), which does almost the same thing for different fields. The WAL removal issue was reported by Hubert Depesz Lubaczewski. Backpatch to all supported versions, since the bug exists since 9.4 where logical decoding was introduced. Reviewed-by: Tomas Vondra, Ashutosh Bapat, Amit Kapila Discussion: https://postgr.es/m/Yz2hivgyjS1RfMKs%40depesz.com Discussion: https://postgr.es/m/85fff40e-148b-4e86-b921-b4b846289132%40vondra.me Backpatch-through: 13	1 year ago
Noah Misch	c1099dd745	Revert "For inplace update, send nontransactional invalidations." This reverts commit `95c5acb3fc` (v17) and counterparts in each other non-master branch. If released, that commit would have caused a worst-in-years minor release regression, via undetected LWLock self-deadlock. This commit and its self-deadlock fix warrant more bake time in the master branch. Reported by Alexander Lakhin. Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com	1 year ago
Noah Misch	95c5acb3fc	For inplace update, send nontransactional invalidations. The inplace update survives ROLLBACK. The inval didn't, so another backend's DDL could then update the row without incorporating the inplace update. In the test this fixes, a mix of CREATE INDEX and ALTER TABLE resulted in a table with an index, yet relhasindex=f. That is a source of index corruption. Back-patch to v12 (all supported versions). The back branch versions don't change WAL, because those branches just added end-of-recovery SIResetAll(). All branches change the ABI of extern function PrepareToInvalidateCacheTuple(). No PGXN extension calls that, and there's no apparent use case in extensions. Reviewed by Nitin Motiani and (in earlier versions) Andres Freund. Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com	1 year ago
Masahiko Sawada	eef9cc4dc2	Reduce memory block size for decoded tuple storage to 8kB. Commit `a4ccc1cef` introduced the Generation Context and modified the logical decoding process to use a Generation Context with a fixed block size of 8MB for storing tuple data decoded during logical decoding (i.e., rb->tup_context). Several reports have indicated that the logical decoding process can be terminated due to out-of-memory (OOM) situations caused by excessive memory usage in rb->tup_context. This issue can occur when decoding a workload involving several concurrent transactions, including a long-running transaction that modifies tuples. By design, the Generation Context does not free a memory block until all chunks within that block are released. Consequently, if tuples modified by the long-running transaction are stored across multiple memory blocks, these blocks remain allocated until the long-running transaction completes, leading to substantial memory fragmentation. The memory usage during logical decoding, tracked by rb->size, does not account for memory fragmentation, resulting in potentially much higher memory consumption than the value of the logical_decoding_work_mem parameter. Various improvement strategies were discussed in the relevant thread. This change reduces the block size of the Generation Context used in rb->tup_context from 8MB to 8kB. This modification significantly decreases the likelihood of substantial memory fragmentation occurring and is relatively straightforward to backport. Performance testing across multiple platforms has confirmed that this change will not introduce any performance degradation that would impact actual operation. Backport to all supported branches. Reported-by: Alex Richman, Michael Guissine, Avi Weinberg Reviewed-by: Amit Kapila, Fujii Masao, David Rowley Tested-by: Hayato Kuroda, Shlok Kyal Discussion: https://postgr.es/m/CAD21AoBTY1LATZUmvSXEssvq07qDZufV4AF-OHh9VD2pC0VY2A%40mail.gmail.com Backpatch-through: 12	1 year ago
Peter Eisentraut	f2353dd717	Message style improvements	1 year ago
Peter Eisentraut	203b5ceee8	Fix misplaced translator comments They did not immediately precede the code they were applying to.	1 year ago
Masahiko Sawada	c739ae9e28	Fix identation.	1 year ago
Masahiko Sawada	dbed2e3662	Fix memory counter update in ReorderBuffer. Commit `5bec1d6bc5` changed the memory usage updates of the ReorderBufferTXN to zero all at once by subtracting txn->size, rather than updating it for each change. However, if TOAST reconstruction data remained in the transaction when freeing it, there were cases where it further subtracted the memory counter from zero, resulting in an assertion failure. This change calculates the memory size for each change and updates the memory usage to precisely the amount that has been freed. Backpatch to v17, where this was introducd. Reviewed-by: Amit Kapila, Shlok Kyal Discussion: https://postgr.es/m/CAD21AoAqkNUvicgKPT_dXzNoOwpPkVTg0QPPxEcWmzT0moCJ1g%40mail.gmail.com Backpatch-through: 17	1 year ago
Amit Kapila	915aafe82a	Don't advance origin during apply failure. We advance origin progress during abort on successful streaming and application of ROLLBACK in parallel streaming mode. But the origin shouldn't be advanced during an error or unsuccessful apply due to shutdown. Otherwise, it will result in a transaction loss as such a transaction won't be sent again by the server. Reported-by: Hou Zhijie Author: Hayato Kuroda and Shveta Malik Reviewed-by: Amit Kapila Backpatch-through: 16 Discussion: https://postgr.es/m/TYAPR01MB5692FAC23BE40C69DA8ED4AFF5B92@TYAPR01MB5692.jpnprd01.prod.outlook.com	1 year ago
Nathan Bossart	925479b8d8	Use PqMsg_* macros in more places. Commit `f4b54e1ed9`, which introduced macros for protocol characters, missed updating a few places. It also did not introduce macros for messages sent from parallel workers to their leader processes. This commit adds a new section in protocol.h for those. Author: Aleksander Alekseev Discussion: https://postgr.es/m/CAJ7c6TNTd09AZq8tGaHS3LDyH_CCnpv0oOz2wN1dGe8zekxrdQ%40mail.gmail.com Backpatch-through: 17	1 year ago

1 2 3 4 5 ...

1605 Commits (f2818868aec856e7d2502e5232e08d3a4857a802)