When executing a MERGE UPDATE action, if there is more than one
concurrent update of the target row, the lock-and-retry code would
sometimes incorrectly identify the latest version of the target tuple,
leading to incorrect results.
This was caused by using the ctid field from the TM_FailureData
returned by table_tuple_lock() in a case where the result was TM_Ok,
which is unsafe because the TM_FailureData struct is not guaranteed to
be fully populated in that case. Instead, it should use the tupleid
passed to (and updated by) table_tuple_lock().
To reduce the chances of similar errors in the future, improve the
commentary for table_tuple_lock() and TM_FailureData to make it
clearer that table_tuple_lock() updates the tid passed to it, and most
fields of TM_FailureData should not be relied on in non-failure cases.
An exception to this is the "traversed" field, which is set in both
success and failure cases.
Reported-by: Dmitry <dsy.075@yandex.ru>
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/1570d30e-2b95-4239-b9c3-f7bf2f2f8556@yandex.ru
Backpatch-through: 15
s2: NOTICE: Update: (1,160,s1,setup) -> (1,170,s1,"setup updated by update1_tg")
step update1_tg: UPDATE target_tg t SET balance = balance + 10, val = t.val || ' updated by update1_tg' WHERE t.key = 1;
s2: NOTICE: Update: (1,170,s1,"setup updated by update1_tg") -> (1,70,s1,"setup updated by update1_tg updated by update6_tg")
step update6_tg: UPDATE target_tg t SET balance = balance - 100, val = t.val || ' updated by update6_tg' WHERE t.key = 1;
step merge_bal_tg:
WITH t AS (
MERGE INTO target_tg t
USING (SELECT 1 as key) s
ON s.key = t.key
WHEN MATCHED AND balance < 100 THEN
UPDATE SET balance = balance * 2, val = t.val || ' when1'
WHEN MATCHED AND balance < 200 THEN
UPDATE SET balance = balance * 4, val = t.val || ' when2'
WHEN MATCHED AND balance < 300 THEN
UPDATE SET balance = balance * 8, val = t.val || ' when3'
RETURNING t.*
)
SELECT * FROM t;
<waiting ...>
step c2: COMMIT;
s1: NOTICE: Update: (1,70,s1,"setup updated by update1_tg updated by update6_tg") -> (1,140,s1,"setup updated by update1_tg updated by update6_tg when1")
# merge_bal sees row concurrently updated twice, first update would cause all checks to fail, second update causes different check to pass, so final balance = 2000