Don't let timeout interrupts happen unless ImmediateInterruptOK is set.

Serious oversight in commit 16e1b7a1b7f7ffd8a18713e83c8cd72c9ce48e07:
we should not allow an interrupt to take control away from mainline code
except when ImmediateInterruptOK is set.  Just to be safe, let's adopt
the same save-clear-restore dance that's been used for many years in
HandleCatchupInterrupt and HandleNotifyInterrupt, so that nothing bad
happens if a timeout handler invokes code that tests or even manipulates
ImmediateInterruptOK.

Per report of "stuck spinlock" failures from Christophe Pettus, though
many other symptoms are possible.  Diagnosis by Andres Freund.
pull/6/head
Tom Lane 12 years ago
parent 50e547096c
commit e8312b4f03
  1. 36
      src/backend/utils/misc/timeout.c

@ -259,22 +259,27 @@ static void
handle_sig_alarm(SIGNAL_ARGS) handle_sig_alarm(SIGNAL_ARGS)
{ {
int save_errno = errno; int save_errno = errno;
bool save_ImmediateInterruptOK = ImmediateInterruptOK;
/* /*
* We may be executing while ImmediateInterruptOK is true (e.g., when * We may be executing while ImmediateInterruptOK is true (e.g., when
* mainline is waiting for a lock). If SIGINT or similar arrives while * mainline is waiting for a lock). If SIGINT or similar arrives while
* this code is running, we'd lose control and perhaps leave our data * this code is running, we'd lose control and perhaps leave our data
* structures in an inconsistent state. Hold off interrupts to prevent * structures in an inconsistent state. Disable immediate interrupts, and
* that. * just to be real sure, bump the holdoff counter as well. (The reason
* for this belt-and-suspenders-too approach is to make sure that nothing
* bad happens if a timeout handler calls code that manipulates
* ImmediateInterruptOK.)
* *
* Note: it's possible for a SIGINT to interrupt handle_sig_alarm even * Note: it's possible for a SIGINT to interrupt handle_sig_alarm before
* before we reach HOLD_INTERRUPTS(); the net effect would be as if the * we manage to do this; the net effect would be as if the SIGALRM event
* SIGALRM event had been silently lost. Therefore error recovery must * had been silently lost. Therefore error recovery must include some
* include some action that will allow any lost interrupt to be * action that will allow any lost interrupt to be rescheduled. Disabling
* rescheduled. Disabling some or all timeouts is sufficient, or if * some or all timeouts is sufficient, or if that's not appropriate,
* that's not appropriate, reschedule_timeouts() can be called. Also, the * reschedule_timeouts() can be called. Also, the signal blocking hazard
* signal blocking hazard described below applies here too. * described below applies here too.
*/ */
ImmediateInterruptOK = false;
HOLD_INTERRUPTS(); HOLD_INTERRUPTS();
/* /*
@ -330,9 +335,12 @@ handle_sig_alarm(SIGNAL_ARGS)
} }
/* /*
* Re-allow query cancel, and then service any cancel request that arrived * Re-allow query cancel, and then try to service any cancel request that
* meanwhile (this might in particular include a cancel request fired by * arrived meanwhile (this might in particular include a cancel request
* one of the timeout handlers). * fired by one of the timeout handlers). Since we are in a signal
* handler, we mustn't call ProcessInterrupts unless ImmediateInterruptOK
* is set; if it isn't, the cancel will happen at the next mainline
* CHECK_FOR_INTERRUPTS.
* *
* Note: a longjmp from here is safe so far as our own data structures are * Note: a longjmp from here is safe so far as our own data structures are
* concerned; but on platforms that block a signal before calling the * concerned; but on platforms that block a signal before calling the
@ -341,7 +349,9 @@ handle_sig_alarm(SIGNAL_ARGS)
* unblocking any blocked signals. * unblocking any blocked signals.
*/ */
RESUME_INTERRUPTS(); RESUME_INTERRUPTS();
CHECK_FOR_INTERRUPTS(); ImmediateInterruptOK = save_ImmediateInterruptOK;
if (save_ImmediateInterruptOK)
CHECK_FOR_INTERRUPTS();
errno = save_errno; errno = save_errno;
} }

Loading…
Cancel
Save