@comment node-name, next, previous, up
@chapter Signal handling

* The deferral mechanism::
* Implementation warts::
* Programming with signal handling in mind::

@node Groups of signals
@section Groups of signals

There are two distinct groups of signals.

@subsection Synchronous signals

This group consists of signals that are raised on an illegal
instruction, on hitting a protected page, or on a trap. Examples from
this group are @code{SIGBUS}/@code{SIGSEGV}, @code{SIGTRAP},
@code{SIGILL} and @code{SIGEMT}. The exact meaning and function of
these signals vary by platform and OS. Because these signals are
raised in a controllable manner, they are never blocked or deferred.

@subsection Asynchronous or blockable signals

The other group consists of blockable signals. Typically, signal
handlers block them to protect against being interrupted at all. For
example, @code{SIGHUP}, @code{SIGINT} and @code{SIGQUIT} belong to
this group.
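
As a minimal sketch of a handler that blocks other signals while it
runs, the following uses the @code{sa_mask} field of POSIX
@code{sigaction}. It is not SBCL runtime code: @code{SIGUSR1} and
@code{SIGUSR2} stand in for two blockable signals, and every function
and variable name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* SIGUSR1 and SIGUSR2 stand in for two blockable signals. */
static volatile sig_atomic_t in_usr1_handler = 0;
static volatile sig_atomic_t usr2_runs = 0;
static volatile sig_atomic_t usr2_nested = 0;

static void on_usr2(int signo)
{
    (void)signo;
    usr2_runs++;
    if (in_usr1_handler)
        usr2_nested = 1;   /* would mean on_usr1 was interrupted */
}

static void on_usr1(int signo)
{
    (void)signo;
    in_usr1_handler = 1;
    raise(SIGUSR2);        /* blocked by sa_mask, so it only becomes pending */
    in_usr1_handler = 0;
}                          /* the old mask is restored on return */

/* Installs both handlers, raises SIGUSR1, and returns 0 if SIGUSR2 ran
   exactly once and only after on_usr1 had finished. */
int demo_handler_mask(void)
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_usr2;
    rc = sigaction(SIGUSR2, &sa, NULL);
    assert(rc == 0);

    sigaddset(&sa.sa_mask, SIGUSR2);   /* keep SIGUSR2 out while on_usr1 runs */
    sa.sa_handler = on_usr1;
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    (void)rc;

    raise(SIGUSR1);
    return (usr2_runs == 1 && !usr2_nested) ? 0 : 1;
}
```

The mask applies only while the handler runs. A real runtime would put
every blockable signal into @code{sa_mask}, not just one.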

With the exception of @code{SIG_STOP_FOR_GC} all blockable signals are

@node The deferral mechanism
@section The deferral mechanism

@subsection Pseudo atomic sections

Some operations, such as allocation, consist of several steps and
temporarily break invariants, for instance those of the GC.
Interrupting such operations is therefore dangerous to one's health.
Blocking the signals for each allocation is out of the question, as
the overhead of the two @code{sigsetmask} system calls would be
enormous. Instead, pseudo atomic sections are implemented with a
simple flag.

When a deferrable signal is delivered to a thread within a pseudo
atomic section, the pseudo-atomic-interrupted flag is set, the signal
and its context are stored, and all deferrable signals are blocked.
This guarantees that there is at most one pending handler in SBCL.
While the signals are blocked, the responsibility of keeping track of
other pending signals lies with the OS.

On leaving the pseudo atomic section, the pending handler is run and
the signals are unblocked.
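
The flag-based deferral described above can be sketched in portable C
roughly as follows. This is an illustration under stated assumptions,
not the SBCL implementation: it assumes a single thread, uses
@code{SIGUSR1} as the only deferrable signal, stores just the signal
number instead of the full context, and all names are hypothetical.
The signal is kept blocked after the handler returns by editing the
signal mask saved in the interrupted context.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <ucontext.h>

/* Hypothetical per-thread state; a single thread is assumed. */
static volatile sig_atomic_t pseudo_atomic = 0;
static volatile sig_atomic_t pseudo_atomic_interrupted = 0;
static volatile sig_atomic_t pending_signo = 0;
static volatile sig_atomic_t handled_count = 0;
static volatile sig_atomic_t handled_inside = 0;

/* The "real" handler whose execution must not interrupt the section. */
static void run_handler(int signo)
{
    (void)signo;
    handled_count++;
    if (pseudo_atomic)
        handled_inside = 1;
}

static void low_level_handler(int signo, siginfo_t *info, void *context)
{
    (void)info;
    if (pseudo_atomic) {
        ucontext_t *uc = context;
        pseudo_atomic_interrupted = 1;
        pending_signo = signo;               /* store the signal */
        /* Block the deferrable signal in the interrupted context, so it
           stays blocked after this handler returns. */
        sigaddset(&uc->uc_sigmask, SIGUSR1);
        return;
    }
    run_handler(signo);
}

static void enter_pseudo_atomic(void) { pseudo_atomic = 1; }

static void leave_pseudo_atomic(void)
{
    pseudo_atomic = 0;
    if (pseudo_atomic_interrupted) {
        sigset_t set;
        pseudo_atomic_interrupted = 0;
        run_handler(pending_signo);            /* run the pending handler */
        sigemptyset(&set);
        sigaddset(&set, SIGUSR1);
        sigprocmask(SIG_UNBLOCK, &set, NULL);  /* OS delivers what it kept */
    }
}

/* Returns 0 if nothing ran inside the section and both signals ran after. */
int demo_pseudo_atomic(void)
{
    struct sigaction sa;
    int rc, during;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = low_level_handler;
    sa.sa_flags = SA_SIGINFO;
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    (void)rc;

    enter_pseudo_atomic();
    raise(SIGUSR1);          /* recorded by the flag; signal becomes blocked */
    raise(SIGUSR1);          /* blocked: the OS keeps it pending */
    during = handled_count;
    leave_pseudo_atomic();
    return (during == 0 && handled_count == 2 && !handled_inside) ? 0 : 1;
}
```

Entering and leaving the section costs only a flag write and a flag
test. The mask is touched only on the rare path where a signal
actually arrived.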

@subsection @code{WITHOUT-INTERRUPTS}

Similar to pseudo atomic, @code{WITHOUT-INTERRUPTS} defers deferrable
signals in its thread until the end of its body, provided it is not
nested in another @code{WITHOUT-INTERRUPTS}.

Because it is used less frequently than pseudo atomic,
@code{WITHOUT-INTERRUPTS} benefits less from the deferral mechanism.

@subsection Stop the world

@code{SIG_STOP_FOR_GC} is something of a special case: it is blockable
but not deferrable by @code{WITHOUT-INTERRUPTS}. It is deferred by
pseudo atomic and @code{WITHOUT-GCING}.

@subsection When are signals handled?

At once or as soon as the mechanism that deferred them allows.

First, if something is deferred by pseudo atomic then it is run at the
end of pseudo atomic without exceptions, even when both a GC request
or a @code{SIG_STOP_FOR_GC} and a deferrable signal such as
@code{SIG_INTERRUPT_THREAD} interrupt the pseudo atomic section.

Second, an interrupt deferred by @code{WITHOUT-INTERRUPTS} is run when
interrupts are enabled again. GC cannot interfere.

Third, if GC or @code{SIG_STOP_FOR_GC} is deferred by
@code{WITHOUT-GCING} then the GC or stopping for GC will happen when
GC is no longer inhibited. Interrupts cannot delay a GC.
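
The three rules above can be modelled as a small state machine. The
sketch below is a toy model with no real signals involved, and the
structure and function names are hypothetical. It only shows that
interrupts and GC requests are deferred by independent inhibitors, and
that leaving pseudo atomic runs whatever it alone was holding back.

```c
#include <assert.h>
#include <stdbool.h>

struct thread_model {
    bool pseudo_atomic;
    int  interrupts_disabled;   /* WITHOUT-INTERRUPTS nesting depth */
    int  gc_inhibited;          /* WITHOUT-GCING nesting depth */
    bool pending_interrupt;
    bool pending_gc;
    int  interrupts_run;
    int  gcs_run;
};

/* Run everything that no mechanism is deferring any more. */
static void model_flush(struct thread_model *t)
{
    if (t->pseudo_atomic)
        return;
    if (t->pending_gc && t->gc_inhibited == 0) {
        t->pending_gc = false;
        t->gcs_run++;
    }
    if (t->pending_interrupt && t->interrupts_disabled == 0) {
        t->pending_interrupt = false;
        t->interrupts_run++;
    }
}

static void model_interrupt(struct thread_model *t)  { t->pending_interrupt = true; model_flush(t); }
static void model_gc_request(struct thread_model *t) { t->pending_gc = true; model_flush(t); }
static void model_leave_pa(struct thread_model *t)   { t->pseudo_atomic = false; model_flush(t); }

static void model_enable_interrupts(struct thread_model *t)
{
    assert(t->interrupts_disabled > 0);
    t->interrupts_disabled--;
    model_flush(t);
}

static void model_allow_gc(struct thread_model *t)
{
    assert(t->gc_inhibited > 0);
    t->gc_inhibited--;
    model_flush(t);
}

/* Returns 0 if the model follows the three rules. */
int demo_deferral_rules(void)
{
    struct thread_model t = {0};

    /* Rule 1: both kinds are deferred by pseudo atomic, both run at its end. */
    t.pseudo_atomic = true;
    model_interrupt(&t);
    model_gc_request(&t);
    if (t.interrupts_run != 0 || t.gcs_run != 0) return 1;
    model_leave_pa(&t);
    if (t.interrupts_run != 1 || t.gcs_run != 1) return 1;

    /* Rule 2: WITHOUT-INTERRUPTS defers the interrupt but not the GC. */
    t.interrupts_disabled = 1;
    model_interrupt(&t);
    model_gc_request(&t);
    if (t.interrupts_run != 1 || t.gcs_run != 2) return 1;
    model_enable_interrupts(&t);
    if (t.interrupts_run != 2) return 1;

    /* Rule 3: WITHOUT-GCING defers the GC but not the interrupt. */
    t.gc_inhibited = 1;
    model_gc_request(&t);
    model_interrupt(&t);
    if (t.interrupts_run != 3 || t.gcs_run != 2) return 1;
    model_allow_gc(&t);
    return (t.gcs_run == 3) ? 0 : 1;
}
```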

@node Implementation warts
@section Implementation warts

@subsection Miscellaneous issues

Signal handlers automatically restore @code{errno} and fp state, but
@code{arrange_return_to_lisp_function} does not restore @code{errno}.
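
For readers unfamiliar with the idiom, restoring @code{errno} in a
handler usually looks like the sketch below. It shows only the general
POSIX pattern, not the SBCL handlers themselves. The names are
hypothetical, @code{SIGUSR1} is a stand-in, and fp state is not
covered.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t handler_saw_ebadf = 0;

static void errno_preserving_handler(int signo)
{
    int saved_errno = errno;   /* value seen by the interrupted code */
    (void)signo;

    /* Any failing call inside the handler clobbers errno; closing an
       invalid descriptor is used here as a harmless example. */
    if (close(-1) == -1 && errno == EBADF)
        handler_saw_ebadf = 1;

    errno = saved_errno;       /* put it back before returning */
}

/* Returns 0 if the interrupted code still sees its own errno value. */
int demo_errno_restore(void)
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = errno_preserving_handler;
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    (void)rc;

    errno = ENOENT;
    raise(SIGUSR1);
    return (handler_saw_ebadf && errno == ENOENT) ? 0 : 1;
}
```

Without the save and restore, code interrupted between a failing call
and its @code{errno} check could act on the wrong error.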

@subsection POSIX -- Letter and Spirit

POSIX restricts signal handlers to use only a narrow subset of POSIX
functions, and declares anything else to have undefined semantics.

Apparently the real reason is that a signal handler is potentially
interrupting a POSIX call, so the signal safety requirement is really
a re-entrancy requirement. We can work around the letter of the
standard by arranging to handle the interrupt when the signal handler
returns (see @code{arrange_return_to_lisp_function}). This does,
however, in no way protect us from the real issue of re-entrancy: even
though we would no longer be in a signal handler, we might still be in
the middle of an interrupted POSIX call.

For some signals this appears to be a non-issue: @code{SIGSEGV} and
other synchronous signals are raised by our code for our code, and so
we can be sure that we are not interrupting a POSIX call with any of

For asynchronous signals like @code{SIGALRM} and @code{SIGINT} this

The right thing to do in multithreaded builds would probably be to use
POSIX semaphores (which are signal safe) to inform a separate handler
thread about such asynchronous events. In single-threaded builds there
does not seem to be any other option aside from generally blocking
asynchronous signals and listening for them every once in a while at
safe points. Neither of these is implemented as of SBCL 1.0.4.

Currently all our handlers invoke unsafe functions without hesitation.

@node Programming with signal handling in mind
@section Programming with signal handling in mind

@subsection On reentrancy

Since they might be invoked in the middle of just about anything,
signal handlers must invoke only reentrant functions or, to be more
precise, async signal safe functions. Functions passed to
@code{INTERRUPT-THREAD} have the same restrictions and considerations

Destructive modification, and holding mutexes to protect destructive
modifications from interfering with each other, are often the cause of
non-reentrancy. Recursive locks are not likely to help, and while
@code{WITHOUT-INTERRUPTS} is, it is considered untrendy to litter the

Some basic functionality, such as streams and the debugger, is
intended to be reentrant, but not much effort has been spent on

@subsection More deadlocks

If functions A and B directly or indirectly lock mutexes M and N, they
should do so in the same order to avoid deadlocks.
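
One common way to get a consistent order is to route every
acquisition through a helper that sorts the two locks, for example by
address. The text above does not prescribe this technique. The sketch
below is just one possible approach using pthreads, with hypothetical
names, and it runs in a single thread to show only the order of
acquisition.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t lock_m = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_n = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t *last_first_locked;   /* for the demo only */
static int shared_counter;

/* Lock two distinct mutexes in a global order (lower address first),
   regardless of the order in which the caller names them. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    assert(a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
    last_first_locked = a;    /* written while holding both locks */
}

static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);  /* release order does not matter */
    pthread_mutex_unlock(b);
}

/* Function A names the locks as M then N. */
static void function_a(void)
{
    lock_pair(&lock_m, &lock_n);
    shared_counter++;
    unlock_pair(&lock_m, &lock_n);
}

/* Function B names them as N then M, yet locks in the same order as A. */
static void function_b(void)
{
    lock_pair(&lock_n, &lock_m);
    shared_counter++;
    unlock_pair(&lock_n, &lock_m);
}

/* Returns 0 if both functions took the same mutex first. */
int demo_lock_order(void)
{
    pthread_mutex_t *first_a, *first_b;

    function_a();
    first_a = last_first_locked;
    function_b();
    first_b = last_first_locked;
    return (first_a == first_b && shared_counter == 2) ? 0 : 1;
}
```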

A less trivial scenario is where there is only one lock involved but
it is acquired in a @code{WITHOUT-GCING} in thread A, and outside of
@code{WITHOUT-GCING} in thread B. If thread A has entered
@code{WITHOUT-GCING} but thread B has the lock when the GC hits, then
A cannot leave @code{WITHOUT-GCING} because it is waiting for the lock
that the already suspended thread B holds. From this scenario one can
easily derive the rule: in a @code{WITHOUT-GCING} form (or pseudo
atomic for that matter) never wait for another thread that's not in
@code{WITHOUT-GCING}.

Somewhat of a special case, it is enforced by the runtime that
@code{SIG_STOP_FOR_GC} and @code{SIG_RESUME_FROM_GC} are always
unblocked when we might trigger a GC (i.e. on alloc or calling into
Lisp).

@subsection Calling user code

For the reasons above, calling user code (i.e. functions passed in, or
in other words code that one cannot reason about) from non-reentrant
code (holding locks), @code{WITHOUT-INTERRUPTS} or
@code{WITHOUT-GCING} is dangerous and best avoided.

It is not easy to debug signal problems. The best bet is probably to
enable @code{QSHOW} and @code{QSHOW_SIGNALS} in runtime.h and, once
SBCL runs into problems, attach gdb. A simple @code{thread apply all
ba} is already tremendously useful. Another possibility is to send a
@code{SIGABRT} to SBCL to provoke landing in LDB, if it is compiled
with it and has not yet done so on its own.

Note that @code{fprintf}, used by @code{QSHOW}, is not reentrant and
at least on x86 Linux it is known to cause deadlocks, so place
@code{SHOW} and co carefully, ideally in places where blockable
signals are blocked. Use @code{QSHOW_SAFE} if you like.
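
To illustrate what reentrant debug output can look like, the sketch
below formats a number by hand and emits it with a single
@code{write} call, which POSIX lists as async signal safe. It is not
how @code{QSHOW_SAFE} is implemented. The function names are
hypothetical and the fixed buffer size is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Write the decimal form of value into out (NUL terminated, truncated
   to size); returns the number of characters stored. No locks, no
   allocation, no stdio. */
size_t safe_format_int(int value, char *out, size_t size)
{
    char tmp[3 * sizeof(int) + 2];   /* digits plus sign, with room to spare */
    size_t n = 0, len = 0;
    unsigned int u = value < 0 ? 0u - (unsigned int)value : (unsigned int)value;

    do {
        tmp[n++] = (char)('0' + u % 10);   /* digits in reverse order */
        u /= 10;
    } while (u != 0);
    if (value < 0)
        tmp[n++] = '-';
    while (n > 0 && len + 1 < size)
        out[len++] = tmp[--n];
    if (size > 0)
        out[len] = '\0';
    return len;
}

/* Print "msg<value>\n" on stderr using only write(). */
void safe_show(const char *msg, int value)
{
    char line[128];
    size_t len = 0;
    int saved_errno = errno;         /* write() may clobber errno */
    ssize_t written;

    while (*msg != '\0' && len < sizeof line - 16)
        line[len++] = *msg++;
    len += safe_format_int(value, line + len, sizeof line - len - 1);
    line[len++] = '\n';
    written = write(STDERR_FILENO, line, len);
    (void)written;
    errno = saved_errno;
}

/* Returns 0 if formatting behaves as intended. */
int demo_safe_format(void)
{
    char buf[16];
    size_t len = safe_format_int(-42, buf, sizeof buf);

    assert(len < sizeof buf);
    safe_show("signal number: ", 11);
    return (len == 3 && strcmp(buf, "-42") == 0) ? 0 : 1;
}
```

Building the whole line first and writing it once also keeps output
from different threads from interleaving within a line in most cases.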