There are two distinct groups of signals.
-@subsection Semi-synchronous signals
+@subsection Synchronous signals
-The first group, tentatively named ``semi-synchronous'', consists of
-signals that are raised on illegal instruction, hitting a protected
-page, or on a trap. Examples from this group are:
+This group consists of signals that are raised on illegal instruction,
+hitting a protected page, or on a trap. Examples from this group are:
@code{SIGBUS}/@code{SIGSEGV}, @code{SIGTRAP}, @code{SIGILL} and
@code{SIGEMT}. The exact meaning and function of these signals varies
by platform and OS. Understandably, because these signals are raised
in a controllable manner they are never blocked or deferred.
-@subsection Blockable signals
+@subsection Asynchronous or blockable signals
The other group is of blockable signals. Typically, signal handlers
block them to protect against being interrupted at all. For example
atomic section the pseudo-atomic-interrupted flag is set, the signal
and its context are stored, and all deferrable signals blocked. This
is to guarantee that there is at most one pending handler in
-SBCL. While the signals are blocked, the responsibilty of keeping
+SBCL. While the signals are blocked, the responsibility of keeping
track of other pending signals lies with the OS.
On leaving the pseudo atomic section, the pending handler is run and
deferrable by @code{WITHOUT-INTERRUPTS} is @code{SIG_STOP_FOR_GC}. It
is deferred by pseudo atomic and @code{WITHOUT-GCING}.
+@subsection When are signals handled?
+
+At once or as soon as the mechanism that deferred them allows.
+
+First, if something is deferred by pseudo atomic then it is run at the
+end of pseudo atomic without exception. This holds even when both a
+GC request (or a @code{SIG_STOP_FOR_GC}) and a deferrable signal such as
+@code{SIG_INTERRUPT_THREAD} interrupt the pseudo atomic section.
+
+Second, an interrupt deferred by @code{WITHOUT-INTERRUPTS} is run when
+interrupts are enabled again. GC cannot interfere.
+
+Third, if GC or @code{SIG_STOP_FOR_GC} is deferred by
+@code{WITHOUT-GCING} then the GC or stopping for GC will happen when
+GC is no longer inhibited. Interrupts cannot delay a GC.
+
@node Implementation warts
@section Implementation warts
-@subsection RT signals
+@subsection Miscellaneous issues
-Sending and receiving the same number of signals is crucial for
-@code{INTERRUPT-THREAD} and @code{sig_stop_for_gc}, hence they are
-real-time signals for which the kernel maintains a queue as opposed to
-just setting a flag for ``sigint pending''.
+Signal handlers automatically restore @code{errno} and fp state, but
+@code{arrange_return_to_lisp_function} does not restore @code{errno}.
-Note, however, that the rt signal queue is finite and on current linux
-kernels a system wide resource. If the queue is full, SBCL tries to
-signal until it succeeds. This behaviour can lead to deadlocks, if a
-thread in a @code{WITHOUT-INTERRUPTS} is interrupted many times,
-filling up the queue and then a gc hits and tries to send
-@code{SIG_STOP_FOR_GC}.
+@subsection POSIX -- Letter and Spirit
-@subsection Miscellaneous issues
+POSIX restricts signal handlers to using only a narrow subset of POSIX
+functions, and declares anything else to have undefined semantics.
-Signal handlers should automatically restore errno and fp
-state. Currently, this is not the case.
+Apparently the real reason is that a signal handler is potentially
+interrupting a POSIX call: so the signal safety requirement is really
+a re-entrancy requirement. We can work around the letter of the
+standard by arranging to handle the interrupt when the signal handler
+returns (see @code{arrange_return_to_lisp_function}). This does,
+however, in no way protect us from the real issue of re-entrancy: even
+though we would no longer be in a signal handler, we might still be in
+the middle of an interrupted POSIX call.
-Furthormore, while @code{arrange_return_to_lisp_function} exits, most
-signal handlers invoke unsafe functions without hesitation: gc and all
-lisp level handlers think nothing of it.
+For some signals this appears to be a non-issue: @code{SIGSEGV} and
+other synchronous signals are raised by our code for our code, and so
+we can be sure that we are not interrupting a POSIX call with any of
+them.
+
+For asynchronous signals like @code{SIGALRM} and @code{SIGINT} this
+is a real issue.
+
+The right thing to do in multithreaded builds would probably be to use
+POSIX semaphores (which are signal safe) to inform a separate handler
+thread about such asynchronous events. In single-threaded builds there
+does not seem to be any other option aside from generally blocking
+asynchronous signals and listening for them every once in a while at safe
+points. Neither of these is implemented as of SBCL 1.0.4.
+
+Currently all our handlers invoke unsafe functions without hesitation.
@node Programming with signal handling in mind
@section Programming with signal handling in mind
@code{INTERRUPT-THREAD} have the same restrictions and considerations
as signal handlers.
-Destructive modification, and holding mutexes to protect desctructive
+Destructive modification, and holding mutexes to protect destructive
modifications from interfering with each other are often the cause of
non-reentrancy. Recursive locks are not likely to help, and while
@code{WITHOUT-INTERRUPTS} is, it is considered untrendy to litter the
that matter) never wait for another thread that's not in
@code{WITHOUT-GCING}.
+As somewhat of a special case, the runtime enforces that
+@code{SIG_STOP_FOR_GC} and @code{SIG_RESUME_FROM_GC} are always unblocked
+when we might trigger a gc (i.e. on alloc or calling into Lisp).
+
@subsection Calling user code
For the reasons above, calling user code, i.e. functions passed in, or
in other words code that one cannot reason about, from non-reentrant
code (holding locks), @code{WITHOUT-INTERRUPTS}, @code{WITHOUT-GCING}
is dangerous and best avoided.
+
+@section Debugging
+
+It is not easy to debug signal problems. The best bet probably is to
+enable @code{QSHOW} and @code{QSHOW_SIGNALS} in runtime.h and, once
+SBCL runs into problems, attach gdb. A simple @code{thread apply all
+ba} is already tremendously useful. Another possibility is to send a
+SIGABRT to SBCL to provoke landing in LDB, if it's compiled with it
+and it has not yet done so on its own.
+
+Note that @code{fprintf}, used by @code{QSHOW}, is not reentrant, and at
+least on x86 Linux it is known to cause deadlocks, so place @code{SHOW}
+and co carefully, ideally in places where blockable signals are blocked.
+Use @code{QSHOW_SAFE} if you like.