| Allegro CL version 8.2 Moderate update since 8.2 release. 8.1 version |
1.0 Symmetric Multiprocessing introduction

This document describes the Symmetric Multiprocessing (SMP) implementation in Allegro CL on certain platforms. See also the multiprocessing.htm document, which describes the older Lisp-based uniprocessor implementation; most of that functionality applies to SMP as well.
Starting in Allegro CL 9.0, Allegro CL on certain platforms has images that allow using multiple independent processors. This may result in dramatic speedups in applications as the additional processor resources are used.
Note that the SMP version works on a machine that has only one processor, and using the SMP version may be more efficient on such machines even though (obviously) only one process can run at a time.
But there is a downside. Using multiple processors simultaneously can make programs that run correctly on a single processor fail, either by producing incorrect results or by signaling unexpected errors or even by failing altogether. Here are three areas where such failures can occur:
Some of these problems (but not number 1, so long as the Lisp heap was not modified by foreign code) already exist in a non-SMP multiprocessing Lisp, as Allegro CL has been for many years. But because the system-initiated process-switch latency was so long (on the order of a second) compared to the execution speed, the errors either never occurred or were so rare that the problem was never properly identified or even noticed. In an SMP Lisp, where processes simply run simultaneously and there is no system process-switch latency, these rare or never-occurring events become much, much more likely, and subtle coding errors which could be ignored before become unignorable.
Please realize, first, that essentially all problems arise from object modification, although the failure can appear in a process reading a value (because the modification can result in inconsistencies); and second, that tools such as locks (see Process locks in multiprocessing.htm) and gates (see Gates in multiprocessing.htm) are available to ensure that processes do not interfere with each other. If you use SMP, it is your program's responsibility to ensure that modifications are consistent and do not cause problems.
The important point is that programs which ran without error (or had such a small chance of error that they for all practical purposes ran without error) may fail in an SMP environment, either because of mysterious and unexpected errors or (and this is likely worse) by not signaling an error and instead producing very incorrect results.
This document describes the problems and also the new tools which will assist in making your program SMP-safe. It discusses the general problem and the macros and data elements introduced in Allegro CL 9.0 to control concurrent access. Many of the macros have keyword arguments that can be used to control how the macro expands in a non-SMP Lisp. This makes it easier to write source code that works efficiently whether it is compiled for running in an SMP Lisp or a non-SMP Lisp.
In a non-SMP Lisp only one thread of control at a time is ever running Lisp code. Lisp execution is gated by a mutex and low-level runtime code selects one lisp process to execute at a time. A Lisp process can voluntarily yield control, either explicitly or by waiting on some event. A running Lisp process can also be preempted, temporarily ceding control to another process in response to some event.
In a non-SMP Lisp interleaved execution of multiple processes is coarse-grained. A process can not lose control at arbitrary points in its execution. Running Lisp code checks periodically, at what might be called safe points, to see if an interruption in the normal flow of sequence has been requested. If so, the state of the running process is saved, the state of the preempting process restored, and that second process allowed to execute.
Lisp-level handling of asynchronous signals is restricted to happen at these same safe points. This is true in SMP-enabled Lisps as well as in non-SMP Lisps. Low-level, non-Lisp support code keeps a record of signals seen but not yet processed. This code responds to asynchronous signals like SIGALRM by recording the event in an unprocessed-signals buffer and setting a global flag to show that a signal has been seen, and then returning to the lisp processing that was happening when the signal was recognized. At the next safe point, the state of the global flag will be seen and Lisp-level processing of the signal will occur. In a non-SMP Lisp this could result in a process switch.
In a non-SMP Lisp, controlling concurrent access to shared objects often reduces to controlling when these process-switch and signal-handling events can occur. By ensuring that a block of Lisp code will execute without interruption, an application running in a non-SMP Lisp can provide arbitrarily complex operations that will look atomic to multiple processes. This is obviously inadequate in an SMP Lisp, where multiple threads of control are running Lisp code simultaneously.
Most symbols naming SMP functionality are in the multiprocessing (nickname mp) package or in the excl package. Some functionality is loaded into all images and other functionality is in the :smputil module. You should load this module with a form like

(eval-when (compile load eval) (require :smputil))

(Just (require :smputil) is sufficient when typed to the top level.)
Consider the following example. We have a variable VAR whose initial value is the fixnum 8. Various processes are running in Lisp and independently incrementing or decrementing the value of VAR with calls to incf and decf. Our first attempt is:
;; In process-1:
(incf var)

;; In process-2:
(incf var)

;; In process-3:
(decf var)
But this might cause us trouble, because (incf var) macroexpands to (setq var (+ var 1)). It is possible that (+ var 1) is evaluated and then there is a process switch before that value is stored as the value of VAR, so when the process gets control again, it might store what has become the wrong value.
But in a non-SMP Lisp, we can prevent a process switch with without-interrupts, so we recode as follows:
;; In process-1:
(without-interrupts (incf var))

;; In process-2:
(without-interrupts (incf var))

;; In process-3:
(without-interrupts (decf var))
In non-SMP Lisp, once process-1, process-2, and process-3 have completed (or at least executed the forms shown), the value of VAR will be 9 (8 + 1 + 1 - 1). We do not actually know in what order the additions and subtraction will occur, but the without-interrupts guarantees that each will complete once it has started without any process switch and so we can be confident of the final result.
In an SMP Lisp, the without-interrupts macro does not have the effect of preventing a process switch because all processes can be running simultaneously on different processors, and so the whole notion of 'process switch' is problematic. So the following can happen:
1. process-1 reads the value 8 from VAR and places that value in one of its registers.
2. Before process-1 does its calculation and stores a new value in VAR, process-2 reads the value (still 8) from VAR and places it in one of its registers.
3. process-1 adds 1 to 8 and stores the result (9) in VAR.
4. After process-1 stores its new value but before process-2 stores a new value, process-3 reads the value (9) from VAR and places it in one of its registers.
5. process-2 adds 1 to 8 and stores the result (9) as the value of VAR.
6. process-3 subtracts 1 from 9 and stores the result (8) as the value of VAR.
When all this completes, the value of VAR is 8, not 9. The value could indeed end up as 7 (process-3 reads first and stores last), 8 (as shown), 9 or 10 (left as an exercise to the reader).
So code which depended on without-interrupts to ensure that certain operations would not be interfered with, and that therefore certain final results could be depended upon, will no longer work as expected.
The fundamental issue is that on a non-SMP Lisp, process locking (ensuring that code in a single Lisp process is guaranteed to execute to completion without interruption) gave you object locking (ensuring that only one process could read or set a value) as a side effect. In an SMP Lisp, that is no longer true.
New macros (see Section 3.0 New macros and related functionality) provide object locking to allow the example here and related examples to run as desired. The macros incf-atomic and decf-atomic are specific to this example and others to related examples. Of course, programmers can also use more complex mechanisms to lock objects and processes and avoid undesired concurrency, although these techniques often have a significant overhead.
Here is the code that guarantees that all the incf's and decf's will run as desired (but in an undefined order). This works in SMP and non-SMP Lisps.
;; In process-1: (incf-atomic var) ;; In process-2: (incf-atomic var) ;; In process-3: (decf-atomic var)
All platforms that support SMP Lisp will also have non-SMP images available. Current programs that depend on the fact that Lisp code is run on a single processor will thus continue to work using the non-SMP images without any changes relating to SMP (beyond trivial suppression of compiler warnings, done by evaluating (setq excl::*warn-smp-usage* nil)).
Therefore there is no need for users who do not wish to use SMP to make any changes to their programs. The rest of this document is directed at users who do wish to use SMP. They will have to determine whether their code must be modified (usually, some changes must be made) using the information in this document.
Three macros are used in non-SMP Lisps to provide concurrency control: excl::atomically, without-interrupts, and without-scheduling. (The symbol excl::atomically was not exported and the associated macro never documented, but some users were aware of it; if you are unfamiliar with excl::atomically, you need not worry about it as its use is now deprecated. You just have less code to modify).
These macros are all very light-weight in processing overhead and are much cheaper than using a process-lock. They have the additional advantage that they do not require that the multiprocessing package be loaded. This allows a programmer to write code that runs correctly in a multi-process application, but still runs efficiently in a single-process application. All three rely on the coarse-grained scheduling of non-SMP multiprocessing. All three represent problem situations for multiprocessing under SMP. They are all deprecated in 8.2 and later Lisps, although they are still available and still perform the same functions in a non-SMP Lisp as they did before 8.2.
New macros provide the same functions as the deprecated macros, in ways that are compatible with an SMP environment. The new macros are described in Section 3.0 New macros and related functionality. The deprecated macros generate compile-time warnings (which can be globally muffled by evaluating (setq excl::*warn-smp-usage* nil)).
The excl::atomically macro was never documented and the symbol naming it was never exported. However, some users made use of it. Users who never used atomically can skip this section.
excl::atomically is wrapped around a sequence of forms:
(excl::atomically . BODY)
This acted exactly like (progn . BODY) as far as code generation was concerned, but the compiler would produce a diagnostic if BODY involved any operations that might result in a process switch or a garbage collection. Such opportunities are inherent in almost any out-of-line call, in object allocation, in anything that might signal an error, and in several other low-level actions. If BODY contained none of these, then the compiler would accept the form and the programmer could be confident that BODY would execute atomically as far as Lisp processes were concerned.
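As an illustration only, here is the kind of body excl::atomically would accept: a hypothetical sketch (the function and its arguments are invented for this example) containing nothing but in-line operations, with no allocation and no out-of-line calls. Whether a given form compiles without the diagnostic depends on the compiler and on the declared optimization settings.

```lisp
;; Hypothetical example: copy one simple-vector slot to another.
;; With these declarations both accesses compile to in-line code,
;; nothing conses, and nothing signals out-of-line, so the compiler
;; would accept the form.
(defun copy-slot (v i j)
  (declare (type simple-vector v) (type fixnum i j)
           (optimize (speed 3) (safety 0)))
  (excl::atomically
    (setf (svref v i) (svref v j))))
```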
A major use of excl::atomically was to wrap a section of code in which it was required that garbage collection not relocate some object or objects. This requirement was independent of multiprocessing concerns, and happened most often in low level code that was trying to process an array's elements very efficiently.
In some cases, the excl::atomically form was used not to assure atomicity, or to guarantee gc-free running, but to wrap a section of code that needed to be as fast as possible. If some infelicitous change to the compiler caused it to start generating out-of-line calls where in-line code was expected, the presence of the excl::atomically wrapper made sure there would be a compile-time warning.
The new macro fast-and-clean replaces excl::atomically. In an SMP Lisp, (fast-and-clean ...), which is the functional equivalent of (excl::atomically ...), cannot be guaranteed to be atomic because of the nature of SMP and atomicity. It will, however, prevent a gc from happening while the form is executing, because gc's can only happen when a process allocates something or at one of the same "safe points" that allow interrupts (described below in Section 2.2 Deprecated macro: without-interrupts). The fast-and-clean form assures us we have neither of these. Even so, if control of garbage collection is the issue, using the with-pinned-objects macro is preferred.
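When preventing the collector from moving particular objects is the real goal, with-pinned-objects is the preferred tool, as noted above. A hedged sketch follows; the foreign function fill-buffer is invented for the example, and the exact lambda list of with-pinned-objects should be checked against its reference page.

```lisp
;; Pin the string so a gc cannot relocate it while its address
;; is held by foreign code. fill-buffer is a hypothetical
;; foreign call that writes into the buffer.
(let ((buf (make-string 512)))
  (excl:with-pinned-objects (buf)
    (fill-buffer buf)))
```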
The without-interrupts macro works as follows:
(without-interrupts . BODY)
acts almost exactly like

(let ((excl::*without-interrupts* t)) (progn . BODY))
excl::*without-interrupts* is a special variable (named by an unexported symbol and so not documented) which gates the processing of asynchronous signals. When an asynchronous signal triggers Lisp's low-level signal handler, the signal is queued and a flag is set to indicate the situation. This happens without any Lisp code executing, and does not interrupt the running Lisp code. While executing, Lisp code polls the flag periodically, checking for the signal-has-happened situation.

These checks are made at safe points in the code, where the Lisp system is ready to allow execution of arbitrary signal-handling code in an interrupt-the-application mode; handling the signals at the Lisp level could result in a process-interrupt, or cause a process switch in a non-SMP Lisp.
If excl::*without-interrupts* is nil, then the safe-point check happens normally, and any queued signals get handled. If excl::*without-interrupts* is non-nil, then queued signals are ignored. No process-interrupt will be handled, and in a non-SMP Lisp, no automatic process switch can happen when excl::*without-interrupts* is non-nil, although an explicit process-wait or other scheduling operation would break through this barrier.
Ignored signals are not discarded, and signals caught while excl::*without-interrupts* is non-nil will still be added to the queue of pending signals, to be processed at some safe point after leaving the without-interrupts regime. The way in which without-interrupts differs from the simple let form shown above has to do with details of safe and efficient processing, ensuring that we pay as little as possible in computation overhead to ignore the pending signals and that when excl::*without-interrupts* returns to a nil value, we handle any pending signals at the next safe point.
However, even though interrupts are delayed and process switches from the code running within a without-interrupts are also inhibited, object locking (a side effect in a non-SMP Lisp) is no longer guaranteed, and gc's may occur even if the code within the without-interrupts form does no consing and so does not itself trigger a gc.
The new macro with-delayed-interrupts replaces without-interrupts. In a non-SMP Lisp, with-delayed-interrupts expands to the same code as without-interrupts (but does not produce a compiler warning). In an SMP Lisp, interrupts are delayed, so the code within the macro runs to completion once it is started, but objects are not locked as other processes can run.
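For example, code that must not be unwound mid-update by a process-interrupt can use the new macro. This sketch (the two special variables are invented for the example) shows what is and is not protected:

```lisp
;; with-delayed-interrupts keeps an interrupt from landing between
;; the two related updates, so the pair runs to completion once
;; started. It does NOT lock *head* or *count* against other
;; processes in an SMP Lisp; that requires the object-locking or
;; atomic operators described in Section 3.0.
(with-delayed-interrupts
  (setq *head* (cons item *head*))
  (setq *count* (+ *count* 1)))
```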
sys:without-scheduling worked as follows:
(without-scheduling . BODY)
worked like
(let ((*disallow-scheduling* t)) (progn . BODY))
*disallow-scheduling* is a special variable that gates automatic process switches in non-SMP Lisps. It is checked when other factors indicate it is time to stop the currently running process and let another one have control for a while. If it is nil, then the switch will happen; if non-nil, the switch will be prohibited and the current process will continue running. sys:*disallow-scheduling* finds some specialized uses, as when it is important to honor sys:with-timeout requests (which require handling timer signals, which would be ignored by without-interrupts) without allowing a process switch to occur.
There is no exact replacement for sys:without-scheduling since the model for processes has changed and the concept of process switching is nearly without meaning. If you use sys:without-scheduling, ask why. If the purpose was object locking (ensuring that values are not read or written by other processes while code runs), you must use the new object-locking routines. If the purpose was to ensure that the process ran to completion while still allowing signal processing (for sys:with-timeout, for example), that is no longer supported.
We provide new macros for efficient object-level synchronization. Some of these involve locking objects, and others are atomic read-modify-write primitives. We also provide a set of macros to perform those functions of the deprecated macros that did not serve to synchronize access to specific objects.
A Lisp that actually supports SMP has :smp on its *features* list. A Lisp that supports these new macros has :smp-macros on its *features* list. Thus a 9.0 smp-enabled Lisp has both :smp and :smp-macros on its *features* list, while an 8.2 Lisp, a patched 8.1 Lisp, or a non-SMP 9.0 Lisp has the feature :smp-macros but not :smp. See Reader macros and cl:*features* in implementation.htm for more information on features.
All the symbols naming the functions, macros, classes and structure types added as part of SMP are exported from the excl package.
The first three new macros provide the non-concurrent-access-related uses of the old excl::atomically and without-interrupts macros.
When an atomic operation fits the pattern
(setf [place] (foo [place]))
and [place] is one of several inline-accessible 'slots' like (car x), the compiler can generate code to ensure that the value in [place] doesn't change between the initial read and the subsequent write. There are several macros that provide special cases of this operation, and a general conditional-update form on which all of them are based. [place] can be any of the following setf-places:
These places are a subset of places legal for setf. In particular, they do not include local or special bound variables. This is perhaps not surprising because local places are not shared between multiple processes, but the actual reason for the restriction has to do with hardware/system limits on where low-level atomic read-modify-write can be done.
The relevant macros are:
(when (eq [place] oldval-form)
  (setf [place] newval-form)
  t)

except that the subforms of [place] are evaluated just once, and the comparison and the store (if there is one) are an atomic sequence.
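The conditional-update form above is the building block for the usual compare-and-swap retry loop. The sketch below illustrates the pattern; the macro name atomic-conditional-setf and its argument order (place, new value, old value) are assumptions to be verified against the reference pages of this release.

```lisp
;; Sketch of a retry loop built on the atomic conditional update.
;; (car cell) is one of the inline-accessible places the compiler
;; supports for atomic operations.
(defun atomic-add-to-car (cell delta)
  (loop
    (let* ((old (car cell))
           (new (+ old delta)))
      ;; If (car cell) still holds old, store new and return;
      ;; otherwise another process intervened, so try again.
      (when (atomic-conditional-setf (car cell) new old)
        (return new)))))
```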
(globalq x) is a macro shorthand for (system:global-symbol-value 'x).
None of the deprecated macros provide effective concurrency control in an SMP Lisp. New macros are provided to give the programmer tools for controlling concurrent access. There is no way to make the change to SMP automatic and painless. The crux of the problem is that in the absence of SMP, one-sided concurrency control works. That is, a process modifying a shared object could wrap a series of modifications with pre-8.2 forms like those using without-interrupts, and be assured that all processes would see those changes as an atomic transaction. The forms themselves assured that no other process would see the object until all the changes were in place.
A confounding factor is that pre-8.2 concurrency was very coarse-grained. Processes were preempted at a very low frequency, typically once in 2 seconds, unless they explicitly waited on some condition. Even with no attempt at concurrency control, it was rare to see an inconveniently-timed process switch break a transaction that should have been atomic.
In an SMP Lisp, concurrency is as fine-grained as it could possibly be. Writers and readers must all cooperate to ensure that access to shared structures is legitimate.
Allegro CL has a set of object-locking macros. These are lighter weight and less flexible than the mp:process-lock features, and do not rely on the mp package. They must be used with care to avoid deadlocks.
The macros and related functionality are:
synchronizing-structure: this structure class is used as a base structure when instances of a structure class are to be locked with the with-locked-structure macro.

basic-lock: this structure class is a simple extension of synchronizing-structure, adding a name slot.

lockable-object: this is a mixin class that allows subclasses to be used in the with-locked-object macro.
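A sketch of how these pieces fit together follows. The counter structure and its accessors are invented for the example, and the exact argument list of with-locked-structure (shown here taking just the structure to lock) is an assumption to check against its reference page.

```lisp
;; A counter structure that inherits the lightweight structure
;; lock from synchronizing-structure.
(defstruct (counter (:include excl:synchronizing-structure))
  (value 0))

(defun counter-incf (c)
  ;; Hold the structure's own lock across the read-modify-write
  ;; so concurrent increments are not lost.
  (excl:with-locked-structure (c)
    (incf (counter-value c))))
```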
A sharable lock allows multiple user threads to access a resource simultaneously when it is safe and appropriate to share it. When exclusive use of the resource is required, a thread can ask for exclusive access. During exclusive access, all shared access is blocked. A typical use case is a data structure that can be safely accessed in read-only mode by many threads but requires exclusive access while the data structure is modified and may be temporarily in an inconsistent state.
A sharable-lock instance is created with make-sharable-lock. Locking contexts are provided with the macro with-sharable-lock. An application can manage locking explicitly with sharable-lock-lock and sharable-lock-unlock.
An important parameter in the definition of Allegro CL sharable locks is the max-shared count that specifies the maximum number of simultaneous shared users of a lock. When the maximum number of simultaneous shared users is reached additional attempts to share the lock must wait until some shared threads release control.
Once a thread requests exclusive access, subsequent requests for shared access must wait until the exclusive request is honored and completed. Because there is a limit on the number of shared users, there is an upper bound on the delay that the exclusive request can endure. While an exclusive locker is running, shared requests are accepted until the max-shared limit is reached. These shared requests will be honored as soon as the exclusive user releases the lock. If another exclusive request arrives before the first is done, it will wait until the first exclusive request and up to max-shared shared requests are honored.
If several threads compete for exclusive access, it is possible to create a situation where an exclusive request is delayed indefinitely. The only solution in this case is to reduce the frequency of exclusive requests or to speed up their handling. It is also possible to delay shared requests indefinitely if too many threads compete for shared access. Such a situation can be alleviated by increasing the max-shared value. The size of a sharable lock is proportional to the max-shared value, but values in the thousands are still reasonable.
In the current implementation of sharable-lock, the fairness policy is strong writer preference. When an exclusive lock is requested, only the current shared users are allowed to continue. New requests for a shared (reader) lock must wait until the exclusive (writer) request is honored and completed. While an exclusive lock is in effect, shared requests up to the max-shared limit are allowed but remain pending until the exclusive lock is released.
sharable-lock: the class of a sharable-lock object.
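The read-mostly table use case described above might be sketched as follows. The exact lambda lists of make-sharable-lock and with-sharable-lock, including the :max-shared argument and the :shared / :exclusive mode designators, are assumptions to verify against their reference pages.

```lisp
(defvar *table* (make-hash-table))
(defvar *table-lock* (mp:make-sharable-lock :max-shared 16))

;; Many readers may hold the lock simultaneously.
(defun table-get (key)
  (mp:with-sharable-lock (*table-lock* :shared)
    (gethash key *table*)))

;; A writer takes exclusive access while the table may be
;; temporarily inconsistent.
(defun table-put (key val)
  (mp:with-sharable-lock (*table-lock* :exclusive)
    (setf (gethash key *table*) val)))
```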
A barrier instance allows several threads to share progress information or to synchronize their behavior. A newly created barrier is initialized in the enabled state with an expected count and an arriver count of zero. As each thread arrives at or passes through the barrier, the arriver count is incremented. All barrier-wait calls will block until the arriver count reaches the specified expected count.
The state of a barrier can be set to disabled. A disabled barrier is simply ignored: passing through a disabled barrier has no effect, and a call to barrier-wait returns immediately.
A typical application of barriers is to synchronize several threads to start some computation simultaneously. The program creates a barrier with a count of N; each of the N threads calls barrier-wait and blocks. When the slowest thread reaches the barrier, all the threads are released.
Another application of barriers is to allow one thread to detect when several threads have completed some task. The control thread creates a barrier with a count of (N+1) and calls barrier-wait; each of the N worker threads calls barrier-pass-through when it completes the task. When the Nth thread calls barrier-pass-through, the barrier-wait in the control thread returns.
barrier: the class of a barrier object.
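The control-thread pattern from the preceding paragraph can be sketched as follows. The constructor name mp:make-barrier and the signatures shown are assumptions based on the operator names used in this section.

```lisp
;; One control thread waits for n workers to finish their tasks.
(defun run-and-wait (n work-fn)
  (let ((barrier (mp:make-barrier (+ n 1))))   ; expected count N+1
    (dotimes (i n)
      (let ((id i))   ; fresh binding so each closure sees its own id
        (mp:process-run-function
         (format nil "worker-~d" id)
         (lambda ()
           (funcall work-fn id)
           ;; Each worker announces completion without blocking.
           (mp:barrier-pass-through barrier)))))
    ;; Blocks until the nth worker has passed through.
    (mp:barrier-wait barrier)))
```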
The mp:queue class and the functions for using it are very safe, but relatively heavy-weight. Because they use mp:process-lock to control access to the queue, they use more resources than something based on with-locked-object.
A reasonable development approach would be to use an instance of mp:queue for transmitting information from producers to consumers, with the option of later coding a lighter-weight application-specific mechanism if performance tests showed the queue to be a significant cost factor.
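The producer/consumer approach suggested above might look like the following sketch. mp:enqueue and mp:dequeue are the queue operators; the :wait keyword argument to mp:dequeue is an assumption to confirm against its reference page.

```lisp
(defvar *work-queue* (make-instance 'mp:queue))

;; Producer side: hand an item to the consumers.
(defun submit (item)
  (mp:enqueue *work-queue* item))

;; Consumer side: block until an item is available.
(defun next-item ()
  (mp:dequeue *work-queue* :wait t))
```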
An operator which accesses or sets the value stored in an object is safe when it can be called in a number of processes, otherwise unsynchronized by user code, and the result is the same as if the operations were done sequentially in a single process though in an unspecified order. Therefore, safe operations will not error and will not produce bogus results (results that could never have occurred no matter in what order the operations were done). But safe operations will not necessarily produce the same results in repeated runs.
Unsafe operators are those which could unexpectedly error or could produce results which are bogus according to the definition above.
For operators that do not make any changes (i.e. do not set values), safety means that the result will be meaningful and correct for some instantaneous state of the data structure. Each thread will see a data structure in a consistent state. Operators that modify data are safe if they see data structures in some instantaneous consistent state, and leave the data structure in a consistent state.
It is important to emphasize, so we will say it again, that safe operators are not guaranteed to produce the same result in repeated runs, just a result that is consistent with some possible sequence of evaluation, produced by some interleaving of the operations from the multiple processes.
Basically any operator that modifies a value is unsafe except those listed above and the atomic operators described elsewhere in this document. In particular, sequence functions and their setf's are unsafe. An operator like pushnew, which is supposed to guarantee that duplicates are not added, is unsafe and may produce results that have unexpected duplicates.
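To make the pushnew hazard concrete, this sketch contrasts it with push-atomic, one of the atomic operators (its name follows the incf-atomic naming pattern and should be verified against the reference pages):

```lisp
;; UNSAFE under SMP: two processes can both fail to find x in the
;; list, and both push it, leaving an unexpected duplicate.
(pushnew x (cdr registry))

;; push-atomic makes the push itself atomic (no lost updates),
;; but it performs no duplicate check; a duplicate-free list
;; requires a lock around the whole test-and-push sequence.
(push-atomic x (cdr registry))
```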
The jLinker (see jlinker.htm) and the dbi (or aodbc) modules (see aodbc.htm) are thread safe in SMP or non-SMP Lisps.
Certain standard Common Lisp operations and some standard Allegro CL operations cannot conveniently be made thread safe. Here we summarize many of these issues.
Modifying *features* or *modules* is not thread-safe.
sys:*all-processes* must be treated with the caution described on its documentation page.
It is not possible to suspend all threads programmatically. (All processes are suspended during a garbage collection, but this cannot be generalized.)
Some of your code may not be SMP safe. However, such code will typically load and run in an SMP Lisp (probably producing incorrect results, perhaps silently). If you have a source file which should not be run in an SMP Lisp, add this form to the top:
(eval-when (compile load eval)
  (when (featurep :smp)
    (error "This cannot run in an SMP lisp")))
Copyright (c) 1998-2016, Franz Inc. Oakland, CA., USA. All rights reserved.
This page was not revised from the 8.1 page.
Created 2010.1.21.