Atomic operations in Allegro CL 9.0 SMP

Allegro CL 9.0 on Windows, Linux, and the Mac comes with a version which supports Symmetric Multiprocessing (SMP), which means that it can utilize more than one hardware processor at the same time. (On all supported platforms, a non-SMP version of Allegro CL 9.0 is also available.) This is a second of a series of Tech Corner articles which discuss aspects of SMP.

One feature of SMP is that it is difficult to do certain simple store and read operations on lists, like pop and push and certain read/modify/store operations, such as is done by incf and friends, and pushnew. All these operators work fine in SMP if the place they are operating on is read and stored by one process only, indeed most work fine if one process only stores. But as soon as multiple processes try to store, things can go wrong.

The problem is that operations like incf first read the current value in a place (such as the value of a variable), then increment it by 1, and then store the new value. If in the time that between the read and the store, another process also calls incf (or decf) on the same location, the operations of the two proocesses can get interleaved and the result is not what is expected. The following shows this. Assume the steps are ordered by the exact instant in time they occur:

1. Value of *var* is 6
2. Process 1 calls (incf *var*)
3. Process 2 calls (incf *var*)
4. Process 1 reads value, gets 6
5. Process 2 reads value, gets 6
6. Process 1 adds 1 to 6, gets 7
7. Process 2 adds 1 to 6, gets 7
8. Process 1 stores 7 as value of *var*
9. Process 2 stores 7 as value of *var*

The result is *var* ends up with value 7 when it should have value 8. More processes and combined incf's and decf's can result is an array of potential end values, almost all incorrect, if by 'correct' we mean the result if all the incf's and decf's were done in a single thread.

Here we have a log of doing this calculation running Allegro CL 64-bit 9.0 SMP on a 4-processor Windows 7 machine. We define functions that do incf's and decf's 100 times, and we run them in twenty processes. Because we have an equal number of incf and decf calls, the value of *v* at the end should be 0. But it is -4 in this example and can be any value between -1000 (no incf succeeds) to 1000 (no decf succeeds).

cg-user(1): (defvar *v* 0)
*v*
cg-user(2): (defun foo ()
              (dotimes (i 100) (incf *v*) (sleep 1)))
foo
cg-user(3): (defun bar ()
              (dotimes (i 100) (decf *v*) (sleep 1)))
bar
cg-user(4): (compile 'foo)
foo
nil
nil
cg-user(5): (compile 'bar)
bar
nil
nil
cg-user(6): (dotimes (i 10)
              (mp:process-run-function (symbol-name (gensym)) 'foo)
              (mp:process-run-function (symbol-name (gensym)) 'bar))
nil
cg-user(7): :proc
P   Id Bix Dis Sec   dSec Pri State    Process Name, Whostate, Arrest
* 20f8  14   0   2    2.2   0 waiting  g1181, Sleeping
* 2058  25   0   2    2.2   0 runnable g1192
* 31e8  23   0   2    2.0   0 runnable g1190
*  47c   3   0   2    1.7   8 runnable IDE GUI
* 33d8  15   0   1    1.4   0 runnable g1182
* 154c   7   0   1    1.4   0 waiting  g1174, Sleeping
* 25b4   9   0   1    1.3   0 waiting  g1176, Sleeping
* 112c  22   0   1    1.3   0 runnable g1189
* 2c78   8   0   1    1.3   0 waiting  g1175, Sleeping
* 2750  11   0   1    1.2   0 runnable g1178
* 37c8   6   0   1    1.2   0 runnable g1173
* 1904  16   0   1    1.2   0 waiting  g1183, Sleeping
* 2054  21   0   1    1.0   0 runnable g1188
* 1d14  20   0   1    1.0   0 waiting  g1187, Sleeping
* 1b3c  12   0   1    0.8   0 runnable g1179
* 1198  10   0   1    0.7   0 waiting  g1177, Sleeping
* 2bf8  24   0   1    0.6   0 waiting  g1191, Sleeping
* 3280  19   0   1    0.6   0 runnable g1186
* 3694  13   0   0    0.4   0 waiting  g1180, Sleeping
* 2ac4  17   0   0    0.3   0 waiting  g1184, Sleeping
* 234c   2   1   0    0.2   0 waiting  Initial Lisp Listener, waiting-for-input
* 2f9c  18   0   0    0.2   0 runnable g1185
*  fa0   5   0   0    0.0   0 runnable Listener 1
cg-user(8): :proc
P   Id Bix Dis Sec   dSec Pri State    Process Name, Whostate, Arrest
*  47c   3   0   2    0.3   8 runnable IDE GUI
*  fa0   5   0   0    0.0   0 runnable Listener 1
* 234c   2   0   0    0.0   0 waiting  Initial Lisp Listener, waiting-for-input
cg-user(9): *v*
-4
cg-user(10): 

To handle incf and decf in SMP, we have added the macros incf-atomic and decf-atomic. The revised functions are then:

cg-user(1): (defvar *v1* 0)
*v*
cg-user(2): (defun foo ()
              (dotimes (i 100) (incf-atomic *v1*) (sleep 1)))
foo
cg-user(3): (defun bar ()
              (dotimes (i 100) (decf-atomic *v1*) (sleep 1)))
bar

Now any calls to foo and bar in any process in any order will, when they have all completed, result in *v1* having the expected value. However, the order of calculation is not guaranteed.

push, pop, and pushnew

These macros also have problems with SMP. The push macro places a new item at the front of a list. push works be taking an item, creating a new cons object, making the cdr point to the current first cons of the list, and making the list header point to the new cons. (Non-empty lists are stored as a collection of conses and a header. The header points to the first cons in the list, and the cdr of each cons points to the next list cons, except the last where the cdr is nil -- ignoring, for this discussion, circular lists.) pushnew works by first testing whethe rthe new item is on the list, and adding it like push if it is not. popmoves (and returns) the first item from the list.

Here we illustrate the problem with push. The following is what can happen with two processors both evaluating a push on the same list:

Processor 1 get the pointer to the first list item
Processor 2 get the pointer to the (same) first list item
Processor 1 creates a new cons with the first argument to 
    push the car and the pointed just obtained the cdr
Processor 2 creates a new cons with the first argument to 
    push the car and the pointed just obtained the cdr
Processor X replaces the pointer in the list header 
    with a pointer to its new cons
Processor Y replaces the pointer in the list header 
    with a pointer to its new cons

We put X and Y rather than 1 and 2 in the last lines. Whichever is X will find its item missing from the list after this all completes. This will work correctly with push-atomic used in place of push.

pop has the same issue (two processes could both get the same value from the list rather than one getting the first and the other getting the second). This will work correctly with pop-atomic.

pushnew adds an additional complication: making sure that the object to be added to the list is not already present. If two processes try pushnew with the same item at the same time, even if push-atomic is used, the result may be that the item is placed on the list twice. Rather than having an atomic primitive version of pushnew, implementations have to use locks or similar tools to guarantee the list is updated correctly without unwanted duplicates.

Suitable places for atomic operators

Not all places suitable as a place in a setf form will work with these atomic operators. This section of smp.htm lists the suitable places.

We will discuss locks in the next Tech Corner article. See smp.htm, which is the main document discussing SMP.

Copyright © 2014 Franz Inc., All Rights Reserved | Privacy Statement
Delicious Google Buzz Twitter Google+