Wait Free Synchronization
Lecture 2
CS380D Distributed Computing I
Linearizability
Each operation appears to take effect instantaneously at some point between its invocation and its response.
Linearizability is a local property
- A concurrent system is linearizable if and only if each individual object is linearizable.
Linearizability is a non-blocking property
- A total operation (defined for all object states) is never required to block.
Wait-Free Data Structures
A wait-free data structure guarantees that any process can complete any operation in a finite number of steps, regardless of the execution speeds of other processes.
A lock-free data structure guarantees that *some* process will complete an operation in a finite number of steps, regardless of the execution speeds of other processes.
Compare-and-Swap
boolean CAS( val *addr, val old, val new ) {
    // The whole body executes atomically
    if ( *addr == old ) {
        *addr = new;
        return true;
    } else
        return false;
}
- CMPXCHG (with “lock” prefix): Intel x86
- Load Linked / Store Conditional: MIPS, PowerPC
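The CAS pseudocode above maps closely onto the C11 standard library. The sketch below is one possible rendering, not the slide's own code: `cas_int` is a hypothetical wrapper name, and it assumes a compiler with `<stdatomic.h>` support.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Mirrors the slide's CAS(addr, old, new) for an int-sized value.
 * atomic_compare_exchange_strong takes the expected value by pointer and
 * overwrites it with the observed value on failure; the local copy hides
 * that detail so the wrapper has the same shape as the pseudocode. */
static bool cas_int(atomic_int *addr, int old_val, int new_val) {
    int expected = old_val;
    return atomic_compare_exchange_strong(addr, &expected, new_val);
}
```

A call succeeds only when the location still holds `old_val`; otherwise the location is left unchanged and the call returns false.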
Wait-Free Synchronization and Consensus
In a system with n processes, a primitive can be used to construct wait-free objects if and only if the primitive can be used to solve the
consensus problem.
Consensus Objects
A consensus object is a concurrent object that implements a consensus protocol:
// A consensus object
class cobj {
    // decide - always returns the same value, which is a value previously passed as input
    value_t decide( value_t input );
}
The consensus number of a concurrent object is the maximum number of processes for which the object can solve a simple consensus problem.
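As an illustration of the `decide` interface, a consensus object can be sketched with a single compare-and-swap word. This is a minimal sketch under stated assumptions: the names `cobj_init`, `cobj_decide`, and the `UNDECIDED` sentinel are hypothetical, and the sentinel is assumed never to be passed as an input.

```c
#include <stdatomic.h>
#include <stdint.h>

#define UNDECIDED INTPTR_MIN   /* assumed sentinel: never a legal input */

typedef struct { _Atomic intptr_t value; } cobj;

static void cobj_init(cobj *c) { atomic_init(&c->value, UNDECIDED); }

/* The first caller's input wins; every caller returns that same value. */
static intptr_t cobj_decide(cobj *c, intptr_t input) {
    intptr_t expected = UNDECIDED;
    if (atomic_compare_exchange_strong(&c->value, &expected, input))
        return input;       /* our CAS installed the decision */
    return expected;        /* CAS failed: expected now holds the decided value */
}
```

Each call takes a bounded number of steps regardless of what other callers do, which is why a CAS word can serve any number of processes.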
Wait-Free Hierarchy
Object                              Consensus Number
compare&swap, FIFO queue w/ peek    ∞
n-register assignment               2n - 2
test&set, fetch&add                 2
atomic read/write registers         1
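To make the test&set row concrete, here is a sketch of the usual two-process consensus protocol built from one test&set bit and two registers. The type and function names are hypothetical, process ids are assumed to be 0 and 1, and each process is assumed to call `tas_decide` at most once.

```c
#include <stdatomic.h>

typedef struct {
    atomic_flag taken;        /* the test&set bit */
    atomic_int proposed[2];   /* proposed[i] is written only by process i */
} tas_consensus;

static void tas_init(tas_consensus *c) {
    atomic_flag_clear(&c->taken);
    atomic_init(&c->proposed[0], 0);
    atomic_init(&c->proposed[1], 0);
}

static int tas_decide(tas_consensus *c, int me, int input) {
    atomic_store(&c->proposed[me], input);      /* publish before competing */
    if (!atomic_flag_test_and_set(&c->taken))
        return input;                           /* won the bit: own input is decided */
    return atomic_load(&c->proposed[1 - me]);   /* lost: adopt the winner's input */
}
```

The loser can find the winner's value only because there is exactly one other process to look at; with three processes a loser cannot tell which of the other two won, which is the intuition behind consensus number 2.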
Universal Objects
An object is universal if it can be used to construct a wait-free implementation of any object.
In a system of n processes, an object is universal if and only if the object has consensus number at least n.
Wait-Free vs Lock-Free
In lock-free data structures, it’s okay to interfere with another process as long as you make progress – this maintains the guarantee that some process makes progress.
In wait-free data structures, interfering with another process is not okay, because this could prevent that process from making progress.
So in wait-free data structures, concurrent operations must cooperate to make sure everyone makes progress.
This cooperation is called helping.
Helping
The general approach for helping is:
- Each process “announces” its intent to perform an operation before starting.
- Once an operation is announced, any process can perform the operation.
- “Eventually” some process performs the operation.
  • Even if the original process crashed.
  • More than one process could attempt to perform the operation. The protocol must ensure that only one succeeds.
Universal Construction - 1
Object is represented as a linked list of cells, each of which represents an operation on the object.
- Order of cells in the list determines order of operations.
struct cell {
    cobj after;      // consensus object with value < cell *after >
                     // null indicates end of the list
    cell *before;    // ptr to previous cell
    seqnum_t seq;    // sequence number; 0 means not threaded
                     // monotonically increasing by 1
    invoc_t inv;     // invocation (operation name and argument values)
    cobj new;        // consensus object with value <new.state, new.result>
}
Universal Construction - 2
class object {
    cell anchor;
    // Shared variables - all processes can read, but only process P can write element P
    cell* announce[1:N];      // Pth element is the cell P is trying to thread
    cell* head[1:N];          // Pth element is last cell P has observed
    // Auxiliary variables - "write only" variables used only for proof purposes
    set of cell concur[1:N];  // the set of cells whose addresses have been stored into
                              // the head array since P's last announcement (stmt 2)
    seqnum_t start[1:N];      // value of max(head[Q].seq) at P's last announcement
    object() {                // constructor
        anchor = { after = new cobj(null), before = null, seq = 1, inv = init, new = cobj(<init.state, 0>) };
        for all Q {
            announce[Q] = anchor;
            head[Q] = anchor;
            concur[Q] = {};
            start[Q] = anchor.seq;
        }
    }
}
Universal Construction - 3
universal(invoc_t what) returns(RESULT)
    // Allocate a cell to represent an operation
1   cell mine = { after = new cobj, before = null, seq = 0, inv = what, new = new cobj }
    // Announce intent to thread cell
2   <announce[P] = mine; start[P] = max(head[1].seq,...,head[N].seq); concur[P] = {};>
    // Locate a cell near the end of the list
3   for (Q = 1; Q <= N; Q++) do
        if (head[P].seq < head[Q].seq) then head[P] = head[Q];
    end for
    // Execute until the cell for this process has been threaded onto the object.
4   while announce[P].seq == 0 do
        // while body on next slide
    end while
13  <head[P] = announce[P]; (for all Q) concur[Q] = concur[Q] U {announce[P]}; >
14  return (announce[P].new.result)
end universal
Universal Construction - 4
    // Execute until the cell for this process has been threaded onto the object.
4   while announce[P].seq == 0 do
5       cell *c = head[P]   // c is our view of the last cell on the list.
6       cell *help = announce[(c.seq mod N) + 1]   // Choose a process to help
        // If the process needs help, try to thread its cell for it. Else thread mine.
7       if help.seq == 0 then prefer = help
        else prefer = announce[P]
        end if
8       d = c.after->decide(prefer)   // Attempt to thread a cell (either mine or help)
        // operation could be nondeterministic, so compute result using consensus obj
9       d.new->decide(apply (d.inv, c.new.state))
10      d.before = c
11      d.seq = c.seq + 1
12      <head[P] = d; (for all Q) concur[Q] = concur[Q] U {d}>
    end while
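The choice of whom to help in statement 6 is a round-robin over sequence numbers. The sketch below isolates only that index arithmetic, not the full construction; `help_index` and `helped_within_n` are hypothetical helper names, and process ids are assumed to run from 1 to n as in the pseudocode.

```c
#include <stdbool.h>

/* Index (1..n) of the process whose announced cell gets preference when the
 * caller's view of the list tail has sequence number seq (statement 6). */
static int help_index(unsigned seq, int n) {
    return (int)(seq % (unsigned)n) + 1;
}

/* Returns true if process p is the preferred one for at least one of the
 * n consecutive tail sequence numbers starting at first_seq. */
static bool helped_within_n(unsigned first_seq, int n, int p) {
    for (unsigned s = first_seq; s < first_seq + (unsigned)n; s++)
        if (help_index(s, n) == p)
            return true;
    return false;
}
```

Because every process is preferred once in any window of n consecutive sequence numbers, an announced cell cannot be passed over indefinitely while the list keeps growing.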
Universal Construction - Example
[Figure sequence, eight frames: the object as a list of cells with fields seq, inv, new, before, after, plus the announce and head arrays.]
- Initial list: anchor (seq 1, init, new = <init_state, 0>, before = NULL, after = <x>); cell x (seq 2, push(A), new = <|A|, 0>, after = <y>); cell y (seq 3, push(B), new = <|A|B|, 0>, after undecided).
- A process announces cell z (seq 0, pop()), sets its head entry to y, and computes help, c, and prefer.
- y.after decides z; z receives seq 4 and new = <|B|, A>; the process's head entry then moves from y to z.
Helping
[Figure sequence, six frames: the same list, with cell z (seq 0, pop()) announced but not yet threaded.]
- A second process announces cell w (seq 0, pop()); the frames label the help, c, and prefer pointers used at each step.
- y.after decides w; w receives seq 4 and new = <|B|, A>.
- w.after decides z; z receives seq 5 and new = <||, B>.
Auxiliary variables
“Write only” variables used only for proofs
- concur[P]: the set of cells whose addresses have been stored into the head array since P's last announcement (stmt 2)
- start[P]: value of max(head[Q].seq) at P's last announcement
Construction is Wait-Free
Lemma 1: The following assertion is invariant:
| concur[P] | > n ==> announce[P] ∈ head
Lemma 2: The following assertion is invariant:
max(head) >= start[P]
Lemma 3: The following is the loop invariant for stmt 3:
max(head[P].seq, head[Q].seq, ...,head[N].seq ) >= start[P]
where Q is the loop index.
Lemma 4: Just before stmt 4:
head[P].seq >= start[P]
Lemma 5: The following is invariant:
| concur[P] | >= head[P].seq - start[P] >= 0
Theorem 14: Construction is linearizable and wait-free.
Proof:
- Linearizable because the order of operations is determined by the order of cells in the list.
- Wait-free because the main loop executes at most N+1 times.
Universal Construction: Summary
Given an object with a sequential specification, we can use a consensus object with consensus number n to create a linearizable, wait-free concurrent object for a system of n processes.
This construction is theoretically important.
This construction is not practically useful.
- Each operation requires two consensus protocols
- The object requires O(N^2) space
Practical Lock-Free Synchronization
Desiderata:
- Ease of reasoning: programmers should be able to construct a correct lock-free data structure, and be able to prove or rigorously argue its correctness, without ending up with a publishable result.
- Performance: programmers should be able to construct lock-free data structures with acceptable performance, and be able to understand and influence the performance of the implementation.
Let’s Start Small
Suppose our object is small enough to fit in a single word ...
lockfree_op( obj_type *obj, args )
    obj_type new_obj;
    do
        new_obj = Load_Linked( obj );
        ret = op( &new_obj, args );
        cc = Store_Conditional( obj, new_obj );
    until ( cc );
    return ret;
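On hardware or languages without Load_Linked / Store_Conditional, the same retry loop is commonly approximated with compare-and-swap. The following is a sketch under that assumption, using C11 atomics; `op_fn` and `fetch_and_add_op` are hypothetical names introduced for illustration. Unlike LL/SC, the CAS succeeds whenever the word holds the same value again, which is harmless here because the whole object is that value.

```c
#include <stdatomic.h>

/* A sequential operation applied to a private copy of the object. */
typedef long (*op_fn)(long *obj, long arg);

static long lockfree_op(_Atomic long *obj, op_fn op, long arg) {
    long old_obj = atomic_load(obj);            /* stands in for Load_Linked */
    for (;;) {
        long new_obj = old_obj;                 /* private copy */
        long ret = op(&new_obj, arg);
        /* Stands in for Store_Conditional; on failure old_obj is refreshed
         * with the current value and the operation is retried. */
        if (atomic_compare_exchange_weak(obj, &old_obj, new_obj))
            return ret;
    }
}

/* Example sequential operation: add arg, return the previous value. */
static long fetch_and_add_op(long *obj, long arg) {
    long prev = *obj;
    *obj += arg;
    return prev;
}
```

The loop is lock-free but not wait-free: a failed CAS means some other process succeeded, yet a given caller may retry an unbounded number of times.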
For bigger (but still small) objects
Add a level of indirection
- Object must be small enough to be copied efficiently
- Object storage must be in a single, fixed-size contiguous block
lockfree_op ( obj_type **obj, args )
    obj_type *old_obj, *new_obj;
    do
        old_obj = Load_Linked( obj );
        new_obj = new( obj_type );
        memcpy( new_obj, old_obj, sizeof( obj_type ) );
        ret = op( new_obj, args );
        cc = Store_Conditional( obj, new_obj );
    until ( cc );
    free( old_obj );
    return ret;
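The copy-and-swing pattern above can be sketched in C11 with an atomic pointer and CAS in place of LL/SC. Everything named here is an assumption for illustration: the `obj_type` layout (a tiny stack), `op_fn`, `push_op`, and the `retired` out-parameter. The sketch deliberately does not free the replaced block, and it assumes blocks are never reused while other processes may still read them, so the pointer CAS cannot be fooled by a recycled address.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int top; int items[8]; } obj_type;  /* hypothetical small object */
typedef int (*op_fn)(obj_type *obj, int arg);

/* Returns op's result; *retired receives the replaced block (not freed here). */
static int lockfree_op(_Atomic(obj_type *) *obj, op_fn op, int arg,
                       obj_type **retired) {
    obj_type *new_obj = malloc(sizeof *new_obj);
    if (new_obj == NULL) abort();
    obj_type *old_obj = atomic_load(obj);
    for (;;) {
        memcpy(new_obj, old_obj, sizeof *new_obj);   /* private copy */
        int ret = op(new_obj, arg);
        /* Swing the shared pointer; on failure old_obj is refreshed. */
        if (atomic_compare_exchange_weak(obj, &old_obj, new_obj)) {
            *retired = old_obj;
            return ret;
        }
    }
}

/* Example sequential operation: push v, return 0 on success, -1 if full. */
static int push_op(obj_type *s, int v) {
    if (s->top == 8) return -1;
    s->items[s->top++] = v;
    return 0;
}
```

Allocating once outside the loop and handing the old block back to the caller sidesteps, rather than solves, the memory management questions this pattern raises.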
Two problems
Standard memory management routines are typically not lock-free
Freeing the storage for the old object could cause another process to crash
Solution Attempt 1
- No calls to standard memory management routines
static obj_type *new_obj;  // One per process; points to an empty obj
lockfree_op ( obj_type **obj, args )
    obj_type *old_obj;
    do
        old_obj = Load_Linked( obj );
        memcpy( new_obj, old_obj, sizeof( obj_type ) );
        ret = op( new_obj, args );
        cc = Store_Conditional( obj, new_obj );
    until ( cc );
    new_obj = old_obj;  // Save old object for use on next op
    return ret;
New Problem
memcpy is not atomic, so contents of new_obj could be inconsistent.
Inconsistent Data
May seem harmless
- Store_Conditional will fail, so it can’t corrupt the obj
But it could cause the process to crash
- Null ptr dereference, divide by zero, etc.
Hardware solution: validate instruction
Software solution: version numbers
The “Small Object” Protocol - 1
typedef struct {
    obj_type obj;
    unsigned check[2];
} Obj_type;
static Obj_type *new_obj;  // One per process; points to an empty obj
lockfree_op ( Obj_type **obj, args )
    Obj_type *old_obj;
    unsigned first, last;
    while ( TRUE )
        // While loop body on next slide
    end while;
    new_obj = old_obj;  // Save old object for use on next op
    return ret;
The “Small Object” Protocol - 2
while ( TRUE )
    old_obj = Load_Linked( obj );
    new_obj->check[0] = new_obj->check[1]+1;  // Mark inconsistent
    first = old_obj->check[1];
    memcpy( &new_obj->obj, &old_obj->obj, sizeof( obj_type ) );
    last = old_obj->check[0];
    if ( first != last ) continue;
    ret = op( new_obj, args );
    new_obj->check[1]++;  // Mark consistent
    cc = Store_Conditional( obj, new_obj );
    if ( cc ) break;
end while;
- Readers access version numbers in opposite order from writers
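The version-number discipline can be seen in isolation with a small single-threaded sketch. It shows only the ordering of the counter accesses; a real concurrent version would also need atomic accesses or memory barriers, which are omitted here. The payload layout and the helper names `begin_write`, `end_write`, and `try_snapshot` are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

typedef struct { int a, b; } obj_type;                    /* hypothetical payload */
typedef struct { obj_type obj; unsigned check[2]; } Obj_type;

/* Writer side: check[0] is bumped before modifying, check[1] after. */
static void begin_write(Obj_type *o) { o->check[0] = o->check[1] + 1; }
static void end_write(Obj_type *o)   { o->check[1]++; }

/* Reader side: read check[1] first, copy, then read check[0].
 * Returns true if *out is a consistent copy. */
static bool try_snapshot(const Obj_type *src, obj_type *out) {
    unsigned first = src->check[1];
    memcpy(out, &src->obj, sizeof *out);
    unsigned last = src->check[0];
    return first == last;
}
```

A copy taken while a write is in progress sees the two counters differ and is rejected, so the reader retries instead of running `op` on inconsistent data.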
So far ...
Ease of Reasoning
- For small objects, we can (almost mechanically) construct a lock-free concurrent implementation of an object given a sequential implementation
Performance
- Should be pretty good, if cost of memcpy is small
- But ...
Performance of “Naive” Approach
Encore Multimax
- 18 NS32532 processors
Compare to spin locks
- Using test&test&set
Benchmark
- Each process performs 2^20/n queue operations on a single queue
- All runs perform the same amount of work
Performance Problems
Useless Parallelism
- When one process successfully performs an operation, all other processes that have started an operation will fail, but continue to consume resources (and generate contention for memory and bus bandwidth)
Starvation
- Operations that take longer have a much greater chance of being aborted by shorter operations
  • Even when the relative difference in running time is small
Solution to Performance Problems
Exponential backoff
- When contention is detected
  • suspend for a random time interval between 0 and t
  • also double t (up to some maximum)
- On successful operation
  • reduce t by half (down to some minimum)
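The backoff rule above amounts to a small piece of per-process state. The sketch below is one way to write it, with hypothetical names (`backoff_t`, `backoff_on_contention`, `backoff_on_success`); the time unit of the delay is left to the caller, and `rand()` is used only to keep the example short.

```c
#include <stdlib.h>

typedef struct { unsigned t, t_min, t_max; } backoff_t;

/* Contention detected: return a delay drawn from [0, t],
 * then double t, capped at t_max. */
static unsigned backoff_on_contention(backoff_t *b) {
    unsigned delay = (unsigned)rand() % (b->t + 1);
    b->t = (b->t * 2 > b->t_max) ? b->t_max : b->t * 2;
    return delay;
}

/* Successful operation: halve t, but not below t_min. */
static void backoff_on_success(backoff_t *b) {
    b->t = (b->t / 2 < b->t_min) ? b->t_min : b->t / 2;
}
```

A caller would invoke `backoff_on_contention` after a failed Store_Conditional, wait for the returned interval, and invoke `backoff_on_success` once its operation completes.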
Performance with Backoff
Performance is better than a standard spin lock for 8 or more processes
Performance is within a factor of two of a “sophisticated” spin lock implementation (using exponential backoff)
Now where are we?
Lock-free implementation of small objects
✓ Ease of reasoning
✓ Performance
What’s left to do:
- Wait-free small objects
- Lock-free and wait-free large objects
Read the paper!