1.3 Lock-free Data Structures

1.3.1 Design Techniques

Lock-free implementations employ optimistic conflict control and guarantee system-wide progress. In contrast to pessimistic lock-based approaches, processes do not announce their presence before an operation; they work independently and check at the end whether their work has been invalidated by a concurrent process. As a result, delays occur only when there is an actual conflict between concurrent processes. For example, a preempted (inactive) process cannot delay other processes, whereas under mutual exclusion a process that is preempted while holding a lock delays every process waiting for that lock.

Three steps form the basic building block of a lock-free data structure operation: (i) accessing the concurrent data structure to determine its state; (ii) preparing the desired changes locally; and (iii) trying to apply them to the shared state atomically, by means of an atomic primitive. Enclosed in a retry loop, this basic block is repeated until the desired changes are successfully applied to the concurrent data structure.
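The basic block and its retry loop can be sketched in C++ as follows. This is a minimal illustration rather than code from the cited works: the helper name lock_free_update and the choice of a single shared integer as the "data structure" are assumptions made for brevity, and the atomic primitive is taken to be the compare-and-swap offered by std::atomic.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper (not from the text): applies `prepare` to a shared
// integer without locks and returns the value that was committed.
template <typename Prepare>
int lock_free_update(std::atomic<int>& shared, Prepare prepare) {
    // Step 1: read the shared state.
    int observed = shared.load();
    for (;;) {
        // Step 2: prepare the desired state locally.
        const int desired = prepare(observed);
        // Step 3: commit only if the state is still the one observed.
        // On failure, compare_exchange_weak stores the current value in
        // `observed`, which serves as step 1 of the next attempt.
        if (shared.compare_exchange_weak(observed, desired)) {
            return desired;
        }
    }
}

// Single-threaded usage example.
inline void lock_free_update_example() {
    std::atomic<int> counter{5};
    const int committed = lock_free_update(counter, [](int v) { return 2 * v; });
    assert(committed == 10);
}
```

Since the preparation step may run several times before a commit succeeds, the function passed as prepare should be free of side effects.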

Similar to the coarse- and fine-grained locking approaches, lock-free data structures can be designed in many different ways. Universal constructions are design techniques that transform any sequential object into a safe concurrent object. As a coarse-grained approach, one can always rely on the universal construction described by Herlihy [9], which shows that any abstract data type admits a lock-free implementation based on a single retry loop that applies the whole operation with a single successful atomic primitive.

In simple terms, the construction is realized in three steps: a process (i) accesses the object via a shared pointer; (ii) copies the object and applies the sequential operation to the copy; (iii) tries to apply the changes to the shared state by updating the shared pointer to point to the updated copy with an atomic primitive. The process repeats the three steps until the third step succeeds, which happens only if the shared pointer has not been updated by another process between steps one and three. This approach introduces two problems for large objects. Copying a large object is inefficient, and the potential parallelism might be inhibited because updates can conflict even if they modify disjoint parts of the copied object (i.e., the implementation is not disjoint-access parallel [22]). Although this construction mostly emphasizes the computability aspect in asynchronous concurrent environments, it can serve as a basis for efficient implementations of some fundamental abstract data types that have inherent sequential bottlenecks. This is done by updating only a small portion of the data structure (the memory words that host the bottleneck) while the old and new versions share the untouched portion.
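The three steps of this copy-based construction can be sketched in C++ as below. This is a simplified illustration under stated assumptions, not Herlihy's construction as published: the class name CopyOnWriteObject and the example type Counters are hypothetical, and superseded versions are deliberately never freed, because reclaiming them safely would require a memory reclamation scheme that is outside the scope of this sketch.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical sequential object used only for illustration.
struct Counters {
    int hits = 0;
    int misses = 0;
};

// Hypothetical wrapper: every update copies the whole object and swings a
// shared pointer to the new version with a single CAS.
template <typename T>
class CopyOnWriteObject {
public:
    explicit CopyOnWriteObject(const T& initial) : current_(new T(initial)) {}
    // Assumption: the destructor runs when no other thread uses the object.
    ~CopyOnWriteObject() { delete current_.load(); }

    template <typename Op>
    void apply(Op sequential_op) {
        for (;;) {
            T* observed = current_.load();   // (i) access via the shared pointer
            assert(observed != nullptr);
            T* updated = new T(*observed);   // (ii) copy the object ...
            sequential_op(*updated);         //      ... and apply the operation
            // (iii) publish the copy only if nobody replaced the pointer.
            if (current_.compare_exchange_strong(observed, updated)) {
                // The superseded version is intentionally leaked: concurrent
                // readers may still be copying it (simplification).
                return;
            }
            delete updated;                  // never published, safe to free
        }
    }

    // Returns a private copy of the current version.
    T snapshot() const { return *current_.load(); }

private:
    std::atomic<T*> current_;
};
```

Because published versions are never modified or freed, a reader always copies a consistent state. The sketch also makes both drawbacks visible: every call to apply copies the entire object, and two updates conflict even when they touch different fields.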

A popular example is Treiber's lock-free stack [23]. Its operations (push and pop) are each realized with a single retry loop and follow a very similar structure. Figure 1.3 provides the structure of the push operation of Treiber's stack. The stack is formed of a linked list of nodes, where the top variable points to the first node. A push operation takes a new node as its parameter and places it at the top of the stack. One can observe the three steps: (i) read the top pointer to determine the first element of the stack; (ii) prepare the new desired state locally by setting the next field of the new node to the address of the first element; (iii) try to commit this state as the new state of the data structure with a Compare-And-Swap (CAS) on the top pointer, updating it with the address of the new element. These steps are repeated in a retry loop until the CAS succeeds. A failed CAS implies that another concurrent operation has succeeded in the meantime, so the system as a whole still makes progress.
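The push operation described above, together with a pop that follows the same three steps, can be sketched in C++ as follows. The names Node and TreiberStack are chosen here for illustration. The sketch assumes that nodes are owned by the caller and are never freed or reused while other threads may still access the stack; without that assumption, pop would additionally need protection against the ABA problem and a safe memory reclamation scheme.

```cpp
#include <atomic>
#include <cassert>

struct Node {
    int value;
    Node* next;
};

class TreiberStack {
public:
    // Places a caller-owned node at the top of the stack.
    void push(Node* new_node) {
        assert(new_node != nullptr);
        Node* old_top = top_.load();             // (i) read the top pointer
        do {
            new_node->next = old_top;            // (ii) prepare the new state
            // (iii) commit; on failure `old_top` is refreshed with the
            // current top and the loop retries.
        } while (!top_.compare_exchange_weak(old_top, new_node));
    }

    // Detaches and returns the top node, or nullptr if the stack is empty.
    // Assumption: nodes are never freed or reused during concurrent use.
    Node* pop() {
        Node* old_top = top_.load();             // (i)
        while (old_top != nullptr &&
               // (ii) the new state is old_top->next; (iii) commit with CAS.
               !top_.compare_exchange_weak(old_top, old_top->next)) {
        }
        return old_top;
    }

private:
    std::atomic<Node*> top_{nullptr};
};
```

For instance, pushing two nodes a and b in that order and then popping twice returns b first and a second; a third pop returns nullptr.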

Push(newNode)
    while (!success)
        oldNode ← top
        newNode.next ← oldNode
        success ← CAS(top, oldNode, newNode)

Figure 1.3: Treiber Stack Push Operation

For some other abstract data types, more practical designs apply the basic block in multiple, finer steps that gradually carry the data structure to the desired state. As in fine-grained locking, this reduces the conflicts between different operations and provides better performance. However, it is harder to obtain the lock-free progress guarantee when operations are completed in multiple steps. The strategy here is to leave a sign for the other processes regarding the state of the operation after each step, so that they can act accordingly and guarantee system-wide progress. Having encountered an incomplete operation, a process might (i) ignore it and start its own operation, if possible; (ii) try to help the incomplete operation (often not a selfless type of help) before executing its own operation; (iii) try to merge the incomplete operation with its own operation at hand.
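The helping option can be illustrated with a two-step enqueue in the style of the well-known Michael-Scott queue. This example is swapped in purely for illustration and is not taken from the text above. The names TwoStepQueue, link_only, tail_lags and contents are hypothetical; link_only exists only to simulate a process that stops after the first step. The sketch omits dequeue, so no node is freed during concurrent use.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

struct QNode {
    int value;
    std::atomic<QNode*> next{nullptr};
    explicit QNode(int v) : value(v) {}
};

class TwoStepQueue {
public:
    TwoStepQueue() : dummy_(new QNode(0)), tail_(dummy_) {}
    // Assumption: destruction happens when no other thread uses the queue.
    ~TwoStepQueue() {
        for (QNode* n = dummy_; n != nullptr;) {
            QNode* next = n->next.load();
            delete n;
            n = next;
        }
    }

    void enqueue(int value) {
        QNode* node = new QNode(value);
        for (;;) {
            QNode* tail = tail_.load();
            assert(tail != nullptr);  // tail always refers to a node
            QNode* next = tail->next.load();
            if (next != nullptr) {
                // Sign of an incomplete enqueue: a node is linked but the
                // tail pointer lags behind. Help by advancing it, then retry.
                tail_.compare_exchange_strong(tail, next);
                continue;
            }
            QNode* expected = nullptr;
            if (tail->next.compare_exchange_strong(expected, node)) {  // step 1
                // Step 2: swing the tail. A failure means another process
                // already helped, so nothing more needs to be done.
                tail_.compare_exchange_strong(tail, node);
                return;
            }
        }
    }

    // Demo-only: performs step 1 and stops, as if the process were preempted.
    bool link_only(int value) {
        QNode* node = new QNode(value);
        QNode* expected = nullptr;
        if (tail_.load()->next.compare_exchange_strong(expected, node)) return true;
        delete node;
        return false;
    }

    // True if some enqueue has linked its node but not yet swung the tail.
    bool tail_lags() const { return tail_.load()->next.load() != nullptr; }

    // Single-threaded helper that lists the queued values in order.
    std::vector<int> contents() const {
        std::vector<int> out;
        for (QNode* n = dummy_->next.load(); n != nullptr; n = n->next.load())
            out.push_back(n->value);
        return out;
    }

private:
    QNode* const dummy_;       // sentinel node at the front of the list
    std::atomic<QNode*> tail_;
};
```

The help is not selfless: a process cannot link its own node until the lagging tail pointer has been advanced, so completing the other operation is a precondition for its own progress.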

For example, consider the Delete operation on the lock-free skip list [24]. This operation might require updates to multiple pointers in order to entirely detach the deleted element from the skip list. These updates are not applied in a single atomic step but gradually, each leaving a sign regarding the state of the operation. First, the element is logically deleted with a mark. This mark serves as a sign to other processes, so that they can determine the state of the incomplete operation if they are operating in the vicinity of the deleted element. This knowledge allows them to avoid modifications that would lead to inconsistent states and to act accordingly (help with the next steps of the incomplete delete operation, or ignore it if possible). In the same vein, the remaining steps of the operation are executed gradually until the element is completely detached from the skip list.
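The idea of a logical deletion mark can be sketched on a single-level linked list, which is a deliberate simplification of the skip list in [24] and not its actual algorithm. The sketch assumes that the mark is stored in the lowest bit of a node's next field, which is one common encoding; the names LNode, mark_deleted, help_unlink and insert_after are hypothetical, and unlinked nodes are not reclaimed.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct LNode {
    int key;
    // Address of the successor; the lowest bit is the deletion mark.
    std::atomic<std::uintptr_t> next{0};
    explicit LNode(int k) : key(k) {}
};

inline bool is_marked(std::uintptr_t word) { return (word & 1u) != 0; }
inline LNode* successor(std::uintptr_t word) {
    return reinterpret_cast<LNode*>(word & ~std::uintptr_t{1});
}
inline std::uintptr_t pack(LNode* node) {
    return reinterpret_cast<std::uintptr_t>(node);  // unmarked pointer word
}

// Step 1 of Delete: logical deletion. Returns true only for the caller
// that actually set the mark.
bool mark_deleted(LNode* victim) {
    std::uintptr_t word = victim->next.load();
    while (!is_marked(word)) {
        if (victim->next.compare_exchange_weak(word, word | std::uintptr_t{1}))
            return true;
    }
    return false;  // already marked by another process
}

// Step 2, which either the deleter or a helper may execute: physically
// unlink a marked victim from its predecessor.
bool help_unlink(LNode* pred, LNode* victim) {
    const std::uintptr_t victim_word = victim->next.load();
    assert(is_marked(victim_word));  // only logically deleted nodes are unlinked
    std::uintptr_t expected = pack(victim);  // pred unmarked, pointing at victim
    return pred->next.compare_exchange_strong(expected,
                                              pack(successor(victim_word)));
}

// Inserts `node` right after `pred` unless `pred` carries the mark.
bool insert_after(LNode* pred, LNode* node) {
    std::uintptr_t word = pred->next.load();
    while (!is_marked(word)) {
        node->next.store(word);  // the new node inherits pred's successor
        if (pred->next.compare_exchange_weak(word, pack(node))) return true;
    }
    return false;  // pred is logically deleted: the caller must act on the sign
}
```

Once the mark is set, every CAS on the victim's next field fails, which is how the sign prevents an insertion from attaching a node to an element that is about to disappear. A process whose insert_after returns false can call help_unlink and then retry from the predecessor.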

Loosely speaking, helping might create focal contention points, and ignoring might introduce additional work [25]. Some combination of these techniques is often used to design efficient lock-free data structure operations, depending on the data structure type or the usage context. There are numerous lock-free implementations of various abstract data types with different design choices: skip lists [24, 26], binary trees [27, 28], stacks [23, 29, 30], queues [20, 31–34], vectors [35], bags [36], deques [37, 38], priority queues [39, 40], hash tables [41, 42], linked lists [43, 44]. This variety complicates the gathering of lock-free data structures under a unified generic design.

In document Throughput and energy efficiency of lock-free data structures: Execution Models and Analyses (Page 32-35)