1.3 Lock-free Data Structures
1.3.4 Execution Models and Analyses
It is common to model asynchronous executions by assuming an adversarial scheduler whose capabilities vary with the context of study. This approach is convenient for impossibility results and for reasoning about the correctness properties of algorithms [4], but it leads to worst-case bounds when the contention, which is tied to performance, is analyzed. Several complexity models have been proposed in the literature for contention in asynchronous shared memory systems. In [50], stall time, which is induced by memory operations that access the same memory location during the same time interval (hardware conflicts), is analyzed under an adversarial scheduler. Both retry loop conflicts and hardware conflicts are considered in [51]. To capture the cost of contention, the total amount of work is bounded for an n-process lock-free update protocol in which each process successfully updates a location once and then returns from the protocol. That study also analyzes the impact of exponential back-off.
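As a rough illustration of why an adversarial scheduler produces pessimistic bounds, the sketch below simulates an n-process update protocol on a single shared word. It is a toy model and not the analysis of [50] or [51]: the `Cell` class, the `adversarial_run` function and the particular adversary (all pending processes read before any of them attempts its compare-and-swap) are assumptions made for this example.

```python
class Cell:
    """A shared memory word with a simulated compare-and-swap (CAS)."""

    def __init__(self, value=0):
        self.value = value

    def compare_and_swap(self, expected, new):
        """Write `new` only if the cell still holds `expected`."""
        if self.value == expected:
            self.value = new
            return True
        return False


def adversarial_run(n):
    """Run n processes that each need exactly one successful update.

    Hypothetical adversary: in every round it lets all pending processes
    read the cell first, and only then lets them attempt their CAS, so
    exactly one attempt per round can succeed.
    Returns (attempts, failures, final_value).
    """
    cell = Cell(0)
    pending = list(range(n))
    attempts = failures = 0
    while pending:
        # Every pending process reads the same value.
        snapshots = {p: cell.value for p in pending}
        retry = []
        for p in pending:
            attempts += 1
            if not cell.compare_and_swap(snapshots[p], snapshots[p] + 1):
                failures += 1
                retry.append(p)  # failed CAS: go around the retry loop again
        pending = retry
    return attempts, failures, cell.value


if __name__ == "__main__":
    print(adversarial_run(4))  # (10, 6, 4)
```

Under this schedule the total number of attempts is n(n+1)/2, whereas a schedule that runs the processes one after another needs only n attempts. The gap between the two is the kind of contention cost that worst-case analyses account for.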
In addition, amortized analysis techniques have been exploited [27, 43] to address concurrency-related issues, since the execution time of an individual lock-free operation cannot be bounded by definition. The failed attempts in the retry loops can be amortized against the successful ones, owing to the fundamental property that a failed retry implies a successful concurrent retry. These analyses can be parameterized with a measure of contention to bound the average time complexity of the successful operations. Some common contention measures are:
• Point Contention [52]: maximum number of operations that are executed concurrently at any point during the execution interval of the operation
• Interval Contention [53]: number of operations whose execution interval overlaps with the execution interval of a given operation
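The two measures can be made concrete with a small computation over recorded execution intervals. The sketch below is illustrative only: the function names are hypothetical, intervals are assumed to be closed `(start, end)` pairs, and both counts are assumed to include the operation itself (conventions differ on this point).

```python
def overlaps(a, b):
    """True if closed intervals a and b share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]


def interval_contention(op, ops):
    """Number of operations in `ops` whose interval overlaps `op`.

    Assumption: `op` is itself an element of `ops` and is counted.
    """
    return sum(1 for other in ops if overlaps(op, other))


def point_contention(op, ops):
    """Maximum number of operations active at one instant within `op`.

    The number of active operations only increases at a start time, so
    it is enough to check the start of every overlapping operation,
    clipped to the beginning of `op`.
    """
    instants = [max(o[0], op[0]) for o in ops if overlaps(op, o)]
    return max(sum(1 for o in ops if o[0] <= t <= o[1]) for t in instants)


if __name__ == "__main__":
    ops = [(0, 10), (1, 3), (4, 6), (5, 9), (12, 14)]
    print(interval_contention(ops[0], ops))  # 4
    print(point_contention(ops[0], ops))     # 3
```

In the example, the long operation `(0, 10)` overlaps three other operations, but at most two of them are active at the same instant, so its point contention (3) is smaller than its interval contention (4).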
The contention measure (which frames retry loop and hardware conflicts under a single contention cost) is often bounded by considering the worst case. However, the worst-case behavior is not enough to express the performance that we observe in practice. A tighter estimate of contention is needed because the worst case is reached only if the concurrent operations access the same part of the data structure at the same time.
Closer to the practical domain, the expected system latency and individual operation latency are analyzed for a general class of lock-free algorithms under a uniform stochastic scheduler [21].
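To give a feel for what a uniform stochastic scheduler means, the following sketch simulates one: at every step, one of n processes is picked uniformly at random and takes a single step of a retry loop. The loop shape (one read followed by one compare-and-swap on a shared counter) and the function name are assumptions made for this example; it does not reproduce the class of algorithms or the analysis of [21].

```python
import random


def simulate_uniform_scheduler(n, steps, seed=0):
    """Simulate n processes in a read/CAS retry loop on a shared counter.

    Returns (completed, shared): the number of successful operations and
    the final counter value.
    """
    rng = random.Random(seed)
    shared = 0
    local = [None] * n  # value last read by each process, None before a read
    completed = 0
    for _ in range(steps):
        p = rng.randrange(n)  # uniform choice of the next process to step
        if local[p] is None:
            local[p] = shared  # read step
        else:
            if shared == local[p]:  # CAS step succeeds only if unchanged
                shared += 1
                completed += 1
            local[p] = None  # success or failure, the next step is a read
    return completed, shared


if __name__ == "__main__":
    steps = 100_000
    for n in (1, 2, 4, 8):
        completed, _ = simulate_uniform_scheduler(n, steps)
        # System latency: scheduler steps per successful operation.
        print(n, steps / completed)
```

With a single process every operation takes exactly two steps. With more processes, some compare-and-swap steps fail, so the number of steps per successful operation grows.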
These theoretical analyses of the time complexity of lock-free data structures target the asymptotic behavior in terms of the number of processes. In addition, empirical studies [54, 55] have been conducted to understand throughput and energy efficiency. These empirical studies help to grasp the complicated interaction between software and hardware. However, there is a lack of analytical results that target the performance of lock-free data structures as observed in practice, taking the underlying hardware into consideration. This thesis aims to bridge the gap between theoretical bounds and actual measured performance. In this thesis, we model and analyze the performance and energy efficiency of lock-free data structures on top of real hardware platforms. The modeling phase transforms the system, which consists of the lock-free program and the machine, into an execution model, and the analysis of the model yields numeric values for the metrics of interest (e.g. throughput, cache misses, energy efficiency). This process is iterated throughout this thesis to tackle different types of lock-free data structures and different use cases, in which the impacting factors might vary.
We start the process with abstractions of the lock-free program and the machine, each characterized by a set of parameters. Then, the system is mapped to an execution model (e.g. cyclic pattern, Markov chain, Poisson process, queueing model in steady states under low and high contention, a system of mathematical equations) which retains the initial parameters. Both of these steps are aligned with the identified, significant performance impacting factors because we aim at representing the actual behavior of the system under a reasonable model complexity. During this process, we might ignore memory management calls if they are not costly, some types of hardware or algorithmic conflicts, and events that are improbable. We collect the evidence regarding the insignificance of these details through empirical observations (benchmarking, performance counters). In a second phase, we analyze the execution model to estimate (or sometimes bound) the main performance metrics: throughput and power consumption, which can be combined to obtain the energy efficiency. Finally, we validate our models with both synthetic tests and examples picked from application domains, for a range of lock-free data structures. To the best of our knowledge, this is the first attempt to model and analyze the performance of lock-free data structures on such a broad domain and to obtain estimates that are close to what is observed in practice.
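The two-phase process can be pictured with a deliberately simple sketch. Everything in it is an assumption made for illustration, not the model developed in this thesis: the parameter names (`parallel_work`, `retry_loop_work`, `static_power`, `power_per_thread`), the two-regime throughput formula, and the linear power formula. Only the last step, dividing throughput by power to obtain operations per unit of energy, follows directly from the definitions of the metrics.

```python
def throughput_estimate(n, parallel_work, retry_loop_work):
    """Toy throughput estimate (operations per time unit) for n threads.

    Assumed low-contention regime: threads do not interfere, so each
    completes one operation per (parallel_work + retry_loop_work).
    Assumed high-contention regime: retry loops are serialized, so at
    most one operation completes per retry_loop_work.
    """
    low_contention = n / (parallel_work + retry_loop_work)
    high_contention = 1.0 / retry_loop_work
    return min(low_contention, high_contention)


def power_estimate(n, static_power, power_per_thread):
    """Toy power estimate: a static part plus a per-thread dynamic part."""
    return static_power + n * power_per_thread


def energy_efficiency(throughput, power):
    """Operations per unit of energy: throughput divided by power."""
    return throughput / power


if __name__ == "__main__":
    for n in (2, 8):
        t = throughput_estimate(n, parallel_work=3.0, retry_loop_work=1.0)
        p = power_estimate(n, static_power=10.0, power_per_thread=5.0)
        print(n, t, p, energy_efficiency(t, p))
```

In this toy setting, throughput saturates once the retry loop becomes the bottleneck while power keeps growing with the number of threads, so energy efficiency drops at high thread counts.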
An analytical framework can be useful in many ways. In the first place, it can explain observations and provide an understanding of the phenomena that drive the performance of lock-free data structures. It can identify the issues and bottlenecks in a design, which in turn facilitates design decisions.
Secondly, it can be used to rank alternative lock-free data structure designs. We have mentioned in the previous sections that a vast variety of lock-free data structure designs exist. Different designs can outperform each other in different configurations, which makes it difficult to conduct a fair comparison. Sometimes the strengths or limitations of a data structure are hidden, and thus go unnoticed even by its creators, because they only appear in some configurations of a domain that often cannot be covered sufficiently by empirical means. An analytical framework can reveal the merits of data structures and provide a fair comparison by covering the whole configuration domain.
Last but not least, it can help in tuning the parameters of a data structure. Lock-free data structures come with specific parameters, e.g. back-off, padding, and memory management related parameters, and become competitive only after their values have been picked carefully, which often involves a costly brute-force approach. This search can be replaced, or at least driven, by an analytical estimation of the performance.
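A minimal sketch of model-driven tuning is given below. The throughput estimate is a made-up toy model (its formula and the parameters `parallel_work`, `retry_loop_work` and `conflict_penalty` are assumptions for this example, not results from this thesis). The point is only that, once an analytical estimate exists, choosing a back-off value reduces to evaluating a cheap function over the candidates instead of benchmarking each one.

```python
def estimated_throughput(backoff, n=8, parallel_work=1.0,
                         retry_loop_work=1.0, conflict_penalty=0.5):
    """Toy throughput estimate as a function of the back-off time.

    Assumptions: a thread spends parallel_work + backoff outside the
    retry loop; every extra thread expected inside the retry loop slows
    the successful attempt down by a factor given by conflict_penalty.
    """
    cycle = parallel_work + backoff + retry_loop_work
    supply = n / cycle  # rate at which threads can issue operations
    extra = max(0.0, n * retry_loop_work / cycle - 1.0)  # extra contenders
    service = 1.0 / (retry_loop_work * (1.0 + conflict_penalty * extra))
    return min(supply, service)


def best_backoff(candidates, estimate=estimated_throughput):
    """Pick the candidate back-off with the highest estimated throughput."""
    return max(candidates, key=estimate)


if __name__ == "__main__":
    b = best_backoff(range(17))
    print(b, estimated_throughput(b))  # 6 1.0
```

In this toy model, too little back-off wastes time on conflicts and too much leaves the shared location idle, so the estimate peaks at an intermediate value. The selected value could then be used as the starting point of a much narrower empirical search.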