Transaction Performance - Semantics, performance and language support for transactional memory

Many researchers agree that using a TM for data synchronization instead of locks brings a tremendous advantage in the practice of programming correct concurrent applications.

However, this advantage comes at the price of the overhead induced by the TM. For the programming advantages of TM to be valuable, code using TM should also be fast and efficient. This issue is especially important for STMs, since an STM executes entirely in software and its contribution to application execution time is significant. Mechanisms that compensate for, or even hide, this overhead are the ultimate goal of STM research. Two metrics are used in the TM field to evaluate TM overhead:

Throughput: Throughput is the performance metric traditionally used in TM to measure the number of transactions a TM commits per time unit. It reflects the ability of a TM to perform useful work (in the case of TM, useful work is obtained when a commit occurs). While throughput partially depends on the application workload, the overhead a TM introduces in executing a transaction plays an important role in the achieved throughput; hence, throughput is an important metric for assessing TM performance.

Commit-abort ratio: The commit-abort ratio, which we denote by τ, is the ratio of the number of committed transactions to the total number of completed transactions (committed or aborted) [70]. This metric captures the notion of success of a TM by giving the fraction of transactions that the TM committed out of the total number of transactions it attempted to commit. That is, the commit-abort ratio is an important measure of achievable concurrency for TM performance, especially from a theoretical point of view.
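As a minimal sketch of the two definitions above, both metrics can be computed from the counters of a single measurement run. The record `TMRunStats`, its field names and the sample figures are hypothetical and chosen only for illustration; they do not come from any particular TM implementation.

```python
from dataclasses import dataclass


@dataclass
class TMRunStats:
    """Counters from one measurement run (hypothetical record)."""
    commits: int       # transactions that committed
    aborts: int        # transaction attempts that aborted
    elapsed_s: float   # wall-clock duration of the run, in seconds


def throughput(stats: TMRunStats) -> float:
    """Committed transactions per second."""
    if stats.elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return stats.commits / stats.elapsed_s


def commit_abort_ratio(stats: TMRunStats) -> float:
    """tau = commits / (commits + aborts), a value in [0, 1]."""
    completed = stats.commits + stats.aborts
    if completed == 0:
        raise ValueError("no completed transactions")
    return stats.commits / completed


# Made-up numbers: 8000 commits and 2000 aborts observed in 2 seconds.
run = TMRunStats(commits=8000, aborts=2000, elapsed_s=2.0)
print(throughput(run))          # 4000.0
print(commit_abort_ratio(run))  # 0.8
```

In this made-up run the TM commits 4000 transactions per second while one in five attempts aborts; the two numbers describe different aspects of the same execution.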

Although throughput is the most widely used performance metric, it is not sufficient to identify the cause of TM efficiency, because it gives information only about commit performance. Yet evaluating how likely a TM is to abort transactions is crucial, since aborting can be very costly. First, this cost depends on the effort wasted in executing the transaction before aborting it: a long transaction is generally costly to retry. Second, the side effects of an abort can be dramatic for performance: take, as an example, an aborting transaction that has previously forced several other transactions to abort as well; this transaction may create further conflicts upon retry.

Hence, TM performance cannot be understood by observing commit performance alone: a TM may be efficient either because it aborts very few transactions or because it retries transactions very rapidly. The commit-abort ratio is complementary to throughput, since it determines whether a TM is simply fast or whether it is capable of committing most of its transactions. As with throughput, part of the aborts experienced in an execution results from the workload the application represents, but the rest is due to the TM implementation (which can unnecessarily abort some transactions just for the sake of implementation simplicity), and the commit-abort ratio helps to determine the number of unnecessary aborts a TM implementation performs. The usefulness of the commit-abort ratio is demonstrated by Gramoli et al. [70] where, apart from complementing throughput, this metric allows different TM designs to be classified by how close each is to an ideal design in which all aborts are due to consistency violations (and not to TM design decisions).

The above metrics are useful to assess performance at a given level of parallelism, i.e., for a given number of executing threads. However, in concurrent programming another dimension of performance is the behavior of program execution under changing parallelism. Two major indicators of TM performance with respect to a varying number of threads are speed-up and scalability:

Speed-up is defined as the gain in performance achieved by parallelizing a sequential task. In the case of TM, this translates to the gain in throughput of committed transactions with respect to a sequential version of a program. A subtle point that needs clarification is that the sequential version of a program should not be taken to be the TM-based version of the program running with a single thread, but rather a version of the program that has no data synchronization and performs all tasks sequentially on a single thread. Due to the overhead TM incurs, the TM-based version of the program running with a single thread can be significantly slower than a sequential version of the same program (especially when the TM is an STM). Hence, the real performance gain obtained by increasing the parallelism can be observed only by comparing throughput with respect to the sequential version. A recent publication by Dragojević et al. [46] points out this issue and demonstrates that STMs can largely outperform sequential code despite all the overhead they incur.
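The choice of baseline can be illustrated with a small sketch. The helper `speedup` and all throughput figures below are hypothetical, and the sketch assumes the sequential version's throughput is measured in the same unit of work per second as the TM version's committed transactions.

```python
def speedup(measured_throughput: float, baseline_throughput: float) -> float:
    """Ratio of a measured throughput to a baseline throughput."""
    if baseline_throughput <= 0:
        raise ValueError("baseline_throughput must be positive")
    return measured_throughput / baseline_throughput


# Hypothetical measurements, in units of work per second.
sequential = 1000.0  # unsynchronized, single-threaded version
tm_by_threads = {1: 400.0, 2: 750.0, 4: 1400.0, 8: 2400.0}

for threads, tput in tm_by_threads.items():
    vs_sequential = speedup(tput, sequential)       # baseline the text recommends
    vs_tm_single = speedup(tput, tm_by_threads[1])  # misleading baseline
    print(f"{threads} threads: {vs_sequential:.2f}x vs sequential, "
          f"{vs_tm_single:.2f}x vs single-threaded TM")
```

With these made-up figures, two threads look like a 1.88x gain against the single-threaded TM run, yet the program is still slower than the sequential version (0.75x); only the four- and eight-thread runs show a real gain.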

Scalability, in general, describes the ability to improve the throughput or capacity of a parallel system when additional computing resources (such as CPUs, memory, storage or I/O bandwidth) are introduced [147]. For TMs this generally translates to the ability to improve transaction throughput as the number of executing threads increases. Scalability indicates how much more work can be done by a TM when more potential work is added to the system (potential work is introduced by increasing the number of threads). Ideally, a TM always produces more work (has higher transaction throughput) with an increasing number of threads. In practice, however, scalability is bounded by the contention on shared data introduced by the application. Examples of scalability measurements can be found in nearly all TM-related work, while good examples contrasting the scalability of different realistic applications are given by Minh et al. [133], Dragojević et al. [46], Kestor et al. [109] and Zyulkyarov et al. [201].
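One simple way to inspect scalability, sketched here under assumed data, is to compare throughput at consecutive thread counts and to locate the thread count at which throughput peaks. The helpers `throughput_gains` and `peak_threads` and the measurements are hypothetical; they are an illustration, not a standard scalability measure.

```python
from typing import Dict, List, Tuple


def throughput_gains(by_threads: Dict[int, float]) -> List[Tuple[int, int, float]]:
    """Throughput ratio between each pair of consecutive thread counts.

    A ratio above 1 means adding threads produced more committed work;
    a ratio below 1 means throughput dropped.
    """
    counts = sorted(by_threads)
    return [(lo, hi, by_threads[hi] / by_threads[lo])
            for lo, hi in zip(counts, counts[1:])]


def peak_threads(by_threads: Dict[int, float]) -> int:
    """Thread count at which the measured throughput is highest."""
    return max(by_threads, key=lambda threads: by_threads[threads])


# Hypothetical throughput (commits per second) of a contended workload.
measured = {1: 400.0, 2: 760.0, 4: 1400.0, 8: 1300.0}

for lo, hi, ratio in throughput_gains(measured):
    print(f"{lo} -> {hi} threads: throughput x{ratio:.2f}")
print("peak at", peak_threads(measured), "threads")
```

In this made-up profile throughput grows up to four threads and then drops at eight, the kind of behavior expected once contention on shared data outweighs the added parallelism.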
