4.2 Requirements for TM Implementations
4.2.3 Compilers
A large part of the TM support in a C/C++ compiler is straightforward to implement (e. g., analyzing which code is transactional or potentially called from transactions, and cloning transactional code and instrumenting it so that it uses the TM ABI for memory accesses). However, two issues deserve more attention: (1) TM-pure code and (2) the compiler transformations that are allowed (or required) for accesses to shared memory and how to express sequenced–before with these accesses.
TM-pure code. When instrumenting transactional code, the compiler has to decide which operations in the code are unsafe, TM-pure, or handled by the TM runtime library. The last case is straightforward to handle because the operation just has to be replaced with a call to the associated TM runtime library function as specified by the ABI (see Table 4.1). Unsafe code has to be prefixed with a call to the ABI’s function that requests serial–irrevocable mode (see footnote 5).
However, unsafe code cannot be executed concurrently with other transactions, so the compiler should not treat TM-pure code as unsafe code.
Therefore, the compiler has to determine whether code is trivially TM-pure or whether it can be made TM-pure by the compiler. This decision is somewhat implementation-defined and subject to the as–if rule, but does not depend on the TM runtime library and can be handled by the compiler on its own (if adhering to the rules described next). Compiler implementations will typically have to consider both intermediate forms of transactional code (where most of the instrumentation is likely to happen) and the native code that will be generated for potentially TM-pure operations (e. g., built-in functions used to implement complex operations).
Footnote 5: If the operation is already dominated (in terms of control flow) by another call to this ABI
| Category | Remarks |
| --- | --- |
| Register-only CPU instructions | Must be safe and use only CPU registers (e. g., most control flow instructions but not system calls). |
| Stack accesses | Write-through. Loads and stores are TM-pure. Compiler-generated code rolls back stack slots on transaction restart or does not re-use them. |
| Accesses to thread-local data | Write-through. Loads are TM-pure. To make stores TM-pure and ensure rollback, the compiler inserts calls to undo-logging functions (_ITM_L*) before the first store to a location. |
| Loads from immutable data | No side effects, no synchronization necessary (e. g., loads from virtual method tables). |
Table 4.3: Examples of TM-pure code.
The majority of TM-pure operations are those which (1) do not result in visible side effects in terms of the C++ abstract machine (e. g., I/O or volatile memory accesses), (2) do not contribute to synchronizes–with and are race-free, and (3) are idempotent (with respect to restarts of a transaction) or rolled back by some mechanism (e. g., compiler-generated code).
Table 4.3 shows examples of such TM-pure operations. Loads that need no synchronization in nontransactional code are usually TM-pure because they target immutable, existing variables (otherwise, the original code would not be race-free, in which case transactions have undefined behavior as well). Operations that modify thread-local state can still be TM-pure because of available rollback mechanisms. CPU state (e. g., registers or floating-point state) gets rolled back by the setjmp-like behavior of _ITM_beginTransaction. Stack slots potentially modified in a transaction either get actively rolled back by code inserted by the compiler at the start of a transaction (triggered by a bit in the return value of _ITM_beginTransaction, see Figure 4.4), or the compiler can instruct its code generator to not re-use stack slots that are live into a transaction and rather use new stack slots to store modifications within this transaction. Finally, the ABI provides undo-logging functions (see Table 4.1) to log the previous values of modified memory locations (without protecting them from accesses by other threads) and undo any changes on transaction restart. A location can only be accessed by a TM-pure operation in a transaction if all accesses in this transaction are by TM-pure operations or unsafe code.
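The undo-logging approach for thread-local data can be illustrated with a minimal sketch. The class UndoLog and the function instrumented_body are hypothetical stand-ins written for this illustration; they are not the ABI's undo-logging functions and do not model any other part of the TM runtime library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the ABI's undo-logging functions (not the real runtime).
class UndoLog {
public:
    // Save the current bytes at addr so that they can be restored on restart.
    void log(void* addr, std::size_t size) {
        assert(addr != nullptr);
        Entry e;
        e.addr = addr;
        e.saved.resize(size);
        std::memcpy(e.saved.data(), addr, size);
        entries_.push_back(e);
    }
    // Transaction restart: restore the saved values, newest entry first.
    void rollback() {
        for (auto it = entries_.rbegin(); it != entries_.rend(); ++it)
            std::memcpy(it->addr, it->saved.data(), it->saved.size());
        entries_.clear();
    }
    // Transaction commit: the write-through stores stay in place.
    void commit() { entries_.clear(); }

private:
    struct Entry {
        void* addr;
        std::vector<unsigned char> saved;
    };
    std::vector<Entry> entries_;
};

thread_local long tls_counter = 0;

// Models compiler-instrumented code: one log call before the first store,
// followed by plain (write-through) stores to the thread-local location.
void instrumented_body(UndoLog& log) {
    log.log(&tls_counter, sizeof tls_counter);
    tls_counter += 5;
    tls_counter *= 2;
}
```

Only the first store to a location needs a preceding log call; later stores in the same execution attempt would be undone by the same entry.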
The compiler’s code generator might use built-in functions to implement complex operations from intermediate code. Those functions, even though they might communicate with other threads or might have to synchronize their accesses to shared state, can still be TM-pure if they are idempotent and fulfill the additional requirements discussed next. Note that these guarantees also explain the high-level guarantee C1 in Table 4.2 in more detail.
Race-free even without transaction atomicity. TM-pure operations must be properly synchronized and race-free even if discarding TSO contributions of currently active or future transactions to synchronizes–with. Intuitively, TM-pure operations must either not need to synchronize at all, or must be independent of any transactional synchronization, because such synchronization is the sole responsibility of the TM runtime library and is only well-defined at the language level but not at the implementation level that TM-pure operations are a part of.
No dependence on TSO choice or changes. TM-pure operations have to be independent of a particular TSO choice or change in this choice. They can expect and observe all orderings visible via happens–before to the associated thread and transaction when the transaction was first started. However, they must not depend on or be affected by later additions or changes to TSO (e. g., commits of other transactions). They can also expect that values returned from the TM runtime library during the current execution attempt of the transaction (e. g., since the most recent restart of this transaction) are consistent with the current TSO choice. For example, if a transaction did not abort since an earlier TM-pure operation, this does not mean that no other transaction committed in the meantime, nor that the current transaction will still be able to commit successfully.
Must be self-contained. Because transactions can be aborted during every invocation of a function of the ABI, and because the compiler can interleave TM-pure operations with calls to ABI functions, TM-pure operations must be self-contained in that they do not expect subsequent operations (TM-pure or ABI) to be executed as well. However, executions of individual TM-pure operations will not be interrupted or aborted. This might also restrict the transformations of transactional code that the compiler is allowed to do (e. g., inlining a TM-pure operation and then moving ABI calls into the inlined code can be incorrect).
No interference with TM-internal synchronization. TM-pure operations can synchronize with other threads, but this must not create deadlocks or any other kind of conflict when combined with the TM-internal synchronization. Self-contained TM-pure operations are important for this requirement as well (e. g., if such an operation acquires a lock it must also release the lock before it can possibly be aborted). TM-pure operations are allowed to block on other operations except anything related to TM (e. g., operations that execute transactions or any ABI function). However, they must not wait for or depend on the execution of other operations that have not been started yet. For example, blocking on a lock acquired by another TM-pure operation is allowed because the acquired lock shows that the other, self-contained operation is already running. However, waiting for another transaction to finish or depending on another transaction to not abort is not allowed.
These requirements are basically also sufficient for functions annotated as transaction_pure to be indeed correct TM-pure operations. Additionally, such code has to ensure that it remains self-contained despite potential compiler optimizations (e. g., a programmer could have to add a noinline attribute to the function). Transactional wrapper functions (associated with functions annotated with the tm_wrapper attribute) are sequences of TM-pure operations interleaved with calls to ABI functions, so the previous discussion also serves as a guideline for how to correctly implement such wrappers.
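As an illustration of the self-containment and non-interference requirements, the following sketch shows a function that could plausibly qualify as a TM-pure operation. The function intern_id and its table are hypothetical and not taken from any TM implementation; with a TM-enabled compiler, such a function would additionally carry the transaction_pure annotation and possibly a noinline attribute, which are omitted here so that the sketch compiles with any C++11 compiler.

```cpp
#include <map>
#include <mutex>
#include <string>

namespace {
std::mutex intern_mutex;                  // private lock, unrelated to TM-internal locks
std::map<std::string, int> intern_table;  // shared state guarded by intern_mutex
}

// Hypothetical candidate for a TM-pure operation: maps a name to a stable id.
// - Self-contained: the lock is acquired and released within this one call,
//   so an abort at a later ABI call can never leave the lock held.
// - Idempotent with respect to restarts: a re-executed call finds the
//   existing entry and returns the same id.
// - It never waits for a transaction or calls an ABI function while
//   holding the lock.
int intern_id(const std::string& name) {
    std::lock_guard<std::mutex> guard(intern_mutex);
    auto it = intern_table.find(name);
    if (it != intern_table.end())
        return it->second;
    int id = static_cast<int>(intern_table.size());
    intern_table.emplace(name, id);
    return id;
}
```

Blocking on intern_mutex is permissible under the rules above because a held lock implies that another self-contained call is already running and will release it without depending on any transaction.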
Compiler transformations. All operations in transactional code that are not TM-pure or unsafe have to be transformed into calls to the TM runtime library’s functions. For individual memory accesses, these transformations are straightforward because the compiler just has to transform the access (e. g., in intermediate code) into a function call (e. g., to _ITM_RU8). However, transforming or reordering several accesses in a piece of transactional code requires more care because the compiler must not introduce additional race conditions that were not present in the C++ source code. Furthermore, the TM runtime library observes the sequencing of accesses (i. e., sequenced–before) at the language level through the order of the compiler-generated library calls, so the compiler must not lose important sequencing information when reordering accesses (high-level guarantee C2 in Table 4.2).
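The effect of this transformation can be sketched with a mock runtime. MockTM, read_u8, and write_u8 are hypothetical names chosen for this illustration; they only mimic the shape of 8-byte load and store functions and merely record the order of calls, which is the only sequencing information a TM runtime library gets to see.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical mock of a TM runtime library: it performs each access
// directly and records the order in which the calls arrive.
struct MockTM {
    std::vector<std::string> trace;

    std::uint64_t read_u8(const std::uint64_t* addr, const char* name) {
        trace.push_back(std::string("R ") + name);
        return *addr;
    }
    void write_u8(std::uint64_t* addr, std::uint64_t value, const char* name) {
        trace.push_back(std::string("W ") + name);
        *addr = value;
    }
};

std::uint64_t x = 0, y = 0, z = 0;

// Source-level transaction body:   y = x + 1;   z = y * 2;
// Each shared-memory access becomes one library call, issued in an order
// that is consistent with sequenced-before in the source.
void instrumented_body(MockTM& tm) {
    std::uint64_t t1 = tm.read_u8(&x, "x");
    tm.write_u8(&y, t1 + 1, "y");
    std::uint64_t t2 = tm.read_u8(&y, "y");
    tm.write_u8(&z, t2 * 2, "z");
}
```

In this sketch, the recorded trace is the library's entire view of the transaction, so any reordering by the compiler would directly change what the library observes.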
The first requirement for the compiler is to instruct the library to access exactly the same locations as accessed on the language level (i. e., by an abstract machine running the program). Accesses can be split or merged if necessary but must not touch other locations because these could be concurrently accessed by other code, leading to race conditions that do not exist in the source program. Using whole-program analysis, the compiler could potentially detect that some locations (e. g., bytes between two variables that have been added to align those variables) are only accessed by loads and stores in the TM runtime library and are always accessible. However, the potential benefit in terms of TM performance is probably rather small, so just accessing exactly the same locations seems to be sufficient.
Note that this applies to loads from memory as well. Loading from memory will not change the results of other accesses to the same location, but the location might not be accessible, in which case the load could raise a segmentation fault, which is a visible side effect. Some architectures such as SPARC provide nonfaulting load instructions but we cannot expect that such instructions are generally available.
As a second requirement, the compiler must not access locations speculatively, as would happen when predicting that a certain branch is taken and prefetching the values that would be accessed in the predicted execution. In the case of misspeculation, the additional access could lead to a race with other concurrent code. For such data-dependent accesses, the compiler must not place them before the other access (and call to the TM runtime library) whose result determines whether the access would be executed on the language level.
Third, the compiler is allowed to reorder two accesses in a single transaction if it can prove that the later access (in terms of control flow) would happen in any case when the first access would execute in this transaction. The reason for this is that (1) transactions do not remove race conditions in the transaction’s source code and that (2) the C++ standard allows undefined behavior resulting from race conditions to occur before the execution of the code that contains the race condition (see §1.9.5 in the standard [65]). Thus, the TM can assume that every transaction must be race-free even if executed atomically in any possible interleaving with nontransactional code. Performing an access earlier in the execution of the transaction must thus still be race-free, provided that this access is guaranteed to be executed (in contrast to a speculative execution as explained previously) and the reordering would be allowed in a sequential execution of the transaction. If this would instead lead to a race condition, then the race condition could also be triggered in a valid execution without reordering, and the reordering just leads to an earlier exposure of the race condition.
Finally, there are restrictions regarding the reordering of code across transaction boundaries. Basically, TM-pure operations can be moved into and out of transactions because they are independent of TSO. Unsafe operations must not be moved into transactions because this might invalidate liveness properties that exist in the source program, and they also must not be moved out of transactions because this might result in race conditions. Other code must not be moved out of transactions but can be moved into transactions because it does not synchronize (it would be unsafe code otherwise). If a transaction can potentially be canceled (see Section 4.1), then no code can be moved into or out of the transaction, unless the code is TM-pure and its execution cannot be detected by the program.
The preceding requirements also show why publication safety is mostly the responsibility of the compiler and the transactional program. To avoid race conditions, the program’s source code must first load the data that determines whether the published data is available (e. g., flags or pointers) before accessing the published data itself. Thus, accesses to published data are data-dependent on the former data, and the compiler must not reorder these accesses so that they happen speculatively before the data dependency. Given these guarantees, the TM runtime library is free to select or even change TSO during the runtime of a transaction that might read published data; the nontransactional accesses to the published data by the publisher and the observer will always be synchronized by the intermediary transactional accesses to the publication flags.
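The publication idiom described here can be sketched at the source level. The sketch below is not a TM implementation: the helper atomically is a hypothetical stand-in that models transaction atomicity with a single global lock, and the names payload, published, publish, and try_consume are assumptions made for this illustration.

```cpp
#include <mutex>

namespace {
std::mutex global_tx_lock;  // stand-in for transaction atomicity, NOT a real TM

// Hypothetical helper: runs body as if it were an atomic transaction.
template <typename F>
void atomically(F body) {
    std::lock_guard<std::mutex> guard(global_tx_lock);
    body();
}

int payload = 0;         // published data, accessed nontransactionally
bool published = false;  // publication flag, accessed only inside transactions
}

// Publisher: initialize the data first, then set the flag in a transaction.
void publish(int value) {
    payload = value;
    atomically([] { published = true; });
}

// Observer: load the flag first and touch the data only if the flag was set.
// Hoisting the load of payload above the flag check would be the kind of
// speculative access that the compiler must not introduce.
bool try_consume(int& out) {
    bool ready = false;
    atomically([&] { ready = published; });
    if (!ready)
        return false;  // payload is not accessed on this path
    out = payload;
    return true;
}
```

The nontransactional accesses to payload are ordered only through the transactional accesses to published, which is why the load of payload has to stay behind the check of the flag.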