3.4 Incremental Adaptations of ASF

So far, this chapter has presented the basic ISA design decisions for ASF and a small prototype C-language wrapper that enables short transactions from C. I will now describe two incremental changes that we made to the design of ASF; larger changes that repurpose it are described in Chapter 6.

3.4.1 Inverted Transactional Semantics

ASF was designed from the ground up as an addition to the AMD64 ISA. It therefore does not change the meaning of normal loads and stores; instead, instructions that should behave specially carry the LOCK prefix. The original idea, which followed from the concept of the ASF 1 declarators, was that this would simplify the instruction decoder: no context information is required to determine whether a specific memory instruction is transactional. Any additional work required for transactional accesses can easily be added in the decoder front-end, because those instructions are flagged directly.
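To illustrate why no context is needed, the following C sketch scans the legacy prefixes of a single AMD64 instruction and flags it as transactional if a LOCK prefix (byte 0xF0) is present. The struct and function names are hypothetical, REX/VEX prefixes and the actual opcode decoding are left out, and the sketch models only the classification rule described above, not AMD's decoder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_PREFIX 0xF0   /* the AMD64 LOCK prefix byte */

/* Hypothetical decoder output, reduced to what this sketch needs. */
typedef struct {
    bool   transactional;   /* derived from the instruction bytes alone */
    size_t prefix_len;      /* number of legacy prefix bytes consumed   */
} decoded_insn;

/* The legacy prefixes of AMD64 (REX and VEX are ignored here). */
static bool is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:              /* LOCK, REPNE, REP     */
    case 0x2E: case 0x36: case 0x3E: case 0x26:   /* segment overrides    */
    case 0x64: case 0x65:
    case 0x66: case 0x67:                         /* operand/address size */
        return true;
    default:
        return false;
    }
}

/* Flag an instruction as transactional if a LOCK prefix appears among
 * its legacy prefixes. No information about the surrounding execution
 * context (inside or outside a transaction) is consulted. */
static decoded_insn decode_prefixes(const uint8_t *bytes, size_t len)
{
    assert(bytes != NULL || len == 0);
    decoded_insn d = { false, 0 };
    while (d.prefix_len < len && is_legacy_prefix(bytes[d.prefix_len])) {
        if (bytes[d.prefix_len] == LOCK_PREFIX)
            d.transactional = true;
        d.prefix_len++;
    }
    return d;
}
```

For instance, the bytes F0 48 8B 07 (a LOCK-prefixed `mov rax, [rdi]`) are flagged as transactional, while 48 8B 07 alone are not.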

Marking transactional accesses relies heavily on a toolchain that detects shared-memory accesses inside language-level transactions and converts them into the right LOCKed machine instructions. Since the original use cases for ASF were small, contained transactions, such as DCAS (double compare-and-swap), the working set was known in advance and/or memory accesses were easily marked by the programmer. Early discussions with AMD’s micro-architects suggested that marking transactional instructions would be beneficial for a hardware implementation. As outlined above, early detection of transactional accesses in the decoder would remove the need to track a transactional context to decide whether a load or store is transactional.
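A DCAS shows why hand-marking was considered practical: its working set is exactly two words, known before the operation starts. The sketch below gives DCAS semantics in portable C11. Because ASF instructions cannot be issued from standard C, a global spin lock (`dcas_guard`, a hypothetical name) stands in for the transaction, and the comments indicate where an ASF version would use SPECULATE, LOCK-prefixed MOVs and COMMIT. The stand-in reproduces only atomicity, not speculation or conflict detection.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for an ASF transaction: a global spin lock. */
static atomic_flag dcas_guard = ATOMIC_FLAG_INIT;

/* Double compare-and-swap: atomically replace *a and *b with new_a and
 * new_b iff they currently hold old_a and old_b. The working set is
 * exactly {a, b}, so every access can be marked by hand. */
static bool dcas(intptr_t *a, intptr_t *b,
                 intptr_t old_a, intptr_t old_b,
                 intptr_t new_a, intptr_t new_b)
{
    assert(a != b);                              /* two distinct words    */
    while (atomic_flag_test_and_set_explicit(&dcas_guard, memory_order_acquire))
        ;                                        /* ASF: SPECULATE        */
    bool ok = (*a == old_a) && (*b == old_b);    /* ASF: two LOCK MOV loads  */
    if (ok) {
        *a = new_a;                              /* ASF: two LOCK MOV stores */
        *b = new_b;
    }
    atomic_flag_clear_explicit(&dcas_guard, memory_order_release); /* ASF: COMMIT */
    return ok;
}
```

A call such as `dcas(&x, &y, 1, 2, 3, 4)` succeeds only if `x == 1` and `y == 2`, and then updates both words together.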

Despite explicit marking, ASF needs some context awareness (whether a specific memory access is inside or outside a transaction) in the front-end of the CPU (see Chapter 4) in order to check whether transactional LOCKed instructions are placed inside an ASF transaction (they constitute an illegal instruction if they are not). Naturally, we looked into exploiting this transactional context tracking in hardware, and into the benefits of changing the “polarity” of the LOCK prefix in ASF: unmarked, basic memory instructions would exhibit transactional semantics, while marked instructions would be non-transactional inside a transaction. These new inverted transactions are started with the SPECULATE_INV instruction, while the existing mode remains available through SPECULATE.
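The resulting classification rule can be summarised as a small decision function. This C sketch uses hypothetical enum and function names and covers only the memory instructions to which ASF attaches transactional meaning via LOCK, not the ordinary LOCK-prefixed atomic read-modify-write instructions of AMD64, which remain legal outside transactions.

```c
#include <assert.h>
#include <stdbool.h>

/* Execution context tracked by the front-end (hypothetical names). */
typedef enum { CTX_NONE, CTX_SPECULATE, CTX_SPECULATE_INV } asf_ctx;

typedef enum { ACC_PLAIN, ACC_TRANSACTIONAL, ACC_ILLEGAL } acc_kind;

/* Classify one ASF memory access, given the current context and
 * whether the instruction carries a LOCK prefix. */
static acc_kind classify(asf_ctx ctx, bool lock_prefix)
{
    switch (ctx) {
    case CTX_NONE:
        /* LOCKed transactional instructions outside a transaction are illegal. */
        return lock_prefix ? ACC_ILLEGAL : ACC_PLAIN;
    case CTX_SPECULATE:
        /* Original polarity: only marked accesses are transactional. */
        return lock_prefix ? ACC_TRANSACTIONAL : ACC_PLAIN;
    case CTX_SPECULATE_INV:
        /* Inverted polarity: unmarked accesses are transactional,
         * marked ones bypass the transaction. */
        return lock_prefix ? ACC_PLAIN : ACC_TRANSACTIONAL;
    }
    assert(!"unknown context");
    return ACC_ILLEGAL;
}
```

The only difference between the two transactional modes is which value of `lock_prefix` selects transactional behaviour; the context check itself is the same hardware the original design already needed.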

This relatively simple change greatly simplifies the programming model and is consequently the standard mode of operation of all commercially released HTM architectures [353, 367]; notably, these were all released after our initial ASF design had been published. Most importantly, ASF can now be used to elide the locks of critical sections and to call into binary library code from transactions, while the critical sections themselves remain in their original binary form. This is possible because all memory modifications made by the called code are undone on a transaction abort, and unsupported instructions (which could cause unrecoverable state modifications, such as system calls) still abort the transaction. The partial execution of such binary code therefore cannot leave any inconsistent state behind.
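The guarantee that an abort erases everything the called code wrote can be modelled in software with an undo log. The sketch below is such a model, not ASF's hardware mechanism; `tx_store`, `tx_abort`, `tx_commit` and `library_update` are hypothetical names, and the library routine's plain stores are routed through `tx_store` to mimic what inverted mode does in hardware.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_CAP 64

/* One undo-log entry: where a transactional store went and what was there. */
typedef struct { int *addr; int old; } undo_entry;
typedef struct { undo_entry e[LOG_CAP]; size_t n; } undo_log;

/* Every transactional store first saves the previous value. */
static void tx_store(undo_log *log, int *addr, int val)
{
    assert(log->n < LOG_CAP);   /* real HTM would abort on capacity overflow */
    log->e[log->n++] = (undo_entry){ addr, *addr };
    *addr = val;
}

/* Abort: restore old values in reverse order, so repeated stores to the
 * same address end with the pre-transactional value. */
static void tx_abort(undo_log *log)
{
    while (log->n > 0) {
        undo_entry u = log->e[--log->n];
        *u.addr = u.old;
    }
}

/* Commit: the new values stay; the log is simply discarded. */
static void tx_commit(undo_log *log) { log->n = 0; }

/* Hypothetical "binary library" routine that knows nothing about
 * transactions. Returns -1 where it would reach an unsupported
 * instruction (e.g. a system call) that aborts the transaction. */
static int library_update(undo_log *log, int *counter, int *total,
                          int delta, int needs_syscall)
{
    tx_store(log, counter, *counter + 1);
    tx_store(log, total, *total + delta);
    if (needs_syscall)
        return -1;               /* the transaction aborts here */
    return 0;
}
```

If `library_update` is called with `needs_syscall` set, the caller aborts and `tx_abort` restores both variables to their pre-transactional values, so the half-finished update leaves no trace.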

Together with LD_PRELOAD tricks that replace the pthread locking library with one that performs ASF transactions instead of acquiring and releasing pthread mutexes or locks, we have shown that normal transactional memory can support transparent transactional lock elision [254]. The paper is attached in Appendix B.4.

With the simple compiler wrappers described in Section 3.3, the inverted mode integrates ASF almost fully into the compiler, because accesses no longer need to be marked at all. One problem of non-inverted ASF was that stack accesses, being unmarked, were performed non-transactionally. In normal ASF it is therefore critical not to overwrite the pre-transactional call stack (for example, by starting the transaction inside a function and then returning from that function while the transaction continues). Inverted ASF, however, treats all unmarked memory accesses as transactional; call-stack overwrites thus become speculative and are undone upon a transaction abort, at the expense of always adding stack accesses to the transactional working set.
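The stack hazard can be reproduced with a software model that adds a mode flag to an undo log. In the C sketch below (hypothetical names; the undo log stands in for ASF's hardware buffering), an unmarked store overwrites a stack slot that was live before the transaction began, and the transaction then aborts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { MODE_SPECULATE, MODE_SPECULATE_INV } tx_mode;

#define LOG_CAP 16
typedef struct { int *addr; int old; } undo_entry;
typedef struct { tx_mode mode; undo_entry e[LOG_CAP]; size_t n; } tx_state;

/* A store inside a transaction. Whether it joins the working set
 * depends on the mode and on the (modelled) LOCK prefix. */
static void store(tx_state *tx, int *addr, int val, bool lock_prefix)
{
    bool transactional = (tx->mode == MODE_SPECULATE) ? lock_prefix : !lock_prefix;
    if (transactional) {
        assert(tx->n < LOG_CAP);
        tx->e[tx->n++] = (undo_entry){ addr, *addr };
    }
    *addr = val;
}

static void abort_tx(tx_state *tx)
{
    while (tx->n > 0) {
        undo_entry u = tx->e[--tx->n];
        *u.addr = u.old;
    }
}

/* A stack slot live before the transaction (e.g. a saved return
 * address) is overwritten by an ordinary, unmarked stack store inside
 * the transaction. Returns the slot's value after the abort. */
static int slot_after_abort(tx_mode mode)
{
    int saved_slot = 0x1234;                  /* pre-transactional contents */
    tx_state tx = { .mode = mode, .n = 0 };
    store(&tx, &saved_slot, 0xdead, false);   /* unmarked stack store */
    abort_tx(&tx);
    return saved_slot;
}
```

`slot_after_abort(MODE_SPECULATE)` returns 0xdead, because the unmarked store bypassed the log, whereas `slot_after_abort(MODE_SPECULATE_INV)` returns the original 0x1234. The price is the extra log entry: in inverted mode every stack store occupies space in the working set.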

One problem arises when mixing normal and inverted transactions: upon the commit of a nested transaction, the CPU needs to know which polarity of the LOCK prefix applies to the enclosing transaction. This cannot be solved with a COMMIT_INV instruction, because choosing between COMMIT and COMMIT_INV would require knowledge of the type of the outer transaction, not of the one being committed. One possible option is to keep a bit-stack in the CPU that records the normal/inverted state whenever a new, nested transaction is started. For simplicity, however, we initially restrict nesting to same-type nesting (inverted transactions nest only in inverted transactions, and non-inverted only in non-inverted ones).
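The bit-stack option, together with the same-type restriction adopted initially, might look like this in C. The `nest_state` structure, the 64-level limit and all function names are assumptions of this sketch; it tracks only the polarity of each nesting level, not the transactional state itself. The bit-stack alone would permit mixed nesting; `nesting_allowed` expresses the stricter initial policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_DEPTH 64u

/* Hypothetical per-core state: bit i records whether nesting level i+1
 * was started with SPECULATE_INV (1) or SPECULATE (0). */
typedef struct { uint64_t bits; unsigned depth; } nest_state;

static void tx_begin(nest_state *s, bool inverted)
{
    assert(s->depth < MAX_DEPTH);
    if (inverted) s->bits |=  (UINT64_C(1) << s->depth);
    else          s->bits &= ~(UINT64_C(1) << s->depth);
    s->depth++;
}

/* COMMIT pops one level; the enclosing transaction's polarity is then
 * read from the new top of the stack, so no COMMIT_INV is needed. */
static void tx_commit(nest_state *s)
{
    assert(s->depth > 0);
    s->depth--;
}

/* Polarity that applies to the next memory access. */
static bool current_inverted(const nest_state *s)
{
    return s->depth > 0 && ((s->bits >> (s->depth - 1)) & 1u);
}

/* Initial restriction: a nested transaction must use its parent's mode. */
static bool nesting_allowed(const nest_state *s, bool inverted)
{
    return s->depth == 0 || current_inverted(s) == inverted;
}
```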

3.4.2 Signalling Transactional Problems

The original ASF 2 specification [186] (Appendix A) was very strict about flagging transactional errors that indicated a programmer problem. Executing disallowed instructions, improperly mixing transactional and non-transactional accesses (see Section 3.2.6) and other issues would not only abort the transaction but also raise a general-protection (#GP) or invalid-opcode (#UD) exception. The reasoning behind this strict flagging was the assumption that transactions would be written for ASF by a programmer, in accordance with the ASF rules for illegal instructions and memory accesses. A major use case for transactional memory, however, is replacing critical sections with transactions in existing code, often without recompiling or extensively restructuring the code in the transaction or critical section.

After practical experience with a full runtime and transactional compiler (using the TU Dresden stack [213]), we realised that generating exceptions in these cases was rather draconian. In particular, when transactions may call into the OS to allocate memory, a #GP exception is a drastic measure, as it will usually kill the application straight away. Instead, these issues can be worked around by following the backup path, which, for example, uses software transactional memory or a single global lock. Installing a signal handler for these cases seemed too heavy-weight a solution: it would require carefully distinguishing between in-transaction exceptions caused merely by the transactional context around legal code, which could be fixed by a simple restart (possibly using a different synchronisation mechanism), and genuine programming errors (for example, executing a kernel-mode instruction in user mode). We therefore changed the specification and implementation of ASF so that these events are reported to the transaction by aborting it and setting a special bit in the abort code, without generating a separate exception.
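A runtime reacting to the new abort reporting could be structured as follows. The bit positions and names are hypothetical and do not reproduce the abort-code encoding of the ASF specification; the sketch only shows that a disallowed event, which previously raised #GP or #UD, now leads directly to the backup path, while transient conflicts are retried a bounded number of times.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical abort-status bits for this sketch only; they are not
 * the encoding defined by the ASF specification. */
#define ABORT_CONFLICT   (1u << 0)   /* contention with another core      */
#define ABORT_DISALLOWED (1u << 1)   /* disallowed instruction or access, */
                                     /* previously reported as #GP / #UD  */

typedef enum { RETRY_TX, USE_FALLBACK } next_step;

/* Choose the runtime's reaction to an abort. A disallowed event will
 * recur on every retry, so the runtime goes straight to its backup path
 * (e.g. STM or a single global lock); conflicts are transient and are
 * retried up to max_attempts times. */
static next_step on_abort(uint32_t status, unsigned attempts, unsigned max_attempts)
{
    assert(max_attempts > 0);
    if (status & ABORT_DISALLOWED)
        return USE_FALLBACK;
    return attempts < max_attempts ? RETRY_TX : USE_FALLBACK;
}
```

No signal handler is involved: the decision is made in ordinary code at the transaction's abort path, and genuine programming errors can still surface later when the same code runs under the backup path outside any transaction.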

[Figure: Transaction 1 executes load A, store B, ...; Transaction 2 executes load B, store A, ...]

Figure 3.8: Two transactions with overlapping, conflicting access patterns.
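The conflict in Figure 3.8 can be stated as a test on read and write sets: two transactions conflict if one writes a location that the other reads or writes. The C sketch below implements that test with fixed-size address arrays (`access_set`, `SET_CAP` and `conflicts` are hypothetical names); an HTM such as ASF detects the same condition at cache-line granularity via cache-coherence messages rather than by comparing explicit sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SET_CAP 8

/* Read and write sets of one transaction, as plain address arrays. */
typedef struct {
    const void *reads[SET_CAP];  size_t nr;
    const void *writes[SET_CAP]; size_t nw;
} access_set;

static bool contains(const void *const *set, size_t n, const void *addr)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == addr)
            return true;
    return false;
}

/* Two transactions conflict if either one writes a location that the
 * other reads or writes; read/read sharing is not a conflict. */
static bool conflicts(const access_set *t1, const access_set *t2)
{
    assert(t1->nr <= SET_CAP && t1->nw <= SET_CAP);
    assert(t2->nr <= SET_CAP && t2->nw <= SET_CAP);
    for (size_t i = 0; i < t1->nw; i++)
        if (contains(t2->reads, t2->nr, t1->writes[i]) ||
            contains(t2->writes, t2->nw, t1->writes[i]))
            return true;
    for (size_t i = 0; i < t2->nw; i++)   /* write/write already covered above */
        if (contains(t1->reads, t1->nr, t2->writes[i]))
            return true;
    return false;
}
```

With Transaction 1 = {read A, write B} and Transaction 2 = {read B, write A}, `conflicts` reports a conflict in either argument order.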

In document Interaction of Hardware Transactional Memory and Microprocessor Microarchitecture (Page 84-86)