7.2 An ASF-Based TM Runtime Library
7.2.3 Loads and Stores
For brevity, let us consider a simple HTM-like implementation of hardware transactions. The compiler will transform all accesses to shared memory in a transaction into calls to the data transfer functions as specified in the ABI (i. e., loads, stores, and copying, moving or setting the values of blocks of memory). ASF uses selective annotation, so we can use LOCK MOV instructions to tell ASF to make only these accesses transactional. Other accesses do not have to be instrumented, thus we only need LOCK MOV instructions in the implementations of the ABI’s transactional data transfer functions.
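The division of labor described above can be sketched in portable C. This is not the code of Figure 7.6 and does not use real LOCK MOV instructions, which only exist on ASF hardware; it is a minimal software model under stated assumptions. The names `tx_desc`, `tx_monitor`, `tm_load_u64`, and `tm_store_u64`, the 64-byte line size, and the capacity of 16 lines are all hypothetical. The monitoring side effect of a speculative access is modeled by recording the touched cache line in the descriptor, and stores are written through directly instead of being buffered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u   /* assumed cache-line size in bytes */
#define MAX_LINES  16u   /* assumed (tiny) speculative capacity */

/* Hypothetical transaction descriptor: the set of monitored cache lines. */
typedef struct {
    uintptr_t lines[MAX_LINES];
    size_t    count;
} tx_desc;

/* Models the monitoring side effect of a speculative access.
   Returns 1 if the line is (now) monitored, 0 if capacity is exhausted. */
static int tx_monitor(tx_desc *tx, const void *addr)
{
    assert(tx != NULL);
    uintptr_t line = (uintptr_t)addr / CACHE_LINE;
    for (size_t i = 0; i < tx->count; i++)
        if (tx->lines[i] == line)
            return 1;                 /* already monitored */
    if (tx->count == MAX_LINES)
        return 0;                     /* out of capacity */
    tx->lines[tx->count++] = line;
    return 1;
}

/* Transactional 64-bit load: only calls like this one are annotated.
   A real HTM would abort on capacity overflow; this sketch ignores it. */
uint64_t tm_load_u64(tx_desc *tx, const uint64_t *addr)
{
    (void)tx_monitor(tx, addr);
    return *addr;
}

/* Transactional 64-bit store (written through here, not buffered). */
void tm_store_u64(tx_desc *tx, uint64_t *addr, uint64_t value)
{
    (void)tx_monitor(tx, addr);
    *addr = value;
}
```

Accesses that do not go through these two functions leave the descriptor untouched, which is the software analogue of not consuming speculative capacity for thread-private data.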
Figure 7.6 shows 64-bit load and store functions as an example.10 Static linking of ASF-TM and link-time optimization can reduce the overhead further by inlining the LOCK MOV instructions into the application code (see Section 3.4.1 for details). As an example, Figure 7.7 shows the code generated by the compiler for a transactional traversal of a linked list. The code for the sequential version of this traversal is essentially the same but without the LOCK prefixes.
10 The functions in the example take the address of the transaction descriptor as first argu-
Even simple HTMs such as the one I am considering here have to provide the transaction guarantees outlined in Section 4.2. ASF obviously handles concurrency control between transactions, but ASF-TM also has to ensure publication and privatization safety, which depend on the interaction of nontransactional memory accesses and publishing or privatizing transactions.
Privatization safety is implicitly guaranteed by ASF because hardware transactions will abort instantaneously on conflicting accesses to the cache lines that they have accessed (see Section 7.1.1). Informally, the snapshot of an active hardware transaction is always up-to-date. This also means that transactions will never operate on inconsistent data (e. g., they never observe only parts of the updates committed by another concurrent transaction). Furthermore, COMMIT instructions are full memory barriers, so later nonspeculative accesses to privatized data are never ordered before the commit.
Publication safety is a bit more involved because it must be ensured by both the compiler and the TM runtime library. Informally, it requires that transactional loads and stores are not reordered before other transactional loads that appear earlier in program order.11 Speculative loads are ordered in program order by ASF in terms of both monitoring the respective cache lines and retrieving the data (see Figure 7.2). Speculative stores will not become visible before the speculative region is committed, and thus earlier loads will always have started monitoring and loading the data before that. Thus, transactions will always observe the nonspeculative updates that happened before the publishing transaction signaled the availability of these updates. As with privatization safety, a publisher’s COMMIT instruction acts as a full memory barrier. However, the compiler also must take part in ensuring publication safety by not reordering transactional loads or stores before other, earlier loads (see Section 4.2.3).
The final issue that we have to consider is false sharing between speculative and nonspeculative accesses. The compiler has a consistent view of which memory locations are potentially shared with other threads. It also provides this view to the TM by using TM load and store functions iff the accessed location is shared. However, this information is at the granularity of bytes, whereas speculative accesses in ASF always operate on full cache lines, leading to potential false sharing between speculative and nonspeculative accesses from ASF’s perspective.
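The granularity mismatch can be made concrete with a small helper that maps addresses to cache lines. This is an illustrative sketch only: the 64-byte line size and the names `line_of` and `same_cache_line` are assumptions, not part of ASF or of the TM runtime described here.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE 64u   /* assumed cache-line size in bytes */

/* Index of the cache line that contains address p. */
static uintptr_t line_of(const void *p)
{
    return (uintptr_t)p / CACHE_LINE;
}

/* True if two byte-disjoint locations still fall into the same cache
   line, i.e., if a speculative access to one and a nonspeculative
   access to the other would look like sharing at line granularity. */
bool same_cache_line(const void *a, const void *b)
{
    return line_of(a) == line_of(b);
}
```

For a buffer aligned to 64 bytes, bytes 0 and 63 share a line even though a byte-granular analysis treats them as unrelated, while bytes 63 and 64 do not.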
ASF can handle some combinations of such false sharing but will raise a general protection fault if a nonspeculative store targets a cache line that has been accessed speculatively before. This is difficult for the compiler and TM to avoid because this kind of sharing can happen in several situations that can be caused by not just the TM building blocks but also by other parts of the tool chain or the application (e. g., by the linker).
For example, consider a thread’s stack. Variables on the stack are typically thread-private but can be shared as well. ASF’s selective annotation allows the compiler to avoid wasting ASF’s capacity for accesses to nonshared parts of the stack. But if there are shared variables on the stack frames of functions that execute transactions, then these transactions are prone to triggering general protection faults. To avoid this, the compiler’s code generator would have to ensure that these potentially shared variables are on a different cache line than all other stack slots modified in transactions. This separation would also have to hold with respect to stack frames of calling functions and other future calls’ stack frames that would end up on the same cache lines. Using a shadow stack for all on-stack allocations of potentially shared data, starting in the function that starts the outermost transaction, could be a reasonable approach to implement this.
11 This also applies to externally visible side effects caused by these accesses (e. g., page
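One way to picture the shadow-stack idea is a bump allocator that hands out only whole cache lines, so that potentially shared data never shares a line with ordinary stack slots. The following is a minimal sketch under assumptions, not the approach of any existing compiler: the names `shadow_stack`, `shadow_alloc`, and `shadow_release`, the 64-byte line size, and the 4096-byte region are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE  64u     /* assumed cache-line size in bytes */
#define SHADOW_SIZE 4096u   /* assumed size of the shadow region */

/* Separate, line-aligned region for potentially shared "stack" data. */
typedef struct {
    _Alignas(CACHE_LINE) unsigned char mem[SHADOW_SIZE];
    size_t top;             /* offset of the next free byte */
} shadow_stack;

/* Allocates size bytes, rounded up to whole cache lines, so that no two
   allocations (and no regular stack slot) share a line with each other.
   Returns NULL if the request is empty or does not fit. */
void *shadow_alloc(shadow_stack *s, size_t size)
{
    assert(s != NULL);
    if (size == 0 || size > SHADOW_SIZE)
        return NULL;
    size_t rounded = (size + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
    if (rounded > SHADOW_SIZE - s->top)
        return NULL;
    void *p = s->mem + s->top;
    s->top += rounded;
    return p;
}

/* Pops everything allocated since mark (a previously saved top),
   mirroring how a function epilogue releases its stack frame. */
void shadow_release(shadow_stack *s, size_t mark)
{
    assert(s != NULL && mark <= s->top);
    s->top = mark;
}
```

A function that starts the outermost transaction would save `top` on entry, allocate its potentially shared locals with `shadow_alloc`, and call `shadow_release` with the saved mark on exit. The price is up to one cache line of padding per allocation.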
Global variables are another example. Putting every global variable on a separate cache line is not practical because it could bloat applications’ memory requirements. At the same time, it can be hard to let the compiler infer which global variables are accessed or not accessed in transactions, because this requires whole-program points-to analysis.12
In summary, it is hard for the TM compiler and runtime library to always avoid false sharing between speculative and nonspeculative accesses. Always falling back to software transactions whenever such a problem could potentially occur would likely require a very conservative decision, wasting much of the performance benefit that ASF and selective annotation offer.
Note that the problem is not that ASF cannot deal with the false sharing but that it raises a general protection fault. If ASF instead just aborted the speculative region and signaled the false sharing with a special abort reason code, the TM could easily fall back to a software transaction. Advanced compiler analysis and trying to avoid the false sharing would still be possible.
Thus, judging from a software perspective, ASF should abort speculative regions when they encounter unsupported false sharing instead of raising a general protection fault. Note that this is similar to the case of disallowed instructions discussed in Section 7.2.1.
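To illustrate why an abort with a dedicated reason code would be easy for the TM to handle, the following sketch shows a retry loop with a software fallback. It models the behavior proposed above, not what ASF actually does (ASF raises a general protection fault in this case). All names are hypothetical: `tx_status`, `TX_ABORT_FALSE_SHARING`, `run_transaction`, and the `demo_*` helpers exist only for this example, and treating conflicts as retryable is an assumption.

```c
#include <stddef.h>

/* Hypothetical outcome codes of one hardware transaction attempt. */
typedef enum {
    TX_COMMITTED,
    TX_ABORT_CONFLICT,       /* transient: retrying may succeed */
    TX_ABORT_FALSE_SHARING   /* proposed reason code: retrying cannot help */
} tx_status;

typedef enum { PATH_HARDWARE, PATH_SOFTWARE } tx_path;

typedef tx_status (*hw_attempt_fn)(void *arg);
typedef void (*sw_run_fn)(void *arg);

/* Tries the hardware path up to max_retries + 1 times, then (or
   immediately on false sharing) runs the software transaction. */
tx_path run_transaction(hw_attempt_fn hw, sw_run_fn sw, void *arg,
                        int max_retries)
{
    for (int attempt = 0; attempt <= max_retries; attempt++) {
        tx_status st = hw(arg);
        if (st == TX_COMMITTED)
            return PATH_HARDWARE;
        if (st == TX_ABORT_FALSE_SHARING)
            break;           /* deterministic failure: stop retrying */
    }
    sw(arg);                 /* software transactions always complete here */
    return PATH_SOFTWARE;
}

/* Demo callbacks: a fake hardware attempt with a fixed outcome. */
typedef struct {
    int hw_attempts;
    int sw_runs;
    tx_status hw_result;
} demo_state;

tx_status demo_hw(void *arg)
{
    demo_state *d = arg;
    d->hw_attempts++;
    return d->hw_result;
}

void demo_sw(void *arg)
{
    demo_state *d = arg;
    d->sw_runs++;
}
```

With a fault instead of an abort code, no such loop is possible in the TM runtime alone, because control leaves the transaction through the fault handler and never reaches the fallback decision.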