
4.4 Basic Pipeline and Out-of-Order Core integration

4.4.3 Misspeculation

Section 4.1.1 introduced several forms of speculation in the OoO core and how each of them can fail. ASF-speculative load and store instructions are subject to the same mechanisms. This caused several challenges for our implementation, because misspeculation leads to complex interactions when resources are released and redistributed.

Precise ASF working-set tracking. Because of OoO speculation, the core may overestimate ASF’s working set: misspeculated memory instructions can add spurious ASF-spec entries to the LLB or cache before the misspeculation is detected and the corresponding memory instructions are annulled. Linked tree data structures (see Figure 4.8), for example, can exhibit this issue. If the branch for the test at a node is wrongly predicted, speculative execution may traverse one or more wrong paths of the tree.

Conceptually, the overestimation does not affect the correctness of the execution (all lines that need protection are protected). It does cost performance, however, since the additional lines artificially increase contention and put additional pressure on the limited capacity.
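The tree example can be made concrete with a toy model. The sketch below is not taken from the ASF implementation; the `Node` class, the per-node cache-line ids, and the assumption that each mispredicted branch issues exactly one wrong-path load before it is squashed are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    line: int                       # hypothetical cache-line id of this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def lookup_lines(root, key, mispredicted_depths=frozenset()):
    """Return (needed, touched) cache lines for one tree lookup.

    needed:  lines on the correct search path (the precise read set).
    touched: lines marked when the branch at each depth listed in
             mispredicted_depths first sends one load down the wrong child.
    """
    needed, touched = set(), set()
    node, depth = root, 0
    while node is not None:
        needed.add(node.line)
        touched.add(node.line)
        if key == node.key:
            break
        go_left = key < node.key
        correct = node.left if go_left else node.right
        wrong = node.right if go_left else node.left
        if depth in mispredicted_depths and wrong is not None:
            touched.add(wrong.line)  # spurious entry; the load is annulled later
        node, depth = correct, depth + 1
    return needed, touched


def build_example():
    # Complete binary search tree over keys 1..7; node k lives on line 100 + k.
    n = {k: Node(key=k, line=100 + k) for k in range(1, 8)}
    n[4].left, n[4].right = n[2], n[6]
    n[2].left, n[2].right = n[1], n[3]
    n[6].left, n[6].right = n[5], n[7]
    return n[4]


needed, touched = lookup_lines(build_example(), key=1, mispredicted_depths={0, 1})
spurious = touched - needed          # lines that inflate the working set
```

In this model the precise read set is always a subset of the touched set, which mirrors the observation that overestimation is safe but wasteful: every entry in `spurious` occupies capacity and can cause a conflict without ever being needed.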

It is thus desirable to detect and remove spurious entries in ASF’s working sets. However, recomputing the actual ASF-spec state of a cache line when annulling an ASF-spec memory access is challenging. The state depends not only on in-flight memory instructions, but also has to take into account retired ASF-spec memory instructions of the current speculative region that have referenced the cache line.

[Figure 4.9 diagram: a data cache and a miss buffer, each holding line foo marked TX.R; instruction stream asf.spec, asf.load [foo], asf.load [foo+8], asf.com; events numbered (1) to (6), including a clear of the speculative bits.]

Figure 4.9: Complications when ASF instructions get squashed during a cache miss and proper identification of the associated cache line is required: a speculative LOCK MOV misses in the cache and creates an ASF-marked miss buffer entry (1), and subsequently sends a cache miss request to the memory hierarchy (2). While the request is in flight, the load instruction is squashed (3). If the transaction commits at this point (4) and clears all speculative bits in the cache, the returning response for the miss will then set up the line with ASF marking (5). Simply resetting the ASF flag of either the cache line or the miss buffer entry upon load squash (3) is not correct, because there might exist an aliasing load that requires the ASF-ness to be preserved (6).

Our LLB-based ASF design supports reference counting for this purpose and can therefore track read/write sets precisely. Adding reference-counting mechanisms to the existing L1 cache would be expensive; thus, the L1-based ASF implementation currently may overestimate the read set. Another option to avoid misspeculated entries in the transactional read/write sets is to mark entries in the cache only after all control-flow speculation has been resolved, for example at the retire time of the load, with a mechanism similar to what we describe for misses in the next section. The additional access to the data cache may, however, increase the latency of the instruction and consume bandwidth of the cache interface.
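The reference-counting idea for precise working-set tracking can be sketched as follows. This is a minimal software model under stated assumptions, not the LLB hardware: the class `RefCountedWorkingSet` and its method names are hypothetical, and it simply keeps, per cache line, one count of in-flight and one count of retired ASF-spec accesses of the current speculative region, since both contribute to the line's ASF-spec state.

```python
from collections import Counter


class RefCountedWorkingSet:
    """Toy model: a line is protected while any ASF-spec access references it."""

    def __init__(self):
        self.in_flight = Counter()   # line -> in-flight ASF-spec accesses
        self.retired = Counter()     # line -> retired accesses of this region

    def issue(self, line):
        # An ASF-spec access is issued, possibly on a wrong path.
        self.in_flight[line] += 1

    def retire(self, line):
        # The access was on the correct path and becomes architectural.
        assert self.in_flight[line] > 0, "retire without a matching issue"
        self.in_flight[line] -= 1
        self.retired[line] += 1

    def annul(self, line):
        # The access was misspeculated; drop its reference again.
        assert self.in_flight[line] > 0, "annul without a matching issue"
        self.in_flight[line] -= 1

    def protected_lines(self):
        lines = set(self.in_flight) | set(self.retired)
        return {l for l in lines if self.in_flight[l] + self.retired[l] > 0}

    def end_region(self):
        # Commit or abort: the whole working set is released.
        self.in_flight.clear()
        self.retired.clear()


ws = RefCountedWorkingSet()
ws.issue("A")
ws.retire("A")     # correct-path access to line A
ws.issue("A")      # wrong-path access to the same line
ws.issue("B")      # wrong-path access to a different line
ws.annul("A")
ws.annul("B")
remaining = ws.protected_lines()
```

After the two annulments, line B drops out of the working set while line A stays protected because of the retired access. A single sticky bit per line could not distinguish these two cases, which is why recomputing the state on annulment needs the counts.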

Orphan cache entries. Not tracking ASF’s working set precisely is safe in principle, but under specific timing conditions it can lead to orphan ASF-spec entries in the cache of an L1-based ASF implementation. These orphan entries remain marked even though the originating speculative region has already committed or aborted; see Figure 4.9.

To illustrate, consider the following sequence of events: an ASF-spec load misses in the cache and sets up an ASF-spec miss-buffer entry to track the cache miss. The load is later annulled because it is on a wrongly predicted branch, but the cache-miss handling cannot be aborted at this point. The speculative region then commits by successfully retiring the COMMIT instruction (the original dependency on the cache-missing load no longer exists, since that load has been annulled). When the cache line is finally filled into the cache, its spec-read bit is set because the corresponding miss-buffer entry was tagged as ASF-spec, leaving an orphan spec-read cache line.
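The event sequence can be replayed in a small model. The sketch below is an illustrative assumption, not the prototype's logic: `NaiveL1` is a hypothetical class in which a miss-buffer entry carries a sticky ASF-spec flag that is never revisited when a load is squashed. The step numbers in the comments refer to Figure 4.9.

```python
class NaiveL1:
    """Toy L1 model with a sticky ASF-spec flag per miss-buffer entry."""

    def __init__(self):
        self.spec_read = set()   # lines whose spec-read bit is set
        self.miss_buffer = {}    # line -> ASF-spec flag of the pending miss

    def load_miss(self, line, asf_spec):
        # Allocate or reuse the entry; the flag is sticky once set.
        self.miss_buffer[line] = self.miss_buffer.get(line, False) or asf_spec

    def squash_load(self, line):
        # The miss cannot be cancelled, and the naive design keeps the flag.
        pass

    def commit(self):
        # COMMIT retires: clear all speculative bits in the cache.
        self.spec_read.clear()

    def fill(self, line):
        # The response arrives: install the line, copying the entry's flag.
        if self.miss_buffer.pop(line):
            self.spec_read.add(line)


l1 = NaiveL1()
l1.load_miss("foo", asf_spec=True)  # (1), (2): ASF-spec load misses
l1.squash_load("foo")               # (3): load was on a mispredicted path
l1.commit()                         # (4): region commits, bits are cleared
l1.fill("foo")                      # (5): late fill sets spec-read again
orphans = l1.spec_read              # marked although no region is active
```

At the end of the replay, `orphans` contains the line foo even though the speculative region has already committed, which is exactly the orphan spec-read cache line described above.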

Note that simply resetting the cache line’s spec-read bit on annulment of referencing ASF-spec loads would be incorrect, because multiple in-flight loads (ASF-spec and non-ASF-spec) may still reference the miss-buffer entry. Similarly, the miss-buffer entry’s ASF-spec state cannot be simply reset because it may still be referenced by other in-flight ASF-spec loads.

A simplified version of the recomputation introduced previously solves this issue (Figure 4.10): we reuse the existing reference from a miss-buffer entry to its associated in-flight loads and count the ASF-spec-load references (or rather, track the ASF-ness of each memory instruction referencing the miss-buffer entry). We observe that no retired load can contribute to the ASF-spec state of the miss-buffer entry, because loads can only retire once their cache misses have been resolved. Therefore, the number of ASF-spec loads referencing the miss-buffer entry can always be computed online by counting all non-retired (in-flight) loads with such a reference. This allows miss-buffer entries to track their ASF-spec state precisely and eliminates the need for dedicated reference counting in the L1 cache. As a result, no modification to the L1 cache is necessary, and we readily implemented this mechanism to prevent orphan spec-read cache entries in our ASF prototype.
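A minimal sketch of this recomputation, assuming a software model in which each miss-buffer entry holds a list of its in-flight loads, is shown below. The names `Load`, `MissBufferEntry`, and `fill` are hypothetical and only illustrate the idea; the prototype reuses existing hardware references rather than a list.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)                # identity comparison: each load is distinct
class Load:
    asf_spec: bool


@dataclass
class MissBufferEntry:
    line: str
    loads: list = field(default_factory=list)   # in-flight loads awaiting the fill

    def attach(self, load):
        self.loads.append(load)

    def squash(self, load):
        self.loads.remove(load)     # an annulled load drops its reference

    @property
    def asf_spec(self):
        # Recomputed on demand. Retired loads never appear here, because a
        # load can only retire after its miss has been resolved.
        return any(l.asf_spec for l in self.loads)


def fill(entry, spec_read):
    """Install the line; set spec-read only if an ASF-spec load still waits."""
    if entry.asf_spec:
        spec_read.add(entry.line)


spec_read = set()

# Lone ASF-spec load is squashed before the fill: no orphan is created.
e1 = MissBufferEntry("foo")
a = Load(asf_spec=True)
e1.attach(a)
e1.squash(a)
fill(e1, spec_read)

# Aliasing case: one of two ASF-spec loads is squashed, ASF-ness is preserved.
e2 = MissBufferEntry("bar")
b, c = Load(asf_spec=True), Load(asf_spec=True)
e2.attach(b)
e2.attach(c)
e2.squash(b)
fill(e2, spec_read)

# Mixed case: only a non-ASF-spec load remains after the squash.
e3 = MissBufferEntry("baz")
d, plain = Load(asf_spec=True), Load(asf_spec=False)
e3.attach(d)
e3.attach(plain)
e3.squash(d)
fill(e3, spec_read)
```

Only bar ends up with its spec-read bit set. Deriving the flag from the surviving references, instead of resetting it on every squash, handles both the orphan case and the aliasing-load case with the same rule.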

In a simpler design, it may be viable to wait until the miss buffers are completely empty before starting or committing the transaction. Doing so, however, increases the entry latency, and it also increases the commit latency when non-transactional accesses are present inside the transaction.

[Figure 4.10 diagram: miss buffer entry foo referenced by in-flight accesses asf.load, load, load, asf.load; each of the two reference counts changes from 2 to 1.]

Figure 4.10: Recomputation of the ASF-ness of a specific miss buffer entry that is being reused by multiple in-flight ASF-spec / non-ASF-spec accesses. The mechanism performs reference counting for each of the two categories in the miss buffer.

(retirement of the COMMIT instruction) would also work around the orphan-cache-entries issue. However, our recomputation approach tracks ASF’s working set more closely (for misses) and thus reduces the likelihood of contention.