• No results found

Memory Hierarchy Control and Consistency

IA-64 Application Programming Model

4.4.6 Memory Hierarchy Control and Consistency

4.4.6.1

Hierarchy Control and Hints

IA-64 memory access instructions are defined to specify whether the data being accessed possesses temporal locality. In addition, memory access instructions can specify which levels of the memory hierarchy are affected by the access. This leads to an architectural view of the memory hierarchy depicted in Figure 4-4 composed of zero or more levels of cache between the register files and memory where each level may consist of two parallel structures: a temporal structure and a non- temporal structure. Note that this view applies to data accesses and not instruction accesses.

The temporal structures cache memory accessed with temporal locality; the non-temporal structures cache memory accessed without temporal locality. Both structures assume that memory accesses possess spatial locality. The existence of separate temporal and non-temporal structures, as well as the number of levels of cache, is implementation dependent.

Three mechanisms are defined for allocation control: locality hints; explicit prefetch; and implicit prefetch. Locality hints are specified by load, store, and explicit prefetch (lfetch) instructions. A locality hint specifies a hierarchy level (e.g. 1, 2, all). An access that is temporal with respect to a given hierarchy level is treated as temporal with respect to all lower (higher numbered) levels. An access that is non-temporal with respect to a given hierarchy level is treated as temporal with respect to all lower levels. Finding a cache line closer in the hierarchy than specified in the hint does not demote the line. This enables the precise management of lines using lfetch and then subsequent uses by .nta loads and stores to retain that level in the hierarchy. For example, specifying the .nt2 hint by a prefetch indicates that the data should be cached at level 3. Subsequent loads and stores can specify .nta and have the data remain at level 3.

Locality hints do not affect the functional behavior of the program and may be ignored by the implementation. The locality hints available to loads, stores, and explicit prefetch instructions are given in Table 4-18. Instruction accesses are considered to possess both temporal and spatial locality with respect to level 1.

Figure 4-4. Memory Hierarchy

000722 Temporal Structure Non-temporal Structure Register Files Temporal Structure Non-temporal Structure Temporal Structure Non-temporal Structure Memory

Level 1 Level 2 Level N

Each locality hint implies a particular allocation path in the memory hierarchy. The allocation paths corresponding to the locality hints are depicted in Figure 4-5. The allocation path specifies the structures in which the line containing the data being referenced would best be allocated. If the line is already at the same or higher level in the hierarchy no movement occurs. Hinting that a datum should be cached in a temporal structure indicates that it is likely to be read in the near future. Explicit prefetch is defined in the form of the line prefetch instruction (lfetch, lfetch.fault). The lfetch instructions moves the line containing the addressed byte to a location in the memory hierarchy specified by the locality hint. If the line is already at the same or higher level in the hierarchy, no movement occurs. Both immediate and register post-increment are defined for

lfetch and lfetch.fault. The lfetch instruction does not cause any exceptions, does not affect program behavior, and may be ignored by the implementation. The lfetch.fault

instruction affects the memory hierarchy in exactly the same way as lfetch but takes exceptions as if it were a 1-byte load instruction.

Implicit prefetch is based on the address post-increment of loads, stores, lfetch and

lfetch.fault. The line containing the post-incremented address is moved in the memory hierarchy based on the locality hint of the originating load, store, lfetch or lfetch.fault. If the line is already at the same or higher level in the hierarchy then no movement occurs. Implicit prefetch does not cause any exceptions, does not affect program behavior, and may be ignored by the implementation.

Table 4-18. Locality Hints Specified by Each Instruction Class

Mnemonic Locality Hint

Instruction Type

Load Store lfetch, lfetch.fault

none Temporal, level 1 x x x

nt1 Non-temporal, level 1 x x

nt2 Non-temporal, level 2 x

nta Non-temporal, all levels x x x

Figure 4-5. Allocation Paths Supported in the Memory Hierarchy

000723 Temporal Structure Non-temporal Structure Temporal Structure Non-temporal Structure Level 1 Level 2 Cache Temporal Structure Non-temporal Structure Level 3 Memory

Non-temporal, All Levels Non-temporal, Level 1 Non-temporal, Level 2 Temporal, Level 1

Another form of hint that can be provided on loads is the ld.bias load type. This is a hint to the implementation to acquire exclusive ownership of the line containing the addressed data. The bias hint does not affect program functionality and may be ignored by the implementation.

Two instructions are defined for flush control: flush cache (fc) and flush write buffers (fwb). The

fc instruction invalidates the cache line in all levels of the memory hierarchy above memory. If the cache line is not consistent with memory, then it is copied into memory before invalidation. The

fwb instruction provides a hint to flush all pending buffered writes to memory (no indication of completion occurs).

Table 4-19 summarizes the memory hierarchy control instructions and hint mechanisms.

4.4.6.2

Memory Consistency

IA-64 instruction accesses made by a processor are not coherent with respect to instruction and/or data accesses made by any other processor, nor are instruction accesses made by a processor coherent with respect to data accesses made by that same processor. Therefore, hardware is not required to keep a processor’s instruction caches consistent with respect to any processor’s data caches, including that processor’s own data caches; nor is hardware required to keep a processor’s instruction caches consistent with respect to any other processor’s instruction caches. Data accesses from different processors in the same coherence domain are coherent with respect to each other; this consistency is provided by the hardware. Data accesses from the same processor are subject to data dependency rules; see “Memory Access Ordering” below.

The mechanism(s) by which coherence is maintained is implementation dependent. Separate or unified structures for caching data and instructions are not architecturally visible. Within this context there are two categories of data memory hierarchy control: allocation and flush. Allocation refers to movement towards the processor in the hierarchy (lower numbered levels) and flush refers to movement away from the processor in the hierarchy (higher numbered levels). Allocation and flush occur in line-sized units; the minimum architecturally visible line size is 32 bytes (aligned on a 32-byte boundary). The line size in an implementation may be smaller in which case the implementation will need to move multiple lines for each allocation and flush event. An implementation may allocate and flush in units larger than 32 bytes.

In order to guarantee that a write from a given processor becomes visible to the instruction stream of that same, and other, processors, the affected line(s) must be flushed to memory. Software may use the fc instruction for this purpose. Memory updates by DMA devices are coherent with respect to instruction and data accesses of processors. The consistency between instruction and data caches of processors with respect to memory updates by DMA devices is provided by the hardware. In

Table 4-19. Memory Hierarchy Control Instructions and Hint Mechanisms

Mnemonic Operation

.nt1 and .nta completer on loads Load usage hints

.nta completer on stores Store usage hints

prefetch line at post-increment address on loads and stores Prefetch hint lfetch,lfetch.fault with .nt1, .nt2, and .nta hints Prefetch line

fc Flush cache

case a program modifies its own instructions, the sync.i and srlz.i instructions are used to ensure that prior coherency actions are observed by a given point in the program. Refer to the description sync.i in Chapter 2 of Volume 3 for an example of self-modifying code.

4.4.7

Memory Access Ordering

Memory data access ordering must satisfy read-after-write (RAW), write-after-write (WAW), and write-after-read (WAR) data dependencies to the same memory location. In addition, memory writes and flushes must observe control dependencies. Except for these restrictions, reads, writes, and flushes may occur in an order different from the specified program order. Note that no ordering exists between instruction accesses and data accesses or between any two instruction accesses. The mechanisms described below are defined to enforce a particular memory access order. In the following discussion, the terms “previous” and “subsequent” are used to refer to the program specified order. The term “visible” is used to refer to all architecturally visible effects of performing a memory access (at a minimum this involves reading or writing memory).

Memory accesses follow one of four memory ordering semantics: unordered, release, acquire or fence. Unordered data accesses may become visible in any order. Release data accesses guarantee that all previous data accesses are made visible prior to being made visible themselves. Acquire data accesses guarantee that they are made visible prior to all subsequent data accesses. Fence operations combine the release and acquire semantics into a bi-directional fence, i.e. they guarantee that all previous data accesses are made visible prior to any subsequent data accesses being made visible.

Explicit memory ordering takes the form of a set of instructions: ordered load and ordered check load (ld.acq, ld.c.clr.acq), ordered store (st.rel), semaphores (cmpxchg, xchg, fetchadd), and memory fence (mf). The ld.acq and ld.c.clr.acq instructions follow acquire semantics. The st.rel follows release semantics. The mf instruction is a fence operation. The xchg,

fetchadd.acq, and cmpxchg.acq instructions have acquire semantics. The cmpxchg.rel, and

fetchadd.rel instructions have release semantics. The semaphore instructions also have implicit ordering. If there is a write, it will always follow the read. In addition, the read and write will be performed atomically with no intervening accesses to the same memory region.

Table 4-20 illustrates the ordering interactions between memory accesses with different ordering semantics. “O” indicates that the first and second reference are performed in order with respect to each other. A “-” indicates that no ordering is implied other than data dependencies (and control dependencies for writes and flushes).

Table 4-21 summarizes memory ordering instructions related to cacheable memory. For definitions of the ordering rules related to non-cacheable memory, cache synchronization, and privileged instructions, refer to “Sequentiality Attribute and Ordering” in Chapter 4 of Volume 2.

Table 4-20. Memory Ordering Rules

First Reference Second Reference

Fence Acquire Release Unordered

fence O O O O

acquire O O O O

release O – O –