Specification of System Requirements To correctly and efficiently implement a memory model, a system designer must first identify the memory

Definition 4.10: Correct Implementation of a Memory Model

A system correctly implements a memory model iff for every execution allowed by the system, the result is the same as a result of a valid execution (as specified by Definition 4.9).
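As a minimal sketch of how Definition 4.10 can be checked on a tiny example, the Python below enumerates every valid execution of a two-processor program and compares the resulting set of results against the results a system produces. Sequential consistency is assumed here as a stand-in for the memory model (the validity conditions of Definition 4.9 are not reproduced), and the names `interleavings`, `run`, and `correctly_implements` are hypothetical helpers introduced only for illustration.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that preserves each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(ops):
    """Execute ops one at a time on a single memory; return the values read."""
    mem = {"x": 0, "y": 0}          # initial values (an assumption of this sketch)
    regs = {}
    for kind, loc, arg in ops:
        if kind == "W":
            mem[loc] = arg          # arg is the value written
        else:
            regs[arg] = mem[loc]    # arg is the register receiving the value
    return (regs["r1"], regs["r2"])


# Program order of each processor.
P1 = [("W", "x", 1), ("R", "y", "r1")]
P2 = [("W", "y", 1), ("R", "x", "r2")]

# Results of all valid executions under the assumed model.
valid_results = {run(order) for order in interleavings(P1, P2)}


def correctly_implements(system_results, valid_results):
    """Definition 4.10: every result the system produces is a valid result."""
    return set(system_results) <= set(valid_results)


print(sorted(valid_results))
print(correctly_implements({(1, 1), (0, 1)}, valid_results))
print(correctly_implements({(0, 0), (1, 1)}, valid_results))
```

The check only compares results, not the order in which operations were actually performed, which is why an implementation may reorder operations internally as long as no new result becomes observable.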

A typical implementation of a model obeys the constraints on execution order directly (and usually conservatively); i.e., constraints on the execution order are enforced in real time. A system designer is free to violate these constraints as long as the resulting executions appear to obey the constraints by generating the same results (Definition 4.10).⁴ However, in practice it is difficult for a designer to come up with less restrictive constraints and yet guarantee that the allowable executions will produce the same results as with the original constraints. Therefore, it is important to initially provide the designer with as minimal a set of constraints as possible.

The next section presents a more general abstraction for memory operations that is better suited for providing aggressive specifications that further expose the optimizations allowed by a model, thus making it easier to come up with more efficient implementations. This generality also enables the specification of a much larger set of relaxed models.

4.1.3 A More General Abstraction for Memory Operations

Figure 4.3 depicts the conceptual model for memory that inspires our general abstraction. The abstraction consists of n processors, P1, ..., Pn, each with a complete copy of the shared (writable) memory, denoted as Mi for Pi. Each processing node also has a conceptually infinite memory buffer between the processor and memory on that node. As before, processors access memory using read and write operations. However, due to the presence of multiple copies, we introduce the notion of sub-operations for each memory operation. A read operation R by Pi is comprised of two atomic sub-operations: an initial sub-operation Rinit(i) and a single

and operations that have been generated so far in the system. The partial execution generates a valid outcome or is valid if it could possibly lead to a valid complete execution; i.e., a partial execution is valid if its sets I’ and O’ are respectively subsets of the sets I and O of the instructions and operations of some valid execution. Note here that in forming I’ and O’, we should not include speculative instructions or operations that may have been generated but are not yet committed in the system.

⁴ This would not be possible for models such as linearizability (see Section 2.6 in Chapter 2) where correctness depends on the real time when events occur. In contrast, for the models discussed here, we have assumed that the real time occurrence of events is unobservable to the programmer and is therefore not part of the correctness criteria (e.g., see Definition 4.6).

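[Figure: processor Pi with its memory buffer and memory copy Mi, one of n such nodes (P1, ..., Pn) connected by a network. Labeled flows: R(i) and W(j), j = 1..n; R(i) and W(i); W(j), j ≠ i; W(i) from other P’s; R(i) response.]

Figure 4.3: General model for shared memory.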

read sub-operation R(i). A write operation W by Pi is comprised of at most (n+1) atomic sub-operations: an initial write sub-operation Winit(i) and at most n sub-operations W(1), ..., W(n). A read operation on Pi results in the read sub-operation R(i) being placed into Pi’s memory buffer. Similarly, a write operation on Pi results in write sub-operations W(1), ..., W(n) being placed in its processor’s memory buffer. The initial sub-operations Rinit(i) and Winit(i) are mainly used to capture the program order among conflicting operations. Conceptually, Rinit(i) for a read corresponds to the initial event of placing R(i) into a memory buffer and Winit(i) for a write corresponds to the initial event of placing W(1), ..., W(n) into the memory buffer.

The sub-operations placed in the buffer are later issued to the memory system (not necessarily in first-in-first-out order). A write sub-operation W(j) by Pi executes when it is issued from the buffer to the memory system and atomically updates its destination location in the memory copy Mj of Pj to the specified value. A read sub-operation R(i) by Pi executes when its corresponding read operation is issued to the memory system and returns a value. If there are any write sub-operations W(i) in Pi’s memory buffer that are to the same location as R(i), then R(i) returns the value of the last such W(i) that was placed in the buffer (and is still in the buffer). Otherwise, R(i) returns the value of the last write sub-operation W(i) that executed in the memory copy Mi of Pi, or returns the initial value of the location if there is no such write.
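The rules above can be sketched as a small Python simulation. This is an illustrative toy under stated assumptions, not the formalism itself: the class `GeneralMemoryModel` and its method names are hypothetical, processors are indexed from 0 rather than 1, placing and issuing a read are collapsed into a single `read` call, and when several buffered sub-operations match an issue request the oldest one is chosen.

```python
class GeneralMemoryModel:
    """Toy model: n processors, each with a memory copy and a memory buffer."""

    def __init__(self, n, initial=0):
        self.n = n
        self.initial = initial                      # initial value of every location
        self.memory = [dict() for _ in range(n)]    # memory[i] plays the role of Mi
        self.buffer = [[] for _ in range(n)]        # buffer[i] holds (dest, loc, value)

    def write(self, i, loc, value):
        """Write by Pi: place one sub-operation W(j) per memory copy in Pi's buffer."""
        for j in range(self.n):
            self.buffer[i].append((j, loc, value))

    def issue_write(self, i, j, loc):
        """Issue a buffered W(j) by Pi to loc; it atomically updates copy Mj.

        The caller chooses which sub-operation to issue, so issue order need
        not be first-in-first-out.
        """
        for k, (dest, l, value) in enumerate(self.buffer[i]):
            if dest == j and l == loc:
                del self.buffer[i][k]
                self.memory[j][loc] = value
                return
        raise ValueError("no matching write sub-operation in the buffer")

    def read(self, i, loc):
        """R(i): forward from Pi's own buffered W(i) if present, else read Mi."""
        pending = [v for (dest, l, v) in self.buffer[i] if dest == i and l == loc]
        if pending:
            return pending[-1]                      # last such W(i) still in the buffer
        return self.memory[i].get(loc, self.initial)


m = GeneralMemoryModel(n=3)
m.write(0, "x", 1)                  # processor 0 writes x = 1
own_before = m.read(0, "x")         # forwarded from processor 0's own buffer
others_before = (m.read(1, "x"), m.read(2, "x"))

m.issue_write(0, 1, "x")            # only the sub-operation for processor 1 executes
others_mid = (m.read(1, "x"), m.read(2, "x"))

m.issue_write(0, 2, "x")
m.issue_write(0, 0, "x")            # now every copy has been updated
final = (m.read(0, "x"), m.read(1, "x"), m.read(2, "x"))
print(own_before, others_before, others_mid, final)
```

In this run the writing processor observes its own write before any memory copy changes, and the two other processors briefly disagree on the value of `x` because the write's sub-operations execute at different times.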

The above abstraction is a conceptual model that captures the important properties of a shared-memory system that are relevant to specifying system requirements for memory models. Below, we describe the significance of the following features that are modeled by the abstraction: a complete copy of memory for each processor, several atomic sub-operations for a write, and buffering operations before issue to memory.

Providing each processor with a copy of memory serves to model the multiple copies that are present in real systems due to the replication and caching of shared data. For example, the copy of memory modeled for a processor may in reality represent a union of the state of the processor’s cache, and blocks belonging to memory or other caches that are not present in this processor’s cache.

The second feature, i.e., several atomic sub-operations for a write, models the fact that updates to multiple copies of a location may be non-atomic. Adve and Hill [AH92a] explain the correspondence to real systems as follows (paraphrased). “While there may be no distinct physical entity in a real system that corresponds to a certain sub-operation, a logically distinct sub-operation may be associated with every operation and a memory copy. For example, updating the main memory on a write corresponds to the sub-operations of the write in memory copies of processors that do not have the block in their cache. Also, while sub-operations may not actually execute atomically in real systems, one can identify a single instant in time at which the sub-operation takes effect such that other sub-operations appear to take effect either before or after this time.” Note that in reality, write operations may actually invalidate a processor’s copy instead of updating it with new data. Nevertheless, the event can be modeled as an update of the logical copy of memory for that processor.

The third feature, i.e., the memory buffer, seems necessary to capture the behavior of many multiprocessor system designs where a processor reads the value of its own write before any of the write’s sub-operations take effect in memory. We refer to this feature as read forwarding. As with the other features, there may be no physical entity corresponding to this buffer in a real system. For example, this scenario can occur in a cache-coherent multiprocessor if a processor does a write to a location that requires exclusive ownership to be requested and allows a subsequent conflicting read (from the same processor) to return the new value from the cache while the ownership request is pending.

The conceptual model presented above provides the intuition for the formalism we present below. Definition 4.11 describes the components of an execution based on the general abstraction for memory operations. Compared to Definition 4.7, the only differences are the presence of memory sub-operations and the fact that the execution order is defined among these sub-operations as opposed to among memory operations.

Definition 4.11: Components of an Execution with the General Abstraction for Memory Operations
