4.5 Extensions to Our Abstraction and Specification Framework

While our framework is general and extensible, the methodology we have presented so far deals only with data read and data write memory operations. Furthermore, parts of our specification methodology, such as imposing execution order constraints among conflicting operations only, depend heavily upon our simplified definition for the result of an execution. This simplified abstraction is extremely useful for isolating and specifying the behavior of shared memory. Furthermore, for most programmers, the simple abstraction and specifications are sufficient for understanding the behavior of a system. Nevertheless, a small number of programmers, such as system programmers, may require a more general framework that also encompasses other types of operations such as those issued to and from I/O devices. Similarly, system designers must typically deal with ordering semantics for a more general set of events.

To characterize the behavior of realistic shared-memory systems, our framework must be generalized in two ways: (a) include more events, such as events generated by I/O devices, and (b) extend the notion of result to include the effect of some of these events. Appendix J identifies some of the issues that arise in modeling a realistic system and describes possible extensions to our framework to deal with these issues. Many of the issues we discuss are not particular to multiprocessors and occur in uniprocessor systems as well. Furthermore, even uniprocessor designs may sacrifice serial semantics for events such as I/O operations or instruction fetches in order to achieve higher performance. Chapter 5 further describes implementation issues with respect to I/O operations, instruction fetches, and multiple granularity data operations.

4.6 Related Work

This section compares our abstraction and specification framework with other approaches. We also describe related work in specifying sufficient conditions for supporting properly-labeled programs.

4.6.1 Relationship to other Shared-Memory Abstractions

Section 4.1.3 presented our general abstraction for shared memory and enumerated the significance of the three main features modeled by this abstraction: a complete copy of memory for each processor, several atomic sub-operations for a write, and buffering operations before issue to memory. This abstraction is an extension of an earlier abstraction we developed jointly with Adve and Hill of Wisconsin [GAG+93]. Below we discuss some other abstractions for shared-memory that have been proposed as a basis for specifying memory models. We compare these abstractions mainly on the basis of their flexibility for capturing the behavior of various memory models and ordering optimizations. The next section considers the various specification methodologies, most of which are based on the abstractions discussed below.

Dubois et al. [DSB86, SD87] present an abstraction that models the various stages of completion for a memory operation. They use this abstraction to present specifications for both sequential consistency and weak ordering. The notion of “perform with respect to a processor” in this abstraction models the effects of replication and the non-atomicity of writes, essentially capturing the first two features of our abstraction. One of the problems with this abstraction is that the definition of “perform” is based on real time. Another shortcoming of the abstraction is that it does not seem to be powerful enough to capture the read forwarding optimization where the processor is allowed to read the value of its own write before the write takes effect in any memory copy. In fact, it seems difficult to capture the behavior of commercial models such as TSO, PSO, and RMO and other models such as PC, RCsc, and RCpc using Dubois’ abstraction.¹⁵ Furthermore, even though models such as SC or WO can be easily modeled without resorting to the notion of read forwarding, more aggressive specifications of such models (e.g., conditions shown for SC in Figure 4.7) benefit from a more general abstraction.

The abstraction proposed by Collier [Col92] is formal and captures replication of data and the non-atomicity of writes. In fact, the first two features in our abstraction, that of a complete memory copy per processor and several atomic sub-operations for writes, are based directly on this abstraction. Collier’s abstraction has also been used by other researchers to specify system requirements for memory models (e.g., DRF1 [AH92a]). Yet it has the same shortcoming as Dubois et al.’s abstraction in its inability to capture the read forwarding optimization. Our abstraction subsumes Collier’s abstraction. In particular, our abstraction degenerates to Collier’s abstraction if we remove the $R_{init}$ and $W_{init}$ sub-operations and require $W \xrightarrow{po} R$ to imply $W(i) \xrightarrow{xo} R(i)$ in the specification when both operations are to the same location. These notions are important, however, for properly capturing the read forwarding optimization.

Sindhu et al. [SFC91] also propose an abstraction which is used to specify the TSO and PSO models. This abstraction is flexible enough to handle the read forwarding optimization, modeling it through a conceptual write buffer that allows a read to return the value of a write before it is retired from this buffer. However, the abstraction fails to capture the non-atomicity of writes which is an important feature for modeling the behavior of several models and systems. More recently, Corella et al. [CSB93] have also proposed an abstraction for specifying the PowerPC model. This abstraction fails to deal with the lack of multiple-copy atomicity when the coherence requirement is not imposed on all writes, and also fails to model the read forwarding optimization.
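The conceptual write buffer described above can be illustrated with a small sketch. The class and method names (`WriteBufferProcessor`, `write`, `read`, `retire_oldest`) are hypothetical and are not taken from [SFC91]. A single shared dictionary stands in for memory, so a write becomes visible to all processors at once when it is retired; this is an assumption of the sketch and mirrors the limitation noted above, since non-atomic writes cannot be expressed.

```python
from collections import deque


class WriteBufferProcessor:
    """Toy processor with a FIFO write buffer in front of one shared memory."""

    def __init__(self, memory):
        self.memory = memory   # single shared copy: retired writes are atomic
        self.buffer = deque()  # pending (location, value) pairs, oldest first

    def write(self, loc, value):
        # The write is only buffered; other processors cannot see it yet.
        self.buffer.append((loc, value))

    def read(self, loc):
        # Read forwarding: the newest buffered write to loc wins over memory.
        for buffered_loc, value in reversed(self.buffer):
            if buffered_loc == loc:
                return value
        return self.memory.get(loc, 0)  # locations start at 0 (assumption)

    def retire_oldest(self):
        # Retire the oldest buffered write, making it visible to everyone.
        loc, value = self.buffer.popleft()
        self.memory[loc] = value


memory = {}
p1 = WriteBufferProcessor(memory)
p2 = WriteBufferProcessor(memory)

p1.write("A", 1)
forwarded = p1.read("A")      # p1 reads its own write from the buffer
remote_before = p2.read("A")  # p2 still reads the initial value
p1.retire_oldest()
remote_after = p2.read("A")   # the write is now visible to p2
```

Here `forwarded` is obtained before the write is retired, which is the read forwarding behavior that abstractions without a buffering stage cannot express.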

Yet another way of abstracting the system is to represent it in terms of execution histories [HW90]; Attiya et al. have also used this type of abstraction to specify the hybrid consistency model [AF92]. Effectively, a history represents one processor’s view of all memory operations or a combined view of different processors. This type of abstraction is in essence similar to Collier’s abstraction and shares the same advantages and disadvantages.

The abstraction used by Gibbons et al. [GMG91, GM92] to formalize the system requirements for properly-labeled (PL) programs and release consistency is the only abstraction we are aware of that captures the same set of features as our general abstraction. That is, they model the existence of multiple copies, the non-atomicity of writes, the out-of-order execution of memory operations, and allowing the processor to read its own write before the write is issued to the memory system.¹⁶ The one shortcoming of this abstraction is that specifications based on it typically model the system at too detailed a level. For example, Gibbons et al.’s specifications [GMG91, GM92] involve more events than we use in our specification and inherently depend on states and state transitions, making it complex to reason with and difficult to apply to system designs with substantially different assumptions.

The abstraction presented in this chapter extends our previous abstraction [GAG+93] in a few ways. First, we added the $R_{init}(i)$ sub-operation. Our original abstraction had a subtle limitation: given $R1 \xrightarrow{po} W \xrightarrow{po} R2$ to the same location on $P_i$, our original definition of the initiation condition would require $R1(i) \xrightarrow{xo} W_{init}(i)$ and $W_{init}(i) \xrightarrow{xo} R2(i)$. This implicitly orders R1 and R2 (i.e., $R1(i) \xrightarrow{xo} R2(i)$), which turns out to be overconstraining in some specifications. Introducing the $R_{init}(i)$ sub-operation removes this problem. Second, we made the atomic read-modify-write condition (Condition 4.7) more aggressive to allow the read forwarding optimization from a previous write to the read of the read-modify-write. Finally, we simplified the format of the specifications by removing some of the intermediate ordering relations (such as $\xrightarrow{sxo}$ [GAG+93]).

¹⁵ See an earlier technical report [GGH93b] for a discussion of this limitation and a possible extension to Dubois’ abstraction that remedies it.

¹⁶ While the original base abstraction [GMG91] did not model out-of-order read operations from the same processor, the non-blocking M
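The implicit ordering between R1 and R2 follows from transitivity of the execution order, which the following sketch makes concrete. Sub-operations are represented as plain strings and the helper `transitive_closure` is hypothetical. The `revised` edge set assumes that, once $R_{init}(i)$ is available, the initiation condition relates only the initiation sub-operations; that reading is an assumption of this sketch rather than a restatement of the formal condition.

```python
def transitive_closure(edges):
    """Return the transitive closure of a set of (before, after) pairs."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Original initiation condition: both reads are ordered against W_init(i).
original = {("R1(i)", "W_init(i)"), ("W_init(i)", "R2(i)")}

# Assumed revised condition: only the init sub-operations are ordered,
# leaving R1(i) and R2(i) themselves unconstrained by this condition.
revised = {("R1_init(i)", "W_init(i)"), ("W_init(i)", "R2_init(i)")}

original_orders_reads = ("R1(i)", "R2(i)") in transitive_closure(original)
revised_orders_reads = ("R1(i)", "R2(i)") in transitive_closure(revised)
```

Under these assumptions, the original edge set forces the two reads into order while the revised one does not.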

4.6.2 Related Work on Memory Model Specification

This section describes the various approaches that have been proposed for specifying system requirements for memory models. We compare the various specification methodologies primarily based on the level of aggressive optimizations that can be captured and exposed by each technique.

One of the key observations in our specification methodology is that the behavior of most memory models can be captured without constraining the execution order among non-conflicting operations. For example, we showed equivalent conservative and aggressive specifications for sequential consistency (Figures 4.4 and 4.7), where the aggressive specification imposes execution orders among conflicting operations only and yet maintains the same semantics as the conservative specification. Such aggressive specifications expose a much wider range of optimizations and allow the specification to be used for a wider range of system designs.
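As a minimal illustration of what "conflicting operations only" selects, the sketch below uses the usual definition of a conflict: two operations to the same location, at least one of which is a write. The `Op` type and the helper names are hypothetical, and the sketch does not reproduce the conditions of Figures 4.4 or 4.7; it only identifies the pairs that an aggressive specification would consider for execution-order constraints.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Op:
    proc: int   # issuing processor
    kind: str   # "R" for read, "W" for write
    loc: str    # memory location


def conflicts(a, b):
    """Two operations conflict if they share a location and one is a write."""
    return a.loc == b.loc and "W" in (a.kind, b.kind)


def conflicting_pairs(ops):
    """Return every unordered pair of conflicting operations."""
    return [(a, b) for a, b in combinations(ops, 2) if conflicts(a, b)]


# P1: write A, write B.   P2: read B, read A.
ops = [Op(1, "W", "A"), Op(1, "W", "B"), Op(2, "R", "B"), Op(2, "R", "A")]
pairs = conflicting_pairs(ops)
total_pairs = len(list(combinations(ops, 2)))
```

In this example only two of the six operation pairs conflict, so the remaining four pairs are left unconstrained by a specification written in the aggressive style.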

We originally made the observation that memory models can be specified aggressively by only imposing constraints on conflicting operations as part of our joint work with Adve and Hill [GAG+93]. The above observation has been previously made by others as well. For example, Shasha and Snir [SS88] exploit a similar observation in identifying a minimal set of orders (derived from the program order) that are sufficient for achieving sequential consistency for a given program. Collier [Col92] also uses this observation for proving equivalences between different sets of ordering constraints. However, previous specifications of memory models do not exploit this observation to its full potential. Specifically, many of the specifications impose unnecessary ordering constraints on non-conflicting pairs of memory operations; even Shasha and Snir’s implementation involves imposing delays among non-conflicting memory operations that occur in program order. In contrast, our framework presents a unified methodology for specifying ordering constraints that apply to pairs of conflicting memory operations only.

There have been numerous specification techniques that lead to conservative constraints. Dubois et al.’s specification style [DSB86, SD87] places unnecessary constraints on memory ordering since it constrains the execution order among accesses to different locations in a similar way to the conservative conditions for SC in Figure 4.6. This same limitation exists with the specifications for TSO and PSO provided by Sindhu et al. [SFC91] and the specification of release consistency provided by Gibbons et al. [GMG91, GM92]. As discussed above, Collier [Col92] does observe that two sets of conditions are indistinguishable if they maintain the same order among conflicting accesses, yet his methodology for specifying conditions constrains order among non-conflicting operations just like the other schemes. Therefore, none of the above methodologies expose the optimizations that become possible when only the order among conflicting operations is constrained.

Adve and Hill’s specification of sufficient conditions for satisfying DRF1 [AH92a] is one of the few specifications that presents ordering restrictions among conflicting memory operations only. However, parts of these conditions are too general to be easily convertible to an implementation. While Adve and Hill provide a second set of conditions that translates more easily into an implementation, this latter set of conditions is not as aggressive and restricts orders among operations to different locations. Finally, because their specification is based on Collier’s abstraction, their approach does not easily lend itself to specifying models that exploit the read forwarding optimization.

In designing our specification technique, our primary goals have been to provide a framework that covers both the architecture and compiler requirements, is applicable to a wide range of designs, and exposes as many optimizations as possible without violating the semantics of a memory model. Our specification framework could conceivably be different if we chose a different set of goals. For example, with the general nature of our framework, the designer may have to do some extra work to relate our conditions to a specific implementation. Had we focused on a specific class of implementations, it may have been possible to come up with an abstraction and a set of conditions that more closely match specific designs. Similarly, our methodology of only restricting the order among conflicting operations is beneficial mainly at the architectural level. This complexity would not be very useful if we wanted to only specify requirements for the compiler. And in fact, such complexity is undesirable if the specification is to only be used by programmers to determine the set of possible outcomes under a model (however, we strongly believe programmers should reason with the high-level abstraction presented by programmer-centric models). Nevertheless, we feel the benefit of providing a uniform framework that applies to a wide range of implementations outweighs any of its shortcomings.

In summary, our specification methodology exposes more optimizations and is easier to translate into aggressive implementations than previous methods. Given the generality of our framework, it would be interesting to also use it to specify the system requirements for other models that we have not discussed in this thesis. The fact that a uniform framework may be used for specifying different models can greatly simplify the task of comparing the system implications across the various models.

4.6.3 Related Work on Sufficient Conditions for Programmer-Centric Models

There have been a number of attempts at specifying and proving the correctness of sufficient conditions for supporting various programmer-centric models. The seminal work in this area has been done by our group at Stanford and Adve and Hill at Wisconsin, with some of the work done jointly.

The original papers on the properly-labeled (PL) [GLL+90] and the data-race-free-0 (DRF0) [AH90b] frameworks each provide sufficient conditions for satisfying the relevant programmer-centric model, along with proofs of correctness for these conditions. For the PL work, the sufficient conditions were in the form of the RCsc model. Adve and Hill later extended their data-race-free model to distinguish between acquire and release operations similar to the PL framework, and provided a new set of conditions for satisfying DRF1 [AH93]. Gibbons et al. [GMG91, GM92] have also provided sufficient conditions, along with proofs, for supporting properly-labeled programs. The sufficient conditions presented in the first paper [GMG91] were limited to processors with blocking reads, but this restriction was alleviated in a later paper [GM92].

As part of our joint work with Adve and Hill on the PLpc model, we identified the optimizations allowed by this model along with specifying ports of PLpc programs to a few system-centric models [GAG+92]. In a later paper [AGG+93], we formally specified the sufficient conditions for supporting PLpc programs and provided correctness proofs for both these conditions and the conditions specified for porting PLpc programs to other models. The conditions for supporting PLpc programs were specified using our aggressive specification methodology [GAG+

only, thus exposing a large set of ordering optimizations. The sufficient conditions presented in this chapter for supporting the three properly-labeled models, along with the conditions for porting properly-labeled programs, are an extension of the above work on PLpc. Furthermore, the conditions for porting PL programs provided in this chapter cover a wider range of system-centric models compared to our previous work in specifying such ports for the PLpc model [GAG+92].

Attiya et al. [AF92, ACFW93] have also proposed hybrid consistency as a set of sufficient conditions for supporting a few programmer-centric models that they have defined. However, as we mentioned in Chapter 3, hybrid consistency places severe restrictions on the reordering of operations compared to analogous conditions for PL and DRF programs, partly because some of the programmer-centric models defined by Attiya et al. are overly restrictive.

Compared to the sufficient conditions presented in this chapter (or in our work on PLpc [AGG+93]), many of the other specifications are more conservative and often less precise. The evolution of the reach condition is indicative of the latter point. The main purpose for the reach condition is to disallow the types of anomalous executions that arise if we allow “speculative” write sub-operations to take effect in other processors’ memory copies. In most previous work, such conditions were either implicitly assumed or assumed to be imposed by informal descriptions such as “intra-processor dependencies are preserved” [AH90b] or “uniprocessor control and data dependences are respected” [GLL+90]. Some proofs of correctness (e.g., proof of correctness for PL programs executing on the RCsc model [GLL+90]) formalized certain aspects of this condition, but the full condition was never presented in precise terms. Later work by Adve and Hill specified this condition more explicitly in the context of the DRF1 model [AH92a] and proved that it is sufficient for ensuring sequentially consistent results for data-race-free programs executing on models such as WO and RCsc. More recently, we jointly formalized an aggressive form of this condition as part of specifying sufficient conditions for PLpc [AGG+93]. The reach condition presented in this thesis is based upon the above work on PLpc and constitutes the most precise and aggressive set of conditions that we are aware of to eliminate anomalous executions due to speculative writes.

4.6.4 Work on Verifying Specifications

Specifying system requirements often involves subtle issues that may lead to errors in the specification or implementation based on the specification, making automatic verification of specifications and implementations an important research area.

There are several types of verification tools that are interesting. One use is to verify that two specifications are equivalent, or that one is stricter than the other. For example, such a tool may be used to check the equivalence of the aggressive and conservative specifications for various models. Another use is to verify that an implementation satisfies the constraints imposed by a specification. This is somewhat similar to verifying two specifications, except the specification of an implementation may be provided at a much lower level of abstraction compared to the specification of a memory model. Yet a third use is to automatically verify the outcome of small programs or traces or to verify the correctness of simple synchronization primitives under a given specification. A tool that verifies the outcome of small programs or traces may also be used to probabilistically check the equivalence (or stricter relation) between two specifications by comparing the behavior of each specification across a large set of examples (potentially generated in a random fashion). A side benefit of attempting to verify a given specification using any of the above methods is that it will require the description of the model to be formal and precise. This alone can expose a number of subtle issues to the designers of the model.
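The third use, enumerating the outcomes of a small program and comparing two specifications on it, can be sketched as follows. This is a toy illustration and not the design of any existing verifier. The two-processor program, the helper names, and the `relaxed` model (which simply lets a processor reorder its own operations to different locations) are all assumptions made for the example; sequential consistency is modeled as every interleaving that preserves program order.

```python
from itertools import permutations

# Each operation: (kind, location, value-to-write or register-to-load).
P1 = [("W", "A", 1), ("R", "B", "r1")]
P2 = [("W", "B", 1), ("R", "A", "r2")]


def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(sequence):
    """Execute operations atomically in order; return (r1, r2)."""
    memory, regs = {}, {}
    for kind, loc, arg in sequence:
        if kind == "W":
            memory[loc] = arg
        else:
            regs[arg] = memory.get(loc, 0)  # locations start at 0
    return (regs["r1"], regs["r2"])


def allowed_orders(program, relaxed):
    """Program order alone, or also reorderings across different locations."""
    if not relaxed:
        return [list(program)]
    orders = []
    for perm in permutations(program):
        # Keep program order between operations to the same location.
        ok = all(
            perm.index(x) < perm.index(y)
            for i, x in enumerate(program)
            for y in program[i + 1:]
            if x[1] == y[1]
        )
        if ok:
            orders.append(list(perm))
    return orders


def outcomes(relaxed):
    """Collect every (r1, r2) result the chosen toy model permits."""
    results = set()
    for o1 in allowed_orders(P1, relaxed):
        for o2 in allowed_orders(P2, relaxed):
            for seq in interleavings(o1, o2):
                results.add(run(seq))
    return results


sc_outcomes = outcomes(relaxed=False)
relaxed_outcomes = outcomes(relaxed=True)
sc_is_stricter = sc_outcomes < relaxed_outcomes  # proper subset test
```

On this program the sequentially consistent model never yields `(0, 0)`, while the toy relaxed model does, so the subset test reports that the first model is stricter on this example. Repeating the comparison over many generated programs would give the probabilistic check described above, without constituting a proof of equivalence.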

Park and Dill [PD95] have described a verifier for the Sparc TSO, PSO, and RMO models that is capable of producing the possible outcomes of very small programs and may also be used to verify the correctness of

In document WRL 95 9 pdf (Page 146-152)