4.4.3 Porting Properly-Labeled Programs to System-Centric Models
While porting a properly-labeled program to a system-centric model requires guaranteeing sequentially consistent executions for the program, the information provided by properly-labeled programs in the form of operation labels allows us to achieve ports that are substantially more efficient than the ports of SC programs specified in Table 4.1.
Tables 4.3, 4.4, and 4.5 show the sufficient mappings for porting PL1, PL2, and PL3 programs (footnote 13). These mappings are not unique, and other mappings may be possible. To determine the appropriate mappings, we ensure that any orders imposed by the sufficient conditions for a properly-labeled program (Figures 4.14, 4.15, and 4.16) are also imposed by the specification of the destination model with the specified mapping. Although these mappings are similar in nature to the mappings for SC programs in Table 4.1, they are significantly more selective. For example, while porting an SC program to WO requires all memory operations to be labeled as synchronization, porting a PL1 program requires this for only the competing memory operations. Since competing operations are typically far less frequent than non-competing operations, selective mapping of operations can provide a substantial performance advantage by exploiting the reordering optimizations of the target model.
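To make the WO example concrete, the following is a minimal sketch that contrasts the two ports on one processor's operations in program order. The `Op` record, the two function names, and the "data" tag for operations left unsynchronized are hypothetical names introduced only for this illustration; the operation sequence is made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    kind: str        # "R" or "W"
    addr: str
    competing: bool  # PL1 label: True for competing, False for non-competing

def port_sc_to_wo(ops):
    # An SC program carries no labels, so every operation is mapped
    # to a WO synchronization operation.
    return ["sync" for _ in ops]

def port_pl1_to_wo(ops):
    # With PL1 labels, only competing operations need the sync mapping.
    # "data" is an illustrative tag for everything else.
    return ["sync" if op.competing else "data" for op in ops]

# One processor's operations in program order (illustrative only).
producer = [
    Op("W", "A", competing=False),
    Op("W", "B", competing=False),
    Op("W", "C", competing=False),
    Op("W", "flag", competing=True),
]

sc_syncs = port_sc_to_wo(producer).count("sync")
pl1_syncs = port_pl1_to_wo(producer).count("sync")
print(sc_syncs, pl1_syncs)  # 4 1
```

In this made-up sequence the PL1 port leaves three of the four writes free to be reordered by the WO implementation, while the SC port synchronizes all four.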
Properly-labeled programs port efficiently to most of the system-centric models. Furthermore, overheads due to extra fence instructions for models such as IBM-370, PSO, Alpha, RMO, and PowerPC, or dummy read-modify-writes for models such as TSO, PSO, PC, RCpc, and PowerPC, are significantly reduced compared to porting SC programs simply because these additional instructions are used selectively and infrequently. For example, consider the need for dummy read-modify-writes when porting a PL1 program to PC. Without any information about the program, the mapping in Table 4.1 requires every read to be part of a read-modify-write. However, with the PL1 labels, we can limit this requirement to competing reads only. Therefore, the number of dummy read-modify-writes that are added by the port can be quite low because (a) competing reads are typically infrequent, and (b) some of these reads may already be part of a read-modify-write (footnote 14). Table 4.6 shows the ports of properly-labeled programs to the extended versions of some of the system-centric models introduced in Section 4.4.1. While the extended versions were important for porting SC programs more efficiently, the ordinary versions of the models are sufficiently efficient for porting PL programs.
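The PC example can be sketched as a simple count of the dummy read-modify-writes each port would add. This is only an illustration under stated assumptions: the `Read` record, its fields, and both function names are hypothetical, and the list of reads is invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Read:
    addr: str
    competing: bool       # PL1 label
    in_rmw: bool = False  # already part of a read-modify-write in the program

def dummy_rmws_for_sc_port(reads):
    # Without labels, every read must be part of a RMW, so each read
    # that is not already in one needs a dummy RMW.
    return sum(1 for r in reads if not r.in_rmw)

def dummy_rmws_for_pl1_port(reads):
    # With PL1 labels, only competing reads carry the requirement.
    return sum(1 for r in reads if r.competing and not r.in_rmw)

reads = [
    Read("lock", competing=True, in_rmw=True),  # e.g. the read of a test-and-set
    Read("flag", competing=True),
    Read("A", competing=False),
    Read("B", competing=False),
    Read("C", competing=False),
]

print(dummy_rmws_for_sc_port(reads), dummy_rmws_for_pl1_port(reads))  # 4 1
```

Both effects named in the text show up here: the non-competing reads drop out, and the competing read that is already part of a read-modify-write needs nothing extra.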
Most of the system-centric models can beneficially exploit the extra information about memory operations as we move from PL1 to PL2 and PL3. The IBM-370, TSO, and PC models have the same mapping for PL1 and PL2 programs since the distinction between competing sync and non-sync operations cannot be exploited by these models. However, PL3 programs lead to potentially more efficient mappings in the above models. Table 4.4 also shows the RCsc model as having the same mapping for PL2 and PL3 programs. While RCsc can actually benefit from the extra information provided by PL3 programs, the more aggressive mapping that arises is complicated and difficult to achieve in practice; therefore, we use the same mappings as for PL2.
Among the system-centric models shown, RCsc is the most efficient model for executing PL1 and PL2 programs and RCpc is the most efficient model for executing PL3 programs. Compared to the other system-centric models, RCsc and RCpc best exploit the reordering optimizations allowed by PL programs and enforce the required orders with virtually no extra overhead (such as additional fence instructions). This is somewhat expected since the RC models were originally developed in conjunction with the proper labeling framework. Of course, the sufficient conditions for properly-labeled models allow yet further optimizations not exploited by RCsc and RCpc.

13. For the PowerPC mappings shown in Table 4.5, we assume that if we depend on an existing write to the same address (clause (c) for PL1 and PL2, or clause (d) for PL3), any SYNC that needs to be placed after the Rc due to the other clauses is placed after the existing write as well (except the SYNC that may be required between Rc and the existing write).

14. The fact that PL1 and PL2 programs can be ported reasonably efficiently to the PC, RCpc, and PowerPC models is quite surprising since the latter models do not provide direct mechanisms for providing atomic writes. As explained above, however, it turns out the need for (potentially dummy) read-modify-writes can be limited to competing reads.
Table 4.3: Sufficient mappings for porting PL programs to system-centric models.

IBM-370
  PL1: (a) for every $W_c \xrightarrow{po} R_c$, at least one of $R_c$ or $W_c$ is a synchronization operation, or $W_c \xrightarrow{po} X \xrightarrow{po} R_c$ where X is a fence, a synchronization, or a read to the same location as $W_c$.
  PL2: (a) same as PL1.
  PL3: (a) for every $W_{c\_nl\_ns} \xrightarrow{po} R_{c\_nl\_ns}$, at least one of $R_{c\_nl\_ns}$ or $W_{c\_nl\_ns}$ is a synchronization operation, or $W_{c\_nl\_ns} \xrightarrow{po} X \xrightarrow{po} R_{c\_nl\_ns}$ where X is a fence, a synchronization, or a read to the same location as $W_{c\_nl\_ns}$.

TSO
  PL1: (a) for every $W_c \xrightarrow{po} R_c$, at least one of $R_c$ or $W_c$ is part of a RMW, or there is a RMW such that $W_c \xrightarrow{po} RMW \xrightarrow{po} R_c$.
  PL2: (a) same as PL1.
  PL3: (a) for every $W_{c\_nl\_ns} \xrightarrow{po} R_{c\_nl\_ns}$, at least one of $R_{c\_nl\_ns}$ or $W_{c\_nl\_ns}$ is part of a RMW, or there is a RMW such that $W_{c\_nl\_ns} \xrightarrow{po} RMW \xrightarrow{po} R_{c\_nl\_ns}$.

PC
  PL1: (a) every $R_c$ is part of a RMW.
  PL2: (a) same as PL1.
  PL3: (a) every $R_{c\_nl\_ns}$ is part of a RMW.

PSO
  PL1: (a) for every $W_c \xrightarrow{po} R_c$, at least one of $R_c$ or $W_c$ is part of a RMW, or there is a RMW such that $W_c \xrightarrow{po} STBAR \xrightarrow{po} RMW \xrightarrow{po} R_c$.
       (b) a STBAR exists between every $W \xrightarrow{po} W_c$.
  PL2: (a) for every $W_c \xrightarrow{po} R_c$, at least one of $R_c$ or $W_c$ is part of a RMW, or there is a RMW such that $W_c \xrightarrow{po} STBAR \xrightarrow{po} RMW \xrightarrow{po} R_c$.
       (b) a STBAR exists between every $W_c \xrightarrow{po} W_c$ and every $W \xrightarrow{po} W_{c\_rel}$.
  PL3: (a) for every $W_{c\_nl\_ns} \xrightarrow{po} R_{c\_nl\_ns}$, at least one of $R_{c\_nl\_ns}$ or $W_{c\_nl\_ns}$ is part of a RMW, or there is a RMW such that $W_{c\_nl\_ns} \xrightarrow{po} STBAR \xrightarrow{po} RMW \xrightarrow{po} R_{c\_nl\_ns}$.
       (b) a STBAR exists between every $W_c \xrightarrow{po} W_c$ and every $W \xrightarrow{po} W_{c\_rel}$.
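As an illustration of how the TSO entry for PL1 in Table 4.3 could be checked mechanically, the sketch below scans one processor's operations in program order and reports every competing write followed by a competing read that the clause does not cover. The `Op` record, the "RMW" kind used for a standalone (possibly dummy) read-modify-write, and the function name are assumptions made for this example only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    kind: str               # "R", "W", or "RMW" (a standalone read-modify-write)
    competing: bool = False
    in_rmw: bool = False    # this read or write is itself part of a RMW

def tso_pl1_violations(ops):
    """Return (i, j) index pairs Wc -po-> Rc not covered by clause (a)."""
    violations = []
    for i, w in enumerate(ops):
        if not (w.kind == "W" and w.competing):
            continue
        for j in range(i + 1, len(ops)):
            r = ops[j]
            if not (r.kind == "R" and r.competing):
                continue
            # Covered if either endpoint is part of a RMW ...
            if w.in_rmw or r.in_rmw:
                continue
            # ... or if some RMW lies between them in program order.
            if any(x.kind == "RMW" or x.in_rmw for x in ops[i + 1:j]):
                continue
            violations.append((i, j))
    return violations

unported = [Op("W", competing=True), Op("R"), Op("R", competing=True)]
ported = [Op("W", competing=True), Op("RMW"), Op("R"), Op("R", competing=True)]

print(tso_pl1_violations(unported))  # [(0, 2)]
print(tso_pl1_violations(ported))    # []
```

A port could use such a check to decide where a dummy read-modify-write is still needed; non-competing operations never trigger it.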
Table 4.4: Sufficient mappings for porting PL programs to system-centric models.

WO
  PL1: (a) every $R_c$ and $W_c$ is mapped to a synchronization operation.
  PL2: (a) every $R_{c\_acq}$ and $W_{c\_rel}$ is mapped to a synchronization operation.
       (b) for every $X_c \xrightarrow{po} Y_c$, at least one of X or Y is mapped to a synchronization operation.
  PL3: (a) every $R_{c\_acq}$ and $W_{c\_rel}$ is mapped to a synchronization operation.
       (b) for every $R_c \xrightarrow{po} Y_c$ or $W_c \xrightarrow{po} W_c$, at least one operation is mapped to a synchronization operation.
       (c) for every $W_{c\_nl\_ns} \xrightarrow{po} R_{c\_nl\_ns}$, at least one operation is mapped to a synchronization operation.

RCsc
  PL1: (a) every $R_c$ is mapped to an acquire.
       (b) every $W_c$ is mapped to a release.
  PL2: (a) every $R_{c\_acq}$ is mapped to an acquire; other $R_c$ mapped to non-sync or acquire.
       (b) every $W_{c\_rel}$ is mapped to a release; other $W_c$ mapped to non-sync or release.
  PL3: (a) same as PL2.

RCpc
  PL1: (a) every $R_c$ is mapped to an acquire.
       (b) every $W_c$ is mapped to a release.
       (c) every $R_c$ is part of a RMW, with W mapped to non-sync or release.
  PL2: (a) every $R_{c\_acq}$ is mapped to an acquire; other $R_c$ mapped to non-sync or acquire.
       (b) every $W_{c\_rel}$ is mapped to a release; other $W_c$ mapped to non-sync or release.
       (c) every $R_c$ is part of a RMW, with W mapped to non-sync or release.
  PL3: (a) every $R_{c\_acq}$ is mapped to an acquire; other $R_c$ mapped to non-sync or acquire.
       (b) every $W_{c\_rel}$ is mapped to a release; other $W_c$ mapped to non-sync or release.
       (c) every $R_{c\_nl\_ns}$ is part of a RMW, with W mapped to non-sync or release.
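The RCsc entries of Table 4.4 amount to a per-operation relabeling, which can be sketched as two small functions. The function names and the label strings are hypothetical. The table specifies only competing operations, so mapping non-competing operations to "ordinary" is an assumption of this sketch, as is choosing non-sync where the table permits either non-sync or acquire/release.

```python
def map_pl1_to_rcsc(kind, competing):
    """kind is "R" or "W"; competing is the PL1 label."""
    if kind not in ("R", "W"):
        raise ValueError(f"unknown operation kind: {kind!r}")
    if not competing:
        return "ordinary"  # assumption: not stated in Table 4.4
    # PL1: every Rc -> acquire, every Wc -> release.
    return "acquire" if kind == "R" else "release"

def map_pl2_to_rcsc(kind, label):
    """label is one of "non-competing", "non-sync", "acq", "rel"."""
    if kind not in ("R", "W"):
        raise ValueError(f"unknown operation kind: {kind!r}")
    if label == "non-competing":
        return "ordinary"  # assumption: not stated in Table 4.4
    if label == "acq" and kind == "R":
        return "acquire"
    if label == "rel" and kind == "W":
        return "release"
    if label == "non-sync":
        # The table also allows acquire (reads) or release (writes) here;
        # non-sync is the less restrictive choice.
        return "non-sync"
    raise ValueError(f"label {label!r} is not valid for kind {kind!r}")

print(map_pl1_to_rcsc("R", True), map_pl2_to_rcsc("R", "non-sync"))
# acquire non-sync
```

The same competing read that PL1 forces to an acquire can stay a non-sync operation under PL2, which is the extra information the PL2 labels contribute for RCsc.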
Table 4.5: Sufficient mappings for porting PL programs to system-centric models.

Alpha
  PL1: (a) an MB exists between every $R_c \xrightarrow{po} RW$ and $RW \xrightarrow{po} W_c$.
       (b) an MB exists between every $X_c \xrightarrow{po} Y_c$.
  PL2: (a) an MB exists between every $R_{c\_acq} \xrightarrow{po} RW$ and $RW \xrightarrow{po} W_{c\_rel}$.
       (b) an MB exists between every $X_c \xrightarrow{po} Y_c$.
  PL3: (a) an MB exists between every $R_{c\_acq} \xrightarrow{po} RW$ and $RW \xrightarrow{po} W_{c\_rel}$.
       (b) an MB exists between every $R_c \xrightarrow{po} R_c$, $R_c \xrightarrow{po} W_c$, and $W_c \xrightarrow{po} W_c$.
       (c) an MB exists between every $W_{c\_nl\_ns} \xrightarrow{po} R_{c\_nl\_ns}$.

RMO
  PL1: (a) a MEMBAR(RY) exists between every $R_c \xrightarrow{po} Y$.
       (b) a MEMBAR(XW) exists between every $X \xrightarrow{po} W_c$.
       (c) a MEMBAR(XY) exists between every $X_c \xrightarrow{po} Y_c$.
  PL2: (a) a MEMBAR(RY) exists between every $R_{c\_acq} \xrightarrow{po} Y$.
       (b) a MEMBAR(XW) exists between every $X \xrightarrow{po} W_{c\_rel}$.
       (c) a MEMBAR(XY) exists between every $X_c \xrightarrow{po} Y_c$.
  PL3: (a) a MEMBAR(RY) exists between every $R_{c\_acq} \xrightarrow{po} Y$.
       (b) a MEMBAR(XW) exists between every $X \xrightarrow{po} W_{c\_rel}$.
       (c) a MEMBAR(RY) exists between every $R_c \xrightarrow{po} Y_c$.
       (d) a MEMBAR(WW) exists between every $W_c \xrightarrow{po} W_c$.
       (e) a MEMBAR(WR) exists between every $W_{c\_nl\_ns} \xrightarrow{po} R_{c\_nl\_ns}$.

PowerPC
  PL1: (a) a SYNC exists between every $R_c \xrightarrow{po} RW$ and $RW \xrightarrow{po} W_c$.
       (b) a SYNC exists between every $X_c \xrightarrow{po} Y_c$.
       (c) every $R_c$ is either part of a RMW or is immediately followed in program order by an existing write to the same address.
  PL2: (a) a SYNC exists between every $R_{c\_acq} \xrightarrow{po} RW$ and $RW \xrightarrow{po} W_{c\_rel}$.
       (b) a SYNC exists between every $X_c \xrightarrow{po} Y_c$.
       (c) every $R_c$ is either part of a RMW or is immediately followed in program order by an existing write to the same address.
  PL3: (a) a SYNC exists between every $R_{c\_acq} \xrightarrow{po} RW$ and $RW \xrightarrow{po} W_{c\_rel}$.
       (b) a SYNC exists between every $R_c \xrightarrow{po} R_c$, $R_c \xrightarrow{po} W_c$, and $W_c \xrightarrow{po} W_c$.
       (c) a SYNC exists between every $W_{c\_nl\_ns} \xrightarrow{po} R_{c\_nl\_ns}$.
       (d) every $R_c$ is either part of a RMW or is immediately followed in program order by an existing write to the same address.
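The Alpha entry for PL1 in Table 4.5 can be read as a fence-placement rule over one processor's program order. The sketch below shows one conservative placement together with a checker for clauses (a) and (b); it is not claimed to be minimal or to be the placement a real compiler would choose. The `Op` record and all function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    kind: str               # "R" or "W"
    competing: bool = False

MB = "MB"  # marker for an Alpha memory barrier in the output stream

def needs_mb(x, y):
    # x precedes y in program order.
    # Clause (a): Rc -> RW and RW -> Wc.  Clause (b): Xc -> Yc.
    return (
        (x.kind == "R" and x.competing)
        or (y.kind == "W" and y.competing)
        or (x.competing and y.competing)
    )

def insert_mbs(ops):
    out, since_fence = [], []
    for op in ops:
        # Fence before op if any operation since the last MB must be
        # separated from it.
        if any(needs_mb(x, op) for x in since_fence):
            out.append(MB)
            since_fence = []
        out.append(op)
        since_fence.append(op)
    return out

def satisfies_mapping(stream):
    for i, x in enumerate(stream):
        if x is MB:
            continue
        for y in stream[i + 1:]:
            if y is MB:
                break  # everything later is separated from x by this MB
            if needs_mb(x, y):
                return False
    return True

ops = [Op("W"), Op("W"), Op("W", competing=True),
       Op("R", competing=True), Op("R"), Op("R")]
fenced = insert_mbs(ops)
print(fenced.count(MB), satisfies_mapping(ops), satisfies_mapping(fenced))
# 3 False True
```

The two leading non-competing writes and the two trailing non-competing reads remain unfenced relative to each other, so the hardware is still free to reorder them.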
Table 4.6: Porting PL programs to extended versions of some system-centric models.

TSO+
  PL1: (a) a MEMBAR(WR) exists between every $W_c \xrightarrow{po} R_c$.
  PL2: (a) same as PL1.
  PL3: (a) a MEMBAR(WR) exists between every $W_{c\_nl\_ns} \xrightarrow{po} R_{c\_nl\_ns}$.

PC+
  PL1: (a) a fence exists between every $W_c \xrightarrow{po} R_c$.
       (b) every $W_c$ is mapped to an atomic write.
  PL2: (a) same as PL1.
  PL3: (a) a fence exists between every $W_{c\_nl\_ns} \xrightarrow{po} R_{c\_nl\_ns}$.
       (b) every $W_{c\_nl\_ns}$ is mapped to an atomic write.

PSO+
  PL1: (a) a MEMBAR(WR) exists between every $W_c \xrightarrow{po} R_c$.
       (b) a STBAR exists between every $W \xrightarrow{po} W_c$.
  PL2: (a) a MEMBAR(WR) exists between every $W_c \xrightarrow{po} R_c$.
       (b) a STBAR exists between every $W_c \xrightarrow{po} W_c$ and every $W \xrightarrow{po} W_{c\_rel}$.
  PL3: (a) a MEMBAR(WR) exists between every $W_{c\_nl\_ns} \xrightarrow{po} R_{c\_nl\_ns}$.
       (b) a STBAR exists between every $W_c \xrightarrow{po} W_c$ and every $W \xrightarrow{po} W_{c\_rel}$.

RCpc+
  PL1: (a) every $R_c$ is mapped to an acquire.
       (b) every $W_c$ is mapped to a release.
       (c) a fence exists between every $W_c \xrightarrow{po} R_c$.
       (d) every $W_c$ is mapped to an atomic write.
  PL2: (a) every $R_{c\_acq}$ is mapped to an acquire; other $R_c$ mapped to non-sync or acquire.
       (b) every $W_{c\_rel}$ is mapped to a release; other $W_c$ mapped to non-sync or release.
       (c) a fence exists between every $W_c \xrightarrow{po} R_c$.
       (d) every $W_c$ is mapped to an atomic write.
  PL3: (a) every $R_{c\_acq}$ is mapped to an acquire; other $R_c$ mapped to non-sync or acquire.
       (b) every $W_{c\_rel}$ is mapped to a release; other $W_c$ mapped to non-sync or release.
       (c) a fence exists between every $W_{c\_nl\_ns} \xrightarrow{po} R_{c\_nl\_ns}$.
       (d) every $W_{c\_nl\_ns}$ is mapped to an atomic write.

PowerPC+
  PL1: (a) a SYNC exists between every $R_c \xrightarrow{po} RW$ and $RW \xrightarrow{po} W_c$.
       (b) a SYNC exists between every $X_c \xrightarrow{po} Y_c$.
       (c) every $W_c$ is mapped to an atomic write.
  PL2: (a) a SYNC exists between every $R_{c\_acq} \xrightarrow{po} RW$ and $RW \xrightarrow{po} W_{c\_rel}$.
       (b) a SYNC exists between every $X_c \xrightarrow{po} Y_c$.
       (c) every $W_c$ is mapped to an atomic write.
  PL3: (a) a SYNC exists between every $R_{c\_acq} \xrightarrow{po} RW$ and $RW \xrightarrow{po} W_{c\_rel}$.
       (b) a SYNC exists between every $R_c \xrightarrow{po} R_c$, $R_c \xrightarrow{po} W_c$, and $W_c \xrightarrow{po} W_c$.
       (c) a SYNC exists between every $W_{c\_nl\_ns} \xrightarrow{po} R_{c\_nl\_ns}$.
       (d) every $W_c$ is mapped to an atomic write.
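The TSO+ entries of Table 4.6 differ between PL1/PL2 and PL3 only in which competing operations trigger a MEMBAR(WR). A minimal sketch of that difference follows, assuming a hypothetical `Op` record whose `nl_ns` field records whether a competing operation carries the PL3 sub-label written "c nl ns" in the table; the function name and the example sequence are likewise invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    kind: str               # "R" or "W"
    competing: bool = False
    nl_ns: bool = False     # PL3 "c nl ns" sub-label; meaningful only if competing

MEMBAR_WR = "MEMBAR(WR)"

def insert_membar_wr(ops, level):
    if level not in ("PL1", "PL2", "PL3"):
        raise ValueError(f"unknown labeling level: {level!r}")

    def selected(op):
        # PL1 and PL2 share one mapping; PL3 narrows it to the nl_ns subset.
        if level == "PL3":
            return op.competing and op.nl_ns
        return op.competing

    out = []
    pending_write = False  # a selected write has occurred since the last fence
    for op in ops:
        if op.kind == "R" and selected(op) and pending_write:
            out.append(MEMBAR_WR)
            pending_write = False
        out.append(op)
        if op.kind == "W" and selected(op):
            pending_write = True
    return out

ops = [
    Op("W", competing=True),
    Op("R", competing=True),
    Op("W", competing=True, nl_ns=True),
    Op("R", competing=True, nl_ns=True),
]
print(insert_membar_wr(ops, "PL1").count(MEMBAR_WR),
      insert_membar_wr(ops, "PL3").count(MEMBAR_WR))  # 2 1
```

On this invented sequence the PL3 labels remove one of the two fences that the PL1 mapping requires.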