2.7 Caches and Memory Hierarchy

2.7.4.1 Sequential consistency

A popular model for memory consistency is the sequential consistency model (SC model) [126]. This model is an intuitive extension of the uniprocessor model and places strong restrictions on the execution order of the memory accesses. A memory system is sequentially consistent if the memory accesses of each single processor are performed in the program order described by that processor’s program and if the global result of all memory accesses of all processors appears to all processors in the same sequential order, which results from an arbitrary interleaving of the memory accesses of the different processors. Memory accesses must be performed as atomic operations, i.e., the effect of each memory operation must become globally visible to all processors before the next memory operation of any processor is started.

The notion of program order leaves some room for interpretation. Program order could be the order of the statements performing memory accesses in the source program, but it could also be the order of the memory access operations in a machine program generated by an optimizing compiler, which could reorder statements to obtain better performance. In the following, we assume that the order in the source program is used.

Using sequential consistency, the memory operations are treated as atomic operations that are executed in the order given by the source program of each processor and that are centrally sequentialized. This leads to a total order of the memory operations of a parallel program which is the same for all processors of the system. In the example given above, not only output 001011, but also 111111 conforms to the SC model. The output 011001 is not possible for sequential consistency.

The requirement of a total order of the memory operations is a stronger restriction than the one used for the coherence of a memory system in the last section (page 86). For a memory system to be coherent, it is required that the write operations to the same memory location are sequentialized such that they appear to all processors in the same order. But there is no restriction on the order of write operations to different memory locations. Sequential consistency, on the other hand, requires that all write operations (to arbitrary memory locations) appear to all processors in the same order. The following example illustrates that the atomicity of the write operations is important for the definition of sequential consistency and that the requirement of a sequentialization of the write operations alone is not sufficient.

Example Three processors P1, P2, P3 execute the following statements:

processor   P1             P2                      P3
program     (1) x1 = 1;    (2) while (x1 == 0);    (4) while (x2 == 0);
                           (3) x2 = 1;             (5) print(x1);

The variables x1 and x2 are initialized with 0. Processor P2 waits until x1 has value 1 and then sets x2 to 1. Processor P3 waits until x2 has value 1 and then prints the value of x1. Assuming atomicity of write operations, the statements are executed in the order (1), (2), (3), (4), (5), and processor P3 prints the value 1 for x1, since write operation (1) of P1 must become visible to P3 before P2 executes write operation (3). Sequentializing only the write operations to each single variable, without the atomicity and global sequentialization required for sequential consistency, would allow the execution of statement (3) before the effect of (1) becomes visible to P3. Thus, (5) could print the value 0 for x1.
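The claim that P3 can only print 1 under atomic, globally ordered memory operations can be checked by brute force. The following is a minimal sketch, not part of the original example: it models the three programs as small state machines over a single shared memory and explores every interleaving. The function name `explore` and the encoding of the program counters are assumptions made for illustration.

```python
def explore():
    """Explore all sequentially consistent executions of the example.

    A state is (pc1, pc2, pc3, x1, x2, printed). Each step executes one
    statement of one processor atomically on a single shared memory.
    Returns the set of values that statement (5) can print.
    """
    start = (0, 0, 0, 0, 0, None)
    seen = {start}
    stack = [start]
    outputs = set()
    while stack:
        pc1, pc2, pc3, x1, x2, out = stack.pop()
        successors = []
        if pc1 == 0:        # (1) x1 = 1;
            successors.append((1, pc2, pc3, 1, x2, out))
        if pc2 == 0:        # (2) while (x1 == 0);  spin or leave the loop
            successors.append((pc1, 1 if x1 != 0 else 0, pc3, x1, x2, out))
        elif pc2 == 1:      # (3) x2 = 1;
            successors.append((pc1, 2, pc3, x1, 1, out))
        if pc3 == 0:        # (4) while (x2 == 0);  spin or leave the loop
            successors.append((pc1, pc2, 1 if x2 != 0 else 0, x1, x2, out))
        elif pc3 == 1:      # (5) print(x1);  record the value read
            successors.append((pc1, pc2, 2, x1, x2, x1))
        if not successors:  # all three programs have terminated
            outputs.add(out)
        for state in successors:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return outputs


if __name__ == "__main__":
    print(explore())
```

Because every step acts on the same memory, the search covers exactly the interleavings allowed by the SC model; the only value it collects for statement (5) is 1.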

To further illustrate this behavior, we consider a directory-based protocol and assume that the processors are connected via a network. In particular, we consider a directory-based invalidation protocol to keep the caches of the processors coherent. We assume that the variables x1 and x2 have been initialized with 0 and that they are both stored in the local caches of P2 and P3. The cache blocks are marked as shared (S).

The operations of each processor are executed in program order, and a memory operation is not started before the preceding operations of the same processor have been completed. Since no assumptions are made about the transfer of the invalidation messages in the network, the following execution order is possible:

(1) P1 executes the write operation (1) to x1. Since x1 is not stored in the cache of P1, a write miss occurs. The directory entry of x1 is accessed and invalidation messages are sent to P2 and P3.

(2) P2 executes the read operation (2) to x1. We assume that the invalidation message of P1 has already reached P2 and that the memory block of x1 has been marked invalid (I) in the cache of P2. Thus, a read miss occurs, and P2 obtains the current value 1 of x1 over the network from P1. The copy of x1 in the main memory is also updated.

After having received the current value of x1, P2 leaves the while loop and executes the write operation (3) to x2. Because the corresponding cache block is marked as shared (S) in the cache of P2, a write miss occurs. The directory entry of x2 is accessed and invalidation messages are sent to P1 and P3.

(3) P3 executes the read operation (4) to x2. We assume that the invalidation message of P2 has already reached P3. Thus, P3 obtains the current value 1 of x2 over the network. After that, P3 leaves the while loop and executes the print operation (5). Assuming that the invalidation message of P1 for x1 has not yet reached P3, P3 accesses the old value 0 for x1 from its local cache, since the corresponding cache block is still marked with S. This behavior is possible if the invalidation messages may have different transfer times over the network.
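The execution order described above can be replayed in a few lines of code. The following is a simplified sketch under stated assumptions, not a model of a real directory protocol: caches are plain dictionaries in which a present entry stands for a valid (S) block, a write updates the hypothetical `memory` dictionary immediately, and a single flag decides whether the invalidation of x1 reaches P3 in time. All names (`read`, `simulate`, `in_flight`) are chosen for illustration.

```python
def read(cache, memory, var):
    """Return the cached copy if it is still valid, else fetch the current value."""
    if var not in cache:              # read miss: block is marked invalid (I)
        cache[var] = memory[var]
    return cache[var]


def simulate(x1_invalidation_delayed):
    """Replay steps (1) to (5) and return the value that P3 prints."""
    memory = {"x1": 0, "x2": 0}
    cache_p2 = {"x1": 0, "x2": 0}     # both blocks shared (S) in P2
    cache_p3 = {"x1": 0, "x2": 0}     # both blocks shared (S) in P3
    in_flight = []                    # invalidations still travelling in the network

    # (1) P1 writes x1; invalidations are sent to P2 and P3.
    memory["x1"] = 1
    cache_p2.pop("x1", None)          # assumed to reach P2 immediately
    if x1_invalidation_delayed:
        in_flight.append((cache_p3, "x1"))
    else:
        cache_p3.pop("x1", None)

    # (2) P2 spins on x1, then (3) writes x2 without waiting for P3.
    while read(cache_p2, memory, "x1") == 0:
        pass
    memory["x2"] = 1
    cache_p2["x2"] = 1
    cache_p3.pop("x2", None)          # this invalidation arrives promptly

    # (4) P3 spins on x2, then (5) reads x1 from its own cache if still valid.
    while read(cache_p3, memory, "x2") == 0:
        pass
    printed = read(cache_p3, memory, "x1")

    # The delayed invalidation arrives only after P3 has already printed.
    for cache, var in in_flight:
        cache.pop(var, None)
    return printed


if __name__ == "__main__":
    print(simulate(x1_invalidation_delayed=True))
    print(simulate(x1_invalidation_delayed=False))
```

With the delayed invalidation, P3 still finds a valid copy of x1 in its cache and returns the old value 0; without the delay, the read of x1 misses and returns 1.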

In this example, sequential consistency is violated, since the processors observe different orders of the write operations: Processor P2 observes the order x1 = 1, x2 = 1, whereas P3 observes the order x2 = 1, x1 = 1 (since P3 gets the new value of x2, but the old value of x1 for its read accesses).

In a parallel system, sequential consistency can be guaranteed by the following sufficient conditions [41, 51, 176]:

(1) Every processor issues its memory operations in program order. In particular, the compiler is not allowed to change the order of memory operations, and no out-of-order executions of memory operations are allowed.

(2) After a processor has issued a write operation, it waits until the write operation has been completed before it issues the next operation. This includes that for a write miss all cache blocks which contain the memory location written must be marked invalid (I) before the next memory operation starts.

(3) After a processor has issued a read operation, it waits until both this read operation and the write operation whose value is returned by the read operation have been entirely completed. This includes that the value returned to the issuing processor becomes visible to all other processors before the issuing processor submits the next memory operation.

These conditions do not contain specific requirements concerning the interconnection network, the memory organization or the cooperation of the processors in the parallel system. In the example from above, condition (3) ensures that after reading x1, P2 waits until the write operation (1) has been completed before it issues the next memory operation (3). Thus, P3 always reads the new value of x1 when it reaches statement (5). Therefore, sequential consistency is ensured.
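The effect of condition (3) on the example can be illustrated with a small timing model. This is a rough sketch under simplifying assumptions: P1 issues write (1) at time 0, each invalidation message has a fixed transfer time, acknowledgment and data transfer times are ignored, and a write counts as completed once its invalidations have arrived everywhere. The function `printed_value` and its latency parameters are hypothetical names introduced only for this illustration.

```python
def printed_value(lat_p1_p2, lat_p1_p3, lat_p2_p3, enforce_conditions):
    """Return the value of x1 that P3 prints in statement (5).

    lat_a_b is the assumed transfer time of an invalidation message
    from processor a to processor b. Write (1) is issued at time 0.
    """
    x1_invalid_at_p2 = lat_p1_p2      # from here on, P2 reads x1 == 1
    x1_invalid_at_p3 = lat_p1_p3      # from here on, P3 no longer holds the old x1
    write1_completed = max(x1_invalid_at_p2, x1_invalid_at_p3)

    if enforce_conditions:
        # Condition (3): P2 waits until write (1) is completed everywhere
        # before it issues its next memory operation, write (3).
        write3_issued = max(x1_invalid_at_p2, write1_completed)
    else:
        # P2 continues as soon as it has read the new value of x1.
        write3_issued = x1_invalid_at_p2

    # P3 leaves loop (4) once the invalidation of x2 arrives, then runs (5).
    print_time = write3_issued + lat_p2_p3
    return 0 if x1_invalid_at_p3 > print_time else 1


if __name__ == "__main__":
    print(printed_value(1, 10, 1, enforce_conditions=False))
    print(printed_value(1, 10, 1, enforce_conditions=True))
```

In this model, a slow invalidation from P1 to P3 lets P3 print the old value 0 when the conditions are not enforced. With condition (3) enforced, statement (5) cannot run before the invalidation of x1 has reached P3, whatever the transfer times are.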

For the programmer, sequential consistency provides an easy and intuitive model. But the model has a performance disadvantage, since all memory accesses must be atomic and since memory accesses must be performed one after another. Therefore, processors may have to wait for quite a long time before memory accesses that they have issued have been completed. To improve performance, consistency models with fewer restrictions have been proposed. We give a short overview in the following and refer to [41, 94] for a more detailed description. The goal of the less restricted models is to still provide a simple and intuitive model but to enable a more efficient implementation.