Memory Consistency Models - The SMG DSM system: enabling shared memory for the grid

The memory consistency model used by the developer is effectively a contract between the application and the DSM system, whereby the DSM guarantees that if the software conforms to the agreement then the shared memory is correct, or consistent, in the event of parallel accesses [71]. In other words the consistency model specifies when the shared memory is valid, and the application and DSM agree to adhere to that.

There are two main categories of memory consistency model, namely those that utilise synchronisation points/operations to specify when shared data becomes consistent (relaxed) and those that don’t (strict). A number of formally defined consistency models span the spectrum between the strict and relaxed extremes. As consistency is relaxed, the elapsed time between resolutions of potential inconsistencies between copies of the shared data is extended. By permitting these temporary inconsistencies the more relaxed models increase performance through a reduction in inter-process communication [72]. Weaker consistency protocols exist than the most relaxed models covered here (Sections 4.5.6 & 4.5.7). Those protocols, in which consistency is guaranteed only for a bounded period, are unsuitable for use with distributed memory, but are used extensively to maintain consistency in file and web servers [73]. The following sections describe a number of the consistency models that have been formally defined in previous research.

MEMORY CONSISTENCY MODELS 42

4.5.1 Strict Consistency

The most rigorous of all consistency models is strict consistency. It is the model assumed by serial programs for the trivial case of uniprocessor systems. In a multiprocessor all write operations must be immediately visible to all processes. The formal definition of strict consistency is:

Any read to a memory location x returns the value stored by the most recent write operation to that same variable

The implementation of strict consistency is all but impossible in a distributed system as the notion of Newtonian global time applies [74], where an access at any process is required to be seen instantaneously by all other processes. When a read operation occurs the correct value at that exact point in time must be returned no matter how quickly a write may be subsequently performed.

This is illustrated in Figure 4.1. The variable x is initially located at process 1 (P1), which initialises it, W(x), with the value 1, then subsequently updates the value to 2. Process 0 (P0) initiates a read operation on x, R(x), and a request is directed to P1. In order for strict consistency to be maintained P0 must be returned the value 2. Due to the high latencies of communication between the processes this may not be possible, and the read access may return the value 1.

Figure 4.1: Failure to adhere to Strict Consistency due to communication delays
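The effect of communication delay can be sketched with a small timing model. This is only an illustration under stated assumptions: time is counted in abstract ticks, the helper names (latest_write, observed_at_p0) and the tick values are hypothetical, and the request/reply exchange of Figure 4.1 is simplified to a single propagation delay before a write becomes visible at P0.

```python
def latest_write(writes, t):
    """Value of the most recent write issued at or before global time t."""
    seen = [(wt, value) for wt, value in writes if wt <= t]
    return max(seen)[1] if seen else None


def observed_at_p0(writes, t, latency):
    """Value P0 observes at time t if each write needs `latency` ticks to reach it."""
    return latest_write(writes, t - latency)


# Hypothetical timeline: P1 performs W(x)1 at tick 0 and W(x)2 at tick 10.
writes_by_p1 = [(0, 1), (10, 2)]
read_time = 12  # P0 performs R(x) after W(x)2 in global time

# Strict consistency demands the globally most recent value.
strict_value = latest_write(writes_by_p1, read_time)
# With a 5-tick delay the second write has not yet reached P0.
slow_value = observed_at_p0(writes_by_p1, read_time, latency=5)
# Only with zero delay does the observation match the strict requirement.
fast_value = observed_at_p0(writes_by_p1, read_time, latency=0)
```

In this model the read satisfies strict consistency only when the delay is shorter than the gap between the last write and the read, which a distributed system cannot guarantee.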

4.5.2 Sequential Consistency

Sequential consistency is a weaker model that does not assume Newtonian global time, and mostly provides enough consistency for general usage. Programmers, if properly trained in parallel application development, can easily adapt to a situation whereby statement execution order is irrelevant. However, it is still in the category of strict models. First defined by Lamport [70], a system is sequentially consistent if:

The result of any execution is the same as if the operations of all processes were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program.

In other words, any interleaving of the operations is acceptable so long as all processes see the same sequence of memory accesses [74]. A sequentially consistent system does not guarantee to return the value that Newtonian global time would dictate, but it does guarantee that memory accesses are processed in a single sequential order. An example is shown in Figure 4.2.

Figure 4.2: Sequential consistency
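The definition can be checked by brute force for very small histories. The sketch below is an assumption-laden illustration rather than part of any DSM implementation: operations are encoded as hypothetical tuples ("W" or "R", variable, value), memory is assumed to start at 0, and the search enumerates every interleaving that preserves program order, so it is only practical for a handful of operations.

```python
def interleavings(programs):
    """Yield every merge of the per-process operation lists that keeps program order."""
    if all(not p for p in programs):
        yield []
        return
    for i, p in enumerate(programs):
        if p:
            rest = programs[:i] + [p[1:]] + programs[i + 1:]
            for tail in interleavings(rest):
                yield [p[0]] + tail


def is_legal(sequence, initial=0):
    """True if every read in the sequence returns the latest value written before it."""
    memory = {}
    for kind, var, value in sequence:
        if kind == "W":
            memory[var] = value
        elif memory.get(var, initial) != value:
            return False
    return True


def sequentially_consistent(programs):
    """True if some program-order-preserving interleaving explains all reads."""
    return any(is_legal(seq) for seq in interleavings(programs))


# Two writers and two readers; both readers see 2 and then 1.
agree = [
    [("W", "x", 1)],
    [("W", "x", 2)],
    [("R", "x", 2), ("R", "x", 1)],
    [("R", "x", 2), ("R", "x", 1)],
]
# Same writers, but the readers disagree on the order of the two writes.
disagree = [
    [("W", "x", 1)],
    [("W", "x", 2)],
    [("R", "x", 2), ("R", "x", 1)],
    [("R", "x", 1), ("R", "x", 2)],
]

agree_ok = sequentially_consistent(agree)
disagree_ok = sequentially_consistent(disagree)
```

The first history is accepted even though the write of 2 may have been issued later in real time than the write of 1; the second is rejected because no single order of the two writes satisfies both readers.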

4.5.3 PRAM and Processor Consistency

PRAM (Pipelined RAM) and processor consistency models are similar enough that they are often regarded as equivalent [74]. These consistency models allow concurrent writes from different processors to be seen in different orders by different processes. Time-dependent accesses can also be seen in a different order by different processes. Writes from the same process must be identically and correctly ordered (pipelined) by all processes. These models are also categorised as strict models. The formal definition of PRAM consistency is [75]:

The write operations performed by a single process are observed by other processes in the order that they were performed, but the order in which write operations from multiple processes occur can be seen differently

In effect a writer does not have to wait for all modifications to reach other processes before it initiates another write operation. In Figure 4.3 it can be seen that writes observed at Process 2 (P2) may be inconsistent with stricter models when compared with what is observed at other processes at the same time, but all writes from the same process are observed in the order that they occur.

Figure 4.3: PRAM consistency
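The PRAM condition can be expressed as a simple check on what each process observes. The following is a minimal sketch with hypothetical names and data: each observer's view is a list of (writer, value) pairs, and the check only requires that the writes of any single writer appear in the order in which that writer issued them.

```python
def pram_consistent(writes_by_process, views):
    """True if every view preserves each writer's own issue order.

    A view may be partial: it only has to match a prefix of each writer's sequence.
    """
    for view in views.values():
        for writer, issued in writes_by_process.items():
            seen = [value for w, value in view if w == writer]
            if seen != issued[:len(seen)]:
                return False
    return True


# Hypothetical history: PA writes 1 then 2, PB writes 10.
issued = {"PA": [1, 2], "PB": [10]}

# The two observers order PB's write differently relative to PA's writes.
views_ok = {
    "PC": [("PA", 1), ("PB", 10), ("PA", 2)],
    "PD": [("PB", 10), ("PA", 1), ("PA", 2)],
}
# Here PD sees PA's writes in the wrong order.
views_bad = {
    "PC": [("PA", 1), ("PB", 10), ("PA", 2)],
    "PD": [("PA", 2), ("PA", 1), ("PB", 10)],
}

ok = pram_consistent(issued, views_ok)
bad = pram_consistent(issued, views_bad)
```

The first pair of views is PRAM consistent although the observers disagree on the global order, which a sequentially consistent system would not permit.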

4.5.4 Weak Consistency

The previously mentioned consistency models are quite restrictive in that they require all writes from a single process to be ordered and viewable by other processes [74], resulting in excess communication. Weak consistency assumes that if all writes can be propagated to all remote processes at a certain synchronisation point then this restriction may be relaxed. Weak consistency is categorised as a relaxed model. Relaxed consistency models require the programmer to access shared data in a more structured fashion, thus reducing the volume of network traffic generated, and increasing performance [66]. With weak consistency the task of making memory globally consistent is tied to the use of synchronisation primitives. When a synchronisation operation occurs all writes performed by a process are propagated to remote processes, and all remote writes are applied locally. Hence, there is a clear distinction between ordinary memory accesses and synchronisation accesses. Weak consistency has the following formally defined properties [76]:

1. Accesses to synchronisation variables must be sequentially consistent.

2. No access to a synchronisation variable is allowed to be performed until all previous writes have completed everywhere.

3. No data accesses are allowed to be performed until all previous accesses to synchronisation variables have been performed.

As depicted in Figure 4.4 all processes see synchronisation accesses in the same order. When a process is accessing a synchronisation variable then no other process can access it. Before a process is allowed access to the synchronisation variable all preceding writes must have completed, so by the time access is granted all the writes are guaranteed to have been completed. The final condition means that before an ordinary access is allowed to occur then the preceding synchronisation accesses must have been completed. Shared memory is only brought up to date when a synchronisation variable is accessed.

Figure 4.4: Weak consistency
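The behaviour described above can be imitated with a toy in-process model. This is a sketch under explicit assumptions, not the SMG implementation: the class WeakMemory and its method names are hypothetical, a single master copy stands in for "all remote processes", and a synchronisation both pushes the caller's buffered writes and pulls everything written by others.

```python
class WeakMemory:
    """Toy shared memory: one master copy plus a private copy per process."""

    def __init__(self, processes):
        self.master = {}
        self.local = {p: {} for p in processes}
        self.pending = {p: {} for p in processes}

    def write(self, p, var, value):
        # Ordinary access: applied locally and buffered, not yet visible elsewhere.
        self.local[p][var] = value
        self.pending[p][var] = value

    def read(self, p, var):
        # Ordinary access: served from the private copy only.
        return self.local[p].get(var)

    def synchronise(self, p):
        # Synchronisation access: push local writes, then pull remote ones.
        self.master.update(self.pending[p])
        self.pending[p].clear()
        self.local[p] = dict(self.master)


mem = WeakMemory(["P1", "P2"])
mem.write("P1", "x", 1)
mem.write("P1", "x", 2)
before_any_sync = mem.read("P2", "x")    # P1 has not synchronised yet
mem.synchronise("P1")
before_own_sync = mem.read("P2", "x")    # P2 itself has not synchronised yet
mem.synchronise("P2")
after_sync = mem.read("P2", "x")         # now guaranteed to be up to date
```

In this model only the final value of x travels at the synchronisation point, which reflects the reduction in communication that motivates the relaxed models.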

4.5.5 Release Consistency

The main drawback with weak consistency is that there is uncertainty concerning the status of shared memory when a synchronisation access occurs. Is the shared memory about to be written to, or has a write just occurred? Due to this uncertainty all actions must occur: any local writes must be flushed to remote processes, and all external writes must be applied locally. Thus the synchronisation operation has a global effect for all shared variables.

With release consistency (RC) this problem is removed by identifying synchronisation operations as being either the entrance or exit of critical sections, within which shared data is accessed, although the operations still have a global effect. These actions were termed acquire and release by the first implementers of release consistency [77]. Acquire actions define the entering of a critical section, while release actions specify the leaving of a critical section. It is the job of the programmer to instrument the application code with these synchronisation operations, whether as ordinary operations on special variables or as special operations [74]. The formal definition of release consistency is:

1. Before an ordinary access to a shared variable is performed, all previous acquires done by the process must have completed successfully.

2. Before a release is allowed to be performed, all previous reads and writes performed by the process must have been completed.

3. The acquire and release accesses must be processor consistent.

In effect accesses to shared data are batched, with acquire signalling the start of the batching, and release its end. Figure 4.5 shows the sequence of events that demonstrates the action of a release-consistent system. A typical release consistent DSM system would be constructed using shared distributed locks, where the locking process is equivalent to acquire, and unlocking to release. Global barrier primitives may also be used whereby arrival at the barrier is equivalent to a release operation, and the departure from the barrier to an acquire.

Figure 4.5: Release Consistency
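The batching of accesses between acquire and release can be sketched as follows. This is an illustrative model only: the class EagerRC, its single implicit lock, and the message counter are hypothetical, and the release is assumed to push updates eagerly to every other process, one message per process.

```python
class EagerRC:
    """Toy release-consistent memory guarded by one global lock."""

    def __init__(self, processes):
        self.copies = {p: {} for p in processes}
        self.dirty = {p: {} for p in processes}
        self.holder = None
        self.messages = 0

    def acquire(self, p):
        if self.holder is not None:
            raise RuntimeError("lock already held")
        self.holder = p

    def write(self, p, var, value):
        if self.holder != p:
            raise RuntimeError("shared writes must sit between acquire and release")
        self.copies[p][var] = value
        self.dirty[p][var] = value      # batched until the release

    def read(self, p, var):
        return self.copies[p].get(var)

    def release(self, p):
        if self.holder != p:
            raise RuntimeError("release without matching acquire")
        # All batched writes are pushed to every other copy at the release.
        for other in self.copies:
            if other != p and self.dirty[p]:
                self.copies[other].update(self.dirty[p])
                self.messages += 1
        self.dirty[p].clear()
        self.holder = None


dsm = EagerRC(["P0", "P1", "P2"])
dsm.acquire("P0")
dsm.write("P0", "x", 1)
dsm.write("P0", "x", 2)
inside_section = dsm.read("P1", "x")   # nothing has been propagated yet
dsm.release("P0")
after_release = dsm.read("P1", "x")
messages_sent = dsm.messages
```

Two writes inside the critical section result in a single update per remote process, sent only when the release is performed.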

4.5.6 Lazy-Release Consistency

A negative aspect of Release Consistency is that upon calling a release all updates are sent to all processes with a cached copy of any modified shared data. However, not all of these processes may require the invalidate/update notice (they may not be actively reading the data), thus there is potentially superfluous overhead. Lazy-Release Consistency (LRC) extends the principles of release consistency by delaying the pushing of invalidate/update information until it is actually required. When a release occurs no communication is generated. At a subsequent acquire the necessary modifications are directed to the acquiring process [78]. The conditions that must be met to guarantee lazy release consistency [79] are as follows:

1. Before an ordinary access to a shared variable is performed with respect to another process, all previous acquires by the process must have completed successfully with respect to that process.

2. Before a release is allowed to be performed, all previous reads and writes performed by the process must have been completed.

3. All synchronisation operations must be sequentially consistent with respect to one another.

Figure 4.6 illustrates the difference between the two types of release consistency. Under LRC a release generates no communication; the modifications are transferred only at the subsequent acquire. Under RC, in contrast, they are propagated when the release occurs.

Some inefficiencies are introduced as an acquiring process with an out-of-date copy of the data must fetch the data from the current owner. This introduces a stall before computation can begin. Prefetching, instigated by an operation programmed into the application, has been used to attempt to overcome this [80]. Such actions must be application driven, as prefetching and lazy release work in opposite directions, so any attempt to automate the prefetching is likely to cancel much of the benefit accruing from lazy release. A possible solution is for the system to adapt to the data usage pattern so that only actively used data is prefetched.

Figure 4.6: Lazy-Release Consistency
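The saving in messages can be illustrated with a deliberately simplified model. Everything here is an assumption made for the sketch: the class LazyRC, the single global version counter (real LRC implementations track intervals with vector timestamps), the omission of lock exclusion, and the count of one message per acquire that actually needs data.

```python
class LazyRC:
    """Toy lazy-release memory: releases are silent, acquires fetch what is missing."""

    def __init__(self, processes):
        self.copies = {p: {} for p in processes}
        self.dirty = {p: {} for p in processes}
        self.released = {}                     # writes made visible by past releases
        self.version = 0                       # bumped at every release with writes
        self.seen = {p: 0 for p in processes}  # version each process last fetched
        self.messages = 0

    def acquire(self, p):
        # Only the acquiring process is brought up to date, and only if stale.
        if self.seen[p] < self.version:
            self.copies[p].update(self.released)
            self.seen[p] = self.version
            self.messages += 1

    def write(self, p, var, value):
        self.copies[p][var] = value
        self.dirty[p][var] = value

    def read(self, p, var):
        return self.copies[p].get(var)

    def release(self, p):
        # No communication: the writes are merely recorded for later acquirers.
        if self.dirty[p]:
            self.released.update(self.dirty[p])
            self.dirty[p].clear()
            self.version += 1
            self.seen[p] = self.version


dsm = LazyRC(["P0", "P1", "P2", "P3"])
dsm.acquire("P0")
dsm.write("P0", "x", 1)
dsm.release("P0")
messages_after_release = dsm.messages   # nothing sent yet
dsm.acquire("P1")
p1_value = dsm.read("P1", "x")
p2_value = dsm.read("P2", "x")          # P2 never acquired, so it stays stale
messages_total = dsm.messages
eager_messages = len(dsm.copies) - 1    # what an eager release would have sent
```

With four processes and one interested reader, the lazy scheme sends one message where an eager release would have sent three, at the cost of the acquirer waiting for the data to arrive.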

4.5.7 Entry Consistency

Even an efficient model such as LRC still generates a large volume of individual messages. Much of this is due to the global effect of the synchronisation operations: an acquire intended to enforce consistency on one variable will also have the latent effect of enforcing consistency on all other variables that have been modified, even if that is not necessary. This is a result of having, in effect, only one global synchronisation variable.

Entry Consistency (EC) is one attempt at reducing this problem by closely binding a shared memory region/block to a specific synchronisation variable, i.e. by allowing more than one synchronisation variable, each covering a subset of the shared variables. Like the previous models, EC renders shared memory consistent when synchronisation operations occur; however, only the shared memory bound to the synchronisation variable is made consistent. The number of messages is also reduced as the update data can be piggybacked upon synchronisation messages. Stalling of the application, due to waiting for memory consistency to be enforced, is reduced as well. The explicit association of shared data with synchronisation variables creates an extra burden for the programmer. This is the price paid for the reduction in coherence messages generated.

A memory system is Entry Consistent if the following conditions can be met [81]:

1. Before an acquire access to a synchronisation variable s is allowed to perform with respect to any process pi, all updates to shared variables guarded by s must be performed with respect to that process.

2. Before an exclusive access to a synchronisation variable s by a process pi is allowed to perform, no other process may hold s in non-exclusive mode.

3. After an exclusive access to s has been performed, any processor’s next nonexclusive mode access to that synchronisation variable may not be performed until it has been performed with respect to the current owner of the synchronisation variable s.

Entry Consistency only guarantees that when an acquire operation on a synchronisation variable occurs, the data bound to that variable is made consistent. Figure 4.7 demonstrates this, where a synchronisation variable, z, is acquired and released by process P1. The data variable x that is bound to z is updated upon an acquire of z at a remote process, while the modifications to a variable that is not bound, y, but modified in the same interval as x, are not (under RC or LRC they would be).
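The scenario described for Figure 4.7 can be reproduced with a small model. The sketch rests on several assumptions: the class EntryConsistentMemory and its methods are hypothetical, only exclusive-mode access is modelled, lock exclusion itself is not enforced, and the binding of data to synchronisation variables is given as a plain dictionary.

```python
class EntryConsistentMemory:
    """Toy entry-consistent memory: each synchronisation variable guards named data."""

    def __init__(self, processes, guards):
        self.guards = guards                     # sync variable -> guarded data names
        self.copies = {p: {} for p in processes}
        self.published = {}                      # values made visible by releases

    def acquire(self, p, sync):
        # Only the data bound to this synchronisation variable is made consistent.
        for var in self.guards[sync]:
            if var in self.published:
                self.copies[p][var] = self.published[var]

    def write(self, p, var, value):
        self.copies[p][var] = value

    def read(self, p, var):
        return self.copies[p].get(var)

    def release(self, p, sync):
        # Publish only the guarded data; unbound variables stay private.
        for var in self.guards[sync]:
            if var in self.copies[p]:
                self.published[var] = self.copies[p][var]


# z guards x; y is shared but bound to no synchronisation variable.
mem = EntryConsistentMemory(["P1", "P2"], guards={"z": {"x"}})
mem.acquire("P1", "z")
mem.write("P1", "x", 1)
mem.write("P1", "y", 2)     # modified in the same interval as x
mem.release("P1", "z")

mem.acquire("P2", "z")
x_at_p2 = mem.read("P2", "x")
y_at_p2 = mem.read("P2", "y")
```

After P2 acquires z it observes the new value of x but not that of y, which is the extra burden placed on the programmer: any data that must be seen consistently has to be bound to the synchronisation variable that protects it.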

MEMORY COHERENCE PROTOCOLS 49

In document The SMG DSM system: enabling shared memory for the grid (Page 61-69)