DSM Data Distribution Algorithms - The SMG DSM system: enabling shared memory for the grid

The DSM data distribution algorithm, coupled with a DSM ownership management algorithm (next section), defines the base functionality required for implementing a distributed shared memory. These were originally derived from the cache coherency protocols of early hardware shared memory multiprocessor systems [63]. The ownership management algorithm specifies how to find the owner of a shared data item, and the DSM data distribution algorithm specifies how the shared data is distributed. Each is an important consideration as it affects the number of control messages that are generated.

The four basic DSM data distribution algorithms [63] are described below. There are also modified versions not described here. The algorithms can be categorised by whether or not they (a) migrate ownership of data, and (b) replicate data. The associated cost function for each algorithm consists of two components, Cost = a × b, where a is the probability of an access to remotely located data, and b is the average cost of accessing the remote data item. The parameters for the following cost functions are defined in Table 4.1 below. These basic algorithms have been extended to allow for fault tolerance [64], with little additional overhead for the central-server and full-replication algorithms.

• Central-server: with this algorithm the owner of shared data never changes. With every read/write to shared data a request is sent to a central server. The server responds with the valid data. Thus two messages are required for each request. The primary problem with this approach is that the server becomes a bottleneck, having to service requests from all processes. A potential solution is to statically distribute the shared data among a number of servers, but a requesting process will then need to know the location of the data. The cost for the central-server algorithm is:

$$C_c = \left(1 - \frac{1}{S}\right) \times 4p \qquad (4.1)$$

• Migration: the ownership of the data is transferred upon receiving a request for the data item. When a process relinquishes ownership of a shared memory item, the identity of the process that it transfers ownership to is recorded. In this way it is always possible to ascertain which process is the owner of the item. Data is transferred among processes in blocks of a defined granularity. This scheme is most advantageous when a data block is predominantly accessed by a single process. If, however, it is accessed by a number of processes then 'thrashing' of the block will occur. One additional requirement is that, as the ownership of a block is transient, an efficient algorithm is required in order to find the current owner.

$$C_m = f \times (2P + 4p) \qquad (4.2)$$

• Read replication: the main disadvantage with the previous algorithms is that only one thread of execution may access data at any one time, i.e. they implement SRSW modes. With this algorithm data is replicated at different nodes, allowing different threads to read concurrently, eliminating much of the communication overhead associated with the previous algorithms.

When a read to a shared data item occurs and it cannot be satisfied locally, then a copy is sent to the requester; at this point ownership may or may not be transferred. When a write occurs data consistency must be maintained according to the consistency model (see Section 4.5). This algorithm implements the MRSW mode. The management of shared data can be distributed across multiple nodes in order to eliminate any potential bottlenecks.

$$C_{rr} = f' \times \left[2P + 4p + \frac{Sp}{r + 1}\right] \qquad (4.3)$$

• Full-replication: the full-replication algorithm implements an MRMW mode. Unlike read-replication, full-replication allows data to be replicated while it is written to, with the proviso that only non-competing writes can occur. Read accesses occur in a similar manner to the previous algorithm, while write accesses are broadcast to other nodes. The sequencing of writes required to maintain data consistency is left to the consistency model (see Section 4.5).

Examining the cost function below shows that there are S + 2 messages for every write, where S is the number of remote caches of the variable. Instead of performing this action for every write operation, an optimisation is for all writes to be logged, so that the shared memory in a node is only updated when a write occurs locally.

$$C_{fr} = \frac{1}{r + 1} \times (S + 2)p \qquad (4.4)$$
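The thrashing behaviour noted for the migration algorithm can be illustrated with a minimal Python sketch. It assumes a single block, that every access by a non-owner triggers a transfer of ownership, and that accesses are given as a simple list of process identifiers. The function name `count_migrations` and the access traces are hypothetical and chosen only for illustration.

```python
def count_migrations(accesses, initial_owner):
    """Count ownership transfers of one block under the migration algorithm.

    accesses: sequence of process ids, in the order they touch the block.
    initial_owner: id of the process that owns the block at the start.
    Assumption: any access by a non-owner migrates the block to that process.
    """
    owner = initial_owner
    migrations = 0
    for proc in accesses:
        if proc != owner:
            # The block (and its ownership) moves to the accessing process.
            owner = proc
            migrations += 1
    return migrations


# One process using the block exclusively: no transfers are needed.
single_user = count_migrations([0] * 8, initial_owner=0)

# Two processes alternating on the same block: nearly every access migrates it.
alternating = count_migrations([0, 1] * 4, initial_owner=0)
```

With these traces, eight accesses from a single process cause no transfers, while eight alternating accesses cause seven, which is the pattern the term 'thrashing' refers to.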

Parameter   Definition
p           The cost of a zero-size packet event (latency)
P           The cost of a large packet event (latency & bandwidth)
S           The number of nodes sharing the data
r           Read/write ratio, or access pattern to a granularity unit
f           Probability of an access fault on a non-replicated data block
f'          Probability of an access fault on a replicated data block

Table 4.1: Parameters for DSM algorithm cost functions
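To compare the four cost functions (4.1) to (4.4) for a given workload, they can be evaluated directly from the parameters in Table 4.1. The following Python sketch does this. The class name `CostParams`, the field name `f_rep` (standing in for f'), and the numeric values in `example` are assumptions made for illustration and are not measurements from the source.

```python
from dataclasses import dataclass


@dataclass
class CostParams:
    p: float      # cost of a zero-size packet event
    P: float      # cost of a large packet event
    S: int        # number of nodes sharing the data
    r: float      # read/write ratio
    f: float      # access-fault probability, non-replicated block
    f_rep: float  # access-fault probability, replicated block (f')


def central_server(c: CostParams) -> float:
    # Equation (4.1): (1 - 1/S) * 4p
    return (1 - 1 / c.S) * 4 * c.p


def migration(c: CostParams) -> float:
    # Equation (4.2): f * (2P + 4p)
    return c.f * (2 * c.P + 4 * c.p)


def read_replication(c: CostParams) -> float:
    # Equation (4.3): f' * [2P + 4p + Sp / (r + 1)]
    return c.f_rep * (2 * c.P + 4 * c.p + c.S * c.p / (c.r + 1))


def full_replication(c: CostParams) -> float:
    # Equation (4.4): (1 / (r + 1)) * (S + 2)p
    return (1 / (c.r + 1)) * (c.S + 2) * c.p


# Hypothetical workload: four sharing nodes, nine reads per write.
example = CostParams(p=1.0, P=20.0, S=4, r=9.0, f=0.05, f_rep=0.02)
costs = {
    "central-server": central_server(example),
    "migration": migration(example),
    "read-replication": read_replication(example),
    "full-replication": full_replication(example),
}
```

For these illustrative values the costs come out at roughly 3.0, 2.2, 0.888 and 0.6 respectively. The ranking depends entirely on the chosen parameters, in particular on the fault probabilities f and f' and the read/write ratio r.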

4.3.1 DSM Ownership Management Algorithm

Depending on whether or not data migrates and/or is replicated, the owner of the shared data master version, and copies if they exist, must be located when required. The DSM ownership management algorithm is responsible for doing this and is closely related to the DSM data distribution algorithm. A number of ownership management algorithms for implementing DSM were identified by Li [65]. The two main classifications arise from the decision whether to centralise or distribute management. The main approaches are:

• Centralised Manager: A central ownership manager is responsible for synchronising all accesses to shared memory. It must maintain records of the existence of all replicated copies of data, i.e. a copy-set. When a process requires data it will direct its request to the manager, who in turn will forward it to the current owner of the data.

• Improved Centralised Manager: Differs from the previous algorithm in that it does not synchronise access to the data but maintains only a copy-set and a record of the current owner. All requests are still directed to the central manager.

• Fixed Distributed Manager: To reduce the potential for a bottleneck that may arise from centralised management, multiple managers are established, each with responsibility for a subset of the shared memory address space. A hashing function is normally used to provide the mapping between processes and shared memory [66]. When a process requires access to a shared memory area, the request will be directed towards the appropriate manager.

• Broadcast Distributed Manager: Here each process manages the pages that it owns. A message is broadcast when access to a shared memory area is required, and the current owner will then respond. A write broadcast results in all nodes invalidating their copies and ownership being transferred to the requesting process, while a read broadcast results in a copy of the data being sent to the requester and the copy-set being updated.

• Dynamic Distributed Manager: Here each process maintains a probable owner (prob owner) field which is updated upon every transfer of ownership. When a process requires access to a shared memory area it will direct its request to the process it believes is the current owner, i.e. to the prob owner. The copy-set only exists on the process that owns the shared location. When ownership is transferred the copy-set is also transferred.
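The dynamic distributed manager can be sketched in Python as a chain of probable-owner hints that a request follows until it reaches the current owner. This is a minimal sketch under stated assumptions: it covers a single shared location, the owner is marked by a hint that points to itself, and only the old and new owners update their hints on a transfer. The names `Node`, `find_owner` and `transfer_ownership`, and the spelling `prob_owner`, are hypothetical.

```python
class Node:
    def __init__(self, node_id, prob_owner):
        self.node_id = node_id
        # Id of the node this node believes is the current owner.
        self.prob_owner = prob_owner
        # Copy-set; only meaningful on the node that owns the location.
        self.copy_set = set()


def find_owner(nodes, start_id):
    """Follow prob_owner hints from start_id; return (owner_id, hops)."""
    current = start_id
    hops = 0
    # Assumption: the owner's hint points to itself, which ends the chain.
    while nodes[current].prob_owner != current:
        current = nodes[current].prob_owner
        hops += 1
    return current, hops


def transfer_ownership(nodes, requester_id):
    """Move ownership to requester_id; return the hops needed to find the owner."""
    owner_id, hops = find_owner(nodes, requester_id)
    if owner_id != requester_id:
        old, new = nodes[owner_id], nodes[requester_id]
        # The copy-set travels with ownership.
        new.copy_set = old.copy_set
        old.copy_set = set()
        # The old owner now points at the new owner; the new owner at itself.
        old.prob_owner = requester_id
        new.prob_owner = requester_id
    return hops


# Four nodes; node 0 owns the location and everyone's hint points to it.
nodes = {i: Node(i, prob_owner=0) for i in range(4)}
hops_first = transfer_ownership(nodes, 1)   # hint is accurate
hops_second = transfer_ownership(nodes, 2)  # hint is stale, request is forwarded
owner, hops_from_3 = find_owner(nodes, 3)   # node 3 follows the full chain
```

In this run the first transfer reaches the owner in one hop, the second needs two because node 2 still points at node 0, and node 3 needs three hops to reach the final owner, node 2. A stale hint therefore costs extra forwarding messages rather than a failed lookup.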

These ownership management functions are interdependent with the DSM data distribution algorithms. If the ownership of a shared block does not change, as occurs with the central-server algorithm, then the management function is trivial, and the owner can be identified immediately when required. However, if migration occurs and there is a possibility of a bottleneck at the ownership manager, as is the case with the full-replication DSM data distribution algorithm, then a distributed ownership manager is required. Fixed ownership is an expensive solution (communication occurs on every write), and it constrains parallel computation so much that it is an unattractive solution for DSM [65].
