
Protocols for DSM Architecture

5.1 Lightning Architecture

5.2.3 Distributed Shared Memory

Each processor in a DSM system is associated with a portion of the system address space, which can be accessed by the other processors through message passing [19]. A general configuration for a DSM is shown in Fig. 5.3(c). Each processor typically executes from its associated memory, as in distributed systems. However, all processors have access to the entire address space, as in shared memory systems. The required blocks are brought into the address space allotted to the processor and used for process execution. This configuration is attractive because it combines the topological advantages of distributed systems with the programming advantages of shared memory systems. The application base of the resulting system is wider than that of either approach alone, moving the system in the direction of general-purpose parallel computing.

Distributed shared memory systems are constructed as distributed memory systems but appear to the programmer as shared memory systems. The shared memory appearance is obtained by mapping the memory modules located at the processors into a global address space. Each processor is associated with its own memory, which can also be accessed by other processors through message passing. In this case, message passing is handled by the system, which makes the memory access transparent to the programmer.

Message passing involves transmission and reception of request/response packets. For example, in order to support the view of shared memory in a distributed memory system, the following steps are involved. A node identifies the target node that contains the target memory block. This is followed by transmission of a request packet to the target node. The requested block is transmitted by the target node. This might lead to contention problems at the network and the global memory module.
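The request/response steps above can be sketched as follows. This is a minimal illustration only, not the mechanism of any particular DSM system: the block size, the contiguous partitioning of the address space across nodes, and the names `locate`, `Node`, and `read_block` are all assumptions introduced here, and the network is replaced by a direct method call.

```python
from dataclasses import dataclass

BLOCK_SIZE = 64          # bytes per memory block (assumed value)
BLOCKS_PER_NODE = 1024   # blocks owned by each node (assumed value)


@dataclass
class Request:
    source: int   # requesting node
    target: int   # node that owns the block
    block: int    # global block number


@dataclass
class Response:
    source: int
    target: int
    block: int
    data: bytes


def locate(address: int) -> tuple[int, int]:
    """Map a global byte address to (owning node, global block number).

    Assumes each node owns one contiguous range of blocks.
    """
    block = address // BLOCK_SIZE
    return block // BLOCKS_PER_NODE, block


class Node:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.memory: dict[int, bytes] = {}  # blocks owned by this node

    def serve(self, req: Request) -> Response:
        """Target side: reply with the requested block (zero-filled if unwritten)."""
        data = self.memory.get(req.block, bytes(BLOCK_SIZE))
        return Response(self.node_id, req.source, req.block, data)


def read_block(nodes: list[Node], requester: int, address: int) -> bytes:
    """Requester side: identify the target node, send a request, return the block."""
    target, block = locate(address)
    if target == requester:
        # Local access: no packets are exchanged.
        return nodes[target].memory.get(block, bytes(BLOCK_SIZE))
    response = nodes[target].serve(Request(requester, target, block))
    return response.data
```

In a real system every remote `read_block` call would place one request packet and one block-sized response packet on the network, which is where the contention mentioned above arises.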

An optically interconnected distributed shared memory system has been proposed and analyzed in [111]. The combination of shared memory, distributed memory, and optical interconnects results in a system with the advantages of all three: ease of programming, scalability, and fast, low-cost performance. The network is no longer the principal constraint, which changes the perspective on issues such as processor migration and load balancing.

This section briefly introduced the different memory organizations for parallel and distributed systems. The following section provides the protocol definitions for the Lightning architecture described in Section 5.1.

5.3 Protocol Definition

This section describes the hybrid access protocol. It is defined for a single level system, and then generalized to the multi-level architecture defined in Section 5.1.

The protocol considered here is based on channels pre-allocated for data reception, where each node receives on a specific channel referred to as its home channel. Each processor is assumed to possess a tunable transmitter constructed from a laser array and a receiver tuned to its home channel.

The traffic characteristics of the system play a crucial role in the access protocol design. Define two classes of traffic:

Class A: small amounts of data, such as control information generated by the cache control mechanism and the operating system.

Examples are memory block requests, invalidations, cache-level acknowledgments, application-level low-latency messages, operating system control information, bandwidth reconfiguration (for the multi-level network), and other network management packets.

Class B: large amounts of data, such as a memory block.

This class of traffic generates a reservation control packet for media access if a reservation based protocol is used. In this case, Class A control information can be piggybacked on a Class B reservation packet.

Memory block length is expressed in terms of control packet length: let $L$ denote the ratio of memory block packet length to control packet length.

Often only a control packet needs to be sent to transmit control signals; block transfers occur only a fraction of the time. This motivates a protocol that reduces communication latency and transmits memory block packets unfragmented. Consider the fraction of Class B packets generated in the system. The distribution of packet types is determined by both the media access protocol and the cache coherency protocol. Cache coherency protocols for multiprocessor systems can be broadly classified as snoopy and directory based [21]. The write-invalidate and write-update consistency commands generate varying levels of cache control traffic. Snoopy cache protocols use some form of broadcast mechanism (for fast invalidations, etc.), whereas directory based schemes store information on where copies of blocks reside and usually require explicit invalidations. The value of this fraction depends largely on the amount of data sharing, which is determined by the address references generated. Snoopy schemes generate fewer control packets since all nodes can read the invalidate commands. Directory based schemes generate more node-to-node traffic and are characterized by many more cache control packets.
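To illustrate how the length ratio and the Class B fraction interact, the sketch below computes the mean packet length and the share of transmitted data that belongs to Class B. It is not taken from the source: it assumes every Class A packet is exactly one control packet long and every Class B packet is one memory block long, and the function and parameter names are hypothetical.

```python
def mean_packet_length(class_b_fraction: float, block_ratio: float) -> float:
    """Mean packet length, measured in control packet lengths.

    class_b_fraction: fraction of generated packets that are Class B.
    block_ratio: memory block packet length / control packet length.
    Assumes Class A packets have length 1 and Class B packets length block_ratio.
    """
    if not 0.0 <= class_b_fraction <= 1.0:
        raise ValueError("class_b_fraction must lie in [0, 1]")
    if block_ratio <= 0.0:
        raise ValueError("block_ratio must be positive")
    return (1.0 - class_b_fraction) + class_b_fraction * block_ratio


def class_b_data_share(class_b_fraction: float, block_ratio: float) -> float:
    """Fraction of all transmitted data that is carried by Class B packets."""
    mean = mean_packet_length(class_b_fraction, block_ratio)
    return class_b_fraction * block_ratio / mean
```

Under these assumptions, even when only a quarter of the packets are block transfers, a block nine control packets long means Class B carries three quarters of the data: `class_b_data_share(0.25, 9.0)` evaluates to `0.75`.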

The following sections describe the time-multiplexed protocol I-TDMA [92] and the proposed hybrid protocol FatMAC. Each node requires a tunable transmitter and a fixed receiver subsystem. Each node receives traffic on a pre-determined channel. A source node tunes its transmitter to the home channel of the destination node and transmits according to the access protocol. A node receives and processes all traffic on its home channel. In a multi-level system, each node has one receiver per level, tuned to the home channel for that level. A source node can compute the home channel of the destination node in a decentralized fashion through a simple computation based on the channel allocation policy. Let $N_i$ and $C_i$ respectively denote the number of stations and channels within a level $i$ cluster. Node $m_k$ is assigned channel $j$ as its home channel based on the allocation policy, where $j \in \{0, 1, 2, \ldots, C_i - 1\}$ and $0 \le k \le N_i - 1$. An interleaved channel allocation scheme assigns channel $j$ to station $m_k$ where $j = k \bmod C_i$.
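The interleaved allocation rule can be sketched in a few lines. The function names below are hypothetical, and the code covers a single level only (one value of $N_i$ and $C_i$); it shows how any source node can compute a destination's home channel locally, without consulting a central table.

```python
def home_channel(k: int, num_stations: int, num_channels: int) -> int:
    """Home channel index j of station m_k under interleaved allocation.

    num_stations and num_channels play the roles of N_i and C_i for one level.
    """
    if num_channels <= 0:
        raise ValueError("num_channels must be positive")
    if not 0 <= k < num_stations:
        raise ValueError("station index k must satisfy 0 <= k <= N_i - 1")
    return k % num_channels


def stations_by_channel(num_stations: int, num_channels: int) -> dict[int, list[int]]:
    """Group station indices by the home channel they share."""
    groups: dict[int, list[int]] = {j: [] for j in range(num_channels)}
    for k in range(num_stations):
        groups[home_channel(k, num_stations, num_channels)].append(k)
    return groups
```

For example, with 8 stations and 4 channels, stations 1 and 5 share home channel 1, so a source sending to either of them tunes its transmitter to that channel and contends according to the access protocol.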