5.5.1 Remote Operations

Figure 5.1: Remote read and write operations. At the top of the figure, we show a global pointer gp located in processor object pobj1 referencing an integer length in processor object pobj2. The rest of the figure is a timeline depicting the activity in these two processor objects as a thread in pobj1 first writes and then reads length. The thread in pobj1 is shown as a solid line when active and as a dashed line when suspended waiting for a remote operation. The diagonal dashed lines represent communications.

CC++ global pointers are used in the same way as C++ local pointers; the only difference is that we use them to operate on data or to invoke functions that may be located in other processor objects. Hence, the following code fragment first assigns to and then reads from the remote location referenced by the global pointer gp.

global int *gp;
int len2;
*gp = 5;
len2 = *gp;

As illustrated in Figure 5.1, these read and write operations result in communication.

If we invoke a member function of an object referenced by a global pointer, we perform what is called a remote procedure call (RPC). An RPC has the general form

<type> *global gp;
result = gp->p(...)

where gp is a global pointer of an arbitrary <type>, p(...) is a call to a function defined in the object referenced by that global pointer, and result is a variable that will be set to the value returned by p(...). An RPC proceeds in three stages:

1. The arguments to the function p(...) are packed into a message, communicated to the remote processor object, and unpacked. The calling thread suspends execution.

2. A new thread is created in the remote processor object to execute the called function.

3. Upon termination of the remote function, the function return value is transferred back to the calling thread, which resumes execution.
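
The three stages can be imitated in standard C++ within a single process. The following is only a sketch under stated assumptions, not CC++ itself: the helper rpc is a hypothetical name, an ordinary pointer stands in for the global pointer, and copying the arguments into a new thread stands in for packing them into a message. The Length class mirrors the read_len and write_len functions referred to later in this section.

```cpp
#include <future>
#include <utility>

// Stand-in for an object held by a remote processor object.
// length is initialized here (an assumption) to keep the sketch well defined.
class Length {
    int length = 0;
public:
    int read_len() { return length; }
    void write_len(int v) { length = v; }
};

// Hypothetical helper that mimics the three RPC stages:
//   1. the arguments are copied ("packed") into the asynchronous task,
//   2. std::launch::async runs the called function on a new thread,
//   3. get() suspends the caller until the return value is available.
template <typename Fn, typename... Args>
auto rpc(Fn fn, Args... args) {
    auto reply = std::async(std::launch::async, std::move(fn), std::move(args)...);
    return reply.get();
}
```

A call such as rpc(&Length::write_len, &obj, 42) then plays the role of gp->write_len(42): the caller does nothing else until the new thread has finished.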

Basic integer types (char, short, int, long, and the unsigned variants of these), floats, doubles, and global pointers can be transferred as RPC arguments or return values without any user intervention. Structures, regular pointers, and arrays can be transferred with the aid of transfer functions, to be discussed later in this section.

Program 5.5 uses RPCs to access a variable length located in another processor object; contrast this with the code fragment given at the beginning of this section, in which read and write operations were used for the same purpose. The communication that results is illustrated in Figure 5.2.

Figure 5.2: Using remote procedure calls to read and write a remote variable. At the top of the figure, we show a global pointer lp located in processor object pobj1 referencing processor object pobj2. The rest of the figure is a timeline depicting the activity in these two processor objects as a thread in pobj1 issues RPCs first to read and then to write the remote variable length. The thread in pobj1 is shown as a vertical line, solid when active and dashed when suspended waiting for a remote operation; the diagonal dashed lines represent communications. The solid vertical lines in pobj2 represent the threads created to execute the remote procedure calls.

5.5.2 Synchronization

Figure 5.3: Alternative synchronization mechanisms. On the left, the channel: a receiver blocks until a message is in the channel. On the right, the sync variable: a receiver blocks until the variable has a value.

A producer thread can use an RPC to move data to a processor object in which a consumer thread is executing, hence effecting communication. However, we also require a mechanism for synchronizing the execution of these two threads, so that the consumer does not read the data before it is communicated by the producer. In the task/channel model of Part I, synchronization is achieved by making a consumer requiring data from a channel block until a producer makes data available. CC++ uses a different but analogous mechanism, the single assignment or sync variable (Figure 5.3). A sync variable is identified by the type modifier sync, which indicates that the variable has the following properties:

1. It initially has a special value, "undefined."

2. It can be assigned a value at most once, and once assigned is treated as a constant (ANSI C and C++ const).

3. An attempt to read an undefined variable causes the thread that performs the read to block until the variable is assigned a value.

We might think of a sync variable as an empty box with its interior coated with glue; an object cannot be removed once it has been placed inside.
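
The three properties can be approximated in standard C++ with a mutex and a condition variable. This is a minimal sketch, not how CC++ implements sync: the class name SyncVar and its member functions are hypothetical, and throwing an exception on a second assignment is an assumption made here for illustration, since the text says only that a value can be assigned at most once.

```cpp
#include <condition_variable>
#include <mutex>
#include <stdexcept>

// Hypothetical single-assignment variable (T must be default-constructible).
template <typename T>
class SyncVar {
    mutable std::mutex m;
    mutable std::condition_variable became_defined;
    bool defined = false;   // property 1: starts out "undefined"
    T value{};
public:
    // Property 2: at most one assignment. Rejecting a second assignment
    // with an exception is an assumption of this sketch.
    void set(const T& v) {
        std::lock_guard<std::mutex> lock(m);
        if (defined) throw std::logic_error("sync variable already assigned");
        value = v;
        defined = true;
        became_defined.notify_all();
    }

    // Property 3: a read blocks until the variable has been assigned.
    T get() const {
        std::unique_lock<std::mutex> lock(m);
        became_defined.wait(lock, [this] { return defined; });
        return value;
    }

    bool is_defined() const {
        std::lock_guard<std::mutex> lock(m);
        return defined;
    }
};
```

Once set has succeeded, every later get returns the same value without blocking, which matches the glue-coated box picture: the value goes in once and stays.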

Any regular C++ type can be declared sync, as can a CC++ global pointer. Hence, we can write the following.

sync int i;         // i is a sync integer
sync int *j;        // j is a pointer to a sync integer
int *sync k;        // k is a sync pointer to an integer
sync int *sync l;   // l is a sync pointer to a sync integer

We use the following code fragment to illustrate the use of sync variables. This code makes two concurrent RPCs to functions defined in Program 5.5: one to read the variable length and one to write that variable.

Length *global lp;
int val;
par {
  val = lp->read_len();
  lp->write_len(42);
}

What is the value of the variable val at the end of the parallel block? Because the read and write operations are not synchronized, the value is not known. If the read operation executes before the write, val will have some arbitrary value. (The Length class does not initialize the variable length.) If the execution order is reversed, val will have the value 42.

This nondeterminism can be avoided by modifying Program 5.5 to make the variable length a sync variable. That is, we change its definition to the following.

sync int length;

Execution order now does not matter: if read_len executes first, it will block until the variable length is assigned a value by write_len.
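
The effect of this change can be imitated in standard C++, where a std::promise paired with a std::shared_future behaves much like a single-assignment variable. The sketch below rests on assumptions: the class name SyncLength and the function read_concurrently_with_write are hypothetical, and two std::thread objects stand in for the two statements of the par block.

```cpp
#include <future>
#include <thread>

// Hypothetical analogue of the Length class with "sync int length".
class SyncLength {
    std::promise<int> writer;   // may be given a value only once
    std::shared_future<int> length = writer.get_future().share();
public:
    int read_len() { return length.get(); }         // blocks while undefined
    void write_len(int v) { writer.set_value(v); }  // defines the value
};

// Stand-in for: par { val = lp->read_len(); lp->write_len(42); }
int read_concurrently_with_write() {
    SyncLength obj;
    int val = 0;
    std::thread reader([&] { val = obj.read_len(); });
    std::thread writer([&] { obj.write_len(42); });
    reader.join();
    writer.join();
    return val;   // 42 whichever thread happens to run first
}
```

If the reader thread runs first, it waits inside read_len until the writer has supplied the value, so the result no longer depends on scheduling.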

Example . Channel Communication:

Global pointers and sync variables can be used to implement a variety of communication mechanisms. In this example, we use these constructs to implement a simple shared queue class. This class can be used to implement channel communication between two concurrently executing producer and consumer tasks: we simply allocate a queue object and provide both tasks with pointers to this object. We shall see in Section 5.11 how this Queue class can be encapsulated in

Recall that a channel is a message queue to which a sender can append a sequence of messages and from which a receiver can remove messages. The only synchronization constraint is that the receiver blocks when removing a message if the queue is empty. An obvious CC++ representation of a message queue is as a linked list, in which each entry contains a message plus a pointer to the next message. Program 5.6 takes this approach, defining a Queue class that maintains pointers to the head and tail of a message queue represented as a list of IntQData structures. The data structures manipulated by Program 5.6 are illustrated in Figure 5.4.

Figure 5.4: A message queue class, showing the internal representation of a queue as a linked list of IntQData structures (two are shown) with message values represented as sync values that are either defined (42) or undefined (<undef>). Producer and consumer tasks execute enqueue and dequeue operations, respectively.

The Queue class provides enqueue and dequeue functions to add items to the tail of the queue and remove items from the head, respectively. The sync variable contained in the IntQData structure used to represent a linked list entry ensures synchronization between the enqueue and dequeue operations. The queue is initialized to be a single list element containing an undefined variable as its message.

The first action performed by dequeue is to read the message value associated with the first entry in the queue. This read operation will block if the queue is empty, providing the necessary synchronization. If the queue is not empty, the dequeue function will read the queue value, delete the list element, and advance the head pointer to the next list element. Similarly, the enqueue function first allocates a new list element and links it into the queue and then sets the value field of the current tail list element to the message msg. Notice that the order in which these two operations are performed is important. If performed in the opposite order,

tail->value = msg;
tail->next = new IntQData;

then a dequeue function call blocked on the list element tail->value and enabled by the assignment tail->value=msg could read the pointer tail->next before it is set to reference a newly created element.
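
The ordering argument can be tried out in standard C++. What follows is a sketch under assumptions, not Program 5.6 itself: the names IntQ, SyncInt, and produce_and_consume are hypothetical, SyncInt stands in for a sync int, and the code assumes exactly one producer thread and one consumer thread. The producer keeps the new element in a local pointer so that it never touches a list element the consumer may already have deleted.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for "sync int": written once, reads block until it is defined.
class SyncInt {
    std::mutex m;
    std::condition_variable cv;
    bool defined = false;
    int v = 0;
public:
    void set(int x) {
        std::lock_guard<std::mutex> lock(m);
        v = x;
        defined = true;
        cv.notify_all();   // notify while holding the lock
    }
    int get() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return defined; });
        return v;
    }
};

struct IntQData {
    SyncInt value;
    IntQData* next = nullptr;
};

class IntQ {
    IntQData* head;   // used only by the consumer
    IntQData* tail;   // used only by the producer
public:
    IntQ() : head(new IntQData), tail(head) {}   // one undefined element
    ~IntQ() {
        while (head) { IntQData* n = head->next; delete head; head = n; }
    }
    IntQ(const IntQ&) = delete;
    IntQ& operator=(const IntQ&) = delete;

    void enqueue(int msg) {
        IntQData* fresh = new IntQData;
        tail->next = fresh;      // link the new element first ...
        tail->value.set(msg);    // ... then define the value
        tail = fresh;
    }
    int dequeue() {
        int msg = head->value.get();   // blocks while the queue is empty
        IntQData* old_head = head;
        head = head->next;             // already set, because of the order above
        delete old_head;
        return msg;
    }
};

// Usage: one producer thread sends three messages to the calling thread.
std::vector<int> produce_and_consume() {
    IntQ q;
    std::thread producer([&q] { for (int i = 1; i <= 3; ++i) q.enqueue(10 * i); });
    std::vector<int> received;
    for (int i = 0; i < 3; ++i) received.push_back(q.dequeue());
    producer.join();
    return received;
}
```

Because set is called only after next has been linked, a consumer released by get always finds a valid next pointer; swapping the two statements in enqueue would reintroduce the race described above.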