5.5.1 Remote Operations
Figure 5.1: Remote read and write operations. At the top of the figure, we show a global pointer
gp located in processor object pobj1 referencing an integer length in processor object pobj2.
The rest of the figure is a timeline depicting the activity in these two processor objects as a thread in pobj1 first writes and then reads length. The thread in pobj1 is shown as a solid line when
active and as a dashed line when suspended waiting for a remote operation. The diagonal dashed lines represent communications.
CC++ global pointers are used in the same way as C++ local pointers; the only difference is that we use them to operate on data or to invoke functions that may be located in other processor objects. Hence, the following code fragment first assigns to and then reads from the remote location referenced by the global pointer gp.
int *global gp;
int len2;
*gp = 5;
len2 = *gp;
As illustrated in Figure 5.1, these read and write operations result in communication.
If we invoke a member function of an object referenced by a global pointer, we perform what is called a remote procedure call (RPC). An RPC has the general form
<type> *global gp;
result = gp->p(...)
where gp is a global pointer of an arbitrary <type>, p(...) is a call to a function defined in the
object referenced by that global pointer, and result is a variable that will be set to the value
returned by p(...). An RPC proceeds in three stages:
1. The arguments to the function p(...) are packed into a message, communicated to the remote processor object, and unpacked. The calling thread suspends execution.
2. A new thread is created in the remote processor object to execute the called function.
3. Upon termination of the remote function, the function return value is transferred back to the calling thread, which resumes execution.
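The three stages can be imitated in standard C++ on a single machine. The following is only a sketch under stated assumptions: the Length structure, its write_len function, and the helper rpc_write_len are hypothetical stand-ins, a std::tuple plays the role of the packed message, and a std::thread plays the role of the thread created in the remote processor object. It is not how a CC++ implementation actually performs an RPC.

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <tuple>

// Hypothetical stand-in for an object that lives in a remote processor object.
struct Length {
    int length = 0;
    int write_len(int v) { length = v; return length; }
};

// Illustrative helper (not part of CC++): imitates result = gp->write_len(arg).
int rpc_write_len(Length* remote, int arg) {
    // Stage 1: pack the argument into a "message" (here, simply a tuple copy).
    std::tuple<int> message(arg);

    // Stage 2: a new thread executes the called function on the "remote" side.
    std::promise<int> reply;
    std::future<int> result = reply.get_future();
    std::thread worker([remote, message, &reply] {
        reply.set_value(remote->write_len(std::get<0>(message)));
    });

    // Stage 3: the caller stays suspended until the return value is delivered.
    int value = result.get();
    worker.join();
    return value;
}
```

Calling rpc_write_len(&obj, 42) on a local Length object blocks the caller until the worker thread has finished, which mirrors the suspended (dashed) portion of the calling thread's timeline.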
Basic integer types (char, short, int, long, and the unsigned variants of these), floats, doubles,
and global pointers can be transferred as RPC arguments or return values without any user intervention. Structures, regular pointers, and arrays can be transferred with the aid of transfer functions, to be discussed later in this section.
Program 5.5 uses RPCs to access a variable length located in another processor object; contrast
this with the code fragment given at the beginning of this section, in which read and write operations were used for the same purpose. The communication that results is illustrated in Figure 5.2.
Figure 5.2: Using remote procedure calls to read and write a remote variable. At the top of the
figure, we show a global pointer lp located in processor object pobj1 referencing processor
object pobj2. The rest of the figure is a timeline depicting the activity in these two processor
objects as a thread in pobj1 issues RPCs first to read and then to write the remote variable length. The thread in pobj1 is shown as a vertical solid or dashed line when active or suspended,
waiting for a remote operation; the diagonal dashed lines represent communications. The solid vertical lines in pobj2 represent the threads created to execute the remote procedure calls.
5.5.2 Synchronization
Figure 5.3: Alternative synchronization mechanisms. On the left, the channel: a receiver blocks
until a message is in the channel. On the right, the sync variable: a receiver blocks until the
variable has a value.
A producer thread can use an RPC to move data to a processor object in which a consumer thread is executing, hence effecting communication. However, we also require a mechanism for synchronizing the execution of these two threads, so that the consumer does not read the data before it is communicated by the producer. In the task/channel model of Part I, synchronization is achieved by making a consumer requiring data from a channel block until a producer makes data available. CC++ uses a different but analogous mechanism, the single assignment or sync variable
(Figure 5.3). A sync variable is identified by the type modifier sync, which indicates that the
variable has the following properties:
1. It initially has a special value, "undefined."
2. It can be assigned a value at most once, and once assigned is treated as a constant (ANSI C and C++ const).
3. An attempt to read an undefined variable causes the thread that performs the read to block until the variable is assigned a value.
We might think of a sync variable as an empty box with its interior coated with glue; an object cannot be removed once it has been placed inside.
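The three properties can be approximated in standard C++ with a mutex and a condition variable. This is a minimal sketch, not the CC++ implementation: the class name SyncVar and its assign and read functions are hypothetical, and throwing an exception on a second assignment is an assumption made here, since the text states only that a sync variable can be assigned at most once.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical emulation of a CC++ sync variable of type T.
template <typename T>
class SyncVar {
public:
    // Property 2: at most one assignment (a second one throws; an assumption).
    void assign(const T& v) {
        {
            std::lock_guard<std::mutex> lock(m_);
            if (defined_) throw std::logic_error("sync variable assigned twice");
            value_ = v;
            defined_ = true;
        }
        cv_.notify_all();  // wake every thread blocked in read()
    }

    // Property 3: a read blocks until the variable has been assigned.
    T read() const {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return defined_; });
        return value_;
    }

private:
    mutable std::mutex m_;
    mutable std::condition_variable cv_;
    bool defined_ = false;  // Property 1: initially "undefined"
    T value_{};
};

// A reader thread blocks in read() until the main thread assigns the value.
int sync_demo() {
    SyncVar<int> s;
    int seen = 0;
    std::thread reader([&] { seen = s.read(); });
    s.assign(7);
    reader.join();
    return seen;
}
```

In sync_demo the reader may start before or after the assignment; in both cases it observes the assigned value, because read() cannot return while the variable is undefined.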
Any regular C++ type can be declared sync, as can a CC++ global pointer. Hence, we can write
the following.
sync int i; // i is a sync integer
sync int *j; // j is a pointer to a sync integer
int *sync k; // k is a sync pointer to an integer
sync int *sync l; // l is a sync pointer to a sync integer
We use the following code fragment to illustrate the use of sync variables. This code makes two
concurrent RPCs to functions defined in Program 5.5: one to read the variable length and one to
write that variable.
Length *global lp;
int val;
par {
    val = lp->read_len();
    lp->write_len(42);
}
What is the value of the variable val at the end of the parallel block? Because the read and write
operations are not synchronized, the value is not known. If the read operation executes before the write, val will have some arbitrary value. (The Length class does not initialize the variable length.) If the execution order is reversed, val will have the value 42.
This nondeterminism can be avoided by modifying Program 5.5 to make the variable length a sync variable. That is, we change its definition to the following.
sync int length;
Execution order now does not matter: if read_len executes first, it will block until the variable length is assigned a value by write_len.
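The effect of this change can be reproduced in standard C++, where a std::promise paired with a std::shared_future behaves much like a single-assignment variable. The sketch below is an emulation under assumptions: SyncLength is a hypothetical stand-in for the modified Length class (Program 5.5 is not reproduced here), and two std::thread objects stand in for the two statements of the par block.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Hypothetical stand-in for the Length class with "sync int length".
class SyncLength {
public:
    SyncLength() : length_(slot_.get_future().share()) {}

    // The single permitted assignment to length.
    void write_len(int v) { slot_.set_value(v); }

    // Blocks until write_len has been called, then returns the value.
    int read_len() const { return length_.get(); }

private:
    std::promise<int> slot_;          // write side of the sync variable
    std::shared_future<int> length_;  // read side of the sync variable
};

// Emulates: par { val = lp->read_len(); lp->write_len(42); }
int par_read_write() {
    SyncLength obj;
    int val = 0;
    std::thread reader([&] { val = obj.read_len(); });
    std::thread writer([&] { obj.write_len(42); });
    reader.join();  // the end of the par block waits for both statements
    writer.join();
    return val;
}
```

Whichever thread the scheduler runs first, par_read_write returns 42, because the read cannot complete until the write has defined the value.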
Example . Channel Communication:
Global pointers and sync variables can be used to implement a variety of communication
mechanisms. In this example, we use these constructs to implement a simple shared queue class. This class can be used to implement channel communication between two concurrently executing producer and consumer tasks: we simply allocate a queue object and provide both tasks with pointers to this object. We shall see in Section 5.11 how this Queue class can be encapsulated in
Recall that a channel is a message queue to which a sender can append a sequence of messages and from which a receiver can remove messages. The only synchronization constraint is that the receiver blocks when removing a message if the queue is empty. An obvious CC++ representation of a message queue is as a linked list, in which each entry contains a message plus a pointer to the next message. Program 5.6 takes this approach, defining a Queue class that maintains pointers to
the head and tail of a message queue represented as a list of IntQData structures. The data
structures manipulated by Program 5.6 are illustrated in Figure 5.4.
Figure 5.4: A message queue class, showing the internal representation of a queue as a linked list
of IntQData structures (two are shown) with message values represented as sync values that are
either defined (42) or undefined (<undef>). Producer and consumer tasks execute enqueue and dequeue operations, respectively.
The Queue class provides enqueue and dequeue functions to add items to the tail of the queue and
remove items from the head, respectively. The sync variable contained in the IntQData structure
used to represent a linked list entry ensures synchronization between the enqueue and dequeue
operations. The queue is initialized to be a single list element containing an undefined variable as its message.
The first action performed by dequeue is to read the message value associated with the first entry
in the queue. This read operation will block if the queue is empty, providing the necessary synchronization. If the queue is not empty, the dequeue function will read the queue value, delete
the list element, and advance the head pointer to the next list element. Similarly, the enqueue
function first allocates a new list element and links it into the queue and then sets the value field of
the current tail list element. Notice that the order in which these two operations are performed is important. If performed in the opposite order,
tail->value = msg;
tail->next = new IntQData;
then a dequeue function call blocked on the list element tail->value and enabled by the
assignment tail->value=msg could read the pointer tail->next before it is set to reference a
newly created element.
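The queue described above can be sketched in standard C++ for one producer and one consumer. This is an emulation under assumptions, not Program 5.6: a std::shared_future stands in for the sync value field, the member names head_, tail_, and tail_slot_ and the function channel_demo are hypothetical, and the promise for the tail element is kept in the queue rather than in the list element so that the consumer can safely delete an element as soon as it has read it.

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// One list entry: a single-assignment message plus a link to the next entry.
struct IntQData {
    std::shared_future<int> value;  // stands in for "sync int value"
    IntQData* next = nullptr;
};

class Queue {
public:
    // The queue starts as a single element whose value is undefined.
    Queue() : head_(new IntQData), tail_(head_) {
        head_->value = tail_slot_.get_future().share();
    }
    ~Queue() {
        while (head_ != nullptr) {
            IntQData* n = head_->next;
            delete head_;
            head_ = n;
        }
    }
    Queue(const Queue&) = delete;
    Queue& operator=(const Queue&) = delete;

    // Producer side: link a new undefined element first, then define the value.
    void enqueue(int msg) {
        std::promise<int> next_slot;
        IntQData* fresh = new IntQData;
        fresh->value = next_slot.get_future().share();
        tail_->next = fresh;  // step 1: link the new element into the list
        tail_ = fresh;
        std::promise<int> slot = std::move(tail_slot_);
        tail_slot_ = std::move(next_slot);
        slot.set_value(msg);  // step 2: only now can a blocked dequeue proceed
    }

    // Consumer side: block until the head value is defined, then advance.
    int dequeue() {
        int msg = head_->value.get();
        IntQData* old = head_;
        head_ = old->next;  // valid because next was set before the value
        delete old;
        return msg;
    }

private:
    IntQData* head_;               // touched only by the consumer
    IntQData* tail_;               // touched only by the producer
    std::promise<int> tail_slot_;  // write side of the tail element's value
};

// One producer and one consumer communicating through the queue.
std::vector<int> channel_demo(int n) {
    Queue q;
    std::vector<int> received;
    std::thread consumer([&] {
        for (int i = 0; i < n; ++i) received.push_back(q.dequeue());
    });
    std::thread producer([&] {
        for (int i = 0; i < n; ++i) q.enqueue(i * i);
    });
    producer.join();
    consumer.join();
    return received;
}
```

The ordering inside enqueue matches the requirement stated above: if the value were defined before the next pointer was set, a consumer released by the assignment could read a null next pointer.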