Process Communication and Synchronization

Distributed Programming Languages

2.4 Process Communication and Synchronization

An important issue in the design of distributed systems is how the pieces of a program that are running in parallel on different PEs are going to cooperate. The cooperation involves two types of interaction: communication and synchronization.

Principally there exist two complementary communication schemes in distributed systems:

• Message passing, which allows processes to exchange messages. An interprocess communication facility basically provides two abstract operations: send (message) and receive (message).

• Shared data, which provides data shared among processes for communication in a distributed system that has no shared memory. The distributed data structure called tuple space [15], used in Linda, is one method for providing shared data.

The key issue in using shared data is to prevent multiple processes from simultaneously changing the data. Note that the mechanisms used in shared memory systems, such as monitors and semaphores, cannot be applied directly here because they rely on physically shared memory.

The implementation of a message passing scheme normally requires the following list of decisions:

• One-to-one or one-to-many.

• Synchronous or asynchronous.

• One-way or two-way communication.

• Direct or indirect communication.

• Automatic or explicit buffering.

• Implicit or explicit receiving.

One-to-one is also called point-to-point message passing, while one-to-many supports an efficient broadcast or multicast facility [34]. With synchronous message passing, the sender is blocked until the receiver has accepted the message. In asynchronous message passing the sender does not wait for the receiver. One-way communication represents one interaction between the sender and the receiver, while two-way communication indicates many interactions, back and forth, between the sender and the receiver. Direct or indirect communication differentiates whether the sender sends messages to the receiver directly or indirectly through an intermediate object, usually called a mailbox or port.

A mailbox supports multiple senders and receivers that share the same storage unit. A port is a special case of a mailbox that is normally attached to either a sender or a receiver. When a port is associated with a sender, it supports one sender and multiple receivers. When a port is associated with a receiver, it supports multiple senders and one receiver. Normally a port is a finite first-in-first-out (FIFO) queue maintained by the kernel. In many situations the terms port and mailbox are used interchangeably.

A communication scheme is symmetric if both the sender and the receiver name each other. In an asymmetric scheme only the sender names the receiver. Explicit buffering requires the sender to specify the size of the buffer used to hold the message at the receiver. Explicit receiving requires the receiver to know the source of the message (the sender), while implicit receiving does not care about the source of the message.

Five message-passing models are commonly used [7]: synchronous point-to-point, asynchronous point-to-point, rendezvous, remote procedure call, and one-to-many.

The synchronous point-to-point approach has been adopted in the Occam programming language [14], which uses one-way communication between the sender and the receiver. Writing programs using send and receive primitives is relatively difficult because a programmer has to take care of many details, such as the pairing of responses with request messages, data representation, knowing the address of the remote processor or the server, and taking care of communication and system failures. Two commonly used higher-level communication structures are rendezvous (used in Ada) and remote procedure call (RPC) (used in DP [26]). RPC uses two-way communication and is similar to the client/server model, where the client requests a service and waits for the results from the server. The difference between these two approaches is that the caller or sender is blocked in RPC and is unblocked in rendezvous. RPC is studied in detail in the next section. One-to-many message passing is not frequently used, although some languages such as broadcasting sequential processes (BSP) [23] use this scheme.

The issues discussed so far are logical or at the application software level. There are more implementation issues at the underlying hardware and network layers. These issues include [40]:

• How are communication links established?

• Can a communication link be associated with more than two processes?

• How many links should be placed between every pair of processes?

• What is the capacity of a link?

• What is the size of a message? (fixed or variable)

• Is a link unidirectional or bidirectional?

The detailed discussion of these issues is beyond the scope of this book. In the following we focus on commands that are offered in DCDL for process communication and synchronization.

In DCDL asynchronous point-to-point message passing is adopted. Messages are passed to a named receiver process through asynchronous static channels (links). An output command is of the form

send message_list to destination

where the destination is a process name (a one-to-one communication) or a keyword all representing all the other processes (a one-to-all communication). An input command has the form

receive message_list from source

where the source is an optional process name. This input command supports both implicit and explicit acceptance of the message: naming the source gives explicit acceptance, while an implicit acceptance of the message is expressed as

receive message_list

An asynchronous static channel (link) is a FIFO queue between two processes. Unless otherwise specified, we assume that there is only one channel between any two processes.

Note that synchronous communication can be simulated by asynchronous communication. At the sending site, we have:

send message_list to destination;

receive empty_signal from destination

At the receiving site, we have:

receive message_list from sender;

send empty_signal to sender
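The handshake above can be sketched in Python as a rough illustration. We assume that an unbounded `queue.Queue` is an acceptable stand-in for an asynchronous FIFO channel and that threads stand in for processes; the names `synchronous_send_demo`, `to_receiver`, and `to_sender` are hypothetical and not part of DCDL.

```python
import queue
import threading


def synchronous_send_demo():
    """Simulate a synchronous send over two asynchronous FIFO channels."""
    # One unbounded FIFO queue per direction stands in for a DCDL channel.
    to_receiver = queue.Queue()
    to_sender = queue.Queue()
    log = []

    def sender():
        to_receiver.put("hello")   # send message_list to destination
        to_sender.get()            # receive empty_signal from destination
        log.append("sender unblocked")

    def receiver():
        message = to_receiver.get()            # receive message_list from sender
        log.append("receiver got " + message)
        to_sender.put(None)                    # send empty_signal to sender

    threads = [threading.Thread(target=sender), threading.Thread(target=receiver)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because the sender waits for the empty signal, its log entry can only appear after the receiver has accepted the message, which is the blocking behavior of a synchronous send.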

The above structure can be extended to implement barrier synchronization. Many iterative algorithms successively compute better approximations to an answer, and they terminate either when the final answer has been computed or when the approximation has converged. Each iteration typically depends on the results of the previous iteration. Note that an iteration with a true guard corresponds to a periodic process.

In the above algorithm true is a Boolean constant that always returns a true value, i.e., the condition is always satisfied. This type of synchronization is called barrier synchronization [31]. The delay point at the end of each iteration represents a barrier at which all processes have to arrive before any are allowed to pass.

In an asymmetric implementation one process is called the coordinator and the rest are workers. The coordinator sends a special signal to all the other processes once it has received a message from each of them.
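A minimal Python sketch of such a coordinator-based barrier follows. It is an illustration under assumptions rather than the DCDL formulation: threads stand in for processes, a single shared queue `arrive` models the coordinator receiving from any worker, and the names `coordinator_barrier_demo`, `arrive`, and `release` are hypothetical.

```python
import queue
import threading


def coordinator_barrier_demo(num_workers=4, iterations=3):
    """Run workers through repeated barriers managed by one coordinator."""
    arrive = queue.Queue()                                  # workers -> coordinator
    release = [queue.Queue() for _ in range(num_workers)]   # coordinator -> worker i
    progress = [0] * num_workers
    snapshots = []

    def worker(i):
        for _ in range(iterations):
            progress[i] += 1      # stand-in for one iteration of real work
            arrive.put(i)         # report arrival at the barrier
            release[i].get()      # wait for the coordinator's signal

    def coordinator():
        for _ in range(iterations):
            for _ in range(num_workers):
                arrive.get()                   # one message from each worker
            snapshots.append(list(progress))   # every worker is blocked here
            for channel in release:
                channel.put(None)              # special signal: pass the barrier

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    threads.append(threading.Thread(target=coordinator))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return snapshots
```

Each snapshot is taken while all workers are waiting, so every entry of a snapshot shows the same iteration count. No worker starts iteration k + 1 before all have finished iteration k.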

To construct a symmetric barrier (one with the same code in each process), we can use a barrier for two processes as a building block. Suppose barrier(P_i, P_j) is a barrier synchronization between processes P_i and P_j such as the one in the above example. A barrier for eight processes can be:

stage 1:

Figure 2.4 A three-stage symmetric barrier.

Its graphical representation is shown in Figure 2.4. The following examples show more of the use of send and receive commands in DCDL.
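As a rough illustration of a three-stage symmetric barrier, the following Python sketch uses the well-known butterfly pairing, in which a process with index i synchronizes in stage s with the process whose index differs from i in bit s. This pairing is an assumption and may differ from the stage listing and from Figure 2.4. Processes are numbered from 0 here, threads stand in for processes, and the names `butterfly_barrier_demo` and `chan` are hypothetical.

```python
import queue
import threading


def butterfly_barrier_demo(n=8):
    """Run n processes through one symmetric barrier; n must be a power of two."""
    stages = n.bit_length() - 1   # log2(n) stages, i.e., 3 for n = 8
    # chan[src][dst] is the FIFO channel from process src to process dst.
    chan = [[queue.Queue() for _ in range(n)] for _ in range(n)]
    arrived = [False] * n
    all_seen = [False] * n

    def barrier(i):
        for s in range(stages):
            partner = i ^ (1 << s)        # assumed pairing: flip bit s of the index
            chan[i][partner].put(None)    # send empty_signal to partner
            chan[partner][i].get()        # receive empty_signal from partner

    def proc(i):
        arrived[i] = True                 # work done before the barrier
        barrier(i)
        all_seen[i] = all(arrived)        # True only if no process passed early

    threads = [threading.Thread(target=proc, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all_seen
```

Every process runs the same code, and after stage s each process has indirectly heard from 2^(s+1) processes, so three stages suffice for eight processes.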

Example 2.7 Squash [27]: The squash program replaces every pair of consecutive asterisks “**” by an upward arrow “↑”. Assume that the final character input is not an asterisk.

The input process is a repetitive command:

input ::= * [ send c to squash ]

The output process is another repetitive command:

output ::= * [ receive c from squash ]

When a receive command is used as a guard in a repetitive command, the execution of that guard is delayed until the corresponding send command is executed. A command with a receive command in one of its guards terminates only when the corresponding sender terminates.
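The three-process structure of the squash example can be sketched in Python as follows. This is an illustration under assumptions, not a translation of a DCDL listing: threads and `queue.Queue` objects stand in for processes and channels, and a hypothetical `END` marker models the rule that a receiving loop terminates when its sender terminates.

```python
import queue
import threading

END = object()   # hypothetical end-of-stream marker modelling sender termination


def squash_pipeline(text):
    """Replace each pair of consecutive asterisks with an upward arrow."""
    to_squash = queue.Queue()   # channel input -> squash
    to_output = queue.Queue()   # channel squash -> output
    result = []

    def input_proc():
        for c in text:
            to_squash.put(c)            # send c to squash
        to_squash.put(END)

    def squash_proc():
        while True:
            c = to_squash.get()         # receive c from input
            if c is END:
                break
            if c != "*":
                to_output.put(c)
                continue
            c = to_squash.get()         # inspect the character after an asterisk
            if c == "*":
                to_output.put("↑")      # a pair of asterisks becomes one arrow
            else:
                to_output.put("*")      # a lone asterisk is passed through
                if c is END:
                    break
                to_output.put(c)
        to_output.put(END)

    def output_proc():
        while True:
            c = to_output.get()         # receive c from squash
            if c is END:
                break
            result.append(c)

    threads = [threading.Thread(target=p) for p in (input_proc, squash_proc, output_proc)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return "".join(result)
```

For instance, `squash_pipeline("a**b*c")` yields `a↑b*c`: the pair is replaced while the single asterisk is kept.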

Example 2.8 We can have the following recursive solution to compute f(n) = f(n - 1) × n² for n > 1, with f(1) = 1.

Figure 2.5 A recursive solution for Example 2.8.

The solution to f(n) is a parallel execution of all these n + 1 processes:

In the above solution n + 1 processes are used to solve the problem (see Figure 2.5). This solution is only used to illustrate the use of interprocess communication and is by no means an efficient one. p(0) is the USER program that sends n to p(1) and receives the result f(n) from p(1). Each p(i) calculates f(n - i + 1) for 1 ≤ i ≤ n, and the number of active p(i) depends on n.
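The chain of processes described above can be sketched in Python. The sketch assumes that each p(i) receives its argument from p(i - 1), forwards the argument minus one to p(i + 1) when needed, and returns its result to p(i - 1). Threads and queues stand in for DCDL processes and channels, and the names `f_parallel`, `down`, and `up` are hypothetical.

```python
import queue
import threading


def f_parallel(n):
    """Compute f(n) = f(n - 1) * n**2, f(1) = 1, with one process per level."""
    # down[i] is the channel p(i - 1) -> p(i); up[i] is the channel p(i) -> p(i - 1).
    down = [queue.Queue() for _ in range(n + 2)]
    up = [queue.Queue() for _ in range(n + 2)]

    def p(i):
        m = down[i].get()               # m = n - i + 1, received from p(i - 1)
        if m > 1:
            down[i + 1].put(m - 1)      # ask p(i + 1) for f(m - 1)
            r = up[i + 1].get()
            up[i].put(r * m * m)        # return f(m) = f(m - 1) * m^2
        else:
            up[i].put(1)                # base case f(1) = 1

    workers = [threading.Thread(target=p, args=(i,)) for i in range(1, n + 1)]
    for t in workers:
        t.start()
    down[1].put(n)                      # p(0), the USER, sends n to p(1)
    result = up[1].get()                # and receives f(n) from p(1)
    for t in workers:
        t.join()
    return result
```

The calling thread plays the role of p(0). Only one p(i) is doing useful work at any time, which shows why this arrangement illustrates communication rather than efficiency.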

One may use only one process to solve the same problem. The definition of f(n) can be easily transformed into a recursive procedure to compute f(n):

In the above solution the result of f (n) is placed in variable ans. Recursion is useful in deriving a simple solution. At the same time, in theory at least, any recursive program can be written iteratively (and vice versa), and in practice it may make sense to do so. Perhaps problems of space and time efficiency force the use of iteration:
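For comparison, the single-process recursive and iterative forms can be sketched in Python. The function names are hypothetical; the variable `ans` follows the name used in the text.

```python
def f_recursive(n):
    """Recursive form: f(n) = f(n - 1) * n**2 for n > 1, f(1) = 1."""
    if n == 1:
        return 1
    return f_recursive(n - 1) * n * n


def f_iterative(n):
    """Iterative form of the same definition, accumulating the result in ans."""
    ans = 1
    for k in range(2, n + 1):
        ans = ans * k * k
    return ans
```

Both functions return the same value for every n ≥ 1. The iterative form avoids a call stack whose depth grows with n.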

Example 2.9 DCDL can also be used to implement a binary semaphore s:

where proc(i) is a process requesting a V or P operation on semaphore s.
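A binary semaphore realized as a server process can be sketched in Python as follows. This is one possible design under assumptions, not the DCDL listing: the semaphore process serves P and V requests from a shared request queue, keeps blocked requesters in a waiting list, and is shut down by a hypothetical "stop" message. The names `semaphore_demo`, `requests`, and `grants` are also hypothetical.

```python
import queue
import threading


def semaphore_demo(num_procs=4, increments=50):
    """Protect a shared counter with a message-based binary semaphore."""
    requests = queue.Queue()                              # proc(i) -> semaphore
    grants = [queue.Queue() for _ in range(num_procs)]    # semaphore -> proc(i)
    counter = [0]
    inside = [0]
    max_inside = [0]

    def semaphore():
        s = 1                     # 1 means free, 0 means taken
        waiting = []              # processes blocked in a P operation
        while True:
            op, i = requests.get()
            if op == "stop":      # hypothetical shutdown message
                break
            if op == "P":
                if s == 1:
                    s = 0
                    grants[i].put(None)
                else:
                    waiting.append(i)
            else:                 # op == "V"
                if waiting:
                    grants[waiting.pop(0)].put(None)   # hand over directly
                else:
                    s = 1

    def proc(i):
        for _ in range(increments):
            requests.put(("P", i))
            grants[i].get()                      # blocked until P is granted
            inside[0] += 1                       # start of critical section
            max_inside[0] = max(max_inside[0], inside[0])
            counter[0] += 1
            inside[0] -= 1                       # end of critical section
            requests.put(("V", i))

    server = threading.Thread(target=semaphore)
    procs = [threading.Thread(target=proc, args=(i,)) for i in range(num_procs)]
    server.start()
    for t in procs:
        t.start()
    for t in procs:
        t.join()
    requests.put(("stop", None))
    server.join()
    return counter[0], max_inside[0]
```

The second returned value records the largest number of processes ever observed inside the critical section. It stays at 1 when mutual exclusion holds.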

For some other problems, several solutions exist for different interprocess communication structures.

Example 2.10 Fibonacci numbers are a sequence of integers defined by the recurrence F(i) = F(i - 1) + F(i - 2) for i > 1, with initial values F(0) = 0 and F(1) = 1. Provide a DCDL implementation of F(i) with one process for each F(i).

Again we define a set of processes f(i), each of which calculates F(n - i + 1). Clearly, if n - i + 1 > 1, f(i) depends on values calculated by f(i + 1) and f(i + 2). A natural solution is for f(i) to receive (n - i + 1) from f(i - 1) and pass (n - i) to f(i + 1) if (n - i + 1) is larger than 1. Then f(i) waits for results from f(i + 1) and f(i + 2), sums the results, and forwards the sum to f(i - 1) and f(i - 2) (see Figure 2.6).

Figure 2.6 A solution for F (n).

In the above algorithm f(0) is USER and f(-1) is a dummy process. f(-1) can be deleted if we change the statement send p + q to f(i - 2) to [i ≠ 1 → send p + q to f(i - 2)] in f(i).

The second solution restricts communication to neighbors only, i.e., f(i) communicates only with f(i - 1) and f(i + 1) (see Figure 2.7).

Figure 2.7 Another solution for F(n).
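One plausible way to meet the neighbors-only restriction is sketched below in Python. It is an assumption about how such a solution could work, not necessarily the algorithm of Figure 2.7: each f(i) returns a pair (F(m), F(m - 1)) to f(i - 1), so that no process needs a value from f(i + 2). Threads and queues stand in for processes and channels, and the names `fib_neighbors`, `down`, and `up` are hypothetical.

```python
import queue
import threading


def fib_neighbors(n):
    """Compute F(n) for n >= 1 with a chain f(1)..f(n) of neighbor-only processes."""
    # down[i] is the channel f(i - 1) -> f(i); up[i] is the channel f(i) -> f(i - 1).
    down = [queue.Queue() for _ in range(n + 2)]
    up = [queue.Queue() for _ in range(n + 2)]

    def f(i):
        m = down[i].get()                 # m = n - i + 1, received from f(i - 1)
        if m > 1:
            down[i + 1].put(m - 1)        # pass n - i to f(i + 1)
            p, q = up[i + 1].get()        # p = F(m - 1), q = F(m - 2)
            up[i].put((p + q, p))         # return (F(m), F(m - 1)) to f(i - 1)
        else:
            up[i].put((1, 0))             # base case (F(1), F(0))

    workers = [threading.Thread(target=f, args=(i,)) for i in range(1, n + 1)]
    for t in workers:
        t.start()
    down[1].put(n)                        # f(0), the USER, sends n to f(1)
    result, _ = up[1].get()
    for t in workers:
        t.join()
    return result
```

Returning the previous value together with the current one trades a slightly larger message for the removal of every link between f(i) and f(i + 2).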

In document Distributed System Design (Page 60-70)