Lack of flexibility. An RPC client can use only a limited set of services; each new procedure has to be prepared and installed by an experienced programmer

Distributed Programming Languages

3. Lack of flexibility. An RPC client can use only a limited set of services; each new procedure has to be prepared and installed by an experienced programmer

Previous Table of Contents Next

There are many extensions to the basic RPC. However, many attempts to extend RPC to include parallel features [5] lead to complex semantics losing the main advantage of RPC. The asynchronous RPC

proposed by Liskov and Shrira [36] has been successfully implemented in the Mercury communication system at MIT. The lightweight RPC introduced by Bershad [10] aims to improve performance by using the concept of thread. A thread is also called a lightweight process and many threads can share the same address space. In such a system a (heavyweight) process consists of an address space and one or more threads of control. Each thread has its own program counter, register states, and stack. Each thread can independently make remote procedure calls. Other extensions of RPC to support concurrent access to multiple servers and to serve multiple requests simultaneously can be found in [42].

Another mechanism close to RPC is remote evaluation (REV) [44], which allows several procedures (code and data) to be encapsulated in one procedure to be forwarded to a remote site like a procedure call in RPC. The corresponding remote site evaluates the encapsulated procedure. The transmitted data in the encapsulated procedure can be used many times by the procedures in the encapsulated procedure and intermediate results produced (if any) do not need to be passed back to the client, However, if one cannot exploit the above benefit, the communication overhead may even increase due to the frequent transmission of procedures (both code and data). Moreover, relocation can be a problem, especially in a heterogeneous system, because it is not an easy task to move executable code from one machine to another with different instruction sets and data representations.

Recently, a context driven call (CDC) [48] model was proposed which extends the well-known RPC and combines merits of both RPC and REV. Like RPC, CDC allows a set of procedures to be disposed on a remote processor and to be called by the same language construct (a procedure call) as local procedures.

However, a different implementation mechanism is employed in CDC. In addition CDC supports mechanisms for individually sending data to a remote site and receiving data from a remote site. The programmer does not need to be aware of these data movements.

CDC supports two types of data objects: local and remote. A local object is a variable allocated to the address space of the current master. A remote object is a variable allocated to a slave’s address space. To perform a local or remote evaluation of an expression with remote variables, CDC issues an appropriate combination of sending data to a remote site, remote evaluation, and receiving data from remote sites.

Specifically, for a general expression x = e(x

1,x

2, ..., x

) which contains at least one remote object. If it is evaluated locally, the local site needs to receive input data (if any) from remote sites, perform a local evaluation, and then send (result) data to a remote site if variable x is a remote object. If it is evaluated remotely, local site needs to first send input data to the corresponding remote site. The remote site performs a remote evaluation and the local site receives the result from the remote site if x is a local object. A similar situation applies to a local or remote evaluation of a procedure void f (x

1, x

2, ..., x

n) with remote objects. A local evaluation occurs when there are remote objects from different remote sites.

In this case the local site receives (remote) data from remote sites and performs a local evaluation. A remote evaluation happens when all remote objects are from the same remote site. In this situation the local site sends (local) data to the remote site and a remote evaluation is performed there.

To support the above actions the following functions that can be requested from a client (a locate site) to a server (a remote site) are introduced:

• rcreate() creates a remote object of a certain size at a remote site, names the object with a unique name (handler) and returns the handler to the caller.

• rremove() removes a remote object to which a handler is provided as an input.

• rread() copies a remote object to a local buffer with both the handler to the object and the address of the buffer being provided as inputs.

• rwrite() copies the content of a local buffer to a remote object with both the handler to the object and the address of the buffer being provided as inputs.

• rfork() is a nonblocking call which calls a remote stub to generate a new thread and passes the addresses of remote objects as parameters.

The above functions are used to implement various evaluations of expressions and procedures. For example, rcreate() and rwrite() can be used to create and initialize remote objects, respectively. A call to a remote procedure can be realized by rfork(). rread() can be used for a client to obtain the value of a remote object, and a client can release the memory for a remote object using rremove.

In DCDL none of the above mechanisms is included. We try to keep the simplicity of send and receive commands so the reader can focus on algorithms described in DCDL.

2.6 Robustness

Distributed systems have the potential advantages over centralized systems of higher reliability and availability. However, the responsibility for achieving reliability still relies on the operating system, the language runtime system, and the programmer. Two approaches can be used to achieve the reliability in distributed systems:

• Programming fault tolerance.

• Communication fault tolerance.

Previous Table of Contents Next

These two methods are interrelated. Programming fault tolerance can be achieved by either forward recovery or backward recovery. Forward recovery tries to identify the error and correct the system state containing the error based on this knowledge [11]. Exception handling in high level languages, such as Ada, PL/1, and CLU, provides a system structure that supports forward recovery. Backward error recovery corrects the system state by restoring the system to a state which occurred prior to the

manifestation of the fault. The recovery block scheme [28] provides such a system structure. Another programming fault-tolerance technique commonly used is error masking. N-version programming [6]

uses several independently developed versions of an algorithm. A final voting system is applied to the results of these n versions and a correct result is generated. So far there is no commercially available language which supports a backward recovery scheme although several researchers have proposed some language skeletons [30], [32], as well as some underlying supporting mechanisms such as the recovery cache [3] and the recovery metaprogram [2].

Communication fault tolerance deals with faults that occur in process communication. Communication fault tolerance depends on both the communication scheme used (message-passing or RPC) and the nature of the faults (fail-stop [41] or Byzantine type of faults [33]).

In general, there are four types of communication faults:

In document Distributed System Design (Page 71-74)