A Min-process Checkpointing Protocol for Deterministic Mobile Ad Hoc Networks


Praveen*, Parveen Kumar**

*Nims University, Jaipur (Raj)

**MIET, Meerut (UP)

Abstract

The mobile ad hoc network architecture consists of a set of mobile hosts that can communicate with each other without the assistance of a base station. Nodes within each other's radio range communicate directly via wireless links, while those that are far apart rely on other nodes to relay messages. Node mobility causes frequent changes in topology. Fault tolerance is an important design issue in building a reliable ad hoc network. This paper considers checkpointing recovery schemes for the mobile ad hoc network environment to introduce software-based fault tolerance. We propose a new non-intrusive minimum-process checkpointing scheme for ad hoc networks, based on logging anti-messages. We assume that the network uses the Cluster Based Routing Protocol (CBRP), which belongs to the class of hierarchical reactive routing protocols. The algorithm produces a consistent set of checkpoints, requires only a minimum number of nodes in the cluster to take checkpoints, and uses very few control messages. Performance analysis shows that our algorithm outperforms the existing related works and is a novel idea in the field.

1. Introduction

Fault tolerance can be achieved through some kind of redundancy in ad hoc networks. Redundancy can be temporal or spatial. In spatial redundancy, or hardware-based fault tolerance, many copies of the application execute on different processors concurrently and strict timing constraints can be met. However, the cost of providing fault tolerance using spatial redundancy is quite high and may require extra hardware. In temporal redundancy, or software-based fault tolerance, an application is restarted from an earlier checkpoint or recovery point after a fault. This may result in the loss of some processing, and applications may not be able to meet strict timing targets. Checkpoint-restart, or backward error recovery, is quite inexpensive and does not require extra hardware in general. Besides providing fault tolerance, checkpointing can be used for process migration, debugging distributed applications, job swapping, postmortem analysis and stable property detection [6, 29]. There are two software-based fault tolerance approaches for error recovery:

• Forward Error Recovery

• Backward Error Recovery

In forward error recovery techniques, the nature of errors and the damage caused by faults must be completely and accurately assessed, so that it becomes possible to remove those errors from the process state and enable the process to move forward. In a distributed system, accurate assessment of all the faults may not be possible. In backward error recovery techniques, the nature of faults need not be predicted, and in case of error the process state is restored to a previous error-free state. It is independent of the nature of faults. Thus, backward error recovery is the more general recovery mechanism [6, 29].

There are three steps involved in backward error recovery. These are:

• Checkpointing the error-free state periodically

• Restoration in case of failure

• Restart from the restored state

The global state (GS) of a distributed system is a collection of the local states of the processes and the channels. A local checkpoint is the saved state of a process at a processor at a given instant. A global checkpoint is a collection of local checkpoints, one from each process. A global state is said to be “consistent” if it contains no orphan message, i.e., a message whose receive event is recorded but whose send event is lost [6, 29].
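The consistency condition above can be illustrated with a minimal sketch. The event-index representation of a local checkpoint (the `cut` mapping) and the `Message` fields are assumptions made for this illustration only; they are not part of the proposed protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: int
    receiver: int
    send_event: int     # index of the send event in the sender's local history
    receive_event: int  # index of the receive event in the receiver's local history


def orphan_messages(cut, messages):
    """Return messages whose receive is recorded in the cut but whose send is not.

    cut[p] is the number of local events of process p covered by its checkpoint
    (a hypothetical encoding chosen for this sketch).
    """
    return [
        m for m in messages
        if m.receive_event < cut[m.receiver] and m.send_event >= cut[m.sender]
    ]


def is_consistent(cut, messages):
    """A global checkpoint is consistent if it contains no orphan message."""
    return not orphan_messages(cut, messages)


msgs = [Message(sender=0, receiver=1, send_event=2, receive_event=1)]
consistent_cut = {0: 3, 1: 2}  # send and receive are both recorded
orphan_cut = {0: 2, 1: 2}      # receive is recorded, send is not
```

With these two cuts, `is_consistent(consistent_cut, msgs)` holds, while `orphan_cut` records the receive of a message whose send lies after the sender's checkpoint.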


The mobile ad hoc network (MANET) distinguishes itself from traditional wireless networks by its dynamically changing topology, the absence of base station support and the need for multi-hop communication. In a MANET, a mobile host (MH) is free to move around and may communicate with others at any time. Clustering an ad hoc network means partitioning its nodes into clusters, each one with a cluster-head (CH) and possibly some ordinary nodes. There is no fixed infrastructure such as base stations. Nodes within each other's radio range communicate directly via wireless links, while those that are far apart rely on other nodes to relay messages. Node mobility causes frequent changes in topology.

When a wire line is not available, an ad hoc network can be set up for communication. Clustering of MHs provides a convenient framework for resource management. The main advantages of clustering are a reduction in the number of messages sent to each BS from each node, and support for channel access, power control and bandwidth control. In a cluster-based architecture, the whole network is divided into several clusters, and each cluster elects one node as its cluster head. Hence, a clustered ad hoc network consists of three kinds of nodes: cluster heads, gateways and ordinary nodes. Cluster heads are the nodes that are given the responsibility for routing the messages within the cluster and performing data aggregation. The communication between two adjacent clusters is conducted through the gateway nodes. All nodes other than gateways and cluster heads are called ordinary nodes. Both gateways and ordinary nodes are managed by their cluster heads.

During cluster head election, our scheme elects the node with the highest weight function as the cluster head [26]. We then propose a non-blocking coordinated checkpointing algorithm in which MHs take a tentative checkpoint and, on receiving a commit message from the initiator, convert their tentative checkpoints into permanent ones. The proposed algorithm requires fewer control messages and hence fewer interrupts. Also, our algorithm requires only a minimum number of MHs in a cluster to take checkpoints, which makes it suitable for cluster-based protocols in ad hoc networks.

Hence, we propose a new minimum-process checkpointing scheme for ad hoc networks that use the Cluster Based Routing Protocol (CBRP), which belongs to the class of hierarchical reactive routing protocols. It produces a consistent set of checkpoints, requires only a minimum number of nodes in the cluster to take checkpoints, and uses very few control messages. Specifically, we propose a minimum-process coordinated checkpointing algorithm for checkpointing deterministic distributed applications on ad hoc networks. We eliminate useless checkpoints as well as blocking of processes during checkpointing, at the cost of logging the anti-messages of very few messages during checkpointing.

We assume that the system under consideration is deterministic in nature. In deterministic systems, if two processes start in the same state and both receive the identical sequence of inputs, they will produce the identical sequence of outputs and will finish in the same state. The state of a process is thus completely determined by its starting state and by the sequence of messages it has received [17, 18]. David R. Jefferson [16] introduced the concept of the anti-message. An anti-message is exactly like the original message in format and content except in one field, its sign. Two messages that are identical except for opposite signs are called anti-messages of one another. All messages sent explicitly by user programs have a positive (+) sign, and their anti-messages have a negative (-) sign. Whenever a message and its anti-message occur in the same queue, they immediately annihilate one another. Thus the result of enqueueing a message may be to shorten the queue by one.
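The annihilation rule described above can be sketched as follows. The class names `SignedMessage` and `AnnihilatingQueue` and the `msg_id`/`payload` fields are hypothetical, chosen only to make the rule concrete; this is an illustration, not the data layout used in [16] or in the proposed protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignedMessage:
    msg_id: int
    payload: str
    sign: int = +1  # +1 for an ordinary message, -1 for its anti-message

    def anti(self):
        """Return the message that is identical except for the opposite sign."""
        return SignedMessage(self.msg_id, self.payload, -self.sign)


class AnnihilatingQueue:
    def __init__(self):
        self.items = []

    def enqueue(self, msg):
        partner = msg.anti()
        if partner in self.items:
            # A message and its anti-message in the same queue annihilate.
            self.items.remove(partner)
        else:
            self.items.append(msg)


q = AnnihilatingQueue()
m = SignedMessage(14, "data")
q.enqueue(SignedMessage(3, "other"))
q.enqueue(m.anti())  # the anti-message is already present in the queue
q.enqueue(m)         # enqueueing m shortens the queue by one
```

After the three calls, only the unrelated message with `msg_id` 3 remains in the queue.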

2. The Proposed Checkpointing Protocol

2.1 Basic Idea

The proposed checkpointing algorithm is based on keeping track of the direct dependencies of processes, as in [4]. The initiator cluster head (CH) computes min_set (a subset of the minimum set) on the basis of the dependencies maintained locally at the initiator CH, and sends the checkpoint request to the relevant CHs. On receiving a checkpoint request, a CH asks the concerned processes to checkpoint and computes new processes for the minimum set. By using this technique, we have tried to reduce the number of messages between CHs. When the initiator CH commits the checkpointing process, it sends the commit request along with the exact minimum set to all CHs, and every CH maintains an up-to-date array of the checkpoint sequence numbers of committed checkpoints. By doing so, we are able to maintain exact dependencies among processes and avoid useless checkpoint requests to the extent possible.


checkpoint for the current initiation whereas Pi has not taken one. If Pi processes m, receives the checkpoint request later on and then takes its checkpoint, m will become an orphan in the recorded global state. We propose that the anti-messages of only those messages which can become orphans should be recorded at the receiver end. In deterministic systems, orphan messages are received as duplicate messages on recovery. A duplicate message is annihilated by its anti-message at the receiver end before processing. Hence, in deterministic distributed systems, an orphan message in a global checkpoint does not create any inconsistency during recovery if its anti-message is logged at the receiver end. By doing so, we avoid the blocking of processes as well as the useless checkpoints in minimum-process checkpointing. The overhead of logging a few anti-messages may be negligible compared to taking some useless checkpoints as in [4, 10] or blocking the processes during checkpointing as in [3, 8].
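A minimal sketch of this receiver-side rule follows. It assumes, for illustration only, that each delivered message carries a flag `sender_tentative` telling whether the sender had already taken its checkpoint for the current initiation; the paper does not specify this encoding, and the `Receiver` class and its method names are hypothetical.

```python
class Receiver:
    def __init__(self):
        self.tentative = False   # set once the tentative checkpoint is taken
        self.anti_log = set()    # ids of messages whose anti-messages are logged
        self.processed = []      # ids of processed messages (the process state)
        self.saved = []          # state recorded in the checkpoint

    def on_message(self, msg_id, sender_tentative):
        # m can become an orphan only if the sender has checkpointed for this
        # initiation and the receiver has not yet done so.
        if sender_tentative and not self.tentative:
            self.anti_log.add(msg_id)
        self.processed.append(msg_id)

    def take_tentative_checkpoint(self):
        self.tentative = True
        self.saved = list(self.processed)

    def recover(self):
        self.processed = list(self.saved)

    def on_message_after_recovery(self, msg_id):
        if msg_id in self.anti_log:
            self.anti_log.discard(msg_id)  # duplicate annihilated, not processed
            return False
        self.processed.append(msg_id)
        return True


p4 = Receiver()
p4.on_message(14, sender_tentative=True)  # potential orphan: anti-message logged
p4.take_tentative_checkpoint()
p4.on_message(99, sender_tentative=True)  # received after checkpoint: not logged
logged = set(p4.anti_log)
p4.recover()
replayed = p4.on_message_after_recovery(14)
```

On recovery the re-sent copy of message 14 is dropped, so the restored state reflects it exactly once.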

2.2 Data Structures

All communications to and from an MH pass through its local CH. Here, we describe the data structures used in the checkpointing protocol. A process can run on a gateway node, a cluster head or an ordinary node. A process that initiates checkpointing is called the initiator process, and its local CH is called the initiator CH. If the initiator process is on a CH, then that CH is the initiator CH. Data structures are initialized on the completion of a checkpointing process unless mentioned otherwise. We call a checkpoint request to a CH a potential checkpoint request if at least one process in its cell takes a checkpoint in response to this request.

(i) Each process Pi maintains the following data structures, which are preferably stored on the local CH:

p_temp_csni: an integer; the temporary checkpoint sequence number (csn) of the process; it is incremented on a tentative checkpoint;

cd_vecti[]: a bit vector of length n for the n processes in the system; cd_vecti[j]=1 implies Pi is causally dependent upon Pj. cd_vecti[j] is set to ‘1’ only if Pi processes a message m received from Pj such that Pj has not taken any permanent checkpoint after sending m;

tentativei: a flag; set to ‘1’ on a tentative checkpoint;

(ii) The initiator CH (any CH can be the initiator CH) maintains the following data structures:

min_set[]: a bit vector of size n; min_set[k]=1 implies Pk belongs to the minimum set; initially, min_set[] (a subset of the minimum set) is computed by using the cd_vect vectors maintained at the initiator CH; on receiving response() from some CH: min_set = min_set ∪ np_min_set; after receiving responses from all relevant processes, min_set[] contains the exact minimum set; ‘∪’ is the operator for bitwise logical OR; np_min_set is described later;

timer1: a flag; initialized to ‘0’ when the timer is set; set to ‘1’ when the maximum allowable time for collecting the coordinated checkpoint expires;

T[]: a bit vector of length n; T[i]=1 implies Pi has taken its tentative checkpoint;

(iii) Each CH (including the initiator CH) maintains the following data structures:

D[]: a bit vector of length n; D[i]=1 implies Pi is running in the cluster of the CH; it also includes the disconnected MHs supported by this CH;

F[]: a bit vector of length n; F[i] is set to ‘1’ if the tentative checkpoint request is sent to Pi;

FF[]: a bit vector of length n; FF[i] is set to ‘1’ if Pi is in its cell and it has taken its tentative checkpoint;

s_bit: a flag; set to ‘1’ when some relevant process in its cell fails to take its tentative checkpoint;

Pin: initiator process identification;

CHin: initiator CH identification;

p_temp_csnin: p_temp_csn of the initiator process;

csv[]: an array of length n for the n processes; csv[j] denotes the csn of Pj’s most recent committed checkpoint; on commit, for all j, if (min_set[j]==1) csv[j]++; min_set[] is the exact minimum set received along with the commit request; csv[] is not updated on tentative checkpoints; we maintain one csv array for each CH and not for each process;


np_min_set: a bit vector of length n; it contains all new processes found for the minimum set at the CH; on each potential checkpoint request: if (tnp_min_set ≠ ∅) np_min_set = np_min_set ∪ tnp_min_set;

tmin_set: a bit vector of length n; tmin_set[k]=1 implies Pk belongs to the minimum set; it maintains the local knowledge of the minimum set; on receiving tmin_set, min_set, tnp_min_set along with tent_req (checkpoint request): tmin_set = tmin_set ∪ tent_req.tmin_set, tmin_set = tmin_set ∪ tent_req.min_set, tmin_set = tmin_set ∪ tent_req.tnp_min_set; on each potential checkpoint request, tnp_min_set is computed, and if (tnp_min_set ≠ ∅) tmin_set = tmin_set ∪ tnp_min_set;

chkpt: a flag; set to ‘1’ when the CH learns that some checkpointing process is going on; it is used to disallow multiple concurrent initiations of the checkpointing protocol;
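The bit-vector updates listed above can be sketched with Python integers used as bit vectors, where bitwise OR plays the role of the union operator. The class names `ClusterHeadState` and `InitiatorState`, the helper functions, and the choice of n = 7 are assumptions for this illustration; only the field names `D`, `tmin_set`, `np_min_set`, `csv` and `min_set` come from the text.

```python
from dataclasses import dataclass, field

N = 7  # number of processes; an assumed value for this sketch


def bit(k):
    """Bit vector with only position k set."""
    return 1 << k


def members(vec):
    """Set of process indices whose bit is 1."""
    return {k for k in range(N) if vec & bit(k)}


@dataclass
class ClusterHeadState:
    D: int = 0           # processes running in this cluster
    tmin_set: int = 0    # local knowledge of the minimum set
    np_min_set: int = 0  # new processes found for the minimum set at this CH
    csv: list = field(default_factory=lambda: [0] * N)

    def record_new(self, tnp_min_set):
        if tnp_min_set:  # tnp_min_set is not empty
            self.np_min_set |= tnp_min_set
            self.tmin_set |= tnp_min_set

    def on_commit(self, min_set):
        # csv[j] is incremented only for members of the exact minimum set.
        for j in members(min_set):
            self.csv[j] += 1


@dataclass
class InitiatorState:
    min_set: int = 0

    def on_response(self, np_min_set):
        self.min_set |= np_min_set  # min_set = min_set OR np_min_set


init = InitiatorState(min_set=bit(1) | bit(2) | bit(4))
ch = ClusterHeadState(D=bit(4) | bit(5))
ch.record_new(bit(5))             # P5 found as a new process at this CH
init.on_response(ch.np_min_set)   # initiator learns about P5
ch.on_commit(init.min_set)
```

Using a single integer per vector keeps each update a one-line OR; a list of booleans would work equally well if per-element access matters more than compactness.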

2.2.1 Computation of min_set or tnp_min_set:

CHin initially computes min_set[] on the basis of the dependencies of its local processes; the min_set[] thus computed is based on the direct dependencies of the local processes and is a subset of the minimum set. The computation of the minimum set on the basis of the dependency vectors of the processes can be found in [3]. Suppose CHin sends tent_req along with min_set[] to a cluster head, say CHs, and some process (say Pk) is found at CHs which takes a checkpoint in response to this tent_req (tentative checkpoint request). Every CH maintains the processes of the minimum set, to the best of its knowledge, in tmin_set. This is required to minimize duplicate checkpoint requests. Suppose there exists some process (say Pl) such that Pk is directly dependent upon Pl and Pl is not in the tmin_set maintained by CHs; then CHs sends tent_req to Pl. The new processes found for the minimum set while executing a potential checkpoint request at a CH are stored in tnp_min_set. For example, in the present case, tnp_min_set={Pl}. CHs sends the tent_req to Pl; Pl is stored in np_min_set and removed from tnp_min_set. In this way, np_min_set at a CH maintains all new processes found for the minimum set while executing tent_req from CHin or other CHs. When a CH finds that all the local processes which were asked to take checkpoints have taken their checkpoints, it sends the response to CHin along with np_min_set, so that CHin may update its knowledge about the minimum set and wait for the new processes before sending commit. In this way, CHin sends commit only if all the processes in the minimum set have taken their tentative checkpoints.
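The per-CH step described above, finding processes that a checkpointing local process depends on directly but that are not yet known to be in the minimum set, can be sketched as follows. Python sets are used instead of bit vectors for readability, and the function name `compute_tnp_min_set` and its parameters are assumptions for this illustration.

```python
def compute_tnp_min_set(checkpointing, cd_vect, tmin_set):
    """Return the new processes found for the minimum set at one CH.

    checkpointing: local process ids that take a checkpoint for this request
    cd_vect: dict mapping a process id to the ids it directly depends on
    tmin_set: ids already known at this CH to be in the minimum set
    """
    found = set()
    for pk in checkpointing:
        for pl in cd_vect.get(pk, set()):
            if pl not in tmin_set:
                found.add(pl)
    return found


# Pk = P4 takes a checkpoint at this CH and directly depends on Pl = P5.
tmin_set = {1, 2, 4}
np_min_set = set()
tnp_min_set = compute_tnp_min_set({4}, {4: {5}}, tmin_set)

# The CH would now send tent_req to the members of tnp_min_set, record them,
# and reset tnp_min_set.
np_min_set |= tnp_min_set
tmin_set |= tnp_min_set
first_result = set(tnp_min_set)
tnp_min_set = set()

# A repeated request finds nothing new, so no duplicate tent_req is issued.
second_result = compute_tnp_min_set({4}, {4: {5}}, tmin_set)
```

The second call shows why tmin_set is maintained: once P5 is known, the same dependency does not trigger another checkpoint request.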

2.3 Brief Description of the Algorithm along with an Example

We explain our checkpointing algorithm with the help of an example. In Figure 2, at time t0, P2 initiates the checkpointing process. cd_vect2[1]=1 due to m11, and cd_vect1[4]=1 due to m12. On the receipt of m10, P2 does not set cd_vect2[3]=1, because P3 has taken its permanent checkpoint after sending m10. We assume that P1 and P2 are in the cell of the same CH, say CHin. CHin computes min_set (a subset of the minimum set) on the basis of the cd_vect vectors maintained at CHin, which in the case of Figure 2 is {P1, P2, P4}. Therefore, P2 sends the tentative checkpoint request to P1 and P4 and takes its own checkpoint. After taking its checkpoint, P1 sends m14 to P4. P4 logs m14⁻¹, the anti-message of m14. In this case, P1 has taken its checkpoint before sending m14; at the time of receiving m14, P4 has not taken its checkpoint for the current initiation. If P4 takes its checkpoint after receiving m14, then m14 will become an orphan. Therefore P4 logs m14⁻¹. On recovery, P4 will receive m14 as a duplicate message because the processes are deterministic, and m14 will be annihilated by m14⁻¹. Hence the receipt of m14 as a duplicate message will not cause any inconsistency. It should be noted that this scheme is not applicable to non-deterministic systems. After taking its tentative checkpoint C41, P4 also finds that it was dependent upon P5 before taking the checkpoint, due to m6, and that P5 is not in the minimum set computed so far. Therefore, P4 sends the tentative checkpoint request to P5. On receiving the checkpoint request, P5 takes its checkpoint. At time t1, P2 receives responses from all relevant processes (not shown in Figure 2) and sends the permanent checkpoint request along with the minimum set {P1, P2, P4, P5} to all processes. When a process in the minimum set receives the permanent checkpoint request, it converts its tentative checkpoint into a permanent one. In this example, {C00, C11, C21, C30, C41, C51, C60, m14⁻¹} constitutes a recovery line. It should be noted that, in the recorded global state, m14 is an orphan message and its anti-message m14⁻¹ is logged at P4.
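The minimum set reached in this example can be checked with a short sketch that follows the direct dependencies transitively from the initiator. This is a centralized simplification for illustration: in the protocol the same set is assembled incrementally by the CHs, and the function name `minimum_set` is an assumption.

```python
from collections import deque


def minimum_set(initiator, cd_vect):
    """Transitive closure of direct dependencies, starting at the initiator."""
    result = {initiator}
    pending = deque([initiator])
    while pending:
        p = pending.popleft()
        for q in cd_vect.get(p, ()):
            if q not in result:
                result.add(q)
                pending.append(q)
    return result


# Direct dependencies from the example: P2 depends on P1 (m11),
# P1 depends on P4 (m12), and P4 depends on P5 (m6).
# P2 does not depend on P3, since P3 checkpointed after sending m10.
cd_vect = {2: {1}, 1: {4}, 4: {5}}
min_set = minimum_set(2, cd_vect)
```

Here `min_set` contains P1, P2, P4 and P5, matching the set sent with the permanent checkpoint request.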


2.4 The Proposed Checkpointing Algorithm

When an MH sends an application message, it first sends the message to its local CH over the wireless cell. The CH can piggyback appropriate information onto the application message and then route it to the appropriate destination. Conversely, when the CH receives an application message to be forwarded to a local MH, it first updates the relevant vectors that it maintains for the MH, strips all piggybacked information from the message, and then forwards it to the MH. Thus, an MH sends and receives application messages that do not contain any additional information; it is only responsible for checkpointing its local state appropriately and transferring it to the CH.
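The piggyback-and-strip behaviour can be sketched as below. The paper does not state which information is piggybacked; this sketch assumes, purely for illustration, that the sender's committed csn at send time is attached, so that the receiving CH can apply the cd_vect rule from Section 2.2 (a dependency is recorded only if the sender has not committed a checkpoint since sending). A single `ClusterHead` object stands in for both the sending and the receiving CH, which relies on every CH keeping an up-to-date csv[].

```python
class ClusterHead:
    def __init__(self, n):
        self.csv = [0] * n  # csn of each process's latest committed checkpoint
        self.cd_vect = {}   # process id -> ids it directly depends on

    def wrap_outgoing(self, sender, payload):
        # Piggyback the sender's committed csn at send time (assumed encoding).
        return {"sender": sender, "csn": self.csv[sender], "payload": payload}

    def deliver_to_local(self, receiver, wire_msg):
        sender = wire_msg["sender"]
        # Record the dependency only if the sender has taken no permanent
        # checkpoint after sending the message.
        if self.csv[sender] == wire_msg["csn"]:
            self.cd_vect.setdefault(receiver, set()).add(sender)
        # Strip the piggybacked data: the MH sees only the application payload.
        return wire_msg["payload"]


ch = ClusterHead(7)
m10 = ch.wrap_outgoing(3, "m10")
ch.csv[3] += 1                    # P3 commits a checkpoint after sending m10
m11 = ch.wrap_outgoing(1, "m11")
ch.deliver_to_local(2, m10)       # no dependency of P2 on P3 is recorded
payload = ch.deliver_to_local(2, m11)
```

This reproduces the situation in the example: P2 becomes dependent on P1 through m11 but not on P3 through m10.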

Each process Pi can initiate the checkpointing process. The initiator CH (say CHin) initiates and coordinates the checkpointing process on behalf of MHi. It computes min_set (a subset of the minimum set, on the basis of the direct dependencies maintained locally) and sends the tentative checkpoint request (say tent_req) along with min_set to a CH if the latter supports at least one process in the min_set. It also updates its tmin_set on the basis of min_set. We assume that concurrent invocations of the algorithm do not occur.

On receiving the tent_req along with the min_set from the initiator CH, a CH, say CHi, takes the following actions. It updates its tmin_set on the basis of min_set. It sends the tent_req to Pi if the following conditions are met: (i) Pi is running in its cell, (ii) Pi is a member of the min_set, and (iii) tent_req has not been sent to Pi. If no such process is found, CHi ignores the tent_req. Otherwise, on the basis of tmin_set, the cd_vect vectors of the processes in its cell and the initial cd_vect vectors of other processes, it computes tnp_min_set. If tnp_min_set is not empty, CHi sends tent_req along with tmin_set and tnp_min_set to a CH if the latter supports at least one process in the tnp_min_set. CHi updates np_min_set and tmin_set on the basis of tnp_min_set and initializes tnp_min_set.

[Figure 2: Example execution of the proposed protocol for processes P0 to P6 between times t0 and t1. The figure shows the computation messages m3, m5, m6, m10, m11, m12, m14 and m16, and the checkpoints C00, C10, C20, C30, C40, C50, C60, C11, C21, C41 and C51. Legend: tentative checkpoint, permanent checkpoint, checkpoint/commit request, computation message, anti-message logged.]

On receiving tent_req along with tmin_set and tnp_min_set from some CH, a CH, say CHj, takes the following actions. It updates its own tmin_set on the basis of the received tmin_set and tnp_min_set, and looks for any process Pk such that Pk is running in its cell, Pk has not been sent tent_req and Pk is in tnp_min_set. If no such process exists, it simply ignores this request. Otherwise, it sends the tentative checkpoint request to Pk. On the basis of tmin_set, the cd_vect[] of its processes and the initial cd_vect[] of other processes, it computes tnp_min_set. If tnp_min_set is not empty, CHj sends the checkpoint request along with tmin_set and tnp_min_set to any CH which supports at least one process in the tnp_min_set. CHj updates np_min_set and tmin_set on the basis of tnp_min_set. It also initializes tnp_min_set.

For a disconnected MH that is a member of the minimum set, the CH that holds its disconnected checkpoint converts that checkpoint into the required one.

When a CH learns that all of its relevant processes have taken their tentative checkpoints successfully, or that at least one of its processes has failed to take its checkpoint, it sends the response message along with the np_min_set to the initiator CH. If, after sending the response message, a CH receives a checkpoint request along with a tnp_min_set and learns that at least one process in tnp_min_set is running in its cell and has not taken its tentative checkpoint, then the CH requests each such process to take a checkpoint. It then sends the response message to the initiator CH again.

When the initiator CH receives a response from some CH, it updates its min_set on the basis of the np_min_set received along with the response. Finally, the initiator CH sends the permanent checkpoint request to all the processes of the minimum set.

When a process in the minimum set receives the permanent checkpoint request, it converts its tentative checkpoint into a permanent one. On receiving abort, a process discards its tentative checkpoint, if any, and undoes the updating of data structures. On receiving commit, the processes in min_set[] convert their tentative checkpoints into permanent ones. On receiving commit or abort, all processes update their dependency vectors and other data structures.
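The tentative-to-permanent transition can be sketched as a small state holder per process plus a decision function at the initiator. The decision rule shown (abort when the timer has expired or any CH reports a failed tentative checkpoint, otherwise commit) is an assumption inferred from the timer1 and s_bit flags; the paper does not spell it out, and all names below are hypothetical.

```python
class ProcessCheckpoints:
    def __init__(self, initial):
        self.permanent = initial  # last committed checkpoint
        self.tentative = None     # tentative checkpoint, if any

    def take_tentative(self, state):
        self.tentative = state

    def on_commit(self, in_min_set):
        # Only members of the minimum set convert tentative into permanent.
        if in_min_set and self.tentative is not None:
            self.permanent = self.tentative
        self.tentative = None

    def on_abort(self):
        # Discard the tentative checkpoint, if any.
        self.tentative = None


def initiator_decision(responses, timer_expired):
    """responses maps a CH id to True if all its relevant processes succeeded."""
    if timer_expired or not all(responses.values()):
        return "abort"
    return "commit"


p4 = ProcessCheckpoints("C40")
p4.take_tentative("C41")
decision = initiator_decision({"CH1": True, "CH2": True}, timer_expired=False)
if decision == "commit":
    p4.on_commit(in_min_set=True)
else:
    p4.on_abort()

p5 = ProcessCheckpoints("C50")
p5.take_tentative("C51")
p5.on_abort()  # an aborted initiation leaves C50 as the permanent checkpoint
```

Keeping the previous permanent checkpoint until commit is what allows an abort to be handled by simply dropping the tentative one.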

2.5 Performance Evaluation of the Proposed Protocol

We use the following notations to compare our algorithm with other algorithms:

NCH: number of CHs.

Nmh: number of MHs.

Cpp: cost of sending a message from one process to another.

Cst: cost of sending a message between any two CHs.

Cwireless: cost of sending a message from an MH to its local CH (or vice versa).

Cbct: cost of broadcasting a message to all CHs.

Csearch: cost incurred to locate an MH and forward a message to its current local CH, from a source CH.

Tst: average message delay in CH to CH communication.

Twl: average message delay in the wireless network.

Tch: average delay to save a checkpoint on the stable storage. It also includes the time to transfer the checkpoint from an MH to its local CH.

N: total number of processes.

Nmin: number of minimum processes required to take checkpoints.

Nmut: number of useless mutable checkpoints in [4].

Tsearch: average delay incurred to locate an MH and forward a message to its current local CH.

Nucr: average number of useless checkpoint requests in [4].

Ndep: average number of processes on which a process depends.

2.5.1 Performance of our algorithm

The Synchronization message overhead:


and the system overhead is Cbct. In the second phase, we broadcast the commit request. The total message overhead comes out to be: 2*Nmin*Cpp + 2*Cbct.

Number of processes taking checkpoints: the algorithm requires only the minimum number of processes to take their checkpoints.

2.5.2 A Comparative Study

In minimum-process coordinated checkpointing, either some useless checkpoints are taken which are discarded on commit [4, 10], or some blocking of processes takes place during checkpointing [3, 22]. In the proposed scheme, no useless checkpoints are taken and no blocking of processes takes place. We log the anti-messages of very few messages, at the receiver’s end only, during the checkpointing period. The effort of logging a few anti-messages may be negligibly small compared to taking some useless checkpoints or blocking some processes during checkpointing, especially in mobile ad hoc networks.

The blocking time of the Koo-Toueg [8] protocol is the highest, followed by the Cao-Singhal [3] algorithm. The other schemes [4, 10] are non-blocking, like the proposed one. In the algorithm of Elnozahy et al. [7], all processes are required to take their checkpoints in an initiation. In the protocols [3, 8] and the proposed one, only a minimum number of processes record their checkpoints.

Table 1 A Comparison of System Performance

The message overhead in the proposed protocol is similar to that of [4]. The algorithms proposed in [2, 3, 4, 5, 7, 8, 10, 11, 13, 20, 21, 22] assume that the processes are non-deterministic, whereas we assume in the proposed algorithm that the processes are deterministic in nature, as in [17, 18, 20].
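The message-overhead expressions can be compared numerically with a short sketch. The formula for the proposed protocol is the one stated in Section 2.5.1; the other three are taken from the message-overhead row of Table 1 for [4], [8] and [7]. All parameter values below are hypothetical and chosen only to show how the expressions are evaluated; they are not measurements.

```python
def proposed_overhead(n_min, c_pp, c_bct):
    """2*Nmin*Cpp + 2*Cbct, as stated in Section 2.5.1."""
    return 2 * n_min * c_pp + 2 * c_bct


def cao_singhal_mutable_overhead(n_min, c_pp, c_bct, n_ucr):
    """2*Nmin*Cpp + Cbct + Nucr*Cpp, the Table 1 entry for [4]."""
    return 2 * n_min * c_pp + c_bct + n_ucr * c_pp


def koo_toueg_overhead(n_min, c_pp, n_dep):
    """3*Nmin*Cpp*Ndep, the Table 1 entry for [8]."""
    return 3 * n_min * c_pp * n_dep


def elnozahy_overhead(n, c_pp, c_bct):
    """2*Cbct + N*Cpp, the Table 1 entry for [7]."""
    return 2 * c_bct + n * c_pp


# Hypothetical parameter values, for illustration only.
params = dict(n=20, n_min=4, c_pp=1, c_bct=10, n_ucr=2, n_dep=3)

overheads = {
    "proposed": proposed_overhead(params["n_min"], params["c_pp"], params["c_bct"]),
    "cao_singhal_4": cao_singhal_mutable_overhead(
        params["n_min"], params["c_pp"], params["c_bct"], params["n_ucr"]
    ),
    "koo_toueg_8": koo_toueg_overhead(params["n_min"], params["c_pp"], params["n_dep"]),
    "elnozahy_7": elnozahy_overhead(params["n"], params["c_pp"], params["c_bct"]),
}
```

The relative ordering depends strongly on the ratio of Cbct to Cpp and on Nmin, so any comparison should be repeated with values measured for the network at hand.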

3. Conclusions

In designing an efficient ad hoc network application, we have considered the resource constraints and the scalability of ad hoc networks. The number of ad hoc network users concerned about information quality is increasing, as are user requirements for real-time features. Moreover, ad hoc network applications are expanding into harsher and more dangerous environments. Therefore, checkpointing schemes have emerged as an important issue. We have proposed a minimum-process non-intrusive checkpointing protocol for deterministic mobile ad hoc networks, in which no useless checkpoints are taken and no blocking of processes takes place. Our scheme also keeps the number of control messages low. In existing minimum-process checkpointing protocols, either some useless checkpoints are taken or blocking of processes takes place; we eliminate both by logging the anti-messages of selected messages, at the receiver end only, during the checkpointing period. The overhead of logging a few anti-messages may be negligible compared to taking some useless checkpoints or blocking the processes during checkpointing, especially in mobile ad hoc networks. We disallow concurrent executions in spite of concurrent initiations of the proposed protocol.

References

[1] Acharya A. and Badrinath B. R., “Checkpointing Distributed Applications on Mobile Computers,” Proceedings of the 3rd International Conference on Parallel and Distributed Information Systems, pp. 73-80, September 1994.

[2] Cao G. and Singhal M., “On coordinated checkpointing in Distributed Systems”, IEEE Transactions on Parallel and Distributed Systems, vol. 9, no.12, pp. 1213-1225, Dec 1998.

[3] Cao G. and Singhal M., “On the Impossibility of Min-process Non-blocking Checkpointing and an Efficient Checkpointing Algorithm for Mobile Computing Systems,” Proceedings of International Conference on Parallel Processing, pp. 37-44, August 1998.

[4] Cao G. and Singhal M., “Mutable Checkpoints: A New Checkpointing Approach for Mobile Computing systems,” IEEE Transaction On Parallel and Distributed Systems, vol. 12, no. 2, pp. 157-172, February 2001.

Table 1 (see Section 2.5.2)

| | Cao-Singhal [3] | Cao-Singhal [4] | Koo-Toueg [8] | Elnozahy et al. [7] | Proposed Algorithm |
|---|---|---|---|---|---|
| Avg. blocking time | 2Tst | 0 | Nmin*Tch | 0 | 0 |
| Average no. of checkpoints | Nmin | Nmin+Nmut | Nmin | N | Nmin |
| Average message overhead | 3Cbct+2Cwireless+2NCH*Cst+3Nmh*Cwireless | 2*Nmin*Cpp+Cbct+Nucr*Cpp | 3*Nmin*Cpp*Ndep | 2*Cbct+N*Cpp | 4*Nmin*… |

[5] Chandy K. M. and Lamport L., “Distributed Snapshots: Determining Global State of Distributed Systems,” ACM Transaction on Computing Systems, vol. 3, No. 1, pp. 63-75, February 1985.

[6] Elnozahy E.N., Alvisi L., Wang Y.M. and Johnson D.B., “A Survey of Rollback-Recovery Protocols in Message-Passing Systems,” ACM Computing Surveys, vol. 34, no. 3, pp. 375-408, 2002.

[7] Elnozahy E.N., Johnson D.B. and Zwaenepoel W., “The Performance of Consistent Checkpointing,” Proceedings of the 11th Symposium on Reliable Distributed Systems, pp. 39-47, October 1992.

[8] Koo R. and Toueg S., “Checkpointing and Roll-Back Recovery for Distributed Systems,” IEEE Trans. on Software Engineering, vol. 13, no. 1, pp. 23-31, January 1987.

[9] Neves N. and Fuchs W. K., “Adaptive Recovery for Mobile Environments,” Communications of the ACM, vol. 40, no. 1, pp. 68-74, January 1997.

[10] Parveen Kumar, Lalit Kumar, R K Chauhan, V K Gupta “A Non-Intrusive Minimum Process Synchronous Checkpointing Protocol for Mobile Distributed Systems” Proceedings of IEEE ICPWC-2005, pp 491-95, January 2005.

[11] Prakash R. and Singhal M., “Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems,” IEEE Transaction On Parallel and Distributed Systems, vol. 7, no. 10, pp. 1035-1048, October 1996.

[12] J.L. Kim, T. Park, “An efficient Protocol for checkpointing Recovery in Distributed Systems,” IEEE Trans. Parallel and Distributed Systems, pp. 955-960, Aug. 1993.

[13] Ni, W., S. Vrbsky and S. Ray, “Pitfalls in Distributed Nonblocking Checkpointing”, Journal of Interconnection Networks, Vol. 1 No. 5, pp. 47-78, March 2004.

[14] L. Lamport, “Time, clocks and ordering of events in a distributed system”, Comm. ACM, vol. 21, no. 7, pp. 558-565, July 1978.

[15] Silva, L.M. and J.G. Silva, “Global checkpointing for distributed programs”, Proc. 11th Symp. Reliable Distributed Systems, pp. 155-62, Oct. 1992.

[16] David R. Jefferson, “Virtual Time”, ACM Transactions on Programming Languages and Systems, Vol. 7, No. 3, pp 404-425, July 1985.

[17] Johnson, D.B., Zwaenepoel, W., “Sender-based message logging”, In Proceedings of 17th International Symposium on Fault-Tolerant Computing, pp 14-19, 1987.

[18] Johnson, D.B., Zwaenepoel, W., “Recovery in Distributed Systems using optimistic message logging and checkpointing”, pp 171-181, 1988.

[19] Parveen Kumar, Lalit Kumar, R K Chauhan, “A Non-intrusive Hybrid Synchronous Checkpointing Protocol for Mobile Systems”, IETE Journal of Research, Vol. 52 No. 2&3, 2006.

[20] Pushpendra Singh, Gilbert Cabillic, “A Checkpointing Algorithm for Mobile Computing Environment”, LNCS, No. 2775, pp 65-74, 2003.

[21] Lalit Kumar Awasthi, P.Kumar, “A Synchronous Checkpointing Protocol for Mobile Distributed Systems: Probabilistic Approach” International Journal of Information and Computer Security, Vol.1, No.3 pp 298-314.

[22] Parveen Kumar, “A Low-Cost Hybrid Coordinated Checkpointing Protocol for Mobile Distributed Systems”, Mobile Information Systems [An International Journal from IOS Press, Netherlands] pp 13-32, Vol. 4, No. 1, 2007.

[23] Murthy & Manoj, “Ad hoc Wireless Networks Architectures and Protocols”, Pearson Education, 2004.

[24] D.J. Baker and A. Ephremides, “The Architectural Organisation of a Mobile Radio Network via a Distributed algorithm”, IEEE Trans. Commun., vol. 29, no. 11, pp 1694-1701, Nov., 1981

[25] D.J. Baker, A. Ephremides and J.A. Flynn, “The design and Simulation of a Mobile Radio Network with Distributed Control”, IEEE J. Sel. Areas Commun., pp 226-237, 1984.

[26] B. Das, R. Sivakumar and V. Bharghavan, “Routing in Ad-hoc networks using a Spine”, Proc. Sixth International Conference, 1997.

[27] B. Das, R. Sivakumar and V. Bharghavan, “Routing in Ad-hoc networks using Minimum connected Dominating Sets”, Proc. IEEE International Conference, 1997.