Asynchronous Checkpointing and Optimistic Message Logging for Mobile Ad Hoc Networks

In recent years the advancements in wireless communication technology and mobile computing fueled a steady increase in both number and types of applications for wireless networks. Wireless networks can roughly be classified into cellular networks which use dedicated infrastructure (like base stations) and ad hoc networks without infrastructure. A Mobile Ad Hoc Network (MANET) is a collection of mobile nodes that can communicate with each other using Multihop wireless links without using any fixed infrastructure and centralized controller. Since this type of networks exhibits a dynamic topology, that is, the nodes move very frequently, it is hard to establish some intermittent connectivity in this scenario. Fault tolerance is one of the key issues for MANETs. In a cluster federation, clusters are gathered to provide huge computing power. Clustering methods allow fast connection and also better routing and topology management of mobile ad hoc networks (MANET). To work efficiently on such systems, networks characteristics have to be taken into account, for e.g. the latency between two nodes of different clusters is much higher than the latency between two nodes of the same cluster. In this paper, we present a message logging protocol well-suited to provide fault tolerance for cluster federations in mobile ad hoc networks. The proposed scheme is based on optimistic message logging. I. INTRODUCTION Wireless networks include infrastructure-based networks and ad hoc networks. Most wireless infrastructure-based networks are established by a one hop radio connection to a wired network. On the other hand, mobile ad hoc networks are decentralized networks that develop through self-organization [1]. The original idea of MANET started out in the early 1970s. At this time they were known as packet radio networks. Lately, substantial progress has been made in technologies like microelectronics, wireless signal processing, distributed computing and VLSI (Very Large Scale Integration) circuit design and manufacturing [2]. This has given the possibility to put together node and network devices in order to create wireless communications with ad hoc capability.


I. INTRODUCTION
Wireless networks include infrastructure-based networks and ad hoc networks.Most wireless infrastructure-based networks are established by a one hop radio connection to a wired network.On the other hand, mobile ad hoc networks are decentralized networks that develop through self-organization [1].The original idea of MANET started out in the early 1970s.At this time they were known as packet radio networks.Lately, substantial progress has been made in technologies like microelectronics, wireless signal processing, distributed computing and VLSI (Very Large Scale Integration) circuit design and manufacturing [2].This has given the possibility to put together node and network devices in order to create wireless communications with ad hoc capability.
MANETs are formed by a group of nodes that can transmit and receive data and also relay data among themselves.Communication between nodes is made over wireless links.A pair of nodes can establish a wireless link among themselves only if they are within transmission range of each other.An important feature of ad hoc networks is that routes between two hosts may consist of hops through other hosts in the network [3].When a sender node wants to communicate with a receiver node, it may happen that they are not within communication range of each other.However, they might have the chance to communicate if other hosts that lie in-between are willing to forward packets for them.This characteristic of MANET is known as multihopping.An example is shown in figure 1. Node A can communicate directly (single-hop) with node B, node C and node D. If A wants to communicate with node E, node C must serve as an intermediate node for communication between them.Therefore, the communication between nodes A and E is multi-hop.Today wireless bluetooth, personal area networks (PAN), IEEE 802.11 a/b/g, wireless local area networks (WLAN) and HIPERLAN/2, are communication standards that include ad hoc features [4].Figure 2 shows an example of a mobile ad hoc network composed of commonly used wireless devices.Checkpoint and message logging protocols are designed for saving the execution state of mobile application, so that when a MH recovers from a failure, the mobile application can roll back to the last saved consistent state, and restart execution with recovery guarantees.The existing protocols assume that the MH's disk storage is not stable and thus checkpoint and log information are stored at the base stations [5], [6].
Log-based rollback recovery exploits the fact that a process execution can be modeled as a sequence of deterministic state intervals, each starting with the execution of a nondeterministic event.A non-deterministic event can be the receipt of a message from another process or an event internal to the process.A message send event is not a non-deterministic event.Log-based rollback recovery assumes that all nondeterministic events can be identified and their corresponding determinants can be logged into the stable storage.
During failure-free operation, each process logs the determinants of all non-deterministic events that it observes onto the stable storage.Additionally, each process also takes checkpoints to reduce the extent of rollback during recovery.After a failure occurs, the failed processes recover by using the checkpoints and logged determinants to replay the corresponding non-deterministic events precisely as they occurred during the pre-failure execution.Because execution within each deterministic interval depends only on the sequence of non-deterministic events that preceded the interval's beginning, the pre-failure execution of a failed process can be reconstructed during recovery up to the first non-deterministic event whose determinant is not logged.The deterministic intervals composing the process execution are called state intervals.The state intervals are partially ordered by the Lamport's happen-before relation [7].
Message logging techniques are classified into pessimistic [8], optimistic [9], [10], [11] and causal [12], [13], [14], [15].Pessimistic logging protocols assume that a failure can occur after any non-deterministic event in the computation.This assumption is "pessimistic" since in reality failures are rare.In their most straightforward form, pessimistic protocols log to the stable storage the determinant of each non-deterministic event before the event affects the computation.A pessimistic protocol is one in which each process p never sends a message until it knows that all messages delivered before sending the previously sent messages are logged.Pessimistic protocols will never create any inconsistent process (orphans), and so the reconstruction of the state of a crashed process is very straightforward.The pessimistic protocols potentially block a process for each message it receives.
In optimistic logging protocols, processes log determinants asynchronously to the stable storage.These protocols optimistically assume that logging will be complete before a failure occurs.Determinants are kept in a volatile log, and are periodically flushed to the stable storage.Thus, optimistic logging does not require the application to block waiting for the determinants to be written to the stable storage, and therefore incurs much less overhead during failure-free execution.However, the price paid is more complicated recovery, garbage collection, and slower output commit.If a process fails, the determinants in its volatile log are lost, and the state intervals that were started by the non-deterministic events corresponding to these determinants cannot be recovered.Furthermore, if the failed process sent a message during any of the state intervals that cannot be recovered, the receiver of the message becomes an orphan process and must roll back to undo the effects of receiving the message.Optimistic logging protocols do not implement the always-no-orphans condition.
Causal logging combines the advantages of both pessimistic and optimistic logging at the expense of a more complex recovery protocol.Like optimistic logging, it does not require synchronous access to the stable storage except during output commit.Like pessimistic logging, it allows each process to commit output independently and never creates orphans, thus isolating processes from the effects of failures at other processes.Moreover, causal logging limits the rollback of any failed process to the most recent checkpoint on the stable storage, thus minimizing the storage overhead and the amount of lost work.Causal logging protocols make sure that the always-no-orphans property holds by ensuring that the determinant of each non-deterministic event that causally precedes the state of a process is either stable or it is available locally to that process.
In this paper we focus on optimistic based message logging for communications in a clustered ad hoc network, since the checkpointing-only schemes are not suitable for the mobile environment and also in ad hoc environments in which unreliable mobile hosts and fragile network connection may hinder any kind of coordination for checkpointing and recovery.In order to cope with the storage problem, the task of logging is assigned to the CH instead of MHs, since each message heading to a MH is routed through the CH.Also, in order to reduce the overhead imposed on mobile hosts, cluster heads take charge of logging and dependency tracking, and mobile hosts maintain only a small amount of information for mobility tracking.
The rest of this paper is organized as follows: Section 2 discusses related work and problem formulation.In section 3, system model is described.Section 4 explains the basic algorithm and comparison with existing schemes is described in section 5. Finally section 6 concludes the paper.

A. Related Work
Application failure recovery in the mobile computing environment has received considerable attention in the recent years.The schemes that have been proposed employ checkpointing, logging or a combination of both, recognizing the inherent limitations of the mobile computing environments.Since the requirements of a ad hoc network are different from the mobile computing environment, these issues needs to be addressed in this area as well.
Prakash and Singhal describe in [16] a checkpointing algorithm for Mobile Conputing System.Checkpoint collection is synchronous and non-blocking.A minimum number of nodes are forced to take checkpoints.Each MH maintains a dependence vector.MHs maintain causal relationships through message.This scheme reduces energy consumption by www.ijacsa.thesai.orgpowering down individual components during periods of low activity.
In [17] T.Park et.al has presented an efficient movement based recovery scheme.This scheme is a combination of message logging and independent checkpointing.Main feature of this algorithm is that a host carrying its information to the nearby MSS can recover instantly in case of a failure.To enhance failure-free execution, concept of a 'certain range' is introduced.An MH moving inside a range , recovery information remains in host MSS otherwise it moves recovery information to nearby MSS.Though recovery is ensured, failure-free execution cost increases.Due to this out of range concept overheads due to transfer of checkpoint from one MSS to another MSS increases many fold.Sapna E. George [18]et.aldescribes a checkpointing and logging scheme based on mobility of MHs.A checkpoint is saved when hand-off count exceeds a predefined optimum threshold.Optimum threshold is decided as a function of MH's mobility rate, failure rate and log arrival rate.Recovery probability is calculated and recovery cost is minimized in this scheme.
Acharya et al. [6] describes uncoordinated checkpointing, where multiple MHs can arrive at a global consistent checkpoint without coordination messages.However, neither it takes into account how failure recovery is achieved nor does it address the issue of recovery information management in the face of MH movement.
In [19] authors proposed an independent checkpointing scheme which saves the state of processes in the computer to which a mobile host is currently attached.
The authors in [20] presents a low overhead recovery scheme based on a communication induced checkpointing, which allows the processes to take checkpoints asynchronously and uses communication-induced checkpoint coordination for the progression of the recovery line.The scheme also uses selective pessimistic message logging at the receiver to recover the lost messages.However, the recovery scheme can handle only a single failure at a time.P. Kumar and A. Khunteta [22] proposed a minimumprocess coordinated checkpointing algorithm for deterministic mobile distributed systems, where no useless checkpoints are taken, no blocking of processes takes place, and anti-messages of very few messages are logged during checkpointing.In their algorithm they have tried to reduce the loss of checkpointing effort when any process fails to take its checkpoint in coordination with others.

B. Problem Formulation
Cluster federations are hierarchical systems.The latency between two clusters is much higher than the latency between two nodes of the same cluster.For efficient execution on such systems, applications must take into account the topology of the cluster federation.Communications between nodes of the same cluster should be favored over communications between nodes of different clusters.
The objective of the present work is to design an optimistic based message logging for communications in a clustered ad hoc network, since the checkpointing-only schemes are not suitable for the mobile environment and also in ad hoc environments.In order to cope with the storage problem, the task of logging is assigned to the CH instead of MHs, since each message heading to a MH is routed through the CH.In order to reduce the size of dependency information carried in each message for asynchronous recovery, only the messages between the CHs carry the information, and the dependency between the MHs residing in the same Cluster can be traced through the message order within the CH.Using the restricted dependency tracking, no extra overhead is imposed on MHs.Of course, there is a possibility of unnecessary rollbacks due to the imprecise dependency information, however, comparing with the checkpointing-only schemes, the chance of rollback propagation in the message logging schemes is very low.

III. SYSTEM MODEL
A successful approach for dealing with the maintenance of mobile ad hoc networks is by partitioning the network into clusters.In this way the network becomes more manageable.Clustering is a method which aggregates nodes into groups.These groups are contained by the network and they are known as clusters.Clusters are analogous to cells in a cellular network.However, the cluster organization of an ad hoc network cannot be achieved offline as in fixed networks [21].In most clustering techniques nodes are selected to play different roles according to a certain criteria.In general, three types of nodes are defined: Ordinary nodes :-Ordinary nodes are members of a cluster which do not have neighbors belonging to a different cluster.
Gateway nodes:-Gateway nodes are nodes in a nonclusterhead state located at the periphery of a cluster.These types of nodes are called gateways because they are able to listen to transmissions from another node which is in a different cluster.To accomplish this, a gateway node must have at least one neighbor that is a member of another cluster.
Clusterheads:-Most clustering approaches for mobile ad hoc networks select a subset of nodes in order to form a network backbone that supports control functions.A set of the selected nodes are called clusterheads and each node in the network is associated with one.Clusterheads are connected with one another directly or through gateway nodes.The union of gateway nodes and clusterheads form a connected backbone.This connected backbone helps simplify functions such as channel access, bandwidth allocation, routing power control and virtual-circuit support.
Clusterheads are analogous to the base station concept in current cellular systems.They act as local coordinators in resolving channel scheduling and performing power control.However, the difference of a clusterhead from a conventional base station resides in the fact that a clusterhead does not have special hardware, it is selected among the set of stations and it presents a dynamic and mobile behavior.Since clusterheads must perform extra work with respect to ordinary nodes they can easily become a single point of failure within a cluster.For this reason, the clusterhead election process should consider for the clusterhead role, those nodes with a higher degree of www.ijacsa.thesai.orgrelative stability.The main task of a clusterhead is to calculate the routes for long-distance messages and to forward intercluster packets.Figure 3 shows the system model and different roles of nodes in a mobile ad hoc network organized by clusters.

Figure: -3 System Model
The clustering system considered in this paper follows the model presented in the figure above The system is organized into various clusters, each having a clusterhead and ordinary nodes, which will be termed as Mobile Hosts; a set of dynamic links can be established between a MH and a Cluster head.The area covered by a cluster head is called a cell.A MH residing in a cluster can be connected to the clusterhead servicing the cluster and the MH can communicate to another MH only through the cluster head.The links in the dynamic network support FIFO communication in both directions.
For a MH to leave a cluster and enter into another cluster, it first has to end its current connection by sending a leave(x) message to the cluster head, where x is the sequence number of the last message received from the cluster head, and then establish a new connection by sending join (MH-id, previous cluster-id, previous clusterhead-id) message to the new cluster head in the new cluster.Each cluster head maintains a list identifiers, called a Current_Nodelist, with which nodes it connected at current time A MH can also disconnect itself from the cluster head voluntarily without leaving the cluster by sending a disconnect (x) message to conserve power.When the cluster head receives a disconnect message from a Node, it marks that node to be "disconnected" by setting a flag that maintains a list of voluntarily disconnected MHs, called disconnected_Nodelist Later on, the MH can reconnect to any cluster by sending a reconnect (MH-id, previous cluster-id, previous clusterhead_id) message to the current cluster head.If the MH is reconnected to a new cluster, the new cluster head informs the previous cluster head of the reconnection so that the previous cluster head can perform the proper hand-off procedures.

IV. PROPOSED ALGORITHM
The ordinary Nodes in the clustered ad hoc networks are considered highly vulnerable to failures, while the cluster heads are relatively reliable as they are chosen among the nodes with a higher degree of relative stability.By reliable CH, we mean that recovery information for MHs can never be lost due to its own failure.With this assumption, the volatile memory space of a CH can be utilized as a stable storage to save checkpoints and message logs of MHs.

B. Checkpointing and Message Logging
Each mobile host MH independently takes a checkpoint and a unique sequence number is assigned to each checkpoint.For the checkpointing, MH first saves its current state as a checkpoint and then transfers the checkpoint to CH to which it is currently connected.Chk i x denotes the xth checkpoint of MH i .Each checkpoint is identified by a pair of (i, x).MH i then sends the checkpoint to its current CH, say CH p .Each Mobile host also maintains a variable m i rcv , to count the number of messages a mobile host has received, and the value of m i rcv is sent with the checkpoint to the cluster head.On the receipt of a checkpoint and other related information CH saves it into stable storage.The value of m i rcv sent with the checkpoint to CH is used to decide the correct position of the checkpoint with respect to logged messages.
Each CH p also maintains the message log for the nodes residing in that cluster.Since each message that is delivered to MH in the cluster is routed through CH, logging of messages incurs little overhead.Let Msg i x denotes xth message delivered to MH i .In addition, CH p also logs the messages related to the mobility of the nodes, such as join, leave, disconnect and reconnect.Any of these messages sent from MH must also carry the value of m i rcv and sequence number is logged with message.
A node after a failure should be able to locate its latest checkpoint and the message log for recovery.Each mobile host also contains a variable Locate i , to retrieve the information after failure.Locate i contains chk_seq, cp_loc, cp_ch, msg_seq and log_set.chk_seq stores the sequence number of latest checkpoint and cp_loc stores the ID of the cluster and cp_ch stores the ID of the CH that has recorded the latest checkpoint.Let this cluster be called cluster in and CH be called CHp. msg_seq denotes the sequence number for the first message logged after the checkpoint and log_set contains the IDs of the cluster heads that stores its logs.At every checkpoint, cp_loc is updated with the current CH, cp_seq is updated with the www.ijacsa.thesai.orgsequence number of the latest checkpoint and log_set is cleared.At every logging activity, the IDs of the current CHs are added to log_set if it is not present already.Locate i is logged by the CHs which a MH has visited.When a mobile host joins or reconnects to a new cluster, say CH p , it sends Locate i with the connection message.Also when mobile host disconnects itself from or leaves CH p , it sends Locate i with disconnection message if information in the Locate i has been changed since the connection was established.CH p on receipt of Locate i logs it with the message.Each mobile host also maintains a variable timeToCkp i which defines Time interval until next checkpoint

C. Algorithm
We describe pseudocode for the checkpointing and message logging protocol here , ….e i y , where 1≤y, e i x is the first message from the last checkpoint and e i y is the last message before failure.Since all the messages delivered to MH i are logged in CH and Locate i .log_setindicates the order in which MH i has contacted CH since its last checkpoint.After a failure MH i should rollback to the latest checkpoint and the logged messages in the same order and it can reconstruct the same state intervals as the ones before failure.Because all the messages sent and received events are recorded, the MHi's state can be reconstructed.

V. HANDLING FAILURES AND DISCONNECTIONS
We distinguish here failures and disconnections.Failures can be categorized asmobile host falls and is damaged, lost or stolen, battery is discharged.Disconnections are termed as hand-off.Since the mobility rate of mobile hosts in ad hoc networks is very high, so while connected a mobile host can change its position and can join another cluster.This movement is termed as disconnection.We will discuss these two issues separately.
After a MH recovers from a failure, either a mobile host is in the same cluster or the cluster can be changed.When a MH i recovers from a failure, it first sends a join(MH-id, previous cluster-id, previous clusterhead-id) to its current CH, say CH p .CH p checks it Active_nodelist list and Disconnected_nodelist.If MH i is found in any of these lists that mean MH after a failure has recovered in current cell and thus the Loacte i must have been logged in CH p .
Sometimes it may happen that CH is not able to find MHi either in Active_nodelist list or Disconnected_nodelist, which means that MH after a failure has moved to another cluster.In this case, firstly MH sends a join (MH-id, previous cluster-id, previous clusterhead-id) to its current CH, say CH q .Now, CH q will broadcast the recovery message, so that previous CH which has been contacted by MH i can deliver the most recent Loacte i to CH q .After CH q receives the most recent Locate i , it starts with the recovery procedure.During the recovery of MH i , new messages heading to MH i can be logged at CH q .However, these messages are delivered to MH i after consuming all the messages in the log.Only the MHs which have failed rollback to the latest checkpoint and replay the logged messages to ensure the global recovery line and no other MHs need to rollback.
A node after a disconnection should be able to locate its latest checkpoint and the message log for recovery.Each mobile host contains a variable Locate i , to retrieve the information after failure which we have discussed in section 4.1.For each hand-off or disconnection, a MH within a cluster transfers the checkpoint and message logs to the current cluster head, so that the recovery information can be retrieved later from the cluster head.For a MH, MH i , connected to the cluster head CH p , in cluster CL i first saves its checkpoints and message logs and updates the information in Locate i .Let www.ijacsa.thesai.orgMH_data (i, p) denotes the checkpoints and message logs of MH i saved by CH p of CL i .When MH i leaves CL i and joins another cluster head say CH new of cluster CL k , a hand-off procedure is initiated by CH new sending a handoff-request for MH i to CH p .While the hand-off procedure is performed, the recovery information is transferred from CH p to CH new .Figure 4 depicts the sequence of events that take place for the recovery information transfer.

VI. PERFORMANCE COMPARISON
In [1] authors proposed a communication pattern based checkpointing scheme to save consistent global states, in which a checkpoint is taken whenever a message reception is preceded by a message transmission.An independent checkpointing scheme which saves the state of processes in the computer to which a mobile host is currently attached was proposed by authors in [12].Neither of the above approaches needs the checkpointing coordination, however, they may enforce a large number of checkpoints.[13] proposed a lowcost synchronous checkpointing scheme, in which a process can advance its checkpoint asynchronously, however, it may result in considerable message overhead and an inconsistency.[10] presents a low overhead recovery scheme based on a communication induced checkpointing, which allows the processes to take checkpoints asynchronously and uses communication-induced checkpoint coordination for the progression of the recovery line.The scheme also uses selective pessimistic message logging at the receiver to recover the lost messages.However, the recovery scheme can handle only a single failure at a time.In [18] authors describe a checkpointing and logging scheme based on mobility of MHs.A checkpoint is saved when hand-off count exceeds a predefined optimum threshold.Optimum threshold is decided as a function of MH's mobility rate, failure rate and log arrival rate.Recovery probability is calculated and recovery cost is minimized in this scheme.
We have described an optimistic based message logging scheme since checkpointing-only schemes are not suitable for ad hoc environments and most of the schemes described above are based on checkpointing-only approach.Also, we have followed the asynchronous checkpointing approach, as asynchronous recovery is desirable in ad hoc environments in which MH can be disconnected any time from the network and co-ordination may not be possible.

VII. CONCLUSION
In this paper, we have proposed an optimistic based message logging approach for cluster based ad hoc networks in which each MH in the cluster takes checkpoint independently.Also, each message that is delivered to MH in the cluster is routed through CH which avoids the overhead of message logging at MH. MH only carries minimum information and all the dependency tracking and mobility of MH can be properly traced by CH.The asynchronous checkpointing scheme relives the MH from any kind of coordination and they can take their checkpoints whenever they want.

Figure 1 -
Figure 1 -Multi-hop communication in a mobile ad hoc network

Figure 2 -
Figure 2 -Wireless Mobile Ad hoc Network

Figure 4 :
Figure 4 : Handling failures and disconnections Locate i =variable to retrieve information after failure of MH chk_seq=sequence number of latest checkpoint cp_loc=current ID of a cluster cp_ch=current cluster head ID msg_seq =sequence number of the first message logged after checkpoint log_set=IDs of cluster heads that store MH logs timeToCkp i =Timer to take checkpoint on MH