Data Recovery Approach for Fault-Tolerant IoT Node

The Internet of Things (IoT) has a wide range of applications in sectors such as industry, health care, homes, the military, and agriculture. IoT-based safety-critical applications in particular must be secure and reliable. Such applications need to operate continuously even in the presence of errors and faults, and maintaining data reliability and security is a critical task for them. IoT networks suffer from node failures due to limited resources and the nature of their deployment, which consequently results in data loss. This paper proposes a Data Recovery Approach for Fault Tolerant (DRAFT) IoT node algorithm. It is fully distributed, with data replication and recovery implemented through redundant local database storage on other nodes in the network. DRAFT ensures high data availability and preserves data even in the presence of node failures. When an IoT node fails in any cluster in the network, its data can be retrieved from redundant storage with the help of neighbor nodes in the cluster. The proposed algorithm is simulated for 100-150 IoT nodes and enhances network lifetime and throughput by 5%. Performance metrics such as Mean Time to Data Loss (MTTDL), throughput, network lifetime, and reliability are computed, and the results are found to be improved.
Keywords: Internet of things; data recovery; RAID; node failures; reliability; network lifetime


I. INTRODUCTION
IoT integrates multiple controllers, IoT nodes, servers, and gateways that contain embedded technologies, connects them logically, and enables them to sense and interact with the real world and with each other. These devices can gather data from a wide range of existing structures. The accuracy of the IoT network diminishes when the transmitted data is faulty, which can lead to inappropriate actions. It is therefore critical to enhance the ability of the IoT network to detect and recover a faulty node's data. The difficulty of detecting incorrect data and the quality of data have been studied extensively [1]. Fault tolerance is an important aspect of ensuring the high reliability and availability of the IoT network. Because the devices are resource-constrained, failures are likely to occur in the network. Hence, traditional networking technologies cannot handle IoT requirements effectively [2]. According to a survey, 48% of IoT projects may fail due to data failures and security. Data failures include missing files, corrupt files and data blocks, and inconsistent files [3].
Capturing data quality levels is an effective way to estimate the quality of devices and of the data they produce. The data quality estimation mechanism depends on three stages [4]: data reliability, device availability, and overall quality of data. Sensors are small devices with limited resources such as memory space, battery, and processing power. These resource-constrained nodes pose multiple challenges for network designers, who must use scarce resources carefully. IoT applications sometimes need to be deployed in harsh environments [5]. In such applications, nodes are prone to failures for various reasons, such as hardware failure, battery depletion, and external events. Complex IoT applications require effective data management, which can be supported by various techniques [6].
IoT is a complex heterogeneous network, and maintaining high reliability is one of its major concerns. IoT networks should be more reliable for safety-critical applications; heuristic binary decision diagrams can be used to assess the link failures that occur due to data loss in community structures [7].
IoT applications help industries gain a competitive edge over their competitors. However, data sharing, security, device faults, and data manipulation between smart devices are serious concerns for many industries because they interrupt industrial workflows. An IoT network comprises many sensing nodes, so it must collect and process an enormous amount of data from the external environment. Fault tolerance and reliability can be enhanced by adding redundant bits to the original data at the information level, which is necessary for an IoT network. By employing the Reduced Variable Neighborhood Search (RVNS) algorithm, the IoT network can improve its processing speed and the reliable transmission of data [8]. Sensor data validation generally comprises two main steps: data fault detection and data reconciliation. There is no perfect tool or method for this process.
Various faulty data detection and correction methods and tools are discussed in [9]. In an IoT network, each sensor node works with a limited power source. When a sensor stops working, the network cannot process the data it has received, which may affect the prediction of network health and reliability and lead to network failure. In this situation, RAID structures are very useful for maintaining redundant copies of data. This paper focuses on data management using a RAID-like technique within the cluster to achieve fault tolerance at the price of additional communication cost.
The remainder of the paper is organized as follows. Section II reviews recent data replication and recovery methods in IoT. Section III describes the implementation of the DRAFT algorithm in the IoT network. Section IV discusses the performance evaluation of the DRAFT algorithm, and finally, Section V draws conclusions and outlines possible future directions of work.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 13, No. 1, 2022

II. RELATED WORK
IoT requires efficient data collection, generation, and presentation through wireless sensor nodes. Due to limited energy resources, sensor nodes are unreliable, which may lead to the loss of valuable data. In IoT, data replication is a promising method for data management. In [10], data replication algorithms for IoT nodes are classified, analyzed, and compared. Data replication techniques address query balancing, data availability, system robustness, and data retrieval. A resilient data-centric storage algorithm is used in [11] to illustrate tiny database systems for easy data retrieval. It is specially designed for low-powered IoT sensor nodes. A Distributed Hash Table (DHT) is used to store the redundant data in the node.
Redundant storage of data keeps the information available in case of source failure. A greedy replication-based distributed storage algorithm is proposed in [12]: if a node in the network fails, its data can be retrieved through neighbor donor nodes. IoT sensor node data is stored in distributed mini data centers in a decentralized cloud rather than in single cloud storage. In those data storage areas, each IoT data item has a predetermined number of redundant copies. This problem is formulated as a complex linear programming model, and heuristic algorithms are proposed in [13]. These algorithms help to improve the latency of read and write operations.
A multi-dimensional data storage (MDDS) algorithm is implemented within a single node in [14]; it handles querying, indexing, storing, and ingestion of huge amounts of data. A distributed MDDS offers high ingestion rates, fault tolerance, and horizontal scaling compared to relational database management systems (RDBMS). To assure high data availability in the IoT network during node failure, a distributed hop-by-hop data replication (DRAW) technique is proposed in [15]. This algorithm helps to identify the best replica node for maintaining a redundant copy. To select the replica node, the technique applies a series of conditions, such as the memory available in the device, the number of hops, the degree of replication, previous replicas of the data items, and the common neighbors of the devices.
Data availability during failures can be achieved through data replication algorithms. In [16], a bridged replication control (BRC) algorithm is proposed for smart logistics. BRC creates temporary replication access through the bridge token when a link failure occurs, and it provides efficient data management for smart networks. High fault tolerance can be achieved by maintaining redundant data components in multiple storage locations. An adaptive data replication algorithm is proposed in [17] and incorporated at the gateway level.
An IoT network is deployed with many sensor nodes, and the energy of these nodes is depleted during data transmission. An adjustable data replication algorithm based on virtual grid technology is implemented in [18]. This scheme helps to enhance the lifetime and performance of the sensor nodes. A dynamic sink node determines the communication link depending on the selected beacon and continuously develops a replica node across the query node to balance the rate of energy consumption against the overhead in the network. Data recovery is one of the essential features of an IoT network. These networks may face sensing and connection errors that result in incorrectly received data [19]. By incorporating probability matrix factorization at the cluster level, the missing data can be recovered through neighbor nodes in the cluster.
In [20], a convolutional neural network is used to build data recovery algorithms. Restoring the data mainly involves two steps: first, all sensed data is collected to train the network, and then data recovery is initiated with the help of the trained generator. A redundant residue number system algorithm [21], a fault-tolerant max-flow algorithm [22], a device pairing algorithm [23], a finding least connected points algorithm [24], and a least connected neighbour algorithm [25] are a few of the fault-tolerant algorithms that have been studied. The algorithms from the literature listed in Table I were evaluated with a minimum of 10 and a maximum of 100 nodes. This work extends the analysis of the existing algorithms to 150 IoT nodes in terms of network lifetime, MTTDL, reliability, and throughput of the network.

III. IMPLEMENTATION OF A DRAFT ALGORITHM IN THE IOT NETWORK
Safety-critical industrial IoT applications, such as IoT-based nuclear radiation monitoring systems, require high data availability and high fault tolerance against data loss. Protecting sensed data through redundant storage can therefore significantly improve system performance. Consider 'n' clusters with 'N' static nodes randomly deployed in a sensing field. Each node has a unique ID 'i', where i = 1, 2, 3, ..., P. Each node 'i' has a fixed memory space 'k' to hold data and parity items. The memory space is partitioned into two units based on the RAID5 structure: one unit stores data and the other stores parity. RAID is a technology that improves storage reliability and works by block-level striping with parity in the node storage area. At regular intervals, all nodes sense environmental factors and produce sensing data D_i. The effective capacity for storing sensing information is (k-1)/k of the total storage 'k'; the remaining 1/k of the storage space is used for redundant information. Each node maintains a Direct Neighbor Node (DNN) attribute list N_Att. Every sensor node in the network produces data and updates its data unit as well as its parity unit with the help of all DNNs in the cluster. A node failure is denoted as N_F. To implement the DRAFT algorithm, a few assumptions are made regarding the cluster-based multi-node IoT architecture:
• Each cluster has at least five nodes.
• All nodes have identical significance and storage space.
• All nodes have similar computation resources and initial energy supply.
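The storage split described above can be illustrated with a small node record. This is a minimal sketch and not the authors' implementation: it assumes that 'k' counts equally sized storage blocks, so that (k-1) blocks hold data and one block holds parity, and the names `Node`, `store`, `data_capacity`, and `parity_capacity` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Illustrative IoT node with a RAID5-like split of its local storage."""
    node_id: int
    cluster_id: int
    k: int                                          # total storage, assumed to be a block count
    neighbors: list = field(default_factory=list)   # N_Att reduced to neighbor IDs
    data: list = field(default_factory=list)        # D_i section
    parity: int = 0                                 # P_i section (a single block here)

    @property
    def data_capacity(self) -> int:
        # (k-1)/k of the k blocks are reserved for sensed data
        return self.k - 1

    @property
    def parity_capacity(self) -> int:
        # 1/k of the k blocks is reserved for redundant information
        return 1

    def store(self, sample: int) -> bool:
        """Append a sample to D_i; refuse it when the data unit is full."""
        if len(self.data) >= self.data_capacity:
            return False
        self.data.append(sample)
        return True


node = Node(node_id=1, cluster_id=1, k=5, neighbors=[2, 5])
accepted = [node.store(sample) for sample in range(6)]
print(node.data_capacity, node.parity_capacity, accepted)
```

Treating 'k' as a block count keeps the fractions (k-1)/k and 1/k exact; a byte-based layout would need rounding rules that the text does not specify.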
Based on these assumptions, the architecture is built as shown in Fig. 1. The basic architecture of an IoT node consists of five elements: the controller, the sensing element, the battery, local storage, and network connectivity. The DRAFT algorithm is invoked at each node, and the node's local storage is partitioned into two units based on the RAID5 mechanism. The inputs of the algorithm are the number of clusters 'n', the storage space 'k', the data items D_i, the 1/k parity unit, the (k-1)/k data unit, and the DNN attribute table N_Att. The outputs of the algorithm are the node ID, the parity generated from the DNNs, the data recovered from the DNNs, MTTDL, throughput, network lifetime, and reliability. In every round, if data loss occurs at any node, the data can be retrieved through the DNNs of that node in the cluster. A node collects new data samples continuously. New data updates only the node's own D_i section, but the node also transmits a copy of the newly sensed data to all of its direct neighbors, allowing them to update their P_i sections by computing the parity from the received data. The procedural steps of the DRAFT algorithm in the IoT network are as follows:

DRAFT Algorithm
Step 1: Input declaration
Input: number of clusters n, data item D_i, storage space k, DNN attribute table N_Att
Step 2: Output declaration
Output: parity generation, data recovery if any data loss occurs, MTTDL, network lifetime, throughput, and reliability
Step 3: Node structure
Algorithm defnode (node id, cluster id, data D_i, parity P_i)
begin
C_id = cluster id
N_id = node id
D_i = data of the ith node
P_i = parity of the ith node
end
Step 4: Storage unit structure
Algorithm DRAFT (n, k, DNN)
def node
Data unit = (k-1)/k
Parity unit = 1/k
update D_i, N_Att, P_i
Step 5: Initialize the first round
Node i transmitted data to cluster head i
if (D_i

The parity is computed by XORing all the inputs in a bitwise manner. For two Boolean inputs, XOR returns true when the inputs differ; for more than two inputs, it returns true when an odd number of inputs are true. A parity scheme is one of the common approaches to error detection. Cluster 1 is a group of five nodes, and each node is connected with three nodes. If any node fails, its neighbor nodes help to retrieve the data of the failed node by using the redundant information. Data storage in D_i and P_i and data recovery are represented in the topology of Fig. 2. The following relations are defined for the nodes and their corresponding neighbors: (1)
Using equations (1) to (5), P_i stores the parity information in cluster 1. Similarly, in all clusters in the network, the parity unit is updated based on the DNNs of the particular node, and the D_i unit is updated with the sensing data.
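The parity generation and neighbor-assisted recovery described here can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's equations (1) to (5): the neighbor table `DNN` is hypothetical and is not taken from Fig. 2, each D_i is reduced to a single integer block, and the helper names `compute_parity` and `recover` are invented for the example.

```python
from functools import reduce

# Hypothetical direct-neighbor table for one five-node cluster (assumption).
DNN = {1: [2, 5], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 1]}


def xor_all(values):
    """Bitwise XOR of an iterable of integers (0 for an empty iterable)."""
    return reduce(lambda a, b: a ^ b, values, 0)


def compute_parity(data, dnn):
    """P_i is the XOR of the data blocks received from node i's direct neighbors."""
    return {i: xor_all(data[j] for j in nbrs) for i, nbrs in dnn.items()}


def recover(failed, alive_data, parity, dnn):
    """Rebuild the failed node's block from one live neighbor's parity.

    A recovery candidate must be alive, must list the failed node as a
    neighbor, and all of its other neighbors must be alive as well.
    """
    for cand, nbrs in dnn.items():
        if cand == failed or cand not in alive_data or failed not in nbrs:
            continue
        others = [j for j in nbrs if j != failed]
        if all(j in alive_data for j in others):
            return parity[cand] ^ xor_all(alive_data[j] for j in others)
    return None  # no candidate can recover the block


data = {1: 0x11, 2: 0x22, 3: 0x33, 4: 0x44, 5: 0x55}
parity = compute_parity(data, DNN)

# Node 2 fails: its D_2 is rebuilt from a surviving neighbor's parity.
alive = {i: v for i, v in data.items() if i != 2}
print(hex(recover(2, alive, parity, DNN)))
```

Because XOR is its own inverse, XORing a candidate's parity with the blocks of its remaining neighbors cancels everything except the lost block.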
Using parity, the lost data can easily be rebuilt by XORing all the remaining values with the previously stored output values. Assume that node 2 fails: its data is recovered with the help of the stored redundant information and its corresponding neighbors. For the loss of D2 and D5 at the round 1 and round 2 instances, data recovery is carried out with the help of all DNNs and the parity of the failed node in that cluster, as represented in Table II. This shows that fault recovery is achieved using the DRAFT algorithm. The IoT network is grouped into 'n' clusters, and each cluster consists of at least five nodes. Each node has four states: the normal state, the degraded state, the recovery state, and the failed state. The reliability of the network and the MTTDL of the node are derived with the help of the state diagram in Fig. 3.
Let us consider a nonzero node replacement time, with a normal node failure rate in the degraded and rebuild states. S_0 is the normal state: all nodes in the cluster are operable and the data in every node is available. From this state, the node can pass to state S_1 at the rate kλ_0 (failure of any node). S_1 is the degraded state: one of the nodes in the cluster has failed and is waiting for replacement, while the remaining k-1 nodes are operable and their data is available. From this state, the node can move either to state S_F at the failure rate (k−1)λ_0 (failure of another operable node) or to S_2 at the repair rate µ_D. S_2 is the recovery state: the failed node has been replaced and the recovery process has started, while the remaining k-1 nodes are operable and their data is available. From this state, the node can move to S_0 at the rate α_1 (data recovery completed successfully), to S_1 at the failure rate λ_2 (failure of the replaced node during the data recovery process), or to S_F at the rate (k−1)λ_1 (failure of one of the operable nodes) or at the rate (k−1)β_1 (read error on one of the operable nodes during the data recovery process). S_F is the failed state, in which the data of the node is non-recoverable and unavailable. The state diagram is based on the above transitions.

where
• λ_0 is the failure rate of a node in a cluster.
• λ_1 is the failure rate of an operable node in the case of data unavailability.
• λ_2 is the failure rate of the replaced node during data recovery.
• µ_D is the repair rate of the faulty node.
• α_1 is the success rate of data recovery.
• β_1 is the rate of read errors on the operable nodes during data recovery.
The above state diagram is solved with the help of the Kolmogorov-Chapman differential equations, which are analyzed as follows. Initially, at t = 0, the probability of state S_0 is P(S_0(0)) = 1 and the probabilities of the remaining states are zero, hence P(S_1(0)) = P(S_2(0)) = P(S_F(0)) = 0.
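As a numerical companion to the state model, the MTTDL can be estimated as the expected time to reach S_F when starting from S_0. The sketch below is an assumption-laden illustration rather than the paper's derivation: the transition rates are read directly from the textual description of S_0, S_1, S_2, and S_F, the two S_2 to S_F rates are summed, the parameter values are hypothetical placeholders, and the function names `solve_linear` and `mttdl` are invented for the example.

```python
def solve_linear(a, b):
    """Solve a x = b for a small dense system by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def mttdl(k, lam0, lam1, lam2, mu_d, alpha1, beta1):
    """Expected time to absorption in S_F, starting from S_0."""
    s0_to_s1 = k * lam0                    # failure of any node
    s1_to_sf = (k - 1) * lam0              # failure of another operable node
    s1_to_s2 = mu_d                        # failed node replaced
    s2_to_s0 = alpha1                      # recovery completed
    s2_to_s1 = lam2                        # replaced node fails during recovery
    s2_to_sf = (k - 1) * (lam1 + beta1)    # operable-node failure or read error
    # Generator restricted to the transient states S_0, S_1, S_2.
    q = [
        [-s0_to_s1, s0_to_s1, 0.0],
        [0.0, -(s1_to_sf + s1_to_s2), s1_to_s2],
        [s2_to_s0, s2_to_s1, -(s2_to_s0 + s2_to_s1 + s2_to_sf)],
    ]
    # Mean absorption times t satisfy q t = -1 (vector of ones).
    return solve_linear(q, [-1.0, -1.0, -1.0])[0]


# Hypothetical rates per unit time; they are not the values of Table III.
params = dict(k=5, lam0=0.01, lam1=0.01, lam2=0.01, alpha1=0.5, beta1=0.001)
print(mttdl(mu_d=0.0, **params))   # no repair
print(mttdl(mu_d=1.0, **params))   # with repair and recovery
```

With the repair rate set to zero the model reduces to two consecutive failures, so the result equals 1/(kλ_0) + 1/((k−1)λ_0), which gives a quick sanity check on the generator matrix.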

IV. PERFORMANCE EVALUATION FOR DRAFT ALGORITHM
This section presents the simulation setup, the performance metrics, and a comparison of network performance with and without the DRAFT algorithm incorporated into the network.

A. Simulation Setup
The proposed DRAFT algorithm has been executed in CupCarbon. The simulation setup consists of 150 sensor nodes in 27 clusters, randomly deployed in a 200 m × 200 m square region. All sensor nodes are assumed to have homogeneous resources and characteristics. Each node's data size is 100 to 2000 units, and each node generates data items periodically. Every node in the cluster maintains a DNN attribute table that holds the MAC address, neighbor list, node ID, and storage space of each node, learned through resource management messages broadcast every 10 s. The simulation time of each round is set to 600 s. The fault-tolerant system parameters are listed in Table III.

B. Performance Metrics
The impact of the DRAFT algorithm on an IoT network is analyzed through the following performance metrics.

1) Network Lifetime:
In the simulation, the average network lifetime of the IoT network with and without the DRAFT algorithm has been evaluated. The algorithm recovers the data when a node failure occurs in the network. Let 'p' be the probability that a node fails in one round. The probability of a node failure per round is varied from 0.1% to 0.5% in increments. The total number of deployed nodes in a cluster is 'Ni' and the communication range is 'X'. Without the recovery scheme, the probability that at least one node fails in a round is P_f = 1 − (1 − p)^(N_d) (Bernoulli trials), where N_d is the network density, taken as a standard value. With the recovery scheme, 'R' is the recovery candidate, and there are two requirements for recovering the data: first, the recovery candidate must be alive, and second, all of its direct neighbors must be alive. The probability of R is P_V = (1 − p)^(N_d), where P_V is the probability of a recovery candidate. The network lifetime with the proposed recovery scheme is longer than without the DRAFT algorithm, as clearly shown in Fig. 4. As the probability of node failure increases, the network lifetime drops both with and without the recovery scheme.
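The two probabilities used in this analysis are easy to tabulate. The sketch below assumes a hypothetical network density N_d = 5, since the standard value used in the simulation is not stated here, and the function names are illustrative only.

```python
def p_round_failure(p: float, nd: int) -> float:
    """P_f: probability that at least one of nd nodes fails in a round."""
    return 1.0 - (1.0 - p) ** nd


def p_recovery_candidate(p: float, nd: int) -> float:
    """P_V: probability that the recovery candidate and its neighbors are all alive."""
    return (1.0 - p) ** nd


ND = 5  # hypothetical network density (assumption, not from the paper)

# Per-round node failure probability swept from 0.1% to 0.5%.
for p in (0.001, 0.002, 0.003, 0.004, 0.005):
    pf = p_round_failure(p, ND)
    pv = p_recovery_candidate(p, ND)
    print(f"p={p:.3f}  P_f={pf:.5f}  P_V={pv:.5f}")
```

Under these formulas P_f grows with p while P_V shrinks, which matches the stated trend that the network lifetime drops as the node failure probability increases.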
2) Throughput:
Throughput is defined as the amount of data transmitted successfully from one node to another in the network within a given period. As the probability of node failure decreases, the throughput of the network gradually increases, as shown in Fig. 5. When a node failure occurs, the data is still transmitted successfully because of the DRAFT algorithm. With the recovery scheme, the throughput is double that obtained without the recovery scheme.

3) MTTDL:
Mean time to data loss is the average time until data loss occurs in a node. Data loss occurs due to error situations in the network. Backup and data recovery methods help to recover data or to avoid data loss in IoT networks. When a node in the cluster fails, a higher data recovery rate lengthens the MTTDL. This helps to improve network reliability and data availability. The simulation results in Fig. 7 show the best recovery rate when the DRAFT algorithm is incorporated into the network.

4) Reliability:
Reliability is the capacity of the network to keep working over time in the presence of node failures. Here, time is taken as normalized time, that is, time scaled to the range 0 to 1. For an IoT network, the reliability is high at t = 0 and gradually decreases with time, as shown in Fig. 6. The reliability mainly depends on the failure rate and repair rate of the node, and on the data recovery of the node.
www.ijacsa.thesai.org