An Efficient Data Replication Technique with Fault Tolerance Approach using BVAG with Checkpoint and Rollback-Recovery

Data replication is a well-established approach in distributed database management and computational intelligence schemes because it improves data access and reliability. The performance of a data replication technique becomes crucial when a failure interrupts a transaction. To develop a more efficient data replication technique that can cope with failure, a fault tolerance approach needs to be applied to the data replication transaction. Fault tolerance is a core issue in transaction management because it keeps a failure-prone transaction in an operational mode. In this study, a data replication technique known as Binary Vote Assignment on Grid (BVAG) is combined with a fault tolerance approach named Checkpoint and Rollback-Recovery (CR) to evaluate the effectiveness of applying a fault tolerance approach in a data replication transaction. The Binary Vote Assignment on Grid with Checkpoint and Rollback-Recovery Transaction Manager (BVAGCRTM) is used to run the proposed BVAGCR method. The performance of the proposed BVAGCR is compared with standard BVAG in terms of total execution time for a single data replication transaction. The experimental results reveal that, by using the CR fault tolerance approach, BVAGCR improves the BVAG total execution time in a failure environment by about 31.65%. Besides improving the total execution time of BVAG, BVAGCR also reduces the time taken to execute the most critical phase in BVAGCRTM, the Update (U) phase, by 98.82%. Therefore, based on the benefits gained, BVAGCR is recommended as a new and efficient technique for obtaining reliable data replication performance under failure conditions in distributed databases.
Keywords: Data replication; computational intelligence; fault tolerance; binary vote assignment on grid; checkpoint and rollback-recovery


I. INTRODUCTION
Data replication is a useful technique for a Distributed Database System (DDS) because it provides high availability and efficient access to required data, and it can be applied in grid computing to improve the efficiency of the system [1,2]. Data replication can also be one of the influential techniques that expand the usefulness of computational intelligence structures. It involves frequent, incremental copying of data from one database to another in a continuous manner, which increases availability, provides low response times and allows fast local access to the system [3,4]. Despite these strengths in handling distributed databases, data replication techniques still have weaknesses when dealing with failure cases.
Handling data replication in failure cases is crucial in order to preserve the effectiveness of the system. The main challenge of data replication is that the replicas have to be kept consistent when updates occur, even if a failure happens while the transaction is running [4]. These problems can be addressed by enabling fault tolerance. Fault tolerance is a crucial issue in distributed computing because it keeps a transaction in an operational condition when it is subject to failure. Its most important role is to keep the transaction working even if one of its parts stops or becomes faulty [5]. Fault tolerance is a dynamic approach used to keep interrelated transactions together and to sustain reliability and availability in a DDS. Efficient fault tolerance approaches help to detect faults and, if possible, to recover from them [6].
Based on previous studies, the combination of a data replication technique with the Checkpoint and Rollback-Recovery (CR) fault tolerance approach in a distributed database has rarely been analyzed, despite the promising potential of each technique to lessen the total execution time in failure-prone situations [7]. For example, the research in [7] explored the performance of transaction processing using CR only, replication only and the combination of both techniques in a linear workflow in the presence of failure. The results reveal the conditions under which each technique leads to improved performance. In addition, [8] concludes that the CR approach is essential not only for transaction process replication but also for security issues.
Despite these good results, only a few researchers have explored the effectiveness of combining a data replication technique with the CR fault tolerance approach. It is common practice to use Checkpoint and Rollback-Recovery (CR) to provide adequate failure recovery and improve transaction reliability [9]. A checkpoint is mainly performed to save information linked to the completed portion of a transaction. When a transaction failure occurs, the transaction can be resumed from the last successful checkpoint through rollback and information retrieval. Without the checkpoint technique, the transaction has to repeat its entire execution from the very beginning [10]. Therefore, in this study, a data replication technique called Binary Vote Assignment on Grid (BVAG) is combined with CR for the purpose of evaluating the efficiency of hybridizing a data replication technique with a fault tolerance approach to improve the performance of a single data replication transaction in the presence of failure. The proposed method, BVAGCR, is implemented in the Binary Vote Assignment on Grid with Checkpoint and Rollback-Recovery Transaction Manager (BVAGCRTM).
The paper is arranged as follows. The next section, Literature Review, details the BVAG data replication technique and the CR fault tolerance approach. Section III, Methodology, describes the procedure of the BVAGCR technique, which is employed via BVAGCRTM. The Result and Discussion section discusses the outcomes obtained from standard BVAG and BVAGCR and compares both techniques in terms of execution time while managing a data replication transaction in the presence of failure. Finally, the conclusions of this research and suggestions for future research are provided in the Conclusion.

II. LITERATURE REVIEW
A. Binary Vote Assignment on Grid (BVAG)
The concept of Binary Vote Assignment on Grid (BVAG) is to replicate the data from the primary replica to the neighbours' replicas, which are located at the sites adjacent to the primary replica [11]. Full replication can result in a huge waste of storage space and consume a lot of bandwidth [12]. By using BVAG, the execution time of the replication process in a distributed database can be reduced because data are replicated only at the specified sites [12]. Whereas query expansion augments an initial user query with additional terms related to user requirements [13], BVAG focuses on increasing write-query availability through replication. BVAG takes a new direction in replication that helps to maximize write availability at little communication cost because only a minimum quorum size is needed. Furthermore, the replication is interconnected with the transaction procedure [14].
In BVAG, all sites are logically organized in a two-dimensional grid structure. For example, if a BVAG consists of nine sites, they are logically organized in a 3 x 3 grid structure, as shown in Fig. 1. As can be seen in Fig. 1, site A is a neighbour of sites B and D because it is logically located adjacent to them. Hence, the four sites at the corners of the grid have only two adjacent sites, the other sites on the boundaries have only three neighbours, and the site located in the middle of the grid has four neighbours [14]. Each site has a primary data file. Data are replicated from the primary site to the neighbouring sites [11]. For simplicity, the primary site of any data file and its neighbours are assigned vote one (1), and all other sites are assigned vote zero (0). A neighbour binary vote assignment on grid, B, is a function such that B(i) ∈ {0,1}, 1 ≤ i ≤ d, where B(i) is the vote assigned to site i. This assignment is treated as an allocation of replicated copies, and a vote assigned to a site results in a copy allocated at that neighbour. Because the data are replicated to the neighbours, the possible number of data replicas from each site, d, should then be: d ≤ quorum (the number of neighbours + the copy of the data at the primary site itself).
For example, the primary data from site A, called "a", are replicated to sites B and D, which are its neighbours. Site E, which holds the primary data "e", has four neighbours, namely sites B, D, F and H, which receive the replicated data of "e". As such, the data of site E have five replicas. Meanwhile, the primary data "f" from site F are replicated to sites C, E and I. The quorum used is based on the total number of replicated data and the primary data, d, which can be three, four or five [11,14].
The transaction procedure in BVAG is called the Binary Vote Assignment on Grid Transaction Manager (BVAGTM). BVAGTM is applied to control the transaction of each data replication process. The primary site of any primary data file and its replicas are assigned different votes depending on their condition. Two types of votes are used in this study, as shown in Table I. Zero (0) specifies that the site is available (free), while one (1) indicates that the site is unavailable (busy). The status of each site is statistically independent of the others. When a site is available, the copy at the site is available too; otherwise, it is unavailable [11,14].
The IL phase locks the primary site if the primary site is in the available (0) status. If the primary site is busy (status = 1), the primary site is released (RL). After the primary site has been locked, the PL phase determines the status of each neighbouring site. All neighbouring sites are locked if they are in the available (0) status. Otherwise, the neighbours' sites are released (RL). Then, the OQ phase declares that the quorum obtained is enough for the transaction to continue. Next, the primary data are updated in the U phase. Afterward, the updated primary data, also called the new primary data, are replicated to the neighbours' sites in the C phase. Finally, the transaction unlocks (UL) all the sites that are involved in the transaction. The summary of BVAGTM is shown in Fig. 2.
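The neighbour relation and the binary vote assignment described above can be illustrated with a short sketch. It assumes that the nine sites are labelled A to I row by row on the 3 x 3 grid, which is consistent with the examples in the text; the function names `build_grid`, `neighbours` and `vote_assignment` are hypothetical and are not taken from the original BVAG implementation.

```python
from string import ascii_uppercase


def build_grid(rows, cols):
    """Label sites A, B, C, ... row by row on a rows x cols logical grid."""
    labels = iter(ascii_uppercase)
    return [[next(labels) for _ in range(cols)] for _ in range(rows)]


def neighbours(grid, site):
    """Return the sites logically adjacent (up, left, right, down) to `site`."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == site:
                candidates = [(r - 1, c), (r, c - 1), (r, c + 1), (r + 1, c)]
                return [grid[i][j] for i, j in candidates
                        if 0 <= i < rows and 0 <= j < cols]
    raise ValueError(f"unknown site: {site}")


def vote_assignment(grid, primary):
    """Binary vote B(i): 1 for the primary site and its neighbours, else 0."""
    holders = set(neighbours(grid, primary)) | {primary}
    return {site: int(site in holders) for row in grid for site in row}


grid = build_grid(3, 3)
votes_e = vote_assignment(grid, "E")
replicas_e = sum(votes_e.values())  # primary copy plus neighbour copies
print(neighbours(grid, "A"), neighbours(grid, "E"), replicas_e)
```

Under this labelling, the sketch reproduces the cases in the text: site A has neighbours B and D, site E has neighbours B, D, F and H, and data "e" ends up with five copies.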
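The phase sequence of BVAGTM can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function `run_bvag_transaction` and its arguments are hypothetical, and the sketch assumes that the primary lock is also released when any neighbour is busy.

```python
FREE, BUSY = 0, 1  # site status votes: 0 = available, 1 = unavailable


def run_bvag_transaction(status, data, primary, neighbour_sites, new_value):
    """Run one BVAGTM-style transaction and return the phases executed."""
    phases = []
    # IL: lock the primary site only if it is free, otherwise release (RL).
    if status[primary] != FREE:
        return phases + ["RL"]
    status[primary] = BUSY
    phases.append("IL")
    # PL: lock the neighbours only if every one of them is free.
    if any(status[site] != FREE for site in neighbour_sites):
        status[primary] = FREE  # assumption: the primary lock is released too
        return phases + ["RL"]
    for site in neighbour_sites:
        status[site] = BUSY
    phases.append("PL")
    # OQ: with all-or-nothing locking, the quorum (primary + neighbours)
    # is obtained by construction once both lock phases succeed.
    phases.append("OQ")
    # U: update the primary data.
    data[primary] = new_value
    phases.append("U")
    # C: replicate the new primary data to the neighbours' sites.
    for site in neighbour_sites:
        data[site] = new_value
    phases.append("C")
    # UL: unlock every site involved in the transaction.
    for site in [primary, *neighbour_sites]:
        status[site] = FREE
    phases.append("UL")
    return phases


status = {site: FREE for site in "ABCDEFGHI"}
data = {}
phases = run_bvag_transaction(status, data, "E", ["B", "D", "F", "H"], "e1")
print(phases)
```

In this sketch a transaction on site E with all sites free passes through every phase, while a busy neighbour stops the transaction after IL and releases the locks.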

B. Checkpoint and Rollback-Recovery (CR)
Checkpoint with Rollback-Recovery (CR) is a renowned fault tolerance approach. A checkpoint is a process that stores the recent state of a transaction in stable (non-volatile) storage [9]. Checkpoints are taken periodically during the normal execution of a transaction. The information related to the transaction is saved on stable storage with the intention of using it in case of site failures. The saved information comprises the transaction state, its environment, the values of registers, etc. When an error is detected, the transaction is rolled back to the last saved state [15]. Fig. 3 shows the summary of the CR approach. The checkpoint mechanism takes a snapshot of the transaction state and stores the information on a non-volatile storage medium [16]. When a failure occurs, the restore mechanism copies the last known checkpointed state of the transaction back into memory and continues processing. The basic idea behind CR is the saving and restoration of the transaction state. By saving the current state of the transaction periodically or before critical code sections, CR provides the baseline information needed to restore lost state in the event of a transaction failure. CR is one of several time-efficient fault tolerance approaches [17]. Besides reducing the execution time, CR can also reduce the computing resources required [18].
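The save-and-restore idea behind CR can be sketched in a few lines. The example below is an assumption-laden illustration, not the mechanism used in the cited works: it stores the state as a JSON file, and the helper names `save_checkpoint`, `load_checkpoint` and `run_with_checkpoints` are hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the transaction state to stable storage via an atomic rename."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to the storage device
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the last saved state, or None if no checkpoint exists."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def run_with_checkpoints(steps, path, fail_at=None):
    """Run `steps` in order, resuming after the last checkpointed step."""
    state = load_checkpoint(path) or {"done": 0, "log": []}
    for index in range(state["done"], len(steps)):
        if index == fail_at:
            raise RuntimeError(f"simulated failure in step {steps[index]}")
        state["log"].append(steps[index])
        state["done"] = index + 1
        save_checkpoint(path, state)  # checkpoint after each completed step
    return state["log"]


path = os.path.join(tempfile.mkdtemp(), "txn.ckpt")
steps = ["s1", "s2", "s3", "s4"]
try:
    run_with_checkpoints(steps, path, fail_at=2)  # fails before s3 runs
except RuntimeError:
    pass
log = run_with_checkpoints(steps, path)  # rollback-recovery: resumes at s3
print(log)
```

Writing to a temporary file and renaming it is one common way to avoid leaving a half-written checkpoint behind if the failure happens during the save itself.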

III. METHODOLOGY
In this section, the methodology of the proposed technique, called BVAGCR, is described. Fig. 4 illustrates the algorithm of the BVAGCR technique applied in the Binary Vote Assignment on Grid with Checkpoint and Rollback-Recovery Transaction Manager (BVAGCRTM) for a single data replication transaction.
First, the following notation is defined:

1) T is the transaction,
2) ( ) is the checkpoint transaction,
3) λ represents the different groups of transaction T (before and until the quorum is obtained); λ can be either α or β,
4) ( ) is the transaction of group λ,
5) χ is the data to be updated,
6) ( ) is the queue number for transaction ( ), i = 1,2,3, …,
7) ( ) is the transaction of λ for data χ in queue 1,
8) ( ) is the checkpoint transaction of λ for data χ in queue 1,
9) ( ) is the checkpoint file,
10) ( ) is the checkpoint file for the transaction of λ for data χ in queue 1,
11) ( ) is the status of the required site,
12) ( ) stands for the Primary Replica site,
13) ( ) stands for the Neighbour Replica site,
14) ( ) is the status for ( ),
15) ( ) is the status for ( ),
16) ( ) is the status of ( ) which holds data χ,
17) ( ) is the status of ( ) which holds data χ,
18) ( ) is the quorum needed to continue the transaction of ( ),
20) ( ) is the database for ( ),
21) ( ) is the database for ( ),
22) ( ) is the database of ( ) that consists of data χ,
23) ( ) is the database of ( ) that consists of data χ.
A data replication transaction can request to update any data file at any replica. BVAGCRTM first checks whether any checkpoint file has been saved in BVAGCRTM. If there is none, BVAGCRTM accepts a new data replication transaction.
A new data replication transaction, named ( ), which requests to update data ( ), is placed first in the queue ( ) in BVAGCRTM. The transaction checks whether the status of the primary replica ( ) that holds data ( ) is free (0), busy (1) or failed (-1). If the primary replica is free to be used in the transaction, its status is locked as 1. Otherwise, the primary replica is released because it is unavailable. The status of ( ) and data ( ) are saved in the checkpoint file named ( ). Next, the transaction requests to lock all the neighbours' replicas ( ). If all ( ) are free, they are locked as 1. However, if any ( ) is busy, all of them are released for other transactions. The status of each ( ) is then saved in the ( ) file.
Afterward, the total of ( ) is declared and saved in the ( ) file. After that, the Update and Commit () function is executed. Data ( ) are updated at ( ), which is the primary database that holds data ( ).
In the next section, an experiment under a failure condition is conducted to evaluate the performance of BVAG and BVAGCR. The results obtained are also presented and discussed in the next section.
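The effect of saving a checkpoint in each phase, as described in this section, can be sketched by comparing which phases must be executed after a failure with and without a checkpoint. The sketch is a simplified illustration under stated assumptions: the checkpoint is an in-memory dictionary rather than a file, the phase list follows the BVAGTM description, and the names `run_attempt`, `run_transaction` and `SiteFailure` are hypothetical.

```python
PHASES = ["IL", "PL", "OQ", "U", "C", "UL"]


class SiteFailure(Exception):
    """Raised when the simulated site fails during a phase."""


def run_attempt(checkpoint=None, fail_phase=None):
    """Run the phases from the checkpointed phase (or from the start)."""
    start = 0
    if checkpoint and "phase" in checkpoint:
        start = PHASES.index(checkpoint["phase"])
    executed = []
    for phase in PHASES[start:]:
        if checkpoint is not None:
            checkpoint["phase"] = phase  # record the phase being entered
        if phase == fail_phase:
            raise SiteFailure(phase)
        executed.append(phase)
    if checkpoint is not None:
        checkpoint.clear()  # transaction finished, nothing left to resume
    return executed


def run_transaction(use_checkpoint, fail_phase):
    """Fail once in `fail_phase`, then return the phases run after recovery."""
    checkpoint = {} if use_checkpoint else None
    try:
        run_attempt(checkpoint, fail_phase)
    except SiteFailure:
        pass  # the recovery period elapses here
    return run_attempt(checkpoint)


rerun_without_cr = run_transaction(use_checkpoint=False, fail_phase="U")
rerun_with_cr = run_transaction(use_checkpoint=True, fail_phase="U")
print(rerun_without_cr, rerun_with_cr)
```

Without a checkpoint the whole phase sequence is repeated after recovery, whereas with a checkpoint only the U phase and the phases after it are executed again.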

IV. RESULT AND DISCUSSION
An experiment on a single transaction with a failure at the primary replica was conducted using a MATLAB simulation. The time between failure and recovery is assumed to be 10 seconds, and the transaction continues after failure recovery. In this transaction, site E is the primary replica holding primary data e, while sites B, D, F and H are the neighbours' replicas that receive the copy of data e from site E. In this case, the transaction ( ) requests to update data ( ) and replicate the data to the neighbours' replicas. The transaction failure is assumed to occur in the Update (U) phase because it is the critical phase in BVAG and BVAGCR. Fig. 5 and Fig. 6 show the transaction ( ) with the failure condition for BVAG and BVAGCR, respectively. As can be seen in Fig. 5 (BVAG), the information related to the transaction in BVAGCRTM is not saved in a checkpoint file ( ). Thus, when a failure occurs at T10, the transaction needs to be started all over again because no information can be recovered.
Meanwhile, in BVAGCR (see Fig. 6), the information related to the transaction is saved in the checkpoint file (highlighted in grey) for each phase of BVAGCRTM. Thus, once the transaction ( ) fails, as at T13, the information can be retrieved from the checkpoint file ( ) and the transaction can be resumed from the Update (U) phase because the failure occurred in that particular phase.
www.ijacsa.thesai.org
The execution times for each phase in the BVAG and BVAGCR methods were recorded before and after the failure, as shown in Table II and Table III. As presented in Table II, the overall time taken to complete a transaction using BVAG is 15.8574 seconds, which includes the estimated failure duration (10 seconds). The transaction ran four phases, IL, PL, CQ and OQ, which took 0.4931 seconds to execute before the failure occurred. After failure recovery, the transaction must be run again from the start because no checkpoint file ( ) is found. The time taken to rerun the transaction is 5.3643 seconds. For the critical phase U, the time needed to finish it is 4.7809 seconds.
As displayed in Table III, BVAGCR needs 10.8381 seconds to finish a transaction, covering the time before the failure and after failure recovery as well as the estimated failure duration (10 seconds). The transaction performed the same four phases as BVAG, IL, PL, CQ and OQ, which took 0.7443 seconds before the failure happened. Once the failure has been recovered, the transaction only needs to rerun from the U phase onwards because it retrieves the information about the transaction saved in the checkpoint file ( ). Based on the ( ) file, the last saved information of the current transaction is in phase U. The time used to rerun the transaction from the U phase until the data have been replicated to all neighbours' sites is 0.0938 seconds. For the critical phase (U), the time needed to finish it is 0.0516 seconds. Fig. 7 compares BVAG and BVAGCR in terms of total time, Update phase time, execution time before failure and execution time after failure. Based on Fig. 7, the total time taken for a single transaction of BVAGCR (10.8381 seconds) is shorter than that of BVAG (15.8574 seconds). BVAGCR improves on the standard BVAG method by 31.65% in terms of total execution time. In addition, BVAGCR improves the performance of standard BVAG in the critical Update (U) phase by 98.82%. Thus, based on the results obtained, the objective of this study, which is to improve the efficiency of standard BVAG by proposing a new data replication technique with a fault tolerance approach (BVAGCR), has been achieved.
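The arithmetic behind the reported totals and improvements can be checked with a short calculation. The sketch assumes that the total time is the sum of the time before failure, the 10-second failure duration and the time after recovery, and that improvement is measured relative to the BVAG value; the helper names are hypothetical. With the timings quoted in the text, the totals and the 31.65% figure are reproduced, while the quoted U-phase times give roughly 98.92%, slightly different from the reported 98.82%; since Tables II and III are not reproduced here, the source of that gap cannot be determined from the text alone.

```python
def total_time(before_failure, downtime, after_recovery):
    """Total execution time of one transaction, including the failure period."""
    return before_failure + downtime + after_recovery


def improvement_percent(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return (baseline - improved) / baseline * 100


# Timings in seconds, as quoted in the text for BVAG and BVAGCR.
bvag_total = total_time(0.4931, 10.0, 5.3643)
bvagcr_total = total_time(0.7443, 10.0, 0.0938)

total_gain = improvement_percent(bvag_total, bvagcr_total)
update_gain = improvement_percent(4.7809, 0.0516)

print(f"BVAG total:   {bvag_total:.4f} s")
print(f"BVAGCR total: {bvagcr_total:.4f} s")
print(f"Total execution time improvement: {total_gain:.2f} %")
print(f"Update phase improvement:         {update_gain:.2f} %")
```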

V. CONCLUSION
This study has explored a new combination of data replication and a fault tolerance approach, called BVAGCR. The performance of BVAGCR was tested using a MATLAB simulation. Standard BVAG and BVAGCR were compared in order to evaluate the effectiveness of applying the CR fault tolerance approach to the BVAG data replication technique. The results show that the proposed BVAGCR outperforms standard BVAG in terms of total execution time, time taken to execute the U phase and time taken to rerun the transaction after failure recovery. Therefore, BVAGCR can be proposed as an alternative technique that is efficient and reliable for replicating data under failure conditions. To test the robustness of the proposed BVAGCR, future work should explore the application of this method to big data. In addition, BVAGCR can be implemented together with data mining methods in order to further improve the performance of the data replication technique.