Soft Error Tolerance in Memory Applications

Abstract: This paper proposes a new method to detect and correct multi-bit errors in memory applications using a combination of a clustering approach, the Bit-Per-Byte error detection technique, and Majority Logic Decodable (MLD) codes. The likelihood of soft errors grows with system complexity, lower operating voltages, the exponential growth in the number of transistors per chip, higher clock frequencies, and device shrinking, all of which erode memory reliability. Memories are among the most sensitive parts of a computer system, and soft errors in memories may cause an instruction to malfunction. Several techniques are already in practice to mitigate soft errors. MLD codes have proved effective for memory applications because of their ability to correct a large number of errors. However, because memories hold a large number of bits, data word size is a key constraint of the MLD method, so this method places particular emphasis on the size of the data word. The proposed method aims to detect and correct up to seven bit errors with less computational time. Unlike MLD, it also works efficiently for adjacent errors. Experimental results show that the proposed approach outperforms the existing dominant approach with respect to the number of erroneous bits detected and corrected and the computational time overhead.


I. INTRODUCTION
The complexity of modern systems, and the intricate way in which their software and hardware interact, means that a system may be extremely sensitive to soft errors. Soft errors are a particular concern when designing high-availability systems or systems used in electronically hostile environments [1]-[4]. In memory applications, a soft error can change an instruction or any data value [3]-[5]. Almost all system chips have embedded memories such as ROM, DRAM, SRAM and flash memory. Soft errors in such memories are increasing alarmingly because current technology focuses on ever smaller devices, which leads to denser circuit integration [6]. Integrated circuits are prone to particle strikes and radiation, which can cause a memory cell to change its state and hold a value different from the one that was stored. The small size of transistors and capacitors and low operating voltages also contribute to soft errors in memories. Fault-tolerant techniques in memory architecture are therefore fundamental to ensuring reliability for users: a small flaw or glitch in a memory cell can change an instruction or cause a whole program to work incorrectly, leading to incorrect information or the loss of valuable data.
Several dominant approaches already exist to provide fault tolerance in memory applications. For example, in satellite applications, Hamming codes and parity codes are used to protect memory devices. Other error detection and correction methods include Error Correction Codes (ECC) [7]-[9] and Euclidean geometry low-density parity-check (EG-LDPC) codes [10], [11]. However, almost all of these methods incur area and time overhead and a significant power consumption penalty. They also have low error detection and correction rates and perform poorly with large data words. To overcome these barriers, we propose a fault-tolerant technique that works with larger data words and requires less processing time.
In this paper, an error detection and correction technique is proposed to protect memory applications. The method combines the salient features of the clustering approach [12], the Bit-Per-Byte error detection technique, and Majority Logic Decodable (MLD) codes [13]-[16]. MLD codes are used because of their ability to detect multiple bit upsets, the bit-per-byte technique minimizes the time required to detect an error, and the clustering approach works very efficiently for adjacent errors. The proposed method provides highly efficient error detection and correction and can correct up to 7-bit upsets in a 49-bit data block.
The rest of this paper is organized as follows. Section II reviews related work in this area of research. The proposed methodology and associated examples are discussed in Section III. The experimental analysis is presented in Section IV, and Section V concludes the paper.

II. RELATED WORK
Several techniques are already in practice to provide error detection and correction. Some of them are discussed below.
Naeimi et al. [8] proposed a fault-tolerant memory architecture that can tolerate faults both in the storage unit and in the encoder or decoder. They proposed a fast and compact error-correcting technique known as one-step majority-logic correction, which decodes one bit per step, correcting it if it is erroneous, and outputs the correct code word after all bits have been processed. This method requires as many cycles as there are bits in the code word for both detection and correction, which seriously degrades memory access time.
Shih-Fu et al. [7] presented an error detection method for difference-set cyclic codes using a majority logic decoding scheme. Majority logic decodable codes are well suited to memory applications because they can handle a large number of errors, but their long decoding time may lower memory performance. MLD was first introduced for Reed-Muller codes. The authors described a plain majority logic decoder (MLD) whose circuit consists of four components: i) a cyclic shift register; ii) an XOR matrix; iii) a majority gate; and iv) an XOR gate for correcting the code word bit under decoding. It can correct multiple bit flips, depending on the number of parity-check equations [6]. They then proposed a modified version of MLD known as the Majority Logic Detector/Decoder (MLDD). The MLDD technique needs 15 cycles to correct an error. However, it can detect and correct only two bit errors in a 15-bit data word, and its time requirement is high enough to degrade memory access time.
Jayalakshmi et al. [5] proposed a modified version of MLDD. It improves on existing techniques by detecting errors in fewer cycles, but it uses additional logic, which results in an area overhead. Another limitation is that the method needs three additional cycles to correct an error.

III. PROPOSED METHODOLOGY TO DETECT AND CORRECT ERRORS
This section explains the proposed method in detail, walking through it step by step. Examples and pictorial representations are provided along with the explanation to aid understanding.

A. Memory with MLDD
The existing MLDD [5] is modified to improve its performance. The existing MLDD is based on Euclidean Geometry Low Density Parity Check (EG-LDPC) codes [6]. Fig. 1 shows how the modified MLDD proposed here is used in a memory system.

B. Encoder Architecture
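The encoder is designed from EG-LDPC codes. Type-I two-dimensional EG-LDPC codes have the following parameters for any integer t >= 2:
• code length n = 2^(2t) - 1;
• number of information bits k = 2^(2t) - 3^t;
• minimum distance d_min = 2^t + 1;
• an n × n parity-check matrix whose row weight and column weight are both 2^t.
With minimum distance 2^t + 1, such a code can correct up to 2^(t-1) errors, i.e. two errors for t = 2.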
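For type-I two-dimensional EG-LDPC codes, the standard parameters are n = 2^(2t) - 1, k = 2^(2t) - 3^t and d_min = 2^t + 1. The small Python helper below evaluates them, which makes it easy to confirm that t = 2 gives the (15, 7, 5) code used in the rest of this section. The function name `eg_ldpc_params` is hypothetical, and the correctable-error count is derived from the minimum distance as floor((d_min - 1)/2); this is a sketch for checking the arithmetic, not part of the proposed design.

```python
def eg_ldpc_params(t):
    """Parameters of a type-I two-dimensional EG-LDPC code (hypothetical helper)."""
    if t < 2:
        raise ValueError("t must be >= 2")
    n = 2 ** (2 * t) - 1             # code length
    k = 2 ** (2 * t) - 3 ** t        # number of information bits
    d_min = 2 ** t + 1               # minimum distance
    weight = 2 ** t                  # row/column weight of the n x n parity-check matrix
    correctable = (d_min - 1) // 2   # equals 2 ** (t - 1) correctable bit errors
    return {"n": n, "k": k, "d_min": d_min, "weight": weight, "correctable": correctable}


print(eg_ldpc_params(2))  # {'n': 15, 'k': 7, 'd_min': 5, 'weight': 4, 'correctable': 2}
```

For t = 3 the same formulas give the (63, 37, 9) code, which corrects up to four errors.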
If we take t = 2 and determine the other parameters accordingly, we obtain a (15, 7, 5) EG-LDPC code, whose generator matrix is shown in Fig. 2; Fig. 3 shows the architecture of an encoder circuit [7] for this code. The information bits are denoted i_0, ..., i_6. The check bits are calculated as linear sums (XORs) of the information bits. The information bits are copied to positions c_0, ..., c_6 of the encoded vector and the check bits to positions c_7, ..., c_14, which yields the encoded vector (code word).
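The encoding step can be sketched in Python as follows. This is an illustration under an assumption, not the circuit of Fig. 3: the (15, 7, 5) code is assumed to be the cyclic code generated by g(x) = 1 + x^4 + x^6 + x^7 + x^8, which is what the type-I EG(2, 2^2) construction gives when GF(16) is built from the primitive polynomial x^4 + x + 1, so its check bits may differ from those of Fig. 2. The names `encode` and `syndrome` are hypothetical. As in the text, the information bits occupy c_0 ... c_6 and the check bits c_7 ... c_14.

```python
# Sketch of a systematic (15, 7, 5) EG-LDPC encoder.
# ASSUMPTION: the code is the cyclic code generated by
# g(x) = 1 + x^4 + x^6 + x^7 + x^8 (EG(2, 2^2) built with x^4 + x + 1).
N, K = 15, 7
G_POLY = [1, 0, 0, 0, 1, 0, 1, 1, 1]  # coefficients g_0 .. g_8


def encode(info):
    """Return code word c_0..c_14 with c_0..c_6 = info and c_7..c_14 = check bits."""
    if len(info) != K:
        raise ValueError("expected 7 information bits")
    # Choose a(x), deg < 7, so that the low 7 coefficients of a(x) * g(x)
    # equal the information bits. Since g_0 = 1, each a_i is fixed by
    # information bit i and the a_j chosen before it.
    a = []
    for i in range(K):
        bit = info[i]
        for j in range(i):
            bit ^= a[j] & G_POLY[i - j]
        a.append(bit)
    # Multiply a(x) * g(x) over GF(2); the degree is at most 14, so no wrap-around.
    code = [0] * N
    for i, a_i in enumerate(a):
        if a_i:
            for j, g_j in enumerate(G_POLY):
                code[i + j] ^= g_j
    return code


def syndrome(code):
    """Parity checks: rows of the circulant H are cyclic shifts of the line {0, 1, 3, 7}."""
    return [code[j] ^ code[(j + 1) % N] ^ code[(j + 3) % N] ^ code[(j + 7) % N]
            for j in range(N)]


word = encode([1, 0, 0, 0, 0, 0, 0])
print(word)            # [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1]
print(syndrome(word))  # all zeros for a valid code word
```

Because any seven consecutive positions of a cyclic code form an information set, placing the information bits in c_0 ... c_6 is always possible; the sketch simply solves for the multiple of g(x) that has those low-order coefficients.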

C. Design Structure of Corrector
One-step majority-logic correction is a fast and efficient error-handling technique [10]. There is a class of ECCs that are one-step majority-logic correctable, and type-I two-dimensional EG-LDPC codes are one example. This section presents the one-step majority-logic corrector for EG-LDPC codes. A linear sum called a parity-check sum is formed by computing the inner product of the received vector and a row of the parity-check matrix. The one-step majority-logic corrector generates parity-check sums from selected rows of the parity-check matrix, chosen so that every check sum contains the bit being decoded and no other bit appears in more than one check sum. The following steps correct a potential error in one bit, e.g. c_{n-1}:
1) Generate parity-check sums by computing the inner product of the received vector and each selected row of the parity-check matrix.
2) Feed the check sums into a majority gate. If the output of the majority gate is 1, bit c_{n-1} is corrected by inverting its value.
The architecture of a serial one-step majority logic corrector for (15, 7, 5) EG-LDPC code is shown in Fig. 4.
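To make the two steps above concrete, the following Python sketch runs a serial one-step majority-logic corrector over all 15 positions (plain MLD, 15 cycles), shifting each bit into position c_14 in turn. It rests on assumptions that are not taken from the figures: the code is assumed to be the cyclic code generated by g(x) = 1 + x^4 + x^6 + x^7 + x^8, and the four check sums orthogonal on c_14 were derived from the lines {j, j+1, j+3, j+7} (mod 15) of EG(2, 2^2) built with the primitive polynomial x^4 + x + 1, so they may differ from the check sums of Fig. 4. The names `codeword`, `majority_decode` and `CHECK_SUMS` are hypothetical.

```python
# Sketch of a serial one-step majority-logic corrector for a (15, 7, 5) code.
# ASSUMPTION: cyclic code with g(x) = 1 + x^4 + x^6 + x^7 + x^8; the four
# check sums below are orthogonal on c_14 (each contains c_14, and no other
# bit appears in more than one of them).
N = 15
G_POLY = [1, 0, 0, 0, 1, 0, 1, 1, 1]
CHECK_SUMS = [(0, 2, 6), (1, 5, 13), (3, 11, 12), (7, 8, 10)]  # each also includes c_14


def codeword(msg):
    """Test helper (non-systematic): coefficients of m(x) * g(x) for 7 message bits."""
    c = [0] * N
    for i, m_i in enumerate(msg):
        if m_i:
            for j, g_j in enumerate(G_POLY):
                c[i + j] ^= g_j
    return c


def majority_decode(received):
    """Decode all 15 bits; return (corrected word, indices of flipped bits)."""
    c = list(received)
    flipped = []
    for k in range(N):
        sums = [c[14] ^ c[x] ^ c[y] ^ c[z] for x, y, z in CHECK_SUMS]
        if sum(sums) > 2:                 # majority of the 4 check sums is 1
            c[14] ^= 1                    # invert the bit under decoding
            flipped.append((14 + k) % N)  # its index in the received word
        c = c[1:] + c[:1]                 # one-bit cyclic left shift: c_i <- c_(i+1)
    return c, sorted(flipped)             # after 15 shifts the alignment is restored


sent = codeword([1, 0, 1, 1, 0, 0, 1])
noisy = list(sent)
noisy[3] ^= 1
noisy[9] ^= 1
print(majority_decode(noisy) == (sent, [3, 9]))  # True
```

The threshold of more than two ones is what lets four orthogonal check sums correct any two errors: if the bit under decoding is wrong, at most one other error can cancel one check sum, and if it is correct, two other errors can set at most two check sums.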

D. Fundamental Concepts of Proposed Methodology
The proposed methodology uses the MLDD technique [5] described above as part of its correction method. The proposed method was tested on a 49-bit data block and can correct up to 7 bit errors. We propose a clustering scheme in which any seven consecutive bits are placed in different clusters. The method can therefore be applied wherever adjacent multiple cell upsets (MCUs) must be detected and corrected: because adjacent bits are in different clusters, changes in adjacent bits can be easily detected and corrected. The method proceeds as follows:
1) First, the 49-bit data word is divided into 7 clusters of 7 bits each (49/7 = 7), keeping a distance of 7 between the data (information) bits within a cluster. The contents of each cluster are shown in Fig. 5. Denoting the 49 data bits a_1, a_2, a_3, ..., a_49, the first cluster is formed from a_1, a_8, a_15, a_22, a_29, a_36, a_43, so that adjacent bits such as a_1, a_2, a_3 are placed in different clusters.
2) Each cluster has 7 information bits. We now calculate an even parity bit for each cluster. This is similar to the bit-per-byte technique: if each cluster is treated as a byte (although a cluster has 7 bits rather than 8), the bit-per-byte technique becomes a bit-per-cluster technique. Even parity means that the total number of 1s, including the parity bit, must be even: if a cluster contains an even number of 1s, its parity bit is 0; otherwise, it is 1. After this step, each cluster has a corresponding parity bit, which is sent with the information bits, as visualized in Fig. 6.
3) Next, the Majority Logic Detector/Decoder (MLDD) scheme is applied to each cluster. Let the information bits of a cluster be i_0, ..., i_6. Following the MLD scheme [7], the check bits are generated as checksums (XORs) of the information bits, as shown in Fig. 7. Each cluster then has 7 information bits and 8 check bits, i.e. 15 bits.

4) In this step, the information bits are sent to the receiving side in the same order as in the first step, i.e. a_1, a_2, a_3, ..., a_49. The parity bit of each cluster, calculated using even parity, is sent along with the information bits, together with the check bits of each cluster. Thus, the following information is sent from the sending end:
• Information bits (a_1, a_2, a_3, ..., a_49)
• Parity bits for each cluster (p_0, p_1, p_2, ..., p_6)
• Check bits generated for each cluster (C_7, C_8, C_9, ..., C_14)
5) This information is sent to the receiving side. During transmission, any bit may flip from 0 to 1 or from 1 to 0, resulting in misleading information, so the information bits received at the receiving end may not be error-free. Let the received information bits be a_1, a_2, a_3, ..., a_49.
6) At the receiving end, clusters are formed as in step 1, giving 7 clusters with a distance of 7 between the information bits of each cluster. The resulting clusters are Cluster 1, Cluster 2, Cluster 3, ..., Cluster 7.
7) After the clusters are formed, the parity bit of each cluster is calculated using even parity, giving parity(Cluster 1), parity(Cluster 2), parity(Cluster 3), ..., parity(Cluster 7).
8) The parity of each cluster at the sending end is then compared with the parity of the corresponding cluster at the receiving end. If a mismatch is found, that cluster is assumed to contain errors and is examined further. Suppose Cluster i has a mismatch. Check bits are generated for that cluster using the technique described in step 3, giving C_7, C_8, C_9, ..., C_14, and the information bits are copied to C_0, C_1, ..., C_6. This yields a 15-bit code word C_0, C_1, C_2, ..., C_14 to which majority logic decoding is applied.
9) The majority logic decoding process is outlined as follows:
Step 1: Initialize the counter variable to 0.
Step 2: Calculate the majority (check-sum) values B_j as given by (1)-(4).
Step 3: If more than two of the B_j values are 1 (i.e. the majority is 1), go to Step 4; otherwise, go to Step 5.
Step 4: Invert bit c_14. Store the counter, which indicates the erroneous bit position. Go to Step 5.
Step 5: Perform a one-bit cyclic left shift.
Step 6: Increment the counter.
Step 7: If the counter equals 8, go to Step 8; otherwise, go to Step 2.
Step 8: End.
10) At this point, the positions of the bit flips in the cluster are known and the erroneous bits have been corrected. These positions within the cluster are stored to determine the actual positions in the data word. The other clusters are then examined in the same way to find any errors, locate their positions in the corresponding cluster and correct them. Following this method, adjacent bit upsets, which are a common issue in memory applications, can be detected and corrected. The method is illustrated by an example with the sending-end code word of Fig. 8 and the receiving-end code word of Fig. 9: the sending code word is the original data with its parity bits, and the receiving code word is the same code word after errors have occurred.
In this example, seven clusters are formed from the forty-nine data bits. The parity bits of the received clusters are then compared with those of the sent clusters. Only for a cluster with a mismatch are the 8 check bits generated using the Majority Logic Detector/Decoder (MLDD) scheme.
As shown in Figs. 10 and 11, there is a mismatch in the second cluster, so 8 check bits are generated for this cluster using the architecture shown in Fig. 12.
The code word of the erroneous cluster is then 15 bits long, i.e. C_0, C_1, C_2, ..., C_14; in this case, it is 011011101000111.
Using the majority decoding circuit, eight left cyclic shifts are performed. At each shift, the majority values B_1, B_2, B_3 and B_4 are calculated. If the majority of these values is 1, the bit currently under decoding is erroneous, and it is inverted at bit position c_14 of the register. The whole eight-cycle procedure is shown in Fig. 13. In cycle 1, B_1, B_2, B_3 and B_4 are calculated using (1), (2), (3) and (4), giving B_1 = 0, B_2 = 0, B_3 = 1 and B_4 = 0. The majority is therefore 0, so a one-bit cyclic shift is performed and decoding moves to cycle 2. In cycle 2, B_1 to B_4 are recalculated and this time the majority is 1, so, according to the proposed algorithm, bit c_14 is inverted and decoding moves to cycle 3. This procedure repeats until cycle 8; in each cycle, either the majority is 0 and a one-bit cyclic shift is performed, or the majority is 1 and bit c_14 is inverted.
After the 8th cycle, the original 7 information bits are in the last 7 positions. Hence, seven right shifts yield the corrected code word: 1 1 1 0 1 1 1 0 1 0 0 0 1 1 1.
At the end of this process, the original information bits are recovered as expected. The information bits are then read back from the clusters in their original order a_1, a_2, a_3, ..., a_49. The overall workflow of the proposed method is shown as a flow chart in Fig. 14.
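The clustering and bit-per-cluster parity steps (steps 1, 2 and 6 to 8, plus the final reassembly of a_1 ... a_49) can be sketched in Python as below, assuming a_1 ... a_49 are stored at list indices 0 ... 48. The function names are hypothetical, and the MLD correction of a flagged cluster is omitted. The last lines illustrate both the strength of the scheme (a burst of seven adjacent upsets puts exactly one error in each cluster, so every cluster is flagged) and its limitation (two errors in the same cluster leave its even parity unchanged).

```python
STRIDE = 7  # distance between bits of one cluster, which is also the number of clusters


def make_clusters(data):
    """Steps 1 and 6: cluster k holds indices k, k+7, k+14, ... (a_1, a_8, ... for k = 0)."""
    return [data[k::STRIDE] for k in range(STRIDE)]


def merge_clusters(clusters):
    """Reassemble a_1..a_49 from the clusters (inverse of make_clusters)."""
    data = [0] * sum(len(cl) for cl in clusters)
    for k, cl in enumerate(clusters):
        data[k::STRIDE] = cl
    return data


def even_parity(bits):
    """Steps 2 and 7: parity bit that makes the total number of 1s even."""
    return sum(bits) % 2


def flagged_clusters(sent_parity, received):
    """Step 8: indices of clusters whose recomputed parity disagrees with the sent parity."""
    return [k for k, cl in enumerate(make_clusters(received))
            if even_parity(cl) != sent_parity[k]]


data = [(i * 7 + i // 3) % 2 for i in range(49)]          # arbitrary 49-bit data word
sent_parity = [even_parity(cl) for cl in make_clusters(data)]

burst = list(data)
for i in range(20, 27):                                   # seven adjacent upsets, a_21..a_27
    burst[i] ^= 1
print(flagged_clusters(sent_parity, burst))               # [0, 1, 2, 3, 4, 5, 6]

same_cluster = list(data)
same_cluster[0] ^= 1                                      # a_1 and a_8 share cluster 0
same_cluster[7] ^= 1
print(flagged_clusters(sent_parity, same_cluster))        # []
```

Each flagged cluster would then be passed to the majority-logic corrector, and `merge_clusters` restores the corrected bits to their original positions.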

IV. EXPERIMENTAL ANALYSIS
The proposed methodology was evaluated through simulation. The simulation consists of an error-detection phase and an error-correction phase: the detection phase identifies soft errors, and the correction phase recovers them so that the original stored data is retrieved. This section presents and discusses the experimental results of the proposed method and of existing methods and evaluates the effectiveness of the proposed method.

A. Experimental Tools
The following tools are used for the evaluation process of the proposed method.

B. Experimental Result
This section presents the experimental outcomes along with comparisons with existing methods. The results indicate that the proposed method performs better in terms of the number of cyclic shifts needed. They also show that the proposed method handles common-mode or adjacent bit errors better, whereas the existing methods are not suited to this purpose. Fig. 15 compares the number of cycles needed for error detection by the plain MLD [8], the existing MLDD [5] and the proposed method.
In all cases, MLD [8] requires 15 cycles to detect errors. MLDD [5] takes only three cycles to confirm that there is no error, but more cycles when an error is present. The proposed method, by contrast, requires fewer cycles than both MLD [8] and MLDD [5] to detect an error in a 14-bit code word, using the bit-per-byte and clustering approaches. Fig. 16 compares the number of cycles needed for error correction by the plain MLD [8], the existing MLDD [5] and the proposed method.
If an error is detected, MLD needs 15 cycles to run the entire decoding process. The existing MLDD follows the same procedure but requires three additional cycles, i.e. 18 cycles in total. The proposed method needs (15 + 3)/2 = 9 cycles.

V. CONCLUSIONS
The proposed methodology focuses on the architecture of a Majority Logic Detector/Decoder (MLDD) that uses bit-per-byte and clustering approaches to detect and correct faults in fewer cycles. In addition, the proposed method is particularly useful for errors in adjacent bits, because adjacent bits are placed in different clusters, so such errors can be easily detected. Therefore, in systems where adjacent bit errors are likely, the proposed method performs better than other MLDD systems while requiring fewer cycles. The proposed method is designed to handle larger data blocks, and experiments were performed on a large data word to demonstrate its efficiency. Because it is geared towards larger data blocks, however, the clustering-based approach may consume more time than methods that are well suited to smaller data words. The proposed method can detect and correct multiple adjacent cell upsets, whereas the existing methods cannot. Its main limitation is that when multiple errors occur in the same cluster, the proposed method may fail to detect them; for example, two errors in one cluster leave that cluster's parity bit unchanged. The proposed method focuses only on detecting adjacent errors with fewer cycles than existing methods. In future work, we will try to detect and correct errors within the same cluster and to process large data blocks faster than the proposed method.

Fig. 1. Proposed Structure of a Memory System with MLDD.

Fig. 12. Calculating Check Bits when the Sending and Receiving Parity Bits Mismatch.

Fig. 13. Performing Eight Left Cyclic Shifts to Acquire the Error-Free Code Word.

Fig. 15. The Comparison among Plain MLD [6], the Method Proposed by Jayarani et al. [3], and the Proposed Method for Error Detection.