VHDL Design and FPGA Implementation of LDPC Decoder for High Data Rate

In this work, we present the FPGA design and implementation of a parallel architecture for a low-complexity LDPC decoder intended for high data rate applications. The selected code is a regular (3, 4) LDPC code. The VHDL design and synthesis of this architecture use the simplified Belief Propagation (BP) "Min-Sum" decoding algorithm. We studied the complexity of the proposed architecture: it is 6335 LEs (Logic Elements) at a data rate of 2.12 Gbps for 8-bit quantization at the second iteration. We also built a co-simulation platform in Simulink to validate the BER (Bit Error Rate) performance of our architecture.
Keywords: error correcting codes; LDPC codes; BP "Min-Sum"; VHDL language; FPGA


I. INTRODUCTION
LDPC codes were discovered by Gallager [1] [2] in the early 1960s. This remarkable discovery was largely ignored by researchers for nearly 20 years, until the work of Tanner in 1981, which provided a new interpretation of LDPC codes from a graphical perspective. Tanner's work was in turn ignored by theorists for about 14 years, until the late 1990s, when some coding researchers began to investigate codes on graphs and iterative decoding. Their research led to the rediscovery of Gallager's codes. They showed that long LDPC codes with iterative decoding based on Belief Propagation achieve error performance within a fraction of a decibel of the Shannon limit [3] [6] [7] [8]. This discovery makes LDPC codes strong competitors to turbo codes for error control when high reliability is required. LDPC codes also have an advantage over turbo codes: they do not require a long interleaver to achieve good error performance. In 2004, an LDPC code was standardized for the first time, in the DVB-S2 satellite broadcasting standard [9].
In this work, we are interested in building a regular LDPC code and studying its performance in terms of complexity, data rate, latency, and BER versus SNR for various numbers of iterations and quantization levels.
We begin by recalling the principle of LDPC codes in the first part; the second part is devoted to the implementation of the decoder, and the last part to its validation.

A. Principle of LDPC Codes
An LDPC code can be represented by its parity check matrix (denoted H) or by a bipartite graph (the Tanner graph). In the example of Fig. 1, the rows of the matrix are represented by squares and are called check nodes, the columns are represented by circles and are called data nodes, and each "1" in the matrix corresponds to an edge of the graph.
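To make the matrix/graph correspondence concrete, the following Python sketch derives the Tanner-graph adjacency lists from a parity check matrix and tests whether the code is (dv, dc)-regular. The 6×8 matrix `H` is a hypothetical (3, 4)-regular example (three 1s per column, four per row), not the matrix of Fig. 1 or of equation (3), and the function names are illustrative.

```python
# Hypothetical (3, 4)-regular parity check matrix: 6 check nodes (rows),
# 8 data nodes (columns). It is NOT the matrix H of equation (3).
H = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 1, 1],
]

def tanner_graph(H):
    """Return (check_nbrs, data_nbrs), the edges of the Tanner graph of H.

    check_nbrs[i] lists the data nodes joined to check node i (row i);
    data_nbrs[j] lists the check nodes joined to data node j (column j).
    Every 1 in H is exactly one edge."""
    m, n = len(H), len(H[0])
    check_nbrs = [[j for j in range(n) if H[i][j]] for i in range(m)]
    data_nbrs = [[i for i in range(m) if H[i][j]] for j in range(n)]
    return check_nbrs, data_nbrs

def regular_degrees(H):
    """Return (dv, dc) if every column has weight dv and every row weight dc, else None."""
    check_nbrs, data_nbrs = tanner_graph(H)
    dv = {len(nb) for nb in data_nbrs}
    dc = {len(nb) for nb in check_nbrs}
    if len(dv) == 1 and len(dc) == 1:
        return dv.pop(), dc.pop()
    return None

checks, datas = tanner_graph(H)
print(checks[0])           # [0, 1, 2, 3]: data nodes joined to check node 0
print(regular_degrees(H))  # (3, 4)
```

The same adjacency lists are what a fully parallel decoder hard-wires: one check-node operator per row and one data-node operator per column, connected along these edges.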

B. Encoding of LDPC codes
The encoding operation consists first in finding a generator matrix G such that G·H^T = 0. The work of T. J. Richardson and R. L. Urbanke [4] showed that the parity check matrix must undergo pre-processing before encoding. The aim of this pre-processing is to put the matrix into an approximate (pseudo-) lower triangular form, as shown in Fig. 2, using only row and column permutations. The resulting matrix is composed of six sparse submatrices: A, B, C, D, E, and the lower triangular submatrix T. The size of T is (m-g)×(m-g), where the gap g is made as small as possible. Once the pre-processing of H is completed, encoding amounts to solving the system given by equation (1) [4], where C is a codeword.
www.ijacsa.thesai.org
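For reference, the system solved during encoding can be written in the block notation of Richardson and Urbanke [4]. This sketch assumes the codeword is split as **C** = (s, p1, p2), with s the n−m information bits and p1, p2 the g and m−g parity bits, and that the g×g matrix φ is invertible; all arithmetic is over GF(2), so subtraction equals addition. The codeword is written in bold to distinguish it from the submatrix C.

```latex
H = \begin{pmatrix} A & B & T \\ C & D & E \end{pmatrix}, \qquad
\mathbf{C} = (s,\ p_1,\ p_2), \qquad H\,\mathbf{C}^{T} = 0

\begin{aligned}
A\,s^{T} + B\,p_1^{T} + T\,p_2^{T} &= 0 \\
C\,s^{T} + D\,p_1^{T} + E\,p_2^{T} &= 0 \\
\phi &= E\,T^{-1}B + D \\
p_1^{T} &= \phi^{-1}\left(E\,T^{-1}A + C\right) s^{T} \\
p_2^{T} &= T^{-1}\left(A\,s^{T} + B\,p_1^{T}\right)
\end{aligned}
```

Because T is lower triangular, p2 is obtained by forward substitution rather than by an explicit inversion, and only the small g×g matrix φ needs a dense inverse; this is why a small gap g keeps the encoding cost low.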

C. Decoding of LDPC Codes
LDPC codes are decoded with iterative algorithms, the most widely used being Belief Propagation (BP). In our work, we used the simplified BP "Min-Sum" algorithm, which is well suited to hardware implementation. At each iteration, the algorithm first updates the data nodes and then the check nodes; at the end, a "hard" decision yields the most likely codeword [10].
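A minimal floating-point sketch of Min-Sum decoding with a flooding schedule follows. It assumes the LLR convention log P(0)/P(1) (positive favours bit 0) and reuses the hypothetical 6×8 (3, 4)-regular matrix; the early stop on a zero syndrome is an addition of this sketch, and a fixed-iteration decoder such as one unrolled for 2, 10 or 20 iterations would simply drop that test.

```python
# Hypothetical (3, 4)-regular parity check matrix (8 data nodes, 6 check nodes);
# it is NOT the matrix H of equation (3).
H = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 1, 1],
]

def min_sum_decode(H, llr, max_iter=20):
    """Flooding Min-Sum decoding; llr[j] = log P(bit j = 0) / P(bit j = 1).

    Returns the hard-decision word once all parity checks are satisfied,
    or after max_iter iterations."""
    m, n = len(H), len(H[0])
    checks = [[j for j in range(n) if H[i][j]] for i in range(m)]
    datas = [[i for i in range(m) if H[i][j]] for j in range(n)]
    # First data-node update: every outgoing message is the channel LLR.
    v2c = {(i, j): llr[j] for i in range(m) for j in checks[i]}
    c2v = {}
    hard = [0 if x >= 0 else 1 for x in llr]
    for _ in range(max_iter):
        # Check-node update: sign product and minimum magnitude of the OTHER inputs.
        for i in range(m):
            for j in checks[i]:
                others = [v2c[(i, k)] for k in checks[i] if k != j]
                negative = sum(1 for x in others if x < 0) % 2
                mag = min(abs(x) for x in others)
                c2v[(i, j)] = -mag if negative else mag
        # Data-node update: channel LLR plus every incoming message except the target's own.
        total = [llr[j] + sum(c2v[(i, j)] for i in datas[j]) for j in range(n)]
        for j in range(n):
            for i in datas[j]:
                v2c[(i, j)] = total[j] - c2v[(i, j)]
        # Hard decision on the a posteriori values; stop when the syndrome is zero.
        hard = [0 if t >= 0 else 1 for t in total]
        if all(sum(hard[j] for j in checks[i]) % 2 == 0 for i in range(m)):
            break
    return hard

llr = [2.0] * 8
llr[3] = -0.5  # bit 3 of the all-zero codeword received weakly in error
print(min_sum_decode(H, llr))  # [0, 0, 0, 0, 0, 0, 0, 0]
```

Replacing the sum-product check-node rule by "sign product times minimum" removes the tanh/log nonlinearity, which is what makes Min-Sum attractive for hardware.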

III. FPGA IMPLEMENTATION OF LDPC DECODER
The LDPC code discussed in this document is characterized by the parity check matrix H given in (3). After making the necessary transformations on H [9], we determined the generator matrix G. This matrix G is the basis of the LDPC encoder, which computes the codeword C from the information word u as follows:

C = u·G     (2)
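A small Python sketch of obtaining G from H and encoding with C = u·G over GF(2). It uses plain Gauss-Jordan elimination instead of the Richardson-Urbanke pre-processing of [4], which is fine for illustration but scales poorly to long codes; the matrix `H` is the same hypothetical (3, 4)-regular example, and all function names are illustrative.

```python
# Hypothetical (3, 4)-regular parity check matrix; NOT the matrix of equation (3).
H = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 1, 1],
]

def generator_from_parity(H):
    """Return a k x n generator matrix G over GF(2) with G·H^T = 0 (k = n - rank H)."""
    m, n = len(H), len(H[0])
    A = [row[:] for row in H]
    pivots = []  # pivot column of each reduced row
    r = 0
    for c in range(n):
        p = next((i for i in range(r, m) if A[i][c]), None)
        if p is None:
            continue  # no pivot in this column: it becomes a free (information) position
        A[r], A[p] = A[p], A[r]
        for i in range(m):  # clear column c in every other row (Gauss-Jordan)
            if i != r and A[i][c]:
                A[i] = [a ^ b for a, b in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    free = [c for c in range(n) if c not in pivots]
    G = []
    for f in free:
        # One basis codeword per free position: set that bit, solve each pivot bit.
        word = [0] * n
        word[f] = 1
        for row_idx, pc in enumerate(pivots):
            word[pc] = A[row_idx][f]
        G.append(word)
    return G

def encode(u, G):
    """C = u·G over GF(2)."""
    return [sum(u[i] * G[i][j] for i in range(len(G))) % 2 for j in range(len(G[0]))]

def syndrome(H, c):
    """H·C^T over GF(2); all zeros for a valid codeword."""
    return [sum(h * x for h, x in zip(row, c)) % 2 for row in H]

G = generator_from_parity(H)
c = encode([1, 0, 1, 1], G)
print(len(G), syndrome(H, c))  # 4 [0, 0, 0, 0, 0, 0]
```

This example H has dependent rows (rank 4), so k = 4 rather than the design value n − m = 2; the elimination handles such rank deficiency automatically.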
The LDPC decoder is designed in VHDL using the simplified BP "Min-Sum" algorithm and implemented on an Altera FPGA of type EP4CE115F29C7 (Cyclone IV E). The decoder circuit is given in Fig. 5. Table I summarizes the complexity (in Logic Elements, LEs), data rate and decoder latency for the 2nd, 10th and 20th iterations, with quantization on 5, 6, 7, 8 and 11 bits.
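Table I covers LLRs quantized on 5 to 11 bits, and Table II marks uniform quantization (U). A minimal sketch of a uniform, saturating q-bit quantizer, assuming a symmetric two's-complement range and a hypothetical step size `step`; the paper does not give its exact quantization law.

```python
def quantize_llr(x, q, step):
    """Uniform, saturating q-bit quantization of a real LLR.

    Returns an integer code in the symmetric range [-(2**(q-1) - 1), 2**(q-1) - 1];
    the represented value is code * step."""
    vmax = (1 << (q - 1)) - 1
    code = int(round(x / step))  # nearest level (Python's round() breaks ties to even)
    return max(-vmax, min(vmax, code))

def dequantize(code, step):
    """Real value represented by an integer code."""
    return code * step

for q in (5, 6, 7, 8, 11):
    print(q, quantize_llr(9.3, q, 0.25))
```

With a fixed step, extra bits only widen the representable range (here 9.3 saturates at 5 and 6 bits); a design could instead keep the range and shrink the step, trading saturation errors against rounding errors.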
The operators used to update the variable nodes and the check nodes are illustrated in Fig. 3 and Fig. 4, respectively. The functional simulation in Quartus II (see Fig. 6) shows the parallel computation: the outputs are updated after the first active clock edge, so the maximum latency is one clock cycle. Fig. 7 shows how the complexity evolves with the number of iterations and the number of quantization bits: relative to the 2nd iteration, the complexity is multiplied by 5 at the 10th iteration and by 10 at the 20th, whatever the number of quantization bits, i.e. it grows in proportion to the number of iterations.
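The following sketches, in integer arithmetic, a common fixed-point formulation of the two Min-Sum operators; it is an assumption, not necessarily the structure of Fig. 3 and Fig. 4. The check node finds the two smallest input magnitudes once and XORs all sign bits, so each output needs only a select and a sign flip; the data node adds the channel value to the incoming messages, subtracts each message's own contribution and saturates to q bits. Names are illustrative.

```python
def saturate(x, q):
    """Clip an integer to the symmetric q-bit range [-(2**(q-1) - 1), 2**(q-1) - 1]."""
    vmax = (1 << (q - 1)) - 1
    return max(-vmax, min(vmax, x))

def check_node(msgs):
    """Min-Sum check-node operator on integer messages (degree >= 2).

    Output k = (product of the other signs) * (minimum of the other magnitudes),
    computed from the two smallest magnitudes and the XOR of all sign bits."""
    mags = [abs(x) for x in msgs]
    idx1 = min(range(len(msgs)), key=lambda k: mags[k])  # position of the smallest magnitude
    min1 = mags[idx1]
    min2 = min(mags[k] for k in range(len(msgs)) if k != idx1)
    all_neg = sum(1 for x in msgs if x < 0) % 2  # XOR of every sign bit
    out = []
    for k, x in enumerate(msgs):
        neg = all_neg ^ (1 if x < 0 else 0)  # remove this input's own sign
        mag = min2 if k == idx1 else min1     # the smallest input sees the second minimum
        out.append(-mag if neg else mag)
    return out

def variable_node(ch, msgs, q):
    """Data-node operator: returns (outgoing messages, a posteriori value), saturated to q bits."""
    total = ch + sum(msgs)
    return [saturate(total - x, q) for x in msgs], saturate(total, q)

print(check_node([3, -1, 5, -7]))       # [1, -3, 1, -1]
print(variable_node(3, [1, -2, 4], 4))  # ([5, 7, 2], 6)
```

Since each operator is pure combinational logic, unrolling one layer of them per iteration gives one-cycle latency and hardware that grows linearly with the iteration count.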

IV. VALIDATION OF THE DECODER
After functional simulation in Quartus II, we validated our decoder in a digital transmission chain built in Simulink (see Fig. 8) [5]. This co-simulation chain also allowed us to measure the BER performance versus SNR for various numbers of iterations and quantization bits.
Fig. 8. Validation platform of our decoder circuit on Matlab/Simulink
Fig. 9 shows the BER performance of the decoder with real (unquantized) data at the 2nd, 10th and 20th iterations. For a given BER, the SNR gain of the 10th and 20th iterations over the 2nd iteration is negligible compared with the increase in complexity, which is multiplied by 5 and 10, respectively. Fig. 10 shows the BER performance of the VHDL implementation at the second iteration with 5-, 6-, 7- and 8-bit quantization. The results show that 8-bit quantization gives BER performance very close to that of real data.
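A minimal Python sketch of the channel side of such a BER-versus-SNR chain, assuming BPSK over an AWGN channel, SNR taken as Eb/N0, and the LLR convention log P(0)/P(1); none of these choices is stated in the text, and the function names are hypothetical. In a full chain, the LLRs would be quantized and passed to the decoder before counting errors.

```python
import math
import random

def noise_sigma(ebn0_db, rate):
    """Noise standard deviation for unit-energy BPSK: sigma^2 = 1 / (2 * R * Eb/N0)."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return math.sqrt(1.0 / (2.0 * rate * ebn0))

def bpsk_awgn_llr(bits, sigma, rng):
    """Map 0 -> +1, 1 -> -1, add white Gaussian noise, return LLRs 2*y/sigma^2."""
    llrs = []
    for b in bits:
        y = (1.0 - 2.0 * b) + rng.gauss(0.0, sigma)
        llrs.append(2.0 * y / sigma ** 2)
    return llrs

def bit_errors(sent, decided):
    """Number of positions where the decided bits differ from the sent bits."""
    return sum(s != d for s, d in zip(sent, decided))

# Uncoded baseline at 4 dB (rate 1, hard decision on the LLR sign).
rng = random.Random(2024)
sigma = noise_sigma(4.0, 1.0)
sent = [rng.randint(0, 1) for _ in range(100_000)]
decided = [0 if l >= 0 else 1 for l in bpsk_awgn_llr(sent, sigma, rng)]
print(bit_errors(sent, decided) / len(sent))
```

An uncoded curve produced this way is a useful reference when reading the coding gain off plots such as Fig. 9 and Fig. 10.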
The comparison with other designs (see Table II) shows that our design has very low complexity, a higher data rate and acceptable BER performance. Note that the data rates in Table II are evaluated without removing the parity bits.
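Because these data rates include parity bits, the information throughput is lower by the code rate. A small sketch, assuming the design rate 1 − dv/dc of a regular code (the true rate is at least this, and higher if H has dependent rows) and, for `coded_throughput`, a decoder that delivers one n-bit codeword per clock cycle, as the one-cycle latency suggests; `n` and `f_clk_hz` are parameters, not values taken from the paper.

```python
def design_rate(dv, dc):
    """Design rate 1 - dv/dc of a (dv, dc)-regular LDPC code (a lower bound on the true rate)."""
    return 1.0 - dv / dc

def information_throughput(coded_rate, rate):
    """Throughput of information bits when the coded throughput counts parity bits too."""
    return coded_rate * rate

def coded_throughput(n, f_clk_hz, cycles_per_codeword=1):
    """Coded throughput (bit/s) of a decoder that outputs n bits every cycles_per_codeword cycles."""
    return n * f_clk_hz / cycles_per_codeword

print(design_rate(3, 4))                                # 0.25
print(information_throughput(2.12, design_rate(3, 4)))  # about 0.53 (Gbps)
```

If the 2.12 Gbps figure is counted the same way as in Table II, it corresponds to roughly 0.53 Gbps of information bits at the design rate of a (3, 4) code.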
Regarding complexity, some authors used Altera Stratix or Xilinx Virtex FPGAs, whose resources are counted in units other than LEs (Logic Elements); their figures were therefore converted as follows:
For Altera Stratix FPGAs, where complexity is expressed in ALUTs: LE = 1.25 * ALUT [15].
For Xilinx Virtex FPGAs, where complexity is expressed in slices and LUTs, the approximate formula used is LE = Slice * 4 * LUT * 0.83 [16].

V. CONCLUSION
In this paper, we designed in VHDL and implemented on an FPGA an LDPC decoder, starting from its parity check matrix and determining everything needed for its implementation, namely the generator matrix and the decoding equations of the simplified BP "Min-Sum" method. We then tested and validated it on a platform developed in Simulink for co-simulation with the DSP Builder software.
The results show that our design has a high data rate, low latency and very low complexity. The BER versus SNR performance can be further improved by increasing the code length while keeping the same parallel architecture.

Fig. 1. Example of a parity check matrix and its corresponding Tanner graph

Fig. 6. Example of the decoder functional simulation

Fig. 7. Evolution of complexity with the number of iterations and the number of quantization bits

Fig. 9. BER performance versus SNR of the decoder for the 2nd, 10th and 20th iterations (real data)

TABLE I. DECODER PERFORMANCE FOR DIFFERENT NUMBERS OF ITERATIONS AND QUANTIZATION BITS

TABLE II. COMPARISON WITH OTHER DESIGNS
U: Uniform quantization