Hardware Architecture for Adaptive Dual Threshold Filter and Discrete Wavelet Transform based ECG Signal Denoising

The ECG signal, like any signal obtained from an instrumented data acquisition system, is affected by noise from physiological and technical sources, such as electromyogram (EMG) noise and power line interference, which can deteriorate its morphology. A preprocessing step is therefore applied to remove this noise. Filtering techniques involve complex computations that are increasingly common in medical applications and must be completed in real time. As a result, these applications are geared toward high-performance embedded architectures. This paper presents an FPGA (Field Programmable Gate Array) embedded architecture for a hybrid ECG denoising technique based on the Discrete Wavelet Transform (DWT) and the Adaptive Dual Threshold Filter (ADTF), dedicated to handling the noise affecting ECG signals. The architecture was designed following a hardware-software codesign approach using a high-level description language, and was synthesized for different FPGAs thanks to the flexibility of the structural description. The global architecture was divided into a set of functional blocks to allow parallel processing of ECG data. The simulation results confirm the high performance of the system in noise reduction without affecting the morphology of the signal. The process takes 0.3 ms with an acquisition frequency of 360 Hz. The whole architecture requires a small area on the different FPGAs in terms of resource utilization. It uses less than 1% of the total registers on all FPGA devices: 292 registers for Cyclone III LS, Cyclone IV GX, Cyclone IV E, and Arria II GX, and 329 registers for Cyclone V. The logic element occupancy varies between 3% on Cyclone V and 60% on Cyclone IV GX, leaving space for other parallel processing tasks.
Keywords: ECG signal; DWT; ADTF; hybrid technique; hardware-software codesign; FPGA


I. INTRODUCTION
The ECG, or electrocardiogram, is an electrophysiological signal whose trace describes the heart's electrical activity as captured by electrodes placed on the surface of the body. This signal is currently used for the prevention and detection of cardiovascular diseases [1], [2]. Intelligent diagnostic systems have emerged to make better use of large quantities of ECG data that are difficult to analyze manually [3]. These systems improve the quality of the signal (noise filtering), enhance relevant information, extract information that is not visible by direct visual analysis, and propose a diagnosis that can help doctors make the right decisions [4]. Noise degrades the precision and accuracy of the analysis, so signal denoising is essential.
Digital filters denoise by selecting the frequency band of the useful information or rejecting the noisy frequency bands [16]. However, strong noise reduction requires a much higher filter order, which increases the complexity and the processing time. EMD (Empirical Mode Decomposition) methods decompose the noisy signal into IMFs (Intrinsic Mode Functions) and eliminate the noisy ones [8], which can destroy part of the signal. Wavelet methods expose both time and frequency information and decompose the signal into details and approximations [17]. Adaptive filtering can be used in several cases; one example is the ADTF [14], which performs well in high-frequency noise reduction.
The study presented in this paper concerns the denoising of ECG signals using an algorithm based on the DWT and the ADTF. The hybridization of the two algorithms was published by Jenkal et al. in [11]. This technique aims to combine the advantages of both the ADTF and DWT methods to deal with different noises, especially high-frequency noise, EMG (electromyogram) noise, and power line interference.
The results of this technique were evaluated using Matlab and compared to other methods in [11]; it offers high performance in terms of Mean Square Error (MSE), Percent Root mean square Difference (PRD), Signal-to-Noise Ratio Improvement (SNRimp), and Signal-to-Noise Ratio Output (SNRout).
Analyzing large quantities of ECG signals with this technique requires complex calculations and fast, real-time processing, which pushes us toward hardware implementation on high-performance embedded architectures. FPGAs (Field Programmable Gate Arrays) are a good choice for high performance and low power [18], which are essential needs for applications such as signal processing, especially of cardiac signals. In addition, low-cost FPGAs can be used for the implementation, and the resulting system is portable.
The approach presented in this article is an original method of our research team published for the first time in [11], validated under Matlab in terms of filtering performance of ECG signals; the goal of this work is the on-board implementation of this method to put it into practice for the supervision of patient cardiac data.
For the FPGA implementation, the two filters, ADTF and DWT, are designed in VHDL (VHSIC Hardware Description Language) using the Quartus II tool and the ModelSim simulation environment. The algorithm achieves high performance in noise reduction while maintaining the morphology and essential features of the original signal. The simulation results show that the system has a processing time of 0.3 ms when operating at 50 kHz, which comfortably meets the real-time constraint. The architecture can be implemented on low-cost FPGA families because of the modest area it occupies, which leaves room to add other blocks for further processing stages such as QRS and abnormality detection. The global architecture uses less than 1% of the total registers on 5 FPGA devices: Cyclone IV GX, Cyclone IV E, Cyclone III LS, Cyclone V, and Arria II GX. The logic element occupancy varies between 3% on Cyclone V and 60% on Cyclone IV GX. The whole architecture uses 28 pins, representing 9% for Cyclone IV E and Cyclone III LS, 10% for Cyclone V, 16% for Arria II GX, and 35% for Cyclone IV GX.
The rest of this paper is organized as follows: The first section describes the ECG signal with an overview of related work.
The second section presents the hybrid technique based on the DWT and ADTF algorithms.
The third section depicts the VHDL implementation of the whole algorithm, and a discussion of the given results.
Finally, a conclusion and perspectives are presented in the last section.

II. ECG SIGNAL DENOISING OVERVIEW
The cardiovascular system comprises the heart and the vascular system, whose main function is to ensure an adequate, continuous blood flow with sufficient pressure to the organs and tissues to meet energy needs and cell renewal. Diagnosing its condition is a vital task for the prevention of cardiovascular disease [19]. The electrocardiogram (ECG) signal remains one of the predominant and most widely used tools for this purpose.
The ECG is the recording over time of the heart's electrical activity, corresponding to the depolarization and repolarization of the heart muscle [20]. Fig. 1 represents the recording of the cardiac cycle, where the P wave reflects atrial depolarization, the QRS complex represents ventricular depolarization, and the T wave represents ventricular repolarization.
Nowadays, diagnosis is done in an automatic manner where an automated ECG processing system usually consists of four successive stages [21] as follows: signal preprocessing, waves detection, features extraction, and finally, abnormalities detection and classification.
The signal preprocessing (or denoising) step essentially eliminates the different noises that affect the ECG signal during its acquisition. These noises are of two types: physiological noise, including muscle noise (EMG), and technical noise, including power line interference [22]. Due to its low frequency band, the ECG is very sensitive to these noises. Several techniques have been proposed to deal with this problem, such as EMD, filter banks, wavelet transforms, and adaptive filtering.
Infinite Impulse Response (IIR) and Finite Impulse Response (FIR) filters are digital filters used for ECG denoising. The denoising operation is based on selecting the frequency bands related to useful information in the signal and the noise frequency bands [16]. For high-quality denoising, the number of required coefficients increases considerably, which results in a high computational cost and a longer delay. EMD methods are also widely used to denoise ECG signals: the signal is decomposed into a set of IMFs [23], [8], and filtering is done by eliminating the noisy IMFs, which can affect useful information in the signal. To overcome this issue, the mode mixing is removed using Ensemble EMD.
Wavelet methods highlight time and frequency information simultaneously [17], where the signal is decomposed into different resolutions to give details and approximations, then thresholding techniques are used to denoise the signal.
Adaptive filtering performs well for ECG denoising in some cases; the ADTF, for example, is a good solution for high-frequency noise reduction [14], [4], [24]. The main advantage of this method is its low complexity compared to other methods such as EMD and DWT. The ADTF complexity is linear and depends only on the signal size, whereas the EMD and DWT complexities are also linear but depend on additional parameters. Some techniques combine two or more methods to benefit from their advantages together. The ADTF is combined with the DWT in [11]; the next section details this technique.

A. ADTF Algorithm
The ADTF algorithm first calculates three parameters: the average of the chosen window (µ) and the lower and higher thresholds (Lt and Ht, respectively), following equations (1), (2), and (3), where W is the window length, Input(i) is the input ECG signal, and Min and Max are the minimum and maximum values of the window samples. α is the thresholding coefficient, with 0 < α < 1. The value of α is varied to adjust the thresholding operation according to the noise concentration in the signal [14]: for a high concentration of noise, lower values of α are favored; otherwise, higher values can be tolerated.
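Equations (1) to (3) are not reproduced in this text. As a minimal sketch, assuming the usual ADTF formulation (µ is the window mean, Ht = µ + (Max − µ)·α, and Lt = µ − (µ − Min)·α), the filter can be illustrated in Python as follows. The function names are hypothetical, and the corrected sample is assumed to be the positional center of the window; the text calls it the "median value" without specifying further.

```python
def adtf_thresholds(window, alpha):
    """Return (mean, lower threshold, higher threshold) for one window.

    Assumed formulation: Ht = mu + (Max - mu) * alpha,
                         Lt = mu - (mu - Min) * alpha.
    """
    mu = sum(window) / len(window)
    ht = mu + (max(window) - mu) * alpha
    lt = mu - (mu - min(window)) * alpha
    return mu, lt, ht


def adtf_filter(signal, window_len=10, alpha=0.1):
    """Clamp the center sample of each sliding window to [Lt, Ht].

    Assumption: the corrected sample is the one at index window_len // 2
    of the window. Samples near the edges, which never sit at the center
    of a full window, are copied unchanged.
    """
    out = list(signal)
    half = window_len // 2
    for start in range(len(signal) - window_len + 1):
        window = signal[start:start + window_len]
        _, lt, ht = adtf_thresholds(window, alpha)
        center = start + half
        out[center] = min(max(signal[center], lt), ht)
    return out
```

For a window of nine zeros and one spike of 100 with α = 0.1, the mean is 10, so Lt = 9 and Ht = 19, and the spike is limited to 19. A smaller α pulls both thresholds toward the mean, which is why lower values suit noisier signals.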

B. DWT Algorithm
In different signal processing applications, the transformation of signals into the frequency domain is very important. To obtain the frequency spectrum of a signal, the Fourier transform is the most widely used tool. Biological signals, like the ECG, have different temporal and frequency characteristics. For example, they are not stationary, and it is precisely in their characteristics (statistical, frequency, temporal, spatial) that most of the information they contain resides. A transformation that provides information on the frequency content while preserving the location in time, giving a time-frequency representation, is essential to analyze them.
The discrete wavelet transform studies the signal in various frequency bands with different resolutions by decomposing it into a rough estimate and more precise information through two functions, called the scale function and the wavelet function, which are associated with the low-pass and the high-pass filters, respectively. The high-pass filter provides the wavelet coefficients, or details, noted D, and the low-pass filter provides the approximation coefficients, noted A. This approximation is, in turn, decomposed by a second pair of filters; the process is explained in Fig. 2. The signal decomposition corresponds to the convolution of the signal x(n) with the impulse responses h and g of the low-pass and high-pass filters, as presented in Fig. 3. Equations (4) and (5) give these filters for one decomposition level.
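Equations (4) and (5) are not reproduced in this text. Assuming the standard one-level analysis filter bank (convolution followed by downsampling by two), and using the symbols defined in the following paragraph, they would take the form:

```latex
\begin{align}
A[k] &= \sum_{n} x[n]\, h[2k - n] \tag{4} \\
D[k] &= \sum_{n} x[n]\, g[2k - n] \tag{5}
\end{align}
```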
In (4) and (5), A[k] is the approximation given by the low-pass filter, D[k] is the detail given by the high-pass filter, x[n] is the discretized form of the original signal, and h[n] and g[n] are, respectively, the half-band low-pass and high-pass filters. Generally, the mother wavelet is chosen based on the closeness between the wavelet and the processed signal. For the ECG signal, we opted for a Daubechies mother wavelet because of the similarity between them, especially for the Db4 wavelet, as can be seen in Fig. 4.
Signal denoising using the DWT consists of the following three steps: (1) the wavelet transform of the observed signal, which decomposes the signal into details and approximations; (2) the thresholding of the coefficients resulting from the decomposition, or the elimination of the details containing noise; (3) the inverse wavelet transform of the modified coefficients, which restores the useful information after the denoising operation.
To obtain a perfect reconstruction, the analysis and synthesis filters satisfy the condition presented in (6), where h(z) and h'(z) are, respectively, the analysis and synthesis low-pass filters, and g(z) and g'(z) are, respectively, the analysis and synthesis high-pass filters.
In [10] and [25], the performance of the DWT in ECG signal processing is presented, especially for baseline wander removal; the architecture is implemented on a low-cost FPGA, the Xilinx Artix-7.
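The three denoising steps and the perfect reconstruction property can be checked numerically. The following is only a sketch under stated assumptions: it uses the 8-tap Db4 scaling filter, derives the high-pass filter by the quadrature-mirror relation, and handles boundaries by periodic extension with correlation-style indexing. None of these choices is confirmed by the text as the one used in the hardware, and the function names are hypothetical.

```python
# Daubechies-4 (8-tap) scaling filter, quoted to double precision.
DB4_LO = [
    0.23037781330885523, 0.7148465705525415, 0.6308807679295904,
    -0.02798376941698385, -0.18703481171888114, 0.030841381835986965,
    0.032883011666982945, -0.010597401784997278,
]
# Quadrature-mirror high-pass filter: g[n] = (-1)^n * h[L-1-n].
DB4_HI = [(-1) ** n * DB4_LO[len(DB4_LO) - 1 - n] for n in range(len(DB4_LO))]


def dwt_level(x):
    """One analysis level with periodic extension (assumed boundary rule).

    Returns (approximation, detail), each half the length of x.
    len(x) must be even.
    """
    n = len(x)
    approx, detail = [], []
    for k in range(n // 2):
        approx.append(sum(h * x[(2 * k + i) % n] for i, h in enumerate(DB4_LO)))
        detail.append(sum(g * x[(2 * k + i) % n] for i, g in enumerate(DB4_HI)))
    return approx, detail


def idwt_level(approx, detail):
    """Inverse of dwt_level: the transpose of the orthogonal analysis step."""
    n = 2 * len(approx)
    x = [0.0] * n
    for k, (a, d) in enumerate(zip(approx, detail)):
        for i in range(len(DB4_LO)):
            x[(2 * k + i) % n] += a * DB4_LO[i] + d * DB4_HI[i]
    return x


def denoise_two_levels(x):
    """Decompose twice, zero the details D1 and D2, then reconstruct.

    len(x) must be a multiple of 4.
    """
    a1, d1 = dwt_level(x)
    a2, d2 = dwt_level(a1)
    a1_clean = idwt_level(a2, [0.0] * len(d2))
    return idwt_level(a1_clean, [0.0] * len(d1))
```

Calling `idwt_level(*dwt_level(x))` on an eight-sample window returns `x` up to rounding error, which is the perfect reconstruction property. `denoise_two_levels` keeps only the level-2 approximation, so a constant window passes through unchanged while fast sample-to-sample variations are suppressed.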

C. Hybrid Technique
The hybrid technique combines the ADTF and the DWT so that the noise in the ECG signal is reduced in successive stages. The whole process is described in Fig. 5, where the ECG signal is subjected to two stages of noise reduction.
The first step is the application of the ADTF to the noisy signal; the chosen window is 10 samples, and the α coefficient is equal to 0.1 (10%). Table I shows the influence of the α coefficient on the denoising in terms of signal-to-noise ratio improvement (SNRimp) with Gaussian noise of 10 dB, as confirmed in [11].
The second step is the application of the DWT to the signal corrected by the first step, where the signal is decomposed into several frequency bands. The mother wavelet used in this case is the Daubechies Db4; the coefficients of this wavelet are the closest to the ECG signal in terms of similarity, as shown in Fig. 4. After decomposition, the details D1 and D2 concentrate a large quantity of noise, so we opted to eliminate these details. Then, the inverse DWT is applied to obtain the denoised signal.
The fusion of the two techniques provides better results in terms of PRD, especially for a high density of noise. Taking, for example, signal 100 from the MIT-BIH database corrupted by Gaussian noise of 5 dB, filtering with the ADTF alone gives a PRD of 24.55, while the hybrid method gives 18.26. Similarly, for signal 103, the PRD is 25.23 with the ADTF and 19.61 with the hybrid method.
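The quality metrics quoted above can be sketched as follows. The definitions used here are the common ones and are an assumption; the text does not restate the exact formulas used in [11] (for example, whether the PRD subtracts the signal mean). The function names are illustrative.

```python
import math


def mse(original, denoised):
    """Mean square error between the clean and the denoised signal."""
    return sum((o - d) ** 2 for o, d in zip(original, denoised)) / len(original)


def prd(original, denoised):
    """Percent root mean square difference (assumed: no mean subtraction)."""
    num = sum((o - d) ** 2 for o, d in zip(original, denoised))
    den = sum(o ** 2 for o in original)
    return 100.0 * math.sqrt(num / den)


def snr_improvement(original, noisy, denoised):
    """SNRimp in dB: input noise energy over residual noise energy."""
    noise_in = sum((n - o) ** 2 for n, o in zip(noisy, original))
    noise_out = sum((d - o) ** 2 for d, o in zip(denoised, original))
    return 10.0 * math.log10(noise_in / noise_out)
```

With these definitions, a lower PRD means the denoised signal is closer to the clean reference, which is the sense in which 18.26 improves on 24.55.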
The following part dissects the results of this technique dedicated to implementation on an FPGA, where a detailed description of the hardware architecture is presented, with the simulation results and the report on the use of the hardware resources of different FPGA families.

A. Hardware Architecture
As the implementation target in this work is an FPGA, we opted for VHDL to describe the algorithm's behavior and architecture. The Quartus II software is used for synthesis: its synthesis tool transforms the design code into a synthesizable Register Transfer Level (RTL) description with a gate-level netlist. The ModelSim-Altera tool is used for simulation to verify the correct behavior of the designed architecture.
VHDL is a hardware description language used here to describe the behavior of the studied algorithm; the functional VHDL description can then be converted into a logic gate schematic that can be implemented on FPGA boards [18]. The proposed architecture is intended to be implemented on different FPGA targets, so it is based on a structural description divided into a set of blocks. The various blocks describe the ADTF and DWT modules separately so that the modules can be processed simultaneously, which reduces the processing time.
The architecture of the proposed method is composed of two main blocks, the first for the ADTF denoising stage and the second for the DWT denoising stage. Fig. 8 shows the RTL schematic of the global architecture.
The ADTF block incorporates three functional blocks. ADTF-LOAD (FB1) is a shift register that prepares the signal window for the second functional block, ADTF-TREATMENT (FB2), which calculates the parameters needed by the ADTF process. The third functional block, ADTF-TEST (FB3), applies the thresholding operation to the median value of the window.
The output of the first block goes through the second block, where a window of eight elements is prepared by the DATA-LOAD functional block (FB4); the DWT, details elimination, and reconstruction are then applied.
The purpose of FB1 (Fig. 9) is to prepare the window for the following functional blocks; it receives the input ECG signal at a frequency of 360 Hz (the MIT-BIH database) and outputs a window of 10 samples using a shift register. This permits the online processing of cardiac signals.
FB2 (Fig. 10) computes the average, the maximum, and the minimum of the window received from FB1. The result of the average computation is coded in 30 bits and is reduced, for resource optimization, to 16 bits: 11 bits for the integer part and the remaining 5 bits for the fractional fixed-point part. The maximum and minimum are coded in 11 bits and are calculated using loop tests.
FB3 (Fig. 11) applies the denoising operation by calculating the higher and lower thresholds (Ht and Lt) using the parameters received from FB2. To compute Ht and Lt, the α coefficient is used as given in equations (2) and (3). An 11-bit register is reserved to store the α value, where α = 0.1: one bit for the integer part, representing the zero, and 10 bits for the fractional part.
For the correction stage, the median value of the selected window is compared to the integer part of the two thresholds, and the result is assigned to the output of the module. The output can take one of three values: it is the same as the median value if the latter lies between Lt and Ht; it takes Ht if the median value exceeds Ht; or it takes Lt if the median value is less than Lt.
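The fixed-point formats described for FB2 and FB3 can be illustrated with integer arithmetic. This is a Python sketch, not the VHDL of the architecture: the rounding and truncation rules, the unsigned sample range, the threshold formulas (the usual ADTF form, Ht = µ + (Max − µ)·α and Lt = µ − (µ − Min)·α), and all names are assumptions. Only the bit widths (5 fractional bits for the mean and the output, 10 fractional bits for α) come from the text.

```python
FRAC_ALPHA = 10  # alpha register: 1 integer bit + 10 fractional bits
FRAC_MEAN = 5    # mean and output: 11 integer bits + 5 fractional bits

# alpha = 0.1 quantized to 10 fractional bits: 102 / 1024 = 0.099609375
ALPHA_Q = round(0.1 * (1 << FRAC_ALPHA))


def thresholds_fixed(window):
    """Integer-only Lt and Ht, both scaled by 2**FRAC_MEAN.

    Assumes unsigned integer samples; divisions and shifts truncate.
    """
    mean_q = (sum(window) << FRAC_MEAN) // len(window)
    max_q = max(window) << FRAC_MEAN
    min_q = min(window) << FRAC_MEAN
    ht_q = mean_q + (((max_q - mean_q) * ALPHA_Q) >> FRAC_ALPHA)
    lt_q = mean_q - (((mean_q - min_q) * ALPHA_Q) >> FRAC_ALPHA)
    return lt_q, ht_q


def correct_sample(sample, lt_q, ht_q):
    """Compare an integer sample with the integer parts of the thresholds.

    The result always carries FRAC_MEAN fractional bits.
    """
    lt_int, ht_int = lt_q >> FRAC_MEAN, ht_q >> FRAC_MEAN
    if sample > ht_int:
        return ht_q                # output takes Ht
    if sample < lt_int:
        return lt_q                # output takes Lt
    return sample << FRAC_MEAN     # sample padded with five zero fractional bits
```

For a window of nine zeros and one sample of 100, this gives Ht = 606/32 = 18.9375 and Lt = 289/32 = 9.03125, close to the ideal values 19 and 9. The small offsets come from quantizing α to 102/1024 and from truncation.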
The output is coded in 16 bits: 11 bits for the integer part and 5 bits for the fractional part. If the median value, which is coded in 11 bits, is assigned to the output, five zeros are appended as the fractional fixed-point part.
The output of the ADTF denoising block is the input of the second block, which performs the DWT denoising. FB4 (Fig. 12) loads eight samples of the signal, which form the part of the signal to which the DWT is applied. This size is imposed by the number of coefficients of the Db4 mother wavelet, which is eight. The output, therefore, is a window of eight elements coded in 16 bits.
FB5 (Fig. 13) is the main functional block of the second block, where the wavelet transform is applied to the eight elements. The signal is decomposed over two levels to extract the details of levels 1 and 2; the denoising process then eliminates the extracted details. The input of FB5 is the eight elements from FB4, coded in 16 bits. The output represents the result of the decomposition, denoising, and reconstruction operations, and is resized to 16 bits: 11 bits for the integer part and 5 bits for the fractional part.

The MIT-BIH Arrhythmia database of PhysioNet [26], an international database, is used to test the functioning of the VHDL architecture. It contains 48 half-hour records. These signals are sampled at a frequency of 360 Hz with an 11-bit resolution. For the test, White Gaussian Noise (WGN) with SNR levels of 5 dB, 10 dB, and 20 dB is added to the original signals before the denoising process.
The simulation is done in the ModelSim-Altera software in order to evaluate the correct behavior of the VHDL architecture of the hybrid technique. Fig. 14 shows the simulation results of the hybrid technique applied to signal 100 of the MIT-BIH database, to which we added White Gaussian Noise of 20 dB. The simulation results demonstrate the high performance of the algorithm in noise reduction without distortion of the original signal, and therefore the conservation of its morphology, as is clearly shown in Fig. 14.
Once the architecture is synthesized, the implementation is the next step after timing verification. Fig. 15 shows the timing simulation of the Hybrid-top-level-module of the architecture. As can be seen, the system responds in 0.3 ms using a processing clock of 50 kHz, which comfortably meets the real-time constraint for an acquisition frequency of 360 Hz.
The devices used in the comparison are classified as low-cost and low-power technologies, so the architecture of the hybrid technique does not need expensive FPGA boards to ensure high performance. The study is done for the Cyclone III, Cyclone IV, Cyclone V, and Arria II families.
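The real-time margin reported for the timing simulation can be checked with simple arithmetic. The short script below only restates the figures given in the text (360 Hz acquisition, 50 kHz processing clock, 0.3 ms response); the variable names are illustrative.

```python
ACQ_FREQ_HZ = 360    # sampling frequency of the MIT-BIH records
CLOCK_HZ = 50_000    # processing clock reported in the text
LATENCY_S = 0.3e-3   # reported response time

sample_period_ms = 1000.0 / ACQ_FREQ_HZ          # time between two samples
latency_cycles = round(LATENCY_S * CLOCK_HZ)     # response time in clock cycles
budget_used = LATENCY_S * ACQ_FREQ_HZ            # fraction of one sample period

print(f"sample period : {sample_period_ms:.2f} ms")
print(f"latency       : {latency_cycles} clock cycles")
print(f"budget used   : {budget_used:.1%}")
```

The response time corresponds to 15 clock cycles and occupies roughly 11% of the interval of about 2.78 ms between two consecutive samples, which is why each sample can be processed before the next one arrives.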

C. Hardware Resources Consumption and Discussion
The hybrid architecture uses less than 1% of the total registers on all FPGA devices: a total of 292 for Cyclone IV GX, Cyclone III LS, Cyclone IV E, and Arria II GX, and a total of 329 for Cyclone V, as shown in Fig. 16. The logic element occupancy varies between 3% on Cyclone V and 60% on Cyclone IV GX, as can be seen in Fig. 17. The global architecture uses a total of 28 pins: 11 pins for the input signal, which is coded in 11 bits, 16 pins for the output (corrected) signal, and one pin for the clock. This represents 9% for Cyclone IV E and Cyclone III LS, 10% for Cyclone V, 16% for Arria II GX, and 35% for Cyclone IV GX, as shown in Fig. 18.
DSP blocks are available only in the Cyclone V and Arria II technologies; these blocks contain units optimized for some arithmetic operations, such as multiplication. The architecture uses 4 DSP blocks in the case of Arria II GX, which represents 2% of the total blocks, and 127 DSP blocks on Cyclone V, which is 81% of the DSP blocks available on this device. The other devices use embedded 9-bit multiplier elements in place of DSP blocks to optimize multiplications; the architecture needs eight embedded 9-bit multipliers, which is 5% for Cyclone IV GX, 3% for Cyclone IV E, and 2% for Cyclone III LS, as shown in Fig. 19. The architecture does not need any memory blocks.

V. CONCLUSION
In this paper, a hardware architecture for hybrid-technique-based ECG signal denoising is presented to satisfy the requirements of medical applications such as ECG monitoring in terms of real-time processing, low power consumption, and portability. The algorithm is first evaluated in Matlab for validation; then, a VHDL description is presented for FPGA implementation. The architecture is suitable for implementation on low-cost FPGA families because of the small area it requires, which leaves room to add other blocks for further processing tasks such as QRS and abnormality detection. The simulation results show that the system's response takes 0.3 ms, meeting the real-time processing constraint imposed by an acquisition period of 2.77 ms.
This study opens the way to designing a global architecture that extracts the characteristics needed for heart rate computation and, subsequently, heart disease detection, in order to put into practice a system allowing real-time monitoring of patients' cardiac state.

ACKNOWLEDGMENT
We would like to thank the CNRST (National Centre for Scientific and Technical Research) of Morocco for the support (scholarship number: 588UIZ2017).