One-Lead Electrocardiogram for Biometric Authentication using Time Series Analysis and Support Vector Machine

In this research, a person identification system was simulated using electrocardiogram (ECG) signals as the biometric. Ten adults participated as subjects, and their ECG signals were recorded with a one-lead ECG machine. A total of 65 raw ECG recordings from the 10 subjects were analyzed. The raw signals were processed using the Hjorth Descriptor and Sample Entropy (SampEn) to obtain the signal features. A Support Vector Machine (SVM) was used as the classifier for subject authentication based on the recorded ECG signals. The highest accuracy, 93.8%, was obtained with the Hjorth Descriptor. Compared to SampEn, this method is promising for implementation because it performs well and uses fewer features.
Keywords: ECG; biometric; Hjorth; sample entropy; SVM


I. INTRODUCTION
Biometrics can be defined as the measurement of unique physical features found in each person. The behavioral or physiological characteristics of an individual can be used to differentiate one person from another [1]. Automatic biometric systems are widely used in applications such as person identification and access control, area inspection, and criminal processing. There has been growing research and development on multimodal biometric systems, which use more than one biometric modality to enhance accuracy and security [2].
Biometrics can be classified into two types: physiological and behavioral [3]. Physiological biometrics relate to the physical characteristics of the body or human organs, such as facial pattern, fingerprint, iris, hand geometry, DNA and body odor. However, these biometric characteristics tend to be easily falsified and can be forcibly obtained or physically damaged [3]. Therefore, an alternative biometric system with a unique feature that is difficult to falsify is needed [4].
A biometric modality that meets these criteria is the biopotential, or bio-signal. The use of the electroencephalogram (EEG) and ECG as bio-signal-based biometric modalities has been widely investigated [5]-[9]. Bio-signals have the potential to become future biometrics because they are difficult to falsify and systems based on them are difficult to attack. Compared with EEG, ECG has several advantages: it tends to be linear, has a continuous signal with a regular rhythm, has low complexity, and is relatively simple to acquire. For these reasons, the ECG signal was selected as the biometric modality in this research. An advantage of heart-signal biometrics is that the electrical activity of the human heart is almost impossible to duplicate. In addition, the natural characteristics of this biometric make it possible to increase security in comparison with traditional biometric systems.
ECG-based biometric authentication methods include analysis in the time domain, frequency domain or time-frequency domain. The analysis is used to obtain features from each subject's ECG, which are later matched against the database for authentication. The most widely used frequency-domain analysis methods are the wavelet and Fourier transforms. Belgacem [10] reported an analysis of ECG signals from 20 subjects for authentication using the discrete wavelet transform (DWT). In that research, the DWT was used to obtain feature coefficients from the ECG waves, and a random forest algorithm performed the authentication based on those features. Anita [11] proposed ECG biometrics for human recognition using the Haar wavelet and reported 98.96% and 98.48% classification accuracy for identification on three different databases, i.e. the QT database, the PTB database and the MIT-BIH arrhythmia database.
The wavelet transform was also applied to ECG biometrics by Wei-Quan [6], who gave a detailed derivation of the wavelet transform followed by an accuracy test through MATLAB simulation. That research, however, did not report the authentication or classification method used. Wavelet transforms as a basis for ECG biometrics were also reported by Chee Yeen [12], with a focus on the effect of various features on authentication performance and accuracy. Chee Yeen aimed to find a dominant feature producing the best authentication performance. The Support Vector Machine (SVM) method was used for the feature-based authentication.
Another frequency-domain analysis method for ECG biometrics is the Fourier transform. The study in [13] reported that the Fast Fourier Transform (FFT) combined with a nearest-neighbour classifier performed well for ECG biometrics. The FFT was also used in a biometric study based on simultaneous ECG and electromyogram (EMG) waves by Belgacem [14], with the Optimum-Path Forest classifier for authentication.
The studies described above are examples of proposed ECG biometric systems that perform well for person authentication. Nevertheless, feature extraction in the frequency domain tends to have high computational complexity, long processing times and relatively large memory requirements. An alternative method is therefore needed, and one option is the time-domain analysis proposed in this research. This study focuses on time series analysis using the Hjorth Descriptor and Sample Entropy for ECG biometrics. These methods were selected because they performed well in previous research on classifying ECG and epileptic EEG signals [15]-[17]. Both methods are basically used to analyze signal complexity. The signal shape and ECG rhythm vary from person to person and therefore give different measures of complexity, which is why both methods are considered for the proposed system. They are combined with the Support Vector Machine (SVM) algorithm as the classifier. Applying these two feature extraction methods makes it possible to determine the simplest method, one with relatively fast computing time that is still expected to provide high accuracy. The theoretical and practical contribution of this research is the selection of an appropriate method for person authentication through ECG signals, i.e. an algorithm chosen to reduce computational complexity. The designed algorithm is thus intended to authenticate individuals with high accuracy and low computational complexity.
This paper is organized as follows. Section 2 describes the related theory used in this paper. Section 3 describes the system design. Section 4 presents the results and discussion, including the performance of each method. Finally, Section 5 presents the conclusions of the research.

A. Biometric
In essence, a biometric system identifies individuals based on differences in their behavioral or physiological characteristics [1]. These characteristics are likely to be unique to each person. Biometric authentication is also considered more reliable than authentication based on passwords, tokens or knowledge. The main problem in building a practical biometric system is how to determine whether someone should be authenticated. A biometric system works in several stages, the first of which is enrollment. At this stage, the input is scanned by a biometric sensor and represented in digital form. The subsequent stage is matching [18], in which the input is matched against the stored database. As explained in the previous section, physiological biometrics relate to the physical characteristics of the body. Behavioral biometrics that might be used include voice, gait, signature and speech rhythm, but behavioral biometrics tend to be easy to falsify. These two biometric types are shown in Fig. 1.

B. Hjorth Descriptor
The Hjorth Descriptor is a set of parameters used to quantify and extract signal features. It was initially used to analyze EEG signal characteristics, but in [15], [16] it proved to perform well on ECG signals. Therefore, we use the Hjorth method in the proposed system. The Hjorth Descriptor consists of three parameters: activity, mobility and complexity. If x(n) is the input signal, its first-order difference is

$$x'(n) = x(n) - x(n-1) \tag{1}$$

and the Hjorth parameters are given by (2)-(4) [20]:

$$\text{Activity} = \operatorname{var}\big(x(n)\big) \tag{2}$$

$$\text{Mobility} = \sqrt{\frac{\operatorname{var}\big(x'(n)\big)}{\operatorname{var}\big(x(n)\big)}} \tag{3}$$

$$\text{Complexity} = \frac{\text{Mobility}\big(x'(n)\big)}{\text{Mobility}\big(x(n)\big)} \tag{4}$$

C. Sample Entropy (SampEn)
Sample Entropy (SampEn) is an improvement on the Approximate Entropy (ApEn) method, proposed by Richman and Moorman [21]. It corrects the bias in ApEn caused by self-matches, in which a sequence is counted as matching itself. Compared with ApEn, SampEn performs well on short, noisy data sequences and is able to separate large signal variations. SampEn is widely used to measure signal complexity. Rizal [17] showed that SampEn can provide high accuracy in the classification of epileptic EEG signals.
SampEn calculates the probability that a data sequence matches another sequence in the signal within tolerance r. This probability is expressed by $X^m(r)$ and $Y^m(r)$, which denote, respectively, the probability that two data sequences match for m + 1 points and the probability that two data sequences match for m points within tolerance r. The SampEn equation can be expressed by:

$$\text{SampEn}(m, r) = -\ln\left[\frac{X^m(r)}{Y^m(r)}\right]$$
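
As an illustration of the matching procedure described above, the following is a minimal pure-Python sketch, not the authors' implementation. The function name `sample_entropy`, the default embedding dimension `m = 2` and the absolute tolerance `r = 0.2` are assumptions made for the example; the paper does not state the values it used. Template matches are counted with the Chebyshev (maximum) distance and self-matches are excluded.

```python
import math


def sample_entropy(x, m=2, r=0.2):
    """Sample entropy of sequence x (m and r are illustrative assumptions)."""
    n = len(x)

    def count_matches(length):
        # The same number of templates (n - m) is used for both lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            # j starts at i + 1, so a template is never matched with itself.
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    matches_m = count_matches(m)       # pairs matching for m points
    matches_m1 = count_matches(m + 1)  # pairs matching for m + 1 points
    if matches_m == 0 or matches_m1 == 0:
        return float("inf")            # undefined when no matches are found
    return -math.log(matches_m1 / matches_m)
```

A perfectly regular sequence gives a value of zero, and less predictable sequences give larger values. In practice the tolerance is often chosen relative to the standard deviation of the signal, which is a design choice left open here.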

D. Support Vector Machine (SVM)
The concept of the Support Vector Machine (SVM) is to design a hyperplane that classifies all training data into two classes. Fig. 2 shows several patterns that are members of two classes. Line-1 and Line-2 are examples of possible discrimination boundaries [22] considered when searching for the best hyperplane. For the linear SVM used in this study, the equations of Line-1 and Line-2 were obtained by the approach in [23].
Some studies that used SVM for classification of electrocardiogram signals include: classification of ECG arrhythmias into four types with an experimental result of 93% [24]; cardiac beat detection using single-lead ECG, for which SVM achieved 99.68% [25]; and an automatic classifier for detecting five pathologies (AAMI standard), which reached an accuracy of 99.17% with SVM tested on the MIT-BIH ECG Arrhythmia Database [26]. In another study, an SVM classifier achieved 90% accuracy in detecting abnormalities from ECG signals, developed for remote healthcare systems. Other reviews of SVM as a biometric classifier can be found in [27]. Based on the research described above, SVM achieved an accuracy of ≥90% in classifying ECG signals; thus, it was selected as the method in this study.
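
The hyperplane idea described at the start of this subsection can be sketched for the two-class linear case as follows. This is a generic textbook formulation, not the equations of [23]: the weight vector `w`, the bias `b` and the function names are hypothetical, and the values of `w` and `b` would normally come from training.

```python
import math


def decision_value(x, w, b):
    """Signed score w . x + b of feature vector x relative to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(x, w, b):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(x, w, b) >= 0 else -1


def margin_width(w):
    """Distance between the two margin boundaries, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))
```

Among all separating hyperplanes, the SVM prefers the one with the largest margin, which corresponds to the smallest norm of `w` that still separates the training data.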

E. Performance Parameter
The performance of a classifier is measured by three parameters: sensitivity, specificity, and accuracy [28], which are considered for validation [29]. These three parameters can be calculated from the data in the confusion matrix [30], as shown in Fig. 3. Accuracy in machine learning systems can be interpreted as the proportion of correct predictions made by the classifier over a specific data set [31]. Sensitivity measures the ability of a classifier to assign observations correctly to a given category [31] and is often referred to as the TPR (True Positive Rate). Specificity, meanwhile, measures the ability to correctly reject observations that do not belong to the category and is called the TNR (True Negative Rate) [32]. The values of sensitivity, specificity, and accuracy, based on Fig. 3, are calculated with the following equations [33][34][35]:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively.
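
For a multi-class problem such as subject identification, these quantities are usually computed per class in a one-versus-rest manner. The sketch below assumes a square confusion matrix `cm` in which `cm[i][j]` counts records of true class `i` predicted as class `j`; the function names and this layout are assumptions for illustration only.

```python
def per_class_metrics(cm):
    """Return a list of (sensitivity, specificity) pairs, one per class."""
    total = sum(sum(row) for row in cm)
    k = len(cm)
    results = []
    for c in range(k):
        tp = cm[c][c]                               # class c predicted as c
        fn = sum(cm[c]) - tp                        # class c predicted as another class
        fp = sum(cm[i][c] for i in range(k)) - tp   # other classes predicted as c
        tn = total - tp - fn - fp                   # everything else
        sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        results.append((sensitivity, specificity))
    return results


def overall_accuracy(cm):
    """Fraction of all records that lie on the diagonal of the matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total
```

Averaging the per-class pairs gives average sensitivity and specificity figures of the kind reported in the results section.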

III. SYSTEM DESIGN
Based on [36], there are two models of biometric systems: 1) A verification system compares the biometric of a person with the single reference biometric in the database that the person claims to match. In a verification system, one input is compared against one database entry.
2) An identification system compares a biometric with all biometrics in the database. Identification involves a search, because one input is matched against many database samples.
In this study, the proposed biometric system is an identification system: a database of ECG signal templates is stored, and this data is used for comparison whenever an input requests authentication. The biometric mechanism used in this research is shown in Fig. 4 and explained in the following sections.

A. ECG Signal Acquisition
ECG measures the heart's electrical activity and is widely used for cardiovascular disease monitoring [37]. The ECG varies in rhythm, shape and amplitude from person to person, which is why it is proposed for biometrics. In the proposed biometric system, ECG signals were acquired using a one-lead ECG device. ECG acquisition is based on the leads of Einthoven's triangle, shown in Fig. 5. Data collection was carried out with a sampling frequency of 100 Hz for approximately 60 seconds on 10 subjects. The ECG signals were recorded under normal, relaxed conditions without any activity. This raw data is the main input to the feature extraction process. Fig. 6 shows an example of ECG acquisition from an adult subject.
The ECG signals were then stored in a text file as 10-bit decimal values in the range 0 to 1023. The ECG waveform of each subject is illustrated in Fig. 7.
The ECG graphs in Fig. 7 show that each subject had a complete set of ECG signal components, namely the PQRST wave. Visually, the shape of this wave varied from one subject to another. This observation supports the initial hypothesis and provides a strong basis for successful authentication in the proposed system.

B. Feature Extraction
At this stage, the raw ECG signal of each subject was preprocessed by scaling the signal amplitude to the range -1 to +1, with the aim of reducing the calculation complexity in the feature extraction process. The following equations were used in pre-processing:

$$y(n) = x(n) - \frac{1}{N}\sum_{i=1}^{N} x(i) \tag{13}$$

$$z(n) = \frac{y(n)}{\max\lvert y(n)\rvert} \tag{14}$$

where x(n) is the raw signal of N samples. Equation (13) removes the DC component of the signal. Equation (14) scales the signal amplitude to the range -1 to +1. Fig. 8 portrays the signal pre-processing results.
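
The two pre-processing steps, DC removal followed by amplitude scaling, can be sketched as below. This is an illustrative reading of the description, assuming that the DC component is the mean of the record and that scaling divides by the largest absolute sample; the function name `preprocess` is hypothetical.

```python
def preprocess(raw):
    """Remove the DC offset, then scale the signal into the range -1 to +1."""
    mean = sum(raw) / len(raw)
    centered = [v - mean for v in raw]       # DC removal (assumed: subtract the mean)
    peak = max(abs(v) for v in centered)
    if peak == 0:
        return centered                      # flat signal: nothing to scale
    return [v / peak for v in centered]      # largest absolute value becomes 1
```

For example, `preprocess([1, 2, 3])` returns `[-1.0, 0.0, 1.0]`. The same function can be applied directly to the 10-bit sample values read from a recording.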
The next process is feature extraction, which yields the feature coefficients. In this study, the Hjorth Descriptor and Sample Entropy methods were used to obtain the signal features. These methods extract signal complexity parameters from each ECG recording of each subject. This process produces a feature database for each subject, which is then compared with the test data. The signal features of each subject are displayed in the following tables and graphs. The graphs in Fig. 9 and Fig. 10 show the average value of each signal feature for each subject. Tables 1 and 2 show that the average feature values differed from one subject to another, although within a narrow range. The differences were small because the ECG signals of different individuals had similar magnitude, frequency and QRS complex shape. However, differences in signal characteristics between individuals could still be seen visually. In addition, the similarity in values occurred only in some features. This makes it easier for the classifier to distinguish one individual from another.
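
As a sketch of how the three Hjorth features of one record might be computed, the code below uses the standard definitions: activity is the variance of the signal, mobility is the square root of the ratio between the variance of the first difference and the variance of the signal, and complexity is the mobility of the first difference divided by the mobility of the signal. The function names are hypothetical and the population variance is an assumption; a constant signal would cause a division by zero and is not handled here.

```python
import math


def variance(x):
    """Population variance of a sequence."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def diff(x):
    """First-order difference x(n) - x(n-1)."""
    return [x[i] - x[i - 1] for i in range(1, len(x))]


def hjorth(x):
    """Return (activity, mobility, complexity) of signal x."""
    dx = diff(x)    # first difference
    ddx = diff(dx)  # second difference
    var_x, var_dx, var_ddx = variance(x), variance(dx), variance(ddx)
    activity = var_x
    mobility = math.sqrt(var_dx / var_x)
    complexity = math.sqrt(var_ddx / var_dx) / mobility
    return activity, mobility, complexity
```

For a pure sinusoid the complexity is close to 1, and it grows as the waveform departs from a sinusoidal shape, which is what makes the three values usable as a compact feature vector per record.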

C. Classification and Validation
To test the accuracy of the system in authenticating persons, SVM was used as the classifier. The SVM types used were linear, quadratic, cubic and Gaussian. Several types were used in order to obtain the best accuracy. Validation was carried out using N-fold cross-validation (NFCV) with N = 10, which divides the data into N subsets, where one subset is the test data and the remaining N-1 are training data. In this study, the process was iterated 10 times and the reported accuracy is the average of the accuracies of the iterations.
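
The fold construction and averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the toolchain used in the study: the function names, the fixed random seed and the `score_fold` callback (which would train a classifier on the training indices and return its accuracy on the test indices) are all hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random assignment to folds
    folds = [indices[i::k] for i in range(k)]     # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(n_samples, score_fold, k=10, seed=0):
    """Average the score returned by score_fold(train, test) over k folds."""
    scores = [score_fold(train, test)
              for train, test in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)
```

With 65 records and 10 folds, each test fold holds 6 or 7 records and every record is used as test data exactly once.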

IV. RESULTS AND DISCUSSION
In this section, the accuracy of the designed system is tested. The total number of datasets was 65 from 10 persons, where each person had 4 to 9 datasets. In this research, 10-fold cross-validation was used to divide the data randomly into training and test datasets, iterating N times until every dataset had been used as both training data and test data. The cross-validation model is illustrated in Fig. 11.

A. System Accuracy using Hjorth Descriptor
Table 3 shows the authentication accuracy of each classifier in the experiment using the Hjorth Descriptor.
Table 4 shows the confusion matrix for the best result in Table 3, where the highest accuracy of 93.8% was obtained using the Gaussian SVM. This result was quite consistent with those of the other SVM types, indicating that the Hjorth Descriptor performs well in separating the signals of each person. In this test, the average sensitivity and specificity were 93.1% and 99.32%, respectively. The accuracy is also strongly affected by the fact that the Hjorth Descriptor is prone to noise [22], which can affect the value of activity (variance). Thus, in further study, denoising should be performed at the preprocessing stage without removing the information or characteristics of the ECG signal. Another disadvantage is that the Hjorth Descriptor does not perform well on long signals, so signal segmentation is required. In future research, this could be done by limiting the number of processed PQRST waves.

B. System Accuracy Using Sample Entropy
Table 5 presents the results of the individual authentication in an experiment using Sample Entropy.

Fig. 2. The Determination of Hyperplane in Support Vector Machine.

Fig. 7. The Graph of ECG Signals in Each Subject.

TABLE I. MEAN AND STD. DEV OF HJORTH PARAMETERS

TABLE II. MEAN AND STD. DEV OF SAMPLE ENTROPY

TABLE III. ACCURACY OF HJORTH DESCRIPTOR

TABLE IV. CONFUSION MATRIX USING HJORTH

TABLE V. ACCURACY ON SAMPLE ENTROPY

TABLE VI. CONFUSION MATRIX USING SAMPLE ENTROPY