Sleep Apnea Detection Method Based on Improved Random Forest

Abstract: Random forest (RF) helps to solve problems such as the detection of sleep apnea (SA) by constructing multiple decision trees, but there is no definite rule for selecting the input features of the model. In this paper, we propose an SA detection method based on fuzzy C-means clustering (FCM) and backward feature elimination, which improves the sensitivity and accuracy of SA detection by selecting the optimal set of features to input to the RF model. First, FCM clustering is performed on the RR interval features of ECG signals. The backward feature elimination method is then combined with the intra-cluster compactness, inter-cluster separation and silhouette coefficient metrics to eliminate redundant features and determine the optimal feature set, which is input into the RF to detect SA. Experimental results on the Apnea-ECG database show an SA detection accuracy of 88.6%, a sensitivity of 90.5%, and a specificity of 85.5%. By using FCM to adaptively select a smaller number of more discriminative features, the algorithm reduces the input dimensionality and improves the accuracy and sensitivity of the RF model for SA detection.


I. INTRODUCTION
Sleep apnea (SA) leads to nocturnal hypoxia and hypercapnia, which elevate blood pressure and heart rate [1], and may lead to cardiovascular diseases such as hypertension, coronary artery disease, and cardiac arrhythmia, and even to serious consequences such as heart failure and sudden death [2][3]. Studies have shown that SA is also associated with neurological disorders such as Alzheimer's disease, Parkinson's disease and depression. Owen, et al. first identified Alzheimer's-like amyloid plaques in the brains of people with clinically verified obstructive SA [4]. Patients with SA in middle age are more likely to develop Alzheimer's disease in old age [5]. Parkinson's disease patients have degeneration in the brain stem area that controls breathing, which can cause reduced respiratory muscle function and sleep-disordered breathing. Yang, et al. proposed a method to detect Parkinson's disease and predict disease severity from nocturnal breathing [6]. SA can also cause neurasthenia, which affects the classification of autistic patients using electroencephalography [7]. Therefore, timely, accurate and convenient detection of SA is of great significance.
Scholars have carried out extensive research on SA monitoring. Tagluk, et al. proposed an SA detection method based on wavelet transform and artificial neural network [8], which uses a multi-resolution wavelet transform to decompose the abdominal breathing signal into multiple spectral components; these spectral components are input into the artificial neural network to classify the SA condition of patients. However, detecting abdominal breathing often requires larger devices, which affects the patient's daily life. Mendez, et al. investigated a method for detecting SA based on empirical mode decomposition (EMD) and wavelet analysis (WA) of ECG signals [9], in which features extracted from the decomposition results, a heart rate variability time-domain measure and three additional nonlinear measures are used as inputs to a linear discriminant classifier. However, this method requires many input feature parameters and a complex computational model, so its robustness needs to be improved. Iwasaki, et al. analyzed the adjacent R-wave intervals in ECG signals and used a long short-term memory (LSTM) model to detect SA [10]. Urtnasan, et al. proposed a deep learning architecture based on a convolutional neural network using a single-lead ECG signal for SA classification [11]. However, such deep learning models usually require multiple experiments to determine the algorithm parameters and lack interpretability.
Among the commonly used machine learning algorithms, the random forest (RF) algorithm has been widely used in biomedical signal processing because it can handle high-dimensional data, capture nonlinear relationships, reduce the risk of overfitting, and tolerate noise [12][13]. The RF algorithm can also help to solve the problems of signal classification and anomaly detection [14][15]. However, the RF algorithm has no definite rules for selecting input features. Although the algorithm itself measures the contribution of each feature by means of feature importance assessment [16], this is only a relative metric, which does not indicate which features are important in absolute terms and should be selected. Selecting appropriate features is a relatively subjective process that depends on the specific dataset and problem domain [17].
To address the problem of how to select appropriate input features, this paper proposes an SA detection method based on fuzzy C-means clustering (FCM) and backward feature elimination. The method selects appropriate input features through FCM clustering indices, which reduces the dimensionality of the input features, reduces the complexity of model training and prediction, and prevents excessive noise and irrelevant information from interfering with the model. Feature selection allows the model to focus more on important features, thus better capturing patterns and relationships in the data and improving the accuracy and sensitivity of the RF model for SA detection.
*Corresponding Author.
www.ijacsa.thesai.org
The focus of this paper is to improve the random selection process of input features in the traditional RF method. FCM is applied to the input features, and the intra-cluster compactness, inter-cluster separation and silhouette coefficient of the samples are calculated. Combined with backward feature elimination, redundant features are eliminated to determine the optimal selection. Better SA classification accuracy can be obtained by using only a small number of distinct features. The structure of this paper is as follows: Section II describes the data set and the method used in this paper, Section III presents the experimental results and analysis, Section IV is the discussion, and Section V is the conclusion.

A. Datasets
Apnea-ECG database [18]: the database consists of 70 records divided into a training set of 35 records (a01 to a20, b01 to b05 and c01 to c10) and a test set of 35 records (x01 to x35). The individual records range in length from seven to ten hours and are sampled at 100 Hz. Each record consists of the ECG signal, a set of manual apnea annotations and a set of machine-generated QRS annotations. In addition, eight records (a01 to a04, b01, and c01 to c03) are accompanied by four additional signals: chest and abdominal respiratory signals obtained using respiratory inductive plethysmography, oronasal airflow recorded with a nasal thermistor, and oxygen saturation.

B. Extracting ECG Signal Features
SA causes changes in the autonomic nervous system and cardiovascular regulation, leading to prolongation or shortening of the RR interval [19]. In this paper, the Pan-Tompkins algorithm was adopted to identify QRS complexes [20], the RR interval was determined as the time difference between adjacent R peaks, and the feature information was extracted from the RR interval sequence.
QRS wave detection: the raw ECG signals were processed in one-minute segments according to the annotation file. Per-segment SA detection determines whether each one-minute segment is SA or normal, and it is an important basis for SA diagnosis in suspected patients [21]. Band-pass filtering, differentiation, squaring, moving-window integration and threshold detection were performed on each segment using the Pan-Tompkins algorithm to locate the R wave, as shown in Fig. 1.
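The segmentation and RR computation described above can be illustrated with a minimal sketch. It assumes that R-peak positions are already available as sample indices (for example from a Pan-Tompkins detector, which is not implemented here). The function name `rr_by_minute` and the rule that an RR interval belongs to the minute containing its second R peak are illustrative assumptions, not details taken from this paper.

```python
from collections import defaultdict

FS_HZ = 100           # Apnea-ECG sampling frequency
SEGMENT_SECONDS = 60  # one-minute segments, matching the annotations


def rr_by_minute(r_peak_samples, fs=FS_HZ, segment_seconds=SEGMENT_SECONDS):
    """Group RR intervals (in seconds) by one-minute segment index.

    r_peak_samples: increasing sample indices of detected R peaks.
    Each RR interval is assigned to the minute in which its second
    R peak falls (an assumption made for this sketch).
    """
    segments = defaultdict(list)
    for prev, curr in zip(r_peak_samples, r_peak_samples[1:]):
        rr_seconds = (curr - prev) / fs            # time between adjacent R peaks
        minute = curr // (fs * segment_seconds)    # index of the one-minute segment
        segments[minute].append(rr_seconds)
    return dict(segments)
```

At 100 Hz, each one-minute segment covers 6000 samples, so an R peak at sample 6075 falls in segment 1.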
Feature extraction: based on the results in the literature [22][23], we performed feature extraction on the obtained RR intervals, as shown in Table Ⅰ.

AVRR: The average of all RR intervals.
The standard deviation of all RR intervals.
RMSSD: The root mean square value of the difference between all neighboring RR intervals, $\mathrm{RMSSD} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N-1}\left(RR_{i+1} - RR_{i}\right)^{2}}$, where $N$ is the number of RR intervals.
PRR: The peak value of all RR intervals.
PNN50: Percentage of the number of heartbeats where the difference between two neighboring RR intervals is greater than 50 ms.
KRR: The skewness and kurtosis of the RR interval signal.
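As an illustration of the features listed in Table Ⅰ, the following sketch computes them for one segment of RR intervals given in milliseconds. Several choices here are assumptions made only for this example: the population form of the standard deviation, the use of the absolute successive difference for PNN50, and the representation of KRR by the fourth standardized moment (kurtosis) alone. The dictionary key `SD` is a hypothetical label for the standard deviation feature.

```python
import math
import statistics


def rr_features(rr_ms):
    """Compute RR interval features for one segment (RR values in ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]  # successive differences
    mean_rr = statistics.fmean(rr_ms)
    m2 = statistics.fmean([(x - mean_rr) ** 2 for x in rr_ms])  # 2nd central moment
    m4 = statistics.fmean([(x - mean_rr) ** 4 for x in rr_ms])  # 4th central moment
    return {
        "AVRR": mean_rr,
        "SD": statistics.pstdev(rr_ms),  # population standard deviation (assumption)
        "RMSSD": math.sqrt(statistics.fmean([d * d for d in diffs])),
        "PRR": max(rr_ms),
        "PNN50": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),
        "KRR": m4 / (m2 * m2) if m2 > 0 else float("nan"),  # kurtosis only (assumption)
    }
```

Calling `rr_features` once per one-minute segment yields one feature vector per segment, which is the form of input assumed by the later clustering and classification steps.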

C. Determining the Best Subset of Features
There is redundancy among the RR interval features, which affects the accuracy of classification [24][25]; therefore, feature selection is needed before classification. The backward feature elimination method used in this paper is a greedy algorithm [26], which obtains the feature set with the smallest number of features and the highest correct classification rate. The specific process of the method is as follows:
(1) Initialization: Determine the complete feature set containing all features.
(2) Random feature elimination: Randomly eliminate one feature from the feature set to form a new feature set.
(3) FCM feature clustering [27]: Partition the samples of the new feature set into $n$ fuzzy clusters. Using the degrees of membership of the fuzzy clusters, which range from 0 to 1, iteratively optimize the objective function $S$ to find its minimum value:
$$S = \sum_{i=1}^{n}\sum_{k=1}^{N} u_{ik}^{q}\, d_{ik}^{2}, \qquad d_{ik} = \lVert x_{k} - v_{i} \rVert$$
where $N$ is the number of samples; $q > 1$ is the fuzziness exponent; $v_{i}$ denotes the clustering centroid of the $i$th class; $u_{ik}$ denotes the degree of membership between the $k$th sample and the $i$th class; and $d_{ik}$ denotes the Euclidean distance between the center of the $i$th class and sample $x_{k}$. Compare the degrees of membership $U^{(t)}$ and $U^{(t+1)}$ of two successive iterations: if $\lVert U^{(t+1)} - U^{(t)} \rVert < \varepsilon$ (the given sensitivity threshold), the objective function $S$ has reached its minimal value and the final clustering result has been obtained; otherwise, continue iterating until the convergence condition is satisfied.
(4) Calculate the clustering metrics: calculate the average intra-cluster compactness (AIC), average inter-cluster separation (AIS) and average silhouette coefficient (ASC) for each cluster.
AIC indicates the degree of compactness of the sample points within the FCM clusters. The lower the value, the more compact the sample points within the clusters. It is computed from the number of samples in each cluster, the number of clusters, and the distances between samples, and denotes the average distance of each data point to the clustering center.
AIS measures the separation between different clusters; higher values indicate greater separation and clearer boundaries between clusters. It is computed from the distances between the centers of each pair of clusters, and denotes the average distance between two clustering centers.
ASC compares, for each sample, its distance to its own cluster with its distance to the nearest other cluster:
$$\mathrm{ASC} = \frac{1}{N}\sum_{i=1}^{N}\frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \tag{13}$$
Here, $N$ is the total number of samples, $b(i)$ is the average distance between sample point $i$ and all the samples in the nearest cluster, and $a(i)$ denotes the average distance between sample point $i$ and the other samples within the same cluster.
(5) Determine the best feature subset: Compare the clustering metrics before and after the removal of a feature. If the metrics increase after the removal, continue to perform steps (2) to (4); otherwise, keep the feature. Go through the remaining features until every further removal reduces the corresponding metrics, then stop the search process and take the feature set before that removal as the best feature subset.
The overall flow chart of the backward feature elimination method is shown in Fig. 2.
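The clustering metrics and the greedy search described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the implementation used in this paper: AIC is taken as the mean member-to-center distance averaged over clusters, AIS as the mean pairwise distance between centers, the sweep over features is deterministic rather than random, and `score` is a hypothetical callable that clusters the candidate feature set and returns a single number where higher is better.

```python
import math


def compactness(points, labels, centers):
    """AIC: mean, over clusters, of the average member-to-center distance."""
    per_cluster = []
    for c, center in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            per_cluster.append(sum(math.dist(p, center) for p in members) / len(members))
    return sum(per_cluster) / len(per_cluster)


def separation(centers):
    """AIS: average distance between every pair of cluster centers."""
    pairs = [(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]]
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def silhouette(points, labels):
    """ASC: mean of (b - a) / max(a, b); needs at least two clusters."""
    total = 0.0
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        means = {lab: sum(d) / len(d) for lab, d in dists.items()}
        a = means.pop(labels[i], 0.0)  # 0.0 if the sample is alone in its cluster
        b = min(means.values())        # nearest other cluster
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def backward_eliminate(features, score):
    """Greedy backward elimination driven by a higher-is-better score."""
    current = list(features)
    best = score(current)
    improved = True
    while improved and len(current) > 1:
        improved = False
        for f in list(current):
            candidate = [g for g in current if g != f]
            s = score(candidate)
            if s > best:  # removal helps: drop the feature and restart the sweep
                current, best = candidate, s
                improved = True
                break
            # otherwise the feature is kept and the next one is tried
    return current
```

In practice `score` would run FCM on the samples restricted to the candidate features and combine the three metrics; how they are combined into one number is a design choice that this sketch leaves open.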

D. Detecting Sleep Apnea
The best feature subset determined in the previous step was used as input to the RF classifier, and the classification accuracy of SA was calculated using 10-fold cross validation and the confusion matrix.
(2) Attribute selection. Assume each sample has $M$ attributes; when a node of the decision tree needs to be split, $m$ attributes are randomly selected from these $M$ attributes (where $m$ is less than $M$).
(3) Calculate the Gini index and split the node. Assume $A$ is an attribute of data set $D$ that takes several different values, one of which is denoted $a$. Under the condition $A = a$, the dataset $D$ is partitioned into two parts $D_{1}$ and $D_{2}$, and the Gini index of this partition is:
$$\mathrm{Gini}(D, A = a) = \frac{|D_{1}|}{|D|}\,\mathrm{Gini}(D_{1}) + \frac{|D_{2}|}{|D|}\,\mathrm{Gini}(D_{2})$$
where $\mathrm{Gini}(D_{j}) = 1 - \sum_{c} p_{c}^{2}$ and $p_{c}$ is the proportion of samples in $D_{j}$ that belong to class $c$. Select the attribute with the smallest Gini index and its corresponding splitting point as the optimal attribute and optimal splitting node, generate two child nodes, and distribute the remaining training data into the two child nodes.
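The Gini-based node split can be illustrated with a short sketch. It assumes a continuous attribute that is split by a threshold (samples with value less than or equal to the threshold go to one child, the rest to the other), which is the usual treatment for numeric features such as RR interval statistics. The function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(values, labels, threshold):
    """Weighted Gini index of the partition values <= threshold / values > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(values, labels):
    """Return (threshold, gini) with the smallest Gini index, or None."""
    candidates = sorted(set(values))[:-1]  # splitting at the maximum leaves one side empty
    if not candidates:
        return None
    t = min(candidates, key=lambda c: gini_split(values, labels, c))
    return t, gini_split(values, labels, t)
```

For a node, `best_split` would be evaluated on each of the $m$ randomly selected attributes, and the attribute whose best threshold gives the smallest Gini index would define the split.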

E. Evaluation Criteria
The performance of the method was evaluated by calculating the accuracy, sensitivity and specificity of SA detection from the confusion matrix:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{15}$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{16}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{17}$$
where TP denotes the number of positive samples correctly classified as positive (true positive); TN denotes the number of negative samples correctly classified as negative (true negative); FP denotes the number of negative samples incorrectly classified as positive (false positive); and FN denotes the number of positive samples incorrectly classified as negative (false negative).
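A minimal sketch of these evaluation criteria is given below. The encoding of apnea segments as label 1 and normal segments as label 0 is an assumption made for illustration, as are the function names.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def evaluate(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    Assumes at least one positive and one negative sample, otherwise
    sensitivity or specificity is undefined (division by zero).
    """
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

Under 10-fold cross validation, the counts from the ten held-out folds can be summed before calling `evaluate`, or the three metrics can be averaged across folds; the text does not state which was used.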

III. RESULTS
In this paper, 70 records from the Apnea-ECG database are used as experimental samples, and the ECG signal of each record is divided into one-minute segments according to the annotation file. The RR interval features of the ECG signals are extracted using the Pan-Tompkins algorithm. The best feature subset selected by the backward feature elimination method is: AVRR, RMSSD, PRR and KRR.
Fig. 3 visualizes the membership matrix U to illustrate the membership distribution of the data points between the clusters before and after the removal of features. The membership distribution in Fig. 3(a) is more centralized, and the feature points have fuzzy attribution relationships among multiple clusters, which makes it difficult to assign them clearly to specific clusters. In contrast, the membership distribution in Fig. 3(b) is more dispersed, and the attribution of feature points is more explicit and differentiated.
Features with greater divergence are more favorable, since they allow the classifier to capture and represent the characteristic patterns of different types of SA events with higher accuracy.
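For reference, a membership matrix U of the kind visualized in Fig. 3 can be produced by the standard FCM iteration, sketched below in plain Python. The fuzziness exponent of 2, the random initialization, and the use of the largest absolute membership change as the convergence test are assumptions of this sketch; the matrix is stored with one row per sample and one column per cluster.

```python
import math
import random


def fcm(points, n_clusters, fuzziness=2.0, eps=1e-5, max_iter=100, seed=0):
    """Fuzzy C-means; returns (U, centers) with U[k][i] = membership of sample k in cluster i."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    power = 2.0 / (fuzziness - 1.0)
    for _ in range(max_iter):
        # Centroid update: weighted mean with weights u^fuzziness.
        centers = []
        for i in range(n_clusters):
            w = [u[k][i] ** fuzziness for k in range(n)]
            w_sum = sum(w)
            centers.append(tuple(
                sum(w[k] * points[k][d] for k in range(n)) / w_sum for d in range(dim)
            ))
        # Membership update from the distances to every centroid.
        new_u = []
        for k in range(n):
            dist = [max(math.dist(points[k], c), 1e-12) for c in centers]
            new_u.append([
                1.0 / sum((dist[i] / dist[j]) ** power for j in range(n_clusters))
                for i in range(n_clusters)
            ])
        change = max(abs(new_u[k][i] - u[k][i]) for k in range(n) for i in range(n_clusters))
        u = new_u
        if change < eps:  # memberships have stopped changing
            break
    return u, centers
```

Rows of U whose largest entry is close to 1 correspond to samples with explicit cluster attribution, while rows with nearly equal entries correspond to the fuzzy attribution described for Fig. 3(a).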
The average intra-cluster compactness, average inter-cluster separation, and average silhouette coefficient were calculated before and after the removal of the features, and the two sets of features were each input into the RF classifier to evaluate the SA detection accuracy, as shown in Table Ⅱ. The above four RR interval features were matched with apnea labels to reconstruct the database as input to the RF classifier, and the classification accuracy of SA detection was calculated using 10-fold cross validation and the confusion matrix. The performance comparison with existing studies using the Apnea-ECG dataset is given in Table Ⅲ.
As shown in Table Ⅱ, the feature clustering metrics all increase after backward feature elimination, indicating that the optimal feature subset has greater variability and differentiation, which is consistent with the visualized membership matrix in Fig. 3. Inputting the features after elimination into the RF classifier yields a classification with higher accuracy, sensitivity, and specificity. This shows that the method adaptively removes irrelevant features and reduces the dimensionality of the input features, so that the RF classifier can better capture patterns and relationships in the data and improve the accuracy of SA detection.
Table Ⅲ compares the SA detection results of different methods. Although the method based on WA and Hilbert transform achieves a detection accuracy of up to 90.5%, it uses as many as 40 features, which increases the signal pre-processing effort and the overall algorithm complexity. The methods based on EMD and redistributed spectra require half the number of features to achieve similar results, indicating that the nonlinear features calculated by the former do not play an important role in classification [9]. The number of hidden layers and neurons in the Multilayer Perceptron classifier depends on experience, and its detection accuracy for sleep apnea is only 81.4%. The Support Vector Machine also has difficulties with the optimization of the regularization coefficient and kernel function parameters. Compared with the above methods, the number of features used by SA is the least, but the detection performance is not ideal.
On the same dataset, our method achieves an SA detection accuracy of 88.6%, a sensitivity of 90.5% and a specificity of 85.5%. It outperforms other existing methods in terms of sensitivity and is comparable to the WA and EMD methods in terms of accuracy. High sensitivity means that the model can detect as many true apnea events as possible, reducing the possibility of missed detections. By selecting fewer and more discriminative features using the FCM method, this method reduces the computation and storage requirements. Moreover, it improves the model's sensitivity to key information and enhances the robustness of the model.
The proposed algorithm will be discussed and evaluated on different data sets in future work. Driven by the need to integrate the algorithm into wearable devices, our next direction is to further improve the running speed and efficiency of the algorithm.

V. CONCLUSION
In this paper, an improved RF-based SA detection method is proposed. FCM and the backward feature elimination method are used to select the RR interval features of ECG signals. A small best feature subset with clearly distinct features is obtained as input to the RF classifier, which improves the sensitivity of the RF model to key information and yields better SA detection accuracy.


(1) Extraction of the best feature subset. From the best feature subset D, n samples (the sub-training set) are drawn randomly with replacement to form a new training set, which serves as the samples at the root node of the decision tree. The remaining samples form the out-of-bag (OOB) dataset, which serves as the final test set.
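Sampling with replacement and the resulting out-of-bag set can be sketched as follows. The function name `bootstrap_with_oob` and the use of sample indices rather than the samples themselves are illustrative choices, not details from this paper.

```python
import random


def bootstrap_with_oob(n_samples, seed=None):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    rng = random.Random(seed)
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    # Samples never drawn form the out-of-bag (OOB) set.
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag
```

Because sampling is with replacement, some indices appear several times in `in_bag` and others not at all; the indices that never appear make up the OOB set for that tree.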

Fig. 3. Visualization results of the membership matrix of RR interval features.

TABLE I. LIST OF RR INTERVAL FEATURES

(4) Construct a random forest. Repeat step (3) on the sample subset of each child node, and recursively perform node splitting until all leaf nodes are generated; repeat steps (2) to (4) to obtain different decision trees.
(5) Sleep apnea detection. Under 10-fold cross validation, each of the k decision trees classifies each sample in the test set, and the category with the most votes among the k classification results is the final category for that sample.
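The voting step can be illustrated with a minimal sketch, assuming the predictions of the k trees are available as k lists of labels with one entry per test sample. The function name is hypothetical, and ties are resolved in favor of the label that appears first among the votes, which is an arbitrary choice for this example.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Combine per-tree predictions into one label per sample.

    tree_predictions: list of k lists, each holding one predicted label
    per test sample. Ties go to the label seen first among the votes.
    """
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*tree_predictions)  # votes = the k labels for one sample
    ]
```

Using an odd number of trees avoids ties for a two-class problem such as SA versus normal.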

TABLE II. COMPARISON OF CLUSTERING METRICS AND SA CLASSIFICATION PERFORMANCE BEFORE AND AFTER FEATURE REMOVAL