Stacked Autoencoder based Feature Compression for Optimal Classification of Parkinson Disease from Vocal Feature Vectors using Immune Algorithms

Parkinson's disease (PD) is a progressive neurological disorder that is most common among people above 60 years of age. It affects the nerve cells of the brain through a deficiency of dopamine secretion. Dopamine acts as a neurotransmitter and helps in the movement of body parts. Once brain cells (neurons) start dying due to aging, dopamine levels decrease. The symptoms of Parkinson's include difficulty in performing regular or habitual movements, uncontrollable shaking of the hands and limbs, memory loss, stiff muscles, and sudden temporary loss of control. The disease becomes more severe if it is not diagnosed and treated at an early stage. This paper concentrates on developing a PD diagnosis system using machine learning techniques and algorithms. Machine learning is an integral part of artificial intelligence: it takes large amounts of data as input and trains existing algorithms to learn the patterns in the data. Based on the recognized patterns, the machine acts without human intervention. In this work, two major approaches are employed to diagnose PD. Initially, datasets of 26 vocal features from PD-affected and healthy individuals, obtained from the UCI Machine Learning Repository, are taken as the raw data. In pre-processing, the mRMR feature selection algorithm is employed to minimize the feature count and increase the accuracy rate. The selected features are then compressed using the Stacked Autoencoder technique to improve classification accuracy and quality with reduced run time. K-fold cross-validation is used to evaluate the predictive capability of the model and the effectiveness of the extracted features. Artificial Immune Recognition System - Parallel (AIRS-P), an immune-inspired algorithm, is employed to classify the data from the extracted features.
The proposed system attained 97% accuracy, outperformed the benchmarked algorithms, and proved its significance for PD classification.
Keywords: Immune algorithms; Parkinson's disease; stacked autoencoder; AIRS-Parallel; machine learning


I. INTRODUCTION
Parkinson's disease is a complex neurological disorder that is most prevalent among elderly people around the world. Early diagnosis is essential so that it can be treated accordingly. Although several treatments, medications, and surgical options exist, it is always better to recognize the symptoms at an early stage, because this helps PD-affected patients recover better.
Medication plays an important role in controlling the symptoms of PD. Medications such as dopamine promoters, antidepressants, and anti-tremor drugs can help in overcoming the effects of PD. The most commonly prescribed medicine is L-dopa (Levodopa) combined with Carbidopa. The brain cells convert the medicine into dopamine, which balances the level of dopamine needed for the motor actions of the nervous system. However, recognizing the symptoms of PD and diagnosing it early helps to control its severity before it gets worse.
Genetic mutations and environmental factors may be the origin of Parkinson's disease. The use of herbicides, fungicides, and pesticides is associated with the onset of Parkinson's disease. Related studies reported that 70% of the people affected by PD acquired it through excessive use of pesticides. Several changes occur in the brains of people afflicted with Parkinson's disease, including clumps of specific substances inside the brain cells called Lewy bodies. Lewy bodies hold the key to identifying the origin of PD. Although Lewy bodies contain many substances, the vital one is the protein termed alpha-synuclein (a-synuclein). Alpha-synuclein is found in Lewy bodies in a clumped form that the cells cannot break down. Various genes are decisively linked with Parkinson's disease, such as LRRK2, DJ-1, PRKN (Parkin), PINK1, GBA (glucocerebrosidase-beta), and SNCA. Parkinson's disease closely resembles other diseases such as progressive supranuclear palsy (PSP), corticobasal degeneration (CBD), and multiple system atrophy (MSA). These three are collectively known as Parkinson's plus diseases.
Because Parkinson's plus diseases resemble it so closely, it is challenging to distinguish Parkinson's disease from them. PD diagnosis is based on neurological examination, lab tests, or brain scans. Since no treatment cures PD, surgery or medication is the available means of improving the health of PD-affected people. Several medication procedures are followed. Surgery is prescribed when medications are no longer sufficient. Deep Brain Stimulation is the type of surgery currently used. In the future, potential treatments will explore areas such as neural (cell) transplantation, gene therapy, and immunotherapy. Neural transplantation replaces the affected and dead brain cells with new cells, which can develop and multiply. This research has so far produced partial results: some people show improved health and some do not. Gene therapy is another research area, but this technique also has complications that hinder effective implementation. Research to cure Parkinson's disease entirely is still ongoing [1].
470 | P a g e www.ijacsa.thesai.org
Computer-Aided Diagnosis (CAD) is rapidly emerging to help people check for early symptoms on their own with the needed reports and data. This paper presents one such diagnosis system, developed using artificial intelligence, machine learning, and neural network schemes. We employ a Stacked Autoencoder and AIRS-Parallel to extract features from the raw data and to classify PD-affected persons from healthy individuals using the resulting feature vectors.
The rest of the paper is organized as follows. Section 1 is the introduction given above. Section 2 discusses the existing works that inspired this paper. Section 3 covers materials and methods, concentrating on the technical aspects of the proposed work and the employed algorithms. Section 4 presents the results, simulation, and comparison. Section 5 concludes with a summary of the significance and importance of the work based on the results and existing works.

II. LITERATURE SURVEY
In this part, the existing literature is reviewed. The associated works mainly explore diverse feature extraction methods and classification algorithms on health data. James Parkinson wrote the first medical description of Parkinson's disease in 1817, and it was later refined by Jean-Martin Charcot. Charcot distinguished Parkinson's disease, which is characterized by tremors, from other disorders. It is a progressive neurological disorder that is common among people older than 60. It mainly affects the nerve cells of the brain through a deficiency of dopamine secretion. Dopamine acts as a neurotransmitter and helps in the movement of body parts. As brain cells start to perish with aging, dopamine levels decrease. The major symptoms of Parkinson's are difficulty in performing regular or habitual movements, uncontrollable shaking of the hands and limbs, memory loss, stiff muscles, sudden temporary loss of control, and changes in facial expression. The disease becomes more severe if it is not diagnosed and treated at an early stage. Only a limited number of diagnostic tests are available for PD.
DaTscan is the only available test for diagnosing the motor disorders of PD. Machine learning offers an efficient way to make the diagnosis more effective. In one reviewed work, a voice dataset is used to diagnose PD through supervised learning. The dataset was obtained from the UCI Machine Learning Repository and consists of 195 vowel voice records, 48 from healthy people and 147 from affected persons. Of the 22 features considered for preprocessing, 10 are selected by a filter-based feature selection algorithm that uses the Pearson correlation scoring method to correlate the features with the label. K-fold cross-validation is used to train and test on all the data to increase the reliability of the outcome. The following models are applied to the dataset: Averaged Perceptron, Bayes Point Machine, Decision Forest, Locally-Deep SVM, Logistic Regression, Boosted Decision Tree, Neural Network, and SVM. Among these models, Boosted Decision Tree provides the most accurate result. That work concludes that voice recordings are feasible for diagnosing Parkinson's disease.
Artificial Immune Recognition System (AIRS) is a modern supervised learning algorithm inspired by the immune system. AIRS performs well on classification problems. It is a fusion of artificial intelligence and biologically inspired computation that evolved from the metaphor and heuristic knowledge of the biological immune system [2-6]. AIRS is the first AIS procedure used to solve classification problems. It has some specialized characteristics, such as self-regulation, generalization, performance, and parameter stability. Because AIRS involves many biological terms, such as antigens, B-cells, T-cells, and clonal selection, its implementation is complicated. The AIRS algorithm prepares a collection of memory cells, which are trained on the data. The development process of the AIRS algorithm has the following steps:
1) Prepare and normalize the training data, use the Euclidean distance to calculate the affinity measures, and then randomly select antigens for the memory pool.
2) Train the memory cells with the antigens.
3) Generate mutated clones of the selected memory cells and move them to the ARB (Artificial Recognition Ball) pool.
4) Let the ARBs compete for the limited resources and select the memory cells.
5) Classify using the k-nearest neighbor method.
This life cycle of AIRS produces better accuracy in diagnosing the disease early. The reviewed work concludes that AIRS provides a more accurate result than the rest of the classifiers [2].
A further reviewed work describes Parallel AIRS, one of the AIRS algorithms. AIRS 1 and AIRS 2 are the serial versions of the AIRS algorithm [7][8][9]. Both rely on a single processor to train the memory cells, whereas Parallel AIRS uses multiple processors that perform their tasks in parallel. The AIRS 1 and AIRS 2 algorithms have nine steps [10][11][12][13] that run on a single processor. Parallel AIRS performs the following steps:
Step 1: Read the training data at the root.
Step 2: Distribute the training data to np processes (np is the number of processes).
Step 3: Each processor executes steps 1 to 9 of the serial algorithm on its share of the training data. Each processor then holds trained memory cells.
Step 4: Collect the memory cells from each processor, merge them, and return them to the root (initial stage).
Speedup is achieved without any loss of classification accuracy. The efficiency of the parallel version can be stated as E(P) = T(1) / (P * T(P)), where P is the total number of processors, T(1) is the run time of the serial AIRS 1 or AIRS 2 algorithm, and T(P) is the run time of the parallel version. The AIRS algorithms (AIRS 1, AIRS 2, and Parallel AIRS) were implemented on datasets in the WEKA platform [14]. Parallel AIRS shows the best classification accuracy among the AIRS algorithms.
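The four Parallel AIRS steps and the efficiency measure can be sketched in Python as follows. This is a minimal illustration, not the WEKA implementation: `train_serial_airs_stub` is a hypothetical stand-in for the nine-step serial AIRS training (it simply returns one class-mean vector per class), the partitions are processed in a plain loop rather than on separate processors, and all function names are assumptions made for the example.

```python
import random


def train_serial_airs_stub(samples):
    """Hypothetical stand-in for serial AIRS training on one partition.

    samples: list of (feature_vector, label) pairs.
    Returns one 'memory cell' per class, here simply the class mean vector.
    """
    by_class = {}
    for features, label in samples:
        by_class.setdefault(label, []).append(features)
    cells = []
    for label, rows in by_class.items():
        mean = [sum(column) / len(rows) for column in zip(*rows)]
        cells.append((mean, label))
    return cells


def parallel_airs_train(samples, n_procs, seed=0):
    """Steps 1-4: read, distribute, train per partition, merge at the root."""
    shuffled = samples[:]                      # Step 1: data held at the root
    random.Random(seed).shuffle(shuffled)
    partitions = [shuffled[i::n_procs] for i in range(n_procs)]  # Step 2
    memory_pool = []
    for part in partitions:                    # Step 3: one run per "processor"
        memory_pool.extend(train_serial_airs_stub(part))
    return memory_pool                         # Step 4: merged pool


def parallel_efficiency(t_serial, t_parallel, n_procs):
    """E(P) = T(1) / (P * T(P))."""
    return t_serial / (n_procs * t_parallel)
```

For instance, with a serial run time of 100 s and a parallel run time of 30 s on 4 processors, `parallel_efficiency(100.0, 30.0, 4)` evaluates 100 / (4 * 30).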
An autoencoder is an unsupervised machine learning algorithm built on a deep neural network. It is trained so that its output values equal its input values. It reduces the size of the input to a compressed form, from which the original data is recovered by reconstruction. The architecture of the autoencoder has three parts: the encoder, the hidden layer, and the decoder. The encoder compresses the input data into a latent space representation [15][16][17][18][19][20], reducing the original dimension of the data. The hidden layer holds the code, that is, the compressed input, and the decoder reconstructs the output from this latent space representation. Because the autoencoder extracts specific features from the data, it is used for feature extraction. A Stacked Autoencoder consists of several sparse encoder layers placed one after another in a hierarchy. The output of each bottleneck (hidden or internal) layer is connected to the input of the successive bottleneck layer [21][22]. The stacked autoencoder algorithm mainly follows three steps. First, the first autoencoder is trained on the input data. Second, the trained output of the previous layer is used as the input to the successive layer, and this continues until all layers are trained. Finally, after all internal layers are trained, fine-tuning is performed. The reviewed work employs a Stacked Autoencoder to diagnose Alzheimer's disease (AD) and mild cognitive impairment (MCI). Training the data with the Stacked Autoencoder improves the level of accuracy.

III. MATERIALS AND METHODS
This section discusses the process implemented in the paper to identify the best classifier for PD. It explains the dataset, the feature extraction algorithms, k-fold cross-validation, and the AIRS-Parallel classification algorithm. Fig. 1 represents the complete workflow of the proposed model.

A. Dataset Information
The work starts by acquiring voice recording samples of PD-affected and healthy people from the University of California, Irvine (UCI) Machine Learning Repository. The dataset contains details of 20 patients and 20 healthy people: the healthy group has 10 females and 10 males, and the affected group has 6 females and 14 males. The final version of the dataset holds 1040 instances and 26 attributes. The details of the dataset are given in Table I.

B. Feature Selection
In pre-processing, feature selection is the first step: the raw data is analyzed by a particular algorithm, and the features are reduced based on the quality and clarity of the data. mRMR is the feature selection algorithm used in this work. For comparative analysis, two further feature selection algorithms, Correlation-based Feature Selection and Genetic Algorithm, were employed. In general, the feature selection technique that selects the fewest features is considered the most optimized one [22][23][24][25]. Table III shows the number of features selected by these techniques. Minimum Redundancy - Maximum Relevance (mRMR) is used here to select the optimum feature subset. The core mechanism of the algorithm is that it selects features that are highly relevant to the classification target yet have little relevance to one another, which minimizes the redundancy among the features. This technique achieves high accuracy with mutually unrelated features that carry more information about the problem, which helps in the precise classification of the data. The final subset S is identified based on the following equation.
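A commonly used form of the mRMR criterion, assumed here for illustration, scores each candidate feature by its mutual information with the class label minus its mean mutual information with the features already selected, and adds the best-scoring feature greedily. The sketch below implements that difference form for discrete (or pre-discretized) features; the function names, the toy columns, and the use of plain mutual information are assumptions, and the authors' exact formulation may differ.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        mi += (count / n) * math.log((count * n) / (px[x] * py[y]))
    return mi


def mrmr_select(columns, labels, k):
    """Greedy mRMR: relevance to the label minus mean redundancy.

    columns: dict mapping feature name -> list of discrete values.
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in columns.items()}
    selected = []
    remaining = sorted(columns)  # sorted for a deterministic tie-break

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(columns[name], columns[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): f2 duplicates f1, f3 carries complementary information.
toy_labels = [0, 0, 1, 1, 2, 2, 3, 3]
toy_columns = {
    "f1": [0, 0, 0, 0, 1, 1, 1, 1],
    "f2": [0, 0, 0, 0, 1, 1, 1, 1],
    "f3": [0, 0, 1, 1, 0, 0, 1, 1],
}
toy_subset = mrmr_select(toy_columns, toy_labels, k=2)
```

In the toy data the duplicate column is penalized by the redundancy term, so the second pick is the complementary feature rather than the copy.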

C. Feature Extraction
Feature extraction transforms input data that is too large to process, or that is repetitive, into a compressed set of features. Feature extraction techniques include latent semantic analysis, partial least squares, principal component analysis, and multifactor dimensionality reduction. The autoencoder is one such technique; it produces its output by eliminating unnecessary interference or noise. The stacked autoencoder is one type of autoencoder, and it is used as the feature extraction method in this paper. The stacked autoencoder receives the voice signal features from the data source as input. Several encoder layers compress the input and store the values in the hidden layer, where the data is trained. Once training is done, the decoder layer reconstructs the output from the hidden layer, and the output should equal the input. Here, 22 features are given as input and compressed to an output of 8 features; the remaining 14 features are treated as noise and eliminated. This autoencoder model employs a cross-entropy loss function, which suits this binary classification task well. The parameters of the SAE are given in Table II. The general equation of the cross-entropy is represented below.
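For a target t and a reconstruction o with n components in [0, 1], the binary cross-entropy is commonly written as L = -(1/n) * sum_i [t_i log(o_i) + (1 - t_i) log(1 - o_i)]. The sketch below assumes this form and runs a single untrained forward pass through one encoder layer (22 to 8) and one decoder layer (8 to 22). The sigmoid activations, the random weights, the single-layer depth, and the helper names are illustrative assumptions; the actual SAE parameters are those of Table II.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for a dense layer (illustrative only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, biases):
    """Dense layer with sigmoid activation."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def binary_cross_entropy(target, output, eps=1e-12):
    """Mean binary cross-entropy between target and output in [0, 1]."""
    total = 0.0
    for t, o in zip(target, output):
        o = min(max(o, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(o) + (1.0 - t) * math.log(1.0 - o))
    return total / len(target)


rng = random.Random(0)
enc_w, enc_b = init_layer(22, 8, rng)   # encoder: 22 input features -> 8 codes
dec_w, dec_b = init_layer(8, 22, rng)   # decoder: 8 codes -> 22 reconstructions

x = [rng.random() for _ in range(22)]   # one sample, assumed scaled to [0, 1]
code = dense(x, enc_w, enc_b)           # compressed 8-dimensional representation
recon = dense(code, dec_w, dec_b)       # reconstruction of the input
loss = binary_cross_entropy(x, recon)   # reconstruction loss to be minimized
```

Training would adjust the weights to reduce `loss`; after training, only the encoder output `code` is passed on to the classifier.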
D. K-Fold Cross-Validation
K-fold cross-validation, also known as rotation estimation, is a statistical method used in machine learning to evaluate the skill of a particular model. The method has a single parameter, k, which is the number of groups into which the data is split for validation. The procedure is simple. The input dataset is randomly shuffled and then split into k groups (10 groups if k = 10). One group is held out as the test set, and the model is trained on the remaining groups. Once trained, the model is tested on the held-out group, and the evaluation score on that test set is recorded. The score is retained and the model is discarded. This is repeated so that each group serves once as the test set, and the skill of the model is assessed from the recorded scores.
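The procedure can be sketched with the standard library alone. The function names and the `score_fn` callback, which stands for "train a fresh model on the training indices and score it on the test indices", are assumptions made for this example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)         # random shuffle of the dataset
    folds = [indices[i::k] for i in range(k)]    # split into k groups
    for i in range(k):
        test_idx = folds[i]                      # one group held out for testing
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cross_validate(n_samples, k, score_fn, seed=0):
    """Average the per-fold scores returned by score_fn(train_idx, test_idx)."""
    scores = [score_fn(train, test)
              for train, test in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)
```

With 1040 instances and k = 5, each fold tests on 208 instances and trains on the remaining 832.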

E. AIRS-Parallel
In this system, AIRS-Parallel, one of the AIRS methods, is used as the classifier to diagnose the disease effectively. The training dataset is given as input to AIRS-Parallel after the k-fold cross-validation runs are complete. The dataset is divided among several processors, each of which holds a random portion of it. Each processor runs the serial version of the AIRS process on its portion. After every processor has finished, the memory cells are gathered from each processor and merged into a single pool. The memory cells in the pool are then divided by class. Within each class, the pairwise affinity between memory cells is calculated. If the affinity between two memory cells is less than the product of the affinity threshold and the affinity threshold scalar (aff(mc1, mc2) < afft * affts), only one of the two memory cells is retained in the final pool. The outcome of this algorithm provides better accuracy in diagnosing Parkinson's disease. The parameters of the artificial immune algorithm AIRS are given in Table IV.
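The merging rule can be illustrated with a short sketch. It assumes that affinity is the Euclidean distance between memory cell vectors, so that a smaller value means greater similarity, and that the first cell encountered in a near-duplicate pair is the one retained; the function and parameter names are hypothetical.

```python
import math


def affinity(a, b):
    """Euclidean distance between two memory cell vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def merge_memory_cells(pools, aff_threshold, aff_threshold_scalar):
    """Merge per-processor pools and drop near-duplicate cells within a class.

    pools: list of lists of (vector, label) memory cells, one list per processor.
    A cell is dropped when aff(mc1, mc2) < afft * affts for a retained cell
    of the same class.
    """
    cutoff = aff_threshold * aff_threshold_scalar
    merged = [cell for pool in pools for cell in pool]
    kept = []
    for vector, label in merged:
        is_duplicate = any(label == kept_label
                           and affinity(vector, kept_vector) < cutoff
                           for kept_vector, kept_label in kept)
        if not is_duplicate:
            kept.append((vector, label))
    return kept
```

Cells of different classes are never compared, so two identical vectors with different labels both survive the merge.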

IV. EXPERIMENTAL RESULTS
In this section, the classification performance of the proposed combination is evaluated and compared with existing techniques. In summary, the training dataset contains 26 vocal features obtained from 20 PD-affected patients and 20 healthy individuals. It contains 26 sound recordings of various kinds from each volunteer, which form 1040 voice recordings overall. The sound recordings consist of sustained vowels, words, numbers, and short sentences. The test dataset consists of 6 voice samples recorded from each of 28 PD-affected patients. These 6 voice samples contain only the sustained vowels 'a' and 'o', each recorded three times, for a total of 168 voice recordings. The dataset is obtained from the UCI Machine Learning Repository. To narrow down the dataset for more accurate prediction with comparatively reduced run time, the dataset is pre-processed before the classification stage. In pre-processing, the 26 features are reduced to a subset of 13 features. The selected 13 features are then reduced to eight feature vectors by the stacked autoencoder through compression and dimensionality reduction. The extracted features are assessed with the k-fold cross-validation technique to evaluate their predictive accuracy on the existing dataset [26]. Here, 5 folds were used to train and test the model. Fig. 2 represents the number of features selected by the different feature selection methods.
To compare the proposed technique with existing ones, quality metrics are needed to analyze the performance of the proposed work accurately and establish its significance. For this reason, 4 major metrics were used to evaluate the proposed Stacked Autoencoder-AIRS Parallel technique. The main goal is to attain better disease classification accuracy and thereby show the importance of this work. The metrics are accuracy, specificity, sensitivity, and the confusion matrix plot. The parameters of the AIRS Parallel algorithm for Parkinson's disease need to be disclosed first. Table I represents the parameters used in the proposed algorithm with their values.

A. Performance Evaluation
A confusion matrix, also referred to as an error matrix, is a table used to express the performance of a classifier on a test dataset whose true values are known. It holds the actual class information alongside the class information predicted by the classification algorithm [27]. The performance of the model is analyzed from the data in the matrix. The following table represents the confusion matrix for a binary classifier, and the next table represents the resulting confusion matrix of the proposed work. Accuracy measures the overall performance of the system in separating PD-affected individuals from healthy ones. It is expressed as a percentage, and a higher percentage means a higher accuracy [28][29][30]. The classification accuracy for the datasets of this study was calculated using the equation below. The sensitivity (sen) and specificity (spec) are calculated from the following equations.

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$
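Under the standard definitions, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP), where TP, TN, FP, and FN are the true positive, true negative, false positive, and false negative counts of the confusion matrix. The sketch below computes the three metrics from predicted and actual labels; the function names and the convention that label 1 marks a PD-affected individual are assumptions for the example.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, tn, fp, fn) for a binary classifier."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p != positive and a != positive:
            tn += 1
        elif p == positive:
            fp += 1      # predicted PD, actually healthy
        else:
            fn += 1      # predicted healthy, actually PD
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity) as fractions in [0, 1]."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity
```

Multiplying each value by 100 gives the percentages used in the comparison tables.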
The overall significance and importance of the model can be shown only by comparing its results with the performance of existing models. Accordingly, the performance of the AIRS and AIRS 2 algorithms with the stacked autoencoder is compared with that of the proposed combination. The results of AIRS - Stacked Autoencoder, AIRS 2 - Stacked Autoencoder, and the proposed AIRS Parallel - Stacked Autoencoder are presented in Table V for comparison. From the results given in Table V, it is evident that the proposed model outperforms the compared combinations in terms of classification accuracy, specificity, and sensitivity. The proposed model is also compared with the previous work, CFS-ACO with an SVM classifier. Table VI shows the comparison values.
Fig. 3 plots the scores of the different validation metrics attained by the different immune algorithms on the selected features. It is visible that the proposed work classifies the disease better than the CFS-ACO-SVM combination [31]. In sensitivity, the previous work seems to perform better than the proposed work, but the proposed work gives better results in terms of accuracy and specificity. This section highlighted the distinctive aspects of the work, and the percentages obtained in the experiments clearly show the need for this work for better progress in the future.

A. Correlation-based Feature Selection
Correlation-based Feature Selection (CFS) evaluates and selects feature subsets from the given data using a correlation-based selection process. An effective feature subset is one whose features are highly correlated with the classification and have little or no correlation with one another [32].
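CFS is commonly described through a subset merit of the form k * r_cf / sqrt(k + k(k - 1) * r_ff), where k is the number of features in the subset, r_cf is the mean feature-class correlation, and r_ff is the mean feature-feature correlation. Assuming this heuristic, the sketch below shows how the merit rewards relevance and penalizes redundancy; the function name and the use of precomputed correlation lists are choices made for the example.

```python
import math


def cfs_merit(feature_class_corrs, feature_feature_corrs):
    """Merit of a feature subset from precomputed correlations.

    feature_class_corrs: correlation of each feature with the class (length k).
    feature_feature_corrs: correlations of all feature pairs in the subset
    (empty when the subset has a single feature).
    """
    k = len(feature_class_corrs)
    r_cf = sum(abs(r) for r in feature_class_corrs) / k
    if feature_feature_corrs:
        r_ff = sum(abs(r) for r in feature_feature_corrs) / len(feature_feature_corrs)
    else:
        r_ff = 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

Two features that each correlate 0.6 with the class score higher when they are uncorrelated with each other than when they are perfectly correlated, which is the behavior the text describes.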

B. Genetic Algorithm
Genetic Algorithm (GA) is a nature-inspired, search-based selection technique derived from Charles Darwin's theory of evolution. GA mimics nature by selecting the fittest individuals to produce the next generation. GA has five main phases: initial population, fitness function, selection, crossover, and mutation. Each phase plays a significant role in reaching an optimal selection, resulting in fit offspring. The performance of CFS and GA with the stacked autoencoder is presented in Table VII, and the accuracy of the feature selection methods on AIRS-P is given in Fig. 4.
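The five phases can be sketched for feature selection as follows, with each individual encoded as a bit mask over the features. The population size, mutation rate, truncation selection, single-point crossover, elitism, and the toy fitness function are all illustrative assumptions, not the settings used in the experiments; in practice the fitness would be the classification accuracy obtained with the masked features.

```python
import random


def genetic_feature_selection(n_features, fitness, pop_size=20,
                              generations=30, mutation_rate=0.05, seed=0):
    """Return the best feature mask found (1 = feature kept)."""
    rng = random.Random(seed)
    # Phase 1: initial population of random bit masks
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Phase 2: the fitness function ranks every individual
        ranked = sorted(population, key=fitness, reverse=True)
        # Phase 3: selection keeps the fitter half as parents
        parents = ranked[: pop_size // 2]
        children = [ranked[0][:]]  # elitism: carry the best forward unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Phase 4: single-point crossover
            cut = rng.randint(1, n_features - 1)
            child = a[:cut] + b[cut:]
            # Phase 5: mutation flips each bit with a small probability
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = max(population + [best], key=fitness)
    return best


def toy_fitness(mask):
    """Hypothetical fitness: reward features 0, 3, 5 and penalize subset size."""
    useful = (0, 3, 5)
    return sum(mask[i] for i in useful) - 0.1 * sum(mask)


best_mask = genetic_feature_selection(8, toy_fitness)
```

The size penalty in the toy fitness plays the role of preferring smaller feature subsets when accuracy is equal.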

VI. CONCLUSION
In this paper, the voice and speech recordings of PD-affected and healthy individuals are analyzed with different statistical feature selection methods and neural network models. The 26 features are pre-processed by deploying mRMR and the Stacked Autoencoder, a neural network-based autoencoder technique used to reduce the noise in the data and compress its information, which reduces the number of attributes present in the original dataset. After dimensionality reduction, the classification ability of the compressed vectors was evaluated with the k-fold cross-validation technique. Finally, the 8 feature vectors are classified by the AIRS Parallel algorithm. The result of the proposed work was compared with AIRS, AIRS 2, and the CFS-ACO-SVM combination. From the comparison, we conclude that the proposed AIRS Parallel with Stacked Autoencoder technique, with 97% accuracy, outperforms the AIRS and AIRS 2 combinations in all given quality metrics and the CFS-ACO-SVM combination in accuracy and specificity. This indicates the importance of this classification system for PD. No machine learning system can be expected to attain a 100% classification accuracy rate, but the run time and other aspects of the system can be improved in future work. As the next step, a Computer-Aided Diagnosis system inspired by the proposed model can be developed to obtain a better classification accuracy rate with less run time and memory usage.

VII. CONFLICT OF INTEREST
The authors declare no conflict of interest.