An Intelligent Approach based on the Combination of the Discrete Wavelet Transform, Delta Delta MFCC for Parkinson's Disease Diagnosis

Abstract: Diagnosing Parkinson's disease (PD) requires monitoring the progression of symptoms. Unfortunately, the diagnosis is often confirmed years after the onset of the disease. Communication problems are often among the first symptoms to appear in people with Parkinson's disease. In this study, we focus on the speech signal to discriminate between people with and without PD. We used a Spanish database that contains 50 recordings, of which 28 are from patients with Parkinson's disease and 22 are from healthy people; these recordings contain five sustained vowels (/a/, /e/, /i/, /o/, and /u/). The proposed processing decomposes each sample using the Discrete Wavelet Transform (DWT), testing several kinds of wavelets, then extracts the delta delta Mel Frequency Cepstral Coefficients (delta delta MFCC) from the decomposed signals, and finally applies a decision tree as the classifier. The purpose of this process is to determine the most appropriate analyzing wavelet for each type of vowel.

However, to improve the performance of Parkinson's disease recognition systems, research work has combined the MFCC coefficients with other types of acoustic parameters such as LPCC [28], PLP [28], energy [29], wavelets [29][30][31], and Empirical Mode Decomposition (EMD) [32].
In this study, moving beyond static coefficients, our contribution to the correct detection of Parkinson's disease through voice is a new method for selecting relevant acoustic parameters, based on dynamic delta delta MFCC coefficients combined with wavelets. These differential coefficients, referred to as dynamic parameters, provide useful information on the temporal trajectory of the speech signal. The information extracted by the discrete wavelet transform and delta delta MFCC is then used in the classification block, which uses a decision tree classifier. To evaluate the performance of this model, we applied it to a database of five vowels (/a/, /e/, /i/, /o/, /u/); for each vowel, the database includes 28 people with Parkinson's disease and 22 healthy people.
The rest of the article is structured as follows: Section II defines the methods used, Section III describes the process used and the results obtained, and Section IV concludes.

A. Feature Extraction
For feature extraction, we are interested in the joint use of DWT and the second derivative of the MFCC (delta delta MFCC). This step allows extracting features that will be used by the decision tree classifier.

1) Discrete wavelet transform:
The discrete wavelet transform is the discrete version of the Continuous Wavelet Transform (CWT) and is computed with Mallat's algorithm. It is based on the principle of multi-resolution analysis, which separates the details and approximations of a signal using a pair of filters H and G, a low-pass filter and a high-pass filter, respectively.
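The filter-bank idea can be sketched with the simplest wavelet, the Haar wavelet (Daubechies of order 1). This is an illustrative choice and not necessarily one of the wavelets tested in this study; the function names `haar_step` and `haar_dwt` are hypothetical. The low-pass branch plays the role of H and the high-pass branch the role of G, and the step is repeated on the approximation.

```python
import math


def haar_step(signal):
    """One analysis step: return (approximation, detail), each half as long."""
    if len(signal) % 2:
        # Assumption: pad odd-length input by repeating the last sample.
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    # Low-pass filter H followed by downsampling by 2 (approximation).
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # High-pass filter G followed by downsampling by 2 (detail).
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_dwt(signal, levels):
    """Repeat the step on the approximation.

    Returns (final approximation, list of detail vectors, one per level).
    """
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details
```

For example, `haar_dwt(x, 7)` would return the scale-7 approximation of a recording `x` together with the seven detail vectors; longer wavelets only change the filter coefficients, not the structure of the loop.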
With the high-pass filter, we obtain the coefficients of the discrete wavelet decomposition (the details), and with the low-pass filter, we obtain the approximation coefficients. This operation is applied again to the approximation, generating another detail and a new approximation, as shown in Figure 1. There are several types of wavelets; in our case, we used the wavelets presented in Table I.

2) Delta delta MFCC:
Temporal changes in the cepstrum play an important role in human perception, and the derivatives of the MFCC coefficients allow us to measure these changes. MFCC coefficients are usually referred to as static parameters, since they contain only the information about a given frame. To improve the frame representation, it is often proposed to introduce new parameters into the parameter vector. Reference [33] proposed the use of dynamic parameters that capture cepstral transition information in the speech signal. In particular, it proposed second-order differential coefficients, also called delta delta coefficients, derived from the cepstral coefficients. Let d_t be the first-order differential coefficient (delta MFCC) of frame t; the corresponding second-order differential coefficient (delta delta MFCC), dd_t, is then obtained as the time derivative of d_t.
The delta delta MFCC coefficients, also called acceleration coefficients, are thus the second derivative of the static Mel Frequency Cepstral Coefficients (MFCC). The MFCC representation is defined as the discrete cosine transform of the logarithm of the energy spectrum of the speech segment. The spectral energy is calculated by applying a bank of filters evenly spaced on a modified frequency scale, called the Mel scale. The Mel scale redistributes the frequencies on a nonlinear scale that simulates human perception of sounds. Figure 2 illustrates the steps involved in obtaining the delta delta MFCC coefficients.
Based on the results obtained in [26], we use only the first 12 delta delta MFCC coefficients.
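As an illustration of how acceleration coefficients can be derived from static ones, the sketch below applies a widely used regression formula for the time derivative, d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2), twice in a row. The formula, the window half-width `N = 2`, and the replication of edge frames are assumptions made for this example; they are not taken from this article, and the names `delta` and `delta_delta` are hypothetical.

```python
def delta(frames, N=2):
    """Regression-based time derivative of a sequence of coefficient vectors.

    frames: list of frames, each a list of coefficients (e.g. 12 MFCCs).
    N: half-width of the regression window (assumed value).
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, N + 1):
                # Assumption: frames beyond the edges repeat the first/last frame.
                nxt = frames[min(t + n, T - 1)][k]
                prv = frames[max(t - n, 0)][k]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def delta_delta(mfcc, N=2):
    """Second-order (acceleration) coefficients: the delta of the delta."""
    return delta(delta(mfcc, N), N)
```

Keeping the first 12 coefficients then amounts to passing frames that hold only those 12 static values per frame.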

B. Feature Classification
1) The decision tree classifier: Decision trees are among the most popular techniques in machine learning. Decision tree learning is a form of supervised learning; the resulting classifier is generally presented as a tree structure [34].
A decision tree consists of a set of rules that segment a data set into homogeneous groups. Each rule is a conjunction of tests on the descriptive variables. The first vertex is called the root of the tree; the following variables, which correspond to non-terminal nodes, are segmentation variables; each branch corresponds to a modality of the variable considered at this level of the tree. This process is repeated at each node of the tree: nodes that are not pure are segmented until pure leaves are obtained.
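To make the notions of purity and segmentation concrete, the sketch below selects a single threshold test for one node by minimizing the weighted Gini impurity of the two resulting groups. The Gini criterion is a common choice and is an assumption here, since the text does not state which splitting criterion was used; the names `gini` and `best_split` are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 2.0 * p1 * (1.0 - p1)


def best_split(X, y):
    """Return (feature index, threshold, weighted impurity) of the best test.

    X: list of feature vectors, y: list of 0/1 labels.
    Returns (None, None, impurity of the node) if no test improves on it.
    """
    n = len(y)
    best = (None, None, gini(y))
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][j] <= thr]
            right = [y[i] for i in range(n) if X[i][j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, thr, score)
    return best
```

Growing a full tree would apply `best_split` recursively to the left and right groups until each group is pure.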
Here we try to classify a population of healthy people and people with Parkinson's disease, who pronounce the five vowels (/a/, /e/, /i/, /o/, /u/), into two classes labeled {1 (healthy), 0 (sick)} from the recordings. The decision tree learning algorithm is described below:

Algorithm 1: Decision tree
Data: a sample Ω of m labeled records
Initialization: empty tree; current node: root; current sample: Ω
Repeat
    Decide whether the current node is terminal
    If the current node is terminal then
        Label the current node as a leaf
    Otherwise
        Select a test and create the subtree
    End if
    Current node: a node not yet studied
    Current sample: sample reaching the current node
Until a decision tree is produced
Output: decision tree

2) Holdout method:
All classification results are obtained using the "holdout" method, a commonly used practice for evaluating machine-learning models. It works by first dividing the data randomly into two parts: the larger one is used for training and the other is reserved for error rate estimation. Another version of this method, called "data shuffle", repeats the random division of the data into a training part and a test part L times, and then averages the L error rates evaluated on the test parts. The advantage of this version is that all data are used for both training and testing. The holdout is a simple method to understand and generally results in a less biased model estimate than other methods.
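The two variants of the holdout method described above can be sketched as follows. The 80% training fraction matches the split used later in this article; the seed handling, the default of 10 repetitions, and the names `holdout_split`, `repeated_holdout`, and `error_fn` are assumptions made for this example. `error_fn` stands for any user-supplied function that trains on the first index list and returns the error rate measured on the second.

```python
import random


def holdout_split(n_samples, train_fraction=0.8, seed=0):
    """Randomly split the indices 0..n_samples-1 into (train, test)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = int(round(train_fraction * n_samples))
    return idx[:n_train], idx[n_train:]


def repeated_holdout(n_samples, error_fn, repeats=10, train_fraction=0.8):
    """'Data shuffle' variant: average error_fn(train, test) over several splits."""
    errors = []
    for r in range(repeats):
        # A different seed per repetition gives a different random division.
        train, test = holdout_split(n_samples, train_fraction, seed=r)
        errors.append(error_fn(train, test))
    return sum(errors) / len(errors)
```

With 50 recordings per vowel, `holdout_split(50)` yields 40 training indices and 10 held-out indices.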

III. METHODOLOGY AND RESULT
We implemented our algorithm in Matlab version R2019a. The tests were performed on a PC with the following configuration:
CPU: Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz
Memory: 4.00 GB
Operating System: Windows 10 64-bit
www.ijacsa.thesai.org

A. Data Collection
This study considers two subgroups of speakers: the Healthy Control (HC) group with 22 speakers and the Parkinson's disease (PD) group with 28 speakers. All speakers included in this Italian corpus were recorded in Bari (Puglia region), Italy. Each recording session took place in a controlled environment, taking into account factors such as room temperature, distance from the microphone, and time of day, and included a conversation with the subject to warm up their vocal muscles. The sampling frequency was 16 kHz; more information is available in [35]. Table II gives the demographic information of the corpus.

B. Methodology
The objective of this part is to design and evaluate a Parkinson's disease detection system based on an extensive set of wavelet types combined with acceleration coefficients (delta delta MFCC) as features and a decision tree as the classifier, as illustrated in Figure 3.
Feature selection and extraction is an important step in the recognition of Parkinson's disease. For this step, we use the wavelet transform, a time-frequency analysis tool that provides variable temporal and frequency resolutions. Our model for the diagnosis of Parkinson's disease is applied to a corpus containing 28 subjects with Parkinson's disease and 22 healthy subjects for each vowel (/a/, /e/, /i/, /o/, /u/). It first decomposes the signal into two sub-bands, approximation and details, using a wide range of wavelets (Daubechies, Coiflets, Symlets, and Discrete Meyer) on the first seven scales.
To better characterize the signal, we then extract the first 12 delta delta MFCC coefficients, which are widely used to encode the dynamic information of cepstral parameters. This extraction is performed only on the low-frequency band (approximation), for each type of wavelet and from the first scale up to the seventh.
The feature selection step contributes directly to the performance of the overall system. It provides the features that are then used by the decision tree classifier with the "holdout" validation method, which, for each vowel (/a/, /e/, /i/, /o/, /u/), uses 80% of the base as the learning base and the whole base as the test base.

C. Experiments and Results
In this part of our work, we implemented our Parkinson's disease detection system using the DWT and the delta delta MFCC coefficients as the feature vector. We varied the wavelet type over each of the first seven scales and measured the impact of each change on Parkinson's disease recognition using the performance evaluation metrics of the decision tree classifier.
A comparative study of the extracted delta delta MFCC features is performed to visualize the distinctive behavior of subjects with Parkinson's disease and healthy subjects for the five vowels. Figures 4-8 show the 12 delta delta MFCC coefficients extracted as features for a healthy person and a person with Parkinson's disease. The feature patterns are obtained from 50 speech samples for each vowel (/a/, /e/, /i/, /o/, /u/) from people with Parkinson's disease and healthy people. Almost all speech signals follow the same pattern. Figure 4 clearly distinguishes the speech pattern of a healthy person from that of a person with Parkinson's disease for the vowel /a/, based on the discrete Meyer wavelet at scale 5, which gave the best accuracy. Figures 5, 6, 7, and 8 present the delta delta MFCC coefficients for the vowels /e/, /i/, /o/, and /u/, respectively, based on the discrete Meyer wavelet at scale 5, the Coiflets level 3 wavelet at scale 6, the discrete Meyer wavelet at scale 7, and the discrete Meyer wavelet at scale 6, because these gave the best accuracy (Tables III, IV, V, VI, and VII). A notable variation is observed between the healthy speech signal and that of Parkinson's disease. Thus, the proposed feature based on delta delta MFCC can be a good marker for the prediction of Parkinson's disease.
The following tables show the results obtained for the five vowels in terms of accuracy, specificity, and sensitivity.
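For reference, the three metrics can be computed from predicted and true labels as in the sketch below. It uses the standard definitions and takes the sick class (label 0, following the labeling given earlier) as the positive class; that choice and the name `evaluate` are assumptions of this example, since the text does not state which class is treated as positive.

```python
def evaluate(y_true, y_pred, positive=0):
    """Accuracy, sensitivity, and specificity for a two-class problem.

    positive: label treated as the positive class (assumed: 0 = sick).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: fraction of sick subjects correctly detected.
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # Specificity: fraction of healthy subjects correctly recognized.
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }
```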
For the vowel /a/, the results in Table III show that the discrete Meyer wavelet at scale 5 offers the best modeling for classifying healthy people and people with Parkinson's disease, with a high accuracy of 97.5%. For the vowel /e/, Table IV shows that the same wavelet at the same scale 5 achieves the best accuracy, 92.5%.
For the vowel /i/, in contrast, Table V shows a slightly lower accuracy of 87.5%, obtained with the Coiflets level 3 wavelet at scale 6 and the discrete Meyer wavelet at scale 7.
For the vowel /o/, according to the results shown in Table VI, the discrete Meyer wavelet at scale 7 gives an accuracy of up to 92.5% for the discrimination between healthy subjects and patients with Parkinson's disease. For the vowel /u/, Table VII shows an accuracy of 90% with the discrete Meyer wavelet at scale 6.

[Tables III to VII: results per wavelet and scale (1 to 7) for the vowels /a/, /e/, /i/, /o/, and /u/]

In Table VIII we have listed, though not exhaustively, different works on Parkinson's disease recognition systems. The performances are difficult to compare because they vary according to several elements such as the type of data, the choice of the learning models, and the way the parameters are obtained. In this section, we compare our results with those of some of these works. Zayrit et al. [29] evaluated the /a/ vowel in the Turkish corpus with a vector of 21 prosodic features including LPC, ZCR, energy, Shannon entropy, and MFCC. The recognition accuracy of PD using the SVM classifier was around 91.18%. The researchers in [26] also conducted experiments on a Turkish corpus, where the accuracy was 82.50%; this rate was obtained with cepstral features and the SVM classifier. Sakar et al. [36] reported a PD recognition accuracy of 68.45% using prosodic features in an SVM-based classification. They used the vowel /a/ from the Turkish corpus. Belhoussine Drissi et al. [30] reported a recognition accuracy of 86.84% for PD using cepstral features in an SVM-based classification. They used a database of 38 recordings from the Turkish corpus, 18 of which were from healthy individuals and 20 from patients with Parkinson's disease.
References [37], [38], [39], and [40] extracted spectral and prosodic features and used, respectively, a random subspace classifier, Support Vector Machines (SVM), a multimodal approach, and LDA-NN-GA classifiers; they achieved PD recognition accuracies of 74.17%, 82.5%, 70%, and 95%, respectively. In this article, the results show that the joint use of wavelets and delta delta MFCC coefficients as features improves the performance of the Parkinson's disease diagnostic system, with an accuracy of up to 97.5%.

IV. CONCLUSION
The work presented in this article is part of a project to recognize Parkinson's disease from the voice, in an educational context. The objective is to detect whether each person is healthy or suffering from Parkinson's disease. To achieve this goal, we proposed a comprehensive and efficient system for the automatic recognition of people's states from a Spanish corpus of the five sustained vowels (/a/, /e/, /i/, /o/, and /u/) produced by 28 subjects with Parkinson's disease and 22 healthy subjects. Our process starts with the transformation of the speech signals by several types of DWT; the approximations of the first seven scales are then fed into the delta delta MFCC block to extract 12 coefficients each time. These coefficients are then classified using the decision tree classifier.
The results show that the proposed features reach a maximum accuracy of 97.5% on the database that contains the vowel /a/, an improvement over the recent studies listed in Table VIII. The complete study showed that the proposed combination of wavelets with delta delta MFCC could be used to effectively detect PD.