Emotional State Prediction Based on EEG Signals using Ensemble Methods

www.ijacsa.thesai.org


I. INTRODUCTION
Mental health is a source of concern because of its significant impact on an individual's quality of life and on society: poor mental health can lead to multiple health and financial losses and, in critical cases, to suicide. Mental health issues are increasing due to factors such as social media, social status, and financial status. Natural disasters and global epidemics, such as the Covid-19 pandemic, also affect an individual's mental health [1]. In 2011, the World Health Organization predicted that depression would be the leading cause of the global illness burden by 2030 [2]. Twenty-five percent of people worldwide have mental health problems [3]. Hence, it is essential to pay attention to this field and find solutions that mitigate the expected impacts of poor mental health.
Personal emotions are a primary influence on people's mental health [4]. EEG signals contain information about the brain's electrical activity, gathered from the scalp by electrodes [5]. Thus, several studies have conducted experiments on classifying emotions using EEG signals. Many previous studies focused on classification accuracy and the number of features. These experiments achieved high accuracy with an appropriate number of components in the EEG signal classification process [6] but ignored processing time. Only a limited number of studies have treated time as an essential factor in model evaluation, and their results showed that processing took a relatively long time, impairing the model's efficiency. A short processing time is required in mental health data processing because it enables smart devices to adopt these models and allows users to monitor their emotional states directly through them. Moreover, fast detection of personal emotions contributes to preventing critical cases, such as suicide due to depression. The results of the proposed model are also compared with those of several studies that used the same dataset.

II. LITERATURE REVIEW
Table I summarizes the related works in terms of dataset, feature extraction algorithm, number of features, best accuracy, and processing time. As the table shows, few studies have been concerned with the models' processing times, and various datasets and algorithms were used. The emotional state dataset was used in several studies; it contains 2548 features and classifies emotions into three classes: negative, positive, and neutral [7]. For instance, Bird et al. [7] used the dataset to propose an approach in which biological inspiration was applied in all implementation steps rather than being limited to the classification stage. Additionally, they explored deep learning and tuning using long short-term memory (LSTM), and they tested AdaBoost with two different models.
The system performs an evolutionary optimization of a multilayer perceptron (MLP) to estimate the network's best hyperparameters. The model extracted 500 features and tested them with several classifiers: deep evolutionary multilayer perceptron (DEvo MLP), LSTM, AdaBoost deep evolutionary multilayer perceptron (AdaBoost DEvo MLP), and AdaBoost LSTM. The accuracies were 96.11%, 96.86%, 96.23%, and 97.06%, while the training times were 16.66, 65.11, 32.88, and 594.55 seconds, respectively. In study [8], the authors implemented four feature extraction methods: One Rule (OneR), Bayes Network (BN), Info-Gain, and Symmetrical Uncertainty. They classified the dataset using single and ensemble methods to determine the best result. The single models were OneR, RT, Sequential Minimal Optimization (SMO), Naive Bayes (NB), BN, Logistic Regression (LR), and MLP, whereas the ensemble models were random forest (RF), Vote, and AdaBoost of random forest (AdaBoost RF). Among the single models, MLP with Info-Gain achieved the highest accuracy of 94.89%, while among the ensemble methods, RF with Info-Gain achieved the best accuracy of 97.89%. The experiment in [9] converted the EEG signals into images for 2D and 3D convolutional neural networks (CNNs). The authors first used three feature selection methods: Kullback-Leibler Divergence, OneR, and Symmetrical Uncertainty. The best results of the 2D CNN for each feature selection method were 98.22%, 97.28%, and 97.12%, whereas the accuracies of the 3D CNN were 97.28%, 96.97%, and 97.12%, respectively.
Multiple studies used a popular dataset called SEED [22], whose signals were collected from 15 participants using 15 Chinese film clips. Each participant performed the experiment three times, and the films were classified into three classes: negative, positive, and neutral. The authors of [10] used the SEED dataset to propose a model called STRNN, which integrates spatial and temporal dependencies with a recurrent neural network (RNN). The proposed approach had two layers: a multi-direction spatial RNN (SRNN) layer and a bidirectional temporal RNN (TRNN) layer. These layers captured spatial and temporal information within the signal sequence. The accuracy of this experiment was 89.50%. Asadur Rahman et al. [11] used principal component analysis (PCA) and t-statistics to extract five channels and used them with four classifiers: support vector machine (SVM), artificial neural network (ANN), linear discriminant analysis (LDA), and k-nearest neighbor (KNN). The classification accuracies were 85.85%, 86.57%, 82.50%, and 73.42%, respectively.
DEAP [23] is another common dataset used broadly in the state of the art. Its signals were collected from 32 participants, each of whom watched 40 music videos for one minute per video. The participants rated the videos based on their levels of valence, arousal, liking, dominance, and familiarity. For instance, Doma and Pirouz [12] used the DEAP dataset, with and without PCA, with several classifiers: SVM, logistic regression, decision tree, KNN, and naive Bayes. The classifiers achieved accuracies ranging from 55% to 75% and F1 scores between 70% and 86%. The best F1 score was 84.73%, obtained by SVM with PCA for arousal classification. Hemanth [13] used a Kohonen neural network (KohonenNN) with several modifications to achieve better accuracy. The modified KohonenNN I and II improved the accuracy by 1% to 2%; the best accuracy, 87%, was achieved using KohonenNN I. The study in [14] proposed a novel model called ECLGCNN, which consists of three layers. The first layer is a graph convolutional neural network (GCNN) devoted to calculating the relationship between two channels of EEG signals and extracting graph-domain features from differential entropy. The second layer is an LSTM, which memorizes the changes between two EEG channels. The final layer is a dense layer, which classifies the emotions. The study conducted two experiments: the first was subject-dependent and achieved 90.45% and 90.60% for valence and arousal, and the second was subject-independent and achieved lower accuracies of 84.81% and 85.27%. The authors of [15] suggested using empirical mode decomposition (EMD) to decompose the signals. They then extracted features using second-order difference plots (SODP), namely the mean, area, and measure of central tendency. The experiment used an SVM and an MLP with two hidden layers to classify the multi-class emotions. The MLP achieved good results in each classification, with a best accuracy of 100% for high and low arousal. Another study [16] applied several feature extraction algorithms: power, entropy, fractal dimension, and statistical features. The authors used several classifiers (SVM, KNN, and decision tree) and adopted PCA to select features. The system achieved overall best accuracies of 78.96%, 77.62%, and 77.60% for arousal, valence, and dominance, respectively, using SVM with statistical features.
Some studies have used multiple emotional datasets. For instance, Cheng et al. [24] used the DEAP and DREAMER [17] datasets. The authors used a deep forest to extract spatial and temporal information from these two datasets. Both experiments achieved good results. The accuracies on the first dataset were 97.69% and 97.53% for valence and arousal, respectively, whereas the accuracies on the second dataset were 89.03%, 90.41%, and 89.89% for valence, arousal, and dominance, respectively. This study considered the running time: the first experiment took 693.4861 seconds and the second took 1307.406 seconds. Another study [18] used the SEED and DEAP datasets to implement its model. It used a flexible analytic wavelet transform (FAWT) with information potential (IP) to extract the features. The experiment was tested using two classifiers, RF and SVM. SVM was better than RF, achieving 59.06% accuracy on DEAP and 83.33% accuracy on SEED. Additionally, the authors of [19] used the same two datasets with three algorithms for extracting features from the EEG signals: variance, discrete wavelet transform (DWT), and fast Fourier transform (FFT). DEAP and SEED were used to validate the model. The DEAP dataset contains four states: valence, arousal, dominance, and liking, whereas SEED has three states: negative, positive, and neutral. The classifier was a spiking neural network (SNN), which achieved 78%, 74%, 80%, and 86.27% accuracy, respectively, on the first dataset and 96.67% accuracy on the second.
Other studies have used different datasets. For example, the authors of [20] used Indian films to classify emotions into happy, sad, fear, and relaxed. The study removed noise from the dataset in two stages: the first used empirical mode decomposition (EMD), and the second used variational mode decomposition (VMD). A multi-class least squares SVM (MC-LS-SVM) classifier with the Morlet wavelet (MW) kernel function was used to classify the emotions. This model's best accuracies were 92.79%, 88.98%, 87.62%, and 93.13% for happy, sad, fear, and relaxed emotions, respectively. Seal et al. [21] used a dataset that classified the EEG signals into four categories: sad, fear, happy, and neutral. The experiment used a discrete wavelet transform and an extreme learning machine (ELM). The best accuracy was 94.72%, obtained from the FP1-F7 channel in the gamma subband.

III. METHOD
In this section, we present the method of the proposed model, as shown in Fig. 1. We experimented on the emotional state dataset [7], which was used by some of the researchers mentioned in the literature review section. The dataset classifies a person's emotional state as positive, negative, or neutral. Data were collected from one man and one woman using the electrodes of an EEG headband while they watched six clips for one minute per clip. Three films stimulated positive emotions, while the other three stimulated negative emotions. In addition, six minutes were recorded for each person in a neutral (normal) state. This resulted in a dataset of 2,549 attributes and 2,132 rows. Two feature extraction algorithms, PCA and FastICA, were applied in this experiment, and three ensemble classifiers were included in the classification stage: RF, XGBoost, and AdaBoost. Each experiment was evaluated using 10-fold cross-validation accuracy. Each experiment is discussed in detail in the following subsections.

A. Feature Extraction
This model used two feature extraction algorithms, PCA and FastICA, to extract the most important features of the EEG emotion signals. Previous studies used different feature extraction algorithms based on several specifications. In this study, we used PCA for its fast processing capabilities [25]. Similarly, FastICA is a fast version of independent component analysis (ICA) and is suitable for large datasets [26]. The following two subsections discuss the two selected feature extraction algorithms.

1) Principal component analysis:
The main concept of principal component analysis is dimension reduction by transforming correlated variables into new uncorrelated components that retain the maximum variation of the original variables [27]. Based on [28], the steps for extracting features using PCA are as follows:
a) Suppose there is a matrix of size m × n.
b) Convert the matrix into an N-dimensional vector of input data x.
c) Calculate the mean vector of the input data.
d) Calculate the covariance matrix.
e) Compute the eigenvalues and eigenvectors of the covariance matrix.
f) Select the components using the k eigenvectors with the highest eigenvalues, then construct the matrix w from these eigenvectors.
g) Construct the principal components by using the w matrix to transform the samples into a new subspace.
In this study, PCA was used with each classifier to obtain a good result. Based on multiple experiments, the best selections were 36 components with RF, 33 components with XGBoost, and 28 components with AdaBoost.
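The PCA steps listed above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function name `pca_fit_transform`, the variable names, and the synthetic input matrix are all hypothetical, and the real experiments may have relied on a library implementation.

```python
import numpy as np

def pca_fit_transform(X, k):
    """Project the rows of X (n_samples, n_features) onto k principal components."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)                      # mean vector of the features
    X_centered = X - mean
    cov = np.cov(X_centered, rowvar=False)     # covariance matrix (n_features, n_features)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh is suitable because cov is symmetric
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
    W = eigvecs[:, order]                      # projection matrix (n_features, k)
    return X_centered @ W, W, eigvals[order]   # samples expressed in the new subspace

# Synthetic stand-in for the EEG feature matrix (assumption: not the real dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))
Z, W, top_eigvals = pca_fit_transform(X_demo, k=3)
print(Z.shape)  # (200, 3)
```

The projected components are mutually uncorrelated, and their variances equal the selected eigenvalues, which is why keeping the eigenvectors with the highest eigenvalues retains the most variation.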

2) Fast independent component analysis:
Fast independent component analysis (FastICA) is a type of ICA algorithm that separates unknown mixed signals to obtain useful independent signals by exploiting the independence and non-Gaussian nature of the source signals [29]. FastICA is a fixed-point iterative algorithm with a simple structure and fast convergence [30].
FastICA is implemented with the following steps, as mentioned in [31]:
a) Remove the mean of x by centering.
b) Transform x linearly to obtain an uncorrelated vector z, so that the covariance matrix of z is the identity matrix: $E\{zz^{T}\} = I$.
c) Construct an operation matrix w that satisfies $\|w\| = 1$.
d) Update the separation matrix w by iterating with the Newton iteration method.
e) Normalize w so that $\|w\| = 1$.
f) Judge the convergence of w. If it has converged, w gives the best estimate of the source signal; if not, go back to step c).
In the FastICA experiments, we selected 33 components with RF and 31 components with AdaBoost and XGBoost.
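The FastICA procedure above can be sketched for a single component as follows. This is an illustrative sketch only: the `tanh` nonlinearity, the tolerance, the function name `fastica_one_unit`, and the synthetic mixed sources are assumptions that the text does not specify, and the update line is the standard one-unit fixed-point rule rather than a formula taken from [31].

```python
import numpy as np

def fastica_one_unit(X, max_iter=200, tol=1e-8, seed=0):
    """Estimate one independent component from X (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)                    # centering
    eigvals, eigvecs = np.linalg.eigh(np.cov(X_centered, rowvar=False))
    Z = X_centered @ (eigvecs / np.sqrt(eigvals))      # whitening: cov(Z) is the identity
    w = rng.normal(size=Z.shape[1])
    w /= np.linalg.norm(w)                             # initial vector with unit norm
    for _ in range(max_iter):
        wz = Z @ w
        g = np.tanh(wz)                                # assumed nonlinearity
        g_prime = 1.0 - g ** 2
        # Newton-type fixed-point update: E{z g(w^T z)} - E{g'(w^T z)} w
        w_new = (Z * g[:, None]).mean(axis=0) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)                 # renormalize to unit norm
        converged = abs(abs(w_new @ w) - 1.0) < tol    # direction no longer changes
        w = w_new
        if converged:
            break
    return Z @ w, w, Z

# Two synthetic non-Gaussian sources mixed linearly (assumption: not EEG data).
rng = np.random.default_rng(0)
sources = np.column_stack([rng.uniform(-1, 1, 2000), rng.laplace(size=2000)])
mixing = np.array([[1.0, 0.5], [0.3, 1.0]])
X_demo = sources @ mixing.T
component, w, Z = fastica_one_unit(X_demo)
```

Extracting several components, as in the experiments with 31 or 33 components, additionally requires decorrelating the vectors from one another after each update; that step is omitted here for brevity.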

B. Classifiers
This subsection presents the three classifiers used in this study. We focused on ensemble classifiers because of their powerful performance. RF and AdaBoost were selected for their good performance in classifying EEG signals in some studies [32] [33]. Additionally, we used XGBoost for its effective label classification and fast computation [34]. The following subsections discuss the selected classifiers.
1) Random forest:
RF consists of multiple decision trees. Each decision tree makes a classification separately and then provides its result. The final result is determined by the vote of all the decision trees [35]. Evans et al. [36] describe the steps of the RF algorithm as follows:
a) Iteratively construct N bootstrap samples of size n drawn from the population z.
b) Grow an RF tree on each bootstrap sample; at each node, randomly select m variables from the M available and find the best split among them using the Gini entropy index.
c) Use the out-of-bag data to validate every tree.
d) Produce several RF trees. To predict a new observation $x_i$, let $\hat{C}_b(x_i)$ be the class prediction of the $b$th RF tree; then:
$$\hat{C}_{rf}(x_i) = \text{majority vote}\,\{\hat{C}_b(x_i)\}_{b=1}^{N}$$
In this study, we tuned the RF parameters; the best parameters of RF with PCA and FastICA are shown in Table II.
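Two ingredients of the RF procedure described above, bootstrap sampling with out-of-bag rows and the final majority vote, can be sketched in plain Python. The helper names and the example votes below are hypothetical and serve only to illustrate the mechanism; tree growing itself is omitted.

```python
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Draw n row indices with replacement; rows never drawn are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag

def majority_vote(tree_predictions):
    """Return the most common class among the per-tree predictions."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)
in_bag, oob = bootstrap_indices(10, rng)

# Hypothetical class predictions of five trees for one observation.
votes = ["positive", "neutral", "positive", "negative", "positive"]
print(majority_vote(votes))  # positive
```

Because each tree sees a different bootstrap sample, the out-of-bag rows give every tree a small validation set that it was not trained on.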
2) Extreme gradient boosting:
Extreme gradient boosting (XGBoost) is a robust algorithm based on a gradient boosting system [37]. It is a tree-boosting system; however, there is a major difference between RF and gradient-boosted machines (GBM): RF trees are built independently, whereas a GBM adds each new tree to complement the previously built ones [38]. According to Duan et al. [39], the general points of the XGBoost algorithm are as follows:
• Assume $D = \{(x_i, y_i)\}$ is a dataset of n samples and m features ($|D| = n$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$).
• The model uses z additive functions to approximate the response of the system, as follows:
$$\hat{y}_i = \sum_{k=1}^{z} f_k(x_i), \quad f_k \in F$$
where F is the space of regression trees, defined as:
$$F = \{ f(x) = w_{q(x)} \} \quad (q: \mathbb{R}^m \to \{1, \ldots, T\},\; w \in \mathbb{R}^T)$$
where q stands for the tree structure, T indicates the number of leaf nodes, and w represents the leaf weights. Besides, $f_k$ is a function that corresponds to the structure q and the weights w of an independent tree.
Table III lists the parameters used with the XGBoost classifier with PCA and FastICA, selected through the parameter tuning process.
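The additive form of the boosted model can be illustrated with a small sketch in which each tree is represented by a structure function that maps a sample to a leaf index and by a list of leaf weights. The class name `Tree`, the field names `q` and `w`, and the two hand-written trees are assumptions made for illustration; they are not the XGBoost library API and do not reproduce the trained model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class Tree:
    q: Callable[[Sequence[float]], int]   # tree structure: sample -> leaf index
    w: Sequence[float]                    # one weight per leaf

    def __call__(self, x):
        # The tree's output is the weight of the leaf that x falls into.
        return self.w[self.q(x)]

def predict(trees, x):
    """Additive model: the prediction is the sum of all tree outputs."""
    return sum(tree(x) for tree in trees)

# Two hypothetical depth-one trees with two leaves each.
trees = [
    Tree(q=lambda x: 0 if x[0] < 0.5 else 1, w=[-1.0, 1.0]),
    Tree(q=lambda x: 0 if x[1] < 2.0 else 1, w=[0.25, -0.25]),
]
print(predict(trees, [0.7, 3.0]))  # 0.75
```

During training, each new tree is fitted to correct the sum of the trees built before it, which is the difference from RF noted in the text.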
3) Adaptive boost:
Adaptive boost (AdaBoost) is an algorithm that improves learners' accuracy by changing the sample weight distribution [40]. As mentioned in [41], AdaBoost works to reduce the exponential loss greedily, as Eq. (6) shows:
$$F_T(x) = \sum_{t=1}^{T} f_t(x) \qquad (6)$$
where $f_t$ indicates a weak learner and x is the object used. The sum of training errors at stage t is:
$$E_t = \sum_{i} E\left[ F_{t-1}(x_i) + \alpha_t h(x_i) \right]$$
where $h(x_i)$ is the hypothesis made by a weak learner, and $\alpha_t$ is its parameter, chosen to minimize the sum of errors in training.
In this experiment, AdaBoost was used twice: first with PCA and then with FastICA. The parameters used in the two experiments are shown in Table IV.
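The greedy reduction of the exponential loss can be sketched with a small discrete AdaBoost over decision stumps. This is an illustration under assumptions: the labels are binary (+1 and -1) rather than the three emotion classes, the toy data and helper names are hypothetical, and the learner weight uses the standard closed form, half the log of (1 - error) / error, which minimizes the exponential loss for such weak learners.

```python
import math

def stump(threshold, sign):
    """Weak learner: predicts `sign` when x > threshold, otherwise -sign."""
    return lambda x: sign if x > threshold else -sign

def adaboost(xs, ys, candidates, rounds):
    n = len(xs)
    weights = [1.0 / n] * n                  # uniform sample weights at the start
    ensemble = []                            # list of (alpha, weak learner)
    for _ in range(rounds):
        # Greedy step: pick the weak learner with the lowest weighted error.
        best_h, best_err = None, float("inf")
        for h in candidates:
            err = sum(w for w, x, y in zip(weights, xs, ys) if h(x) != y)
            if err < best_err:
                best_h, best_err = h, err
        err = min(max(best_err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, best_h))
        # Increase the weights of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * y * best_h(x))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1

xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
candidates = [stump(t + 0.5, s) for t in range(9) for s in (1, -1)]
ensemble = adaboost(xs, ys, candidates, rounds=3)
print([predict(ensemble, x) for x in xs] == ys)  # True
```

No single stump classifies this toy data correctly, but three reweighted stumps together do, which shows how changing the sample weight distribution steers later learners toward earlier mistakes.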

C. Performance Measures
The proposed model was evaluated in terms of accuracy, number of features, and processing times. The formulation of accuracy is presented as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where:
• True Positive (TP): the actual class was positive, and the model predicted positive.
• True Negative (TN): the actual class was not positive, and the model predicted not positive.
• False Positive (FP): the actual class was not positive, but the model predicted positive.
• False Negative (FN): the actual class was positive, but the model predicted not positive.
To obtain reliable results and avoid biased estimates of classifier accuracy, we tested each classifier ten times with PCA and FastICA. Then, we calculated the average of these experiments.
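The repeated evaluation described above can be sketched as follows, assuming that each of the ten repetitions is one shuffled 10-fold cross-validation and that processing time is measured as wall-clock time around the whole loop; the text does not state either detail explicitly. The majority-class classifier and the toy labels are hypothetical placeholders for the real classifiers and dataset.

```python
import random
import time
from collections import Counter

def k_fold_indices(n, k, rng):
    """Shuffle the row indices and deal them into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def majority_class_fit_predict(X_train, y_train, X_test):
    """Placeholder classifier: always predicts the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)

def repeated_cv(fit_predict, X, y, k=10, repeats=10, seed=0):
    rng = random.Random(seed)
    accuracies = []
    start = time.perf_counter()
    for _ in range(repeats):
        correct = 0
        for fold in k_fold_indices(len(X), k, rng):
            held_out = set(fold)
            train = [i for i in range(len(X)) if i not in held_out]
            preds = fit_predict([X[i] for i in train],
                                [y[i] for i in train],
                                [X[i] for i in fold])
            correct += sum(p == y[i] for p, i in zip(preds, fold))
        accuracies.append(correct / len(X))          # accuracy of one repetition
    elapsed = time.perf_counter() - start
    return sum(accuracies) / repeats, elapsed        # mean accuracy, total seconds

X_toy = list(range(30))
y_toy = ["neutral"] * 18 + ["positive"] * 12
mean_accuracy, seconds = repeated_cv(majority_class_fit_predict, X_toy, y_toy)
```

Averaging over repetitions reduces the influence of any single random split on the reported accuracy.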
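Additionally, several measures were calculated to give a complete view of the results, namely precision, recall, and F1-score.
Precision was calculated as follows:
$$\text{Precision} = \frac{TP}{TP + FP}$$
This is the percentage of results predicted as positive that are relevant. Recall was calculated as follows:
$$\text{Recall} = \frac{TP}{TP + FN}$$
The F1-score was calculated as follows:
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
This is the harmonic mean of precision and recall.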
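For a three-class problem such as this one, the measures above are typically computed per class by treating one emotion as the positive class and the other two as not positive. The sketch below assumes this one-vs-rest reading; the function name and the six example labels are hypothetical and are not results from the experiments.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall, and F1 with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

y_true = ["positive", "positive", "negative", "neutral", "neutral", "negative"]
y_pred = ["positive", "neutral", "negative", "neutral", "positive", "negative"]
metrics = one_vs_rest_metrics(y_true, y_pred, positive="positive")
print(metrics["precision"], metrics["recall"], metrics["f1"])  # 0.5 0.5 0.5
```

Here one of the two samples predicted as positive is correct (precision 0.5), and one of the two truly positive samples is found (recall 0.5).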

IV. RESULT AND DISCUSSION
One of the goals of this study was to reduce the number of features to fewer than 63, which is the lowest number of features achieved by previous studies that used the same dataset [8]. We reduced it by 48%, to 33 features, using PCA, with an accuracy of 95% using the XGBoost classifier. FastICA decreased the number of features by 51%, to 31 features, using the XGBoost classifier with 94% accuracy. Regarding time, all classifiers' processing times were between 2 and 15 seconds. Because all classifiers took a short time to process, we decided to use time as the third factor in the evaluation process. RF was the fastest algorithm; it took just four and two seconds of processing time with PCA and FastICA, respectively.
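As a quick check of the reported reductions, taking the 63 features of [8] as the baseline, the percentages follow directly (rounded to the nearest percent):

```latex
\[
\frac{63 - 33}{63} = \frac{30}{63} \approx 47.6\% \approx 48\%,
\qquad
\frac{63 - 31}{63} = \frac{32}{63} \approx 50.8\% \approx 51\%.
\]
```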
AdaBoost had the lowest accuracy, achieving 86% in both experiments, and it was also the slowest algorithm in both. The highest accuracy achieved in the experiments was 95%, using the XGBoost classifier with 33 features extracted by PCA. RF achieved the same accuracy using PCA, but with 36 features. Fig. 2 shows the results of the three classifiers, with PCA and FastICA, for the three evaluation criteria. The accuracies of RF and XGBoost were high and equal in both experiments. However, RF outperformed in processing time, while XGBoost outperformed in the number of features. AdaBoost was the least accurate and slowest classifier in this work. Table V shows the results of each measure for the three emotions in all experiments.
In general, some existing works, such as [7], achieved better accuracy than this experiment but with more features. However, we consider 95% a good accuracy, especially given the short processing time of 14 seconds. Table VI presents a performance comparison between the proposed work and existing works that used the same emotional dataset.

V. CONCLUSION
This paper presented an EEG-based emotion prediction model that considers three factors: accuracy, number of features, and processing time. PCA and FastICA were used to extract features from the signals, and RF, XGBoost, and AdaBoost were used to classify them. The XGBoost results were the best in the two experiments. The best accuracy of this work was 95%, using 33 features extracted by PCA; the classification process took 14 seconds. RF achieved the same accuracy in the PCA experiment but used 36 features and took four seconds. AdaBoost achieved the lowest accuracy and the longest time in both experiments, whereas RF was the fastest classifier in both experiments. This model can be adopted in smart devices, such as smartwatches, that offer dynamic monitoring of patients with mental health conditions to protect them from the probable implications. In future work, the authors will utilize other methods and techniques to obtain better accuracy with fewer features and a shorter processing time.


This gap encouraged us to propose a model that classifies emotional signals while considering accuracy, number of features, and processing time. PCA and FastICA were used in the feature extraction stage because of their extraction speed, while RF, AdaBoost, and XGBoost were implemented to classify the emotions. The contributions of this study are summarized as follows:
• Implementing three ensemble classification algorithms on EEG signals using two feature extraction algorithms.
• Defining the best combination of feature extraction and classification algorithms based on three factors: accuracy, number of features, and processing time.

Fig. 1. Research Methodology. The research methodology included three main steps. The first step was the feature extraction step, which contained two methods: PCA and FastICA. The second step was the classification step, which consisted of three classifiers: random forest, XGBoost, and adaptive boost. Finally, the third step was the evaluation step, which was based on three criteria: the accuracy of the model, the number of features, and the processing time.

TABLE I. SUMMARIZATION OF RELATED WORK

TABLE II. PARAMETERS USED BY RF SELECTED BASED ON PARAMETERS TUNING

TABLE VI. PERFORMANCE COMPARISON BETWEEN THE MODELS USING THE SAME DATASET IN THIS WORK