Emotion Classification in Arousal Valence Model using MAHNOB-HCI Database

Emotion recognition from physiological signals has attracted the attention of researchers from different disciplines, such as affective computing, cognitive science and psychology. This paper aims to classify emotional states using peripheral physiological signals based on arousal-valence evaluation. These signals are the Electrocardiogram, Respiration Volume, Skin Temperature and Galvanic Skin Response. We explored the signals collected in the MAHNOB-HCI multimodal tagging database. We defined the emotion in three different ways: two and three classes using 1-9 discrete self-rating scales, and another model using 9 emotional keywords to establish the three defined areas in the arousal-valence dimensions. To obtain the classification accuracies, we began by removing the artefacts and noise from the signals, and then we extracted 169 features. We finished by classifying the emotional states using the support vector machine. The obtained results showed that the electrocardiogram and respiration volume were the most relevant signals for the emotion recognition task. Moreover, the obtained accuracies were promising compared to recent related works for each of the three ways of modeling emotion.

Keywords: Emotion Classification; MAHNOB-HCI; Peripheral Physiological Signals; Arousal-Valence Space; Support Vector Machine


I. INTRODUCTION
For effective and correct human-computer interaction (HCI), recognizing human emotion is one of the key stages in the affective computing field, and especially in emotional intelligence for HCI. Several research areas can therefore benefit from emotion assessment. We cite work in the medical field, particularly for children with autism who are unable to express their feelings clearly [1]. An emotion recognition system can identify critical states during driving by assessing the stress level [2][3][4]. Moreover, there are applications that aim at a daily life with less stress [5] and a more pleasant life [6].
Emotion can be observed through different modalities. Facial expression is the most popular way to recognize affective states [7][8][9]. Human speech [10][11] and motions or gestures are also widely used in emotion assessment. However, these channels cannot always identify the real emotional state, because it is easy to hide a facial expression or fake a tone of voice [12]. Moreover, they are not effective for people who cannot reveal their feelings verbally, such as autistic people [13]. They are also unavailable unless the user is facing the camera or microphone in a suitable environment, without darkness or noise, during data collection.
It is difficult to compare these investigated approaches because they diverge in several ways. Indeed, the related works differ in how the affective states arise, which can be natural or induced. Emotion can be evoked by watching affective movies [20] or video clips [21], playing a video game [22], driving a car or listening to music [23][24]. Moreover, emotion can be defined using different models. The first is Ekman's model, which is based on universal emotional expressions and presents six discrete basic emotions: Happiness, Sadness, Surprise, Fear, Anger and Disgust [25]. The second is Plutchik's model, which presents eight fundamental emotions: Joy, Trust, Fear, Surprise, Sadness, Disgust, Anger and Anticipation [26]. The third is based on the model of Russell et al. [27], who focused on a two-dimensional evaluation such as the valence-arousal model [19]. Some other works merge the previous models to define emotion in the continuous space using affective keywords [28][20].
Among recent related research, we cite the work based on the MAHNOB dataset [20]. The authors classified the affective states into three defined classes and achieved 46.2% and 45.5% for arousal and valence, respectively. Another similar work, by Koelstra et al., who created the freely available multimodal dataset DEAP, is detailed in [21]. They classified the emotional states into two classes for valence, arousal and liking, and obtained 57%, 62.7% and 59.1% for arousal, valence and liking, respectively. These studies cannot be compared directly, because they used different classes in the arousal-valence model. Moreover, the way of defining emotion differs: in [20], the affective states were classified using nine discrete emotional keyword tags to define three classes in the arousal-valence model, whereas in [21], discrete self-rating scales from 1 to 9 were used for arousal, valence and liking.

This paper aims to identify human affective states in the arousal-valence space using three ways of modeling and defining emotion in this continuous space. The proposed approach is based on peripheral physiological signals (ECG, Resp, Temp and GSR), with a view to using wearable and non-obtrusive sensors in future work. We began by defining the emotional states in two classes, "Positive" and "Negative" for valence and "High" and "Low" for arousal. Then, we established three classes using the self-reported discrete scale values (from 1 to 9 on the arousal and valence axes). The three classes are named calm, medium aroused and excited for arousal, and unpleasant, neutral valence and pleasant for valence. Finally, we defined these three classes using nine emotional keywords (happiness, amusement, neutral, anger, fear, surprise, anxiety, disgust and sadness). In this last definition, we combined the emotion models of Russell and Ekman [28][29][30].

Another purpose of this paper is to select the most relevant peripheral physiological signal for the emotion sensing problem. We began by classifying each signal separately, and then fused their features at the feature level. We explored the recent multimodal MAHNOB-HCI database. A careful process was applied in the preprocessing, feature extraction and classification stages. For the last step, we used the support vector machine (SVM).
The remainder of this paper is organized as follows. Section II describes the proposed approach. The next section gives the details of the data preprocessing and feature extraction stages. In Section IV, we present the SVM classifier and how we modeled the emotional states. Section V summarizes the obtained results. Finally, we conclude this contribution and present future work in Section VI.

A. Proposed approach
An emotion recognition system has several steps that must be carried out carefully to reach a promising classification rate. As a first step, we pre-processed the data to smooth the signals. Then, a set of selected features was extracted. After normalizing all features, an early level feature fusion (LFF) was applied to compare the proposed approach to related works. Finally, we classified the data according to their labels using the support vector machine. All these steps are detailed in the following sections. Fig. 1 presents the block diagram of this work.

B. MAHNOB-HCI Multimodal Database
Research in the emotion recognition field has motivated the creation of many databases. Some datasets contain speech and audio-visual signals as modalities to assess human affective states [31][32]. Others record physiological signals, such as the dataset of Healey and Picard [33], DEAP [21] and MAHNOB-HCI [20].

In the proposed approach, we chose the most recent of these databases, MAHNOB-HCI, for several reasons. First, it has five modalities which were carefully synchronized. Also, a comparative study between the DEAP and MAHNOB datasets by Godin et al. [34] demonstrated that the best accuracies were obtained using the signals recorded in the MAHNOB database.

According to the experimental setup of the MAHNOB-HCI database, each trial contains 30 seconds before the beginning of the affective stimulus and another 30 seconds after its end. We therefore first removed these two 30-second segments to keep only the pertinent information. Next, Butterworth filters were applied to eliminate artefacts and baseline wander from the GSR, Resp and ECG signals, with cut-off frequencies of 0.3 Hz, 0.45 Hz and 0.5 Hz, respectively.
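The trimming of the 30-second segments before and after the stimulus can be illustrated with a minimal Python sketch. The function name `trim_trial` and the 4 Hz sampling rate of the toy trial are hypothetical and chosen only for illustration; the Butterworth filtering step is not shown, since the filter order and type are not stated here.

```python
def trim_trial(samples, fs_hz, pad_seconds=30.0):
    """Drop the first and last `pad_seconds` of a trial, keeping the stimulus part."""
    pad = int(round(pad_seconds * fs_hz))  # number of samples in one padding segment
    if len(samples) <= 2 * pad:
        raise ValueError("trial is shorter than the two padding segments")
    return samples[pad:len(samples) - pad]


# Toy trial: 100 s sampled at a hypothetical 4 Hz, so 400 samples in total.
trial = list(range(400))
stimulus = trim_trial(trial, fs_hz=4)
print(len(stimulus), stimulus[0], stimulus[-1])  # prints: 160 120 279
```

The same call would apply to each recorded signal of a trial, with `fs_hz` set to that signal's actual sampling rate.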
In addition to characteristic features such as the heart rate variability from the electrocardiogram (1) and the breathing rate from the respiratory volume, a set of statistical values was extracted from the data.

$$\mathrm{HRV} = \frac{60}{\overline{RR}} \qquad (1)$$

where HRV is the heart rate variability in beats per minute and $\overline{RR}$ is the mean of the RR intervals (in seconds).
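As an illustration, the sketch below computes the quantity in (1), assuming it takes the usual form of 60 divided by the mean RR interval expressed in seconds. The R-peak times are made-up values, and the helper names `rr_from_peaks` and `hrv_bpm` are hypothetical; R-peak detection itself is not covered.

```python
def rr_from_peaks(peak_times_s):
    """RR intervals (in seconds) as differences between consecutive R-peak times."""
    return [later - earlier for earlier, later in zip(peak_times_s, peak_times_s[1:])]


def hrv_bpm(rr_intervals_s):
    """Beats per minute from the mean RR interval: 60 / mean(RR), RR in seconds."""
    if not rr_intervals_s:
        raise ValueError("at least one RR interval is required")
    mean_rr = sum(rr_intervals_s) / len(rr_intervals_s)
    return 60.0 / mean_rr


# Made-up R-peak times, one beat every 0.8 s.
peaks = [0.0, 0.8, 1.6, 2.4]
rr = rr_from_peaks(peaks)
print(hrv_bpm(rr))  # roughly 75 beats per minute
```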
To reduce the difference between the participants, we normalized the features by mapping each one to the interval [0, 1]. The data preprocessing and feature extraction stages were based on the studies reported in [21] and [20].
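A minimal sketch of this mapping, assuming a plain min-max rescaling of each feature column. Whether the minimum and maximum are taken per participant or over the whole dataset is not stated, so the function simply rescales whatever set of trials it receives. The names and toy values are hypothetical.

```python
def min_max_normalize(values):
    """Map a list of feature values to [0, 1]; a constant feature maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def normalize_columns(rows):
    """Normalize each feature (column) of a trials-by-features table independently."""
    columns = list(zip(*rows))
    scaled = [min_max_normalize(list(col)) for col in columns]
    return [list(row) for row in zip(*scaled)]


# Toy table: 3 trials, 2 features (values are illustrative only).
features = [[60.0, 0.2], [75.0, 0.4], [90.0, 0.8]]
normalized = normalize_columns(features)
print(normalized)
```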

IV. SVM CLASSIFICATION
Different machine learning algorithms have been successfully applied to classify human emotional states given a set of physiological features. We cite the artificial neural network (ANN), k-Nearest Neighbors (k-NN), Bayesian Network and Regression Tree (RT). In this approach, we employed the support vector machine, which is the most popular and pertinent classifier for this problem [35]. Indeed, a comparative study described in [36] showed that the SVM gave better accuracy rates than other machine learning techniques such as k-NN, regression tree and Bayesian network.
Basically, the SVM is a supervised machine learning technique. In addition to linear classification, it efficiently solves nonlinear problems through its several kernels. The SVM performs classification by finding the hyperplanes that separate the classes while maximizing the margin, that is, the distance between the hyperplane and the nearest samples of each class. For the implementation, we used the LibSVM library under the MATLAB platform [37].
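To make the role of the kernels concrete, the following Python sketch writes out the four standard kernel forms offered by LIBSVM (linear, polynomial, RBF or Gaussian, and sigmoid). It is only an illustration of the kernel functions themselves, not of the MATLAB implementation used here; which kernels were tested and the values of `gamma`, `coef0` and `degree` are assumptions.

```python
import math


def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||u - v||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


# Two toy normalized feature vectors (illustrative values).
u, v = [1.0, 0.0], [0.0, 1.0]
print(linear_kernel(u, v), rbf_kernel(u, v, gamma=0.5), rbf_kernel(u, u))
```

The RBF kernel equals 1 for identical vectors and decays with distance, which is what lets the SVM draw nonlinear boundaries in the original feature space.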
Tables II and III present the two and three classes defined using the 1-9 discrete scales in the arousal-valence space. The ratings were reported by the participant during or after watching the affective video. We also defined the three classes in the arousal-valence model using the nine affective keywords: (1) Joy or Happiness, (2) Amusement, (3) Sadness, (4) Disgust, (5) Anxiety, (6) Fear, (7) Surprise, (8) Anger, and (9) Neutral. According to Table IV, we assigned the labels "High" and "Low" for arousal and "Positive" and "Negative" for valence. The three classes were "Calm", "Medium" and "Activated" for arousal, and "Unpleasant", "Neutral" and "Pleasant" for valence.
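The mapping from a 1-9 self-rating to a class label can be sketched as follows. The cut points used below (5 for two classes, 3 and 6 for three classes) are assumptions made for illustration only; the actual boundaries are those given in Tables II and III. The function names are hypothetical.

```python
def two_class_label(rating, threshold=5, low="Low", high="High"):
    """Binary label from a 1-9 rating; `threshold` is an assumed cut point."""
    if not 1 <= rating <= 9:
        raise ValueError("rating must lie in 1..9")
    return low if rating <= threshold else high


def three_class_label(rating, cuts=(3, 6), names=("Calm", "Medium", "Activated")):
    """Three-way label from a 1-9 rating; `cuts` are assumed cut points."""
    if not 1 <= rating <= 9:
        raise ValueError("rating must lie in 1..9")
    if rating <= cuts[0]:
        return names[0]
    if rating <= cuts[1]:
        return names[1]
    return names[2]


valence_names = ("Unpleasant", "Neutral", "Pleasant")
print(two_class_label(8), three_class_label(5), three_class_label(9, names=valence_names))
# prints: High Medium Pleasant
```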

V. RESULTS AND DISCUSSIONS
In this section, we summarize and evaluate the obtained results for emotion classification in the arousal-valence space. We presented the emotional states in two and three defined classes, as explained earlier.
For the aim of this paper, we classified each peripheral physiological signal separately, and then applied an early fusion of all descriptors to compare the proposed approach to related studies. Feature-level fusion combines all the modalities before the training stage [38]; here, a simple concatenation was applied to all the extracted features. Table V summarizes the classification accuracy after testing several SVM kernel functions for the two defined classes. In this table, we can clearly note that the ECG and RESP signals are the most relevant for the emotion assessment task, specifically ECG for arousal and RESP for valence.
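The early fusion by concatenation can be illustrated with a short sketch. The per-signal feature values and the function name `fuse_features` are hypothetical; the only point shown is that the feature vectors of the four signals are joined, in a fixed order, into one vector per trial before training.

```python
def fuse_features(per_signal_features, order=("ECG", "RESP", "Temp", "GSR")):
    """Concatenate per-signal feature vectors into one vector, in a fixed order."""
    fused = []
    for name in order:
        fused.extend(per_signal_features[name])
    return fused


# Toy features for one trial (values are illustrative only).
trial_features = {
    "ECG": [0.1, 0.2],
    "RESP": [0.3],
    "Temp": [0.4],
    "GSR": [0.5, 0.6],
}
fused = fuse_features(trial_features)
print(fused)  # prints: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
```

Keeping the order fixed across trials matters, since the classifier treats each position in the fused vector as one feature.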
We achieved 64.23% in arousal and 68.75% in valence, and these accuracies are promising compared to related works. Koelstra et al. [21] obtained 62.7% in valence and 57% in arousal, and Torres Valencia et al. [39] achieved 55% ± 3.9 and 57.50% ± 3.9 in arousal and valence, respectively. Both of these previous studies used the DEAP database. The achieved results suggest the potential of the data recorded in the MAHNOB-HCI database, and that its chosen videos were more effective at evoking emotion than the video clips used in DEAP. This explanation is developed in [34]. Indeed, the authors showed that the heart rate variability calculated from the ECG (not available in DEAP) is a very relevant feature for emotion recognition, and that it is more accurate than the HRV calculated from the PPG signal recorded in the DEAP database.

Table VI presents the accuracies after classifying emotion into three areas of the arousal-valence space using the self-reported scales. Table VII shows the results after defining the three classes based on the nine self-reported affective keywords described previously.
Both of these tables show that human emotion is more noticeable from the respiration and electrocardiogram signals than from the other bodily responses (Temperature or Galvanic Skin Response). In addition, we can clearly see that the Gaussian kernel function gave the best-performing hyperplanes. Moreover, it is easier to recognize emotion after fusing all the peripheral physiological signals, as shown in Tables V, VI and VII.
Table VIII summarizes the obtained results alongside three recent related works, and shows that the obtained accuracies are promising for the three ways of defining emotion in the arousal-valence model.
The achieved accuracies are explained by the fact that we correctly pre-processed the signals to retain the significant information. In addition, we carefully selected features that are more relevant than those chosen in the studies mentioned earlier [20][21][39].

VI. CONCLUSION
This paper presented a complete emotion recognition process based on peripheral physiological signals. For this aim, we used the recent multimodal database MAHNOB-HCI, in which the bodily responses of 24 participants were collected after eliciting their feelings using 20 selected videos. Based on the emotion self-reported by the participants, we proposed three ways to model affective states in the arousal-valence space: two and three main classes using the discrete rating values (on a scale from 1 to 9), and another model using 9 emotional keywords to define three areas in the arousal-valence space. We preprocessed the data to remove noise and artefacts. Then, we extracted selected features. After normalizing them to minimize the difference between participants, an early level feature fusion was applied for further analysis. In the last step, we classified each physiological signal separately, and then the LFF data, using the support vector machine. We tested its different kernel functions to obtain the classification rates. The results showed the relevance of the electrocardiogram and respiration signals in the emotion assessment task. Moreover, the RBF kernel was the most suitable kernel function. The results also showed that detecting affective states is easier after fusing all the bodily responses. The obtained accuracies were promising compared to recent related works.
As future work, we aim to implement additional techniques such as feature selection and reduction mechanisms (ANOVA, PCA and Fisher) to eliminate redundant information and select the most relevant features. Moreover, we would like to implement other classification algorithms that may lead to better results.

Fig. 1. Block diagram of the proposed approach.

Healey and Picard [33] collected one of the first affective physiological datasets at MIT. The collected signals were the electrocardiogram (ECG), galvanic skin response (GSR), electromyogram (EMG) and the respiration pattern. This stress recognition database is publicly available from Physionet. Another dataset is the Database for Emotion Analysis using Physiological signals (DEAP) [21]. It contains the spontaneous bodily responses of 32 participants after inducing their emotional states by watching selected music video clips. This dataset is freely available on the internet for academic research. More recently, Soleymani et al. [20] created the MAHNOB-HCI multimodal database. They recorded the peripheral physiological signals of 24 participants after eliciting their emotions with 20 affective movies. These signals are the Electrocardiogram (ECG), Galvanic Skin Response (GSR), Skin Temperature (Temp) and Respiration Volume (RESP). They also recorded the EEG signal, eye gaze and face videos. Table I summarizes the content of the MAHNOB-HCI database.

TABLE V. CLASSIFICATION ACCURACIES USING SEVERAL SVM KERNELS (TWO CLASSES)

TABLE VI. CLASSIFICATION ACCURACIES USING SEVERAL SVM KERNELS (THREE CLASSES USING SCALING RATES)

TABLE VIII. OBTAINED RESULTS COMPARED TO RELATED WORK FOR THE THREE WAYS OF MODELLING EMOTIONS