Emotional Changes Detection for Dementia People with Spectrograms from Physiological Signals

Due to the aging of society, the percentage of people with serious cognitive decline and dementia has recently been increasing around the world. Such patients often lose their diversity of facial expressions and even their ability to speak, rendering them unable to express their feelings to their caregivers. However, emotions and feelings are strongly correlated with physiological signals, which are detectable with EEG, ECG, etc. Therefore, this research develops an emotion prediction system for people with dementia using bio-signals to support their interaction with their caregivers. In this paper, we build on a previous study on binary classification of emotional changes using spectrograms of EEG and RRI with a CNN, and verify the effectiveness of the method. First, EEG and ECG data were collected while the participants watched stimulating videos. Then, STFT was applied to the raw signals to extract time-frequency domain features and obtain the spectrograms. Finally, deep learning was used to detect the emotional changes. A CNN was used for arousal classification, reaching an accuracy of 90.00% with EEG spectrograms, 91.67% with RRI spectrograms, and 93.33% with EEG and RRI spectrograms combined.
Keywords: Emotion classification; people with dementia; EEG and ECG; spectrograms; CNN


A. Background
According to the World Population Prospects 2017 report, the number of people aged 60 or over is expected to rise from 962 million globally in 2017 to 2.1 billion in 2050 and 3.1 billion in 2100 [1]. Furthermore, rapidly aging populations worldwide result in an increase in people living with age-related illnesses like dementia, causing considerable concern for future health and social care provision [2]. It is estimated that the proportion of the general population aged 60 and over with dementia is between 5 and 8 percent, meaning that 50 million people were living with dementia in 2017, while the total number of people with dementia is predicted to approach 75 million in 2030 and almost triple by 2050 to 132 million [3]. Unfortunately, no treatment is currently available to cure dementia or to alter its progressive course. Therefore, early diagnosis and the promotion of optimal health management, along with the care provided by family and friends, are of the utmost importance in caring for people with dementia.
As the illness gradually develops, cognitive impairment worsens and the ability to communicate decreases [4], resulting in increasing physical, emotional and financial stress for caregivers. As cognitive decline progresses into its later stages, it can become difficult for patients to express their feelings appropriately, making communication with caregivers increasingly difficult. Therefore, in order to support care for such dementia patients in daily life, the need for emotion prediction systems has been put forward in the field. Generally speaking, this is the need to detect the emotions and feelings of dementia patients to support caregivers by helping them to understand what the patients are experiencing.

B. Related Works
In general, the main method used for detecting the emotional state of a subject is through their emotional expressions [5], such as speech, facial expression, gesture, and/or physiological signals. There is already a body of work focusing on emotional expressions: speech signals, facial expressions and gestures [6][7][8]. However, physiological signals also reflect internal emotional information, and have received increasing attention recently. They comprise signals originating from the central nervous system (CNS), namely the electroencephalogram (EEG), and from the autonomic nervous system (ANS), namely the electrocardiogram (ECG), galvanic skin response (GSR), electromyogram (EMG), etc. Recent studies have shown that the ANS, which includes the sympathetic nervous system and the parasympathetic nervous system, is viewed as a vital component of emotional response [9]. Among the most common methods for measuring physiological signals for emotion recognition are EEG and ECG, which have been shown by cognitive theory and psychological experiments to indicate possible underlying emotional states [10]. EEG analysis mainly considers four frequency bands: delta waves (δ: 0.5-4 Hz), usually appearing during deep sleep; theta waves (θ: 4-8 Hz), present during drowsiness; alpha waves (α: 8-13 Hz), associated with relaxation and super-learning; and beta waves (β: 13-30 Hz), related to active thinking and attention [11]. The partial correlations of cortical functions with EEG channels are shown in Fig. 1 [12]. ECG, on the other hand, records the electrical activity of the heart. Based on feature extraction from ECG signals, heart rate variability (HRV) is determined from the heart rate (HR) or the RR interval, calculated from R peaks. A high HRV can indicate a relaxed state, whereas a low one represents a potential state of stress or depression. Signal power [13] in the low-frequency (LF: 0.04-0.15 Hz) and high-frequency (HF: 0.15-0.4 Hz) bands also appears to have a close relationship with emotions.
While speech and facial appearance might be disguised by a person intentionally or consciously [14], physiological signs are natural reactions of the body, and as such are very difficult to mask. Since it is difficult to detect the emotions of a dementia patient by speech or facial expression, these signals may be a more reliable way to access that information. This study concentrates on emotion classification based on EEG and ECG, one of which represents the CNS and the other the ANS [15].

C. Convolutional Neural Network
Over the years, there has been a lot of work in the field of affective computing on recognizing emotions from physiological signals with different classification algorithms. The team led by Prof. Picard [16] first carried out research on emotion detection from ECG, GSR and EMG signals. They extracted time domain features, such as the mean value, the variance, and the mean of the first-order difference. Sequential Floating Forward Selection (SFFS) and Fisher Projection (FP) were chosen to select features for classification into 8 classes. The best accuracy reached 82.5%. Murugappan et al. [17] applied KNN and linear discriminant analysis to get an accuracy of 77%-88% using HRV features extracted from ECG. Mohammadi et al. [18] utilized frequency domain features of ECG, and classified them with KNN and SVM. Nasehi et al. [19] obtained an accuracy of 64.78% from EEG with a neural network classifier.
From the above research and results, it can be concluded that it is feasible to detect emotional state from physiological signals. However, most of these studies focused on time domain or frequency domain features; few used spectrograms created from time-frequency domain features and treated the task as image classification with a CNN. Also, there were not many studies combining features drawn from both EEG and ECG data for emotion classification. This paper takes another perspective on feature extraction by obtaining spectrogram images from EEG and ECG. We intended to classify different emotions conveniently with a CNN while still achieving good accuracy.
Currently, the application of deep learning in the field of image classification is increasingly popular, especially CNNs (convolutional neural networks), which have been proven extremely effective in computer vision and are perhaps the most widely used approach in this class of machine learning algorithms [20]. Their specific characteristic is that they capture features automatically, that is, they integrate feature extraction and classification into a single algorithm with only the "raw" data (like pixels) as inputs [21]. More recently, applying CNNs to bio-signal related problems, such as EEG and ECG, has attracted growing attention from psychologists and researchers [22]. In this paper, we adopt CNNs to process spectrograms (2-D time-frequency data) with the aim of classifying emotions, in order to address the above stated needs of dementia patients. With this method, we avoid the need to select frequency bands or channels in the process of feature selection.

A. Emotional Data Acquisition
There are many different ways to elicit the target emotions, such as pictures, film clips, music, emotional behaviors, imagining different scenarios and telling stories [23][24][25][26]. Film clips have been found to elicit target emotions better than other methods. In this experiment, six different short video clips [27][24][28] were chosen in advance according to other articles and YouTube rankings to stimulate different emotions (scared, excited, quiet, bored, sad, happy). Each video clip lasted around 2.5-4 minutes and was sandwiched between 10-second periods of "quiet time", which provided a short gap for a smoother transition between emotional states. The entire process lasted for about 20 minutes.

B. Participants and Experiment Protocol
Eleven healthy young participants, 5 males and 6 females, aged between 24 and 30 years, took part in the experiment. Prior to the experiment, each subject was asked to sign a consent form and fill out a questionnaire about their recent mental state. Next, a presentation was given to them about the experiment protocol and the scores used for evaluation. An experimenter was available to answer any questions. After the participants understood the contents of the experiment, they were asked to sit down and prepare for the experiment. Then the electrodes were placed at the correct positions on the skin of the subject. The signals were checked for stability, after which there was a practice trial for the participants to become familiar with the whole system. The subjects were requested to relax, minimize movement and concentrate on the videos. After that, they were given a few minutes to calm down, and their physiological signals were then reset during a 1 min quiet state, followed by the videos for a total of 20 minutes. The whole emotion-inducing process is shown in Fig. 2.

C. Materials and Setup
The experiments were conducted in a laboratory environment with suitable conditions (T = 25℃, RH = 50%). EEG and ECG signals were collected on a recording PC (Dell Precision M3800). The stimulating videos were played on a separate PC (HP Touch Smart). The subjects were seated approximately 70 cm from the screen. The speaker volume was set at a relatively loud level; however, each subject was asked before the experiment whether the volume was comfortable, and it was adjusted when necessary.

A. EEG Spectrogram
For EEG feature extraction, the spectrogram analysis of EEG was performed in the time-frequency domain, using the short-time Fourier transform (STFT) with a Hamming window. The window size for spectrogram analysis was 4 s with a 3.9 s overlap, i.e., the window slid in steps of 0.1 s. We recorded the amplitude of each frequency band, consisting of the delta, theta, alpha, and beta wave frequencies. The delta band (0-4 Hz) was removed to eliminate the remaining noise from pulses, neck movement, and eye blinking [30]. Our research focused on the alpha (8-14 Hz) and beta (14-30 Hz) frequencies. Alpha waves are typical of a relaxed mental state and are most visible in normal daily life. Beta waves are related to an activated state of mind and appear during intensive mental activity [31]. The analysis method is shown in Fig. 4; it was implemented with the "signal processing" API [32] of SciPy, a Python-based ecosystem of open-source software for mathematics, science, and engineering.
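The windowing described above can be sketched with scipy.signal.spectrogram. This is only an illustrative sketch, not the exact code used in the study: the sampling rate of 500 Hz, the function name eeg_spectrogram, the choice to keep the whole 4-30 Hz range after dropping the delta band, and the synthetic test signal are all assumptions.

```python
import numpy as np
from scipy import signal


def eeg_spectrogram(eeg, fs, win_sec=4.0, step_sec=0.1, f_lo=4.0, f_hi=30.0):
    """Amplitude spectrogram of one EEG channel (STFT, Hamming window)."""
    nperseg = int(round(win_sec * fs))   # 4 s analysis window
    step = int(round(step_sec * fs))     # window slides by 0.1 s
    noverlap = nperseg - step            # equivalent to a 3.9 s overlap
    f, t, sxx = signal.spectrogram(
        eeg, fs=fs, window="hamming",
        nperseg=nperseg, noverlap=noverlap, mode="magnitude",
    )
    keep = (f >= f_lo) & (f <= f_hi)     # drop delta, keep theta to beta (assumed range)
    return f[keep], t, sxx[keep, :]


# Synthetic stand-in for an EEG channel: a strong 10 Hz (alpha) component
# plus a weaker 20 Hz (beta) component. The 500 Hz rate is an assumption.
fs = 500
time = np.arange(0, 20, 1 / fs)
eeg = np.sin(2 * np.pi * 10 * time) + 0.5 * np.sin(2 * np.pi * 20 * time)

f, t, sxx = eeg_spectrogram(eeg, fs)
peak_hz = f[np.argmax(sxx.mean(axis=1))]  # dominant frequency over the recording
```

Each column of sxx then corresponds to one 0.1 s step, so a 15 s piece of the spectrogram spans 150 columns under these assumptions.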

B. ECG Analysis
The whole process of ECG analysis is shown in Fig. 5. To extract features from ECG, the peaks of the R-waves on the ECG plot were detected, as in Fig. 5(a). After R-peak detection, the R-R intervals (RRI) were calculated by subtracting the time of the previous peak from the time of each peak, forming a waveform that shows the temporal variation of the peak timings, as in Fig. 5(b). Because of this process, the RRI are sampled sparsely and unevenly. To apply a time-frequency domain analysis, the intervals need to be resampled at 4 Hz to obtain an evenly sampled series, as in Fig. 5(c). In the analysis, we used a window size of 60 s and a step size of 1 s to get the RRI spectrogram. The LF (0.04-0.15 Hz) and HF (0.15-0.4 Hz) amplitudes were obtained from the spectrogram, with the frequency axis ranging from 0 to 0.5 Hz, as in Fig. 5(d). To get the RRI, the package "BioSPPy" [33], a toolbox for bio-signal processing written in Python, was adopted. The RRI spectrograms were then obtained with the "signal processing" API [32] of SciPy.
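The RRI pipeline above (intervals from R-peak times, resampling at 4 Hz, a 60 s window with a 1 s step) can be sketched as follows. This is a hedged illustration: it starts from R-peak times that are assumed to be already detected (in the study this step used BioSPPy), and the linear interpolation, the Hamming window, the function names, and the synthetic beat series are assumptions not stated in the text.

```python
import numpy as np
from scipy import signal


def rri_spectrogram(r_peak_times, fs_resample=4.0, win_sec=60.0,
                    step_sec=1.0, f_max=0.5):
    """Spectrogram of the R-R interval series, limited to 0..f_max Hz."""
    r_peak_times = np.asarray(r_peak_times, dtype=float)
    rri = np.diff(r_peak_times)       # peak time minus previous peak time
    rri_times = r_peak_times[1:]      # each interval is stamped at the later peak
    # Resample the unevenly spaced intervals onto an even 4 Hz grid.
    grid = np.arange(rri_times[0], rri_times[-1], 1.0 / fs_resample)
    rri_even = np.interp(grid, rri_times, rri)  # linear interpolation (assumption)
    nperseg = int(round(win_sec * fs_resample))  # 60 s window
    step = int(round(step_sec * fs_resample))    # 1 s step
    f, t, sxx = signal.spectrogram(
        rri_even, fs=fs_resample, window="hamming",  # window type is an assumption
        nperseg=nperseg, noverlap=nperseg - step, mode="magnitude",
    )
    keep = f <= f_max
    return f[keep], t, sxx[keep, :]


def band_amplitude(f, sxx, lo, hi):
    """Mean amplitude per time step inside the band [lo, hi)."""
    band = (f >= lo) & (f < hi)
    return sxx[band, :].mean(axis=0)


# Synthetic R-peak times: intervals of about 0.85 s, modulated at 0.25 Hz,
# which lies in the HF band. These values are purely illustrative.
peaks = [0.0]
while peaks[-1] < 180.0:
    peaks.append(peaks[-1] + 0.85 + 0.05 * np.sin(2 * np.pi * 0.25 * peaks[-1]))

f, t, sxx = rri_spectrogram(peaks)
lf = band_amplitude(f, sxx, 0.04, 0.15)
hf = band_amplitude(f, sxx, 0.15, 0.40)
```

For this synthetic series the HF amplitude dominates the LF amplitude, because the modulation was placed at 0.25 Hz by construction.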

C. Prepare Dataset
As the measured data of one subject was incomplete due to her excessive reaction to the scary scenes, her data was excluded. The other 10 participants, 5 males and 5 females, were taken into account for the detection of emotional changes. Considering that the emotion of a participant may change during one video, and that each piece must still contain adequate information for the RRI spectrograms, the spectrogram images were cut into 15 s pieces. The subjects were asked to evaluate the videos every 15 s. Then, based on their evaluations, each 15 s video piece was labeled with its score. However, the evaluations of the fifth video, intended to elicit boredom, showed considerable variation between subjects, and so the data was discarded. The video showed somebody playing games by himself because nobody would celebrate his birthday with him. It seemed that his words and accent were difficult to understand, so the subjects did not know what was happening.
The problem considered was binary classification of arousal for changes in emotion, where a given set of signals is classified according to the subject's rating of how strongly the emotion was induced by each video. The subjective rating uses a 9-point scale, so we defined scores from 1 to 5 as the low arousal class, labeled as 0, and those from 6 to 9 as the high arousal class [33], labeled as 1, as illustrated in Table I. Representative EEG and RRI spectrograms of the two states, the high arousal emotion and the low arousal emotion, are shown in Fig. 6. For the low arousal emotion, as in Fig. 6(a), there is no sign of beta waves, while for the high arousal emotion, as in Fig. 6(b), beta waves are a prominent component of the signal. When the subject feels relaxed and experiences no big emotional changes, HF amplitudes are prominent, as in Fig. 6(c). On the other hand, when emotional arousal is higher, the pressured moment may persist, and the LF amplitudes become more predominant, as shown in Fig. 6(d).
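The dataset preparation described above, cutting spectrograms into 15 s pieces and binarizing the 9-point ratings, can be sketched as below. The helper names, the random placeholder spectrogram, and the example ratings are hypothetical; dropping an incomplete final piece is also an assumption.

```python
import numpy as np


def cut_into_pieces(sxx, t, piece_sec=15.0):
    """Split a spectrogram (frequency x time) into consecutive 15 s pieces."""
    step = t[1] - t[0]                    # time step between columns (s)
    cols = int(round(piece_sec / step))   # columns per piece
    n_pieces = sxx.shape[1] // cols       # an incomplete tail is dropped (assumption)
    return [sxx[:, i * cols:(i + 1) * cols] for i in range(n_pieces)]


def arousal_label(score):
    """Map a rating on the 9-point scale to 0 (low, 1-5) or 1 (high, 6-9)."""
    if not 1 <= score <= 9:
        raise ValueError("rating must lie on the 9-point scale")
    return int(score >= 6)


# Placeholder spectrogram: 60 s of data at 0.1 s steps with 105 frequency bins.
t = np.arange(600) * 0.1
sxx = np.random.default_rng(0).random((105, t.size))

pieces = cut_into_pieces(sxx, t)
ratings = [3, 5, 6, 9]                    # hypothetical ratings, one per piece
labels = [arousal_label(s) for s in ratings]
```

With these placeholder values, the 60 s spectrogram yields four pieces of 150 columns each, and the boundary ratings 5 and 6 fall into the low and high arousal classes respectively.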

D. Results
A CNN classifier was constructed to detect emotional changes. The network adopted in this study was implemented in "Keras" (Python), and was mainly composed of convolution and pooling layers. The networks for EEG and RRI spectrograms were adjusted according to their different features.
First of all, we tried to apply VGGNet-16 [34], an extensive architecture built for the ImageNet image classification challenge. EEG and RRI spectrograms were processed separately with 300 epochs each; this took about 34 hours, at 405 s per epoch. The accuracy for EEG was 93.33%, while that for RRI was 91.67%, which verified the stability and reliability of the dataset.
Then we designed simpler CNN structures for EEG and RRI, which required much less time, at just 40 s per epoch. CNN1, used for EEG, had three convolutional layers and three max-pooling layers. A fully connected layer with 256 hidden nodes was added after the last max-pooling layer to obtain the output, defined as 0 for the low arousal class and 1 for the high arousal class. The size of all the filters in the convolutional layers was set to 3×3, because the EEG spectrograms have a fine frequency resolution and small-scale features need to be considered. We used a dropout probability of 0.3 after the fully connected layer to avoid overfitting.
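A minimal Keras sketch of a network with the CNN1 layout described above is given here for illustration. Only the three 3×3 convolution and max-pooling stages, the 256-node fully connected layer, and the dropout of 0.3 come from the text; the ReLU activation and the SGD learning rate of 0.0001 are the settings reported for both networks later in this section. The input size, the number of filters per layer, the 2×2 pooling size, the "same" padding, the ReLU on the dense layer, and the single sigmoid output with binary cross-entropy are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_cnn1(input_shape=(128, 128, 3), filters=(32, 64, 128)):
    """Three conv + max-pool stages, then a 256-node dense layer with dropout."""
    model = keras.Sequential(name="cnn1_eeg")
    model.add(keras.Input(shape=input_shape))   # image size is an assumption
    for n in filters:                           # filter counts are assumptions
        model.add(layers.Conv2D(n, (3, 3), activation="relu", padding="same"))
        model.add(layers.MaxPooling2D((2, 2)))  # pooling size is an assumption
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dropout(0.3))              # dropout after the dense layer
    # One sigmoid unit: output near 0 for low arousal, near 1 for high arousal.
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(
        optimizer=keras.optimizers.SGD(learning_rate=1e-4),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_cnn1()
n_conv = sum(isinstance(layer, layers.Conv2D) for layer in model.layers)
n_pool = sum(isinstance(layer, layers.MaxPooling2D) for layer in model.layers)
```

Training would then call model.fit on arrays of spectrogram images and their 0/1 labels for the stated 300 epochs; batch size and train/test split are not given in the text and would have to be chosen.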
Next, CNN1 was applied to the RRI spectrograms; however, this seemed to produce greater overfitting. We believe this is because the scales and characteristics of their features are different. Therefore, CNN2 was set up for RRI by adjusting some parameters of CNN1. CNN2 had two convolutional layers and two max-pooling layers, followed by fully connected layers. The size of all the filters in the convolutional layers was set to 5×5 to bring out the larger features of the image. Dropout was used after the two max-pooling layers and the fully connected layer. Dropout regularization has been shown to be an effective way of reducing overfitting in deep learning [35]. The ReLU activation function was used for the convolutional layers of both networks. Also, the optimizer used for both CNNs was SGD (stochastic gradient descent), with a small learning rate of 0.0001. After 300 epochs, the accuracy of CNN1 with EEG spectrograms was 90.00%, and that of CNN2 with RRI spectrograms was 91.67%. The accuracy curves for the two networks are shown in Fig. 7 and Fig. 8.
After that, both EEG and RRI were considered together as features for the binary classification. The merged network structure, built with the "concatenate" function from "Merge layers" in Keras [36], is shown in Table II. It takes the spectrograms of EEG and RRI as inputs for CNN1 and CNN2 respectively, then merges the two outputs to get the final result. The overall accuracy reached 93.33% after 300 epochs, as illustrated in Fig. 9. A complete comparison of accuracies between EEG and RRI (Table III) demonstrated that combining features of both EEG and RRI for classification of emotional changes performed better than using only EEG or RRI features separately.

IV. CONCLUSIONS
In this paper, we strove to build upon prior research in the field of emotion detection for people with dementia and explored new techniques for increasing the effectiveness of deep learning models for EEG and ECG signals. It was shown that, with EEG and RRI spectrograms from young people, binary classification of emotional changes could be performed well by a CNN. This allows for the convenience of automatic feature extraction. As a result, we are hopeful that the emotion detection system proposed in this study could be implemented in practice in caring for the elderly. However, there are still some limitations. Our results show only binary classification of emotional changes; for real applications, recognition of discrete emotions (e.g., sad, scared, angry, excited) will need to be achieved in the future. Besides that, we found some factors that influence accuracy, including the placement of electrodes; the number, gender, and educational and cultural backgrounds of subjects; the number of emotions considered; and the type and effect of emotion induction [37]. Therefore, in further studies we would like to invite more participants of a similar cultural background to those already involved in our experiment, and then try several different stimuli for triggering target emotions to obtain more sample data, with the aim of working towards recognition of discrete emotions. Furthermore, we should try to perform the experiment with elderly participants in the future. Taking the accessibility and convenience of signal measurement for the elderly into account, attention will be placed on designing garment-type sensors for physiological data collection instead of using active electrodes. We intend to incorporate silver fiber in the fabric of garments to make a chest belt and a head belt for more convenient and flexible data measurement.

Fig. 3. The Whole Experiment Setup. (a) the Monitor Instrument Set: Polymate II AP216 and Dell Precision M3800; (b) the Experiment of One Subject; (c) the Electrode Placement; (d) the Signals Collecting Page.

III. DATA ANALYSIS
The raw ECG and EEG data measured by the Polymate II AP216 had the noise due to muscle tone, blinking, and other small movements removed by the display software AP Viewer [29]. The software can redisplay the waveform of data recorded with the instrument, and export specified sections of the file during a given period into CSV format.

Fig. 6. The Representative EEG and RRI Spectrograms of the Two States. (a) the EEG Spectrograms of the Low Arousal Class; (b) the EEG Spectrograms of the High Arousal Class; (c) the RRI Spectrograms of the Low Arousal Class; (d) the RRI Spectrograms of the High Arousal Class.

TABLE II. THE MERGED STRUCTURE OF CNN1 AND CNN2

TABLE III. COMPARISON OF CLASSIFICATION ACCURACIES BETWEEN EEG, RRI, EEG AND RRI