Emotion Recognition based on Skew Symmetric Gaussian

This paper presents a methodology for emotion recognition based on a Skew Symmetric Gaussian Mixture Model classifier, with MFCC-SDC cepstral coefficients as the features, for recognizing various emotions in a generated data set of emotional voices belonging to students of both genders at GITAM University. For training and testing the developed methodology, data are collected from students of the Visakhapatnam campus of GITAM University using acting sequences consisting of five different emotions, namely Happy, Sad, Angry, Neutral and Boredom, with each speaker uttering one short emotionally based sentence. For training, we have considered fifty speakers from different regions (30 male and 20 female) and one long sentence containing emotional speech from each speaker. The experimentation is conducted on text-dependent speech emotion recognition, and the results obtained are tabulated by constructing a confusion matrix and compared with an existing methodology, the Gaussian mixture model.


I. INTRODUCTION
In human communication, speech acts as the medium through which a person's voice is conveyed. The main advantage of speech is that we can identify the interacting person by voice. Emotion is an integral part of speech which helps to identify the internal feelings of the speaker; in other words, emotion helps to understand the listener's state of mind. In many practical situations, such as BPO and telephonic communication, it is desirable to understand the emotions of the speaker [1]. Emotion recognition is carried out on acting sequences: each speaker generates speech under different emotional backgrounds, and these speeches are therefore termed acting sequences.
However, while generating different emotions the same step is to be applied. The emotions of speech narrate the prosody in a speech. The prosody of speech depends upon many characteristics, which include the rules of the language, the conditions of living, the place of living, the culture of the community, etc. [2]. Many models have been developed for the recognition of emotions, based on GMM [3,4,5,6], SVM [7,8], HMM [9,10] and Truncated GMM [11,12]. In order to have an effective speaker emotion recognition system, the features that help in recognizing the emotions are to be identified effectively. Many of the models in the literature use MFCC coefficients for the recognition of speaker emotions.
However, MFCC coefficients are useful if the speech is of short duration; for long-term speech signals, Shifted Delta Coefficients (SDC) features are more appropriate, since they identify the dynamic behavior of the speaker along with the prosodic features of the emotions. Moreover, the models discussed in the literature are mainly aimed at identifying the speaker's emotional speech under the assumption that emotional speech signals are symmetric. In reality, however, the speech signal is asymmetric in nature.
Hence, in order to interpret the speech signal effectively, it is advantageous to use asymmetric distributions such as the Skew Gaussian distribution, Log distribution, Gamma distribution, etc. In this paper we have utilized the Skew Gaussian distribution, since it contains the Gaussian distribution as a particular case. The rest of the paper is organized as follows: Section 2 presents the feature extraction methodology; Section 3 deals with the skew Gaussian mixture model; the methodology is presented along with the performance evaluation in Section 4; and Section 5 deals with the results and comparisons.

II. THE FEATURE EXTRACTION
For effective recognition of emotional speech it is important to extract the features effectively. In this paper we have considered MFCC-SDC features for extracting features from the speech database. MFCC is preferred because of its ability to interpret the signal over a short duration, and SDC is used to extract features from long-duration or dynamically changing speech samples. The processes carried out for the extraction of features are:
1. The speech voices are fragmented into small frames and these frames are given to MFCC.
2. Long-term speeches are also considered; the speech samples are segmented with window sizes of 30 ms, 60 ms, etc., and these sequences are given to SDC.
3. Using the combined effect of MFCC-SDC, the features are extracted for different emotions.
4. Classification of the emotion is carried out using the Skew Gaussian mixture model.
5. The developed model is compared to that of GMM.
www.ijacsa.thesai.org
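The combination of per-frame cepstra with shifted delta coefficients can be illustrated with a minimal sketch. It assumes the MFCCs have already been computed by some front end and are available as a frames-by-coefficients array; the function names and the parameter values `d`, `p` and `k` (delta spread, block shift, number of blocks) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def shifted_delta_cepstra(cepstra, d=1, p=3, k=7):
    """Stack k shifted delta blocks for every frame (hypothetical helper).

    cepstra: array of shape (num_frames, num_coeffs), e.g. MFCCs per frame.
    Block i of frame t is c(t + i*p + d) - c(t + i*p - d).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    num_frames = cepstra.shape[0]
    # Repeat the edge frames so that every index t + i*p +/- d is valid.
    padded = np.pad(cepstra, ((d, (k - 1) * p + d), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        start = i * p  # padded index of c(t + i*p - d) for t = 0
        behind = padded[start:start + num_frames]
        ahead = padded[start + 2 * d:start + 2 * d + num_frames]
        blocks.append(ahead - behind)
    return np.hstack(blocks)


def mfcc_sdc(cepstra, d=1, p=3, k=7):
    """Concatenate the static cepstra with their SDC blocks, frame by frame."""
    cepstra = np.asarray(cepstra, dtype=float)
    return np.hstack([cepstra, shifted_delta_cepstra(cepstra, d, p, k)])
```

With 13 cepstral coefficients per frame and `k=7`, each frame would yield a 13 + 91 = 104 dimensional vector under these assumed settings.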

III. SKEW GAUSSIAN DISTRIBUTION
For the recognition of speech samples it is essential to understand the behavioral pattern of the speech signal. In many cases the speech signal is assumed to be symmetric, but the speech signal depends mainly upon the prosody and also upon the pitch, energy and other factors associated with each speaker. Since these features are not always symmetric, asymmetric features are to be considered. Hence, in this paper we consider the speech signal in a way that caters for both symmetric and asymmetric speech. The results obtained are compared with those of an existing model, the Gaussian mixture model, and are presented in the above table.
In this paper a novel methodology for emotion recognition using the Skew Gaussian Mixture Model is developed. These emotions are recorded at 30 ms with five different emotions. The speech database is generated from the acting sequence of one short emotionally based sentence comprising five different emotions from 50 students (speakers) from different dialects of Andhra Pradesh. The features are extracted and, for recognition, the test speaker's emotion is considered and classified using the Skew Gaussian Mixture Model. The results obtained are presented in the confusion matrices for both genders in Table-1 and Table-2, and in Bar Graphs 1 and 2. From the above tables and bar graphs, it can be seen that the recognition rate is 90% in the case of certain emotions and for the other emotion. The output is compared with that of the existing model based on GMM and from the

Fig-1 Process of Emotion Recognition Model
The speech signals in the database, containing different emotions such as Happy, Sad, Angry, Neutral and Boredom, are extracted and trained using the Skew Gaussian mixture model. The feature extraction process involves generating emotion samples in .WAV form and extracting the amplitude values for each of these emotion signals; these values are given as input to the Skew Gaussian distribution, and the probability density function values of the Skew Gaussian distribution are generated. For testing, a test signal is considered, its amplitude values are generated, and these feature values are given as input for classification of the emotion.

The probability density function of the Skew Gaussian distribution is given by

$$f(z) = 2\,\phi(z)\,\Phi(\alpha z), \qquad -\infty < z < \infty \tag{1}$$

where $\phi(\cdot)$ and $\Phi(\cdot)$ are the probability density function and the cumulative distribution function of the standard normal distribution, and $\alpha$ is the skewness parameter.

For the model, a database is generated with 50 students (30 male and 20 female) of GITAM University, containing different dialects of Andhra Pradesh. The voices are recorded in acting sequences with the emotions Happy, Sad, Angry, Neutral and Boredom. The process of emotion classification is as follows:
A. Phase-1: Extract the MFCC-SDC coefficients.
B. Phase-2: Train the data by the probability density function of the Skew Gaussian distribution model.
C. Phase-3: Consider a test signal from the voice database, apply Step-1 and Step-2, and classify the emotion.
D. Phase-4: The output generated is depicted in the form of a confusion matrix, shown in Table-1 and Table-2 below.
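The density and the classification phase can be sketched as follows. This is a minimal illustration, not the authors' implementation: the location and scale parameters `mu` and `sigma`, the mixture weights, and the rule of choosing the emotion whose mixture gives the highest log-likelihood are assumptions added here, and the parameter estimation step (for example by EM) is not shown.

```python
import math


def skew_gaussian_pdf(x, mu=0.0, sigma=1.0, alpha=0.0):
    """Skew Gaussian density 2*phi(z)*Phi(alpha*z)/sigma with z = (x - mu)/sigma.

    mu and sigma are an assumed location-scale extension of the standard form;
    alpha = 0 reduces the density to the ordinary Gaussian.
    """
    z = (x - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))
    return 2.0 * phi * big_phi / sigma


def mixture_log_likelihood(samples, components):
    """Log-likelihood of scalar samples under a skew Gaussian mixture.

    components: list of (weight, mu, sigma, alpha) tuples (hypothetical layout).
    """
    total = 0.0
    for x in samples:
        density = sum(w * skew_gaussian_pdf(x, mu, sigma, alpha)
                      for w, mu, sigma, alpha in components)
        total += math.log(max(density, 1e-300))  # guard against log(0)
    return total


def classify(samples, models):
    """Return the emotion whose trained mixture best explains the samples."""
    return max(models, key=lambda name: mixture_log_likelihood(samples, models[name]))
```

In use, `models` would map each of the five emotion labels to the mixture fitted on that emotion's training features, and `classify` would be applied to the features of a test utterance.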

TABLE 1. COMPARISON OF CONFUSION MATRIX FOR IDENTIFYING DIFFERENT EMOTIONS OF MALE SPEAKERS
Figure 1. BARCHART-1, REPRESENTING THE RECOGNITION RATES FROM MALE DATABASE

TABLE 2. CONFUSION MATRIX FOR IDENTIFYING DIFFERENT EMOTIONS OF FEMALE SPEAKERS
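For reference, a confusion matrix and the per-emotion recognition rates of the kind reported in these tables can be computed as in the sketch below. The function names are illustrative assumptions, and no values from the paper's tables are reproduced here.

```python
EMOTIONS = ["Happy", "Sad", "Angry", "Neutral", "Boredom"]


def confusion_matrix(true_labels, predicted_labels, classes=EMOTIONS):
    """Rows are the acted (true) emotions, columns the recognized emotions."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for actual, predicted in zip(true_labels, predicted_labels):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def recognition_rates(matrix):
    """Percentage of correctly recognized utterances for each true emotion."""
    rates = []
    for i, row in enumerate(matrix):
        total = sum(row)
        rates.append(100.0 * row[i] / total if total else 0.0)
    return rates


def overall_rate(matrix):
    """Percentage of all test utterances on the diagonal of the matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total if total else 0.0
```

Computing one matrix per gender, as the paper does, only requires calling `confusion_matrix` separately on the male and female test utterances.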

Table-1 and Table-2, it can be clearly seen that our method outperforms the existing model. The overall emotion recognition rate is above 85%.