Facial Expression Recognition using Hybrid Texture Features based Ensemble Classifier

Communication is fundamental to humans. Many scientific studies in the literature have shown that 54 to 94 percent of human communication is non-verbal. Facial expressions are the most important part of non-verbal communication and the most promising way for people to communicate their feelings, emotions, and intentions. Pervasive computing and ambient intelligence require human-centered systems that actively react to complex human communication as it happens naturally, and a Facial Expression Recognition (FER) system is needed for this purpose. In this paper, an FER system is proposed that uses hybrid texture features to predict human expressions. Existing FER systems show discrepancies across different cultures and ethnicities. The proposed system addresses this problem by using hybrid texture features that are invariant to scale as well as rotation. Gabor LBP (GLBP) texture features are used to classify expressions with a Random Forest classifier. Experiments on different facial databases demonstrate promising results.

Keywords: Expression classification; ensemble; AdaBoost; facial; features


I. INTRODUCTION
Communication is fundamental to humans. Many scientific studies have shown that most human communication (55% to 93%) is non-verbal [1], [2]. Along with head movement, facial expressions are the main part of non-verbal communication [3]. Facial expressions are one of the most compelling and natural means for people to communicate their feelings and emotions and to clarify and stress their understanding, disagreements, and intentions. Next-generation computing, such as pervasive computing and ambient intelligence, needs human-centered systems that actively respond to complex human communication as it happens naturally [4]. Traditional HCI ignores the bulk of the information communicated non-verbally and caters only for the user's intentional input. FER plays a key role in the development of interfaces that can understand the emotional expressions and emotional responses of the user [5], [6]. Application areas of FER include affective computing (autism syndrome diagnosis, analysis of depression, pain detection, stress recognition, drowsiness detection, etc.), smart media (smart home, smart meeting), video conferencing and visual surveillance, lie detection, psychiatry, emotion and paralinguistic communication, robotics, computer games (e.g. Microsoft Kinect), medicine, expression-driven facial animation (e.g. facial movements in the film Avatar, cartoons for children), security, HCI, and facial image fusion for gender conversion and for different age groups [7]-[9]. There is thus strong interest from both academia and industry, which motivates researchers to pursue the goal of developing robust and efficient FER systems.
Developing a reliable and robust FER system is still a difficult and challenging problem [10], [11]. FER is innately data driven. The challenge of generalization and robustness across ethnicities and cultural variations can only be reasonably addressed if a representative database of such variations is available, but no database has yet been constructed with this problem in view. Many traditional expression recognition systems, trained on a benchmark dataset and tested on the same benchmark, offer very good performance. In practical environments, however, such systems show a substantial decline in performance because the training datasets are inappropriate or insufficient for the facial expression patterns that can be expected in the future [12]. An expression can be displayed through many combinations of facial expression patterns. Each image in a facial expression dataset is labelled with a single common expression, e.g. disgust, yet a variety of facial expression patterns is easily observable for that expression. It is therefore too difficult to generate all such possible patterns and use them as training data, even when an expression recognition system is built on a large number of facial images. It is nearly impossible to consider all possible future variations in facial expression patterns ahead of time. Thus, based on existing datasets, the highest recognition performance can never be expected in practical environments.
Some efforts have been made to construct naturalistic databases, but they still have many limitations along with public accessibility issues [13]. Researchers in the FER community agree that gathering spontaneous data in real-life environments is a very tedious task. Consequently, database construction requires new methods that can expedite the creation process and represent actual real-world conditions. Furthermore, robust and reliable algorithms are also required. For the situation in which such databases are not fully available, a solution based on incremental learning with dynamically weighted majority voting is proposed. Head movements, pose variations, and illumination variations are frequently encountered when dealing with spontaneous expressions. This problem is addressed by using a feature set that is robust against in-plane image rotations and illumination variations.

II. PROPOSED METHOD
The proposed method is divided into three phases. The first phase extracts the region of interest (ROI), which in this case is the face portion used for facial expression recognition. Hair, head, neck, and other body parts play no part in facial expression recognition, so face extraction is the most important step. The second phase selects the features that best represent the facial expression; in this phase, texture and geometrical features are extracted for the expression decision. In the third phase, an intelligent classifier decides the expression type using the features extracted in the second phase. Fig. 1 shows the proposed method.

A. Preprocessing to Extract the Face Part
The first step of the proposed method is to extract the face from the whole image. During photography or image acquisition it is uncommon to capture only the face; usually the whole body, or the head together with the neck, is also captured. Facial expressions appear only on the face, so it is important to extract the face part from the image. The proposed method uses a standard method from the literature, the Viola-Jones method, to extract the face part. The proposed system requires no preprocessing step other than this face extraction. The result of the Viola-Jones method on an image taken from the MUG database is shown in Fig. 2.
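The Viola-Jones detector evaluates simple rectangular (Haar-like) features very quickly by means of an integral image, and combines many of them in a boosted cascade. The sketch below illustrates only that building block, the integral image and one two-rectangle feature, on a made-up 4 x 4 image; it is not the full detector, and the function names are illustrative. In practice a pretrained detector (for example one of the Haar cascades shipped with OpenCV) would normally be used.

```python
def integral_image(img):
    """Summed-area table with an extra leading row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Two-rectangle Haar-like feature: lower half minus upper half."""
    half = height // 2
    upper = rect_sum(ii, top, left, half, width)
    lower = rect_sum(ii, top + half, left, half, width)
    return lower - upper


# Toy image (not real data): a dark band above a bright band.
toy = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]
ii = integral_image(toy)
print(rect_sum(ii, 0, 0, 4, 4))          # 1680, the total intensity
print(two_rect_feature(ii, 0, 0, 4, 4))  # 1520, a strong dark-over-bright response
```

Because every rectangle sum costs four lookups regardless of its size, thousands of such features can be evaluated per window, which is what makes the cascade fast enough for face extraction.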

B. Feature Extraction
A feature represents a characteristic of an object that distinguishes it from other objects. A facial image also contains characteristics on the basis of which it can be distinguished and its expression identified. Effective features are therefore required that work for all types of images and for people of all ethnicities and cultures, so that there are no discrepancies across cultures and ethnicities. Frontal faces are often not available; some faces are posed to the left or right. Similarly, pictures are not necessarily taken from the same distance from the camera: some are taken from long distances and some from very short distances, so scale is also a factor to consider during feature extraction. Features are therefore required that are rotation invariant as well as scale invariant. Texture features are a special type of feature that can handle such variations. Hybrid texture features have therefore been used, extracted by combining Gabor-based features and local binary pattern (LBP) features.

C. LBP Features using Gabor Filter
A Gabor filter can be used to extract texture information. Texture shows a specific pattern, and facial images have specific patterns that represent specific expressions. For example, laughing has a specific pattern that remains the same across faces; similarly, sadness and smiling most of the time have their own specific patterns. Texture is therefore well suited to feature extraction for facial expressions. The characteristics of a texture can be represented by its spatial frequencies and by its orientations. Different types of Gabor filter can be applied to images to extract texture features [14]-[18], but for facial expressions the 2-D Gabor filter is the most suitable because the images are two-dimensional. A Gabor filter is a Gaussian kernel function modulated by a sinusoidal wave of a particular frequency and orientation [19]-[21]. In the equations of the 2-D Gabor filter, x and y are the spatial variables, σx and σy are the scaling parameters of the filter, and W is the central frequency of the complex wave.
A Gabor filter bank is a combination of Gabor filters applied at different scales, frequencies, and orientations, and different filter banks can be generated by varying these parameters. In this paper, the Gabor filter bank has been created using two frequencies, two scales, and two orientations. The following values have been used to generate the filter bank. The resulting eight filters are convolved with the original image, which returns eight new convolved images.
Applying the Gabor filter bank yields the magnitude values of the Gabor transform. These magnitudes change very slowly with displacement, so a process is required to encode them, and LBP can be used for this purpose. The basic advantage of LBP encoding over the raw magnitude images is that it enhances the information in the Gabor-filtered images. Applying local binary patterns to the Gabor responses is a representation approach based on a multi-resolution spatial histogram that combines the local intensity distribution with spatial information; it is therefore robust to noise and local image transformations. Additionally, instead of directly using the intensity to compute the spatial histogram, multi-scale and multi-orientation Gabor filters are used to decompose the image, followed by the LBP operator. The LBP operator is applied to each pixel of the image to obtain an LBP-coded image, which is then represented as a histogram of the codes. The combination of Gabor and LBP greatly improves the representational power of the spatial histogram (Fig. 3).
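The Gabor-then-LBP pipeline described above can be sketched in a few functions. The sketch assumes a commonly used form of the 2-D Gabor function that is consistent with the symbols named in this section, g(x, y) = 1 / (2π σx σy) · exp[ -(1/2)(x²/σx² + y²/σy²) + 2πjWx ], with the coordinates rotated by an orientation angle theta. The frequencies, scales, orientations, and kernel size below are hypothetical placeholders rather than the values used in the paper, and a single global histogram per filter stands in for the region-wise spatial histogram.

```python
import cmath
import math


def gabor_kernel(sigma_x, sigma_y, W, theta, half_size):
    """Complex 2-D Gabor kernel sampled on a (2*half_size + 1)^2 grid."""
    norm = 1.0 / (2.0 * math.pi * sigma_x * sigma_y)
    kernel = []
    for y in range(-half_size, half_size + 1):
        row = []
        for x in range(-half_size, half_size + 1):
            # Rotate coordinates so the carrier wave runs along angle theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-0.5 * (xr ** 2 / sigma_x ** 2 + yr ** 2 / sigma_y ** 2))
            row.append(norm * envelope * cmath.exp(2j * math.pi * W * xr))
        kernel.append(row)
    return kernel


def build_bank(frequencies=(0.1, 0.2), scales=(2.0, 4.0),
               orientations=(0.0, math.pi / 2), half_size=3):
    """2 frequencies x 2 scales x 2 orientations = 8 filters (assumed values)."""
    return [gabor_kernel(s, s, W, theta, half_size)
            for W in frequencies for s in scales for theta in orientations]


def filter_magnitude(img, kernel):
    """Magnitude of the 2-D convolution at every pixel (zero padding)."""
    h, w = len(img), len(img[0])
    k = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0j
            for dr in range(-k, k + 1):
                for dc in range(-k, k + 1):
                    rr, cc = r - dr, c - dc
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += img[rr][cc] * kernel[dr + k][dc + k]
            out[r][c] = abs(acc)
    return out


def lbp_codes(img):
    """Basic 8-neighbour LBP code for every interior pixel."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if img[r + dr][c + dc] >= img[r][c]:
                    code |= 1 << bit
            codes.append(code)
    return codes


def glbp_features(img, bank):
    """Concatenate one normalised 256-bin LBP histogram per Gabor magnitude image."""
    features = []
    for kernel in bank:
        hist = [0] * 256
        for code in lbp_codes(filter_magnitude(img, kernel)):
            hist[code] += 1
        total = sum(hist)
        features.extend(count / total for count in hist)
    return features


# Synthetic 12 x 12 image used only to exercise the code.
image = [[(r * c) % 7 for c in range(12)] for r in range(12)]
bank = build_bank()
features = glbp_features(image, bank)
print(len(bank), len(features))  # 8 filters, 8 * 256 = 2048 feature values
```

For real 150 x 150 face crops a vectorised implementation would be preferable; the pure-Python loops here are kept only so that each step of the description maps onto a visible line of code.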

III. CLASSIFICATION USING ADABOOST
Classification is the process of separating samples into classes on the basis of their characteristics. Many different classifiers that can be used individually are available in the literature. Ensemble classification combines several weak classifiers intelligently to improve classification performance. One of the most important ensemble classifiers is AdaBoost, also known as adaptive boosting. AdaBoost was proposed by [14] and improves a simple classifier through an iterative procedure. In each iteration, the procedure concentrates on the misclassified samples: it increases the weights of misclassified samples and decreases the weights of correctly classified samples. In this way, the weak classifiers trained in later iterations are forced to learn from the difficult samples [14], and classification performance improves as the weights are adjusted. The weights learned in this way are then used for the classification of new samples. The algorithm assumes that the training set contains m samples labelled -1 or +1. The class of a new sample x is found by a weighted vote over all T weak classifiers M_t with weights α_t. Mathematically, it can be written as:
$$H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t M_t(x)\right)$$
Pseudocode of AdaBoost is given in Fig. 4.
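The weight-update loop and the weighted vote can be made concrete with a small, self-contained sketch. It uses decision stumps on a single feature as the weak classifiers M_t and the usual discrete AdaBoost weight α_t = 0.5 · ln((1 - ε_t) / ε_t); both choices, and the toy data, are assumptions for illustration and are not taken from the paper or from Fig. 4.

```python
import math


def train_stump(X, y, weights):
    """Return (error, feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[f] >= thr else -polarity for row in X]
                err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best


def stump_predict(stump, row):
    _, f, thr, polarity = stump
    return polarity if row[f] >= thr else -polarity


def adaboost_fit(X, y, rounds):
    """Train up to `rounds` weighted stumps on labels in {-1, +1}."""
    m = len(X)
    weights = [1.0 / m] * m
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, weights)
        err = max(stump[0], 1e-10)  # guard against log of infinity for a perfect stump
        if err >= 0.5:              # no better than chance: stop adding classifiers
            break
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples (t * prediction = -1) gain weight, the rest lose weight.
        weights = [w * math.exp(-alpha * t * stump_predict(stump, row))
                   for w, row, t in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all weak classifiers."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy 1-D problem that no single stump can solve: + + - - + +
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost_fit(X, y, rounds=3)
print([adaboost_predict(ensemble, row) for row in X])  # matches y on this toy set
```

A single stump misclassifies at least two of the six samples here, while three boosted stumps fit all of them, which is the effect of the reweighting described above. Facial expression recognition has more than two classes, so a multi-class extension (for example one-vs-rest) would be needed on top of this binary form.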

IV. RESULTS AND DISCUSSION
Two different datasets have been used to evaluate the performance of the proposed method. Accuracy has been used as the quantitative measure, and results are presented in the form of confusion matrices. After face detection, each face image was resized to 150 × 150 before extraction of the feature vector. The datasets contain a mixture of expression types: some are spontaneous and some are posed. Some sample images are shown in Fig. 5. The JAFFE database, which consists of 213 facial expression images of Japanese females, and the MUG database, which consists of both posed and induced expressions, are used in the experimentation. The MUG database consists of image sequences of 86 subjects, of whom 51 are male and 35 are female.
Results are shown in the form of confusion matrices, where ANG, DIS, FEA, HAP, SAD, SUR and NEU represent the anger, disgust, fear, happy, sad, surprise and neutral expressions, respectively. In this paper, hybrid texture features have been extracted using Gabor filters and LBP. A Random Forest classifier has been used for prediction of expressions.
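As an illustration of how such a confusion matrix and the per-expression accuracies relate, the following sketch builds a matrix from true and predicted labels. It assumes that rows are true classes and columns are predicted classes, which the text does not state, and the label lists are made up for the example.

```python
LABELS = ["ANG", "DIS", "FEA", "HAP", "SAD", "SUR", "NEU"]


def confusion_matrix(true_labels, predicted_labels, labels=LABELS):
    """Count matrix: rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_accuracy(matrix):
    """Percentage of each true class (row) that lies on the diagonal."""
    return [100.0 * row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]


# Made-up labels for illustration only; these are not results from the paper.
y_true = ["ANG", "ANG", "DIS", "DIS", "FEA", "HAP", "SAD", "SUR", "NEU", "NEU"]
y_pred = ["ANG", "DIS", "DIS", "DIS", "FEA", "HAP", "SAD", "SUR", "NEU", "SAD"]

cm = confusion_matrix(y_true, y_pred)
acc = per_class_accuracy(cm)
for name, value in zip(LABELS, acc):
    print(name, value)
```

Off-diagonal entries show which expressions are confused with each other, for example one anger sample predicted as disgust in the toy data above.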
Three types of experiments have been conducted using different ratios of training and testing data: 60-40 (60 percent for training and 40 percent for testing), 50-50, and 40-60, so that the results are not biased by a particular split. Each test has been repeated 10 times, and the reported results are the average over these 10 runs. Table 1 shows the results of the proposed method on the JAFFE database and Table 2 shows the results on the MUG database.
On the JAFFE dataset (Table 1), the proposed method achieves good results for all expressions: the accuracy is 96% for anger, 87.66% for disgust, 91.63% for fear, 95.38% for happy, 91% for sad, 91% for surprise, and 94% for neutral, giving an average of 92.38%. On the MUG dataset (Table 2), the accuracy is 95.15% for anger, 91.08% for disgust, 92.07% for fear, 96.09% for happy, 92.14% for sad, 90.72% for surprise, and 91% for neutral, giving an average of 92.60%. In both cases the results are averaged over the three experiments. They show that the proposed method works well, in particular because of the hybrid Gabor LBP features; the GLBP features play an important role in the accuracy and recognition rate.
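The evaluation protocol (three train/test ratios, ten random repetitions each, averaged) can be sketched as follows. The `evaluate` callback is a hypothetical stand-in for training and testing the classifier on the given index sets, and the random splitting is an assumption about how the ratios were realised. The last lines check that the reported averages are the means of the per-expression accuracies quoted in the text.

```python
import random


def random_split(n_samples, train_fraction, rng):
    """Shuffle sample indices and cut them into train and test parts."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = int(round(train_fraction * n_samples))
    return indices[:cut], indices[cut:]


def repeated_evaluation(n_samples, evaluate, fractions=(0.6, 0.5, 0.4),
                        repeats=10, seed=0):
    """Average evaluate(train_idx, test_idx) over all ratios and repetitions."""
    rng = random.Random(seed)
    scores = []
    for fraction in fractions:
        for _ in range(repeats):
            train_idx, test_idx = random_split(n_samples, fraction, rng)
            scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / len(scores)


# Example split for the 213 JAFFE images at the 60-40 ratio.
train_idx, test_idx = random_split(213, 0.6, random.Random(0))
print(len(train_idx), len(test_idx))  # 128 85

# Per-expression accuracies quoted in the text (ANG, DIS, FEA, HAP, SAD, SUR, NEU).
jaffe = [96.0, 87.66, 91.63, 95.38, 91.0, 91.0, 94.0]
mug = [95.15, 91.08, 92.07, 96.09, 92.14, 90.72, 91.0]
mean_jaffe = sum(jaffe) / len(jaffe)
mean_mug = sum(mug) / len(mug)
print(round(mean_jaffe, 2))  # 92.38
print(round(mean_mug, 2))    # 92.61, matching the reported 92.60% up to rounding
```

A stratified split that keeps the class proportions equal in the training and test parts would be a reasonable refinement for a dataset as small as JAFFE.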

V. CONCLUSION
Facial expressions are the most important part of non-verbal communication and the most promising way for people to communicate their feelings, emotions, and intentions. Pervasive computing and ambient intelligence require human-centered systems that actively react to complex human communication as it happens naturally, and a Facial Expression Recognition (FER) system is needed for this purpose. In this paper, an FER system has been proposed that uses hybrid texture features to predict human expressions. Existing FER systems show discrepancies across different cultures and ethnicities. The proposed system addresses this problem by using hybrid texture features that are invariant to scale as well as rotation. Gabor LBP (GLBP) texture features have been used to classify expressions with an AdaBoost classifier. Experimentation has been performed on different facial
Fig. 3. Gabor Filters and Implementation on LBP.

Table 2 shows the results of the proposed method on the MUG dataset. It achieves good results for all expressions.

Table 3 shows a comparison of different methods on both the JAFFE and MUG datasets. The proposed method has been compared with existing methods; for example, F. Wallhoff used DCT features with an SVM and a hidden Markov model (HMM) and achieved 61.7% on JAFFE and 63.5% on MUG. Other results are shown in the same table. The proposed method achieves 92.38% on JAFFE and 92.60% on MUG, the highest among the compared methods.