Face Verification across Aging using Deep Learning with Histogram of Oriented Gradients

Facial aging is a complex process that affects the shape and texture of a person's face. These changes tend to degrade the performance of automatic face verification systems. Convolutional Neural Networks (CNNs) are among the most common deep learning approaches, in which multiple layers are trained robustly while keeping the number of learned parameters to a minimum in order to improve system performance. In this paper, a deeper convolutional neural network model combined with the Histogram of Oriented Gradients (HOG) descriptor is proposed to handle feature extraction and classification of two face images separated by an age gap. The model has been trained and tested on the MORPH and FG-NET datasets. Experiments on FG-NET achieve state-of-the-art accuracy (reaching 100%), while results on the MORPH dataset show a significant improvement, with an accuracy of 99.85%.
Keywords: Facial aging; verify faces; Convolutional Neural Networks (CNN); Histogram of Oriented Gradients (HOG)


I. INTRODUCTION
Facial appearance changes dramatically with age [1]. Although the effects of age on facial appearance have been studied for some time, relatively little work has been done on recognizing faces across age progression. One of the most pressing issues is how to identify facial features that are invariant to age. In other words, the basic problem of this research is how to develop a scheme that represents and matches facial features and that is flexible enough to deal with different facial aging changes. A suitable algorithm for extracting invariant features is one that improves overall system performance by boosting accuracy and that compares favorably with other systems that identify images of people as they age.
The effects of aging normally appear on the face as subtle differences in both shape and texture during maturity [2]. Overall, people of the same ethnic group or gender show similar facial features at different ages. People who gain or lose weight as they age also tend to show similar facial aging features [3]. Several factors are thought to constrain face identification across age progression.
One of them is the change in facial biometrics, such as texture and shape, that takes place over the years, which limits the development of a system able to adapt to these changes. This paper considers how a subject can be recognized despite age changes over the years and other significant variations caused by lighting, expression, pose, resolution, and background. Face verification in aging subjects is a challenging process, as human aging is non-uniform. Extracting textural and shape features from the images is another challenge. Some researchers who studied Local Binary Pattern (LBP) features [6,7] in face verification achieved significant improvements. The Histogram of Oriented Gradients (HOG), a shape descriptor used to detect objects such as cars and humans, was chosen here for its strong results in facial recognition [3].
In this paper, a deep convolutional neural network architecture is combined with the HOG descriptor to extract features from facial images and classify them. Face verification is performed by computing a similarity distance based on the Euclidean distance.
The remainder of this paper is organized as follows. Section II introduces the background and related work on face verification across age, and Section III presents the methods and techniques used in the implementation. Section IV reports the experimental results, and the last section concludes the paper and gives future directions.
II. BACKGROUND AND RELATED WORK
Face verification is a complex task that demands high system performance, and many automatic systems have recently come to rely on it. The most powerful technique is the deep learning approach, which has been used to extract both texture and shape features from the face. The main issue, however, is how to design the model architecture to improve system performance.
In the literature, both deep learning-based approaches and Convolutional Neural Networks (CNNs) have been used for face verification. CNN models differ in the number of layers, the activation function, and so on. Simone [4] investigated face verification across long time gaps using a deep CNN (DCNN) with a feature injection layer, which improves verification accuracy by learning a similarity measure for the external features. The method was evaluated on the Large Age Gap (LAG) database and performed better than other contemporary state-of-the-art systems.
El Khiyari and Wechsler [5] tested the use of CNNs to extract facial features for automatic face verification of subjects from various age, ethnicity, and gender groups. Across the demographic categories, the researchers concluded that biometric face verification performance was comparatively lower for black female subjects aged 18 to 30. The VGG-Face convolutional neural network [6] was then used for feature extraction through its activation layers. The feature distance between two subjects was equal to the distance between their respective image sets. Both singleton and set similarity distances were used to assess identification and verification performance.
Kasim et al. [7] proposed a CNN model built from scratch and compared it with two pre-trained models, AlexNet and GoogLeNet, on the Celebrity Face Recognition dataset. They concluded that, although validation accuracy was 100% for both models, GoogLeNet was better in terms of elapsed time.
Ling et al. [3] suggested using the Gradient Orientation Pyramid (GOP) as a facial descriptor across age progression. They compared it with several other methods, such as gradient with magnitude, intensity difference, gradient orientation, and Bayesian face, as well as with two commercial face recognition products (Vendor A and Vendor B). The method is simple compared with its rivals and showed promising results. It was applied to passport verification and validated on two passport photo databases with long age gaps using an SVM classifier. Moreover, they studied how recognition performance varies with increasing time lapses between images and found that the added difficulty saturates once the age gap exceeds four years, for gaps of up to ten years.
Facial aging has been investigated as a series of viscoelastic events by Pittenger and Shaw [8]. They examined the importance of three facial growth parameters, shear, strain, and radial growth, for the perceived age of faces and concluded that the most influential factor in facial aging was the cardioidal strain transformation.
Biswas et al. [9] proposed a method based on the coherency of feature drifts for face verification across age progression. They noticed that, depending on the shape and muscle structure of the individual, there is coherency among the drifts of image features. They therefore proposed a computational measure of the coherency or incoherency between two feature drift maps. Drifts between images of the same subject at different ages are coherent, whereas incoherent drifts indicate different subjects with dissimilar feature drifts. The researchers assessed their method on photographs of children captured at different ages from the FG-NET database (350 pairs, ages 1 to 18 years), using a SIFT feature extractor to extract drift landmarks. Their approach outperformed the other image difference and SVM classifiers.
Some models simulate the wrinkling process; for example, Wu et al. [10] built a 3D model to simulate wrinkles as plastic-visco-elastic processes. Other variations, such as age, gender, expression, and facial hair, have also been considered; Givens et al. [11] discussed how three face recognition algorithms are affected by these variations.
Park, Tong, and Jain [12] suggested extending 2D age modeling to 3D age modeling: feature points are located on 2D portraits by a conventional Active Appearance Model (AAM) and then converted into their 3D equivalents through a reduced morphable model. The researchers also devised a model that uses PCA to extract the texture aging pattern and the shape aging pattern separately. They then simulated the aging process and tested the proposed approach by comparing its face recognition accuracy with that of a state-of-the-art matcher (FACEVACS).

III. METHODOLOGY
The proposed convolutional neural system is shown in Fig. 1. Image preprocessing with data augmentation is the first step of the system. Then, a novel convolutional neural network architecture is built from scratch to extract features and classify facial images. The system relies on two databases: the MORPH dataset [13] and the FG-NET dataset [14]. Each contains a sufficient number of face images. The proposed algorithm is given in Algorithm 1.

8: Train the CNN with the training dataset T; get the training face classifier f_T()
9: Test the test dataset S with the CNN; get the testing face classifier g_S()
10: Classify each image in S as
11: loop:
12: for i = 1 to n
13: if x_j ← T then
14: return L
15: end for
16: close

A. Image Preprocessing
Improving model performance required preprocessing of the dataset. Data augmentation is used during preprocessing to prevent the network from overfitting by generalizing image features [15]. All input images are translated both horizontally and vertically in the range [-30, 30]. The images are then rotated and rescaled to the size of the standard input layer (224×224). Finally, the processed images are fed to the CNN as RGB colour values. complexity, at some points, could possibly degrade the system performance.
Our model is a deep convolutional network comprising five convolutional layers and one fully connected layer, designed to perform feature extraction and classification. Each of the five convolutional layers is followed by batch normalization, a rectified linear unit (ReLU) activation layer, and a max-pooling layer. Together, these layers form the feature extraction stage.
The input layer accepts an RGB facial image of size 224×224, which is passed to the first convolutional layer. This layer has 8 filters of size 3×3 pixels to detect general features in an image, such as vertical and horizontal edges and textures. Convolutional layers have several parameters, including output size, filter size, stride, and number of filters. The output feature map of each convolutional layer is first normalized using batch normalization, and the ReLU function is then applied as the activation function to set all negative values to zero. The result is passed to a max-pooling layer with a stride of 2×2 in order to halve the feature map size.
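The effect of the repeated pooling on the feature map size can be illustrated with a short sketch. It assumes that every 3×3 convolution uses "same" padding, so that only the max-pooling layers change the spatial size; the paper does not state the padding, and the function name is hypothetical.

```python
def feature_map_sizes(input_size=224, num_stages=5, pool_stride=2):
    """Return the spatial size after each conv + max-pool stage.

    Assumption (not stated in the paper): each 3x3 convolution uses
    'same' padding, so only the pooling layer changes the spatial size.
    """
    sizes = []
    size = input_size
    for _ in range(num_stages):
        size = size // pool_stride  # pooling with stride 2 halves the map
        sizes.append(size)
    return sizes


print(feature_map_sizes())  # [112, 56, 28, 14, 7]
```

Under these assumptions, the last feature map before the fully connected layer would be 7×7 per filter.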
In the classification stage, one fully connected layer converts the feature map into a vector of 672 values for the classification task. It is followed by a SoftMax layer with 672 neurons, where each neuron represents one class (subject). The loss function is cross entropy, which in its standard form is calculated as:
$L = -\sum_{k=1}^{K} t_k \log y_k$
where $K$ is the number of classes, $t_k$ is 1 for the true class and 0 otherwise, and $y_k$ is the SoftMax output for class $k$. The output is the predicted label (class) for each facial image, and $y_k$ reflects the probability of the predicted class. Table I contains more details about the structure of the CNN layers and the parameter values. Fig. 2 shows the architecture.
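As a minimal sketch of this classification stage, the following pure-Python example computes SoftMax probabilities and the cross-entropy loss for one sample. The logit values are invented for illustration, and the function names are hypothetical rather than taken from the authors' MATLAB implementation.

```python
import math


def softmax(logits):
    """Numerically stable SoftMax: subtract the max before exponentiating."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Cross-entropy loss for a single sample with a one-hot target."""
    return -math.log(probs[true_class])


num_classes = 672              # one output neuron per subject
logits = [0.0] * num_classes   # hypothetical scores, not real network output
logits[5] = 4.0                # the scores favour class 5
probs = softmax(logits)
predicted = max(range(num_classes), key=lambda k: probs[k])
loss = cross_entropy(probs, true_class=5)
```

With uniform scores the loss equals log(672), which is a useful sanity check at the start of training.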
Histogram of Oriented Gradients (HOG) is a shape descriptor used to detect objects such as cars and humans. It was first introduced by Dalal and Triggs to detect humans [16]. The basic idea is that the shape and appearance of objects in an image can be described by the distribution of intensity gradients or edge directions. The image is therefore divided into cells, and a histogram describing the distribution of gradient directions is built for each cell. The histograms are then normalized and concatenated into a vector, which is calculated as follows [16]:
1. Compute the gradients:
$g_x(x, y) = I(x + 1, y) - I(x - 1, y)$
$g_y(x, y) = I(x, y + 1) - I(x, y - 1)$
2. Compute the magnitude $m(x, y)$ and the orientation $\theta(x, y)$:
$m(x, y) = \sqrt{g_x(x, y)^2 + g_y(x, y)^2}$
$\theta(x, y) = \arctan \frac{g_y(x, y)}{g_x(x, y)}$
3. Divide the orientation and magnitude images into cells; the numbers of cells along the rows and columns are parameters to choose when implementing HOG.
4. Compute the histogram of orientations for each block and normalize it:
$Hist_{norm} = \frac{Hist}{\lVert Hist \rVert + \varepsilon}$
5. Concatenate the normalized histograms into a vector.
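The HOG steps can be sketched in a few lines of plain Python. This is a simplified illustration rather than the authors' implementation: it normalizes each cell histogram on its own instead of grouping cells into blocks, uses unsigned orientations with 9 bins, and the function name, cell size, and input format are assumptions.

```python
import math


def hog_cell_histograms(image, cell=4, bins=9, eps=1e-6):
    """Return one normalized orientation histogram per cell, concatenated.

    image: list of rows of grayscale intensities (hypothetical input format).
    Border pixels are skipped so the central differences stay in bounds.
    """
    h, w = len(image), len(image[0])
    cells_y, cells_x = h // cell, w // cell
    hists = [[[0.0] * bins for _ in range(cells_x)] for _ in range(cells_y)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # horizontal gradient
            gy = image[y + 1][x] - image[y - 1][x]  # vertical gradient
            mag = math.hypot(gx, gy)
            # Unsigned orientation in [0, 180) degrees.
            theta = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(theta / (180.0 / bins)), bins - 1)
            cy, cx = y // cell, x // cell
            if cy < cells_y and cx < cells_x:
                hists[cy][cx][b] += mag  # magnitude-weighted vote
    # Normalize each histogram by its L1 norm plus eps, then concatenate.
    features = []
    for row in hists:
        for hist in row:
            norm = sum(hist) + eps
            features.extend(v / norm for v in hist)
    return features


# A horizontal intensity ramp: every gradient points along the x axis.
ramp = [[float(x) for x in range(8)] for _ in range(8)]
features = hog_cell_histograms(ramp)
print(len(features))  # 36
```

For the ramp image, all of the histogram mass in each cell falls into the first orientation bin, since the gradient direction is 0 degrees everywhere.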
Local Binary Pattern (LBP) is a texture descriptor [17]. It works by dividing an image into multiple cells, where the pixel at the center of each cell is compared with its eight neighbors, starting from the top-left and proceeding clockwise. If the central pixel is larger than a neighbor, that neighbor is assigned zero; otherwise, it is assigned one. The decimal value of the resulting binary number is then calculated, giving the LBP code that replaces the central pixel. Larger cell sizes can be selected to collect information over larger regions. The LBP code for P neighbors situated on a circle of radius R is computed as follows [18]:
$LBP_{P,R} = \sum_{p=0}^{P-1} S(g_p - g_c) \, 2^p$
where $g_c$ is the intensity of the central pixel, $g_p$ is the intensity of its $p$-th neighbor, and $S(l) = 1$ if $l \ge 0$ and 0 otherwise.
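A small worked example may help make the LBP code concrete. The sketch below handles the simplest case, a 3×3 patch with P = 8 and R = 1; the function name and the choice of the first neighbor as the least significant bit are assumptions, since the text does not fix the bit order.

```python
def lbp_code(patch):
    """LBP code of the centre pixel of a 3x3 patch (P = 8, R = 1).

    Neighbours are visited clockwise starting at the top-left pixel.
    Bit weights (first neighbour = least significant bit) are an
    assumption; other orderings only permute the codes.
    """
    center = patch[1][1]
    # Clockwise from top-left: (row, col) positions inside the patch.
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for p, (r, c) in enumerate(order):
        s = 1 if patch[r][c] - center >= 0 else 0  # S(l) from the text
        code += s << p
    return code


patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch))  # 241
```

Here the neighbors 6, 7, 8, 9, and 7 are greater than or equal to the centre value 6, so bits 0, 4, 5, 6, and 7 are set, giving 1 + 16 + 32 + 64 + 128 = 241.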

C. Classification
The classifier adopted in the methodology described above (Fig. 1) is the Support Vector Machine (SVM) [19]. Specifically, the study uses a linear multi-class SVM to represent the subjects (classes). The multi-class SVM uses a one-versus-all classification approach, with the output of the k-th SVM given as in (8).
The predicted class is the one whose SVM produces the largest output.
1) Performance Metrics: The performance measures are based on the four counts obtained when running the classifier on the test dataset: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The system validation accuracy is calculated as follows [20]:
$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$
2) Face verification: The performance of the face verification across age method was evaluated using the Euclidean distance [6], which measures the similarity between pairs of feature vectors. Given two image feature vectors $a$ and $b$, the similarity distance is the Euclidean distance:
$d(a, b) = \lVert a - b \rVert_2$
For two image feature sets, $A = \{a_1, \ldots, a_N\}$ and $B = \{b_1, \ldots, b_N\}$, the minimum similarity distance between the two sets is defined as:
$d_{min}(A, B) = \min_{1 \le i, j \le N} d(a_i, b_j)$
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 10, 2020
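The accuracy and distance measures used for verification can be sketched as follows. The function names are hypothetical, and the decision threshold in `same_subject` is an assumed parameter: the paper does not report how a distance is turned into an accept or reject decision.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_set_distance(set_a, set_b):
    """Smallest pairwise Euclidean distance between two feature sets."""
    return min(euclidean(a, b) for a in set_a for b in set_b)


def accuracy(tp, tn, fp, fn):
    """Fraction of correct decisions among all decisions."""
    return (tp + tn) / (tp + tn + fp + fn)


def same_subject(a, b, threshold):
    """Verification decision; the threshold is a hypothetical parameter."""
    return euclidean(a, b) <= threshold


print(euclidean([0, 0], [3, 4]))                                 # 5.0
print(min_set_distance([[0, 0], [10, 10]], [[3, 4], [10, 11]]))  # 1.0
print(accuracy(tp=45, tn=50, fp=3, fn=2))                        # 0.95
```

Using the minimum over all pairs means two sets are considered close as soon as any one image of the first subject resembles any one image of the second.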

IV. EXPERIMENTS AND RESULTS
Various experiments were carried out to assess the efficacy of the proposed approach to face verification across age. Two publicly available datasets were used to evaluate the proposed methods.

A. Facial Dataset
In the current study, two datasets are used for training and testing. The first is the MORPH dataset, a standard benchmark dataset for face recognition [13]. The second is the FG-NET dataset [14], which contains 1002 photos of 82 subjects; multiple photos of the same subject reflect variability in age as well as intrinsic variability such as pose, illumination, and expression.
The MORPH dataset comprises 4132 photos of 672 subjects of different ages. The images are divided into classes; each class contains images of the same subject at various ages, with a maximum age difference of 5 years. The database was split into two parts: 80% of the data was selected at random to train the CNN, and the remaining 20% was used to test it.
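A random 80/20 split of this kind can be sketched as below. The paper does not say whether the split was made per class or over the whole dataset, so this example simply shuffles all images together; the function name and the fixed seed are assumptions added for repeatability.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=0):
    """Randomly split items into a training part and a test part."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seed is an assumption
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


image_ids = range(4132)  # stand-ins for the MORPH images
train, test = split_dataset(image_ids)
print(len(train), len(test))  # 3305 827
```

A per-class (stratified) split would be a natural alternative when some subjects have very few images.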

B. Model Implementation
The implementation was carried out on a personal computer with an Intel Core i7 processor (2.20 GHz), an Nvidia GeForce GTX card with 4 GB of memory, and MATLAB 2018.
The goal of our experiments is to design a CNN model capable of verifying two face images regardless of the age difference. After various experiments with different parameter values, the initial learning rate was set to 0.1 and the L2 regularization to 1e-04, with the L2 norm as the gradient threshold method, a validation frequency of 50, and shuffling at every epoch. More details about the training parameters for the pre-trained models are given in Table II. To make the model generalize better, the stochastic gradient descent with momentum (SGDM) optimizer was applied with a momentum value of 0.9 [21]. In its standard form, the update is defined as follows:
$\gamma_{t+1} = \gamma_t - \eta \nabla_{\gamma} J(\gamma_t) + \alpha (\gamma_t - \gamma_{t-1})$
Here, $\eta$ is the learning rate, $\nabla_{\gamma} J$ is the gradient of the loss term with respect to the weight vector $\gamma$, and $\alpha$ is the momentum.
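To illustrate how momentum affects the weight update, the following sketch applies two SGDM steps to a toy quadratic loss. It uses the common formulation in which the previous weight change is added with a momentum factor; the function name and the toy loss are assumptions, while the learning rate 0.1 and momentum 0.9 are the values reported above.

```python
def sgdm_step(weights, prev_weights, grads, lr=0.1, momentum=0.9):
    """One SGDM update: w_new = w - lr * grad + momentum * (w - w_prev)."""
    return [
        w - lr * g + momentum * (w - w_prev)
        for w, w_prev, g in zip(weights, prev_weights, grads)
    ]


# Toy loss J(w) = 0.5 * w^2, whose gradient is simply w.
w_prev, w = [1.0], [1.0]
for _ in range(2):
    w, w_prev = sgdm_step(w, w_prev, grads=w), w
print(w)
```

The first step moves the weight from 1.0 to 0.9 using the gradient alone; the second step also adds 0.9 times the previous change, so the weight moves further, to about 0.72.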
During training, the network is validated at each validation step. The fine-tuned CNN is then used to classify the validation images, and the classification accuracy is calculated at this stage.
The proposed system is evaluated on two databases for face verification with an age difference. The MORPH dataset contains 4132 images of 672 subjects that vary in age, and the FG-NET dataset consists of 1002 images of 82 subjects, each of size 150×150 pixels. Images are converted to RGB color to match the CNN input layer. The training and testing process is shown in Fig. 3, and examples of classified images with their predicted labels are shown in Fig. 4.

C. Experiments in FG-NET Dataset
The proposed model is tested on the FG-NET dataset, which includes 1002 images of 82 subjects and contains different images of the same person at different ages. In the evaluation, the HOG descriptor with the deep convolutional neural network reached a maximum accuracy of 100%, the same as when both LBP and HOG are combined within the same CNN. This result suggests that HOG improves validation accuracy compared with the minimum accuracy, which is obtained with LBP. The FG-NET database contains a limited number of images, which gave 100% accuracy. The results obtained are reported in Table III.

D. Experiments in MORPH Dataset
The proposed model was evaluated on the MORPH dataset. When HOG is used for feature extraction, an accuracy of 29.61% is obtained, an improvement over LBP at 25.59%. Combining the deep convolutional neural network with LBP gives lower accuracy than combining it with both LBP and HOG. Combining the deep convolutional neural network architecture with HOG gives the highest accuracy (99.85%). Although the FG-NET dataset contains fewer images, the accuracy on the MORPH dataset is no higher, as shown in Table IV.

E. Performance Comparison of our Result with the State-of-the-Art Works
Table V shows the improvements in accuracy over previous works. On the MORPH dataset, combining HOG with the deep convolutional neural network reaches 99.85% accuracy, an improvement over the results of [4] with layer injection. Comparing the proposed model with the results obtained by [7], we note that, although 100% accuracy was obtained there, that model contains a limited number of layers and is not deep enough to learn all features.

V. CONCLUSION AND FUTURE WORK
This paper addressed the problem of facial image verification across an age difference as a feature extraction and classification task. The training process was fine-tuned on the publicly available MORPH and FG-NET datasets, and HOG achieved much better results than LBP when combined with the deep convolutional neural network. Further analysis also showed that state-of-the-art results are achieved through this fusion. As future work, a CNN with more depth and efficiency will be designed to accommodate age and gender. The use of pre-trained deep learning models is also under consideration.
www.ijacsa.thesai.org