Face Recognition using Principal Component Analysis and Clustered Self-Organizing Map

Abstract: Face recognition is one of the cornerstones of the face-processing schemes that make up contemporary intelligent, vision-based interactive systems between computers and humans. Instead of using the neurons of the Self-Organizing Map (SOM) neural network directly to cluster the facial data, in this work we apply agglomerative hierarchical clustering to the neurons of the SOM network, and the resulting neuron clusters are in turn used to cluster the facial dataset. Beforehand, Principal Component Analysis (PCA) is employed to reduce the dimension of the facial data and to establish the initial state of the SOM neurons. The design of the clustered-SOM recognition engine involves post-training steps that label the clustered SOM neurons, resulting in a supervised SOM network. The effectiveness of the proposed model is demonstrated on the well-known ORL database. Using five images per person for SOM training, the proposed recognizer achieves a recognition rate of 94.7%, whereas using nine images raises the recognition rate to 99.33%. The recognizer is notably reliable and robust against additive white Gaussian noise: when the noise variance increases from 0 to 0.09, the recognition rate decreases by only 8%. Furthermore, the time cost is analyzed: training with 200 images takes less than 4 seconds, whereas testing a new set of 200 images takes less than 0.013 seconds, which is competitive with many artificial intelligence and machine learning based schemes.


I. INTRODUCTION
As a basic definition, facial recognition is the process of using techniques and algorithms to match physical characteristics against photos of people's faces; it allows faster and more accurate identification than is possible with the naked human eye. Face recognition can take a variety of face-related activities and operations into new domains. In security, for example, face recognition can do much to strengthen protection, from street crime to airport security, issues that have dominated the headlines in many countries around the world. The limited information available in security cases opens the door to a wide range of accusations of bias or discrimination.
Face recognition systems open the opposite door: they require no prior information about age, race, or gender, especially systems built on classical techniques that rely on previously stored databases of the faces of persons of interest or of persons suspected of involvement in a serious violent crime. In a nutshell, face recognition systems add intelligence to the identification of people, especially where identification is a tedious task for human staff alone, as in large, crowded areas and establishments.
This recent surge in the use of facial recognition increases the demands on recognition performance metrics, including (1) recognition accuracy and (2) speed of response. Most facial recognition methods and schemes in the literature are built on two cascaded engines: (1) a facial data representation engine (facial feature extraction) and (2) a facial classification engine.
A considerable portion of artificial intelligence and machine learning based schemes are simple to implement; however, their recognition performance is moderate, or the reliability of the facial recognizer changes dramatically with the number of training images used per person or under additive white Gaussian noise.
The other portion of techniques, which adopt complex frameworks (such as complex-structured neural networks), show moderate to high recognition performance, but with a high computational overhead and time cost that limit their applicability.
In this work, we aim to design an intelligent facial recognizer that can be deployed with low computational overhead and time cost yet achieves a competitive facial recognition rate. Furthermore, we aim for a high level of reliability through strong robustness against sizeable changes in the volume of available training data and against high levels of additive Gaussian noise.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 13, No. 3, 2022, www.ijacsa.thesai.org
To achieve these goals, we combine the simple standard PCA algorithm, modified mathematically to suit the high dimensionality of facial data and to dramatically lower the computational overhead, with a special variant of SOM, denoted clustered SOM, which performs the classification step. Instead of using the converged SOM neurons directly as classifiers, we apply an agglomerative clustering algorithm to obtain a set of clusters composed of SOM neurons; these clusters are then labeled using the training dataset, which yields a supervised version of the clustered SOM. The labeled training exemplars are used to label the generated clusters by measuring the distance between the training facial vectors and the neurons that act as cluster members. The labeled SOM clusters are then used to classify new facial data into their corresponding classes.
This hybrid framework can learn and respond quickly owing to the flexibility of the supervised clustered SOM, which learns easily and with low computational overhead. It also achieves high recognition performance, because the operational neurons undergo two cascaded refinements: one during training of the raw SOM neural network and another during the agglomerative clustering applied to the converged neurons. The contributions of this work can be summarized as follows:
• We present the design and implementation of a human facial recognition system that uses PCA as a feature extractor (facial representation) and the agglomerative hierarchical clustered SOM as the recognition engine, and we study its recognition performance on the benchmark ORL dataset.
• The proposed system is superior in time cost and computational overhead in both the training and test stages: the processing time is as low as 0.0182 sec per training image and less than 7e-5 sec per test image.
• Several performance analyses conducted to measure the reliability and robustness of the proposed model indicate that it can be deployed in real-time applications: its high robustness against noisy signals, fast response, and fast learning of new images make it well suited to security systems that use personal biometric features.
The rest of this paper is organized as follows: related works based on the SOM network are discussed in Section II. Section III details the proposed method. Section IV presents the experimental results and analysis. Section V discusses the results, and Section VI concludes the paper.

II. RELATED WORK
One efficient yet robust unsupervised neural network is the Self-Organizing Map (SOM), which can play a dual role in face recognition: it can be used either to represent facial data (as a feature extractor and dimensionality-reduction tool) or as a classification engine. The SOM network can even be integrated with other dimensionality-reduction techniques such as PCA, as proposed by Kumar et al. [23]. The SOM network can also serve for dimensionality reduction and facial data representation, as proposed by Lawrence et al. [24].
As a processing step applied to the SOM network before its output is fed to the classification engine, Ruiz and Jaime [25] applied the Fourier transform to the output of the SOM network (the optimal weight vectors of the SOM neurons) to make translation invariant the feature map generated by applying a two-dimensional Gabor filter to the raw input facial images. A backpropagation neural network was then used for classification.
In a facial recognition scheme that uses K-Nearest Neighbor (KNN) as the classification engine, Yodkhad et al. [26] used the clustering capability of the SOM network to group the training data and extract a representative prototype for each group, which accelerates the classification performed by the KNN algorithm.
The use of the SOM network as the main classification engine of a face recognition system was proposed by Neagoe and Stanculescu [22], where PCA, LDA, and ICA were compared as data-representation and feature-extraction techniques, each cascaded with a SOM variant called concurrent SOM (CSOM), which was proposed and developed by the first author of [22]. In this scheme, the training data are partitioned into multiple subsets, each representing one class. Multiple SOM networks are then generated, and each is trained on one partition of the training data. In the testing stage, the Euclidean distance between the vectorized test image and each trained SOM network is calculated, and the winning SOM network, the one with the minimum distance, assigns its class to the test image.
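The CSOM decision rule described above can be sketched as follows. This is our reading of the description, not the code of [22]: we assume the distance between a test vector and a trained SOM network is the distance to that network's closest neuron, and the function name `csom_classify` is hypothetical.

```python
import numpy as np

def csom_classify(codebooks, z):
    """Assign z to the class whose class-specific SOM contains the closest neuron.

    codebooks: dict mapping class label -> (K x d) array of that SOM's weight vectors.
    z:         (d,) vectorized test image.
    """
    # Distance from z to each network = distance to its best-matching neuron (assumption).
    dist = {label: np.linalg.norm(M - z, axis=1).min() for label, M in codebooks.items()}
    # The "winner" network gives its class to the test image.
    return min(dist, key=dist.get)
```

Each per-class codebook would come from training one SOM on that class's partition of the training data, which is omitted here.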
Besides the eigenfaces generated by PCA and the Fisherfaces generated by LDA (Fisher's Linear Discriminant), the SOM network can be used to generate what is called a SOM-face, as proposed in [27], where an enhanced version of the SOM network, called kernel-based SOM, is used to extract the representative features of the facial data.
Based on SOM-face, Zhi and Meng [28] proposed a face recognition method based on multiple training images in which, in addition to the topological shape-feature vector generated by the SOM network, a wavelet-feature vector is generated by a wavelet network; both feature representations are then fed to the classification engine.
Instead of training the SOM network on each vectorized facial image as a whole, Tan et al. [29] partitioned each facial image into equal-sized, non-overlapping sub-blocks. The resulting sub-blocks were then used to train a single SOM map or multiple SOM maps, in a step the authors call localization. The main goal of this step is to generate local vector representations of the facial data, which are fed to a soft KNN-ensemble classifier for recognition.
Using the SOM network as the classification engine, Monteiro et al. [30] proposed four SOM or SOM-variant classifiers. Since SOM is used as a classifier, all four engines implemented by the authors rely on either a pre-training or a post-training labeling step for the SOM grid neurons. The first classifier uses the labeled training data to label the SOM neurons, which in turn serve as the classification engine of the recognition system. The second classifier labels the SOM neurons after training using the centroids of each class of facial data, where the class centroids are precomputed. The third classifier turns the SOM into a supervised classifier by augmenting each input vector with its class label; these augmented vectors are used to adjust the correspondingly augmented weight vectors of the SOM neurons. The fourth classifier uses an entire SOM network to represent a single class of facial data, and these networks are trained separately on the facial vectors of the corresponding class. During testing, the best-matching neuron (winner) is sought across all trained SOM networks, and the network containing the winner assigns its class to the incoming test vector.
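The third scheme (label augmentation) can be illustrated with a short sketch. It reflects our reading of the description, not the code of [30]: we assume a one-hot label vector, an ordinary SOM (not shown) trained on the augmented vectors, and a test-time winner found using only the feature part of each neuron, since a test vector's label is unknown. The function names are hypothetical.

```python
import numpy as np

def augment(Z, y, n_classes, scale=1.0):
    """Append a (scaled) one-hot class label to each feature vector (rows of Z)."""
    onehot = np.eye(n_classes)[y] * scale
    return np.hstack([Z, onehot])

def predict_from_augmented(M_aug, Z):
    """Classify rows of Z with a SOM trained on augmented vectors.

    M_aug: (K x (d + n_classes)) codebook; the first d columns are features.
    """
    d = Z.shape[1]
    M_feat, M_lab = M_aug[:, :d], M_aug[:, d:]
    # Winner search uses the feature part only.
    bmu = np.linalg.norm(Z[:, None, :] - M_feat[None, :, :], axis=2).argmin(axis=1)
    # The class is read from the label part of the winner's weight vector.
    return M_lab[bmu].argmax(axis=1)
```

The `scale` parameter controls how strongly the label part pulls neurons toward class-pure regions during training; its value here is arbitrary.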

III. METHOD
The high-level block diagram of the proposed method is shown in Fig. 1. To find an optimal representative subspace of the input facial data, classical PCA is applied first. The facial data are then projected onto the resulting PCA subspace and split into projected training and projected testing datasets. The projected training dataset is used to establish the topological space of the SOM neural grid. The best-matching units of the SOM network then undergo hierarchical clustering to obtain a robust and compact representation of the facial data. As a post-training step, the labeled projected training data are used to analyze and label the SOM clusters by majority voting over subjects (each face image in the training dataset belongs to a specific person). Finally, new facial data are classified by the proposed recognizer into their corresponding classes.
In the following subsections, each step of the recognition model is explained in detail.

A. Principal Component Analysis (PCA)
Formally, the ORL database is composed of grayscale images, each represented by a matrix $I_i^{(j)} \in \mathbb{R}^{h \times w}$, where $h \times w$ are the vertical and horizontal dimensions of the $i$-th image of the $j$-th person (subject/class). As a first step, the pixels of $I_i^{(j)}$ are vectorized into an $n$-dimensional vector $x \in \mathbb{R}^{n}$, with $n = h \times w$, by reading the pixel values of the image in raster-scan order; in our case, $n = 112 \times 92 = 10304$.
Thus, the set of images in the ORL database can be represented as a rectangular matrix $X = [x_1, x_2, \dots, x_M] \in \mathbb{R}^{n \times M}$ of $M$ columns, where $M$ is the total number of images in the raw dataset ($M = 400$ for ORL).
The dimensionality of these images is too large for them to be fed to and analyzed efficiently by the recognition engine of the proposed model. To obtain a more compact representation of the data, the standard form of principal component analysis is used.
Given as input a rectangular matrix whose columns are seen as observations of the variables, the main objective of principal component analysis is to create a new set of variables, called principal components, each a linear combination of the input variables, chosen so that the variance of the data projected onto them is maximized, with successive components mutually uncorrelated.
As a first step, the vectorized dataset is split into two subsets. The first is the training dataset $X_{tr} = [x_1, \dots, x_N] \in \mathbb{R}^{n \times N}$, which is used to compute the principal-component basis vectors, to establish and train the SOM network, and to label the clusters of the clustered SOM, where $N$ is the total number of images involved in the training stage. The second is the testing dataset $X_{ts} = [y_1, \dots, y_T] \in \mathbb{R}^{n \times T}$, which is used to test the recognition performance of the proposed model, where $T$ is the number of testing images and $N + T = M$. Fig. 2 shows the pipeline of the facial data representation stage using PCA.
Because PCA is a variance-optimization process, if some variables show a large variance compared with others, PCA will load on the high-variance variables during variance maximization. Therefore, before PCA, the data are normalized in two successive steps. First, the average of the vectorized training images is obtained as in (1):
$$\bar{x} = \frac{1}{N} \sum_{k=1}^{N} x_k \tag{1}$$
Second, the average image vector is subtracted from each vectorized training and testing image to obtain the standardized (mean-centered) vectors, as in (2) and (3):
$$\phi_k = x_k - \bar{x}, \quad k = 1, \dots, N \tag{2}$$
$$\psi_k = y_k - \bar{x}, \quad k = 1, \dots, T \tag{3}$$
Let the standardized training vectors form the standardized training matrix $A = [\phi_1, \dots, \phi_N] \in \mathbb{R}^{n \times N}$, and let $B = [\psi_1, \dots, \psi_T] \in \mathbb{R}^{n \times T}$ be the standardized testing matrix. Let $W \in \mathbb{R}^{n \times d}$ define a linear transformation that maps the $n$-dimensional standardized vectors onto a feature subspace of $d$-dimensional feature vectors, where $d \ll n$. Projecting the standardized training vectors onto this subspace gives the feature vectors in (4):
$$z_k = W^{\top} \phi_k, \quad k = 1, \dots, N \tag{4}$$
Based on (4), we define $Z_{tr} = W^{\top} A = [z_1, \dots, z_N] \in \mathbb{R}^{d \times N}$ as the projected training dataset used to train the neurons of the SOM network in the recognition stage. Likewise, the standardized testing data are projected as in (5):
$$\tilde{z}_k = W^{\top} \psi_k, \quad k = 1, \dots, T \tag{5}$$
Based on (5), we define $Z_{ts} = W^{\top} B = [\tilde{z}_1, \dots, \tilde{z}_T] \in \mathbb{R}^{d \times T}$ as the projection of the standardized testing data onto the eigenspace defined by $W$. The columns $w_1, \dots, w_d$ of $W$ are the eigenvectors of the covariance matrix $C$ associated with its $d$ largest eigenvalues, as defined in (6) and (7):
$$C w_i = \lambda_i w_i, \quad i = 1, \dots, d \tag{6}$$
$$C = \frac{1}{N} \sum_{k=1}^{N} \phi_k \phi_k^{\top} = \frac{1}{N} A A^{\top} \tag{7}$$
where the scalars $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d$ are the eigenvalues of $C$.
The covariance matrix $C$ is an $n \times n$ matrix ($10304 \times 10304$ for ORL images), which is too large for the PCA engine to compute its eigenvectors efficiently. Previous works have handled this difficulty with different tactics; in this work, we adopt the solution proposed in [31].
First, let $w_i$ be the eigenvectors of the covariance matrix $C$ defined in (7), and let $v_i$ be the eigenvectors of a new matrix $L$, constructed by switching the order of the transpose in (7), as in (8):
$$L = \frac{1}{N} A^{\top} A \in \mathbb{R}^{N \times N} \tag{8}$$
The eigenvectors $v_i$ and eigenvalues $\mu_i$ of $L$ satisfy (9):
$$L v_i = \mu_i v_i \tag{9}$$
Pre-multiplying both sides of (9) by $A$ gives (10):
$$A L v_i = \mu_i A v_i \tag{10}$$
Substituting $L$ from (8) yields (11):
$$\frac{1}{N} A A^{\top} A v_i = \mu_i A v_i \tag{11}$$
and, using the definition of the covariance matrix in (7), we obtain (12):
$$C \left( A v_i \right) = \mu_i \left( A v_i \right) \tag{12}$$
It can be noted from (12) that $A v_i$ and $\mu_i$ are an eigenvector and the corresponding eigenvalue of $C$, respectively. Thus, to find the eigenvalues and eigenvectors of $C$, we first construct the much smaller $N \times N$ matrix $L$ and find its eigenvalues $\mu_i$ and eigenvectors $v_i$. The (nonzero) eigenvalues of $C$ are then set to $\mu_i$, and the eigenvectors of $C$ are obtained by multiplying the standardized training matrix $A$ by the eigenvectors of $L$, as summarized in (13) and (14):
$$\lambda_i = \mu_i \tag{13}$$
$$w_i = A v_i \tag{14}$$
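Equations (1) to (14) can be sketched in NumPy as below. This is a minimal illustration, not the authors' MATLAB code: the function names are hypothetical, images are stored one per column as in the text, and normalizing each $w_i$ to unit length is our addition, since (14) as written does not normalize. Because the columns of $A$ are mean-centered, $L$ has at most $N - 1$ nonzero eigenvalues, so $d$ should not exceed $N - 1$.

```python
import numpy as np

def snapshot_pca(X_train, d):
    """PCA through the small N x N matrix L = A^T A / N, eqs. (1)-(14).

    X_train: (n x N) array, one vectorized training image per column.
    Returns the mean image (n,), W (n x d) with unit-norm columns, eigenvalues (d,).
    """
    n, N = X_train.shape
    assert d <= N - 1, "at most N-1 nonzero eigenvalues after mean-centering"
    mean = X_train.mean(axis=1)                # eq. (1)
    A = X_train - mean[:, None]                # eq. (2), columns form A
    L = (A.T @ A) / N                          # eq. (8): N x N instead of n x n
    mu, V = np.linalg.eigh(L)                  # eq. (9); eigh returns ascending order
    order = np.argsort(mu)[::-1][:d]           # keep the d largest eigenvalues
    mu, V = mu[order], V[:, order]
    W = A @ V                                  # eq. (14): eigenvectors of C
    W /= np.linalg.norm(W, axis=0)             # unit length (our assumption)
    return mean, W, mu                         # eq. (13): lambda_i = mu_i

def project(X, mean, W):
    """Eqs. (4)-(5): center with the *training* mean, then project onto W."""
    return W.T @ (X - mean[:, None])
```

For ORL with $N = 200$, this replaces a $10304 \times 10304$ eigenproblem with a $200 \times 200$ one. Test images go through `project` with the training mean, as in (3) and (5).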

B. Hierarchical Clustered SOM Network
The second stage of the proposed model is the recognition engine. Hierarchical agglomerative clustering is applied to the well-trained neurons of the self-organizing map (the converged map), yielding clusters that are labeled using the training data and then serve as the classification engine of the system.
As illustrated in Algorithm I, the first step of the classification engine is to establish the SOM grid, which serves as a second layer of dimensionality reduction and facial data representation. SOM networks belong to the family of topographic maps; they are competitive, unsupervised learning systems in which the input space, in our case $Z_{tr}$, is mapped onto a lower-dimensional output space according to the following principle: similar feature vectors are projected onto the same neuron or, at least, onto its neighborhood in the output space of the SOM grid.
As shown in Fig. 1, the proposed model involves two cascaded projections. In the first, the standardized training data $A$ and testing data $B$ are projected onto the eigenvectors $W$, producing $Z_{tr}$ and $Z_{ts}$. In the second, $Z_{tr}$ is projected onto the SOM grid to construct the output space of the SOM neural network.
Typically, the incremental learning algorithm of SOM networks proceeds as follows [32]. Let the codebook (weight) vectors of the SOM neurons be $m_j \in \mathbb{R}^{d}$, $j = 1, \dots, K$, and let $z(t)$ denote the observation vectors (input space). The regression of the set of codebook vectors into the input space is then given by (15):
$$m_j(t+1) = m_j(t) + \alpha(t)\, h_{cj}(t) \left[ z(t) - m_j(t) \right] \tag{15}$$
where $t$ is the sample index, and $h_{cj}(t)$ is the neighborhood function, often chosen to be Gaussian as in (16):
$$h_{cj}(t) = \exp\left( -\frac{\lVert r_c - r_j \rVert^{2}}{2 \sigma^{2}(t)} \right) \tag{16}$$
Here $c$ refers to the winner neuron on the SOM grid, whose weight vector satisfies condition (17):
$$\lVert z(t) - m_c(t) \rVert = \min_{j = 1, \dots, K} \lVert z(t) - m_j(t) \rVert \tag{17}$$
$K$ is the total number of neurons in the SOM grid.
$\alpha(t)$ is the learning rate, which decreases monotonically with the learning steps (iterations).
$\sigma(t)$ is the width of the neighborhood function, which also decreases monotonically with the learning steps.
$r_c$ and $r_j$ are the 2D locations of neurons $c$ and $j$ on the SOM display grid.
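A single incremental update, eqs. (15) to (17), can be sketched as follows. This is only an illustration of the online rule (the model itself uses the batch variant); codebook vectors are stored one per row, and the function name is hypothetical.

```python
import numpy as np

def online_som_step(M, grid, z, alpha, sigma):
    """One incremental SOM update, eqs. (15)-(17).

    M:     (K x d) codebook, one neuron weight vector per row.
    grid:  (K x 2) grid coordinates r_j of the neurons.
    z:     (d,) input vector z(t).
    alpha: learning rate alpha(t); sigma: neighborhood width sigma(t).
    """
    c = np.argmin(np.linalg.norm(M - z, axis=1))        # eq. (17): winner neuron
    dist2 = np.sum((grid - grid[c]) ** 2, axis=1)       # ||r_c - r_j||^2
    h = np.exp(-dist2 / (2.0 * sigma ** 2))             # eq. (16): Gaussian neighborhood
    M_new = M + alpha * h[:, None] * (z - M)            # eq. (15): pull toward z
    return M_new, c
```

The winner has $h_{cc} = 1$ and moves the furthest; neighbors move by an amount that shrinks with their grid distance from the winner.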
In the incremental (online) learning described by (15), learning is carried out recursively for each presentation of a training feature vector $z(t)$. In our work, however, we use the batch SOM [33] instead of an online variant, so the weights are updated at the end of each epoch. Letting $t_0$ and $t_f$ denote the start and end of an epoch, the weight update is given by (18):
$$m_j(t_f) = \frac{\sum_{t=t_0}^{t_f} h_{c(t)j}\, z(t)}{\sum_{t=t_0}^{t_f} h_{c(t)j}} \tag{18}$$
where $c(t)$ is the winner neuron of $z(t)$ according to (17). The self-organizing map in its raw form, described by (15) to (18), serves as a dimensionality-reduction and facial-representation scheme; the next step of the recognition stage is to predict the SOM-neuron membership of new (testing) facial feature vectors $Z_{ts}$ presented to the output layer of the SOM network.
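Eq. (18) amounts to replacing every codebook vector with a neighborhood-weighted mean of all training vectors. A minimal NumPy sketch of one epoch follows, with vectors stored one per row (the transpose of $Z_{tr}$). The function name is hypothetical, and this is not the SOM Toolbox [42] implementation used in the experiments; a full training run would call it repeatedly while shrinking `sigma`.

```python
import numpy as np

def batch_som_epoch(M, grid, Z, sigma):
    """One batch-SOM epoch, eq. (18).

    M:    (K x d) codebook at the start of the epoch.
    grid: (K x 2) grid coordinates of the neurons.
    Z:    (N x d) training vectors, one per row.
    """
    # eq. (17): best-matching unit c(t) of every sample, from an (N x K) distance table
    d2 = ((Z[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
    bmu = d2.argmin(axis=1)
    # eq. (16): neighborhood weight between each sample's BMU and every neuron (N x K)
    g2 = ((grid[bmu][:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    H = np.exp(-g2 / (2.0 * sigma ** 2))
    # eq. (18): weighted mean of the samples; a very small sigma can underflow
    # some column sums to zero, so keep sigma reasonable
    return (H.T @ Z) / H.sum(axis=0)[:, None]
```

Because no per-sample learning rate appears, batch updates are deterministic for a given initialization and do not depend on sample order.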
This process yields a set of contiguous neurons corresponding to a particular facial pattern in the testing dataset. Consequently, facial patterns belonging to different persons might be mapped to the same set of contiguous neurons in the output layer of the SOM network. In our model, however, the SOM network must serve as a facial recognition engine that discriminates among 40 classes, representing the subjects (person labels) of the original dataset. This requires the SOM nodes to be highly subject-oriented (strongly dependent on the person class). To enhance the uniqueness of the SOM response, we successively apply hierarchical agglomerative clustering to the SOM neurons themselves.
There are two main schemes of hierarchical cluster analysis: Agglomerative Hierarchical Clustering (HAC) and Divisive Hierarchical Clustering (DAC). In our model, we use the agglomerative scheme, as illustrated in Algorithm II. It is a bottom-up approach: each SOM codebook neuron is treated as a singleton cluster at the outset, and pairs of clusters are then merged successively. The process continues until the clusters have been merged into a number of clusters specified at the beginning of the process; this pre-specified number represents the target classes of the recognition process.
where $n_P$ and $n_Q$ are the numbers of elements in clusters $P$ and $Q$, respectively, and $p$ and $q$ denote the members of $P$ and $Q$, respectively.
5: Based on the computed distances, merge the two clusters with the least distance into one cluster, and update the while-loop index.
ENDWHILE
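Algorithm II can be sketched in NumPy as below. This is a minimal, unoptimized illustration under stated assumptions: we take the cluster distance to be average linkage (the mean pairwise Euclidean distance between members of two clusters), which the surviving fragment of Algorithm II suggests but does not confirm, and the function name `agglomerate` is hypothetical. Codebook vectors are stored one per row.

```python
import numpy as np

def agglomerate(M, n_clusters):
    """Bottom-up clustering of SOM codebook vectors (rows of M) into n_clusters groups.

    Returns an integer cluster index for every neuron.
    """
    clusters = [[i] for i in range(len(M))]        # each neuron starts as a singleton
    D = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=2)   # pairwise distances
    while len(clusters) > n_clusters:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average linkage (assumption): mean distance over all member pairs
                dist = D[np.ix_(clusters[a], clusters[b])].mean()
                if dist < best:
                    best, pair = dist, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]    # merge the two closest clusters
        del clusters[b]                            # b > a, so index a stays valid
    labels = np.empty(len(M), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```

In the proposed model, `n_clusters` would be 40, the number of ORL subjects. For large grids, a library routine such as SciPy's hierarchical clustering would be far faster than this cubic loop.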
As a post-training step, the labeled projected training data are used to analyze and label the resulting SOM clusters by majority voting over subjects (each face image in the training dataset belongs to a specific person). New facial data are then classified by the proposed recognizer into their corresponding classes.
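The labeling and classification steps can be sketched as follows. Each training vector is assigned to the cluster of its best-matching neuron, each cluster takes the majority subject among the training vectors it receives, and a test vector inherits the label of its best-matching neuron's cluster. The helper names are hypothetical, vectors are stored one per row (the transpose of $Z_{tr}$ and $Z_{ts}$), and marking clusters that receive no training vector with -1 is our assumption; the paper does not say how such clusters are handled.

```python
import numpy as np
from collections import Counter

def _bmu(M, Z):
    """Index of the best-matching neuron for each row of Z, eq. (17)."""
    return np.linalg.norm(Z[:, None, :] - M[None, :, :], axis=2).argmin(axis=1)

def label_clusters(M, neuron_cluster, Z_train, y_train):
    """Majority-vote subject label for each neuron cluster (-1 if no votes)."""
    votes = {}
    for cl, y in zip(neuron_cluster[_bmu(M, Z_train)], y_train):
        votes.setdefault(int(cl), Counter())[int(y)] += 1
    cluster_label = np.full(int(neuron_cluster.max()) + 1, -1)
    for cl, counts in votes.items():
        cluster_label[cl] = counts.most_common(1)[0][0]
    return cluster_label

def classify(M, neuron_cluster, cluster_label, Z_test):
    """Test vector -> best-matching neuron -> its cluster -> the cluster's label."""
    return cluster_label[neuron_cluster[_bmu(M, Z_test)]]
```

Here `neuron_cluster` is the per-neuron cluster index produced by the agglomerative step.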

IV. EXPERIMENTAL RESULTS
In this section, we present the facial recognition performance of the proposed system and compare it with other facial recognition systems. Moreover, a series of experiments is carried out to evaluate the efficiency and robustness of the proposed model.

A. Experimental Setup
All experiments are performed on the ORL (Olivetti Research Laboratory) dataset [34], a classical dataset of 400 images, each a 92×112-pixel grayscale image with 256 intensity levels. The dataset contains images of 40 persons (subjects), with 10 images per subject. As shown in Fig. 5, the images were taken under different lighting conditions, and for some subjects in different sessions, which introduces facial variations such as different expressions (smiling or not smiling, eyes open or closed) and different facial details (with or without glasses). Although all images were taken in an upright, frontal position, they exhibit slight left-right rotation in pose and alignment, which can be exploited to examine the robustness of the proposed system against imprecise facial alignment.
We measured performance mainly by the recognition rate, using the experimental protocol of several previous works in this field: the images of each person are randomly permuted, five images per person are used for training, and the remaining five are used for testing. To examine the efficiency and reliability of the proposed system, we also experimented with different training-set sizes per person. The results of 30 runs of each experiment were recorded, and their average is analyzed in the following experiments.
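The evaluation protocol can be sketched as below: a random per-subject split, repeated and averaged. The function names are hypothetical, and `evaluate` stands for any callable that trains and tests the recognizer on one split and returns its recognition rate; it is a placeholder, not part of the paper.

```python
import numpy as np

def split_per_subject(labels, n_train, rng):
    """Randomly pick n_train images per subject for training; the rest are for testing."""
    train, test = [], []
    for s in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == s))   # shuffle this subject's images
        train.extend(idx[:n_train].tolist())
        test.extend(idx[n_train:].tolist())
    return np.array(train), np.array(test)

def average_rate(evaluate, labels, n_train, runs=30, seed=0):
    """Mean recognition rate over `runs` random splits.

    evaluate(train_idx, test_idx) is a hypothetical callable returning one run's rate.
    """
    rng = np.random.default_rng(seed)
    rates = [evaluate(*split_per_subject(labels, n_train, rng)) for _ in range(runs)]
    return float(np.mean(rates))
```

For ORL, `labels = np.repeat(np.arange(40), 10)`, and `n_train` ranges from 5 to 9 in Experiment 2.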

B. Experiment 1
The recognition performance of the proposed model and of other recognition schemes is shown in Table I and Table II. In Table I, our model compares favorably with other facial recognizers built on SOM or SOM variants. However, according to a recently published survey [41], the SOM network has seen limited use in face recognition systems, whether as a feature extractor, a data-representation tool, or a recognition engine. Therefore, to better discuss and interpret our results, Table II compares the proposed method with existing schemes that use machine learning methods and artificial neural networks other than SOM. All schemes listed in Table I and Table II were evaluated on the ORL dataset.

C. Experiment 2: Impact of Training Dataset Volume
In this experiment, we demonstrate the impact of increasing the ratio of training data to testing data. Fig. 6 and Fig. 7 show the recognition performance achieved as this ratio increases. The experiment is set up by varying the number of training images per person from 5 to 9, and the recognition rate is computed for each case.
As the size of the training data increases, the PCA algorithm generates more representative eigenvectors, which leads to more accurate projections and a better representation of the dataset, enabling the clustered SOM network to recognize faces more accurately.

D. Experiment 3: Impact of Adding Noise
To assess the robustness of our model against additive noise, we conducted noise-sensitivity experiments on the ORL dataset, where noisy test images were generated by adding zero-mean Gaussian noise with different variances to each test image, as illustrated in Fig. 8. We ran the experiment at five noise-variance levels, and the average recognition rate over 30 runs, using 200 ORL images for training (a 5:5 training-to-testing ratio), is reported in Table III. Raw ORL images were used without any preprocessing.
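The noise model can be sketched as follows. The paper does not state its exact intensity convention, so we assume, as an illustration, that 8-bit intensities are scaled to [0, 1] before zero-mean Gaussian noise of the given variance is added, and that the result is clipped back to [0, 1]. The function name is hypothetical.

```python
import numpy as np

def add_gaussian_noise(img_uint8, variance, rng):
    """Zero-mean additive Gaussian noise on an 8-bit image scaled to [0, 1] (assumption)."""
    x = img_uint8.astype(float) / 255.0
    noisy = x + rng.normal(0.0, np.sqrt(variance), size=x.shape)   # std = sqrt(variance)
    return np.clip(noisy, 0.0, 1.0)                                 # keep a valid intensity range
```

Under this convention, the largest variance mentioned in the abstract, 0.09, corresponds to a noise standard deviation of 0.3, almost a third of the full intensity range.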

E. Experiment 4: Time Cost
The objective of this experiment is to assess the computational overhead of the proposed model: the average training and testing times for different training-to-testing ratios are recorded in Table IV. Simulations were run in MATLAB R2021a on an Intel Core i7-4500U CPU with 8 GB RAM under Windows 10, using custom code developed for this model together with the SOM Toolbox [42].

V. DISCUSSION
As can be seen from Table I and Table II, our model outperforms most methods that use SOM or SOM variants, either as a classifier or as a feature extractor, such as the regular SOM in [23], the RBF-kernel-based SOM in [42], and the concurrent-SOM technique proposed in [22].
The system proposed by Tan et al. [11] achieves an average improvement of about 3.15% over our method; however, this improvement comes at a computational cost, as a natural result of partitioning each image into sub-blocks to extract local facial features. Although Tan et al. [11] did not report the time expended, such extensive computation typically makes face recognition time-consuming and power-hungry. The same holds for [24], where SOM is used as a feature extractor that feeds a CNN with an optimal facial data representation. Zhi and Ming [35] achieved performance comparable to our method; however, they used a larger training set than ours. Moreover, with the same training-set size as [35], our system achieves an average recognition rate of up to 97.61%, as shown in Fig. 7. Although Table I reports superior results for some methods that use techniques other than PCA and SOM, these methods underperform our system in other aspects of recognition performance.
Abuzneid et al. [13] combined different types of machine learning methods, leading to a computationally intensive solution that includes increased latency when recognizing facial images in the test phase.
Moreover, the authors of [13] performed image preprocessing, including cropping, resizing, and histogram equalization, and training their backpropagation network took 25 hours, while nothing was reported about the time required for the testing stage. The method of Abuzneid et al. [13] has several cascaded computational blocks, namely LBPH, BBNN representation, and multi-KNN, which amounts to a large computational overhead.
As another example, the system proposed by Sun et al. [43] uses two types of descriptors, Local Gradient Number Pattern (LGNP) and Fuzzy Convex-Concave Partition (FCCP), to represent facial data, and a deep neural network as the classification engine, which results in a highly complex recognition system.
Lawrence et al. [24] reported that, without the preprocessing step, the error rate doubled, which means the average recognition rate would decrease to 88.5%.
The authors also reported that training the CNN took approximately 4 hours. Although Gupta et al. [21] achieved comparable recognition performance using SIFT-64 and SURF-64 facial data representations cascaded with a random-forest classifier, they used 80% of the ORL dataset to train their model, as can be noted from Table II. However, as shown in Table III, our model achieves a higher recognition rate, up to 98.16%, with this proportion of training data.
A typical example of image preprocessing that can enhance overall performance while masking many drawbacks of the recognition method itself is the system proposed by Qin et al. [44], where a down-sampling algorithm resizes the facial image to a 46×56-pixel matrix, followed by non-linear stretching for gray-image enhancement as a preprocessing step. Collaborative Representation (CR) was used as the feature extractor, whereas an enhanced KNN was used as the classification engine. In our case, we used raw images for training and testing, both to test the robustness of our system against different facial effects and to keep the computational cost to a minimum. As shown in Fig. 8 and Table III, our system can handle additive white Gaussian noise at different variance levels, including levels beyond those that typically occur in real-time photo capture.
As with all face recognizers built on machine learning techniques, one major limitation of our model is the need to retrain when new persons (subjects) are added to the database. However, as shown in Table IV, training on 200 images takes less than 4 seconds and training on 360 images less than 8 seconds, which is low enough for real-time applications.

VI. CONCLUSION
This paper has presented an agglomerative clustered SOM-based face recognition model in which regular PCA extracts the eigenfaces of the facial data for facial representation and a supervised clustered SOM serves as the recognition engine. The model is efficient in time cost for both training and testing: training on 200 images takes less than 3.7 seconds, whereas identifying a single image takes less than 7e-5 seconds. Therefore, an online-training version of the system could be used efficiently in real-life applications where the cost of training and testing is as important as recognition accuracy. The model was validated on the ORL dataset, and, based on the comparative analysis conducted in this work, its recognition performance is superior to that of methods using SOM or SOM variants. Moreover, the system is robust against additive Gaussian noise at different variance levels, and it works on raw facial data without an image preprocessing step. Using the clustered SOM with feature extractors other than PCA, or using an ensemble of SOM networks, are possible extensions for future work.