Competitive Sparse Representation Classification for Face Recognition

Abstract: A method named competitive sparse representation classification (CSRC) is proposed for face recognition (FR) in this paper. CSRC introduces a lowest-competitive deletion mechanism, which removes the least competitive training sample, judged by its ability to represent a probe, over multiple rounds of collaborative linear representation. In other words, in each round of competition, whether a training sample is retained for the next round depends on its ability to represent the input probe. Because the number of training samples used to represent the probe decreases in CSRC, the coding vector is transformed into a space of lower dimension than that of the initial coding vector. The resulting sparse representation makes CSRC discriminative for classifying the probe. In addition, owing to a fast algorithm, the FR system has a lower computational cost. To verify the validity of CSRC, we conduct a series of experiments on the AR, Extended Yale B, and ORL databases.


I. INTRODUCTION
Face recognition (FR) has become a hot research area because of its convenience in daily life. Recently, linear representation methods, which represent the probe with training samples from the gallery set, have become very popular. The collaborative representation (CR) method has achieved good performance for FR [1][2][3][4]. In CR, a given testing image $y$ is represented by a training set $A$ with a coding vector $x$, i.e. $y = Ax$. The training set $A$, which includes all samples from all subjects, is an over-complete dictionary. It is known that face images from a specific class lie in a linear subspace, so a probe can be represented by images that have the same label as the probe. Compared with the single-representation method, collaborative representation is better able to compensate the pixels of the probe. To make the coding vector more discriminative, a sparse constraint is introduced in the regularization term. Inspired by compressive sensing, the sparse constraint on the coding vector uses the $\ell_0$-norm, so that the representation problem is formulated as:

$$\hat{x} = \arg\min_{x} \|x\|_0 \quad \text{s.t.} \quad \|y - Ax\|_2 \le \varepsilon \qquad (1)$$

The $\ell_0$-norm is widely discussed and used in many studies. However, finding the sparsest solution of an underdetermined system is NP-hard, and researchers have put forward different approaches to the $\ell_0$-norm problem [5][6][7]. The sparse constraint methods in CR can now be divided into two categories: the first uses an $\ell_1$-norm constraint instead of the $\ell_0$-norm, and the second employs a supervised sparse scheme. The first CR method to use the $\ell_1$-norm in FR is Sparse Representation Classification (SRC) [1]. In SRC, the sparse representation is solved using the Lasso formulation, in which the degree of sparsity of the coding vector can be adjusted through the strength of the norm constraint. However, the $\ell_1$-norm constraint does not perform well when the training samples are highly correlated [8]. In addition, the $\ell_1$-norm based sparse representation problem is quite time consuming to solve. A two-phase sparse representation (TPTSR) using a supervised sparse constraint scheme was proposed by Xu et al. [9]. In TPTSR, the M nearest neighbors of a probe are selected based on the first phase of representation and then used as the new training set in the second phase. However, the one-time deletion in TPTSR may remove all samples from the same class as the probe when the probe is seriously distorted. In addition, TPTSR is sensitive to illumination because samples with negative coefficients are likely to be removed.
In this paper, we propose a supervised sparse constraint method named competitive sparse representation classification (CSRC). CSRC introduces a lowest-competitive deletion mechanism, which removes the least competitive samples according to the competitive ability of each training sample for representing a probe in collaborative linear representation. Only the samples with high competitiveness are used in the next collaborative representation, so the dimensionality of the coding vector in the next representation is smaller than that of the current one. Multi-phase deletion is more useful for classification than two-phase deletion [10]. Accordingly, the probe is represented repeatedly under the multi-phase deletion until the stopping condition is satisfied. The dimensionality of the final coding vector is much smaller than that of the first one, i.e. the coding vector is sparse. In CSRC, the competitive ability of the samples from the same class as the probe increases as the least competitive samples are removed. In addition, the fast algorithm of CSRC enhances the efficiency of the FR system, because it avoids recomputing the inverse matrix in each collaborative representation.

www.ijacsa.thesai.org
One advantage of CSRC is that the multi-phase deletion gradually strengthens the competitive ability of the correct class and avoids removing all samples from the probe's correct class in a one-time collaborative representation. The other is that, compared with the $\ell_1$-norm sparse constraint, CSRC has lower computational complexity owing to the fast algorithm.
This paper is organized as follows. Section II has three parts: a basic general framework for classification using competitive sparse representation, a description of the optimization method, and an analysis of the computational cost of CSRC. The features of CSRC are described in Section III. Section IV presents a series of experiments that verify the performance of our method, and Section V gives the conclusions.

II. COMPETITIVE SPARSE REPRESENTATION CLASSIFICATION
Face images from the same class lie in a linear subspace, so a query image can be represented by within-class samples [11,12]. In general, however, face recognition suffers from a lack of samples. When the number of training samples per class is not large enough, it is hard to obtain a good representation of a query image from the small set of training samples of a single subject. That is to say, the representation is far from the query image, and the recognition result is therefore unstable. The query image can, however, be represented faithfully by collaborative linear representation, in which all training samples are put in the dictionary (sometimes the dictionary is over-complete). Because more training samples participate in representing the query image in collaborative linear representation, the competitiveness of the training samples from the same class as the query decreases. As a consequence, the query image is likely to be classified into the wrong class. In the collaborative representation, however, not every sample has high competitiveness (a high coefficient value). Removing the less competitive images from the dictionary can therefore increase the competitiveness of the samples from the correct class.
The lowest-competitive deletion mechanism in CSRC deletes the least competitive training samples from the dictionary over multiple phases. This mechanism increases the competitiveness of the correct samples by removing the least competitive ones. At the same time, as samples are removed in CSRC, the dimension of the representation coefficients becomes smaller. The final representation coefficients lie in a subspace of the space of the initial representation coefficients. In other words, the representation coefficients are sparse. The sparse coding vector (the representation coefficients) carries more discriminative information.

A. Competitive Sparse Representation Classification
Given sufficient training samples of the $i$th object class, the representation framework is written as follows:

$$y = Ax + \varepsilon \qquad (2)$$

where $\varepsilon$ is the noise term. Eq. (2) can be written in ridge regression form:

$$\hat{x} = \arg\min_{x} \|y - Ax\|_2^2 + \lambda \|x\|_2^2$$

The coding vector can be computed as:

$$x = (A^{T}A + \lambda I)^{-1} A^{T} y$$

where $I$ is the identity matrix. After the first representation, the entry $x_j$ of the coding vector with the minimum absolute value is found and the corresponding training sample $a_j$ is removed. The dictionary is thus divided into two subsets: the first contains the deleted image, and the second contains the retained samples. The samples in the second subset are used as the dictionary in the second phase. Let $A_{1r}$ and $A_2$ denote the sample removed after the first representation and the training samples used in the second representation, respectively, so that the dictionary $A_1$ of the first representation is split into $A_{1r}$ and $A_2$. The test image $y$ can then be represented over the new dictionary $A_2$. The same operation is repeated in each subsequent representation phase.
Assume that the $k$th collaborative representation reaches the maximum number of deletion phases. The final coding vector $x_k$ can be represented as follows:

$$x_k = (A_k^{T}A_k + \lambda I)^{-1} A_k^{T} y$$

Since many samples have been removed from the dictionary $A_1$ used in the first representation, it is very likely that $A_k$ excludes all the samples of some classes. The original classification problem is therefore reduced to a simpler problem that contains fewer classes. The coefficients of the deleted samples are set to zero, and the coefficients associated with the $i$th class are then selected and denoted $\delta_i$, $i = 1, 2, \ldots, c$. The Euclidean distance is used to measure the distance between each class and the test image $y$, and the classification rule favors the class with the minimum distance:

$$\mathrm{ID}(y) = \arg\min_{i} r_i(y) = \arg\min_{i} \|y - A_i \delta_i\|_2, \quad i = 1, 2, \ldots, c$$

where $A_i$ holds the training samples of the $i$th class. The detailed algorithm is given as follows:
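The procedure described in this subsection can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' listing: the function names, the ridge parameter `lam`, and the stopping rule (stop when a fraction `keep_ratio` of the samples remains, mirroring the 10% candidate set used in the experiments) are hypothetical choices. For clarity the sketch re-solves the ridge regression in every round and does not use the fast inverse update of Section II-B.

```python
import numpy as np


def ridge_code(A, y, lam):
    """Ridge-regression coding vector x = (A^T A + lam*I)^(-1) A^T y."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)


def csrc_classify(A, labels, y, lam=0.01, keep_ratio=0.1):
    """Classify probe y by repeated lowest-competitive deletion.

    A          : (m, n) dictionary, one training sample per column
    labels     : (n,) class label of each column
    lam        : ridge parameter (assumed value)
    keep_ratio : fraction of samples left when deletion stops
                 (assumed stopping rule)
    """
    labels = np.asarray(labels)
    n = A.shape[1]
    n_keep = max(1, int(round(keep_ratio * n)))
    active = np.arange(n)  # indices of the retained columns

    x = ridge_code(A[:, active], y, lam)
    while active.size > n_keep:
        # The least competitive sample has the smallest |coefficient|.
        j = np.argmin(np.abs(x))
        active = np.delete(active, j)
        x = ridge_code(A[:, active], y, lam)

    # Coefficients of deleted samples are zero in the full-length vector.
    x_full = np.zeros(n)
    x_full[active] = x

    # Class-wise residuals; the class with the smallest one wins.
    classes = np.unique(labels)
    residuals = np.array([
        np.linalg.norm(y - A[:, labels == c] @ x_full[labels == c])
        for c in classes
    ])
    return classes[np.argmin(residuals)], residuals
```

A class whose samples have all been deleted contributes a zero prediction, so its residual equals the norm of the probe, which is 1 when the probe is normalized to unit length.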

B. Optimization
As is well known, the analytical solution of the above linear model is $x = (A^{T}A + \lambda I)^{-1}A^{T}y$. Because of the deletion operation, the dictionary is updated in each phase. However, the new dictionary is a subset of the previous one, so CSRC implements a fast algorithm to avoid repeated matrix inversion. Let a new symbol $\tilde{A}$ denote an elementary transformation of $A$, i.e. $\tilde{A} = AE$, where $E$ is an elementary matrix. $\tilde{A}$ can be treated as a matrix composed of two sub-matrices. The detailed derivation is as follows:

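$$
\begin{aligned}
(\tilde{A}^{T}\tilde{A} + \lambda I)^{-1}
&= ((AE)^{T}AE + \lambda I)^{-1} \\
&= (E^{T}A^{T}AE + \lambda I)^{-1} \\
&= (E^{T}A^{T}AE + \lambda E^{T}IE)^{-1} \\
&= (E^{T}(A^{T}A + \lambda I)E)^{-1} \\
&= E^{-1}(A^{T}A + \lambda I)^{-1}(E^{T})^{-1} \\
&= E^{T}(A^{T}A + \lambda I)^{-1}E
\end{aligned}
\qquad (7)
$$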
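The identity in (7) relies on $E$ being a column-swap (permutation) matrix, for which $E^{T}E = I$. A small numerical check is sketched below; the helper names and the random test data are assumptions made only for illustration.

```python
import numpy as np


def regularized_inverse(A, lam):
    """Return (A^T A + lam*I)^(-1) for a dictionary A with samples in columns."""
    n = A.shape[1]
    return np.linalg.inv(A.T @ A + lam * np.eye(n))


def swap_to_last(n, j):
    """Elementary matrix E such that A @ E swaps columns j and n-1 of A."""
    E = np.eye(n)
    E[:, [j, n - 1]] = E[:, [n - 1, j]]
    return E


rng = np.random.default_rng(0)  # synthetic data, for illustration only
A = rng.standard_normal((20, 6))
lam = 0.1
E = swap_to_last(6, 2)

lhs = regularized_inverse(A @ E, lam)         # inverse recomputed from scratch
rhs = E.T @ regularized_inverse(A, lam) @ E   # known inverse, rows/columns permuted
print(np.allclose(lhs, rhs))
```

The right-hand side only permutes rows and columns of an inverse that is already available, which is why no new inversion is needed at this step.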
The key problem in solving for the coding vector is the matrix inverse. For a convenient expression of (7), the matrix can be represented by four blocks ($O$, $P$, $C$, and $V$) as:

$$\tilde{A}^{T}\tilde{A} + \lambda I = \begin{bmatrix} O & P \\ C & V \end{bmatrix}$$

According to the elementary transformation of the matrix, the inverse matrix is transformed as follows:

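$$
(\tilde{A}^{T}\tilde{A} + \lambda I)^{-1}
= \begin{bmatrix} O & P \\ C & V \end{bmatrix}^{-1}
= \begin{bmatrix}
O^{-1} + O^{-1}P(V - CO^{-1}P)^{-1}CO^{-1} & -O^{-1}P(V - CO^{-1}P)^{-1} \\
-(V - CO^{-1}P)^{-1}CO^{-1} & (V - CO^{-1}P)^{-1}
\end{bmatrix}
\qquad (10)
$$

In fact, the ultimate goal is to obtain the inverse matrix for $A_s$, which is used as the new dictionary in the next iteration. This inverse matrix can be written as $(A_s^{T}A_s + \lambda I)^{-1} = O^{-1}$. Combining (12) with (13), (14), and (15) respectively, the four block matrices can be described as follows: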
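The update can be sketched numerically as follows. The sketch assumes that the blocks of the known inverse are named $Q_{11}$, $Q_{12}$, $Q_{21}$, $Q_{22}$ (with $Q_{22}$ a single element, as stated in Section II-C) and that the final update takes the standard Schur-complement form $O^{-1} = Q_{11} - Q_{12}Q_{22}^{-1}Q_{21}$, which follows from the block inverse in (10). The function names are hypothetical, and the permutation used here moves the deleted sample to the last position while keeping the order of the others.

```python
import numpy as np


def downdate_inverse(Q):
    """Inverse of the leading block after dropping the last row and column.

    Q is the known inverse of [[O, P], [C, V]] with V of size 1x1.
    Returns O^(-1) = Q11 - Q12 * Q21 / Q22, so no new inversion is needed.
    """
    Q11 = Q[:-1, :-1]
    Q12 = Q[:-1, -1:]
    Q21 = Q[-1:, :-1]
    Q22 = Q[-1, -1]
    return Q11 - (Q12 @ Q21) / Q22


def remove_sample(Q, j):
    """Update Q = (A^T A + lam*I)^(-1) when column j of A is deleted."""
    n = Q.shape[0]
    order = np.r_[0:j, j + 1:n, j]     # move sample j to the last position
    Q_perm = Q[np.ix_(order, order)]   # equals E^T Q E for that permutation E
    return downdate_inverse(Q_perm)
```

Each deletion then costs a rank-one correction of an $(n-1) \times (n-1)$ matrix instead of a fresh inversion.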

C. Complexity analysis
We analyze only the time complexity of (5) in this section, since it is the most time-consuming step of the CSRC algorithm. Solving (5) directly in a given collaborative representation is expensive, with time complexity

$$O(n^{2}m + n^{3}) \qquad (19)$$

However, the time complexity with which CSRC obtains the coding vector is much less than (19), since the matrix inversion in (5) is replaced by (18). Moreover, the matrix $Q_{22}$ in (18) is a single element, which saves further computation. Therefore, the complexity of CSRC is $O(n^{2}m)$. In addition, this is much less than the complexity with which SRC obtains the sparse coding vector.

III. ANALYSIS OF CSRC
Collaborative Representation based Classification (CRC) applies an $\ell_2$-norm constraint to the coding vector in the collaborative representation [13]. Although the coding vector is not sparse, it fully reflects the competitive level of each training sample in CRC. However, CRC obtains the regression model from only one collaborative representation. When the number of training samples is small, the regression model may be over-fitted. The competitive representation adopted in CSRC removes the least competitive training sample from the training set in the current round, and the remaining training samples are used in the next round. After several rounds of competition, all samples of a subject that has a low correlation with the query may be removed, so CSRC reduces the scale of the FR problem. Fig. 1, which depicts the residuals between the probe and the prediction of each class calculated by CRC and CSRC, illustrates this phenomenon on the ORL database. The upper plot is obtained by CRC and the lower one by CSRC. For more than 20 classes, the distance between the probe and the prediction is 1, which means that the training samples of these classes have been removed and their coefficients are zero. In addition, the lowest-competitive deletion mechanism in CSRC reduces the residual of the probe's correct class and enlarges the residuals of the wrong classes. From the figure, the ratios of the two smallest residuals for CRC and CSRC are 1.579 and 1.717, respectively. CSRC enlarges this ratio, which means that CSRC is more discriminative than CRC.
It is easy to see that CSRC reduces the interference of the wrong classes.

Fig. 1. The residuals obtained by CRC and CSRC for a clear testing face on the ORL database. We select the fifth image of the first man as a probe and the first five images as training samples. In the two histograms, the horizontal axis denotes the class number and the vertical axis denotes the residual between the probe and each class. Top: the two smallest residuals by CRC are 0.4922 and 0.7774, and their ratio is 1.579. Bottom: the two smallest residuals by CSRC are 0.4895 and 0.8403, and their ratio is 1.717.
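The ratio quoted for CRC can be reproduced from the two residuals in the caption. The helper below is a hypothetical illustration of that arithmetic: it divides the second-smallest class residual by the smallest one, so a larger value means a wider margin between the best and the runner-up class.

```python
def residual_ratio(residuals):
    """Ratio of the second-smallest to the smallest class residual."""
    smallest, second = sorted(residuals)[:2]
    return second / smallest


# Two smallest CRC residuals reported in the caption of Fig. 1.
print(round(residual_ratio([0.4922, 0.7774]), 3))  # 1.579
```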

IV. EXPERIMENTAL RESULTS
To evaluate the proposed CSRC algorithm, we conduct a series of experiments on images from the AR, Extended Yale B, and ORL databases, and compare CSRC with state-of-the-art methods including CRC, SRC (without the extended matrix), and TPTSR (with a candidate set of 10%). We also assess the recognition rate of CSRC (with a candidate set of 10%) on occluded testing faces. All experiments are performed in MATLAB 2014b on a desktop with a 4 GHz CPU and 8 GB of RAM.

AR database
The AR database consists of more than 4000 color face images of 126 people (70 men and 56 women) [14]. Each person has 26 images, which include frontal views of the face with different facial expressions, illumination and occlusion. The pictures of each individual were taken in two sessions (separated by two weeks). Each session contains 13 color images, and 120 individuals (65 men and 55 women) participated in both sessions. The images of 100 of these individuals (50 women and 50 men) were selected and used in our experiment. The faces used to test the methods are converted to grayscale and normalized to 50×40 pixels. For each class, we select the first seven faces in session one as training samples and the first seven faces in session two as testing samples; the faces of a specific class are shown in Fig. 2. The recognition rates of the four methods are shown in Tab. 1. Since the testing samples and training samples were collected at different times, none of the methods reaches a 100% recognition rate. Since the training samples are highly correlated, the sparse representation by SRC does not perform well. Because CRC cannot increase the competitive ability of the samples from the same class as the probe, CRC has a lower recognition rate than TPTSR and CSRC. Furthermore, TPTSR has a lower recognition rate than CSRC, because TPTSR is likely to delete all images from the correct class in the first phase. The experiment shows that the lowest-competitive deletion mechanism in CSRC does give the coding vector more discriminative information.

Extended Yale B database
The Extended Yale B face database contains 38 persons under 64 illumination conditions [15,16]. A subset containing 31 individuals is used in this experiment. The 64 images of a person in a particular pose are acquired at a camera frame rate of 30 frames/second, so there is only a small change in head pose and facial expression across those 64 images. Each image is resized to 50×40 pixels in our experiment. Several frontal faces of one person are shown in Fig. 3. Illumination is well known to be another major challenge for face recognition.
The faces were captured under various laboratory-controlled lighting conditions. The samples in subset 1 (seven images per person), taken under nominal lighting conditions, were used as the gallery. Because test subsets 2 and 3 contain only slight-to-moderate luminance variations, we use the faces in subset 4 to verify the CSRC method. Because of the more extreme illumination conditions, the recognition rates on subset 4 are not very high. From Tab. 2, CSRC is better than CRC on the illuminated test images, which means that the deletion mechanism makes the classification more discriminative. In addition, CSRC has a recognition rate about 22% higher than that of TPTSR. The reason is that the training samples from the same class as the probe are easily removed in the two-phase deletion of TPTSR, whereas samples with negative coefficients are not deleted in CSRC. In contrast, through the multi-phase competitive deletion, CSRC reduces the risk that the correct training samples are removed all at once.

For each subject in the ORL database, we choose the first five images as the training images and use the remaining images for testing. From Tab. 3, the recognition rates of the four methods are close. Because of the deletion operation in CSRC, the sparse coding vector is more discriminative than that of CRC, so CSRC has a recognition rate 1.5% higher than CRC.

B. Recognition with sunglasses and scarf
In this section we test the ability of CSRC to cope with real, possibly malicious occlusions using a subset of the AR database. The chosen subset consists of 1200 images of 100 subjects, 50 male and 50 female. For each subject, eight frontal faces without occlusion (half from session one and half from session two) are used as training samples. As testing samples we select the face images with sunglasses (two samples per subject, each with about 20 percent occlusion) and with a scarf (two samples per subject, each with approximately 40 percent occlusion of the face); the testing samples of a specific subject are shown in Fig. 5. The recognition rates of TPTSR and CSRC are shown in Tab. 4. CSRC performs slightly better than TPTSR on the testing samples with sunglasses. In the scarf case, the recognition rate of CSRC is 26% higher than that of TPTSR. Because the scarf covers almost 40% of the face, it is likely that the images of the same class as the probe are deleted in the first collaborative representation in TPTSR. In contrast, CSRC ensures that the images of the probe's correct class remain highly competitive.

V. CONCLUSIONS
In this paper, a competitive representation framework is proposed to solve the sparse representation problem. The lowest-competitive deletion mechanism reduces the number of training samples that compete to represent a probe and enhances the competitive ability of the samples from the same class as the probe. Moreover, the fast algorithm makes the FR system more efficient. According to the experiments, multiple rounds of competitive representation generally perform better than two-phase deletion. In addition, CSRC adaptively reduces the over-fitting of the regression model. However, CSRC also has some disadvantages, such as insufficient robustness to occlusions, disguises, and corruption. In the future, we will pay more attention to these disadvantages.
In the $\ell_0$-norm formulation of the Introduction, $\|\cdot\|_0$ denotes the $\ell_0$-norm, which counts the number of nonzero entries of the coding vector, and $\varepsilon$ is a small error tolerance.
In the ridge regression form of Section II-A, $\lambda$ is a Lagrangian coefficient. Since CSRC deletes only one training sample in each phase, it removes the least competitive sample according to the corresponding entries of the coding vector, i.e., it finds the entry with the minimum absolute value.

In Section II-B, the two sub-matrices of the transformed dictionary denote the deleted sample and the new training samples, respectively. Since the matrix $A$ is given, the inverse matrix $(A^{T}A + \lambda I)^{-1}$ is known, and $O$, $P$, $C$ and $V$ are likewise already known from $A$. A group of new symbols is introduced to express the four blocks of this inverse matrix.

The ORL database, created by the AT&T laboratory in Cambridge, contains 400 face images of 40 subjects, i.e. each individual provides 10 face images. The images include expression variations and posture changes in multiple directions, with scale changes within 20%. The dimensionality of each face is reduced to 50×40. All ten face images of one subject are shown in Fig. 4.

Fig. 2. Frontal faces with expression and illumination changes from the AR database. The top seven faces are from session one and the bottom seven are from session two.

Fig. 4. Ten images of a specific class from the ORL database. The top row shows the training samples and the bottom row shows the testing samples.

Fig. 5. Face images with sunglasses and with a scarf from the AR database.

TABLE I. RECOGNITION RATES (%) ON AR DATABASE

Fig. 3. Some sample faces of a subject from the Extended Yale B database. Top row: seven images with moderate illumination variations from subset 1. Bottom row: some of the images with large illumination variations from subset 4.

TABLE III. RECOGNITION RATES (%) ON ORL DATABASE

TABLE IV. RECOGNITION RATES (%) ON AR DATABASE FOR SUNGLASSES AND SCARF