Face Retrieval Based on Local Binary Pattern and Its Variants: A Comprehensive Study

Face retrieval (FR) is a specific field of content-based image retrieval (CBIR). Its aim is to search for relevant faces in a large database based on the contents of the images rather than their metadata. It has many applications in important areas such as face searching, forensics, and identification. In this paper, we experimentally evaluate face retrieval based on Local Binary Pattern (LBP) and its variants: Rotation Invariant Local Binary Pattern (RILBP) and Pyramid of Local Binary Pattern (PLBP). We also use a grid-based LBP operator, which divides an image into sub-regions and then concatenates the LBP feature vectors from each of them into a spatially enhanced feature histogram. These features were first tested on three frontal face datasets: The Database of Faces (TDF), Caltech Faces 1999 (CF1999), and the combination of The Database of Faces and Caltech Faces 1999 (CF). Good results on these datasets encouraged us to conduct tests on Labeled Faces in the Wild (LFW), whose images were taken in real-world conditions. Mean average precision (MAP) was used to measure the performance of the system. We carry out the experiments in two main stages, indexing and searching, with the use of k-fold cross-validation. We further boost the system by using Locality Sensitive Hashing (LSH) and evaluate the impact of LSH on the searching stage. The experimental results show that LSH is effective for face searching and that LBP is a robust feature for frontal face retrieval.
Keywords: Face Retrieval; LBP; PLBP; Grid LBP; LSH


I. INTRODUCTION
The principle of the face retrieval problem is to compare faces so that those with a higher "likelihood" appear near the top of the result list. In practice, this likelihood is represented by a number, which is calculated by a distance function. The next problem is to quantify faces in a way that a computer can understand. A face can be seen as a combination of smaller parts such as eyes, nose, and lips. From that assumption, our approach relies on global features because they capture the relative position of these parts in a face, or in other words its global shape. Furthermore, the LBP feature characterizes local object appearance rather well. Finally, we choose the Euclidean distance to compare LBP feature vectors because it is simple to implement and has a low computation cost.
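As an illustration of ranking by distance, a minimal Python sketch might look as follows. The function names and the toy three-bin vectors are assumptions made for this example and are not taken from our implementation.

```python
import math

def euclidean(u, v):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_by_distance(query, database):
    """Return database keys sorted from most to least similar to the query."""
    return sorted(database, key=lambda name: euclidean(query, database[name]))

# Toy 3-bin histograms standing in for real LBP feature vectors.
database = {"face_a": [4, 0, 1], "face_b": [1, 3, 1], "face_c": [0, 0, 5]}
ranking = rank_by_distance([4, 1, 0], database)
```

The smaller the distance, the higher the face appears in the result list, which is the ordering used throughout the rest of the paper.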
Despite having applications in important areas (face finding, medical, forensics, etc.), retrieving images of faces with high performance in real-world conditions is a difficult task due to technical and practical challenges. Besides the semantic gap, there are challenges that we should consider when designing a face retrieval system. The first challenge is measuring similarity among images, which is done by comparing their feature vectors. A good method will increase the performance of the system while keeping the computational time under control. The next challenge is how well the system works in real life. It is difficult to design a system that works well in all conditions, due to broad domains (faces of many people in the database), high variance of content (images captured in daylight/night, bad/good weather, indoors/outdoors…), capture settings (high/low brightness, HDR, etc.), among many others. Finally, understanding user needs and intentions is the key. This involves the user interface as well as the accuracy of the system. For example, a face retrieval application in forensics, which requires high accuracy and fast search time to find a criminal in a very large database, is different from an application for finding famous people who look like you, which should be easy to use and does not require high accuracy.
A face retrieval system consists of indexing, organizing, annotating, and retrieving visual information about faces. Its goal is to retrieve the desired face images from the given information (a face image). In this study, we examine the effect of LBP and its extensions on indexing and retrieving to find out whether good features can increase system performance. In Figure 1, we introduce a face retrieval system that focuses on the importance of the feature extraction and indexing processes. They are the key issues because of the role of the index in an FR system. Feature extraction decides which feature(s) are saved in the index, while the indexing process influences the way the searcher works and the execution time of the searching process. In our tests, we used pre-cropped face images, so no face detection module is used. In an application, this module is needed to extract the face from a user-provided image. It can easily be built using existing functions from the OpenCV library.
The main points of this paper are as follows. Firstly, we present a face retrieval system designed to work with global features. This system focuses mainly on the indexing process because it decides how the searching process is carried out. In addition, time is an important factor in FR, so the way the index is built greatly influences the searching time. Secondly, we empirically evaluate LBP and its variants for face retrieval on several datasets that focus on frontal faces. Each of these datasets has some variation, for example, differences in facial detail or facial expression… Results from these datasets help us understand the effect of dataset properties on the system. Furthermore, we also test our approach on the Labeled Faces in the Wild (LFW) dataset. LFW puts the face retrieval system in real-world conditions by using images collected from around the web. Testing on this dataset determines the practical applicability of the tested features. Thirdly, we boost the system by using LSH. We also conducted tests to compare it with the traditional nearest neighbor method. The tests cover indexing time, searching time, and searching time on datasets of different sizes.
The rest of this paper is structured as follows. In Section II, we present related work. The original LBP and its variants are introduced in Section III. We discuss the indexing process of FR in Section IV. Experimental results and discussion are presented in Section V. Finally, Section VI draws conclusions and indicates future studies.

A. Content-based image retrieval
The origin of the term "content-based image retrieval" is traced in [1] to Kato (1992), who used it to describe his experiments on the automatic retrieval of images from a database using color and shape features. Since then, the term has been widely used to describe the process of retrieving images from a large collection using features that can be automatically extracted from the images. These features can be primitive or semantic, but the extraction process must be predominantly automatic.
The vital difference between CBIR and classical information retrieval (IR) is that image databases differ from text databases. A digital image can be seen as an array of pixel intensities, which we cannot directly interpret, while text (a collection of words stored as ASCII strings) is logically organized by the author. When a search is run on the database, an IR searcher uses metadata or full text in the index, whereas a CBIR searcher uses a distance function to measure the similarity between the query image's feature(s) and the feature(s) of the images in the database. Therefore, searching images by manually assigned keywords is not CBIR.
The following key issues are mentioned in [1] as important needs when designing a CBIR system:
• Understanding the user's needs and intentions.
• Choosing feature(s) that balance computing cost and retrieval performance.
• Providing compact storage for bulk images as well as efficient ways to access, update, and delete them.
• Designing a good method that reflects human similarity judgments to match query and stored images.
CBIR uses many methods from the fields of image processing and computer vision, since these fields are involved in feature extraction and image preprocessing (resizing, adjusting contrast, cropping…). We can think of CBIR as a combination of databases with these methods in a way that creates a working system for searching images based on their content.

B. Face retrieval
We can simply regard a face retrieval system as a CBIR system that works with raw pixel values from face images. In 1987, a face retrieval application, FRAME (Face Retrieval and Matching Equipment), was introduced to help the police in their investigations [2]. In the early form of FR presented in [3], the indexing stage is not fully automatic due to the lack of good face component detection, so there is a module that allows the user to add additional information and modify the result of face component detection. There are also some limitations: for example, the feature extraction was complicated and computationally costly, and the system still depended on supplementary textual information to increase retrieval performance… We can state that this is not truly a CBIR system, but this work pointed out the important parts of face retrieval: feature extraction, the way images are stored and accessed in the database, a distance function to measure image similarity, and how an image is used as a query to search the database.
Later, in 1994, MIT's Media Laboratory presented a famous FR system whose core is the eigenfaces database [4-5]. What is special about this system is that it was trained with a large face image database to compute 20 features (called eigenfeatures). These features can characterize any human face at a higher level of information, which is more robust and brings better precision than raw pixel values. The drawbacks of this method are that the system needs training before further processing and that it cannot be completely automated for broad domains of images [6].

C. Local Binary Pattern
LBP was first introduced in [7] as a powerful feature for texture classification. It was later used for face recognition [8-10] due to its ability to represent face images [10-12]. It has also been used for facial expression recognition [13], gender classification [14], human detection [15], etc., and combined with a block-based method for image retrieval [16]. Many works have extended its invariance against rotation [17] and scale [18].

III. LBP AND ITS VARIANTS
In this section, we briefly introduce the original LBP and its extensions: Rotation Invariant Local Binary Pattern, Pyramid of LBP, and Grid LBP. Each sub-section covers the idea behind the operator and the process of calculating its histogram.

A. Local Binary Pattern
An LBP operator assigns each pixel a code (called a Local Binary Pattern code or LBP code), which is calculated by thresholding its neighborhood with the center value (Figure 2). As a consequence, LBP is invariant to monotonic gray-scale changes. It also represents local primitives such as curved edges, spots, corners, and so on [11]. In Figure 3, the neighborhood is expanded to capture dominant features with large-scale structures. The neighborhood can be denoted by a pair $(P, R)$, where $P$ is the number of sampling points on a circle of radius $R$. Therefore, there are $2^P$ different output values.
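The thresholding step can be sketched in Python for the basic 3x3 case (8 neighbors, radius 1). This is a minimal sketch, not our implementation: the function name, the bit ordering (clockwise from the top-left corner), and the toy patch are assumptions made for the example.

```python
def lbp_code(patch):
    """LBP code of the centre pixel of a 3x3 patch (8 neighbours, radius 1)."""
    center = patch[1][1]
    # Neighbours visited clockwise from the top-left corner; the starting
    # point and direction are a convention assumed for this sketch.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:  # threshold against the centre value
            code |= 1 << bit
    return code

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
code = lbp_code(patch)

# A monotonic gray-scale change keeps every comparison, hence the code.
brighter = [[2 * v + 3 for v in row] for row in patch]
same_code = lbp_code(brighter)
```

Because only the sign of each comparison with the center matters, the brightened patch yields the same code as the original one.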
Ojala's study [7] showed that some bins in the LBP histogram contain more information than others. These bins are called uniform patterns. An LBP code is uniform if it contains at most two bitwise transitions from zero to one or vice versa when the binary string is considered circular (Figure 5). It was shown that uniform patterns account for nearly 90% of all patterns in the $(8, 1)$ neighborhood and for about 70% in the $(16, 2)$ neighborhood.
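The uniformity test is easy to check by enumeration. The following sketch (function names are assumptions for illustration) counts circular bit transitions for every 8-bit code and derives the number of histogram bins obtained when all non-uniform codes share a single bin.

```python
def transitions(code, bits=8):
    """Number of 0/1 transitions in the circular binary string of `code`."""
    count = 0
    for i in range(bits):
        a = (code >> i) & 1
        b = (code >> ((i + 1) % bits)) & 1
        count += a != b
    return count

def is_uniform(code, bits=8):
    """A code is uniform if it has at most two circular transitions."""
    return transitions(code, bits) <= 2

uniform_codes = [c for c in range(256) if is_uniform(c)]
# Uniform codes get one bin each; all other codes share one extra bin.
n_bins = len(uniform_codes) + 1
```

For 8 sampling points this gives 58 uniform codes and therefore a 59-bin histogram.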
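A histogram of an image $f_l(x, y)$ labeled by the LBP operator can be defined as follows:
$$H_i = \sum_{x, y} I\{f_l(x, y) = i\}, \quad i = 0, \ldots, n - 1$$
where $n$ is the number of different labels produced by the LBP operator and $I\{A\}$ equals 1 if $A$ is true and 0 otherwise. This LBP histogram is statistical: each local micro-pattern in Figure 3 has its own LBP code, so a histogram over the whole image represents their distribution. Therefore, it can describe image characteristics.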
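As a small worked example of the histogram, the sketch below counts labels over a toy image whose pixels already hold LBP labels. The function name and the toy data are assumptions made for illustration.

```python
def lbp_histogram(labels, n_bins):
    """Count how often each LBP label occurs in a labelled image."""
    hist = [0] * n_bins
    for row in labels:
        for label in row:
            hist[label] += 1
    return hist

# A toy 2x3 "image" whose pixels already hold LBP labels in 0..3.
labels = [[0, 1, 1],
          [3, 1, 0]]
hist = lbp_histogram(labels, n_bins=4)
```

The bins always sum to the number of labeled pixels, so the histogram describes how the micro-patterns are distributed over the image.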

B. Rotation Invariant Local Binary Pattern
In Figure 4, we can see that an edge pattern has an LBP code of 26, but if we rotate this pattern 90° clockwise its LBP code is 7. Rotation therefore places these patterns into different bins. In general, if we compare an image with its rotated version, the computer will consider the two images different based on their LBP histograms. To extend LBP's robustness against rotation, Ojala et al. [17] provided a simple yet effective operator called $LBP^{ri}_{P,R}$. As in Figure 6, with an 8-point neighborhood this operator generates 36 different values that represent 36 unique rotation-invariant LBP patterns. We can obtain these codes easily by rotating the LBP code:
$$LBP^{ri}_{P,R} = \min\{ROR(LBP_{P,R}, i) \mid i = 0, 1, \ldots, P - 1\}$$
where $ROR(x, i)$ is a circular bit-wise right shift of $x$ by $i$ positions. It is also mentioned in [17] that this operator does not discriminate well, because some patterns distract the analysis and the angular space is crudely quantized at 45° intervals. To overcome this problem, the 36 patterns were reduced to 9 patterns, which are numbered in Figure 6. The process of calculating these patterns is denoted by:
$$LBP^{riu2}_{P,R} = \begin{cases} \sum_{p=0}^{P-1} s(g_p - g_c) & \text{if } U(LBP_{P,R}) \le 2 \\ P + 1 & \text{otherwise} \end{cases}$$
where:
• $s(x)$ is 1 if $x \ge 0$ and 0 otherwise, $g_c$ is the center pixel and $g_p$ $(p = 0, \ldots, P - 1)$ are its neighbors.
• $U$ is a function that counts the number of spatial transitions in the binary code, denoted by:
$$U(LBP_{P,R}) = |s(g_{P-1} - g_c) - s(g_0 - g_c)| + \sum_{p=1}^{P-1} |s(g_p - g_c) - s(g_{p-1} - g_c)|$$
There is also an operator that considers a 5x5 square containing a 16-pixel neighborhood, but we do not go into detail about it since we use the original LBP operator in our tests.
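Both mappings can be checked by enumerating all 8-bit codes. The sketch below is illustrative only; the function names are assumptions, and it operates on already computed LBP codes rather than on pixels.

```python
def rotate_right(code, shift, bits=8):
    """Circular bit-wise right shift of a `bits`-wide code."""
    shift %= bits
    mask = (1 << bits) - 1
    return ((code >> shift) | (code << (bits - shift))) & mask

def rotation_invariant(code, bits=8):
    """Smallest value reachable by rotating the code."""
    return min(rotate_right(code, i, bits) for i in range(bits))

def riu2(code, bits=8):
    """Number of set bits for uniform codes, bits + 1 for all others."""
    # XOR with the code rotated by one marks every circular transition.
    n_transitions = bin(code ^ rotate_right(code, 1, bits)).count("1")
    return bin(code).count("1") if n_transitions <= 2 else bits + 1

ri_values = {rotation_invariant(c) for c in range(256)}
riu2_values = {riu2(c) for c in range(256)}
```

Enumerating the 256 codes yields 36 distinct rotation-invariant values. The second mapping yields the 9 uniform labels 0 to 8, plus the label 9 shared by all non-uniform codes.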

C. Pyramid of LBP
LBP was extended to handle texture resolution by cascading the hierarchical spatial pyramid information of LBP [18]. Details of this process are shown in Figure 7. Firstly, edge contours are extracted from the input image by Canny edge detection, which uses a Sobel filter without Gaussian smoothing. Then the image is partitioned into pyramid levels. In each level, instead of calculating gradients, we use the LBP operator to get a histogram from each sub-region in that level. Finally, these histograms are concatenated in order of level to form the PLBP feature vector.

D. Grid LBP
A face image is a combination of eyes, nose, lips… Each component can be seen as a micro-pattern that is effectively represented by the LBP histogram. This can be done by equally dividing face images into $m$ small regions $R_0, R_1, \ldots, R_{m-1}$. The LBP histograms obtained from these regions are concatenated into a single, spatially enhanced feature histogram, which is defined as follows:
$$H_{i,j} = \sum_{x, y} I\{f_l(x, y) = i\}\, I\{(x, y) \in R_j\}$$
where $i = 0, \ldots, n - 1$, $j = 0, \ldots, m - 1$, $f_l(x, y)$ is an image labeled by the LBP operator, and $I\{\cdot\}$ is the indicator function.
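The concatenation of regional histograms can be sketched as follows. This is a minimal sketch under assumptions: the function name, the row-major region order, and the integer region bounds are choices made for the example, and the input is an image that already holds LBP labels.

```python
def grid_histogram(labels, rows, cols, n_bins):
    """Concatenate per-region label histograms, region by region."""
    height, width = len(labels), len(labels[0])
    feature = []
    for gr in range(rows):
        for gc in range(cols):
            hist = [0] * n_bins
            # Integer bounds split the image into near-equal regions.
            for y in range(gr * height // rows, (gr + 1) * height // rows):
                for x in range(gc * width // cols, (gc + 1) * width // cols):
                    hist[labels[y][x]] += 1
            feature.extend(hist)
    return feature

# Toy 4x4 label image with labels in 0..2, split into a 2x2 grid.
labels = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [2, 2, 1, 0],
          [2, 2, 0, 1]]
feature = grid_histogram(labels, rows=2, cols=2, n_bins=3)
```

The position of each block of `n_bins` values inside `feature` encodes where the region lies on the face, which is what makes the histogram spatially enhanced.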
The relative position of each sub-region's histogram indicates its location on the face image. This represents the global shape of face images as well as the local texture. Therefore, it is intuitive that the LBP feature can represent face images [10-12]. Following the setting in [10], we choose the uniform LBP operator with 8 sampling points (59 bins) and divide face images into 6x7 sub-regions (Figure 8). Thus the final histogram has a length of 2478 (59x42).

IV. INDEXING
To speed up the searching process at a low cost in computation and space, we use locality sensitive hashing. The idea behind LSH [19] is to use feature vectors to create random projections and then combine them into a hash string. An LSH function maps a feature vector $v$ to a collection of integers, each of which is calculated as follows:
$$h(v) = \left\lfloor \frac{a \cdot v + b}{w} \right\rfloor$$
where $a$ is a vector of independent random values drawn from a normal distribution, the constant $w$ affects the granularity of the results, and $b$ is a random number uniformly distributed in $[0, w)$.
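A hash function of this kind can be sketched with the Python standard library. The names `make_hash` and `make_hash_tuple`, the tuple length `k`, and the parameter values are assumptions made for illustration; they are not the settings used in our experiments.

```python
import math
import random

def make_hash(dim, w, rng):
    """Draw one hash function h(v) = floor((a . v + b) / w)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # normally distributed
    b = rng.random() * w                            # uniform offset in [0, w)
    def h(v):
        projection = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((projection + b) / w)
    return h

def make_hash_tuple(dim, k, w, rng):
    """Bundle k hash functions; a vector maps to a tuple of k integers."""
    funcs = [make_hash(dim, w, rng) for _ in range(k)]
    return lambda v: tuple(f(v) for f in funcs)

rng = random.Random(0)  # fixed seed so indexing and searching agree
g = make_hash_tuple(dim=4, k=3, w=4.0, rng=rng)
key_a = g([1.0, 2.0, 3.0, 4.0])
key_b = g([1.0, 2.0, 3.0, 4.0])
```

The random parameters are drawn once and reused, so the same vector always maps to the same tuple of integers, while a larger `w` makes the buckets coarser.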
There are different ways of using LSH depending on the type of problem; more detail can be found in [20]. In the retrieval problem, the parameters $a$, $b$, and $w$ are initialized once, before indexing, and kept unchanged. In the searching stage, because of the use of the $L_2$ distance (Euclidean distance), we assume two images can be treated as nearest neighbors if a sufficient number of the hashes in their hash tuples match. For implementation, a hash bundle is randomly generated before being used in indexing and searching. To draw normally distributed random numbers for $a$, the Box-Muller transform [21] is used as follows:
$$z_0 = \sqrt{-2 \ln u_1}\, \cos(2\pi u_2), \qquad z_1 = \sqrt{-2 \ln u_1}\, \sin(2\pi u_2)$$
where $z_0$, $z_1$ are variables for $a$ and $u_1, u_2 \in (0, 1]$ are uniformly distributed random numbers.
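The Box-Muller transform can be sketched in a few lines of Python. The function name, the seed, and the sample count are assumptions made for this example.

```python
import math
import random

def box_muller(rng):
    """Turn two uniform samples into two independent N(0, 1) samples."""
    u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u1) is defined
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    return (radius * math.cos(2.0 * math.pi * u2),
            radius * math.sin(2.0 * math.pi * u2))

rng = random.Random(42)
samples = [z for _ in range(5000) for z in box_muller(rng)]
mean = sum(samples) / len(samples)
variance = sum((z - mean) ** 2 for z in samples) / len(samples)
```

With enough samples, the empirical mean and variance should be close to 0 and 1, which is a quick sanity check before the values are used as entries of the projection vector.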
In the indexing process, after feature vectors are obtained from the feature extractor, each is projected into hashes. Then the information representing an image (path, feature vector, hashes) is packed. To index that image, the indexer treats the hash information from the package as a string so that it can index this string in an inverted index. Searching an existing index is done by generating hashes from the query image's feature vector and using them as a string query. The result of this process is a list of items (each item being a package), which is then sorted by the similarity of the query image's feature vector to each item's feature vector.
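The indexing and searching steps described above can be sketched with an in-memory dictionary standing in for the inverted index. This is a simplified sketch under assumptions: a single hash string per image, hand-written hash values, and hypothetical names (`hash_string`, `build_index`, `search`); a real system would use a search engine index and hashes produced by the LSH functions.

```python
from collections import defaultdict

def hash_string(hashes):
    """Serialise a tuple of integer hashes into one index term."""
    return "_".join(str(h) for h in hashes)

def build_index(packages):
    """Map each hash string to the packages (path, vector, hashes) under it."""
    index = defaultdict(list)
    for package in packages:
        index[hash_string(package["hashes"])].append(package)
    return index

def search(index, query_vector, query_hashes, top_n):
    """Look up candidates by hash string, then sort them by L2 distance."""
    candidates = index.get(hash_string(query_hashes), [])
    def squared_distance(p):
        return sum((a - b) ** 2 for a, b in zip(query_vector, p["vector"]))
    return [p["path"] for p in sorted(candidates, key=squared_distance)[:top_n]]

packages = [
    {"path": "a.png", "vector": [1.0, 0.0], "hashes": (3, -1)},
    {"path": "b.png", "vector": [0.9, 0.2], "hashes": (3, -1)},
    {"path": "c.png", "vector": [9.0, 9.0], "hashes": (7, 2)},
]
index = build_index(packages)
results = search(index, [0.8, 0.2], (3, -1), top_n=10)
```

Only the packages that share the query's hash string are compared with the query vector, which is why the search cost grows slowly with the size of the database.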

A. Datasets
The Database of Faces (TDF) [22] includes images of 40 distinct subjects. Each subject has ten different images ( pixels), each taken against a dark background. This dataset features variation in lighting, facial expression (open/closed eyes, smiling/not smiling), and facial details (glasses/no glasses).
Caltech Faces 1999 (CF1999) [23] differs from the previous dataset by adding backgrounds and facial expressions to its images. However, there is no difference in facial detail. This database has 447 images ( pixels) of 27 unique people. We crop and resize the images in this database so that only the frontal face remains in each image.
The combination of CF1999 and TDF (CF) was created to test the robustness of LBP features against an increasing number of images and a greater diversity of data properties.
Labeled Faces in the Wild (LFW) [24] contains 13,233 cropped images ( pixels), which were collected from around the web. All the images were taken in real-world conditions. As we observed, these images also differ in face pose, age, blur… This makes the database far more challenging than any other database used in our tests.

B. Mean Average Precision (MAP)
To measure the retrieval effectiveness based on query results, we use MAP, which is defined as follows:
$$MAP = \frac{1}{Q} \sum_{q=1}^{Q} \frac{1}{m_q} \sum_{k \in K_q} \frac{c_q(k)}{k} \qquad (7)$$
where $Q$ is the number of queries, $K_q$ is the set of positions of correct items for query $q$, $c_q(k)$ is the number of correct items among the first $k$ recommendations, and $k$ is the position (starting from 1) of an item in the recommendation list, which has $N$ items. If a person has more than $N$ correct images but we request a list of only $N = 10$ items, then $m_q = N$; otherwise $m_q$ is the number of correct images. Equation (7) means that the more correct items appear in the recommendation list and the higher their positions in this list, the higher the MAP.
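A worked example may help to read the measure. The sketch below computes average precision over a truncated list and normalizes by the smaller of the number of correct images and the list length; this normalization is our reading of the definition above and should be treated as an assumption, as are the function names and the toy queries.

```python
def average_precision(relevance, n_relevant):
    """AP over a ranked list; relevance[k-1] is True if rank k is correct."""
    if not relevance or n_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for k, correct in enumerate(relevance, start=1):
        if correct:
            hits += 1
            total += hits / k  # precision at this correct position
    return total / min(n_relevant, len(relevance))

def mean_average_precision(queries):
    """queries: list of (relevance list, number of relevant images)."""
    return sum(average_precision(r, n) for r, n in queries) / len(queries)

queries = [
    ([True, False, True, False], 2),   # AP = (1/1 + 2/3) / 2
    ([False, True, False, False], 1),  # AP = (1/2) / 1
]
score = mean_average_precision(queries)
```

In this toy case the two average precisions are 5/6 and 1/2, so the MAP is 2/3; a list whose every position is correct scores 1.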

C. Evaluation Framework
We prepared the images of each dataset for the later stages with a preprocessing stage that included cropping, resizing (using the Lanczos algorithm), and converting to gray scale. After that, the images were divided into three equal parts: two (as the training set) for indexing and one (as the testing set) for searching. The evaluation of a dataset ended when every part had been used for querying: after querying with the first part, the process (indexing and searching) was repeated with the second part (the other two parts being used for indexing) and then with the final part.
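The rotation of the three parts can be sketched as follows. The function name and the interleaved way of forming the parts are assumptions made for the example; how the images were actually assigned to parts is not specified here.

```python
def k_fold_splits(items, k=3):
    """Yield (index_set, query_set) pairs, each part serving once as queries."""
    parts = [items[i::k] for i in range(k)]  # k near-equal parts
    for i in range(k):
        index_set = [x for j, part in enumerate(parts) if j != i for x in part]
        yield index_set, parts[i]

images = ["img%02d.png" % i for i in range(9)]
splits = list(k_fold_splits(images))
```

Each image is used for querying exactly once and for indexing in the other two rounds.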

D. Results and discussion
We ran tests to show the computational trade-off with and without LSH. The LFW result carries more weight than the rest in Table I because real-world applications usually deal with large image databases. The 63.3% lower search time of the LSH method is significant compared with the small increase in indexing time shown in Table II. The results in Figure 10 show the effectiveness of LSH compared with the traditional method (nearest neighbor). We timed 500 queries on dataset sizes from 500 to 12000 images. Each query returns a list of 50 items (recommended faces). The images used for the queries and the datasets were randomly taken from the LFW dataset. Clearly, the traditional method is good if the size of the dataset is under 4000. Beyond that point, its execution time grows rapidly, while the time of LSH increases only slightly and remains under 180 milliseconds/query even when the size is 12000 images.
We conducted the tests with different numbers of returned items per query. This shows how the probability of finding the proper faces changes as one looks further down the list. In the later part, we mainly use the MAP results for lists of 10 items. Results of queries from our system are illustrated in Figures 11 and 12. Figure 11 shows some results from queries that use and . Figure 12 shows some results from queries that use on and . These results show the power of and compared to and , respectively.
Overall, the operators that partition the image into smaller regions are the best methods. It was shown that , and yield better results than the others. The MAP on the LFW dataset is very low compared to the results on the other datasets, but it shows that, in general, the uniform version is better and image partitioning greatly improves retrieval performance.
TDF expands on CF1999 by adding different facial details and some sideways movement of the face, but its image background is constant (dark background). In Table IV, performed nearly 250% better (0.4411/0.4740) compared to its MAP on CF1999. There is also a slight increase in the MAP of , but its score is still very low (0.1868/0.1686). As on the CF1999 dataset, has a higher MAP than , and they still hold the second and third highest scores. While the other operators improved in MAP, decreased significantly from 0.8540 on CF1999 to 0.6490 on TDF. From the results in Table V, it can be seen that, except for and , the MAP scores of all the other operators are higher than their CF1999 scores and lower than their TDF scores. continues to perform poorly on this dataset, with the lowest MAP of 0.1379/0.1282. On the other hand, survived the test and showed a better score than on the TDF dataset.
We put LBP and its extensions to a real-world test with the LFW dataset. This dataset includes the properties of the previous datasets as well as real-life challenges: different light setups, blur, noise, partly covered faces, and the same person at different ages. All the operators and methods perform very poorly. Table VI shows that none of the MAP scores on this dataset is over 0.002, except for some results from the lists of 50-70 items of . This dataset shows the trend that the uniform version has a higher MAP than the original.
From the user's point of view, we are likely to look for the correct items in the first 10-20 results. So we counted the number of queries that have no correct item in their first 10 returned items and divided it by the total number of queries in each dataset. From Table VII, we can find out which feature is better when used in a real-life application. continues to show its impressive ability in practice: its error on the first three datasets is always below 1% and, despite a high error rate on the LFW dataset, it has the lowest error rate among the tested features. There are also two potential features that can be used in practice although their MAP
In general, we cannot use LBP and its extensions in practice, but two features showed promising potential: grid and . The biggest challenges these features have encountered so far are: (1) many photos were taken against different backgrounds; (2) skewness makes the matching between face images even harder since it breaks the relation of texture positions; (3) blur and noise cause the loss of texture information; (4) different facial expressions and details.

VI. CONCLUSION
In this paper, we presented our results for LBP and its variants in an FR system. We evaluated them on three frontal face datasets. Each dataset has unique properties that represent challenges when working with face images. It has been shown that overpowered the rest in all tests. Further tests on LFW showed that the lack of skewness invariance, as well as blur, facial expression, and noise, is a huge drawback for the performance of all tested features. The high error rate on the LFW dataset shows that we need to overcome these problems before applying the features in real-world applications. We also found that can be as well as from the user's perspective. Next, a key contribution is that our system can withstand an increase in the size of the dataset and in the diversity of its properties without a sudden drop in MAP. Finally, LSH is an ideal choice when designing an FR system that works with a large database. However, our system also has some limitations. We have observed that skewness in face images has a great influence on retrieval performance, and our approach shows its weakness in real-world conditions, which will be considered in future work.
In the future, we will consider the fusion of different features such as HOG, learned features… We will also study face alignment and 3D face modeling to improve the performance on real-world datasets.

Fig. 3. Extended LBP operator compared to the basic one

Fig. 4. Examples of texture primitives represented by LBP (white circles mean zeros and black circles mean ones)

Fig. 6. The 36 observed patterns. Black and white dots indicate bit values of zeros and ones

Fig. 8. Illustration of applying the 6x7 grid on images

Fig. 9. Illustration of how LSH works in the retrieval system

Fig. 10. Average execution time of each query for different dataset sizes (milliseconds)

Fig. 11. Some results of the queries from our system on and operator

Fig. 12. Some results of the queries from our system on and operator

TABLE I.
AVERAGE EXECUTION TIME OF EACH QUERY IN EACH DATASET (MILLISECONDS / QUERY)

TABLE II.
AVERAGE INDEXING TIME OF EACH IMAGE IN EACH DATASET (MILLISECONDS / IMAGE)

TABLE III
better than its original version. has proven its effectiveness in this case with the highest MAP score of 0.8540, which is more than twice the second-highest score (0.3755 of ).

TABLE VII.
ZERO MAP AT FIRST 10 RETURNED ITEMS IN PERCENTAGE (%)