Clustering and Bayesian network for image of faces classification

In a content-based image classification system, target images are sorted by feature similarity with respect to the query (CBIR). In this paper, we propose a new approach combining tangent distance, the k-means algorithm and Bayesian networks for image classification. First, we use the tangent distance technique to calculate several tangent spaces representing the same image; the objective is to reduce the error in the classification phase. Second, we cut the image into a set of blocks and compute a vector of descriptors for each block. Then, we use k-means to cluster the low-level features, including color and texture information, to build a vector of labels for each image. Finally, we apply five variants of Bayesian network classifiers (Naïve Bayes (NB), Global Tree Augmented Naïve Bayes (GTAN), Global Forest Augmented Naïve Bayes (GFAN), Tree Augmented Naïve Bayes for each class (TAN), and Forest Augmented Naïve Bayes for each class (FAN)) to classify face images using the vector of labels. To validate the feasibility and effectiveness of the approach, we compare the results of GFAN to those of FAN and of the other classifiers (NB, GTAN, TAN). The results demonstrate that FAN outperforms GFAN, NB, GTAN and TAN in overall classification accuracy.


I. INTRODUCTION
Classification is a basic task in data mining and pattern recognition that requires the construction of a classifier, that is, a function that assigns a class label to instances described by a set of features (or attributes) [15]. Learning accurate classifiers from pre-classified data has been a very active research topic in machine learning. In recent years, numerous approaches have been proposed for classification in face recognition, such as fuzzy sets, rough sets, Hidden Markov Models (HMM), neural networks, Support Vector Machines, genetic algorithms, ant behavior simulation, case-based reasoning and Bayesian networks. Much of the related work on image indexing, classification and retrieval has focused on the definition of low-level descriptors and the generation of metrics in the descriptor space [2]. These descriptors are extremely useful in some generic image classification tasks or when classification is based on query by example. However, they are not sufficient if the aim is to classify the image using descriptors of the objects it contains.
Several methods have been proposed for face recognition and classification; they fall into structural methods and global techniques. Structural techniques deal with local or analytical characteristics: they extract the geometric and structural features that constitute the local structure of the face. The human face is thus analyzed by describing its different parts (eyes, nose, mouth, etc.) individually and by measuring their relative positions. Global methods exploit the overall properties of the face. Among the most important approaches, we cite the correlation technique used by Alexandre [Lemieux 03], which is based on a simple comparison between a test image and the training faces; the eigenfaces approach, based on principal component analysis (PCA) [52][53]; the discrete cosine transform (DCT) technique, based on computing the discrete cosine transform [58]; and techniques using neural networks and support vector machines (SVM) [45].
Two questions must be answered to overcome the difficulties that hamper the progress of research in this direction. First, how can objects in images be linked semantically with high-level features, that is, how can we learn the dependence between objects that best reflects the data? Second, how can the image be classified using the dependence structure that was found? This paper presents work that uses three variants of naïve Bayesian networks to classify face images using the dependence structure found between objects. The paper is organized as follows: Section 2 presents an overview of the tangent distance; Section 3 describes the developed approach based on the Naïve Bayesian network: we describe how the feature space is extracted, and we introduce the method of building the Naïve Bayesian network, Global Tree Augmented Naïve Bayes (GTAN), Global Forest Augmented Naïve Bayes (GFAN), Tree Augmented Naïve Bayes (TAN) and Forest Augmented Naïve Bayes (FAN), and of inferring posterior probabilities from the network; Section 4 presents some experiments; finally, Section 5 presents the discussion and conclusions.

II. TANGENT DISTANCE
The tangent distance is a mathematical tool that compares two images while taking small transformations (rotations, translations, etc.) into account. Introduced in the early 1990s by Simard [60], it has been combined with different classifiers for character recognition, face detection and recognition, and speech recognition. It is still not widely used. The distance between an image I1 and another image I2 is calculated by measuring the distance between the parameter spaces passing through I1 and I2, respectively. These spaces locally model all the forms generated by the possible transformations between the two images. When an image $x \in \mathbb{R}^D$ is transformed (e.g. scaled and rotated) by a transformation $t(x, \alpha)$ which depends on L parameters $\alpha \in \mathbb{R}^L$ (e.g. the scaling factor and rotation angle), the set of all transformed patterns

$$M_x = \{ t(x, \alpha) : \alpha \in \mathbb{R}^L \} \subset \mathbb{R}^D \quad (1)$$

is a manifold of dimension at most L in pattern space. The distance between two patterns can now be defined as the minimum distance between their respective manifolds, which is truly invariant with respect to the L transformations considered. Unfortunately, computing this distance is a hard nonlinear optimization problem, and the manifolds concerned generally do not have an analytic expression. Therefore, small transformations of the pattern x are approximated by a tangent subspace to the manifold $M_x$ at the point x. This subspace is obtained by adding to x a linear combination of the vectors $x_l$, $l = 1, \dots, L$, that span the tangent subspace and are the partial derivatives of $t(x, \alpha)$ with respect to $\alpha_l$. We obtain a first-order approximation of $M_x$:

$$\hat{M}_x = \Big\{ x + \sum_{l=1}^{L} \alpha_l x_l : \alpha \in \mathbb{R}^L \Big\} \subset \mathbb{R}^D$$

The single-sided (SS) TD between a pattern x and a reference $\mu$ is defined as:

$$D_{SS}(x, \mu) = \min_{\alpha} \Big\| x + \sum_{l=1}^{L} \alpha_l x_l - \mu \Big\|^2$$

The tangent vectors $x_l$ can be computed using finite differences between the original image x and a reasonably small transformation of x. Example images that were computed using (3) are shown in Figure 1.
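As a rough illustration of the single-sided tangent distance, the sketch below builds two tangent vectors by finite differences and solves the small least-squares problem in closed form through the normal equations. The choice of transformations (one-pixel horizontal and vertical shifts), the border handling, the tiny `ridge` term that keeps the system solvable, and all function names are assumptions made for this example, not the setup used in the experiments.

```python
def flatten(img):
    """Turn a 2-D image (list of rows) into a flat pattern vector."""
    return [float(v) for row in img for v in row]

def shift_right(img):
    """One-pixel horizontal translation; the left border column is repeated."""
    return [[row[0]] + list(row[:-1]) for row in img]

def shift_down(img):
    """One-pixel vertical translation; the top border row is repeated."""
    return [list(img[0])] + [list(row) for row in img[:-1]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def tangent_vectors(img, transforms=(shift_right, shift_down)):
    """Finite-difference tangent vectors: transformed image minus original."""
    x = flatten(img)
    return [[a - b for a, b in zip(flatten(t(img)), x)] for t in transforms]

def solve(A, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    sol = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = (M[r][n] - s) / M[r][r]
    return sol

def single_sided_td(img, ref, ridge=1e-12):
    """Return (distance, alpha) minimising ||x + sum_l alpha_l x_l - mu||^2."""
    x, mu = flatten(img), flatten(ref)
    T = tangent_vectors(img)
    d = [m - xi for xi, m in zip(x, mu)]
    # Normal equations (T T^t) alpha = T d; the ridge is an assumed safeguard.
    G = [[dot(ti, tj) + (ridge if i == j else 0.0) for j, tj in enumerate(T)]
         for i, ti in enumerate(T)]
    alpha = solve(G, [dot(ti, d) for ti in T])
    r = [di - sum(a * t[k] for a, t in zip(alpha, T)) for k, di in enumerate(d)]
    return dot(r, r), alpha
```

When the reference lies in the tangent subspace of the image, the returned distance is close to zero even though the plain Euclidean distance is not, which is the invariance the tangent distance is meant to provide.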

A. Definition
Bayesian networks represent a set of variables as nodes of a directed acyclic graph (DAG), which encodes the conditional independencies between these variables. Bayesian networks bring several advantages as a data modeling tool [20-22-24]. First, they are able to handle incomplete or noisy data, which are very frequent in image analysis. Second, they are able to ascertain causal relationships through conditional independencies, allowing the relationships between variables to be modeled. Finally, they are able to incorporate existing knowledge, or previously known data, into learning, allowing more accurate results by using what we already know. A Bayesian network is defined by:
- a directed acyclic graph $G = (V, E)$, where $V$ is the set of nodes of G and $E$ the set of edges of G;
- a finite probability space $(\Omega, Z, p)$;
- a set of random variables associated with the graph nodes and defined on $(\Omega, Z, p)$ such that $p(V_1, V_2, \dots, V_n) = \prod_{i=1}^{n} p(V_i \mid C(V_i))$, where $C(V_i)$ is the set of causes (parents) of $V_i$ in graph G.
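To make the factorization concrete, here is a minimal sketch of a joint probability computed as the product of each node's conditional probability given its parents. The three binary nodes (a class `C` and two attributes `A1`, `A2`) and every number in the tables are hypothetical values chosen for illustration only.

```python
from itertools import product

# Hypothetical network: C -> A1, C -> A2, A1 -> A2 (all variables binary).
parents = {"C": (), "A1": ("C",), "A2": ("C", "A1")}

# cpt[node][parent values] = P(node = True | parents); made-up numbers.
cpt = {
    "C": {(): 0.3},
    "A1": {(False,): 0.2, (True,): 0.7},
    "A2": {(False, False): 0.1, (False, True): 0.5,
           (True, False): 0.4, (True, True): 0.9},
}

def joint(assignment):
    """p(V_1..V_n) = product over nodes of p(V_i | parents of V_i)."""
    p = 1.0
    for node, pa in parents.items():
        p_true = cpt[node][tuple(assignment[q] for q in pa)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p

def total_mass():
    """Sum of the joint over all assignments; should be 1 for valid tables."""
    names = list(parents)
    return sum(joint(dict(zip(names, values)))
               for values in product([False, True], repeat=len(names)))
```

For example, `joint({"C": True, "A1": True, "A2": False})` multiplies 0.3, 0.7 and 1 - 0.9.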

B. Inference in Bayesian networks
Suppose we have a Bayesian network defined by a graph G and the associated probability distribution P, written (G, P). Suppose that the graph is composed of n nodes, denoted $X = \{X_1, X_2, \dots, X_n\}$. The general problem of inference is to compute $P(X_i \mid Y)$, where $Y \subset X$ and $X_i \notin Y$. To calculate these conditional probabilities, we can use exact or approximate inference methods. The former give an accurate result but are extremely costly in time and memory. The latter require fewer resources, but the result is only an approximation of the exact solution.
A BN is usually transformed into a decomposable Markov network [59] for inference. During this transformation, two graphical operations are performed on the DAG of the BN, namely moralization and triangulation.
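The first of these two operations, moralization, is simple enough to sketch: every pair of parents sharing a child is connected, and edge directions are dropped. The dictionary representation of the DAG below is an assumption of this example, and triangulation is not covered.

```python
from itertools import combinations

def moralize(parents):
    """Return the undirected edges of the moral graph of a DAG.

    `parents` maps each node to the list of its parents (assumed format).
    Each edge is returned as a frozenset of its two endpoints.
    """
    edges = set()
    for child, pa in parents.items():
        for p in pa:
            edges.add(frozenset((p, child)))   # drop the edge direction
        for p, q in combinations(pa, 2):
            edges.add(frozenset((p, q)))       # connect ("marry") co-parents
    return edges
```

On a v-structure A -> C <- B, the moral graph gains the extra edge A - B.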

C. Parameter learning
In this case, the structure is completely known a priori and all variables are observable from the data; the conditional probabilities associated with the variables (network nodes) can be learned using either a statistical or a Bayesian approach. Statistical learning computes the frequencies of values in the training data and is based on the maximum likelihood (ML) estimate, defined as follows:

$$\hat{\theta}^{ML}_{i,j,k} = \hat{P}(X_i = x_k \mid pa(X_i) = x_j) = \frac{N_{i,j,k}}{\sum_k N_{i,j,k}}$$

where $N_{i,j,k}$ is the number of events in the database for which the variable $X_i$ is in the state $x_k$ and its parents are in the configuration $x_j$. The Bayesian approach to learning from complete data consists in finding the most likely $\theta$ given the observed data, using the maximum a posteriori (MAP) method:

$$\hat{\theta}^{MAP} = \arg\max_{\theta} P(\theta \mid D) = \arg\max_{\theta} P(D \mid \theta) P(\theta)$$

With the conjugate prior, $P(\theta)$ is the Dirichlet distribution:

$$P(\theta) \propto \prod_{i} \prod_{j} \prod_{k} \theta_{i,j,k}^{\alpha_{i,j,k} - 1}$$

and the posterior is:

$$P(\theta \mid D) \propto \prod_{i} \prod_{j} \prod_{k} \theta_{i,j,k}^{N_{i,j,k} + \alpha_{i,j,k} - 1}$$

Thus:

$$\hat{\theta}^{MAP}_{i,j,k} = \frac{N_{i,j,k} + \alpha_{i,j,k} - 1}{\sum_k (N_{i,j,k} + \alpha_{i,j,k} - 1)}$$

D. Structure learning
Structure learning is the act of finding a plausible structure for a graph based on input data. However, this has been proven to be an NP-hard problem, and therefore any learning algorithm intended for use on a large dataset, such as microarray data, requires some form of modification to be feasible. Spirtes et al. (2001) explain that finding the most appropriate DAG from sample data is a large problem, as the number of possible DAGs grows super-exponentially with the number of nodes. The number $r(n)$ of possible DAGs over n variables can be calculated by the recursive formula [20]:

$$r(n) = \sum_{i=1}^{n} (-1)^{i+1} \binom{n}{i} 2^{i(n-i)} r(n-i), \qquad r(0) = 1$$

In practice, we use heuristics and approximation methods such as the K2 algorithm.
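The super-exponential growth is easy to see by evaluating the recursion directly. The sketch below is a straightforward memoized implementation of the standard recursive count of labeled DAGs; the function name `count_dags` is chosen for this example.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes, by the recursive formula."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (i + 1) * comb(n, i) * 2 ** (i * (n - i)) * count_dags(n - i)
        for i in range(1, n + 1)
    )

# First values of the sequence for n = 0..5.
first_values = [count_dags(n) for n in range(6)]  # [1, 1, 3, 25, 543, 29281]
```

With only five variables there are already 29281 candidate structures, which is why exhaustive search is avoided in favor of heuristics.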

E. Bayesian network as a classifier
1) Naïve Bayes
Naïve Bayes is a variant of the Bayesian network and one of the most effective and efficient classification algorithms. However, the conditional independence assumption in Naïve Bayes is rarely true in reality. Indeed, Naïve Bayes has been found to work poorly for regression problems (Frank et al., 2000) and produces poor probability estimates (Bennett, 2000). One way to alleviate the conditional independence assumption is to extend the structure of Naïve Bayes to represent attribute dependencies explicitly by adding arcs between attributes.

2) TAN
An extended tree-like Naïve Bayes, called Tree Augmented Naïve Bayes (TAN), was developed, in which the class node points directly to all attribute nodes and an attribute node can have only one parent among the other attribute nodes (in addition to the class node). Figure 3 shows an example of TAN. The Construct-TAN procedure is described in our previous work [5].
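A TAN structure of this kind is commonly built by weighting each pair of attributes with their conditional mutual information given the class, extracting a maximum weight spanning tree, and directing its edges away from a root. The sketch below follows that common recipe with Prim's algorithm; it is an illustrative reading, not a reproduction of the Construct-TAN procedure of [5], and the `root` parameter and data layout (rows of discrete attribute values plus a list of class labels) are assumptions.

```python
from collections import Counter
from math import log

def cond_mutual_info(data, labels, i, j):
    """Empirical I(A_i; A_j | C) from discrete rows and class labels."""
    n = len(data)
    n_c = Counter(labels)
    n_ic = Counter((r[i], c) for r, c in zip(data, labels))
    n_jc = Counter((r[j], c) for r, c in zip(data, labels))
    n_ijc = Counter((r[i], r[j], c) for r, c in zip(data, labels))
    return sum(
        (cnt / n) * log(cnt * n_c[c] / (n_ic[a, c] * n_jc[b, c]))
        for (a, b, c), cnt in n_ijc.items()
    )

def construct_tan(data, labels, root=0):
    """Return parent[i] for each attribute (None for the root).

    The class node is implicitly a parent of every attribute.
    """
    m = len(data[0])
    w = {(i, j): cond_mutual_info(data, labels, i, j)
         for i in range(m) for j in range(i + 1, m)}

    def weight(i, j):
        return w[(min(i, j), max(i, j))]

    parent = [None] * m
    in_tree = {root}
    while len(in_tree) < m:
        # Prim's step: heaviest edge leaving the current tree.
        i, j = max(((i, j) for i in in_tree for j in range(m) if j not in in_tree),
                   key=lambda e: weight(*e))
        parent[j] = i          # direct the edge away from the root
        in_tree.add(j)
    return parent
```

Because the edges are directed outward from `root`, a different root gives a different directed structure over the same undirected tree.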

3) FAN
To improve the performance of TAN, we added an early stopping criterion, based on a minimum increase in the score, to the maximum weight spanning tree search algorithm (see [5] for more details).

IV. PROPOSED SYSTEM
In this section, we present two architectures of the face classification system that we developed. We recall the architecture developed in our article [5] and present the new architecture developed in this work. In the latter, we propose a Bayesian network for each class, so we have as many structures as there are classes in the training set. Each structure models the dependencies between the different objects in the face image. The proposed system comprises three main modules: a module that extracts primitives from blocks obtained by locally cutting the face images, a module that classifies the primitives into clusters using the k-means method, and a module that classifies the whole image based on the Bayesian network structures developed for each class. The developed system for face classification is shown in the following figure.
The more scattered the values, the larger the standard deviation; the more the values cluster around the mean, the smaller the standard deviation. Energy is a parameter that measures the homogeneity of the image. It takes a low value when there are few homogeneous areas; in this case, there are many gray-level transitions. Homogeneity is a parameter that behaves in the opposite way to contrast: the more homogeneous regions the texture has, the higher the parameter.
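The behavior of these descriptors can be checked on small blocks with a sketch such as the following. It assumes one common set of definitions: mean and standard deviation over the pixel values, and energy, entropy, contrast and homogeneity over a normalized co-occurrence matrix of horizontally adjacent gray levels. These formulas, the pairing direction and the function name are assumptions of this example and may differ from the definitions used in the experiments.

```python
from collections import Counter
from math import log2, sqrt

def block_descriptors(block):
    """Six texture descriptors of a gray-level block (list of rows, width >= 2)."""
    pixels = [v for row in block for v in row]
    n = len(pixels)
    mean = sum(pixels) / n
    std = sqrt(sum((v - mean) ** 2 for v in pixels) / n)

    # Normalized co-occurrence of horizontally adjacent gray levels (assumed).
    pairs = Counter((row[k], row[k + 1]) for row in block for k in range(len(row) - 1))
    total = sum(pairs.values())
    p = {ij: cnt / total for ij, cnt in pairs.items()}

    return {
        "mean": mean,
        "std": std,
        "energy": sum(q * q for q in p.values()),
        "entropy": -sum(q * log2(q) for q in p.values()),
        "contrast": sum((i - j) ** 2 * q for (i, j), q in p.items()),
        "homogeneity": sum(q / (1 + abs(i - j)) for (i, j), q in p.items()),
    }
```

A uniform block gives maximal energy and homogeneity with zero contrast, while a checkerboard block lowers energy and homogeneity and raises contrast, in line with the qualitative description of the parameters.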
The mean, standard deviation, energy, entropy, contrast and homogeneity are computed as follows: the mean is defined by (1), the standard deviation by (2), the energy by (3), the entropy by (4), the contrast by (5) and the homogeneity by (6).

C. Step 3: Clustering of blocks with K-means
Our approach is based on modeling the image by a 3x3 grid that provides a local description of the image. The image to be processed is thus considered as a grid of nine blocks (Figure 6). To each block we apply the descriptors presented in the previous section.
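As a rough illustration of this step, the sketch below cuts an image into the 3x3 grid, describes each block, and replaces each block descriptor by a cluster label obtained with a plain k-means (Lloyd) loop. To stay short it uses only the mean and standard deviation as a stand-in descriptor and clusters the nine blocks of a single image; assigning one label per block, the initialization from distinct points, and all function names are assumptions of this example.

```python
import random
from math import sqrt

def split_grid(img, rows=3, cols=3):
    """Cut a 2-D image (list of rows) into rows x cols blocks, row-major order."""
    h, w = len(img), len(img[0])
    return [
        [r[w * c // cols:w * (c + 1) // cols] for r in img[h * b // rows:h * (b + 1) // rows]]
        for b in range(rows) for c in range(cols)
    ]

def mean_std(block):
    """Stand-in block descriptor: (mean, standard deviation) of the pixels."""
    px = [v for row in block for v in row]
    m = sum(px) / len(px)
    return (m, sqrt(sum((v - m) ** 2 for v in px) / len(px)))

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd iterations; needs at least k distinct points."""
    rng = random.Random(seed)
    centroids = rng.sample(sorted(set(points)), k)   # distinct starting points
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

def label_vector(img, k):
    """One cluster label per block of the 3x3 grid."""
    points = [mean_std(b) for b in split_grid(img)]
    return kmeans(points, k)[1]
```

In a full system the clustering would be fitted on the block descriptors of all training images, so that the same label means the same kind of block across images.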
To generate the label vector from the descriptor vector, we used the k-means algorithm. Each attribute of the descriptor vector is clustered, and the generated labels replace the components of the descriptor vector. We use the k-means method to cluster the descriptors as shown in the figure.

D. Step 4: Structure learning
The originality of this work is the development of a Bayesian network for each class. We then compared the results of this network to those of a global Bayesian network. We used the Naïve Bayes, Global Tree Augmented Naïve Bayes (GTAN), Global Forest Augmented Naïve Bayes (GFAN), Tree Augmented Naïve Bayes for each class (TAN), and Forest Augmented Naïve Bayes for each class (FAN) classifiers. This sub-section describes the implementation of these methods.

E. Step 5: Parameter learning
The parameters of the NB, TAN and FAN classifiers were obtained using the following procedure.
In the implementation of NB, TAN and FAN, we used Laplace estimation to avoid the zero-frequency problem. More precisely, we estimated the prior and conditional probabilities using Laplace estimation, where N is the total number of training instances.
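A minimal sketch of Laplace estimation is given below, assuming the usual add-one form: one pseudo-count is added to every count, and the denominator grows by the number of possible values. The exact formulas used in the experiments may differ, and the function names and arguments are hypothetical.

```python
from collections import Counter

def laplace_prior(labels, classes):
    """P(c) = (N_c + 1) / (N + number of classes)."""
    n, k = len(labels), len(classes)
    counts = Counter(labels)
    return {c: (counts[c] + 1) / (n + k) for c in classes}

def laplace_conditional(values, labels, domain, classes):
    """P(a | c) = (N_ac + 1) / (N_c + number of values of the attribute)."""
    n_c = Counter(labels)
    n_ac = Counter(zip(values, labels))
    return {(a, c): (n_ac[a, c] + 1) / (n_c[c] + len(domain))
            for a in domain for c in classes}
```

A value never observed with a class still receives a small non-zero probability, so a single unseen label no longer forces the whole product of probabilities to zero.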

F. Step 6: Classification
In this work, decisions are inferred using Bayesian networks. The class of an example is decided by calculating the posterior probabilities of the classes using Bayes' rule. This is described below for both types of classifier.
- NB classifier: in the NB classifier, the class value maximizing equation (7) is assigned to a given example:

$$P(c \mid a_1, \dots, a_n) \propto P(c) \prod_{i=1}^{n} P(a_i \mid c) \quad (7)$$

- TAN and FAN classifiers: in the TAN and FAN classifiers, the class probability $P(C \mid A)$ is estimated by the following equation:

$$P(c \mid a_1, \dots, a_n) \propto P(c) \prod_{i=1}^{n} P(a_i \mid a_{\pi(i)}, c)$$

where $A_{\pi(i)}$ is the parent of $A_i$.
The classification criterion used is the maximum a posteriori (MAP) criterion, the most common in Bayesian classification problems. It is given by:

$$c^{*} = \arg\max_{c} P(c \mid a_1, \dots, a_n)$$
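The MAP decision rule for NB, TAN and FAN can be sketched with a single class in which each attribute has at most one attribute parent; NB is the special case where no attribute has a parent. The class name `AugmentedNB`, the data layout, and the use of add-one Laplace estimates and log-probabilities are assumptions of this example. It uses one global structure; a per-class variant would keep one parent list per class.

```python
from collections import Counter
from math import log

class AugmentedNB:
    """NB / TAN / FAN classifier; parent[i] is an attribute index or None."""

    def __init__(self, parent, domains, classes):
        self.parent, self.domains, self.classes = parent, domains, classes

    def fit(self, data, labels):
        self.n = len(labels)
        self.n_c = Counter(labels)
        self.n_ac = Counter()    # counts of (i, a_i, c)
        self.n_apc = Counter()   # counts of (i, a_i, a_parent, c)
        for row, c in zip(data, labels):
            for i, a in enumerate(row):
                self.n_ac[i, a, c] += 1
                if self.parent[i] is not None:
                    self.n_apc[i, a, row[self.parent[i]], c] += 1
        return self

    def log_posterior(self, row, c):
        """log of P(c) * prod_i P(a_i | parent value, c), up to a constant."""
        lp = log((self.n_c[c] + 1) / (self.n + len(self.classes)))
        for i, a in enumerate(row):
            v, j = len(self.domains[i]), self.parent[i]
            if j is None:
                lp += log((self.n_ac[i, a, c] + 1) / (self.n_c[c] + v))
            else:
                lp += log((self.n_apc[i, a, row[j], c] + 1)
                          / (self.n_ac[j, row[j], c] + v))
        return lp

    def predict(self, row):
        """MAP decision: the class with the highest posterior."""
        return max(self.classes, key=lambda c: self.log_posterior(row, c))
```

On data where the class depends on whether two attributes agree, the version with `parent=[None, 0]` separates the classes, whereas the NB version (`parent=[None, None]`) gives both classes the same posterior.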

V. EXPERIMENTS AND RESULTS
In this section, we present the results obtained using a database of images. We start by presenting the database with which we conducted our tests, then we present our results according to the structure used (global or one for each class).
- The structure of GFAN (Global FAN) reduces to the structure of NB, that is to say, the attributes are independent of each other. In fact, the variability of face images and the choice of a very high mutual information threshold decrease the degree of dependency between attributes.
- On the other hand, we note that there is a dependency between the attributes of face images in the same class, as shown by the structures of classes 1, 2, 3, 4 and 5.
- By comparing the structures of FAN with those of TAN, we see that the choice of a very high dependency criterion eliminates weak dependencies between attributes and keeps only the links that represent a strong dependency between attributes and that have a positive influence on the classification results.

C. Parameter learning
After estimating the global structures of GTAN and GFAN and the per-class structures of TAN and FAN, we used those structures to estimate the conditional and prior probabilities, using Laplace estimation to avoid the zero-frequency problem. We obtained the following results (Tables I, II):

E. Naïve Bayes
From Table III, we note that the naïve Bayesian network, despite its simple construction, gave very good classification rates. However, these rates can still be improved, since the classification rate obtained for class 3 is 40%. We therefore conclude that there is a slight dependency between the attributes.

G. Forest augmented Naïve Bayes
From the table above, we find that the correct classification rate of GFAN is the same as that obtained by Naïve Bayes, since learning the GFAN structure with a strict threshold of 0.8 gave the same structure as Naïve Bayes. However, using the structure of class 1, we note that the classification rate was slightly improved for class 1. Likewise, for class 3, the classification rate increased from 0.4 to 0.9 using the FAN structure for class 3.

H. Discussion
According to our experiments, the naïve Bayesian network gave results as good as TAN. Two factors may explain this:
- The directions of the links are crucial in a TAN. According to the TAN algorithm detailed in our article [5], an attribute is randomly selected as the root of the tree and the directions of all links are set thereafter. The selection of the root attribute actually determines the structure of the resulting TAN, since a TAN is a directed graph. The selection of the root attribute is therefore important when building a TAN.
- Unnecessary links can exist in a TAN. According to the TAN algorithm, a maximum weight spanning tree is constructed, so the number of links is fixed at n-1. This can sometimes lead to a poor fit of the data, since some of the links may be unnecessary.
We observe that NB and Global FAN gave the same classification rate, since they have the same structure. We also note that the correct classification rate given by FAN is much higher than that of TAN. Several factors are involved:
1- According to the FAN algorithm illustrated in our article [5], the root attribute $A_{root}$ is defined by the equation below: the root is the attribute that has the maximum mutual information with the class,

$$A_{root} = \arg\max_{A_i} I_P(A_i; C), \quad i = 1, \dots, n.$$

This strategy is natural: the attribute that has the greatest influence on the classification should be the root of the tree.
2- Links whose conditional mutual information is below a threshold are filtered out. These links carry a high risk of a poor fit of the data, which could distort the calculation of the conditional probabilities. Specifically, the average conditional mutual information $I_{avg}$, defined in the equation below, is used as the threshold:

$$I_{avg} = \frac{\sum_{i} \sum_{j, j \neq i} I_P(A_i; A_j \mid C)}{n(n-1)}$$

All links whose conditional mutual information is less than $I_{avg}$ are removed.
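These two ingredients, root selection and link filtering, can be sketched as follows. The functions assume that the mutual information values have already been computed and that the tree is given as a parent list; the names, the data layout and the optional fixed `threshold` argument are assumptions of this example, not the implementation of [5].

```python
def choose_root(mi_with_class):
    """Index of the attribute with maximum mutual information with the class."""
    return max(range(len(mi_with_class)), key=mi_with_class.__getitem__)

def average_cmi(weights):
    """Average of I(A_i; A_j | C) over all attribute pairs.

    `weights` maps each unordered pair (i, j), i < j, to its value, so the
    plain mean equals the double sum over ordered pairs divided by n(n-1).
    """
    return sum(weights.values()) / len(weights)

def to_forest(parent, weights, threshold=None):
    """Cut tree links whose weight is below the threshold (default: I_avg)."""
    if threshold is None:
        threshold = average_cmi(weights)
    forest = list(parent)
    for j, i in enumerate(parent):
        if i is not None and weights[min(i, j), max(i, j)] < threshold:
            forest[j] = None   # attribute j keeps only the class as parent
    return forest
```

With a sufficiently strict fixed threshold every link is cut and the forest degenerates to the Naïve Bayes structure, which matches the behavior reported for Global FAN.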
This study is an extension of our previous work [5]. We have developed a new approach for classifying face images using the tangent distance and the k-means method to cluster the descriptor vectors of the images. First, we use the tangent distance technique to calculate several tangent spaces representing the same image; the objective is to reduce the error in the search phase. Then, we use a Bayesian network as a classifier to assign the whole image to one of five classes. We implemented and compared three classifiers (NB, TAN and FAN) using two types of structure, a global structure and a structure per class, representing the inter-class and intra-class dependence between attributes, respectively. The goal was to compare the results obtained with these structures and to apply algorithms that can produce useful information from high-dimensional data. In particular, the aim is to improve Naïve Bayes by removing some of the unwarranted independence relations among features; hence we extend the Naïve Bayes structure by implementing the Tree Augmented Naïve Bayes. Unfortunately, our experiments show that TAN performs even worse than Naïve Bayes in classification. In response to this problem, we modified the traditional TAN learning algorithm by implementing a novel learning algorithm, called Forest Augmented Naïve Bayes. We tested our algorithm experimentally on face image data and compared it to NB and TAN. The experimental results show that FAN significantly improves on the classification performance of the NB classifier. In addition, the results show that the mean classification accuracy is better when the number of clusters is optimal, that is, when the number of clusters best reflects the data. Finally, we observed that the per-class FAN structure performs better than Global FAN.
This result is explained by the fact that a per-class FAN structure better reflects the intra-class dependence of the attributes (within the same class), whereas a global structure better reflects the inter-class dependence (between the classes).