Multilayer Neural Networks and Nearest Neighbor Classifier Performances for Image Annotation

Abstract: The explosive growth of image data has driven the research and development of image content searching and indexing systems. Image annotation systems aim to automatically annotate an image with controlled keywords that can be used for indexing and retrieval. This paper presents a comparative evaluation of an image content annotation system using multilayer neural networks and the nearest neighbour classifier. Region growing segmentation is used to separate objects, and Hu moments, Legendre moments and Zernike moments are used as feature descriptors for image content characterization and annotation. The ETH-80 image database is used in the experiments. The best annotation rate is achieved by using Legendre moments as the feature extraction method and the multilayer neural network as the classifier.


I. INTRODUCTION
As online resources remain vital in everyday life, developing automatic methods for managing large volumes of digital information is increasingly important. Among these methods, automatic indexing of multimedia data remains an important challenge, especially image annotation [1]. Annotated images play a very important role in information processing. They are useful for image retrieval based on keywords and image content [2]. Manual annotation is not only tedious but also impractical in many cases because of the abundance of information. Most images are therefore available without adequate annotation. Automatic image content annotation has become a recent research interest [3]. It attempts to explore the visual characteristics of images and associate them with image contents and semantics.
Many classifiers have been used for image classification and annotation without any evaluation of their performance. In this paper, we use the neural network classifier for image annotation, and we compare its performance with that of the nearest neighbour classifier. We use the same type of moments as the feature extraction method for each classifier in order to have an objective comparison between the performances of the considered classifiers. To this end, we use a system that extracts objects from each image and finds the annotation terms that describe its individual content.
The rest of the paper is organised as follows. Section 2 presents the adopted annotation system, while Section 3 discusses the primary tasks of any annotation and recognition system, namely image segmentation and feature extraction, together with a brief formulation of the Hu, Legendre and Zernike moments used as feature extraction methods. Section 4 is devoted to image classification and annotation using the neural network or the nearest neighbour classifier. Finally, Section 5 presents the experimental annotation results and compares the performance of each classifier.

II. ANNOTATION SYSTEM
Automatic image annotation techniques attempt to explore the visual features that describe the content of an image and associate them with its semantics. This is an effective technology for improving image indexing and searching in the large volume of information available in the media. The algorithms and systems used for image annotation are commonly divided into the following tasks [4]:

- Segmentation and feature extraction;
- Classification and annotation.

www.ijacsa.thesai.org
The annotation system adopted in this work is shown in Fig. 1. The system has a reference database that contains the keywords and feature descriptors of images that have already been annotated by experts (manual offline annotation). This database is used for modelling and training the classifier so that it can choose the appropriate keywords. To achieve this goal, the input image is first segmented, using region growing segmentation, into regions that represent the objects in the image. Second, the feature vector of each region is computed and extracted from the image. Finally, those features are fed into the classifier, which can be either the multilayer neural network or the nearest neighbour classifier, in order to choose the appropriate keywords for annotation. The image content is thus annotated, and the performance of each classifier can be evaluated.

III. IMAGE SEGMENTATION AND FEATURE EXTRACTION
A feature vector extracted from the entire image loses local information. It is therefore necessary to segment the image into regions or objects of interest and to use local characteristics. Image segmentation is the process of partitioning a digital image into multiple segments. It is very important in many image processing applications, and it remains a challenge for scientists and researchers; efforts to improve segmentation techniques are still ongoing. With the improvement of computer processing capabilities, several segmentation techniques are available: thresholding, region growing, active contours, level sets, etc. [5]. Among these methods, region growing is well suited because of its simplicity and has been used successfully many times as a segmentation technique for digital images.
The regions are grown iteratively by comparing all unallocated neighbouring pixels to the regions. The difference between a pixel's intensity and the mean intensity of a region is used as a measure of similarity; it is the predicate that controls the evolution of the segmentation process. The pixel with the smallest difference measured this way is allocated to the respective region. This process continues until all pixels are allocated to a region. Assuming that the objects are located at the center of the images, the region growing segmentation is started from a corner of the image in order to isolate the objects in the center. The region growing segmentation algorithm used in this paper is presented in Fig. 2, and an example of image segmentation is shown in Fig. 3. After the original image has been divided into several distinct regions that correspond to objects in the scene, a feature vector can be extracted from each region and considered as a representation of an object in the entire image.
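The region growing procedure described above can be sketched in Python as follows. This is a minimal illustration, not the authors' Matlab implementation: the function name `region_growing`, the use of 4-connected neighbours, the raster-scan choice of seeds (so that the first seed is a corner pixel) and the threshold value in the demo are all assumptions. It applies the threshold predicate on the difference between a pixel's intensity and the running region mean.

```python
from collections import deque

import numpy as np


def region_growing(image, threshold):
    """Label regions whose pixels stay within `threshold` of the running region mean."""
    img = np.asarray(image, dtype=float)
    labels = np.zeros(img.shape, dtype=int)  # 0 means "not yet labeled"
    rows, cols = img.shape
    k = 0
    # Raster scan: the first seed is the top-left corner of the image.
    for seed in np.ndindex(rows, cols):
        if labels[seed]:
            continue
        k += 1                               # start a new region labeled k
        labels[seed] = k
        total, count = img[seed], 1          # running sum and size give the region mean
        frontier = deque([seed])
        while frontier:
            r, c = frontier.popleft()
            # Assumption: 4-connected neighbourhood.
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < rows and 0 <= nc < cols and not labels[nr, nc]:
                    if abs(img[nr, nc] - total / count) < threshold:
                        labels[nr, nc] = k   # assign the pixel to region k
                        total += img[nr, nc] # update the region mean
                        count += 1
                        frontier.append((nr, nc))
    return labels


# Toy image: a bright 2x2 object centered on a dark background.
demo = np.zeros((6, 6))
demo[2:4, 2:4] = 200
labels = region_growing(demo, threshold=30)
```

In this toy case the background grown from the corner receives label 1 and the central object receives label 2, which mirrors the idea of isolating a centered object by starting from a corner.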
The feature extraction task transforms the rich content and large input data of an image into a reduced set of features in order to decrease the processing time. It improves not only the retrieval and annotation accuracy but also the annotation speed, since a large image database can be organized according to the classification rule and searched accordingly [6].
The representation of the image content must remain stable under translation, rotation and change of scale. This is the reason for using moments as the feature extraction method on the segmented image.
The use of moments for image analysis and pattern recognition was inspired by Hu [7] and Alt [10]. In this paper, the moments used are Hu moments, Legendre moments and Zernike moments.

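A. Hu moments
For a discrete image of M × N pixels with intensity function f(x, y), Hu [7] defined the following seven moments, which are invariant to change of scale, translation and rotation:

$$
\begin{aligned}
\phi_1 &= \eta_{20} + \eta_{02} \\
\phi_2 &= (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \\
\phi_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \\
\phi_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2 \\
\phi_5 &= (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\
       &\quad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \\
\phi_6 &= (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \\
\phi_7 &= (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\
       &\quad - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]
\end{aligned}
$$

where $\eta_{pq} = \mu_{pq} / \mu_{00}^{1 + (p+q)/2}$ are the normalized central moments and $\mu_{pq} = \sum_{x}\sum_{y} (x - \bar{x})^p (y - \bar{y})^q f(x, y)$ are the central moments computed about the image centroid $(\bar{x}, \bar{y})$.

Figure 2. The region growing segmentation algorithm.
While not all the pixels in the image have been visited:
  1. Choose an unlabeled pixel p_k;
  2. Set the region's mean to the intensity of pixel p_k;
  3. Consider the unlabeled neighbouring pixels p_kj;
  4. If |pixel's intensity - region's mean| < threshold:
       a. Assign the pixel p_kj to the region labeled k;
       b. Update the region's mean and go back to step 3;
     Else k = k + 1 and go back to step 1.
     End If
End While

B. Legendre moments
Legendre moments have been used in several pattern recognition applications [9]. The orthogonality of the Legendre polynomials implies no redundancy or overlap of information between moments of different orders. This property makes the contribution of each moment unique and independent of the other information in the image [10]. The Legendre moments for a discrete image of M × N pixels with intensity function f(x, y) are given by [13]: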


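$$L_{pq} = \lambda_{pq} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} P_p(x_i)\, P_q(y_j)\, f(i, j)$$

where $\lambda_{pq} = \frac{(2p+1)(2q+1)}{M \times N}$, and $x_i$ and $y_j$ denote the normalized pixel coordinates in the range [-1, +1], which are given by:

$$x_i = \frac{2i - (M-1)}{M-1}, \qquad y_j = \frac{2j - (N-1)}{N-1}$$

$P_p(x)$ is the p-th order Legendre polynomial, defined by:

$$P_p(x) = \frac{1}{2^p} \sum_{\substack{k=0 \\ p-k\ \text{even}}}^{p} (-1)^{\frac{p-k}{2}} \frac{(p+k)!}{\left(\frac{p-k}{2}\right)!\,\left(\frac{p+k}{2}\right)!\,k!}\, x^k$$

The recurrence formula of the Legendre polynomials is:

$$P_0(x) = 1, \qquad P_1(x) = x, \qquad P_p(x) = \frac{(2p-1)\,x\,P_{p-1}(x) - (p-1)\,P_{p-2}(x)}{p}$$

In this work, the recurrence formula is used to compute the Legendre polynomials in order to increase the computation speed.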
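As an illustration of the recurrence-based computation, the following Python sketch evaluates the Legendre polynomials with the three-term recurrence and then accumulates the moments L_pq for p + q up to a given order. The function names, the normalization (2p+1)(2q+1)/(MN) and the linear mapping of pixel indices onto [-1, +1] are assumptions made for this sketch; it is not the authors' Matlab code.

```python
import numpy as np


def legendre_polynomials(order, x):
    """Return P_0..P_order evaluated at x, using the three-term recurrence."""
    x = np.asarray(x, dtype=float)
    P = np.empty((order + 1,) + x.shape)
    P[0] = 1.0
    if order >= 1:
        P[1] = x
    for p in range(2, order + 1):
        P[p] = ((2 * p - 1) * x * P[p - 1] - (p - 1) * P[p - 2]) / p
    return P


def legendre_moments(image, order):
    """Return a dict {(p, q): L_pq} for all p + q <= order."""
    f = np.asarray(image, dtype=float)
    M, N = f.shape
    xi = 2.0 * np.arange(M) / (M - 1) - 1.0   # normalized row coordinates in [-1, 1]
    yj = 2.0 * np.arange(N) / (N - 1) - 1.0   # normalized column coordinates in [-1, 1]
    Px = legendre_polynomials(order, xi)      # shape (order + 1, M)
    Py = legendre_polynomials(order, yj)      # shape (order + 1, N)
    moments = {}
    for p in range(order + 1):
        for q in range(order + 1 - p):
            lam = (2 * p + 1) * (2 * q + 1) / (M * N)
            # Px[p] @ f @ Py[q] is the double sum over all pixels.
            moments[(p, q)] = lam * (Px[p] @ f @ Py[q])
    return moments


# Order 3 yields the ten features L00, L01, ..., L30.
legendre_features = legendre_moments(np.ones((8, 8)), order=3)
```

Precomputing the polynomial values once per axis keeps the cost of each moment to a pair of matrix-vector products, which is the practical benefit of the recurrence over evaluating the explicit sum for every pixel.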

C. Zernike moments
Zernike moments are the mapping of an image onto a set of complex Zernike polynomials. As these Zernike polynomials are orthogonal to each other, Zernike moments can represent the properties of an image with no redundancy or overlap of information between the moments [11]. Due to these characteristics, Zernike moments have been used as feature sets in many applications [12].
The discrete form of the Zernike moments of an image of size M × N represented by f(x, y) is expressed, in the unit disk $x^2 + y^2 \le 1$, as follows [13]:

$$Z_{pq} = \frac{p+1}{\lambda} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} R_{pq}(r_{xy})\, e^{-jq\theta_{xy}}\, f(x, y)$$

where $p - |q|$ is even and $|q| \le p$, the radial polynomial is

$$R_{pq}(r) = \sum_{s=0}^{(p-|q|)/2} \frac{(-1)^s\,(p-s)!}{s!\,\left(\frac{p+|q|}{2} - s\right)!\,\left(\frac{p-|q|}{2} - s\right)!}\, r^{p-2s}$$

$\lambda$ is the number of pixels located in the unit circle, and the transformed phase $\theta_{xy}$ and the distance $r_{xy}$ at the pixel of coordinates (x, y) are [13]:

$$\theta_{xy} = \tan^{-1}\!\left(\frac{(2y - (N-1))/(N-1)}{(2x - (M-1))/(M-1)}\right), \qquad r_{xy} = \sqrt{\left(\frac{2x - (M-1)}{M-1}\right)^2 + \left(\frac{2y - (N-1)}{N-1}\right)^2}$$

Most of the time taken to compute Zernike moments is spent on the radial polynomials. Researchers have therefore proposed faster methods that reduce the factorial terms by using recurrence relations on the radial polynomials. In this paper, we obtained the Zernike moments using the direct method.
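The direct method can be sketched in Python as below. The sketch assumes the usual factorial form of the radial polynomial, a linear mapping of pixel indices onto [-1, +1], and normalization by the number of pixels inside the unit circle; the function names are hypothetical. Only the magnitude of each complex moment is kept as a feature, since the magnitude is what is invariant to rotation.

```python
from math import factorial

import numpy as np


def radial_polynomial(p, q, r):
    """Zernike radial polynomial R_pq(r), evaluated directly from factorials."""
    q = abs(q)
    R = np.zeros_like(r)
    for s in range((p - q) // 2 + 1):
        coeff = ((-1) ** s * factorial(p - s)
                 / (factorial(s)
                    * factorial((p + q) // 2 - s)
                    * factorial((p - q) // 2 - s)))
        R += coeff * r ** (p - 2 * s)
    return R


def zernike_moment(image, p, q):
    """Complex Zernike moment Z_pq of an image mapped onto the unit disk."""
    f = np.asarray(image, dtype=float)
    M, N = f.shape
    x = (2.0 * np.arange(M) - (M - 1)) / (M - 1)
    y = (2.0 * np.arange(N) - (N - 1)) / (N - 1)
    X, Y = np.meshgrid(x, y, indexing="ij")
    r = np.hypot(X, Y)
    theta = np.arctan2(Y, X)
    inside = r <= 1.0                 # pixels outside the unit circle are ignored
    lam = inside.sum()                # number of pixels inside the unit circle
    kernel = radial_polynomial(p, q, r) * np.exp(-1j * q * theta)
    return (p + 1) / lam * np.sum(f[inside] * kernel[inside])


# Magnitudes of the order-4 moments used as features (9 values).
zernike_demo = np.random.default_rng(0).random((16, 16))
zernike_features = [abs(zernike_moment(zernike_demo, p, q))
                    for p in range(5) for q in range(p + 1) if (p - q) % 2 == 0]
```

The inner factorial loop is the costly part of this direct evaluation, which is why recurrence-based variants are attractive for higher orders.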

IV. IMAGE CLASSIFICATION AND ANNOTATION
Once the features are extracted, a suitable classifier must be chosen. Many classifiers are in use, and each is suited to a particular kind of feature vector depending on its characteristics. Some of these classifiers are based on supervised learning, which requires an intensive learning and training phase for the classifier parameters (parameters of Support Vector Machines [14], Boosting [15], parametric generative models [16], decision trees [17], and neural networks [18]). They are also known as parametric classifiers. Other classifiers base their classification decision directly on the data and require no learning or training of parameters. These methods are also known as nonparametric classifiers. The most common nonparametric classifier, based on distance estimation, is the nearest neighbour classifier.
Because of their ability to detect complex nonlinear relationships between dependent and independent variables, multilayer neural networks are used in this paper to classify and annotate image content. Their performance is compared to that of the nearest neighbour classifier.

A. Multilayer Neural Network classifier
A multilayer neural network consists of an input layer containing a set of input nodes, one or more hidden layers of nodes, and an output layer of nodes [19]. Fig. 4 shows the three-layer network used in this paper, which has an input layer of m nodes, one hidden layer of 20 nodes, and an output layer of n nodes. This neural network is trained to classify inputs according to target classes.
The training input data are loaded from the reference database, while each target is a vector of zeros except for the element i that represents the appropriate class. After random initialisation of the biases and connection weights, the training of the neural network is based on a loop of steps. It starts by propagating the inputs from the input layer to the output layer. The error is then calculated and back-propagated through each layer of the network in order to update the biases and connection weights. Once the biases and connection weights have been changed, the propagation of the inputs is repeated until the error between outputs and targets reaches its minimum. This is the stopping criterion of the neural network training.
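The inputs $Y_i$ are presented to the input layer and propagated to the hidden layer using the following formula:

$$X_j = \sum_{i=1}^{m} W_{ji}\, Y_i + b_j, \qquad Y_j = f(X_j)$$

Then from the hidden layer to the output layer:

$$X_k = \sum_{j} W_{kj}\, Y_j + b_k$$

Finally, the outputs are:

$$O_k = f(X_k)$$

where f is the activation function (hyperbolic tangent sigmoid) used in the three-layer neural network, defined by:

$$f(x) = \frac{2}{1 + e^{-2x}} - 1$$

At the output layer, the error between the desired output $T_k$ and the actual output $O_k$ is calculated by:

$$E_k = f'(X_k)\,(T_k - O_k)$$

The calculated error is propagated to the hidden layer using the following formula:

$$E_j = f'(X_j) \sum_{k} W_{kj}\, E_k$$

Then the back-propagated error from the hidden layer to the input layer is:

$$E_i = \sum_{j} W_{ji}\, E_j$$

The biases and the connection weights between the input layer i, the hidden layer j and the output layer k are adjusted, with learning rate $\eta$, by:

$$W_{kj} \leftarrow W_{kj} + \eta\, E_k\, Y_j, \quad b_k \leftarrow b_k + \eta\, E_k, \quad W_{ji} \leftarrow W_{ji} + \eta\, E_j\, Y_i, \quad b_j \leftarrow b_j + \eta\, E_j$$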
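The training loop described above can be sketched with NumPy as follows. This is a minimal batch gradient-descent illustration under stated assumptions: the learning rate `eta`, the number of epochs, the initialisation scale, the function names and the toy data are all hypothetical, and a fixed number of epochs stands in for the minimum-error stopping criterion. It is standard back-propagation for a network with one hidden layer of 20 tanh nodes, not a reproduction of the authors' Matlab implementation.

```python
import numpy as np


def train_mlp(Y, T, hidden=20, eta=0.05, epochs=2000, seed=0):
    """Train a three-layer tanh network with plain batch back-propagation."""
    rng = np.random.default_rng(seed)
    m, n = Y.shape[1], T.shape[1]
    # Random initialisation of connection weights and biases (scale is an assumption).
    W1 = rng.normal(0.0, 0.1, (m, hidden))
    b1 = rng.normal(0.0, 0.1, hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, n))
    b2 = rng.normal(0.0, 0.1, n)
    for _ in range(epochs):
        H = np.tanh(Y @ W1 + b1)                 # input layer -> hidden layer
        O = np.tanh(H @ W2 + b2)                 # hidden layer -> output layer
        E_out = (T - O) * (1.0 - O ** 2)         # output error, f'(x) = 1 - f(x)^2
        E_hid = (E_out @ W2.T) * (1.0 - H ** 2)  # error propagated to the hidden layer
        W2 += eta * H.T @ E_out                  # update weights and biases
        b2 += eta * E_out.sum(axis=0)
        W1 += eta * Y.T @ E_hid
        b1 += eta * E_hid.sum(axis=0)
    return W1, b1, W2, b2


def predict(Y, params):
    """Return the index of the most active output node for each input row."""
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(np.tanh(Y @ W1 + b1) @ W2 + b2), axis=1)


# Toy data: two well-separated classes with one-hot targets.
Y_demo = np.array([[-1.0, -1.0], [-0.8, -1.2], [1.0, 1.0], [1.2, 0.8]])
T_demo = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
params = train_mlp(Y_demo, T_demo)
predicted_classes = predict(Y_demo, params)
```

In the annotation setting, each row of the input would be the moment feature vector of one region, and the index returned by `predict` would select the keyword of the corresponding class.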

B. Nearest Neighbor classifier
The nearest neighbour classifier compares the feature vector of the input image with the feature vectors stored in the database. The class is found by measuring the distance between the feature vector of the input image and the feature vectors of the images in the reference database. The Euclidean distance is used in this paper, but other distance measures can also be used [14].
Let $X_1, X_2, \ldots, X_k$ be the k class feature vectors in the database and $X_q$ the feature vector of the query image. The feature vector with the minimum distance is taken as the closest match. It is given by:

$$d_{\min} = \min_{1 \le j \le k} \lVert X_q - X_j \rVert, \qquad \lVert X_q - X_j \rVert = \sqrt{\sum_{l} \left(x_{q,l} - x_{j,l}\right)^2}$$
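A minimal Python sketch of this decision rule is given below. The function name, the toy prototype vectors and the keywords are illustrative assumptions, not data from the paper.

```python
import numpy as np


def nearest_neighbour(query, prototypes, keywords):
    """Return the keyword of the prototype closest to `query` and its Euclidean distance."""
    diffs = np.asarray(prototypes, dtype=float) - np.asarray(query, dtype=float)
    distances = np.linalg.norm(diffs, axis=1)   # one Euclidean distance per prototype
    best = int(np.argmin(distances))
    return keywords[best], float(distances[best])


# Hypothetical reference database with three annotated feature vectors.
prototypes = [[0.0, 0.0], [3.0, 4.0], [10.0, 10.0]]
keywords = ["apple", "car", "cow"]
label, distance = nearest_neighbour([2.5, 4.0], prototypes, keywords)
```

No training phase is needed: the reference database itself is the model, which is why this classifier is cheap to set up but must scan every stored vector at query time.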

C. Image Annotation
For image annotation, low-level feature vectors are calculated iteratively for each region in the image, using Hu moments, Legendre moments or Zernike moments. These feature vectors are presented either to the nearest neighbour classifier or to the multilayer neural network that has already been trained.
When the feature vectors are presented to the nearest neighbour classifier and matched against the feature values in the reference database, the label or keyword of the image class with the minimum distance is selected.
When the feature vectors are fed to the input layer of the multilayer neural network, each input neuron corresponds to one element of the feature vector, and the output neurons represent the class labels of the images to be classified and annotated. Each region is then annotated with the label found by the classifier.
The input layer of the neural network has a variable number of input nodes: seven when Hu moments are adopted, nine when Zernike moments are adopted, and ten when Legendre moments are used as the feature extraction method. However, the number of input nodes can be increased when using Zernike and Legendre moments in order to increase the accuracy of the annotation system. The features can also be combined and fed together to the multilayer neural network or the nearest neighbour classifier.

V. EXPERIMENTS AND RESULTS

A. Experiments
In our experiments, for each region that represents an object in the input image, the number of input features extracted using the Hu invariants is 7 (hu1, hu2, hu3, hu4, hu5, hu6, hu7), the number of input features extracted using Zernike moments up to order 4 is 9 (Z00, Z11, Z20, Z22, Z31, Z33, Z40, Z42, Z44), and the number of input features extracted using Legendre moments up to order 3 is 10 (L00, L01, L02, L03, L10, L11, L12, L20, L21, L30). These inputs are fed to the multilayer neural network or the nearest neighbour classifier in order to match them against the feature values in the reference database. The appropriate keywords are then selected and used to annotate the input image. The accuracy of the image annotation system is evaluated by the precision rate, which is the number of correct results divided by the number of all returned results.
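The feature counts and the precision rate quoted above can be checked with a few lines of Python. The helper names are hypothetical; the index rules (n - m even with 0 ≤ m ≤ n for Zernike moments, p + q ≤ order for Legendre moments) are inferred from the feature lists given in the text.

```python
def zernike_indices(order):
    """Index pairs (n, m) with 0 <= m <= n <= order and n - m even."""
    return [(n, m) for n in range(order + 1) for m in range(n + 1) if (n - m) % 2 == 0]


def legendre_indices(order):
    """Index pairs (p, q) with p + q <= order."""
    return [(p, q) for p in range(order + 1) for q in range(order + 1 - p)]


def precision(predicted, expected):
    """Number of correct results divided by the number of all returned results."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(predicted)


zernike_names = [f"Z{n}{m}" for n, m in zernike_indices(4)]
legendre_names = [f"L{p}{q}" for p, q in legendre_indices(3)]
example_precision = precision(["apple", "car", "cow", "cup"], ["apple", "car", "dog", "cup"])
```

Here `zernike_names` and `legendre_names` reproduce the nine and ten feature labels listed in the paragraph, and the illustrative precision example has three correct annotations out of four returned.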
All the experiments are conducted using the ETH-80 database, which contains images of 8 different object categories [20]. The proposed system has been implemented and tested on a Core 2 Duo personal computer using Matlab.

B. Results
For both classifiers (the neural network classifier and the nearest neighbour classifier), the annotation rates obtained with Hu, Zernike and Legendre moments as the feature extraction method are given for each object in Table 1. The best annotation rate is achieved when Legendre moments are used as the feature extraction method and the multilayer neural network as the classifier. In terms of computation time, however, the nearest neighbour classifier based on Hu moments is the best.
The annotation rate of Zernike moments is lower than that of Legendre moments because Zernike moments are computed only from the image pixels that fall inside the unit circle after the mapping transformation. On the one hand, the pixels outside the unit circle are not considered. On the other hand, only the absolute values of the extracted complex Zernike moments are fed into the classifier. These are the main reasons for the modest results obtained with Zernike moments. Fig. 7 presents an example of annotation results obtained using the presented system with Zernike moments as the feature extraction method. The graphical user interface is illustrated in Fig. 8.
The results are also affected by the accuracy of the image segmentation method. In most cases, it is very difficult to obtain an ideal automatic segmentation, which decreases the annotation rates. Therefore, any annotation attempt must treat image segmentation as an important step, not only for automatic image annotation but also for other systems that rely on it. The multilayer neural network classifier based on Legendre moments and Zernike moments is very expensive in terms of the computation time of the features, in addition to the training time of the classifier. Using them in real time for an online image annotation system would therefore be difficult and impractical.

VI. CONCLUSION
In this paper, we evaluated the image annotation performance of the neural network classifier and the nearest neighbour classifier based on Hu moments, Legendre moments and Zernike moments as feature extraction methods. The performance of each classifier and each feature extraction method has been analyzed experimentally. The experimental results showed that the proposed image annotation system based on the multilayer neural network classifier gives the best results for images that are properly segmented. However, processing time and image segmentation remain challenges that need more attention in order to increase the precision and accuracy of the image annotation system. The gap between the low-level features and the semantic content of an image must also be reduced to improve the accuracy of any image annotation system. Other image segmentation methods and other classifiers may be considered in future work on other image databases.

Figure 1. The block diagram of the image annotation system.

Figure 3. Example of image segmentation result.
Legendre moments were first introduced by Teague [8].

Figure 4. The three-layer neural network.
Fig. 5 shows some examples of objects from the ETH-80 image database used in our experiments. The experiments are based on eight classes of objects (Apple, Car, Cow, Cup, Dog, Horse, Pear, and Tomato). The number of prototypes per class is 5.

Figure 5. Some examples of objects from the ETH-80 database.

Figure 6. Comparison of general annotation rates.
Fig. 6 compares the general annotation rate for each feature extraction method and each classifier.

Figure 7. Example of the annotation results using Zernike moment features and (a) the multilayer neural network classifier, (b) the nearest neighbour classifier.

TABLE I. OBJECT ANNOTATION RATES OF HU, ZERNIKE AND LEGENDRE MOMENTS USING THE NEURAL NETWORK AND NEAREST NEIGHBOUR CLASSIFIERS
The general annotation rates and error rates of the neural network classifier and the nearest neighbour classifier based on Hu moments, Zernike moments and Legendre moments as feature extraction methods are given in Table 2. The experimental results show that, for each of the three types of moments, the annotation rate of the neural network classifier is higher than that of the nearest neighbour classifier.

TABLE II. ANNOTATION AND ERROR RATES OF HU, ZERNIKE AND LEGENDRE MOMENTS USING THE NEURAL NETWORK AND NEAREST NEIGHBOUR CLASSIFIERS