Laguerre Kernel-Based SVM for Image Classification

Abstract: Support vector machines (SVMs) are promising methods for classification and regression analysis because their solid mathematical foundations give them several salient properties that other methods hardly provide. However, the performance of SVMs is very sensitive to the choice of kernel function, and the challenge is to choose a kernel function that classifies the data accurately. In this paper, we introduce a set of new kernel functions derived from the generalized Laguerre polynomials. The proposed kernels could improve the classification accuracy of SVMs for both linear and nonlinear data sets. They satisfy Mercer's condition and have orthogonality properties, which are important and useful in applications where the number of support vectors matters, as in feature selection. The performance of the generalized Laguerre kernels is evaluated in comparison with existing kernels. It was found that the choice of the kernel function, and the values of the parameters for that kernel, are critical for a given amount of data. The proposed kernels give good classification accuracy in nearly all the data sets, especially those of high dimension.


I. INTRODUCTION
Improving the efficacy of classifiers has been an extensive research area in machine learning over the past two decades, which has led to state-of-the-art classifiers such as support vector machines, neural networks and many more. The support vector machine (SVM) is a robust classification tool that effectively overcomes many traditional classification problems, such as local optima and the curse of dimensionality [1]. The SVM algorithm [2][3] has been shown to be one of the most effective machine learning algorithms. It gives very good results in terms of accuracy whether the data are linearly or non-linearly separable. When the data are linearly separable, the SVM result is a separating hyperplane that maximizes the margin of separation between classes, measured along a line perpendicular to the hyperplane. If the data are not linearly separable, the algorithm maps the data to a higher-dimensional feature space (where the data become separable) using an appropriate kernel function, and a maximum-margin separating hyperplane is found in this space. Thus the weight vector that defines the maximal-margin hyperplane is a sufficient statistic for the SVM algorithm (it contains all the information needed for constructing the separating hyperplane). Since this weight vector can be expressed as a weighted sum of a subset of training instances, called support vectors, it follows that the support vectors and the associated weights also constitute sufficient statistics for learning SVMs from centralized data.
One issue in improving the accuracy of SVMs is finding an appropriate kernel for the given data. Most research relies on a priori knowledge to select the correct kernel, and then tweaks the kernel parameters via machine learning or trial and error. While there exist rules of thumb for choosing appropriate kernel functions and parameters, this limits the usefulness of SVMs to expert users, especially since different functions and parameters can have widely varying performance. Williamson et al. [4] published a method that uses entropy numbers to choose an appropriate kernel function. It was an attempt to explain kernel function choice by more analytical means than the previous ad hoc or empirical methods; the entropy numbers associated with mapping operators for Mercer kernels are discussed there. In [5], it was stated that previous work on invariance transformations was mostly appropriate only for linear SVM classifiers. For non-linear SVM classifiers, an analytical method that uses a kernel principal component analysis (PCA) map to incorporate invariance transformations was presented in [6].
Tsang et al. [7] discussed a way to take advantage of the approximations inherent in kernel classifiers by using the Minimum Enclosing Ball algorithm as an alternative means of speeding up training. Training time had previously been reduced mostly by modifying the training set in some way. Their final classifier, which they called the Core Vector Machine, converged in linear time with space requirements independent of the number of data points. Zanaty and Aljahdali [8] investigated the performance of different kernels when applied to different data sets. Zanaty et al. [9][10] combined GF and RBF functions in one kernel, called the "universal kernel", to take advantage of their respective strengths. The universal kernels combine the most established kernels, such as radial basis, Gaussian and polynomial functions, with parameters optimized using the training data. Sedat Ozer et al. [11] introduced a set of new kernel functions derived from the generalized Chebyshev polynomials, where the generalized Chebyshev kernel approaches the minimum support vector number and maximum classification performance. Zhi-Bin Pan et al. [12] introduced a support vector machine based on orthogonal Legendre polynomials to reduce the redundancy in feature space through the orthogonality of the Legendre polynomials, which may enable the SVM to construct the separating hyperplane with fewer support vectors. These kernels satisfy Mercer's condition and converge faster than the existing kernels.
Achieving an SVM with high classification accuracy therefore requires specifying a high-quality kernel function. In this paper, a new set of Laguerre functions is introduced that could improve the classification accuracy of SVMs. A class of Laguerre kernel functions, built on the properties of the common kernels, is proposed, which can find numerous applications in practice. On average over the simulation datasets, the proposed set of kernel functions provides competitive performance when compared with the other common kernel functions. The results indicate that they can be used as a good alternative to other common kernel functions for SVM classification in order to obtain better accuracy.
The rest of this paper is organized as follows. In section 2, SVM classifiers are discussed. The kernel functions are discussed in section 3. The proposed kernels and the Laguerre polynomials are discussed in section 4. Section 5 presents the generalized Laguerre kernels and their functional analysis. Experimental and comparative results are given in section 6. Finally, section 7 gives the conclusion.

II. SVM CLASSIFIER
SVMs [14] are a relatively new approach for creating classifiers that has become increasingly popular in the machine learning community. They present several advantages over other methods such as neural networks in areas like training speed, convergence and complexity control of the classifier, as well as a stronger mathematical background based on optimization and statistical learning theory. In the learning paradigm embodied in support vector machines, where "learning" covers selection, identification, estimation, training or tuning, the parameters are not predefined and their number depends on the training data used [14][15]. Support vector machines combine two main ideas. The first is the concept of an optimum linear margin classifier, which constructs a separating hyperplane that maximizes the distance to the training points. The second is the concept of a kernel. In its simplest form, the kernel is a function that calculates the dot product of two training vectors. Kernels calculate this dot product in feature space, often without explicitly calculating the feature vectors, operating directly on the input vectors instead. When we use a feature transformation, which reformulates the input vector into new features, the dot product is calculated in feature space, even if the new feature space has higher dimensionality, so the linear classifier is unaffected. Margin maximization provides a useful trade-off against accuracy on the training set, whose pursuit alone can easily lead to overfitting of the training data.
Consider an input space $X$ with input vectors $x$, a target space $Y = \{1, -1\}$ and a training set $T_r = \{(x_1, y_1), \ldots, (x_N, y_N)\}$ with $x_i \in X$ and $y_i \in Y$. In SVM classification, separation of the two classes $Y = \{1, -1\}$ is done by means of the maximum-margin hyperplane, i.e. the hyperplane that maximizes the distance to the closest data points and guarantees the best generalization on new, unseen examples. Let us consider two hyperplanes:
$$\langle w, x \rangle + b = 1 \quad \text{and} \quad \langle w, x \rangle + b = -1. \qquad (2)$$
The distance from the separating hyperplane $\langle w, x \rangle + b = 0$ to a point $x_i$ can be written:
$$d(w, b; x_i) = \frac{|\langle w, x_i \rangle + b|}{\|w\|}.$$
Consequently the margin between the two hyperplanes can be written as:
$$\frac{2}{\|w\|}.$$
To maximize this margin we have to minimize $\|w\|$. This comes down to solving a quadratic optimization problem with linear constraints. Notice, however, that we assumed that the data in $T_r$ are perfectly linearly separable. In practice this will often not be the case.
Therefore we employ the so-called soft-margin method in contrast to the hard-margin method. Omitting further details, we can rewrite the soft-margin optimization problem by stating the hyperplane in its dual form, i.e. find the Lagrange multipliers $\alpha_i \ge 0$ $(i = 1, \ldots, N)$ that maximize
$$\sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \quad \text{subject to} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C,$$
where $C$ is the soft-margin penalty parameter. Considering the dual problem above, we can now write the maximum-margin hyperplane as a linear combination of support vectors. By definition, the vectors $x_i$ corresponding to non-zero $\alpha_i$ are called the support vectors, and this set consists of those data points that lie closest to the hyperplane and thus are the most difficult to classify. In order to classify a new point $x_{new}$, one has to determine the sign of
$$\sum_{x_i \in SV} \alpha_i y_i \langle x_i, x_{new} \rangle + b.$$
If this sign is positive, $x_{new}$ belongs to class 1; if negative, to class -1; if zero, $x_{new}$ lies on the decision boundary. Note that we have restricted the summation to the set of support vectors because the other $\alpha_i$ are zero anyway.
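The classification rule above can be sketched in a few lines of Python. This is a minimal illustration rather than a trained model: the support vectors, multipliers and bias below are hypothetical values chosen by hand for a two-point toy problem, and the function names are assumptions introduced only for this example.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def decision_value(x_new, support_vectors, alphas, labels, bias):
    """Sum over support vectors of alpha_i * y_i * <x_i, x_new>, plus the bias b."""
    return sum(a * y * dot(sv, x_new)
               for sv, a, y in zip(support_vectors, alphas, labels)) + bias


def classify(x_new, support_vectors, alphas, labels, bias):
    """Return +1 or -1 by the sign of the decision value, or 0 on the boundary."""
    value = decision_value(x_new, support_vectors, alphas, labels, bias)
    return (value > 0) - (value < 0)


# Hypothetical toy model (assumption): one support vector per class.
# The multipliers satisfy sum(alpha_i * y_i) = 0, as the dual problem requires.
SUPPORT_VECTORS = [(1.0, 1.0), (3.0, 3.0)]
ALPHAS = [0.25, 0.25]
LABELS = [-1, 1]
BIAS = -2.0

for point in [(4.0, 1.0), (0.0, 1.0), (2.0, 2.0)]:
    print(point, classify(point, SUPPORT_VECTORS, ALPHAS, LABELS, BIAS))
```

With these values the weight vector is (0.5, 0.5), so the two support vectors sit exactly on the hyperplanes with decision values -1 and +1, and the midpoint (2, 2) lies on the decision boundary.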

III. KERNEL FUNCTIONS
The support vector machine is one of the kernel-based learning algorithms, which consist of a learning algorithm and a kernel function [16,17]. The kernel function creates the hypothesis space in which the learning process searches. The kernel can be considered a similarity measure between two inputs, corresponding to their inner product in some feature space into which the original inputs are mapped. This is very useful, for instance, when the concept to be learned depends nonlinearly on the data but the learning algorithm is able to learn only linear dependencies. Since support vector machines are linear classifiers, it is necessary to map the input vectors with a nonlinear mapping in order to learn non-linear relations. The resulting vectors are usually called features. Formally, let $X$ denote the input space, which can be any set, and $F$ denote the feature vector space. For any mapping
$$\phi : X \to F,$$
the inner product of the mapped inputs is called a kernel function:
$$k(x, z) = \langle \phi(x), \phi(z) \rangle.$$
A necessary condition for this is that $k(x, z)$ is symmetric and finitely positive semi-definite [18][19]. There are many different types of kernels that can be found in the literature [18][19].
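The statement that a kernel is an inner product in feature space can be checked on a small case. The sketch below uses the homogeneous degree-2 polynomial kernel on two-dimensional inputs, for which the feature map is known in closed form; this particular kernel and the names `phi` and `poly2_kernel` are chosen for illustration and do not come from the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map of k(x, z) = <x, z>^2 for 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """Evaluate <x, z>^2 directly on the inputs, without building phi."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, -1.0)
kernel_value = poly2_kernel(x, z)          # computed in input space
feature_value = dot(phi(x), phi(z))        # computed in feature space
print(kernel_value, feature_value)
```

Both numbers agree up to rounding, although the kernel never forms the three-dimensional feature vectors.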

IV. PROPOSED KERNELS
A critical step in support vector machine classification is choosing a suitable kernel for a particular application, i.e. different applications need different kernels to get reliable classification results. It is well known that the two typical kernel functions often used in SVMs are the radial basis function kernel and the polynomial kernel. More recent kernels are presented in [9][10][11][12][20][21][22][23] to handle high-dimensional data sets; they are computationally efficient when handling non-separable data with multiple attributes. However, it is difficult to find kernels that achieve high classification accuracy for a diversity of data sets. In order to construct kernel functions from existing ones, or by using some other simpler kernel functions as building blocks, the closure properties of kernel functions are essential [16][17][18].
For given non-separable data to become linearly separable, a suitable kernel has to be chosen. Classical kernels, such as the Gauss RBF and POLY functions, can be used to transform non-separable data into separable data, but their accuracy depends on the given data sets. The POLY function, in which d is the polynomial degree, performs well [20] with nearly all data sets, except high-dimensional ones.
The same performance [20] is obtained with the Gauss RBF, which has a positive parameter controlling the radius.
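For reference, the two classical kernels can be written as short functions. The forms used here are the common textbook ones and are assumptions: the offset of 1 in the polynomial kernel, the name `sigma` for the radius parameter and the 2σ² scaling may differ from the exact expressions used in [20].

```python
import math


def poly_kernel(x, z, d=2):
    """Polynomial kernel (<x, z> + 1)^d; the offset 1 is an assumed convention."""
    inner = sum(a * b for a, b in zip(x, z))
    return (inner + 1.0) ** d


def gauss_rbf_kernel(x, z, sigma=1.0):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 sigma^2)); sigma > 0 sets the radius."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


x, z = (1.0, 2.0), (3.0, 4.0)
print(poly_kernel(x, z, d=2))
print(gauss_rbf_kernel(x, z, sigma=0.5))
```

Both functions act on whole input vectors, which matches the requirement that a kernel measures the correlation of two vectors rather than of their individual elements.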
Zanaty et al. [9] presented the polynomial radial basis function (RBPF), which has a parameter p. Zanaty et al. [10] presented SVMs with universal kernels, called the Gaussian radial basis polynomial function (GRPF), whose kernel parameters, one of which is d, correspond to the Gaussian, polynomial and universal kernels respectively; further parameters scale the polynomial kernel and determine the width of the Gaussian kernel. Kernel functions should be applied to the input vectors directly, instead of being applied to each element with the results combined by a product, since kernel functions are supposed to provide a measure of the correlation of two input vectors in a higher-dimensional space.

A. Laguerre polynomials
The Laguerre polynomials $L_n(x)$ are defined through a generating function in an auxiliary variable t. Expanding the exponential factor of the generating function as a power series, applying the binomial expansion to the remaining powers of (1 - t), and equating powers of $t^n$ gives an explicit series for $L_n(x)$.
B.1 Rodrigues' Formula for the Laguerre Polynomials
Using the Leibniz formula for the derivative of a product, the explicit series may be rewritten as an n-th derivative, which is Rodrigues' formula. Laguerre polynomials of low order can be evaluated by using Rodrigues' formula (21).
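For orientation, the standard forms of these three results are collected below. They are written in the normalization $L_n(0) = 1$, which is an assumption: some texts omit the factor $1/n!$ in Rodrigues' formula, in which case every $L_n$ is larger by a factor $n!$ and the equations change accordingly.

```latex
\begin{align}
  \frac{1}{1-t}\exp\!\left(-\frac{xt}{1-t}\right)
    &= \sum_{n=0}^{\infty} L_n(x)\, t^n, \qquad |t| < 1
    && \text{(generating function)} \\
  L_n(x) &= \sum_{r=0}^{n} \frac{(-1)^r\, n!}{(r!)^2\,(n-r)!}\, x^r
    && \text{(explicit series)} \\
  L_n(x) &= \frac{e^{x}}{n!}\,\frac{d^n}{dx^n}\!\left(x^n e^{-x}\right)
    && \text{(Rodrigues' formula)}
\end{align}

% Low-order polynomials in this normalization:
\begin{align*}
  L_0(x) &= 1, \\
  L_1(x) &= 1 - x, \\
  L_2(x) &= 1 - 2x + \tfrac{1}{2}x^2, \\
  L_3(x) &= 1 - 3x + \tfrac{3}{2}x^2 - \tfrac{1}{6}x^3, \\
  L_4(x) &= 1 - 4x + 3x^2 - \tfrac{2}{3}x^3 + \tfrac{1}{24}x^4 .
\end{align*}
```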

B. Recurrence Relations
We start from the defining equation (5), the generating function. Differentiating both sides with respect to t and equating coefficients of $t^n$, we obtain a recurrence relation that links three Laguerre polynomials of consecutive order. If, on the other hand, we differentiate the generating function with respect to x and equate coefficients of $t^n$, we obtain an identity between the derivatives of the polynomials, and hence a second recurrence relation that involves the derivative of $L_n(x)$.
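A three-term recurrence is the usual way to evaluate Laguerre polynomials in practice. The sketch below assumes the normalization $L_n(0) = 1$, for which the recurrence reads $(k+1) L_{k+1}(x) = (2k + 1 - x) L_k(x) - k L_{k-1}(x)$, and cross-checks it against the explicit finite series in exact rational arithmetic. The function names are illustrative, and the coefficients would differ under the alternative normalization without the factor 1/n!.

```python
from fractions import Fraction
from math import comb, factorial


def laguerre_recurrence(n, x):
    """Evaluate L_n(x) via (k+1) L_{k+1} = (2k+1-x) L_k - k L_{k-1}."""
    prev, curr = 1, 1 - x                      # L_0(x) and L_1(x)
    if n == 0:
        return prev
    for k in range(1, n):
        prev, curr = curr, ((2 * k + 1 - x) * curr - k * prev) / (k + 1)
    return curr


def laguerre_series(n, x):
    """Evaluate L_n(x) from the explicit series sum_r (-1)^r C(n, r) x^r / r!."""
    return sum(Fraction((-1) ** r * comb(n, r), factorial(r)) * x ** r
               for r in range(n + 1))


x = Fraction(3, 2)
for n in range(6):
    print(n, laguerre_recurrence(n, x), laguerre_series(n, x))
```

Using `Fraction` keeps the comparison exact; with floating-point inputs the same recurrence works, but the two routes agree only up to rounding.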

C. Orthogonality of the Laguerre Polynomials
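Laguerre's differential equation can be cast into self-adjoint form by first writing it as
$$x L_n''(x) + (1 - x) L_n'(x) + n L_n(x) = 0$$
and then multiplying by $e^{-x}$:
$$\frac{d}{dx}\left[x e^{-x} L_n'(x)\right] + n e^{-x} L_n(x) = 0.$$
Replacing n by m gives the corresponding equation for $L_m(x)$. Multiply the first equation by $L_m(x)$ and the second by $L_n(x)$, and subtract. This gives:
$$L_m \frac{d}{dx}\left[x e^{-x} L_n'\right] - L_n \frac{d}{dx}\left[x e^{-x} L_m'\right] + (n - m)\, e^{-x} L_n L_m = 0.$$
Integrating both sides from 0 to $\infty$, and using the rule for the derivative of a product, we get
$$(n - m) \int_0^{\infty} e^{-x} L_n(x) L_m(x)\, dx = 0,$$
so the integral vanishes whenever $n \ne m$. The case $n = m$ can be examined by using Rodrigues' formula for one of the two factors. The right-hand side may then be evaluated by integrating by parts n times. At each step the integrated term vanishes at both limits, and a continuation of this process leaves an integral of $x^n e^{-x}$ against the n-th derivative of $L_n(x)$. Only the $r = n$ term of the series for $L_n(x)$ survives n differentiations, which fixes the value of the normalization integral. This result may be combined with the orthogonality relation (38) to give a single orthonormality relation. The weight function $e^{-x}$ may be removed by defining a new function $\varphi_n(x)$ proportional to $e^{-x/2} L_n(x)$, in terms of which the relation states that the $\varphi_n(x)$ are orthonormal on $[0, \infty)$ with unit weight.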
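The orthogonality relation can be verified exactly for low orders, because the integral of $e^{-x} x^k$ over $[0, \infty)$ equals $k!$. The sketch below assumes the normalization $L_n(0) = 1$, under which the weighted integral of $L_n L_m$ is 1 for n = m and 0 otherwise; the function names are illustrative only.

```python
from fractions import Fraction
from math import comb, factorial


def laguerre_coeffs(n):
    """Coefficients c_0..c_n of L_n(x) = sum_r c_r x^r, with L_n(0) = 1."""
    return [Fraction((-1) ** r * comb(n, r), factorial(r)) for r in range(n + 1)]


def weighted_inner_product(n, m):
    """Exact integral of exp(-x) L_n(x) L_m(x) over [0, inf).

    Multiplies the two polynomials term by term and uses
    int_0^inf exp(-x) x^k dx = k! for each resulting power.
    """
    total = Fraction(0)
    for r, a in enumerate(laguerre_coeffs(n)):
        for s, b in enumerate(laguerre_coeffs(m)):
            total += a * b * factorial(r + s)
    return total


for n in range(4):
    print([int(weighted_inner_product(n, m)) for m in range(4)])
```

The printed rows form the identity matrix, which is the orthonormality relation restricted to orders 0 to 3.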

V. GENERALIZED LAGUERRE KERNELS
Here, we propose a generalized way of expressing the kernel function to clarify the ambiguity on how to implement Laguerre kernels. To the best of our knowledge, there was no previous work defining the Laguerre polynomials recursively for vector inputs. Therefore, for vector inputs, we define the generalized Laguerre polynomials by a recurrence on the order n. Depending on the order, the generalized Laguerre polynomial yields either a row vector or a scalar value. Using the generalized Laguerre polynomials, we define the generalized n-th order Laguerre kernel on pairs (x, z), where x and z are m-dimensional vectors.
A. Functional analysis
Before presenting the reproducing kernel (i.e., Mercer kernel), Mercer's theorem of functional analysis is presented here as described in [24]; it gives conditions under which we can construct the mapping $\phi$ from the eigenfunction decomposition of k. According to Mercer's work [24], it is known that if k is the symmetric and continuous kernel of a positive integral operator
$$(T_k g)(x) = \int k(x, z)\, g(z)\, dz, \qquad \int\!\!\int k(x, z)\, g(x)\, g(z)\, dx\, dz \ge 0,$$
then k can be expanded into a uniformly convergent series
$$k(x, z) = \sum_{i=1}^{\infty} \lambda_i\, \psi_i(x)\, \psi_i(z), \qquad \lambda_i \ge 0.$$
In this case, the mapping from input space to feature space produced by the kernel is expressed as
$$\phi : x \mapsto \left(\sqrt{\lambda_1}\, \psi_1(x), \sqrt{\lambda_2}\, \psi_2(x), \ldots\right),$$
such that k acts as the given dot product, i.e., $k(x, z) = \langle \phi(x), \phi(z) \rangle$.
Theorem 1. Let $k_i(x, z)$ be Mercer kernels and let $a_i$ be nonnegative constants. Then $k(x, z) = \sum_i a_i k_i(x, z)$ is also a Mercer kernel.
According to Mercer's theorem, we have
$$\int\!\!\int k_i(x, z)\, g(x)\, g(z)\, dx\, dz \ge 0. \qquad (56)$$
By taking the sum of the positive combination of (56) with coefficients $a_i$ over i, one obtains
$$\int\!\!\int \sum_i a_i k_i(x, z)\, g(x)\, g(z)\, dx\, dz \ge 0.$$
This kind of kernel can find numerous applications in practice.
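The closure of Mercer kernels under nonnegative combinations, discussed above, can be probed numerically on a finite sample: the Gram matrix of the combined kernel should be symmetric and all of its quadratic forms should be non-negative. The sketch below is a sanity check under stated assumptions, not a proof; the two component kernels, the weights 0.5 and 2.0, and the random sample are all chosen for illustration.

```python
import math
import random


def linear_kernel(x, z):
    """Plain inner product <x, z>."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, sigma=1.0):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2.0 * sigma ** 2))


def combined_kernel(x, z, a1=0.5, a2=2.0):
    """Nonnegative combination a1 * k1 + a2 * k2 of two Mercer kernels."""
    return a1 * linear_kernel(x, z) + a2 * rbf_kernel(x, z)


def gram_matrix(kernel, points):
    """Matrix of kernel values for every pair of sample points."""
    return [[kernel(p, q) for q in points] for p in points]


def quadratic_form(matrix, v):
    """Compute v^T M v."""
    size = len(v)
    return sum(v[i] * matrix[i][j] * v[j] for i in range(size) for j in range(size))


rng = random.Random(0)                       # fixed seed for repeatability
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(6)]
gram = gram_matrix(combined_kernel, points)
smallest = min(quadratic_form(gram, [rng.gauss(0, 1) for _ in points])
               for _ in range(200))
print("smallest sampled quadratic form:", smallest)
```

A negative value well below rounding error would show that a candidate kernel is not a Mercer kernel, whereas non-negative samples are only consistent with the property.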
Theorem 2. The product of Mercer kernels is also a Mercer kernel.
The proof is similar to that of the preceding theorem.
Theorem 3. To be a valid SVM kernel, a kernel should satisfy the Mercer conditions [26][27]. If the kernel does not satisfy the Mercer conditions, the SVM may not find the optimal parameters, but rather it may find suboptimal parameters. Also, if the Mercer conditions are not satisfied, then the Hessian matrix for the optimization part may not be positive definite. Therefore we examine whether the generalized Laguerre kernel satisfies the Mercer conditions.
Mercer Theorem: To be a valid SVM kernel, for any finite function $g(x)$, the following integral should always be non-negative for the given kernel function $k(x, z)$ [1]:
$$\int\!\!\int k(x, z)\, g(x)\, g(z)\, dx\, dz \ge 0.$$
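A finite-sample version of this check can be sketched for one possible vector-valued Laguerre construction. Everything in the block is an assumption made for illustration and is not the authors' definition: the recurrence is the scalar Laguerre recurrence (normalization $L_n(0) = 1$) with products read as inner products, loosely following the vector-valued recursion idea used for the Chebyshev kernels of [11]; the kernel is an unweighted sum over orders; and all function names are hypothetical.

```python
import random


def generalized_laguerre(n, x):
    """Return [L_0(x), ..., L_n(x)] for a vector x under an ASSUMED recurrence.

    Even orders come out as scalars, odd orders as vectors (lists):
    (k+1) L_{k+1} = ((2k+1)*1 - x) * L_k - k * L_{k-1}.
    """
    values = [1.0]
    if n >= 1:
        values.append([1.0 - xi for xi in x])
    for k in range(1, n):
        factor = [(2 * k + 1) - xi for xi in x]      # the vector (2k+1)*1 - x
        curr, prev = values[k], values[k - 1]
        if isinstance(curr, list):                   # vector . vector -> scalar
            nxt = (sum(f * c for f, c in zip(factor, curr)) - k * prev) / (k + 1)
        else:                                        # vector * scalar -> vector
            nxt = [(f * curr - k * p) / (k + 1) for f, p in zip(factor, prev)]
        values.append(nxt)
    return values


def generalized_laguerre_kernel(x, z, n=3):
    """ASSUMED kernel: sum over orders j <= n of <L_j(x), L_j(z)>."""
    total = 0.0
    for lx, lz in zip(generalized_laguerre(n, x), generalized_laguerre(n, z)):
        if isinstance(lx, list):
            total += sum(a * b for a, b in zip(lx, lz))
        else:
            total += lx * lz
    return total


rng = random.Random(1)                               # fixed seed for repeatability
points = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(6)]
gram = [[generalized_laguerre_kernel(p, q) for q in points] for p in points]
smallest = min(
    sum(v[i] * gram[i][j] * v[j] for i in range(6) for j in range(6))
    for v in ([rng.gauss(0, 1) for _ in range(6)] for _ in range(200))
)
print("smallest sampled quadratic form:", smallest)
```

Because this assumed kernel is a sum of inner products of explicit features, its Gram matrix is positive semi-definite by construction; for one-dimensional inputs the recurrence reduces to the classical Laguerre polynomials. A kernel defined with additional weighting factors would need its own check.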

VI. EXPERIMENTAL RESULTS
The classification experiments are conducted on different data sets, namely Cloud, Liver, Seed, Forest Fire and Yeast, available at http://archive.ics.uci.edu/ml/datasets.html. These data sets, which differ in their numbers of classes and attributes, have been given to the algorithm. Table II shows the classification accuracy for the five data sets using Laguerre kernels of order 2 to 5. As shown in Figure 2, when the order of the polynomials increases, the accuracy increases for all data sets.

A. Comparative results
The performance of the proposed kernel with SVMs, in terms of classification accuracy, is evaluated by application to a variety of data sets available at: http://www.cs.toronto.edu.delve/data/image-set/desc.html. Firstly, we used LIBSVM with different kernels (linear, polynomial and radial basis function [8]). The parameters used are two parameters for the RBF kernel, γ = 0.5 and a second parameter also set to 0.5, with d = 1 for the linear kernel and d = 5 for the polynomial kernel.
Table II lists the main characteristics of the seven datasets used in the experiments.In order to evaluate the performance of the support vector machine with different kernels, we carried out some experiments with different data sets from learning benchmarks domains [28].
The data have seven different classes of image. They contain 210 data points for training and another 2100 for testing. Each vector has 18 elements with different minimum and maximum values. For training, we have 30 data points for the class (+1) and 180 for the class (-1), and similarly for testing. As can be seen from Table II, the generalized Laguerre kernel shows better generalization ability than the existing Gaussian, polynomial (POLY) and Chebyshev [12] kernels. For example, in Table II, the 5th-order generalized Laguerre kernel gives a classification accuracy of more than 96% for all test data sets, while the existing kernels achieve less than 96% for most test data sets. More specifically, comparing the results of Table III, the 5th-order generalized Laguerre kernel always gives good results and may be the best overall, as shown in Figure 3. The experimental results show that the proposed kernel function gives the best accuracy in nearly all the data sets, especially in data sets with a large number of attributes. The obtained results are encouraging and suggest that the proposed method is worth further consideration.
Note that the explicit series for $L_n(x)$ terminates after finitely many terms, i.e. $L_n(x)$ is a polynomial of degree n (see Fig. 1).

Fig. 1. The Laguerre function for the first five polynomials.
Functions which satisfy the normalization relation (54) are said to be normalized, and we say that the $\varphi_n(x)$ form an orthonormal set of functions.

Fig. 3. SVM classification accuracy with different kernels.

VII. CONCLUSION
In this paper, SVMs have been improved to solve classification problems by mapping the training data into a feature space with the aid of Laguerre kernel functions and then separating the data using a large-margin hyperplane. A class of Laguerre kernel functions, built on the properties of the common kernels, is proposed, which can find numerous applications in practice. Experimental results illustrate the validity and effectiveness of the proposed kernel.

TABLE II. CLASSIFICATION ACCURACY OF DIFFERENT DATA SETS USING LAGUERRE KERNEL FUNCTION

TABLE III. RESULTS ON IMAGE SEGMENTATION DATA WITH VARIOUS KERNEL FUNCTIONS.