The SVM Classifier Based on the Modified Particle Swarm Optimization

This paper considers the development of an SVM classifier based on a modified particle swarm optimization (PSO) algorithm. The algorithm searches simultaneously for the kernel function type, the values of the kernel function parameters and the value of the regularization parameter of the SVM classifier. The resulting SVM classifier provides high-quality data classification. The modified PSO algorithm is built on the idea of particle «regeneration»: some particles change their kernel function type to the one that corresponds to the particle with the best classification accuracy. The proposed algorithm reduces the time needed to develop the SVM classifier. The results of experimental studies confirm the efficiency of this algorithm.


I. INTRODUCTION
Currently, the SVM algorithm (Support Vector Machines, SVM), which is trained on precedents («supervised learning»), is successfully used for classification problems in various applications. This algorithm belongs to the group of boundary classification algorithms [1], [2].
The main feature of the SVM classifier is the use of a special function called the kernel, which maps the experimental data set from the original feature space into a space of higher dimension, where a hyperplane that separates the classes is constructed. Two parallel hyperplanes are constructed on both sides of the separating hyperplane. These hyperplanes define the borders of the classes and are situated at the maximal possible distance from each other. It is assumed that a larger distance between these parallel hyperplanes gives better accuracy of the SVM classifier. The feature vectors of the classified objects that lie nearest to the parallel hyperplanes are called support vectors. An example of a separating hyperplane in the 2D space is shown in Fig. 1.
Training the SVM classifier requires solving a quadratic optimization problem [1]-[3]. Using a standard quadratic problem solver would involve solving a large quadratic programming problem even for a moderately sized data set, which can limit the size of the problems that can be solved with the SVM classifier. Nowadays there are methods such as SMO [10], [11], chunking [12], simple SVM [13] and Pegasos [14] that compute the required solution iteratively and have linear space complexity [15].
The choice of the optimal parameter values of the SVM classifier is of essential interest. It is necessary to find the kernel function type, the values of the kernel function parameters and the value of the regularization parameter, which must be set by the user and then remain fixed [1], [2]. Without an adequate solution of this problem, high-accuracy data classification with the SVM classifier cannot be achieved.
The parameter values of the SVM classifier are considered optimal if high classification accuracy is achieved: the numbers of errors on the training and test sets are minimal, and the number of errors on the test set does not differ strongly from the number of errors on the training set. This makes it possible to avoid overtraining (overfitting) of the SVM classifier. In the simplest case this problem can be solved by a direct search over the kernel function types, the values of the kernel function parameters and the value of the regularization parameter, which demands significant computational expense. Classification quality can be assessed with indicators such as classification accuracy, classification completeness, etc. [3].
www.ijacsa.thesai.org
In most cases, the development of binary classifiers requires working with a complex, multi-extremal, multi-parameter objective function.
Gradient methods are not suitable for finding the optimum of such an objective function; instead, stochastic search algorithms are used, such as the genetic algorithm [16]-[18], the artificial bee colony algorithm [19], the particle swarm algorithm [20], [21], etc. In these algorithms the search for the optimal solution is carried out at once across the whole space of possible solutions.
The particle swarm algorithm (Particle Swarm Optimization, PSO algorithm) is based on the idea of solving optimization problems by modeling the behavior of groups of animals. It is the simplest algorithm of evolutionary programming, because its implementation only requires the ability to evaluate the optimized function [20], [21].
The traditional approach applies the PSO algorithm repeatedly, once for each fixed kernel function type, to choose the optimal values of the kernel function parameters and of the regularization parameter. The best kernel function type is then chosen together with the parameter values found for it.
Along with the traditional approach, a new approach is proposed that searches simultaneously for the best kernel function type, the values of the kernel function parameters and the value of the regularization parameter [22]. Hereafter, the particle swarm algorithms corresponding to the traditional and modified approaches are called the traditional PSO algorithm and the modified PSO algorithm, respectively.
The objective of this paper is a comparative analysis of the traditional and modified particle swarm algorithms, applied to the development of the SVM classifier, in terms of both the search time for the optimal parameters of the SVM classifier and the quality of data classification.
The rest of this paper is structured as follows. Section II presents the main stages of the SVM classifier development. Section III details the proposed approach to the simultaneous search for the kernel function type, the values of the kernel function parameters and the value of the regularization parameter of the SVM classifier; this approach is based on the modified PSO algorithm. Section IV presents experimental results comparing the traditional PSO algorithm with the modified PSO algorithm. Finally, conclusions are drawn in Section V.

The separating hyperplane is described by the equation $\langle w, z \rangle + b = 0$, where $w$ is a vector perpendicular to the separating hyperplane; $b$ is a parameter that corresponds to the shortest distance from the origin of coordinates to the hyperplane; $\langle w, z \rangle$ is the scalar product of the vectors $w$ and $z$ [1]-[3]. The condition $-1 < \langle w, z \rangle + b < 1$ specifies a strip that separates the classes. The wider the strip, the more confidently we can classify objects. The objects closest to the separating hyperplane lie exactly on the borders of the strip.
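In the case of linear separability of the classes, the hyperplanes can be chosen so that no object from the training set lies between them, and the distance between them (the width of the strip), $2 / \sqrt{\langle w, w \rangle}$, can then be maximized by solving the quadratic optimization problem [1], [2]:

$$\langle w, w \rangle \to \min, \qquad y_i \left( \langle w, z_i \rangle + b \right) \ge 1, \quad i = \overline{1, s}. \tag{1}$$

The problem of building the separating hyperplane can be reformulated as the dual problem of searching for a saddle point of the Lagrange function, which reduces to a quadratic programming problem containing only dual variables [1], [2]:

$$-L(\lambda) = -\sum_{i=1}^{s} \lambda_i + \frac{1}{2} \sum_{i=1}^{s} \sum_{\tau=1}^{s} \lambda_i \lambda_\tau y_i y_\tau \kappa(z_i, z_\tau) \to \min_{\lambda}, \qquad \sum_{i=1}^{s} \lambda_i y_i = 0, \quad 0 \le \lambda_i \le C, \quad i = \overline{1, s}, \tag{2}$$

where $\lambda_i$ is a dual variable; $z_i$ is an object of the training set; $y_i$ is a number (+1 or −1) that characterizes the class of the object $z_i$ from the experimental data set; $\kappa(z_i, z_\tau)$ is the kernel function; $C$ is the regularization parameter; $s$ is the number of objects in the training set.

When training the SVM classifier it is necessary to determine the kernel function type $\kappa(z_i, z_\tau)$, the values of the kernel parameters and the value of the regularization parameter $C$, which allows finding a compromise between maximizing the gap that separates the classes and minimizing the total error. Typically one of the following functions is used as the kernel function $\kappa(z_i, z_\tau)$ [26]:

polynomial function: $\kappa(z_i, z_\tau) = (\langle z_i, z_\tau \rangle + 1)^d$;

radial basis function: $\kappa(z_i, z_\tau) = \exp\left(-\langle z_i - z_\tau, z_i - z_\tau \rangle / (2\sigma^2)\right)$;

sigmoid function: $\kappa(z_i, z_\tau) = \operatorname{th}(k_2 + k_1 \langle z_i, z_\tau \rangle)$,

where $\langle z_i, z_\tau \rangle$ is the scalar product of the vectors $z_i$ and $z_\tau$; $d$, $\sigma$, $k_2$ and $k_1$ are parameters; th is the hyperbolic tangent.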

These kernel functions allow dividing the objects from different classes.
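As an illustration, the three kernel types used later in the search can be sketched in plain Python. This is a minimal sketch that assumes the commonly used forms of the polynomial, radial basis and sigmoid kernels; the function names, the exact parametrization and the default values of `d`, `sigma`, `k1` and `k2` are assumptions made for the example.

```python
import math


def dot(u, v):
    """Scalar product <u, v> of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(zi, zt, d=3):
    """Polynomial kernel (<zi, zt> + 1)^d; d is the polynomial degree."""
    return (dot(zi, zt) + 1.0) ** d


def rbf_kernel(zi, zt, sigma=1.0):
    """Radial basis kernel exp(-|zi - zt|^2 / (2 * sigma^2)), sigma > 0."""
    diff = [a - b for a, b in zip(zi, zt)]
    return math.exp(-dot(diff, diff) / (2.0 * sigma ** 2))


def sigmoid_kernel(zi, zt, k1=1.0, k2=-1.0):
    """Sigmoid kernel th(k2 + k1 * <zi, zt>); th is the hyperbolic tangent."""
    return math.tanh(k2 + k1 * dot(zi, zt))
```

Each function takes two feature vectors and returns a single number, which is all that the dual problem needs from the kernel.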
Training the SVM classifier determines the support vectors. These vectors are closest to the hyperplane separating the classes and contain all the information about the class separation. The main problem in training the SVM classifier is the lack of recommendations for choosing the value of the regularization parameter, the kernel function type and the values of the kernel function parameters that would provide high classification accuracy. This problem can be solved with various optimization algorithms, in particular with the PSO algorithm.

III. THE MODIFIED PSO ALGORITHM
In the traditional PSO algorithm the $n$-dimensional search space ($n$ is the number of parameters subject to optimization) is inhabited by a swarm of $m$ agents-particles (elementary solutions). The position (location) of the $i$-th particle is determined by the vector $x_i = (x_i^1, x_i^2, \ldots, x_i^n)$, which defines a set of values of the optimization parameters. These parameters can be present in explicit form in, or even absent from, the analytical record of the objective function $f(x) = f(x^1, x^2, \ldots, x^n)$ of the optimization algorithm (for example, the optimum is the minimum that must be achieved).
The particles must be placed randomly in the search space during initialization. Each $i$-th particle ($i = \overline{1, m}$) has its own velocity vector $v_i = (v_i^1, v_i^2, \ldots, v_i^n)$, which influences the values of the particle's coordinates at every moment of time corresponding to some iteration of the PSO algorithm.
The coordinates of the $i$-th particle ($i = \overline{1, m}$) in the $n$-dimensional search space uniquely determine the value of the objective function $f(x_i) = f(x_i^1, x_i^2, \ldots, x_i^n)$, which is a certain solution of the optimization problem [20]-[22].
For each position of the $n$-dimensional search space in which the $i$-th particle ($i = \overline{1, m}$) was placed, the value of the objective function $f(x_i)$ is calculated. Each $i$-th particle remembers the best value of the objective function it has found personally, as well as the coordinates of the position in the $n$-dimensional space that corresponds to this value. Moreover, each $i$-th particle ($i = \overline{1, m}$) «knows» the best position (in terms of achieving the optimum of the objective function) among all positions that have been «explored» by the particles (this models an immediate exchange of information among all the particles). At each iteration the particles correct their velocity in order, on the one hand, to move closer to the best position found by the particle itself and, on the other hand, to get closer to the position that is globally the best at the current moment. After a number of iterations the particles must come close to the best position (globally the best over all iterations). However, it is possible that some particles will stay in a relatively good local optimum.

Convergence of the PSO algorithm depends on how the velocity vector correction is performed. There are different approaches to the correction of the velocity vector $v_i$ of the $i$-th particle ($i = \overline{1, m}$) [20]. In the classical version of the PSO algorithm each $j$-th coordinate $v_i^j$ ($j = \overline{1, n}$) of the velocity vector is corrected in accordance with the formula [20]:

$$v_i^j = v_i^j + \hat{\varphi} \hat{r} (\hat{x}_i^j - x_i^j) + \tilde{\varphi} \tilde{r} (\tilde{x}^j - x_i^j), \tag{3}$$

where $v_i^j$ is the $j$-th coordinate of the velocity vector of the $i$-th particle; $x_i^j$ is the $j$-th coordinate of the vector $x_i$ defining the position of the $i$-th particle; $\hat{x}_i^j$ is the $j$-th coordinate of the best position vector found by the $i$-th particle during its existence; $\tilde{x}^j$ is the $j$-th coordinate of the globally best position within the particle swarm, in which the objective function has the optimal value; $\hat{r}$ and $\tilde{r}$ are random numbers in the interval (0, 1), which introduce an element of randomness into the search process; $\hat{\varphi}$ and $\tilde{\varphi}$ are the personal and global acceleration coefficients, which are constant and determine the behavior and effectiveness of the PSO algorithm in general.
In (3) the random numbers $\hat{r}$ and $\tilde{r}$ are scaled by the personal and global acceleration coefficients: the global acceleration coefficient $\tilde{\varphi}$ controls the impact of the global best position on the velocities of all particles, and the personal acceleration coefficient $\hat{\varphi}$ controls the impact of the personal best position on the velocity of the particle.
Different versions of the traditional PSO algorithm are currently known. In one of the best-known canonical versions of the PSO algorithm the acceleration coefficients $\hat{\varphi}$ and $\tilde{\varphi}$ are normalized to make the convergence of the algorithm less dependent on the choice of their values [20].
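In this case each $j$-th coordinate $v_i^j$ ($j = \overline{1, n}$) of the velocity vector is corrected in accordance with the formula:

$$v_i^j = \chi \left[ v_i^j + \hat{\varphi} \hat{r} (\hat{x}_i^j - x_i^j) + \tilde{\varphi} \tilde{r} (\tilde{x}^j - x_i^j) \right], \tag{4}$$

where $\chi$ is a compression ratio:

$$\chi = \frac{2K}{\left| 2 - \varphi - \sqrt{\varphi^2 - 4\varphi} \right|}, \qquad \varphi = \hat{\varphi} + \tilde{\varphi}; \tag{5}$$

$K$ is a scaling coefficient, which takes values from the interval (0, 1). When formula (4) is used for the correction of the velocity vector, the convergence of the PSO algorithm is guaranteed and there is no need to control the particle velocity explicitly [20].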
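The canonical velocity correction can be sketched as follows. This is a minimal sketch that assumes the Clerc-Kennedy form of the compression ratio, $\chi = 2K / |2 - \varphi - \sqrt{\varphi^2 - 4\varphi}|$ with $\varphi = \hat{\varphi} + \tilde{\varphi}$; the function and argument names are hypothetical, and the check `phi >= 4` is added only because the square root is not real otherwise.

```python
import math
import random


def compression_ratio(phi_personal, phi_global, K):
    """Compression ratio chi for the canonical PSO (assumed Clerc-Kennedy form)."""
    phi = phi_personal + phi_global
    if phi < 4.0:
        raise ValueError("phi_personal + phi_global must be at least 4")
    return 2.0 * K / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))


def corrected_velocity(v, x, x_personal, x_global,
                       phi_personal, phi_global, K, rng=random):
    """Return the corrected velocity vector of one particle.

    v, x       -- current velocity and position of the particle
    x_personal -- best position found by this particle
    x_global   -- globally best position of the swarm
    """
    chi = compression_ratio(phi_personal, phi_global, K)
    new_v = []
    for vj, xj, pj, gj in zip(v, x, x_personal, x_global):
        r_hat, r_tilde = rng.random(), rng.random()  # fresh random numbers per coordinate
        new_v.append(chi * (vj
                            + phi_personal * r_hat * (pj - xj)
                            + phi_global * r_tilde * (gj - xj)))
    return new_v
```

With $K = 1$ and both acceleration coefficients equal to 2.05, the ratio is approximately 0.73, a value often quoted for the canonical PSO.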
Suppose that the velocity vector of the $i$-th particle ($i = \overline{1, m}$) is corrected in accordance with one of the formulas (3) or (4). The $j$-th position coordinate of the $i$-th particle ($j = \overline{1, n}$) can then be corrected in accordance with the formula:

$$x_i^j = x_i^j + v_i^j. \tag{6}$$

Then for each $i$-th particle ($i = \overline{1, m}$) the new value of the objective function $f(x_i)$ can be calculated and the following check must be performed: whether the new position with the coordinate vector $x_i$ has become the best among all positions in which the $i$-th particle has previously been placed. If the new position of the $i$-th particle is recognized as the best so far, the information about it must be stored in the vector $\hat{x}_i$ ($i = \overline{1, m}$).
The value of the objective function for this position must also be remembered. Then the check for the globally best position must be carried out among all new positions of the swarm particles. If some new position is recognized as globally the best so far, the information about it must be stored in the vector $\tilde{x}$, and the value of the objective function $f(x_i)$ for this position must be remembered.
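The bookkeeping described above (position correction, personal best, global best) can be put together in a short loop. The sketch below minimizes an arbitrary function; the function name `pso_minimize`, the default coefficient values and the use of a fixed compression ratio `chi` are assumptions made for the example, and positions are not clamped to the initial bounds.

```python
import random


def pso_minimize(f, bounds, m=20, iterations=50,
                 chi=0.7298, phi_personal=2.05, phi_global=2.05, seed=0):
    """Minimize f over an n-dimensional space with a basic PSO loop."""
    rng = random.Random(seed)
    n = len(bounds)
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(m)]
    v = [[0.0] * n for _ in range(m)]
    best_x = [p[:] for p in x]          # personal best positions
    best_f = [f(p) for p in x]          # personal best objective values
    g = min(range(m), key=lambda i: best_f[i])
    gbest_x, gbest_f = best_x[g][:], best_f[g]

    for _ in range(iterations):
        for i in range(m):
            for j in range(n):
                r_hat, r_tilde = rng.random(), rng.random()
                v[i][j] = chi * (v[i][j]
                                 + phi_personal * r_hat * (best_x[i][j] - x[i][j])
                                 + phi_global * r_tilde * (gbest_x[j] - x[i][j]))
                x[i][j] += v[i][j]      # position correction
            fx = f(x[i])
            if fx < best_f[i]:          # new personal best
                best_x[i], best_f[i] = x[i][:], fx
        # The global best is checked after all particles have moved.
        g = min(range(m), key=lambda i: best_f[i])
        if best_f[g] < gbest_f:
            gbest_x, gbest_f = best_x[g][:], best_f[g]
    return gbest_x, gbest_f
```

For example, `pso_minimize(lambda p: p[0] ** 2 + p[1] ** 2, [(-5.0, 5.0), (-5.0, 5.0)])` searches for the minimum of a simple quadratic function; the returned value never exceeds the best value found at initialization.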
When the SVM classifier is developed with the PSO algorithm, the swarm particles can be defined by vectors that declare their position in the search space and are coded by the kernel function parameters and the regularization parameter: $(x_i^1, x_i^2, C_i)$, where $i$ is the particle number ($i = \overline{1, m}$); $x_i^1$, $x_i^2$ are the kernel function parameters of the $i$-th particle [parameter $x_i^1$ is equal to the kernel function parameter $d$, $\sigma$ or $k_2$ (depending on the kernel function type that corresponds to the swarm particle); parameter $x_i^2$ is equal to the kernel function parameter $k_1$ if the swarm particle corresponds to the sigmoid kernel function type, otherwise this parameter is assumed to be zero]; $C_i$ is the regularization parameter.
The traditional approach to the application of the PSO algorithm in developing the SVM classifier then consists in running the PSO algorithm repeatedly, each time with a fixed kernel function type, to choose the optimal values of the kernel function parameters and the value of the regularization parameter.
As a result, for each kernel function type $T$ participating in the search, the particle with the optimal combination of parameter values $(\tilde{x}^1, \tilde{x}^2, \tilde{C})$ providing high quality of classification is determined.
The best type and the best values of the required parameters are chosen by a comparative analysis of the best particles obtained from the runs of the PSO algorithm with fixed kernel function types.
Along with the traditional approach to the application of the PSO algorithm in the development of the SVM classifier, there is a new approach that implements a simultaneous search for the best kernel function type $\tilde{T}$, the parameter values $\tilde{x}^1$ and $\tilde{x}^2$ of the kernel function and the value of the regularization parameter $\tilde{C}$. In this approach each $i$-th particle in the swarm ($i = \overline{1, m}$) is defined by a vector that describes the particle's position in the search space: $(T_i, x_i^1, x_i^2, C_i)$, where $T_i$ is the number of the kernel function type (for example, 1, 2, 3 for the polynomial, radial basis and sigmoid functions, respectively); the parameters $x_i^1$, $x_i^2$, $C_i$ are defined as in the previous case. It is possible to «regenerate» a particle by changing its coordinate $T_i$ to the number of the kernel function type for which particles show the highest quality of classification. When a particle is «regenerated», its parameter values are changed so that they correspond to the new kernel function type (taking into account the ranges of their values). Particles that did not undergo «regeneration» continue to move in their own search space of the corresponding dimension.
The number of particles taking part in «regeneration» must be determined before the algorithm starts. This number must be equal to 15%-25% of the initial number of particles. This allows particles to explore the search space while preventing them from staying in it for a long time if their accuracy indicators are the worst.
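The «regeneration» of particles can be sketched as follows. This is a minimal sketch under stated assumptions: the particles chosen for «regeneration» are those with the lowest accuracy among the particles of other kernel types, and their new parameter values are drawn uniformly from the ranges of the new kernel type. The dictionary layout, the function name `regenerate` and the resampling rule are hypothetical choices for the example.

```python
import random


def regenerate(particles, best_type, ranges, p, rng):
    """Move p% of the swarm to the kernel type of the best particle.

    particles -- list of dicts with keys "T", "x1", "x2", "C", "accuracy"
    best_type -- kernel type number of the best particle
    ranges    -- ranges[T][key] = (low, high) for key in "x1", "x2", "C"
    p         -- regeneration percentage of the number of particles
    Returns the number of regenerated particles.
    """
    count = int(len(particles) * p / 100.0)
    # Only particles of other kernel types can be regenerated (assumption:
    # the ones with the worst accuracy are taken first).
    candidates = [q for q in particles if q["T"] != best_type]
    candidates.sort(key=lambda q: q["accuracy"])
    chosen = candidates[:count]
    for q in chosen:
        q["T"] = best_type
        for key in ("x1", "x2", "C"):
            low, high = ranges[best_type][key]
            q[key] = rng.uniform(low, high)  # new value inside the new ranges
        # The stored accuracy is now stale and must be recomputed by the caller.
    return len(chosen)
```

Once no particles of other kernel types remain, the function regenerates nothing, which matches the situation in which all particles have gathered in one search space.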
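The proposed modified PSO algorithm can be presented as the following sequence of steps.
Step 1. To determine the parameters of the PSO algorithm: the number $m$ of particles in the swarm, the velocity coefficient $K$, the personal and global velocity coefficients $\hat{\varphi}$ and $\tilde{\varphi}$, and the maximum iterations number $N_{max}$ of the PSO algorithm. To determine the types $T$ of kernel functions that take part in the search and the range boundaries of the kernel function parameters and the regularization parameter $C$ for the chosen kernel function types $T$. To determine the particles' «regeneration» percentage $p$.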
Step 2. To define an equal number of particles for each kernel function type $T$ included in the search and to initialize the coordinate $T_i$ of each $i$-th particle ($i = \overline{1, m}$) (every kernel function type must be represented by an equal number of particles); the other coordinates of the $i$-th particle ($i = \overline{1, m}$) must be generated randomly from the ranges set for the corresponding kernel function type. To establish the initial position of the $i$-th particle ($i = \overline{1, m}$) as its best known position $(\hat{T}_i, \hat{x}_i^1, \hat{x}_i^2, \hat{C}_i)$, to determine the best particle with the coordinate vector $(\tilde{T}, \tilde{x}^1, \tilde{x}^2, \tilde{C})$ among all $m$ particles, and to determine the best particle for each kernel function type $T$ included in the search, with the coordinate vector $(T, \tilde{x}^{1T}, \tilde{x}^{2T}, \tilde{C}^{T})$. The number of executed iterations is set to 1.
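The initialization in Step 2 can be sketched as follows. The kernel type numbers 1, 2, 3 follow the numbering used in the text; the range values in `EXAMPLE_RANGES`, the rounding of the polynomial degree to an integer and the dictionary layout are assumptions made for the example, not values taken from the experiments.

```python
import random

# Hypothetical ranges: EXAMPLE_RANGES[T][key] = (low, high).
# T = 1 polynomial (x1 = d), T = 2 radial basis (x1 = sigma),
# T = 3 sigmoid (x1 = k2, x2 = k1); x2 is zero for non-sigmoid kernels.
EXAMPLE_RANGES = {
    1: {"x1": (1.0, 5.0), "x2": (0.0, 0.0), "C": (0.1, 10.0)},
    2: {"x1": (0.01, 10.0), "x2": (0.0, 0.0), "C": (0.1, 10.0)},
    3: {"x1": (-10.0, -0.01), "x2": (0.01, 10.0), "C": (0.1, 10.0)},
}
INTEGER_X1_TYPES = {1}  # assumption: the polynomial degree is kept integer


def init_swarm(m, ranges, rng):
    """Create m particles with an equal number of particles per kernel type."""
    types = sorted(ranges)
    if m % len(types) != 0:
        raise ValueError("m must be divisible by the number of kernel types")
    per_type = m // len(types)
    swarm = []
    for T in types:
        for _ in range(per_type):
            position = {"T": T}
            for key in ("x1", "x2", "C"):
                low, high = ranges[T][key]
                position[key] = rng.uniform(low, high)
            if T in INTEGER_X1_TYPES:
                position["x1"] = float(round(position["x1"]))
            # The initial position is also the best known position.
            swarm.append({"position": position,
                          "best_position": dict(position)})
    return swarm
```

For instance, `init_swarm(600, EXAMPLE_RANGES, random.Random(0))` creates 200 particles for each of the three kernel types.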
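Step 3. To execute the following while the number of iterations is less than the fixed number $N_{max}$:
• «regeneration» of particles: to choose $p$% of the particles (those whose accuracy indicators are the worst), to change their coordinate $T_i$ to the kernel function type $\tilde{T}$ of the best particle, and to change their parameter values so that they correspond to the ranges of the new kernel function type;
• correction of the velocity vector and the position of each $i$-th particle ($i = \overline{1, m}$) using the formulas:

$$v_i^j = \chi \left[ v_i^j + \hat{\varphi} \hat{r} (\hat{x}_i^j - x_i^j) + \tilde{\varphi} \tilde{r} (\tilde{x}^{jT} - x_i^j) \right], \tag{8}$$

$$x_i^j = x_i^j + v_i^j,$$

where $\hat{r}$ and $\tilde{r}$ are random numbers in the interval (0, 1), $\chi$ is a compression ratio calculated using the formula (5); formula (8) is the modification of formula (4): the coordinates $\tilde{x}^j$ of the globally best position are replaced by the coordinates $\tilde{x}^{jT}$ of the best position among the particles with the same kernel function type $T$;
• calculation of the classification accuracy of the SVM classifier with the parameter values $(T_i, x_i^1, x_i^2, C_i)$ ($i = \overline{1, m}$) with the aim of finding the optimal combination $(\tilde{T}, \tilde{x}^1, \tilde{x}^2, \tilde{C})$, which will provide high quality of classification;
• increase of the iterations number by 1.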
After the execution of the proposed algorithm, the particle with the optimal combination of the parameter values $(\tilde{T}, \tilde{x}^1, \tilde{x}^2, \tilde{C})$, which provides the highest quality of classification over the chosen kernel function types, is determined.
After the modified PSO algorithm has been executed, all particles may turn out to be situated in the search space that corresponds to the kernel function with the highest classification quality, because some particles in the modified PSO algorithm change their coordinate that is responsible for the number of the kernel function type. All other search spaces then turn out to be empty, because all their particles have «regenerated» the coordinate with the number of the kernel function type. In some cases (for small values of the iterations number $N_{max}$ and for a small value of the particles' «regeneration» percentage $p$) some particles will not «regenerate» their kernel function type and will stay in their initial search space. Using this approach in the application of the PSO algorithm to the development of the SVM classifier reduces the time required to construct the desired SVM classifier.
The quality of the SVM classifier can be evaluated with different classification quality indicators [3]: a cross-validation indicator, an accuracy indicator, a classification completeness indicator, an indicator based on ROC curve analysis, etc.
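Two of these indicators can be computed directly from the true and predicted class labels. The sketch below assumes that the labels are +1 and −1, as in the description of the training set, and that classification completeness means the share of objects of class +1 that were classified correctly (often called recall); the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for labels +1 and -1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Share of correctly classified objects."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def completeness(y_true, y_pred):
    """Share of class +1 objects that were classified as +1."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)
```

Comparing the accuracy on the training set with the accuracy on the test set gives a simple check of whether the two error counts differ strongly.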

IV. EXPERIMENTAL STUDIES
The feasibility of using the modified PSO algorithm for the SVM classifier development was confirmed on test and real data. In the experiments both the traditional PSO algorithm and the modified PSO algorithm were run on each data set. The algorithms were compared by the optimal parameter values found for the SVM classifier, the classification accuracy and the time spent.
The actual data used in the experimental studies were taken from the Statlog project and from the UCI machine learning library. In particular, we used two data sets for medical diagnostics and one data set for credit scoring: [...] (Australian data set in the Table; the source is http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/australian/). Moreover, two testing data sets were used in the experimental studies: Test [11] and МОТП12 (the source is http://machinelearning.ru/wiki/images/b/b2/MOTP12_svm_example.rar).
For all data sets binary classification was performed.
The traditional and the modified PSO algorithms were used to develop the SVM classifier and to choose the optimal values of its parameters. The kernels with polynomial, radial basis and sigmoid functions were included in the search, and identical values of the PSO algorithm parameters and identical ranges of the required SVM classifier parameters were set for both algorithms.
A short description of the characteristics of each data set is provided in the Table. The Table also presents the results of the search for the optimal parameter values of the SVM classifier with the traditional PSO algorithm and the modified PSO algorithm (with identical ranges of the parameters and identical PSO algorithm parameters), the numbers of errors made during the training and testing of the SVM classifier, and the search time. For example, for the WDBC data set both the traditional and the modified PSO algorithms determined the kernel with the radial basis function (number 2) as the optimal one. For the traditional PSO algorithm the optimal values of the kernel parameter and the regularization parameter are equal to [...].
In the figures, particles are marked by asterisks in the search spaces, and the best position in the search space is marked by a white circle. During the run of the modified PSO algorithm the swarm particles move towards the best (optimal) position for the current iteration in the search space and demonstrate a collective search for the optimal position. The velocity and direction of each particle are corrected. Moreover, «regeneration» of particles takes place: some particles change their own search space to the space in which particles show the best quality of classification.
Thus, during the run of the modified PSO algorithm the coordinates of the particles that are responsible for the parameters of the kernel function $\kappa(z_i, z_\tau)$ and the regularization parameter $C$ change. In addition, the type of the kernel function also changes. As a result, the particles move towards a single search space (in this case the one corresponding to the radial basis kernel function), leaving the space where they were initialized.
In the reviewed example only 7 particles did not change their kernel function type after 20 iterations. The other particles are situated near the best position, which corresponds to the optimal solution in the search space (Figure 5).
The Table shows that, for the reviewed data sets, both algorithms determined the same kernel function type as the optimal one, similar values of the kernel function parameter and the regularization parameter, and similar accuracy values for the training and testing of the SVM classifier.
However, the modified PSO algorithm is more effective, because its search took more than 2-3 times less time than that of the traditional one.

V. CONCLUSION
The experimental results, obtained on test data traditionally used to assess classification quality, confirm the efficiency of the modified PSO algorithm. This algorithm allows choosing the best kernel function type, the values of the kernel function parameters and the value of the regularization parameter in significantly less time than the traditional PSO algorithm, while providing high classification accuracy.
These results were achieved thanks to the «regeneration» of particles in the modified PSO algorithm. Particles that participate in the «regeneration» process change their kernel function type to the one that corresponds to the particle with the best classification accuracy. These particles also change the ranges of their parameters accordingly.
Further research will be devoted to the development of recommendations on the application of the modified PSO algorithm to practical problems.

Fig. 1. Linear separation for two classes by the SVM classifier in the 2D space

The SVM classifier supposes an execution of training, testing, and classification. Satisfactory quality of training and testing allows using the resulting SVM classifier in the classification of new objects.

[...] personal and global velocity coefficients $\hat{\varphi}$ and $\tilde{\varphi}$, maximum iterations number $N_{max}$ of the PSO algorithm. To determine the types $T$ of kernel functions that take part in the search ([...] sigmoid function) and the range boundaries of the kernel function parameters and the regularization parameter $C$ for the chosen kernel function types $T$: [...]

[...] the modified PSO algorithm the optimal values of the kernel parameter and the regularization parameter are equal to [...] by the traditional PSO algorithm is equal to 99.12%, and the classification accuracy by the modified PSO algorithm is equal to 99.65%. The search time came to 10108 and 3250 seconds, respectively.

For the Heart data set, Figures 2-4 show examples of the position of the particle swarm in the 2D search spaces and in the 3D search space during initialization, at the 3rd iteration and at the 12th iteration (with the use of the modified PSO algorithm). The kernels with polynomial, radial basis and sigmoid functions were included in the search. The following ranges of the parameter values were set: [...] function).

Fig. 2. Location of particles in the swarm during the initialization (the polynomial kernel function is on the left, the radial basis kernel function is in the middle, the sigmoid kernel function is on the right)

Fig. 5. Location of particles after the 20-th iteration

The range of the regularization parameter $C$ was set as $0.1 \le C \le 10$. Moreover, the following values of the PSO algorithm parameters were set: the number $m$ of particles in the swarm equal to 600 (200 per kernel function type); the iterations number $N_{max} = 20$; the personal and global velocity coefficients equal to $\hat{\varphi} = 2$ [...]

[...] of particles: to choose $p$% of particles [...]

[...] cancer data set of The Department of Surgery at the University of Wisconsin, in which the total number [...]

TABLE I. THE SEARCH RESULTS BY MEANS OF THE TRADITIONAL PSO ALGORITHM AND THE MODIFIED PSO ALGORITHM