Big Data Classification Using the SVM Classifiers with the Modified Particle Swarm Optimization and the SVM Ensembles

The problem of developing support vector machine (SVM) classifiers using a modified particle swarm optimization (PSO) algorithm, and of building their ensembles, is considered. Solving this problem makes high-precision data classification, and Big Data classification in particular, possible with acceptable time expenditure. The modified PSO algorithm searches simultaneously for the kernel function type, the kernel function parameters and the value of the regularization parameter of the SVM classifier. The algorithm is based on the idea of particles' «regeneration»: some particles change their kernel function type to the one that corresponds to the particle with the best classification accuracy. The proposed PSO algorithm reduces the time needed to develop SVM classifiers, which is very important for Big Data classification. In most cases such an SVM classifier provides high-quality data classification. In exceptional cases, SVM ensembles based on the decorrelation maximization algorithm, with different decision-making strategies and the majority vote rule, can be used. A two-level SVM classifier is also proposed. It works as a group of SVM classifiers at the first level and as an SVM classifier based on the modified PSO algorithm at the second level. The results of experimental studies confirm the efficiency of the proposed approaches for Big Data classification.
Keywords: Big Data; classification; ensemble; SVM classifier; kernel function type; kernel function parameters; particle swarm optimization algorithm; regularization parameter; support vectors


I. INTRODUCTION
Big Data is a term for data sets that are so large and/or complex that traditional data processing technologies are inadequate. They require technologies that can store and process exponentially growing data sets containing structured, semi-structured and unstructured data. Volume, variety and velocity are the three defining characteristics of Big Data: volume refers to the huge amount of data, variety to the number of data types and velocity to the speed of data processing. The problems of Big Data management result from the expansion of all three characteristics. Big Data consists not only of numbers and strings but also of geospatial data, audio, video, web data, social files, etc., obtained from various sources such as sensors, mobile phones, cameras and so on.
The main purpose of Big Data technologies is to provide high-quality data processing and analysis. Nowadays Big Data technologies are applied in many fields of science and engineering, including the physical, biological and biomedical sciences. They are also used in government agencies, financial corporations, large enterprises, etc.
A large volume of storage space, in particular cloud storage, is needed to manage and reuse Big Data, which can be useful for many purposes, for example for hardware and software maintenance. The required analytical, retrieval and processing operations are very complex and time-consuming. To overcome these difficulties, new Big Data technologies have received a lot of attention over the last few years. Big Data processing improves the transfer speed of data sets in comparison with simple data exchange. Big Data mining tools are very useful to end users solving their own practical problems.
Many efficient approaches must currently be implemented when dealing with Big Data. In particular, feature selection, clustering and classification play an important role in Big Data analysis when it is necessary to retrieve, search or classify data using Big Data sets. These approaches are useful in such areas as pattern recognition, machine learning, bioinformatics, data mining, semantic ontology and so on. As many algorithms are available for feature selection, clustering and classification, the algorithms appropriate to a particular Big Data analysis problem must be chosen carefully.
Machine learning algorithms can be considered along a spectrum from supervised to unsupervised learning. In strictly unsupervised learning, the problem is to find structure, such as clusters, in an unlabeled data set. Supervised learning uses a training set of classified data to construct a classifier, which can then be used to classify new data. In both cases, Big Data applications exhibit a growing number of features and a growing volume of input data.
www.ijacsa.thesai.org
The Support Vector Machine (SVM) algorithm is a supervised machine learning algorithm. Currently, the SVM algorithm (one of the boundary classification algorithms [1,2]) is used with great success for classification problems in various applications.
SVM classifiers based on the SVM algorithm have been applied to credit risk analysis [3], medical diagnostics [4], handwritten character recognition [5], text categorization [6], information extraction [7], pedestrian detection [8], face detection [9], Earth remote sensing [10], etc. An SVM classifier uses a special kernel function to construct a hyperplane separating the classes of data. An example of a separating hyperplane in 2D space is shown in Fig. 1.
The SVM classifier is used for training, testing and classification. Satisfactory quality of training and testing allows the resulting SVM classifier to be used for the classification of new objects. SVM algorithms are well known for their excellent performance in statistical classification. Still, the high computational cost due to the cubic runtime complexity is problematic for Big Data sets: training the SVM classifier requires solving a quadratic optimization problem [1,3]. Using a standard quadratic programming solver for SVM classifier training would involve solving a large quadratic programming problem even for a moderately sized data set. This can limit the size of the problems that can be solved with an SVM classifier. Nowadays methods such as SMO [11], chunking [12], simple SVM [13] and Pegasos [14] exist that compute the required solution iteratively and have linear space complexity [15].
In recent years the cascade SVM algorithm was proposed to mitigate the problem of the high computational cost [16]. In this algorithm the SVM classifier is trained iteratively on subsets of the original data set, and the support vectors of the resulting models are combined to create new training sets. The general idea is to bound the sizes of all considered training sets and therefore obtain a significant speedup. This algorithm can easily be parallelized because a number of independent models have to be fitted during each stage of the cascade [17].
In the era of Big Data it is necessary to develop data mining algorithms which are suitable for Big Data analysis. Several parallel algorithms have been developed using threads, MPI, MapReduce and so on [18]. Among all these techniques MapReduce is practically well suited for Big Data analysis. One of the latest trends in Big Data processing and analysis is the use of the Hadoop framework for SVM classifier development [18]. Hadoop is an open-source software framework for the distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. A Hadoop cluster is a special type of computational cluster designed specifically for storing and analyzing huge amounts of unstructured data in a distributed computing environment.
Therefore the use of the SVM algorithm is very promising for Big Data classification [19,21].
Choosing optimal parameters for the SVM classifier is currently a significant problem. It is necessary to find the kernel function type, the values of the kernel function parameters and the value of the regularization parameter, which must be set by the user and are not changed afterwards [1,2]. High-accuracy data classification with an SVM classifier is impossible without an adequate solution to this problem.
In the simplest case a solution to this problem can be found by a search over the kernel function types, the values of the kernel function parameters and the value of the regularization parameter, which demands significant computational expense. To assess classification quality, indicators such as classification accuracy, classification completeness, etc. can be used [3].
Usually, developing binary classifiers requires working with a complex, multi-extremal, multi-parameter objective function.
Gradient methods are not suitable for finding the optimum of such an objective function, so stochastic optimization search algorithms, such as the genetic algorithm [22,24], the artificial bee colony algorithm [25], the particle swarm algorithm [26][27][28][29], etc., have been used. In each of them the search for the optimal solution is carried out at once over the whole space of possible solutions.
The particle swarm algorithm (Particle Swarm Optimization, PSO algorithm), which is based on the idea of solving optimization problems by modeling the behavior of groups of animals, is the simplest algorithm of evolutionary programming, because its implementation only requires the ability to determine the value of the optimized function [26][27][28][29].
The traditional approach to the application of the PSO algorithm consists of running the PSO algorithm repeatedly, each time for a fixed kernel function type, to choose the optimal values of the kernel function parameters and the value of the regularization parameter, and then choosing the best kernel function type together with the parameter values corresponding to it. With the new approach, by contrast, the algorithm searches simultaneously for the best kernel function type, the values of the kernel function parameters and the value of the regularization parameter. Hereafter, the particle swarm algorithms corresponding to the traditional and the modified approaches will be called the traditional PSO algorithm and the modified PSO algorithm respectively [30,31].
It should be noted that the PSO algorithm and other nature-inspired swarm optimization algorithms are very well suited to distributed architectures and to handling high-volume unstructured data in Big Data analytics.
In recent years, much attention has been paid to increasing the accuracy of models based on machine learning algorithms. Therefore approaches that create ensembles of classifiers to increase the accuracy of the classification solution have been investigated [3][4][5]. The training of an SVM ensemble is the training of a finite set of base (individual) classifiers: the individual solutions are combined to form the resulting classification decisions of the aggregated classifier. There are different approaches to choosing the combination rules for the individual classifiers in the ensemble and the strategies for forming the resulting classification decisions [2].
The main purposes of this paper are the following: to create the modified PSO algorithm and compare it with the traditional one in terms of the time required to find the optimal parameters of the SVM classifier and the classification accuracy; to improve the accuracy of the classification decisions using the SVM ensemble based on the decorrelation maximization algorithm for different decision-making strategies and the majority vote rule; to improve the accuracy of the classification decisions using the two-level SVM classifier.
The rest of this paper is structured as follows. Section II presents the main stages of SVM classifier development. Section III details the proposed new approach to the simultaneous search for the kernel function type, the values of the kernel function parameters and the value of the regularization parameter of the SVM classifier. This approach is based on the modified PSO algorithm, the main idea of which is the «regeneration» of particles: some particles change their kernel function type to the one which corresponds to the particle with the best classification accuracy. Section IV is devoted to the development of SVM ensembles based on the decorrelation maximization algorithm for different decision-making strategies and the majority vote rule. Section V details the two-level SVM classifier, which works as a group of SVM classifiers at the first level and as an SVM classifier based on the modified PSO algorithm at the second level. Experimental results follow in Section VI. Finally, conclusions are drawn in Section VII.

II. THE SUPPORT VECTOR MACHINE CLASSIFIER
Let the experimental data set have the form $\{(z_1, y_1), \ldots, (z_s, y_s)\}$, in which each object $z_i \in Z$ ($i = \overline{1, s}$; $s$ is the number of objects) is assigned a number $y_i \in Y = \{-1; +1\}$ having a value of +1 or −1 depending on the class of the object $z_i$. It is assumed that every object $z_i$ is mapped to a $q$-dimensional vector of numerical values of characteristics $z_i = (z_i^1, z_i^2, \ldots, z_i^q)$ (typically normalized to values from the interval [0, 1]), where $z_i^l$ is the numeric value of the $l$-th characteristic for the $i$-th object ($i = \overline{1, s}$; $l = \overline{1, q}$) [30], [31]. A special function $\kappa(z_i, z_\tau)$, which is called the kernel, is used to build the classifier $F: Z \to Y$, which assigns a class number from the set $Y$ to each object from the set $Z$.
To build «the best» SVM classifier it is necessary to repeat training (on a training data set with $S$ elements) and testing (on a test data set with $s - S$ elements, $S < s$) many times on different randomly generated training and test sets, and then to select the SVM classifier that provides the highest classification quality. The test set contains a part of the data from the experimental data set. The size of the test set should be 1/10 to 1/3 of the size of the experimental data set. The test set does not participate in tuning the parameters of the SVM classifier; it is used to measure the classifier's accuracy. An SVM classifier with satisfactory training and testing results can be used to classify new objects [1][2][3].
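The random division of the experimental data set into training and test parts can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure used in the paper: the function name `split_experimental_set`, the toy data and the chosen test fraction of 0.25 are all hypothetical, and only the 1/10 to 1/3 bound comes from the text.

```python
import random

def split_experimental_set(data, test_fraction, rng):
    """Randomly split `data` into (training set, test set)."""
    # The text recommends a test set of 1/10 to 1/3 of the experimental set.
    if not 1 / 10 <= test_fraction <= 1 / 3:
        raise ValueError("test_fraction is expected to lie between 1/10 and 1/3")
    shuffled = list(data)
    rng.shuffle(shuffled)
    test_size = round(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]

# Toy experimental set (hypothetical): pairs (feature vector, label in {-1, +1}).
experimental = [((i / 20, 1 - i / 20), 1 if i % 2 else -1) for i in range(20)]
train, test = split_experimental_set(experimental, test_fraction=0.25,
                                     rng=random.Random(0))
print(len(train), len(test))
```

Calling the function repeatedly with different random generators gives the different randomly generated training and test sets on which candidate classifiers can be compared.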
The separating hyperplane for the objects from the training set can be represented by the equation $\langle w, z \rangle + b = 0$, where $w$ is a vector perpendicular to the separating hyperplane; $b$ is a parameter which determines the shortest distance from the origin of coordinates to the hyperplane; $\langle w, z \rangle$ is the scalar product of the vectors $w$ and $z$ [1][2][3]. The condition $-1 < \langle w, z \rangle + b < 1$ specifies a strip that separates the classes. The wider the strip, the more confidently we can classify objects. The objects closest to the separating hyperplane lie exactly on the boundaries of the strip.
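If the classes are linearly separable, the hyperplane can be chosen so that no objects from the training set fall inside the strip, and the width of the strip, $2 / \|w\|$, can then be maximized, which leads to the quadratic optimization problem [1,2]:
$$\langle w, w \rangle \to \min, \quad y_i (\langle w, z_i \rangle + b) \ge 1, \quad i = \overline{1, S}. \qquad (1)$$
Finding the separating hyperplane is basically the dual problem of searching for a saddle point of the Lagrange function, which reduces to a quadratic programming problem containing only dual variables [1,2]:
$$-L(\lambda) = -\sum_{i=1}^{S} \lambda_i + \frac{1}{2} \sum_{i=1}^{S} \sum_{\tau=1}^{S} \lambda_i \lambda_\tau y_i y_\tau \kappa(z_i, z_\tau) \to \min_{\lambda}, \quad \sum_{i=1}^{S} \lambda_i y_i = 0, \quad 0 \le \lambda_i \le C, \quad i = \overline{1, S}, \qquad (2)$$
where $\lambda_i$ is a dual variable; $z_i$ is an object of the training data set; $y_i$ is the number (+1 or −1) which characterizes the class of the object $z_i$; $\kappa(z_i, z_\tau)$ is the kernel function; $C$ is the regularization parameter; $S$ is the number of objects in the training data set.
The polynomial function $(\langle z_i, z_\tau \rangle + 1)^d$, the radial basis function $\exp(-\langle z_i - z_\tau, z_i - z_\tau \rangle / (2\sigma^2))$ and the sigmoid function $\mathrm{th}(k_2 + k_1 \langle z_i, z_\tau \rangle)$ are typically used as the kernel function, where $d$, $\sigma$, $k_2$ and $k_1$ are the kernel function parameters; th is the hyperbolic tangent.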
These kernel functions make it possible to separate the objects of different classes.
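The polynomial, radial basis and sigmoid kernels can be sketched in a few lines. The forms below are the commonly used ones with the parameters $d$, $\sigma$, $k_1$ and $k_2$ named in this paper; the default parameter values and the function names are illustrative assumptions.

```python
import math

def dot(a, b):
    """Scalar product of two equally long vectors."""
    return sum(p * q for p, q in zip(a, b))

def polynomial_kernel(a, b, d=3):
    # (<a, b> + 1)^d; the default degree is an assumption.
    return (dot(a, b) + 1) ** d

def rbf_kernel(a, b, sigma=1.0):
    # exp(-<a - b, a - b> / (2 * sigma^2)); the default sigma is an assumption.
    diff = [p - q for p, q in zip(a, b)]
    return math.exp(-dot(diff, diff) / (2 * sigma ** 2))

def sigmoid_kernel(a, b, k1=1.0, k2=-1.0):
    # th(k2 + k1 * <a, b>); the default k1 and k2 are assumptions.
    return math.tanh(k2 + k1 * dot(a, b))

print(polynomial_kernel((1, 0), (1, 0), d=2))
print(rbf_kernel((0.2, 0.4), (0.2, 0.4)))
```

Each function takes two feature vectors and returns a single number, so any of them can be passed wherever a kernel $\kappa$ is expected.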
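As a result of the training, the classification function is determined in the following form [1], [3]:
$$f(z) = \sum_{i=1}^{S} \lambda_i y_i \kappa(z_i, z) + b, \qquad (3)$$
where the sum runs over the objects of the training set, of which only the support vectors have $\lambda_i > 0$. The classification decision, assigning the object $z$ to the class −1 or +1, is made in accordance with the rule [1], [3]:
$$F(z) = \mathrm{sign}(f(z)) = \mathrm{sign}\left(\sum_{i=1}^{S} \lambda_i y_i \kappa(z_i, z) + b\right). \qquad (4)$$
The SVM classifier training results in determining the support vectors. Using the PSO algorithm provides better classification accuracy through the choice of the kernel function type, the values of the kernel function parameters and the value of the regularization parameter.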
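Once the support vectors, their multipliers and the offset are known, the classification function and the decision rule can be evaluated directly. The sketch below assumes a tiny hand-made model with two support vectors and a linear kernel; the function names, the model values and the convention of assigning $f(z) = 0$ to class +1 are assumptions made for illustration.

```python
def linear_kernel(a, b):
    """Plain scalar product, used here as the simplest possible kernel."""
    return sum(p * q for p, q in zip(a, b))

def decision_value(z, support_vectors, labels, multipliers, b, kernel):
    """f(z) = sum_i lambda_i * y_i * kernel(z_i, z) + b."""
    return sum(lam * y * kernel(sv, z)
               for sv, y, lam in zip(support_vectors, labels, multipliers)) + b

def classify(z, *model):
    """F(z) = sign(f(z)); f(z) = 0 is assigned to class +1 by assumption."""
    return 1 if decision_value(z, *model) >= 0 else -1

# Hypothetical trained model: (support vectors, labels, multipliers, b, kernel).
model = ([(1.0, 1.0), (-1.0, -1.0)], [1, -1], [0.25, 0.25], 0.0, linear_kernel)
print(decision_value((2.0, 0.0), *model), classify((2.0, 0.0), *model))
print(decision_value((-3.0, 1.0), *model), classify((-3.0, 1.0), *model))
```

In this toy model both support vectors give $f(z) = \pm 1$, that is, they lie exactly on the boundaries of the separating strip.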
The quality of the SVM classifier can be measured by different classification quality indicators [3]. These include the cross-validation indicator, the accuracy indicator, the classification completeness indicator, the indicator based on ROC curve analysis, etc.
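Two of these indicators are easy to compute from the true and predicted labels of a test set. In the sketch below, completeness is interpreted as the share of objects of the positive class that were recognized correctly (often called recall); this interpretation, the function names and the toy labels are assumptions.

```python
def accuracy(true_labels, predicted):
    """Share of objects whose predicted class equals the true class."""
    hits = sum(1 for t, p in zip(true_labels, predicted) if t == p)
    return hits / len(true_labels)

def completeness(true_labels, predicted, positive=1):
    """Share of objects of the positive class that were predicted as positive."""
    pairs = list(zip(true_labels, predicted))
    found = sum(1 for t, p in pairs if t == positive and p == positive)
    missed = sum(1 for t, p in pairs if t == positive and p != positive)
    return found / (found + missed)

# Hypothetical test-set labels and classifier decisions.
y_true = [1, 1, 1, -1, -1, -1, -1, 1]
y_pred = [1, -1, 1, -1, -1, 1, -1, 1]
print(accuracy(y_true, y_pred), completeness(y_true, y_pred))
```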

III. THE MODIFIED PARTICLE SWARM OPTIMIZATION ALGORITHM
In the traditional PSO algorithm the $n$-dimensional search space ($n$ is the number of parameters to be optimized) is inhabited by a swarm of $m$ agents-particles (elementary solutions). The position (location) of the $i$-th particle is determined by the vector $x_i = (x_i^1, x_i^2, \ldots, x_i^n)$, which defines a set of values of the optimization parameters. These parameters can be present in explicit form in, or even absent from, the analytical record of the objective function $u(x) = u(x^1, x^2, \ldots, x^n)$ of the optimization algorithm (for example, the optimum is the minimum which must be achieved).
The particles must be placed randomly in the search space during initialization. Each $i$-th particle ($i = \overline{1, m}$) has its own velocity vector $v_i \in R^n$, which determines the change of the particle's coordinates at every moment of time corresponding to an iteration of the PSO algorithm.
The coordinates of the $i$-th particle ($i = \overline{1, m}$) in the $n$-dimensional search space uniquely determine the value of the objective function $u(x_i) = u(x_i^1, x_i^2, \ldots, x_i^n)$, which is a certain solution of the optimization problem [26][27][28][29].
For each position in the $n$-dimensional search space visited by the $i$-th particle ($i = \overline{1, m}$), the value of the objective function $u(x_i)$ is calculated. Each $i$-th particle remembers the best value of the objective function that it has found personally, as well as the coordinates of the position in the $n$-dimensional space corresponding to this value. Moreover, each $i$-th particle ($i = \overline{1, m}$) «knows» the best position (in terms of achieving the optimum of the objective function) among all positions that have been «explored» by the particles (this is possible due to the immediate exchange of information among all the particles). At each iteration the particles correct their velocities so as, on the one hand, to move closer to the best position found by the particle itself and, on the other hand, to get closer to the position which is globally the best at the current moment. After a number of iterations the particles should come close to the best position (globally the best over all iterations). However, it is possible that some particles will stay somewhere in a relatively good local optimum.
Convergence of the PSO algorithm depends on how the velocity vector correction is performed. There are different approaches to the correction of the velocity vector $v_i$ of the $i$-th particle ($i = \overline{1, m}$) [26]. In the classical version of the PSO algorithm the correction of each $j$-th coordinate of the velocity vector ($j = \overline{1, n}$) is made in accordance with the formula [26]:
$$v_i^j = v_i^j + \hat{\varphi} \hat{r} (\hat{x}_i^j - x_i^j) + \tilde{\varphi} \tilde{r} (\tilde{x}^j - x_i^j), \qquad (5)$$
where $v_i^j$ is the $j$-th coordinate of the velocity vector of the $i$-th particle; $x_i^j$ is the $j$-th coordinate of the vector $x_i$, defining the position of the $i$-th particle; $\hat{x}_i^j$ is the $j$-th coordinate of the best position vector found by the $i$-th particle during its existence; $\tilde{x}^j$ is the $j$-th coordinate of the globally best position within the particle swarm, in which the objective function has the optimal value; $\hat{r}$ and $\tilde{r}$ are random numbers in the interval (0, 1), which introduce an element of randomness into the search process; $\hat{\varphi}$ and $\tilde{\varphi}$ are the personal and global particle acceleration coefficients, which are constant and determine the behavior and effectiveness of the PSO algorithm in general.
In (5) the personal and global acceleration coefficients scale the random numbers $\hat{r}$ and $\tilde{r}$: the global acceleration coefficient $\tilde{\varphi}$ controls the impact of the globally best position on the velocities of all particles, and the personal acceleration coefficient $\hat{\varphi}$ controls the impact of the personal best position on the velocity of a particular particle.
Different versions of the traditional PSO algorithm are currently known. One of the best-known canonical versions normalizes the acceleration coefficients $\hat{\varphi}$ and $\tilde{\varphi}$ to make the convergence of the algorithm less dependent on the choice of their values [26].
The correction of each $j$-th coordinate of the velocity vector $v_i$ ($j = \overline{1, n}$) is performed in accordance with the formula [26]:
$$v_i^j = \chi \left[ v_i^j + \hat{\varphi} \hat{r} (\hat{x}_i^j - x_i^j) + \tilde{\varphi} \tilde{r} (\tilde{x}^j - x_i^j) \right], \qquad (6)$$
$$\chi = \frac{2K}{\left| 2 - \varphi - \sqrt{\varphi^2 - 4\varphi} \right|}, \quad \varphi = \hat{\varphi} + \tilde{\varphi}, \quad \varphi > 4, \qquad (7)$$
where $\chi$ is a compression ratio; $K$ is a scaling coefficient, which takes values from the interval (0, 1).
When formula (6) is used for the correction of the velocity vector, the convergence of the PSO algorithm is guaranteed and there is no need to control the particle velocity explicitly [26].
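The compression ratio and a single velocity correction of the canonical version can be sketched as below. The code assumes the commonly used constriction form $\chi = 2K / |2 - \varphi - \sqrt{\varphi^2 - 4\varphi}|$ with $\varphi = \hat{\varphi} + \tilde{\varphi} > 4$; the function and argument names are illustrative.

```python
import math

def compression_ratio(phi_personal, phi_global, K):
    """chi = 2K / |2 - phi - sqrt(phi^2 - 4 phi)|, phi = phi_personal + phi_global."""
    phi = phi_personal + phi_global
    if phi <= 4:
        raise ValueError("the sum of the acceleration coefficients must exceed 4")
    return 2 * K / abs(2 - phi - math.sqrt(phi * phi - 4 * phi))

def corrected_velocity(v, x, x_personal_best, x_global_best,
                       phi_personal, phi_global, chi, r_personal, r_global):
    """One coordinate of the velocity vector after a canonical correction."""
    return chi * (v
                  + phi_personal * r_personal * (x_personal_best - x)
                  + phi_global * r_global * (x_global_best - x))

chi = compression_ratio(2.05, 2.05, K=1.0)
print(round(chi, 4))
print(corrected_velocity(0.0, 0.0, 1.0, 2.0, 2.0, 2.0, 0.5, 0.5, 0.5))
```

Because $\chi < 1$ multiplies the whole bracket, the velocity is damped at every iteration, which is why no explicit velocity limit is needed.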
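Let the correction of the velocity vector of the $i$-th particle ($i = \overline{1, m}$) be executed in accordance with one of the formulas (5) or (6). The correction of the $j$-th coordinate of the position of the $i$-th particle ($i = \overline{1, m}$; $j = \overline{1, n}$) can then be executed in accordance with the formula:
$$x_i^j = x_i^j + v_i^j. \qquad (8)$$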
Then for each $i$-th particle ($i = \overline{1, m}$) the new value of the objective function $u(x_i)$ can be calculated and the following check must be performed: whether the new position with the coordinate vector $x_i$ has become the best among all positions in which the $i$-th particle has previously been placed. If the new position of the $i$-th particle is recognized to be the best at the current moment, the information about it must be stored in the vector $\hat{x}_i$ ($i = \overline{1, m}$).
The value of the objective function $u(x_i)$ for this position must be remembered. Then the globally best position must be checked among all the new positions of the swarm particles. If some new position is recognized as globally the best at the current moment, the information about it must be stored in the vector $\tilde{x}$. The value of the objective function for this position must be remembered.
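The whole traditional PSO loop (random placement, velocity and position correction, and bookkeeping of personal and global best positions) can be put together in a short sketch. It minimizes a simple test function rather than an SVM quality indicator, and it assumes the canonical velocity correction with a compression ratio; the function name `pso_minimize`, the default settings and the zero initial velocities are all assumptions.

```python
import math
import random

def pso_minimize(u, bounds, m=30, iterations=200,
                 phi_p=2.05, phi_g=2.05, K=0.9, seed=0):
    rng = random.Random(seed)
    n = len(bounds)
    phi = phi_p + phi_g
    chi = 2 * K / abs(2 - phi - math.sqrt(phi * phi - 4 * phi))
    # Random initial positions, zero initial velocities (an assumption).
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(m)]
    v = [[0.0] * n for _ in range(m)]
    best_x = [p[:] for p in x]          # personal best positions
    best_u = [u(p) for p in x]          # personal best values
    g = min(range(m), key=lambda i: best_u[i])
    global_x, global_u = best_x[g][:], best_u[g]
    for _ in range(iterations):
        for i in range(m):
            for j in range(n):
                r_p, r_g = rng.random(), rng.random()
                v[i][j] = chi * (v[i][j]
                                 + phi_p * r_p * (best_x[i][j] - x[i][j])
                                 + phi_g * r_g * (global_x[j] - x[i][j]))
                x[i][j] += v[i][j]
            value = u(x[i])
            if value < best_u[i]:       # new personal best
                best_x[i], best_u[i] = x[i][:], value
        # The globally best position is checked after all particles have moved.
        g = min(range(m), key=lambda i: best_u[i])
        if best_u[g] < global_u:
            global_x, global_u = best_x[g][:], best_u[g]
    return global_x, global_u

def sphere(p):
    """Test objective with its minimum 0 at the origin."""
    return sum(c * c for c in p)

position, value = pso_minimize(sphere, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
print(position, value)
```

For SVM classifier development the objective `u` would be replaced by a function that trains a classifier with the particle's parameters and returns a quality indicator to be optimized.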
When the SVM classifier is developed with the use of the PSO algorithm, the swarm particles can be defined by vectors that declare their position in the search space and are encoded by the kernel function parameters and the regularization parameter: $(x_i^1, x_i^2, C_i)$, where $i$ is the number of the particle ($i = \overline{1, m}$); $x_i^1$, $x_i^2$ are the kernel function parameters of the $i$-th particle [the parameter $x_i^1$ is equal to the kernel function parameter $d$, $\sigma$ or $k_2$ (depending on the kernel function type to which the swarm particle corresponds); the parameter $x_i^2$ is equal to the kernel function parameter $k_1$ if the swarm particle corresponds to the sigmoid type of the kernel function, otherwise this parameter is assumed to be zero]; $C_i$ is the regularization parameter [30,31].
After that, to choose the optimal values of the kernel function parameters and the regularization parameter, the traditional PSO algorithm is executed repeatedly, each time with a fixed kernel function type.
As a result, for each type $T$ of the kernel function participating in the search, the particle with the optimal combination of the parameter values $(\tilde{x}^1, \tilde{x}^2, \tilde{C})$ providing high quality of classification will be defined [30,31].
The best type and the best values of the required parameters are found from a comparative analysis of the best particles obtained by running the PSO algorithm with each fixed kernel function type.
Along with the traditional approach to the application of the PSO algorithm in the development of the SVM classifier, there is a new approach that implements a simultaneous search for the best kernel function type $\tilde{T}$, the values $\tilde{x}^1$ and $\tilde{x}^2$ of the kernel function parameters and the value of the regularization parameter $\tilde{C}$ [30,31]. With this approach each $i$-th particle in the swarm ($i = \overline{1, m}$) is defined by a vector which describes the particle's position in the search space: $(T_i, x_i^1, x_i^2, C_i)$, where $T_i$ is the number of the kernel function type (for example, 1, 2, 3 for the polynomial, radial basis and sigmoid functions respectively); the parameters $x_i^1$, $x_i^2$, $C_i$ are defined as in the previous case. It is possible to «regenerate» a particle by changing its coordinate $T_i$ to the number of the kernel function type for which particles show the highest quality of classification. When a particle is «regenerated», its parameter values are changed so that they correspond to the new kernel function type (taking into account the ranges of their values). Particles which did not undergo «regeneration» continue to move in their own search space of the corresponding dimension.
The number of particles taking part in «regeneration» must be determined before the algorithm starts. This number should be 15% to 25% of the initial number of particles. This allows the particles to investigate the search space without staying in it for a long time if their accuracy indicators are the worst.
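The «regeneration» of particles can be illustrated with the following sketch. It is a simplified reading of the idea under stated assumptions: particles are plain dictionaries, the parameter ranges in `RANGES` are invented for illustration, and the particles to regenerate are taken as those with the lowest accuracy among the particles whose kernel type differs from that of the best particle.

```python
import random

# Hypothetical ranges of (x1, x2, C) for the kernel types
# 1 (polynomial: x1 = d), 2 (radial basis: x1 = sigma), 3 (sigmoid: x1 = k2, x2 = k1).
RANGES = {
    1: ((1, 5), (0, 0), (0.1, 10.0)),
    2: ((0.1, 5.0), (0, 0), (0.1, 10.0)),
    3: ((-5.0, -0.1), (0.1, 5.0), (0.1, 10.0)),
}

def random_parameters(kernel_type, rng):
    (lo1, hi1), (lo2, hi2), (lo_c, hi_c) = RANGES[kernel_type]
    x1 = rng.uniform(lo1, hi1)
    if kernel_type == 1:
        x1 = float(round(x1))  # the polynomial degree is a natural number
    return x1, rng.uniform(lo2, hi2), rng.uniform(lo_c, hi_c)

def regenerate(particles, share, rng):
    """Move the worst `share` of the swarm to the kernel type of the best particle."""
    best_type = max(particles, key=lambda p: p["accuracy"])["T"]
    quota = round(len(particles) * share)
    candidates = sorted((p for p in particles if p["T"] != best_type),
                        key=lambda p: p["accuracy"])
    for p in candidates[:quota]:
        p["T"] = best_type
        # New parameter values must fit the ranges of the new kernel type;
        # the stored accuracy is stale until the classifier is re-evaluated.
        p["x1"], p["x2"], p["C"] = random_parameters(best_type, rng)
    return best_type

rng = random.Random(1)
accuracies = [0.71, 0.93, 0.64, 0.70, 0.88, 0.59, 0.75, 0.90, 0.62, 0.73]
swarm = []
for i, acc in enumerate(accuracies):
    T = i % 3 + 1
    x1, x2, C = random_parameters(T, rng)
    swarm.append({"T": T, "x1": x1, "x2": x2, "C": C, "accuracy": acc})
best_type = regenerate(swarm, share=0.2, rng=rng)
print(best_type, [p["T"] for p in swarm])
```

With ten particles and a share of 20%, the two least accurate particles of the other kernel types are moved to the kernel type of the best particle.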
The proposed modified PSO algorithm can be presented as the following sequence of steps [30].
Step 1. To determine the parameters of the PSO algorithm: the number $m$ of particles in the swarm, the scaling coefficient $K$, the personal and global acceleration coefficients $\hat{\varphi}$ and $\tilde{\varphi}$, and the maximum number of iterations $N_{\max}$ of the PSO algorithm. To determine the types $T$ of the kernel functions which take part in the search. To determine the particles' «regeneration» percentage $p$.
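Step 2. To assign an equal number of particles to each kernel function type $T$ included in the search and to initialize the coordinate $T_i$ of each $i$-th particle ($i = \overline{1, m}$) accordingly (every kernel function type must correspond to an equal number of particles); the other coordinates of the $i$-th particle ($i = \overline{1, m}$) must be generated randomly from the ranges corresponding to its kernel function type: $x_i^1 \in [x_{\min}^{1T}, x_{\max}^{1T}]$, $x_i^2 \in [x_{\min}^{2T}, x_{\max}^{2T}]$, $C_i \in [C_{\min}^{T}, C_{\max}^{T}]$. To set the initial position of the $i$-th particle ($i = \overline{1, m}$) as its best known position $(\hat{T}_i, \hat{x}_i^1, \hat{x}_i^2, \hat{C}_i)$, to determine the best particle with the coordinate vector $(\tilde{T}, \tilde{x}^1, \tilde{x}^2, \tilde{C})$ among all $m$ particles, and to determine the best particle for each kernel function type $T$ included in the search, with the coordinate vector $(T, \bar{x}^{1T}, \bar{x}^{2T}, \bar{C}^{T})$. The number of executed iterations is set to 1.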
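Step 3. To execute while the number of iterations is less than the fixed number $N_{\max}$:
• «regeneration» of particles: the share $p$ of the particles with the lowest classification accuracy among the particles whose coordinate $T_i$ differs from $\tilde{T}$ change their coordinate $T_i$ to $\tilde{T}$, and their parameter values are changed so that they correspond to the new kernel function type;
• correction of the velocity vector and the position of each $i$-th particle ($i = \overline{1, m}$) in its own search space using the formulas
$$v_i^j = \chi \left[ v_i^j + \hat{\varphi} \hat{r} (\hat{x}_i^j - x_i^j) + \tilde{\varphi} \tilde{r} (\bar{x}^{jT} - x_i^j) \right], \quad j = 1, 2; \qquad v_i^3 = \chi \left[ v_i^3 + \hat{\varphi} \hat{r} (\hat{C}_i - C_i) + \tilde{\varphi} \tilde{r} (\bar{C}^{T} - C_i) \right], \qquad (10)$$
$$x_i^j = x_i^j + v_i^j, \quad j = 1, 2; \qquad C_i = C_i + v_i^3,$$
where $\hat{r}$ and $\tilde{r}$ are random numbers in the interval (0, 1), $\chi$ is the compression ratio calculated using the formula (7); formula (10) is a modification of formula (6): the coordinates' values $\bar{x}^{jT}$, $\bar{C}^{T}$ of the best particle of the kernel function type $T$ are used instead of the coordinates' values of the globally best particle;
• accuracy calculation of the SVM classifier with the parameter values $(T_i, x_i^1, x_i^2, C_i)$ ($i = \overline{1, m}$) with the aim of finding the optimal combination $(\tilde{T}, \tilde{x}^1, \tilde{x}^2, \tilde{C})$, which will provide high quality of classification;
• increase of the iteration number by 1.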
After the execution of the proposed algorithm, the particle with the optimal combination of parameter values, which provides the highest quality of classification over the chosen kernel function types, will be defined.
After the modified PSO algorithm has been executed, all particles may turn out to be situated in the search space which corresponds to the kernel function with the highest classification quality, because some particles in the modified PSO algorithm change the coordinate responsible for the number of the kernel function type. In that case all the other search spaces will turn out to be empty, because all particles will have «regenerated» their kernel function type coordinate. In some cases (for a small value of $N_{\max}$ and a small value of $p$) some particles will not «regenerate» their kernel function type and will stay in their initial search space.
The modified PSO algorithm allows reducing the time expenditures for development of the SVM classifier.

IV. THE SUPPORT VECTOR MACHINE ENSEMBLE
In most cases the SVM classifier based on the modified PSO algorithm provides high quality of data classification. In exceptional cases SVM ensembles can be used to increase the classification accuracy. The use of an SVM ensemble makes high-precision data classification, and Big Data classification in particular, possible with acceptable time expenditure.
After training, each classifier generates its own (individual) classification decisions, which may coincide with or differ from the actual classification results. Accordingly, different individual SVM classifiers have different classification accuracy. The quality of the classification decisions can be improved by using ensembles of SVM classifiers [3], [33][34][35][36]. In this case, a finite set of individually trained classifiers must be built. Then the classification decisions of these classifiers are combined. The resulting solution is based on the aggregated classifier. The majority vote method and the vote method based on the degree of reliability can be used as the rules (strategies) for defining the aggregated solutions.
The majority vote method is one of the most common and frequently used methods for combining decisions in an ensemble of classifiers. However, this method does not fully use the information about the reliability of each individual SVM classifier. For example, suppose that an SVM classifier ensemble aggregates the results of five individual SVM classifiers, where the values of the function $f(z)$ of the object $z$ (3) obtained from three individual SVM classifiers are negative (class −1) but very close to the neutral position, and the values of the function $f(z)$ of the other two SVM classifiers are strongly positive (class +1), i.e. very far from the neutral position. Then the aggregated decision of the ensemble on the basis of «one classifier, one vote» is the following: the object $z$ belongs to the negative class (majority vote), although it is obvious that the better and more appropriate choice for the object $z$ is the positive class. Despite the good potential of the majority vote method for combining a group of decisions, it is recommended to use other methods to increase the accuracy of classification.
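The five-classifier example can be reproduced numerically. The sketch below compares the majority vote with a simple vote based on the degree of reliability; summing the values of $f(z)$ is only one possible aggregation rule and is an assumption here, as are the function names and the concrete values.

```python
def majority_vote(decision_values):
    """«One classifier, one vote»: each classifier votes with the sign of f(z)."""
    votes = sum(1 if f >= 0 else -1 for f in decision_values)
    # With an odd number of classifiers a tie cannot occur.
    return 1 if votes > 0 else -1

def reliability_vote(decision_values):
    """Votes are weighted by f(z) itself; the sum rule is an assumption."""
    return 1 if sum(decision_values) >= 0 else -1

# Three weakly negative and two strongly positive values of f(z).
values = [-0.05, -0.10, -0.02, 1.80, 2.30]
print(majority_vote(values), reliability_vote(values))
```

The majority vote assigns the object to class −1, while the reliability-based vote assigns it to class +1, which matches the intuition described in the example.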
The vote method based on the degree of reliability uses the value of the function $f(z)$ obtained for the object $z$ by each individual SVM classifier. The greater the positive value of $f(z)$ in (3) returned by the SVM classifier, the more confidently the object $z$ is assigned to class $+1$; the smaller (more negative) the value of $f(z)$, the more confidently the object $z$ is assigned to class $-1$. The values $-1$ and $+1$ of $f(z)$ indicate that the object $z$ lies on the boundary of the negative and positive classes, respectively.
When using an ensemble of classifiers for solving classification problems, special attention should be paid to the methods of forming the set of individual classifiers from which the final SVM classifier is later built. It has been experimentally confirmed [3], [33]-[37] that an ensemble of classifiers shows better accuracy than any of its individual members if the individual classifiers are accurate and diverse. Therefore, when forming the set of individual SVM classifiers, it is required: 1) to use various kernel functions; 2) to build classifiers in different ranges of the kernel parameters and the regularization parameter; 3) to use various sets of training and test data.
To select the appropriate members of the ensemble from the set of trained SVM classifiers, it is recommended to use the principle of maximum decorrelation: the correlation between the selected classifiers should be as small as possible. After training, each individual $j$-th classifier of the $k$ trained classifiers corresponds to a certain array of errors $e_j = (e_{1j}, \dots, e_{sj})^T$, where $e_{ij}$ is the error of the $j$-th classifier at the $i$-th row of the experimental data set ($i = 1, \dots, s$). SVM classifiers that make no errors on the experimental data set should be excluded from further consideration, and from the remaining SVM classifiers it is necessary to select an appropriate number of individual SVM classifiers with maximal diversity. To solve this problem the decorrelation maximization algorithm can be used. This algorithm provides the diversity of the individual SVM classifiers used in the construction of the ensemble [3]. If the correlation between the selected classifiers is small, then the decorrelation is maximal.

 
where $e_{ij}$ is the error of the $j$-th classifier at the $i$-th row of the experimental data set ($i = 1, \dots, s$; $j = 1, \dots, k$).
On the basis of the error matrix $E$ (13), the assessments (14)-(16) can be calculated [3]. Then the elements $r_{tj}$ of the correlation matrix $R$ of size $k \times k$ are calculated with (17), where $r_{tj}$ is the correlation coefficient representing the degree of correlation between the $t$-th and the $j$-th classifiers ($t, j = 1, \dots, k$).
Using the correlation matrix $R$, it is possible to calculate for each individual $j$-th classifier the multiple correlation coefficient $\rho_j$, which characterizes the degree of correlation between the $j$-th classifier and all other $(k - 1)$ classifiers:

$$\rho_j^2 = 1 - \frac{|R|}{R_{jj}}, \quad (18)$$

where $|R|$ is the determinant of the correlation matrix $R$ and $R_{jj}$ is the cofactor of the element $r_{jj}$ of the correlation matrix. The quantity $\rho_j^2$, the coefficient of determination, can take values from 0 to 1. The closer the coefficient is to 1, the stronger the relationship between the analyzed variables (in this case, between the individual classifiers) [3]. A dependency is considered to exist if the coefficient of determination is not less than 0.5; if it is greater than 0.8, the dependence is considered high.
To select individual SVM classifiers for integration into the ensemble, it is necessary to determine the threshold $\theta$. The $j$-th individual classifier must be removed from the list of classifiers if its coefficient of determination exceeds the threshold ($\rho_j^2 > \theta$). If it is necessary to identify the most diverse classifiers, i.e. those generating decisions with the most different arrays of errors on the experimental data set, thresholds satisfying the condition $\theta < 0.7$ should be selected. Additional considerations can also be taken into account to avoid excluding too few or too many individual SVM classifiers.
The decorrelation maximization algorithm can be summarized into the following steps [3].
Step 1. To calculate the matrix $V$ and the correlation matrix $R$ with formulas (15), (16) and (17), respectively.
Step 2. To calculate the multiple correlation coefficients $\rho_j$ ($j = 1, \dots, k$) with (18) for all classifiers.
Step 3. To remove from the list the classifiers for which $\rho_j^2 > \theta$.
Step 4. To repeat steps 1-3 iteratively for the remaining classifiers in the list until the condition $\rho_j^2 \le \theta$ is satisfied for all classifiers.
As a result, the list of classifiers used to form the ensemble will consist of $m$ ($m \le k$) individual classifiers.
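The four steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions: errors are encoded as 0/1 indicators, the correlation matrix is the usual Pearson correlation of the error columns, and the coefficient of determination is taken as $1 - |R| / R_{jj}$, as described in the text. The function names and the toy error matrix are illustrative only.

```python
import math


def correlation_matrix(errors):
    """errors: s rows by k columns (matrix E). Returns the k x k matrix R.
    Columns with zero variance (classifiers that never err) must be removed first."""
    s, k = len(errors), len(errors[0])
    mean = [sum(row[j] for row in errors) / s for j in range(k)]
    cov = [[sum((row[t] - mean[t]) * (row[j] - mean[j]) for row in errors) / s
            for j in range(k)] for t in range(k)]
    return [[cov[t][j] / math.sqrt(cov[t][t] * cov[j][j]) for j in range(k)]
            for t in range(k)]


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n, d = len(m), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for cc in range(c, n):
                m[r][cc] -= factor * m[c][cc]
    return d


def determination_coefficients(R):
    """rho_j^2 = 1 - |R| / R_jj, where R_jj is the cofactor of r_jj."""
    k, d = len(R), det(R)
    result = []
    for j in range(k):
        minor = [[R[a][b] for b in range(k) if b != j] for a in range(k) if a != j]
        cofactor = det(minor)
        # Assumption: a singular minor is treated as full dependence.
        result.append(1.0 if abs(cofactor) < 1e-12 else 1.0 - d / cofactor)
    return result


def decorrelation_maximization(errors, theta):
    """Return the indices of the classifiers kept for the ensemble."""
    keep = list(range(len(errors[0])))
    while len(keep) > 1:
        sub = [[row[j] for j in keep] for row in errors]
        rho2 = determination_coefficients(correlation_matrix(sub))
        drop = {keep[i] for i, value in enumerate(rho2) if value > theta}
        if not drop:
            break
        keep = [j for j in keep if j not in drop]
    return keep


# Toy error columns for three classifiers on eight objects (hypothetical data):
c0 = [1, 1, 1, 1, 0, 0, 0, 0]
c1 = [1, 1, 1, 0, 0, 0, 0, 1]   # correlated with c0 (r = 0.5)
c2 = [1, 0, 1, 0, 1, 0, 1, 0]   # uncorrelated with c0 and c1
E = [list(row) for row in zip(c0, c1, c2)]

print(determination_coefficients(correlation_matrix(E)))
print(decorrelation_maximization(E, theta=0.7))  # nothing exceeds 0.7
print(decorrelation_maximization(E, theta=0.2))  # the correlated pair is removed
```

Note that a low threshold can remove every member of a mutually correlated group at once, which is one reason the text recommends additional considerations when choosing $\theta$.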
For the classifiers selected for the ensemble, it is necessary to carry out: 1) the normalization of the degrees of reliability; 2) the search for a strategy for integrating the members of the ensemble; 3) the calculation of the aggregated decision of the ensemble.
The value of the reliability $f_j(z)$, which is defined for the object $z$ by the $j$-th classifier, falls into the interval $(-\infty, +\infty)$.
The main drawback of such values is that individual classifiers with large absolute values often dominate the final decision of the ensemble. To overcome this drawback, normalization is carried out: the values of the degrees of reliability are transformed into the interval $[0, 1]$. In the case of binary classification, normalization determines for the object $z$ the reliability of its membership in the positive class (labeled $+1$), $g_j^+(z)$, and in the negative class, $g_j^-(z)$. These values can be determined by formulas (19) and (20) [3]. The selected individual classifiers are combined into the ensemble using $g_j^+(z)$ and $g_j^-(z)$ in accordance with one of the following five strategies [3].

1) Maximum strategy.
2) Minimum strategy.
3) Median strategy.
The value $A(z)$ is the aggregated measure of the reliability of the SVM classifier ensemble. It can be used to integrate the members of the ensemble [3].
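The normalization and the three strategies named above can be illustrated with a short sketch. Two assumptions are made here and are not taken from the text: the normalization is the logistic transform $g^+ = 1 / (1 + e^{-f})$ with $g^- = 1 - g^+$, and the aggregated decision is $+1$ when the combined positive reliability is not smaller than the combined negative one. The input values are hypothetical.

```python
import math
from statistics import median


def normalize(f):
    """Map f_j(z) from (-inf, +inf) to reliabilities (g_plus, g_minus) in [0, 1].
    The logistic form is an assumption of this sketch."""
    g_plus = 1.0 / (1.0 + math.exp(-f))
    return g_plus, 1.0 - g_plus


STRATEGIES = {"max": max, "min": min, "median": median}


def aggregate(f_values, strategy):
    """Aggregated ensemble decision (+1 or -1) for one object."""
    pairs = [normalize(f) for f in f_values]
    combine = STRATEGIES[strategy]
    plus = combine([p for p, _ in pairs])
    minus = combine([m for _, m in pairs])
    return 1 if plus >= minus else -1


# Hypothetical outputs of five classifiers for one object z.
f_values = [-0.05, -0.10, -0.02, 2.30, 1.90]
for name in STRATEGIES:
    print(name, aggregate(f_values, name))
```

On these values the maximum and minimum strategies return $+1$, while the median strategy returns $-1$, which shows that the choice of strategy can change the aggregated decision.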
The learning algorithm of the ensemble of the SVM classifiers can be summarized into the following steps.
Step 1. To divide the experimental data set into $k$ training data sets $TR_1, \dots, TR_k$.
Step 2. To train $k$ individual SVM classifiers on the different training data sets $TR_1, \dots, TR_k$ and to obtain $k$ individual SVM classifiers (ensemble members).
Step 3. To select $m$ ($m \le k$) SVM classifiers from the $k$ classifiers using the decorrelation maximization algorithm.
Step 4. To determine the values of the $m$ classification functions, one for each selected individual SVM classifier: $f_1(z), \dots, f_m(z)$.
Step 5. To transform the values of the degrees of reliability, using (19) and (20), into $g_j^+(z)$ for the positive class and $g_j^-(z)$ for the negative class.
Step 6. To determine the aggregated value $A(z)$ of the reliability of the SVM classifier ensemble using (21)-(25).
This algorithm, applied to weak SVM classifiers, provides better classification accuracy than any single individual classifier used for aggregation.
The problem of choosing the threshold $\theta$ is very important. The value of $\theta$ for which all five classification rules (21)-(25) show a stable improvement of the classification quality must be chosen as the threshold value $\theta^*$ ($\theta^* < 0.7$). Thus, the use of each of the five rules leads to an improvement of the classification quality, i.e. to a reduction in the number of erroneous decisions, when the smaller number of individual classifiers corresponding to the threshold value $\theta^*$ is applied. Such a stable improvement of the classification quality is not observed for all examined values of $\theta$. It should be noted that the majority vote rule may be applied to the decisions obtained using the classification rules (21)-(25) to determine the required threshold value $\theta^*$.

V. TWO-LEVEL SVM CLASSIFIER
The main problem limiting the use of the PSO algorithm is the considerable time required to search for the optimal parameters of the SVM classifier (the kernel function type, the values of the kernel function parameters and the value of the regularization parameter). The search time can be partly reduced by using a small number of particles in the swarm and a small number of iterations of the PSO algorithm. In this case, however, the number of generated and compared SVM classifiers is limited, and a worse solution will probably be found.
One approach to reducing the search time is to reduce the size of the training data set: objects that do not affect the classification results need not be considered. This approach is based on the following theoretical property of the SVM classifier: the classification function (3) performs the summation only over the support vectors, i.e. the objects whose dual variables in (3) are nonzero. These vectors contain all the information about the separation of the objects and play the main role in the construction of the hyperplanes separating the classes.
Therefore the two-level SVM classifier has been developed. This SVM classifier works as a group of SVM classifiers at the first level and as the SVM classifier based on the modified PSO algorithm at the second level. At the first level, the two-level SVM classifier is iteratively trained on subsets of the original experimental data set. Then the support vectors of the obtained SVM classifiers are combined to create the new training set for the SVM classifier based on the modified PSO algorithm.
The proposed approach can be described by the following sequence of steps.

VI. EXPERIMENTAL STUDIES
The offered approaches to the development of the SVM classifiers and their ensembles have been assessed on test and real data. In the first experiments, the traditional PSO algorithm and the modified PSO algorithm were applied to each particular data set. The algorithms were compared using the optimal parameter values found for the SVM classifier, the classification accuracy and the time spent. All data sets used in the experimental research were taken from the Statlog project and from the UCI Machine Learning Repository.
In particular, we used two data sets for medical diagnostics, two data sets for credit scoring and one data set for the creation of a predictive model of spam recognition based on an e-mail data set:
• Breast cancer data set, in which all information was obtained with the use of digital images (WDBC data set in Table I; the source is http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/);
• Heart disease data set, in which the total number of instances is 270, including 150 cases with diagnosed heart disease (class 1) and 120 cases without such a diagnosis (class 2); each patient is described by 13 characteristics ($q = 13$) (Heart data set in Table I; the source is http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/heart/);
• Australian consumer credit data set, in which the total number of instances is 690, including 382 creditworthy cases (class 1) and 308 default cases (class 2); each applicant is described by 14 characteristics ($q = 14$) (Australian data set in Table I; the source is http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/australian/);
• German credit data set, in which the total number of instances is 1000, including 700 creditworthy cases (class 1) and 300 default cases (class 2); each applicant is described by 24 characteristics ($q = 24$) (German data set in Table I; the source is http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/);
• Spam data set, in which the total number of instances is 4601, including 1813 spam cases (class 1), i.e. 39.4% of the data set, and 2788 non-spam cases (class 2); each e-mail is described by 57 characteristics ($q = 57$) (Spam data set in Table I; the source is https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/).
We consider the Spam data set as an example of Big Data. This is reasonable, especially if the developed SVM classifier is to be used for identifying new spam patterns in a data flow. In addition, we used two test data sets: Test1 (Test1 data set in Table I) and Test2 (Test2 data set in Table I; the source is http://machinelearning.ru/wiki/images/b/b2/MOTP12_svm_example.rar) [26].
For all data sets binary classification has been performed.

Experimental calculations were made on a PC running the 64-bit Microsoft Windows 7 operating system with 3 GB of random access memory and a four-core Intel® Core™ i3 processor with a core clock frequency of 2.53 GHz. The SVM algorithm from the software package MATLAB 7.12.0.635 was applied for the modeling.
For the development of the SVM classifier, the traditional and the modified PSO algorithms were used to choose the optimal values of the SVM classifier parameters. Kernels with polynomial (#1), radial basis (#2) and sigmoid (#3) functions were included in the search, and identical values of the PSO algorithm parameters and identical ranges of the required SVM classifier parameters were established.
A short description of the characteristics of each data set is provided in Table I. It also presents the optimal values of the SVM classifier parameters found with the traditional and the modified PSO algorithms (in identical parameter ranges and with identical PSO algorithm parameters), the number of errors made during the training and testing of the SVM classifier, and the search time. For example, for the WDBC data set both the traditional and the modified PSO algorithms determined the kernel with the radial basis function (#2) as optimal; the corresponding optimal values of the kernel parameter and the regularization parameter are listed in Table I. The classification accuracy achieved with the traditional PSO algorithm is 99.47%, and with the modified PSO algorithm it is 99.65%. The search time was 3919 and 1464 seconds, respectively.
For the Spam data set, both the traditional and the modified PSO algorithms also determined the kernel with the radial basis function (#2) as optimal; the corresponding optimal values of the kernel parameter and the regularization parameter are listed in Table I. The classification accuracy achieved with both the traditional and the modified PSO algorithm is 97.91%. The search time was 92645 and 44933 seconds, respectively.
In the figures, particles are marked by asterisks in the search spaces, and the best position in the search space is marked by a white circle. During the execution of the modified PSO algorithm, the swarm particles move towards the best (optimal) position for the current iteration in the search space and demonstrate a collective search for the optimal position. The velocity and direction of each particle are corrected accordingly. Moreover, «regeneration» of particles takes place: some particles change their own search space to the space in which particles show the best classification quality. Thus, during the execution of the modified PSO algorithm, the particle coordinates responsible for the parameters of the kernel function and the regularization parameter $C$ change, and the type of the kernel function changes as well. As a result, the particles move towards a single search space (in this case, the space corresponding to the radial basis kernel function), leaving the space where they were initialized.
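The «regeneration» step can be sketched as follows. This is a minimal sketch, not the authors' implementation: the particle representation (`kernel`, `param`, `C`, `accuracy`), the `fraction` argument, the rule that the worst-performing particles of other kernel types regenerate first, and the parameter ranges are all assumptions made for illustration.

```python
import random

# Illustrative search ranges of the kernel parameter per kernel type (assumed).
PARAM_RANGE = {"poly": (3, 8), "rbf": (0.1, 10.0)}


def regenerate(swarm, fraction, rng):
    """Move a share of particles into the search space of the best particle."""
    best = max(swarm, key=lambda p: p["accuracy"])
    # Candidates: particles of other kernel types, worst accuracy first (assumed rule).
    others = sorted((p for p in swarm if p["kernel"] != best["kernel"]),
                    key=lambda p: p["accuracy"])
    lo, hi = PARAM_RANGE[best["kernel"]]
    for p in others[:int(fraction * len(others))]:
        p["kernel"] = best["kernel"]
        value = rng.uniform(lo, hi)  # re-initialize the kernel parameter in the new space
        p["param"] = round(value) if best["kernel"] == "poly" else value
        # C is kept: its range is shared by all kernel types.
        # The stored accuracy is stale until the particle is re-evaluated.
    return swarm


swarm = [
    {"kernel": "rbf", "param": 2.0, "C": 1.0, "accuracy": 0.95},
    {"kernel": "rbf", "param": 5.0, "C": 3.0, "accuracy": 0.90},
    {"kernel": "poly", "param": 3, "C": 2.0, "accuracy": 0.60},
    {"kernel": "poly", "param": 4, "C": 4.0, "accuracy": 0.70},
    {"kernel": "poly", "param": 5, "C": 6.0, "accuracy": 0.65},
    {"kernel": "poly", "param": 6, "C": 8.0, "accuracy": 0.80},
]
regenerate(swarm, fraction=0.5, rng=random.Random(0))
print([p["kernel"] for p in swarm])
```

Here the two weakest polynomial particles join the radial basis search space, so fewer of the more expensive polynomial classifiers need to be trained in later iterations.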
In the reviewed example, only 7 particles had not changed their kernel function type after 20 iterations. The other particles were situated near the best position, which corresponds to the optimal solution in the search space (Fig. 5). Fig. 6 shows examples of the location of the particle swarm in the 2-D search space at the initialization and at the 2-nd, the 7-th and the 10-th iterations of the traditional PSO algorithm for the radial basis kernel function. The best particle was found at the 8-th iteration, although 20 iterations were executed. Kernels with polynomial, radial basis and sigmoid functions were included in the search. Table II shows the information on the best SVM classifier at the different iterations of the traditional PSO algorithm (for the radial basis kernel function, which was determined as the best kernel function) and of the modified PSO algorithm (for three kernel functions) for the Spam data set.
It is visible from Table I that, for the reviewed data sets, both algorithms determined the same kernel function type as optimal, similar values of the kernel function parameter and the regularization parameter, and similar accuracy values for the training and testing of the SVM classifier. However, the modified PSO algorithm is more efficient, because its search took 2-3 times less time than that of the traditional one.
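The size of this speed-up can be checked against the search times quoted in the text for the WDBC and Spam data sets. The snippet below is a small arithmetic sketch; the dictionary names are illustrative.

```python
# Search times in seconds quoted in the text: (traditional PSO, modified PSO).
search_time = {
    "WDBC": (3919, 1464),
    "Spam": (92645, 44933),
}

speedup = {name: traditional / modified
           for name, (traditional, modified) in search_time.items()}

for name, ratio in speedup.items():
    print(f"{name}: the modified PSO search is {ratio:.2f} times faster")
```

For these two data sets the ratios are about 2.68 and 2.06.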
When determining the optimal parameter values of the SVM classifier with the traditional or the modified PSO algorithm in the chosen search space, we must create a huge number of SVM classifiers to find the one that shows the maximum classification accuracy with the minimum number of support vectors. For example, running the PSO algorithm with 600 particles for 20 iterations requires building and comparing 12000 SVM classifiers.
If the average time of the training and testing of one SVM classifier is 5 seconds, then the time expenditure for the search for the optimal parameter values of the SVM classifier will be $600 \times 20 \times 5 = 60000$ seconds, or about 16.67 hours, which considerably exceeds the time expenditure for the development of 18 SVM classifiers (90 seconds) for the SVM ensemble.
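The cost estimate above is simple to reproduce. The sketch below uses only the figures given in the text (600 particles, 20 iterations, 5 seconds per classifier, 18 ensemble members); the variable names are illustrative.

```python
particles = 600
iterations = 20
seconds_per_classifier = 5   # average training and testing time assumed in the text
ensemble_members = 18

pso_classifiers = particles * iterations                 # classifiers built by the PSO search
pso_seconds = pso_classifiers * seconds_per_classifier   # total PSO search time
ensemble_seconds = ensemble_members * seconds_per_classifier

print(pso_classifiers)                 # 12000
print(pso_seconds)                     # 60000
print(round(pso_seconds / 3600, 2))    # 16.67 hours
print(ensemble_seconds)                # 90
```

Under these assumptions the PSO search costs several hundred times more classifier trainings than building the ensemble members.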
The experimental studies show that the search time is determined: a) by the parameters of the PSO algorithm itself (the velocity coefficients, the number and types of the kernel functions, the search ranges, etc.); b) by the properties of the experimental data set used for the training and testing of the SVM classifier (in particular, by the size of the data set and the number of characteristics). The shorter search time of the modified PSO algorithm compared with the traditional PSO algorithm is explained by the fact that some particles «regenerate» from one search space (with one kernel function type) to another search space (with another kernel function type), and the development of the SVM classifier takes more time for the first kernel function type than for the second (in particular, the polynomial kernel function is the most time-consuming).
It should be noted that the SVM classifier for the German data set does not achieve very good classification accuracy (in comparison with the SVM classifiers for the other data sets). Training the SVM classifier in this case leads either to an SVM classifier with mediocre classification accuracy or to an overfitted SVM classifier, for which the number of errors on the test set is significantly greater than on the training set (with acceptable classification accuracy on the experimental data set as a whole). Therefore, it is expedient to try other approaches to the classifier development, in particular the approach based on the SVM ensemble.
One more reason to use the SVM ensemble is the high time expenditure of the PSO algorithm: to increase the classification accuracy we need to increase the number of iterations of the PSO algorithm and/or the number of particles in the swarm, yet this does not guarantee that the expected high classification accuracy will be obtained. Therefore it is worth trying to develop the SVM ensemble on the basis of individual SVM classifiers with moderate classification accuracy. The classification accuracy of the SVM ensemble should be higher than that of the individual SVM classifiers used.
In the final experiments, the usefulness of the SVM ensembles was confirmed on test and real data sets.
The optimal threshold value $\theta^*$ for the reviewed example is 0.3, since for $\theta^* = 0.3$ all five classification rules (strategies) (21)-(25) give a stable improvement of the classification quality when the number of classifiers is reduced to the number corresponding to this threshold value. The final number of classifiers in the SVM ensemble was 6. A further decrease in the number of classifiers is not feasible (it would lead to a sharp decrease in their number and a substantial reduction of their diversity).
The use of the median strategy (or sum strategy) with $\theta^* = 0.3$ allowed classifying correctly 98.29% of the objects of the initial data set. At the same time, the maximum classification accuracy of one of the individual SVM classifiers used in the SVM ensemble was equal to 93.2%, and the accuracy reached with the majority vote rule was equal to 96.8%.
Thus, the use of the SVM ensemble allowed increasing the classification accuracy by more than 5% compared to the maximum classification accuracy of one of the individual classifiers in the SVM ensemble.For the Spam data set we also developed 18 individual SVM classifiers with use of various input parameters.
The parameters and characteristics of the 18 individual classifiers are shown in Table IV. Kernels with polynomial (#1), radial basis (#2), sigmoid (#3) and linear (#4) functions were included in the search. In Table IV, for the sigmoid kernel function the first number is $k_1$ and the second number is $k_2$.
Table VI shows information on the characteristics of the individual SVM classifiers that take part in the SVM ensemble. This ensemble was created on the basis of the strategies (21)-(25). Thus, the use of the SVM ensemble allowed increasing the classification accuracy by almost 5% compared with the maximum classification accuracy of one of the individual classifiers in the SVM ensemble.
The SVM ensemble with 98.59% classification accuracy is not inferior to the SVM classifier based on the modified PSO algorithm with 97.91% classification accuracy, and it requires far less time.
The proposed two-level SVM classifier was used for the classification of the Test2 data set (Table I). Evidently, despite the small size ($s = 400$) and the small number of characteristics ($q = 2$), the PSO algorithm takes quite a long time to find the optimal parameters of the SVM classifier (longer than, for example, for the WDBC data set of 569 objects with 30 characteristics). This is because the data are hard to separate. Fig. 7 shows the location of the data in the 2-D space and its division into two classes. Objects of the first class are marked by asterisks, and objects of the second class are marked by plus signs. It is obviously very difficult to draw a curve separating the classes. For this data set a group of 9 SVM classifiers was trained (Table VII). Three kernel functions were included in the search: polynomial (#1), radial basis (#2) and sigmoid (#3). In Table VII, for the sigmoid kernel function the first number is $k_1$ and the second number is $k_2$.
At the first level of the two-level SVM classifier, 215 objects were selected from the initial 400 objects. These 215 objects were identified by the group of SVM classifiers as support vectors. During the experiments it was found that the individual classifiers show accuracy ranging from 85.75% to 91.5%. The accuracy of the two-level SVM classifier amounted to 96.75%. Thus, the two-level SVM classifier improved the classification accuracy by more than 5% compared with the maximum accuracy of one of the individual SVM classifiers. The number of objects used in the training and testing of the SVM classifier was reduced from 400 to 215.
In addition, the offered two-level SVM classifier was used for the classification of the Spam data set. For this data set a group of 10 SVM classifiers was trained (Table VIII). Three kernel functions were included in the search: polynomial (#1), radial basis (#2) and sigmoid (#3). In Table VIII, for the sigmoid kernel function the first number is $k_1$ and the second number is $k_2$. We used the SVM classifiers that show acceptable classification accuracy (more than 80%) with a small number of support vectors (up to 1000).
At the first level of the two-level SVM classifier, 2055 objects (about 45% of the original experimental data set) were selected from the initial 4601 objects. During the experiments it was found that the individual classifiers show accuracy ranging from 85.75% to 91.5%. The accuracy of the two-level SVM classifier amounted to 97.26%. Thus, the two-level SVM classifier improved the classification accuracy by almost 3% compared with the maximum accuracy of one of the SVM classifiers. The number of objects used in the training and testing of the SVM classifier was reduced from 4601 to 2055 (i.e. by a factor of more than two). Thus, the results of the experimental studies confirm the efficiency of the offered approaches for Big Data classification.

VII. CONCLUSION
The efficiency of the suggested approaches has been confirmed by the results of experimental studies.
The SVM classifiers on the base of the modified PSO algorithm allow classifying data with the high classification accuracy.
The modified PSO algorithm allows choosing the best kernel function type, values of the kernel function parameters and value of the regularization parameter within appropriate time expenditures, which turned out to be significantly less than when using the traditional PSO algorithm.The main feature of the modified PSO algorithm is using the «regeneration» of the particles.
The SVM ensembles based on the decorrelation maximization algorithm, used with the different decision-making strategies and the majority vote rule, reduce the randomness of a classification decision produced by a single classifier and help to improve the classification accuracy. The shortcomings of some classifiers are compensated by the strengths of others through the combination of their results. The classifiers counterbalance the randomness of each other's results, yielding the most plausible output classification decision. This allows finding the best classification result with the minimum classification error.
The two-level SVM classifier also allows improving the classification accuracy within acceptable time expenditures. Further research will be devoted to the development of recommendations on the application of the SVM classifiers based on the modified PSO algorithm and their ensembles to practical problems, especially Big Data classification problems. It should be noted that the PSO algorithm and other nature-inspired swarm optimization algorithms are well suited to distributed architectures and to the handling of high-volume unstructured data in Big Data analytics.

Fig. 1. Linear separation of two classes by the SVM classifier in the 2D space

sigmoid function) and the range boundaries of the kernel function parameters and the regularization parameter $C$ for the chosen kernel function types $T$.
decision ($-1$ or $+1$) of the $j$-th classifier at the $i$-th row of the experimental data set; $\tilde{y}_{ij}$ is the actual class ($-1$ or $+1$) to which the $i$-th object belongs.
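Let there be an error matrix $E$ of the set of individual SVM classifiers, of size $s \times k$:

$$E = \begin{pmatrix} e_{11} & e_{12} & \cdots & e_{1k} \\ e_{21} & e_{22} & \cdots & e_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ e_{s1} & e_{s2} & \cdots & e_{sk} \end{pmatrix}, \quad (13)$$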

$TR_1, \dots, TR_k$ and to obtain $k$ individual SVM classifiers (ensemble members).

1) To train $k$ SVM classifiers on the original experimental data set using the different training data sets $TR_1, TR_2, \dots, TR_k$ at the first level of the two-level SVM classifier. It is necessary to use different kernel function types, different values of the kernel function parameters and different values of the regularization parameter.
2) To obtain the support vector sets $SV_1, SV_2, \dots, SV_k$ from the trained SVM classifiers and to form the set $SV$ as the union of the support vector sets $SV_1, SV_2, \dots, SV_k$. The set $SV$ consists of $t$ objects ($t \le s$, where $s$ is the number of objects in the experimental data set).
3) To select from the set $SV$ the subset $SV^+$ consisting of $T$ ($T \le t$) objects (support vectors) that have been correctly classified by the SVM classifiers. This ensures that false data do not participate in the training of the SVM classifier based on the modified PSO algorithm at the second level of the two-level SVM classifier. The remaining objects (support vectors) of the set $SV$ form the subset $SV^-$ with $t - T$ objects. The subset $SV^+$ is used for the training and the subset $SV^-$ for the testing of the SVM classifier based on the modified PSO algorithm.
4) To develop the SVM classifier based on the modified PSO algorithm.
5) To classify the objects of the original experimental data set that are not included in the sets $SV^+$ and $SV^-$.
The use of the two-level SVM classifier also allows carrying out high-precision data classification, especially Big Data classification, with acceptable time expenditures.
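The set construction in steps 2, 3 and 5 can be sketched without any SVM library. In this minimal sketch the support vector index sets and the first-level predictions are assumed to be available already; how the first-level group decides whether an object is «correctly classified» is not specified here, so a single prediction per object is assumed. All names and data are illustrative.

```python
def build_second_level_sets(support_sets, y_true, y_pred, n_objects):
    """Split the objects for the second level of the two-level classifier.

    support_sets: index sets SV_1..SV_k of the first-level classifiers
    y_true, y_pred: actual and first-level predicted classes per object (assumed given)
    """
    sv = set().union(*support_sets)                      # SV = SV_1 U ... U SV_k
    sv_plus = sorted(i for i in sv if y_pred[i] == y_true[i])   # training set
    sv_minus = sorted(i for i in sv if y_pred[i] != y_true[i])  # test set
    remaining = sorted(set(range(n_objects)) - sv)       # objects classified in step 5
    return sv_plus, sv_minus, remaining


# Toy data: eight objects, three first-level classifiers (hypothetical).
support_sets = [{0, 1, 2}, {2, 3}, {5}]
y_true = [1, 1, -1, -1, 1, -1, 1, -1]
y_pred = [1, -1, -1, -1, 1, 1, 1, -1]

sv_plus, sv_minus, remaining = build_second_level_sets(
    support_sets, y_true, y_pred, n_objects=len(y_true))
print(sv_plus)     # [0, 2, 3]
print(sv_minus)    # [1, 5]
print(remaining)   # [4, 6, 7]
```

Only the objects in `sv_plus` and `sv_minus` are passed to the expensive PSO-based search, which is where the reduction in search time comes from.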

Figs. 2-4 show, for the Spam data set, examples of the location of the particle swarm in the 2-D search spaces and in the 3-D search space at the initialization, at the 3-rd iteration and at the 12-th iteration. These locations of the particles in the swarm were obtained with the modified PSO algorithm. Kernels with polynomial, radial basis and sigmoid functions were included in the search. The following ranges of parameter values were set: $3 \le d \le 8$, $d \in \mathbb{N}$ (for the polynomial function); $0.1 \le \sigma \le 10$ (for the radial basis function); $-10 \le k_2 \le -0.1$ and …

Fig. 2. Location of the particles in the swarm at the initialization (polynomial kernel function is on the left, radial basis is in the middle, sigmoid is on the right)

Fig. 4. Location of the particles in the swarm at the 12-th iteration (polynomial kernel function is on the left, radial basis is on the right)

The range of the regularization parameter $C$ was set as $0.1 \le C \le 10$. Moreover, the following values of the PSO algorithm parameters were set: the number $m$ of particles in the swarm equal to 600 (200 for each kernel function type); the number of iterations $N_{\max} = 20$; personal and …

Fig. 5. Location of particles after the 20-th iteration

Several individual SVM classifiers using different types of the kernel function, different values of the kernel function parameters and different values of the regularization parameter were trained in the experiments for the particular data sets. Different training and test sets, randomly generated from the original data set, were used. Then the decorrelation maximization algorithm for the different strategies of decision-making on the data classification and the majority vote rule were applied.

For example, for the German data set we developed 18 individual SVM classifiers using various input parameters. Testing showed that the individual classifiers achieve classification accuracy in the range from 83.5% to 93.2%, and the initial values of the determination coefficient (for $\theta^* = 1$), calculated for all 18 individual classifiers, lie in the range from 0.049 to 0.534. The classification parameters corresponding to the different threshold values $\theta$ are given in Table III.

Table IV also shows information on the time expenditures for the training of each individual SVM classifier for the Spam data set. The total training time is 90 seconds. For each individual SVM classifier, the training set was formed randomly from the initial experimental data set of e-mails. The number of instances in the test set was equal to 10%-25% of the number of instances in the initial experimental data set. Testing showed that the individual classifiers achieve classification accuracy in the range from 82.29% to 94.63%, and the initial values of the determination coefficient (for $\theta^* = 1$), calculated for all 18 individual classifiers, lie in the range from 0.025 to 0.757. The classification parameters corresponding to the different threshold values $\theta$ are given in Table V. The optimal threshold value $\theta^*$ for the reviewed example lies in a range of threshold values for which all five classification rules (21)-(25) give a stable improvement of the classification quality when the number of classifiers is reduced to the number corresponding to the threshold value $\theta^*$. The final number of classifiers in the SVM ensemble is equal to 4. A further decrease in the number of classifiers is not feasible (it would lead to a sharp decrease in their number and a substantial reduction of their diversity).

Table VI also shows information on the characteristics of the best SVM classifier based on the traditional PSO algorithm and the modified PSO algorithm. The use of the maximum (minimum) strategy allowed classifying correctly 98.59% of the objects in the initial data set. At the same time, the maximum classification accuracy of one of the individual SVM classifiers used in the SVM ensemble was equal to 94.04% (for the 13-th SVM classifier), and the accuracy reached with the majority vote rule was equal to 96.8%. The application of the other strategies also increases the classification accuracy in comparison with the classification accuracy of the individual SVM classifiers, the classification accuracy based on the majority vote rule and the classification accuracy of the SVM classifier based on the PSO algorithm.

Fig. 7. Representation of the data set Test2 in 2D space

[...] objects were classified correctly and entered in the training set SV, and 11 objects were incorrectly classified and entered in the test set SV. The time used for the development of one individual SVM classifier is on average less than 1 second. At the second level of the two-level SVM classifier, the SVM classifier based on the modified PSO algorithm was created. Here we used the training set SV and the test set SV. The search time for the optimal parameters amounted to 2465 seconds, which is almost 3 times less than the search time for the original experimental data set (7146 seconds). The remaining 185 objects (more than 46%) were not used in the development of the SVM classifier and formed the data set to be classified. These objects were correctly classified by the two-level SVM classifier.
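The formation of the second-level sets can be sketched as follows. The sketch assumes that the first level returns, for every object, a predicted label together with the indices of the objects identified as support vectors by the group of SVM classifiers. The function `split_support_vectors` and its arguments are hypothetical and serve only to illustrate the partition described above.

```python
def split_support_vectors(y_true, y_pred, sv_indices):
    """Partition object indices for the second level of the two-level classifier.

    Returns (train, test, rest):
      train: support vectors classified correctly at the first level,
      test:  support vectors classified incorrectly at the first level,
      rest:  objects that are not support vectors (the data set to be classified).
    """
    sv = set(sv_indices)
    train = [i for i in sorted(sv) if y_pred[i] == y_true[i]]
    test = [i for i in sorted(sv) if y_pred[i] != y_true[i]]
    rest = [i for i in range(len(y_true)) if i not in sv]
    return train, test, rest


# Hypothetical first-level output for six objects.
y_true = [1, 1, -1, -1, 1, -1]
y_pred = [1, -1, -1, -1, 1, 1]
sv_indices = [1, 2, 5]

train, test, rest = split_support_vectors(y_true, y_pred, sv_indices)
```

Only the objects in `train` and `test` take part in the parameter search at the second level, which is why the search time drops when the support vectors are a small part of the data set.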

Fig. 8 shows the classification results of the Test2 data set: on the left, the part of the objects (support vectors) and their separating curve; on the right, the original experimental data set (after the classification of the remaining 185 objects).

Fig. 8. The classification results of the Test2 data set

2055 objects have been identified by the group of the SVM classifiers as the support vectors. Notably, 1834 objects were classified correctly and entered in the training set SV, and 221 objects were incorrectly classified and entered in the test set SV. The time used for the development of one individual SVM classifier is on average less than 4 seconds. At the second level of the two-level SVM classifier we found the best SVM classifier based on the modified PSO algorithm with the polynomial kernel function. The search time for the optimal parameters amounted to 19566 seconds, which is more than 2 times less than the search time for the original experimental data set (44933 seconds).

TABLE I. THE SEARCH RESULTS BY MEANS OF THE TRADITIONAL PSO ALGORITHM AND THE MODIFIED PSO ALGORITHM

TABLE II. THE CHARACTERISTICS OF THE BEST CLASSIFIER AT THE REALIZATION OF THE PSO ALGORITHM

TABLE III. VALUES OF CLASSIFICATION PARAMETERS AT THE DIFFERENT THRESHOLD VALUES OF THE DETERMINATION COEFFICIENT (GERMAN DATA SET)

TABLE IV. THE PARAMETERS AND CHARACTERISTICS OF THE INDIVIDUAL CLASSIFIERS (SPAM DATA SET)

TABLE V. VALUES OF CLASSIFICATION PARAMETERS AT THE DIFFERENT THRESHOLD VALUES OF THE DETERMINATION COEFFICIENT (SPAM DATA SET)

TABLE VI. THE CLASSIFICATION RESULTS ON THE BASE OF THE INDIVIDUAL SVM CLASSIFIERS AND THEIR SVM ENSEMBLE

TABLE VII. THE PARAMETERS AND CHARACTERISTICS OF THE INDIVIDUAL CLASSIFIERS (TEST2 DATA SET)