Separability Detection Cooperative Particle Swarm Optimizer based on Covariance Matrix Adaptation

The particle swarm optimizer (PSO) is a population-based optimization technique that has been applied to a wide range of problems. The cooperative particle swarm optimizer (CPSO) applies cooperative behavior to improve the ability of the PSO to find the global optimum in a high-dimensional space. This is achieved by employing multiple swarms that partition the search space. However, independent changes made by different swarms to correlated variables deteriorate the performance of the algorithm. This paper proposes a separability detection approach based on covariance matrix adaptation that finds non-separable variables so that they can be placed into the same swarm in advance, which addresses the difficulty that the original CPSO encounters.


I. INTRODUCTION
The particle swarm optimizer (PSO) [1,2] is a stochastic, population-based optimization algorithm. Its learning procedure is based on a population of individuals whose behaviors resemble certain biological phenomena. As the evolution proceeds, the individuals keep exploring the solution space and exploiting the information shared between them. In general, by means of this exploration and exploitation, the PSO is less likely to be trapped in a local optimum.
As with many stochastic optimization algorithms [1], [3]-[6], the PSO suffers from the "curse of dimensionality," which implies that its performance deteriorates as the dimensionality of the search space increases. To cope with this difficulty, Potter [3] proposed a cooperative coevolutionary genetic algorithm (CCGA) that partitions the search space by splitting the solution vectors into smaller ones. This mechanism significantly improves the performance of the original GA. Van den Bergh [5] applied the technique to the PSO and presented several cooperative PSO models, named CPSOs. In the CPSO learning procedure, the search space can be arbitrarily partitioned into any number of subspaces. Each smaller search space is then searched by a separate swarm. The fitness function is evaluated on the context vector, which is the concatenation of the particles found by each of the swarms. However, as with the CCGA, the performance of the CPSO deteriorates when correlated variables are placed into separate populations. In this paper, we call such variables "non-separable." A function $f$ is said to be separable if

$$\arg\min_{(x_1,\ldots,x_n)} f(x_1,\ldots,x_n) = \Bigl(\arg\min_{x_1} f(x_1,\ldots),\; \ldots,\; \arg\min_{x_n} f(\ldots,x_n)\Bigr), \tag{1}$$

from which it follows that $f$ can be optimized in a sequence of $n$ independent 1-D optimization processes. This paper proposes a variation on the original CPSO that detects the separability of the variables. To this end, we adopt a mechanism from the evolution strategy with covariance matrix adaptation (CMA-ES) [8,9]. The performance of the CPSO after applying separability detection is compared with that of the traditional PSO and CPSO algorithms.
This paper is organized as follows. Section II presents an overview of the PSO and the CPSO. In Section III, we describe the proposed separability detection cooperative particle swarm optimizer (SD-CPSO). This is followed by the experiment results presented in Section IV. Finally, some directions for future research are discussed in Section V.

II. RELATED WORKS
The PSO was first introduced by Kennedy and Eberhart. It is one of the most powerful methods for solving global optimization problems. The algorithm searches for an optimal point in a multi-dimensional space by adjusting the trajectories of its particles. Each particle updates its position and velocity based on its personal best position and the global best position among all particles, denoted by $y$ and $\hat{y}$, respectively. The position $x_{i,d}$ and velocity $v_{i,d}$ of the $d$-th dimension of the $i$-th particle are updated as follows:

$$v_{i,d}(t+1) = w\,v_{i,d}(t) + c_1 r_1 \bigl[y_{i,d}(t) - x_{i,d}(t)\bigr] + c_2 r_2 \bigl[\hat{y}_d(t) - x_{i,d}(t)\bigr], \tag{2}$$

$$x_{i,d}(t+1) = x_{i,d}(t) + v_{i,d}(t+1), \tag{3}$$

where $y_i$ represents the previous best position of the $i$-th particle; $c_1$ and $c_2$ denote the acceleration constants that weight how strongly each particle is pulled toward $y$ and $\hat{y}$, respectively; $w$ is the inertia weight of the standard formulation followed in [5]; and $r_1$ and $r_2$ are random numbers in the range [0, 1].

Let $s$ denote the swarm size and $f()$ denote the fitness function that evaluates the performance of a particle. After (2) and (3) are executed, the personal best position $y$ of each particle is updated as follows:

$$y_i(t+1) = \begin{cases} y_i(t), & \text{if } f(x_i(t+1)) \ge f(y_i(t)), \\ x_i(t+1), & \text{if } f(x_i(t+1)) < f(y_i(t)), \end{cases} \tag{4}$$

and the global best position is found by:

$$\hat{y}(t+1) = \arg\min_{y_i} f(y_i(t+1)), \quad 1 \le i \le s. \tag{5}$$

The CPSO [5,6] is one of the most significant improvements to the original PSO. Van den Bergh presented a family of CPSOs, including CPSO-S, CPSO-S$_K$, CPSO-H, and CPSO-H$_K$. The CPSO-H$_K$ algorithm is a hybrid of the PSO and the CPSO-S$_K$, and it was proposed to address the issue of "pseudominima." A discussion of pseudominima is outside the scope of this article. The objective of this article is to propose a self-organizing technique that helps the CPSO-S$_K$ find how the components of a context vector are related.
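The update rules (2)-(5) can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `pso_step`, the in-place list layout, and the default values of `w`, `c1`, and `c2` are assumptions chosen for the example and are not taken from Table II.

```python
import random

def pso_step(x, v, y, f, w=0.72, c1=1.49, c2=1.49, rng=random):
    """Apply one PSO iteration in place and return the new global best.

    x, v, y: lists of s lists of length n holding the positions,
    velocities, and personal best positions of the s particles.
    f: objective function to be minimized.
    w, c1, c2: illustrative (assumed) inertia and acceleration constants.
    """
    y_hat = min(y, key=f)                      # global best before the move
    for i in range(len(x)):
        for d in range(len(x[i])):
            r1, r2 = rng.random(), rng.random()   # uniform in [0, 1)
            v[i][d] = (w * v[i][d]
                       + c1 * r1 * (y[i][d] - x[i][d])
                       + c2 * r2 * (y_hat[d] - x[i][d]))   # velocity, Eq. (2)
            x[i][d] += v[i][d]                              # position, Eq. (3)
        if f(x[i]) < f(y[i]):                  # personal best, Eq. (4)
            y[i] = list(x[i])
    return list(min(y, key=f))                 # global best, Eq. (5)
```

Because a personal best is replaced only by a strictly better position, the fitness of the returned global best never increases from one call to the next.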
The concept of the CPSO-S is that, instead of trying to find an optimal $n$-dimensional vector, the vector is split into $n$ parts so that each of $n$ swarms optimizes a 1-D vector. The CPSO-S$_K$ generalizes the CPSO-S: the vector is split into $K$ parts rather than $n$, where $K \le n$, and $K$ also represents the number of swarms. Each of the $K$ swarms acts as a PSO optimizer following (2)-(5). The main difference between the PSO and the CPSO is that the fitness of a single particle in the CPSO has to be evaluated together with the global best particles of the other swarms. Let $P_j$ denote the $j$-th swarm and $P_j.x_i$ the $i$-th particle in swarm $j$. The fitness of $P_j.x_i$ is defined as:

$$\mathrm{fitness}(P_j.x_i) = f\bigl(P_1.\hat{y}, \ldots, P_{j-1}.\hat{y},\; P_j.x_i,\; P_{j+1}.\hat{y}, \ldots, P_K.\hat{y}\bigr). \tag{6}$$

The CPSO applies cooperative behavior to improve the ability of the PSO to find the global optimum in a high-dimensional space. This is achieved by employing multiple swarms that explore the subspaces of the search space separately, which reduces the curse of dimensionality. However, the CPSO is not guaranteed to be superior to the PSO, since independent changes made by different swarms to correlated variables deteriorate its performance. In addition, one generation of an $n$-dimensional CPSO-S operation costs $n$ times as much as one generation of a PSO operation.
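The context-vector evaluation described above can be sketched as follows. The helper name `context_fitness` and the representation of the partition as lists of dimension indices are assumptions made for this illustration.

```python
def context_fitness(f, groups, global_bests, j, candidate):
    """Evaluate a particle of swarm j inside the context vector.

    groups: list of K lists of dimension indices that partition range(n).
    global_bests: list of K lists; global_bests[k] is the best sub-vector
        found so far by swarm k, ordered like groups[k].
    candidate: the sub-vector proposed by one particle of swarm j.
    """
    n = sum(len(dims) for dims in groups)
    context = [0.0] * n
    for k, dims in enumerate(groups):
        # Swarm j contributes the candidate; every other swarm
        # contributes its current global best.
        part = candidate if k == j else global_bests[k]
        for d, value in zip(dims, part):
            context[d] = value
    return f(context)
```

With `groups = [[0], [1], ..., [n-1]]` this corresponds to the CPSO-S, and with a single group holding all dimensions it reduces to an ordinary PSO fitness evaluation.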

III. METHODOLOGY
This paper proposes an approach that helps the CPSO self-organize its swarms so that each is composed of non-separable variables. Consider the optimization task illustrated in Fig. 1, which shows a 2-D function with a bar-shaped local optimal region that contains the global optimum. The task is to find the global optimum with a particle swarm optimizer. At first, the particles are uniformly distributed in the search space. At this moment, we expect the particles to be divided into two swarms, each performing a separate 1-D PSO operation on one dimension, to speed up the gathering of the particles around the optimal region.
Suppose the particles then gather around the optimal region as expected, as shown in Fig. 2. At this point, we prefer the particles to perform a 2-D PSO operation on the whole search space to reduce the computational cost, which in this case is the number of function evaluations. To implement this idea, we have to determine when to switch between the PSO and the CPSO operation while solving a task. In this paper, this is done by determining the separability between variables and placing the non-separable variables into the same swarm at each generation. If at a certain moment all variables are determined to be non-separable, the PSO operation is performed; otherwise, the CPSO operation is performed.
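The separability between variables is found by estimating the covariance matrix of the distribution of particles. The method we adopt is the covariance matrix adaptation proposed in [8,9]. In the standard CMA-ES, a population of new search points is generated by sampling a multivariate normal distribution $\mathcal{N}$ with mean $m \in \mathbb{R}^n$ and covariance matrix $C \in \mathbb{R}^{n \times n}$. For each generation number $g = 0, 1, 2, \ldots$, the new search points are sampled as

$$x_k^{(g+1)} \sim m^{(g)} + \sigma^{(g)}\,\mathcal{N}\bigl(0, C^{(g)}\bigr), \quad k = 1, \ldots, \lambda, \tag{8}$$

where $\sim$ denotes the same distribution on the left- and right-hand sides; $\sigma^{(g)}$ denotes the overall standard deviation (the step size) at generation $g$; and $\lambda$ is the sample size. The new mean $m^{(g+1)}$ of the search distribution is a weighted average of the $\mu$ selected points from the $\lambda$ samples $x_1^{(g+1)}, x_2^{(g+1)}, \ldots, x_\lambda^{(g+1)}$:

$$m^{(g+1)} = \sum_{i=1}^{\mu} w_i\, x_{i:\lambda}^{(g+1)}$$

with

$$\sum_{i=1}^{\mu} w_i = 1, \quad w_1 \ge w_2 \ge \cdots \ge w_\mu > 0,$$

where $w_i$ are positive weights and $x_{i:\lambda}^{(g+1)}$ denotes the $i$-th ranked individual out of the $\lambda$ samples from (8), so that

$$f\bigl(x_{1:\lambda}^{(g+1)}\bigr) \le f\bigl(x_{2:\lambda}^{(g+1)}\bigr) \le \cdots \le f\bigl(x_{\lambda:\lambda}^{(g+1)}\bigr),$$

where $f()$ is the objective function to be minimized. The new covariance matrix $C^{(g+1)}$ is formed by a combination of the rank-$\mu$ and rank-one updates [10]:

$$C^{(g+1)} = (1 - c_{cov})\,C^{(g)} + \frac{c_{cov}}{\mu_{cov}}\, p_c^{(g+1)} {p_c^{(g+1)}}^{T} + c_{cov}\Bigl(1 - \frac{1}{\mu_{cov}}\Bigr) \sum_{i=1}^{\mu} w_i \left(\frac{x_{i:\lambda}^{(g+1)} - m^{(g)}}{\sigma^{(g)}}\right) \left(\frac{x_{i:\lambda}^{(g+1)} - m^{(g)}}{\sigma^{(g)}}\right)^{T}.$$

The step size is adapted as

$$\sigma^{(g+1)} = \sigma^{(g)} \exp\!\left(\frac{c_\sigma}{d_\sigma}\left(\frac{\bigl\|p_\sigma^{(g+1)}\bigr\|}{E\|\mathcal{N}(0, I)\|} - 1\right)\right),$$

$$p_\sigma^{(g+1)} = (1 - c_\sigma)\,p_\sigma^{(g)} + \sqrt{c_\sigma (2 - c_\sigma)\,\mu_{eff}}\;{C^{(g)}}^{-1/2}\,\frac{m^{(g+1)} - m^{(g)}}{\sigma^{(g)}},$$

where $1/c_\sigma$ is the backward time horizon of the evolution path, similar to $1/c_c$; $d_\sigma$ is a damping parameter; and $p_\sigma^{(g+1)}$ is the conjugate evolution path for the step size $\sigma^{(g+1)}$. The expectation of the Euclidean norm of an $\mathcal{N}(0, I)$ distributed random vector reads

$$E\|\mathcal{N}(0, I)\| = \sqrt{2}\,\frac{\Gamma\bigl(\frac{n+1}{2}\bigr)}{\Gamma\bigl(\frac{n}{2}\bigr)} = \sqrt{n} + O\!\left(\frac{1}{n}\right),$$

where $O(\cdot)$ represents the high-order terms.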
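As a rough illustration of the selection and recombination step, the sketch below ranks a set of samples, forms the weighted mean, and computes the weighted covariance estimate of the selected samples that enters the rank-$\mu$ update. The function name `rank_mu_estimate` is hypothetical, and the logarithmic weights are an assumption borrowed from common CMA-ES practice; the text above only requires positive, non-increasing weights that sum to one.

```python
import math

def rank_mu_estimate(samples, f, mean, sigma, mu):
    """Return (new_mean, c_mu) computed from the mu best samples.

    samples: list of lambda points (lists of length n) drawn around `mean`.
    f: objective function to be minimized.
    c_mu: sum_i w_i * z_i z_i^T with z_i = (x_{i:lambda} - mean) / sigma.
    """
    n = len(mean)
    ranked = sorted(samples, key=f)[:mu]        # x_{1:lambda} ... x_{mu:lambda}
    # Assumed weighting scheme: log-decreasing, then normalized to sum to 1.
    raw = [math.log(mu + 1) - math.log(i + 1) for i in range(mu)]
    total = sum(raw)
    w = [r / total for r in raw]
    new_mean = [sum(w[i] * ranked[i][d] for i in range(mu)) for d in range(n)]
    c_mu = [[0.0] * n for _ in range(n)]
    for wi, xi in zip(w, ranked):
        z = [(xi[d] - mean[d]) / sigma for d in range(n)]
        for j in range(n):
            for k in range(n):
                c_mu[j][k] += wi * z[j] * z[k]  # weighted outer product
    return new_mean, c_mu
```

The full covariance update additionally blends this estimate with the previous covariance matrix and the rank-one term; those parts are omitted here to keep the sketch short.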
Suppose the estimated covariance matrix has the form

$$C = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1n} \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{bmatrix},$$

where $n$ is the number of dimensions and $c_{jk}$ represents the weighted covariance between variables $j$ and $k$. The separability between dimensions can be obtained from the correlation coefficient matrix, whose elements are defined as follows:

$$\rho_{jk} = \frac{c_{jk}}{\sqrt{c_{jj}\,c_{kk}}}.$$

We define a parameter $\rho_{thres}$ to determine whether dimensions $j$ and $k$ are viewed as separable. If $|\rho_{jk}| < \rho_{thres}$, we say that variables $j$ and $k$ are separable. Conventionally, $|\rho| > 0.8$ implies a very strong linear relationship between two variables, $0.8 > |\rho| > 0.6$ implies a strong relationship, and $0.6 > |\rho| > 0.4$ implies a moderate relationship. Therefore, in this paper, we avoid setting $\rho_{thres}$ below 0.6. The block diagram of the proposed method can be found in Fig. 3.
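The thresholding step can be sketched as follows. The function name `group_variables` is hypothetical, and merging variables transitively (if $j$ is correlated with $k$ and $k$ with $l$, all three share a swarm) is an assumption of this sketch; the text does not state how chains of correlated variables are handled. The diagonal of the covariance matrix is assumed to be strictly positive.

```python
import math

def group_variables(cov, rho_thres=0.6):
    """Partition dimensions into swarms from a covariance matrix.

    Dimensions j and k are joined when |rho_jk| >= rho_thres, where
    rho_jk = c_jk / sqrt(c_jj * c_kk). Joined pairs are merged
    transitively with a small union-find structure.
    """
    n = len(cov)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for j in range(n):
        for k in range(j + 1, n):
            rho = cov[j][k] / math.sqrt(cov[j][j] * cov[k][k])
            if abs(rho) >= rho_thres:       # non-separable pair
                parent[find(j)] = find(k)

    groups = {}
    for d in range(n):
        groups.setdefault(find(d), []).append(d)
    return sorted(groups.values())
```

Following the switching rule of this section, a result with a single group corresponds to a PSO operation on the whole search space, while several groups correspond to a CPSO operation with one swarm per group.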

IV. EXPERIMENT RESULTS
To compare the performance of different algorithms, a fair time measure must be selected. Following [5], we use the number of function evaluations as the time measure. The performance of the proposed SD-CPSO is verified on real-parameter minimization tasks comprising nine test functions in total. By their nature, they can be divided into two parts: unimodal and multimodal functions. The first two functions are unimodal. They are followed by seven multimodal functions, three of which have simple global structures (single-funnel functions) and four of which have complex global structures (multi-funnel functions). The difference between single- and multi-funnel functions is illustrated by two figures. Figure 4 shows a visualization of the 2-D Rastrigin's function, in which, in spite of the large number of local minima, there is a trend toward the global minimum. Figure 5 shows a visualization of the 2-D double Rastrigin's function, in which there are two funnel-type global trends and a large number of noisy local minima. The types and names of the functions are described in Table I. Detailed definitions of the test functions can be found in [11,12].
All functions are 50-dimensional, and each has been adjusted so that its optimal solution is zero. To make sure that there is sufficient correlation between the variables, which makes the optimization even harder, all the functions were also tested under a 45-degree coordinate rotation.
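To see why a coordinate rotation introduces correlation between variables, consider the sketch below. The text does not specify how the 45-degree rotation is applied in 50 dimensions, so rotating consecutive coordinate pairs is purely an assumption of this example, and the names `rotate_pairs` and `rotated` are hypothetical.

```python
import math

def rotate_pairs(x, degrees=45.0):
    """Rotate consecutive coordinate pairs (x0, x1), (x2, x3), ... by the
    given angle. A trailing unpaired coordinate is left unchanged."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    z = list(x)
    for d in range(0, len(x) - 1, 2):
        z[d] = c * x[d] - s * x[d + 1]
        z[d + 1] = s * x[d] + c * x[d + 1]
    return z

def rotated(f, degrees=45.0):
    """Wrap an objective so that it is evaluated in rotated coordinates."""
    return lambda x: f(rotate_pairs(x, degrees))
```

For the separable function $g(x) = x_0^2 + 100\,x_1^2$, the rotated version evaluates to $(x_0 - x_1)^2/2 + 50\,(x_0 + x_1)^2$, which contains a cross term $x_0 x_1$. The best value of $x_0$ therefore depends on $x_1$, so the two variables can no longer be optimized independently.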
In the remainder of this section, Section IV.A describes the configurations of the algorithms that are compared with the proposed SD-CPSO, and Section IV.B presents and discusses the experiment results.

A. Algorithms Configuration
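The three algorithms for comparison are listed as follows:
• PSO: the original algorithm.
• CPSO-S: the algorithm that assigns a separate swarm to each dimension.
• SD-CPSO: the proposed algorithm, which places the detected non-separable variables into the same swarm.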
For each algorithm, the experiments are executed 50 times. Let $n$ denote the dimension of the optimization task and $s$ the number of particles in one swarm. The parameters of the three algorithms are listed in Table II.

B. Experiment Result
This section presents the optimization results. The maximum number of fitness evaluations, the initial search range, the initial search position, and the minimum fitness threshold are detailed in Table III. All particles are evenly distributed in the initial search range. The experimental data are obtained by running each algorithm on each 50-dimensional test function until the stopping criterion is met. The procedure is repeated 50 times to compute the average fitness value. Instead of the actual numeric fitness value, the rank of the minimum average fitness value is used as the standard of comparison, because we want to exclude the effect of the different scales of the test functions on the raw numeric differences. For example, some functions have a very large fitness gap between the best and the second-best local minimum, and some have no local minima at all. The numeric difference may therefore not be a good index for evaluating the algorithms. The experiment results are shown in Table IV.

The results are discussed in three parts according to the function type:

1) Unimodal Function:
For the sphere function $f_1$, the CPSO-S has the best performance, owing to its rapid convergence. For the ellipsoid function $f_2$, the PSO is initially better than the other two algorithms. The experiment results show that all three algorithms are capable of solving unimodal optimization tasks, and that applying our method brings no improvement in performance on them.

2) Multimodal Function:
The SD-CPSO is better than the other algorithms on the $f_3$ and $f_5$ test functions, but not on $f_4$, the Rastrigin's function. We think this might be because the Rastrigin's function is nearly the same after rotation, which makes our effort to find a special trend toward the global optimum irrelevant. Nevertheless, the results on $f_3$ and $f_5$ show the advantage of the proposed SD-CPSO in finding the global optima of multimodal functions.

3) Multi-Funnel Function:
From Table IV we can see that the superiority of the proposed SD-CPSO is clear on multi-funnel function optimization tasks. In general, the optimization of multi-funnel functions is difficult, as can be seen especially from the optimization result of the $f_6$ function. Although the proposed SD-CPSO performs better on the $f_7$ and $f_8$ functions, the improvement is small. In the optimization of $f_9$, the Michalewicz's function, however, the improvement is remarkable. We therefore illustrate the optimization results for the Michalewicz's function in both its unrotated and rotated forms in Fig. 7.

Fig. 8, on the other hand, illustrates the ability of the SD-CPSO to self-organize the decomposition of dimensions. We place the detected non-separable variables into the same swarm in the CPSO operation to alleviate the detrimental effect of placing correlated variables into separate swarms. When the particles waver in a valley, the number of swarms decreases because the correlated dimensions are coupled; when the swarms step into a local minimum region, the number of swarms increases to adapt to these uncorrelated, sphere-like regions.

V. CONCLUSION
In this paper, we propose a self-organization approach to the CPSO. This approach determines a suitable swarm structure for the CPSO by estimating the correlations between variables. The combination of dimensions that forms each swarm is detected by covariance matrix adaptation. In the experiments, the SD-CPSO performs best on the multimodal functions $f_3$ and $f_5$ and on the multi-funnel functions $f_7$, $f_8$, and $f_9$, with the largest improvement on the Michalewicz's function. Future research should investigate the pseudominima caused by the splitting of swarms.

Figure 1. Case in which particles are uniformly distributed in the search space to find the global optimum that lies in a bar-shaped local optimal region.

Figure 2. Case in which particles gather around the bar-shaped optimal region to find the global optimum.
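In the covariance matrix update of the CMA-ES, $\mu_{cov} \ge 1$ is the weighting between the rank-$\mu$ update and the rank-one update; $c_{cov} \in [0,1]$ is the learning rate for the covariance matrix update; and the weighted sum $\sum_{i=1}^{\mu} w_i \bigl(x_{i:\lambda}^{(g+1)} - m^{(g)}\bigr)\bigl(x_{i:\lambda}^{(g+1)} - m^{(g)}\bigr)^{T} / {\sigma^{(g)}}^{2}$ is the formula used to compute the estimated covariance matrix for the selected samples. The evolution path $p_c^{(g+1)}$ for the rank-one update is described as follows:

$$p_c^{(g+1)} = (1 - c_c)\,p_c^{(g)} + \sqrt{c_c (2 - c_c)\,\mu_{eff}}\;\frac{m^{(g+1)} - m^{(g)}}{\sigma^{(g)}},$$

where $\mu_{eff} = \bigl(\sum_{i=1}^{\mu} w_i^2\bigr)^{-1}$ is the variance effective selection mass. The new step size $\sigma^{(g+1)}$ is then updated with the conjugate evolution path $p_\sigma^{(g+1)}$.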

Fig. 7(a) shows the result for the unrotated Michalewicz's function. The Michalewicz's function introduces many valleys into the plain, and the function values at points outside the narrow valleys give very little information about the location of the global optimum. Thus, the swarms need to follow these valleys to find the minima. In the rotated version, these narrow valleys are too strongly correlated for the CPSO to follow. Fig. 7(b) shows that the SD-CPSO evidently overcomes this drawback.

Figure 8. Number of swarms during the optimization of the rotated Michalewicz's function.
science from the University of Maryland in 1985, and the Ph.D. degree in electrical engineering from the University of Illinois, Champaign, in 1988. Since 1988, he has been on the faculty of the Department of Electrical Engineering at National Chiao Tung University, Hsinchu, Taiwan, where he is currently a professor. His research interests include fuzzy systems, genetic algorithms, neural networks, automatic target recognition, scheduling, image processing, and image recognition.

Yi-Chang Cheng received the B.S. degree in engineering science from the National Cheng Kung University, Taiwan, R.O.C., in 2005. He is currently pursuing the Ph.D. degree at the Department of Electrical Engineering, National Chiao Tung University, Taiwan, R.O.C. His research interests include neural networks, fuzzy systems, evolutionary algorithms, and genetic algorithms.

Jyun-Wei Chang received the B.S. and M.S. degrees in electronic engineering from National Kaohsiung University of Applied Sciences, Taiwan, R.O.C., in 2005 and 2007, respectively. He is currently pursuing the Ph.D. degree at the Department of Electrical Engineering, National Chiao Tung University, Taiwan, R.O.C. His research interests include neural networks, fuzzy systems, and evolutionary algorithms.

Pei-Chia Hung received the B.S. degree in engineering science from the National Chiao Tung University, Taiwan, R.O.C., in 2004. He is currently pursuing the Ph.D. degree at the Department of Electrical Engineering, National Chiao Tung University, Taiwan, R.O.C. His research interests include image processing and image compression.

TABLE I. TYPES AND NAMES OF THE TEST FUNCTIONS.

TABLE III. PARAMETERS OF THE EXPERIMENT.

TABLE IV. AVERAGE FITNESS VALUE.