Reflected Adaptive Differential Evolution with Two External Archives for Large-Scale Global Optimization

JADE is an adaptive variant of the nature-inspired algorithm Differential Evolution (DE). It has shown considerably improved performance on a set of well-studied benchmark test problems. In this paper, we evaluate a new version of JADE with two external archives, labeled Reflected Adaptive Differential Evolution with Two External Archives (RJADE/TA), on unconstrained continuous large-scale global optimization problems. The single archive of JADE stores failed solutions. In contrast, the proposed second archive stores superior solutions at regular intervals of the optimization process to avoid premature convergence towards local optima. The superior solutions sent to the archive are replaced in the population by new potential solutions obtained by reflection. At the end of the search process, the best solution is selected from the second archive and the current population. The performance of RJADE/TA is evaluated extensively on two test beds: first, on the 28 benchmark functions constructed for the 2013 Congress on Evolutionary Computation (CEC 2013) special session; second, on ten benchmark problems from the CEC 2010 Special Session and Competition on Large-Scale Global Optimization. Experimental results demonstrate a very competitive performance of the algorithm.
Keywords: Adaptive differential evolution; large scale global optimization; archives.


I. INTRODUCTION
Optimization deals with finding the optimal solution of single- or multi-objective functions [1]. An unconstrained single-objective optimization problem can be stated as follows:

$$\min_{x \in \mathbb{R}^n} f(x), \tag{1}$$

where $f(x)$ denotes the objective function, and $x = (x_1, x_2, \ldots, x_n)^T$ is an n-dimensional real vector.
DE [2] is one of the most popular bio-inspired schemes for finding the global optimum $x^*$ of problem (1). The heuristic is essentially an evolutionary one and relies on the usual genetic operators of mutation and crossover. DE is easy to understand and implement, has few parameters to control, and is robust.
There is no doubt that DE is a remarkable optimizer for many optimization problems. However, it has a few drawbacks, such as stagnation, premature convergence, and loss of diversity. Because it is a global optimizer, its local search ability is comparatively weak. More details can be found in [3].
An adaptive variant of DE, the so-called JADE [15], has been proposed for numerical optimization. According to the results reported in [15] and [30], it improves on the state-of-the-art algorithms jDE [8], SaDE [29] and DE/rand/1/bin [2]. However, JADE is not reliable on some problems: it finds the global optimum in some runs, but it can also be trapped in local optima [30]. To improve the reliability of JADE, in this paper we introduce two new strategies into JADE and thus propose Reflected Adaptive Differential Evolution with Two External Archives (RJADE/TA).
The rest of this paper is organized as follows. Section II describes the basic DE and JADE algorithms. Section III reviews related work. Section IV presents the proposed RJADE/TA. Section V gives the experimental results, and finally Section VI concludes this paper and discusses future research directions.

A. Differential Evolution
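The four main steps of differential evolution (DE) are detailed as follows.
1) Parent Selection: For each member $x_i$, $i = 1, 2, \ldots, N_p$, of the current generation G, three other members $x_{r_1}$, $x_{r_2}$ and $x_{r_3}$ are randomly selected, where $r_1$, $r_2$ and $r_3$ are randomly chosen indices such that $r_1, r_2, r_3 \in \{1, 2, \ldots, N_p\}$ and $i \neq r_1 \neq r_2 \neq r_3$. Thus, for each individual $x_i$, a mating pool of four individuals is formed, in which $x_i$ breeds with the three selected individuals and produces an offspring.
2) Mutation: After selection, mutation is applied to produce a mutant vector $v_i$ by adding a scaled difference of two of the chosen vectors to the third chosen vector, i.e.,

$$v_i = x_{r_1} + F\,(x_{r_2} - x_{r_3}), \tag{2}$$

where $F \in (0, 2)$ [31] is the scaling factor.
3) Crossover: After mutation, the parameters of the parent vector $x_i$ and the mutant vector $v_i$ are mixed by a crossover operator, and a trial member $u_i$ is generated as follows:

$$u_{i,j} = \begin{cases} v_{i,j}, & \text{if } \mathrm{rand}_j \le CR \text{ or } j = j_{rand}, \\ x_{i,j}, & \text{otherwise,} \end{cases} \tag{3}$$

where $j \in \{1, 2, \ldots, n\}$, $\mathrm{rand}_j$ is a uniform random number in [0, 1], $CR$ is the crossover probability, and $j_{rand}$ is a randomly chosen index that guarantees that $u_i$ takes at least one component from $v_i$.

4) Survivor Selection: At the end, the trial vector generated in (3) is compared with its parent on the basis of its objective function value. The fitter of the two propagates to the next generation, i.e.,

$$x_{i,G+1} = \begin{cases} u_i, & \text{if } f(u_i) \le f(x_i), \\ x_i, & \text{otherwise,} \end{cases} \tag{4}$$

where $x_{i,G+1}$ denotes the member carried into generation $G + 1$.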
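The four steps above can be sketched in a few lines of Python. This is a minimal illustration of classic DE/rand/1/bin for minimization, not the authors' implementation (the experiments in this paper were run in MATLAB); the function name `de_rand_1_bin_step` and the default values `F=0.5` and `CR=0.9` are assumptions chosen for illustration only.

```python
import random


def de_rand_1_bin_step(pop, fitness, f_obj, F=0.5, CR=0.9, rng=random):
    """Return the next generation of classic DE/rand/1/bin (minimization).

    pop is a list of real vectors, fitness the matching objective values.
    F and CR defaults are illustrative assumptions, not values from the paper.
    """
    n_pop, n_dim = len(pop), len(pop[0])
    new_pop, new_fit = list(pop), list(fitness)
    for i in range(n_pop):
        # 1) Parent selection: three mutually distinct indices, all != i.
        r1, r2, r3 = rng.sample([k for k in range(n_pop) if k != i], 3)
        # 2) Mutation: one vector plus the scaled difference of the other two.
        v = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(n_dim)]
        # 3) Binomial crossover; j_rand forces at least one mutant component.
        j_rand = rng.randrange(n_dim)
        u = [v[j] if (rng.random() <= CR or j == j_rand) else pop[i][j]
             for j in range(n_dim)]
        # 4) Survivor selection: the trial replaces the parent if not worse.
        fu = f_obj(u)
        if fu <= fitness[i]:
            new_pop[i], new_fit[i] = u, fu
    return new_pop, new_fit
```

Because the trial vector replaces its parent only when it is at least as good, the objective value of every population slot is non-increasing from one generation to the next.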

B. JADE
Before presenting the new algorithm, we give the details of JADE, the DE variant on which the algorithm devised in this paper is based. JADE [15] is an adaptive version of DE. It improves the performance of DE by implementing a new mutation strategy, DE/current-to-pbest, with or without an external archive, and by adaptively controlling the parameters F and CR. JADE adopts the crossover and selection schemes of classic DE described in Equations (3) and (4). The DE/current-to-pbest strategy incorporates not only the information of the best solution but also that of other good solutions. Specifically, any solution from the top p% of the population can be randomly selected in DE/current-to-pbest to play the role of the single best solution in DE/current-to-best [15]. Here p is the percentage of top solutions; its default value is 5% of $N_p$, and other suggested values lie between 5% and 20%, inclusive. JADE modifies classic DE in three aspects.
1) DE/current-to-pbest strategy: JADE utilizes two mutation strategies, one with an external archive and the other without it. These strategies improve on the DE/current-to-best/1 strategy. They can be expressed as follows [15]:

$$v_i = x_i + F_i\,(x_{pbest} - x_i) + F_i\,(x_{r_1} - x_{r_2}), \tag{5}$$

$$v_i = x_i + F_i\,(x_{pbest} - x_i) + F_i\,(x_{r_1} - \tilde{x}_{r_2}), \tag{6}$$

where $x_{pbest}$ is a vector chosen randomly from the top p% individuals, and $x_i$, $x_{r_1}$ and $x_{r_2}$ are chosen from the current population P, while $\tilde{x}_{r_2}$ is chosen randomly from $P \cup A$. Here A denotes the archive of JADE, which records the inferior parent solutions found during the current generation.
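As a rough illustration of the DE/current-to-pbest mutation, the following Python sketch builds one mutant vector. It is not taken from the JADE reference code; the function name `current_to_pbest_mutation` and the way indices are drawn are assumptions of this sketch. Passing an empty archive gives the variant without archive, since the second difference vector is then drawn from P only.

```python
import random


def current_to_pbest_mutation(pop, fitness, archive, i, F_i, p=0.05, rng=random):
    """Build one DE/current-to-pbest mutant for individual i (minimization).

    Hypothetical helper: names and index handling are assumptions.
    """
    n_pop, n_dim = len(pop), len(pop[0])
    # x_pbest: a random member of the top 100p% of the population (at least one).
    n_top = max(1, int(round(p * n_pop)))
    ranked = sorted(range(n_pop), key=lambda k: fitness[k])
    x_pbest = pop[rng.choice(ranked[:n_top])]
    # x_r1 is drawn from P and must differ from x_i.
    r1 = rng.choice([k for k in range(n_pop) if k != i])
    # The second vector is drawn from P ∪ A and must differ from x_i and x_r1.
    union = pop + archive
    r2 = rng.choice([k for k in range(len(union)) if k not in (i, r1)])
    x_i, x_r1, x_r2 = pop[i], pop[r1], union[r2]
    return [x_i[j] + F_i * (x_pbest[j] - x_i[j]) + F_i * (x_r1[j] - x_r2[j])
            for j in range(n_dim)]
```

With `p=0.05` and a population of 100, the p-best vector is drawn from the five best individuals, which matches the default setting described above.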
2) Control Parameter Adaptation: For each individual $x_i$, the crossover probability $CR_i$ and the mutation factor $F_i$ are generated independently from Normal and Cauchy distributions, respectively, as follows [15]:

$$CR_i = \mathrm{randn}_i(\mu_{CR}, 0.1), \tag{7}$$

$$F_i = \mathrm{randc}_i(\mu_F, 0.1), \tag{8}$$

where $\mathrm{randn}_i$ and $\mathrm{randc}_i$ denote random numbers drawn from the Normal and Cauchy distributions, $\mu_{CR}$ and $\mu_F$ are the mean and location parameter of these distributions, and 0.1 is the standard deviation (scale parameter). The Cauchy distribution is more helpful than the Normal distribution for diversifying the mutation factors and thus preventing premature convergence, which often occurs in mutation strategies if the mutation factors are highly concentrated around a certain value [15]. The standard deviation is chosen to be relatively small (0.1) because otherwise the adaptation does not function efficiently; e.g., in the case of an infinite standard deviation, the truncated Normal distribution becomes independent of the value of $\mu_{CR}$ [15]. $CR_i$ and $F_i$ given in Equations (7) and (8) are then truncated to (0, 1] and [0, 1], respectively. Initially, both $\mu_F$ and $\mu_{CR}$ are set to 0.5, as suggested in [15]. They are then updated at the end of each generation as follows [15]:

$$\mu_{CR} = (1 - c) \cdot \mu_{CR} + c \cdot \mathrm{mean}_A(S_{CR}), \tag{9}$$

$$\mu_F = (1 - c) \cdot \mu_F + c \cdot \mathrm{mean}_L(S_F). \tag{10}$$

Here $\mathrm{mean}_L$ denotes the Lehmer mean, $\mathrm{mean}_L(S_F) = \sum_{F \in S_F} F^2 / \sum_{F \in S_F} F$, $\mathrm{mean}_A$ denotes the arithmetic mean, $S_F$ is the set of successful $F_i$'s, and $S_{CR}$ is the set of successful $CR_i$'s at generation G. The Lehmer mean is helpful for propagating larger mutation factors, which in turn improves the progress rate. In contrast, an arithmetic mean of $S_F$ tends to be smaller than the optimal value of the mutation factor and thus might cause premature convergence at the end. The parameter c in Equations (9) and (10) is a constant that controls the rate of parameter adaptation and is chosen between 0 and 1. The life span of a successful $CR_i$ or $F_i$ is roughly $1/c$ generations; i.e., after $1/c$ generations, the old value of $\mu_{CR}$ or $\mu_F$ is reduced by a factor of $(1 - c)^{1/c}$, which approaches $1/e$ when c is close to zero. If c = 0, no parameter adaptation takes place.
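The parameter adaptation can be sketched in Python as below. This is an illustrative sketch under stated assumptions rather than the reference implementation: the helper names are hypothetical, `c=0.1` is only an example value, means are left unchanged when no trial succeeded, and the truncation rule used here (clipping $CR_i$ to [0, 1], capping $F_i$ at 1 and resampling non-positive $F_i$) is an assumption of this sketch.

```python
import math
import random


def lehmer_mean(values):
    """Lehmer mean: sum of squares divided by the sum (favours large values)."""
    return sum(v * v for v in values) / sum(values)


def sample_cr(mu_cr, rng=random):
    """Draw CR_i from Normal(mu_cr, 0.1), clipped to [0, 1] (assumed rule)."""
    return min(1.0, max(0.0, rng.gauss(mu_cr, 0.1)))


def sample_f(mu_f, rng=random):
    """Draw F_i from Cauchy(mu_f, 0.1) using the inverse CDF.

    Assumed rule: resample while F_i <= 0 and cap F_i at 1.
    """
    while True:
        f = mu_f + 0.1 * math.tan(math.pi * (rng.random() - 0.5))
        if f > 0.0:
            return min(f, 1.0)


def update_means(mu_cr, mu_f, s_cr, s_f, c=0.1):
    """Move mu_CR and mu_F towards the successful values of this generation."""
    if s_cr:  # assumption: keep the old mean if no trial succeeded
        mu_cr = (1.0 - c) * mu_cr + c * (sum(s_cr) / len(s_cr))
    if s_f:
        mu_f = (1.0 - c) * mu_f + c * lehmer_mean(s_f)
    return mu_cr, mu_f
```

For the set {0.5, 1.0} the Lehmer mean is 1.25 / 1.5, which is larger than the arithmetic mean 0.75; this is the bias towards larger mutation factors described above.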
3) Optional External Archive: At each generation, the failed parents are sent to the archive. The difference between archive members and members of the current population is utilized in the mutation operation in order to diversify the population and avert premature convergence. If the archive size exceeds $N_p$, some solutions are randomly deleted from it to keep its size equal to $N_p$.
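The archive maintenance rule is simple enough to state in code. The following sketch assumes the archive is a plain Python list; the function name `update_archive` is hypothetical.

```python
import random


def update_archive(archive, failed_parents, n_pop, rng=random):
    """Add the failed parents of this generation, then trim to at most n_pop."""
    archive = archive + failed_parents
    # Randomly delete members until the archive size is back to n_pop.
    while len(archive) > n_pop:
        archive.pop(rng.randrange(len(archive)))
    return archive
```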

III. REVIEW
Almost two decades have passed since DE was proposed in 1995 to cope with non-differentiable, non-convex and non-linear problems defined in continuous parameter space [32]. Since then, DE and its numerous and diverse variants have emerged as one of the most competitive and versatile families of evolutionary optimizers and have been successfully applied to numerous real-world problems from diverse disciplines of science and technology [33]. An extensive literature on DE is available, as is evident from the recent surveys [34], [32]. This section therefore reviews only some of the most relevant methods. The hybridization of DE with local search strategies is a popular area of research among practitioners, and many hybrid algorithms have shown significant performance improvement.
In [35], Sequential Quadratic Programming (SQP) is merged into the DE algorithm. This hybrid applies DE until the number of function evaluations reaches 30% of the maximum. It then applies SQP for the first time to the best point obtained so far. Afterwards, SQP is applied every 100 generations to the best solution of the current search. In this work, the population size is reduced dynamically, and the process terminates with the minimum population size.
In another work, DE is combined with the simplex method; the resulting method is known as NSDE [36]. The authors apply the nonlinear simplex method together with uniform random numbers to initialize the DE population. Initially, $N_p$ individuals are generated uniformly, and a further $N_p$ are generated from these points by applying the Nelder-Mead Simplex (NMS) method. From this population of $2N_p$, the fittest $N_p$ are selected as DE's initial population; the rest of DE is unchanged. The algorithm thus modifies DE only in the population initialization step.
Further, a differential evolution algorithm with localization around the best point (DELB) is proposed in [37]. In DELB the initial evolutionary steps are the same as in DE, except that the mutation scale factor F is chosen randomly from [−1, −0.4] ∪ [0.4, 1] for each mutant vector. DELB modifies the selection step of DE by introducing reflection and contraction. The trial vector is compared with the current best and with the parent vector. If the parent is worse than the trial vector, it is replaced by a new contracted or reflected vector. Thus, in DELB the parent vector can be replaced by the trial vector, a reflected vector or a contracted vector, while in classic DE only the trial vector can replace the parent.
Inspired by the above techniques, a new variant of the DE family, RJADE/TA, is presented, which records the best individuals of the optimization process at regular intervals. In addition, it utilizes a reflection strategy from local search to replace the archived solutions. The details of RJADE/TA are presented in the following section.

IV. PROPOSED REFLECTED ADAPTIVE DIFFERENTIAL EVOLUTION WITH TWO EXTERNAL ARCHIVES
This section proposes a new DE algorithm, RJADE/TA, which modifies JADE in two aspects. First, it introduces a second external archive into JADE, which stores superior solutions of the search at regular intervals of the optimization process. Second, these superior solutions are replaced in the current population by new potential solutions obtained by reflection. RJADE/TA adopts the same crossover and mutation operations as JADE [15]. We have modified the pseudo-code of JADE; the additions can be seen in lines 26 to 31 of Algorithm 1. Further, in the last line the best solution is selected from $P \cup A_2$; the rest of the code remains the same.

A. Best Solution's Reflection
Premature convergence of the algorithm may be caused by the best solution. Thus, to avoid premature convergence, stagnation and local optima, RJADE/TA reflects the best solution $x_{best,G}$ of the search process and sends it to the archive $A_2$. To implement the reflection mechanism [38] in RJADE/TA, first the center of mass of the current population P, excluding the best solution $x_{best,G}$, is computed as

$$x_{c,G} = \frac{1}{N_p - 1} \sum_{x_{i,G} \in P,\; x_{i,G} \neq x_{best,G}} x_{i,G}, \tag{11}$$

where $x_{c,G}$ denotes the center of mass of $N_p - 1$ individuals, since one candidate solution will be archived; this operation can be seen in Algorithm 1 (line 29). Once the center of mass of the $N_p - 1$ individuals is calculated, the best individual $x_{best,G}$ (the solution with the minimum objective value) of P is reflected through the center of mass $x_{c,G}$ as follows:

$$x_{r,G} = x_{c,G} + (x_{c,G} - x_{best,G}), \tag{12}$$

where $x_{r,G}$ is the mirror image, or reflection [38], of $x_{best,G}$ through the centroid $x_{c,G}$; this newly produced solution is known as the reflected solution. The coefficient of reflection is 1, as suggested in [38].
The reflected solution replaces $x_{best,G}$ in the population P, and the best solution $x_{best,G}$ itself is transferred to the second archive $A_2$.
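The reflection step can be illustrated with the following Python sketch. It assumes a reflection coefficient of 1 as stated above; the function name `reflect_best` is hypothetical, and bound handling for the reflected point is not covered because the text does not describe it.

```python
def reflect_best(pop, fitness, f_obj):
    """Replace the best individual by its reflection through the centroid
    of the remaining N_p - 1 individuals; return the removed best solution.

    pop and fitness are modified in place. Minimization is assumed.
    """
    n_pop, n_dim = len(pop), len(pop[0])
    b = min(range(n_pop), key=lambda k: fitness[k])
    best, best_fit = pop[b], fitness[b]
    # Centre of mass of all individuals except the best one.
    centroid = [sum(pop[k][j] for k in range(n_pop) if k != b) / (n_pop - 1)
                for j in range(n_dim)]
    # Reflection with coefficient 1: x_r = x_c + (x_c - x_best).
    reflected = [2.0 * centroid[j] - best[j] for j in range(n_dim)]
    pop[b] = reflected
    fitness[b] = f_obj(reflected)  # costs one extra function evaluation
    return best, best_fit
```

For example, with the population {(0, 0), (2, 2), (4, 4)} and the sphere function, the best point (0, 0) is reflected through the centroid (3, 3) of the other two points to (6, 6), and (0, 0) is returned for storage in the second archive.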

B. Second External Archive in RJADE/TA
The first update of archive $A_2$ is made when the search procedure reaches 50% of its function evaluations. After that, $A_2$ is updated at regular intervals of κ generations. As mentioned earlier, JADE has an archive A that stores inferior solutions; if its size exceeds $N_p$, some solutions are removed from it. In contrast, the proposed second archive $A_2$ records the best solution of the search after every κ generations. In other words, after every κ generations the best solution of the current population is removed from the search procedure and kept passive in $A_2$ for the rest of the optimization. The reason for removing the best solution from the current optimization process is that best-solution information may cause difficulties such as premature convergence, due to the resulting reduction in population diversity [15]. The best solution sometimes misleads the search towards local optima or stagnation.
The second archive $A_2$ is initialized as empty (∅) and is updated with one best solution every κ generations (see Algorithm 1). The interval κ between two reflections is set to 1000 here. If we reflected the best solution at every generation, there would be one extra function evaluation per generation, which may waste computational effort. Furthermore, if we stored the best solution at every generation, the best solutions of consecutive generations would differ little from each other, which again would waste computation. That is why we selected κ = 1000. The differences between A and $A_2$ are as follows.
1) $A_2$ stores the best solution of the current population, while A records the recently explored inferior solutions.
2) The size of A is kept at $N_p$; if this size is exceeded, some solutions are randomly deleted from A. The size of the new archive $A_2$, however, may exceed $N_p$: it keeps a record of all best solutions, and no solution is removed from it.
3) $A_2$ records the best solution (only one solution) of the current generation, which may be a parent or a child solution. In contrast, A keeps only inferior parent solutions (possibly more than one); it does not record inferior child solutions.
4) $A_2$ is initialized as empty and is updated after every κ generations (1000, say). A, on the other hand, is updated at the end of each generation.
5) The inferior parents recorded in A are later utilized in mutation. In $A_2$, by contrast, each stored best solution is replaced by a new reflected solution, which is sent to the current population. Once a solution is kept in $A_2$, it remains inactive during the optimization. When the search procedure terminates, the solutions of the second archive contribute to the selection of the optimal solution.
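The update schedule of the second archive and the final selection from $P \cup A_2$ can be sketched as follows. The class name `SecondArchive` and its method names are hypothetical; the sketch assumes that function evaluations and generations are counted by the caller and that minimization is used.

```python
class SecondArchive:
    """Passive store of best solutions, updated every kappa generations
    once half of the evaluation budget has been spent."""

    def __init__(self, max_evals, kappa=1000):
        self.entries = []            # (solution, fitness); never trimmed
        self.max_evals = max_evals
        self.kappa = kappa
        self.last_update_gen = None  # generation of the most recent update

    def due(self, evals_used, generation):
        """True if the best solution should be archived at this generation."""
        if self.last_update_gen is None:
            # First update: when 50% of the function evaluations are used.
            return evals_used >= 0.5 * self.max_evals
        return generation - self.last_update_gen >= self.kappa

    def store(self, solution, fit, generation):
        self.entries.append((list(solution), fit))
        self.last_update_gen = generation

    def final_best(self, pop, fitness):
        """Best (solution, fitness) pair from the population and the archive."""
        candidates = list(zip(pop, fitness)) + self.entries
        return min(candidates, key=lambda sf: sf[1])
```

In a main loop, `due` would be checked once per generation; when it returns true, the best individual is replaced in the population by its reflection and the removed solution is passed to `store`. At termination, `final_best` returns the overall best solution.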

A. Experimental Setup
Experimental validation of the proposed RJADE/TA is conducted on a set of 28 new and complex test functions [39] provided by the CEC 2013 special session and on 1000-dimensional functions designed for the CEC 2010 competition on large-scale global optimization [40].

B. CEC 2013 Test Suite
In the CEC 2013 test suite, the previously proposed composition functions of CEC 2005 [2] are enhanced, and additional test functions are considered for real-parameter single-objective optimization. Three types of problems are developed:
• Functions 1-5 are unimodal;
• Functions 6-20 are multimodal;
• Functions 21-28 are composition functions, which are designed by combining various problems into a complex landscape.

C. Parameter Settings for CEC 2013 Test Suite
We performed our experiments following the guidelines of the CEC 2013 competition [39]. For all problems, the initialization range is $[-100, 100]^n$, the number of dimensions is n = 10 and n = 30, and the maximum number of objective function evaluations is 10000 × n per run. When the difference between the value of the best solution found and the known optimal value is $10^{-8}$ or less, the error is set to 0. The population size is set to 100.
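The evaluation budget and the error-reporting rule can be written down directly. The function names below are hypothetical and only restate the settings given above.

```python
def max_evaluations(n):
    """Evaluation budget per run for an n-dimensional problem."""
    return 10000 * n


def function_error(f_best, f_opt, tol=1e-8):
    """Error f(x) - f(x*), reported as 0 once it is at most tol."""
    err = f_best - f_opt
    return 0.0 if err <= tol else err
```

For n = 30, for instance, the budget is 300000 function evaluations per run.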

D. Results on CEC 2013 functions
The experimental statistics (best, mean, median, worst and standard deviation) obtained by our algorithm in 51 runs on the 28 CEC 2013 test functions with dimension n = 10 are summarized in Table I. In Table II, the mean values of the function error $f(x) - f(x^*)$ obtained by RJADE/TA are presented for n = 10. These values are compared with those of the state-of-the-art algorithms jDE, jDEsoo [41] (a new version of jDE), SPSRDEMMS [42] and jDErpo [13]. Among these, SPSRDEMMS and jDErpo were specially developed for the CEC 2013 competition.
In Table II, the symbol − shows that the corresponding algorithm loses against our RJADE/TA algorithm, the symbol + indicates that the particular algorithm wins against our algorithm, and = indicates that both algorithms perform equivalently. The strong performance of RJADE/TA is visible in Table II from the many − signs. RJADE/TA performed better than jDE and jDEsoo on 15 out of 28 functions, and on 4 functions the results were similar; jDE and jDEsoo showed better performance on only 9 functions. Compared with SPSRDEMMS, our algorithm found better solutions for 16 out of 28 functions, while SPSRDEMMS showed better results on 12 functions. Furthermore, jDErpo and RJADE/TA each outperformed the other on 12 functions.
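The +/−/= bookkeeping can be reproduced with a small helper. This is a sketch that assumes the signs are assigned by directly comparing mean error values (lower is better) without a statistical test; the function name `tally_signs` and the `tol` parameter are hypothetical.

```python
def tally_signs(competitor_means, rjade_means, tol=0.0):
    """Count +, = and - signs for a competitor relative to RJADE/TA.

    '+' : competitor has the lower mean error (wins),
    '-' : competitor has the higher mean error (loses),
    '=' : the two mean errors differ by at most tol.
    """
    counts = {"+": 0, "=": 0, "-": 0}
    for comp, ours in zip(competitor_means, rjade_means):
        if abs(comp - ours) <= tol:
            counts["="] += 1
        elif comp < ours:
            counts["+"] += 1
        else:
            counts["-"] += 1
    return counts
```

Applied to one competitor column and the RJADE/TA column of a comparison table, the three counts sum to the number of test functions.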
Table III shows the comparison of RJADE/TA against jDE, jDEsoo, SPSRDEMMS and jDErpo for n = 30. It is interesting to note that the relative performance of RJADE/TA improved with the increase in dimension. It found better results than jDE and jDEsoo on 20 out of 28 functions. jDE was better on only 5 out of 28 problems in 30 dimensions, and jDEsoo obtained better results on 3 out of 28 functions. SPSRDEMMS and jDErpo were inferior on 16 functions and superior on only 8 functions, as can be seen from Table III.
Tables II and III show the comparison of RJADE/TA against each particular algorithm. Here we present the overall percentages of all the algorithms, jDE, jDEsoo, SPSRDEMMS, jDErpo and RJADE/TA, on the 30-dimensional problems. Table IV demonstrates that the performance percentage of RJADE/TA is 50%, while that of jDErpo is 37%; the remaining three algorithms in the comparison each achieved 25% or less. These percentages are even more clearly visible in the bar graph of Fig. 1. Each bar shows the number of test problems optimized by the particular algorithm; the last bar represents RJADE/TA.

E. CEC2010 Test Instances
Here we evaluate RJADE/TA on ten complex optimization problems used in the CEC 2010 special session and competition on large-scale global optimization [40]. Since separability provides a measure of the complexity of a problem, a test suite for high-dimensional problems based on the separability and non-separability of the functions is devised in [40]. Here, three kinds of high-dimensional problems are considered:
• Separable functions;
• Partially-separable functions, in which only a small number of variables are dependent and the rest are independent;
• Partially-separable functions that consist of multiple independent subcomponents, each of which is m-nonseparable.
This test suite provides an enhanced platform for evaluating the performance of algorithms on high-dimensional problems in various scenarios [40]. Below we list only those test functions (F1-F10) which are used in this work. The parameter m controls the number of variables in each group and hence defines the degree of separability.

F. Parameter Settings for CEC2010 instances
For this experiment the population size $N_p$ is set to 50 and the dimension n to 1000. The maximum number of function evaluations is $3 \times 10^6$. The value to reach is set to $10^{-2}$. RJADE/TA and JADE were each run 25 independent times on all test instances, as suggested in the original paper [40]. All experiments were conducted in MATLAB.

G. Comparison of RJADE/TA with JADE on CEC 2010 instances
The best, median, mean and standard deviation of the function error values obtained in 25 runs of the proposed algorithm, RJADE/TA, are presented in Table V. These statistics were requested in [40] as well. The best results are typed in bold.
As can be seen from Table V, RJADE/TA performed better than JADE in obtaining the "best" solution for five out of ten test instances: F3, F4, F5, F7 and F8. For F6 both algorithms reached the same accuracy. Here F3 is separable and all the others are single-group m-nonseparable functions. This is likely due to the additional second archive of RJADE/TA, which gives the population more chances to search the region and discourages early convergence. For the remaining four test instances, F1, F2, F9 and F10, JADE obtained better solutions than RJADE/TA; here F1 and F2 are separable, and F9 and F10 are partially-separable functions that consist of multiple independent subcomponents. The failure on F10 ($\frac{n}{2m}$-group nonseparable) could be due to its complexity, as it is the sum of ten rotated Rastrigin's functions, each applied to a group of m (50 here) decision variables, and one non-rotated Rastrigin's function applied to the remaining 500 decision variables. The failure on F9 may likewise be due to its complex nature.
Considering the mean, median and standard deviation, we see from Table V that RJADE/TA is better suited to the functions F3-F8, most of which are single-group m-nonseparable. Hence, in general, the analysis of the above experimental results leads us to the conclusion that RJADE/TA is considerably better than JADE at optimizing problems from the category of single-group m-nonseparable functions.

VI. CONCLUSIONS
The DE variant JADE with one optional external archive sometimes exhibits poor reliability [30]. Moreover, best solutions sometimes mislead the search towards local optima. In this paper, we have introduced a second archive $A_2$ into JADE to overcome this shortcoming for large-scale global optimization problems. This archive stores the best solution, which is removed from the current population at regular intervals. The removal of the best solution is compensated by a new potential solution in the population. Thus we have proposed an approach, RJADE/TA, that adds $A_2$ to the JADE algorithm and adds new, good and diverse solutions to the population in order to make a systematic and rational search in the region defined for the search process. RJADE/TA takes advantage of both archives, A with inferior solutions and $A_2$ with superior solutions. It is easy to implement and does not introduce any complicated structures.
The performance of RJADE/TA has been demonstrated on 28 complex competition test functions from CEC 2013 and 10 functions from CEC 2010. On the CEC 2013 test suite, RJADE/TA was compared with the jDE, jDEsoo, jDErpo and SPSRDEMMS algorithms in 10 and 30 dimensions, and its superior performance was demonstrated in both. Moreover, we have compared RJADE/TA with classical JADE in 1000 dimensions. RJADE/TA outperformed JADE on half of the test instances and is very competitive in solving single-group m-nonseparable functions. In this paper, our aim was to analyze the behavior of the algorithm when the best solution is removed from the population.
In future work, JADE with the second archive only can be explored. The experiments may be carried out on other higher-dimensional problems, and the approach may be extended to constrained optimization.

Algorithm 1 (final lines):
33: $\mu_F = (1 - c) \cdot \mu_F + c \cdot \mathrm{mean}_L(S_F)$;
34: end while
35: Output: the solution vector with the smallest objective function value from $P \cup A_2$ in the search.

Fig. 1: Comparison of RJADE/TA and other up-to-date algorithms with dimension n = 30

TABLE I: EXPERIMENTAL RESULTS OF RJADE/TA ON 28 TEST FUNCTIONS OVER 51 RUNS WITH DIMENSION n = 10.

TABLE II: COMPARISON OF RJADE/TA WITH OTHER ALGORITHMS ON THE MEAN OF THE FUNCTION ERROR VALUES AT EXECUTION TERMINATION OVER 51 RUNS, ON 28 TEST FUNCTIONS WITH n = 10.

TABLE III: COMPARISON OF RJADE/TA WITH OTHER ALGORITHMS ON THE MEAN OF THE FUNCTION ERROR VALUES AT EXECUTION TERMINATION OVER 51 RUNS, ON 28 TEST FUNCTIONS WITH n = 30.

TABLE IV: PERCENTAGE COMPARISON OF RJADE/TA WITH OTHER ALGORITHMS

TABLE V: EXPERIMENTAL RESULTS OF JADE AND RJADE/TA ON 10 TEST INSTANCES OF 1000 VARIABLES WITH $3 \times 10^6$ FES. BEST, MEDIAN, MEAN AND STD DEV OF THE FUNCTION ERROR VALUES OBTAINED OVER 25 RUNS.