Differential Evolution Enhanced with Eager Random Search for Solving Real-Parameter Optimization Problems

Differential evolution (DE) is a class of evolutionary computing techniques that has proven effective for real-parameter optimization tasks in many practical applications. However, DE does not always converge quickly to the global optimum. It can easily stagnate, resulting in low precision of the acquired results or even failure. This paper proposes a new memetic DE algorithm by incorporating Eager Random Search (ERS) to enhance the performance of a basic DE algorithm. ERS is a local search method that is eager to replace the current solution by a better candidate in the neighborhood. Three concrete local search strategies for ERS are further introduced and discussed, leading to variants of the proposed memetic DE algorithm. In addition, only a small subset of randomly selected variables is used in each step of the local search for randomly deciding the next trial solution. Tests on a set of benchmark problems have demonstrated that hybridizing DE with Eager Random Search can substantially help DE algorithms find better or more precise solutions without requiring extra computing resources.

Keywords: Evolutionary Algorithm, Differential Evolution, Eager Random Search, Memetic Algorithm, Optimization


I. INTRODUCTION
Evolutionary algorithms (EAs) are stochastic and biologically inspired techniques that provide powerful and robust means to solve many real-world optimization problems. They are population-based optimization approaches [1] which perform parallel and beam search, thereby exhibiting strong global search ability in complex and high dimensional spaces. Another merit of EAs is that they do not need derivative information of the objective functions. This makes EAs attractive for a wide range of applications, since the problem space is not required to be continuous and differentiable. Many variants of EAs have been developed to deal with real-parameter continuous optimization problems, including evolution strategies [2], real-coded genetic algorithms [3], [4], differential evolution (DE) [5], [6], and particle swarm optimization [7], [8].
Differential evolution is a class of evolutionary techniques for solving real-parameter optimization tasks with nonlinear and multimodal objective functions. Despite sharing common concepts with other EAs, DE differs from many of them in that mutation in DE is based on differences of pair(s) of individuals randomly selected from the population. Thus, the direction and magnitude of the search are decided by the distribution of solutions instead of a pre-specified probability density function. DE has been used as a very competitive alternative in many practical applications due to its simple and compact structure, its ease of use with few control parameters, and its fast convergence in large problem spaces. However, DE does not always converge quickly to the global optimum. It can easily stagnate, resulting in low precision of the acquired results or even failure [9].
Recent research has shown that hybridization of EAs with other techniques such as metaheuristics or local search techniques can greatly improve the efficiency of the search. EAs that are augmented with local search for self-refinement are called Memetic Algorithms (MAs) [10], [11]. In MAs, a local search mechanism is applied to members of the population in order to exploit the most promising regions gathered from global sampling done in the evolutionary process. Memetic computing has been used with DE to refine individuals in their neighborhood. Noman and Iba [12] proposed a crossover-based adaptive method to generate offspring in the vicinity of parents. Many other works apply local search mechanisms to certain individuals of every generation to obtain possibly even better solutions; see examples in [13], [14], [15], [16], [17].

This paper proposes a new memetic DE algorithm by incorporating Eager Random Search (ERS) to enhance the performance of a conventional DE algorithm. ERS is a local search method that is eager to move to a position that is identified as better than the current one without considering other opportunities in the neighborhood. This is different from common local search methods such as gradient descent [18] or hill climbing [19], which seek locally optimal actions during the search. Forsaking optimality of moves in ERS increases the randomness and diversity of the search, which helps to avoid premature convergence. Three concrete local search strategies within ERS are introduced and discussed, leading to variants of the proposed memetic DE algorithm. In addition, only a small subset of randomly selected variables is used in every step of the local search for randomly deciding the next trial point. Tests on a set of benchmark problems have demonstrated that hybridizing DE with Eager Random Search improves performance compared to pure DE algorithms without incurring extra computing expenses.
The rest of the paper is organized as follows. Section 2 briefly presents the related work. Section 3 introduces the basic DE algorithm. Then, the proposed memetic DE algorithm in combination with Eager Random Search is presented in detail in Section 4. Section 5 gives the results of tests for evaluation. Finally, concluding remarks are given in Section 6.

II. RELATED WORK
Since the first proposal of DE in 1997 [20], much work has been done to improve the search ability of this algorithm, resulting in many variants of DE. A brief overview of some of them is given in this section.
Ali, Pant and Nagar [13] proposed two different local search algorithms, namely Trigonometric Local Search and Interpolated Local Search, which were applied to refine the best solution and two random solutions in every generation respectively.
Local search differential evolution was developed in [14], where a new local search operator was applied to every individual in the population with a certain probability. The search strategy attempted to find a random better solution between the trial vector and the best solution in the generation.
Dai, Zhou, Zhang and Jiang [15] combined Orthogonal Local Search with DE in the so-called OLSDE (Orthogonal Local Search Differential Evolution) algorithm. Therein two individuals were randomly selected from the population in each generation, and they were used to generate a group of trial solutions with the orthogonal method. Then the best solution from the group of trial solutions replaced the worst individual in the population.
Jia, Zheng and Khan [9] proposed a memetic DE algorithm in combination with chaotic local search (CLS). The adaptive shrinking strategy embedded within CLS enabled the DE optimizer to explore a large space in the early search phase and to exploit small regions in the later phase. Moreover, the chaotic iteration produced a higher probability of moving into a boundary field, which appeared helpful for avoiding premature convergence to some extent. A similar work utilizing chaotic-principle-based local search in DE was presented in [21].
Poikolainen and Neri [22] proposed a DE algorithm employing concurrent fitness based local search (DEcfbLS). The local search was applied to multiple promising solutions in the population, and the selection of individuals for local improvement was based on a fitness-based adaptation rule. Further, the local search operator was realized by making trial moves successively on single dimensions. However, there was not much variation in the step sizes of the moves for different variables within an iteration of the search.

III. BASIC DE
DE is a stochastic, population-based algorithm with $N_p$ individuals in the population. Every individual in the population stands for a possible solution to the problem. An individual is denoted by $X_{i,g}$, where $i = 1, 2, \ldots, N_p$ and $g$ is the index of the generation. DE performs three consecutive steps in every iteration: mutation, recombination and selection. These steps are explained below.

MUTATION. $N_p$ mutated individuals are generated using some individuals of the population. The vector for the mutated solution is called the mutant vector and is denoted by $V_{i,g}$. There are several ways to mutate the current population, but only three are explained in this paper. They are named with the notation DE/x/y/z, where x stands for the vector to be mutated, y is the number of difference vectors used in the mutation and z stands for the crossover used in the algorithm. We omit z from the notation because only the binomial crossover method is used here. The three mutation strategies (random, current to best and current to rand) are explained below. Other mutation strategies and their performance are given in [23].
-Random Mutation Strategy: The random mutation strategy builds the mutant vector from three individuals randomly selected from the population. When only one difference vector is employed in the mutation, the approach is denoted by DE/rand/1. A new, mutated vector is created according to Eq. 1:
$$V_{i,g} = X_{r_1,g} + F \cdot (X_{r_2,g} - X_{r_3,g}) \qquad (1)$$
where $V_{i,g}$ is the mutant vector, $i$ is the index of the vector, $g$ is the generation, $r_1, r_2, r_3 \in \{1, 2, \ldots, N_p\}$ are random integers and $F$ is the scaling factor in the interval [0, 2].
Fig. 1 shows how this mutation strategy works. All the variables in the figure appear in Eq. 1 with the same meaning, and d is the difference vector between $X_{r_2,g}$ and $X_{r_3,g}$.

-Current to Best Mutation Strategy: The current to best mutation strategy is referred to as DE/current-to-best/1. It moves the current individual towards the best individual in the population before disturbing it with a scaled difference of two randomly selected vectors. Hence the mutant vector is created by
$$V_{i,g} = X_{i,g} + F_1 \cdot (X_{best,g} - X_{i,g}) + F_2 \cdot (X_{r_1,g} - X_{r_2,g}) \qquad (2)$$
where $V_{i,g}$ is the mutant vector, $X_{i,g}$ and $X_{best,g}$ are the current individual and the best individual in the population respectively, $F_1$ and $F_2$ are the scaling factors in the interval [0, 2] and $r_1, r_2 \in \{1, 2, \ldots, N_p\}$ are randomly created integers. Fig. 2 shows how the DE/current-to-best/1 strategy works to produce a mutant vector, where d1 denotes the difference vector between the current individual $X_{i,g}$ and $X_{best,g}$, and d2 is the difference vector between $X_{r_1,g}$ and $X_{r_2,g}$.

-Current to Rand Mutation Strategy: The current to rand mutation strategy is referred to as DE/current-to-rand/1. It moves the current individual towards a random vector before disturbing it with a scaled difference of two randomly selected individuals. Thus the mutant vector is created according to Eq. 3 as follows:
$$V_{i,g} = X_{i,g} + F_1 \cdot (X_{r_1,g} - X_{i,g}) + F_2 \cdot (X_{r_2,g} - X_{r_3,g}) \qquad (3)$$
where $X_{i,g}$ is the current individual, $V_{i,g}$ is the mutant vector, $g$ is the generation, $i$ is the index of the vector, $F_1$ and $F_2$ are the scaling factors in the interval [0, 2] and $r_1, r_2, r_3 \in \{1, 2, \ldots, N_p\}$ are randomly created integers. Fig. 3 explains how the DE/current-to-rand/1 strategy works to produce a mutant vector, where d1 is the difference vector between the current individual $X_{i,g}$ and $X_{r_1,g}$, and d2 is the difference vector between $X_{r_3,g}$ and $X_{r_2,g}$.

CROSSOVER. In step two we recombine the set of mutated solutions created in step 1 (mutation) with the original population members to produce trial solutions. A new trial vector is denoted by $T_{i,g}$, where $i$ is the index and $g$ is the generation. It is created according to Eq. 4:
$$T_{i,g}[j] = \begin{cases} V_{i,g}[j] & \text{if } rand(0, 1) < CR \text{ or } j = j_{rand} \\ X_{i,g}[j] & \text{otherwise} \end{cases} \qquad (4)$$
where $j$ is the index of every parameter in a vector, $rand(0, 1)$ is a uniform random number between 0 and 1, $CR$ is the probability of the recombination and $j_{rand}$ is a randomly selected integer in $[1, N_p]$ to ensure that at least one parameter from the mutant vector is selected.
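The three mutation strategies and the binomial crossover can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the random indices are drawn without repetition (the text only says they are random integers), and j_rand is drawn from the parameter indices of a vector so that it can always match some j.

```python
import random

def de_rand_1(pop, F, rng):
    """Eq. 1: V = X_r1 + F * (X_r2 - X_r3); r1, r2, r3 distinct (assumption)."""
    r1, r2, r3 = rng.sample(range(len(pop)), 3)
    return [a + F * (b - c) for a, b, c in zip(pop[r1], pop[r2], pop[r3])]

def de_current_to_best_1(pop, i, best, F1, F2, rng):
    """Eq. 2: V = X_i + F1 * (X_best - X_i) + F2 * (X_r1 - X_r2)."""
    r1, r2 = rng.sample(range(len(pop)), 2)
    return [x + F1 * (xb - x) + F2 * (a - b)
            for x, xb, a, b in zip(pop[i], pop[best], pop[r1], pop[r2])]

def de_current_to_rand_1(pop, i, F1, F2, rng):
    """Eq. 3: V = X_i + F1 * (X_r1 - X_i) + F2 * (X_r2 - X_r3)."""
    r1, r2, r3 = rng.sample(range(len(pop)), 3)
    return [x + F1 * (a - x) + F2 * (b - c)
            for x, a, b, c in zip(pop[i], pop[r1], pop[r2], pop[r3])]

def binomial_crossover(parent, mutant, CR, rng):
    """Eq. 4: take the mutant's parameter with probability CR, and always at j_rand."""
    j_rand = rng.randrange(len(parent))  # index over parameters (assumption)
    return [m if (rng.random() < CR or j == j_rand) else p
            for j, (p, m) in enumerate(zip(parent, mutant))]
```

With CR = 0 the trial vector differs from its parent in exactly one parameter (the one at j_rand), which is the guarantee the crossover rule is designed to give.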
SELECTION. In this last step we compare a trial vector with its parent in the population with the same index $i$ and choose the stronger one to enter the next generation. Therefore, if the problem to solve is a minimization problem, the next generation is created according to Eq. 5:
$$X_{i,g+1} = \begin{cases} T_{i,g} & \text{if } f(T_{i,g}) < f(X_{i,g}) \\ X_{i,g} & \text{otherwise} \end{cases} \qquad (5)$$
where $X_{i,g}$ is an individual in the population, $X_{i,g+1}$ is the individual in the next generation, $T_{i,g}$ is the trial vector, $f(T_{i,g})$ is the fitness value of the trial solution and $f(X_{i,g})$ is the fitness value of the individual in the population.
The pseudocode for basic DE is given in Alg. 1. First we create the initial population with randomly generated individuals. Then we evaluate every individual in the population with a fitness function. Afterward we perform the three main steps: mutation, recombination and selection. First we mutate the population according to Eq. 1, Eq. 2 or Eq. 3, then we recombine mutant vectors and their parents according to Eq. 4 to get trial vectors, which are also called offspring. Finally we compare the offspring with their parents, and the better individuals enter the updated population. Steps 3 to 7 are repeated until the termination condition is satisfied.

Algorithm 1 Differential Evolution
1: Initialize the population with randomly created individuals
2: Calculate the fitness values of all individuals in the population
3: while the termination condition is not satisfied do
4:    Create mutant vectors using a mutation strategy in Eq. 1, Eq. 2 or Eq. 3
5:    Create trial vectors by recombining mutant vectors with parent vectors according to Eq. 4
6:    Evaluate trial vectors with their fitness function
7:    Select winning vectors according to Eq. 5 as individuals in the next generation
8: end while
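The steps of Algorithm 1 can be put together as the following minimal Python sketch. It assumes DE/rand/1 mutation, a fixed generation budget as the termination condition, and a sphere function as a stand-in objective; the names `basic_de` and `sphere` and the default values of `Np` and `max_gens` are illustrative choices, not taken from the paper.

```python
import random

def sphere(x):
    """Hypothetical test objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)

def basic_de(f, bounds, Np=20, F=0.9, CR=0.85, max_gens=200, seed=0):
    """Basic DE for minimization; returns the best vector and its fitness."""
    rng = random.Random(seed)
    D = len(bounds)
    # Steps 1-2: random initial population and its fitness values.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(Np)]
    fit = [f(x) for x in pop]
    # Step 3: loop until the budget is spent (assumed termination condition).
    for _ in range(max_gens):
        new_pop, new_fit = [], []
        for i in range(Np):
            # Step 4: DE/rand/1 mutation (Eq. 1) with distinct indices.
            r1, r2, r3 = rng.sample(range(Np), 3)
            mutant = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(D)]
            # Step 5: binomial crossover (Eq. 4).
            j_rand = rng.randrange(D)
            trial = [mutant[j] if (rng.random() < CR or j == j_rand) else pop[i][j]
                     for j in range(D)]
            # Steps 6-7: evaluate the trial and keep the better of trial and parent (Eq. 5).
            ft = f(trial)
            if ft < fit[i]:
                new_pop.append(trial)
                new_fit.append(ft)
            else:
                new_pop.append(pop[i])
                new_fit.append(fit[i])
        pop, fit = new_pop, new_fit
    best = min(range(Np), key=fit.__getitem__)
    return pop[best], fit[best]
```

Because selection never replaces a parent with a worse trial, the best fitness in the population cannot get worse from one generation to the next.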

IV. DE INTEGRATED WITH ERS
This section is devoted to the proposal of the memetic DE algorithm with integrated ERS for local search. We first introduce ERS as a general local search method together with its three concrete search strategies, and then we outline how ERS can be incorporated into DE to enable self-refinement of individuals inside a DE process.

A. Eager Random Local Search (ERS)
The main idea of ERS is to immediately move to a randomly created new position in the neighborhood, without considering other opportunities, as long as this new position receives a better fitness score than the current position. This is different from some conventional local search methods such as hill climbing, in which the next move is always to the best position in the surroundings. Forsaking optimality of moves in ERS is beneficial for achieving more randomness and diversity in the search and thus for avoiding local optima. Further, in exploiting the neighborhood, only a small subset of randomly selected variables undergoes changes to randomly create a trial solution. If this trial solution is better, it simply replaces the current one. Otherwise a new trial solution is generated with other randomly selected variables. This procedure is terminated when a given number of trial solutions have been created without finding improved ones. The formal procedure of ERS is given in Algorithm 2, where α denotes the portion of variables that are subject to local changes and M is the maximum number of times a trial solution can be created in order to find a better position than the current one.
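The ERS procedure described above can be sketched as follows. This is a reading of the prose under stated assumptions, not the authors' Algorithm 2: the counter is reset to 1 after every improvement (so the search stops after M consecutive failures), at least one variable is always changed, and the function name, the `perturb` callback and the default values of `alpha` and `M` are hypothetical.

```python
import random

def eager_random_search(f, x, perturb, alpha=0.2, M=10, rng=None):
    """Eager random local search for minimization.

    perturb(value, k, rng) returns a new value for variable k; plugging in a
    different perturb function gives a different ERS strategy.
    """
    rng = rng or random.Random()
    x = list(x)
    fx = f(x)
    # Number of variables changed per trial: a portion alpha, at least one (assumption).
    n_change = max(1, round(alpha * len(x)))
    i = 1
    while i <= M:
        trial = list(x)
        for k in rng.sample(range(len(x)), n_change):
            trial[k] = perturb(x[k], k, rng)
        ft = f(trial)
        if ft < fx:
            # Eager move: accept the first better position found.
            x, fx = trial, ft
            i = 1  # restart the failure counter (assumption)
        else:
            i += 1
    return x, fx
```

A usage example with a uniform perturbation on [-5, 5] would be `eager_random_search(lambda v: sum(c * c for c in v), [3.0] * 5, lambda val, k, rng: rng.uniform(-5.0, 5.0))`; the returned fitness is never worse than that of the starting point.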
The next, more detailed issue with ERS is how to change a selected variable when making a trial solution in the neighborhood. This corresponds to the way a possible value is assigned to parameter k in line 7 of Algorithm 2. Our idea is to solve this issue using a suitable probability function. We consider three probability distributions (uniform, normal, and Cauchy) as alternatives for generating a new value for a selected parameter/variable. The use of different probability distributions leads to different local search strategies within the ERS family, which are explained in the sequel.

Algorithm 2 Eager Random Local Search
1: Set i = 1;
2: while i <= M do
...
11:    if this new solution is better than the parent then
12:       Replace the parent solution with the new one;
...
16:    end if
17: end while

1) Random Local Search (RLS):
In Random Local Search (RLS), we simply use a uniform probability distribution when new trial solutions are created given a current solution. To be more specific, when dimension k is selected for change, the trial solution $X'$ will get the following value on this dimension regardless of its initial value in the current solution:
$$X'[k] = rand(a_k, b_k)$$
where $rand(a_k, b_k)$ is a uniform random number between $a_k$ and $b_k$, and $a_k$ and $b_k$ are the minimum and maximum values respectively on dimension k.
As equal chance is given to the whole range of a variable when changing a solution, RLS is more likely to create new points with large variation, thus increasing the opportunity to jump out of a local optimum. The disadvantage of RLS lies in its limited fine-tuning ability to reach the exact optimum.

2) Normal Local Search (NLS):
In Normal Local Search (NLS), we create a new trial solution by disturbing the current solution according to a normal probability distribution. This means that, if dimension k is selected for change, the value on this dimension for the trial solution $X'$ will be given by
$$X'[k] = X[k] + N(0, \delta)$$
where $N(0, \delta)$ represents a random number generated according to a normal density function with its mean being zero.
Owing to the use of the normal probability distribution, NLS usually creates new trial solutions that are quite close to the current one. On one hand, this benefits the fine-tuning ability to reach the exact optimum. On the other hand, it makes it more difficult for the local search to escape from a local optimum.
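The difference between the RLS and NLS update rules can be illustrated with two small Python helpers. This is a sketch only: the function names are hypothetical, and δ is interpreted here as the standard deviation of the normal distribution, which the text does not state explicitly.

```python
import random

def rls_value(a_k, b_k, rng):
    """RLS: ignore the current value and draw uniformly from [a_k, b_k]."""
    return rng.uniform(a_k, b_k)

def nls_value(x_k, delta, rng):
    """NLS: add zero-mean normal noise to the current value.

    delta is treated as the standard deviation (assumption).
    """
    return x_k + rng.gauss(0.0, delta)
```

RLS can land anywhere in the variable's range in a single step, whereas NLS stays within a few multiples of δ of the current value with high probability, which matches the trade-off between escaping local optima and fine tuning described above.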

3) Cauchy Local Search (CLS):
In this third local search strategy, we apply the Cauchy density function in creating trial solutions in the neighborhood. It is called Cauchy Local Search (CLS). A nice property of the Cauchy function is that it is symmetric about its center while exhibiting a wider distribution (heavier tails) than the normal probability function, as is shown in Fig. 4. Hence CLS has more chances to make big moves in attempts to find possibly better positions and to leave local minima. Regarding the fine-search ability, CLS is better than RLS, though it is not expected to be as good as NLS.
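More concretely, a Cauchy probability density function is defined by
$$f(x) = \frac{1}{\pi} \cdot \frac{t}{t^2 + x^2}$$
where $t > 0$ is a scale parameter. Its corresponding cumulative probability function is given by
$$F(x) = \frac{1}{2} + \frac{1}{\pi} \arctan\left(\frac{x}{t}\right)$$
It follows that, on a selected dimension k, the value of the trial solution $X'$ will be generated as follows:
$$X'[k] = X[k] + t \cdot \tan\left(\pi \left(rand(0, 1) - \frac{1}{2}\right)\right)$$
where $rand(0, 1)$ is a uniform random number between 0 and 1.

...
4:    Create mutant vectors using a mutation strategy in Eq. 1, Eq. 2 or Eq. 3
5:    Create trial vectors by recombining mutant vectors with parent vectors according to Eq. 4
6:    Evaluate trial vectors with their fitness function
7:    Select winning vectors according to Eq. 5 as individuals in the next generation
8:    Identify the best individual $X_{best}$ in the population
9:    Perform local search from $X_{best}$ using an ERS strategy
10:   if the result from local search $X_r$ is better than $X_{best}$ then
11:      replace $X_{best}$ by $X_r$ in the population
12:   end if
13: end while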
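The Cauchy sampling step and the local refinement of the best individual can be sketched together as below. The sketch assumes inverse-transform sampling of a Cauchy step with scale t, a failure counter that restarts after each improvement, and illustrative default values; `cls_value` and `local_search_on_best` are hypothetical names, not part of the paper.

```python
import math
import random

def cls_value(x_k, t, rng):
    """CLS: current value plus a Cauchy step, t * tan(pi * (u - 1/2)), u uniform in [0, 1)."""
    u = rng.random()
    return x_k + t * math.tan(math.pi * (u - 0.5))

def local_search_on_best(f, pop, fit, t=0.1, alpha=0.2, M=10, rng=None):
    """Refine the best individual with eager random search using CLS (minimization).

    pop and fit are updated in place if the local search finds a better point.
    Returns the index of the best individual.
    """
    rng = rng or random.Random()
    best = min(range(len(pop)), key=fit.__getitem__)
    x, fx = list(pop[best]), fit[best]
    n_change = max(1, round(alpha * len(x)))  # at least one variable (assumption)
    i = 1
    while i <= M:
        trial = list(x)
        for k in rng.sample(range(len(x)), n_change):
            trial[k] = cls_value(x[k], t, rng)
        ft = f(trial)
        if ft < fx:
            x, fx, i = trial, ft, 1  # eager move, restart counter (assumption)
        else:
            i += 1
    # Replace the best individual only if the local search improved on it.
    if fx < fit[best]:
        pop[best], fit[best] = x, fx
    return best
```

Calling `local_search_on_best` once per generation, after selection, corresponds to the local search step on the best individual in the listing above; all other individuals are left untouched.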

V. EXPERIMENTS AND RESULTS
To examine the merit of our proposed memetic DE algorithm compared to basic DE, we tested the algorithms on thirteen benchmark functions [24] listed in Table 1. Functions 1 to 7 are unimodal and functions 8 to 13 are multimodal functions that contain many local optima. Table 1 gives the definition of every function. The most difficult functions are 8, 9 and 10, which are shown in Figs. 5, 6 and 7 respectively.

The following specification of the parameters was used in the experiments: $N_p = 60$, $CR = 0.85$ and $F = F_1 = F_2 = 0.9$. All the algorithms were applied to the benchmark problems with the aim of finding the best solution for each of them. Every algorithm was executed 30 times on every function to obtain a fair comparison. The execution of a DE program finishes when the error of the best result found is below 10e-8 with respect to the true minimum or when the number of evaluations has exceeded a preset maximum.

The results of the experiments are presented as follows: first we compare the performance (the quality of acquired solutions) of the various DE approaches with the random mutation strategy, second we compare the performance of the same approaches using the current to rand mutation strategy, and third we compare the performance of the same approaches using the current to best mutation strategy.

B. Performance of the Memetic DE with random mutation strategy
First, the random mutation strategy (DE/rand/1) was used in all DE approaches to study the effect of the ERS local search strategies in the memetic DE algorithm. The results are given in Table 2, where the values in boldface represent the lowest average error found by the approaches.
Table 3 ranks all the approaches on every function. The last row gives the average of the rankings.
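A per-function ranking with an average row, as used in the ranking tables, can be computed as in the sketch below. The helper names are hypothetical, and the tie rule (tied approaches share the average of their positions) is an assumption, since the text does not say how ties are ranked.

```python
def rank_row(errors):
    """Rank approaches on one function: 1 = lowest error; ties share the average rank (assumption)."""
    order = sorted(range(len(errors)), key=errors.__getitem__)
    ranks = [0.0] * len(errors)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next approach has the same error.
        while end + 1 < len(order) and errors[order[end + 1]] == errors[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = shared
        pos = end + 1
    return ranks

def average_ranks(table):
    """Average the per-function ranks of each approach (one row of errors per function)."""
    rows = [rank_row(r) for r in table]
    return [sum(col) / len(rows) for col in zip(*rows)]
```

For instance, with made-up errors `[[0.3, 0.1, 0.2], [0.0, 0.0, 1.0]]` for three approaches on two functions, `average_ranks` returns `[2.25, 1.25, 2.5]`, so the second approach would have the best average ranking.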
We can see in Table 2 and Table 3 that DECLS is the best on all the unimodal functions except Function 4, where it is the second best. On the multimodal functions, DERLS is the best on Functions 8, 10 and 11. DECLS found the exact optimum every time on Functions 12 and 13. The basic DE performed the worst on the multimodal functions. According to the above analysis, DECLS considerably improves the performance of basic DE with the random mutation strategy, and DERLS performs very well on multimodal functions, particularly on Function 8, which is the most difficult function. Considering all the functions and the average ranking in Table 3, the best algorithm is DECLS and the weakest one is the basic DE.

The next mutation strategy used in our experiments was the current to rand mutation strategy (DE/current-to-rand/1), and the results are given in Table 4. The first column of this table shows the functions that we used for testing, and the results for every algorithm are given in Columns 2-5.
Table 5 shows the ranking of all the approaches for every test function with current to rand mutation strategy.
We can see in Table 4 and Table 5 that on the unimodal functions the best algorithm is DECLS, except on Function 3, where DENLS is the best. On the multimodal functions, basic DE is the worst, because it has the worst results on Functions 8 and 10, two of the most difficult functions, and it did not find a good result on Function 9. The best algorithms on the multimodal functions are DECLS and DERLS. According to this analysis, DECLS is the most desirable approach, as it appeared to be competent on all functions; DECLS also gets the best result in the average ranking.

The last experiments concerned the current to best mutation strategy (DE/current-to-best/1). This mutation strategy was used in all DE approaches to study the effect of our proposed ERS strategies in the memetic DE algorithm. The results are given in Table 6, where the values in boldface represent the lowest average error found by the approaches.
Table 7 ranks all the approaches on every function. The last row gives the average of the rankings.
We can see in Table 6 and Table 7 that DECLS got the best results on most unimodal functions. DERLS is the best on Functions 8 and 10, always finding the true optimum on Function 10. DECLS is the best algorithm on Function 9, and only this algorithm always found the true optimum on Functions 12 and 13. According to the above analysis, DECLS is the best algorithm, because it is competitive on almost all unimodal and multimodal functions. DECLS also gets the best ranking among all approaches. Besides, DERLS is shown to be competitive on multimodal functions.

VI. CONCLUSIONS
In this paper we propose a memetic DE algorithm that incorporates Eager Random Search (ERS) as a local search method to enhance the search ability of a pure DE algorithm. Three concrete local search strategies (RLS, NLS, and CLS) are introduced and explained as instances of the ERS method. The use of different local search strategies from the ERS family leads to variants of the proposed memetic DE algorithm, which are abbreviated as DERLS, DENLS and DECLS. The results of the experiments have demonstrated that the overall ranking of DECLS is superior to that of basic DE and the other memetic DE variants, considering all the test functions and the various mutation strategies used. In addition, we found that DERLS is much better than its counterparts on very difficult multimodal functions.
In future work, we intend to improve our proposed algorithms with adaptive parameters in mutation, crossover and local search, and to attempt to hybridize both alternatives to take advantage of the best features of each of them. Moreover, we will also apply and test our new computing algorithms in real industrial scenarios.

Algorithm 1 Differential Evolution
1: Initialize the population with randomly created individuals
2: Calculate the fitness values of all individuals in the population
3: while the termination condition is not satisfied do

1) Random Local Search (RLS):
In Random Local Search (RLS), we simply use a uniform probability distribution when new trial solutions are created given a current solution. To be more specific, when dimension k is selected for change, the

Algorithm 2 Eager Random Local Search
1: Set i = 1;
2: while i <= M do

6: Evaluate trial vectors with their fitness function
7: Select winning vectors according to Eq. 5 as individuals in the next generation
8: Identify the best individual $X_{best}$ in the population
9: Perform local search from $X_{best}$ using an ERS strategy
10:

the definition of every function. The most difficult functions are 8, 9 and 10, which are shown in Figs. 5, 6 and 7 respectively.

TABLE I: The thirteen functions used in the experiments

TABLE II: Average error of the found solutions on the test problems with random mutation strategy

TABLE IV: Average error of the found solutions on the test problems with current to rand mutation strategy

TABLE V: Ranking of all approaches with current to rand mutation strategy

TABLE VI: Average error of the found solutions on the test problems with current to best mutation strategy

TABLE VII: Ranking of all DE approaches with random mutation strategy