Improved PSO Performance Using LSTM-Based Inertia Weight Estimation

Particle Swarm Optimization (PSO) was first introduced in 1995. It is a widely applied population-based meta-heuristic optimization algorithm, used across the sciences, engineering, technology, medicine, and the humanities. PSO performance is improved by tuning the inertia weight, the topology, and velocity clamping. Researchers have proposed different Inertia Weight based PSOs (IWPSO), each aiming to excel over the existing PSOs. In this paper, a PSO whose inertia weight is predicted by a Long Short Term Memory (LSTM) network (LSTMIWPSO) is proposed, and its performance is compared with constant, random, and linearly decreasing inertia weight PSO. Tests are conducted on swarm sizes 50, 75, and 100 with dimensions 10, 15, and 25. The experimental results show that the LSTM-based IWPSO supersedes CIWPSO, RIWPSO, and LDIWPSO.

Keywords—Particle swarm optimization; inertia weight; long short term memory; benchmark functions; convergence

Since 1995, new PSO variants have been created each year, based on initialization parameters, the constriction factor, mutation operators, the inertia weight, topologies, parallel processing, fuzzy logic, neural networks, ensembles, etc. The new variants mostly supersede the established PSO variants. A comprehensive review of PSO variants is given in [10] [11].
Many researchers have focused their attention on computing the inertia weight for faster convergence of the swarm. Different Inertia Weight Particle Swarm Optimizations (IWPSO) are discussed in [12]. It is observed that each inertia weight computing strategy supersedes the others in some settings.
In this work, a new inertia weight computing strategy is proposed. It uses a trained Long Short Term Memory (LSTM) network to predict the inertia weight in every iteration until the stopping criterion is met. The predicted inertia weight is used in the computation of the fitness function. Its performance is compared with Constant Inertia Weight PSO (CIWPSO) [13], Random Inertia Weight PSO (RIWPSO) [14], and Linearly Decreasing Inertia Weight PSO (LDIWPSO) [15] using benchmark functions [12].
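To make the three baseline strategies concrete, they can be sketched in Python as below. The specific constants (w = 0.7 for CIWPSO, the 0.9 to 0.4 range for LDIWPSO) are typical values from the literature, chosen here for illustration rather than fixed by this paper; RIWPSO follows the common w = 0.5 + rand()/2 form.

```python
import random

def constant_iw(w=0.7):
    """CIWPSO: the inertia weight stays fixed for all iterations."""
    return w

def random_iw():
    """RIWPSO: w = 0.5 + rand()/2, drawn uniformly from [0.5, 1.0]."""
    return 0.5 + random.random() / 2.0

def linear_decreasing_iw(t, t_max, w_start=0.9, w_end=0.4):
    """LDIWPSO: w decreases linearly from w_start to w_end over t_max iterations."""
    return w_start - (w_start - w_end) * t / t_max
```

The proposed LSTMIWPSO replaces these hand-crafted rules with a trained LSTM that outputs the inertia weight at each iteration.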
The remainder of the paper is organized as follows: Section II summarizes Particle Swarm Optimization (PSO) and inertia weight based PSO; Section III briefs the Recurrent Neural Network, the LSTM, and LSTMIWPSO; Section IV discusses the experimental results; and Section V briefs the conclusion and future work.

II. PARTICLE SWARM OPTIMIZATION
The formulation of PSO [16] [17] [18] is based on the objective function given in equation (1). The objective function measures the closeness of the corresponding solution to the optimum.

f(x): ℝ^d → ℝ (1)

where d is the number of dimensions of the search space S, and S is a subset of ℝ^d, as shown in equation (2) and defined by equation (3). The global optimization problem is stated in equations (4) and (5).
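As a concrete instance of such an objective f: ℝ^d → ℝ, the Sphere function is a standard benchmark of the kind this paper evaluates on (the exact benchmark set f1–f5 is given in Table I; Sphere is used here only as an illustrative example). Its global minimum is 0 at the origin.

```python
def sphere(x):
    """Sphere benchmark: f(x) = sum of x_i^2; global minimum 0 at the origin."""
    return sum(xi * xi for xi in x)
```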
where c1 and c2 are two positive acceleration coefficients, and random() and Random() are two random functions in the range [0,1]. The velocity is then clamped to a maximum velocity pv_max, a parameter given by the user. The first part of equation (6) represents the previous velocity, the second part is the cognition part of the particle, and the third part represents the cooperation among the particles [1] [17] [19].
Because particles tend to explore the search space widely, the velocities of the particles are limited to the constant pv_max [16]. The particle velocity is adjusted accordingly.
The value of pv_max is typically chosen as a fraction of the search space range, pv_max = δ(x_max − x_min) [20] [21], where δ is the velocity clamping factor.
As the search space S is bounded by the interval [x_min, x_max], the velocity clamping [22] restricts each component of the particle velocity to the interval [−pv_max, pv_max].
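The clamping rule above can be sketched as follows; the function names and the choice of δ = 0.5 are illustrative, not fixed by the paper.

```python
def velocity_clamp_limit(x_min, x_max, delta=0.5):
    """pv_max chosen as a fraction delta of the search-space range."""
    return delta * (x_max - x_min)

def clamp_velocity(pv, pv_max):
    """Restrict each velocity component to the interval [-pv_max, pv_max]."""
    return [max(-pv_max, min(pv_max, v)) for v in pv]
```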

A. Inertia Weight based PSO
Shi and Eberhart [13] developed the inertia weight based PSO (IWPSO). In IWPSO, the inertia weight controls the balance between exploration and exploitation of the swarm particles. Extending equation (6) with the inertia weight gives equation (10).
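A minimal sketch of one IWPSO step, following the structure described for equations (10) and (7): the velocity combines the inertia term w·pv with the cognition and cooperation terms, and the position is advanced by the new velocity. The default c1 = c2 = 2.0 is a conventional choice, not a value prescribed by this paper.

```python
import random

def iwpso_update(px, pv, pbest, gbest, w, c1=2.0, c2=2.0):
    """One IWPSO step per the form of equations (10) and (7):
    pv <- w*pv + c1*r1*(pbest - px) + c2*r2*(gbest - px);  px <- px + pv."""
    new_px, new_pv = [], []
    for xi, vi, pb, gb in zip(px, pv, pbest, gbest):
        r1, r2 = random.random(), random.random()
        vi = w * vi + c1 * r1 * (pb - xi) + c2 * r2 * (gb - xi)
        new_pv.append(vi)
        new_px.append(xi + vi)
    return new_px, new_pv
```

In a full implementation the new velocity would also be clamped to [−pv_max, pv_max] before the position update.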
III. RECURRENT NEURAL NETWORK

Recurrent Neural Networks (RNN) are time-dynamic discrete systems dealing with input vector sequences [23] [24]. RNNs traditionally propagate information forward in time, forming predictions using only past and present inputs. The basic recurrent neural network is shown in Fig. 1. In a traditional RNN, for each time step t, the output y^<t> is computed using equation (11), and the activation a^<t> is computed using equation (12).
where t represents time, y^<t> is the predicted value, W_ya, W_aa, W_ax, b_y, and b_a are the coefficients, and h and g are the activation functions. Commonly used activation functions are given in equations (13), (14), and (15).
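The two recurrences can be sketched for the scalar case; tanh for h and the identity for g are example choices of the activation functions, not the only ones the paper allows.

```python
import math

def rnn_step(x_t, a_prev, W_aa, W_ax, W_ya, b_a, b_y):
    """One RNN time step (scalar case for clarity):
    a^<t> = tanh(W_aa * a^<t-1> + W_ax * x^<t> + b_a)   (eq. 12, h = tanh)
    y^<t> = W_ya * a^<t> + b_y                          (eq. 11, g = identity)"""
    a_t = math.tanh(W_aa * a_prev + W_ax * x_t + b_a)
    y_t = W_ya * a_t + b_y
    return a_t, y_t
```

Running this step over a sequence, feeding each a^<t> back in as a_prev, gives the forward pass whose repeated multiplications cause the gradient issues discussed next.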
RNNs suffer from the vanishing [25] and exploding [26] gradient phenomena. This is due to the multiplicative gradient, and it results in an inability to capture dependencies that decrease or increase exponentially with the number of layers.
In an RNN, the loss function ℒ over all time steps is defined as the sum of the losses obtained at every time step.

A. Long Short Term Memory
Long Short Term Memory (LSTM) is a special kind of RNN architecture capable of learning long-term dependencies. Hochreiter and Schmidhuber [27] introduced the efficient and effective, gradient-based LSTM. Fig. 2 depicts the dependencies of the memory cell of an LSTM. To deal with the vanishing gradient problem, the LSTM can delete or add information to the cell state, carefully controlled by mechanisms called gates [27]. The LSTM uses three gates: the update gate (Γ_u), the forget gate (Γ_f), and the output gate (Γ_o). The computation of c̃^<t>, c^<t>, a^<t>, Γ_u, Γ_f, and Γ_o is shown in equations (17)-(22).
c^<t> = Γ_u * c̃^<t> + Γ_f * c^<t−1> (21)

Let ŷ^<t> be the predicted output and y^<t> be the actual output at each time step. The error at each time step follows from these, and the total error gradient is the summation of the gradients at each step, as given in equations (27) and (28). Note that the gradient equation for LSTM back-propagation involves a chain of c^<t>, while the gradient equation for a basic recurrent neural network involves a chain of a^<t>.
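A scalar sketch of the LSTM cell's forward pass, following the gate structure of equations (17)-(22); the dictionary parameterization `p` and the two-weight-per-gate layout are illustrative conventions, not the paper's exact formulation.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, used for the three gates."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, a_prev, c_prev, p):
    """Scalar LSTM cell: gates G_u, G_f, G_o gate the candidate c~^<t>,
    the previous cell state, and the output, giving
    c^<t> = G_u * c~^<t> + G_f * c^<t-1> and a^<t> = G_o * tanh(c^<t>)."""
    G_u = sigmoid(p['Wu'][0] * a_prev + p['Wu'][1] * x_t + p['bu'])  # update gate
    G_f = sigmoid(p['Wf'][0] * a_prev + p['Wf'][1] * x_t + p['bf'])  # forget gate
    G_o = sigmoid(p['Wo'][0] * a_prev + p['Wo'][1] * x_t + p['bo'])  # output gate
    c_tilde = math.tanh(p['Wc'][0] * a_prev + p['Wc'][1] * x_t + p['bc'])
    c_t = G_u * c_tilde + G_f * c_prev   # equation (21)
    a_t = G_o * math.tanh(c_t)
    return a_t, c_t
```

The additive update of c^<t> is what lets gradients flow through the cell state without vanishing, as noted above.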

B. LSTM Inertia Weight based PSO
In LSTMIWPSO, the new inertia weight is computed using an LSTM. Initially, the LSTM is trained with different inertia weights from 0.05 to 1.00. In every iteration, a new inertia weight is predicted using the trained LSTM. The predicted inertia weight is used to move the swarm using equations (10) and (7). The process is terminated when the stopping criterion is reached. The pseudocode for LSTMIWPSO, shown in Fig. 3, is given below:

Step 1: Initialization
  For each particle, P_i, in the population:
    Initialize px_i with uniform distribution
    Initialize pv_i randomly
    Evaluate the objective function of px_i and assign the value to fitness_i
    Initialize pbest_i with a copy of px_i
    Initialize pbest_fitness_i with a copy of fitness_i
  Build and train the LSTM network for inertia weight prediction
  Predict the new inertia weight
  Initialize pgbest with the index of the particle with the least fitness
Step 2: Repeat until the stopping criterion is reached
  For each particle, P_i:
    Update pv_i and px_i according to equations (10) and (7)
    Evaluate fitness_i
    If fitness_i < pbest_fitness_i then
      pbest_i = px_i
      pbest_fitness_i = fitness_i
  Update pgbest with the particle with the current least fitness among the population
  Predict the new inertia weight using the trained LSTM

IV. EXPERIMENTAL RESULTS

The benchmark functions used as fitness functions are listed in Table I. Swarm sizes of 50, 75, and 100 particles with dimensions 10, 15, and 25 are considered for the experiments. A total of 15 simulations are performed to reduce the effect of randomness. Along with LSTMIWPSO, LDIWPSO, RIWPSO, and CIWPSO are implemented. The results are collected in terms of the best error, mean error, variance, standard deviation, mean square error, root mean square error, mean iteration, and mean time taken (in seconds) to compare the performance of LSTMIWPSO with CIWPSO, RIWPSO, and LDIWPSO.
From Table II and Fig. 4, for benchmark functions f1, f3, f4, and f5 as fitness functions and dimension 10, the best error of LSTMIWPSO is close to those of CIWPSO, RIWPSO, and LDIWPSO. The best error is moderately higher for dimensions 15 and 25. For the f2 function, the best error of LSTMIWPSO is the same as those of CIWPSO, RIWPSO, and LDIWPSO.
For swarm sizes 50, 75, and 100 with dimensions 10, 15, and 25, and f1-f5 as fitness functions, the mean error is computed for CIWPSO, RIWPSO, LDIWPSO, and LSTMIWPSO. The processed results are tabulated in Table III and shown graphically in Fig. 5. Except for swarm size 100 with dimension 10, the mean error of LSTMIWPSO is smaller than those of CIWPSO, RIWPSO, and LDIWPSO.
The variance and standard deviation are computed to assess the performance of CIWPSO, RIWPSO, LDIWPSO, and LSTMIWPSO. The computed results are given in Tables IV and V, and shown graphically in Fig. 6 and Fig. 7. From Table IV, Table V, Fig. 6, and Fig. 7, it is evident that the performance of LSTMIWPSO in terms of variance and standard deviation is favorable for swarm sizes 50, 75, and 100, with dimensions 10, 15, and 25, on the benchmark functions f1-f5.
To assess the performance of CIWPSO, RIWPSO, LDIWPSO, and LSTMIWPSO, the MSE and RMSE are computed. Tables VI and VII show the computed results, which are shown graphically in Fig. 8 and Fig. 9. It is evident from Table VI, Table VII, Fig. 8, and Fig. 9 that LSTMIWPSO's output in terms of MSE and RMSE is substantially better for swarm sizes 50, 75, and 100 and for benchmark functions f1-f5 with dimensions 10, 15, and 25, except for swarm size 100 with dimension 10.
From Table VIII and Fig. 10, the mean time for LSTMIWPSO is superior for swarm sizes 75 and 100 with dimension 10. In the other scenarios, it is not advantageous when compared with the other methods for the benchmarks considered.
From Table IX

V. CONCLUSION AND FUTURE WORK
In this paper, a new inertia weight based PSO using an LSTM (LSTMIWPSO) is presented. A set of five common optimization test problems and eight criteria are considered to assess the performance of LSTMIWPSO against CIWPSO, RIWPSO, and LDIWPSO. The overall outcome shows that LSTMIWPSO is competitive with CIWPSO, RIWPSO, and LDIWPSO. In the future, the parameters of the LSTM will be tuned to enhance efficiency. More experiments with larger swarm sizes and dimensions will also be conducted to compare LSTMIWPSO with other existing inertia weight based PSOs. There is also scope for applying LSTMIWPSO to the optimization of different applications without restriction to the domains specified.