FPGA Implementation of Parallel Particle Swarm Optimization Algorithm and Comparison with Genetic Algorithm

This paper presents a digital implementation of the Particle Swarm Optimization (PSO) algorithm on a Field Programmable Gate Array (FPGA). PSO is a recent intelligent heuristic search method inspired by the swarming behavior of biological populations. It is similar to the Genetic Algorithm (GA) in that both combine deterministic and probabilistic rules. Experimental results are used to evaluate the performance of the proposed PSO against a GA and another PSO algorithm. New digital solutions make hardware implementations of PSO algorithms practical. We therefore developed a hardware architecture based on a finite state machine (FSM) and implemented it on an FPGA to solve dispatch computing problems, as an alternative to other circuits based on swarm intelligence. Moreover, the inherent parallelism of these hardware solutions, combined with their large computational capacity, makes the running time negligible regardless of the complexity of the processing.

Keywords: PSO algorithm; GA; FPGA; finite state machine; hardware


I. INTRODUCTION
Over the last decade, several meta-heuristic algorithms have been proposed to solve hard and complex optimization problems. Their effectiveness has been demonstrated on difficult instances of many kinds of optimization problems. In this paper, the proposed architecture is tested on several benchmark functions. We also analyze the operators of GAs to describe how the performance of each algorithm can be enhanced by incorporating features of the other, and we use standard benchmark functions to compare the two algorithms.

PSO uses a technique [1] that explores the search space to find the parameters that minimize or maximize an objective. Its ability to solve complex problems, together with its simplicity, keeps research in this area active compared with many other optimization techniques [2] [3]. This research aims to show that PSO is as effective as the GA at finding the global optimum, but with better computing efficiency (fewer hardware resources and a shorter execution time). The main objective of this paper is to compare the computational efficiency of our optimized PSO with that of a GA and other PSO algorithms on a set of benchmark test problems. The results could prove important for future studies of PSO.

The paper is organized as follows. The first section briefly introduces the general steps of the PSO mechanism, including a brief introduction to the pseudo-random number generator [4]. The next section describes the functional architecture of the GA and PSO algorithms. Section 3 describes the architecture used in the hardware implementation of PSO and the genetic algorithm; its second part presents the experimental results for several benchmark functions applied to the PSO algorithm and compared with the GA and other PSO algorithms. Finally, we conclude our work and outline implications and directions for future studies.

II. PARTICLE SWARM OPTIMIZATION
In Particle Swarm Optimization, each "bird" represents a candidate solution in the search space. The birds are called particles. To explore the search space, each particle is evaluated by the fitness function, and a velocity module steers the flight of the swarm toward the prey. Each particle flies through the solution space by following the best positions found by the particles [5] [6]. Every particle is associated with a point in the search space, and its position depends on its own best solution and on that of its neighbors. In every iteration, some particles move randomly through this environment; they assess themselves and their neighbors [7] and then follow the successful particles for the given problem. PSO gives satisfactory results on many dispatch problems in biology, medicine, finance, 3D graphics, image processing and other fields [8], but choosing its parameters is hard because the best setting depends on the target application. Several parameters of the PSO algorithm therefore have to be set first [9]:

• Number of iterations

At the start, we generate a random population; after each iteration we search for the best solution. The particles then update their positions using two best solutions. The first is the best solution found by the particle itself and is named "lbest". The other is the best solution obtained by any particle of the population and is named "Gbest".

A. The random number generator
Programming PSO algorithms requires a random number generator, and there are several methods for generating random numbers. An algorithm cannot produce truly random numbers, which is why its outputs are called pseudo-random numbers. Pseudo-random generators are nevertheless effective and well suited to implementation. Most of them try to produce uniformly distributed outputs. A common class of generators uses a linear congruence; others are inspired by the Fibonacci sequence and add the two previous values. In 1948, D. H. Lehmer introduced linear congruential generators, which are fast and became extremely popular.
In our algorithm, the pseudo-random generator block [13] is used for the initial positions of the particles and for the velocity vector. We chose the frequently used Lehmer linear congruential generator:

$$F_{n+1} = (A \cdot F_n + B) \bmod C \tag{1}$$

where:
• $F_{n+1}$ is the random number obtained from the function F
• $F_n$ is the previously obtained number
• A and B are the multiplicative and additive values, respectively
• C is the modulus
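The linear congruential recurrence described above can be sketched in a few lines of Python. This is a minimal software illustration, not the hardware block used in the paper; the function names and the small constants in the usage lines (A = 5, B = 3, C = 16, seed 7) are assumptions chosen only to keep the arithmetic easy to follow.

```python
from itertools import islice


def lehmer_lcg(seed, a, b, c):
    """Yield successive values F_{n+1} = (a * F_n + b) mod c, starting from F_0 = seed."""
    f = seed
    while True:
        f = (a * f + b) % c
        yield f


def to_unit_interval(value, c):
    """Scale an integer output in [0, c) to a float in [0, 1)."""
    return value / c


# Illustrative constants only (not taken from the paper).
generator = lehmer_lcg(seed=7, a=5, b=3, c=16)
first_values = list(islice(generator, 4))
print(first_values)
```

With a small modulus the sequence repeats quickly, so a practical generator would use a much larger modulus and carefully chosen multiplier.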

B. Position and velocity equations
The velocity equation changes the position of a given particle; in general, PSO uses the particle positions to indicate the distance to the best particle. Positions and velocities are updated at every iteration using the equations below:

$$v_i(t+1) = w\,v_i(t) + c_1 r_1 \left(lbest_i - x_i(t)\right) + c_2 r_2 \left(Gbest - x_i(t)\right) \tag{2}$$

$$x_i(t+1) = x_i(t) + v_i(t+1) \tag{3}$$

where $x_i(t)$ is the position of particle i at time t, $v_i(t)$ is its velocity at time t, w is a weighting parameter (the inertia weight), $c_1$ and $c_2$ are constant coefficients, $r_1$ and $r_2$ are random numbers drawn at each iteration, "Gbest" is the best solution found so far by the swarm, and "lbest" is the best solution found by particle i. In general, the velocity vector directs the search process and reflects the sociability of the particles.
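As a worked illustration of the velocity and position updates described above, the sketch below applies one update to a single particle. It assumes the standard PSO form, in which the new velocity is w times the old velocity plus c1*r1*(lbest - x) plus c2*r2*(Gbest - x), and the new position is the old position plus the new velocity. The function name and the numeric values are hypothetical and chosen so that the result can be checked by hand.

```python
def update_particle(x, v, lbest, gbest, w, c1, c2, r1, r2):
    """Return (new_position, new_velocity) for one particle, dimension by dimension."""
    new_v = [
        w * vi + c1 * r1 * (lb - xi) + c2 * r2 * (gb - xi)
        for xi, vi, lb, gb in zip(x, v, lbest, gbest)
    ]
    new_x = [xi + nvi for xi, nvi in zip(x, new_v)]
    return new_x, new_v


# One-dimensional example with hand-picked values (illustrative only):
# new_v = 0.5*0.5 + 2*0.5*(0.5 - 1.0) + 2*0.25*(0.0 - 1.0) = -0.75
# new_x = 1.0 + (-0.75) = 0.25
new_x, new_v = update_particle(
    x=[1.0], v=[0.5], lbest=[0.5], gbest=[0.0],
    w=0.5, c1=2.0, c2=2.0, r1=0.5, r2=0.25,
)
print(new_x, new_v)
```

The particle is pulled toward both its own best position and the swarm's best position, which here moves it from 1.0 to 0.25.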
Convergence to the optimum solution can be declared after a fixed number of iterations, depending on the fitness, or when the variation tends to zero (as for the sphere function), or when the fitness tends to the best minimized solution. Several parameters come into play here.

Table I presents a sample from the sphere function. The "lbest" of particle $x_1$ is found at iteration 3 and the "lbest" of particles $x_2$ and $x_3$ at iteration 2, while the "gbest" is $x_3$ at iteration 2.

The global minimum of the sphere function is located at $x_i = 0$. In each iteration we pick the "lbest" and save it to memory so that its value can be compared with the new particle positions in the next iteration.

A. GA architecture
To optimize a problem with a GA, we have to explore the whole search space in order to maximize (or minimize) a chosen function. The genetic algorithm is therefore suitable for quickly exploring an area. The flowchart describing the architecture of the GA is shown in the following figure.

For the hardware implementation of the PSO algorithm, the architecture is decomposed into five operations performed on each particle: update the position, evaluate the fitness, update the particle's best position, update the global best position, and update the velocity.
We can see from the two architectures that the two algorithms share some common points. Both begin with a random population in the search space, and both use a fitness module to evaluate each generation. Both update the generation and search for an optimal value using pseudo-random numbers, and neither guarantees success. However, Particle Swarm Optimization has no crossover or mutation operators; instead, PSO updates its particles using the velocity module.

C. The FSM
In this paper, a dynamic parallel PSO is implemented for large optimization problems and compared with a GA and other PSO algorithms. The FSM is used to exploit all types of parallelism so as to find the optimum solution in a reduced amount of time. The dynamic behavior of the FSM is represented in Figure 3: at any time, the machine is in one of a finite number of possible states. First, a fixed number of states must be defined; each state may have transitions to one or more neighboring states. States that have no outgoing transitions are called final states.
The algorithm updates the best fitness value after all the particles have been evaluated. Updating the positions and velocities after each particle evaluation gives a good convergence rate. In dynamic parallel computing, the main performance factor is the communication latency after each transition between states. The goal of parallel dynamic computing is to produce optimal results while using multiple processors to reduce the running time.

In this architecture, we use paired memory modules to increase the bandwidth and thus improve the capabilities of our algorithm; this is possible only with dual-port block RAM. The data memory can then be accessed in write or read mode at the same frequency. The dual-port RAM has one drawback: reading the memory content is delayed by one clock cycle relative to the last read. The description of the 8 states is given in the state list (S0 to SF).

Fortunately, recent advances in processor technology make it possible to run complex programs at low power cost, beyond what clusters of mid-range computers provide. The dynamic process implemented in the particle swarm optimization can therefore be separated into two states, which update the position and velocity of each particle, with the goal of reducing the processing time.
In this paper, parallel processing is used to build a dynamic PSO algorithm. The aim of applying parallel computing to PSO is to speed up the algorithm, using a uniform distribution method to reach optimum solutions in a reduced execution time.
Figure 3 presents the finite state machine of the global control module; it lays out the PSO code step by step in order to keep the algorithm practical.
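The one-clock read delay of the dual-port block RAM discussed in this section can be modelled in software to show why the control logic has to wait a cycle before using read data. The sketch below is a behavioral model only, not the authors' HDL; the class name, the method signature and the read-before-write ordering are assumptions made for illustration.

```python
class DualPortRam:
    """Behavioral model of a synchronous RAM with one read port and one write port."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.read_reg = 0  # output register of the read port

    def clock(self, read_addr, write_addr=None, write_data=None):
        """Simulate one clock edge.

        Returns the data that was addressed on the PREVIOUS clock edge,
        then latches the data at read_addr for the next edge and performs
        the optional write (read-before-write ordering is assumed).
        """
        out = self.read_reg
        self.read_reg = self.mem[read_addr]
        if write_addr is not None:
            self.mem[write_addr] = write_data
        return out


ram = DualPortRam(depth=4)
ram.clock(read_addr=0, write_addr=1, write_data=42)  # cycle 1: write 42 to address 1
ram.clock(read_addr=1)                               # cycle 2: request address 1
value = ram.clock(read_addr=1)                       # cycle 3: the requested data is visible
print(value)
```

In this model a state that issues a read address can only consume the data in the following state, which is one reason to dedicate separate FSM states to memory access.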

D. Benchmark test functions
Most researchers use a population size between 10 and 50 when comparing the performance of algorithms; here we fixed the population at 20 chromosomes for the GA and 20 particles for the PSO algorithm. To test the PSO and compare its performance with other algorithms, we used the following standard benchmark functions:
• Sphere function
• Rosenbrock function
• Rastrigin function
• Zakharov function
These well-known benchmark functions were selected for comparing the two implementations and include both unimodal and multimodal functions. They are described in the following section.
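For reference, the four benchmark functions listed above can be written compactly in Python. The definitions below are the commonly used textbook forms and are an assumption here, since the exact variants and dimensions used in the experiments are not restated at this point; the Rosenbrock function is given in its chained n-dimensional form, which reduces to the usual two-variable form for n = 2.

```python
import math


def sphere(x):
    """Sum of squares; global minimum 0 at x = 0."""
    return sum(xi * xi for xi in x)


def rosenbrock(x):
    """Chained Rosenbrock function; global minimum 0 at x = (1, ..., 1)."""
    return sum(
        100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
        for i in range(len(x) - 1)
    )


def rastrigin(x):
    """Highly multimodal; global minimum 0 at x = 0."""
    return 10.0 * len(x) + sum(
        xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) for xi in x
    )


def zakharov(x):
    """Global minimum 0 at x = 0."""
    s1 = sum(xi * xi for xi in x)
    s2 = sum(0.5 * (i + 1) * xi for i, xi in enumerate(x))
    return s1 + s2 ** 2 + s2 ** 4


for name, func, optimum in [
    ("sphere", sphere, [0.0, 0.0]),
    ("rosenbrock", rosenbrock, [1.0, 1.0]),
    ("rastrigin", rastrigin, [0.0, 0.0]),
    ("zakharov", zakharov, [0.0, 0.0]),
]:
    print(name, func(optimum))
```

Each function evaluates to zero at its known global minimum, which gives a quick sanity check for any fitness module that implements them.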

IV. VALIDATION EXAMPLES
Most researchers use a population size between 10 and 50 when comparing the performance of GA and PSO. The swarm size used for PSO is the same as the population size used in the GA: 20 particles in the PSO swarm and 20 chromosomes in the GA population. In the GA, all variables of each individual are represented as binary strings of "0" and "1", referred to as chromosomes. Like PSO, the GA begins with a random population; to perform its exploration, the GA uses three operators (crossover, selection and mutation) to propagate its population from one iteration to the next.
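The three GA operators mentioned above can be sketched on binary-string chromosomes as follows. This is only an illustration under stated assumptions: tournament selection, single-point crossover, bit-flip mutation and the linear decoding to the interval [-1, 1] are common choices, not necessarily the ones used in the hardware GA, and all function names are hypothetical.

```python
import random


def decode(chromosome, lo=-1.0, hi=1.0):
    """Map a binary string to a real value in [lo, hi]."""
    return lo + (hi - lo) * int(chromosome, 2) / (2 ** len(chromosome) - 1)


def select(population, fitness, rng, k=2):
    """Tournament selection (assumed): best of k random individuals, minimizing fitness."""
    return min(rng.sample(population, k), key=fitness)


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: swap the tails of the two parents."""
    point = rng.randint(1, len(parent_a) - 1)
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )


rng = random.Random(0)
population = ["0000", "1111", "0111"]
parent = select(population, lambda c: decode(c) ** 2, rng, k=3)
child_a, child_b = crossover("00000000", "11111111", rng)
print(parent, child_a, child_b, mutate(child_a, 0.05, rng))
```

PSO needs none of these operators: its only per-particle update is the velocity and position step, which is one reason its hardware footprint can be smaller.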

A. The sphere function

$$f(x) = \sum_{i=1}^{n} x_i^2$$
The sphere function is useful for evaluating characteristics of our optimization algorithms such as robustness and convergence speed. It is unimodal and continuous, with a single minimum that is also its global minimum. The search space is the interval [-1, 1]. Figure 4 presents the ModelSim simulation results for the sphere function. The detailed results show that our solution converges toward zero from one iteration to the next.

The particles work together in a parallel dynamic state to reach the best solution of any function. They update their positions and velocities even when the algorithm has many particles, without a heavy impact on the global execution time. The number of particles is, however, limited by the embedded resources of the FPGA. The following tables present the number of LUTs (look-up tables), block RAMs and the other hardware resources used for this function. The following figure shows the difference between the two algorithms: the PSO algorithm uses hardware resources more economically than the Genetic Algorithm.

We implemented the sphere function with both algorithms, GA and PSO, on a Xilinx Spartan-3. One iteration of the PSO algorithm takes less processing time than one iteration of the genetic algorithm, as the following table shows.
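The convergence behavior described above can be reproduced with a small sequential software PSO on the sphere function. This is a sketch only, not the parallel hardware design: it uses the 20 particles and the [-1, 1] search interval stated in the text, while the inertia weight, the coefficients, the zero initial velocities, the position clamping, the dimension, the iteration count and the use of Python's built-in generator are all assumptions.

```python
import random


def sphere(x):
    return sum(xi * xi for xi in x)


def pso(fitness, dim=2, n_particles=20, iterations=100,
        lo=-1.0, hi=1.0, w=0.7, c1=1.5, c2=1.5, seed=1):
    """Minimize `fitness`; returns (gbest, gbest_fitness, history of gbest fitness)."""
    rng = random.Random(seed)
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    lbest = [p[:] for p in x]
    lbest_fit = [fitness(p) for p in x]
    g = min(range(n_particles), key=lambda i: lbest_fit[i])
    gbest, gbest_fit = lbest[g][:], lbest_fit[g]
    history = [gbest_fit]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (lbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                # Keep the particle inside the search interval.
                x[i][d] = min(hi, max(lo, x[i][d] + v[i][d]))
            f = fitness(x[i])
            if f < lbest_fit[i]:            # update the particle's own best
                lbest[i], lbest_fit[i] = x[i][:], f
                if f < gbest_fit:           # update the swarm's best
                    gbest, gbest_fit = x[i][:], f
        history.append(gbest_fit)
    return gbest, gbest_fit, history


best_x, best_fit, history = pso(sphere)
print(best_x, best_fit)
```

The recorded best fitness never increases from one iteration to the next, which matches the convergence toward zero reported for the simulation.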

B. The Rastrigin function
This function is described below:

$$f(x) = 10\,n + \sum_{i=1}^{n} \left[ x_i^2 - 10 \cos(2 \pi x_i) \right]$$
The Rastrigin function contains several local minima but has just one global minimum; it is highly multimodal, and its minima are regularly distributed. The synthesis results of the whole system are shown in the following tables. The GA requires many hardware resources, while the PSO algorithm uses fewer slices and flip-flops, as the following figure shows.

C. Rosenbrock function
The Rosenbrock function is a non-convex benchmark function of two variables used to test mathematical optimization algorithms. It was introduced in 1960 by Howard H. Rosenbrock and is also known as the banana function.
Search algorithms easily locate the valley that contains the global minimum; converging to the minimum itself is harder. The function is defined as follows:

$$f(x, y) = (1 - x)^2 + 100\,(y - x^2)^2$$

The global minimum is obtained at the point (x, y) = (1, 1), where the function value is 0. A different coefficient is sometimes used in the second term, but this has little effect on the position of the global minimum.

D. Zakharov function
We used another benchmark, the Zakharov function, defined below; its global minimum occurs at x = 0:

$$f(x) = \sum_{i=1}^{n} x_i^2 + \left( \sum_{i=1}^{n} 0.5\, i\, x_i \right)^2 + \left( \sum_{i=1}^{n} 0.5\, i\, x_i \right)^4$$

The platform is a Spartan-3 FPGA from Xilinx. The Spartan-3 is one of the best low-cost generations of FPGAs, and the board offers a choice of platforms that balance cost against programmable logic, connectivity and hardware applications. A PROM file is created, which can be written to the non-volatile memory of the Spartan-3 board. The Spartan-3 board includes the following elements (Figure 11):

To assess whether this algorithm delivers better solutions in a reduced time, and in particular its robustness and speed, we tested it against other meta-heuristic algorithms, namely a genetic algorithm and another PSO algorithm. For the GA, we used the basic model with the elitism method and a mutation probability of 5%. The simulations were carried out on a Xilinx Spartan-3 at 50 MHz. We also fixed the population at n = 20 for all simulations. The results are favorable and show that the proposed PSO algorithm can be effective for many problems. The experiments were carried out at a minimum level of 5%, which makes it possible to judge whether the PSO results are acceptable and optimized in execution time compared with the best results of the other algorithms.

Since its invention, many researchers have worked on the PSO algorithm [11] and on how to accelerate it, improve its convergence and reduce its use of hardware resources for embedded applications. In this section we present some work on parallelized algorithms proposed by other researchers. There are many interesting improvements of the PSO algorithm for several applications. Reynolds et al. [12] suggested a smart technique for a modified PSO algorithm using neural networks. Their technique is based on a deterministic approach when the particles update their positions, which simplifies the hardware implementation, because the standard PSO algorithm uses random generators only for the update operations. To reduce hardware resources, Upegui and Peña [13] use a discrete recombination variant of PSO (PSODR), which decreases the computing time of the velocity module. These modified PSO algorithms produce results that are competitive with those of the basic PSO algorithm [14]. Further work on PSODR, with simplified models, was analyzed and proposed by Bratton and Blackwell [15], with effective and promising results.
Many researchers have presented modified variants of PSO, either to reduce hardware resources or to eliminate specific problems tied directly to the PSO architecture. That is why we developed a modified architecture using a finite state machine to program a parallel algorithm that can give effective results on several problems [16]. We fixed the data representation at 20 particles to support several kinds of applications.
A performance comparison of PSO algorithms on several processor platforms is presented in the following table. We chose two different platforms, the Xilinx xc3s500 [17] and the Xilinx MicroBlaze soft processor core, for the Sphere test function. The random number generator plays a major role in the implementation of the two algorithms. That is why the number of iterations can differ even when the same random generator equation and the same initial seed are used for the three tests. To evaluate the performance of our proposed PSO algorithm, we compare two implementations of the PSO process: the first is our algorithm and the second uses the Xilinx MicroBlaze processor [18].
[10]:
• Position and velocity equations of particles
• Number of particles in the search space
• The Gbest fitness achieved
• Positions of particles having the best solution of all

• The number of population
• The size of the neighborhood
• The dimension of the search space
• The values of the coefficients
• The maximum speed

Each iteration allows the particles to move as a function of three components:
• Its current speed
• Its local best solution
• The global best solution in its neighborhood

Fig. 1. Architecture of the GA

B. PSO architecture
The architecture of our optimized PSO algorithm is presented in the following figure:

Fig. 2.


• S0: Initialize the parameters, signals and counters of the PSO algorithm and go to S1
• S1: Generate the initial population and its velocities using the random generator and go to S2 or S3
• S2: Save the position and velocity values into memory (RAM)
• S3: Evaluate the particles using the fitness module and go to S4 or S5
• S4: Save the evaluated value into block RAM and go to S6

Fig. 3. The finite state machine of the PSO algorithm
(Flowchart steps: if fit(i) < local-best(i) then update local-best(i); if fit(i) < Global-best then update Global-best; update velocities; update particle positions; update iteration.)

Fig. 6. Comparison of hardware resources between PSO and GA

Fig. 8. A comparison of the hardware resources used in the two algorithms

• 4,320 logic cells and equivalents
• 12 x 18K-bit block RAMs (216K bits)
• 12 hardware multipliers (18x18)
• Digital clock managers (DCMs)
• Up to 173 I/O signals
• Three 40-pin expansion connectors
• PS/2 mouse/keyboard port, VGA port and serial port

Fig. 12. Display of the number of iterations needed to reach the optimal solution

VI. RELATED WORK

TABLE I. EXAMPLE OF SOME SELECTED PARTICLES

• S5: Test gbest: if fit(i) < Global-best then update Global-best; if the number of iterations is reached, go to the final state, else go to state S7
• S6: Test lbest: if fit(i) < local-best(i) then update local-best(i); then go to S2 and return to state S4
• S7: Update the particle positions and velocities
• S8: Update the iteration count; if the number of iterations is not reached, go to state S3, else go to the final state (SF)
• SF: Display the optimum solution
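The control flow of the states S0 to SF can be modelled in software as a transition table. The sketch below is a simplified reading of the state descriptions, not the authors' hardware controller: the branches are linearized into a single sequence per iteration (an assumption, as are the enum names), and only the control flow is modelled, without the PSO arithmetic.

```python
from enum import Enum


class State(Enum):
    S0 = "initialize"
    S1 = "generate population"
    S2 = "store positions and velocities"
    S3 = "evaluate fitness"
    S4 = "store fitness"
    S5 = "test gbest"
    S6 = "test lbest"
    S7 = "update positions and velocities"
    S8 = "count iteration"
    SF = "display optimum"


# Assumed linear order of the unconditional transitions.
NEXT_STATE = {
    State.S0: State.S1,
    State.S1: State.S2,
    State.S2: State.S3,
    State.S3: State.S4,
    State.S4: State.S6,
    State.S6: State.S5,
    State.S5: State.S7,
    State.S7: State.S8,
}


def run_fsm(max_iterations):
    """Walk the state machine and return the list of visited states."""
    state = State.S0
    iteration = 0
    trace = [state]
    while state is not State.SF:
        if state is State.S8:
            # S8 decides between another evaluation pass and the final state.
            iteration += 1
            state = State.S3 if iteration < max_iterations else State.SF
        else:
            state = NEXT_STATE[state]
        trace.append(state)
    return trace


trace = run_fsm(max_iterations=2)
print(" -> ".join(s.name for s in trace))
```

Keeping the transitions in a table makes it easy to compare the software model with the hardware state register one clock at a time.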

TABLE II. DEVICE UTILIZATION SUMMARY OF PSO

TABLE III. DEVICE UTILIZATION SUMMARY OF GA

TABLE IV. PROCESSING TIME OF ONE ITERATION

TABLE V. DEVICE UTILIZATION SUMMARY OF PSO

TABLE VI. DEVICE UTILIZATION SUMMARY OF GA

TABLE VII. COMPARISON OF OTHER PLATFORMS