An Adaptive Discrete Brain Storm Algorithm Solves 3D Protein Structure Prediction

Brain Storm Optimization (BSO) is one of the most effective swarm intelligence algorithms; it simulates the human brainstorming process to find optimal solutions to optimization problems. The BSO method has been successfully applied to many real-world problems. This study adapts the BSO method, in a variant called BSO-IP, to solve integer programming problems. Our method collects the best solutions and uses them to generate new solutions that search for optimal solutions in all areas of the search space. The efficiency of BSO-IP is tested on several benchmark integer programming problems. BSO-IP is then used to solve the 3D protein structure prediction problem, formulated mathematically as an integer programming problem, to demonstrate the viability and usefulness of the proposed algorithm. Experimental results on different benchmark protein structures show that the proposed method is superior in performance, convergence, and stability in predicting protein structure, and its results are promising compared with those of other methods.
Keywords: Brain storm optimization; integer programming problem; three dimensional protein structure prediction


I. INTRODUCTION
Optimization is a significant branch of modern science. Previously, scientists needed a long time to find optimal solutions to optimization problems. Recently, however, population-based optimization has been widely researched, and the corresponding algorithms are called population-based optimization algorithms. In these algorithms, individuals communicate and compete with each other, and such algorithms are classified as swarm intelligence algorithms.
Particle swarm optimization (PSO) [1], bacterial foraging optimization [2], artificial bee colony optimization [3], and ant colony optimization (ACO) [4] are examples of swarm intelligence algorithms inspired by animals and insects such as ants, birds, and bees. Brain Storm Optimization (BSO) is a newer swarm intelligence algorithm proposed by Shi [5,6]. Many researchers have made significant efforts to develop the BSO algorithm and make it more efficient.
BSO depends on two major functions, namely, divergence and convergence, which correspond to its two basic capabilities: divergence correlates with learning and convergence with developing. These functions search for solutions that are better than the current solution, which is represented by one member of the population. They are essential for finding the best potential solutions to NP-hard problems. The BSO algorithm is a mixture of swarm intelligence and data mining techniques. Each solution produced by the BSO algorithm not only solves the problem but is also an outlet to other solutions to the problem. This feature is a distinctive characteristic of combining swarm intelligence and data mining techniques.
Most BSO algorithms have been applied to continuous optimization problems [7], [8]. Only a few papers have been dedicated to integer programming problems and their real applications, such as [9]. This study adapts the BSO algorithm, in a variant called BSO-IP, to integer programming and tests it on several benchmark integer programming problems. BSO-IP results are also compared with those of other methods to show the strength of our method. The BSO-IP method updates solutions adaptively: it collects the best solutions and uses them to generate new, diversified solutions that search for optimal solutions in all areas of the search space.
This paper presents a BSO approach to one of the most important problems in bioinformatics: protein structure prediction (PSP) in 3D. PSP is the task of predicting the 3D structure of a protein from its primary structure (its amino acid sequence). PSP is a significant research topic in bioinformatics, medicine, and other fields such as drug design and disease prediction. The three-dimensional folding structure of a protein determines its biological function. There are many traditional experimental methods to determine protein folding structure, such as X-ray crystallography and NMR spectroscopy [10].
However, these methods are very expensive and time-consuming because a polypeptide chain can adopt an enormous number of different spatial structures. It is still difficult to search for the global minimum-energy conformation of a protein from its amino acid sequence and to analyze the protein folding process. The most serious problem lies in finding the simplest model that represents the relationship between the structure of a protein and its free energy.
In this paper, PSP is formulated mathematically as an integer programming problem. The resulting BSO-HP algorithm is tested on different benchmark HP models to evaluate its effectiveness.
The rest of the paper is organized as follows. Section II presents the fundamental techniques and structure of the BSO method and briefly reviews the integer programming problem. Section III introduces the design of the proposed method for integer programming problems, called BSO-IP, and discusses its numerical experiments. Section IV applies the BSO method to PSP as the BSO-HP strategy. Section V compares the proposed technique with other strategies. Finally, Section VI presents the conclusions of this paper.

A. Brain Storm Optimization Method Techniques
Like other swarm intelligence optimization algorithms, the BSO algorithm designed by Shi [5,6] is inspired by nature, but its source of inspiration is the human brainstorming process. Humans are the most intelligent living creatures, so algorithms based on humans and human behavior are expected to be more effective and rewarding than those inspired by insects, ants, and other living things.
The BSO algorithm is designed according to the brainstorming process, for which Osborn created four rules for generating ideas. During brainstorming, open-minded people generate many different ideas. Similarly, every population in the BSO algorithm contains a group of diverse ideas. At the end of every step of brainstorming, every idea is evaluated; therefore, no idea is ignored.
The five major operations of the BSO algorithm are shown in Fig. 1 and described as follows:
• Population initialization.
In initialization, the population is generated randomly from the normal distribution inside the search space, and the population size is constant across iterations. Each individual must be evaluated after each generation, because its evaluated value determines its competence as a potential solution. Many clustering methods can be used in the clustering step; the BSO algorithm applies the k-means clustering algorithm. The updating step includes two sub-operations, given by the following equations:

$$x^{i}_{\mathrm{new}} = x^{i}_{\mathrm{old}} + \zeta(t) \times n(\mu, \sigma), \qquad x^{i}_{\mathrm{old}} = \omega_1 x^{i}_{\mathrm{old1}} + \omega_2 x^{i}_{\mathrm{old2}} \tag{1}$$

$$\zeta(t) = \mathrm{logsig}\left(\frac{0.5\,T - t}{k}\right) \times \mathrm{random}() \tag{2}$$

In Equation (1), x^i_old is the weighted sum of the i-th dimensions of the existing individuals x_old1 and x_old2, and ω1 and ω2 are the weighting coefficients of the two existing individuals. The term n(μ, σ) is a Gaussian random value with mean μ and variance σ. When the new individual x_new is generated from a single existing individual x_old, ω1 = 1 and ω2 = 0. When it is generated from two existing individuals x_old1 and x_old2, both weights are used. The step-size coefficient ζ(t) is computed by Equation (2). In that equation, logsig() is a logarithmic sigmoid transfer function, T is the maximum number of iterations, and t is the current iteration number. The parameter k changes the slope of logsig(), and random() is a random value within (0,1).
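The updating step described above adds a Gaussian perturbation, scaled by a logsig-shaped step size, to one individual or to a weighted blend of two individuals. The sketch below illustrates this continuous BSO operator; it is not the discrete BSO-IP operator. Several details are assumptions chosen for illustration:
• With two parents, ω1 is drawn from U(0,1) and ω2 = 1 − ω1.
• `sigma` is passed to `random.gauss` as a standard deviation.
• k defaults to 20.
• The function names are hypothetical.

```python
import math
import random


def logsig(x):
    """Logarithmic sigmoid transfer function, 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def step_size(t, T, k, rng):
    """zeta(t) = logsig((0.5*T - t) / k) * random(): large early, small late."""
    return logsig((0.5 * T - t) / k) * rng.random()


def new_individual(x_old1, x_old2=None, t=0, T=100, k=20.0,
                   mu=0.0, sigma=1.0, rng=None):
    """Generate x_new from one or two existing individuals.

    Assumption: with two parents, w1 ~ U(0, 1) and w2 = 1 - w1;
    with one parent, w1 = 1 and w2 = 0 (the blend is just x_old1).
    """
    rng = rng or random.Random()
    if x_old2 is None:
        x_old2, w1 = x_old1, 1.0
    else:
        w1 = rng.random()
    w2 = 1.0 - w1
    zeta = step_size(t, T, k, rng)  # one step size shared by all dimensions
    return [w1 * a + w2 * b + zeta * rng.gauss(mu, sigma)
            for a, b in zip(x_old1, x_old2)]
```

At t = 0.5T the step size is at most 0.5. It shrinks toward 0 as t approaches T, which shifts the search from exploration to exploitation.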

B. Integer Programming Problem
An integer programming problem is an optimization problem in which some or all of the variables are restricted to be integers. The mathematical programming problem can be written as follows:

$$\begin{aligned}
\min_{y}\ & f(y) \\
\text{s.t.}\ & g_i(y) \le 0, && i = 1, \dots, I, \\
& h_j(y) = 0, && j = 1, \dots, J, \\
& L_l \le y_l \le U_l, && l = 1, \dots, n,
\end{aligned} \tag{3}$$

Here f, g, and h are nonconvex functions in the general case, and n is the number of discrete variables. L_y = (L_1, ..., L_n) and U_y = (U_1, ..., U_n) are the lower and upper bounds of the discrete variables, respectively. Problem (3) is the same as a general nonlinear programming problem, except that the design variables can be zero-one, integer, or discrete variables. Therefore, the penalty methodology [11] is employed to transform this constrained problem into a series of unconstrained problems whose solutions converge to the solution of the constrained problem.

1) Penalty Function:
The penalty method transforms a constrained optimization problem into a series of unconstrained problems whose solutions converge to the solution of the original constrained problem. For minimization with inequality constraints, the unconstrained problems are formed by adding a penalty term to the objective function. The penalty term grows when the constraints are violated and is zero in the region where no constraint is violated. It is usually the product of a positive penalty coefficient and a penalty function. We solve the constrained problem (3), where f, g_i, and h_j are real-valued functions defined on the search space S ⊂ ℝ^n. The general formulation of the exterior penalty function is

$$\phi(y) = f(y) + \left[\sum_{i=1}^{I} r_i G_i + \sum_{j=1}^{J} c_j L_j\right] \tag{4}$$

where ϕ(y) is the new objective function, r_i and c_j are positive penalty coefficients, and G_i and L_j are the constraint violation functions. Their most common form is

$$G_i = \left[\max\left(0, g_i(y)\right)\right]^{\alpha}, \qquad L_j = \left|h_j(y)\right|^{\beta} \tag{5}$$

where α and β are normally 1 or 2. Many other formulations of the penalty function exist.
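The exterior penalty described above replaces the constrained objective with one that charges for constraint violations. The sketch below assumes that one shared coefficient `r` applies to all inequality constraints and one shared coefficient `c` applies to all equality constraints. Its defaults (r = c = 1000, α = β = 2) and its name are illustrative assumptions.

```python
def penalized_objective(f, ineq, eq, y, r=1e3, c=1e3, alpha=2, beta=2):
    """Exterior penalty: phi(y) = f(y) + r*sum(G_i) + c*sum(L_j).

    ineq: callables g_i with constraints g_i(y) <= 0
    eq:   callables h_j with constraints h_j(y) = 0
    G_i = max(0, g_i(y))**alpha and L_j = |h_j(y)|**beta are zero
    when y is feasible, so phi(y) == f(y) on the feasible region.
    """
    G = sum(max(0.0, g(y)) ** alpha for g in ineq)
    L = sum(abs(h(y)) ** beta for h in eq)
    return f(y) + r * G + c * L


# Example: minimize y0^2 + y1^2 subject to y0 >= 1 and y0 + y1 = 1.
f = lambda y: y[0] ** 2 + y[1] ** 2
g = lambda y: 1 - y[0]           # y0 >= 1 written as 1 - y0 <= 0
h = lambda y: y[0] + y[1] - 1    # y0 + y1 = 1
```

At the feasible point [1, 0] the penalized value equals f = 1. At the infeasible point [0, 0], each of the two violated constraints adds 1000.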

III. AN ADAPTIVE DISCRETE BRAIN STORM OPTIMIZATION ALGORITHM FOR INTEGER PROGRAMMING PROBLEM
A discrete BSO method, called BSO-IP, is designed to solve integer programming problems, which are NP-hard. The key operations of the BSO-IP algorithm are described as follows.

A. Initial Population
The initial population P is created by sampling from a uniform random distribution inside the search space. The population size pop_num is fixed throughout the search process.
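For integer variables, uniform initialization inside the bounds L_l ≤ y_l ≤ U_l can be sketched as follows. It assumes that the bounds are given as integer lists. The function name `init_population` is hypothetical.

```python
import random


def init_population(pop_num, lower, upper, rng=None):
    """Sample pop_num integer vectors uniformly from the box [lower, upper].

    random.randint includes both endpoints, so every bound is reachable.
    """
    rng = rng or random.Random()
    return [[rng.randint(lo, hi) for lo, hi in zip(lower, upper)]
            for _ in range(pop_num)]
```

Because the population size stays fixed during the search, this function is called once, with pop_num fixed for the whole run.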

B. Clustering Individuals and Disrupting Cluster Centers
Clustering analysis is a form of unsupervised learning. It divides data into several groups, and the main goal of clustering algorithms is to separate the data into small groups of similar, related objects. There are two ways of measuring similarity in clustering analysis. The first is based on finding an intercept between objects. The second, which is the more common, measures the distance between objects. The clustering process resembles the brainstorming process of dividing ideas into small groups of similar ideas. We apply the k-means clustering algorithm [12] because of its efficiency and accurate computation. Procedure 3.1 describes the clustering technique.
Procedure 3.1: Clustering Technique
1. Let X = {x_1, x_2, ..., x_n} be the set of data points and V = {v_1, v_2, ..., v_c} be the set of centers.
2. Randomly select c cluster centers.
3. Calculate the distance between each data point and each cluster center.
4. Assign each data point to the cluster center whose distance from it is the minimum over all cluster centers.
5. Recalculate each cluster center using
$$v_i = \frac{1}{c_i} \sum_{j=1}^{c_i} x_j$$
where c_i is the number of data points in the i-th cluster and x_j ranges over the points assigned to that cluster.
6. Recalculate the distance between each data point and the newly obtained cluster centers.
7. If no data point was reassigned, stop; otherwise, repeat from Step 3.
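The steps of Procedure 3.1 can be sketched as plain k-means with Euclidean distance. Three choices here are assumptions:
• The initial centers are c distinct data points.
• An empty cluster keeps its previous center.
• `max_iter` guards against non-termination.

```python
import math
import random


def kmeans(points, c, rng=None, max_iter=100):
    """Plain k-means following the steps of Procedure 3.1.

    points: list of equal-length numeric sequences; c: number of clusters.
    Returns (centers, labels), where labels[i] is the cluster of points[i].
    """
    rng = rng or random.Random()
    # Step 2: pick c distinct data points as the initial centers.
    centers = [list(p) for p in rng.sample(points, c)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Steps 3-4 (and 6): assign each point to its nearest center.
        changed = False
        for idx, p in enumerate(points):
            nearest = min(range(c), key=lambda j: math.dist(p, centers[j]))
            if nearest != labels[idx]:
                labels[idx] = nearest
                changed = True
        # Step 7: stop when no point was reassigned.
        if not changed:
            break
        # Step 5: move each center to the mean of its members.
        for j in range(c):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels
```

In BSO, `points` would be the current population, and each individual's cluster label decides which group of ideas it belongs to.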
C. Updating Individuals
1) New Individual Generation: To generate new individuals, we use prior information from the best individuals saved in the Best-list. The Best-list is built after the initial population is generated.
A new individual is generated from one or two cluster centers, as detailed below:
• Two Individuals Operators
Procedure 3.3: Two Individuals Operators
1. Randomly select two cluster centers, P_C1 and P_C2.
2. Determine the part of P_C1 that is similar to the best individuals in the Best-list.
3. Replace the selected part of P_C1 with the corresponding part of P_C2.
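Procedure 3.3 does not say exactly what counts as the "similar part" of P_C1. The sketch below adopts one hypothetical reading: the similar part is the set of positions where P_C1 agrees with the top entry of the Best-list. It then overwrites those positions with the values of P_C2. The caller is assumed to have already chosen the two cluster centers randomly (Step 1). The function name is also hypothetical.

```python
def two_individual_operator(p_c1, p_c2, best_list):
    """Hypothetical reading of Procedure 3.3 (Steps 2-3).

    Step 2 (assumed meaning): the 'similar part' of P_C1 is the set of
    positions where P_C1 equals the top Best-list individual.
    Step 3: those positions are replaced by the values of P_C2.
    """
    best = best_list[0]
    return [b if a == s else a for a, b, s in zip(p_c1, p_c2, best)]
```

For example, with P_C1 = [1, 2, 3, 4], P_C2 = [9, 9, 9, 9], and best individual [1, 0, 3, 0], positions 0 and 2 match. The result is therefore [9, 2, 9, 4].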
2) Selection: The population size is fixed throughout the execution of the algorithm. In each iteration, each old individual may be replaced by a new individual. The replacement follows an elitist selection technique: the new individual is compared with the old individual at the same index, and the better of the two is kept. Finally, the Best-list is updated with the improved individuals.
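The index-wise selection and the Best-list update described above can be sketched as follows. The sketch assumes minimization, as in the penalty formulation. It also assumes that the Best-list has a fixed capacity `size` and contains no duplicate individuals. Both function names are hypothetical.

```python
def select(population, offspring, fitness):
    """Index-wise elitist selection (minimization): keep the better of old/new."""
    return [new if fitness(new) < fitness(old) else old
            for old, new in zip(population, offspring)]


def update_best_list(best_list, population, fitness, size):
    """Merge, drop duplicate individuals, keep the `size` lowest-fitness ones."""
    unique = {}
    for ind in best_list + population:
        unique.setdefault(tuple(ind), list(ind))
    return sorted(unique.values(), key=fitness)[:size]
```

Because an old individual is replaced only when the new one is strictly better, the population size never changes, and the best fitness in the population never gets worse.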
The steps of the BSO-IP algorithm are presented in Fig. 2.
[Table I]
[Table II]
Problem g1 is tested with dimensions n = 5, 10, 15, 20, 25, and 30. Problems g1 to g6 are taken from [13], whereas g7 and g8 are taken from [14]. The BSO-IP method was implemented in MATLAB and run 50 times. Each run terminates when the optimal solution is obtained with an error of 10^−3 or when the maximum number of iterations is reached. Table III presents the results of the BSO-IP method, using the following columns:
• g* is the known optimal value.
• g-best is the best value obtained by the proposed method.
• g-mean is the mean of the values obtained over the runs.
• SR is the success rate.
• g-evolution is the number of fitness function evaluations.
We compare the BSO-IP method with the PSO-In, PSO-Co, PSO-BO, and BB methods [15]. BSO-IP was run 30 times under the same termination conditions as these methods: reaching the exact solution with an accuracy of 10^−6, or reaching the limit of 2500. These results are presented in Table IV. Table V presents the comparison between our proposed method and the PSO-In, PSO-Co, PSO-BO, and BB methods. The results show that the BSO-IP method found all the optimal solutions for the test problems with the lowest number of fitness function evaluations.
2) Results of Constrained Problems: The BSO-IP method is also applied to constrained problems. Its performance is evaluated on the well-known problems f1 to f4 [16], shown in Table VI. Our method solves each constrained problem by transforming it into an unconstrained problem using the penalty function in Equations (4) and (5). The BSO-IP MATLAB code was run 50 times. Each run terminates when the exact solution is found with an error of 10^−6 or when the maximum number of iterations is reached. The results of our method are shown in Table VII.
[TABLE VI. BENCHMARK CONSTRAINED FUNCTIONS. Columns: Function, Definition, Range, Optimal solution. f1: y_1^2 + y_2^2 + y_3^2 + y_4^2 + …]
Table VIII compares BSO-IP with the MI-LXPM, RST2ANU, and AXNUM methods [16] on the four well-known problems f1 to f4. For this comparison, the BSO-IP code was run 50 times with the termination condition given in [16]: reaching the optimal solution with an error of 0.01 or reaching the maximum number of iterations. Using this shared condition allows all methods to be compared on equal terms.
Table VIII also presents the success rate (SR), the number of fitness evaluations (f-eval), and the best solution found by each solver (f-best). The results show that the BSO-IP method is promising, since it found the optimal solutions with the lowest number of fitness function evaluations.
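The reported statistics can be computed from the best value of each independent run. In the sketch below, "best" is the minimum over runs, "mean" is the average over runs, and SR is the percentage of runs that end within the stated error of the known optimum. These exact definitions, and the function name, are assumptions.

```python
from statistics import mean


def summarize_runs(run_best_values, known_optimum, tol):
    """Summarize independent runs of a minimizer.

    A run counts as a success if its best value lies within `tol` of the
    known optimum (e.g. tol = 1e-3, 1e-6, or 0.01 as in the experiments).
    """
    hits = sum(1 for v in run_best_values if abs(v - known_optimum) <= tol)
    return {
        "best": min(run_best_values),
        "mean": mean(run_best_values),
        "SR": 100.0 * hits / len(run_best_values),  # success rate in percent
    }
```

For four runs ending at 0.0, 0.0005, 0.01, and 0.0 with a known optimum of 0 and a tolerance of 10^−3, three runs succeed. This gives SR = 75%.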

A. HP Lattice Model
The HP model abstracts each amino acid sequence as a string over the alphabet {H, P}, where H denotes a hydrophobic amino acid and P a hydrophilic amino acid. A protein conformation is a self-avoiding walk on a 3D lattice. The main driving force in forming the tertiary structure is the interaction between hydrophobic amino acids that are adjacent on the lattice but not adjacent in the sequence, called an H-H interaction. The free energy of a protein conformation X is expressed by the number of H-H interactions. According to Anfinsen's hypothesis [17], the native structure forms a hydrophobic core in the spatial structure, which hydrophilic amino acids shield from the solvent, and has minimal free energy. Hence, the more H-H interactions a conformation has, the lower its free energy; we take the free energy to be the negative of the number of H-H interactions. The HP lattice model is widely used for protein structure prediction on 2D and 3D lattices. This study focuses on the 3D HP cubic lattice model.
Many metaheuristic methods have been applied to HP models, such as the genetic algorithm [18], [19], [20], the memetic algorithm [21], the evolutionary strategy method [17], the ACO method [22], and the Tabu search method [23], [24]. A. Baz [21] applied a memetic algorithm to the 3D lattice HP model. M.T. Haque [25] used a genetic algorithm to solve the 3D HP lattice model. X. Zhang [23] presented an improved Tabu search for the 3D HP lattice model. T. Thalheim [22] applied ACO to PSP with the HP model. P.H.R. Gabriel [17] presented an evolutionary strategy to solve the 3D HP model. Few papers have formulated the PSP problem as a mathematical model [26], [27]. We formulate this problem as a simpler mathematical model than those of other methods, because our mathematical model is more accurate in finding the solution and more time-efficient.
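The energy described above is the negative count of H-H topological contacts. Such a contact is a pair of H monomers that are neighbors on the cubic lattice but not consecutive in the chain. The sketch below assumes that a conformation is given as a list of integer (x, y, z) tuples, one per monomer. The function name is hypothetical.

```python
def hp_energy(sequence, coords):
    """Return E = -(number of H-H topological contacts) on the cubic lattice.

    A contact is a pair of H monomers i, j with |i - j| >= 2 whose lattice
    positions are unit-distance neighbors (Manhattan distance 1).
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    contacts = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # skip chain neighbors
            if sequence[i] == "H" and sequence[j] == "H":
                dist = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
                if dist == 1:
                    contacts += 1
    return -contacts
```

For example, HPPH folded into a unit square has one contact between its two H ends, so E = −1. The same sequence laid out as a straight line has E = 0.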
The BSO algorithm adapted to the 3D HP model, called BSO-HP, solves it as an integer programming model. The results demonstrate the strength of BSO-HP in dealing with the 3D HP model as an NP-hard problem.

PSP Problem as Integer Programming Problem
The following equation presents the PSP problem as an integer programming problem. Three constraints describe the problem:
• The overlapping constraint prevents two nodes from occupying the same coordinate.
• The connectivity constraint prevents any break in the protein's sequential arrangement and ensures that each node is linked to its neighbors in the sequence.
• The boundary constraint rules out the straight structure of the HP model.
The BSO-HP algorithm solves the PSP problem on the basis of biological theory. It uses all the procedures of the BSO-IP algorithm, plus some additional procedures to deal with the 3D HP protein structure.
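The overlapping and connectivity constraints described above can be checked directly on a list of lattice coordinates. The boundary constraint is not modelled here, because its exact form is not spelled out in the text. The function name is hypothetical, and the coordinates are assumed to be integer tuples.

```python
def is_valid_conformation(coords):
    """Check the overlapping and connectivity constraints of a lattice walk.

    Overlapping: no two nodes share a coordinate (self-avoiding walk).
    Connectivity: consecutive nodes are unit lattice neighbors.
    """
    no_overlap = len(set(coords)) == len(coords)
    connected = all(
        sum(abs(a - b) for a, b in zip(p, q)) == 1
        for p, q in zip(coords, coords[1:])
    )
    return no_overlap and connected
```

In a penalty-based formulation, each violated constraint would add a penalty term, in the spirit of Equations (4) and (5).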
First, the following notation is used to encode a protein structure as an individual in the BSO-HP algorithm:
• The protein sequence is written as a chain of amino acids denoted by the vector S = {s_1, ..., s_n}, where n is the length of the protein sequence and each s_i is an H or P monomer.
• The direction vector X contains the relative direction defined by each three consecutive monomers. X has length n-2, and each direction is an integer from 0 to 4, where 0 means forward, 1 means left, 2 means right, 3 means up, and 4 means down.
• Finally, the matrix M contains the coordinates (x, y, z) of each node. The first two nodes are fixed at (0,0,0) and (0,0,1).
[Table IX]
Based on this representation of the protein structure, some BSO-IP procedures, such as the initial population and updating individual procedures, are adapted for the BSO-HP algorithm, as described below.
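The relative-direction encoding above can be decoded into the coordinate matrix M with a turtle-style local frame. The text does not specify the frame convention, so the following choices are hypothetical:
• The initial "up" vector is (0, 1, 0).
• "Left" is up × heading.
• "Up" and "down" pitch the heading by 90 degrees.
Only the fixed start (0,0,0), (0,0,1) comes from the text.

```python
FORWARD, LEFT, RIGHT, UP, DOWN = range(5)


def cross(a, b):
    """Cross product of two 3D integer vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def neg(v):
    return tuple(-x for x in v)


def decode(directions):
    """Turn a relative direction vector X (length n-2) into n coordinates."""
    coords = [(0, 0, 0), (0, 0, 1)]       # fixed first two nodes
    heading, up = (0, 0, 1), (0, 1, 0)    # assumed initial local frame
    for d in directions:
        left = cross(up, heading)
        if d == FORWARD:
            pass
        elif d == LEFT:
            heading = left
        elif d == RIGHT:
            heading = neg(left)
        elif d == UP:                      # pitch up: new up = -old heading
            heading, up = up, neg(heading)
        elif d == DOWN:                    # pitch down: new up = old heading
            heading, up = neg(up), heading
        else:
            raise ValueError(f"direction must be in 0..4, got {d}")
        coords.append(tuple(c + h for c, h in zip(coords[-1], heading)))
    return coords
```

The decoder never moves backward, so consecutive nodes never collide. Longer-range overlaps can still occur and must be rejected or penalized separately.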
1) Initial Population: Every individual is represented by a direction vector of random values of length n-2, where n is the length of the protein sequence; for example, X = {0, 1, 2, 3, 1, 4, 2, ..., 4}. Procedure 4.1 describes how the initial solution is generated.
2) Updating Individual: The new individual generation step applies two methods, Attract H and Move Pull, depending on the structure of the protein sequence. The Attract H method is an intensification process, which is very important for rapid convergence to the optimal solution. The Move Pull method is a diversification process, which generates alternative solutions to cover more regions of the search space. Both methods are used to generate new solutions within the single-individual mutation.
Fig. 4(a) shows the p2 model without the Attract H method, and Fig. 4(b) shows the p2 model after applying it. The energy is clearly lower after the Attract H method is applied.
b) Move Pull Method: The Move Pull method is considered an intensification process that focuses on the current solution. It randomly chooses three linked nodes and moves them in all available directions, as presented in Fig. 5. The goal is to find the best solution or to make an H node adjacent to another H node to which it is not linked. This improves the resulting solutions. Procedure 4.3 presents the method.
New individuals are generated either from one or more cluster centers or from one or two individuals. To decide which to use, a random value in the range (0,1) is generated. There are two ways to generate a new individual. The first generates it from one cluster, as described in Procedure 4.4:
Procedure 4.4: Updating individual
1. Generate a random value in the (0,1) range.
2. If the generated value is less than the predetermined value, select one cluster center and update it with the Attract H method (Procedure 4.2).
3. Else, choose a random individual from the cluster group and update it with the Move Pull method (Procedure 4.3).
The second method generates new individuals from two cluster centers or from two individuals of two different clusters. This method is considered a diversification process. Procedure 4.5 describes the second new individual generation method:
Procedure 4.5: Updating individual
1. Generate a random value in the (0,1) range.
2. If the generated value is less than the predetermined value, select two random cluster centers and combine them using the crossover process.
3. Else, choose random individuals from two different clusters and combine them using the crossover process.
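The text names a crossover process for Procedure 4.5 but does not specify which one. The sketch below substitutes standard one-point crossover on direction vectors. This is an assumed choice, not necessarily the authors' operator. The function name is hypothetical.

```python
import random


def one_point_crossover(parent1, parent2, rng=None):
    """One-point crossover of two equal-length direction vectors.

    The child takes parent1 before a random cut and parent2 after it;
    the cut lies strictly inside the vector so both parents contribute.
    """
    if len(parent1) != len(parent2) or len(parent1) < 2:
        raise ValueError("parents must have equal length >= 2")
    rng = rng or random.Random()
    cut = rng.randint(1, len(parent1) - 1)
    return parent1[:cut] + parent2[cut:]
```

Because the child mixes the fold of one parent with the fold of the other, the decoded conformation may overlap itself. It therefore needs a validity check or a penalty before evaluation.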

V. EXPERIMENTS AND DISCUSSION
The BSO-HP algorithm is applied to different HP benchmark models [27,28,29], shown in Table XI.

A. Parameter Settings
All parameters are summarized with their assigned values. These values follow common settings in the literature or were determined through our preliminary numerical experiments. Table X presents additional parameters used to solve the HP model problem.
Table XII presents the results of the proposed BSO-HP method; the best energy value found in one run is recorded. These results show that our method finds the best-known solution for all HP models. For the p6 and p9 models, it also finds new optimal solutions. Fig. 6 presents sample results obtained for different sequence lengths. The conformations in Fig. 6(b) and Fig. 6(c) are the best solutions obtained by any of the algorithms applied to these problems.
A strength of the BSO-HP method is that it finds more than one conformation of the same model with the optimal energy. Fig. 7 shows multiple optimal conformations found by the BSO-HP method. Fig. 7(a) and Fig. 7(b) show two conformations of sequence p3 (length 13) with energy -5. Fig. 7(c) and Fig. 7(d) show two conformations of sequence p4 (length 17) with energy -9.

C. Comparison Results
The BSO-HP method was compared with other methods to show its strength. Table XIII compares the BSO-HP method with MCMPSO-TS [28], HGA-PSO [29], and TPPSO [27] in terms of reaching the optimal solution. The MCMPSO-TS method was tested on p1, p2, p3, p4, p5, p6, p8, p10, and p11 and focused on short HP sequences. The HGA-PSO method was tested on p5, p7, p8, p11, p12, p13, and p14. The TPPSO method was tested on p9 and p10. The BSO-HP method covered all benchmark models, found the optimal solution for every model, and obtained the best solutions compared with the other methods.

TABLE XIII. COMPARISON BETWEEN BSO-HP AND OTHER METHODS
HP   Length  Best  MCMPSO-TS  HGA-PSO  TPPSO  HP-BSO
p1   5       -1    -1         -        -      -1
p2   8       -2    -2         -        -      -2
p3   13      -5    -5         -        -      -5
p4   17      -9    -9         -        -      -9
p5   20      -11   -11        -11      -      -11
p6   21      -8    -8         -        -      -9
p7   24      -13   -          -13      -      -13
p8   25      -9    -9         -9       -      -9
p9   27      -9    -          -        -9     …
A dash (-) means that the method was not tested on that sequence.

VI. CONCLUSION
An adaptive discrete brain storm algorithm, BSO-IP, is designed to deal with nonlinear integer programming problems and their applications. The BSO-IP algorithm uses prior knowledge of the best solutions in the search space to generate new solutions, and this convergence operator helps it reach the optimal solution. Several sets of benchmark nonlinear integer programming problems were tested, and the results demonstrate the promising performance of BSO-IP. Additionally, the proposed BSO-HP method was applied to PSP problems formulated as NP-hard integer programming problems. BSO-HP employs the same procedures as BSO-IP, plus some additional procedures that handle the biological basis of the PSP problem. Numerical results show that the BSO-HP method is a promising optimization method. Moreover, the BSO-HP method obtained new optimal solutions for two benchmark protein sequences. The proposed method also obtained multiple conformations of the same protein sequence with the same lowest energy, a feature that is important to biologists. In future work, we will apply the proposed method to other types of PSP problems, such as the 3D face-centered cubic (FCC) HP model. We would also like to improve the proposed method so that it can help biologists in the laboratory.