An Efficient Binary Clonal Selection Algorithm with Optimum Path Forest for Feature Selection

Feature selection is an important step in applications such as data mining, classification, pattern recognition, and optimization. Finding the most informative subset of features in a large dataset is still an open problem. In computer science, many metaphors imported from nature and biology have proved efficient when applied in an artificial way to solve a variety of problems; examples include neural networks, human genetics, flower pollination, and the human immune system. Clonal selection is one of the processes that happen in the human immune system while recognizing new infections, and mimicking this process in an artificial way resulted in a powerful algorithm: the Clonal Selection Algorithm. In this paper, we explore the power of the Clonal Selection Algorithm in its binary form for solving the feature selection problem, using as the fitness function to be optimized the accuracy of the Optimum-Path Forest classifier, which is much faster than other classifiers. Experiments on three public benchmark datasets were conducted to compare the proposed Binary Clonal Selection Algorithm, in conjunction with the Optimum-Path Forest classifier, with four other powerful algorithms: the Binary Flower Pollination Algorithm, the Binary Bat Algorithm, Binary Cuckoo Search, and the Binary Differential Evolution Algorithm. In terms of classification accuracy, the experiments revealed that the proposed method outperformed the four other algorithms, moreover with a smaller number of features. The proposed method also took less average execution time than the other algorithms, except for Binary Cuckoo Search. The statistical analysis showed that our proposal differs significantly in accuracy from the Binary Bat Algorithm and the Binary Differential Evolution Algorithm.

Keywords—Feature selection; artificial immune system; clonal selection algorithm; optimization; optimum path forest


I. INTRODUCTION
The Artificial Immune System (AIS) uses ideas inspired by the immune system of the human body for solving different kinds of problems in various research areas such as pattern recognition, data mining, machine learning, and optimization. Clonal selection is an important branch of AIS that is responsible for the response of the immune system to harmful antigens. It selects the cells (antibodies) that identified the antigens (Ag) to proliferate; then the procedure of affinity maturation is applied to the selected cells to improve their affinities to the selective Ag [1]. These characteristics make the Clonal Selection Algorithm very appropriate for solving multidimensional optimization tasks, where optimization can be defined as a search for the best solution among the available solutions to a specific problem.
The target of the feature selection problem is to compose a subset that contains the best features among all features of a particular domain. The obtained feature subset can be used to optimize an objective function of a certain problem, so feature selection can be categorized as an optimization problem. Solving this problem decreases the dimensionality of the data and removes irrelevant and noisy features, which subsequently benefits the implementation and execution of many applications.
Many nature-inspired algorithms have been used to solve the feature selection problem, such as Particle Swarm Optimization (PSO) [2,3], Binary Ant Colony Optimization (ACO) [4,5], the Gravitational Search Algorithm (GSA) [6], Binary Differential Evolution (BDE) [7], the Cloning Algorithm [8,9], Artificial Fish Swarm (AFS) [10,11], the Harmony Search Algorithm (HSA) [12], the Binary Firefly Algorithm (FFA) [13], Binary Cuckoo Search (BCS) [14], the Binary Bat Algorithm (BBA) [15], the Binary Flower Pollination Algorithm (BFPA) [16], and the Binary Clonal Flower Pollination Algorithm (BCFA) [17]. The natural immune system uses clonal selection to select the best cells that can recognize the antigens; the chosen cells are proliferated and then matured to improve their affinity to the particular antigens. The clonal selection concept plays a crucial role in the success of the human immune system and exhibits an excellent selection ability. As the feature selection problem can be defined as selecting the optimal subset of features to improve the fitness function of a particular problem, the clonal selection algorithm was chosen in the current study to solve it, as it has achieved good results in many applications, such as function optimization [18], pattern recognition [19], scheduling [20], and industrial engineering (IE) related problems [21].
In BCSA, the search domain is modeled as an n-dimensional space, where n refers to the number of features. The algorithm represents each solution as a binary set of coordinates that indicate whether a feature is selected or not. The fitness function to be maximized is the accuracy of the Optimum-Path Forest (OPF) classifier [22,23]. The classifier must be trained and its accuracy calculated every time a solution is mutated, so a fast and robust classifier is needed for this task. OPF has been used in many applications and achieved results similar to those of the Support Vector Machine (SVM) classifier while being faster than SVM in the training phase [15].
Although many algorithms have been applied to the feature selection problem, as mentioned above, classification accuracy and execution speed still need to be enhanced. In this paper, we introduce a modified binary clonal selection algorithm to improve the accuracy and the speed of solving the feature selection problem while also reducing the number of selected features. The proposed algorithm was compared with BFPA, BCS, BBA, and BDEA using public UCI datasets [24]. The experiments include a sensitivity analysis and an execution time comparison.
The paper is organized as follows: the Clonal Selection Algorithm is presented in Section 2; the Optimum-Path Forest (OPF) classifier is explained in Section 3; the proposed Binary Clonal Selection Algorithm (BCSA) is presented in Section 4; experimental results and discussion are demonstrated in Section 5; finally, Section 6 contains conclusions and future work.

II. THE CLONAL SELECTION ALGORITHM
The biological immune system is partitioned into innate immunity and adaptive immunity; researchers mostly propose ideas based on the latter. The natural immune system has the ability to protect the human body from the attack of harmful microorganisms and can discriminate between the normal inhabitant microorganisms and harmful ones. The harmful organisms are foreign bodies that can stimulate the immune system, so they are called antigens (Ag). The immune system of the human body produces another component, the antibody (Ab), to attack each antigen, and each Ag has a specific Ab that recognizes it.
The clonal selection is responsible for the adaptive response of the immune system to foreign antigens, as proposed in [1], where the cells (antibodies) that detect these antigens are stimulated, cloned, and divided into plasma and memory cells. The clonal selection algorithm can be considered one of the evolutionary strategies able to solve complicated problems in different areas. The features of clonal selection theory [32] can be listed as follows:
 All antibodies are mutated for maturation; the mutation can be seen as genetic changes for better recognition of antigens.
 The antibodies that carry self-reactive receptors are removed from the repertoire (the set of antibodies).
 Proliferation and differentiation for the most-stimulated antibodies.
 The best set of antibodies is chosen as memory cells for any future attacks.
The AIS algorithm [1] proceeds as follows:
1) Initialize a population of solutions P; some of these solutions are stored as a memory M and the others form the remaining population Pr, so P = Pr + M;
2) The n solutions that achieve the highest affinity measure are selected to compose the population Pn;
3) The selected n solutions are cloned (reproduced), generating a clone population C; the rate of cloning is proportional to the solution's affinity with the objective function (antigen);
4) The clone population C is submitted to an affinity maturation (mutation) process, whose rate is inversely proportional to the solution's affinity with the objective function (antigen), generating a maturated population C*;
5) From the maturated population C*, the highest-affinity solutions are reselected to be stored in the memory part M; some solutions of P can also be exchanged with other maturated solutions of C*;
6) The d lowest-affinity solutions are replaced by newly initialized solutions to increase the population diversity.
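As an illustration, the steps above can be sketched as a compact clonal-selection loop in Java. This is a hypothetical sketch, not the paper's implementation: the class and method names are invented, and a toy bit-counting affinity stands in for the OPF-accuracy fitness used later in the paper.

```java
import java.util.*;

public class ClonalSelectionSketch {
    static final Random RNG = new Random(42);

    // Toy affinity: count of 1-bits (stands in for a classifier-accuracy fitness).
    static int affinity(boolean[] ab) {
        int a = 0;
        for (boolean bit : ab) if (bit) a++;
        return a;
    }

    static boolean[] randomAntibody(int n) {
        boolean[] ab = new boolean[n];
        for (int i = 0; i < n; i++) ab[i] = RNG.nextBoolean();
        return ab;
    }

    // Step 4: mutation rate inversely proportional to normalized affinity.
    static boolean[] mutate(boolean[] ab, double normAffinity) {
        double rate = 0.5 * (1.0 - normAffinity) + 0.01;
        boolean[] m = ab.clone();
        for (int i = 0; i < m.length; i++) if (RNG.nextDouble() < rate) m[i] = !m[i];
        return m;
    }

    static boolean[] run(int n, int popSize, int nSelect, int iterations) {
        List<boolean[]> pop = new ArrayList<>();
        for (int i = 0; i < popSize; i++) pop.add(randomAntibody(n));
        for (int t = 0; t < iterations; t++) {
            pop.sort((a, b) -> affinity(b) - affinity(a));      // step 2: select the best
            List<boolean[]> next = new ArrayList<>(pop.subList(0, nSelect));
            for (int i = 0; i < nSelect; i++) {                 // steps 3-4: clone then mutate
                int clones = (nSelect - i) * 2;                 // cloning proportional to rank
                double norm = affinity(pop.get(i)) / (double) n;
                for (int c = 0; c < clones; c++) next.add(mutate(pop.get(i), norm));
            }
            next.sort((a, b) -> affinity(b) - affinity(a));     // step 5: reselect into memory
            pop = new ArrayList<>(next.subList(0, popSize - 2));
            pop.add(randomAntibody(n));                         // step 6: diversity introduction
            pop.add(randomAntibody(n));
        }
        pop.sort((a, b) -> affinity(b) - affinity(a));
        return pop.get(0);
    }

    public static void main(String[] args) {
        boolean[] best = run(20, 10, 5, 50);
        System.out.println("best affinity = " + affinity(best));
    }
}
```

Because the best antibody is always retained in memory, the best affinity is non-decreasing over iterations, mirroring step 5.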
In optimization problems, the objective is to find the best solution among all available solutions. The role of AIS is to evolve the solutions using mechanisms of the natural immune system such as clonal selection, immune network theory, or other immune system concepts. The clonal selection optimization algorithm consists of a set of candidate solutions (antibodies) and a set of objectives (antigens), where each antibody tries to match or catch (optimize) an antigen [33].

III. THE SUPERVISED OPTIMUM-PATH FOREST (OPF) CLASSIFIER
OPF is a supervised classifier that deals with labeled samples; it trains faster than other classifiers such as SVMs and ANNs [22,23], so it is expected to be useful in the current study. OPF is a graph-based classifier in which the samples are represented as graph nodes connected through some adjacency relation, where a path between two nodes is defined as a sequence of adjacent nodes. The Euclidean norm is used to weight the arc between every two nodes, defining a complete graph.
In the current research, the dataset is partitioned into four subsets: training set Z1, learning set Z2, evaluating (validating) set Z3, and testing set Z4. The graph is represented by (Z1, A), where the samples in Z1 are the graph nodes and each pair of samples in A = Z1 × Z1 defines a graph arc, as illustrated in Fig. 1(a), which shows a complete graph of samples from different classes (circles and stars). The function λ(s) assigns the correct class label i to any sample s ∈ Z2 ⋃ Z3 ⋃ Z4. The set of prototype samples of all classes is denoted S ⊂ Z1.
The OPF classifier has three phases [22,23]: the training phase, the learning phase, and the classification/testing phase. In the training phase, the purpose is to generate an optimum-path forest, i.e., a group of disjoint optimum-path trees (OPTs) rooted in a special set S ⊂ Z1 called the prototypes. A minimum-spanning-tree (MST) algorithm is applied to find these prototypes, which are samples with different class labels connected by an arc of the MST, as explained in Fig. 1(b), where the dashed circles and stars are connected samples with different class labels. The arcs that connect these samples are then removed to produce a group of trees rooted in the prototype samples, as described in Fig. 1(c), where there are two trees: one rooted at a circle sample and the other at a star sample.
The connectivity function for the path-cost f_max is computed as follows [22,23]:

f_max(⟨x⟩) = 0 if x ∈ S, +∞ otherwise;
f_max(π · ⟨x, y⟩) = max{f_max(π), d(x, y)},   (1)

where d(x, y) represents the distance between nodes x and y, and π is a path, defined as a sequence of adjacent samples; π_x denotes a path that ends at sample x ∈ Z1. A path is trivial if π_x = ⟨x⟩, and f_max(π) computes the maximum arc weight (distance) along the path π.

The classifier is initialized with the training set Z1 during the first phase; it then uses the learning set Z2 together with Z1 during the learning phase. The target of this phase is for OPF to learn from its errors in order to enhance its performance. The process starts with the training set Z1 to initialize the classifier and generate an initial instance I that is evaluated over Z2. Then, the samples of Z2 that were misclassified are selected and exchanged with randomly selected non-prototype samples from Z1 to generate new sets Z1 and Z2. Learning over the newly generated sets continues until T iterations are reached. This technique increases the number of effective samples in the training set Z1, which is important because this set is used with the testing set in the final phase. Finally, the classifier instance with the best accuracy is employed in the final testing phase. The OPF accuracy is calculated by:

Acc(I) = 1 − (Σ_{i=1}^{c} E(i)) / (2c),   (2)

where Acc(I) is the classification accuracy of instance I, c refers to the number of classes, and the error function E(i) is calculated as:

E(i) = e_{i,1} + e_{i,2},  i = 1, 2, ..., c,   (3)
e_{i,1} = FP(i) / (|Z2| − NZ2(i)),  e_{i,2} = FN(i) / NZ2(i),   (4)

where NZ2(i) is the number of samples in Z2 belonging to class i. FP(i) is the false positives, i.e., the number of samples in Z2 that were incorrectly classified as class i although they belong to other classes, while FN(i) is the false negatives, i.e., the number of Z2 samples that were wrongly classified as belonging to other classes although they belong to class i.
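As a concrete illustration of this accuracy measure, the sketch below computes the OPF accuracy from per-class false-positive and false-negative counts. The class and method names are hypothetical; only the formula itself comes from the text above.

```java
public class OpfAccuracy {
    // Accuracy: Acc(I) = 1 - sum_i E(i) / (2c),
    // with E(i) = FP(i)/(|Z2| - NZ2(i)) + FN(i)/NZ2(i).
    static double accuracy(int[] nz2, int[] fp, int[] fn) {
        int c = nz2.length, total = 0;
        for (int n : nz2) total += n;                      // total = |Z2|
        double sumE = 0.0;
        for (int i = 0; i < c; i++) {
            double e1 = fp[i] / (double) (total - nz2[i]); // normalized false positives
            double e2 = fn[i] / (double) nz2[i];           // normalized false negatives
            sumE += e1 + e2;
        }
        return 1.0 - sumE / (2.0 * c);
    }
}
```

For example, with two balanced classes of 10 samples each and no errors the accuracy is 1.0; with 5 false positives and 5 false negatives per class it drops to 0.5.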
As regards the classification/testing phase, the target is to classify a new sample by assigning a class label to it. This is done by connecting a new sample y ∈ Z3 (or Z4) to all training nodes, similar to the square shape (test sample) in Fig. 1(d). Then, the distance d(x, y) between the test node y and each training node x ∈ Z1 is calculated and used to weight the arcs. The test sample is classified by assigning to it the class label of the training sample that achieved the minimum path-cost with it, as shown in Fig. 1(e). The path-cost C(y) is computed as follows:

C(y) = min_{x ∈ Z1} max{C(x), d(x, y)},   (5)

where C(x) is the optimum path-cost of the training sample x obtained during training. Suppose that x* ∈ Z1 is the training sample that achieved the optimum cost with the test sample y according to Eq. 5. In this context, L(x*) = λ(R(y)), where R(y) returns the root (prototype) of the path reaching y and λ(·) assigns the correct class label. The classification assigns the class label of x* as the class label of the test sample y; a classification error happens when L(x*) ≠ λ(y). The same procedure is applied to the learning samples in Z2; however, the misclassified samples of Z2 are used to teach the classifier and improve its classification performance.
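The min-max rule of Eq. 5 can be illustrated with a short sketch that assigns to a test sample the label of the training sample minimizing max{C(x), d(x, y)}. Class and method names are hypothetical, and the training path-costs C(x) are assumed to be given.

```java
public class OpfClassify {
    // Eq. 5: C(y) = min over x in Z1 of max{ C(x), d(x, y) };
    // the label of the minimizing training sample is assigned to y.
    static int classify(double[][] train, double[] cost, int[] label, double[] y) {
        double best = Double.POSITIVE_INFINITY;
        int bestLabel = -1;
        for (int i = 0; i < train.length; i++) {
            double c = Math.max(cost[i], dist(train[i], y));
            if (c < best) { best = c; bestLabel = label[i]; }
        }
        return bestLabel;
    }

    // Euclidean distance used to weight the arcs.
    static double dist(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) { double d = a[j] - b[j]; s += d * d; }
        return Math.sqrt(s);
    }
}
```

With two training samples (0,0) and (5,5) of zero cost and labels 1 and 2, a test point near the origin receives label 1 and a point near (5,5) receives label 2.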
The learning procedure of OPF is summarized in the pseudo-code of Algorithm 1, which is applied to measure the fitness function of a solution, both initially and every time the solution is mutated.

Algorithm 1. OPF training and learning procedure.
Input: Training set Z1, evaluating set Z2, and number of iterations T.
Output: The best classifier instance I.
1. While the stopping condition is not met
2.   Train the OPF classifier over Z1 to obtain the instance I;
3.   Evaluate the instance I over Z2;
4.   The accuracy of the classifier instance I is calculated using Eq. 2 and stored in the variable acc;
5.   If acc > bestAcc then
6.     The current classifier instance is updated to be the best instance;
7.     bestAcc = acc;
8.   End if
9.   The misclassified samples of Z2 are exchanged with random non-prototype samples from Z1;
10. End while
11. Return the best classifier instance, whose accuracy represents the fitness function of the solution;

The algorithm receives the training and evaluating sets as input data and learns through the iterations of the loop (Lines 1-10), where the classifier is trained over Z1 and then evaluated over Z2. The last best accuracy (bestAcc) is compared with the accuracy obtained over Z2 (acc); if the latter is higher, the best classifier instance is updated with the current classifier, otherwise the last best is kept. Then, the misclassified samples of Z2 are exchanged with randomly selected non-prototype samples from Z1. These steps continue until the stopping condition is met; in this way, the classifier increases its classification quality by learning from its classification errors. Finally, the accuracy of the best classifier instance is returned and used as the solution fitness function.

IV. THE PROPOSED BINARY CLONAL SELECTION ALGORITHM (BCSA)
In the original procedure of CSA, the solutions are updated to continuous positions, while in BCSA the search domain is modeled as an n-dimensional binary space, where n denotes the number of features. The algorithm represents each solution (individual) in the population as a binary string in which 1 indicates that the corresponding feature is chosen to construct a new dataset with the selected features and 0 otherwise, so each solution encodes a subset of features. The affinity of a solution is calculated by measuring the accuracy of the OPF classifier in Algorithm 1; since each solution may hold a different subset of features, the training and evaluating subsets may differ among the solutions. The mutation process is applied to each solution as a random walk following the distribution of Lévy flights:

x_i^j(t+1) = x_i^j(t) + α ⊕ Lévy(λ),   (6)

Lévy(λ) ∼ (λ Γ(λ) sin(πλ/2) / π) · 1/s^(1+λ),  (s ≫ s_0 > 0),   (7)

where x_i^j is the j-th coordinate of solution i, α > 0 is a step-size scaling factor, ⊕ denotes entry-wise multiplication, and Γ(λ) refers to the gamma function. Each mutated solution is converted to a binary vector by employing Eq. 8, which can provide only binary values:

x_i^j = 1 if S(x_i^j) > σ, 0 otherwise,  with  S(x) = 1 / (1 + e^(−x)),   (8)

where σ ∼ U(0,1).
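The mutation and binarization steps of Eqs. 6-8 can be sketched as follows. The sketch is hypothetical: it samples Lévy-distributed steps with Mantegna's algorithm (a common way to approximate Eq. 7, with the scale for λ = 1.5 precomputed), which the paper does not specify, and the class and method names are invented.

```java
import java.util.Random;

public class BinaryLevyMutation {
    static final Random RNG = new Random(7);
    static final double BETA = 1.5;       // Levy exponent lambda (common default)
    static final double SIGMA_U = 0.6966; // Mantegna scale precomputed for beta = 1.5

    // One Levy-distributed step via Mantegna's algorithm (approximates Eq. 7).
    static double levyStep() {
        double u = RNG.nextGaussian() * SIGMA_U;
        double v = RNG.nextGaussian();
        return u / Math.pow(Math.abs(v), 1.0 / BETA);
    }

    // Eq. 8: sigmoid squashing followed by a random threshold sigma ~ U(0,1).
    static int binarize(double x) {
        double s = 1.0 / (1.0 + Math.exp(-x));
        return s > RNG.nextDouble() ? 1 : 0;
    }

    // Eq. 6: perturb each coordinate by a scaled Levy walk, then re-binarize.
    static int[] mutate(int[] solution, double alpha) {
        int[] mutated = new int[solution.length];
        for (int j = 0; j < solution.length; j++) {
            double xj = solution[j] + alpha * levyStep();
            mutated[j] = binarize(xj);
        }
        return mutated;
    }
}
```

The output is always a valid binary feature mask of the same length as the input, so it can be fed directly back into the fitness evaluation of Algorithm 1.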
The proposed algorithm treats the feature selection problem as an optimization problem in which it searches for the best solution (antibody) holding the feature subset that achieves the highest accuracy of the OPF classifier (antigen). The pseudo-code of the proposed algorithm BCSA is explained in detail in Algorithm 2.
Algorithm 2. The proposed BCSA.
Input: Training set (Z1), evaluating set (Z2), population size (p), number of features (f), and number of iterations (T).
Output: The best solution with the selected features that achieved the best fitness value over Z2.
1. For each solution s (s = 1, 2, ..., p)
2.   For each feature j (j = 1, 2, ..., f)
3.     Initialize the position x_s^j with a random binary value;
4.   End for
5.   Initialize the fitness f_s;
6. End for
7. For each solution s (s = 1, 2, ..., p)
8.   Construct the new training set Z1 and evaluating set Z2 with the features selected by s;
9.   Measure the fitness f_s by applying Algorithm 1 to the new sets;
10. End for
11. While t < T
12.   Select the n highest-affinity solutions;
13.   Clone the selected solutions proportionally to their affinities to compose the clones population;
14.   For each solution s in the clones population
15.     Apply a random walk (mutation) on the solution to compose the maturated population;
16.     Measure the fitness of the mutated solution by applying Algorithm 1 with its new sets;
17.   End for
18.   Select the highest solutions from the original and the maturated populations to compose the memory set for the next generation;
19.   Replace the lowest-affinity solutions by novel ones (diversity introduction);
20. End while
21. Return the highest solution that fulfilled the best accuracy (fitness function); its selected features will be used in testing the OPF classifier (testing phase);

The algorithm initializes a population of solutions through the loop in Lines 1-6, where the position of each solution is initialized with a vector of random binary values (Lines 2-4). Then, the loop in Lines 7-10 constructs the new training set Z1 and evaluating set Z2 with the selected features, after which Algorithm 1 is applied to the newly constructed sets to measure the OPF accuracy, which serves as the solution fitness f_s. The main functionality of the proposed algorithm takes place in the iterations of Lines 11-20, where the n highest-affinity solutions are selected and then proliferated (cloned) according to their affinities, the cloning being proportional to the affinity (Lines 12-13). Lines 14-17 loop over the clones population: the mutation process (Eq. 6) is applied to each solution, restricted to binary values by Eq. 8, and the fitness of each mutated solution is measured by applying Algorithm 1 with its new sets. Line 18 composes the memory set for the next generation from the highest solutions of the original and the maturated populations, and Line 19 replaces the lowest-affinity solutions with novel ones to introduce diversity. Finally, Line 21 returns the solution that obtained the best OPF accuracy over all iterations; its selected features are used in the testing phase.

V. RESULTS AND DISCUSSIONS
The proposed techniques were implemented in Java and run on a PC with an Intel Core i5 processor, 8 GB of RAM, and the Windows 7 operating system. Each algorithm was run 10 times to obtain average values; each technique used a population size of 20 solutions, and the internal number of iterations for each technique was 1000. The results of BCSA were compared to those of BFPA, BCS, BBA, and BDEA. Table I presents the parameters of the employed optimizers.
The experiments were executed using the public UCI Australian, Breast Cancer, and German Number datasets. Table II illustrates the details of the used datasets. In the current study, each dataset was randomly partitioned into four disjoint subsets Z1, Z2, Z3, and Z4. Z1 is the training set, used with a percentage of 30% to initialize and train the classifier. Z2 is the learning set, with a percentage of 20%; this learning has a significant impact on improving the composition of the samples of Z1. Z3 is the validating set, used with a percentage of 20% to ensure the efficiency of the selected feature subset. Z4 is the testing set, used with a percentage of 30% for finally calculating the OPF accuracy with the selected features.
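A minimal sketch of the random 30/20/20/30 partition described above is given below (class and method names are hypothetical; the paper does not specify its exact index handling, and the example assumes the percentages divide the sample count evenly):

```java
import java.util.*;

public class DatasetSplit {
    // Randomly partition sample indices into Z1 (30%), Z2 (20%), Z3 (20%), Z4 (30%).
    static int[][] split(int nSamples, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < nSamples; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed));
        int n1 = (int) Math.round(nSamples * 0.30);
        int n2 = (int) Math.round(nSamples * 0.20);
        int n3 = (int) Math.round(nSamples * 0.20);
        int[][] parts = new int[4][];
        parts[0] = toArray(idx.subList(0, n1));                 // Z1: training
        parts[1] = toArray(idx.subList(n1, n1 + n2));           // Z2: learning
        parts[2] = toArray(idx.subList(n1 + n2, n1 + n2 + n3)); // Z3: validating
        parts[3] = toArray(idx.subList(n1 + n2 + n3, nSamples));// Z4: testing
        return parts;
    }

    static int[] toArray(List<Integer> l) {
        int[] a = new int[l.size()];
        for (int i = 0; i < a.length; i++) a[i] = l.get(i);
        return a;
    }
}
```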
The methodology of the experiment depends on the threshold approach presented in [15]. This approach divides the running time into checkpoints ranging from 10% to 90%; at each checkpoint, the best solution that obtained the highest fitness over Z2 is stored in a vector. Then, the feature subsets of the stored solutions are used to test the validation set Z3, and the stored feature subset that maximized the accuracy over Z3 is used to evaluate the testing set Z4. The purpose of the validation step is to guarantee the best-selected features and ensure their quality before they are used in the test step over Z4, as explained in Fig. 2 [17].
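The threshold approach can be sketched as follows: checkpoints at 10% to 90% of the run mark when the best-so-far solution is stored, and the stored solution with the highest Z3 accuracy is then chosen for testing on Z4. The class and method names are hypothetical.

```java
public class ThresholdSelection {
    // Iteration checkpoints at 10%, 20%, ..., 90% of the run.
    static int[] checkpoints(int totalIterations) {
        int[] cps = new int[9];
        for (int k = 0; k < 9; k++) cps[k] = totalIterations * (k + 1) / 10;
        return cps;
    }

    // Among the stored checkpoint solutions, pick the index with the best
    // validation (Z3) accuracy; its features then evaluate the test set Z4.
    static int selectBest(double[] validationAccuracy) {
        int best = 0;
        for (int k = 1; k < validationAccuracy.length; k++)
            if (validationAccuracy[k] > validationAccuracy[best]) best = k;
        return best;
    }
}
```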
Throughout the remainder of the paper, bold formatting marks the best values. Table III displays the average classification accuracy (fitness function) of the compared techniques over the testing sets of all datasets. The obtained results revealed that BCSA surpassed the four other algorithms on all datasets. Table IV shows the standard deviation of the classification accuracy results of the compared techniques: BCSA had better results than BBA, BFPA, and BDEA on the Breast Cancer dataset, while the standard deviation of BCSA was higher than those of BFPA and BDEA on the German Numeric dataset.
The average classification errors of the different algorithms are presented in Table V: BCSA achieved the lowest classification error compared with the other algorithms on all three datasets. The Wilcoxon rank test [34] was calculated between the proposed algorithm BCSA and the compared techniques. The results in Table VI indicate that BCSA outperforms BBA on the Australian dataset (0.022) and the Breast Cancer dataset (0.007), and also surpasses BDEA on the Breast Cancer dataset (0.047), taking into account a significance level of α = 0.05. Table VII reports the average number of features selected by the compared algorithms over the used datasets: although BCSA selected the smallest number of features, it achieved the highest accuracy, as shown in Table III and Fig. 3, and the lowest classification error, as outlined in Table V. Table VIII shows the execution time of the used algorithms: BCSA executed in less time on the Breast Cancer and German Numeric datasets, and it had the best mean execution time over the three datasets. The experimental results thus show that BCSA outperformed the compared techniques in terms of classification accuracy and the number of selected features.
The special characteristics of clonal selection are the reason behind BCSA's advantage over the compared algorithms. It uses an adaptive maturation scheme in which high-affinity solutions are mutated at a low rate and low-affinity solutions at a high rate, a step that enhances exploitation. To avoid being trapped in local optima, the worst-affinity individuals are exchanged with randomly generated new individuals, so the algorithm always maintains the population diversity that is essential for exploration. Receptor editing also helps to achieve population diversity, explore new search regions, and avoid local optima.

VI. CONCLUSIONS AND FUTURE WORK
From the current research, it can be concluded that the suggested Binary Clonal Selection Algorithm (BCSA) is able to solve optimization problems such as the feature selection problem and obtain notable results against four powerful techniques. The target of this problem is to find the most informative subset of features that represents all features in a specific domain. The proposed BCSA surpassed well-known techniques, namely BFPA, BCS, BBA, and BDEA, and obtained the best results in terms of classification accuracy, the number of selected features, and execution time.
Therefore, it is suggested that BCSA be tested against more public datasets and real-world problems. It could also be used with different classifiers, such as Support Vector Machines (SVM), k-Nearest Neighbors (k-NN), and Artificial Neural Networks (ANN), to confirm its reliability. Moreover, we intend to apply it to big data mining and to other problems such as feature weighting, job scheduling, and text processing.
Additionally, the Clonal Selection Algorithm could be extended so that the mutation ratio is adapted according to the individual affinity and the number of iterations. Assuming the population converges over time, the mutation can start with a large value and then decrease with time. The same concept can be applied to the cloning ratio but in reverse order: it can start with a small value and increase with time.
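A possible linear schedule for these adaptive ratios is sketched below (a hypothetical illustration of the idea, not something evaluated in the paper; the names and the linear form are assumptions):

```java
public class AdaptiveRates {
    // Mutation ratio starts high and decays linearly with the iteration count t (0..T).
    static double mutationRate(int t, int T, double max, double min) {
        return max - (max - min) * t / (double) T;
    }

    // Cloning ratio does the reverse: starts small and grows linearly with time.
    static double cloningRate(int t, int T, double min, double max) {
        return min + (max - min) * t / (double) T;
    }
}
```

Early iterations thus favor exploration (large mutation steps), while later iterations favor exploitation (heavier cloning of good solutions).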