Fusion of BIFFOA and Adaptive Two-Phase Mutation for Helmetless Motorcyclist Detection

Road traffic injuries and deaths cause considerable economic losses to individuals, families, and nations as a whole. One strategy for curtailing these fatalities is the surveillance of helmetless motorcyclists, which can be carried out by an automatic detection system based on computer vision. Generally, such a system consists of three subsystems, namely moving object segmentation, motorcycle classification, and helmetless head detection. The HOPG-LDB (Histogram of Oriented Phase and Gradient - Local Difference Binary) descriptor produced good accuracy for this system; however, it has the drawback of a large number of features. Based on these observations, this paper proposes the Adaptive Two-phase Mutation Binary Improved Fruit Fly Optimization Algorithm (ATMBIFFOA) to reduce the features. ATMBIFFOA is a new feature selection algorithm that improves BIFFOA (Binary Improved Fruit Fly Optimization Algorithm) with an adaptive two-phase mutation algorithm. BIFFOA produces good accuracy but is weak in reducing the feature dimension; the adaptive two-phase mutation algorithm is used to cover this weakness. The experimental results show that the proposed method effectively reduces the number of features and the computation time compared with BIFFOA. The proposed method produced a motorcycle classification accuracy of 96.06% for the JSC1 dataset and 96.85% for the JSC2 dataset. For helmetless head detection, it produced an average precision of 66.29% for the JSC1 dataset and 63.95% for the JSC2 dataset.
Keywords: Motorcycle classification; helmetless head detection; BIFFOA; two-phase mutation algorithm


I. INTRODUCTION
Road traffic injuries and deaths cause considerable economic losses to individuals, families, and nations as a whole. Based on current trends, these problems are predicted to persist for a long period. The World Health Organization (WHO) reported that traffic accidents were the 7th leading cause of death in the world, with 1.35 million deaths recorded yearly [1]. In Indonesia, riders of two- and three-wheeled vehicles accounted for approximately 74% of traffic accident deaths [1]. The main cause of death in this type of accident is head injury sustained because no helmet was worn. WHO reported that the use of helmets reduces the risk of head injury by 69% [1]. Most countries mandate the use of helmets; however, many motorcyclists still violate the regulation and escape the consequences, because direct surveillance on the highway is difficult and is not carried out for the full day. Meanwhile, research on automatic detection based on computer vision has been growing rapidly to curtail these problems.
In general, the detection of motorcyclists not wearing helmets is divided into two subsystems, namely motorcycle detection and helmetless head detection [2]. The feature extraction process affects the performance of both subsystems. Previous studies have used hand-crafted features and convolutional neural networks (CNN). The Histogram of Oriented Gradient (HOG) descriptor is a hand-crafted feature that yields relatively high accuracy. The author in [3] used HOG to classify vehicles in various environments and views. The author in [4] used HOG in both subsystems, which resulted in good accuracy, but distant objects were still detected incorrectly. The author in [5] reported that HOG produces higher accuracy than Wavelet Transform (WT), Local Binary Pattern (LBP), and their combinations in helmetless head detection. The author in [6] reported that HOG produces higher accuracy than Scale-Invariant Feature Transform (SIFT) and LBP in both subsystems.
Currently, the CNN method is popular for classification and detection in various domains. The author in [7] used CNN for motorcycle detection to overcome the problems of changing lighting and poor video quality. CNN was also used for helmetless head detection with various models, for example, AlexNet [8], VGG16 [9], VGG19, Inception V3, and MobileNets [10]. The author in [11] combined HOG and LBP for vehicle classification, and compared hand-crafted features (a combination of HOG, LBP, and Haralick) with a Custom CNN for helmetless head detection. The results showed that the combination produces higher accuracy than HOG and LBP alone for motorcycle classification. For helmetless head detection, the Custom CNN is superior in terms of accuracy, while the hand-crafted features are superior in terms of prediction time with relatively good accuracy. In addition, [12] compared several hand-crafted features (HOG, LBP, and Gabor) and CNN for vehicle classification. The results showed that HOG produces better accuracy than the other descriptors and CNN.
However, several authors stated that HOG does not deal effectively with images of varying lighting [13] or with different local patterns [14]. This ineffectiveness can be addressed by combining the HOG, Histogram of Oriented Phase (HOP), and Local Difference Binary (LDB) descriptors into the Histogram of Oriented Phase and Gradient - Local Difference Binary (HOPG-LDB) descriptor [15]. The experimental results showed that the HOPG-LDB descriptor increases the accuracy of HOG; however, it still has the drawback of a large number of features. The author in [16] stated that feature selection is one of the preprocessing techniques that reduce this number. The author in [17] stated that feature selection techniques are divided into two types, namely filters and wrappers. The author in [18] reported that the wrapper method tends to provide better performance than the filter method. This technique can significantly improve the selection of relevant features [19].
*Corresponding Author. www.ijacsa.thesai.org
The author in [20] reported that Binary Improved Fruit Fly Optimization Algorithm (BIFFOA) feature selection produces good performance. In the experiments, this method was compared with other algorithms, namely binary Gray Wolf Optimization (bGWO), Binary Gravitational Search Algorithm (BGSA), Binary Bat Algorithm (BBA), Binary Salp Swarm Algorithm (BSSA), Binary Grasshopper Optimization Algorithm (BGOA), Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Correlation-based Feature Selection (CFS), Fast Correlation-Based Filter (FCBF), F-Score, Information Gain (IG), and spectrum. The results showed that BIFFOA has the best accuracy; however, it is weak in reducing feature dimensions.
A solution for reducing feature dimensions was proposed in [21], which integrated the Gray Wolf Optimizer (GWO) algorithm with a two-phase mutation (TMGWO). The first mutation phase is used to reduce feature dimensions, and the second attempts to increase accuracy. In the experiments, this method was compared with other algorithms, namely BBA, Binary Crow Search Algorithm (BCSA), binary Gray Wolf Optimization Algorithm (bGWOA), binary Whale Optimization Algorithm (bWOA), Discrete Particle Swarm Optimization (DPSO), Flower Algorithm (FA), Multi-Verse Optimization (MVO), PSO, and Non-Linear Particle Swarm Optimization (NLPSO). The results showed that TMGWO produces the best accuracy and the second-best feature reduction among these methods.
Real-time detection of motorcyclists not wearing helmets requires high accuracy and speed. Adding a feature selection process can improve this performance. BIFFOA feature selection produces good performance; however, it is weak in reducing the feature dimension [20]. This weakness can be addressed by adding a two-phase mutation algorithm. This study aims to add a feature selection process to the detection of motorcyclists not wearing helmets. The proposed feature selection reduces the number of features so that the computation time of motorcycle classification and helmetless head detection can be reduced. The main contributions of this paper are:
• Adaptive Two-phase Mutation Binary Improved Fruit Fly Optimization Algorithm (ATMBIFFOA) is proposed. This algorithm is a fusion of BIFFOA and an adaptive two-phase mutation algorithm.
• An algorithm of adaptive two-phase mutation that is modified from a two-phase mutation is proposed.
• The ATMBIFFOA feature selection is added after the feature extraction process to reduce features in the motorcycle classification and helmetless head detection.
This paper is organized as follows. Related work is presented in Section II that is divided into two parts: motorcycle detection and helmetless head detection. Section III explains the dataset used, the proposed algorithm, and the evaluation methods. In Section IV, we present the experimental result and discussion. Finally, the main conclusion is introduced and future work is suggested in Section V.
II. RELATED WORK
In general, this study is divided into two subsystems: motorcycle detection and helmetless head detection.

A. Motorcycle Detection
Motorcycle detection involves three processes: vehicle segmentation, feature extraction, and classification. The author in [22] used three shape features (length, width, and their ratio) to categorize vehicles into five groups. The results obtained with a decision tree (DT) classifier showed high accuracy; however, the features were unable to differentiate bicycles, motorcycles, and tricycles. The author in [23] used the features of length, width, area, diameter, and a distance ratio involving the object's center of mass and its main-axis length. The classification method used a multilayer perceptron (MLP) to categorize the vehicles into three categories: heavy-duty vehicles, light-duty vehicles, and motorcycles.
The author in [24] used the area feature to distinguish motorcycles from other objects. Meanwhile, the author in [25] proposed a method that focuses on counting the motorcycles on the street in real time. The features used are area, height, and width to categorize motorcycles and non-motorcycles.
The author in [6] compared several descriptors (HOG, SIFT, and LBP) in classifying motorcycles and non-motorcycles. The results showed that the HOG descriptor has the best accuracy. The author in [26] compared HOG, Speeded Up Robust Features (SURF), SIFT, and LBP in motorcycle detection. That study also used images taken from the front, side, and back of motorcycles. The results showed that the HOG descriptor has better accuracy.
The study in [27] proposed a system that categorizes vehicles into four categories: cars, vans, buses, and motorcycles. The system used the Intensity Pyramid-based HOG (IPHOG) descriptor and a support vector machine (SVM) classifier. The results showed that accuracy under changing weather and lighting conditions is lower than under normal conditions.
The author in [28] used the LBP descriptor and an SVM classifier to detect motorcycles. This descriptor was compared with SURF, HOG, and Haar Wavelet. The results showed that the proposed method has higher accuracy than the others. The author in [5] proposed a WT descriptor that was compared with LBP, HOG, and SURF. The results showed that the WT descriptor has higher accuracy than the others.
The author in [29] proposed a combination of shape and color features comprising the area, the ratio of width to height, and the standard deviation of color. These features served as input to the k-nearest neighbors (KNN) classifier to distinguish motorcycles from non-motorcycles. The proposed approach is able to count the number of passengers on a motorcycle. The results showed classification errors because the data were taken from afar, vehicles overlapped, and the passenger sat too close to the rider. The author in [11] concatenated HOG and LBP and used sequential minimal optimization (SMO) to train the SVM classifier. The results show that the combination of those descriptors produced higher accuracy than the HOG and LBP descriptors alone. The author in [15] concatenated the HOG, HOP, and LDB descriptors and used an MLP classifier. The results show that the proposed descriptor has higher accuracy than the HOG, HOP, LDB, HOG-HOP, HOG-LDB, and HOP-LDB descriptors. Moreover, the proposed method has higher accuracy than those in [5], [6], and [11].
CNN has also been used for motorcycle detection. The author in [30] used a CNN that focuses on traffic-jam situations. CNN is also used in [7] to handle varied lighting and poor video quality. The test results show that the accuracy is better than that of hand-crafted features; however, the computation time is much longer.

B. Helmetless Head Detection
Helmetless head detection involves three stages: ROI (region of interest) determination, feature extraction, and classification. The ROI determination aims to locate the region around a rider's head. The heads of the rider and passenger are in the upper part of the motorcycle image; therefore, the studies focused on the top part of the object. Once the head region is known, the following steps are feature extraction and classification.
The author in [24] used the circular Hough transform (CHT) descriptor for helmetless head detection. The results showed that errors still occur when detecting two or more passengers. The author in [5] proposed the HOG descriptor, with a dataset taken in a static environment. The assessment was performed by comparing the HOG, WT, LBP, WT+LBP, WT+HOG, LBP+HOG, and WT+HOG+LBP descriptors. The results showed that the HOG descriptor has the best accuracy. The author in [6] compared the HOG, SIFT, and LBP descriptors. The results showed that HOG has the best accuracy. However, the data were taken on a quiet road.
Some researchers have combined shape, texture, and color features to improve accuracy. The author in [29] used the features of arc circularity, average intensity, and hue. Data were taken under three recording conditions: near, medium, and far. It was found that the largest errors came from the data recorded from afar.
The author in [26] used the features of arc circularity, average intensity and hue, and Center Symmetric-Local Binary Pattern (CS-LBP). These features served as input to the KNN classifier for classifying heads with and without helmets. The technique addressed the problem of data recorded from different angles: front, side, and back. However, the head images were cropped manually. The author in [31] also used arc circularity, average intensity and color, and HOG.
The author in [28] used geometric, shape, and texture characteristics. The study used a combination of the CHT, LBP, and HOG descriptors. CHT is used to determine the geometric shape in an image. This technique has the weakness that it cannot detect images of low resolution. The author in [15] used the HOPG-LDB descriptor, which concatenates the HOG, HOP, and LDB descriptors. The results show that the HOPG-LDB descriptor has higher accuracy than the HOG, HOP, LDB, HOG-HOP, HOG-LDB, and HOP-LDB descriptors. Moreover, the proposed method has higher accuracy than those in [5] and [6].
The author in [32] used CNN with the YOLOv2 model to detect riders without helmets. The author in [7] used the AlexNet model in both light and heavy traffic. The author in [8] also used the AlexNet model, and inaccurate detections were made for riders wearing hats. The author in [33] used the iter_45, Inception-V3 network, and full ImageNet network models. The author in [34] proposed Faster Regions with Convolution Neural Network (R-CNN) to decrease the computing time. However, inaccurate detections were still made for bicycle riders. The author in [11] compared hand-crafted features (a combination of HOG, LBP, and Haralick) and a Custom CNN. The Custom CNN is superior in terms of accuracy, while the hand-crafted features are superior in terms of prediction time with relatively good accuracy.

A. Dataset
This study used two datasets, namely JSC1 and JSC2 taken from the rear and front of an object, respectively. Both datasets contain motorcycle and non-motorcycle images used for the input of motorcycle classification. The input of helmetless head detection used the motorcycle images. Fig. 1 shows some samples of both datasets.
The datasets were generated from the segmentation of the videos. This process consists of several stages, namely histogram equalization of video frames that have been converted to grayscale, a Gaussian Mixture Model (GMM) to determine the foreground, and morphological operations (opening and dilation) to remove noise. The author in [35] stated that GMM is robust to lighting changes. The first video, for the JSC1 dataset, was recorded on Cipinang Baru Timur Street in East Jakarta, Indonesia, at a frame rate of 19.49 fps. The second video, for the JSC2 dataset, was recorded on Budi Raya Street in West Jakarta at a frame rate of 20 fps. Both videos have a resolution of 1280 x 720 pixels and a duration of 3 hours. Training data were generated from the first 2 hours and testing data from the rest. This data division technique was also used in [6]. The numbers of training and testing data are shown in Table I and Table II, respectively.

B. Developed System
In general, the system for detecting helmetless motorcyclists is divided into three subsystems, namely moving object segmentation, motorcycle classification, and helmetless head detection. This study focuses on developing motorcycle classification and helmetless head detection, as shown in Fig. 2.
Head detection in the helmetless head detection subsystem begins by determining the ROI of the head from the motorcycle image. The ROI limits are determined from the minimum and maximum positions of the upper 1/3 of the blob image generated by the segmentation process. The resulting image is converted to grayscale and its contrast is enhanced using CLAHE (contrast-limited adaptive histogram equalization). Fig. 3(a) is an example of a motorcycle image, and Fig. 3(b) is the result of this process. The next step creates two binary images with opposite intensities using thresholding and inverse thresholding; Fig. 3(c) shows the result. Some blobs still need to be filtered and fixed. The filtering is carried out by morphological operations (opening and hole filling), removing blobs on the side and top edges, and removing blobs that are too big. Moreover, overly tall blobs are fixed by removing their bottom part. Fig. 3(d) shows the result of this process. Laplacian of Gaussian (LoG) edge detection is used in the next step, with results as in Fig. 3(e). After that, CHT is applied to both images, and the results are combined on an ROI head image, as in Fig. 3(f).
The classification in head detection classifies the bounding boxes of the detected circles into head and non-head. Classification in helmetless head detection then classifies head objects into heads wearing a helmet and heads not wearing a helmet. Fig. 3(g) is an example of the classification of head detection. The author in [5] reported that the MLP classifier produces good performance, so this paper uses it.
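The ROI step described above can be illustrated with a small sketch. It assumes the blob is given as a binary mask (a list of rows of 0/1 values) and interprets "upper 1/3" as the top third of the blob's own height; the function name `head_roi` and the return convention are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple


def head_roi(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, bottom, left, right), inclusive, of the assumed head ROI.

    Assumption: the ROI is the bounding box of the foreground pixels that
    lie in the top third of the blob's vertical extent.
    """
    # Rows that contain at least one foreground pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # empty mask: no blob, no ROI

    top, bottom = rows[0], rows[-1]
    height = bottom - top + 1
    # Keep at least one row so that very small blobs still give an ROI.
    roi_bottom = top + max(1, height // 3) - 1

    # Minimum and maximum horizontal positions inside the upper third.
    cols = [
        x
        for y in range(top, roi_bottom + 1)
        for x, value in enumerate(mask[y])
        if value
    ]
    return top, roi_bottom, min(cols), max(cols)
```

The returned box would then be cropped from the motorcycle image before the grayscale conversion and contrast enhancement steps.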
The feature extraction process used the HOPG-LDB descriptor [15].

C. Binary Improved Fruit Fly Optimization Algorithm (BIFFOA)
The author in [20] explained that BIFFOA is developed from the Improved Fruit Fly Optimization (IFFO) algorithm for feature selection by converting it from a continuous to a binary version. The author in [36] explained that the IFFO algorithm improves on the Fruit Fly Optimization (FFO) algorithm, which is used for global optimization. The weakness of the FFO algorithm is that the search radius is the same in all iterations. In the IFFO algorithm, the search radius ($r$) for each iteration is calculated using (1),
where $r_{\max}$ and $r_{\min}$ are the maximum and minimum radii, respectively, $Iter$ is the current iteration, and $Iter_{\max}$ is the maximum number of iterations. The author in [37] explained that $r_{\max} = (UB - LB)/2$ and $r_{\min} = 10^{-5}$, where UB (upper bound) and LB (lower bound) are 1 and 0, respectively.
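Equation (1) itself is not reproduced in this text. As a rough illustration only, the sketch below assumes the exponentially decaying schedule commonly associated with IFFO, in which the radius starts at the larger value $(UB - LB)/2 = 0.5$ and shrinks to $10^{-5}$ at the last iteration. The function name `search_radius` and the exact form of the schedule are assumptions, not the paper's definition.

```python
import math


def search_radius(iteration: int, iter_max: int,
                  r_max: float = 0.5, r_min: float = 1e-5) -> float:
    """Assumed radius schedule: r = r_max * exp(log(r_min / r_max) * Iter / Iter_max).

    With this form, r equals r_max at iteration 0 and r_min at iteration iter_max.
    """
    if iter_max <= 0:
        raise ValueError("iter_max must be positive")
    return r_max * math.exp(math.log(r_min / r_max) * iteration / iter_max)
```

Under this assumption, the search is broad in early iterations and becomes increasingly local, which is the behavior that distinguishes IFFO from the fixed-radius FFO described above.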
The initialization parameters in the BIFFOA algorithm are PS, $r_{\max}$, $r_{\min}$, and $Iter_{\max}$. In addition, the initial swarm location is initialized by selecting the best solution, which is the agent with the smallest fitness value. The fitness function is designed as shown in (2).
$$Fitness = \alpha \, \gamma_R(D) + \beta \, \frac{|R|}{|C|} \quad (2)$$
where $\gamma_R(D)$ represents the classification error rate of a given classifier, $|R|$ and $|C|$ denote the length of the selected feature subset and the total number of features, respectively, and $\alpha$ and $\beta$ represent the weights of the classification accuracy and of the selected feature subset, respectively. The values of $\alpha$ and $\beta$ in this study are 0.99 and 0.01, respectively. The agents are a swarm of fruit fly positions. The initial positions of the fruit flies are randomly generated binary numbers. The position of the fruit flies is updated using (3).
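The fitness in (2) trades the classification error rate against the fraction of selected features. A minimal sketch follows, assuming the usual weighted-sum form $\alpha \gamma_R(D) + \beta |R|/|C|$ and a binary list as the feature mask; the function name `fitness` and its signature are hypothetical, and the error rate is expected to come from whatever classifier is evaluated on the selected features.

```python
from typing import Sequence


def fitness(error_rate: float, selected: Sequence[int],
            alpha: float = 0.99, beta: float = 0.01) -> float:
    """Weighted sum of classification error and selected-feature ratio.

    error_rate : classification error in [0, 1] obtained with the selected features
    selected   : binary mask, 1 = feature kept, 0 = feature dropped
    """
    total = len(selected)
    if total == 0:
        raise ValueError("the feature mask must not be empty")
    n_selected = sum(selected)
    return alpha * error_rate + beta * n_selected / total
```

With $\alpha = 0.99$ and $\beta = 0.01$, a lower error dominates the score, and the feature ratio mainly breaks ties between solutions of similar accuracy.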
In (3), $n$ is the dimension length, rand() generates a random number in [0, 1], and $\delta_j$ is the $j$th dimension of the optimal solution. $S(\Delta x_{i,j})$ is the S-shaped sigmoidal transfer function given in (4), where $\Delta x_{i,j}$ is calculated using (5).
In (5), $r$ is the search radius for each iteration, calculated using (1), and $d$ is a randomly chosen dimension index. The pseudocode of BIFFOA is shown in Algorithm 1 [20].
Algorithm 1. BIFFOA [20]
...
2. Set PS, r_max, r_min, Iter_max
3. Calculate the fitness of all agents using (2)
4. Set the best solution as swarm location
5. Iter = 0
6. X* = ∆
7. Repeat
8.   Calculate the search radius r using (1)
9.   Calculate ∆x_i,j using (5) // Osphresis foraging phase
10.  ... using (3)
...
Algorithm 2 gives the pseudocode of the two-phase mutation [21]. The input of this algorithm is the best grey wolf ($X_\alpha$) in each iteration. $X_\alpha$ is mutated in two phases: the first is used to reduce features and the second to improve accuracy. A mutation is executed when $r$ is less than the Mutation Probability (Mp). The value of $r$ is generated randomly between 0 and 1, and the Mp value is 0.5.
Algorithm 2. The standard two-phase mutation
1. Input: the best grey wolf X_α from each iteration
2. Fitness = calculate the fitness of X_α
// start the first phase
3. Define vector one_positions to store the locations of the selected features in X_α
4. Define X_mutated1 = X_α
5. For i = 1 to length of one_positions // for each selected feature in X_α
6.   Generate a random number r
7.   ...
14.  End if
15. End for
// start the second phase
16. Define vector zero_positions to store the locations of the unselected features in X_α
17. Define X_mutated2 = X_α
18. For j = 1 to length of zero_positions // for each unselected feature in X_α
19.   Generate a random number r
20.   If ...
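The two-phase mutation described above can be sketched as follows. Several steps of the listing are not legible here, so the sketch fills them with assumptions that are labeled in the comments: a flipped bit is kept only when it lowers the fitness (greedy acceptance), and each trial is rebuilt from the current best solution. The function name `two_phase_mutation` and its parameters are hypothetical; `fitness_fn` stands for any callable that scores a binary feature mask (lower is better).

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def two_phase_mutation(
    best: Sequence[int],
    fitness_fn: Callable[[List[int]], float],
    mp: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], float]:
    """Sketch of a two-phase mutation on a binary feature mask."""
    rng = rng or random.Random()
    best = list(best)
    best_fit = fitness_fn(best)

    # Phase 1 tries to drop selected features (1 -> 0);
    # phase 2 tries to add unselected features (0 -> 1).
    for old_bit, new_bit in ((1, 0), (0, 1)):
        positions = [j for j, bit in enumerate(best) if bit == old_bit]
        for j in positions:
            if rng.random() < mp:  # mutate only when r < Mp
                trial = list(best)  # assumption: start from the current best
                trial[j] = new_bit
                trial_fit = fitness_fn(trial)
                if trial_fit < best_fit:  # assumption: greedy acceptance
                    best, best_fit = trial, trial_fit
    return best, best_fit
```

Every attempted flip costs one fitness evaluation, so the cost of this procedure grows with the number of features.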

E. Proposed Algorithm: Fusion of BIFFOA with Adaptive Two-Phase Mutation Algorithm
ATMBIFFOA is a new feature selection algorithm that is proposed in this paper. It improves BIFFOA by adding an adaptive two-phase mutation algorithm that aims to reduce feature dimensions. The pseudocode of ATMBIFFOA is given in Algorithm 3, and that of the adaptive two-phase mutation algorithm is given in Algorithm 4.
The input of the adaptive two-phase mutation algorithm is the best solution of each iteration ($X^*$), which is defined as shown in (6).
$$X^{*} = (x_1, x_2, \ldots, x_n) \quad (6)$$
where $x_j$ is the $j$th dimension of $X^*$, and $n$ is the dimension length of $X^*$. When $x_j = 0$, the corresponding feature is unselected, and when $x_j = 1$, the corresponding feature is selected. The Mp is defined as a vector, as shown in (7).
$$Mp = (mp_1, mp_2, \ldots, mp_n) \quad (7)$$
where $mp_j$ is the mutation probability of the $j$th dimension. The $mp_j$ values are constant at the beginning of the iterations ($mp_j$ in this study is 0.5). However, when the iteration is greater than or equal to the weight iteration ($I_w$), Mp is set equal to the dimension weights of the best agents. The $I_w$ is calculated using (8).
In (8), $t_w$ is a weight threshold with a value in the range [0, 1].
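The switch of Mp from a constant to the dimension weights can be sketched as below. Equation (8) is not reproduced in this text, so the sketch assumes $I_w = t_w \times Iter_{\max}$; this form, the function name `mutation_probabilities`, and the way the weights are passed in are all assumptions made for illustration.

```python
from typing import List, Sequence


def mutation_probabilities(
    iteration: int,
    iter_max: int,
    t_w: float,
    weights: Sequence[float],
    mp_init: float = 0.5,
) -> List[float]:
    """Return the per-dimension mutation probability vector Mp.

    Before the weight iteration I_w, every dimension uses the constant mp_init.
    From I_w onward, Mp equals the dimension weights of the best agents.
    """
    i_w = t_w * iter_max  # assumed form of (8)
    if iteration >= i_w:
        return list(weights)
    return [mp_init] * len(weights)
```

For example, with $t_w = 0.25$ and 100 iterations, the constant 0.5 would be used for the first 25 iterations under this assumption.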
The dimension weights of the best agents are represented by the vector $W_i$, as in (9).
$$W_i = (w_{i,1}, w_{i,2}, \ldots, w_{i,n}) \quad (9)$$
where $w_{i,j}$ is the weight of the best agent in the $i$th iteration and $j$th dimension, which is calculated using (10).
In (10), $x_{i,j}$ is the value of the best solution in the $i$th iteration and $j$th dimension.
Algorithm 3. ATMBIFFOA
...
2. Set PS, r_max, r_min, Iter_max
3. Set W_0, Mp, I_w
4. Calculate the fitness of all agents using (2)
5. Set the best solution as swarm location
6. Iter = 0
7. X* = ∆
8. Repeat
9.   Calculate the search radius r using (1)
10.  Calculate ∆x_i,j using (5) // Osphresis foraging phase
11.  ...
...
20.  End if
// Mutation process
21.  Process of the adaptive two-phase mutation
22. Until Iter = Iter_max
23. Output: Solution X*
The mutation process in the two-phase mutation algorithm is executed in all dimensions of the one_positions and zero_positions vectors. Therefore, this algorithm takes a long time when the number of features is large. The adaptive two-phase mutation algorithm limits the number of mutated dimensions with the 1st Mutation Candidate Probability ($P_{mc1}$) and the 2nd Mutation Candidate Probability ($P_{mc2}$), so its computation time can be reduced. The mutation positions in both vectors are selected through random permutation.
Algorithm 4. The proposed adaptive two-phase mutation
1. Input: the best solution X* from each iteration
// start the first phase
2. Define vector one_positions to store the locations of the selected features in X*
3. Define X_mutated1 = X*
4. Define the number of mutation candidates n_mc = P_mc1 × length of one_positions
5. Define vector one_mutation_candidate to store the locations of the mutation candidates by selecting n_mc random permutations in one_positions
6. For i = 1 to length of one_mutation_candidate
7.   Generate a random number r
8.   If ...
...
10.  Fitness_mutated = the fitness of X_mutated1
11.  ...
...
15.  End if
16. End for
// start the second phase
17. Define vector zero_positions to store the locations of the unselected features in X*
18. Define X_mutated2 = X*
19. Define the number of mutation candidates n_mc = P_mc2 × length of zero_positions
20. Define vector zero_mutation_candidate to store the locations of the mutation candidates by selecting n_mc random permutations in zero_positions
21. For j = 1 to length of zero_mutation_candidate
22.   Generate a random number r
23.   If ...
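The adaptive variant can be sketched in the same style. It restricts each phase to a random subset of candidate positions, sized by $P_{mc1}$ or $P_{mc2}$, and uses a per-dimension mutation probability. The steps that are not legible in the listing are filled with labeled assumptions: the candidate count is truncated to an integer, a bit is mutated when $r < mp_j$, and a flip is kept only if the fitness decreases. The function name `adaptive_two_phase_mutation` and its signature are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def adaptive_two_phase_mutation(
    best: Sequence[int],
    fitness_fn: Callable[[List[int]], float],
    mp: Sequence[float],
    p_mc1: float = 0.25,
    p_mc2: float = 0.01,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], float]:
    """Sketch of a two-phase mutation limited to random candidate positions."""
    rng = rng or random.Random()
    best = list(best)
    best_fit = fitness_fn(best)

    # Phase 1: selected features (1 -> 0), limited by p_mc1.
    # Phase 2: unselected features (0 -> 1), limited by p_mc2.
    for old_bit, new_bit, p_mc in ((1, 0, p_mc1), (0, 1, p_mc2)):
        positions = [j for j, bit in enumerate(best) if bit == old_bit]
        n_mc = int(p_mc * len(positions))  # assumption: truncate to an integer
        candidates = rng.sample(positions, n_mc)  # random subset, random order
        for j in candidates:
            if rng.random() < mp[j]:  # assumption: compare r with mp_j
                trial = list(best)
                trial[j] = new_bit
                trial_fit = fitness_fn(trial)
                if trial_fit < best_fit:  # assumption: greedy acceptance
                    best, best_fit = trial, trial_fit
    return best, best_fit
```

Compared with mutating every position, the number of fitness evaluations per call is bounded by roughly $P_{mc1}$ times the number of selected features plus $P_{mc2}$ times the number of unselected features, which is the source of the time saving described above.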

F. Evaluation Methods
The parameters for measuring performance were the number of features (NF), the average classification time (Time), accuracy (Acc), precision (Pre), and recall (Rec). For helmetless head detection in particular, we used average precision (AP) to measure accuracy. Equations (11), (12), and (13) are used to calculate accuracy, precision, and recall, respectively.
$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \quad (11)$$
$$Pre = \frac{TP}{TP + FP} \quad (12)$$
$$Rec = \frac{TP}{TP + FN} \quad (13)$$
where TP, TN, FP, and FN represent true positive, true negative, false positive, and false negative, respectively. A TP is a correct detection of a ground-truth bounding box. Correct detection was measured using the intersection over union (IOU), as in (14).
$$IOU = \frac{area(B_p \cap B_{gt})}{area(B_p \cup B_{gt})} \quad (14)$$
where $B_p$ and $B_{gt}$ represent the predicted and ground-truth bounding boxes, respectively. A detection is considered correct when the IOU is greater than or equal to the threshold. In this study, the threshold value was 0.5.
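The metrics above are straightforward to compute. The sketch below is illustrative only; it assumes boxes are given as `(x1, y1, x2, y2)` corner coordinates with `x1 < x2` and `y1 < y2`, which is a representation chosen here and not stated in the paper, and all function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(box_p: Box, box_gt: Box) -> float:
    """Intersection over union of a predicted and a ground-truth box."""
    ix1, iy1 = max(box_p[0], box_gt[0]), max(box_p[1], box_gt[1])
    ix2, iy2 = min(box_p[2], box_gt[2]), min(box_p[3], box_gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_gt = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    union = area_p + area_gt - inter
    return inter / union if union > 0 else 0.0


def is_correct_detection(box_p: Box, box_gt: Box, threshold: float = 0.5) -> bool:
    """A detection counts as correct when IOU >= threshold."""
    return iou(box_p, box_gt) >= threshold


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)
```

For instance, two 2×2 boxes that overlap over half their width share an area of 2 out of a union of 6, giving an IOU of 1/3, which falls below the 0.5 threshold.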
AP is the area under the precision-recall curve which has a range of [0, 1] and it is calculated as in (15) [38].
where $p(\tilde{r})$ is the precision measured at recall $\tilde{r}$.
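Equation (15) is not reproduced in this text. As a hedged illustration, the sketch below computes AP with a common all-point interpolation: precision at each recall level is replaced by the maximum precision at any higher recall, and the area under the resulting step curve is summed. Whether this matches the exact form of (15) is an assumption, and the function name `average_precision` is hypothetical.

```python
from typing import List, Sequence


def average_precision(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """Area under an interpolated precision-recall curve.

    recalls must be sorted in ascending order and paired with precisions.
    """
    if len(recalls) != len(precisions):
        raise ValueError("recalls and precisions must have the same length")

    # Add sentinel points at recall 0 and 1.
    r: List[float] = [0.0] + list(recalls) + [1.0]
    p: List[float] = [0.0] + list(precisions) + [0.0]

    # Interpolate: precision at recall r is the max precision at recall >= r.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])

    # Sum the rectangle areas between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))
```

The result lies in [0, 1], consistent with the range stated for AP above.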
The experiment was carried out by selecting the best result in each process, namely the combination of cell and block sizes for the HOPG-LDB descriptor, the variation of the $t_w$ value for ATMBIFFOA, and the combination of the number of hidden layers, the number of neurons, and the training algorithm for the MLP. The block size variations were 2×2 and 3×3 cells. The cell size variations were 4×4, 6×6, 8×8, and 12×12 pixels for the 2×2 blocks, and 4×4 and 8×8 pixels for the 3×3 blocks. The variations of the $t_w$ value were 0.25, 0.5, and 0.75. The variations of the number of hidden layers were 1, 2, and 3. The number of neurons in the hidden layers ($n_H$) was determined using (17) [39].
In (17), $n_i$ is the number of neurons in the input layer, $n_o$ is the number of neurons in the output layer, and $l$ is an integer constant from 1 to 10. The $l$ variations in this study were 1, 5, and 10. Finally, we used 8 variations of the training algorithm, namely gradient descent with adaptive learning rate backpropagation (traingda), scaled conjugate gradient backpropagation (trainscg), conjugate gradient backpropagation with Powell-Beale restarts (traincgb), conjugate gradient backpropagation with Fletcher-Reeves update (traincgf), conjugate gradient backpropagation with Polak-Ribiére update (traincgp), one step secant backpropagation (trainoss), gradient descent with momentum and adaptive learning rate backpropagation (traingdx), and gradient descent backpropagation (traingd). For training, this paper used a learning rate of 0.05, a maximum of 1000 epochs, and an error limit of 0.001. The author in [40] reported that these parameters result in good performance.
For ATMBIFFOA, the parameters PS and $Iter_{\max}$ are 24 and 100, respectively. The values of $P_{mc1}$ and $P_{mc2}$ are 0.25 and 0.01, respectively. K-fold cross-validation (K=5) was used to separate the training and validation data in the feature selection process. Each experiment was run 5 times and the average results were used. All the experiments were carried out on the Windows 10 Ultimate 64-bit operating system, with an Intel Core (TM) i7-9750HQ CPU and 16 GB of RAM. All the algorithms were implemented in MATLAB R2019a.
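The MLP search space described above can be enumerated as a simple grid. The sketch lists the combinations of hidden-layer count, $l$ value, and training algorithm taken from the text. Equation (17) is not reproduced here, so `hidden_neurons` assumes the widely used rule of thumb $n_H = \sqrt{n_i + n_o} + l$; that form, the rounding, and the helper names are assumptions for illustration only.

```python
import itertools
import math
from typing import List, Tuple

HIDDEN_LAYERS = (1, 2, 3)
L_VALUES = (1, 5, 10)
TRAINING_ALGORITHMS = (
    "traingda", "trainscg", "traincgb", "traincgf",
    "traincgp", "trainoss", "traingdx", "traingd",
)


def hidden_neurons(n_in: int, n_out: int, l: int) -> int:
    """Assumed rule of thumb for (17): round(sqrt(n_in + n_out)) + l."""
    return round(math.sqrt(n_in + n_out)) + l


def mlp_grid() -> List[Tuple[int, int, str]]:
    """All (hidden layers, l, training algorithm) combinations to evaluate."""
    return list(itertools.product(HIDDEN_LAYERS, L_VALUES, TRAINING_ALGORITHMS))
```

The grid contains 3 × 3 × 8 = 72 MLP configurations for each descriptor setting and $t_w$ value.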

IV. RESULTS AND DISCUSSION
This section shows the experiment results of the proposed method for motorcycle classification and helmetless head detection.

A. Motorcycle Classification
The first experiment determines the best accuracy of the proposed method (ATMBIFFOA) with variations of $t_w$. Table III shows the results of this experiment for motorcycle classification. From this table, it can be seen that the best accuracy reaches 96.06% for the JSC1 dataset and 96.85% for the JSC2 dataset.
Furthermore, the proposed method is compared with the previous study in Table IV. Here, [20] used BIFFOA feature selection. From this table, it can be seen that the proposed method is superior in terms of the number of features and classification time. Meanwhile, in terms of accuracy, the proposed method is superior for the JSC1 dataset, and [20] is superior for the JSC2 dataset. For this reason, we conclude that the addition of an adaptive two-phase mutation algorithm to BIFFOA can effectively reduce the number of features. The proposed method also produces a better optimal solution than the BIFFOA [20].

B. Helmetless Head Detection
Table V shows the experimental results of helmetless head detection with variations in the value of $t_w$. From this table, it can be seen that the highest AP reaches 66.29% for the JSC1 dataset and 63.95% for the JSC2 dataset. Furthermore, the proposed method is compared with previous studies. Table VI shows a comparison of the proposed feature selection method and the previous study. Here, [20] used the BIFFOA feature selection. From this table, it can be seen that the proposed method is superior in terms of the number of features and classification time. In addition, the AP of the proposed method is superior for the JSC2 dataset, although it is slightly lower for the JSC1 dataset. Fig. 5 shows the comparison of the convergence curves of the proposed method and the previous study. The proposed method produces a better optimal solution than [20]. Therefore, we can conclude that the addition of an adaptive two-phase mutation algorithm to BIFFOA can reduce features effectively.
The proposed method is also compared with previous studies, as shown in Table VII. Here, [6] used a combination of the HOG descriptor and an SVM classifier, and [5] used a combination of the HOG descriptor and an MLP classifier. The AP of the proposed method is superior to that of these methods. Fig. 6 shows a comparison of the precision-recall curves of the proposed method and these methods.

V. CONCLUSION
This paper proposed a new feature selection algorithm called ATMBIFFOA for motorcycle classification and helmetless head detection. The experiment used two datasets with different recording angles, namely the rear and front of an object. The motorcycle classification accuracy of the proposed method reaches 96.06% for the JSC1 dataset and 96.85% for the JSC2 dataset. Meanwhile, the AP of helmetless head detection reaches 66.29% for the JSC1 dataset and 63.95% for the JSC2 dataset. The proposed algorithm is more effective than BIFFOA in terms of the number of features and the classification time. For this reason, the proposed method is more suitable for the real-time detection of motorcyclists who do not wear helmets. However, the addition of an adaptive two-phase mutation algorithm to BIFFOA can significantly increase the feature selection time. In the future, ATMBIFFOA can be used with faster classifiers such as KNN, SVM, and DT to reduce the time consumption of feature selection.