An Efficient Image Clustering Technique based on Fuzzy C-means and Cuckoo Search Algorithm

Clustering is a predominant technique in image segmentation because it is simple and efficient. It is important for the analysis, extraction and interpretation of images, which explains its use in many applications and fields. In this article, we propose a new image segmentation technique based on the cooperation between an optimization algorithm, the Cuckoo Search Algorithm (CSA), and a clustering technique, Fuzzy C-means (FCM). The proposed clustering method has two major steps. In the first step, CSA explores the search space of the given data to find the optimal cluster centers, which are evaluated using a new objective function. In the second step, the result of the first step is used to initialize the FCM algorithm. The efficiency of the proposed method is measured on several images selected from the BSD300 database and compared with other algorithms, namely FCM optimized by genetic algorithms (FCM-GA) and FCM optimized by particle swarm optimization (FCM-PSO). The experimental results show that the proposed method improves the segmentation results, based on the analysis of the best values of fitness, MSE, PSNR, CC, RI, GCE, BDE and VOI.

Keywords: Clustering; classification; image segmentation; fuzzy c-means; cuckoo search algorithm


I. INTRODUCTION
Segmentation is an important step in extracting qualitative information from an image. It divides the image into regions that are homogeneous according to a predefined criterion (gray level, color, intensity, texture, etc.). Several segmentation approaches have appeared in recent years. Some delimit homogeneous regions by their contours (contour approach), while others search for the homogeneous regions directly (region approach).
The segmentation process is a crucial step in computer vision systems, since features are extracted and decisions are made from its output. The first image segmentation algorithms were developed in the 1970s. Since then, many segmentation techniques have been tried in order to improve the results. Nevertheless, to date, no classical image segmentation algorithm provides perfect results on a wide variety of images.
Several segmentation techniques exist, and four main categories can be distinguished: segmentation by classification [1][2][3], by regions [4], by contours [5], and by region-contour cooperation [6]. Clustering is among the most widely used image segmentation approaches. It is a field of machine learning that belongs to unsupervised learning. Clustering is mainly used to group populations into communities with similar criteria. It is a data mining task that aims at dividing the elements of a set into groups, i.e. at establishing a partition of this set. Each group must be as homogeneous as possible, and the groups must differ from one another as much as possible. However, classical clustering methods tend to converge to a local optimum and require a prior initialization of the cluster centers. Therefore, unsupervised classification is studied as an optimization problem, i.e., finding a partition of the data that optimizes a given criterion.
Image segmentation has a multidisciplinary aspect. Its applications can be found in various fields such as medical imaging [7], video analysis [8], and remote sensing [9]. No single technique in the literature generalizes to all segmentation problems: each method is designed for a given type of image and a well-defined computing context, in which its performance and efficiency are known. The different techniques proposed for image segmentation have therefore shown their defects and limitations. Researchers then turned to new, more flexible and efficient strategies to solve the segmentation problem, using metaheuristic approaches that now occupy an increasing place in clustering for image segmentation. Metaheuristics are a family of algorithms that find good solutions quickly for optimization problems for which no more efficient classical method is known. They are iterative: starting from an initial solution, the search moves from one solution to a neighboring solution by successive moves within a neighborhood, guided by the fitness function.
The growing interest in metaheuristics is justified by the development of machines with enormous computational capacities, which has allowed the design of more and more metaheuristics that have proven quite efficient on many problems such as image segmentation. A metaheuristic is a generic approach whose operating principle is based on general mechanisms independent of any particular problem. Metaheuristics are stochastic and can therefore avoid being trapped in local minima. They are mainly guided by chance; however, they are often combined with other algorithms in order to accelerate their convergence. Metaheuristics are classified into two broad classes: single-solution metaheuristics and population-based metaheuristics. Population-based optimization methods improve a population of solutions over time; their advantage is to use the population as a factor of diversity. Single-solution optimization methods are called trajectory methods, i.e., they describe a trajectory in the search space during the search process. In the literature, several population-based algorithms have been employed to increase the quality of segmented images, such as evolutionary algorithms [10], genetic algorithms [11], the PSO algorithm [12], the Artificial Bee Colony (ABC) algorithm [13], the Sine Cosine Algorithm (SCA) [14], the cuckoo search algorithm [15] and others. The operation of metaheuristics is progressive and iterative. The initial solution is often chosen randomly, and the search ends when a stopping criterion is met. All metaheuristics rely on a balance between intensification and diversification of the search. Without this balance, the search either converges to local minima for lack of diversification or explores for too long for lack of intensification. Several different methods have been devoted to unsupervised automatic classification. However, the move towards metaheuristics has given very good results in some difficult cases. In order to improve the efficiency and performance of clustering-based image segmentation methods, we used a metaheuristic called the "Cuckoo Search Algorithm", described by the authors of [16]. This metaheuristic is an iterative stochastic method for solving many optimization problems. It has been very successful in the optimization community; its good performance in different applications and the possibility of hybridizing it with other metaheuristics have contributed to its popularity. The algorithm is inspired by the life style, habitat and reproduction of a bird species called the cuckoo. It is based on the parasitic behavior of this species combined with the Lévy flight movement pattern observed in certain bird and fly species.
In the literature, several metaheuristic algorithms have been used for image segmentation. We chose the cuckoo search because it relies on two fundamental mechanisms:
• Intensification, which refers to exploitation in the vicinity of the best solution found.
• Diversification, which refers to the efficient exploration of the entire search space.
To solve the problem of image segmentation and improve the quality of segmented images for use in various applications, we propose in this paper a new image segmentation method based on the hybridization of FCM clustering and the CSA algorithm. It finds the optimal cluster centers in the first step and starts the FCM clustering operation from them in the second step. Our method not only searches for the optimal solution globally, but also exploits the accurate local optimization ability of the FCM algorithm. We also compared the proposed technique with other existing clustering-based segmentation algorithms, such as FCM-GA and FCM-PSO. The experimental results showed the efficiency of our hybrid algorithm on the different types of images used in our work, as confirmed by a visual and statistical analysis of the results obtained.
The main contributions of this paper are the following:
• A clustering approach based on the cooperation between the FCM and CSA algorithms.
• Fixing the number of clusters k and, consequently, the center of each cluster.
• The use of CSA operators to generate the initial centers, from which FCM then starts.
• The evaluation of the proposed method with several well-recognized evaluation criteria.
• Results that confirm the robustness and performance of the proposed method compared to other algorithms.
In the next sections, we will first present a description of the workings of the CSA and FCM algorithms, in order to exploit the advantages of CSA to optimize the FCM clustering problem. In addition, the results achieved by our approach will be compared to other well-known image segmentation methods in the literature.
The structure of this document is as follows: Section 2 reviews the related work. Section 3 presents the background used. Our approach is described in Section 4, and the results obtained are discussed in Section 5. Finally, the conclusion is given in Section 6.

II. RELATED WORK
Data mining refers to the set of algorithms and methods used to explore and analyze large databases in order to detect unknown rules, associations and trends (not fixed a priori) as well as particular structures, and to restore concisely the essential information useful for decision support. It uses advanced statistical methods such as data partitioning (gathering data in homogeneous groups), and regularly employs artificial intelligence mechanisms or neural networks. Clustering groups objects with similar properties into several homogeneous classes such that the pairwise intersection of the classes is empty and the union of all classes gives the initial data set. The degree of overlap between classes and the multidimensional nature of the data are the most important difficulties in solving a classification problem. Data elements from different clusters should have minimal similarity [17]. Clustering is regarded as the main unsupervised learning problem, and the clustering process can be hard or fuzzy. A hard method assigns each object a single label; K-means, for example, is the most popular technique for hard clustering [18]. In fuzzy classification, an object can belong to several classes simultaneously [19]; FCM, for example, is widely used for image segmentation by fuzzy classification [20]. Fuzzy methods can easily be converted into hard methods.
The use of standard FCM for image segmentation has limited performance because the result depends strongly on the initial cluster centers. As a result, the algorithm quite often falls into locally optimal solutions and misses the global solution. Another disadvantage of FCM is its high sensitivity to image artifacts, such as noise and intensity inhomogeneity. In the literature, many bio-inspired techniques, such as the Genetic Algorithm (GA), Whale Optimization Algorithm (WOA), Ant Colony Optimization (ACO), Differential Evolution and PSO, have been combined with FCM to reduce its weaknesses [21].
Recently, other metaheuristic approaches have been employed to address several optimization problems; they can open new perspectives and improve image segmentation. Some of these approaches are described below.
In [22], the author described a firefly algorithm based on fuzzy classification. This algorithm has two phases. In the first phase, an optimal value is identified for the predetermined number of clusters; the result of this phase is then passed to the FCM algorithm to perform the segmentation. The results are promising compared to the traditional FCM algorithm.
In [23], the author introduced a new method for liver segmentation using the whale optimization algorithm (WOA). The proposed technique starts by dividing the image into a predefined number of classes. The clustering process of this method converts the prepared image into a binary image and after multiplication by the WOA segmented image. This technique is tested using a database of MRI images. The results demonstrate the robustness of the technique suggested by the authors.
Based on a metaheuristic algorithm called Grey Wolf Optimizer, the authors in [24], proposed a new algorithm for satellite image segmentation. This algorithm has been modified to work as an automatic clustering algorithm. This technique has been evaluated on satellite images and shows an efficient accuracy with a shorter computation time.
In [25], the author used a clustering strategy for fish image segmentation based on the Salp Swarm Algorithm (SSA). This method clusters the image pixels to produce compact and quasi-uniform superpixels. The experimental results show the performance and efficiency of the proposed model in different cases compared to related work.
In this work, we introduce a new clustering-based approach to image segmentation. It hybridizes the FCM algorithm and the CSA algorithm proposed by the researchers in [16]. The CSA is a recent optimization algorithm based on artificial intelligence, which has shown its robustness and efficiency on a large number of optimization problems. Many researchers have demonstrated the efficiency of Yang's CS algorithm in different applications, such as face recognition [26], neural network training [27] and engineering design [28]. The CSA has also been used in clustering problems, for example in [29]. Although the CS algorithm is simple, efficient and has few parameters, it sometimes falls into a local optimum during the search. Many researchers have therefore worked to improve its performance and have proposed improved versions of the CS algorithm [30,31]. In our paper, a hybrid algorithm between CSA and FCM is introduced for image segmentation by clustering. In the proposed method, the initial step size is computed randomly rather than fixed in advance. In addition, to avoid local extrema and improve the diversity of the cuckoos, this value changes non-linearly with the iterations. To judge the efficiency and performance of our proposal, we tested it on several images of different types and compared it with several classical clustering algorithms and metaheuristics.

III. BACKGROUND
A. Fuzzy C-Means Algorithm
FCM is a clustering technique developed by Bezdek in 1981. In image processing, FCM determines the degree of membership of each pixel in each cluster. Each pixel is initially assigned a value corresponding to its degree of membership in each cluster. This degree varies between 0 and 1: this is the fuzzification. A fuzzy rule then handles the defuzzification by assigning each pixel to a single class, namely the one in which it has the highest degree of membership. The concept is as follows: each of the N pixels belongs to each of the C classes with a membership coefficient, and the set of membership degrees is stored in the membership matrix U. This algorithm is often used in fuzzy image segmentation.
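The defuzzification rule described above can be sketched as follows. This is a minimal illustration, assuming the membership matrix U is stored as a list of rows (one row per pixel, one column per class); the function name `defuzzify` and the sample values are hypothetical.

```python
def defuzzify(U):
    """Assign each pixel to the class with the highest membership degree.

    U is a list of N rows (one per pixel); each row holds C membership
    degrees in [0, 1] that sum to 1.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in U]


# Hypothetical membership degrees for 3 pixels and 2 classes.
U = [
    [0.9, 0.1],
    [0.4, 0.6],
    [0.5, 0.5],  # tie: the first class reaching the maximum is kept
]
labels = defuzzify(U)
```

How ties are broken is a design choice of this sketch; the text only states that the class with the highest degree of membership is selected.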

• Principle of the FCM algorithm
The FCM algorithm [32] is a fuzzy segmentation technique applicable to different types of images. To partition the image, we minimize the sum of intra-class distances generalized to the fuzzy case, given by the following formula:
$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, d^{2}(x_i, c_j)$$
Under the following constraints:
$$u_{ij} \in [0, 1], \qquad \sum_{j=1}^{C} u_{ij} = 1 \;\; \forall i, \qquad 0 < \sum_{i=1}^{N} u_{ij} < N \;\; \forall j$$
Where $m \in [1, +\infty)$ is a parameter that characterizes the degree of fuzziness (in practice $m > 1$), and $d(x_i, c_j)$ is the Euclidean distance given by the following formula:
$$d(x_i, c_j) = \lVert x_i - c_j \rVert$$
The basic idea of FCM classification is to assign to each vector $x_i$ a degree of membership $u_{ij}$ to each class centered in $c_j$. The algorithm minimizes the criterion $J_m$ by iteratively computing the degrees of membership and the class centers. The membership degrees are updated by the following expression:
$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \dfrac{d(x_i, c_j)}{d(x_i, c_k)} \right)^{2/(m-1)}}$$
The function to update the centers is:
$$c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}$$
FCM is based on the update of the membership function at each iteration of the algorithm, and thus refines the partition by minimizing the objective function $J_m$. The FCM algorithm proceeds as follows: the centers (or the membership matrix) are initialized, then the membership degrees and the class centers are updated alternately until the change between two iterations falls below a tolerance or a maximum number of iterations is reached.
Verification of the results obtained by the clustering algorithm is an essential part of the clustering process. The most important method of cluster validation is based on internal cluster validity indicators. A clustering is good if the clusters are maximally separated from each other and if the objects within each cluster are as close as possible (compact) to its center of gravity. Thus, this operation separates data objects into different clusters with the goal of maximizing intra-cluster similarity and minimizing inter-cluster similarity. To evaluate the quality of the partitions produced by clustering algorithms, we use validity indices, which are numerous and well known in the literature.
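The alternating updates of FCM can be sketched in a few lines. The following is a minimal illustration, not the authors' MATLAB implementation: it assumes scalar gray levels stored in a flat list, the standard FCM membership and center updates, and a stopping rule based on the shift of the centers; the names `fcm`, `max_iter` and `tol` are hypothetical.

```python
def fcm(pixels, centers, m=2.0, max_iter=100, tol=1e-5):
    """Minimal FCM on scalar gray levels.

    pixels  : list of N gray levels
    centers : list of C initial class centers
    m       : fuzziness parameter (m > 1)
    Returns (centers, U); U[i][j] is the membership of pixel i in class j,
    computed in the last iteration.
    """
    centers = [float(c) for c in centers]
    power = 2.0 / (m - 1.0)
    U = []
    for _ in range(max_iter):
        # Membership update for fixed centers.
        U = []
        for x in pixels:
            d = [abs(x - c) for c in centers]
            if 0.0 in d:
                # The pixel coincides with a center: full membership there.
                hit = d.index(0.0)
                U.append([1.0 if j == hit else 0.0 for j in range(len(d))])
            else:
                U.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
        # Center update for fixed memberships (weighted mean of the pixels).
        new_centers = []
        for j, old in enumerate(centers):
            w = [row[j] ** m for row in U]
            total = sum(w)
            new_centers.append(
                sum(wi * x for wi, x in zip(w, pixels)) / total if total else old
            )
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, U


# Hypothetical data: two well-separated groups of gray levels.
pixels = [10, 12, 11, 200, 205, 198]
centers, U = fcm(pixels, centers=[0.0, 255.0])
```

On this toy data the two centers settle near the mean of each group. With a poor initialization the same loop can stop in a local optimum, which is the weakness the hybrid method addresses.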
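In our paper, we use two validity indices [33], the Partition Coefficient (PC) and the SC coefficient, in the new objective function:
• The Partition Coefficient (PC) determines the amount of overlap between clusters:
$$PC = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{2}$$
where $u_{ij}$ is the degree of membership of pixel $i$ in class $j$.
• The SC coefficient is defined in [33].
A clustering approach is considered better and more efficient if the PC values are high while the SC values are low.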
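As an illustration of the partition coefficient, the sketch below computes PC for a crisp and for a fully fuzzy membership matrix. It assumes the standard definition of PC (the mean of the squared membership degrees summed over the classes); the function name and the sample matrices are hypothetical.

```python
def partition_coefficient(U):
    """PC = (1/N) * sum_i sum_j u_ij^2.

    PC equals 1 for a crisp partition and 1/C when every pixel is shared
    equally between the C classes.
    """
    return sum(u * u for row in U for u in row) / len(U)


crisp = [[1.0, 0.0], [0.0, 1.0]]   # no overlap between the two classes
fuzzy = [[0.5, 0.5], [0.5, 0.5]]   # maximal overlap between the two classes
pc_crisp = partition_coefficient(crisp)
pc_fuzzy = partition_coefficient(fuzzy)
```

A higher PC therefore indicates less overlap, which is why high PC values are preferred.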

B. Cuckoo Search Algorithm
Metaheuristic methods are a new generation of powerful and general approximate methods built on a set of fundamental concepts. Among them is the Cuckoo Search algorithm, a recent metaheuristic created by Yang and Deb in 2009 and inspired by the brood parasitism of cuckoo birds, which lay their eggs in the nests of birds of other species. This algorithm aims at breeding high-quality solutions for optimization problems. In Cuckoo Search, an egg refers to a solution of the optimization problem at hand, a cuckoo egg refers to a newly generated solution, and a nest means a set of possible solutions. The algorithm is based on the aggressive cuckoo breeding strategy complemented by a behavior called Lévy flights [34]. A Lévy flight is a class of random walk in which the jump lengths follow the Lévy distribution, a power law with infinite variance and infinite mean [35].
The use of Lévy flights by CSA optimizes the search. New solutions are generated by a Lévy random walk around the best solution obtained so far, which accelerates the global search.
The fitness function is a function that gives each solution in the search space a numerical value to show its quality. In our treatment, a better quality nest will give us access to new generations. Thus, the quality of a cuckoo egg is necessarily related to its ability to produce a new cuckoo.
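Yang and Deb incorporated the Lévy flight in relation (10) to generate a new solution $X_i^{(t+1)}$ for each cuckoo $i$:
$$X_i^{(t+1)} = X_i^{(t)} + \alpha \oplus \text{Lévy}(\lambda) \qquad (10)$$
Where $\alpha > 0$ is the displacement step size and $\oplus$ denotes entry-wise multiplication.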
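The update of relation (10) can be sketched as follows. The paper does not state how the Lévy steps are sampled; this sketch assumes Mantegna's algorithm, a common choice, with an index `beta` that is usually related to the exponent by λ = 1 + β. The function names, the default values and the use of an independent step per coordinate are assumptions made for illustration.

```python
import math
import random


def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step with Mantegna's algorithm (assumed here)."""
    sigma_u = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0) or 1e-12  # guard against an exact zero
    return u / abs(v) ** (1 / beta)


def levy_move(x, alpha=0.01, beta=1.5, rng=random):
    """Return x + alpha * Levy step, drawn independently per coordinate."""
    return [xi + alpha * levy_step(beta, rng) for xi in x]


# Hypothetical usage: move a candidate pair of cluster centers.
rng = random.Random(0)
moved = levy_move([1.0, 2.0], alpha=0.01, rng=rng)
```

Most steps are small, but the heavy tail occasionally produces a long jump, which is what lets the search leave a local optimum.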
The Lévy flight is a random walk whose random step lengths are drawn from the Lévy distribution given by the following equation:
$$\text{Lévy} \sim u = t^{-\lambda}, \qquad 1 < \lambda \le 3$$
In general, the CSA steps are summarized in Algorithm 2.
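The overall loop of a cuckoo search can be sketched as below. This is a simplified illustration of the generic scheme (lay an egg by Lévy flight, compare it with a random nest, abandon a fraction Pa of the worst nests), not a transcription of Algorithm 2. The Mantegna sampling, the scaling of the step by the width of the search interval, the test function `sphere` and all parameter defaults are assumptions.

```python
import math
import random


def levy(beta, rng):
    # Mantegna's algorithm (an assumed way to sample Levy-like steps).
    s = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    return rng.gauss(0, s) / abs(rng.gauss(0, 1) or 1e-12) ** (1 / beta)


def cuckoo_search(f, dim, lo, hi, n_nests=15, pa=0.25, alpha=0.01,
                  beta=1.5, max_iter=200, seed=0):
    """Minimize f over [lo, hi]^dim; returns (best_nest, best_fitness)."""
    rng = random.Random(seed)
    nests = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_nests)]
    fit = [f(n) for n in nests]
    n_abandon = int(pa * n_nests)
    for _ in range(max_iter):
        # Lay a cuckoo egg: Levy flight from a randomly chosen nest.
        i = rng.randrange(n_nests)
        egg = [min(hi, max(lo, x + alpha * (hi - lo) * levy(beta, rng)))
               for x in nests[i]]
        egg_fit = f(egg)
        # Compare with a randomly chosen nest; keep the egg if it is better.
        j = rng.randrange(n_nests)
        if egg_fit < fit[j]:
            nests[j], fit[j] = egg, egg_fit
        # Abandon a fraction pa of the worst nests and build new random ones.
        order = sorted(range(n_nests), key=fit.__getitem__)
        for k in order[n_nests - n_abandon:]:
            nests[k] = [rng.uniform(lo, hi) for _ in range(dim)]
            fit[k] = f(nests[k])
    b = fit.index(min(fit))
    return nests[b], fit[b]


def sphere(x):
    """Hypothetical test function with its minimum at the origin."""
    return sum(v * v for v in x)


best, val = cuckoo_search(sphere, dim=2, lo=-5.0, hi=5.0)
```

Because the best nest is never abandoned and a nest is only replaced by a better egg, the best fitness in the population cannot get worse from one generation to the next.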

IV. THE PROPOSED METHOD
This paper proposes a new image segmentation technique using the CSA and FCM algorithms. The CSA has a strong global optimization capability, and the hybridization of FCM and CSA improves performance over traditional FCM clustering. In our approach, the CSA is used to find the optimal clustering of the data, using a new objective function to obtain the initial cluster centers. The centers found by CSA are then used as input for the FCM algorithm. The proposed technique treats the population as host nests, and the population as a whole yields a better solution at each generation. In other words, the best cuckoo egg is considered to be the optimal solution and is passed on to the next generation. The image segmentation relies on a clustering method that depends strongly on the cluster centers. The processing therefore starts by generating the cluster centers randomly and then applies the CSA to refine the location of each class center. CSA updates the class centers to minimize the fitness function in order to find near-optimal centers.

A. Fitness Function
The new objective function proposed in this paper to evaluate the quality of clustering results is presented as follows:
$$fitness = \frac{intra\_cluster + SC}{PC} \qquad (12)$$
The parameters of this function are:
• SC is the subarea coefficient determined by equation (7).
• PC is the partition coefficient presented in equation (8).
• intra_cluster [36] measures the distance between the pixels and the center of the cluster to which they belong.
The primary objective of our proposed method is to provide a cooperative technique that globally improves the performance of image segmentation and overcomes the limitations of FCM alone. For this purpose, we use the CSA algorithm to minimize the function shown in equation (12) and obtain near-optimal initial cluster centers. These centers are then used as initial inputs to the FCM. The fitness function is minimized when the term (intra_cluster + SC) is low and the value of PC is high.
The proposed approach can be summarized in the following points:
• The CSA finds the near-optimal centroids; FCM then starts from these cluster centers generated by CSA.
• The performance of the image segmentation results is evaluated by the fitness function given by equation (12).
• The role of CSA is to search for the optimized centroids.
• FCM is applied to the input image using the optimal centers generated by CSA.
• The clustering is done by merging the results, which gives the final segmented image.
The main steps of the hybrid algorithm of our method are presented in Fig. 1.
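To make the role of the fitness function of this section concrete, the sketch below scores two candidate sets of centers. It rests on several assumptions: the fitness is taken as the ratio (intra_cluster + SC) / PC, which is consistent with the requirement that (intra_cluster + SC) be low and PC be high; intra_cluster is taken as the mean squared distance between each pixel and its nearest center; and SC is supplied by the caller (0 by default) because its formula is not part of this sketch. All function names are hypothetical.

```python
def memberships(pixels, centers, m=2.0):
    """FCM membership degrees of each pixel for fixed candidate centers."""
    power = 2.0 / (m - 1.0)
    U = []
    for x in pixels:
        d = [abs(x - c) for c in centers]
        if 0.0 in d:
            hit = d.index(0.0)
            U.append([float(j == hit) for j in range(len(d))])
        else:
            U.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return U


def fitness(pixels, centers, sc=0.0, m=2.0):
    """Assumed form (intra_cluster + SC) / PC; lower is better."""
    U = memberships(pixels, centers, m)
    pc = sum(u * u for row in U for u in row) / len(U)
    # Assumed intra_cluster: mean squared distance to the nearest center.
    intra = sum(min((x - c) ** 2 for c in centers) for x in pixels) / len(pixels)
    return (intra + sc) / pc


# Hypothetical gray levels forming two groups.
pixels = [10, 12, 11, 200, 205, 198]
good = fitness(pixels, [11.0, 201.0])   # centers placed on the two groups
bad = fitness(pixels, [100.0, 110.0])   # centers placed between the groups
```

In the hybrid scheme, the optimizer would call such a function on every candidate nest (a vector of k centers) and pass the best-scoring centers to FCM as its initialization.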

V. EXPERIMENTAL RESULTS AND DISCUSSIONS
For a better segmentation of images, it is necessary to minimize the fitness function. The minimum fitness value corresponds to a segmentation with a minimum distance between pixels belonging to the same region. To evaluate the performance and robustness of the proposed approach in image segmentation, we performed several tests on different reference images. Furthermore, we compared the proposed method with other existing clustering-based segmentation techniques that perform well: FCM based on genetic algorithms [37], FCM based on particle swarm optimization [38] and the standard FCM algorithm. The algorithms used in the experiments are implemented in MATLAB 2014b and run on a computer with a 4th generation Intel Core (TM) i5 processor at 2.5 GHz and 4 GB of RAM, running 64-bit Microsoft Windows 10. The effectiveness of the image clustering approaches is analyzed using different evaluation indices that examine the quality of image segmentation: mean square error (MSE) [39], peak signal-to-noise ratio (PSNR) [40], Rand Index (RI) [41], global consistency error (GCE) [42], boundary displacement error (BDE) [42], variation of information (VOI) [42], and correlation coefficient (CC) [43].
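Three of the indices listed above, MSE, PSNR and CC, can be computed with a few lines of code. The sketch below assumes the usual definitions (mean of squared differences, 10·log10(peak²/MSE) with a peak of 255 for 8-bit images, and the Pearson correlation coefficient) applied to flattened gray-level images; the paper's exact formulas are those of [39], [40] and [43]. The sample data are hypothetical.

```python
import math


def mse(a, b):
    """Mean square error between two images given as flat lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    e = mse(a, b)
    return math.inf if e == 0 else 10.0 * math.log10(peak ** 2 / e)


def correlation(a, b):
    """Pearson correlation coefficient (undefined for a constant image)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Hypothetical original gray levels and their two-class segmentation,
# where each pixel is replaced by the center of its class.
original = [10, 12, 11, 200, 205, 198]
segmented = [11, 11, 11, 201, 201, 201]
err = mse(original, segmented)
quality = psnr(original, segmented)
cc = correlation(original, segmented)
```

A low MSE goes together with a high PSNR, since PSNR is a decreasing function of MSE for a fixed peak value.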
The FCM, FCM-PSO, FCM-GA, FCM-CSA algorithms are implemented in their original versions. Thus, the parameters have to be adjusted for each algorithm, in order to get the best matching values that can produce good image segmentation results with a short execution time.
First, we perform a series of experiments in which the number of clusters k is varied, to search for good image segmentation results according to the evaluation parameters mentioned above. Then, to optimize the results obtained by CSA, we set the fraction Pa = 0.25, which allows us to obtain the best solution. The value of each parameter was chosen empirically so as to approach the best image segmentation. The experiments show that the choice of the number of clusters k influences the quality of the segmented image, i.e. the best k depends on the image to be segmented. Therefore, to present the performance of the proposed technique and the evaluation measures of the algorithms used in this article, we focus on k = 4 for several images selected from the BSD300 database [44]. Table I shows the best values of the parameters that were tuned for the algorithms used in this paper (npop is the population size and MaxIt is the number of iterations). A clustering technique is considered effective if the PSNR value of the segmented image is large, the MSE value is small and the CC value is high. The MSE, PSNR and CC values of the segmented images are measured for the algorithms used in this paper. From the results shown in Fig. 4, 5 and 6, we can see that the MSE values obtained by the proposed approach are very small, while the PSNR values and the correlation coefficient values are high. This shows that the proposed approach, with the objective function proposed in this paper, generates correct segmentation results compared to the other algorithms.
According to the obtained results, we can conclude that the proposed hybrid algorithm performs well and gives better results: the images segmented by the proposed approach are well detailed, and the different regions of the image are visible.
In order to examine the effectiveness of the proposed technique, we compared the segmentation results obtained on different test images with all the algorithms used in this paper. We also evaluated the quality of the segmented images using four well-recognized image segmentation evaluation indices from the literature: RI, VOI, GCE and BDE, which were mentioned earlier. A segmented image is of good quality and closer to the ground truth if the value of RI is larger and the values of VOI, GCE and BDE are smaller.
According to the results displayed in Fig. 2, our approach gives better results than the other methods, bearing in mind that each segmentation result depends on the image content and on the number of classes chosen. For the comparison experiments, the number of clusters k was changed several times for the segmentation of different images. The experiments performed with the different algorithms show that the objects in each image may or may not be identified, depending on the image content and the choice of k. Based on these experiments, we set the number of clusters to 4 for all the images selected from the BSD300 database, in order to properly present the performance of our approach and clearly visualize the quality of the segmented images, as well as the evaluation indices of the methods used.
In the BSD300 database, each image has several ground-truth segmentations, so each segmentation result yields several groups of performance indices, one per ground truth. The average value over these groups is therefore generally taken as the final performance index of the segmentation result. We also note that the values of RI, VOI, GCE and BDE obtained by our technique are better than those obtained by the other techniques: the values of VOI, GCE and BDE of our algorithm are smaller, and the RI value is larger.
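The averaging over several ground truths can be illustrated with the Rand Index. The sketch below assumes the standard pairwise definition of RI (the fraction of pixel pairs on which two labelings agree about being in the same or in different regions) and a plain arithmetic mean over the ground truths; the function names and the tiny label vectors are hypothetical. The pairwise loop is quadratic in the number of pixels, so it is meant for illustration rather than for full-size images.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of pixel pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def mean_rand_index(segmentation, ground_truths):
    """Average RI of one segmentation against several ground truths."""
    scores = [rand_index(segmentation, gt) for gt in ground_truths]
    return sum(scores) / len(scores)


# Hypothetical labels for a 4-pixel image and two ground truths.
segmentation = [0, 0, 1, 1]
ground_truths = [
    [1, 1, 0, 0],  # same partition with permuted label names
    [0, 0, 0, 1],  # a different partition
]
ri_mean = mean_rand_index(segmentation, ground_truths)
```

The index depends only on the partition, not on the label names, which is why the first ground truth scores 1 even though its labels are swapped.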
Based on the results of the statistical calculations presented in the previous Fig. 3, 4 and 5 and the values of the parameters of the evaluation indices, indicated in Table II applying the different image segmentation techniques used in this paper, it can be seen that the quality of the segmentation image varies from one method to another depending on the optimization algorithm used to improve the classical FCM method. In summary, the cohesion within clusters is very high by our clustering technique compared to other clustering methods. The clustering technique used in this paper which is based on the FCM optimized by CSA gives good values in terms of cluster quality measures according to the experimental results. Therefore, the detailed analysis of these results on several reference and real images shows the robustness and high efficiency of our method in terms of accuracy and reliability.

VI. CONCLUSION
FCM is one of the most widely used clustering algorithms in classification problems, especially in image segmentation, because it is efficient and simple. However, FCM is sensitive to its initial values and often falls into local optima. To overcome this drawback, we proposed a new image segmentation method that relies on the optimization of segmentation by cuckoo search. CSA has a strong global optimization capability, and the hybridization of FCM with CSA gives increased performance compared to traditional FCM clustering. Our method has been applied to various images; despite their complexity, the segmentation performed by FCM gives quite good results, and with the help of CSA it improves further and approaches the optimal solution. The performance of the method has been evaluated based on the best values of the cluster evaluation indices and the values of the fitness function used in this paper. We also compared the proposed technique with other existing clustering-based segmentation algorithms such as FCM-GA and FCM-PSO. The results indicate that a good initialization of the classes leads to better results with the proposed algorithm. The experimental results showed the efficiency of our method on the different types of images used in our work, and a visual analysis of the results obtained confirmed its robustness.
Nevertheless, our approach requires the number of classes to be known, and it relies on the Euclidean distance to measure the similarity between an observation and the center of a class, which makes it suitable only for detecting spherical classes. To overcome these drawbacks, we propose, as a perspective of this work, to apply other hybrid methods based on recent metaheuristics to image segmentation in order to improve the quality of classification and reduce the execution time.