Color Image Segmentation via Improved K-Means Algorithm

Data clustering techniques are often used to segment real-world images. Unsupervised image segmentation algorithms based on clustering suffer from random initialization. There is a need for an efficient and effective image segmentation algorithm that can be used in computer vision, object recognition, image recognition, or compression. To address these problems, the authors present a density-based initialization scheme for segmenting color images. In the kernel density based clustering technique, the data sample is mapped to a high-dimensional space for effective data classification. The Gaussian kernel is used for the density estimation and for mapping the sample image into a high-dimensional color space. The proposed initialization scheme for the k-means clustering algorithm can homogeneously segment an image into regions of interest while avoiding dead centres and centres trapped in local minima. The experimental results indicate that the proposed approach is more effective than other existing clustering-based image segmentation algorithms. The Berkeley image database has been used for the comparative analysis with recent clustering-based image segmentation algorithms such as k-means++, k-medoids and k-mode.

Keywords: k-means; k-means++; k-medoids; k-mode; kernel density component


I. INTRODUCTION
Unsupervised color image segmentation is an important image processing technique with wide application in computer vision, pattern recognition, image retrieval [1], image editing [2], and medical image analysis [3]. The objective of image segmentation is to partition an image into homogeneous regions according to the needs of an application [4].
Image segmentation algorithms based on clustering can be subdivided into hierarchical and partitioning techniques. Hierarchical clustering is a bottom-up approach, in which a nested cluster structure is obtained by merging nearby data points. Partitioning clustering is an iterative process that takes the number of seeds k as input from the user and assigns each object of the data set to precisely one cluster [5]. Owing to their simplicity and ease of implementation, k-means clustering [6] and partitioning around medoids [7] are popular choices for image segmentation.
The k-means algorithm uses the features of an image to find k groups. It finds the groups by minimizing an objective function [8]. For the dataset $X = \{x_1, \dots, x_n\}$ with n observations, the purpose of k-means clustering is to find k groups in X such that the objective function $f_{km}$ is minimized, as shown in (1):

$$f_{km} = \sum_{j=1}^{k} \sum_{i=1}^{n} z_{ij} \, \lVert x_i - \mu_j \rVert^2 \qquad (1)$$

Here, $z_{ij}$ is an indicator variable defined in (2):

$$z_{ij} = \begin{cases} 1, & \text{if } x_i \in C_j \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

Here, $C_j$ represents the j-th cluster and $\mu_j$ represents the mean vector of the observations in $C_j$.
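As a minimal sketch of how this objective can be evaluated, the following computes the sum of squared distances between each observation and the mean of its assigned cluster. The function name `kmeans_objective` and the toy data are assumptions made for illustration; a list of integer labels stands in for the 0/1 membership variable.

```python
def kmeans_objective(points, labels, k):
    """Sum of squared distances from each point to the mean of its cluster."""
    dim = len(points[0])
    # Accumulate per-cluster sums and counts to obtain the mean vectors.
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for j in range(dim):
            sums[c][j] += p[j]
    means = [[s / counts[c] for s in sums[c]] if counts[c] else None
             for c in range(k)]
    # labels[i] == c plays the role of the membership indicator being 1.
    total = 0.0
    for p, c in zip(points, labels):
        total += sum((p[j] - means[c][j]) ** 2 for j in range(dim))
    return total


# Hypothetical toy data: two tight groups far apart along the first axis.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = kmeans_objective(points, [0, 0, 1, 1], 2)  # groups match the geometry
bad = kmeans_objective(points, [0, 1, 0, 1], 2)   # groups cut across it
print(good, bad)
```

A partition that follows the natural grouping yields a much smaller objective value than one that cuts across it, which is what the minimization exploits.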
The main advantage of k-means is that it always converges to a local optimum for any given initial centroid locations. Despite being used in a wide array of applications, the k-means algorithm is not exempt from limitations. From a practical point of view, the seed value of the algorithm is vital, since each seed can produce a different local optimum and therefore a different partition. The quality of the solution can remain far from the global optimum, even under repeated random initialization. Therefore, a good initialization is critical for finding the globally optimal partition. Several methods have been documented in the literature for improving the initialization procedure, which affects performance in terms of both quality and convergence properties.
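The sensitivity to the seed can be illustrated with a small sketch of the standard assignment and update iterations on one-dimensional data. The helper names `assign` and `lloyd`, the data, and the two seed sets are assumptions chosen only to make the effect visible.

```python
def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: (p - centers[c]) ** 2)
            for p in points]


def lloyd(points, centers, iters=100):
    """Plain k-means iterations on 1-D data from the given initial centers."""
    centers = list(centers)
    for _ in range(iters):
        labels = assign(points, centers)
        # Move each center to the mean of its members (keep it if empty).
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    cost = sum((p - centers[l]) ** 2 for p, l in zip(points, labels))
    return centers, cost


# Three well-separated groups; k = 3 in both runs.
points = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]
good_centers, good_cost = lloyd(points, [0.0, 10.0, 20.0])
bad_centers, bad_cost = lloyd(points, [0.0, 1.0, 10.0])
print(good_cost, bad_cost)
```

Both runs converge, but the second seed set places two centers in the same group and stalls in a much worse local optimum.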
In this paper, a density-based color image segmentation technique is used to improve the results of classical partition-based image segmentation methods such as the k-means clustering algorithm. The k-means algorithm is enhanced by providing a reduced-set representation of the kernelized center as the initial seed value. In the kernel density based clustering technique, the data sample is mapped to a high-dimensional space for effective data classification [9]. One popular choice is the Gaussian kernel, which is used in the proposed scheme for mapping the sample image into a high-dimensional color space. Moreover, a reduced-set kernelized center can be employed to reduce the computational complexity of various algorithms.
The experimental results were compared using four evaluation measures on the Berkeley image database: the Normalized Probabilistic Rand (NPR) index, Global Consistency Error (GCE), Variation of Information (VOI), and peak signal-to-noise ratio (PSNR). The experimental results indicate that the proposed approach is more effective than other existing clustering-based image segmentation algorithms such as k-means, k-means++, k-medoids and k-mode.
www.ijacsa.thesai.org
The rest of the paper is organized as follows. Section 2 describes the related work. Section 3 presents the proposed approach for image segmentation via a density-based initialization of the k-means algorithm, together with the validation measures. Results and comparative analysis are discussed in Section 4. Finally, the conclusion is presented in Section 5.

II. RELATED WORK
The k-means class of algorithms suffers from the random selection of initial cluster centers. The arbitrary choice of initial cluster centers leads to non-repeatable clustering results that may be difficult to interpret. The results of partition-based image segmentation algorithms are better when the initial partitions are close to the final solution. This section gives a short review of existing work on clustering-based image segmentation and on computing an initial seed value for the k-means algorithm used for color image segmentation.
T. Pavlidis, in 1982, surveyed the image segmentation process from a wide perspective. The survey summarizes the use of different methods such as clustering, edge-based segmentation, graph-based approaches, region growing, probabilistic or Bayesian approaches, and other approaches for image segmentation [10].
Chan et al., in 2001, introduced a region-based method known as the Chan-Vese (CV) model [11]. This method formulates the image segmentation problem as a k-means clustering model. As pointed out in [12], the CV model, being a global method, cannot handle the intensity irregularity problem well. Wang et al., in 2010, pointed out that the local binary fitting (LBF) method is sensitive to initialization. To address this, they introduced the local order method [13].
Besides improving global methods into local versions, many researchers focus on the convexity of the segmentation models.
Arthur and Vassilvitskii, in 2007, introduced the k-means++ algorithm, which chooses the initial centers one at a time, each with probability proportional to its squared distance from the nearest center already chosen [14]. Maitra, in 2009, used the local modes present in the data set to initialize the k-means algorithm used for segmentation [15].
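A minimal sketch of the k-means++ seeding idea is given below, assuming points are tuples and using the standard-library `random.Random.choices` for the weighted draw. The function name `kmeanspp_seeds` and the sample points are hypothetical.

```python
import random


def kmeanspp_seeds(points, k, rng):
    """Pick k seeds; each new seed is drawn with probability proportional
    to its squared distance from the nearest seed chosen so far."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    seeds = [rng.choice(points)]  # first seed: uniform at random
    while len(seeds) < k:
        # Already chosen points get weight 0 and cannot be drawn again.
        weights = [min(sqdist(p, s) for s in seeds) for p in points]
        seeds.append(rng.choices(points, weights=weights, k=1)[0])
    return seeds


# Hypothetical 2-D points; any pixel feature vectors could be used instead.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
seeds = kmeanspp_seeds(points, 3, random.Random(0))
print(seeds)
```

Because distant points receive larger weights, the seeds tend to spread over the data, although the outcome still depends on the random draw.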
In k-medoids [7] methods, a cluster is represented by one of its points. A set of k medoids is selected from the given data, and clusters are defined as the subsets of points close to the respective medoids. Two early versions of the k-medoids method, the partitioning around medoids (PAM) algorithm and the clustering large applications (CLARA) algorithm, are popular choices for image segmentation. PAM is an iterative optimization that combines the relocation of points between the respective clusters with re-nominating points as potential medoids.
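To make the notion of a medoid concrete, the sketch below picks, from a single cluster, the member with the smallest total Euclidean distance to the other members. This is only the representative-selection step, not the full PAM procedure; the name `medoid` and the data are assumptions.

```python
def medoid(cluster):
    """Member of the cluster with the smallest total distance to the others."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    return min(cluster, key=lambda c: sum(dist(c, p) for p in cluster))


cluster = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
center = medoid(cluster)
print(center)
```

Unlike a mean, the medoid is always an actual data point, so an outlier such as (10, 0) pulls it less strongly.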
Zhiding et al., in 2010, documented an adaptive unsupervised method for color image segmentation. The algorithm clusters the pixels in the 3D RGB color space by using the ant colony-fuzzy c-means hybrid algorithm (AFHA), which uses an ant system for intelligent initialization of the cluster centroids [16].
Khan et al., in 2013, introduced a novel initialization scheme to determine the number of clusters and obtain the initial cluster centers for the fuzzy c-means (FCM) algorithm to segment any kind of color image. A hierarchical approach is used to integrate the splitting and merging techniques in order to obtain an initialization for FCM [17]. S. Khan et al., in 2013, presented a solution for the randomized initialization of the k-mode algorithm. A prominent attribute selection method is used to find an initial cluster center. It performs multiple clusterings of the data based on the attribute values to obtain deterministic modes. These modes are used for initialization [18].
The k-modes [19] algorithm allows the user to work with a kernel density estimate of bandwidth "σ" but produces exactly k clusters. It finds centroids that are valid patterns and lie in high-density areas. The k-modes algorithm uses a local bandwidth at each point rather than a global one.
A good initialization scheme improves the results of clustering. Following the same direction, a new initialization technique for color image segmentation is proposed. The k-means, k-means++, k-medoids, and k-modes algorithms are used as baselines to demonstrate the effectiveness of the proposed approach.

III. PROPOSED METHOD
A pixel of a color image is represented by three values corresponding to R (red), G (green), and B (blue). By applying either a linear or a nonlinear transform to the RGB values, one can obtain color models based on intensity, saturation, and hue.
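As an illustration of such transforms, the sketch below derives intensity as a linear function of RGB (the channel mean) and takes hue and saturation from the nonlinear HSV conversion in Python's standard `colorsys` module. Using the HSV definitions of hue and saturation is an assumption made for brevity; the function name `rgb_to_hsi_like` is hypothetical.

```python
import colorsys


def rgb_to_hsi_like(r, g, b):
    """Map 8-bit RGB to (hue, saturation, intensity), each in [0, 1]."""
    rn, gn, bn = r / 255.0, g / 255.0, b / 255.0
    # Linear transform: intensity as the mean of the three channels.
    intensity = (rn + gn + bn) / 3.0
    # Nonlinear transform: hue and saturation from the HSV model.
    hue, saturation, _value = colorsys.rgb_to_hsv(rn, gn, bn)
    return hue, saturation, intensity


red = rgb_to_hsi_like(255, 0, 0)
gray = rgb_to_hsi_like(128, 128, 128)
print(red, gray)
```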
Each color space has its own characteristics. In color-based clustering, it is desirable that the selected color features are defined in a uniform color space [20]. In order to obtain a uniform color space, kernel density estimation is used to estimate the probability density function of a continuous random variable [21]. The proposed method uses a Gaussian kernel based initialization scheme for color image segmentation. Unlike standard k-means, the proposed algorithm uses a density estimate to select the initial cluster centers from the color space.
The following section presents the algorithm for selecting the initial seed values of the k-means algorithm, which have a high impact on color image segmentation. These initial points are selected from the denser regions of the data set.
The algorithm starts by choosing, in an n×m data matrix, the attribute having maximum variance. A Gaussian kernel is placed over each data point of the selected attribute to estimate the density. The first seed point is selected where the density is maximum. The next point is selected so that its density is equivalent to that of the initially selected point. In this way, k points are obtained that have similar density with respect to each other.
The kernel density technique is used to estimate the probability density function of a continuous random variable. Let $p_1, \dots, p_n$ be a sample of size n from a variable P; then the kernel density estimate is a sum of n kernel functions. In this paper, the popular Gaussian kernel is used.
Each Gaussian kernel function is centered on a sample data point with width h, which is defined as the bandwidth and controls the level of smoothing. The density of the data points depends on the width of the Gaussian kernel, so a proper value of h is obtained from the Silverman approximation rule, $h = 1.06\,\sigma\,|P|^{-1/5}$, where $|P|$ is the number of sample points and $\sigma$ is the standard deviation of the sample points P [22].
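A minimal sketch of the bandwidth computation follows, assuming the common rule-of-thumb form of Silverman's rule (1.06 times the sample standard deviation times the sample size to the power -1/5). The function name `silverman_bandwidth` and the sample values are hypothetical.

```python
import statistics


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth h = 1.06 * sigma * n**(-1/5)."""
    n = len(samples)
    sigma = statistics.stdev(samples)  # sample standard deviation
    return 1.06 * sigma * n ** (-1.0 / 5.0)


samples = [1.0, 2.0, 3.0, 4.0, 5.0]
h = silverman_bandwidth(samples)
print(h)
```

The bandwidth grows with the spread of the data and shrinks slowly as more samples become available, so larger samples are smoothed less.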
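In the one-dimensional case, with $p_i$ denoting the i-th sample point, the density estimator is defined as follows:

$$\hat{f}(p) = \frac{1}{n h} \sum_{i=1}^{n} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{(p - p_i)^2}{2 h^2}\right) \qquad (3)$$

For the d-dimensional case, the kernel function is the product of d Gaussian functions, each with its own bandwidth $h_j$; hence, the density estimator is defined as follows:

$$\hat{f}(p) = \frac{1}{n} \sum_{i=1}^{n} \prod_{j=1}^{d} \frac{1}{\sqrt{2\pi}\, h_j} \exp\!\left(-\frac{(p_j - p_{ij})^2}{2 h_j^2}\right) \qquad (4)$$

where a d-dimensional point p is denoted by $(p_1, \dots, p_d)$ and $p_{ij}$ is the j-th coordinate of the i-th sample point.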
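The two estimators can be sketched as follows, assuming a standard Gaussian kernel: the one-dimensional estimate averages kernels centered on the samples, and the d-dimensional estimate multiplies one Gaussian factor per coordinate, each with its own bandwidth. The function names and test values are assumptions for illustration.

```python
import math


def gaussian(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde_1d(x, samples, h):
    """One-dimensional estimate: average of kernels centered on the samples."""
    return sum(gaussian((x - s) / h) for s in samples) / (len(samples) * h)


def kde_product(x, samples, bandwidths):
    """d-dimensional estimate using a product of per-coordinate kernels."""
    total = 0.0
    for s in samples:
        term = 1.0
        for xj, sj, hj in zip(x, s, bandwidths):
            term *= gaussian((xj - sj) / hj) / hj
        total += term
    return total / len(samples)


samples = [0.0, 0.1, 5.0]
dense = kde_1d(0.05, samples, 0.5)   # near two of the three samples
sparse = kde_1d(5.0, samples, 0.5)   # near a single sample
single_2d = kde_product((0.0, 0.0), [(0.0, 0.0)], (1.0, 1.0))
print(dense, sparse, single_2d)
```

Points surrounded by many samples receive a higher estimate, which is the property the initialization scheme relies on.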
The proposed algorithm is briefly described in the following steps:
Step 1. The data set is first normalized.
Step 2. The attribute which has the maximum variance is selected.
Step 3. The density estimate for the selected attribute is computed by using the Gaussian window. For multidimensional data, the kernel function is computed as the product of d Gaussian functions, each with its own bandwidth h j, and the density estimator is given in (4).
Step 4. The first point is selected where the density is maximum. The next (k-1) points are selected from the other denser regions of the data set, such that the densities of the points are similar or equivalent with respect to each other.
Step 5. The indices of the selected k data points are used for initialization.
Step 6. The k-means algorithm is executed with the initial seed values computed in the previous step.
The motive behind the above method is to find the initial points from the denser areas of the given data set. In this way, the selected data points represent the common characteristics of the entire data set and are used for initialization.
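The steps above can be sketched roughly as follows. This is one possible reading, not the authors' implementation: min-max normalization, the Silverman bandwidth, and in particular the `min_gap` spacing used to keep the seeds in different dense regions are assumptions introduced here, and the function name `density_seeds` is hypothetical.

```python
import math
import statistics


def density_seeds(data, k, min_gap=0.2):
    """Return indices of k seed rows taken from dense, separated regions.

    min_gap is a hypothetical parameter (not from the paper): the minimum
    spacing, on the normalized attribute, between two accepted seeds.
    """
    n, m = len(data), len(data[0])
    # Step 1: min-max normalize every attribute to [0, 1].
    cols = []
    for j in range(m):
        col = [row[j] for row in data]
        lo, hi = min(col), max(col)
        cols.append([(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col])
    # Step 2: select the attribute with maximum variance.
    attr = max(cols, key=statistics.pvariance)
    # Step 3: Gaussian density at each data point (constant factor omitted,
    # which does not change the ranking; Silverman bandwidth assumed).
    h = 1.06 * statistics.stdev(attr) * n ** (-0.2)
    dens = [sum(math.exp(-0.5 * ((x - y) / h) ** 2) for y in attr) for x in attr]
    # Step 4: scan from densest to sparsest, keeping well-separated points.
    seeds = []
    for i in sorted(range(n), key=lambda i: -dens[i]):
        if all(abs(attr[i] - attr[s]) >= min_gap for s in seeds):
            seeds.append(i)
        if len(seeds) == k:
            break
    # Step 5: these row indices would initialize k-means (Step 6).
    return seeds


# Hypothetical data: rows 0-2 form one group, rows 3-5 another.
data = [(0.0, 1.0), (0.1, 1.2), (0.2, 1.4),
        (10.0, 1.0), (10.1, 1.2), (10.2, 1.4)]
seeds = density_seeds(data, 2)
print(seeds)
```

With this reading, the selection is deterministic for a given data set, in contrast to random seeding; how the paper enforces "similar density" among the k points may differ from the spacing rule assumed here.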
The modified k-means algorithm is now used to segment any color image. A color image is passed to the modified algorithm, and a suitable value for the number of segments is set to obtain the desired segmented image. Images from the Berkeley Image Database [26] are used to test the validity of the proposed method. Nine images are used, though it is possible to test the method over all the images.

A. Validation Measures
The Normalized Probabilistic Rand (NPR) index, Variation of Information (VOI), Global Consistency Error (GCE), and peak signal-to-noise ratio (PSNR) are used as validity measures to check the quality of the segmented image. It is important to evaluate the quality of the segmented images because different clustering algorithms give different results. The NPR index [23] is a generalized version of the Rand index, which is used to measure the quality of clustering results. The NPR index uses a set of hand-labeled ground-truth segmentations to compare image segmentation algorithms. The value of NPR lies in the range of -1 to 1, where a higher value indicates better segmentation.
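As a building block only, the sketch below computes the plain (unnormalized) Rand index between one segmentation and one ground-truth labeling, that is, the fraction of pixel pairs on which the two labelings agree. The NPR index additionally averages over several ground truths and normalizes by an expected value, which this sketch does not attempt. The function name `rand_index` and the label lists are assumptions.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of pixel pairs on which two labelings agree."""
    agree = 0
    pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b  # both "together" or both "apart"
        pairs += 1
    return agree / pairs


identical = rand_index([0, 0, 1, 1], [1, 1, 0, 0])  # same partition, relabeled
crossed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(identical, crossed)
```

The pairwise loop is quadratic in the number of pixels, so it is suitable only for small images or for illustrating the definition.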
The GCE measures the level of consistency between the outputs of two segmentation algorithms applied to a given image. It also shows whether there is a refinement relation between the two segmentations or a possible overlap of pixels. The range of the GCE is between 0 and 1; a value close to zero represents better segmentation. The Variation of Information metric defines the information gain or information loss between the two segmentations. It also measures the degree of randomness in a given segmentation. The range of VOI is [0, ∞); a smaller value indicates better results [24]. The PSNR value represents the region homogeneity between an image and its segmented image. A higher value indicates better segmentation results.
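The three measures can be sketched with their commonly used definitions, assumed here rather than taken from the paper: VOI as the sum of the two entropies minus twice the mutual information, GCE as the smaller of the two directional refinement errors averaged over pixels, and PSNR as ten times the base-10 logarithm of the squared peak value over the mean squared error. Function names and inputs are hypothetical; labelings and images are flattened to lists.

```python
import math
from collections import Counter


def variation_of_information(a, b):
    """VOI = H(A) + H(B) - 2 I(A; B), in nats, for two label lists."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    h_a = -sum(c / n * math.log(c / n) for c in pa.values())
    h_b = -sum(c / n * math.log(c / n) for c in pb.values())
    mi = sum(c / n * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
             for (x, y), c in pab.items())
    return h_a + h_b - 2.0 * mi


def gce(a, b):
    """Global Consistency Error: zero when one labeling refines the other."""
    n = len(a)
    size_a, size_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    # Local refinement error summed over pixels, in both directions.
    e_ab = sum((size_a[x] - joint[(x, y)]) / size_a[x] for x, y in zip(a, b))
    e_ba = sum((size_b[y] - joint[(x, y)]) / size_b[y] for x, y in zip(a, b))
    return min(e_ab, e_ba) / n


def psnr(original, segmented, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((o - s) ** 2 for o, s in zip(original, segmented)) / len(original)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


voi_same = variation_of_information([0, 0, 1, 1], [1, 1, 0, 0])
voi_crossed = variation_of_information([0, 0, 1, 1], [0, 1, 0, 1])
gce_refined = gce([0, 0, 0, 0], [0, 0, 1, 1])
gce_crossed = gce([0, 0, 1, 1], [0, 1, 0, 1])
psnr_value = psnr([0, 255], [0, 245])
print(voi_same, voi_crossed, gce_refined, gce_crossed, psnr_value)
```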

IV. RESULTS AND DISCUSSION
For each algorithm, its correctness is measured by the NPR index, GCE, VOI, and PSNR as well as its stability with respect to changes in the parameter settings and with respect to the different images.An algorithm which produces correct segmentation results with a wide array of parameters on any image, as well as accurate segmentation results on multiple images with the same parameters, will be useful for the preprocessing step in a larger system.
The results are based on the Berkeley image segmentation database [25], which contains 300 natural images along with several ground truth hand segmentations for each image [26].In contrast to the results presented in this database, the proposed algorithm uses the same image feature (position and color) for segmentation, thereby making their output directly comparable.
The proposed method can be applied directly to the 321×481 images taken from the Berkeley database. Because an image contains only a few homogeneous groups, it is efficient to reduce the image size for the computation of centroids. In addition, the MATLAB implementations of k-mode and k-medoids clustering encounter memory errors at high image resolutions (running 32-bit MATLAB on a machine with 2 GB RAM). Hence, the images are downsized to 64×64.
Nine images from the Berkeley image segmentation database are used. The images are segmented, Figure 1(a) to Figure 1(i), for k=2 and k=3; see Fig. 1. The value of k represents the number of color segments present in an image. The value k=3 is used where the database itself segments the image into three regions, so that the results of the proposed method can be compared with the human-segmented images. Accordingly, k=3 is used for images a, d, e, f, g, and i, and k=2 for images b, c, and h.
In Figure 1, the first column contains the true images, the second column the human-segmented images, and the third column the test images; all other columns are the outputs of the various algorithms used in the paper.
The results of the comparative analysis of recent clustering-based image segmentation algorithms are presented in Tables 1-9. A histogram comparison of the validation tests is also included for better visualization; see Figures 2-10.
After analysing the results, it is observed that the proposed approach (kernel) performs better than the recent clustering-based color image segmentation algorithms. The NPR results indicate that the proposed algorithm is better in eight of the nine images used for segmentation and equivalent for image (h), as can be seen in Tables 1-9 and Figures 2-10. The PSNR results show that the algorithm is better in eight of the nine cases with respect to the other algorithms used for comparison; only for image (f) does the proposed algorithm give poorer results. The GCE and VOI validation tests show that the proposed method is better in seven of the nine cases and equivalent in two cases; see Tables 1-9 and Figures 2-10. It can also be observed that the proposed algorithm gives significantly better results in terms of PSNR and NPR, whereas for the other two validation measures the differences are not significant. The NPR and PSNR values indicate that the proposed method is useful for color image segmentation. Numerical experiments on the images from the Berkeley database show that the proposed method performs competitively against the popular clustering-based image segmentation algorithms and often gives a solution close to the human-segmented images. Therefore, the proposed technique can be used for effective image segmentation.

V. CONCLUSIONS
A density-based algorithm for initializing the k-means algorithm has been proposed and used for color image segmentation. Four popular clustering-based image segmentation techniques are used for comparison of the results. The list does not include every possible strategy proposed in the literature; indeed, it is not practical to compare every available method. However, the work provides a starting point for refining and evaluating new strategies for the k-means algorithm used in image segmentation.
The image is first segmented with the k-means algorithm, then with k-means++, k-medoids, and k-mode, and finally with the proposed method. The results of the segmentation are compared with the help of four validation measures. On the basis of performance, the proposed method is better than k-means and the other partition-based segmentation techniques. The only difference between these techniques is the way of obtaining the initial seed pixels: k-means uses random selection, k-means++ draws pixels from a weighted probability distribution of the spectrum, and the proposed method uses the density of the pixels.

Fig. 1. Segmentation results (each row from left to right: original image, human segmentation, test image, k-means based segmentation, k-means++ based segmentation, k-medoids based segmentation, k-mode based segmentation, proposed algorithm)

Fig. 4. Comparison of validation results on image (c) for k=2

Fig. 7. Comparison of validation results on image (f) for k=3

Fig. 8. Comparison of validation results on image (g) for k=3

Fig. 9. Comparison of validation results on image (h) for k=2

Fig. 10. Comparison of validation results on image (i) for k=3

TABLE II. VALIDATION TEST RESULTS ON IMAGE (B)

TABLE III. VALIDATION TEST RESULTS ON IMAGE (C)

TABLE IV. VALIDATION TEST RESULTS ON IMAGE (D)

TABLE VII. VALIDATION TEST RESULTS ON IMAGE (G)

TABLE IX. VALIDATION TEST RESULTS ON IMAGE (I) (columns: k-means, k-means++, k-medoids, k-mode)