Genetic Algorithm Utilizing Image Clustering with Merge and Split Processes Which Allows Minimizing Fisher Distance Between Clusters

A genetic algorithm (GA) based image clustering method with merge and split processes, which allows minimizing the Fisher distance between clusters, is proposed. Through experiments with simulated and real remote sensing satellite imagery data, it is found that the proposed clustering method is superior to the conventional k-means and ISODATA clustering methods when the results are compared with geographic maps and with classification results from the Maximum Likelihood classification method.


INTRODUCTION
Clustering methods are either hierarchical or non-hierarchical. In particular, k-means and ISODATA are well known as conventional clustering methods with relatively high clustering performance. One problem of the conventional clustering methods is poor clustering performance when the data distributions overlap or when a data distribution has a concave, rather than convex, shape. Genetic Algorithm (GA) based clustering, with the Fisher distance between clusters or some other definition of distance as the fitness function, has also been proposed.
The clustering method proposed here is GA based clustering with merge and split of clusters. It therefore minimizes the fitness function (the Fisher distance, in this case) through the GA process, and it refines the clusters created by the GA process through merge and split processes like those of ISODATA.
The following section describes the proposed GA based clustering with merge and split processes, followed by some experiments. Conclusions are then given together with some discussion.

A. GA Based Clustering Method
In general, GA minimizes a fitness function, which is defined here as a distance between clusters. The performance of GA based clustering depends on the definition of the fitness function.
The fundamental scheme for GA based image clustering is shown in Figure 1. Image data are expressed as two dimensional m by m matrices. At first, a chromosome is defined as a pixel. Chromosomes are then selected (based on an elite selection strategy in this case). Through crossover and mutation processes, the chromosome is updated by minimizing a fitness function such as the distance between clusters. Thus the chromosome finally represents the cluster number. Convergence can be defined by the difference between the previous and the current chromosomes. The proposed clustering method introduces the Fisher distance (the ratio of within cluster variance to between cluster variance) as the fitness function.
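The fitness function described above can be sketched as follows. This is a minimal illustration, assuming that the Fisher distance is computed as the summed squared deviation of pixels from their cluster means divided by the size-weighted squared deviation of the cluster means from the global mean; the paper does not give the exact formula, and the name `fisher_ratio` and the list-based data layout are hypothetical.

```python
def fisher_ratio(points, labels):
    """Within-cluster variance divided by between-cluster variance.

    points: list of feature tuples (e.g. (band1, band2) per pixel).
    labels: list of cluster numbers, one per pixel.
    Assumed definition; smaller values mean tighter, better separated clusters.
    """
    n = len(points)
    dim = len(points[0])
    # Global mean of all pixels in feature space.
    g = [sum(p[d] for p in points) / n for d in range(dim)]
    groups = {}
    for p, k in zip(points, labels):
        groups.setdefault(k, []).append(p)
    within = 0.0
    between = 0.0
    for members in groups.values():
        m = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        within += sum((p[d] - m[d]) ** 2 for p in members for d in range(dim))
        between += len(members) * sum((m[d] - g[d]) ** 2 for d in range(dim))
    # A single cluster has no between-cluster spread; treat it as the worst case.
    return float("inf") if between == 0 else within / between


pixels = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = fisher_ratio(pixels, [0, 0, 1, 1])   # left pair vs. right pair
bad = fisher_ratio(pixels, [0, 1, 0, 1])    # clusters cut across the gap
```

In this toy case the labeling that follows the gap in the data yields a much smaller ratio than the labeling that cuts across it, which is the behavior a minimizing GA would exploit.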

B. Problem Description
Basically, GA provides one of the local minima, not the globally optimum solution. Therefore, clustering performance is not good enough, in particular when data distributions in feature space overlap as shown in Figure 2. The data should consist of three clusters; however, they overlap each other, so it is not easy to form the clusters. Furthermore, data distributions may have a concave shape rather than a convex shape. Figure 3 shows a conceptual illustration of the process flow of the proposed merge and split procedure. In this case, cluster #1 is distributed in a concave shape. If GA based clustering is applied to the data with the number of clusters set to two, cluster #1 at first includes not only cluster #1 data but also cluster #2 data. The number of clusters is then increased from two to three. After that, GA based clustering is applied to the data, which results in three clusters.
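The split and merge steps can be sketched roughly as below. The paper does not state its split or merge criteria, so this sketch assumes ISODATA-like rules: split the cluster with the largest variance along one feature axis at its mean, and merge the two clusters whose centroids are closest when they are nearer than a threshold. The function names and the `min_dist` parameter are hypothetical.

```python
def _groups(labels):
    """Map each cluster number to the indices of its pixels."""
    groups = {}
    for i, k in enumerate(labels):
        groups.setdefault(k, []).append(i)
    return groups


def split_widest(points, labels):
    """Split the cluster with the largest single-axis variance at its mean."""
    dim = len(points[0])
    best = None  # (variance, cluster, axis, mean)
    for k, idx in _groups(labels).items():
        for d in range(dim):
            m = sum(points[i][d] for i in idx) / len(idx)
            v = sum((points[i][d] - m) ** 2 for i in idx) / len(idx)
            if best is None or v > best[0]:
                best = (v, k, d, m)
    _, k, d, m = best
    new_label = max(labels) + 1
    # Pixels of the chosen cluster above the mean get the new cluster number.
    return [new_label if (lab == k and points[i][d] > m) else lab
            for i, lab in enumerate(labels)]


def merge_closest(points, labels, min_dist):
    """Merge the two clusters with the closest centroids if nearer than min_dist."""
    dim = len(points[0])
    cent = {k: [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]
            for k, idx in _groups(labels).items()}
    keys = sorted(cent)
    best = None  # (distance, smaller label, larger label)
    for pos, a in enumerate(keys):
        for b in keys[pos + 1:]:
            dist = sum((cent[a][d] - cent[b][d]) ** 2 for d in range(dim)) ** 0.5
            if best is None or dist < best[0]:
                best = (dist, a, b)
    if best is None or best[0] >= min_dist:
        return list(labels)
    _, a, b = best
    return [a if lab == b else lab for lab in labels]


pts = [(0, 0), (1, 0), (10, 0), (11, 0), (0, 5), (1, 5)]
after_split = split_widest(pts, [0, 0, 0, 0, 1, 1])
```

After a split raises the number of clusters from two to three, the GA based clustering would be run again on the data, as described above.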

A. Preliminary Simulations
In order to create the aforementioned situation of data distribution like Figure 3, two clusters of simulation data are created. The simulation data are generated manually and are distributed as shown in Figure 4. The relation between the Fisher distances after and before the merge and split process is shown in Figure 9. Fisher distance is increased remarkably by the GA algorithm.

B. Experiments with Remote Sensing Satellite Images
Landsat-5 TM image data of Saga city, Kyushu, Japan, acquired on 21 May 1985, are used for the experiments. Figure 10 shows two bands of the image while Figure 11 shows the data distribution on the two dimensional feature plane. The number of clusters is set to two. Maximum Likelihood classification (MLH) is applied to the data first. Then k-means, ISODATA and the proposed GA based clustering methods are applied. Figure 12 shows the clustered results of these methods. The clustered image from the proposed GA based method is very similar to that from the MLH classification method, while those from the conventional clustering methods, k-means and ISODATA, differ from it. The clustered results on the two dimensional feature plane for k-means, ISODATA and the proposed method are shown in Figure 13.
The same experiment is conducted with a different portion of the same satellite image data. Images of Band 1 and 2 are shown in Figure 14, and the data distribution on the two dimensional feature plane is shown in Figure 15. Using this image data, the aforementioned three clustering methods, k-means, ISODATA and the proposed GA based clustering, are applied and compared. Figure 16 shows the clustered results of these methods.
The clustered image from the proposed GA based method is very similar to that from the MLH classification method, while those from the conventional clustering methods, k-means and ISODATA, differ from it. The clustered results on the two dimensional feature plane for k-means, ISODATA and the proposed method are shown in Figure 17. For the proposed GA based method, the between cluster variance depends on the iteration number. Figure 18 (a) and (b) show the between cluster variances for the original images #1 and #2, respectively.
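For reference, the conventional k-means baseline used in these comparisons can be sketched as below. This is a plain Lloyd-style iteration written for illustration, not the implementation used in the experiments; the function name `kmeans`, the random initialization and the `seed` parameter are assumptions.

```python
import random


def kmeans(points, k=2, iters=100, seed=0):
    """Minimal k-means: assign pixels to the nearest center, then update centers."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct pixels chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


blobs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
km_labels, km_centers = kmeans(blobs, k=2)
```

Because each pixel is assigned to its nearest center, k-means partitions the feature plane with straight boundaries, which is one reason it can struggle with the overlapped and concave distributions discussed earlier.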

C. Comparison with Geographical Map
The clustered images are compared with a geographical map. Figure 19 (a) shows the geographical map while Figure 19 (b) shows the Landsat-5 TM image. Figure 20 shows the clustered images by the proposed method with and without merge and split processes. The clustered image by the proposed method with merge and split is much more similar to the geographical map data than that by the proposed method without merge and split processes. Therefore, it can be said that the proposed GA based clustering method is superior to the conventional clustering methods, k-means and ISODATA, as well as to the proposed GA based clustering method without merge and split processes.

Fig. 1. Fundamental scheme for the GA based clustering

Fig. 2. Well overlapped data distribution in feature space

The simulation data are distributed in a two dimensional feature space corresponding to two bands, Band 1 and Band 2. The image consists of 32 by 32 pixels. The two clusters of simulation imagery data of Band 1 and Band 2 are shown in Figure 5 (a) and (b), respectively. The clustered result by k-means clustering is shown in Figure 6 while the clustered result from the proposed method is shown in Figure 7. In the proposed method, the parameters for GA are as follows:
Decreasing factor for elite selection strategy: 0.75
Crossover probability: 0.6
Mutation probability: 0.03
Range for iteration: 1000 to 3000
Meanwhile, clustered resultant images for k-means and the proposed GA based clustering methods are shown in Figure 8.
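To show how the listed parameters might enter a GA loop, a small sketch follows. It rests on several assumptions that the paper does not confirm: a chromosome is taken to be the list of per-pixel cluster numbers, the decreasing factor 0.75 is read as a geometric decay of selection weight with fitness rank, crossover is one-point, and the single best chromosome is always carried over. The names `ga_cluster`, `pop_size` and `decay` are hypothetical, and the iteration count is reduced for the toy data.

```python
import random


def fisher_ratio(points, labels):
    """Assumed fitness: within-cluster variance over between-cluster variance."""
    n, dim = len(points), len(points[0])
    g = [sum(p[d] for p in points) / n for d in range(dim)]
    groups = {}
    for p, k in zip(points, labels):
        groups.setdefault(k, []).append(p)
    within = between = 0.0
    for members in groups.values():
        m = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        within += sum((p[d] - m[d]) ** 2 for p in members for d in range(dim))
        between += len(members) * sum((m[d] - g[d]) ** 2 for d in range(dim))
    return float("inf") if between == 0 else within / between


def ga_cluster(points, k=2, pop_size=20, iterations=200,
               decay=0.75, p_cross=0.6, p_mut=0.03, seed=0):
    """GA sketch; returns the best labeling and the best fitness per generation."""
    rng = random.Random(seed)
    n = len(points)
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    # Assumed reading of the decreasing factor: rank r gets weight decay**r.
    weights = [decay ** r for r in range(pop_size)]
    history = []
    for _ in range(iterations):
        pop.sort(key=lambda c: fisher_ratio(points, c))
        history.append(fisher_ratio(points, pop[0]))
        children = [pop[0][:]]  # elite: the best chromosome survives unchanged
        while len(children) < pop_size:
            a, b = rng.choices(pop, weights=weights, k=2)
            child = a[:]
            if rng.random() < p_cross:
                cut = rng.randrange(1, n)  # one-point crossover
                child = a[:cut] + b[cut:]
            # Mutation: each gene is redrawn with probability p_mut.
            child = [rng.randrange(k) if rng.random() < p_mut else gene
                     for gene in child]
            children.append(child)
        pop = children
    best = min(pop, key=lambda c: fisher_ratio(points, c))
    return best, history


data = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
best_labels, fitness_history = ga_cluster(data, iterations=30)
```

Because the best chromosome is always kept, the recorded best fitness cannot get worse from one generation to the next, which gives a simple way to monitor convergence.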

Fig. 9. Relation between Fisher distances after and before the merge and split process

Fig. 15. Data distribution on two dimensional feature plane

Fig. 18. Between cluster variances for the original images 1 and 2

Fig. 20. Clustered images by the proposed method with and without merge and split processes

IV. CONCLUSION
A genetic algorithm (GA) based image clustering method with merge and split processes, which allows minimizing the Fisher distance between clusters, is proposed. Through experiments with simulated and real remote sensing satellite imagery data, it is found that the proposed clustering method is superior to the