Image Clustering Method Based on Density Maps Derived from Self-Organizing Mapping: SOM

A new method for image clustering with density maps derived from Self-Organizing Maps (SOM) is proposed, together with a clarification of the learning process during the construction of clusters. It is found that the proposed SOM based image clustering method gives much better clustering results for both simulated and real satellite imagery data. It is also found that the separability among clusters of the proposed method is 16% greater than that of existing k-means clustering. In the experiments with a Landsat-5 TM image, more than 20000 iterations are needed for the SOM learning process to converge.


I. INTRODUCTION
Clustering methods are widely used for data analysis and pattern recognition [1]-[4]. The Self-Organizing Map (SOM) proposed by T. Kohonen is a two-layer neural network that can be used for unsupervised classification, or as a learning method [5], based on the similarity between the separable data groups to be classified [6]. In other words, SOM is a visualization tool for multi-dimensional data that rearranges the data according to their similarity through a learning process driven by the statistical characteristics of the data. It has often been used for pattern recognition in combination with Learning Vector Quantization (LVQ)¹. SOM consists of an m-dimensional input layer, represented as a vector, and a two-dimensional output layer, also represented as a vector; the nodes of the input and output layers are connected to each other through weighting coefficients. In the learning process, a winning unit is chosen based on the difference between the input vector and the weighting coefficient vectors, and then the selected unit and its surrounding units are moved closer to the input vector.
SOM is also utilized for clustering [7]. After the learning process, a density map² is created in accordance with the code vector density. Based on the density map, pixel labeling can be done. This is the basic idea of the proposed image clustering method with SOM learning. Apart from this, among clustering methods with learning processes, reinforcement learning has also been proposed for image retrieval [8] and rescue simulations [9]. A probability density model for SOM has also been proposed.

¹ http://en.wikipedia.org/wiki/Learning_Vector_Quantization
² http://books.google.co.jp/books?id=wxvQoFy1YBgC&pg=SA1-PA210&lpg=SA1-PA210&dq=density+map+SOM&source=bl&ots=sU95Gi28ug&sig=uZBXSATAqYaXPJtkmrGHts7uoqU&hl=ja&sa=X&ei=hijYT7L0CIibiQfn0NSTAw&ved=0CGkQ6AEwBA#v=onepage&q=density%20map%20SOM&f=false
The image clustering method with SOM learning based on the density map is proposed in the following section, followed by experimental results with satellite remote sensing imagery data. Finally, conclusions and some discussion are given.

II. PROPOSED IMAGE CLUSTERING METHOD
First, the imagery data are mapped to a feature space. In parallel, the SOM learning process creates a density map in accordance with the similarity between the input data mapped into the feature space and the two-dimensional density map. As a result of the SOM learning process, code vectors are obtained. The density of the code vectors is easy to recognize visually. Although the code vector density map represents cluster boundaries, it is not easy either to determine a boundary or to assign a label to the pixel in concern by using the density map directly. The method proposed here uses the density map to find boundaries among sub-clusters, and then merges the sub-clusters that have a high similarity, following the procedure of steps (1) to (6). In equation (1), $h(t)$ denotes the neighboring function, or weighting function, which includes the learning coefficient, and $N(t)$ denotes the number, or size, of the neighboring units.
$a(t)$ is called the learning coefficient and ranges from 0 to 1; $a_0$ is its initial value and $T$ denotes the total number of learning steps, that is, the number of updates. In equation (1), $[x(t) - m(t)]$ corresponds to the cost function to be minimized, and if

$c = \arg\min_i \lVert x - m_i \rVert$ (4)

is obtained, the corresponding unit $m_c$ is called the winning unit. The neighboring units are defined around the winning unit. The size of the neighborhood, $N(t)$, is a variable that starts relatively large and then becomes smaller, reaching the winning unit only by the end of the SOM learning process.
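The learning process described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the linear decay $a(t) = a_0 (1 - t/T)$, the square neighborhood whose radius shrinks linearly, the random initialization, and all names (`winner`, `train_som`, `radius0`) are introduced here for illustration only.

```python
import random


def winner(weights, x):
    """Return the (row, col) of the unit whose code vector is closest to x."""
    best, best_d = None, float("inf")
    for j, row in enumerate(weights):
        for k, m in enumerate(row):
            d = sum((xi - mi) ** 2 for xi, mi in zip(x, m))
            if d < best_d:
                best, best_d = (j, k), d
    return best


def train_som(data, rows, cols, total_steps, a0=0.5, radius0=None, seed=0):
    """Train a rows x cols SOM on `data` (a list of equal-length vectors)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Code vectors start at random positions in [0, 1).
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    if radius0 is None:
        radius0 = max(rows, cols) // 2
    for t in range(total_steps):
        frac = 1.0 - t / total_steps          # goes from 1 towards 0
        a_t = a0 * frac                       # assumed linear decay of a(t)
        radius = int(round(radius0 * frac))   # N(t): shrinks towards the winner
        x = data[rng.randrange(len(data))]
        cj, ck = winner(weights, x)
        for j in range(max(0, cj - radius), min(rows, cj + radius + 1)):
            for k in range(max(0, ck - radius), min(cols, ck + radius + 1)):
                m = weights[j][k]
                for d in range(dim):
                    m[d] += a_t * (x[d] - m[d])  # move towards the input
    return weights
```

Because every update is a convex combination of the old code vector and the input, code vectors initialized inside the data range stay inside it; only the speed of convergence depends on the assumed schedules.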
The SOM learning process is illustrated in Fig. 1. Existing clustering algorithms such as the k-means clustering algorithm⁵ are similar to the SOM learning process. If $m_i$ is redefined as the mean vector of cluster $i$, the cost function of k-means clustering can be expressed in terms of the difference between each input $x(t)$ and the mean vector $m_i$, and the mean vector of each cluster is determined so as to minimize this cost function, equation (6). Let $I(x(t))$ be a binary function that is equal to 1 if $x(t)$ belongs to cluster $i$ and 0 if it does not; the cost function can then be rewritten with $I(x(t))$ selecting the inputs that contribute to cluster $i$. Meanwhile, $m_i(x(t))$ is updated so as to reduce this cost function.
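The similarity between k-means clustering and SOM learning can be illustrated with a short sketch. It assumes the standard sum-of-squared-distances cost and an incremental update of the winning mean vector only, which corresponds to SOM learning with the neighborhood reduced to the winning unit. The $1/n_i$ learning rate and the function names are assumptions made for this example.

```python
def nearest(means, x):
    """Index i minimising ||x - m_i|| (the 'winning' cluster)."""
    dists = [sum((xi - mi) ** 2 for xi, mi in zip(x, m)) for m in means]
    return dists.index(min(dists))


def kmeans_cost(means, data):
    """Sum over the data of the squared distance to the nearest mean vector."""
    total = 0.0
    for x in data:
        m = means[nearest(means, x)]
        total += sum((xi - mi) ** 2 for xi, mi in zip(x, m))
    return total


def online_kmeans(data, means, epochs=20):
    """Update only the winning mean: m_i <- m_i + (x - m_i) / n_i."""
    means = [list(m) for m in means]
    counts = [0] * len(means)
    for _ in range(epochs):
        for x in data:
            i = nearest(means, x)      # indicator I = 1 for cluster i only
            counts[i] += 1
            rate = 1.0 / counts[i]     # assumed decreasing learning rate
            for d in range(len(x)):
                means[i][d] += rate * (x[d] - means[i][d])
    return means
```

With the $1/n_i$ rate each mean vector is the running average of the inputs assigned to it, so the update lowers the cost whenever the initial means are not already the cluster averages.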
Through SOM learning, differences among the input data are enhanced in the output layer, so that units with similar code vectors are formed. Meanwhile, similar input data become neighboring units in the output layer even if they are located apart from each other in the image. The density map $f(j,k)$ is defined for each output unit $(j,k)$ over its neighboring units $D$; in this paper, $D$ is the set of 8 neighboring units centered on the unit in concern. The relation among the input imagery data, the feature space and the SOM learning process is illustrated in Fig. 2. The density map is an inverse function of the concentration of similar data, so that the density map obtained by the SOM learning process is quite similar to the distribution of the input data mapped into the feature space.
Fig. 4 shows a preliminary result of the density map, the binarized density map and the clustering result as the iteration number increases. In this case, the initial variances of the two clusters are set at 0.03. As the number of iterations increases, the density map becomes clearer, as does the binarized density map, and the clustering result approaches the ideal one.
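One plausible reading of the density map over 8 neighboring units can be sketched as follows. The exact form of $f(j,k)$ is not reproduced here, so the sketch assumes that $f(j,k)$ is the mean Euclidean distance between the code vector of unit $(j,k)$ and those of its neighbors: small values then indicate dense code vectors and large values indicate sparse regions, that is, candidate cluster boundaries. The function name is illustrative.

```python
import math


def density_map(weights):
    """Mean distance from each unit's code vector to its (up to 8) neighbours."""
    rows, cols = len(weights), len(weights[0])
    f = [[0.0] * cols for _ in range(rows)]
    for j in range(rows):
        for k in range(cols):
            dists = []
            for dj in (-1, 0, 1):
                for dk in (-1, 0, 1):
                    p, q = j + dj, k + dk
                    # Skip the unit itself and positions outside the map.
                    if (dj or dk) and 0 <= p < rows and 0 <= q < cols:
                        dists.append(math.dist(weights[j][k], weights[p][q]))
            f[j][k] = sum(dists) / len(dists) if dists else 0.0
    return f
```

For a map whose left and right halves hold two different code vectors, this quantity is zero inside each half and positive only along the column where the halves meet.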

III. EXPERIMENTS

A. Simulation Data
Experiments with simulated imagery data and real satellite remote sensing imagery data are conducted. With a random number generator, three types of 30 sets of simulated imagery data, each consisting of 32 by 32 pixels, are generated. The first type is the most separable data set, with a cluster-to-cluster distance (between-cluster variance) of $\sigma_b = 8\sigma$, where $\sigma$ is the within-cluster variance; the third type is the most difficult to separate, with $\sigma_b = 3\sigma$; and the second type lies between the two, with $\sigma_b = 4\sigma$. The number of clusters is set to two.
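Simulated data of this kind could be generated along the following lines. The sketch rests on several assumptions that are not stated in the text: pixel values are Gaussian with a single feature per pixel, $\sigma$ is treated as a standard deviation, the two cluster means are placed symmetrically around 0.5, the clusters occupy the left and right halves of the image, and 30 images are generated per type. The names `simulate_image` and `datasets` are illustrative.

```python
import random


def simulate_image(ratio, sigma=0.03, size=32, seed=0):
    """Two-cluster image: left and right halves differ in mean by ratio * sigma."""
    rng = random.Random(seed)
    mean_left = 0.5 - ratio * sigma / 2.0
    mean_right = 0.5 + ratio * sigma / 2.0
    image = []
    for _ in range(size):
        row = []
        for col in range(size):
            mean = mean_left if col < size // 2 else mean_right
            row.append(rng.gauss(mean, sigma))
        image.append(row)
    return image


# Three difficulty levels (8, 4 and 3 times sigma), 30 images each (assumed).
datasets = {ratio: [simulate_image(ratio, seed=s) for s in range(30)]
            for ratio in (8, 4, 3)}
```

Fixing the seed per image keeps the data sets reproducible, which matters when convergence is compared across the three difficulty levels.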
Although the original simulated images are not shown in the figure, the right half of each image is cluster #1 and the left half is cluster #2. The number at the top shows the number of iterations, so the SOM learning process proceeds from the left-hand side. As illustrated in the figure, the estimated boundary in the density map varies remarkably, and the clustered result varies along with the changes in the density map. It is also found that the probability of correct clustering increases with the number of iterations.
An example of the SOM learning process is also shown in Fig. 6. SOM learning takes a long time when the between-cluster distance is relatively short (difficult to cluster), whereas it converges at around 1000 iterations when the between-cluster distance is relatively long (easy to cluster), as shown in Fig. 7. The true simulated data consist of two clusters that are adjacent to each other at the center line of the image. It is shown that the two clusters are separated into right and left regions as the iteration number increases, that is, as learning proceeds.
Fig. 8 shows a portion of the Landsat-5 TM image for each spectral band. Fig. 9 shows the clustered results of the proposed SOM based clustering with density map, k-means clustering, and supervised Maximum Likelihood (MLH) classification, together with the corresponding portion of the original Landsat-5 TM image. For these experiments with real satellite remote sensing imagery data, five classes, or clusters, are set: Ariake Sea, road, paddy field, bare soil, and artificial construction (houses). By referring to the corresponding topographic land use map of Saga, Japan, together with the original Landsat-5 TM image, it is found that the clustered result of the proposed method is more appropriate than those of k-means clustering and MLH. In particular, fine details such as narrow roads between paddy fields are classified by the proposed method. The SOM learning process is shown in Fig. 10. As the iteration number increases, the boundaries in the density map become much clearer, and the clustered results approach the true classification map.
Table 1 shows the confusion matrix between SOM clustering and MLH classification. The Percent Correct Classification (PCC) is 88.8%, so the results of SOM clustering and MLH classification are similar except for soil and water body. The spectral characteristics of soil and water body are quite similar, which explains the poor classification performance between them.
www.ijacsa.thesai.org
In this case, with the real satellite remote sensing imagery data, learning takes much longer (more than 20000 iterations are needed), as shown in Fig. 10. It is also found that there are a few local minima before the SOM learning converges.
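The way a confusion matrix and a PCC value are obtained from two label maps can be sketched as follows. The sketch assumes that the cluster labels have already been matched to the class labels of the reference classification, and the labels in the usage example are made up for illustration; they are not the values of Table 1.

```python
def confusion_matrix(reference, predicted, n_classes):
    """Rows: reference labels, columns: predicted labels."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for r, p in zip(reference, predicted):
        matrix[r][p] += 1
    return matrix


def percent_correct(matrix):
    """PCC = 100 * (sum of diagonal) / (total number of pixels)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total


# Toy example with three classes and five pixels (hypothetical labels).
reference = [0, 0, 1, 1, 2]
predicted = [0, 1, 1, 1, 2]
toy_matrix = confusion_matrix(reference, predicted, 3)
toy_pcc = percent_correct(toy_matrix)
```

Off-diagonal entries show which pairs of classes are confused, which is how a weakness confined to two spectrally similar classes can be read from the table.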
Separability is defined as the ratio between the intra-cluster variance and the between-cluster variance. It is also found that the mean separability, in terms of between-cluster variance, of the proposed SOM based image clustering method with density map is around 16% better than that of existing k-means clustering, as shown in Table 2.
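A separability measure of this kind can be computed as in the following sketch. The direction of the ratio is an assumption: the between-cluster variance is divided by the within-cluster variance, so that a larger value means better separated clusters. The function name and the pooled definitions of both variances are likewise illustrative.

```python
def separability(clusters):
    """Between-cluster variance divided by pooled within-cluster variance.

    `clusters` is a list of clusters, each a list of feature vectors.
    Raises ZeroDivisionError if every cluster has zero spread.
    """
    dim = len(clusters[0][0])
    n_total = sum(len(c) for c in clusters)

    def mean(vectors):
        return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    centres = [mean(c) for c in clusters]
    grand = mean([v for c in clusters for v in c])
    between = sum(len(c) * sq_dist(m, grand)
                  for c, m in zip(clusters, centres)) / n_total
    within = sum(sq_dist(v, m)
                 for c, m in zip(clusters, centres) for v in c) / n_total
    return between / within
```

Moving the cluster centres apart while keeping the spread of each cluster fixed increases this value, which is the sense in which one clustering result can be called more separable than another.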

IV. CONCLUSION
A new method for image clustering with density maps derived from Self-Organizing Maps (SOM) is proposed, together with a clarification of the learning process during the construction of clusters. It is found that the proposed SOM based image clustering method gives much better clustering results for both simulated and real satellite imagery data. It is also found that the separability among clusters of the proposed method is 16% greater than that of existing k-means clustering. In the experiments with the Landsat-5 TM image, more than 20000 iterations are needed for the SOM learning process to converge. Therefore, acceleration of the learning process is the next issue for research.

(1) Create a density map based on SOM learning.
(2) Generate a binary image from the density map.
(3) Define sub-clusters in accordance with the separated areas of the binary image.
(4) Calculate the similarities of the sub-clusters.
(5) Merge the sub-clusters which show the highest similarity.
(6) Repeat (4) and (5) until the number of clusters reaches the desired number of clusters.
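Steps (2) to (6) of the procedure can be sketched as follows, given a density map and the code vectors of the output units. The sketch makes several assumptions that the procedure leaves open: lower density-map values are taken to mean denser code vectors, a fixed threshold is used for binarization, sub-clusters are 4-connected regions, and similarity is measured by the distance between the mean code vectors of two sub-clusters. Boundary units are left unlabeled (label 0). All function names are illustrative.

```python
import math


def binarize(density, threshold):
    """Step (2): 1 marks dense (interior) units, 0 marks sparse boundary units."""
    return [[1 if v < threshold else 0 for v in row] for row in density]


def label_regions(binary):
    """Step (3): 4-connected regions of 1s become sub-clusters 1, 2, ..."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for j in range(rows):
        for k in range(cols):
            if binary[j][k] == 1 and labels[j][k] == 0:
                current += 1
                labels[j][k] = current
                stack = [(j, k)]
                while stack:
                    p, q = stack.pop()
                    for dp, dq in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        r, s = p + dp, q + dq
                        if (0 <= r < rows and 0 <= s < cols
                                and binary[r][s] == 1 and labels[r][s] == 0):
                            labels[r][s] = current
                            stack.append((r, s))
    return labels, current


def merge_subclusters(weights, labels, n_labels, n_clusters):
    """Steps (4)-(6): merge the two closest sub-clusters until n_clusters remain."""
    groups = {i: [] for i in range(1, n_labels + 1)}
    for j, row in enumerate(labels):
        for k, lab in enumerate(row):
            if lab:
                groups[lab].append(weights[j][k])

    def centre(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    while len(groups) > n_clusters:
        ids = sorted(groups)
        # The most similar pair is the one with the closest mean code vectors.
        a, b = min(((x, y) for i, x in enumerate(ids) for y in ids[i + 1:]),
                   key=lambda pair: math.dist(centre(groups[pair[0]]),
                                              centre(groups[pair[1]])))
        groups[a].extend(groups.pop(b))
        labels = [[a if lab == b else lab for lab in row] for row in labels]
    return labels
```

A pixel can then be labeled with the cluster of its winning unit; how pixels whose winning unit lies on a boundary are handled is a further design choice not covered by this sketch.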

Figure 1. Illustrative view of the SOM learning process.

⁵ http://books.google.co.jp/books?id=WonHHAAACAAJ&dq=k-means+clustering&hl=ja&sa=X&ei=hirYT_DvF8PJmQWX8KGgAw&ved=0CD4Q6AEwAQ

An example of the density map obtained from the input data is illustrated in Fig. 3. In the figure, dark portions indicate where the code vectors are dense, whereas light portions indicate where the code vectors are sparse and correspond to the boundaries between different clusters.

Figure 2. Relations among the input imagery data, feature space and density map generated through SOM learning.

Figure 3. Example of density map as a result of SOM learning process.

Figure 4. Example of a preliminary result of the density map, binarized density map and clustering result as the iteration number increases (the iteration number shown is multiplied by 512).

Fig. 5 shows examples of the density map, estimated boundary and clustered result for the most easily separable type of simulated imagery data. Clustering is done in an iterative manner; the example shows iterations 1 to 9. The density map and the estimated boundary change from iteration to iteration, which refines the clustered result, so that the proposed method can reach a final clustered result.

Figure 5. Examples of the density map, estimated boundary and clustered result for the most easily separable type of simulated imagery data.

Figure 6. Example of the SOM learning process for the three simulated imagery data sets.

Figure 7. Landsat-5 TM image of northern Kyushu, Japan, used in the experiments.

Figure 10. SOM learning process (density map: top row; clustered result: bottom row; the iteration number is x multiplied by 1024).

Figure 10. Example of a learning process for Landsat-5 TM imagery data clustering with the proposed SOM based image clustering method.

TABLE I. CONFUSION MATRIX BETWEEN SOM AND MLH

TABLE II. SEPARABILITY AMONG FIVE CLUSTERS FOR BOTH K-MEANS CLUSTERING AND THE PROPOSED CLUSTERING METHOD FOR LANDSAT TM IMAGERY DATA