Pursuit Reinforcement Competitive Learning: PRCL Based Online Clustering with Tracking Algorithm and Its Application to Image Retrieval

Pursuit Reinforcement guided Competitive Learning (PRCL), a relatively fast online clustering method based on reinforcement guided competitive learning, is proposed. It groups the data of concern into several clusters even when the number and the distribution of the data vary. One application of the proposed method is the retrieval of image portions from relatively large images such as Earth observation satellite images. It is found that the proposed method is relatively fast in these retrievals compared with existing conventional online clustering methods such as Vector Quantization (VQ). Moreover, the proposed method is much faster than the others for multi-stage retrieval of image portions as well as for scale estimation.

Keywords: Pursuit Reinforcement Guided Competitive Learning; Reinforcement Guided Competitive Learning; Sustained Reinforcement Guided Competitive Learning; Vector Quantization; Learning Automata


I. INTRODUCTION
Clustering is an exploratory data analysis tool that deals with the task of grouping objects that are similar to each other [1,2,3]. Over many years, many clustering algorithms have been proposed and widely used. Clustering is commonly used in many fields, such as data mining, pattern recognition, image classification, biological sciences, marketing, city planning, and document retrieval.
Clustering is commonly applied to static data. That is, clustering is performed after all of the data have been collected, and the data are then grouped into clusters whose members are similar in some way. In data mining, however, there are data that arrive continuously, so that the stream cannot be paused in order to perform clustering.
Online clustering is a kind of clustering used for such dynamic data. It does not consider the total number of data but focuses only on the new datum and the previous centroids. Several approaches have been proposed for determining the position of each centroid when a new datum arrives. Vector Quantization (VQ) is a very simple approach to online clustering. It is derived from the concept of the competitive learning network [4], [5]. Likas (1999) proposed Reinforcement Guided Competitive Learning (RGCL) [6] as an approach to online clustering based on reinforcement learning. It applies the concept of reward in reinforcement learning to the winning unit in Learning Vector Quantization. Sustained RGCL (SRGCL) is a modification of RGCL that considers sustained exploration in reinforcement learning. On the other hand, other approaches such as modified ISODATA, k-means clustering, Self-Organizing Map (SOM) based clustering, clustering that utilizes spatial features, clustering that utilizes the Fisher distance measure, GA based clustering and so on have been proposed in order to improve clustering performance [7]-[25].
A new approach to online clustering based on reinforcement learning, called Pursuit Reinforcement Guided Competitive Learning (PRCL), is proposed. PRCL is derived from the pursuit method in reinforcement learning, which maintains both action-value estimates and action preferences, with the preferences continually pursuing the action that is greedy according to the current action-value estimates, together with learning automata. PRCL can be used as an online clustering method. One of its applications, image retrieval, is then introduced.
The following section describes the proposed PRCL with learning automata together with the existing conventional online clustering methods RGCL, SRGCL and VQ. Then preliminary experiments are described, followed by the application to image retrieval. Finally, the conclusion is given with some discussion.

A. Reinforcement Learning
Reinforcement Learning is learning what to do, that is, how to map situations to actions, so as to maximize a numerical reward signal [4]. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward, but also the next situation and, through that, all subsequent rewards. These two characteristics (trial-and-error search and delayed reward) are the two most important distinguishing features of Reinforcement Learning.
Reinforcement Learning is defined not by characterizing learning algorithms, but by characterizing a learning problem. Any algorithm that is well suited to solving that problem is considered to be a Reinforcement Learning algorithm. Clearly, such an agent must be able to sense the state of the environment to some extent and must be able to take actions that affect that state. The agent must also have a goal or goals relating to the state of the environment.
www.ijarai.thesai.org

B. Competitive Learning
Competitive learning is defined as an unsupervised learning method in which one of the output neurons fires as the result of competition among the neurons, without training samples. It has features of both supervised and unsupervised learning methods. It is applicable to clustering. In that case, the inputs are the data of concern while the outputs are the clusters. The following cost function is usually used,
$$J = \sum_{i} \min_{r} d(x_i, w_r)$$
In clustering based on competitive learning, J is built from the minimum distance $\min_r d(x_i, w_r)$ between the input data $x_i$ and the cluster centers $w_r$. Therefore, J is the sum of the dissimilarities within the clusters.
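As a small illustration of the cost function J described above, the following sketch sums, over all inputs, the distance to the nearest cluster center. The Euclidean distance and the names `euclidean` and `clustering_cost` are assumptions made for this example only; the paper does not fix a particular distance at this point.

```python
import math

def euclidean(x, w):
    """Euclidean distance between two equal-length vectors (assumed choice of d)."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))

def clustering_cost(data, centers):
    """J: sum over all inputs of the distance to the nearest cluster center."""
    return sum(min(euclidean(x, w) for w in centers) for x in data)

# Toy data: two points near the first center, one point on the second center.
data = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
centers = [(0.0, 0.0), (5.0, 5.0)]
cost = clustering_cost(data, centers)  # 0 + 1 + 0
```

A smaller J means that the inputs lie closer to their assigned centers, which is why J can serve as a residual error when comparing clustering methods.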

C. Simple Competitive Learning
SCL (Simple Competitive Learning) is the simplest form of competitive learning. The basic idea of SCL is WTA (Winner Take All). Namely, the winning neuron takes all. The winner neuron is the one whose weight is closest to the input, and its weight is updated with the following equation,
$$\Delta w_i = \begin{cases} \alpha\,(x - w_i) & \text{if neuron } i \text{ is the winner} \\ 0 & \text{otherwise} \end{cases}$$
where $w_i$ denotes the weight of neuron i, x denotes the input data, and α is the coefficient that determines the convergence speed.
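The WTA rule can be sketched as follows. This is a minimal illustration that assumes the winner is chosen by the smallest squared Euclidean distance and that only the winner moves toward the input by the fraction α; the function names are hypothetical.

```python
def winner(x, weights):
    """Index of the weight vector closest to x (squared Euclidean distance, assumed)."""
    return min(range(len(weights)),
               key=lambda i: sum((a - w) ** 2 for a, w in zip(x, weights[i])))

def scl_update(x, weights, alpha=0.1):
    """Winner Take All: move only the winning weight vector toward x, in place."""
    i_star = winner(x, weights)
    weights[i_star] = [w + alpha * (a - w) for a, w in zip(x, weights[i_star])]
    return i_star

weights = [[0.0, 0.0], [10.0, 10.0]]
i_star = scl_update([1.0, 1.0], weights, alpha=0.5)
# The first neuron wins and moves halfway toward the input; the second is untouched.
```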

VQ (Vector Quantization)
The process flow of VQ is as follows,
1) The weight vector of each cluster is defined as a sample vector selected for that cluster.
2) The input data is assigned to the cluster i* whose sample vector shows the shortest distance to the input data.
3) The sample vector is updated based on the following equation,
$$\Delta w_i(t) = \begin{cases} \alpha\,\bigl(x_j - w_i(t)\bigr) & \text{if } i = i^* \\ 0 & \text{otherwise} \end{cases}$$
where t denotes the learning number, which is incremented for each input data. The weight is changed by $\Delta w_i$ when the input data $x_j$ matches the sample vector at the t-th learning step. Meanwhile, the weight is not changed when the input data does not match any sample vector. By repeating this process, the representative vectors are updated and the most appropriate clusters are formed.
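The three VQ steps can be sketched as one pass over a data stream. This is only an illustration: initializing the codebook with the first k samples, the fixed rate α, and the synthetic two-blob data are assumptions, not details taken from the paper.

```python
import random

def nearest(x, codebook):
    """Index i* of the sample vector with the shortest distance to x."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - w) ** 2 for a, w in zip(x, codebook[i])))

def online_vq(stream, k, alpha=0.1):
    """One pass of online VQ over the stream; returns (codebook, labels)."""
    stream = list(stream)
    # Step 1 (assumption): the first k samples serve as the initial sample vectors.
    codebook = [list(x) for x in stream[:k]]
    labels = []
    for x in stream:  # each input increments the learning number by one
        i_star = nearest(x, codebook)                      # step 2: assignment
        codebook[i_star] = [w + alpha * (a - w)            # step 3: winner update
                            for a, w in zip(x, codebook[i_star])]
        labels.append(i_star)
    return codebook, labels

# Synthetic stream: two well separated blobs, one sample of each placed first.
rng = random.Random(0)
blob_a = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(50)]
blob_b = [(rng.gauss(5, 0.1), rng.gauss(5, 0.1)) for _ in range(50)]
stream = [blob_a[0], blob_b[0]] + blob_a[1:] + blob_b[1:]
codebook, labels = online_vq(stream, k=2)
```

Because each input touches only one sample vector, the cost per input is independent of how many data have already been seen, which is the property that makes VQ usable online.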

RGCL (Reinforcement Guided Competitive Learning)
All the clusters of output neurons are represented with Bernoulli units whose weights are vectors. Following the RGCL formulation of [6], the distance between the input data x and the weight vector $w_i$ is calculated with the following equation,
$$s_i = d(x, w_i) = \lVert x - w_i \rVert^2$$
Then the probability $p_i$ is calculated with equation (5),
$$p_i = 2\left(1 - \frac{1}{1 + e^{-s_i}}\right) \qquad (5)$$
The probability increases as the input data gets closer to the weight vector. Therefore, a neuron whose output is 1 is likely to be close to the input data, and the input data is assigned to the cluster represented by the neuron whose output is 1.
The process flow of RGCL is as follows,
1) A datum is selected randomly from the samples.
2) The winner neuron i* is determined.
3) The reward $r_i$ for the input data $x_j$ is updated as follows,
$$r_i = \begin{cases} 1 & \text{if } i = i^* \text{ and } y_i = 1 \\ -1 & \text{if } i = i^* \text{ and } y_i = 0 \\ 0 & \text{if } i \neq i^* \end{cases}$$
where $y_i \in \{0, 1\}$ denotes the output of unit i. Then the weight vector is updated as follows,
$$\Delta w_i = \alpha\, r_i\, (y_i - p_i)(x_j - w_i)$$
SRGCL is a method that allows the convergence speed to be controlled with the parameter η, which is added to RGCL as follows,
$$\Delta w_i = \alpha\, r_i\, (y_i - p_i)(x_j - w_i) - \eta\, w_i$$
SRGCL is not always superior to RGCL. The convergence performance depends on the relation between the input data and the control parameter. Therefore, the most appropriate control parameter has to be determined.
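A single RGCL step can be sketched as below, assuming the Bernoulli-unit formulation of Likas [6]: squared Euclidean distance s, firing probability p = 2(1 - 1/(1 + exp(-s))), reward +1 or -1 for the winner, and a decay term -ηw for the sustained (SRGCL) variant. Applying the decay only to the winning unit is a simplification made for this sketch, and `rgcl_step` is a hypothetical name.

```python
import math
import random

def rgcl_step(x, weights, alpha=0.1, eta=0.0, rng=random):
    """One RGCL update in place (SRGCL when eta > 0); returns (winner, y, reward)."""
    # Squared distances between the input and every weight vector.
    s = [sum((a - w) ** 2 for a, w in zip(x, wi)) for wi in weights]
    i_star = min(range(len(weights)), key=lambda i: s[i])
    # Firing probability of the winner: close to 1 when x is near its weight.
    p = 2.0 * (1.0 - 1.0 / (1.0 + math.exp(-s[i_star])))
    y = 1 if rng.random() < p else 0      # stochastic Bernoulli output
    r = 1 if y == 1 else -1               # reward of the winning unit
    # Non-winning units receive zero reward, so only the winner changes.
    weights[i_star] = [w + alpha * r * (y - p) * (a - w) - eta * w
                       for a, w in zip(x, weights[i_star])]
    return i_star, y, r

weights = [[0.0, 0.0], [4.0, 4.0]]
x = [1.0, 1.0]
before = math.dist(x, weights[0])
i_star, y, r = rgcl_step(x, weights, alpha=0.1, rng=random.Random(1))
after = math.dist(x, weights[0])
```

With η = 0 the product r(y - p) is positive for both outcomes of y, so the winner always moves toward the input; only the step size depends on the sampled output.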

Tracking Algorithm
The N-armed bandit problem is a machine learning problem in which the most appropriate strategy for obtaining the maximum prize from a slot machine with at least one lever is analyzed. Learning automata are one efficient method for solving the N-armed bandit problem. The action of pulling one specific lever is represented as a, a play is indexed by t, and the probability of the prize is expressed as $\pi_t(a)$. The total prize at the (n + 1)-th play depends on the prize accumulated up to the n-th play and the current prize. When the total prize increases, the probability is updated as follows,
$$\pi_{t+1}(a) = \pi_t(a) + \beta\,\bigl[1 - \pi_t(a)\bigr]$$
When the total prize decreases, the probability is updated as follows,
$$\pi_{t+1}(a) = \pi_t(a) + \beta\,\bigl[0 - \pi_t(a)\bigr]$$
where β is the parameter that controls the convergence speed. If appropriate actions are always selected, the total prize approaches the maximum prize. This method is one of the learning automata. Namely, a reward is provided when a win is predicted, while a punishment is given when a loss is predicted. Through these processes with actions, the total prize gets closer to the maximum prize.
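The tracking update can be sketched with the pursuit-style rule, assuming that the probability of the currently favored action is pulled toward 1 and all others toward 0 at rate β. The function name and the four-lever example are illustrative assumptions.

```python
def pursuit_update(probs, favored, beta=0.1):
    """Pull the favored action's probability toward 1 and the others toward 0."""
    return [p + beta * ((1.0 if a == favored else 0.0) - p)
            for a, p in enumerate(probs)]

# Four levers, initially equally likely; lever 2 keeps being the favored one.
probs = [0.25, 0.25, 0.25, 0.25]
for _ in range(50):
    probs = pursuit_update(probs, favored=2, beta=0.1)
```

The update keeps the probabilities summing to one, and a larger β makes the preference follow the favored action faster at the price of less exploration.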

E. Proposed Clustering Method
In the convergence process of RGCL, it sometimes happens that convergence slows down or becomes unstable because the weight is too large or too small. The method proposed here uses learning automata to adjust the weight. Namely, the most appropriate prediction of the win/lose probability can be made with learning automata. Thus the most appropriate reward and punishment can be given.
An online clustering method based on competitive and reinforcement learning as well as learning automata is proposed here. Namely, the winner neuron is first determined with WTA, based on a competitive neural network as the basic learning method, and a reward is calculated from the result for the winner neuron based on learning automata. Then the final winner neuron is determined through the agent action that has the maximum reward, based on the reinforcement learning method. Therefore, the proposed method is called PRCL: Pursuit Reinforcement Guided Competitive Learning.
The procedure of the proposed PRCL is as follows,
1) The reward r for each datum is initialized [equation missing in the source], where n denotes the desired number of clusters, while $u_i^0$ denotes the initial cluster center.
2) A datum is selected randomly from the samples.
3) The winner neuron i* is determined with equation (13).
4) The reward of each output neuron corresponding to the input data is updated based on equation (14), where $r(x, u_t)$ denotes the current reward while $r(x, u_{t+1})$ denotes that for the next learning number.
5) The neuron i* which has the maximum reward is selected by equation (15),
$$u_{i^*} = \arg\min_{i} r(x, w_i) \qquad (15)$$
6) The weight is updated with the following equation [equation missing in the source].
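The PRCL steps can be sketched as follows. Several details are assumptions made only for this illustration, because the corresponding equations are not available here: rewards are kept per datum and per cluster and start uniformly at 1/n, the reward is updated with the pursuit-style rule at rate β, the neuron with the largest reward is selected as described in the prose, and the selected center moves toward the input at rate α.

```python
import random

def sq_dist(x, w):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, w))

def prcl(data, centers, alpha=0.1, beta=0.1, steps=2000, seed=0):
    """Hypothetical PRCL sketch; returns (centers, reward table)."""
    rng = random.Random(seed)
    n = len(centers)
    centers = [list(c) for c in centers]
    # Step 1 (assumption): uniform initial reward for every datum/cluster pair.
    reward = [[1.0 / n] * n for _ in data]
    for _ in range(steps):
        j = rng.randrange(len(data))                      # step 2: random datum
        x = data[j]
        i_wta = min(range(n), key=lambda i: sq_dist(x, centers[i]))  # step 3
        # Step 4 (assumption): pursuit-style reward update at rate beta.
        reward[j] = [r + beta * ((1.0 if i == i_wta else 0.0) - r)
                     for i, r in enumerate(reward[j])]
        # Step 5 (assumption): select the neuron with the largest reward.
        i_star = max(range(n), key=lambda i: reward[j][i])
        # Step 6 (assumption): move the selected center toward the input.
        centers[i_star] = [w + alpha * (a - w) for a, w in zip(x, centers[i_star])]
    return centers, reward

# Two synthetic blobs, one initial center taken from each.
gen = random.Random(0)
blob_a = [(gen.gauss(0, 0.1), gen.gauss(0, 0.1)) for _ in range(30)]
blob_b = [(gen.gauss(5, 0.1), gen.gauss(5, 0.1)) for _ in range(30)]
data = blob_a + blob_b
centers, reward = prcl(data, centers=[blob_a[0], blob_b[0]])
```

In this sketch the reward table acts as a smoothed memory of past WTA outcomes for each datum, so a single noisy competition does not immediately change which center is updated.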

F. Proposed Image Retrieval Method
In general, a huge computation resource is required for image retrieval when template matching is applied to a huge image database. The required computer resource can be reduced by shrinking the search areas of concern with online clustering. The search areas of template matching can also be shrunk by dividing the image into blocks of an appropriate size and clustering the divided images with the proposed PRCL online clustering. A template image decimated at a 1/2 sampling rate is used for template matching against the original large image of concern. This ensures a matching ratio of 56.25% at maximum. At the second level of decimation, a matching ratio of 78.4% is also ensured. Feature extraction is then applied to the divided image regions, with feature vectors based on color information and gray scale. After that, clustering is performed in the feature vector space. Thus the search areas can be shrunk.
Table 1 shows the averaged computation time required for convergence of the proposed and the other conventional methods for the Iris data in the UCI repository. In this case, the parameters for each clustering method are as follows,
VQ: α = 0.1
RGCL: α = 0.1
SRGCL: α = 0.1, η = 0.0001
PRCL: α = 0.1, β = 0.1
In this case, the maximum learning number is set at 15000 while the parameters for each method are set as follows,
VQ: α = 0.1
RGCL1: α = 0.1
RGCL2: α = 0.5 (t <= 500), α = 0.1 (t > 500)
SRGCL1: α = 0.1, η = 0.0001
SRGCL2: α = 0.5 (t <= 500), α = 0.1 (t > 500), η = 0.0001
PRCL: α = 0.1, β = 0.1
It is found that the convergence performance of RGCL and SRGCL is influenced by the parameter α. The processing time is averaged over 100 runs, each with a learning number of 4000 (which is defined as 1 set). Table 2 shows just one example of the evaluation results for the Iris dataset. From Table 1, it is found that the proposed method is the second fastest method. The proposed method, however, shows the highest convergence performance in terms of convergence speed and stability. From Table 3, it is found that all of the online clustering methods show almost the same clustering performance (within 5%) for the relatively simple datasets Iris and Ruspini, while the clustering performance differs for the comparatively complicated datasets Fossil and New thyroid. In the latter case, the proposed PRCL shows the highest performance. In particular, the clustering performance of PRCL for New thyroid is 3.54% better than VQ, 8.64% better than RGCL and 18.13% better than SRGCL. This is because the learning automata in PRCL adjust to the complexity of the input data.
The actual procedure is as follows,
1) A template image of 120 by 120 pixels is extracted from the original ASTER/VNIR image.
2) Clustering is applied to the 15 dimensional features extracted from the original image through decimation with a decimation factor of 60 pixels.
3) The features derived from the template image are added, and online clustering is performed with VQ and the proposed PRCL (the first clustering).
4) The selected cluster region in the original image is expanded by one block.
5) Decimation with a decimation factor of 30 pixels is then applied to the selected cluster region in the original image.
6) Online clustering with VQ or the proposed PRCL is applied to the decimated image (the second clustering).
7) The best matching image portion is retrieved by referring to the clustering result, through matching between the template and the original image.
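The way clustering shrinks the search area in the procedure above can be sketched as follows. The two features per block (mean and variance of gray level) are hypothetical stand-ins for the 15 dimensional features, whose exact definition is not given here, and the cluster centers are assumed to come from an online clustering step such as VQ or PRCL.

```python
def block_features(image, block):
    """Mean and variance of every block x block tile, keyed by its top-left corner."""
    feats = {}
    for r in range(0, len(image) - block + 1, block):
        for c in range(0, len(image[0]) - block + 1, block):
            vals = [image[i][j] for i in range(r, r + block)
                    for j in range(c, c + block)]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            feats[(r, c)] = (mean, var)
    return feats

def candidate_blocks(feats, template_feat, centers):
    """Keep only the blocks that fall in the same cluster as the template."""
    def assign(f):
        return min(range(len(centers)),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(f, centers[k])))
    target = assign(template_feat)
    return [pos for pos, f in feats.items() if assign(f) == target]

# Toy 4 x 4 image: dark left half, bright right half, tiled into 2 x 2 blocks.
image = [[0, 0, 100, 100] for _ in range(4)]
feats = block_features(image, block=2)
centers = [(0.0, 0.0), (100.0, 0.0)]      # assumed output of the clustering step
candidates = candidate_blocks(feats, template_feat=(95.0, 0.0), centers=centers)
```

Template matching then only has to visit the candidate blocks, here the two bright blocks, rather than the whole image.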

B. Image Retrievals
Fig. 3 (a) and (b) show the clustered images of the first clustering while Fig. 3 (c) and (d) show those of the second clustering. From these images, it is found that the number of clusters of the proposed PRCL is greater than that of VQ. Meanwhile, Fig. 4 shows the residual error (the cost function J) of the first clustering over 10 individual trials. From the figure, it is confirmed that the convergence performance of the proposed PRCL is superior to that of VQ. Table 4 shows the elapsed time for the proposed method, the existing template matching, the pyramid search and the conventional VQ method. From Table 4, the elapsed time of the proposed PRCL is the shortest, followed by the conventional Pyramid Search, VQ and the conventional Template Matching. Although VQ has traditionally been regarded as the most appropriate image retrieval method, the proposed PRCL reduces the process time by 3.05%.
Furthermore, the proposed PRCL can use previously obtained cluster results. Therefore, much faster image retrieval can be expected by referring to a database of the cluster results for the proposed PRCL. Table 5 shows the processing time for the conventional Pyramid Search, VQ and the proposed PRCL when the database is referred to. From Table 5, it is found that the proposed PRCL reduces the process time by 1.82% in comparison to VQ and by 24.98% in comparison to the conventional Pyramid Search.
It is also possible to retrieve the image portion of concern with online clustering alone for the whole process. Namely, decimation with a decimation factor of 1/2 is applied to the original image recursively until the pixel interval becomes one pixel. The process time for this image retrieval method is evaluated for VQ and the proposed PRCL. Table 6 shows the evaluation result for an original image size of 128 by 128 pixels. From Table 6, it is found that the proposed PRCL achieves a 9.79% shorter process time in comparison to VQ. It is suspected that image retrieval takes much longer when the distance between the template image and the portion of the original image is large. The time required for image portion retrieval with the proposed PRCL is therefore examined as a function of the distance. As a result of the examination, it is confirmed that the process time for a long distance is longer than that for a short distance. The increase is small, however: only 1% more time is required when the distance is 10 times longer.
It is also suspected that the process time varies with the complexity of the image portion. Therefore, another examination is conducted on the relation between the process time and the variance of the image portion. As a result of the examination, it is found that the process time depends on the variance of the image portion. Therefore, the process time for areas such as sea and forest is much shorter than that for areas such as urban areas, rivers and road networks, as shown in Fig. 5.
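The recursive decimation mentioned above can be sketched as a coarse-to-fine schedule of pixel intervals. Starting at half the image size is an assumption made for this illustration; the text only states that the factor is 1/2 and that the recursion stops at an interval of one pixel.

```python
def decimation_intervals(size):
    """Pixel intervals obtained by halving repeatedly until the interval is one."""
    intervals = []
    step = size // 2          # assumed starting interval: half the image size
    while step >= 1:
        intervals.append(step)
        step //= 2
    return intervals

def decimate(image, step):
    """Keep every step-th pixel along both axes."""
    return [row[::step] for row in image[::step]]

intervals = decimation_intervals(128)
image = [[r * 8 + c for c in range(8)] for r in range(8)]
coarse = decimate(image, 4)   # 8 x 8 image sampled at an interval of 4 pixels
```

For a 128 by 128 image this gives seven levels, so the online clustering is applied seven times, each time on a finer sampling of a smaller search region.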

C. Case Study for Image Retrievals when Scales do not Match between Template and the Original Images
Image retrieval might not work when the scales of the template and the original images do not match and no prior information on the scale is available. Even with prior information on the scale, it is hard to obtain a fine matching between both images when the scales are different. The procedure for this case is as follows,
1) An image portion of 120 by 120 pixels is extracted from the original image scaled by a factor of n.
2) The original image is sampled at an interval of 60 pixels. Then clustering is applied to the sampled image in the 15 dimensional feature space whose feature vectors are derived from the sampled images.
3) The feature vector of the template image is added to the feature space. Then online clustering is applied.
4) The selected cluster region in the original image is expanded by one block.
5) The secondary clustering is applied to the image sampled at an interval of 30 pixels in the search regions.
6) By referring to the clustered result, the vector of the original image nearest to the template image vector is selected. Then the image scale ratio is calculated from the norm of the template vector and the norm of the original image vector.
7) Scale conversion is applied to the template image. Then online clustering is applied again.
8) The image scale ratio is calculated again. If the image scale ratio does not change much, the iteration is regarded as converged. If not, the aforementioned processes are repeated.
9) After that, template image matching can be done with the calculated image scale ratio and the nearest image vectors of the template and the original image portion.
Image retrieval results for this case are shown in Table 7, with the image scale ratio ranging from 0.5 to 2.0. Although the matching accuracy is quite good (less than one pixel) when the image scale ratio is one, the matching accuracy decreases as the image scale ratio increases. On the other hand, the matching accuracy degrades sharply when the image scale ratio falls below one.

IV. CONCLUSION
Pursuit Reinforcement guided Competitive Learning (PRCL), a relatively fast online clustering method based on reinforcement guided competitive learning, is proposed. It groups the data of concern into several clusters even when the number and the distribution of the data vary. One application of the proposed method is the retrieval of image portions from relatively large images such as Earth observation satellite images. It is found that the proposed method is relatively fast in these retrievals compared with existing conventional online clustering methods such as Vector Quantization (VQ). Moreover, the proposed method is much faster than the others for multi-stage retrieval of image portions as well as for scale estimation.
Also, it is found that the matching accuracy is quite good (less than one pixel) when the image scale ratio is one. Meanwhile, the matching accuracy decreases as the image scale ratio increases. On the other hand, the matching accuracy degrades sharply when the image scale ratio falls below one.
Further investigation is required for other applications of the proposed online clustering.

Step 4) of the proposed PRCL procedure: the reward of each output neuron corresponding to the input data is updated based on equation (14).

Fig. 4. Convergence processes of the image portion retrievals (120x120) from the 4980x4200 ASTER/VNIR image for the proposed PRCL method and the existing conventional VQ

Fig. 5. Process time of the proposed PRCL as functions of the distance between template and image portion and of the variance of the image portion

TABLE II. AVERAGED COMPUTATION TIME REQUIRED FOR CONVERGENCE OF THE PROPOSED AND THE OTHER CONVENTIONAL METHODS FOR IRIS DATA IN THE UCI REPOSITORY DATA

TABLE III. CLUSTERING ERRORS OF THE PROPOSED AND THE OTHER CONVENTIONAL METHODS FOR EACH UCI REPOSITORY DATA

TABLE IV. ELAPSED TIME FOR THE PROPOSED AND EXISTING TEMPLATE MATCHING, PYRAMID SEARCH AS WELL AS CONVENTIONAL VQ METHODS

TABLE V. ELAPSED TIME FOR THE PROPOSED PRCL AND EXISTING PYRAMID SEARCH AS WELL AS THE CONVENTIONAL VQ METHODS

TABLE VI. ELAPSED TIME FOR THE PROPOSED AND CONVENTIONAL VQ METHODS

TABLE VII. ELAPSED TIME, ESTIMATED IMAGE SCALE, ERROR IN UNIT OF PIXEL AND PERCENT FOR THE CASE THAT IMAGE SCALE DOES NOT MATCH BETWEEN TEMPLATE AND SATELLITE IMAGES