Pursuit Reinforcement Competitive Learning: PRCL Based Online Clustering with Learning Automata and Its Application to Evacuation Simulation

A new online clustering method, Pursuit Reinforcement Competitive Learning (PRCL), is proposed. It is based not only on reinforcement and competitive learning but also on the pursuit algorithm and learning automata, and it aims to reach a relatively stable clustering solution in a comparatively short time. Datasets from the UCI repository, which are widely used for evaluating clustering performance, are used for a comparative study of the proposed PRCL and the existing online clustering methods: Reinforcement Guided Competitive Learning (RGCL), Sustained RGCL (SRGCL), and Vector Quantization (VQ). The results show that the clustering accuracy of the proposed method is superior to that of the conventional methods. More importantly, the proposed PRCL is found to be much faster than the conventional methods. The proposed method is then applied to an evacuation simulation study. It finds the most appropriate evacuation route much faster than the conventional vector quantization method. Because PRCL finds the most appropriate evacuation route, collisions among evacuating people are far fewer with the proposed method than with vector quantization.

Keywords: Pursuit Reinforcement Guided Competitive Learning; Reinforcement Guided Competitive Learning; Sustained Reinforcement Guided Competitive Learning; Vector Quantization; Learning Automata


I. INTRODUCTION
Clustering is an exploratory data analysis tool that deals with the task of grouping objects that are similar to each other [1,2,3]. Many clustering algorithms have been proposed over the years, and they are widely used in many fields, such as data mining, pattern recognition, image classification, biological sciences, marketing, city planning, and document retrieval.
Clustering is commonly applied to static data. That is, the clustering is performed after the entire dataset has been collected, and the data are then grouped into clusters whose members are similar in some way. In data mining, however, some data arrive continuously, so the stream cannot be stopped in order to perform clustering.
Online clustering is a kind of clustering used for dynamic data. It does not consider the whole dataset but focuses only on each new data item and the previous centroids. Several approaches have been proposed for determining the position of each centroid when a new data item arrives. Vector Quantization (VQ) is a very simple approach to online clustering. It is derived from the concept of the competitive learning network [4], [5]. Likas (1999) proposed Reinforcement Guided Competitive Learning (RGCL) [6] as an approach to online clustering based on reinforcement learning. It applies the concept of reward from reinforcement learning to the winning unit in Learning Vector Quantization. Sustained RGCL (SRGCL) is a modification of RGCL that incorporates sustained exploration in reinforcement learning. In addition, other approaches such as modified ISODATA, k-means clustering, Self-Organizing Map (SOM) based clustering, clustering utilizing spatial features, clustering utilizing the Fisher distance measure, and GA based clustering have been proposed in order to improve clustering performance [7]-[25].
This paper proposes a new approach to online clustering based on reinforcement learning, called Pursuit Reinforcement Guided Competitive Learning (PRCL). PRCL is derived from the pursuit method in reinforcement learning, which maintains both action-value estimates and action preferences, with the preferences continually pursuing the action that is greedy according to the current action-value estimates, and it is combined with learning automata. PRCL can be used as an online clustering method. An image search application is discussed in [26]. Another application, evacuation simulation, is introduced here.
The following section describes the proposed PRCL with learning automata together with the existing online clustering methods RGCL, SRGCL, and VQ. Preliminary experiments are then described, followed by the application to evacuation simulation. Finally, conclusions are given with some discussion.

II. THEORETICAL BACKGROUND
Individuals appear and disappear in an evacuation simulation. For instance, when a disaster occurs, an individual disappears once he or she has evacuated to a safe area. Therefore, the number of individuals varies over time. Conventional clustering methods form clusters for a fixed set of individuals, so applying them again every time the number of individuals changes is time consuming. Online clustering is therefore effective in such cases. One problem of online clustering is computing performance: the clustering has to be completed in real time. Another problem is clustering accuracy. In order to improve clustering accuracy, the proposed method introduces reinforcement competitive learning with learning automata. Reinforcement competitive learning featuring learning automata is a new and original idea.

A. Reinforcement Learning
Reinforcement Learning is learning what to do and how to map situations to actions so as to maximize a numerical reward signal [4]. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward, but also the next situation and, through that, all subsequent rewards. These two characteristics (trial-and-error search and delayed reward) are the two most important distinguishing features of Reinforcement Learning.
Reinforcement Learning is defined not by characterizing learning algorithms, but by characterizing a learning problem. Any algorithm that is well suited to solving that problem we consider to be a Reinforcement Learning algorithm. Clearly such an agent must be able to sense the state of the environment to some extent and must be able to take actions that affect that state. The agent must also have a goal or goals relating to the state of the environment.

B. Competitive Learning
Competitive learning is defined as an unsupervised learning method in which one of the output neurons fires through competition among the neurons, without training samples. It has features of both supervised and unsupervised learning methods. It is applicable to clustering, in which case the inputs are the data of concern and the outputs are clusters. The following cost function is usually used,

$$J = \sum_{i} \min_{r} d(x_i, w_r) \qquad (1)$$

In clustering based on competitive learning, J accumulates the minimum distance min d(x, w) between each input data x_i and the cluster centers w_r. Therefore, J is the sum of dissimilarity within clusters.
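The cost J described above can be computed with a few lines of code. The following is a minimal sketch, assuming Euclidean distance for d (the text does not fix a particular distance); the names `clustering_cost`, `data`, and `centers` are hypothetical and chosen only for illustration.

```python
import numpy as np

def clustering_cost(data, centers):
    """Sum over all inputs of the distance to the nearest cluster center."""
    data = np.asarray(data, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise distances, shape (num_inputs, num_centers).
    # Euclidean distance is an assumption made for this sketch.
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    # Each input contributes only its distance to the closest center.
    return float(dists.min(axis=1).sum())

if __name__ == "__main__":
    data = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0]]
    centers = [[0.0, 0.0], [10.0, 0.0]]
    print(clustering_cost(data, centers))
```

A lower value of this quantity indicates that the inputs lie closer to their assigned cluster centers.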

C. Simple Competitive Learning
SCL (Simple Competitive Learning) is the simplest form of competitive learning. The basic idea of SCL is WTA (Winner Take All); namely, the winning neuron takes all. The weight of the winning neuron is updated with the following equation,

$$\Delta w_i = \begin{cases} \alpha (x - w_i) & \text{if neuron } i \text{ wins} \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

where w_i denotes the weight of neuron i, x denotes the input data, and α is the coefficient that determines the convergence speed.

VQ (Vector Quantization)
The process flow of VQ is as follows,
1) The weight vector of each cluster is defined as the selected sample vector of that cluster.
2) The input data is assigned to the cluster i* whose sample vector is at the shortest distance from the input data.
3) The sample vector is updated based on the following equation,

$$w_i(t+1) = \begin{cases} w_i(t) + \Delta w_i & \text{if } i = i^* \\ w_i(t) & \text{otherwise} \end{cases} \qquad (3)$$

where t denotes the learning number, which is incremented for each input data. The weight is changed by Δw_i when the input data x_j is matched with the i-th sample vector, and it is not changed when the input data does not match that sample vector. By repeating these steps, the representative vectors are updated and the most appropriate clusters are formed.
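The three steps above can be sketched as a small online loop. This is an illustrative sketch rather than the implementation used in the experiments: it assumes Euclidean distance, a constant coefficient `alpha`, and an update that moves only the winning sample vector toward the input; the function names `vq_step` and `online_vq` are hypothetical.

```python
import numpy as np

def vq_step(centers, x, alpha):
    """One winner-take-all update on a float array of centers (modified in place).

    Returns the index i* of the winning sample vector.
    """
    x = np.asarray(x, dtype=float)
    # Step 2: the winner is the sample vector closest to the input.
    winner = int(np.argmin(np.linalg.norm(centers - x, axis=1)))
    # Step 3: only the winner moves toward the input; the others are unchanged.
    centers[winner] += alpha * (x - centers[winner])
    return winner

def online_vq(stream, init_centers, alpha=0.1):
    """Process inputs one at a time; returns final centers and per-input labels."""
    # Step 1: start from the given representative vectors.
    centers = np.array(init_centers, dtype=float)
    labels = [vq_step(centers, x, alpha) for x in stream]
    return centers, labels
```

Because each input is processed once and then discarded, the memory cost depends only on the number of clusters, which is what makes the procedure suitable for data that keep arriving.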

RGCL (Reinforcement Guided Competitive Learning)
All the output neurons, each representing a cluster, are Bernoulli units whose weights are vectors. The distance between the input data and the weight vector is calculated with the following equation,

$$s_i = d(x, w_i) \qquad (4)$$

Then the probability p_i is calculated with equation (5), which in the formulation of [6] is

$$p_i = 2\left(1 - \frac{1}{1 + \exp(-s_i)}\right) \qquad (5)$$

The probability increases as the input data gets closer to the weight vector. Therefore, a neuron whose output is 1 is likely to be close to the input data, and the input data is assigned to the cluster represented by the neuron whose output is 1.
The process flow of RGCL is as follows,
1) A data item is selected randomly from the samples.
2) The winner neuron i* is determined.
3) The reward r_i for the input data x_j is updated as follows, following [6], where y_i denotes the binary output of neuron i,

$$r_i = \begin{cases} 1 & \text{if } i = i^* \text{ and } y_i = 1 \\ -1 & \text{if } i = i^* \text{ and } y_i = 0 \\ 0 & \text{if } i \neq i^* \end{cases} \qquad (6)$$

Then the weight vector is updated as follows,

$$\Delta w_i = \alpha \, r_i (y_i - p_i)(x_j - w_i) \qquad (7)$$

SRGCL is a method that controls the convergence speed through a parameter η added to the RGCL update. SRGCL is not always superior to RGCL: the convergence performance depends on the relation between the input data and the control parameter. Therefore, the most appropriate control parameter has to be determined.
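The RGCL steps above can be sketched for a single input as follows. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distance for d, the Bernoulli firing probability p_i = 2(1 - sigmoid(s_i)) and the reward of +1 or -1 for the winner as in the RGCL formulation of [6], and the hypothetical names `rgcl_step`, `weights`, and `rng`.

```python
import numpy as np

def rgcl_step(weights, x, alpha, rng):
    """One RGCL update on a float array of weights (modified in place).

    Returns (winner index, reward given to the winner).
    """
    x = np.asarray(x, dtype=float)
    s = np.linalg.norm(weights - x, axis=1)          # distance of each unit to x
    p = 2.0 * (1.0 - 1.0 / (1.0 + np.exp(-s)))       # firing probability, larger when closer
    y = (rng.random(len(p)) < p).astype(float)       # Bernoulli outputs (0 or 1)
    winner = int(np.argmin(s))
    # Assumed reward rule: +1 if the winner fired, -1 if it did not.
    r = 1.0 if y[winner] == 1.0 else -1.0
    # Only the winner is updated; non-winners receive zero reward.
    weights[winner] += alpha * r * (y[winner] - p[winner]) * (x - weights[winner])
    return winner, r
```

Under these assumptions the winner moves toward the input in both cases: by α(1 - p)(x - w) when it fires and by αp(x - w) when it does not, so the step size depends on how close the winner already is.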

Learning Automata
The N-armed bandit problem is a machine learning problem of finding the most appropriate strategy for obtaining the maximum prize from a slot machine with at least one lever. Learning automata are one of the efficient methods for solving the N-armed bandit problem.
The action of drawing a specific lever is denoted by a, a play is indexed by t, and the probability of the prize is expressed as π_t(a). The total prize at the (n + 1)-th play depends on the accumulated prize at the n-th play and the current prize. The probability π_t(a) is updated with one expression when the total prize increases and with another when it decreases, where β is the parameter that controls the convergence speed. If the appropriate actions are always selected, the total prize approaches the maximum prize. This method is one of the learning automata. Namely, a reward is given when a win is predicted, while a punishment is given when a loss is predicted. Through these processes, the total prize gets closer to the maximum prize.
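As an illustration of this kind of learning automaton, the sketch below applies a pursuit-style update to an N-armed bandit. It is a sketch under stated assumptions, not the exact update used in this paper: the favored action has its probability moved toward 1 by π(a) + β(1 - π(a)), every other action has its probability moved toward 0 by π(a) + β(0 - π(a)), the prize estimates are sample averages, and each lever pays a prize of 1 with a fixed probability. The names `pursuit_update` and `run_bandit` are hypothetical.

```python
import numpy as np

def pursuit_update(pi, favored, beta):
    """Shift selection probabilities toward the favored action; the sum stays 1."""
    target = np.zeros_like(pi)
    target[favored] = 1.0
    return pi + beta * (target - pi)

def run_bandit(win_probs, beta=0.05, plays=2000, seed=0):
    """Play an N-armed bandit and return (selection probabilities, prize estimates)."""
    rng = np.random.default_rng(seed)
    n = len(win_probs)
    pi = np.full(n, 1.0 / n)      # selection probabilities, initially uniform
    q = np.zeros(n)               # estimated prize of each lever
    counts = np.zeros(n)
    for _ in range(plays):
        a = rng.choice(n, p=pi)                       # draw a lever
        prize = float(rng.random() < win_probs[a])    # assumed 0/1 payout
        counts[a] += 1
        q[a] += (prize - q[a]) / counts[a]            # running average of prizes
        pi = pursuit_update(pi, int(np.argmax(q)), beta)
    return pi, q
```

A larger β makes the probabilities commit to the currently favored lever more quickly, at the cost of less exploration of the other levers.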

E. Proposed Clustering Method
In the convergence process of RGCL, the convergence sometimes slows down or becomes unstable because the weight is too large or too small. The method proposed here uses learning automata to adjust the weight. Namely, learning automata provide the most appropriate prediction of the win/lose probability, so that the most appropriate reward and punishment can be given.
An online clustering method based on competitive and reinforcement learning as well as learning automata is proposed here. Namely, the winner neuron is first determined with WTA, based on the competitive neural network as the basic learning method, and a reward is calculated from the result of the winner neuron based on learning automata. Then the final winner neuron is determined through the agent action that has the maximum reward, based on the reinforcement learning method. The proposed method is therefore called PRCL: Pursuit Reinforcement Guided Competitive Learning.
The procedure of the proposed PRCL is as follows,
1) The reward r for each data item is initialized, where n denotes the desired number of clusters and u_i^0 denotes the initial cluster center.
2) A data item is selected randomly from the samples.
3) The winner neuron i* is determined with equation (13),

$$i^* = \arg\min_{i} d(x, u_i) \qquad (13)$$

4) The reward of each output neuron corresponding to the input data is updated based on equation (14), where r(x, u_t) denotes the current reward and r(x, u_{t+1}) denotes the reward for the next learning number.
5) The neuron i* that has the maximum reward is selected by equation (15).
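One possible reading of the five steps above is sketched below. Several details are assumptions made only for this sketch and are not confirmed by the text: the rewards are initialized uniformly to 1/n, the reward update is a pursuit-style step with parameter `beta` toward the nearest center, the selected center is moved toward the input with coefficient `alpha`, and Euclidean distance is used. The function name `prcl` and all parameter names are hypothetical.

```python
import numpy as np

def prcl(data, init_centers, alpha=0.05, beta=0.1, steps=1000, seed=0):
    """Illustrative PRCL-like loop; returns (centers, cluster label per data item)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    centers = np.array(init_centers, dtype=float)
    n = len(centers)
    # Step 1 (assumed form): uniform reward for every (data item, cluster) pair.
    reward = np.full((len(data), n), 1.0 / n)
    for _ in range(steps):
        j = int(rng.integers(len(data)))              # step 2: random sample
        x = data[j]
        # Step 3: winner by minimum distance to the cluster centers.
        nearest = int(np.argmin(np.linalg.norm(centers - x, axis=1)))
        # Step 4 (assumed form): pursuit-style reward update toward the winner.
        target = np.zeros(n)
        target[nearest] = 1.0
        reward[j] += beta * (target - reward[j])
        # Step 5: final winner is the neuron with the maximum reward.
        chosen = int(np.argmax(reward[j]))
        # Assumed center update: move the chosen center toward the input.
        centers[chosen] += alpha * (x - centers[chosen])
    return centers, reward.argmax(axis=1)
```

In this reading, the reward table smooths the winner decision over repeated presentations of the same data item, so a single unusual nearest-center result does not immediately change the item's cluster.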

A. Preliminary Experiments
A comparative study of online clustering performance is conducted with the Iris, Wine, New thyroid, Ruspini, Chernoff, and Fossil datasets from the well-known UCI repository. One example of the learning process and convergence speed, for the Iris dataset, is shown in Fig. 1. In this case, the maximum learning number is set at 15000, with the parameters set for each method. Fig. 1 shows the convergence processes of the proposed online clustering method (Pursuit Reinforcement Guided Competitive Learning: PRCL) and the other conventional methods with a variety of parameters for UCI repository data. Meanwhile, Fig. 2 shows the relation between the cost function J and the computation time required for J to reach the designated values. It is found from Fig. 1 that the convergence performance of RGCL and SRGCL is influenced by the parameter α. The processing time is evaluated as an average over 100 sets, where one set is defined as a learning number of 4000. Table 1 shows one example of the evaluation results, for the Iris dataset. From Table 1, it is found that the proposed method is the second fastest method. The proposed method, however, shows the highest convergence performance in terms of convergence speed and stability, as shown in Fig. 1.
Table 2 shows the clustering errors of the proposed and the other conventional methods for each UCI repository dataset, with the parameters set for this comparison. From Table 2, it is found that all of the online clustering methods show almost the same clustering performance (within 5%) for the relatively simple datasets, Iris and Ruspini, while the clustering performance differs for the comparatively complicated datasets, Fossil and New thyroid. In such cases, the proposed PRCL shows the highest performance. In particular, the clustering performance of PRCL for New thyroid is 3.54% better than that of VQ, 8.64% better than that of RGCL, and 18.13% better than that of SRGCL. This is because the learning automata in PRCL adjust to the complexity of the input data.

B. Evacuation Simulations
It is assumed that people in the disaster area evacuate to safe areas such as shelters. Online clustering is considerably effective here because people who have reached a shelter have to be removed from the input nodes. Furthermore, convergence performance and the stability of the converged state are much more important. Therefore, the proposed PRCL is expected to work well in such cases.
The simulation conditions are as follows,
1) The disaster area is assumed to be a mesh of 256 by 256 simulation cells. The shelters are situated at the top right corner and the top middle. Then 100 people are distributed randomly over the simulation cells.
2) People move toward one of the nearby shelters, one cell per learning number.
3) The simulation is stopped when all the people have reached a shelter.
4) There are two evacuation conditions, with and without consideration of queuing at the shelter entrance.
Three methods are taken into account: Minimum Distance (MD), the proposed PRCL, and VQ. Therefore, there are six cases: MD1 and MD2, PRCL1 and PRCL2, and VQ1 and VQ2, with and without the queue, respectively.
The action strategy is as follows,
1) If there is nobody in the target cell, the person moves one step forward.
2) If there is somebody in the target cell, the person searches for an empty nearby cell in the clockwise direction.
3) When the cells surrounding the shelters are occupied by other people, the person waits for the next learning number, or next simulation number.
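The action strategy above can be sketched as a single-person movement rule on the cell grid. The sketch makes several assumptions that the text does not specify: cells are (row, column) pairs with rows growing downward, a person may move to any of the eight neighboring cells, the clockwise search starts from the blocked direction, and rule 3 is simplified to waiting whenever no neighboring cell is free. The names `CLOCKWISE`, `sign`, and `next_cell` are hypothetical.

```python
# Eight neighbor offsets (row, col) in clockwise order, starting from "up".
# The neighborhood and the ordering are assumptions made for this sketch.
CLOCKWISE = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def next_cell(pos, shelter, occupied, size):
    """Return the cell a person moves to this turn, or pos if the person waits.

    pos and shelter are (row, col); occupied is a set of cells taken by others.
    """
    if pos == shelter:
        return pos                      # already at the shelter
    step = (sign(shelter[0] - pos[0]), sign(shelter[1] - pos[1]))
    target = (pos[0] + step[0], pos[1] + step[1])
    if target not in occupied:
        return target                   # rule 1: the target cell is free
    # Rule 2: search the other neighbors clockwise from the blocked direction.
    start = CLOCKWISE.index(step)
    for k in range(1, len(CLOCKWISE)):
        dr, dc = CLOCKWISE[(start + k) % len(CLOCKWISE)]
        cand = (pos[0] + dr, pos[1] + dc)
        inside = 0 <= cand[0] < size and 0 <= cand[1] < size
        if inside and cand not in occupied:
            return cand
    return pos                          # rule 3 (simplified): wait for the next turn
```

Counting the turns in which rule 1 fails gives a collision count of the kind reported in the evacuation results, although the exact counting rule used in the experiments is not stated.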
The convergence processes of the proposed PRCL with different values of the parameters α and β are shown in Fig. 3. It is found that the parameter α has little effect, while the parameter β has a relatively large effect, on the residual error expressed with equation (1). Table 3 summarizes the evacuation simulation results. "No turn" in Table 3 is the learning number required for all the people to be evacuated. Meanwhile, Route 1 and Route 2 denote the numbers of people who evacuated through the top right and the top middle shelters, respectively. "No collision" denotes the number of times a person could not step forward because the cell he or she intended to move to was already occupied by another person. From Table 3, the evacuation time of PRCL is the shortest among the methods. Furthermore, the numbers of people evacuated through the two shelters are almost equal for PRCL, in contrast to the other methods. The number of collisions is also the smallest for PRCL. Therefore, it is concluded that the proposed PRCL is superior to the other typical online clustering methods. Meanwhile, the evacuation performance of the methods in which the queue is taken into account is better than that of the methods in which it is not.

IV. CONCLUSION
A new online clustering method, Pursuit Reinforcement Guided Competitive Learning (PRCL), is proposed. It is based not only on reinforcement and competitive learning but also on the pursuit algorithm and learning automata, and it aims to reach a relatively stable clustering solution in a comparatively short time. Datasets from the UCI repository, which are widely used for evaluating clustering performance, are used for a comparative study of the proposed PRCL and the existing online clustering methods: Reinforcement Guided Competitive Learning (RGCL), Sustained RGCL (SRGCL), and Vector Quantization (VQ).
The results show that the clustering accuracy of the proposed method is superior to that of the conventional methods. More importantly, the proposed PRCL is found to be much faster than the conventional methods. The proposed method is then applied to an evacuation simulation study. It finds the most appropriate evacuation route much faster than the conventional vector quantization method. Because PRCL finds the most appropriate evacuation route, collisions among evacuating people are far fewer with the proposed method than with vector quantization.

Fig. 1. Convergence processes of the proposed online clustering method (Pursuit Reinforcement Guided Competitive Learning: PRCL) and the other conventional methods with a variety of parameters for UCI repository data

Fig. 2. Relation between the cost function J and the computation time required for J to reach the designated values

Fig. 3. Relation between the parameters α, β and convergence performance

Fig. 4 shows the distribution of people who have to evacuate (left: the learning number of turns is 50; right: the learning number of turns is 200). As shown in Fig. 3, the number of evacuated people for the right and the middle

Fig. 4. Distribution of people who have to evacuate (left: the number of turns is 50; right: the number of turns is 200)

Fig. 5 (a) shows the number of people who have to evacuate, the number who have evacuated, the number of collisions, and the queue length (people waiting on the evacuation route) for the proposed PRCL (Nc: No. of collisions, Ne: No. of people who have evacuated, Ne': No. of people who have to evacuate, Lq: length of queue).

Fig. 5. The number of people who have to evacuate, the number who have evacuated, the number of collisions, and the queue length (people waiting on the evacuation route) for the proposed method and the conventional method of vector quantization

Meanwhile, Fig. 5 (b) shows the number of people who have to evacuate, the number who have evacuated, the number of collisions, and the queue length (people waiting on the evacuation route) for VQ.

TABLE III. EVACUATION SIMULATION SUMMARY