Backpropagation with Vector Chaotic Learning Rate

In neural network (NN) training, the local minimum problem is an inherent difficulty. In this paper, a modification of the standard backpropagation (BP) algorithm, called backpropagation with vector chaotic learning rate (BPVL), is proposed to improve the performance of NNs. The BPVL method generates a chaotic time series as a vector form of the Mackey Glass and logistic map series. A rescaled version of these series is used as the learning rate (LR). In BP training, the weights of the NN become inactive after the training session reaches a local minimum. With the integrated chaotic learning rate, the weight update is accelerated in the local minimum region. BPVL is tested on six real-world benchmark classification problems: breast cancer, diabetes, heart disease, Australian credit card, horse, and glass. The proposed BPVL outperforms the existing BP and BPCL in terms of both generalization ability and convergence rate.

Keywords— neural network; backpropagation; BPCL; BPVL; chaos; generalization ability; convergence rate.


I. INTRODUCTION
Gradient-based methods are among the most widely used error-minimization methods for training backpropagation networks. The BP training algorithm is a supervised learning method for multi-layered feed-forward neural networks [1,20]. BP has been applied to a wide variety of problems, including pattern recognition, signal processing, image compression, and speech recognition, due to its appealing features and adaptive nature [2]. It is essentially a gradient-descent local optimization technique that involves backward error correction of the network weights. Despite the general success of BP in training neural networks, several major deficiencies still need to be solved [3,4]. First, the BP algorithm can get trapped in local minima, especially for nonlinearly separable problems [4]; once trapped in a local minimum, BP may fail to find a globally optimal solution [6]. Second, the convergence rate of BP is still too slow even when learning can be achieved [5]. Furthermore, the convergence behavior of the BP algorithm depends very much on the choice of the initial connection weights and of parameters such as the learning rate and the momentum. Improving the training efficiency of neural-network-based algorithms is an active area of research, and numerous improvements to the early BP algorithm have been proposed in the literature.
There are many improvements and variations of BP aimed at increasing the speed of convergence, avoiding local minima, and improving the network's ability to generalize. BP trains NNs with constant values of the learning rate (LR) and momentum factor. When the LR and momentum factor are made adaptive during training, the performance of BP improves [8,9]. The speed of convergence of the BP algorithm is accelerated by using adaptive weight accuracy instead of fixed weight accuracy [10]. The dynamic LR of the BP algorithm is optimized by an efficient method of deriving the first and second derivatives of the objective function with respect to the LR [11]. Another approach, which does not require the second-order derivative, has been proposed to optimize the LR: a set of recursive formulas is formed that accelerates the convergence of BP with remarkable savings in running time [12]. Reducing the number of patterns in the active training set effectively increases training efficiency and, accordingly, permits training with a larger pattern set [13]. Many modifications proposed to improve the performance of BP have focused on solving the "flat spot" problem [7] to increase generalization ability. However, their performance is limited by the error-overshooting problem. A novel approach called 2P-MGFPROP has been introduced to overcome error overshooting and hence speed up the convergence rate. C. C. Cheung enhanced this approach by dividing the learning process into multiple phases, with different fast learning algorithms assigned to different phases to improve the convergence rate [15]. All these methods require much computational effort, and they do not guarantee good generalization ability in all cases. BPCL (backpropagation with chaotic learning rate) lets NN training escape premature saturation by using a chaotic LR. In BPCL, the logistic map, which is very fast, is used to generate chaos. For this reason we search for another learning criterion that performs better than BPCL. In this paper, a modified BP algorithm, called 'Backpropagation with Vector Chaotic Learning Rate' (BPVL), is proposed, which uses a vector form of the Mackey Glass and logistic map chaotic time series. BPVL is applied on several benchmark problems, including breast cancer, diabetes, heart disease, Australian credit card, and horse. For all the problems, BPVL outperforms BP and BPCL in terms of generalization ability and convergence rate.
The rest of the paper is organized as follows. The proposed BPVL is described in Section II. Section III includes the experimental studies. The discussion on BPVL is presented in Section IV. Section V concludes the paper.

II. BPVL

BPVL follows the standard BP as well as BPCL. In BPCL the learning rate is chaotic, but in BPVL the learning rate is a vector. The vector is formed by two chaotic time series: the logistic map [18,19] and Mackey Glass [22,23].
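For illustration, the two chaotic series and the rescaling into an LR range such as [-0.03, 0.15] can be sketched as follows. This is a minimal sketch, not the authors' implementation: the Mackey Glass parameters (beta = 0.2, gamma = 0.1, power = 10, delay tau = 17), the logistic-map parameter r = 4, and the linear min-max rescaling are common choices in the literature and are assumptions here.

```python
import numpy as np

def logistic_map(n, x0=0.3, r=4.0):
    """Generate n samples of the logistic map x_{t+1} = r * x_t * (1 - x_t)."""
    x = np.empty(n)
    x[0] = x0
    for t in range(n - 1):
        x[t + 1] = r * x[t] * (1.0 - x[t])
    return x

def mackey_glass(n, tau=17, beta=0.2, gamma=0.1, power=10, x0=1.2):
    """Euler (step = 1) approximation of the Mackey-Glass delay equation:
    x(t+1) = x(t) + beta * x(t-tau) / (1 + x(t-tau)^power) - gamma * x(t),
    using a constant history x0 before t = 0."""
    x = np.empty(n + tau)
    x[:tau + 1] = x0
    for t in range(tau, n + tau - 1):
        x_tau = x[t - tau]
        x[t + 1] = x[t] + beta * x_tau / (1.0 + x_tau**power) - gamma * x[t]
    return x[tau:]

def rescale(series, lo=-0.03, hi=0.15):
    """Min-max rescale a chaotic series into the LR interval [lo, hi]."""
    s = np.asarray(series, dtype=float)
    return lo + (s - s.min()) * (hi - lo) / (s.max() - s.min())
```

A vector learning rate can then be built by pairing the two rescaled series, e.g. `np.column_stack([rescale(logistic_map(n)), rescale(mackey_glass(n))])`, one row per iteration.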
The rescaled LR (RLR) is produced from these two series, where L(t) is the logistic map time series and M(t) is the Mackey Glass time series; the combined vector series is rescaled into a fixed interval such as [-0.03, 0.15]. A feed-forward neural network with one input layer, one hidden layer, and one output layer is shown in Fig. 1. Let the numbers of input, hidden, and output units be I, J, and K, respectively. w_ij is the network weight that connects input unit i and hidden unit j, and w_jk is the network weight that connects hidden unit j and output unit k. The number of training examples is N, and an arbitrary n-th training example is {x_n1, x_n2, ..., x_nI; y_n1, y_n2, ..., y_nK}, where x_n is the input vector and y_n is the target vector. h_nj and o_nk are the outputs of hidden unit j and output unit k for the n-th training example. ∆ represents the difference between the current and new values of the network weights. The consecutive steps of BPVL are given below.
Figure 1. A feed-forward neural network with a single hidden layer.
Step 1) Initialize the network weights randomly in the interval (-1, +1) and set the iteration counter ITE to zero.

Step 2) Compute the rescaled LR (RLR) for the current iteration from the chaotic vector series.

Step 3) Compute h_nj and o_nk for the n-th training example:

h_nj = f(Σ_i w_ij x_ni),   o_nk = f(Σ_j w_jk h_nj),

where the sigmoid function f(x) = 1 / (1 + e^(-x)) is used as the activation function. Compute the changes of the weights, ∆w_ij and ∆w_jk, for the n-th example using the standard gradient-descent rules with the RLR as the learning rate:

∆w_jk = RLR · (y_nk - o_nk) o_nk (1 - o_nk) h_nj,
∆w_ij = RLR · [Σ_k (y_nk - o_nk) o_nk (1 - o_nk) w_jk] h_nj (1 - h_nj) x_ni.

Update w_ij and w_jk by adding ∆w_ij and ∆w_jk to them, respectively. When this process has been repeated for all the training examples, one iteration is completed and ITE is increased by one.
Step 4) Check the termination condition. If the termination condition is fulfilled, the training is stopped and the trained NN is tested; otherwise, go to Step 2.
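The per-example forward pass and weight update of Step 3 can be sketched in vectorized form. This is a minimal single-example sketch of the standard BP gradient rules for a sigmoid network with squared error, with `eta` standing for the current RLR value; the function and variable names are illustrative, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    """Activation f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def bp_step(x, y, w_ih, w_ho, eta):
    """One BP weight update for a single example with learning rate eta.
    x: (I,) input vector, y: (K,) target vector,
    w_ih: (I, J) input-to-hidden weights, w_ho: (J, K) hidden-to-output weights."""
    h = sigmoid(x @ w_ih)                       # hidden outputs h_nj
    o = sigmoid(h @ w_ho)                       # network outputs o_nk
    delta_o = (y - o) * o * (1.0 - o)           # output-layer error term
    delta_h = (delta_o @ w_ho.T) * h * (1.0 - h)  # back-propagated hidden term
    w_ho += eta * np.outer(h, delta_o)          # ∆w_jk = eta * delta_o_k * h_j
    w_ih += eta * np.outer(x, delta_h)          # ∆w_ij = eta * delta_h_j * x_i
    return w_ih, w_ho
```

Looping `bp_step` over all N training examples completes one iteration; in BPVL, `eta` would be read from the rescaled chaotic series at each step rather than held constant.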

III. EXPERIMENTAL STUDIES

A. Characteristics of Benchmark Datasets
BPVL is applied on six benchmark classification problems: breast cancer, diabetes, heart disease, Australian credit card, horse, and glass identification. The datasets are collected from the University of California at Irvine (UCI) machine learning repository [16] and PROBEN1 [17]. Some characteristics of the datasets are listed in TABLE I.

B. Experimental Process and Comparison
A feed-forward NN with a single hidden layer is taken for each problem. The numbers of input and output units of the NN are equal to the number of attributes and the number of classes of the dataset, respectively. The number of hidden units is chosen arbitrarily for each dataset. The weights of the NN are initially randomized in the interval (-1, +1). The training is stopped when the mean square error on the validation set increases in five consecutive iterations. To check the generalization performance of the trained NN, the 'testing error rate' (TER) is computed: the ratio of the number of misclassified testing examples to the number of testing examples. Here TER is reported in dB, taken from 100 times the original error rate. BPVL obtains better generalization ability than BPCL as well as BP, and the number of iterations required by BP is about twice that of BPVL, while BPVL also requires fewer iterations than BPCL. Fig. 2 shows the training error of BPVL, BPCL, and BP with respect to iteration. For the heart disease problem (TABLE IV), BPVL outperforms BP and BPCL in all the cases. For example, when RLR is -0.03 to 0.15, the TER and iteration count of BPVL are 13.0125 (dB) and 21.50, while these are 13.1952 (dB) and 57.75 for standard BP and 13.0535 (dB) and 31.50 for BPCL. The convergence curve of BPVL is also better, as shown in Fig. 4. TABLE V shows the comparative results for the credit card problem. When RLR is -0.03 to 0.15, the TER and iteration count of BPVL are 11.4489 (dB) and 22.40, while these are 11.8011 (dB) and 41.90 for standard BP and 11.5776 (dB) and 31.80 for BPCL, respectively. These results ensure that the generalization ability of BPVL is better than that of BP and BPCL for the credit card problem. Fig. 5 shows the competitive convergence curves of BPVL, BPCL, and BP.

IV. DISCUSSION
The difference between BPCL and BPVL is that BPCL makes the LR chaotic during training via the logistic map, while BPVL trains the network with a vector learning rate formed from two chaotic time series, Mackey Glass and the logistic map. The logistic map is the fastest chaotic time series, which is why the difference between consecutive learning rates is high in BPCL. To mitigate this problem, a vector learning rate is used in BPVL. The training error curve of BPVL is improved, sometimes only marginally, over that of BPCL and BP. However, BPCL was proposed to achieve good generalization ability, not to improve convergence, whereas BPVL also shows faster convergence.
The NN training is terminated based on validation examples. When the mean square error on the validation set starts to increase, BPVL stops the training. Because the validation error may temporarily fall into a local minimum, the mean square error must increase in five consecutive iterations before training is stopped.
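This stopping rule can be sketched as a small helper that inspects the history of validation errors; the function name and list-based interface are illustrative assumptions.

```python
def should_stop(val_errors, patience=5):
    """Return True when the validation MSE has increased for
    `patience` consecutive iterations (the paper's early-stopping rule)."""
    if len(val_errors) <= patience:
        return False
    recent = val_errors[-(patience + 1):]
    return all(recent[i + 1] > recent[i] for i in range(patience))
```

During training, the validation MSE of each completed iteration is appended to the history; training ends as soon as `should_stop(history)` becomes true.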

V. CONCLUSION
In this paper, a supervised training algorithm called BPVL is proposed. BPVL works on the learning parameter of training. Standard BP trains NNs with a constant value of the LR. On the other hand, biological systems such as the human brain involve chaos. To fill this gap, BPVL trains NNs with an LR that is a vector chaotic time series. The chaotic series is generated using the complex vector form of the Mackey Glass and logistic map series, and a rescaled version of it is incorporated into the training. Due to the nonlinear error surface of real-world problems, BP training often falls into premature saturation, which leads the training to a non-updating zone. Since the LR is positive most of the time and sometimes negative, BPVL resolves this problem.
BPVL is applied on six benchmark classification problems to observe the training performance. BPVL is capable of training NNs with better generalization ability and a faster convergence rate than those of standard BP and BPCL.

3) Heart Disease: The size of the NN is 35-4-2. This problem has 920 examples. The first 460 examples are used to train the network, 230 examples for validation, and the trained network is tested with the last 230 examples. The obtained average results are reported in TABLE IV.

The datasets in TABLE I are arranged as follows. For example, cancer is a 2-class problem having 699 total examples with 9 attributes; the other problems are arranged in a similar fashion. The total examples of a dataset are divided into training examples, validation examples, and testing examples. The training examples are used to train the NN, the validation examples are used to terminate the training process, and the trained NN is tested with the testing examples.
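The sequential splits used throughout the experiments (first half for training, then roughly a quarter each for validation and testing) can be sketched as follows; the helper name is an assumption, but the rounding reproduces the splits reported in the paper (e.g. 350/175/174 of 699 cancer examples, 107/54/53 of 214 glass examples).

```python
def split_dataset(examples):
    """Sequentially split a dataset: the first half of the examples for
    training, then roughly a quarter for validation, and the remainder
    for testing, as in the paper's experimental setup."""
    n = len(examples)
    n_train = round(n / 2)
    n_val = round(n / 4)
    train = examples[:n_train]
    val = examples[n_train:n_train + n_val]
    test = examples[n_train + n_val:]
    return train, val, test
```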

TABLE I .
CHARACTERISTICS OF BENCHMARK DATASETS.

Number of Total Examples | Input Attributes | Output Classes
The experimental results are reported in terms of TER and the number of required iterations. Mean and SD indicate the average and standard deviation values over 20 independent trials. BPVL is compared with standard BP and BPCL. To make a fair comparison, the LR of BP is selected so that this fixed value of the learning rate is always an intermediate point in the range [min(RLR), max(RLR)].

1) Breast Cancer: This problem has 699 examples. The first 350 examples are used for training, 175 examples for validation, and the last 174 examples for testing. An NN of 9-4-2 (nine input units, four hidden units, and two output units) is taken. The experimental results are listed in TABLE II. When RLR is -0.03 to 0.15, BPVL requires 75.40 iterations to obtain a TER of 0.7115 (dB), while BPCL requires 87.85 iterations to obtain a TER of 1.0037 (dB). In this range, the LR of BP is set at 0.06; the TER with BP is 1.9033 (dB), and BP requires 152.65 iterations.
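The dB figures reported here are consistent with defining TER(dB) = 10 · log10(100 × error rate); this reading is an assumption inferred from the reported magnitudes, sketched below.

```python
import math

def ter_db(misclassified, total):
    """Testing error rate in dB, assuming
    TER(dB) = 10 * log10(100 * error rate)."""
    rate = misclassified / total
    return 10.0 * math.log10(100.0 * rate)
```

Under this assumption, a 20% misclassification rate corresponds to about 13.01 dB, close to the heart-disease TERs reported in TABLE IV.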

TABLE II .
TERS AND NUMBER OF REQUIRED ITERATIONS WITH BP, BPCL AND BPVL FOR CANCER PROBLEM OVER 20 INDEPENDENT RUNS.

2) Diabetes: This problem has 768 examples. The first 384 examples are used for training, 192 examples for validation, and the last 192 examples for testing. Here the size of the NN is 8-4-2. The results for different LRs are reported in TABLE III. They show that BPVL has a faster convergence rate than BPCL and BP in all the cases.

TABLE III .
TERS AND NUMBER OF REQUIRED ITERATIONS WITH BP, BPCL AND BPVL FOR DIABETES PROBLEM OVER 20 INDEPENDENT RUNS.

TABLE IV .
TERS AND NUMBER OF REQUIRED ITERATIONS WITH BP, BPCL AND BPVL FOR HEART PROBLEM OVER 20 INDEPENDENT RUNS.

4) Australian Credit Card: The training is stopped with 173 validation examples, and the NN is tested with the last 172 testing examples. TABLE V shows the comparative results.

TABLE V .
TERS AND NUMBER OF REQUIRED ITERATIONS WITH BP, BPCL AND BPVL FOR CARD PROBLEM OVER 20 INDEPENDENT RUNS.

5) Horse: The horse dataset has 364 examples in total. The first 182 examples are used for training, 91 examples for validation, and the last 91 examples for testing. A 58-7-3 NN is trained. The numerical results are reported in TABLE VI, and the error curves are shown in Fig. 6. When RLR is -0.02 to 0.15, the TER and the number of required iterations of BPVL are 11.5776 (dB) and 30.40, while these are 14.6419 (dB) and 47.05 for standard BP and 14.5163 (dB) and 38.5 for BPCL, respectively.

TABLE VI .
TERS AND NUMBER OF REQUIRED ITERATIONS WITH BP, BPCL AND BPVL FOR HEART PROBLEM OVER 20 INDEPENDENT RUNS, CONTINUED FOR HORSE PROBLEM.

6) Glass: The glass dataset has 214 examples in total. The first 107 examples are used for training, 54 examples for validation, and the last 53 examples for testing. A 9-6-6 NN is trained. The numerical results are reported in TABLE VII, and the error curves are shown in Fig. 7. When RLR is -0.02 to 0.15, the TER and iteration count of BPVL are 15.9295 (dB) and 131.60, while these are 16.0239 (dB) and 186.55 for BPCL and 16.2407 (dB) and 410.50 for standard BP, respectively. Here the number of iterations required by BP is about four times that of BPVL. Fig. 7 also shows that BPCL has a better convergence rate than BP.

TABLE VII .
TERS AND NUMBER OF REQUIRED ITERATIONS WITH BP, BPCL AND BPVL FOR GLASS PROBLEM OVER 20 INDEPENDENT RUNS.