Dynamic Modification of Activation Function using the Backpropagation Algorithm in the Artificial Neural Networks

The paper proposes the dynamic modification of the activation function within a learning technique, namely the backpropagation algorithm. The modification consists in changing the slope of the sigmoid activation function according to whether the error increases or decreases during a learning epoch. The study was carried out on the Waikato Environment for Knowledge Analysis (WEKA) platform by adding this feature to the Multilayer Perceptron class. In this study, the activation function is modified dynamically according to the relative change of the gradient error, and neural networks with hidden layers were not used.
Keywords—Artificial neural networks; activation function; sigmoid function; WEKA; multilayer perceptron; instance; classifier; gradient; rate metric; performance; dynamic modification


I. INTRODUCTION
The behavior of artificial neural networks has been studied extensively over time, with consistent results. The computational power of recurrent neural networks with sigmoid activation has been studied in [1]. This breadth of research is due to the diversity of the systems described by the data sets used to train the networks.
In applications, network convergence is directly affected by the choice of activation function and gradient [2].
In this context, the universal (Turing-equivalent) behavior of neural networks has been studied for a specific activation function, referred to as the linear function. The activation function plays an important role in the computation performed by artificial neural networks [3,4].
Neural networks are an important method for analyzing the activation function, owing to their ability to deal with changing data sets. For example, in the Multilayer Perceptron (MLP), the most common architecture, neurons are organized in layers [5].
One of the activation functions most widely studied in the literature is the sigmoid function. Sigmoid functions are among the main activation functions and are currently the most used in neural networks; for this reason, they are a standard choice when building a neural network architecture [6,7].
The activation functions proposed over time have been analyzed in light of their suitability for the backpropagation algorithm. It has been noted that the activation function must be differentiable, so that the output computed by the neural network, and therefore the error function, are differentiable as well [8].
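To make the differentiability requirement concrete, consider a single neuron with output y = f(net), weighted input net, target t and, as an assumption (the paper does not state its error function), the squared error. The chain rule then gives the gradient that backpropagation needs, and it contains f'(net) explicitly:

```latex
E = \tfrac{1}{2}(t - y)^2, \qquad y = f(\mathrm{net}), \qquad \mathrm{net} = \sum_i w_i x_i

\frac{\partial E}{\partial w_i}
  = \frac{\partial E}{\partial y}\,\frac{\partial y}{\partial \mathrm{net}}\,\frac{\partial \mathrm{net}}{\partial w_i}
  = -(t - y)\, f'(\mathrm{net})\, x_i
```

For a step function such as the Heaviside function, f'(net) is zero wherever it is defined, so this gradient carries no information for updating the weights; a smooth function such as the sigmoid avoids that problem.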
The main purpose of the activation function is to scale the outputs of the neurons in a neural network and to introduce a non-linear relationship between the input and the output of each neuron [9].
On the other hand, the sigmoid function is usually used in hidden layers because it exhibits linear, curvilinear or constant behavior depending on the input value [10].
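The three regimes can be checked numerically. The sketch below is a minimal illustration that assumes the common logistic form 1/(1 + e^(-βx)) with a slope coefficient beta; the helper names sigmoid and linear_approx are hypothetical. Near zero the sigmoid matches its first-order expansion 0.5 + βx/4, and for large |x| it saturates at almost constant values.

```python
import math


def sigmoid(x: float, beta: float = 1.0) -> float:
    """Logistic sigmoid 1 / (1 + exp(-beta * x)), written to avoid overflow."""
    z = beta * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) is in (0, 1) and cannot overflow
    return e / (1.0 + e)


def linear_approx(x: float, beta: float = 1.0) -> float:
    """First-order Taylor expansion of the sigmoid around x = 0."""
    return 0.5 + beta * x / 4.0


# Linear regime: close to the origin the sigmoid is almost a straight line.
for x in (-0.2, 0.0, 0.2):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  linear={linear_approx(x):.4f}")

# Constant regime: for large |x| the output saturates near 0 or 1.
for x in (-10.0, 10.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.6f}")
```

Between these two regions the curve bends, which is the curvilinear behavior mentioned above.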
It has also been shown that the sigmoid function is not efficient for a single hidden unit, but becomes more useful when more hidden units are involved [11].
Our approach in this paper consists in dynamically modifying the activation function while training an artificial neural network with the backpropagation algorithm.

II. BACKPROPAGATION ALGORITHM
Backpropagation is the most common algorithm used to train neural networks, regardless of the nature of the data set. The network architecture is determined by repeated trials, with the goal of obtaining the best possible classification of the data set at hand [8,12]. Backpropagation is a supervised learning algorithm whose purpose is to correct the error. It compares the computed output with the actual value and adjusts the weights using the calculated error, repeating this process until the error becomes smaller than the error obtained in the first round. The composition of a backpropagation network is detailed in Fig. 1 [13].
As a learning strategy, the backpropagation algorithm has proved effective in that it ensures a classification whose accuracy is generally satisfactory [14,15]. The main disadvantage of this learning technique is that repeated attempts are required to establish the network architecture, the number of hidden layers and the number of neurons in each hidden layer, while training itself requires considerable resources such as memory and runtime, as shown in Fig. 2.
www.ijacsa.thesai.org
The classical backpropagation algorithm is divided into two phases: the first phase consists in propagating information forward from the input layer to the output layer and then propagating the errors in the reverse direction; the second phase consists in updating the weights. In the first phase, the input data propagate through the network to generate the output values; the error is then calculated and propagated back toward the hidden layers and the input layer, producing for each neuron the difference between the obtained and the actual values.
In the second phase, the gradient of the error with respect to each weight is calculated using the differences generated in the first phase. Each weight is then updated in the direction opposite to the gradient, scaled by a learning coefficient (learning rate).
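The two phases can be illustrated on the simplest case relevant to this study, a single sigmoid neuron with no hidden layer. This is a minimal Python sketch, not WEKA's implementation: the function name train_epoch, the squared-error loss, the online (per-instance) updates and the learning rate eta are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic sigmoid with slope 1, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_epoch(weights: List[float],
                data: Sequence[Tuple[Sequence[float], float]],
                eta: float) -> float:
    """Run one epoch of online gradient descent on a single sigmoid neuron.

    weights[0] is the bias and weights[1:] match the input features; the list
    is updated in place. Returns the summed squared error 0.5 * (t - y)^2.
    """
    total_error = 0.0
    for x, target in data:
        # Phase 1: forward pass, then the error at the output.
        net = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
        y = sigmoid(net)
        total_error += 0.5 * (target - y) ** 2
        # delta = -dE/dnet = (t - y) * f'(net), with f'(net) = y * (1 - y).
        delta = (target - y) * y * (1.0 - y)
        # Phase 2: move each weight against the gradient, scaled by eta.
        weights[0] += eta * delta
        for i, xi in enumerate(x, start=1):
            weights[i] += eta * delta * xi
    return total_error
```

Calling train_epoch repeatedly on a small linearly separable toy problem, such as the logical AND, should show the returned error shrinking from one epoch to the next.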
Work on the backpropagation algorithm has relied on theorems guaranteeing "universal approximation", a property of neural networks [16].

III. BACKPROPAGATION ALGORITHM WITH DYNAMIC ACTIVATION FUNCTION
The main element of an artificial neural network is the artificial neuron, which can be represented by a sum and a function, so the network is composed of several interconnected functions. These functions act as filters through which the information passes, and, depending on how we want the neural network to learn, they have characteristics specific to the chosen purpose [17,19].
One of the most important characteristics of an artificial neural network is the activation function [7].
One of the most important activation functions is the sigmoid function. From an evolutionary point of view, this function developed from the Heaviside function, as shown in Fig. 3.
The formula of the sigmoid function and its curve are shown in Fig. 4.
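Fig. 4 is not reproduced in this text. Assuming the common logistic form with a slope coefficient β (the coefficient that the proposed method adjusts), the function, its derivative and its relation to the Heaviside function H can be written as:

```latex
f_\beta(x) = \frac{1}{1 + e^{-\beta x}}, \qquad
f_\beta'(x) = \beta\, f_\beta(x)\,\bigl(1 - f_\beta(x)\bigr), \qquad
\lim_{\beta \to \infty} f_\beta(x) = H(x) \quad (x \neq 0)
```

Since f_β(0) = 1/2 and f_β'(0) = β/4 for every β, the coefficient directly sets the slope at the origin: a large β pushes the curve toward the step function, while a small β flattens it.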
The most commonly used types of activation functions are listed in Fig. 5. Whatever the type of activation function and its characteristics, it remains unchanged during the training process.
We consider this a serious restriction. Since learning is a process of dynamically changing the weights that numerically characterize the connections between the neurons of the network, it is natural for this learning process to be reflected in the activation function as well, through dynamic changes of its characteristics [20]. In other words, learning is both the dynamic modification of the weights and the dynamic modification of the activation function characteristics.
To achieve this goal, we start from a classical activation function, namely the sigmoid function (Fig. 4), in which the value of the β coefficient is set dynamically during each learning epoch, simultaneously with the adjustment of the weights.
The initial value of β was set to 1; this value then changes (increases or decreases) depending on the level of the gradient error in each learning epoch. The percentage change of this coefficient and its direction of adjustment (increase or decrease) are set by the user at the beginning of the learning process. The added code is shown in Fig. 6. From the user's point of view, the modification adds initial settings for the training process, shown in Fig. 7, with the default values β=10% (the percentage by which β is modified) and modification direction variant=True.
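The per-epoch rule for β can be sketched as a small function. This is a hypothetical Python illustration, not the WEKA code of Fig. 6: the name update_beta is ours, mod_rate mirrors the modRate parameter and is expressed as a fraction (0.10 for 10%), the change is assumed to be multiplicative on the current value, and leaving β unchanged when the error is exactly equal is also an assumption.

```python
def update_beta(beta: float, prev_error: float, curr_error: float,
                mod_rate: float = 0.10, variant: bool = True) -> float:
    """Return the slope coefficient beta to use in the next epoch.

    variant=True : beta decreases when the epoch error increases and
                   increases when the error decreases.
    variant=False: the opposite direction.
    An unchanged error leaves beta as it is (assumption).
    """
    if curr_error == prev_error:
        return beta
    error_increased = curr_error > prev_error
    # variant=True shrinks beta on an error increase; variant=False flips this.
    shrink = error_increased if variant else not error_increased
    factor = (1.0 - mod_rate) if shrink else (1.0 + mod_rate)
    return beta * factor
```

For example, starting from β=1 with the default settings, an epoch whose error is higher than the previous one gives β=0.9, and a subsequent epoch with lower error multiplies it by 1.1.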
The study was performed on a total of 10 data sets provided by the WEKA platform.
The experimental results for these data sets are summarized in Tables 1 and 2 and in Fig. 8. After analyzing the experimental results, several observations can be made:
1) Introducing the dynamic activation function leads to a definite change in the training process, which quickly reduces the learning error.
For example, this is visible in the graph of the error evolution for the iris.arff data set (β = 10%, variant = True), shown in Fig. 9.
It can be seen that, in the first 50 epochs, modifying the characteristics of the activation function leads to a search process directed toward a lower training error. The same behavior appears in the case β=5%, variant=True, shown in the graph in Fig. 10.
2) Four data sets (zoo.arff, mushroom.arff, anneal.arff, carr.arff) show a definite reduction of the error obtained at the end of the 500 training epochs.
Compared with training using a static activation function, whose characteristics remain constant during the training process, the training error is reduced by 30% for zoo.arff, 80% for mushroom.arff, 85% for anneal.arff and 15% for carr.arff. The only data set for which the training error increases is credit_g.arff, whose error increases by 15%. For the other data sets, the variation of the error is insignificant.
3) An interesting aspect is the following: for the credit_g.arff data set, whose training error increases, the testing phase yields the same number of correctly classified instances as with an unchanged activation function. The same is observed for the zoo.arff data set: for β=5%, the training error increases by 136.9%, yet in the testing phase there is no variation in the number of correctly classified instances. This appears to be because the instances misclassified with the static activation function are not the same as those misclassified with the dynamic one: with the dynamic activation function, some instances become correctly classified while others become incorrectly classified.
4) Significant differences occur between the dynamic activation function with β = 10% and the dynamic activation function with β = 5%. In both cases, however, classification accuracy improves. It will also be necessary to find the optimal amount of dynamic modification of the activation function from one training epoch to the next.
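The percentages in observations 2) and 3) compare the final training error of the dynamic and static runs. Assuming they are relative changes with respect to the static baseline (the text does not give the formula), they correspond to:

```latex
\Delta_{\%} = 100 \cdot \frac{E_{\text{dynamic}} - E_{\text{static}}}{E_{\text{static}}}
```

Under this reading, an 80% reduction means the dynamic training error is one fifth of the static one, while the 136.9% increase for zoo.arff at β=5% means the training error is about 2.37 times the static value.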

VI. CONCLUSIONS
Introducing dynamic modification of the activation function during the training process leads to increased classification accuracy. A classical training algorithm such as backpropagation becomes more flexible, in the sense that learning is guided faster toward a generally lower classification error.
The proposed method monitors the error and checks the classification accuracy for different data sets.
The influence of the nature of the data set is no longer reflected only in the weights of the neural network; it can also be reflected in the activation function. For this reason, the dynamic modification of the activation function may represent a significant way to improve neural network classification, and this concept can be developed in a wide range of directions.


IV. EXPERIMENTS
In this study we considered a sigmoid activation function in the context of a neural network architecture without hidden layers. The study considered the following:
• The value of the β coefficient of the activation function decreases if the error increases during an epoch, and increases if the error decreases. The direction can be reversed (the β coefficient increases if the error increases during an epoch, and decreases if the error decreases); the direction is defined at the start of training.
• The percentage by which this coefficient is scaled is set by the user at the beginning of training.
In our experiments we consider the following scenario: the β coefficient of the activation function decreases if the error value increases during an epoch, and increases if the error decreases. The percentage of modification of the β coefficient was 10%, respectively 5%, of its current value in each training epoch.
To implement this scenario, the Multilayer Perceptron classifier of the Weka 3.8.3 platform was modified by adding a parameter that determines the direction of change of the β value of the activation function and a parameter for the percentage by which the coefficient is modified. The two parameters are:
• modRate, which specifies the percentage of modification of the β coefficient;
• variant, which defines the direction of change of the β value (True: the β coefficient of the activation function decreases if the error increases during an epoch and increases if the error decreases; False: the opposite).
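The scenario above can be sketched end to end for the configuration studied here, a single sigmoid layer with no hidden units. This is a hypothetical Python illustration, not the modified WEKA MultilayerPerceptron code: the names train_dynamic, mod_rate and variant only mirror the modRate and variant parameters, and the squared error, online updates, zero initial weights and learning rate are assumptions. Note that β also enters the gradient, because the derivative of the β-sigmoid is β·y·(1 − y).

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]


def sigmoid(z: float) -> float:
    """Logistic sigmoid, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_dynamic(data: Sequence[Sample], epochs: int = 500, eta: float = 0.3,
                  mod_rate: float = 0.10,
                  variant: bool = True) -> Tuple[List[float], List[float]]:
    """Train one sigmoid neuron whose slope beta is adapted after each epoch.

    Returns (errors, betas): the summed squared error of each epoch and the
    beta value used during that epoch.
    """
    weights = [0.0] * (len(data[0][0]) + 1)  # weights[0] is the bias
    beta = 1.0  # initial value stated in the text
    errors: List[float] = []
    betas: List[float] = []
    for _ in range(epochs):
        epoch_error = 0.0
        for x, target in data:
            net = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
            y = sigmoid(beta * net)
            epoch_error += 0.5 * (target - y) ** 2
            # -dE/dnet for the beta-sigmoid: (t - y) * beta * y * (1 - y).
            delta = (target - y) * beta * y * (1.0 - y)
            weights[0] += eta * delta
            for i, xi in enumerate(x, start=1):
                weights[i] += eta * delta * xi
        betas.append(beta)
        # Adapt beta by comparing this epoch's error with the previous one.
        if errors and epoch_error != errors[-1]:
            increased = epoch_error > errors[-1]
            shrink = increased if variant else not increased
            beta *= (1.0 - mod_rate) if shrink else (1.0 + mod_rate)
        errors.append(epoch_error)
    return errors, betas
```

Running train_dynamic on a toy data set with mod_rate=0.10 and again with mod_rate=0.05 mimics the two settings compared in the experiments; the returned betas list makes it possible to plot β next to the error curve.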

TABLE I
Fig. 8. Variation of Error per Epoch and Correctly Classified Instances in Dynamic Mode.

V. EVALUATION OF THE IMPACT OF INTRODUCING A DYNAMIC ACTIVATION FUNCTION ON THE TRAINING PROCESS