A two-level on-line learning algorithm of Artificial Neural Network with forward connections

Abstract: An Artificial Neural Network with cross-connection is one of the most popular network structures. The structure contains an input layer, at least one hidden layer and an output layer. When an ANN structure is analysed and described, the first parameter is usually the number of the ANN's layers. A hierarchical structure is a default and accepted way of describing the network. Using this assumption, the network structure can be described from a different point of view. A set of concepts and models can be used to describe the complexity of the ANN's structure, in addition to using a two-level learning algorithm. When the hierarchical structure is applied to the learning algorithm, the ANN structure is divided into sub-networks. Every sub-network is responsible for finding the optimal value of its weight coefficients using a local target function to minimise the learning error. The second, coordination level of the learning algorithm is responsible for coordinating the local solutions and finding the minimum of the global target function. In the article special emphasis is placed on the coordinator's role in the learning algorithm and on its target function. In each iteration the coordinator has to send coordination parameters to the first-level sub-networks. Using the input vector X and the teaching vector Z, the local procedures work and find their weight coefficients. At the same step the feedback information is calculated and sent to the coordinator. The process is repeated until the minimum of the local target functions is achieved. As an example, a two-level learning algorithm is used to implement an ANN in the underwriting process for classifying the category of health in a life insurance company.


I. INTRODUCTION
In practice many ANN structures are used, but the most popular are ANNs with forward connections that have a complete or semi-complete set of weight coefficients. The structure of such an ANN is depicted in Fig. 1. Neurons in both the hidden and the output layers use sigmoid or tanh activation functions. In the output layer a linear activation function is usually used for approximation tasks. In the most common structures the hidden layers include more neurons than the input layer, so input information is not compressed in the hidden layers. In this paper two assumptions are accepted:
• To define an ANN structure only the hidden layers and the output layer are counted. A network described as ANN (10-15-8) includes 10 neurons in the input layer, 15 neurons in one hidden layer and 8 in the output layer. It is a two-layer ANN.
• To implement a two-level learning algorithm, an ANN with one hidden layer is used. The concept of a layer is used in its primary sense.

Fig. 1: Scheme of the ANN with forward connections

(IJARAI) International Journal of Advanced Research in Artificial Intelligence, www.ijarai.thesai.org

Using the symbols of Fig. 1, a set of forward and backward formulas can be written. The forward formulas describe the first layer (the hidden layer), the second layer (the output layer) and the target function; the coordinator is described by the Ψ function. In these formulas j = 0, 1, ..., n_0, where n_0 is the number of input neurons; i = 0, 1, ..., n_1, where n_1 is the number of hidden neurons; and k = 1, 2, ..., n_2, where n_2 is the number of output neurons.
Using standard backpropagation notation, the derivatives with respect to the weight coefficients are obtained. Equation (4) is known as the coordination function.
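As a minimal sketch of the forward formulas and the target function for an ANN (10-15-8), the following Python code assumes sigmoid neurons in both layers, a bias input stored at index 0 (which is one way to read the index ranges starting at 0), and a target function equal to half the sum of squared errors. The function names and the random initialisation are illustrative assumptions, not the notation of Fig. 1.

```python
import math
import random


def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def init_weights(n_neurons, n_inputs, rng):
    # One row per neuron; column 0 holds the bias weight (assumption).
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
            for _ in range(n_neurons)]


def layer_forward(weights, inputs):
    extended = [1.0] + list(inputs)  # constant bias input at index 0
    return [sigmoid(sum(w * a for w, a in zip(row, extended)))
            for row in weights]


def forward(w1, w2, x):
    v = layer_forward(w1, x)   # first layer (hidden layer)
    y = layer_forward(w2, v)   # second layer (output layer)
    return v, y


def target_function(y, z):
    # Half the sum of squared differences between output y and teacher z.
    return 0.5 * sum((yk - zk) ** 2 for yk, zk in zip(y, z))


rng = random.Random(0)
W1 = init_weights(15, 10, rng)
W2 = init_weights(8, 15, rng)
x = [rng.random() for _ in range(10)]
z = [1.0 if k == 2 else 0.0 for k in range(8)]  # hypothetical teacher vector
v, y = forward(W1, W2, x)
phi = target_function(y, z)
```

Here `v` is the hidden-layer output and `y` the network output; `phi` is the value of the target function for one input vector.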

II. TYPES OF HIERARCHICAL MODELS
Using the concepts described in [1], an ANN will be treated as a complex system with an internal hierarchical structure. Three notions are introduced in relation to an ANN:
• the layer of description or abstraction of both an ANN and a learning algorithm,
• the layer of algorithm complexity,
• the layer of algorithm structure.
To distinguish between these concepts, the following three terms are used respectively: a stratum, a level and an echelon. The term layer is used as a common term referring to any of these concepts. For the later formal description of the different concepts of hierarchical structure, we describe an ANN as a relation between the sets X and Y.

A. ANN layers of description or abstraction
To treat an ANN as a complex system and to describe it in a complete and detailed way, a different approach should be used. There arises a dilemma between the simplicity of the description and a complete understanding of an ANN's behaviour [1]. At the first stage, a verbal description is used to help understand how an ANN is built. For a more detailed analysis, mathematical descriptions using algebra and/or differential equations are required. Finally, the mathematical formulas have to be implemented in a computer program or an electronic device. Therefore, to achieve a complete description of an ANN, a family of concepts and models from different fields of science and technology has to be used. Every model uses its own set of variables, laws, principles and terminology by means of which an ANN is described. For such a hierarchical description the functioning on any level should be as independent as possible.
To separate this concept of hierarchy from the others, a new name is used: a stratified ANN or a stratified description [1]. The layer of abstraction will be referred to as a stratum (Fig. 2).

Fig. 2: The ANN stratification description
Using this definition we can state that [1]:
• the selection of strata in terms of which an ANN is described depends on the scientist, their target and their needs;
• the concepts in which every stratum is described should be as independent as possible;
• one can comprehensively understand how an ANN works by moving down the hierarchy of strata;
• a stratified description implies a reduction in the information sent up the hierarchy.
The input set X and the output set Y are both representable as Cartesian products. It is assumed that two families of sets are given, where n_s is the number of strata in which one describes an ANN structure. If the concepts in which every stratum is described are fully independent, the ANN stratification can be described stratum by stratum for i = 1, 2, ..., n_s.
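One plausible way to write the statements above, following the general form used for stratified systems in [1], is sketched below. The symbols X_i, Y_i and S_i are assumptions introduced for illustration; the exact formulas of the original text are not reproduced here.

```latex
% Assumed notation: X_i, Y_i are the input and output sets of stratum i,
% S_i is the description of the ANN on stratum i.
X = X_1 \times X_2 \times \dots \times X_{n_s}, \qquad
Y = Y_1 \times Y_2 \times \dots \times Y_{n_s}

% With fully independent strata, each stratum maps its own inputs to its own outputs:
S_i : X_i \rightarrow Y_i, \qquad i = 1, 2, \dots, n_s
```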

B. Organisational hierarchy
A multi-layer ANN can be sectioned into a number of hidden layers and one output layer. Each such smaller part will be described as a sub-network. Every sub-network has its own output vector which is, at the same time, the input vector of the succeeding one: V_{i,j}, i = 1, 2, ..., n − 1, j = 2, 3, ..., n, where n is the number of sub-networks. Because of this specific organisation of an ANN's hierarchy there are many sub-networks on the first level, and a local target function is defined for each of them. This set of local tasks has to be coordinated to achieve the global solution. The coordinator, as an independent task, has its own target function Ψ. Taking everything into account, this concept is the basis on which the new scheme of the ANN learning algorithm structure can be built (Fig. 3). It is a hierarchical organisational structure. To distinguish this concept of hierarchy from the others, the term echelon is used.

Fig. 3: The hierarchical structure of an ANN learning algorithm
The two-level ANN learning algorithm can be described as a set of procedures. The procedures on the first level are responsible for solving their local tasks and calculating their part of the matrix of weight coefficients. The second-level procedure has to coordinate all the local procedures (tasks) using its own target function. There is a vertical interaction between the procedures, and two types of information are sent. One is the downward transmission of control signals from the coordinator to the first level. The other is sent upward from the first level to the second: it is a feedback signal that informs the coordinator about the behaviour of the first-level tasks. Consequently, in the whole structure three different tasks are defined:
• the global target function Φ,
• the set of first-level tasks,
• the coordinator task Ψ.
To build the two-level learning algorithm two assumptions have been made:
• There is no explicit relation between the procedures on the first level for direct communication. The procedures use only their input and output vectors and the coordinator signal.
• There is no direct relation between the global target function Φ and the coordinator task Ψ.

C. Levels of calculation complexity
The standard ANN learning algorithm is a non-linear minimisation task without constraints. To solve this task, iterative procedures are used. Using the most popular backpropagation algorithm, one has to choose many control parameters. From a theoretical point of view there are only general suggestions and recommendations regarding the choice of actual parameter values, for example the learning parameters. The algorithm is time-consuming and its convergence is slow. When the primary algorithm for the whole ANN is divided into sub-network tasks, the local target functions are simpler and can be handled by different procedures. Additionally, a new procedure is needed: the coordination procedure. In practice, however, the coordinator does not have the ability to find all the parameters needed for the first-level procedures. To solve this problem, a multi-level decision hierarchy is proposed [1]. When the iterative algorithms on both the first and the second level are run, certain dynamic processes can be observed. These processes are non-linear and use many control parameters. During the learning process these parameters are kept constant. Practice shows that this solution is not optimal. To control the way the learning parameters change, an additional level could be used: the adaptation level (Fig. 4).
Thus, one can build three levels at a minimum:
• The local optimisation procedures: the algorithm is defined directly as a minimisation task without constraints.
• The coordination procedure: this algorithm could also be defined directly as the minimisation of a target function. Constraints may or may not exist.
• The adaptation procedure: the task or procedure on this level should specify the values of the learning parameters not only for the coordinator level but also for the first level. To solve this task, the procedure should obtain the dynamic characteristics of the learning process on all the levels.
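The text does not specify an adaptation rule, so the following Python sketch only illustrates what a procedure on the adaptation level might look like. It assumes a simple heuristic that is not taken from the paper: the learning coefficient grows slightly while the observed error keeps falling and shrinks when the error rises. The function name, the factors and the bounds are all hypothetical.

```python
def adapt_learning_coefficient(alpha, prev_error, curr_error,
                               grow=1.05, shrink=0.7,
                               alpha_min=1e-4, alpha_max=1.0):
    """Return a new learning coefficient from the last two error values.

    Heuristic assumption: reward a falling error with a slightly larger
    step, punish a rising error with a clearly smaller one.
    """
    if curr_error < prev_error:
        alpha *= grow
    else:
        alpha *= shrink
    # Keep the coefficient inside a fixed, hypothetical range.
    return min(alpha_max, max(alpha_min, alpha))


# Hypothetical sequence of target-function values observed on a lower level.
errors = [1.2, 0.9, 0.95, 0.6]
alpha = 0.1
history = [alpha]
for prev, curr in zip(errors, errors[1:]):
    alpha = adapt_learning_coefficient(alpha, prev, curr)
    history.append(alpha)
```

The same function could be called separately for each first-level procedure and for the coordinator, which is how the adaptation level would influence all the levels below it.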
In conclusion, one can state that the complexity of the problem increases from each level to the next. The coordination and adaptation procedures need more time to solve their own tasks. The two-layer ANN with an input layer, one hidden layer and an output layer is used for further considerations. This simple structure is very popular, and many practical tasks can be solved with it. Since this network is used to solve different classification tasks, sigmoid activation functions are used. To decompose the standard learning algorithm structure into sub-network tasks, the coordination target function has to be built. The two-level learning algorithm structure for the ANN with one hidden layer is shown in Fig. 5.

Fig. 5: Scheme of two-level learning algorithm structure
According to [2], [8] the following set of formulas can be written.
1) For the first sub-network:
The local target function Φ1 is defined as the mean-square error, together with the other relations, where γ1 is the target value given by the coordinator. The total derivatives with respect to the weight coefficients of matrix W1 are calculated using the derivative of the sigmoid function. The new values of the weight coefficients are calculated with α1, the learning coefficient, and β1, the regularisation coefficient. The feedback information is sent by the first sub-network to the coordinator for i = 1, 2, ..., n_1.
2) For the second sub-network:
The local target function Φ2 is also defined as the mean-square error, together with the other relations, where γ2_i is the input value given by the coordinator. The total derivatives with respect to the weight coefficients of matrix W2 are again calculated using the derivative of the sigmoid function and give the new values of the weight coefficients. As stated in (30), the local target function Φ2 is a function of the W2 weight coefficients and of the γ2 parameters given by the coordinator. The total derivatives with respect to the coordination parameters γ2 give the new feedback information sent by the second sub-network to the coordinator.
3) For the coordinator:
In a two-level learning algorithm, the coordinator plays the main role. A coordination principle now has to be chosen. This principle specifies the possible strategies for the coordinator and determines its structure. In [1] three ways in which the interaction could be performed were introduced.
• Interaction Prediction. The coordination input may involve a prediction of the interface input.
• Interaction Decoupling. Each first-level sub-system solves its own task and can treat the interface input as an additional free decision variable. This means that the sub-systems are completely decoupled.
• Interaction Estimation. The coordinator specifies the ranges over which the interface inputs may vary.
In this article Interaction Prediction is used. The coordinator predicts the interface between the sub-networks. This means that the output of the first sub-network V_{1,1} and the input of the second sub-network V_{1,2} are predicted. The signal γ1 predicts the output signal V_{1,1} of the first sub-network. The first sub-network uses this signal as the teacher's value, and it is part of the target function Φ1 of the first sub-network according to formula (22). The coordinator predicts the signal γ2 as well. The second sub-network uses this signal as its input value V_{1,2}. This assumption makes it possible to define the local target function Φ2 of the second sub-network (30). Consequently, two local target functions Φ1 and Φ2 can be defined. As stated above, the coordinator needs feedback information from the first-level sub-networks to check whether the predicted signals γ1, γ2 were correct. If not, the coordinator, using its own target function, should find new values of the coordination signals γ1, γ2. The first sub-network uses formula (24) to calculate the new value of its output signal which, at the same time, is its feedback signal to the coordinator (29). The second sub-network tries to minimise the local target function Φ2 and calculates the new optimal value of its input signal (38), which is sent to the coordinator. Therefore, the coordinator has full information and is ready to calculate and predict the new values of the coordination signals γ1, γ2. Taking this into account, the coordinator target function Ψ is defined. Using a gradient algorithm, one can calculate the new values of the coordinator signals γ1_i and γ2_i.

Fig. 6: Functional multi-level hierarchy for an ANN learning algorithm
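The interaction-prediction scheme described above can be illustrated with a short Python sketch of one coordination iteration. It is a sketch under stated assumptions, not the paper's formulas: a single shared prediction `gamma` is used for both sub-networks (γ1 = γ2), the regularisation coefficient β is omitted, Ψ is taken as half the sum of squared differences between the prediction and each feedback signal, and the step sizes `alpha` and `lam` are arbitrary. All function names are hypothetical.

```python
import math
import random


def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def forward(W, inputs):
    ext = [1.0] + list(inputs)  # index 0 is the bias input (assumption)
    return [sigmoid(sum(w * a for w, a in zip(row, ext))) for row in W]


def train_layer(W, inputs, target, alpha):
    """One gradient step on 0.5 * sum((out - target)^2).

    Returns the layer output (before the update) and the gradient of the
    local target function with respect to the layer inputs.
    """
    ext = [1.0] + list(inputs)
    out = forward(W, inputs)
    delta = [(o - t) * o * (1.0 - o) for o, t in zip(out, target)]
    grad_in = [sum(d * row[j + 1] for d, row in zip(delta, W))
               for j in range(len(inputs))]
    for row, d in zip(W, delta):
        for j, a in enumerate(ext):
            row[j] -= alpha * d * a
    return out, grad_in


def two_level_step(x, z, W1, W2, gamma, alpha=0.5, lam=0.3):
    # First level, sub-network 1: gamma is used as the teacher value.
    v, _ = train_layer(W1, x, gamma, alpha)
    feedback1 = v  # output signal sent up to the coordinator
    # First level, sub-network 2: gamma is used as the input vector.
    y, grad_gamma = train_layer(W2, gamma, z, alpha)
    feedback2 = [g - alpha * dg for g, dg in zip(gamma, grad_gamma)]
    # Second level: assumed coordinator target and one gradient step on gamma.
    psi = 0.5 * sum((g - f1) ** 2 + (g - f2) ** 2
                    for g, f1, f2 in zip(gamma, feedback1, feedback2))
    new_gamma = [g - lam * ((g - f1) + (g - f2))
                 for g, f1, f2 in zip(gamma, feedback1, feedback2)]
    phi1 = 0.5 * sum((a - g) ** 2 for a, g in zip(v, gamma))
    phi2 = 0.5 * sum((a - t) ** 2 for a, t in zip(y, z))
    return new_gamma, phi1, phi2, psi


rng = random.Random(1)
n0, n1, n2 = 10, 15, 8
W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n0 + 1)] for _ in range(n1)]
W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n1 + 1)] for _ in range(n2)]
x = [rng.random() for _ in range(n0)]            # one hypothetical input vector
z = [1.0 if k == 3 else 0.0 for k in range(n2)]  # hypothetical teacher vector
gamma = [0.5] * n1                               # initial coordinator prediction
for _ in range(50):
    gamma, phi1, phi2, psi = two_level_step(x, z, W1, W2, gamma)
```

In a full on-line run one prediction vector would be kept for every training record, and the loop would continue until the local target functions stop decreasing.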

IV. EXAMPLE
In a life insurance company the underwriting process plays the main role in risk control and premium calculation. ANNs could be used to help insurance agents classify an insurance applicant and calculate the first level of the premium. Therefore, a special short questionnaire was prepared which includes only 10 main questions. The data were used to teach the ANN to work as an insurance specialist, known as the underwriter. All data were divided into three subsets:
• The first set, of both the input data X and the output data Z, includes 250 records. This set is known as the learning set. As an example, a small part of the input data is shown in Fig. 7. A learning epoch includes 250 vectors that are sent to the ANN input one by one. When this sequence is finished, the next iteration begins.
• The second set is used to verify the quality of the learning process. This set contains 150 records. It is known as the verification set.
• Finally, the third set, known as the testing set, contains only 100 records. It helps the decision-making specialist to decide whether the ANN achieves good quality and is ready for use.
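The division into learning, verification and testing sets can be sketched in Python as follows. The records here are synthetic placeholders, and shuffling before the split is an assumption; the text does not say how the records were assigned to the three sets.

```python
import random


def split_records(records, sizes=(250, 150, 100), seed=0):
    """Split records into learning, verification and testing sets."""
    if sum(sizes) != len(records):
        raise ValueError("subset sizes must add up to the number of records")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # assumption: random assignment
    n_learn, n_verify, _ = sizes
    learning = shuffled[:n_learn]
    verification = shuffled[n_learn:n_learn + n_verify]
    testing = shuffled[n_learn + n_verify:]
    return learning, verification, testing


# Placeholder records: (record id, category from 1 to 8); not real data.
records = [(i, i % 8 + 1) for i in range(500)]
learning, verification, testing = split_records(records)
```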

Fig. 7: An input data example
To achieve this, the two-level learning algorithm has been used to teach the ANN. The structure of the ANN includes only one hidden layer: 10 input neurons, matching the dimensionality of the vector X, 15 neurons in the hidden layer and 8 neurons in the output layer. This structure can be described briefly as the ANN (10-15-8). Two sub-networks were introduced in accordance with the algorithm description. The first sub-network includes the hidden layer and its local target function Φ1. The second sub-network includes the output layer with its local target function Φ2. The coordinator has its own target function Ψ and coordinates the local tasks to achieve the minimum of the global target function Φ (the whole ANN). The main goal of this example is to study the dynamic characteristic of the ANN learning process, especially the relations between all the target functions: the two local target functions Φ1 and Φ2, the coordinator target function Ψ and the global target function Φ. All values have to reach their minimum according to the relations shown in formulas (22), (30) and (40).
Fig. 8 shows the dynamic characteristic of the learning process of the first sub-network. In the beginning phase, the target function Φ1 decreases from 1.2 to less than 0.1 within 4,000 iterations. After that, it decreases very slowly to the target value 0.001.
Studying the dynamic characteristic of the second sub-network (Fig. 9), one can say that in the beginning phase of the learning process the error decreases faster and reaches the value 0.1 after only 2,000 iterations. The differences between the sub-networks can be explained by the dimensionality of the W1 and W2 matrices. The matrix W1 includes 165 weight coefficients, while W2 includes only 128. Finally, the coordinator target function is shown in Fig. 10. The character of the dynamic process is the same as for both Φ1 and Φ2. The starting error is the greatest and reaches the value 0.1 after 6,000 iterations, which can be explained by the relations between the sub-networks. When the learning process starts, the sub-networks are not connected (they are decoupled). This means that every sub-network has to change both its weight coefficients and its input-output vectors using the coordinator signal γ. The coordinator calculates the optimal γ value using the feedback information from both sub-networks. This is an iterative process and it has two stages. During the first stage all errors decrease dramatically, rather quickly reaching a value of less than 0.1. After that, the process stabilises and approaches its final value only slowly.

When the ANN achieves the final learning result, the verification set is used and the differences between the teacher's data and the ANN's calculations are compared (Fig. 12). It can be seen that not all the output vectors are the same. In most cases, the ANN calculates a lower value than the teacher (an insurance specialist). The chart shows the category numbers for a couple of insurance candidates. When the category is higher, the insurance premium is greater as well.

Fig. 12: Result of the ANN's learning

Therefore, one can state that the ANN is more conservative than a life insurance company specialist. For a few candidates it calculates a higher category, and they would pay a higher premium. Finally, the dynamic characteristic of the global target function is shown (Fig. 13). The characteristic is closely related to all the above. The maximum value is higher and the process needs more time to reach a value of less than 0.1, namely about 8,000 iterations (the sum of the local values).
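The weight counts quoted above for the ANN (10-15-8) can be checked with a few lines of Python. The sketch assumes that each neuron has one bias weight in addition to one weight per input, which is the reading under which the counts 165 and 128 come out exactly; the function name is illustrative.

```python
def weight_count(n_inputs, n_neurons, bias=True):
    # Each neuron has one weight per input plus, optionally, one bias weight.
    return n_neurons * (n_inputs + (1 if bias else 0))


n0, n1, n2 = 10, 15, 8            # ANN (10-15-8)
w1_count = weight_count(n0, n1)   # hidden layer: 15 * (10 + 1)
w2_count = weight_count(n1, n2)   # output layer: 8 * (15 + 1)
```

With these assumptions `w1_count` is 165 and `w2_count` is 128, so the first sub-network has more coefficients to tune than the second.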

V. CONCLUSION
In [1] three coordination principles are defined for large systems with a hierarchical structure. For the ANN learning process interaction prediction was used. Each sub-network is responsible for finding the minimum value of its own target function, treating the interface inputs as additional variables. The γ signal plays this role. For the first sub-network, γ1 is used as the teacher data, and the sub-network should change its own weight coefficients in such a way that its final output is as close to the teacher value as possible (in the square-error sense). For the second sub-network, γ2 works as the input vector. This vector and the teacher vector are used to train the sub-network. The coordinator is responsible for finding the optimal value of the γ1 signal, using its own target function Ψ. The underwriting process was used to show that this learning algorithm structure is capable of finding the minimum of the global target function Φ for quite a complicated problem and that the ANN is ready to work. The weight coefficients of both the W1 and W2 matrices were stored and the program was sent to insurance agents to use. As has been emphasised, during the second stage the learning algorithm works far from the optimum. Convergence errors decrease very slowly, and Fig. 15 reaffirms the quality of the dynamic characteristics, where n is the number of iterations. Therefore the coordinator should use a more complicated coordination algorithm that includes not only a PD algorithm structure but a PID algorithm as well. This work will be continued. Analysing the learning result shown in Fig. 12, one can see that the ANN should find a discrete category value (from the set 1 to 8). From time to time, the network's solutions differ from the specialist's decisions. Shifting the solution into fuzzy sets could solve this conflict. What follows is the suggestion that the second sub-network should

Fig. 4: Functional multi-level hierarchy for an ANN learning algorithm

Fig. 8: The target function Φ1 of the first sub-network depending on the iteration number

The final value is reached only after a considerably long time (a large number of iterations). This part of the algorithm is not optimal, and the coordinator should change the strategy used to calculate a new γ value.

Fig. 10: The coordinator target function value Ψ depending on the iteration number

Fig. 11: The part of the learning process including oscillations

Fig. 13: Learning the global target function Φ for the ANN

Characteristics of the feedback signals of the first and the second sub-network, as functions of the iteration number n, are depicted in Fig. 14.