Construction of Neural Networks that Do Not Have Critical Points Based on Hierarchical Structure

A critical point is a point at which the derivatives of an error function are all zero. It has been shown in the literature that critical points caused by the hierarchical structure of a real-valued neural network (NN) can be local minima or saddle points, whereas most critical points caused by the hierarchical structure of a complex-valued neural network are saddle points. Several studies have demonstrated that singularities of this kind have a negative effect on learning dynamics in neural networks. This paper shows that decomposing a high-dimensional neural network into an equivalent lower-dimensional neural network yields a neural network that has no critical point based on the hierarchical structure. Concretely, the following three cases are shown: (a) A 2-2-2 real-valued NN is constructed from a 1-1-1 complex-valued NN. (b) A 4-4-4 real-valued NN is constructed from a 1-1-1 quaternionic NN. (c) A 2-2-2 complex-valued NN is constructed from a 1-1-1 quaternionic NN. Because these NNs have no critical point based on a hierarchical structure, they suffer comparatively little from the negative effect of singular points during learning.


I. INTRODUCTION
A neural network (NN) is a network composed of neurons, and can be trained to find nonlinear relationships in data. NNs have been studied for many years in the hope of achieving human-like flexibility in processing information. The common objective of training an NN is to determine the global minimum of an error function. However, learning algorithms for NNs such as the back-propagation learning algorithm generally take a very long time to find the global minimum because learning comes to a standstill. Fukumizu and Amari demonstrated that critical points in a three-layer real-valued NN with $H-1$ hidden neurons behave as critical points in a three-layer real-valued NN with $H$ hidden neurons, and that they are local minima or saddle points. Critical points of this kind become singular points of a real-valued NN and cause training to stagnate.
A complex-valued NN extends the (real-valued) parameters of an ordinary NN, such as weights and threshold values, to complex numbers. It is suitable for processing complex-valued data and two-dimensional data. Moreover, it is applicable to communications, image processing, biological information processing, land-mine detection, wind prediction, independent component analysis (ICA), etc. Reportedly, a critical point in a three-layer complex-valued NN behaves in the same manner as that in a three-layer real-valued NN [1]: critical points in a three-layer complex-valued NN with $H-1$ hidden neurons turn into critical points in a three-layer complex-valued NN with $H$ hidden neurons, which are saddle points (except for cases meeting rare conditions). Such singular points have recently emerged as objects of study. Learning models with a hierarchical structure or a symmetry under exchange of weights, such as a hierarchical NN and a Gaussian mixture model, usually have singular points. It has been revealed that a singular point affects the training dynamics of a learning model and that it causes training to stagnate. This paper presents an attempt to implement an NN having no critical point based on a hierarchical structure.

II. ANALYSIS
In this section, it is demonstrated that NNs having no critical point based on a hierarchical structure can be constructed by decomposing a high-dimensional NN into equivalent lower-dimensional NNs.

A. Construction of a 2-2-2 real-valued NN
A 2-2-2 real-valued NN having no critical point based on a hierarchical structure is constructed from a 1-1-1 complex-valued NN.
Consider a 1-1-1 complex-valued NN (called NET 1 here). We will use $a + ib \in \mathbb{C}$ for the weight between the input neuron and the hidden neuron, $v + iw \in \mathbb{C}$ for the weight between the hidden neuron and the output neuron, $c + id \in \mathbb{C}$ for the threshold of the hidden neuron, and $p + iq \in \mathbb{C}$ for the threshold of the output neuron, where $i$ denotes $\sqrt{-1}$ and $\mathbb{C}$ denotes the set of complex numbers. We assume that $a + ib \neq 0$ and $v + iw \neq 0$. Let $x + iy \in \mathbb{C}$ denote the input signal, and let $X + iY \in \mathbb{C}$ denote the output signal. We will use activation functions defined by the following equations:
$$f_C(z) = \tanh(\mathrm{Re}\, z) + i \tanh(\mathrm{Im}\, z) \tag{1}$$
for the hidden neuron, and
$$g_C(z) = z \tag{2}$$
for the output neuron. This 1-1-1 complex-valued NN is evidently equivalent to a 2-2-2 real-valued NN (called NET 2 here) shown in Fig. 1.
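The equivalence between the 1-1-1 complex-valued NN and NET 2 can be checked numerically. The following is a minimal sketch, assuming a split activation $\tanh(\mathrm{Re}\,z) + i\tanh(\mathrm{Im}\,z)$ for the hidden neuron and an identity activation for the output neuron; the function names `net1_complex` and `net2_real` and the sample parameter values are illustrative and do not appear in the original paper.

```python
import math

def net1_complex(z, w_in, w_out, th_hidden, th_out):
    """1-1-1 complex-valued NN (assumed: split tanh hidden neuron, identity output)."""
    u = w_in * z + th_hidden
    h = complex(math.tanh(u.real), math.tanh(u.imag))
    return w_out * h + th_out

def net2_real(x, y, a, b, v, w, c, d, p, q):
    """2-2-2 real-valued NN obtained by splitting each complex number into two reals."""
    # (a + ib)(x + iy) + (c + id): hidden neuron 1 sees weights (a, -b),
    # hidden neuron 2 sees weights (b, a).
    h1 = math.tanh(a * x - b * y + c)
    h2 = math.tanh(b * x + a * y + d)
    # (v + iw)(h1 + i*h2) + (p + iq)
    X = v * h1 - w * h2 + p
    Y = w * h1 + v * h2 + q
    return X, Y

# Arbitrary example parameters (assumptions, chosen so that a+ib != 0 and v+iw != 0).
a, b, v, w, c, d, p, q = 0.7, -0.3, 1.2, 0.5, 0.1, -0.2, 0.05, 0.4
x, y = 0.9, -1.1

out_c = net1_complex(complex(x, y), complex(a, b), complex(v, w),
                     complex(c, d), complex(p, q))
out_r = net2_real(x, y, a, b, v, w, c, d, p, q)
print(out_c, out_r)
```

Both networks share the same eight real parameters; the real-valued network simply carries the constraint that its weight matrices have the form of a complex multiplication.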

Proposition 1 NET 2 has no critical point based on a hierarchical structure.
(Proof) Assume a 2-1-2 real-valued NN obtained by removing hidden neuron 1 from NET 2 (called NET 3 here) (Fig. 2). Also assume that the learning parameter of NET 3 is a critical point that implements a mapping $F_1(x, y)$. To implement the same mapping $F_1$ by appending the once-removed hidden neuron 1 to NET 3 again, one of the following three conditions must hold.
1) The weight vector between the appended hidden neuron 1 and the two output neurons is $\mathbf{0}$. Then $v = w = 0$ must hold, but this violates the assumption $v + iw \neq 0$.
2) The weight vector between the appended hidden neuron 1 and the two input neurons is $\mathbf{0}$. Then $a = b = 0$ must hold, but this violates the assumption $a + ib \neq 0$.
3) For the weight vector $\mathbf{w}_1$ between the appended hidden neuron 1 and the two input neurons and the weight vector $\mathbf{w}_2$ between hidden neuron 2 and the two input neurons, $\mathbf{w}_1 = \mathbf{w}_2$ or $\mathbf{w}_1 = -\mathbf{w}_2$. Then $a = b = 0$ must hold, but this violates the assumption $a + ib \neq 0$.
Therefore, the mapping $F_1$ cannot be implemented by NET 3 with the original hidden neuron 1 appended and having the weight structure of NET 2.
The description above covers the case in which hidden neuron 1 is removed, but removal of hidden neuron 2 leads to the same conclusion. Consequently, NET 2 has no critical point based on a hierarchical structure. (QED)
See Appendix A for the practical implementation process of the 2-2-2 real-valued NN having no critical points based on a hierarchical structure.
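Condition 3 of the proof can be checked directly. The sketch below assumes that the two hidden neurons of NET 2 receive the real and imaginary parts of $(a+ib)(x+iy)$, so that their input weight vectors are $(a, -b)$ and $(b, a)$; the function names and the small integer grid are illustrative only. The two vectors are orthogonal and have the same length, so they can agree up to sign only when $a = b = 0$.

```python
def input_weight_vectors(a, b):
    """Input weight vectors of hidden neurons 1 and 2 for the complex weight a + ib (assumed layout)."""
    return (a, -b), (b, a)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def agree_up_to_sign(a, b):
    """True when w1 = w2 or w1 = -w2, i.e. when condition 3 of the proof is met."""
    w1, w2 = input_weight_vectors(a, b)
    return w1 == w2 or w1 == tuple(-t for t in w2)

grid = range(-3, 4)  # small integer grid, exact arithmetic
always_orthogonal = all(
    dot(*input_weight_vectors(a, b)) == 0 for a in grid for b in grid
)
degenerate = [(a, b) for a in grid for b in grid if agree_up_to_sign(a, b)]
print(always_orthogonal, degenerate)
```

On this grid the only weight that satisfies condition 3 is $a + ib = 0$, which the assumption on NET 2 excludes.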

B. Construction of 4-4-4 real-valued NN
A 4-4-4 real-valued NN having no critical point based on a hierarchical structure is constructed from a 1-1-1 quaternionic NN. The quaternionic NN is an extension of the classical real-valued neural network to quaternions, in which the weights, threshold values, input signals, and output signals are all quaternions. A quaternion is a four-dimensional number invented by W. R. Hamilton in 1843.
Consider a 1-1-1 quaternionic NN (called NET 4 here). Let the weight between the input neuron and the hidden neuron be $A = a + ib + jc + kd \in \mathbb{Q}$, and the weight between the hidden neuron and the output neuron be the quaternion $B \in \mathbb{Q}$, where $\mathbb{Q}$ represents the set of quaternions. We assume that $A \neq 0$ and $B \neq 0$. Let $C = p + iq + jr + ks \in \mathbb{Q}$ denote the threshold of the hidden neuron, $D \in \mathbb{Q}$ represent the threshold of the output neuron, $I = v + iw + jx + ky \in \mathbb{Q}$ be the input signal, and $O = V + iW + jX + kY \in \mathbb{Q}$ be the output signal. One activation function is used for the hidden neuron and another for the output neuron. Because quaternion multiplication is noncommutative, the computational result varies with the order in which an input value and a weight are multiplied: $IA \neq AI$ in general. Accordingly, quaternion neurons of two kinds exist: a normal quaternary neuron (computing $AI$) and an inverse quaternary neuron (computing $IA$). This paper specifically addresses, as an example, a quaternionic NN that comprises only inverse quaternary neurons. Decomposing each quaternion of NET 4 into four real numbers gives an equivalent 4-4-4 real-valued NN (called NET 5 here) shown in Fig. 3.
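The order dependence of quaternion multiplication can be illustrated with a small sketch. The helper `qmul` below is a hypothetical name for the standard Hamilton product on `(real, i, j, k)` tuples, and the numeric values of `I` and `A` are arbitrary examples, not values from the paper.

```python
def qmul(p, q):
    """Hamilton product of two quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k part
    )

I = (1, 2, 3, 4)  # example input signal  v + iw + jx + ky
A = (5, 6, 7, 8)  # example weight        a + ib + jc + kd

inverse_neuron = qmul(I, A)  # an inverse quaternary neuron computes IA
normal_neuron = qmul(A, I)   # a normal quaternary neuron computes AI
print(inverse_neuron)  # (-60, 12, 30, 24)
print(normal_neuron)   # (-60, 20, 14, 32)
```

The real parts coincide while the three imaginary parts differ, which is why the two neuron types must be distinguished.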

Proposition 2 NET 5 has no critical point based on a hierarchical structure.
(Proof) Assume a 4-3-4 real-valued NN obtained by removing hidden neuron 1 from NET 5 (called NET 6 here). Also assume that the learning parameter of NET 6 is a critical point that implements a mapping $F_2(v, w, x, y)$. To implement the same mapping $F_2$ by appending the once-removed hidden neuron 1 to NET 6 again, one of the following three conditions must hold.

1) The weight vector between the appended hidden neuron 1 and the four output neurons is $\mathbf{0}$. Then $B = 0$ must hold, but this violates the assumption $B \neq 0$.
2) The weight vector between the appended hidden neuron 1 and the four input neurons is $\mathbf{0}$. Then $A = 0$ must hold, but this violates the assumption $A \neq 0$.
3) Letting $\mathbf{w}_j$ denote the weight vector between hidden neuron $j$ and the four input neurons for any $1 \le j \le 4$, where hidden neuron 1 is the appended one, there exists some $2 \le m \le 4$ such that $\mathbf{w}_1 = \mathbf{w}_m$ or $\mathbf{w}_1 = -\mathbf{w}_m$. In this case, $A = 0$ must hold, which violates the assumption $A \neq 0$.

Therefore, the mapping $F_2$ cannot be implemented by NET 6 with the original hidden neuron 1 appended and having the weight structure of NET 5.
The description above covers the case in which hidden neuron 1 is removed, but removal of hidden neuron $j$ leads to the same conclusion ($2 \le j \le 4$). Consequently, NET 5 has no critical point based on a hierarchical structure. (QED)
See Appendix B for the practical implementation process of the 4-4-4 real-valued NN having no critical points based on a hierarchical structure.
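The third condition in the quaternionic case can also be checked mechanically. The sketch below assumes inverse quaternary neurons (net input $IA$) and reads off the four real input weight vectors of the hidden neurons by passing unit quaternions through the Hamilton product; `qmul`, `input_weight_vectors`, and the integer grid are illustrative choices, not part of the original paper. The four vectors are mutually orthogonal with the common squared length $a^2 + b^2 + c^2 + d^2$, so two of them can agree up to sign only when $A = 0$.

```python
from itertools import combinations, product

def qmul(p, q):
    """Hamilton product of two quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

UNITS = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]

def input_weight_vectors(A):
    """Entry n holds the real weights from inputs (v, w, x, y) to hidden neuron n+1."""
    images = [qmul(e, A) for e in UNITS]  # image of each unit input under I -> IA
    return [tuple(images[m][n] for m in range(4)) for n in range(4)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def some_pair_agrees_up_to_sign(A):
    rows = input_weight_vectors(A)
    return any(u == v or u == tuple(-t for t in v)
               for u, v in combinations(rows, 2))

rows = input_weight_vectors((5, 6, 7, 8))  # arbitrary example weight
orthogonal = all(dot(u, v) == 0 for u, v in combinations(rows, 2))
same_norm = len({dot(u, u) for u in rows}) == 1

grid = list(product(range(-2, 3), repeat=4))  # small integer grid, exact arithmetic
degenerate = [A for A in grid if some_pair_agrees_up_to_sign(A)]
print(orthogonal, same_norm, degenerate)
```

On this grid, the only weight for which two hidden neurons share an input weight vector up to sign is $A = 0$, in line with condition 3 of the proof.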

C. Construction of 2-2-2 complex-valued NN
A 2-2-2 complex-valued NN having no critical point based on a hierarchical structure is constructed from a 1-1-1 quaternionic NN.
Equation (6) has the meaning described below. The weight $A$ between the input neuron and the hidden neuron of NET 7 can be written using Cayley-Dickson notation as follows:
$$A = a + ib + jc + kd = x_1 + x_2 j, \tag{8}$$
where $x_1 = a + ib \in \mathbb{C}$ and $x_2 = c + id \in \mathbb{C}$. From (8), Equation (6) can be rewritten as a condition relating $x_1$ and $i x_2$. Therefore, if we regard $x_1$ and $x_2$ respectively as two vectors, $x_1$ and $x_2$ do not intersect orthogonally. If (6) did not hold, the weight $A$ would have information related only to $x_1$. Consequently, (6) means the exclusion of such a special case.
In addition, (7) means that the threshold value of the hidden neuron is 0, which is necessary for applying the conditions for a complex-valued NN to be reducible.
Cayley-Dickson notation reveals that NET 7 is equivalent to a 2-2-2 complex-valued NN (called NET 8 here) shown in Fig. 4. The activation functions are given as (1) and (2).
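The Cayley-Dickson form can be made concrete with a short sketch. It assumes the convention $q = z_1 + z_2 j$ with $z_1, z_2 \in \mathbb{C}$ and uses the identity $jz = \bar{z}j$; the function names `qmul`, `to_pair`, and `pair_mul` and the sample values are hypothetical. It checks that multiplying two quaternions as pairs of complex numbers agrees with the Hamilton product, which is what allows a quaternionic neuron computing $IA$ to be rewritten with complex-valued signals and weights.

```python
def qmul(p, q):
    """Hamilton product of two quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def to_pair(q):
    """Write a + ib + jc + kd as (a + ib) + (c + id) j and return the two complex parts."""
    a, b, c, d = q
    return complex(a, b), complex(c, d)

def pair_mul(p, q):
    """(v1 + x1 j)(a1 + c1 j) = (v1*a1 - x1*conj(c1)) + (v1*c1 + x1*conj(a1)) j."""
    v1, x1 = p
    a1, c1 = q
    return v1 * a1 - x1 * c1.conjugate(), v1 * c1 + x1 * a1.conjugate()

I = (1, 2, 3, 4)  # example input signal
A = (5, 6, 7, 8)  # example weight

via_hamilton = to_pair(qmul(I, A))
via_pairs = pair_mul(to_pair(I), to_pair(A))
print(via_hamilton)  # ((-60+12j), (30+24j))
print(via_pairs)     # ((-60+12j), (30+24j))
```

Under this convention, each complex hidden neuron receives a complex-linear combination of the two complex inputs, with weights built from the two complex parts of $A$ and their conjugates.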

Proposition 3 NET 8 has no critical point based on a hierarchical structure (as a complex-valued NN).
Fig. 4. NET 8. The quantities $v'$, $x'$, $a'$, $c'$, $V'$, $X'$ and the remaining weights and thresholds shown are all complex numbers. The threshold values of the hidden neurons are all omitted because they are 0.
(Proof) Assume a 2-1-2 complex-valued NN obtained by removing hidden neuron 1 from NET 8 (called NET 9 here). Also assume that the learning parameter of NET 9 is a critical point that implements a complex mapping $G(v', x')$.
To implement the same complex mapping $G$ by appending the once-removed hidden neuron 1 to NET 9 again, one of the following three conditions must hold.

1) The weight vector between the appended hidden neuron 1 and the two output neurons is $\mathbf{0}$. Then both complex components of the quaternionic weight between the hidden neuron and the output neuron of NET 7 must be 0, but this violates the assumption that this weight is nonzero.
2) The weight vector between the appended hidden neuron 1 and the two input neurons is $\mathbf{0}$. Then $a' = c' = 0$ must hold, where $a' = a + ib$ and $c' = c + id$ are the two complex numbers into which $A$ is decomposed, but this violates the assumption $A = a' + c' j \neq 0$.
3) The weight vector between the appended hidden neuron 1 and the two input neurons and the weight vector between hidden neuron 2 and the two input neurons satisfy the third condition for a complex-valued NN to be reducible. Then $a' = c' = 0$ must hold, but this violates the assumption $A = a' + c' j \neq 0$.

Therefore, the complex mapping $G$ cannot be implemented by NET 9 with the original hidden neuron 1 appended and having the weight structure of NET 8.
The description above covers the case in which hidden neuron 1 is removed, but removal of hidden neuron 2 leads to the same conclusion. Consequently, NET 8 has no critical point based on a hierarchical structure (as a complex-valued NN). (QED)
This paper assumes the threshold value of the hidden neuron of the 1-1-1 quaternionic NN to be 0 (Equation (7)). This assumption is necessary to apply the three conditions for a complex-valued NN to be reducible, as used in the proof of Proposition 3. As a result, all threshold values of the hidden neurons of the obtained 2-2-2 complex-valued NN are 0. Considering a 1-1-1 quaternionic NN with a possibly non-zero threshold value of the hidden neuron might yield a 2-2-2 complex-valued NN with possibly non-zero threshold values of the hidden neurons. For a three-layer complex-valued NN with possibly non-zero threshold values of the hidden neurons to be reducible, exceptional reducibility is necessary in addition to the three conditions presented above [2].
See Appendix C for the practical implementation process of the 2-2-2 complex-valued NN having no critical points based on a hierarchical structure.

III. DISCUSSION
Fukumizu and Amari proved that a critical point of the three-layered real-valued NN with $H-1$ hidden neurons always gives many critical points of the three-layered real-valued NN with $H$ hidden neurons. These critical points can be local minima or saddle points.
Local minima cause plateaus, which have a strong negative influence on learning. Recently, it was proven that most of the local minima that Fukumizu et al. discovered are resolved by extending the real-valued NN to complex numbers; most of the critical points attributable to the hierarchical structure of the complex-valued NN are saddle points, which is a prominent property of the complex-valued NN [1]. That is, many critical points based on a hierarchical structure exist both in the real-valued NN and in the complex-valued NN.
Such critical points can be local minima or saddle points in the real-valued NN, whereas most critical points of the complex-valued NN are saddle points. In both cases, however, critical points do exist in the networks. This paper attempts to remove the critical points based on a hierarchical structure themselves from NNs.

IV. CONCLUSION
This paper presented a proposal for an implementation process of an NN having no critical point based on a hierarchical structure. The results demonstrate that real-valued and complex-valued NNs having no critical point based on a hierarchical structure can be constructed by decomposing a high-dimensional NN into equivalent real-valued or complex-valued NNs. Concretely, the following three cases are shown: (a) A 2-2-2 real-valued NN is constructed from a 1-1-1 complex-valued NN. (b) A 4-4-4 real-valued NN is constructed from a 1-1-1 quaternionic NN. (c) A 2-2-2 complex-valued NN is constructed from a 1-1-1 quaternionic NN. Because these NNs have no critical point based on a hierarchical structure, they suffer comparatively little from the negative effect of singular points during learning.
The author expects to address the following issues in future studies.
1) Although quaternionic NNs that comprise only inverse quaternary neurons are used for this study, the case with normal quaternary neurons shall be considered.
2) General complex-valued NNs with possibly non-zero threshold values of the hidden neurons shall be analyzed, which requires consideration of exceptional reducibility [2].
3) A $2^s$-dimensional Clifford NN having no critical point based on a hierarchical structure shall be produced by decomposing a general $2^n$-dimensional Clifford NN [3] into equivalent Clifford NNs of $2^s$ dimensions ($s < n$).

APPENDIX
A. Implementation of the 2-2-2 real-valued NN
The practical implementation of the 2-2-2 real-valued NN having no critical points based on a hierarchical structure is described below.
1) Consider NET 1 (1-1-1 complex-valued NN) defined in Section II-A.
2) Create NET 2 shown in Fig. 1 by decomposing NET 1, where each complex number is decomposed into two real numbers. That is, the complex number $a + ib \in \mathbb{C}$ representing the complex-valued weight between the input neuron and the hidden neuron is decomposed into the two real numbers $a \in \mathbb{R}$ and $b \in \mathbb{R}$. The complex number $v + iw \in \mathbb{C}$ representing the complex-valued weight between the hidden neuron and the output neuron is decomposed into the two real numbers $v \in \mathbb{R}$ and $w \in \mathbb{R}$. The complex number $c + id \in \mathbb{C}$ representing the complex-valued threshold of the hidden neuron is decomposed into the two real numbers $c \in \mathbb{R}$ and $d \in \mathbb{R}$. The complex number $p + iq \in \mathbb{C}$ representing the complex-valued threshold of the output neuron is decomposed into the two real numbers $p \in \mathbb{R}$ and $q \in \mathbb{R}$.
3) The activation functions of NET 2 are as follows: $u \mapsto \tanh(u)$ for the hidden neurons, and $u \mapsto u$ for the output neurons. The following conditions are imposed on NET 2 for the assumption that $a + ib \neq 0$ and $v + iw \neq 0$: $(a, b) \neq (0, 0)$ and $(v, w) \neq (0, 0)$.

B. Implementation of the 4-4-4 real-valued NN
The practical implementation of the 4-4-4 real-valued NN having no critical points based on a hierarchical structure is described below.
1) Consider NET 4 (1-1-1 quaternionic NN) defined in Section II-B.
2) Create NET 5 shown in Fig. 3 by decomposing NET 4, where each quaternion is decomposed into four real numbers. That is, the quaternion $A = a + ib + jc + kd \in \mathbb{Q}$ representing the quaternionic weight between the input neuron and the hidden neuron is decomposed into the four real numbers $a \in \mathbb{R}$, $b \in \mathbb{R}$, $c \in \mathbb{R}$, and $d \in \mathbb{R}$. The quaternion $B \in \mathbb{Q}$ representing the quaternionic weight between the hidden neuron and the output neuron is decomposed into its four real components. The quaternion $C = p + iq + jr + ks \in \mathbb{Q}$ representing the quaternionic threshold of the hidden neuron is decomposed into the four real numbers $p \in \mathbb{R}$, $q \in \mathbb{R}$, $r \in \mathbb{R}$, and $s \in \mathbb{R}$. The quaternion $D \in \mathbb{Q}$ representing the quaternionic threshold of the output neuron is decomposed into its four real components. The activation functions of NET 5 follow from those of NET 4 by the same decomposition.

C. Implementation of the 2-2-2 complex-valued NN
The practical implementation of the 2-2-2 complex-valued NN having no critical points based on a hierarchical structure is described below.
1) Consider NET 7 (1-1-1 quaternionic NN) defined in Section II-C. Create NET 8 shown in Fig. 4 by decomposing NET 7, where each quaternion is decomposed into two complex numbers. That is, the quaternion $A = a + ib + jc + kd \in \mathbb{Q}$ representing the quaternionic weight between the input neuron and the hidden neuron is decomposed into the two complex numbers $a' = a + ib \in \mathbb{C}$ and $c' = c + id \in \mathbb{C}$. The quaternion representing the quaternionic weight between the hidden neuron and the output neuron is decomposed into two complex numbers in the same way. The activation functions of NET 8 are the following: (1) for the hidden neurons and (2) for the output neurons.

A parameter value at which the derivatives of the error function $E$ are all zero is designated as a critical point of the error function $E$. A critical point can be a local minimum, a local maximum, or a saddle point. Fukumizu et al. mathematically proved the existence of a local minimum resulting from a hierarchical structure in a real-valued NN (an ordinary NN handling real-valued signals).