Natural Gradient Descent for Training Stochastic Complex-Valued Neural Networks

In this paper, the natural gradient descent method for multilayer stochastic complex-valued neural networks is considered, and the natural gradient is given for a single stochastic complex-valued neuron as an example. Since the space of the learnable parameters of stochastic complex-valued neural networks is not a Euclidean space but a curved manifold, the complex-valued natural gradient method is expected to exhibit excellent learning performance.

Keywords—Neural network; Complex number; Learning; Singular point


I. INTRODUCTION
Complex-valued neural networks, whose parameters (weights and threshold values) are all complex numbers, are useful in fields dealing with complex numbers or two-dimensional vectors, such as telecommunications, speech recognition, and image processing with the Fourier transform. Indeed, applications of complex-valued neural networks to various fields can be found in the literature [6], [9].
The multilayer complex-valued neural network is usually trained with the gradient descent learning method [5], [10], [11], [12], as in the case of the multilayer real-valued neural network. The space of the learnable parameters of stochastic complex-valued neural networks is, however, not a Euclidean space but a curved manifold. For stochastic complex-valued neural networks, the ordinary gradient therefore does not give the steepest descent direction of a target function; the steepest direction is given by the natural gradient [2], [3]. It was shown in [4] that the natural gradient method can avoid the singular points of the real-valued parameter space, which are a cause of standstill in learning, and that it can consequently improve the learning performance of real-valued neural networks. Similarly, there exist many singular points in complex-valued neural networks [7]. Thus, the natural gradient method should be useful for complex-valued neural networks as well. In this paper, we extend the natural gradient descent method for multilayer stochastic real-valued neural networks to the complex domain, and give the natural gradient for a single stochastic complex-valued neuron as an example.
Section II describes the complex-valued neural network model. Section III is devoted to the explanation of the natural gradient method, and Section IV presents the natural gradient method in complex-valued neural networks, which is followed by our conclusion in Section V.

II. COMPLEX-VALUED NEURAL NETWORK MODEL
This section describes the complex-valued neural network model used in this paper. First, we consider the following complex-valued neuron. The input signals, weights, thresholds, and output signals are all complex numbers. The net input $U_n$ to a complex-valued neuron $n$ is defined as

$U_n = \sum_m W_{nm} X_m + V_n$,

where $W_{nm}$ is the complex-valued weight connecting the complex-valued neurons $n$ and $m$, $X_m$ is the complex-valued input signal from the complex-valued neuron $m$, and $V_n$ is the complex-valued threshold value of the complex-valued neuron $n$. To obtain the complex-valued output signal, write the net input $U_n$ in terms of its real and imaginary parts as $U_n = x + iy = z$, where $i$ denotes $\sqrt{-1}$. The complex-valued output signal is defined to be

$f_C(z) = \varphi(x) + i \varphi(y)$,   (1)

where $\varphi : \mathbf{R} \to \mathbf{R}$ ($\mathbf{R}$ denotes the set of real numbers). Eq. (1) is often called a split-type complex-valued activation function. Note that the activation function $f_C$ is not a regular complex-valued function, because the Cauchy-Riemann equations do not hold.
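For illustration only, the neuron above can be sketched in a few lines of Python/NumPy; the function names (`phi`, `split_activation`, `complex_neuron`) and the choice of the logistic sigmoid for $\varphi$ are our own assumptions, not part of the model specification in this paper.

```python
import numpy as np

def phi(t):
    # Real-valued activation applied separately to the real and imaginary parts;
    # the logistic sigmoid is one common choice (an assumption here).
    return 1.0 / (1.0 + np.exp(-t))

def split_activation(z):
    # Split-type complex activation of Eq. (1): f_C(x + iy) = phi(x) + i*phi(y).
    return phi(z.real) + 1j * phi(z.imag)

def complex_neuron(X, W, V):
    # Net input U_n = sum_m W_nm X_m + V_n, followed by the split activation.
    U = W @ X + V
    return split_activation(U)

# Example: a neuron with three complex-valued inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=3) + 1j * rng.normal(size=3)
W = rng.normal(size=3) + 1j * rng.normal(size=3)
V = 0.1 + 0.2j
print(complex_neuron(X, W, V))
```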
The complex-valued neural network used in this paper consists of the complex-valued neurons described above.
Note that various types of activation functions other than Eq. (1) can naturally be considered (for example, the non-split (fully complex) type [10]).

III. NATURAL GRADIENT METHOD
This section briefly describes the natural gradient proposed in [2], [3]. Let $S = \{w \in \mathbf{R}^N\}$ be a Riemannian space with the Riemannian metric tensor $G(w) = (g_{ij}(w))$; the squared length of a small incremental vector $dw$ is then given by

$ds^2 = \sum_{i,j} g_{ij}(w)\, dw_i\, dw_j = dw^T G(w)\, dw$.

When the coordinate system is orthonormal, that is, when $G(w)$ is the unit matrix, $S$ is a Euclidean space. Let $L(w)$ be a loss function defined on $S$. Amari proved the following theorem [3].

Theorem 1:
The steepest descent direction of $L(w)$ in a Riemannian space is given by

$-\tilde{\nabla} L(w) = -G^{-1}(w) \nabla L(w)$,

where $G^{-1}(w)$ is the inverse of the Riemannian metric tensor $G(w)$ and $\nabla L$ is the conventional gradient

$\nabla L(w) = \left( \dfrac{\partial L}{\partial w_1}, \cdots, \dfrac{\partial L}{\partial w_N} \right)^T$.

The vector $\tilde{\nabla} L(w) = G^{-1}(w) \nabla L(w)$ is called the natural gradient of $L$ in the Riemannian space. The natural gradient descent algorithm is given by

$w_{t+1} = w_t - \varepsilon_t \tilde{\nabla} L(w_t)$,

where $\varepsilon_t$ is the learning rate.
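As a rough illustration of Theorem 1 (ours, not taken from [3] or this paper), the following Python/NumPy sketch performs natural gradient descent steps when the metric $G(w)$ and the ordinary gradient are supplied by the caller; a fixed metric and a toy quadratic loss are assumed.

```python
import numpy as np

def natural_gradient_step(w, grad_L, G, eps):
    # Natural gradient: solve G(w) * ng = grad L(w) rather than forming G^{-1} explicitly.
    ng = np.linalg.solve(G, grad_L)
    # Update rule: w_{t+1} = w_t - eps_t * G^{-1}(w_t) grad L(w_t).
    return w - eps * ng

# Toy usage: quadratic loss L(w) = 0.5 * w^T A w with an assumed fixed metric G.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
G = np.array([[2.0, 0.0], [0.0, 0.5]])   # assumed Riemannian metric at w
w = np.array([1.0, -1.0])
for t in range(100):
    grad_L = A @ w
    w = natural_gradient_step(w, grad_L, G, eps=0.1)
print(w)   # converges toward the minimizer w = 0
```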
Amari derived the natural gradients explicitly in the case of the space of real-valued perceptrons for neural learning, the space of matrices for blind source separation, and the space of linear dynamical systems for blind multichannel source deconvolution [3].

IV. NATURAL GRADIENT IN COMPLEX-VALUED NEURAL NETWORKS
In this section, the natural gradient is applied to complex-valued neural networks, and the natural gradient descent algorithm is explicitly derived for a single complex-valued neuron.

A. Natural Gradient Learning in Complex-Valued Neural Networks
Let us consider a stochastic complex-valued multilayer feedforward neural network with $N$ input neurons, one output neuron, and a learnable complex-valued vector parameter $w = (w_1, \cdots, w_N)^T \in \mathbf{C}^N$ which consists of all the weights and thresholds ($\mathbf{C}$ denotes the set of complex numbers). Assume that the complex-valued input signal $z = (z_1, \cdots, z_N)^T \in \mathbf{C}^N$ is subject to an unknown probability distribution $q(z)$, and that the complex-valued output signal $y \in \mathbf{C}$ is given by

$y = g_C(z; w) + n$,

where $g_C$ is a complex function and $n = n_R + i n_I$ is a complex-valued random variable subject to a complex normal distribution (or bivariate normal distribution) $N(\mu, \Sigma)$. The model specifies the probability density of the input-output pair as

$p(z, y; w) = q(z) \cdot p(y|z; w)$.   (7)

Define the loss function $l(z, y; w)$, incurred when the input signal $z$ is processed by the stochastic complex-valued neural network having parameter $w$, as

$l(z, y; w) = -\log p(z, y; w) = -\log q(z) - \log p(y|z; w)$.   (8)

Given the training set $\{(z_t, y_t),\ t = 1, \cdots, T\}$, minimizing the loss function (Eq. (8)) over the training set is equivalent to maximizing the probability that the stochastic complex-valued neural network outputs the training output signals $y_t$.
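Spelled out, the equivalence claimed above is the standard maximum-likelihood identity (a routine derivation, not a numbered equation of this paper):

$\sum_{t=1}^{T} l(z_t, y_t; w) = -\sum_{t=1}^{T} \log q(z_t) - \log \prod_{t=1}^{T} p(y_t | z_t; w)$,

and since the first term on the right-hand side does not depend on $w$, the parameter minimizing the total loss is exactly the maximum-likelihood estimate of $w$ for the training set.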
The space of all the probability distributions which the above stochastic complex-valued neural network realizes can be regarded as a $2N$-dimensional Riemannian space, because each complex-valued parameter consists of two real-valued parameters: the real part and the imaginary part. Thus, the information geometry [1] can be applied to the complex-valued case, too.
The natural gradient descent algorithm for the complex-valued neural network is given by

$v_{t+1} = v_t - \varepsilon_t \tilde{\nabla} l(z_t, y_t; v_t)$,

where $\{(z_t, y_t) \in \mathbf{C}^N \times \mathbf{C},\ t = 1, 2, \cdots\}$ is the sequence of the complex-valued training signals, $v \in \mathbf{R}^{2N}$ denotes the real-valued parameter vector consisting of the real and imaginary parts of $w$, and

$\tilde{\nabla} l(z, y; v) = G^{-1}(v) \nabla l(z, y; v)$   (11)

is the natural gradient of $l(z, y; v)$. The usual gradient $\nabla l(z, y; v)$ is given by

$\nabla l(z, y; v) = \left( \dfrac{\partial l}{\partial v_1}, \cdots, \dfrac{\partial l}{\partial v_{2N}} \right)^T$.

The Riemannian metric tensor $G(v) = (g_{ij}(v))$ is the Fisher information matrix [3], and is given by

$g_{ij}(v) = E\!\left[ \dfrac{\partial \log p(z, y; v)}{\partial v_i}\, \dfrac{\partial \log p(z, y; v)}{\partial v_j} \right]$.
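The following Python/NumPy sketch (ours, not from the paper) illustrates an online update of the form above: the complex parameter vector is handled through its $2N$ real coordinates, the Fisher information matrix is approximated by the empirical average of outer products of per-sample gradients, and each step solves a linear system with $G(v)$ instead of inverting it. The routine `grad_neg_log_lik`, which returns $\nabla l(z, y; v)$ for the chosen model $g_C$, is assumed to be supplied by the user.

```python
import numpy as np

def to_real(w):
    # Complex parameter vector w in C^N -> real coordinates v in R^(2N).
    return np.concatenate([w.real, w.imag])

def to_complex(v):
    n = v.size // 2
    return v[:n] + 1j * v[n:]

def natural_gradient_update(v, grad_neg_log_lik, samples, eps, damping=1e-3):
    """One natural-gradient step: v <- v - eps * G(v)^{-1} grad l(z, y; v).

    grad_neg_log_lik(z, y, v) must return d(-log p(y|z; v))/dv in R^(2N);
    `samples` is a batch of (z, y) pairs used to approximate the Fisher matrix.
    """
    grads = np.array([grad_neg_log_lik(z, y, v) for z, y in samples])
    # Empirical Fisher information: average outer product of per-sample gradients,
    # with a small damping term for numerical stability (an extra assumption).
    G = grads.T @ grads / len(samples) + damping * np.eye(v.size)
    g = grads.mean(axis=0)
    return v - eps * np.linalg.solve(G, g)
```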

B. Natural Gradient Learning in a Single Complex-Valued Neuron
In this section, the natural gradient descent learning algorithm for a single complex-valued neuron is given.

Consider a stochastic complex-valued neuron with $N$ inputs, complex-valued weights $w = (w_1, \cdots, w_N)^T \in \mathbf{C}^N$, and a complex-valued threshold $v \in \mathbf{C}$, whose output $y \in \mathbf{C}$ is given by

$y = f_C\!\left( \sum_{m=1}^{N} w_m z_m + v \right) + n$,

where $f_C : \mathbf{C} \to \mathbf{C}$ is a so-called split-type complex-valued activation function, defined to be

$f_C(a + ib) = \varphi(a) + i \varphi(b)$

for any $a + ib \in \mathbf{C}$, and $\varphi : \mathbf{R} \to \mathbf{R}$ is suitably chosen; for example, the sigmoid function was used in [11], and the scaled error function was used in [13]. $n = n_R + i n_I$ is a complex-valued random variable subject to the complex normal distribution (or bivariate normal distribution) $N(\mu, \Sigma)$, where the real part $n_R$ and the imaginary part $n_I$ are independent of each other. We assume that the input signal $z = x + iy$, where $x = (x_1, \cdots, x_N)^T$ and $y = (y_1, \cdots, y_N)^T$, is subject to the multivariate complex normal distribution (or $2N$-dimensional normal distribution) $N(0, I)$, where the variance-covariance matrix $I$ is the unit matrix; denote its joint probability density function by $q(z)$. The loss function $l(z, y; \theta)$ is defined as

$l(z, y; \theta) = -\log p(z, y; \theta) = -\log q(z) - \log p(y|z; \theta)$,

where $\theta \in \mathbf{R}^{2N+2}$ is the real-valued parameter vector consisting of the real and imaginary parts of the weights $w_1, \cdots, w_N$ and the threshold $v$. Given the sequence of the complex-valued training signals $\{(z_t, y_t),\ t = 1, 2, \cdots\}$, the natural gradient descent algorithm for the stochastic complex-valued neuron is given by

$\theta_{t+1} = \theta_t - \varepsilon_t G^{-1}(\theta_t)\, \nabla l(z_t, y_t; \theta_t)$.

We shall calculate the Fisher information matrix $G(\theta) = (g_{ij}(\theta))$. For any $1 \le i, j \le 2N + 2$,

$g_{ij}(\theta) = E\!\left[ \dfrac{\partial \log p(z, y; \theta)}{\partial \theta_i}\, \dfrac{\partial \log p(z, y; \theta)}{\partial \theta_j} \right] = E\!\left[ \dfrac{\partial \log p(y|z; \theta)}{\partial \theta_i}\, \dfrac{\partial \log p(y|z; \theta)}{\partial \theta_j} \right]$

(from Eq. (21)). Here, since $n_R$ is independent of $n_I$, the conditional density factorizes as

$p(y|z; \theta) = p_R\!\big(\mathrm{Re}(y) - \varphi(S_R)\big)\, p_I\!\big(\mathrm{Im}(y) - \varphi(S_I)\big)$,

where $S = S_R + i S_I = \sum_{m=1}^{N} w_m z_m + v$ denotes the net input and $p_R$, $p_I$ are the probability densities of $n_R$ and $n_I$, respectively, so that $\log p(y|z; \theta)$ is the sum of the two corresponding log-densities. By simple calculations, we obtain
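To make the above computation concrete, the following Python/NumPy sketch (ours, not the paper's derivation) trains such a single split-sigmoid neuron by natural gradient descent, assuming zero-mean noise with a common variance `sigma2` on the real and imaginary channels; the Fisher information matrix is approximated by Monte-Carlo sampling of inputs and model outputs, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def phi(t):
    # Logistic sigmoid, one possible choice for the real activation phi.
    return 1.0 / (1.0 + np.exp(-t))

def dphi(t):
    s = phi(t)
    return s * (1.0 - s)

def unpack(theta, N):
    # theta in R^(2N+2): real parts of w, imaginary parts of w, then v^R, v^I.
    w = theta[:N] + 1j * theta[N:2 * N]
    v = theta[2 * N] + 1j * theta[2 * N + 1]
    return w, v

def grad_neg_log_lik(z, y, theta, sigma2=0.1):
    # Gradient of -log p(y|z; theta) for y = f_C(w.z + v) + n, assuming
    # independent zero-mean Gaussian noise n_R, n_I with variance sigma2.
    N = z.size
    w, v = unpack(theta, N)
    S = w @ z + v                                  # complex net input
    eR = phi(S.real) - y.real                      # residual, real channel
    eI = phi(S.imag) - y.imag                      # residual, imaginary channel
    aR = eR * dphi(S.real) / sigma2
    aI = eI * dphi(S.imag) / sigma2
    g_wR = aR * z.real + aI * z.imag               # derivative w.r.t. Re(w)
    g_wI = -aR * z.imag + aI * z.real              # derivative w.r.t. Im(w)
    return np.concatenate([g_wR, g_wI, [aR, aI]])

def fisher_matrix(theta, N, n_samples=2000, sigma2=0.1):
    # Monte-Carlo estimate of G(theta): inputs drawn with standard normal real
    # and imaginary parts, outputs drawn from the model itself, averaging
    # outer products of the score.
    D = 2 * N + 2
    G = np.zeros((D, D))
    w, v = unpack(theta, N)
    for _ in range(n_samples):
        z = rng.normal(size=N) + 1j * rng.normal(size=N)
        S = w @ z + v
        y = (phi(S.real) + rng.normal(scale=np.sqrt(sigma2))) \
            + 1j * (phi(S.imag) + rng.normal(scale=np.sqrt(sigma2)))
        g = grad_neg_log_lik(z, y, theta, sigma2)
        G += np.outer(g, g)
    return G / n_samples

# One natural-gradient step for a 3-input neuron on a toy training pair.
N = 3
theta = rng.normal(scale=0.1, size=2 * N + 2)
z_t = rng.normal(size=N) + 1j * rng.normal(size=N)
y_t = 0.5 + 0.5j
G = fisher_matrix(theta, N) + 1e-6 * np.eye(2 * N + 2)   # small damping added
theta -= 0.1 * np.linalg.solve(G, grad_neg_log_lik(z_t, y_t, theta))
```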