Cultural Algorithm Initializes Weights of Neural Network Model for Annual Electricity Consumption Prediction

The accurate prediction of annual electricity consumption is crucial in managing energy operations. The neural network (NN) has achieved considerable success in yearly electricity consumption prediction due to its universal approximation property. However, the well-known back-propagation (BP) algorithm for training NNs easily gets stuck in local optima. In this paper, we study the weight initialization of an NN for the prediction of annual electricity consumption using the Cultural Algorithm (CA); the proposed algorithm is named NN-CA. The NN-CA was compared to weight initialization using six other metaheuristic algorithms as well as the BP. The experiments were conducted on annual electricity consumption datasets taken from 21 countries. The experimental results showed that the proposed NN-CA achieved better prediction accuracy than its competitors. This result indicates the potential of the proposed NN-CA in the application of annual electricity consumption prediction.

Keywords—Neural network; weights initialization; metaheuristic algorithm; cultural algorithm; annual electricity consumption prediction


I. INTRODUCTION
Electricity is a major driving force for economic development in many countries. The overall demand for power increases continuously and is expected to become even more prominent in the future.
APEC is the acronym of the Asia-Pacific Economic Cooperation, a cooperative economic group in the Asia-Pacific region. APEC's high growth rates in recent decades have resulted in a significant increase in electricity consumption. APEC energy data has proved essential in tracking energy consumption and reduction, and in determining the group's renewable energy goals. APEC is committed to improving efficient energy technologies by setting targets and action plans, thereby creating the necessity to predict future electricity consumption accurately.
The artificial neural network (ANN) is a computational model based on the learning process of human perception and the function of the brain's nervous system, and it has been widely applied to various problems in classification, pattern recognition, regression, and prediction. Human learning processes are largely characterized by pattern recognition: people observe unknown objects and perceive their identities as distinct from others, especially when the objects are viewed more often and in different ways, which results in learning and memory. The human brain contains numerous processing units linked by nervous systems that perform rapid analysis and decision making. The artificial neural network represents a simulation of the human brain [1] [2]. Many studies regarding ANNs have been conducted for solutions in various disciplines.

A. Background
An ANN is a distributed data processing system consisting of several simple calculation elements working together through weighted connections. This calculation architecture was inspired by the human brain, which can learn intricate data patterns and generalize from them. ANNs can be categorized into several types according to not only supervised and unsupervised learning methods but also feedforward and feedback architectures.
The most commonly used ANN architecture is the multilayer perceptron (MLP). The weights of an MLP can be adjusted using both gradient-based and stochastic-based processes. The original gradient-based supervised training algorithm of the MLP is the error back-propagation (BP) algorithm [3]. BP and its variants are the most frequently used neural network training techniques for classification and prediction [4] [5].
However, the gradient-based method has two significant disadvantages: slow convergence speed and being easily trapped at a local minimum because of its high dependency on the initial parameters (weights) [6] [7]. Metaheuristic algorithms can overcome these disadvantages of the gradient-based algorithms. Algorithms of this kind use randomization-based techniques to perform exploration and exploitation searches [8], and they are capable of generating solutions to complex real-life problems that gradient-based methods are unable to solve [9]. The population-based structure is the most efficient and commonly used architecture in metaheuristic algorithms. The two most often used categories of metaheuristic algorithms are evolutionary and swarm intelligence algorithms [10] [11].
Metaheuristic algorithms have been applied as supervised training algorithms of MLPs. For a given problem (input and target values), both the structure and weights of an MLP can be optimized. In this paper, we focus on selecting proper initial values of the connecting weights in an MLP network; a metaheuristic algorithm performs the initial weight selection. The existing metaheuristics that have been used to train MLPs for annual electricity consumption prediction include the Artificial Bee Colony (ABC) [12] [13], Teaching-Learning-Based Optimization (TLBO) [13], Harmony Search (HS) [14], and the Jaya Algorithm (JA) [15]. Prior studies found that an ANN model optimized by the TLBO algorithm (ANN-TLBO) to predict electric energy demand outperformed the ANN-BP and ANN-ABC models [13]. In another study conducted to predict electricity consumption, comparing the ANN-BP, ANN-ABC, ANN-HS, ANN-TLBO, and ANN-JA models, the ANN-TLBO yielded better efficiency than the other models [15]. The TLBO algorithm itself is a two-phase algorithm, with a teacher phase and a learner phase [16].
The No Free Lunch (NFL) theorem states that no optimization algorithm is superior for all optimization problems [17]. Although a variety of evolution-based algorithms have been implemented and examined in the literature for MLP training, the problem of becoming trapped in local minima still exists. The Cultural Algorithm (CA) is very similar to the TLBO because it is also a two-level algorithm, consisting of the population level and the belief space level [18]. This characteristic might lead to more efficient initial weight selection. Therefore, we propose herein a new MLP training method based on the CA to develop a single hidden layer neural network for annual electricity consumption prediction.

II. MATERIALS AND METHODS
A. Multilayer Perceptron for Neural Model Training
The MLP is a widely used type of feedforward neural network having a multi-layered structure for complex tasks. There are several layers, namely the input layer, the hidden layers, and the output layer. Each layer of an MLP comprises numerous neurons, and connecting weights link the neurons of two consecutive layers. The connecting weights are represented by real numbers in [−1, 1]. The input layer is responsible for receiving information for the neural network and sending it to the first hidden layer through the connecting weights. In an MLP fully interconnected by weights, each neuron of the hidden layer contains summation and activation functions. The weighted summation of the inputs is described in Eq. (1), where $x_i$ is the $i$-th input variable and $w_{ij}$ is the connection weight between $x_i$ and hidden neuron $j$:

$$s_j = \sum_{i=1}^{n} w_{ij} x_i \qquad (1)$$

An activation function is used to trigger the output of neurons based on the value of the summation function. The sigmoid function is most often applied, although different types of activation functions may be utilized in the MLP. Each node $j$ of the hidden layer calculates its output by Eq. (2) [19]:

$$h_j = \frac{1}{1 + e^{-s_j}} \qquad (2)$$

The outputs of the lower hidden layer are fed to the adjacent layer. Once all neurons in the last hidden layer produce results, the output of the network is obtained by Eq. (3), where $v_j$ is the weight connecting hidden neuron $j$ to the output:

$$y = \sum_{j} v_j h_j \qquad (3)$$
The initialization of the weights of a neural network is one of the essential problems, as network initialization can speed up the learning process. Zero initialization [20] and Random initialization [21] are generally practiced techniques used to initialize the parameters. Traditionally, the weights of a neural network are set to small random numbers.
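As a concrete illustration, random weight initialization in [−1, 1] and the forward pass of Eqs. (1)–(3) can be sketched in Python as follows (a minimal sketch only; the paper's experiments were implemented in MATLAB, and the function and variable names here are illustrative, not from the paper):

```python
import math
import random

def init_weights(n_in, n_hidden, n_out, low=-1.0, high=1.0):
    """Randomly initialize all connecting weights in [low, high]."""
    w_ih = [[random.uniform(low, high) for _ in range(n_in)] for _ in range(n_hidden)]
    w_ho = [[random.uniform(low, high) for _ in range(n_hidden)] for _ in range(n_out)]
    return w_ih, w_ho

def sigmoid(s):
    """Eq. (2): sigmoid activation of the weighted sum."""
    return 1.0 / (1.0 + math.exp(-s))

def forward(x, w_ih, w_ho):
    """Forward pass: weighted sums (Eq. 1), sigmoid activations (Eq. 2),
    and the weighted combination of hidden outputs (Eq. 3)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_ih]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_ho]
```

For example, a 4:9:1 network (four inputs, nine hidden neurons, one output, as in the first sizing strategy used later) is created with `init_weights(4, 9, 1)` and evaluated with `forward`.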

B. Cultural Algorithm
The Cultural Algorithm (CA) is a kind of evolutionary algorithm first presented by R. G. Reynolds [18]. Its computational model is based on principles of human social and cultural evolution, making practical use of the learning process through various agent-based techniques based on experience and knowledge gained over time. The cultural process allows for improved efficiency in finding the optimal solution within a search space and makes it easier to find the global optimal solution. The cultural changes within an optimization problem model represent information transmitted within and between populations. The main principle of the CA is to preserve socially accepted beliefs and discard unacceptable beliefs.
The CA can be divided into two main components: a population space and a belief space. Each member of the former is evaluated through a performance function, and the evolution of the population may be carried out by an Evolutionary Algorithm (EA). An acceptance function then determines which individuals are to impact the belief space. At each generation, the knowledge acquired in the population search (e.g., the population's best solution) is memorized in the belief space [22]. The interaction and cooperation between the two spaces are similar to the evolution of human culture [23]. The significant components of the CA are shown in Fig. 1.
The CA uses a dual evolutionary mechanism, in which elite individuals from the lower-level population periodically enter the top-level belief space. In turn, the evolved beliefs at the top level influence the lower-level population [25]. This mechanism improves both the population diversity and the convergence characteristics. The interested reader can see [18] for more details of the CA.
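A minimal sketch of this dual-space mechanism on a toy minimization problem may look as follows (illustrative only; the acceptance rule and the knowledge sources here are simplified from Reynolds' full formulation, and all names are our own):

```python
import random

def cultural_algorithm(fitness, dim, bounds=(-1.0, 1.0), pop_size=20, iters=200, accept_frac=0.2):
    """Toy cultural algorithm: a population space evolves candidate solutions,
    while a belief space stores knowledge extracted from accepted elite
    individuals and influences how new candidates are generated."""
    lo, hi = bounds
    pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    # Belief space: situational knowledge (best solution so far) and
    # normative knowledge (per-variable ranges of the accepted individuals).
    best = min(pop, key=fitness)
    n_accept = max(1, int(accept_frac * pop_size))
    for _ in range(iters):
        # Acceptance function: the top individuals update the belief space.
        pop.sort(key=fitness)
        accepted = pop[:n_accept]
        if fitness(accepted[0]) < fitness(best):
            best = list(accepted[0])
        norm = [(min(a[d] for a in accepted), max(a[d] for a in accepted))
                for d in range(dim)]
        # Influence function: new candidates are sampled inside the
        # normative ranges promoted by the belief space.
        children = [[random.uniform(norm[d][0], norm[d][1]) for d in range(dim)]
                    for _ in range(pop_size - n_accept)]
        pop = accepted + children
    return best
```

On a simple sphere function, `cultural_algorithm(lambda v: sum(x * x for x in v), dim=3)` converges toward the origin as the normative ranges contract around the accepted elite.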

C. Cultural Algorithm for Training Neural Network Model
We propose the CA as a training algorithm of the neural network model. The CA will find a proper set of initial weights for an MLP; from now on, we call the proposed algorithm NN-CA. It can be applied not only to a single hidden layer but also to several hidden layers. Two main aspects must be considered when the approach is used: the representation of the weights as the search agent of the CA, and the selection of the fitness function.
The representation is straightforward: all the weights of an MLP are organized and indexed to form a row vector. This vector is a search agent of the CA. The fitness function will be explained after the presentation of the workflow. The workflow of the CA approach applied to train the neural network model may be described in the following steps:
1) Initialization: the search agents in the population and belief spaces are randomly generated for training. Each search agent represents a possible MLP. Each dataset is separated into a training part and a testing part.
2) Fitness evaluation: the quality of each candidate MLP is evaluated through a fitness function. All the weights of a search agent are mapped to an MLP, and then each MLP is assessed by the selected fitness function. Typically, the Mean Squared Error (MSE), which depends on the neural network training model and the problem of interest, is selected.
3) Update the accepted population in the belief space.
4) Steps 2 and 3 are repeated until the termination condition is met.
5) The reliability evaluation of the neural network model with the lowest MSE value is conducted on the testing part of the dataset to determine the Mean Absolute Error (MAE).
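The mapping from a search agent (a flat weight vector) to an MLP and its MSE fitness in step 2 might be sketched as follows (a sketch under the assumption of a single hidden layer, no bias terms, and a single output; all names are illustrative):

```python
import math

def decode_agent(agent, n_in, n_hidden):
    """Split a flat weight vector (search agent) into input-to-hidden and
    hidden-to-output weights for an MLP of size n_in : n_hidden : 1."""
    w_ih = [agent[i * n_in:(i + 1) * n_in] for i in range(n_hidden)]
    w_ho = agent[n_hidden * n_in:n_hidden * n_in + n_hidden]
    return w_ih, w_ho

def mse_fitness(agent, samples, n_in, n_hidden):
    """Fitness of a search agent: MSE of the decoded MLP over training samples,
    where each sample is a pair (input vector, target value)."""
    w_ih, w_ho = decode_agent(agent, n_in, n_hidden)
    err = 0.0
    for x, target in samples:
        # Forward pass: sigmoid hidden layer, linear single output.
        hidden = [1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(row, x))))
                  for row in w_ih]
        y = sum(w * h for w, h in zip(w_ho, hidden))
        err += (target - y) ** 2
    return err / len(samples)
```

The CA then minimizes `mse_fitness` over agents of length `n_in * n_hidden + n_hidden`, and the agent with the lowest fitness supplies the initial weights of the final MLP.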
The MSE, which is the average of the squared errors over all training samples, acts as the fitness function, as shown in Eq. (4). It depends on the difference between each actual (target) value and its associated output value of the MLP:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (4)$$

The Mean Absolute Error (MAE), which evaluates the reliability of each model, is shown in Eq. (5):

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \qquad (5)$$

where $\hat{y}_i$ is the annual electricity consumption value produced by the MLP and $y_i$ is the actual annual electricity consumption value.
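Both error measures are straightforward to compute; a direct Python rendering of Eqs. (4) and (5) (illustrative helper functions, not from the paper):

```python
def mse(actual, predicted):
    """Eq. (4): mean of squared errors over all samples."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    """Eq. (5): mean of absolute errors, used to evaluate model reliability."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```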

III. EXPERIMENTAL RESULTS
The experiments aimed to examine the effectiveness of the proposed method for annual electricity consumption prediction. The neural network model used was a single hidden layer MLP. All the experiments were programmed in MATLAB and run on an Intel 2.9 GHz CPU with 8 GB of memory under Windows 10.
Our study utilized the Asia-Pacific Economic Cooperation (APEC) energy database, which contains the annual electricity statistics of the 21 countries in the Asia-Pacific region. There are four input variables: population (million persons), GDP (billion US$), imports (billion US$), and exports (billion US$), which served as independent variables for modeling annual electricity consumption (TWh).
Data were divided into two parts: training data (1990 to 2008) and testing data (2009 to 2017). The population, GDP, imports, and exports data came from the World Bank [26], and the annual electricity consumption data came from the Expert Group on Energy Data and Analysis (EGEDA) [27]. Annual electricity consumption target data are shown in Fig. 3.
The Pearson correlation coefficient (R) was applied to examine the dependency between each input variable and annual electricity consumption. All related R-values are shown in Table I.
From Table I, the GDP of Russia has a relatively low R-value (R < 0.5). That means the annual electricity consumption in Russia does not maintain a linear relationship with its GDP. Based on the electricity consumption data we studied, the size of the MLP is 4:h:1, where h is the number of neurons in the hidden layer. Because the prediction accuracy depends on the MLP size, i.e., on h, we compare two strategies to study the effect of h. In the first strategy, h was assigned as 2×N + 1, where N is the number of dataset features [19]. The second strategy set the number of hidden neurons to 5, 10, 15, and 20 [13]. Some settings were predefined: all BP experiments were executed with 5,000 iterations, each metaheuristic algorithm evolved for 5,000 iterations, the population size of the CA was 50, and the MLP weights had to lie in the interval [−1, 1].
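The two sizing strategies are easy to state in code (a trivial illustrative helper; for the four-feature datasets used here, the first strategy gives h = 2×4 + 1 = 9):

```python
def hidden_size_strategy1(n_features):
    """First strategy: h = 2*N + 1, where N is the number of input features."""
    return 2 * n_features + 1

# Second strategy: fixed candidate hidden-layer sizes compared experimentally.
CANDIDATE_SIZES = [5, 10, 15, 20]
```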

A. Comparing the Results of Neural Network Models
The proposed NN-CA was compared with an MLP trained by the error back-propagation algorithm (labeled BP), as well as with the other metaheuristic algorithm trainers, based on the MSE evaluation measure. The input variables, hidden layer neurons, and output variables were assigned before starting the experiments. From Table I, there are four input variables: population, GDP, imports, and exports. To determine a suitable network architecture, the networks were trained with a single hidden layer, incorporating nine hidden nodes as specified by the first strategy, and 5, 10, 15, and 20 hidden nodes as determined by the second strategy. Table II presents the average ranks from the Friedman test [28] for each competitor, where a lower score is better. Significant differences exist among the six algorithms. As seen from Table II, NN-CA with a 4-20-1 architecture produced the best overall ranking in comparison with the other algorithms, which shows the merits of the proposed NN-CA.
The annual electricity consumption variable was provided as the output. The overall results show that the 4-20-1 architecture of the neural network model was superior. We can see that NN-CA outperforms the other algorithms in all the results of the Friedman rank test, even with a lower number of hidden neurons, such as in the 4-5-1 neural network model architecture.
The overall results for the 4-20-1 and 4-5-1 neural network architectures are presented in Tables III and IV. As demonstrated in each table, the proposed NN-CA outperformed all the other training optimizers and BP. The CA can select a proper search agent to be the initial weights of an MLP. The algorithms were statistically compared on each dataset via the Friedman test. This comparison confirmed the significance of the contrast between the NN-CA's ability and that of the other trainers. The results indicated that the NN-CA algorithm had the fastest convergence speed on the Australia, Brunei, Canada, Chile, Hong Kong, Japan, Korea, Mexico, New Zealand, Peru, Philippines, Russia, Singapore, Thailand, and USA datasets. On the other datasets, the NN-CA was not deemed best, but its results remained very competitive in each case.

IV. CONCLUSION
In this paper, a method based on metaheuristic algorithms for the weight initialization of an MLP was implemented to predict annual electricity consumption. The goals of the training problem were to avoid local optima and to converge to the best solution within a predefined time. The result of the proposed technique was the MLP with the lowest MSE. The proposed NN-CA outperformed all competing algorithms; it found initial weight values from which the error back-propagation algorithm did not get stuck at a local minimum and could reduce the MSE effectively. The proposed method proved suitable for annual electricity consumption prediction, which will accurately support power network infrastructure planning.
Because the MLP in this paper is a fully connected network, it contains many unnecessary weights or links. An MLP having only the necessary weights is not only more compact but also more accurate than an MLP with all of its weights. Therefore, determining those unnecessary weights and removing them from the final model is a necessity. However, this problem is very time-consuming; that is our future work.