An Ensemble GRU Approach for Wind Speed Forecasting with Data Augmentation

This paper proposes an ensemble model for wind speed forecasting that combines the recurrent neural network known as the Gated Recurrent Unit (GRU) with data augmentation. The experiments use a single wind speed time series, from which four augmented time series are generated; each one trains one of four GRU sub-models, and the sub-model predictions are averaged to produce the output of the proposed ensemble model (E-GRU). The results achieved by E-GRU are compared with those of each sub-model, showing that E-GRU outperforms the sub-models. The proposed model is also compared with benchmark models that do not use data augmentation, namely Long Short-Term Memory (LSTM) and GRU, showing that E-GRU is considerably more precise: the difference is around 15% in Relative Root Mean Square Error (RRMSE) and around 11% in Mean Absolute Percentage Error (MAPE).
Keywords: Wind speed forecasting; recurrent neural networks; gated recurrent unit; ensemble GRU; data augmentation


I. INTRODUCTION
Earth's natural greenhouse effect makes life as we know it possible [1]. However, human activities such as the burning of fossil fuels and deforestation have intensified this natural phenomenon, causing global warming [2]. In response to this problem, the exploitation of renewable energies such as solar, wind and thermal energy has emerged as an excellent alternative.
Wind energy is harnessed with wind machines or wind motors that transform wind energy into rotational mechanical energy, which is then used to produce electrical energy. Consequently, the prediction of wind speed time series has become an essential task in wind farms, since it helps in planning energy production [3], among other things.
In models based on deep learning, overfitting [4], [5], [6] usually arises from a lack of data. Various solutions have been suggested in the literature, such as dropout layers, regularization and data augmentation.
This work proposes an ensemble model for wind speed forecasting based on the recurrent neural network known as the Gated Recurrent Unit (GRU). Although enough historical data is available for the training phase [7], a data augmentation process is applied with the sole objective of improving the precision of the model results; the technique used is the one proposed by Flores et al. (2021, in press) [8]. GRU is used instead of Long Short-Term Memory (LSTM) because of earlier studies such as [9], [10], and others, in which GRU gives slightly better results than LSTM. The proposed ensemble model (E-GRU) consists of four GRU sub-models, each trained on a different augmented time series generated from a single wind speed time series. The final result is the average of the four sub-model predictions. The idea of using an ensemble arises from the observation that the sub-models both underestimate and overestimate the observed data, so averaging their predictions lets these errors partly cancel.
The main contribution of this study is a novel ensemble model (E-GRU) for wind speed time series forecasting based on recurrent neural networks such as GRU and on data augmentation.
The paper is organized as follows. The first section describes the problem and the proposed solution. The second section describes the theoretical background on which the proposal is based. The third section describes the methodology followed to implement the proposal. The fourth section presents and discusses the results. The last section presents the conclusions of the study and future work.
II. BACKGROUND
This section briefly describes some theoretical bases that are important for understanding the content of the paper.

A. Recurrent Neural Networks (RNN)
Like Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN) are one of the fundamental architectures of Deep Learning. They specialize in sequential data, hence their use in natural language processing (NLP) and in time series regression.
The best-known RNN is probably Long Short-Term Memory (LSTM), which is known to overcome the vanishing gradient problem of RNNs. Several variants have been derived from LSTM, including the Gated Recurrent Unit (GRU), which, as mentioned above, gives better results than LSTM in certain case studies, especially with time series.
The GRU architecture is shown in Fig. 1, whose notation distinguishes vectors of parameters, the element-wise sigmoid function, and element-wise multiplication.
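For reference, one common textbook formulation of the GRU update is sketched below. The symbols are assumptions rather than this paper's own notation: x_t is the input at time t, h_t the hidden state, z_t the update gate, r_t the reset gate, W, U and b the parameters, \sigma the element-wise sigmoid function and \odot element-wise multiplication. Some references swap the roles of z_t and 1 - z_t in the last line.

```latex
\begin{align*}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{align*}
```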

B. Data Augmentation
Data augmentation arose to solve overfitting problems in image classification models [11] such as CNNs. Many of these techniques consist of zooming, rotation, flipping, etc. Later, the concept was transferred to time series classification, where techniques such as time-warping, rotation, scaling and jittering emerged.
This work uses the technique proposed in [8] (in press), which combines two basic techniques: time-warping and jittering. Time-warping increases the length of the original time series, and jittering makes the synthetic data generated by time-warping non-linear. The technique has two parameters: the block size, which is the number of synthetic items inserted between each pair of consecutive items of the original time series, and the sub-block size, which is the number of linear synthetic items in each synthetic block. Fig. 2 shows a graphical view of this data augmentation technique.
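The two parameters can be illustrated with a small sketch. It is only one possible reading of the description above, not the algorithm of [8]: the function name `augment`, the `noise` amplitude, the uniform jitter and the placement of a jittered knot every `sub_block_size + 1` positions are all assumptions made for illustration.

```python
import random


def augment(series, block_size, sub_block_size, noise=0.05, seed=None):
    """Insert `block_size` synthetic items between consecutive original items.

    Hypothetical reading of the two parameters: jittered knots are placed
    every `sub_block_size + 1` positions inside a block, and the runs of
    `sub_block_size` items between knots are linearly interpolated.
    Expects a non-empty list of numbers.
    """
    rng = random.Random(seed)
    n = block_size + 1         # intervals between two original items
    step = sub_block_size + 1  # spacing of the jittered knots
    out = []
    for a, b in zip(series, series[1:]):
        # Time-warping: knots start on the straight line from a to b.
        knots = {0: a, n: b}
        for k in range(step, n, step):
            # Jittering: move each interior knot off the straight line.
            knots[k] = a + (b - a) * k / n + rng.uniform(-noise, noise)
        positions = sorted(knots)
        out.append(a)
        for left, right in zip(positions, positions[1:]):
            for k in range(left + 1, right + 1):
                if k == n:
                    break  # b is emitted as the start of the next block
                t = (k - left) / (right - left)
                out.append(knots[left] + (knots[right] - knots[left]) * t)
    out.append(series[-1])
    return out


# Same parameters, different seeds: the original items are preserved,
# while the synthetic items differ between the two augmented series.
original = [1.0, 2.0, 4.0]
ts_a = augment(original, block_size=5, sub_block_size=2, seed=1)
ts_b = augment(original, block_size=5, sub_block_size=2, seed=2)
```

In this layout the original items sit at every `block_size + 1` positions of the augmented series, which is what makes it possible to recover them later.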

III. METHODOLOGY
The methodology followed for the implementation of the proposal is described below.

A. Time Series Selection
The selected daily wind speed time series is the same one used in [7] (in press). It was obtained from the repository of the National Aeronautics and Space Administration (NASA) using the POWER Data Access Viewer at latitude -17.6851 and longitude -71.3515, which corresponds to a point in the city of Ilo, Peru, an area with enormous potential for wind energy.
This time series runs from 1981-01-01 to the present; for the experiments in this study, the years 1981-2016 are used for training and the years 2017-2020 for testing.
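The chronological split can be sketched as follows. The helper `split_by_year` and the placeholder series of zeros are hypothetical; the real wind speed values would come from the NASA download.

```python
from datetime import date, timedelta


def split_by_year(dates, values, train_end=2016, test_end=2020):
    """Chronological split: years up to train_end train, then up to test_end test."""
    train, test = [], []
    for d, v in zip(dates, values):
        if d.year <= train_end:
            train.append(v)
        elif d.year <= test_end:
            test.append(v)
    return train, test


# Placeholder daily series (zeros stand in for the real wind speeds).
start = date(1981, 1, 1)
n_days = (date(2020, 12, 31) - start).days + 1
dates = [start + timedelta(days=i) for i in range(n_days)]
values = [0.0] * n_days

train, test = split_by_year(dates, values)
print(len(train), len(test))
```

Splitting by date rather than at random keeps every test day strictly later than every training day, which matches the forecasting setting.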

B. Time Series Imputation
The selected daily wind speed time series contains no NA values, so no data imputation technique was needed at this stage.

C. Data Augmentation
In this phase, the data augmentation technique based on time-warping and jittering proposed in [8] was configured according to Table I. As Table I shows, the first two augmented time series (TS-1 and TS-2) share the same parameters, as do the third and fourth (TS-3 and TS-4); however, because the data augmentation technique is random, different items are generated for each synthetic block, as can be seen in Fig. 3.

D. Ensemble Model Implementation (E-GRU)
At this stage, the ensemble model is implemented. Here the four sub-models have the same characteristics, which are detailed in Table II.
TABLE II. HYPERPARAMETERS OF EACH SUB-MODEL
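The averaging step that combines the four sub-models can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `ensemble_average` and the sample numbers are hypothetical, and it assumes that the sub-model predictions have already been aligned to the same days.

```python
def ensemble_average(predictions):
    """Average several equally long prediction lists element by element."""
    lengths = {len(p) for p in predictions}
    if len(lengths) != 1:
        raise ValueError("all sub-model predictions must have the same length")
    return [sum(day) / len(day) for day in zip(*predictions)]


# Hypothetical predictions of four sub-models for two days.
sub_model_predictions = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
    [7.0, 8.0],
]
e_gru = ensemble_average(sub_model_predictions)
```

A plain mean gives every sub-model the same weight, so an underestimate from one sub-model can offset an overestimate from another.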

E. Evaluation
To evaluate the predicted days, the values corresponding to the original data must be extracted, since the predictions also include synthetic values. This process uses the block-size parameter of the data augmentation technique, denoted z: the predicted time series is traversed from the start, the first z values are skipped and the value that follows them is extracted; then the next z values are skipped and the following value is extracted, and so on until the last predicted value is reached.
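The extraction procedure amounts to a strided slice. The sketch below assumes that the predicted series starts with z synthetic values followed by the first value that corresponds to an original day; the function name `extract_original_days` is hypothetical.

```python
def extract_original_days(predicted, z):
    """Skip z synthetic values, keep the next value, and repeat to the end."""
    if z < 0:
        raise ValueError("z must be non-negative")
    return predicted[z::z + 1]


# With z = 2, positions 2, 5 and 8 correspond to original days.
predicted = [10, 11, 12, 13, 14, 15, 16, 17, 18]
original_days = extract_original_days(predicted, z=2)
```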
The model is evaluated with three regression metrics: the Root Mean Square Error (RMSE), the Relative RMSE (RRMSE) and the Mean Absolute Percentage Error (MAPE), which are computed with equations (5), (6) and (7), respectively.
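The three metrics can be sketched with their commonly used definitions. These are assumptions, since equations (5) to (7) are not reproduced in this text: in particular, RRMSE is taken here as the RMSE divided by the mean of the observed values, expressed as a percentage, and MAPE assumes that no observed value is zero.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def rrmse(observed, predicted):
    """Relative RMSE in percent: RMSE divided by the mean observed value."""
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_observed


def mape(observed, predicted):
    """Mean Absolute Percentage Error in percent (observed values must be non-zero)."""
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


# Hypothetical values, for illustration only.
observed = [2.0, 4.0]
predicted = [1.0, 5.0]
scores = (rmse(observed, predicted), rrmse(observed, predicted), mape(observed, predicted))
```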
A graphical version of the proposed model (E-GRU) can be seen in Fig. 4.

IV. RESULTS AND DISCUSSION
After experimentation, this section shows and describes the results achieved.

A. Results
According to Table III and Fig. 5, the proposed ensemble model E-GRU outperforms all the sub-models on average.
Regarding RMSE, E-GRU is on average superior to all sub-models. However, for the forecast horizon of 500 days, GRU-1 (0.0284) slightly outperforms E-GRU (0.0288); this is the horizon where E-GRU performs worst.
According to RRMSE, E-GRU outperforms all sub-models on average and at every prediction horizon. Note that, by the RRMSE achieved, E-GRU and all the sub-models can be classified as excellent, since their RRMSE is below 10% [12], [13].
With respect to MAPE, as with the previous metrics, E-GRU outperforms all sub-models on average. However, GRU-1 surpasses E-GRU at the horizons of 50 and 100 predicted days.
Fig. 6 shows the importance of the ensemble process in the proposal. The sub-model predictions closely approximate the original data from below and from above (underestimates and overestimates), and averaging them brings the ensemble prediction closer to the original data, which makes E-GRU more accurate than the sub-models.
Likewise, each sub-model contributes to the ensemble. In Fig. 6, at the point enclosed in the circle, GRU-4, the worst of the sub-models according to Table III, is the only one that contributes to improving the precision of the proposed model.
571 | Page www.ijacsa.thesai.org
According to Table IV, on average and at every prediction horizon, the proposed ensemble model E-GRU far exceeds the results of the benchmark models (LSTM and GRU). Note that the LSTM and GRU benchmark models have a four-layer architecture and use the same hyperparameters as the sub-models of the proposed ensemble model, but they do not use data augmentation.
Regarding RRMSE, there is an average difference of approximately 15% between the results of the proposed ensemble model (E-GRU) and those of the benchmark models. Likewise, for MAPE the difference is approximately 11%.

B. Discussion
This part compares the results achieved by the proposed ensemble model E-GRU with those of other state-of-the-art models for wind speed time series prediction.
According to Table V, the models proposed by Qureshi et al. [14] and Flores et al. [7] stand out for their high precision. In the first case, the authors use an architecture based on Deep Neural Networks and Meta Regression with Transfer Learning (DNN MRT), reaching an RMSE of 0.0953. In the second case, the authors use an architecture based on the GRU recurrent neural network with data augmentation, reaching an RMSE of 0.0876.
The proposed E-GRU model uses the same GRU architecture as [7] for each sub-model, as well as the same data augmentation technique. The fundamental difference is that, instead of a single augmented time series, it uses four augmented time series, which differ because of the randomness of the technique and because they use different values of the sub-block size parameter.
The results show that the proposed ensemble model surpasses the state-of-the-art models, including the techniques proposed in [14] and [7].