Recurrent Neural Networks for Meteorological Time Series Imputation

The aim of the work presented in this paper is to analyze the effectiveness of recurrent neural networks in the imputation of meteorological time series. To this end, six different models based on recurrent neural networks, namely Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), are implemented and evaluated on hourly meteorological time series of temperature, wind direction and wind velocity. The implemented models have architectures of 2, 3 and 4 sequential layers, and their results are compared with each other as well as with other imputation techniques for univariate time series, mainly based on moving averages. The results show that, for the temperature time series, the recurrent neural networks on average achieve better results than the imputation techniques based on moving averages; for the wind direction time series, on average only one RNN-based model manages to outperform the models based on moving averages; and finally, for the wind velocity time series, on average no RNN-based model manages to exceed the results achieved by the moving-average-based models.

Keywords—Recurrent neural network; long short-term memory; gated recurrent unit; univariate time series imputation


I. INTRODUCTION
The imputation of time series is a very important activity within the data homogenization stage, which is typical of the processing of meteorological time series. It allows the resulting series to be used later in forecasting processes.
There are many reasons why NA values are found: values may not have been measured, values may have been measured but lost, or values may have been measured erroneously [1]. Missing values can cause problems, since complete data is usually needed for proper processing and analysis.
It is well known that the accuracy of the imputation technique conditions the quality of subsequent forecasting or prediction processes [2]. Thus, a good selection of the imputation technique for a given problem is very important.
There is not a very large number of imputation techniques for univariate time series; among them can be mentioned those based on moving averages, such as Simple Moving Average (SMA) [3], Linear Weighted Moving Average (LWMA) [3], Exponential Weighted Moving Average (EWMA) [3] and Autoregressive Integrated Moving Average (ARIMA) [4], among others.
Nowadays, recurrent neural networks (RNN) such as Long Short-Term Memory (LSTM) [5] and Gated Recurrent Unit (GRU) [6] have become the most commonly used approaches in prediction or forecasting models, owing to the accuracy of the results they offer in different fields such as machine translation, robot control, speech recognition and time series prediction, among others. However, despite the benefits described, in the state of the art it is very difficult to find works that use recurrent neural networks for univariate time series imputation, which was one of the main motivations for the present study.
Thus, this paper presents the results of the implementation of six different models for hourly time series imputation based on recurrent neural networks. The analyzed time series correspond to temperature, wind direction and wind velocity, and they were obtained from the Moquegua 1 meteorological station of SENAMHI in southern Peru. The gap-sizes analyzed correspond to short-gaps (1 to 2 NAs), medium-gaps (3 to 10 NAs) and large-gaps (11 to 30 NAs) [7]. Fig. 1 shows a graphical view of the three time series for 24 hours.

The content of the paper has been organized as follows: In the second section, the work related to this study is briefly described. In the third section, the theoretical concepts and bases that will allow a better understanding of the content of the paper are described. In the fourth section, the models based on recurrent neural networks implemented in this study are described and detailed. In the fifth section, the results achieved by the six different models on the time series are explained in detail. In the sixth section, the results achieved by the proposed models are compared and discussed with other models and techniques of the state of the art. In the seventh section, the conclusions reached are explained according to the study results. Finally, the future work that can be done to improve the achieved results is described.

II. RELATED WORK
This section shows a brief review of the works related to this study, which are described below: The first imputation methods consisted of the use of parameters such as the mean, median or mode [8]; due to their simplicity, there was a risk of inserting bias into the time series.
Another technique, used later than the first ones, was to reuse the last value observed before the missing one. This was called Last Observation Carried Forward (LOCF).
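LOCF can be sketched in a few lines of Python (an illustrative implementation, with `None` standing in for NA values):

```python
def locf_impute(series):
    """Impute NA (None) values with the last observed value (LOCF).

    Leading NAs cannot be filled, since no prior observation exists;
    they are left as None.
    """
    result = []
    last = None
    for value in series:
        if value is None:
            result.append(last)  # carry the last observation forward
        else:
            last = value
            result.append(value)
    return result
```

For example, `locf_impute([1.0, None, None, 3.0, None])` fills every gap with the most recent observed value.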
We also have the Hot-Deck [9] technique, which consists of randomly using existing data to replace each Not Available (NA) value.
Another group of techniques widely used are those based on moving averages including Simple Moving Average (SMA) [3], Linear Weighted Moving Average (LWMA) [3], Exponential Weighted Moving Average (EWMA) [3] which basically consisted of using the average of the data around the missing data assigning a weight according to its proximity to the NA value. This set of techniques are implemented in the present study to compare the results achieved by the imputation models based on recurrent neural networks.
An improved technique based on moving averages is what is known as Autoregressive Integrated Moving Average (ARIMA) [4], a statistical technique that works with variations and regressions in a time series to find patterns that later serve to make predictions. This work also compares the results of ARIMA with those achieved by the imputation models based on recurrent neural networks.
Another technique used for time series imputation is known as Local Average of Nearest Neighbors (LANN) [2]. This technique is quite simple and consists only of using the prior and next values around an NA value, producing results at the level of, or better than, those based on moving averages.
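A minimal sketch of the LANN idea in Python follows; here the nearest observed value on each side of the gap is used, which is an assumption for gaps longer than one value:

```python
def lann_impute(series):
    """Local Average of Nearest Neighbors: replace each NA (None) with
    the mean of the nearest observed value before and after it."""
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        prev = nxt = None
        for j in range(i - 1, -1, -1):      # scan backwards for prior value
            if series[j] is not None:
                prev = series[j]
                break
        for j in range(i + 1, len(series)):  # scan forwards for next value
            if series[j] is not None:
                nxt = series[j]
                break
        if prev is not None and nxt is not None:
            out[i] = (prev + nxt) / 2.0
    return out
```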

Two new imputation techniques inspired by Case-Based Reasoning [10] are CBRi [11] and CBRm [8], which, like LANN, use only the prior and next values of an NA value, completing the missing values from the average of the historical data similar to the prior and next values. The difference between the two is that CBRi is designed for short-gaps and CBRm for medium-gaps.
Another new technique is known as Average of Historical Vectors (AHV) [12] that uses only values similar to the prior value of the NA value to calculate the missing data. This technique is complemented by an adjustment algorithm (iNN) [12] and a smoothing algorithm (LANNf) [12].

III. BACKGROUND
A. Time Series Imputation
The time series imputation refers to the process of calculating and completing the missing data or Not Available (NA) values in a series of time. For this it is very important to determine how the NA values originated, so they can be Missing Completely at Random (MCAR), Missing at Random (MAR) or Not Missing at Random (NMAR) [1]. It is also very important to determine the characteristics of the time series, so it can be very useful some characteristics such as: trend, seasonal or non-seasonal cycles, pulses, etc.

B. Recurrent Neural Networks (RNN)
An RNN is a type of neural network [13] that allows modeling different kinds of problems, such as time series prediction.
The architecture of this neural network is very similar to that of a Multilayer Perceptron (MLP), with the difference that an RNN allows connections between hidden units associated with a time delay. These connections allow the RNN to retain and remember information from the past [14]; in this way, it can find temporal correlations between events that may be very separated in time. Fig. 2 shows the unfolded structure of an RNN.
Training an RNN is very difficult [13] due to the vanishing and exploding gradients. This led to the implementation of a special type of RNN known as LSTM (Long Short-Term Memory), which solves these problems.

C. Long Short-Term Memory (LSTM)
As mentioned above, LSTM networks were created to solve the problem of the vanishing and exploding gradients of the first recurrent neural networks. LSTM networks work with special hidden units whose objective is to remember input data for a long time [3], so LSTM networks perform better than conventional RNNs [5]. LSTM networks have several layers for each time step. Fig. 3 shows the LSTM architecture.

(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 3, 2020
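A single LSTM step can be sketched in NumPy as follows. This is an illustrative formulation of the standard forget/input/candidate/output gates; the stacked parameter layout is an assumption, not taken from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step for hidden size n.

    W (4n x input_dim), U (4n x n) and b (4n) hold the stacked
    parameters of the forget, input, candidate and output gates.
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0:n])        # forget gate: what to keep from c_prev
    i = sigmoid(z[n:2*n])      # input gate: how much new content to add
    g = np.tanh(z[2*n:3*n])    # candidate cell state
    o = sigmoid(z[3*n:4*n])    # output gate
    c = f * c_prev + i * g     # new cell state (the long-term memory)
    h = o * np.tanh(c)         # new hidden state
    return h, c
```

The cell state `c` is the "special hidden unit" that lets the network retain information over long time spans.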

D. Gated Recurrent Unit (GRU)
GRUs are a gating mechanism in RNNs and were introduced by K. Cho et al. [6] in 2014. GRUs are a variation of LSTM networks, since both have a very similar architecture. However, unlike LSTM networks, GRUs have fewer parameters, since they lack an output gate. In several studies, LSTM networks have proven to be stronger than GRUs, since they can easily perform unbounded counting while GRUs cannot, so GRUs fail to learn certain languages that LSTMs can [15]. Fig. 4 shows a very common GRU architecture, from which the following equations can be obtained:

z_t = σ(W_z x_t + U_z h_{t-1} + b_z)
r_t = σ(W_r x_t + U_r h_{t-1} + b_r)
h̃_t = tanh(W_h x_t + U_h (r_t ⊙ h_{t-1}) + b_h)
h_t = z_t ⊙ h_{t-1} + (1 − z_t) ⊙ h̃_t

where z_t is the update gate, r_t the reset gate, h̃_t the candidate activation and h_t the hidden state at time t; σ denotes the logistic sigmoid and ⊙ the element-wise product.

IV. IMPLEMENTED MODELS

In the present study, six models based on recurrent neural networks were implemented, which are described below. As can be seen in Table I, of the six models implemented, three correspond to LSTM and three to GRU; the process followed to implement each of them is described below.
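A single GRU step can be sketched in NumPy as follows, using one common convention for the update and reset gates (an illustrative sketch, not the exact parameterization used in the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One GRU time step: update gate z, reset gate r, candidate h~."""
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)               # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)   # candidate state
    return z * h_prev + (1 - z) * h_tilde                # new hidden state
```

Note that, compared with the LSTM, there is no separate cell state and no output gate, which is why the GRU has fewer parameters.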

A. Time Series Selection
The hourly time series chosen for experimentation correspond to temperature, wind direction and wind velocity, obtained from the SENAMHI repository. The data used for the training stage corresponds to 6000 hours, from 2019-05-20 00:00:00 to 2020-01-24 23:00:00. The same period was used for all three time series.
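Before training, an hourly series must be converted into supervised input/output pairs. The paper does not state the window size it uses, so the sketch below assumes a hypothetical window of 24 hours:

```python
import numpy as np

def make_windows(series, window=24):
    """Turn a univariate series into (X, y) training pairs: each sample
    is `window` consecutive values and the target is the value that
    immediately follows them."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(X), np.array(y)
```

For 6000 training hours and a 24-hour window, this yields 5976 training samples per series.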

C. Implementation of Models
Once the time series and training data were selected, the first model was implemented, as shown in Fig. 5.
This model was trained with the data of the three time series, predicting 168 values for each time series.
Next, the remaining five models were implemented, predicting 168 values for each time series in every model. Fig. 6, Fig. 7, Fig. 8, Fig. 9 and Fig. 10 show the architecture of these models.
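The paper does not detail how the 168 values are generated; a common approach for univariate recurrent models is recursive one-step forecasting, sketched below with a hypothetical `model_step` function standing in for a trained network:

```python
def predict_recursive(model_step, history, horizon=168, window=24):
    """Recursive multi-step forecasting: predict one value, append it
    to the input window, and repeat until `horizon` values are produced.

    model_step: callable mapping a window (list of floats) to the next
    value; here it stands in for a trained LSTM/GRU model.
    """
    buf = list(history[-window:])
    preds = []
    for _ in range(horizon):
        nxt = model_step(buf)
        preds.append(nxt)
        buf = buf[1:] + [nxt]  # slide the window over its own output
    return preds
```

A known trade-off of this scheme is that prediction errors feed back into the input window, so they can accumulate over long horizons.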

D. Evaluating Predictions
The results of the six models are evaluated through the Root Mean Squared Error (RMSE), according to equation (4):

RMSE = sqrt( (1/n) * Σ_{i=1}^{n} (y_i − ŷ_i)² )   (4)

where y_i is the observed value, ŷ_i the imputed value and n the number of imputed values. The results achieved are described in the next section.
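Equation (4) translates directly into code:

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error between observed and imputed values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
```

Because the errors are squared before averaging, RMSE penalizes large imputation errors more heavily than small ones.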

V. RESULTS
This section shows the results achieved. According to what is shown in Table III and in Fig. 11, for the imputation process on the temperature time series, on average the best model is LSTM LSTM LSTM (RMSE 0.5565); this model was also the one that produced the best results for all the gap-sizes.
Likewise, it can be seen that the GRU models were in second, third and fourth place, the best GRU model on average being GRU GRU (RMSE 0.5898). So, for this type of time series, the most recommended models would be LSTM LSTM LSTM and GRU GRU.
According to what is shown in Table IV and in Fig. 12, for the imputation process on the wind direction time series, on average the best model was LSTM LSTM LSTM LSTM, and this model was also the best for each gap-size.
Likewise, it can be seen that, similarly to the temperature time series, the 4-layer LSTM model is the only one that managed to outperform the three GRU models. It is also shown that the GRU models present more homogeneous results, while the LSTM models present more heterogeneous results; that is, there is greater dispersion among them.
According to Table V and Fig. 13, on average the best model for imputation of the wind velocity time series was LSTM LSTM LSTM LSTM, as it also was for each gap-size.

VI. DISCUSSION

According to what is shown in Table VI, on average the best technique for univariate imputation of the temperature time series is the 3-layer recurrent neural network LSTM LSTM LSTM. However, performing an individual analysis for each gap-size, it is noted that this model is the best for medium-gaps (RMSE 0.5592) and large-gaps (RMSE 0.5407), but for short-gaps it is surpassed by the ARIMA-Kalman model (RMSE 0.4931).
As noted in the previous tables, the small difference between the RMSEs obtained by the models based on recurrent neural networks for short-gaps, medium-gaps and large-gaps should be highlighted. That is, the RMSE varies very little, and it costs almost the same to impute 1 or 2 values as 30.
Likewise, it is also important to highlight that for short-gaps the imputation techniques of the state of the art offer very good results, while their performance diminishes for medium-gaps, and much more for large-gaps, where the RNN models offer the best results.

VII. CONCLUSIONS
The effectiveness of six models based on recurrent neural networks was analyzed in nine case studies, and in seven of them at least one RNN-based model outperformed the other imputation techniques of the state of the art. We therefore conclude that models based on recurrent neural networks are highly recommendable for univariate time series imputation, especially for medium and large gap-sizes.
The results achieved show that not all models achieve optimal results, so it is important to implement not only one model but several in such a way that the most appropriate model can be chosen for the problem to solve.
In the three time series analyzed, the LSTM-based models show greater heterogeneity in their results compared to the GRU-based models, whose results are more homogeneous.

VIII. FUTURE WORK
In the present work, we experimented with models based on recurrent neural networks that differ only in the number of layers and the number of neurons in each layer. For future work, it would be important to implement hybrid models that contain both LSTM and GRU layers, since different works have shown that, for certain time series, hybrid models produce better results than non-hybrid models. Likewise, other parameters could be explored, such as the number of epochs, the batch size, the training data size, the optimizer, etc.
Likewise, the results achieved by the RNN-based models for the wind direction and wind velocity time series, despite exceeding the state-of-the-art techniques, are not optimal (they have a high RMSE), so they could be improved by increasing the size of the training data or by adding more variables to the model.