Time Series Forecasting using LSTM and ARIMA



I. INTRODUCTION
A time series is a time-dependent dataset, meaning that its values are recorded at specific points in time. Usually the values are taken at regular intervals, but the sampling can also be irregular [4]. If a time series has a definite pattern, then any value of the series can be expressed as a function of previous values. With the advancement of information technology, there are now many more ways to collect time series data. Time series models differ from other models in the way they predict: a time series model uses lagged values of the target variable as predictor variables, whereas traditional models use other variables as predictors. Time series analysis is the process of extracting information from time series data using different techniques. One of the best-known types of time series analysis is time series forecasting, in which the results are the outputs predicted by trained models. There are many forecasting models available. In this research, LSTM and ARIMA models are used. A general LSTM model is built from cells; each cell has three parts, i.e. the forget gate, the input gate, and the output gate. The cell states are affected by both past states and the current input through feedback connections, which is what allows LSTM models to learn long-term dependencies. LSTM models can be single-layered or multilayered, and the functionality differs between the two. They are preferred because they retain information over long time spans by design. ARIMA forecasts temporal dependencies using only historical values. It works by making the series stationary, studying the correlation between values through ACF and PACF plots, and checking the residual diagnostics. These models help to gain better insight into the data and to predict future trends.
A number of forecasting problems can be solved using LSTM and ARIMA models. Although these models have been used for prediction in the past [12], [18], [19], [21], the goal of this research is to elaborate on the differences between the outcomes that these models provide. The key point is to understand how much better one model performs than the other when both are trained on the same datasets. The models are evaluated using standard evaluation parameters (MAE, MSE, R2, recall, etc.). The included graphs also help in determining which model is better suited to time series forecasting of stock prices.

II. REVIEW OF RELATED LITERATURE
A time series is a set of quantitative observations arranged in chronological order. Time series analysis has attracted a lot of attention in the past three decades [1]. In the past, it was generally believed that time can be a continuous or discontinuous variable and that no comparison exists between the dependent variables [2]. Time series have always been used in the field of econometrics. Jan Tinbergen (1939) devised the first time series econometric model. He also started the scientific research program of empirical econometrics. At that time, it was hardly considered that chronologically ordered observations might depend on each other. The dominant assumption was that, in line with the classical linear regression model, the residuals of the estimated equations are random and independent of each other [3].
The fundamental goal of building a time series model is the same as for any predictive model: to produce values that are nearly equal to the values observed in the series. From a statistical point of view, time series are recordings of stochastic processes that vary over time. The most distinguishing feature of a time series is that the distribution of an observation at a specific point, conditional on the previous values of the series, depends on the outcomes of those previous observations; in other words, the observations are not independent of each other. The recordings can form a continuous pattern or a set of discrete values. There are two main types of time series: stationary and non-stationary. Stationary time series have statistical properties (moments) that stay constant over time, while non-stationary series are recordings whose statistical properties change. Both types of series can be used in time series forecasting models [5] [6]. Time series forecasting models are built according to the type of dataset used to train them. Stationary datasets are easier to train on for prediction than non-stationary datasets. In fact, it is a necessity to convert a non-stationary dataset into a stationary dataset [7]. Models understand stationary datasets more easily and extract information from them more efficiently. Several methods have been devised to convert non-stationary datasets into stationary datasets [8]. Popular methods in this context include the Hilbert-Huang transform, the Fourier transform, and the Dickey-Fuller test (which checks for stationarity rather than converting the data). The Hilbert-Huang method was specially developed for analyzing non-linear and non-stationary data [9].
Forecasting and making predictions is an age-old human habit. Put simply, forecasting means predicting the outcomes of a plan; in Python, forecasting is done by training models on available datasets using built-in functions. Time series forecasting is the process of estimating likely future values from a known dataset. It is a popular technique for solving many types of problems, because predictions directly affect decisions and lead towards a clearer picture of the future. Time series forecasting can be done using both machine learning and deep learning models. The models differ in how they work [10].
There are numerous models in Python that serve the purpose of predicting values. Supervised machine learning models and deep neural networks are both used for forecasting. Supervised machine learning gives systems the ability to learn automatically from data and to improve their outcomes with experience, without being explicitly programmed [11]. Similarly, deep networks have applications in many areas of life, including prediction, detection, creation, etc. Deep neural network models can provide better forecasts than classical machine learning models. When dealing with real-world problems, shallow neural networks need to be sufficiently expressive to predict the task optimally, while deep neural networks (DNNs) have been proposed as a way of producing more predictive models [12]. These models are sequences of layers in which each layer applies a transformation function. Combining the layers constructs a deep neural model. Because many functions are composed, the model's prediction ability is enhanced.
These models have complex theoretical backgrounds and can seem difficult to build. Over time they have evolved considerably, and they are now readily available in Python libraries with built-in forecasting models that can be used for prediction. LSTM and ARIMA are two of the most influential and long-established models. A model is chosen by considering the characteristics of the problem at hand. A traditional LSTM is a sub-type of RNN model that saves previous sequential data as temporal pieces of information [13], [14], [15]. A general LSTM model is built from cells; each cell has three parts, i.e. the forget gate, the input gate, and the output gate. The cell states are affected by both past states and the current input through feedback connections. A standard recurrent cell consists of sigmoid and tanh layers. Each memory cell in an LSTM has a recurrently self-connected linear unit. These linear units are called CEC (Constant Error Carousel). The architecture of LSTM permits it to bridge long time lags between relevant input events, of around 1000 steps and more. This method is used in the processing of time-series data, in prediction, and in the classification of data.
LSTM models work efficiently and have been widely used in various kinds of tasks. The activation function in the output layer determines the direction in which training proceeds [16], [17].
ARIMA (autoregressive integrated moving average) is a classical statistical model rather than a neural network, and it has proven immensely useful for prediction [18], [19], and [20]. ARIMA processes have been studied thoroughly and extended many times. These models were popularized by George Box and Gwilym Jenkins [21], which is why ARIMA processes are sometimes known as Box-Jenkins models. Box and Jenkins brought together, in a comprehensive manner, the information required to understand and use univariate ARIMA processes. ARIMA models have gained popularity because they can produce very accurate short-term forecasts. Sometimes there is information that cannot be extracted through regression, and ARIMA is used to capture this additional information. In these models, the autocorrelation and partial autocorrelation functions are used as basic instruments to assess the stationarity of a time series [22]. Both LSTM and ARIMA models have their own specific functionalities, advantages, and disadvantages.
Previous studies showcase the use of ARIMA and LSTM models in different industries. These models are common in finance and commerce, but they are rarely seen in trading markets. The literature suggests that the use of LSTM and ARIMA models together in prediction problems is very limited. This research discusses the results of implementing these two models and evaluates their performance when they are trained on the same dataset. The results help estimate which model is more useful for stock exchange prediction.

III. METHODOLOGY
Time series forecasting is used to deal with a surprisingly vast set of problems. There are different techniques available to implement time series forecasting; they differ only slightly from each other, but the differences strongly affect the results. The process consists of five major steps. In this paper, High Price prediction is done using two different models (i.e. LSTM and ARIMA), and this five-step methodology is used to implement both of them. Although the general workflow of the two models is identical, it varies greatly in the technicalities used in each model.

A. Understand the Problem
As mentioned earlier, time series data is used to deal with many problems. Time series forecasting is the type of analysis where solutions are estimated by training on previous data. Examples include predicting the level a stock market will reach at a certain time, estimating survey results based on previous data, and predicting the weather on a certain date. It is important to build a model keeping in view the nature of the problem being dealt with. Explore the models that can be used for the given problem. In addition to understanding the models, thoroughly understand the techniques used to implement them in order to choose the best one for the problem.
The purpose of this research is to build two different time series forecasting models in order to determine which is the better model for predicting high prices. To do so, two models (LSTM & ARIMA) that have performed exceptionally well on prediction problems are used.

B. Data Gathering / Pre-processing
After understanding the nature of the problem, the next step is to determine the type of data needed to train the models. The data should be cross-sectional. This type of data has one dependent variable that depends on numerous independent variables. In such data, a small-scale or aggregate entity is observed at different points in time. In a nutshell, this is data on different entities collected at the same time.
In this paper, data for Mulkia Gulf Real Estate from the Saudi Exchange datasets are used. The data is extensive enough to cause models to overfit, so certain sheets (i.e. Fig. 2; sheet 1 & Fig. 3; sheet 7) are chosen to train the models. It is a cross-section of the data that meets the criteria of the needed data. This data is used because it has characteristics that match the prediction problem. After the models are trained on this data, they can be used to predict the high price value.
After the data has been selected, the next step is to pre-process it. As mentioned earlier, the data used is non-stationary, cross-sectional data. The models train more efficiently on stationary data, so it is necessary to convert this data into stationary data. There are certain methods available to convert non-stationary data into stationary data. The ADF (Augmented Dickey-Fuller) test is used to check whether the data is stationary before and after the transformation. In Python, the ADF test can be imported from the statsmodels library as statsmodels.tsa.stattools.adfuller.

C. Exploratory Analysis
Exploratory analysis is the process of analyzing the relationships between variables that exist in raw data. These initial relationships help to understand the nature of the data and how accurately the desired information can be extracted from it. Exploring the data first is mandatory, independent of the type of problem. It is also called preliminary analysis of data, where the data is plotted in its original form to find certain structures. During preliminary analysis, check the validity of the measures and point out any outliers. It also helps in examining the weight of different variables to evaluate the effect of the variables that are manipulated. The major tasks in preliminary analysis are cleaning corrupted data, checking the data for null values, etc. Fig. 4 shows that the data has been scaled properly and is ready to be used to train the prediction models. It also shows the regression patterns of various variables against dates.
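The cleaning and scaling steps mentioned above can be illustrated with a minimal sketch. It assumes missing values are represented as None and that min-max scaling into [0, 1] is an acceptable choice; the helper names drop_missing and min_max_scale and the sample prices are hypothetical and are not taken from the paper's code or data.

```python
def drop_missing(values):
    """Return the values with None (missing) entries removed."""
    return [v for v in values if v is not None]


def min_max_scale(values):
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up "High" prices with one missing entry (illustration only).
raw_high = [10.0, None, 12.0, 11.0, 14.0]
clean_high = drop_missing(raw_high)
scaled_high = min_max_scale(clean_high)
print(scaled_high)
```

In practice the same steps are usually done with pandas and sklearn; the plain-Python version is only meant to make the arithmetic visible.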

D. Choosing Libraries and Training Models
In Python, there are thousands of libraries and built-in models available that deal with machine learning and deep learning problems. Python libraries are collections of related functions packaged together. The purpose of these libraries is to save coders from re-implementing routine steps. There is a whole set of libraries available for different domains of artificial intelligence. The problem explained in this research paper is time series forecasting, which is a type of machine learning and deep learning problem. The libraries which come in handy while dealing with such problems are sklearn, tensorflow, numpy, etc. These libraries are designed according to a specific set of rules that fit their respective type of problem. For preprocessing of this data, numpy, sklearn, tensorflow, and keras are used. Similarly, there are some built-in forecasting models available in Python, and they should be matched to the respective problems. The models serve specific purposes; they have been created according to certain mathematical rules. According to the problem type, these rules are chosen and embedded into deep learning models. A typical deep learning model consists of an input layer, multiple hidden layers (where most of the work is done), and the output layer. Fig. 5 shows the general representation of a deep learning neural network model. Keeping in view the nature of the problem, LSTM and autoregressive models are used.

E. Evaluating Models
Evaluation is done by measuring evaluation parameters. The three basic parameters that are used in measuring a neural model's reliability in time series forecasting problems are Accuracy, Precision, and Recall. In addition to these parameters, some auxiliary parameters are available to evaluate regression results. The added parameters are MSE (Mean Squared Error), MAE (Mean Absolute Error), MAPE (Mean Absolute Percentage Error), and MdAPE (Median Absolute Percentage Error). Collectively, these metrics help explain the errors that the models make.
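As a rough illustration of the squared-error metrics named above, the sketch below computes MSE and its square root, RMSE, on made-up numbers. The function names mse and rmse and the sample values are assumptions for illustration only; they are not the paper's code or results.

```python
import math


def mse(actual, predicted):
    """Mean squared error: the average of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(mse(actual, predicted))


# Made-up actual and predicted values (illustration only).
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 9.0, 9.0]
print(mse(actual, predicted), rmse(actual, predicted))
```

Because the errors are squared, a single large miss dominates MSE, which is one reason it is usually reported alongside an absolute-error metric.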

IV. IMPLEMENTING TIME SERIES MODELS
In this research paper, two of the best time series forecasting models already available in Python are used. These models are trained on two datasets for Mulkia Gulf Real Estate from the Saudi Exchange. Sheet 1 & Sheet 7 of these datasets are used to train the models. The aim is to determine which model is the best one for the targeted industry. The problem is predicting the highest price values. The trained models are tested, and the findings are evaluated using the evaluation parameters to obtain reliable results. The general methodology explained above is followed. The models are built one by one; after each model is set up, the preprocessed datasets are provided to the code that trains it. The general method proposed to train the models is nearly identical. The models differ only in their functional attributes and technical details, i.e. the number of hidden layers used, the evaluation parameters used, the dataset preparation, and the working background. Detailed information on these Python models is provided below.

A. Long Short-Term Memory (LSTM)
As the prediction model should remember long-lasting events from the past, LSTM is the first choice in this paper. LSTM is available as a built-in layer that can be imported from TensorFlow. LSTM is a branch of recurrent neural networks. Earlier RNN structures could not backpropagate errors over long time intervals, and LSTM was proposed to solve this difficulty. LSTM has gate units that open and close the information flow within each memory cell, i.e. they control the packets of information passed between time lags [23]. It requires the input data to be in a definite shape. The dimensions should be consistent, and the data should be properly cleaned, integrated, and scaled. The commonly used activation functions for LSTM-based regression problems are sigmoid and tanh. Tanh has proven to be effective in dealing with vanishing gradients.
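The "definite shape" requirement can be sketched as follows. This is a minimal illustration that assumes a univariate series and a fixed number of lagged steps per sample; the helper make_windows and the toy series are hypothetical. The final reshaping reflects the (samples, time steps, features) layout that Keras LSTM layers expect as input.

```python
def make_windows(series, n_lags):
    """Turn a 1-D series into (inputs, targets) pairs of lagged values."""
    inputs, targets = [], []
    for i in range(len(series) - n_lags):
        inputs.append(series[i:i + n_lags])   # the n_lags previous values
        targets.append(series[i + n_lags])    # the value to predict
    return inputs, targets


series = [1, 2, 3, 4, 5, 6]                   # toy data, illustration only
X, y = make_windows(series, n_lags=3)

# Add a trailing feature axis: (samples, time steps, features) = (3, 3, 1).
X3d = [[[v] for v in window] for window in X]
print(X, y)
```

Each row of X holds the lagged values that serve as predictors for the matching entry of y, which is the lag-based framing of forecasting used throughout this paper.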
The functioning of the LSTM model is explained in Fig. 6. As said earlier, LSTM has the ability to remember information for the long term. This is done by remembering the previous output and combining it with the current one. To understand the working of LSTMs better, there is a need to fathom the mathematics behind them.
The working of an LSTM network can be divided into four stages. In the first stage, the model decides whether to forget or remember the information in the previous cell. This value is decided by the following calculation:

$$f_t = \sigma(W_f \cdot [h_{t-1}, X_t] + b_f)$$

where $\sigma$ is the sigmoid activation function, $W_f$ and $b_f$ are the weight and bias vectors of the forget gate, $h_{t-1}$ is the output at time $t-1$, and $X_t$ is the input vector at time $t$. If the forget value is equal to zero, the previous value is forgotten completely.
In the second stage, the model decides which values will be stored in the next cell state. To do so, sigmoid and tanh layers are used. The sigmoid layer chooses the pieces of information which need to be updated, and the tanh layer produces a candidate value. By combining these two values, the model creates the new values used to update the cell state. The formulas for the sigmoid and tanh values are:

Input gate value: $$i_t = \sigma(W_i \cdot [h_{t-1}, X_t] + b_i)$$

Updated value: $$C_t = \tanh(W_C \cdot [h_{t-1}, X_t] + b_C)$$

where $\sigma$ represents the sigmoid function, $i_t$ is the input gate value, and $C_t$ is the updated value at time $t$.
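In the third stage, the new cell state $NC_t$ is obtained using $f_t$ and $i_t$. The equation below shows how to calculate this value, where $\odot$ denotes element-wise multiplication:

$$NC_t = f_t \odot NC_{t-1} + i_t \odot C_t$$

In the final stage of the LSTM, it is determined which value will be considered as the output. A sigmoid layer determines which parts of the cell state will be output, and the cell state is passed through tanh to obtain a value between -1 and 1:

$$o_t = \sigma(W_o \cdot [h_{t-1}, X_t] + b_o)$$

$$h_t = o_t \odot \tanh(NC_t)$$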
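The four stages described above can be sketched for a single scalar cell. This is a simplified illustration under stated assumptions, not the TensorFlow implementation: inputs and states are plain floats rather than vectors, each gate has one weight for the previous output, one weight for the input, and one bias, and the weight values are made up.

```python
import math


def sigmoid(z):
    """Logistic function, squashing z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One scalar LSTM step; params maps a gate name to (w_h, w_x, b)."""
    def pre(name):
        w_h, w_x, b = params[name]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(pre("forget"))            # stage 1: how much old state to keep
    i_t = sigmoid(pre("input"))             # stage 2: how much new content to admit
    c_hat = math.tanh(pre("candidate"))     # stage 2: candidate (updated) value
    c_t = f_t * c_prev + i_t * c_hat        # stage 3: new cell state
    o_t = sigmoid(pre("output"))            # stage 4: output gate
    h_t = o_t * math.tanh(c_t)              # stage 4: output, always in (-1, 1)
    return h_t, c_t


# Made-up weights for illustration: gate name -> (w_h, w_x, b).
params = {
    "forget": (0.5, 0.1, 0.0),
    "input": (0.4, 0.2, 0.0),
    "candidate": (0.3, 0.6, 0.0),
    "output": (0.2, 0.3, 0.0),
}

h, c = 0.0, 0.0
for x in [0.1, 0.4, 0.3]:                   # a short toy input sequence
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

A forget value near zero wipes the previous cell state, while a value near one carries it forward unchanged, which is how the cell retains information across long time lags.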
The built-in model available in Python works according to the structure explained above. The models can be improved further by choosing suitable weights and biases, activation functions, and number of hidden layers.

B. Autoregressive Integrated Moving Average (ARIMA)
ARIMA forecasts temporal dependencies using only historical values. The data used for autoregressive models is prepared differently from the data for LSTM. In addition to the necessary preprocessing steps, the AR model's data needs to be stationary. Simply put, data is stationary when its statistical properties do not change over time. From a mathematical perspective, it refers to data whose mean and variance do not depend on time. If the data does not meet the properties of a stationary dataset, a series transformation can be applied to make it stationary.
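One common series transformation of this kind is differencing, which replaces each value with its change from the previous value. The sketch below is a minimal illustration on a made-up trending series; the helper names difference and invert_difference are hypothetical, and the source does not state which transformation was actually applied to the Mulkia Gulf data.

```python
def difference(series, lag=1):
    """Return the series of changes between values that are lag steps apart."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def invert_difference(first_value, diffs):
    """Rebuild the original series from its first value and lag-1 differences."""
    restored = [first_value]
    for d in diffs:
        restored.append(restored[-1] + d)
    return restored


trend = [10, 12, 14, 16, 18]      # toy series with a linear trend
diffs = difference(trend)         # the trend is removed, leaving a constant
restored = invert_difference(trend[0], diffs)
print(diffs, restored)
```

The mean of the toy series rises over time, while the differenced series is constant, so its mean no longer depends on time. Forecasts made on the differenced scale are mapped back with the inverse step.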
The ARIMA model is a combination of two models, Autoregressive (AR) and Moving Average (MA), with integration (I) applied at least once to make the dataset stationary.
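As a rough illustration of the autoregressive component, the sketch below estimates a first-order AR coefficient from the lag-1 sample autocorrelation (the Yule-Walker estimate for AR(1)) and uses it for a one-step forecast. This is a deliberately simplified stand-in and not a full ARIMA fit: it ignores the MA and integration parts, and the function names and toy series are assumptions for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ar1_forecast(series):
    """One-step AR(1) forecast with phi set to the lag-1 autocorrelation."""
    mean = sum(series) / len(series)
    phi = autocorrelation(series, 1)
    # Predict the next deviation from the mean as phi times the last deviation.
    return mean + phi * (series[-1] - mean)


toy = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up values, illustration only
print(autocorrelation(toy, 1), ar1_forecast(toy))
```

The same autocorrelation function, evaluated over a range of lags, produces the values plotted in an autocorrelation graph.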
In the AR part of the model, future values are predicted using lagged values of the series. The general equation of the AR model is:

$$y_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t$$

The Moving Average is the part of the model where the value is forecast using the forecast errors made in previous predictions. The general form of the MA equation is:

$$y_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$

Here $c$ and $\mu$ are constants, $\phi_i$ and $\theta_j$ are the model coefficients, and $\varepsilon_t$ is the error term at time $t$. Prediction is done by combining all three components (AR, I, and MA); their orders give an indication of how the model should be fitted. Standard notation is used to represent the three orders, i.e. p, d, and q: "p" is the number of lag observations included in the model, "d" is the number of times the raw observations are differenced, and "q" is the size of the moving average window. To find these parameters, first obtain the autocorrelation and partial autocorrelation graphs of the dataset. Fig. 7 and 8 can be used to roughly estimate the values of p, d, and q. The integer value of p can be obtained from the cut-off points of the PACF graph. Similarly, the value of q can be obtained from the ACF graph. If the graphs do not show clear cut-off points, use the built-in PACF and ACF functions. Fig. 9 shows how ARIMA works and on which parameters its working can be evaluated.

(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 14, No. 1, 2023, www.ijacsa.thesai.org

V. EVALUATION PARAMETERS FOR MODEL'S PERFORMANCE
Given the significance of time series models, it is very important to examine their results. For specific industries, the target values of forecasting models are the milestones for making decisions [9]. Artificial intelligence learning models are always trained using real-world data. They are prepared for use on actual problems, so they are usually tested using the same type of data.
Keeping in view the importance of time series models, scientists have devised a number of ways to test the accuracy of the models used. This part discusses the performance parameters that are used to evaluate the models and how the best model among them is picked. The following parameters are used to validate the models' predictions.

A. R Square Score - R2 Score
It is a statistical value used in a prediction model to measure the proportion of the variance in the dependent variable that can be explained by the independent variables. In simple words, R2 describes how well the testing data fits the trained regression model. The general formula used to calculate the R2 score of the trained models is:

$$R^2 = 1 - \frac{\text{residual sum of squares}}{\text{total sum of squares}}$$

The R2 score is calculated for both models. The R2 values for both models are displayed in Table I.
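The R2 score described above can be sketched as follows, taking it as one minus the ratio of the residual sum of squares to the total sum of squares. The function name r_squared and the sample values are hypothetical and are not the values reported in Table I.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total spread
    return 1.0 - ss_res / ss_tot


# Made-up actual and predicted values (illustration only).
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 9.0, 9.0]
print(r_squared(actual, predicted))
```

A score of 1 means the predictions match the actual values exactly, while a score of 0 means the model does no better than always predicting the mean of the actual values.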

D. Mean Absolute Error - MAE
It is the mean of the absolute differences between the predicted values and the actual observed values. In other words, it tells how large a difference between actual and predicted values can be expected from the model on average. The smaller the value of MAE, the better the model. If the value is zero, the model predicts the values exactly. Models are compared on the basis of their MAE values: the model with the smallest MAE is considered the best among all. MAE simply averages the errors; it cannot identify the weight of individual values. The general equation used to measure MAE, for $n$ actual values $y_i$ and predicted values $\hat{y}_i$, is:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
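The MAE calculation can be sketched in a few lines. The function name mae and the sample values are made up for illustration and are not results from the paper.

```python
def mae(actual, predicted):
    """Mean absolute error: the average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up actual and predicted values (illustration only).
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 9.0, 9.0]
print(mae(actual, predicted))
```

Every error contributes in proportion to its size, so a single large miss affects MAE less than it would affect a squared-error metric.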

F. Median Absolute Percentage Error - MdAPE
MdAPE is obtained through a small modification of MAPE (Mean Absolute Percentage Error). The absolute percentage errors are arranged from the smallest to the largest and the middle value is picked as the MdAPE; if the number of values is even, the two middle values are averaged.
MdAPE is recommended for evaluation when dealing with ARIMA models, so it is computed for the ARIMA models here. An MdAPE is interpreted as good when it is between 10% and 20%. The RMSE values for ARIMA are 0.630533 and 0.333124 (Sheet 1 & Sheet 2).
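The relationship between MAPE and MdAPE can be illustrated with a short sketch. It assumes that all actual values are non-zero (the percentage error is undefined otherwise); the function names and sample values are hypothetical and are not the paper's results.

```python
from statistics import median


def absolute_percentage_errors(actual, predicted):
    """Absolute error of each prediction as a percentage of the actual value."""
    return [abs((a - p) / a) * 100.0 for a, p in zip(actual, predicted)]


def mape(actual, predicted):
    """Mean of the absolute percentage errors."""
    errors = absolute_percentage_errors(actual, predicted)
    return sum(errors) / len(errors)


def mdape(actual, predicted):
    """Median of the absolute percentage errors (averages the two middle
    values when the number of errors is even)."""
    return median(absolute_percentage_errors(actual, predicted))


# Made-up actual and predicted values (illustration only).
actual = [100.0, 200.0, 50.0, 400.0]
predicted = [110.0, 180.0, 50.0, 300.0]
print(mape(actual, predicted), mdape(actual, predicted))
```

Because it uses the median, MdAPE is less sensitive than MAPE to a few very large percentage errors.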
In addition to all the parameters mentioned above, also study the avg_loss and val_loss graphs to evaluate models like LSTM. These additional parameters help to evaluate and compare the two models better.

VI. RESULTS
The purpose of the research is to determine which of the two models, trained using different Python libraries, performs best for Highest Price value prediction. The results are interpreted using graphs combined with the evaluation parameters explained above.
As mentioned, two sets of data have been used: Sheet 1 & Sheet 7 from the dataset for Mulkia Gulf Real Estate available at the Saudi Exchange. The first model trained on this data is Long Short-Term Memory (LSTM). The evaluation graphs show the regression between the actual training values and the predicted training values for Sheet 1 and Sheet 7. The visualizations (Fig. 12 & 13) show very little difference between the actual values and the values predicted by ARIMA.
The mathematical evaluation of ARIMA was done using MAE, MAPE, MdAPE, and MSE. These values are reported in the tables. Overall, the values indicate that ARIMA can predict the future highest price more accurately than LSTM.

VII. CONCLUSIONS
In this paper, the time series forecasting problem is explained, and highest price prediction is carried out using two built-in Python models (i.e. LSTM & ARIMA). The models are trained on the Mulkia Gulf Real Estate dataset, which is pre-processed before training. After training, the models are evaluated using evaluation parameters such as MAE, MSE, RMSE, val_loss, accuracy, and R2 score. The values of these parameters collectively determine which model performed best. These parameters indicate that the ARIMA model can predict the highest price more accurately. From a graphical point of view, there is very little difference between the actual values and the predicted values in Fig. 12 and 13, which shows that ARIMA can predict more precisely than LSTM. Overall, it was concluded that for stock price prediction, ARIMA models can perform better than LSTM models.