Can the Futures Market be Predicted? Perspective Based on AutoGluon

This paper discusses how to improve the prediction of correlation coefficients in the Chinese futures market. First, the prediction sample is divided into periods by major events, and predictability is compared across these periods. Second, on this basis, the automatic machine learning framework AutoGluon is applied, and its predictive ability is compared with that of deep learning models such as LSTM and GRU. The results show that: (1) compared with LSTM and GRU, AutoGluon does improve predictive performance; (2) changes in prediction error across periods reflect the influence of major events on the futures market; (3) although the predictive ability of many models declines over time, the performance of XGBoost is relatively stable, which can provide a useful tool for market participants.


I. INTRODUCTION
Financial time series forecasting methods include econometric methods, represented by the ARIMA and GARCH models, and deep learning methods, represented by the LSTM and GRU models; futures market forecasting can adopt the same methods. Traditional econometric methods are model-driven and rest on linear functional assumptions, and they work well when the amount of computation is small and the data are low-dimensional. However, with the explosive growth of data volume in the big data era, econometric methods have gradually exposed their weaknesses.
The LSTM model was originally designed for natural language processing tasks, but it now attracts increasing attention in time series forecasting. Many in-depth studies have used the LSTM model to predict future prices, and their results show that deep learning methods have clear advantages in forecasting accuracy over econometric methods. However, deep learning models depend heavily on model structure and parameter tuning, which makes them difficult to deploy rapidly in different situations. In recent years, automatic machine learning, represented by the AutoGluon framework, has performed excellently in various tasks by relying on bagging and stacking strategies, and it has attracted increasing attention because of its ease of use.
Whether a deep learning model or an automatic machine learning method is used, predicting the future presupposes that the data are independent and identically distributed (i.i.d.). Numerous studies have shown that financial market participants find it difficult to escape psychological effects such as greed and fear, so historical data influence future data and successive observations in financial time series are not fully independent. At the same time, major events may change the expectations of market participants, so financial time series from different periods can hardly be assumed to follow the same distribution. As a result, many models that fit historical data closely during training generalize poorly during testing: a model that perfectly fits existing data cannot guarantee the same prediction accuracy in the future.
In short, the fact that financial time series do not satisfy the i.i.d. assumption fundamentally challenges the basis on which machine learning models predict the future. This shows up as a difference between the training error and the testing error; changes in this difference, however, may indicate changes in market risk.
The innovation of this paper is that the futures index time series are segmented at the dates of major public events (China's supply-side reform and the COVID-19 epidemic), and the correlation coefficients between varieties are calculated for each segment. By analyzing the difference between the training error and the testing error of the futures index correlation coefficients, the paper offers new ideas for futures market prediction.
The structure of this paper is as follows: the second section reviews the literature; the third section introduces the models used in this paper; the fourth section presents the empirical analysis; and the fifth section gives conclusions and implications.

II. LITERATURE REVIEW
There are many studies on using econometric models to predict financial markets. Li Hongquan [1] used an interval measurement method to study crude oil price prediction. Zhang Y J, Yao T and He L Y [2] compared the abilities of different GARCH models to predict the crude oil market. Li Hongquan and Zhou Liang [3] used CoVaR, cross-sectional VaR, the absorption ratio, the Granger causality index and the information spillover index to measure systemic financial risk, and examined in detail the ability of these five indicators to predict the macroeconomy. Hong Yongmiao and Wang Shouyang [4] pointed out that econometric methods focus on the relationships between economic variables to reveal the inherent nature of economic operation, but because the mathematical models are highly simplified and abstract, many real-world factors may not be taken into account, which often results in model misspecification.
www.ijacsa.thesai.org
With the rapid development of artificial intelligence in the context of big data, machine learning, deep learning and text analysis have been widely used in financial market prediction. Chen Y, He K and Tso G K F [5] used a deep learning model to predict crude oil prices, and R. A. de Oliveira, D. M. Q. Nelson and A. C. M. Pereira [6] studied the application of the LSTM model to stock market forecasting. Mu Nianguo and Yao Honggang [7] proposed a recurrent neural network prediction model based on an attention mechanism and found that adding attention improved the predictions of the gated recurrent network. In addition, a large body of literature focuses on improving the predictive ability of deep learning models in stock and commodity markets [8]-[16]. What these studies have in common is that they use machine learning models to predict future prices directly and concentrate on improving the accuracy and speed of prediction.
Ensembles that combine predictions from multiple models have long been known to outperform individual models. Wang Y, Liu L and Wu C [17] studied the use of time-varying parameter models to predict crude oil prices. Sun Fuxiong et al. [18] took Chinese listed companies as the research object and proposed a combination model for predicting stock suspensions; their empirical results showed that the combination model achieves high accuracy. Zhou Hao et al. [19] proposed an improved combination forecasting model for crude oil prices together with a dynamic particle swarm optimization algorithm; their experiments showed that the combined model greatly reduces computational complexity and improves prediction accuracy. Nick Erickson, Jonas Mueller et al. [20] proposed AutoGluon, an automatic machine learning framework that greatly simplifies preliminary work such as feature engineering and hyperparameter tuning for traditional machine learning models and performs well on prediction tasks with structured data. To our knowledge, AutoGluon has not previously been applied to financial time series prediction. In general, research on predicting financial time series with single or combined econometric and deep learning models is already extensive, but the predictive performance of automatic machine learning has not been studied sufficiently, and the fact that financial time series violate the i.i.d. assumption remains an unavoidable issue. In this paper, AutoGluon is applied to financial time series prediction for the first time. By analyzing the difference between the training error and the testing error of the correlation coefficients of the futures indices, we study the influence of major public events on the futures market, explore a method for predicting the risk of China's futures market, and provide a reference for researchers.

A. AutoGluon based on Automatic Machine Learning
Many powerful machine learning models have emerged over the past decades, but integrating them faces many obstacles, such as model selection, model integration, hyperparameter tuning, feature engineering and data preprocessing. Automatic machine learning (AutoML) offers a possible solution by combining model selection algorithms with hyperparameter optimization strategies. As a representative AutoML framework, AutoGluon arranges and trains different models hierarchically and uses bagging and stacking strategies to save training time and reduce overfitting. Combining multiple models has long been known to perform better than single models, and popular AutoML frameworks use bagging and stacking to improve predictive ability and reduce variance. Specifically, several 'base' models are trained separately in each layer, and their outputs are aggregated as features and passed to the next layer for further training (stacking), so that the result outperforms the 'base' models. Fig. 1 shows how AutoGluon, as a typical AutoML framework, embodies these ideas. This paper selects four representative machine learning algorithms to generate the base models of AutoGluon: an artificial neural network, LightGBM, XGBoost and CatBoost. After the data are input into the model, different samples are formed by repeated random sampling, and the bagging strategy is applied in each layer to train the base models on these samples with the four algorithms. At the same time, the stacking strategy trains the models of each layer on the same original data sample. Finally, the scalar outputs of all models are concatenated into a vector, and a linear combination of this vector gives the final output of the model. The key code is given in Fig. 2.
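To make the bagging and stacking steps concrete, here is a minimal pure-Python sketch under our own simplifying assumptions; it is not AutoGluon's implementation. Two toy one-feature learners (ordinary least squares and k-nearest neighbours) stand in for the four algorithms named above. Each learner is bagged with k-fold splits (as with `num_bag_folds` in the AutoGluon code), so every training sample receives an out-of-fold prediction, and those out-of-fold predictions are then combined linearly by greedy ensemble selection in the style of Caruana et al., the technique the AutoGluon authors describe for the final weighted ensemble. Real AutoGluon stack layers also receive the original features and can be several levels deep; this sketch has one bagged layer plus the weighted combination.

```python
import random
from statistics import mean

# Toy one-feature base learners standing in for NN/LightGBM/XGBoost/CatBoost
# (an assumption made only to keep the sketch dependency-free).
def fit_ols(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_knn(xs, ys, k=3):
    points = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict

BASE_LEARNERS = {"ols": fit_ols, "knn": fit_knn}

def bag_out_of_fold(fit, xs, ys, n_folds=3, seed=0):
    """k-fold bagging: one model per fold; each sample is predicted only by
    the fold model that did not see it (out-of-fold prediction)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    oof, models = [0.0] * len(xs), []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        models.append(model)
        for i in held_out:
            oof[i] = model(xs[i])
    return oof, models

def greedy_weights(oof_preds, ys, rounds=10):
    """Ensemble selection with replacement: each round adds the learner whose
    inclusion gives the lowest MSE; weights = selection counts / rounds."""
    counts = {name: 0 for name in oof_preds}
    blend = [0.0] * len(ys)
    for r in range(1, rounds + 1):
        def mse_with(name):
            cand = [(b * (r - 1) + p) / r for b, p in zip(blend, oof_preds[name])]
            return sum((c - y) ** 2 for c, y in zip(cand, ys)) / len(ys)
        best = min(oof_preds, key=mse_with)
        counts[best] += 1
        blend = [(b * (r - 1) + p) / r for b, p in zip(blend, oof_preds[best])]
    return {name: c / rounds for name, c in counts.items()}

def fit_bagged_stack(xs, ys, n_folds=3):
    oof, bags = {}, {}
    for name, fit in BASE_LEARNERS.items():
        oof[name], bags[name] = bag_out_of_fold(fit, xs, ys, n_folds)
    # out-of-fold predictions are the inputs of the next (ensemble) layer
    weights = greedy_weights(oof, ys)
    def predict(x):
        # a bagged learner averages its fold models; learners are combined linearly
        return sum(w * mean(m(x) for m in bags[name]) for name, w in weights.items())
    return predict, weights

xs = list(range(12))
ys = [2 * x + 1 for x in xs]   # an exactly linear toy target
predict, weights = fit_bagged_stack(xs, ys)
print(weights, predict(20))
```

Because the toy target is exactly linear, the out-of-fold errors of the OLS learner are essentially zero, so the greedy selection gives it all of the weight; on noisy data the weights are typically spread across several learners, which is where the variance reduction comes from.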
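from autogluon.tabular import TabularPredictor

# Assumed values for illustration; the paper does not report them.
num_trials, search_strategy, time_limit = 5, 'auto', 600
# train_data: a pandas DataFrame of lagged correlation values plus the target column named by `label`.
hyperparameter_tune_kwargs = {
    'num_trials': num_trials,     # number of hyperparameter search trials per model
    'scheduler': 'local',
    'searcher': search_strategy,  # e.g. 'auto' or 'random'
}
predictor = TabularPredictor(label=label).fit(
    train_data,
    time_limit=time_limit,        # training budget in seconds
    num_stack_levels=1,           # one stacking layer above the base layer
    num_bag_folds=3,              # 3-fold bagging of every model
    hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,  # apply the search settings above
)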

B. LSTM and GRU
As variants of the recurrent neural network, LSTM and GRU use gating mechanisms to mitigate the vanishing-gradient problem and are widely used for time series prediction. For comparison with AutoGluon, the LSTM and GRU models are built on the Keras platform; the key code is given in Fig. 3.
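The gating idea that lets these networks carry information across long sequences can be seen in a single GRU step. The sketch below is a scalar, pure-Python illustration rather than the Keras code of Fig. 3, and all weights are hypothetical. It follows the original GRU formulation of Cho et al. (2014), in which the new state is z*h + (1 - z)*h_tilde; Keras uses the same interpolation but, by default (reset_after=True), applies the reset gate after the recurrent weight multiplication.

```python
import math

PARAM_NAMES = ("w_z", "u_z", "b_z", "w_r", "u_r", "b_r", "w_h", "u_h", "b_h")

def sigmoid(v):
    # numerically stable logistic function
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def gru_step(x, h, p):
    """One step of a scalar GRU cell; w_* act on the input x, u_* on the previous state h."""
    z = sigmoid(p["w_z"] * x + p["u_z"] * h + p["b_z"])  # update gate
    r = sigmoid(p["w_r"] * x + p["u_r"] * h + p["b_r"])  # reset gate
    h_tilde = math.tanh(p["w_h"] * x + p["u_h"] * (r * h) + p["b_h"])  # candidate state
    # When z is close to 1 the old state is copied almost unchanged, which is what
    # lets information (and gradients) pass through many time steps.
    return z * h + (1.0 - z) * h_tilde

def run_gru(xs, p, h0=0.0):
    h = h0
    for x in xs:
        h = gru_step(x, h, p)
    return h

params = dict.fromkeys(PARAM_NAMES, 0.0)
params.update(w_z=0.5, w_h=2.0, u_h=1.5)  # hypothetical weights
print(run_gru([0.1, -0.2, 0.3], params))
```

An LSTM follows the same principle with separate input, forget and output gates and an additional cell state.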

C. Model Assessment Index
Many indices can be used to evaluate the fitting ability of a machine learning model. This paper uses the mean square error (MSE) and the mean absolute error (MAE) as evaluation indices, calculated as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
where $n$ is the total number of samples, $y_i$ is the actual value and $\hat{y}_i$ is the predicted value.
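Compared with MAE, MSE gives greater weight to outliers, so it is less stable than MAE. On the other hand, with a fixed learning rate MSE converges more effectively than MAE, because its gradient shrinks as the error approaches zero, whereas the gradient of MAE keeps a constant magnitude. MSE and MAE are therefore both used to evaluate the performance of the models.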
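The two evaluation indices translate directly into code. The sketch below also illustrates the outlier sensitivity discussed above with made-up numbers: four small errors of 0.1 and a single error of 0.4 give the same MAE, but the single large error makes MSE four times larger.

```python
def _check(actual, predicted):
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")

def mse(actual, predicted):
    """Mean squared error: (1/n) * sum((y_i - yhat_i)^2)."""
    _check(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    """Mean absolute error: (1/n) * sum(|y_i - yhat_i|)."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Illustrative numbers only (not from the paper).
y = [0.5, 0.6, 0.7, 0.8]
small_misses = [0.6, 0.7, 0.8, 0.9]   # four errors of 0.1
one_outlier = [0.5, 0.6, 0.7, 1.2]    # a single error of 0.4
print(mae(y, small_misses), mae(y, one_outlier))  # both approximately 0.1
print(mse(y, small_misses), mse(y, one_outlier))  # approximately 0.01 vs 0.04
```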

A. Variables and Data
It is common to select non-price indicators as explanatory variables to forecast prices. However, such indicators often become available later than the price itself, so predictability that appears only in hindsight cannot be exploited in real time. Moreover, the reflexive (two-way) relationship between some non-price indicators and prices is hard to verify or refute; for example, oil prices influence oil production and vice versa. Therefore, this paper studies risk measurement in the futures market by constructing time series of the correlation coefficients between the futures indices of rebar, iron ore and coke. The explained variable is the current value of the correlation coefficient of China's futures price indices, and the explanatory variables are its historical values. Specifically, the correlation coefficient is calculated in Python with a corr function over a rolling window of 100 trading days, based on the daily closing prices of the futures indices.
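The two data-preparation steps described above, a rolling correlation between two price series and lagged values of that correlation as explanatory variables, can be sketched in pure Python as follows. Reading "the number of cycles is 100" as a 100-observation rolling window is our interpretation, and the number of lags (`n_lags`) is a hypothetical parameter the paper does not state. With pandas, the rolling step corresponds to `s1.rolling(100).corr(s2)`.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant window")
    return sxy / math.sqrt(sxx * syy)

def rolling_corr(x, y, window=100):
    """Correlation over each window of `window` consecutive observations;
    element t of the result uses observations t .. t + window - 1."""
    if len(x) != len(y):
        raise ValueError("price series must have equal length")
    return [pearson(x[t:t + window], y[t:t + window]) for t in range(len(x) - window + 1)]

def make_lagged(series, n_lags):
    """Rows of the previous `n_lags` values (explanatory variables) paired
    with the current value (explained variable)."""
    X = [list(series[t - n_lags:t]) for t in range(n_lags, len(series))]
    y = [series[t] for t in range(n_lags, len(series))]
    return X, y

# Hypothetical closing prices (illustrative only):
rebar = [3500, 3520, 3490, 3550, 3600, 3580]
iron_ore = [800, 805, 798, 812, 825, 820]
corr = rolling_corr(rebar, iron_ore, window=4)  # the paper uses 100 trading days
X, y = make_lagged(corr, n_lags=2)
```

With three indices there are three such pairwise series (rebar and iron ore, rebar and coke, iron ore and coke).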
This paper uses the ferrous ("black") industry indices of South China Futures published by the tushare data community, which cover rebar, hot-rolled coil, iron ore, coke, coking coal, wire rod, silicomanganese and ferrosilicon. Because these varieties were listed on different dates, and because trading in wire rod, silicomanganese, ferrosilicon, hot-rolled coil and coking coal has been inactive with little market influence, this paper analyzes only the futures price indices of rebar, iron ore and coke, covering 2001 trading days from October 21, 2013 to December 31, 2021.
It can be seen from Table I that … The training set and the testing set are then input into each model, and the results are compared and analyzed. The specific process is given in Fig. 4.
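The segmentation and splitting step can be sketched as below. Table I's interval boundaries are not reproduced in this text, so the boundary indices are hypothetical placeholders, and the 80/20 train/test proportion is likewise an assumption. The point the sketch encodes is that each interval is split chronologically, without shuffling, so the testing set always lies after the training set in time, matching the idea that historical data (training set) are used to predict future data (testing set).

```python
def split_intervals(series, boundaries):
    """Cut a series at the given start indices of new intervals."""
    edges = [0, *boundaries, len(series)]
    if any(a >= b for a, b in zip(edges, edges[1:])):
        raise ValueError("boundaries must be increasing and inside the series")
    return [series[a:b] for a, b in zip(edges, edges[1:])]

def chronological_split(segment, train_frac=0.8):
    """Earlier observations form the training set, later ones the testing set."""
    cut = int(len(segment) * train_frac)
    return segment[:cut], segment[cut:]

corr_series = list(range(1800))               # stand-in for the correlation series
boundaries = [300, 600, 900, 1200, 1500]      # hypothetical: six intervals
splits = [chronological_split(seg) for seg in split_intervals(corr_series, boundaries)]
print([(len(train), len(test)) for train, test in splits])
```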

B. Empirical Analysis
For market participants, risk is low when a model that fits historical data (the training set) well can predict future data (the testing set) within a certain error range; conversely, risk rises as the prediction error increases. On this basis, this paper proposes two hypotheses: 1) when the market is influenced by external events, risk increases, which is characterized by a larger gap between the prediction errors on the training set and the testing set; 2) as the market gradually adapts to the influence of external events, risk decreases, which is characterized by a narrowing of this gap. If both hypotheses hold, the market risk level can be estimated by observing changes in the gap between the training and testing errors, and a model can be replaced once it is clearly unable to adapt to market changes.
Fig. 3 (excerpt): history = model.fit(x_train, y_train, batch_size=1, epochs=30)
After the dataset is input into the different models, the results are as shown in Table II.
In Table II, Ratio of error = Testing MSE / Training MSE; the smaller the MSE, the better the model fits the dataset.
Table II shows that AutoGluon fits every training set clearly better than the LSTM and GRU models, but falls behind them on the testing sets. In interval 5 in particular, AutoGluon's error ratio reaches 24.22, far higher than those of the other models: its training-set MSE is only 0.01, whereas its testing-set MSE is 0.35, which indicates a degree of overfitting and, consequently, poor generalization. Therefore, the MSE of a model in a single interval should not be the sole criterion of prediction accuracy; the model's performance on the training set and the testing set should be compared. Even a model that performs well on both sets, however, cannot be guaranteed to perform as stably in the future.
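The error ratio and the two hypotheses can be expressed as a small calculation: compute the ratio per interval, then label each move between consecutive intervals. Hypothesis 1 predicts an "up" move when an event interval begins, and hypothesis 2 a "down" move as the market adapts. The MSE pairs below are hypothetical, not Table II values. Note also that recomputing the ratio from rounded table entries can mislead: for interval 5, 0.35 / 0.01 gives 35 rather than the reported 24.22, presumably because the tabulated MSEs are rounded (a training MSE of about 0.0145 would be shown as 0.01).

```python
def error_ratio(train_mse, test_mse):
    """Ratio of error = testing MSE / training MSE (larger means a bigger generalization gap)."""
    if train_mse <= 0:
        raise ValueError("training MSE must be positive")
    return test_mse / train_mse

def ratio_changes(ratios):
    """Label each move between consecutive intervals as 'up', 'down' or 'flat'."""
    return ["up" if cur > prev else "down" if cur < prev else "flat"
            for prev, cur in zip(ratios, ratios[1:])]

# Hypothetical (training MSE, testing MSE) pairs for three consecutive intervals,
# e.g. before, during and after an event; these are NOT values from Table II.
pairs = [(0.02, 0.03), (0.02, 0.08), (0.02, 0.04)]
ratios = [error_ratio(tr, te) for tr, te in pairs]
print(ratio_changes(ratios))  # ['up', 'down']
```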
Taking interval 2 (supply-side reform) and interval 5 (COVID-19 epidemic) as reference points, the changes from interval 1 to interval 3 and from interval 4 to interval 6 show that each of the two major events increased the error ratio of every model. This confirms the first hypothesis. One possible explanation is that a major event raises market risk, which shows up as a decline in the models' predictive ability. Comparing interval 3 with interval 4, the error ratio decreases, indicating that the impact of the earlier event on the market gradually fades; this confirms the second hypothesis. When a new major event occurs, however, the error ratio widens again. Although the error ratio fluctuates, if intervals 1, 3, 4 and 6 are placed in one group and intervals 2 and 5 in another, the error ratio shows a gradual overall increase. This shows that as time passes and major events affect the market, the overall forecasting ability of the models declines. The descriptive statistics of the MAE index in Table III reflect the same characteristics.
This paper uses an artificial neural network, LightGBM, XGBoost and CatBoost to generate the AutoGluon sub-models for prediction. The dataset is divided into six intervals, and AutoGluon generates more than 30 sub-models in each interval. For simplicity, we focus on the interval spanning the outbreak of COVID-19 (interval 5) and examine the ten best-performing sub-models on the training set and on the testing set separately, using MAE as the evaluation index (Table IV). The conclusions for the other intervals, and under MSE, are essentially the same. In Table IV, 'L' denotes the number of stacking layers, 'T' the number of parameter search trials and 'BAG' the use of the bagging strategy.
In the training sets of all six intervals, WeightedEnsemble_L3 achieves the highest prediction accuracy, whether measured by MAE or MSE. On the testing sets, however, WeightedEnsemble_L3 lags well behind, possibly because the weighted combination model overfits during training, which also indicates that its generalization ability is weak. On the testing sets, several models trained with the XGBoost algorithm perform well; notably, these same models perform poorly on the training set and do not enter its top ten.
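The comparison behind Table IV, ranking sub-models separately on the training and testing sets and looking for models that rank highly on only one of them, can be sketched as follows. The name pattern (algorithm, optional `_BAG`, `_L<level>`, optional `/T<trial>`) is our assumption based on the 'L', 'T' and 'BAG' notation above, and all MAE values are hypothetical. In practice, AutoGluon's `TabularPredictor.leaderboard` can supply per-model scores; it reports error metrics with a flipped sign so that higher is better, so they would need negating before ranking by MAE as done here.

```python
import re

NAME = re.compile(r"^(?P<algo>[A-Za-z]+)(?P<bag>_BAG)?_L(?P<level>\d+)(?:/T(?P<trial>\d+))?$")

def parse_model_name(name):
    """Split an assumed sub-model name such as 'XGBoost_BAG_L1/T3' into its parts."""
    m = NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognised model name: {name}")
    return {
        "algo": m["algo"],
        "bagged": m["bag"] is not None,
        "level": int(m["level"]),
        "trial": int(m["trial"]) if m["trial"] else None,
    }

def top_k(mae_by_model, k=10):
    """Names of the k models with the lowest MAE."""
    return [name for name, _ in sorted(mae_by_model.items(), key=lambda kv: kv[1])[:k]]

# Hypothetical MAE values for illustration only.
train_mae = {"WeightedEnsemble_L3": 0.010, "CatBoost_BAG_L2/T1": 0.012,
             "LightGBM_BAG_L2/T2": 0.015, "XGBoost_BAG_L1/T3": 0.030}
test_mae = {"WeightedEnsemble_L3": 0.090, "CatBoost_BAG_L2/T1": 0.070,
            "LightGBM_BAG_L2/T2": 0.080, "XGBoost_BAG_L1/T3": 0.050}
newcomers = [n for n in top_k(test_mae, 2) if n not in top_k(train_mae, 2)]
print(newcomers)  # ['XGBoost_BAG_L1/T3']
print([parse_model_name(n) for n in newcomers])
```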
Given how AutoGluon works, more stacking layers and more parameter search trials produce more complex sub-models. More complex models do not necessarily predict better, however, as is evident for several XGBoost sub-models on the testing set in Table IV. In general, in terms of prediction accuracy, AutoGluon is better than the LSTM and GRU models, especially on the training set, and there is no significant difference between LSTM and GRU. Notably, the testing/training error ratio of AutoGluon is much larger than those of LSTM and GRU, which indicates some overfitting in AutoGluon and shows that the predictions of LSTM and GRU are more stable. This does not, however, overturn the conclusion that AutoGluon has stronger overall predictive performance.

V. CONCLUSION
This paper finds that: (1) major events make the futures market harder to predict, and over time it becomes more difficult to predict the market accurately with a single model, as verified by comparing the change in the interval error ratio before and after each event. (2) Although it shows some overfitting, AutoGluon, which consumes more resources, is generally more accurate than the LSTM and GRU models, while the overall performance difference between LSTM and GRU is trivial. (3) Comparisons of model performance may be meaningful only for specific datasets or tasks: spending more resources to train more complex weighted combination models does not guarantee better predictions on a specific task, and simple models are not necessarily inferior to complex ones. The diversity of specific tasks and the ease of use of AutoGluon will give AutoGluon-based automatic machine learning greater advantages over traditional machine learning methods in the future.
Based on these findings, the following suggestions are made: (1) in addition to MSE or MAE, the error ratio may be a more suitable measure of a model's predictive ability; (2) to improve model performance in time series prediction tasks, the XGBoost algorithm deserves further study.