Prediction of assets behavior in financial series using machine learning algorithms

Predicting the behavior of financial assets, using either classification or regression models, is a challenge that has been growing in recent years, despite the large number of published forecasting models for this task. Basically, the non-linear tendency of the series and the unexpected behavior of assets (compared to forecasts generated in studies of fundamental analysis or technical analysis) make this problem very hard to solve. In this work, we present some modeling techniques for this task using Support Vector Machines (SVM) and a comparative performance analysis against other basic machine learning approaches, such as Logistic Regression and Naive Bayes. We use an evaluation set based on company stocks of the BVM&F, the official stock market in Brazil, the third largest in the world. We show good prediction results, and we conclude that it is not possible to find a single model that generates good results for every asset. We also show how to evaluate the parameters of each model. The generated model can also provide additional information to other approaches, such as regression models.


I. INTRODUCTION
The prediction of financial market assets is an issue that concerns both investors and researchers. In recent years, it has been studied using different machine learning approaches, as shown in [12]. Despite the large amount of research, predicting the behavior of an asset in the real world, with either classification or regression models, is still a difficult task [13].
The main difficulty in making good predictions is due to both the non-linear characteristic of financial time series and the great amount of uncertainty and noise found in financial market data [14], [15], [16]. For this reason, we argue that classical statistical models are not well suited to this kind of prediction. This type of time series requires algorithms with a greater ability to generalize, such as Support Vector Machines (SVM) and Artificial Neural Networks (ANN). This work focuses on the asset prediction problem in the financial market, addressing it as a classification task and modeling it with supervised techniques such as Support Vector Machine, Logistic Regression and Naïve Bayes, together with interday data, to generate the classifiers.
The main purpose of the resulting model is to serve as a decision model for the investor; it can also be used as an input to new models, particularly regression models, in order to reduce their error.
We chose the Support Vector Machine (SVM) algorithm because it presents results comparable and often superior to those achieved by other machine learning algorithms such as Artificial Neural Networks [30], [31].
We also use the Logistic Regression (LR) (with the gradient descent method to choose the best parameters) and Naive Bayes (NB) algorithms to serve as a comparison in our study [3]. A baseline (BLS) was implemented to measure the distance between the probability of success of an investor without any market knowledge and the accuracy obtained by the algorithms under given parameter settings. We also use the open source framework FAMA [6] for the development and implementation of the algorithms. Despite good results in recent studies [12], the challenge of finding models with good generalization ability on actual data is still open. From the results of the experiments, we conclude that models with a good hit rate can be found if the parameters are set correctly, and we verify that, despite the good generalization characteristic of machine learning algorithms, it is not possible to apply a single model to all stock assets.

II. RELATED WORK IN ASSETS' PREDICTIONS
In recent years, several techniques for regression and classification of financial assets have been explored, from classical statistical methods to more complex machine learning algorithms, such as Artificial Neural Networks [19], [20], Logistic Regression [17], [18], PLSR [21] and, more recently, Support Vector Machines [22], [23], [24]. Reference works in the area show that these soft computing techniques are well accepted for the study and evaluation of financial series.
The main difference between most of these works is the output information: while some of them provide the action to be taken by the user (classification problem), others focus on the minimum and maximum stock values reached during the day (regression problem).
Whatever the choice, the same problem remains: the loss of accuracy in new periods and scenarios.
www.ijarai.thesai.org
Our novel contributions are the analysis (of interday stock parameters and some other variables in this problem) of how these models (focusing on the classifiers generated by SVM) behave in new data scenarios, and the adjustments needed to minimize the expected loss in accuracy between training and test scenarios. To achieve these results, we define a hybrid model using a sliding cross-validation environment, where the model is retrained after a defined period.

III. MACHINE LEARNING TECHNIQUES AND MODELING
In this section we detail the machine learning algorithms used in this work, as well as their important parameters. For the training task, we set our target variable (y) to "+1" if the day's closing value of the asset is greater than the previous day's, which means that the closing value grew compared to the day before. If the value of the subtraction (closing value of the day minus closing value of day-1) is negative, we set the target variable to "-1", thus creating a binary classification problem.
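The labeling rule above can be sketched in Python as follows. This is a minimal illustration, not the FAMA implementation; the function name `label_targets` is hypothetical, and labeling an unchanged closing value as "-1" is an assumption, since the text only specifies the strictly greater and strictly negative cases.

```python
def label_targets(closes):
    """Return one +1/-1 label per day, starting from the second day."""
    labels = []
    for prev, curr in zip(closes, closes[1:]):
        # +1 when today's close is greater than yesterday's, -1 otherwise.
        # Assumption: an unchanged close is labeled -1.
        labels.append(1 if curr > prev else -1)
    return labels


closes = [10.0, 10.5, 10.2, 10.2, 11.0]
print(label_targets(closes))  # [1, -1, -1, 1]
```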

A. Tendency Keeper (TK)
Tendency Keeper is not actually a machine learning algorithm; it was used as our baseline system (BLS) for comparisons. The TK approach classifies the next day considering only the closing value of the current day (cvd) minus the closing value of the previous day (cvd-1). If the result is negative (cvd < cvd-1), the next day is classified as the negative class ("-1"); conversely, if the result is zero or positive (cvd >= cvd-1), TK assigns the positive class ("+1") to the target variable.
Here, we expect an accuracy rate close to 50%, similar to a simple guess, which is the probability of success of an investor without any knowledge of the financial market.
We use this one as a lower limit to compare the quality of classifiers.
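A minimal sketch of the Tendency Keeper baseline and of its accuracy computation is given below. The function names and the toy closing values are illustrative assumptions, as is the handling of ties in the true labels (an unchanged close counts as "-1").

```python
def tk_predict(closes):
    """For each day d >= 1, predict the class of day d + 1 from today's change."""
    preds = []
    for d in range(1, len(closes) - 1):
        change = closes[d] - closes[d - 1]
        preds.append(1 if change >= 0 else -1)  # zero or positive -> +1
    return preds


def actual_labels(closes):
    """True class of day d + 1 for each day d >= 1 (assumption: ties are -1)."""
    return [1 if closes[d + 1] > closes[d] else -1
            for d in range(1, len(closes) - 1)]


def accuracy(preds, truth):
    hits = sum(p == t for p, t in zip(preds, truth))
    return hits / len(truth)


closes = [10.0, 10.5, 10.7, 10.4, 10.1, 10.6]  # toy values, not market data
preds = tk_predict(closes)
truth = actual_labels(closes)
print(preds, truth, accuracy(preds, truth))
```

On real series, the text expects this baseline to stay close to a 50% hit rate.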

B. Naïve Bayes (NB)
The Naive Bayes algorithm is a probabilistic model based on Bayes rule. Because of its simplicity, this algorithm is widely used in machine learning, for both discrete and continuous X. It is a naive approach because it considers the attributes to be conditionally independent, i.e., a given event does not imply another one.
In other words, the attributes $X_1, \dots, X_n$ are all conditionally independent given $Y$, and the data are assumed to follow a normal, symmetrical distribution. Despite this naive and simplistic premise, it reports good performance in several classification tasks [7].
We can represent Bayes rule as in Equation 1, where $y_i$ denotes the target value of the $i$-th example and $x_k$ denotes the $k$-th attribute value of an example $x$. In our implementation, for each attribute $x_k$ we calculate the standard deviation ($\sigma$) and the average ($\mu$) for each class, in our case "+1" and "-1".
After this, we compare the result of the formula (as defined in Equation 1) for each class. The algorithm assigns the class ("+1" or "-1") with the higher output value of the calculation, where z is defined by:
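As an illustration of the procedure described above, the following sketch implements a Gaussian Naive Bayes classifier in plain Python. It assumes that each attribute is modeled with a class-conditional normal density using the per-class μ and σ, that z is the standardized attribute value, and that class scores are compared in log space. These choices, the function names and the toy data are assumptions and are not taken from the authors' implementation.

```python
import math


def fit_gaussian_nb(X, y):
    """Estimate, per class, the prior and the mean/std of each attribute."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Population standard deviation; a tiny floor avoids division by zero.
        stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1e-9
                for col, m in zip(zip(*rows), means)]
        model[c] = (n / len(y), means, stds)
    return model


def log_score(x, prior, means, stds):
    """log P(c) + sum_k log N(x_k; mu, sigma), with z = (x_k - mu) / sigma."""
    score = math.log(prior)
    for v, m, s in zip(x, means, stds):
        z = (v - m) / s  # assumption: z is the standardized value
        score += -0.5 * z * z - math.log(s * math.sqrt(2 * math.pi))
    return score


def predict(model, x):
    """Return the class with the higher score."""
    return max(model, key=lambda c: log_score(x, *model[c]))


# Toy data for illustration only.
X = [[1.0, 2.0], [1.2, 1.8], [3.0, 4.0], [3.2, 4.2]]
y = [-1, -1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 2.0]), predict(model, [3.1, 4.1]))
```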

C. Logistic Regression
Logistic Regression is a function approximation algorithm that uses training data to directly estimate $P(Y \mid X)$. It is an approach to learning functions of the form $f : X \rightarrow Y$.
Roughly, it gives the probability of $y = 1$ as a function of a set of discrete or continuous variables in a vector $X$. In this implementation we use gradient descent to find the best set of parameters.
As with all the other algorithms, our output value is labeled "+1" or "-1". We denote an example by $\bar{x}$ and the value of its $k$-th feature by $x_k$. An additional feature, $x_0 = 1$ (the bias feature), is also defined. The probability of an example being positive is given by:
$$P(y = +1 \mid \bar{x}) = g\left(\sum_{k=0}^{n} w_k x_k\right) \quad (4)$$
where
$$g(z) = \frac{1}{1 + e^{-z}} \quad (5)$$
and $\{w_0, \dots, w_n\}$ denote the weights for the features. In training, we obtain the weight vector $\bar{w}$ by the gradient descent method.
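The training step can be sketched as batch gradient descent on the logistic log-loss. This is a minimal sketch under stated assumptions: the learning rate, the number of epochs, the function names and the toy data are all hypothetical, and the "+1"/"-1" labels are mapped to 1/0 internally.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the log-loss; y holds +1/-1 labels."""
    w = [0.0] * (len(X[0]) + 1)            # w[0] is the bias weight
    targets = [1.0 if label == 1 else 0.0 for label in y]
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, targets):
            xb = [1.0] + list(x)           # prepend the bias feature
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, xb)))
            for k, xi in enumerate(xb):
                grad[k] += (p - t) * xi
        # Step against the average gradient.
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def predict(w, x):
    p = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
    return 1 if p >= 0.5 else -1


# Toy, linearly separable data for illustration only.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [-1, -1, 1, 1]
w = train_logistic(X, y)
print(predict(w, [-1.5]), predict(w, [1.5]))
```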

D. Support Vector Machine (SVM)
Support Vector Machine (SVM) is currently one of the most promising machine learning algorithms. It is based on statistical learning theory [4], was developed from studies started in [5], and establishes a series of principles to be followed in obtaining classifiers with good generalization.
Basically, the algorithm selects some key points to be the support vectors, defined by the largest distance between the linear classifier and the closest examples of each class (labeled +1 and -1). In other words, it defines a margin, which is the width by which the boundary could be increased before reaching a data example. In the experiments section, we explain more directly how its variations (kernels and their parameters) impacted the prediction results. We used the LIBSVM library [1], which was integrated into the FAMA framework, to implement our experiments.
Some solutions to classification problems require the input data to be linearly separable, which is not always possible. To solve this issue, Vapnik proposed a mathematical method to transform low-dimensional data into a high-dimensional projection (using kernel functions), in which it is easier to separate the input data linearly.
The resulting hyperplane is defined by maximizing the distance between the "nearest" vectors of different classes. The idea is that a bigger margin implies a better generalization capacity. Figure 1 illustrates this idea. Kernels are functions that perform the transformation from a low-dimensional feature space into a higher-dimensional feature space, which is a necessary condition for separating the input data properly.
There are several kernel implementations, and we describe the kernel types used in our model analysis. Although the LIBSVM library supports several formulations for classification, regression and distribution estimation, we focus our work on the classification models: C-Support Vector Classification [9] and nu-Support Vector Classification [10].
There are many kernel functions, but the most common are: linear, polynomial, sigmoid and radial basis function.
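• Linear: $K(x_i, x_j) = x_i^T x_j$
• Polynomial: $K(x_i, x_j) = (\gamma x_i^T x_j + r)^d$, $\gamma > 0$
• Radial basis function: $K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)$, $\gamma > 0$
• Sigmoid: $K(x_i, x_j) = \tanh(\gamma x_i^T x_j + r)$
where $K$ is called the kernel function and $x_i$, $x_j$ are the training vectors. We test all of these in our experiments. The results are shown and detailed in Section 4. As we can see, depending on the kernel choice, some parameters ($\gamma$, $r$, $d$) have to be set.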
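To make the role of the kernel parameters concrete, the four kernel types named above can be sketched in plain Python, in the form documented for LIBSVM. The parameter names `gamma`, `coef0` and `degree` follow LIBSVM's terminology; the default values and the function names are illustrative assumptions, not the settings used in the experiments.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v))                                    # 11.0
print(polynomial_kernel(u, v, gamma=1.0, coef0=1.0, degree=2))  # 144.0
print(rbf_kernel(u, u))                                       # 1.0
```

The linear kernel has no parameter to tune, whereas the other three require at least `gamma` to be chosen.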

A. Database
We use data from the Bovespa website [8], available through interface files, to create a financial series. The preprocessing task produced a new database containing 107 (187-80) records with information between 01-Jan-2006 and 31-Dec-2006-10. The original dataset has 187 days; since computing the discrepancy values requires the value from up to 80 days before (the maximum window size), the first 80 days must be discarded.
The preprocessing task must consider factors such as the removal of outliers from the sample, the selection of attributes and the scaling of the values.

B. Input Attributes
From our database, we selected the attributes that, according to technical analysis, are most relevant to the final value of a stock [25]: opening price (open), closing price (close), day maximum (max), day minimum (min) and volume (vol). We labeled these values "base attributes".
Moreover, we used the series discrepancy (over 3, 5, 10, 15, 20, 40, 60 or 80 days). The results show a direct impact on model sensitivity when we change the window, with the model overfitting for bigger window values.
A good pattern that we found is the combination of these "base attributes" with the series discrepancy values. When used together, this combination can improve accuracy as long as the discrepancy window is small (most of the time, when it is equal to 3 or 5). This pattern was valid only for window values less than or equal to 5.
As an example, from the original database values, we produce the input structure shown in Table I, using a discrepancy value of 3.
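One way to build such an input structure is sketched below. The exact definition of the discrepancy is not spelled out in the text, so this sketch assumes it is the day's closing value minus the closing value `window` days earlier; the function name `build_rows`, the dictionary keys and the toy values are likewise hypothetical.

```python
def build_rows(days, window):
    """days: list of dicts with open, close, max, min, vol (oldest first).

    Returns one row per day that has `window` earlier days available:
    the five base attributes followed by the discrepancy value.
    """
    rows = []
    for t in range(window, len(days)):
        today = days[t]
        # Assumption: discrepancy = close today - close `window` days before.
        disc = today["close"] - days[t - window]["close"]
        rows.append([today["open"], today["close"], today["max"],
                     today["min"], today["vol"], disc])
    return rows


# Toy values for illustration only.
closes = [10, 11, 12, 14, 13, 15]
days = [{"open": c, "close": c, "max": c, "min": c, "vol": 1000} for c in closes]
rows = build_rows(days, window=3)
print([r[-1] for r in rows])  # [4, 2, 3]
```

Under this assumption the first `window` days yield no row, which matches the record count given in the Database subsection: 187 days with a maximum window of 80 leave 107 records.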

C. Outliers: split and inplit
We must consider two relevant events in financial series: split and inplit. Both are strategies aimed at increasing the asset price.
"Split" is a strategy that companies use to improve the liquidity of an asset. A split occurs when the stock price is too expensive, which hinders financial transactions, often caused by investor memory [12]. "Inplit" is the inverse operation of the split. The inplit is used to enhance the liquidity of the asset when its price is far below the market and to reduce the volatility of the asset (when the asset's value is too low, any variation represents a large percentage variation).
In this work, we must treat these operations (both split and inplit) as noise in the financial series, since they generate a large variation in the price of the asset, although they do not have any impact on the value of the investment portfolio.
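A small worked example shows why these events look like outliers in the price series while leaving the portfolio value unchanged. The function name `apply_split`, the ratio convention and the numbers are illustrative assumptions.

```python
def apply_split(shares, price, ratio):
    """ratio > 1 models a split (e.g. 2 for 1); ratio < 1 models an inplit."""
    return shares * ratio, price / ratio


shares, price = 100, 50.0
new_shares, new_price = apply_split(shares, price, 2)

value_before = shares * price
value_after = new_shares * new_price
relative_change = (new_price - price) / price

print(value_before, value_after)  # 5000.0 5000.0
print(relative_change)            # -0.5, a 50% drop in the quoted price
```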

D. Scaling
In machine learning techniques, it is advisable to scale all input values into the range [-1, +1] or [0, +1]. This improves the performance of the algorithms (by avoiding numerical difficulties during the calculation; for instance, kernel functions usually depend on the inner products of feature vectors) and prevents attributes with greater numeric values from dominating the others. Warren S. Sarle [13] explains the importance of scaling in his work.
Obviously, the same scaling method must be used for both the training and testing datasets. For example, if the x attribute of the training data was scaled from [-100, +100] to [-1, +1] and the same attribute in the test data lies in the range [-120, +80], then the test data must be scaled to [-1.20, +0.8].
After the data preprocessing task, we perform an analysis of the SVM parameters in order to find a model that performs well on new data.
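The train/test scaling rule described above can be sketched as follows: the minimum and maximum are taken from the training data only and then reused for the test data. The function name `fit_scaler` is a hypothetical helper; the values reproduce the example from the text.

```python
def fit_scaler(train_values, lo=-1.0, hi=1.0):
    """Return a function that maps the training range linearly onto [lo, hi]."""
    vmin, vmax = min(train_values), max(train_values)

    def scale(v):
        return lo + (hi - lo) * (v - vmin) / (vmax - vmin)

    return scale


scale = fit_scaler([-100.0, 0.0, 100.0])  # fitted on training data only
print(scale(-100.0), scale(100.0))        # training limits map to -1.0 and 1.0
print(scale(-120.0), scale(80.0))         # test limits map to about -1.2 and 0.8
```

Test values outside the training range therefore fall outside [-1, +1], as in the example.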

V. EXPERIMENTS
Beyond the choice of attributes, we analyze the impact of some parameters and their variations on the model's accuracy. These parameters are described in Table II. Furthermore, we analyze some specific details of the modeling task, such as the training period, the input attributes and the standard cross-validation method compared with our sliding cross-validation implementation.
One problem yet to be solved is finding a good training period. Given the characteristics of financial time series, we cannot train with a very large dataset (subject to underfitting) or a very small one (subject to overfitting) [9]. Tables III, IV and V show this behavior. We argue that the oldest values are less important for the day's closing value than the most recent ones. When we work with a bigger training dataset, the training cannot find a good generic model for prediction. From technical analysis, we used basic values: opening value, closing value, highest and lowest value of the day, and volume. Dow's theory argues that these variables can be used to predict the market movement [26]. We cannot adopt this statement in our model for all assets. Basically, we found a pattern in some assets: when we use a small discrepancy (3, 5) we see a small improvement in accuracy but, when this value grows, in most cases this improvement is lost. Furthermore, there are some assets for which this statement does not hold, such as ELET6.

In Table VI and Table VII, we show the accuracy of SVM and NB training considering discrepancy values (D) and considering both base attributes plus discrepancy values (BA+D). For ALLL11 and CSNA3 we cannot see any (real) improvement in the hit rate when we use both input groups. For PETR4, for some window sizes we have a considerable gain. For ELET6, the combination of base attributes and discrepancy values helps in the prediction task for any window size when we compare SVM results. In Table VIII, we show the results of the cross-validation training with 80% of the data and testing with 20%. Table IX shows the cross-validation results for NB.
Bezzera da Silva et al. [11] study the correlation between stocks in Bovespa using a market graph and the Power Law.
Despite the difference in research focus, we can see the relation between stocks; if we can cluster these different groups, specific model rules can be applied to each group to obtain the best possible model.
Analyzing the variations of kernels, we obtain the best accuracy values using the polynomial kernel with the window variable set to 3 and 5 (for ELET6). This test is carried out with a nu value of 0.5 [2]. In financial series models, it is important to retrain the model after a certain period in an attempt to capture the current tendency. We recognize the importance of historical data, of course, but argue that only a specific time period is really important for making a correct prediction. We can support this claim by comparing the standard cross-validation method, with 80% of the total data used to train the classifier and 20% used to test the accuracy of the model, against the sliding cross-validation method. The latter was created using the same parameters as traditional cross-validation, but the model is retrained before predicting the next day's tendency. In our tests we used 80 days to train and predict the next day's tendency (high or low). With "sliding" validation, we take the average of the output values to calculate the model's accuracy. This parameter is still open in our study and presents its best performance with the SVM approach.
Table X shows the difference in accuracy between standard cross-validation (C1) and sliding cross-validation (C2), which strongly indicates the need to retrain the model with the 80 prior days. As we can see, our sliding cross-validation method performs better than the standard cross-validation method. This shows the need to retrain the model after a certain time period, likely following the tendency period.
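The two validation schemes compared above can be sketched as follows. This is a minimal illustration under stated assumptions: `fit` is any function that takes training data and returns a prediction function, `fit_majority` is a hypothetical placeholder learner (not SVM), and the toy labels are constructed only to show the mechanics, not to reproduce the reported results.

```python
def holdout_accuracy(X, y, fit, train_frac=0.8):
    """Standard split: train once on the first 80%, test on the last 20%."""
    cut = int(len(X) * train_frac)
    predict = fit(X[:cut], y[:cut])
    hits = sum(predict(x) == t for x, t in zip(X[cut:], y[cut:]))
    return hits / (len(X) - cut)


def sliding_accuracy(X, y, fit, window=80):
    """Sliding scheme: retrain on the previous `window` days before each prediction."""
    hits = 0
    for t in range(window, len(X)):
        predict = fit(X[t - window:t], y[t - window:t])
        hits += predict(X[t]) == y[t]
    return hits / (len(X) - window)


def fit_majority(X_train, y_train):
    """Placeholder learner: always predicts the majority class of its training set."""
    majority = 1 if sum(y_train) >= 0 else -1
    return lambda x: majority


# Toy series with a change of tendency halfway through.
X = [[i] for i in range(20)]
y = [-1] * 10 + [1] * 10
print(holdout_accuracy(X, y, fit_majority))            # 0.0
print(sliding_accuracy(X, y, fit_majority, window=4))  # 0.875
```

In this toy series the sliding scheme adapts shortly after the tendency changes, while the single split keeps predicting the old tendency.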

VI. CONCLUSION AND FUTURE WORK
As expected, the SVM algorithm had better accuracy than the other algorithms studied in this work, and it also presented good generalization abilities. It can be noticed that the parameter adjustment, using kernel functions and the defined margin, especially regarding the implementation, directly impacted the outcome of the model. Despite all the difficulties found in financial time series, such as noise and uncertainty, after adjusting the data we obtained good results to serve as a basis for decision making. Another important factor is the period considered for training the model, which does not produce good results in cross-validation when it is too small or too large. The approach to validation of the model followed the method of the experiment. The retraining performed by our sliding cross-validation method provides the best results compared with the standard cross-validation method. It highlighted the need to retrain the model after a certain period.
In an attempt to make better predictions, some factors will be considered in future work. Moving averages (simple, weighted, exponential and others) are often used in technical analysis as input parameters to indicate an uptrend or downtrend (through lines of support and resistance). They can be a good factor for the model, since the predictor variable behaves differently in a downward trend and in an upward trend.
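As a sketch of how such indicators could be computed as candidate input attributes, the simple and exponential moving averages are shown below. The function names are hypothetical, and the smoothing factor 2 / (n + 1) and the initialization with the first value are common conventions assumed here, not choices stated in this work.

```python
def simple_moving_average(values, n):
    """Mean of the last n values, for each day with n values available."""
    return [sum(values[i - n + 1:i + 1]) / n
            for i in range(n - 1, len(values))]


def exponential_moving_average(values, n):
    """Recursive average that weights recent values more heavily."""
    alpha = 2.0 / (n + 1)  # assumed smoothing convention
    ema = [values[0]]      # assumed initialization with the first value
    for v in values[1:]:
        ema.append(alpha * v + (1 - alpha) * ema[-1])
    return ema


print(simple_moving_average([1, 2, 3, 4, 5], 3))         # [2.0, 3.0, 4.0]
print(exponential_moving_average([1.0, 2.0, 3.0], 3))    # [1.0, 1.5, 2.25]
```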
We can also look at the relation between stocks. Recent works use graphs to group stocks by their correlation [11]. This grouping can bring benefits to the data analysis, since correlated assets are expected to behave similarly.
We argue that the most recent days have more influence on the behavior of the stock price, a claim supported by the sliding validation method, which considers only 80 days for training and produces a better model. Next steps can analyze how varying this variable affects the accuracy results, as well as weight the input variables. We will also consider a hybrid model created from the analysis of the confusion matrix. Finally, the prediction target can be reformulated to turn the problem into a multiclass one. A sensitivity factor can separate the samples into 3 classes, "high negative variation", "neutral variation" or "high positive variation", by calculating the price variation. This can put the focus on more specific situations.
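The three-class reformulation could look like the sketch below. The relative-variation definition, the function name and the 1% threshold are assumptions made for illustration; the sensitivity factor itself is left open in the text.

```python
def three_class_label(prev_close, close, threshold=0.01):
    """Classify the day by its relative price variation (threshold is hypothetical)."""
    variation = (close - prev_close) / prev_close
    if variation > threshold:
        return "high positive variation"
    if variation < -threshold:
        return "high negative variation"
    return "neutral variation"


print(three_class_label(100.0, 102.0))  # high positive variation
print(three_class_label(100.0, 98.0))   # high negative variation
print(three_class_label(100.0, 100.5))  # neutral variation
```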
On the other hand, recent works have focused on semantic observation [27], [28], [29]. Rules can be extracted from the database and applied to the model in an attempt to find a pattern that minimizes the prediction error.

Fig. 1. The simplest kind of SVM: samples and key-points as support vectors (on the dashed line) as maximum margin linear classifier.

TABLE I.
A SIMPLE EXAMPLE WITH THE FIRST SIX DAYS' VALUES AS INPUT MATRIX FOR A 3-DAY DISCREPANCY VALUE.

TABLE II.
ALGORITHM PARAMETERS FOR MODEL TRAINING.

TABLE IV
, we show the results of the cross-validation training with 80% of the data and testing with 20%. Table IX shows the cross-validation results for NB.

TABLE VI.
ACCURACY OF SVM MODEL IN TRAINING DATABASE FOR DIFFERENT WINDOW SIZES AND DIFFERENT INPUT PARAMETERS

TABLE VII.
ACCURACY OF NB MODEL IN TRAINING DATABASE FOR DIFFERENT WINDOW SIZES AND DIFFERENT INPUT PARAMETERS