Forecasting the Tehran Stock Market by Artificial Neural Network

Abstract: One of the most important problems in modern finance is finding efficient ways to summarize and visualize stock market data so that individuals and institutions have useful information about market behavior for investment decisions. The enormous amount of valuable data generated by the stock market has attracted researchers to explore this problem domain using different methodologies, and the potentially significant benefits of solving it have motivated extensive research for years. In this paper, a computational data mining methodology is used to predict seven major stock market indexes. Two learning algorithms, Linear Regression and a standard feed-forward back-propagation (FFB) neural network, were tested and compared. The models were trained on four years of historical data, from March 2007 to February 2011, in order to predict the major stock price indexes of the Tehran Stock Exchange in Iran. The performance of the prediction models was evaluated using two widely used statistical metrics. The results show that the FFB neural network achieved better prediction accuracy. In addition, conventional wisdom holds that a longer training period with more training data helps to build a more accurate prediction model. However, because the stock market in Iran has fluctuated strongly over the past two years, this paper shows that data collected from a more recent and shorter period can reduce the prediction error in such a highly speculative, fast-changing environment.


I. INTRODUCTION
Data mining, also called knowledge discovery in databases, is the process in computer science of discovering interesting and useful patterns and relationships in large volumes of data. The field combines tools from statistics and artificial intelligence (such as neural networks and machine learning) with database management to analyze large digital collections, known as data sets. An artificial neural network (ANN), usually called a neural network (NN), is a mathematical or computational model inspired by the structure and/or functional aspects of biological neural networks. As an artificial intelligence method, the NN has become very important in stock market prediction because it has proved more advantageous than other methods, and a great deal of research has been carried out using different NN topologies. According to Wong, Bodnovich and Selvi [1], the most frequent areas of neural network application are production/operations (53.5%) and finance (25.4%). Within finance, applications concentrate on stock market prediction. Compared with other methods, NNs performed better, with accuracy rates ranging from 68% to 90% [9]. The popularity of these methods is mainly due to their benefits outnumbering their limitations. Different NN methods have been used for tasks ranging from optimal feature selection to generating buy and sell signals, the former being the more popular.
In this paper, we forecast the stock price of Mobarakeh-Steel Co., a high-ranking company on the Tehran Stock Exchange. We used data from 15 March 2007 until 14 February 2011 for training the neural network, and data from 15 February 2011 until 30th for the experiments. We performed the experiments using MATLAB and obtained results of 97%, which are very encouraging.
In the stock market, when brokers want to sell or buy stock, they mostly depend on technical trading rules. Robert Edwards and John Magee [10] defined technical trading rules as "the science of recording the actual history of trading (price changes, volume of transactions, etc.) in a certain stock or in 'the averages' and then deducing from that pictured history the probable future trend". Different artificial intelligence methods have been used to optimize prediction through the successful selection of trading rules.

II. PROPOSED APPROACH
A researcher originates primary data for the specific purpose of the problem at hand, whereas secondary data have already been collected for other purposes. Secondary data can be classified as internal or external. Our research strategy is the analysis of secondary data. To conduct our research, the following types of data are needed: 1) Companies' stock prices, which are internal secondary data gathered from financial databases.
Techniques used for cash forecasting can be broadly classified into the following categories:
b) Factor analysis method
c) Expert system approach
We used the factor analysis method for cash forecasting because it gives better results than the other two methods. If there are not enough neurons in each layer, the outputs will not be able to fit all the data points (under-fitting). On the other hand, if there are too many neurons in each layer, oscillations may occur between data points (over-fitting). Therefore, a topology study was conducted to find the most appropriate neural network architecture for fitting the cash forecasting parameters, since many combinations of neurons and layers are possible.

III. EXPERIMENTAL
The proposed approach was implemented in MATLAB.

A. Number of Layers
The model is a three-layer feed-forward neural network trained using the fast back-propagation algorithm, because it was found to be the most efficient and reliable for this study (Table 2).

B. Input Layer Size
Mainly calendar effects are included in this model as parameters affecting cash withdrawal. The total number of input neurons needed is therefore seven, each representing the value of an individual variable at a particular instant of time.

C. Output Layer Size
In this model, only one output unit is needed for indicating the value of forecasted cash.

D. Optimal Hidden Layer Size
There is no easy way to determine the optimal number of hidden units without training networks with different numbers of hidden units and estimating the error of each. The best approach to finding the optimal number of hidden units is trial and error. In practice, either forward selection or backward selection can be used to determine the hidden layer size.
Forward selection: Start by choosing an appropriate criterion for evaluating the performance of the network. Then select a small number of hidden neurons and record the network's performance, i.e. its forecast accuracy. Next, slightly increase the number of hidden neurons, then train and test again, repeating until the error is acceptably small or no significant improvement is noted.
Backward selection: Start with a large number of hidden neurons and then decrease the number gradually.
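The forward selection procedure can be sketched as follows. This is a minimal illustration in Python with NumPy, not the MATLAB implementation used in the study. The synthetic data, the function names (`train_mlp`, `mse`, `forward_selection`), the candidate sizes and the 1% improvement threshold are all assumptions made for the example.

```python
import numpy as np

def train_mlp(X, y, hidden, epochs=1000, lr=0.05, seed=0):
    """Train a one-hidden-layer network (tanh hidden, linear output)
    with plain batch gradient descent on half the mean squared error."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    n = X.shape[0]
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # hidden activations
        err = (H @ W2 + b2) - y             # output error, shape (n, 1)
        dH = (err @ W2.T) * (1.0 - H ** 2)  # error propagated back to the hidden layer
        W2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean(axis=0)
        W1 -= lr * (X.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return W1, b1, W2, b2

def mse(params, X, y):
    """Mean squared error of the trained network on (X, y)."""
    W1, b1, W2, b2 = params
    pred = np.tanh(X @ W1 + b1) @ W2 + b2
    return float(np.mean((pred - y) ** 2))

def forward_selection(X_tr, y_tr, X_val, y_val, sizes, min_gain=0.01):
    """Try increasing hidden sizes; stop when the validation error no longer
    improves on the best one so far by at least min_gain (relative)."""
    best_size, best_err, history = None, np.inf, []
    for h in sizes:
        err = mse(train_mlp(X_tr, y_tr, h), X_val, y_val)
        history.append((h, err))
        if err > best_err * (1.0 - min_gain):
            break  # no significant improvement
        best_size, best_err = h, err
    return best_size, history

# Synthetic stand-in data: 7 inputs, 1 output (assumption, not the paper's data).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (300, 7))
y = np.sin(X.sum(axis=1, keepdims=True))
sizes = [2, 4, 8, 12]
best_size, history = forward_selection(X[:200], y[:200], X[200:], y[200:], sizes)
print(best_size, history)
```

Backward selection would walk the same candidate list in reverse, starting from the largest size and stopping when shrinking the layer begins to hurt the validation error.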
For this study, the forward selection approach was used to select the size of the hidden layer, and the best result was obtained with 10 neurons in the hidden layer, as is evident from Table 3.
Stock markets have been studied over and over again to extract useful patterns and predict their movements. Mining textual documents and time series concurrently, such as predicting the movements of stock prices based on the contents of the intraday price, is an emerging topic in the data mining and text mining communities. Stock price trend forecasting based solely on technical and fundamental data analysis enjoys great popularity. However, numeric time series data contain only the event and not the reason why it happened.
Textual data such as news articles carry richer information; hence, exploiting textual information in addition to numeric time series data increases the quality of the input, and improved predictions are expected from this kind of input compared with numerical data alone. Information in a company's report or in breaking news stories can dramatically affect the share price of a security.
To build the prediction model, the research process consists of several steps: data collection, data preprocessing, alignment, feature and document selection, document representation, classification and model evaluation. With the prediction model (feed-forward back-propagation), we can conclude that our model outperforms random labeling. The prediction model signals an upward or downward movement of the stock price when an upcoming piece of news is released, and it predicts correctly 83 percent of the time. This can be very beneficial for individual and corporate investors, financial analysts, and users of financial news. With such a model, they can foresee the future behavior and movement of stock prices, take correct actions immediately, and act properly in their trading to gain more profit and prevent loss.
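The evaluation against random labeling described above can be illustrated with a short sketch. It assumes that "up" means the next price is strictly higher than the current one, and it uses made-up prices and a hypothetical vector of model predictions; none of these values come from the study.

```python
import numpy as np

def direction_labels(prices):
    """Label each step 1 (up) if the next price is higher, else 0 (down)."""
    prices = np.asarray(prices, dtype=float)
    return (np.diff(prices) > 0).astype(int)

def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Illustrative prices (assumption, not TSESC data).
prices = [100.0, 101.5, 101.0, 102.0, 103.5, 103.0]
labels = direction_labels(prices)

# Hypothetical model output and a random-labeling baseline for comparison.
model_pred = np.array([1, 0, 1, 0, 0])
rng = np.random.default_rng(0)
random_pred = rng.integers(0, 2, size=labels.size)

print("model accuracy: ", accuracy(labels, model_pred))
print("random accuracy:", accuracy(labels, random_pred))
```

On a realistic test set the random baseline would be averaged over many draws, since a single random labeling of a few points can score well or badly by chance.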
Among the 20 text files provided by the Tehran Stock Exchange Service Company (TSESC), the Mobarakeh-Steel Co. intraday prices and their corresponding dates and times for the years 2007 to 2011 were chosen as input to the split-and-merge algorithm. Before the segmentation algorithm is implemented in the R programming language, the Mobarakeh-Steel Co. text file must be read by the program. The program reads 1904 intraday prices (data points) for this company for the years 2007 to 2011, which is equal to 2266.

TABLE 1. Data Sets

Architecture of the Model: Designing the right architecture involves several important steps:
a) Selecting the number of layers
b) Deciding the number of neurons to be used in each layer
c) Choosing the appropriate transfer functions for the neurons

Figure 2. Performance
Figure 3.
Figure 2 shows a comparison of the two algorithms.
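For readers who want to reproduce the linear regression side of such a comparison, a minimal baseline might look like the sketch below. It assumes that the inputs are the previous few prices (lagged values), which the paper does not state; the helper names and the toy series are likewise illustrative.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Build rows of n_lags consecutive values, each paired with the value that follows."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

def fit_linear(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply the fitted coefficients (last entry is the intercept)."""
    A = np.column_stack([X, np.ones(len(X))])
    return A @ coef

# Toy series with a perfectly linear trend (assumption, for checking the code).
series = np.arange(20.0)
X, y = make_lagged(series, n_lags=3)
coef = fit_linear(X, y)
next_value = predict_linear(coef, series[-3:].reshape(1, -1))[0]
print(next_value)
```

The same lagged matrix can be fed to the neural network so that both models are scored on identical inputs and the same hold-out period.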

TABLE 2. Selection of Algorithms

TABLE 3. Optimal Neurons in Hidden Layer

Optimal Transfer Function
As shown in Table 4, the optimal transfer functions were found to be tansig in the hidden layer (at 10 neurons) and logsig in the output layer.
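To make the chosen configuration concrete, the sketch below runs a forward pass through a 7-10-1 network with a tansig hidden layer and a logsig output layer. It is written in Python with NumPy as an illustration, not taken from the study's MATLAB code; the weights are random and the input values are placeholders. MATLAB's `tansig` is mathematically equivalent to the hyperbolic tangent, and `logsig` is the logistic function.

```python
import numpy as np

def tansig(n):
    """Hyperbolic tangent sigmoid, output in (-1, 1)."""
    return np.tanh(n)

def logsig(n):
    """Logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-n))

def forward(x, W1, b1, W2, b2):
    """Forward pass: input -> tansig hidden layer -> logsig output unit."""
    hidden = tansig(x @ W1 + b1)
    return logsig(hidden @ W2 + b2)

# Random placeholder weights for a 7-10-1 topology (assumption, untrained).
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(7, 10))
b1 = np.zeros(10)
W2 = rng.normal(scale=0.1, size=(10, 1))
b2 = np.zeros(1)

x = rng.uniform(0.0, 1.0, size=(3, 7))  # three sample input vectors
y_hat = forward(x, W1, b1, W2, b2)
print(y_hat.ravel())
```

Because logsig outputs lie strictly between 0 and 1, the target values need to be scaled into that range before training and scaled back afterwards.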

Table columns: Transfer function; Mean Forecast Error for 60 data points; Sum Squared Error at 5000 epochs
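The two error measures named in these columns are not defined in the text. The sketch below shows one common reading, assuming the mean forecast error is the average of (actual minus forecast) and the sum squared error is the sum of the squared differences; the sample values are invented for illustration.

```python
import numpy as np

def mean_forecast_error(actual, forecast):
    """Average signed error; positive values mean the model under-forecasts."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(actual - forecast))

def sum_squared_error(actual, forecast):
    """Sum of squared differences between actual and forecast values."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sum((actual - forecast) ** 2))

# Invented sample values (assumption, not results from the study).
actual = [10.0, 12.0, 11.0, 13.0]
forecast = [9.0, 12.0, 12.0, 12.0]

print(mean_forecast_error(actual, forecast))
print(sum_squared_error(actual, forecast))
```

A signed mean error can be close to zero even when individual errors are large, which is one reason to report it alongside a squared-error measure.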
Table 6 shows the percentages predicted for the real data. Table 7 shows the prices predicted for Mobarakeh-Steel Co.
Figure 5. Real price vs. percentage price.

TABLE 7. Price Prediction of Mobarakeh-Steel Co.