A Cluster-Based Non-Linear Regression Framework for Periodic Multi-Stock Trend Prediction on Real-Time Stock Market Data

Trend prediction has long been one of the most important tasks in the stock market. For accurate trend prediction on real-time stock market data, stock news sentiment and technical analysis play a vital role. In conventional trend prediction, technical indicators lag because they are computed from temporal data and limited historical data, and conventional methods operate without sentiment scores, technical scores or explicit time periods. Moreover, because of their high memory and computational-time requirements, conventional methods are restricted to predicting the trend of a single stock, whereas the proposed prototype performs trend prediction on multi-stock data. The multi-stock trend prediction model implements the proposed algorithms on a real-time stock market dataset. A new stock technical indicator and a new stock sentiment score are proposed to improve feature selection for trend prediction, and a technical feature selection measure and a stock news sentiment score are developed and incorporated to find the best real-time feature selection model. Integrated stock market data is used to build a hybrid clustering model that identifies related stocks. The result is a cluster-based non-linear regression multi-stock framework for time-based trend prediction. Experimental results show that the proposed model improves multi-stock trend prediction accuracy by 12% and recall by 11%.
Keywords—Multi-stock trend prediction; stock market; clustering; nonlinear regression


I. INTRODUCTION
Stock markets offer investors one of the most profitable avenues for their money; few other investments offer a comparable rate of return. However, investors must also bear the losses when prices fall, which makes investing risky: there is every opportunity both to grow returns and to lose everything [1]. In theory, any change in share prices is associated with a change in a fundamental variable relevant to share valuation, for example, earnings, company size, dividend payout ratio and various economic variables. A moving average is the average price of a stock computed over a predetermined period; every time the stock price changes, the average moves up or down accordingly [2]. Moving average indicators include simple (arithmetic), triangular, exponential, variable and weighted moving averages, computed from open, close, low and high prices and trading volumes. Price trends can rise or fall at any point, and these movements outline the flow of market trading. Investors profit when they buy a stock in an uptrend, that is, a stock whose value appreciates consistently over a period, even if there are periods of consolidation. For example, if a company's stock price starts at ₹275 and reaches ₹380, the stock is said to be in an uptrend, even though there were brief dips in between. An uptrend can span hours, months or years [3]. A downtrend is the opposite of an uptrend: the stock price depreciates steadily over a period that may include some brief rises. Investors who follow these trend signals can benefit.
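To make the moving-average description concrete, the following Java sketch (Java being the environment used for the experiments in this paper) computes a simple and an exponential moving average over closing prices. The class name MovingAverages, the NaN padding for incomplete windows and the choice to seed the EMA with the first window's simple average are illustrative assumptions, not the authors' implementation.

```java
import java.util.Arrays;

/** Simple and exponential moving averages over a closing-price series (illustrative sketch). */
final class MovingAverages {
    private MovingAverages() {}

    /**
     * Simple (arithmetic) moving average: entry i is the mean of prices[i - period + 1 .. i].
     * The first period - 1 entries are NaN because a full window is not yet available.
     */
    static double[] sma(double[] prices, int period) {
        if (period <= 0) throw new IllegalArgumentException("period must be positive");
        double[] out = new double[prices.length];
        Arrays.fill(out, Double.NaN);
        double windowSum = 0.0;
        for (int i = 0; i < prices.length; i++) {
            windowSum += prices[i];                           // price entering the window
            if (i >= period) windowSum -= prices[i - period]; // price leaving the window
            if (i >= period - 1) out[i] = windowSum / period;
        }
        return out;
    }

    /**
     * Exponential moving average with smoothing factor alpha = 2 / (period + 1),
     * seeded with the simple average of the first full window.
     */
    static double[] ema(double[] prices, int period) {
        if (period <= 0) throw new IllegalArgumentException("period must be positive");
        double[] out = new double[prices.length];
        Arrays.fill(out, Double.NaN);
        if (prices.length < period) return out;
        double alpha = 2.0 / (period + 1);
        double seed = 0.0;
        for (int i = 0; i < period; i++) seed += prices[i];
        out[period - 1] = seed / period;
        for (int i = period; i < prices.length; i++) {
            // The newest price gets weight alpha; older history decays geometrically.
            out[i] = alpha * prices[i] + (1 - alpha) * out[i - 1];
        }
        return out;
    }
}
```

For example, sma(new double[]{1, 2, 3, 4, 5}, 3) yields NaN, NaN, 2.0, 3.0, 4.0. Other EMA seeding conventions exist, and the earliest values differ between them.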
Moving average trading is profitable only if the price moves between sell and buy signals are large enough; otherwise it results in losses. Bollinger Bands can be a great aid to traders making binary (two-outcome) decisions [4], and they can reveal new trading opportunities. When the price approaches a Bollinger Band, the market is likely to turn around, and this information alone can be enough to win a binary decision. Bollinger Bands also give a simple indication of how far the market may move. High-payout binary option types, such as ladder options or one-touch options, require this kind of prediction, which is why Bollinger Bands can turn an ordinary strategy into a highly profitable one. Bollinger Bands mark important support and resistance levels for trend prediction. The relative strength index (RSI) [5] is a momentum indicator that measures the magnitude of recent price changes in order to assess overbought or oversold conditions in the price of a stock or other asset. The RSI compares the magnitude of recent gains with that of recent losses [6].
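The Bollinger Band and RSI definitions above can be sketched as follows. The window length and band width k are parameters (a 20-bar window with k = 2 is a common choice); the sketch assumes a population standard deviation for the bands and Wilder's smoothing for the RSI. The class and method names are hypothetical, and returning 50 for a completely flat window is an assumed convention.

```java
import java.util.Arrays;

/** Bollinger Bands and RSI over closing prices (illustrative sketch). */
final class BandsAndRsi {
    private BandsAndRsi() {}

    /** Middle, upper and lower band values; NaN until a full window is available. */
    static final class Bands {
        final double[] middle;
        final double[] upper;
        final double[] lower;

        Bands(int n) {
            middle = new double[n];
            upper = new double[n];
            lower = new double[n];
            Arrays.fill(middle, Double.NaN);
            Arrays.fill(upper, Double.NaN);
            Arrays.fill(lower, Double.NaN);
        }
    }

    /** Middle band = SMA(period); upper/lower = middle +/- k * population standard deviation. */
    static Bands bollinger(double[] close, int period, double k) {
        if (period <= 0) throw new IllegalArgumentException("period must be positive");
        Bands b = new Bands(close.length);
        for (int i = period - 1; i < close.length; i++) {
            double mean = 0.0;
            for (int j = i - period + 1; j <= i; j++) mean += close[j];
            mean /= period;
            double var = 0.0;
            for (int j = i - period + 1; j <= i; j++) var += (close[j] - mean) * (close[j] - mean);
            double sd = Math.sqrt(var / period);
            b.middle[i] = mean;
            b.upper[i] = mean + k * sd;
            b.lower[i] = mean - k * sd;
        }
        return b;
    }

    /** RSI with Wilder's smoothing; the first value appears at index `period`. */
    static double[] rsi(double[] close, int period) {
        if (period <= 0) throw new IllegalArgumentException("period must be positive");
        double[] out = new double[close.length];
        Arrays.fill(out, Double.NaN);
        if (close.length <= period) return out;
        double avgGain = 0.0, avgLoss = 0.0;
        for (int i = 1; i <= period; i++) {
            double change = close[i] - close[i - 1];
            if (change > 0) avgGain += change; else avgLoss -= change;
        }
        avgGain /= period;
        avgLoss /= period;
        out[period] = toRsi(avgGain, avgLoss);
        for (int i = period + 1; i < close.length; i++) {
            double change = close[i] - close[i - 1];
            avgGain = (avgGain * (period - 1) + Math.max(change, 0.0)) / period;
            avgLoss = (avgLoss * (period - 1) + Math.max(-change, 0.0)) / period;
            out[i] = toRsi(avgGain, avgLoss);
        }
        return out;
    }

    /** RSI = 100 - 100 / (1 + RS), with RS = average gain / average loss. */
    private static double toRsi(double avgGain, double avgLoss) {
        if (avgLoss == 0.0) return avgGain == 0.0 ? 50.0 : 100.0; // flat window -> 50 (assumed convention)
        return 100.0 - 100.0 / (1.0 + avgGain / avgLoss);
    }
}
```

Readings above 70 or below 30 are conventionally read as overbought or oversold, but those thresholds are a usage convention rather than part of the computation.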
Sentiment analysis can be improved by selecting appropriate data pre-processing techniques, which makes pre-processing a crucial step in the process. Beyond the usual pre-processing techniques, some news content requires special treatment because it is produced by the user community, for example, messages received from Twitter [7]. The views expressed in the news are positive, negative or neutral opinions, which play an important role in trend prediction. Sentiment analysis is the task of assigning a sentiment label to a given news article, so it can also be treated as a classification task. News reporting on a company shapes investors' opinions, allowing them to make informed decisions about their holdings in that company's stock. All inputs are assumed to be independent of each other when predicting the test class labels. Financial media reporters gather information from reliable sources and disseminate it as news articles [8]. Published news articles must be checked for trustworthiness: they can be disseminated through different media, so the source from which sentiment is derived must be chosen with the utmost care, since the accuracy of a study's results depends on credible sources. Yahoo Finance and Moneycontrol are among the websites whose news articles can be considered trustworthy. In the text classification approach, each word in the article is weighted by its frequency, and the article is classified into one of the specified groups. Given the importance of trading volume in understanding stock market microstructure, comprehensive empirical studies have examined the relationship between price, volatility and trading volume.
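A minimal sketch of the pre-processing and term-frequency weighting described above follows. The regular expressions (dropping URLs and @mentions, keeping hashtag words), the tiny stop-word list and the class name NewsPreprocessor are illustrative assumptions; a production pipeline would typically add stemming or lemmatization and a fuller stop-word list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Text clean-up and term frequencies for news or tweet text (illustrative sketch). */
final class NewsPreprocessor {
    private NewsPreprocessor() {}

    // Small illustrative stop-word list; a real system would use a fuller one.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "the", "is", "are", "of", "to", "in", "on", "and", "for", "was");

    /** Lower-cases, strips URLs and @mentions, removes punctuation and drops stop words. */
    static List<String> tokenize(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT)
                .replaceAll("https?://\\S+", " ") // drop URLs
                .replaceAll("@\\w+", " ")         // drop Twitter mentions
                .replaceAll("[^a-z0-9 ]", " ");   // keep letters and digits (hashtag words survive)
        List<String> tokens = new ArrayList<>();
        for (String t : cleaned.split("\\s+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) tokens.add(t);
        }
        return tokens;
    }

    /** Raw term frequency of each token in one article. */
    static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) tf.merge(t, 1, Integer::sum);
        return tf;
    }
}
```

Keeping the word behind a hashtag (for example, "bullish" from "#bullish") is a deliberate choice here, since such tags often carry sentiment.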
Market investors tend to look for investment options that provide higher returns, since the investment decision is made to earn better returns than other available avenues. The probability of not achieving the anticipated or targeted return is commonly known as risk, but estimating risk is difficult. Volatility is usually taken as the indicator of risk; in simple terms, volatility is the standard deviation of returns. Volatility can be actual, historical, implied or forward volatility. Regarding the causes of volatility, some economists argue that the market moves according to the information it receives, while others argue that volatility has little to do with economic or external factors and that investors' reactions exert the greater market impact. Investors are generally risk-averse, yet investments in volatile assets still have to be made. Investment in securities serves varying purposes: some investors buy stock and hold it for a long time to enjoy the privileges of owning capital assets, while others buy stock to sell it later and profit from the price difference. Returns on equities vary with shifts in stock prices, which rarely remain the same and are unpredictable. Price volatility [26] is therefore both an opportunity and a threat to investors. Price stability would reduce the risk stemming from volatility, yet stock prices cannot stay steady over time because they are prone to shifts in environmental factors. In a globalized environment there are no bounds or barriers to the flow of funds, and foreign institutional investors (FIIs) are now the major players in the Indian stock market. As losses are incurred, the risk of the stock increases; this risk is measured by the standard deviation, which captures the dispersion of actual returns from the expected return.
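Since volatility is described here as the standard deviation of returns, a short sketch of historical volatility follows. It assumes log returns, a sample (n - 1) standard deviation and square-root-of-time annualization; the class name and the periodsPerYear parameter are hypothetical.

```java
/** Historical volatility as the standard deviation of log returns (illustrative sketch). */
final class Volatility {
    private Volatility() {}

    /** Log returns r_t = ln(p_t / p_{t-1}); requires strictly positive prices. */
    static double[] logReturns(double[] prices) {
        if (prices.length < 2) throw new IllegalArgumentException("need at least two prices");
        double[] r = new double[prices.length - 1];
        for (int t = 1; t < prices.length; t++) r[t - 1] = Math.log(prices[t] / prices[t - 1]);
        return r;
    }

    /** Sample standard deviation with the n - 1 denominator. */
    static double stdDev(double[] x) {
        if (x.length < 2) throw new IllegalArgumentException("need at least two observations");
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= x.length;
        double ss = 0.0;
        for (double v : x) ss += (v - mean) * (v - mean);
        return Math.sqrt(ss / (x.length - 1));
    }

    /**
     * Annualized historical volatility; periodsPerYear is the number of return
     * observations per year (for example, roughly 252 for daily trading data).
     */
    static double annualized(double[] prices, int periodsPerYear) {
        return stdDev(logReturns(prices)) * Math.sqrt(periodsPerYear);
    }
}
```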
The larger the dispersion, the greater the perceived risk of the security. The risk of a stock is also viewed in relation to the market. Every security is susceptible to market influence, which may be greater or smaller, but to a large extent the fortune of an individual stock is governed by the market. This part of a stock's risk is called systematic risk. All stocks in the market share this class of risk, so it is also known as non-diversifiable risk, because it cannot be eliminated through diversification. The statistic used to measure this portion of overall risk is beta, which tells how closely a security's returns are related to the market. Trading on the stock market has become popular. Today's investors do not consider stock market investment pointless; they find it more remunerative than other opportunities. Formerly, stock investment did not receive due respect and was treated as somewhat speculative, and even today its social acceptability is somewhat discounted. Stocks offer a better opportunity not only to institutional investors but also to small retail investors. That does not mean, however, that everyone knows how the market works, and the transparency of market operations remains a concern. The Securities and Exchange Board of India (SEBI) is working hard to improve the situation.
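Beta can be estimated as the covariance of the stock's returns with the market's returns divided by the variance of the market's returns. The sketch below assumes aligned return series of equal length (for example, daily stock and NIFTY returns); the class name is hypothetical.

```java
/** Beta of a stock relative to the market: Cov(stock, market) / Var(market) (illustrative sketch). */
final class Beta {
    private Beta() {}

    static double beta(double[] stockReturns, double[] marketReturns) {
        int n = stockReturns.length;
        if (n != marketReturns.length || n < 2) {
            throw new IllegalArgumentException("need two equal-length return series with at least 2 points");
        }
        double meanS = 0.0, meanM = 0.0;
        for (int i = 0; i < n; i++) {
            meanS += stockReturns[i];
            meanM += marketReturns[i];
        }
        meanS /= n;
        meanM /= n;
        double cov = 0.0, varM = 0.0;
        for (int i = 0; i < n; i++) {
            double dm = marketReturns[i] - meanM;
            cov += (stockReturns[i] - meanS) * dm;
            varM += dm * dm;
        }
        if (varM == 0.0) throw new IllegalArgumentException("market returns have zero variance");
        return cov / varM; // the 1/(n-1) factors of covariance and variance cancel
    }
}
```

A beta of 2 means the stock has historically moved about twice as much as the market; a beta of 1 means it moved in line with the market.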
In our previous contributions, we developed a single-stock trend prediction model using technical and news data in an intraday process, and then a single-stock trend prediction model using technical and news data over different periodic time intervals. In this contribution, an advanced multi-stock trend prediction model is designed and implemented on real-time stock market data over different periodic time intervals. New multi-stock technical and sentiment scores are developed to improve the stock selection process, and a multi-stock clustering algorithm and classification models are developed to predict the periodic multi-stock trend.

II. RELATED WORK
Jeon et al. [9] predicted next-day stock market behavior using a random subsample of collected tweets. They gathered tweets about the NASDAQ, S&P 500 and DJIA and, for each day, considered a combined fear-and-hope factor and analyzed the relationship between market indicators and these factors. They reported that these indices were negatively associated with emotional tweets, and their findings showed that the next-day stock market reaction can be predicted from collected emotional data [10].
Vu [11] proposed a new machine learning system that integrates features such as consumer confidence and the previous three days' data. A Decision Tree classifier with cross-validation was used to integrate all of the filtered features. The pre-processing steps include removing noisy data, normalizing tweets and selecting data. The model was tested with and without Named Entity Recognition (NER) on the stocks of Google, Apple, Amazon and Microsoft, and yielded 80.49%, 82.93%, 75.00% and 75.61%, respectively, for the up and down labels [11].
Vijh et al. [12] conducted a thorough study of stock prediction covering data collection (how to collect tweets from Twitter and their descriptions), cloud storage, the opinion analysis process (software and techniques) and, finally, the prediction phase. They examined the correlation between financial markets and social media data over time, and built a cloud-based system that stores, in JSON format, various dimensions of the public emotions contained in fetched tweets. The system was evaluated on four companies listed on the UK Stock Exchange, using data collected over 30 days. Their findings can help firms assess stakeholders' concerns and establish new market strategies. Overall, the research improved the efficiency of the forecasting phase through emotional analysis and synthesis.
Zhang et al. [13] used Twitter and survey-index sentiment and attention indicators, together with the volatility and trading volume of the S&P 500 index, to forecast returns. They applied various supervised learning techniques and the Diebold-Mariano test and compared the results with an autoregressive baseline model to confirm the significance of sentiment- and attention-based predictions. They noted that tweet volume and sentiment were relevant for predicting the returns of lower-market-capitalization portfolios. In addition, they showed that Kalman Filter indicators and Twitter sentiment were helpful in forecasting some survey-based sentiment labels.
Chen et al. [14] analyzed data obtained from various online platforms, namely chat rooms, web forums and microblogs, and found that they exhibit different characteristics. They hypothesized that chat room activity is strongly correlated with stock trends, and this assumption held. For chat room post sentiment, performance similar to that reported for short posts in previous studies was achieved. The results indicated that post sentiment improved stock return forecasting compared with using historical prices alone. They also developed a trading strategy and reported a return of 21% over seven months.
Proposed Model
The overall process of predicting stock market direction consists of several steps, including data collection, text pre-processing and feature selection. R and Python were used together with Java, and the packages of these tools were used to implement the proposed algorithms. Our problem requires two types of data [15]: historical stock values, and news stories from which sentiment is to be derived. Unlike other systems that use only static data, our system uses both streaming and static data. A crawler crawls the specified website and extracts news articles about the company whose future stock direction is to be predicted. Because stock prices must be correlated with the news articles, the articles must be extracted along with their timestamps. These news articles then serve as input to the sentiment analysis module. Researchers have pursued sentiment analysis for many years and have developed many algorithms to characterize the sentiment of text, each with its own advantages and disadvantages. The choice of sentiment analysis algorithm [25] may depend on the available datasets, the domain and prior experience.
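Because the news articles must carry timestamps so that they can be correlated with prices, one simple way to align them is to attach each article to the first price bar that starts at or after its publication time. The sketch below assumes that bar timestamps denote bar open times; the NewsItem type and the alignToBars method are hypothetical names, not part of the described system.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;

/** Aligns timestamped news items with price bars (illustrative sketch). */
final class NewsPriceAligner {
    private NewsPriceAligner() {}

    /** Hypothetical news record: publication time plus headline text. */
    static final class NewsItem {
        final LocalDateTime publishedAt;
        final String headline;

        NewsItem(LocalDateTime publishedAt, String headline) {
            this.publishedAt = publishedAt;
            this.headline = headline;
        }
    }

    /**
     * Assigns each item to the first bar opening at or after its publication time,
     * so news only influences later prices (no look-ahead). Items published after
     * the last bar are skipped.
     */
    static Map<LocalDateTime, List<NewsItem>> alignToBars(List<NewsItem> news,
                                                          NavigableSet<LocalDateTime> barTimes) {
        Map<LocalDateTime, List<NewsItem>> byBar = new TreeMap<>();
        for (NewsItem item : news) {
            LocalDateTime bar = barTimes.ceiling(item.publishedAt); // smallest bar time >= publication
            if (bar != null) byBar.computeIfAbsent(bar, k -> new ArrayList<>()).add(item);
        }
        return byBar;
    }
}
```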
One of the linguistic-based, lexicon-based and machine learning approaches must be chosen, and once an approach is selected, the appropriate algorithm within it must be determined [16]. Deciding which dataset to use for the research is crucial. Our framework does not rely on readily available datasets; most of the data is collected, processed and stored in a database. Our system needs two types of data: historical stock values, and news articles published online. Data from these two sources, a dataset of historical prices and a corpus of news articles, is combined to study the correlation between news articles and stock prices. The primary source for news articles is the Moneycontrol website, which holds a large reservoir of relevant news for individual stocks; Moneycontrol is India's premier source of financial information. Historical values for 2012 for the Infosys stock in the NIFTY were obtained from http:/ichart.finance.yahoo.com and loaded into a database table that can be queried and processed. The moneycontrol.com website was used for the news articles. To predict future prices based on sequences of events, historical data are extracted from moneycontrol.com for all companies listed on the BSE (around 3000) [17]. A scraper was written to extract the open and close prices of each company for the years 2007 to 2014. For this system, events from disclosure records and content collected from indiatimes.com, moneycontrol.com, sebi.com, watchoutinvestors.com, ecourts.gov.in and cibil.com are used as the corpus.
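As an illustration of the lexicon-based option mentioned above, the following sketch labels a pre-processed news article as positive, negative or neutral by counting lexicon hits. The tiny word lists and the class name are placeholders assumed purely for illustration; a real system would use a curated financial sentiment lexicon.

```java
import java.util.List;
import java.util.Set;

/** Lexicon-based sentiment labelling of a tokenized article (illustrative sketch). */
final class LexiconSentiment {
    private LexiconSentiment() {}

    enum Label { POSITIVE, NEGATIVE, NEUTRAL }

    // Placeholder lexicons assumed for illustration only.
    private static final Set<String> POSITIVE = Set.of("gain", "rise", "profit", "growth", "upgrade", "beat");
    private static final Set<String> NEGATIVE = Set.of("loss", "fall", "decline", "downgrade", "miss", "fraud");

    /** +1 per positive word, -1 per negative word; the sign of the total decides the label. */
    static Label classify(List<String> tokens) {
        int score = 0;
        for (String t : tokens) {
            if (POSITIVE.contains(t)) score++;
            else if (NEGATIVE.contains(t)) score--;
        }
        return score > 0 ? Label.POSITIVE : score < 0 ? Label.NEGATIVE : Label.NEUTRAL;
    }
}
```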
Data pre-processing greatly reduces the word space, but it still risks losing information. Kang [18] measured the volatility of daily returns in the Indian stock market over the period 1961-2005, using data gathered from the Economic Times Index and the S&P CNX Nifty. The series exhibited volatility clustering: quiet periods of small returns were interspersed with periods of high volatility and large returns. A GARCH model was used to test for asymmetry in the volatility effect, and the results indicated volatility asymmetry. High price movements were found to start in response to strong economic fundamentals, while the real reason for sudden movements was market imperfection [19].
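Asymmetry of the kind tested above is commonly modelled with an asymmetric GARCH variant. The sketch below uses the GJR-GARCH(1,1) variance recursion as a stand-in, in which a negative shock adds an extra term gamma times the squared shock; whether [18] used this exact specification is not stated here. Parameter values are assumed to come from a separate estimation step (for example, maximum likelihood), which is not shown, and the class name is hypothetical.

```java
/**
 * GJR-GARCH(1,1) conditional variance filter (illustrative sketch):
 * sigma2[t] = omega + (alpha + gamma * I[eps[t-1] < 0]) * eps[t-1]^2 + beta * sigma2[t-1].
 * Parameters are assumed to be already estimated; estimation is not shown.
 */
final class GjrGarch {
    private GjrGarch() {}

    static double[] conditionalVariance(double[] eps, double omega, double alpha, double gamma, double beta) {
        int n = eps.length;
        if (n == 0) throw new IllegalArgumentException("need at least one residual");
        double[] sigma2 = new double[n];
        // Start the recursion at the mean squared residual (residuals assumed to be zero-mean).
        double init = 0.0;
        for (double e : eps) init += e * e;
        sigma2[0] = init / n;
        for (int t = 1; t < n; t++) {
            double prev = eps[t - 1];
            double leverage = prev < 0 ? gamma : 0.0; // extra weight on negative shocks
            sigma2[t] = omega + (alpha + leverage) * prev * prev + beta * sigma2[t - 1];
        }
        return sigma2;
    }
}
```

With gamma > 0, a negative return shock raises the next period's variance more than a positive shock of the same size, which is the asymmetry being tested; gamma = 0 reduces the recursion to a symmetric GARCH(1,1).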
Smruti et al. [20] proposed an extreme-learning-based PCA approach to predict stock market data from a limited training dataset. Shangkun et al. [21] proposed a gradient boosting approach to detect trends in the Chinese market. Shanoli et al. [22] proposed a novel rule-based time series model to predict stock market data from the training data.

III. FILTER-BASED STOCK TECHNICAL PREDICTION MODEL
In [23], we proposed a novel filter-based classification model on technical and stock news datasets to find bullish-trend stocks in real-time market data. That model was tested on continuous technical data for trend prediction. In the proposed framework, a correlated multi-stock trend prediction model is designed and implemented on real-time market data. In the initial phase, real-time stock technical data and related news are extracted from the Moneycontrol and Zerodha websites. These technical and news data are pre-processed using the approaches developed in [24]. In this work, improved technical indicator and sentiment scores are defined based on the contextual information of the stock data. These scores and the technical data are integrated to form the training data. A novel clustering measure is then used to form clusters based on the technical data and scores of the integrated dataset. Finally, the clustered data is given to a classification model to predict the trend of multiple stocks for an input test sample, as shown in Fig. 1.

A. Stock Technical and Comments Data Collection and Pre-Processing
In this phase, all stock-related technical and news data are collected from the Zerodha, TradingView or Moneycontrol websites. The collected data are pre-processed using the models in [20] and [21], and a text pre-processor is applied to the stock news data for text filtering. In this work, modified versions of the stock technical score and sentiment score are developed on the technical and stock news datasets.

B. Hybrid Stock News Score
For the comments in the stock corpus S, we construct a dictionary of bullish and bearish words. In the stock news training data S, the i-th stock news item is represented as sn[i], and tsn[i][j] denotes the frequency of the j-th term in sn[i]. The term frequencies and normalized term frequencies are used to compute the news score of the stock, where T is the total number of tokens in the i-th stock news item. The normalized term frequencies are then scaled by the inverse document frequency (idf) and a multi-stock scaling factor (mssf), as given in Eq. (1).
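The body of Eq. (1) is not reproduced in this text, so the sketch below shows only one plausible way to combine the ingredients named above: the normalized term frequency tsn[i][j] / T, the idf of each term and a multi-stock scaling factor mssf. The signed sum over bullish and bearish dictionary terms, the natural-log idf and the treatment of mssf as a single constant are assumptions, not the authors' formula; the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical bullish/bearish news score from tf, idf and a scaling factor (illustrative sketch). */
final class HybridNewsScore {
    private HybridNewsScore() {}

    /** idf_j = ln(N / df_j), where df_j is the number of news items containing term j. */
    static Map<String, Double> inverseDocumentFrequency(List<List<String>> corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : corpus) {
            for (String term : new HashSet<>(doc)) df.merge(term, 1, Integer::sum); // count each doc once
        }
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            idf.put(e.getKey(), Math.log((double) corpus.size() / e.getValue()));
        }
        return idf;
    }

    /**
     * Assumed score for one news item sn[i]: sum over bullish terms minus sum over bearish
     * terms of (tsn[i][j] / T) * idf_j * mssf. This combination is an assumption, not Eq. (1).
     */
    static double score(List<String> tokens, Map<String, Double> idf,
                        Set<String> bullish, Set<String> bearish, double mssf) {
        if (tokens.isEmpty()) return 0.0;
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) tf.merge(t, 1, Integer::sum);
        double total = tokens.size(); // T: total tokens in this news item
        double s = 0.0;
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            double weight = (e.getValue() / total) * idf.getOrDefault(e.getKey(), 0.0) * mssf;
            if (bullish.contains(e.getKey())) s += weight;
            else if (bearish.contains(e.getKey())) s -= weight;
        }
        return s;
    }
}
```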

C. Proposed New Stock Technical Indicator
In this paper, a novel mutual information (MI) measure is proposed to find the contextual relationship between bullish and bearish stocks using technical indicators. The hybrid technical mutual information is expressed in terms of the bullish and bearish cases, as shown in Eq. (2), where bu denotes the bullish and be the bearish stock type.
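Eq. (2) itself is not reproduced in this text, so the following sketch computes only the standard Shannon mutual information between a discretized technical-indicator signal (for example, whether the price closed above or below a moving average) and the bullish (bu) or bearish (be) label of each observation. It illustrates the quantity the hybrid measure builds on, not the proposed hybrid measure; the discretization and the class name are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Shannon mutual information (nats) between a discrete signal and bu/be labels (illustrative sketch). */
final class TechnicalMutualInformation {
    private TechnicalMutualInformation() {}

    static double mutualInformation(String[] signal, String[] label) {
        int n = signal.length;
        if (n == 0 || n != label.length) throw new IllegalArgumentException("need equal-length, non-empty arrays");
        Map<String, Map<String, Integer>> joint = new HashMap<>();
        Map<String, Integer> signalCounts = new HashMap<>();
        Map<String, Integer> labelCounts = new HashMap<>();
        for (int i = 0; i < n; i++) {
            joint.computeIfAbsent(signal[i], k -> new HashMap<>()).merge(label[i], 1, Integer::sum);
            signalCounts.merge(signal[i], 1, Integer::sum);
            labelCounts.merge(label[i], 1, Integer::sum);
        }
        double mi = 0.0;
        for (Map.Entry<String, Map<String, Integer>> row : joint.entrySet()) {
            double px = signalCounts.get(row.getKey()) / (double) n;
            for (Map.Entry<String, Integer> cell : row.getValue().entrySet()) {
                double pxy = cell.getValue() / (double) n;
                double py = labelCounts.get(cell.getKey()) / (double) n;
                mi += pxy * Math.log(pxy / (px * py)); // only observed pairs, so pxy > 0
            }
        }
        return mi;
    }
}
```

A value of 0 means the signal carries no information about the bu/be label; a signal that perfectly separates two equally frequent labels gives ln 2.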

F. Training Data with Integrated Scores
In this work, the technical data is integrated with the newly computed technical score and the stock news score as unlabeled data for clustering. The technical data and scores of multiple stocks are integrated to perform score-based clustering across multiple stocks.
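The clustering measure itself is not specified in this section, so the sketch below substitutes plain k-means (Lloyd's algorithm) over the integrated feature rows, one row per stock observation with technical values, technical score and news score as columns. Standardizing the columns first keeps features on different scales (such as prices and sentiment scores) from dominating the distance. The first-k-rows initialization and all names are illustrative assumptions.

```java
import java.util.Arrays;

/** Standardization plus k-means over integrated feature rows (illustrative stand-in). */
final class IntegratedKMeans {
    private IntegratedKMeans() {}

    /** Column-wise z-score standardization; constant columns map to 0. */
    static double[][] standardize(double[][] rows) {
        int n = rows.length, d = rows[0].length;
        double[][] out = new double[n][d];
        for (int j = 0; j < d; j++) {
            double mean = 0.0;
            for (double[] r : rows) mean += r[j];
            mean /= n;
            double var = 0.0;
            for (double[] r : rows) var += (r[j] - mean) * (r[j] - mean);
            double sd = Math.sqrt(var / n);
            for (int i = 0; i < n; i++) out[i][j] = sd == 0.0 ? 0.0 : (rows[i][j] - mean) / sd;
        }
        return out;
    }

    /** Lloyd's k-means; initial centroids are the first k rows. Returns a cluster index per row. */
    static int[] cluster(double[][] rows, int k, int maxIter) {
        int n = rows.length, d = rows[0].length;
        if (k <= 0 || k > n) throw new IllegalArgumentException("need 0 < k <= number of rows");
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = rows[c].clone();
        int[] assign = new int[n];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            for (int i = 0; i < n; i++) { // assignment step: nearest centroid by squared distance
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double dist = 0.0;
                    for (int j = 0; j < d; j++) {
                        double diff = rows[i][j] - centroids[c][j];
                        dist += diff * diff;
                    }
                    if (dist < bestDist) { bestDist = dist; best = c; }
                }
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            if (!changed) break;
            double[][] sums = new double[k][d]; // update step: centroid = mean of its members
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assign[i]]++;
                for (int j = 0; j < d; j++) sums[assign[i]][j] += rows[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) for (int j = 0; j < d; j++) centroids[c][j] = sums[c][j] / counts[c];
            }
        }
        return assign;
    }
}
```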

H. Multi-Stock Trend Prediction Model
In the proposed multi-stock trend prediction model, a hybrid multi-linear regression model is designed and implemented to predict the trend of multiple stocks. Within this model, a new probability-estimation-based non-linear regression model is designed and implemented on the clustered training dataset. The non-linear regression estimator for time-wise trend prediction is defined over the terms p(s), q(s) and b(s) of a stock s and the parameter n.
In the proposed multi-stock trend prediction algorithm, this non-linear regression model is used to predict the trend of the input stock over different time frames: 1 minute, 3 minutes, 5 minutes, 10 minutes, 15 minutes, 30 minutes, 1 hour, 1 day, 1 week and 1 month, based on the clustered stock market dataset. The computed non-linear regression estimate is tested against the MACD signal value and the middle Bollinger Band value to predict similar trends in the real-time market.
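The comparison against the MACD signal line and the middle Bollinger Band can be sketched per time frame as follows. The regression estimate is passed in by the caller because the estimator's formula is not reproduced here; the 12/26/9 MACD and 20-bar band settings are common defaults rather than values given in the paper, and the decide rule (UP when the MACD line is above its signal line and the estimate is above the middle band, DOWN when both are below, SIDEWAYS otherwise) is a hypothetical rule for illustration. The resampleCloses method shows how N-minute bars such as 3m, 5m or 15m can be built from 1-minute closes.

```java
/** Per-time-frame trend check against MACD signal and middle Bollinger Band (illustrative sketch). */
final class TrendCheck {
    private TrendCheck() {}

    enum Trend { UP, DOWN, SIDEWAYS }

    /** Builds N-minute closes from 1-minute closes, keeping only complete blocks. */
    static double[] resampleCloses(double[] oneMinuteCloses, int minutes) {
        int blocks = oneMinuteCloses.length / minutes;
        double[] out = new double[blocks];
        for (int b = 0; b < blocks; b++) out[b] = oneMinuteCloses[(b + 1) * minutes - 1]; // last close in block
        return out;
    }

    /** EMA seeded with the first value (one common convention; others seed with an SMA). */
    static double[] ema(double[] x, int period) {
        double[] out = new double[x.length];
        if (x.length == 0) return out;
        double alpha = 2.0 / (period + 1);
        out[0] = x[0];
        for (int i = 1; i < x.length; i++) out[i] = alpha * x[i] + (1 - alpha) * out[i - 1];
        return out;
    }

    /** Returns {macdLine, signalLine}: MACD = EMA(fast) - EMA(slow); signal = EMA(signalPeriod) of MACD. */
    static double[][] macd(double[] close, int fast, int slow, int signalPeriod) {
        double[] f = ema(close, fast), s = ema(close, slow);
        double[] line = new double[close.length];
        for (int i = 0; i < close.length; i++) line[i] = f[i] - s[i];
        return new double[][] {line, ema(line, signalPeriod)};
    }

    /** Middle Bollinger Band at the last bar: the simple average of the last `period` closes. */
    static double middleBand(double[] close, int period) {
        if (close.length < period) throw new IllegalArgumentException("not enough bars");
        double sum = 0.0;
        for (int i = close.length - period; i < close.length; i++) sum += close[i];
        return sum / period;
    }

    /** Hypothetical rule: both confirmations must agree, otherwise SIDEWAYS. */
    static Trend decide(double macdLast, double signalLast, double estimate, double middle) {
        if (macdLast > signalLast && estimate > middle) return Trend.UP;
        if (macdLast < signalLast && estimate < middle) return Trend.DOWN;
        return Trend.SIDEWAYS;
    }

    /** Applies decide() to one time frame with 12/26/9 MACD and a 20-bar middle band. */
    static Trend assess(double[] close, double regressionEstimate) {
        double[][] m = macd(close, 12, 26, 9);
        int last = close.length - 1;
        return decide(m[0][last], m[1][last], regressionEstimate, middleBand(close, 20));
    }
}
```

Running assess on each resampled series (for example, resampleCloses(closes, 5) and resampleCloses(closes, 15)) gives one trend call per time frame, which can then be compared across frames.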

IV. EXPERIMENTAL RESULTS
Experimental results are simulated using a Java environment and real-time market data. The proposed model is compared with traditional stock market classification models to verify the performance of the hybrid feature-selection-based clustering and classification model. The comparison uses statistical performance measures such as accuracy, true positive rate, recall, precision, false positive rate and runtime, which are computed and compared using third-party Java libraries. Recall, precision, accuracy and F-measure are evaluated on the stock market sentiment data together with the technical data. These measures are computed from the confusion matrix described in Table I. Accuracy is the ratio of correctly predicted stock class labels to all stock class labels: Accuracy = (TP + TN) / (TP + TN + FP + FN).
Fig. 2 illustrates the 5 min candlestick chart from the Zerodha brokerage website; it shows that Reliance Industries stock is in an uptrend in the afternoon session. Fig. 3 illustrates the 10 min candlestick chart, in which Reliance Industries stock is in a downtrend in the morning session and a slight uptrend in the afternoon session. Fig. 4 illustrates the 15 min candlestick chart, in which Reliance Industries stock likewise shows a downtrend in the morning session and a slight uptrend in the afternoon session. Fig. 5 illustrates the 5 min candlestick chart, which shows that HDFC Bank stock is in an uptrend in the afternoon session.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020
Fig. 6 illustrates the 10 min candlestick chart from the Zerodha brokerage website, in which HDFC Bank stock is in a downtrend in the morning session and a slight uptrend in the afternoon session. Fig. 7 illustrates the 15 min candlestick chart, in which HDFC Bank stock likewise shows a downtrend in the morning session and a slight uptrend in the afternoon session. Fig. 8 illustrates the 5 min candlestick chart, which shows that the Nifty index is in an uptrend throughout the session. Fig. 9 illustrates the 10 min candlestick chart, in which the Nifty index is in a downtrend in the morning session and a slight uptrend in the afternoon session. Fig. 10 illustrates the 15 min candlestick chart, in which the Nifty index likewise shows a downtrend in the morning session and a slight uptrend in the afternoon session.
Table II reports the computational runtime (ms) of stock trend feature extraction using the proposed approach on large datasets; it shows that the proposed feature extraction procedure has a lower runtime than the conventional approaches. Table III reports the proposed multi-stock feature selection measures on the input data; it shows that the proposed multi-stock feature selection filters better than the conventional feature selection measures. From Table VI, the proposed framework achieves a better F1-measure than the conventional approaches. Fig. 11 shows the recall of multi-stock trend classification using the proposed learning framework on large datasets, where the proposed framework achieves better recall than the conventional frameworks. Table VII reports the precision of multi-stock trend classification using the proposed framework on large datasets, where the proposed framework achieves better precision than the conventional approaches. Fig. 12 shows the accuracy of stock trend classification using the proposed multi-stock trend prediction framework on large datasets, where the proposed framework achieves better accuracy than the conventional frameworks.
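The statistical measures reported above all follow from the four cells of the confusion matrix in Table I. A small Java sketch, with a hypothetical class name and returning 0 when a denominator is zero (an assumed convention), is:

```java
/** Confusion-matrix metrics (illustrative sketch). */
final class ConfusionMetrics {
    final long tp, fp, tn, fn;

    ConfusionMetrics(long tp, long fp, long tn, long fn) {
        this.tp = tp;
        this.fp = fp;
        this.tn = tn;
        this.fn = fn;
    }

    /** (TP + TN) / (TP + TN + FP + FN) */
    double accuracy() { return ratio(tp + tn, tp + tn + fp + fn); }

    /** TP / (TP + FP) */
    double precision() { return ratio(tp, tp + fp); }

    /** TP / (TP + FN), also called the true positive rate. */
    double recall() { return ratio(tp, tp + fn); }

    /** FP / (FP + TN) */
    double falsePositiveRate() { return ratio(fp, fp + tn); }

    /** Harmonic mean of precision and recall. */
    double f1() {
        double p = precision(), r = recall();
        return p + r == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }

    private static double ratio(long num, long den) {
        return den == 0 ? 0.0 : (double) num / den;
    }
}
```

For instance, with TP = 40, FP = 10, TN = 45 and FN = 5, accuracy is 0.85, precision is 0.8 and F1 is 80/95, about 0.842.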

V. INFERENCE
Precision, recall, F-measure, accuracy and runtime all improve owing to the data filtering and feature selection in the proposed model, as the results in the tables above show. The results also indicate that the proposed feature extraction and scoring approach optimizes the sentiment scoring of social media comments together with their technical data. The proposed stock feature requires less runtime and is more efficient on real-time stock market databases than traditional feature extraction measures. Comparing the traditional classifiers with the proposed non-linear classifiers shows that the non-linear classifiers perform better in terms of recall, precision, accuracy and runtime (ms). The proposed model improves accuracy by 12% compared with traditional stock market prediction classifiers.

VI. CONCLUSIONS
In this paper, a hybrid real-time multi-stock trend prediction model is designed and implemented on stock market data. Because most conventional single-stock trend prediction models depend on data size and a limited feature space, it is difficult to find a novel feature selection measure on stock technical and news data, and these models also ignore temporal features in stock trend prediction. In this work, an advanced time-based multi-stock trend prediction model is developed on real-time data, in which a new technical stock feature selection indicator and sentiment scores are computed for the clustering method. Finally, a cluster-based non-linear regression framework for periodic multi-stock trend prediction is applied to real-time stock market data. Experimental results show that the proposed model is more efficient than the traditional technical indicators in terms of accuracy, F-measure, precision and recall. In particular, the proposed stock market trend prediction model achieves a 9% improvement in runtime (ms) and a 12% improvement in average classification accuracy over traditional trend prediction models on the training and test datasets.