Techniques for Improving the Labelling Process of Sentiment Analysis in the Saudi Stock Market

Sentiment analysis is used to assess users’ feedback and comments. Recently, researchers have shown increased interest in this topic owing to the spread and expansion of social networks. Users’ feedback and comments are written in unstructured formats, usually in informal language, which presents challenges for sentiment analysis. For the Arabic language, further challenges arise from the complexity of the language and the lack of an available sentiment lexicon. As a result, labelling carried out by hand can lead to mislabelling and misclassification. Consequently, inaccurate classification creates the need to construct a relabelling process for Arabic documents to remove noise in labelling. The aim of this study is to improve the labelling process of sentiment analysis. Two approaches were used. First, a neutral class was added to create a framework of reliable Twitter tweets with positive, negative, or neutral sentiments. The second approach was to improve the labelling process by relabelling. In this study, the relabelling process was applied to only seven random features (positive or negative): “earnings” (ارباح), “losses” (خسائر), “green colour” (باللون_الاخضر), “growing” (زياده), “distribution” (توزيع), “decrease” (انخفاض), “financial penalty” (غرامة), and “delay” (تاجيل). Of the 48 tweets documented and examined, 20 tweets were relabelled and the classification error was reduced by 1.34%.

Keywords: Opinion mining; association rule; Arabic language; sentiment analysis; Twitter


I. INTRODUCTION
Classifying sentiment polarity as positive or negative is challenging because polarity is subjective. Opinionated text can also carry informal features of speech, such as sarcasm, subjective language, and emoticons, that make opinion detection harder. This requires an understanding of the text beyond the facts being expressed [1]. In addition, a text might contain both positive and negative keywords, which can make the labelling process unreliable. This occurs frequently in the neutral class, where one tweet might contain both positive and negative keywords. Labelling carried out by hand can therefore lead to mislabelled sentiments. Adding the neutral class gives human annotators more options in the labelling process. However, this might reduce accuracy, since the classifier must cover the vectors of the whole data set, including the neutral training data: the data dictionary becomes larger because it consists of all vectors that belong to the positive, negative, and neutral classes. Moreover, humans still label the data manually, which may introduce mistakes into the labelling process. Consequently, inaccurate classification creates the need to construct a relabelling process for Arabic tweets to remove labelling noise. Removing this noise is the main goal of the relabelling process; it updates the experts' knowledge about labelling, which may lead to better classification. This is necessary because of the high degree of noise in labelled texts, especially for comments that are long and consist of multiple sentences, such as blogs [2].

This paper presents techniques for improving the labelling process of sentiment analysis. Section 2 shows the need to improve the labelling process with the neutral class. Section 3 reviews Arabic sentiment analysis. Section 4 presents the experiment on classification into positive, negative, and neutral classes. Section 5 shows the need to improve the labelling process by relabelling. Section 6 demonstrates the process of relabelling. Section 7 analyses the experimental findings from Saudi stock market data. The final section contains a conclusion and recommendations for further work in this area.

II. IMPROVING THE LABELLING PROCESS WITH THE NEUTRAL CLASS
Researchers tend to ignore the neutral class under the hypothesis that less sentiment can be learned from neutral texts than from positive or negative ones. The neutral class is useful in real-life applications, though, since sentiment is sometimes neutral and excluding it forces instances into the other classes (positive or negative) [3]-[5]. In addition, a text might contain both positive and negative keywords, which can make the labelling process unreliable. This happened regularly in the neutral class, where one tweet might have both positive and negative keywords. Labelling carried out by hand can cause human mislabelling of sentiments. Therefore, adding the neutral class can give humans more flexibility and options in the labelling process. However, this might cause less accurate results, since the data dictionary, which consists of all vectors that belong to the positive, negative, and neutral classes, becomes larger.

III. ARABIC SENTIMENT ANALYSIS
Limited research has been conducted on Arabic sentiment analysis, so the field is still in its early stages [6]. Boudad et al. [7], [8] reviewed the challenges and open issues that need to be explored in more depth to improve Arabic sentiment analysis, finding that these include the domain, the method of sentiment classification, data pre-processing, and the level of sentiment analysis. They show that, in contrast to work on English, work on Arabic sentiment analysis is still in its early stages, and many potential approaches and techniques have not yet been explored. Ibrahim et al. [9] presented a multi-genre tagged corpus of Modern Standard Arabic (MSA) and colloquial language, with a focus on Egyptian dialects. Interestingly, they suggested that NLP supplements that have been applied to other languages, such as English, are not valid for processing Arabic directly. Further, Abdulla et al. [10] explored the polarity of 2,000 collected tweets on various topics, such as politics and art. They used SVM, NB, KNN, and decision trees (D-tree) for sentiment classification of their documents, and showed that SVM and NB are more accurate than the other classifiers in a corpus-based approach: the average accuracy of SVM was 87.2%, while the average accuracy of NB was 81.3%. El-Halees's [11] combined approach classified documents using lexicon-based methods, used these as a training set, and then applied k-nearest neighbour to classify the rest of the documents.

IV. EXPERIMENT-CLASSIFICATION INTO POSITIVE, NEGATIVE, AND NEUTRAL CLASSES
In this paper, Twitter was chosen as the platform for opinion mining in a trading strategy for the Saudi stock market, in order to illustrate the relationship between Saudi tweets (in standard Arabic and Arabian Gulf dialects) and the Saudi stock market index. The tweets' source data was obtained from the Mubasher company website in Saudi Arabia and was extracted from the Saudi Stock Exchange (known as the TASI¹ Index). This experiment adds the neutral class with the N-gram feature. A machine learning approach was used, in which a set of data was labelled as positive, negative, or neutral. The classifiers used to explore the polarity of the data in all classes were Naive Bayes (NB) and SVM. Two weighting schemes, Term Frequency-Inverse Document Frequency (TF-IDF) and Binary Term Occurrence (BTO), were used for all classes (positive, negative, and neutral). Table I compares the classifiers with the neutral class in terms of class accuracy, recall, and precision. It shows that SVM with TF-IDF classified the targeted documents best when the neutral class was added. To sum up the classification experiment, the best accuracy, 83.58%, was achieved by SVM with TF-IDF. Moreover, the best recall and precision were achieved by SVM, with the lowest classification error. The analysis shows similar results for SVM with both schemes and only slight differences between recall and precision. Table II shows the comparison between the classifiers in terms of class accuracy, recall, and precision.
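The two weighting schemes named above can be illustrated with a minimal sketch. It assumes one common TF-IDF variant (term count divided by document length, times log(N/df)); the exact variant used in the study is not stated, and the function names and the toy English corpus are hypothetical stand-ins for the Arabic tweets.

```python
import math
from collections import Counter

def binary_term_occurrence(docs):
    """Return one {term: 1} dict per tokenised document (BTO weighting)."""
    return [{term: 1 for term in set(doc)} for doc in docs]

def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Other variants (smoothing, normalisation) exist and may differ
    from the one used in the study.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weighted

# Hypothetical toy corpus (English stand-ins for the Arabic tweets).
docs = [
    ["profits", "rise", "profits"],
    ["losses", "decline"],
    ["market", "closes", "rise"],
]
bto = binary_term_occurrence(docs)
tfidf = tf_idf(docs)
```

BTO records only whether a term is present, whereas TF-IDF gives more weight to terms that are frequent in a document but rare across the corpus; either vector representation could then be passed to a classifier such as SVM or NB.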
A one-to-one model shows the relationships between the positive, negative, and neutral classes and the TASI. The built model illustrates the sentiment analysis results by showing the positive, negative, and neutral opinions alongside the TASI closing values. Fig. 1 shows the relation between labelling by human operators and the TASI for the Saudi stock market for the positive, negative, and neutral classes between the middle of March 2015 and May 10, 2015. It can clearly be seen that the positive, negative, and neutral classes rise and fall with each other over time; the greatest score for the neutral class occurred on 21/4/2015; the lowest neutral class score occurred on 25/3/2015; and the lowest negative class score occurred on 28/4/2015. Only once did the neutral class fall below the negative class, over a four-day period between 23/3/2015 and 26/3/2015. At that point, the TASI started to fall sharply. The neutral class frequently went above the positive class, but the TASI remained the same. In conclusion, the neutral class mostly rose and fell with the TASI. This indicates that the neutral class is an important consideration in the sentiment analysis process.

V. IMPROVING THE LABELLING PROCESS WITH RELABELLING
High dimensionality in texts makes text pre-processing very significant in text classification problems, including sentiment analysis [12], [13]. The problem grows as the dimensionality becomes higher, for example when the neutral class is added to the classification. In the previous experiment, conducted to improve the labelling process with the neutral class, there was approximately 17% misclassification when SVM was used and approximately 28% misclassification when NB was used to classify the documents. In addition, manual labelling of the documents by humans may have introduced mistakes into the labelling process even when the neutral class was added. Thus, inaccurate classification creates the need to construct a relabelling process for Arabic tweets to remove the noise in the original labelling. Below are the suggested steps for a relabelling process for Arabic sentiment analysis.

The most challenging part of the process is feature selection, since any feature can occur in all classes (positive, negative, and neutral). The main difficulty is to find out how many times the feature occurs in each class. Therefore, a wordlist technique was used to represent the text data as a matrix that shows how frequently each term occurs within the three classes. Next, filtering helps to select the highest-frequency features presented by the wordlist process. Then, in order to understand the structure of the sentences and the sentiments behind them, a visualisation technique was used. This technique was applied to all data to achieve a high level of understanding of both the general structure and the sentiment within the accumulated corpus. In other words, visualising the text shows the correlation between the terms involved in the textual contents in general. However, visualisation shows only the feature with all its related terms in the textual contents, without showing the classes they belong to. Combining the wordlist technique with the visualisation addresses this: the wordlist created during the pre-processing stage supplies the important features and their classes. Overcoming this limitation of visualisation for the important features in the targeted text is essential to the relabelling process. After that, association rules were extracted from the documents that have features occurring in a questionable class. Association rules were generated with regard to the minimum support and minimum confidence thresholds, using the visualisation technique for the features that belong to the questionable class. By following these steps, the documents that have features in the questionable class can be relabelled and the noise of the original labelling reduced.

Following data collection, the second step was to pre-process the data by cleaning up hashtags, duplicate tweets, retweets, URLs, and special characters, preparing the collected data for the labelling process. The next step is for the domain expert to label the cleaned data as positive, negative, or neutral. After that, the relabelling process consists of several steps: the Wordlist process, Filtering, Visualisation, Extract Rules, and Relabelling.
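The cleaning step described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the regular expressions, the "RT @" retweet marker, the decision to drop whole hashtags, and the function names are all assumptions.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\S+")
# Keep word characters (which include Arabic letters) and whitespace only.
SPECIAL_RE = re.compile(r"[^\w\s]")

def clean_tweet(text):
    """Remove URLs, hashtags, and special characters from one tweet."""
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace

def preprocess(tweets):
    """Drop retweets and duplicates, returning cleaned tweets in order."""
    cleaned = []
    seen = set()
    for tweet in tweets:
        if tweet.startswith("RT @"):  # assumed retweet marker
            continue
        text = clean_tweet(tweet)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned

# Hypothetical raw tweets.
raw = [
    "RT @a: hello",
    "Profits rise! http://t.co/x #TASI",
    "Profits rise http://t.co/y",
    "losses :( www.example.com",
]
result = preprocess(raw)
```

Duplicates are detected after cleaning, so two tweets that differ only in a URL or hashtag are treated as the same tweet.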

A. Wordlist Process
Fig. 3 shows the wordlist process. This phase uses the same corpus, classified as positive, negative, or neutral, and the same data pre-processing procedure used in opinion mining for the positive, negative, and neutral classes. The goal of visualising association rules as wordlists is to obtain a data set that contains a row for each word, with attributes for the word itself and for the number of labelled documents in each class in which the word occurs in the training data. In other words, the text data is represented as a matrix showing how frequently each term occurs within the three classes. The key feature in this process was the n-gram, which represents the correlation between the selected feature and other terms, with their frequency of occurrence, for just two nodes within the all-classes data.
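A wordlist of this kind can be sketched as a mapping from each term (unigram or bigram) to the number of documents per class that contain it. The function names, the underscore used to join bigrams, and the toy documents below are illustrative assumptions, not the tool used in the study.

```python
from collections import Counter, defaultdict

def ngrams(tokens, max_n=2):
    """Yield unigrams and bigrams (joined with '_') from a token list."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield "_".join(tokens[i:i + n])

def build_wordlist(labelled_docs, max_n=2):
    """Map each term to a Counter of {class label: number of documents}."""
    wordlist = defaultdict(Counter)
    for tokens, label in labelled_docs:
        # A set, so that a term is counted once per document.
        for term in set(ngrams(tokens, max_n)):
            wordlist[term][label] += 1
    return wordlist

# Hypothetical labelled documents (English stand-ins for Arabic tweets).
labelled = [
    (["distribute", "profits"], "positive"),
    (["no", "distribute", "profits"], "negative"),
    (["profits", "announced"], "neutral"),
    (["distribute", "profits", "today"], "positive"),
]
wordlist = build_wordlist(labelled)
```

Each row of the resulting matrix is one term and each column is one class, which is the representation needed to see whether a positive feature also occurs in the negative or neutral class.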

B. Filtering
Features in the context of opinion mining are the words, phrases, or terms that strongly express the opinion as one of three polarities: positive, negative, or neutral. In other words, features are the keywords that mark the text as positive or negative. Features therefore have a higher impact on the orientation of a text than other words in the same text. Feature selection helps to reduce the dimensionality of a text and so increase classification accuracy. Features in a text are considered explicit or implicit: a feature that appears in the text is explicit, whereas a feature that does not appear but is implied is implicit [14]. In the proposed process, only explicit features were considered.
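As a rough illustration of filtering, the sketch below keeps the k terms with the highest total document count from a wordlist. The function name, the tie-breaking rule, and the "delay" count are assumptions; the other counts are the class frequencies reported in this paper for the phrases “distribute profits” and “green colour”.

```python
def top_features(wordlist, k):
    """Return the k terms with the highest total document count.

    Ties are broken alphabetically so that the result is deterministic.
    """
    totals = {term: sum(per_class.values()) for term, per_class in wordlist.items()}
    return sorted(totals, key=lambda term: (-totals[term], term))[:k]

# Per-class document counts for three terms.
wordlist = {
    "distribute_profits": {"positive": 66, "negative": 10, "neutral": 3},
    "green_colour": {"positive": 8, "negative": 7},
    "delay": {"negative": 2},  # hypothetical count
}
selected = top_features(wordlist, 2)
```

Only explicit features can be selected this way, since an implicit feature never appears as a term in the wordlist.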

C. Visualization
Visualising text helps to understand the structure of the sentences and the sentiments behind them. It shows the correlation between terms in the textual contents. The first step of the visualisation technique is to produce the important features created by the wordlist process. One of the features that appear in the dictionary created by the wordlist is then selected. Features were selected randomly to cover high-, average-, and low-frequency features, in order to prove the concept of investigating labelling noise. The next step is to visualise the selected feature in the all-classes data as a wordlist representation. The wordlist shows how frequently the selected feature occurs in the positive, negative, and neutral classes. If the selected feature is a positive sentiment and occurs in other classes, such as neutral or negative, then those classes are considered questionable classes. In other words, a feature from the positive list should occur only in the positive class; any other class in which it occurs is considered a questionable class. A strong positive keyword should cause the text to be classified as positive unless there is a negation. Likewise, it should not occur in the neutral class unless other words in the text affect the sentiment. Features that occur in a questionable class therefore need further investigation to confirm the correctness of the labelling.

Fig. 4 illustrates the process of visualising association rules. This phase used the same corpus, classified as positive, negative, or neutral, and the same data pre-processing procedure used in the opinion mining process. FP-Growth was then used to discover frequent items with regard to the minimum support and minimum confidence thresholds. Then, association rules were generated to expose the relationships between seemingly unrelated data. The output of the visualisation is the association of the high-frequency terms correlated with the selected feature presented previously by the wordlist process.
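The questionable-class test described above reduces to a simple check on the wordlist counts. The sketch below is illustrative: the function name is hypothetical, while the counts are the ones reported in this paper for the phrase “distribute profits” and for the phrase listed in Table VIII for the feature “rise”.

```python
def questionable_classes(class_counts, expected_class):
    """Return the classes, other than the expected one, in which the
    feature occurs at least once, sorted by name."""
    return sorted(
        label
        for label, count in class_counts.items()
        if label != expected_class and count > 0
    )

# Counts reported in the study for two positive phrases.
distribute_profits = {"positive": 66, "negative": 10, "neutral": 3}
rise_phrase = {"positive": 23, "negative": 7, "neutral": 0}

flagged_earnings = questionable_classes(distribute_profits, "positive")
flagged_rise = questionable_classes(rise_phrase, "positive")
```

For the first phrase both the negative and neutral classes are flagged, so two scenarios have to be investigated; for the second only the negative class is flagged.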

D. Association Rules
The purpose of association rule mining is to extract interesting correlations, frequent patterns, associations, or causal structures between sets of items in transaction databases [15]. Association rule mining is divided into two steps. First, frequent patterns are mined with respect to the minimum support threshold. Second, association rules are built according to the minimum confidence threshold [16]. Some terms or words appear with high frequency in the dataset, while others rarely occur. In this case, the value of the minimum support controls rule discovery. For instance, if the minimum support is set at a high value, rules that occur infrequently will not be found, so rules with high confidence but very little support might be ignored [17], [18]. Conversely, if the minimum support is set at a low value, rules that occur frequently will still be found.

Text mining is defined as knowledge discovery from textual databases. Rules are created by analysing data for frequent if/then patterns. These patterns are mined using methods such as the Apriori algorithm and the FP-Growth algorithm [19], [20]. For this study, the FP-Growth method was used to discover the frequent itemsets in the targeted documents [21], [22], since its main advantages are that it passes over the data set only twice, generates no candidates, and compresses the data set [23].
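To make the roles of the two thresholds concrete, the sketch below computes support and confidence for rules between pairs of terms. It deliberately uses brute-force enumeration of pairs rather than FP-Growth, which the study used; it finds the same two-item rules on small data but does not scale. The function names and toy transactions are hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def pair_rules(transactions, min_support, min_confidence):
    """Return (premise, conclusion, support, confidence) for item pairs.

    Brute force over all pairs; a stand-in for FP-Growth on toy data.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # step 1: discard infrequent itemsets
        for premise, conclusion in ((a, b), (b, a)):
            # step 2: confidence = support(pair) / support(premise)
            confidence = pair_support / support({premise}, transactions)
            if confidence >= min_confidence:
                rules.append((premise, conclusion, pair_support, confidence))
    return rules

# Hypothetical tokenised documents, one set of terms per document.
transactions = [
    {"profits", "distribute"},
    {"profits", "distribute", "no"},
    {"profits", "rise"},
    {"losses"},
]
rules = pair_rules(transactions, min_support=0.5, min_confidence=1.0)
```

Lowering `min_support` in this example to 0.25 would also admit rules built on terms that occur in a single document, which is the trade-off discussed above.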

E. Extract Rules
In this phase, association rules are extracted from the collection of documents based on the features created by the wordlist. Association rules were generated with regard to the minimum support and minimum confidence thresholds using the previous visualisation process; the only difference is that the data used here are those belonging to the questionable class. This step focuses on extracting the rule that occurred less frequently in the questionable class within a specific document.

F. Relabelling
In this step, the training data is searched for the feature that occurs less frequently in each questionable class, according to the wordlist matrix. To ensure the reliability of the relabelling, it was applied by the expert to each specific document: a sentiment with suspected labelling noise was sent to the expert as a recommendation to check its labelling.
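The search step can be sketched as a filter that collects the documents to send back to the expert. The names below are hypothetical, and the handling of negation (skipping documents that contain a negation term, since those are plausibly labelled correctly) is an assumption drawn from the worked example in the experiment, not a stated rule.

```python
def contains_phrase(tokens, phrase):
    """True if the token list contains the phrase as consecutive tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def flag_for_review(documents, phrase, questionable, negations=()):
    """Return ids of documents in a questionable class that contain the
    phrase and none of the negation terms."""
    flagged = []
    for doc_id, tokens, label in documents:
        if label not in questionable:
            continue
        if contains_phrase(tokens, phrase) and not any(
            neg in tokens for neg in negations
        ):
            flagged.append(doc_id)
    return flagged

# Hypothetical documents: (id, tokens, original label).
documents = [
    (1, ["company", "will", "distribute", "profits"], "negative"),
    (2, ["no", "distribute", "profits", "this", "year"], "negative"),
    (3, ["distribute", "profits", "approved"], "positive"),
    (4, ["profits", "fall"], "negative"),
]
to_review = flag_for_review(
    documents,
    phrase=["distribute", "profits"],
    questionable={"negative", "neutral"},
    negations={"no"},
)
```

The function only recommends documents; the final decision to relabel stays with the expert.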

VII. EXPERIMENT-RELABELLING
Arabic text classification of Saudi stock market opinions using the SVM algorithm was designed and implemented. The classification error was 16.42%. Therefore, a framework was created for relabelling.
The relabelling process started by representing the text data as a matrix to show how frequently each term occurs within the three classes. It focused on representing the correlation between the selected feature and other terms with their high-frequency occurrence for just two nodes within the all-classes data. Table III shows the feature “earnings” (ارباح) as a positive sentiment in the Saudi stock market domain, together with its occurrence in the positive, negative, and neutral classes. Fig. 5 shows the association rules related to the feature “earnings” (ارباح) in all classes with respect to the minimum support and minimum confidence thresholds. The feature “earnings” (ارباح) entails sharing the profits of a company in the Saudi stock market. The phrase “distribute profits” [توزيع-ارباح] occurred 79 times: three in the neutral class, 66 in the positive class, and 10 in the negative class. Since the phrase “distribute profits” [توزيع-ارباح] occurred in the negative and neutral classes, both become questionable classes. Therefore, the feature “earnings” (ارباح) needs further investigation in order to find the association rules in both classes. As a result, two scenarios are followed. In the first scenario, the association rules that occurred for the feature “earnings” (ارباح) in the neutral class are extracted. Association rules are generated with regard to the minimum support and minimum confidence thresholds using the previous visualisation process. Fig. 6 shows that the feature “earnings” (ارباح) occurred with many rules that appeared in the premises column with the minimum support and minimum confidence values. However, according to the first scenario, the relevant rule here (support: 0.005, confidence: 1) is the one that represents the phrase “distribute profits” [توزيع-ارباح] illustrated in the wordlist matrix in the neutral class.

Fig. 8 shows the correlation rules that can occur with the feature “earnings” (ارباح) in the negative class. This process can identify negation terms, such as عدم, which reverse a positive meaning, and so helps to solve the negation problem in Arabic sentiment analysis.

The second example verifies our experiment with another positive sentiment, “rise” (ارتفاع). Table VII shows the feature “rise” (ارتفاع) as a positive sentiment in the Saudi stock market domain, together with its occurrence in the positive, negative, and neutral classes. Fig. 8 shows the association rules related to the feature “rise” (ارتفاع) in all classes with respect to the minimum support and minimum confidence thresholds. The feature “rise” (ارتفاع) means obtaining a financial advantage or benefit from an investment in a company in the Saudi stock market. In addition, Fig. 9 shows the most important rules for the feature “rise” (ارتفاع), which are “earnings” --> “rise”, [ارباح] --> [ارتفاع] (support: 0.091, confidence: 1), and “percentage” --> “rise”, [نسبه] --> [ارتفاع] (support: 0.013, confidence: 1). The term “earnings” [ارباح] correlated with the term “rise” (ارتفاع) to compose the positive phrase “high profits” in the sentence. Further, the term “percentage” [نسبه] correlated with the feature “rise” (ارتفاع) to compose the positive phrase “high ratio” in the sentence. Since the phrase [ارتفاع-نسبه] occurred in the negative class, the negative class becomes a questionable class. Therefore, the feature “rise” (ارتفاع) in the negative class needs further investigation to find the association rules in the negative class. As a result, the second scenario is followed: the association rules that occurred for the feature “rise” (ارتفاع) in the negative class are extracted. Association rules are generated with regard to the minimum support and minimum confidence thresholds using the previous visualisation process.
The feature “rise” (ارتفاع) occurred with many rules that appeared in the premises column with the minimum support and minimum confidence values. However, according to the second scenario, the rule of interest here is [ارتفاع] --> [ارتفاع-نسبه] (support: 0.015, confidence: 1), which represents the phrase [ارتفاع-نسبه] illustrated in the wordlist matrix in the negative class.

Fig. 11 shows the association rules related to the feature “green” (الاخضر) in all classes with respect to the minimum support and minimum confidence thresholds. The feature “green” (الاخضر) indicates that the Saudi stock market values are closing green. Fig. 11 shows the most important rule for the feature “green” (الاخضر), which is [الاخضر] --> [باللون] (support: 0.006, confidence: 1). The feature “green” (الاخضر) correlated with the term “colour” (اللون) to form the positive phrase “green colour” in the sentence. Since the phrase “green colour” [باللون_الاخضر] occurred in the negative class, the negative class becomes the questionable class. Therefore, the feature “green” (الاخضر) in the negative class needs further investigation in order to find the association rules in the negative class. As a result, the second scenario is followed: the association rules that occurred for the feature “green” (الاخضر) in the negative class are extracted. Association rules are generated with regard to the minimum support and minimum confidence thresholds using the previous visualisation process. The rule [الاخضر] --> [باللون_الاخضر] (support: 0.011, confidence: 1), which represents the phrase “green colour” [باللون_الاخضر], occurred in the negative class. Therefore, the next step is to search for the phrase “green colour” [باللون_الاخضر] in the negative class documents. Table XV shows that the phrase “green colour” [باللون_الاخضر] occurred in seven documents. All seven documents were found to satisfy the rule [الاخضر] --> [باللون_الاخضر], so they were sent again to the expert who labelled them in the first stage. It can be seen from the structure of the seven documents that the term “decline” (تراجع), a negative sentiment, sometimes came before and sometimes after the phrase “green colour” [باللون_الاخضر]; this negative term made the labelling of the seven documents unreliable.

Table XVIII shows the comparison between the results of the original classification and the new classification. The result showed an improvement of 1.34% using SVM with TF-IDF with the new classification. To sum up, the results show that our process can readily classify Arabic tweets. Furthermore, it can handle many antecedent text association rules for the positive class, the negative class, and the neutral class. The analysis shows the importance of the neutral class in sentiment analysis of Arabic documents; adding the neutral class changes the classification accuracy. The results differ because the new vector dictionary for the text data also consists of all the words that belong to the positive and negative classes. The obtained results help to understand the structure of the text and the sentiment behind it. Finally, these efforts are meant to add to the breadth of expert knowledge in this field and to benefit future machine learning methods.

VIII. CONCLUSION
This study presents a relabelling process to enhance classification accuracy and update the expert knowledge in the original labelling. Since human error occurs in labelling data, visualisation of the text can show the importance of the correlation between the terms involved in the structured textual contents. This is especially apparent in the wordlist and the N-gram steps of the pre-processing stage. The relabelling process was applied to only seven random features (positive or negative), namely “earnings” (ارباح), “losses” (خسائر), “green colour” [باللون_الاخضر], “growing” (زياده), “distribution” (توزيع), “decrease” (انخفاض), “financial penalty” (غرامة), and “delay” (تاجيل). Of the 48 tweets documented and examined, 20 tweets were relabelled and the classification error was reduced by 1.34%. The current study should be repeated in other domains such as education.

Fig. 1. The relation between labelling by human operators and the TASI for the Saudi stock market for positive, negative, and neutral classes.

Fig. 2 demonstrates the relabelling process for Arabic sentiment analysis. The process started by collecting the

Fig. 5. Visualisation of the association rules for the feature “earnings” (ارباح).

The next step is to find, from the wordlist representation, the occurrence of the most frequent phrases related to the feature “earnings” (ارباح). Table IV shows that the phrase “distribute profits” [توزيع-ارباح] occurred 79 times: three in the neutral class, 66 in the positive class, and 10 in the negative class.

announces a dividend distribution for the second half of 2014

From Fig. 6, the rule [ارباح] --> [توزيع-ارباح] (support: 0.021, confidence: 1), which represents the phrase “distribute profits”, occurred in the negative class. Therefore, the next step is to search for the phrase “distribute profits” [توزيع-ارباح] in the negative class documents. Table VI shows that the phrase “distribute profits” [توزيع-ارباح] occurred in 10 documents. Only one document was found to satisfy the rule [ارباح] --> [توزيع-ارباح], so it was sent again to the expert who labelled the document in the first stage. It can be seen from the structure of the remaining nine documents that the phrase “distribute profits” (توزيع أرباح) occurred with the negation, [توزيع أرباح] --> [عدم], so the negative class is the right place for this term.

Fig. 7 shows that the feature “earnings” (ارباح) occurred with many rules that appeared in the premises column with the minimum support and minimum confidence values. According to the second scenario, the rule of interest here is [ارباح] --> [توزيع-ارباح] (support: 0.021, confidence: 1), which represents the phrase “distribute profits” [توزيع-ارباح] illustrated in the wordlist matrix in the negative class.

Fig. 7. The correlation rules of the feature “earnings” in the negative class.

Fig. 9. Visualisation of the association rules for the feature “rise” (ارتفاع).

The next step is to find, from the wordlist representation, the occurrence of the most frequent phrases related to the feature “rise” (ارتفاع). Table VIII shows that the phrase “high profits” [ارتفاع-نسبه] occurred 30 times: zero in the neutral class, 23 in the positive class, and seven in the negative class.

The rule of interest is [ارتفاع] --> [ارتفاع-نسبه] (support: 0.015, confidence: 1), which represents the phrase “high profits” [ارتفاع-نسبه] illustrated in the wordlist matrix in the negative class. This rule occurred in the negative class. Therefore, the next step is to search for the phrase “high profits” [ارتفاع-نسبه] in the negative class documents.

Fig. 11. Visualisation of the association rules for the feature “green” (الاخضر).

The next step is to find, from the wordlist representation, the occurrence of the most frequent phrases related to the feature “green” (الاخضر). Table XIV shows that the phrase “green colour” [باللون_الاخضر] occurred 12 times: zero in the neutral class, eight in the positive class, and seven in the negative class.

TABLE I. PRECISION AND RECALL FOR POSITIVE, NEGATIVE, AND NEUTRAL CLASSES USING N-GRAM FEATURE WITH SVM AND NB

1 https://www.tadawul.com.sa

TABLE III. OCCURRENCE OF THE FEATURE “EARNINGS” (ارباح)

TABLE IV. PHRASES FOR THE FEATURE “EARNINGS” (ارباح) IN ALL DATA

TABLE VIII. PHRASES FOR THE FEATURE “RISE” (ارتفاع) IN ALL DATA

Table IX shows the phrase “high

Finally, colour is used for both positive and negative sentiment in this domain. For instance, green indicates a positive sentiment in the stock market domain, while red indicates negative sentiment in the HMI field in computing. Table XIII shows the feature “green” (الاخضر) as a positive sentiment in the Saudi stock market domain, together with its occurrence in the positive, negative, and neutral classes.

TABLE XIII. PHRASES FOR THE FEATURE “GREEN” (الاخضر) IN ALL DATA

TABLE XIV. PHRASES FOR THE FEATURE “GREEN COLOUR” (باللون_الاخضر) IN ALL DATA

TABLE XVII. ALL CLASS PERFORMANCE ACCURACY IN NEW CLASSIFICATION FOR SVM WITH TF-IDF SCHEMA

TABLE XVIII. ALL CLASS PERFORMANCE ACCURACY COMPARISON FOR SVM WITH TF-IDF SCHEMA