SVM Optimization for Sentiment Analysis

Exponential growth in mobile technology and mini computing devices has led to a massive increase in social media users, who continuously post their views and comments about the products and services they use. These views and comments can be extremely beneficial for companies that want to know public opinion about the products or services they offer. Otherwise, such public opinion has to be obtained via questionnaires and surveys, which is undoubtedly a difficult and complex task. Companies can therefore use the valuable information in comments and posts from micro-blogging sites to eliminate flaws and to improve their products or services according to customer needs. However, manually extracting a general opinion from a staggering number of users' comments is not feasible. One solution is to use an automatic method for sentiment mining. Support Vector Machine (SVM) is one of the widely used classification techniques for polarity detection in textual data. This study proposes a technique to tune SVM performance for sentiment analysis using the grid search method. Three datasets are used in the experiments, and the performance of the proposed technique is evaluated using three information retrieval metrics: precision, recall and F-measure.

Keywords: Sentiment analysis; polarity detection; machine learning technique; support vector machine (SVM); optimized SVM; grid search technique


I. INTRODUCTION
To compete in the market, organizations and companies need to be aware of consumers' opinions about the products and services they provide. Nowadays, micro-blogging websites are rich sources of textual data, including opinions and reviews about products, services, brands, movies, music, news and politics. These sources can be used to extract public opinion about almost anything, e.g., from comments on a particular product or brand to views about an upcoming election. In the past, this type of information was obtained through telephone and door-to-door surveys, which were time-consuming and complex. The need for automatic techniques to extract opinions from textual data therefore brought researchers to this domain. Today, there are three fundamental approaches to extracting opinion from text: lexicon-based [1], machine learning based [2], and a hybrid of both [3]. The lexicon-based approach relies on a dictionary of weighted words to determine the polarity of a given text. It does not require any prior training, as every aspect is pre-defined and pre-programmed. The machine learning approach can be further categorized into supervised, unsupervised and semi-supervised techniques. A supervised technique must be trained before it can perform classification: pre-classified/pre-labeled data (training data) is used for training, after which the classifier can classify the real input data (test data) [2], [4], [5]. The hybrid approach integrates both approaches (lexicon-based and machine learning) and usually produces more accurate results. SVM is one of the most widely used supervised machine learning algorithms for sentiment analysis and has proved highly effective in categorizing traditional texts. This research aims to tune the performance of SVM for sentiment analysis using the grid search technique.
This research uses the Optimized Sentiment Analysis Framework (OSAF), an extension of the Sentiment Analysis Framework (SAF) [5]. OSAF consists of four phases: Dataset, Pre-processing, Classification and Results. Tuning is performed in the classification phase. Three datasets were used in this research, two from Twitter and one from IMDB. In the proposed technique, grid search repeatedly adjusts the SVM's parameters, compares the accuracy of the results with the pre-labeled data, and chooses the parameters that give the best result. Cross-validation, on the other hand, takes the accuracy up a notch by varying the ratio of testing data to training data within a pre-defined range until the highest possible accuracy is obtained. To evaluate the performance of the proposed technique, three information retrieval metrics are used: precision, recall and F-measure.
The remainder of the paper is organized as follows. Section II discusses related work in this domain. Section III elaborates the materials and methods used in this research. Section IV presents the results and discussion. Finally, Section V concludes the paper.

II. RELATED WORK
Many researchers have worked on extracting sentiment from textual data using data mining techniques. Some selected studies are discussed here. In [6], J48 and MLP were used to classify five datasets. Performance was evaluated in terms of TP rate, FP rate, precision, recall, F-measure and ROC area. According to the results, MLP performed better on every dataset. The neural network also showed better learning capability and was suggested by the authors as a good alternative for classification. The authors of [7] presented an application for Arabic sentiment analysis of Twitter data. Two machine learning techniques, NB and SVM, were used to analyze 1000 tweets for polarity detection. A feature vector approach was used with the machine learning classifiers to improve accuracy. The researchers also identified some problem areas regarding the training data, such as multiple occurrences of tweets, opinion spamming and tweets expressing dual opinions, and noted that such issues can compromise accuracy. The authors of [4] performed sentiment classification of Arabic tweets using three machine learning algorithms: Naïve Bayes, Decision Tree and Support Vector Machine. They used a classification framework that includes Term Frequency-Inverse Document Frequency (TF-IDF) and Arabic stemming as sub-tasks. One dataset was used for all three techniques to evaluate performance, which was measured in terms of three information retrieval metrics: precision, recall and F-measure. The authors of [8] presented a feature vector technique that divides feature extraction into two steps. First, Twitter-specific features are extracted and added to the feature vector. These features are then removed from the tweets, and feature extraction is performed again as for normal text; the features extracted in this second step are also added to the feature vector.
The accuracy of the proposed feature vector technique was the same for the Naïve Bayes, SVM, Maximum Entropy and Ensemble classifiers. In [9], students' academic performance was analyzed and predicted using three data mining techniques: Decision Tree (C4.5), Multilayer Perceptron and Naïve Bayes. These techniques were applied to student data from two undergraduate courses over two semesters. According to the results, Naïve Bayes achieved a prediction accuracy of 86%, higher than MLP and Decision Tree. This type of prediction can help teachers identify, at an early stage, the students who are expected to receive an F grade; with the teacher's special attention to those students, their academic performance can ultimately be improved. In [10], the authors predicted rainfall in Malaysia using five classification algorithms: Naïve Bayes, Decision Tree, Support Vector Machine, Neural Network and Random Forest. A comparative analysis was performed to identify the technique(s) that give good results with little training data. The results showed that Decision Tree and Random Forest both have the potential to achieve a higher F-measure when trained with a smaller amount of training data. Support Vector Machine and Naïve Bayes achieved a lower F-measure when trained with little data, while Neural Network performed poorly even when trained with a large amount of training data. In [11], the authors performed a comparative analysis of various data mining techniques on practitioners' decisions regarding a potential therapy change for 40 patients with posterior cancer. J48, MLP, Naïve Bayes, Radial Basis Function and K-Nearest Neighbor were used for classification. According to the results, Radial Basis Function performed well, achieving high accuracy and Kappa score with a low error rate. In [5], the performance of SVM was analyzed for polarity classification of textual data.
The authors proposed the Sentiment Analysis Framework (SAF), which consists of four phases: Dataset, Pre-processing, Classification and Results. Three datasets were used for the experiment, two from Twitter and one from IMDB reviews. The performance of SVM was evaluated on each dataset using three different ratios of training to test data: 70:30, 50:50 and 30:70. Precision, recall and F-measure were used for performance evaluation. In [12], the authors used SVM for sentiment analysis of Twitter data. In this experiment, the default parameters were selected in WEKA along with 10-fold cross-validation. The performance of SVM was analyzed on two datasets of pre-labeled tweets and evaluated in terms of precision, recall and F-measure. In [13], the authors conducted a systematic literature review (SLR) on sentiment analysis. The SLR focused on recent research on the use of SVM for sentiment analysis and considered research papers published from 2012 to 2017. A total of 901 articles were collected and, following a thorough systematic approach, 8 research articles were selected for critical review. The research objectives were formulated as research questions, and these questions were answered during the critical review.

III. MATERIALS AND METHODS
This research aims to optimize the performance of SVM for sentiment analysis using the grid search technique. To analyze the performance of the proposed technique, its results are compared with the labels of three pre-labeled datasets, two from Twitter [14], [15] and one from IMDB reviews [16]. This research uses an Optimized Sentiment Analysis Framework (OSAF) (Fig. 1), which is an extension of the Sentiment Analysis Framework (SAF) [5]. The proposed framework consists of four phases: Dataset, Pre-processing, Classification and Results. The Dataset phase deals with loading the data on which classification is performed into the WEKA (Waikato Environment for Knowledge Analysis) environment. The Pre-processing phase converts strings into vectors, which is a prerequisite for processing by SVM. This phase has five further steps: 1) Term Frequency-Inverse Document Frequency (TF-IDF), 2) Stemming, 3) Stop Words, 4) Tokenizing and 5) Words to Keep. The Classification phase runs SVM in WEKA using the grid search technique with K-fold cross-validation. The Results phase produces the results in the form of tables and graphs. For performance evaluation, three information retrieval metrics are used: precision, recall and F-measure.

A. WEKA
This study used WEKA to implement the SVM grid search technique. WEKA is one of the most widely used data mining software packages; it was developed in Java at the University of Waikato, New Zealand [17]. The main reason for its wide acceptance is its easy-to-use graphical user interface (GUI) for functionalities such as data analysis, classification, predictive modeling and visualization. Further advantages of this software include its GNU General Public License and its portability.

B. Data Sets
Three datasets are used in this research (Table I). The first dataset [14] consists of tweets related to four topics: "Apple", "Google", "Microsoft" and "Twitter". It contains 571 positive, 519 negative, 2331 neutral and 1689 irrelevant tweets. In the second dataset [15], tweets are related to U.S. airlines and are categorized as 2362 "positive", 9178 "negative" and 3099 "neutral". The third dataset [16] was taken from Internet Movie Database (IMDB) reviews and contains 1000 positive and 1000 negative reviews. The Dataset phase deals with arranging the relevant data and transforming it into CSV/ARFF format for use in the WEKA Workbench [17]. The Simple CLI can be used to convert text files into ARFF format using the "weka.core.converters.TextDirectoryLoader" class.
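For readers preparing data outside WEKA, the following Python sketch shows roughly what a converted file looks like: one string attribute holding the text and one nominal class attribute. It is an illustrative assumption, not WEKA's TextDirectoryLoader (which reads one subdirectory per class); the helper names, the attribute names `text` and `class`, and the relation name are hypothetical, and class labels are assumed to be simple words that need no quoting.

```python
def escape_arff_string(text):
    """Single-quote a value for ARFF, backslash-escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'").replace("\n", "\\n")
    return "'" + escaped + "'"


def to_arff(docs, relation="sentiment"):
    """Build ARFF text from (text, label) pairs: one string attribute plus a nominal class."""
    labels = sorted({label for _, label in docs})
    lines = [
        f"@relation {relation}",
        "",
        "@attribute text string",
        "@attribute class {" + ",".join(labels) + "}",
        "",
        "@data",
    ]
    for text, label in docs:
        lines.append(escape_arff_string(text) + "," + label)
    return "\n".join(lines) + "\n"
```

Saving the returned string with an `.arff` extension gives a file whose string attribute could then, under these assumptions, be vectorized by a filter such as WEKA's StringToWordVector.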

C. Pre-Processing
This is the most important phase of the framework. Its purpose is to normalize the data by converting strings into vectors for the classification process. The following sub-tasks are performed in this phase.

1) Term Frequency-Inverse Document Frequency (TF-IDF)
TF-IDF provides important information in the pre-processing phase by evaluating the frequency of useful words, which makes the sentiment detection process easier. The frequency of these terms plays an important role in identifying important information, as explained in [18]. For example, frequently appearing words in a text document might be "Good", "Bad", "Happy" or "Sad". Identifying these words and their frequencies can play a vital role in opinion mining. Term Frequency (TF) is the number of occurrences of a term in a given document and can be calculated with the following equation:

$$TF(t,d) = TD_{t,d} \tag{1}$$

where $TD_{t,d}$ is the frequency of term $t$ in document $d$. TF-IDF also includes the inverse document frequency (IDF), which assigns higher weight to rare terms and lower weight to common terms, as explained in [19]. IDF can be calculated with the following equation:

$$IDF(t) = \log\frac{N}{n_t} \tag{2}$$

where $N$ is the number of documents and $n_t$ is the number of documents that contain term $t$. When both the TF and IDF parameters are set to true, the weights are calculated using the following equation:

$$TF\text{-}IDF(t,d) = TF(t,d) \times IDF(t) \tag{3}$$

In WEKA, the TF and IDF transformations are available along with other filters.
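The TF, IDF and TF-IDF computations described above can be sketched in a few lines of Python. This is a minimal illustration assuming documents are already tokenized into lists of words; the function names are hypothetical, the natural logarithm is an arbitrary choice of base, and WEKA's StringToWordVector applies its own TF and IDF transforms (its documented TF transform uses a log-scaled count), so absolute weights will differ.

```python
import math


def tf(term, doc):
    """TF(t, d): raw number of occurrences of `term` in the tokenized document."""
    return doc.count(term)


def idf(term, docs):
    """IDF(t) = log(N / n_t), where n_t is the number of documents containing `term`.

    Returns 0.0 for a term that occurs in no document, to avoid division by zero.
    """
    n_t = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / n_t) if n_t else 0.0


def tf_idf(term, doc, docs):
    """TF-IDF(t, d) = TF(t, d) * IDF(t)."""
    return tf(term, doc) * idf(term, docs)
```

For instance, a word such as "movie" that appears in every review of a corpus gets IDF = log(1) = 0, so it contributes nothing regardless of how often it occurs, while a rarer word such as "bad" keeps a positive weight.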

2) Stemming
Stemming is immensely useful in many areas of computational linguistics and information retrieval, as it reduces all words with the same stem/base to a common form [20]; for example, the word 'working' is stemmed into 'work'. Word stemming is one of the essential pre-processing steps in text mining [21]-[25]. In this study, "IterativeLovinsStemmer" is selected in WEKA as the word stemmer for the pre-processing phase. It is based on the Lovins algorithm, the first stemming algorithm, published by J. B. Lovins in 1968 [26].
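To illustrate the idea of suffix stripping, the toy stemmer below removes the longest matching ending from a short, hypothetical suffix list, and an iterative wrapper re-applies it until the word stops changing, which is the idea behind an iterated stemmer such as IterativeLovinsStemmer. It is not the Lovins algorithm, which relies on a much larger list of endings guarded by context conditions and followed by recoding rules.

```python
# Hypothetical, deliberately tiny suffix list (not Lovins' list of endings).
SUFFIXES = ["ations", "ation", "ness", "ing", "ed", "ly", "s"]


def toy_stem(word, min_stem=3):
    """Strip the longest listed suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def iterative_stem(word):
    """Re-apply toy_stem until the word no longer changes."""
    previous, current = None, word.lower()
    while current != previous:
        previous, current = current, toy_stem(current)
    return current
```

A single pass turns "workings" into "working"; the iterative wrapper continues to "work", matching the 'working' to 'work' example above.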

3) MultiStopWords
The concept of stop words was originally introduced in [27]. Stop words are common, high-frequency words such as "a", "the", "of", "and" and "an". They carry little information useful for classification, so they are removed. Several methods are available for stop word removal, as explained in [20], [21], [23], [28], [29]. In this research, "MultiStopwords" was selected as the stop words criterion for the pre-processing phase in WEKA.
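A minimal sketch of stop word removal over an already tokenized text is shown below. The stop list is a small hypothetical example built from the words mentioned above; WEKA's MultiStopwords handler is configured differently (to our understanding it combines several stopword handlers), so this only illustrates the filtering step itself.

```python
# Hypothetical stop list taken from the examples in the text.
STOP_WORDS = frozenset({"a", "an", "the", "of", "and"})


def remove_stop_words(tokens):
    """Drop tokens whose lowercase form is in STOP_WORDS, preserving order."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]
```

Matching on the lowercase form means "The" and "the" are both removed, while content words keep their original spelling.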

4) N-GramTokenizer
"NGramTokenizer" was selected as the tokenizer in WEKA for pre-processing the data. It first splits the text into words wherever one of the specified delimiter characters occurs, and then emits word n-grams, i.e., sequences of consecutive words whose length lies within the specified range.
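The two steps described above (split on delimiters, then emit word n-grams) can be sketched as follows. The delimiter set and the default n-gram range of 1 to 3 are assumptions chosen for illustration, and WEKA's tokenizer may emit the n-grams in a different order.

```python
import re

# Assumed delimiter set: whitespace and common punctuation.
DELIMITERS = r"[ \r\n\t.,;:'\"()?!]+"


def ngram_tokenize(text, min_n=1, max_n=3, delimiters=DELIMITERS):
    """Split `text` into words, then emit every word n-gram with min_n <= n <= max_n."""
    words = [w for w in re.split(delimiters, text) if w]
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of n consecutive words across the text.
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams
```

Bigrams matter for sentiment because phrases such as "not good" reverse the polarity of their individual words, which unigrams alone cannot capture.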

5) WordstoKeep
The "wordsToKeep" parameter was set to 1000, limiting the number of words retained so that results could be obtained within a limited amount of time. After these parameters were applied, pre-processing was performed on all three datasets, and the processed datasets were then forwarded to the classifier.
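The effect of a words-to-keep limit can be approximated by keeping only the k most frequent tokens in the corpus, as in the sketch below. This is a simplification under stated assumptions: the function names are hypothetical, and WEKA applies the limit per class when a class attribute is set, which this sketch ignores.

```python
from collections import Counter


def words_to_keep(tokenized_docs, k=1000):
    """Return the k most frequent tokens across all documents."""
    counts = Counter(token for doc in tokenized_docs for token in doc)
    return {word for word, _ in counts.most_common(k)}


def filter_vocab(doc, vocab):
    """Keep only the tokens of `doc` that are in the retained vocabulary."""
    return [token for token in doc if token in vocab]
```

Capping the vocabulary bounds the width of the feature vectors passed to the SVM, which is what keeps training time manageable on the larger Twitter datasets.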

D. Classification
In the supervised machine learning approach, the algorithm is first trained with pre-classified data (training data), from which it builds classification rules; it then classifies the input data (test data) on the basis of these rules. To analyze the performance of any supervised machine learning algorithm, pre-classified data is provided as test data and the algorithm's output is compared with these pre-assigned labels. The same strategy is used in this study to analyze the performance of the proposed grid search technique. The pre-labeled datasets were obtained from social forums: Twitter and IMDB. For classification, Support Vector Machine (SVM) with grid search and K-fold cross-validation is used. Grid search is a method for hyperparameter optimization, and hyperparameter tuning is an important task in SVM for obtaining more accurate results [30]-[33]. In grid search, models with different parameter values are trained and then evaluated using cross-validation. An SVM with an RBF kernel has two parameters: C and γ. It cannot be known in advance which values of C and γ are best suited to a given problem, so an optimization procedure is required to identify the pair of values that achieves maximum accuracy. 10-fold cross-validation is performed for each (C, γ) pair, and the pair with the best results is selected. Cross-validation is a method for testing multiple models under a particular classifier using subsets of the input data, as explained in [35]. In K-fold cross-validation, the training data is first divided into k subsets of equal size. Each subset in turn is tested using the classifier trained on the remaining k-1 subsets. The cross-validation procedure can help prevent overfitting [34], [36]; a binary classification problem illustrating this issue is shown in Fig. 2. Filled circles and triangles are the training data, while hollow circles and triangles are the testing data.
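The grid search with k-fold cross-validation described above can be sketched in plain Python as follows. The SVM training step itself is deliberately left abstract: `train_and_score` is a hypothetical callable that would train an RBF-kernel SVM with the given C and γ on the training folds and return its accuracy on the held-out fold (in this study that role is played by WEKA). The exponentially spaced grid (C from 2^-5 to 2^15, γ from 2^-15 to 2^3) follows a common LIBSVM recommendation and is an assumption, not necessarily the grid used here. Folds are contiguous, so the data are assumed to be shuffled beforehand, whereas WEKA also randomizes and stratifies its folds.

```python
import math
from itertools import product

# Assumed search grid (exponentially spaced values).
C_GRID = [2.0 ** e for e in range(-5, 16, 2)]       # 2^-5, 2^-3, ..., 2^15
GAMMA_GRID = [2.0 ** e for e in range(-15, 4, 2)]   # 2^-15, 2^-13, ..., 2^3


def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(X, y, k, train_and_score, C, gamma):
    """Mean held-out accuracy over k folds for a single (C, gamma) pair."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Train on the other k-1 folds, score on the held-out fold.
        scores.append(train_and_score(
            [X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx],
            C, gamma,
        ))
    return sum(scores) / len(scores)


def grid_search(X, y, C_values, gamma_values, train_and_score, k=10):
    """Return (best mean accuracy, C, gamma) over every pair in the grid."""
    best = None
    for C, gamma in product(C_values, gamma_values):
        score = cross_val_accuracy(X, y, k, train_and_score, C, gamma)
        if best is None or score > best[0]:
            best = (score, C, gamma)
    return best
```

Calling `grid_search(X, y, C_GRID, GAMMA_GRID, train_and_score, k=10)` evaluates all 11 × 10 pairs with 10-fold cross-validation and returns the best one. The kernel function is included only to show the role of γ: larger values make the similarity between two points decay faster with distance, while C controls how heavily training errors are penalized.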
The testing accuracy of the classifiers in Fig. 2(a) and (b) is poor because they overfit the training data. In contrast, the classifiers in Fig. 2(c) and (d) do not overfit the training data and give better accuracy under cross-validation.

Fig. 2. Over-fitting with cross validation [36].

IV. RESULTS AND DISCUSSION
The performance of the proposed technique is evaluated on all three selected datasets using three information retrieval metrics: precision, recall and F-measure.
For each class, precision is calculated from the numbers of true positives (TP) and false positives (FP):

$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$

TP is the number of sentences correctly assigned to the class, whereas FP is the number of sentences wrongly assigned to it.

Recall is calculated as follows:

$$\text{Recall} = \frac{TP}{TP + FN} \tag{5}$$

FN is the number of sentences that belong to the class but were not assigned to it, and TP is as defined above.

F-measure, the harmonic mean of precision and recall, is computed as follows:

$$F\text{-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{6}$$
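The precision, recall and F-measure definitions can be checked with the short Python sketch below, which computes the per-class scores from lists of true and predicted labels, plus an average weighted by each class's share of the data. The weighting is an assumption about how the reported averages were formed, and the function names are hypothetical.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F-measure for one class, treating `label` as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # correctly assigned
    fp = sum(1 for t, p in pairs if t != label and p == label)  # wrongly assigned
    fn = sum(1 for t, p in pairs if t == label and p != label)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def weighted_average(y_true, y_pred):
    """Average the per-class metrics, weighting each class by its share of y_true."""
    n = len(y_true)
    totals = [0.0, 0.0, 0.0]
    for label in set(y_true):
        weight = y_true.count(label) / n
        for i, value in enumerate(per_class_metrics(y_true, y_pred, label)):
            totals[i] += weight * value
    return tuple(totals)
```

Because the F-measure is a harmonic mean, it penalizes imbalance: with precision 1.0 and recall 0.5 it is 2/3 ≈ 0.667, below the arithmetic mean of 0.75.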

A. Results with the First Dataset
The first dataset is taken from [14] and consists of tweets about four topics: "Apple", "Google", "Microsoft" and "Twitter". According to the results, the average precision, recall and F-measure scores are 0.745, 0.752 and 0.747, respectively.
Based on the F-measure score for each class, the proposed technique performed well for the irrelevant class, with a score of 0.87. Detailed results are given in Table II and shown graphically in Fig. 3.

B. Results with the Second Dataset
The second dataset is taken from [15] and consists of tweets about U.S. airlines. Detailed results are given in Table III and Fig. 4. The highest F-measure, 0.868, is reported for the negative class.

C. Results with the Third Dataset
The third dataset is taken from [16] and consists of IMDB reviews. The results show average precision, recall and F-measure values of 0.841, 0.841 and 0.841, respectively. The highest F-measure, 0.843, is reported for the negative class (Table IV, Fig. 5).

D. Performance Evaluation of Grid Search SVM
To evaluate the performance of the grid search technique, the F-measure scores of this research are compared with the F-measure scores achieved with the 70:30 ratio in [5]. That study proposed the Sentiment Analysis Framework (SAF) and used SVM with various proportions of training and test data. It is an ideal benchmark for comparison because both studies use the same datasets, and OSAF, used in this research, is an extension of SAF. The F-measure score is used for comparison because it is the harmonic mean of precision and recall and thus reflects both. The F-measure scores of both studies are compared in Table V and represented graphically in Fig. 6. The grid search technique explored in this research improved the performance for every class of all three datasets.

V. CONCLUSION
In the proposed technique, grid search keeps adjusting the SVM parameters until the best accuracy is achieved for the given dataset. This characteristic helps to tune the performance of SVM in classification. Cross-validation, on the other hand, keeps changing the ratio of training and test data until it finds the proportion that further maximizes accuracy. Finally, the performance of the explored technique was evaluated by comparing its F-measure scores with published research that used SVM on the same datasets. The grid search technique was observed to perform well, and applying it to further datasets is suggested to explore its accuracy.