Comparison of Accuracy between Long Short-Term Memory-Deep Learning and Multinomial Logistic Regression-Machine Learning in Sentiment Analysis on Twitter

This paper presents a sentiment analysis study on Twitter. Tweets containing the keyword 'Russian Hacking', concerning the 2016 US presidential election, were collected as the dataset through the Twitter API using the Python programming language. The first step of the analysis is cleaning the tweet data; a Lexicon-based method then assigns each tweet a positive, negative, or neutral sentiment value. The cleaned and labeled data are then classified with a deep learning method, the Long Short-Term Memory (LSTM) algorithm, and with a machine learning method, the Multinomial Logistic Regression (MLR) algorithm. The accuracy of both classification methods is calculated with the confusion-matrix method. The LSTM classification method achieves an accuracy of 93% and the MLR classification method 92%, so on this dataset LSTM classifies sentiment better than MLR.
Keywords: Sentiment analysis; deep learning; machine learning; Long Short-Term Memory (LSTM); Multinomial Logistic Regression (MLR)


I. INTRODUCTION
The demand for information technology is increasing in this era, especially for communication. One form of technological progress is social media, and Twitter is becoming a dominant form of social media. Twitter is a microblogging service, a form of blog that limits the size of each post: users write messages, called tweets, that contain at most 140 characters [21], [14]. Twitter users express a wide variety of opinions in their tweets. Because a large number of tweets are shared every second, the collected tweets can be processed and analyzed to find out public opinion about a particular product, service, or topic. Sentiment analysis [1], [3], [19] is a branch of text mining research used to analyze and classify the opinions in a text document. It is the process of extracting, processing, and understanding textual data to obtain the opinion, or the tendency of opinion, that someone holds about an issue or object, namely whether that opinion tends to be positive or negative. Sentiment analysis can be performed with many methods, including machine learning and deep learning methods [4].
For these reasons, the authors conducted this sentiment analysis study, which compares a deep learning classification method using the Long Short-Term Memory (LSTM) algorithm with a machine learning classification method using the Multinomial Logistic Regression (MLR) algorithm. Both methods are applied to data collected from Twitter with the Twitter API and the Python programming language. The topic analyzed is the case of Russian hacking concerning the 2016 US presidential election, and tweets are retrieved by hashtags (#) related to this topic. The accuracy of the LSTM classification method is compared with that of the MLR classification method to determine which algorithm produces a classifier with better accuracy.
The rest of the paper is organized as follows. Section II briefly reviews the literature and related work, Section III presents the research methodology, Section IV describes the implementation and results, and the last section gives the conclusion and future work.

II. LITERATURE REVIEW

A. Sentiment Analysis
Sentiment analysis is a natural language processing task that tracks people's moods about particular products or topics. Sentiment analysis, also called opinion mining [17], involves building a system that collects and examines opinions about products or services expressed in web posts, blogs, or social media comments. It is useful in many ways. In marketing [7], [10], for example, it helps assess the success of a new advertisement or product launch, determine which version of a product or service is popular, and even identify which demographic groups like or dislike particular features.
The basic task in sentiment analysis is to classify the text of a sentence or document, that is, to determine whether the opinion expressed in the sentence or document is positive, negative, or neutral. As in [15], [16], sentiment analysis can also capture emotions such as sadness, joy, or anger.

B. Lexicon-based Method
The Lexicon-based method can [16], [20]: i) identify the sentiment of each opinion word contained in the tweet data, and ii) handle data that contain multiple opinions. It improves on methods that cannot handle this multi-opinion problem. To handle it, the method aggregates the sentiments of all opinion words, taking into account the distance between each opinion word and the feature it refers to; the aggregate then determines the opinion class of each data item. In this method, the data are divided into three sentiment classes: positive, negative, and neutral.
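The aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two word sets are tiny hypothetical stand-ins for a real opinion lexicon, a tweet's class is taken from the sign of its summed word orientations, and `feature_score` (a hypothetical helper) weights each opinion word's orientation by the inverse of its token distance to a chosen feature word.

```python
# Hypothetical miniature lexicon: stand-ins for a full opinion lexicon such as
# Hu and Liu's, which contains thousands of positive and negative words.
POSITIVE = {"good", "great", "secure", "honest", "win"}
NEGATIVE = {"bad", "hack", "hacked", "fake", "corrupt", "lose"}


def orientation(word: str) -> int:
    """+1 for a positive opinion word, -1 for a negative one, 0 otherwise."""
    if word in POSITIVE:
        return 1
    if word in NEGATIVE:
        return -1
    return 0


def classify_tweet(text: str) -> str:
    """Label a cleaned tweet by the sign of its summed word orientations."""
    total = sum(orientation(w) for w in text.lower().split())
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


def feature_score(tokens: list[str], feature_index: int) -> float:
    """Hypothetical distance-weighted score for the feature at tokens[feature_index]:
    each opinion word contributes orientation / token distance, so opinion
    words near the feature count more than distant ones."""
    score = 0.0
    for i, word in enumerate(tokens):
        if i != feature_index:
            score += orientation(word) / abs(i - feature_index)
    return score


print(classify_tweet("Great win for honest voters"))  # positive
```

In a sentence such as "the election was hacked but the result is good", the feature "election" gets a negative score because "hacked" sits two tokens away while "good" sits seven tokens away.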

C. Long Short-Term Memory (LSTM)
Long Short-Term Memory (LSTM) is an alternative processing module for Recurrent Neural Networks (RNNs). LSTM was introduced by Hochreiter & Schmidhuber (1997) [11] and later refined and popularized by many researchers [9], [8], [2], [5], [6], [13]. Like an RNN, an LSTM network consists of repeated processing modules; the difference is that these modules are LSTM modules [18], as shown in Figure 1. The LSTM module (one green box) processes its input differently from a regular RNN module. In addition, an LSTM passes an extra signal from one time step to the next, the context (memory cell), represented by the symbol C_t in Figure 2. In the diagram in Figure 3, each line carries a whole vector from the output of one node to the input of another. The pink circles represent element-wise operations, such as element-wise addition or multiplication of vectors, while the yellow boxes are trainable neural network layers (with weights and biases). Two lines merging indicate the concatenation of two vectors or matrices, while a line splitting indicates that its content is copied and each copy goes to a different node.
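For reference, the structure in these figures corresponds to the widely used LSTM formulation with a forget gate, written below. The symbols W, b, h_t, f_t, i_t, o_t and the bracket notation [h_{t−1}, x_t] (concatenation) are our own notation rather than the paper's; each σ(...) line is a sigmoid gate of the form σ(A · x + b) described in the next subsection, with x = [h_{t−1}, x_t].

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{forget gate} \\
i_t &= \sigma\!\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{input gate} \\
\tilde{C}_t &= \tanh\!\left(W_C\,[h_{t-1}, x_t] + b_C\right) && \text{candidate context} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{new context} \\
o_t &= \sigma\!\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh(C_t) && \text{module output}
\end{aligned}
```

Here ⊙ is element-wise multiplication and + element-wise addition (the pink circles), the σ and tanh layers are the trainable layers (the yellow boxes), and the C_{t−1} to C_t line is the context path.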

1) LSTM Key Mechanism:
The key idea of LSTM is the path at the top of the LSTM module that connects the old context (C_{t−1}) to the new context (C_t), as illustrated in Figure 4. In several articles, the context C_t is also called the cell state or memory cell. Along this path, a value in the old context can be passed on to the new context with very little modification, if needed. The context is a vector whose size is chosen by the network designer. The intuition is that each element should record some feature of the input; in natural language processing for English, for example, one element might record the gender of the subject and another whether the subject is singular or plural. The LSTM learns these features on its own during training. Another key idea is the sigmoid gate, which regulates how much information can pass through. As Figure 5 shows, for an input x the output of the sigmoid gate is σ(A · x + b), where A is a weight parameter and b is a bias, both learned during training, and σ is the sigmoid function. The gate output is a number between zero and one: zero means the information is blocked entirely, while one means it passes through entirely. The gate output is multiplied by another value to control how much of that value is used.
For example, with the sigmoid gate in Figure 6, the LSTM controls how much information from C_{t−1} is carried into C_t.
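A minimal NumPy sketch of one LSTM time step, assuming the standard formulation with forget, input, and output gates; the parameter layout (a dictionary of hypothetical (A, b) pairs, one per gate) is chosen for illustration only. Each gate is a sigmoid gate σ(A · x + b) applied to the concatenation of the previous output and the current input, and its output multiplies another vector.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Logistic function; maps any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step of a standard LSTM cell.

    params maps "forget", "input", "output", "candidate" to (A, b) pairs;
    each A has shape (hidden, hidden + input) because it acts on [h_prev, x_t].
    """
    z = np.concatenate([h_prev, x_t])      # merging lines: concatenation

    def gate(name):
        A, b = params[name]
        return sigmoid(A @ z + b)          # sigmoid gate: values in (0, 1)

    f = gate("forget")                     # share of C_{t-1} to keep
    i = gate("input")                      # share of new content to write
    o = gate("output")                     # share of the context to expose
    A_c, b_c = params["candidate"]
    c_tilde = np.tanh(A_c @ z + b_c)       # candidate context
    c_t = f * c_prev + i * c_tilde         # the C_{t-1} -> C_t path
    h_t = o * np.tanh(c_t)
    return h_t, c_t
```

If the forget gate saturates near one and the input gate near zero, the new context equals C_{t−1}: the old context passes through unchanged, which is the behavior the key mechanism relies on.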

D. Multinomial Logistic Regression (MLR)
Multinomial logistic regression is the form of logistic regression used when the dependent variable is polychotomous (multinomial), that is, a nominal-scale response variable with more than two categories; in this study it has three categories [12]. For a model whose dependent variable has three nominal categories, the values of Y are coded 1, 2, and 3, and Y is parameterized by two logit functions, each comparing one category with a reference category. Logistic regression is stated as a probability model: the dependent variable is the logarithm of the odds that an outcome occurs given the values of the independent variables. As in [12], taking category 1 as the reference category, the two logit functions are

$$g_j(\mathbf{x}) = \ln\frac{P(Y = j \mid \mathbf{x})}{P(Y = 1 \mid \mathbf{x})} = \beta_{j0} + \beta_{j1}x_1 + \cdots + \beta_{jp}x_p, \qquad j = 2, 3.$$

Applying the logistic transformation to these two logit functions gives the trichotomous logistic regression model

$$\pi_1(\mathbf{x}) = \frac{1}{1 + e^{g_2(\mathbf{x})} + e^{g_3(\mathbf{x})}}, \qquad \pi_j(\mathbf{x}) = \frac{e^{g_j(\mathbf{x})}}{1 + e^{g_2(\mathbf{x})} + e^{g_3(\mathbf{x})}}, \quad j = 2, 3,$$

where π_k(x) = P(Y = k | x).

III. RESEARCH METHOD
The authors conducted the research in the following stages:

A. Retrieval of Data
We retrieve tweet data through the Twitter API service using the Python programming language, specifying keywords or hashtags related to the research topic, until the required number of tweets has been collected.

B. Pre-Processing
We apply a cleaning process (Figure 7) to the collected tweets: removing URLs, deleting hashtags (#) and mentions (@), replacing negation words using a negation dictionary, and deleting duplicate data. The tweets are then classified with the Lexicon-based method, as shown in Figure 8 and Figure 9, using the opinion lexicon of Hu and Liu [16] to divide the tweets into positive, negative, and neutral classes.
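A minimal sketch of these cleaning steps using only Python's `re` module. Assumptions: the negation dictionary is a small hypothetical example, a hashtag keeps its word but loses the '#' while a mention is removed entirely, and lowercasing is added for illustration; the authors' exact rules are not given.

```python
import re

# Hypothetical negation dictionary; the study's actual dictionary is not given.
NEGATIONS = {"don't": "do not", "can't": "cannot", "isn't": "is not",
             "won't": "will not"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")


def clean_tweet(text: str) -> str:
    """Remove URLs and mentions, strip '#', expand negations, lowercase."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", "")           # keep the hashtag word itself
    words = [NEGATIONS.get(w, w) for w in text.lower().split()]
    return " ".join(words)


def clean_corpus(tweets):
    """Clean every tweet, then drop empty results and duplicates (order kept)."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        c = clean_tweet(tweet)
        if c and c not in seen:
            seen.add(c)
            cleaned.append(c)
    return cleaned
```

Deduplicating after cleaning, rather than before, also catches retweets that differ only in a URL or mention.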

C. Processing
At this stage the LSTM and MLR algorithms are applied: the cleaned and labeled tweets are used to build the two classification methods, LSTM and MLR. The accuracy of each method is then calculated with the confusion-matrix method, and the two results are compared to determine which method achieves the better accuracy.
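Accuracy from a confusion matrix is the sum of the diagonal (correctly classified tweets) divided by the total number of tweets. A plain-Python sketch for the three sentiment classes (the label order is an arbitrary choice):

```python
LABELS = ("negative", "neutral", "positive")


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def accuracy(matrix):
    """Correct predictions (diagonal) over all predictions."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


m = confusion_matrix(["positive", "negative", "neutral", "positive"],
                     ["positive", "neutral", "neutral", "positive"])
print(m, accuracy(m))  # [[0, 1, 0], [0, 1, 0], [0, 0, 2]] 0.75
```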

D. Visualization
The patterns found are visualized in a form that is easier to understand, namely diagrams and tables.

E. Report Preparation
The research is written up and documented from the initial stage, retrieving tweets, through to the sentiment analysis results, including visualizations of the sentiment analysis data as tables, diagrams, and word clouds.

IV. IMPLEMENTATION AND RESULTS
The data obtained from Twitter through the Twitter API, after the pre-processing stage (cleaning) and labeling with the Lexicon-based method, form the dataset used in the processing stage. In this stage, classifiers are built with a deep learning method using the LSTM algorithm and a machine learning method using the MLR algorithm. In the final stage, the comparison of the two classification methods is visualized as diagrams and tables.

A. Implementation of LSTM Classification Method
This stage explains the LSTM classification method that was built, briefly describing how the LSTM model is constructed, from loading the dataset to testing and visualization, as shown in Figure 10. The deep learning model has an input layer, hidden layers, and an output layer. The input layer takes the input data as a matrix of vectors stored in the embeddingmatrix variable; the hidden layers learn intermediate representations of the input as the model is trained over the data repeatedly; and the output layer produces the final result from the output of the hidden layers. Based on the layer design of the LSTM classification method in Table I, the layers are specified as follows:
1) The input layer is an embedding matrix of 22,922 vectors, each of length 35.
2) Hidden layer 1 has 64 neurons, uses the ReLU activation function, and has a dropout of 0.4.
3) Hidden layer 2 has 64 neurons, uses the ReLU activation function, and has a dropout of 0.4.
4) The output layer has 3 neurons and uses the Softmax activation function.
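Table I is not reproduced here, so the exact layer types are uncertain. As a rough size estimate, the sketch below counts parameters under an assumed stack: the 22,922 × 35 embedding matrix (treating 35 as the embedding dimension), an LSTM layer with 64 units as hidden layer 1, a dense layer with 64 units as hidden layer 2, and a 3-unit softmax output. These layer types are assumptions, dropout adds no parameters, and if the embedding is frozen its parameters are not trained.

```python
def embedding_params(vocab_size: int, dim: int) -> int:
    """One vector of length `dim` per vocabulary entry."""
    return vocab_size * dim


def lstm_params(input_dim: int, units: int) -> int:
    """Four gates (forget, input, candidate, output), each with a weight
    matrix over [h_{t-1}, x_t] and a bias vector."""
    return 4 * (units * (units + input_dim) + units)


def dense_params(input_dim: int, units: int) -> int:
    """Weight matrix plus bias vector."""
    return input_dim * units + units


layers = {
    "embedding (22,922 x 35)": embedding_params(22_922, 35),
    "hidden 1: LSTM(64), assumed": lstm_params(35, 64),
    "hidden 2: Dense(64), assumed": dense_params(64, 64),
    "output: Dense(3), softmax": dense_params(64, 3),
}
for name, count in layers.items():
    print(f"{name:32s}{count:>10,}")
print(f"{'total':32s}{sum(layers.values()):>10,}")
```

Under these assumptions the embedding dominates the model size (802,270 of 832,225 parameters), while the recurrent layer itself holds 25,600.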
Figure 11 shows the workflow graph of the LSTM model that was built.

2) Training and Evaluation of the LSTM Classification Method:
After the input, hidden, and output layers are declared, the next stage is training. Training runs for 3 epochs, and the model is saved each time an epoch improves on the best accuracy so far. Over the 3 epochs, the highest accuracy, 0.9365 (about 94%), was reached at the 3rd epoch. The accuracy and loss values during training are plotted in Figure 12.
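The save-on-improvement rule can be sketched in a few lines. The function below is hypothetical and only simulates the bookkeeping with a list of per-epoch accuracies; if the model was built with Keras (the text does not name the framework), the usual equivalent is the `tf.keras.callbacks.ModelCheckpoint` callback with `save_best_only=True`.

```python
def train_with_best_checkpoint(epoch_accuracies):
    """Simulate saving a model only when an epoch beats the best accuracy so far.

    `epoch_accuracies` stands in for the accuracy measured after each epoch.
    Returns the 1-based epochs at which a checkpoint would be written and the
    best accuracy seen.
    """
    best = float("-inf")
    saved_epochs = []
    for epoch, acc in enumerate(epoch_accuracies, start=1):
        if acc > best:
            best = acc
            saved_epochs.append(epoch)   # a real loop would save the model here
    return saved_epochs, best


# Hypothetical accuracies: epoch 2 does not improve, so only 1 and 3 are saved.
print(train_with_best_checkpoint([0.90, 0.89, 0.93]))  # ([1, 3], 0.93)
```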

3) Calculation of LSTM Accuracy with Confusion-Matrix:
Accuracy is calculated with the confusion-matrix method. After the model has been trained and evaluated on the validation data, the trained LSTM model is loaded into a new variable to make predictions on the test data. Table II shows the accuracy calculated with the confusion-matrix method, in the form of a classification report.
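A classification report such as Table II lists per-class precision, recall, F1 score, and support, all derived from the confusion matrix. A plain-Python sketch follows; scikit-learn's `sklearn.metrics.classification_report` produces a similar report, but the text does not say which tool was used.

```python
def classification_report(matrix, labels):
    """Per-class precision, recall, F1 and support from a square confusion
    matrix (rows = actual class, columns = predicted class)."""
    n = len(labels)
    report = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        predicted = sum(matrix[r][k] for r in range(n))  # column total
        actual = sum(matrix[k])                           # row total (support)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": actual}
    return report


report = classification_report([[0, 1, 0], [0, 1, 0], [0, 0, 2]],
                               ("negative", "neutral", "positive"))
print(report["neutral"])  # precision 0.5, recall 1.0, F1 about 0.667, support 1
```

Precision divides by the column total (how often a class was predicted) and recall by the row total (how often it actually occurs), which is why a class can score well on one and poorly on the other.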

B. Implementation of MLR Classification Method
This stage explains the machine learning classification method built with the MLR algorithm, describing how the MLR model is constructed from loading the dataset to testing and visualization, as shown in the implementation chart in Figure 13.

1) MLR Classification Training Method:
The training phase is carried out on several N-gram models to find the configuration with the best accuracy, which is then evaluated with the Logistic Regression algorithm. An N-gram is a contiguous sequence of n items (characters or words) taken from a text; here word N-grams are used. Three types of N-gram processing are considered: Unigram, which splits a sentence into single words; Bigram, which splits it into pairs of consecutive words; and Trigram, which splits it into sequences of three consecutive words. The classifier is first trained on the Unigram model under three conditions: with default stop words, with custom stop words, and without stop words. Custom stop words are derived from the words that appear most often in the corpus; Table III lists the custom stop words used in this study. Figure 14 compares the results with default stop words, with custom stop words, and without stop words.
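The N-gram and stop-word variants can be sketched as follows. Assumptions: whitespace tokenization, a tiny hypothetical default stop-word list (real libraries ship much longer ones), stop words removed before N-grams are formed, and custom stop words taken as the k most frequent corpus words, where k is a hypothetical parameter (Table III gives the actual list).

```python
from collections import Counter

# Tiny hypothetical default stop-word list, for illustration only.
DEFAULT_STOP_WORDS = {"the", "a", "an", "is", "of", "to", "and", "in", "on"}


def ngrams(tokens, n):
    """Contiguous word n-grams: n=1 unigrams, n=2 bigrams, n=3 trigrams."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def custom_stop_words(corpus, k):
    """The k most frequent words in the corpus (frequency-based custom list)."""
    counts = Counter(w for doc in corpus for w in doc.split())
    return {w for w, _ in counts.most_common(k)}


def features(doc, n=1, stop_words=frozenset()):
    """Drop stop words, then split the remaining tokens into n-grams."""
    tokens = [w for w in doc.split() if w not in stop_words]
    return ngrams(tokens, n)


print(features("the russian hacking of the election", 1, DEFAULT_STOP_WORDS))
# ['russian', 'hacking', 'election']
print(features("the russian hacking of the election", 2, DEFAULT_STOP_WORDS))
# ['russian hacking', 'hacking election']
```

Removing stop words first means a bigram such as "hacking election" can join words that were not adjacent in the original tweet; keeping stop words avoids this at the cost of more, less informative features.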
Of the Unigram configurations (default stop words, custom stop words, and without stop words), the best accuracy, 92%, is obtained with default stop words. Bigram and Trigram processing were then tested with default stop words. Across these training results, the best accuracy of the MLR classification method, 92%, is obtained with Unigram processing and default stop words.
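The text does not name the library used for the MLR classifier. As an illustration of what the training step computes, the sketch below fits a multinomial logistic regression directly with NumPy by batch gradient descent on a bag-of-words count matrix, the kind of features the N-gram step produces (capped at 10,000 features in the study). The toy vocabulary, learning rate, epoch count, and L2 strength are hypothetical. With one class's parameters fixed at zero, this softmax form is the same model as the two-logit, reference-category form of the trichotomous model in the MLR subsection of the literature review.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_mlr(X, y, n_classes, lr=0.5, epochs=500, l2=1e-3):
    """Multinomial logistic regression by batch gradient descent on the mean
    cross-entropy loss, with a small L2 penalty on the weights."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                 # one-hot targets
    for _ in range(epochs):
        P = softmax(X @ W + b)               # class probabilities per document
        grad = (P - Y) / n                   # gradient of mean cross-entropy
        W -= lr * (X.T @ grad + l2 * W)
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(X, W, b):
    """Most probable class for each row of X."""
    return softmax(X @ W + b).argmax(axis=1)


# Hypothetical vocabulary: good, great, bad, fake, election, vote.
# Classes: 0 = negative, 1 = neutral, 2 = positive.
X = np.array([[1, 1, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0],
              [0, 0, 1, 1, 0, 0], [0, 0, 0, 1, 0, 0],
              [0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 0, 1]], dtype=float)
y = np.array([2, 2, 0, 0, 1, 1])
W, b = fit_mlr(X, y, n_classes=3)
print(predict(X, W, b))  # [2 2 0 0 1 1]
```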
2) Calculation of MLR Accuracy using Confusion-Matrix:
The sentiment analysis in this study compares the accuracy of the deep learning classification method using the LSTM algorithm with that of the machine learning classification method using the MLR algorithm. Accuracy is calculated with the confusion-matrix method. Table V shows the accuracy calculated with the confusion-matrix method for Unigram processing with default stop words and 10,000 features, in the form of a classification report.

C. Comparison of Accuracy Results for the LSTM and MLR Classification Methods
The final goal of this study is to determine which classification method performs sentiment analysis better. Based on the classification reports, the deep learning classification method with the LSTM algorithm analyzes sentiment better than the MLR classification method using Unigram with default stop words. Table VI compares the results of the two classification methods.

D. Testing of LSTM and MLR Classification Methods
This test provides new input data to the two classification methods after they have completed training on the training data. The author enters three sentences, and the LSTM and MLR classification methods each classify their sentiment. Table VII lists the three sentences used as test material. Table IX shows the sentiment classification results obtained with the MLR classification method in its best configuration: the Unigram word processing model with default stop words and 10,000 features. The same three sentences used to test the LSTM classification model were entered, so that the sentiment classifications of the two methods can be compared directly.

E. Comparison Results of Sentiments on Trial Data
Table X compares the classification results. From this comparison:
1) In terms of sentiment class, the LSTM and MLR classification methods produce the same class for each sentence, but the LSTM classification model weights the sentiment values better.
2) The LSTM classification method is better than the MLR classification method at classifying sentiment classes.

V. CONCLUSION AND FUTURE WORK
In this study, sentiment analysis was carried out on the topic "Russian Hacking Cases Regarding the 2016 US Presidential Election" by testing a deep learning classification method using the LSTM algorithm and a machine learning classification method using the MLR algorithm. Based on the results of testing the LSTM and MLR classification methods, the following conclusions can be drawn:
1) The deep learning classification method using the LSTM algorithm achieves an accuracy of 93%.
2) The machine learning classification method using the MLR algorithm achieves an accuracy of 92%.
3) The deep learning classification method using the LSTM algorithm performs sentiment analysis better than the machine learning classification method using the MLR algorithm.
4) The accuracy of the LSTM deep learning method exceeds that of the MLR machine learning method by one percentage point (93% versus 92%).
The sentiment analysis carried out in this study still has a number of shortcomings, and the application developed here can be improved. Suggestions for further development are as follows:
1) The research used only two classification methods, the LSTM deep learning algorithm and the MLR machine learning algorithm. Other classification methods should be added so that more methods can be compared in terms of accuracy.
2) The dataset is still relatively small for processing with deep learning algorithms. Future work should use a larger dataset so that the resulting accuracy can improve.

ACKNOWLEDGMENT
The authors gratefully acknowledge Gunadarma University for providing research funding and for permission to use the research facilities.