Sentiment Analysis of Arabic Jordanian Dialect Tweets

Sentiment Analysis (SA) of social media content has become one of the growing areas of research in data mining. SA makes it possible to mine subjective public opinions from text in real time. This paper proposes an SA model for Arabic Jordanian dialect tweets. Tweets are annotated with one of three classes: positive, negative, or neutral. Support Vector Machines (SVM) and Naïve Bayes (NB) are used as supervised machine learning classification tools. Tweets are preprocessed for SA by cleaning noisy tweets, normalization, tokenization, Named Entity Recognition, stop-word removal, and stemming. The experiments conducted on this model showed encouraging outcomes when an Arabic light stemmer/segmenter is applied to Arabic Jordanian dialect tweets. The results also showed that SVM performs better than NB in classifying such tweets.
Keywords: Sentiment analysis; Arabic Jordanian dialect; tweets; machine learning; text mining


I. INTRODUCTION
Sentiment Analysis (SA), or opinion mining, is defined as the task of finding authors' opinions with respect to a topic or issue. It focuses on analyzing sentences or classifying texts as positive, negative, or neutral opinions. SA has gained high popularity in recent years as a way to analyze and benefit from the data available on online social media such as blogs, wikis, and Twitter [1]. Such analysis can be based on knowledge or statistics [2]. Finally, SA requires dealing with many natural language processing issues such as conceptual primitives [3], sarcasm [4], aspect-based analysis [5], and subjectivity detection [6].
Arabic sentiment analysis of tweets may capture the opinion of the public in regard to a specific topic [7], [8]. The performance of Arabic SA tools has become closely tied to the availability of social media data. Researchers have been working on sentiment analysis and opinion mining using different tools to classify people's views and comments as negative, positive, or neutral. Our research considers how Jordanians, using their own dialect of local idioms and words, react to trends and news on Twitter.
Our target is to establish an SA model that classifies Arabic Jordanian dialect tweets as negative, positive, or neutral by recognizing words: named entities, stop words, and stems. To accomplish this task, we collected tweets according to their locations, then filtered these tweets to collect different types of terminology in order to identify the Jordanian Arabic dialect efficiently.
The rest of the paper is organized as follows: Section two lists related work on sentiment analysis, with an emphasis on Arabic SA. Section three discusses background concepts needed for this research, such as machine learning classification techniques. Section four presents the proposed Arabic Jordanian dialect tweets SA model. Section five provides the evaluation measures, the experimental results, and the evaluation of this model. Finally, Section six presents the conclusions and some future work.

II. RELATED WORK
Many studies have considered Arabic sentiment analysis. In [9], the authors proposed a hybrid approach that combines SVM and semantic orientation on an Egyptian dialect corpus of tweets. In [10], the authors presented a model for sentiment analysis of Saudi Arabic tweets to extract feedback on Mubasher products. In [11], the authors developed a corpus for Arabic sentiment analysis of Saudi tweets. In [12], the authors explained how social networks can be mined for Arabic slang comments by proposing an SVM-based classifier that applies sentiment analysis to classify youth news comments on Facebook.
The authors in [13] studied the effect of social media (Libyan tweets) during the Arab Spring based on two sources of information: the language of leaders in public speeches, and the language of the public in social media. Other studies, such as [14], have focused on Arabic opinion mining using a combined approach of a lexicon-based method and a k-nearest method for classifying documents. Our research is similar to the work of [15], which designed a framework for data collection, statistical analysis, sentiment analysis, and language model comparison to understand the interests of Twitter users towards news headlines. However, we differ by not using the behavior of English language entities. Instead, we use the behavior of the Arabic Jordanian dialect. Also, this research provides a formulation of the opinion mining problem, identifies the key pieces of information that should be mined, and describes how a structured opinion summary can be produced from unstructured tweets. This research also differs from other related studies by focusing specifically on Arabic Jordanian dialect tweets written by Twitter users who comment using special local idioms and words that are mainly used in Jordan. Hence, our research takes such locality of words and idioms into consideration during SA.

III. BACKGROUND
Machine Learning (ML) has two main approaches: supervised learning and unsupervised learning. The problem with unsupervised algorithms is that their outputs may overlap and that they learn to localize tweets with minimal guidance. Therefore, we used two supervised ML approaches for the classification of Arabic Jordanian dialect tweets, namely Naïve Bayes (NB) and Support Vector Machine (SVM).
Naïve Bayes (NB) is a probabilistic learning method, used here in its multinomial model. NB often relies on the bag-of-words representation of a document, which collects the most used words and neglects other infrequent words. Bag of words depends on the feature extraction method to provide the classification of some data [16]. Furthermore, NB can use language modeling that represents each text as unigrams, bigrams, or trigrams and tests the probability that a query corresponds to a specific document.
Support Vector Machine (SVM) is a non-probabilistic binary learning algorithm applied for classification. The most important models for SVM text classification are the Linear and Radial Basis functions. Linear classification trains on the data set and then builds a model that assigns classes or categories [17]. It represents the features as points in space, each predicted to belong to one of the assigned classes. SVM provides good classification performance in several fields, but it is mostly applied to image recognition and text classification.
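The multinomial NB model described above can be illustrated with a minimal sketch. This is not the authors' implementation; the class name `MultinomialNB`, the add-one (Laplace) smoothing, and the toy English tokens are assumptions made only for illustration.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Hypothetical multinomial Naive Bayes over bag-of-words counts."""

    def fit(self, docs, labels):
        # docs: list of token lists; labels: one class label per document.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(docs)
        return self

    def predict(self, tokens):
        best_label, best_score = None, float("-inf")
        vocab_size = len(self.vocab)
        for label, n_docs in self.class_counts.items():
            # Log prior of the class.
            score = math.log(n_docs / self.total_docs)
            total = sum(self.word_counts[label].values())
            for tok in tokens:
                if tok in self.vocab:  # words unseen in training are skipped
                    # Add-one smoothing (an assumption, not from the paper).
                    count = self.word_counts[label][tok]
                    score += math.log((count + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (illustrative only).
train_docs = [["good", "happy"], ["bad", "sad"], ["weather", "today"]]
train_labels = ["positive", "negative", "neutral"]
model = MultinomialNB().fit(train_docs, train_labels)
```

With this toy data, `model.predict(["happy", "good"])` selects the positive class because both words were seen only in the positive training document.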
Many researchers have applied supervised learning approaches to publicly released corpora for Arabic SA [11]. Such studies use Modern Standard Arabic (MSA). MSA can handle neutral indications that are written in question form or with an unknown purpose, such as the phrase الحمد لله (Praise God), which could mean something good or bad depending on the mood of the writer.
Furthermore, various methods of Arabic text mining have been discussed and researched. The hybrid approach to sentiment analysis in [9] combines SVM and Semantic Orientation (SO). The challenges its authors faced for a given word relate to its root, its spelling, or its different meanings.
Other classifications for SA are based on predicted classes and polarity, and/or on the level of classification (sentence or document). In lexicon-based SA, the extracted text is annotated with semantic orientation polarity and strength. Light stemming has been shown to benefit both the accuracy and the performance of classification [18].
Finally, an automatic classifier of Arabic text documents based on the NB and SVM algorithms was presented in [19]; the results indicated that the SVM algorithm handled text document classification better than the NB algorithm.

IV. PROPOSED ARABIC JORDANIAN DIALECT TWEETS SA MODEL
In this section, we present our proposed model that analyzes, mines, and classifies Arabic Jordanian dialect tweets as illustrated in Fig. 1.

A. Collecting Tweets
A connection to Twitter is created in order to collect a corpus of Jordanian dialect tweets. A read-only application is built to collect written tweets from Twitter. Tweets are collected according to the following parameters: user timeline, home timeline, trends, and search queries. For each parameter, a corpus of 1000 tweets is collected.

B. Tweets Extraction
Tweets extraction isolates the important content of a tweet (its essence). What is needed from a tweet is the text written after the hashtags, from which the feature words are then extracted, i.e., the words that carry a message indicating whether the tweet is positive, negative, or neutral. Tweets extraction is also needed to facilitate feature vector analysis and selection (unigrams, bigrams, and trigrams), and to facilitate the classification of both the training and testing sets of tweets.
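The unigram, bigram, and trigram features mentioned above can be sketched as follows. The function names `ngrams` and `extract_features` are hypothetical, and whitespace-separated tokens are assumed as input.

```python
def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(tokens, max_n=3):
    """Collect unigrams, bigrams, and trigrams (up to max_n) of one tweet."""
    features = []
    for n in range(1, max_n + 1):
        features.extend(ngrams(tokens, n))
    return features
```

For a three-word tweet such as الجو معتدل هاليومين, `ngrams(tokens, 2)` yields the two bigrams "الجو معتدل" and "معتدل هاليومين", and `extract_features` returns three unigrams, two bigrams, and one trigram.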

D. Tweets Preprocessing
Fig. 1. Arabic Jordanian Dialect Tweets SA Model.
Several preprocessing stages have to be applied to the collected tweets in order for the SA process to be more effective. These stages are as follows:
1) Cleaning stage: Dealing with Arabic characters and letters requires further cleaning, so we continue cleaning each tweet line that contains special symbols and various characters such as emoticons. Those symbols and characters may lead to a classification different from what the user intended in the tweet. Hence, we took emoticons and emotion characters into consideration during the sentiment classification. Table 1 presents some of the special symbols that we used along with their meanings and sentiments. Finally, the cleaned extracted tweets are stored in a database in comma-separated values format for further manipulation.
The annotation of the collected tweets was done by an Arab Jordanian student who was fully aware of the Jordanian dialect meanings and domains. As a result of this annotation, each tweet was labeled as positive, negative, or neutral.
Positive tweets may include words that indicate happy feelings, love, excitement, etc. For example, the following tweets are positive:
• جمعة مباركة بالخير للجميع (Blessed Friday for everyone)
• افتخر ببلدي, اللهم زده من نعم (Proud of my country, God increase its blessings)
Negative tweets may include bad words and negativity indicators that express bad feelings, sadness, depression, and/or anger. For example, the following tweets are negative:
• زحلق من هون ما بدي شوفك (Move from here, I don't want to see you)
• انطم واسكت ما بتعرف تركب كلمتين على بعض (Shut up, you don't know how to put two words together)
Neutral tweets contain neither positive nor negative words; they may include questions about something, uncertainty, etc. A tweet is also considered neutral if it does not bear any opinion. For example, the following tweets are neutral:
• اترك من ايدك, انا بكمل (Leave your hands out of it, I will continue)
• الجو معتدل هاليومين (The weather is fine these two days)
2) Normalization: Normalization of tweets is needed since there is no single spelling convention for some Arabic letters; for example, either of the letters ة (Taa) or ه (Haa) could be used as the last letter of a word. Hence, the word "نهفة" (Funny) has the normalized form "نهفه". This normalization stage starts by removing all extra spaces; then every occurrence of an un-normalized letter is replaced with its normalized form as shown in Table 2.
All non-standard words that contain numbers and/or dates are identified. Such words are mapped into special built-in vocabularies. This results in a smaller Arabic tweet vocabulary and improves the accuracy of the classification task. Some Arabic words contain several Tatweels (stressing a letter by repeating it several times). The repeated occurrences of such a letter in a word are replaced by a single occurrence. For example, the word "نمــــــــــــــــــــا" (name of a person) would be normalized into "نمّا". Moreover, Tashkeel (vocalization or diacritics added to Arabic letters) could affect the performance of the stemmers and of the classification process. Hence, words with Tashkeel are replaced with their corresponding forms without Tashkeel. Table 3 presents some examples of Arabic words in their un-normalized vocalized (Tashkeel) forms and their corresponding unvocalized normalized forms.
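The normalization steps above could be sketched roughly as below. This is an illustrative approximation, not the authors' code: the letter map contains only the ة to ه replacement stated in the text (the full contents of Table 2 are not reproduced here), Tashkeel is assumed to cover the code points U+064B to U+0652, and collapsing runs of three or more identical characters is an assumed reading of the repeated-letter rule. Unlike the example in the text, this sketch removes all diacritics, including the shadda.

```python
import re

# Arabic diacritics (Tashkeel): fathatan through sukun (assumed range).
TASHKEEL = re.compile("[\u064B-\u0652]")
# Tatweel (kashida), the elongation character.
TATWEEL = "\u0640"
# Only the mapping stated in the text: taa marbuta -> haa.
LETTER_MAP = {"\u0629": "\u0647"}
# Three or more repeats of the same character (assumed threshold).
REPEATS = re.compile(r"(.)\1{2,}")


def normalize(text):
    """Apply the normalization stages to one tweet (hypothetical helper)."""
    text = " ".join(text.split())        # remove extra spaces first
    text = TASHKEEL.sub("", text)        # strip diacritics
    text = text.replace(TATWEEL, "")     # strip elongation marks
    text = REPEATS.sub(r"\1", text)      # collapse stressed (repeated) letters
    for source, target in LETTER_MAP.items():
        text = text.replace(source, target)
    return text
```

Extra spaces are removed first, following the order given in the text; the remaining steps are independent of each other in this sketch.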
3) Tokenization: Tokenization is an important step in SA since it reduces the typographical variation of words. Both feature extraction and the bag-of-words representation require tokenization. Dealing with the Arabic language requires a high-level component that uses a dictionary of features to transform words into feature vectors, or feature indices, such that the index of each feature (word) in the vocabulary is linked to its frequency in the whole training corpus.
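A minimal sketch of such a feature dictionary is given below, assuming simple whitespace tokenization and an alphabetically ordered vocabulary; both choices, and the names `tokenize` and `build_vocabulary`, are illustrative assumptions rather than details from the paper.

```python
from collections import Counter


def tokenize(tweet):
    """Split a cleaned, normalized tweet on whitespace (assumed tokenizer)."""
    return tweet.split()


def build_vocabulary(tweets):
    """Map each word to a feature index and to its corpus-wide frequency."""
    frequency = Counter()
    for tweet in tweets:
        frequency.update(tokenize(tweet))
    # Sorted order gives every word a stable, reproducible index.
    index = {word: i for i, word in enumerate(sorted(frequency))}
    return index, frequency
```

Given the two returned mappings, the index of a word and its frequency in the whole training corpus can be looked up together, which is the link the paragraph above describes.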

4) Named Entity Recognition (NER):
NER is a significant tool in Natural Language Processing (NLP); it allows the identification of proper nouns in unstructured text. NER has three categories of named entities: ENAMEX (person, organization, and country), TIMEX (date and time), and NUMEX (percentages and numbers). For example, the word "أردن" is a country name (Jordan), and the word "الأحد" is the name of a day of the week in Arabic (Sunday). Some Arabic names start with "ال" (The), which helps in identifying each word that starts with these two letters as a named word. Furthermore, NER makes the classification faster than processing the whole sentence. Hence, we built a function to identify Arabic named entities, so that in future work we could apply SA only to those Arabic names. We estimated the frequency of each name in the tweets to find the most frequently used names, and we measured the probability of these names occurring in positive, negative, and/or neutral tweets.
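The prefix heuristic and the per-class name probabilities described above might look like the following sketch. The authors' actual function is not given in the paper, so the names `candidate_names` and `name_class_probabilities` and the minimum-length check are assumptions. Note that this heuristic over-generates, since many common Arabic nouns also begin with "ال".

```python
from collections import Counter, defaultdict

DEFINITE_ARTICLE = "\u0627\u0644"  # the two letters "ال"


def candidate_names(tokens):
    """Flag tokens that begin with the definite article as candidate names."""
    return [t for t in tokens
            if t.startswith(DEFINITE_ARTICLE) and len(t) > len(DEFINITE_ARTICLE)]


def name_class_probabilities(labeled_tweets):
    """Estimate P(class | name) from (tokens, label) pairs by relative frequency."""
    per_name = defaultdict(Counter)
    for tokens, label in labeled_tweets:
        for name in candidate_names(tokens):
            per_name[name][label] += 1
    probabilities = {}
    for name, counts in per_name.items():
        total = sum(counts.values())
        probabilities[name] = {label: c / total for label, c in counts.items()}
    return probabilities
```

The total count per name (the sum over its class counts) gives the name frequency mentioned in the text.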
6) Removing stop words: Some stop words help convey the full meaning of a tweet, while others are just extra characters that need to be removed. Jordanian dialect stop words that do not affect a tweet's meaning can be removed from tweets.
7) Stemming: The last stage in the preprocessing of Arabic Jordanian dialect tweets is stemming. It is done by removing any attached suffixes, prefixes, and/or infixes from the words in tweets. A stemmed word represents a broader concept than the original word, and stemming may also save storage [11]. The goal of stemming tweets is to reduce derived or inflected words to their stem, base, or root form in order to improve SA. Furthermore, stemming puts all the variations of a word into one bucket, effectively decreasing entropy and giving better concepts to the data.
Tashaphyne is an Arabic light stemmer/segmenter tool developed by [20] for exploring the sentiment analysis of Arabic roots. This tool has demonstrated the potential of mapping Arabic words to their basic roots for the SA task, showing noteworthy improvements over baseline performance [21]. Hence, we used this tool as a light stemmer for our Arabic Jordanian dialect tweets. For instance, in the Arabic Jordanian dialect the word "نشمي", which means a man with good Jordanian manners, would be stemmed into "نشم". Moreover, N-gram is a traditional method that takes into consideration the occurrences of N consecutive words in a tweet and has the ability to identify formal expressions [22]. Hence, we used N-grams in our SA.
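To make the idea of light stemming concrete, the toy sketch below strips at most one prefix and one suffix from a word. It is not Tashaphyne and does not reproduce its rules; the affix lists, the minimum stem length, and the function name `light_stem` are all assumptions chosen for illustration.

```python
# Illustrative affix lists only; a real light stemmer uses far richer rules.
PREFIXES = ("وال", "بال", "ال")
SUFFIXES = ("ات", "ين", "ون", "ها", "ي", "ة", "ه")
MIN_STEM_LENGTH = 3  # never shorten a word below three letters (assumption)


def light_stem(word):
    """Strip at most one listed prefix and one listed suffix from a word."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LENGTH:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            word = word[:-len(suffix)]
            break
    return word
```

Under these toy rules the dialect word "نشمي" loses its final "ي" and becomes "نشم", matching the example in the text, while a short word such as "الجو" is left unchanged because stripping the article would leave fewer than three letters.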
Finally, in this research, we implemented term frequency using Weka [23]. Term frequency assigns a weight to each term in a document that depends on the number of occurrences of the term in that document. It gives more weight to terms that appear more frequently in tweets, because these terms represent the words and language patterns most used by Arabic tweeters.
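As a rough illustration of term-frequency weighting (the paper's own implementation uses Weka, which is not reproduced here), the hypothetical helper below counts term occurrences in one tokenized tweet; the optional length-normalized variant is an added assumption.

```python
from collections import Counter


def term_frequency(tokens, relative=False):
    """Weight each term by its number of occurrences in one tweet.

    With relative=True the counts are divided by the tweet length
    (a common variant, assumed here for illustration).
    """
    counts = Counter(tokens)
    if not relative:
        return dict(counts)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}
```

Terms that occur more often in a tweet receive proportionally larger weights under either variant.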

V. EXPERIMENTAL RESULTS AND EVALUATION
Several experiments were conducted to compare the performance of the Naïve Bayes (NB) and Support Vector Machines (SVM) classifiers on Arabic Jordanian dialect tweets. The classifications are conducted on balanced and unbalanced sets of three classes, namely positive, negative, and neutral tweets. We used a total of 3550 Jordanian dialect tweets: 616 positive tweets, 1313 negative tweets, and 1621 neutral tweets. The first experiment used un-stemmed tweets with unigram features, the second experiment used stemmed tweets, the third experiment used rooted tweets, the fourth experiment used stemmed and rooted bigram tweets, the fifth experiment used stemmed and rooted trigram tweets, and the sixth and final experiment used stemmed and rooted n-gram tweets.
Three measures of sentiment classification performance are used, namely Accuracy, Precision, and Recall. In addition, the Receiver Operating Characteristics (ROC) introduced in [24] is used to measure the performance of the classifiers. ROC graphs are used to visualize, organize, and select classifiers based on their performance. The difference between ROC and accuracy is that ROC is helpful in managing unbalanced instances of classes, whereas accuracy is a single number that sums up the performance. We also used the F-measure to evaluate the effectiveness of our proposed sentiment model of Arabic Jordanian dialect tweets. Furthermore, in order to evaluate the performance of the sentiment analysis model, 10-fold cross validation is used, in which 10 equal-sized sets are produced. The tweets are divided into two groups, training and testing, with the testing set taken in turn from the 10 folds of the training tweets.
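The accuracy, precision, recall, and F-measure named above follow their standard definitions, which can be sketched per class as below. The function name `evaluate` and the report layout are hypothetical, and the F-measure is taken as the harmonic mean of precision and recall (F1), which the paper does not state explicitly.

```python
def evaluate(y_true, y_pred, labels):
    """Compute overall accuracy and per-class precision, recall, F-measure."""
    pairs = list(zip(y_true, y_pred))
    report = {"accuracy": sum(t == p for t, p in pairs) / len(pairs)}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f_measure = 2 * precision * recall / (precision + recall)
        else:
            f_measure = 0.0
        report[label] = {"precision": precision,
                         "recall": recall,
                         "f_measure": f_measure}
    return report
```

In a 10-fold setting, such a report would be computed on each held-out fold and the values averaged across folds.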
The results obtained from these experiments are shown in Table 4. The table shows that the SVM classifier performs better than the NB classifier on all measures in every experiment. Both classifiers perform better on all measures when the set of tweets is balanced.
The ROC performance reached an average of 0.71 for NB and 0.77 for SVM across all experiments. These are considered good values given the instances used and the prediction of data, since ROC compares the true positive rate (sensitivity, or recall in machine learning) with the false positive rate.
It is also noticed that the unigram experiments yield better classification results than the bigram experiments with SVM, and vice versa with NB. This is because NB classifies according to probabilities, and using bigrams would increase the probability of estimation.
Comparing the bigram and trigram experiments, the measures obtained with stemmed trigram features are lower than those obtained with bigrams, but in the rooted experiment the trigram performance is higher, with an accuracy of 55% for NB and 76% for SVM. Hence, we can conclude that rooted balanced trigrams perform better than unigrams and bigrams.
A final experiment was conducted to determine the optimum threshold (ROC curve) for the stemmed unigram features using SVM. We tried different term frequency thresholds as features until we obtained the best ROC area, with different values for the positive, negative, and neutral classes. Fig. 2 shows the optimum threshold for positive, negative, and neutral stemmed unigram features.
In this final experiment, the threshold curve is plotted for each class with its corresponding Area Under Curve (AUC): positive tweets have an AUC of 0.8904, negative tweets 0.8666, and neutral tweets 0.8666. Hence, positive tweets were classified more accurately than the other sentiments, and the data that positive tweets hold are more accurate for predicting the correct positive sentiment. Finally, Fig. 3 illustrates the accuracy differences between the SVM and NB classifiers and shows that classification with SVM yields higher accuracy for both stemmed unigrams and bigrams.

VI. CONCLUSION AND FUTURE WORK
Sentiment analysis, or opinion mining, has increasingly evolved with the growth of social media networks; it is the process of evaluating a person's feelings toward a specific subject.
The sentiment analysis model proposed in this research is based on three classes/labels: positive, negative, and neutral. Our model started with collecting and extracting Arabic Jordanian dialect tweets, followed by cleaning and annotating these tweets. The tweets then went through various preprocessing steps: normalization, tokenization, named entity recognition, stop-word removal, and stemming.
Several experiments were conducted on the proposed model using supervised ML. We conclude that classification using SVM on Arabic light stemming always yields better results than using NB. Furthermore, imposing a balance between the three classes and reducing the number of instances to the most used instances improved the accuracy. The outcomes were very promising, as SVM achieved an accuracy of 82.1%.
Despite the fact that rooted experiments have difficulties in correctly classifying Arabic Jordanian dialect tweets, we took the annotation and preprocessing steps very seriously.
We then demonstrated the positive effect on classification of applying a light stemming mechanism to the un-stemmed tweets. We used all polarity classes on both stemmed and un-stemmed tweets. After that, we tested the same mechanism on the rooted tweets. The results of our experiments showed that stemming affects the accuracy of tweet classification. Furthermore, the conducted experiments revealed many aspects of and beneficial insights about Arabic Jordanian dialect users. For example, we discovered that users express more negative emotions than positive emotions.
Finally, one direction for extending this research would be to improve the light stemming and rooting mechanisms by monitoring the performance of each rule on Jordanian dialects, in order to test the improvement in overall performance and in the classification process. Another direction for future research is to apply the semantic orientation approach and build a list of positive, negative, and neutral tweets for a better understanding of Jordanian Arabic dialect tweets.

TABLE I. SOME SYMBOLS AND THEIR MEANINGS

TABLE III. SOME NORMALIZED TASHKEEL WORDS