Arabic Sentiment Analysis: A Survey

Most social media commentary in the Arabic-language space is written in unstructured, non-grammatical, colloquial (slang) Arabic, which presents complex challenges for sentiment analysis and opinion extraction from online commentary and microblogging data in this important domain. This paper provides a comprehensive analysis of the important research works in the field of Arabic sentiment analysis. We carry out an in-depth qualitative analysis of the various features of these works and present a summary of objective findings. We used smoothness analysis to evaluate the percentage error of the performance scores reported in the studies relative to their linearly projected values (smoothness), which serves as an estimate of the influence of the different approaches used by the authors on the performance scores obtained. To solve a bounding issue with the data as reported, we modified an existing logarithmic smoothing technique and applied it to pre-process the performance scores before the analysis. We report and interpret the results of the analysis for the performance parameters accuracy, precision, recall and F-score.

Keywords: Arabic Sentiment Analysis; Qualitative Analysis; Quantitative Analysis; Smoothness Analysis


I. INTRODUCTION
Sentiment analysis is a subfield of natural language processing (NLP). NLP, or computational linguistics, is the scientific study of human languages from a computational perspective [1]. Natural language processing is an extensive field covering applications and investigations such as human language translation, generation and comprehension, speech and named entity recognition, question answering and information retrieval, word and topic segmentation, and relationship extraction. Sentiment analysis (SA) uses natural language processing, statistics, or machine learning methods to extract, identify, or otherwise characterize the sentiment content of a text unit [2]. Sentiment analysis is also referred to as opinion mining (OM) and is concerned with the analysis of human opinion, sentiment, and emotion about specific entities (such as food, products and organizations) and issues (such as politics and news) [3][4][5].
Sentiment analysis involves building a system to collect and examine opinions about a product expressed in blog posts, comments, reviews or tweets. Sentiment analysis can be useful in several ways. For example, in marketing it helps to judge the success of an ad campaign or new product launch, to determine which versions of a product or service are popular, and even to identify which demographics like or dislike particular features [38][4][5]. This paper reviews efforts to build SA systems for Arabic. The rest of this paper is arranged as follows. After a brief discussion of the properties of the Arabic language in Section 2, we review the sentiment analysis process in Section 3. Related work and a qualitative analysis of Arabic SA studies are presented in Section 4, a quantitative analysis in Section 5, and the conclusion and future work in Section 6.

II. ARABIC LANGUAGE CHALLENGES
As an important player in international politics and the global economy, the Arab world is the focus of many multinational interest groups and analysts who work daily to decipher sentiments emanating from this part of the world on issues such as oil and gas prices, stock market movements, politics and foreign policy. Because this chatter is largely in Arabic, there is a great need for natural language analysis of large amounts of Arabic text and documents to support the required sentiment extraction. The importance of Arabic in global communications therefore demands a proportional amount of interest and research in natural language processing of Arabic text to facilitate sentiment extraction for industrial use [6][7][8].
The reality, however, is that relatively little support is available for Arabic-language sentiment analysis, mainly for the following reasons: (1) scholarly work and research funding in this area are relatively limited compared with studies of other languages, especially English; and (2) the morphological complexity and dialectal variety of Arabic require advanced pre-processing and lexicon-building steps beyond those needed for English [7][8]. This limits the applicability of current tools, and custom tools for Arabic SA may be hard to come by, limited in functionality, or not freely available. Farra et al. [9] illustrated the challenges of Arabic-language sentiment analysis: there are many inflectional and derivational forms, and words take on different meanings depending on their position within a sentence and on the type of sentence (verbal or nominal). Multiple word prefixing, suffixing, affixing, and diacritical forms add high-order dimensionality to words, since the same three-letter root can generate a different word in each case [9]. The nature of the Arabic language therefore calls for custom Arabic SA tools capable of identifying these diacritics and performing efficient automated POS tagging of Arabic text. As explained, morphological analyzers should be used in tandem with POS taggers to carry out root extraction as well as prefix, suffix and affix extraction. Tools such as MADA (Morphological Analysis and Disambiguation for Arabic) and BAMA (Buckwalter Arabic Morphological Analyzer) are currently used by Arabic sentiment analysis researchers, but these tools are far from mature, and more capable POS taggers still need to be developed for this domain, among other issues.
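The effect of diacritics on surface forms can be made concrete with a minimal Python sketch of light orthographic normalization. It assumes a common convention: delete harakat, shadda and tanween (U+064B to U+0652), superscript alef (U+0670) and tatweel (U+0640), and map hamzated or madda alef forms to bare alef. The function name `normalize_arabic` is hypothetical; this is not the normalization performed by MADA, BAMA or any surveyed study, and it does no root extraction or POS tagging.

```python
import re

# Harakat, shadda and tanween (U+064B to U+0652), superscript alef (U+0670)
# and tatweel/kashida (U+0640), a purely typographic elongation character.
DIACRITICS_RE = re.compile("[\u064B-\u0652\u0670\u0640]")

# Map hamzated/madda alef forms (U+0623, U+0625, U+0622) to bare alef (U+0627).
ALEF_MAP = str.maketrans({"\u0623": "\u0627", "\u0625": "\u0627", "\u0622": "\u0627"})


def normalize_arabic(text):
    """Collapse diacritized and undiacritized spellings to one surface form."""
    return DIACRITICS_RE.sub("", text).translate(ALEF_MAP)


# "kataba" (he wrote) with full short-vowel marks vs. without any.
print(normalize_arabic("\u0643\u064E\u062A\u064E\u0628\u064E") == "\u0643\u062A\u0628")  # True
```

Deleting diacritics reduces sparsity but also merges words that differ only in vocalization, which is one reason morphological disambiguation remains necessary.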

III. SENTIMENT ANALYSIS PROCESS
Sentiment Analysis generally consists of three main steps: pre-processing, feature selection and sentiment classification.

A. Preprocessing
Text documents contain rich textual information such as words and phrases, punctuation, abbreviations and emoticons. They also tend to contain misspellings and duplicated characters (such as "cooool"), especially in social media text. Applying SA methods directly to such text usually leads to poor performance. Therefore, pre-processing is typically conducted to convert the text into textual features that can be fed into SA methods. Once the pre-processed text features are extracted, they are ready for the next phase of SA, feature selection [10][11]. Pre-processing is usually based on NLP techniques such as tokenization (splitting sentences into words), de-noising (removing special characters and capturing symbols for emotions), normalization (removing duplicate characters, identifying root words, etc.), stop-word removal (removing stop words and other words of no use to sentiment analysis), stemming (reducing a word to its stem or root) and lemmatization (converting inflected words to their dictionary form, or lemma).
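The pre-processing steps listed above can be sketched as a small Python pipeline. This is an illustration under stated assumptions, not any surveyed system: the stop-word list and the emoticon tags `emo_pos`/`emo_neg` are hypothetical, repeated characters are collapsed to two, and stemming and lemmatization are omitted.

```python
import re

# Hypothetical stop-word list and emoticon tags, for illustration only.
STOP_WORDS = {"the", "a", "is", "this"}
EMOTICONS = {":)": " emo_pos ", ":(": " emo_neg "}


def preprocess(text):
    text = text.lower()
    # De-noising, step 1: capture emoticons as word tokens before
    # punctuation is stripped.
    for emo, tag in EMOTICONS.items():
        text = text.replace(emo, tag)
    # De-noising, step 2: replace remaining non-word characters with spaces.
    # In Python 3, \w is Unicode-aware, so Arabic letters are kept.
    text = re.sub(r"[^\w\s]", " ", text)
    # Normalization: collapse runs of 3+ identical characters to 2
    # ("cooool" -> "cool"); keeping 2 preserves legitimate double letters.
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    # Tokenization on whitespace, then stop-word removal.
    return [tok for tok in text.split() if tok not in STOP_WORDS]


print(preprocess("This movie is cooool!!! :)"))  # ['movie', 'cool', 'emo_pos']
```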
Haddi et al. [10] studied the role of text pre-processing in sentiment analysis, including online text cleaning, white space removal, abbreviation expansion, stemming, negation handling and stop-word removal. For stop words, they constructed a list of domain-specific stop words, which are not standard stop words but carry no information for the specific domain. Bao et al. [11] evaluated the effects of text pre-processing on Twitter sentiment analysis. For de-noising, they first considered usernames, hashtags, emoticons, digital symbols, single letters, punctuation and other non-alphabetic symbols. They then conducted five pre-processing steps: URL feature reservation, negation transformation, repeated-letter normalization, stemming and lemmatization. They showed that sentiment classification accuracy increases when URL feature reservation, negation transformation and repeated-letter normalization are employed, but decreases when stemming and lemmatization are applied.
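Two of the steps found helpful by Bao et al., URL feature reservation and negation transformation, can be sketched as follows. This uses one common negation scheme (prefixing `NOT_` to tokens after a negation word until the next punctuation mark); it is an assumption, not necessarily the exact transformation in [11], and the negation list and `URL` placeholder token are hypothetical.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NEGATIONS = {"not", "no", "never"}        # hypothetical English-only list
CLAUSE_END = {".", ",", "!", "?", ";"}


def reserve_urls(text):
    # Keep the fact that a URL was present as a feature instead of deleting it.
    return URL_RE.sub(" URL ", text)


def tokenize(text):
    # Words and individual punctuation marks become separate tokens.
    return re.findall(r"\w+|[^\w\s]", text)


def mark_negation(tokens):
    # Prefix NOT_ to tokens following a negation word, until punctuation.
    out, negated = [], False
    for tok in tokens:
        if tok in CLAUSE_END:
            negated = False
            out.append(tok)
        elif tok.lower() in NEGATIONS:
            negated = True
            out.append(tok)
        else:
            out.append("NOT_" + tok if negated else tok)
    return out


print(mark_negation(tokenize(reserve_urls("not good at all, see http://x.co"))))
```

The transformation lets a bag-of-words classifier treat "good" and "NOT_good" as distinct features.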

B. Feature Selection
The outputs of pre-processing are the extracted text features. Many text features are considered for SA: unigrams (individual words), bigrams (two consecutive words) or n-grams (n consecutive words), represented either by their presence (binary weighting) or by their frequency (to indicate their relative importance); opinion words and phrases, commonly used to express opinions; intensifiers, words and phrases commonly used to intensify opinions; negation words, which change the opinion orientation; part-of-speech (POS) tags, used to find adjectives that carry opinion information; and emoticons (special characters that represent emotions). Many words in a text have no impact on its overall orientation. Keeping those words makes the dimensionality of the classification problem high and hence the classification more difficult, and these words may also introduce noise into the classification problem [12][13][14]. The goal of feature selection is to select the important text features from the pool of all extracted ones. Generally speaking, feature selection methods can be categorized into filter methods and wrapper methods. Filter methods rank the features according to a certain metric and select the top-ranked features. Wrapper methods, in contrast, select the best subset of features by generating different subsets and evaluating them with a classifier. The selected features therefore tend to be classifier-specific: they might perform well with the classifier used for the selection, but not necessarily with other classifiers.
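The difference between binary (presence) and frequency weighting of n-gram features can be shown with a short Python sketch; the function names are hypothetical and n-grams are joined with spaces for readability.

```python
from collections import Counter


def ngrams(tokens, n):
    # All runs of n consecutive tokens, joined with a space.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(tokens, max_n=2, binary=True):
    # Collect unigrams up to max_n-grams.
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    if binary:
        return {g: 1 for g in counts}   # presence only
    return dict(counts)                 # raw frequency


print(features(["very", "good", "very", "good"], binary=False))
# {'very': 2, 'good': 2, 'very good': 2, 'good very': 1}
```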
Yu and Wu [12] presented a 'contextual entropy model' based on basic pointwise mutual information (PMI) to perform seed-word expansion from a small corpus of stock market news articles. The model estimates the similarity between words and seed words by comparing their contextual distributions with an entropy-based measure and selecting high-match entries. Elawady et al. [13] evaluated the performance of mRMR (minimum redundancy maximum relevance), IG (information gain) and a hybrid method based on rough set theory and IG. They showed that mRMR performs better than IG and that the hybrid method performs best for sentiment analysis tasks. Agarwal and Mittal [14] considered text features such as unigrams, bigrams, their concatenation and POS (part-of-speech) tags. They also compared mRMR and IG and showed that mRMR is superior to IG for sentiment analysis tasks.
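Information gain, one of the filter metrics compared above, can be sketched for binary term presence as IG(t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)], after which terms are ranked and the top k kept. The helper names and toy data below are hypothetical; mRMR and the rough-set hybrid are not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    # Split the class labels by whether each document contains the term.
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_t) / n) * entropy(with_t) + (len(without_t) / n) * entropy(without_t)
    return entropy(labels) - conditional


def top_k_by_ig(docs, labels, k):
    # Filter method: rank the whole vocabulary by IG and keep the top k.
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)[:k]


docs = [{"great", "movie"}, {"great", "plot"}, {"bad", "movie"}, {"bad", "plot"}]
labels = ["pos", "pos", "neg", "neg"]
print(information_gain(docs, labels, "great"), information_gain(docs, labels, "movie"))  # 1.0 0.0
```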

C. Sentiment Classification
Sentiment classification techniques are usually divided into supervised, unsupervised and semi-supervised approaches. Supervised approaches apply machine learning techniques, trained on labelled data, to the extracted text features. Unsupervised learning in the sentiment analysis context relies on robust sentiment lexicons with a sizeable number of terms of known polarity, and on statistical-semantic weighting and distribution schemes that assign polarities to unknown words and determine the polarity of blocks of text. Unsupervised methods can be further divided into dictionary-based and corpus-based methods, according to how the lexicon is built [15][16][17]. The dictionary-based approach carries out a forked distributed search (two forks: antonyms and synonyms) in the dictionary for each opinion word. The corpus-based approach ensures context specificity of word orientations by searching a large corpus. Lexicon-based approaches require manual collection of opinion words and have been criticized for requiring too much human effort [15][16][17]. As a solution, the semi-supervised approach starts from an initial list of seed words with annotated polarities and uses synonym-based label propagation to map polarities to unknown words [15][16][18][9].
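Seed-word polarity propagation along synonym and antonym links can be sketched as a breadth-first search: a synonym inherits the polarity of its source word and an antonym receives the opposite polarity. The toy thesaurus is hypothetical (a real system would query a dictionary resource), links are treated as directed, and the first label assigned to a word wins, which is a simplifying assumption rather than a method from the surveyed works.

```python
from collections import deque

# Hypothetical toy thesaurus with directed links.
SYNONYMS = {"good": ["fine", "nice"], "nice": ["pleasant"], "bad": ["poor"]}
ANTONYMS = {"good": ["bad"], "pleasant": ["unpleasant"]}


def propagate(seeds, synonyms, antonyms):
    polarity = dict(seeds)          # word -> +1 / -1
    queue = deque(seeds)            # start the search from the seed words
    while queue:
        word = queue.popleft()
        p = polarity[word]
        # Synonym fork keeps the polarity; antonym fork flips it.
        neighbours = [(w, p) for w in synonyms.get(word, [])]
        neighbours += [(w, -p) for w in antonyms.get(word, [])]
        for w, q in neighbours:
            if w not in polarity:   # first label wins; conflicts are ignored
                polarity[w] = q
                queue.append(w)
    return polarity


print(propagate({"good": 1}, SYNONYMS, ANTONYMS))
```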

IV. RELATED WORK
Many studies have proposed different approaches to sentiment analysis. Most of them focus on English and other languages (Chinese, Italian, Urdu), and comparatively few address Arabic. In this section we first present some important sentiment analysis studies in different languages and then survey the Arabic sentiment analysis studies.
www.ijacsa.thesai.org

A. Sentiment Analysis in General
Moraes et al. [19] compared the performance of SVM (support vector machines) and NN (neural networks) for document-level SA. They showed that NN achieves better performance than SVM on balanced datasets. Rui and Liu [20] investigated differences between pre-consumer (prior to purchase) and post-consumer (after purchase) opinions using NB and SVM classifiers on Twitter data from both classes of users. Li and Li [21] addressed subjectivity and expresser credibility in opinion studies using SVM as the classifier. Wang et al. [22] studied the performance of three popular ensemble methods (bagging, boosting and random subspace) based on five base learners (Naive Bayes, Maximum Entropy, Decision Tree, K-Nearest Neighbour and Support Vector Machines) on sentiment classification tasks. They showed that random subspace achieves the best results.
New developments in supervised learning show a heavy dependence on conceptual analysis. Formal Concept Analysis and Fuzzy Formal Concept Analysis (FCA/FFCA) were employed by Li and Tsai [23], who demonstrated an abstract conceptual classification system for documents and used training examples (FFCA-based conceptual classifier training, as opposed to document-based training) to boost accuracy. Kontopoulos et al. [24] also used FCA to build a domain ontology model. They proposed ontology-based techniques for more efficient sentiment analysis of Twitter posts, breaking each tweet down into a set of aspects relevant to the subject. Poria et al. [25] proposed a novel paradigm for concept-level sentiment analysis that merges linguistics, common-sense computing, and machine learning to improve the accuracy of tasks such as polarity detection. Yang and Cardie [26] proposed an approach that allows structured modelling of sentiment by considering both local and global contextual information. They encode intuitive lexical and discourse knowledge as expressive constraints and integrate them into the learning of conditional random field models via posterior regularization. Tang et al. [27] presented a joint sentence-level segmentation and classification system. Xiang and Zhou [28] used Latent Dirichlet Allocation (LDA) to create topic-specific information and then divided the data into several subsets based on topic distribution. In the final stage, they presented a semi-supervised training system to further increase classification accuracy. They showed that the framework can better handle inconsistent sentiment polarity between a phrase and the words it contains. Tang et al. [29] applied neural networks to learn sentiment-specific word embeddings (SSWE), which encode sentiment information in the continuous representation of words.
Unsupervised approaches also have a long history in SA. Xianghua and Guo [30] presented work in the Chinese-language domain. Their work used an unsupervised approach to automatically segment Chinese social reviews into aspects and compute the sentiment expressed in each aspect. They used LDA for aspect discovery and employed a sliding-window context over the review text to generate local topics and the linked sentiment. Cruz and Troyano [31] presented a taxonomy-based approach in which knowledge about how people express opinions in a given domain is catalogued. They showed that this domain-specific knowledge improves opinion mining accuracy. Huang et al. [32] considered words, symbols or phrases with emotional tendencies as input features. They studied polysemy in single-character emotional words in Chinese and discussed single-character and multi-character emotional words separately. Kiritchenko et al. [33] conducted SA of short informal texts at both the message level and the term level. They generated novel high-coverage, tweet-specific sentiment lexicons from tweets with sentiment-word hashtags and from tweets with emoticons. Pablos et al. [34] used a set of raw texts from a specific domain (the corpus) to build a list of opinion terms for that domain, using seed-list propagation based on rules featuring dependency relations and POS restrictions. Other significant unsupervised methods are introduced in [35][36].
Semi-supervised approaches to SA have recently attracted considerable attention. Tang et al. [37] proposed a semi-supervised approach that uses a correlated model to evaluate different types of emotional signals in Twitter data. The model performs dual learning through controlled, alternating propagation and fitting processes over labelled and unlabelled data. Zhou et al. [38] applied a semi-supervised Fuzzy Deep Belief Network (FDBN) to SA. The deep architecture of FDBN consists of a set of unsupervised hidden layers and a final supervised layer. They comprehensively evaluated state-of-the-art semi-supervised methods for SA, including semi-supervised spectral learning (Spectral), transductive SVM (TSVM), deep belief networks (DBN), personal/impersonal views (PIV), active learning (Active), mine the easy classify the hard (MECH), active deep networks (ADN), fuzzy deep belief networks (FDBN) and active FDBN (AFD). Ortigosa et al. [39] performed a hybrid study that combined machine learning and lexicon-based approaches with a selective logic: machine learning is used when sufficient labelled data are available, and a lexicon-based system otherwise. They argued that their approach would not only extract sentiment but also identify significant changes in emotional signatures. As the foregoing shows, there has been considerable advancement in sentiment analysis for the English-language domain. Many highly conceptual and experimental methods have been developed to improve the performance of basic classifiers, and more work has been done to advance the scope and applicability of supervised, unsupervised, semi-supervised, and hybrid techniques. This may result from abundant research focus in this area, as well as favorable links between the research and profitable industrial applications.

B. Arabic Sentiment Analysis
Many studies have been conducted in the opinion mining field. Most of them address the English language, and only a few address Arabic. In this paper we present studies in the Arabic language context, providing a comprehensive review of recent Arabic sentiment analysis research using a component-by-component approach.
We study the following components: the approach used, the methods (classifiers) used, the data sources used, the Arabic dialects processed, and the sentiment analysis level. We also provide a merit-based assessment of the advantages and disadvantages of the sentiment analysis system used in each work surveyed. As seen in the related work, sentiment analysis approaches are usually divided into four classes: supervised, unsupervised, semi-supervised, and hybrid. Table 1 categorizes the surveyed Arabic SA studies into these classes, and Fig. 1 shows the result.
Arabic sentiment analysis studies used different methods depending on the approach adopted, and some of these methods are dominant. We collected the methods used in the different Arabic SA studies for supervised, semi-supervised, unsupervised and hybrid experiments and present the result in Table 2.
Table 2 shows that the most widely used methods (by far) are Support Vector Machines (SVM), Naive Bayes (NB) and K-Nearest Neighbors (KNN). Lexicon-based approaches are prevalent across the majority of works sampled. Ensemble methods (comprising a variety of techniques) are also gaining significance.
Arabic sentiment analysis studies have used several different text sources, depending on the objective of the study, as outlined in Table 3. Researchers in this domain appear to use tweets, reviews/opinions and comments almost exclusively as datasets for their work on sentiment analysis, which may indicate a focus on social media.
We also investigated the size and diversity of the datasets used in the various Arabic sentiment analysis studies and found significant variety in dataset size. It is more common to find studies that used a single type of data, but in a number of cases multiple data types were combined.
One of the most crucial aspects of this work is the critical review of the surveyed Arabic sentiment analysis studies, with the goal of identifying positive highlights, shortcomings and areas for improvement after a comprehensive review of each study. Our comments are provided in Tables 6 and 7. Summarizing these tables, we have made some conclusive observations about the field of Arabic sentiment analysis. We found that most Arabic sentiment analysis works use supervised methods rather than other classes of sentiment analysis, including unsupervised, semi-supervised and hybrid or experimental systems. Supervised methods require large corpora manually labelled for training and testing, which can be expensive, time-consuming and difficult to produce because of sarcasm, especially in Arabic text [40][41]. The main disadvantage of this approach is that it is domain-biased: it gives low accuracy when applied to a domain other than the one it was trained on. This approach usually uses machine learning methods such as Support Vector Machines, Naïve Bayes classifiers and Maximum Entropy [40][41]. On the other hand, some studies employed the lexicon-based approach, using different techniques to generate sentiment lexicons that contribute to the sentiment analysis task. This approach relies on a list of sentiment words with their polarities to determine the sentiment of a review. It is considered practical because it is not domain-biased. Recently, some researchers have used ontologies in this approach, and such ontologies may also serve other tasks, such as Arabic NLP tools and information retrieval [42]. Dialects are not supported in many of the Arabic SA studies surveyed in this paper. This is a major disadvantage because Arabic is dialectally rich, and the diverse structural properties of its dialects need to be fully captured to derive maximum benefit from Arabic SA, especially for less formal channels such as social media, whose corpora are principally not in Modern Standard Arabic (MSA).
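The lexicon-based approach described above, which uses a list of sentiment words with polarities to decide the sentiment of a review, can be sketched in a few lines of Python. The lexicon entries, weights and negator list are hypothetical, and negation is assumed to flip only the next token; real Arabic lexicons and negation rules are considerably richer.

```python
# Hypothetical polarity lexicon and negators, for illustration only.
LEXICON = {"excellent": 2, "good": 1, "bad": -1, "terrible": -2}
NEGATORS = {"not", "never"}


def lexicon_score(tokens, lexicon=LEXICON):
    score, flip = 0, False
    for tok in tokens:
        if tok in NEGATORS:
            flip = True
            continue
        if tok in lexicon:
            score += -lexicon[tok] if flip else lexicon[tok]
        flip = False  # negation scope: the next token only (simplification)
    return score


def classify(tokens):
    s = lexicon_score(tokens)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


print(classify(["the", "food", "was", "not", "good"]))  # negative
```

No training data is needed, which is why this approach is less domain-biased; its accuracy depends instead on lexicon coverage.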
We also noticed that a limited set of classifiers (techniques) was used repeatedly in many of the papers surveyed. While researchers probably choose this same set of classifiers because they are proven to be effective, little value is added to the field of Arabic SA unless more experimental or conceptually novel techniques are implemented and investigated. Many of the studies pay very little attention to sentence-level sentiment analysis. Most of the observations recorded during this survey lead to the conclusion that Arabic sentiment analysis is still in its growth phase.

S/N | Processing Level | Studies
1 | Sentence-level | [2], [3], [23], [52], [69], [70]
2 | Document-level | [1], [4], [5], [6], [7], [9], [10], [11], [13], [19], [20], [21], [22], [27], [28], [29], [32], [33], [39], [41], [42], [46], [48], [49], [51], [65]
3 | Document-level + Sentence-level | [24]

TABLE VI. ADVANTAGES AND DISADVANTAGES SUMMARY
Paper ID | Advantages | Disadvantages
[1] | Showed an extensive list of features; studied the importance of different features | Disregarded neutral and mixed classes
[2] | The annotations are extensive | No sentiment analysis evaluations on the corpus
[3] | Multi-genre corpus | No sentiment analysis evaluations on the corpus
[4] | Multiple lexicons constructed; integrated lexicon achieves best performance | Dialects are not considered
[5] | Negation and intensification are considered | Neutral class is not included; sarcasm is not considered
[6] | Advanced lexicon construction | Uses an individual word-polarity technique
[7] | Showed an extensive list of features; studied the importance of different features | Could try more classification methods; no details given on how sampling is conducted to obtain a balanced subset of data
[9] | Pre-processing leads to improvement | Tags need to be added manually
[10] | Introduced social-network-specific features | Dialects are not considered
[11] | Besides classification on subjectivity and polarity, also considered intensity classification | Does not deal with emoticons, chat language and Arabizi
[13] | Large-scale lexicon | Could try more classifiers; dialects not considered
[19] | Studied the effects of pre-processing and the characteristics of the dataset | Could try more classifiers; dialects not considered
[20] | Developed three lexicons as well as a negation library; the dataset was large | Intensifications were not considered
[21] | Evaluated methods to learn the weights of the words and combine such weights | Dialects are not considered
[22] | Combination of multiple methods improves the performance | Considered posts from only three domains
[23] | Label propagation is effective for lexicon construction | Only considered sentence level
[24] | Considered both grammar and lexicon | Dialects, suffixes and prefixes not extracted; small dataset
[27] | Developed three lexicons as well as a negation library; the dataset was large | Only sentence level
[28] | Particularly addressed slang language | No benefits for non-slang cases
[29] | NLP is used; word presence feature leads to better performance | Could use more classifiers

TABLE VII. ADVANTAGES AND DISADVANTAGES SUMMARY
Paper ID | Advantages | Disadvantages
[32] | Considered both supervised and unsupervised approaches | Evaluated limited supervised methods
[33] | SentiStrength has better performance than SocialMention | Dialects are not considered
[39] | Semi-supervised lexicon construction | Dialects, Franco-Arabic and compound phrases are not considered (single-word match only)
[41] | Addressed unbalanced classification | Proposed methods didn't show an advantage
[42] | Ensemble classifier achieves better classification | More classifiers can be added to the ensemble
[46] | Extensive feature categories; addressed topic shift | Semi-supervised approach improves subjectivity analysis but not sentiment analysis
[48] | The corpus has good quality | Could include more features
[49] | Determines the polarity of an Arabic corpus using English translation | SA depends on the quality of the translation
[51] | Very detailed investigation of the processing techniques | Pre-processing techniques could be improved by cross-validation; lexicon might not be extensive
[52] | n-gram features are used | The corpus is small and low-frequency terms are ignored
[65] | Sizeable dataset used | Could have used more classifiers
[69] | Used an ensemble of classifiers with a relatively comprehensive dataset | The size of the dataset is small
[70] | Supported dialectal Arabic in addition to MSA | Could have used more classifiers; relatively limited dataset

V. QUANTITATIVE ANALYSIS OF RECENT ARABIC SA RESEARCH
Our primary purpose in performing a quantitative analysis of the performance data reported by the different Arabic sentiment analysis studies is to determine whether, and to what degree, there is a significant difference in the performance outputs (evaluated by accuracy, precision, recall and F-score) across the methods used in the surveyed works, since this knowledge may help identify areas for improvement in current approaches. Table 8 catalogues the reported statistics collated from the surveyed research publications. Note: where multiple results were provided in a work, we selected only the best results. Every attempt was made to state the results as they were originally published by their authors.
83.00% 72.00%
[70] | 99.90% | 99.90% | 99.90% | 99.90%

A. Analysis Technique: Smoothness Analysis
Smoothness analysis is based on arithmetic series in discrete mathematics [42]. For any arithmetic series, we have a first term a_1, a last term a_n and a common difference d such that any member of the series can be represented as:

$$a_k = a_1 + (k - 1)\,d, \qquad k = 1, 2, \dots, n$$

Because real-life data may not always behave as an arithmetic series, the smoothness of a distribution is simply an estimate of the error in the real distribution relative to the projected arithmetic-series distribution [42]:
where S_1 is the first term in the real data series when arranged in increasing order, S_n is the last term in the real data series when arranged in increasing order, and S_bar is the average of the real data series.
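Equation (1) itself is not reproduced in this text, so the following Python sketch only illustrates one plausible reading of the definitions above. It assumes that smoothness is the relative error of the real mean S_bar with respect to the mean of an arithmetic series with the same first and last terms, (S_1 + S_n) / 2; the denominator choice in particular is an assumption, and the function name `smoothness` is hypothetical.

```python
from statistics import mean


def smoothness(scores):
    """Relative error of the observed mean from the arithmetic-series mean.

    ASSUMPTION: an illustrative reading of equation (1),
        |S_bar - (S_1 + S_n) / 2| / ((S_1 + S_n) / 2),
    not a verified reproduction of the paper's formula.
    """
    s = sorted(scores)                 # arrange in increasing order
    s_1, s_n = s[0], s[-1]             # first and last terms of the real series
    projected_mean = (s_1 + s_n) / 2   # mean of the arithmetic series from s_1 to s_n
    if projected_mean == 0:
        raise ValueError("projected mean is zero; relative error is undefined")
    s_bar = mean(s)                    # mean of the real data series
    return abs(s_bar - projected_mean) / projected_mean


print(smoothness([0.8, 0.8, 0.8]))  # 0.0
```

Under this reading, identical scores give zero smoothness, matching the Lemma stated later, and evenly spaced scores also give zero because they already form an arithmetic series.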
Benefit: the smoothness of a distribution as calculated by equation (1) gives us the % error (percentage error) in the straight-line form of the data, and tells us how the data has changed with respect to the different input values (that is, we can evaluate the impact or significance of the different methods used by the research works on the performance scores reported).

B. Local Optimum Problem in Smoothness Analysis
When evaluating the impact of the studies using equation (1) and the data from Table 8, we run into a local optimum problem: an approximately constant score for all performance categories. This is because all the performance scores are less than 1 and are therefore similar.
Lemma: For all pairs of similar values, the smoothness function will return a zero (no impact) result.
This invalidates the analysis unless a solution can be obtained to proportionally amplify the input values (performance scores), so that the validity condition (shown below) can be met: (Validity condition for smoothness function)

C. Solution to Local Optimum: Logarithmic Smoothing
To solve the local optimum problem described above, which would invalidate our analysis according to the Lemma because the performance scores are closely bounded, we explore the logarithmic smoothing technique described in [43], a procedure for proportionally expanding individual elements within a closely bounded range.
Where: r, θ and ϕ are the components of a point in the r-, θ- and ϕ-coordinates; r', θ' and ϕ' are the target projections of r, θ and ϕ; and r_max, θ_max and ϕ_max are the maximum values of r, θ and ϕ.
Benefit: at any point within the sphere (3D space), the function Γ(ln r, ln θ, ln ϕ) gives a smooth projection that is continuous in the r, θ and ϕ directions [43]. With this transformation, the local optimum problem can reasonably be avoided, because the input values are transformed to their smooth projections r → r_smooth, θ → θ_smooth, ϕ → ϕ_smooth, and these values pass the validity condition because r_smooth > r, θ_smooth > θ, ϕ_smooth > ϕ and
For our purposes in this analysis, we simplify this idea as follows. Because we only have 1-dimensional data (each performance parameter is evaluated separately: accuracy only, precision only, recall only and F-score only), the r-coordinate alone is sufficient, so we remove the unnecessary coordinates (θ, ϕ) by setting them to 1. This reduces equation (2) to a form applicable to our analysis:
Γ(ln r, ln 1, ln 1) = Γ(ln r, 0, 0) = r′(1)(1)e
which we can write as:
Conclusion: equation (3) above is the logarithmic smoothing function that we use in our analysis to solve the local optimum problem. Outcome: the data are proportionally amplified, so the behavior of the data remains unchanged while small differences are much more easily visualized and evaluated.

D. Results of The Analysis
Comparative Results - Accuracy: Table 9 shows the raw accuracy scores and the converted logical scores used in the smoothness analysis.
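The smoothness analysis evaluates the percentage error of the reported scores from their linearly projected values. The smoothness function itself is not reproduced in this section, so the sketch below is one hypothetical reading of that description: sort the (converted) scores in increasing order, fit a least-squares straight line against rank, and take the mean absolute deviation from that line as a fraction of the projected value. The name `smoothness` and the exact error measure are assumptions, and the sketch is not expected to reproduce the paper's reported values. A result near 0 means the sorted scores lie close to their trend-line (a smooth curve); larger values indicate a rougher curve.

```python
from statistics import fmean
from typing import Sequence


def smoothness(scores: Sequence[float]) -> float:
    """Hypothetical smoothness measure (assumed form, not the paper's function).

    Mean absolute percentage deviation of the sorted scores from their
    least-squares linear projection, returned as a fraction:
    0.0 means the sorted scores lie exactly on a straight line.
    """
    ys = sorted(scores)  # arrange scores in increasing order
    n = len(ys)
    if n < 3:
        raise ValueError("need at least three scores to judge smoothness")
    xs = range(n)  # rank of each sorted score
    x_mean, y_mean = (n - 1) / 2, fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    projected = [intercept + slope * x for x in xs]  # linear trend-line
    if any(p <= 0 for p in projected):
        raise ValueError("percentage error needs positive projected values")
    # Average absolute deviation from the trend-line, relative to the projection.
    return fmean(abs(y - p) / p for y, p in zip(ys, projected))
```

With this reading, perfectly evenly spaced scores such as `[1, 2, 3, 4]` give exactly 0.0 regardless of input order, while a set with one outlying score, such as `[10, 11, 12, 20]`, gives a clearly positive value.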
To arrive at the logical scores shown in Table 9 (used for the analysis), we applied the logarithmic smoothing function with r′ = 1000, calculated the converted scores, and arranged them in increasing order. For this table, r_max (the largest element of r) is 0.999. The smoothness calculated with the smoothness function for this dataset is 0.0463 (slightly rough), indicating that the different methods used have no significant impact on accuracy; see Fig. 3 for a visualization (the plot correlates well with its trend-line). Using the same process, we obtained smoothness values of 0.10973023, 0.12700562, and 0.045964546 for precision, recall, and F-score, respectively. These results lead to the following conclusions: the different methods used have a slight impact on precision and recall (see the slightly rough curves in Fig. 3, Fig. 4, and Fig. 5, where the plots do not correlate very well with their trend-lines), but no significant impact on accuracy and F-score (see the smooth curves in Fig. 3 and Fig. 4, where the plots correlate well with their trend-lines).

In this paper we have surveyed the important Arabic sentiment analysis studies qualitatively and quantitatively. We have presented detailed analyses of the methods used and results obtained in current Arabic sentiment analysis studies, as well as a rich discourse on the direction of current research and its present limitations. In our qualitative evaluation, we found that the majority of Arabic SA studies use established supervised methods rather than more progressive or experimental unsupervised and semi-supervised approaches. Dialects are not processed in many of the Arabic SA studies surveyed, which is a major drawback for the effectiveness of current Arabic SA, because most of the Arabic text available on social media and in other spaces spans a wide range of distinct, autonomous, and morphologically complex Arabic dialects. We also observed that many of the studies surveyed used the same limited set of classifiers, raising questions about the value added to the field if every study essentially repeats the same experiment on a different dataset. There is a definite need for more inventiveness and creativity in experimental design, as well as for the development of novel classification and analysis techniques beyond the established algorithms.
For our quantitative evaluation, we applied rigorous data modelling and statistical procedures to investigate the effectiveness of the methods adopted by the researchers in the Arabic SA works surveyed. We collected performance data (accuracy, precision, recall and F-score) for the various studies and applied techniques including logarithmic smoothing field analysis and a relative smoothness function to uncover patterns in the performance data. Our approach was based on the reasoning that similar processes will produce similar results: the various Arabic SA studies are not differentiable if they all produce similar results across the performance classes (accuracy, precision, recall and F-score), whereas significant variance in results indicates an opportunity for improvement. The analysis yielded the following conclusion: the different methods used have only a slight impact on the precision and recall of the results obtained, and no significant impact on accuracy and F-score. This ultimately leads us to conclude that Arabic SA researchers should employ a more diverse set of techniques and approaches that do more to improve scores across the full range of performance parameters.
For future work, we believe there is a promising path toward an optimal Arabic SA system. We intend to propose and develop a new hybrid method using deep learning techniques and big data technologies such as Hadoop and MapReduce, both to solve some of the existing problems in Arabic sentiment analysis highlighted in this survey and to obtain an optimal system for Arabic SA. As we have seen, most of the work in the field of Arabic sentiment analysis has focused on supervised learning techniques and largely lexicon-based approaches, with their characteristic limitations. We believe that growth in this field will be driven by the exploration of unsupervised learning techniques, principally through hybrid methods.

www.ijacsa.thesai.org

have a valid analysis (by the validity condition for smoothness function).

Fig. 2 shows the effect of applying the logarithmic smoothing function, equation (3), in transforming data from closely bounded spaces (x-space) to loosely bounded spaces (L-space).

TABLE I. ARABIC SA STUDIES BY APPROACH

TABLE II. ARABIC SA STUDIES BY METHOD

TABLE III. ARABIC SA STUDIES BY DATA

Table 4 presents an overview of the Arabic dialect distribution in the studies surveyed. As the table shows, Modern Standard Arabic (MSA) sources are widely used throughout the studies sampled in this survey. Where dialects are used, Egyptian (MSA/Egyptian) is the most favored. There are also works with Levantine, Khaliji, Arabizi, Mesopotamian, Syro-Palestinian, Middle East Region, and Informal (Lebanese, Syrian, Iraqi, Libyan, Algerian, Tunisian, and Sudanese) dialects.

TABLE IV. ARABIC SA STUDIES BY LANGUAGE

TABLE V. ARABIC SA STUDIES BY PROCESSING LEVEL

TABLE VIII. REPORTED STATISTICS FROM SURVEYED STUDIES

Table 10 presents a summary of the smoothness results obtained from the experiments. As this analysis shows, accuracy and F-score are not significantly impacted by the different methods adopted by the researchers in the studies surveyed, whereas precision and recall show a slight response to those methods.

Table 8: Accuracy distribution by logical score and