Arabic Phrase-level Contextual Polarity Recognition to Enhance Sentiment Arabic Lexical Semantic Database Generation

Abstract: Most opinion mining work relies on lexical resources that record the polarity of words (positive/negative) regardless of their context; this is called prior polarity. A word's prior polarity may change when the word is considered in context: for example, positive words may be used in phrases expressing negative sentiments, or vice versa. In this paper, we aim to generate a sentiment Arabic lexical semantic database in which each word's prior polarity is coupled with its contextual polarities and the related phrases. To do so, we first study the effect of each word's prior polarity, taken from our Sentiment Arabic Lexical Semantic Database, on sentence-level subjectivity using a Support Vector Machine classifier. We then use the seminal English two-step phrase-level contextual polarity recognition approach to enhance word polarities within their contexts. Our results show a significant improvement over the baselines.


I. INTRODUCTION
Opinion mining is the task of distinguishing between subjective and objective sentiments in text. Most opinion mining work has been carried out at the document level, while few studies have investigated feature design at the sentence level. A single sentence may carry positive, negative, and neutral opinions at once, for example ["ظللت أعمل بجد واجتهاد طيلة الأشهر الماضية لكن النتائج كانت سيئة"] ["I have been working hard and diligently over the past few months, but the results were bad"]. It is also difficult to mark subjective phrase boundaries accurately. Polarity classification at the phrase level may therefore differ substantially from the sentence level and the document level: the resulting bag-of-words feature vectors tend to be very sparse, which lowers classification accuracy [1].
We used SentiRDI [2], a large set of subjective clues coupled with their prior polarities; subjective clues are words with polar (positive/negative) prior polarities. We considered each phrase containing one of these clues and classified its contextual polarity. To classify the contextual polarities, we used the seminal English approach [3], which first determines whether a phrase is polar or neutral and then passes the polar phrases to a second classifier that determines the polarity of each one. In our research, all annotations and classification results were manually revised and assessed. For the classification assessment, we used F-measure (F), Precision (P), and Recall (R).
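The three assessment measures can be illustrated with a minimal Python sketch. It assumes the standard per-label definitions of Precision, Recall, and the balanced F-measure; the function name and the toy labels are hypothetical and are not taken from the paper's tooling.

```python
def precision_recall_f(gold, predicted, target):
    """Return (precision, recall, F-measure) for one target label."""
    pairs = list(zip(gold, predicted))
    # Count true positives, false positives and false negatives for `target`.
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Toy labels, invented for illustration only.
gold = ["positive", "positive", "negative", "neutral"]
predicted = ["positive", "negative", "negative", "positive"]
print(precision_recall_f(gold, predicted, "positive"))
```

Each label (neutral, polar, positive, negative) would be scored separately in this way.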
This paper is organized as follows: Section II briefly describes the main related work on contextual polarity. Section III gives an overview of the prior polarity subjectivity Arabic database (SentiRDI). Section IV describes the corpus used for the sentence subjectivity classifier and for contextual polarity. Section V describes sentence subjectivity classification using a Support Vector Machine (SVM). Section VI explains the contextual polarity influencers and the proposed features used in the two-step phrase-level classification approach [3]. Section VII shows the experimental results for contextual polarity. Section VIII analyzes the experimental results. Finally, Section IX draws our conclusions and outlines future work.

II. CONTEXTUAL POLARITY RELATED WORK
Many studies have addressed the contextual polarity recognition task at various textual levels, such as [1,3,4,5]. They mainly classify expressions related to subjective clues, and they often use manually developed lexicons to help classify polarities. To our knowledge, there is no robust and tested phrase-level contextual polarity study for Arabic.

III. PRIOR POLARITY SUBJECTIVITY DATABASE
Our approach uses an Arabic lexical resource for opinion mining (SentiRDI) [2], which records the subjectivity and orientation of more than 18,400 semantic fields covering over 150,000 Arabic words. The subjective semantic fields in the database are the subjective clues [1,3]: words used to express private states [6], mainly opinions, emotions, evaluations, stances, speculations, etc.

IV. RESEARCH CORPUS
We translated into Arabic the MPQA opinion corpus (http://mpqa.cs.pitt.edu/), which consists of 535 English-language news articles from a variety of sources, manually annotated [7] for subjectivity analysis. The corpus consists of 9700 sentences, 55% of which are labeled as subjective, while the rest are objective. We consider only 3578 sentences with 18,678 subjective phrases. A subjective phrase is an expression that contains a subjective clue (a term that has a subjective prior polarity). The translated annotations were manually revised and corrected by all authors.

V. SUBJECTIVITY CLASSIFICATION
Simple text preprocessing was first applied to remove special characters and non-Arabic characters from the corpus. More advanced preprocessing then prepared the text as input for the SVM algorithm: extracting named entities using [8], assigning Part of Speech (POS) tags using the Research and Development International (RDI) tagger, and assigning the prior polarity of each word using SentiRDI. The features extracted from each sentence include the following.
Average polarity of the sentence:
\[ \text{AvgPolarity}(S) = \frac{1}{n} \sum_{i=1}^{n} P_{w_i} \]
where $n$ is the number of words in the sentence and $P_{w_i}$ is the polarity of word $i$ in the sentence, as given by the prior polarity database (SentiRDI).
Average Term Frequency-Inverse Sentence Frequency (TF-ISF) of a sentence $S$:
\[ \text{AvgTFISF}(S) = \frac{1}{\|S\|} \sum_{t=1}^{\|S\|} TF_{t,S} \times ISF_{t} \]
where TF is the number of occurrences of each term within the sentence, normalized by dividing it by the size of the sentence:
\[ TF_{t,S} = \frac{N_{t,S}}{\|S\|} \]
where $N_{t,S}$ is the number of occurrences of term $t$ in sentence $S$ and $\|S\|$ is the number of words in sentence $S$. ISF gives more weight to terms that appear in a small number of sentences. This factor is useful because subjective terms are few compared with neutral (objective) ones. In its standard logarithmic form,
\[ ISF_{t} = \log \frac{S}{S_{t}} \]
where, in this last formula only, $S$ is the number of all sentences in the corpus and $S_{t}$ is the number of sentences containing term $t$.
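As a rough illustration of the two sentence-level features described above, the following Python sketch computes an average polarity and an average TF-ISF for a tokenized sentence. Several details are assumptions rather than the paper's stated method: polarities are coded as +1, -1 and 0, ISF uses the natural logarithm of (number of sentences / number of sentences containing the term), the sum runs over word positions, and the function names and toy data are hypothetical.

```python
import math


def average_polarity(sentence, prior_polarity):
    """Mean prior polarity of the words in a tokenized sentence.

    `prior_polarity` maps a word to +1, -1 or 0 (assumed coding);
    words missing from the lexicon count as 0.
    """
    if not sentence:
        return 0.0
    return sum(prior_polarity.get(w, 0) for w in sentence) / len(sentence)


def average_tf_isf(sentence, corpus):
    """Average TF-ISF over the word positions of `sentence`.

    `corpus` is a list of tokenized sentences and is assumed to
    contain `sentence` itself, so every term occurs at least once.
    """
    if not sentence:
        return 0.0
    total = 0.0
    for term in sentence:
        tf = sentence.count(term) / len(sentence)      # normalized term frequency
        containing = sum(1 for s in corpus if term in s)
        if containing == 0:                            # term unseen in corpus: skip
            continue
        isf = math.log(len(corpus) / containing)       # inverse sentence frequency
        total += tf * isf
    return total / len(sentence)


# Toy data with English placeholder tokens, for illustration only.
lexicon = {"good": 1, "bad": -1}
corpus = [["good", "day"], ["bad", "day"]]
print(average_polarity(corpus[0], lexicon))
print(average_tf_isf(corpus[0], corpus))
```

A term that occurs in every sentence ("day" in the toy corpus) gets an ISF of zero, so only the rarer, potentially subjective terms contribute to the average.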
The above contextual polarity influencers are extracted from our corpus and used as features in our classifiers, as described below. To classify the contextual polarities of the subjective expressions, we first determine whether each clue instance is neutral or polar in its context. Neutral clues are words that have non-neutral prior polarities but neutral contextual polarities; polar clues are words that have non-neutral prior polarities and non-neutral contextual polarities. Second, all polar clues resulting from the first step are classified further to determine whether each polar clue instance has a positive or a negative contextual polarity.
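The two-step scheme can be sketched in Python as a simple cascade. This is only an illustration: in the paper both steps are SVM classifiers, whereas here each step is any callable, and the toy rules, feature names, and function names below are hypothetical stand-ins.

```python
def classify_contextual_polarity(features, neutral_polar_clf, polarity_clf):
    """Two-step cascade over the features of one clue instance.

    Step 1 decides "neutral" vs "polar"; step 2 runs only on polar
    instances and decides "positive" vs "negative".
    """
    if neutral_polar_clf(features) == "neutral":
        return "neutral"
    return polarity_clf(features)


# Toy stand-ins for the two trained classifiers (assumptions).
def toy_neutral_polar(features):
    return "neutral" if features["obj_shifter"] else "polar"


def toy_polarity(features):
    flipped = {"positive": "negative", "negative": "positive"}
    prior = features["prior_polarity"]
    return flipped[prior] if features["negation"] else prior


instance = {"prior_polarity": "positive", "negation": True, "obj_shifter": False}
print(classify_contextual_polarity(instance, toy_neutral_polar, toy_polarity))
```

Keeping the two steps separate means the second classifier is trained and evaluated only on instances judged polar, which matches the description above.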

B. Baseline (prior polarity classifier)
We created a simple prior polarity classifier (TABLE I) that assumes the contextual polarity of a clue instance equals the clue's prior polarity. We applied this classifier to all 18,678 subjective expressions extracted from the translated MPQA corpus. The classifier has an accuracy of 48.45%; TABLE I describes its results.
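A minimal Python sketch of such a baseline is given below. The lexicon entries, the clue instances, and the function names are invented for illustration; they are not drawn from SentiRDI or from the corpus.

```python
def baseline_predict(clue, prior_polarity):
    """Predict the contextual polarity as the clue's prior polarity."""
    return prior_polarity[clue]


def accuracy(gold, predicted):
    """Fraction of instances whose predicted label matches the gold label."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


# Hypothetical lexicon and (clue, gold contextual polarity) instances.
prior_polarity = {"good": "positive", "bad": "negative"}
instances = [
    ("good", "positive"),
    ("good", "neutral"),   # polar prior, neutral in context
    ("bad", "negative"),
    ("bad", "positive"),   # polarity flipped by context
]
gold = [label for _, label in instances]
predicted = [baseline_predict(clue, prior_polarity) for clue, _ in instances]
print(accuracy(gold, predicted))
```

Because every clue has a non-neutral prior polarity, this baseline can never predict "neutral", so every clue that is neutral in context counts as an error.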

C. Features of Neutral-polar classification
The neutral-polar classifier distinguishes neutral clues from polar ones. The feature set used in this classifier is:
Word: the word token of the subjective clue (SC), i.e. a word that has a non-neutral prior polarity.
Semantic ID of SC: the identifier of the word's semantic field in RDIArabSemanticDB. This feature is designed to help recognize the meaning of the SC, decreasing word sense ambiguity.
POS of SC: the part of speech of the subjective clue. We used the Stanford Log-Linear Part of Speech Tagger to extract POS tags.
POS of previous word: the POS tag of the word preceding the SC.
POS of next word: the POS tag of the word following the SC.
Prior polarity of SC: the prior polarity of the subjective clue from SentiRDI. This feature has a binary value: (0) if it is positive or (1) if it is negative.
NER_SC: a binary feature indicating whether the subjective clue is a named entity.
SC_before: a binary feature indicating whether the subjective clue is preceded by another subjective clue.
SC_After: a binary feature indicating whether the subjective clue is followed by another subjective clue.
Self_intensifier: a binary feature indicating whether the subjective clue is itself an intensifier.
Intensifier_before_after: a binary feature indicating whether there is an intensifier before or after the subjective clue.
Connector: a binary feature indicating whether there is a connector ["و", "أو"] ["and", "or"] between two subjective clues (in this case they have the same polarity).
Shift_conn: a binary feature indicating whether there is a connector ["لكن", "بالرغم من"] ["but", "in spite of"] between two subjective clues (in this case they have opposite polarities).
Obj_shifter: a binary feature indicating whether there is an objective shifter before the subjective clue.
Self_obj_shifter: a binary feature indicating whether the subjective clue is itself an objective shifter.
www.ijacsa.thesai.org

D. Features of Polarity classification
This is the second-step classifier. It takes all polar expressions produced by the first-step neutral-polar classifier and determines whether their contextual polarity is positive or negative. The feature set used in this classifier is:
Word: the word token of the subjective clue (SC), i.e. a word that has a non-neutral prior polarity.
Semantic ID of SC: the identifier of the word's semantic field in RDIArabSemanticDB, which helps recognize the meaning of the subjective clue, decreasing word sense ambiguity.
Prior polarity of SC: the prior polarity of the subjective clue extracted from SentiRDI. This feature has a binary value: (0) if it is positive or (1) if it is negative.
Prior polarity of next word: the prior polarity of the word following the SC.
Prior polarity of previous word: the prior polarity of the word preceding the SC.
Self_intensifier: a binary feature indicating whether the SC is itself an intensifier.
Intensifier-before-after: a binary feature indicating whether there is an intensifier before or after the subjective clue.
Connector: a binary feature indicating whether there is a connector ["و", "أو"] ["and", "or"] between two subjective clues (in this case they have similar polarities).
Shift_conn: a binary feature indicating whether there is a connector ["لكن", "بالرغم من"] ["but", "in spite of"] between two subjective clues (in this case they have opposite polarities).
Negation: a binary feature indicating whether the subjective clue is preceded by one of the negation tools. Here, we consider a 4-word window before the subjective clue to deal with longer-distance dependencies.
General polarity shifter: a binary feature indicating whether the subjective clue is preceded by a general shifter; these shifters alter the polarity to its opposite.
Negative polarity shifter: a binary feature indicating whether the subjective clue is preceded by a negative shifter; these shifters turn the polarity negative.
Positive polarity shifter: a binary feature indicating whether the subjective clue is preceded by a positive shifter; these shifters turn the polarity positive.
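Two of the binary context features above, Negation with its 4-word window and Connector, can be sketched in Python as follows. This is a hedged illustration: the negation word list is an assumed sample (the paper does not enumerate its negation tools), the text is assumed to be already tokenized with connectors as separate tokens, "between two subjective clues" is interpreted as a connector sitting directly between two adjacent clues, and the function names are hypothetical.

```python
NEGATIONS = {"لا", "لم", "لن", "ليس"}   # assumed sample of negation tools
CONNECTORS = {"و", "أو"}                # "and", "or", as listed above


def negation_feature(tokens, clue_index, negations=NEGATIONS, window=4):
    """1 if a negation word occurs in the `window` tokens before the clue."""
    start = max(0, clue_index - window)
    return int(any(tok in negations for tok in tokens[start:clue_index]))


def connector_feature(tokens, clue_index, clue_indices, connectors=CONNECTORS):
    """1 if a connector sits directly between this clue and a neighboring clue."""
    for step in (-1, 1):
        mid, other = clue_index + step, clue_index + 2 * step
        if 0 <= mid < len(tokens) and 0 <= other < len(tokens):
            if tokens[mid] in connectors and other in clue_indices:
                return 1
    return 0


# Toy sentence: "the show was not good or useful"; clues at positions 3 and 5.
tokens = ["لم", "يكن", "العرض", "جيدا", "أو", "مفيدا"]
clues = {3, 5}
print(negation_feature(tokens, 3), connector_feature(tokens, 3, clues))
```

The remaining binary features (intensifiers, shifters, Shift_conn) could follow the same pattern with their own word lists.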

VII. EXPERIMENTAL RESULTS OF CONTEXTUAL POLARITY
The objective of the experiments is to classify the contextual polarities of the expressions that contain instances of the subjectivity clues from SentiRDI. A Support Vector Machine (SVM) is used for the classification task. To classify the contextual polarities of subjective expressions, we first determine whether clue instances are neutral or polar in context (the results of this classifier are shown in Table II). Second, all polar clues resulting from the first step are classified further to determine whether each polar clue instance has a positive or a negative contextual polarity (the results of this classifier are shown in Table III).

VIII. ANALYSIS OF EXPERIMENTAL RESULTS
As shown above, the contextual polarity recognition task (Table II, polar results) improves on the classification obtained from prior polarities alone (Table I). In addition, the selected features surpass both baseline classifiers (Table II and Table III). The final output of this research is SentiRDI augmented with contextual polarities and the related phrases or examples; APPENDIX A shows some samples of our output.
From our experiments, we found that the quality of the prior polarity and the contextual polarity depends on many prerequisite Natural Language Processing (NLP) tasks. These tasks are very useful for acquiring the prior and contextual polarities of the subjective clues; unfortunately, they also add incremental error to our target task. The prerequisite NLP tasks are:
Normalization of written Arabic: in Arabic, some letters have several written forms. For example, "Alif" ("a") has four forms ["ا", "أ", "إ", "آ"], "Yaa" has two forms ["ي", "ى"], and "Taa el marpouta" and "el haa el marpouta" are written ["ة", "ه"].
Arabic parser: unfortunately, there is still no highly accurate public parser for Arabic, due to the language's highly inflectional nature, its complexity, and its various sources of ambiguity (lexical, structural, and semantic).
Named Entity Recognition: we used only the named entities extracted by [8], so our results were strongly affected by its performance.
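The normalization task in the list above can be illustrated with a short Python sketch. The choice of canonical form for each letter group (bare Alif, dotted Yaa, Haa) is an assumption made for this example; the paper only notes that the variant forms exist, and the function name is hypothetical.

```python
# Map variant letter forms to one assumed canonical form each.
_NORMALIZATION_TABLE = str.maketrans({
    "\u0623": "\u0627",  # Alif with hamza above  -> bare Alif
    "\u0625": "\u0627",  # Alif with hamza below  -> bare Alif
    "\u0622": "\u0627",  # Alif with madda        -> bare Alif
    "\u0649": "\u064A",  # Alif maqsura (dotless) -> Yaa
    "\u0629": "\u0647",  # Taa marbuta            -> Haa
})


def normalize_arabic(text):
    """Collapse the variant letter forms so equal words compare equal."""
    return text.translate(_NORMALIZATION_TABLE)


# Two spellings of the same name become identical after normalization.
print(normalize_arabic("\u0623\u062D\u0645\u062F") == normalize_arabic("\u0627\u062D\u0645\u062F"))
```

Applying the same normalization to both the corpus and the lexicon entries keeps lookups consistent, at the cost of merging a few genuinely distinct words.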

IX. CONCLUSION AND FUTURE WORK
In this paper, we study the seminal English two-step phrase-level contextual polarity recognition approach [3] to enhance word polarities within their contexts in Arabic. Using this approach, we are able to automatically identify the contextual polarities for our large set of Arabic sentiment expressions, achieving results that are significantly better than the baselines. Our main contribution is a sentiment Arabic lexical semantic database (SentiRDI) in which word prior polarities are coupled with their contextual polarities and the related phrases (APPENDIX A).
In the future, we are going to extend the database based on further analysis of existing English opinion mining corpora. We also intend to build our own examples and sentences to improve classifier performance with Arabic polar and neutral examples.
The word Part of Speech (POS): the RDI-ArabMorphoPOS tagger was used [9]. We used our prior polarity semantic database (SentiRDI) to determine the polarity of each word and to acquire the following four features: number of positive nouns; number of positive verbs; number of negative nouns; number of negative verbs. A further feature is the average polarity of the sentence.

TABLE II
Table II presents the results of the neutral-polar classifier for the 15-feature classifier and two baseline classifiers. Table III presents the results of the polarity classifier for the 13-feature classifier and two baseline classifiers. The two baseline classifiers are the word token (WT) classifier and the word token with prior polarity (WT + PP) classifier.

TABLE III. STEP 2 SVM CLASSIFIER RESULTS