Feature Fusion for Negation Scope Detection in Sentiment Analysis: Comprehensive Analysis over Social Media

Abstract: Negation control is essential for sentiment analysis and for effective decision support systems. Negation control includes the identification of negation cues, the scope of negation, and their influence within that scope. Negation can either shift or change the polarity score of an opinionated word. This paper presents a framework that fuses text feature extraction with negation cue and scope detection techniques to enhance the performance of recent sentiment classifiers under negation. We explore the text features POS, BOW and HT together with negation cue and scope detection techniques for classification over social media data sets. The evaluation covers four sentiment classifiers (Support Vector Machine, Naive Bayes, Linear Regression and Random Forest) and nine feature fusion cases over the presented preprocessing framework. The paper yields interesting results about the collective response of feature fusion for negation scope detection and classification techniques. The feature fusion vector significantly increases the polarity classification accuracy of the sentiment classification techniques. POS with a grammatical dependency tree detects negation with better accuracy than the other feature fusion cases.


I. INTRODUCTION
Sentiment analysis (SA) is the computational analysis of the opinions, attitudes and emotions of a speaker or writer towards some topic, and the identification of non-trivial, subjective information in a text repository. Before the term sentiment analysis came into existence [1], this area was known as opinion mining, point of view and subjectivity. At present, SA is a rapidly growing field owing to the rise of online message-sharing platforms such as blogs, social media and commercial websites. Every day, billions of people share their experiences, knowledge and views on the latest trends in politics, economics and other issues of global importance. Sentiment analysis, subjectivity and opinion mining have recently attracted significant interest from both the research community and marketing agencies [2]. The main purpose of sentiment analysis is to rank an opinion according to its level of positive, negative or neutral polarity [3]. Sentiment analysis has many applications, ranging from product analysis [4] to improving sales and marketing strategies, predicting stock market fluctuations [5], identifying ideological changes in political issues [6], predicting film critics' opinions [7] and electronic government regulation [8], that is to say, gauging the opinion of citizens on a law before its approval. Although a lot of work has been done in the area of sentiment analysis, open challenges remain: multilingual SA strategies, classifying sentences that contain slang, symbols, misspelled words and idiomatic expressions, SA of sarcastic sentences, and handling negation and identifying polarity in negative sentiments [3]. In this paper we summarize the effect of negation cues on sentiment analysis and present a comparative analysis of recent text feature extraction, negation cue and scope detection techniques. The paper presents a framework for scrutinizing and preprocessing social media data sets and formulates supervised classification with feature fusion for negative sentiment analysis.
The rest of the paper is organized as follows. Section 2 presents an overview of negative sentiment analysis. Section 3 covers related work on negation handling mechanisms for sentiment analysis and polarity detection over social media data sets. Section 4 presents a framework for scrutinizing and preprocessing social media data sets; subsections 4(A-C) explain, respectively, how social media data are processed and preprocessed, the negation cue and scope detection techniques for efficient SA, and the experimental content for performance evaluation. Section 5 describes the experimental setup for the comparative evaluation of different scope detection techniques combined with classification approaches for sentiment analysis over social media. Finally, Section 6 concludes the paper and outlines the findings and future work.

II. NEGATION SENTIMENT ANALYSIS
Negation can be defined as a linguistic phenomenon. It acts as a polarity influencer that can affect the meaning or semantics of a sentence; for example, the polarity of a sentence may change from positive to negative, which in turn swings the polarity strength. Handling this requires dedicated treatment of negation in SA. The author in [9] states that negation is a complex phenomenon studied under different disciplines. In NLP, negation is considered an operator, and scope is a principal feature of operators, i.e. a negation influences the meaning of the other phrases of the sentence within its scope. Negation can not only change the meaning of single words or phrases but also reduce the polarity of opinionated words. For example, consider the following sentences S1, S2 and S3.
Sentence (S1): This Sunscreen Lotion is not costly but it suits me.
Sentence (S2): The product doesn't have nice packaging but really effective.
Sentence (S3): This sunscreen is less relevant for fairer skin.
www.ijacsa.thesai.org
In sentence S1, the scope of the negation "not" is limited to the word immediately after it, i.e. "costly"; the negation inverts only that word and leaves "suits" unaffected. In sentence S2, by contrast, the scope of the negation "not" extends to the end of the sentence. Sentence S3 uses the diminisher "less", which reduces the polarity of the opinionated words instead of completely reversing it. The method used to handle negation in sentiment analysis depends on the type of negative linguistic pattern and the class of negative word used in the sentence, as shown in Fig. 1 and Table I. Depending on the linguistic pattern of the assertion, negation in a negative sentence may occur explicitly (with explicit cues such as not, no, etc.) or implicitly (with implicit cues such as scarcely, hardly, few, seldom, little, only, etc.). If a negative opinion is expressed through an opinionated word that itself encodes negation, the negation is implicit; if standard negation cues are used together with an opinionated word, the negation is explicit. The explicit and implicit negation cues are listed in Table I. For example, consider the following sentences S4, S5 and S6.
Sentence (S4): This music system is not good.
Sentence (S5): My personal experience to use this music system is horrible.
Sentence (S6): Sound system of this music system is superb, I'm suffering from headache after enjoying the song!!!
Sentence (S7): This music system is irrelevant for oldies!!!
Sentence S4 expresses an explicit negative sentiment about the music system, whereas sentence S5 uses the opinionated word "horrible", which encodes a negative sentiment about the music system. Sentence S6 uses irony to convey its negative sentiment about the product. At the structural level, a negative sentence may appear with morphological, syntactic, contrast, compound or non-negative negation. In morphological negation, the negative meaning is produced by modifying the opinionated word with a prefix (e.g. ir-, non-, un-) or a suffix (e.g. -less). In syntactic negation, explicit negation cues are used to revise the polarity of a single opinionated word or a sequence of words. For instance, sentence S7 uses morphological negation to express its negative concern about the music system, whereas S1, S2, S4 and S5 are syntactically negative sentences. In contrast negation, the negative expression shows contrast or opposition between opinionated terms, while compound negation expresses comparison or inequality between opinionated terms. In non-negative negation, which is used in interrogative and conditional sentences, the negative cues and opinionated terms may not carry any opinion or sentiment. For instance, sentences S8, S9 and S10 show contrast, compound and non-negative negation, respectively.
Contrast negation (S8): I brought this cell phone not for camera resolution but for its MP3.
Compound negation (S9): Touchscreen of cell phone is not better than other.
Non-negative negation (S10): Is Sound quality of this cell phone is not good?
Intensifier and diminisher phrases act as valence shifters in negation. A valence shifter usually degrades or upgrades the polarity strength instead of inverting the polarity of the opinionated word. For instance, sentences S11 and S12 show intensifier-based and diminisher-based valence shifters in negation. The term "very much" in sentence S11 weakens the negative polarity expressed by "not relevant", while the term "less" in S12 shifts the positive polarity of "relevant" slightly towards the negative.
Negative intensifier (S11): This Sunscreen is not very much relevant for me.
Negative diminisher (S12): Effect of this Sunscreen is less relevant for beach outing.

III. RELATED WORK
Handling negation in sentiment analysis requires identifying negation terms (cue detection) and recognizing their linguistic influence (scope detection). Recent research has focused on identifying the grammatical structure of negative cues in order to frame supervised syntactic rules for training purposes [10], [11], [12], [13]. Ghiassi et al. [14] applied supervised rules for polarity score calculation and tagged opinionated terms with six polarity levels, i.e. "XP" (extremely positive), "VP" (very positive), "SP" (somewhat positive), "SN" (somewhat negative), "VN" (very negative) or "XN" (extremely negative), using the information gain feature extraction technique. Apple et al. [15] present a probabilistic classifier based on fuzzy set theory that categorizes polarity intensity into up to five levels, from mild to most intensive: poorly, slight, moderate, very and most intensive sentiment words. Garcia et al. [16] present a probabilistic classifier to highlight negativity. Korkontzelos et al. [17] use part-of-speech (POS) tags to evaluate the grammatical dependency between negation cues and opinionated words in the medical domain. Diamantini et al. [18] use a depth-first search (DFS) strategy to build a grammatical dependency tree for identifying negation cues. Tian Kang et al. [19] use Conditional Random Fields (CRF) with 'BIO' tagging to represent the boundaries of negation cues. Prollochs et al. [20] use a manually labeled data set to predict negation cues and their scopes by reinforcement learning and machine learning techniques.
Polarity shifts caused by negative cues affect sentiment analysis performance. Recent research has focused on arithmetical techniques for discriminating and evaluating explicit and implicit polarity shifts. Tellez et al. [21] use a rule-based method to spot polarity shifts in explicit negations and contrasts. Ghiassi et al. [14] use BOW to handle valence shifters such as intensifiers, diminishers and sarcasm.
Jimenez-Zafra et al. [22] use the SFU Review-NEG corpus for a supervised polarity classification system. AL-Sharuee et al. [23] handle intensifiers and negation using SentiWordNet and use an antonym dictionary to replace adjectives and adverbs that follow negation terms with their opposite sentiment words.

IV. COMPARATIVE ANALYSIS
This paper presents a four-tier framework for the feature fusion of text feature extraction (POS, BOW and HT) and negation scope detection techniques. The comparative analysis reveals interesting and useful facts about the state of the art of four benchmark sentiment classifiers with feature fusion (as listed in Table II). The proposed framework is used to compare the performance of supervised sentiment classifiers after preprocessing and feature fusion for negation sentiment classification, as shown in Fig. 2.
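The nine feature fusion cases evaluated in this paper pair each of the three text features (POS, BOW, HT) with each of the three scope detection techniques (CWA, PMI and GDT, described in Section IV-C). The following is a minimal sketch that enumerates these pairs. The function name `fusion_cases` is hypothetical, and the enumeration order is an assumption that is not meant to reproduce the case numbering of Table II.

```python
from itertools import product

# Abbreviations taken from the text.
TEXT_FEATURES = ["POS", "BOW", "HT"]
SCOPE_TECHNIQUES = ["CWA", "PMI", "GDT"]


def fusion_cases():
    """Pair every text feature with every scope detection technique."""
    return [
        f"{feature}+{scope}"
        for feature, scope in product(TEXT_FEATURES, SCOPE_TECHNIQUES)
    ]


if __name__ == "__main__":
    # Prints the nine combinations, one per line.
    for case in fusion_cases():
        print(case)
```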

A. Social Media Message Pre-Processing
Social media posts and tweets are rich in domain-specific slang, emoticons, symbols, idioms and sarcastic sentences. For accurate sentiment analysis, the proposed framework exploits the unique properties of social media data and refines the data through sentence splitting, slang replacement, word normalization and negation control preprocessing steps for better sentiment classification.
Cell phone user review (R1): "I bought a dual camera cell phone last week. Camera resolution is awesome, having lower battery life, but its ok for me. I m loooooving it."
The sentence splitting phase splits review R1 into five sentences, S1, S2, S3, S4 and S5.
S1: I bought a dual camera cell phone last week.
S2: Camera resolution is awesome.
S3: having lower battery life.
S4: but its ok for me.
S5: I m loooooving it.
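The sentence splitting step can be sketched as follows. The delimiter set (period, comma, semicolon, exclamation mark, question mark) is an assumption inferred from how review R1 is split above; the paper does not state the exact rule, and `split_review` is a hypothetical helper name.

```python
import re


def split_review(review):
    """Split a review into clause-level sentences at assumed delimiters."""
    parts = re.split(r"[.,;!?]+", review)
    # Drop empty fragments and surrounding whitespace.
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    r1 = ("I bought a dual camera cell phone last week. Camera resolution is "
          "awesome, having lower battery life, but its ok for me. "
          "I m loooooving it.")
    for number, sentence in enumerate(split_review(r1), start=1):
        print(f"S{number}: {sentence}")
```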

B. Text Feature Extraction from Social Media Post
Once the social media messages are preprocessed, they are passed on for sentiment classification. For relevant classification, this paper deploys the bag-of-words (BoW), feature hashing (FH) and POS feature extraction techniques to extract and select text features.
1) Parts of Speech (POS) tagger: POS taggers provide a syntactic analysis of social media posts or tweets and annotate each word as a noun, verb, adjective, adverb, coordinating conjunction, etc. with a grammatical tag. In sentiment analysis, a POS tagger is used for phrase identification, entity extraction and word sense disambiguation. The POS tagger employs a probabilistic approach to evaluate the grammatical tags and annotates each phrase with the most probable tag, as shown in equation 1.
where
$P(tag_i \mid phrase_j)$ is the probability of tag $i$ being annotated over phrase $j$,
$n(tag_i, phrase_j)$ is the number of times phrase $j$ appears with grammatical tag $i$,
$n(phrase_j)$ is the number of times phrase $j$ appears,
$n_m(tag_i)$ is the number of times a phrase that had never been seen with grammatical tag $i$ gets grammatical tag $i$, and
$n_m()$ is the total number of such occurrences.
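One form of the tag probability that is consistent with the quantities defined above is a relative-frequency estimate with a fallback for phrases that have not been seen before. It is given here as an assumption about how those counts combine, not as the original equation 1:

```latex
P(tag_i \mid phrase_j) =
\begin{cases}
\dfrac{n(tag_i,\ phrase_j)}{n(phrase_j)} & \text{if } n(phrase_j) > 0, \\[2ex]
\dfrac{n_m(tag_i)}{n_m()} & \text{otherwise.}
\end{cases}
```

Under this reading, the tagger assigns to phrase $j$ the tag $i$ that maximizes $P(tag_i \mid phrase_j)$.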
For sentiment analysis, adjectives are a rich source of polarity for the opinionated words in a message. Consider the unprocessed comment C7 and its resulting POS tags provided by the Stanford parser [http://nlp.stanford.edu:8080/parser/index.jsp]. In the processed comment C8, the word "nice" is an adjective that shows the polarity of comment C8 about the entity "Camera".
2) Bag-of-Words (BoW): For sentiment analysis, bag-of-words is used to transform social media posts or tweets into weighted vectors that contain the relative polarity score of each word in the message. BoW treats each word (token) in a tweet independently, as an order-invariant collection of features, as shown in equation 3.
In sentiment analysis, a short phrase should capture sentiment better than a single word. Bag-of-words builds on that principle and considers bigram, trigram or n-gram phrases for the polarity score with the help of a sentiment lexicon. Consider the review tweet C9 about the quality of a phone. A unigram model works over the single-word token "bad", whereas a bigram model takes the two-word phrase token "very bad" when calculating the polarity of the comment.
Comment (C9): It is a very bad phone.
The phrase "very bad" definitely has a higher negative polarity value than "bad".
3) Feature Hashing (FH): A hashtag is an opinionated term that social media users attach as a label at the end of their tweets to convey their sentiment and opinion. Social media users generally use hashtags to convey sarcasm. Consider the review comment C10 about a movie: the text is not positive, but the reviewer appends a positive sentiment label at the end of the tweet to convey their actual feeling.
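The n-gram and hashtag features described above can be illustrated with a short sketch. The helper names `ngrams` and `hashtags` are hypothetical, the tokenizer is a simple regular expression chosen for illustration, and the example tweet with a hashtag is invented because comment C10 is not reproduced in the text.

```python
import re


def ngrams(text, n):
    """Return the word n-grams of a text as space-joined strings."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def hashtags(text):
    """Return the hashtag labels of a tweet, without the leading '#'."""
    return re.findall(r"#(\w+)", text)


if __name__ == "__main__":
    c9 = "It is a very bad phone."
    print(ngrams(c9, 1))  # unigrams, which include the token "bad"
    print(ngrams(c9, 2))  # bigrams, which include the phrase "very bad"
    # Hypothetical sarcastic tweet, used only to exercise hashtags().
    print(hashtags("Three hours of my life gone #awesome"))
```

A sentiment lexicon lookup over these n-grams, which the text mentions for scoring, is left out of the sketch.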

C. Negation Feature Extraction
Negation control in sentiment analysis involves two subtasks, namely negation cue detection and scope detection. Negation cue detection is responsible for recognizing the terms or phrases with negative influence in a sentence. For negation control, the proposed framework uses a rule-based keyword matching technique for negation cue detection, and conjunction analysis, punctuation mark identification and a grammatical dependency tree for scope detection.
1) Negation cues detection: Negation cues are the terms or phrases that reflect negativity in a review post. The proposed framework identifies negation cues by keyword matching against a corpus of negation words and replaces them with the token "NEGATION", as shown in the negation feature extraction section of Fig. 2.
For example, consider comment C11, where the negation word 'not' is identified by keyword matching and replaced by the token 'NEGATION' for further treatment, as shown in comment C12.
Sentence C11: Battery Life of this cell phone is not long but I am happy with its camera resolution.
Sentence C12: Battery Life of this cell phone is NEGATION long but I am happy with its camera resolution.
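The keyword matching step can be sketched as below. The cue list is a small hypothetical stand-in for the negation word corpus used by the framework, and `mark_negation_cues` is an illustrative name rather than the authors' implementation.

```python
import re

# Hypothetical stand-in for the negation word corpus.
NEGATION_CUES = {"not", "no", "never", "doesn't", "don't", "isn't", "cannot"}


def mark_negation_cues(sentence, cues=NEGATION_CUES):
    """Replace every word found in the cue list with the token NEGATION."""
    def swap(match):
        word = match.group(0)
        return "NEGATION" if word.lower() in cues else word

    # Words may contain apostrophes so that contractions match as one token.
    return re.sub(r"[A-Za-z']+", swap, sentence)


if __name__ == "__main__":
    c11 = ("Battery Life of this cell phone is not long but I am happy "
           "with its camera resolution.")
    print(mark_negation_cues(c11))
```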
2) Negation scope detection: Scope detection techniques determine the linguistic impact of negation cues in opinion sentences. The proposed framework uses conjunction word analysis (CWA), punctuation mark identification (PMI) and grammatical dependency tree (GDT) scope detection techniques to determine the linguistic coverage of the "NEGATION" token inserted by the negation cue detection phase.
(a) Conjunction Analysis: Conjunction words bound the influence of the negative word that comes before or after the occurrence of the "NEGATION" token. For example, consider comment C13, in which a reviewer posts different opinions about different aspects of a beauty product: a negative opinion about its price but a positive opinion about its quality. In comment C13, the conjunction "but" helps to separate the influence of the two oppositely oriented opinionated words "good" and "expensive" that appear before and after it.
Comment (C13): "This Sunscreen lotion is really good but it's too expensive."
Other conjunctions such as "except", "however", "whereas", "although", "and", "or", "unless" and "nevertheless" help to determine the influence of the negation token in a sentence. The conjunction "and" sometimes fails to determine the scope of negation. For example, consider comment C14, where the negation word "doesn't" inverts the polarity of both sentiment words "good" and "nice".
Comment (C14): "This cell phone doesn't have good battery backup and nice camera quality."
In the proposed feature fusion, the text feature extraction techniques POS [24], [25], BOW [21], [16], [19], [13], [26], [27] and hashtag help to overcome this limitation of the conjunction "and" through grammatical marking, sentiment word identification and sarcasm identification, respectively, and at the same time help to evaluate the polarity score of the different parts of the sentence.
(b) Punctuation Mark Identification: Punctuation marks (",", "!", ";") limit the influence of negation to the span between the "NEGATION" token and the next punctuation mark. For example, consider the production manager's comment on the company's production in sentence C15. The manager is upset about the current year's production but hopeful for the next year. In this comment, the comma "," separates these two sentiments.
Sentence (C15): "The production of this year is not up to mark, we are hopeful for next year."
The punctuation mark "," sometimes fails to determine the scope of negation. For example, consider comment C16, where the negation word "doesn't" inverts the polarity of both sentiment words "good" and "nice".
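The conjunction and punctuation rules in (a) and (b) can be combined in a small sketch that collects the tokens between a "NEGATION" token and the next boundary. The function name `negation_scope` is hypothetical. The conjunction list follows the text, while adding "." and "?" to the punctuation set is an assumption made here so that a scope also closes at the end of a sentence.

```python
import re

# Conjunctions named in the text, including "but" from the C13 example.
CONJUNCTIONS = {"but", "except", "however", "whereas", "although",
                "and", "or", "unless", "nevertheless"}
# ",", "!" and ";" come from the text; "." and "?" are assumed additions.
PUNCTUATION = {",", "!", ";", ".", "?"}


def negation_scope(sentence):
    """Return the tokens between each NEGATION token and the next boundary."""
    # Words (with apostrophes) or single punctuation characters.
    tokens = re.findall(r"[A-Za-z']+|[^\sA-Za-z']", sentence)
    scope, in_scope = [], False
    for token in tokens:
        if token == "NEGATION":
            in_scope = True
        elif in_scope and (token.lower() in CONJUNCTIONS or token in PUNCTUATION):
            in_scope = False  # a conjunction or punctuation mark closes the scope
        elif in_scope:
            scope.append(token)
    return scope


if __name__ == "__main__":
    print(negation_scope("Battery Life of this cell phone is NEGATION long "
                         "but I am happy with its camera resolution."))
    print(negation_scope("The production of this year is NEGATION up to mark, "
                         "we are hopeful for next year."))
```

Applied to comment C14 with its cue replaced by NEGATION, this sketch closes the scope at "and" and therefore misses "nice", which reproduces the limitation of conjunction analysis noted in the text.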
(c) Grammatical Dependency Tree: The grammatical dependency between the order of occurrence of sentiment-oriented words and negation cues helps to determine the influence of the NEGATION token [18]. A grammatical dependency parser builds a syntactic tree [28], whose lowest levels help to determine the scope of negation. The text feature extraction techniques POS [24], [25], BOW [21], [16], [19], [13], [26], [27] and hashtag provide grammatical marking that helps to evaluate the lowest level of the grammatical syntactic relationship.

D. Sentiment Classification
After examining the text feature extraction (POS, BOW, HT) and negation scope (CWA, PMI, GDT) techniques, the proposed framework presents nine one-to-many feature fusion cases from text feature to scope of negation, as shown in Table II. Feature fusion improves the performance of feature extraction by overcoming the limitations of its constituent techniques. This paper evaluates the performance of the classifiers SVM, Naive Bayes, Linear Regression and Random Forest after incorporating the different feature fusion cases, as shown in the table.
Fig. 3. Support Vector Machine for SA (feature margin).
1) Support Vector Machine: In the proposed framework, SVM determines the optimal hyperplane $(W_{ff} P_S + b)$ based on feature fusion so as to maximize the feature margin $f_m$ between positive-polarity and negative-polarity social media posts and tweets, as shown in Fig. 3. The support vector machine for sentiment classification [3] classifies the preprocessed message data set $M_{ff}$ after feature fusion, and the performance of polarity classification depends on the type of feature fusion applied. After incorporating the feature fusion technique for sentiment analysis of negative sentences, SVM treats all the tokens in the scope of negation as a feature fusion vector space, as shown in equation 4.
where
$m_{ff}$ is the preprocessed text data set after incorporating feature fusion,
$W_{ffvs}$ is the feature fusion vector space,
$W_{s_i}$ is the sentiment word in the negative scope,
$W_{sn}$ is the set of words in the scope of negation, and
$n_t$ is the negative token.
The preprocessed message data set $M_{ff}$ is a set of $n$ pairs $(t_i, P_c)$, where $t_i$ is a token within $M_{ff}$ and $P_c$ indicates its polarity class (+ve, -ve), as shown in the equation. The token $t_i$ can be captured by using the feature fusion technique, as shown in equation 5.
The feature fusion vectors that define the hyperplane are the support sentiment feature fusion vectors (ffv), as shown in equation 6.
$$ffv = \{(\text{Superb}, +ve),\ (\text{Best}, +ve),\ (\text{Horrible}, -ve)\} \quad (6)$$
In the proposed framework, SVM is required to maximize the width of the feature margin $f_m$, where
$$(W_{ff} \cdot P_c + b_1) \geq +ve \quad \forall\ \text{positive sentiment over the positive hyperplane} \quad (7)$$
$$(W_{ff} \cdot P_c + b_2) \geq -ve \quad \forall\ \text{negative sentiment over the negative hyperplane} \quad (8)$$
The feature margin $f_m$ is the distance between the positive and negative hyperplanes. To maximize the feature margin $f_m$, the weight of the sentiment feature vector space $W_{ff}$ must be minimized.
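The link between the margin and the weight vector can be made explicit with the standard hard-margin SVM formulation. The block below assumes that the polarity classes +ve and -ve are encoded as $y_i = +1$ and $y_i = -1$ and that $t_i$ denotes the feature fusion vector of the $i$-th message; this encoding is an assumption for illustration and is not taken from the paper.

```latex
f_m = \frac{2}{\lVert W_{ff} \rVert},
\qquad
\max_{W_{ff},\, b} f_m
\;\Longleftrightarrow\;
\min_{W_{ff},\, b} \frac{1}{2} \lVert W_{ff} \rVert^{2}
\quad \text{subject to } y_i \,(W_{ff} \cdot t_i + b) \geq 1 \ \ \forall i .
```

Because $f_m$ is inversely proportional to $\lVert W_{ff} \rVert$, widening the margin is the same as shrinking the norm of the weight vector, which is why the weight is minimized.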
2) Naive Bayes: In the proposed framework, Naive Bayes determines the polarity class (+ve, -ve) of the preprocessed message data set $M_{ff}$ after feature fusion on the basis of the maximum posterior probability, as shown in equations 11 and 12 [3].
where $P(p \mid M_{ff})$ is the final posterior probability and $P(M_{ff} \mid p)$ is the likelihood of sentence $M_{ff}$ given polarity class $P_c$, while $P(p)$ and $P(M_{ff})$ are the independent probabilities of polarity class $P_c$ and sentence $M_{ff}$. After incorporating the feature fusion vector $ffv$ as a relevant feature for negative sentiment analysis, NB treats all the tokens in $ffv$ as independent probability entities, as shown in equation 13.
where the $P(n \mid ffv)$ are independent given the polarity class $P_c$, and each word in the scope of negation contributes its individual probability when exploring the polarity classes.
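For reference, the posterior described above follows Bayes' rule, and the independence assumption turns the likelihood into a product over tokens. The block below is the textbook Naive Bayes form written with the symbols of this section; the per-token notation $t_i \in ffv$ is an assumption, and the block is not a reproduction of the paper's equations 11 to 13.

```latex
P(p \mid M_{ff}) = \frac{P(M_{ff} \mid p)\, P(p)}{P(M_{ff})},
\qquad
\hat{p} = \arg\max_{p \in \{+ve,\, -ve\}} \; P(p) \prod_{t_i \in ffv} P(t_i \mid p) .
```

The denominator $P(M_{ff})$ is the same for both polarity classes, so it can be dropped when taking the maximum.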
3) Random Forest: In the proposed framework, Random Forest predicts the polarity class (+ve, -ve) of the preprocessed message data set $M_{ff}$ after incorporating feature fusion. It predicts the sentiment polarity class of a sentence in $M_{ff}$ by building randomized regression trees $\{ff_n(c, p_c, M_{ff}),\ m \geq 1\}$ based on the relationship between polarity class and sentences, as shown in equation 14.
where $E_{pc}$ is the exception on polarity class $P_c$ classification with the random feature fusion parameter $ff$, conditioned on $c$ and the data set $M_{ff}$. Incorporating the feature fusion vector of negation as the conditional parameter $c$ minimizes the exception $E_{pc}$ on the polarity class and increases the classification rate.

4) Linear Regression: In the proposed framework, linear regression finds a feature-fusion-based decision boundary that linearly separates the positive and negative polarity classes, as shown in equation 15.
where $P$ is obtained by passing the polarity function $C \cdot m_{ff}$ through the threshold function, as shown in equation 16.
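A linear score passed through a threshold function can be sketched as follows. The sketch assumes that the threshold function is the logistic sigmoid with a cut-off of 0.5, which the text does not state; the names `sigmoid` and `predict_polarity`, the weights and the feature values are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_polarity(weights, features, bias=0.0, threshold=0.5):
    """Map a linear score over fused features to a polarity class."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return "+ve" if sigmoid(score) >= threshold else "-ve"


if __name__ == "__main__":
    # Hypothetical weights for two features: a positive word count and a
    # count of sentiment words inside a negation scope.
    weights = [2.0, -3.0]
    print(predict_polarity(weights, [1.0, 0.0]))
    print(predict_polarity(weights, [0.0, 1.0]))
```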

V. ENVIRONMENT SETUP AND RESULT ANALYSIS
To analyze the performance of recent benchmark classification techniques (NB, SVM, RF and LR), nine experiments were carried out over five social media data sets from two sources. The nine experiments correspond to the nine one-to-many feature fusion cases built in the proposed framework, as shown in Table II. Each experiment was carried out over all five data sets of social media posts and tweets. The first two data sets were collected through the Twitter API: the Stanford data set (TSCDS) [29] and the Sanders Twitter Sentiment Corpus data set (TSDS) [30]. The Stanford data set contains 160000 training tweets, comprising 80000 positive and 80000 negative tweets. The Sanders Twitter Sentiment data set contains 570 positive and 654 negative tweets. The last three data sets were drawn from Amazon online product reviews of smartphones (ASPR), movies (AMR) and books (ABR) [31]. The composition of the data sets is summarized in Table III.
The performance of the benchmark sentiment classifiers with and without feature fusion for negation control is reported in Table IV. Classifier performance increased after incorporating feature fusion over negative social media posts or tweets.
Evaluating the baseline sentiment classifiers with feature fusion led to the following outcomes. POS+GDT is the best-suited combination of feature extraction and scope detection technique for identifying the range of influence marked by negation in negative sentiment analysis, whereas the other combinations give biased results. NB is the best-suited sentiment classification approach under negation for cases 1 to 6, but for cases 7 to 9 LR achieves the highest performance. SVM achieved the highest improvement after feature fusion was combined with classification.

VI. CONCLUSION
This paper presents a framework for the comparative analysis of the performance of benchmark supervised sentiment classification techniques for negation control over social media data sets. Social media posts or tweets may contain noise, misspelled words, emoticons and slang that need to be preprocessed before feature extraction and sentiment analysis. The proposed framework first preprocesses social media posts or tweets to tackle noise, misspelled words and slang, and finally classifies the tweets according to their polarity score after incorporating feature fusion for negation control. For negation control, feature fusion case 3 (POS+GDT) is the best-suited feature extraction technique: it improves the performance of NB by approximately 45.24% to 82.98%, NB by approximately 47.58% to 93.59%, RF by approximately 45.96% to 66.46% and LR by approximately 44.56% to 63.28% over the different social media data sets. NB is observed to be the best-suited sentiment classification approach under feature fusion for negation, whereas SVM achieved the highest improvement over the different social media data sets.

In feature fusion case 2 (POS+CWA), NB leads the performance, whereas SVM and LR gain the highest improvement on the Twitter and Amazon data sets, respectively, as shown in Fig. 6(a) and (b).

Fig. 7. Feature Fusion Case 3: POS and GDT based feature extraction for SA with negation scope detection technique.
Fig. 8. Feature Fusion Case 4: BOW and CWA based feature extraction for SA with negation scope detection technique.

The proposed framework uses a domain-specific slang and emoticon corpus for slang replacement. For example, consider the unprocessed comments C1 and C2, in which the tokens 'Ur' and 'lol' are compared with the entries in the slang corpus, yielding the processed comments C3 and C4 with the tokens 'Your' and 'laughing out loud'.
Unprocessed comment C1: Ur sound is really pleasant.
Unprocessed comment C2: It's Really Good .lol!.
Processed comment C3: Your sound is really pleasant.
The proposed framework uses the Roget's Thesaurus corpus for word normalization by keyword matching. For normalization, the phrases of a post are matched against the entries in Roget's Thesaurus. If a token has no match, its repeated letters are successively compacted until a match is found. For example, consider the unprocessed comment C5, in which the token 'gooooood' is compared with the entries in Roget's Thesaurus and refined to 'good' in the processed comment C6.
Unprocessed comment C5: Its really Gooooood.
Processed comment C6: Its really Good.
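The slang replacement and word normalization steps of the preprocessing framework can be sketched as follows. The dictionaries `SLANG` and `VOCAB` are tiny hypothetical stand-ins for the slang corpus and for Roget's Thesaurus, and the rule of shortening the longest run of repeated letters by one character per step is one possible reading of "successively compacted", assumed here for illustration.

```python
import re

# Hypothetical stand-ins for the slang corpus and the thesaurus vocabulary.
SLANG = {"ur": "your", "lol": "laughing out loud"}
VOCAB = {"good", "loving", "really", "its"}


def replace_slang(text, slang=SLANG):
    """Replace slang tokens with their expansions, keeping initial capitals."""
    def swap(match):
        word = match.group(0)
        expansion = slang.get(word.lower())
        if expansion is None:
            return word
        return expansion.capitalize() if word[0].isupper() else expansion

    return re.sub(r"[A-Za-z']+", swap, text)


def compact_elongation(token, vocab=VOCAB):
    """Shorten runs of repeated letters until the token matches the vocabulary.

    Returns the lower-cased match, or the original token if no match is found.
    """
    word = token.lower()
    while word not in vocab:
        runs = list(re.finditer(r"(.)\1+", word))
        if not runs:
            return token  # nothing left to compact
        longest = max(runs, key=lambda run: len(run.group(0)))
        # Remove one character from the longest run of repeated letters.
        word = word[:longest.start()] + word[longest.start() + 1:]
    return word


if __name__ == "__main__":
    print(replace_slang("Ur sound is really pleasant."))
    print(compact_elongation("Gooooood"))
```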