Impact of Anaphora Resolution on Opinion Target Identification

Opinion mining is an active area of research because of its wide applications in decision making. It aims to extract users’ perceptions from text and to create a fast and accurate summary of people’s opinions about a given subject. In this study, we work on opinion target identification and examine the impact of anaphora resolution on opinion target extraction. Anaphora resolution can be used to detect opinion targets in sentences that contain pronouns instead of nouns. We empirically evaluated the impact of anaphora resolution using benchmark datasets and achieved a precision of 88.14, a recall of 71.45, and an f-score of 72.12.

Keywords: Opinion mining; machine learning; evaluative expression; anaphora resolution; opinion targets


I. INTRODUCTION
An opinion is a personal view, statement, or judgment of an individual about something [1]. People's views, knowledge, and experience play an important role in human guidance and decision making [2]. For example, the sentence "Mahnoor Baloch is a good actress" expresses a positive opinion about Mahnoor Baloch, "Kamran is not a good player" expresses a negative opinion, while "Milk is good for health but tea is not" is a neutral sentence. An opinion has different components, and identifying each component in free text is a challenging task [3], [4]. This study is about opinion target identification. Opinion mining (OM) attempts to find the evaluative perspective of natural language content [9]. An evaluative expression represents a source, an attitude, and a target or destination. For instance, in the sentence 'I disliked the rooms of the hotel, they were not well decorated', the speaker (the source) communicates a negative attitude towards 'the rooms of the hotel' (the target) [3].
Our research problem has several subproblems and has been approached in different ways. Some papers treat it as subjectivity analysis at the document or sentence level, some work on opinion targets and opinion words, and some examine the correlation between the two. In this work, our goal is to investigate whether anaphora resolution (AR) can be exploited to improve domain-independent opinion-target pair extraction. The word anaphora comes from two ancient Greek words, "ana" and "phora". "Ana" means back, upstream, or back in an upward direction, whereas "phora" means the act of carrying. Anaphora is used quite regularly in both written and spoken discourse to avoid excessive repetition of words and to improve the continuity of the text [5], [6]. Table I shows an example of anaphora resolution. Natural language processing (NLP) comprises various complex and demanding fields of study, among which anaphora resolution is considered an important and interesting field of research [7]. AR is necessary for most practical NLP approaches, and it is an unavoidable issue in the analysis of sentence structure. In discourse, AR is the problem of relating anaphors to preceding or subsequent elements. The referenced elements (antecedents) may be definite or indefinite noun phrases, verb phrases, pronouns, or entire expressions and phrases. There are three basic kinds of anaphor: (a) "pronominal anaphors", the most general kind, in which a pronoun refers to an antecedent within the sentence; (b) "definite noun-phrase anaphors", in which the antecedent is identified via a noun phrase; and (c) "ordinal/quantifier anaphors", in which the anaphor is an ordinal such as "second" or an unspecified quantity such as "few" or "some" [8].
This paper is organized as follows: Section II presents the related work on opinion target identification and our problem. Section III discusses the proposed framework we employed for domain-independent opinion target extraction, while Section IV explains the experimental work. Section V concludes the paper.
www.ijacsa.thesai.org

II. LITERATURE REVIEW
The OM problem has been addressed in many research papers, and diverse approaches have been employed to solve it. OM has been divided into subproblems, as explained in [4]. Opinion words and opinion targets have been identified in different ways. Some work has relied purely on the grammatical structure of the language [30], some has employed semantic features [2], and some has used both syntactic and semantic features [11], [18]. The combined approach has shown proven results, and we adopt it in this work. However, our goal is to test the impact of anaphora resolution on opinion target extraction. As explained in the introduction, we consider the problem of anaphora in the context of opinion targets, since objects and features in free text are mostly referred to by anaphora. There has been sound work on anaphora resolution. An anaphora resolution technique has been developed as a source of semantic evaluation with the help of word features and Backus-Naur Form (BNF). This technique depends on matching constraints on the syntactic features of various words, opinion, and text. It achieved approximately 96% accuracy, and the algorithm was also tested on complicated and compound sentences [12], [13]. Heuristic rules and the WordNet ontology were used to enhance the accuracy of anaphora resolution. Intra-sentential and inter-sentential anaphora and the pleonastic "it" in English were handled to improve the resolution accuracy [14]. Relevance scoring between context matrices and WordNet glosses was used to compute and extract the right sense of a target word [12], [15]. Resolving pronominal anaphora has been essential for automatically extracting general requirements from the text of requirements documents [16]. Dependency and discourse patterns were used to assist in resolving particular kinds of references. To resolve entity pronoun references in Hindi discourse, a heuristic model based on Paninian grammar was applied [17].
Patterns and semantic analysis have been employed to improve unsupervised opinion target extraction [18]. Opinion targets are identified in two steps: candidate selection and opinion target selection. Combined lexical-based syntactic patterns were used for candidate selection, while a hybrid likelihood ratio test approach with semantic-based relatedness was employed for opinion target selection [18]. A method was developed for annotating opinions in unstructured text documents. Appositive instances were resolved using the Normalized Google Distance (NGD). Subsequently, the anaphora-resolved documents were processed using the Vector Space Model [19]. A machine learning method has been employed to classify subjective and objective sentences. The authors worked on a rule-based, domain-independent opinion evaluation technique and performed experiments on data collected from different websites [20]. Text classification refers to assigning predefined categories to textual documents. A general method was built to evaluate the semantic relatedness of documents, and anaphora resolution was used to increase the semantic weight assigned to every document. The hidden meaning of the text was expressed more effectively by word semantics and the WordNet taxonomy, which provided a more faithful description than the conventional Information method [21]. A rule-based algorithm was proposed for the Pashto language to resolve strong personal pronouns in their oblique, direct, and possessive cases [22]. Pronominal anaphora resolution (PAR) was performed using conventional features together with global discourse knowledge. The referent of an anaphoric pronoun was determined locally from the features involved in the search. Usually, the sentence that contains the anaphor and the several sentences immediately preceding it form the local context. The knowledge base was also updated as the discourse was processed [23]. A maximum entropy model and a Random Forest classifier were benchmarked for pronominal anaphora resolution, taking into account specific features of Malay discourse such as gender-neutral pronouns. The work followed a two-step procedure: first, an investigation was conducted into the components of Malay anaphors; second, based on the findings, the pronominal resolution framework was designed, implemented, and evaluated [8]. An algorithm based on specific rules has been developed to resolve reciprocal pronouns in the Pashto language. Since NLP tools and annotated text corpora were unavailable for Pashto, a small manually tagged and segmented corpus was created [24]. Resolving pronouns in Malayalam raises several issues compared with English because Malayalam is a free word order language. A manual experiment was carried out by resolving anaphora in various stories from the dataset. The performance of numerous NLP applications, such as text summarization, text classification, and text retrieval, has been improved by the anaphora resolution system [25]. Personal pronoun anaphora resolution was used to support web page information processing, given the large amount of paroxysmal text on the web [26]. An algorithm has been developed to resolve distributive anaphoric relations in Urdu discourse by using global knowledge that includes most characteristics of the noun [27].
The work most relevant to our problem is [10], [11]. The authors worked on improving opinion target extraction with anaphora resolution; however, their approach is slightly different, and their work is specific to the movie domain.

III. PROPOSED ARCHITECTURE
This section discusses the complete procedure of the proposed framework for identifying opinion targets in unstructured reviews. The proposed work has two main objectives: to identify opinion targets from evaluative expressions and to improve opinion target identification through anaphora resolution. The procedure describes how opinion targets are extracted from an unstructured input review. It consists of three phases, as shown in the block diagram (Fig. 1). Each phase is summarized below together with its substeps.

A. Pre-Processing
The pre-processing phase performs noise removal, sentence splitting, and part-of-speech (POS) tagging. POS tagging assigns the appropriate grammatical category to every word of the text.

B. Candidate Selection
Identifying candidate features is a vital phase of opinion target extraction [28]. The proposed algorithm is used to find evaluative expressions that contain opinions and targets. The procedure uses the following three basic steps.

C. Regular Expressions
We adopted the regular expression (RE) patterns from [18]. These patterns extract strings containing opinions and targets through base noun phrases together with various boundary conditions. The patterns use an opinion lexicon dictionary to identify opinionated expressions that consist of opinions and targets.
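As an illustration of how such patterns can operate on POS-tagged text, the following minimal sketch matches one linking-verb style pattern. It is not the pattern set of [18]: the `word_TAG` token format, the regular expression, the tiny lexicon `OPINION_WORDS`, and the function name are all assumptions made for this example.

```python
import re

# Hypothetical mini lexicon; the paper relies on a full opinion lexicon dictionary.
OPINION_WORDS = {"good", "great", "excellent", "poor", "bad"}

# Base noun phrase: optional determiner, optional adjectives, one or more nouns.
BNP = r"(?:\S+_DT\s+)?(?:\S+_JJ\w*\s+)*(?:\S+_NN\w*\s+)+"

# Illustrative pattern only: BNP + linking verb + optional adverb + adjective.
LINKING_VERB_PATTERN = re.compile(
    rf"(?P<bnp>{BNP})\S+_VB[ZPD]\s+(?:\S+_RB\w*\s+)?(?P<adj>\S+)_JJ\w*"
)


def extract_evaluative_expressions(tagged_sentence):
    """Return (noun phrase, opinion word) pairs found in a word_TAG sentence."""
    pairs = []
    for match in LINKING_VERB_PATTERN.finditer(tagged_sentence):
        adjective = match.group("adj").lower()
        if adjective not in OPINION_WORDS:
            continue  # the adjective is not a known opinion word
        nouns = []
        for token in match.group("bnp").split():
            word, tag = token.rsplit("_", 1)
            if tag.startswith("NN"):
                nouns.append(word)
        pairs.append((" ".join(nouns), adjective))
    return pairs
```

For the tagged sentence `The_DT battery_NN life_NN is_VBZ great_JJ`, this sketch pairs the noun phrase "battery life" with the opinion word "great"; a sentence whose adjective is missing from the lexicon yields no expression.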

D. Candidate Selection
Candidate target features are selected from the extracted evaluative expressions by pronoun phrase and, to obtain the relevance score, are ranked by their number of occurrences. The algorithm consists of the following two steps.
1) In this step, we look for constituents of the lexical patterns in the input sentence. If a sentence contains any of the proposed pBNP patterns, it is labeled as opinionated; otherwise, it is labeled as non-opinionated. The algorithm examines the pBNP constituent patterns in priority order: vBNP, dBNP, iBNP, and sBNP.
2) In this step, a set of candidate features is produced from the extracted patterns. All pronoun phrases in the evaluative expressions extracted in step 1 are chosen as candidate features, and the frequency of each distinct noun is computed.
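The two steps can be sketched roughly as follows. This is only an outline under stated assumptions: `TOY_PATTERNS` contains two simplified stand-ins rather than the four pBNP pattern families of [18], sentences are assumed to be in `word_TAG` form, and both function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, simplified stand-ins listed in priority order.
# The real vBNP, dBNP, iBNP, and sBNP patterns are defined in [18].
TOY_PATTERNS = [
    ("vBNP", re.compile(r"\S+_NN\w*\s+\S+_VB[ZPD]\s+\S+_JJ")),
    ("dBNP", re.compile(r"\S+_DT\s+\S+_NN")),
]


def first_matching_pattern(tagged_sentence, ordered_patterns=TOY_PATTERNS):
    """Step 1: return the name of the first pattern that matches, else None.

    A sentence with a match is treated as opinionated; None means
    non-opinionated.
    """
    for name, pattern in ordered_patterns:
        if pattern.search(tagged_sentence):
            return name
    return None


def candidate_frequencies(candidate_phrases):
    """Step 2: count each distinct candidate and sort by frequency."""
    counts = Counter(phrase.lower() for phrase in candidate_phrases)
    return counts.most_common()
```

Checking the patterns in a fixed order means that a sentence matching several families is attributed to the highest-priority one, which mirrors the priority ordering described above.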

E. Opinion Target Extraction
In this step, we use the semantic-based likelihood ratio test (LRT) technique derived from [18]. The relevance scoring technique classifies candidate features as relevant or irrelevant. The LRT is used to extract opinion targets that occur frequently, while semantic-based relatedness is applied to find targets that occur infrequently. Table II describes sample product features.
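The exact LRT formulation of [18] is not reproduced in this paper. As a rough illustration of likelihood-ratio relevance scoring, the sketch below uses the binomial log-likelihood ratio, a commonly used form of the test. The meaning of the counts (occurrences of a candidate in opinionated versus non-opinionated sentences), the function names, and the threshold of 3.84 (the 95% chi-square critical value for one degree of freedom) are assumptions for this example.

```python
import math


def _log_likelihood(k, n, p):
    """Log of the binomial likelihood p^k * (1 - p)^(n - k), guarding log(0)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def likelihood_ratio(k1, n1, k2, n2):
    """Return -2 log(lambda) for k1 hits in n1 trials versus k2 hits in n2 trials.

    Assumed reading: k1 of n1 opinionated sentences and k2 of n2
    non-opinionated sentences contain the candidate. n1 and n2 must be > 0.
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    return 2.0 * (
        _log_likelihood(k1, n1, p1) + _log_likelihood(k2, n2, p2)
        - _log_likelihood(k1, n1, pooled) - _log_likelihood(k2, n2, pooled)
    )


def is_relevant(k1, n1, k2, n2, threshold=3.84):
    """Treat a candidate as relevant if it is significantly over-represented."""
    return k1 / n1 > k2 / n2 and likelihood_ratio(k1, n1, k2, n2) >= threshold
```

Candidates that fail this frequency-based test would then be passed to the semantic relatedness step, which is not sketched here.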

F. Enhancement of Semantic-Based LRT through anaphora
In this step, we propose an enhancement of the semantic-based likelihood ratio test technique derived from [18] through anaphora resolution.
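The paper does not specify which anaphora resolver is used. To make the idea concrete, the following sketch replaces third-person pronouns with the most recent noun that agrees in number, so that pronoun mentions can be counted as mentions of the target. The recency heuristic, the pronoun lists, and the function name are assumptions for illustration only; they are a stand-in, not the resolver used in the experiments.

```python
SINGULAR_PRONOUNS = {"it"}
PLURAL_PRONOUNS = {"they", "them"}


def resolve_pronouns(tagged_sentences):
    """Replace pronouns with the latest number-matching noun seen so far.

    tagged_sentences: list of sentences, each a list of (word, POS tag) pairs
    using Penn Treebank tags. Returns a list of word lists.
    """
    last_singular = None  # most recent NN / NNP
    last_plural = None    # most recent NNS / NNPS
    resolved = []
    for sentence in tagged_sentences:
        words = []
        for word, tag in sentence:
            lower = word.lower()
            if tag == "PRP" and lower in SINGULAR_PRONOUNS and last_singular:
                words.append(last_singular)
            elif tag == "PRP" and lower in PLURAL_PRONOUNS and last_plural:
                words.append(last_plural)
            else:
                words.append(word)  # unresolved pronouns are kept as-is
                if tag in ("NNS", "NNPS"):
                    last_plural = word
                elif tag in ("NN", "NNP"):
                    last_singular = word
        resolved.append(words)
    return resolved
```

On the example from the introduction, 'I disliked the rooms of the hotel, they were not well decorated', the number check maps "they" to "rooms" rather than to the nearer singular noun "hotel". A heuristic this simple will still fail on many real sentences.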

IV. EXPERIMENTAL WORK

A. Datasets
We used manually labeled datasets of customer reviews for nine products, which have been used frequently in research on opinion mining and target identification. These datasets are used for the analysis and assessment of the proposed work. The datasets are openly available from the author's website. Table IV describes the nine datasets. In each dataset, the product features on which opinions are expressed are manually labeled according to the following annotation strategy.
- A sentence that contains positive or negative remarks about features of the product is considered opinionated.
- Opinion statements consist of positive or negative remarks expressed by adjectives.
- A product feature is a characteristic of the product on which the customer's opinion is expressed.

B. Evaluation Metrics
This section presents the performance metrics and evaluation criteria used during the research process to ensure the validity of the results. Accuracy is measured using three performance metrics: precision, recall, and f-score.

C. Tools and Implementation
This section describes the tools used in this work. The following state-of-the-art software was used for the experiments and simulation. Part-of-speech tagging is performed with the Stanford part-of-speech tagger [29]. The tagger is freely available and widely used for English-language text. The algorithm used in this work depends on grammatical features of language elements; therefore, the tagger is used to convert the raw datasets into POS-tagged corpora. Text evaluation and pattern extraction are performed with Text Stat 3.0, which is freely available from its author's website for academic research. The WordNet.Net library, developed by Troy Simpson, is openly available from its author's website. This library provides a .NET port for accessing the WordNet dictionary for similarity scoring, and the semantic-based relevance scoring algorithm is implemented with it. WordNet is a large lexical database consisting of 117000 synsets. Each synset represents a distinct concept and is linked to other synsets through conceptual-semantic and lexical relations [30]. MS Excel is used to generate results and graphs.

D. Results
Initially, the datasets are converted into part-of-speech tagged datasets using the Stanford parser [29]. The proposed algorithm is then executed on the prototype system with the following setups to extract the candidate features.
The experimental setup is based on a combination of four distinct patterns, i.e., linking verb base noun phrases, definite base noun phrases, preposition-based noun phrases, and subjective base noun phrases, with pronouns. This setup is named pBNP.
In every step, the result of each pattern is compared with the manually labeled features to identify true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Precision, recall, and f-score are computed from the confusion matrix generated by the proposed framework.
To make the results comparable, the same setup is used for both the Likelihood and the semantic-based Hybrid Likelihood techniques. The evaluation measures are precision (P), recall (R), and f-score (F), which are calculated using the following parameters:
- TP = number of extracted pBNP which are target features
- FP = number of extracted pBNP which are not target features
- TN = number of non-target feature pBNP which are not extracted
- FN = number of target feature pBNP which are not extracted
This setup implements the semantic likelihood ratio test with the proposed lexical patterns (pBNP). The cBNP-L setup uses the candidate features extracted through cBNPs and employs the likelihood ratio test for relevance scoring to extract the opinion targets.
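For reference, the three measures follow the standard definitions in terms of the TP, FP, and FN counts defined above. The short sketch below shows the computation; the function name and the convention of returning 0.0 when a denominator is zero are assumptions made for this example.

```python
def precision_recall_f(tp, fp, fn):
    """Compute precision, recall, and f-score from confusion-matrix counts.

    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    f-score   = 2 * P * R / (P + R)
    Each value is 0.0 when its denominator is zero (assumed convention).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f_score = 2.0 * precision * recall / total if total else 0.0
    return precision, recall, f_score
```

TN does not enter any of the three formulas, so it is not a parameter of the function.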

E. Influence of Anaphora Resolution in Opinion Target Extraction
Table V shows the results for the nine datasets in terms of precision, recall, and f-score, reflecting the impact of anaphora resolution.

F. Comparative Results of Proposed Method with the Existing Approaches
Table VI presents the average comparative results of the baseline, the semantic-based Hybrid Likelihood Ratio Test technique, and the semantic-based LRT with anaphora resolution in terms of average precision, recall, and f-score. Fig. 2 compares the proposed semantic-based opinion target extraction through anaphora resolution with the existing hybrid semantic-based likelihood ratio test. As shown in Fig. 2, the score of the proposed technique is higher than that of the existing semantic-based Hybrid Likelihood Ratio Test: precision decreases slightly, while recall increases considerably and the f-score improves.

V. CONCLUSION
This study describes the impact of anaphora resolution on opinion target identification in text documents. We used nine datasets taken from the author's website to evaluate the proposed work. The proposed work identifies opinion targets from evaluative expressions and slightly improves the results by employing anaphora resolution. Our analysis of the current task and of the limitations of the proposed work shows that there is room for improvement in the proposed method. The suggested method retrieves domain-independent evaluative expressions that can be used to identify target features across domains via a supervised machine learning algorithm. Future work should therefore focus on this direction.

TABLE I
by anaphora resolution. As given in Table III, there are features that are represented by pronouns. In these datasets, the targets that are pronouns are counted manually, and then the total number of pronouns in each dataset is determined. The following table shows examples of targets in the manually labeled dataset that are pronouns. Table III presents the influence of pronouns on target features. The Canon power dataset contains 60 target features out of 173 pronouns; therefore the influence of the pronouns on target

TABLE IV. DATASETS DESCRIPTION

TABLE V. PRECISION, RECALL AND F-SCORE WITH EFFECT TO ANAPHORA

TABLE VI. COMPARATIVE RESULTS