A Survey on Sentiment Analysis Approaches in e-Commerce

Sentiment analysis is the process of judging the behavior, expressions and feelings that customers convey as positive, negative or neutral. A wide range of sentiment analysis approaches is used to analyze unstructured customer review datasets and turn them into insightful and helpful information. The aim of this paper is to highlight the research designs and methodological choices that other researchers have made for sentiment analysis of E-Commerce customers’ reviews, in order to guide future development. The paper surveys sentiment analysis approaches, process challenges and trends in the existing literature. It then discusses the feature extraction and classification methods applied to customers’ reviews to give an exhaustive view of these methods. Knowledge of the challenges of sentiment analysis helps to clarify future directions.

Keywords: Sentiment analysis; e-Commerce; feature extraction; classification; customers’ reviews


I. INTRODUCTION
The COVID-19 pandemic and the resulting lockdowns have prevented many companies from operating normally, pushing business operators toward e-Commerce in order to survive [1]. e-Commerce is a safe way for consumers to purchase essential and non-essential goods and services online while staying home during the lockdown phase. Electronic commerce (e-Commerce) is online business in which products are bought and sold through the internet, and it is made possible by different online platforms [2] [3]. Examples of e-Commerce platforms are Shopee, Lazada, Zalora and eBay, which are world-famous platforms selling goods, necessities and services over the internet. However, people are doubtful about buying products from online platforms [4]. According to a survey in the USA, 81% of internet users buy products from online platforms [5]. These customers give feedback on the items or services they purchased by writing reviews in the comments section. Hence, reading other customers’ feedback, comments or reviews is important for understanding the products or services. Customers’ reviews, also known as word of mouth (WOM) [3] [6], help other customers or clients to understand the products, services and retailers. The more convincing the reviews are, the more confident potential customers feel about the products or services and the more likely they are to select and purchase them. Although customers’ reviews are vital for making the right choices, the growing number of reviews requires a potential customer to spend more time and effort, because each review has to be read and the product or service analyzed before a final decision is made, which makes the decision-making process tedious [1] [2].
Thus, to help customers make better purchase decisions, many review analysis methods are employed to extract useful information. Sentiment analysis identifies and analyzes the sentiments that customers or clients express in their text reviews, and extracts and presents the specific information needed to make better purchase decisions on products or services in E-Commerce. This paper contributes a survey of the sentiment analysis methods and results reported by other researchers, to guide future development.
The paper is organized as follows. After this introduction, Section 2 discusses the levels of sentiment analysis, the methods for identification and the basic requirements of sentiment analysis. Section 3 outlines the sentiment analysis process with supporting examples. Section 4 reviews related work by other researchers, both in different backgrounds and on E-Commerce customers’ reviews. Section 5 presents a comparative analysis table of sentiment analysis with different methods in e-Commerce. Section 6 discusses the comparative analysis table. Finally, Section 7 ends the paper with conclusions and acknowledgements.

II. SENTIMENT ANALYSIS LEVEL
Nowadays, the huge number of reviews requires an efficient method of analysis [4]. Customers and retailers who read thousands of reviews manually need plenty of time to classify them, which motivates the use of sentiment analysis methods in e-commerce. Reviews pile up in volumes that require an effective classifier to identify the valuable information in the text. Sentiment analysis, or opinion mining, extracts customers’ behavior by analyzing and exploring customers’ reviews in E-commerce [7] [8] [9]. Customers express their emotions by writing subjective judgements about products in E-commerce [40]. Sentiment analysis also categorizes unstructured text as positive, negative or neutral, summarizing customers’ judgements so that the opinions other customers express about a product or retailer, and the strength of those opinions, can be understood better [7] [10] [11]. Unstructured sentiments are detailed opinions that a customer gives about a product [8]. Some of this information is stated explicitly, while other features are implicit. There are three levels of sentiment analysis: Document-level Sentiment Analysis (DSA), Sentence-level Sentiment Analysis (SSA) and Aspect-level Sentiment Analysis (ASA).

A. Document-Level Sentiment Analysis (DSA)
In DSA, a whole document is treated as expressing a negative or positive sentiment, and the sentiment is extracted from the entire document [8] [12] [13]. For example, sentiment analysis has been applied to coarse-grained reviews of air purifiers, with a neural network model used to identify the semantics of sentences for classification [14].

B. Sentence-Level Sentiment Analysis (SSA)
SSA determines whether the sentiment expressed in each sentence is negative or positive. It is a simple form of sentiment analysis that extracts sentiments or customers’ experiences from sentences [8] [12] [13]. At the sentence level, researchers presented a phrase recursive autoencoder (PRAE) model to identify the sentiment of sentences when analyzing coarse-grained reviews [14]. However, according to [14], document-level and sentence-level sentiment analysis are unable to capture fine-grained features from the words.

C. Aspect-Level Sentiment Analysis (ASA)
ASA classifies opinions by identifying and extracting entities and their properties [13] [15]. It focuses only on the opinion words in a review; in "Love the Amazon show", for example, the opinion is stated clearly by the word "love" [12]. Extracting the aspect "love" from the text is an important part of the feature extraction phase that sentiment analysis requires. At the aspect level, researchers presented a hybrid model for analyzing fine-grained product features [14]. ASA also expresses the sentiment polarity used in the subsequent prediction process [16].

III. SENTIMENT ANALYSIS PROCESS

A. Preprocessed Texts
As the first step, data cleaning removes unnecessary reviews from the selected dataset [17] [18]. Data preprocessing then removes missing values, stop words, unwanted symbols, digits and URL tags, and tokenizes the text [31]. Tokenization divides sentences into words, phrases or symbols, after which stop words such as "the", "is", "are" and "a" are removed [9]. The words are also converted to lower case in preparation for the next step.
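The preprocessing steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline of any surveyed paper: the stop-word list, the regular expressions and the function names `preprocess` and `clean_dataset` are assumptions made for the example, and real systems typically use a much larger stop-word list.

```python
import re

# Small illustrative stop-word list (an assumption; real lists are much larger).
STOP_WORDS = {"the", "is", "are", "a", "an", "and", "to", "of", "it"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # URL tags
NON_LETTER_RE = re.compile(r"[^a-z\s]")         # digits and unwanted symbols


def preprocess(review):
    """Lower-case a review, strip URLs, digits and symbols, tokenize, drop stop words."""
    text = review.lower()
    text = URL_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    tokens = text.split()  # whitespace tokenization into words
    return [t for t in tokens if t not in STOP_WORDS]


def clean_dataset(reviews):
    """Drop missing or empty reviews, then preprocess the remaining ones."""
    return [preprocess(r) for r in reviews if r and r.strip()]


print(preprocess("The battery is GREAT!!! See http://example.com 100%"))
```

Lower-casing is applied before stop-word removal so that "The" and "the" are treated as the same word.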

B. Feature Extraction
Feature extraction, also called aspect extraction, pulls the relevant information out of unstructured data and reduces or removes features that are irrelevant for sentiment classification [2] [19]. Besides explicit opinions, it also helps to extract implicit information from reviews, which leads to more effective and better-performing classification. Several methods are used for feature extraction: frequent pattern mining with association rule mining, term document matrix (TDM), parts-of-speech (POS) tagging, Maximum Entropy (ME), N-gram and lexicon [2] [8] [10] [16]. Each method has advantages and disadvantages when applied to extracting features from reviews. Frequent pattern mining finds the itemsets, subsequences or substructures that occur frequently in a database [20]. The Apriori algorithm with association rules is one such approach and is used in many fields. U. A. Chauhan and other researchers implemented POS tagging to distinguish between nouns, adjectives, verbs and adverbs [5]. Extracting these terms from sentences reveals the hidden story and emotions of customers, which can then be classified as positive or negative. Furthermore, TDM is implemented to compute the frequency of each word using methods such as bag of words and term frequency-inverse document frequency (TF-IDF) [5] [21] [22] [23]. TF-IDF counts the number of times a word occurs and emphasizes the important terms. By extracting the most frequent words, researchers can ignore the words with the lowest scores. Some researchers implement N-gram features, extracting unigrams (one word), bigrams (two words) and trigrams (three words), where N represents the number of words [22] [24]. According to these researchers, unigram features tend to increase the accuracy of the classification method. N-grams also help to avoid semantic scores; the score calculation creates a domain-independent sentiment dictionary and removes the need for human annotators.
These are some of the options researchers use to extract features from a dataset before classifying the sentiment as positive, negative or neutral.
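Two of the feature extraction options above, N-grams and TF-IDF, can be illustrated with a short Python sketch. The function names `ngrams` and `tf_idf` are hypothetical, and the weighting used here (term frequency times the logarithm of the inverse document frequency, without smoothing) is one common variant assumed for the example; the surveyed papers may use other variants.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list: n=1 unigrams, n=2 bigrams, n=3 trigrams."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized reviews.

    Returns one dict per review mapping each term to
    (count in review / review length) * log(number of reviews / reviews containing term).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per review

    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


reviews = [["good", "phone"], ["bad", "phone"]]
print(ngrams(["love", "the", "amazon", "show"], 2))
print(tf_idf(reviews))
```

In this toy input, "phone" occurs in every review and therefore receives a weight of zero, while "good" and "bad" each occur in one review and receive a positive weight, which is how TF-IDF emphasizes the terms that distinguish reviews.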

C. Sentiment Classification
Sentiment refers to the feelings, emotions or responses that an individual puts into words to express human behavior and character [11] [25]. In this step, the explicit and implicit features that were extracted are used to identify the hidden sentiment in a measurable format. There are, in theory, two kinds of methods to polarize the aspects in a review: lexicon-based methods and machine learning classifiers [2] [11] [13]. Lexicon-based methods follow dictionary-based and corpus-based approaches and use resources such as Senti, HowNet and WordNet [8]. The WordNet dictionary stores words with positive, negative and neutral polarity, so the words in a document can be scored automatically by counting the number of positive and negative words in a review [21]. If a review has more positive words than negative words, it is polarized as a positive review. Machine learning classifiers for supervised learning include Naïve Bayes, support vector machines (SVM), Maximum Entropy and Random Forest [2] [9] [10] [26] [27] [31]. Fig. 2 summarizes sentiment classification based on machine learning and lexicon approaches. Supervised learning requires labeled training data to produce output from input data [21], whereas unsupervised learning uses unlabeled training data to identify patterns in the data. Many studies have used the Naïve Bayes and SVM machine learning methods for sentiment classification [15] [21].
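The two families of classifiers described above can be contrasted in a small Python sketch. Everything in it is an assumption made for illustration: the toy word lists stand in for a real polarity lexicon and are not taken from any dictionary, and the `NaiveBayes` class is a minimal multinomial Naïve Bayes with add-one smoothing, not the implementation used in any surveyed paper.

```python
import math
from collections import Counter, defaultdict

# Toy polarity lexicon (hypothetical entries, not taken from a real dictionary).
POSITIVE = {"good", "great", "love", "excellent", "fast"}
NEGATIVE = {"bad", "poor", "hate", "broken", "slow"}


def lexicon_polarity(tokens):
    """Lexicon approach: compare the counts of positive and negative words."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


class NaiveBayes:
    """Supervised approach: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for counts in self.word_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_label in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n_label / n_docs)  # log prior of the class
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[label][t]
                    score += math.log((count + 1) / (total + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


train_docs = [["love", "phone"], ["great", "battery"], ["bad", "screen"], ["poor", "battery"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_docs, train_labels)
print(lexicon_polarity(["love", "the", "amazon", "show"]), model.predict(["love", "battery"]))
```

The lexicon function needs no training data but depends entirely on the coverage of its word lists, whereas the Naïve Bayes model learns word polarities from labeled reviews, which mirrors the trade-off between the lexicon and supervised approaches discussed above.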

D. Evaluation Score
Based on feature extraction and sentiment classification, the results for online reviews are ranked using statistical methods [2]. The overall evaluation result is very important for judging subjective online reviews for customers. The result can be predicted or measured with the mean squared error (MSE), confusion matrix, accuracy, precision, recall and F1-score [9] [15] [28] [29].
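Precision is the number of true positives (TP, high-quality reviews) divided by the sum of true positives and false positives (FP, low-quality reviews) [5] [18] [29]:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$

Recall is the number of true positives divided by the sum of true positives and false negatives (FN):

$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$

The F-score is calculated from recall and precision as:

$$\text{F-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}$$

Mean absolute error (MAE) and root mean square error (RMSE) measure how close the fitted line is to the data points [18] [29] [30].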
A confusion matrix tabulates the predicted classes against the actual classes, showing how the data differ between two classes [29].
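The evaluation measures named in this subsection can be computed directly from the four confusion-matrix counts. The following Python sketch uses the standard definitions: precision is TP / (TP + FP), recall is TP / (TP + FN), and the F-score is the harmonic mean of the two. The function names and the example labels are assumptions made for illustration.

```python
import math


def confusion_counts(y_true, y_pred, positive="positive"):
    """Return (TP, FP, FN, TN) for a two-class problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F-score; each is 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def accuracy(tp, fp, fn, tn):
    """Share of all reviews that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def mae(actual, predicted):
    """Mean absolute error between numeric scores."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error between numeric scores."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


y_true = ["positive", "positive", "positive", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "positive", "negative"]
counts = confusion_counts(y_true, y_pred)
print(counts, precision_recall_f1(*counts[:3]), accuracy(*counts))
```

Precision, recall and the F-score apply to class labels, while MAE and RMSE apply when the model predicts a numeric score such as a star rating.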

IV. RELATED WORK
This section presents related studies of sentiment analysis on customers’ reviews, conducted by researchers in E-Commerce and in other backgrounds of study.

A. Previous Work from Different Background
Sentiment analysis has been applied in different backgrounds of study. In Indonesia, Twitter statuses were analyzed with a sentiment analysis method using SentiWordNet [32]. The emotion in a tweet describes the sentiment of a user in the Indonesian language, and each sentence can be polarized as positive, negative or neutral. The researchers used SentiWordNet, which contains a set of words scored between 0 (negative) and 1 (positive). After the sentiment was scored, the final result was identified by the sentiment classification method, with an accuracy score of 0.77 for positive, 0.60 for neutral and 0.78 for negative, and an average score of 0.74. Hence, emotions expressed in words are transformed into meaningful information.

Sentiment analysis has also been applied to extract Arabic opinions from text collected from Twitter posts. In this research, machine learning (ML) and lexicon-based (LB) approaches to sentiment orientation were applied. The data were collected by querying the Tweepy Application Programming Interface (API), and positive and negative tweets were selected. N-gram features were applied to divide the text into unigrams (one word), bigrams (two words) or trigrams (three words). The ML classifiers used were Naïve Bayes (NB), BNB, Multinomial NB (MNB), Maximum Entropy (ME), Support Vector Machine (SVM), Logistic Regression (LR), Stochastic Gradient Descent (SGD), RR, Adaptive Boosting (Ada-boost) and PA. Accuracy, precision, recall and F-score were used to evaluate the performance of the classifiers. In the final results, SVM with unigram features showed the highest accuracy in the single-fold setting (99.31), and PA with unigram features showed the highest accuracy in the 10-fold setting (99.96%). The classifiers help to extract and discover the polarity of a given tweet.

On the other hand, movie reviews and scores on the Rotten Tomatoes website were predicted with a sentiment analysis method [33]. In this research, a lexicon-based approach and a supervised machine learning approach were used to predict the sentiment polarity of movie reviews. The researchers also polarized the sentiment in reviews using SentiWordNet, classifying it into two classes: rotten (negative) and fresh (positive). The result was evaluated by comparing the proposed method with a baseline method on average precision, recall and F-measure, where the proposed method showed the highest result, 0.97. Hence, it is able to give a better judgement of movie reviews from the Rotten Tomatoes website.

B. Theory of Sentiments Analysis in E-Commerce
This section discusses previous work on sentiment analysis of customers’ reviews in e-commerce. Several authors have selected the Amazon product review dataset for sentiment analysis. Sentiment analysis of the unstructured data in the Amazon dataset helps to measure and evaluate the sentiment information in reviews using natural language processing techniques [4] [7] [34]. Sentiment analysis was implemented on e-commerce product reviews to categorize negative and positive comments and visualize them in charts [4]. The model was developed with unigrams and bigrams and evaluated with classifiers such as linear support vector machine, Multinomial Naïve Bayes, Stochastic Gradient, Random Forest, Logistic Regression and Decision Tree on the product categories cellphone, musical and electronics [4] [27]. The results were measured using accuracy, precision, recall and F-measure, and the linear support vector machine showed the highest accuracy, 93.57, a better result than in other papers. The text mining techniques Apriori and Term Frequency-Inverse Document Frequency (TF-IDF) were applied to identify text features properly [23] [34]. Table I shows how some researchers have implemented sentiment analysis on reviews from the Amazon dataset. The table presents many studies and the different sentiment analysis approaches they take to resolve problems in e-commerce customers’ reviews. Identifying sentiment in an unstructured dataset gives different results when different methods are applied. The challenging part of sentiment analysis is discovering what customers like and dislike from their written expression [15] [27]. The researchers used sentiment analysis methods to identify sentiment scores in online reviews, and the overall results are presented in Table I. Because different researchers have taken different approaches to feature extraction and sentiment classification, future improvement in how the methods are applied is needed to attain greater accuracy.
As Table I shows, most sentiment analysis methods include feature extraction and sentiment classification in their process flow to obtain better accuracy. Many researchers have implemented text mining methods, such as TF-IDF and the Apriori algorithm, to extract the most frequent information from a dataset. Supervised machine learning is used in most of the papers to classify the information. The experimental results from frequent pattern mining and supervised machine learning methods reach more than 90% accuracy. For future investigation, lexicon methods and an analysis of their results are needed so that their accuracy can be compared with that of machine learning methods in sentiment analysis of customers’ reviews.

VII. CONCLUSION
In this paper, we have presented the methodology and approaches of sentiment analysis based on previous studies in E-commerce. The research studies address customer satisfaction on online shopping platforms based on other customers’ reviews. The data analysis approaches provide statistical results for prediction and for building strong confidence among customers who purchase products online. Most studies have looked into many approaches and challenges in judging customers’ behavior, as discussed for the different methods. The approaches are further applied in other fields such as airlines, tourism, the hotel industry and hospitality. Sentiment analysis methodology, interpreted with analytic tools, can deliver accurate results to customers. Hence, many challenges remain, and ongoing research in this area needs to be discussed and improved to produce more efficient and reliable sentiment analysis approaches.