Reputation Measurement based on a Hybrid Sentiment Analysis Approach for Saudi Telecom Companies

Thousands of active social media users share their thoughts and opinions on different subjects and issues every day. Many social media platforms are used to express feelings or opinions, and Twitter is foremost among them. On Twitter, opinions are expressed in many fields, such as movies, events, products, and services; this data is considered a valuable resource that helps companies and decision-makers make decisions. This study uses a hybrid approach to extract opinions from Arabic tweets in order to measure service providers' reputation, with the Saudi telecom companies as a case study. The research concentrates on determining people's opinions more accurately by utilizing the Retweet and Favorite counts. The numbers obtained for positive and negative tweets after applying the polarity equation were used to estimate reputation scores. The results indicate that STC has a higher reputation than the other companies. The proposed approach shows promising results for expanding existing knowledge of sentiment analysis in the domain of reputation measurement.
Keywords—Reputation; sentiment analysis; Arabic language; social media


I. INTRODUCTION
Data analysis and mining are among the most important areas studied in computer science [1]. Because of the importance of data in various fields and the amount of data generated every day, many companies have paid great attention to data. The amount of data is increasing day by day because of the large number of social media platforms [2], and a large amount of social media content is processed periodically. This data can provide information about users' loyalty, complaints, and potential customers, or can be used to measure reputation. To deliver services and better understand their customers, companies rely on social media platforms such as Facebook and Twitter [3].
Twitter is one of the most popular platforms, where people express their thoughts and opinions in 140 characters. This microblog has more than 328 million active users, ranging from presidents and academics to organizations and other bodies [4]. Statistics indicate that the share of Twitter users in Saudi Arabia has reached 41%, which is a large percentage compared to other countries [5]. In addition, Arabic is a rich language with a broad audience, so this study focuses on Arabic and on Saudi society.
With millions of opinions disclosed on social media about any given topic, it is unlikely that these opinions will be biased [6]. People tend to consult other people's comments and opinions before buying products, choosing movies to watch, or deciding which countries to travel to. The opinions expressed by online users are a kind of electronic word of mouth (eWOM) [7]. These opinions greatly influence customers' decisions and behavior [8], and they are considered more trustworthy and credible because they are independent of marketers' selling efforts [9].
The spread of sentiment on the Internet has created a new field of text analysis, which has attracted studies in many fields, such as education [10] and the stock market [11]. User-generated comments contribute to the establishment of digitalized WOM forms, which in turn contribute to creating an online reputation. Reputation is a factor that influences the desirability of a product or service, so it needs investment and good management [12] [13]. Reputation affects companies' capabilities and increases their chances of sustainability in the market [14] [15].
The objective of this study is to measure reputation using a hybrid sentiment analysis approach that combines textual and non-textual features. The analysis is improved by including the numbers of retweets and favorites. Finally, the reputation of each service provider is calculated through the Beta reputation equation.
The rest of the paper is structured as follows. Section II reviews related work on customer feedback analysis in different sectors. Section III describes the proposed methodology, which adds new steps to the SA approach: the polarity equation and the reputation calculation. Section IV examines the supervised ML algorithms to determine which performs best, evaluates the reputation equation in two different experiments, and describes the results obtained with the proposed methodology. Section V concludes the work and recommends directions for future development and research.

II. RELATED WORKS
Feedback from customers is important; it may take the form of a complaint, recommendation, evaluation, and so on. Different data mining techniques are applied to this feedback to extract valuable information. The sources of feedback vary, and some researchers have resorted to traditional methods such as surveys. The study in [16] examined customer satisfaction with South Korean airline services for two customer groups: low-cost carriers and full-service carriers. To investigate customer satisfaction, the authors related it to six factors: positive emotion, negative emotion, social words, comparison, risky values, and monetary values. They analyzed more than 133,000 pieces of customer feedback submitted after customers had experienced the airline service. Positive emotion and social words were the most influential factors on customer satisfaction, while monetary value always had a negative impact. The limitations of this study are that it did not consider customers' demographic factors and that it was concerned only with English feedback.
The work in [17] was based on predictive analysis and investigated customers' intention to return to the services provided by airlines. The authors applied a machine learning approach to customer feedback comments and satisfaction ratings for previous use of the services, using seven classifiers to analyze the sentiment in the comments. Their work differed from the previous study in that they looked at the customer's actual intention to return. They also analyzed a large data set containing 309,331 customer comments, and longer comments yielded higher accuracy. Their results indicated that customer feedback is very important for determining the actual intention to return and that it helps in understanding customer behavior over time.
Traditional methods are not recommended because of the limited sample that may participate in a survey. A survey may also be mandatory, which means that respondents may answer only to complete it. Therefore, researchers have turned to alternative methods based on social media or websites. These methods are more effective because they involve different samples and sectors.
The work in [18] was based on sentiment analysis, which the authors referred to as opinion mining. They proposed a new way of analyzing customer feedback from Amazon: a framework that receives any kind of data from the user, preprocesses it, and then processes it. The results show the product's sales volume and market share as a figure, as well as the counts of positive and negative reviews for each feature of the phones sold. This gives the company a better picture of its product and helps it succeed in the market.
The work in [19] focused on customer complaints. The authors analyzed a data set from the Customer Relationship Management (CRM) system that handles customer complaints for the Metropolitan Transportation Authority (MTA). The MTA provides several services, including subways, buses, and a railroad. From the correlation matrix, they found that the agency and subject matter attributes are positively correlated and that there is no negative correlation. They also performed further analysis on the details of the complaints to improve the quality of the services. For classification, they applied different models: naive Bayes, KNN, random tree, and ID3. For each model, the input is the complaint details and the output is the agency name. Their work helped determine the factors that led to customer complaints and helped the company improve the quality of its services.
Comments posted on social media platforms can be considered customer feedback. Mini and Poulose [20] sought to discover customers' attitudes toward eco-tourism, taking a property in Kerala, India as a case study. They collected data from Facebook and performed sentiment analysis on it. They were able to determine customers' satisfaction and dissatisfaction, which helps the quality improvement process. Their study focused only on Facebook comments written in English, and the multiple emotions in the collected comments were not easy to extract.
The study in [21] used user-generated content (UGC) to investigate airport service quality at London Heathrow airport. Sentiment analysis was performed on the collected tweets. The authors were able to evaluate the quality of the services and to determine which services were good and which needed improvement. UGC is considered a better resource than traditional methods. The limitations of this study are that it ignored demographic characteristics and was applied to only one airport.
UGC on social media plays a significant role in measuring reputation. The work in [22] measured the reputation of four companies, Apple, Twitter, Google, and Microsoft, by analyzing tweets. The proposed approach is distinctive in that it deals with misspellings in tweets, can be applied to all languages, and does not need a text preprocessing stage. The authors assigned each tweet to one of the companies and determined the emotion it expressed. By relying on N-gram approaches, their results showed better accuracy than a Bayesian algorithm and a neural network.
The same data set was also analyzed in [23]. That study focused on user reputation, applying several factors to determine which users have a higher reputation than others. The proposed framework first assigned a score to each tweet: 1 for a positive tweet and -1 for a negative one. The authors then used the tweet ID to retrieve the tweet's creator and extracted five factors for each creator: number of followers, number of accounts followed, Twitter lists, number of retweets by followers, and number of tweets posted by that user. From these five factors, they calculated the user's reputation. They then calculated the accumulated weight by multiplying the sentiment score by the user reputation, so that a tweet from a higher-reputation user has a higher accumulated weight. The proposed method helps minimize effort and data size by extracting only the tweets relevant to higher-reputation users.
Online reputation may be useful in determining and measuring profits, according to the study in [24]. The authors merged Booking reviews with Financial Analysis Made Easy (FAME) data and applied latent semantic analysis to the textual content. They were able to determine which attributes are associated with financial profitability. From the positive reviews, they determined that hotel location and room quality were the attributes with the highest impact on financial profitability. The study was conducted only on UK hotels, and the reviews were extracted from the Booking platform only.
In the tourism sector, there have been several studies, such as the work in [25], which measured Marbella's reputation as a tourist destination. The study focused on the factors that attract tourists and on determining the geographical distribution of the countries from which they came. The authors analyzed visitor reviews of the city according to sociodemographic characteristics over three years. Their study focused only on TripAdvisor reviews.
The study in [26] addressed a different sector, politics. The authors examined the reputation of the members of the National Assembly (MNAs) of Pakistan on Facebook and Twitter. They were able to determine the credibility of the members and which platform has the more supportive audience. The comments were chosen using a systematic random sampling technique. The audience on Facebook was more supportive than the audience on Twitter.
Much research has been dedicated to analyzing customer feedback. Regrettably, few studies consider the Arabic language.
The study in [27] aimed only at determining the polarity of text and therefore dealt with several topics in politics, sport, and education. The authors collected data from Twitter and Facebook and then applied three classifiers, SVM, Naïve Bayes, and K-NN, using RapidMiner. The data set was small, so the results were not promising.
The study in [28] addressed the same area and was concerned with the Saudi telecom companies STC, Mobily, and Zain. It focused on assigning polarity scores to tweets using SWN. Polarity determination in SWN is lexicon-based, and the results showed that most of the tweets tended to be neutral. Defining polarity helped in better understanding customers and their needs. The authors were also able to identify the areas of customers who had complained about certain problems.
Previous studies on the telecom industry determined sentiment polarity using lexicon-based approaches. All previous studies focused on text analysis only and did not include the numbers of retweets and favorites. Table I gives an overview of the previous studies, their purpose, source of data, and the approach used in their methodology.

III. PROPOSED METHODOLOGY
The methodology followed in this study consists of five phases. Fig. 1 shows an overview of the proposed methodology. Each phase is explained in detail later in this section.
The polarity of the collected tweets was determined as positive, negative, or neutral. For this study, only positive and negative tweets were included. Most sentiment analysis work focuses on polarity only. For greater precision, a hybrid approach that combines textual and non-textual features was applied. The hybrid approach is based on the polarity equation [29]. The number produced by the polarity equation for each of the positive and negative classes is the input to the reputation equation [30].

A. Phase One: Data Collection
The data for this study was extracted through the Twitter API. The analysis focused on the telecom service provider sector in Saudi Arabia, with data obtained for three providers: STC, Mobily, and Zain. The data was collected from the official accounts of the three providers: @stc_ksa, @Mobily, and @ZainKSA. Each of these is an active account with thousands of tweets exchanged daily.
For each account, 5,000 tweets were collected, for a total of approximately 15,000 tweets. Due to time constraints, data were collected during February only. The collected tweets were mostly in Arabic; any English words or letters they contained were removed later, as shown in Section III-B.
For each tweet, the creation time, tweet ID, text, numbers of retweets and favorites, user location, screen name, followers, and friends were extracted. For this study, only the text and the numbers of retweets and favorites were used. Fig. 2 shows descriptive statistics for the numerical values in the dataset.

B. Phase Two: Data Preprocessing
To prepare the gathered data for sentiment analysis, it has to be cleaned and preprocessed. Preprocessing means removing all noise, unwanted parts, missing values, and duplicate data. Performing these procedures leads to quality data and reliable results. The preprocessing steps followed are:

1) Removing unwanted parts:
 Any English words, letters, or numbers.
3) Normalizing repeated letters when a letter was repeated more than twice.
The collected tweets are considered unlabeled data. To use the ML algorithms, an online tool was used to label the data. The Mazajak tool, created by Ibrahim Abu Farha and Dr. Walid Magdy [31], classifies text polarity as Positive, Negative, or Neutral. However, some word polarity may not be detected correctly. For example, words that contain a prayer to Allah produce a positive label. The tweet (يارب متى تعدلون نتكم؟), roughly "O Lord, when will you fix your internet?", was labeled as positive although it expresses a negative opinion. Fig. 3 shows the class distribution for the three companies.
After selecting the Positive and Negative tweets, the amount of data became 2,770 tweets for STC, 1,552 for Mobily, and 2,553 for Zain, for a total of 6,875 tweets across the three companies.
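The cleaning steps listed in this phase can be sketched with regular expressions. This is a minimal illustration, not the authors' code: the function name `clean_tweet` is hypothetical, only ASCII letters and digits are treated as "English words, letters, or numbers", and collapsing a run of three or more identical characters down to two is an assumption, since the text does not state the target length.

```python
import re


def clean_tweet(text: str) -> str:
    """Hypothetical cleaning helper for an Arabic tweet (illustrative only)."""
    # Step 1 (assumed scope): drop runs of ASCII letters and digits.
    text = re.sub(r"[A-Za-z0-9]+", " ", text)
    # Step 3 (assumed target): collapse any character repeated three or
    # more times in a row down to two occurrences.
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    # Normalize whitespace left behind by the removals.
    return " ".join(text.split())


if __name__ == "__main__":
    # An Arabic word with an elongated final letter, followed by noise.
    sample = "\u0634\u0643\u0631" + "\u0627" * 4 + " stc 123"
    print(clean_tweet(sample))
```

Arabic-Indic digits and punctuation are left untouched in this sketch; handling them would need additional character classes.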

C. Phase Three: Sentiment Analysis
This sentiment analysis stage is the primary stage, in which the analysis is performed on text only. The binary sentiment analysis works only with Positive and Negative tweets; tweets rated Neutral were excluded from this analysis.
To prepare the data for classification, the text must be converted into a numerical representation. The Bag-of-Words representation, which works well with short text, uses the unique words in the dataset as features. Therefore, for the tweets in this case, Count Vectorizer was used for feature extraction.
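To make the Bag-of-Words idea concrete, the sketch below builds a vocabulary of unique words and turns each text into a vector of word counts. It is a pure-Python stand-in for what a count vectorizer does, not the tool used in the study; the function names and the whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def build_vocabulary(docs: List[str]) -> Dict[str, int]:
    """Map every unique whitespace-separated token to a column index."""
    tokens = sorted({tok for doc in docs for tok in doc.split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorize(doc: str, vocab: Dict[str, int]) -> List[int]:
    """Return the count of each vocabulary word in the document."""
    counts = Counter(doc.split())
    vector = [0] * len(vocab)
    for tok, n in counts.items():
        if tok in vocab:  # words unseen at fit time are ignored
            vector[vocab[tok]] = n
    return vector


if __name__ == "__main__":
    docs = ["slow network slow", "fast network"]
    vocab = build_vocabulary(docs)
    print(vocab)
    print([vectorize(d, vocab) for d in docs])
```

Each tweet becomes one row of counts, so the classifier sees only how often each word occurs, not word order.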
The dataset was split into training and testing sets. The training set was used to build the classification models, and the testing set was used to evaluate them. Three Machine Learning (ML) algorithms were applied to the dataset: Support Vector Machines (SVM), Decision Tree (DT), and Multinomial Naive Bayes (MNB).
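As a rough illustration of the split-then-train workflow, the sketch below implements a random split and a small Multinomial Naive Bayes classifier with add-one smoothing in pure Python. It covers only MNB (SVM and DT are omitted), and every name, the 0.2 test ratio default, and the smoothing choice are assumptions for illustration; the study most likely relied on a library implementation.

```python
import math
import random
from collections import Counter, defaultdict


def split_train_test(items, test_ratio=0.2, seed=0):
    """Shuffle a copy of the items and cut it into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_ratio)))
    return shuffled[:cut], shuffled[cut:]


def train_mnb(samples):
    """Count documents per class and words per class from (text, label) pairs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_docs[label] += 1
        for tok in text.split():
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict_mnb(model, text):
    """Pick the class with the highest log prior plus smoothed log likelihood."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.split():
            if tok in vocab:  # skip words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    data = [("good fast", "pos"), ("great good", "pos"),
            ("bad slow", "neg"), ("slow bad bad", "neg")]
    model = train_mnb(data)
    print(predict_mnb(model, "good"), predict_mnb(model, "slow bad"))
```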

D. Phase Four: Popularity Scoring
The popularity scoring equation takes the numbers of retweets and favorites into consideration when computing polarity. It can be considered an advanced sentiment analysis stage: whereas the primary stage deals with text, the advanced stage deals with retweets and favorites.
A retweet means reposting someone's tweet or message in your timeline [32]; popular tweets are reposted many times. Anyone following you is able to see the retweeted tweet. Favoriting a tweet means keeping that tweet on your account under the favorites tab. To favorite a tweet, the user presses the heart-shaped icon. Both retweets and favorites are considered signs of agreement with and admiration for the content.
The Popularity Scoring Equation applied here is adopted from [29]. It consists of two equations that depend on each other: the outputs of the first equation are the inputs to the second.
The first popularity scoring equation, equation 1, takes the sum of the classified tweets Ct (e.g., positively classified tweets) plus the number of their retweets #RT plus the number of their favorites #Fav.
The output of equation 1, which is the popularity score of the classified tweets (e.g., positively classified tweets), is multiplied by 100 and divided by the total popularity score of all classified tweets (equation 2).
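Read literally, the two equations amount to PS(c) = Ct + #RT + #Fav for each class c, followed by PS(c) × 100 / Σ PS over all classes. The sketch below follows that reading. It is an interpretation, not the formula as printed in [29]: it assumes Ct is the count of tweets in the class and that retweets and favorites are summed over those tweets, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# One classified tweet: (label, retweet_count, favorite_count)
Tweet = Tuple[str, int, int]


def popularity_scores(tweets: List[Tweet]) -> Dict[str, int]:
    """Equation 1 (as interpreted): tweets + retweets + favorites per class."""
    scores: Dict[str, int] = {}
    for label, retweets, favorites in tweets:
        # Each tweet contributes itself (1) plus its retweets and favorites.
        scores[label] = scores.get(label, 0) + 1 + retweets + favorites
    return scores


def polarity_percentages(scores: Dict[str, int]) -> Dict[str, float]:
    """Equation 2 (as interpreted): each class score as a share of the total."""
    total = sum(scores.values())
    if total == 0:
        return {label: 0.0 for label in scores}
    return {label: score * 100 / total for label, score in scores.items()}


if __name__ == "__main__":
    tweets = [("positive", 3, 5), ("positive", 0, 1), ("negative", 1, 0)]
    scores = popularity_scores(tweets)
    print(scores)
    print(polarity_percentages(scores))
```

Under this reading, a single widely retweeted tweet can outweigh many tweets that attracted no engagement, which is the intended effect of the advanced stage.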

E. Phase Five: Reputation Calculation
Most reputation measurement systems rely on opinions generated by customers, where these opinions reflect the general community's opinion [33]. Among the various principles of reputation measurement systems, this study is based on Bayesian systems. Bayesian systems take binary classes as input [34], and reputation scores are calculated by statistical updating of beta probability density functions (PDF). To measure the reputation of service providers, the equation presented in [30] was adopted. The input from the sentiment analysis system is the positive and negative classes.
In the proposed approach, the output of equation 2 is used as the input to the reputation equation. After the polarity of both the positive and negative classes has been calculated with the numbers of retweets and favorites, the polarity number of each class (e.g., the positive class) is taken as that class's representation in the reputation equation. Equation 3 is the reputation equation, in which α = r + 1 and β = s + 1, where r is the popularity score for the positive class and s is the popularity score for the negative class.
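One plausible form of the reputation equation, offered as an assumption rather than as the exact expression in [30], is the expected value of a Beta(α, β) distribution with the parameters defined above. This reading is consistent with the STC score of 0.59 reported in Section IV-B for 1,636 positive and 1,134 negative tweets.

```latex
\mathrm{Rep}(r, s) \;=\; E\!\left[\mathrm{Beta}(\alpha, \beta)\right]
\;=\; \frac{\alpha}{\alpha + \beta}
\;=\; \frac{r + 1}{r + s + 2},
\qquad \alpha = r + 1,\quad \beta = s + 1
```

With no evidence at all (r = s = 0) the score is 0.5, and it moves toward 1 as positive evidence dominates and toward 0 as negative evidence dominates.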

IV. RESULTS AND EVALUATION
This section presents the reputation results calculated with the proposed methodology by applying the hybrid sentiment analysis approach to the 15,000 collected tweets.

A. Evaluation Metric
The proposed sentiment analysis consists of two stages: the primary stage and the advanced stage. In the primary stage, the SVM, DT, and MNB classification algorithms are used to capture sentiment. 80% of the data was used for training and 20% for testing. To measure classification performance, four measures are used in this study: accuracy, precision, recall, and F-score.
Accuracy: the degree of closeness of the classified outcomes to the true values. Measuring accuracy is important because it reflects the percentage of correctly recognized patterns and polarities. It is computed from the counts of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
F-Score or F-Measure: the harmonic mean of precision and recall. It integrates both precision and recall into a single score.
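The four measures can be computed directly from the confusion-matrix counts using their standard definitions. The sketch below is illustrative: the function name is hypothetical, the example counts are made up rather than taken from the study, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard accuracy, precision, recall, and F-score from TP, TN, FP, FN."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-score is the harmonic mean of precision and recall.
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(classification_metrics(tp=40, tn=45, fp=5, fn=10))
```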
After the primary sentiment analysis stage was performed, the evaluation results for Precision, Recall, F-score, and Accuracy are summarized in Tables II, III, and IV.
According to Tables II, III, and IV, SVM gives the highest accuracy among the three classifiers. Its accuracy for STC, Mobily, and Zain is 92%, 95%, and 89%, respectively, as Fig. 4 shows. DT mostly gives lower results than the other classifiers because some classes are imbalanced.

B. Evaluation of Reputation
As mentioned before, the analysis included only the positive and negative labels. After eliminating neutral-labeled, duplicated, and blank tweets, the dataset was reduced to 6,875 tweets. Two tests were applied to capture the impact of the Popularity Scoring equation on measuring reputation.
The first test was applied without the Popularity Scoring equation, so the numbers of retweets and favorites were not included. Only the total number of classified tweets in each class (e.g., positively classified tweets) was used as input to the reputation equation. For example, STC has 1,636 positively classified tweets and 1,134 negatively classified tweets. Applying reputation equation 3 gives a reputation score of 0.59.
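The STC figure can be reproduced with a short calculation, assuming the reputation equation is the expected value of a Beta distribution, α / (α + β), with α = r + 1 and β = s + 1. That form is an assumption that happens to match the reported 0.59; the function name is hypothetical.

```python
def beta_reputation(r: float, s: float) -> float:
    """Expected value of Beta(alpha, beta) with alpha = r + 1, beta = s + 1."""
    alpha = r + 1
    beta = s + 1
    return alpha / (alpha + beta)


if __name__ == "__main__":
    # First test for STC: raw counts of positive and negative tweets.
    stc_score = beta_reputation(1636, 1134)
    print(round(stc_score, 2))  # 0.59
```

The same function accepts the popularity scores from the second test in place of raw tweet counts.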
The second test applied the Popularity Scoring equation, so the numbers of retweets and favorites were counted. The result after applying the reputation equation was higher: the reputation score for STC was 0.97.
Taking the numbers of retweets and favorites for each tweet into consideration improved the reputation calculation. Table V and Figs. 5 and 6 show the difference in reputation score between the two tests.
[Figure panels: (a) STC Reputation Score; (c) Zain Reputation Score.]
The results show that the STC reputation score was higher after applying the Popularity Scoring equation, while Mobily and Zain had lower reputation scores after applying it. The decrease in the reputation scores of Mobily and Zain is not due to defective or erroneous results; rather, applying the Popularity Scoring equation gave more accurate results that include the retweet and favorite counts.

V. CONCLUSION AND FUTURE WORK
In this study, a new opinion review approach was presented, based on a hybrid approach that performs sentence-level sentiment analysis to calculate reputation scores from Arabic tweets. The divergence of opinions among customers of telecom service providers on Twitter creates a need for a new approach to determining the reputation score of service providers, which involves developing a new way to compute the polarity of Arabic sentiment.
A new step was added to the traditional sentiment analysis process to enhance its accuracy. First, the significance of counting retweets and favorites, which represent non-verbal opinions, in the polarity score was explained. Second, a reputation approach based on the sentiment polarity score was developed. The first step could help improve the accuracy of sentiment analysis on Arabic text. Different measures were used to evaluate the performance and efficacy of the classification: accuracy, precision, recall, and F-score. The results indicate that SVM is the best-performing classifier, while the Decision Tree is the lowest-performing one. The reputation scores also indicate that STC has the highest reputation among its customers compared to the other companies.
In future work, the approach will be expanded to consider multilevel word polarity instead of binary polarity. There is also a need to study the demographic characteristics of customers. In data labeling, some word polarity may not be detected correctly. Such instances are currently not handled, so it would be better to use a lexicon-based approach alongside labeling the data with the Mazajak tool. Moreover, spam tweets should be identified and eliminated to deliver more reliable reputation scores.