Ranking Beauty Clinics in Riyadh using Lexicon-Based Sentiment Analysis and Multiattribute-Utility Theory

In recent years, the amount of beauty-related user-created content has steadily increased. Digital beauty-clinic reviews have a major impact on user preferences. Ranking beauty clinics via online reviews is therefore a valuable way of supporting user selection decisions, although research on this problem is still fairly limited. Sentiment analysis, which identifies a predefined sentiment in online texts written in a natural language on a particular topic, is an important subject in the research community. Recently, research on sentiment analysis for the Arabic language has become popular, since Arabic has become the fastest-growing language on the web. However, most sentiment-analysis tools are designed for Modern Standard Arabic, which is not widely used on social-media platforms. Moreover, the number of lexicons designed to handle the informal Arabic language is limited, especially in the beauty-clinic-related domain. In addition, numerous sentiment-analysis studies have concentrated on improving the accuracy of sentiment classifiers, whereas studies on choosing the right company or product on the basis of sentiment-analysis results are still missing. In the decision-analysis domain, multiattribute-utility theory has been extensively used to select the best option among a set of alternatives. Thus, this research proposes a systematic methodology that develops a beauty-clinic-domain-related sentiment lexicon in the Saudi dialect, performs sentiment analysis on online reviews of 10 beauty clinics in Riyadh based on the built lexicon, and feeds the lexicon-based sentiment-analysis results to the multiattribute-utility theory method to evaluate and rank the beauty clinics. Results showed that the AbdelAzim Bassam Clinic is Riyadh's best beauty clinic on the basis of the proposed method. The research not only informs data analysts on how to rate beauty clinics on the basis of lexicon-based sentiment-analysis results, but also directs users toward selecting the best beauty clinic.
Keywords: Arabic language; beauty clinics; ranking; lexicon-based; machine learning; sentiment analysis; multiattribute-utility theory


I. INTRODUCTION
The number of women who undergo cosmetic procedures in Saudi Arabia has grown significantly over the last few years. Liposuction, laser hair removal, Botox, and filler treatments are all classified as cosmetic procedures. Exposure to cosmetic procedures has increased in Saudi Arabia due to media penetration, ever-changing beauty standards, the popularity of the fashion and film industries, per capita income growth, reduced prices for cosmetics, and increasing obesity, thereby generating a high demand for cosmetic treatment [1]. Cosmetic procedures are thus becoming a trend in Saudi Arabia not only for women but also, to some extent, for men; one study showed that 90.5% of clients are female and 9.5% are male. Consequently, the number of beauty clinics has also increased [2]. However, according to the Saudi Press Agency [3], many beauty clinics took advantage of this trend in ways that resulted in many distortions and legal problems, especially related to unlicensed cosmetic surgeries carried out in beauty salons and unlicensed beauty clinics that used untrusted medical devices. This phenomenon is a growing concern for Saudi women when choosing an appropriate clinic, as there is no clear study or analysis of beauty clinics in Saudi Arabia, especially in Riyadh.
Currently, with the advent and rapid growth of social-media apps, people have begun sharing their views and opinions on the Internet about, for example, goods, social issues, and policies, which has increased the number of user-generated reviews containing rich opinions and sentiment information. Opinion mining, often referred to as sentiment analysis, is a research field that seeks to examine individual perceptions or opinions toward entities such as topics, people, problems, organizations, or events. Sentiment analysis (SA) classifies subjective text into positive, negative, or neutral sentiments. SA is widely used because people tend to rely on advice from others when making investment decisions. It also allows suppliers to gain insight into the success of their products or services [4].
One of the most commonly utilized techniques is lexicon-based SA. Using this technique, sentiment is classified on the basis of sentiment lexicons. Sentiment lexicons are collections of terms associated with a positive or negative sentiment orientation [5,6]. These terms are also called polar or opinion terms. Several studies have built sentiment lexicons for the Modern Standard Arabic language [7][8][9][10]. However, they are not relevant to views or comments extracted from social-media platforms, which mostly use the informal Arabic language. The informal Arabic language ignores standard rules of spelling and grammar. Furthermore, it can vary from one region to another because each region has its own dialect. Several studies developed informal Arabic lexicons in the Lebanese [11], Algerian [12], and Saudi dialects [13,14], but they might not be relevant to the beauty-clinic-related context. In addition, most sentiment-analysis studies focused on improving accuracy. There is still a lack of studies providing information on how to select the best product or organization on the basis of sentiment-analysis results.
*Corresponding Author. www.ijacsa.thesai.org
There is an established family of methods called multicriteria decision making (MCDM) that can help decision makers to make decisions on a set of alternatives. Some well-known MCDM techniques include multiattribute-utility theory (MAUT) [15], VlseKriterijumska Optimizacija I Kompromisno Resenje (VIKOR) [16], analytic hierarchy process (AHP) [17], and Technique for Order Preference by Similarity to the Ideal Solution (TOPSIS) [18]. Research on MCDM has shown that it is challenging to determine which technique is better because no single technique is the best among the existing techniques [19,20]. However, because of its simplicity, MAUT is commonly used [21][22][23]. MAUT is a well-established form of decision analysis that explicitly addresses how an alternative can be chosen from a collection of alternatives. This approach allows the decision maker to separately assess alternatives for each attribute. The decision maker allocates relative weights to different attributes. Then, a formal model is used to combine and aggregate values and weights to produce a global assessment of each alternative [15]. A few examples of successful MAUT applications include its recent use to rank a number of bridges [24], select iron and steel suppliers [25], and identify attributes for inclusion in the framework of an education recommendation system [26]. Nevertheless, to the best of our knowledge, it has not been used to analyze decisions based on sentiment-analysis results.
To tackle the aforementioned drawbacks, this study proposes a research method that can build a beauty-clinic-domain-related sentiment lexicon in the Saudi dialect, conduct sentiment analysis based on the developed lexicon, and pass the results to the multiattribute-utility theory method to assess and rank the beauty clinics. This study helps potential clients select an experienced and reputable beauty clinic based on reviews from other people with previous experience in cosmetic procedures by extracting their opinions from Twitter, which is one of the most famous social-media platforms where people express their views. The key contributions of this study are as follows:
• A new sentiment lexicon was built for the informal Arabic language (Saudi dialect) from the Twitter accounts of 10 beauty clinics in Riyadh using a manual approach to address the irrelevant sentiment-lexicon problem;
• Online reviews from the Twitter accounts of 10 beauty clinics were preprocessed by performing tokenization, normalization, cleaning, stemming, and stop-word removal, and were classified by sentiment using the newly built sentiment lexicon;
• The performance of the 10 beauty clinics was evaluated by applying multiattribute-utility theory using the results of lexicon-based sentiment analysis as inputs to show their ranking.
The remainder of this study is structured as follows: Section 2 outlines previous work related to the study; Section 3 elaborates on the methodology used for the research; Section 4 introduces the findings of the study; Section 5 addresses the sensitivity analyses; lastly, Section 6 provides the study conclusions.

A. Sentiment Lexicon
A sentiment lexicon is one of the most useful means for carrying out sentiment-analysis research for any language. It is an essential tool for both unsupervised (lexicon-based) and supervised (machine-learning) classifiers. It is used by many researchers to generate unsupervised sentiment models or training features in supervised approaches to train machine-learning algorithms. A sentiment lexicon is a set of opinion words or polar terms correlated on the basis of their orientation, i.e., positive, negative, or neutral. The sources and approaches applied to non-English sentiment-lexicon development are graphically illustrated in Fig. 1. For further information on each method, readers may refer to the work of Kaity and Balakrishnan [27].
Al-Twairesh et al. [28] used a translation-based approach to generate comprehensive Arabic sentiment lexicons (AraSenti-Trans) employing the MADAMIRA tool [29], which recognizes Arabic terms in tweets and eliminates tweets containing non-Arabic terms and dialects. After preprocessing, they applied two lexicons constructed by Hu and Liu [30] and Wilson et al. [31] as sentiment-orientation tools. They used MADAMIRA's English glossary to determine word polarity upon comparing both lexicons in [30] and [31]. They also applied a frequency-based approach to build the AraSenti-PMI lexicon by adopting the pointwise-mutual-information (PMI) measure to differentiate words as either positive or negative in a corpus. Mahyoub et al. [32] used a relationship-based approach by introducing an algorithm that assigns sentiment scores to Arabic WordNet entries for Arabic sentiment-lexicon construction. Using synset relationships, a semisupervised-learning algorithm was applied to increase the number of terms in Arabic WordNet. Badaro et al. [7] used a merger-based approach to combine four existing resources (Standard Arabic Morphological Analyzer [33], English WordNet, English SentiWordNet [34,35], and Arabic WordNet [36]) into a new Arabic sentiment lexicon (ArSenL). However, these lexicons were built for standard Arabic, which is not relevant to the language used on social media, i.e., informal Arabic. Duwairi et al. [37] adopted a crowdsourcing approach to build three sentiment lexicons: a lexicon mapping the Jordanian dialect to Modern Standard Arabic (MSA), a lexicon mapping Arabizi terms to MSA, and an emoticon lexicon. Abdul-Mageed et al. [38] applied a manual approach to develop a sentiment lexicon of 3982 adjectives as part of the SAMAR system, built for MSA and Arabic dialects to examine Arabic subjectivity and sentiments.
Al-Ghaith [14] built the SaudiSentiPlus sentiment lexicon containing 7139 Saudi dialect terms by first using an automatic translation of English sentiment lexicons previously generated by [30] and [39]. He then manually extracted all Saudi dialect terms from Twitter sentiment data. However, the lexicons may not be applicable to the beauty-clinic-related context. Despite it being considered time-consuming and costly, many researchers still use a manual approach, particularly those studying sentiment analysis in languages lacking lexical resources.

B. Sentiment Analysis
Sentiment analysis is the computational analysis of sentiments, opinions, and emotions in a text related to a specific subject [6]. It assists in attaining many objectives, such as observing public sentiments toward political movements, measuring customer satisfaction [40], establishing market intelligence [41], and predicting sales for a product. Therefore, researchers have taken advantage of SA and developed it for various purposes in different fields and languages [42,43]. There are four sentiment-analysis tasks: sentiment classification, subjectivity classification, opinion-spam detection, and review usefulness. SA classification can be done on three levels: the document level, which attempts to show the polarity of an entire document focusing on a single topic (e.g., product or place) from a single opinion holder; the sentence level, which assumes the sentence is a distinct unit that comprises only one opinion; and the feature level, also called the entity level, which seeks to assign polarity to multiple reviews using extracted features [44][45][46][47][48]. These tasks can be completed by applying three main techniques: a lexicon-based, machine-learning, or hybrid technique [49]. The lexicon-based technique uses a lexicon to count and weigh sentiment-related words in order to perform sentiment analysis. The machine-learning (ML) technique predicts the polarity of a large number of posts using training datasets or annotated corpora. The hybrid technique combines the lexicon-based and machine-learning techniques, whereby the lexicon-based classifier builds a dataset by labeling each tweet on the basis of a sentiment lexicon, and the machine-learning-based classifier is trained and tested using the dataset generated from the lexicon-based approach [49,50].

1) Lexicon-Based approach:
A lexicon-based approach, known as the dual-polarity algorithm, was developed by El-Beltagy and Ali [51] to determine the weights of Egyptian dialect sentiment lexicon terms, achieving 70% accuracy. Abdulla et al. [52] constructed a lexicon-based sentiment-analysis method by considering negation and intensification to measure text polarity. They applied a basic lexicon-based approach on an Arabic corpus collected from Twitter and Yahoo!-Maktoob by counting the positive and negative terms in a given text and assigning the text to the category with the larger count. Applying various lexicon scalability stages, they achieved 70.05% accuracy. Ayyoub et al. [8] developed a sentiment lexicon of approximately 120,000 Arabic terms extracted from Twitter, and developed a lexicon-based sentiment-analysis method using a predicate calculus. Their proposed method demonstrated better predictive accuracy (86.89%) compared to that of the keyword-based method. Assiri et al. [10] manually built a Saudi dialect lexicon that includes 14,000 sentiment words. They then built a weighted lexicon-based classifier. Their proposed classifier extracted associations between polarity and nonpolarity terms for a dataset, and then weighted these terms by using their associations. They applied new rules for processing certain linguistic features such as negation and supplication. Their enhanced lexicon-based sentiment-analysis classifier achieved 81% accuracy.
2) Machine-Learning approach:
Hodeghatta [53] conducted sentiment analysis using the naïve Bayes and MaxEnt algorithms to categorize Twitter messages related to Hollywood movies as positive, negative, and cognitive sentiments across various regions of various countries. The classifiers were tested for both unigrams and bigrams. The MaxEnt classifier with unigrams achieved the highest accuracy of 84%. Jokhio et al. [54] determined the sentiment polarity (positive, negative, or neutral) of English tweets related to plastic-surgery treatments in the United States using a naïve Bayes algorithm. The experimental results demonstrate the importance of the suggested method, which may allow people who want to have plastic-surgery treatments to make better decisions. Recently, Naseem et al. [55][56][57] proposed the Deep Intelligent Contextual Embedding (DICE) [55], Hybrid Words Representation [56], and Transformer-based Deep Intelligent Contextual Embedding (DICET) [57] models to improve tweet quality by considering polysemy, syntax, semantics, and out-of-vocabulary (OOV) words, and by handling the noise within the textual context. The evaluation of the models on airline-related datasets indicated that the proposed models improved the accuracy of tweet classification, with average accuracies of 93.5%, 94.2%, and 94.6%, respectively. The results verified that their models can successfully handle low-quality data and language complexities. However, these studies only considered tweets written in the English language.
3) Hybrid approach (Lexicon-Based + ML):
Yoo and Nam [58] proposed a hybrid approach to Korean sentiment analysis, where a lexicon-based classifier was used to locally parse sentiment components to identify opinions, while a naïve Bayes classifier was used to categorize the text in a dictionary by training the MUSE (Multilingual Sentiment Lexica and Sentiment-Annotated Corpora) opinion corpus. Aldayel and Azmi [59] combined a lexicon-based technique and a machine-learning-based algorithm to detect the polarities of Arabic tweets. Through this approach, the lexicon-based classifier was used to label the training data from a manually built sentiment lexicon. The output of the lexicon-based classifier was adopted as training data for the support-vector-machine (SVM) classifier.
However, all aforementioned studies [8,10,51-59] solely focused on how to improve the accuracy or performance of the sentiment classifier, which is not the main focus of this study. There is still a lack of studies providing information on how to select the best product or organization on the basis of sentiment-analysis results.

C. Multiattribute-Utility Theory
Multiattribute-utility theory (MAUT) is a well-established decision-analysis approach that specifies precisely how to choose an alternative from a set of alternatives. Although the functional implementation of the multiattribute-utility technique can differ, the process involves the following steps [15]:
1) Identifying alternatives and value-relevant attributes: The first step is to define the alternatives available and their most important attributes.
2) Separately assessing each alternative in terms of each attribute: Each alternative is assessed in terms of each attribute set out in Step 1.
3) Allocating relative weights to attributes: The weights for each attribute are allocated using a direct-rating method. The total rating for each attribute is calculated and transformed into weights using the following formula:

$$w_i = \frac{r_i}{\sum_{j=1}^{n} r_j} \qquad (1)$$

where $w_i$ is the weight for the i-th attribute, $r_i$ is the relative importance rating score for the i-th attribute, and n is the number of attributes. Thus, the individual weights sum to 1 across the attributes, as is conventional in MAUT [15].
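The direct-rating normalization in Step 3 can be illustrated with a minimal Python sketch, assuming that each attribute's total rating is simply divided by the sum of all totals. The function name and the rating totals are hypothetical and are not taken from the study.

```python
def ratings_to_weights(ratings):
    """Normalize total importance ratings so that the weights sum to 1."""
    total = sum(ratings.values())
    if total <= 0:
        raise ValueError("ratings must have a positive sum")
    return {attribute: score / total for attribute, score in ratings.items()}


# Hypothetical rating totals, not the study's questionnaire data.
example_ratings = {"positive": 50, "neutral": 30, "negative": 20}
example_weights = ratings_to_weights(example_ratings)
```

Because every weight is a share of the same total, the weights sum to 1 regardless of how many respondents contributed ratings.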

4) Aggregating attribute weights and single-attribute assessments of alternatives for a global assessment of alternatives: Weights and single-attribute values or functions are aggregated by adopting the weighted linear additive preference function [15] shown in (2):

$$v(A) = \sum_{i=1}^{n} w_i \, v_i(A) \qquad (2)$$

where $v_i(A)$ is the value of alternative A in terms of the i-th attribute, $w_i$ is the importance weight of the i-th attribute, and n is the number of different attributes [15].
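As a small illustration of the weighted linear additive function in Step 4, the sketch below multiplies each single-attribute value by its weight and sums the products. The function name, the attribute values, and the weights are hypothetical placeholders chosen only to make the arithmetic easy to follow.

```python
def maut_score(values, weights):
    """Weighted linear additive utility: sum of w_i * v_i(A) over attributes."""
    missing = set(weights) - set(values)
    if missing:
        raise KeyError(f"no value for attributes: {sorted(missing)}")
    return sum(weights[a] * values[a] for a in weights)


# Hypothetical single-attribute values for one alternative, with example weights.
values_a = {"positive": 40, "neutral": 10, "negative": 5}
example_weights = {"positive": 0.5, "neutral": 0.25, "negative": 0.25}
score_a = maut_score(values_a, example_weights)  # 0.5*40 + 0.25*10 + 0.25*5
```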

5) Conducting sensitivity analyses and offering recommendations: In the previous step, multiattribute utilities are determined for all alternatives. The best alternative should be that with the highest multiattribute utility. Sensitivity analyses are carried out in the last stage of the process to determine the stability of the results. The effect of various values and weights on the multiattribute utility of the available alternatives can then be calculated. Different values and weights can be obtained using various methods of elicitation. For instance, both the direct-rating method and the bisection method may be applied to obtain values. Thus, multiattribute utilities are measured twice: once using the values obtained with the direct-rating method, and once using the values obtained with the bisection method. Then, the resulting multiattribute utilities for the available alternatives can be compared, and the stability of the results can be assessed. An alternative way of understanding the robustness of the results is to apply a different weighting method. For instance, the equal-weight method can be adopted. This method simplifies the selection process by disregarding information on the relative importance of each attribute [60,61]. This means that the approach assumes that all attributes are of equal weight. For instance, for four attributes, the weight of each attribute would be 1/4 = 0.25. The total value for each alternative is obtained by summing the weighted values in terms of each attribute. The rank-reciprocal and rank-sum weighting techniques can also be used to assign weights [15].
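The alternative weighting schemes named in Step 5 can be sketched as follows. The source does not spell out the rank-sum and rank-reciprocal formulas, so this sketch assumes their usual textbook definitions, where rank 1 denotes the most important attribute; the function names are hypothetical.

```python
def equal_weights(n):
    """Every one of the n attributes receives the same weight, 1/n."""
    return [1 / n] * n


def rank_sum_weights(n):
    """Rank r (1 = most important) receives (n - r + 1) / (n * (n + 1) / 2)."""
    denominator = n * (n + 1) / 2
    return [(n - r + 1) / denominator for r in range(1, n + 1)]


def rank_reciprocal_weights(n):
    """Rank r receives (1 / r) divided by the sum of all reciprocal ranks."""
    denominator = sum(1 / r for r in range(1, n + 1))
    return [(1 / r) / denominator for r in range(1, n + 1)]
```

All three schemes return weights that sum to 1, so any of them can be substituted into the additive aggregation of Step 4 to check whether the ranking is stable.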

A. Phase 1: Collecting Data
In this phase, data were collected using the Twitter application programming interface (API), which is an interface for researchers to collect data from a given social-media service. Opinions were aggregated using the Tweepy library by searching for tweets related to the 10 selected beauty clinics (Panorama, Pearl, Alzallal, Obagi, Abas, Derma, Kesaaee, AbdelAzim Bassam, Adma, and The Clinic). These beauty clinics are the most popular in Riyadh. The overall number of aggregated tweets was 780.

B. Phase 2: Constructing Sentiment Lexicon
A sentiment lexicon is a table where each word is associated with its polarity, which indicates each word's orientation (positive, neutral, or negative). In this phase, an informal Arabic sentiment lexicon in the Saudi dialect was built by manually extracting sentiment words from the collected dataset. Each sentiment word was assigned its polarity. Afterward, the lexicon was used to assign polarity to each tweet using its score.
As this project was conducted in the informal Arabic language, many tweets contained emoji. Since most emoji change the meaning of sentences, an emoji lexicon was also built. A function from the Python library called extract_emojis() was used to find the emoji in our dataset.
In order to simplify the classification process in Phase 4, the lexicon was separated into four groups: positive- and negative-word lexicons, and positive- and negative-emoji lexicons. Each of these lexicons was stored in a different list.

C. Phase 3: Preprocessing Data
In this phase, the dataset was preprocessed using the following techniques:
• Tokenization: Data were transformed from the sentence level to the word level by dividing the text into a set of meaningful tokens using the codecs and nltk libraries, and the word_tokenize() function.
• Text cleaning: All non-Arabic words, special Twitter characters (for example, # and @), usernames, and irrelevant information (for example, uniform resource locators (URLs)) were removed. In addition, retweets were eliminated.
• Text normalization and eliminating repeated letters: In this step, some Arabic letters were transformed into general letters, such as unifying the letter alif by replacing أ, إ, and آ with ا. In addition, repeated letters were removed, for example, reducing مرررة to مرة. This process was done using the re, sys, string, and argparse libraries.
• Stop-word removal: Words that carry no information and have no effect on the output, such as conjunctions, articles, and prepositions, including في/fi (in), من/min (of), and على/ala (on), were removed.
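The preprocessing steps listed above can be approximated with the following minimal Python sketch. It is not the study's implementation: it uses only the re module and whitespace splitting instead of nltk's word_tokenize(), it collapses only runs of three or more identical characters (an assumption made here to leave legitimately doubled letters alone), it does not handle retweets or stemming, and its stop-word set contains only the three prepositions given as examples in the text.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
LATIN_WORD_RE = re.compile(r"[A-Za-z]+")
ALIF_VARIANTS_RE = re.compile("[أإآ]")
REPEATED_CHAR_RE = re.compile(r"(.)\1{2,}")  # three or more of the same character

# Illustrative stop words only; a real list would be much longer.
STOP_WORDS = {"في", "من", "على"}


def preprocess(tweet):
    """Clean, normalize, and tokenize one tweet; return the remaining tokens."""
    text = URL_RE.sub(" ", tweet)             # drop URLs
    text = MENTION_RE.sub(" ", text)          # drop @usernames
    text = text.replace("#", " ")             # drop the hashtag symbol
    text = LATIN_WORD_RE.sub(" ", text)       # drop Latin-script words
    text = ALIF_VARIANTS_RE.sub("ا", text)    # unify alif variants
    text = REPEATED_CHAR_RE.sub(r"\1", text)  # collapse elongated letters
    return [token for token in text.split() if token not in STOP_WORDS]
```

Emoji are left in place by this sketch so that they remain available to the emoji lexicons, provided they are separated from words by whitespace.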

D. Phase 4: Classifying Tweets
Algorithm 1 [59] was applied to classify the preprocessed unannotated tweets as positive, negative, or neutral. The algorithm takes the dataset containing tweets and tokenizes each tweet; it then looks up each token's score in the built lexicon and sums the scores over the tweet. If the overall score for the tweet is greater than zero, a positive label is assigned; if it is less than zero, a negative label is assigned; otherwise, the tweet is left unclassified, which corresponds to neutral sentiment.

Algorithm 1
Input: preprocessed unannotated tweets
Output: set of classified and annotated tweets along with their sentiment (positive or negative), and a set of unclassified tweets (corresponding to neutral sentiment)
Begin
  For each tweet t in training set do
    W = tokenize tweet t into a list of words w1, w2, … and emoji e1, e2, …
    score = 0
    For each wi, ei in W do
      Stem wi, ei
      If (wi is in Lexicon)
        score = score + positivity_degree(wi)
      Else If (ei is in Lexicon)
        score = score + positivity_degree(ei)
    End for
    If (score > 0) return POSITIVE
    Else return score < 0 ? NEGATIVE : NEUTRAL
  End for
End
To make use of our four lexicon groups, a function called tweets_polarity was created to determine the polarity of each tweet. This function takes one parameter (i.e., a tweet) and returns its polarity. First, the tweet is tokenized and stored in a tokenized-tweet list, W. Second, the score variable is set to zero. Third, a for loop runs through each word and emoji in W. The body of the for loop contains an if condition that checks whether the token belongs to a certain lexicon; for example, if the word belongs to the positive-word lexicon, the score variable is increased by one. Fourth, another if condition determines the polarity value from the score variable; for example, if the score variable is greater than zero, the polarity is set to positive.
A for loop was then used to run through each entry in the tweet list, passing each entry to the tweets_polarity function, which returns the polarity value for that tweet. Lastly, the tweets and their polarity values were saved to a dataset. This dataset was used in Phase 5 to train and test three machine-learning classifiers. It was also used in Phase 6 to calculate the total score and rank the beauty clinics.
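A minimal Python sketch of a function in the spirit of tweets_polarity is given below. Several points are assumptions rather than details confirmed by the source: the lexicon entries are illustrative placeholders and not the study's lexicon, tokenization is plain whitespace splitting, stemming is omitted, and a negative match decreases the score by one (the text only states that a positive match increases it by one).

```python
# Illustrative placeholder entries, not the lexicon built in the study.
POSITIVE_WORDS = {"ممتاز", "حلو"}
NEGATIVE_WORDS = {"سيء", "غالي"}
POSITIVE_EMOJI = {"😍", "👍"}
NEGATIVE_EMOJI = {"😡", "👎"}


def tweets_polarity(tweet):
    """Return 'positive', 'negative', or 'neutral' from summed lexicon scores."""
    score = 0
    for token in tweet.split():
        if token in POSITIVE_WORDS or token in POSITIVE_EMOJI:
            score += 1
        elif token in NEGATIVE_WORDS or token in NEGATIVE_EMOJI:
            score -= 1  # assumption: negative entries count as -1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def label_tweets(tweets):
    """Pair every tweet with its polarity, ready to be saved as a dataset."""
    return [(tweet, tweets_polarity(tweet)) for tweet in tweets]
```

Storing the four groups as sets rather than lists keeps each membership check constant-time, which matters little for 780 tweets but scales better to larger collections.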

E. Phase 5: Developing Classification Model
In this phase, classification models were built on the basis of three popular classifiers: random forest (RF), support vector machine (SVM), and K-nearest neighbor (KNN). The models were trained and tested using Python. The dataset generated from Phase 4 was divided into training and testing sets in a 70:30 ratio. All three classifiers were set to their default values. Next, the performance of the models was compared on the basis of their accuracy.
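The comparison described in this phase could be set up along the following lines with scikit-learn. This is only a sketch under stated assumptions: the source does not say which text features were used, so bag-of-words counts from CountVectorizer are assumed here; the random seed is added solely for repeatability; and the toy texts and labels are placeholders, not the study's tweets.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def compare_classifiers(texts, labels, seed=0):
    """Train RF, SVM, and KNN with default settings; return test accuracies."""
    # 70:30 split, as described in the text.
    train_x, test_x, train_y, test_y = train_test_split(
        texts, labels, test_size=0.3, random_state=seed
    )
    # Assumption: bag-of-words features fitted on the training portion only.
    vectorizer = CountVectorizer()
    train_features = vectorizer.fit_transform(train_x)
    test_features = vectorizer.transform(test_x)

    models = {
        "RF": RandomForestClassifier(random_state=seed),
        "SVM": SVC(),
        "KNN": KNeighborsClassifier(),
    }
    accuracies = {}
    for name, model in models.items():
        model.fit(train_features, train_y)
        accuracies[name] = accuracy_score(test_y, model.predict(test_features))
    return accuracies


# Placeholder data, not the study's tweets or labels.
toy_texts = ["great service", "bad service", "average visit"] * 10
toy_labels = ["positive", "negative", "neutral"] * 10
results = compare_classifiers(toy_texts, toy_labels)
```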

F. Phase 6: Ranking Beauty Clinics
In this phase, the multiattribute-utility theory technique was applied in the following steps:
1) Identifying alternatives and value-relevant attributes: In this study, we evaluated the 10 most popular beauty clinics in Riyadh on the basis of three attributes generated from lexicon-based sentiment analysis: positive, negative, and neutral tweets.
2) Separately assessing each alternative in terms of each attribute: The 10 beauty clinics were evaluated using the three attributes generated from the lexicon-based sentiment analysis as a function of the total count of positive, negative, and neutral tweets.
3) Allocating relative weights to the attributes: The weights for each attribute were assigned using the direct-rating method. An online questionnaire was designed where the decision makers (beauty-clinic users) were invited to rate the relative importance of the three attributes on a five-point Likert scale, where 1 denotes the least and 5 denotes the most important attribute. The total rating for each attribute was computed and transformed into weights using (1).

4) Aggregating the weights of attributes and the single-attribute assessments of alternatives to calculate the global assessment of alternatives: Weights and single-attribute values were aggregated using (2).
5) Running sensitivity analyses and generating recommendations: The alternatives were ranked on the basis of the aggregation score obtained in Step 4. The alternative with the highest aggregation score was assigned the highest rank; the alternative with the lowest score was assigned the lowest rank. The alternative with the highest score was considered the most preferred. Sensitivity analyses were run to evaluate the stability of the results. In this study, two sensitivity analyses were conducted: (1) using only the single attribute with the highest weight, and (2) using equal weight. In the first, alternatives were ranked on the basis of the rating score for the attribute with the highest weight. In the second, since we used the equal-weight technique, the weight for each of the three attributes was 1/3 ≈ 0.33. The global value for each alternative was computed, and the alternatives were ranked on the basis of their global score. The ranking results using these techniques were compared with the result using the proposed technique.
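Steps 4 and 5 can be illustrated with a short Python sketch that ranks alternatives under the elicited weights, under the single highest-weighted attribute, and under equal weights. The clinic names and tweet counts are hypothetical; the weights 0.5, 0.4, and 0.1 are the values the paper reports for positive, neutral, and negative tweets, and the function names are introduced here for illustration only.

```python
def global_scores(counts, weights):
    """Weighted additive score for every alternative."""
    return {
        name: sum(weights[a] * attrs[a] for a in weights)
        for name, attrs in counts.items()
    }


def rank_alternatives(scores):
    """Alternative names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical tweet counts, not the study's data.
counts = {
    "Clinic A": {"positive": 30, "neutral": 10, "negative": 5},
    "Clinic B": {"positive": 20, "neutral": 40, "negative": 2},
    "Clinic C": {"positive": 25, "neutral": 5, "negative": 20},
}
weights = {"positive": 0.5, "neutral": 0.4, "negative": 0.1}

# Proposed technique: elicited weights.
base_ranking = rank_alternatives(global_scores(counts, weights))

# Sensitivity analysis 1: only the attribute with the highest weight.
top_attribute = max(weights, key=weights.get)
single_ranking = rank_alternatives(
    {name: attrs[top_attribute] for name, attrs in counts.items()}
)

# Sensitivity analysis 2: equal weights of 1/3 each.
equal = {attribute: 1 / len(weights) for attribute in weights}
equal_ranking = rank_alternatives(global_scores(counts, equal))
```

With these made-up counts the three rankings differ, which shows the kind of instability the sensitivity analyses are meant to reveal.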
To guide the presentation of the results of our analysis, the following six research questions were raised:
• RQ1: What is the structure of the built sentiment lexicon?
• RQ2: What is the total number of positive, negative, and neutral tweets for each beauty clinic?
• RQ3: Which ML classifiers perform best?
• RQ4: Which clinic is the most preferred by clients?
• RQ5: What is the impact of using the single-attribute utility?
• RQ6: What is the impact of using different weighting methods?
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 10, 2020

A. RQ1 Answer: Structure of Sentiment Lexicon
The sentiment lexicon generated from the 780 tweets collected in this study consisted of 658 sentiment words: 276 negative, 276 positive, and 106 neutral words. Table 1 shows a sample of the informal Arabic word lexicon.
It also consisted of 150 sentiment emoji: 72 positive, 54 negative, and 24 neutral emoji. Table 2 presents an example of the emoji lexicon. Since neutral words and emoji were assigned a polarity of zero, they were not inserted into the lexicon list.

B. RQ2 Answer: Total Number of Positive, Negative, and Neutral Tweets for each Beauty Clinic
We were unable to present the results of tweet classification because we are bound to Twitter's privacy policy that prevents the publication of the original tweets that were collected. Table 3 portrays the number of tweets for each beauty clinic that were classified as positive, negative, or neutral. The total count of positive, negative, and neutral tweets was considered as the rating score for each clinic in terms of each attribute. Table 4 demonstrates that SVM outperformed RF and KNN in classifying the tweets as positive, negative, or neutral sentiments with 72.1% accuracy. This result was expected, and it was deemed acceptable for the small size of the training dataset. It may have achieved better performance if we had increased the size of the training dataset. Table 5 shows the total relative importance of each attribute as rated by 46 beauty-clinic users. The weights for each attribute were calculated by using (1). Positive tweets had the highest weight value (0.5), followed by neutral tweets (0.4) and negative tweets (0.1). These weight values were used to aggregate the rating score for each beauty clinic. Table 6 demonstrates the total rating score for each beauty clinic that was computed using (2). The global scores for each beauty clinic were sorted in descending order, and a bar graph was plotted to show the rank.    Fig. 3 illustrates the ranking of the 10 Riyadh beauty clinics. AbdelAzim Bassam Clinic was found to be the most preferred among all beauty clinics investigated in this study on the basis of its rank. www.ijacsa.thesai.org  Table 7 presents the results of the first sensitivity analysis that was conducted using a single attribute. Results showed that the existing ranking of the alternatives changed for five pairs of alternatives:

D. RQ4 Answer: Most Preferred Beauty Clinic
Despite these changes in the ranking of some pairs of alternatives, the first-ranked alternative remained unaffected.
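The aggregation and the single-attribute check can be sketched as follows. This is a minimal illustration that assumes an additive utility form (a weighted sum of the per-attribute rating scores) and interprets the single-attribute analysis as ranking on one attribute alone; both are assumptions, since the exact forms of (1) and (2) are not reproduced here. The weights are those reported in the text, whereas the tweet counts and the function names `global_score` and `rank` are hypothetical.

```python
# Attribute weights reported in the text.
WEIGHTS = {"positive": 0.5, "neutral": 0.4, "negative": 0.1}

# Hypothetical tweet counts per alternative; the real counts are in Table 3.
COUNTS = {
    "A1": {"positive": 50, "neutral": 30, "negative": 5},
    "A2": {"positive": 25, "neutral": 50, "negative": 10},
    "A3": {"positive": 30, "neutral": 20, "negative": 40},
}


def global_score(counts, weights):
    """Assumed additive utility: weighted sum of the attribute scores."""
    return sum(weights[attr] * counts[attr] for attr in weights)


def rank(table, weights):
    """Return the alternatives sorted by global score, best first."""
    return sorted(
        table,
        key=lambda alt: global_score(table[alt], weights),
        reverse=True,
    )


full_ranking = rank(COUNTS, WEIGHTS)
# Single-attribute check (assumed form): rank on positive tweets only.
positive_only = rank(COUNTS, {"positive": 1.0})
```

With these illustrative counts, the two rankings differ in the order of A2 and A3 while the first-ranked alternative stays the same, which mirrors the kind of outcome described above.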

B. RQ6 Answer: Impact of using Different Weighting Methods
Table 8 displays the results of the second sensitivity analysis, which was conducted using equal weights. Results showed that the existing ranking of the alternatives changed for five pairs of alternatives: A2-A4, A3-A7, A3-A10, A6-A8, and A7-A10. The impact of using equal weights was more critical because it affected the first-ranked alternative.
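One way to compare two rankings is to list the pairs of alternatives whose relative order differs between them. The sketch below assumes that this pairwise-inversion reading is what "changed pairs" means, which the text does not state explicitly. The rankings shown and the helper name `changed_pairs` are hypothetical and do not reproduce Table 6 or Table 8.

```python
from itertools import combinations


def changed_pairs(ranking_a, ranking_b):
    """Return pairs whose relative order differs between two rankings."""
    pos_a = {alt: i for i, alt in enumerate(ranking_a)}
    pos_b = {alt: i for i, alt in enumerate(ranking_b)}
    return [
        (x, y)
        for x, y in combinations(sorted(pos_a), 2)
        if (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y])
    ]


# Hypothetical rankings, best first.
baseline = ["A1", "A2", "A3", "A4"]
equal_weight = ["A2", "A1", "A4", "A3"]

pairs = changed_pairs(baseline, equal_weight)
top_changed = baseline[0] != equal_weight[0]
```

Checking `top_changed` separately from the list of pairs distinguishes a reshuffle among lower-ranked alternatives from a change in the recommended alternative, which is the distinction drawn between the two sensitivity analyses.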

C. Validity Threats
There were three threats to validity considered in this study: threats to construct, internal, and external validity.
Threats to construct validity relate to the performance measure used in this study. Machine-learning performance was measured by accuracy. Other performance metrics, such as F-measure, sensitivity, and precision, could also have been used. However, accuracy is widely used to measure the performance of lexicon-based and machine-learning classifiers in sentiment-analysis research.
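To make the difference between these metrics concrete, the following sketch computes accuracy alongside per-class precision, sensitivity (recall), and F-measure using their standard definitions. The label lists are hypothetical and are not results from this study.

```python
def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_metrics(y_true, y_pred, label):
    """Return (precision, recall, F-measure) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical true and predicted sentiment labels.
y_true = ["pos", "pos", "neg", "neu", "neu", "pos"]
y_pred = ["pos", "neu", "neg", "neu", "pos", "pos"]

acc = accuracy(y_true, y_pred)
pos_precision, pos_recall, pos_f = class_metrics(y_true, y_pred, "pos")
```

Accuracy summarizes all classes in one number, whereas the per-class values show how errors are distributed, which matters when the classes are imbalanced.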
Threats to internal validity relate to uncontrolled internal factors that might have influenced the experiment results. Such factors could arise during method implementation. To minimize these threats, we first used a standard lexicon-based algorithm proposed by [59] to build the training dataset. Second, we chose three well-known classifiers provided by the scikit-learn library [62] (RF, SVM, and KNN) to train and test the tweet dataset, and we applied the default settings for all three classifiers. Third, the formula used for MAUT was taken from an established resource [15].
Threats to external validity relate to the possibility of generalizing the results of this study. One possible factor was the use of a dataset from 10 beauty clinics in Riyadh. This dataset was very limited and domain-specific. However, this study focused on evaluating 10 beauty clinics in Riyadh; thus, the best way to reduce this threat was to select the Twitter accounts of the 10 most popular beauty clinics in Riyadh. Another possible factor was the choice of methods used for sensitivity analysis. In this study, we ran sensitivity analyses using the single-attribute evaluation technique and the equal-weight method. Various other methods can be used, such as VIKOR, AHP, TOPSIS, and the rank-sum weighting method. However, the methods used in this analysis are among the many suggested in the literature, and they are commonly used because of their simplicity.

VI. CONCLUSION
In this study, a novel method for ranking beauty clinics in Riyadh was proposed. It started with data collection from 10 beauty clinics in Riyadh using the Twitter API. The dataset was manually transformed into a sentiment lexicon. Then, data were preprocessed by applying tokenization, cleaning, normalization, stop-word removal, and stemming techniques. Next, preprocessed tweets were classified as positive, negative, or neutral using a lexicon-based approach. Lastly, the rating score for each beauty clinic was computed using multiattribute-utility theory to rank the beauty clinics.
Results showed that Abdelazim Bassam Clinic is Riyadh's best beauty clinic on the basis of the proposed method. This study will impact clients when choosing the best beauty clinic. Moreover, it can assist beauty-clinic owners in understanding how they are faring with their clients, as it gives them a better picture of how they stack up against their competitors, thereby providing them with an opportunity to improve their services. Furthermore, this study provides data analysts with an example of how to rate beauty clinics by using lexicon-based sentiment-analysis results.
In the future, we will increase the number of words and emoji in our sentiment lexicon. Furthermore, we will combine some machine-learning algorithms and evaluate their performance in classifying tweets as a function of their polarity. An aspect-based sentiment analysis will be applied to obtain additional attributes to evaluate the performance of beauty clinics.