What Drives Airbnb Customers’ Satisfaction in Amsterdam? A Sentiment Analysis

The sharing economy is a new socio-economic system that allows individuals to rent out their personal belongings, such as their private car or a room in their home, for a short period. This study investigates the attributes that affect customers’ satisfaction when using sharing economy property rental websites. Large data sets of Airbnb’s online reviews and listings in Amsterdam were analyzed using sentiment analysis, word clustering, ordinal logistic regression, and visualization techniques. Findings reveal that the polarity of Airbnb guests’ reviews in Amsterdam is significantly affected by property price, value, cleanliness, overall rating, host communication, ease of check-in, the accuracy of the property description, and whether the host is a super-host. Surprisingly, the property neighborhood was not found to affect customers’ sentiment in Amsterdam. In addition, Airbnb guests in Amsterdam tend to express satisfaction mainly in terms of the property’s exact location and host interaction, followed by the facilities surrounding the property, property cleanliness, and room quality. Negative online reviews, on the other hand, are mainly linked to problems with check-in services, followed by weak host interaction, location, and room quality. The results indicate that Airbnb hosts need to offer clear and easy check-in services and maintain a good communication channel with their guests to enhance customers’ experience and increase their satisfaction. Future studies should investigate whether these findings apply to other cities.
Keywords—Airbnb; customer satisfaction; customer experience; big data; sentiment analysis; ordinal logistic regression


I. INTRODUCTION
The development of media technology has contributed to the prosperity of a new business concept, the sharing economy [1]. A sharing economy is an advanced approach to conducting business in which people share their own resources on a decentralized or peer-to-peer (P2P) platform in exchange for money [2]. It has generated new business opportunities, especially for those who lack essential resources. Additionally, property owners have been given the chance to monetize their unused assets across the globe, allowing them to serve customers from different regions and cultures [3]. The sharing economy provides services at different quality levels and prices around the clock, so that individual needs can be met at different levels of purchasing power. As a result, consumption and production sustainability have been accelerated [4]. The concept of the sharing economy has been applied in different fields, including shared mobility, hospitality, on-demand staffing, and media streaming. Lime, JustPark, and Uber are some examples of sharing economy applications. In the context of hospitality, several companies have emerged, such as Airbnb, Couchsurfing, HomeAway, Roomorama, and HomeExchange. These organizations act as intermediaries between guests and hosts and facilitate communication between them [5].
Looking at Airbnb as an alternative to hotels, several existing studies have explored this platform from different perspectives. The author in [6] examined the substitution and complementary effects of Airbnb supply on hotel sales performance patterns, while [7] studied the extent to which Airbnb is used as a substitute for hotels and [8] studied the price difference between hotel properties and nearby Airbnb listings. Additionally, many other researchers have studied the social contact of guests in Airbnb accommodation. For instance, [9] explored the social contact of Airbnb guests during their stay, taking into account three types of contact (guest to host, guest to community, and guest to guest), whereas [10] studied the reciprocal aspect of social interactions in P2P accommodations. Several other studies have examined factors that might affect guests' booking intentions, such as the impact of gender congruity between guests and hosts on guests' bookings [11] and the impact of property and host descriptions on guests' booking and review posting [12]. Moreover, some authors recognized the importance of studying online advertising strategies [13], price determination [14], trust-attachment building mechanisms [15], and customers' psychological behavior [16]. Despite the diversity of Airbnb research fields, few studies have examined guests' reviews of their Airbnb stay. The authors of [17] investigated the attributes that influence Airbnb users' experiences through text mining and sentiment analysis. Similarly, [18] investigated hidden dimensions in textual reviews using Latent Aspect Rating Analysis (LARA). Analyzing big data such as customer reviews is a key success factor for sharing economy businesses [19]. It enables a business owner to understand the organization's strengths and weaknesses along with customers' habits. It therefore helps to predict future trends, improve decision making, detect fraud, and eventually increase business revenues.
However, to the best of the authors' knowledge, no previous study has investigated the factors that drive customers' satisfaction by analyzing guests' online reviews 255 | P a g e www.ijacsa.thesai.org simultaneously with the listing features. Therefore, to better understand customers' needs and achieve customers' satisfaction, there is a need to identify the factors that make customers satisfied. Analyzing users' comments and reviews with various data mining techniques helps to identify factors affecting users' satisfaction [19]. The research question to be answered in this research is therefore: What makes customers satisfied when using the Airbnb platform in Amsterdam?
The Airbnb online reviews dataset for Amsterdam is analyzed together with the listings details dataset using sentiment analysis, topic clustering, regression, and visualization techniques. The paper begins by reviewing the relevant literature, then clarifies the applied methodology, presents the results and discussion, and finally concludes.

II. LITERATURE REVIEW
A. Sharing Economy
According to [20], the sharing economy concept is the result of a massive shift, not a new phenomenon in itself. This is because the sharing economy is an online community built on economic activities such as selling, buying, and renting, which already existed before. However, the new concept is distinguished by allowing users to participate with different types of resources, in both economic and social terms, whether the resource is human labor, merchandise, a service, or a property [21]. A more comprehensive description was provided by [22], who defined the sharing economy as a P2P environment that enables a person to be a client and a provider at the same time. The sharing economy facilitates social participation and cooperative consumption, which in turn supports protecting the environment, minimizing resource waste, and enhancing community awareness [4]. The sharing economy covers five main areas: product-service systems, redistribution markets, collaborative lifestyles, access-based consumption, and commercial sharing systems.
As mentioned earlier, sharing economy businesses have witnessed rapid development in the last decade. The author in [5] attributes this to the fact that sharing economy businesses are less expensive for consumers, especially in the case of accommodation, where it is cheaper for travelers to choose a place to stay from Airbnb than to rent a room in a hotel. Additionally, [19] demonstrated that scientific and technological revolutions play a major role in sharing economy growth through the development of electronic social networking (e-SN) platforms, which have contributed to the development of society and the economy.
Although business fields differ, online hospitality is one of the best-known fields in the sharing economy. The word hospitality was defined by [22] as having two associated meanings. From the guests' perspective, hospitality is the process of delivering a high-quality service, while from the hosts' perspective it is about providing services and rooms with a focus on profitability. In the sharing economy, hospitality means allowing individuals to use their properties to host visitors for a small financial return through online sharing economy portals such as the Airbnb website.

B. Customers' Satisfaction
The word satisfaction is defined as "a psychological statement with the fulfillment of a need or desire and the pleasure obtained by such fulfillment" [19]. Customers' satisfaction can be interpreted as customers' judgments, opinions, feelings, impressions, or emotional reactions towards their overall experience of a product or a service [23]. The authors of [22] indicated that it is essential to understand customers' satisfaction and how it is affected in order to secure customers' commitment and loyalty. Increasing customer satisfaction increases customer retention and loyalty. Consequently, organizations' revenues and profits rise, as there is a direct link between customers' satisfaction and profit [24]. However, increasing customers' satisfaction comes at a cost, so organizations must be careful not to pursue excessive satisfaction that is expensive but yields no return. To achieve the maximum customer satisfaction level at minimum cost, organizations must understand their customers and what makes them satisfied [24].
Achieving customers' satisfaction can be a big challenge for many organizations. The first step towards increasing customers' satisfaction is identifying customers' current satisfaction levels and the reasons behind them. Capturing and measuring customer experience is an excellent way to achieve that. Analyzing the rating scores and comments that customers produce after a service has ended can provide comprehensive knowledge about customers' satisfaction [25]. In this research, Airbnb guests' reviews are analyzed to better understand customers' satisfaction along with the factors that affect their level of satisfaction.

C. Airbnb
Airbnb, Inc. is an online marketplace, founded in 2008, that offers accommodation, homestays, and tourism experiences. It is one of the leading and most profitable platforms within the sharing economy [26]. The organization works as a broker that connects real estate owners with hospitality seekers. In particular, it links individuals who want to lease out their homes (hosts) with individuals who are searching for lodging (guests) in that region, in exchange for a small commission on each reservation [27]. Hosts take part in Airbnb as a way to earn some income from their property, even though they face the risk that guests may damage it. Guests, on the other hand, benefit from accommodation at relatively lower prices than elsewhere while facing the risk that the property's quality will not be as expected. Generally, Airbnb provides services for both guests and hosts to achieve a better matching outcome [28].
Several types of research have been conducted on Airbnb, some focusing on user experience and multimedia, others on administration, tourism, and architecture [29]. These studies cover multiple areas, including review bias, hospitality exchanges, price and neighborhood prediction, neighborhood ranking, socio-economic characteristics, listing recommendation systems, rentals' distribution, users' preferences and expectations, image mining, demand mining, grading schema, matching schema, consumer segmentation, trust evaluation, innovation adoption, and adoption evaluation. In addition, some research focuses on analyzing customers' satisfaction dimensions [30], analyzing customers' feedback using text mining [3], and analyzing public opinions using content analysis [21]. Previous studies have almost exclusively focused on analyzing user reviews separately, without linking them to listings data. A more comprehensive analysis of the factors affecting customers' satisfaction is required, presented in a way that non-technical users can understand without effort.

III. METHODOLOGY
A. Data Collection and Preparation
Over time, extensive literature has confirmed the importance and usefulness of analyzing online review comments for researchers, business owners, and other customers. Since users' comments reflect the perceptions and feelings formed while using a service or a product, they can be used to measure and analyze users' satisfaction level and to identify the key factors that led to it [31].
On the Airbnb website, guests post reviews that reflect their experiences and opinions about the property they stayed in. This provides data that can be analyzed to identify the factors that make customers satisfied [32]. The current study obtained the online review comments of Airbnb guests about their accommodations in Amsterdam, as well as detailed listing data, from the Inside Airbnb website [33]. Because Airbnb data are publicly available as datasets on the Inside Airbnb website, both datasets were easy to obtain. The listing dataset contains 107 attributes that describe the host, property features, and review scores, while the review dataset contains six attributes and more than 400 thousand review records. The datasets were then pre-processed (cleaned) before the analysis phase, as this step usually helps to generate feasible results [34]. For the listings dataset, cleaning consisted of eliminating attributes that contained errors, blank values, or useless values. Thereafter, new columns were derived from existing attributes, such as host experience in months (derived from the host starting date) and weekly and monthly discounts (derived from prices). The review dataset, in turn, was checked for missing data and non-English content, which were removed. Google Online Spreadsheet was used to detect the language of each comment and to remove non-English records, which represented 18% of the records. Unwanted columns were eliminated, including comment ID, reviewer ID, and reviewer name. Additionally, a date filter was applied to keep only comments from 2019, reducing the number of reviews so that the data could be analyzed on home devices with limited capabilities. Dummy comments containing only characters or numbers were also removed, leaving a total of 110,747 comments eligible for the analysis.
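The review filtering and the derived host-experience attribute described above can be illustrated with a minimal Python sketch. It is not the authors' workflow (they report using spreadsheet tools); the field names, sample rows, and the rule that a usable comment must contain at least one letter are assumptions made for illustration.

```python
import re
from datetime import date

# Hypothetical sample rows; the field names are assumptions, not a verified schema.
reviews = [
    {"listing_id": 1, "date": "2019-03-14", "comments": "Great location, lovely host!"},
    {"listing_id": 1, "date": "2018-11-02", "comments": "Nice stay."},
    {"listing_id": 2, "date": "2019-07-21", "comments": "12345"},
    {"listing_id": 2, "date": "2019-08-05", "comments": "..."},
    {"listing_id": 3, "date": "2019-12-30", "comments": "Check-in was confusing."},
]

# Assumption: a "dummy" comment is one with no alphabetic character at all.
HAS_LETTER = re.compile(r"[A-Za-z]")


def is_usable(review, year=2019):
    """Keep reviews from the given year whose text contains at least one letter."""
    in_year = date.fromisoformat(review["date"]).year == year
    return in_year and bool(HAS_LETTER.search(review["comments"]))


def months_between(start, end):
    """Whole months elapsed between two dates (e.g. host experience in months)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not yet complete
    return months


cleaned = [r for r in reviews if is_usable(r)]
# Hypothetical host start date and reference date.
host_experience = months_between(date(2015, 6, 20), date(2019, 12, 31))
print(len(cleaned), host_experience)
```

Language detection is left out of the sketch because the text attributes it to a spreadsheet tool rather than to a rule that can be restated here.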
B. Data Analysis
1) Sentiment analysis: The analysis stage began with sentiment analysis, which was conducted using MeaningCloud's Text Analytics add-in for Excel. MeaningCloud sentiment analysis produces two sheets. The first contains the global sentiment analysis, including the polarity, i.e., positive (P), negative (N), or neutral (NEW), and a confidence score, while the second contains the topics of the comments, including topic categories and topic types (which were used later in text mining).
2) Regression analysis: The second step in the analysis was identifying the factors affecting users' satisfaction. To begin this step, the two datasets were merged. An ordinal logistic regression (OLR) test was chosen, as suggested by [35], to answer the research question by determining which of the independent variables (IVs) have a statistically significant effect on the dependent variable (DV), which is polarity.
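For reference, a common way to write an ordinal logistic (proportional odds) model for a three-level outcome is sketched below. This is the textbook formulation, given as an assumption; the source does not state which parameterization or software was used. Here $Y$ is the polarity on the study's three-point scale (N = 0, NEW = 1, P = 2), $\mathbf{x}$ is the vector of IVs, $\alpha_0 \le \alpha_1$ are threshold parameters, and $\boldsymbol{\beta}$ is a single coefficient vector shared across thresholds.

```latex
\[
\operatorname{logit} P(Y \le j \mid \mathbf{x})
  = \ln \frac{P(Y \le j \mid \mathbf{x})}{1 - P(Y \le j \mid \mathbf{x})}
  = \alpha_j - \mathbf{x}^{\top} \boldsymbol{\beta},
  \qquad j \in \{0, 1\}.
\]
```

Under this sign convention, a positive coefficient $\beta_k$ lowers $P(Y \le j \mid \mathbf{x})$ for every threshold $j$ and so shifts probability towards the more positive polarity levels.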
The OLR test was selected for this study because the DV is an ordinal categorical variable represented on a three-point scale (N=0, NEW=1, and P=2) and the IVs are a mix of continuous and categorical variables. A multicollinearity test was conducted to ensure that no IVs were highly correlated. Only two IVs (accommodates and beds) were highly correlated, with a result of >0.8; accordingly, the beds variable was removed from the test. Without the multicollinearity test, it would be difficult to determine which variable contributes to the interpretation of the DV [35]. After eliminating the highly correlated variable, the final variable set contained 32 attributes, as follows: host experience in months, host response time, host response rate, the host is super-host, host total listings count, the host has profile pic, host identity verified, neighborhoods cleansed, is location exact, property type, room type, accommodates, bathrooms, bedrooms, bed type, square feet, price, weekly discount, monthly discount, security deposit, cleaning fee, extra people fee, instant bookable, cancellation policy, require guest phone verification, review scores rating, review scores accuracy, review scores cleanliness, review scores check-in, review scores communication, review scores location, and review scores value.
Based on the results of the OLR test, the factors that affect polarity were visualized using Power BI software. Power BI was also used to conduct a text mining test to find the most frequent words within comments. Visualizing the words helped to derive additional insights about the words repeated at different polarity levels.
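The multicollinearity screen can be sketched as a pairwise correlation check against the 0.8 cut-off mentioned above. The sketch assumes the Pearson coefficient, which the source does not name (a variance inflation factor check would be another option), and the listing values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair of columns with |r| > threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs


# Hypothetical values for three numeric IVs (not taken from the study's data).
listings = {
    "accommodates": [2, 4, 2, 6, 3, 4],
    "beds": [1, 2, 1, 3, 2, 2],
    "price": [120, 95, 150, 210, 80, 175],
}

pairs = highly_correlated_pairs(listings)
print(pairs)
```

With these made-up values only the accommodates/beds pair exceeds the threshold, so one of the two would be dropped before fitting the regression, mirroring the decision described in the text.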
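The word-frequency step can be approximated outside Power BI with a short Python sketch. The tokenization rule, the stop-word list, and the sample comments are illustrative assumptions and do not reproduce the tool's behavior.

```python
import re
from collections import Counter, defaultdict

# Small illustrative stop-word list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "was", "and", "is", "to", "very", "in", "but", "we"}


def top_words_by_polarity(rows, n=3):
    """Count non-stop-word tokens per polarity and return the n most frequent."""
    counts = defaultdict(Counter)
    for polarity, comment in rows:
        # Lower-case words, keeping hyphenated terms such as "check-in" whole.
        tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", comment.lower())
        counts[polarity].update(t for t in tokens if t not in STOPWORDS)
    return {p: c.most_common(n) for p, c in counts.items()}


# Hypothetical (polarity, comment) pairs.
rows = [
    ("P", "Great location and a friendly host"),
    ("P", "The location was perfect"),
    ("N", "Check-in was slow"),
    ("N", "Late check-in and the host was unresponsive"),
]

result = top_words_by_polarity(rows, n=1)
print(result)
```

Counting separately per polarity is what allows the frequent words of positive and negative comments to be compared side by side.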

IV. RESULTS AND DISCUSSION
A. Sentiment Analysis and Topic Clustering
The results of the sentiment analysis indicate that Airbnb users mostly had a positive experience in Amsterdam (see Fig. 1). The topic clustering result shows that users were overwhelmingly positive about two aspects of their experience, i.e., location and host (Table I). For instance, the likelihood score of 52% for location means that reviews containing the term location represent 52% of the positive sentiments. Location is an essential indicator of customers' satisfaction, since guests care about the proximity of their accommodation to different facilities such as public transportation, restaurants, parks, stations, markets, and shops. As the second major topic within the positive sentiments, the term host represents 31.66% of the positive results; this indicates that many users were satisfied with the way their hosts treated them. In contrast, the topic that consistently received a negative sentiment was check-in (Table II). The term check-in represents the majority of the negative sentiment, with a likelihood score of 95.69%. This result signals that hosts should improve their check-in services and make the process quick, efficient, and smooth. This can be achieved by following the authoritative guide to the Airbnb check-in process. Fig. 2 shows word clouds that support the previous findings.
Although Airbnb's comments are mainly positive, taking a closer look at the areas where negative sentiments occurred allows Airbnb hosts to address them by fixing problems or setting future expectations (e.g., improving check-in services). Moreover, the strong appearance of topics related to the city's environment (such as location, facilities, transportation, restaurants, parks, stations, markets, and shops) among positive sentiments, compared with their weak presence among negative ones, indicates that Amsterdam's general environment can play an essential role in shaping Airbnb guests' positive experience.
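One reading of the likelihood score, namely the percentage of reviews of a given polarity that mention a topic, can be sketched as follows. The exact formula behind Tables I and II is not given in the text, so this interpretation, the field names, and the sample reviews are all assumptions.

```python
from collections import Counter


def topic_share(reviews, polarity):
    """Percentage of reviews with the given polarity that mention each topic."""
    subset = [r for r in reviews if r["polarity"] == polarity]
    if not subset:
        return {}
    # set() ensures a topic is counted at most once per review.
    counts = Counter(topic for r in subset for topic in set(r["topics"]))
    return {topic: round(100 * n / len(subset), 2) for topic, n in counts.items()}


# Hypothetical reviews with their detected polarity and topics.
reviews = [
    {"polarity": "P", "topics": ["location", "host"]},
    {"polarity": "P", "topics": ["location"]},
    {"polarity": "P", "topics": ["host", "cleanliness"]},
    {"polarity": "P", "topics": ["room"]},
    {"polarity": "N", "topics": ["check-in"]},
    {"polarity": "N", "topics": ["check-in", "host"]},
]

positive_share = topic_share(reviews, "P")
negative_share = topic_share(reviews, "N")
print(positive_share, negative_share)
```

Because each share is computed within one polarity, the figures stay comparable even when positive reviews far outnumber negative ones.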

B. Factors Influencing Comments Polarity
After applying the OLR test, the authors found that only nine of the thirty-two factors influenced customers' satisfaction at a significant p-value of < 0.05 (Table II). These factors are price, room type (Entire Home/Apartment), the host is not a super-host, review scores rating, review scores accuracy, review scores cleanliness, review scores check-in, review scores communication, and review scores value. Surprisingly, the property neighborhood was not found to affect customers' sentiment in Amsterdam, which might be specific to Amsterdam, as it is one of the safest cities in the world.

1) Price:
Price is the first factor that influences customers' satisfaction (B = 5.957, p < 0.05). Fig. 3 illustrates the relationship between properties' daily prices and the polarity of the comments' sentiment. The figure shows that as the price rises, negative reviews decline. In other words, positive comments are more commonly associated with high-priced properties than negative ones are. According to [14], properties with features such as luxurious, penthouse, unique, chic & designed, duplex, Sauna, and Jacuzzi & Spa are relatively more expensive than other properties. It is hence expected that the benefits of such luxurious characteristics make customers willing to pay more.
Similarly, [36] indicates that some amenities are considered a price determinant. The availability of these amenities will give Airbnb customers the quality they are seeking. Eventually, they will be more satisfied and happier compared to customers who didn't obtain these amenities. Therefore, property price impacts customers' satisfaction indirectly since prices are determined by quality, which is critical to gain customers' satisfaction.
2) Room type: The second affecting factor is the room type, which includes entire home/apartment, hotel room, private room, and shared room. Fig. 4 shows the percentage of negative and positive comments among the four property types (calculated as a percentage of the total for the same polarity). For example, negative comments for entire home/apt represent 29% of the total negative sentiments across the four types. The entire home/apt type is more likely to receive negative reviews than positive ones. After comparing the averages of different attributes by user sentiment (positive versus negative), the researchers found some factors that affect the polarity of entire home/apt listings (see Fig. 5). The result indicates that a smaller average square footage is associated with negative feelings. Moreover, as the number of a host's listings increases, the probability of having a dissatisfied guest increases as well. This is evidence of a negative relationship between host total listings and customers' satisfaction. It could be because hosts who handle many properties simultaneously are too busy to provide good customer service and eventually fail to deliver the expected level of care to their guests.

3) Super-host:
The third affecting factor is when the host is not a super-host. Fig. 6 shows a similar distribution of super-hosts and non-super-hosts within the positive comments, which illustrates that whether or not a host is a super-host does not affect the positivity of a sentiment. In contrast, negative sentiments are strongly affected by the host not being a super-host. Fig. 6 shows that 88.8% of unsatisfied customers were hosted by non-super-hosts. This supports the findings of [24], who confirmed that poor services have a more significant impact on satisfaction level than good services. People tend to expect a certain level of quality. Reaching this level will not positively affect their degree of satisfaction, but failing to reach it will significantly reduce their satisfaction. According to [10], what distinguishes P2P accommodation from traditional hotels is the sense of place created by closer interaction with a resident (the host). In this respect, online and face-to-face interactions with the host can influence customers' satisfaction because of the atmosphere the host creates. This supports our finding here, since low host interaction will lead to customer dissatisfaction.

4) Neighborhood:
Surprisingly, although location plays an important role in polarity within the topic clustering results (see Fig. 2), the OLR test provided evidence that neighborhood did not affect polarity (Table II). After a careful review of the results, it seems that the neighborhood by itself has no impact, while the exact location inside each neighborhood does. The reason is that different locations within the same neighborhood district differ in terms of facilities, quietness, transportation, etc. To confirm the OLR result, the relationship between polarity and neighborhoods is represented in Fig. 7 using maps and a bar chart. The maps show a similar distribution of P and N comments, and the bar chart supports this result by indicating that each neighborhood has almost the same distribution rate of Ps and Ns (top 15 neighborhoods). This could be due to the characteristics that make Amsterdam a unique city that everyone would like to visit, such as its high quality of life, its European cultural lifestyle, its small-village feeling, and its beautiful landscape in all neighborhoods. One of these characteristics is safety: [37] illustrated that Amsterdam is considered the fourth safest city in the world and the safest in Europe. The level of safety in all of Amsterdam's neighborhoods is high in terms of personal security, health security, infrastructure security, and digital security. This supports our finding that Amsterdam's neighborhoods do not affect Airbnb customers' satisfaction, since visitors (guests) will be positively biased toward the city and all of its neighborhoods in general.

V. CONCLUSION AND FUTURE WORK
Airbnb represents a flourishing implementation of the sharing economy in general, and of hospitality in particular. While Airbnb remains a topic of significant attention within the sector, prior literature on Airbnb has examined customers' reviews independently. This study is the first endeavor to analyze Airbnb's listing features as factors affecting review polarity. The authors found that price, the host being a super-host, and room type are the main factors affecting customers' satisfaction. Moreover, in the topic clustering test, positive comments contained "Location" as the most frequent word, while negative comments contained "Check-In". In addition, Amsterdam as a place was found to affect Airbnb customers' satisfaction and could itself be a motive for positive comments. This study attempted to avoid a biased-sample effect by expressing each factor, in the visualizations, as a percentage of the total for the same polarity. Even so, the study has limitations: the use of a single, non-random sample such as Amsterdam could affect the results.
This study highlights some possible directions for future research. It would be beneficial to compare results across different countries in order to distinguish common factors from cultural factors. This would help to confirm whether customers' satisfaction is affected by a country's general environment. Furthermore, it would be interesting to test text attributes such as the room description and the host "about" text against polarity. Lastly, this study introduced a new method of determining the factors affecting customers' satisfaction: examining factors against polarity using sentiment analysis and a regression test, then confirming the results using visualization. This approach can be applied to other studies in hospitality and beyond.