Netnography and Text Mining to Understand Perceptions of Indian Travellers using Online Travel Services

Advancements in the electronic commerce industry have helped online travel services in many ways. This paper examines the overall experience of travellers using online services and the sentiments derived from a collection of reviews of online travel service providers, known as online travel agents (OTA), in India. Customer reviews are collected from identified sources, and the satisfaction of travellers using various online travel services is analyzed using netnographic analysis and text mining. The paper also details the process of data collection and of analysis using netnography and text mining, which is used to derive sentiments from the collected reviews. The results are presented as token lists, keyword analysis, and service-specific analysis. Statistical tests on these results are used to understand the relationship between various services and OTA.

Keywords: Consumer; travellers; netnography; text mining; OTA; sentiment; perception


A. Tourism Market in India
India is known for its rich history and cultures. India placed 40th out of 136 nations in the Travel & Tourism Competitiveness report for the year 2017 [1]. The tourism industry plays a significant role in the Indian economy [2].
The current scenario suggests that India has a bright future in the tourism sector, as a great deal of development is taking place in the Indian tourism industry. In the last few years, the Ministry of Tourism, Government of India, has made various attempts to lift tourism in the country, which has opened research opportunities in this area. Schemes such as PRASAD, the e-tourist visa, and mobile applications for tourists are helping to attract new tourists [3].

B. Online travel services in India
The number of online bookings in India is expected to reach 4 billion by the end of 2020 [4]. Makemytrip.com ranks high on the list of travel sites that serve as a one-stop shop for the whole group. This online travel booking firm is a commonly known name these days, and it has been collaborating with important players in the industry. Yatra.com [5] caters to travel agents, long-standing corporate customers, and casual backpackers alike, and is regarded as an excellent travel site in India.
Yatra.com promotes the best prices and cashback deals on air travel.
Cleartrip offers online booking of train and flight tickets for domestic and international travel. Goibibo.com is another provider that offers deals and bookings for trains and flights. Thomas Cook handles everything for travellers, from booking international and domestic holidays to facilitating foreign exchange, visas, passports, and travel insurance. Travelguru.com, regularly rated as one of the most visited Indian travel sites on the web, is a treasure house when it comes to booking air tickets and lodging.
Netnography and Tourism industry: Netnography is a method used for studying cultures and communities online. It is a tool for understanding social interaction in digital communications [6]. Netnography, an accepted form of virtual ethnography, is explored here along with text mining. The approach used in this paper also produces an analysis of the perceptions of travellers in India using text mining techniques.
Phases in the Netnography Process [11]: Fig. 1 shows the phases in the process of netnographic analysis, which are as follows:
1) Research planning: Research planning [3] is the initial phase in netnographic studies. This phase covers defining the problem, defining the research objectives, and translating the research objectives into a specific set of questions.
2) Entrée: In this phase, online communities, blogs, and groups are identified, and the ethnographer enters the communities to gain a better knowledge of their cultures.
3) Data collection: Data collection is a vital phase in this type of study. In this phase, the netnographer collects Internet data, interview data, and field notes. Travellers' reviews were downloaded from rating and referral sites such as mouthshut.com and consumeraffairs.com. These websites give access to millions of reviews posted by travellers in India. All reviews were collected following the systematic steps of the netnographic process [7].
4) Interpretations/Data analysis: This phase uses a data analysis technique known as analytical coding. Data analysis consists of coding, noting, abstracting, checking and refining, generalizing, and theorizing, as shown in Fig. 2.
For this study, the data collection process was performed on sources such as mouthshut.com [8] and ConsumerAffairs.com. More than 2000 reviews from these platforms have been collected and maintained in an Excel sheet.

II. RESEARCH METHODOLOGY
The online travel agents considered in this research work include MakeMyTrip, Goibibo, Cleartrip, redBus, and Yatra. A good number of reviews for each of the selected agents [12] are evaluated using netnographic studies. The netnographic analysis is used to derive the sentiments, and the levels of sentiment, of travellers using these online travel services in India. Fig. 3 shows the overall rating for Cleartrip on mouthshut.com. Fig. 4 shows a sample review and rating for Cleartrip [7]. In addition to the actual comment, reviews contain values from 1 (low) to 5 (high) for each of the website components. Fig. 5 is a sample of the overall rating for MakeMyTrip. Fig. 5 shows that, across all five website components, the overall rating for MakeMyTrip is 2 [8].
Data collected from the identified referral and rating sites is stored in an Excel sheet. The data contains the following fields: 1) Date of review, 2) Reviewer name, 3) Gender, 4) Age, 5) Location, 6) Review, 7) Source, 8) Review rating, 9) Service and support, 10) Information depth, 11) Content, 12) User-friendly, 13) Time to load. Fig. 6 shows an Excel document of reviews collected from different sources for the online travel agents.
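As an illustration of the record layout described above, the following minimal Python sketch models one review with the thirteen listed fields and loads rows from CSV text. The class name `Review`, the column names, the CSV export format, and the sample row are assumptions made for illustration; the paper only states that the data is kept in an Excel sheet.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Review:
    """One collected review; field names are hypothetical labels for the 13 listed fields."""
    date: str
    reviewer: str
    gender: str
    age: str
    location: str
    review: str
    source: str
    rating: int
    service_support: int
    information_depth: int
    content: int
    user_friendly: int
    time_to_load: int


# Columns that hold 1 (low) to 5 (high) scores and must be converted to integers.
SCORE_FIELDS = ("rating", "service_support", "information_depth",
                "content", "user_friendly", "time_to_load")


def load_reviews(csv_text):
    """Parse CSV text (assumed to be exported from the sheet) into Review records."""
    reviews = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name in SCORE_FIELDS:
            row[name] = int(row[name])
        reviews.append(Review(**row))
    return reviews


# Hypothetical sample row, not taken from the study's data.
SAMPLE_CSV = (
    "date,reviewer,gender,age,location,review,source,rating,"
    "service_support,information_depth,content,user_friendly,time_to_load\n"
    '2019-01-05,Reviewer A,M,34,Pune,"Horrible experience, hotels charge for everything.",'
    "mouthshut.com,1,1,2,2,2,3\n"
)

reviews = load_reviews(SAMPLE_CSV)
print(reviews[0].source, reviews[0].rating)
```

Keeping the score columns as integers makes later steps, such as rating percentages per OTA, a matter of simple counting.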

A. Text Pre-Processing
The reviews collected from various online platforms and referral and rating sites need a series of text preprocessing steps before they can be analyzed using text mining. Text preprocessing consists of a series of stages, which include spelling normalization, filtering, and lemmatization. Essential preprocessing tasks also include content cleanup, tokenization, and part-of-speech (POS) tagging [9]. Table I shows the phases in determining positive and negative reviews. The process of determining sentiments consists of essential steps such as splitting words, POS tagging, lemmatization, joining, and then sentiment analysis.
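The early preprocessing stages can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the implementation used in the study: the function names and the small stop-word list are assumptions, and POS tagging and lemmatization are left out because they would need an external NLP library.

```python
import re

# Hypothetical stop-word list for illustration; a real study would use a fuller list.
STOPWORDS = {"a", "an", "and", "for", "is", "of", "the", "to"}


def to_lowercase(text):
    """Normalize case so 'Customer' and 'customer' count as one token."""
    return text.lower()


def strip_punctuation(text):
    """Replace punctuation with spaces, keeping apostrophes so "don't" survives."""
    return re.sub(r"[^\w\s']", " ", text)


def tokenize(text):
    """Split the cleaned text into word tokens on whitespace."""
    return text.split()


def remove_stopwords(tokens):
    """Filter out common words that carry little sentiment."""
    return [tok for tok in tokens if tok not in STOPWORDS]


def preprocess(review):
    """Run the stages in order: lowercase, punctuation removal, tokenization, filtering."""
    return remove_stopwords(tokenize(strip_punctuation(to_lowercase(review))))


review = ("Horrible experience, hotels don't provide basic amenities and charge "
          "for everything. Customer service part is worst.")
tokens = preprocess(review)
print(tokens)
```

The sample sentence is the review quoted in Table I; the resulting token list is what a later step would pass to tagging, lemmatization, and sentiment scoring.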

Process: Output
Initial Review: Horrible experience, hotels don't provide basic amenities and charge for everything. Customer service part is worst.
Lowercase: Horrible experience, hotels don't provide basic amenities and charge for everything. Customer service part is worst.
Punctuation: Horrible experience hotels don't provide basic amenities and charge for everything customer service part is worst

The results obtained from this research are presented in this section.

1) Token identification:
Various subsections are identified based on the results. The textual reviews were processed using text mining techniques, and the resulting tokens are listed in detail below. This section presents the top 25 tokens from the overall volume of reviews processed, as well as per online travel agent [10].
The tokens identified are used to derive the overall perception of the travellers. Table II lists the top 25 tokens, by frequency of occurrence, in both the positive and the negative category. Among the overall positive tokens, care, support, and help are the words most frequently used by reviewers when posting reviews. Similarly, in the negative category, words like problems, fraud, cheat, and mistakes are commonly used by reviewers when expressing themselves on social platforms.

2) Gender-specific (Male) Analysis of tokens:
Table III lists the top tokens for male travellers based on their frequency of occurrence in reviews. Table III shows that support, care, friend, and help are the words male travellers use most frequently in reviews, which denotes the satisfaction of male travellers.
Similarly, in the negative category for males, the commonly used words are problem, fraud, waste, and mistake. These represent the dissatisfaction of travellers with online travel-related services. Overall, the male analysis (Tables III and IV) indicates that travellers use quite similar words when expressing dissatisfaction on social platforms.

3) Gender-specific (Female) Analysis of tokens:
This section describes the tokens used in reviews by female travellers when expressing themselves on social platforms. Tables V and VI list the top 5 tokens based on their frequency, and show their importance in deriving positive and negative sentiments from the reviews posted by female travellers. In the positive category, the words most frequently used by female travellers are care, support, please, and so on. In the negative category, the most frequently identified words are problem, fraud, cheat, and mistake.
Tables V and VI are the tabular representation of the top 5 tokens from the gender-specific analysis for female travellers.
Overall, the female analysis indicates that travellers use quite similar words when expressing dissatisfaction on social platforms.
Table VI lists the top 5 tokens by frequency for the positive female category. Table VII is a comparison matrix with the top ten tokens for each OTA. The results show that satisfied travellers frequently use words like support, help, kind, and comfort, while dissatisfied travellers use words like problem, fraud, cheater, and mistake when expressing themselves on social platforms.
The comparison matrix shows the frequency of particular words when travellers post their comments and reviews on various sites. It also supports a comparative analysis of the keywords, in the form of tokens, across all five online travel agents chosen in this research. Words like problem, help, and support are common to all the OTAs.
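A comparison matrix of this kind can be sketched with token counts per OTA. The following is a minimal illustration only: the function names are assumptions, the reviews are invented placeholder text rather than data from the study, and tokenization is reduced to lowercasing and splitting on whitespace.

```python
from collections import Counter


def token_counts(reviews):
    """Count how often each token occurs across a list of review strings."""
    counts = Counter()
    for text in reviews:
        counts.update(text.lower().split())
    return counts


def comparison_matrix(reviews_by_ota, tokens):
    """Return {token: {ota: frequency}} so tokens can be compared across OTAs."""
    per_ota = {ota: token_counts(revs) for ota, revs in reviews_by_ota.items()}
    return {tok: {ota: per_ota[ota][tok] for ota in reviews_by_ota} for tok in tokens}


# Hypothetical reviews for two OTAs, for illustration only.
reviews_by_ota = {
    "MakeMyTrip": ["great support and help", "booking problem not solved"],
    "redBus": ["kind staff great help", "refund problem fraud"],
}

# Overall frequency ranking across all OTAs (top 3 here; the paper reports top 25).
overall = Counter()
for revs in reviews_by_ota.values():
    overall.update(token_counts(revs))
top_tokens = [tok for tok, _ in overall.most_common(3)]

matrix = comparison_matrix(reviews_by_ota, ["help", "support", "problem", "fraud"])
for token, row in matrix.items():
    print(token, row)
```

A token absent from an OTA's reviews gets a count of zero, which makes the rows directly comparable across agents.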

4) Keyword Analysis:
The following part of the paper presents keyword visualizations to show the importance each keyword plays in the study. Fig. 9 shows the lines of code, from a Python-based implementation, used to produce the overall graphical representation of keywords and their analysis. Fig. 10 is the first visualization of keywords for the data collected for this research. The results show that words like hotel, book, ticket, service, and call are the most critical words based on their number of appearances. Fig. 11 and 12 show the essential keywords in the positive and negative categories for the entire study. In the overall positive category, words like best, awesome, trust, kind, and super are the critical words describing positive sentiments. In contrast, in the overall negative category, words like fraud, bad, horrible, pathetic, and failed are the crucial words that occurred frequently when travellers expressed their sentiments.
The sections below describe the OTA-specific keyword analysis.

5) Cleartrip Keyword Analysis:
Fig. 13 and 14 depict the visualization of keywords for Cleartrip. The results show that words like best, trust, great, and happy are the most critical in the positive category for Cleartrip. In contrast, bad, fraud, pathetic, and fail are the common words in the negative category for Cleartrip.

6) Goibibo Keyword Analysis:
Fig. 15 and 16 cover keywords based on their frequency of occurrence for Goibibo. Based on the analysis, excellent, best, great, trust, and free are the crucial words in the positive category for Goibibo, whereas worst, bad, fraud, and hate are the commonly observed keywords in the negative category.

7) MakeMyTrip Keyword Analysis:
Fig. 17 and 18 give the graphical representation of keywords for MakeMyTrip (MMT). In the positive category, best, free, kind, happy, and wonderful are the critical words for MMT.
In contrast, in the negative category, pathetic, bad, worst, horrible, and fraud are the words carrying the most importance in the MMT keyword analysis.

8) redBus Keyword Analysis:
As for the other online travel agents, keyword analysis is performed for redBus. Fig. 19 and 20 depict the important words in the positive and negative categories for redBus. Best, happy, kind, and comfortable are the top-rated words in the positive category, and bad, worst, and horrible are some of the highly rated words in the negative category for redBus.

9) Yatra Keyword Analysis:
Fig. 21 and 22 show the word count analysis done on the online reviews collected for Yatra. The analysis shows that words like best, great, kind, splendid, and love are the essential words in the positive category for Yatra. In the negative category, words like bad, fraud, worst, unprofessional, and horrible are some of the words that play an essential role for Yatra.

10) Analysis based on Rating:
Each review collected from Mouthshut.com has a rating value attached to it, ranging from 1 (low) to 5 (high). The OTAs and their reviews are analyzed below by the percentage of reviews at each rating value. Fig. 23 shows the lines of code used to obtain the statistics of review ratings for all the reviews collected and processed; the relevant graphical representations are obtained with similar lines of code.
Table VIII presents a statistical analysis of reviews by the overall rating given by the reviewer on a scale of 1 to 5, where 1 is low and 5 is high. The table shows a good number of reviews with a low rating for each of the online travel agents, which indicates dissatisfaction among travellers. At the same time, the volume of reviews rated 5 represents a high level of satisfaction of travellers using an online travel agent and its services. MakeMyTrip, Goibibo, and Cleartrip have a good volume of reviews under the highest rating.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020, www.ijacsa.thesai.org
Table IX presents the statistical analysis of review ratings, in percentages, for the reviews collected and processed for each of the online travel agents. The table shows that MakeMyTrip has the highest percentage of reviews with a rating of 5. This means the satisfaction level of travellers using the services offered by MakeMyTrip is also high in comparison to other online travel agents.
Cleartrip has 16.2% of its reviews at the highest rating, which places it second after MakeMyTrip. Goibibo is third in the list by percentage of reviews rated 5. The table values show that Yatra has a low share of reviews at the top level, which means satisfaction is very low for Yatra. According to Table IX, 64.9 percent of travellers have given Yatra a low review rating, which reflects the dissatisfaction of travellers using the services offered by Yatra.
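The percentage breakdown reported in Table IX amounts to counting the reviews at each rating value per OTA and dividing by that OTA's total. A minimal sketch follows; the function name, the placeholder OTA labels, and the rating lists are hypothetical and do not reproduce the study's figures.

```python
from collections import Counter


def rating_percentages(ratings):
    """Return the percentage of reviews at each rating value from 1 (low) to 5 (high)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    counts = Counter(ratings)
    total = len(ratings)
    return {value: round(100 * counts[value] / total, 1) for value in range(1, 6)}


# Hypothetical rating lists for two unnamed OTAs, for illustration only.
ratings_by_ota = {
    "OTA-A": [1, 1, 1, 2, 5, 5, 3, 1, 4, 1],
    "OTA-B": [5, 5, 4, 1, 1],
}

table = {ota: rating_percentages(r) for ota, r in ratings_by_ota.items()}
for ota, row in table.items():
    print(ota, row)
```

Comparing the rating 5 column across OTAs gives the satisfaction ranking, and the rating 1 column gives the corresponding view of dissatisfaction.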

IV. ARCHITECTURAL VIEW OF MODEL
In this research work, netnography and text mining techniques are used as an integrated approach. Fig. 24 shows the architectural view of the platform used in the text mining process. As shown in the diagram, the following stages lead to a visualized representation of the results.
The first stage is the input of reviews. In this study, the reviews or posts are collected from various online platforms following netnographic guidelines, and the collected reviews are maintained in a repository.
The second stage is preprocessing, in which the collected text reviews are processed by splitting the words, converting the words, and checking for missing values.
The next stage is cleaning the reviews, which applies the processes needed to refine the text into meaningful input for further processing.
In the sentiment score stage, the sentiment of each word, and then of the entire review, is obtained using the VADER sentiment lexicon. The sentiments are derived as positive, negative, and neutral scores, and the compound sentiment score helps to classify the review as positive or negative.
In the review classification stage, reviews with a compound sentiment score of more than 0.05 are classified as positive, reviews with scores between -0.05 and 0.05 are considered neutral, and reviews with a compound score of less than -0.05 are classified as negative.
In the visualization stage, the analyses carried out on the reviews are presented graphically in the form of word clouds and pie charts.
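The review classification stage reduces to a threshold rule on the compound score. The sketch below assumes the compound score has already been computed (for example by a VADER implementation, which is not imported here so that the example stays self-contained); the function name and the sample scores are hypothetical.

```python
POSITIVE_THRESHOLD = 0.05   # compound scores above this are treated as positive
NEGATIVE_THRESHOLD = -0.05  # compound scores below this are treated as negative


def classify_review(compound):
    """Map a compound sentiment score in [-1, 1] to a sentiment label."""
    if compound > POSITIVE_THRESHOLD:
        return "positive"
    if compound < NEGATIVE_THRESHOLD:
        return "negative"
    return "neutral"


# Hypothetical compound scores, for illustration only.
sample_scores = {"review_1": 0.62, "review_2": -0.48, "review_3": 0.0}
labels = {name: classify_review(score) for name, score in sample_scores.items()}
print(labels)
```

Scores that fall exactly on a threshold are labelled neutral here, following the "more than" and "less than" wording of the classification stage; some VADER usage conventions treat the boundary values as positive or negative instead.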

V. STATISTICAL ANALYSIS OF RESULT
This section discusses the statistical analysis results obtained using the Chi-square test. A Chi-square test is performed on various parameters to test the relationship between variables, and the hypothesis is accepted or rejected based on the results of the test. The Chi-square test was performed on the following scenarios:

1) Scenario 1(OTA vs. Gender):
Here, the Chi-square statistical test is used to test whether two variables are independent. Chi-square hypothesis testing is used to understand whether there is any relationship between the online travel agent and the gender of travellers.
a) Null hypothesis (H0): No association/relationship between gender of traveller and OTA.
b) Alternate hypothesis (H1): Gender of traveller and OTA have an association/relationship.
Table X(a) and X(b) show the calculations of the test statistic. The P-value obtained is .00000009307. With 4 degrees of freedom and at the 5% level of significance, the critical (tabular) value is 9.49. The calculated chi-square value (33.54) is greater than the critical value (9.49), so there is enough statistical evidence to reject the null hypothesis and accept that there is an association between the various OTAs and the gender of travellers. Since 33.54 > 9.49 and the P-value < 0.05, the alternate hypothesis is accepted.
From Table XI, the P-value obtained using the chi-square test is 0.01, and the chi-square value corresponding to this P-value is 12.78. The critical chi-square value at 4 degrees of freedom and the 5% level of significance is 9.49. Since 12.78 > 9.49, the alternate hypothesis is accepted, meaning there is an association between sentiments and the OTA used. Fig. 26 is a graphical representation of the table values, showing OTA sentiment against the total number of reviews for the online travel agents.
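The chi-square test of independence used in these scenarios can be worked through on a contingency table as in the following sketch. The observed counts are hypothetical (five OTAs by two genders, which gives the 4 degrees of freedom mentioned above) and are not the values of Table X; the function names are likewise illustrative. The P-value uses the closed form of the chi-square survival function, which holds only for an even number of degrees of freedom.

```python
import math


def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_even_dof(statistic, dof):
    """Upper-tail probability of the chi-square distribution for even dof."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form requires a positive even number of degrees of freedom")
    half = statistic / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))


CRITICAL_VALUE_DF4_5PCT = 9.488  # tabulated chi-square value, 4 dof, 5% significance

# Hypothetical counts: rows are OTAs, columns are (male, female) reviewers.
observed = [[120, 80], [90, 110], [100, 100], [60, 140], [130, 70]]

statistic, dof = chi_square_statistic(observed)
p_value = p_value_even_dof(statistic, dof)
reject_null = statistic > CRITICAL_VALUE_DF4_5PCT
print(statistic, dof, p_value, reject_null)
```

Comparing the statistic with the critical value and comparing the P-value with 0.05 are equivalent decisions, which is why the text can state the conclusion either way.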

3) Statistical Analysis of Positive-Negative Ratio vs. Gender:
This section gives a detailed analysis of the relationship between the positive-negative sentiment ratio and gender. Based on the data shown in the table, the P-value using the Chi-square statistical test is 0.00. The following null and alternate hypotheses are framed to identify whether there is any relationship between the positive-negative sentiment ratio and gender.

a) Null hypothesis (H0): No association between Sentiment and Gender
b) Alternate hypothesis (H1): Association between Sentiment and Gender
Referring to Table XII, the chi-square value calculated using the P-value is 12.96. With a 5% level of significance and 4 degrees of freedom, the tabular value for chi-square is 9.49. Since the calculated value, 12.96, is more than the tabular value, 9.49, there is enough evidence to reject the null hypothesis and accept the alternate hypothesis, which says there is a relationship between sentiment and gender. Fig. 27 shows a detailed classification of the total volume of reviews and sentiments as positive and negative, gender-wise. The analysis finds that the percentage of negative review sentiment is lower among male travellers than among female travellers.

VI. SERVICE SPECIFIC ANALYSIS
The online travel agents used in this research provide a varied set of services to users, customers, and travellers. These services include hotel booking, flight booking, and bus booking. In this section, the results obtained for these services are evaluated, and reasonable interpretations are derived. Fig. 28 shows the lines of code used to analyze sentiments for each OTA with respect to the services identified. After redBus, travellers preferred Goibibo and then MakeMyTrip; the finding here is that, for bus-related services, redBus ranks above all the others.
Flight-related services are offered by the following online travel agents chosen for the study: Cleartrip, Goibibo, MakeMyTrip, and Yatra. Fig. 30 shows that, for flight-related services, the volume of negative reviews and sentiments is higher for Cleartrip than for the other service providers. For Goibibo, the positive sentiments are slightly more than the negative sentiments, which can be interpreted to mean that travellers are slightly happier with the flight-related services offered by Goibibo. Similarly, for MakeMyTrip, positive sentiments outnumber negative sentiments across the reviews processed for flight-related services, which indicates that satisfaction exceeds dissatisfaction for MakeMyTrip. For Yatra, the positive sentiments are fewer than the negative sentiments. The volume of reviews for Yatra regarding flight-related services is also low; this indicates that satisfaction is lower than dissatisfaction and that travellers do not prefer Yatra for flight-related services.
The following online travel agents offer hotel booking services online: Cleartrip, Goibibo, MakeMyTrip, and Yatra. Fig. 31 shows the results of the analysis of hotel-related services offered by these online travel agents in India. For Cleartrip, Fig. 31 shows that negative sentiments outnumber positive ones, which means more dissatisfaction is observed among travellers regarding Cleartrip's hotel-related services. Goibibo appears to be the top choice for online hotel-related services. Here again, the analysis shows that positive sentiments outnumber negative sentiments, which means the satisfaction of travellers using the hotel-related services offered by Goibibo is high. MakeMyTrip follows Goibibo when it comes to hotel-related services.
It is the second preference for hotel-related services after Goibibo. The results also clearly indicate that Yatra is rarely preferred by travellers for online hotel-related services.
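The service-specific comparison above can be sketched as a tally of positive and negative sentiments per OTA and service. This is an illustration under stated assumptions: the records are invented, the function names are hypothetical, and ranking OTAs by the difference between positive and negative counts is a choice made here for simplicity, not a rule stated in the study.

```python
from collections import defaultdict


def sentiment_by_service(records):
    """Tally positive and negative sentiments for each (OTA, service) pair."""
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for ota, service, sentiment in records:
        if sentiment in ("positive", "negative"):  # neutral reviews are skipped
            counts[(ota, service)][sentiment] += 1
    return dict(counts)


def leading_ota(counts, service):
    """Return the OTA with the largest positive-minus-negative margin for a service."""
    margins = {
        ota: tally["positive"] - tally["negative"]
        for (ota, svc), tally in counts.items()
        if svc == service
    }
    return max(margins, key=margins.get)


# Hypothetical classified reviews: (OTA, service, sentiment).
records = [
    ("redBus", "bus", "positive"),
    ("redBus", "bus", "positive"),
    ("redBus", "bus", "negative"),
    ("Goibibo", "bus", "positive"),
    ("Goibibo", "bus", "negative"),
    ("Cleartrip", "flight", "negative"),
    ("Cleartrip", "flight", "negative"),
    ("Cleartrip", "flight", "positive"),
    ("Goibibo", "flight", "positive"),
]

counts = sentiment_by_service(records)
print(counts[("redBus", "bus")], leading_ota(counts, "bus"), leading_ota(counts, "flight"))
```

The same tallies can feed grouped bar charts of positive versus negative volume per OTA for each service.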

VII. CONCLUSION
The customer is king: for a successful business, customer satisfaction is considered a prime attribute. This research helps consumers make decisions when choosing travel-related services, and helps businesses choose strategies to improve the services they offer by listening to consumers.
Netnography and text mining techniques are used to analyze and process reviews and comments collected from various online platforms. The review collection followed the guidelines suggested by netnographic studies for five online travel agents, i.e. MakeMyTrip (MMT), Yatra, Cleartrip, Goibibo, and redBus, and the analysis is performed using text mining as an integrated approach.
Based on the results of the service-specific analysis, it is possible to conclude that, for flight booking services, traveller satisfaction is higher with MMT and Goibibo. Consumers have preferred MakeMyTrip as the first option and Goibibo as the second, followed by Yatra. For hotel booking services, the satisfaction of travellers using services from MMT and Goibibo is high compared to the others. For bus booking services, users are happier and more satisfied with redBus than with Goibibo and MMT.
This research work has used various approaches to understand the satisfaction, dissatisfaction, and perceptions of travellers using selected online travel service providers in India, with the aim of building a competitive advantage for both customers and companies. Online travel service providers in India can make appropriate decisions according to the results of the study regarding customer perception and so create a competitive advantage. Consumers can also choose a particular online travel service provider based on the results of the study for improved services and traveller satisfaction.