Sentiment Analysis on Customer Satisfaction of Digital Banking in Indonesia

Southeast Asia, including Indonesia, is seeing an increase in digital banking adoption, owing to changing customer expectations and increasing digital penetration. The Covid-19 pandemic has hastened this trend toward digital transformation. However, customer satisfaction should not be left unmanaged during this transition. This research aims to assess customer satisfaction with digital banking in Indonesia based on sentiment analysis of Twitter data. The data collected relate to three digital banks in Indonesia, namely Jenius, Jago, and Blu. A total of 34,605 tweets posted between August 1, 2021 and October 31, 2021 were collected and analyzed. Sentiment analysis was conducted using nine stand-alone classifiers: Naïve Bayes, Logistic Regression, K-Nearest Neighbors, Support Vector Machines (SVM), Random Forest, Decision Tree, Adaptive Boosting, eXtreme Gradient Boosting, and Light Gradient Boosting Machine. Two ensemble methods, hard voting and soft voting, were also used. The results show that SVM performed best among the stand-alone classifiers, with an F1-score of 73.34%. The ensemble methods performed better than the stand-alone classifiers, and soft voting with the five best classifiers performed best overall, with an F1-score of 74.89%. The results also show that sentiments toward Jago were mainly positive, sentiments toward Jenius were mostly negative, and sentiments toward Blu were mostly neutral.

Keywords: Sentiment analysis; ensemble method; customer satisfaction; digital bank


I. INTRODUCTION
The Covid-19 global pandemic has wreaked havoc on economies, people, and societies [1]-[4]. With more individuals staying at home and working from home, the banking industry is seeing how the global health crisis has prompted clients to adopt digital services to cope with lockdown measures. Indonesia is well suited for digital banking due to its large unbanked population and high mobile penetration rate [5]-[8]. Traditional banks with legacy models, on the other hand, face a pressing need to digitize their services in order to keep up with increased demand, as the pandemic increases both consumers' and businesses' need for the availability of, access to, and control over digital banking services [9].
Digital banking is making the competitive dynamics of the Asia Pacific banking industry fiercer. Newly licensed indigenous digital banks, worldwide virtual-only banks, and digitized traditional banks are all fueling competition. In the long run, digital banking will most probably lead to more ratings dispersion between banking systems and institutions in Asia Pacific [10]. The rise of digital banking in Asia was recently charted by McKinsey. As the region's authorities raise license allocations and establish standards for the next generation of digital banks, there will be opportunities for both incumbents and new entrants to enter the digital banking sector [11]. Fig. 1 shows the growth of digital banking in Asia. Indonesia, however, has yet to establish a legal framework for digital banks. The prerequisite for establishing a digital bank can only be met by acquiring a banking license. In this respect, Indonesia differs from its two neighbors, Singapore and Malaysia [12].
Several digital banks currently operate in Indonesia, such as Jenius, Jago, and Blu. Customer satisfaction will be one of the important factors in their success. Text mining techniques such as sentiment analysis have been used by researchers to examine users' opinions on social media. A previous study by Wisnu et al. [13] applied sentiment analysis to Twitter data to assess customer satisfaction with digital payment in Indonesia and compared the accuracy of the K-Nearest Neighbor (KNN) and Naïve Bayes classifiers. According to their research, KNN was more accurate than Naïve Bayes, and customers were nearly satisfied with the services offered. Another study by Effendy et al. [14] used sentiment analysis with a Support Vector Machine (SVM) to analyze public opinion on Twitter about city public transportation. Their study showed that the public had mixed feelings about public transport services.
Building on this previous research, this study employed sentiment analysis to determine customer satisfaction with digital banking in Indonesia from Twitter data collected for three digital banks, namely Jenius, Jago, and Blu. Sentiment analysis was carried out using two ensemble approaches, hard voting and soft voting, as well as several stand-alone classifiers.
RQ3. What is the customer satisfaction with digital banks in Indonesia, namely Jenius, Jago, and Blu, based on the sentiment of tweets?
By answering these research questions, we hope that the results of this study will enrich previous research on the use of text mining for analyzing customer satisfaction in the banking industry.
This study is structured as follows. Section II discusses the relevant literature on digital banking and sentiment analysis. Section III explains the research process used in this study, while Sections IV and V present the results of the data analysis and the discussion. Sections VI and VII close with the conclusion and recommendations for future work.

A. Digital Bank
Digital banking, often known as branchless banking, is the delivery of financial services outside of traditional bank branches through information and communication technologies (ICTs), according to [15]. To some extent, most major banks throughout the world have gone digital. For everything from checking their balances to making complex payments, consumers have grown accustomed to having the option of visiting a local branch or banking via their home phone, mobile phone, or computer. Digital banks, on the other hand, are a more recent and disruptive branch of banking. Digital banks are distinct from other types of digital banking in that they can only be accessed via the internet. They do not have any branch offices within a city or a country. Consumers assume that digital banks' savings on infrastructure and personnel will translate directly into higher savings rates and lower lending rates. Some clients, on the other hand, may miss the emotional comfort of going to a neighborhood branch office, renting a safe deposit box, getting advice from a banker, or dealing with neighborly activities [16].
Several digital banks currently operate in Indonesia, such as Jenius, Jago, and Blu. Bank Tabungan Pensiunan Nasional (BTPN) established Jenius, Indonesia's first digital bank, in 2016 and released a mobile banking app that allows clients to manage their money using their mobile phones, bringing smooth banking to the fingertips of Indonesia's new generation [17]. In 2020, Bank Artos changed its name to Bank Jago and started providing digital banking and digital finance services that boast integration with various digital ecosystems in Indonesia [18]. In 2021, Bank Central Asia (BCA), Indonesia's biggest private bank, launched its digital banking app Blu through its subsidiary, now named Bank Digital BCA [19].

B. Sentiment Analysis
In recent years, sentiment-based opinion mining has been investigated to better understand the attitudes and features of demographic or market groupings, as well as the trustworthiness of content and reasons for submitting evaluations [20]. Diverse sentiment analysis approaches have been produced in a variety of disciplines, resulting in a modest number of publications [20]-[22]. Sentiment analysis is based on the assumption that information presented through text (e.g., a review) is either subjective (i.e., opinionated) or objective (i.e., factual). Subjective assessments of entities or events draw on personal sentiments, beliefs, and judgments. Objective reviews draw on facts, evidence, and measurable observations [23]. Customer evaluations and social media posts commonly convey happiness, dissatisfaction, disappointment, delight, and other emotions [24]. In terms of methodology, sentiment analysis is a polarity classification problem. Depending on the number of classes involved, sentiment polarity classification can be binary, ternary, or ordinal. A binary classification assumes that a given customer evaluation is subjective. In other words, it assumes that a text is mostly negative or positive and then assigns the review a polarity of "negative" or "positive". The negative and positive poles of sentiment are defined differently depending on the domain and application. In the context of tourism, "negative" and "positive" may relate to "unsatisfied" and "satisfied", respectively, although additional research is needed to connect theoretical frameworks of satisfaction to sentiment polarity.
Because reviews are not always subjective, a ternary classification with a third, "objective" category is required. In the ternary classification problem, the classifier performs an implicit classification to distinguish between objective and subjective statements, assigning a "negative", "positive", or "neutral" class label. Negative and positive polarity are sometimes confused with neutral polarity. To tackle sentiment analysis, a cascaded technique can be used, consisting of a binary classifier that discriminates between subjective and objective evaluations and a binary polarity classifier that further categorizes subjective reviews as negative or positive. Objective assessments rarely contain words that a dictionary clearly defines as negative or positive. They may also contain blended polarities without a distinct sense of direction. In addition to binary and ternary classification, ordinal classification can be performed using a sentiment strength rating system (e.g., one to five stars) [25].

C. Feature Inference and Extraction
Following the study by Lyu et al. [26], the Face++ API can be used to infer demographic information such as the age and gender of users from their profile pictures. Given a profile picture URL, the Face++ API returns zero or more faces. In this study, users with a missing or invalid picture URL, or whose picture yielded zero faces, were excluded; demographics were inferred for the rest. According to Lyu et al. [26], the Face++ API provides good accuracy when used to infer age and gender information paired with Twitter data.
www.ijacsa.thesai.org
In addition, the TF-IDF (Term Frequency-Inverse Document Frequency) approach was applied to extract features in this work. TF-IDF combines the concepts of Term Frequency (TF) and Inverse Document Frequency (IDF). TF counts the number of times a word appears in a document, while IDF measures how informative the word is across the whole collection of documents. Combining the two yields a statistical measure of the relevance of a word to a document within a collection of documents. The TF-IDF approach [27] has the potential to increase sentiment analysis performance.
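As a point of reference, one common textbook formulation of the TF-IDF weighting described above is given below. The notation is introduced here for illustration and is not taken from the study: tf(t, d) is the number of times term t occurs in document d, N is the number of documents, and df(t) is the number of documents containing t. Libraries such as scikit-learn use a smoothed variant of the IDF term, so exact values may differ.

```latex
\[
\mathrm{tf\text{-}idf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t),
\qquad
\mathrm{idf}(t) = \log \frac{N}{\mathrm{df}(t)}
\]
```

Under this formulation, a term that appears in every document has idf(t) = log 1 = 0 and therefore carries no weight, while a term concentrated in a few documents is weighted more heavily.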

III. METHODOLOGY
This research aims to assess customer satisfaction with digital banking using sentiment analysis of tweets related to three digital banks in Indonesia, namely Jenius, Jago, and Blu. The research was carried out in Python, a general-purpose programming language used in many areas such as education and scientific and numeric computing. Tweets were collected through the Twitter API using the snscrape library.

A. Data Collection
Data were collected from the social media site Twitter through the Twitter API. Tweets posted between August 1, 2021 and October 31, 2021 were scraped. A subset of the collected data was annotated manually by two researchers and used for training and evaluating the classifier models.

B. Pre-processing
Data obtained from Twitter need to be cleaned before being fed to the classifier model. Pre-processing removes noise and unwanted or unnecessary data and makes the text predictable and analyzable for the next task. The text pre-processing steps in this research are as follows.
• Case Folding: The first step converted all characters in a tweet to the same case; lower case was used.
• Sentence Normalization: The second step identified the type of text present, removed special (non-alphanumeric) characters, and trimmed excess spaces.
• Word Tokenization: The third step split each sentence into words for further analysis such as stop-word removal and stemming.
• Stop-Word Removal: The fourth step removed stop words from the text so that the words that define its overall meaning receive more focus. The NLP-id library was used in this step and later in the stemming step.
• Stemming: The fifth step reduced each word to its stem by stripping affixes (prefixes and suffixes). A lemmatizer was used to obtain the root word of every word in a tweet.
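The first four steps above can be sketched in plain Python as follows. This is a minimal illustration rather than the study's pipeline: the stop-word list is a tiny hypothetical stand-in for the NLP-id resources, the URL-stripping rule is an added assumption, and stemming is omitted because it depends on a language-specific lemmatizer.

```python
import re

# Hypothetical, tiny stop-word list for illustration only; the study used the
# stop-word resources of the NLP-id library instead.
STOP_WORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu"}


def preprocess(tweet: str) -> list[str]:
    """Apply case folding, normalization, tokenization, and stop-word removal."""
    text = tweet.lower()                        # case folding
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs (an added assumption)
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # remove non-alphanumeric characters
    text = re.sub(r"\s+", " ", text).strip()    # trim excess whitespace
    tokens = text.split()                       # word tokenization
    return [t for t in tokens if t not in STOP_WORDS]  # stop-word removal


print(preprocess("Dapet CASHBACK dari Jago!!!   Mantap :) https://t.co/xyz"))
# ['dapet', 'cashback', 'jago', 'mantap']
```

Keeping each step as a single line makes it easy to swap in a library tokenizer or stemmer later without changing the function's interface.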

C. Feature Inference and Extraction
The Face++ API was used to infer demographic information such as the age and gender of Twitter users from their profile pictures. Demographic information was inferred using the facepplib library. TF-IDF was used in this study to extract features; it measures how relevant a word is to a document within a collection of documents. The sklearn library was used for this step and for the subsequent steps.

D. Training Standalone Classifiers
In this step, we trained several popular classifiers, namely Naïve Bayes (NB), Logistic Regression (LR), K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Random Forest (RF), Decision Tree (DT), Adaptive Boosting (AdaBoost), eXtreme Gradient Boosting (XGB), and Light Gradient Boosting Machine (LGBM). We used the multinomial distribution for the NB classifier because it has been shown to perform well in text classification [28]. We used the linear kernel for SVM for the same reason. Each classification model was then assessed for accuracy, precision, recall, and F1-score using K-fold cross-validation with K=10.
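The evaluation procedure described above can be sketched with scikit-learn as follows. This is an illustrative sketch under stated assumptions, not the study's code: the corpus is made up, only two of the nine classifiers are shown, macro averaging of precision, recall, and F1 is assumed because the text does not state the averaging mode, and stratified shuffled folds are an added choice.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy, made-up corpus (not the study's data): 10 texts per class so that
# each of the 10 folds contains one example of every class.
positive = [f"promo cashback bagus mantap {i}" for i in range(10)]
neutral = [f"tanya cara buka rekening {i}" for i in range(10)]
negative = [f"aplikasi lambat gagal login {i}" for i in range(10)]
texts = positive + neutral + negative
labels = ["positive"] * 10 + ["neutral"] * 10 + ["negative"] * 10

# Macro averaging is an assumption; the paper does not name the averaging mode.
scoring = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

models = {
    "NB": MultinomialNB(),        # multinomial Naive Bayes
    "SVM": SVC(kernel="linear"),  # SVM with a linear kernel
}

results = {}
for name, clf in models.items():
    # Fitting TF-IDF inside the pipeline keeps test folds out of the vocabulary.
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_validate(pipeline, texts, labels, cv=cv, scoring=scoring)
    results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}
    print(name, results[name])
```

Wrapping the vectorizer and classifier in one pipeline means the TF-IDF statistics are recomputed on the training folds of every iteration, which avoids leaking information from the held-out fold.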

E. Ensembling Classifiers
In this step, several of the best classifiers from the previous step were combined, or ensembled, to obtain better predictive performance. We employed two types of ensemble voting: hard voting and soft voting. In hard voting, each stand-alone classifier has one vote, and the winning class is determined by majority vote. In soft voting, the class probabilities predicted by the classifiers are averaged, and the class with the highest average probability wins.
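The difference between the two voting schemes can be seen in a small worked example. The sketch below uses made-up probabilities from three hypothetical classifiers for a single tweet; it illustrates the voting rules only and is not the ensemble implementation used in the study.

```python
from collections import Counter

CLASSES = ["negative", "neutral", "positive"]

# Hypothetical class probabilities from three classifiers for one tweet,
# in the order given by CLASSES. The numbers are made up for illustration.
probabilities = [
    [0.40, 0.35, 0.25],  # classifier A leans negative
    [0.45, 0.15, 0.40],  # classifier B leans negative
    [0.05, 0.05, 0.90],  # classifier C is confident in positive
]


def hard_vote(probs):
    """Each classifier votes for its most probable class; the majority wins."""
    votes = [CLASSES[row.index(max(row))] for row in probs]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(probs):
    """Average the class probabilities; the highest average wins."""
    n = len(probs)
    averages = [sum(row[k] for row in probs) / n for k in range(len(CLASSES))]
    return CLASSES[averages.index(max(averages))]


print(hard_vote(probabilities))  # negative (two of three votes)
print(soft_vote(probabilities))  # positive (average probability about 0.52)
```

Here two weakly confident classifiers outvote one highly confident classifier under hard voting, whereas soft voting lets the confident prediction dominate the average.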

F. Sentiment Analysis Result
In the last step, we took the best classifier model from the previous two steps and used it to analyze the sentiment of the rest of the data. We used sentiment analysis to identify the sentiment polarity of each tweet, that is, whether it was positive, negative, or neutral toward the selected digital banks. The results of the sentiment analysis are then discussed further.

A. Data Collection
Data collection from Twitter using the Python snscrape library yielded a total of 34,605 tweets in the Indonesian language related to three digital banks in Indonesia, namely Jenius, Jago, and Blu. We removed duplicate tweets based on tweet ID and excluded tweets from each bank's official account, since we wanted to capture the sentiment of the banks' customers. This left a total of 24,672 tweets. We sampled 2,100 tweets and manually annotated their sentiment. The manual annotation yielded 813 tweets with positive polarity, 585 tweets with neutral polarity, and 702 tweets with negative polarity. Table I shows three samples of manually annotated tweets.
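The two filtering rules described above (deduplication by tweet ID and exclusion of official accounts) can be sketched as follows. The field names `id` and `username` and the account handles are hypothetical placeholders; the paper does not specify the record layout or list the official accounts.

```python
# Hypothetical account handles; the paper does not list the official accounts.
OFFICIAL_ACCOUNTS = {"official_bank_a", "official_bank_b"}


def filter_tweets(tweets):
    """Keep the first tweet per ID and drop tweets from official accounts."""
    seen_ids = set()
    kept = []
    for tweet in tweets:
        if tweet["id"] in seen_ids:
            continue  # duplicate tweet ID
        seen_ids.add(tweet["id"])
        if tweet["username"].lower() in OFFICIAL_ACCOUNTS:
            continue  # tweet posted by a bank's own account
        kept.append(tweet)
    return kept


sample = [
    {"id": 1, "username": "user_a", "text": "aplikasinya cepat"},
    {"id": 1, "username": "user_a", "text": "aplikasinya cepat"},
    {"id": 2, "username": "Official_Bank_A", "text": "promo baru"},
    {"id": 3, "username": "user_b", "text": "gagal login"},
]
print([t["id"] for t in filter_tweets(sample)])  # [1, 3]
```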

B. Pre-processing
After the data collection and manual annotation steps, the next step was to prepare the data for building the classifier model. The text pre-processing steps used were case folding, sentence normalization, word tokenization, stop-word removal, and stemming, with the help of the Python NLP-id library.

C. Feature Inference and Extraction
After data pre-processing, the next step was feature inference and extraction. The Face++ API was used to infer demographic information such as the age and gender of the tweets' authors from their profile pictures. Among the 24,672 tweets, we found 9,515 valid unique users with profile pictures from which demographics could be inferred. Unfortunately, age and gender information could be inferred for only 61 users, with a total of 103 accompanying tweets. The inferred ages ranged from 21 to 68 years, and we simplified them into the following age groups: young adult (21-25), adult (26-35), middle-aged adult (36-55), and senior (>55).
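The age grouping above amounts to a simple binning rule, sketched below. Returning `None` for ages under 21 is an assumption made for this sketch, since the text defines no group below the youngest inferred age.

```python
from typing import Optional


def age_group(age: int) -> Optional[str]:
    """Map an inferred age to the age groups used in the text."""
    if age < 21:
        return None  # assumption: no group is defined below 21
    if age <= 25:
        return "young adult"
    if age <= 35:
        return "adult"
    if age <= 55:
        return "middle-aged adult"
    return "senior"


print([age_group(a) for a in (21, 30, 55, 68)])
# ['young adult', 'adult', 'middle-aged adult', 'senior']
```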
TF-IDF was used for feature extraction, weighting each word by its frequency and importance. The resulting features were used to fit the classifier models for prediction. We used the scikit-learn machine learning toolkit for this process to extract both unigram and bigram word features. Furthermore, we retained only the 50,000 most frequently occurring terms in the dataset. A term had to occur in at least 2 documents and in no more than 50 percent of the documents. Table III shows the result of TF-IDF.
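The settings described above map naturally onto parameters of scikit-learn's `TfidfVectorizer`. The sketch below shows one plausible configuration on a made-up toy corpus; the exact arguments used in the study are not given, so the mapping to `ngram_range`, `max_features`, `min_df`, and `max_df` is an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy, already pre-processed documents (made up for illustration).
docs = [
    "dapet cashback jago",
    "dapet cashback jenius",
    "biaya admin jenius",
    "biaya admin blu",
    "jenius aplikasi lambat",
    "jenius login gagal",
]

vectorizer = TfidfVectorizer(
    ngram_range=(1, 2),   # unigram and bigram features
    max_features=50000,   # keep at most the 50,000 most frequent terms
    min_df=2,             # term must occur in at least 2 documents
    max_df=0.5,           # term must not occur in more than 50% of documents
)
matrix = vectorizer.fit_transform(docs)

print(sorted(vectorizer.vocabulary_))
# ['admin', 'biaya', 'biaya admin', 'cashback', 'dapet', 'dapet cashback']
print(matrix.shape)  # (6, 6)
```

In this toy corpus, "jenius" is dropped because it appears in four of six documents (more than 50 percent), and terms such as "jago" are dropped because they appear in only one document.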
The higher the TF-IDF score, the rarer the term is across the documents, and vice versa. For example, the more common a word is across documents, the lower its score (e.g., "dapet" and "biaya"). The more unique a word is to the first document (e.g., "dapet cashback" and "jenius appsnya"), the higher its score.

D. Model Training
After the feature extraction step, the next process was to train and evaluate the selected classifier models using the scikit-learn machine learning toolkit. We compared 9 stand-alone classifiers with 2 ensemble classifiers built from the 5 best stand-alone classifiers. For model testing, we employed 10-fold cross-validation. The tweet dataset was first separated into ten equal parts. In each iteration of cross-validation, tweets from 9 folds were used as the training set, while the remaining fold was used as the test set. Accuracy, precision, recall, and F1-score were assessed. Table IV shows the performance of each classifier model. According to Table IV, SVM outperforms all other stand-alone classifiers in overall performance, with values of 74.29%, 74.58%, 73.13%, and 73.34% for accuracy, precision, recall, and F1-score, respectively. LR followed closely with a similar F1-score of 73.30%, and RF, NB, and LGBM were the next three best. KNN clearly performed worst overall, with accuracy, precision, recall, and F1-score of 40.52%, 46.65%, 38.25%, and 32.56%, respectively. Meanwhile, AdaBoost and XGB performed better than KNN, with F1-scores of 65.72% and 70.63%, respectively. Both ensemble methods with the 5 best classifiers (SVM, LR, RF, NB, LGBM) performed better overall than the stand-alone classifiers. The soft voting ensemble with the 5 best classifiers performed best overall, with accuracy, precision, recall, and F1-score of 75.86%, 76.18%, 74.67%, and 74.89%, respectively.

E. Sentiment Analysis
In this last step, we used the best-performing classifier model from the previous step, the soft voting ensemble with the 5 best classifiers, to predict sentiment automatically for the rest of the tweet dataset. Out of 22,572 tweets, 12,504 tweets had positive sentiment, 5,603 tweets had neutral sentiment, and 4,465 tweets had negative sentiment. The predicted sentiment was then combined with the manually annotated sentiment for the discussion of results. Table V shows the combined manual and predicted sentiment mapped to each digital bank, namely Jenius, Jago, and Blu. The sentiment analysis results were also combined with the demographic information inferred from users' profile pictures to find the distribution of sentiment polarity across age groups and genders. Out of 9,515 valid unique users, we were only able to infer demographics for 63 users, with a total of 103 accompanying tweets. Table VI shows the demographic groups of the successfully inferred users and the sentiment of their tweets.

V. DISCUSSION
This research aims to assess customer satisfaction with digital banking in Indonesia based on sentiment analysis of Twitter data collected for three digital banks in Indonesia, namely Jenius, Jago, and Blu. The results in Table IV show that SVM has the best overall performance among the stand-alone classifiers, with accuracy, precision, recall, and F1-score of 74.29%, 74.58%, 73.13%, and 73.34%, respectively. In the research by [28], by comparison, SVM was second best, behind NB by a small margin. Consistent with previous research, KNN performed worst among the stand-alone classifiers for text classification, with accuracy, precision, recall, and F1-score of 40.52%, 46.65%, 38.25%, and 32.56%, respectively.
Both ensemble methods, hard voting and soft voting with the 5 best classifiers, performed better overall than the stand-alone classifiers. Soft voting with the 5 best classifiers was the best, with accuracy, precision, recall, and F1-score of 75.86%, 76.18%, 74.67%, and 74.89%, respectively. An ensemble method combines several stand-alone classifier models to produce one optimal classifier model. Its performance therefore depends on the classifiers that compose it, and using only the 5 best classifiers may improve the ensemble's likelihood of achieving greater performance. Hard voting performed worse than soft voting because it depends on the label predicted individually by each classifier, so low performance from one of the classifiers can affect the result more. Soft voting, on the other hand, performs better because it uses a more robust technique, predicting labels from the average probability across all classifiers, which reduces overfitting and creates a smoother model [28].
This research also produced a classification of sentiment with positive, neutral, and negative polarity for each digital bank, as seen in Table V. Bank Jago has the most tweets of the three digital banks, with 12,639 tweets, while Bank Jenius and Bank Blu have 10,094 and 1,939 tweets, respectively. The small number of tweets for Bank Blu may be attributable to the fact that it is a relatively new player in the digital banking industry in Indonesia.
Fig. 3
For the demographic distribution, based on age group, adults and young adults tend to express their opinions more on social media, and as users get older, they are less inclined to voice their opinions. Middle-aged adults tend to give opinions with varying polarity, whereas young adults and adults tend to give opinions with positive polarity. Meanwhile, based on gender, females tend to express their opinions more than males and tend to give positive opinions, whereas males tend to give opinions with mixed polarity. However, this result needs further confirmation, because the number of users whose demographic information was successfully inferred is very small compared to the total population of valid unique users. This low inference rate may be attributable to the small resolution and low quality of profile pictures. Image pre-processing is recommended for future studies to improve the success rate of inference from profile pictures, for example interpolation techniques to increase low image resolution and image filtering and segmentation techniques to improve image quality.
The sentiment toward digital banking in Indonesia can be analyzed further using word clouds, which show the most frequent words in the tweets, with frequency represented by size and color hue. Fig. 4 displays the word clouds related to the three digital banks in Indonesia. The green word cloud represents positive sentiment, the yellow word cloud represents neutral sentiment, and the red word cloud represents negative sentiment. The results show that most tweets contained comments related to the digital banks' apps and their features, user experience, bank policies, and running promotions.
Most positive sentiments, which mostly came from Bank Jago, were related to appreciation of promotional events, such as giveaways, held by Bank Jago. Users also appreciated Bank Jago's policies of low admin fees and high interest rates compared to other digital banks. Some users also praised the Bank Jago app for its simple user interface and fast user experience.
Meanwhile, negative sentiments, which mostly came from Bank Jenius, were related to complaints about a slow-responding app, login issues, and other bad user experiences when using the app. Users also complained about Feesible, a Bank Jenius policy whereby an additional subscription fee is charged to customers to unlock services or features, but without significant improvements to the app. Complaints were also raised about the complicated process for unlinking a connected device.
Most neutral sentiments were related to inquiries about the services and features offered by the respective digital banks. Inquiries also came from prospective users asking for information, user experiences, and comparisons of digital banks before deciding to open an account. Users were also curious about the new player in the digital bank space, Bank Blu, as it was often mentioned together with other digital banks in comments. This was often the case when users asked for comparisons of digital banks.
This shows that word of mouth and user testimony on social media such as Twitter are among the deciding factors for users choosing a digital bank. Based on this, digital banks should pay more attention to sentiments and user voices on social media and make sure that users' needs are heard and that improvements are based on user feedback. They can leverage automatic sentiment analysis in their customer service tools to improve response time in complaint handling, identify the most requested improvements from user feedback, or target specific demographic groups for promotions.

VI. CONCLUSION
Sentiment analysis is useful in social media analysis because it provides insight into public opinion on certain topics. This research aims to assess customer satisfaction with digital banking in Indonesia through sentiment analysis of Twitter data for three digital banks in Indonesia, namely Jenius, Jago, and Blu. Sentiment analysis was conducted using nine stand-alone classifiers, namely Naïve Bayes, Logistic Regression, K-Nearest Neighbors, Support Vector Machines, Random Forest, Decision Tree, Adaptive Boosting, eXtreme Gradient Boosting, and Light Gradient Boosting Machine, and two ensemble methods, hard voting and soft voting.
The evaluation of the classifier models showed that SVM performed best among the stand-alone classifiers when used to predict sentiments, with accuracy, precision, recall, and F1-score of 74.29%, 74.58%, 73.13%, and 73.34%, respectively. Meanwhile, the ensemble methods performed better than the stand-alone classifiers, and soft voting with the 5 best classifiers performed best overall, with accuracy, precision, recall, and F1-score of 75.86%, 76.18%, 74.67%, and 74.89%, respectively.
The sentiment analysis showed that positive sentiments were dominant for Bank Jago at 82.62%, sentiments toward Bank Jenius were mostly negative at 43.50%, and sentiments toward Bank Blu were mostly neutral at 44.46%. Based on the findings, user tweets revolved around the digital banks' apps and their features, user experience, bank policies, and running promotions. Positive sentiments came in the form of appreciation for promotional events, customer-centric policies, user-friendly apps, and fast user experience. Meanwhile, negative sentiments came in the form of complaints about bad user experience when using the apps and about complicated policies. Lastly, neutral sentiments came in the form of information or inquiries related to the digital banks' services and comparisons between them. Furthermore, the demographic distribution shows that, based on age group, adults and young adults tend to voice their opinions more on social media than other age groups. Based on gender, females tend to voice their opinions more than males. Based on this, digital banks should pay more attention to sentiments and user voices on social media and improve their services and offerings accordingly.

VII. FUTURE WORK
Based on the findings of this study, there are several suggestions for future work to improve classifier performance and feature inference. First, additional pre-classification and pre-processing steps could be considered, such as part-of-speech (POS) tagging. Second, instead of using only TF-IDF features, other types of features could be considered, such as Word2Vec or Paragraph2Vec. Third, the success rate of feature inference from profile pictures may be improved by image pre-processing techniques that increase image resolution and quality, such as interpolation, image filtering, and segmentation. Finally, studies of customer satisfaction with digital banking in Indonesia could be expanded beyond text classification and opinion mining by using a different set of methodologies.