Machine Learning Mini Batch K-means and Business Intelligence Utilization for Credit Card Customer Segmentation

An effective marketing strategy is a method to identify the customers well. One of the methods is by performing a customer segmentation. This study provided an illustration of customer segmentation based on the RFM (Recency, Frequency, Monetary) analysis using a machine learning clustering that can be combined with customer segmentation based on demography, geography, and customer habit through data warehouse-based business intelligence. The purpose of classifying the customers based on the RFM and machine learning clustering analyses was to make a customer level. Meanwhile, customer segmentation based on demography, geography, and behavior was to classify the customers with the same characteristics. The combination of both provided a better analysis result in understanding customers. This study also showed a result that minibatch kmeans was the machine learning model with the rapid performance in clustering 3-dimension data, namely recency, frequency, and monetary. Keywords—Customer segmentation; machine learning; business intelligence; data warehouse


I. INTRODUCTION
Knowing customer needs is a way to win the competition in the market and increase company profits. By knowing what customers want, companies can create effective marketing strategies. Every customer has different needs and expectations, but some have similar or the same characteristics. One way to group several customers who have the same characteristics is to create customer segmentation. Customer segmentation is also the key to improving customer relationships. The process of information analysis to understand the market and customer is a part of a marketing strategy known as marketing intelligence.
Previous research used machine learning for customer segmentation. The type of machine learning used is unsupervised machine learning. One of them is using k-means [1] or using Hierarchical Clustering which is combined with PCA (Principal Component Analysis) technique [2]. Several other studies combine RFM (Recency, Frequency, Monetary) Analysis with K-means to determine customer ratings [3]. Some of these studies only make segmentation based on numerical values or predictive numbers generated by machine learning such as annual income and spending scores [1] or RFM Score [3], but do not grouping them with categorical and descriptive data. When there is a question, which city does the customer group live in with the highest annual income? So, to get the answer, we must explore the data further.
Another problem is how to make the data well integrated. Well-organized data will facilitate analysis and report generation better. The quality of the data must also be considered. Problems in data such as duplication, different formats, incomplete and dirty data are things that must be overcome for better data governance [4].
Based on that fact, a research idea emerged that utilizes machine learning and business intelligence to create customer segmentation in a data warehouse platform. The information technology advancement enables the data to be processed and analyzed better. One of the examples is machine learning and business intelligence technology utilization. The use of machine learning can display predictive data, while business intelligence displays descriptive data. The integration and the combination of both will provide knowledge for a company in making an accurate business need. Data warehouse will make the data well integrated, stored for a long time and not interfere with data in the main system or transaction operations. Data quality can be handled well through the ETL (Extract, Transform, Load) process in the data warehouse [5].
This study discussed the utilization of machine learning and business intelligence built in the data warehouse platform using SQL Server. The outcome can be analyzed by marketing through dashboard and business intelligence reports. The data used here were the credit card transaction data for three months from banking companies in Indonesia. A machine learning model that can be used is unsupervised learning known as clustering. This study also tested some machine learning models of clustering to find a model with a rapid performance.

II. THEORETICAL FRAMEWORK
Banking is any kind of activity in banks, including organizational business activities and the process. Meanwhile, a bank is a business entity operating the business activity. The function of a bank is collecting funds and also a distributor of credit to both individuals and business entities [6]. A bank has several types of loan products, such as working capital loans, investment loans, and consumer loans. A credit card is a part of consumer credit given to an individual in the form of a card that can be used for purchasing goods and services in shops, supermarkets, restaurants, etc.

1)
Geography. It divides a market based on the location of the domicile, for instance, country, province, and city.
2) Demography. It divides a segmentation based on age, family, income, occupation, education, religion, etc.
3) Psychography. It is a part of psychological and demographic science in understanding consumers better, such as lifestyle or the value of life.

4)
Behavior. This segmentation divides customers into several groups based on their habits, knowledge, or responses to a certain product.
Customer segmentation can also be performed based on Cost to Serve, Net Price, and relationship value [9]. Net Price and Cost to Serve are types of costs that can be measured. Relationship value has qualitative characters, and the value is determined intuitively by managers. This matrix is able to provide variability in making customer segmentation.
Another method in making customer segmentation is by using the RFM (Recency, Frequency, and Monetary) analysis. The purpose is to determine the customer level based on their purchase history [10]. This method consists of 3 dimensions, namely.

1)
Recency. The last time a customer does a transaction. The Recency value is calculated from the difference of total days by subtracting the last date of transaction from the date of the current process. The lower value will be better.
2) Frequency. The frequency of a customer does a transaction. The higher value will be better.
3) Monetary. The amount of money spent. It is equal to Frequency that the higher value will be better.
A data warehouse is separate data storage from the primary application operating an operational transaction process. In this process, the data are transformed into information that can be analyzed by consumers [11]. The data process starts from data integration that has been extracted from the primary operational application and is transformed and loaded into a format that is appropriate to the data structure in the data warehouse. This process is known as ETL (Extract Transform Load).
The ETL process contains data cleaning, filtering, aggregate, and types of transformation. After the data are collected orderly, the data can be used for data mining or business intelligence. Business intelligence (BI) is defined as data presentation to entrepreneurs to be used for gaining knowledge or making a business decision [11]. Business intelligence is an important part of business analytics because it produces effective analysis [12].
In making a business intelligence model, two tabulated models are generally used, namely.

1)
A fact table is a table containing a transaction table consisting of numeric data that can be changed every day. For example, sales, purchase, finance, etc.
2) A dimension table is a table containing data category that generally the content rarely changes, and it will be used for data classification and aggregation contained in the fact table. For example, the Customer dimension table contains customer id, customer name, date of birth, address, and the like.
A schema in business intelligence is a group of tables consisting of dimension, fact, and attribute designed in differently according to the necessity. A schema consists of several types. They are as follows.
1) Star Schema. This schema puts the fact table in the center and the dimension table is placed around the fact table forming a star pattern. In the star schema, the dimension table with the same hierarchical data structure is placed in one table.
2) Snowflake Schema. This schema is different from the star schema model, whereby the dimension table is separated for the main category. For example, in the star schema, the data in the product table for product and product group is merged, while in the snowflake schema, both data are separated. Thus, in the end, the snowflake schema model looks like a snowflake.
Machine learning is a part of Artificial Intelligence (AI) allowing a system to learn from data rather than by explicit programming. Machine learning uses several repetitive algorithms by learning from data to improve, describe data, and predict results [13].
Unsupervised learning is a process of grouping unlabeled data. One of the utilizations is for clustering. The machine learning models for clustering are as follows.
1) Hierarchical Clustering. It works by forming a hierarchy or based on a certain level to appear like a tree structure. Thus, the clustering process is performed according to the level or step by step. Hierarchical clustering consists of two clustering, namely Agglomerative (bottom-up) and Divisive (Top-Down).
2) Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH). It is an algorithm that can cluster big data by making a small and brief summary at first and storing information as much as possible. The smaller and brief summary is then grouped as a substitute for the larger data cluster. The mechanism of the first BIRCH algorithm is summarizing a group of big data into a smaller one that is known as Clustering Feature (CF) tree. Each node of this tree consists of some Clustering features (CF). Then, each node, www.ijacsa.thesai.org including a leaf node, has some CF; besides, the internal node CF has a pointer to sub-node, and all leaf nodes are linked by a doubly linked list.
3) K-means. This clustering algorithm is one of the nonhierarchical clustering methods that try to make partitions for the existing objects into one or more clusters or object groups according to the characteristics. Thus, the object with the same characteristics is grouped in one cluster and the object with different characteristics is grouped in another cluster. 4) Mini Batch K-means. This algorithm forms a minibatch consisting of a collection of small randomized data with a constant size enable to be stored in a memory. The mechanism is that the sample is taken randomly from the dataset to form a minibatch, and then it is assigned to the nearby centroid. In the second step, the centroid is updated and so on.
One of the ways to find the optimal total cluster is by using an elbow method. It is done by seeing the percentage of the comparison between the total clusters that will form an elbow in a certain point. This method can be illustrated through a line plot between SSE (Sum of Squared error) compared to the total cluster and finding a point that represents "an elbow point" (the point after SSE or inertia starts decreasing in a linear fashion). Elbow method is often used in previous studies for determining the optimal number of clusters [14], [15] , in addition to the silhouette coefficient method [16].

III. RELATED WORK
A study on Customer segmentation using a machine learning method was the Fuzzy C-Means Clustering utilization for Customer Relationship Management (CRM) database on an online shop, namely tokodiapers.com [17]. Subsequently, another study focused on the implementation of k-means clustering on Recency-Frequency-Monetary (RFM)-based customer segmentation [18], [19]. Customer segmentation using PCA was combined with machine learning to make clustering [20]. The combination of K-means and ANN methods used SOM [21], [22]. Some other studies compared the clustering models between k-means, fuzzy c-means, Repetitive median K-Means [10], and between k-means, kmedoids, and DBSCAN [23].
Previous studies showed that business intelligence can be used for descriptive data in marketing strategy [24], social media analysis [25], travel companies [26], and can also be implemented in small-scale companies [27]. The business intelligence implementation can be combined with the data warehouse implementation. For example, business intelligence implementation using Higher Education data in Iran [28].
Based on literature studies and previous research, this study offers a complete and different solution for customer segmentation. First, customer segmentation based on RFM analysis creates customer levels combined with Geographic, Demographic, Psychographic and Behavioral to classify customers with the same characteristics. Second, the data is presented in business intelligence reports and is based on a data warehouse. Finally, the research will test several clustering models to find the fastest model.

IV. RESEARCH METHODOLOGY
This study was conducted through several phases as shown below.

A. Understanding the Business Process
In this phase, it was done by seeking information on how customer segmentation was implemented. There were two methods. They are as follows.
1) Literature study. It was performed by reading relevant books and journals.
2) Conducting a field observation and interview with users.
Based on these processes, it can be inferred that 2 methods of customer segmentation will be implemented. First, customer segmentation was ranked based on the RFM (Recency, Frequency, Monetary) analysis [10], and the second method was customer segmentation based on Geography, Demography, Psychography, Behavior [8].

B. Analysis Data
The study used secondary data taken from the credit card transaction history in bank XYZ in Indonesia for three months, namely October to December 2020. There are five CSV (Comma Separated File) format files that will be used, namely.

1)
Cc_transaction.csv is credit card transaction data whose information consists of customer id, category id, transaction date, amount in foreign currency, currency, card number, payee account and payee name.
2) Category.csv is shopping category data whose columns consist of category id, category name and group category.
3) Currency.csv contains a list of currencies consisting of the following columns currency id, currency code and currency name. 4) Customer.csv contains data from customer profiles consisting of customer id, customer name, gender, marital, grade, profession, address1, address2, postal code, open date, birthday, city, and province. 5) Rate.csv contains exchange rate information for all currencies consisting of currency id, date and rate.

C. Designing a Data Architecture and Data Flow
This phase consisted of designing tables, data architecture, and the data flow. The column structure of the created table must match the format and content of the data.

D. Preparation Process in the Data Warehouse
This phase consisted of an ETL (extract, transform, and load) process. The data in this phase were processed through cleaning, filtering, and normalization, thus, the data entering the database were neat and clean data. The data from the ETL process were imported into a staging table and then processed into a fact and dimension table for the needs of business intelligence and a table of RFM analysis for the machine learning process.

E. The Process of Machine Learning
It consisted of several stages as seen below. www.ijacsa.thesai.org 1) Feature selection. The data were taken from the table of RFM analysis that was made in the preparation process in the data warehouse.
2) The production of a machine learning model. This stage consisted of several tests for machine learning clustering to find the rapid model. The models being tested here were agglomerative hierarchical clustering, Balanced Iterative Reducing and Clustering Using Hierarchies (BIRCH), Kmeans, and Mini Batch K-means.
3) Optimizing the total clustering. This stage was done to find the optimum total clusters using an Elbow method.
4) Implementation of clustering model. The machine learning model selected is the fastest from the previous testing process. The number of clusters is in accordance with the recommendations of the elbow method.

F. Business Intelligence Process
It consisted of several steps as follows.
1) The tables of load from the machine learning process and ETL at data warehouse.
2) Making a dimension model and the relationship between those tables.At this stage, it will be decided to use the star schema or snowflake model.
3) Design and dashboard visualization. This stage aims to design and present data in a business intelligence portal. The data displayed is a summary of the clustering results of machine learning and demography, geography, and customer habits. On the other hand, it displays detailed data from transaction history and customer profiles. History data is also available for each transaction and customer grouping. All data can be selected and filtered based on the results of machine learning clustering, year and month.

A. The Process in the Data Warehouse
The result from the observation and analysis was an illustration of how to design the data architecture and data flow. The following is an illustration of the data architecture and data flow.
From the Fig. 1 we can see that the first process is data processing from the server of core banking and then it is integrated into SQL server database as data warehouse through an ETL (Extract, Transform, and Load) process using SQL commands (bulk insert method). The result of the ETL process was stored in the staging table. Subsequently, two processes were done; first, the data from the staging table were processed  into a fact table and dimension table through the SQL demand. Second, the data from the staging table formed an RFM _analysis table that would be used as a feature selection for a clustering process in machine learning. Processes in machine learning using Python with the Scikit-learn library. The result obtained from the machine learning process was exported into a dimension table and a fact table.
The outcome of the machine learning process and the ETL process in the data warehouse was forming a dimension table and a fact table used for a modeling process in the business intelligence. The detail of processes and data structure can be seen in Fig. 2.  222 | P a g e www.ijacsa.thesai.org

B. Machine Learning Process
The data used in this process was an RFM analysis table that was made in the ETL process previously. The content consisted of the summary of recency, frequency, and monetary values based on customer per month and per year as shown in Table I. The next process was testing the machine learning clustering model aiming at seeking a clustering model with a rapid performance. The first trial test was conducted using the data with a range of 3 months from the staging table. Overall, the total data was 46.079 rows. The test was conducted using a trial test on 2 up to 5 clusters. The computer specification used here was Intel Core i3 6006U 2 GHz, Memory 12 GB, Hard disk SSD 512 GB, and VGA Nvidia GeForce 940MX with 2GB dedicated VRAM. The Table II illustrates the detail of the test result showing the speed of the clustering process done in a second. From the Table II, agglomerative clustering has the longest clustering process; even when trying to perform a clustering process with 2 clusters, an error message appears stating that not enough memory. BIRCH occupies the third rank for better performance than agglomerative clustering. Meanwhile, Kmeans reaches a far better performance than BIRCH and agglomerative clustering. Nevertheless, in general, the performance of minibatch k-means is faster with the longest duration of 1.47 seconds for 2 Clusters.
The experimental results in Table II show that when the number of clusters increases, the clustering process becomes faster. This happens for clustering using agglomerative, birch and minibatch kmeans, but for kmeans the opposite happens. The unique thing is that when using 2 clusters, the kmeans algorithm is better than the mini batch kmeans. However, when trying 3 or more clusters, the minibatch kmeans performance is superior.
The machine learning Mini Batch K-Means model is finally chosen because it is the rapid model in performing a clustering. The next phase was selecting the optimal number of clusters from the data. The method used here was the elbow method. The result shows that the total optimal cluster in each month is 4 clusters. It is shown from the intersection of lines forming a perpendicular line as a visualization of an elbow method in Fig.  3. Before the data modeling, it was normalized using a Min-Max score. This process was aimed at making the data have the same range from 0 to 1. Therefore, data visualization in the form of a graphic becomes more precise. The formula used in this Min-Max method as shown in (1).
Xold is the former score. Xmin is the minimum score and Xmax is the maximum score in the data range.
The clustering prediction was done using minibatch kmeans with a total of 4 clusters. After grouping the data using machine learning, the recency, frequency, and monetary scores of each datum were calculated. The scoring was done by ranking the data. The best one was scored 4 and the lower one was scored 3 and so on. The following is the calculation result of the RFM score for October 2020 in Table III.   TABLE III. THE CALCULATION RESULT OF RFM From the Table III, it can be inferred that the highest score is cluster 3 with a score of 11 points. Even though the score is the highest, cluster 3 does not have the highest recency score since the best recency score is occupied by cluster 1. Meanwhile, the lowest score is occupied by clusters 0 and 2 with the same score of 5. However, cluster 0 is considered the lowest since it has the lowest recency and frequency scores. Subsequently, each of these clusters is labeled according to their class. The highest cluster is cluster 3 labeled with Royal; the cluster below cluster 3 is cluster 1 labeled with Potential. Then, clusters 2 and 0 are labeled with Keep and At-Risk respectively.
Clusters with the Royal label are the most important and loyal customers because the number of shopping transactions is the largest and the shopping frequency is the most frequent. Meanwhile, the group of potential clusters are customers who have the potential to become loyal credit card users, because the recency value is higher even though the amount and frequency of shopping are smaller. It is possible that the group from this cluster is a new customer. Clusters with the labels At Risk and Keep must be considered, because in this group they rarely shop in large quantities. The same grouping was also done for the data in November and December 2020. The following is the 3D visualization of customer clustering in October 2020 in Fig. 4.

C. Business Intelligence Process
The table containing the result of ETL and Machine Learning processes is loaded into the business intelligence model. The modeling schema is made using a Star Schema by placing two fact tables, namely cc_rfm_fact and rfm_monhtly_fact. Meanwhile, the other dimension tables are around the two fact tables as shown in Fig. 5. (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 10, 2021 224 | P a g e www.ijacsa.thesai.org The dashboard in the business intelligence consists of several parts as follows.
1) Main Dashboard. It displays a summary and aggregate data from all data. The top side has filtering based on year, month, and cluster choices. Besides filtering, there is a scorecard containing segmentation information of cluster, RFM score, and total customer, average recency score, average frequency score, and average monetary. score. Meanwhile, the central side has a bar graph showing data on shopping habits based on the shopping category. The demographic data, such as profession, sex, and marital status, are in the form of a pie chart or doughnut chart. On the right side, there is information about transaction amount based on the currency. The complete illustration can be seen in Fig. 6.
2) Distribution map. It shows the number of customer distribution based on the province and regency. The data are presented in the form of an Indonesian map. There is filtering on the top side according to customer segmentation, year, and month as shown in Fig. 7.  (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 10, 2021 225 | P a g e www.ijacsa.thesai.org 3) Transaction detail. It displays detailed data of each customer transaction. The column consists of customer-id, transaction date, category id, payee account, payee name, and the amount in Indonesian rupiah (IDR). This information can be filtered based on the label, year, and month through a slicer on the top side as shown in Fig. 8. 4) Customer Detail. It displays the data of customer-id, customer name, sex, position, profession, marital status, zip code, and date of birth. As in the transaction detail, on the top side, there is a slicer to filter the data based on the segmentation, year, and month. 5) Forecasting Credit Transaction. It contains credit card transaction predictions in the next few days. Forecasting uses an Exponential Smoothing method that has been available in the Power BI application. Exponential Smoothing is a forecasting method of a moving average providing an exponential or graded weight in the latest data [29]. The graph can be seen in Fig. 9.    This study makes a segmentation using a machine learning clustering model and business intelligence in the customers of data warehouse-based credit cards. The data warehouse concept utilization is used for constructing an integrated data management system that can handle a large amount of data. The machine learning clustering model is used for grouping customers according to a rank to know the most loyal customers and inactive customers. This clustering uses the RFM (Recency, Frequency, and Monetary) analysis as the measurement variable and feature selection in the clustering process. The RFM analysis is chosen because it can present customer loyalty based on their shopping behavior, such as the last time of shopping (recency), the frequency of shopping (frequency), and the amount of money spent for shopping (monetary).
The data from the machine learning clustering process are combined with other data and presented in the form of a business intelligence portal dashboard. The dimension modeling process uses a star schema because it is superior in terms of speed compared to the snowflake schema. The data and graphics displayed in the business intelligence portal present the segmentation data according to demography, geography, and behavior.
The combination of machine learning clustering segmentation for ranking customers with segmentation based on the demography, geography, and behavior provides complete and strong information as a support in a business decision for a marketing department. For example, the marketing department wants to increase the customers" credit card transactions by a limited promotional budget. First, he/she will see a group of customers having a relatively low RFM score. Then, from the group, the city with the highest number of customers is observed. Thus, the marketing department can effectively promote to a group of customers with the same characteristics.
This study also shows the minibatch k-means clustering algorithm has faster performance than that of agglomerative hierarchical clustering, BIRCH, and k-means algorithms. The result shows that out of 46.097 rows of data, the minibatch kmeans method is superior to agglomerative clustering, BIRCH is superior with a thin margin to k-means. Some clusters are tested using an elbow method. The result shows that the best and optimal cluster is 4 clusters.