Churn Customer Estimation Method based on LightGBM for Improving Sales

Abstract: A churn customer estimation method is proposed for improving sales. By analyzing the differences between customers who churn and customers who do not churn (returning customers), we conduct a customer churn analysis in order to reduce customer churn and take steps to reduce the number of unique customers. By predicting customers who are likely to defect using decision tree models such as LightGBM, which is a machine learning method, and logistic regression, we discover the feature values that are important for prediction and utilize the knowledge obtained through Exploratory Data Analysis (EDA). The experimental results show that the proposed method allows estimation and prediction of churn customers as well as of their characteristics and behavior. It is also found that the proposed method is superior to the conventional method, GradientBoostingClassifier (GBC), by around 10%.


I. INTRODUCTION
A churn customer estimation method is very important for improving sales. By analyzing the differences between customers who churn and customers who do not churn (returning customers), a customer churn analysis is conducted in order to reduce customer churn and to take steps to reduce the number of unique customers. Customers who are likely to defect are predicted using decision tree models such as LightGBM, which is a machine learning method, and logistic regression, in order to discover the feature values that are important for prediction and to utilize the knowledge obtained through Exploratory Data Analysis (EDA).
In order to predict churn customers, a method based on LightGBM and EDA is proposed here. LightGBM is a decision tree gradient boosting framework, like XGBoost, and is a convenient and fast machine learning method. Although the implementation details differ, the two can generally be regarded as almost the same framework. LightGBM was originally much faster than XGBoost because it handles continuous values as histograms. XGBoost did not originally have this implementation, but it can now also use a histogram-based algorithm by setting the parameter tree_method = hist.
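To illustrate why handling continuous values as histograms speeds up split finding, the following minimal sketch buckets a continuous feature into a few bins, so that only the bin boundaries need to be scanned as split candidates. It is an illustration only and not LightGBM's actual implementation; the function names, the equal-width binning, and the toy data are all assumptions.

```python
def build_bins(values, n_bins):
    """Return the interior edges of n_bins equal-width bins (assumed binning rule)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def bin_index(value, edges):
    """Index of the bin that a value falls into (edges are sorted ascending)."""
    idx = 0
    for edge in edges:
        if value >= edge:
            idx += 1
    return idx


def histogram(values, labels, edges):
    """Accumulate [count, sum of labels] per bin; a split search scans bins, not raw values."""
    hist = [[0, 0.0] for _ in range(len(edges) + 1)]
    for v, y in zip(values, labels):
        b = bin_index(v, edges)
        hist[b][0] += 1
        hist[b][1] += y
    return hist


# Hypothetical data: number of visits and churn label (1 = churn).
visits = [1, 2, 2, 3, 5, 8, 13, 21, 34, 40]
churned = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

edges = build_bins(visits, n_bins=4)
hist = histogram(visits, churned, edges)
print(edges)
print(hist)
```

With 4 bins, only 3 thresholds are evaluated instead of one per distinct value, which is the source of the speed-up described above.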
The comparison between XGBoost and LightGBM is also a research topic because gradient boosting is highly practical.
A benchmark study, "Benchmarking and Optimization of Gradient Boosting Decision Tree Algorithms," was published in September 2018 [1]. After testing XGBoost, LightGBM, and CatBoost, it concluded that no method is clearly superior in all situations.
The specific features and advantages of XGBoost and LightGBM are as follows:
• There is no need to impute missing values.
• Redundant feature values cause no problem (explanatory variables with high correlation can be used as they are).
• Unlike random forest, the trees are built in series.
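The last point, that trees are built in series, can be sketched as follows. Each new tree is fitted to the residuals left by the trees before it, whereas a random forest fits its trees independently. This is a minimal sketch assuming one feature, depth-1 trees (stumps), and squared-error loss for brevity; a real churn classifier would use a logistic loss, and all names and data here are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 tree: choose the threshold that minimizes squared error."""
    best = None
    for t in sorted(set(xs))[1:]:  # skipping the minimum keeps both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x < t]
        right = [r for x, r in zip(xs, residuals) if x >= t]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x < t else rmean


def boost(xs, ys, n_trees=20, learning_rate=0.3):
    """Build trees one after another, each on the residuals of the current ensemble."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)


# Hypothetical data: number of visits and churn label (1 = churn).
xs = [1, 1, 2, 2, 3, 5, 8, 10, 12, 15]
ys = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

model = boost(xs, ys)
mean_y = sum(ys) / len(ys)
base_mse = sum((y - mean_y) ** 2 for y in ys) / len(ys)
boost_mse = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(base_mse, boost_mse)
```

Because every stump depends on the predictions of all earlier stumps, the loop cannot be reordered or run in parallel across trees, which is the structural difference from random forest noted in the list.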
On the other hand, approaches to data analysis can be broadly divided into a "hypothesis verification type" that verifies hypotheses with data and an "exploration type (EDA)" that generates hypotheses from data. Accordingly, methods of data analysis are roughly divided into Confirmatory Data Analysis (CDA) and EDA [2]-[6]. CDA is a general term for analytical methods aimed at hypothesis verification, while EDA is an analytical method aimed at obtaining hypotheses and knowledge from large-scale, multi-general data. EDA does not select explanatory variables in advance and performs exploratory analysis by seeking knowledge from a wide range of subjects. When we actually analyze data, we go back and forth between the hypothesis verification type and the exploration type to establish what we know.
Data analysis requires setting hypotheses to be verified, and little can be gained from analysis without hypotheses. However, there are times when a hypothesis cannot be obtained in advance. In such cases, in order to create a hypothesis, it is necessary to look at the data from various angles and explore trends, which is why an exploratory data analysis is performed. EDA can help by making sure stakeholders are asking the right questions. EDA helps answer questions about standard deviations, categorical variables, and confidence intervals. Once EDA is complete and insights are obtained, the features can be used for more sophisticated data analysis and modeling, including machine learning.
The cost of acquiring a new customer is higher than the cost of retaining an existing customer, up to five times as much. Therefore, lowering the churn rate has a large positive impact on profits. Churn prediction is especially important for subscription-based services. By predicting churn, you can estimate customer lifetime value (CLTV) and measure the growth potential of your business [7]-[19]. Customer churn refers to customers canceling services such as subscriptions, whereas revenue churn refers to lost revenue, for example the loss of Monthly Recurring Revenue (MRR) at the beginning of the month.
A customer profiling method with Big Data based on Binary Decision Tree (BDT) and clustering for sales prediction was proposed and tested with Point of Sales (POS) data [20]. Furthermore, a modified Prophet+Optuna prediction method for sales estimation was also proposed [21]. In this study, a churn customer estimation method is proposed and examined with POS data for further improving sales.
In the next section, some previous works are introduced. Then the proposed method for customer churn prediction is described, followed by the experiment. Finally, the conclusion and some discussions are presented.

II. PREVIOUS WORKS
The 5:25 rule states that if you reduce customer churn by 5%, your profit margin will improve by 25%. From a medium- to long-term strategy perspective, it is important to implement planned measures after fully considering the balance between the customer retention rate, the defection rate, and the acquisition of new customers. Selling products to new customers requires five times the cost of selling products to existing customers (the 1:5 rule). Reducing the probability of customer defection and increasing sales to existing customers are important for increasing corporate profits.
According to Pareto's law, reducing the withdrawal rate related to the top 20% of the treatment menu is important for maintaining sales. A good way to identify the top 20% is to use a point card. With a point card, it is relatively easy to identify whether a customer is a regular customer or not.
If the new customer development cost is 100, the existing customer retention cost will be 17 to 20. The top 20% of customers account for 60-80% of total sales. Furthermore, in the bottom 30%, the degree of contribution to sales is less than 4%. The top 5% of customers with the highest loyalty often purchase related products. Reducing the defection rate (=increasing the rate of continuous purchases) has a large impact.
If the defection rate drops from 30% to 20%, the company's expected total sales now and in the future will increase by 1.5 times. A 10% increase or decrease in the attrition rate leads to a 50% increase or decrease in sales. Here, the share of the number of customers is the ratio of customers who purchase the company's products among all customers in the relevant market, and the intra-customer share is the ratio of the company's products to all purchases of the product group by one customer. In addition, the customer defection rate is defined as 1 - continuous purchase rate, that is, the ratio of customers who previously purchased the company's products but no longer purchase them.
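The 1.5-fold figure is consistent with a simple geometric-series model. The sketch below assumes (the source does not state its model) a constant per-period defection rate d and constant sales per retained customer, so that the expected total over the current and all future periods is proportional to 1 + (1 - d) + (1 - d)^2 + ... = 1 / d. The function name is hypothetical.

```python
def expected_lifetime_periods(defection_rate):
    """Expected number of purchase periods per customer: sum over k >= 0 of (1 - d)^k = 1 / d."""
    return 1.0 / defection_rate


# Ratio of expected total sales when the defection rate falls from 30% to 20%.
ratio_down = expected_lifetime_periods(0.20) / expected_lifetime_periods(0.30)

# Ratio when the defection rate rises from 30% to 40%, under the same assumption.
ratio_up = expected_lifetime_periods(0.40) / expected_lifetime_periods(0.30)

print(round(ratio_down, 2))  # 1.5
print(round(ratio_up, 2))    # 0.75
```

Under this assumed model, the drop from 30% to 20% reproduces the 1.5-fold increase, while a rise from 30% to 40% yields a 25% decrease, so the symmetric 50% figure should be read as an approximation.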
For existing customers, the largest defection (=low repeat rate) occurs from the first purchase (F1, Frequency = 1) to the second purchase (F2, Frequency = 2). Also, if the purchase at F1 is not a regular purchase, the repeat rate from F1 to F2 is often about 20 to 40%. Furthermore, the repeat rate rises from F2 to F3, F3 to F4, etc., and when it exceeds F3, it rises to about 70 to 90%, and stable repeat earnings can be obtained.
Possible reasons for customer defection are as follows.
1) I did not get the results I wanted or could not get them.
2) I felt that the price of the treatment was higher than the benefits obtained (e.g., I was dissatisfied with the cost performance).
3) I felt dissatisfaction and anxiety about the company's response, not the treatment.
Therefore, customer defection analysis is necessary. It is necessary to calculate the "customer defection rate", the percentage of customers who did not use the service for the second time or more during a certain period of time, from Customer Relationship Management (CRM) data, and to analyze the trend of "what kind of customers defect". In particular, if the customer abandons the service after using it multiple times, it is necessary to examine the customer's purchase history, purchase frequency, and questionnaire responses.
For example, conducting questionnaires using Google Forms and posting a "Frequently Asked Questions (FAQ)" page on the company's website have a great impact on customer satisfaction. Through such measures, it is possible to avoid the risk of customers feeling dissatisfied and leaving. In addition, customer information in CRM is not just for approaching repeat customers; it is also necessary to collect and analyze data to grasp the tendencies of customers who have already left and to find out their reasons for leaving.

III. PROPOSED METHOD
First, customer churn is defined, and then features of customer churn are extracted from the customer data derived from the Point of Sales (POS) data.
Customer churn prediction is performed by the following method.
• Theme setting: Define the business problem and the goals to be achieved → Define the Before → After change with monitorable metrics.
• Analytics design: Define the model to be built and the necessary data → In many cases, data such as transaction history and data from the CRM (customer relationship management) system.
• Dataset generation: Prepare the data and perform EDA, then perform the necessary preprocessing to create datasets suitable for machine learning algorithms.
• Predictive model training and testing: Train a churn prediction (departure prediction) model using various machine learning algorithms for classification problems → Test the learned prediction model.
After that, customer churn is characterized and estimated based on LightGBM. Meanwhile, the ROC (Receiver Operating Characteristic) curve evaluation method (see footnote 4) is applied to the estimated churn ratio, followed by an analysis of feature importance.
Some of the countermeasures are proposed for mitigation of customer churn.
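The ROC-based evaluation step can be sketched in plain Python as follows. AUC is computed here through its rank interpretation (the probability that a randomly chosen churned customer receives a higher predicted score than a randomly chosen returning customer), together with the log loss that the experiments also report. The labels and probabilities are hypothetical, and the function names are illustrative rather than taken from the original study.

```python
import math


def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def log_loss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood of the true labels; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


# Hypothetical predictions: 1 = churn, 0 = returned.
labels = [1, 0, 1, 1, 0, 0]
probs = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]

auc = roc_auc(labels, probs)
loss = log_loss(labels, probs)
print(auc, loss)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, while a lower log loss indicates better-calibrated churn probabilities.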

A. Data Used
We used POS customer data from September 1, 2009 to December 31, 2021. The outline of the data is as follows:

1) Total number of customers (persons): 878,181 (number of unique customer IDs)
2) Total number of cases (cases): 8,857,257 (number of sales item IDs; cut and color are counted as 2 cases, and a discount is also counted as 1 case)

B. Definition of Customer Churn
A customer who visited the store during a three-month period but did not return during the following three months was defined as a churned customer. To give an easy-to-understand example: of the customers who visited the store between January and March, those who visited again between April and June were defined as returning, and those who did not were defined as churned. The format of the final churn prediction output is as follows: for each customer, the model predicts the probability that the customer will defect in the next three months. In other words, the customer ID and the likelihood of churn are represented as paired data, as shown in Table I. About 65,000 customers visited the stores (all stores combined) from January to March 2021; those who visited again between April and June were labeled as returning, and those who did not were labeled as churned. In Fig. 1, "0" represents returning customers and "1" represents churned customers. The overall churn rate was about 42%. The features used are shown in Table II.
4 https://zero2one.jp/ai-word/roc-curve-and-auc/
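The churn definition in this subsection can be expressed as a short labeling routine. The sketch below is a minimal illustration, assuming visit records are available as a mapping from customer ID to a list of visit dates; the data structure, function name, and sample customers are hypothetical and not taken from the actual POS data.

```python
from datetime import date


def label_churn(visits, obs_start, obs_end, out_start, out_end):
    """Label customers who visited in the observation window.

    visits: dict mapping customer ID -> list of visit dates.
    Returns dict mapping customer ID -> 1 (churn) or 0 (returned).
    Customers with no visit in the observation window are excluded.
    """
    labels = {}
    for customer_id, dates in visits.items():
        visited = any(obs_start <= d <= obs_end for d in dates)
        if not visited:
            continue
        returned = any(out_start <= d <= out_end for d in dates)
        labels[customer_id] = 0 if returned else 1
    return labels


# Hypothetical visit histories.
visits = {
    "A": [date(2021, 1, 15), date(2021, 5, 2)],   # visited Jan-Mar, came back in May
    "B": [date(2021, 3, 20)],                     # visited Jan-Mar, never came back
    "C": [date(2020, 11, 3), date(2021, 4, 10)],  # no visit in Jan-Mar: not in scope
}

labels = label_churn(
    visits,
    obs_start=date(2021, 1, 1), obs_end=date(2021, 3, 31),
    out_start=date(2021, 4, 1), out_end=date(2021, 6, 30),
)
churn_rate = sum(labels.values()) / len(labels)
print(labels, churn_rate)
```

The resulting 0/1 labels correspond to the "0" (return) and "1" (churn) classes used in the text, and their mean gives the overall churn rate.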

1) The difference between churn customers and non-churn customers:
The difference between churn customers and non-churn customers was evaluated from the number of visits. The results are shown in Fig. 2. In the figure, orange indicates churn and blue indicates return. The lower the number of visits to the store, the higher the attrition rate, and the higher the number of visits, the lower the attrition rate. There is a marked difference.
Fig. 4 indicates how many days before the analysis point the first visit to the store took place. This time, we analyzed customers who visited the store from January to March, so March 31 was used as the reference date. From this, we can see that the churn rate is higher for people who first visited the store recently, and lower for people who first visited the store a long time ago. These differences are significant.
Fig. 5 shows the number of days from the last visit to the analysis point, measured in the same way as for the date of the first visit. Again, because we analyzed customers who visited the store from January to March, March 31 was used as the reference date. From this result, we can see that the withdrawal rate is lower for those who last visited the store more recently, and higher for those who last visited the store more than 50 days ago.
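The three visit-based features discussed here (number of visits, days since the first visit, and days since the last visit, all relative to the analysis point) can be derived as in the following sketch. Counting every visit up to the reference date is an assumption, since the counting window is not stated; the function name and the sample dates are hypothetical.

```python
from datetime import date


def visit_features(dates, reference):
    """Compute visit count and recency features relative to the analysis point."""
    past = sorted(d for d in dates if d <= reference)
    if not past:
        raise ValueError("customer has no visit on or before the reference date")
    return {
        "n_visits": len(past),
        "days_since_first": (reference - past[0]).days,
        "days_since_last": (reference - past[-1]).days,
    }


reference = date(2021, 3, 31)  # analysis point used for the January-March cohort
feats = visit_features(
    [date(2020, 12, 1), date(2021, 2, 9), date(2021, 3, 21)],
    reference,
)
print(feats)
```

For this hypothetical customer, the first visit was 120 days before the analysis point and the last visit 10 days before it, which would place the customer in the lower-churn range described for recent last visits.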

5) Gender:
The churn rate is lower for men than for women (the churn rate for those entered as women exceeds 60%, but for men it is a little over 50%), as shown in Fig. 6. Customers whose gender is unknown (not entered) have a very low churn rate. The reason for this is unclear.
Fig. 6. Gender dependency against churn rate

6) Age:
The churn rate is high for those in their 20s and 30s and decreases for those in their 50s, as shown in Fig. 7.
Fig. 7. Age dependency against churn rate

7) Service menu:
We categorized customers according to the menu they ordered the most and investigated the churn rate. As a result, it was found that the churn rate for dyeing white hair is very low at around 30%, while the churn rate for child cuts and school cuts is high, as shown in Fig. 8.
Fig. 9 shows the Kernel Density Estimation (KDE) of the value "average unit price per visit / number of visits", only for those customers for whom this value is more than 2,000 Yen. Customers with a value of 6,000 Yen or more seem to have a slightly higher churn rate. In other words, it seems that the churn rate is high for people who order expensive menus despite the fact that they visit the store less frequently.
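The per-category churn rates discussed in this subsection (by gender, age group, or most-ordered menu) amount to a grouped mean of the 0/1 churn label. The following sketch shows that computation on hypothetical records; the field names and values are illustrative and do not reproduce the actual data.

```python
from collections import defaultdict


def churn_rate_by_group(records, key):
    """Return the churn rate per category of the given field.

    records: list of dicts, each with a 0/1 'churn' field and the grouping field.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [customers, churned customers]
    for rec in records:
        counts[rec[key]][0] += 1
        counts[rec[key]][1] += rec["churn"]
    return {group: churned / n for group, (n, churned) in counts.items()}


# Hypothetical customers, each tagged with the menu ordered most often.
records = [
    {"menu": "gray coverage color", "churn": 0},
    {"menu": "gray coverage color", "churn": 0},
    {"menu": "gray coverage color", "churn": 1},
    {"menu": "child cut", "churn": 1},
    {"menu": "child cut", "churn": 1},
    {"menu": "child cut", "churn": 0},
    {"menu": "child cut", "churn": 1},
]

rates = churn_rate_by_group(records, "menu")
print(rates)
```

The same function can be applied to any categorical field, for example a gender or age-band column, by changing the key argument.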

1) LightGBM based prediction of customer churn:
The results of predicting customer churn using the above feature values (excluding distance to the store) are shown below. Fig. 10 shows the order of feature importance for customer churn prediction using LightGBM. It can be seen that the number of visits to the store on the day of the first visit has a large effect on churn. As shown in Fig. 11, the ROC curve and the churn percentage (Churn pct) histogram appear reasonable (not perfectly satisfactory, but acceptable). The AUC (Area Under the Curve) and the logarithmic loss (log loss) were also evaluated. As shown in Table III, both show reasonably satisfactory values.

CONCLUSION
A churn customer estimation method is proposed for improving sales. By analyzing the differences between customers who churn and customers who do not churn (returning customers), we conduct a customer churn analysis in order to reduce customer churn and take steps to reduce the number of unique customers. By predicting customers who are likely to defect using decision tree models such as LightGBM, which is a machine learning method, and logistic regression, we discover the feature values that are important for prediction and utilize the knowledge obtained through EDA.
The experimental results show that the proposed method allows estimation and prediction of churn customers as well as of their characteristics and behavior. It is also found that the proposed method is superior to the conventional method, GradientBoostingClassifier (GBC), by around 10%.

FUTURE RESEARCH WORKS
Further investigation is required to improve prediction accuracy. We could take measures such as sending DMs and coupons to customers with a 90% chance of churn. In order to increase the accuracy of churn prediction, we will train not only LightGBM but also other models such as Random Forest and logistic regression and combine them in an ensemble, which is expected to increase the accuracy a little more. In addition, this time the model was trained on the customers of all stores who visited during a specific period, but if we train a model for each store without narrowing down the period, a different result may appear.