Credit Card Fraud Detection using Deep Learning based on Auto-Encoder and Restricted Boltzmann Machine

Fraud has no constant pattern: fraudsters continually change their behavior, so detecting it calls for unsupervised learning. Fraudsters learn new technologies that let them commit fraud through online transactions, they imitate the regular behavior of consumers, and their patterns change quickly. Because some fraudsters commit fraud once through an online channel and then switch to other techniques, fraud detection systems need unsupervised learning to examine online transactions. This paper aims to 1) focus on fraud cases that cannot be detected from previous history or by supervised learning, and 2) build deep Auto-encoder (AE) and restricted Boltzmann machine (RBM) models that reconstruct normal transactions in order to find anomalies that deviate from normal patterns. The proposed deep learning model based on the AE is an unsupervised learning algorithm that applies backpropagation with the target outputs set equal to the inputs. The RBM has two layers, the input (visible) layer and the hidden layer. In this research, we use Google's TensorFlow library to implement the AE and the RBM, together with the deep learning module of H2O. The results are reported as the mean squared error, root mean squared error, and area under the curve.


I. INTRODUCTION
Fraud detection in online shopping systems is one of the most active topics today. Fraud investigators, banking systems, and electronic payment systems such as PayPal need efficient and sophisticated fraud detection systems to prevent fraud activities that change rapidly. According to a 2017 CyberSource report, fraud loss by order channel was 74 percent in the web store and 49 percent in the mobile channel [1]. The lesson from this information is that detection must identify anomalies in fraud behavior patterns that have changed relative to the past.
A good fraud detection system should identify fraudulent transactions accurately and should make detection possible for real-time transactions. Fraud detection can be divided into two groups: anomaly detection and misuse detection [2]. Anomaly detection systems are trained on normal transactions and use techniques that identify novel frauds. Conversely, a misuse detection system is trained on the transaction history in the database, in which each transaction is labeled as normal or fraudulent. Misuse detection is therefore a form of supervised learning, and anomaly detection a form of unsupervised learning. What is the difference between the two? Supervised learning studies labeled datasets: the model is trained on labeled data, and its parameters, such as the learning rate, are tuned until it is accurate. Techniques that implement supervised learning, such as the multilayer perceptron (MLP), build the model from the history in the database. This approach has a disadvantage: if a new fraudulent transaction does not match any record in the database, it will be considered genuine. Unsupervised learning, in contrast, acquires information from new transactions and finds anomalous patterns in them. Unsupervised learning is more difficult than supervised learning, because appropriate techniques must be chosen to detect anomalous behavior.
Neural networks have been used to detect credit card fraud in the past. Here, we focus on deep learning, a subfield of machine learning (ML). Early applications of deep learning were in image processing; for example, Facebook uses deep learning to tag people and to recognize who a person is for subsequent reference. Deep neural networks offer many algorithms that could be used for fraud detection, but in this paper we selected the AE and the RBM to detect whether transactions in the datasets that appear normal qualify as novel frauds. We believe that some transactions labeled as normal in the datasets also show suspicious behavior, similar to those labeled as fraud. So, in this paper we focus on unsupervised learning.
In these experiments we use three datasets: the German, Australian, and European datasets [4], [3], [18]. The first, the German dataset, was provided by Professor Dr. Hans Hofmann [4]. It has twenty attributes that describe the applicant's creditworthiness, such as credit history, purpose of the credit, credit amount, and job, and it contains 1000 instances. The second dataset is from Australia [3]. The names and values of its attributes have been changed to meaningless symbols to protect the confidentiality of the data, and it contains 690 instances. The last dataset contains transactions made by European cardholders in September 2013. It covers transactions that occurred over two days, 284,807 in total, and has 31 features. Twenty-eight of them, V1 to V28, are numerical input variables obtained from a PCA transformation. The other three features, which were not transformed by PCA, are "Time", "Amount", and "Class". The experiments bring the three datasets together to compare their receiver operating characteristic (ROC) curves and so understand the performance of the binary classifiers.
www.ijacsa.thesai.org

II. RELATED WORK
In past decades, credit cards were introduced in the financial sector, and they have now become a popular payment method for goods and services in online shopping. Since the introduction of credit cards, fraudsters have tried to imitate the normal behavior of users in order to make payments for themselves. For this reason, most research on credit card fraud detection has focused on pattern matching, in which abnormal patterns are identified as distinct from normal transactions. Many techniques for credit card fraud detection have been presented in the last few years, and we briefly review some of them below.
K-nearest neighbor (KNN) algorithms are used to detect credit card fraud. KNN is a supervised learning technique: it classifies a transaction by finding its nearest points, so if a new transaction lies close to fraudulent transactions, KNN identifies it as fraud [5]. KNN is often confused with K-means clustering, but the two are different techniques. K-means is an unsupervised learning technique used for clustering; it tries to discover new patterns by grouping the data into clusters. KNN, in contrast, uses K, the number of nearest neighbors, to classify or predict a new transaction based on previous history. The distance between two data instances in KNN can be calculated by different methods, but the Euclidean distance is the most common. KNN is a simple and useful technique for this task.
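The KNN rule described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in [5]: the toy transactions, the two features, and k = 3 are hypothetical choices.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote of its k nearest labeled transactions."""
    # Euclidean distance from the new transaction to every labeled transaction
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k closest transactions in the history
    nearest = np.argsort(distances)[:k]
    # Majority vote over their labels (0 = genuine, 1 = fraud)
    votes = np.bincount(y_train[nearest], minlength=2)
    return int(np.argmax(votes))

# Hypothetical history with two numeric features per transaction
X_hist = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],   # genuine
                   [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])  # fraud
y_hist = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_hist, y_hist, np.array([4.9, 5.1])))  # → 1, near the frauds
print(knn_predict(X_hist, y_hist, np.array([1.1, 1.0])))  # → 0, near genuine ones
```

Unlike K-means, nothing is fitted in advance: the labeled history itself is the model, which is why KNN can only recognize fraud that resembles frauds already labeled in the history.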
Outlier detection is another method, and it can be used in both supervised and unsupervised settings. Supervised outlier detection learns to classify outliers from a labeled training dataset. Unsupervised outlier detection, conversely, is similar to clustering the data into multiple groups based on their attributes. N. Malini and Dr. M. Pushpa state that unsupervised outlier detection is preferable to supervised outlier detection for credit card fraud, because it does not require prior information that labels data as fraudulent. Instead, it is trained on normal transactions to discriminate between legal and illegal transactions [5].
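To make the idea of training only on normal transactions concrete, here is a minimal sketch of an unsupervised outlier detector. It swaps in a simple technique, a mean squared z-score against the profile of normal transactions, rather than the method of [5]; the data and the 99th-percentile cut-off are hypothetical.

```python
import numpy as np

def fit_normal_profile(X_normal):
    """Learn the per-feature mean and spread of normal transactions (no fraud labels)."""
    mu = X_normal.mean(axis=0)
    sigma = X_normal.std(axis=0) + 1e-8  # guard against zero variance
    return mu, sigma

def outlier_score(X, mu, sigma):
    """Mean squared z-score: distance of each transaction from the normal profile."""
    z = (X - mu) / sigma
    return (z ** 2).mean(axis=1)

rng = np.random.default_rng(0)
X_normal = rng.normal(0.0, 1.0, size=(500, 4))  # hypothetical normal history
mu, sigma = fit_normal_profile(X_normal)

# Cut-off: the 99th percentile of scores observed on normal data (an assumption)
threshold = np.percentile(outlier_score(X_normal, mu, sigma), 99)

X_new = np.array([[0.1, -0.2, 0.3, 0.0],   # close to the normal profile
                  [6.0, -5.0, 7.0, 6.5]])  # far from it
flags = outlier_score(X_new, mu, sigma) > threshold
print(flags.tolist())  # → [False, True]
```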
Some credit card fraud datasets suffer from class imbalance. Anusorn Charleonnan notes that imbalanced datasets have many characteristics that emerge during classification. He uses random undersampling (RUS), a data sampling technique that tries to relieve class imbalance by modifying the class distribution of the training dataset. There are two major methods of adjusting the imbalance in datasets: undersampling and oversampling. In his research, he also uses the MRN algorithm for the classification problem of credit card fraud [6].
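Random undersampling can be sketched as follows. This is a generic illustration of RUS, not the exact procedure of [6]; the 2 percent fraud rate and the random features are hypothetical.

```python
import numpy as np

def random_undersample(X, y, rng):
    """Randomly drop rows of larger classes until every class has the minority count."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        # Sample n_min rows of this class without replacement
        keep.append(rng.choice(idx, size=n_min, replace=False))
    keep = np.concatenate(keep)
    rng.shuffle(keep)  # mix the classes back together
    return X[keep], y[keep]

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
y = np.array([1] * 20 + [0] * 980)  # hypothetical: 2% fraud
X_bal, y_bal = random_undersample(X, y, rng)
print(np.bincount(y_bal).tolist())  # → [20, 20]
```

Undersampling discards most normal transactions, which is cheap but loses information; oversampling instead replicates or synthesizes minority rows.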
An artificial neural network (ANN) is a flexible computing framework used to solve a broad range of non-linear problems. The main idea of an ANN is to mimic the learning of the human brain. The smallest unit of an ANN, called a perceptron, is represented as a node. Several perceptrons are connected into a network, like neurons in the brain. Each node has weighted connections with several nodes in the adjacent layer. A weight is simply a floating-point number, and it is adjusted as inputs are presented to train the network. Inputs pass from the input nodes through the hidden layers to the output nodes, and during training the weights of each node are adjusted to make the network more accurate.
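A single node and the way its weights are adjusted during training can be sketched with the classic perceptron learning rule. The toy task (logical AND of two binary inputs) and the learning rate are hypothetical and chosen only to keep the example small.

```python
import numpy as np

def perceptron(x, w, b):
    """A single node: weighted sum of inputs plus bias, then a step activation."""
    return 1 if np.dot(w, x) + b > 0 else 0

def train_perceptron(X, y, lr=1.0, epochs=20):
    """Perceptron rule: nudge the weights by the error made on each example."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            error = yi - perceptron(xi, w, b)
            w += lr * error * xi
            b += lr * error
    return w, b

# Hypothetical toy task: output 1 only when both binary inputs are 1 (logical AND)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([perceptron(xi, w, b) for xi in X])  # → [0, 0, 0, 1]
```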
The problem of credit card fraud detection has been analyzed with the Chebyshev Functional Link Artificial Neural Network (CFANN). CFANN consists of two components, functional expansion and learning. Mukesh Kumar Mishra and Rajashree Dash used CFANN to detect credit card fraud and compared it with the MLP and the Decision Tree [7]. In an MLP, the topology is structured into a number of layers: the first is called the input layer, the middle one the hidden layer (there can be more than one hidden layer), and the last the output layer. Feed-forward means that all information flows in the same direction, from left to right, without recurrent links. A Decision Tree is a tree structure with a root node and a number of internal and leaf nodes. Their paper compares the performance of CFANN, MLP, and Decision Tree. Their results suggest that MLP outperforms CFANN and Decision Tree in fraud detection, whereas CFANN is reported to make more accurate predictions than the other two techniques [7].
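The two CFANN components can be sketched as follows. This is a minimal sketch, assuming that the learning component is a single sigmoid output unit trained by gradient descent on cross-entropy; [7] may use a different learning rule, and the one-feature data set is hypothetical.

```python
import numpy as np

def chebyshev_expand(X, order):
    """Functional expansion: map each feature x to T_0(x), ..., T_order(x).

    Chebyshev recurrence: T_0 = 1, T_1 = x, T_{n+1} = 2x T_n - T_{n-1}.
    Features are assumed to be scaled to [-1, 1].
    """
    X = np.asarray(X, dtype=float)
    terms = [np.ones_like(X), X]
    for _ in range(2, order + 1):
        terms.append(2 * X * terms[-1] - terms[-2])
    return np.concatenate(terms[: order + 1], axis=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_cflann(X, y, order=3, lr=0.5, epochs=2000):
    """Learning component: one sigmoid unit on the expanded features, no hidden layer."""
    Phi = chebyshev_expand(X, order)
    w = np.zeros(Phi.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(Phi @ w + b)
        grad = p - y  # gradient of the cross-entropy loss w.r.t. the unit's input
        w -= lr * Phi.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def predict_cflann(X, w, b, order=3):
    return (sigmoid(chebyshev_expand(X, order) @ w + b) > 0.5).astype(int)

# Hypothetical one-feature data: "fraud" at both extremes, which no single threshold
# on x can separate, but the expanded term T_2(x) = 2x^2 - 1 can
X = np.array([[-0.9], [-0.8], [-0.2], [0.0], [0.2], [0.8], [0.9]])
y = np.array([1, 1, 0, 0, 0, 1, 1])
w, b = train_cflann(X, y)
print(predict_cflann(X, w, b).tolist())  # → [1, 1, 0, 0, 0, 1, 1]
```

The functional expansion supplies the non-linearity, so the learning part can stay a single layer; that is the design trade-off CFANN makes against a hidden-layer MLP.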
Deep learning is a state-of-the-art technology that has recently attracted considerable attention in IT circles. ANNs were introduced first as one family of ML methods, and deep learning, which builds on multilayer ANNs, later became a subfield of ML. Deep learning has been used in many fields, such as image recognition at Facebook, speech recognition in Apple's Siri, and natural language processing in Google Translate. Yamini Pandey used deep learning with the H2O framework to learn complex patterns in the dataset. H2O is an open-source platform for predictive data analytics on big data, and predictive analytics is based on supervised learning. The author used H2O's multi-layer feed-forward neural network to find credit card fraud patterns. The H2O deep learning model showed low mean squared error, root mean squared error, mean absolute error, and root mean squared log error, and its accuracy was correspondingly high [8]. Another concern is the judgment made before a credit card is issued. Ayahiko Niimi uses deep learning to judge whether a credit card should be issued to a user who satisfies particular criteria. Transaction judgment refers to checking the validity of a transaction's attributes before making a decision. To verify the approach, the author runs a benchmark experiment and confirms that deep learning achieves accuracy similar to that of the Gaussian kernel SVM. For the comparison, the author uses five typical algorithms and varies deep learning parameters, such as the activation function and the dropout parameter, over five settings [9]. The principle of deep learning is an ANN with many hidden layers; conversely, non-deep feed-forward neural networks have only a single hidden layer.
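The error measures reported for the H2O model have standard definitions, sketched below in NumPy. This is not H2O's own code, and the toy targets and predictions are hypothetical.

```python
import numpy as np

def regression_errors(y_true, y_pred):
    """MSE, RMSE, MAE and RMSLE between targets and model outputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = np.mean(err ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(err)),
        # RMSLE compares log(1 + value); it needs values greater than -1
        "RMSLE": np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)),
    }

m = regression_errors([0, 1, 1, 0], [0.1, 0.8, 0.6, 0.3])
print(m["MSE"], m["MAE"])  # errors 0.1, -0.2, -0.4, 0.3 give MSE 0.075 and MAE 0.25
```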
Fig. 1 shows a non-deep feed-forward network, and Fig. 2 shows a deep network with multiple hidden layers.
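The structural difference between the two kinds of network can be sketched as a feed-forward pass through a list of layers. The layer sizes below are hypothetical (30 inputs, a single tanh output) and are not taken from the figures.

```python
import numpy as np

def init_network(layer_sizes, rng):
    """One (W, b) pair per pair of consecutive layers, e.g. [30, 16, 1]."""
    return [(rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x, network):
    """Feed-forward pass: information flows from input to output without loops."""
    a = x
    for W, b in network:
        a = np.tanh(W @ a + b)
    return a

rng = np.random.default_rng(0)
shallow = init_network([30, 16, 1], rng)         # one hidden layer (non-deep)
deep = init_network([30, 16, 8, 8, 16, 1], rng)  # several hidden layers (deep)
x = rng.normal(size=30)                          # one hypothetical transaction
print(len(shallow) - 1, len(deep) - 1)           # → 1 4 (number of hidden layers)
print(forward(x, shallow).shape, forward(x, deep).shape)  # → (1,) (1,)
```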
We have now introduced ANN, ML, and deep learning (DL). To place these ideas, we can compare artificial intelligence, ML, and DL with the human body. Artificial intelligence is like the body as a whole, which has the traits of intelligence, reasoning, communication, emotion, and feeling. ML is like one system that acts within the body, such as the visual system. Finally, deep learning is comparable to the visual signaling mechanism, which consists of many cells, such as the retina, which acts as a receptor and translates light signals into nerve signals.
Deep learning is a generic term for multilayer neural networks. Many deep learning algorithms can be implemented, such as the AE and the deep convolutional network. One difficulty in selecting an algorithm is that the developer must understand both the real problem and what each deep learning algorithm does. Three deep learning algorithms that perform unsupervised learning are the RBM, the AE, and the sparse coding model. Unsupervised learning automatically extracts meaningful features from the data, leverages the availability of unlabeled data, and adds data-dependent regularization to training.
In this study, we use the AE for credit card fraud detection. An AE is trained to make its output equal to its input; its hidden layer can have more or fewer units than the input layer, as depicted in Fig. 3. To implement the AE, we use the hyperbolic tangent ("tanh") function in both the encoder and the decoder: the encoder computes $h = \tanh(Wx + b)$ and the decoder computes $\hat{x} = \tanh(W'h + b')$, where $x$ is the input, $h$ the hidden representation, $\hat{x}$ the reconstruction, and $W, b$ and $W', b'$ the encoder and decoder weights and biases. As with any neural network, once the AE model is built, we train it by backpropagating the reconstruction error. Backpropagation computes an "error signal" at the output units, defined as the difference between the actual and desired output values, and propagates it backwards through the network. For the AE, the parameter gradients obtained this way are used to update the weights.
The second algorithm is the RBM, which has two layers: the visible (input) layer and the hidden layer. Each visible node takes an input feature from the dataset to be learned. The design differs from other deep learning networks because there is no output layer: the output of the RBM is the reconstruction of the input, as shown in Fig. 4. The point of the RBM is that it learns by itself to reconstruct the data; this is unsupervised learning.
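The tanh autoencoder and its backpropagation step can be sketched in NumPy. This is a minimal sketch, not the authors' Keras or H2O model: the single hidden layer of 3 units, the learning rate, the number of steps, and the synthetic "normal transactions" are all assumptions. The encoder is h = tanh(W1 x + b1) and the decoder x_hat = tanh(W2 h + b2).

```python
import numpy as np

class TanhAutoencoder:
    """One hidden layer: h = tanh(W1 x + b1) encodes, x_hat = tanh(W2 h + b2) decodes."""

    def __init__(self, n_in, n_hidden, rng):
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=(n_in, n_hidden))
        self.b2 = np.zeros(n_in)

    def forward(self, X):
        H = np.tanh(X @ self.W1.T + self.b1)      # encoder
        X_hat = np.tanh(H @ self.W2.T + self.b2)  # decoder
        return H, X_hat

    def reconstruction_error(self, X):
        """Mean squared error between each transaction and its reconstruction."""
        _, X_hat = self.forward(X)
        return ((X_hat - X) ** 2).mean(axis=1)

    def train_step(self, X, lr=0.5):
        """One full-batch backpropagation step on the mean squared reconstruction error."""
        H, X_hat = self.forward(X)
        # Error signal at the output units (the target is the input itself),
        # passed through the tanh derivative 1 - tanh^2
        d_out = 2.0 * (X_hat - X) / X.size * (1.0 - X_hat ** 2)
        # Propagate the error signal backwards to the hidden units
        d_hid = (d_out @ self.W2) * (1.0 - H ** 2)
        # Gradient-descent updates for the decoder and encoder parameters
        self.W2 -= lr * d_out.T @ H
        self.b2 -= lr * d_out.sum(axis=0)
        self.W1 -= lr * d_hid.T @ X
        self.b1 -= lr * d_hid.sum(axis=0)

# Hypothetical normal transactions: 6 features driven by 2 hidden factors, in (-0.5, 0.5)
rng = np.random.default_rng(0)
X_normal = 0.5 * np.tanh(rng.normal(size=(300, 2)) @ rng.normal(size=(2, 6)))

ae = TanhAutoencoder(n_in=6, n_hidden=3, rng=rng)
before = ae.reconstruction_error(X_normal).mean()
for _ in range(3000):
    ae.train_step(X_normal)
after = ae.reconstruction_error(X_normal).mean()
print(before, after)  # the error on normal data drops during training
```

Because the model only ever sees normal data, a transaction that it reconstructs much worse than typical normal transactions is a candidate anomaly.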
Fig. 5 shows our design of a credit card fraud detection system that uses the AE and the RBM. First, the consumer orders a product over the internet and pays by credit card. The issuing bank then sends the transaction to the acquiring bank, including the amount of money, the date and time of payment, the location of internet usage, and more. The credit card fraud detection system then validates the behavior of the credit card: it requests the consumer's profile from the database and feeds that behavior into the AE and the RBM. For the AE, the acquiring bank supplies the input, namely the amount of money, date and time, location of internet use, and other information. The AE is first trained on past behavior and then uses each newly arriving transaction as a validation test. The AE is not trained on labeled transactions, because it is an unsupervised method. The RBM takes all transactions passed on from the acquiring bank as visible input; these propagate to the hidden nodes, and after the activation function is computed, the RBM reconstructs the input by passing the hidden activations back to the visible layer. In conclusion, as Fig. 5 shows, if a transaction is judged fraudulent, the system records it as fraud in the database and rejects it. The acquiring bank then sends an SMS alert to the genuine consumer stating that the transaction has not been processed because the system suspects fraud.
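The RBM path in this design (visible input to hidden nodes, activation, then reconstruction back to the visible layer) can be sketched as a Bernoulli RBM trained with one-step contrastive divergence (CD-1). CD-1, the layer sizes, the learning rate, and the two binary "normal profiles" are assumptions made for illustration; the features are assumed to be binary or scaled to [0, 1].

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BernoulliRBM:
    """Visible layer plus hidden layer; the 'output' is a reconstruction of the input."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
        self.a = np.zeros(n_visible)  # visible biases
        self.b = np.zeros(n_hidden)   # hidden biases

    def hidden_probs(self, V):
        return sigmoid(V @ self.W + self.b)

    def visible_probs(self, H):
        return sigmoid(H @ self.W.T + self.a)

    def reconstruct(self, V):
        """Visible -> hidden -> visible, using activation probabilities."""
        return self.visible_probs(self.hidden_probs(V))

    def cd1_step(self, V0, lr=0.1):
        """One contrastive-divergence (CD-1) update on a batch of transactions."""
        ph0 = self.hidden_probs(V0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)  # sample hidden states
        v1 = self.visible_probs(h0)                             # reconstruction
        ph1 = self.hidden_probs(v1)
        n = V0.shape[0]
        self.W += lr * (V0.T @ ph0 - v1.T @ ph1) / n
        self.a += lr * (V0 - v1).mean(axis=0)
        self.b += lr * (ph0 - ph1).mean(axis=0)

    def score(self, V):
        """Mean squared reconstruction error per transaction; high means anomalous."""
        return ((V - self.reconstruct(V)) ** 2).mean(axis=1)

# Hypothetical normal behavior: every transaction follows one of two binary profiles
rng = np.random.default_rng(0)
profile_a = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
profile_b = np.array([1, 1, 0, 0, 0, 0, 1, 1], dtype=float)
pick = rng.integers(0, 2, size=200)
X_normal = np.where(pick[:, None] == 0, profile_a, profile_b)

rbm = BernoulliRBM(n_visible=8, n_hidden=4, rng=rng)
before = rbm.score(X_normal).mean()
for _ in range(1000):
    rbm.cd1_step(X_normal, lr=0.1)
after = rbm.score(X_normal).mean()

suspicious = np.array([[0, 0, 0, 0, 1, 1, 0, 0]], dtype=float)
print(after < before, rbm.score(suspicious)[0] > after)
```

A transaction whose score exceeds a cut-off learned from normal data would then be recorded as fraud and rejected, as in the flow above.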

IV. COMPARATIVE FRAUD DETECTION TECHNIQUES
Before focusing on the AE and RBM, Table I compares them with other techniques to show that deep learning is suitable for finding anomalous patterns against normal transactions.

V. PROPOSED METHOD
In this paper, we use Keras [15], a high-level neural network API written in Python. We also implement the AE with the H2O [16] package, which we use to compute the MSE, RMSE, and variable importance of each attribute in the datasets. In parallel, we use Keras to obtain the AUC and the confusion matrix. We wrote the code for both frameworks in Python in JupyterLab.
Before developing the AE with the Keras API and with H2O, we had to cleanse the datasets. The German and Australian datasets encode each attribute as categories; the details of these attributes are given in [3], [4]. Cleansing proceeded in two steps: 1) we obtained the classification of each attribute, and 2) we transformed the classified attributes with PCA using XLSTAT [14]. As mentioned above, every hidden layer uses the "tanh" activation function. Keras provides many activation functions; we chose tanh because it achieved a high AUC in our experiments. We split the data 80/20 into training and test sets, training on normal transactions in order to predict fraudulent ones. Table II shows an example of the Python code in Keras; with the Keras API, we have to build the model ourselves layer by layer. In the H2O package, by contrast, we use the AE command shown in Table III. Following our methodology, we coded in Python and used the area under the curve (AUC) to measure the success of the model. A high AUC means that the unsupervised model achieves a high true positive rate relative to its false positive rate. Conversely, datasets with less data produce more false positives, because there is not much data to train on.
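The two cleansing steps can be sketched in NumPy, with indicator (one-hot) encoding standing in for the attribute classification and an SVD-based PCA standing in for XLSTAT. Both substitutions are assumptions, as are the categorical codes and amounts below, which are hypothetical values in the style of the German dataset.

```python
import numpy as np

def one_hot(column):
    """Encode one categorical attribute as 0/1 indicator columns, one per category."""
    categories = sorted(set(column))
    return np.array([[1.0 if value == c else 0.0 for c in categories] for value in column])

def pca_transform(X, n_components):
    """Project centered data onto its top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)
    return Xc @ Vt[:n_components].T, explained[:n_components]

# Hypothetical categorical codes and credit amounts
status = ["A11", "A12", "A14", "A11", "A13", "A14"]
purpose = ["A40", "A43", "A43", "A46", "A40", "A41"]
amount = np.array([1169, 5951, 2096, 7882, 4870, 9055], dtype=float)

X = np.hstack([one_hot(status), one_hot(purpose),
               ((amount - amount.mean()) / amount.std())[:, None]])
Z, explained = pca_transform(X, n_components=2)
print(Z.shape, explained.round(3))  # Z has shape (6, 2): two uncorrelated components
```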

VI. EVALUATION OF THE RESULTS
Figs. 6, 7, and 8 show the results for the German dataset. As mentioned above, the dataset was divided into training and test sets in an 80:20 ratio, using the transactions labeled as normal in the "Creditability" column to find anomalous patterns. These figures show the AUC and the confusion matrix.
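The evaluation protocol (80:20 split, training only on normal transactions, then AUC and confusion matrix on the test part) can be sketched as follows. The data are hypothetical, and a squared distance from the normal mean stands in for the AE or RBM reconstruction error; the 95th-percentile cut-off is also an assumption. The AUC is computed in its rank form, as the probability that a random fraud scores higher than a random normal transaction.

```python
import numpy as np

def split_normal_train(X, y, rng, train_frac=0.8):
    """Shuffle, split 80/20, and keep only normal (y == 0) rows for training."""
    idx = rng.permutation(len(y))
    cut = int(train_frac * len(y))
    train, test = idx[:cut], idx[cut:]
    train = train[y[train] == 0]
    return X[train], X[test], y[test]

def roc_auc(labels, scores):
    """AUC = probability a random fraud scores higher than a random normal (ties count half)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def confusion_matrix(labels, predicted):
    """Rows: actual (normal, fraud); columns: predicted (normal, fraud)."""
    cm = np.zeros((2, 2), dtype=int)
    for actual, pred in zip(labels, predicted):
        cm[actual, pred] += 1
    return cm

# Hypothetical data: 950 normal and 50 fraudulent transactions with 5 features
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(950, 5)), rng.normal(3, 1, size=(50, 5))])
y = np.array([0] * 950 + [1] * 50)
X_train, X_test, y_test = split_normal_train(X, y, rng)

# Stand-in anomaly score (squared distance from the normal mean); with the AE or
# RBM this would be the reconstruction error of each transaction
mu = X_train.mean(axis=0)
train_scores = ((X_train - mu) ** 2).mean(axis=1)
test_scores = ((X_test - mu) ** 2).mean(axis=1)
threshold = np.percentile(train_scores, 95)  # hypothetical cut-off

print("AUC:", roc_auc(y_test, test_scores))
print(confusion_matrix(y_test, (test_scores > threshold).astype(int)))
```

The AUC uses only the ranking of the scores, so it does not depend on the chosen cut-off; the confusion matrix does.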
The MSE and RMSE for the German dataset were obtained with the H2O package. We now move on to the Australian dataset: Figs. 9, 10, and 11 show its AUC and confusion matrix from Keras, together with its MSE and RMSE obtained by running the H2O package.
Next, we move on to the large European dataset with 284,807 transactions; the results are shown in Figs. 12, 13, and 14. Summarizing the three datasets: the German and Australian datasets contain less data, so their AUC for fraud detection is lower, because the models were trained on a small amount of data and validated on an even smaller test set. Conversely, when we apply the Keras-based AE model to a large dataset, the European dataset, we obtain an AUC of 0.9603. The AE is therefore better suited to large datasets.
Next, we present the RBM results on the three datasets, beginning with the German dataset in Fig. 15, where the AUC is 0.4562. Fig. 16 shows the result of the RBM on the Australian dataset, with an AUC of 0.5238.
The largest dataset, the European dataset, produced a higher AUC than the other two (the Australian and German datasets): 0.9505, as shown in Fig. 17. We now summarize the AUC scores of the AE and RBM on the three datasets.
From this research, we conclude that the AE and RBM produce higher AUC scores and accuracy on bigger datasets, because there is more data to train on. Table IV gives the AUC scores in detail. Based on two popular datasets, we conclude that supervised learning is suitable when a historical database of labeled transactions is available for credit card fraud detection. Supervised learning, such as the multilayer perceptron neural network, uses a prediction algorithm to identify whether new transactions are legal or illegal. When a credit card is used, a neural-network-based fraud detection system checks for the patterns used by fraudsters, or for attributes that have been determined to be illegal; if the pattern matches genuine transaction behavior, the transaction is considered legitimate. Conversely, unsupervised learning learns what normal transactions look like, finds anomalous patterns, and then reports each transaction to the system in real time as fraudulent or legal.

VII. CONCLUSION AND FUTURE WORK
In today's global computing environment, online payments are important, and because they need only the credential information of the credit card to complete a purchase and deduct money, they are vulnerable to fraud. For this reason, it is important to find the best solution to detect as many frauds as possible in online systems. The AE and RBM are two types of deep learning that use normal transactions to detect fraud in real time. In this study, we focused on building the AE with Keras and H2O, and the RBM. To verify our proposed methods, we ran benchmark experiments with other tools, which confirm that the AE and RBM can accurately detect credit card fraud on a large dataset such as the European dataset. Nevertheless, it would be better to run these experiments on real credit card fraud transactions with a huge amount of data. We expect that, with such data, the AE and RBM can achieve an even higher AUC for the receiver operating characteristic than that observed on the European dataset.