Prevention and Detection of Financial Statement Fraud – An Implementation of Data Mining Framework

News of financial statement fraud appears every day and adversely affects economies worldwide. Given the losses that fraud causes, effective measures and methods should be employed for the prevention and detection of financial statement fraud. Data mining methods can assist auditors in both tasks, because they can use past cases of fraud to build models that identify and detect the risk of fraud, and they can support new techniques for preventing fraudulent financial reporting. In this study we implement a data mining methodology for preventing fraudulent financial reporting in the first place and for detecting fraud once it has been perpetrated. The association rules generated in this study should be of value to both researchers and practitioners in preventing fraudulent financial reporting. The decision rules produced in this research complement the prevention mechanism by detecting financial statement fraud.


I. INTRODUCTION
Financial statement fraud is a deliberate misstatement of material facts by management in the books of accounts of a company, with the aim of deceiving investors and creditors. This illegitimate act has a severe impact on economies throughout the world because it significantly dampens the confidence of investors.
The magnitude of the problem can be judged from the fact that a number of Chinese companies listed on US stock exchanges have faced accusations of accounting fraud, and in June 2011 the U.S. Securities and Exchange Commission warned investors against investing in Chinese firms listing via reverse mergers. While over 20 US-listed Chinese companies were de-listed or halted in 2011, a number of others have been hit by the resignation of their auditors [1].
The Association of Certified Fraud Examiners (ACFE), in its Report to the Nation on Occupational Fraud and Abuse (2012) [2], suggests that the typical organization loses 5% of its revenue to fraud each year. The median loss caused by occupational fraud cases was $140,000.
The ACFE study also reveals that perpetrators with higher levels of authority tend to cause much larger losses. The median loss among frauds committed by owners / executives was $573,000, the median loss caused by managers was $180,000 and the median loss caused by employees was $60,000. The report also examined the common methods of detecting fraud and found that tips and complaints were the most effective means of detection, accounting for more than 43% of cases.
Prevention and detection of financial statement fraud has become a major concern for almost all organisations globally. Although prevention is the best way to reduce financial statement fraud, detection of fraudulent financial reporting is critical when the prevention mechanism fails.
The aim of this paper is to provide a methodology for the prevention and detection of financial statement fraud and to present the empirical results obtained by implementing the framework. In this research, we test the applicability of a data mining framework for the prevention and detection of financial statement fraud. As recommended by the framework, we apply descriptive data mining for prevention and predictive data mining techniques for detection. This paper is organized as follows. Section 2 summarizes prior contributions in the field of prevention and detection of financial statement fraud. Section 3 implements the data mining framework for prevention of fraud and for detection of fraud if prevention has failed, followed by the conclusion (Section 4).

II. RELATED WORK
The cost of financial statement fraud is very high, both financially and in terms of the goodwill of the organization and of the country concerned. To curb the chances of fraud and to detect fraudulent financial reporting, a number of researchers have used various techniques from the fields of statistics, artificial intelligence and data mining.
For instance, Spathis et al. [3] compared multi-criteria decision aids with statistical techniques such as logit and discriminant analysis in detecting fraudulent financial statements. Neural network based support systems were proposed by Koskivaara [4] in 2004. The study demonstrated neural networks (NN) as a possible tool for use in auditing and found that the main application areas of NN were the detection of material errors and management fraud. A decision tree was constructed by Koh and Low [5] in order to predict hidden problems in financial statements by examining the following six variables: quick assets to current liabilities, market value of equity to total assets, total liabilities to total assets, interest payments to earnings before interest and tax, net income to total assets, and retained earnings to total assets. Kirkos et al. [6] carried out an in-depth analysis of publicly available data of 76 Greek manufacturing firms for detecting fraudulent financial statements (FFS) by using three data mining classification methods, namely Decision Trees, Neural Networks and Bayesian Belief Networks. They investigated the usefulness of these techniques in the identification of FFS.
In 2007, a genetic algorithm approach to detecting financial statement fraud was presented by Hoogs et al. [7]. An innovative fraud detection mechanism was developed by Huang et al. [8] on the basis of Zipf's Law. This technique reduces the burden on auditors of reviewing overwhelming volumes of data and assists them in identifying potentially fraudulent records. A novel financial kernel using support vector machines for detection of management fraud was developed by Cecchini et al. [9].
In 2008, Belinna et al. [10] examined the effectiveness of CART in the identification and detection of financial statement fraud and found CART to be a very effective technique for distinguishing fraudulent financial statements from non-fraudulent ones. Juszczak et al. [11] applied many different classification techniques in a supervised two-class setting and a semi-supervised one-class setting in order to compare the performance of these techniques and settings.
Further, Zhou & Kapoor [12] in 2011 applied four data mining techniques, namely regression, decision trees, neural networks and Bayesian networks, in order to examine the effectiveness and limitations of these techniques in the detection of financial statement fraud. They explored a self-adaptive framework based on a response surface model with domain knowledge to detect financial statement fraud.
Ravisankar et al. [13] applied six data mining techniques, namely Multilayer Feed Forward Neural Network (MLFF), Support Vector Machines (SVM), Genetic Programming (GP), Group Method of Data Handling (GMDH), Logistic Regression (LR) and Probabilistic Neural Network (PNN), to identify companies that resort to financial statement fraud on a data set obtained from 202 Chinese companies. They found PNN to be the best technique without feature selection. With feature selection, GP and PNN outperformed the others with nearly equal accuracies.
More recently, Perols [14] compared the performance of six popular statistical and machine learning models in detecting financial statement fraud. The results show, somewhat surprisingly, that logistic regression and support vector machines perform well relative to an artificial neural network in the detection and identification of financial statement fraud.
The review of the existing literature reveals that the research conducted to date is almost entirely in the field of detection and identification of financial statement fraud, and that little or no work has been done on the prevention of fraudulent financial reporting.
Therefore, in the present research we implement a data mining framework for prevention along with detection of financial statement fraud.
The major objective of this research is to test the applicability of predictive and descriptive data mining techniques for the detection and prevention of fraud, respectively, by implementing a data mining framework. To obtain early indications of fraud we implement association rule mining, and to detect fraudulent financial reporting we apply three classification techniques, namely decision trees, the naïve Bayesian classifier and genetic programming.

III. THE METHODOLOGY: APPLICABILITY & ITS IMPLEMENTATION
The methodology applied in this paper is the data mining framework of Gupta & Gill (2012) [15]. The framework is presented in Fig 1.
The first step of the framework is feature selection. We selected 62 financial ratios / variables as features to be used as the input vector in further analysis.
These features represent behavioural characteristics along with measures of liquidity, safety, profitability and efficiency of the organisations under consideration. Table 1 presents the list of 62 features. All incidents of violation of the Foreign Corrupt Practices Act (FCPA) have been removed from the sample, because the FCPA prohibits the practice of bribing foreign officials and most of the AAERs issued because of the FCPA do not indicate which financial statement, viz. the balance sheet or the income statement, is affected.
We identified 29 organisations charged with issuing fraudulent financial statements; these are termed fraudulent in this study. The remaining 85 organisations out of the total of 114 have been marked as non-fraudulent, since no indication or proof of falsifying financial statements has been reported. However, the absence of proof does not guarantee that these firms have not falsified their financial statements or will not do so in future.
To make the dataset ready for mining, the data need to be pre-processed. During the data preprocessing step, the data are transformed into a format appropriate for mining. The dataset is cleaned further by replacing missing values with the mean of the variable. Each of the independent financial variables has been normalized by using range transformation (min = 0.0, max = 1.0).
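The two preprocessing operations described above can be sketched as follows. This is a minimal illustration in plain Python, not the tooling used in the study; the function names and the sample ratio values are hypothetical.

```python
from typing import List, Optional


def impute_mean(column: List[Optional[float]]) -> List[float]:
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def range_transform(column: List[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Rescale values linearly so the minimum maps to lo and the maximum to hi."""
    cmin, cmax = min(column), max(column)
    if cmax == cmin:
        return [lo for _ in column]  # constant column: map every value to lo
    scale = (hi - lo) / (cmax - cmin)
    return [lo + (v - cmin) * scale for v in column]


# Hypothetical values of one financial ratio for five companies.
ratio = [0.2, None, 0.6, 1.0, 0.2]
cleaned = impute_mean(ratio)       # the missing entry becomes the mean of the rest
scaled = range_transform(cleaned)  # values now lie in [0.0, 1.0]
```

Imputation is applied before scaling so that the imputed value is rescaled together with the observed ones.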
We compiled all 62 input variables given in Table 1. To reduce the dimensionality of the dataset we applied one-way ANOVA. Variables with a p-value <= 0.05 are considered significant and informative, and those with a higher p-value are deemed non-informative. The informative variables are tested further using descriptive data mining methods. The input variables considered significant are given in Table 2 along with their F-values and p-values.
The data preprocessing step is followed by the selection of an appropriate data mining technique. The framework suggests the use of descriptive data mining techniques for prevention and predictive methods for detection of financial statement fraud. Therefore, we first apply association rule mining to prevent fraudulent financial reporting in the first place.
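As an illustration of the ANOVA screening step, the sketch below computes the one-way ANOVA F-value of a single variable across the fraud and non-fraud groups. The function name and the sample values are hypothetical. The p-value is not computed here; it would be obtained from the F distribution with (k - 1, n - k) degrees of freedom using a statistics package.

```python
from typing import Dict, List


def one_way_anova_f(groups: Dict[str, List[float]]) -> float:
    """Return the one-way ANOVA F statistic for one variable split by class."""
    all_values = [v for values in groups.values() for v in values]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n

    ss_between = 0.0  # variation of the group means around the grand mean
    ss_within = 0.0   # variation of the observations around their group mean
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    return ms_between / ms_within


# Hypothetical values of one ratio for fraud and non-fraud companies.
f_value = one_way_anova_f({"fraud": [1.0, 2.0, 3.0], "non_fraud": [4.0, 5.0, 6.0]})
```

A large F-value means the group means differ by more than the within-group spread would suggest, which is what makes a variable informative for separating the two classes.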
We implement association rules by using RapidMiner version 5.2.3. All the informative variables have been converted into nominal variables. The nominal variables are then converted into binomial variables, because this is a prerequisite for the rule engine. In the next stage of the framework, the rule engine generates the required association rules.
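The conversion from numeric to nominal to binomial variables can be sketched as below. The paper does not state how the nominal categories were formed, so the equal-width binning into three labels is purely an assumption made for illustration, as are the variable names and values.

```python
from typing import Dict, Tuple

LABELS: Tuple[str, ...] = ("low", "medium", "high")  # assumed category names


def to_nominal(value: float, labels: Tuple[str, ...] = LABELS) -> str:
    """Map a value normalized to [0, 1] onto equal-width bins (assumed scheme)."""
    index = min(int(value * len(labels)), len(labels) - 1)
    return labels[max(index, 0)]


def to_binomial(record: Dict[str, float]) -> Dict[str, bool]:
    """Expand each nominal variable into one true/false item per label."""
    items: Dict[str, bool] = {}
    for name, value in record.items():
        chosen = to_nominal(value)
        for label in LABELS:
            items[f"{name}={label}"] = (label == chosen)
    return items


# Hypothetical normalized ratios for one company.
company = {"debt_to_equity": 0.9, "net_profit_to_total_assets": 0.1}
items = to_binomial(company)
```

Each company then becomes a set of true items, which is the transaction format that frequent itemset mining operates on.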
In the process of rule generation, frequent itemsets are generated using FP-Growth. The minimum support for FP-Growth has been set to 0.95. The frequent itemsets generated are used for creating the association rules. The minimum confidence for generating rules is 0.8. Table 3 lists the association rules generated by the rule engine. The rule monitor module then monitors the financial ratios of each organisation and compares the values of the ratios with the values given in the association rules in order to indicate anomalies. The anomalies detected by the rule monitor are reported in Table 3 as the number of non-fraud companies identified as fraud. The results generated by the rule monitor are thus able to raise an alarm regarding fraud.
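The rule measures used in this step can be made concrete with a small sketch. It scores one given rule by brute-force counting; it does not implement FP-Growth, which is an efficient way of enumerating the frequent itemsets that such rules are built from. The item names and transactions are hypothetical, and the default thresholds are the ones quoted above.

```python
import math
from typing import Dict, FrozenSet, List


def support(transactions: List[FrozenSet[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_metrics(transactions: List[FrozenSet[str]],
                 antecedent: FrozenSet[str],
                 consequent: FrozenSet[str],
                 min_support: float = 0.95,
                 min_confidence: float = 0.8) -> Dict[str, float]:
    """Support, confidence, lift and conviction of antecedent -> consequent."""
    supp_a = support(transactions, antecedent)
    supp_c = support(transactions, consequent)
    supp_rule = support(transactions, antecedent | consequent)
    if supp_a == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    confidence = supp_rule / supp_a
    lift = confidence / supp_c
    conviction = math.inf if confidence == 1.0 else (1.0 - supp_c) / (1.0 - confidence)
    return {
        "support": supp_rule,
        "confidence": confidence,
        "lift": lift,
        "conviction": conviction,
        "accepted": float(supp_rule >= min_support and confidence >= min_confidence),
    }


# Hypothetical binomial items for five companies.
data = [frozenset(t) for t in (["A", "B"], ["A", "B"], ["A"], ["B"], ["A", "B"])]
metrics = rule_metrics(data, frozenset(["A"]), frozenset(["B"]))
```

In this toy data the rule A -> B has support 0.6 and confidence 0.75, so it would be rejected under both thresholds.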
In view of the alarm raised by the rule monitor, organisations should consider the presence or absence of conditions that point to financial pressure on management. Such organisations should provide employees with a working environment that values honesty, because irresponsible and ineffective corporate governance could increase the chances of financial statement fraud. The absence of effective corporate governance may give managers / employees enough opportunity to opt for fraudulent financial reporting. Hence, this unlawful practice could be prevented by checking or removing the opportunity to commit fraud and by avoiding the combination of opportunity, pressure and motive in an organisation.
Once the prevention mechanism has failed to prevent fraud, the framework suggests the use of predictive data mining for detection and identification of financial statement fraud. In this study three data mining techniques, namely CART, the Naive Bayesian Classifier and Genetic Programming, have been used for detecting fraudulent financial statements and for differentiating between fraud and non-fraud reporting. To improve the reliability of the results, tenfold cross validation has been implemented.
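Tenfold cross validation can be sketched as follows for a dataset of 114 records. The shuffling seed and the use of plain (non-stratified) folds are assumptions; the paper does not state how the folds were formed.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each record is tested exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)         # assumed: random fold assignment
    folds = [indices[i::k] for i in range(k)]    # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


splits = list(k_fold_indices(114, k=10))
# A classifier would be trained on each `train` list and scored on the
# matching `test` list, and the ten scores would then be averaged.
```

With only 29 fraud cases, a stratified variant that keeps the class ratio constant in every fold may be preferable.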
A decision tree (CART) has been constructed in this study by using the SIPINA Research edition software (32-bit version). The complete dataset has been used as training data for constructing the tree given in Figure 2. The confidence level was set to 0.05. CART correctly classifies 95% of cases. The method correctly classifies 98% of non-fraud cases and misclassifies only 4 fraud cases, so 86% of fraud cases are classified correctly.
The financial ratio deposits and cash to current assets has been used as the first splitter by the decision tree constructed in this research. This ratio is an indicator of the capability of a company to convert its non-liquid assets into cash. At the second level of the tree, retained earnings / total assets (t2) and net profit / total assets have been used as splitters. The ratios used by the tree are given in Table 4. We applied the second classification method, the Naïve Bayesian Classifier, by using the SIPINA Research edition software (32-bit version). The method correctly classifies 88% of cases.
The third classification method, genetic programming, has been implemented using the data mining tool Discipulus version 5.1. The process begins with the division of the dataset into two datasets, namely training data and validation data. The training dataset is used to train the model and the validation dataset is used exclusively for validation. In this study, 80% of the whole dataset is designated as training data, whereas the remaining 20% is assigned exclusively to validation. Since our dependent variable (target output) is binary, we select "hits then fitness" as the fitness function. Every single run of Discipulus has been set to terminate after 50 generations with no improvement in fitness.
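The termination criterion described above (stop after 50 generations without improvement) can be sketched generically. This is not Discipulus code; the function name is hypothetical, and the sketch assumes that fitness is a number for which higher is better.

```python
from typing import Iterable, Optional, Tuple


def run_until_stagnation(fitness_per_generation: Iterable[float],
                         patience: int = 50) -> Tuple[int, Optional[float]]:
    """Consume per-generation best fitness values until `patience` generations
    in a row bring no improvement. Returns (last generation index, best fitness)."""
    best: Optional[float] = None
    stagnant = 0
    last_generation = -1
    for generation, fitness in enumerate(fitness_per_generation):
        last_generation = generation
        if best is None or fitness > best:   # assumed: higher fitness is better
            best = fitness
            stagnant = 0
        else:
            stagnant += 1
            if stagnant >= patience:
                break
    return last_generation, best


# Hypothetical run: fitness improves for three generations, then stalls.
history = [1.0, 2.0, 3.0] + [3.0] * 60
stopped_at, best_fitness = run_until_stagnation(history, patience=50)
```

Here the run stops at generation index 52, which is the fiftieth generation in a row that fails to beat the fitness reached at index 2.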
Performance evaluation, the final step of the framework, measures the performance and judges the efficacy of the data mining methods. The performance of the association rules generated in this study has been measured with support, confidence, lift and conviction (Table 3). The rules generated by the rule engine have support of more than 40% and confidence of more than 80%. The performance matrix indicating the sensitivity (the share of fraud cases correctly identified) and specificity (the share of non-fraud cases correctly identified) of the three methods used in this study is given in Table 8. The decision tree (CART) correctly classifies 25 of the 29 fraud cases and therefore produces the best sensitivity. The decision rules are generated from the decision tree (Figure 2). Since the decision tree correctly identifies more than 86% of fraud cases and genetic programming correctly identifies almost all non-fraud cases in the dataset, we conclude that the data mining techniques used in this study are capable of identifying and detecting financial statement fraud when the prevention mechanism fails.
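The evaluation measures can be illustrated with the CART figures quoted in the text. The fraud counts (25 detected, 4 missed) come from the paper. The non-fraud split of 83 correct and 2 incorrect is an assumption, chosen only because it is consistent with the reported figures of roughly 98% and 95%; the paper's own confusion matrix is not reproduced here.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of fraud cases that are classified as fraud."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of non-fraud cases that are classified as non-fraud."""
    return true_neg / (true_neg + false_pos)


def accuracy(true_pos: int, false_neg: int, true_neg: int, false_pos: int) -> float:
    """Share of all cases that are classified correctly."""
    total = true_pos + false_neg + true_neg + false_pos
    return (true_pos + true_neg) / total


# Fraud counts reported in the text for CART.
tp, fn = 25, 4
# Assumed non-fraud counts (not stated in the text; see the note above).
tn, fp = 83, 2

cart_sensitivity = sensitivity(tp, fn)   # about 0.862
cart_specificity = specificity(tn, fp)   # about 0.976 under the assumed counts
cart_accuracy = accuracy(tp, fn, tn, fp) # about 0.947 under the assumed counts
```

Because fraud cases are the minority class (29 of 114), accuracy alone can look high even when many frauds are missed, which is why sensitivity is reported separately.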

IV. CONCLUSION
Prevention along with detection of financial statement fraud would be of great value to organizations throughout the world. Considering the need for such a mechanism, we employ a data mining framework for prevention and detection of financial statement fraud in this study. The framework used in this research follows the conventional flow of data mining.
We identified and collected 62 features from the financial statements of 114 organizations. We then found 35 informative variables by using one-way ANOVA. These informative variables are used for implementing association rule mining for prevention and three predictive mining techniques, namely Decision Tree, Naïve Bayesian Classifier and Genetic Programming, for detection of financial statement fraud. The rule engine module of the framework generated 7 association rules. These rules are used by the rule monitor module for raising an alarm regarding fraud and hence preventing it in the first place.
The three data mining methods used for detection of financial statement fraud are compared on the basis of two important evaluation criteria, namely sensitivity and specificity. The decision tree produces the best sensitivity and genetic programming the best specificity compared with the other two methods. These techniques detect fraud when the prevention mechanism fails. Hence, the framework used in this research is able to prevent fraudulent financial reporting and to detect it if the management of an organization succeeds in perpetrating financial statement fraud despite the presence of an anti-fraud environment.

Figure 1: A data mining framework for prevention and detection of financial statement fraud.
Sensitivity and specificity have been used as metrics for the performance evaluation of the classification techniques used in this research. The confusion matrix for the decision tree, the Naïve Bayesian classifier and genetic programming is given below.

Table 1: Features For Prevention & Detection Of Financial Statement Fraud

During the second step, data collection, all the financial ratios of Table 1 have been collected from the financial statements, namely the balance sheet, income statement and cash flow statement, of 114 companies listed on different stock exchanges globally. The dataset used in this study has been collected from www.wikinvest.com. The companies accused of fraudulent financial reporting have been identified by analysing the Accounting and Auditing Enforcement Releases (AAERs) published by the S.E.C. (U.S. Securities and Exchange Commission) for the period of five years starting from 2007.

TABLE 2: LIST OF INFORMATIVE VARIABLES

TABLE 3: ASSOCIATION RULE