Fuzzy C-means Based Inference Mechanism for Association Rule Mining: a Clinical Data Mining Approach

Abstract: Association rule mining (ARM) is a widely researched area of data mining, and many ARM approaches are well investigated in the literature. The major issue with ARM is that a huge number of frequent patterns does not directly yield factual knowledge. To find factual knowledge and discover inference, we propose a novel approach, AFIRM, which follows a two-step procedure: first, frequent patterns are discovered by applying an ARM algorithm; second, inference is discovered by adopting fuzzy c-means clustering. For performance analysis, we apply this approach to a clinical dataset (containing symptom information of patients) and obtain, as hidden knowledge or inference, the most prevalent disease over a couple of months or in a session.


I. INTRODUCTION
Association rule mining (ARM) is a well-researched data mining technique [10,15]. Originating in market basket analysis, it finds frequent relations or associations between items, using a rule-based knowledge representation to capture relationships or causal dependencies between objects, things, attributes, outcomes, symptoms, occurrences, etc. It was first introduced in 1993 as the AIS algorithm [2]. In 1994, R. Agrawal and R. Srikant provided a candidate-generation-based technique, known as the Apriori algorithm [1], to generate rules; it is a streamlined version of the AIS algorithm and performs best when the support count is high and the number of items is small. The second approach for ARM is frequent pattern growth, known as the FP-Growth algorithm [11], proposed by J. Han, J. Pei and Y. in 2000.
Today, however, association rule mining is recognized as a strong approach not only for market basket analysis but also for various data analysis fields such as stock market analysis, DNA pattern recognition, web data mining and clinical data mining. Its limitation is that it produces a huge number of frequent patterns for the predefined thresholds, and these patterns alone are insufficient to draw a conclusion.
The aim of this research work is to design and develop an inference mechanism for association rule mining in order to discover abstract knowledge from the huge number of frequent patterns. To this end, we propose a new algorithm, AFIRM. The major objectives of this research can be summarized in the following points.
• To investigate the literature and identify the problem.
• To choose a suitable association rule mining approach for finding frequent patterns.
• To design an inference mechanism framework for association rule mining.
• To develop an inference mechanism for association rule mining in order to discover abstract/inference knowledge from frequent patterns.
• To analyze the results with respect to the abstract inference knowledge obtained.
In this paper, we propose a fuzzy c-means (FCM) based inference system to discover inference knowledge. The rest of the paper is organized as follows: Section 2 describes the fuzzy inference mechanism; Section 3 compares hard and soft clustering approaches; Section 4 reviews the literature, covering basic terminology and related work in the area; Section 5 presents a comparative study of the Apriori and FP-Growth algorithms in order to choose the most suitable ARM approach; Section 6 proposes the new approach with an illustrative example; Section 7 presents the simulation and results; and Section 8 gives concluding remarks.

II. INFERENCE MECHANISM
An inference engine developed for an expert system consists of an inference mechanism as well as a control strategy. The term inference refers to the process of searching through the knowledge base and deriving new knowledge [18].
It involves formal reasoning by matching and unification, similar to that performed by human experts to solve problems in a specific area of knowledge. An inference rule may be defined as a statement that has two parts: an if clause and a then clause. It can be understood more clearly through the example given in Rule 1 and Rule 2.

www.ijacsa.thesai.org

In contrast to hard clustering, the fuzzy c-means clustering, or soft clustering, algorithm provides an extended capacity of data categorization [7]: data elements can fit into more than one cluster, and a set of membership levels is associated with each object, as shown in Figures 3 and 4. These membership levels specify the strength of the association between the data object and a particular cluster. Fuzzy clustering is the process of assigning such membership levels and then using them to assign data objects to one or more clusters. It is a strong approach from the data analysis perspective because it organizes data into multiple predefined clusters or groups based on their similarity to the classes to which they belong. Its additional flexibility is that a common itemset may belong to multiple clusters according to its degree of membership or similarity. Similarity here is computed from the distance between objects using a well-defined distance measure.
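The difference between a hard assignment and fuzzy membership levels can be illustrated with a minimal sketch. It assumes one-dimensional data, absolute distance as the distance measure, and the standard FCM membership formula with fuzzifier m = 2; none of these choices is fixed by the text, and the centers and data object are hypothetical values.

```python
def hard_assignment(point, centers):
    """Hard clustering: return the index of the single nearest center."""
    distances = [abs(point - c) for c in centers]
    return distances.index(min(distances))


def fuzzy_memberships(point, centers, m=2.0):
    """Soft clustering: one membership level per center, summing to 1.

    Assumes the point does not coincide with any center (distance > 0).
    """
    distances = [abs(point - c) for c in centers]
    inverse = [d ** (-2.0 / (m - 1.0)) for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


centers = [0.0, 1.0]  # hypothetical 1-D cluster centers
point = 0.25          # hypothetical data object

label = hard_assignment(point, centers)     # exactly one cluster
levels = fuzzy_memberships(point, centers)  # a degree for every cluster
```

The hard assignment discards how close the object is to the other cluster, while the membership levels keep that information as a degree between 0 and 1 for each cluster.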

A. Algorithm: Fuzzy C-Means (FCM)
The fuzzy c-means clustering algorithm is a well-suited approach for implementing a fuzzy inference system; we therefore adopt the FCM algorithm to find the factual knowledge hidden in the frequent patterns generated by the ARM algorithm.

Input: D-dimensional data matrix, number of clusters C
Output: cluster centers C_q, membership matrix U = [U_pq]

Step-1: Let the input be a D-dimensional data matrix of n data points x_p, and let U = [U_pq] be the membership matrix, where U_pq is the degree of membership of the p-th data point in cluster q. Initialize U = U^(0).
Step-2: Assign the predefined number of clusters C, where 2 ≤ C ≤ n.
Step-3: At step r, calculate the center vector C_q of each cluster q as follows:

$$C_q = \frac{\sum_{p=1}^{n} (U_{pq})^{m} \, x_p}{\sum_{p=1}^{n} (U_{pq})^{m}}$$

where m > 1 is the fuzzifier (weighting exponent) of the standard FCM formulation [3].
Step-4: Calculate the degree of membership of the p-th data point in cluster q as follows:

$$U_{pq} = \frac{1}{\sum_{k=1}^{C} \left( \dfrac{\lVert x_p - C_q \rVert}{\lVert x_p - C_k \rVert} \right)^{2/(m-1)}}$$

Step-5: Update U^(r) to U^(r+1) (as in Step-4) at each step.
Step-6: If ||U^(r+1) - U^(r)|| < ε, then STOP (where ε is the pre-specified termination criterion, between 0 and 1).
Step-7: Else return to Step-3.

IV. RELATED WORK
An inference mechanism framework for association rule mining, proposed in [4], presents a theoretical and numerical study of an association-rule-based inference mechanism for discovering factual knowledge from a clinical dataset, in order to find the most prevalent disease in a particular session, by proposing the AIRM algorithm. This paper presents a modified version of AIRM, the AFIRM algorithm, which is described in Section 6.
To review the literature and related work, we examine several notable research approaches in the field, including approaches to clinical data mining and fuzzy-inference-based approaches.

A. Inference approaches for ARM
In [9], Ronald Fagin et al. present a brief overview of inference rules and briefly discuss the applicability of inference in various areas, such as inference in propositional logic, non-standard propositional logic, propositional modal logic and first-order logic. X. Wanga et al. [23] proposed a concurrent neuro-fuzzy model to discover and analyze useful knowledge from available Web log data; they used the cluster information generated by a self-organizing map for pattern analysis and a fuzzy inference system to capture the chaotic trend, providing short-term (hourly) and long-term (daily) Web traffic trend predictions. Yang et al. [24] proposed a generic rule-base inference methodology using evidential reasoning (RIMER), in which they introduced a new knowledge representation scheme in a rule base by analyzing the existing knowledge base structure using a belief structure. Similarly, R. Chow et al. [6] provide an ARM-based inference mechanism: they propose a refined and practical model of inference detection using a reference corpus. The model is inspired by association rule mining, in that inferences are based on word co-occurrences. Taking the Web as the reference corpus, the model also covers the important case of private corpora, to model inference detection in enterprise settings in which there is a large private document repository. They found inferences in private corpora by using analogues of their Web-mining algorithms, relying on an index for the corpus rather than a Web search engine. Zhiwen Yu et al. [25] presented a tree-based mining approach for discovering patterns of human interaction in meetings; to support accessing and understanding meeting content, they proposed a tree-based mining method for discovering frequent patterns of human interaction in meeting discussions. According to that study, the mining results are useful for summarization, indexing, and comparison of meeting records and for interpretation of human interaction in meetings.

B. ARM approaches for clinical data mining
Data mining is popular in medical science because of its high applicability and analytical ability for medical and clinical data. Here we review some previous approaches related to the medical field. S. Venus et al. [21] proposed a rule-based backward-chaining inference engine, an Arabic expert system based on natural language for diagnosing diseases. Similarly, Yanqing Ji et al. present a potential causal association mining algorithm [16] for screening adverse drug reactions (ADRs) in post-marketing surveillance, a novel data mining approach to signaling potential ADRs from electronic health databases. They also introduced potential causal association rules (PCARs) to represent the potential causal relationship between a drug and ICD-9 [13] coded signs or symptoms representing potential ADRs. Because ADRs are infrequent, existing frequency-based data mining methods cannot effectively discover PCARs. They therefore introduce a new interestingness measure, potential causal leverage, to quantify the degree of association of a PCAR. Similarly, [20,22] proposed practical and applied fuzzy-logic-based approaches for medical diagnosis, whereas [5] presents a novel data mining approach to generate adverse drug event (ADE) detection rules. The main objective of that paper is to automatically detect cases of ADEs by data mining. The authors used decision trees and association rules to discover ADE detection rules with respect to time constraints. The rules are then filtered, validated, and reorganized by a committee of experts. The rules are described in a rule repository, and several statistics are automatically computed in every medical department, such as the confidence, relative risk, and median delay of outcome appearance.

V. PERFORMANCE ANALYSIS OF APRIORI & FP-GROWTH ALGORITHMS
In this section, a performance analysis is carried out to compare the efficiency of the Apriori and FP-Growth algorithms and to choose the most appropriate ARM algorithm for finding frequent patterns. For this purpose, we apply both algorithms to the T10I4D100K [12] and pima D38.N768.C2 [14] datasets and measure time efficiency at different support counts, as shown in Figures 5 and 6. The analysis can be summarized in the following points.
• FP-Growth outperforms Apriori at low support counts.
• Apriori performs better at high support counts.
• FP-Growth takes more processing time when a large transaction set is given, because it requires large storage memory to store the tree structure.

VI. PROPOSED APPROACH
To overcome the limitations of the previous AIRM approach, we adopt fuzzy clustering instead of the classification used in AIRM. AIRM accepts only frequent patterns with a degree of matching of 1 (100%), whereas the newly proposed algorithm, owing to its fuzzy nature, accepts all frequent patterns, whether their degree of matching is 0, 1 or between 0 and 1. In this process, FCM takes the normalized dataset, which contains normalized degrees of matching, and produces clusters that demonstrate the nature of the frequent patterns. To implement the fuzzy inference mechanism shown in Figure 7, AFIRM follows the steps below.

A. Data selection and Preprocessing
This is the first step of the AFIRM algorithm, where data are selected for preprocessing. The step first loads and scans the main dataset as well as the fact dataset, then preprocesses the datasets by mapping each item/symptom to its corresponding index value. To do so, it first reads all items/symptoms and creates an index table by assigning a unique index value to each item/symptom.
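The index mapping described in this step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `build_index` and `encode` and the sample records are hypothetical, with symptom names borrowed from the paper's rule examples, and first-seen ordering of indices is an assumption.

```python
def build_index(transactions):
    """Assign a unique integer index to each distinct item/symptom.

    Indices are given in first-seen order, starting at 1 (an assumption;
    the paper only requires that each index be unique).
    """
    index = {}
    for transaction in transactions:
        for item in transaction:
            if item not in index:
                index[item] = len(index) + 1
    return index


def encode(transactions, index):
    """Replace every item by its index; sort each record for a canonical form."""
    return [sorted(index[item] for item in transaction)
            for transaction in transactions]


# Hypothetical patient records.
records = [
    ["headache", "sneezing", "running_nose"],
    ["fever", "cough", "running_nose"],
    ["fever", "headache"],
]

symptom_index = build_index(records)
encoded_records = encode(records, symptom_index)
```

The same index table would be applied to the fact dataset so that both datasets refer to symptoms by identical index values.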

B. Association rule mining
This step performs association rule mining to find frequent patterns. We use the FP-Growth algorithm because it avoids candidate generation and needs only two scans of the database to find frequent patterns. This process generates all the frequent patterns, or frequent symptoms, for the given thresholds.
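What this step produces can be shown with a small sketch. Note that the code below is a brute-force stand-in that simply counts itemsets, not FP-Growth itself, which builds a prefix tree instead. The function name `frequent_itemsets`, the `max_size` limit and the sample transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_sup, max_size=3):
    """Return every itemset (up to max_size items) with support count >= min_sup.

    Brute-force counting for illustration only; this is NOT FP-Growth.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_sup}


# Hypothetical index-encoded symptom records.
transactions = [[1, 2, 3], [3, 4, 5], [1, 4], [1, 2, 3, 4]]
patterns = frequent_itemsets(transactions, min_sup=2)
```

Each key of `patterns` is a frequent symptom set and each value is its support count; lowering `min_sup` enlarges the result quickly, which is the pattern explosion that motivates the inference step.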

C. Fact matching and matrix generation
This is the most significant step of the algorithm: every frequent symptom set is matched against the fact dataset (which contains symptoms with their related diseases) to find its degree of similarity. This information is then stored in another database, where the frequent symptoms, diseases and degrees of similarity come together in the form of a matrix.
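The matrix generation can be sketched as below. The text does not give the formula for the degree of similarity, so the measure used here (the share of a disease's known symptoms that occur in the frequent pattern) is purely an assumption for illustration, as are the function name and the sample data.

```python
def degree_of_matching(pattern, fact_symptoms):
    """ASSUMED measure: fraction of the disease's symptoms found in the pattern."""
    fact = set(fact_symptoms)
    return len(set(pattern) & fact) / len(fact)


# Hypothetical fact dataset: disease -> related symptoms
# (names borrowed from the paper's rule examples).
facts = {
    "cold": ["headache", "sneezing", "running_nose"],
    "measles": ["fever", "cough", "running_nose"],
}

# Hypothetical frequent symptom sets produced by the ARM step.
frequent_symptoms = [
    ("headache", "sneezing", "running_nose"),
    ("fever", "running_nose"),
    ("headache",),
]

diseases = sorted(facts)  # column order of the matrix
dom_matrix = [
    [degree_of_matching(pattern, facts[disease]) for disease in diseases]
    for pattern in frequent_symptoms
]
```

Each row of `dom_matrix` corresponds to one frequent symptom set and each column to one disease, so a value of 1 is a full match, 0 is no match, and anything in between is a partial match that the fuzzy step can still use.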

E. FCM Clustering and Inference evaluation
FCM is a fuzzy clustering algorithm and can therefore better categorize the degrees of similarity generated above. We apply the fuzzy c-means algorithm to the normalized matrix in order to categorize the frequent symptoms. When these clusters are plotted, the visualization shows which disease is most prevalent in a session.

A. Case Study
After preprocessing, the original dataset is mapped into the following form. After clustering, a post-processing phase is needed to validate the result; it depends on the data properties, and data points are plotted in different colors according to the cluster assigned. In this step we apply post-processing to the clusters to obtain exact information about them. Figure 12 shows the clusters generated after FCM, representing the amount of possible disease in a particular cluster with the respective degree of matching. In this figure, clusters 2 and 4 contain the highest number of objects (diseases), showing that disease5 has a high possibility of infection in this study; we also infer that clusters 2, 4 and 5 indicate which disease would be highly prevalent.
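The FCM clustering applied in this step can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions rather than the implementation used in the study: it assumes Euclidean distance, fuzzifier m = 2, a random initial membership matrix, and the largest absolute membership change as the stopping norm. The sample values stand in for normalized degrees of matching and are hypothetical.

```python
import random


def fcm(points, n_clusters, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Minimal fuzzy c-means.

    points: list of equal-length lists of floats.
    Returns (centers, U), where U[p][q] is the membership of point p in cluster q.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Step 1: random initial membership matrix, each row normalized to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        U.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Step 3: cluster centers as membership-weighted means of the points.
        centers = []
        for q in range(n_clusters):
            weights = [U[p][q] ** m for p in range(n)]
            total = sum(weights)
            centers.append([
                sum(w * x[d] for w, x in zip(weights, points)) / total
                for d in range(dim)
            ])
        # Step 4: recompute memberships from the distances to the centers.
        new_U = []
        for x in points:
            dists = [sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
                     for c in centers]
            if min(dists) == 0.0:
                # Point coincides with a center: full membership in that cluster.
                zero = dists.index(0.0)
                row = [1.0 if q == zero else 0.0 for q in range(n_clusters)]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                s = sum(inv)
                row = [v / s for v in inv]
            new_U.append(row)
        # Steps 5-7: stop once the largest membership change falls below eps.
        change = max(abs(a - b)
                     for r_new, r_old in zip(new_U, U)
                     for a, b in zip(r_new, r_old))
        U = new_U
        if change < eps:
            break
    return centers, U


# Hypothetical normalized degrees of matching (one value per frequent pattern).
points = [[0.05], [0.10], [0.15], [0.90], [0.95], [1.00]]
centers, U = fcm(points, n_clusters=2)
```

Each row of `U` gives the degree to which one frequent pattern belongs to every cluster, so a pattern with a partial match is not forced into a single group as it would be under hard clustering.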

IX. CONCLUSION
In this paper, we have proposed a fuzzy c-means based inference system for association rule mining. We propose a novel algorithm, AFIRM (association fuzzy inference rule mining), which is an extended version of the previously proposed AIRM algorithm.
It consists of two phases. The first phase scans the given dataset together with the corresponding fact dataset and preprocesses them into the format required for rule mining; FP-Growth is then applied to the preprocessed dataset to find frequent patterns. In the second phase, we match these frequent patterns against the fact dataset, create a matrix of degrees of matching (similarity), and normalize this matrix so that the FCM procedure can be applied. The fuzzy c-means clustering algorithm then categorizes the data into different clusters on the basis of the pre-assigned degree of matching. Experimental results show that FIRM (fuzzy inference rule miner) outperforms the previous approach, AIRM, in discovering background knowledge and useful inference.
In future, AFIRM could also use a Markov predictor to estimate future possibilities. In addition, both AIRM and AFIRM first discover association rules using the FP-Growth algorithm; as an optimization, an expert mechanism might be explored that selects the most suitable and efficient algorithm for the type of dataset, instead of always using FP-Growth.

Rule 1: If symptoms are headache, sneezing, running_nose, then the patient has cold.
Rule 2: If symptoms are fever, cough and running nose, then the patient has measles.

III. SOFT V/S HARD CLUSTERING
The fuzzy c-means clustering algorithm differs from traditional clustering approaches in its fuzzy nature and in its capacity to handle delicate data. It was first introduced by Dunn [8] and then improved by Bezdek [3]. Traditional clustering algorithms (K-means [19], K-medoids [17]) are also known as hard clustering techniques because they divide the data into distinct clusters, where each data element belongs to exactly one group, as shown in Figures 1 and 2.

Fig. 1. Partitioning of data on the basis of similarity in traditional or hard clustering approaches

Fig. 3. Partitioning of data on the basis of degree of membership in fuzzy clustering approach

Abbreviation    Meaning
DS              Dataset
FP              Frequent patterns
ARM             Association rule mining
Infr            Inference
Min_sup         Minimum support
FIS             Fuzzy inference system

Fig. 9. Frequent pattern discovery under ARM

Figure 11 shows the resulting clusters, which categorize the frequent symptoms on the basis of degree of matching (DOM) in order to demonstrate and categorize the possibility of disease infection. Before clustering, DOM normalization is applied to transform the DOM into the required form, as shown in Figure 10.

Fig. 12. Clusters generated after FCM, representing amounts of possible diseases in resulting clusters

TABLE I. EXECUTION TIME OF APRIORI AND FP-GROWTH ON DIFFERENT SUPPORT COUNTS (FOR T10I4D100K DATASET)