Improved Association Rules Mining based on Analytic Network Process in Clinical Decision Making

Association Rules Mining is one of the most important fields in data mining and knowledge discovery in databases. Rule explosion is a problem of concern, as conventional mining algorithms often produce too many rules for decision makers to digest. To overcome this problem in clinical decision making, this paper uses the Analytic Network Process (ANP) method to improve the process of extracting rules. Through group decision making of physicians and health experts, ANP is used to organize and evaluate the features related to the rules produced by association rules mining. The proposed method has been applied to a real complete blood count database. It generated interesting association rules that are usable and useful for medical diagnosis.
Keywords—Clinical Data Mining; Clinical Decision Making; Association Rules Mining; Analytic Network Process


Clinical Data Mining (CDM) involves the conceptualization, extraction, analysis and interpretation of available clinical data for clinical decision making and practitioner reflection. Clinical databases can be obtained from various sources that accumulate large quantities of information about patients and their medical conditions. Relationships and patterns within these data could provide new medical knowledge [1,2].
Popular discourse about CDM focuses on the construction or application of algorithms to acquire medical knowledge. Li et al. proposed a privacy-preserving method for training a Restricted Boltzmann Machine (RBM) that allows the participating parties to obtain the model without revealing their private data to each other [3].

Exarchos et al. presented the methodology for developing the EMBalance diagnostic Decision Support System for balance disorders, in which medical data from patients with balance disorders were analyzed using data mining techniques [4].
Bagherzadeh-Khiabani et al. showed the application of several variable selection methods commonly used in data mining to an epidemiological study. They also found that the full model performed worst and the wrapper-based models performed best [5].
Association rule mining (ARM) is one of the most important methods in DM. In particular, the goal of association rules is to detect relationships or associations between specific values of categorical variables in large data sets, making it possible for analysts and researchers to uncover hidden patterns. This powerful exploratory technique has a wide range of applications in business practice, industry, medicine, financial analysis, etc. [6,7]. Babashzadeh et al. proposed a novel approach to modeling medical query contexts based on mining semantic-based association rules to improve clinical text retrieval. First, the concepts in the query context were derived from the rules that covered the query and then weighted according to their semantic relatedness to the query concepts. The query context was then exploited to re-rank patient records to improve clinical retrieval performance [8].
Most existing studies in temporal data mining consider only the lifespan of items to find general temporal association rules. Hong et al. organized time into granules and considered temporal data mining at different levels of granules. They designed a three-phase mining framework that takes the item lifespan definition into account [9]. Moreover, the purpose of CDM is to help medical workers, especially physicians, make decisions according to their own understanding. Therefore, the methods offered to support better decisions should be more descriptive and considerably more transparent. Despite the attractive prospect of fully automatic data analysis, knowledge of the processes behind the data remains indispensable for avoiding many pitfalls of DM.
In practice, ARM suffers from rule explosion; consequently, it is complicated to appreciate all the discovered knowledge. According to the literature on the information-processing abilities of the human brain, when the number of logical phrases and rules is large, it is hard or almost impossible to understand them and make good sense of them [10,11].
There are many ideas regarding the application of Multiple Criteria Decision Making (MCDM) to evaluation problems. Wen et al. presented solutions for incorporating the uncertainty of clinical data into the MCDA model when evaluating the overall benefit-risk profiles of different treatment options [12]. Liu et al. proposed a novel hybrid MCDM model that integrates the 2-tuple DEMATEL technique and the fuzzy MULTIMOORA method to select health-care waste treatment alternatives. It used a modified 2-tuple DEMATEL to obtain the relative weights of the criteria and fuzzy MULTIMOORA to assess the alternatives with respect to each criterion [13].
www.ijacsa.thesai.org
Shafii et al. have assessed the service quality of teaching hospitals of Medical Sciences using Fuzzy Analytical Hierarchy Process (FAHP) and Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) [14].
The purpose of this study is to present a solution to this problem in clinical decision making. It is based on using medical experts' knowledge to preprocess and evaluate the explored rules with the Analytic Network Process (ANP). ANP is one of the main MCDM methods used for group decision making [15].
Basically, a network model in ANP is constructed from expert judgments to model an abstract decision problem. A cluster in the ANP network corresponds to a class, and the elements in a cluster are equivalent to the main subclasses of that class. In this context, the resulting network model includes only alternative clusters, in contrast to the general ANP network model, which comprises a goal cluster, criteria clusters, and alternative clusters [16].
Ergu and Peng proposed a framework for evaluating and selecting SaaS software packages by combining a virtual team with the BOCR (benefits, opportunities, costs, and risks) model of ANP. Their framework shows great potential for aiding practitioners and researchers concerned with cloud services [17].

Nilashi et al. developed a Fuzzy ANP model for the Hospital Information System (HIS) to understand the potential factors that significantly drive or inhibit the decision to adopt HIS from the non-adopters' perspective. The study mainly integrated the diffusion of innovation theory, the technology-organization-environment framework, and institutional theory, along with the human-organization-technology fit model, which can be tailored to understanding HIS adoption by Malaysian public hospitals [18].
In this paper, towards CDM, the ANP method is applied to improve the process of extracting rules. The clusters provided by ANP through physicians and health experts are used to organize and evaluate related features based on ARM information. To justify the proposed approach, a Complete Blood Count (CBC) dataset is used. The discovered rules are interesting, useful, and influential enough to identify the condition of the cases.
The paper is organized as follows. Section 2 describes ARM and ANP, the main technologies used. Section 3 presents the novel approach used to improve the discovered association rules. Section 4 presents a case study that illustrates the proposed method. The paper ends with concluding remarks in Section 5.

II. METHODOLOGY
In this section, the methods developed and implemented in this paper are described. First, the ANP method is presented; then, the ARM procedure is briefly described.

A. Analytic Network Process (ANP)
The ANP, a generalization of the AHP, is one of the most widely used multiple criteria decision making (MCDM) methods [20]. The ANP incorporates feedback and interdependent relationships among decision elements and alternatives, which makes it a more accurate approach for modeling complex decision problems. The ANP derives relative priority scales of absolute numbers from individual judgments by making pairwise comparisons of elements on a common property or control criterion. ANP includes the following steps [16, 19, 20]:
1. Given a decision problem with elements $x_1, x_2, \ldots, x_N$, build a model grouping the elements into clusters $c_1, c_2, \ldots, c_G$.
2. Identify the components and elements of the network and their relationships.
3. Build the $(G \times G)$ cluster relationship matrix $C = [c_{kl}]$ with $c_{kl} \in \{0, 1\}$, where $c_{kl} = 1$ indicates that some element of cluster $c_k$ has influence on some (at least one) element of cluster $c_l$.
4. Use the usual pairwise comparison matrices to compare the influence of the elements belonging to each cluster on any element, derive the priority vectors, and obtain the $(N \times N)$ unweighted supermatrix $U = [u_{ij}]$, with $u_{ij} \in [0, 1]$ and $i, j = 1, \ldots, N$, where $u_{ij}$ is the influence of element $x_i$, which belongs to cluster $c_k$, on element $x_j$, which belongs to cluster $c_l$. Here $u_{ij} = 0$ indicates that $x_i$ has no influence on $x_j$, and $u_{ij} = 1$ indicates that $x_i$ is the unique element of $c_k$ that influences $x_j$. Given a cluster $c_k$ and an element $x_j$ that belongs to cluster $c_l$, the unweighted values of the elements of $c_k$ that influence $x_j$ sum to 1; if no element of $c_k$ influences $x_j$, the sum is 0:
$$\sum_{x_i \in c_k} u_{ij} \in \{0, 1\}. \qquad (1)$$
5. Given $C$, the column sum $\sum_{k=1}^{G} c_{kl}$ indicates how many clusters have influence on the elements of the column cluster $c_l$. Comparing these clusters pairwise gives the cluster weights $p_{kl}$, with $p_{kl} = 0$ whenever $c_{kl} = 0$.
6. Obtain the weighted supermatrix $W = [w_{ij}]$, where $w_{ij} = p_{kl}\, u_{ij}$ is the weighted influence of element $x_i$, which belongs to cluster $c_k$, on element $x_j$, which belongs to cluster $c_l$.
7. Normalize the columns of $W$ to obtain $Q = [q_{ij}]$, where
$$q_{ij} = \frac{w_{ij}}{\sum_{i=1}^{N} w_{ij}}$$
is the normalized weighted influence of element $x_i$, which belongs to cluster $c_k$, on element $x_j$, which belongs to cluster $c_l$. $Q$ is a left-stochastic matrix.
8. Raise $Q$ to limiting powers until the weights converge and remain stable, giving the limit supermatrix $L = \lim_{n \to \infty} Q^{n}$. The entry $l_i$ in row $i$ of $L$ is the final priority of element $x_i$. If $x_i$ is an alternative, $l_i$ is the rating of the alternative; if $x_i$ is a criterion, $l_i$ is the weight of the criterion.
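Steps 6 to 8 can be illustrated with a small NumPy sketch. It assumes the unweighted supermatrix `U` has already been filled from pairwise comparisons, that `P` is a G×G matrix of cluster weights $p_{kl}$, and that `clusters[i]` gives the cluster index of element $x_i$; these names and the toy numbers are hypothetical, chosen only to show the mechanics.

```python
import numpy as np

def weighted_supermatrix(U, P, clusters):
    """Step 6: w_ij = p_kl * u_ij, where x_i is in cluster k and x_j in cluster l."""
    k = np.asarray(clusters)          # k[i] = index of the cluster containing x_i
    return P[np.ix_(k, k)] * U        # expand the G x G cluster weights to N x N

def normalize_columns(W):
    """Step 7: q_ij = w_ij / sum_i w_ij, giving a left-stochastic Q."""
    sums = W.sum(axis=0)
    return np.divide(W, sums, out=np.zeros_like(W), where=sums > 0)

def limit_supermatrix(Q, tol=1e-12, max_iter=10_000):
    """Step 8: multiply by Q until successive powers stop changing."""
    L = Q.copy()
    for _ in range(max_iter):
        nxt = L @ Q
        if np.max(np.abs(nxt - L)) < tol:
            return nxt
        L = nxt
    raise RuntimeError("powers of Q did not converge (Q may be periodic)")

# Toy network (hypothetical numbers): x1, x2 in cluster 0 and x3 in cluster 1.
U = np.array([[0.0, 0.5, 0.6],
              [0.0, 0.5, 0.4],
              [1.0, 1.0, 0.0]])
P = np.array([[0.5, 1.0],
              [0.5, 0.0]])
Q = normalize_columns(weighted_supermatrix(U, P, [0, 0, 1]))
L = limit_supermatrix(Q)
priorities = L[:, 0]   # all columns of L coincide here: [11/34, 8/34, 15/34]
```

Note that column 1 of `W` sums to only 0.5 because $x_1$ is not influenced by its own cluster; step 7 restores a column sum of 1. If $Q$ is periodic its powers cycle instead of converging, and a fuller implementation would average over the cycle; this sketch simply raises an error in that case.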

B. Association Rules (AR)
Mining of association rules was introduced by Agrawal et al. [6]. An association rule is an expression $X \rightarrow Y$, where $X$ is a set of items and $Y$ is a single item. AR is an initial data exploration approach often applied to extremely large data sets. It provides valuable information for assessing significant correlations. It has been applied in a variety of fields, including medicine, medical insurance fraud detection, business applications, and market basket analysis. Let $I = \{I_i;\ i = 1, \ldots, m\}$ be a set of literals called items. A database $D$ is a set of transactions, where each transaction $t$ is a set of items such that $t \subseteq I$. An association rule is an implication of the form $X \rightarrow Y$, where $X \subset I$, $Y \subset I$, and $X \cap Y = \emptyset$. A transaction $t$ is said to contain $X$ if $X \subseteq t$.
Let $Dsupp(X)$ be the fraction of transactions in database $D$ that contain $X$. The degree of support for a rule $X \rightarrow Y$ is defined as
$$Dsupp(X \rightarrow Y) = Dsupp(X \cup Y).$$
The degree of confidence for $X \rightarrow Y$ is defined as
$$Dconf(X \rightarrow Y) = \frac{Dsupp(X \cup Y)}{Dsupp(X)}.$$
The problem of mining association rules is to find all association rules whose degrees of support and confidence are no less than the pre-specified minimum support $\alpha$ and minimum confidence $\beta$, respectively [6,7]. Let $\pi$ denote the set of all discovered rules; then
$$\pi = \{\, r: X \rightarrow Y \mid Dsupp(r) \ge \alpha,\ Dconf(r) \ge \beta,\ X \subset I,\ Y \subset I,\ X \cap Y = \emptyset \,\}.$$
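The definitions of support, confidence, and the rule set $\pi$ can be made concrete with a brute-force sketch. It enumerates candidate itemsets directly instead of using Agrawal et al.'s Apriori pruning, and it follows the single-item consequent form $X \rightarrow Y$ described above. The item names such as `"Hb=low"` are hypothetical discretized values, not data from the paper.

```python
from itertools import combinations

def support(itemset, transactions):
    """Dsupp(X): fraction of transactions t with X a subset of t."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def mine_rules(transactions, min_supp, min_conf, max_len=3):
    """Return (X, y, support, confidence) for every rule X -> y passing both thresholds."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, max_len + 1):
        for combo in combinations(items, size):
            xy = frozenset(combo)
            s_xy = support(xy, transactions)          # Dsupp(X -> y) = Dsupp(X u {y})
            if s_xy == 0 or s_xy < min_supp:
                continue
            for y in combo:                           # single-item consequent
                x = xy - {y}
                conf = s_xy / support(x, transactions)  # Dsupp(X u {y}) / Dsupp(X)
                if conf >= min_conf:
                    rules.append((x, y, s_xy, conf))
    return rules

transactions = [
    {"Hb=low", "MCV=low", "anemic"},
    {"Hb=low", "MCV=low", "anemic"},
    {"Hb=low", "MCV=normal"},
    {"Hb=normal", "MCV=normal"},
]
rules = mine_rules(transactions, min_supp=0.5, min_conf=0.6)   # alpha = 0.5, beta = 0.6
```

Even on four toy transactions this yields nine rules, which hints at why real clinical tables produce the rule explosion discussed earlier.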

III. THE PROPOSED APPROACH
Although one of the main tasks of DM tools is discovering novel and hidden rules and patterns in databases, in practice data analysts face rule explosion. Not only is it complicated to appreciate the valuable discovered knowledge, but some of the discovered rules also have no logical or scientific basis, especially in clinical data analysis. To avoid this problem, this paper proposes to apply group decision making based on the knowledge of physicians and other medical experts. ANP is a multiple criteria decision making methodology in which a network model is built from expert judgments to model an abstract decision problem. Here, ANP is integrated with ARM to improve the discovered rules for clinical decision makers in medical diagnosis. The proposed approach includes the following steps:
1. Based on the experts' knowledge, list all attributes of the available cleaned database that can biologically influence the target variable. Let $X = \{x_1, x_2, \ldots, x_N\}$ be the input attributes used for ARM.
2. Through group decision making, organize the attributes of $X$ into clusters $C = \{c_1, c_2, \ldots, c_G\}$.
3. Through group decision making, identify the relationships among the attributes, that is, the inner dependence (within a cluster) and the outer dependence (between clusters) of the clusters in $C$. Then build $\hat{C}$, the relationship matrix, and a network model of the problem.
4. Through group decision making, complete pairwise comparison questionnaires to obtain the importance of each cluster and the unweighted supermatrix $\hat{U}$, whose elements give the influence of the attributes of one cluster on those of another.
5. Calculate the weighted supermatrix $W$ and its normalized form $Q$, which give the importance (weight) of attribute $j$ of cluster $i$ and of each cluster (see steps 6 and 7 in Section 2.1).
6. Evaluate the significance of each extracted rule based on the network model and the inner and outer relationships:
• If a rule includes a relationship between attributes of two independent clusters, the rule is ignored.
• If a rule includes a relationship between attributes of a cluster with no inner dependence, the rule is ignored.
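7. For the remaining rules, aggregate the importance level as the geometric mean of the importance levels of the attributes contained in each rule:
$$IL_k = \Big( \prod_{x_i \in r_k} l_i \, w_{c(i)} \Big)^{1/n_k},$$
where $IL_k$ is the importance level of rule $k$ ($r_k$), $l_i$ is the final priority of attribute $x_i$, $w_{c(i)}$ is the weight of the cluster containing $x_i$, and $n_k$ is the number of attributes in rule $k$.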
8. Prioritize the discovered rules based on their IL.
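Steps 6 to 8 can be sketched in a few lines. Several assumptions apply: each rule is reduced to the set of network attributes it contains (the target class, such as the patient condition, is not a network element), the dependence structure is given as a set `inner` of clusters with inner dependence and a set `outer` of dependent cluster pairs, and the importance level uses the geometric mean of $l_i \cdot w_{c(i)}$ as in step 7. The cluster assignments, priorities, and weights below are made-up numbers, not the values from Tables I to III.

```python
import math
from itertools import combinations

def is_admissible(attrs, cluster_of, outer, inner):
    """Step 6: reject rules linking independent clusters, or attributes of a cluster without inner dependence."""
    for a, b in combinations(sorted(attrs), 2):
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == cb:
            if ca not in inner:
                return False
        elif (ca, cb) not in outer and (cb, ca) not in outer:
            return False
    return True

def importance_level(attrs, priority, cluster_of, cluster_weight):
    """Step 7 (assumed form): geometric mean of l_i * w_c(i) over the rule's attributes."""
    terms = [priority[a] * cluster_weight[cluster_of[a]] for a in attrs]
    return math.prod(terms) ** (1 / len(terms))

def rank_rules(rules, priority, cluster_of, cluster_weight, outer, inner):
    """Step 8: drop inadmissible rules, then sort the rest by IL (highest first)."""
    scored = [
        (importance_level(attrs, priority, cluster_of, cluster_weight), rule_id)
        for rule_id, attrs in rules
        if is_admissible(attrs, cluster_of, outer, inner)
    ]
    return sorted(scored, reverse=True)

# Hypothetical network: the "Other" cluster has no inner dependence,
# so a rule relating Age and Gender is discarded.
cluster_of = {"Age": "Other", "Gender": "Other", "NC": "White", "LC": "White",
              "Hb": "Red", "RBC": "Red", "Plt": "Platelets"}
inner = {"White", "Red"}
outer = {("Red", "White"), ("Platelets", "Red"), ("Other", "Red")}
priority = {"Age": 0.10, "Gender": 0.05, "NC": 0.20, "LC": 0.15,
            "Hb": 0.30, "RBC": 0.20, "Plt": 0.10}
cluster_weight = {"Other": 0.1, "White": 0.3, "Red": 0.4, "Platelets": 0.2}
rules = [("r1", {"Age", "Gender"}), ("r2", {"NC", "Hb"}),
         ("r3", {"LC", "Plt"}), ("r4", {"Hb", "RBC"})]
ranked = rank_rules(rules, priority, cluster_of, cluster_weight, outer, inner)
```

Here `r1` fails the inner-dependence check and `r3` links two clusters with no declared dependence, so only `r4` and `r2` survive, ranked by their importance levels.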
Hence, in the proposed method, the information obtained from the ANP is applied to evaluate the ARM output and to recognize the most plausible rules $X \rightarrow Y$. In fact, grouping and summarizing all attributes into a network model helps the data miner to cluster the attributes and better understand the structure of the database and the interactions among all attributes. Furthermore, ANP presents an overall structure of all features based on experts' judgments, thereby avoiding the consideration of rules among attributes that have no rational relationship.
Meanwhile, this approach does not prevent the discovery of novel and original rules and patterns, one of the essential data mining tasks; in fact, it only improves the process of discovering rules. Moreover, by implementing ANP, the final rules are prioritized by aggregating the judgments of the medical experts through group decision making. This helps to recognize the importance of rules in addition to their support and confidence, which ARM provides from the database.

IV. APPLICATION OF THE PROPOSED METHOD IN LABORATORY DATA
In this section, we examine the proposed approach by applying it to medical data. Today, access to large medical databases and the need to extract useful information from them are increasingly vital. In this paper, we analyze a complete blood count database to discover novel and practical rules that help recognize the condition of cases.
The blood count is the most common screening test for virtually every patient, and it clearly plays an important role in point-of-care testing. A routine complete blood count (CBC) is required as part of the evaluation and includes the red blood cell count, hemoglobin, hematocrit, red cell indices (including MCV, MCHC, and MCH), the white blood cell count and its differential, and the platelet count [21]. Based on the examination of the blood, the physician is directed toward a more focused assessment of the marrow or of systemic disorders that secondarily involve the hematopoietic system. Hemoglobin (Hb), the main component of the red blood cell (RBC), is a conjugated protein that serves as the vehicle for transporting oxygen (O2) and carbon dioxide (CO2). Among the red cell indices, the most useful parameter is the MCV, the average volume of red cells, expressed in femtoliters or cubic micrometers. The MCH is the content (weight) of Hb in the average red cell, expressed in picograms. The MCHC is the average concentration of Hb in a given volume of packed red cells, expressed in g/dL [22,23].
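The three red cell indices are linked to the measured RBC count, Hb, and Hct by standard arithmetic relations. The sketch below assumes conventional units (Hct in %, Hb in g/dL, RBC in 10^12 cells/L); the units of the paper's own CBC records are not stated, so the input values here are illustrative only.

```python
def mcv_fl(hct_percent, rbc_1e12_per_l):
    """Mean corpuscular volume (fL) = Hct (%) x 10 / RBC (10^12/L)."""
    return hct_percent * 10 / rbc_1e12_per_l

def mch_pg(hb_g_dl, rbc_1e12_per_l):
    """Mean corpuscular hemoglobin (pg) = Hb (g/dL) x 10 / RBC (10^12/L)."""
    return hb_g_dl * 10 / rbc_1e12_per_l

def mchc_g_dl(hb_g_dl, hct_percent):
    """Mean corpuscular hemoglobin concentration (g/dL) = Hb (g/dL) x 100 / Hct (%)."""
    return hb_g_dl * 100 / hct_percent

print(mcv_fl(45, 5.0), mch_pg(15, 5.0), round(mchc_g_dl(15, 45), 1))  # 90.0 30.0 33.3
```

Because MCV, MCH, and MCHC are derived from RBC, Hb, and Hct, they partly duplicate information already present in those attributes.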
Leukocytes in the blood serve different functions and arise from different hematopoietic lineages, so it is important to evaluate each of the major leukocyte types separately. Using the typical method, it is possible to quantify lymphocytes, neutrophils, and mixed cells (including monocytes, basophils, and eosinophils) [24].
Platelets are small cell fragments adapted to adhere to damaged blood vessels, aggregate with one another, and facilitate the production of fibrin [25]. The main objective of this study was to discover significant, reliable, and novel rules from the CBC records. Implementing ARM yielded about 240 rules, and extracting information from so many rules was clearly a complex task.
Then, by implementing the proposed approach, the influencing attributes were organized into groups, shown in Table I, where X = {Platelets, Age, Gender, W.B.C, Lymphocyte counts (LC), Neutrophil counts (NC), R.B.C, Hb, Hct, MCV, MCH, MCHC} and C = {Platelets, Other Measures, White Blood Measures, Red Blood Measures}.
Then, based on the inner and outer interdependences of the grouped attributes, the network model shown in Fig. 1 was constructed. Subsequently, for each assessed level of case condition, pairwise comparisons of the groups and of all influencing attributes were made through group decision making to determine the importance of each group ($c_i$) and of each attribute ($x_j$) grouped in $c_i$ with respect to the target variable. In this application, each expert completed about 11 pairwise comparison sheets. For example, if a patient is in a critical condition, the group importance is as given in Table II. To calculate the supermatrix Q, the importance (weight) of each cluster and of attribute j of cluster i was calculated from the normalized eigenvectors. The part of the supermatrix holding the weights of the clusters in the critical condition ($w_i$) is shown in Table III. In this case, Q is a supermatrix with 11 parts, each with dimensions depending on its elements.
Hence, by using the ANP information as described in Section 3, RBC, Hb, Hct, and MCV were considered in the AR, while MCH and MCHC were disregarded; association rules between age and sex were also ignored. The roughly 23 remaining rules with confidence greater than 60 percent were then prioritized. The decision makers' team confirmed that the extracted rules were reasonable, mostly interesting, and practical. For example:
IF (404.00 < NC <= 744.50) AND (HB <= 186.00) AND (Plate <= 476500.00) AND (200.50 < LC <= 505.00) AND (297.00 < RBC) AND (28.00 < HCT) THEN (patient = h)
with support = 25% and confidence = 100%. A useful product of the Q calculation is the importance level (IL) of each rule, computed by the geometric mean. For this rule, IL = 18%, which ranks it 5th among the 23 rules.
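The weights from the pairwise comparison sheets are obtained as normalized principal eigenvectors. The sketch below shows that computation with NumPy. How the individual experts' sheets were merged is not specified here, so the element-wise geometric mean used in `aggregate_judgments` is an assumed, commonly used aggregation rule, and the matrices are hypothetical.

```python
import numpy as np

def aggregate_judgments(matrices):
    """Combine experts' pairwise matrices by element-wise geometric mean (assumed rule)."""
    stack = np.asarray(matrices, dtype=float)
    return np.exp(np.log(stack).mean(axis=0))

def priority_vector(A):
    """Normalized principal (Perron) eigenvector of a positive pairwise comparison matrix."""
    eigvals, eigvecs = np.linalg.eig(A)
    v = eigvecs[:, np.argmax(eigvals.real)].real
    v = np.abs(v)                  # the Perron vector has a single sign; make it positive
    return v / v.sum()

# A perfectly consistent 3 x 3 matrix (hypothetical): a_ij = v_i / v_j with v = (4, 2, 1).
A = np.array([[1.0, 2.0, 4.0],
              [0.5, 1.0, 2.0],
              [0.25, 0.5, 1.0]])
weights = priority_vector(A)            # approximately [4/7, 2/7, 1/7]

# Two experts disagree on a 2 x 2 comparison (2 vs 8); the geometric mean gives 4.
group = aggregate_judgments([[[1, 2], [1/2, 1]], [[1, 8], [1/8, 1]]])
group_weights = priority_vector(group)  # approximately [0.8, 0.2]
```

The geometric mean keeps the aggregated matrix reciprocal ($a_{ji} = 1/a_{ij}$), which is why it is often preferred over an arithmetic mean when combining AHP/ANP judgments.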
While this model effectively incorporates qualitative and quantitative measures into the evaluation process, its efficacy depends on the accuracy and quality of the judgments provided by the clinical decision making team. The full involvement of the relevant decision makers helps to utilize their experience and expertise in the clinical decision making process.

V. CONCLUSION
Based on recent studies, this is the first attempt to apply the ANP model to the process of ARM in the context of CDM. ANP can be used as a decision analysis tool because it incorporates feedback and interdependent relationships among decision criteria and alternatives. In addition, evaluating and selecting novel and reasonable rules can be very useful to decision makers in both academic research and practice. In this paper, organizing all features and summarizing them in a network model helped to group the attributes and identify the structure of the database through medical experts' knowledge. Furthermore, this approach helped decision makers avoid considering rules among attributes that have no rational relationship, thereby strengthening the results. To validate that the proposed approach can be effective for clinical decision making, it was applied to a complete blood count database. The results presented more influential and practical rules for recognizing the condition of cases. Finally, this study suggests applying more MCDM

Fig. 1. ANP model used to select the influenced blood factors for assessing the patient
The cleaned database contains: patient code, patient age, gender, test code, result, test name, normal range, and low and high critical ranges for the factors W.B.C, R.B.C, Hb, Hct, MCV, MCH, MCHC, Platelets, Lymphocytes, Neutrophil, and Mixed (Mono & Eos & Bas). Based on the experts' knowledge, the patient code, test code, and test name attributes were ignored.

TABLE II. THE PAIRWISE COMPARISON OF THE CLUSTERS IN CRITICAL CONDITION