Mining Positive and Negative Association Rules Using FII-Tree

Positive and negative association rules are important for finding useful information hidden in large datasets; negative association rules in particular can reflect mutually exclusive correlations among items. Association rule mining among frequent items has been extensively studied in data mining research. In recent years, however, there has been an increasing demand for mining infrequent items. In this paper, we propose a tree-based approach that stores both frequent and infrequent itemsets in order to mine positive and negative association rules from them. It minimizes I/O overhead by scanning the database only once. The performance study shows that the proposed method is more efficient than the previously proposed method.


I. INTRODUCTION
Association rule mining is a data mining task that discovers associations among items in a transactional database. Association rules have been extensively studied in the literature since Agrawal et al. first introduced them in [1,2]. A typical example of an association rule mining application is market basket analysis. Much effort has been devoted to the problem, and many algorithms have been proposed for efficiently discovering association rules [2,3,4,5,6,16].
Association rules provide a convenient and effective way to identify and represent certain dependencies between attributes in a database [1]. Association rule mining includes positive and negative association rule mining [9,11,12,17]. In the traditional approach to finding association rules, one thinks only in terms of positive association rules, especially when determining the degree of support and confidence [2].
The study of negative association rules has become an active research field in recent years. It still focuses on transactional databases and has produced a number of important research results [7,8,10,18]. Brin M. et al. were the first to refer to the relevance of two sets [1]. Savasere O. et al. described a strong negative association rule model [2]. Xindong Wu et al. proposed a PR model [3] and gave an algorithm that can mine positive and negative association rules simultaneously.
Ling Zhou et al. [14] and Junfeng Ding et al. [15] proposed methods to mine association rules from infrequent itemsets. Mining positive association rules from frequent itemsets and negative association rules from infrequent itemsets with some interestingness measures is described in [9]. Honglei Zhu et al. [12] mine positive association rules from frequent itemsets and negative association rules from infrequent itemsets using differential support and confidence.
In this paper, we examine the problem of mining positive and negative association rules from frequent and infrequent itemsets.
The rest of this paper is organized as follows. Section 2 briefly presents the relevant concepts and definitions. In Section 3, the existing strategies for mining both positive and negative association rules are reviewed. The proposed algorithm is presented in Section 4. Section 5 illustrates the computational results. Concluding remarks are made in Section 6.

II. CONCEPTS AND DEFINITIONS
Let $I = \{i_1, i_2, i_3, \ldots, i_n\}$ be a finite set of items and DB be a transactional database. The support of an itemset $X \subseteq I$ is [1]:

$$\mathrm{supp}(X) = \frac{\text{No. of transactions containing } X}{\text{Total no. of transactions in DB}} \qquad (1)$$

Definition 1: If the support of an itemset X is greater than or equal to the user-defined minimum support (ms) threshold, X is called a frequent itemset; otherwise it is called an infrequent itemset [5].
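Equation (1) and Definition 1 can be illustrated with a minimal Python sketch. The function names `support` and `is_frequent` and the four-transaction database `TOY_DB` are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset` (Eq. 1)."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def is_frequent(itemset, transactions, ms):
    """Definition 1: frequent if supp(X) >= ms, otherwise infrequent."""
    return support(itemset, transactions) >= ms


# Hypothetical toy database: each transaction is a set of single-letter items.
TOY_DB = [frozenset(t) for t in ("AB", "ABC", "BC", "AC")]

print(support("AB", TOY_DB))            # AB occurs in 2 of the 4 transactions
print(is_frequent("AB", TOY_DB, 0.5))   # meets the 50% threshold
print(is_frequent("ABC", TOY_DB, 0.5))  # occurs in only 1 of 4 transactions
```

With a threshold of 50%, AB is frequent in this toy database while ABC is infrequent.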

A. Positive Association Rule:
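A (positive) association rule is of the form $X \Rightarrow Y$, with $X, Y \subseteq I$ and $X \cap Y = \emptyset$ [1] [9]. The support and confidence of $X \Rightarrow Y$ are defined as [2]:

$$\mathrm{supp}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y), \qquad \mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}$$

An interesting positive association rule has support and confidence greater than the user-given thresholds minimum support (ms) and minimum confidence (mc), respectively.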
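As a small illustration of rule support and confidence, the following Python sketch evaluates the rule A ⇒ B on a toy database. It assumes the standard definitions supp(X ⇒ Y) = supp(X ∪ Y) and conf(X ⇒ Y) = supp(X ∪ Y) / supp(X); the helper names and the database are hypothetical.

```python
def supp(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_support(x, y, transactions):
    """supp(X => Y), taken here as supp(X ∪ Y)."""
    return supp(frozenset(x) | frozenset(y), transactions)


def rule_confidence(x, y, transactions):
    """conf(X => Y) = supp(X ∪ Y) / supp(X); returns 0.0 if X never occurs."""
    sx = supp(x, transactions)
    return rule_support(x, y, transactions) / sx if sx else 0.0


# Hypothetical toy database, not taken from the paper.
TOY_DB = [frozenset(t) for t in ("AB", "ABC", "BC", "AC")]

s = rule_support("A", "B", TOY_DB)     # AB occurs in 2 of 4 transactions
c = rule_confidence("A", "B", TOY_DB)  # A occurs in 3 of 4 transactions
print(s, c)
```

A rule would then be kept if both values clear the ms and mc thresholds.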

B. Negative Association Rule:
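A negative association rule is an implication of the form $X \Rightarrow \neg Y$ (or $\neg X \Rightarrow Y$), where $X \subseteq I$, $Y \subseteq I$, and $X \cap Y = \emptyset$ [9]. The rule $\neg X \Rightarrow \neg Y$ is equivalent to a positive association rule of the form $Y \Rightarrow X$. The formulas for the support and confidence of negative rules are taken from [12].
Definition 2: The bit vector (BV) of an item i has the form $BV_i = (b_1, b_2, \ldots, b_m)$, where $b_k = 1$ if the k-th transaction contains i and $b_k = 0$ otherwise, for k = 1, 2, …, m. The size of BV is equal to the number of transactions in DB, and the support of an itemset is equal to the number of 1s in its bit vector.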
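The bit-vector representation of Definition 2 also gives a direct way to count the transactions that contain X but not Y. The sketch below is a hedged illustration for single items: it relies on the counting identity supp(X ∪ ¬Y) = supp(X) − supp(X ∪ Y), which follows from the support definition and is not copied from [12]. All function names and the toy database are hypothetical.

```python
def bit_vectors(transactions):
    """Map each item to an int whose k-th bit is 1 iff transaction k contains it."""
    bv = {}
    for k, items in enumerate(transactions):
        for item in items:
            bv[item] = bv.get(item, 0) | (1 << k)
    return bv


def popcount(bits):
    """Number of 1s in the bit vector, i.e. the support count."""
    return bin(bits).count("1")


def negative_rule_stats(x, y, bv, m):
    """Support and confidence of x => not y for single items x and y."""
    mask = (1 << m) - 1                  # keep only the m transaction bits
    x_not_y = bv[x] & ~bv[y] & mask      # transactions with x but without y
    supp_x = popcount(bv[x]) / m
    supp_x_not_y = popcount(x_not_y) / m
    conf = supp_x_not_y / supp_x if supp_x else 0.0
    return supp_x_not_y, conf


# Hypothetical toy database, not taken from the paper.
TOY_DB = ["AB", "ABC", "BC", "AC"]
BV = bit_vectors(TOY_DB)
supp_a_not_b, conf_a_not_b = negative_rule_stats("A", "B", BV, len(TOY_DB))
print(supp_a_not_b, conf_a_not_b)
```

Here only the last transaction contains A without B, so the support of A ⇒ ¬B equals supp(A) − supp(AB).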

III. RELATED WORK
Several algorithms have been proposed for mining association rules and negative association rules, but only a few have been proposed for mining both positive and negative association rules concurrently. Wu et al. [9] presented an Apriori-based framework for mining both positive and negative ARs based on rule dependency measures and an additional interest threshold (mi). A rule $X \Rightarrow \neg Y$ (or $\neg X \Rightarrow Y$) is only considered a valid negative AR if both X and Y are frequent and interest(X, ¬Y) ≥ mi (or interest(¬X, Y) ≥ mi).
The most common framework for association rule generation is the "Support-Confidence" framework. In [11], the authors considered another framework, called correlation analysis, that extends the support-confidence framework. They combined the two phases (mining frequent itemsets and generating strong association rules) and generated the relevant rules while analyzing the correlations within each candidate itemset. Their algorithm avoids evaluating item combinations redundantly. For each candidate itemset, all possible combinations of items are computed in order to analyze their correlations. At the end, they keep only those rules generated from item combinations with strong correlation. If the correlation is positive, a positive rule is discovered. If the correlation is negative, two negative rules are discovered.
Honglei Zhu et al. [12] proposed a method for simultaneously generating positive ARs from frequent itemsets and negative ARs from infrequent itemsets with differential minimum support and differential minimum confidence. An innovative approach was proposed in [13], in which the authors divide the itemset space into four parts for mining positive and negative association rules. In [14], the authors proposed a method to mine association rules from infrequent itemsets.

IV. PROBLEM DESCRIPTION AND PROPOSED METHOD
Most of the methods proposed for mining positive and negative association rules maintain both frequent and infrequent itemsets and hence suffer from poor scalability. To keep the execution time within the user's expectations, it is necessary to design an efficient approach to mine both positive and negative association rules.
Problem Statement: Given a database of transactions DB and user-defined minimum support (ms) and minimum confidence (mc) values, the problem is to extract all interesting positive and negative Boolean association rules.
We propose the Frequent and Infrequent Itemset tree (FII-tree) as a data structure to hold the requisite itemsets, together with a method to extract all the positive and negative association rules.
The proposed process consists of two phases.
Phase 1: In this phase, we construct an FII-tree that can hold all frequent and infrequent itemsets. The root of the tree is labeled "null". Each non-root node in the tree has the generic form <I, c>, where I is an itemset and c is the support of I. Infrequent 1-itemsets are ignored. All frequent 1-itemsets are placed in level 1 in lexicographic order, all frequent and infrequent 2-itemsets are placed in level 2, and so on, so that all frequent and infrequent k-itemsets are placed in level k. The highest level of the FII-tree is L, where L is equal to the number of frequent 1-itemsets. Two indices are maintained separately for the FII-tree, FreqIndex for all frequent itemset lists and InfreqIndex for all infrequent itemset lists, so that frequent and infrequent itemsets are easily accessible. The step-by-step process of creating the FII-tree is given below:
1) Scan the database DB once and store it in item-based bit vectors (BV), then find the frequent 1-itemsets based on Definition 2.
2) Insert the frequent 1-itemsets one by one in the tree and assign them to the index FreqIndex.
3) Generate candidate k-itemsets $C_k$ (k = 2, 3, …) from the frequent (k-1)-itemsets. For each itemset X in the candidate k-itemsets $C_k$:
a) If supp(X) ≥ min_supp and Corr(X) > 1, then assign X to the frequent k-itemset list ($FL_k$); otherwise assign X to the infrequent k-itemset list ($IFL_k$). Calculate the support of X by performing a bitwise AND operation between the bit vectors (BV) of its items (if $x_1, x_2 \in X$, then supp(X) is the number of 1s in $BV(x_1) \wedge BV(x_2)$).
b) Assign $FL_k$ and $IFL_k$ to $FreqIndex_k$ and $InfreqIndex_k$ respectively (where k = 2, 3, 4, …).
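The level-wise construction in steps 1 to 3 can be sketched in Python as follows. This is a minimal sketch under several assumptions, not the authors' implementation: it keeps only the per-level FreqIndex and InfreqIndex as dictionaries rather than a linked tree, it forms a candidate k-itemset whenever two frequent (k-1)-itemsets share k-2 items, and, because Corr(X) is not defined in this text, it takes an optional hypothetical `corr` callable that is skipped by default. The names `build_fii_levels`, `popcount`, `COLUMNS`, and `EXAMPLE_DB` are illustrative.

```python
from itertools import combinations


def popcount(bits):
    """Number of 1s in a bit vector, i.e. the support count."""
    return bin(bits).count("1")


def build_fii_levels(transactions, min_count, corr=None):
    """Return (freq_index, infreq_index): level k -> {itemset tuple: support count}."""
    # Step 1: a single scan of the database builds one bit vector per item.
    bv = {}
    for k, items in enumerate(transactions):
        for item in items:
            bv[item] = bv.get(item, 0) | (1 << k)

    # Step 2: frequent 1-itemsets in lexicographic order; infrequent ones are ignored.
    level = {(i,): bv[i] for i in sorted(bv) if popcount(bv[i]) >= min_count}
    freq_index = {1: {s: popcount(b) for s, b in level.items()}}
    infreq_index = {}

    # Step 3: candidates from frequent (k-1)-itemsets, support via bitwise AND.
    k = 2
    while True:
        candidates = {}
        for a, b in combinations(sorted(level), 2):
            union = tuple(sorted(set(a) | set(b)))
            if len(union) == k:  # assumed join rule: the two itemsets share k-2 items
                candidates[union] = level[a] & level[b]
        if not candidates:
            break
        freq, infreq = {}, {}
        for itemset, bits in candidates.items():
            count = popcount(bits)
            # `corr` is a hypothetical hook; when None, only the support test applies.
            passes = count >= min_count and (corr is None or corr(itemset, count) > 1)
            (freq if passes else infreq)[itemset] = count
        freq_index[k], infreq_index[k] = freq, infreq
        level = {s: candidates[s] for s in freq}
        k += 1
    return freq_index, infreq_index


# Three bit vectors quoted in the paper's example, assumed here to belong to A, B, C.
COLUMNS = {"A": "1110011000", "B": "1111011110", "C": "1101111111"}
EXAMPLE_DB = [{i for i, col in COLUMNS.items() if col[k] == "1"} for k in range(10)]
freq_index, infreq_index = build_fii_levels(EXAMPLE_DB, min_count=5)
print(freq_index)
print(infreq_index)
```

Restricted to the items A, B, and C, the sketch classifies ABC as an infrequent 3-itemset with support count 4. Because the join rule is an assumption, it may not reproduce every itemset listed in the paper's full example.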
Insert the frequent and infrequent 2-itemsets in the FII-tree. For 3-itemsets, perform the bitwise AND operation between the three bit vectors: $R = BV_1 \wedge BV_2 \wedge BV_3 = (1110011000) \wedge (1111011110) \wedge (1101111111) = 1100011000$. The number of 1s in the resultant bit vector is 4, so Supp(ABC)=4; similarly, Supp(ABD)=2 and Supp(BCD)=3. Thus the infrequent 3-itemsets are {ABC, ABD, BCD}, as their supports are less than 50%, and there are no frequent 3-itemsets or 4-itemsets. ABCD is the largest itemset for the given transactional database (DB), so the algorithm stops processing. The FII-tree for the above example is shown in Fig. 1.
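The bitwise AND in this example can be checked with a few lines of Python. The variable names are illustrative, and the bit strings are read as binary literals, which leaves the count of 1s unchanged.

```python
# Bit vectors BV_1, BV_2, BV_3 as quoted in the example.
bv_1 = int("1110011000", 2)
bv_2 = int("1111011110", 2)
bv_3 = int("1101111111", 2)

# Bitwise AND keeps a 1 only where all three vectors have a 1.
result = bv_1 & bv_2 & bv_3
result_bits = format(result, "010b")     # 10-character binary string
support_count = result_bits.count("1")   # number of 1s = support count

print(result_bits, support_count)
```

The printed vector and count agree with R = 1100011000 and Supp(ABC) = 4.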
In Fig. 1, solid straight lines are the links between items in the same level and between different levels of the tree. Solid arcs are the links between frequent itemsets, and dashed arcs are the links between infrequent itemsets. The different levels of frequent itemsets are linked to the frequent index, called FreqIndex, and the infrequent itemsets are linked to the infrequent index, called InfreqIndex.
V. EXPERIMENTAL RESULTS
We conducted experiments with different transaction sizes and different numbers of transactions in a database to compare our approach with PNAR [12]. The execution time with different minimum supports for the dataset T50I30D200K is shown in Fig. 2.

Fig. 2. Execution time with different minimum supports
The execution time with different dataset sizes (number of transactions) for the fixed minimum support 0.5 is shown in Fig. 3. It can be observed that both methods generate an equal number of positive and negative association rules, but the proposed approach reduces the execution time compared with the existing method.

VI. CONCLUSION AND FUTURE WORK
In this paper, we have designed a new tree structure to store both frequent and infrequent itemsets for mining both positive and negative association rules. In the proposed method, the database is scanned only once, which reduces the number of I/O operations. The structure is also flexible: if new frequent 1-itemsets are mined after the user reduces the minimum support (ms) threshold, the proposed method allows the new items to be appended to the tree without reconstructing it from scratch.
The example transactional database (DB) consists of 5 items and 10 transactions. The item-based bit vectors are built from DB, and the support count of an item is the number of 1s in its bit vector. For example, Supp(A)=5, Supp(B)=8, Supp(C)=9, Supp(D)=6 and Supp(E)=3. The minimum support (ms) is 5 (50%), so the frequent 1-itemsets are {A, B, C, D}. Insert the frequent 1-itemsets in the FII-tree. The candidate 2-itemsets are AB, AC, AD, BC, BD and CD. Next, perform the bitwise AND (^) operation between each pair of frequent 1-itemsets.
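The first steps of this example can be reproduced from the stated support counts alone. The following Python sketch filters the frequent 1-itemsets and enumerates the candidate 2-itemsets; the variable names are illustrative.

```python
from itertools import combinations

# Support counts of the five items as stated in the example.
support_counts = {"A": 5, "B": 8, "C": 9, "D": 6, "E": 3}
min_count = 5  # minimum support of 50% over 10 transactions

# Frequent 1-itemsets in lexicographic order; E is dropped as infrequent.
frequent_1 = sorted(i for i, c in support_counts.items() if c >= min_count)

# Candidate 2-itemsets: every pair of frequent 1-itemsets.
candidates_2 = ["".join(pair) for pair in combinations(frequent_1, 2)]

print(frequent_1)
print(candidates_2)
```

The result matches the frequent 1-itemsets {A, B, C, D} and the six candidate 2-itemsets listed in the text.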

Fig. 3. Execution time with different dataset sizes
Recently, there have been some interesting studies about mining frequent patterns in databases that allow new data to be added or old data to be deleted. The maintenance of already mined frequent patterns when databases are updated is an interesting topic for future research.

TABLE I
Insert i, ifreq 1 in a node then add node to the tree.