Reducing Attributes in Rough Set Theory with the Viewpoint of Mining Frequent Patterns

The main objective of the Attribute Reduction problem in Rough Set Theory is to find and retain the set of attributes whose values vary most between objects in an Information System or Decision System. Meanwhile, Mining Frequent Patterns aims to find items that appear together in transactions more often than a given threshold. The two problems are therefore similar, which suggests solving Attribute Reduction with the viewpoint and methods of Mining Frequent Patterns. The main difficulty of the Attribute Reduction problem is execution time: the problem is NP-hard. This article proposes two new algorithms for Attribute Reduction, built on the concepts of Maximal Random Prior Set and Maximal Set: one has linear complexity, and one attains a global optimum.


I. INTRODUCTION
Attribute reduction plays an important role in rough set theory, which is applied in many fields such as data mining, pattern recognition, and machine learning. In recent years, many reduction algorithms have been proposed based on positive region, information entropy, and discernibility matrix (Qian et al. 2011).
Attribute reduction methods have been applied to discover hidden patterns in high-dimensional data sets by removing inadequate features. The nature of the original features is retained while the time needed for pattern recognition decreases (Dash et al. 2010) (Liang et al. 2013) (Qian et al. 2010). The characteristics of the data set are preserved by keeping the important attributes; therefore, the quality of the data set is enhanced through the removal of redundant attributes (Sadasivam et al. 2012). Also, rule induction can be applied in rough set theory thanks to attribute reduction algorithms (Yao and Zhao 2008) (Ju et al. 2011).

One application of attribute reduction is gene selection.
One paper presented a Quick Reduct based Genetic Algorithm (Anitha 2012), while a minimal spanning tree based on rough set theory was introduced for gene selection (Pati and Das 2013). Based on cross entropy, the relatively dispensable attributes of the decision system have been omitted, and the resulting optimal attribute set describes the same discriminative features as the original data set (Zheng and Yan 2012). In the sense of entropies, several discernibility matrices were introduced (Wei et al. 2013).
Based on indiscernibility and discernibility, similarities and differences of objects can be identified, and attribute reduction carried out accordingly. The attribute set is reduced by generating reducts using the indiscernibility relation of Rough Set Theory (Sengupta and Das 2012). By transforming the discernibility matrix into a simplest equivalent matrix, valuable attributes are retained while unimportant attributes are removed from the discernibility matrix (Yao and Zhao 2009). An attribute reduction algorithm based on a genetic algorithm with an improved selection operator and a discernibility matrix was also researched and introduced (Zhenjiang et al. 2012). Others discussed an algorithm based on the discernibility matrix and Information Gain (Azhagusundari and Thanamani 2013).
In addition, a hybrid algorithm for large data sets was proposed to overcome the shortcomings of computationally expensive and inefficient significance measures when several attributes share the same greatest value (Qian et al. 2011).
Heterogeneous attribute reduction can be based on neighborhood rough sets, using neighborhood dependency to evaluate the discriminating capability of a subset of heterogeneous attributes. This neighborhood model reduces the attributes according to the thresholds of samples in the decision positive region (Hu et al. 2008).
In incomplete decision systems, attribute reduction methods such as distributive reduction and positive region reduction have been given by discernibility functions (Jilin et al. 2009). To deal with such systems, one paper proposed a new attribute reduction method based on information quantity; this approach improved traditional tolerance relationship calculations using an extension of the tolerance relationship in rough set theory (Xu et al. 2012). Another study presented a new attribute reduction algorithm based on incomplete decision tables, which improves both time and space complexity (Yue et al. 2012).
Handling the attribute reduction problem in special systems is also a challenging issue. There is research on attribute reduction for dynamic data sets (Wang et al. 2013), fuzzy sets (Chen et al. 2012), Inconsistent Disjunctive Set-valued Ordered Decision Information Systems (Zhang et al. 2012), etc. Even the design and implementation of a rough set processor in VHDL has been studied, based on a binary discernibility matrix and a reduct calculator block (Tiwari et al. 2012), thereby increasing the operating speed with dedicated hardware.

Calculation time is always a big issue in attribute reduction. A new accelerator for attribute reduction has been proposed based on the perspective of objects and attributes (Liang et al. 2013). Particle swarm optimization is a newer heuristic algorithm that has been applied successfully to many optimization problems (Ding et al. 2012); nowadays it is often used to solve non-deterministic polynomial (NP)-hard problems such as attribute reduction. Co-PSAR was introduced on this basis to find the minimal reduction set. An algorithm based on rough sets and Wasp Swarm Optimization was also introduced; it uses mutual-information-based entropy to find core attributes, and then uses the significance of each feature as probability information to search the feature space for a minimal attribute reduction (Fan and Zhong 2012). A popular method in swarm intelligence is Ant Colony Optimization (ACO); a proposed ACO-based hybrid approach can improve classification accuracy and find more robust features to improve classifier performance (Arafat et al. 2013).
Genetic algorithms have also been researched and applied to attribute reduction; they converge faster to the global optimal solution (Zhenjiang et al. 2012) (Liu et al. 2013).
Besides, granular computing is a new research approach for reducing attributes in decision systems (Li et al. 2013). One paper presented a novel granularity partition model and developed a fast, effective feature selection algorithm for decision systems (Sun et al. 2012).

Some other approaches have been researched recently, including the Nonlinear Great Deluge Algorithm (Jaddi and Abdullah 2013), quantization (Li et al. 2012), attribute significance (Zhai et al. 2012), and the degree of condition attributes (Qiu et al. 2012). They have all proved their efficiency in solving the attribute reduction problem.
This article introduces an algorithm based on bit-chains and the maximal random prior set. It finds a reduction in linear time, but the result is not globally optimal. Therefore, another algorithm is also proposed, based on the maximal set (a new development of the maximal random prior set) and the algorithm for Accumulating Frequent Patterns (Nguyen TT and Nguyen PK 2013), to find a globally optimal reduction.

II. FORMULATION MODEL
Definition 1 (zero chain): The zero chain is a bit-chain in which every bit equals 0.

Definition 2 (intersection operation ∧):
The intersection operation ∧ is a dyadic operation on the space of bit-chains: the resulting bit-chain has bit-1 turned on at exactly those positions where both operands have bit-1 turned on.

Definition 3 (cover operation):
A bit-chain A is said to cover a bit-chain B if and only if with every position having bit-1 turned on in B, A has a corresponding bit-1 turned on.
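For illustration, the following minimal Python sketch implements these primitives, assuming (purely as a convenience of the sketch; the paper prescribes no encoding) that a bit-chain is stored as a non-negative integer whose binary digits are its bits:

    def intersection(a: int, b: int) -> int:
        # Intersection of two bit-chains: bit-1 exactly where both operands have bit-1.
        return a & b

    def covers(a: int, b: int) -> bool:
        # A covers B iff A has bit-1 at every position where B has bit-1.
        return a & b == b

    def is_zero_chain(a: int) -> bool:
        # The zero chain has every bit equal to 0.
        return a == 0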
Consequence 1: A bit-chain that results from an intersection operation and differs from the zero chain is always covered by the two bit-chains that generate it.

Definition 4 (maximal random prior form ∧-S):
The maximal random prior form of a set S of bit-chains, denoted by ∧-S, is a bit-chain satisfying four criteria:
 Differing from the zero chain.
 Being covered most by elements in S.
 Having as many bit-1 positions turned on as possible.
 If more than one bit-chain meets the three criteria above, the bit-chain chosen to be the maximal random prior form of S is the one covered by the first elements in S.
For example, consider a set S of 4-bit-chains over <abcd> and review three bit-chains:
<0011>: has two bit-1 positions turned on but is covered only by the first two bit-chains of S.
<1000>: has one bit-1 position turned on and is covered by three bit-chains of S.
<0010>: has one bit-1 position turned on and is covered by three bit-chains of S.
Between <1000> and <0010>, <0010> is covered by the first two elements of S, so ∧-S has to be <0010>.
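The example's full set S is not quoted in the text, so the following Python sketch checks the stated criteria against a hypothetical S whose cover counts match the ones above:

    S = [0b1011, 0b0111, 0b1010, 0b1100]   # hypothetical 4-bit-chains over <abcd>

    def covers(a: int, b: int) -> bool:
        return a & b == b

    for cand in (0b0011, 0b1000, 0b0010):
        who = [i for i, x in enumerate(S) if covers(x, cand)]
        print(format(cand, "04b"), "covered by chains", who)
    # prints: 0011 covered by chains [0, 1]
    #         1000 covered by chains [0, 2, 3]
    #         0010 covered by chains [0, 1, 2]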

Definition 5 (maximal random prior elements):
The maximal random prior elements of a set S of bit-chains have the following characteristics:
 The first element (p1) is the form ∧-S.
 The second element (p2) is the form ∧-S1, where S1 = S\{x ∈ S | x covers p1}.
 …
 The (k + 1)th element (pk+1) is the form ∧-Sk, where Sk = Sk-1\{x ∈ Sk-1 | x covers pk}.

Definition 6 (maximal random prior set):
A set P containing all maximal random prior elements of a set S of bit-chains is called the maximal random prior set of S.
Consequence 2: No two elements of the maximal random prior set P have bit-1 turned on at the same position.
Consequence 3: When the bit-chains of S are arranged in different orders, different maximal random prior sets are produced.
Theorem 1: When intersection operations are made between an element of S and the elements of P, the results that differ from the zero chain do not cover each other.
Proof: According to Consequence 2, the results of the intersection operations between an element of S and the elements of P have no bit-1 turned on at the same positions; hence they cannot cover each other.

III. ALGORITHM FOR FINDING MAXIMAL RANDOM PRIOR SET
A. Idea
Consider a Boolean function f that is the intersection (∧) of n propositions, where each proposition is a union (∨) of variables taken from a1, a2, ..., am. According to the commutative law of Boolean algebra, the n propositions of f can be rearranged into the form f = (ap ∨ X1) ∧ (ap ∨ X2) ∧ ... ∧ (ap ∨ Xn), where ap is a variable appearing in every proposition and each Xi is the union of the remaining variables of the i-th proposition. Clearly, (ap) is a reduction of f.
If the n propositions of f are transformed into a set S of m-bit-chains, the maximal random prior set P is a reduction of f. From this analysis, an algorithm for constructing the maximal random prior set P of a bit-chains set S takes shape with the following main ideas: each element of S is inspected in its existing order in S, and P is created or modified correspondingly as the elements of S are inspected.
Initially, P is empty. Obviously, for the set S restricted to its first element, the corresponding set P contains only this first element.
When scanning the next element of S, intersection operations (∧) are made between this element and the existing elements of P to find new maximal random prior forms. If a new form is generated, it replaces the old form in P, because the new form is evidently covered by more elements of S than the old form. If no new form is generated, the next element of S is itself a new maximal random prior form. However, a question may be raised. Whenever the next element of S is inspected, intersection operations are carried out with the existing elements of P; at that moment there are two groups of elements: (1) the old elements of P and (2) the new elements created by the intersection operations. The new elements might cover each other, cover the old elements, or be covered by the old elements. Is the set P still guaranteed to be consistent, as Consequence 2 states? The answer is yes, since Consequence 1 and Theorem 1 ensure this.
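To make the procedure concrete, here is a Python sketch that follows the idea above; it is an illustration only, not the paper's pseudo-code listing FIND_MaximalRandomPriorSet (whose numbered lines are referenced in the proof of Theorem 2 below), and it reuses the integer encoding of bit-chains assumed earlier:

    def find_maximal_random_prior_set(S: list[int]) -> list[int]:
        # Build the maximal random prior set P of S, scanning S in its given order.
        P: list[int] = []
        for s in S:
            replaced = False
            for i, p in enumerate(P):
                r = s & p                  # intersection operation
                if r != 0:                 # the result differs from the zero chain:
                    P[i] = r               # the new form replaces the old form in P
                    replaced = True
                    break                  # one new form per inspected element
            if not replaced:
                P.append(s)                # s itself is a new maximal random prior form
        return P

For example, find_maximal_random_prior_set([0b0011, 0b0111, 0b1010]) returns [0b0010]: every chain of S covers <0010>, the single attribute c over <abcd>. Each element of S scans P at most once and, by Consequence 2, P holds at most m pairwise disjoint chains, so the running time is linear in the number of bit-chains.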

C. Accuracy of the Algorithm
Theorem 2: The FIND_MaximalRandomPriorSet algorithm finds the maximal random prior set P of a bit-chains set S with a given order.

Proof by Induction:
With the number of elements in S equal to 1, the only element of S is also the form ∧-S. According to the algorithm, this only element of S is inserted into P; the only element of P then satisfies the definition of the maximal random prior set. Thus, Theorem 2 is correct when S has 1 element.
Assume that Theorem 2 is correct when S has k elements. We need to prove that Theorem 2 is also correct when S has k + 1 elements.
Because Theorem 2 is correct when S has k elements, the set P contains all maximal random prior elements of this set S.
When S has k + 1 elements, it means that one new element has been added to the original set S of k elements.
According to the FIND_MaximalRandomPriorSet algorithm, we make intersection operations between the elements of the current P and the new (k + 1)th element of S, denoted sk+1 (lines 4 and 5):
 If the result of the intersection operation between sk+1 and an element pi of P differs from the zero chain (line 6), this result is a new maximal random prior form for S with k + 1 elements. We replace pi in P by this new result element (line 7). When sk+1, together with pi, creates a new maximal random prior form, we terminate the intersection operations between sk+1 and the remaining elements of P (line 9).
 If all intersection operations between sk+1 and the elements of P return the zero chain, it means that sk+1 does not cover any element of P. Thus the element sk+1 is the form ∧-{sk+1}, and sk+1 is inserted into P (line 13).
In both cases, we receive a set P satisfying the properties of the maximal random prior set of S. So Theorem 2 is correct when S has k + 1 elements.
In conclusion: the FIND_MaximalRandomPriorSet algorithm finds the maximal random prior set P of a bit-chains set S with a given order.

IV. ATTRIBUTE REDUCTION IN ROUGH SET THEORY
The maximal random prior set P is useful for solving and reducing Boolean algebra functions. One of its most important applications is finding a solution to the attribute reduction problem in rough set theory.

A. Rough Set
In rough set theory, an information system is a pair (U, A), where U is a non-empty finite set of objects and A is a non-empty finite set of attributes. A decision system is any information system of the form (U, A ∪ {d}), where d ∉ A is the decision attribute. With |U| denoting the cardinality of U, the discernibility matrix of a decision system is a symmetric |U| × |U| matrix with each entry
cij = {a ∈ A : a(xi) ≠ a(xj)} if d(xi) ≠ d(xj), and cij = ∅ otherwise.


Table II presents a discernibility matrix of the decision system "Play Sport", where a, b, c, d denote Wind, Temperature, Humidity and Outlook, respectively. The discernibility function is a Boolean function retrieved from the discernibility matrix and can be defined by the formula
f = ∧ { ∨ cij : 1 ≤ j < i ≤ |U|, cij ≠ ∅ },
where ∨ cij is the union of the attributes in entry cij. The discernibility function can be simplified using the laws of Boolean algebra. All constituents in the minimal disjunctive normal form of this function are reductions of the decision system (Pawlak 2003). However, simplifying the discernibility function is an NP-hard problem, and attribute reduction is always the key problem in rough set theory.
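The construction of the discernibility matrix can be sketched in Python on a small decision table; the table below is hypothetical and only illustrates the definition, it is not the paper's "Play Sport" data:

    from itertools import combinations

    # Each object: (condition attribute values, decision value).
    objects = [
        ({"a": 0, "b": 1, "c": 1, "d": 1}, "yes"),
        ({"a": 1, "b": 0, "c": 1, "d": 0}, "no"),
        ({"a": 0, "b": 1, "c": 0, "d": 0}, "no"),
    ]

    def discernibility_matrix(objs):
        # Entry (i, j): the condition attributes whose values differ between
        # objects i and j, recorded only when the decision values differ.
        entries = {}
        for j, i in combinations(range(len(objs)), 2):
            (vi, di), (vj, dj) = objs[i], objs[j]
            if di != dj:
                entries[(i, j)] = {a for a in vi if vi[a] != vj[a]}
        return entries

    print(discernibility_matrix(objects))
    # e.g. {(1, 0): {'a', 'b', 'd'}, (2, 0): {'c', 'd'}} (set order may vary)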

B. The Maximal Random Prior Set and Attribute Reduction Problem
Consider a discernibility function f, retrieved from the discernibility matrix of a decision system with m attributes, that has n constituents. Each constituent of this function is transformed into an m-bit-chain, with each bit denoting an attribute, so the function is converted into a set S of n bit-chains. The maximal random prior set P of S is then a simplification of the discernibility function f.
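Continuing the integer encoding of the earlier sketches, the transformation can be illustrated as follows (the constituents here are hypothetical):

    attributes = ["a", "b", "c", "d"]   # the <abcd> convention: leftmost bit is a

    def to_bit_chain(constituent: set[str]) -> int:
        # Encode a set of attribute names as an m-bit-chain stored in an int.
        m = len(attributes)
        chain = 0
        for k, name in enumerate(attributes):
            if name in constituent:
                chain |= 1 << (m - 1 - k)
        return chain

    constituents = [{"a", "b", "d"}, {"c", "d"}]   # hypothetical constituents of f
    S = [to_bit_chain(c) for c in constituents]
    print([format(x, "04b") for x in S])           # ['1101', '0011']

Feeding this S to the find_maximal_random_prior_set sketch of Section III returns [0b0001], i.e. the single attribute d, which indeed appears in both hypothetical constituents.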
for each y ∈ M do
10. if y covers q then …
11. if q covers y then …

Consequence 4:
The bit-chain of the pattern with the highest frequency in the Representative Set of a set S is the maximal form of S.
From Consequence 4, Definition 8 can be modified into the following definition.

Definition 10 (maximal elements): The maximal elements of a set S of bit-chains have the following characteristics:
 The first element (q1) is the element y0 ∈ P* such that some x ∈ S covers y0 and, for every other y ∈ P* covered by some x ∈ S, y0.frequency > y.frequency.
 The second element (q2) is defined in the same way over S1 = S\{x ∈ S | x covers q1}.
 The third element (q3) is defined in the same way over S2 = S1\{x ∈ S1 | x covers q2}.
 …
 The (k + 1)th element (qk+1) is defined in the same way over Sk = Sk-1\{x ∈ Sk-1 | x covers qk}.
With Definition 10 in place, the algorithm for finding the Maximal Set is created as follows:
FIND_MaximalSet
Input: an m-bit-chains set S; a representative set P* of S
Output: the maximal set Q
1. while S is not empty do
2. …
for each x ∈ S do
5. …

Theorem 3: The FIND_MaximalSet algorithm finds the maximal set Q of a bit-chains set S.
Proof: The FIND_MaximalSet algorithm works as follows: first we find the element qj, then we delete the elements of P* and of S that cover qj; we repeat this until the set S is empty.
In the above, if we did not delete the elements of P* covering qj, then the qj found would be exactly the one of Definition 10. Hence, to prove the correctness of the FIND_MaximalSet algorithm, we need to show that when we delete the elements of P* covering qj, we still obtain the same maximal elements as defined in Definition 10.
We show this by induction on j.
From Definition 10, q2 is determined as follows: it is the element of P* that is covered by at least one element of S1 and, among such elements, has the highest frequency. We now show that q2 can be determined from P1 by the same criteria.
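The behaviour established in this proof can be sketched in Python as follows; representing the representative set as (pattern, frequency) pairs is an assumption of the sketch, as is the integer encoding of bit-chains:

    def find_maximal_set(S: list[int], P_star: list[tuple[int, int]]) -> list[int]:
        # Sketch of FIND_MaximalSet as described in the proof of Theorem 3.
        Q: list[int] = []
        S, P_star = list(S), list(P_star)
        while S:
            # q_j: the highest-frequency pattern of P* covered by at least one
            # remaining element of S (Definition 10 / Consequence 4).
            candidates = [(y, f) for (y, f) in P_star
                          if any(x & y == y for x in S)]
            if not candidates:
                break
            q, _ = max(candidates, key=lambda yf: yf[1])
            Q.append(q)
            # Delete the elements of S and of P* that cover q_j.
            S = [x for x in S if x & q != q]
            P_star = [(y, f) for (y, f) in P_star if y & q != q]
        return Q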

The resulting bit-chains map back to attributes: (0 0 0 1) → d and (0 1 1 0) → b ∨ c. So the minimal function is f = d ∧ (b ∨ c); in conclusion, (d ∧ b) and (d ∧ c) are two reductions of the discernibility function f.

V. EXPERIMENTATION
The FIND_MaximalRandomPriorSet algorithm was developed and tested on a personal computer with the following specification: Windows 7 Ultimate 32-bit, Service Pack 1; 4096 MB RAM; Intel(R) Core(TM)2 Duo E7400 at 2.80 GHz; 300 GB HDD. The programming language is C#.NET on Visual Studio 2008. The results of some testing patterns:
