A New Viewpoint for Mining Frequent Patterns
Thanh-Trung Nguyen

According to the traditional viewpoint of data mining, transactions are accumulated over a long period of time (years) in order to find the frequent patterns associated with a given support threshold, and these patterns are then applied in business practice as important experience for subsequent business processes. From this point of view, many algorithms have been proposed to mine frequent patterns. However, the huge number of transactions accumulated over a long time, and the need to handle all of them at once, remain challenges for the existing algorithms. In addition, new characteristics of the business market and the frequent changes to business databases, with a high rate of add, delete, and alter operations, demand a new frequent pattern mining algorithm that meets these challenges. This article proposes a new perspective in the field of mining frequent patterns: accumulating frequent patterns, along with a mathematical model and algorithms to solve the existing challenges.

Keywords: accumulating frequent patterns; data mining; frequent pattern; horizontal parallelization; representative set; vertical parallelization


I. INTRODUCTION
Frequent pattern mining is a basic problem in data mining and knowledge discovery. Frequent patterns are sets of items which occur in a dataset more than a user-specified number of times. Identifying frequent patterns plays an essential role in mining associations, correlations, and many other interesting relationships among data (Prakash et al. 2011). Recently, frequent patterns have been studied and applied in many areas, such as text categorization (Yuan et al. 2013), text mining (Kim et al. 2012), social networks (Nancy and Ramani 2012), and frequent subgraphs (Wang and Ramon 2012). Various techniques have been proposed to improve the performance of frequent pattern mining algorithms. In general, methods of finding frequent patterns fall into 3 main categories: Candidate Generation Methods, Without Candidate Generation Methods, and Parallel Methods.
The common rule of Candidate Generation Methods is to probe the dataset many times to generate good candidates which can be used as frequent patterns of the dataset. The Apriori algorithm, proposed by Agrawal and Srikant in 1994 (Prasad and Ramakrishna 2011), is a typical technique in this approach. Recently, much research has focused on Apriori, improving the algorithm to reduce its complexity and increase the efficiency of finding frequent patterns. The partitioning technique (Prasad and Ramakrishna 2011), pattern-based algorithms, and incremental Apriori-based algorithms (Sharma and Garg 2011) can be viewed as Candidate Generation Methods. Many improvements of Apriori have been presented. An efficient improved Apriori algorithm for mining association rules using a Logical Table based approach was proposed (Malpani and Pal 2012). Another group focused on a map/reduce design and implementation of the Apriori algorithm for structured data analysis (Koundinya et al. 2012), while some researchers proposed an improved algorithm for mining frequent patterns in large datasets using transposition of the database with a minor modification of the Apriori-like algorithm (Gunaseelan and Uma 2012). The custom-built Apriori algorithm (Rawat and Rajamani 2010) and the modified Apriori algorithm (Raghunathan and Murugesan 2010) were introduced, but running time is still a big obstacle. Reduced Candidate Set (RCS) is an algorithm which is more efficient than the original Apriori algorithm (Bahel and Dule 2010). The Record Filter Approach is a method which takes less time than Apriori (Goswami et al. 2010). In addition, Sumathi et al. (2012) proposed an algorithm that takes a vertical tidset representation of the database and removes all the non-maximal frequent itemsets to get the exact set of Maximal Frequent Itemsets directly. Besides, a method was introduced by Utmal et al. (2012). This method first finds the frequent 1-itemsets and then uses a heap tree to sort the frequent patterns generated, and so on repeatedly. Although Apriori and its developments have proved effective, many scientists still focus on other heuristic algorithms and try to find better ones. Genetic algorithms (Prakash et al. 2011), Dynamic Function (Joshi et al. 2010), and depth-first search have been studied and applied successfully in practice. Besides, a study presented a new technique using a classifier which can predict the fastest ARM (Association Rule Mining) algorithm with a high degree of accuracy (Sadat et al. 2011). Another approach was also based on the Apriori algorithm but provides a better reduction in time because of the prior separation of the data (Khare and Gupta 2012).
Apriori and Apriori-like algorithms sometimes create a large number of candidate sets. It is expensive to pass over the database and compare the candidates. Without Candidate Generation is another approach, which determines the complete set of frequent itemsets without candidate generation, based on the divide and conquer technique. Interesting patterns and constraint-based mining are typical methods which were introduced recently (Prasad and Ramakrishna 2011). FP-Tree, FP-Growth and their developments (Aggarwal et al. 2009; Kiran and Reddy 2011; Deypir and Sadreddini 2011; Xiaoyun et al. 2009; Duraiswamy and Jayanthi 2011) introduced a prefix tree structure which can be used to find frequent patterns in datasets. On the other hand, researchers studied a transaction reduction technique with an FP-tree based bottom-up approach for mining cross-level patterns (Jayanthi and Duraiswamy 2012), and the construction of an FP-tree using Huffman coding (Patro et al. 2012). H-mine is a pattern mining algorithm which is used to discover features of products from reviews (Prasad and Ramakrishna 2011; Ghorashi et al. 2012). Besides that, a novel tree structure called FPU-tree was proposed, which is more efficient than available trees for storing data streams (Baskar et al. 2012). Another study tried to find improved associations, showing which itemset has the most acceptable association with others (Saxena and Satsangi 2012). Q-FP tree was introduced as a way to mine data streams for association rule mining (Sharma and Jain 2012).
Applying parallel techniques to solve high-complexity problems is an interesting field. In recent years, several parallel and distributed extensions of serial algorithms for frequent pattern discovery have been proposed. A study presented a distributed, parallel algorithm which makes the application of SPADA to large datasets feasible (Appice et al. 2011), while another group introduced the HPFP-Miner algorithm, based on the FP-Tree algorithm, to integrate parallelism into finding frequent patterns (Xiaoyun et al. 2009).
A method for finding frequent occurrence patterns and frequent occurrence time-series change patterns in the observational data of a weather-monitoring satellite successfully applied a parallel method to solve the problem of calculation cost (Niimi et al. 2010). PFunc, a novel task parallel library whose customizable task scheduling and task priorities facilitated the implementation of a clustered scheduling policy, was presented and proved efficient (Kambadur et al. 2012). Another method is partitioning for finding frequent patterns in a huge database. It is based on key-based division for finding the local frequent patterns (LFP). After finding the partition frequent patterns from the subdivided local databases, it then finds the global frequent patterns from the local databases and performs pruning over the whole database (Gupta and Satsangi 2012). Fuzzy frequent pattern mining has also been a new approach recently (Picado-Muiño et al. 2012).
Although there have been significant improvements in finding frequent patterns recently, working with a varying database is still a big challenge. In particular, an algorithm should not need to rescan the whole database whenever an element is added, deleted, or modified. Besides, a number of algorithms are effective, but their mathematical basis and implementation are complex. In addition, computer memory is limited. Hence, storing both the data mining context and the frequent patterns effectively while using the least memory is also no small challenge. Finally, the ability to divide data into several parts for parallel processing is also a concern. Furthermore, the characteristics of the medium and small market produce challenges that need to be solved:
• Businesses need to regularly change the minimum support threshold in order to find acceptable rules based on the number of buyers.
• Due to the specific management of enterprises, operations such as adding, deleting, and editing frequently affect the database.
• The results need to be accumulated immediately after each operation on an invoice so that the rules can be consulted at any time.
In this article, a mathematical space is introduced with some new related concepts and propositions, in order to design new algorithms which are expected to solve the remaining issues in finding frequent patterns.

A. Presenting the Problem
We have a set of transactions, and the goal is to produce the frequent patterns according to a specific threshold called the minimum support (min support).
We can present the set of transactions as a set S of bit chains. For a chain in S, the i-th bit is set to 1 when the i-th item is chosen and to 0 otherwise. The representative set P of S is the set of all patterns in S with maximal occurrence time.
We can calculate the frequent patterns easily from P.
So, the problem is reduced to rebuilding the representative set P whenever S is modified (elements are added, deleted, or altered).
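The step from P to the frequent patterns can be sketched as follows. This is a minimal illustration, not the author's procedure: it assumes that P is stored as a Python dictionary mapping each form (an integer bit mask) to its frequency, and that the support of any pattern is the largest frequency among the representative forms covering it. The function names and the sample set P are hypothetical.

```python
def covers(a: int, b: int) -> bool:
    """True when bit chain a covers b (every 1-bit of b is also set in a)."""
    return (a & b) == b


def support(rep: dict[int, int], pattern: int) -> int:
    """Largest frequency among representative forms that cover the pattern."""
    return max((k for form, k in rep.items() if covers(form, pattern)), default=0)


def frequent_patterns(rep: dict[int, int], min_support: int) -> dict[int, int]:
    """Every non-empty pattern whose support reaches min_support."""
    result: dict[int, int] = {}
    for form, k in rep.items():
        if k < min_support:
            continue
        sub = form
        while sub:                      # walk all non-empty sub-chains of form
            if sub not in result:
                result[sub] = support(rep, sub)
            sub = (sub - 1) & form
    return result


# Assumed representative set for the chains 1110, 1110, 0111, 0111, 0011.
P = {0b1110: 2, 0b0111: 2, 0b0110: 4, 0b0011: 3, 0b0010: 5}
for pattern, k in sorted(frequent_patterns(P, min_support=4).items()):
    print(f"{pattern:04b} {k}")
```

Because only P is consulted, the threshold can be changed and the call repeated without touching the transactions themselves.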

B. Adding a New Transaction
We simply use the following algorithm to rebuild the representative set P when a new transaction is added.
Algorithm for rebuilding the representative set when adding a new element to S: Let S be a set of n size-m bit chains with representative set P. In this section, we consider the algorithm for rebuilding the representative set when a new chain is added to S.
ALGORITHM NewRepresentative (P, z)
// Finding new representative set for S when one chain is added to S.
// Input: P is the representative set of S, z is a chain added to S.
// Output: The new representative set P of S ∪ {z}.
6.     if q ≠ 0 // q is not a chain with all bits 0
7.         if x ⊑ q then P = P \ {x}
8.
9.         for each y ∈ M do
10.            if y ⊑ q then
11.
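Since only part of the listing survives here, the update can be illustrated with a simplified sketch. It is not a line-by-line transcription of NewRepresentative: it assumes the representative set is kept as a dictionary from form (integer bit mask) to frequency, and it replaces the explicit containment checks with a "keep the largest frequency per form" rule. The names `add_transaction` and `build` are hypothetical.

```python
def add_transaction(rep: dict[int, int], z: int) -> dict[int, int]:
    """Return the representative set after bit chain z is added.

    rep maps each form (int bit mask) to its frequency.
    """
    if z == 0:                                   # an empty transaction adds nothing
        return dict(rep)
    candidates: dict[int, int] = {z: 1}          # the pattern [z; 1] itself
    for form, k in rep.items():
        q = form & z                             # form of x ∩ [z; 1]
        if q:                                    # skip the all-zero chain
            candidates[q] = max(candidates.get(q, 0), k + 1)
    updated = dict(rep)
    for form, k in candidates.items():           # a larger frequency replaces a smaller one
        updated[form] = max(updated.get(form, 0), k)
    return updated


def build(chains: list[int]) -> dict[int, int]:
    """Apply add_transaction to every chain in turn."""
    rep: dict[int, int] = {}
    for z in chains:
        rep = add_transaction(rep, z)
    return rep


S = [0b1110, 0b1110, 0b0111, 0b0111, 0b0011]
P = build(S)
for form, k in sorted(P.items(), reverse=True):
    print(f"[{form:04b}; {k}]")
```

For these five chains the sketch yields [0110; 4], which agrees with the pattern used as an example in the discussion of deletion.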

The complexity of the algorithm:
The complexity of the NewRepresentative algorithm is $nm2^{2m}$, where n is the number of transactions and m is the number of items. (Of course, with more care we may get a better estimate; however, the above estimate is linear in n, and this is the most important thing.)
Proof: Let P be the representative set before adding z, and let Q be the new representative set after adding z. Let |P| be the cardinality of P (i.e. the number of elements of P).
The key observation is that $|P| \le 2^m$. This is because we cannot have two elements [z; p] and [z'; p'] in P such that z = z' and p ≠ p'. Therefore, the number of elements of P is always less than or equal to the number of chains of m bits, which is $2^m$. Fix an x in P.
In line 5, the complexity will be m.
In line 7, the complexity will be m again.
In line 8, the complexity is m again.
In lines 9-13, the cardinality |M| is at most |P|, since by the definition of M on line 22 the worst case is when we add everything of the form $q = x \cap [z; 1]$ into M, where x runs over all of P, and in this case |M| = |P|. Hence the complexity of lines 9-13 is less than or equal to |P|. In lines 18-24, the complexity is at most m.
Hence when we let x vary in P (but fix a transaction z), we see that the total complexity for lines 5-24 is about $m|P|^2 \le m2^{2m}$.
If we vary the transactions z (whose number is n), we see that the complexity for the whole algorithm is $nm2^{2m}$.
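The counting in the proof can be collected into a single chain of inequalities. The grouping into three factors is an illustrative reading of the argument, under the assumption that each comparison of two chains costs m:

```latex
\[
\underbrace{n}_{\text{transactions } z}
\cdot
\underbrace{|P|}_{\text{choices of } x}
\cdot
\underbrace{m\,|P|}_{\text{cost for one } x}
\;=\; n\,m\,|P|^{2}
\;\le\; n\,m\,\bigl(2^{m}\bigr)^{2}
\;=\; n\,m\,2^{2m}.
\]
```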

Theorem 1:
Let S be a set of size-m bit chains and P be the representative set of S. By induction on the number m of entries of a chain, in the first step we show that the claim is correct if u and v differ only at the k-th entry.
Without loss of generality, we assume that $u_k = 0$ and $v_k = 1$. The following cases must be true:
- Case 1:
In summary, the above claim is true for any u and v of S that differ at only one entry.
By induction, in the second step it is assumed that the claim is true if u and v differ at r entries, and that only one of the three statements (a), (b) or (c) is true.
Without loss of generality, we assume that the first r entries of u and v are different, and that they also differ at the (r + 1)-th entry. Applying the same method as in the first step (where r = 1) to this instance, we obtain:
True statements when u ≠ v and their first r entries are different:
True statements when u ≠ v and their first r + 1 entries are different:
True statements when combining the two possibilities:
Therefore, if u and v differ at r + 1 entries, only one of the statements (a), (b), (c) is correct. The above claim is true, and Theorem 1 is proved.

Theorem 2:
Let S be a set of n size-m bit chains. The representative set of S is determined by applying the NewRepresentative algorithm to each of the n elements of S in turn.
Proof: We prove the theorem by induction on the number n of elements of S.
First, when the above algorithm is applied to a set S of only one element, this element is added into P, and P with that single element is the representative set of S. Thus, Theorem 2 holds in the case n = 1.
Next, assume that whenever S has n elements, the above algorithm can be applied to S to obtain a representative set $P_0$. We now prove that when S has n + 1 elements, the above algorithm can be applied to yield a representative set P for S. We assume that S is the union of a set $S_0$ of n elements and an element z, and that we already have a representative set $P_0$ for $S_0$. Each element of $P_0$ forms a maximal rectangle from $S_0$, and we call p the number of elements of $P_0$.
The fifth statement in the NewRepresentative algorithm shows that the operator $\cap$ can be applied to z and the p elements of $P_0$ to produce p new elements belonging to P. This means z "scans" all elements in the set $P_0$ to find new rectangle forms when z is added into $S_0$. Consequently, three groups of 2p + 1 elements in total are created from the sets $P_0$, P, and z.
To remove redundant rectangles, we have to check whether each element of $P_0$ is contained by elements of P, whether elements of P contain one another, and whether z is contained by an element of P.
Let x be an element of $P_0$ and consider the form $x \cap [z; 1]$. There are two cases: if the form of z covers that of x, then x is a new form; or if the form of x covers that of z, then z is a new form. In either case, the frequency of the new form is always one unit greater than the frequency of the original.
According to Theorem 1, with $x \in P_0$, if some pattern w contains x then w must be a new element which belongs to P, and that new element is $q = x \cap [z; 1]$. To check whether x is contained by elements belonging to P, we need only check whether x is contained by q. If x is contained by q, it must be removed from the representative set (line 7).
In summary, the algorithm first checks whether elements belonging to $P_0$ are contained by elements belonging to P. Then it checks whether elements of P contain one another (lines 9 to 18), and whether [z; 1] is contained by elements belonging to P (line 8).
Finally, the NewRepresentative algorithm can therefore be used to find the new representative set when new elements are added to S.

C. Deleting a Transaction
Definition 5: Let S be a set of bit chains and let P, obtained by applying the algorithm NewRepresentative to S, be the representative set of S. Given $[p; k] \in P$, let $s_1, s_2, \dots, s_r \in S$ be r ($r \le k$) chains participating in taking shape p, i.e., these chains participate in creating a rectangle with the form p in S; this is denoted p_crd: $s_1, s_2, \dots, s_r$. Otherwise, we write p_crd: $!s_1, !s_2, \dots, !s_r$.
For example, with
To sum up, Theorem 3 is right.
Given Theorem 3, modifying the representative set after a transaction is deleted is rather simple. We just use the deleted chain/transaction to scan all elements of the representative set and reduce their frequency by 1 if they are covered by this chain. An example of this situation is shown in Section III.E. The algorithm NewRepresentative_Delete is as follows:
ALGORITHM NewRepresentative_Delete (P, z)
// Finding new representative for S when one chain is removed from S.
// Input: P is the representative set of S, z is a chain removed from S.
// Output: The new representative set P of S \ {z}
1. for each x ∈ P do
2.     if z ⊒ x.Form then
3.
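The deletion step described above can be sketched as follows. The decrement is taken from the text; the clean-up pass afterwards (dropping forms whose frequency reaches 0 or that are no longer maximal) is an assumption, since the rest of the listing is not available here. The dictionary representation and the name `delete_transaction` are hypothetical.

```python
def covers(a: int, b: int) -> bool:
    """True when bit chain a covers b."""
    return (a & b) == b


def delete_transaction(rep: dict[int, int], z: int) -> dict[int, int]:
    """Return the representative set after bit chain z is removed from S."""
    # Step from the text: every form covered by z loses one occurrence.
    lowered = {form: (k - 1 if covers(z, form) else k) for form, k in rep.items()}
    # Assumed clean-up: keep a form only if it still occurs and no strictly
    # larger form occurs at least as often.
    result: dict[int, int] = {}
    for form, k in lowered.items():
        if k <= 0:
            continue
        dominated = any(
            other != form and covers(other, form) and k_other >= k
            for other, k_other in lowered.items()
        )
        if not dominated:
            result[form] = k
    return result


# Assumed representative set for the chains 1110, 1110, 0111, 0111, 0011.
P = {0b1110: 2, 0b0111: 2, 0b0110: 4, 0b0011: 3, 0b0010: 5}
after = delete_transaction(P, 0b0011)
for form, k in sorted(after.items(), reverse=True):
    print(f"[{form:04b}; {k}]")
```

Altering a transaction can then be expressed as one call to this function followed by one addition of the changed chain.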

D. Altering a Transaction
The operation of altering a transaction is equivalent to deleting that transaction and adding a new transaction with the changed content.

E. Example
Given the set S of transactions $\{o_1, o_2, o_3, o_4, o_5\}$ and the set I of items $\{i_1, i_2, i_3, i_4\}$, Figure 4 describes the elements of S. To increase the speed of computation, it is intuitively a good idea to group identical chains/transactions during data preprocessing, before running the algorithms.

Form    Frequency
1110    2
0111    2
0011    1
Fig. 5. The result after grouping the bit chains of S.
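The grouping step can be sketched with a counter over identical chains. The ungrouped list below is an assumption chosen to be consistent with Fig. 5; the actual order of the transactions in Fig. 4 is not reproduced here.

```python
from collections import Counter

# Assumed ungrouped input: five transactions over four items, as bit chains.
S = ["1110", "0111", "1110", "0011", "0111"]

# Group identical chains so that each distinct form is processed once,
# together with the number of times it occurs.
grouped = Counter(S)

print("Form  Frequency")
for form, frequency in grouped.items():
    print(f"{form}  {frequency}")
```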
The proposed algorithm is tested on two datasets: the Retail data, taken from a real small and medium enterprise, and the T10I4D100K data, taken from the http://fimi.ua.ac.be/data/ website. The first 10,000 transactions of T10I4D100K are run and compared with the 12,189 transactions of the Retail data.

Figure 6 shows the experimental results. The running time and the number of frequent patterns for T10I4D100K are much larger than for Retail. The result shows that the number of frequent patterns in the real data of a store or an enterprise is often small. T10I4D100K is generated using the generator from the IBM Almaden Quest research group, so the transactions fluctuate considerably and the number of frequent patterns increases sharply when a new transaction is added.
The fast increase in the number of frequent patterns leads to a big issue in computation: overflow. Despite the large number and fast growth of frequent patterns, it is easy to prove that the maximum number of frequent patterns cannot exceed a specified value. For example, let M be the maximum number of items in a store and N the maximum number of items which a customer can purchase. The number of frequent patterns in the store is then never larger than a value determined by M and N. This means the number of frequent patterns may increase fast, but not enough to crash the system.
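One bound consistent with this argument is given below. It is stated as an assumption, since the original expression is not reproduced in the text: if every transaction contains at most N of the M items, then any pattern with non-zero support is a non-empty set of at most N items, so the number of such patterns is at most

```latex
\[
\sum_{i=1}^{N} \binom{M}{i} \;\le\; 2^{M} - 1 \qquad (N \le M).
\]
```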
Parallelization is a good way to reduce the running time of the algorithm. Parallelization has been applied in the past to find frequent patterns in huge databases (Gupta and Satsangi 2012). The large number of frequent patterns is one reason for applying parallelization to the NewRepresentative algorithm.
One of the big issues when developing an algorithm for parallel systems is the complexity of the algorithm. Some algorithms cannot be divided into small parts that run simultaneously at separate sites or machines. Fortunately, the NewRepresentative algorithm can be extended to parallel systems easily. The following sections introduce two ways to parallelize the algorithm: Horizontal Parallelization and Vertical Parallelization. The parallelization methods share the resources of several machines and reduce the running time. This is an efficient way to increase the speed of algorithms with high execution complexity.
V. THE PRINCIPLES OF PARALLEL COMPUTING
Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently.
The parallel system for the NewRepresentative algorithm has the following structure. First, fragmentation is carried out: the whole dataset is divided into small, equal fragments. In Horizontal Parallelization the transactions are divided, while in Vertical Parallelization the items are divided. After the data is fragmented properly, all fragments are allocated to various sites of the network. A master site is responsible for fragmenting the data, allocating fragments to sites, and merging the results from the sites into the final result. The sites simultaneously run the NewRepresentative algorithm on the fragments assigned to them and finally send their results to the master site for merging.
VI. HORIZONTAL PARALLELIZATION
Consider a set S of n transactions, and suppose there are k machines located at k separate sites. Horizontal Parallelization (HP) divides the set S into k equal fragments and allocates those fragments to the k sites. The NewRepresentative algorithm is applied to the n/k transactions at each site. After running the algorithm, every site has its own representative set. All representative sets are sent back to the master site for merging.
Merging the representative sets from the sites at the master site is similar to finding the representative set when new transactions are added to the dataset.
PS: the set of representative sets from other sites.
// Output: The new representative set of horizontal parallelization.
1. for each P ∈ PS do
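The fragment, run, and merge steps of Horizontal Parallelization can be sketched as follows. This is an illustration under assumptions, not the author's merging procedure: representative sets are dictionaries from form (integer bit mask) to frequency, the fragments are formed round-robin, the sites are simulated sequentially in one process, and the merge adds up per-fragment supports for every form or pairwise intersection of forms. All function names are hypothetical.

```python
from functools import reduce


def add_transaction(rep, z):
    """Update a representative set (form -> frequency) with one bit chain z."""
    if z == 0:
        return dict(rep)
    candidates = {z: 1}
    for form, k in rep.items():
        q = form & z
        if q:
            candidates[q] = max(candidates.get(q, 0), k + 1)
    updated = dict(rep)
    for form, k in candidates.items():
        updated[form] = max(updated.get(form, 0), k)
    return updated


def build(chains):
    """Representative set of one fragment, built one chain at a time."""
    return reduce(add_transaction, chains, {})


def local_support(rep, pattern):
    """Occurrences of a pattern in one fragment, read off its representative set."""
    return max((k for form, k in rep.items() if (form & pattern) == pattern), default=0)


def merge(rep_a, rep_b):
    """Merge the representative sets of two disjoint fragments."""
    forms = set(rep_a) | set(rep_b)
    forms |= {a & b for a in rep_a for b in rep_b if a & b}
    return {w: local_support(rep_a, w) + local_support(rep_b, w) for w in forms}


def horizontal_parallel(chains, k):
    """Split into k fragments, process each one, then merge at the master."""
    fragments = [chains[i::k] for i in range(k)]          # k nearly equal fragments
    site_results = [build(fragment) for fragment in fragments]
    return reduce(merge, site_results, {})


S = [0b1110, 0b1110, 0b0111, 0b0111, 0b0011]
print(horizontal_parallel(S, 2) == build(S))
```

In a real deployment each call to `build` would run at its own site; only the merge runs at the master.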

Theorem 4:
Horizontal Parallelization method returns the representative set.
Proof: We prove by induction on k.
If k = 1, then the HP method is the NewRepresentative method, hence returns the representative set.
If k = 2, then let the two sites be $S_1$ and $S_2$, and let $(u_1, p_1), \dots, (u_m, p_m)$ be the representative set for $S_1$, and let $(v_1, q_1), \dots, (v_n, q_n)$ be the representative set for $S_2$. We need to show that the HP algorithm gives us the representative set for the union of $S_1$ and $S_2$ ($S_1 \cup S_2$). Let (w, r) be a representative element for $S_1 \cup S_2$. We denote by $(w_1, r_1)$ the restriction of (w, r) to $S_1$, that is, $w_1 = w$ and $r_1$ is the number of times w appears in $S_1$. Similarly we define $(w_2, r_2)$. Then by definition we must have $r_1 + r_2 = r$. Also, there must be one of the representative elements in $S_1$, called $(u_1, p_1)$, such that $u_1 \sqsupseteq w$ and $p_1 = r_1$. In fact, by the definition of representative elements, there must be such a $(u_1, p_1)$ for which $u_1 \sqsupseteq w$ and $p_1 \ge r_1$. We also have a $(v_1, q_1)$ in $S_2$ with $v_1 \sqsupseteq w$ and $q_1 \ge r_2$. Now we must have $p_1 = r_1$ and $q_1 = r_2$, because otherwise w would appear $p_1 + q_1 > r_1 + r_2 = r$ times in $S_1 \cup S_2$. Now when we apply the HP algorithm, we see at least one element (w', r) where $w' = u_1 \cap v_1$, and in particular w' must cover w. Then w' must in fact be w; otherwise, we have an element (w', r) with $w' \sqsupseteq w$ and $w' \ne w$ whose frequency is not smaller than that of w, and hence (w, r) cannot be a representative element. Then we see that (w, r) is produced by the HP algorithm, as wanted.
Assume that we have proved the correctness of the HP algorithm for k sites. We now prove the correctness of the HP algorithm for k + 1 sites. We denote these sites by $S_1, S_2, \dots, S_k, S_{k+1}$. By the induction assumption, we can find the representative set for the k sites $S_1, S_2, \dots, S_k$ using the HP algorithm. Denote by S the union $S_1 \cup S_2 \cup \dots \cup S_k$. Now applying the case of two sites, proved above, to the two sites S and $S_{k+1}$, we see that the HP algorithm produces the representative set for $S \cup S_{k+1}$, that is, the representative set for the union of the k + 1 sites $S_1, \dots, S_k, S_{k+1}$. This completes the proof of Theorem 4.
VII. VERTICAL PARALLELIZATION
Vertical Parallelization is more complex than Horizontal Parallelization. While Horizontal Parallelization focuses on transactions, Vertical Parallelization focuses on items. Vertical Parallelization (VP) divides the dataset into fragments based on items. Each fragment contains a subset of the items. Fragments are allocated to separate sites. Each site runs the NewRepresentative algorithm to find its representative set. The representative sets are sent back to the master site and merged to find the final representative set.
At the master site, the representative sets of the other sites are merged into the representative set of the master site. The algorithm below is run to find the final representative set.
PS: the set of representative sets from other sites.
// Output: The new representative set of vertical parallelization.
1. for each P ∈ PS do
2.     for each m ∈ PM do
3.         flag = 0;
4.         M = ∅ // M: set of used elements in P
5.         for each z ∈ P do
6.             q = m ⇼ z;
7.             if frequency of q ≠ 0 then
Proof: We prove by induction on k.
If k = 1, this is the NewRepresentative algorithm, which gives the representative set.
If k = 2, we let $S_1$ and $S_2$ be the two sites, and let S be the union. Let $R = (\{i_1, \dots, i_n\}, k, \{o_1, \dots, o_k\}) = (I, k, O)$ be a representative element for S. Let $R_1$ be the restriction of R to $S_1$, that is $R_1 = (I_1, p_1, O_1)$, where $I_1$ is the intersection of $\{i_1, \dots, i_n\}$ with the set of items in $S_1$, and $O_1$ is the set of transactions in $R_1$ containing all items in $I_1$. We define similarly $R_2 = (I_2, p_2, O_2)$, the restriction of R to $S_2$. Note that if $I_1 = \emptyset$ then $p_1 = 0$; otherwise $p_1$ must be positive ($p_1 > 0$). Similarly, if $I_2 = \emptyset$ then $p_2 = 0$; otherwise $p_2$ must be positive ($p_2 > 0$). We use the convention that if
Remark that at least one of $I_1$ or $I_2$ is non-empty. Now by definition, there must be a representative element
>= k, and at least one of these is strict. This is a contradiction to the assumption that R is a representative element of S.
We have now proved the correctness of the algorithm for k = 1 and k = 2. Next, assume that we have proved the correctness of the VP algorithm for k sites. We now prove the correctness of the VP algorithm for k + 1 sites. We denote these sites by $S_1, S_2, \dots, S_k, S_{k+1}$. By the induction assumption, we can find the representative set for the k sites $S_1, S_2, \dots, S_k$ using the VP
Divide the items into two segments: {i
The Vertical Parallelization model is applied on 17 machines with the same configuration. Each machine is located at a separate site. 1,000 items are digitized into a 1,000-tuple bit chain. The master site divides the bit chain into 17 fragments (16 60-tuple bit chains and one 40-tuple bit chain).
Users can regularly change the minimum support threshold in order to find acceptable rules based on the number of buyers without rerunning the algorithms.
With accumulated frequent patterns, users can consult the rules at any time.
The algorithm is easy to implement and has low complexity ($nm2^{2m}$, where n is the number of transactions and m is the number of items).
Practically, it is easy to prove that the maximum number of frequent patterns cannot be larger than a specified value. Because the invoice form follows government regulations (the number of categories sold is a constant called r), the number of frequent patterns in the store is never larger than a value determined by r.
This approach is simple to extend to parallel systems. To reduce running time by applying a parallel strategy to this algorithm, the article presented two methods: Vertical Parallelization and Horizontal Parallelization.

Definition 1:
Given 2 bit chains with the same length: $a = a_1a_2 \dots a_m$, $b = b_1b_2 \dots b_m$. a is said to cover b, or b is covered by a, denoted $a \sqsupseteq b$, if $pos(b) \subseteq pos(a)$, where $pos(s) = \{i \mid s_i = 1\}$.
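Definition 1 translates directly into a few lines of code. The sketch below assumes bit chains are written as strings of '0' and '1' with positions numbered from 1; the function names mirror the definition but are otherwise hypothetical.

```python
def pos(s: str) -> set[int]:
    """Positions (1-based) at which the bit chain s has a 1."""
    return {i for i, bit in enumerate(s, start=1) if bit == "1"}


def covers(a: str, b: str) -> bool:
    """True when a covers b, i.e. pos(b) is a subset of pos(a)."""
    if len(a) != len(b):
        raise ValueError("bit chains must have the same length")
    return pos(b) <= pos(a)


print(pos("0110"))
print(covers("1110", "0110"))
print(covers("0110", "1110"))
```

When chains are stored as integers instead, the same test reduces to checking that the bitwise AND of a and b equals b.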

For example, the chains 1110 and 0111 are 2 of the 4 chains participating in creating [0110; 4]. Let $s_1 = 1110$, $s_2 = 0111$ and $p = 0110$; we have p_crd: $s_1, s_2$. Besides, the chain $s_3 = 0011$ does not participate in creating [0110; 4], so p_crd: $!s_3$.

Theorem 3:
Let S be a set of bit chains and let P, obtained by applying the algorithm NewRepresentative to S, be the representative set of S. For an arbitrary $s \in S$, we have: for every $[p; k] \in P$ with p_crd: s, s covers p; and for every $[p'; k'] \in P$ with p'_crd: !s, s does not cover p'.
Proof: Suppose, to the contrary, that Theorem 3 is wrong. There are 2 cases: (1) there is $[p; k] \in P$ with p_crd: s such that s does not cover p, or (2) there is $[p'; k'] \in P$ with p'_crd: !s such that s covers p'. In case (1), s does not cover p. According to the algorithm NewRepresentative, s cannot participate in creating p, so p_crd: !s; hence (1) is wrong. In case (2), s covers p'. According to the algorithm NewRepresentative, s has to participate in creating p', so p'_crd: s; hence (2) is wrong.

Now applying the case of two sites, proved above, to the two sites S and $S_{k+1}$, we see that the VP algorithm produces the representative set for $S \cup S_{k+1}$, that is, the representative set for the union of the k + 1 sites $S_1, \dots, S_k, S_{k+1}$. This completes the proof of Theorem 5.
Example: Given the set S of transactions $\{o_1, o_2, o_3, o_4, o_5, o_6, o_7\}$ and the set I of items $\{i_1, i_2, i_3\}$.

Fig. 10. The result when running the algorithm on a single machine.

Fig. 11. The result of running the algorithm in sites.
After running the algorithm in sites, the representative sets are sent back to the master site for merging. The merging time is 27.5275745 s and the number of final frequent patterns is 139,491. So, the total time of parallelization is: max(Running Times) + Merging Time = 21.4922293 s + 27.5275745 s = 49.0198038 s. It is easy to see the efficiency and accuracy of the Vertical Parallelization method.
IX. CONCLUSION
The proposed algorithms (NewRepresentative and NewRepresentative_Delete) handle the cases of adding, deleting, and altering in the context of mining data without rescanning the database.
The set P obtained represents the context of the problem. In addition, the algorithm allows the context to be segmented so that it can be solved part by part.

Definition 2:
+ Let u be a bit chain and k a non-negative integer; we say [u; k] is a pattern.
Let S be a set of size-m bit chains with representative set P; then no two elements of P coincide with each other.
Maximal Rectangle: If we present the set S in the form of a matrix with each row being an element of S, intuitively we can see that each element [u; k] of the representative set P forms a maximal rectangle with maximal height k.