Concurrent Edge Prevision and Rear Edge Pruning Approach for Frequent Closed Itemset Mining

Past observations have shown that a frequent item set mining algorithm should mine only the closed item sets, because doing so yields a compact yet complete result set and higher efficiency. However, the latest closed item set mining algorithms follow a candidate maintenance-and-test paradigm, which is expensive in both runtime and space usage when the support threshold is low or the item sets become long. Here we present CEG&REP, an efficient algorithm for mining closed sequences without candidate maintenance. It implements a novel sequence closure checking model by constructing a graph structure that is built by an approach labeled "Concurrent Edge Prevision and Rear Edge Pruning", referred to in brief as CEG&REP. An extensive evaluation on sparse and dense real-life data sets showed that CEG&REP performs better than older algorithms: it consumes less memory and is faster than the algorithms frequently cited in the literature.


I. INTRODUCTION
Sequential item set mining is an important task with many applications, including market, customer and web log analysis and item set discovery in protein sequences. Efficient mining techniques have been studied extensively, including general sequential item set mining [1,2,3,4,5,6], constraint-based sequential item set mining [7,8,9], frequent episode mining [10], cyclic association rule mining [11], temporal relation mining [12], partial periodic pattern mining [13], and long sequential item set mining [14]. It has recently become widely accepted that, when mining frequent item sets, one should mine only the closed ones, since this leads to a compact yet complete result set and to higher efficiency [15,16,17,18]. Unlike frequent item set mining, however, there are few methods for mining closed sequential item sets. This is because of the difficulty of the problem; CloSpan is the only algorithm of this kind [17]. Like the frequent closed item set mining algorithms, it follows a candidate maintenance-and-test paradigm: it maintains a set of already mined closed sequence candidates, which is used to prune the search space and to verify whether a newly found frequent sequence is closed. Unfortunately, a closed item set mining algorithm under this paradigm scales poorly with the number of frequent closed item sets, because many frequent closed item sets (or just candidates) consume memory and enlarge the search space for the closure checking of new item sets. This happens when the support threshold is low or the item sets become long.
Finding a way to mine frequent closed sequences without candidate maintenance appears to be difficult. Here, we present a solution that leads to an algorithm, CEG&REP, which can efficiently mine the complete set of frequent closed sequences through a Sequence Graph protruding approach. In CEG&REP, we need not keep track of any historical frequent closed sequence for the closure checking of a new pattern. This leads to the proposal of a Sequence Graph edge pruning technique and other optimization techniques.
The observations show the performance of CEG&REP in finding closed frequent itemsets using a Sequence Graph. The comparative study indicates some interesting performance improvements over BIDE and other frequently cited algorithms.
In Section II, the most frequently cited work and its limits are explained. In Section III, the dataset adoption and formulation are explained. In Section IV, CEG&REP is introduced and its use for Sequence Graph protruding is explained. In Section V, the algorithms used in CEG&REP are described. In Section VI, the results of a comparative study are summarized, followed by the conclusion of the study.

II. RELATED WORK
The sequential item set mining problem was first introduced by Agrawal and Srikant, who also developed a refined algorithm, GSP [2], based on the Apriori property [19]. Since then, many sequential item set mining algorithms have been developed to improve efficiency, among them SPADE [4], PrefixSpan [5], and SPAM [6]. SPADE is based on a vertical id-list format and uses a lattice-theoretic method to decompose the search space into many small spaces. PrefixSpan, on the other hand, uses a horizontal dataset representation and mines the sequential item sets with the pattern-growth paradigm: it grows a prefix item set into longer sequential item sets by building and scanning its projected database. Both SPADE and PrefixSpan substantially outperform GSP. SPAM is a more recent algorithm for mining long sequential item sets and uses a vertical bitmap representation. Its evaluation shows that SPAM is more efficient at mining long item sets than SPADE and PrefixSpan, but that it consumes more space than both.
Since the introduction of frequent closed item set mining [15], many efficient frequent closed item set mining algorithms have been proposed, such as A-Close [15], CLOSET [20], CHARM [16], and CLOSET+ [18]. Most of these algorithms must maintain the already mined frequent closed item sets in order to perform item set closure checking. To reduce the memory usage and the search space for item set closure checking, two algorithms, TFP [21] and CLOSET+, use a compact two-level hash-indexed result-tree structure to store the already mined frequent closed item set candidates. Some of the pruning methods and item set closure checking methods introduced there can also be extended to optimize the mining of closed sequential item sets.
CloSpan is a more recent algorithm for mining frequent closed sequences [17]. It follows the candidate maintenance-and-test method: it first generates a set of closed sequence candidates, stored in a hash-indexed result-tree structure, and then performs post-pruning on it. It uses pruning techniques such as Common Prefix and Backward Sub-Item set pruning to prune the search space. Because CloSpan must maintain the set of closed sequence candidates, it consumes much memory and faces a large search space for item set closure checking when there are many frequent closed sequences. As a result, it does not scale well with the number of frequent closed sequences. BIDE [26] is another closed pattern mining algorithm and ranks high in performance compared with the other algorithms discussed. BIDE projects the sequences and, after projection, prunes the patterns that are subsets of current patterns if and only if the subset and the superset have the same support. However, this model performs projection and pruning sequentially, and this sequential approach can become expensive when the sequence length is considerably high. In our earlier work [27] we discussed some other interesting approaches published in the recent literature.
Here, we introduce Sequence Graph protruding based on edge projection and pruning, an asymmetric parallel algorithm for finding the set of frequent closed sequences. The contributions of this paper are: (A) An improved sequence-graph-based idea is developed for mining closed sequences without candidate maintenance, termed Concurrent Edge Prevision and Rear Edge Pruning (CEG&REP) based Sequence Graph Protruding for closed itemset mining. Edge projection is a forward approach that grows for as long as an edge with the required support is possible, and edges are pruned during that time. During this pruning process, the vertices of an edge whose support differs from that of the next projected edge are considered a closed itemset; likewise, a sequence of vertices connected by edges with the same support, from which no further projection is possible, is considered a closed itemset. (B) Within the edge projection and pruning based Sequence Graph Protruding for closed itemset mining, we develop algorithms for Edge Prevision and Rear Edge Pruning. (C) The performance study indicates that the proposed model is highly efficient: it can be more than an order of magnitude faster than CloSpan while using order(s) of magnitude less memory in several cases. It also scales well with the database size. Compared with BIDE, the model is found to be equivalent, and its advantage grows in proportion to the increase in pattern length and data density.

III. DATASET ADOPTION AND FORMULATION
Item set 'I': the set of diverse elements from which the sequences are generated.
$p(e_i)$: a transaction for which the usage of element $e_i$ is true.
'S': the set of sequences.
't': the total number of sequences; its value is volatile.
$s_j$: a sequence that belongs to 'S'.
Subsequence: a sequence $s_p$ of the sequence set 'S' is considered a subsequence of another sequence $s_q$ of 'S' if all items of $s_p$ occur in $s_q$ as an ordered list. This can be formulated as: if every item of $s_p$ appears in $s_q$ in the same order, then $s_p \sqsubseteq s_q$, where $s_p, s_q \in S$.
Total support 'ts': the number of sequences in the sequence set 'S' in which a sequence occurs as an ordered list is adopted as the total support 'ts' of that sequence. The total support of a sequence $s_t$ can be determined by the following formulation, where 'S' is the set of sequences and $ts(s_t)$ is the number of super-sequences of $s_t$:
$ts(s_t) = |\{\, s_j \in S : s_t \sqsubseteq s_j \,\}|$
Qualified support 'qs': the quotient of the total support divided by the size of the sequence database is adopted as the qualified support 'qs'. The qualified support can be found by the following formulation:
$qs(s_t) = ts(s_t) / |S|$
Sub-sequence and Super-sequence: A sequence is a sub-sequence of its next projected sequence if both sequences have the same total support. Conversely, a sequence is a super-sequence of the sequence from which it was projected if both have the same total support.

Throughout, a sequence qualifies only if its total support 'ts' is at least 'rs', where 'rs' is the required support threshold given by the user.
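The support definitions above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function names `is_subsequence`, `total_support` and `qualified_support` and the toy database `db` are hypothetical and do not come from the paper, and a subsequence is assumed to be an ordered but not necessarily contiguous match.

```python
def is_subsequence(sp, sq):
    """Return True if all items of sp occur in sq in the same order.

    Assumption: the match may skip items of sq (non-contiguous).
    """
    it = iter(sq)
    # 'item in it' advances the iterator, so order is enforced.
    return all(item in it for item in sp)


def total_support(seq, database):
    """Total support 'ts': number of sequences that contain seq as an ordered list."""
    return sum(1 for s in database if is_subsequence(seq, s))


def qualified_support(seq, database):
    """Qualified support 'qs': total support divided by the database size."""
    return total_support(seq, database) / len(database)


# Hypothetical toy sequence database (each string is one sequence of items).
db = ["abcd", "acd", "abd", "bd"]

print(total_support("ad", db))      # 3
print(qualified_support("ad", db))  # 0.75
```

In this toy database the sequence "ad" occurs as an ordered list in three of the four sequences, so a user threshold 'rs' of 3 on the total support would accept it, while a threshold of 4 would not.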

A. Preprocess:
As the first stage of the proposal, we perform dataset preprocessing and itemset database initialization. We find the itemsets with a single element and, in parallel, prune those single-element itemsets whose total support is less than the required support.
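A minimal sketch of this preprocessing stage is given below, assuming that the required support 'rs' is an absolute count compared against the total support. The names `count_items` and `preprocess` and the toy database are hypothetical, and the thread pool merely stands in for the parallel step mentioned above; it is not the authors' implementation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_items(sequence):
    """Count each distinct item once per sequence.

    A sequence contributes at most 1 to the total support of an item.
    """
    return Counter(set(sequence))


def preprocess(database, rs):
    """Return the single-element itemsets whose total support is at least rs."""
    totals = Counter()
    with ThreadPoolExecutor() as pool:
        # The sequences are scanned concurrently; partial counts are merged here.
        for partial in pool.map(count_items, database):
            totals.update(partial)
    # Prune the single-element itemsets whose total support is below rs.
    return {item: ts for item, ts in totals.items() if ts >= rs}


# Hypothetical toy sequence database.
db = ["abcd", "acd", "abd", "bd"]
frequent_single = preprocess(db, rs=3)
print(sorted(frequent_single.items()))  # [('a', 3), ('b', 3), ('d', 4)]
```

The item 'c' occurs in only two sequences and is therefore pruned for rs = 3.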

B. Edge Prevision:
In this phase, we select all itemsets from the given itemset database as input, in parallel. We then project edges from each selected itemset to all possible elements. The first iteration includes the pruning process in parallel; from the second iteration onwards this pruning is not required, which we claim makes the process more efficient than similar techniques such as BIDE. In the first iteration, we project an itemset spawned from a selected itemset of the itemset database and an element taken from 'I'. If the total support of the projected itemset is greater than or equal to the required support 'rs', an edge is defined between the selected itemset and the element. If the total support is less than 'rs', the projected itemset is pruned. This pruning is required in, and limited to, the first iteration only.
From the second iteration onwards, we project the itemset spawned from each previously projected itemset to each element of 'I'. An edge is defined between the previously projected itemset and the element if the total support of the new projection is greater than or equal to 'rs'. Here, the previously projected itemset is one that was projected in the previous iteration and is eligible as a sequence. The following validation is then applied to find closed sequences.
If the total support of the newly projected edge differs from that of the edge it extends, the extended (rear) edge is pruned; all resulting disjoint graphs, except the one that is still being projected, are considered closed sequences, moved into the set of closed sequences, and removed from memory.
If the total support is the same and no further projection is spawned thereafter, the sequence is considered a closed sequence, moved into the set of closed sequences, and removed from memory.
This process continues as long as memory holds elements that are connected through direct or transitive edges and are projecting itemsets, i.e., until the graph becomes empty.
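The projection and validation rules of this phase can be approximated by the following Python sketch. It is a sequential simplification under several assumptions, not the CEG&REP algorithm itself: it keeps a frontier of sequences instead of an explicit graph, it inspects only elements appended at the end of a sequence, it assumes rs is at least 1, and the names `mine_forward_closed`, `total_support` and `is_subsequence` are hypothetical.

```python
def is_subsequence(sp, sq):
    """True if all items of sp occur in sq in the same order (gaps allowed)."""
    it = iter(sq)
    return all(item in it for item in sp)


def total_support(seq, database):
    """Number of database sequences that contain seq as an ordered list."""
    return sum(1 for s in database if is_subsequence(seq, s))


def mine_forward_closed(database, rs):
    """Report sequences that no appended element extends with equal support.

    Assumes rs >= 1 so that the projection terminates.
    """
    items = sorted({e for s in database for e in s})
    # First iteration: single-element itemsets, pruned against rs.
    frontier = [((e,), total_support((e,), database)) for e in items]
    frontier = [(seq, ts) for seq, ts in frontier if ts >= rs]
    frequent_items = [seq[0] for seq, _ in frontier]
    closed = []
    while frontier:  # continue until nothing is left to project
        next_frontier = []
        for seq, ts in frontier:
            has_equal_child = False
            for e in frequent_items:
                child = seq + (e,)
                child_ts = total_support(child, database)
                if child_ts >= rs:  # an edge seq -> e is defined
                    next_frontier.append((child, child_ts))
                    if child_ts == ts:
                        has_equal_child = True
            # Support changes on every projected edge, or no edge exists:
            # the sequence is emitted and dropped from the frontier.
            if not has_equal_child:
                closed.append((seq, ts))
        frontier = next_frontier
    return closed


db = ["abcd", "acd", "abd", "bd"]  # hypothetical toy database
print(mine_forward_closed(db, rs=3))
# [(('d',), 4), (('a', 'd'), 3), (('b', 'd'), 3)]
```

Because this simplification inspects only appended elements, it may report sequences that a complete closure check would reject; with rs = 2 on the same toy database it also reports ('c', 'd'), although ('a', 'c', 'd') has the same total support.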

1) Algorithm used in CEG&REP:
This section describes the algorithms for initializing the sequence database with single-element sequences, spawning itemset projections, and pruning edges from the Sequence Graph SG.

Algorithm 1: Concurrent Edge Prevision to build graph structure and Rear Edge Pruning
This section provides evidence for the following claims: 1) CEG&REP is comparable to BIDE, a closed sequence mining algorithm that significantly outperforms other algorithms such as CloSpan and SPADE. 2) Its memory utilization is lower and its speed higher than those of CloSpan, which is again similar to the behaviour of BIDE. 3) The concurrent edge prevision and rear edge pruning of CEG&REP improve speed and reduce the memory utilization rate. These claims are based on our observations, which indicate that the CEG&REP implementation performs notably better, in particular when compared with BIDE.

A. Dataset Characteristics:
Pi is a very dense dataset, from which a huge number of frequent closed sequences can be mined even at a high support threshold of around 90%. It contains 190 protein sequences and 21 distinct items. This dataset has been used to assess the reliability of functional inheritance. Fig. 5 shows the sequence length distribution of the dataset.
Compared with the other frequently cited models such as SPADE, PrefixSpan and CloSpan, BIDE has established itself as the preferable closed pattern mining model. We therefore study the memory consumption and runtime of BIDE in comparison with CEG&REP. The difference in memory usage between CEG&REP and BIDE is clearly observable, with CEG&REP consuming less memory than BIDE.
V. CONCLUSION
It has been shown theoretically and experimentally that closed pattern mining yields a more compact result set and considerably better efficiency than frequent pattern mining, even though both have the same expressive power. Our study verified that this usually holds when the number of frequent patterns is considerably large, in which case the number of frequent closed patterns is large as well. However, the previously developed closed pattern mining algorithms depend on the historical set of frequent closed patterns, which is used to verify whether a newly found frequent pattern is closed or whether it invalidates some already mined closed patterns. This not only leads to considerably high memory usage but also to inefficiency caused by the growing search space for pattern closure checking. This paper proposes a novel algorithm for mining frequent closed sequences with the help of a Sequence Graph. It avoids the curse of the candidate maintenance-and-test paradigm, manages memory space efficiently, and checks frequent pattern closure in an efficient way, while consuming less memory than the previously developed closed pattern mining algorithms. It does not need to maintain the set of already mined closed patterns and hence scales well with the number of frequent closed patterns. CEG&REP adopts a Sequence Graph and is able to output the frequent closed patterns in an online fashion. The effectiveness of the design is demonstrated by an extensive set of experiments on several real datasets with varied distribution characteristics. CEG&REP is efficient in terms of speed and memory usage in comparison with the BIDE and CloSpan algorithms. It scales linearly with the number of sequences. Many studies have shown that constraints are essential for many sequential pattern mining algorithms. Future work includes proposing an inference approach for improving rule coherency on predictable itemsets.
'I' is the set of diverse elements. Sequence set 'S': a set of sequences in which each sequence contains elements, and each element 'e' belongs to 'I' and is true for a function p(e). A sequence can be formulated as $s = \langle e_1, e_2, \dots, e_m \rangle$, which represents a sequence 's' of items that belong to the set of distinct items 'I'. 'm': the total number of ordered items.

Fig 1: Concurrent Edge Prevision to build graph structure and Rear Edge Pruning


www.ijacsa.thesai.org
The CEG&REP and BIDE algorithms were implemented in Java 1.6_20. A workstation equipped with a Core 2 Duo processor, 2 GB of RAM and Windows XP was used to evaluate the algorithms. The parallel model was realized using Java threads.

Fig 3: A comparison report for Runtime

Fig 5: Sequence length and number of sequences at different thresholds in Pi dataset