Association Rule Hiding Techniques for Privacy Preserving Data Mining : A Study

Association rule mining is an efficient data mining technique that recognizes the frequent items and associative rule based on a market basket data analysis for large set of transactional databases. The probability of most frequent data item occurrence of the transactional data items are calculated to present the associative rule that represents the habits of buying products of the customers in demand. Identifying associative rules of a transactional database in data mining may expose the confidentiality and privacy of an organization and individual. Privacy Preserving Data Mining (PPDM) is a solution for privacy threats in data mining. This issue is solved using Association Rule Hiding (ARH) techniques in Privacy Preserving Data Mining (PPDM). This research work on Association Rule Hiding technique in data mining performs the generation of sensitive association rules by the way of hiding based on the transactional data items. The property of hiding rules not the data makes the sensitive rule hiding process is a minimal side effects and higher data utility technique. Keywords—Association rule mining; transactional data; privacy preservation; Association Rule Hiding (ARH); Privacy Preserving Data Mining (PPDM)


INTRODUCTION
Data belongs to a person or an organization may have different sensitive levels.These data are made available only for authorized persons.So ensuring the protection of sensitive data by access restriction is not a complete method.This may affect the utility of the data mining result and with help of the knowledge the user may re-identify sensitive data items from non-sensitive data is known as Inference Problem.The privacy preserving data mining is to provide a solution for protecting sensitive information by developing a data mining techniques which could be applied on databases without affecting the accuracy of data mining result and without violating the privacy of individuals is the motivation for this research.
Data mining is the method of determining patterns in large data sets with artificial intelligence, machine learning, statistics and database systems.The aim of data mining process is to extract information from a huge volume of data set to have logical structural representation of the data item in the transactional database.It is utilized to mine significant and useful information or knowledge from large database.Protected or private information extracted by data mining methods leads to the risk of threats to privacy.Association rule mining is a technique in data mining to recognize the regularities created in large volume of data.The method is cooperated by allowing third party to recognize and disclose hidden private information for an individual or organization.
Privacy-preserving data mining with association rule denotes the area of data mining that looks to preserve sensitive information from unnecessary or unlawful disclosure.Privacy information comprises personal or confidential information in business like social security numbers, home address, credit card numbers, credit ratings, purchasing behavior, medical records and best-selling services.The privacy preservation data mining requires guarantee for hiding of sensitive information in efficient manner.The association rule hiding technique protects the sensitive data indirectly under the scanner.Also it fails to hide data items which are not sensitive.It affects the privacy of rules and the utility of the data mining results.This paper is organized as follows: Section 2 discusses survey with existing techniques of Association Rule Hiding (ARH) for Privacy Preserving Data Mining (PPDM), Section 3 shows the Association Rule Hiding (ARH) for Privacy Preserving Data Mining (PPDM), Section 4 identifies the possible comparison between them, Section 5 discusses about the limitations of the existing techniques and Section 6 concludes the paper, key areas of research is given for improving the selection of sensitive rules for enhancing the business transactions.It also preserves the association rules for maintaining the privacy in database.

II. LITERATURE SURVEY
Privacy-Preserving Data Mining of Association Rules from Outsourced Transaction Databases technique [3] is developed with an encryption scheme.Encryption/Decryption (E/D) model was used to change the client data before it is shipped to the server.But, the mined results are not intended for sharing and remain exposed.Attack able to identify the intricacies of the rule preservation and data item property supports are not true supports.To Secure Association Rules, Secure Multi-party Computation (SMC) algorithm [4] is introduced to hide the association rules in a horizontally distributed database.The combination of private item subsets is calculated using SMC algorithm.Though, secure protocol is not relying on the commutative encryption and transfer.The SMC algorithm fails to secure the transaction items.
A perturbation-based PPDM with Multilevel Trust (MLT-PPDM) [5] is developed to preserve the privacy of data and association rules at different levels.This method preserved multiple perturbed copies, data miner perform resistance to diversity attacks and reconstruct the original data more accurately.MLT-PPDM permits the data owners for designs perturbed copies of data for different trust levels.However, the data set does not re-anonymize after it is updated with insertions and deletions.Efficiently Hiding Sensitive Item set with Transaction Deletion Based on Genetic Algorithms [2] is planned to enhance the chosen transactions deleted, so minimizing the side effects in Privacy-preserving data mining (PPDM) technique.But, predefined item set and a missing item set are non-sensitive item set that affect the rules being disclosed.
A Hiding Sensitive Association Rules [1] with Limited Side Effects are designed for PPDM.Heuristic method is involved for raising the hidden sensitive rules quantity in hiding sensitive association rule.The side effects minimized are not taken for correlation among rules that wipes out the creativity of the association rule.Privacy preserving data mining attains data mining goals without showing the privacy information of the individuals to the public users.A novel Hiding-Missing-Artificial Utility (HMAU) algorithm [6] is designed for hiding the sensitive itemsets during the transaction deletion process.Privacy preserving data mining (PPDM) is presented to hide the sensitive information.HMAU algorithm reduces the side effects through transaction depletion and the transaction with minimal HMAU value removed from the database.But, the noise addition and data modification are the significant problems to hide the sensitive information in PPDM.

III. ASSOCIATION RULE HIDING TECHNIQUES FOR PRIVACY PRESERVATION
Privacy Preserving Data Mining (PPDM) is used to extract relevant knowledge from large amount of data and protects the sensitive information from the data miners simultaneously.Privacy preserving data mining is a hot spot in data mining.Privacy Preserving Data Mining (PPDM) solves the issues of designing accurate models about combined data without access to exact information in individual data record.Association Rule Hiding is a PPDM technique use with Association Rule Mining method in transactional database.
An itemset is a set of products and transaction maintains simultaneously for a given set of items.The support of an itemset I in a transaction database is the percentage of transactions having I in the whole database.An itemset is frequent when the support is higher than a minimum support threshold (MST).
For two itemsets X and Y where  ∩  = ∅ .The confidence of an association rule  → is the probability that number of times Y occurs given that X occurs is equal to  ∪ divided by  .When  →  holds in the database if  ∪  is frequent and its confidence is higher than a minimum confidence threshold (MCT).This rule is called the strong association rule.Association rule mining is used to discover all strong rules in the database.

A. Association Rule Mining using Selection Technique for
Sensitive Rules Privacy-preserving data mining (PPDM) is designed to minimize the privacy threats.Privacy threats are decreased by sensitive information hiding process from databases.These types of data having the confidential information result in the privacy threats when the data gets misused.Heuristic methods are used to choose the suitable data for sanitization to hide the sensitive information.In hiding the sensitive information process, side effects of missing cost and artificial cost are created.The beneficial method is used to choose the hidden sensitive information based on the NP-hard problem in sanitization process.
An extensive work on privacy-preserving data mining (PPDM) is carried out in various contexts.A general characteristic of frameworks is the patterns mined from the data that are planned to distribute with parties other than the data owner.The significant difference between the work and issues is: both the fundamental data and the mined results are not planned for sharing and stays private to the data owner.A conservative frequency-based attack model [3] is used where the server recognizes the correct set of items in the owner's data.Furthermore, it recognizes the exact support for all items in the original data.The client/owner encrypts the data using encrypt/decrypt (E/D) module which is considered as a black box from its viewpoint.Encrypt/decrypt (E/D) module is used for converting the input data into an encrypted database.On the other hand, the server performs data mining operations and transmitted the patterns in the encrypted form to the owner of the data.The encryption scheme contains property where the revisited supports are not true supports.The E/D module regains the true individuality of the returned patterns and the true supports.An encryption scheme is named as RobFrugal.It is used to change the client data before it send to the server.
An alternative protocol is designed in [4] using simplicity, efficiency and privacy.Particularly, protocol fails in depending on commutative encryption and oblivious transfer.The solution is not completely secured.It gives large information to a small number of feasible combinations not same as protocol that discloses information.

B. Rule hiding for Privacy Preservation
The association rule hiding technique is to remove the sensitive rules from the transactional database during association rule mining.ARH technique protects sensitive data items by concealing the sensitive rules from miners and discloses all the non-sensitive rules to the miners.Data perturbation is used by Privacy Preserving Data Mining (PPDM) approach takes single-level trust on data miners.The technique establishes the ambiguity regarding individual values than the data released to the third parties for data mining purposes.In single trust level assumption, a data owner creates disturbed copy of its data with an amount of uncertainty.This assumption is restricted in many functions where a data owner trusts the data miners at various levels.An innovative dimension of Multi-Level Trust (MLT) [5] contains new demands for perturbation based PPDM.In contradiction to the single-level trust situation where only one perturbed copy is released and several perturbed copies of the similar data is presented for the data miners at various trusted levels.
The additional trust in data miner resulted in the less perturbed copy access.It also contains the access to the perturbed copies exist at lower trust levels.Additionally, data miners access multiple perturbed copies in forms.With diversity maintained across perturbed copies, the data miner on the other hand produced an exact reconstruction of the original data than permitted by the data owner.It is known as the diversity attack.It comprises the colluding attack situation where adversaries join their copies to increase an attack.It also incorporates the situation where an adversary uses public information to execute the attack by themselves.Preventing diversity attacks is the significant issue in solving the MLT-PPDM problem.
A compact prelarge GA-based (cpGA2DT) algorithm is designed in [2] to perform hiding operation of the sensitive itemsets while deleting transaction.The designed algorithm solves the issues of the evolutionary process by implementing both the compact GA-based (cGA) mechanism and the prelarge concept.A fitness function that was flexible in nature was structured using three adjustable weights to identify suitable transactions deleted to securitize the sensitive itemsets with minimal side effects of hiding failure, missing cost and artificial cost.A GA algorithm minimizes the memory needs by not taking the crossover and mutation operations but mimic the performances of traditional GAs.

C. Association Rule Hiding Techniques with Minimal Side Effects
The common technique of PPDM is to sanitize the database for hiding the information that is sensitive.A novel hiding-missing-artificial utility (HMAU) algorithm is designed in [6] to hide sensitive itemsets during transaction deletion.The transaction through the higher ratio of sensitive to non-sensitive one is chosen to delete.
In order to hide sensitive itemsets, three side effects were considered known as hiding failures, missing itemsets and artificial itemsets.Data sanitization is used to hide the sensitive knowledge from reveal in PPDM.To reduce the side effects, minimal distortion of the databases is required.
The transactions with any of the sensitive itemset are designed to locate the minimal HMAU values between transactions.The transaction with minimal HMAU value is directly taken away from the database.The process gets iterated till all sensitive itemsets are hidden.To avoid exposing hidden sensitive itemsets, the minimum count is modernized in the deletion process.A new heuristic method is designed in [1] that changes few transactions in the transaction database to reduce the supports or confidences of sensitive rules without any higher side effects.Connection between rules is not apparent to attain the goal.Heuristic methods are used for incrementing the hidden sensitive rules and minimize the number of modified entries.Rejected side effects are removed in the rule hiding process.The complete sensitive rules are hidden without unauthentic rules that are falsely created.

IV. COMPARISON OF ASSOCIATION RULEHIDING TECHNIQUES FOR PRIVACY PRESERVATION & SUGGESTIONS
In order to evaluate the privacy perseveration using association rule hiding, number of data is taken to execute the experiment.Various parameters are used to calculate the privacy preserving in association rule hiding of the data mining techniques.

A. Privacy Preserving Level
Privacy preserving level is described as the level at which the data is privately transacted to the corresponding user without showing to the public users.It also increases the information delivery to the private users.It is measured in terms of percentage (%).

B. Data Utility Rate
Data utility rate is defined as the amount of data utilized for privacy preserving using association rule hiding techniques.It is measured in terms of percentage (%).

=
The data utility rate comparison takes place on existing Privacy-Preserving Mining of Association Rules from Outsourced Transaction Databases (PPMAR-OTD) technique and Secure Multi-party Computation (SMC) algorithm.

C. Efficiency (in terms of Side Effects)
Efficiency is defined as the number of data hided without any side effects to the total number of data given.It is measured in terms of percentage.

𝐸𝑓𝑓𝑖𝑐𝑖𝑒𝑛𝑐𝑦 (%) =
ℎ ℎ         Fig. 3 demonstrates the efficiency of association rule hiding techniques for privacy preservation.The efficiency comparison takes place on existing compact prelarge GAbased (cpGA2DT) Algorithm and Multi-Level Trust Privacy Preserving Data Mining (MLT-PPDM).The experiment shows that MLT-PPDM is 10.34% higher efficient than cpGA2DTAlgorithm.

D. Execution Time
Execution time is defined as the time taken to hide the data with minimum side effects.Execution time is measured in terms of milliseconds (ms).

E. Memory Requirement
Memory requirement is defined as the amount of memory space required forhiding the data using the association rule hiding techniques.It is measured in terms of mega bytes (MB).Fig. 5 illustrates the memory requirement of association rule hiding techniques for privacy preservation.The memory requirement comparison takes place on existing Heuristic Method and Hiding-Missing-Artificial Utility (HMAU) algorithm.The experiment shows that HMAU Algorithm takes 21.59% lesser memory space than Heuristic Method.

F. Hiding Failure Rate (in terms of Side Effects)
Hiding failure rate is defined as the ratio of number of sensitive itemsets before sanitization to the number of sensitive itemsets after sanitization.It is measured in terms of percentage (%).The experiment shows that Heuristic Method has 26.63% lesser hiding failure rate than HMAU Algorithm.Heuristic methods are used to improve the level of hidden sensitive rules quantity.The modified database is used to hide sensitive rules with limited side effects.Efficient mechanisms are needed to increase the speed of the rule hiding process for large databases.The association rules generated from the modified database have item sets does not appear in original transaction database.The side effect minimization fails to retain the correlation among rules on the modified transactional database.Secure Multi-party Computation (SMC) algorithm computes the union of private subsets.
Secure mining of association rules is located in distributed databases in horizontal manner.The leakage information delivers, the protocol of item sets exposed, were insecure.Secure protocol is not based on the commutative encryption and transfer.
MLT-PPDM introduces the flexibility dimension that permits the data owners to make perturbed copies of data for various trust levels.In MLT-PPDM, data miners have an ability to approach several perturbed copies.Multiple perturbed copies and data miners achieves diversity attacks to modernize the original data more correctly.The department fails to have more power in reconstructing the original data with many copies.The data set does not re-anonymized after it is modified with insertions and deletions.Less perturbed copies are not used by data miners at lower trust levels.

A. Related Works
Direct and Indirect Discrimination Prevention method [9] was designed to evaluate the discriminatory frequent item sets between original and modified transactional database during data mining process.Discrimination-free data models are produced from transformed data set without damaging data quality and mining based on single measure.However, discrimination fails to include any measure to remove redundant information.To provide with a minimum extension to the original database, Border based approach with hiding algorithm [11] was designed to present the sensitive knowledge hiding.Border approach provides globally optimal solution for sensitive frequent item set hiding.However, the border approach fails to change the original data set properly, lead to information leakage and redundant frequent item sets.The regenerated frequent patterns were not present in the initial data set.

Locality-Sensitive Hashing (LSH) based Blocking
Approach with a Homomorphic Matching Technique [10] is designed for recognizing the candidate record pairs.The matching of pairs is designed using a basic protocol performing simple distance computations.Matching Technique is used for Privacy-Preserving Record Linkage.Though, it fails to create exact results because of the used anonymization format.Because of improper encoding, the initial distances fails to preserve.In order to obtain higher computational overhead, Local NN-search and Global data reorganization technique [8] is implemented for Sensitive Transactional Data.But, anonymization of personal data is not enough in various applications and the approach is not suitable because of the high dimensionality of the data.
An improved Gaussian Function based Perturbation Technique [7] was designed for preserving privacy of association rules and private data of individuals in an outsourced business transaction database.Gaussian Function based perturbation technique [7] preserved the privacy of association rules generated from the dataset and the sensitive frequent item sets.However, it is highly complex for distributed high volume dataset in cloud environment.A group incremental feature selection algorithm [12] was developed to locate the new feature subset in a short interval of time, when multiple objects are added to a decision table.Incremental feature selection algorithm is derived from information entropy and it manages an effectual as well as well-organized mechanism.Though, the time complexity does not include the computational time of entropies.

B. Future Direction
The future direction of the privacy preservation using association rule hiding techniques needs to handle the confidentiality of sensitive rules in terms of better data utility and optimal side effects on the modified transactional databases.As each user may have different concern over privacy, user-oriented privacy preserving techniques can be developed.Parallel algorithms could be developed to prevent revealing of sensitive association between items and to improve the performance of the algorithm for large and dynamic datasets.Most of the proposed research works are concentrating on side effects and numbers of sensitive rules are hidden from sanitized database.Those are not clearly stated about number of rules are hidden in each iteration, number of levels in multi-level sensitive rule hiding, number of scan needed for the database, computational efficiency in terms of memory and CPU time.In future, these objectives are also being considered and new techniques are to be proposed for hiding the sensitive association rules in privacy preserving data mining.

VI. CONCLUSION
Based on the obtained nature of the survey, existing privacy preservation techniques in data mining using association rule hiding techniques has less privacy preserving level and also involves higher amount of side effects.At the same time, the utility of the data is also very low.As well, it takes higher execution time and so the efficiency gets decreased.The survey shows that while sending the data to the destination, the public user access the data and so the privacy is not maintained when it reaches to the destination.These types of issues decrease the effectiveness of the existing systems.The wide range of experiments on existing techniques calculates the relative performance of several privacy preserving techniques and its limitations.For this reason the new privacy preservation technique using association rules hiding techniques are planned to design.Finally from the result, the research work can be carried out in privacy preservation using association rule hiding techniques to attain minimal side effects with higher data utility.

Fig. 3 .
Fig. 3. Privacy Preserving Level of Association Rule Hiding Techniques for Privacy Preservation Fig. 1 describes the privacy preserving level of association rule hiding techniques for privacy preservation.The privacy preserving level comparison takes place on existingPrivacy-Preserving Mining of Association Rules from Outsourced Transaction Databases (PPMAR-OTD)technique and Secure Multi-party Computation (SMC) algorithm.The experiment shows that SMC Algorithm has 9.37% higher privacy preserving levelthanPPMAR-OTD technique.

Fig. 4 .
Fig. 4. Data Utility Rate of Association Rule Hiding Techniques for Privacy Preservation Fig. 2 explains the data utility rate of association rule hiding techniques for privacy preservation.The experiment shows that PPMAR-OTD technique has 23.53% higher data utility rate than SMC Algorithm.

Fig. 4 Fig. 6 .
Fig. 4 describes the execution time of association rule hiding techniques for privacy preservation.The execution time comparison takes place on existing compact prelarge GAbased (cpGA2DT) Algorithm and Multi-Level Trust Privacy Preserving Data Mining (MLT-PPDM).The experiment shows that cpGA2DT Algorithm consumes 33.86% lesser time for execution than MLT-PPDM.TABLE IV.TABULATION FOR EXECUTION TIME OF ASSOCIATION RULE HIDING TECHNIQUES FOR PRIVACY PRESERVATION

Fig. 6
Fig. 6 shows the hiding failure rate of association rule hiding techniques for privacy preservation.The hiding failure rate comparison takes place on existing Heuristic Method and Hiding-Missing-Artificial Utility (HMAU) algorithm.

Fig. 8 .
Fig. 8. Hiding Failure Rate of Association Rule Hiding Technique for Privacy Preservation V. DISCUSSION ON LIMITATION OF PRIVACY PRESERVATION IN TRANSACTIONAL DATABASE An encryption scheme designed with encryption/decryption module was used to transform client data before it send to the server.An attack model was designed to expose secret information and knowledge for privacy preserving mining.Usually, association rule mining task is performed in a shared privacy-preserving framework.In privacy preserving mining, the mined result is not aimed at sharing and stays as private.Attack model failed to know the details for encryption algorithms.Encryption scheme preserve the support item count values, attacker module can only work based on false support item counts.

TABLE I .
TABULATION FOR PRIVACY PRESERVING LEVEL OF ASSOCIATION RULE HIDING TECHNIQUES FOR PRIVACY PRESERVATION

TABLE II .
TABULATION FOR DATA UTILITY RATE OF ASSOCIATION RULE HIDING TECHNIQUES FOR PRIVACY PRESERVATION

TABLE III .
TABULATION FOR EFFICIENCY OF ASSOCIATION RULE HIDING TECHNIQUES FOR PRIVACY PRESERVATION

TABLE V .
TABULATION FOR MEMORY REQUIREMENT OF ASSOCIATION RULE HIDING TECHNIQUES FOR PRIVACY PRESERVATION Fig. 7. Memory Requirement of Association Rule Hiding Technique for Privacy Preservation

TABLE VI .
TABULATION FOR HIDING FAILURE RATE OF ASSOCIATION RULE HIDING TECHNIQUES FOR PRIVACY PRESERVATION