Root-Cause and Defect Analysis based on a Fuzzy Data Mining Algorithm

Manufacturing organizations have to improve the quality of their products regularly to survive in today’s competitive production environment. This paper presents a method for identifying unknown patterns between the manufacturing process parameters and the defects of the output products, as well as the relationships between the defects. Discovery of these patterns helps practitioners achieve two main goals: first, identification of the process parameters that can be used for controlling and reducing the defects of the output products, and second, identification of the defects that most probably have common roots. In this paper, a fuzzy data mining algorithm is used to discover fuzzy association rules for weighted quantitative data. The application of the association rule algorithm developed in this paper is illustrated with a net making process at a netting plant. After implementation of the proposed method, a significant reduction was observed in the number of defects in the produced nets.

Keywords: Data mining; association rules; defect analysis; fuzzy sets; root cause analysis; quality


I. INTRODUCTION
Manufacturing organizations have to improve the quality of their products regularly in order to survive in today's competitive production environment. The high quality of a product is an important factor for increasing customer satisfaction and market share; therefore, manufacturing organizations should have an extensive understanding of quality to compete in international markets. From the ISO-9000 point of view, quality is "the totality of characteristics of an entity that bear on its ability to satisfy stated and implied needs". Quality improvement means the promotion of standards and the reduction of product defects. A defect is a gap between the expected results and the observed results [1]. Consequently, identifying product defects, determining their causes, and implementing corrective actions to reduce defects are essential tasks for manufacturing organizations.
It is generally difficult to identify the causes of a particular defect because the defect is not the outcome of a single cause, but occurs when a few associated causes combine [2]. There is a close relationship between the occurrence of defects in the products and the manufacturing process parameters; i.e., the malfunction of these parameters can cause defects to occur in the products. The manufacturing process parameters can be categorized as follows: man, machine, material, method, and environment. Controlling these parameters and finding their relationships with the product defects will help Quality Improvement Teams (QIT) reduce and eliminate the defects.
This paper presents a methodology for identifying unknown patterns between the manufacturing process parameters and the defects of the output products. Moreover, it identifies the relationships between the defects. Discovery of these patterns helps practitioners achieve two main goals:
1) Identification of the process parameters that can be used to control and reduce output product defects.
2) Identification of the defects that most probably have common roots.
Since manufacturers usually face large data warehouses of manufacturing processes, data mining techniques can be used to extract useful knowledge from these datasets. Data mining is a discipline that aims at extracting novel, relevant, valuable, and significant knowledge from large databases. Data mining includes several tools such as decision trees, association rule mining (ARM), neural networks, fuzzy sets, and statistical approaches.
In this paper, a data mining algorithm is used to find fuzzy association rules on weighted quantitative data. The values of defects and parameters are expressed as fuzzy values, and the weights of defects and parameters are allocated according to their importance. Because it uses the concepts of fuzzy sets and weights, the proposed technique yields interesting, understandable patterns among the process parameters and output defects.
The rest of this paper is organized as follows. In the next section, related work on fuzzy ARM and root cause analysis is outlined. Section 3 introduces the mathematical approach. Section 4 presents an application of the methodology in a netting plant and discusses how to analyze defects in a net fabrication process using the results obtained from the algorithm. Finally, concluding remarks are given in Section 5.

II. RELATED WORK

A. Fuzzy Association Rule Mining
Association rule mining is a popular data mining technique due to its numerous applications in diverse areas. An association rule is an expression of the form X→Y, where X is a set of items and Y is a single item [3]. To mine an association rule, two numeric values should be calculated: support and confidence. The support of an association rule is the proportion of transactions that contain both the antecedent and the consequent. The confidence of an association rule is the proportion of transactions containing the antecedent that also contain the consequent.
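The two measures can be illustrated with a minimal sketch for crisp (binary) transactions. The function names and the toy transactions below are hypothetical and serve only to illustrate the definitions; they are not taken from the plant data.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Proportion of transactions that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Proportion of transactions containing the antecedent that also
    contain the consequent."""
    ante = frozenset(antecedent)
    both = ante | frozenset(consequent)
    ante_support = support(transactions, ante)
    if ante_support == 0:
        return 0.0  # the rule never fires, so its confidence is taken as 0
    return support(transactions, both) / ante_support


# Hypothetical toy transactions (item names are illustrative only).
transactions = [frozenset(t) for t in (
    {"low_experience", "mesh_deviation"},
    {"low_experience", "mesh_deviation", "net_tear"},
    {"low_experience"},
    {"net_tear"},
)]

rule_support = support(transactions, {"low_experience", "mesh_deviation"})
rule_confidence = confidence(transactions, {"low_experience"}, {"mesh_deviation"})
print(rule_support, rule_confidence)
```

A rule is normally kept only when both values reach the user-defined thresholds, minsup and minconf.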
Agrawal et al. introduced several algorithms for extracting association rules from large databases [3], [4]. Moreover, different methods of association rule mining and their applications have been proposed by other researchers. In many association rule mining algorithms, researchers have considered the relationships between transactions consisting of categorical attributes (categorized items) using binary values. However, transaction data in real-world applications usually consist of fuzzy and quantitative values.
In previous years, some work has been done on the use of fuzzy sets in discovering association rules. Miller and Yang applied Birch clustering to identify intervals and proposed a distance-based association rule mining process, which improves the semantics of the intervals [5]. To solve the qualitative knowledge discovery problem, Au and Chan applied fuzzy linguistic terms to relational databases with numerical and categorical attributes. Later, they proposed the F-APACS method to discover fuzzy association rules [6], [7]. Subsequently, Hong et al. proposed an algorithm for mining fuzzy rules from quantitative data [8]. They transformed each quantitative item into a fuzzy value using membership functions to find fuzzy rules. Fuzzy association rules are easily understandable to people because of the linguistic variables associated with fuzzy sets.
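The transformation of a quantitative value into fuzzy values can be sketched as follows. This is a minimal illustration that assumes triangular membership functions and three hypothetical linguistic regions (low, medium, high) on a 0 to 10 scale; the cited works and the membership functions used later in this paper may differ.

```python
from typing import Dict, Tuple


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership of x in a triangular fuzzy set with feet a and c and peak b.

    Assumes a < b < c.
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical regions: (left foot, peak, right foot) for each linguistic term.
REGIONS: Dict[str, Tuple[float, float, float]] = {
    "low": (-5.0, 0.0, 5.0),
    "medium": (0.0, 5.0, 10.0),
    "high": (5.0, 10.0, 15.0),
}


def fuzzify(x: float) -> Dict[str, float]:
    """Map one quantitative value to its membership in every region."""
    return {name: triangular(x, *abc) for name, abc in REGIONS.items()}


print(fuzzify(3.0))
```

Each quantitative value is thus replaced by one membership degree per linguistic term, and these degrees are what a fuzzy rule mining algorithm aggregates.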
In association rule mining algorithms, the minimum support value (minsup) and the minimum confidence value (minconf) are used to measure the frequency and strength of the rules. In a database, some valuable items may not occur frequently; therefore, they may not be included in the final association rules. To solve this problem, some researchers have suggested reducing the minsup and minconf values to include the rules containing valuable items. However, these rules sometimes fail to comply with user objectives, because many irrelevant rules may be generated. Several approaches have been introduced to avoid this issue. Muyeba et al. used the concept of weight in their algorithm and introduced a fuzzy weighted association rule mining algorithm with weighted support and confidence measures [9], [10]. Gyenesei also used weighted quantitative association rule mining based on a fuzzy approach (FWAR) [11]. However, his proposed algorithm was not suitable due to the data overflow problem. Thus, Olson et al. proposed a method capable of solving this problem [12]. Theirs is one of the most complete and easy-to-use algorithms proposed for identifying association rules on fuzzy weighted data.
During the recent decade, a few researchers have tried to introduce more sophisticated approaches. Lin et al. introduced the Compressed Fuzzy Frequent Pattern Tree (CFFPT) algorithm, which integrates fuzzy-set concepts with an FP-tree-like approach to efficiently find the fuzzy frequent itemsets in quantitative transactions [13]. Moustafa et al. developed a technique named FFP_USTREAM, which integrates fuzzy concepts with ubiquitous data streams and employs a sliding window approach to mine fuzzy association rules [14].

B. Root Cause Analysis
Root cause analysis (RCA) is a process of analysis that defines the problem, explains the causal mechanism underlying the transition from a desirable to an undesirable condition, and identifies the root cause of the problem in order to keep it from recurring [15]. A variety of methods serve as RCA tools: Cause-Effect Diagram, Fault Tree Analysis, Current Reality Tree, 5-Whys, Apollo Root Cause Analysis, Interrelationship Diagram, Barrier Analysis, System Process Improvement Model, Causal Factor Analysis, Event-Causal Analysis, Bayesian Inference, Failure Mode and Effects Analysis, Cause-Effect Matrix, etc.
In the current century, due to developments in intelligence science, some researchers have used data mining methods to analyze defects in manufacturing processes. Donauer et al. utilized a pattern recognition method to find the root causes of failures while considering economic aspects [16]. Al-Salim recommended a data-mining-based methodology to assign quality improvement teams to investigate and eliminate the defects in manufacturing enterprises [17]. In the first stage, related defects are grouped based on an association-rule technique; in the second stage, the groups of defects are allocated to the quality improvement teams based on a mathematical programming model that minimizes the expected quality costs of the quality improvement process. A major deficiency of this algorithm is that it uses only binary datasets of defect occurrences and does not take into account their frequency in each record.
In recent years, some RCA methods have been developed based on ARM techniques. Chen et al. introduced a method using association rule mining techniques to identify the root-cause machine sets that are most likely the sources of defective products [18]. Sadoyan used a kind of association rule based on rough set theory for manufacturing process control [19]. This algorithm extracts knowledge from large data sets obtained from manufacturing processes and represents the knowledge using "if/then" decision rules. The results obtained from the data mining algorithm are then used for controlling the output of the manufacturing process. Lee et al. used the standard ARM algorithm to quantify the causality between defect causes, and social network analysis to find indirect causality among them [20]. Most of these studies use the standard ARM algorithm as an RCA tool. Since expressing the occurrence of defects with fuzzy values conveys more information than expressing it with binary values, in this paper we introduce a novel RCA methodology based on a fuzzy weighted association rule mining algorithm.

III. METHODOLOGY
The procedure for achieving the goals mentioned in the introduction follows a two-stage framework. The first stage determines the process breakdown structure, and the second stage identifies hidden rules from manufacturing process databases using the FWAR algorithm. Then, the obtained rules are analyzed to improve the process.

A. Process Breakdown Structure
In a process, the outputs are a function of the inputs. Likewise, in a manufacturing process, the product defects (outputs) are related to the process parameters (inputs). Therefore, as the first step of the defect analysis process, the input parameters of the manufacturing process should be identified. The parameters that affect product defects can be categorized into the following main groups: man, machine, material, method, and environment. The identified parameters and defects can be displayed in a structure (the process breakdown structure) that helps practitioners gain a better understanding of the process.

B. Relationships Recognition
In this section, we attempt to find hidden relationships between the specified process parameters and defects using Olson's modified FWAR algorithm.

Notation:
n: the total number of data observation records;
m: the total number of parameters;
z: the total number of sub-parameters;
$P_{jk}$: the k-th quantitative sub-parameter of the j-th parameter, where j = 1 to m and k = 1 to z; j = 1 to m-1 are parameters and j = m is a defect;

Algorithm:
Input: n, m, z, the membership function of each item, minsup and minconf; Output: fuzzy association rules.
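Step 1: Transform the quantitative value of each record i, i = 1 to n, for each $P_{jk}$, j = 1 to m, k = 1 to z, into fuzzy membership values $f_{jkt}^{(i)}$ ($1 \le t \le |P_{jk}|$) using the given membership function of $P_{jk}$.
Step 2: Calculate $\mathrm{Sup}(R_{jkt})$, the support value of fuzzy region $R_{jkt}$, for j = 1 to m, k = 1 to z, $1 \le t \le |P_{jk}|$, to form $C_1$, the set of candidate 1-itemsets.
Step 3: If $\mathrm{Sup}(R_{jkt}) \ge$ minsup, then store $R_{jkt}$ in the set of large 1-itemsets ($L_1$).
Step 4: If $L_1$ is not null, then do the next step; otherwise, exit the algorithm.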
Step 5: The algorithm first joins the large itemsets in $L_r$ under the condition that r-1 items in the two itemsets are the same and the other one is different; then, the algorithm retains in $C_{r+1}$ the itemsets for which all the sub-itemsets of r items exist in $L_r$ and which do not have any two items $R_{jkp}$ and $R_{jkq}$ (p≠q) of the same $P_{jk}$; these itemsets are called candidate (r+1)-itemsets.
Step 6: Do the following sub-steps for each newly formed (r+1)-itemset S in $C_{r+1}$: a) calculate the fuzzy value of each record of S; b) calculate the support value Sup(S) and, if Sup(S) ≥ minsup, store S in $L_{r+1}$.
Step 7: If $L_{r+1}$ is null, then do the next step; otherwise, set r = r+1 and repeat Steps 5 to 6.
Step 8: Collect the large itemsets together.
Step 9: Construct association rules for each large q-itemset S with items $S_1, S_2, \ldots, S_q$, q ≥ 2, using the following sub-steps: a) form all possible association rules; b) calculate the confidence value of each rule.
Step 10: Output the rules whose confidence values are equal to or greater than minconf.
From Step 10, three kinds of rules can be obtained:
1) Process Parameter(s) → Defect: for controlling and reducing output product defects, and for root cause analysis.
2) Defect(s) → Defect: for identification of the defects that most likely have common roots.
3) Process Parameter(s) → Process Parameter: for identification of the relations between parameters to help control and reduce defects.
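Steps 1 to 8 can be sketched in a few lines of Python. This is a simplified illustration, not the MATLAB program used in this study. It assumes that the fuzzy value of an itemset in a record is the minimum of the weighted membership values, that the support is the average of these values over all records, and that records are already fuzzified (Step 1). The data layout, function names, weights, and toy records are all hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

Item = Tuple[str, str]          # (sub-parameter or defect, fuzzy region)
Record = Dict[Item, float]      # membership value of each item in one record


def record_value(record: Record, itemset: Iterable[Item],
                 weights: Dict[str, float]) -> float:
    """Assumed fuzzy value of an itemset in one record: min of weight * membership."""
    return min(weights[p] * record.get((p, region), 0.0) for p, region in itemset)


def sup(records: List[Record], itemset: Iterable[Item],
        weights: Dict[str, float]) -> float:
    """Assumed support: average fuzzy value of the itemset over all records."""
    items = tuple(itemset)
    return sum(record_value(rec, items, weights) for rec in records) / len(records)


def mine_large_itemsets(records: List[Record], weights: Dict[str, float],
                        minsup: float) -> Dict[FrozenSet[Item], float]:
    """Return every large itemset together with its support value."""
    supports: Dict[FrozenSet[Item], float] = {}
    # Steps 2-3: candidate 1-itemsets and large 1-itemsets.
    level: List[FrozenSet[Item]] = []
    for item in sorted({item for rec in records for item in rec}):
        s = sup(records, (item,), weights)
        if s >= minsup:
            supports[frozenset([item])] = s
            level.append(frozenset([item]))
    # Steps 4-7: grow itemsets until no large itemset is found.
    while level:
        size = len(level[0]) + 1
        known = set(level)
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) != size:
                continue  # the two itemsets differ in more than one item
            if len({p for p, _ in union}) != size:
                continue  # two regions of the same sub-parameter or defect
            if all(frozenset(sub) in known for sub in combinations(union, size - 1)):
                candidates.add(union)
        level = []
        for cand in candidates:
            s = sup(records, cand, weights)
            if s >= minsup:
                supports[cand] = s
                level.append(cand)
    return supports  # Step 8: all large itemsets collected together


# Hypothetical fuzzified records and weights, for illustration only.
records = [
    {("P12", "low"): 1.0, ("D1", "medium"): 0.8},
    {("P12", "low"): 0.6, ("P12", "medium"): 0.4, ("D1", "medium"): 0.6},
    {("P12", "medium"): 1.0, ("D1", "low"): 1.0},
    {("P12", "low"): 0.8, ("D1", "medium"): 1.0},
]
weights = {"P12": 1.0, "D1": 0.9}
large = mine_large_itemsets(records, weights, minsup=0.3)
for itemset, value in large.items():
    print(sorted(itemset), round(value, 3))
```

Taking the minimum of weighted memberships keeps the support of an itemset no larger than the support of any of its subsets, which is what allows the level-wise pruning in Step 5.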

IV. AN INDUSTRIAL APPLICATION: FISH-NET MANUFACTURING PROCESS
This section presents an application of the introduced algorithm in a fish-net manufacturing plant. As shown in Fig. 1, the fish-net manufacturing process has five major subsections: 1) net making, 2) inspection and repair, 3) dyeing and dehydrating, 4) net stretching, and 5) packing.
The most important step in this process is net making, which is performed by special machines. If the nets produced at this stage have many defects, the cost and time of inspection and repair in the next step will increase. Also, some defects lead to defective nets that cannot be repaired. Net making process parameters such as the performance of the machines, the workers' skills, and the quality of the strings can affect the net defects. Fig. 2 presents some meshes without any deficiencies. This paper focuses on using the algorithm in the net making process to identify unknown rules between the net making process parameters and the defects of the output nets, and also to identify the relationships among the defects. These rules can help practitioners find 1) the causes of the defects that have occurred; 2) interrelated defects with common roots; and 3) process parameters that can be used for controlling and reducing the output net defects.

A. Breakdown Structure of the Net Making Process
First, the breakdown structure of the net making process is defined to provide a standard scheme for stating the process parameters and defects. After consulting some experts, the manager of the net making section provided the breakdown structure and specified the variables that must be considered for recognition of the relationships.

B. Identifying Hidden Relations
After developing the process breakdown structure, we applied the introduced algorithm to find the relationships between the net making parameters and defects. The information on the defects and process parameters is shown in Tables 1 and 2, respectively. In this section, we use only 10 records of the net making process to show the performance of the algorithm in an industrial application (as shown in Table 3). A program developed in MATLAB was used to execute the rule generation algorithm introduced in this paper.
Step 1: The quantitative values in Table 3 are transformed into fuzzy values using the membership functions given in Fig. 3.


Step 3: The process specialists recommended the value of minsup. If $\mathrm{Sup}(R_{jkt}) \ge$ minsup, then $R_{jkt}$ is stored in the set of large 1-itemsets ($L_1$).


Step 4: The $L_1$ set is not null, so we go to the next step.
Step 5: According to the itemsets in $L_1$, the candidate set $C_2$ is generated.
Note that itemsets such as ($R_{422}$, $R_{423}$), which contain categorical classes of the same process parameter or defect, are not retained in $C_2$.

  
The 2-itemsets in $C_2$ whose support values are equal to or greater than minsup are shown in Table 4.
c) The candidate 2-itemsets whose support values are equal to or greater than minsup are stored in $L_2$.


Step 7: Since the $L_2$ set is not null, Steps 5 and 6 are repeated to find $L_3$. $C_3$ is generated from $L_2$.

 
The 3-itemsets in $C_3$ whose support values are equal to or greater than minsup are shown in Table 5. Thus, these 3-itemsets are stored in $L_3$.


The 4-itemsets in $C_4$ and their support values are shown in Table 6. All the support values are less than minsup, so $L_4$ is null. Then, Step 8 begins.
Step 9: All the possible association rules for the itemsets of $L_2$ and $L_3$ and their confidence values are shown in Table 7.
Step 10: The value of minconf was recommended by the process specialists. The association rules whose confidence values are equal to or greater than minconf are the outputs of the algorithm (see Table 8).
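Steps 9 and 10 can be sketched as follows. The sketch assumes the usual definition of confidence, i.e., the support of the whole itemset divided by the support of the antecedent, and rules with a single-item consequent. The function name, item labels, support values, and minconf value are hypothetical and are not the values of Tables 7 and 8.

```python
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], str, float]  # (antecedent, consequent, confidence)


def generate_rules(supports: Dict[FrozenSet[str], float], minconf: float) -> List[Rule]:
    """Form every rule 'itemset minus one item -> that item' and keep the strong ones."""
    rules: List[Rule] = []
    for itemset, sup_all in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs an antecedent and a consequent
        for consequent in itemset:
            antecedent = itemset - {consequent}
            sup_antecedent = supports.get(antecedent)
            if not sup_antecedent:
                continue  # antecedent is not a large itemset (or has zero support)
            conf = sup_all / sup_antecedent  # assumed confidence definition
            if conf >= minconf:
                rules.append((antecedent, consequent, conf))
    return rules


# Hypothetical support values of large itemsets.
supports = {
    frozenset({"P12.low"}): 0.60,
    frozenset({"D1.medium"}): 0.54,
    frozenset({"P12.low", "D1.medium"}): 0.515,
}
rules = generate_rules(supports, minconf=0.9)
for antecedent, consequent, conf in rules:
    print(sorted(antecedent), "->", consequent, round(conf, 3))
```

Because every subset of a large itemset is itself large, the support of each antecedent is already available from the itemset mining stage and no further pass over the records is needed.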

C. Discussion
Association rules discover patterns in a database. Whether or not the rules are meaningful depends on the analyzer's viewpoint. In Table 8, Rule 1 shows a relation between the net making process parameters and the defects. These kinds of rules can help net manufacturers achieve two main goals:
1) Identification of the net making process parameters that can be used for controlling and reducing the output net defects.
2) Root cause identification when a defect occurs.

Consider Rule 1. It means that if the service record of an operator ($P_{12}$) is low, the deviation from expected mesh size defect will occur at a medium level. Rules 2 to 6 show the relations between the net making process parameters; these relations can be used to regulate the process parameters in order to control the net defects. Rules 7 and 8 show the relations between the net defects. These kinds of rules can be used not only to specify the net defects that most probably have common roots but also to identify the net defects that can affect other defects. For example, one of these rules shows that if "net tear" is medium and "knotless" is high, then the deviation from expected mesh size defect will be medium.
In this paper, we used a process dataset consisting of only 10 records to introduce the application of fuzzy weighted association rules in a net making process. Evidently, large databases must be used to obtain useful, valid rules from data mining algorithms. We therefore applied the algorithm during a performance improvement project at a fish-net manufacturing plant with a dataset consisting of 850 records, and the results helped the management gain a better understanding of the process in order to control and reduce net defects and minimize costs. After implementing the method and conducting an improvement meeting, we observed a significant reduction in the rate of defects in the produced nets.

V. CONCLUSION AND RECOMMENDATION FOR FUTURE RESEARCH
This research points out the potential of association rules as a tool for industrial applications, especially in manufacturing processes. In this study, an approach was presented for discovering useful patterns between process parameters and product defects using a fuzzy weighted association rule algorithm. Compared to other association rule algorithms, this algorithm obtains more understandable patterns and more interesting rules by using the concepts of fuzzy sets and weights. The rules obtained from the manufacturing process database can be used for controlling defects and analyzing root causes. An application of the proposed method to a net making process at a netting plant was demonstrated. A detailed discussion on how to control the manufacturing process defects using the results obtained from the algorithm was also presented. After implementing the method during a performance improvement project, we observed a significant reduction in the rate of defects in the produced nets. This work can be applied in various areas. One of our future focuses will be on expanding the use of association rule algorithms as a main part of quality improvement methodologies, such as Six Sigma. The Six Sigma methodology helps improve the process by finding the relations between inputs and outputs and controlling the outputs using the identified relations. Therefore, association rule algorithms can be used as fast, simple tools for finding hidden relations between process variables and expediting the Six Sigma phases.

Notation (continued):
$|P_{jk}|$: the number of fuzzy regions of $P_{jk}$;
$R_{jkt}$: the t-th fuzzy region of $P_{jk}$, $1 \le t \le |P_{jk}|$, called an item;
Sup: the calculated support value of each candidate itemset;
Conf: the calculated confidence value of each large itemset;
minsup: the predefined minimum support value;
minconf: the predefined minimum confidence value;
$C_r$: the set of candidate itemsets with r items;
$L_r$: the set of large itemsets with r items.
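Sub-steps of Step 6 of the algorithm, for each newly formed (r+1)-itemset S with items ($S_1, S_2, \ldots, S_x, \ldots, S_{r+1}$) in $C_{r+1}$, $1 \le x \le r+1$:
a) Calculate the fuzzy value of each record i of S as
$$f_S^{(i)} = \bigwedge_{x=1}^{r+1} W_{S_x} f_{S_x}^{(i)},$$
where $f_{S_x}^{(i)}$ is the membership value of record i in fuzzy region $S_x$ and $W_{S_x}$ is the weight of item $S_x$. If the minimum operator is used for the intersection, then
$$f_S^{(i)} = \min_{1 \le x \le r+1} W_{S_x} f_{S_x}^{(i)}.$$
b) Calculate the support value Sup(S) of S in the records. If Sup(S) ≥ minsup, then store S in $L_{r+1}$.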

Fig. 3. Fuzzy functions of sub-parameters and defects.
Step 6: The following sub-steps are performed for each newly formed candidate 2-itemset.
a) The membership value of each 2-itemset is calculated. For example, consider the ($R_{121}$, $R_{133}$) set. The membership function values for sample 1 are 1 for $R_{121}$ and zero for $R_{133}$.
b) The support value is calculated for each candidate 2-itemset in $C_2$.
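The calculation for the ($R_{121}$, $R_{133}$) set in sample 1 can be written out as a short worked example. It assumes that the minimum operator is used for the intersection; the weight symbols $W_{121}$ and $W_{133}$ are placeholders for the weights of the two items, whose numerical values are not reproduced here.

```latex
% Membership values of sample 1: 1 for R_{121} and 0 for R_{133}.
% W_{121}, W_{133} \ge 0 are placeholder weights (assumed notation).
\[
  f_{(R_{121},\,R_{133})}^{(1)}
    = \min\bigl(W_{121}\cdot 1,\; W_{133}\cdot 0\bigr)
    = \min\bigl(W_{121},\; 0\bigr)
    = 0 .
\]
```

Under this assumption, sample 1 contributes nothing to the support of the 2-itemset whatever the weights are, because one of its two membership values is zero.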

TABLE IV. SUPPORT VALUES GREATER THAN MINSUP

TABLE VI. SUPPORT VALUES

TABLE VIII. FINAL RULES