Data-driven Fault Diagnosis using Principal Component Analysis

Modern industrial systems are growing day by day, and their complexity is increasing likewise. At the same time, design and operation have become a key focus of researchers seeking to improve production systems. To cope with these challenges, data-driven techniques such as principal component analysis (PCA) are widely used to assist working systems, because large quantities of sensor measurements are often available in such industrial systems. Considering modern industrial systems and their economic benefits, fault diagnostic techniques, in particular those that treat process data as the key element, have been studied in depth. In this paper, faults are detected with a data-driven approach using PCA, specifically with the T² and Q (squared prediction error, SPE) statistics. PCA projects high-dimensional data onto a lower-dimensional subspace while retaining most of the important process information. To evaluate the performance of the technique, the Tennessee Eastman chemical plant is considered.
Keywords: Fault Diagnosis; Principal Component Analysis; Multivariate Statistical Approach; Tennessee Eastman Chemical Plant


I. INTRODUCTION
Industrial process management is one of the key and emerging issues in both small and large industrial systems. Modern industrial facilities are large-scale and extremely complex, and process management is carried out through a great number of system parameters. In industries such as manufacturing, there is pressure to produce high-quality end products in bulk quantities. At the same time, it is important to minimize losses due to rejection rates and to satisfy environmental regulations. To fulfill these demands, modern industrial systems include a huge number of parameters operating under closed-loop controllers. For this reason, the data-driven design of systems is of great interest in both industry and academia.
Engineering systems such as aircraft controllers, industrial processes, manufacturing systems, transportation systems, and electrical and electronic systems are becoming more complex, which makes them more prone to failure. Failures directly affect system reliability, availability, safety, and maintainability, which are very important for a good industrial system. Many such systems rely on human effort and availability. To improve the performance of industrial systems, it is necessary to work on automation, which reduces human effort and cost and therefore improves economic conditions. The market demand for automated systems is also increasing. In this research area, different operating constraints and applications of industrial automation need to be studied. In automated systems, it is necessary to implement techniques and policies for fault diagnosis and repair. A fault diagnosis system tries to ensure that the plant is safe by identifying unwanted events, and it highlights key issues that may degrade the overall performance of the system. This information is necessary for plant engineers so that quick action can be taken to keep the industrial system safe. Many techniques are available for performance monitoring and control, and PCA is one of the basic techniques among them.
Section II discusses related work. Section III describes the methodology and implements the technique step by step. In Section IV, the PCA technique is applied to an industrial benchmark process. Section V discusses the results, and Section VI concludes the work and provides guidelines for future work.

II. RELATED WORK
Multivariate statistical methods depend largely on large quantities of historical data to characterize the fluctuations in a process. Multivariate statistical process monitoring (MSPM) is easy to design and makes the analysis of entire process industries simple; because of this property, it is the most popular approach in industrial fault diagnosis systems for detecting abnormal operation. PCA is a technique that can retain the major information and significant knowledge of a dataset generated by an industrial process. Many approaches are available for fault diagnosis, and they use different parameters to detect faults in control systems. For example, the study in [1] addressed fault diagnosis based on neural networks combined with an observer technique. Multivariate statistical approaches have also been studied for process monitoring [2]. PCA dates back to Pearson's work in 1901, and research on the method and its applications is still ongoing.
www.ijacsa.thesai.org
These methods have been successfully implemented in many industrial processes. MacGregor implemented PCA-based process monitoring in both continuous and batch processes and concluded that PCA methods can treat processes with large amounts of correlated data and can easily handle missing data [1]. Raich and Cinar [3] proposed a diagnosis method based on angle discriminants using PCA. Despite some limitations, PCA fits many fields and has been implemented successfully. However, its performance can suffer on nonlinear problems: real systems are mostly nonlinear, whereas PCA is a linear method that considers only linear combinations of the variables. Kramer [4] generalized PCA to the nonlinear case by using a neural network. This matters because chemical engineering applications are mostly nonlinear while the method is linear. PCA has been applied to real systems at DuPont and other companies, as published in many conferences and journals, and several studies have performed similar case work on data collected from process simulators [5] [6]. For ease of interpretation, methods have been proposed that view multidimensional data from different perspectives and plot it in a single dimension [7]; this helps the operator extract information from multidimensional data [8]. In some cases, acquiring multidimensional data is difficult because of nonlinearities, so an automated approach to process monitoring was proposed in [9].
The application of PCA to these types of problems is motivated by three features. First, PCA represents all of the training data in a low-dimensional space, which helps extract meaning from the entire training set while still using every measured dimension. Second, the structure PCA imposes on the data helps identify the affected variables. Third, PCA separates the measurement space into a subspace whose variables carry useful process information and a residual subspace that mainly contains noise. A fault can occur primarily in either subspace [10], and monitoring both increases the sensitivity of process monitoring to faults. As an effective data-driven process monitoring technique, PCA can adapt to complicated conditions according to statistical rules. It is a classical projection method of MSPM: a model is trained under nominal conditions and is then used to detect faults online [11]. Such measurement-based techniques are in high demand [12]. Despite these advantages and the ease of designing the model, outliers in practical industrial data are difficult to handle [13]. Sensor failures, network transmission errors, machine malfunctions, database software errors, and data recording errors are the main causes of irregular, noisy data [14]. In such cases, outliers can be smoothed by mean filtering and averaging [10].
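Smoothing outliers by averaging, as mentioned above, can be illustrated with a centered moving-average (mean) filter. This is a minimal sketch assuming NumPy; the function name `moving_average`, the window length, and the signal values are hypothetical and are not taken from [10].

```python
import numpy as np

def moving_average(signal, window=5):
    """Smooth a 1-D signal with a centered moving-average (mean) filter.

    The signal is padded by reflection at both ends so the output has the
    same length as the input. `window` must be a positive odd integer.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    padded = np.pad(np.asarray(signal, dtype=float), half, mode="reflect")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

# Hypothetical sensor reading: constant at 2.0 with one spurious spike.
raw = np.full(11, 2.0)
raw[5] = 12.0
smoothed = moving_average(raw, window=5)
print(smoothed)
```

Here the spike drops from 12.0 to 4.0, but its effect is spread over the five samples around it, which is the usual trade-off of mean filtering.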

III. METHODOLOGY
PCA is one of the best-known and most widely used techniques. It has been used effectively in various areas, including image processing, signal analysis, pattern recognition, data compression, and process monitoring. The technique is simple, efficient, and capable of processing industrial data, which makes it an influential tool for process monitoring in the process industry. The PCA algorithm is a foundational technique of automated process monitoring, and advanced variants include recursive, adaptive, and kernel PCA. Because these techniques extract useful information from data, PCA is widely used in fault diagnosis. It extracts a set of orthogonal vectors, known as loading vectors, and quantifies the amount of variance captured along each of them. Consider a process measurement matrix $X \in \mathbb{R}^{n \times m}$, where $m$ is the number of variables and $n$ is the number of observations.
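A small numerical illustration of orthogonal loading vectors and the variance attached to each one may help here. The sketch below assumes NumPy and a synthetic three-variable data set (the variables, sample size, and seed are hypothetical). It uses an eigendecomposition of the sample covariance, which for a symmetric covariance matrix yields the same loading vectors (up to sign) as the SVD used in the steps below.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=(n, 1))
# Hypothetical data: variable 2 tracks variable 1, variable 3 is independent noise.
X = np.hstack([z, 0.8 * z + 0.2 * rng.normal(size=(n, 1)), rng.normal(size=(n, 1))])

Xc = X - X.mean(axis=0)                  # center each variable
cov = Xc.T @ Xc / (n - 1)                # sample covariance matrix (3 x 3)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # reorder from largest to smallest
variances = eigvals[order]               # variance captured along each loading vector
loadings = eigvecs[:, order]             # columns are the orthogonal loading vectors

print("fraction of variance per loading vector:", np.round(variances / variances.sum(), 3))
```

The loading vectors are mutually orthogonal, and the variance of the data projected onto each one equals the corresponding eigenvalue.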

A. PCA-based Fault Detection
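Step 1: Pretreat the data by normalizing each column of $X$ to zero mean and unit variance, $x_{ij} \leftarrow (x_{ij} - \bar{x}_j)/s_j$, where $\bar{x}_j$ and $s_j$ are the mean and standard deviation of the $j$th variable.
Step 2: Obtain the covariance of the measurement matrix by
$$\Sigma = \frac{1}{n-1} X^{T} X.$$
Step 3: Extract the loading vectors by taking the singular value decomposition (SVD) of the above matrix:
$$\frac{1}{n-1} X^{T} X = V \Lambda V^{T},$$
where $V \in \mathbb{R}^{m \times m}$ is orthogonal, its columns are the loading vectors, and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$ with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m \ge 0$.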
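Steps 1 to 3 can be sketched in Python with NumPy as follows. This is a minimal sketch, not the authors' implementation; the function name `fit_pca` and the synthetic training data are hypothetical. The training mean and standard deviation are returned because new samples must be scaled with the same statistics before monitoring.

```python
import numpy as np

def fit_pca(X_train):
    """Steps 1-3: normalize, form the covariance matrix, and take its SVD.

    X_train is an (n x m) array of n observations of m variables.
    Returns the training mean and std (needed to scale new data), the
    loading matrix V (m x m), and the singular values lam (descending).
    """
    X_train = np.asarray(X_train, dtype=float)
    n = X_train.shape[0]
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Xs = (X_train - mean) / std          # Step 1: zero mean, unit variance
    cov = Xs.T @ Xs / (n - 1)            # Step 2: covariance matrix
    V, lam, _ = np.linalg.svd(cov)       # Step 3: cov = V diag(lam) V^T
    return mean, std, V, lam

# Hypothetical correlated training data: 100 observations of 4 variables.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))
mean, std, V, lam = fit_pca(X_train)
print("singular values:", np.round(lam, 3))
```

Because the data are standardized, the singular values sum to the number of variables $m$ (the trace of a correlation matrix).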
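Step 4: To calculate the principal components (PCs), divide the loading and singular value matrices into principal and residual parts:
$$V = \begin{bmatrix} V_{pc} & V_{res} \end{bmatrix}, \qquad \Lambda = \begin{bmatrix} \Lambda_{pc} & 0 \\ 0 & \Lambda_{res} \end{bmatrix},$$
where $V_{pc} \in \mathbb{R}^{m \times l}$ is the set of loading vectors for the $l$ largest singular values, so the number of retained PCs $l$ is the number of columns of $V_{pc}$, and $V_{res} \in \mathbb{R}^{m \times (m-l)}$ spans the residual space.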
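The split in Step 4 needs a rule for choosing $l$, which the text does not specify. The sketch below assumes the common cumulative percent variance (CPV) criterion with a hypothetical 90% target; the function `split_subspaces` and the example singular values are illustrative only.

```python
import numpy as np

def split_subspaces(V, lam, cpv_target=0.90):
    """Step 4: split loadings and singular values into principal and residual parts.

    Assumption: l is the smallest number of components whose cumulative
    percent variance (CPV) reaches `cpv_target`; the text does not state
    how l was selected.
    """
    lam = np.asarray(lam, dtype=float)
    cpv = np.cumsum(lam) / lam.sum()
    l = min(int(np.searchsorted(cpv, cpv_target)) + 1, lam.size)
    V_pc, V_res = V[:, :l], V[:, l:]
    Lam_pc, Lam_res = np.diag(lam[:l]), np.diag(lam[l:])
    return l, V_pc, V_res, Lam_pc, Lam_res

# Hypothetical singular values and an identity loading matrix, for illustration.
lam = np.array([2.5, 1.0, 0.3, 0.2])
V = np.eye(4)
l, V_pc, V_res, Lam_pc, Lam_res = split_subspaces(V, lam, cpv_target=0.90)
print(l)  # 3: CPV is 0.625, 0.875, 0.95, 1.0
```

Other criteria (for example, a scree plot or cross-validation) would give a different $l$; the CPV rule is used here only because it is simple.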
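Step 5: Obtain the score matrix $T$ by projecting the normalized data onto the principal loading vectors, $T = X V_{pc}$. The simplified (reconstructed) data matrix is then $\hat{X} = T V_{pc}^{T} = X V_{pc} V_{pc}^{T}$, and the residual is $\tilde{X} = X - \hat{X}$.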
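Step 5 can be checked numerically: the residual part of the data should be orthogonal to the principal loading vectors, and the sample covariance of the scores should equal $\Lambda_{pc}$. The following sketch assumes hypothetical random data that is already roughly normalized and an assumed $l = 2$.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))              # hypothetical, roughly normalized data (n=50, m=4)
n, m = X.shape
cov = X.T @ X / (n - 1)                   # Step 2
V, lam, _ = np.linalg.svd(cov)            # Step 3
l = 2                                     # assumed number of retained PCs
V_pc = V[:, :l]                           # Step 4: principal loadings

T = X @ V_pc                              # Step 5: score matrix (n x l)
X_hat = T @ V_pc.T                        # part of X explained by the principal subspace
X_res = X - X_hat                         # residual part, equal to X (I - V_pc V_pc^T)
print("score matrix shape:", T.shape)
```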

B. PCA and the Faults
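For a normalized sample $x \in \mathbb{R}^{m}$, the $T^2$ statistic is obtained by
$$T^2 = x^{T} V_{pc} \Lambda_{pc}^{-1} V_{pc}^{T} x,$$
and its threshold at significance level $\alpha$ is
$$T^2_{\alpha} = \frac{l(n^2-1)}{n(n-l)} F_{\alpha}(l, n-l),$$
where $F_{\alpha}(l, n-l)$ is the critical value of the F-distribution with $l$ and $n-l$ degrees of freedom at the $(1-\alpha)$ confidence level. A fault is detected when the $T^2$ statistic exceeds this threshold. Because the $T^2$ statistic is built from the loading vectors and singular values, it is sensitive to inaccuracies in the small singular values of the residual part [15]. The squared prediction error (SPE), or $Q$ statistic, which uses the residual space, is therefore also used [16]:
$$Q = x^{T} \left(I - V_{pc} V_{pc}^{T}\right) x.$$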
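The $T^2$ statistic and its F-distribution threshold can be sketched as below, assuming NumPy and SciPy (`scipy.stats.f.ppf` gives the F quantile). The data, the mixing matrix, $l = 2$, and $\alpha = 0.01$ are hypothetical choices for illustration. A sample placed ten standard deviations along the first principal direction gives $T^2 = 100$, well above the limit.

```python
import numpy as np
from scipy import stats

def t2_statistic(x, V_pc, lam_pc):
    """T^2 = x^T V_pc Lambda_pc^{-1} V_pc^T x for one normalized sample x."""
    scores = V_pc.T @ x
    return float(np.sum(scores**2 / lam_pc))

def t2_threshold(l, n, alpha=0.01):
    """T^2 threshold from the F-distribution at confidence level 1 - alpha."""
    f_crit = stats.f.ppf(1 - alpha, l, n - l)
    return l * (n**2 - 1) / (n * (n - l)) * f_crit

# Hypothetical training data: n = 500 samples of m = 5 correlated variables.
rng = np.random.default_rng(3)
n, m, l = 500, 5, 2
X = rng.normal(size=(n, m)) @ rng.normal(size=(m, m))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)        # Step 1
V, lam, _ = np.linalg.svd(X.T @ X / (n - 1))            # Steps 2-3
V_pc, lam_pc = V[:, :l], lam[:l]                        # Step 4

limit = t2_threshold(l, n, alpha=0.01)
train_t2 = np.array([t2_statistic(x, V_pc, lam_pc) for x in X])
# Hypothetical faulty sample: 10 standard deviations along the first PC.
faulty = 10 * np.sqrt(lam_pc[0]) * V_pc[:, 0]
print(f"limit = {limit:.2f}, faulty T2 = {t2_statistic(faulty, V_pc, lam_pc):.1f}")
```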
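The threshold of the $Q$ statistic is given by
$$Q_{\alpha} = \theta_1 \left[ \frac{c_{\alpha} \sqrt{2 \theta_2 h_0^2}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2} \right]^{1/h_0},$$
with $\theta_i = \sum_{j=l+1}^{m} \lambda_j^{i}$ for $i = 1, 2, 3$ and $h_0 = 1 - \frac{2 \theta_1 \theta_3}{3 \theta_2^2}$, where $c_{\alpha}$ is the standard normal deviate corresponding to the $(1-\alpha)$ percentile. Hence, the threshold $Q_{\alpha}$ at the chosen confidence level can be used to detect abnormalities: a fault is indicated when $Q > Q_{\alpha}$.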
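The $Q$ statistic and its threshold can be sketched as follows. The threshold formula above is the Jackson-Mudholkar approximation; the sketch uses Python's `statistics.NormalDist` for $c_{\alpha}$ and hypothetical inputs. As a sanity check, three equal residual singular values of 1 make $Q$ a chi-square variable with three degrees of freedom, whose 99th percentile is about 11.34, and the approximation should land close to that.

```python
import numpy as np
from statistics import NormalDist

def spe_statistic(x, V_pc):
    """Q (SPE) = ||(I - V_pc V_pc^T) x||^2 for one normalized sample x."""
    residual = x - V_pc @ (V_pc.T @ x)
    return float(residual @ residual)

def spe_threshold(lam_res, alpha=0.01):
    """Jackson-Mudholkar approximation of the Q threshold at confidence 1 - alpha.

    lam_res holds the residual singular values lambda_{l+1}, ..., lambda_m.
    """
    lam_res = np.asarray(lam_res, dtype=float)
    theta1, theta2, theta3 = (float(np.sum(lam_res**k)) for k in (1, 2, 3))
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2**2)
    c_alpha = NormalDist().inv_cdf(1 - alpha)      # standard normal deviate
    bracket = (c_alpha * np.sqrt(2.0 * theta2 * h0**2) / theta1
               + 1.0
               + theta2 * h0 * (h0 - 1.0) / theta1**2)
    return float(theta1 * bracket ** (1.0 / h0))

print(round(spe_threshold([1.0, 1.0, 1.0], alpha=0.01), 2))
x = np.array([1.0, 2.0, 3.0])
V_pc = np.array([[1.0], [0.0], [0.0]])        # hypothetical one-dimensional principal subspace
print(spe_statistic(x, V_pc))                  # 13.0 = 2^2 + 3^2
```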

IV. THE BENCHMARK PROCESS
In this section, the PCA algorithm is verified by simulating the Tennessee Eastman process (TEP) in order to diagnose faults. The TEP is a computer-based simulator that many researchers use to compare data-driven as well as model-based algorithms. It is a realistic simulator that mimics the behavior of a typical chemical plant, and it serves as a benchmark for assessing techniques such as process monitoring, control, and fault diagnosis. In particular, the TEP is used to examine multivariate statistical process monitoring (MSPM) methods, and it supports different operating regimes. Figure 1 shows the flow of the process, which consists of five connected units: a reactor, a condenser, a separator, a stripper, and a compressor. The process involves four reactants (inputs), two products (outputs), a by-product, and an inert, denoted by the letters A to H. There are 52 variables in the process: 41 are output (measured) variables and the remaining 11 are input (manipulated) variables. The output and input variables are listed in Table 1 [8]. The manipulated variables XMV(1) to XMV(11) are used with their standard values, in accordance with [8]. The researchers in [17] performed the simulation of the TE process, and successful work with these techniques has been reported. The training data set produced for this work includes both normal and faulty data. The work in [18] proposed a control scheme that has been applied to the TEP. The gaseous reactants A, C, D, and E and the inert B are fed to the reactor, where the products G and H are formed. Details of the simulator can be found in [19]. The input-output relations of the process are given by the following reactions:
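$$
\begin{aligned}
\mathrm{A(g) + C(g) + D(g)} &\rightarrow \mathrm{G(liq)} \\
\mathrm{A(g) + C(g) + E(g)} &\rightarrow \mathrm{H(liq)} \\
\mathrm{A(g) + E(g)} &\rightarrow \mathrm{F(liq)} \\
\mathrm{3D(g)} &\rightarrow \mathrm{2F(liq)}
\end{aligned}
$$
In these reactions, component F is a by-product. The reactions are exothermic and irreversible, and they are approximately first-order with respect to the reactant concentrations; higher temperatures favor the reaction producing G over the one producing H. The vapor leaving the separator is recycled to the reactor through the compressor, and a purge stream removes the by-product and the inert from the process. The condensed liquid from the separator (stream 10) is fed to the stripper.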

C. Main Process Variables
The process has 41 measured variables and 11 input (manipulated) variables, which are listed in Table 1. Of these, 22 variables are sampled every three minutes. There are 21 process faults, as described in [17].

D. Simulated Faults in TEP
There are 21 faults in the TEP, listed in Table 2. These faults mostly affect chemical process parameters such as process variables, kinetics, and feed concentrations, as well as actuators in the chemical process such as valves. Data-driven approaches require offline training data and online test data. In this simulator, there are 22 online test data sets, each covering 48 hours of plant operation with 960 samples generated during the simulations in [5]; the faults are introduced after 160 data samples.
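The timing figures above are consistent: assuming one sample every three minutes, 960 samples cover 48 hours and a fault introduced after 160 samples starts at hour 8. The sketch also shows, as an assumption since the text reports figures rather than rates, how the false alarm rate and fault detection rate commonly used with these data sets could be computed from an alarm vector; the function name `detection_metrics` and the alarm sequence are hypothetical.

```python
import numpy as np

SAMPLE_PERIOD_MIN = 3   # sampling interval stated in the text (minutes)
N_TEST = 960            # samples per online test set
FAULT_ONSET = 160       # the fault is introduced after this many samples

print(N_TEST * SAMPLE_PERIOD_MIN / 60)       # 48.0 hours of plant operation
print(FAULT_ONSET * SAMPLE_PERIOD_MIN / 60)  # 8.0: fault starts 8 hours into the run

def detection_metrics(alarms, onset=FAULT_ONSET):
    """Return (false alarm rate, fault detection rate) for a boolean alarm vector."""
    alarms = np.asarray(alarms, dtype=bool)
    far = float(alarms[:onset].mean())   # fraction of pre-fault samples flagged
    fdr = float(alarms[onset:].mean())   # fraction of post-fault samples flagged
    return far, fdr

# Hypothetical alarm sequence: two false alarms, then detection from sample 200 on.
alarms = np.zeros(N_TEST, dtype=bool)
alarms[[10, 20]] = True
alarms[200:] = True
print(detection_metrics(alarms))  # (0.0125, 0.95)
```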

V. RESULTS AND DISCUSSION
The faults considered are listed in Table 2, and all values in Tables 1 and 2 are taken directly from [8]. The first step is to implement PCA to detect these configured faults. The default dataset from the TE simulator is used for experimentation and evaluation, with the input and output variables configured as the measured variables (XMEAS) and the manipulated variables XMV(1) to XMV(11). In real life, the cause of a fault is usually unknown; similarly, the TE simulator introduces faults in different subspaces.
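To tie the steps together, the following end-to-end sketch trains a PCA model on normal data and monitors a 960-sample test set whose fault starts after sample 160, mirroring the TEP test-set layout described above. It does not use the TE data: the six-sensor process, the step fault on sensor 0, and all numerical settings are hypothetical, and an alarm is raised when either $T^2$ or $Q$ exceeds its limit. NumPy and SciPy are assumed.

```python
import numpy as np
from scipy import stats
from statistics import NormalDist

rng = np.random.default_rng(42)
# Hypothetical process: sensors 0-2 follow latent driver 1, sensors 3-5 follow driver 2.
A = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
m, l, alpha = A.shape[1], 2, 0.01

def simulate(n_samples):
    """Generate hypothetical sensor data: latent drivers plus small sensor noise."""
    return rng.normal(size=(n_samples, 2)) @ A + 0.1 * rng.normal(size=(n_samples, m))

# Offline training on normal operating data (Steps 1-4).
X_train = simulate(500)
n = X_train.shape[0]
mu, sd = X_train.mean(axis=0), X_train.std(axis=0, ddof=1)
Z = (X_train - mu) / sd
V, lam, _ = np.linalg.svd(Z.T @ Z / (n - 1))
V_pc, lam_pc, lam_res = V[:, :l], lam[:l], lam[l:]

# Control limits for T^2 (F-distribution) and Q (Jackson-Mudholkar).
t2_lim = l * (n**2 - 1) / (n * (n - l)) * stats.f.ppf(1 - alpha, l, n - l)
th1, th2, th3 = (np.sum(lam_res**k) for k in (1, 2, 3))
h0 = 1 - 2 * th1 * th3 / (3 * th2**2)
c = NormalDist().inv_cdf(1 - alpha)
q_lim = th1 * (c * np.sqrt(2 * th2 * h0**2) / th1 + 1
               + th2 * h0 * (h0 - 1) / th1**2) ** (1 / h0)

# Online test set: 960 samples, hypothetical step fault on sensor 0 after sample 160.
X_test = simulate(960)
X_test[160:, 0] += 1.0
Zt = (X_test - mu) / sd                      # scale with the training statistics
scores = Zt @ V_pc
t2 = np.sum(scores**2 / lam_pc, axis=1)
q = np.sum((Zt - scores @ V_pc.T) ** 2, axis=1)
alarm = (t2 > t2_lim) | (q > q_lim)

far = alarm[:160].mean()                     # false alarms before the fault
fdr = alarm[160:].mean()                     # detections after the fault
print(f"false alarm rate = {far:.3f}, detection rate = {fdr:.3f}")
```

In this toy process the fault breaks the correlation between sensor 0 and sensors 1 and 2, so it shows up mainly in the residual space and is caught by $Q$ rather than $T^2$, which illustrates why both statistics are monitored.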
According to Table 2, the fault in the condenser cooling water inlet temperature is represented by IDV(2). It occurs after 160 data samples, and the PCA results for IDV(2) are shown in Figure 2. IDV(6) is a step-type fault, simulated by a sudden loss of the A feed; the results of the PCA diagnosis of the variables affected by the A feed loss in the TE process are shown in Figure 3. The process monitoring algorithm is developed with the help of 960 samples taken from normal process operation. Similarly, IDV(18), listed in Table 2 as a fault of unknown type, affects the process variables in an unknown way; the PCA-based statistics identifying the affected variable are shown in Figure 4.

VI. CONCLUSION
Fault diagnosis is very important for optimized systems, and data-driven techniques in particular are becoming popular. This work presents a detailed study of a data-driven technique and proposes a data-driven design based on PCA. PCA is simple, efficient, and easy to design; because of these properties, it is frequently used in fault diagnosis. The industrial benchmark Tennessee Eastman process is used for the simulation and analysis of the data-driven technique, which successfully detects the faults.
The major objective of future investigation is the analysis of non-Gaussian process data, because most industrial systems are nonlinear in nature, whereas this work assumes that the data under consideration are Gaussian. A framework should also be established that is constructed directly from process data for building a fault-tolerant architecture.

Fig. 1. Tennessee Eastman Process.