Comprehensive Analysis for Sensor-based Hydraulic System Condition Monitoring

Condition monitoring of equipment can be very effective in predicting faults and enabling early corrective action. As hydraulic systems constitute the core of most industrial plants, predictive maintenance of such systems is of vital importance. Given the large volumes of data collected from industrial plants, machine learning can be used for this purpose. In this work, hydraulic system condition monitoring (HSCM) is addressed using a public dataset with 17 sensors distributed throughout the system. Using a set of 6 features extracted from the sensor data, the random forest classifier has been shown in the literature to achieve a classification rate exceeding 99% for four independent targets, namely Cooler, Valve, Pump and Accumulator. In this paper, sensor dependency is examined, and experimental results show that a reduced set of important sensors may be sufficient for the addressed classification task. In addition, feature importance and implementation issues, i.e. training time and model size on disk, are analyzed. It is found that, compared with the basic model that uses the full set of sensors, the optimized models that employ only the important sensors reduce the training time by 25.7% to 36.4% and the size on disk by 70.3% to 85.5% while maintaining classification precision.

Keywords: Condition monitoring; sensory data analysis; machine learning; classification


I. INTRODUCTION
Faults or failures of equipment in an industrial plant may have serious consequences, ranging from threatening the safety of operators and causing the plant to shut down for long periods of time to lowering the production rate and revenue [1]. For these reasons, maintenance plays a crucial role in process industries.
The simplest maintenance strategy is to wait until a fault occurs and then react. In this strategy, there is the cost of replacing the damaged equipment and the additional cost of lost production during equipment downtime. A more advanced strategy is scheduled maintenance, which is performed periodically. This approach, despite being effective, may lead to corrective actions that are unnecessary and costly. The most advanced approach is predictive maintenance, in which the condition of equipment is continuously monitored, faults are predicted, and the necessary corrective actions are taken [2].
Currently, equipment condition monitoring (CM) is possible thanks to advances in sensor technology and in machine learning techniques that can process large volumes of data from sensors distributed throughout the plant. These techniques, which extract key features from data and correlate them with possible faults [3], have been successfully applied to condition monitoring of, e.g., gearboxes [4][5][6], rotating machinery [7], motor bearings [8], centrifugal pumps [9,10], hydraulic systems [11], cutting tools [12], grinding mill liners [13], and semiconductor failures [14].
Hydraulic systems are core components in most industrial fields, such as water treatment plants, vehicles, and aerospace [15]. The failure of a hydraulic system can cause a whole plant to shut down or threaten operators' safety [16][17][18]. In addition, hydraulic systems are not operator-friendly environments for condition monitoring [19]. For these reasons, condition monitoring of hydraulic systems has gained considerable interest over the past two decades. For example, Liu [16] developed a tree structure model for fault diagnosis of an out-of-sync oil cylinder. El-Betar et al. [17] proposed a neural network scheme for fault diagnosis of actuator leakage and valve spool blockage. Tian et al. [18] applied support vector machines (SVM) to predict pump faults. Jegadeeshwaran and Sugumaran [20] employed both SVM and decision trees for fault detection in hydraulic brake systems. Helwig et al. [21] developed a hydraulic test rig with several induced fault types. Key features were extracted from the sensor data, and those most highly correlated with a given fault were determined. Linear discriminant analysis (LDA) was employed to reduce the feature space. The same test rig was further examined by Chawathe [22], who applied naïve Bayes, decision trees, and random forests (RF) [23]. The RF classifier achieved a classification accuracy of about 99% for all classes. Furthermore, it was noted that accuracy can be retained using only a small set of features. Quatrini et al. [15] studied the same dataset and used Pearson's correlation coefficient to rank the features correlated with a given fault. Algorithms such as SVM, ANN, RF, and logistic regression were tested, and again RF outperformed the other techniques for most fault types. König and Helmi [24] successfully applied a convolutional neural network (CNN) to the same dataset [21]. In contrast to the previous studies, a CNN can extract key features automatically.
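In addition, an analysis of misclassifications was conducted.
The objective of the current work is to determine which sensors and features are important in detecting a given type of fault in the HSCM system, and to analyze the implementation issues of the optimized classifier models.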
The paper is organized as follows: the benchmark hydraulic system and dataset are described in Section II. Section III describes the classifier model. The experimental results are then presented in Section IV. Section V presents a detailed discussion and highlights the most important findings. Finally, conclusions and future work are given in Section VI.

II. HYDRAULIC SYSTEM DATASET
To set up an environment for testing and diagnosing common faults in hydraulic systems, Helwig, Pignanelli, and Schütze [21] developed the hydraulic test rig shown in Fig. 1. In this system, several reversible faults with different degrees of severity can be induced, and the data of the sensors distributed throughout the system are recorded. By correlating features extracted from the sensor data with known faults, an automated mechanism for fault detection can be developed.
The system consists of two hydraulic circuits: the primary working circuit (Fig. 1, top) and the cooling-filtration circuit (Fig. 1, bottom). The two circuits are connected through an oil tank. The primary circuit contains the main pump (MP1) and a relief valve (V11), which can be used to generate different load levels in the circuit, as well as a four-compartment accumulator (A1, A2, A3, and A4) for pressure storage. The secondary circuit contains the cooler unit (C1).
The test rig is equipped with 14 sensors to measure pressure (PS1-PS6), flow (FS1, FS2), temperature (TS1-TS4), electrical power (EPS1), and vibration (VS1). The measurements are recorded using standard industrial 20 mA current loop interfaces connected to a data acquisition system within a Beckhoff CX5020 PLC. Additionally, three virtual sensors are designed to provide estimates of system efficiency (SE), cooling efficiency (CE), and cooling power (CP). The data are collected at sampling rates of 100 Hz for pressure and motor power, 10 Hz for flow rate, and 1 Hz for the other variables.
In addition to the data of the 17 sensors, the state or condition of each of the following targets is also recorded: Cooler, Valve, Pump and Accumulator. A total of 2205 cycles, or training examples, are collected, each 60 seconds long. The training examples contain cases for each state of the four targets, ranging from fully operational to close to complete failure. A list of targets, degrees of faults, their abbreviations, and the corresponding numbers of training examples is given in Table I. As can be seen, the problem at hand can be considered as four separate classification problems, one for each target.

III. THE CLASSIFIER MODEL
The two main components of a classification task are the extraction of features and the use of a suitable type of classifier.
Features are key representative attributes of raw sensor data. They can be time-domain or frequency-domain features. The set of features introduced by Quatrini et al. [15] is reused here to implement the experimental models. Each operation cycle is represented by 6 features per sensor, namely mean (m), standard deviation (sd), skewness (sk), kurtosis (k), slope of linear fit (slf) and position of maximum (p). The first four features characterize the distribution of the sensor signal, while the slf and p features capture the shape of the signal. These features have proved very useful for fault recognition in such applications. Thus, for the 17 sensors, there are 102 features in total.
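As an illustration of the six per-cycle features, the following minimal Python sketch computes them for one sensor signal and flattens a cycle into named features such as PS1_m. It is not the implementation used in the experiments: the function names are hypothetical, population (biased) moment estimators are assumed because the estimator is not specified here, kurtosis is left non-excess, and the position of the maximum is taken as a sample index.

```python
import math


def cycle_features(signal):
    """Return the six per-cycle features (m, sd, sk, k, slf, p) of one sensor signal."""
    n = len(signal)
    mean = sum(signal) / n
    # Central moments, population form (an assumption; the estimator is not specified).
    m2 = sum((x - mean) ** 2 for x in signal) / n
    m3 = sum((x - mean) ** 3 for x in signal) / n
    m4 = sum((x - mean) ** 4 for x in signal) / n
    sd = math.sqrt(m2)
    sk = m3 / sd ** 3 if sd > 0 else 0.0
    k = m4 / sd ** 4 if sd > 0 else 0.0  # non-excess kurtosis
    # Least-squares slope of the signal against the sample index 0..n-1.
    t_mean = (n - 1) / 2
    num = sum((t - t_mean) * (x - mean) for t, x in enumerate(signal))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slf = num / den if den > 0 else 0.0
    # Position of the maximum as a sample index (first occurrence).
    p = max(range(n), key=lambda i: signal[i])
    return {"m": mean, "sd": sd, "sk": sk, "k": k, "slf": slf, "p": p}


def feature_vector(cycle):
    """Flatten {sensor_name: signal} into {"<sensor>_<feature>": value}."""
    return {
        f"{sensor}_{name}": value
        for sensor, signal in cycle.items()
        for name, value in cycle_features(signal).items()
    }
```

With 17 sensor signals in `cycle`, `feature_vector` returns 17 x 6 = 102 entries, matching the feature count stated above.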
According to previous studies, random forests (RF) and artificial neural networks show outstanding results on the addressed dataset, with a slight advantage for RF [15,24]. As the objective of the current work is not to compare different classifiers but to determine which sensors and features are important in detecting a given type of fault, the random forest is employed in this paper.
A random forest classifier is an ensemble of decision trees. Each tree is fed a set of features and returns a decision that corresponds to a class. Within the forest, the class with the most votes is selected as the final classifier output [23]. To use a random forest, two parameters need to be set: the number of trees and the maximum number of splits allowed in each tree, which controls the tree depth.
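The voting step can be sketched in a few lines of Python. This is only an illustration of majority voting, not of tree training: each "tree" is a hypothetical one-split stump, and the feature thresholds and class labels below are invented for the example.

```python
from collections import Counter


def stump(feature, threshold, low_label, high_label):
    """Build a one-split 'tree' as a callable (a stand-in for a trained decision tree)."""
    def predict(features):
        return low_label if features[feature] <= threshold else high_label
    return predict


def forest_predict(trees, features):
    """Return the class that receives the most votes across all trees."""
    votes = [tree(features) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stumps; thresholds and labels are illustrative only.
trees = [
    stump("CE_m", 25.0, "fault", "ok"),
    stump("CP_m", 1.8, "fault", "ok"),
    stump("TS1_m", 40.0, "ok", "fault"),
]
```

For a sample that two stumps label "ok" and one labels "fault", `forest_predict` returns "ok". In a real forest the number of trees and the split limit per tree are the two parameters named above.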

IV. EXPERIMENTAL RESULTS
In this section, several experiments are conducted to test the performance of the RF classifier using the full set of sensors and features, and then a reduced set of them.
In all experiments, the RF model is implemented with 100 decision trees and a maximum number of splits equal to 10. The latter parameter is selected from values between 2 and 10. For each target, samples are randomly split into 75% for training and validation and 25% for testing. A total of 100 independent runs are carried out per experiment in order to characterize the average performance of the classification models well.
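The evaluation protocol (random 75%/25% split, repeated over independent runs and averaged) can be sketched as follows. The helper names and the `evaluate` callback are hypothetical; the callback stands for "train on the training indices, test on the test indices, return a score". The split here is not stratified, which is an assumption since the text does not say whether stratification is used.

```python
import random


def split_indices(n_samples, test_fraction=0.25, rng=random):
    """Shuffle sample indices and return (train_indices, test_indices)."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def average_over_runs(n_samples, evaluate, runs=100, seed=0):
    """Average evaluate(train_indices, test_indices) over independent random splits."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    scores = []
    for _ in range(runs):
        train, test = split_indices(n_samples, rng=rng)
        scores.append(evaluate(train, test))
    return sum(scores) / len(scores)
```

For the 2205 cycles of the dataset, each run holds out 551 cycles for testing and keeps 1654 for training and validation.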
To evaluate the classification performance, the following metrics are used: accuracy (Acc), precision (Pre), recall (Rec) and F-measure (F). They are defined as follows:

$$\mathrm{Acc} = \frac{TP + TN}{NS}, \qquad \mathrm{Pre} = \frac{TP}{TP + FP}, \qquad \mathrm{Rec} = \frac{TP}{TP + FN}, \qquad F = \frac{2 \cdot \mathrm{Pre} \cdot \mathrm{Rec}}{\mathrm{Pre} + \mathrm{Rec}}$$

where NS is the total number of training examples, and TP, FP, TN, and FN denote the numbers of true positive, false positive, true negative, and false negative examples of a given class, respectively. Matlab 2018 is used for implementation and testing on a machine with a Core i5 2.6 GHz CPU and 10 GB RAM. The implementation of the RF algorithm follows [23].
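For a single class, the four metrics can be computed from the confusion counts as in the short sketch below. It assumes the standard definitions, with NS taken as TP + FP + TN + FN; the function name and the example counts are illustrative and are not results from this study.

```python
def class_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F-measure for one class from its confusion counts."""
    ns = tp + fp + tn + fn  # total number of examples
    acc = (tp + tn) / ns
    pre = tp / (tp + fp) if (tp + fp) else 0.0
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    f = 2 * pre * rec / (pre + rec) if (pre + rec) else 0.0
    return {"Acc": acc, "Pre": pre, "Rec": rec, "F": f}


# Illustrative counts only.
example = class_metrics(tp=90, fp=10, tn=95, fn=5)
```

Precision penalizes false positives and recall penalizes false negatives, so the two can differ noticeably even when accuracy is high.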

A. RF for HSCM Task
In this preliminary experiment, the RF classifier is used with the full set of sensors and features. The average classification rates and implementation issues (model size, training and inference times) for the RF model are presented in Table II. The average rates of all metrics are above 0.99. The high accuracy and F-measure indicate the effectiveness and robustness of the employed features together with the classifier. As can be seen from Table V, only one observation of "Weak leakage" is misclassified as "Severe leakage" for the Pump target, and vice versa. Also, from Table IV, two observations are misclassified for "Slightly reduced pressure" and one for "Severely reduced pressure" of the Accumulator target. From Table II, the size of the classifier model is 4028 KB for the Cooler, and it is larger by 12%, 12.7% and 32.5% for the Valve, Pump and Accumulator targets, respectively. This variation may be explained by the different degrees of difficulty in classifying the conditions of each target. In addition, the Cooler model is the fastest to train, at 1.4 sec, while the Valve and Pump models require about 1.7 sec. The Accumulator model takes the longest, at 2.35 sec. The inference time (i.e., testing the model on one observation) is almost the same for all four models, approximately 0.09 sec. In summary, the RF model alone achieves outstanding recognition rates for all targets. By contrast, in the work of Quatrini et al. [15], the best performance was split between two classifiers: ANN for the Pump target, with a rate of 0.998, and RF for the Cooler, Valve and Accumulator targets, with classification rates of 0.998, 1 and 0.991, respectively.

B. Sensor and Target Correlation
The impact of individual sensors on the recognition of severe operating conditions is studied extensively in this section. The features of each sensor alone are fed to the RF model. It may be more important to examine the capability of a given sensor to recognize every probable failure condition, ignoring false alarms. With this concern in mind, the precision metric is used in this experiment. Each sensor test is repeated 100 times in order to draw rigorous conclusions. Then, the sensors with the highest average precision are determined.
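The per-sensor procedure amounts to a ranking loop, sketched below in Python. The function `rank_sensors` and its `precision_of` callback are hypothetical: the callback stands for "train and test a classifier on the six features of one sensor with a fresh random split, and return the precision". The sketch shows only the repeat, average and sort logic, not the classifier itself.

```python
import random
from statistics import mean


def rank_sensors(sensors, precision_of, runs=100, seed=0):
    """Rank sensors by precision averaged over repeated runs, best first.

    precision_of(sensor, rng) is a user-supplied callback (hypothetical here)
    that returns the precision of one train/test run for that sensor.
    """
    rng = random.Random(seed)
    averages = {
        sensor: mean(precision_of(sensor, rng) for _ in range(runs))
        for sensor in sensors
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)
```

The same loop applies to groups of sensors if each entry of `sensors` is a tuple of sensor names and the callback concatenates their features.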
For the Cooler target, the classification task seems straightforward, in agreement with previous studies [15,24]. The precision achieved using only one of the sensors TS2, CE, TS1, CP, PS6, PS5, TS3 and TS4 exceeds 0.995. Using one sensor reduces the model size from 4028 to 651 KB and the training time from 1.4 to 1.04 sec, as shown in Fig. 2(a). Similar behaviour is observed for the Valve target. The sensors PS3 and PS2 give precision rates of 0.997 and 0.991, respectively. The next most important sensors are PS1 and FS1, with precision of 0.957 and 0.953, respectively. In summary, this experiment justifies, to a great extent, the preliminary division of the target classes in this dataset into easy (Cooler and Valve) and hard (Pump and Accumulator) classes [21]. For the Cooler and Valve targets, a single sensor is sufficient for monitoring the different conditions of each. For the Pump target, however, FS1 alone achieves a precision of 0.981, and the group GP2 (FS1, SE, PS1) improves it to 0.993. Focusing on the most effective sensors can optimize the RF model in terms of model size and training time. Finally, several sensors are needed to give an acceptable recognition precision for the Accumulator conditions. The sensor group GA5 (PS3, PS1, FS1, SE, PS2, TS1) can achieve a high precision of 0.986. However, this is still below the precision of 0.993 of the basic model, which employs all sensors.
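As a quick check of the Cooler figures quoted above, the relative reductions in model size and training time can be computed directly. The helper name is hypothetical; the input values are the ones reported in the text.

```python
def reduction_percent(before, after):
    """Relative reduction from 'before' to 'after', in percent."""
    return 100.0 * (before - after) / before


size_cut = reduction_percent(4028, 651)   # model size in KB, all sensors vs. one sensor
time_cut = reduction_percent(1.4, 1.04)   # training time in seconds
print(f"size: {size_cut:.1f}%  time: {time_cut:.1f}%")  # size: 83.8%  time: 25.7%
```

Both values fall within the ranges stated in the abstract (70.3% to 85.5% for size on disk and 25.7% to 36.4% for training time).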

C. Feature Effectiveness
Based on the findings of the previous experiment, it is interesting to investigate the most effective attributes per sensor. Therefore, in this section, only the most effective sensors for each target are considered. A wrapper-based approach is followed, in which each individual attribute is provided to the classifier on its own. Thus, a direct judgement of the discriminating power of each attribute is obtained. For this purpose, an RF model using 5 decision trees with a maximum number of splits equal to 2 is sufficient. Fig. 3(a) shows that the mean of the CE sensor (CE_m) alone can classify the conditions of the Cooler target with an average precision of 0.998. Also, the mean of the CP sensor (CP_m) can achieve a precision of 0.992. Moreover, the mean of each of PS5, PS6, TS1 and TS2 results in a precision exceeding 0.96. The slope of linear fit of the TS1 sensor is also useful, achieving a precision of 0.942.
Other effective features are identified for the Valve condition. The kurtosis and skewness of PS2 (PS2_k and PS2_sk) give precision of 0.994 and 0.972, respectively. The position of the maximum of PS3 (PS3_p) achieves a precision of 0.99. Fig. 3(b) shows the effectiveness of the attributes of PS2 and PS3, denoted by GV1, which give the highest rates for the Valve target.
Attribute effectiveness for the most useful sensors for the Pump target, namely GP2 (FS1, SE, PS1), is presented in Fig. 3(c). The mean of SE (SE_m) gives a precision of 0.952 and the mean of FS1 gives 0.924. The remaining attributes achieve lower rates; in particular, the positions of the maximum of FS1 and SE (FS1_p and SE_p) are of no use for the Pump target.
For the Accumulator target, no individual attribute of the sensor group GA5 (PS3, PS1, FS1, SE, PS2, TS1) exceeds a precision of 0.7, as shown in Fig. 3(d). The means of TS1 and PS1 are the two highest-ranked attributes, with precision of 0.683 and 0.59, respectively. This observation shows again that recognizing the conditions of the Accumulator target is harder than for the other targets in this application. Also, the positions of the maximum of FS1 and SE (FS1_p and SE_p) seem to be of no use for this target.
In summary, it is interesting to find that, for some targets, a single attribute of a single sensor can be sufficient for the classification task addressed in this work. The mean of the CE sensor (CE_m) and the kurtosis of PS2 (PS2_k) achieve precision of 0.998 and 0.992 for the Cooler and Valve targets, respectively. Conversely, some features, such as FS1_p and SE_p, are of no use for classifying either the Pump or the Accumulator target.
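The wrapper-based evaluation of single attributes can be illustrated with the following Python sketch. To keep it short and dependency-free, it swaps the small random forest for a nearest-class-mean rule on one feature, which is a plainly different and simpler classifier; all function names are hypothetical, and macro-averaged precision is an assumption about how per-class precision is aggregated.

```python
from collections import defaultdict


def fit_class_means(values, labels):
    """Mean of a single feature per class (stand-in for training a small forest)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, label in zip(values, labels):
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(means, value):
    """Assign the class whose mean is closest to the feature value."""
    return min(means, key=lambda label: abs(value - means[label]))


def macro_precision(true, pred):
    """Average of per-class precision over the classes present in 'true'."""
    scores = []
    for c in sorted(set(true)):
        tp = sum(t == c and p == c for t, p in zip(true, pred))
        fp = sum(t != c and p == c for t, p in zip(true, pred))
        scores.append(tp / (tp + fp) if (tp + fp) else 0.0)
    return sum(scores) / len(scores)


def score_features(train, labels_train, test, labels_test):
    """Score each feature on its own; train/test map feature name to a list of values."""
    result = {}
    for name in train:
        means = fit_class_means(train[name], labels_train)
        predictions = [predict(means, v) for v in test[name]]
        result[name] = macro_precision(labels_test, predictions)
    return result
```

A feature whose per-class means are well separated scores close to 1, while a feature with overlapping class means scores much lower, which mirrors the contrast between attributes such as CE_m and FS1_p reported above.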

V. DISCUSSION AND LIMITATIONS
Condition monitoring of hydraulic systems via sensors fixed inside the system is susceptible to hazardous situations if one or more sensors go out of service. Thus, the importance of the current study stems from investigating the role of each sensor in the assigned classification task. Besides, emphasizing the most effective attributes leads to optimizing the classification model size and training time. For the Valve target, the RF model using PS3 and PS2 is as efficient as the basic model, with a reduction in model size of 80.2% and in training time of 36.4%. Using only the attributes of FS1, SE and PS1 achieves slightly better performance than the basic model for the Pump target, with reduction ratios of 85.5% and 32.7% for model size and training time, respectively. A different finding is reported for the Accumulator target. Using a reduced set of sensors such as PS3, PS1, SE, FS1, TS1 and PS2 may result in performance degradation (marked with "-" in Table VII). However, the model size reduction ratio reaches 70.3%, and the training time is 26.4% less than when using all sensors. As for the attributes, it is found that simple time-domain features such as mean, kurtosis and skewness are very useful and efficient for this classification problem.
It is important to emphasize that, in this work, sensor role and feature importance are studied from a pure machine learning point of view. It would be valuable to interpret the validity of the obtained results with the aid of an expert in such hydraulic plants. Moreover, the proposed model and its optimized versions should be tested in a real environment where ad-hoc devices such as PLC units are in charge of monitoring the system conditions. The study also does not consider the effect of noise on the recorded sensor signals, in particular those of low-frequency sensors.

VI. CONCLUSIONS AND FUTURE WORK
The use of machine learning techniques for hydraulic system condition monitoring proves effective for automatic recognition of faults and severe conditions. It is common to install various sensors in the system in order to collect enough readings for different operating conditions. The system considered in this study is equipped with 17 sensors (14 physical and 3 virtual) for measuring quantities such as motor power, volume flow, pressure, temperature and vibration. The sensor signals are represented by a set of six simple time-domain attributes to classify four targets, namely Cooler, Valve, Pump and Accumulator. The random forest classifier is very suitable for this classification task, and its performance exceeds 99% for all targets. Moreover, the experimental work reveals the impact of each sensor on the classification of the conditions of each target. Using all sensors is not necessarily effective or efficient. Only one temperature sensor is sufficient for classifying the Cooler conditions. A similar observation holds for the Valve target, where only two pressure sensors are sufficient. Interestingly, the volume flow, pressure and efficiency factor sensors can achieve a better recognition rate than all sensors together for the Pump target.
In contrast, for the Accumulator target, the attributes of all sensors appear to be necessary in order to achieve high performance. However, using a reduced set of pressure, volume flow, temperature, and efficiency factor sensors still gives an acceptable classification rate for this target.
Using only a few sensors optimizes the classification model size and training time and, furthermore, reduces the cost of purchasing and maintaining many sensors when some of them are sufficient. It is worth investigating the methodology applied here for other similar applications in order to determine the most important sensors for a given fault.
The effect of the noise commonly present in sensor measurements on the classification model is a challenging issue that can be investigated in future work. Moreover, since the classification performed in the current work uses data windows of 60 seconds, a reasonable extension is to study the possibility of using shorter windows, e.g. 5, 10, 20 or 30 seconds. This could have a significant effect on the quick detection of severe fault conditions.
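As a starting point for the shorter-window extension, a 60-second cycle can be cut into non-overlapping windows before feature extraction. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, windows do not overlap, and trailing samples that do not fill a whole window are dropped.

```python
def split_windows(signal, rate_hz, window_s):
    """Cut one cycle into consecutive non-overlapping windows of window_s seconds."""
    size = int(rate_hz * window_s)  # samples per window
    if size <= 0:
        raise ValueError("window must contain at least one sample")
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]
```

A 100 Hz pressure signal of 6000 samples yields six 10-second windows of 1000 samples each, whereas a 1 Hz temperature signal yields only 10 samples per such window, so statistics like skewness and kurtosis may become unreliable for the slow sensors at short window lengths.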