An Efficient Methodology for Water Supply Pipeline Risk Index Prediction for Avoiding Accidental Losses

Abstract: Accidents that affect buildings and other human facilities because of poorly maintained water supply pipeline systems occur at random, but an efficient estimation system can help to avoid them. Such a system can assist caretakers in taking preventive measures to avoid accidents or at least reduce the associated risk. In this paper, we address this issue by proposing a water supply pipeline risk estimation methodology using a feed forward backpropagation neural network (FFBPNN). For validation and performance evaluation, real data on water supply pipelines collected in Seoul, Republic of Korea, from 1987 to 2010 is used. A comprehensive analysis is performed with both original and pre-processed input data. Pre-processing consists of two steps: data normalization and computation of statistical moments. The statistical moments are mean, variance, kurtosis and skewness. With data pre-processing, a significant improvement in prediction accuracy is observed in terms of the selected performance metrics: mean absolute error (MAE), mean absolute percentage error (MAPE) and root mean squared error (RMSE).


I. INTRODUCTION
Underground facilities must be monitored because they pose unforeseen hazards to buildings, bridges, railway lines, etc. Among these facilities, underground water supply pipelines are one of the most important contributors to underground risks and difficulties. Water is essential to life, and water supply depends entirely on pipelines. To supply water to homes, buildings and commercial areas, bundles of pipes are installed underground. As cities become more and more congested, these pipelines increasingly threaten constructed buildings, and supplying water in congested areas carries greater risk [1,2].
The condition of water supply pipelines is affected by parameters such as leakage, age, depth and height, pipe quality, water temperature, soil electrical resistivity, soil temperature, soil moisture, etc. One of the most important features for estimating water supply pipeline risk is age. The fitness of a pipeline degrades with the passage of time and also depends on the quality of its materials. A degraded pipeline can damage the underground structure of buildings; hence it is very important to repair damaged water supply pipelines on time to reduce the chance of accidents [3]. In this paper, we examine depth, leakage, length and age, which are very important factors in water supply pipeline failure or damage.
Water supply pipeline leakage is a significant factor that can cause unexpected underground hazards such as urban sinkholes and abrupt roadside subsidence. Leakage slowly and steadily damages the underground structure of congested buildings, subways, bridges, railways, etc., and because of this, underground water facilities remain a major and serious threat. The burial depth of a pipeline also influences eventual structural damage or failure. When a pipeline is laid close to the surface, human activity or moving vehicles can damage the surface over the pipe and can even break the pipe when a heavy load is exerted on it; if a pipe is buried deeper, surface loads have little effect on it. A sudden rupture in a water supply pipeline can seriously harm people nearby. Similarly, pipeline length plays an essential role in the protection of the water supply pipeline and ultimately in underground risk [4]. To analyze water supply risks, researchers have proposed different new techniques.
The objective of this paper is to accurately estimate the failure risk of water supply pipelines by applying FFBPNN to the original data, normalized data, and data with statistical moments. We selected FFBPNN for water supply pipeline risk estimation because it is one of the most important models for estimation and prediction [5]. Augmenting the data with statistical moments can also increase the performance of an ANN [5], as can normalization. In this study, pipeline failure risk is considered in terms of its impact on the underground structure.
385 | Page www.ijacsa.thesai.org
The rest of the paper is organized as follows: Section 2 presents related work, Section 3 describes the proposed model for water supply pipelines, and Section 4 explains the experimental results and discussion. The conclusion is given in Section 5.

II. RELATED WORK
Numerous algorithms have been developed for water supply risk index assessment in the literature.
Hussam et al. [6] suggested a hierarchical fuzzy model for water supply pipeline hazard evaluation in which 16 risk factors were considered. They inferred that the age of the pipe has a strong effect on failure, and added that pipe segment length, diameter and material are also very important elements of water supply pipelines. Yan and Vairavamoorthy [7] proposed a decision-making technique to assess pipeline conditions. The output of this model is a set of fuzzy numbers that reflect the state of each pipeline. Kleiner et al. recommended a technique for modeling the deterioration process of buried pipelines; the model is based on a fuzzy rule-based non-homogeneous Markov process [8]. They also proposed a fuzzy logic method for pipeline risk evaluation with three main components, namely failure possibility, failure consequences, and a combination of both [9].
Kleta et al. [10] reviewed systems that record video images of the pipe lining surface, in which cameras are moved along the pipe for damage assessment and integrity observation. Such recording and camera monitoring systems are limited to the visible parts of the surface. Different tools are designed to inspect large-diameter water mains. The most accurate tool available for detecting pockets of trapped gas, leaks and structural faults in complicated networks of large-diameter water mains is the Sahara Pipeline Inspection System. Meng et al. [11] recommended a quantitative risk assessment (QRA) model, a novel approach for evaluating risk in nonhomogeneous urban road tunnels, because existing QRA models are inappropriate for such tunnels. The model uses the tunnel segmentation principle, in which a nonhomogeneous urban road tunnel is divided into several homogeneous segments. The risk of each tunnel segment and the combined risk for the entire road tunnel are then evaluated. Duzgun et al. [12] suggested a decision analysis method for evaluating and managing the risk of roof falls in underground coal mines. The probability of occurrence, the possible consequences and the cost of those consequences were used for the risk assessment. Ustinovichius et al. [13] discussed various risk assessment methods. Risk assessment helps decision-makers rank existing risks and respond appropriately. Available methods include fault trees, Monte Carlo simulation, failure mode and effects analysis, event trees, game theory, fuzzy sets, grey systems and multicriteria verbal analysis. The multilayer perceptron (MLP) is an artificial neural network (ANN) with one or more hidden layers and bias nodes.
ANNs with different architectures have been used for many years in research areas including mathematics, engineering, medicine, neurology, meteorology, economics, hydrology and psychology [14][15][16][17]. ANN variants include the multilayer perceptron (MLP), self-organizing map (SOM), recurrent neural network (RNN) and feed-forward neural network (FFNN); the support vector machine (SVM) is a related machine learning model. In this work, the feed forward backpropagation neural network (FFBPNN) is used, which is a very popular ANN model for prediction and estimation [18].
Fayaz et al. [19] proposed a model called blended hierarchical fuzzy logic for water supply risk index assessment. The purpose of the model was to reduce the number of rules. Another model, the cohesive hierarchical fuzzy inference system, was developed in [20] to assess the water supply risk index. Its aim was similar to that of the previous model, and both models have the potential to decrease the number of rules as well as to improve accuracy.

III. PROPOSED MODEL FOR WATER SUPPLY PIPELINES
Fig. 1 depicts the proposed method, which comprises four layers: a data acquisition layer, a pre-processing layer, an estimation layer based on the neural network, and a performance evaluation layer. Each layer has its own functionality. The data layer contains the data related to water supply pipeline risks. In the pre-processing layer, statistical moments and normalization are used to pre-process the acquired data. In the estimation layer, FFBPNN is used for water supply pipeline risk index estimation. The performance of the FFBPNN is measured in the performance evaluation layer. Three performance measures are used to evaluate the neural network: mean absolute error (MAE), root mean squared error (RMSE) and mean absolute percentage error (MAPE).

A. Data Layer
The datasets used in this work are real datasets acquired from the Electronics and Telecommunications Research Institute (ETRI), which works on underground projects and has completed many of them globally. For our research, we collected water supply pipeline data from 1989-2010 at different places in Seoul, Republic of Korea. The literature reports that suitable transformations of the original data can improve the performance of machine learning algorithms; therefore, in this study, we normalize the data and calculate its statistical moments to get better results.

B. Pre-Processing Layer
First, the dataset is passed to the pre-processing layer. The dataset comprises the leakage, age, depth and height parameters of water supply pipelines. Using this data, we first calculate the statistical moments and then concatenate them with the original data. The four moments, namely variance, mean, kurtosis and skewness [17], can be calculated using Equations (1-4).
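The moment computation and concatenation described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: it assumes the moments are computed per record across the four input parameters, uses the population (biased) forms of the moments, and the sample values and the function names `row_moments` and `add_moments` are hypothetical.

```python
import numpy as np

def row_moments(X):
    """Per-row mean, variance, skewness and kurtosis (population forms).

    Assumption: moments are taken across the features of each record.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1)
    centered = X - mean[:, None]
    var = (centered ** 2).mean(axis=1)
    std = np.sqrt(var)
    # Rows with zero variance would divide by zero; use 1.0 there so that
    # their skewness and kurtosis evaluate to 0.
    safe_std = np.where(std > 0, std, 1.0)
    z = centered / safe_std[:, None]
    skew = (z ** 3).mean(axis=1)   # third standardized moment
    kurt = (z ** 4).mean(axis=1)   # fourth standardized moment (not excess)
    return np.column_stack([mean, var, skew, kurt])

def add_moments(X):
    """Concatenate the original features with their four moments."""
    X = np.asarray(X, dtype=float)
    return np.hstack([X, row_moments(X)])

# Hypothetical records with columns: leakage, age, depth, height.
records = np.array([[1.0, 2.0, 3.0, 4.0],
                    [2.0, 2.0, 2.0, 6.0]])
augmented = add_moments(records)
print(augmented.shape)  # four original columns plus four moment columns
```

Each record grows from four to eight values, which matches the eight-input network described in the experimental setup. Whether kurtosis is reported as raw or excess kurtosis changes the values by a constant offset of 3, so the choice should be fixed before comparing results.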
For training and testing purposes, the normalized data can be computed using Equation (5):
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{5}$$
where the normalized output value is denoted by $x'$, the current value by $x$, the minimum value in the set by $x_{\min}$, and the maximum value in the set by $x_{\max}$ [5].
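As an illustration of the normalization step, the sketch below rescales each value by the minimum and maximum of its set. It assumes that each input parameter (column) is normalized independently and that a constant column is mapped to zero; both choices, the function name `min_max_normalize` and the sample values are assumptions made for this example.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column of X to the range [0, 1] using its min and max."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    span = x_max - x_min
    # A constant column has zero span; dividing by 1.0 maps it to 0.
    span = np.where(span > 0, span, 1.0)
    return (X - x_min) / span

# Hypothetical values: column 0 is pipe age in years, column 1 is depth in metres.
raw = np.array([[10.0, 1.2],
                [20.0, 1.2],
                [30.0, 1.2]])
normalized = min_max_normalize(raw)
print(normalized)
```

In practice the minimum and maximum would normally be taken from the training portion only and reused for the test portion, so that no information from the test samples leaks into training.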

C. Water Supply Risk Index Estimation Layer
The ANN is a regression method that can capture nonlinear relationships between the dependent and independent variables [5]. In the recent decade, researchers have deployed NNs for a variety of estimation problems. The model used in the proposed work is the FFNN with error backpropagation, depicted in Fig. 2 for original and normalized data and in Fig. 3 for data with statistical moments. The ANN trained with the error backpropagation algorithm (FFBPNN) is a very popular model for prediction and estimation [18]. An ANN normally has three kinds of layers: an input layer, a hidden layer, and an output layer, as depicted in Fig. 2 and 4. Researchers often use more than one hidden layer, and a bias node can also be added to the hidden layer to reduce the model error.
The output of node j in the hidden layer is computed using Equation (6):
$$v_j = f\left(\sum_{i=1}^{I} w_{ij} x_i\right), \quad j = 1, \dots, J \tag{6}$$
where node j in the hidden layer is denoted by $v_j$, node i in the input layer is denoted by $x_i$, $w_{ij}$ denotes the weight between the two nodes, and $f$ is the activation function. The output layer node $y$ is computed by Equation (7):
$$y = f\left(\sum_{j=1}^{J} w_{j1} v_j\right) \tag{7}$$
where $w_{j1}$ denotes the weight between hidden node j and the output node. (We have only taken two output nodes in this research work; more output nodes can be used according to the requirements.) The error between the computed data and the observed data is given by Equation (8):
$$e = d - y \tag{8}$$
where $d$ is the observed data. The error terms propagated back from the output layer, $\delta_y$, and from the hidden layer, $\delta_{v_j}$, are calculated using Equations (9) and (10), respectively.
The weight adjustments between the input and hidden layers and between the hidden and output layers are computed by Equations (11) and (12), respectively:
$$\Delta w_{ij} = \alpha \, \delta_{v_j} x_i, \quad i = 1, \dots, I; \; j = 1, \dots, J \tag{11}$$
$$\Delta w_{j1} = \alpha \, \delta_y v_j, \quad j = 1, \dots, J \tag{12}$$
where $\alpha$ is the learning rate. With momentum, the adjustments are computed by Equations (13) and (14):
$$\Delta w_{j1}^{n} = \alpha \, \delta_y v_j + \beta \, \Delta w_{j1}^{n-1}, \quad j = 1, \dots, J \tag{13}$$
$$\Delta w_{ij}^{n} = \alpha \, \delta_{v_j} x_i + \beta \, \Delta w_{ij}^{n-1}, \quad i = 1, \dots, I; \; j = 1, \dots, J \tag{14}$$
where $n$ is the iteration of error backpropagation and $\beta$ is the momentum constant. The momentum method accelerates the weight updates and damps fluctuations during training, particularly in flat regions of the error surface.
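To make the forward pass and the momentum update concrete, the following sketch trains a single-hidden-layer network with one output node. It is an illustration under stated assumptions rather than the MATLAB implementation used in the experiments: the sigmoid activation, the absence of bias nodes, per-sample (online) updates, the standard delta-rule error terms, the toy data and the names `train_ffbpnn` and `predict` are all choices made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ffbpnn(X, d, n_hidden=10, alpha=0.1, beta=0.9, epochs=200, seed=0):
    """Online backpropagation with momentum for a one-output network."""
    rng = np.random.default_rng(seed)
    n_samples, n_inputs = X.shape
    W_ih = rng.normal(0.0, 0.5, size=(n_inputs, n_hidden))  # input-to-hidden weights
    w_ho = rng.normal(0.0, 0.5, size=n_hidden)              # hidden-to-output weights
    dW_ih_prev = np.zeros_like(W_ih)
    dw_ho_prev = np.zeros_like(w_ho)
    history = []
    for _ in range(epochs):
        sq_err = 0.0
        for x, target in zip(X, d):
            v = sigmoid(x @ W_ih)              # hidden node outputs
            y = sigmoid(v @ w_ho)              # output node
            e = target - y                     # error between observed and computed
            # Assumed sigmoid delta terms for the output and hidden layers.
            delta_y = e * y * (1.0 - y)
            delta_v = v * (1.0 - v) * delta_y * w_ho
            # Weight changes: learning-rate term plus momentum term.
            dw_ho = alpha * delta_y * v + beta * dw_ho_prev
            dW_ih = alpha * np.outer(x, delta_v) + beta * dW_ih_prev
            w_ho += dw_ho
            W_ih += dW_ih
            dw_ho_prev, dW_ih_prev = dw_ho, dW_ih
            sq_err += e ** 2
        history.append(sq_err / n_samples)     # mean squared error of this epoch
    return W_ih, w_ho, history

def predict(X, W_ih, w_ho):
    return sigmoid(sigmoid(X @ W_ih) @ w_ho)

# Toy data (hypothetical): four inputs in [0, 1], target is their mean.
rng = np.random.default_rng(1)
X_demo = rng.uniform(0.0, 1.0, size=(40, 4))
d_demo = X_demo.mean(axis=1)
W_ih, w_ho, history = train_ffbpnn(X_demo, d_demo, n_hidden=10,
                                   alpha=0.05, beta=0.5, epochs=200)
print(f"first-epoch MSE {history[0]:.4f}, last-epoch MSE {history[-1]:.4f}")
```

Because the output node is a sigmoid, the targets must lie in (0, 1); a risk index on another scale would first need to be rescaled, or the output activation replaced by a linear one.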
To validate each of the models developed for different numbers of inputs and hidden neurons, the percentage split method is applied: the data are randomly separated into 70% for training and 30% for testing. This is a commonly used ratio for splitting training and testing data [15].
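A random 70/30 percentage split of this kind could be written as in the sketch below. The fixed seed, the function name `percentage_split` and the toy arrays are assumptions made for illustration only.

```python
import numpy as np

def percentage_split(X, y, train_fraction=0.7, seed=0):
    """Randomly split samples into training and testing subsets."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))               # shuffled sample indices
    n_train = int(round(train_fraction * len(X)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Ten hypothetical samples with two features each.
X_all = np.arange(20).reshape(10, 2)
y_all = np.arange(10)
X_train, X_test, y_train, y_test = percentage_split(X_all, y_all)
print(len(X_train), len(X_test))
```

Fixing the seed makes the split reproducible, which matters when several network configurations are compared on the same training and testing samples.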

D. Performance Evaluation Layer
Different measures are available to evaluate the performance of the model. Three of them are used here: mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE) [15]. These performance metrics are commonly used in the literature to assess regression accuracy. MAE, RMSE and MAPE are calculated using Equations (15), (16) and (17), respectively:
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| A_i - P_i \right| \tag{15}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( A_i - P_i \right)^2} \tag{16}$$
$$\mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{A_i - P_i}{A_i} \right| \tag{17}$$
where the total number of observations is represented by $N$, the actual values are denoted by $A$, and the estimated values are denoted by $P$. The resulting values are reported in Table I.
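The three metrics can be computed directly from the actual values A and the estimated values P, as in the sketch below. The sample values are hypothetical, and MAPE is assumed to be expressed as a percentage and is undefined when any actual value is zero.

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(a - p)))

def rmse(actual, predicted):
    """Root mean square error."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((a - p) ** 2)))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(100.0 * np.mean(np.abs((a - p) / a)))

# Hypothetical actual and estimated risk index values.
A = [2.0, 4.0, 5.0]
P = [1.0, 5.0, 5.0]
print(mae(A, P), rmse(A, P), mape(A, P))
```

MAE and RMSE are in the units of the risk index, whereas MAPE is scale-free, so MAPE is the only one of the three that can be compared directly between original and normalized targets.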

IV. EXPERIMENTAL RESULTS AND DISCUSSION

A. Experimental Setup
The experiments are performed on the Windows 10 operating system with an Intel Core i5 processor, using MATLAB R2019b version 9.7.0.1216025. Different types of experiments have been performed on the data in order to obtain the best estimated risk index for water supply pipelines. Input parameters typically play the most important role in the performance of any machine learning algorithm. Therefore, in the first experiment, the leakage, depth, length and age values of water supply pipelines are given as inputs to the feed forward backpropagation neural network. The models we tried consist of an input layer, a hidden layer with different numbers of neurons, and an output layer. The configuration best suited to the proposed method has four (4) neurons in the input layer, ten (10) neurons in the hidden layer and a single neuron in the output layer, as shown in Fig. 4.
Second, different sets of experiments are carried out with normalized data. Four neurons in the input layer, sixteen (16) neurons in the hidden layer and a single neuron in the output layer have been applied, as shown in Fig. 5. In this case, we also tried different numbers of neurons in the hidden layer, from the minimum to the maximum considered. We found that this combination (10 neurons in hidden layer) is the best fit with four inputs and one output.
Third, the experiment is performed by combining the original data with statistical moments, as shown in Fig. 6. Eight (8) neurons are configured in the input layer and twenty (20) neurons in the hidden layer, with one neuron in the output layer. Age, depth, height, leakage, variance, mean, kurtosis and skewness are the inputs to the neural network. To find the best number of hidden neurons for this input and output layer, we tested different numbers of neurons in the hidden layer and selected this combination as the most accurate one, as shown in Fig. 6.
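For reference, the three selected configurations can be written down as a small data structure, which also makes it easy to compare their sizes. The class `NetworkConfig`, its method and the weight counts are illustrative additions; in particular, whether bias terms are present in the trained networks is an assumption, so both counts are shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NetworkConfig:
    name: str
    n_inputs: int
    n_hidden: int
    n_outputs: int = 1

    def weight_count(self, with_bias=False):
        """Number of trainable weights in a single-hidden-layer network."""
        count = self.n_inputs * self.n_hidden + self.n_hidden * self.n_outputs
        if with_bias:
            # One bias per hidden neuron and per output neuron (assumption).
            count += self.n_hidden + self.n_outputs
        return count

# Layer sizes as reported for the three experiments.
CONFIGS = [
    NetworkConfig("original data", n_inputs=4, n_hidden=10),
    NetworkConfig("normalized data", n_inputs=4, n_hidden=16),
    NetworkConfig("data with statistical moments", n_inputs=8, n_hidden=20),
]

for cfg in CONFIGS:
    print(cfg.name, cfg.weight_count(), cfg.weight_count(with_bias=True))
```

The network for data with statistical moments has more than three times as many weights as the network for original data, which is worth keeping in mind when the dataset is small and only 70% of it is used for training.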

B. Results and Discussion
The graphical representations of the estimated results are presented in this section. The actual risk, the estimated risk and the estimation errors for water supply pipelines using the originally collected data are shown in Fig. 7 and 8. The corresponding results using FFBPNN on normalized data are shown in Fig. 9 and 10, and those using FFBPNN on the collected data with statistical moments are shown in Fig. 11 and 12. The performance of FFBPNN on the normalized data (ND), original data (SD) and data with statistical moments (SMD), measured by mean absolute percentage error (MAPE), mean absolute error (MAE) and root mean square error (RMSE), is shown in Table I and Fig. 13. The outcomes show that FFBPNN performs better on both ND and SMD than on the original data.

V. CONCLUSION
In this paper, a multi-layer perceptron is applied to predict water supply pipeline risk. The multi-layer perceptron was trained and tested on random splits of historical data on water supply pipelines installed in Seoul, Republic of Korea, collected from 1989-2010. Experimentation is done with 70/30 random training-test splits, with 70% of the samples used for training and 30% for testing. For the performance and accuracy evaluation of the models, the root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) were used. FFBPNN was first applied to the original collected data, and the results obtained were not satisfactory. We then calculated the statistical moments of the original data, merged them with the original data, and applied FFBPNN to this new data. The performance metrics indicate that the results provided by FFBPNN using the data with statistical moments are comparatively better. Further, we normalized the original data and applied FFBPNN to the normalized data, where a noticeable improvement was also achieved. Overall, the performance of FFBPNN on data with statistical moments is slightly better than on normalized data and far better than on original data. In future, we may apply further pre-processing to the data and test more machine learning algorithms.