Road Damage Detection Utilizing Convolution Neural Network and Principal Component Analysis

Roads should always be kept in reliable condition and maintained regularly. One problem that requires careful attention is pavement cracking. This is a challenging problem for road engineers, since both drivers and pedestrians need roads that are kept in a stable condition. Many methods have been proposed to handle this problem while saving time and cost. In this paper, we propose a two-stage method that detects pavement cracks by combining Principal Component Analysis (PCA) with a Convolutional Neural Network (CNN), treating detection as a classification problem. We employ PCA to extract the most significant features, using different numbers of PCA components. The proposed approach was trained using the Mendeley Asphalt Crack dataset, which contains 400 images of road cracks with a 480×480 resolution. The obtained results show that PCA helps speed up the learning process of the CNN.
Keywords: Pavement crack; Convolutional Neural Network (CNN); Principal Component Analysis (PCA)


I. INTRODUCTION
Transportation systems depend mainly on the quality of the pavement's condition. Pavement should be able to handle traffic and environmental load for many years [1]. Nevertheless, roads may be damaged over time and exhibit distresses. To guarantee long-term performance and an efficient level of service, they need to be well preserved and undergo frequent maintenance. Semi-automated and automated imaging-based methods are employed to provide early detection of pavement cracks [2]. In general, roads should have good features such as shape, surface, and friction so that users feel safe while using them. Authorized transportation agencies are responsible for maintaining roads regularly and keeping them in good condition. Roads should also follow a prearranged maintenance schedule to keep them safe for the public [3].
The U.S. Department of Transportation (DOT) spends billions of dollars every year on building new roads and bridges. For example, in 2018, the DOT spent more than $63 billion on major transportation infrastructure investments throughout the USA. Meanwhile, the 2017 Infrastructure Report of the American Society of Civil Engineers (ASCE) gave the road infrastructure in the USA a "D+" grade. It pointed out that one out of every five miles of highway in the USA has pavement in poor condition [4].
One of the main difficulties of maintaining road safety is pavement crack detection, which is a challenging problem that faces road engineers all year [5]. There are several causes of pavement cracks, including poor construction, bad weather conditions, and inadequate structural support for large vehicles [6]. Traditionally, cracks have been detected through visual inspection, which has proven to be tedious, time-consuming, and expensive, with an especially low rate of effectiveness. Normally, a road maintenance operator must rely on a great deal of related knowledge and subjective judgment to deal with such problems [7]. Manual inspection is also extremely dangerous for inspectors due to traffic hazards.
Drivers are at high risk as well. Traffic accidents are a cause of serious concern for transportation engineers and researchers, since road accidents result in significant social and economic costs. The number of accidents on highways fluctuates each year [8]. Some life-threatening consequences of pavement deterioration and defects are skidding, accidentally driving off the road, and sudden maneuvering to avoid road infrastructure problems [9], which places the driver and others at high risk. Besides, poor surface macrotexture and microtexture can lead to hydroplaning and inconsistent tire-pavement contact, which reduces the tires' grip on the pavement and can cause accidents [10]. Road or pavement engineers usually inspect all types of cracks, distress, and unevenness in a routine manner by gathering road condition data. Road data should be gathered in all weather and traffic conditions. This process is prone to human error and is time-consuming [11]. Therefore, it is important to have well-defined strategies for monitoring and maintaining roads [12].
The motivation behind this work can be described as a response to the thousands of needless deaths each year that occur due to pavement distresses all around the world. Maintenance workers put their lives on the line to perform manual inspections of roads. According to the United States Department of Labor Occupational Safety and Health Administration, out of 4,674 worker fatalities in private industry in 2017, 20.7% were in construction. In other words, one in five worker deaths last year were in construction. As a result, this motivates us to investigate the performance of the CNN method with PCA as a feature selection to detect pavement cracks inside images.
The objective of this work is to develop an intelligent approach based on CNN for road damage detection that achieves trustworthy detection and classification of cracks in 2D concrete and asphalt pavement images. This paper is organized as follows. Section II presents a literature review of pavement crack detection research. Sections III and IV discuss the PCA and CNN elements of the proposed method, and Section V presents the proposed method itself. A description of the dataset and the way we split the training and testing data is presented in Section VI. The experimental results based on a well-known pavement crack dataset are presented in Section VII. Finally, concluding remarks and future work are presented in Section VIII.

II. LITERATURE REVIEW
In the past, many research papers have investigated the pavement crack problem as an image processing problem. For example, Sy et al. [13] applied three operations (i.e., bi-level threshold, morphological, and projection) to detect pavement cracks. The experiments were carried out on three kinds of images: laboratory images, static images, and AMAC® images. Li et al. [14] studied this problem by proposing a thresholding method based on neighboring differential histogram statistics. Oliveira and Correia [15] handled the detection of pavement cracks in images by proposing a local thresholding approach based on non-overlapping blocks.
Recently, deep convolutional neural networks have shown great advantages in image classification and have proven to be excellent classifiers for pavement cracks. In [7], the authors extracted small patches from cracked pavement images as inputs to generate a large training database. Their proposed CNN included 4 convolutional layers with 2 max-pooling layers and 3 fully connected layers. The proposed method had an accuracy of 91% and a recall rate of 91%. In the CrackIT project published in [16], the authors used the mean and standard deviation in an unsupervised learning algorithm to distinguish blocks with cracks from blocks without cracks. They assigned severity levels to the identified crack segments based on a computed measurement of the crack's width. The ratio between the crack segment area and the number of crack pixels belonging to the crack skeleton was computed. The results showed an accuracy of 97%, a recall score of 98.4%, and a precision of 95.5%. The drawback of their method was that it had to deal with extremely thin cracks (many of which were less than 2 mm wide), which proved to be a difficult task.
In [17], AlexNet, created by Alex Krizhevsky, used Rectified Linear Units (ReLU) instead of the tanh function, which was standard at the time. The advantage of ReLU over other activation functions is its training time: a CNN with ReLU layers reached a 25% error on the CIFAR-10 dataset six times faster than a CNN using the tanh function.
In separate work, Honyan Xu et al. proposed an end-to-end crack detection method based on a CNN with 28 layers, including 16 convolutional layers, and achieved a 90.19% test accuracy [18]. Unmanned Aerial Vehicles (UAVs) were also used for crack inspection and monitoring. In [19], the authors proposed applying pre-trained deep learning models with transfer learning methods to detect pavement cracks based on UAV images of civil infrastructure. They employed small and complex UAV images for the training and validation phases. The obtained results show that the accuracy of the proposed methods was 90% in finding cracks in practical situations, with no need for augmentation and pre-processing.
In [20], the authors introduced a CNN model structure to solve the crack detection and classification problems. They used a digital camera to collect images of various resolutions (i.e., 32 × 32 and 64 × 64). Two CNN networks were trained based on image resolution to detect whether there was a crack or not. To achieve the second goal, the authors converted the images to binary ones with two types of crack: transverse and longitudinal. The output from the first stage was fed to a second CNN to classify the type of crack. The finding was interesting since the images with low resolution provided a higher classification accuracy. For 32 × 32 resolution images, the recall, precision, and accuracy were 98.0%, 99.4%, and 99.2%, respectively, for crack and non-crack detection, while the classification accuracy (i.e., transverse and longitudinal) reached 98% and 97%.

III. METHODS
Developing an intelligent and trustworthy detection model based on CNN to detect cracks inside 2D concrete and asphalt pavement images is the main objective of this paper. The adopted database of concrete and asphalt pavement has images obtained by a 2D area digital scanning method. In this section, we shall describe the adopted methodology to solve the pavement crack detection problem.

A. Preprocessing: PCA
Principal Component Analysis, commonly referred to as PCA, is a linear transformation of data. It is one of the most widely used methods of re-framing given data [21], [22]. Geometrically, PCA measures the distances from the data points to a line and finds the line that minimizes those distances; equivalently, it finds the line that maximizes the distances from the projected points to the origin. This transformation makes subsequent dimensionality reduction easier. The data must first be standardized so that each dimension is centered around zero and has a standard deviation of 1. PCA then finds a new axis, or new attribute, along which the variance of the data is maximized.
PCA works as a dimension reduction and data analysis tool. It has been applied successfully in a wide range of research areas such as data mining, image processing, and artificial intelligence [23], [24]. PCA is one of the most well-known methods of factor analysis for projecting high-dimensional data (e.g., images) onto low-dimensional data based on a linear transformation while retaining most of the information in the original features [25]. The PCA method therefore reduces the number of variables and groups the new variables into groups called factors, which improves the overall performance of machine learning classifiers in terms of execution time and memory usage.
The basic idea of PCA was introduced in 1901 by Karl Pearson [26]. In 1936, Harold Hotelling [27] improved and developed the classical PCA. PCA is a method that aims to simplify a multidimensional dataset to lower dimensions for analysis and visualization. In general, PCA works by converting the correlated feature variables into a new set of linearly uncorrelated feature variables, which are called principal components. The main condition of PCA is that the number of PCA components should be less than or equal to the number of original feature variables.
In this paper, we employed PCA as a pre-processing step on the images before sending them to the CNN, in order to reduce the data size and improve the overall performance of the CNN. Given a set of pavement crack images $\{x_1, x_2, x_3, \ldots, x_n\}$, PCA works as follows:
• First: We calculate the covariance matrix of $x$ using Equation 1.
• Second: We compute the eigenvectors of the covariance matrix and construct the matrix $U$ as shown in Equation 2, where $u_1$ represents the first eigenvector, $u_2$ represents the second eigenvector, and so on. Equation 3 shows the calculation used to construct the input feature maps $x_{rot}$, which are uncorrelated with each other. The covariance matrix of $x_{rot}$ is a diagonal matrix whose diagonal elements are $\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_n$, where $\lambda_i$ is the eigenvalue corresponding to the $i$-th eigenvector in $U$.
• Finally: PCA is evaluated based on Equation 4.
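The steps listed above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: it assumes that each image has already been flattened into a feature vector, that the covariance is normalized by the number of samples, and that the leading eigenvectors are found by power iteration with deflation (an assumption chosen only to keep the example free of external libraries). The function names `covariance_matrix`, `top_eigenpairs`, and `pca_project` are hypothetical.

```python
import math
import random


def covariance_matrix(samples):
    """Return (mean, covariance) for a list of equal-length feature vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(x[j] for x in samples) / n for j in range(d)]
    centered = [[x[j] - mean[j] for j in range(d)] for x in samples]
    # Assumption: population covariance, i.e. division by n.
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    return mean, cov


def mat_vec(mat, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * t for m, t in zip(row, v)) for row in mat]


def top_eigenpairs(cov, k, iters=200, seed=0):
    """Leading k (eigenvalue, eigenvector) pairs of a symmetric matrix.

    Uses power iteration with deflation; adequate for a toy example only.
    """
    d = len(cov)
    rng = random.Random(seed)
    mat = [row[:] for row in cov]
    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = mat_vec(mat, v)
            norm = math.sqrt(sum(t * t for t in w))
            if norm < 1e-12:  # the remaining matrix is numerically zero
                break
            v = [t / norm for t in w]
        norm_v = math.sqrt(sum(t * t for t in v))
        v = [t / norm_v for t in v]
        lam = sum(a * b for a, b in zip(v, mat_vec(mat, v)))
        pairs.append((lam, v))
        # Deflation: remove the component that was just found.
        for a in range(d):
            for b in range(d):
                mat[a][b] -= lam * v[a] * v[b]
    return pairs


def pca_project(samples, k):
    """Project centered samples onto the k leading principal components."""
    mean, cov = covariance_matrix(samples)
    pairs = top_eigenpairs(cov, k)
    projected = [[sum((x[j] - mean[j]) * u[j] for j in range(len(mean)))
                  for _, u in pairs] for x in samples]
    return projected, [lam for lam, _ in pairs]
```

For example, `pca_project(samples, 10)` would return ten features per sample together with the ten largest eigenvalues. For real 480×480 images a numerical library would be needed, since this pure-Python version is far too slow at that size.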

B. What is ANN?
Artificial Neural Networks (ANNs) are computational processing systems inspired by how biological nervous systems function [28]. In reality, the neural system is very complex and consists of an extremely large number of neurons. Each neuron receives input signal(s) through its dendrites and generates output signal(s). The output signal(s) travel along the axon, which transfers the generated signal to the next neuron through synapses. Once a set of input signals reaches a predetermined threshold value, the neuron is triggered; ANNs simulate this behavior of the human brain (see Fig. 1 [29]). ANNs consist of interconnected nodes, called neurons, that learn from a given input to optimize the final output. These artificial neurons have numeric weights attached to them, which are optimized during the training phase. A well-trained ANN performs well when given a piece of data or a pattern to recognize or identify (see Fig. 2). Using a suitable learning algorithm, these units can efficiently generate a function that maps the relationship between the inputs and outputs of the training examples.
An ANN uses a training dataset (i.e., images, raw data, etc.) as input data. The input layer handles the training dataset and is connected to the next layer (i.e., the hidden layer). The hidden layer manipulates the data and tunes the connection weights before sending the data to the output layer. Each ANN should have a learning algorithm (e.g., BackPropagation, Convolutional Neural Network, Long Short-Term Memory, etc.) that tunes the ANN weights to enhance the overall performance of the ANN by reducing the error between the real output (i.e., actual) and the obtained output (i.e., predicted) from the ANN [30], [31], [32].
Several tuning parameters should be chosen before an ANN can be trained. They include the number of hidden layers, the type of activation function (e.g., sigmoid) for the neurons, and the adopted learning algorithm (see Fig. 2 [33]).

IV. WHAT IS CNN?
The earliest CNN model, called LeNet-5, was proposed by LeCun in 1998 [34]. CNN can be thought of as a close family member of the traditional ANN. The main structure of CNN is motivated by the discovery of the visual cortex in the brain, which contains a large number of cells that detect light in small, overlapping sub-regions of the visual field called receptive fields (see Fig. 3). These cells act as local filters over the input space, and the more complex cells have larger receptive fields. Therefore, a simple CNN is a series of layers, and every layer of a CNN converts one volume of activations to another through a differentiable function. We use three main types of layers to build CNN architectures: Convolutional Layer, Pooling Layer, and Fully-Connected Layer (exactly as seen in regular Neural Networks). We stack these layers to form a full CNN architecture.

A. Convolution Layer
Traditional neural networks are fully connected in every layer, while convolutional layers in CNNs use the convolution operation [35]. The convolution layer in a CNN performs the function carried out by the cells in the visual cortex. The neurons in CNNs self-optimize through learning. Each neuron receives an input and performs an operation on it. CNNs have an input layer, various hidden layers, and an output layer. These hidden layers use a mathematical model to pass results on to the following layer.
Convolution is the first layer used to extract features from an input image. It preserves the relationship between pixels by learning image features using small tiles of input data [1], [36]. Essentially, the convolutional layer is a mathematical operation that takes two inputs: an image matrix and a filter (or kernel). Convolution uses a small square matrix, which preserves the spatial relationships among pixels, to learn image features [37]. With different filters, the convolution layer can perform quite a few operations, including edge detection, blurring, and sharpening an image [38].
The convolution layer is the essential component of a convolutional neural network. The convolution layer consists of a set of independent filters. Each filter is individually convolved with the image, and feature maps are obtained. In general, if we convolve an image of size N × M with a filter of size l × k, we get an output feature map of size $O_{width} \times O_{height}$ as given in Equation 7.
$$O_{width} = \frac{N - l + 2p_l}{s_l} + 1, \qquad O_{height} = \frac{M - k + 2p_k}{s_k} + 1 \tag{7}$$
where $p_l$ and $p_k$ are the padding in width and height, respectively, and $s_l$ and $s_k$ represent the stride in the horizontal and vertical directions. Thus, if we apply a convolution operation with a filter of size 5 × 5 on an image of size 32 × 32, with a padding of zero and a stride of one, the result will be a feature map of size 28 × 28. The output feature map is acquired by convolving the input maps with a linear filter, adding a bias term, and then applying a nonlinear function. The output can be generally denoted by the formula in Equation 8, where $q$ represents the layer number, $W_{ij}$ represents the convolutional kernel, $b_j$ represents the bias, $I_j$ represents the set of input maps, and $f(\cdot)$ represents the activation function. Fig. 4 shows the output features collected from the third layer after doing the feature extraction.
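The output-size rule and the filter, bias, and activation computation described above can be sketched in a few lines of Python. This is an illustrative example under stated assumptions, not the layer used in the paper: it handles a single input map and a single filter, applies no padding, and computes cross-correlation (the kernel is not flipped), which is the convention most deep learning frameworks follow. The names `conv_output_size` and `conv2d` are hypothetical.

```python
def conv_output_size(n, l, p=0, s=1):
    """Output length along one axis: floor((n - l + 2p) / s) + 1."""
    return (n - l + 2 * p) // s + 1


def relu(x):
    """Rectified linear unit used here as the nonlinear function f."""
    return x if x > 0 else 0.0


def conv2d(image, kernel, bias=0.0, stride=1, activation=relu):
    """Slide one filter over one input map (no padding).

    Each output value is the sum of element-wise products between the
    kernel and the covered image patch, plus a bias, passed through
    the activation function.
    """
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = conv_output_size(rows, k_rows, 0, stride)
    out_cols = conv_output_size(cols, k_cols, 0, stride)
    output = []
    for i in range(out_rows):
        out_row = []
        for j in range(out_cols):
            acc = bias
            for a in range(k_rows):
                for b in range(k_cols):
                    acc += image[i * stride + a][j * stride + b] * kernel[a][b]
            out_row.append(activation(acc))
        output.append(out_row)
    return output
```

With these definitions, `conv_output_size(32, 5, 0, 1)` reproduces the 28 quoted in the text, and a 1 × 2 kernel `[[-1, 1]]` responds only where the pixel intensity increases from left to right, which is a simple form of edge detection.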

B. Pooling Layer
The main objective of the pooling layer is to decrease the spatial size of the representation, which enhances the overall performance of the neural network, reduces the number of CNN parameters, and lowers the probability of overfitting. In general, the pooling layer is located between successive convolutional layers in a CNN. The pooling layer operates by sliding a two-dimensional filter over each channel of the feature map and summarizing the features lying within the region covered by the filter.
The pooling layers in the CNN are used to reduce the number of parameters when the images are too large (see Fig. 5). Spatial pooling reduces the dimensionality of each map but keeps important information. It can also control overfitting. There are two types of pooling: 1) max pooling and 2) average pooling. Max pooling is defined as a sample-based discretization process that keeps only the largest value in each window. Its advantage is that the strongest responses in a feature map, such as detected edges, are preserved while the map is downsampled.
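As a rough illustration of the two pooling types, the following Python sketch slides a square window over a single feature map and keeps either the maximum or the mean of each window. The window size of 2 and stride of 2 are assumptions made for the example (the paper does not state its pooling settings), and `pool2d` is a hypothetical name.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Downsample one feature map with max or average pooling."""
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, rows - size + 1, stride):
        pooled_row = []
        for j in range(0, cols - size + 1, stride):
            # Collect the values covered by the window at (i, j).
            window = [feature_map[i + a][j + b]
                      for a in range(size) for b in range(size)]
            if mode == "max":
                pooled_row.append(max(window))
            else:
                pooled_row.append(sum(window) / len(window))
        pooled.append(pooled_row)
    return pooled
```

A 4 × 4 map pooled this way becomes a 2 × 2 map, so every later layer sees a quarter as many values per channel.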

C. Rectified Linear Unit
The activation function is a very important element of the CNN design, since it allows the network to learn and perform more complex tasks. Activation functions are nonlinear functions applied to the input. Two frequently used activation functions in the literature are the sigmoid (logistic) function and the hyperbolic tangent (tanh) function. In this work, we adopt the ReLU activation function. ReLU stands for Rectified Linear Unit and is a nonlinear operation. The rectified linear activation function is defined as a piecewise linear function: it returns the input itself if the input is positive and zero if the input is negative. Fig. 6 shows the ReLU function. Equations 9 and 10 demonstrate the computations of ReLU.
ReLU helps errors backpropagate through multiple layers of neurons that are triggered by the ReLU function. It helps to overcome the vanishing gradient problem and allows models to learn faster. ReLU is widely recommended for use in CNN classification models [39].
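The piecewise definition of ReLU, commonly written as f(x) = max(0, x), and its gradient can be sketched as follows. The gradient function is included only to illustrate why ReLU eases the vanishing gradient problem: the slope is exactly 1 for every positive input, however large. Setting the gradient at x = 0 to zero is a convention assumed here, and the name `relu_grad` is hypothetical.

```python
def relu(x):
    """Return x for positive inputs and zero otherwise."""
    return x if x > 0 else 0.0


def relu_grad(x):
    """Slope of ReLU: 1 for positive inputs, 0 otherwise (0 at x = 0 by convention)."""
    return 1.0 if x > 0 else 0.0
```

By contrast, the slopes of sigmoid and tanh shrink toward zero for inputs of large magnitude, so gradients multiplied across many layers can become very small.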

D. SoftMax Unit
SoftMax is another activation function, like sigmoid, tanh, and ReLU. It is commonly used for the neurons in the output of the fully connected layer. It is defined as:
$$\sigma(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}, \qquad i = 1, \ldots, C$$
where $C$ is the number of classes, $z$ is the input vector, and $\sigma(z_i)$ is the output class probability. The SoftMax function provides a discrete probability distribution over all the given classes: each output is a probability $p_i \in [0, 1]$, and the probabilities of all $C$ classes sum to one, $\sum_{i=1}^{C} p_i = 1$.
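A minimal Python sketch of the SoftMax function described above is given below. Subtracting the maximum input before exponentiating is a common numerical-stability step that does not change the result; it is an addition made for this example and is not mentioned in the paper.

```python
import math


def softmax(z):
    """Map a list of scores to probabilities that sum to one."""
    shift = max(z)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]
```

For a two-class crack versus non-crack problem, `softmax` would receive two scores from the last fully connected layer, and the class with the larger probability would be taken as the prediction.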

E. Fully-Connected Layer
Fully connected (FC) means that every single neuron in the preceding layer is connected to every single neuron in the current layer. Each neuron computes a summation followed by an activation function. The final layer of a CNN is an FC layer that has full connections to all activations in the previous layer, as in the traditional ANN. These activations are used to compute the final output of the CNN via a matrix multiplication followed by a bias offset. In the CNN, the FC layer merges all the features obtained from the previous convolutional and sub-sampling layers.
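The matrix multiplication followed by a bias offset can be written out directly. The sketch below is an illustration only; the layer sizes and the name `dense` are assumptions, with one row of `weights` per output neuron.

```python
def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: weighted sum of all inputs plus bias per neuron."""
    if len(weights) != len(biases):
        raise ValueError("need one bias per output neuron")
    outputs = []
    for row, b in zip(weights, biases):
        if len(row) != len(inputs):
            raise ValueError("each weight row must match the input length")
        z = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs
```

In a CNN, `inputs` would be the flattened feature maps from the last pooling layer, so each output neuron combines every extracted feature.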

F. Evaluation Metrics
The evaluation metrics used in this paper are accuracy, precision, and recall. These metrics were chosen because they are commonly used in classification problems. They are defined as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
where TP, FP, TN, and FN are the numbers of true positives, false positives, true negatives, and false negatives, respectively.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 6, 2020
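The three metrics of this subsection can be computed from the TP, FP, TN, and FN counts as in the following Python sketch. Treating label 1 as the "crack" class and returning 0.0 when a denominator is zero are assumptions made for the example, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all samples that were classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of actual positives that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0
```

For instance, a detector that flags four images as cracked, three of them correctly, while missing one cracked image out of eight images in total, has a precision and a recall of 0.75 each.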

V. PROPOSED CNN-BASED METHOD
The proposed method used in this work is depicted in Fig. 7. It is a combination of the PCA and CNN methods. In the first step, we collect data on pavement cracks using a traditional method (e.g., a camera operator) or an intelligent method (e.g., a robot with a camera). After collecting the data, it is important to identify the most valuable features by extracting image features using the PCA method. The proposed method is intended to speed up the convergence of the CNN.

VI. DATASET
A pavement crack dataset called the Asphalt Crack dataset is used in this work. The data consists of 400 images. It is a contribution by Jayanth Balaji, Thiru Balaji, Dinesh M S, Binoy Nair, and Harish Ram D.S. The dataset was published on April 26, 2019 [40]. Fig. 8 depicts samples of the dataset (i.e., cracked and not cracked pavements).

A. Training and Testing Data
The hold-out method is the simplest kind of cross-validation method. The basic idea of the hold-out method is to divide the data into two groups: a training dataset and a test/validation dataset. The training dataset is used to train the model (i.e., CNN), and the trained model is then evaluated using the test/validation dataset. This approach to splitting data is applicable to image processing applications. We adopted the classical hold-out method for splitting the data.
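A hold-out split can be sketched in a few lines of Python. The paper does not state the split ratio, so the 70/30 default below is purely an assumption for illustration, as are the fixed random seed and the name `holdout_split`.

```python
import random


def holdout_split(items, test_fraction=0.3, seed=42):
    """Randomly divide items into a training list and a test list."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = int(round(len(items) * test_fraction))
    test_idx = set(indices[:n_test])
    train = [items[i] for i in range(len(items)) if i not in test_idx]
    test = [items[i] for i in range(len(items)) if i in test_idx]
    return train, test
```

With 400 images and the assumed ratio, this yields 280 training images and 120 test images, and no image appears in both groups.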

VII. EXPERIMENTAL RESULT
The proposed method for pavement crack detection was simulated on an Intel Core i7-7700HQ 2.8-GHz processor with 16 GB RAM and implemented using MATLAB R2019b environment [41]. First, we preprocessed the crack images using PCA. We created a set of feature images from the original crack images data set as given in Fig. 9. These are the first features created from PCA. These images were used as input to the CNN for further processing.
A sample code that shows the CNN architecture is presented in Fig. 10. We experimented with different numbers of PCA components (i.e., 2, 5, 10, and 15). For each type of experiment, we executed our program eleven times. Table I shows the obtained results for the training dataset. CNN with PCA 10 (i.e., PCA with ten components) outperforms all other methods, with an average accuracy of 96.62 and a standard deviation of 0.93. Fig. 11a shows the average convergence curves of the accuracy during the training process for all trained models. The performance of CNN is improved after employing PCA. Fig. 11b compares the performance of the CNN models with and without PCA. CNN with PCA performs better and detects pavement cracks more robustly than CNN without PCA.
In Fig. 12a we show the CNN performance based on eleven runs. It was found that CNN without PCA does not give the best performance. Table II shows the performance of the proposed method over the testing dataset. Again, it is clear that CNN with PCA 10 outperforms all other models based on average and standard deviation. Fig. 12b shows the performance of all trained models over the testing dataset. PCA can enhance the performance of CNN. Moreover, CNN with PCA 10 is the most suitable model for pavement crack detection.
Table V shows the p-values of the obtained results between different numbers of PCA components. All the obtained p-values are less than 0.05, which means that the differences between them are statistically significant. For example, the p-value between PCA 2 and PCA 10 is 0.040, which means the performance of the CNN differs between these two settings. Finally, from the obtained results, we can conclude that PCA as a feature extraction step can enhance the performance of CNN. Moreover, the proposed approach can examine a huge number of images automatically, which saves time and cost for pavement crack detection and reduces risks for road and pavement engineers.

VIII. CONCLUSIONS AND FUTURE WORK
In this work, we proposed a CNN-based method to automate the pavement crack detection process. The main idea of the proposed method is to combine CNN with PCA to speed up the learning process. CNN was employed as a classification method, while PCA was employed as a feature extraction method. We examined the performance of our proposed method on a public dataset that contains 400 images. We also explored several numbers of PCA components (i.e., 2, 5, 10, and 15). The obtained results show that CNN with PCA 10 outperforms all other models. In future work, we will examine different parameter settings and employ an optimization algorithm, such as a genetic algorithm, to optimize the PCA parameters.