Bearing Fault Detection based on Internet of Things using Convolutional Neural Network

In the age of the industrial revolution, industry and machinery are of the utmost importance to the development of human civilization. Because industries depend on their machines, these machines require regular maintenance. However, when a machine is too large for humans to look after, a system is needed to monitor it. This paper proposes a convolutional neural network-based system that detects faults in industrial machines by analyzing motor vibrations measured with accelerometer sensors. The sensors collect data from the machines, and the data are augmented to 261756 samples, of which 70% are used to train and 30% to test the models. The sensor data are sent to the server through a wireless sensor network and decomposed using the discrete wavelet transform (DWT), and this big data is then processed to detect faults. The study shows that custom CNN architectures surpass the transfer learning-based MobileNetV2 fault diagnosis model. With MobileNetV2 pre-trained on the ImageNet dataset, the system detects faults with up to 99.64% accuracy and 99.83% precision, whereas the Convolution 1D and 2D architectures achieve 100% accuracy and 100% precision.


I. INTRODUCTION
Industries are getting smarter day by day, and early identification of faults in industrial machinery plays a significant role in this modern era. Precise production techniques are vital for increasing the productivity of production systems. Extensive research has been done to detect faults early and to classify faulty machines and their defects. Monitoring faulty machines and taking remedial action makes for a safer environment and fewer failures [1]. Researchers have suggested different classifiers for the early diagnosis and detection of faults. Over time, machines have become more complex; for complex machines, data-driven methods have proven efficient, and non-parametric methods are applied to extract relevant information from the data [2,3]. Machine learning algorithms such as SVM have significant difficulty selecting features for identifying defects, particularly in induction motors [4]. Deep learning addresses the feature selection problem by extracting features directly from raw data. However, deep learning models are not precise enough when training data are insufficient, generalization remains a significant drawback, and accuracy depends on the data being properly distributed. Transfer learning methods later gained popularity in the Fault Detection and Diagnosis (FDD) field for preventing industrial anomalies [5,6]. These methods learn from a domain that is rich in data, and the resulting models are then applied to a domain where data are scarce. This research uses Convolution 1D, Convolution 2D, and MobileNetV2 architectures to detect faulty machinery.
A substantial amount of research has been done to find better classifiers for detecting faults in industrial machinery. Lei et al. proposed an artificial intelligence-based fault detection system that uses sparse filtering and a neural network to learn features from raw signals [7]. Boukra et al. proposed a hybrid method based on feature reduction that uses two parameters; their method is not affected by load conditions [8]. Machine learning algorithms, in which supervised learning plays a significant role, are used to identify faults in motor drives [9]. However, the efficiency of supervised learning drops when the data contain irrelevant samples. Feature extraction from raw signals plays a significant role in detecting induction motor faults. Wavelet analysis, time-domain analysis, frequency-domain analysis, and time-frequency analysis all aid in extracting characteristics [10][11][12]. The extracted features affect classification accuracy and correct fault recognition, a problem that deep learning reduces. Deep learning algorithms show significant results in detecting faults in bearings and gearboxes [13], and deep CNNs achieve remarkable accuracy in diagnosing faults from raw vibration signals under noisy conditions [14]. However, deep learning models are not efficient enough when training data are insufficient, whereas transfer learning models achieve better accuracy with less data. Adversarial transfer learning algorithms can identify faulty signals by first converting them to RGB images and then training the model on those images [15]. Wan et al. proposed transfer learning-based autoencoders [16], and VGG19 has shown high fault classification results in detecting faults in induction motors [17].
In this study, the authors worked with the dataset obtained from Case Western Reserve University to detect faults in machinery bearings within a wireless IoT framework. The dataset was further divided into sub-classes and environments. After label encoding and splitting into training and test sets, the processed data were classified using three CNN-based methods: MobileNetV2, Convolution 1D, and Convolution 2D. The performance of these algorithms is then measured to determine the best method for detecting faults in industrial machines.

II. LITERATURE REVIEW
The diagnosis of bearing faults is an active topic in mechanical condition monitoring. The key phases of bearing defect diagnostics are feature extraction and pattern classification from monitoring data. When a specific bearing element fails, the bearing creates additional vibrations. The fault characteristic frequency relates the frequency of this additional vibration to the bearing speed, so the failing bearing can be located by analyzing the frequency components of the original vibration signal [18].
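The relation between fault location and vibration frequency can be made concrete with the classical kinematic formulas for the fault characteristic frequencies: outer race (BPFO), inner race (BPFI), ball spin (BSF), and cage (FTF). The sketch below assumes pure rolling without slip and a stationary outer race. The bearing geometry (9 balls, 0.3126 in ball diameter, 1.537 in pitch diameter) is an assumption taken from the CWRU specifications for the SKF 6205-2RS JEM drive-end bearing, and the speeds are the three motor speeds used later in this paper.

```python
import math


def bearing_fault_frequencies(shaft_rpm, n_balls, ball_d, pitch_d, contact_angle_deg=0.0):
    """Kinematic fault characteristic frequencies (Hz) of a rolling-element bearing.

    Assumes pure rolling (no slip) and a stationary outer race.
    """
    fr = shaft_rpm / 60.0  # shaft rotation frequency in Hz
    r = (ball_d / pitch_d) * math.cos(math.radians(contact_angle_deg))
    return {
        "shaft": fr,
        "FTF": 0.5 * fr * (1 - r),                           # cage
        "BPFO": 0.5 * n_balls * fr * (1 - r),                # outer race defect
        "BPFI": 0.5 * n_balls * fr * (1 + r),                # inner race defect
        "BSF": 0.5 * (pitch_d / ball_d) * fr * (1 - r * r),  # ball spin
    }


# Assumed geometry of the SKF 6205-2RS JEM drive-end bearing (inches).
for rpm in (1797, 1772, 1750):
    f = bearing_fault_frequencies(rpm, n_balls=9, ball_d=0.3126, pitch_d=1.537)
    print(rpm, {name: round(hz, 1) for name, hz in f.items()})
```

At 1797 rpm the shaft turns at about 29.95 Hz, so under these assumptions an outer race defect should appear near 107 Hz and an inner race defect near 162 Hz.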
Feature extraction techniques used in classical signal processing for bearing defect diagnostics with vibration signals include the Hilbert-Huang transform (HHT), the wavelet transform, empirical mode decomposition (EMD), and similar approaches. V.K. Rai et al. [19] used HHT to extract frequency-domain features from bearing fault data and identify bearing fault categories. Xinsheng Lou et al. [20] used the wavelet transform and neuro-fuzzy classification to develop a novel ball bearing fault diagnostic system. The wavelet transform was employed to extract feature vectors from the accelerometer signal, and an adaptive neuro-fuzzy inference system trained on these feature vectors was then used to classify the data. The approach worked effectively even under variable load.
P. Konar et al. [21] effectively applied support vector machines (SVMs) to fault diagnostics: feature vectors were extracted using the continuous wavelet transform (CWT), and the monitoring data of a three-phase induction motor were classified using an SVM. To recognize early bearing faults, Zhuanzhe Zhao et al. [22] presented an intelligent fault detection approach based on a backpropagation (BP) neural network. The signal was first processed with wavelet packet decomposition, and the EMD method was then used to obtain the intrinsic mode functions (IMFs). A three-layer BP neural network was built to recognize the fault pattern of the monitoring signal.
Fault diagnostics have also been carried out using naive Bayes and Bayes net classifiers [23]. In this approach, the vibration signals are first wavelet-analyzed to extract discrete wavelet features, which are then used as input to the Bayes net for classification.
Lei et al. [23] used an EMD method and proposed a kurtosis-based method to identify the sensitive characteristics for defect diagnostics based on bearing vibration signals. Lin et al. [24] employed an enhanced EMD approach to extract features from nonlinear, non-stationary, and composite signals. He et al. [25] pre-processed the acoustic emission data from bearing tests using the short-time Fourier transform (STFT). Feature selection techniques, including linear discriminant analysis (LDA) and principal component analysis (PCA), are commonly used to choose the most representative features from the collected data. Sun et al. [26] reported a defect diagnosis technique based on decision trees and PCA, in which PCA is used to reduce the number of features after feature extraction.
S. G. et al. [27] developed an approach combining the continuous wavelet transform and a CNN to diagnose rotating machine faults accurately, robustly, and with good generalization. S. S. et al. [28] demonstrated that deep convolutional neural networks (DCNNs) can learn from various sensor outputs to identify induction motor faults consistently and accurately. W. Y. et al. [29] used a wide convolutional neural network that adds newly generated features for self-updating, so that new abnormal samples and fault classes can be incorporated; this improves diagnostic performance and incremental learning capability.

This research proposes a solution based on a wireless Internet of Things architecture for detecting machinery bearing faults using data from Case Western Reserve University. The dataset was augmented to 261756 samples and further divided into sub-classes and environments. After label encoding and partitioning into training (70%) and test (30%) sets, the processed data were classified using CNN methods: MobileNetV2, Convolution 1D, and Convolution 2D. The results of these algorithms are compared side by side to determine which performs best for industrial machine fault detection. The WSN was created to gather accelerometer sensor data from a variety of industrial bearings. In the WSN model, the TMS320F28335 digital signal microcontroller serves as the microcontroller unit (MCU), collecting data from the accelerometers and transmitting it through an XBee (Pro Series 3) radio transmitter. In addition, an XBee module is configured as a coordinator gateway that receives signals from the sensor nodes and forwards them to the diagnostic server, which hosts the proposed fault diagnosis model. The XBee devices form a mesh network using the ZigBee network protocols. The fault diagnosis method comprises the following steps: a) data collection through accelerometer sensors, b) applying the DWT to decompose the signals, c) preprocessing, including creating a set of classes, reshaping the sets, and label encoding, d) splitting the dataset into 70% for training and 30% for testing, and e) classifying the data for fault detection and evaluating the performance.
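Step b) applies the DWT, but the wavelet family and decomposition level are not specified here. The sketch below assumes the Haar wavelet, the simplest orthonormal choice, and uses only the Python standard library; in practice a library such as PyWavelets (`pywt.wavedec`) would typically be used, and its output follows the same coefficient ordering. The test signal is hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail) coefficients, each half the input length.
    An odd-length input is padded by repeating its last sample.
    """
    x = list(signal)
    if len(x) % 2:
        x.append(x[-1])
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]  # low-pass band
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]  # high-pass band
    return approx, detail


def haar_wavedec(signal, level):
    """Multi-level decomposition ordered [cA_level, cD_level, ..., cD_1]."""
    details = []
    approx = list(signal)
    for _ in range(level):
        approx, detail = haar_dwt(approx)  # keep splitting the low-pass band
        details.append(detail)
    return [approx] + details[::-1]


# Hypothetical vibration-like segment: a slow oscillation plus a fast one.
x_demo = [math.sin(2 * math.pi * n / 64) + 0.3 * math.sin(2 * math.pi * n / 4)
          for n in range(256)]
for name, band in zip(["cA3", "cD3", "cD2", "cD1"], haar_wavedec(x_demo, level=3)):
    print(name, len(band), round(sum(v * v for v in band), 2))
```

Because the transform is orthonormal, the band energies add up to the energy of the original segment, which makes per-band energy a convenient feature for separating fault signatures.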

III. PROPOSED METHODOLOGY
The proposed approach uses convolutional neural network algorithms to identify defects. The process of the proposed method is shown in Fig. 1.
www.ijacsa.thesai.org

A. Data Collection and Pre-Processing
In this study, the Case Western Reserve University dataset [30] is used as a standard benchmark for assessing the efficiency of the fault detection algorithm. The CWRU data center tested a 2 hp Reliance Electric induction motor with normal bearings and with single-point drive end (DE) and fan end (FE) faults under diverse operating conditions. Vibration data were gathered with accelerometers attached to the housing by magnetic bases, placed at the 12 o'clock position at both the drive end and the fan end of the motor housing. In some tests, an accelerometer was also attached to the motor supporting base plate. Vibration data were recorded with a 16-channel DAT recorder and then processed in a MATLAB environment; all data files are in MATLAB (*.mat) format. Digital data were collected at 12000 samples per second, and drive-end bearing fault data were also collected at 48000 samples per second. The final dataset contains 261756 samples [31]. Speed and horsepower data were collected using a torque transducer/encoder and recorded manually.
The dataset included three working environments with the following conditions: 1) Data collection at 12000 samples per second.
From the CWRU data center, we gathered a dataset of 16 signals: four normal baseline signals and four signals each for inner race, ball, and outer race faults. The signals were sliced into segments, giving 5949 samples per class, so that they can later be reshaped as needed for feeding into the neural networks. Hence, the dataset contains 23796 samples, which were augmented to a final dataset of 261756 samples. The dataset is homogeneous and balanced. A simulator of data collection from bearings using accelerometer sensors is shown in Fig. 2.
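The exact slicing and augmentation scheme is not described here. One common way to cut a long vibration record into fixed-length samples, and to augment it, is a sliding window; the sketch below assumes that technique, and the record length, window, and stride are hypothetical. It also checks the sample counts reported above.

```python
def segment(signal, window, stride=None):
    """Slice a 1-D signal into fixed-length windows.

    With stride == window (the default) the windows do not overlap; a smaller
    stride produces overlapping windows, a common way to augment vibration data.
    Leftover samples at the end that do not fill a whole window are discarded.
    """
    if stride is None:
        stride = window
    return [signal[i:i + window] for i in range(0, len(signal) - window + 1, stride)]


def n_windows(length, window, stride):
    """Number of complete windows: floor((length - window) / stride) + 1."""
    return (length - window) // stride + 1 if length >= window else 0


# Hypothetical 120000-sample record cut into 2048-point windows.
record = [0.0] * 120_000
print(len(segment(record, 2048)), n_windows(120_000, 2048, 2048))      # non-overlapping
print(len(segment(record, 2048, 512)), n_windows(120_000, 2048, 512))  # overlapping

# Counts from the text: 4 classes x 5949 samples, and the final/base ratio.
print(4 * 5949, 261756 / (4 * 5949))
```

Overlapping windows multiply the number of samples without new measurements; the ratio printed last shows the final dataset is exactly 11 times the base count.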
The faults were artificially introduced into the SKF drive-end bearing (6205-2RS JEM) using electro-discharge machining. For environments 1, 2, and 3, the motor speeds are 1797, 1772, and 1750 rpm, respectively. Fig. 3 shows the histogram of all elements in the data collection used in the analysis. We started by loading all of the signal data and creating three sets of data. Label Binarizer was used to encode the class labels, and all three datasets were reshaped. This work used a random seed value of 0.2, with 30% of the data used for testing and 70% for training. The Convolution 1D, Convolution 2D, and MobileNetV2 models were trained with the TensorFlow and Keras libraries because of their convenient coding environment and their ability to train state-of-the-art algorithms for signal processing and computer vision.
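The label-encoding and splitting steps can be sketched with the standard library as follows. This is a stand-in for scikit-learn's `LabelBinarizer` and `train_test_split`, not the authors' code; the helper names, the placeholder features, and the seed value are hypothetical.

```python
import math
import random


def one_hot(labels):
    """One-hot encode class labels; columns follow the sorted class names,
    mirroring how scikit-learn's LabelBinarizer orders classes (for > 2 classes)."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    encoded = [[1 if index[y] == j else 0 for j in range(len(classes))] for y in labels]
    return classes, encoded


def split_70_30(X, Y, test_size=0.3, seed=0):
    """Seeded shuffle, then hold out ceil(test_size * n) samples for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # same seed -> same split on every run
    n_test = math.ceil(test_size * len(X))
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [X[i] for i in test],
            [Y[i] for i in train], [Y[i] for i in test])


labels = ["normal", "ball", "inner", "outer"] * 5
classes, Y = one_hot(labels)
X = [[float(i)] for i in range(len(labels))]  # placeholder feature vectors
X_train, X_test, Y_train, Y_test = split_70_30(X, Y, seed=42)
print(classes, len(X_train), len(X_test))
```

Fixing the seed makes the split reproducible, so different models can be compared on exactly the same test samples.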

B. Convolution 1D and Convolution 2D
The 2D in Conv2D refers to the fact that each channel of the input and filter is two-dimensional, while the 1D in Conv1D means that each channel of the input and filter is one-dimensional. After normalizing the data, the first hidden layer of the Convolution 1D and Convolution 2D models was set up with 100 nodes and the ReLU activation function. MaxPooling1D with a pool size of 2 was used to reduce the feature dimension. The second and third convolution layers have 32 and 10 nodes, respectively. The signals were converted into NumPy arrays to speed up computation. The learning rate was set to 0.001, categorical cross-entropy was used as the loss function for the multiclass classification of the dataset, and Adam was used as the optimizer for backpropagation. After trying several optimizers (Adam, Nadam, Adagrad, RMSProp, Adadelta, SGD, Adamax), Adam was chosen because it gave the highest accuracy on the dataset. The training batch size is 32, and the decay is set to 0.1. Tables I and II depict the Convolution 2D and Convolution 1D models, respectively.
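To make the layer description concrete, the sketch below implements in plain Python what a Conv1D layer (valid padding, stride 1), ReLU, and MaxPooling1D with a pool size of 2 compute on a multichannel sequence. It follows the Keras kernel layout (kernel_size, input channels, filters). Only the layer widths (100, 32, and 10 nodes) are given above, so the kernel size, weights, and input here are hypothetical.

```python
def conv1d_valid(x, kernels, biases):
    """Conv1D with 'valid' padding and stride 1, computed as cross-correlation (as in Keras).

    x:       list of time steps, each a list of input channels  -> (steps, in_ch)
    kernels: nested list shaped (kernel_size, in_ch, filters)
    biases:  list shaped (filters,)
    returns: list shaped (steps - kernel_size + 1, filters)
    """
    k, in_ch, filters = len(kernels), len(x[0]), len(biases)
    out = []
    for t in range(len(x) - k + 1):
        row = []
        for f in range(filters):
            acc = biases[f]
            for i in range(k):
                for c in range(in_ch):
                    acc += x[t + i][c] * kernels[i][c][f]
            row.append(acc)
        out.append(row)
    return out


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def max_pool1d(x, pool_size=2):
    """MaxPooling1D with stride = pool_size and 'valid' padding (a leftover step is dropped)."""
    channels = len(x[0])
    return [[max(x[t + j][c] for j in range(pool_size)) for c in range(channels)]
            for t in range(0, len(x) - pool_size + 1, pool_size)]


# Hypothetical single-channel signal and two size-2 filters:
# filter 0 is a first difference, filter 1 a moving sum.
x = [[1.0], [3.0], [2.0], [5.0], [4.0], [6.0]]
kernels = [[[-1.0, 1.0]], [[1.0, 1.0]]]
features = max_pool1d(relu(conv1d_valid(x, kernels, [0.0, 0.0])))
print(features)
```

With 6 input steps and a kernel of size 2, the convolution yields 5 steps and pooling keeps floor(5 / 2) = 2; in general the output length is floor((L - k + 1) / 2), which is how pooling halves the feature dimension between layers.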

C. MobileNetV2
MobileNetV2 is a CNN architecture designed to be efficient on mobile devices. Its initial full convolution layer has 32 filters and is followed by 19 residual bottleneck layers. It is used for image classification, object detection, quantization, and so on [32].
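MobileNetV2's efficiency comes from its bottleneck blocks, which replace full convolutions with depthwise separable ones. The sketch below counts multiply-adds for stride-1 layers on an h x w feature map, ignoring biases and batch normalization; the bottleneck cost it computes layer by layer equals the closed form h·w·d'·t·(d' + k² + d'') from the MobileNetV2 paper, where t is the expansion factor. The feature-map size and channel counts in the example are hypothetical.

```python
def standard_conv_madds(h, w, c_in, c_out, k):
    """Multiply-adds of a k x k standard convolution (stride 1, 'same' padding)."""
    return h * w * k * k * c_in * c_out


def depthwise_separable_madds(h, w, c_in, c_out, k):
    """k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = h * w * k * k * c_in  # one k x k filter per input channel
    pointwise = h * w * c_in * c_out  # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


def inverted_residual_madds(h, w, c_in, c_out, k, t):
    """MobileNetV2 bottleneck, stride 1: 1x1 expand -> k x k depthwise -> 1x1 linear project."""
    hidden = t * c_in                   # expanded width
    expand = h * w * c_in * hidden      # 1 x 1 expansion
    depthwise = h * w * k * k * hidden  # depthwise filtering in the expanded space
    project = h * w * hidden * c_out    # 1 x 1 projection back to c_out channels
    return expand + depthwise + project


# Hypothetical 56 x 56 feature map, 24 channels in and out, expansion factor 6.
block = inverted_residual_madds(56, 56, 24, 24, 3, 6)
wide = standard_conv_madds(56, 56, 144, 144, 3)  # plain 3 x 3 conv at the expanded width
print(block, wide, round(wide / block, 1))
```

With these hypothetical sizes the bottleneck block needs about 25.7 million multiply-adds, whereas a plain 3 x 3 convolution operating at the same expanded width of 144 channels would need about 585 million, roughly 23 times more.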
Two types of blocks are introduced in MobileNetV2.

1) Residual block with stride 1. 2) Downsizing block with stride 2.
Both blocks consist of three layers, as illustrated in Fig. 4. The first layer is a 1x1 convolution with the ReLU6 activation function. The second layer is a depthwise convolution, and the third layer is another 1x1 convolution, this time without any non-linearity [33]. MobileNetV2 performs well while keeping the number of mathematical operations and parameters low, and the architecture is about 35% faster than its predecessor, MobileNetV1. In the proposed model, the base layers of the pre-trained MobileNetV2 are frozen, and the top layers are replaced with the proposed trainable layers: a layer of 512 nodes with ReLU activation, followed by an output layer of 4 nodes with SoftMax activation for classifying faults. The loss function is categorical cross-entropy, and the learning rate is set to 0.001.
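The 4-unit SoftMax output and the categorical cross-entropy loss described above can be illustrated in plain Python. This is a sketch of the standard formulas rather than the Keras implementation; the clipping constant and the example logits are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(y_true, probs, eps=1e-7):
    """-sum_i y_i * log(p_i) for one sample, with p clipped to [eps, 1 - eps] to avoid log(0)."""
    return -sum(y * math.log(min(max(p, eps), 1 - eps)) for y, p in zip(y_true, probs))


# Hypothetical logits from the 4-unit output layer for one test sample.
classes = ["normal", "ball", "inner", "outer"]
logits = [0.2, 3.1, -1.0, 0.5]
probs = softmax(logits)
target = [0, 1, 0, 0]  # one-hot label for "ball"
print(classes[probs.index(max(probs))], round(categorical_cross_entropy(target, probs), 4))
```

Because the target is one-hot, the loss reduces to the negative log-probability assigned to the true class, so it approaches zero only when the model is confident and correct.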

D. Performance Measures
Precision, recall, F1-score, and accuracy are used to evaluate the models' performance after the training and testing phases. In the equations, TP represents true positives, TN true negatives, and FP and FN false positives and false negatives, respectively. Precision is the fraction of samples predicted as a given class that actually belong to it, and accuracy is the fraction of all samples that are classified correctly; they are calculated with Eq. (1) and Eq. (2). Recall, given in Eq. (3), is the ability of a model to identify all of the relevant data points in a dataset. The F1-score, given in Eq. (4), is the harmonic mean of precision and recall, so it rewards a model that detects real faults without raising false alarms.
$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
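For a four-class problem, precision, recall, and F1-score are computed one class at a time (one-vs-rest) from a confusion matrix. The sketch below shows that bookkeeping; the small 2 x 2 matrix is hypothetical, and the second example rebuilds a diagonal matrix from the per-class test counts reported for Convolution 1D in the results section.

```python
def report_from_confusion(cm):
    """Accuracy plus one-vs-rest precision, recall and F1 for each class.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    per_class = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append({"precision": precision, "recall": recall, "f1": f1})
    return accuracy, per_class


# Hypothetical 2-class example with some errors.
print(report_from_confusion([[8, 2], [1, 9]]))

# Diagonal matrix from the reported test counts (normal, ball, inner, outer).
counts = [19451, 20997, 19320, 18759]
cm = [[counts[i] if i == j else 0 for j in range(4)] for i in range(4)]
print(report_from_confusion(cm))
```

When every off-diagonal entry is zero, FP and FN vanish for every class, so all four measures equal 1.0, which is what a perfect classification report shows.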

IV. RESULT ANALYSIS AND DISCUSSION
The proposed methods can successfully identify faulty machinery with high accuracy [34]. After processing the collected data samples, all three faulty elements considered in our dataset (inner raceway, outer raceway, and ball) have a fault depth of 0.011 inches. The classification report for each class (normal, ball, inner raceway, and outer raceway) of the test dataset is shown in Fig. 6. The model performs well on every class individually.

A. MobilenetV2 Result Analysis
Fig. 7 shows the heat map of the test data. Table IV shows the accuracy of the training and test sets for Convolution 1D. The highest accuracy on the training set is 99.97%, while the maximum accuracy on the test set is 100%. Set 3 was used to train the model, and Sets 2 and 1 were used for testing.
Data sample of the Convolution 1D's 10 epochs.

B. Convolution 1D result Analysis
The loss and accuracy curves are shown in Fig. 8. Over the ten epochs, accuracy increases and loss decreases for both the training and test data. The precision, recall, accuracy, and F1-score are shown in Fig. 9. The model achieves perfect scores on every performance measure for all of the data classes.
Fig. 9. Classification Report of Convolution 1D.
In Fig. 10, the heat map shows the correctly and incorrectly predicted test samples. Convolution 1D correctly predicted all 19451 samples of the normal class, with no misclassifications. Likewise, all 20997 samples of the ball class are correctly predicted, and all 19320 and 18759 samples of the inner and outer race classes are correctly classified, with no errors.
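As a quick consistency check, the four per-class counts add up to the size of a 30% test split of the 261756-sample dataset, assuming the split size is rounded up to a whole sample:

```latex
\begin{aligned}
19451 + 20997 + 19320 + 18759 &= 78527,\\
\lceil 0.3 \times 261756 \rceil = \lceil 78526.8 \rceil &= 78527,\\
\text{Accuracy} = \frac{78527}{78527} &= 100\%.
\end{aligned}
```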

C. Convolution 2D Result Analysis
Table V reports the results of Convolution 2D. Both the training and validation sets reach an accuracy of 100%, and the model shows high accuracy at every epoch. The loss on the test set is also very low.
Fig. 11 shows the accuracy and loss curves. The training loss gradually decreases as the number of epochs increases, while the test accuracy stays consistently high and the test loss stays consistently low at every epoch.
The classification report in Fig. 12 shows that, like the Convolution 1D model, the Convolution 2D model achieves a perfect score across all classes of the test dataset.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 13, No. 4, 2022
In Fig. 13, the heat map shows the correctly and incorrectly predicted test samples. Convolution 2D correctly predicted all 19451 samples of the normal class, with no misclassifications. For the ball class, all 20997 samples are correctly predicted, with none misclassified. Furthermore, all 19320 and 18759 samples of the inner and outer race classes are correctly predicted, so the prediction rate is 100%.

D. Result Discussion and Comparison
All of the CNN algorithms used in the study show promising results. MobileNetV2 achieves an accuracy of 99.64% and a lowest loss of 0.01, whereas Convolution 1D and Convolution 2D both reach a perfect accuracy of 100% at the 10th epoch, with CPU times of 82.5 s/epoch and 93 s/epoch, respectively. Table VI compares the prediction performance of the algorithms. All three algorithms of the proposed system achieve very high performance in detecting faults in machinery. Fault detection has also been performed in the past using other machine learning algorithms [35][36][37], and Table VII compares their performance with that of the proposed system. The comparison shows that although the CNN and GAN approaches achieve accuracies above 97%, these are lower than the accuracies achieved by the algorithms proposed in this study.
The proposed system detects faults in industrial machinery with up to 100% accuracy, which is higher than the existing techniques. Detecting faults in industrial machinery with the highest possible accuracy is essential, as even a minor increase in fault detection accuracy might prevent a tragic accident. Because industrial machines generally contain many small and large parts, and the machines themselves can be huge, maintaining them can be difficult for humans. The proposed fault detection system can make this process automatic and efficient: vibration signals from the machinery are processed by MobileNetV2, Convolution 1D, and Convolution 2D to detect faults and avoid future problems.
A comparison between the proposed study and existing related work.
The proposed study offers better findings than existing related studies.

V. CONCLUSION
As scientific and technical knowledge increases, mechanical equipment becomes more sophisticated and automated. Mechanical equipment relies significantly on rotating components such as bearings and lead screws to work properly, and damaged or failed bearing components can cause equipment failure and even fatalities. Monitoring the performance of bearing components is therefore essential. This article presented a viable method for bearing defect detection based on accelerometer sensors and the discrete wavelet transform (DWT) signal processing methodology, with a ZigBee-based wireless sensor network architecture for efficiently sending data to a diagnostic server. A pre-trained MobileNetV2 model is compared with two custom CNN models: 1D and 2D deep CNN architectures. The models are validated on a bearing dataset collected by accelerometer sensors and consisting of four signal classes (normal, ball, inner race, and outer race), and satisfactory results were achieved on all four classes. With MobileNetV2, the system identified problems with 99.64% accuracy and 99.83% precision, and it achieved up to 100% accuracy and precision with the Convolution 1D and 2D architectures. In the future, the proposed architecture may be tested with more parameters besides