CNN-BiLSTM Hybrid Model for Network Anomaly Detection in Internet of Things

Abstract: Anomaly detection in Internet of Things (IoT) network traffic is a critical aspect of intrusion and attack detection, in which a deviation from typical behavior signals malicious or inadvertent attacks, faults, flaws, and other issues. The need to examine a large number of security events to identify anomalous behavior of smart devices makes the choice of machine learning and deep learning models for identifying anomalies in network traffic an urgent problem. For the binary classification task, a software implementation of an intrusion detection system based on supervised learning algorithms has been completed. The UNSW-NB15 open dataset, which contains 2,540,044 records (vectors of TCP/IP network connection attributes and their associated class labels), is used to train and test the system. This research compares different machine learning models and proposes a CNN-BiLSTM hybrid model for IoT network intrusion detection. The results of testing the framework are the classification quality metrics and the running times of the algorithms for different ratios of training and test samples.


I. INTRODUCTION
The Internet of Things (IoT) is a network of electronic devices with built-in technologies that allow them to connect with one another and with the outside world. The IoT concept has become ingrained in our daily lives, presenting consumers with new options ranging from home automation to medical equipment [1]. IoT devices can effectively gather, analyze, and send massive volumes of data thanks to ultra-high-speed wireless networks and sophisticated electronic databases. Advances in microelectronics combined with low power consumption have made it increasingly easy to operate IoT devices in remote places with minimal physical oversight and maintenance [2]. Although IoT devices appear innocuous, they are not without security and privacy concerns, since the present IoT framework contains several risks and vulnerabilities.
According to analysts, the Internet of Things will soon become a part of everyday life. According to IDC, the worldwide market for relevant solutions was valued at $646 billion in 2018 and is expected to surpass the trillion-dollar level by 2022. All of this pushes us to learn more about the security of IoT systems [3].
Automated methods for managing and interpreting the data are required because of the complexity and diversity of the data created by heterogeneous devices. Machine learning technologies, which enable building profiles of device behavior in the network, detecting anomalies, and predicting abnormal scenarios, are therefore candidates for automatically detecting dependencies and connecting devices [1].
Peripheral sensors, gateways based on industrial communication protocols, centralized data storage, and the end devices that users interact with are the four major pieces of an Internet of Things system. The addition of big data tools and systems based on machine learning technologies to this setup results in the creation of a new block (Fig. 1) that is responsible for the quality of data and, as a result, the quality of the system's decisions and alerts. Furthermore, centralized or cloud data storage expenses are decreased due to adaptive prioritizing and filtering of the information [2][3].
As attacks advance, it is becoming more difficult to detect malicious network traffic and to distinguish it from legitimate traffic. Intrusion detection systems (IDS) [4] do a good job of identifying malicious traffic, but they must be regularly updated with rule sets and upgrades in order to remain relevant as threat vectors change. Even if the major vendors release fresh rule sets on a regular basis, this may not be enough. As a result, the question of employing different methods for identifying anomalous intrusions becomes significant. The use of machine learning algorithms [5] is one of these methods. Machine learning is used because it can help automate threat processing and keep the system up to date by studying and detecting threats. That is, the software is taught to recognize different types of traffic in order to classify them and then reject or pass them [6].
The remainder of the paper is organized as follows. Section II discusses related work on detecting Internet of Things network anomalies using various machine learning algorithms. Section III presents the problem statement. Section IV describes the materials and methods employed in the current study, including the research flowchart, dataset, and evaluation criteria. In Section V, we report the outcomes of the experiments and compare machine learning approaches on various criteria. The results are discussed in Section VI, together with obstacles, open questions, and future directions. Section VII concludes the paper.

II. RELATED WORKS
In this part, we look at studies that employ machine learning-based techniques to solve the challenge of detecting network anomalies. Recent research suggests that machine learning (ML) techniques might be ideal for detecting anomalies in network data [7]. For example, Abou Daya et al. [8] used machine learning to leverage correlations between packet-level and flow-level data. Gaddam et al. [9] offered a solution that combined K-means clustering with an ID3 decision tree and applied it to many anomaly detection tasks. For DDoS detection in software-defined networks, Alamri and Thayananthan [10] used XGBoost [11]. Shone et al. [12] developed a deep autoencoder (NDAE) for unsupervised feature learning and performed intrusion detection using stacked NDAEs. To learn from anomalous traffic, Zhange et al. [13] created a semi-supervised learning system. For intrusion detection, Ullah et al. [14] developed an LSTM-based model using autoencoders. An XGBoost-DNN model was presented by Devan et al. [15] to identify cyber attacks. To solve the imbalanced class problem, Du et al. [16] integrated reinforcement learning with the SMOTE method. We now look at each machine learning approach to the network anomaly detection problem in turn.
K nearest neighbour (K-NN). The K-NN technique is one of the most basic and widely used nonparametric methods. It computes the distances between an unlabeled input vector and the labeled training vectors, then assigns the unlabeled point to the majority class among its K nearest neighbors. When building a K-NN classifier, the parameter K is crucial, and different values of K can give different outcomes. If K is large, classification takes longer because more neighbors are used for each prediction, and accuracy is also affected [17].
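The K-NN rule described above can be illustrated with a minimal sketch. It assumes Euclidean distance and a simple majority vote; the function name `knn_predict` and the toy two-dimensional feature vectors are hypothetical and are not taken from the cited works.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Euclidean distance from the query to every labeled training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and take a majority vote over their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy feature vectors (hypothetical values): 0 = normal, 1 = anomalous
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.85, 0.95), (0.8, 0.9)]
labels = [0, 0, 0, 1, 1, 1]

pred_near_normal = knn_predict(points, labels, (0.12, 0.18), k=3)
pred_near_attack = knn_predict(points, labels, (0.88, 0.9), k=3)
```

Every prediction scans the whole training set, which is one reason classification time grows with the amount of data and with K.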
Zhu et al. propose a Grid-based Approximate Average Outlier Detection (GAAOD) framework to support KNN-based anomaly detection in streaming network traffic data [18]. In the first stage, the framework introduces a grid-based index to manage the streaming data. It can self-adaptively configure the resolution of the grid cells and thereby efficiently filter out items that cannot become outliers. In the second stage, GAAOD uses a min-heap-based method to calculate the upper- and lower-bound distances between items and their k-th nearest neighbors. In the third stage, the authors apply a k-skyband-based method to maintain anomalous items and candidate anomalous items. Experimental results demonstrate the effectiveness and accuracy of the proposed approach.
Bayesian networks. A Bayesian network (BN) is a mathematical model for encoding probabilistic correlations between variables. This strategy is typically used in conjunction with statistical schemes for intrusion detection. It has several benefits, including the capacity to encode interdependencies between variables and predict occurrences, as well as the ability to incorporate existing knowledge and data [19].
The BN system, according to Lotfollahi et al. [20], provides the necessary mathematical foundation for making an apparently complex operation simple. They expected that, by comparing the measurements of each network traffic sample, a BN-based IDS would be able to distinguish attacks from regular network activity. Mohammed et al. [21] employed a supervised Naive Bayes classifier and 248 flow features, including packet length and inter-arrival time as well as a variety of TCP header fields, to distinguish between several types of traffic. Correlation-based feature selection was performed to find the strongest features, and it revealed that a small subset of fewer than 20 features is sufficient for accurate classification.
Neural networks (NNs). NNs are used to predict the behavior of numerous users and daemons in a system. If correctly designed and implemented, NNs can alleviate many of the issues that rule-based systems have. The key benefit of NNs is their tolerance for imprecise data and uncertain information, as well as their capacity to generate solutions without prior knowledge of data patterns [22]. This, paired with their capacity to generalize from the data under investigation, qualifies them for IDS. To apply this technique to IDS, data representing attacks and non-attacks must be fed into the model during the training stage so that the network coefficients are adjusted automatically. The most prevalent types of supervised neural networks are the multilayer perceptron (MLP) and the radial basis function (RBF) network [23].
www.ijacsa.thesai.org
A single-layer perceptron can classify only linearly separable sets of instances. The perceptron will be able to find a solution if a straight line or a plane can be drawn to partition the input examples into the correct categories, that is, if the input instances are linearly separable. If the instances are not linearly separable, learning will never reach the point where all examples are classified correctly. To address this issue, multilayer perceptrons (artificial neural networks) were developed.
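The linear-separability limit can be seen in a small experiment. The sketch below is an illustration under stated assumptions, not code from the cited works: it trains a single perceptron with the classic perceptron update rule on the AND function (linearly separable) and on XOR (not linearly separable). The function name `train_perceptron`, the learning rate, and the epoch budget are hypothetical choices.

```python
def train_perceptron(samples, targets, epochs=50, lr=1.0):
    """Train a single perceptron; return (weights, bias, errors in the last epoch)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    errors = 0
    for _ in range(epochs):
        errors = 0
        for x, t in zip(samples, targets):
            # Step activation on the weighted sum
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if y != t:
                errors += 1
                # Perceptron rule: move the boundary toward the misclassified sample
                w = [wi + lr * (t - y) * xi for wi, xi in zip(w, x)]
                b += lr * (t - y)
        if errors == 0:
            break  # every sample classified correctly: a separating line was found
    return w, b, errors

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
_, _, and_errors = train_perceptron(inputs, [0, 0, 0, 1])  # AND: linearly separable
_, _, xor_errors = train_perceptron(inputs, [0, 1, 1, 0])  # XOR: not linearly separable
```

On AND the loop stops with zero errors after a few epochs, whereas on XOR at least one sample remains misclassified in every epoch, however large the budget.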
Some studies, such as [24], have used multilayer perceptrons to develop intrusion detection systems for network traffic that can identify both legitimate and malicious connections. They were implemented with MLPs of three and four layers.
Another prominent form of neural network is the radial basis function (RBF) network. RBF networks train significantly faster than back-propagation networks because they perform classification by measuring the distance between the inputs and the centers of the RBF hidden neurons. They are best suited for problems with a large sample size.

Decision tree (DT).
Quinlan [25] characterized decision trees as a useful and widely used classification and prediction method. A decision tree is made up of three primary parts: nodes, arcs, and leaves. Each node is labeled with the attribute that is the most informative of those not yet examined on the path from the root. Each arc leaving a node corresponds to a value of that node's attribute, and each leaf is assigned to a category or class. A data point is classified by starting at the root of the tree and working down until a leaf is reached; the class of that leaf is the classification of the data point. Quinlan's ID3 and C4.5 are the most widely used decision tree algorithms. As an intrusion detection model, Davahli et al. [26] recommended employing decision trees (DT) and the support vector machine (SVM). They also created a hybrid DTSVM technique that employs both SVM and DT as base classifiers. Decision trees were adapted by Ghanem et al. [27] for DDoS attacks, R2L and U2R attacks, and scanning attacks. The ID3 method is used as the learning algorithm to generate a decision tree automatically.
Support Vector Machine (SVM). Cortes and Vapnik [28] proposed the support vector machine (SVM) technique. SVM maps the input vector into a high-dimensional feature space and then finds the best separating hyperplane in that space. Furthermore, because the decision boundary, i.e. the separating hyperplane, is determined by the support vectors rather than by the whole training sample, it is robust to outliers. SVM is especially well suited to binary classification, that is, to distinguishing between two sets of training vectors with distinct class labels. SVM also has a user-defined penalty parameter, which lets users strike a balance between the number of misclassified samples and the width of the decision margin.
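The role of the SVM penalty parameter can be made concrete with the standard soft-margin objective, 0.5·||w||² + C·Σ max(0, 1 − y_i(w·x_i + b)), in which C weighs margin violations against margin width. The sketch below only evaluates this objective for a fixed hyperplane; it is not a solver, and the function name `soft_margin_objective` and the toy samples are hypothetical.

```python
def soft_margin_objective(w, b, samples, labels, C):
    """Evaluate 0.5*||w||^2 + C * (sum of hinge losses); labels are -1 or +1."""
    # A smaller ||w|| corresponds to a wider margin
    margin_term = 0.5 * sum(wi * wi for wi in w)
    # Hinge loss is zero for samples outside the margin on the correct side
    hinge = sum(
        max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))
        for x, y in zip(samples, labels)
    )
    return margin_term + C * hinge

# Toy data (hypothetical): the third sample lies on the wrong side of the hyperplane
samples = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
labels = [1, -1, -1]
w, b = (1.0, 0.0), 0.0

low_penalty = soft_margin_objective(w, b, samples, labels, C=0.1)
high_penalty = soft_margin_objective(w, b, samples, labels, C=10.0)
```

With a small C the single violation barely changes the objective, so a wide margin is preferred; with a large C the same violation dominates, pushing a solver toward fewer misclassified samples at the cost of a narrower margin.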
Mukkamala et al. [29] applied SVM kernel classifiers and classifier design approaches to the task of identifying abnormalities in network traffic. They looked at the impact of kernel type and parameter values on the intrusion classification accuracy of the SVM. The PSO-SVM model was suggested by Gauthama Raman et al. [30], where standard particle swarm optimization (PSO) is used to set the free parameters of the SVM and binary PSO is used to select the optimal feature subset for the intrusion detection system. Eskandari et al. [31] provided a model of an intrusion detection system based on network traffic behavior and on message analysis and categorization. Anomalies are detected using two artificial intelligence methods: the Kohonen neural network (KSN) and the support vector machine (SVM).
Deep learning. Recurrent neural networks with long short-term memory are investigated in [32] for their ability to identify Internet of Things malware. The results of the experiment are compared with models built using more traditional machine learning techniques: the support vector machine, the Naive Bayes classifier, random forest, adaptive boosting, and the k-nearest neighbors algorithm. According to the findings, the technique based on deep learning gives the best results. No comparison with other deep learning models was made.
The study [33] examines a variety of deep learning methods for recognizing DDoS attacks, including a multilayer perceptron, a convolutional neural network, RNN-LSTM, and a CNN+LSTM ensemble (RNN-LSTM combined with CNN). Their performance is compared to that of standard machine learning algorithms such as the support vector machine, the Bayesian classifier, and random forest, among others. The authors conclude that deep learning approaches, particularly recurrent networks, are more successful than standard methods.
The research in [34] proposes an anomaly detection solution for industrial Internet of Things systems that combines an autoencoder with a feed-forward deep neural network. When the newly constructed model is compared with several previously developed anomaly detection approaches, such as the deep belief network [35], the recurrent network [36], the DNN [37], and the Ensemble-DNN [38], the reported results show that it outperforms them all. However, according to the research, those models were evaluated on different subsets of the source data, on a range of different hardware and software platforms, and at different points in time.

III. PROBLEM STATEMENT
Mathematical and software techniques must be defined in order to analyze anomalies in network traffic. According to our findings, anomaly detection reduces to a data classification problem. We divide the traffic into two categories, regular traffic and abnormal traffic, so the problem is one of binary classification. To identify severe fluctuations in the traffic graph, we use basic mathematical measures such as the total of all variations from time t1 to time t2; because the traffic function is discrete, this total is computed as a sum over the sampled time points. In the next part, we apply machine learning approaches to discover IoT network anomalies and assess them on the given dataset using various evaluation metrics.
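The formulas this section refers to are not reproduced in this copy. One common reading of "the total of all variations" for a discrete signal, offered here only as an assumption, is the total variation $V = \sum_{i} |x(t_{i+1}) - x(t_i)|$ taken over the samples between $t_1$ and $t_2$. A minimal sketch under that assumption follows; the function name `total_variation` and the traffic values are hypothetical.

```python
def total_variation(series):
    """Sum of absolute differences between consecutive samples (assumed measure)."""
    return sum(abs(b - a) for a, b in zip(series, series[1:]))

# Hypothetical packets-per-second samples over the same time window
steady = [100, 102, 101, 103, 102]
bursty = [100, 102, 400, 90, 102]

tv_steady = total_variation(steady)
tv_bursty = total_variation(bursty)
```

Both windows start and end at the same values, yet the bursty window accumulates a far larger total variation, which is the kind of severe fluctuation such a measure is meant to expose.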

IV. MATERIALS AND METHODS
In this part, we outline the machine learning (ML)-based system proposed for differentiating faults from attacks. As discussed in Section III, it can be difficult to differentiate attacks from node faults at the receiving end, because their impact on the communication channel is identical. If we monitor the state of the channel, however, we may be able to record the state transitions that attackers perform in order to mount their attacks. We therefore concluded that the best way to differentiate attacks from faults at the receiving end is to monitor the channel data directly. To differentiate between the two anomaly groups based on channel properties, we then fit machine learning models to those measurements (and hence to the channel state).

A. Methodology
As can be seen in Fig. 2, the whole process may be broken down into three distinct stages. In the initial step of development, the system is modeled for the normal, faulty, and attack classes respectively. As a consequence of this, the second step entails conducting a number of execution scenarios with the purpose of constructing datasets that define the behavior of the system under normal, faulty, and attack settings. In the third phase, the gathered datasets are put to use in order to assess a number of supervised machine learning algorithms for classification purposes in relation to the differentiation issue.
As a result, the proposed framework is flexible in that it may be used to investigate multiple classes of defects and assaults in a variety of experimental setups, as well as to evaluate the datasets generated by different supervised machine learning algorithms. Furthermore, by concentrating solely on the features of the communication channel, this framework is insensitive to the characteristics of the devices employed in any cyber-physical system of any type. This part focuses on this general framework and goes through the anomaly classes, various ML classification techniques we are looking at, and the evaluation metrics we are using to evaluate the algorithms.

B. Data
The open UNSW-NB15 dataset [39,40] was chosen as experimental data for evaluating DNN models on the task of detecting network anomalies in the Internet of Things. It contains 2,540,044 records, which are vectors of TCP/IP network connection attributes and their related class labels. The network packets in this dataset provide information about typical network activity as well as nine different forms of attack: fuzzers, analysis, backdoors, denial of service (DoS), exploits, generic, reconnaissance, shellcode, and worms. UNSW-NB15 data contains 47 features, such as IP addresses, port numbers, transaction bytes, and so on [41], as well as two class labels (the attack category and the connection anomaly label) for training and testing intrusion detection systems. The first 35 features carry integrated data packet information, while the remainder describe connection conditions.
The process of detecting deviations from the system's typical profile is known as anomaly detection. To detect anomalies in UNSW-NB15 network data, binary classification is used, with the connection anomaly label serving as the class label: 0 corresponds to the normal profile and 1 corresponds to anomalies.

C. The Proposed CNN-BiLSTM Hybrid Model
This study uses BiLSTM as the model's foundation since it can successfully extract data characteristics. BiLSTM is an upgraded variant of LSTM: it can perform high-level abstraction and nonlinear transformation of intrusion data, evaluate information in both directions, and give more fine-grained computation. Fig. 3 displays the proposed CNN-BiLSTM structure. The distribution of the data in the neural network may change after the BiLSTM processes it. Batch normalization is used to address this inconsistent data distribution problem while deep neural networks are being trained, and it can also speed up training. After the nonlinear transformation of the activation function, it normalizes the output of the preceding layer, which keeps the network trainable and keeps the distribution of the data entering each layer consistent, thereby minimizing large shifts in the distribution at the network's nodes. The network's convergence can thus be accelerated while its representational capacity is maintained.
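The normalization step described above can be sketched for a single feature across one mini-batch. The sketch assumes the standard training-time batch normalization formula (subtract the batch mean, divide by the batch standard deviation, then scale by gamma and shift by beta); the function name `batch_norm`, the default parameter values, and the activation values are hypothetical, and the running statistics used at inference time are left out.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift it."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    # eps guards against division by zero when the batch has no variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

# Hypothetical activations of one unit for a mini-batch of four records
activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(activations)

norm_mean = sum(normalized) / len(normalized)
norm_var = sum((x - norm_mean) ** 2 for x in normalized) / len(normalized)
```

Whatever the scale of the incoming activations, the normalized values have approximately zero mean and unit variance, so the next layer sees a consistent input distribution.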
In the IoT, information flows often exhibit significant local correlations, and some of this information even correlates directly with information across a long span. The bidirectional LSTM network can handle this time-sequential data successfully by separating the important information in the data from the irrelevant. Hence, in order to enhance the capabilities of the detection system, this study combines the BiLSTM network with a CNN. The proposed CNN-BiLSTM IoT intrusion detection model is shown in Fig. 4.
The first step in the detection model is to preprocess the original data set. All of the data is converted into numerical form, and then standardized and normalized. The processed data then enters the record representation layer, which uses an embedded representation for each individual record. The output feature is generated once the features of all the data have been convolved with the convolution kernel.
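The text does not say which standardization and normalization formulas are used. A minimal sketch, assuming z-score standardization and min-max normalization of a single numeric column, might look as follows; the function names and the sample values are hypothetical.

```python
import statistics

def standardize(values):
    """Z-score standardization: zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std if std else 0.0 for v in values]

def min_max(values):
    """Min-max normalization into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

# Hypothetical connection durations in seconds
durations = [0.5, 1.5, 2.5, 3.5]
z_scores = standardize(durations)
scaled = min_max(durations)
```

In practice the mean, standard deviation, minimum, and maximum would be estimated on the training split only and then reused for the test split, so that no test information leaks into preprocessing.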
To obtain the feature sequence, all of the features produced by convolution are stacked together into a feature map, which the convolution layer passes to the pooling layer. The pooling layer then pools the feature sequences. The feature vector is obtained by first dividing the input data into M blocks, then taking the maximum value of each block, and then concatenating all of the results. This process is known as max pooling.
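The convolution and max-pooling steps above can be sketched in one dimension. The sketch assumes a single kernel, no padding, and stride 1; the function names, the kernel values, and the way a leftover tail is absorbed into the last block are hypothetical choices rather than details from the paper.

```python
def conv1d(sequence, kernel):
    """Valid 1-D convolution, computed as cross-correlation as deep learning libraries do."""
    k = len(kernel)
    return [
        sum(sequence[i + j] * kernel[j] for j in range(k))
        for i in range(len(sequence) - k + 1)
    ]

def max_pool_blocks(features, m):
    """Split features into m contiguous blocks and keep the maximum of each block."""
    size = len(features) // m
    blocks = [features[i * size:(i + 1) * size] for i in range(m - 1)]
    blocks.append(features[(m - 1) * size:])  # last block absorbs any leftover items
    return [max(block) for block in blocks]

# Hypothetical embedded record and difference kernel
record = [1, 3, 2, 5, 4, 6, 0]
kernel = [1, -1]

feature_map = conv1d(record, kernel)
pooled = max_pool_blocks(feature_map, m=3)
```

Here the six convolution outputs are divided into M = 3 blocks of two values each, and the three block maxima are concatenated into the pooled feature vector.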
After pooling, the resulting feature sequence is fed into the BiLSTM layer. This layer is made up of two LSTM modules that process the sequence in opposite directions, together with a number of weights shared between them. At each step of the sequence, the BiLSTM module selects which information to keep and which to discard.
Upon completion of this processing, the CNN-BiLSTM network has acquired the data features. A fully connected layer is used to integrate these feature sequences, and its outputs are then passed to the softmax classifier. In the last step, the classification result for each record is obtained.
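The final two stages, a fully connected layer followed by a softmax classifier, can be sketched as follows. The weights, biases, and feature values are hypothetical placeholders standing in for trained parameters and for the BiLSTM output; they are not values from the paper.

```python
import math

def dense(features, weights, biases):
    """Fully connected layer: one weighted sum (logit) per output class."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]

def softmax(logits):
    """Convert logits into class probabilities that sum to one."""
    shift = max(logits)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical feature vector and parameters
features = [0.2, 0.9, 0.4]
weights = [[1.0, -1.0, 0.5],   # row 0: class "normal"
           [-1.0, 2.0, 0.0]]   # row 1: class "anomaly"
biases = [0.0, 0.1]

logits = dense(features, weights, biases)
probs = softmax(logits)
predicted = probs.index(max(probs))
```

The predicted class is the index of the largest probability; for the binary task considered here, index 1 would correspond to an anomalous record.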

D. Evaluation Metrics
In machine learning tasks, the following metrics are most often used to evaluate the effectiveness of constructed models [42]: precision, recall (completeness), F-measure (F-score), the ROC curve (Receiver Operating Characteristic curve, also called the error curve), AUC-ROC, and AUC-PR (Area Under Curve: the area under the ROC curve and the area under the precision-recall curve, respectively) [43].
After classification, four types of outcomes are possible: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). Table I lists these outcomes, where $\hat{y}$ is the algorithm's response on an object and $y$ is the object's true class label. Overall accuracy is an indicator of the correctness of anomaly detection: it is the percentage of the data that the system or algorithm classifies correctly. It is calculated by the formula:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

The precision of a classification system is the percentage of items labeled positive by the classifier that are in fact positive:

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall (completeness) is the percentage of actually positive items that the classifier finds:

$$\text{Recall} = \frac{TP}{TP + FN}$$

In contrast to accuracy, recall is not affected by the class distribution of the data. Recall does not reflect the number of objects wrongly identified as positive (false positives), and precision provides no information about the number of positive objects that are missed (false negatives) [44].
The F-measure (F-score, $F_\beta$) combines precision and recall into one measurement parameter:

$$F_\beta = (1 + \beta^2)\,\frac{\text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}$$

where $\beta$ takes values in the range $0 < \beta < 1$ if precision is given priority, and $\beta > 1$ if recall is given priority.
The F-measure reaches its maximum when recall and precision both equal one, and is close to zero if either of them is close to zero.
The ROC curve, also known as the error curve, is a graph that shows the relationship between the algorithm's sensitivity (TPR, True Positive Rate) and the proportion of negative-class objects that the algorithm predicted incorrectly (FPR, False Positive Rate) as the threshold of the decision rule is varied [45]:

$$TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN}$$

where TP, FP, TN, and FN are the numbers of true positives, false positives, true negatives, and false negatives. In addition to these evaluation parameters, we used the area under the receiver operating characteristic curve (AUC-ROC).
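The metrics in this section follow directly from the four confusion-matrix counts: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The sketch below computes them with the standard definitions for a binary task in which 1 marks an anomaly; the function name `binary_metrics` and the label vectors are hypothetical.

```python
def binary_metrics(y_true, y_pred, beta=1.0):
    """Compute standard binary classification metrics from 0/1 label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0   # identical to TPR
    fpr = fp / (fp + tn) if fp + tn else 0.0
    denom = beta ** 2 * precision + recall
    f_beta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "fpr": fpr, "f_beta": f_beta}

# Hypothetical true labels and predictions for ten connections
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

scores = binary_metrics(y_true, y_pred)
```

Sweeping the decision threshold of a probabilistic classifier and recording the (FPR, TPR) pair at each threshold traces out the ROC curve, and the area under that curve is the AUC-ROC.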

V. EXPERIMENTAL RESULTS
Data preparation (1) entails preparing an input data set, which includes 47 indicators of network connections and class labels, in a form that can be fed into the studied models. Nominal-type information such as IP addresses, protocol names, and data transfer services is one-hot encoded, that is, each categorical variable is represented as a binary vector.
Table II compares the investigated machine learning algorithms and their training times on the IoT network anomaly detection task. As shown in the table, the support vector machine (SVM) detects network anomalies with high accuracy, but it takes a long time to train, which makes it unfit for real-time anomaly identification. By comparison, for the given dataset, logistic regression is the best approach for detecting network anomalies in the Internet of Things once training time is taken into account.
Fig. 7 and Fig. 8 present the performance evaluation and the training time comparison in graphical form. In Fig. 7, we compare six machine learning methods on four evaluation parameters: accuracy, precision, recall, and F1 score. As illustrated in the figure, random forest, Adaptive Boosting (AdaBoost), and k-nearest neighbours (KNN) show higher performance on these evaluation parameters than the other machine learning methods. Nevertheless, the training and testing time of each algorithm must also be considered to understand how fast the method copes with the given problem.
Thus, we compared different machine learning methods for the network anomaly detection problem on two types of performance parameters. The results show that logistic regression is more suitable for practical use than the other methods when both indicators are taken together: it has a comparatively short training time and high accuracy in network anomaly detection.
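The one-hot encoding used during data preparation can be sketched as follows. The category order (alphabetical), the function name `one_hot_encode`, and the protocol values are hypothetical; a production pipeline would fix the category list on the training data and reuse it for the test data.

```python
def one_hot_encode(values):
    """Map each categorical value to a binary vector with a single 1."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        vector = [0] * len(categories)
        vector[index[value]] = 1  # mark the position of this value's category
        vectors.append(vector)
    return categories, vectors

# Hypothetical protocol names taken from four connection records
protocols = ["tcp", "udp", "tcp", "icmp"]
categories, encoded = one_hot_encode(protocols)
```

Each distinct value becomes its own binary column, so a feature with many distinct values, such as an IP address, widens the input vector considerably.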

VI. DISCUSSION
The goal of this research is to explore machine learning and deep learning models for identifying anomalies in Internet of Things network data and to develop a new deep learning model for this problem. The models were evaluated on a single hardware and software platform, using equal portions of the UNSW-NB15 dataset for training and testing. The tested models include logistic regression, random forest, KNN, decision tree, Naive Bayes, SVM, and Adaptive Boosting. The built models have high rates of IoT network anomaly detection accuracy, ranging from 80% to 88%. The article proposes a CNN-BiLSTM hybrid model for detecting anomalies in Internet of Things networks. The proposed deep model showed about 96% accuracy. In addition, the paper chooses the best machine learning model based on the time it takes to train the model and the importance of identifying anomalies in Internet of Things network traffic.
In the future, we intend to continue examining the properties of models employed in cybersecurity tasks. One of the upcoming research objectives is to look at the effect of Internet of Things network traffic topology on the performance metrics of deep learning models [46]. Based on the findings, a deep CNN-BiLSTM strategy is proposed for recognizing and linking security incidents.

VII. CONCLUSION
The goal of this research is to look at machine learning models for identifying anomalies in Internet of Things network data. The models were evaluated on a single hardware and software platform, using equal portions of the UNSW-NB15 dataset for training and testing. The tested models include logistic regression, random forest, KNN, decision tree, Naive Bayes, SVM, and Adaptive Boosting. The built models have high rates of network anomaly detection accuracy, ranging from 80% to 88%. The article offers suggestions for selecting the best model based on the time it takes to train the model and the importance of identifying anomalies in network traffic.
In the future, we intend to continue examining the properties of models employed in cybersecurity tasks. One of the upcoming research objectives is to look at the effect of network traffic topology on the performance metrics of deep learning models. Based on the findings, we propose to build a deep learning-based strategy for recognizing and linking security incidents.