Ada-IDS: AdaBoost Intrusion Detection System for ICMPv6 based Attacks in Internet of Things

The buzzword Internet of Things (IoT) describes the interconnection of objects that are highly diverse in nature. The constrained capacity, heterogeneity and large-scale deployment of IoT technology expose IoT networks to many security threats. RPL is the routing protocol for constrained devices such as IoT nodes. The ICMPv6 protocol plays a major role in constructing its tree-like topology, called the DODAG, and it is vulnerable to several security attacks. The ICMPv6 based attacks discussed in this paper are the Version Number attack, the DIS flooding attack and the DAO attack. Network traffic is collected from a simulated environment under both normal and attack settings. An AdaBoost ensemble model, termed Ada-IDS, is developed in this research to detect these three ICMPv6 based security attacks in RPL based Internet of Things. The proposed model detects the attacks with 99.6% accuracy and a zero false alarm rate. The Ada-IDS ensemble model is deployed in the Border Router of the IoT network to safeguard the IoT nodes and the network.
Keywords: IoT; ICMPv6; version number attack; DIS attack; DAO attack; Ada-IDS


I. INTRODUCTION
Internet of Things (IoT) is a network of embedded objects that have unique identifiers, sensing and actuation capabilities, and limited resources. IoT can connect any object in the real world to the global network. Although IoT makes people's lives easier, it raises many security issues and challenges. Privacy and security vulnerabilities grow as the global network includes a larger number of connected devices from various fields and domains [1] [2]. The large number of connected devices in an IoT network are uniquely identified using IPv6 addressing. IPv6 inherited several features from its predecessor, IPv4, so it carries the vulnerabilities associated with IPv4 as well as the specific security challenges of IPv6 [3]. These security threats have to be addressed in order to strengthen IoT security schemes.
Resource-limited IoT devices form Low-Power and Lossy Networks (LLNs). The IPv6 Routing Protocol for Low-Power and Lossy Networks (RPL) is designed to meet the requirements of LLNs, but it is exposed to several security threats [4]. In RPL, routing is performed using the control messages of the Internet Control Message Protocol version 6 (ICMPv6). These control messages construct a Destination Oriented Directed Acyclic Graph (DODAG), a tree-like hierarchy of nodes with a single root node, called the Border Router, which acts as a gateway to the global network [5].
ICMPv6 messages are grouped into error messages and informational messages. Communication between IPv6 nodes depends entirely on the ICMPv6 protocol, which is also responsible for router and node configuration. Error messages have a "0" in the high-order bit of the 8-bit "Type" field (Type values 0 to 127), whereas informational messages have a "1" in that bit (Type values 128 to 255). ICMPv6 is the backbone of IPv6 and RPL, as it provides the building blocks for constructing the DODAG for routing: the DODAG Information Object (DIO), Destination Advertisement Object (DAO), DODAG Information Solicitation (DIS) and DAO-Acknowledgement (DAO-ACK) informational messages [6].
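The Type-field rule above and the RPL control message codes can be illustrated with a short Python sketch. The numeric values (ICMPv6 Type 155 for RPL control messages, Codes 0x00 to 0x03 for DIS, DIO, DAO and DAO-ACK) come from RFC 4443 and RFC 6550; the helper function names are hypothetical and only meant for illustration.

```python
# Sketch: classify ICMPv6 messages by the high-order bit of the Type field
# (RFC 4443) and name the RPL control messages carried in ICMPv6 Type 155
# (RFC 6550). The helper names below are illustrative, not from any library.

RPL_CONTROL_TYPE = 155  # ICMPv6 Type assigned to RPL control messages

RPL_CODES = {
    0x00: "DIS",      # DODAG Information Solicitation
    0x01: "DIO",      # DODAG Information Object
    0x02: "DAO",      # Destination Advertisement Object
    0x03: "DAO-ACK",  # DAO Acknowledgement
}


def icmpv6_class(msg_type: int) -> str:
    """Return 'error' if the high-order bit of the 8-bit Type is 0, else 'informational'."""
    if not 0 <= msg_type <= 255:
        raise ValueError("ICMPv6 Type is an 8-bit field")
    return "informational" if msg_type & 0x80 else "error"


def rpl_message_name(msg_type: int, code: int) -> str:
    """Name an RPL control message, or report that the packet is not one."""
    if msg_type != RPL_CONTROL_TYPE:
        return "not RPL"
    return RPL_CODES.get(code, "other RPL code")


print(icmpv6_class(1))              # Destination Unreachable: error
print(icmpv6_class(155))            # RPL control message: informational
print(rpl_message_name(155, 0x01))  # DIO
```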
The root node initiates DODAG formation by multicasting DIO messages. When a node receives a DIO message, it uses the information in the message to join the DODAG and sends a DAO message back to the sender. It then starts multicasting DIO messages to its own children. The transmission of DIO messages is regulated by the Trickle algorithm. To discover its neighbors and join the DODAG, a node transmits DIS messages in a unicast or multicast manner. After receiving DAO messages from its children, a parent node acknowledges them by sending DAO-ACK messages [7].
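A minimal sketch of the DIO/DAO exchange, assuming a static neighbor map and a simplified rank that grows by a fixed MinHopRankIncrease (256, the RFC 6550 default) per hop; real objective functions compute rank from link metrics, and the function and variable names are hypothetical. Each node adopts the first DIO sender it hears as its preferred parent, which is also where its DAO goes.

```python
from collections import deque

# Simplification: rank grows by a fixed MinHopRankIncrease per hop; real RPL
# objective functions derive rank from link metrics.
MIN_HOP_RANK_INCREASE = 256  # RFC 6550 default


def build_dodag(links, root):
    """Simulate DIO propagation from the root over an undirected link map.

    Each node joins via the first neighbour whose DIO it hears (breadth-first),
    records that neighbour as its preferred parent and would reply with a DAO.
    Returns (parent, rank) dictionaries.
    """
    parent = {root: None}
    rank = {root: MIN_HOP_RANK_INCREASE}  # ROOT_RANK equals MinHopRankIncrease
    queue = deque([root])  # nodes that are currently multicasting DIOs
    while queue:
        sender = queue.popleft()
        for node in links.get(sender, ()):
            if node not in parent:  # first DIO heard: join the DODAG
                parent[node] = sender  # the DAO is sent back to this parent
                rank[node] = rank[sender] + MIN_HOP_RANK_INCREASE
                queue.append(node)  # the new member starts sending DIOs
    return parent, rank


links = {
    "root": ["a", "b"],
    "a": ["root", "c"],
    "b": ["root", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}
parent, rank = build_dodag(links, "root")
print(parent)  # c joins via a, because a's DIO is processed before b's
print(rank)
```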
RPL and ICMPv6 are prone to several security threats and attacks. According to Anthéa Mayzaud et al. [8], attacks on the RPL protocol fall into three types: attacks against resources, attacks against topology and attacks against traffic. Attacks against resources consume the limited resources of constrained devices, attacks against topology cause sub-optimization and isolation in the topology, and attacks against traffic target the network traffic itself.
ICMPv6 based attacks are created by manipulating the control messages. These attacks cause considerable damage to networks and can lead to Denial of Service (DoS) and Distributed Denial of Service (DDoS) in resource-constrained networks. Version Number attacks, DIS flooding attacks and DAO attacks are among the ICMPv6 control message based attacks that have harmful effects in the IoT environment [9]. Machine Learning models are used to detect intrusions from network traces and log files, but it is very difficult to design an IDS that achieves both high accuracy and a low false alarm rate. Ensemble machine learning algorithms boost accuracy by combining many classifiers [10].
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 11, 2021, p. 500, www.ijacsa.thesai.org
In this paper, an AdaBoost ensemble Intrusion Detection System called Ada-IDS is proposed to detect Version Number attacks, DIS flooding attacks and DAO attacks in IoT networks. To develop this system, IoT network communication traces are collected from a normal simulation environment and from three attack scenarios: Version Number attack, DIS flooding attack and DAO attack. The collected traces undergo pre-processing and feature engineering, and an AdaBoost ensemble machine learning algorithm is then applied to the resulting dataset to build Ada-IDS for detecting ICMPv6 based attacks. The proposed Ada-IDS detects Version Number attacks, DIS flooding attacks and DAO attacks with 99.6% accuracy and a very low false alarm rate.
The rest of the paper is organized as follows: Section II reviews related work. Section III explains the three ICMPv6 based attacks. Section IV describes the Icmpv6 dataset used in this research and the proposed Ada-IDS in detail. Section V presents the results obtained by the Ada-IDS model, and Section VI concludes the paper.

II. RELATED WORK
Adnan Hasan Bdair et al. [11] critically reviewed the latest ICMPv6 based Intrusion Detection mechanisms with a special focus on Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks. Three types of ICMPv6 based attacks were addressed: ICMPv6 flooding, ICMPv6 amplification and ICMPv6 protocol exploitation. The paper also described different types of Intrusion Detection Systems for ICMPv6 based attacks.
Arul Anitha et al. [12] proposed an Artificial Neural Network based Intrusion Detection System for the Internet of Things that uses a Multilayer Perceptron to detect Version Number attacks and DIS attacks. Using a dataset collected from the Cooja simulator, the proposed method correctly classified attack and normal nodes.
Emre Aydogan et al. [13] developed a centralized Intrusion Detection System for RPL based Industrial IoT using Genetic Programming. The system detects Hello Flood attacks and Version Number attacks using the Genetic Algorithm approach with a population size of 50 and default values for the other parameters. Network traces are not analyzed in this work.
Nour Moustafa et al. [14] developed an AdaBoost ensemble Network Intrusion Detection System (NIDS) using Decision Tree (DT), Naive Bayes (NB) and Artificial Neural Network (ANN) algorithms. This system detects application layer IoT attacks. The UNSW-NB15 and NIMS botnet datasets were used to develop the ensemble model, which, according to the authors, detects attacks with 99.54% accuracy on UNSW-NB15 and 98.29% accuracy on the NIMS botnet dataset.
Dan Tang et al. [15] proposed a multi-feature AdaBoost system for detecting low-rate Denial of Service (LDoS) attacks. Network traffic was captured at fixed time intervals, and the samples were analyzed using various statistical measures. Correlation scores between the features and the class labels were computed to choose the optimal feature set, which was then used to build the AdaBoost ensemble model. Evaluated on the NS2 simulator and on a test-bed, the model achieved 94.05% and 97.06% attack detection accuracy, respectively.
A. R. Javed et al. [16] proposed an AdaBoost ensemble classifier to detect botnet attacks in connected vehicles. A decision tree was used as the base estimator, and the cluster size in the AdaBoost algorithm was 100. The performance of the AdaBoost classifier was compared with a decision tree, a probabilistic neural network and sequential minimal optimization. According to the authors, the AdaBoost classifier outperformed the other models, achieving a 99.7% true positive rate and 99.1% accuracy.
Amin Shahraki et al. [17] compared several AdaBoost variants, namely Real AdaBoost, Gentle AdaBoost and Modest AdaBoost, on well-known intrusion detection datasets: KDDCUP99, NSL-KDD, CICIDS2017, UNSW-NB15 and TRAbID. The authors found that Gentle AdaBoost and Real AdaBoost performed better than Modest AdaBoost, while Modest AdaBoost was faster than the other two.

III. ICMPV6 ATTACKS IN RPL BASED IOT
The ICMPv6 protocol is susceptible to various security threats and attacks. In this research, three ICMPv6 based attacks are implemented: the Version Number attack, the DIS attack and the DAO attack. Their characteristics are explained below:

A. Version Number Attacks
The Version Number is an 8-bit value that denotes the version of the DODAG topology. Parent nodes multicast it in DIO control messages. Whenever an inconsistency arises in the DODAG, the root node initiates the global repair mechanism and updates the Version Number, and this updated value is multicast from the root via DIO control messages. A Version Number attacker periodically updates the Version Number without the root node's knowledge and sends the updated value to its neighbors in DIO messages. On receiving such a DIO message, the neighboring nodes join the global repair, so the DODAG is reconstructed again and again. This malicious behavior disrupts the normal duties of legitimate nodes and consumes the constrained resources of the IoT devices. In the long run, the repeated DODAG construction increases control traffic in the network, which leads to Denial of Service (DoS) [18] [19].
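The effect of the Version Number attack can be sketched as a toy simulation from the point of view of one neighbor. As a stated simplification, the 8-bit wrap-around comparison below follows RFC 1982 serial-number arithmetic rather than the lollipop sequence counter that RFC 6550 actually specifies; the function names and the starting version are hypothetical.

```python
def is_newer(a: int, b: int) -> bool:
    """8-bit serial-number comparison (RFC 1982 style): is version a newer than b?

    Simplification: RFC 6550 uses a lollipop sequence counter with its own
    comparison rules; this wrap-around test is only illustrative.
    """
    diff = (a - b) % 256
    return 0 < diff < 128


def simulate_version_attack(periods: int, attack: bool) -> int:
    """Count global repairs seen by a neighbour of a (possibly malicious) node."""
    current = 240  # version the neighbour currently holds (arbitrary start)
    advertised = current
    repairs = 0
    for _ in range(periods):
        if attack:
            advertised = (advertised + 1) % 256  # bump version without the root
        if is_newer(advertised, current):
            current = advertised  # adopt the "new" version ...
            repairs += 1  # ... which forces a rebuild of the DODAG
    return repairs


print(simulate_version_attack(30, attack=False))  # 0 repairs
print(simulate_version_attack(30, attack=True))   # 30 repairs, even across 255 -> 0
```

Every forged increment looks like a legitimate new DODAG version to the neighbor, so the number of global repairs grows linearly with the number of attack periods.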

B. DIS Flooding Attacks
This attack is created by manipulating the header details of DIS messages. A node uses DIS control messages to probe its neighbors in order to join the DODAG, and neighbors that receive a DIS message respond with DIO messages. The timing of DIO transmissions is scheduled by the Trickle timer; in RPL, a multicast DIS message without a Solicited Information option resets the receiver's Trickle timer to its minimum interval, so every such DIS triggers fresh DIO transmissions. A DIS flooding attacker continuously multicasts DIS messages to its neighbors even after it has already received DIO messages. This heavy flooding of messages degrades network performance and leads to Denial of Service (DoS) [20].
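Why DIS flooding is costly can be illustrated with a simplified Trickle timer (RFC 6206): the interval I doubles from Imin up to Imax while the network is quiet, and a reset forces it back to Imin. The sketch assumes times in seconds, no suppression by neighbors, and a multicast DIS every `dis_period` seconds; these values and the function name are hypothetical, not RFC 6550 defaults.

```python
import math
import random


def count_dio(horizon, dis_period=None, i_min=1.0, doublings=6, seed=0):
    """Count DIO transmissions by one node's Trickle timer over `horizon` seconds.

    Simplifications (assumptions): no neighbour suppression, so the redundancy
    counter never blocks a transmission, and a multicast DIS arrives every
    `dis_period` seconds when flooding (None means no DIS at all).
    """
    rng = random.Random(seed)
    i_max = i_min * 2 ** doublings
    next_dis = dis_period if dis_period else math.inf
    interval, start, sent = i_min, 0.0, 0
    while start < horizon:
        t = start + rng.uniform(interval / 2, interval)  # transmit point in [I/2, I)
        end = start + interval
        if next_dis < end and interval > i_min:
            # Multicast DIS: reset I to Imin and start a new interval at once.
            if t < next_dis and t < horizon:
                sent += 1  # the DIO scheduled before the DIS still went out
            start, interval = next_dis, i_min
            next_dis += dis_period
            continue
        while next_dis < end:  # DIS while I == Imin: Trickle does nothing
            next_dis += dis_period
        if t < horizon:
            sent += 1
        start, interval = end, min(2 * interval, i_max)  # double I up to Imax
    return sent


quiet = count_dio(100)
flooded = count_dio(100, dis_period=2)
print(quiet, flooded)
```

Under these assumptions the quiet node sends six or seven DIOs in 100 seconds because its interval keeps doubling, while the node receiving a DIS every 2 seconds sends roughly one DIO per DIS.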

C. DAO Attacks
The DAO attack is generated by manipulating the DAO control message. When a child node receives a DIO message from its parent, it sends back a DAO message to maintain the reverse (downward) route. The DAO message sent by the child node traverses multiple ancestors until it reaches the root node. A DAO attacker continuously transmits DAO messages to its parent list, and all of these unnecessary messages have to be forwarded to the root node. This consumes network resources and prevents legitimate nodes from performing their regular activities. Finally, the network is left in an inconsistent state, which causes Denial of Service (DoS) in the network [21].
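The cost of a DAO attack grows with the attacker's depth in the DODAG, since every DAO is relayed hop by hop towards the root. A minimal sketch, assuming a parent map like the one produced during DODAG formation and counting one transmission per hop (a simplification of RPL's storing and non-storing modes); the helper names are hypothetical.

```python
def hops_to_root(parent, node):
    """Number of hops a DAO travels from `node` up the parent chain to the root."""
    hops = 0
    while parent[node] is not None:
        node = parent[node]
        hops += 1
    return hops


def dao_transmissions(parent, senders):
    """Total transmissions needed to deliver one DAO per entry in `senders`."""
    return sum(hops_to_root(parent, n) for n in senders)


# Hypothetical chain: root <- a <- b <- c, with the attacker at node c.
parent = {"root": None, "a": "root", "b": "a", "c": "b"}
normal = dao_transmissions(parent, ["a", "b", "c"])        # 1 + 2 + 3
attack = dao_transmissions(parent, ["a", "b"] + ["c"] * 50)  # c repeats its DAO 50 times
print(normal, attack)
```

In this four-node chain a single DAO from the attacker at depth 3 costs three transmissions, so 50 repeated DAOs cost 150 instead of 3.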
All three attacks abuse ICMPv6 control messages, consuming resources in the IoT network and reducing network performance. Ultimately, each of them leads to Denial of Service (DoS), which causes severe damage to the RPL based IoT network.

IV. PROPOSED ADA-IDS MODEL
The two major categories of IDS are centralized (network) IDS and distributed IDS. A centralized IDS is installed in the border router or in a dedicated server, whereas a distributed IDS is deployed in the client nodes. Because IoT nodes are resource constrained, the distributed approach is not suitable for them.
The proposed Ada-IDS belongs to the centralized IDS category. It monitors the nodes in the network and, whenever an intrusion occurs, raises an alarm to notify the administrator. The phases involved in developing Ada-IDS are given in Fig. 1.
As Fig. 1 shows, Ada-IDS is developed in five phases: the Data Collection Phase, Pre-Processing Phase, Feature Engineering Phase, Model Building Phase and Deployment Phase.

A. Data Collection Phase
The data is collected from a simulation environment consisting of 50 normal client nodes, one root node and one attacker. The Version Number attack, DIS flooding attack, DAO attack and a scenario without an attacker are implemented in the Cooja simulator, and the network traces from all these experimental setups are captured using the 6LoWPAN Analyzer tool. Each scenario is simulated for 30 minutes. The captured packets are analyzed using Wireshark, and the .pcap files are converted into .csv files. The resulting file, "Icmpv6.csv", is used for building the Ada-IDS model. The collected dataset contains normal packets, Version Number attacks, DIS flooding attacks and DAO attacks. The normal and attack instances are listed in Table I.
Table II explains the attributes of the Icmpv6 dataset, and a screenshot of sample records, displayed using Python code, is shown in Fig. 2.
As Fig. 2 shows, the Class and Type fields denote whether a packet is an attack or normal. The Type field also identifies the specific attack as a Version Attack, DIS Attack or DAO Attack.
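The labeling described above can be sketched with pandas. The sketch assumes that each scenario is exported from Wireshark as CSV with its default columns (No., Time, Source, Destination, Protocol, Length, Info) and that, in an attack scenario, only packets sent from the attacker's address are labeled as attacks; the paper does not state its labeling rule, so the function name, addresses and label strings here are hypothetical.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical per-scenario CSV exports (toy rows, not the paper's data).
NORMAL_CSV = """No.,Time,Source,Destination,Protocol,Length,Info
1,0.51,fe80::202,ff02::1a,ICMPv6,97,RPL Control (DODAG Information Object)
2,0.77,fe80::203,fe80::202,ICMPv6,95,RPL Control (Destination Advertisement Object)
"""
DIS_CSV = """No.,Time,Source,Destination,Protocol,Length,Info
1,0.42,fe80::233,ff02::1a,ICMPv6,55,RPL Control (DODAG Information Solicitation)
2,0.58,fe80::202,ff02::1a,ICMPv6,97,RPL Control (DODAG Information Object)
"""


def label_capture(df, attacker=None, attack_type=None):
    """Add Class (Normal/Attack) and Type columns to one scenario's packets."""
    df = df.copy()
    if attacker is None:  # attack-free scenario
        is_attack = np.zeros(len(df), dtype=bool)
    else:  # assumption: only the attacker's own packets count as attacks
        is_attack = (df["Source"] == attacker).to_numpy()
    df["Class"] = np.where(is_attack, "Attack", "Normal")
    df["Type"] = np.where(is_attack, attack_type, "Normal")
    return df


dataset = pd.concat(
    [
        label_capture(pd.read_csv(io.StringIO(NORMAL_CSV))),
        label_capture(pd.read_csv(io.StringIO(DIS_CSV)), "fe80::233", "DIS"),
    ],
    ignore_index=True,
)
print(dataset[["Source", "Class", "Type"]])
```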

B. Pre-Processing Phase
The dataset collected from the simulation environment must be pre-processed before it can be used to build the AdaBoost ensemble model. There are 394 missing values in the Source and Destination fields. Since these two fields hold the IPv6 addresses of the nodes, the missing values cannot be replaced by mean, median or mode values; instead, a new placeholder value is assigned to the missing Source and Destination addresses.
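A minimal pandas sketch of the missing-value handling, using the dummy values "a" and "b" that Section IV-C assigns to missing Source and Destination addresses; the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame: the column names follow the paper; the rows are made up.
df = pd.DataFrame({
    "Source": ["fe80::202", np.nan, "fe80::233"],
    "Destination": ["ff02::1a", "fe80::202", np.nan],
})
missing_before = int(df[["Source", "Destination"]].isna().sum().sum())

# IPv6 addresses are labels: a mean or median is meaningless, and the mode
# would wrongly attribute packets to a real node, so use dedicated placeholders.
df = df.fillna({"Source": "a", "Destination": "b"})
print(missing_before, df.to_dict("list"))
```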

C. Feature Engineering
One-hot encoding and label encoding are performed on the categorical features to make them suitable for ML algorithms, and frequency encoding is applied to the "Time" feature. A Class feature is created to separate Normal samples from Attack samples, and the Type feature categorizes the attacks as DIS Attack, DAO Attack or Version Number Attack. The feature "No." is the packet number, which has no significance in predicting the target, so it is removed from the dataset. Null values in the "Source" feature are replaced by the dummy value "a", and null values in the "Destination" feature by the dummy value "b". After the pre-processing and feature engineering tasks, the dataset looks as shown in Fig. 3.
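The encoding steps can be sketched with pandas and scikit-learn as follows. The paper does not list which categorical columns receive one-hot rather than label encoding, so the choice below (one-hot for a hypothetical Protocol column, label encoding for Source, Destination and Type) and the toy rows are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame with hypothetical rows; "No." is the packet index to be dropped.
df = pd.DataFrame({
    "No.": [1, 2, 3, 4],
    "Time": [0.5, 0.5, 1.25, 2.0],
    "Source": ["fe80::202", "a", "fe80::233", "fe80::233"],
    "Destination": ["ff02::1a", "fe80::202", "b", "ff02::1a"],
    "Protocol": ["ICMPv6", "ICMPv6", "6LoWPAN", "ICMPv6"],
    "Type": ["Normal", "Normal", "DIS", "DIS"],
})

df = df.drop(columns=["No."])  # the packet number carries no predictive signal
df["Time"] = df["Time"].map(df["Time"].value_counts())  # frequency encoding
for col in ["Source", "Destination", "Type"]:
    df[col] = LabelEncoder().fit_transform(df[col])  # label encoding
df = pd.get_dummies(df, columns=["Protocol"], dtype=int)  # one-hot encoding
print(df)
```

Frequency encoding replaces each Time value with the number of packets sharing it, and LabelEncoder assigns integer codes in the sorted order of the distinct values.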
As shown in Fig. 3, all categorical values in the dataset are converted into numerical values, so the dataset is now suitable for model building.

D. Model Building Phase
The pre-processed dataset with eight features is used in this experiment. The combined dataset has 127684 samples: 80% of them (102147 instances) form the training set, and the remaining 20% (25537 instances) form the test set.
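The split sizes reported above match what scikit-learn's train_test_split produces for 127684 samples with test_size=0.2, because it rounds the test size up: ceil(0.2 × 127684) = 25537, leaving 102147 for training. The sketch uses placeholder arrays; whether the authors used this function, and with which random_state, is an assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 127684            # size of the combined dataset reported above
X = np.zeros((n_samples, 8))  # placeholder for the eight pre-processed features
y = np.zeros(n_samples)       # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))
```

On the real data, passing `stratify=y` would additionally keep the class proportions equal in both sets.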

E. AdaBoost Ensemble Model
An AdaBoost (Adaptive Boosting) model is built to detect Version Number attacks, DIS flooding attacks and DAO attacks in the IoT environment. AdaBoost, developed by Yoav Freund and Robert Schapire in 1996, is an ensemble boosting classifier: it improves accuracy by combining multiple classifiers [22], building a single strong, highly accurate classifier out of several weak ones. The basic idea behind AdaBoost is to train on the data samples and adjust the weights in each iteration so that observations misclassified earlier can eventually be predicted accurately [23]. The classifier is fine-tuned by iterative training on differently weighted versions of the training examples, and each iteration tries to minimize the weighted training error so as to fit these examples as well as possible. The steps for obtaining the ensemble model are given below:
1) AdaBoost begins by picking a training subset at random.
2) The model is trained iteratively, with each training set selected based on the accuracy of the previous prediction.
3) Observations that were incorrectly classified are given more weight, increasing the likelihood that they will be classified correctly in the next iteration.
4) Each trained classifier is also given a weight based on how accurately it classifies.
5) More accurate classifiers receive more credit in the final vote.
6) The process iterates over the training data until it is fitted perfectly or until the specified maximum number of estimators has been reached.
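The steps above can be made concrete with a from-scratch sketch of SAMME, the multi-class AdaBoost variant that scikit-learn's AdaBoostClassifier implements. Decision stumps (depth-1 trees) serve as weak learners; the function names are hypothetical, and this is an illustration rather than the authors' implementation. Unlike step 1, this variant starts from uniform weights over the whole training set instead of a random subset.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def samme_fit(X, y, n_estimators=10, learning_rate=1.0):
    """Fit a SAMME (multi-class AdaBoost) ensemble of decision stumps."""
    classes = np.unique(y)
    k = len(classes)
    w = np.full(len(y), 1.0 / len(y))  # start with uniform sample weights
    learners, alphas = [], []
    for _ in range(n_estimators):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        miss = stump.predict(X) != y
        err = np.dot(w, miss) / w.sum()  # weighted training error
        if err >= 1.0 - 1.0 / k:  # no better than random guessing: stop
            break
        # Classifier weight: more accurate stumps get a larger say (steps 4-5).
        alpha = learning_rate * (np.log((1.0 - err) / max(err, 1e-10)) + np.log(k - 1.0))
        learners.append(stump)
        alphas.append(alpha)
        if err == 0.0:  # perfect fit: nothing left to correct (step 6)
            break
        w = w * np.exp(alpha * miss)  # up-weight misclassified samples (step 3)
        w /= w.sum()
    return classes, learners, alphas


def samme_predict(model, X):
    """Weighted vote of the stumps: each adds its alpha to the class it predicts."""
    classes, learners, alphas = model
    votes = np.zeros((X.shape[0], len(classes)))
    for stump, alpha in zip(learners, alphas):
        pred = stump.predict(X)
        votes[np.arange(X.shape[0]), np.searchsorted(classes, pred)] += alpha
    return classes[votes.argmax(axis=1)]


X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = samme_fit(X, y, n_estimators=5)
print(len(model[1]), samme_predict(model, X))
```

On this toy data the first stump already separates the classes, so boosting stops after one estimator, mirroring the "fits perfectly" exit in step 6.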
The AdaBoost classifier has three basic parameters: base_estimator, n_estimators and learning_rate. The values used in this research are given below:
• base_estimator: the weak learner used to train the model. In this work, the default DecisionTreeClassifier is used to train the ensemble model.
• n_estimators: the number of weak learners trained in sequence. The model is first built with 10 estimators and its performance is analyzed; the number is then increased in steps of 10 up to 100 estimators.
• learning_rate: the weight applied to each weak learner's contribution; its default value is 1. This ensemble model uses the default learning rate.
The AdaBoost ensemble approach combines weak learners iteratively, with each new learner correcting the mistakes of the previous ones, which improves accuracy. AdaBoost is also comparatively resistant to overfitting. However, its performance degrades when the dataset contains outliers, because repeatedly misclassified points receive ever larger weights.

F. Deployment Phase
The proposed Ada-IDS model is installed in the Border Router (Gateway). The Pseudo Code for the Ada-IDS is given in Fig. 4.
Ada-IDS detects ICMPv6 based attacks, namely Version Number attacks, DIS flooding attacks and DAO attacks, in RPL based IoT networks.
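Fig. 4 gives the authors' pseudo code; as a rough illustration only, the border-router loop can be sketched as a function that classifies each already-encoded packet and invokes an alert callback for any non-normal prediction. The function name, the callback and the toy one-feature model are hypothetical.

```python
from sklearn.ensemble import AdaBoostClassifier


def monitor(model, feature_rows, alert, normal_label="Normal"):
    """Classify each pre-processed packet and raise an alert for non-normal ones.

    `feature_rows` yields (packet_id, features) pairs encoded the same way as
    the training data; `alert` is any callback, e.g. one that notifies the admin.
    """
    for packet_id, features in feature_rows:
        label = model.predict([features])[0]
        if label != normal_label:
            alert(packet_id, label)


# Toy model: one feature, two well-separated classes (illustration only).
model = AdaBoostClassifier(n_estimators=10, random_state=0)
model.fit([[0], [1], [10], [11]], ["Normal", "Normal", "DIS", "DIS"])

alerts = []
monitor(model, [(1, [0.5]), (2, [10.5]), (3, [1.0])],
        alert=lambda pid, label: alerts.append((pid, label)))
print(alerts)
```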

V. RESULT AND DISCUSSION
This section presents the results obtained by the AdaBoost ensemble model. After the pre-processing and feature engineering phases, the dataset is split into a training set containing 80% of the samples and a testing set containing the remaining 20%; the number of samples in each set is given in Table III. The training samples are used to build the AdaBoost ensemble model, with DecisionTreeClassifier as the weak classifier that is refined iteratively and the learning rate left at its default value. The number of estimators is initially set to 10, and the training time, testing time and testing accuracy of the AdaBoost classifier are recorded. To check whether the accuracy changes with the number of estimators, the number of estimators is then increased in steps of 10 up to 100. Notably, the accuracy stays at 99.6% and is not affected by the number of estimators used to build the AdaBoost classifier. The parameters and accuracy of the AdaBoost ensemble model are listed in Table IV.

A. Evaluation Metrics
The dataset contains three attack classes in addition to the normal class. A confusion matrix, which shows the actual and predicted class labels of each sample, is generated for each experiment. To evaluate the performance of the models, the accuracy, precision, recall and F-Score metrics are also computed [24], where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives:
• Accuracy: the proportion of samples that are classified correctly.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$
• Precision: the proportion of samples predicted as a class that actually belong to it.
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (2)$$
• Recall: the proportion of samples of a class that are predicted as that class.
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (3)$$
• F-Score: combines precision and recall into a single measure, their harmonic mean, as given in Eq. 4.
$$\text{F-Score} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \qquad (4)$$
In this work, the CPU time for training and testing the model is also recorded for each experiment. The confusion matrix obtained for each experiment is almost the same; it is given in Table V, where correctly classified test samples are shown in blue text and misclassified samples in red. As the table shows, all normal events are identified correctly, and there are very few misclassifications in the other categories. Using the confusion matrix and Eq. 1 to Eq. 4, the accuracy, precision, recall and F1-score values are calculated and listed in Table VI. As Table VI shows, the Ada-IDS model, built with the AdaBoost ensemble and DecisionTreeClassifier, achieves high accuracy, precision, recall and F-score. Because the same confusion matrix is obtained for all observations, the accuracy, precision, recall and F-score values are also the same. Since the model produces no false alarms, it is suitable for anomaly detection. Ada-IDS is implemented in the Border Router (6BR) to safeguard the connected devices in the IoT network.
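Eq. 1 to Eq. 4 and the false alarm rate can be computed per class from a multi-class confusion matrix with a few lines of NumPy. The matrix below contains illustrative numbers, not the values of Table V, and the false alarm rate is taken here as the fraction of normal samples predicted as any attack class; the function name is hypothetical.

```python
import numpy as np


def metrics_from_confusion(cm, labels, normal="Normal"):
    """Accuracy, per-class precision/recall/F-score and false alarm rate.

    Rows of `cm` are actual classes and columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)  # TP / (TP + FP), per predicted class
    recall = tp / cm.sum(axis=1)     # TP / (TP + FN), per actual class
    f_score = 2 * precision * recall / (precision + recall)
    i = labels.index(normal)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": dict(zip(labels, precision)),
        "recall": dict(zip(labels, recall)),
        "f_score": dict(zip(labels, f_score)),
        # Normal packets flagged as any attack class.
        "false_alarm_rate": (cm[i].sum() - cm[i, i]) / cm[i].sum(),
    }


labels = ["Normal", "Version", "DIS", "DAO"]
cm = [[100, 0, 0, 0],   # illustrative counts only
      [0, 45, 3, 2],
      [0, 0, 50, 0],
      [0, 5, 0, 45]]
m = metrics_from_confusion(cm, labels)
print(m["accuracy"], m["false_alarm_rate"])
```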

VI. CONCLUSION
Security attacks are inevitable in RPL based Internet of Things, as its devices have limited resources compared to other networks. In this paper, an ensemble IDS named Ada-IDS is developed using the AdaBoost ensemble model and deployed in the Border Router to protect the IoT network from Version Number attacks, DIS flooding attacks and DAO attacks. In the experiments, Ada-IDS detected these three types of attacks with 99.6% accuracy and no false alarms, so it can act as an anomaly-based intrusion detection system. It is applicable to IoT networks across domains and acts as a shield that protects the nodes from flooding of ICMPv6 messages, unnecessary version updates and bulk sending of DAO messages in RPL based IoT networks, thereby helping to ensure the availability and reliability of IoT nodes for their normal duties. To enhance the system further, more ICMPv6 related attacks can be included in the "Icmpv6.csv" dataset.