Efficient Intrusion Detection System for IoT Environment

These days, the Internet is subject to a variety of attacks that can harm network devices or allow attackers to steal sensitive data from them. The heterogeneity of the IoT environment brings new perspectives and requirements for intrusion detection. This paper proposes a new Intrusion Detection System (IDS) that relies on machine learning and deep learning techniques to identify new attacks that existing systems fail to detect in such an IoT environment. The experiments use the benchmark dataset ToN_IoT, which includes IoT service telemetry, Windows and Linux operating system data, and network traffic. Feature selection plays a key role in building an efficient IDS, so a new feature selection module based on the ReliefF algorithm, which outputs the most essential features, has been introduced into the IDS. The selected features are fed into a set of machine learning and deep learning models. The proposed ReliefF-based IDSs are compared with existing IDSs based on a correlation function and outperform them: the Medium Neural Network, Weighted KNN, and Fine Gaussian SVM models achieve accuracies of 98.39%, 98.22%, and 97.97%, respectively.


I. INTRODUCTION
The Internet of Things (IoT) is a network of physical objects embedded with software, sensors, and other technologies that connect and exchange data with other systems and devices over the Internet. IoT represents the interconnection of heterogeneous entities, where an "entity" can be a human being, a sensor, or anything else that can request or provide a service [1]. IoT has become one of the most important technologies of the 21st century, which makes protecting the data exchanged between devices essential. IoT security is an active research topic that is gaining increasing attention in government, industry, and academia. Its requirements are confidentiality, integrity, availability, authentication, and authorization. Many IDSs are used to protect networks from attacks, but some of them cannot identify the newest attacks. In this paper, we present an IDS developed using deep learning and machine learning techniques, with efficient feature selection based on the ReliefF algorithm. The proposed model is based on the Windows 10 subset of the ToN-IoT dataset, and it is compared with existing work that selects features from the Windows 10 dataset using a correlation function.
We focus on the Windows 10 dataset because the Windows 10 operating system is widely used on personal computers. This dataset contains new attacks on IIoT networks that traditional IDSs cannot detect, which motivates a new IDS for IoT networks based on machine learning and deep learning algorithms. Many datasets contain redundant or noisy data that can degrade learning performance and lengthen training, so feature selection techniques are used to remove irrelevant and noisy attributes. In this paper, we apply the ReliefF feature selection method to the Windows 10 dataset.
The paper is structured as follows: Section II reviews background on IDSs, datasets, feature selection techniques, and machine learning and deep learning algorithms. Section III discusses related work on IDSs. Section IV explains the proposed model. Section V describes the evaluation metrics and experimental results. Finally, Section VI presents the conclusions and future work.

II. BACKGROUND
Intrusion detection systems (IDSs) are tools or software that protect networks from intrusions. The purpose of an IDS is to recognize various types of attacks and malicious uses of a computer that firewalls cannot detect. This is essential for achieving a high level of protection against actions that compromise the availability, integrity, or confidentiality of a computer system [2]. An IDS monitors network traffic for suspicious activities and issues alerts when such activities are detected. Some IDSs can also act when anomalous traffic is detected, for example by blocking traffic sent from a suspicious IP address. IDSs are sometimes classified by their location into Host Intrusion Detection Systems (HIDS) and Network Intrusion Detection Systems (NIDS) [3]. A HIDS monitors and analyzes host activities, application logs, system calls, and modifications to system files to identify intrusions such as unauthorized access to data [4]. A NIDS monitors network traffic, using techniques such as packet sniffing to collect it, and detects attacks such as DoS attacks or port scans [4]. There are two types of NIDS: statistical anomaly IDS and pattern-matching IDS [3]. In practice, IDSs are also classified as signature-based or anomaly-based; anomaly-based IDSs are widely used for detecting zero-day attacks. Developing an efficient IDS depends on selecting suitable machine learning or deep learning models as well as a suitable feature selection technique, in order to enhance performance and reduce computation.
www.ijacsa.thesai.org
Feature selection is an important step that plays a significant role in the development of an effective IDS. Feature selection methods fall into three categories: filter methods, wrapper methods, and embedded methods. Filter methods select the most discriminating features based on intrinsic characteristics of the data, independently of any classifier.
Features are ranked according to specific criteria, and the highest-ranked features are then selected. Commonly used filter methods include ReliefF, the F-statistic, mRMR, correlation, and information gain. Wrapper methods use the intended learning algorithm itself to evaluate candidate feature subsets; examples are sequential forward selection and sequential backward selection. Embedded methods perform feature selection as part of model construction; examples are Elastic Net and Ridge Regression [5]. The most widely used feature selection methods are ReliefF (Relief-F) [6], correlation-based feature selection [7,8], and information-gain-ratio-based feature selection [9,10].
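To make the filter idea concrete, the following minimal Python sketch scores each feature independently of any classifier by its absolute Pearson correlation with the binary Label attribute, then keeps the k highest-scoring features. This is one plausible form of a "correlation function" filter; whether [23] used exactly this criterion is an assumption, and the feature names in the example are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information about the label
    return cov / math.sqrt(vx * vy)


def rank_by_correlation(columns, labels, k):
    """Filter-method sketch: score every feature column by |r| with the label
    and return the names of the k highest-scoring features."""
    scores = {name: abs(pearson(col, labels)) for name, col in columns.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


# Hypothetical feature columns and a binary label (0 = normal, 1 = attack).
columns = {
    "f_informative": [0.1, 0.2, 0.9, 1.0],
    "f_noise": [0.5, 0.1, 0.4, 0.3],
    "f_constant": [1, 1, 1, 1],
}
labels = [0, 0, 1, 1]
print(rank_by_correlation(columns, labels, 2))
```

Because each feature is scored on its own, a filter like this is cheap, but it cannot see interactions between features; ReliefF's neighbour-based weighting partly addresses that.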
Common classification techniques include decision trees, naïve Bayes (NB), neural networks (NN), support vector machines (SVM), random forest (RF), rule-based systems, and k-nearest neighbors (KNN) [14,15]. Each algorithm uses a learning method to build a classification model from training data. A good classifier should not only fit the training data but also accurately identify the class of records it has never seen before [4].
Deep learning, a branch of machine learning, has also become widely used to implement IDSs. Deep learning algorithms used for intrusion detection include recurrent neural networks, deep belief networks, long short-term memory (LSTM), restricted Boltzmann machines, deep autoencoders, self-taught learning, convolutional neural networks, deep migration learning, and replicator neural networks [16,17].

III. RELATED WORK
Many studies use different methodologies and algorithms to counter the most advanced network attacks. These studies build IDSs with machine learning and deep learning algorithms, combined with different feature selection methods that pick the most important features from the datasets. The related works are summarized as follows. Senthilnayaki Balakrishnan et al. [18] developed a new feature selection method based on the Information Gain Ratio, named the Optimal Feature Selection algorithm, which selects an optimal number of features from the dataset. Using the KDD Cup dataset, they classified data as normal or attack with Support Vector Machine and rule-based classification algorithms, and the proposed feature selection and classification algorithms achieved the best results in their comparison. S. Ramakrishnan et al. [19] developed an IDS that distinguishes attack from normal data in the KDD Cup 99 dataset. They first selected the most important attributes using an entropy-based feature selection method, then used the Fuzzy Control Language to classify data as normal or attack. The results showed that the proposed system reduced computational time and achieved high accuracy.
Peilun Wu et al. [20] developed an IDS called Densely-ResNet. They selected the important attributes using a correlation function and evaluated Densely-ResNet on the UNSW-NB15 dataset. The results showed that Densely-ResNet achieved high accuracy and a reduced false alarm rate.
Shahid Latif et al. [21] proposed an IDS that uses a deep random neural network to protect IIoT systems. The system was tested on the UNSW-NB15 dataset to assess its applicability to IIoT. The results showed better detection accuracy with a low false alarm rate.
Merna Gamal et al. [14] proposed a hybrid IDS that combines machine learning and deep learning techniques to benefit from the advantages of both. Using 10% of the KDDcup1999 dataset, they applied a CNN to extract features and then used machine learning techniques (SVM, KNN) to classify the data. The experimental results showed that the proposed model achieved the best accuracy.
Prabhat Kumar et al. [22] developed a cyberattack detection framework for IoMT networks driven by ensemble learning and a fog-cloud architecture. They used the ToN-IoT dataset, which was collected from a large-scale, heterogeneous IoT network. The results showed that the proposed system achieved an accuracy of 96.35%, a detection rate of 99.98%, and reduced the false alarm rate by 5.59%.

IV. PROPOSED MODEL
This section presents our proposed model, which uses the ReliefF algorithm to select features from the Windows 10 dataset and remove irrelevant variables. The Windows 10 dataset contains 125 attributes plus two label attributes: "Label", whose value is 0 for normal data and 1 for attack data, and "type", which holds either normal or one of seven attack categories (DDoS, DoS, Injection, MITM, Password, Scanning, and XSS). Before selecting features, we preprocess the dataset: the attack categories in the "type" attribute are converted from text to the numeric values 2 to 8, respectively, and normal data are converted to 1. Fig. 1 shows the number of rows for each type of data in the dataset. We then divide the dataset into training and testing data using holdout validation, with 80% of the dataset used for training and 20% for testing. The ReliefF algorithm is used to select features from the Windows 10 dataset. After feature selection, we apply deep learning and machine learning techniques to classify the data as normal or attack data. MATLAB and its Statistics and Machine Learning Toolbox (Classification Learner app) are used to simulate algorithms such as KNN, SVM, and NN. We apply different variants of KNN (such as Weighted KNN and Medium KNN), SVM (such as Linear SVM and Fine Gaussian SVM), and neural networks (such as Medium NN and Bilayered NN). We also apply the LSTM algorithm to the dataset. Fig. 2 shows the flowchart of our proposed model.
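The "type" encoding described in this section can be sketched in Python as follows. The lowercase spellings of the category strings are an assumption about the raw CSV values, so the sketch normalizes case and surrounding whitespace before the lookup and rejects unknown categories instead of silently guessing.

```python
# normal -> 1, then the seven attack categories in the order listed -> 2..8.
TYPE_CODES = {
    "normal": 1, "ddos": 2, "dos": 3, "injection": 4,
    "mitm": 5, "password": 6, "scanning": 7, "xss": 8,
}


def encode_type(values):
    """Map raw 'type' strings to integer class codes (case/whitespace-insensitive)."""
    codes = []
    for v in values:
        key = v.strip().lower()
        if key not in TYPE_CODES:
            raise ValueError(f"unknown traffic type: {v!r}")
        codes.append(TYPE_CODES[key])
    return codes


print(encode_type(["normal", "DDoS", "XSS ", "mitm"]))  # [1, 2, 8, 5]
```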

V. SIMULATION AND EXPERIMENTAL RESULTS
Experiments are carried out to examine the performance of the proposed model.

A. ToN-IoT Datasets Description
The ToN-IoT datasets were collected from IoT service telemetry, the Windows and Linux operating systems, and network traffic. The Windows datasets were generated using virtual machines running Windows 7 and Windows 10 and combine data collected from multiple sources, including the memory, processes, processor, and hard drive of the systems. The Windows 10 dataset contains 125 attributes plus two label attributes: "Label", whose value is 0 for normal data and 1 for attack data, and "type", which holds normal data and seven attack categories.
We apply our proposed model to the Windows 10 dataset: we select features using the ReliefF algorithm and then apply deep learning and machine learning techniques to classify the data as normal or attack data. The dataset is divided using holdout validation as follows: 80% of the dataset is used for training (28780 records) and 20% for testing (7195 records).
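The two counts above add up to 35975 records, and 20% of 35975 is exactly 7195. A minimal Python sketch of an 80/20 holdout split over row indices reproduces these sizes. It is a plain, unstratified random split with an arbitrary seed; whether the original MATLAB holdout stratified by class is not stated in the text.

```python
import random


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Random (unstratified) holdout: shuffle row indices once, then reserve
    the first test_fraction share for testing and the rest for training."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)


# 28780 + 7195 = 35975 records in total.
train_idx, test_idx = holdout_split(35975)
print(len(train_idx), len(test_idx))  # 28780 7195
```

With eight unevenly sized classes, a stratified split (shuffling within each class) would keep the class proportions equal in both parts; the sketch omits that for brevity.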

B. Simulation Environment Properties
The proposed system is implemented in MATLAB R2021a, with the Classification Learner app used to train and test on the imported and preprocessed dataset instances. The experiments run on an Intel(R) Core(TM) i5-6200U CPU @ 2.30 GHz (2.40 GHz) with 16.0 GB of RAM under Windows 10 Pro.

C. Evaluation Metrics
Several metrics are calculated to examine the model's performance: accuracy, recall, precision, and F1-measure. All of these metrics can be calculated from the confusion matrix.
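The confusion matrix is a table that summarizes the performance of a classification model on test data for which the true values are known. Fig. 3 shows the confusion matrix. In the definitions below, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
Accuracy is the ratio of correct detections to the total number of instances:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (1)$$
Recall, also known as sensitivity or true positive rate (TPR), is the fraction of actual positives that the model correctly identifies:
$$\text{Recall} = \frac{TP}{TP + FN} \quad (2)$$
Precision is the fraction of predicted positives that are actually positive:
$$\text{Precision} = \frac{TP}{TP + FP} \quad (3)$$
F1-measure is the harmonic mean of precision and recall:
$$F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \quad (4)$$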
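Because the classifiers here predict eight classes, the TP/FP/FN-based definitions of these metrics have to be applied per class (one class against the rest) and then combined. The Python sketch below computes overall accuracy plus macro-averaged precision, recall, and F1 from a multiclass confusion matrix. Macro averaging is our assumption; the text does not state which averaging scheme produced the reported values.

```python
def metrics_from_confusion(cm):
    """cm[i][j] = number of test samples whose true class is i and predicted class is j.
    Returns overall accuracy and macro-averaged precision, recall, and F1."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp                       # truly c, predicted another class
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Hypothetical two-class example: 85 of 100 test samples are classified correctly.
print(metrics_from_confusion([[50, 10], [5, 35]]))
```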
[Fig. 2. Flowchart of the proposed model (partially recovered): Input data (Windows 10 dataset); Preprocessing (convert the type attribute to numerical data).]
The area under the ROC curve (AUC) is a performance metric for binary classification problems. It represents the model's ability to distinguish between the negative and positive classes: an area of 1.0 corresponds to a model whose predictions are all correct, and an area of 0.5 corresponds to a model that is no better than random guessing. The ROC curve itself plots sensitivity (TPR) against 1 − specificity (the false positive rate) as the decision threshold varies.
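AUC can equivalently be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (the Mann-Whitney U statistic). The Python sketch below computes it that way for binary labels (1 = attack, 0 = normal), counting ties as one half. The quadratic pairwise loop is chosen for clarity rather than speed; a sort-based rank computation would be preferable on the full test set.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs that the scores order
    correctly; a tie between a positive and a negative counts 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0.8, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```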

D. Feature Selection
The paper [23] selects features from the Windows 7 and Windows 10 datasets using a correlation function that keeps the most correlated features, and the authors suggest applying machine learning and deep learning algorithms to the selected features. Our proposed model modifies the feature selection step: we apply the ReliefF algorithm to remove irrelevant variables from the Windows 10 dataset. We then compare our proposed model with the work in [23].
We use the ReliefF algorithm because it is a widely used filter-based feature selection method that finds the best feature subset by calculating feature weights. ReliefF is robust, can deal with incomplete and noisy data, has a relatively low computational cost, and is not limited to two-class problems: it also works with multiple classes [6,24]. In each of m iterations, ReliefF randomly selects an instance R_i from the training set, finds its k nearest hits (nearest neighbors from the same class) and its k nearest misses from each of the other classes, and updates each feature weight accordingly. After m iterations, all feature weights are obtained. The higher a feature's weight, the more useful the feature is for classification; a low weight indicates a less useful feature. The pseudocode of the ReliefF algorithm is as follows. Input: for each training instance, a vector of attribute values and the class value. Output: the vector W of estimated attribute qualities.
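Step 1: set all weights W[A] := 0.0;
Step 2: for i := 1 to m do begin
Step 3: randomly select an instance R_i;
Step 4: find k nearest hits H_j (from the class of R_i);
Step 5: for each class C ≠ class(R_i) do
Step 6: from class C find k nearest misses M_j(C);
Step 7: for A := 1 to a do
Step 8: $W[A] := W[A] - \sum_{j=1}^{k} \frac{\mathrm{diff}(A, R_i, H_j)}{m \cdot k}$;
Step 9: $W[A] := W[A] + \sum_{C \neq \mathrm{class}(R_i)} \frac{P(C)}{1 - P(\mathrm{class}(R_i))} \sum_{j=1}^{k} \frac{\mathrm{diff}(A, R_i, M_j(C))}{m \cdot k}$;
Step 10: end;
Here a is the number of attributes, P(C) is the prior probability of class C, and, for a numeric attribute A, $\mathrm{diff}(A, I_1, I_2) = \frac{|\mathrm{value}(A, I_1) - \mathrm{value}(A, I_2)|}{\max(A) - \min(A)}$.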
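The ReliefF steps above can be turned into a small runnable Python sketch. It assumes numeric attributes, the range-normalized diff function, and neighbour search by the sum of per-attribute diffs (Manhattan distance on normalized values). These choices follow the common Kononenko formulation and are not taken from the MATLAB implementation used in the experiments. When a class has fewer than k other members, the sketch averages over the neighbours it actually finds instead of dividing by k.

```python
import random


def relieff(X, y, m=None, k=10, seed=0):
    """ReliefF weights for numeric attributes; larger weight = more useful feature.

    X: list of samples (lists of numbers); y: class labels; m: number of sampled
    instances (defaults to len(X)); k: nearest hits/misses per class.
    """
    n, a = len(X), len(X[0])
    m = n if m is None else m
    # Attribute ranges normalise diff() into [0, 1]; constant attributes get span 1.
    lows = [min(row[j] for row in X) for j in range(a)]
    highs = [max(row[j] for row in X) for j in range(a)]
    spans = [(hi - lo) or 1.0 for lo, hi in zip(lows, highs)]

    def diff(j, u, v):
        return abs(X[u][j] - X[v][j]) / spans[j]

    def dist(u, v):
        return sum(diff(j, u, v) for j in range(a))

    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    w = [0.0] * a                                  # Step 1
    rng = random.Random(seed)
    for _ in range(m):                             # Step 2
        i = rng.randrange(n)                       # Step 3: random instance R_i
        near = {}
        for c in classes:                          # Steps 4-6: k nearest per class
            cands = [t for t in range(n) if t != i and y[t] == c]
            near[c] = sorted(cands, key=lambda t: dist(i, t))[:k]
        hits = near[y[i]]
        for j in range(a):                         # Step 7
            if hits:                               # Step 8: penalise differing hits
                w[j] -= sum(diff(j, i, h) for h in hits) / (m * len(hits))
            for c in classes:                      # Step 9: reward differing misses
                if c == y[i] or not near[c]:
                    continue
                scale = prior[c] / (1.0 - prior[y[i]])
                w[j] += scale * sum(diff(j, i, t) for t in near[c]) / (m * len(near[c]))
    return w


# Toy data: attribute 0 separates the two classes, attribute 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [0.9, 0.4], [1.0, 0.8], [0.8, 0.2]]
y = [0, 0, 0, 1, 1, 1]
print(relieff(X, y, k=2))
```

The exhaustive neighbour search costs O(m · n · a), which is acceptable for a sketch; a full-scale run would use a spatial index or a vectorized library.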

E. Result and Discussion
The ReliefF algorithm estimates the weight of each feature and ranks the features by weight. Fig. 4 shows the weights of the 20 top-ranked features; the highest weight is 0.257. Tables I and II list the features selected from the Windows 10 dataset using the correlation function in [23] and using the ReliefF algorithm in our proposed model, respectively. The selected features are then used to train and test machine learning and deep learning algorithms, to evaluate how well they classify the dataset as normal or attack data.
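Keeping the top-ranked features amounts to sorting by weight and truncating. The short Python sketch below shows that step; the helper name and the feature names and weights in the example are hypothetical, not the values plotted in Fig. 4.

```python
def top_k_features(names, weights, k=20):
    """Return the k (name, weight) pairs with the largest weights, highest first."""
    if len(names) != len(weights):
        raise ValueError("names and weights must have the same length")
    order = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return [(names[j], weights[j]) for j in order[:k]]


# Hypothetical feature names and weights, for illustration only.
print(top_k_features(["feat_a", "feat_b", "feat_c"], [0.10, 0.257, -0.05], k=2))
```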
After feature selection, we apply deep learning and machine learning techniques to classify the data as normal or attack data. The algorithms applied are KNN, SVM, neural networks, and LSTM, including different variants of KNN (such as Weighted KNN and Medium KNN), SVM (such as Linear SVM and Fine Gaussian SVM), and neural networks (such as Narrow NN and Bilayered NN). Table III shows the performance metrics of the classification algorithms applied to the features selected in [23] from the Windows 10 dataset, and Table IV shows the same metrics for the features selected using the ReliefF algorithm. The LSTM algorithm is also applied to the Windows 10 features selected both with the correlation function of [23] and with the ReliefF algorithm in our proposed model; Table V compares the resulting LSTM performance metrics.
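Weighted KNN differs from plain KNN in that closer neighbours get a larger vote. The Python sketch below assumes Euclidean distance, inverse-squared-distance weights, and k = 10, which we believe mirror the Classification Learner "Weighted KNN" preset; those settings are an assumption and should be checked against the app. The function name is hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_X, train_y, x, k=10):
    """Classify x by a distance-weighted vote of its k nearest training samples.

    Each neighbour votes for its own class with weight 1 / d**2, where d is the
    Euclidean distance to x, so closer neighbours count for more.
    """
    neighbours = sorted(
        ((math.dist(row, x), label) for row, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for d, label in neighbours:
        if d == 0.0:
            return label  # an exact duplicate of a training sample decides outright
        votes[label] += 1.0 / (d * d)
    return max(votes, key=votes.get)


# Two hypothetical clusters labelled with encoded 'type' values 1 and 2.
train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [1, 1, 1, 2, 2, 2]
print(weighted_knn_predict(train_X, train_y, [0.5, 0.5], k=3))
```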
The results show that the LSTM model achieves an accuracy of 68.9% on the features selected with the correlation function and 70% on the features selected with the ReliefF algorithm, with a precision of 0.72 in our proposed model compared with 0.71 in [23]. This indicates that our proposed model gives slightly better LSTM results.
The results in Tables III and IV show that our proposed model achieves the best results; in particular, the Medium NN model performs better with our feature selection than in [23]. The accuracy of the Medium NN model in our proposed model is 98.39% and its precision is 0.996, higher than the corresponding values obtained for [23]. The accuracy of the Linear SVM algorithm is 96.54% in our proposed model but 75.48% in [23]. We attribute the lower results of Linear SVM, compared with the other machine learning algorithms, to the fact that it works most efficiently with two classes: on a dataset with multiple classes it may achieve lower accuracy, and it may not scale well to large datasets.
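A linear SVM is inherently a binary classifier, so an eight-class problem must be decomposed into binary ones. The Python sketch below assumes a one-vs-one decomposition (a common choice, and the default coding of MATLAB's `fitcecoc`): each pair of classes gets its own binary classifier, and the class that wins the most pairwise contests is predicted. Here `pairwise_winner` is a hypothetical stand-in for a trained binary SVM, not an actual SVM.

```python
from collections import Counter
from itertools import combinations


def ovo_predict(x, classes, pairwise_winner):
    """One-vs-one multiclass decision built from binary classifiers.

    pairwise_winner(a, b, x) stands in for a trained binary classifier that
    separates class a from class b and returns whichever of the two it predicts.
    """
    votes = Counter(pairwise_winner(a, b, x) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]


classes = list(range(1, 9))  # the eight encoded 'type' values
print(len(list(combinations(classes, 2))))  # binary classifiers needed: 8 * 7 / 2
```

With eight classes, 28 separate linear boundaries must each be learned from the two classes involved, and a single linear boundary per pair may fit poorly when the classes are not linearly separable; this is one way to read the weaker Linear SVM results.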
LSTM also achieves low accuracy. We attribute this to LSTM not working efficiently with a large number of samples, as its accuracy tends to decrease as the number of samples increases. Fig. 5 shows the accuracy of the algorithms applied to the features selected from the Windows 10 dataset; this comparison shows that our proposed model achieves better results than [23]. Figs. 6 and 7 show the performance metrics of our proposed model and of [23], respectively, and confirm that our proposed model achieves higher metric values than [23].
All results of our proposed model are better than those of [23], which suggests that the ReliefF algorithm selects the most important features and works well with multiple classes.

VI. CONCLUSION
This paper has introduced an efficient IDS for the IoT environment. It uses the benchmark ToN_IoT dataset, which includes IoT telemetry data. The ReliefF algorithm is proposed to select the most vital features efficiently, improving performance and reducing computation. The proposed model has been applied to the Windows 10 subset, and it is compared with the correlation function by applying machine learning and deep learning algorithms to the features selected from the Windows 10 dataset. The experimental results show that our proposed model achieves better results than the correlation-based approach, with accuracies of 98.39%, 98.22%, and 97.97% for the Medium Neural Network, Weighted KNN, and Fine Gaussian SVM models, respectively. In future work, we will use transfer learning to improve the accuracy of the Long Short-Term Memory (LSTM) model.