Intrusion Detection using Deep Learning Long Short-term Memory with Wrapper Feature Selection Method

Many companies have recently moved to cloud computing systems to enhance their performance and productivity. These systems allow applications, data, and infrastructure to run on cloud platforms (i.e., online), which increases the number of attacks on such systems. As a result, robust intrusion detection systems (IDSs) are needed. The main goal of an IDS is to distinguish normal from abnormal network traffic. In this paper, we propose a hybrid approach that combines an Enhanced Binary Genetic Algorithm (EBGA), used as a wrapper feature selection (FS) algorithm, with Long Short-Term Memory (LSTM). We also propose a novel injection method to prevent premature convergence of the GA. An intelligent k-means algorithm is employed to examine the distribution of solutions in the search space. Once 80% of the solutions belong to one cluster, the injection method (i.e., adding new solutions) is used to redistribute the solutions over the search space. EBGA reduces the search space as a preprocessing step, while LSTM works as a binary classifier. UNSW-NB15, a real-world public dataset, is used to evaluate the proposed system. The obtained results show that the feature selection method enhances the overall performance of LSTM.
Keywords: Intrusion detection; feature selection; long short-term memory; binary genetic algorithm


I. INTRODUCTION
With the exponential growth in the volume of structured and unstructured data generated from a variety of sources, providing protection and privacy has become a challenging issue for intrusion detection systems (IDSs) in this big data environment. Intrusions are suspicious and unauthorized activities on a computer or network that threaten the security of these systems. IDSs are crucial for ensuring network and information security. These systems can be devices or software that monitor systems or networks for malicious activities or violations of security policies.
Intrusion detection systems detect attacks using two methods: signature-based detection and anomaly detection. In signature-based detection, the IDS analyzes system activities to find patterns that are similar to previously detected patterns stored in a database. Anomaly detection relies on machine learning to build models of normal behavior on the system or the network (i.e., cloud computing systems) and then detects patterns of unusual behavior. Fig. 1 presents the main architecture of an IDS for cloud computing systems.
Many algorithms based on machine learning and soft computing methods have been proposed to build a robust IDS. Because network traffic data is high dimensional, many papers have investigated the use of FS algorithms to enhance the overall performance of IDS [1]. For example, Almomani [2] applied four types of FS algorithms, namely, genetic algorithm (GA), particle swarm optimization (PSO), firefly optimization (FFA), and grey wolf optimizer (GWO). Almomani used two classifiers, Support Vector Machine (SVM) and decision tree (J48), to build a robust IDS. Thakkar and Lohiya [3] applied seven ML classifiers (i.e., Neural Networks (NN), Decision Tree (DT), Logistic Regression (LR), Support Vector Machine (SVM), k-nearest neighbours (kNN), Random Forest (RF), and Naïve Bayes (NB)) to build an intelligent IDS. Zhu et al. [4] introduced a multi-objective FS method for building a robust IDS inside cloud computing systems.
Many contributions in the literature focus on traditional machine learning methods for IDS. However, these methods have a high cost in terms of training time when working with big data sets. To overcome this issue, a deep learning approach is used as an effective learning mechanism that reduces the training time and increases the accuracy of the results obtained from the IDS. The main contribution of this work is a robust wrapper feature selection method that is able to reduce the high dimensionality of the dataset. This paper is organized as follows: Section II presents the related works on IDS. Section III presents the proposed method used in this paper (i.e., EBGA and LSTM). Section IV presents the data set used in this paper. Section V presents the obtained results and analysis. Finally, Section VI presents the conclusion and future works of this paper.

II. RELATED WORK
The literature shows that a number of traditional machine learning methods have been proposed for intrusion detection systems, including Support Vector Machine, K-Nearest Neighbors, Decision Trees, Random Forests, Linear Regression, Naive Bayes, and Artificial Neural Networks. Recently, deep learning-based approaches have emerged to overcome the challenges of developing accurate IDSs with a high detection rate. State-of-the-art deep learning approaches that have been used for IDS include Deep Neural Networks (DNNs) [5], Deep Belief Networks (DBNs), Restricted Boltzmann Machines (RBMs), autoencoders, and hybrid methods. For example, Zhao et al. [6] proposed an intrusion detection method based on deep belief networks and a probabilistic neural network. The KDD CUP 99 data set was used for testing the performance of the proposed method. The results show that their proposed method performs better than traditional methods. In [7], Erfani et al. presented a hybrid approach for IDS that combines DBNs with a linear one-class SVM and applied it to several data sets. Their experimental results show that the proposed model is scalable and computationally efficient: compared to an autoencoder, it executes 3 times faster in the training phase and 1000 times faster in the testing phase.
In [8], the authors proposed an approach for IDS based on deep learning using self-taught learning on NSL-KDD, a benchmark data set, with only six features selected out of the forty-one features of the data set. The results of their experiments and comparisons with other machine learning algorithms (Naive Bayes, SVM, and Decision Tree) show that the deep learning algorithm is promising, as it performs better than the other algorithms with a higher accuracy rate and a lower false positive rate.
Javaid et al. [5] proposed a network intrusion detection system based on a deep learning approach. They used the self-taught learning (STL) technique on the NSL-KDD benchmark data set and compared the performance of their approach with soft-max regression (SMR). Their results show that the proposed approach outperforms SMR with an accuracy rate of more than 98%.
The authors of [9] proposed an approach for network traffic identification using Artificial Neural Networks (ANN) and a Stacked AutoEncoder (SAE) based on deep learning, using a real data set of TCP data collected from an internal network. Their results show that the proposed approach can classify any flow data to a predefined protocol with accuracy sufficient for real applications.
Yin et al. [10] compared the performance of their IDS, which is based on a recurrent neural network (a deep learning approach), with a number of traditional machine learning techniques. Results from their experiments on the NSL-KDD benchmark data set show that the proposed system outperforms traditional machine learning methods in both binary and multiclass classification with high accuracy.
The works above studied the effect of deep learning on the performance of IDS. However, to date, only a few studies in the literature have addressed the integration of deep learning approaches and Big Data for improving the performance of IDSs. Faker and Dogdu [11] integrated Big Data and a deep learning approach to enhance the performance of intrusion detection systems, using three classifiers to classify attacks in both binary and multi-class classification: Deep Feed-Forward Neural Network (DNN), Random Forest, and Gradient Boosting Tree (GBT), on the UNSW-NB15 and CICIDS2017 data sets. On UNSW-NB15, DNN gives high accuracy in both binary and multi-class classification, 99.19% and 97.04%, respectively, with low prediction times. However, on CICIDS2017, GBT achieved the best accuracy, 99.99%, in binary classification. The researchers in [12] suggested the implementation of a Deep Neural Network (DNN) model for IDS to detect and classify unforeseen and unpredictable cyberattacks. They provide a comprehensive evaluation of experiments with DNN and other traditional machine learning models using various benchmark IDS data sets such as KDDCup99, NSL-KDD, UNSW-NB15, Kyoto, WSN-DS, and CICIDS2017. Their proposed model outperformed the other classical machine learning classifiers. A recent work [13] addressed the detection of intrusions through the use of deep learning in a big data environment. The authors proposed a hybrid deep learning model based on a convolutional neural network (CNN) and a weight-dropped long short-term memory network (WDLSTM). The CNN is used to extract features from IDS big data, and the WDLSTM network learns dependencies among the extracted features to solve the overfitting problem. Their experimental results show good performance with 97.1% accuracy.

III. PROPOSED METHOD
A. Enhanced Binary Genetic Algorithm
One of the most popular evolutionary algorithms that mimic natural selection is the Genetic Algorithm (GA) [14]. GA is a population-based algorithm in which the best solution is obtained after a predefined number of iterations. In brief, GA starts by generating a set of solutions called a population. All these solutions are evaluated using a fitness function. A set of genetic operations (i.e., selection, crossover, and mutation) is applied to the population at each iteration. This process is repeated until a stop condition is met, and the best solution is then returned [15]. Fig. 2 shows the standard GA algorithm. To enhance the performance of GA, we propose a novel injection method based on the distribution of solutions in the search space. At each iteration, we examine the solution distribution using an intelligent k-means clustering algorithm. If 80% of the solutions are located in one cluster, we inject new solutions into the population to redistribute the solutions over the search space and prevent premature convergence. This enhancement improves the exploration process of GA. Fig. 3 shows the flow chart of the enhanced GA.
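The injection check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: plain k-means with a fixed `k` stands in for the intelligent k-means variant, and the choices to replace the worst solutions and to inject half of the population (`inject_frac`) are hypothetical, since the text only states that new solutions are added once 80% of the population falls in one cluster.

```python
import random

def kmeans_labels(population, k, iters=10, rng=random):
    """Plain k-means on bit vectors; returns one cluster index per solution."""
    centroids = [list(c) for c in rng.sample(population, k)]
    labels = [0] * len(population)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(sol, centroids[j])))
            for sol in population
        ]
        # Update step: mean of the members (an empty cluster keeps its centroid).
        for j in range(k):
            members = [s for s, lab in zip(population, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def largest_cluster_share(labels):
    """Fraction of the population that sits in the most crowded cluster."""
    return max(labels.count(j) for j in set(labels)) / len(labels)

def inject(population, fitness, n_new, rng=random):
    """Assumption: the n_new worst solutions are replaced by random bit vectors."""
    n_bits = len(population[0])
    keep = len(population) - n_new
    survivors = sorted(population, key=fitness, reverse=True)[:keep]
    fresh = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_new)]
    return survivors + fresh

def maybe_inject(population, fitness, k=3, threshold=0.8, inject_frac=0.5, rng=random):
    """Return (population, injected_flag) after the 80% crowding check."""
    labels = kmeans_labels(population, k, rng=rng)
    if largest_cluster_share(labels) >= threshold:
        n_new = int(inject_frac * len(population))
        return inject(population, fitness, n_new, rng), True
    return population, False
```

In a full GA loop, `maybe_inject` would be called once per iteration, after selection, crossover, and mutation, with `fitness` being the wrapper score of each feature mask.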

B. Long Short-Term Memory (LSTM) Networks
A deep learning method (i.e., CNN-LSTM) is employed to detect intrusions. Fig. 4 shows the main structure of CNN-LSTM. In brief, LSTM uses an internal memory to memorize the temporal sequence of the input feature vectors.
LSTM maps the input i (i.e., the features) to the output o (i.e., an abnormal or normal packet), while the forget gate f controls which stored features are kept in memory. The hidden state h and the cell state c carry information from one time step to the next. The calculations of the fully connected layer and the softmax process are shown in Eq. (4) and Eq. (5), respectively. In this work, we employed the softmax to classify the input user's role. The output of the fully connected layer is mapped by the softmax layer to the range [0,1]. Nc refers to the number of rules, and L represents the activity class probability.
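For reference, a widely used formulation of the LSTM cell, followed by a fully connected layer and a softmax output, is sketched below. The weight matrices W and U, the biases b, the candidate state \tilde{c}_t, and the final time step T are notational assumptions and may not match the symbols used in Eqs. (1) to (5); N_c and L are used as in the text.

```latex
\begin{align*}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)} \\
z &= W_{fc}\, h_T + b_{fc} && \text{(fully connected layer)} \\
L_j &= \frac{\exp(z_j)}{\sum_{k=1}^{N_c} \exp(z_k)}, \quad j = 1, \dots, N_c && \text{(softmax)}
\end{align*}
```

Because every L_j is a ratio of a positive term to a sum that contains it, each value lies in [0,1] and the N_c values sum to one.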

C. EBGA-LSTM
The proposed hybrid approach combines EBGA with LSTM. Here, EBGA works as a wrapper FS method that removes redundant and irrelevant features from the original dataset, while LSTM works as a binary classifier that separates normal from abnormal network traffic.
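The wrapper idea can be illustrated with a short sketch: each GA solution is a binary mask over the features, and its fitness is the score of a classifier trained only on the selected columns. The function names below are hypothetical, and a nearest-centroid classifier is used purely as a lightweight stand-in for the LSTM so that the example stays self-contained.

```python
def select_columns(rows, mask):
    """Keep only the features whose mask bit is 1."""
    return [[v for v, keep in zip(row, mask) if keep] for row in rows]

def nearest_centroid_accuracy(train_X, train_y, test_X, test_y):
    """Stand-in classifier (NOT an LSTM): accuracy of nearest-centroid prediction."""
    centroids = {}
    for label in set(train_y):
        members = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def predict(x):
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    hits = sum(predict(x) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)

def wrapper_fitness(mask, train_X, train_y, val_X, val_y,
                    score=nearest_centroid_accuracy):
    """Fitness of one GA solution: validation score on the selected features."""
    if not any(mask):
        return 0.0  # an empty feature subset cannot be evaluated
    return score(select_columns(train_X, mask), train_y,
                 select_columns(val_X, mask), val_y)
```

Swapping `score` for a function that trains and evaluates an LSTM would give the hybrid described above; the GA itself only ever sees the returned number.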

IV. DATASET
This paper evaluates the proposed hybrid approach on a public intrusion data set called UNSW-NB15. Moustafa et al. [16] generated the data set using a tool called IXIA PerfectStorm. The data set has 49 features, of which only 44 are used in this work. Table I lists the 44 features of the data set. The data set contains 9 different types of attacks, as shown in Table II. The UNSW-NB15 data set is imbalanced. In this work, the adaptive synthetic sampling method (ADASYN) is employed to solve the class imbalance issue [17]. Table III shows the original and balanced data sets. In this work, the data set is treated as a binary classification problem that separates normal from abnormal traffic.
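The balancing step can be sketched as follows. This is a simplified ADASYN-style illustration for two classes, written in plain Python under stated assumptions: the function names, the default `k=5`, and the balance level `beta=1.0` are hypothetical and are not taken from the paper.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(idx, points, k, candidates):
    """Indices of the k candidates closest to points[idx], excluding idx itself."""
    others = [j for j in candidates if j != idx]
    return sorted(others, key=lambda j: sq_dist(points[idx], points[j]))[:k]

def adasyn(X, y, minority=1, k=5, beta=1.0, rng=random):
    """Return synthetic minority samples, more of them near hard-to-learn points."""
    minority_idx = [i for i, lab in enumerate(y) if lab == minority]
    n_min = len(minority_idx)
    n_maj = len(y) - n_min
    total_new = int((n_maj - n_min) * beta)  # number of samples needed to balance
    if n_min < 2 or total_new <= 0:
        return []
    # r_i: share of majority samples among the nearest neighbours of each
    # minority point, searched over the whole data set.
    ratios = []
    for i in minority_idx:
        nn = nearest(i, X, k, range(len(y)))
        ratios.append(sum(y[j] != minority for j in nn) / len(nn))
    norm = sum(ratios)
    if norm == 0:
        return []  # no minority point has majority neighbours
    synthetic = []
    for i, r in zip(minority_idx, ratios):
        g_i = round(r / norm * total_new)  # samples to create around x_i
        neighbours = nearest(i, X, k, minority_idx)
        for _ in range(g_i):
            z = rng.choice(neighbours)
            lam = rng.random()
            # Interpolate between x_i and one of its minority neighbours.
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[z])])
    return synthetic
```

Because each per-point count is rounded, the number of generated samples can differ slightly from the exact class gap. The synthetic rows would be appended to the training data with the minority label before feature selection and training.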

V. RESULTS AND ANALYSIS
This section reports the validation of the proposed hybrid method (i.e., EBGA with LSTM) for detecting intrusions in cloud computing systems. All experiments were conducted using 10-fold cross-validation. We implemented the proposed approach using MATLAB 2019b. We evaluated the proposed method using the following criteria: Accuracy (see Eq. (6)), Specificity (see Eq. (7)), Precision (see Eq. (9)), Recall (see Eq. (10)), and F-Measure (see Eq. (11)).
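The 10-fold protocol mentioned above can be sketched as an index generator. This is a generic illustration in Python rather than the MATLAB code used in the experiments; the function name, the shuffling, and the fixed seed are assumptions, and stratification by class is not included.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)        # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]   # k nearly equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

Each sample appears in exactly one test fold, so averaging a metric over the k folds uses every sample for testing once.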
$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (7)$$
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (9)$$
To analyze the proposed approach thoroughly, we ran the proposed hybrid approach (i.e., EBGA with LSTM) under three settings: the balanced data set with FS (i.e., EBGA), the balanced data set without FS, and the original data set without FS. Table IV reports the results obtained for the three experiments. Feature selection clearly improves the overall performance of LSTM compared with the experiments without feature selection. For example, the results on the testing data set show an improvement of 6% for the proposed method over the balanced data set without FS. Fig. 6 shows the performance of LSTM during training. The classification error (i.e., RMSE) converges smoothly for the balanced data with feature selection (i.e., the blue line). Fig. 7 shows the loss convergence for the three experiments. Employing the FS method clearly helps LSTM converge faster.
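The evaluation criteria reduce to ratios of confusion-matrix counts, which the sketch below computes for a binary problem. Specificity and Precision follow the formulas given in this section; Accuracy, Recall, and F-Measure use their standard definitions, which are assumed to match Eqs. (6), (10), and (11). The function names and the choice of 1 as the positive (attack) label are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn

def safe_div(num, den):
    """Avoid division by zero when a class or prediction is absent."""
    return num / den if den else 0.0

def evaluate(y_true, y_pred, positive=1):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "specificity": safe_div(tn, tn + fp),
        "precision": precision,
        "recall": recall,
        "f_measure": safe_div(2 * precision * recall, precision + recall),
    }
```

Under cross-validation, `evaluate` would be called on the predictions of each test fold and the resulting values averaged.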

VI. CONCLUSION AND FUTURE WORKS
This paper proposed a hybrid method that combines EBGA and LSTM to distinguish normal from abnormal network traffic. EBGA works as a wrapper feature selection method, while LSTM works as a binary classifier. The proposed method is employed as an IDS for cloud computing systems. We examined the proposed approach on a real public data set called UNSW-NB15. The original data set is imbalanced, and we handled the imbalance using the ADASYN method. The obtained results show the importance of the feature selection method and its ability to enhance the classification accuracy. In future work, different feature selection methods such as Harris Hawks Optimization (HHO), Gray Wolf Optimization (GWO), and the Whale Optimization Algorithm (WOA) will be applied to reduce the search space and determine the most important features for IDSs.