Healthcare Intrusion Detection using Hybrid Correlation-based Feature Selection-Bat Optimization Algorithm with Convolutional Neural Network
A Hybrid Correlation-based Feature Selection for Intrusion Detection Systems

Abstract: Cloud computing is popular among users in areas such as healthcare, banking, and education because it offers low-cost services alongside increased reliability and efficiency. However, security is a significant problem in cloud-based systems because cloud services are accessed via the Internet by a variety of users. Patient health information therefore needs to be kept confidential, secure, and accurate, since any change in actual patient data can result in errors during diagnosis and treatment. In this research, the hybrid Correlation-based Feature Selection-Bat Optimization Algorithm (HCFS-BOA) based on the Convolutional Neural Network (CNN) model is proposed for intrusion detection to secure the entire network in the healthcare system. Initially, the data is obtained from the CIC-IDS2017 and NSL-KDD datasets, after which min-max normalization is performed on the acquired data. HCFS-BOA is employed in feature selection to identify features that not only have significant correlations with the target variable, but also contribute to the optimal performance of intrusion detection in the healthcare system. Finally, CNN classification is performed to detect and classify intrusions accurately and effectively in the healthcare system. The existing methods, namely SafetyMed, Hybrid Intrusion Detection System (HIDS), and the Blockchain-orchestrated Deep learning method for Secure Data Transmission in IoT-enabled healthcare systems (BDSDT), are used as baselines to evaluate the efficacy of the HCFS-BOA-based CNN. The proposed HCFS-BOA-based CNN achieves an accuracy of 99.45%, which is higher than that of SafetyMed, HIDS, and BDSDT.


I. INTRODUCTION
Network Intrusion Detection Systems (NIDSs) identify malicious activities and safeguard vulnerable services by monitoring network traffic and raising alerts when anomalous events are recognized. Cyber-attackers, who are primarily focused on obtaining private user data, attack organizations, which establishes the need for modern-day detection and protection. Furthermore, the healthcare sector keeps growing, and most hospitals are integrating e-healthcare systems as quickly as feasible to fulfill the needs of their patients. IDSs based on cloud networks employ anomaly-based techniques to protect cloud-based applications [1]. In network security, there are two common detection techniques for NIDS: anomaly-based detection and signature-based detection [2]. An anomaly-based IDS analyzes network traffic and compares it against a created baseline to detect known or unknown attacks, whereas a signature-based IDS can be employed when the attack patterns are established and predetermined [3,4]. To address numerous security issues, the cloud utilizes cybersecurity techniques such as IDS, Intrusion Prevention Systems (IPS), and firewalls [5]. The centralized processing technique used by cloud computing involves uploading every transaction and processing end-user service requests based on the transmission bandwidth, storage capacity, and computing resources [6]. Proactive network security defenses are required to protect essential assets and data because the cloud attack vector has the potential to result in successful security breaches [7].
Network security has always placed a high priority on intrusion detection since it is crucial for identifying anomalous activity on secured internal networks [8]. Distributed Denial-of-Service (DDoS) attacks can be identified at the source, intermediate, and endpoint networks. The attack's endpoint is easily detected because of the massive volume of network traffic that is generated [9]. A significant number of traditional intrusion detection systems use either a port-based or a Deep Packet Inspection (DPI) technique. The port-based technique identifies traffic by using the ports established by the Internet Assigned Numbers Authority (IANA) [10]. Software Defined Network (SDN) is an emerging design that is cost-effective, flexible, adaptable, and controllable, thereby making it more suitable for the complicated, bandwidth-intensive applications in use today [11,12]. SDN's goal is to create a logically centralized hub for internet and networking architects so that they can quickly respond to evolving client demands [13]. Deep learning techniques, especially CNNs, show a remarkable capacity for automatically extracting features and intricate patterns from complex data, including network traffic [14]. By employing a deep CNN, the IDS efficiently recognizes anomalous behavior and emerging threats in real time [15]. Since cloud services are accessed via the internet by a variety of users, security is a significant concern in cloud-based systems because the health information of patients must be kept confidential, secure, and accurate. Moreover, any change in actual patient data results in errors during diagnosis and treatment [31][32][33][34]. Therefore, the HCFS-BOA based on CNN is proposed in this research for intrusion detection to secure the entire network in the healthcare system. The main contributions of this research are as follows:
• The proposed HCFS-BOA approach is evaluated on the CIC-IDS2017 and NSL-KDD benchmark datasets, and the min-max normalization technique is employed to normalize the raw data.
• For feature selection, HCFS-BOA is employed to identify features that not only have significant correlations with the target variable, but also contribute to the optimal performance of intrusion detection in the healthcare system.
• Finally, CNN is employed for classification to detect and classify intrusions accurately and effectively. The efficacy of HCFS-BOA is analyzed based on the performance measures of accuracy, precision, recall, and f1-score.
The rest of the paper is organized as follows: Section II presents the literature survey. The block diagram of the proposed method is discussed in Section III. The results are illustrated in Section IV, while Section V presents the conclusion of this paper.

II. LITERATURE SURVEY
Faruqui et al. [16] presented SafetyMed, an IDS for the Internet of Medical Things (IoMT) that employs a hybrid CNN-Long Short-Term Memory (CNN-LSTM) model. SafetyMed was the first IDS that included an optimization approach based on the trade-off between Detection Rate (DR) and False Positive Rate (FPR). SafetyMed enhanced the safety and security of medical devices and patient information. However, the presented SafetyMed method had no defense mechanism against Adversarial Machine Learning (AML) attacks.
Vashishtha et al. [17] implemented a HIDM for cloud-based healthcare systems to detect all kinds of attacks. The hybrid approach was a mixture of a Signature-based Detection Model (SDM) and an Anomaly-based Detection Model (ADM). The NSL-KDD, CICIDS2017, and UNSW-NB15 datasets were employed to evaluate the efficacy of the HIDM approach. The implemented method was evaluated in terms of detection rate and Type-I and Type-II errors for both ADM and SDM, and achieved a higher detection rate. However, combining various detection systems increased the risk of false negatives and false positives.
Kumar et al. [18] introduced BDSDT for secure data transmission in IoT-based healthcare systems. Initially, a blockchain architecture was created in which all IoT devices were identified and established using a zero-knowledge proof and then connected to the blockchain network using a smart contract-based ePOW consensus. Then, a deep learning model based on bidirectional LSTM was employed for intrusion detection in the healthcare system. BDSDT enhanced privacy and security by combining DL and blockchain methods. However, BDSDT was not effective against web and Bot threat attacks because there were fewer instances of these two classes, which can lead to changes in actual patient data and, in turn, errors during diagnosis and treatment.
Halbouni et al. [19] presented a CNN-Long Short-Term Memory (CNN-LSTM) model for IDS. The ability of CNN to extract spatial features alongside the ability of LSTM to extract temporal features were the highlights of this model. In order to improve performance, batch normalization and dropout layers were added to the presented method. The presented method decreased the false alarm rate and improved the detection rate. However, CNN-LSTM failed to provide a high detection rate for specific kinds of attacks such as web attacks and worms, which can lead to changes in actual patient data and, in turn, errors during diagnosis and treatment.
Han et al. [20] presented an Intrusion Detection Hyperparameter Control System (IDHCS) that uses Proximal Policy Optimization (PPO) to regulate and train a Deep Neural Network (DNN) feature extractor and a k-means clustering module. The most valuable network features were extracted by the DNN under the control of the IDHCS, which also used k-means clustering to detect intrusions. The IDHCS performed effectively for each dataset, as well as for the combined dataset. However, to represent a more realistic network environment, a more diverse dataset needed to be examined.
Bakro et al. [21] introduced a hybrid feature selection approach that combined Particle Swarm Optimization (PSO) with the filter techniques Chi-Square (CS) and Information Gain (IG). Combining these three techniques was a novel method that generated a more reliable feature selection process by using each technique's strength to increase the chance of selecting the most relevant features. The introduced method had the benefits of flexibility, time complexity, interpretability, and scalability. However, the feature selection was not performed properly, which resulted in overfitting.
Sudar et al. [22] implemented a Machine Learning (ML) approach based on Decision Tree (DT) and Support Vector Machine (SVM) to detect Distributed Denial of Service (DDoS) attacks. The classification approach was established in a Software Defined Network (SDN) environment. The DT and SVM approaches were deployed to distinguish between malicious and normal traffic data. This approach provided better accuracy and detection rate. Nonetheless, the implemented approach struggled to adapt to evolving attack strategies.

Praveena et al. [23] developed a Deep Reinforcement Learning approach optimized by Black Widow Optimization (DRL-BWO) for intrusion detection in Unmanned Aerial Vehicles (UAV). The BWO approach was deployed for parameter optimization of the DRL method, which assisted in enhancing the performance of intrusion detection in UAV networks. This approach was suited to information extraction tasks in high-dimensional spaces. Nonetheless, the intricate nature of the DRL-BWO approach reduced its interpretability.
Chinnasamy et al. [24] presented a blockchain-based approach for DDoS flooding attacks with dynamic path detectors. An ML approach was established to identify attacks, with a focus on DDoS assaults. The most essential traits were selected through a different attribute selection approach and used to predict DDoS attacks accurately. Nevertheless, the presented approach led to severe network congestion, which hindered the processing of transactions and slowed down the overall system's performance.
Chinnasamy et al. [25] developed an ML approach for effective phishing attack detection. Based on input features such as the Uniform Resource Locator (URL) and web traffic, a link was classified as phishing or non-phishing. The approach was evaluated on datasets retrieved from ML and phishing-case sources using SVM, Random Forest (RF), and a genetic algorithm. Nevertheless, ML approaches to phishing detection struggled to keep pace with constantly evolving phishing tactics, which led to potential delays in identifying new attacks.
Anupriya et al. [26] implemented an ML approach for fraud account detection. To compute buddy similarity criteria, the adjacency matrix of the network graph was employed, and new features were then acquired using Principal Component Analysis (PCA). This was employed to equalize the data, which was passed to the classifier in the next phase, where cross-validation was used for training and testing. Nevertheless, due to imbalanced datasets, this approach struggled with evolving fraud patterns and generated false positives or negatives.
The existing methods mentioned above have some limitations, such as being ineffective in detecting certain attacks, which can lead to changes in actual patient data and, in turn, errors during diagnosis and treatment. In order to overcome these issues, the HCFS-BOA-based CNN is proposed for intrusion detection to secure the entire network in the healthcare system.

III. PROPOSED METHODOLOGY
In this research, a hybrid CFS-BOA-based CNN approach is proposed for intrusion detection in healthcare systems using deep learning. It includes datasets, min-max normalization, feature selection using HCFS-BOA, classification using CNN, and performance evaluation. The overview of the proposed method is represented in Fig. 1.

A. Datasets
The proposed HCFS-BOA approach is evaluated on the CIC-IDS2017 [27] and NSL-KDD benchmark datasets. The CIC-IDS2017 dataset includes malicious and normal traffic data that is considered recent and does not include an enormous amount of redundant data. It includes eleven new attacks, namely PortScan, brute force (SSH-Patator, FTP-Patator), DoS, and web attacks such as SQL injection and XSS. It was created by the Canadian Institute for Cybersecurity in 2017, and its 80 features are employed to monitor malicious and benign traffic. The NSL-KDD dataset is an extension of the KDD Cup 99 database and contains 41-dimensional vectors with numerical and categorical values. The intrusion attacks in the NSL-KDD database are probe attacks, Remote to Local (R2L), Denial of Service (DoS), and User to Root (U2R) attacks. In this work, NSL-KDD is used as an IoT dataset for model training purposes in healthcare applications.

B. Pre-processing
After data collection, normalization is applied to rescale the attributes so that they contribute uniformly. Typically, data normalization addresses two key problems: the presence of outliers and the presence of dominant features. The various methods for normalizing data based on statistical measures are examined. Consider the data $D$ with $n$ labelled records, as expressed numerically in Eq. (1).

$$D = \{(x_i, y_i)\}_{i=1}^{n} \tag{1}$$

where $y_i$ indicates the class label and $x_i$ represents the data to be learned via a learning process. Among the various normalization techniques, the min-max normalization technique [28] is employed to normalize the raw data. This approach greatly minimizes the outlier's impact on the data. It scales the obtained data within the range of 0 to 1, which is numerically expressed in Eq. (2).

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{2}$$

where $x_{\max}$ and $x_{\min}$ represent the attribute's maximum and minimum values. By employing $x_{\max}$ and $x_{\min}$, the acquired data are rescaled between the upper and lower boundaries. This rescaled data is then passed as input to the feature selection stage.
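The min-max rescaling of Eq. (2) can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the function name `min_max_normalize`, the sample values, and the handling of constant attributes are assumptions introduced here.

```python
def min_max_normalize(column):
    """Rescale one numeric attribute to [0, 1] following Eq. (2)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant attribute has no spread, so map it to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical values of a single traffic attribute (not taken from the datasets).
packet_counts = [2, 10, 6, 18]
scaled = min_max_normalize(packet_counts)
print(scaled)  # [0.0, 0.5, 0.25, 1.0]
```

Each attribute would be normalized independently, using the minimum and maximum observed for that attribute.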

C. Feature Selection
After normalizing the acquired data, the hybrid CFS-BOA approach is implemented for feature selection. In CFS-BOA, the features are selected by using a nature-inspired optimization technique to enhance the optimization process. The goal of CFS-BOA is to choose the most useful feature subset for detecting and avoiding security vulnerabilities while minimizing redundancy and computational complexity. When compared to other optimization algorithms such as Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO), the BOA tunes the optimization process more efficiently when combined with CFS. The HCFS-BOA identifies features that not only have significant correlations with the target variable, but also contribute to the optimal performance of intrusion detection in the healthcare system. This hybrid method has the potential to result in a more efficient and effective IDS that is specific to the unique characteristics of healthcare data and security requirements.
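The CFS evaluation of a feature subset is expressed as in Eq. (3).

$$M_S = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}} \tag{3}$$

where $M_S$ is the heuristic evaluation of a feature subset $S$ that includes $k$ features, $\overline{r_{cf}}$ is the average degree of correlation between the category label and the features, and $\overline{r_{ff}}$ is the average degree of inter-correlation between features. A correlation technique based on the feature subsets is used for the CFS evaluation. During the procedure, the feature subset with the greatest value is selected in order to decrease the size of the training and testing sets. Among the subsets obtained by the approach, a larger $\overline{r_{cf}}$ or a smaller $\overline{r_{ff}}$ provides a greater evaluation value.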
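To make the behaviour of the CFS evaluation concrete, the sketch below computes the standard CFS merit $k\,\overline{r_{cf}} / \sqrt{k + k(k-1)\,\overline{r_{ff}}}$ for a small subset. It is an illustration under stated assumptions: absolute Pearson correlation is used as the correlation measure (other measures, such as symmetrical uncertainty, are common in CFS), and the names `pearson`, `cfs_merit`, and the toy data are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def cfs_merit(features, labels):
    """Merit of a feature subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(features)
    # Average feature-class correlation.
    r_cf = sum(abs(pearson(f, labels)) for f in features) / k
    # Average feature-feature correlation over all distinct pairs.
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    if pairs:
        r_ff = sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs)
    else:
        r_ff = 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Toy data: one informative feature and an exact duplicate of it.
labels = [0, 0, 1, 1]
f1 = [0.1, 0.2, 0.8, 0.9]
merit_single = cfs_merit([f1], labels)
merit_with_duplicate = cfs_merit([f1, f1], labels)
print(merit_single, merit_with_duplicate)
```

Adding an exact duplicate of a feature leaves the merit unchanged (up to floating-point rounding), which illustrates why the evaluation favours subsets that are correlated with the class but not with each other.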
2) Bat Optimization Algorithm (BOA): BOA is the first optimization and computational intelligence algorithm inspired by the echolocation behavior of microbats. In a $d$-dimensional search space, every bat $i$ flies at random with velocity $v_i^t$, location $x_i^t$, and frequency $f_i$ at iteration $t$. The current best solution $x_*$ among the bats in the population is archived through an iterative search process.
The procedures for updating the location and velocity at each time step are mathematically presented in Eq. (4), Eq. (5) and Eq. (6).

$$f_i = f_{\min} + (f_{\max} - f_{\min})\,\beta \tag{4}$$

$$v_i^{t} = v_i^{t-1} + \left(x_i^{t-1} - x_*\right) f_i \tag{5}$$

$$x_i^{t} = x_i^{t-1} + v_i^{t} \tag{6}$$

where $\beta \in [0, 1]$ is a vector selected at random from a uniform distribution.
Once a solution is chosen from the existing best solutions, a new solution for every bat is produced via a local random walk, which is numerically expressed in Eq. (7).

$$x_{new} = x_{old} + \varepsilon A^{t} \tag{7}$$

where $\varepsilon$ is a random vector generated from a uniform or Gaussian distribution in the range $[-1, 1]$, and $A^{t}$ is the average loudness of all bats at time step $t$.
Furthermore, the rate of pulse emission $r_i$ and the loudness $A_i$ are modified as the iterations progress. They are updated using Eq. (8) and Eq. (9).

$$A_i^{t+1} = \alpha A_i^{t} \tag{8}$$

$$r_i^{t+1} = r_i^{0}\left[1 - \exp(-\gamma t)\right] \tag{9}$$

where $\alpha$ and $\gamma$ are constants.
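The update rules of Eq. (4) to Eq. (9) can be sketched as a small continuous optimizer. This follows the commonly published form of the bat algorithm rather than the authors' code; the test objective `sphere`, the search bounds, the acceptance rule, and all parameter values (`f_min`, `f_max`, `alpha`, `gamma`, population size) are assumptions chosen only for illustration.

```python
import math
import random


def sphere(x):
    """Toy objective to minimize (assumption, not from the paper)."""
    return sum(v * v for v in x)


def bat_algorithm(objective, dim=2, n_bats=10, n_iter=50,
                  f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_bats)]
    vel = [[0.0] * dim for _ in range(n_bats)]
    loud = [1.0] * n_bats                        # loudness A_i
    r0 = [rng.random() for _ in range(n_bats)]   # limiting pulse rate r_i^0
    rate = [0.0] * n_bats                        # pulse rate r_i (Eq. 9 at t = 0)
    fit = [objective(p) for p in pos]
    best_i = min(range(n_bats), key=lambda i: fit[i])
    best, best_fit = pos[best_i][:], fit[best_i]
    history = [best_fit]
    for t in range(1, n_iter + 1):
        mean_loud = sum(loud) / n_bats
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()          # Eq. (4)
            vel[i] = [v + (x - b) * freq
                      for v, x, b in zip(vel[i], pos[i], best)]    # Eq. (5)
            cand = [x + v for x, v in zip(pos[i], vel[i])]         # Eq. (6)
            if rng.random() > rate[i]:
                # Local random walk around the current best solution.
                cand = [b + rng.uniform(-1, 1) * mean_loud for b in best]  # Eq. (7)
            cand_fit = objective(cand)
            # Accept an improving candidate with probability tied to loudness.
            if cand_fit <= fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, cand_fit
                loud[i] = alpha * loud[i]                          # Eq. (8)
                rate[i] = r0[i] * (1 - math.exp(-gamma * t))       # Eq. (9)
            if cand_fit < best_fit:
                best, best_fit = cand[:], cand_fit
        history.append(best_fit)
    return best, best_fit, history


best, best_fit, history = bat_algorithm(sphere)
print(best_fit)
```

Because the best solution is archived separately from the individual bats, the recorded best fitness never gets worse from one iteration to the next.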
3) HCFS-BOA method for feature selection: The significance and correlation of the chosen feature subset are evaluated using the HCFS-BOA-based feature selection method. The correlation-based feature method is used in the HCFS-BOA to create a fitness function and assess the reliability of the reduced feature subset. CFS evaluates the mean feature-class correlation and the average inter-correlation between features for a feature subset using Eq. (3). CFS is a classical filter method that selects relevant features through a correlation-based evaluation that accounts for feature redundancy. BOA, which is inspired by the echolocation activity of microbats, stores candidate solutions in each bat's vector, which eliminates redundant features and reduces dimensionality. When a bat moves, it archives the best solution found so far. During the iterative search, the population scans for the optimum arrangement by updating the position of each bat based on Eq. (4), Eq. (5), and Eq. (6). An ideal intrusion detection approach has a higher detection rate and a lower false positive rate. Hence, a weighted fitness function is defined in Eq. (10), in which two weights are applied to the Detection Rate (DR) and the False Positive Rate (FPR), respectively. A higher fitness means higher intrusion detection performance. In one iteration of the HCFS-BOA, the algorithm chooses a feature subset depending on its correlation coefficients with the target variable. The bat optimization process involves updating the positions of the virtual bats in the search space, with each bat representing a potential feature subset. The technique iteratively refines the feature selection by adjusting the positions of the bats and evaluates their performance via correlation-based metrics during both the training and testing phases. Thus, the rescaled data is passed through the feature selection phase, whose output is sufficient for the classification stage of intrusion detection.
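The following sketch illustrates two pieces a bat-based feature selector of this kind typically needs: decoding a bat's real-valued position into a feature mask, and scoring a candidate subset with a weighted combination of detection rate and false positive rate. Both parts are assumptions made for illustration. The sigmoid transfer with a 0.5 threshold is one common way to binarize positions, and the form `w_dr * DR + w_fpr * (1 - FPR)` is only one plausible weighted fitness; the exact form of Eq. (10) is not given in the text, so it should not be read as the authors' definition.

```python
import math


def position_to_mask(position, threshold=0.5):
    """Decode a real-valued bat position into a 0/1 feature mask.

    Assumption: a sigmoid transfer function followed by thresholding.
    """
    return [1 if 1.0 / (1.0 + math.exp(-p)) > threshold else 0 for p in position]


def weighted_fitness(tp, fn, fp, tn, w_dr=0.5, w_fpr=0.5):
    """Hypothetical weighted fitness: rewards high DR and low FPR."""
    dr = tp / (tp + fn) if (tp + fn) else 0.0     # detection rate
    fpr = fp / (fp + tn) if (fp + tn) else 0.0    # false positive rate
    return w_dr * dr + w_fpr * (1.0 - fpr)


# One bat in a four-feature search space (hypothetical values).
mask = position_to_mask([2.0, -1.5, 0.3, -0.2])
print(mask)  # [1, 0, 1, 0]

# Confusion-matrix counts that a classifier trained on the masked features
# might produce (hypothetical values).
score = weighted_fitness(tp=90, fn=10, fp=5, tn=95)
print(score)
```

In a full loop, each bat's mask would select columns of the normalized data, a classifier would be evaluated on those columns, and the resulting fitness would drive the position updates.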

D. Classification
The selected features are classified using the CNN model, which produces strong results in domains such as Natural Language Processing (NLP), image processing, and healthcare diagnosis systems. CNN classification is employed to recognize patterns and anomalies in network traffic or system logs and thereby improve intrusion detection in healthcare systems. Using CNN classification for IDS in healthcare helps to protect sensitive patient data, ensure the integrity of healthcare information systems, and avoid security breaches. It is an essential component of healthcare cybersecurity measures that protect electronic health records and vital healthcare infrastructure.
In contrast to the Multi-Layer Perceptron (MLP), CNN reduces the number of neurons and parameters, resulting in rapid adaptability and minimal complexity. The CNN model has an extensive number of clinical classification applications. CNN models are a subset of Feed-Forward Neural Networks (FFNN) [29,30] and of deep learning models. The convolution operation is shift-invariant, which implies that the same filter is applied independently of position, thereby reducing the number of parameters. Convolution, pooling, and fully connected layers are the three types of layers used in the CNN method. These layers perform feature extraction, dimensionality reduction, and classification, respectively. During the forward pass of the convolution operation, the filter is slid over the input, and at each position the point-wise products of the filter and the input are summed to obtain the activation map. The sliding filter is a linear operator, and the convolution can be stated as a sequence of dot products. Consider that $w$ is the kernel function and $x$ is the input; the convolution at time $t$ is formulated as in Eq. (11).

$$s(t) = (x * w)(t) = \int x(a)\, w(t - a)\, da \tag{11}$$

where $a$ is the integration variable for each $t$. When the parameter $t$ is discrete, the convolution is presented as in Eq. (12).

$$s(t) = \sum_{a} x(a)\, w(t - a) \tag{12}$$

When a 2D image $I$ is given as input and $K$ is a 2D kernel, the convolution is formulated as in Eq. (13).

$$S(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(m, n)\, K(i - m, j - n) \tag{13}$$

In order to introduce non-linearity, two activation functions, ReLU and softmax, are utilized. The ReLU is mathematically represented as in Eq. (14).

$$f(x) = \max(0, x) \tag{14}$$

The gradient is $1$ for $x > 0$ and $0$ for $x < 0$. The convergence ability of ReLU is better than that of the sigmoid non-linearities. The next activation is softmax, which is preferable when the result includes two or more classes and is mathematically formulated as in Eq. (15).

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}} \tag{15}$$

The pooling layers replace the output at each location with a summary statistic of the nearby inputs, so that the output is downscaled without losing the essential information. There are various types of pooling layers; this paper utilizes max pooling, which outputs the largest value in the rectangular neighborhood of each point in the 2D data for every input feature. The fully connected (FC) layer is the last layer, with $m$ outputs and $n$ inputs. The parameters of the output layer are stated as a weight matrix $W \in \mathbb{R}^{m \times n}$, where $m$ and $n$ are the numbers of rows and columns, and a bias vector $b \in \mathbb{R}^{m}$. Considering the input vector $x \in \mathbb{R}^{n}$, the fully connected layer output with an activation function $f$ is formulated as in Eq. (16).

$$y = f(Wx + b) \tag{16}$$

where $Wx$ is the matrix product and the function $f$ is applied component-wise. This fully connected layer is applied to classification problems. The FC layer of a CNN is commonly placed at the topmost level, where the CNN output is flattened and presented as a single vector.
Table I shows the notation description.
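The building blocks described in this section (discrete convolution, ReLU, max pooling, a fully connected layer, and softmax) can be traced on a tiny 1D example. The sketch below uses plain Python lists so that each operation is explicit; it is not the network used in the experiments, and the input values, kernel, weights, and bias are hypothetical.

```python
import math


def conv1d(x, w):
    """Discrete convolution s(t) = sum_a x(a) * w(t - a), valid positions only."""
    n, k = len(x), len(w)
    return [sum(x[a] * w[t - a] for a in range(t - k + 1, t + 1))
            for t in range(k - 1, n)]


def relu(v):
    """ReLU activation: max(0, x) applied component-wise."""
    return [max(0.0, a) for a in v]


def max_pool1d(v, size=2):
    """Non-overlapping max pooling over windows of the given size."""
    return [max(v[i:i + size]) for i in range(0, len(v) - size + 1, size)]


def fully_connected(W, x, b):
    """Affine map W x + b with W given as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def softmax(z):
    """Softmax with the maximum subtracted for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical normalized feature vector and parameters.
x = [0.1, 0.5, 0.9, 0.4, 0.2, 0.7]
kernel = [1.0, 0.0, -1.0]
features = max_pool1d(relu(conv1d(x, kernel)), size=2)
W = [[1.0, -1.0], [-1.0, 1.0]]   # two output classes (e.g. benign, attack)
b = [0.0, 0.0]
probs = softmax(fully_connected(W, features, b))
print(features, probs)
```

In practice a deep learning framework would provide these layers with trainable parameters; the point here is only to show how the data shrinks and is transformed from input vector to class probabilities.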

IV. EXPERIMENTAL RESULTS
In this research, the HCFS-BOA-based CNN is simulated in a Python environment on a system with 16 GB RAM, an Intel Core i7 processor, and the Windows 10 operating system. The measures of accuracy, precision, recall, and f1-score are utilized to estimate the performance of the model. Their mathematical representations are shown in Eq. (17) to Eq. (20), where $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positives, true negatives, false positives, and false negatives.
• Accuracy: Accuracy is the proportion of correct predictions among all input samples, and it is calculated using Eq. (17).

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{17}$$

• Precision: Precision measures the proportion of records predicted as positive that are actually positive. The performance of the classification model is greater if the precision is higher.

$$Precision = \frac{TP}{TP + FP} \tag{18}$$

• Recall: Recall is calculated as the ratio of the true positives to all samples of the positive class.

$$Recall = \frac{TP}{TP + FN} \tag{19}$$

• F1-Score: It is the harmonic mean of precision and recall, which seeks a balance between the two.

$$F1\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{20}$$
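As a worked illustration of the four performance measures, the sketch below computes accuracy, precision, recall, and f1-score from confusion-matrix counts using their standard definitions. The function name `classification_metrics` and the counts are hypothetical and are not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 90 attacks detected, 10 missed, 5 false alarms,
# 95 benign flows correctly passed.
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
print(metrics)
```

Because the f1-score is a harmonic mean, it always lies between the precision and the recall, which makes a quick sanity check on reported values.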

A. Quantitative and Qualitative Analysis
This section shows the quantitative and qualitative analysis of the proposed HCFS-BOA-based CNN model in terms of precision, accuracy, f1-score, and recall, as presented in Tables II, III and IV. Table II illustrates the performance of feature selection on the CIC-IDS2017 dataset. The performances of ACO, PSO, CFS, and BOA are measured and compared with the proposed HCFS-BOA. Fig. 2 represents a graphical illustration of the feature selection methods. The obtained result shows that the proposed HCFS-BOA algorithm attains an accuracy of 95.98%, precision of 94.23%, recall of 93.62%, and f1-score of 94.96%, which is better than the existing optimization algorithms.
Table III illustrates the performance of classification with default features using the CIC-IDS2017 dataset. The performances of Support Vector Machine (SVM), Artificial Neural Network (ANN), K-Nearest Neighbor (KNN), and Recurrent Neural Network (RNN) are measured and compared with the proposed HCFS-BOA. Fig. 3 represents the graphical illustration of classification performances. The obtained result shows that the proposed HCFS-BOA algorithm attains an accuracy of 93.68%, precision of 92.92%, recall of 91.69%, and f1-score of 92.73%, which is superior to the existing classifiers.
Table IV illustrates the classification outcomes with optimized features using the CIC-IDS2017 dataset. The performances of SVM, ANN, KNN, and RNN are measured and compared with the optimized-feature CNN. Fig. 4 illustrates the graphical representation of classification performances with optimized features. The obtained outcomes show that the CNN algorithm achieves an accuracy of 99.45%, precision of 98.89%, recall of 98.67%, and f1-score of 97.98%, which is superior to the existing classifiers. ACO, PSO, CFS, and BOA consume 25 seconds, 29 seconds, 31 seconds, and 35 seconds, respectively. HCFS-BOA with CNN requires a training time of 20 seconds, which is shorter than that of the other optimization techniques (ACO, PSO, CFS, and BOA) on the CIC-IDS2017 dataset.
Table V shows the performance of classification with optimized features on the NSL-KDD dataset. Fig. 5 shows that, with optimized features, the CNN algorithm achieves an accuracy of 98.13%, precision of 97.36%, recall of 97.07%, and f1-score of 95.34%, which is better than the previous algorithms. ACO, PSO, CFS, and BOA require 22 seconds, 25 seconds, 28 seconds, and 34 seconds, respectively. HCFS-BOA with CNN needs a training time of 15 seconds, which is less than that of the previous optimization techniques (ACO, PSO, CFS, and BOA) on the NSL-KDD dataset.

C. Validation of Real-Time Applications
The NSL-KDD dataset is commonly deployed for intrusion detection in IoT to ensure reliability and security for healthcare systems. This research uses the NSL-KDD dataset for training and validation purposes on real-time applications in the cloud. The NSL-KDD dataset is split into training, testing, and validation sets in the ratio of 70:15:15. The IDS is created to detect different types of attacks by evaluating system logs, network traffic, and behavioral patterns. Examples of such attacks are malware attacks, DoS attacks, and Cross-Site Scripting (XSS). In these attacks, patient information is blocked or stolen by attackers. Therefore, the NSL-KDD dataset is employed for model training purposes to reduce such attacks in real-time healthcare applications.
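The 70:15:15 partition mentioned above can be sketched as a shuffled split. This is an illustrative helper, not the authors' procedure: the function name `split_70_15_15`, the fixed seed, and the use of integer record identifiers in place of real NSL-KDD rows are assumptions, and no stratification by attack class is applied.

```python
import random


def split_70_15_15(records, seed=42):
    """Shuffle the records and split them into train, test, and validation sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 70 // 100   # integer arithmetic avoids float rounding
    n_test = n * 15 // 100
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]   # remainder goes to validation
    return train, test, validation


# Stand-in for dataset rows: 100 hypothetical record identifiers.
train, test, validation = split_70_15_15(range(100))
print(len(train), len(test), len(validation))  # 70 15 15
```

For imbalanced attack classes, a stratified split that preserves class proportions in each partition would usually be preferable.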

D. Discussion
The CIC-IDS2017 dataset is beneficial for intrusion detection because of its comprehensive representation of realistic network traffic scenarios with different types of attacks and normal activities. It provides a labelled, large-scale dataset that assists the evaluation and enhancement of intrusion detection with improved robustness and accuracy. The NSL-KDD dataset is beneficial for intrusion detection as it addresses limitations of the original KDD Cup dataset by minimizing redundancy and providing a more balanced distribution of classes. It offers a more realistic representation of network traffic that contains normal behavior and a wider range of attacks, which improves the robustness of intrusion detection. By using these two datasets, the generality of the proposed approach is analyzed. Moreover, the advantages of the proposed method and the limitations of existing methods are discussed. The existing methods have some limitations: the SafetyMed method [16] has no defense mechanism against AML attacks; combining various detection systems increases the risk of false negatives and false positives in HIDM [17]; and BDSDT [18] is not effective against web and Bot threats since there are fewer instances of these two attack classes. The proposed HCFS-BOA-based CNN model overcomes the limitations of these existing models.
To overcome the problem of AML attacks, CFS is used to identify highly informative features, which minimizes the risk of adversarial manipulation compared to other algorithms. BOA assists in identifying an optimal subset of features that maximizes detection accuracy and reduces the risk of false positives and false negatives. This is done by focusing on the informative features identified by CFS, which enhances the model's ability to discriminate between various attack classes such as web and Bot threats. In contrast to the other methods, combining CFS with BOA enables the selection of features that not only have significant correlations with the target variable but also contribute to the optimal performance of intrusion detection in the healthcare system. The CNN is deployed to detect and classify intrusions accurately and effectively. New attacks such as web and Bot threat attacks are classified effectively by using CNN. The proposed HCFS-BOA-based CNN achieves an accuracy of 99.45%, which is higher than that of the existing methods, namely SafetyMed, HIDS, and BDSDT.

V. CONCLUSION
In this research, the HCFS-BOA based on the CNN model is proposed for intrusion detection to secure the entire network in the healthcare system. The proposed method mainly comprises four stages: dataset acquisition, min-max normalization, feature selection, and classification. Initially, the data is obtained from the CIC-IDS2017 and NSL-KDD datasets, after which min-max normalization is performed on the acquired data. For feature selection, HCFS-BOA is employed to achieve optimal intrusion detection performance in healthcare systems. Finally, the CNN is deployed to detect and classify intrusions accurately and effectively. The proposed HCFS-BOA-based CNN achieves an accuracy of 99.45%, which is higher than that of existing methods such as SafetyMed, HIDS, and BDSDT. In the future, hyperparameter tuning can be applied in feature selection to improve the model's performance.

Fig. 1. Block diagram for the proposed method

1) Correlation-based Feature Selection (CFS): CFS is one of the best-known filter algorithms; it selects features based on the output of a heuristic (correlation-based) evaluation function. It seeks to choose subsets whose attributes are highly correlated with the class but uncorrelated with one another. Redundant features are identified by their high correlation with at least one other feature and are screened out, while features with a low association with the class are ignored. The function of the CFS feature subset assessment is mathematically expressed in Eq. (3).

TABLE II. PERFORMANCE OF FEATURE SELECTION USING CIC-IDS2017 DATASET

TABLE III. PERFORMANCE OF CLASSIFICATION WITH DEFAULT FEATURES USING CIC-IDS2017 DATASET

TABLE IV. PERFORMANCE OF CLASSIFICATION WITH OPTIMIZED FEATURES USING CIC-IDS2017 DATASET