DDoS Classification using Combined Techniques

Abstract: Disrupting network systems is now a favoured tactic among attackers. An attacker can generate several types of DDoS attack simultaneously, including the Smurf attack, ICMP flood, UDP flood, and TCP SYN flood. This problem motivated the design of a classification technique for DDoS attacks entering a computer network environment. The technique, called the Packet Threshold Algorithm (PTA), is combined with several machine learning techniques to classify incoming packets that have been captured and recorded. The combined techniques can differentiate between normal packets and DDoS attacks. All techniques in this research achieved high detection accuracy while mitigating the issue of a high false positive rate. The four techniques examined are PTA-SVM, PTA-NB, PTA-LR and PTA-KNN. Based on detection accuracy and false positive rate, the results indicate that PTA-KNN is the most effective of the four at classifying incoming packets as DDoS attacks or normal packets.


I. INTRODUCTION
The world now depends on the Internet to share resources among users wherever they are. It supports many daily activities, including online games, social media, and information searches related to teaching and learning. The Internet is available to all users 24 hours a day. However, it is often threatened by network attacks launched from around the world, including DDoS attacks [1].
When an attacker launches a DDoS attack, the targeted computer network or system becomes inaccessible, even to users registered in the system. Attackers typically use botnets to generate attack traffic at very high rates, which weakens the target server's ability to serve requests. According to [2], DDoS attacks can be categorized into three groups: volume-based attacks, protocol attacks, and application layer attacks.
Volume-based attacks aim to overwhelm network resources by flooding communication channels with a high volume of traffic. They often rely on botnets, which are networks of compromised computers controlled by the attacker [3]. By leveraging thousands or millions of infected devices, the attacker generates a massive amount of network traffic, leading to system failures in the targeted infrastructure. Protocol attacks target the network protocol layers and often exploit vulnerabilities in the communication protocols used in network infrastructure, such as TCP/IP [4]. One example is the SYN flood, in which the attacker sends an overwhelming number of SYN requests to the target server, overloading it and hindering its ability to serve legitimate users. Application layer attacks target specific applications or services running on top of the network infrastructure [5], exploiting vulnerabilities in the application's logic or in the resources it relies on. An example is the HTTP flood, in which attackers overwhelm a web server with an abnormally high volume of HTTP requests. This flood strains server resources, degrading performance or causing a complete service failure.
Attackers can generate several types of DDoS attacks from anywhere. These include ICMP flood, UDP flood, Ping of Death, Slowloris, zero-day attack, Smurf, and TCP SYN flood [6]. Protecting against DDoS attacks requires a robust and effective detection strategy. In addition to integrating machine learning algorithms with the PTA to improve the performance and accuracy of the detection system, this research evaluated the effectiveness of each technique. The results are presented to identify the most efficient approach for detecting malicious packets within a network environment. This study also explored potential enhancements and optimizations to advance DDoS attack detection.
A reliable and precise detection strategy is essential for safeguarding against DDoS attacks. Combining the PTA with machine learning algorithms improves the system's ability to differentiate and classify incoming packets accurately. By reducing false positives, this strategy provides a more effective defense against DDoS attacks and helps preserve the integrity and availability of network resources [7].
DDoS detection is approached here using machine learning models such as SVM, KNN, Naïve Bayes, and Logistic Regression, which are well suited to classification tasks. Naïve Bayes is strong at probabilistic classification, SVM is good at separating data points, KNN is good at pattern recognition, and Logistic Regression is good for binary classification. By adapting to a variety of packet behaviors, these models help distinguish malicious packets from legitimate ones accurately.
These models nevertheless have limitations. Large datasets can be a problem for SVM and KNN, reducing computing efficiency. The independence between features assumed by Naïve Bayes may not hold for complex packet dynamics. Logistic Regression may struggle with non-linear correlations between features, which could reduce its accuracy for complex packet classifications. These limitations must be considered when selecting the best model for DDoS detection.
This paper is divided into several sections. Related work is presented in Section II. The methodology is presented in Section III, and the evaluation of techniques is described in Section IV, followed by results and discussion in Section V. The final section, Section VI, provides a brief summary of this paper.

II. RELATED WORK
Despite the substantial research efforts dedicated to countering DDoS attacks, the challenge of mitigating them endures. Researchers have introduced various techniques in their attempts to combat the actions of DDoS attackers. Table I provides a summary of the methods proposed by these researchers to address such attacks.
The first study [8] addresses distributed denial-of-service (DDoS) attacks within the context of 5G networks. It notes that previous studies focused predominantly on radio access networks (RAN) and voice service networks, often overlooking the vulnerabilities inherent in core networks (CN). Core network components, including the Access and Mobility Management Function (AMF), Session Management Function (SMF), and User Plane Function (UPF), are pivotal in providing expansive 5G coverage but are susceptible to DDoS attacks. The study introduces a methodology and a threat detection system tailored to counter signalling DDoS attacks specifically targeting 5G standalone CNs. By leveraging fundamental machine learning classifiers and preprocessing techniques such as entropy-based analysis (EBA) and statistics-based analysis (SBA), the research demonstrates the effectiveness of proactive defense strategies against these attacks. The RF classifier was the top performer, achieving an average accuracy of 98.7%.
The second study [9] underscores the role of the Internet as a fundamental communication tool in contemporary society. As reliance on the Internet has grown, the frequency and severity of cyber-attacks have escalated, with DDoS attacks ranking among the top five most impactful and costly cyber threats. DDoS attacks disrupt legitimate users' access to network resources, so swift and accurate detection methods are needed to limit the damage. The study applies machine learning classification algorithms, including LR, DT, RF, AdaBoost, Gradient Boost, KNN, and NB, to detect DDoS attacks using the CICDDoS2019 dataset, which covers eleven distinct DDoS attack types characterized by 87 features. The research evaluates classifier performance through various metrics, revealing that AdaBoost and Gradient Boost excel in classification, while LR, KNN, and NB also perform strongly. However, the DT and RF classifiers produce less effective classification results.
The third study [10] addresses the ongoing challenge of managing DDoS attacks, which threaten network security by inundating target networks with malicious traffic from multiple sources. Despite the availability of various conventional methods for detecting DDoS attacks, rapidly identifying these threats using feature selection algorithms remains difficult. The study introduces a hybrid approach that combines feature selection techniques (chi-square, Extra Tree, and ANOVA) with four machine learning classifiers: RF, DT, KNN, and XGBoost. The primary goal is to enable early detection of DDoS attacks on IoT devices. To validate the proposed methodology, the research employs the CICDDoS2019 dataset, which covers a wide range of DDoS attacks, and conducts assessments in a cloud-based environment (Google Colab). The experimental results show an 82.5% reduction in features and 98.34% accuracy with ANOVA for XGBoost, thereby facilitating the early identification of DDoS attacks on IoT devices.
The fourth study [11] addresses security concerns in IoT networks, with a specific focus on the persistent threat posed by DDoS attacks. The proposed solution integrates SDN with IoT to reinforce security measures and access control. Despite this integration, DDoS attacks continue to pose a formidable challenge. To tackle this issue, the study introduces a machine learning-based security framework. The authors build a controlled testing environment for simulating DDoS attacks, capture network logs, preprocess them into a structured dataset, and apply three algorithms, namely NB, DT, and SVM, for network packet classification. The framework attains accuracy rates of 97.4% for NB, 96.1% for SVM, and 98.1% for DT, showing its effectiveness in mitigating DDoS threats while optimizing resource utilization and managing network traffic.
The final study [12] investigates DDoS attacks within the context of the IoT. The research uses machine learning classifiers, including both bagging and boosting techniques, to categorize attack traffic, making use of the CICDDoS2019 dataset designed to simulate DDoS attacks on the UDP and TCP protocols commonly employed in IoT networks. To tackle data imbalance, the study employs an ensemble sampling approach that combines random under-sampling and ADASYN oversampling. Feature selection is carried out using two methods: the Pearson correlation coefficient and the Extra Tree classifier. The results reveal that RF performs the best with minimal training and prediction time, and that Extra Trees for feature selection outperforms the Pearson correlation coefficient method in terms of overall time efficiency for most classifiers. However, when the Pearson correlation coefficient is used for feature selection, RF remains the optimal choice for attack detection.
An analysis of prior research on DDoS detection using machine learning methods shows a need to improve feature selection in the datasets used. Minimizing false positives is essential to achieving higher detection precision. This finding underscores the importance of carefully selecting relevant and efficient features for DDoS detection and classification methodologies. Better feature selection can significantly reduce false positive alerts, resulting in more reliable and precise outcomes.

III. PROPOSED METHODOLOGY
This section introduces the research methodology, which is organized into four phases as illustrated in Fig. 1, and it outlines various research activities.

A. Dataset Preparation
A dataset containing several types of DDoS attacks and normal packets is provided in the first phase, as shown in Fig. 2. The dataset is relevant to the research activities because it records multiple incoming packets, which are the primary focus. It includes various features such as source address, destination address, packet type, packet size, and packet class. For instance, the source address refers to the IP address of the sender generating the packet or traffic, while the destination address represents the IP address that receives the packets or traffic.

B. Data Preprocessing
The second research phase is data preprocessing. This phase is crucial because it requires expertise to transform the data into a comprehensible format. Two activities were conducted in this phase: data cleaning and data reduction. Data cleaning is the first activity, as presented in Fig. 3. The method used is identification of missing values: if a column contains missing values, the output shows a count of 1, 2, and so on. In the dataset used, this indicates missing values or empty cells in the Src_Addrs, Pkt_ID, and From_Node columns. The second activity is data reduction, which reduces the number of data samples by identifying and eliminating duplicate rows in the dataset, as presented in Fig. 4. In this case, data duplication occurs in rows 3 and 6, which must be removed to produce high-quality data and facilitate analysis. Both activities help obtain complete, consistent, and high-quality data within the dataset.
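The two preprocessing activities can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: the column names Src_Addrs, Pkt_ID, and From_Node come from the text, while Pkt_Type, Pkt_Size, the helper names, and the sample rows are hypothetical.

```python
from collections import Counter

def count_missing(rows, columns):
    """Return the number of empty cells (None or '') found in each column."""
    missing = Counter()
    for row in rows:
        for col in columns:
            if row.get(col) in (None, ""):
                missing[col] += 1
    return {col: missing[col] for col in columns}

def drop_duplicates(rows, columns):
    """Keep the first occurrence of each row and preserve the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Hypothetical sample rows; Pkt_Type and Pkt_Size are assumed column names.
COLUMNS = ["Pkt_ID", "Src_Addrs", "From_Node", "Pkt_Type", "Pkt_Size"]
sample = [
    {"Pkt_ID": 1, "Src_Addrs": "10.0.0.1", "From_Node": "n1", "Pkt_Type": "tcp", "Pkt_Size": 40},
    {"Pkt_ID": 2, "Src_Addrs": "", "From_Node": "n2", "Pkt_Type": "udp", "Pkt_Size": 90},
    {"Pkt_ID": 1, "Src_Addrs": "10.0.0.1", "From_Node": "n1", "Pkt_Type": "tcp", "Pkt_Size": 40},
    {"Pkt_ID": None, "Src_Addrs": "10.0.0.3", "From_Node": None, "Pkt_Type": "icmp", "Pkt_Size": 70},
]

missing = count_missing(sample, COLUMNS)   # data cleaning: locate empty cells
cleaned = drop_duplicates(sample, COLUMNS) # data reduction: remove repeated rows
```

How rows with missing values are handled afterwards (dropped or imputed) is not stated in the text, so the sketch only reports the counts.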

C. Data Splitting
The third phase of the research is data splitting. The dataset, consisting of a total of 240,000 samples, is partitioned into two distinct sets: the training set and the testing set, as outlined in Table II.
The training set supplies the data samples from which the machine learning methods are built, while the testing set is used to evaluate these methods. Train and test functions were written to separate the two sets, and the dataset was divided according to the data distribution outlined in Table II. For example, an 80:20 ratio allocates 80% of the samples to the training set and the remaining 20% to the testing set.
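The arithmetic behind the split can be sketched as follows. The 240,000-sample total and the four ratios (50:50, 60:40, 70:30, 80:20) are taken from the paper; the function names and the shuffling with a fixed seed are assumptions made for illustration, since the text does not say how samples are assigned to each set.

```python
import random

TOTAL_SAMPLES = 240_000
RATIOS = [(50, 50), (60, 40), (70, 30), (80, 20)]

def split_sizes(total, train_pct, test_pct):
    """Return (train_size, test_size) for a percentage ratio such as 80:20."""
    if train_pct + test_pct != 100:
        raise ValueError("ratio must sum to 100")
    train = total * train_pct // 100
    return train, total - train

def train_test_split(samples, train_pct, seed=0):
    """Shuffle a copy of the samples (assumed step) and cut it at the ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_pct // 100
    return shuffled[:cut], shuffled[cut:]

# Set sizes implied by each ratio for the full dataset.
sizes = {f"{a}:{b}": split_sizes(TOTAL_SAMPLES, a, b) for a, b in RATIOS}

# Small demonstration on ten dummy samples.
train, test = train_test_split(range(10), 80)
```

For the full dataset this gives testing sets of 120,000, 96,000, 72,000, and 48,000 samples for the four ratios respectively.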

D. Packet Classification
With quality data selected, the research continues with the final phase, packet classification. In this phase, a technique called the Packet Threshold Algorithm (PTA) is proposed. PTA identifies whether incoming packets are normal packets or DDoS attacks, and it is combined with several machine learning techniques: SVM, KNN, NB and LR. The functioning of PTA is shown in Fig. 5. First, PTA checks incoming packets against a predefined packet threshold based on the packet size and packet type received by the server. If the received packet is TCP, UDP, or ICMP and its size is less than 60 bytes per second, PTA categorizes it as a normal packet. If the server receives a TCP packet larger than 60 bytes per second, PTA categorizes it as a TCP SYN flood. If the server receives a UDP packet larger than 60 bytes per second, PTA categorizes it as a UDP flood. If the packet is an ICMP packet and its size exceeds 65,535 bytes per second, PTA categorizes it as Ping of Death. If the ICMP packet size is less than 65,535 bytes per second but exceeds 60 bytes per second, PTA categorizes it as a Smurf attack. PTA drops all packets whose size exceeds 60 bytes per second and allows packets smaller than 60 bytes per second to enter the network environment. Finally, PTA is combined with machine learning through several phases: feature selection, data splitting, and construction and evaluation of the techniques involved.
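The threshold rules described above can be written as a small decision function. This is a sketch of the stated rules only, not the authors' code: the function and constant names are hypothetical, and because the text does not say what happens at exactly 60 or exactly 65,535, the sketch assumes those boundary values fall into the lower category.

```python
NORMAL_THRESHOLD = 60         # packet-size threshold stated in the text
PING_OF_DEATH_LIMIT = 65_535  # ICMP size above which the text reports Ping of Death

def classify_packet(pkt_type, size):
    """Map a packet's type and size to the PTA category described in the text."""
    pkt_type = pkt_type.upper()
    if pkt_type not in {"TCP", "UDP", "ICMP"}:
        raise ValueError(f"unsupported packet type: {pkt_type}")
    # Assumption: a size of exactly 60 is treated as normal.
    if size <= NORMAL_THRESHOLD:
        return "Normal"
    if pkt_type == "TCP":
        return "TCP SYN Flood"
    if pkt_type == "UDP":
        return "UDP Flood"
    # Remaining case is ICMP above the normal threshold.
    # Assumption: a size of exactly 65,535 is treated as Smurf.
    if size > PING_OF_DEATH_LIMIT:
        return "Ping of Death"
    return "Smurf"

def action_for(category):
    """PTA allows normal packets and drops every attack category."""
    return "allow" if category == "Normal" else "drop"

# Example packets as (type, size) pairs.
examples = [("tcp", 40), ("tcp", 100), ("udp", 512), ("icmp", 1_000), ("icmp", 70_000)]
labels = [classify_packet(t, s) for t, s in examples]
```

In the combined techniques, the packet type and size inspected here would also serve as input features for the machine learning classifier.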
In summary, PTA determines the category of incoming packets in three steps. First, PTA uses a predefined packet threshold to evaluate incoming packets. Second, PTA examines the packet type and size to determine the category, as described above. Finally, based on the category, PTA acts on the packet: it drops all packets received by the server that exceed 60 bytes per second and allows packets smaller than 60 bytes per second to enter the network. With this approach, PTA classifies incoming packets as normal or as one of several types of DDoS attack.
These four evaluations can be illustrated using the confusion matrix in Table III. The confusion matrix is a standard tool in machine learning that breaks down a model's performance by categorizing predictions into TP, FN, FP, and TN. This breakdown helps assess both overall accuracy and the model's ability to identify positive and negative cases correctly, and it supports improving classification models in many domains, including DDoS attack detection.

V. RESULT AND DISCUSSION
This section presents the experimental results for the techniques employed. It begins with an evaluation of the effectiveness of the proposed method for DDoS attack detection, followed by a comparative analysis with previously used techniques.

A. Performance Comparison of PTA with Machine Learning Techniques
This section presents the performance results for four combinations of PTA with machine learning techniques, based on data splitting between training and testing sets, as shown in Table IV. With a 50:50 data split, the PTA-KNN technique attains the highest detection accuracy of 99.86%, and the PTA-SVM technique is reported at the same 99.86%. The PTA-LR technique achieves a detection accuracy of 99.12%, whereas the PTA-NB technique reaches 98.70%.
With a 60:40 data split, the PTA-KNN technique again leads with a detection accuracy of 99.86%, ahead of the PTA-SVM, PTA-LR, and PTA-NB techniques at 99.66%, 99.17%, and 98.72% respectively. For the 70:30 data split, the PTA-KNN technique continues to outperform the other techniques with a detection accuracy of 99.84%, while the PTA-SVM, PTA-LR, and PTA-NB techniques achieve 99.65%, 99.16%, and 98.69% respectively. Table IV shows the performance comparison of PTA with machine learning techniques. With an 80:20 data split, the PTA-KNN technique reaches a detection accuracy of 99.83%, surpassing the PTA-SVM technique at 99.63%. The PTA-LR technique achieves 99.17%, whereas the PTA-NB technique achieves a slightly lower 98.68%. These results indicate that the PTA-KNN technique is effective at identifying incoming packets, whether they are DDoS attacks or normal packets.
The statistical outcomes presented in Fig. 6 depict the effectiveness of the PTA-KNN technique in this research. This effectiveness stems from its use of packet type and size as key criteria. The conclusion is further supported by the detection accuracies achieved across the data splitting ratios: 99.86% for 50:50, 99.86% for 60:40, 99.84% for 70:30, and 99.83% for 80:20. As shown in Table V, these detection accuracies exceed those of the alternative techniques. The detection accuracy percentages for the PTA-KNN technique are determined from the number of correctly detected incoming packets. For the 50:50 data split, 119,827 incoming packets were correctly detected, while 173 packets were misclassified. For the 60:40 data split, the PTA-KNN technique correctly detected 95,863 incoming packets, with 137 packets misclassified. For the 70:30 data split, the technique correctly detected 71,882 incoming packets, with 118 misclassified. Lastly, for the 80:20 data split, the PTA-KNN technique correctly detected 47,920 incoming packets, with 80 packets misclassified. Overall, the proposed technique appears to exhibit the highest accuracy across most algorithms, making it a promising approach for detection. Nevertheless, other factors, such as computational complexity and practical applicability, should be considered when selecting a technique for a specific problem.
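The reported PTA-KNN percentages can be reproduced from the packet counts given above. The short check below uses only the numbers stated in the text; the dictionary and function names are hypothetical.

```python
# (correctly detected, misclassified) packets per split, as reported in the text.
REPORTED_COUNTS = {
    "50:50": (119_827, 173),
    "60:40": (95_863, 137),
    "70:30": (71_882, 118),
    "80:20": (47_920, 80),
}

def accuracy_percent(correct, misclassified):
    """Detection accuracy as a percentage, rounded to two decimal places."""
    total = correct + misclassified
    return round(100 * correct / total, 2)

# Total packets evaluated and resulting accuracy for each split.
totals = {split: c + m for split, (c, m) in REPORTED_COUNTS.items()}
accuracies = {split: accuracy_percent(c, m) for split, (c, m) in REPORTED_COUNTS.items()}
```

The totals per split (120,000, 96,000, 72,000, and 48,000) match the testing-set sizes implied by a 240,000-sample dataset, and the rounded accuracies agree with the 99.86%, 99.86%, 99.84%, and 99.83% quoted in the text.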

VI. CONCLUSION
This research examined the capabilities of the PTA technique in detecting both DDoS attacks and normal packets. PTA uses a predefined packet threshold that considers packet size and the specific packet types that attackers may generate. When the PTA technique is integrated with different machine learning approaches, the findings show that the PTA-KNN technique surpasses the PTA-NB, PTA-SVM, and PTA-LR techniques in terms of detection accuracy and false positive rate.
Potential areas for future enhancement have also been identified from the findings. One possible direction is to explore adaptive thresholding techniques that dynamically adjust the packet threshold based on network conditions and attack patterns. Investigating the integration of anomaly detection algorithms and deep learning models could also improve the PTA technique's ability to detect emerging and sophisticated DDoS attacks. These avenues for future research aim to strengthen the effectiveness and resilience of the PTA technique against evolving cyber threats.


The four outcomes recorded in the confusion matrix can be summarized as follows:
True Positive (TP): Instances where the model correctly detected DDoS attacks when they occurred.
False Negative (FN): Instances where the model failed to detect DDoS attacks when they were happening.
False Positive (FP): Instances where the model incorrectly flagged normal traffic as DDoS attacks.
True Negative (TN): Instances where the model correctly identified normal traffic as not being DDoS attacks.
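Detection accuracy and false positive rate, the two measures the paper reports, are conventionally computed from these four counts. The sketch below uses the standard textbook definitions, which are assumed (not confirmed by the text) to match the formulas used in the study; the example counts are invented purely for illustration and are not results from the paper.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all packets classified correctly: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total

def false_positive_rate(fp, tn):
    """Share of normal packets wrongly flagged as attacks: FP / (FP + TN)."""
    negatives = fp + tn
    if negatives == 0:
        raise ValueError("no normal packets in the evaluation set")
    return fp / negatives

# Illustrative counts only (not taken from the paper).
tp, tn, fp, fn = 90, 95, 5, 10
acc = accuracy(tp, tn, fp, fn)     # 185 correct out of 200 packets
fpr = false_positive_rate(fp, tn)  # 5 false alarms out of 100 normal packets
```

A low false positive rate matters here because every false positive corresponds to a legitimate packet that would be dropped.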

TABLE I. PAST STUDY MACHINE LEARNING TECHNIQUE

TABLE III. CONFUSION MATRIX

TABLE IV. PERFORMANCE COMPARISON OF PTA WITH MACHINE LEARNING TECHNIQUES

TABLE V. DETECTION RESULTS FOR DIFFERENT DATA SPLITTING RATIOS AND PACKET TYPES USING COMBINATION TECHNIQUES
[Table body not recovered. Column headers: Data Splitting (Training:Testing); No. of Incoming Packets Detected: Normal, Ping of Death, Smurf, TCP SYN Flood, UDP Flood. First row label: PTA-NB.]
This section presents a performance comparison between the proposed DDoS detection technique and existing methods, as displayed in Table VI. The table compares detection accuracy for various techniques across different years of publication, and the highest and lowest accuracies vary significantly among the techniques and algorithms employed. The proposed technique has the highest overall accuracy of 99.86%, achieved using the KNN algorithm. The lowest accuracy values, however, are more dispersed. For instance, in the case of Park et al., the values for LR and KNN are denoted as NA, indicating that no data are available. In contrast, for Gaur and Kumar, the lowest accuracy is attributed solely to the KNN algorithm, which attains an accuracy of 91.39%.

TABLE VI. PERFORMANCE COMPARISON BETWEEN PROPOSED DDOS DETECTION TECHNIQUE AND PREVIOUS TECHNIQUES