DDoS Attacks Classification using Numeric Attribute-based Gaussian Naive Bayes

A Distributed Denial of Service (DDoS) attack is a cyber attack in which multiple computers send large volumes of data packets to deplete the resources of a computer network service. The total number of data packets and other important information sent by the attacker can be observed and captured as log files through port mirroring on the computer network service. A classification system is required to separate network traffic into two conditions: normal and attack. Gaussian Naive Bayes classification is one method that can take numeric attributes as input and output one of two decisions about the access occurring on the computer network service: “normal” access or access under “attack” by DDoS. This research was conducted at the Ahmad Dahlan University Networking Laboratory (ADUNL) for 60 minutes and classified 8 IP addresses as normal access and 6 IP addresses as DDoS attack access.
Keywords: Distributed Denial of Service (DDoS); Gaussian Naive Bayes; Numeric

The research in [2], a comparative analysis of different DDoS detection techniques, compared the Statistical Method, Intrusion Detection System (IDS), IDS based on Dempster-Shafer Theory, Host-Based IDS, Network IDS, and Real-Time IDS in terms of throughput, fault tolerance, performance, overheads, response time, and detection rate.
Gülay Ōke [3] used a Multiple Bayesian Classifier and a Random Neural Network to detect Denial of Service attacks. The Naive Bayes classifier makes a decision by collecting offline input features. The input features are the bit rate, the increase in bit rate, the entropy of the incoming bit rate, the Hurst parameter, the delay, and the delay rate. Bharti Nagpal [4] compared five DDoS attack tools (Trinity, Low Orbit Ion Cannon (LOIC), Tribal Flood Network, Mstream, and Trinoo) in terms of the architecture used, the type of flooding used for attacking, the type of DDoS method used, the possible damage caused, and channel encryption. Gnanapriya [5], researching Software-Defined Networking (SDN), shows that SDN provides a new opportunity to defeat DDoS attacks in cloud computing environments and summarizes the SDN features that help defeat DDoS attacks. The work then reviews studies on the launch of DDoS attacks on SDN and on methods against DDoS attacks on SDN.
www.ijacsa.thesai.org
A normal TCP connection usually starts with the user sending a SYN packet to the router; the router allocates a buffer for the user and responds with SYN and ACK packets. At this stage the connection is in a half-open state, waiting for an ACK response from the user to complete the connection setup. Completing the connection in this way is called the 3-way handshake, and TCP SYN Flood attacks exploit the 3-way handshake by keeping the router busy with SYN requests [6]. TCP SYN Flood is a common form of Denial of Service attack. Fig. 2 shows how a TCP SYN Flood occurs. A TCP SYN Flood can be observed with a packet capture application by using port mirroring to observe a copy of the router activity. A characteristic of a TCP SYN Flood is that one IP address appears at the router very frequently. The number of times a source IP address appears at the router is counted within a specified time range and used as an extracted feature of a DDoS attack [7].
Building on earlier research on packet classification with Naive Bayes, this paper provides a detailed explanation of how to process the numerical attributes of network traffic activity with the Gaussian Naive Bayes method.

A. Gaussian Method
The Gaussian method is one of the most common and important methods in probability and statistics. It was introduced by Gauss in his study of error theory, where he used it to describe errors. Experience shows that many random variables, such as the height of adult males and reaction times in psychological experiments, can be described by the Gaussian method [8], [9]. The Gaussian method is:

$$f(x) = \frac{1}{\delta\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^{2}}{2\delta^{2}}\right) \quad (1)$$

where µ is the average and δ is the standard deviation. The values of µ and δ for the numerical attributes are calculated using the formulas for the average and the standard deviation.
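As a minimal sketch of the Gaussian method described above, the following Python code computes µ and δ for one numerical attribute and evaluates the Gaussian density at a point. The function names, the sample values, and the choice of the n − 1 divisor for the standard deviation are assumptions made for illustration; they are not taken from the paper.

```python
import math


def mean(values):
    """Average (mu) of a list of numbers."""
    return sum(values) / len(values)


def sample_std(values):
    """Standard deviation (delta) with the n - 1 divisor (an assumption)."""
    mu = mean(values)
    squared_dev = sum((v - mu) ** 2 for v in values)
    return math.sqrt(squared_dev / (len(values) - 1))


def gaussian_pdf(x, mu, delta):
    """Gaussian density with average mu and standard deviation delta."""
    coeff = 1.0 / (delta * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * delta ** 2))


# Hypothetical attribute values, for illustration only.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu = mean(values)
delta = sample_std(values)
density_at_mean = gaussian_pdf(mu, mu, delta)
```

The density is largest at x = µ and falls off symmetrically as x moves away from the average, which is what lets it act as a per-class likelihood for a numeric attribute.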

B. Naive Bayes Method
The Bayes method is used to calculate the probability of occurrence of an event based on observed effects. The Naive Bayes method is a simple probabilistic prediction technique that applies the Bayes method with strong independence assumptions [10]. The Naive Bayes method is:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

where P(A|B) is the posterior probability of the class (target) given the predictor (attribute), P(B|A) is the likelihood, which is the probability of the predictor given the class, P(A) is the prior probability of the class, and P(B) is the prior probability of the predictor.
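The relationship between prior, likelihood, and posterior can be illustrated with a short Python sketch. The class names match the two decisions used in this paper, but the prior and likelihood values are hypothetical, and P(B) is obtained here by summing P(B|A)P(A) over both classes, which is an assumption about how the evidence term is computed.

```python
def posterior(likelihoods, priors):
    """Return P(class | predictor) for every class using the Bayes method."""
    # Numerator of the Bayes method: P(B|A) * P(A) for each class A.
    joint = {label: likelihoods[label] * priors[label] for label in priors}
    # P(B): total probability of the predictor over all classes.
    evidence = sum(joint.values())
    return {label: value / evidence for label, value in joint.items()}


# Hypothetical numbers, for illustration only.
priors = {"normal": 0.5, "attack": 0.5}
likelihoods = {"normal": 0.2, "attack": 0.6}
result = posterior(likelihoods, priors)
decision = max(result, key=result.get)
```

With equal priors, the class with the larger likelihood receives the larger posterior, so the decision here is "attack".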

C. Accuracy
The accuracy of a classification system describes how closely the output data match the desired values. Accuracy in classification is calculated from:
• Normal access data in the normal class (True Positives (TP)).
• Normal access data outside the normal class (False Positives (FP)).
• Attack access data outside the attack class (False Negatives (FN)).
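A minimal sketch of the accuracy calculation follows. It assumes the common form accuracy = (TP + TN) / (TP + TN + FP + FN), where TN (attack access data inside the attack class) is a fourth count that is not listed above and is an assumption here. The counts in the example are hypothetical and chosen only so that 8 of 14 items are classified correctly.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified items, assuming (TP+TN)/(TP+TN+FP+FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one classified item is required")
    return (tp + tn) / total


# Hypothetical counts for 14 IP addresses: 8 correct, 6 incorrect.
acc = accuracy(tp=5, tn=3, fp=3, fn=3)
percent = round(acc * 100, 2)
```

Because FP and FN only appear in the denominator, the result does not depend on which of the two error types an item is counted under.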

III. RESEARCH METHODOLOGY
A. Topology
The investigator uses port mirroring access at IP address 192.168.30.1 to retrieve log data of network traffic to and from ADUNL.
• Analyzing IP and data packet: this step analyzes which IP address is carrying out the attack and how long the packet is sent.

C. Methodology
• Extraction: in this stage, log files in .pcap format are converted into spreadsheet files so they can be processed using the Gaussian Naive Bayes method.
• Pre-processing: in this step, the input parameters are prepared for use in the classification method.
• Apply Gaussian Naive Bayes: in this stage, the Gaussian Naive Bayes classification method is used to process the data that already have input parameters.
• Prediction: in this step, the Gaussian Naive Bayes method assigns the processed data to one of two decisions, normal access or under attack.
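The last two steps can be sketched as a small Gaussian Naive Bayes classifier over two numeric attributes. This is an illustrative sketch under stated assumptions, not the implementation used in this research: the training rows are hypothetical, the standard deviation uses the n − 1 divisor, priors are taken from class frequencies, and log-probabilities are used to avoid numerical underflow.

```python
import math
from collections import defaultdict


def log_gaussian(x, mu, delta):
    """Logarithm of the Gaussian density with average mu and std delta."""
    return -math.log(delta * math.sqrt(2.0 * math.pi)) - ((x - mu) ** 2) / (2.0 * delta ** 2)


def fit(samples, labels):
    """Estimate the prior and per-attribute (mu, delta) for each class."""
    grouped = defaultdict(list)
    for sample, label in zip(samples, labels):
        grouped[label].append(sample)
    model = {}
    for label, rows in grouped.items():
        stats = []
        for column in zip(*rows):  # one column per attribute
            mu = sum(column) / len(column)
            var = sum((v - mu) ** 2 for v in column) / (len(column) - 1)
            stats.append((mu, math.sqrt(var)))
        model[label] = {"prior": len(rows) / len(samples), "stats": stats}
    return model


def predict(model, sample):
    """Return the class with the largest log-posterior score."""
    scores = {}
    for label, params in model.items():
        score = math.log(params["prior"])
        for x, (mu, delta) in zip(sample, params["stats"]):
            score += log_gaussian(x, mu, delta)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical (incoming IP count, packet length) rows, for illustration only.
train = [(10, 600), (12, 650), (8, 580), (300, 18000), (350, 21000), (280, 17500)]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
model = fit(train, labels)
first = predict(model, (11, 620))
second = predict(model, (320, 19000))
```

Each class needs at least two rows with differing values per attribute, otherwise the standard deviation is zero and the density is undefined.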

IV. RESULT AND ANALYSIS
The object of this research is the network traffic captured at ADUNL. The steps of the methodology are carried out in sequence to obtain the best research result.

A. Captured IP Packet Result
The log file of network traffic was captured for 60 minutes, divided into 3-minute intervals, through port mirroring at ADUNL by the investigator using Wireshark packet capture in .pcap format. Fig. 5 shows the capture result in .pcap format.

B. Analyzing IP and data packet
The IP addresses that access ADUNL, and an estimate of how many data packets are transmitted to and from each active IP address, are calculated from the log files that have been obtained. Fig. 6 shows the IP addresses accessing ADUNL.

C. Extraction
The captured network traffic log files in .pcap format cannot be processed directly into the columns and rows required by the classification process. To obtain columns and rows, the .pcap files are extracted into .csv format and then converted into .xlsx format. Fig. 7 shows the extraction of the .pcap format into .csv format.
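Once the capture is available as .csv, its rows can be read into records with the Python standard library, as in the sketch below. The column names "Time", "Source", and "Length" are assumed to match the packet-list columns of a Wireshark CSV export, and the rows themselves are hypothetical; both should be checked against the actual exported file.

```python
import csv
import io

# Hypothetical CSV content in the assumed Wireshark packet-list layout.
CSV_TEXT = "\n".join([
    '"No.","Time","Source","Destination","Protocol","Length","Info"',
    '"1","0.000000","198.51.100.7","192.0.2.1","TCP","60","SYN"',
    '"2","0.000120","198.51.100.7","192.0.2.1","TCP","60","SYN"',
    '"3","1.250000","192.0.2.10","192.0.2.1","TCP","74","SYN"',
])


def load_packets(handle):
    """Read CSV rows into dicts holding time, source address, and length."""
    packets = []
    for row in csv.DictReader(handle):
        packets.append({
            "time": float(row["Time"]),      # seconds since capture start
            "source": row["Source"],         # source IP address
            "length": int(row["Length"]),    # packet length in bytes
        })
    return packets


packets = load_packets(io.StringIO(CSV_TEXT))
```

For a file on disk, the same function can be given a handle from `open(path, newline="")` in place of the in-memory string.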

D. Pre-processing
At this stage, the results of the network traffic extraction are processed into the main parameters that can identify normal access or attack. The main parameters used as input parameters are shown in Table 1. In this research, the two input parameters are:
• Incoming IP address (IIP) within a specified time range (2nd column is the x attribute).
• Packet length (PL) within a specified time range (3rd column is the y attribute).
Formula (1) is used to calculate the likelihood of the Incoming IP address (IIP) for the normal and attack classes.
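The two input parameters can be derived per source IP address with a simple aggregation, sketched below. The sketch assumes that IIP is the number of packets received from an address inside the time range and that PL is the sum of their lengths; the paper does not state whether PL is a sum or an average, so this is an assumption, as are the sample packets and the function name.

```python
from collections import defaultdict


def extract_features(packets, start, end):
    """Map each source IP to (IIP, PL) for packets with start <= time < end."""
    counts = defaultdict(int)   # IIP: number of appearances of the address
    lengths = defaultdict(int)  # PL: summed packet length (assumed)
    for time, source, length in packets:
        if start <= time < end:
            counts[source] += 1
            lengths[source] += length
    return {source: (counts[source], lengths[source]) for source in counts}


# Hypothetical (time in seconds, source IP, length in bytes) tuples.
packets = [
    (1.0, "198.51.100.7", 60),
    (2.5, "198.51.100.7", 60),
    (10.0, "192.0.2.10", 74),
    (200.0, "192.0.2.10", 1500),  # outside the first 3-minute range
]
features = extract_features(packets, start=0.0, end=180.0)
```

Each resulting pair is one (x, y) row of the kind used as classifier input for a 3-minute time range.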

G. Visualization of Classification
Two-dimensional images can be used to display the classification results so that the level of accuracy can be assessed. Matlab is a suitable tool for displaying the classification results. Fig. 9 shows a visualization of the ADUNL network traffic classification in the 3-minute time range with a class area of µ+δ using Matlab. Based on Fig. 9, the normal and attack class areas with µ+δ do not yet cover all members of their sets. The accuracy obtained using formula (5) is 57.14%, so the value of δ is adjusted to obtain a class area that can cover its members.
Based on Fig. 10, the normal and attack class areas with µ+1.5δ still do not cover all members of their sets. The accuracy obtained using formula (5) is 71.43%, so the value of δ is adjusted again.
Based on Fig. 11, the normal and attack class areas with µ+2δ still do not cover all members of their sets. The accuracy obtained using formula (5) is 78.57%, so the value of δ is adjusted again.
Based on Fig. 12, the normal class area with µ+2.5δ does not yet cover all of its set members, while the attack class area does cover its set members. The accuracy obtained using formula (5) is 92.86%, so the value of δ for the normal class is adjusted again to obtain a class area that can cover its members.
Based on Fig. 13, the normal class area with µ+3δ and the attack class area with µ+2.5δ cover their set members. The accuracy obtained using formula (5) is 100%. The classification is then calculated once more with the Gaussian Naive Bayes classifier to confirm the class of each set member. Table 2 shows the recalculation of the Gaussian Naive Bayes classifier using the matched standard deviation. The classes of the normal and attack sets correspond to the access of each IP address.
The average and matched standard deviation are finally used to classify all new network traffic data at ADUNL in the 3-60 minute time range using the Gaussian Naive Bayes classifier, as shown in Fig. 14.
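The search for a class area that covers its members can be sketched as a sweep over the multiplier k in µ + kδ. The shape of the class area is not specified in the text, so this sketch assumes an axis-aligned region from µ − kδ to µ + kδ in each attribute; the data points, the n − 1 divisor, and the function names are likewise assumptions for illustration.

```python
import math


def mean_std(values):
    """Return (mu, delta) of a column, using the n - 1 divisor (assumed)."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / (len(values) - 1)
    return mu, math.sqrt(var)


def in_region(point, stats, k):
    """True if every attribute lies within mu - k*delta .. mu + k*delta."""
    return all(mu - k * delta <= v <= mu + k * delta
               for v, (mu, delta) in zip(point, stats))


def coverage_accuracy(classes, k_by_class):
    """Fraction of points that fall inside the area of their own class."""
    covered = 0
    total = 0
    for label, points in classes.items():
        stats = [mean_std(column) for column in zip(*points)]
        for point in points:
            total += 1
            if in_region(point, stats, k_by_class[label]):
                covered += 1
    return covered / total


# Hypothetical (incoming IP count, packet length) points per class.
classes = {
    "normal": [(10, 600), (12, 650), (8, 580), (30, 900)],
    "attack": [(300, 18000), (350, 21000), (280, 17500), (310, 18500)],
}
results = {}
for k in (1.0, 1.5, 2.0, 2.5, 3.0):
    results[k] = coverage_accuracy(classes, {"normal": k, "attack": k})
```

Passing a separate multiplier per class, for example `{"normal": 3.0, "attack": 2.5}`, mirrors the final setting described above, where the two classes end up with different multipliers.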

V. CONCLUSION AND FUTURE WORK
Gaussian Naive Bayes classification can be used to process numeric attributes on a computer network service. Numeric attributes such as Incoming IP and Packet Length are the main features for identifying the access that occurs in a computer network. The average and standard deviation are important for processing data with the Gaussian method and are also used for the visualization in Matlab. Traffic on a computer network service such as normal access and DDoS attacks can be

Fig. 4. Methodology of DDoS attacks classification.
The steps of the DDoS attacks classification methodology are shown in Fig. 4.
• Captured IP packet: used to retrieve data in the form of network traffic log files in .pcap format through port mirroring access.

Fig. 8 shows how to create a set based on average (µ) + standard deviation (δ) in Matlab; x1, y1 is the set of normal access (green), whereas x2, y2 is the set of attack (red).

TABLE I. INPUT PARAMETERS IN TIME RANGE 0-3 MINUTES

TABLE II. CLASSIFICATION WITH NEW STANDARD DEVIATION IN TIME RANGE 0-3 MINUTES