Network Packet Classification using Neural Network based on Training Function and Hidden Layer Neuron Number Variation

Distributed denial of service (DDoS) is a structured network attack that comes from various sources and fuses into a large packet stream. A DDoS packet stream behaves like a normal packet stream, so the two are very difficult to distinguish. Network packet classification is one network defense mechanism for avoiding DDoS attacks. An Artificial Neural Network (ANN) can be an effective tool for network packet classification, given an appropriate combination of hidden layer neuron number and training function. This study found that the best classification accuracy, 99.6%, was given by ANNs whose number of hidden layer neurons is half or twice the number of input neurons, but only a hidden layer with twice the number of input neurons gives stable accuracy across all training functions. The ANN with the Quasi-Newton training function is not much affected by variation in the number of hidden layer neurons, unlike the ANNs with the Scaled-Conjugate and Resilient-Propagation training functions.

Keywords: Classification; DDoS; neural network; training function; hidden layer


I. INTRODUCTION
Distributed denial of service (DDoS) is a structured network attack that comes from various sources and fuses into a large packet stream. DDoS attacks generally use the resources of slave computers coordinated by the attacker to exhaust the target network's resources, so that legitimate clients cannot access them. A DDoS packet stream behaves like a normal packet stream, and it is very difficult to distinguish between the two [1].
A large-volume DDoS packet stream overwhelms the target system, which ends up with losses such as system shutdown, loss of data, or even the loss of all the services it provides [2], [3]. Network packet classification is one network defense mechanism for avoiding DDoS attacks [4]. Network packet classification can be carried out with the Artificial Neural Network (ANN) method.
Network packet classification for detecting DDoS attacks in the TOR network using ANN was carried out in [5], using an optimized sinusoidal function as a feature extractor for network packets. ANN was used in [6] with the Resilient-Backpropagation function, combined with an ensemble of classifier outputs and a Neyman-Pearson cost minimization strategy, to detect DDoS attacks based on the DARPA and KDDCUP datasets. Research [7] adopted the ANN method to detect DDoS attacks based on darknet traffic; TCP/80 and UDP/53 packets were used as input and optimized by Locally Sensitive Hashing methods. ANN was used in [8] to recognize illegal packets in the network by means of the Backpropagation function, with TCP, ICMP, and UDP packets as inputs. Research [9] showed that the ANN method can be used to detect a new type of DDoS attack in Hadoop and HBase environments.

II. PACKET CLASSIFICATION APPROACH
The study of packet classification with an artificial neural network, applying variation of the training function and of the number of hidden layer neurons, involves the steps shown in the accompanying figure. Accuracy is the ratio of the number of normal and DDoS packets correctly recognized by the system to the total number of packets. Mean square error (mse) is the most important ANN parameter for evaluating the performance of training functions [10]. Mean square error reflects the error between the ANN training output pattern and the desired output pattern. The number of iterations reflects the time taken by the ANN to reach convergence and is also an indicator of the tradeoff between training time and convergence.

A. Network Packet Features
To classify network packets, the first step is to extract features from the dataset. The aim of feature extraction is to measure certain attributes of the original data that distinguish one input pattern from another. In this study, the network packet stream is reduced to six features based on statistical methods. The six features are:

1) Average packet size: The longer a DDoS attack lasts, the more the average packet size rises [11].

2) Number of packets: DDoS attacks overwhelm a target computer network by sending many packets within a certain time span. DDoS always results in a high number of packets [11].

3) Time interval variance: A DDoS attack delivers packets in large numbers within a certain time span, so the time interval variance becomes small, nearly zero. Time interval variance is stated as (1) [12].

$$\sigma_t^2 = \frac{1}{N}\sum_{n=1}^{N}\left(t_n - \bar{t}\right)^2 \qquad (1)$$

where $t_n$ is the time at which packet $n$ is received, $\bar{t}$ is the mean time at which a packet is received, and $N$ is the number of packets.

4) Packet size variance: Normal traffic results in high packet size variance, whereas DDoS attacks result in packet size variance close to zero, because the packets sent to the target have a monotonous size. Packet size variance is stated as (2) [12].

$$\sigma_p^2 = \frac{1}{N}\sum_{n=1}^{N}\left(p_n - \bar{p}\right)^2 \qquad (2)$$

where $p_n$ is the size of received packet $n$ and $\bar{p}$ is the mean packet size.

5) Packet rate: Packet rate reflects the number of packets sent by the source address to a destination address within a specific time frame, as stated in (3) [12].

$$r = \frac{n_p}{t_e - t_s} \qquad (3)$$

where $n_p$ is the number of packets, $t_e$ is the time at which the last packet is received, and $t_s$ is the time at which the first packet is received.

6) Number of bytes: A DDoS attack constantly increases the number of bytes [12].
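The six features can be computed from a window of captured packets roughly as follows. This is a minimal Python sketch, not the code used in the study: the function name `extract_features`, the input format (sorted arrival times in seconds and packet sizes in bytes) and the use of the population variance (division by the number of packets) are assumptions made for illustration.

```python
from statistics import fmean, pvariance


def extract_features(timestamps, sizes):
    """Compute six statistical features for one window of packets.

    timestamps: packet arrival times in seconds, sorted ascending (assumed format)
    sizes: packet sizes in bytes, in the same order as timestamps
    """
    if len(timestamps) != len(sizes) or len(timestamps) < 2:
        raise ValueError("need at least two packets with matching sizes")
    duration = timestamps[-1] - timestamps[0]  # t_e - t_s
    if duration <= 0:
        raise ValueError("window must span a positive time interval")
    return {
        "avg_packet_size": fmean(sizes),
        "num_packets": len(sizes),
        # Population variance of arrival times around their mean, as in (1).
        "time_interval_variance": pvariance(timestamps),
        # Population variance of packet sizes around their mean, as in (2).
        "packet_size_variance": pvariance(sizes),
        # Packets per second over the window, as in (3).
        "packet_rate": len(sizes) / duration,
        "num_bytes": sum(sizes),
    }


features = extract_features([0.0, 1.0, 2.0, 3.0], [100, 100, 200, 200])
```

Each window of traffic yields one six-element input pattern for the ANN; how the windows are chosen is not specified here and would have to follow the dataset preparation.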

B. Training Function
A number of batch training algorithms can be used to train an Artificial Neural Network [13]. The most used training algorithms are:

1) Newtonian training function: Newton's method reaches convergence faster than conjugate gradient methods, but computing the Hessian matrix for feed-forward neural networks is complex and time-consuming [14], [15]. Based on Newton's method, there is a class of methods called Quasi-Newton methods (Matlab trainlm in this study; trainlm implements the Levenberg-Marquardt algorithm), which do not require the calculation of second derivatives. A Quasi-Newton method updates an approximation of the Hessian matrix in each iteration of the algorithm [16], [17].

2) Resilient-Propagation training function (Matlab trainrp): This refers to a gradient-descent algorithm that removes the effect of the magnitude of the partial derivatives. Only the sign of the partial derivative of the error with respect to each weight is used to determine the direction of the weight update, whereas the magnitude of the partial derivative has no effect on the weight change. As a result, the weight changes of the neural network become more stable on the way to the minimum gradient [15].

3) Scaled-Conjugate training function (Matlab trainscg): This refers to a conjugate-gradient algorithm that starts from the negative gradient direction to adjust the weights of the neural network layers, which affects the number of iterations the neural network takes to reach convergence [15].
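The sign-based update behind Resilient-Propagation can be illustrated with a small sketch. The Python code below is a simplified Rprop-style update without weight backtracking, written for illustration only; it is not the Matlab trainrp implementation, and the function name `rprop_minimize`, the parameter values and the test gradient are assumptions.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def rprop_minimize(grad, w, steps=100, eta_plus=1.2, eta_minus=0.5,
                   delta0=0.1, delta_min=1e-6, delta_max=50.0):
    """Minimise a function from its gradient using sign-based updates."""
    w = list(w)
    delta = [delta0] * len(w)  # one step size per weight
    prev = [0.0] * len(w)      # gradient seen in the previous iteration
    for _ in range(steps):
        g = list(grad(w))
        for i in range(len(w)):
            same = prev[i] * g[i]
            if same > 0:
                # Same sign as before: accelerate by growing the step size.
                delta[i] = min(delta[i] * eta_plus, delta_max)
            elif same < 0:
                # Sign flipped: the minimum was overshot, so shrink the
                # step size and skip the update for this weight once.
                delta[i] = max(delta[i] * eta_minus, delta_min)
                g[i] = 0.0
            # Only the sign of g[i] is used; its magnitude is ignored.
            w[i] -= sign(g[i]) * delta[i]
            prev[i] = g[i]
    return w


# Hypothetical error surface whose two gradients differ by a factor of 1000.
gradient = lambda w: [2 * (w[0] - 3), 2000 * (w[1] + 1)]
weights = rprop_minimize(gradient, [0.0, 0.0])
```

Because the gradient magnitude never enters the update, both weights approach their minima (3 and -1) at a similar pace even though one gradient is a thousand times larger than the other.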

C. ANN Layer Scheme
There is no definitive rule for the best number of hidden layers and neurons to solve a given problem with an ANN [18]. For that reason, this study varies the number of hidden layer neurons as listed in Table 1.
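The hidden layer sizes named in this paper (half the number of input neurons, twice that number, and Kolmogorov's 2n + 1) can be derived from the input size as sketched below in Python. The function name `layer_schemes` and the integer division for n/2 are assumptions, and the actual table may contain further schemes not reproduced here.

```python
def layer_schemes(n_inputs=6, n_outputs=2):
    """Return input-(hidden)-output layer schemes for a few sizing rules."""
    rules = {
        "n/2": n_inputs // 2,      # half the number of input neurons
        "2n": 2 * n_inputs,        # twice the number of input neurons
        "2n+1": 2 * n_inputs + 1,  # Kolmogorov's rule
    }
    return {name: f"{n_inputs}-({hidden})-{n_outputs}"
            for name, hidden in rules.items()}


schemes = layer_schemes()
```

With six input features and two output classes, the n/2 and 2n rules give the 6-(3)-2 and 6-(12)-2 schemes discussed in the results.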

1) Accuracy is the ratio of correctly recognized DDoS and normal packet data to the overall packet data.

2) Mean-squared error (mse) reflects the error between the actual ANN training output pattern and the desired output pattern.

3) Iteration reflects the time taken by the ANN to reach convergence [16].

All training results indicated that no overtraining occurred in any ANN scheme.
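The first two evaluation parameters can be computed as in the following Python sketch. The function names, the label strings and the two-element output vectors are illustrative assumptions, not values taken from the experiments.

```python
def accuracy(predicted, actual):
    """Fraction of packets whose predicted class matches the true class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def mean_squared_error(outputs, targets):
    """Mean of squared differences over all output neurons and patterns."""
    total = 0.0
    count = 0
    for out, tgt in zip(outputs, targets):
        for o, t in zip(out, tgt):
            total += (o - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("no output values to compare")
    return total / count


acc = accuracy(["ddos", "normal", "ddos", "normal"],
               ["ddos", "normal", "normal", "normal"])
mse = mean_squared_error([[0.9, 0.1], [0.2, 0.8]],
                         [[1.0, 0.0], [0.0, 1.0]])
```

In this toy example three of the four packets are classified correctly, so the accuracy is 0.75.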

E. Accuracy
The Quasi-Newton training function (Matlab trainlm) gave a stable accuracy value across all ANN layer schemes, as shown in the accompanying figure. A hidden layer with 2n neurons gives stable accuracy for all training functions. By comparison, Kolmogorov's theorem states that the best number of hidden layer neurons for solving an ANN problem is 2n + 1, which tended to produce low accuracy in these experiments.

F. Mean-Squared Error
From Fig. 6, the following conclusions can be drawn:

1) The Quasi-Newton (Matlab trainlm) training function resulted in a small average mse value on all ANN layer schemes compared to the Scaled-Conjugate (Matlab trainscg) and Resilient-Propagation (Matlab trainrp) training functions.

2) The number of neurons in the hidden layer does not have a significant effect on the MSE value for the Quasi-Newton (Matlab trainlm) training function.

3) The number of neurons in the hidden layer has a significant effect on the MSE value for the Scaled-Conjugate (Matlab trainscg) and Resilient-Propagation (Matlab trainrp) training functions.

The results obtained from this study can be used as a basic reference for determining an effective number of hidden layer neurons when building a network packet classification system based on an artificial neural network. In future work, the study will be extended to other parameters, such as increasing the sample size of input patterns presented to the network, reducing the error goal, and using more training methods.
Building on earlier research on packet classification with ANN, this study focuses on the ANN training function in order to find the best training function and hidden layer configuration for packet classification. The DDoS dataset published by the Center for Applied Internet Data Analysis (CAIDA) and the normal network dataset published by the Ahmad Dahlan University Networks Laboratory are used in this study.
The experiments were carried out in a Matlab 2010R environment running on 64-bit Windows 7. The experimental dataset consists of 500 DDoS traffic records and 500 normal traffic records, each described by six features. For ANN training, the dataset was divided using the Matlab 2010R defaults into 70% for training, 15% for validation, and 15% for testing. The assignment of records to the training, validation, and testing sets was made by a random function (Matlab dividerand) to avoid bias in the sample patterns.
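A random 70/15/15 division of this kind can be sketched in Python as follows. This mimics the idea of Matlab dividerand but is not its implementation; the function name `divide_random` and the `seed` parameter are assumptions introduced for reproducibility.

```python
import random


def divide_random(n_samples, train=0.70, val=0.15, test=0.15, seed=None):
    """Randomly split sample indices into training, validation and test sets."""
    if abs(train + val + test - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random order avoids ordering bias
    n_train = round(train * n_samples)
    n_val = round(val * n_samples)
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]  # the remainder forms the test set
    return train_idx, val_idx, test_idx


# 500 DDoS records plus 500 normal records, as in the experimental dataset.
train_idx, val_idx, test_idx = divide_random(1000, seed=0)
```

For 1000 records this gives 700 training, 150 validation, and 150 testing indices, with every record assigned to exactly one set.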

The highest accuracy value, 0.996 (99.6%), was achieved by the ANN with the Scaled-Conjugate training function (Matlab trainscg) under the 6-(3)-2 layer scheme and also by the ANN with the Quasi-Newton training function (Matlab trainlm) under the 6-(12)-2 layer scheme. However, the Scaled-Conjugate training function (Matlab trainscg) gave less consistent values on the other ANN layer schemes. Based on Fig. 5, the best classification accuracy was given by ANNs with n/2 and 2n hidden layer neurons, where n is the number of input neurons.
www.ijacsa.thesai.org

The ANN with the Quasi-Newton training function is not much affected by variation in the number of hidden layer neurons. Significant differences in MSE value are found when varying the number of hidden layer neurons in neural networks trained with the Scaled-Conjugate and Resilient-Propagation training functions. More neurons in the hidden layer can reduce the MSE value for the Resilient-Propagation (Matlab trainrp) training function, whereas more neurons in the hidden layer increase the MSE value for the Scaled-Conjugate (Matlab trainscg) training function. In this study, the most suitable number of hidden layer neurons is 2n, because it gives stable accuracy for all training functions.