Lightweight Internet Traffic Classification based on Packet Level Hidden Markov Models

Over the last decade, Internet traffic classification has become important not only for safeguarding the integrity and security of network resources, but also for ensuring quality of service for business critical applications by optimizing existing network resources. Optimization, however, first requires correct identification of the different traffic flows. In this paper, we propose a framework based on the Hidden Markov Model (HMM) that uses the intrinsic statistical characteristics of Internet packets for traffic classification. Inspecting packets through statistical analysis of their characteristics helps to reduce overall computational complexity. In general, the major challenges for any Internet traffic classifier are: 1) the inability of traditional port based techniques to accurately identify encrypted traffic; 2) overall computational complexity; and 3) achieving high accuracy in traffic identification. Our methodology uses the statistical characteristics of Internet packets, namely their size and inter arrival time, to model different traffic flows. For the experiments, a data set of widely used Internet applications was used. The proposed HMM models fit the observed traffic with high accuracy. The achieved traffic identification accuracy was 91% for the packet size classifier and 82% for the inter packet time based classifier.
Keywords: Hidden Markov model; traffic classification; network security; deep packet inspection; internet traffic modeling; Internet of Things


I. INTRODUCTION
Rapid developments in multimedia and broadband applications have made traffic classification a difficult subject, and over the years it has drawn significant attention [1]-[5] among researchers. The use of nonstandard ports, user privacy requirements and heavy traffic load on the network create major bottlenecks for some of the developed techniques. Traditional port based classification techniques are not reliable and cannot identify encrypted traffic. Statistical analysis based deep packet inspection approaches have proven to be more robust and efficient in handling encrypted traffic, which has made this a fertile research area.
Network traffic classification is fundamental to a number of network activities, including management, security, planning and quality of service provisioning [6]. The prerequisite for Internet traffic classification is packet inspection. However, strict privacy policies and heavy network load, coupled with the high processing and infrastructure requirements of deep packet inspection engines, have made it difficult to implement. Statistical analysis based packet inspection approaches have been very effective in the presence of encryption and protocol obfuscation. Real time traffic classification and the complexity of existing solutions nevertheless remain a big challenge. The debate over the optimal technique for traffic classification is still open, and with the emergence of new multimedia broadband applications such as Peer to Peer, Internet Protocol based Television and online games, it has become very difficult for traditional classifiers to identify different traffic flows [7], [8]. Researchers have responded to this difficulty by working out different methods of Internet traffic classification based on application level usage patterns and customer behavior. The authors of [9], [10] modeled Internet traffic as a stochastic process with a self-similar character. The overall behavior of this model was observed for different traffic flows in various network architectures.

II. RELATED WORK
In port based identification techniques, each application has a unique port number at the server side, and applications are detected by analyzing TCP and UDP [27] traffic. Before any traffic engineering rules are applied, the captured port numbers are compared against their default ports [28] in order to validate the port identification. With rapid advancements in applications, however, some developers assign port numbers other than the defaults. BitTorrent [29] is one such application that uses different port numbers. Because of such cases, port based detection could not identify 30% of Internet traffic [30]-[32]. Nguyen and Armitage [11] give a detailed and comprehensive survey of traffic classification work up to 2008. Owing to the failure of the two main packet classification techniques, 1) mapping of transport layer source and destination ports and 2) payload signature based recognition, researchers have focused on traffic classification using statistical and machine learning techniques.
Nguyen and Armitage [11] also used machine learning techniques to analyze interactive IP traffic. W. Li and A. W. Moore [12] suggested a machine learning approach based on the Naive Bayes and C4.5 decision tree algorithms, which accurately classifies Internet traffic by collecting different features at the start of traffic flows. A number of packet scanning applications capable of packet inspection are deployed across different networks, such as SNORT [24], [25] and the Linux L-7 (Layer-7) filter. One key area is network security, where intrusions aim to take over system resources and cause denial of service for end users. To mitigate such attacks, the authors of [26] suggest passing all traffic through a firewall where all rules have been defined. The implementations in [14], [15] work on statistical properties of different flows, i.e. IPT (inter packet time) and PS (packet size). Similarly, the HMM implementation in [16] covers a comparative analysis of different HMMs against other traffic classification techniques. Researchers have also applied other statistical methods [17], [18] to the problem of traffic classification in IP networks.

III. HIDDEN MARKOV MODEL
Hidden Markov models are stateful statistical models based on the principles of the Markov chain, a stochastic process in which each state depends on the previous state and the states are linked through state transition probabilities. An HMM can be represented at a high level by the following variables:
1) The hidden variables, whose temporal evolution follows a Markov chain: $x_n \in \{s_1, s_2, \ldots, s_N\}$ represents the (hidden) state at discrete time n, with N the number of states.
2) The observable variables, which depend stochastically on the hidden state: $y_n \in \{O_1, O_2, \ldots, O_M\}$ represents the observable variable at discrete time n, with M the number of observable values.
3) B, the N×M observation generation matrix; the observations may be discrete or continuous in nature. Each observation can be described by a different distribution, and all these distributions are log-concave in nature.
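The three ingredients listed above can be sketched as a small data structure. The following Python fragment is only an illustration under stated assumptions: the class name `DiscreteHMM`, the helper `state_distribution` and the two-state toy parameters are hypothetical and are not taken from the paper; observations are assumed to be discrete.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiscreteHMM:
    pi: List[float]        # initial state distribution, length N
    A: List[List[float]]   # N x N state transition matrix
    B: List[List[float]]   # N x M observation generation matrix

    def validate(self, tol: float = 1e-9) -> None:
        """Check shapes and that pi and every row of A and B sum to one."""
        n = len(self.pi)
        if len(self.A) != n or len(self.B) != n:
            raise ValueError("A and B need one row per hidden state")
        m = len(self.B[0])
        rows = [self.pi] + self.A + self.B
        widths = [n] + [n] * n + [m] * n
        for row, width in zip(rows, widths):
            if len(row) != width or abs(sum(row) - 1.0) > tol:
                raise ValueError("each distribution must be complete and sum to 1")


def state_distribution(hmm: DiscreteHMM, n: int) -> List[float]:
    """Distribution of the hidden state after n transitions (pi times A^n)."""
    dist = list(hmm.pi)
    size = len(dist)
    for _ in range(n):
        dist = [sum(dist[i] * hmm.A[i][j] for i in range(size))
                for j in range(size)]
    return dist


# Hypothetical two-state, three-symbol model, used only to exercise the code.
toy = DiscreteHMM(
    pi=[0.6, 0.4],
    A=[[0.9, 0.1], [0.2, 0.8]],
    B=[[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]],
)
toy.validate()
after_one_step = state_distribution(toy, 1)
```

Keeping the validation separate from the arithmetic makes it easy to reject a malformed parameter set before any likelihood is computed.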
The probability of being in any specific state at a certain time t, for the same Markov chain λ, is as under.
An HMM based estimation model was developed using the HMM estimation capabilities (learning, modeling, and prediction) [19], separately for PS and IPT. The traffic classification model [9] recognizes the distinct behavior patterns of various flows. In the HMM implementation [13], the first few packets are used to train the model and to classify each flow at an early stage. The basic HMM structure learns the characteristics of the initial packets of different flows; afterwards, the statistical properties of the complete sequence are determined by observing packet size and inter packet time. The following four widely used application classes were used to develop the model: 1) live streaming (YouTube); 2) email services; 3) online game; 4) voice services (Skype).

IV. METHODOLOGY AND APPROACH
In order to model different traffic flows, we focused on four widely used applications. These applications were represented by four different states based on their statistical properties, i.e. packet size and inter packet time. They were selected based on their usage and complexity. The classifier block diagram is shown in Fig. 1.
The related traffic was generated from dedicated network machines and captured on a server placed in the Network Operation Center. The considered statistical parameters of the traffic, i.e. the mean and standard deviation of packet size and inter packet time, were calculated using MATLAB and are shown in Table 2. The forward variable α and the backward variable β were computed using the Forward-Backward algorithm [20]. These variables are given in (5) and (6).
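The per-flow statistics described above (mean and standard deviation of packet size and of inter packet time) can be sketched as follows. The paper computed them in MATLAB; this Python version is only an illustrative equivalent, and the function name `flow_statistics`, the input layout (capture timestamps in seconds, sizes in bytes) and the four sample packets are assumptions.

```python
from statistics import mean, stdev


def flow_statistics(timestamps, sizes):
    """Return mean and sample standard deviation of PS and IPT for one flow.

    timestamps: packet capture times in seconds, in arrival order
    sizes:      packet sizes in bytes, same order and length
    """
    if len(timestamps) != len(sizes):
        raise ValueError("timestamps and sizes must have the same length")
    if len(sizes) < 3:
        raise ValueError("need at least three packets for a standard deviation of IPT")
    # Inter packet time is the gap between consecutive arrivals.
    ipt = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return {
        "ps_mean": mean(sizes),
        "ps_std": stdev(sizes),     # sample (N-1) standard deviation
        "ipt_mean": mean(ipt),
        "ipt_std": stdev(ipt),
    }


# Hypothetical four-packet flow, used only to exercise the function.
demo = flow_statistics([0.00, 0.02, 0.05, 0.09], [90, 110, 100, 100])
```

`statistics.stdev` uses the N-1 normalization, which is also the default of MATLAB's `std`, so the two should agree on the same input.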
The likelihoods for inter packet time and packet size were computed using (7), which is given as under:

$$P(O/\lambda) = \sum_{i=1}^{N} \alpha_t[i]\,\beta_t[i] \qquad (7)$$
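To make the role of the forward and backward variables concrete, the following Python sketch computes α and β for a short observation sequence and obtains the likelihood from their product. It is a minimal illustration, not the authors' implementation: Gaussian emission densities are an assumption (the text only states that the emission distributions are log-concave), the means of 90 and 200 bytes echo the YouTube and email packet sizes quoted in this paper, and all other parameters are hypothetical.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def forward_backward(obs, pi, A, means, stds):
    """Return unscaled forward (alpha) and backward (beta) variables."""
    n_states, n_obs = len(pi), len(obs)
    # b[t][i] is the emission density of observation t in state i.
    b = [[gaussian_pdf(o, means[i], stds[i]) for i in range(n_states)] for o in obs]
    alpha = [[0.0] * n_states for _ in range(n_obs)]
    beta = [[1.0] * n_states for _ in range(n_obs)]
    for i in range(n_states):
        alpha[0][i] = pi[i] * b[0][i]
    for t in range(1, n_obs):
        for j in range(n_states):
            alpha[t][j] = b[t][j] * sum(alpha[t - 1][i] * A[i][j] for i in range(n_states))
    for t in range(n_obs - 2, -1, -1):
        for i in range(n_states):
            beta[t][i] = sum(A[i][j] * b[t + 1][j] * beta[t + 1][j] for j in range(n_states))
    return alpha, beta


def likelihood_at(alpha, beta, t):
    """P(O | lambda) evaluated at time t; the value is the same for every t."""
    return sum(a * b for a, b in zip(alpha[t], beta[t]))


# Hypothetical two-state packet-size model (bytes).
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
means, stds = [90.0, 200.0], [15.0, 40.0]
obs = [88.0, 95.0, 210.0, 190.0, 92.0]
alpha, beta = forward_backward(obs, pi, A, means, stds)
likelihoods = [likelihood_at(alpha, beta, t) for t in range(len(obs))]
```

The sketch leaves α and β unscaled for readability. For sequences of hundreds of packets the products underflow, so a practical version would rescale each time step or work with logarithms.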
Test traffic was generated from known sources of YouTube, email, Skype and online game. The overall traffic collected, in terms of bytes, is shown in Table 3.
Delay was calculated for both PS and IPT traffic flows, and their group delay for trained and training data is shown in Fig. 2. It shows that there was initially a considerable delay (gap) between trained and training data, but after eight iterations the two started matching each other. The PS and IPT probability density functions of the four sets of traffic flows (YouTube, email, Skype and game) are shown in Fig. 3 to 6. Fig. 3 shows that the average YouTube packet size is 90 bytes and its IPT is 32 bytes. The variance between PS and IPT validates that they are two independent data sets.
In comparison with YouTube traffic, Fig. 4 shows that the average email packet size is 200 bytes and its IPT mean is almost in the same range as that of YouTube traffic, i.e. 32 bytes. Here too, the variance between PS and IPT validates that they are two independent data sets. Fig. 5 shows the same variation between PS and IPT values for Skype traffic as was observed for YouTube and email. Fig. 6 shows the variation for online game traffic. The training data statistics for these models are shown in Table 4. The training data comprises the first 945 packets of YouTube, email, Skype and online game.
The bar plots of the PS and IPT training data are shown in Fig. 7 and 8. The mean PS value of these applications is almost double the mean IPT value. Similarly, the PS and IPT standard deviations validate the distinct nature of the PS and IPT traffic data.

V. ESTIMATING FLOW PARAMETERS AND RESULTS
For estimating HMM parameters, the Baum-Welch algorithm [21] is an iterative procedure that keeps refining the HMM parameters (π, A, B) until the likelihood converges to a local maximum. The Baum-Welch algorithm seeks to optimize λ via a re-estimated parameter set λ_t = (π_t, A_t, B_t), which satisfies either λ_t = λ or P(O|λ) < P(O|λ_t). It is also represented in the equation below:
P(O|λ) will converge to a local optimal solution, provided that the below condition is fulfilled.
P(O|λ) given in (10) yields the results in terms of HMM parameters.
The above equations represent new sets of estimated parameters learned with the help of the Expectation Maximization algorithm.
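The re-estimation loop can be sketched in a few lines of Python. The fragment below is a textbook Baum-Welch step for a discrete-observation HMM and a single training sequence, given here only as an illustration: the function names, the quantization of observations into three symbols and all numerical values are assumptions and do not reproduce the paper's equations or data.

```python
def forward(obs, pi, A, B):
    """Unscaled forward variables alpha[t][i]."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([B[j][o] * sum(prev[i] * A[i][j] for i in range(n))
                      for j in range(n)])
    return alpha


def backward(obs, A, B):
    """Unscaled backward variables beta[t][i]."""
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                   for i in range(n)]
    return beta


def baum_welch_step(obs, pi, A, B):
    """One EM update; also returns P(O | lambda) under the input parameters."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    p_obs = sum(alpha[-1])
    # gamma[t][i]: probability of being in state i at time t given O.
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(n)] for t in range(T)]
    # xi[t][i][j]: probability of the transition i -> j at time t given O.
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = list(gamma[0])
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(n)]
             for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T)) for k in range(m)]
             for i in range(n)]
    return new_pi, new_A, new_B, p_obs


# Hypothetical sequence of quantized observations (symbols 0, 1, 2).
obs = [0, 0, 1, 0, 2, 2, 1, 2, 0, 0]
pi = [0.5, 0.5]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
history = []
for _ in range(10):
    pi, A, B, p_obs = baum_welch_step(obs, pi, A, B)
    history.append(p_obs)
```

The values collected in `history` never decrease from one iteration to the next, which is the property P(O|λ) ≤ P(O|λ_t) stated above. The procedure reaches a local maximum of the likelihood, so the result depends on the initial parameters.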

A. Traffic Flows Estimation
The Viterbi algorithm was applied to find the most likely state path for the hidden Markov model specified by the state transition matrix (A) and the emission matrix (B). Model parameters were iteratively improved using the Viterbi algorithm. The PS and IPT state transitions, shown in Fig. 9 and 10, were used to optimize the likelihood of each state. These figures indicate that YouTube was the most frequently observed state for both PS and IPT transitions, which matches the actual traffic generated from the different traffic sources. The state transition probabilities computed for YouTube, email, Skype and online game are shown in Fig. 11. It shows that YouTube has a 92.2% probability of staying in the same state, which reflects that the overall traffic is dominated by YouTube and that the YouTube state probability is likely to remain high compared to the other flows.
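The decoding step described above can be illustrated with a compact log-domain Viterbi routine. This Python sketch covers path decoding only, not the iterative parameter refinement; discrete observation symbols are assumed, and the two-state, three-symbol model is a hypothetical toy example rather than the four-application model of this paper.

```python
import math


def _log(p):
    """Logarithm that maps zero probability to minus infinity."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi(obs, pi, A, B):
    """Return the most likely hidden state path and its log probability."""
    n = len(pi)
    # delta[i]: best log probability of any path ending in state i.
    delta = [_log(pi[i]) + _log(B[i][obs[0]]) for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        step, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + _log(A[i][j]))
            step.append(best_i)
            new_delta.append(delta[best_i] + _log(A[best_i][j]) + _log(B[j][o]))
        backpointers.append(step)
        delta = new_delta
    # Backtrack from the best final state.
    state = max(range(n), key=lambda i: delta[i])
    best_logp = delta[state]
    path = [state]
    for step in reversed(backpointers):
        state = step[state]
        path.append(state)
    path.reverse()
    return path, best_logp


# Hypothetical toy model: two hidden states, three observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
path, logp = viterbi([0, 1, 2], pi, A, B)
```

Working with log probabilities avoids the numerical underflow that plain products would cause on long packet sequences.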
The traffic flow identification accuracy results for PS and IPT are shown in Table 5. The traffic identification accuracy for PS was up to 92%, whereas for IPT the achieved accuracy was 87%.
The modeling results of YouTube, email, Skype and game for PS and IPT are shown in Table 6 and 7. Row 1 in Table 6 shows that, for the YouTube application, the accuracy achieved with PS based modeling was 91.93%, whereas 6.5% of the YouTube traffic was classified as email, 1.2% as Skype, and 0.37% as online game. Overall, the accuracy of the PS classifier was up to 91.93% for YouTube, 84.20% for email, 81.25% for Skype and 79.54% for online game. Similarly, in the case of IPT, Table 5 shows that the classifier accuracy was 81% for YouTube, 73.2% for email, 69% for Skype and 72% for online game. For the different flows, we considered traffic in one direction only, which could be one of the reasons that the accuracy for Skype was only 69%. Considering traffic in both directions may improve the accuracy.
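The per-class accuracies quoted above are the diagonal entries of a row-normalized confusion matrix. The short Python sketch below shows that arithmetic. Only the YouTube row (91.93, 6.5, 1.2, 0.37) comes from the text; the count matrix `hypothetical_counts` and the function name are illustrative assumptions, not the contents of Table 6 or 7.

```python
def per_class_accuracy(confusion):
    """Percentage of each true class (row) that was classified correctly."""
    accuracies = []
    for i, row in enumerate(confusion):
        total = sum(row)
        if total <= 0:
            raise ValueError("each class needs at least one sample")
        accuracies.append(100.0 * row[i] / total)
    return accuracies


# Row 1 as reported in the text: YouTube classified as
# YouTube, email, Skype and online game (percent).
youtube_row = [91.93, 6.5, 1.2, 0.37]
youtube_total = sum(youtube_row)

# Hypothetical counts (rows: true class, columns: predicted class),
# in the order YouTube, email, Skype, online game.
hypothetical_counts = [
    [92, 6, 1, 1],
    [10, 84, 4, 2],
    [9, 6, 81, 4],
    [8, 7, 5, 80],
]
accuracies = per_class_accuracy(hypothetical_counts)
```

The reported YouTube row sums to 100%, which confirms that the percentages in that row are normalized by the number of true YouTube samples.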

VI. CONCLUSION
With rapid advancements in the Internet of Things, network resources are no longer unlimited, and bandwidth hungry multimedia applications consume the major part of the available bandwidth. Traffic classification is key to network security solutions and management architectures [22], [23]. In this paper, a novel HMM based modeling technique has been proposed that can classify Internet traffic based on its statistical properties, i.e. PS and IPT. The traffic classification was done using a minimum number of statistical parameters, which reduced the computational complexity and the overall load on network systems. The comparative analysis of PS and IPT shows that the achieved classification accuracy was 92% for the PS based model and 81% for IPT. The achieved accuracy suggests that the proposed modeling framework can be part of a multi traffic classifier system. Moreover, combining PS and IPT could result in better accuracy and can be an area of future work on traffic classification.

Fig. 1. Classifier block diagram.

TABLE II. CONSIDERED TRAFFIC STATISTICS
Columns: App; IPT (dBµ) (mean)

[l] = packet size of mth packet
IPT and PS were assumed to be statistically independent variables. The conditional probability density functions (pdfs) for inter packet time and size are given in (3) and (4).
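Under the independence assumption stated above, the joint density of a packet's size and inter packet time factorizes into the product of the two conditional pdfs, so their logarithms add. The Python sketch below illustrates this with Gaussian densities, which is an assumption since (3) and (4) are not reproduced here. The PS means (90 and 200 bytes) and the IPT mean of 32 follow the values quoted in this paper; the standard deviations and all names are hypothetical, and the IPT value is taken in whatever unit Table 2 uses.

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and std sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def joint_loglik(ps, ipt, ps_params, ipt_params):
    """log p(ps, ipt | class) = log p(ps | class) + log p(ipt | class)."""
    return gaussian_logpdf(ps, *ps_params) + gaussian_logpdf(ipt, *ipt_params)


def most_likely_class(ps, ipt, models):
    """Pick the class whose independent PS and IPT model scores highest."""
    return max(models, key=lambda name: joint_loglik(
        ps, ipt, models[name]["ps"], models[name]["ipt"]))


# (mean, std) pairs; the standard deviations are hypothetical placeholders.
models = {
    "youtube": {"ps": (90.0, 20.0), "ipt": (32.0, 5.0)},
    "email": {"ps": (200.0, 40.0), "ipt": (32.0, 5.0)},
}
small_packet = most_likely_class(95.0, 31.0, models)
large_packet = most_likely_class(210.0, 33.0, models)
```

Because the two classes share the same IPT mean in this example, the decision is driven entirely by packet size, which is consistent with the PS classifier being the more accurate of the two in the reported results.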
The results are shown through a confusion matrix. All correct classifications are shown in italics in the tables below.

ACKNOWLEDGMENT
The authors are grateful to Pakistan Telecommunication Company Limited and the Postgraduate Laboratory of the Electrical Engineering Department, University of Engineering and Technology, Lahore, for allowing the use of their resources for the verification of results and for concluding the research.

TABLE III. TOP FOUR APPLICATIONS TRAFFIC

TABLE IV. TRAINING SET STATISTICS

Fig. 7. PS (packet size) training data.

TABLE V. PS AND IPT ACHIEVED ACCURACY COMPARISON

TABLE VI. CLASSIFICATION RESULTS CONFUSION MATRIX (PS)

TABLE VII. CLASSIFICATION RESULTS CONFUSION MATRIX (IPT)