Towards Multi-Stage Intrusion Detection using IP Flow Records

Traditional network-based intrusion detection systems using deep packet inspection are not feasible for modern high-speed networks due to slow processing and the inability to read encrypted packet content. As an alternative to packet-based intrusion detection, researchers have focused on flow-based intrusion detection techniques. Flow-based intrusion detection systems analyze IP flow records for attack detection. IP flow records contain summarized traffic information. However, flow data is very large in high-speed networks and cannot be processed in real time by the intrusion detection system. In this paper, an efficient multi-stage model for intrusion detection using IP flow records is proposed. The first stage in the model classifies the traffic as normal or malicious. The malicious flows are further analyzed by a second stage, which associates an attack type with malicious IP flows. The proposed multi-stage model is efficient because the majority of IP flows are discarded in the first stage and only malicious flows are examined in detail. We also describe the implementation of our model using machine learning techniques.

Keywords: IP flows; Multi-stage intrusion detection; One-class classification; Multi-class classification


I. INTRODUCTION
Network-based intrusion detection systems (NIDS) analyze network traffic to detect malicious activities. Traditional approaches for intrusion detection scan the complete packet content. This method is termed deep packet inspection (DPI) [18]. However, DPI is difficult to implement when packets are being transferred at gigabit speed. Extensive resources and dedicated hardware infrastructure need to be deployed to perform packet inspection [20]. In most cases, data transmitted over the network is encrypted, and DPI techniques cannot scan the encrypted payload. Another drawback of DPI is the compromise of privacy. Even if the data is not encrypted, performing strong packet filtering on the network traffic might not be permitted due to privacy issues [10].
A relatively new approach for intrusion detection analyzes the communication pattern in the network traffic for abnormal behavior [20]. The communication patterns are extracted from the network in the form of IP flow records. The IP flow records contain aggregate packet information and describe the network traffic in a summarized form. An IP flow is defined as a set of IP packets passing through an observation point in the network during a certain time interval. All packets belonging to a particular flow have a set of common properties [6]. The extraction of flow records from the network consists of two processes: flow export and flow collection [20]. The flow records are exported from the network using flow-enabled devices. Many vendors offer built-in support for flow export in their network switches and routers. The flow collector receives flows from the flow exporter and stores them in a flow database for analysis. A flow exporter can forward flow records to more than one flow collector. Similarly, a flow collector can receive flows from more than one flow exporter.
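The flow definition above can be illustrated with a minimal sketch that groups packets into flow records. It assumes that the "common properties" are the usual 5-tuple (source and destination address, source and destination port, protocol) and that a flow ends after an idle timeout; the `Packet` type, the `aggregate_flows` function, and the 60-second timeout are hypothetical names and values chosen for illustration, not part of any flow export protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    ts: float        # arrival time at the observation point, in seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int    # IP protocol number, e.g. 6 for TCP
    size: int        # packet size in bytes


def aggregate_flows(packets, timeout=60.0):
    """Summarize packets into flow records keyed by the 5-tuple.

    A packet that arrives more than `timeout` seconds after the last
    packet of its flow starts a new flow record (assumed idle timeout).
    """
    active = {}      # 5-tuple -> flow record still being updated
    finished = []    # flow records closed by the idle timeout
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        flow = active.get(key)
        if flow is not None and p.ts - flow["end"] > timeout:
            finished.append(flow)
            flow = None
        if flow is None:
            flow = {"key": key, "start": p.ts, "end": p.ts,
                    "packets": 0, "bytes": 0}
            active[key] = flow
        flow["end"] = p.ts
        flow["packets"] += 1
        flow["bytes"] += p.size
    finished.extend(active.values())
    return finished
```

Each resulting record carries only aggregate counters (packets, bytes, start and end time) and no payload, which is the property the flow-based approach relies on.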
The process of transferring flow records between the flow exporter and collector is defined by a flow export and collection protocol. Different vendors have formulated proprietary flow export and collection protocols. However, Cisco's NetFlow is a common flow export and collection protocol and is supported by almost all major vendors. Due to the increased requirement of IP flow information for network management, the Internet Engineering Task Force (IETF) has standardized the flow export and collection protocol as the IP Flow Information Exchange (IPFIX) protocol [19]. IPFIX is a very flexible protocol and defines around 280 attributes for IP flow records.
IP flow records have a number of applications, including billing, congestion control, traffic analysis, and intrusion detection. Intrusion detection systems that use IP flow records for attack detection are called flow-based IDS. Flow-based IDS have several advantages over DPI-based techniques [12]. The flow records contain aggregate packet data; therefore, fewer resources are required to process the flow data. Flow-based IDS are also not affected by the use of encryption because flow records do not carry any payload. Flow-based techniques only scan the data up to the transport layer, and no confidential information leaves the network [1].
Flow-based intrusion detection is an ongoing research area [20]. This paper proposes a novel multi-stage model for flow-based IDS. The multi-stage model separates malicious flows from normal flows in the first stage. The malicious flows are processed by a second stage, which associates an attack type with the malicious flows. We also give implementation details of our model using machine learning techniques. We suggest the use of one-class and multi-class classification techniques for the first-stage and second-stage intrusion detection processes, respectively. Our future work will include a rigorous evaluation of different one-class and multi-class techniques for flow-based intrusion detection. The best-performing classification techniques will be combined in a multi-stage model for a comprehensive flow-based intrusion detection framework. The multi-stage model will be evaluated on various flow-based intrusion datasets to obtain the performance results.
The organization of the paper is as follows: Section 2 discusses related work in multi-stage intrusion detection systems. The architecture of our proposed model is given in Section 3. Section 4 presents the implementation detail of our model using machine learning algorithms. The conclusion of our work is presented in Section 5.

II. RELATED WORK
The multi-stage detection of network attacks has been applied using two different approaches. The first approach considers a single attack type spanning multiple stages. The stages of an attack include vulnerability scan, weakness exploitation, invasion, control, and spread. Every stage of an attack corresponds to a detection stage in the multi-stage IDS. In [9], a technique for detection of a single type of attack using multi-stage traffic analysis was proposed. Similarly, a multi-stage IDS using a Hidden Markov Model is presented in [16]. Every attack stage is analyzed by detection agents using predefined attack signals. The signals of all attack stages are estimated by a determination stage using a Hidden Markov Model for the final intrusion detection decision. The IDS is evaluated on the DARPA dataset and achieved a detection rate of 90%.
The other method for multi-stage IDS detects a different type of attack in every stage. In [7], a network intrusion detection technique using Learning Vector Quantization (LVQ) was proposed. The authors used multiple stages to detect different types of attack. The technique was evaluated on the DARPA dataset and achieved a very low error rate. A multi-stage filter using enhanced AdaBoost for network intrusion detection is presented in [17]. The technique is evaluated on the DARPA dataset and achieved good results for some attack types. A malware prevention and detection system using a combination of signature-based and anomaly-based IDS is presented in [2]. The signature-based IDS uses general characteristics of attacks for detection. The anomaly-based IDS is implemented using the RIPPER classifier. The signature-based and anomaly-based IDS are implemented in three stages. The first stage classifies the traffic as normal or malicious. The second stage determines the attack type, while the third stage determines the variant of a particular attack type. The technique is evaluated on the NSL-KDD99 dataset and achieved an F1-measure of over 0.97 for different stages.
In [22], a multi-stage detection model using time-slot and flow-based detection is proposed. The time-slot detection stage checks the incoming traffic for obvious traffic characteristics. This stage classifies the traffic into normal, suspicious, and malicious categories. The traffic detected as suspicious is converted into IP flows and forwarded to the flow-based detection stage. The technique is evaluated on the DARPA dataset and achieved a detection rate of 68.4%.
In [3], the authors proposed a real-time multi-stage intrusion detection system using unsupervised learning to improve the detection rate of unknown attacks. The system uses IP flow records for attack detection. The multi-stage model uses two detection engines. The first engine uses sub-space clustering to detect DoS, DDoS, and other attacks. The second detection engine analyzes the relation between attackers to detect the bot-master. The proposed technique focuses on improving the detection rate of unknown attacks through additional flow features.
Our proposed approach differs from the existing work. Unlike most of the techniques, our model uses IP flow records instead of packets for intrusion detection. Our model separates the normal and malicious flows in the first stage and determines the attack type in the second stage. The implementation of our model uses one-class classification at the first stage and multi-class classification at the second stage. The use of one-class classification in a multi-stage model is a novel idea. The next section presents the architecture of our proposed model.

III. ARCHITECTURE OF PROPOSED MODEL
Although flow records contain summarized network traffic information, the flow data can be very large in high-speed networks [14]. Flow monitoring and analysis tools employ packet sampling techniques to obtain a subset of flow records [13], [4]. Furthermore, IPFIX defines around 280 flow attributes. The IDS can also compute additional flow attributes from the base flow attributes to detect different types of network attacks. Large input size and a high-dimensional feature space can overload the IDS. Moreover, most of the traffic in the network is normal rather than malicious. Processing normal as well as malicious traffic in the IDS therefore creates a performance bottleneck.
We propose a multi-stage model for intrusion detection in high-speed networks using IP flow records. Figure II shows the architecture of our proposed approach. The IP flows are collected from the network using a flow-enabled device. These IP flows are passed through an attribute selection step. The multi-stage model uses two stages for attack detection. The first stage selects a minimal set of attributes and determines whether incoming IP flows are normal or malicious. The first stage uses a fast and computationally inexpensive technique for detection. It discards the normal flows and forwards the malicious flows to the second-stage detection process. An initial intrusion alert is also sent to the consolidated intrusion alert module.
The second-stage process performs detailed intrusion detection on the malicious flows. Malicious flows make up a very small share of overall network traffic. Due to the small input size, the second stage can commit additional resources for detailed and accurate detection of an attack type. The second stage analyzes the malicious flows and determines the attack type. The second stage can also use additional flow attributes for precise detection of an attack. If the flows do not belong to any attack class, they are marked as unknown in the detailed intrusion alert. The unknown flows can belong to an unseen attack, or they can be false positives of the first stage. The second stage sends a detailed intrusion alert to the alert module. The alert module raises a consolidated alert combining the alert information received from both detection stages.
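The routing between the two stages described above can be sketched as follows. This is only an illustration under stated assumptions: `run_two_stage`, `toy_stage_one`, and `toy_stage_two` are hypothetical names, the detectors are passed in as plain callables, and the toy thresholds are invented for the example rather than taken from any evaluation.

```python
def run_two_stage(flows, stage_one, stage_two):
    """Route flows through both stages and return consolidated alerts.

    stage_one(flow) -> bool: True if the flow looks malicious.
    stage_two(flow) -> str or None: attack type, or None if no class fits.
    Both callables are hypothetical stand-ins for the real detectors.
    """
    alerts = []
    for flow in flows:
        if not stage_one(flow):
            continue  # normal flow: discarded, never reaches stage two
        attack_type = stage_two(flow)
        alerts.append({
            "flow_id": flow["id"],
            "malicious": True,
            # No matching class: unseen attack or first-stage false positive.
            "attack_type": attack_type if attack_type is not None else "unknown",
        })
    return alerts


# Toy detectors; the attributes and thresholds are invented for illustration.
def toy_stage_one(flow):
    return flow["packets"] > 1000


def toy_stage_two(flow):
    if flow["bytes"] / flow["packets"] < 64:
        return "scan"
    if flow["bytes"] > 1_000_000:
        return "DoS"
    return None
```

Because the second-stage callable runs only on flows that pass the first stage, its cost scales with the number of suspected malicious flows rather than with total traffic.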
Our proposed multi-stage model discards normal flows in the first stage and ensures that only malicious flows are subject to detailed intrusion detection. This increases the efficiency of our model because no second-stage resources are consumed in processing normal flows. Another benefit of our approach is the reduction of false positives. If the malicious flows detected in the first stage contain false positives, the second-stage process does not associate a class type with such flows. The next section gives the implementation detail of our model using machine learning techniques.
The value of f_o is used in a decision function h_o to obtain the classification result. For every unclassified IP flow z_i ∈ Z, if the value of f_o(z_i) is higher than the maliciousness threshold t, the flow is classified as malicious; otherwise, it is classified as normal. The value of the maliciousness threshold t is user-defined.
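The threshold rule above can be illustrated with a small sketch. The paper does not fix a particular one-class technique, so the scorer here is a deliberately simple stand-in: it is fitted on target-class (malicious) flows only and scores a flow by its closeness to their centroid. The names `fit_centroid`, `f_o`, and `h_o`, and the choice of a centroid-distance score, are assumptions made for illustration.

```python
import math


def fit_centroid(training_flows):
    """Mean feature vector of the target-class (malicious) training flows."""
    n = len(training_flows)
    dims = len(training_flows[0])
    return [sum(x[d] for x in training_flows) / n for d in range(dims)]


def f_o(z, centroid):
    """Stand-in output function: a score in (0, 1], higher when z is closer
    to the centroid of the target class."""
    return 1.0 / (1.0 + math.dist(z, centroid))


def h_o(z, centroid, t):
    """Decision function: malicious if the score exceeds threshold t."""
    return "malicious" if f_o(z, centroid) > t else "normal"
```

Raising t makes the first stage stricter, so fewer flows are forwarded to the second stage; lowering it forwards more flows at the cost of more first-stage false positives.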
The malicious flows recognized in the first stage are forwarded to the second stage. The second-stage detection process associates an attack type with the malicious IP flows. Since there can be more than one attack type, we use a multi-class classification technique to classify the IP flows into different attack types [8].
The training set Y contains labeled malicious IP flows for K attack types. The multi-class classifier learns an output function f_mk(y_i) for each of the K attack types using the training set Y, where y_i ∈ Y. The function f_mk gives a confidence score for each of the K attack types.
For every unclassified IP flow z_i ∈ Z, the incoming flow is classified into the attack type for which the function f_mk(z_i) gives the highest confidence score.
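The highest-score rule can be sketched in a few lines. The per-type confidence scores are assumed to be available as a dictionary, and the optional `min_confidence` cutoff for reporting a flow as unknown is an assumption added for illustration; the text does not specify how the unknown case is decided. The function name `classify_attack` is hypothetical.

```python
def classify_attack(scores, min_confidence=None):
    """Pick the attack type with the highest confidence score.

    scores: dict mapping attack type -> confidence score f_mk(z_i).
    min_confidence: optional cutoff (an assumption, not from the paper);
        if the best score falls below it, the flow is reported as unknown.
    """
    best = max(scores, key=scores.get)
    if min_confidence is not None and scores[best] < min_confidence:
        return "unknown"
    return best
```

For example, with scores {"DoS": 0.2, "scan": 0.7, "botnet": 0.1} the flow is assigned to "scan", while the same scores with a cutoff of 0.9 would yield "unknown".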
The classification results of both stages are combined in a consolidated intrusion alert module. The alert module outputs the maliciousness of the flow and the possible attack type in the alert. The information can be used by the security administrator to protect the integrity of the network.
Our future work will explore the application of one-class classification to IP flow records for intrusion detection. In the first step, we will review available one-class classification methods and evaluate them on flow-based intrusion datasets for detection of malicious flows. Techniques used for one-class classification include density estimation, reconstruction methods, and boundary methods. The outcome of this step will determine which one-class classification technique performs best in intrusion detection using IP flow records.
In the second step, various machine learning techniques will be evaluated using flow-based datasets for classification of malicious IP flows into different attack classes. In the third step, we will combine the best-performing one-class and multi-class classification techniques and develop a multi-stage flow-based intrusion detection model. We will use various flow-based datasets [11] and testbeds to evaluate the performance of the proposed intrusion detection system.

V. CONCLUSION
This paper proposes a multi-stage model for intrusion detection using IP flow records. The first stage classifies the IP flow records into normal and malicious classes. The second-stage detection process performs detailed analysis and classifies the flows into different attack types. We also give the implementation detail of our model using one-class and multi-class classification. We conclude that our model is efficient since it discards the majority of the flows in the first stage using a computationally inexpensive algorithm. Only malicious flows are analyzed in detail. The multi-stage detection model also reduces the false positive rate through the application of two different classification techniques.

Figure III shows the implementation of our model using machine learning classification algorithms. The first-stage detection process only detects malicious IP flows, so there is only one target class in the first stage. We propose the use of one-class classification for detection of malicious flows in the first stage. One-class classification techniques learn a model for one target class. They recognize only objects of the target class and reject all other objects. The training set for one-class classification technique also consists of target class