An Adaptive Intrusion Detection Method for Wireless Sensor Networks

Current intrusion detection systems for Wireless Sensor Networks (WSNs) are usually designed to detect a specific form of intrusion or apply to only one type of network structure, which restricts them when facing varied attacks and different network structures. To bridge this gap, we propose a knowledge-based intrusion detection strategy (KBIDS) that detects multiple forms of attacks over different network structures. It builds on the observation that attacks are likely to deviate from normal features and to form differently shaped aggregations in feature space. In the training stage, we first used a modified unsupervised mean shift clustering algorithm to discover clusters in network features. The discovered clusters were then classified as anomalous if they deviated by a certain amount from the normal cluster captured at the initial stage, when no attacks could occur. The training data, combined with a weighted support vector machine, were then used to build the decision function that flags network behaviors. After training, the decision function was updated periodically by merging newly added network features, both to adapt to network variability and to achieve time efficiency. While the network was running, each node captured its status as a feature vector at a fixed interval and forwarded it to the base station, on which the model was deployed and run. In this way, the model works independently of network structure in both detection and deployment. The efficiency and adaptability of the proposed method were evaluated by simulation experiments on QualNet. The simulations were conducted as a full-factorial experiment in which all combinations of three forms of attacks and two types of WSN structures were tested. The results demonstrated that the proposed method outperforms state-of-the-art intrusion detection methods for WSNs in detection accuracy and adaptability to network structure.
Keywords—Wireless sensor network; intrusion detection system; knowledge based detection; clustering algorithm; weighted support vector machine


I. INTRODUCTION
A Wireless Sensor Network (WSN) is usually composed of many randomly distributed tiny wireless sensor nodes that collect and send sensory data in a coordinated way. Compared with traditional wireless networks, inherent advantages such as lower cost and more convenient deployment have greatly extended the application fields of WSNs, e.g., health care monitoring [1], smart homes [2], and military surveillance and reconnaissance [3]. Security is therefore an important requirement for WSNs in these critical fields [4]. Because wireless sensor nodes are usually limited in power supply, computation capability and communication range [5], traditional encryption/decryption techniques, which require an uninterrupted power supply to sustain frequent key management and access control, are unrealistic for WSNs [6]. Thus, establishing an intrusion detection system (IDS) to meet the security requirements of WSNs is essential.
In contrast with wired and ad hoc wireless networks, WSNs are susceptible to various forms of security threats because of their open and unreliable communication channels, dynamic topology and lack of central coordination [7]. In general, an intrusion can consist of a singular attack or of multiple attacks. A singular attack, such as a flooding attack, black hole attack or rushing attack, occurs independently in a WSN during a certain interval. In a flooding attack, a malicious node attempts to overwhelm the processing capacity and energy of sensor nodes, as well as the network bandwidth, by constantly sending a stream of insignificant packets at very short intervals [8]. In a black hole attack, an intruder tampers with packets by advertising itself as the shortest possible path to the destination node, with the result that most packets are forwarded to the intruder [9]. In a rushing attack, the attacker forwards route request (RREQ) packets immediately after receiving them from other nodes, without processing them, which results in high jitter across the entire network [10]. In multiple-attack situations, in which various types of attacks occur simultaneously, the intrusion features may be blurred by the intertwined attacks. For example, if a flooding attack and a black hole attack occur simultaneously, relay nodes that receive the flooding packets may forward them to the black hole attacker instead of continuing to flood them, which makes the traffic appear reasonable and thus conceals the intrusion. Since multiple attacks are more likely to happen than single ones, it would be beneficial to have an intrusion detection solution that is capable of handling multiple attacks [11].
In recent years, many intelligent intrusion detection systems have been developed for WSNs, but most can deal with only a singular form of malicious attack. Athmani et al. [12] protected hierarchical WSNs from black hole attacks by controlling packet transfer between sensor nodes and the base station. Although this lightweight scheme demonstrated a significant improvement in energy saving, it was unable to defend against flooding attacks, which aim to increase packet transfer between sensor nodes [13]. To minimize energy consumption in intrusion detection, Di Sarno and Garofalo [14] proposed a method in which only the energy status of nodes is needed to detect multi-layer flooding attacks. However, this cross-layer approach has not been shown to detect attacks for which energy is irrelevant. For example, in a selective forwarding attack, whether a malicious node forwards the packets of a certain node or not does not significantly affect its energy consumption. Lim and Huie [15] introduced a Hop-by-Hop Cooperative Detection (HCD) method to decrease the probability of forwarding misbehavior while achieving more than 95% packet delivery. However, the paper did not describe how to detect attacks unrelated to forwarding misbehavior, such as flooding attacks. Sarigiannidis et al. [16] presented an expert system, the rule-based anomaly detection system (RADS), based on an ultra-wideband (UWB) ranging-based detection algorithm. It appeared promising for detecting sybil attacks in large-scale WSNs, with a high detection rate and a low false alarm rate, and it requires no cooperation or data sharing between nodes. However, no evidence has been shown that RADS can detect unknown attacks. Obado et al. [17] used the number of hops on the shortest paths between a source node and a destination node as input to a Hidden Markov Model (HMM) Viterbi algorithm to identify wormhole attacks. Although the HMM Viterbi algorithm reduced the power consumption of the sensor nodes, it was unable to recognize other attacks that are path independent, such as flooding and rushing attacks. Although these intrusion detection systems have merits in detection capability or in minimizing resource consumption, their bottleneck is that they can detect only a singular threat.
Researchers have been seeking information that can help detect multiple attacks. According to Butun et al. [7], if a profile representing stochastic network behavior is generated in feature space from the captured network traffic, malicious behaviors against a WSN can cause the profile to deviate from the normal range and form different aggregations. Thus, different forms of attacks are likely to produce differently shaped aggregations in feature space.
In general, WSNs have two types of network structure (topology): hierarchical (cluster) networks and flat networks [18]. In a hierarchical network, nodes are organized into clusters according to their transmission range, and each cluster has a cluster head that is responsible for transmitting information to the base station. In a flat network, all nodes have identical routing functions, i.e., they transmit packets in a multi-hop way [7]. Current intrusion detection systems usually exploit information about the network structure to detect attacks [18]. Shamshirband et al. [19] proposed a cooperative multi-agent fuzzy artificial immune system to detect distributed denial-of-service (DDoS) attacks, in which the sink node and the base station work together to choose the best strategy for discovering an impending attack. However, the authors did not detail how the common nodes and the base station cooperate in a flat network, or how this cooperation is implemented. Based on the observation that the residual energy of nodes around a sinkhole is much lower than that of other nodes when a flat network suffers a sinkhole attack, Shafiei et al. [20] built a geostatistical hazard model and a distributed monitoring method to detect and defend against sinkhole attacks. However, this strategy does not apply to the hierarchical case, because a sinkhole attack launched at a cluster head is very difficult to identify when the residual energy of nodes around the cluster head does not differ significantly between normal and attacked situations. Therefore, although information about the network structure may help form patterns for intrusion detection, it may also restrict the application scope of an IDS [21]. The challenge is to use network structure efficiently without being constrained by it, i.e., to make the IDS independent of network structure.
The aim of this research was to develop a network-structure-independent intrusion detection model for WSNs. The proposed model employs a knowledge-based detection strategy built on the observation that different forms of attacks are likely to produce differently shaped aggregations in feature space. Specifically, we captured network traffic and projected it into feature space as profiles representing stochastic network behaviors; the shapes of the aggregations of these profiles can then serve as an indicator for flagging network behavior as normal or abnormal. To achieve this goal, in the training stage we first used a modified unsupervised mean shift clustering algorithm to discover clusters from the profiles in feature space. The discovered clusters were then classified as anomalous if they deviated by a certain amount from the normal cluster (behavior) captured at the initial stage, when no attacks could occur. The training data, combined with a weighted support vector machine, were then used to build the decision function that flags network behaviors. After training, the decision function was updated periodically by merging newly added network traffic, to mitigate the impact of outliers and noise and to improve detection accuracy. While the network was running, each node captured network traffic as profiles at a fixed interval and forwarded them to the base station, on which the model was deployed and run. In this way, the model works independently of network structure in both detection and deployment.
The rest of this paper is organized as follows. Section 2 briefly describes related work. The proposed model is presented in Section 3. Simulation experiments intended to evaluate the performance of the model are presented in Section 4, and their results in Section 5. Section 6 summarizes the paper and indicates future work.

II. RELATED WORKS
A typical anomaly detection technique identifies as an anomaly any behavior that deviates by a certain amount from normal behavior. Garofalo et al. [22] used decision tree classification and lightweight detection techniques to achieve a trade-off between a high detection rate and energy saving. However, the paper did not detail how to deal with unknown attacks that are not described in the reference dataset. A lightweight IDS was developed by using a wrapper-based feature selection algorithm to remove redundant features and a neural-network-based decision tree to optimize feature selection [23]. Although this detection paradigm increased generalization ability by incorporating neural networks, its ability to identify unseen patterns was limited because the decision function was not updated. A bio-inspired approach, the Watchdog-based Clonal Selection Algorithm (WCSA), was implemented by Nishanthi and Virudhunagar [24]. It was successful in detecting known attacks but failed to detect unknown ones [4]. Although these intrusion detection methods feature energy saving and high detection accuracy, they fail to detect "unknown" attacks. To address this problem, we used an unsupervised data mining method to distinguish anomalies from normal behavior without any prior knowledge. In addition, the decision function was updated periodically to adapt to changes in network features over time and thus increase generalization ability.
Detection accuracy can be improved in two directions: increasing the detection rate and decreasing the false alarm rate. Salmon et al. [25] used a tailored Dendritic Cell Algorithm (DCA), derived from Danger Theory immune-inspired techniques, to categorize input signals: signals that caused damage were regarded as anomalous, while the others were classified as normal. Experimental data showed that DCA has a high detection rate but not a low false alarm rate. An agent-based artificial immune system was developed in [26]. In that method, two types of agents, i.e., dendritic cell agents and T-cell agents, collaborate to compute danger values that serve as indicators of malicious attacks. This scheme achieved a low false alarm rate but still could not obtain a sufficient detection rate [19]. Accordingly, we used a weighted support vector machine to maximize the margin between the normal and anomaly clusters and thus minimize the classification error, which in turn effectively enhanced detection accuracy.

III. THE MODEL
In this model, network traffic is discretized into time slices of fixed length (Fig. 1). In every time slice, each node captures its status as a d-dimensional feature vector and sends it to the base station, where d is the number of feature types (see Table 3 for details). The proposed method is performed at the base station and includes the following four steps (Fig. 2):
1) Preprocessing: Training data are normalized by the min-max normalization method.
2) Training: The normalized training data are grouped into a certain number of clusters by a modified mean shift clustering algorithm. These clusters are eventually merged into two clusters according to the distance between them and the centers of the other clusters. Each feature vector in the training data is tagged as normal or anomaly by comparing it with the normal data and the result of clustering. Further, each feature vector is assigned a weight representing the distance between it and its cluster center. The labeled and weighted training data serve as inputs to a weighted support vector machine that establishes a decision function.
3) Detecting: The testing data are flagged as normal or anomaly by the decision function.
4) Updating: In the testing stage, the feature vectors that have been processed are merged into the training data, and the decision function is rebuilt at a fixed update interval.
The intrusion detection algorithm is deployed on the base station of a WSN, and the other nodes are only responsible for capturing and transmitting their own network status.

A. Preprocessing
To mitigate the effect of extreme values in one or several dimensions on the final results and to speed up the convergence of the algorithm [23], the training data are normalized by the min-max normalization method. Given the training data $X_{tr}$, the minimum and maximum values of each column of $X_{tr}$ are obtained as $\{min_1, \dots, min_d\}$ and $\{max_1, \dots, max_d\}$, respectively. Each feature vector $x_i = (x_{i1}, \dots, x_{id})$ is then normalized by (1):
$$x'_{ij} = \frac{x_{ij} - min_j}{max_j - min_j}, \quad j = 1, \dots, d \tag{1}$$
where $x'_i = (x'_{i1}, \dots, x'_{id})$ is the normalized feature vector.
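As a minimal sketch of the min-max normalization described above, the following Python function scales each feature column to [0, 1]. The function name and the handling of constant columns (mapped to 0.0) are assumptions made for illustration and are not specified in the paper.

```python
def min_max_normalize(rows):
    """Scale every column of `rows` (a list of feature vectors) to [0, 1].

    Returns the normalized rows together with the per-column minima and
    maxima, so that later feature vectors can be scaled the same way.
    """
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # Assumption: a constant column carries no information, map it to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, mins, maxs)
        ])
    return normalized, mins, maxs
```

Returning the minima and maxima alongside the data is a design choice that lets the testing stage reuse the scaling learned from the training data.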

B. Training
In the normal situation, the feature vectors (points) should aggregate in a certain region of the feature space, whereas under attack they deviate from this normal region. From the perspective of feature space, different forms of attacks may differ significantly in their degree of deviation from the normal region and thus generate several aggregations. The unsupervised mean shift clustering algorithm (MSCA) [27] can effectively discover the different concentrated regions in the feature space and form clusters of arbitrary shape, which represent the aggregations of features resulting from attacks. In this step, the MSCA is employed to cluster the training data, flag them as normal or anomaly, and feed them into the classifier for training.
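Given $n_r$ data points $x_i$, $i = 1, \dots, n_r$, in a d-dimensional space $R^d$, the initial feature vector $x_1$ is continuously shifted by adding a shifting vector $m_h(x_1)$, which is calculated by (2) (note that each shift of $x_1$ changes $m_h(x_1)$). $x_1$ stops shifting when $\|m_h(x_1)\|$ falls below a certain threshold.
$$m_h(x) = \frac{\sum_{i=1}^{n_r} x_i \, g\left(\left\|\frac{x - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n_r} g\left(\left\|\frac{x - x_i}{h}\right\|^2\right)} - x \tag{2}$$
where $g(x) = -k'(x)$; $k'(x)$ and $h$ are, respectively, the derivative and the bandwidth of the kernel profile $k(x)$, which is defined through a multivariate normal kernel function $K(x)$:
$$K(x) = c \, k(\|x\|^2), \quad k(x) = \exp\left(-\tfrac{1}{2}x\right) \tag{3}$$
where $c$ is a normalization constant ensuring that $K(x)$ integrates to 1. During the shifting, the points that $x_1$ has traveled through are regarded as belonging to the same cluster, and the last one is regarded as the cluster center $c_1$. In the standard MSCA, all data points do the same work as $x_1$ did. However, to speed up convergence and improve the precision of the standard MSCA, a modified version that records the shifting track of the feature vectors is given as follows. Each point $p_1^{(t)}$ on the track of $x_1$ and its shifting distance $r_1^{(t)} = \|m_h(p_1^{(t)})\|$ are recorded to form a similar set $S_1$:
$$S_1 = \{(p_1^{(0)}, r_1^{(0)}), (p_1^{(1)}, r_1^{(1)}), \dots\} \tag{4}$$
For all feature vectors we have the similar sets:
$$S = \{S_1, S_2, \dots, S_k\} \tag{5}$$
The cluster centers are defined as:
$$C = \{c_1, c_2, \dots, c_k\} \tag{6}$$
Based on the track of $x_1$, each subsequent feature vector $x_i$ is clustered by the following two steps:
Step 1: The Euclidean distance $D(x_i, p)$ between $x_i$ and each point $p$ in each similar set $S_j$ is calculated by (7). If $D(x_i, p)$ is less than the shifting distance $r$ recorded for $p$, the tuple $(x_i, D(x_i, p))$ is added to $S_j$; otherwise, proceed to Step 2.
$$D(x_i, p) = \sqrt{\sum_{l=1}^{d} (x_{il} - p_l)^2} \tag{7}$$
Step 2: $x_i$ does the same work as $x_1$ did to generate a new similar set $S_i$ and a cluster center $c_i$. If $C$ has a cluster center $c_j$ that is equal to $c_i$, then $S_i$ is merged into $S_j$: $S_j = S_j \cup S_i$; otherwise, $S_i$ and $c_i$ are inserted into $S$ and $C$, respectively.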
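The shifting procedure described above can be sketched as follows. This is an illustrative implementation of standard mean shift with a Gaussian (normal) kernel that also records the track of the shifted point together with its shifting distances; it does not reproduce the paper's full modified algorithm (the similar-set lookup of Step 1 is omitted). The function names, the tolerance `tol` and the iteration cap `max_iter` are assumptions.

```python
import math

def shift_vector(x, points, h):
    """Mean shift vector at x for a Gaussian kernel with bandwidth h."""
    weighted_sum = [0.0] * len(x)
    weight_total = 0.0
    for p in points:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
        # Gaussian profile: weight decays with the squared scaled distance.
        w = math.exp(-sq_dist / (2.0 * h * h))
        weight_total += w
        for k, value in enumerate(p):
            weighted_sum[k] += w * value
    # Weighted mean of the data minus the current position.
    return [s / weight_total - a for s, a in zip(weighted_sum, x)]

def shift_to_center(x, points, h, tol=1e-6, max_iter=500):
    """Shift x until the step length drops below tol.

    Returns the final position (the cluster center) and the track, a list
    of (position, shifting_distance) tuples visited on the way.
    """
    x = list(x)
    track = []
    for _ in range(max_iter):
        m = shift_vector(x, points, h)
        step = math.sqrt(sum(v * v for v in m))
        track.append((list(x), step))
        if step < tol:
            break
        x = [a + b for a, b in zip(x, m)]
    return x, track
```

Points whose shifting ends at (numerically) the same center belong to the same cluster; the recorded track is what the modified algorithm reuses to avoid shifting every point from scratch.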
After the two steps are completed, the training data are grouped into several clusters. The cluster that includes the normal data is regarded as the normal cluster. The cluster whose center is farthest from the normal cluster is classified as the abnormal cluster. The remaining clusters are then merged into either the normal or the abnormal cluster, whichever is relatively closer. When a feature vector is received after time n, it is immediately assigned to the nearest cluster and can thus be flagged as normal or anomaly without extra training. In addition, to mitigate the effect of outliers or noise in the clustering process on the final decision and to improve detection accuracy, a weighted support vector machine (WSVM) is introduced to build a decision function (i.e., the optimal-margin hyperplane classifier) from the clustering results.
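The merging rule described above can be sketched as follows, assuming the cluster centers are already known and that the index of the cluster containing the normal data is given. The function name, the label strings and the tie-breaking rule (ties go to the normal cluster) are illustrative assumptions.

```python
import math

def label_clusters(centers, normal_idx):
    """Label each cluster center as 'normal' or 'anomaly'.

    The cluster farthest from the normal cluster becomes the abnormal
    cluster; every other cluster takes the label of the nearer of the two.
    """
    normal_center = centers[normal_idx]
    abnormal_idx = max(
        range(len(centers)),
        key=lambda i: math.dist(centers[i], normal_center),
    )
    abnormal_center = centers[abnormal_idx]
    labels = []
    for i, center in enumerate(centers):
        if i == normal_idx:
            labels.append("normal")
        elif i == abnormal_idx:
            labels.append("anomaly")
        elif math.dist(center, normal_center) <= math.dist(center, abnormal_center):
            # Assumption: a tie is resolved in favor of the normal cluster.
            labels.append("normal")
        else:
            labels.append("anomaly")
    return labels
```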
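Given $t_r$ data points $\{(x_i, y_i)\}_{i=1}^{t_r}$, where $y_i \in \{-1, +1\}$ denotes the class label (i.e., normal or anomaly), the classification is defined as:
$$f(x) = \mathrm{sign}(w \cdot \phi(x) + b) \tag{8}$$
where $w$ is a weight vector and $b$ is the bias. To optimize $w$ and $b$, the WSVM requires the solution of the following optimization problem:
$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{t_r} s_i \xi_i \quad \text{s.t.} \quad y_i (w \cdot \phi(x_i) + b) \ge 1 - \xi_i, \; \xi_i \ge 0 \tag{9}$$
where $C$ is the penalty factor for misclassification, $\xi_i$ is the slack variable that controls noise, and each vector $x_i$ is mapped into a higher-dimensional space by the function $\phi$. $s_i$ is a weight that represents the relative contribution of $x_i$ to the decision function. The weight $s_i$ assigned to data point $x_i$ is calculated (Eq. 10) from the distance between $x_i$ and $c_i$, where $c_i$ is the cluster center of $x_i$. Unlike the standard SVM, where all training data points in one class are equally important, the WSVM reduces the effect of outliers and noise by setting different weights [28]. According to Lagrangian duality theory, the WSVM optimization problem in (9) is converted to a quadratic programming problem:
$$\max_{\alpha} \; \sum_{i=1}^{t_r} \alpha_i - \frac{1}{2} \sum_{i=1}^{t_r} \sum_{j=1}^{t_r} \alpha_i \alpha_j y_i y_j \, \phi(x_i) \cdot \phi(x_j) \quad \text{s.t.} \quad \sum_{i=1}^{t_r} \alpha_i y_i = 0, \; 0 \le \alpha_i \le C s_i \tag{11}$$
where $\alpha_i$ is the Lagrangian parameter. The Karush-Kuhn-Tucker conditions of the SVM are defined as:
$$\alpha_i \left[ y_i (w \cdot \phi(x_i) + b) - 1 + \xi_i \right] = 0, \quad (C s_i - \alpha_i) \xi_i = 0 \tag{12}$$
Finally, the optimal values of $w$ and $b$ are obtained by:
$$w^* = \sum_{i=1}^{t_r} \alpha_i y_i \phi(x_i), \quad b^* = y_j - w^* \cdot \phi(x_j) \;\; \text{for any } j \text{ with } 0 < \alpha_j < C s_j \tag{13}$$
Hence, the decision function is obtained by:
$$f(x) = \mathrm{sign}\left( \sum_{i=1}^{t_r} \alpha_i y_i \, \phi(x_i) \cdot \phi(x) + b^* \right) \tag{14}$$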
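To illustrate how per-sample weights enter the soft-margin objective, the sketch below trains a linear weighted SVM by full-batch subgradient descent on the primal objective (squared norm of w plus the weighted hinge losses). This is a simplification chosen for readability, not the kernelized quadratic program used in the paper; the function names, the learning rate, the number of epochs and the example weights are all assumptions.

```python
def train_weighted_linear_svm(X, y, s, C=10.0, lr=0.0005, epochs=4000):
    """Minimize 0.5*||w||^2 + C * sum_i s_i * max(0, 1 - y_i*(w.x_i + b)).

    X: feature vectors, y: labels in {-1, +1}, s: per-sample weights.
    Samples with a small weight s_i are penalized less when misclassified.
    """
    dim = len(X[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi, si in zip(X, y, s):
            margin = yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b)
            if margin < 1.0:  # hinge loss is active for this sample
                for k in range(dim):
                    grad_w[k] -= C * si * yi * xi[k]
                grad_b -= C * si * yi
        w = [wk - lr * g for wk, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 or -1 according to the sign of w.x + b."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b >= 0 else -1
```

In practice, a library solver that accepts per-sample weights would normally replace this loop; the sketch only shows where the weights act in the objective.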

C. Detecting
In this step, each feature vector in the testing data is flagged as normal or anomaly by the decision function (14). The feature vector is then merged into either the normal or the anomaly cluster according to that decision, and it is later used to update the decision function.
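Evaluating a kernel decision function of the kind referred to in (14) can be sketched as follows. The paper does not state which kernel is used, so the RBF kernel, the parameter `gamma` and the mapping of +1 to normal and -1 to anomaly are assumptions; the support vectors, labels, multipliers and bias are taken as given inputs.

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF kernel exp(-gamma * ||a - b||^2) (assumed kernel choice)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def decide(x, support_vectors, labels, alphas, bias, gamma=1.0):
    """Return +1 (assumed: normal) or -1 (assumed: anomaly) for x."""
    score = bias
    for sv, y, alpha in zip(support_vectors, labels, alphas):
        score += alpha * y * rbf_kernel(sv, x, gamma)
    return 1 if score >= 0 else -1
```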

D. Updating
To cope with possible changes in network features over time, the decision function is updated at a fixed time interval. In this step, the cluster centers are re-evaluated by the MSCA with the updated training data, and the weight of each feature vector is adjusted as well. The decision function is then updated accordingly by the WSVM.
The pseudo code of KBIDS is shown in Algorithm 1.

IV. SIMULATION EXPERIMENTS
To evaluate the performance of KBIDS, eight experiment scenarios (Table 1) were simulated in QualNet on a PC with an Intel(R) Core(TM) i7-4470k CPU at 3.50 GHz and 8 GB of memory (RAM). In the flat network, we randomly deployed 30 sensor nodes in a region of 1000 m x 1000 m and placed the base station at the center of the region, as shown in Fig. 3(a). Ten percent of the nodes were designated as malicious nodes that performed attacks. Compared with the flat network, the hierarchical protocol is more suitable for large-scale networks because it reduces node energy consumption and communication bandwidth [29]. Hence, the hierarchical network had 100 nodes, 10 of which were attackers. The relationship between nodes in the hierarchical network is given in Fig. 3(b). For both types of network structure, the MAC layer and the routing protocol of all devices were IEEE 802.11 and Ad hoc On-demand Distance Vector (AODV) routing, respectively. The simulation time for each experiment scenario was 10000 seconds. The value of time n was 100 seconds and that of m was 5000 seconds. Network traffic was simulated as constant bit rate (CBR) flows with 512-byte packets. Node mobility followed the random waypoint (RWP) model with a pause time of 5 seconds and a maximum speed of 10 m/s. Energy consumption statistics were collected with the MicaZ radio energy model [30]. The key parameters of the simulation experiments are presented in Table 2.
In each singular attack scenario, only one of the three attacks, i.e., the black hole attack, the flooding attack or the rushing attack, was launched between 4000 and 7000 seconds. In the multiple-attack scenarios, by contrast, the three attacks were launched simultaneously between 4000 and 7000 seconds. Each scenario was replicated five times with different initial positions of the sensor nodes. A feature vector (Table 3), which is the basic unit of information processing in KBIDS, was constructed by capturing 13 types of features representing node status [31].
To test the efficiency of KBIDS, detection rate and false alarm rate were used as assessment indices [32]. Detection rate was calculated as the percentage of successfully detected anomalies among all anomalies. False alarm rate was calculated as the number of false alarms divided by the total number of normal data [18].
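The two indices defined above can be computed as in the following sketch, which returns fractions (multiply by 100 for percentages). The function name, the label strings and the convention of returning 0.0 when a class is absent are assumptions for illustration.

```python
def detection_metrics(y_true, y_pred, anomaly="anomaly"):
    """Return (detection_rate, false_alarm_rate) as fractions in [0, 1]."""
    detected = sum(1 for t, p in zip(y_true, y_pred) if t == anomaly and p == anomaly)
    false_alarms = sum(1 for t, p in zip(y_true, y_pred) if t != anomaly and p == anomaly)
    total_anomalies = sum(1 for t in y_true if t == anomaly)
    total_normal = len(y_true) - total_anomalies
    # Assumption: report 0.0 when the corresponding class does not occur.
    detection_rate = detected / total_anomalies if total_anomalies else 0.0
    false_alarm_rate = false_alarms / total_normal if total_normal else 0.0
    return detection_rate, false_alarm_rate
```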

V. RESULTS
Simulation results of KBIDS were compared with those of mainstream intrusion detection methods, namely the PCA-based centralized approach (PCACID) [33], K-Means, Mean Shift, Decision Trees (DT) and Logistic Regression (LR), to evaluate detection efficiency and adaptability to network structure (Fig. 4). Across the eight experiment scenarios, the average detection rate and false alarm rate of KBIDS were 97.854% and 1.875%, with small standard deviations of 0.922% and 1.069%, respectively, a clear advantage over the other methods. Although K-Means, Decision Trees and Logistic Regression obtained an average detection rate above 92% and an average false alarm rate below 3% under the same conditions, their results fluctuated more, with fluctuations of 4.831% and 4.291%, 1.327% and 1.348%, and 3.922% and 2.547%, respectively. This exhibited their weak capability in dealing with various forms of attacks. In some cases, PCACID and Mean Shift achieved a lower false alarm rate than KBIDS. However, they failed to detect anomalies consistently in all scenarios. Overall, KBIDS showed advantages over the other mainstream detection algorithms in terms of detection rate and false alarm rate. In addition, KBIDS achieved stable performance in all scenarios, particularly across scenarios with different network structures, which is strong evidence that KBIDS is independent of network structure.
As one of the key parameters, the length of the time slice may be an important factor affecting results and network performance. Fig. 5 shows the relationship between the time slice and both detection accuracy and energy consumption when the update interval was 300 s. As the time slice increased, the average detection rate and the energy consumption of data transmission decreased gradually, while the average false alarm rate rose steadily. Overall, the time slice used in the experiments achieved a good trade-off between detection accuracy and energy consumption.
Since KBIDS employs a constant updating strategy to adapt to network variability over time, the impact of the update interval on the average running time (ART, i.e., the energy cost) of the updating step and on the average detection accuracy cannot be ignored. Although a shorter update interval (20 s) achieved a high detection rate and a low false alarm rate (Fig. 6), its cost, ART = 62.16 s, was much larger than the ART = 1.34 s obtained at another tested interval. Unsurprisingly, the average detection rate decreased and the average false alarm rate increased as the update interval increased, although the trend eventually stabilized. Overall, the selected update interval achieved a good balance between detection accuracy and energy cost, with an ART of about 3.83 s, which was much less than the time slice.
In this algorithm, the computational complexity of the clustering step depends on the average number of shifts and on the number of feature vectors, that of building the WSVM additionally depends on the dimension of the feature vector, and that of the updating step scales with the number of updates M. Overall, the computational complexity of the algorithm is not significantly affected by the number of dimensions of the feature vector. We acknowledge that as the number of feature vectors increases, the time cost of the updating step inevitably increases; this step is the main source of energy consumption and delays the decision process. We used two strategies to address this problem. First, decisions can be made immediately after the learning stage, once the WSVM has been established, so no full training process is needed for each decision. Second, old feature vectors are removed from the training data when new feature vectors arrive, so that the size of the training data stays constant.
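The second strategy, keeping the training data at a constant size, can be sketched with a fixed-capacity buffer that discards the oldest feature vectors as new ones arrive. The class name, method names and the capacity value are illustrative assumptions; the paper does not specify how the removal is implemented.

```python
from collections import deque

class SlidingTrainingSet:
    """Fixed-size training set: adding beyond capacity drops the oldest item."""

    def __init__(self, capacity):
        # A deque with maxlen silently evicts from the left when full.
        self._items = deque(maxlen=capacity)

    def add(self, feature_vector, label):
        self._items.append((feature_vector, label))

    def snapshot(self):
        """Return the current training data, oldest first."""
        return list(self._items)

    def __len__(self):
        return len(self._items)
```

At each update, the decision function would be rebuilt from `snapshot()`, so the cost of retraining stays bounded regardless of how long the network has been running.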

VI. CONCLUSION
In this research, we developed KBIDS, a network-structure-independent intrusion detection model for WSNs. KBIDS employs a knowledge-based detection strategy built on the observation that attacks are likely to deviate from normal features and to form differently shaped aggregations in feature space. In KBIDS, an unsupervised data mining method was used to distinguish anomalies from normal behavior without any prior knowledge. In addition, the decision function was updated periodically to adapt to changes in network features over time and thus increase generalization ability. Further, a weighted support vector machine was used to maximize the margin between the normal and anomaly clusters and thus minimize the classification error, which in turn effectively enhanced detection accuracy. While the network was running, each node captured its status as a feature vector at a fixed interval and forwarded it to its neighbor, and the base station ran the model to detect attacks. In this way, the model works independently of network structure in both detection and deployment. Simulation experiments conducted on the QualNet platform demonstrated that our model outperformed other mainstream algorithms in terms of detection efficiency, stability across different network structures and computational complexity. Sensitivity analysis gave insight into how model performance is affected by some key parameters and thus indicates directions for future improvement.

Fig. 1. Network traffic. Each node captures its status at regular time interval t as a feature vector. Network traffic is divided into training data and testing data based on time boundary m. The feature vectors extracted in a short period of time [0, n] after WSN initialization are regarded as normal data.

Fig. 2. The schematic diagram of processing steps in KBIDS.
Definition 1: Network traffic is defined as a matrix that contains all feature vectors recorded in the interval [0, t], where N is the total number of nodes.
Definition 2: Normal data is defined as a matrix that contains all feature vectors captured at the initial stage [0, n], where no attacks could occur at all.
Definition 3: Training data is defined as a matrix that contains all feature vectors captured at the training stage [0, m].
Definition 4: Testing data is defined as a matrix that contains all feature vectors captured at the testing stage (m, t).

Fig. 3. QualNet simulation of two types of network structure in WSN. (a) Flat network, in which nodes transmit data in a multi-hop way. (b) Hierarchical network, in which nodes transmit data to the base station in a hierarchical way.

Fig. 4. Comparison of detection rate (a) and false alarm rate (b) between KBIDS and other mainstream detection algorithms in eight experiment scenarios.

TABLE II. WSN CONFIGURATION

TABLE III
numDataRecved: Number of data packets received as the destination of the data
numHops: Aggregate sum of the hop counts of all routes added to the route cache