A Novel Adaptive Grey Verhulst Model for Network Security Situation Prediction

Recently, researchers have shown increased interest in predicting the incoming security situation of an organization's network. Many prediction models have been produced for this purpose, but many of them have limitations in practical applications. In addition, the literature shows that far too little attention has been paid to using the grey Verhulst model to predict the network security situation, although it has demonstrated satisfactory results in other fields. Considering the nature of intrusion attacks and the shortcomings of the traditional grey Verhulst model, this paper puts forward an adaptive grey Verhulst model with an adjustable generation sequence to improve prediction accuracy. The proposed model combines the Trapezoidal rule and Simpson's 1/3 rule to obtain the background value in the grey differential equation, which directly influences the forecast result. To verify the performance of the proposed model, the benchmark datasets DARPA 1999 and 2000 have been used. The results show that the proposed adaptive grey Verhulst model surpassed GM(1,1) and the traditional grey Verhulst model in forecasting the incoming security situation in a network.

Keywords: Grey Theory; Network Security Situation Prediction; Adaptive Grey Verhulst Model; Adjustable Generation Sequence; Prediction Accuracy


I. INTRODUCTION
The Internet has become an indispensable necessity in our lives, providing services such as information sharing, communication and social interaction. The accelerating modernization of countries and the proliferation of mobile devices boosted the number of Internet users to 3.17 billion in 2015 [1]. The growth of the Internet is further driven by new technologies such as cloud computing and the Internet of Things (IoT). However, the immense popularity of the Internet and the prevalent use of online services have made the Internet a breeding ground for malware and cyber criminals. New security challenges are emerging while people enjoy sharing their resources without borders. In 2014, Symantec observed increases of 23% in data breaches and 40% in phishing attacks compared to the previous year [2]. This alarming situation brings serious challenges to network security worldwide.
Prevention is better than detection and recovery. Due to the rising number of threats, network security communities now want to know the incoming security situation in their networks before taking any precaution. Unfortunately, countering attacks in an Intrusion Prevention System (IPS) with a complete list of responses is insufficient. Surprisingly, a 2013 study by the University of South Wales on nine big-brand IPS systems found that seven of them failed to detect and prevent up to 49% of attacks that target vulnerabilities, especially in web-based applications [3]. Therefore, predicting the incoming security situation of an entire network is desirable, as it helps an IPS act more intelligently in preventing a problem from growing and in returning the system to a healthy mode. Coincidentally, the ability to predict the security situation was regarded as one of the main components of situation awareness when Endsley introduced the concept [4]. The idea was first adapted to cyberspace by Tim Bass [5] as network security situation awareness (NSSA) with three hierarchical phases: event detection, current security situation assessment and future security situation prediction.
Recently, governments, enterprises and other stakeholders have started to adopt the concept of NSSA and to seek appropriate strategies, especially for predicting the incoming security situation in their networks before any incident occurs. For instance, in Germany, the National Cyber Response Centre is responsible for alerting the crisis management staff whenever the cyber security situation reaches the level of an imminent or already occurred crisis [6]. Meanwhile, in Malaysia, the National Cyber Security Policy states the need to develop effective cyber security incident reporting mechanisms that are capable of disseminating vulnerability advisories and threat warnings in a timely manner, in order to strengthen the National Computer Emergency Response Teams (CERTs) in monitoring the situation of critical national information infrastructure [7]. The strategic planning efforts of these countries clearly indicate that future network security situation prediction is very much in demand at the top level of cyber security strategy.
The rest of this paper is structured as follows. The authors first discuss some limitations of existing prediction models. Then, the authors present a novel adaptive grey Verhulst model, together with the approach for calculating its adjustable generation sequence. Next, the authors demonstrate the grey prediction models with the benchmark datasets of the Defense Advanced Research Projects Agency (DARPA), 1999 and 2000 (LLS DDOS 1.0 and LLS DDOS 2.0.2). To verify the performance of the proposed model, the authors compare the prediction accuracy of our model with that of the traditional GM(1,1) and grey Verhulst models in terms of Mean Absolute Percentage Error (MAPE) and Root Mean Square Deviation (RMSD). Finally, the authors summarize the work with a conclusion.

II. LIMITATION OF EXISTING PREDICTION MODELS
A considerable amount of work has been published on designing network security situation prediction models. These studies can be categorized into three groups: Machine Learning, Markov Model and Grey Theory [8]. Prediction based on machine learning, such as neural networks and support vector machines, is commonly used in situation prediction due to its high convergence rate and strong fault tolerance. However, it requires a large amount of training data to obtain appropriate parameters and establish self-learning neurons. Furthermore, the method is unsuitable for small-scale data, as less input information slows the convergence. The Markov model, on the other hand, is also used to perform prediction on various time series, such as series of network situations. Nevertheless, the model is complex and difficult to build, because assumptions must be made about all possible states and transitions, especially in a network that is highly heterogeneous in nature. Since data pertaining to the network situation may be inconsistent and incomplete, grey theory, especially the First-order One-variable grey model (GM(1,1)), has been widely used to provide better short-term forecasts from small samples without any training. Regrettably, the method is limited to linear time series and is not suitable for non-stationary random sequences. Moreover, mean-based sequence generation is suitable only for small time intervals, and it reduces model precision by introducing delay error. Grey Verhulst, another small-sample prediction model in grey theory, is able to forecast situations whose data sequence has a single peak. However, the model fails to take some related influencing factors into account, which degrades its performance [9]. Observing the chronology of an intrusion attack as illustrated in Figure 1, the authors argue that an adaptive Grey Verhulst model with an adjustable generation sequence is best suited to predicting the incoming network security situation, which behaves as a nonlinear time series.

III. PROPOSED ADAPTIVE GREY VERHULST MODEL
GM(1,1) and Grey Verhulst are models for dealing with indeterminate and incomplete systems, and their strength lies in small samples. Nonetheless, both suffer from overshoots caused by non-monotonic time series data [10]. In addition, the generated sequence introduces advance or delay errors into the prediction, which reduce model precision [11]. Hence, this paper attempts to show that adaptive determination of the grey parameters in the grey Verhulst model is able to guarantee the precision. The adjustable generation sequence in this adaptive grey Verhulst model is suitable not only for forecasting a stochastic time series such as the incoming network security situation, but also for handling the multiple-peak situation variation that is inherent in network behavior [8].
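In order to predict the incoming network security situation, a sequence of current and historical assessments of the network security situation is used as input to the model. Figure 2 depicts the process flow of the adaptive grey Verhulst model. First, the sequence of network security situation assessments, $X^{(0)} = \{x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n)\}$, is channeled into the model, and a new sequence of accumulated data, $X^{(1)}$, is built by applying the 1-Accumulated Generating Operation (1-AGO):

$$x^{(1)}(t) = \sum_{i=1}^{t} x^{(0)}(i), \qquad t = 1, 2, \ldots, n$$

The background value $z^{(1)}(t)$ is then generated from $X^{(1)}$ (Section IV), and the grey Verhulst equation is formed as

$$x^{(0)}(t) + a\,z^{(1)}(t) = b\left(z^{(1)}(t)\right)^{2}$$

where $a$ is the development coefficient, whose size reflects the growth rate of the sequence, and $b$ is the grey input. In order to find the values of $a$ and $b$, the equations for $t = 2, \ldots, n$ are rearranged into the matrix form $Y = B\hat{A}$, with $\hat{A} = [a, b]^{T}$ and

$$B = \begin{bmatrix} -z^{(1)}(2) & \left(z^{(1)}(2)\right)^{2} \\ -z^{(1)}(3) & \left(z^{(1)}(3)\right)^{2} \\ \vdots & \vdots \\ -z^{(1)}(n) & \left(z^{(1)}(n)\right)^{2} \end{bmatrix}, \qquad Y = \begin{bmatrix} x^{(0)}(2) \\ x^{(0)}(3) \\ \vdots \\ x^{(0)}(n) \end{bmatrix}$$

The values of $a$ and $b$ are obtained by the least squares method:

$$\hat{A} = \left(B^{T}B\right)^{-1}B^{T}Y$$

With the values of $a$ and $b$, the predicted time response sequence of the grey Verhulst model, $\hat{x}^{(1)}(t+1)$, can be determined by using the formula below.

$$\hat{x}^{(1)}(t+1) = \frac{a\,x^{(0)}(1)}{b\,x^{(0)}(1) + \left(a - b\,x^{(0)}(1)\right)e^{at}}$$

The predicted situation value is then restored as $\hat{x}^{(0)}(t+1) = \hat{x}^{(1)}(t+1) - \hat{x}^{(1)}(t)$.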
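The steps of the traditional grey Verhulst model (1-AGO, background value, least squares estimation of $a$ and $b$, time response) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `grey_verhulst_fit` and `grey_verhulst_predict` are hypothetical, NumPy is assumed to be available, and a single fixed weight `alpha` is used for the background value rather than the adaptive scheme proposed in Section IV.

```python
import numpy as np

def grey_verhulst_fit(x0, alpha=0.5):
    """Estimate (a, b) from the raw sequence x0 by least squares.

    alpha is the background-value weight (assumption: one fixed value
    for all time steps, 0.5 as in the traditional model).
    """
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                            # 1-AGO sequence
    z1 = alpha * x1[1:] + (1 - alpha) * x1[:-1]   # background values, t = 2..n
    B = np.column_stack((-z1, z1 ** 2))           # rows: [-z(t), z(t)^2]
    Y = x0[1:]                                    # x0(t), t = 2..n
    coef, *_ = np.linalg.lstsq(B, Y, rcond=None)  # least squares solution
    a, b = coef
    return a, b

def grey_verhulst_predict(x0, steps=1, alpha=0.5):
    """Return fitted values for x0 followed by `steps` forecasts."""
    x0 = np.asarray(x0, dtype=float)
    a, b = grey_verhulst_fit(x0, alpha)
    k = np.arange(len(x0) + steps)                # k = 0 corresponds to t = 1
    # Time response of the whitening equation dx/dt + a*x = b*x^2
    x1_hat = a * x0[0] / (b * x0[0] + (a - b * x0[0]) * np.exp(a * k))
    x0_hat = np.empty_like(x1_hat)
    x0_hat[0] = x1_hat[0]                         # equals x0[0] by construction
    x0_hat[1:] = np.diff(x1_hat)                  # inverse AGO
    return x0_hat
```

Calling `grey_verhulst_predict(history, steps=1)` returns the fitted sequence followed by one forecast value; replacing the fixed `alpha` with per-step weights is where an adaptive background value would enter.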

IV. ADAPTIVE BACKGROUND VALUE GENERATION
The background value, $z^{(1)}(t)$, is a crucial factor that influences the adoption of grey theories and their forecasting results. The value of the developing coefficient, $a$, and the precision of the model are affected by different background values [12].
In the traditional grey Verhulst model, the grey differential equation can be written as

$$x^{(0)}(t) + a\,z^{(1)}(t) = b\left(z^{(1)}(t)\right)^{2} \tag{1}$$

where

$$z^{(1)}(t) = \alpha\,x^{(1)}(t) + (1-\alpha)\,x^{(1)}(t-1)$$

As the differential equation shows, the background value directly influences the precision of the grey Verhulst model. Its value is determined by $\alpha$, which ranges over $0 \le \alpha \le 1$. In traditional grey theories, $\alpha$ is always set to 0.5 to give equal importance to each data point [10,13,14]. Ignoring the characteristics of the data in this way produces more prediction errors [12,15]. Thus, to improve the performance of grey theories, especially grey Verhulst, the error term resulting from background value generation has to be eliminated. In other words, finding a suitable background value for the model is essential to improving prediction accuracy. Based on [16], the most suitable background value should be located between $x^{(1)}(t-1)$ and $x^{(1)}(t)$, as illustrated in figure 3. Since the developing coefficient directly affects the background value, newer data should be emphasized by assigning a larger value of $\alpha$ [12]. In fact, setting the value of $\alpha$ is a process of searching for the optimal solution within the value space. The time series dataset should be regarded as several different populations [17]. Hence, the value of $\alpha$ should be adaptable at each timescale, with different adjustable background values, as depicted in figure 4.
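The effect of the weight on the background value can be illustrated with a short Python sketch. It assumes the weighted-mean form in which the weight multiplies the newer accumulated value $x^{(1)}(t)$; the function name `background_values` is hypothetical, and the per-step weights in the usage note are made-up numbers for illustration, not values produced by the proposed model.

```python
import numpy as np

def background_values(x1, alpha=0.5):
    """Background values z(t) = alpha*x1(t) + (1 - alpha)*x1(t-1), t = 2..n.

    alpha may be a scalar (traditional model, usually 0.5) or a sequence
    with one weight per time step (an adjustable generation sequence).
    """
    x1 = np.asarray(x1, dtype=float)
    # Expand a scalar weight, or validate the length of a per-step sequence
    alpha = np.broadcast_to(np.asarray(alpha, dtype=float), (len(x1) - 1,))
    if np.any((alpha < 0) | (alpha > 1)):
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * x1[1:] + (1 - alpha) * x1[:-1]
```

For the accumulated sequence `[1, 3, 6, 10]`, a fixed weight of 0.5 gives the plain means of adjacent values, whereas weights such as `[0.5, 0.7, 0.9]` pull the later background values toward the newer data.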
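The possible error that might degrade the precision of grey Verhulst can be identified prior to its elimination. The whitening equation of the grey Verhulst model is written as

$$\frac{dx^{(1)}}{dt} + a\,x^{(1)} = b\left(x^{(1)}\right)^{2} \tag{2}$$

By integrating both sides of equation (2) over the interval $[t-1, t]$,

$$\int_{t-1}^{t} dx^{(1)} + a\int_{t-1}^{t} x^{(1)}\,dt = b\int_{t-1}^{t} \left(x^{(1)}\right)^{2} dt \tag{3}$$

Since $\int_{t-1}^{t} dx^{(1)} = x^{(1)}(t) - x^{(1)}(t-1) = x^{(0)}(t)$,

$$x^{(0)}(t) + a\int_{t-1}^{t} x^{(1)}\,dt = b\int_{t-1}^{t} \left(x^{(1)}\right)^{2} dt \tag{4}$$

By comparing equations (1) and (4), the background value can be determined as below.

$$z^{(1)}(t) = \int_{t-1}^{t} x^{(1)}\,dt \tag{5}$$

From equation (5), an error exists if the following inequality holds.

$$z^{(1)}(t) \neq \int_{t-1}^{t} x^{(1)}\,dt \tag{6}$$

Therefore, to eliminate the error, the background value must be equal to the integral of $x^{(1)}$ over the interval $[t-1, t]$. Indeed, the integral in equation (6) represents an area under a graph function, as presented in figure 5. Equation (8) can be further simplified as follows.
Finally, the background value, $z^{(1)}(t)$, can be obtained through the equation below.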

V. CASE STUDY AND RESULTS
The DARPA/MIT Lincoln Lab evaluation datasets 1999 and 2000 have been widely used in recent work to evaluate the performance of prediction models [18][19][20][21][22]. The attacks found in these datasets can be categorised into five main classes, namely Probe, Denial of Service (DoS), Remote to Local (R2L), User to Remote (U2R) and Data attacks [23].
In order to verify the performance of our proposed adaptive grey Verhulst model in predicting the network security situation, three benchmark datasets, DARPA 1999 and DARPA 2000 (LLS DDOS 1.0 and LLS DDOS 2.0.2), have been applied to our model as well as to the traditional GM(1,1) and grey Verhulst models. These datasets were divided into several time slots based on hours or minutes and evaluated using the entropy-based network security situation assessment approach in [24]. The situation assessment values for each time slot were used as input for the prediction models to forecast the next network security situation. Tables 1, 2 and 3 present the prediction results on each dataset for the three grey models mentioned above.
Based on these computational results, Mean Absolute Percentage Error (MAPE) and Root Mean Square Deviation (RMSD) are used as evaluation metrics to determine the accuracy of the prediction models. MAPE measures the accuracy of a method for constructing fitted time series values, especially in trend estimation, while RMSD is frequently used to measure the differences between the values predicted by a model and the values actually observed. The numerical results show that the adaptive grey Verhulst model attained an average prediction accuracy of 93.3%, while the GM(1,1) and traditional grey Verhulst models achieved only 87.3% and 92.0% respectively. Compared to both the traditional GM(1,1) and grey Verhulst models, the lower MAPE and RMSD values further show that the proposed prediction model is more reliable in forecasting the incoming security situation in a network.
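The two evaluation metrics can be computed as in the following sketch, which uses their standard textbook definitions. The function names `mape` and `rmsd` are illustrative, NumPy is assumed, and reading "prediction accuracy" as 100% minus MAPE is an assumption, since the text does not define it explicitly.

```python
import numpy as np

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

def rmsd(actual, predicted):
    """Root Mean Square Deviation between observed and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((actual - predicted) ** 2))

def accuracy_percent(actual, predicted):
    """Assumed definition of prediction accuracy: 100% minus MAPE."""
    return 100.0 - mape(actual, predicted)
```

MAPE is scale-free, which makes it convenient for comparing models across datasets, while RMSD is expressed in the units of the situation value and penalizes large individual errors more heavily.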

Fig. 1. The chronology of an intrusion attack

Figure 2 depicts the process flow of the adaptive grey Verhulst model. First, a sequence of network security situation assessments, $X^{(0)}$, is channeled into the model, and a new sequence of accumulated data is built by applying the 1-Accumulated Generating Operation (1-AGO).

In the grey differential equation, $a$ is the development coefficient, whose size reflects the growth rate of the sequence $X^{(0)}$, and $b$ is the grey input of the Grey Verhulst model. After that, the equations are rearranged into a matrix form in terms of $B$ and $Y$.

With the values of $a$ and $b$, the predicted time response sequence of the Grey Verhulst model can then be determined.

Fig. 5. Area under a graph function

Because the curve function is unknown, the proposed adaptive grey Verhulst model calculates the background value, $z^{(1)}(t)$, by combining the Trapezoidal rule and Simpson's 1/3 rule. These rules are used to determine the area under a graph without knowing its function. The Trapezoidal rule approximates the integrand by a first-order polynomial and then integrates that polynomial over the interval of integration, while Simpson's 1/3 rule extends the Trapezoidal rule by approximating the integrand with a second-order polynomial. From figure 5, the area under the curve over the time interval from $t-3$ to $t$ can be determined with these rules.
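A minimal sketch of how the two rules can be combined on equally spaced samples is given below. The split used here (Simpson's 1/3 rule on each consecutive pair of intervals, the Trapezoidal rule on a single leftover interval) is an assumption chosen for illustration and is not necessarily the combination used in the proposed model; the function names are hypothetical.

```python
def trapezoid(y0, y1, h=1.0):
    """Trapezoidal rule over one interval of width h (first-order fit)."""
    return h * (y0 + y1) / 2.0

def simpson_13(y0, y1, y2, h=1.0):
    """Simpson's 1/3 rule over two intervals of width h (second-order fit)."""
    return h * (y0 + 4.0 * y1 + y2) / 3.0

def combined_area(samples, h=1.0):
    """Approximate the area under equally spaced samples.

    Simpson's 1/3 rule covers consecutive pairs of intervals; if the
    number of intervals is odd, the last one uses the Trapezoidal rule.
    """
    n_intervals = len(samples) - 1
    if n_intervals < 1:
        raise ValueError("at least two samples are required")
    area = 0.0
    i = 0
    while i + 2 <= n_intervals:
        area += simpson_13(samples[i], samples[i + 1], samples[i + 2], h)
        i += 2
    if i < n_intervals:  # one interval left over
        area += trapezoid(samples[i], samples[i + 1], h)
    return area
```

With four samples spanning three unit intervals, such as the accumulated values at $t-3$, $t-2$, $t-1$ and $t$, the first two intervals are handled by Simpson's 1/3 rule and the last by the Trapezoidal rule. The result is exact for linear data and only approximate when the curve bends within the final interval.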

TABLE I. PREDICTION RESULT FOR DARPA 1999