A New Approach for Network Steganography Detection based on Deep Learning Techniques

A technique often used in current cyber-attacks to steal data and transmit it out of a system is to hide secret data in packets. This is known as network steganography. Because millions of packets are sent and received every hour in normal internet activity, it is very difficult to detect data theft carried out in this way. Recent approaches typically compute and extract abnormal packet behaviors in order to detect a specific steganography protocol or technique. However, such methods cannot detect abnormal packets when an attacker uses a different steganography technique. To address this problem, this paper proposes a network steganography detection method using deep learning techniques. The highlight of this study is a set of newly proposed features based on different components of the packet. By combining these components, the proposal aims not only to detect many network steganography techniques but also to improve the accuracy of abnormal packet detection. In addition, this study proposes using deep learning to classify packets as normal or abnormal, taking advantage of the ability of deep learning models to analyze and process large volumes of data. The experimental results in Section IV.D show the effectiveness of the proposed method compared with other approaches.
Keywords: Network steganography; network steganography detection method; abnormal packets; deep learning techniques


A. The Problem
The study [1] listed 11 different techniques commonly used to hide information in the network. These techniques are generally divided into three main groups: packet modification, stream modification, and hybrid. The research [2] presented the main difficulties that make network steganography techniques hard to detect and prevent. To address the difficulties described in [2], current approaches fall into two main categories: i) technique-specific methods, which are proposed as countermeasures for specific steganographic techniques. Methods in this category usually operate on low-level network data, require relatively large computational resources, and cannot detect steganographic techniques other than the one or several for which they are designed; ii) generic methods, which are not designed to detect one specific steganographic technique but offer a comprehensive approach to network anomaly detection and to categorizing network traffic for potential steganographic use. Methods in this category may not provide detailed information on detected suspicious traffic but can label it for further investigation. Most generic methods fall into two subcategories that characterize their approach: statistical or machine learning. The studies [3,4,5,6,7,8] presented several proposals for detecting network steganography based on abnormal behavior analysis and ruleset databases. However, these approaches have two problems [1,2,9,10,11,12]: they rely on available datasets, and they focus on detecting only one steganography technique. Therefore, although these studies achieved very high efficiency on experimental datasets, they did not deliver the desired results when applied in practice. To solve these problems, this paper proposes a new method that belongs to the group of generic methods.
Specifically, this proposal addresses two main problems: i) defining and proposing features and characteristics of the abnormal behavior of network steganography techniques; ii) using deep learning techniques, on the basis of big data analysis, to detect and classify cyber-attack techniques based on the unusual behavior defined in task (i). Details of the abnormal behaviors and the algorithms for classifying network steganography techniques are presented in Section III. The results of evaluating the effectiveness of the proposed method are presented in detail in Section IV. The evaluation, conclusion, and future development directions are presented in Section V.

B. Contributions of the Paper
The practical and scientific contributions of this paper are:
• Proposing new packet features and characteristics. The features proposed in this paper are new, and they are synthesized and extracted from many different components of the packet. The experimental results show that these features are meaningful for detection.
• Proposing the use of deep learning models for the task of detecting network steganography. In the experimental section, this paper tunes the parameters of each deep learning model so that a system can choose a configuration that balances detection time against detection efficiency.

II. RELATED WORK
In the study [13], Mike et al. proposed a method to detect network steganography using an IDS tool. Specifically, the authors used the rulesets built into the IDS tool to detect hidden information in the network based on the data sections of packets. In the study [14], the authors proposed a method to detect steganography in VoIP using the Least Significant Bits technique. Taeshik Sohn et al. [15] proposed a network steganography detection method using the Support Vector Machine (SVM) algorithm to detect hidden information in TCP/IP protocols. Similarly, research [16] proposed using the Naive Bayes algorithm to detect secret information hidden in the TCP/IP header. Cho et al. [3] proposed a method of detecting storage-based network steganography using machine learning. Specifically, the authors used the Random Forests (RF) algorithm to classify abnormal behaviors in ICMP and TCP/IP packets. Smolarczyk [2] proposed a method to detect steganography in the network using a multi-layer analysis technique.

A. Proposing the Method to Select and Extract Abnormal Behavior of Network Steganography Techniques
As mentioned above, the purpose of this paper is to use deep learning algorithms to detect network steganography by analyzing different components of the packet. Specifically, the three network steganography techniques studied in this paper are:
• Size Modulation: The covert channel uses the size of a header element or of a PDU to encode the hidden data.
• Random Value: The covert channel embeds hidden data in a header element containing a random value.
• Reserved/Unused: The covert channel encodes hidden data into a reserved or unused header/PDU element.
Based on these attack techniques, this study collects and analyzes packets to look for their abnormal behaviors. The resulting features are listed in Table I.
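As a rough illustration of what analyzing different components of a packet might involve for the three technique classes above, the following sketch parses a raw IPv4 header with Python's standard library and extracts one candidate field per class: the total length (size modulation), the Identification field (random value), and the reserved flag bit (reserved/unused). The function name `ipv4_header_features` and the dictionary keys are hypothetical and are not the features of Table I. The sample header is hand-built for demonstration only.

```python
import struct

def ipv4_header_features(raw: bytes) -> dict:
    """Extract a few IPv4 header fields relevant to covert channels.

    Hypothetical helper for illustration; not the paper's feature set.
    """
    if len(raw) < 20:
        raise ValueError("need at least 20 bytes for an IPv4 header")
    # Fixed 20-byte IPv4 header layout (RFC 791), network byte order.
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len_bytes": (ver_ihl & 0x0F) * 4,
        "total_length": total_len,                 # size modulation candidate
        "identification": ident,                   # random value candidate
        "reserved_flag": (flags_frag >> 15) & 1,   # reserved/unused candidate
        "dont_fragment": (flags_frag >> 14) & 1,
        "fragment_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": proto,
    }

# Hand-built header with the reserved flag bit set (made-up values).
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0x1234, 0x8000,
                     64, 6, 0, bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
features = ipv4_header_features(sample)
```

In normal traffic the reserved flag bit is expected to be zero, so a non-zero value is the kind of anomaly a classifier could learn from. The other two fields would more plausibly need statistics over many packets.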

B. The Detection Method
Based on the features of anomalous packet behavior defined and extracted in Table I, this paper proposes a method to classify these packets. To detect network steganography, previous studies often used algorithms such as SVM [4,15] and RF [3]. To improve the efficiency of network steganography detection, this paper proposes using several deep learning algorithms and models: Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM).
Regarding the MLP network, the study [17] presented in detail the architecture of an MLP, which is built by simulating the way neurons work in the human brain. MLP networks usually have three or more layers: one input layer, one output layer, and one or more hidden layers. In addition, the efficiency of an MLP depends on its activation function. This paper tunes the activation function to evaluate the effectiveness and suitability of different activation functions for the detection task.
The CNN is defined as a stack of basic layers: convolution layers, each followed by a nonlinear layer, and fully connected layers. The detailed structure of the CNN and the related terms (stride, padding, max pooling) are presented in [18]. The activation function used is ReLU.
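The dependence of the MLP on its activation function can be sketched with a small forward pass in plain Python. This is a minimal sketch under stated assumptions, not the architecture used in the experiments: the layer sizes, the random untrained weights, the four-element input vector, and the names `dense`, `init_layer`, and `mlp_forward` are all hypothetical.

```python
import math
import random

ACTIVATIONS = {
    "relu": lambda x: max(0.0, x),
    "tanh": math.tanh,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
}

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Random untrained weights and zero biases (illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def mlp_forward(features, layers, hidden_activation="relu"):
    """Run the hidden layers with the chosen activation, then a sigmoid output."""
    act = ACTIVATIONS[hidden_activation]
    x = features
    for weights, biases in layers[:-1]:
        x = dense(x, weights, biases, act)
    weights, biases = layers[-1]
    # Sigmoid output: a score in (0, 1), read here as "probability of stego".
    return dense(x, weights, biases, ACTIVATIONS["sigmoid"])[0]

rng = random.Random(0)
# Input layer of 4 features, two hidden layers of 8 units, 1 output unit.
layers = [init_layer(4, 8, rng), init_layer(8, 8, rng), init_layer(8, 1, rng)]
packet_features = [0.2, 0.0, 1.0, 0.5]  # made-up normalized feature vector
scores = {name: mlp_forward(packet_features, layers, name)
          for name in ACTIVATIONS}
```

With identical weights, the three hidden activations generally yield different output scores, which is why the activation function is treated as a tunable choice.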
The study [19] introduced the LSTM and its ability to remember information for a long time. This ability comes from the structure of the gates in each memory cell. A memory cell consists of three main components: the input gate, the forget gate, and the output gate. First, the forget gate decides what information should be discarded from the cell state. Next, the input gate decides what information is written into the cell state. Finally, the output gate computes the desired output. During this process, the cell state is propagated and updated as it passes through all nodes.
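The gate sequence described above can be written out as a single time step. The sketch below uses scalar inputs and states for readability and follows the standard LSTM formulation, which also includes a tanh "candidate" term that the input gate scales before it is written to the cell state. The function name `lstm_step` and the parameter layout are hypothetical, and the all-zero parameters are chosen only to make the arithmetic easy to follow.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step for a scalar input and scalar state.

    params maps each gate name to (w_x, w_h, b); values are hypothetical.
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # what to discard from the cell state
    i = sigmoid(pre("input"))         # how much new information to admit
    g = math.tanh(pre("candidate"))   # candidate value to write
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    c = f * c_prev + i * g            # updated cell state
    h = o * math.tanh(c)              # hidden state, used as the output
    return h, c

# With all-zero parameters every gate is 0.5 and the candidate is 0,
# so the cell state is simply halved at each step.
zero_params = {name: (0.0, 0.0, 0.0)
               for name in ("forget", "input", "candidate", "output")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=1.0, params=zero_params)
```

The cell state `c` is the quantity that carries information across time steps, while `h` is what the next layer sees.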

A. Description of Data Collection Method
1) For normal dataset: The dataset of normal packets is obtained from [20]. This dataset belongs to the "MAWI Working Group" and the "WIDE Project", which collected network traffic at ISP points in Japan. The PCAP files used contain network traffic collected on April 30, 2021. From these PCAP files, 2,200,000 packets are taken for the experiments. Table II shows the number of collected and extracted normal packets.
2) For stego dataset: This study proposes to use several network steganography tools to generate stego packets. Table III describes these tools and their steganography types in detail.
After the above tools are installed, each tool is run and Wireshark is used to capture the network traffic it generates. The traffic generated by each tool is saved to a separate file. For example, the traffic generated by ptunnel is saved as one PCAP file, and the traffic generated by covert_tcp is saved as another. Only packets generated by these tools are kept. Unrelated packets, such as ARP or system packets, are deleted to ensure that each PCAP file contains only stego packets carrying secret information. Table IV presents the number of stego packets generated by the tools listed in Table III.
3) Data synthesis: Based on the data collection methods in 1) and 2) above, an aggregated dataset for training and testing network steganography detection is obtained, as shown in Table V.

B. Evaluation Criteria
The following measures are used in this paper to evaluate the accuracy of the models:
• Accuracy: The ratio between the number of correctly classified samples and the total number of samples.
• F1-score: The harmonic mean of precision and recall. The higher the F1-score, the better the model.
• TP: The same as Recall, that is, the ratio between the true positive count and the sum of the true positive and false negative counts. This shows the ability to detect true stego packets.
• FP: The ratio between the false positive count and the sum of the false positive and true negative counts. This shows the false alarm rate for stego packets.
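The measures above can all be computed from the four confusion-matrix counts, as in the following sketch. The function name `classification_metrics` is hypothetical, and the counts in the example are made up for illustration. They are not results from this paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the evaluation measures from confusion-matrix counts.

    A "positive" is a stego packet; each ratio falls back to 0.0 when
    its denominator is zero.
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0      # the "TP" measure
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0         # the "FP" measure
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": fpr,
    }

# Made-up counts: 100 stego packets and 100 normal packets.
example = classification_metrics(tp=90, fp=5, tn=95, fn=10)
```

For these counts the accuracy is 185/200 = 0.925, the recall is 90/100 = 0.9, and the false positive rate is 5/100 = 0.05.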

C. Experimental Scenario
1) Scenario for experimental dataset: The experimental dataset listed in Table V is divided into parts, and experiments are then conducted to evaluate the accuracy of the proposed models on these parts. The division is random: 80% of the dataset is used for training, and the remaining 20% is used for testing.
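A random 80/20 division of this kind can be sketched as follows. The function name `train_test_split`, the fixed seed, and the toy dataset are assumptions made for illustration. The paper does not state how the shuffle was implemented or whether it was stratified by class.

```python
import random

def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and cut it into train and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy dataset of (sample_id, label) pairs; label 1 stands for "stego".
dataset = [(i, i % 2) for i in range(100)]
train, test = train_test_split(dataset)
```

Fixing the seed makes the same split available to every model being compared, so differences in the reported measures are not caused by different test sets.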
2) Evaluation scenarios: To assess the effectiveness of the proposed method, this paper conducts two experimental scenarios as follows:
• Scenario 1: Compare and evaluate the effectiveness of deep learning methods. For this scenario, this study evaluates the following algorithms: MLP, CNN, and LSTM. During the experiments, the authors tune parameters to assess the effectiveness of the deep learning models.
• Scenario 2: Compare and evaluate the deep learning model against some other approaches on the same dataset.
D. Experimental Results
1) Experimental results of scenario 1: From the experimental results in Tables VI, VII, and VIII, the following observations can be made:
• Regarding accuracy: Based on the classification results, the LSTM model yielded better performance than the other deep learning models. Specifically, on the Accuracy measure, the LSTM model reached the absolute rate with two and three layers. This result is higher than that of the CNN models by 1.5% and that of the MLP by 1.45%. Similarly, on the Recall measure, the LSTM model is higher than the other models by 0.01 to 0.3%. In general, deep learning models were highly effective for the task of classifying normal and abnormal packets. The authors think the reason is that the packets are decomposed into different components, and features are then extracted from these components. This highlights their abnormal behaviors and therefore supports the classification process better. In addition, deep learning models, especially LSTM with its ability to remember features and its hidden layers, have synthesized many important features. These results therefore support the proposed approach.
• Regarding prediction time: Based on the experimental results, the LSTM model takes more time than the other models for both training and testing. In particular, the training time of the LSTM model is about 2 times that of the CNN model and 7 times that of the MLP model. Regarding detection time, the LSTM model takes about 12 times as long as the CNN model and 4 times as long as the MLP model. Thus, although the LSTM model is more accurate than the other models, it is many times more time-consuming. Therefore, in practice, monitoring and detection systems need to choose an appropriate model to balance detection time and efficiency.
40 | Page www.ijacsa.thesai.org
2) Experimental results of scenario 2: For this scenario, this paper compares the effectiveness of the proposed method with two other algorithms, RF and SVM, which were proposed in previous research. Table IX describes the experimental results of these algorithms. The results in Table IX show that the LSTM model proposed in this study gave 2% to 5% better performance than the other algorithms in the same approach. On this dataset, the result of this study is therefore better than those of the related studies. The authors attribute this to the new, meaningful features proposed in this study and to the ability of the deep learning classification algorithm to synthesize and analyze those features.

V. CONCLUSION
In this paper, with the aim of proposing a new method to improve the efficiency of network steganography detection, the study has accomplished two tasks: i) proposing features and characteristics of abnormal packets; ii) using deep learning models for the abnormal packet classification task. Regarding the analysis of abnormal features and characteristics, this study has extracted many important and meaningful features based on different components of the packet. The authors consider this a significant step forward in the task of analyzing and extracting packet features. Regarding the deep learning models, this study has succeeded in training the models to support the classification process. The experimental results in Section IV.D show that this approach has both scientific and practical value, because the proposal yielded better results than the other models on all metrics. In the future, to further improve the ability to detect abnormal packets, the authors think that two main issues can be improved and supplemented based on the results of this paper: i) abnormal features of packets; ii) classification methods using combined deep learning networks or Attention networks.