Search Space of Adversarial Perturbations against Image Filters

Deep learning delivers superior performance, but that performance is threatened by the security weaknesses of the models themselves. Recent findings show that deep learning systems are highly vulnerable to adversarial examples: inputs that an attacker has deliberately altered to deceive the system. Many defensive methods have been proposed to protect deep learning systems against adversarial examples. However, there is still a lack of principled strategies for defeating those defensive methods; whenever a particular countermeasure is proposed, a new and more powerful adversarial attack is invented to circumvent it. In this study, we investigate the search space in which adversarial patterns can be created against defensive methods that use image filters. Experimental results on image classification tasks with the ImageNet dataset show a correlation between the search space of adversarial perturbations and the filters. These findings open a new direction for building stronger offensive methods against deep learning systems.


I. INTRODUCTION
Over the past decade, deep learning has risen to prominence in many tasks such as computer vision [1], automatic driving [3], and natural language processing [2]. Deep learning models are designed under the assumption that the input and output distributions are benign. As a result, training focuses on fine-tuning the weights, parameters, or the number of nodes and hidden layers, while the validity of the data is set aside. This has created a security issue for deep learning systems. Szegedy et al. [4] discovered that deep neural networks are vulnerable to adversarial examples. Since then, research on techniques that delude AI models has become a hot spot, and researchers have continued to propose new methods of attack and defense. Adversarial attacks have been studied regularly in both research and commercial settings. In computer vision, many adversarial attacks have been proposed for image classification [5], [6], [7], [8] and object detection [9]. There is also much research on adversarial examples in text [10], [11], [12], [13]. For physical-world attacks, Kurakin et al. [14] first exposed the hazards of adversarial examples. They used the TensorFlow Camera Demo application to capture original images and then classified those images with Google Inception V3 [15]. Their experiments showed that a large portion of the images were misclassified even when observed through the camera lens. Eykholt et al. [16] developed a new method based on [7] and [17] to create robust adversarial perturbations in the real world. They showed that robust adversarial examples largely withstand variations in viewing angle, distance, and resolution in physical settings. The proposed algorithm, termed RP2 (Robust Physical Perturbations), crafts adversarial examples for road sign recognition systems that achieve a high deception rate in an efficient setting. Many physical adversarial attacks have also been proposed in face recognition [21], machine vision [20], and road sign recognition [18]. In the cyberspace security field, there are adversarial attacks on cloud services [22], malware detection [23], [24], and network intrusion detection [25].
Besides attack methods, many defensive approaches have been proposed. They fall into four main categories: adversarial training, denoising, transformation, and compression. Szegedy et al. [4] trained an AI model on adversarial examples paired with their ground truth labels, which made the model more robust. Goodfellow et al. [5] also used adversarial training to improve the classification rate on adversarial examples for the MNIST dataset. Tramèr et al. [26] combined adversarial examples created from many different AI models to increase the robustness of those models. The methods in [27], [28] use image transformations to reduce the misclassification rate of an AI model. The authors of [30], [31] assumed that most adversarial examples are created in the high-frequency domain and proposed methods based on image filters to remove the adversarial perturbations. Das et al. [32] introduced a defensive method based on JPEG compression to defeat the FGSM [5] and DeepFool [33] attacks. However, newer adversarial attacks such as the Carlini and Wagner attacks [7] overcame these compression-based defenses.
Our Contributions. In this work, we investigate the search space of adversarial perturbations. Understanding of the effects of adversarial noise is still very limited, so determining the space available to adversarial noise is an important challenge. Understanding and identifying this space will help us develop better protection systems for deep learning against adversarial examples.
We summarize the main contributions of this research below:
• We recap numerous adversarial attack and defense methods and provide an insightful review of these current methods.
• We discover a close relationship between the search space of adversarial perturbations and image filters.
• Our research opens up a new perspective on creating
Paper outline. The remainder of this paper is organized as follows. Section II presents the literature review and the background on adversarial examples. Section III describes our approach to the search space of adversarial examples, and Section IV presents our implementation and evaluation results. Section V summarizes our work.

A. Literature review
In this work, we focus on the relation between the feasible space of adversarial perturbations and defensive methods based on the frequency domain, so this section reviews those defensive methods. Many works have considered eliminating adversarial features and restoring the classification rate. Xu et al. [34] proposed a defensive approach that uses feature squeezing strategies to remove adversarial features. There are two key ideas in [34]. The first concerns the bit depth of the input image: by increasing or reducing the bit depth, the method removes some adversarial features. The second uses a median filter to remove adversarial features. However, [34] requires a range of thresholds to separate adversarial from legitimate features, so selecting a suitable threshold for a specific dataset or setting is a nontrivial, heuristic task. Dang et al. [30] proposed a detection system that automatically identifies adversarial examples using image filters (Gaussian and median). The system does not require any threshold for distinguishing adversarial from benign images. However, it is unclear how well the system withstands newer and stronger adversarial attacks. Our paper shows that Gaussian blurring works well only on small adversarial perturbations and is ineffective against larger and stronger ones.
B. Background
1) Convolutional Neural Networks: Convolutional Neural Networks (CNNs) are designed to learn the important features of a training dataset and match them with the given labels. CNNs are used in many areas [1], [2], and open-source implementations are available [15]. A CNN consists of multiple layers with many operations that process signals from lower layers to higher layers in a hierarchical architecture. Because this research concentrates on image classification, we cover only the brief fundamentals of that area. In an image classification task, a CNN processes an input x and tries to find the best matching output label y from a set of labels Y. The structure of a CNN is shown in Table I, with the layers listed in top-down order from input to output. For this network, the input is a color image of size 299×299. The first layer is a convolutional layer with a 3×3 kernel and a stride of 2. The following convolutional layers use the same kernel size but differ in the number of kernels and the stride. An inception network also contains layers called inception layers. These differ from convolutional layers in that they combine several kernel sizes at once to extract more important features; an inception layer can also be called an inception filter. The penultimate layer is the logits layer, whose output is passed to the softmax function to calculate the probability of each output label for the given input.
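As a small worked example of the layer arithmetic described above, the following Python sketch computes the spatial size produced by the first convolutional layer and converts a vector of logits into label probabilities with softmax. It assumes a convolution without padding, which the text does not state, and the function names and toy logits are illustrative only.

```python
import math


def conv_output_size(input_size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial size along one dimension after a single convolution."""
    return (input_size + 2 * padding - kernel) // stride + 1


def softmax(logits):
    """Map logits to probabilities that sum to one (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# 299x299 input, 3x3 kernel, stride 2; padding = 0 is an assumption.
first_layer = conv_output_size(299, kernel=3, stride=2)

# Toy logits for three hypothetical labels.
probs = softmax([2.0, 1.0, 0.1])
print(first_layer, probs)
```

Under the no-padding assumption the first layer maps 299 pixels to 149 along each spatial dimension, and the softmax output keeps the ordering of the logits while summing to one.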
2) Adversarial Attacks: Adversarial examples are malicious inputs created by slightly modifying legitimate inputs with the aim of fooling an AI model, while the modification remains imperceptible to humans.
FGSM (Fast Gradient Sign Method) was proposed by Goodfellow et al. [5]. In a normal training process, the input and output data distributions are assumed to be fixed, so only the trainable parameters and weights are fine-tuned with respect to a loss objective function between input x and label y.
[5] used a very simple idea to reverse that normal process: the input data is fine-tuned with respect to a new loss objective function between a new sample $x_{adv}$ and a new specific label $y_{adv}$:
$$x_{adv} = x - \beta \cdot \mathrm{sign}\left(\nabla \mathrm{Loss}(x_{adv}, y_{adv})\right)$$
where β denotes the perturbation size used to create an adversarial example $x_{adv}$ from a legitimate input x. From a legitimate input x, FGSM looks for the best adversarial perturbation β to add to x to create a new image $x_{adv}$. The value of β has to satisfy two requirements: its magnitude should be as small as possible, and it should respect the loss objective function between (x, y). Regarding the first requirement, the smaller the magnitude of β, the more similar $x_{adv}$ is to x, but the slower Algorithm 1 converges; a bigger β makes $x_{adv}$ differ more from x, but the algorithm converges faster. Regarding the second requirement, the loss objective function between (x, y) is maximized and Loss($x_{adv}$, $y_{adv}$) is minimized. Because the output probabilities sum to one, Algorithm 1 only needs to minimize Loss($x_{adv}$, $y_{adv}$).
In this paper, we use the FGSM [5] method with l-norm optimization as a baseline to assess the possible range of values of β during the creation of adversarial examples. Our attack is a white-box attack, in which information about the victim AI model is known in advance and can be accessed.
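The targeted FGSM idea can be illustrated with a minimal, dependency-free Python sketch. It is not the implementation used in this paper: the victim here is a hypothetical two-class linear softmax classifier (weights `W`, bias `b`), chosen only because its loss gradient with respect to the input has a closed form, and all function names are assumptions made for illustration.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, b, x):
    """Class probabilities of a toy linear softmax classifier (hypothetical victim)."""
    logits = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]
    return softmax(logits)


def loss_gradient(W, b, x, target):
    """Gradient of the cross-entropy loss -log p[target] with respect to the input x."""
    p = predict_proba(W, b, x)
    err = [p_k - (1.0 if k == target else 0.0) for k, p_k in enumerate(p)]
    return [sum(W[k][j] * err[k] for k in range(len(W))) for j in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


def targeted_fgsm_step(W, b, x, target, beta):
    """One step x_adv = x - beta * sign(grad), which lowers the loss for the target label."""
    grad = loss_gradient(W, b, x, target)
    return [x_j - beta * sign(g_j) for x_j, g_j in zip(x, grad)]


W = [[1.0, -1.0], [-1.0, 1.0]]   # toy weights, not a trained model
b = [0.0, 0.0]
x = [0.6, 0.4]                   # legitimate input, classified as label 0
y_adv = 1                        # label the attacker wants
x_adv = targeted_fgsm_step(W, b, x, y_adv, beta=0.2)
p_before = predict_proba(W, b, x)[y_adv]
p_after = predict_proba(W, b, x_adv)[y_adv]
print(p_before, p_after)
```

In this toy setting a single step with β = 0.2 moves every input component by at most β and raises the probability of the adversarial label above that of the original label. With a deep network the gradient would come from automatic differentiation instead of a closed form.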
3) Defensive approaches: Many protection methods have been proposed. The typical strategy is adversarial training [4], [19], [26], in which AI models are trained on adversarial examples together with their ground truth labels. The assumption is that the more adversarial examples a model learns from, the more accuracy it regains and the less likely it is to misclassify adversarial examples. However, the major drawback of adversarial training is the large amount of time needed to create adversarial examples and to train the AI models. In addition, this method does not guarantee resistance to new adversarial examples created by methods other than the one used during training. Other frequently investigated defenses pre-process the input data. These include pre-processing methods based on image transformation [27], [28], filters [30], [31], or compression [32]. They have shown very impressive results in helping AI systems identify whether an input is adversarial or legitimate. Another defense that attracts considerable attention is gradient masking. Adversarial attack methods largely rely on gradient calculations to optimize the loss objective function when creating adversarial examples, which motivated the idea of hiding the gradient values. [26] proposed a gradient masking method based on smoothing the gradients, which makes a gradient-based search for the global optimum more difficult. [37] uses another strategy, distillation synthesized from different models, to create a model that is stronger against adversarial examples.

A. Search Space on Attacking Phase
One of the important factors in creating adversarial examples is the adversarial perturbation coefficient β. However, it is unclear how to find the optimal value of β and how it relates to the currently most powerful defense methods based on image filters [30]. Clarifying this is the purpose of this study. We investigate a white-box attack for creating adversarial examples, a setting in which the attacker can access and use the AI model parameters to construct an attack. This is realistic because the most powerful AI models for image classification are currently open-source. Many attack methods have been proposed, but most of them build on FGSM, so we also use FGSM for creating adversarial examples. Adversarial attacks can be classified into two types according to the attacker's purpose: non-targeted and targeted attacks. In a non-targeted attack, the attacker only focuses on maximizing the loss function of (x, y) in order to deceive the AI system. In a targeted attack, the attacker wants to trick the AI system into classifying the new pattern as an intentionally chosen label rather than merely misidentifying it. Because of this, targeted attacks are more commonly used than non-targeted attacks, and we also use them in our attacking phase. Our main purpose is to determine the size of the adversarial perturbation, that is, the search space of the adversarial perturbation. We use norm operations to determine the size of the adversarial noise. Mathematically, a norm measures the distance, or the length, of vectors or matrices element-wise. The bigger the norm value, the bigger the difference between the vectors or matrices, and vice versa. Formally, the $l_p$-norm of a vector x is defined as:
$$\|x\|_p = \left(\sum_{i} |x^{(i)}|^p\right)^{1/p}$$
That is, a norm is the $p$-th root of the sum of all elements raised to the $p$-th power.
Although all $l_p$-norms look very similar to each other, their mathematical properties are very different, and so their applications differ completely when they are used to create adversarial examples. In this work, we consider three common norms, the $l_1$-norm, $l_2$-norm, and $l_\infty$-norm, for evaluating the size of the search space of adversarial perturbations.
$l_1$-norm. We define $x_{true}$ as the original input vector. The $l_1$-norm of $x_{true}$ is defined as:
$$\|x_{true}\|_1 = \sum_{i} |x_{true}^{(i)}|$$
This norm is also known as the Manhattan norm and is one of the most common norm operations.
$l_2$-norm. The $l_2$-norm is the most popular norm and is also known as the Euclidean norm. The $l_2$-norm and the other norms are equivalent in the sense that they all induce the same topology. The $l_2$-norm is defined as:
$$\|x\|_2 = \sqrt{\sum_{i} |x^{(i)}|^2}$$
To measure the difference between two vectors $x_{true}$ and $x_{adv}$, we re-define the $l_2$-norm as:
$$\|x_{true} - x_{adv}\|_2 = \sqrt{\sum_{i} \left(x_{true}^{(i)} - x_{adv}^{(i)}\right)^2}$$
where $x_{adv}$ denotes the adversarial example.
$l_\infty$-norm. The $l_\infty$-norm is defined as:
$$\|x\|_\infty = \lim_{p \to \infty} \left(\sum_{i} |x^{(i)}|^p\right)^{1/p}$$
Consider the vector x with elements $x^{(i)}$. By the property of infinity, the element with the largest magnitude dominates the sum:
$$\sum_{i} |x^{(i)}|^p \approx \max_{i} |x^{(i)}|^p \quad \text{as } p \to \infty$$
This gives the simple definition of the $l_\infty$-norm:
$$\|x\|_\infty = \max_{i} |x^{(i)}|$$
Our attack phase, which uses FGSM, is given as Algorithm 1, where $x_{true}$ is the original input, $x_{adv}$ is the adversarial example, $y_{true}$ is the ground-truth label, $y_{adv}$ is the adversarial label, f is the prediction function of the machine learning model, β is the maximum adversarial value, and $l_i$ denotes the norm. For crafting adversarial examples, we set the learning rate lr to 0.01 and the number of iterations to 500.
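To make the three norms concrete, the short Python sketch below measures a perturbation between an original and an adversarial vector under the $l_1$, $l_2$, and $l_\infty$ norms, and checks numerically that a large finite p approaches the $l_\infty$ value. The vectors are made-up toy values, not data from our experiments, and the function names are illustrative.

```python
import math


def l1_norm(v):
    """Sum of absolute values (Manhattan norm)."""
    return sum(abs(e) for e in v)


def l2_norm(v):
    """Square root of the sum of squares (Euclidean norm)."""
    return math.sqrt(sum(e * e for e in v))


def linf_norm(v):
    """Largest absolute element."""
    return max(abs(e) for e in v)


def lp_norm(v, p):
    """General l_p norm, used here to approximate l_inf with a large p."""
    return sum(abs(e) ** p for e in v) ** (1.0 / p)


x_true = [0.2, 0.5, 0.9, 0.4]    # toy "original" pixels
x_adv = [0.25, 0.5, 0.8, 0.4]    # toy "adversarial" pixels
delta = [a - t for a, t in zip(x_adv, x_true)]

print("l1  :", l1_norm(delta))
print("l2  :", l2_norm(delta))
print("linf:", linf_norm(delta))
print("l50 :", lp_norm(delta, 50))
```

For any vector the three values satisfy $l_\infty \le l_2 \le l_1$, so the same numeric budget β permits very different perturbations depending on which norm bounds it.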

B. Filter Methods
Most adversarial attack methods search for the optimal adversarial perturbation with respect to the loss objective function and use it to modify the original image. As a result, the modified pixels tend to lie in the high-frequency domain, and current protection methods based on image filters have proved very effective at eliminating this adversarial noise. To better understand the search space of adversarial perturbations and their ability to resist image filters, we studied the two most common image filters, the Gaussian filter and the median filter. Mathematically, a Gaussian filter modifies the input image by convolving each image area with a Gaussian function; this transformation is also known as the Weierstrass transform. The area of convolution is often called the kernel size and is usually 3×3 or 5×5. When a Gaussian filter is applied, the kernel window moves across the input image and the filter output is computed from the image area under the window. The second image filter considered in this research is the median filter, a very common filter used to remove noise while preserving the edges of an image. The median filter also uses a kernel window that moves across the input image. However, it processes each area simply by finding the median value of the area and assigning that median to the pixel at the center of the kernel window, while the neighboring pixel values are left unchanged. Our filtering system follows Algorithm 2, where x is the input image, ϕ denotes the kernel size, f is a machine learning function that computes the predicted label with the highest probability, $y_{true}$ is the ground truth label, $y_{adv}$ is the adversarial label, and s is the filter function. The outputs are the probabilities of the ground truth label ($p_{true}$) and the adversarial label ($p_{adv}$).

Algorithm 1: Crafting Adversarial Examples with l-norm optimization
input: x_true, y_true, y_adv, f, β, l_i
output: x_adv
parameter: lr = 0.01, iterations = 500
x_adv ← x_true // initial adversarial example
δ ← 0 // initial adversarial perturbation
it ← 1 // initial iteration loop
while δ < β and f(x_adv) ≠ y_adv and it <= iterations do
    x_adv ← x_true − δ · sign(∇Loss(y_adv | x_adv))
    maximize Loss(y_adv | x_adv) with respect to δ
    it ← it + 1
end
return x_adv
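The behavior of the Gaussian and median filters in this section can be sketched in plain Python on a small grayscale image stored as a list of rows. This is an illustrative sketch only: the border handling (replicating edge pixels), the Gaussian standard deviation `sigma = 1.0`, and all function names are assumptions, since the text does not specify them.

```python
import math
from statistics import median


def window(img, r, c, k):
    """Values in the k x k neighborhood of (r, c); border pixels are replicated."""
    h, w, half = len(img), len(img[0]), k // 2
    return [img[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
            for dr in range(-half, half + 1)
            for dc in range(-half, half + 1)]


def gaussian_kernel(k, sigma=1.0):
    """Normalized k x k Gaussian weights, flattened in the same order as window()."""
    half = k // 2
    weights = [math.exp(-(dr * dr + dc * dc) / (2.0 * sigma * sigma))
               for dr in range(-half, half + 1)
               for dc in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_filter(img, k=3, sigma=1.0):
    """Replace each pixel by the Gaussian-weighted average of its window."""
    kernel = gaussian_kernel(k, sigma)
    return [[sum(w * p for w, p in zip(kernel, window(img, r, c, k)))
             for c in range(len(img[0]))]
            for r in range(len(img))]


def median_filter(img, k=3):
    """Replace each pixel by the median of its window."""
    return [[median(window(img, r, c, k)) for c in range(len(img[0]))]
            for r in range(len(img))]


image = [[0.5] * 5 for _ in range(5)]
image[2][2] = 1.0                      # one isolated high-frequency spike
blurred = gaussian_filter(image, k=3)
cleaned = median_filter(image, k=3)
print(blurred[2][2], cleaned[2][2])
```

On this toy image the median filter removes the isolated spike entirely, whereas the Gaussian filter only attenuates it and spreads it over the neighboring pixels, which gives some intuition for why the two filters can respond differently to the same perturbation.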

Algorithm 2: Image Filters on input for image classification task
input: x_true, s, f, y_true, y_adv
output: p_true, p_adv
parameter:
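The inputs and outputs of Algorithm 2 suggest a simple filter-then-classify pipeline, sketched below in Python. Only the interface (x, s, ϕ, f, the two labels, and the two returned probabilities) follows the description in the text; the 1-D median filter and the "roughness" classifier are hypothetical stand-ins invented for this illustration and are not the Inception V3 model or the filters used in our experiments.

```python
import math
from statistics import median


def filter_and_classify(x, s, phi, f, y_true, y_adv):
    """Apply filter s with kernel size phi, then return (p_true, p_adv) from f."""
    probs = f(s(x, phi))
    return probs[y_true], probs[y_adv]


def median_1d(signal, phi):
    """Hypothetical stand-in filter: 1-D median with replicated borders."""
    half, n = phi // 2, len(signal)
    return [median([signal[min(max(i + d, 0), n - 1)] for d in range(-half, half + 1)])
            for i in range(n)]


def toy_classifier(signal):
    """Hypothetical stand-in model: label 0 for smooth inputs, label 1 for noisy ones."""
    rough = sum(abs(b - a) for a, b in zip(signal, signal[1:]))
    logits = [1.0 - rough, rough - 1.0]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


x_adv = [0.5, 0.5, 1.0, 0.5, 0.5, 0.0, 0.5, 0.5]   # toy input with two noise spikes
results = {phi: filter_and_classify(x_adv, median_1d, phi, toy_classifier, 0, 1)
           for phi in (1, 3)}
for phi, (p_true, p_adv) in results.items():
    print(phi, round(p_true, 3), round(p_adv, 3))
```

With kernel size 1 the filter is the identity and the stand-in classifier favors the adversarial label; with kernel size 3 the spikes are removed and the ground truth label regains the higher probability. Comparing p_true and p_adv across kernel sizes in this way mirrors how the filtering results are reported in the evaluation.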

A. Datasets and AI model
The target AI model used in our implementation is Google Inception V3 [15], trained on the 1,000 common categories of the ImageNet [38] dataset. Our attacking phase uses a white-box setting, and the targeted label is "street sign". We use FGSM with the $l_1$-norm, $l_2$-norm, and $l_\infty$-norm to craft adversarial images.

B. Results
Because of copyright issues with the ImageNet dataset, we use our own images (pictures of a vending machine, a computer mouse, and a keyboard) for analysis. We randomly selected targeted labels for the creation of adversarial images. Using the FGSM method in combination with the $l_1$-norm, $l_2$-norm, and $l_\infty$-norm, we create three different adversarial images from each original image. Fig. 1 shows the probabilities of the original vending machine label when the input is a vending machine image. Fig. 2 shows the probabilities of the adversarial label. Fig. 4 shows the results of creating adversarial images from the original image of the vending machine. We find that the deep learning system is easily fooled by adversarial images. In addition, we observe visually that adversarial images created with the $l_1$-norm and $l_2$-norm are harder to detect than those created with the $l_\infty$-norm.
Fig. 5 shows the experimental results when we apply the image filters to the original image of the vending machine. We find that the Gaussian filter reduces classification accuracy more than the median filter. Notably, with the median filter of size 3×3 and 5×5, the classification results are better than for the original image.
As with the original image, we also apply the image filters to the adversarial images. Fig. 6 shows classification results on adversarial images created by the FGSM method in combination with the $l_1$-norm. Fig. 7 shows classification results on adversarial images created by the FGSM method in combination with the $l_2$-norm. We observed that the Gaussian filter with a 3×3 kernel could not restore the ground truth label on the adversarial image created with the $l_2$-norm; the probability of the vending machine label is only 14.8%. Meanwhile, the median filter still works effectively in removing adversarial noise. Fig. 8 shows classification results on adversarial images created by the FGSM method in combination with the $l_\infty$-norm. We observed that the Gaussian filter with a 3×3 kernel could not eliminate the effect of $l_\infty$-norm adversarial noise on the classification of the deep learning system. The Gaussian 5×5 filter gives better results, but the label with the highest probability is "tobacco shop". The median filter removes adversarial noise but cannot help the deep learning system correctly identify the ground truth label. Table II shows experimental results on the vending machine (vmachine), computer mouse (c-mouse), and keyboard sets. These results show a strong correlation between the norm operations that define the search space of adversarial examples and the behavior of the filters. It is clear that the Gaussian (3×3, 5×5) and median (3×3) filters have more difficulty completely eliminating adversarial noise crafted with the $l_\infty$-norm than noise crafted with the $l_1$ and $l_2$ norms. The median (5×5) filter still proved superior in removing adversarial noise in all settings.

V. CONCLUSION
In this study, we investigated the connection between the search space of adversarial examples and defenses based on the frequency domain. Our empirical results demonstrate that the FGSM method in combination with the $l_\infty$-norm produces the strongest adversarial examples. In this case, neither the Gaussian nor the median filter is able to restore the ground truth label. However, creating adversarial examples with the $l_\infty$-norm also significantly reduces the quality of the original image compared with using the $l_1$ and $l_2$ norms. In terms of similarity to the original image, the $l_1$ and $l_2$ norms produce much better adversarial examples than the $l_\infty$ norm.

ACKNOWLEDGEMENT
We would like to thank Professor Akira Otsuka for the valuable suggestions on this research. This work was supported by the Iwasaki Tomomi Scholarship.