
1 Introduction

Deep learning now prevails in many applications such as smart cars, person identification through security cameras, guidance systems for disabled persons, and voice/image generators driven by various kinds of inputs. In some applications the decision the AI system makes is critical; for example, a smart car takes an image as input and processes it through a network to perform the decision-making task [1]. Unfortunately, research has shown that deep learning methods are sensitive to a kind of noise called adversarial noise, and data containing adversarial noise are called adversarial examples. Adversarial examples perturb the input with small-scale patterns, so that the difference is imperceptible to human eyes but degrades the performance of the neural network [3]. In the object detection case, the small adversarial noise inside an adversarial image leads the classifier to misclassify the object to a wrong label. This could cause a disaster, e.g., a traffic accident of a smart car that misinterprets a traffic sign [2]. There is also the threat that adversarial examples can be used as a tool to attack AI systems. To defend against such attacks, effective defense methods are required.

Defense methods can be categorized according to the approach they use. The first approach enhances the classifier itself so that it becomes robust against adversarial examples [8]. Adversarial training [3] belongs to this category: adversarial images are used as additional training data to give the classifier extra knowledge about adversarial examples. Another approach softens the one-hot labels by label smoothing [10] so that the labels are less sensitive to adversarial noise. Yet another approach is the direct denoising of the adversarial noise; for example, in [6] a defense GAN (Generative Adversarial Network) is used to filter the adversarial noise from the input image. However, all of the above methods use extra adversarial noisy data to train the network. This makes the network robust against certain kinds of adversarial noise but may leave it weak against adversarial examples not seen during training. Furthermore, the training requires many adversarial examples of as many types as possible. In this paper, we propose the use of the deep image prior as a defense method which eliminates the adversarial noise using only the input image itself. Experimental results show the validity of the proposed method.

2 Preliminaries

In this section, to prepare for the proposed approach, we first explain the concept of the adversarial attack and then introduce the deep image prior (DIP) network. In the next section we propose the use of the DIP network for defending against adversarial attacks.

2.1 Adversarial Attack on Neural Networks

An adversarial noise is a carefully designed small perturbation which, when added to the original input, can lead the neural network to make a false decision. When used as a tool of attack, an adversarial example can cause serious harm to a system that depends on the decision of the neural network. Figure 1 shows an adversarial example. Even though the noise added to the image is so small that to the human eye the original image and the adversarial example still look similar, the neural network gives different decisions for the two images.

Fig. 1. Example of an adversarial example. A small adversarial noise added to the original image makes the neural network classify the image as guacamole instead of an Egyptian cat

There are many ways to generate an adversarial example. In [3], a simple method is proposed which generates the adversarial noise by moving the image in the direction that most increases the distance between the output of the neural network and the true label:

$$ \hat{x} = x + \epsilon \cdot \mathrm{sign}\left(\nabla_{x} J(x, y_{true})\right). $$
(1)

Here, \( x \) refers to the input image, \( y_{true} \) is the true label of the input image, \( J(\cdot,\cdot) \) is the loss function of the classifier, \( \nabla_{x} \) is the gradient with respect to \( x \), and \( \epsilon \) is a small positive value. An alternative to (1) is

$$ \hat{x} = x - \epsilon \cdot \mathrm{sign}\left(\nabla_{x} J(x, y_{fool})\right), $$
(2)

which aims to decrease the distance between the output of the neural network and a chosen false label \( y_{fool} \), so that the network falsely classifies the image as that targeted label. In [4], Kurakin et al. propose an iterative scheme which clips all the pixels after each iteration:

$$ \begin{aligned} \hat{x}_{0} &= x, \\ \hat{x}_{N + 1} &= Clip_{x, \epsilon} \left\{ \hat{x}_{N} + \alpha \cdot \mathrm{sign}\left(\nabla_{x} J(\hat{x}_{N}, y_{true})\right) \right\}, \end{aligned} $$
(3)

where

$$ Clip_{x, \epsilon} \left\{ x^{\prime} \right\}\left( i, j, k \right) = \min \left\{ 255,\; x\left( i, j, k \right) + \epsilon,\; \max \left\{ 0,\; x\left( i, j, k \right) - \epsilon,\; x^{\prime}\left( i, j, k \right) \right\} \right\}, $$
(4)

where \( x\left( i, j, k \right) \) is the value at position \( \left( i, j \right) \) in the \( k \)-th channel, and the clipping function \( Clip_{x, \epsilon} \left\{ x^{\prime} \right\} \) keeps \( x^{\prime} \) inside the ball of radius \( \epsilon \) centered at the original image \( x \).
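
For concreteness, the following is a minimal PyTorch sketch of the one-step attack (1) and the iterative, clipped attack (3)-(4). The use of cross-entropy for \( J \), the pixel range \([0, 1]\) instead of \([0, 255]\), and the function names are our own illustrative assumptions, not details taken from [3, 4].

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y_true, eps):
    """One-step attack of Eq. (1): step along the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y_true)      # J(x, y_true)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).detach()

def iterative_attack(model, x, y_true, eps, alpha, num_iters):
    """Iterative attack of Eqs. (3)-(4): small steps, each result clipped to the
    eps-ball around the original image x (pixels assumed to lie in [0, 1])."""
    x_adv = x.clone().detach()
    for _ in range(num_iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_true)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()
            x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)  # Eq. (4) clipping
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```

The targeted variant (2) is obtained by flipping the sign of the step and using \( y_{fool} \) in place of \( y_{true} \).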

2.2 Previous Work on Adversarial Noise Removal Using Deep Neural Networks

In [5], a method for adversarial noise removal based on a high-level representation was introduced, using the well-known U-Net architecture. Since the adversarial noise has a very small noise level, a pixel-wise loss function is normally not enough to eliminate it. Therefore, a high-level representation loss function is proposed in [5],

$$ L = \sum_{i} \left\| f(x_{i}) - f(d_{\theta}(\hat{x}_{i})) \right\|^{2}, $$
(5)

where \( x_{i} \) is the original image, \( \hat{x}_{i} \) is the adversarial noisy image, \( d_{\theta } ( \cdot ) \) is the denoising network to be trained, and \( f( \cdot ) \) is the output of the classifier network which we want to defend from the adversarial noise. The parameters \( \theta \) of the network \( d_{\theta } ( \cdot ) \) are trained to minimize the loss function in (5) over a large number of pairs \( (x_{i}, \hat{x}_{i}) \), \( i = 1, 2, \ldots, N \). Therefore, this kind of network requires many adversarial noisy images and their clean counterparts for training. In contrast, we propose the use of the deep image prior network, which can be trained using only the adversarial noisy image.
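
As an illustration only, the high-level representation loss (5) could be written as below in PyTorch; `feature_extractor` (the chosen layer of \( f \)), `denoiser`, and the batch handling are assumptions, not details taken from [5].

```python
import torch

def high_level_loss(feature_extractor, denoiser, x_clean, x_adv):
    """Loss of Eq. (5): compare classifier features of the clean images with the
    features of the denoised adversarial images, instead of raw pixel values."""
    with torch.no_grad():
        target = feature_extractor(x_clean)          # f(x_i), treated as a fixed target
    output = feature_extractor(denoiser(x_adv))      # f(d_theta(x_hat_i))
    return ((target - output) ** 2).sum()
```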

2.3 Deep Image Prior Network

The work in [9] proposes the deep image prior (DIP) network, which converts a random noise vector \( z \) into a restored image \( g_{\theta } \left( z \right) \), where \( g_{\theta } ( \cdot ) \) denotes the deep image prior network with parameters \( \theta \). As shown in the experiments in [9], the DIP network has a high impedance against noise. Therefore, during the training of the DIP network by minimizing (8) below, the trajectory of the parameters \( \theta \) passes through a good solution \( \theta^{*} \) which yields a well denoised image \( g_{\theta^{*}} \left( z \right) \). Note that \( \theta^{*} \) is not the minimizer of (8); the image obtained with the minimizer would reproduce the noisy image again, so the optimization has to be stopped early. The parameters are obtained by minimizing the following energy functional:

$$ \min_{\theta} E(g_{\theta}(z); x_{0}), $$
(6)

where \( x_{0} \) denotes the noisy target image (here the adversarial example \( \hat{x} \)) and \( E( \cdot ; \cdot ) \) is often set to the squared \( L_{2} \) norm:

$$ E(x; x_{0}) = \left\| x - x_{0} \right\|^{2}, $$
(7)

then (6) becomes

$$ \min_{\theta} \left\| g_{\theta}(z) - x_{0} \right\|^{2}. $$
(8)

The restored image \( x^{*} \) is obtained by

$$ x^{*} = g_{\theta^{*}}(z), $$
(9)

where \( \theta^{*} \) denotes the parameters of the network obtained during the (early stopped) minimization of (6).
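
A minimal sketch of the DIP optimization (6)-(9) is given below, assuming an encoder-decoder network `dip_net`, the Adam optimizer, and an illustrative iteration budget; the fixed iteration count plays the role of the early stopping that selects \( \theta^{*} \).

```python
import torch

def fit_dip(dip_net, x0, num_iters=500, lr=0.01):
    """Minimize ||g_theta(z) - x0||^2 as in Eq. (8) and return the reconstruction
    x* = g_{theta*}(z) of Eq. (9).  z is a fixed random input tensor; num_iters is
    kept small so that theta* is an early-stopped point, not the true minimizer."""
    z = torch.rand_like(x0)                           # random input z
    optimizer = torch.optim.Adam(dip_net.parameters(), lr=lr)
    for _ in range(num_iters):
        optimizer.zero_grad()
        loss = ((dip_net(z) - x0) ** 2).sum()         # E(g_theta(z); x0), Eq. (7)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        return dip_net(z)                             # restored image x*
```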

3 Adversarial Example Defense Based on the DIP

In this section, we propose a defense method against adversarial examples via the deep image prior network. Unlike other adversarial noise elimination methods based on deep neural networks, the proposed defense does not need a network trained on many pairs of adversarial and clean images; it can eliminate the adversarial noise using only the adversarial noisy image itself. The main idea is that projecting the adversarial noisy image onto the deep image prior space eliminates the adversarial noise.

Let \( I_{in} \) be the input we want to put through the target classifier network \( f \). Here, \( I_{in} \) may or may not contain adversarial noise. Before putting \( I_{in} \) through the classifier, we first pass it through the deep image prior network \( g_{\theta} \), whose parameters \( \theta \) are updated by minimizing the following loss function:

$$ L = \left\| I_{in} - g_{\theta}(I_{in}) \right\|^{2}. $$
(10)

The deep image prior network is trained on the single image \( I_{in} \), which differs from other networks that are trained on many images. Let \( x_{0} \) be the input to the deep image prior network, and let it be the image that we want to put through the target classifier, i.e., let \( x_{0} = I_{in} \). Let \( \hat{x}_{0}, \hat{x}_{1}, \ldots, \hat{x}_{t} \) be the outputs of the DIP along the updates during training. After each iteration, the update of the parameters from \( \theta_{k} \) to \( \theta_{k+1} \) adds new high-frequency components to the output \( g_{\theta_{k}}(x_{0}) \), as can be seen in Fig. 2.

Fig. 2. Construction of the noiseless image with the deep image prior network

We interpret the output of the deep image prior network as a projection of the input image onto the deep image prior space (Fig. 3). That is, the output of the deep image prior network is an approximation of the input image constrained by the deep image prior. The early outputs \( g_{\theta}(I_{in}) \) lack many high-frequency components, so that \( f(g_{\theta}(I_{in})) \ne f(I_{true}) \), where \( I_{true} \) denotes the image without any adversarial noise. However, after a sufficient number of updates, \( g_{\theta}(I_{in}) \) approaches \( I_{in} \) in the \( L_{2} \) norm sense while remaining in the true class space, which does not include the adversarial noise. Putting \( g_{\theta}(I_{in}) \) through the classifier \( f \) therefore gives the correct classification result, i.e., \( f(g_{\theta}(I_{in})) \approx f(I_{true}) \), while for a clean input it remains consistent with \( f(I_{in}) \).

Fig. 3. Projection onto the true class space
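
A sketch of the proposed defense is shown below, again with illustrative hyper-parameters: the only change from the previous DIP sketch is that the network input is \( I_{in} \) itself rather than a random \( z \), matching the loss (10), and the early-stopped reconstruction is passed to the classifier \( f \).

```python
import torch

def defend_and_classify(classifier, dip_net, image_in, num_iters=500, lr=0.01):
    """Fit the DIP to the (possibly adversarial) input by minimizing Eq. (10),
    then classify the projection g_theta(I_in) instead of I_in itself.
    image_in is assumed to be a 1 x C x H x W tensor."""
    optimizer = torch.optim.Adam(dip_net.parameters(), lr=lr)
    for _ in range(num_iters):                        # early stopping keeps the
        optimizer.zero_grad()                         # adversarial pattern out
        loss = ((image_in - dip_net(image_in)) ** 2).sum()   # Eq. (10)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        denoised = dip_net(image_in)                  # g_theta(I_in)
        return classifier(denoised).argmax(dim=1)     # label of f(g_theta(I_in))
```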

4 Experimental Results

To examine the performance of the algorithm, we conduct experiments using the pre-trained Inception-V3 [7] model. We test the defense system on clean images as well as adversarial images, to verify that the DIP does not alter the high-frequency components in a way that changes the classification result. We used 100 cat images (of size 299 × 299, RGB color) and the targeted fast gradient sign method (FGSM) in (2) as the attack method. We used two different settings for \( \epsilon \): dataset A is made with \( \epsilon = 0.008 \) and dataset B with \( \epsilon = 0.08 \). Figure 4 shows the original image, the adversarial image, the difference between the original and the adversarial image, the image denoised with the DIP, and the difference between the adversarial and the denoised image, respectively. The original and the adversarial image cannot be distinguished by eye, yet the difference image shows the patterns that lead to the misclassification. The difference images are multiplied by 200 for visualization. The denoised image also differs somewhat from the original image, but its difference image shows no adversarial pattern, only differences in the edge regions. Therefore, the adversarial pattern is eliminated in the denoised image.

Fig. 4. Original image, adversarial image, difference between the original and adversarial images, DIP-denoised image, and difference between the adversarial and denoised images

In Table 1, we examine the classification accuracy of the classifier on the adversarial images, the clean images, and the DIP-reconstructed images. We also reconstructed the clean images with the DIP and measured the accuracy, to verify that the reconstructed images yield the same classification results as the clean ones. The classification accuracy on the original clean images before applying the DIP is about 95%, while the accuracy on the adversarial images drops to 1%. After applying the DIP-based denoising to the adversarial images, the classification accuracy recovers to 87%, showing that the DIP defends against the adversarial examples.

Table 1 Accuracy results (%) of the CNN classifier (Inception-V3) before/after applying the DIP
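
The accuracies in Table 1 can be computed with a simple evaluation loop such as the hypothetical sketch below, where `defend` is an optional callable (e.g. a DIP reconstruction step like the one sketched in Sect. 3, returning the denoised image) applied before classification; the helper name and tensor shapes are assumptions.

```python
import torch

def accuracy(classifier, images, labels, defend=None):
    """Percentage of images classified correctly, optionally after a defense step."""
    correct = 0
    for x, y in zip(images, labels):                  # x: C x H x W tensor, y: int label
        x_eval = defend(x) if defend is not None else x
        with torch.no_grad():
            pred = classifier(x_eval.unsqueeze(0)).argmax(dim=1).item()
        correct += int(pred == y)
    return 100.0 * correct / len(images)
```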

5 Conclusion

We proposed the use of the deep image prior network to remove and detect the adversarial noise embedded in an image. The deep image prior network makes it possible to remove the adversarial noise using only the adversarial noisy image. One major drawback of the proposed method is that the deep image prior network has to be trained for every incoming image. However, the number of training iterations is small enough for the method to work in real time when combined with multi-core GPU systems. How to accelerate the proposed defense system is a topic for further study.