
1 Introduction

CNNs have recently outperformed traditional machine learning methods in various tasks, such as image classification [1,2,3], object detection [4,5,6], and speech recognition [7,8,9]. However, like many machine learning classifiers, deep learning methods can be easily fooled by small, imperceptible perturbations of the input [10]. The main reason may be the linear classifier used at the last layer of deep models. Although linear classifiers are very effective for linearly separable problems, they force the model to assign high confidence to regions far from the decision boundary. An adversarial attack can therefore make small changes along many dimensions of the input image, pushing the perturbed image across the classification boundary so that it is ultimately misclassified by the network.

In order to mitigate the effect of adversarial attacks, two kinds of defense techniques have been proposed: data-level methods and algorithm-level methods. The former include adversarial training [11, 12], pre-processing with basis functions [13], and noise removal [14]. The latter can be found in the literature [15,16,17,18,19,20], where the deep model is modified or trained by reducing the magnitude of gradients [17] or by masking gradients [18]. However, these approaches are not completely effective against several different white-box and black-box attacks [14]. Similar to the pre-processing-based methods, they may sacrifice accuracy to defend against some attacks. In general, most of these defense strategies degrade classification accuracy on clean data.

As mentioned above, successful adversarial attacks are mainly due to the fact that the models behave too linearly in high-dimensional space. This greatly decreases the flexibility of the models and keeps the decision boundary close to the manifolds of the training data. In order to improve the nonlinearity of the model, Goodfellow et al. [11] explored a variety of methods, including shallow and deep RBF networks. They used a shallow RBF network to achieve good performance against adversarial perturbations, but found it difficult to train a deep RBF network.

In this paper, we explore a model that incorporates a deep neural network and an RBF network, which not only ensures that the model can effectively resist perturbations but also keeps it easy to train. Meanwhile, small noise is added to the network input, which improves the robustness of the network against attacks and resists white-box and black-box attacks effectively.

2 Related Work

2.1 Adversarial Examples

Adversarial examples were first introduced by Szegedy et al. [12], who showed that the prediction of a network can be changed arbitrarily by applying imperceptible non-random perturbations to the input image. The malicious input is \({X}'=X+\alpha \), where \(\alpha \) is a slight perturbation with \(\left\| \alpha \right\| < \epsilon \) and \(\epsilon \) is so small that there is no visual difference between X and \({X}'\) for human beings, yet deep neural networks are fooled.

In addition, Szegedy et al. point out that adversarial examples are relatively robust and generalize across neural networks with different depths and activation functions. In other words, if we use one neural network to generate adversarial examples, another neural network also misclassifies them, even when it is trained with different hyper-parameters or on a different subset of the dataset. This phenomenon makes black-box attacks feasible.

Finally, Goodfellow et al. [11] state that it is the linear behavior of the model in high-dimensional space that leads to the phenomenon of adversarial examples, and propose the Fast Gradient Sign Method (FGSM) for crafting adversarial examples. FGSM is an untargeted attack method and applies the same attack strength in every dimension:

$$\begin{aligned} X_{FGSM}=X+\varepsilon \,\mathrm {sign}\left( \nabla _{X}J\left( X,y \right) \right) . \end{aligned}$$
(1)

In this equation, the adversarial example is obtained by adding the sign of the input gradient, scaled by \(\varepsilon \), to the input X, where \(\varepsilon \) is small enough for the perturbation to be imperceptible.
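For concreteness, a minimal sketch of FGSM in PyTorch is given below. The function name fgsm_attack and the pixel range [0, 1] are our own illustrative assumptions; model, loss_fn, and epsilon are supplied by the caller.

```python
import torch

def fgsm_attack(model, loss_fn, X, y, epsilon):
    # Craft FGSM adversarial examples as in Eq. (1):
    #   X_FGSM = X + epsilon * sign(grad_X J(X, y)).
    # Assumes image pixels lie in [0, 1]; model, loss_fn, epsilon come from the caller.
    X_adv = X.clone().detach().requires_grad_(True)
    loss = loss_fn(model(X_adv), y)        # J(X, y)
    loss.backward()                        # gradient with respect to the input
    with torch.no_grad():
        X_adv = X_adv + epsilon * X_adv.grad.sign()
        X_adv.clamp_(0.0, 1.0)             # keep pixels in a valid range
    return X_adv.detach()
```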

2.2 Gaussian Noise

Gaussian noise (also known as normal noise) is commonly used in both the spatial and frequency domains. The probability density function of a Gaussian random variable Z is given by the following formula:

$$\begin{aligned} p\left( Z \right) =\frac{1}{\sqrt{2\pi }\sigma }e^{-\frac{\left( Z-\bar{Z} \right) ^{2}}{2\sigma ^{2}}}, \end{aligned}$$
(2)

where Z is the gray value, and \(\bar{Z}\) and \(\sigma \) are its mean and standard deviation, respectively.

The perturbation of an adversarial example is in general extremely small. In [21], Gu and Rigazio consider an alternative strategy of adding Gaussian noise to adversarial examples. The aim of this strategy is to push adversarial examples out of the “blind spot” areas of the classification space by adding extra, “larger” interference noise to the input. Moreover, adding such ordinary tiny perturbations does not have a noticeable impact on the performance of the neural network. Experimental results show that Gaussian noise injection can defend against adversarial examples to some extent.
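As a minimal sketch of this noise-injection idea (assuming inputs scaled to [0, 1]; the function name is ours, and sigma = 0.3 is the value used later in Sect. 4.1), the extra Gaussian perturbation can be added as:

```python
import torch

def add_gaussian_noise(X, sigma=0.3):
    # Inject zero-mean Gaussian noise with standard deviation sigma (Eq. (2) with mean 0)
    # before feeding X to the network.
    noise = sigma * torch.randn_like(X)
    return (X + noise).clamp(0.0, 1.0)     # keep pixels in a valid range (our assumption)
```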

2.3 RBF

RBF networks [22,23,24] are neural networks with one hidden layer of RBF units and a linear output layer. An RBF unit is a neuron with multiple real inputs \(X=\left( X_{1},\cdots ,X_{n} \right) \) and one output y. Each unit is determined by an n-dimensional center vector C and a parameter \(\beta > 0\). The output y is computed as:

$$\begin{aligned} y=\varphi \left( \xi \right) ;\xi =\beta \left\| X-C \right\| ^{2}, \end{aligned}$$
(3)

where \(\varphi :\mathbf {R}\rightarrow \mathbf {R} \) is a suitable activation function, typically the Gaussian \(\varphi \left( z \right) =e^{-z^{2}} \). Thus the network computes the following function \( f:\mathbf {R}^{n}\rightarrow \mathbf {R}^{m}\), where m is the number of output units:

$$\begin{aligned} f_{s}\left( X \right) =\sum _{j=1}^{h}\omega _{sj}\varphi \left( \beta _{j}\left\| X-C_{j} \right\| ^{2}\right) , \end{aligned}$$
(4)

where \(\omega _{sj}\in \mathbf {R} \) and \(f_{s}\) is the output of the sth output unit.

Compared with ordinary networks, RBF networks use radial basis function units at the last layer. Unlike a linear unit, an RBF unit is activated in a well-defined region of its input space [25]. In this case, the goal of the feature extractor network is to map the data to a new representation in which each class forms a cluster. Experimental results show that when an RBF network is combined with a CNN, it can effectively resist adversarial perturbations.
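A minimal sketch of such an RBF output layer, following Eqs. (3)-(4), is given below. The class name RBFLayer and the choice of learning the centers \(C_{j}\), widths \(\beta _{j}\), and weights by gradient descent are our own illustrative assumptions, not details taken from [22,23,24,25].

```python
import torch
import torch.nn as nn

class RBFLayer(nn.Module):
    # Computes f_s(X) = sum_j w_sj * exp(-beta_j * ||X - C_j||^2), as in Eqs. (3)-(4).
    def __init__(self, in_features, num_centers, out_features):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_centers, in_features))   # centers C_j
        self.log_beta = nn.Parameter(torch.zeros(num_centers))               # ensures beta_j > 0
        self.linear = nn.Linear(num_centers, out_features, bias=False)       # weights w_sj

    def forward(self, x):
        dist_sq = torch.cdist(x, self.centers).pow(2)      # squared distances ||X - C_j||^2
        phi = torch.exp(-self.log_beta.exp() * dist_sq)    # Gaussian RBF units
        return self.linear(phi)                            # linear output layer
```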

3 Method

3.1 Model

Inspired by the defense methods in [21, 25], we combine a data-level defense with an algorithm-level defense and propose an incorporated model that adds Gaussian noise injection and an RBF network to a neural network. Note that the backbone network can be any CNN (see Fig. 1).

Fig. 1. The incorporated network model.

Here we take the classic LeNet-5 [26] network as an example to demonstrate the proposed model (see Fig. 2). In this figure, the LeNet-5 network structure is in the dashed box. Before an image is input into the LeNet-5 network, small Gaussian noise is added to it. After feature extraction and classification by the LeNet-5 network, the output of LeNet-5 is fed into the RBF network, whose output is the final classification result of our proposed model.

Fig. 2. The incorporated network model based on LeNet-5.
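A sketch of the incorporated model under our assumptions is given below; it reuses the add_gaussian_noise and RBFLayer sketches from Sect. 2.2 and Sect. 2.3, and the backbone is any CNN (e.g. LeNet-5 or S-CNN) whose output dimension equals the number of classes. Whether noise is also injected at test time is not stated here and is treated as a configurable choice.

```python
import torch.nn as nn

class IncorporatedModel(nn.Module):
    # Gaussian noise injection -> backbone CNN (e.g. LeNet-5) -> RBF network.
    def __init__(self, backbone, num_classes=10, num_centers=300, sigma=0.3,
                 noise_at_test=True):
        super().__init__()
        self.backbone = backbone                 # any CNN producing num_classes scores
        self.rbf = RBFLayer(num_classes, num_centers, num_classes)
        self.sigma = sigma
        self.noise_at_test = noise_at_test       # assumption: noise also applied at test time

    def forward(self, x):
        if self.training or self.noise_at_test:
            x = add_gaussian_noise(x, self.sigma)   # small Gaussian noise on the input image
        features = self.backbone(x)                 # output of the LeNet-5 / S-CNN part
        return self.rbf(features)                   # final classification scores
```

The default values num_centers = 300 and sigma = 0.3 follow the settings given in Sect. 4.1.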

3.2 Loss Function

When training the proposed model, we compute the loss on the final output using the cross-entropy loss function:

$$\begin{aligned} Loss = loss\left( \hat{y},y \right) \end{aligned}$$
(5)
$$\begin{aligned} loss=-\sum _{i=1}^{N}\left[ y_{i}\log \left( \hat{y_{i}} \right) +\left( 1-y_{i} \right) \log \left( 1-\hat{y_{i}} \right) \right] , \end{aligned}$$
(6)

where \(\hat{y}\) is the final output of the proposed model and y is the true label.
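Assuming the final RBF outputs are squashed into (0, 1) with a sigmoid and the labels are one-hot encoded (both are our assumptions; the paper only states Eqs. (5)-(6)), the loss can be computed as in this sketch:

```python
import torch
import torch.nn.functional as F

def incorporated_loss(y_hat_raw, y, num_classes=10):
    # Eq. (6): -sum_i [ y_i * log(y_hat_i) + (1 - y_i) * log(1 - y_hat_i) ].
    # y_hat_raw: raw RBF outputs of shape (batch, num_classes); y: integer class labels.
    y_onehot = F.one_hot(y, num_classes).float()
    y_hat = torch.sigmoid(y_hat_raw)               # assumption: squash outputs into (0, 1)
    return F.binary_cross_entropy(y_hat, y_onehot, reduction="sum")
```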

4 Experiments

The experiments in this paper are based on two benchmark datasets: MNIST [26] and Fashion-MNIST [27]. MNIST contains 60,000 images in the training set and 10,000 images in the test set. Each image is a grayscale image of 28\(\times \)28 pixels, and the number of possible classes is 10. Fashion-MNIST is a standard dataset for commodity classification. Its image size, color format, and the sizes of its training and test sets are the same as those of MNIST, but it is more difficult to classify than MNIST.

4.1 Experiment Setups

In the experiments, we consider two basic CNNs: a simple CNN with two convolutional layers (S-CNN) and LeNet-5. The activation function in each network is ReLU and the loss function is the cross-entropy function. The detailed model structures and parameter information are shown in Table 1, and Table 2 lists the training hyper-parameters chosen for all models.

In order to verify the superiority of the proposed model, the following three models are chosen for comparison: the basic CNN model, the CNN model combined with an RBF network (CNN_RBF) [25] with 300 RBF centers, and a CNN model incorporating Gaussian noise (Gauss_CNN) [21] with standard deviation \(\sigma =0.3\) on MNIST and Fashion-MNIST.

Moreover, five test sets are used in the defense experiments. For the networks based on S-CNN, we first set the 10,000 clean test images of the MNIST dataset as Test Set i. We then generate adversarial test sets by attacking four models (S-CNN, S-CNN_RBF, Gauss_S-CNN, and the proposed model), which are denoted Test Sets ii to v. For the networks based on LeNet-5, two benchmark datasets, MNIST and Fashion-MNIST, are chosen. For each dataset, the same number of clean test images forms Test Set i. Finally, we generate adversarial test sets by attacking four models (LeNet-5, LeNet-5_RBF, Gauss_LeNet-5, and the proposed model), which are denoted Test Sets ii to v, as sketched below.
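As a hedged sketch of how Test Sets ii to v might be produced, the loop below reuses the fgsm_attack sketch from Sect. 2.1; the batching scheme and the value of epsilon are our assumptions rather than settings reported in the paper.

```python
import torch

def build_adversarial_test_sets(models, loss_fn, test_loader, epsilon=0.25):
    # One adversarial test set per attacked model (Test Sets ii to v).
    test_sets = []
    for model in models:            # e.g. [basic CNN, CNN_RBF, Gauss_CNN, proposed model]
        model.eval()
        adv_batches, label_batches = [], []
        for X, y in test_loader:
            adv_batches.append(fgsm_attack(model, loss_fn, X, y, epsilon))
            label_batches.append(y)
        test_sets.append((torch.cat(adv_batches), torch.cat(label_batches)))
    return test_sets
```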

Table 1. Network model parameters.
Table 2. Overview of training parameters.

4.2 Experiment Results

In this section, we report the results of several classification experiments. We first start with the S-CNN model on MNIST. Next, the proposed method is applied to another classical model, LeNet-5, on MNIST and Fashion-MNIST. A series of adversarial examples for each test set is generated by the FGSM attack, as shown in Fig. 3.

Fig. 3. An illustration of each test set. The leftmost column displays the original images, and the next four columns show adversarial examples corresponding to Test Sets ii, iii, iv and v, from left to right.

S-CNN. In this section, we compare the recognition accuracy of the four network models on each test set; the experimental results are shown in Table 3.
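For reference, a minimal accuracy-evaluation loop of the kind used to fill Table 3 might look like the sketch below; the function name and batch size are our own assumptions.

```python
import torch

@torch.no_grad()
def accuracy(model, images, labels, batch_size=256):
    # Recognition accuracy of one model on one (clean or adversarial) test set.
    model.eval()
    correct = 0
    for i in range(0, images.size(0), batch_size):
        preds = model(images[i:i + batch_size]).argmax(dim=1)
        correct += (preds == labels[i:i + batch_size]).sum().item()
    return correct / images.size(0)
```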

Observing the data in the first column (Test Set i) of Table 3, the proposed network model does not significantly drop the accuracy on clean images. That is, Gaussian noise can prevent the network model from overfitting, and the interference of Gaussian noise is equivalent to a data augmentation of the original dataset. From Test Set ii to Test Set v, the diagonal entries indicate the recognition accuracy of each network model under the white-box attack. Comparing these diagonal entries, the performance of S-CNN_RBF and Gauss_S-CNN against the white-box attack is improved; the improvement of S-CNN_RBF is especially obvious, being more than 65% higher than S-CNN. Therefore, adding the RBF network after the softmax layer of the basic network can greatly improve its robustness. This is due to the fact that the strong local approximation ability of the RBF network allows the basic model S-CNN to better fit the real decision boundary and compress the adversarial space, so the basic model with RBF can resist the white-box attack effectively. Furthermore, the data for Test Set ii show how well the four models defend against the black-box attack. Although the S-CNN_RBF model performs well against the white-box attack, its defense performance against the black-box attack is still only about 30%. This is because the adversarial examples of Test Set ii are generated from the original model, and adversarial examples have the ability to transfer across models. Therefore, using the RBF network can resist the white-box attack, but it cannot effectively resist the transfer attack. The transfer (migration) attack is defined as an attack that exploits the vulnerabilities of another network to generate adversarial examples and uses their transferability against an unknown network.

Table 3. Experimental results based on S-CNN model on MNIST dataset.

In addition, although the network with only Gaussian noise added is not ideal against the white-box attack (compared with S-CNN, there is only an increase of about 30%), it has a high recognition accuracy against the black-box attack, with an increase of more than 70%. This is because the adversarial perturbations are extremely small and there are “blind spot” areas in the input space of the image. Adding extra noise to the images can break the effect of these perturbations, so that adversarial examples are moved out of the “blind spots” and correctly classified, while the extra noise has little impact on the performance of the neural network. From the last row of Table 3, we can observe that in the incorporated model Gaussian noise and the RBF network reinforce each other. Compared with the other defense models, the defense performance of the proposed model against both the white-box attack and the black-box attack is effectively improved. From Test Set i to Test Set v, the proposed model almost always keeps its superiority. In conclusion, the proposed network structure has better robustness against white-box and black-box attacks.

LeNet-5. Another classic network model, LeNet-5, is used in this section, and experiments are carried out on two datasets: MNIST and Fashion-MNIST. The detailed defense performance on MNIST and Fashion-MNIST is reported in Table 4 and Table 5, respectively.

Table 4. Experimental results based on LeNet-5 model on MNIST dataset.

Observing the data reported in Table 4, the same conclusions as for S-CNN are obtained. The diagonal entries from Test Set ii to Test Set v represent the performance of the defense models against the white-box attack, and the off-diagonal entries from Test Set ii to Test Set v give the performance of each defense model against the black-box attack. The analysis shows that LeNet-5_RBF has a better defense performance against the white-box attack, while Gauss_LeNet-5 performs better against the black-box attack. The proposed model, which incorporates the merits of both, has better defense performance against both white-box and black-box attacks. Indeed, the proposed model based on LeNet-5 achieves a recognition accuracy of over 70% on every adversarial test set.

Table 5. Experimental results based on LeNet-5 model on Fashion-MNIST dataset.

Observing the data in Table 5, the improvement on the Fashion-MNIST dataset is not as obvious as on the MNIST dataset, but the improvement trend of each defense model is consistent with that on MNIST. LeNet-5_RBF again helps to enhance the robustness of the model and its defense ability against the white-box attack. The average defensive performance of Gauss_LeNet-5 against the black-box attack is about 60%, and the proposed incorporated model performs slightly better on MNIST than on Fashion-MNIST. However, the accuracy of Gauss_LeNet-5 and the proposed model drops on the clean test set. This phenomenon is due to the fact that the Fashion-MNIST dataset is more complicated than MNIST, so adding Gaussian noise on Fashion-MNIST has an impact on the classification task. The recognition accuracy in the fourth column of Table 5 (Test Set iv) is less than 20% for every network. The reason may be that the images in Test Set iv are generated by attacking the Gauss_LeNet-5 model, and the incorporated noise is larger relative to the original images; this noise therefore leads to low classification accuracy for all models. However, the consistent upward trend on Test Set iv again indicates that the incorporated model structure is effective in defending against adversarial examples.

In summary, comparing the defense performance with the other three models against white-box and black-box attacks, the proposed model can defend against adversarial examples effectively on Fashion-MNIST.

5 Conclusion

In this paper, we propose an incorporated defense method with Gaussian noise injection and an RBF network. The experimental results show that the proposed method can effectively resist adversarial examples under both white-box and black-box attacks. Furthermore, compared with other methods, the proposed method effectively improves the classification accuracy on adversarial images and does not significantly drop the accuracy on clean images.