
1 Introduction

Deep neural networks (DNNs) have achieved excellent performance on various computer vision tasks. However, in recent years their vulnerability has been discovered [1, 2]: a DNN can be made to produce a wrong result when imperceptible, carefully crafted noise is added to a clean input. In addition, adversarial examples show an intriguing transferability [1, 3]: examples crafted on one model can also fool other models. Because adversarial examples can be used not only to evaluate the robustness of networks but also to improve their robustness through adversarial training [4, 9], how to improve the transferability of adversarial examples has attracted a lot of attention (Fig. 1).

Fig. 1. Visualization of a targeted adversarial example generated by the proposed method (EI-NI-FGSM-HAG). The original image (a) is correctly recognized as a “cock” by ResNet-50; “armadillo” is randomly selected from the other (wrong) labels. The adversarial noise (b) is crafted on ResNet-152, VGG-19, Densenet-121 and Inception-v3. The adversarial image (c), the sum of the original image (a) and the noise (b), is recognized as an “armadillo” by ResNet-50.

With full knowledge of the network, several methods have been proposed to generate adversarial examples, including optimization-based methods such as box-constrained L-BFGS [1] and the Carlini & Wagner attack (C&W) [5], and gradient-based methods such as the fast gradient sign method (FGSM) [2] and the basic iterative method (I-FGSM) [7]. These white-box attack methods achieve high success rates. For black-box attacks, two kinds of approaches have been proposed. One is the query-based approach [8], which trains a surrogate model by querying the unknown model; since the surrogate model produces predictions similar to the unknown model, white-box attack methods can then be applied to it to generate adversarial examples. In practice, however, this requires a large number of queries when the unknown network is complicated, which is easily detected by the model’s defense system. The other is the transfer-based approach: Dong et al. [10] use white-box attack methods against an ensemble of multiple models to generate adversarial images with high transferability, with no need to query the unknown network. The aim of ensemble-based approaches is to attack the models’ common vulnerability. However, they show low efficacy for targeted attacks, which require adversarial examples to be classified by a network as a specified target label [3].

In this work, we improve the transferability of adversarial images based on an ensemble of models in three aspects: ensemble schemes, gradient descent mechanisms, and optimization methods.

  • We discovered that different ensemble schemes have different effects on the transferability of non-targeted and targeted attacks. Specifically, for non-targeted attacks, the ensemble-in-softmax scheme achieves a higher success rate than the other two schemes, while the ensemble-in-loss scheme is better for targeted attacks.

  • In addition, we studied different gradient descent mechanisms. The results show that pixels with higher absolute gradient values better represent the common properties shared across models; attacking these common properties improves the transferability of adversarial examples.

  • We integrate the Nesterov Iterative Fast Gradient Sign Method (NI-FGSM) [11] into the attack on the ensemble of models to avoid falling into local optima during optimization. This method has been shown to outperform the Momentum Iterative Fast Gradient Sign Method (MI-FGSM) [10] on a single model.

Extensive experiments on the ImageNet dataset [6] demonstrate that, in the black-box setting, the proposed attack methods improve the success rates of both non-targeted and targeted attacks by a large margin. For targeted attacks, our best attack reaches a top-5 success rate of 30.1%. This makes targeted attacks on black-box systems feasible.

2 Related Works

In this section, we give a brief introduction to related work on adversarial attacks. Let \(\boldsymbol{x}\) and \(\boldsymbol{x}^{adv}\) be a benign input and an adversarial input, respectively. Given a classifier \(f_\theta (\boldsymbol{x})\) with ground-truth label y, the goal of a non-targeted attack is to search for an adversarial image \(\boldsymbol{x}^{adv}\) that satisfies \(f_\theta (\boldsymbol{x}^{adv}) \ne y\). In a targeted attack, the attacker aims to find an adversarial image misclassified into a specified class \(y^{target}\), that is, \(f_\theta (\boldsymbol{x}^{adv})=y^{target}\). To limit the distortion, the adversarial images generated by both kinds of attack should satisfy \(||\boldsymbol{x}^{adv}-\boldsymbol{x}||_p \le \varepsilon \), where p can be \(0,1,2,\infty \) and \(\varepsilon \) is the maximum allowed distortion.

2.1 Optimization-Based Methods

These methods directly optimize the distortion between the benign image and the adversarial image [1, 5]. Specifically, a non-targeted attack searches for an adversarial example \(\boldsymbol{x}^{adv}\) by solving:

$$\begin{aligned} \mathop {\arg \min }\limits _{\boldsymbol{x}^{adv}} ||\boldsymbol{x}^{adv}-\boldsymbol{x}||_p-c\cdot J(\boldsymbol{x}^{adv},y^{true}) \end{aligned}$$
(1)

where \(J(\boldsymbol{x}^{adv},y^{true})\) is the loss function with respect to the true label \(y^{true}\) and c is a constant that balances the loss and the distortion. Although this approach is effective at finding adversarial images, it is difficult to ensure that the distortion between \(\boldsymbol{x}^{adv}\) and \(\boldsymbol{x}\) is less than \(\varepsilon \).
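As a concrete illustration, the following PyTorch-style sketch minimizes the objective of Eq. 1 with an off-the-shelf optimizer; the function name, the choice of Adam, the L2 norm, and the [0, 1] pixel range are illustrative assumptions rather than the exact setup of the cited attacks.

```python
import torch
import torch.nn.functional as F

def optimization_attack(model, x, y_true, c=1.0, steps=100, lr=0.01):
    """Non-targeted optimization-based attack, a sketch of Eq. 1."""
    x_adv = x.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([x_adv], lr=lr)
    for _ in range(steps):
        logits = model(x_adv)
        # minimize the distortion while maximizing the classification loss
        loss = torch.norm(x_adv - x, p=2) - c * F.cross_entropy(logits, y_true)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        x_adv.data.clamp_(0, 1)  # assumed pixel range [0, 1]
    return x_adv.detach()
```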

2.2 Gradient-Based Methods

Fast Gradient Sign Method (FGSM): FGSM [2] finds an adversarial image \(\boldsymbol{x}^{adv}\) by the following equation:

$$\begin{aligned} \boldsymbol{x}^{adv} = \boldsymbol{x} + \varepsilon \cdot \textrm{sign}(\nabla _x J(\boldsymbol{x},y^{true})) \end{aligned}$$
(2)

This method requires only a one-step update, and \(\varepsilon \) limits the maximum distortion.
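A minimal sketch of the one-step update of Eq. 2 in PyTorch, assuming inputs normalized to [0, 1]; the helper name and the clamping are our own choices.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y_true, eps):
    """One-step FGSM (Eq. 2): x_adv = x + eps * sign(grad_x J(x, y_true))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y_true)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()  # assumed pixel range [0, 1]
```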

Iterative Fast Gradient Sign Method (I-FGSM): I-FGSM [7] is an iterative version of FGSM. The iteration step size is \(\alpha =\varepsilon /T\), where T is the number of iterations. It can be expressed as:

$$\begin{aligned} \boldsymbol{x}^{adv}_{0} = \boldsymbol{x}, \quad \boldsymbol{x}^{adv}_{t+1} = \boldsymbol{x}^{adv}_{t} + \alpha \cdot \textrm{sign}(\nabla _x J(\boldsymbol{x}^{adv}_{t},y^{true})) \end{aligned}$$
(3)

In the white-box setting, the performance of iterative methods greatly exceeds that of one-step methods. However, the transferability of the resulting adversarial examples is worse.
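The iterative update of Eq. 3 can be sketched as follows, again assuming inputs in [0, 1]; the loop mirrors the equation with \(\alpha = \varepsilon /T\).

```python
import torch
import torch.nn.functional as F

def i_fgsm(model, x, y_true, eps, T=10):
    """Iterative FGSM (Eq. 3) with step size alpha = eps / T."""
    alpha = eps / T
    x_adv = x.clone().detach()
    for _ in range(T):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_true)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).clamp(0, 1).detach()
    return x_adv
```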

Momentum Iterative Fast Gradient Sign Method (MI-FGSM): MI-FGSM [10] integrates momentum [12] into each iteration of the optimization process, which improves the transferability of adversarial images. The method is formalized as follows:

$$\begin{aligned} \boldsymbol{g}_{t+1}&=\mu \cdot \boldsymbol{g}_{t}+\dfrac{{\nabla _{\boldsymbol{x}}}J(\boldsymbol{x}^{adv}_t,y^{true})}{||{\nabla _{\boldsymbol{x}}}J(\boldsymbol{x}^{adv}_t,y^{true})||_1} \end{aligned}$$
(4)
$$\begin{aligned} \boldsymbol{x}^{adv}_{t+1}&= \boldsymbol{x}^{adv}_{t}+\alpha \cdot \textrm{sign}(\boldsymbol{g}_{t+1} ) \end{aligned}$$
(5)

\(\mu \) is the decay factor and \(\boldsymbol{g}_t\) is the accumulated gradient, initialized as \(\boldsymbol{g}_{0}=0\).
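A sketch of the MI-FGSM update (Eqs. 4 and 5) for a single image; the per-image L1 normalization and the [0, 1] pixel range are assumptions of this illustration.

```python
import torch
import torch.nn.functional as F

def mi_fgsm(model, x, y_true, eps, T=10, mu=1.0):
    """MI-FGSM (Eqs. 4-5): accumulate L1-normalized gradients with momentum."""
    alpha = eps / T
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)  # g_0 = 0
    for _ in range(T):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_true)
        grad = torch.autograd.grad(loss, x_adv)[0]
        g = mu * g + grad / grad.abs().sum()                     # Eq. 4 (single image)
        x_adv = (x_adv + alpha * g.sign()).clamp(0, 1).detach()  # Eq. 5
    return x_adv
```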

Nesterov Iterative Fast Gradient Sign Method (NI-FGSM): NI-FGSM [11] uses the previously accumulated gradient as a correction to look ahead, which helps avoid getting trapped in local optima. As in MI-FGSM, \(\boldsymbol{g}_{t}\) is initialized as \(\boldsymbol{g}_{0}=0\). The update procedure is carried out as follows:

$$\begin{aligned} \boldsymbol{x}_{t}^{nes}&= \boldsymbol{x}_{t}^{adv} +\alpha \cdot \mu \cdot \boldsymbol{g}_t\end{aligned}$$
(6)
$$\begin{aligned} \boldsymbol{g}_{t+1}&=\mu \cdot \boldsymbol{g}_{t}+\dfrac{{\nabla _{\boldsymbol{x}}}J(\boldsymbol{x}^{nes}_t,y^{true})}{||{\nabla _{\boldsymbol{x}}}J(\boldsymbol{x}^{nes}_t,y^{true})||_1} \end{aligned}$$
(7)
$$\begin{aligned} \boldsymbol{x}^{adv}_{t+1}&=\boldsymbol{x}^{adv}_{t}+ \alpha \cdot \textrm{sign}(\boldsymbol{g}_{t+1} ) \end{aligned}$$
(8)

\(\boldsymbol{g}_t\) denotes the accumulated gradient [13] at iteration t, and \(\mu \) denotes the decay factor.
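The look-ahead of NI-FGSM (Eqs. 6-8) differs from MI-FGSM only in where the gradient is evaluated, as the following sketch shows (same assumptions as above).

```python
import torch
import torch.nn.functional as F

def ni_fgsm(model, x, y_true, eps, T=10, mu=1.0):
    """NI-FGSM (Eqs. 6-8): the gradient is taken at the looked-ahead point x_nes."""
    alpha = eps / T
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)  # g_0 = 0
    for _ in range(T):
        x_nes = (x_adv + alpha * mu * g).detach().requires_grad_(True)  # Eq. 6
        loss = F.cross_entropy(model(x_nes), y_true)
        grad = torch.autograd.grad(loss, x_nes)[0]
        g = mu * g + grad / grad.abs().sum()                            # Eq. 7
        x_adv = (x_adv + alpha * g.sign()).clamp(0, 1).detach()         # Eq. 8
    return x_adv
```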

In this paper, the distortion between \(\boldsymbol{x}^{adv}\) and \(\boldsymbol{x}\) is measured by the root mean square deviation (RMSD), calculated as \(d(\boldsymbol{x}^{adv},\boldsymbol{x})=\sqrt{\sum _{i}(\boldsymbol{x}^{adv}_i-\boldsymbol{x}_i)^2/N}\), where N is the dimensionality of \(\boldsymbol{x}\) and \(\boldsymbol{x}_i\) is the pixel value of the i-th dimension of \(\boldsymbol{x}\). Pixel values range from 0 to 255.
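For reference, a small NumPy sketch of this RMSD computation, assuming the two images are arrays with pixel values in [0, 255]:

```python
import numpy as np

def rmsd(x_adv, x):
    """Root mean square deviation between two images (pixel values in [0, 255])."""
    diff = x_adv.astype(np.float64) - x.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))
```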

2.3 Targeted Attacks

The method of generating a targeted adversarial example is similar to the non-targeted case, but the attacker’s goal becomes searching for an instance \(\boldsymbol{x}^{adv}\) that satisfies \(f_\theta (\boldsymbol{x}^{adv})=y^{target}\). For the optimization-based methods, this problem is approximately solved by:

$$\begin{aligned} \mathop {\arg \min }\limits _{\boldsymbol{x}^{adv}}||\boldsymbol{x}^{adv}-\boldsymbol{x}||_p+c\cdot J(\boldsymbol{x}^{adv},y^{target}) \end{aligned}$$
(9)

For I-FGSM, MI-FGSM and NI-FGSM, we make the following changes:

$$\begin{aligned} \boldsymbol{x}^{adv}_{t+1} = \boldsymbol{x}^{adv}_{t} - \alpha \cdot \textrm{sign}(\nabla _{\boldsymbol{x}} J(\boldsymbol{x}^{adv}_{t},y^{target})) \qquad \text{(I-FGSM) } \end{aligned}$$
$$\begin{aligned} \begin{array}{ll} \boldsymbol{g}_{t+1}&{} =\mu \cdot \boldsymbol{g}_{t}+\dfrac{{\nabla _{\boldsymbol{x}}}J(\boldsymbol{x}^{adv}_t,y^{target})}{||{\nabla _{\boldsymbol{x}}}J(\boldsymbol{x}^{adv}_t,y^{target})||_1} \\ \boldsymbol{x}^{adv}_{t+1} &{}= \boldsymbol{x}^{adv}_{t} -\alpha \cdot \textrm{sign}(\boldsymbol{g}_{t+1} ) \end{array} \qquad \text{(MI-FGSM) } \end{aligned}$$
$$\begin{aligned} \begin{array}{ll} \boldsymbol{x}_{t}^{nes} &{}= \boldsymbol{x}_{t}^{adv} -\alpha \cdot \mu \cdot \boldsymbol{g}_t\\ \boldsymbol{g}_{t+1} &{}=\mu \cdot \boldsymbol{g}_{t}+\dfrac{{\nabla _{\boldsymbol{x}}}J(\boldsymbol{x}^{nes}_t,y^{target})}{||{\nabla _{\boldsymbol{x}}}J(\boldsymbol{x}^{nes}_t,y^{target})||_1} \\ \boldsymbol{x}^{adv}_{t+1} &{}= \boldsymbol{x}^{adv}_{t} - \alpha \cdot \textrm{sign}(\boldsymbol{g}_{t+1} ) \end{array} \qquad \text{(NI-FGSM) } \end{aligned}$$
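The only changes are the label used in the loss and the update direction; a hedged sketch of one targeted MI-FGSM step makes the sign flip explicit (the helper name and the [0, 1] pixel range are illustrative).

```python
import torch
import torch.nn.functional as F

def targeted_mi_fgsm_step(model, x_adv, y_target, g, alpha, mu=1.0):
    """One targeted MI-FGSM step: same momentum accumulation as Eq. 4,
    but the image is moved to *decrease* J(x_adv, y_target), hence the minus."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y_target)
    grad = torch.autograd.grad(loss, x_adv)[0]
    g = mu * g + grad / grad.abs().sum()
    x_adv = (x_adv - alpha * g.sign()).clamp(0, 1).detach()
    return x_adv, g
```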

3 Methodology

3.1 Motivation

In the black-box case, methods that use only one known model to generate adversarial examples have been shown to be effective for non-targeted attacks [3, 10]. However, for targeted attacks, the adversarial examples generated on a single known model are virtually non-transferable. Attacking multiple models at the same time helps improve transferability: intuitively, if an adversarial example is misidentified by all known models, it is likely to be misidentified by other, unknown models as well. For targeted and non-targeted attacks, there are different ensemble schemes to consider. In addition, the process of generating adversarial examples can be seen as an optimization problem [11], so a better optimization algorithm can also improve the transferability of adversarial examples.

3.2 Ensemble Schemes

Let \(\boldsymbol{l}_{k}(\boldsymbol{x})\) denote the logits of the k-th model. Given K known models, the softmax cross-entropy loss of the k-th model can be expressed as:

$$\begin{aligned} \begin{array}{l} J_k(\boldsymbol{x},y) = - \boldsymbol{1}_y \cdot \textrm{log}(\textrm{softmax}(w_k\boldsymbol{l}_k(\boldsymbol{x}))) \end{array} \end{aligned}$$
(10)

where \(\boldsymbol{1}_y\) is the one-hot encoding of the ground-truth label y and \(w_k\) is the ensemble weight. We employ three ensemble schemes for targeted and non-targeted attacks: ensemble in logits (EI-logits), ensemble in softmax (EI-softmax), and ensemble in loss (EI-loss). The ensemble losses of the three schemes are given by the following equations:

$$\begin{aligned} J(\boldsymbol{x},y)&= - \boldsymbol{1}_y \cdot \textrm{log}(\textrm{softmax}(\begin{matrix} \sum _{k=1}^K w_k\boldsymbol{l}_k(\boldsymbol{x}) \end{matrix})),\end{aligned}$$
(11)
$$\begin{aligned} J(\boldsymbol{x},y)&= - \boldsymbol{1}_y \cdot \textrm{log}(\begin{matrix} \sum _{k=1}^K \textrm{softmax}(w_k\boldsymbol{l}_k(\boldsymbol{x})\end{matrix})),\end{aligned}$$
(12)
$$\begin{aligned} J(\boldsymbol{x},y)&= \begin{matrix} \sum _{k=1}^K (- \boldsymbol{1}_y \cdot \textrm{log}(\textrm{softmax}(w_k\boldsymbol{l}_k(\boldsymbol{x})))\end{matrix}), \end{aligned}$$
(13)

where \( \begin{matrix} \sum _{k=1}^K w_k\end{matrix}=1\) and \(w_k\ge 0\). In all ensemble schemes, we set \(w_1=w_2=...=w_K\).
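A minimal PyTorch sketch of the three ensemble losses (Eqs. 11-13); the function name, the list-of-models interface, and the small epsilon added before the logarithm are our own assumptions.

```python
import torch
import torch.nn.functional as F

def ensemble_loss(models, weights, x, y, scheme="EI-loss"):
    """Ensemble cross-entropy loss for EI-logits / EI-softmax / EI-loss (Eqs. 11-13).
    `models` is a list of K classifiers returning logits; `weights` sums to 1."""
    logits = [m(x) for m in models]
    if scheme == "EI-logits":       # Eq. 11: fuse the weighted logits
        fused = sum(w * l for w, l in zip(weights, logits))
        return F.cross_entropy(fused, y)
    if scheme == "EI-softmax":      # Eq. 12: fuse the softmax outputs
        probs = sum(F.softmax(w * l, dim=1) for w, l in zip(weights, logits))
        return F.nll_loss(torch.log(probs + 1e-12), y)  # small eps for numerical safety
    if scheme == "EI-loss":         # Eq. 13: sum the individual losses
        return sum(F.cross_entropy(w * l, y) for w, l in zip(weights, logits))
    raise ValueError(f"unknown scheme: {scheme}")
```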

3.3 Gradient Descent Mechanisms

We discovered that some pixels do not change between two consecutive iterations, which affects the iteration direction of other pixels and ultimately affects the transferability of the adversarial examples. Moreover, we found that most pixels left unchanged after two iterations have small absolute gradient values, while pixels with high absolute gradient values are more stable in their iteration direction. In fact, pixels with higher absolute gradient values have a greater impact on the loss in the white-box setting.

From the perspective of transferability, pixels with a stable iteration direction better represent the common properties shared across models. Thus, we change only the pixels with high absolute gradient values during each iteration, improving the transferability of adversarial examples while keeping the distortion bounded. Based on the above analysis, we propose the higher absolute gradient method (HAG), which optimizes the adversarial perturbation over the pixels with the highest absolute gradients.

$$\begin{aligned} \boldsymbol{g}_{t+1}^{*}[i]= {\left\{ \begin{array}{ll} \boldsymbol{g}_{t+1}[i], &{}\text {if}\quad {i} \in topk_{index}\\ 0, &{}\text {if}\quad {i} \notin topk_{index} \end{array}\right. } \end{aligned}$$
(14)

where i is the index of the corresponding element and \(topk_{index}\) is computed by Eq. 15. The topk(k, \(\boldsymbol{x}\)) function returns the indices of the top k fraction of elements of a given input tensor \(\boldsymbol{x}\) with the largest values. For MI-FGSM and NI-FGSM, the update formulas for non-targeted and targeted attacks are Eq. 16 and Eq. 17, respectively.

$$\begin{aligned} topk_{index} = topk(k,| \boldsymbol{g}_{t+1}|)\end{aligned}$$
(15)
$$\begin{aligned} \boldsymbol{x}^{adv}_{t+1} = \boldsymbol{x}^{adv}_{t} + \alpha \cdot \textrm{sign}( \boldsymbol{g}_{t+1}^{*})\end{aligned}$$
(16)
$$\begin{aligned} \boldsymbol{x}^{adv}_{t+1} = \boldsymbol{x}^{adv}_{t} - \alpha \cdot \textrm{sign}( \boldsymbol{g}_{t+1}^{*}) \end{aligned}$$
(17)
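A sketch of the HAG selection of Eqs. 14-15, keeping only the top-k fraction of gradient entries by absolute value; the helper name is ours.

```python
import torch

def hag_mask(g, k=0.5):
    """HAG (Eqs. 14-15): keep the top-k fraction of gradient entries by
    absolute value and zero out the rest."""
    flat = g.abs().flatten()
    num_keep = max(1, int(k * flat.numel()))
    topk_index = flat.topk(num_keep).indices   # Eq. 15
    mask = torch.zeros_like(flat)
    mask[topk_index] = 1.0
    return g * mask.view_as(g)                 # Eq. 14
```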
Algorithm 1. NI-FGSM-HAG for attacking an ensemble of models

3.4 Optimization Algorithm

We can integrate the ensemble schemes into gradient-based methods to generate adversarial examples with strong transferability. Adversarial examples crafted by the one-step attack (FGSM) have higher transferability than those from iterative attack methods when attacking a single model. Nonetheless, when attacking ensemble models, the one-step method has a lower success rate on all source models, so it fails to attack the ensemble models’ common vulnerability. Among iterative attack methods, I-FGSM greedily searches for adversarial images in the direction of the sign of the gradient at each iteration and easily falls into poor local optima. MI-FGSM adopts momentum [12], which stabilizes the update direction and helps escape from poor local optima. NI-FGSM, in addition to stabilizing the update direction, corrects the previously accumulated gradient to look ahead. These properties help escape from poor local optima and improve the transferability of adversarial images. We merge the three ensemble schemes into NI-FGSM, where \(J(\boldsymbol{x},y)\) can be computed from Eq. 11, Eq. 12 or Eq. 13. We summarize the NI-FGSM-HAG algorithm for attacking ensemble models in Algorithm 1.
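Putting the pieces together, the following sketch outlines one plausible implementation of Algorithm 1, reusing the ensemble_loss and hag_mask helpers sketched above; the default step size, pixel range, and interface are assumptions based on the hyper-parameters reported in Sect. 4.1, not the authors' exact code.

```python
import torch

def ni_fgsm_hag(models, weights, x, y, alpha=2.0, T=13, mu=1.0, k=0.5,
                scheme="EI-loss", targeted=False):
    """Sketch of Algorithm 1 (NI-FGSM-HAG against an ensemble of models),
    reusing ensemble_loss() and hag_mask() from the sketches above.
    Assumes pixel values in [0, 255], as in the paper's RMSD definition."""
    direction = -1.0 if targeted else 1.0  # descend on J(., y_target) for targeted attacks
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)
    for _ in range(T):
        # Nesterov look-ahead (Eq. 6; sign flipped for targeted attacks)
        x_nes = (x_adv + direction * alpha * mu * g).detach().requires_grad_(True)
        loss = ensemble_loss(models, weights, x_nes, y, scheme=scheme)
        grad = torch.autograd.grad(loss, x_nes)[0]
        g = mu * g + grad / grad.abs().sum()   # momentum with L1 normalization (Eq. 7)
        g_star = hag_mask(g, k=k)              # keep only high-|gradient| pixels (Eq. 14)
        x_adv = (x_adv + direction * alpha * g_star.sign()).clamp(0, 255).detach()
    return x_adv
```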

4 Experimental Results

In this section, we present experimental results to demonstrate the effectiveness of the proposed methods. We first describe the experimental settings and implementation details in Sect. 4.1. We then report the results of non-targeted and targeted attacks against a single model in Sect. 4.2. Finally, in Sect. 4.3 we conduct two sets of experiments to study the effects of our methods when attacking an ensemble of models with non-targeted and targeted attacks.

4.1 Experimental Settings

In this section, we detail the models to be examined, the dataset to be evaluated and the hyperparameters to be used.

Models. For normally trained models, we study five networks: ResNet-50 [17], ResNet-152 [18], VGG-19 [15], Densenet-121 [16], and Inception-v3 [14].

Dataset. We use a dataset of 1000 images, one randomly extracted from each category of the ILSVRC 2012 validation set; all of them are classified correctly by all five models in our examination. For targeted attacks, we randomly select a label from the labels other than the correct one.

Hyper-Parameters. For the hyper-parameters, we set the number of iterations \(T = 10\) and the step size \(\alpha = 2\). For MI-FGSM and NI-FGSM, we adopt the default decay factor \(\mu = 1.0\).

Table 1. Attack success rates (%) of non-targeted adversarial images where we attack a single network. The adversarial examples are generated for Vgg-19, Dens-121, Res-152, Inc-v3 and Res-50 respectively using I-FGSM, MI-FGSM and NI-FGSM. * indicates the white-box attacks.
Table 2. Attack success rates (%) of targeted adversarial images where we attack a single network. The adversarial examples are generated for Vgg-19, Dens-121, Res-152, Inc-v3 and Res-50 respectively using I-FGSM, MI-FGSM and NI-FGSM. * indicates the white-box attacks. Results based on top-5 accuracy can be found in Table 3.
Table 3. Top-5 accuracy of targeted adversarial images where we attack a single network. The adversarial examples are generated for Vgg-19, Dens-121, Res-152, Inc-v3 and Res-50 respectively using I-FGSM, MI-FGSM and NI-FGSM. * indicates the white-box attacks.

4.2 Attacking a Single Model

We first study the transferability when attacking a single model. Table 1 presents the success rates of non-targeted attacks and Table 2 shows the top-1 success rates of targeted attacks. For non-targeted attacks, the success rates are the misclassification rates against the models we consider. For targeted attacks, the success rates are the percentage of adversarial examples crafted for one model that are classified as the target label by the corresponding model. The adversarial images are generated for Vgg-19, Dens-121, Res-152, Inc-v3 and Res-50 respectively. We use three iterative attack methods, I-FGSM, MI-FGSM and NI-FGSM, to implement the attacks. The diagonal blocks represent the white-box scenario and the off-diagonal ones the black-box scenario. The models we attack are arranged in rows, and the models we test on in columns.

From the tables, we can see that all three iterative attack methods attack a white-box model with an almost 100% success rate for both non-targeted and targeted attacks. In the black-box scenario, NI-FGSM achieves a higher success rate, about 60%, than the other iterative attack methods on non-targeted attacks, indicating the effectiveness of the optimization algorithm. For targeted attacks in the black-box scenario, although NI-FGSM and MI-FGSM increase the success rates compared with I-FGSM, the rates remain small: less than 1% in most cases and only about ten percent at best. We show the top-5 success rates in Table 3. In the black-box scenario, targeted attacks are much harder than non-targeted attacks, since the black-box model must classify the adversarial image as a specific wrong category. This can be addressed by attacking an ensemble of models, which we cover in the next section.

4.3 Attacking an Ensemble of Models

Based on the above analysis, we focus on generating more transferable adversarial examples by attacking an ensemble of models. In this section, we present the experimental results for non-targeted attacks and for targeted attacks.

Table 4. The success rates (%) of non-targeted adversarial images where we attack an ensemble of networks. We study five models, Vgg-19, Dens-121, Res-152, Inc-v3 and Res-50, and attack the ensemble networks with MI-FGSM. “*” indicates the black-box attacks. “-” indicates the name of the hold-out model; the adversarial examples are generated for the ensemble of the other four models by three ensemble schemes: EI-logits, EI-softmax and EI-loss.
Table 5. The second row shows the percentage of pixels that did not change after two iterations, and the third row shows the probability that the absolute gradient of an unchanged pixel falls in the bottom 50% after two iterations.
Table 6. The success rates (%) of non-targeted adversarial images where we attack an ensemble of networks. Using EI-softmax, we study five models, Vgg-19, Dens-121, Res-152, Inc-v3 and Res-50. “*” indicates the black-box attacks. “-” indicates the name of the hold-out model; the adversarial examples are generated for the ensemble of the other four models by MI-FGSM, MI-FGSM-HAG and NI-FGSM-HAG.

Non-targeted Attack. We consider five models here: Vgg-19, Dens-121, Res-152, Inc-v3 and Res-50. Adversarial images are crafted on an ensemble of four models and tested on the hold-out model. First, we test the effect of different ensemble schemes on non-targeted attacks. We compare the results of the three ensemble schemes, ensemble in logits, ensemble in softmax and ensemble in loss, using the MI-FGSM attack method. The results are shown in Table 4. Ensemble in softmax is better than the other two ensemble schemes for both white-box and black-box attacks. For example, adversarial examples crafted on the ensemble of Vgg-19, Dens-121, Res-152 and Inc-v3 achieve success rates of 95.6% on the hold-out Res-50 and 100% on Vgg-19, while a baseline such as EI-logits only obtains the corresponding success rates of 87.9% and 96.1%.

In Table 5, we show the percentage of pixels whose value has not altered after two iterations. We found that about 25% of pixel values are unchanged after the first two iterations, and most of them have small absolute gradient values. Intuitively, pixels with a steady iteration direction better represent the common properties shared across models. We therefore set k in Eq. 15 to 0.5. As a result, only half of the pixels change in each iteration, which means that less perturbation is added to the adversarial examples. To compare transferability within the same distortion budget, we set the number of iterations to 13 when applying this method. HAG can be naturally combined with EI-softmax to form a much stronger non-targeted attack. The results are reported in Table 6. MI-FGSM-HAG improves the success rates on the challenging black-box models and maintains high success rates on the white-box models. Note that although we increase the number of iterations to 13 in MI-FGSM-HAG, the perturbation is still smaller than that of MI-FGSM.

We then compare the success rates of NI-FGSM-HAG and MI-FGSM-HAG in Table 6 to assess the effect of the optimization algorithm. The experimental results show that NI-FGSM-HAG is a stronger attack than MI-FGSM-HAG. As the strongest method for non-targeted attacks, NI-FGSM-HAG fools the white-box models at almost 100% and misleads the black-box model at an average rate of almost 93%.

Table 7. The success rates (%) of targeted adversarial images where we attack an ensemble of networks. We study five models, Vgg-19, Dens-121, Res-152, Inc-v3 and Res-50, and attack the ensemble networks with MI-FGSM. “*” indicates the black-box attacks. “-” indicates the hold-out model; the adversarial examples are generated for the ensemble of the other four models by three ensemble schemes: EI-logits, EI-softmax and EI-loss.
Table 8. The success rates (%) of targeted adversarial images where we attack an ensemble of networks. Using EI-loss, we study five models, Vgg-19, Dens-121, Res-152, Inc-v3 and Res-50. “*” indicates the black-box attacks. “-” indicates the name of the hold-out model; the adversarial examples are generated for the ensemble of the other four models by MI-FGSM, MI-FGSM-HAG and NI-FGSM-HAG.
Table 9. Top-5 accuracy of targeted adversarial images where we attack an ensemble of networks. Using EI-loss, we study five models, Vgg-19, Dens-121, Res-152, Inc-v3 and Res-50. “*” indicates the black-box attacks. “-” indicates the name of the hold-out model; the adversarial examples are generated for the ensemble of the other four models by MI-FGSM, MI-FGSM-HAG and NI-FGSM-HAG.

Targeted Attack. For the more challenging targeted attacks, we also examine the transferability of targeted adversarial images based on ensemble models. Table 7 presents the results for the three ensemble schemes using MI-FGSM. The results show that EI-loss reaches much higher success rates than the other two ensemble schemes on both black-box and white-box models. Under the white-box setting, EI-loss can reach a success rate of more than 85%. However, the highest success rate is only 11.7% under the black-box setting.

For the gradient descent mechanism, we set k to 0.5 as in the non-targeted attacks. The results are summarized in Table 8. HAG yields a maximum black-box success rate of 14.5% with lower distortion. We then conducted experiments to validate the effectiveness of combining NI-FGSM with HAG. As Table 8 shows, NI-FGSM-HAG obtains a significant performance improvement: the best black-box success rate reaches 17.3%, and the lowest white-box success rate is 98.9%. We also examine targeted attacks in terms of top-5 accuracy; the highest success rate is 30% in the black-box setting, and the results can be found in Table 9. We find that targeted attacks have almost the same success rate as non-targeted attacks in the white-box setting but a low success rate on the black-box models, which means that targeted adversarial examples have much poorer transferability.

5 Conclusion and Future Work

In this paper, we propose three methods to improve the transferability of adversarial examples based on ensemble models. Specifically, we found that different ensemble schemes have different effects on non-targeted and targeted attacks: EI-softmax is more suitable for non-targeted attacks and EI-loss for targeted attacks. Moreover, we discovered that pixels with higher absolute gradient values better capture the properties that transfer across models. By integrating HAG with NI-FGSM, we further improve the transferability of adversarial examples. Extensive experiments demonstrate that our methods not only yield higher success rates on non-targeted attacks but also enhance the success rates on the harder targeted attacks.