
1 Introduction

Deep neural networks (DNNs) have been used in a wide range of applications across different fields, such as face recognition [6], voice recognition [1] and sentiment analysis [30]. DNNs can also achieve state-of-the-art performance in tasks such as security verification in unconstrained environments, where very low false positive rates are required [6]. However, deep learning models have been shown to be vulnerable to adversarial samples: attackers can manipulate the model outcome by deliberately adding perturbations to the original samples [28].

In general, current approaches to attacking models can be categorised into two types: white-box attacks [12] and black-box attacks [22]. In white-box attacks, the attacker knows the relevant parameters of the target model and can formulate the most suitable attack method. In black-box attacks, on the other hand, the attacker does not have access to the model parameters. Given these characteristics, black-box attacks better reflect the adversarial performance of attacking samples in practice, which is useful for improving the robustness of deep learning models in real-world scenarios. Specifically, black-box attack methods fall into three types: query-based methods [14], transfer-based methods [8] and hybrid methods [9].

The objective of the query-based method is to interrogate the model to extract pertinent input or output information, and then utilize this limited information to iteratively generate optimal adversarial samples. However, such methods are subject to restrictions imposed by access permissions and often require multiple queries to obtain strong adversarial samples. The transfer-based method trains and generates adversarial samples on a local surrogate model whose information is fully known, and then transfers them to the target black-box model to test the attack success rate. Compared to query-based methods, transfer-based methods do not require additional access to the model and can bypass certain adversarial defense mechanisms aimed at queries. The hybrid method combines the principles of the query and transfer approaches. Although it can achieve a sufficiently high attack success rate, it is also susceptible to adversarial defense mechanisms targeting both queries and transfers. Therefore, in this paper, we focus on transfer-based methods.

As a common approach to transfer-based attacks, feature-level attacks attempt to maximise an internal feature loss by attacking the features of intermediate layers to improve the transferability of the attack [32]. The aim is to increase the weight of negative features in the middle layer of the model while decreasing the weight of positive features, so that more negative features are retained to help divert the model's predictions. However, it remains challenging to differentiate the middle-layer features harmoniously with feature-level attack methods, which are also prone to local optima [32]. Moreover, it is well known that the effectiveness of transfer-based black-box attacks is limited by overfitting to surrogate models and by specific adversarial defenses. To address these challenges, we propose to utilise neuron importance estimation for the middle layer to identify the adversarial features more accurately. In addition, we also evaluate the transferability of our proposed method on adversarially trained models, which is discussed in detail in Sect. 4. The results demonstrate that our method achieves favorable attack success rates even on target models protected by adversarial defenses.

To obtain adversarial samples with higher transferability, this paper presents the double adversarial neuron attribution attack (DANAA). DANAA attributes the model output to the middle-layer neurons, thereby measuring individual neuron weights and retaining the features that matter more for transferability. We use adversarial non-linear path selection to enrich the attacking points, which improves the attribution results. Extensive experiments on the benchmark datasets used in the literature have been conducted. The results show that DANAA achieves the best performance for the adversarial attacks. We anticipate this work will contribute to attribution-based neuron importance estimation and provide a novel approach for transfer-based black-box attacks. Our contributions are summarised as follows:

  • We propose DANAA, an innovative method that uses non-linear gradient update paths to achieve more accurate neuron importance estimation, providing a more in-depth study of the attribution route.

  • We present both theoretical and empirical investigation details for the attribution algorithm in DANAA, which is a core part of the method, in Sect. 3.

  • A comprehensive statistical analysis is performed based on our benchmarking experiments on different datasets and adversarial attacks. The results in Sect. 4 demonstrate the state-of-the-art performance of the DANAA method.

2 Related Work

In this section, we review the literature on white-box attacks, query-based black-box attacks, transfer-based black-box attacks, and hybrid black-box attacks.

2.1 Common White-Box Attacks

Previous work has demonstrated that neural networks are highly susceptible to misclassification when perturbations are added to test samples in advance. Such processed samples are called adversarial samples. The emergence of adversarial samples has led to the development of a range of adversarial defences to ensure model performance [16, 28, 29].

Currently, adversarial attacks can be divided into white-box attacks and black-box attacks depending on the level of information available about the model being attacked. There are various approaches for white-box attacks, such as gradient-based and GAN-based ones. Gradient-based white-box attacks include FGSM [12], I-FGSM [16], PGD [20] and \( C \& W\) [3]. Some recent GAN-based white-box attack methods are AdvGAN [33], GMI [37], KED-MI [4] and \( Plug \& Play\) [24]. While white-box attacks are effective in measuring the robustness of a model under attack, in real-world scenarios the parameters of the model are often not accessible, leading to the development of black-box attacks.

2.2 Query-Based Black-Box Attacks

Query-based attacks are a branch of black-box attacks that aim to train an effective adversarial sample by performing small-scale attacks on the target model to query model information, such as labels and confidence scores. These outputs can be used as part of the dataset to assist in training the attack algorithm and to verify its transfer to the black-box model. Ilyas et al. [14] were the first to propose a query-based black-box attack approach. Subsequently, they proposed combining priors from historical queries and data structures with gradient estimation based on bandit optimization, which greatly reduces the number of queries [15]. Li et al. [17] proposed a query-efficient boundary-based black-box attack method (QEBA) and proved that, for boundary-based attacks, gradient estimation over the entire gradient space is inefficient in terms of the number of queries. Andriushchenko et al. [2] proposed the square search attack method, which selects local square blocks at random locations in the image to search for and update the direction of the attack.

2.3 Transfer-Based Black-Box Attacks

The transferability of adversarial attacks refers to the applicability of adversarial samples generated on a local model to attacks on the target model. The attacker first uses the parameters obtained from attacking the local model to train adversarial samples, and then uses these samples to perform a black-box attack on the target model to verify the success rate.

There are three main categories of transfer-based black-box attacks, namely gradient calculation methods, input transformation methods and feature-level attack methods. Gradient calculation methods such as MIM [7], VMI-FGSM [31] and SVRE [35] improve transferability by designing new gradient updates. Input transformation methods such as DIM [34], PIM [11] and SSA [18] boost the transferability by using input transformations to simulate the ensemble process of the model, while feature-level attacks focus on the middle-layer features.

Some state-of-the-art feature-level attack methods include NRDM, FDA, FIA and NAA. NRDM [21] attempts to maximise the degree of distortion between neurons, but it does not take into account the roles of positive and negative features in the attack. FDA [10] averages neuron activation values to estimate the importance of a neuron; however, it does not distinguish the degree of each neuron's importance, and the discrimination between positive and negative features is still too low. FIA [32] multiplies neuron activation values by back-propagated gradients for estimation, but the estimate on the original input suffers from overfitting and is not accurate. NAA [36] effectively improves transferability and reduces computational complexity by attributing the model's output to an intermediate layer to obtain a more accurate importance estimation. However, its attribution method considers only a linear path in the gradient iteration process, and there is still room for improvement under non-linear path conditions.

2.4 Hybrid Black-Box Attacks

The hybrid method is a combination of the query-based and transfer-based methods. It not only exploits the prior knowledge provided by the transfer but also utilizes the gradient information obtained from queries, which addresses the high access cost of query attacks and the low accuracy of transfer attacks.

Dong et al. [5] proposed a hybrid method named P-RGF, which uses the gradient of a surrogate model as prior knowledge to guide the query direction of RGF and obtains the same success rate as RGF with fewer queries. Fu et al. [9] train a Meta Adversarial Perturbation (MAP) on a surrogate model and perform black-box attacks by estimating the gradient of the model, which has good transferability and generalizability. Ma et al. [19] introduced a Meta Simulator to black-box attacks based on the idea of meta-learning. By combining query-based and transfer-based attacks, they not only significantly reduce the number of queries but also reduce the query complexity by transferring adversarial samples trained on the surrogate model to the target model.

While there are different types of black-box attack methods, transfer-based attacks are considered the most convenient, as they do not require additional information queries to the model. However, they pose the challenge of achieving good transferability for the adversarial samples. Therefore, in this work, we target transfer-based attack methods. In particular, we introduce an attribution method for middle-layer feature estimation, which shows promising performance in our experiments.

3 Method

3.1 Preliminaries

When an adversarial attack on the target model can be successfully launched with adversarial samples trained on a local DNN model, we consider there to be a strong transferability relationship between the two models. Formally, consider a deep learning network \(N:R^{n}\rightarrow R^{c}\) and an original image sample \(x^{0}\in R^{n}\) whose true label is t. If an imperceptible perturbation \(\sum _{k=0}^{t-1}\bigtriangleup x^{k}\) is applied to the original sample \(x^{0}\), we may mislead the network N with the manipulated input \(x^{t}=x^{0}+\sum _{k=0}^{t-1}\bigtriangleup x^{k}\), also denoted \(x^{adv}\), towards a different label m. Denoting the output of sample x as N(x), the optimization goal is:

$$\begin{aligned} \left\| x^{t}-x^{0} \right\| _{n}<\epsilon \quad subject \ to\quad N(x^{t})\ne N(x^{0}) \end{aligned}$$
(1)

where \(\left\| \cdot \right\| _{n} \) represents the n-norm distance. Considering the activation values in the middle layers of network N, we denote the activation values of layer y as y and the activation value of its j-th neuron as \(y_{j}\).
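For concreteness, the condition in Eq. 1 can be checked as in the minimal sketch below; `model` is assumed to return class logits as a NumPy array and `eps` is the perturbation budget (both names are illustrative, not part of our released code).

```python
import numpy as np

def is_successful_attack(model, x0, x_adv, eps, ord=np.inf):
    """Check Eq. (1): the perturbation stays within the n-norm budget
    while the network's prediction changes."""
    within_budget = np.linalg.norm((x_adv - x0).ravel(), ord=ord) < eps
    prediction_flipped = model(x_adv).argmax() != model(x0).argmax()
    return within_budget and prediction_flipped
```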

3.2 Non-linear Path-Based Attribution

Inspired by [25] and [36], we define the attribution result of an input image \(x^{t}\) (with \(n \times n\) pixels) as

$$\begin{aligned} A:=\sum _{i=1}^{n^{2}} \int \bigtriangleup x_{i}^{t} \frac{\partial N(x^{t} )}{\partial x^{t} }\textrm{d}t \end{aligned}$$
(2)

As shown in Fig. 1, different from the NAA algorithm [36], we propose a new attribution idea that uses a non-linear gradient update path instead of the original linear path, which allows the model to find the optimal adversarial path itself. In Eq. 2, the gradient of N is iterated along the non-linear path \(x^{t}=x^{0}+\sum _{k=0}^{t-1}\bigtriangleup x^{k}\), where \(\frac{\partial N(x^{t})}{\partial x_{i}^{t}}\) is the partial derivative of N with respect to the i-th pixel. For each iteration, \(\bigtriangleup x^{t}=sign(\frac{\partial N(x^{t})}{\partial x^{t}})+N(0,\sigma )\), and we further scale this Gaussian-noise-perturbed update by a learning rate when updating the perturbation.
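As an illustration, the sketch below implements one such path step in PyTorch. Using the true-label logit as the scalar output N(x), as well as the function and variable names, are assumptions of this example rather than part of our reference implementation.

```python
import torch

def nonlinear_path_step(model, x_t, label, lr=0.0025, sigma=0.25):
    """One update along the adversarial non-linear path:
    delta_x = sign(dN/dx) + Gaussian noise, scaled by the learning rate."""
    x_t = x_t.clone().detach().requires_grad_(True)
    # N(x^t): here taken as the logit of the true label (one plausible choice).
    score = model(x_t).gather(1, label.view(-1, 1)).sum()
    grad = torch.autograd.grad(score, x_t)[0]
    delta = grad.sign() + sigma * torch.randn_like(x_t)
    return (x_t + lr * delta).detach()
```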

Fig. 1. Non-linear gradient update path diagram

Afterwards, we can approximate A by N(x) using basic calculus and extend the attribution result to each layer. The attribution formula can then be expressed as:

$$\begin{aligned} A_{y_{j}}:=\sum _{i=1}^{n^{2}} \int \bigtriangleup x_{i}^{t} \frac{\partial N(x^{t})}{\partial y_{j}({x^{t}})}\frac{\partial y_{j}(x^{t})}{\partial x^{t}}\textrm{d}t \end{aligned}$$
(3)

where \(A_{y_{j}}\) represents the attribution of the j-th neuron in layer y, with \(\sum A_{y_{j}}=A\). We provide the relevant proof of our non-linear path-based attribution in the following section.

3.3 Proof of Non-linear Path-Based Attribution

Given \(A_{y_{j}}\) as in Eq. 3, and assuming that the neurons in the middle layer of the deep neural network are independent of each other, \(A_{y_{j}}\) can be expressed as

$$\begin{aligned} A_{y_{j}}:=\int \frac{\partial N(x^{t})}{\partial y_{j}(x^{t})}\sum _{i=1}^{n^{2}} \bigtriangleup x_{i}^{t} \frac{\partial y_{j}(x^{t})}{\partial x^{t}}\textrm{d}t \end{aligned}$$
(4)

where \(\frac{\partial N(x^{t})}{\partial y_{j}(x^{t})}\) is the gradient of \(N(x^{t})\) with respect to the j-th neuron, and \(\sum _{i=1}^{n^{2}} \bigtriangleup x_{i}^{t} \frac{\partial y_{j}(x^{t})}{\partial x^{t}}\) is the perturbation-weighted sum of the gradients of \(y_{j}\) with respect to each pixel of \(x^{t}\) (\(x^{t}\in R^{n}\)). Since the two gradient sequences have zero covariance, we can convert Eq. 4 into:

$$\begin{aligned} A_{y_{j}}:=\int \frac{\partial N(x^{t})}{\partial y_{j}(x^{t})}\textrm{d}t\cdot \int \sum _{i=1}^{n^{2}} \bigtriangleup x_{i}^{t} \frac{\partial y_{j}(x^{t})}{\partial x^{t}} \textrm{d}t \end{aligned}$$
(5)

Applying the fundamental theorem of calculus along the path, we can show that

$$\begin{aligned} \int \sum _{i=1}^{n^{2}} \bigtriangleup x_{i}^{t} \frac{\partial y_{j}(x^{t})}{\partial x^{t}} \textrm{d}t=y_{j}^{t}-y_{j}^{0} \end{aligned}$$
(6)

We then denote \(y_{j}^{t}-y_{j}^{0}\) as \(\bigtriangleup y_{j}^{t}\), and Eq. 5 can be converted into

$$\begin{aligned} A_{y_{j}}:=\bigtriangleup y_{j}^{t}\int \frac{\partial N(x^{t})}{\partial y_{j}(x^{t})}\textrm{d}t \end{aligned}$$
(7)

We denote \(\int \frac{\partial N(x^{t})}{\partial y_{j}(x^{t})}\textrm{d}t\) as \(\gamma (y_{j})\), i.e., the accumulated gradient of network N along our non-linear path with respect to the j-th neuron. We then obtain \(A_{y_{j}}=\bigtriangleup y_{j}^{t}\cdot \gamma (y_{j})\). Since the neuron \(y_{j}\) lies in the middle layer y, the attribution result of layer y can finally be expressed as

$$\begin{aligned} A_{y}=\sum _{y_{j}\in y}A_{y_{j}}=\sum _{y_{j}\in y}\bigtriangleup y_{j}^{t}\cdot \gamma (y_{j})=\bigtriangleup y^{t}\cdot \gamma (y) \end{aligned}$$
(8)
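In practice, the path integral defining \(\gamma (y_{j})\) is evaluated over the T discrete points \(x^{0},x^{1},\dots ,x^{T-1}\) generated by the non-linear path, so the attribution is approximated by a finite sum. The \(\frac{1}{T}\) normalisation below is a convention of this discretisation; being a constant factor shared by all neurons, it does not change their relative importance:

$$\begin{aligned} \gamma (y_{j})\approx \frac{1}{T} \sum _{k=0}^{T-1}\frac{\partial N(x^{k})}{\partial y_{j}(x^{k})},\qquad A_{y}\approx \bigtriangleup y^{t}\cdot \gamma (y) \end{aligned}$$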
Algorithm 1. Double Adversarial Neuron Attribution Attack

Algorithm 1 shows the specific pseudocode structure of our DANAA algorithm with Non-linear Path-based attribution.
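For illustration, a minimal PyTorch sketch of the two stages of Algorithm 1 is given below: attribution along the adversarial non-linear path, followed by a momentum-based attack on the attributed features of the target layer. The helper names, the use of the true-label logit as N(x), and the simple aggregate-attribution loss are simplifying assumptions of this sketch; the full implementation is available in our open-source repository.

```python
import torch

def danaa_attack(model, feature_layer, x0, label, eps=16/255, steps=15,
                 alpha=1.07/255, mu=1.0, path_len=30, lr=0.0025, sigma=0.25):
    """Simplified DANAA sketch: (1) estimate gamma(y) along the adversarial
    non-linear path, (2) run a momentum iterative attack that minimises the
    aggregate attribution A_y = (y - y^0) * gamma(y) of the target layer.
    eps and alpha are on a [0, 1] pixel scale (16 and 1.07 on 0-255 in Sect. 4.5)."""
    feats = {}
    hook = feature_layer.register_forward_hook(
        lambda m, i, o: feats.__setitem__("y", o))

    # ---- Stage 1: attribution along the non-linear path ----
    x_t, gamma, y0 = x0.clone(), 0.0, None
    for _ in range(path_len):
        x_t = x_t.detach().requires_grad_(True)
        score = model(x_t).gather(1, label.view(-1, 1)).sum()   # N(x^t)
        grad_y, grad_x = torch.autograd.grad(score, [feats["y"], x_t])
        if y0 is None:
            y0 = feats["y"].detach()                 # y(x^0)
        gamma = gamma + grad_y / path_len            # discrete approximation of the integral
        # non-linear path step: sign of the gradient plus Gaussian noise
        x_t = x_t + lr * (grad_x.sign() + sigma * torch.randn_like(x_t))

    # ---- Stage 2: momentum attack on the attributed features ----
    x_adv, g = x0.clone(), torch.zeros_like(x0)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        model(x_adv)                                  # forward pass refreshes feats["y"]
        loss = ((feats["y"] - y0) * gamma).sum()      # aggregate attribution of layer y
        grad = torch.autograd.grad(loss, x_adv)[0]
        g = mu * g + grad / grad.abs().mean()         # MIM-style momentum accumulation
        x_adv = x_adv - alpha * g.sign()              # descend to suppress positive attribution
        x_adv = x0 + (x_adv - x0).clamp(-eps, eps)    # project back into the L_inf ball
        x_adv = x_adv.clamp(0, 1)

    hook.remove()
    return x_adv.detach()
```

In this sketch, `feature_layer` corresponds to one of the target layers listed in Sect. 4.5 (e.g., Mixed_5b for Inc-v3), and the returned `x_adv` is subsequently evaluated on the black-box target models.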

4 Experiments

Extensive experiments have been conducted to demonstrate the efficiency of our method. The following sections cover the datasets, benchmark models and metrics used, as well as the experimental settings. We perform five rounds of benchmarking experiments to compare our algorithm with the other methods, demonstrating the superiority of our approach over the baselines in terms of transferability of adversarial attacks. Moreover, we conduct an ablation study of our approach, focusing on the impact of different learning rates and noise deviations on attack transferability.

4.1 Dataset

Following other literature methods, we consider the widely-used dataset from the NAA work [36]. The dataset consists of 1000 images of different categories randomly selected from the ILSVRC 2012 validation set [23], which we call the multiple random sampling (MRS) dataset.

4.2 Model

We include four widely-used models for image classification tasks, namely Inception-v3 (Inc-v3) [27], Inception-v4 (Inc-v4) [26], Inception-ResNet-v2 (IncRes-v2) [26], and ResNet-v2-152 (Res152-v2) [13], as source models for assessing the attacking performance of our algorithm. We start with these four pretrained models without adversarial training. We then construct more robust target models for an in-depth comparison by applying adversarial training to the pretrained models. This yields two adversarially trained models, Inception-v3 (Inc-v3-adv) and Inception-ResNet-v2 (IncRes-v2-adv) [16]. The remaining three models are ensembles: the ensemble of three adversarially trained Inception-v3 models (Inc-v3-adv-3), the ensemble of four adversarially trained Inception-v3 models (Inc-v3-adv-4), and the ensemble of three adversarially trained Inception-ResNet-v2 models (IncRes-v2-adv-3), following the work of [29]. In [29], the sub-models of each ensemble are trained independently and their results are finally weighted and combined, increasing the accuracy and robustness of the model.

4.3 Evaluation Metrics

The attack success rate is selected as the evaluation metric. It measures the proportion of samples for which the attacked model produces incorrect label predictions; hence, a higher success rate indicates a more effective attack method.
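As a concrete reference, the metric can be computed as in the minimal sketch below, where `model` is assumed to return logits for a batch of adversarial samples (the function name is illustrative).

```python
import numpy as np

def attack_success_rate(model, x_adv, y_true):
    """Proportion of adversarial samples that the target model misclassifies."""
    predictions = model(x_adv).argmax(axis=-1)
    return float(np.mean(predictions != y_true))
```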

4.4 Baseline Methods

For comparison, we select five state-of-the-art attack methods as baselines: MIM [7], NRDM [21], FDA [10], FIA [32], and NAA [36]. Furthermore, to test the effect of combining each method with input transformation methods and to verify the superiority of our algorithm, we apply both DIM and PIM to the attack methods. The implementation details can be found in the open-source repository. Consequently, we extend the comparison set with MIM-PD, NRDM-PD, FDA-PD, FIA-PD, NAA-PD and DANAA-PD, respectively.

4.5 Parameter Setting

In the experiments, we set the parameters as follows: the learning rate (lr) is 0.0025; the noise deviation is 0.25; and the maximum perturbation is 16, which is derived from the number of iterations (15) and the step size (1.07). The batch size is 10, and the momentum of the optimization process is 1. Since we introduce the DIM and PIM algorithms to verify the superiority of our model when combined with input transformation methods, we set the transformation probability of DIM to 0.7, and the amplification factor and kernel size of PIM to 2.5 and 3, respectively. For the target layer of the attack, we choose the same layer as in NAA. Specifically, we attack the InceptionV3/InceptionV3/Mixed_5b/concat layer for Inc-v3, the InceptionV4/InceptionV4/Mixed_5e/concat layer for Inc-v4, the InceptionResnetV2/InceptionResnetV2/Conv2d_4a_3x3/Relu layer for IncRes-v2, and the ResNet-v2-152/block2/unit_8/bottleneck_v2/add layer for Res152-v2 [36].
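For reference, the settings above can be collected into a single configuration; the dictionary and key names below are illustrative and not taken from our released code.

```python
# Hyperparameters reported in Sect. 4.5 (names are illustrative).
DANAA_CONFIG = {
    "learning_rate": 0.0025,      # lr for the non-linear path updates
    "noise_deviation": 0.25,      # std of the Gaussian noise on the path
    "max_perturbation": 16,       # L_inf budget on the 0-255 pixel scale
    "num_iterations": 15,
    "step_size": 1.07,            # 15 * 1.07 ~= 16
    "batch_size": 10,
    "momentum": 1.0,
    "dim_transform_prob": 0.7,    # DIM transformation probability
    "pim_amplification": 2.5,     # PIM amplification factor
    "pim_kernel_size": 3,         # PIM kernel size
}
```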

4.6 Result

All experiments are carried out on an RTX 2080Ti GPU. A detailed replication package can be found in the open-source repository at https://github.com/Davidjinzb/DANAA. We compile the results of all attack methods, without and with the input transformation methods (suffixed with -PD), in Table 1.

Table 1 shows that DANAA retains strong and robust performance across all models in comparison with the other attack methods. In particular, DANAA demonstrates notable improvements on the five adversarially trained models. The largest improvement is observed over the NAA method [36], which is generally the second-best method in the comparison, with a gain of 9.0%. Across all local models, our approach achieves an overall average improvement of 7.1% over NAA on the adversarially trained models. With the PD variants, our method achieves a maximum improvement of 9.8% over NAA-PD and an overall average improvement of 7.3% on the adversarially trained models.

Table 1. Attack success rate of multiple methods on different models

4.7 Ablation Study

In this section, we investigate the impact of the learning rate and Gaussian noise deviations on the performance of the proposed method.

4.7.1 The Impact of Learning Rates.

Experiments are conducted using learning rates of different scales: 0.25, 0.025, 0.0025 and 0.00025. In Fig. 2, the DANAA method exhibits the highest attack success rate for nearly all models when the learning rate is 0.0025. In Fig. 3, the same setting also achieves the highest attack success rates on most models for the DANAA-PD method.

Notably, when using Inception-ResNet-v2 as the source model, although DANAA-PD with a learning rate of 0.0025 ranks only second in attack success rate on the models without adversarial training, its effectiveness on the adversarially trained models is still much higher than at other learning rates.

4.7.2 The Impact of Gaussian Noise Deviation (Scale).

To verify the effect of adding Gaussian noise to the gradient update on transferability, we test different noise deviations in this subsection. As shown in Fig. 4 and Fig. 5, five different scales of the Gaussian noise deviation, ranging from 0.2 to 0.4, are used in this experiment. In general, a higher scale tends to give better results on the normally trained models while sacrificing performance on the more robust ones. Conversely, a lower scale yields smaller improvements on the normally trained models but better performance on the adversarially trained models. Accordingly, a scale of 0.25 is selected for the optimal overall performance in this paper.

Fig. 2. DANAA attack success rate performance at different learning rates

Fig. 3. DANAA-PD attack success rate performance at different learning rates

Fig. 4. DANAA attack success rate performance at different noise deviations

Fig. 5. DANAA-PD attack success rate performance at different noise deviations

5 Conclusion

In this paper, we propose the double adversarial neuron attribution attack (DANAA) to achieve enhanced transfer-based adversarial attack results. Compared with other literature methods, our method obtains better transferability for the adversarial samples. To derive more accurate importance estimates for the middle-layer neurons, we first introduce a non-linear path into the perturbation update process. By calculating the gradient along this non-linear path, the DANAA algorithm improves the attack success rate on the adversarially trained models by up to 9.0% over the second-best method, with an average overall improvement of 7.1%. When combined with the input transformation methods DIM and PIM, our DANAA-PD algorithm also achieves a maximum improvement of 9.8% and an average overall improvement of 7.3% compared to the NAA-PD algorithm. Extensive experiments have demonstrated that the attribution-based attack proposed in this paper achieves state-of-the-art performance, with greater transferability and generalisation capabilities.