
1 Introduction

In recent years, deep neural networks have achieved great success in image recognition [1], text processing [2], speech recognition [3], and other fields, and are now widely used in security-critical applications such as malware detection [4], autonomous driving [5], and aircraft collision avoidance [6]. All of these applications rely on the security of deep neural networks, which has therefore become a focus of artificial intelligence security. Studies have shown that deep neural networks are vulnerable to small perturbations added to the original samples [7]. These perturbations can cause the system to produce wrong predictions while remaining imperceptible to the human eye. Such inputs are called adversarial samples [8]. Adversarial examples not only pose a potential threat by attacking deep neural networks, but can also be used during training to enhance the robustness of models [9]. Therefore, it is necessary to study the generation of adversarial samples.

Adversarial samples can be divided into two categories according to the attack target: a maliciously chosen target class (targeted attack) or any class different from the ground truth (non-targeted attack). Various methods have been proposed to generate adversarial samples, falling mainly into three categories. The first is gradient-based attacks, such as the Fast Gradient Sign Method (FGSM) [8], which exploits the locally linear behavior of deep neural network models in high-dimensional space to obtain an adversarial perturbation quickly by adding a disturbance in the gradient direction of the input; however, this approach does not minimize the perturbation. The second is optimization-based attacks, such as the C&W attack [10], which reduces the perturbation amplitude of the adversarial sample by constraining its \({l}_{0}\), \({l}_{2}\), or \({l}_{\infty }\) distance to the real image; this method is slow because it can only optimize one instance at a time. The third is generative-network-based attacks, such as Natural GAN [11], which uses a GAN to generate adversarial examples for text and images and makes them more natural; such methods are also used in black-box attacks. Although these methods generate samples quickly, their perturbations are usually larger than those of the previous two categories and are therefore easier to detect.

Contrary to adversarial attacks, adversarial defenses are techniques that enable a model to resist adversarial samples. Compared with attacks, defenses are more difficult. Nevertheless, a large number of defense methods have been proposed, mainly of two kinds: passive defenses, including input reconstruction and adversarial-example detection, and active defenses, including defensive distillation [12] and adversarial training [13].

However, existing work focuses on only one side, either attack or defense, and does not consider improving attack and defense simultaneously within a single framework.

Our contribution in this work is as follows:

We propose a robust generative adversarial network based on the attention mechanism (Atten-Rob-GAN). An attention mechanism is introduced to extract features of the original image, which are used as part of the input to the generator G, so that the network can learn the relationships between the deep features of the image. The fake images generated by G are fed to the discriminator D, together with the adversarial images obtained by letting the attacker perturb the original images. Adversarial training and GAN training are coordinated to obtain a powerful classifier while improving the training speed of the GAN and the quality of the generated images.

2 Materials and Methods

In this section, we first define the problem, then briefly describe the framework of the Atten-Rob-GAN algorithm and the method used to generate attacked images, and finally explain the network in detail, including the formulas and training details used in our framework.

2.1 Problem Definition

Let \(X\subseteq {R}^{n}\) be the original sample feature space, where \(n\) is the feature dimension. \(({x}_{i},{y}_{i})\) is the \(i\)-th instance in the training set, composed of a feature vector \({x}_{i}\in X\) drawn from an unknown distribution \({x}_{i}\sim {P}_{real}\) and the corresponding ground-truth label \({y}_{i}\in Y\). Similarly, let \({x}_{fake}\in {R}^{n}\) denote a fake sample. \(({{x}_{fake}}_{i},{l}_{i})\) is the \(i\)-th pair in the fake sample dataset, where \({{x}_{fake}}_{i}\) obeys an unknown distribution \({P}_{fake}\) and \({l}_{i}\) is the corresponding predicted label. \({x}_{adv}\) is the original image preprocessed by the PGD attack. The discriminator encourages \({x}_{fake}\) to approximate \({x}_{adv}\) within the perturbation range, so that \({P}_{fake}\) approaches \({P}_{real}\).

2.2 The Atten-Rob-GAN Framework

Figure 1 shows the overall framework of Atten-Rob-GAN, which consists of three parts: the feature extractor \(F\), the generator \(G\), and the discriminator \(D\). The output \(F(x)\) of the feature extractor, whose input is the real image, is concatenated with the noise vector \(z\) to form \(F(x)\)*. The generator \(G\) receives \(F(x)\)* and generates the fake image \({x}_{fake}\). The discriminator \(D\) receives the adversarial image \({x}_{adv}\) and the generator output \({x}_{fake}\), distinguishes between them, and predicts the category when the image is judged to be real.

Fig. 1. The network architecture
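To make the data flow concrete, the following is a minimal PyTorch sketch of one forward pass through this framework. The component interfaces (feature_extractor, generator, discriminator, attacker) are illustrative assumptions, not code from the original implementation.

```python
import torch

def forward_pass(feature_extractor, generator, discriminator, attacker,
                 x_real, y_real, z_dim=128):
    feats = feature_extractor(x_real)                      # F(x): deep features of the real image
    z = torch.randn(x_real.size(0), z_dim, device=x_real.device)
    fused = torch.cat([feats, z], dim=1)                   # F(x)*: features concatenated with noise

    x_fake = generator(fused)                              # generated (fake) image
    x_adv = attacker(x_real, y_real)                       # PGD-perturbed real image

    src_fake, cls_fake = discriminator(x_fake)             # real/fake score and class logits
    src_adv, cls_adv = discriminator(x_adv)
    return (src_fake, cls_fake), (src_adv, cls_adv)
```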

The Loss Function

This work uses the same loss function as Rob-GAN [14]: the discriminator predicts both the source and the category of the image, \(P\left(S|X\right),P\left(C|X\right)=D(X)\). The only difference is that the generator \(G\) additionally takes the deep features of the original image as input for feature fusion, \(X_{fake}=G\left(\left(c,z\right)+F\left(X_{real}\right)\right)\). The loss function has two parts:

Discriminator Loss:

$$L_{s}=E\left[\log P\left(S=real \mid X_{real}\right)\right]+E\left[\log P\left(S=fake \mid X_{fake}\right)\right]$$
(1)

Classification Loss:

$$L_{c_{real}}=E\left[\log P\left(C=c \mid X_{real}\right)\right]$$
(2)
$$L_{c_{fake}}=E\left[\log P\left(C=c \mid X_{fake}\right)\right]$$
(3)

Train the discriminator \(D\) to maximize \({L}_{s}+{L}_{{c}_{real}}\), and train the generator \(G\) to minimize \({L}_{s}-{L}_{{c}_{fake}}\).
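As a hedged illustration, the sketch below translates Eqs. (1)–(3) and the above max/min objectives into the equivalent minimization form in PyTorch; the actual Rob-GAN implementation may differ. The discriminator is assumed to return a real/fake logit and class logits, as in the framework sketch above.

```python
import torch
import torch.nn.functional as F

def d_loss(src_real, cls_real, src_fake, y):
    # Source loss L_s: classify real images as real and fake images as fake, Eq. (1).
    l_s = F.binary_cross_entropy_with_logits(src_real, torch.ones_like(src_real)) + \
          F.binary_cross_entropy_with_logits(src_fake, torch.zeros_like(src_fake))
    # Classification loss L_c_real on real images, Eq. (2).
    l_c_real = F.cross_entropy(cls_real, y)
    # Maximizing L_s + L_c_real corresponds to minimizing this sum of cross-entropies.
    return l_s + l_c_real

def g_loss(src_fake, cls_fake, y):
    # G tries to make fakes look real (fool the source head) ...
    l_s = F.binary_cross_entropy_with_logits(src_fake, torch.ones_like(src_fake))
    # ... while keeping them correctly classifiable (maximize L_c_fake, Eq. (3)).
    l_c_fake = F.cross_entropy(cls_fake, y)
    return l_s + l_c_fake
```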

2.3 The Method of Generating Adversarial Examples Datasets

Projected Gradient Descent (PGD)

Madry et al. proposed an attack for adversarial training called Projected Gradient Descent (PGD) [15] in 2017. The PGD attack initializes the search for an adversarial instance at a random point within the allowed norm ball and then runs several iterations of the basic iterative method [16] to find an adversarial example. Given an example \(x\) with ground-truth label \(y\), the PGD attack computes the adversarial perturbation \(\delta \) by solving the following optimization with projected gradient descent:

$$\delta := \mathop{\mathrm{argmax}}_{\left\| \delta \right\| \le \delta_{max}} \; l\left(f\left(x+\delta; w\right), y\right)$$
(4)

where \(f(\cdot\,;w)\) is the network parameterized by the weights \(w\), \(l(\cdot,\cdot)\) is the loss function, and we choose \(||\cdot||\) to be the \({l}_{\infty }\) norm. The PGD attack is among the strongest first-order gradient attacks, and using it for adversarial training makes the defense more successful.
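A minimal \({l}_{\infty}\) PGD sketch following Eq. (4) is given below; the step size, number of iterations, and random start are illustrative assumptions rather than the exact settings of [15].

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, delta_max=0.0625, step=0.01, iters=10):
    # Random start inside the allowed l_inf ball.
    delta = torch.empty_like(x).uniform_(-delta_max, delta_max).requires_grad_(True)
    for _ in range(iters):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        # Gradient ascent step on the loss, then projection back onto the l_inf ball.
        delta = (delta + step * grad.sign()).clamp(-delta_max, delta_max)
        delta = delta.detach().requires_grad_(True)
    # Images in this work are scaled to [-1, 1], so clamp the adversarial example accordingly.
    return (x + delta.detach()).clamp(-1.0, 1.0)
```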

2.4 Implementation

Network Architecture

Next, we briefly introduce the network structure of Atten-Rob-GAN. For a fair comparison, we adopt the generator and discriminator architectures of Rob-GAN. Other important factors, such as the learning rate, the optimization algorithm, and the number of discriminator updates per cycle, also remain unchanged. The only modification is that we add an attention-based feature extractor to the input of the generator (see Fig. 3).

Generator

The specific network structure of the generator is shown in Table 1:

Table 1. The specific structure of Atten-Rob-GAN generator

The first layer of the generator is a fully connected layer whose input is a 128-dimensional noise vector and whose output is a \({4}^{2}\times (64\times 16)\) feature map, where \({4}^{2}\) is the spatial size and 64 × 16 is the number of channels. It is followed by 4 residual blocks and a batch normalization layer, and the last layer is a convolutional layer with a 3 × 3 kernel.
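A structural sketch of this generator is given below; since Table 1 is not reproduced here, the channel widths, the use of upsampling inside each residual block, and other details are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class UpResBlock(nn.Module):
    """Illustrative residual block that doubles the spatial resolution."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(c_in), nn.ReLU(),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(c_in, c_out, 3, padding=1),
            nn.BatchNorm2d(c_out), nn.ReLU(),
            nn.Conv2d(c_out, c_out, 3, padding=1),
        )
        self.skip = nn.Sequential(nn.Upsample(scale_factor=2), nn.Conv2d(c_in, c_out, 1))

    def forward(self, x):
        return self.body(x) + self.skip(x)

class Generator(nn.Module):
    def __init__(self, z_dim=128, base=64):
        super().__init__()
        # Fully connected layer: 128-d input -> 4x4 feature map with 64*16 channels.
        self.fc = nn.Linear(z_dim, 4 * 4 * base * 16)
        self.blocks = nn.Sequential(                       # 4 residual blocks
            UpResBlock(base * 16, base * 8),
            UpResBlock(base * 8, base * 4),
            UpResBlock(base * 4, base * 2),
            UpResBlock(base * 2, base),
        )
        # Batch normalization and a final 3x3 convolution with tanh output.
        self.out = nn.Sequential(nn.BatchNorm2d(base), nn.ReLU(),
                                 nn.Conv2d(base, 3, 3, padding=1), nn.Tanh())

    def forward(self, z):
        h = self.fc(z).view(z.size(0), -1, 4, 4)
        return self.out(self.blocks(h))
```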

Discriminator

The specific network structure of the discriminator is shown in Table 2:

Table 2. The specific structure of Atten-Rob-GAN discriminator

The first layer of the discriminator is the optimized residual block, whose details are shown in Fig. 2. It is followed by 3 residual blocks, an activation layer, and a fully connected layer. The final fully connected layer has two variants: in one case the number of output channels is 1, used to judge whether the image is real or fake; in the other, the number of output channels equals the number of categories, used to predict the image category.

Fig. 2. Optimized block
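The following sketch illustrates only the two-headed output interface of the discriminator; the optimized residual block and the 3 residual blocks of Table 2 are replaced by simple stand-ins, since the table is not reproduced here.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, num_classes=10, feat_dim=128):
        super().__init__()
        # Stand-in for the optimized block and the 3 residual blocks of Table 2.
        self.features = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.src_head = nn.Linear(feat_dim, 1)             # 1 output channel: real or fake
        self.cls_head = nn.Linear(feat_dim, num_classes)   # one output channel per category

    def forward(self, x):
        h = self.features(x)
        return self.src_head(h), self.cls_head(h)
```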

Feature Extractor Based on Attention Mechanism

Here, we first extract shallow image features by reducing the dimensionality of the original image through a network that is fully symmetrical to the generator network. We then introduce an attention mechanism (the SE module [17]) that captures the spatial relationships in the shallow features and the relationships between channels to form deep features. In this way, the network learns weight coefficients for the different channel features, so that the model becomes more discriminating about the characteristics of each channel. Figure 3 shows the detailed structure of the feature extractor F.

Fig. 3. Feature extraction (SE [17])
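A standard squeeze-and-excitation block [17] is sketched below as an illustration of the channel-attention step used inside the feature extractor F; the reduction ratio is an assumption.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                 # squeeze: global average over space
        self.fc = nn.Sequential(                            # excitation: per-channel weights
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                         # reweight each channel of x
```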

Training Details

We conduct experiments on MNIST [18] and CIFAR-10 [19], using the training sets to train Rob-GAN and Atten-Rob-GAN respectively and the test sets for evaluation. After training, the test set is fed to the discriminator, and the classification accuracy of the model is used as the evaluation metric. The Adam optimizer with a learning rate of 0.0002 and \({\beta }_{1}=0,{\beta }_{2}=0.9\) is used to optimize both the generator and the discriminator. We sample the noise vector from a normal distribution and use label smoothing to stabilize training.
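The optimizer settings above can be realized as follows; the one-sided label-smoothing value of 0.9 for the real targets is an illustrative assumption.

```python
import torch

def build_optimizers(generator, discriminator, lr=2e-4, betas=(0.0, 0.9)):
    # Adam with lr = 0.0002, beta1 = 0, beta2 = 0.9 for both networks.
    opt_g = torch.optim.Adam(generator.parameters(), lr=lr, betas=betas)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr, betas=betas)
    return opt_g, opt_d

def real_targets(src_real, smoothing=0.9):
    # One-sided label smoothing: use 0.9 instead of 1.0 as the "real" target.
    return torch.full_like(src_real, smoothing)
```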

Implementation Details

In our experiments, we use PyTorch and run on two NVIDIA GeForce RTX 2080 Ti GPUs. We train Atten-Rob-GAN for 200 epochs with a batch size of 64 and a learning rate of 0.0002, decayed by 50% every 50 steps; the PGD attack strength is set to 0.0625.
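The training schedule above can be configured as in the sketch below; interpreting "every 50 steps" as every 50 epochs and using StepLR are assumptions.

```python
import torch

EPOCHS, BATCH_SIZE, LR, PGD_EPS = 200, 64, 2e-4, 0.0625

def make_scheduler(optimizer):
    # Halve the learning rate every 50 epochs (50% decay every 50 steps).
    return torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)
```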

3 Results and Discussion

3.1 Robustness of Discriminator

In this experiment, we compare the robustness of the discriminator trained by Atten-Rob-GAN with that of Rob-GAN. As shown in [14], the robustness of Rob-GAN under adversarial attack even surpasses the state-of-the-art adversarial training algorithm [15]; in the comparison of [20], adversarial training was considered the state of the art in robustness. Since Rob-GAN is equivalent to Atten-Rob-GAN without the attention-based feature extractor, we keep all other components the same for a fair comparison. To test the robustness of the model, we choose the widely used \({l}_{\infty }\) PGD attack [15], although other gradient-based attacks are expected to produce similar results. As defined in (4), we set the \({l}_{\infty }\) perturbation to \(\delta_{max} \in \{0, 0.01, 0.02, 0.03, 0.04\}\). In addition, we scale the images to [−1, 1] instead of [0, 1], because the last layer of the generator has a \(tanh()\) output, and we modify the attack accordingly. The results are shown in Table 3; all values are averages over 5 runs.

Table 3. Accuracy of our model under \({l}_{\mathrm{\infty }}\) PGD-attack.

From Table 3, we observe that without attack our model has a higher classification success rate than Rob-GAN, which shows that our classifier is more accurate after training. At the same time, under attack strengths in [0, 0.04], our accuracy on CIFAR-10 is higher than that of Rob-GAN's classifier, which shows that our model obtains a more robust classifier. Under an attack strength of 0.04 on MNIST, our result is slightly lower than that of Rob-GAN; the reason may be that the number of experiments is too small, so the averaged result is not representative.

3.2 Quality of Generator

Finally, we evaluate the quality of the generator trained on the CIFAR-10 dataset by comparing it with the generator obtained by Rob-GAN. Figure 4 shows images generated by the two models. We can clearly observe that the images generated by Atten-Rob-GAN are of noticeably better quality than those of Rob-GAN, and even appear brighter than the original images.

Fig. 4. Different generated images.

4 Conclusion

We propose a robust generative adversarial network based on the attention mechanism. By adding the attention mechanism, deep features of the original image can be extracted, thereby improving the quality of the images produced by the generator. At the same time, the discriminator and the generator are jointly trained under adversarial attack to obtain a more powerful discriminator, which effectively improves the robustness of the classifier. Experimental comparisons show that the attention component we added improves Rob-GAN in terms of both the robustness of the discriminator and the quality of the generator.