
1 Introduction

Deep neural networks (DNNs) are effective at many important but difficult tasks, such as computer vision [1,2,3,4] and natural language processing [5,6,7,8], and can achieve state-of-the-art performance in these tasks. Furthermore, they have approached human-level performance on some specific tasks, so artificial intelligence appears to be moving toward human intelligence step by step. However, Szegedy et al. made the intriguing discovery that DNNs are vulnerable to adversarial examples [9] and first proposed the concept of adversarial examples in image classification: a well-performing DNN model misclassifies inputs modified by small, imperceptible perturbations that are hard for humans to distinguish. Adversarial examples have been used to attack applications such as face recognition [10, 11], autonomous driving [12, 13], and malware detection [14]. Clearly, adversarial examples are blind spots of deep models. The problem of generating adversarial examples can be regarded as an optimization problem in which the perturbation is minimized subject to the predicted label differing from the true label. The mathematical formulation is as follows:

$$ \begin{aligned} \min \;\; & D\left( x, x + r \right) \\ \text{s.t.} \;\; & f\left( x + r \right) \ne f\left( x \right). \end{aligned} $$
(1)

Let x be the input to the model, r the perturbation, D(x, x + r) the distortion function between the adversarial example and the original input, and f(x) the predicted label of the model. As shown in formula (1), there are two requirements for generating adversarial examples: the example must be misclassified for the attack to succeed, and the distortion must be as small as possible. These requirements ensure that the adversarial examples remain similar to the original inputs and that high image fidelity is guaranteed. Because of the security threat to DNNs, adversarial examples have garnered significant attention among researchers, especially in security-critical applications. Classic methods for generating adversarial examples against deep models have been established. Sorted by the adversarial setting, a white-box attack has direct access to all information, such as the training dataset and the model architecture, whereas a black-box attack obtains information only by querying the model indirectly. The proposed methods usually use an Lp norm (L0, L2, L∞) to constrain the perturbations and thereby classify the adversarial examples; that is, in the definition of the distortion function D(x, x + r), the Lp norm is used as a distance metric to measure the similarity between the adversarial examples and the original inputs. Typically, Jiawei Su et al. [15] proposed the one-pixel attack with an L0 norm constraint, which changes only one or a few pixels [16, 17] in a picture but results in significant changes compared with the original image; the attack concealment is poor, with obvious traces of alteration, and the attack success rate is lower. Szegedy et al. proposed generating adversarial examples with box-constrained L-BFGS [9], using back-propagation to obtain gradient information. Moosavi-Dezfooli et al. proposed DeepFool [18], which searches for the minimum perturbation to the classification boundary and achieves high image fidelity and attack success rates. Both methods use the L2 norm constraint, which perturbs the entire picture. Adversarial examples that satisfy the L2 norm constraint are similar to the original inputs [18, 19]; however, generating them is time consuming and thus inefficient. Goodfellow et al. proposed the fast gradient sign method (FGSM) [20] with the L∞ norm constraint, which quickly generates adversarial examples by maximizing the loss function, at the cost of low image fidelity and attack success rate. Furthermore, Kurakin et al. proposed the iterative fast gradient sign method (I-FGSM) [21] to improve the FGSM. We herein mainly discuss the L∞ norm constraint, which restrains the maximum difference between the adversarial example and the original input. Generally, perturbations are increased to ensure a high attack success rate; however, the adversarial examples obtained in this manner exhibit poor concealment. To alleviate the tradeoff between the attack success rate and image fidelity, we propose a method that adds visual model coefficients to the L∞ norm constraint. Because the L∞ norm constraint is an objective metric, the distribution of the perturbation is disordered, and some noisy pixels are conspicuous to the human eye. Sid Ahmed Fezza et al. [22] argued that Lp norms do not correlate with human judgement and are not suitable as distance metrics. Adil Kaan Akan et al. [23] defined the machine's just noticeable difference with regularization terms, rather than the just noticeable difference of human visual perception, and generated just noticeable difference adversarial examples that attack just successfully. Differently, we take the visual model coefficients into consideration and add them to the constraint to improve image quality and guarantee high image fidelity. In effect, the visual subjective feeling of the human eyes is added as a priori information in the constraint to control the distribution of perturbations. In our study, we integrate just noticeable difference (JND) coefficients into the L∞ norm constraint of the distortion function to accomplish the above-mentioned task.

JND coefficients are the critical values at which a difference can be detected; they reflect the threshold at which the human eye can recognize a change in an image. In general, the JND model is applied in image encoding: images contain redundancy and, without de-redundancy, would be transmitted less efficiently, and the JND determines the amount of distortion that can be tolerated while the image quality is preserved. Image encoding with JND coefficients, called perceptual coding, can improve coding efficiency significantly [24,25,26]. In this study, we use the JND model in the image domain to hide noise. As shown in Fig. 1, after Gaussian noise with a variance of 0.01 is added to the original input, the image is significantly degraded. When we constrain the noise with JND coefficients to control its distribution, the human visual system (HVS) cannot distinguish the difference between the original input and the JND image, which demonstrates the noise-concealment ability of the JND coefficients.

Fig. 1.

JND coefficients hide Gaussian noise. Left column: the original image. Middle column: the Gaussian noise image. Right column: the JND image.
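As a rough illustration of the Fig. 1 demonstration, the following sketch modulates zero-mean Gaussian noise (variance 0.01, i.e., standard deviation 0.1) by a relative per-pixel JND map before adding it to the image. It is a minimal sketch under stated assumptions, not the authors' exact procedure; `compute_jnd` is a hypothetical helper standing in for the spatial JND model of Sect. 2.1.

```python
import numpy as np

def hide_noise_with_jnd(image, compute_jnd, sigma=0.1, rng=None):
    """Scale zero-mean Gaussian noise by a relative JND map so that larger
    distortions fall only where the HVS tolerates them.

    `image` is a float array in [0, 1]; `compute_jnd` is assumed to return a
    JND map of the same shape (see Sect. 2.1)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=image.shape)   # variance sigma**2 = 0.01
    jnd = compute_jnd(image)
    jnd = jnd / (jnd.max() + 1e-12)                    # relative weights in [0, 1]
    return np.clip(image + jnd * noise, 0.0, 1.0)      # noise concentrated in textured regions
```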

JND coefficients can hide Gaussian noise because regions with large JND coefficients are regions with complex image textures, and it is difficult for the HVS to notice changes in such regions, which are also called visual blind spots of the human eye. The larger the JND coefficients, the higher the thresholds, the greater the redundancy, the lower the sensitivity of the human eye, and the more noise can be embedded. Therefore, perturbations in regions with large JND coefficients are less likely to be detected. We integrate JND coefficients into existing adversarial attack methods; namely, we add JND coefficients to the norm constraint and call this method AdvJND. The primary contributions of this study are as follows:

  • We propose a method that integrates JND coefficients for generating adversarial examples. We add the visual subjective feeling of the human eyes as a priori information in the constraint to decide the distribution of perturbations and generate adversarial examples whose gradient distribution is similar to that of the original inputs. Hence, the crafted noise can be hidden in the original inputs, which improves the attack concealment significantly.

  • We demonstrate that generating adversarial examples with our algorithm costs less time than algorithms with the L2 norm constraint, such as DeepFool, when the two approaches achieve comparable image quality and attack success rates. This shows that our AdvJND algorithm is more efficient.

In Sect. 2, we present the implementation of the AdvJND algorithm. The effects of AdvJND are shown in Sect. 3. In Sect. 4, we draw our conclusions.

2 Methodology

In our AdvJND algorithm, some information must be obtained in advance, namely the JND coefficients of the original image and the original perturbations derived from the target model's gradients. Hence, we compute the JND coefficients in Sect. 2.1 and adopt the FGSM and I-FGSM methods to yield the original perturbations in Sect. 2.2. In Sect. 2.3, we introduce the complete AdvJND algorithm.

2.1 JND Coefficients

The JND coefficients are based on the representation of visual redundancy in psychology and physiology; the receiver of image information is the HVS. A spatial JND model in the image domain primarily includes two factors: luminance masking and texture masking. On the one hand, according to Weber's law, the luminance difference that the HVS can perceive increases with the background luminance. On the other hand, because complex texture areas and excess noise are both high-frequency information, excess noise can easily be hidden in texture areas. To better match the HVS characteristics, X. K. Yang [27] designed a nonlinear additive masking model that accounts for both luminance adaptation and texture, where texture masking is determined by the average background luminance and the average luminance difference around a pixel [28, 29]. The JND coefficient of each pixel is obtained experimentally [27]. The formula is

$$ jnd\left( i, j \right) = \max\left( f_{1}\left( bg\left( i, j \right), mg\left( i, j \right) \right), f_{2}\left( i, j \right) \right), $$
(2)

where f1(bg(i, j), mg(i, j)) is the texture masking function, f2(i, j) is the luminance adaptation function, and bg(i, j) and mg(i, j) represent the average background luminance and the gradient change of the neighboring points at point (i, j), respectively.

Because of the visual redundancy in an image, there is an opportunity to embed noise in it; however, the magnitude of the embedded noise must be determined to guarantee imperceptibility. Fortunately, the JND coefficients are related to the sensitivity of the HVS and help embed noise imperceptibly, which improves the attack concealment.
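The following sketch illustrates the max(texture masking, luminance adaptation) structure of Eq. (2) on a grayscale image. The 5×5 background window, the gradient operators, and the constants below are commonly quoted values from the pixel-domain JND literature and are assumptions here; the exact kernels and parameters of the model in [27] may differ.

```python
import numpy as np
from scipy.ndimage import sobel, uniform_filter

def spatial_jnd(gray):
    """Simplified spatial JND map following the structure of Eq. (2):
    jnd = max(texture masking f1, luminance adaptation f2).

    `gray` is a float array of luminance values in [0, 255]. Windows and
    constants are commonly used literature values, not necessarily those
    of the model in [27]."""
    # bg: average background luminance (5x5 neighborhood, simplified to a box filter)
    bg = uniform_filter(gray, size=5)
    # mg: local luminance gradient (Sobel responses as a stand-in for the
    # directional operators of the original model)
    mg = np.maximum(np.abs(sobel(gray, axis=0)), np.abs(sobel(gray, axis=1)))

    # f1: texture (spatial) masking, grows with the local gradient
    alpha = 0.0001 * bg + 0.115
    beta = 0.5 - 0.01 * bg
    f1 = mg * alpha + beta

    # f2: luminance adaptation, higher thresholds in very dark and very bright regions
    T0, gamma = 17.0, 3.0 / 128.0
    f2 = np.where(bg <= 127,
                  T0 * (1.0 - np.sqrt(bg / 127.0)) + 3.0,
                  gamma * (bg - 127.0) + 3.0)

    return np.maximum(f1, f2)  # Eq. (2)
```

In Sect. 2.3 the input is normalized to [0, 1] and processed per channel, so in practice a map of this kind would be computed on correspondingly rescaled values.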

2.2 Adversarial Attack Methods

This paper is based on the white-box adversarial attack setting, unlike Curls & Whey [30], which concentrates on improving adversarial image quality under the same number of queries in the black-box setting.

In this section, we review studies related to adversarial attacks. We primarily introduce the FGSM and its extended algorithm, the I-FGSM, and obtain the original perturbations from them; our method builds its improvements on the FGSM and I-FGSM. We choose the I-FGSM as a baseline because it is the state-of-the-art white-box attack based on the L∞ norm constraint.

FGSM.

The basic concept of the FGSM [20] is to optimize in the direction of increasing loss, i.e., to generate adversarial examples in the positive direction of the gradient. It exhibits two characteristics: it generates adversarial examples quickly, as it performs only one back-propagation without iteration, and it measures the distance between the adversarial example and the original input using the L∞ norm. These are the two main reasons for the obvious perturbations.

$$ p = \varepsilon \cdot sign\left( \nabla_{x} J\left( \theta, x, y \right) \right) $$
(3)
$$ x^{adv} = x + p, $$
(4)

where ε represents the upper limit of the perturbation, \( \nabla_{x} J\left( \cdot \right) \) represents the gradient of the loss function with respect to the original input obtained via back-propagation, p represents the perturbation, x represents the original input, and xadv represents the generated adversarial example.
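A minimal PyTorch sketch of Eqs. (3)–(4) follows. The clamp to [0, 1] is an extra step to keep the result a valid image and is not part of Eq. (4) itself; the function name and interface are illustrative only.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon):
    """One-shot FGSM: a single signed-gradient step of size epsilon (Eqs. (3)-(4)).

    `x` is a batch of inputs in [0, 1], `y` the true labels."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()                                # single back-propagation, no iteration
    p = epsilon * x.grad.sign()                    # Eq. (3)
    x_adv = torch.clamp(x + p, 0.0, 1.0).detach()  # Eq. (4), clipped to the valid pixel range
    return x_adv, p
```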

I-FGSM.

The I-FGSM [21] is an extension of the FGSM that computes perturbations iteratively instead of in a one-shot manner. Specifically, the single step of size ε in the direction of the gradient sign is replaced by multiple smaller steps of size α, and the upper limit of the perturbation ε is enforced by clipping.

$$ x_{0}^{adv} = x $$
(5)
$$ Clip_{x,\varepsilon} \left\{ x' \right\} = \min\left( 1, x + \varepsilon, \max\left( 0, x - \varepsilon, x' \right) \right) $$
(6)
$$ x_{t + 1}^{adv} = Clip_{x,\varepsilon} \left\{ x_{t}^{adv} + \alpha \cdot sign\left( \nabla_{x} J\left( \theta, x_{t}^{adv}, y \right) \right) \right\}. $$
(7)

The I-FGSM achieves adversarial examples of better image quality than the one-shot FGSM, but at a higher time cost.
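A corresponding sketch of Eqs. (5)–(7), again with illustrative names and an added clamp to [0, 1]:

```python
import torch
import torch.nn.functional as F

def ifgsm_attack(model, x, y, epsilon, alpha, num_iter):
    """Iterative FGSM: repeated signed-gradient steps of size alpha, clipped
    into the epsilon-ball around x (Eq. (6)) and into the valid pixel range."""
    x_adv = x.clone().detach()                                         # Eq. (5)
    for _ in range(num_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()                   # small step, Eq. (7)
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)  # project into the eps-ball
        x_adv = torch.clamp(x_adv, 0.0, 1.0)                           # Eq. (6), adapted to [0, 1]
    return x_adv.detach()
```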

2.3 AdvJND Methods

First, we calculate the JND coefficients of the original input and then normalize the processed JND coefficients by their maximum (L∞ norm). Specifically, we normalize the original input pixels to [0, 1] and calculate the JND coefficients on each channel independently to simplify the calculation. Although the JND coefficients reflect the edge information to some extent, to make edge areas more distinct and better discriminated, we raise the JND coefficients to a power, which makes large values larger and small values smaller; that is, the values in edge areas become dramatically larger than those in smooth areas. In this paper, we square the image's JND coefficients:

$$ jnd_{2} = jnd \times jnd. $$
(8)

On the other hand, after squaring, the obtained JND coefficients are on the order of 1e−3. If the added perturbations were directly restricted to about 1e−3, it would be difficult to attack the image successfully even though the perturbations follow the image's gradient distribution. Thus, we discard the absolute values of the JND coefficients and keep only their relative values; that is, we use the JND coefficients to control the distribution of the perturbations indirectly.

$$ \lambda = \frac{p_{ori}}{\max\left( jnd_{2} \right)} $$
(9)
$$ k = \lambda \times jnd_{2}. $$
(10)

Here, pori represents the original perturbations from the FGSM or I-FGSM method, λ represents the scaling factor, and k represents the relative values of the JND coefficients, which provide the critical information on the location of image textures. Although the adversarial examples obtained in this way are similar to the original inputs, their attack success rates are still lower than those of the original adversarial examples. In most cases, the large values of k are located in regions with complex textures, in which noise can be hidden efficiently, whereas the small values of k are located in smooth areas, to which the HVS is sensitive and where changes are easy to notice. Therefore, we decide the final values of k based on this location information. Our strategy is to reduce the small values of k by a multiplicative factor and calculate the final perturbations as follows.

$$ t = \begin{cases} 1, & \text{if } k \ge \rho \\ \gamma, & \text{if } k < \rho \end{cases} $$
(11)
$$ p_{out} = k \times t. $$
(12)

We obtained these empirical values experimentally: the threshold ρ = ε/2 and the reduction factor γ = 1/4; pout represents the final adversarial perturbations. The AdvJND method is summarized in Algorithm 1.

Algorithm 1 takes the FGSM as an example to show the complete process by which our AdvJND algorithm generates adversarial examples. To implement AdvJND based on the I-FGSM, we take the output xadv as the new input x and repeat steps 3 to 9 until the stopping condition is satisfied or the maximum number of iterations is reached.
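As a sketch of how steps (8)–(12) reshape a perturbation, consider the following. How λ is computed for a full perturbation tensor and how the sign is preserved are one plausible reading of the equations, not necessarily the authors' exact implementation, and the helper names are illustrative.

```python
import numpy as np

def advjnd_perturbation(p_ori, jnd, epsilon, gamma=0.25):
    """Reshape an FGSM/I-FGSM perturbation with relative JND coefficients
    following Eqs. (8)-(12).

    `p_ori` is the signed perturbation (e.g., epsilon * sign(grad)) and `jnd`
    the JND map of the input; both are arrays of the same shape, with pixel
    values normalized to [0, 1]."""
    jnd2 = jnd * jnd                                  # Eq. (8): sharpen edge/smooth contrast
    lam = np.abs(p_ori).max() / (jnd2.max() + 1e-12)  # Eq. (9): scaling factor lambda
    k = lam * jnd2                                    # Eq. (10): relative JND magnitudes
    rho = epsilon / 2.0                               # empirical threshold from the paper
    t = np.where(k >= rho, 1.0, gamma)                # Eq. (11): shrink smooth-region noise
    return np.sign(p_ori) * k * t                     # Eq. (12), keeping the gradient sign

# FGSM-JND (sketch): x_adv = clip(x + advjnd_perturbation(p_fgsm, jnd, eps), 0, 1);
# for I-FGSM-JND, feed x_adv back in as the new input and repeat, as described above.
```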

3 Experiments

In this section, experiments on the FashionMNIST [31], CIFAR10 [32], and MiniImageNet datasets (using 1000 images from the ILSVRC2012 [33] test dataset, 1925 pictures in total; we use MiniImageNet because high recognition accuracy cannot be guaranteed on the whole ImageNet dataset, and to show the effectiveness of our attack algorithm we validate on MiniImageNet, where the classification accuracy is high) are used to validate our AdvJND method. These datasets correspond to the network architectures LeNet-5 [34], VGG16 [35], and Inception_v3 [36], respectively. We demonstrate the advantages of the FGSM-JND and I-FGSM-JND algorithms over the original attack methods in Sect. 3.1; the proposed AdvJND algorithm adopts a general approach to the constraint for generating adversarial examples. In Sect. 3.2, we compare the efficiency of the I-FGSM-JND and DeepFool algorithms.

Algorithm 1. The complete AdvJND algorithm (pseudocode figure).

3.1 AdvJND

The core of AdvJND is to integrate JND coefficients into the L∞ constraint. More similar adversarial examples are generated, while the attack success rate decreases only slightly, within an acceptable scope.

FGSM vs. FGSM-JND.

The FGSM-JND is obtained by integrating JND coefficients into the FGSM. As shown in Fig. 2, the perturbations generated by the FGSM are distributed over the entire image, whereas the perturbations generated by the FGSM-JND are distributed over the edge region of the “pants”. The adversarial examples generated by the FGSM are rough and obviously modified, whereas the adversarial examples generated by our algorithm are smooth and more similar to the original inputs, because the FGSM-JND effectively suppresses perturbations in smooth regions, where the JND coefficients are small, and mainly hides noise in regions with large JND coefficients to retain its adversarial capacity.

Fig. 2.

FGSM vs. FGSM-JND on the FashionMNIST dataset.

I-FGSM vs. I-FGSM-JND.

The I-FGSM-JND is obtained by integrating JND coefficients into the I-FGSM. In Fig. 3, the I-FGSM generates more obvious perturbations, especially in the smooth background regions. By contrast, the perturbations generated by the I-FGSM-JND focus primarily on regions of complex texture in the images (e.g., the “bird” in row 1), to which the HVS is not sensitive, so the perturbations there cannot be detected easily. Even in smooth regions, such as the body of the “bird”, our I-FGSM-JND generates smaller and fewer perturbations.

Fig. 3.

I-FGSM vs. I-FGSM-JND on the MiniImageNet dataset.

From a different perspective, we can explain this phenomenon with histograms of oriented gradients (HOG) [37], a feature descriptor that reflects the outline and texture information of an image. We herein configure the basic HOG settings with 8 orientations, together with the pixels per cell and cells per block. In Fig. 4, even though the HOG of the adversarial example generated by the I-FGSM-JND (e.g., still the “bird” in row 1) is mildly perturbed by a small noise texture in the background, the outline of the “bird” can still be recognized. By contrast, the I-FGSM's adversarial examples are covered with noise and cannot be recognized; that is, all the magnitudes and directions of the textures are messy, and we cannot even distinguish the target from the background. On the other hand, the HOG descriptors of the “bird” in row 2 and the “dog” in row 3 are clearer than that of the “bird” in row 1, especially in the background regions, most likely because the background in row 1 is more complex, where the JND coefficients are larger and more noise can be added. The texture complexity reflects the edge information, which is related to the gradient. Thus, the gradient distribution of adversarial examples generated by the I-FGSM-JND is more similar to that of the original inputs.

Fig. 4.

Histograms of oriented gradients generated by the original inputs, I-FGSM, and I-FGSM-JND in Fig. 3.
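For reference, HOG visualizations such as those in Fig. 4 can be produced with scikit-image roughly as follows; the 8 orientations follow the paper, while the cell and block sizes below are assumptions, since their exact values are not stated.

```python
from skimage.color import rgb2gray
from skimage.feature import hog

def hog_visualization(image_rgb):
    """Return the HOG feature vector and its visualization for an RGB image."""
    features, hog_image = hog(rgb2gray(image_rgb),
                              orientations=8,          # as stated in the paper
                              pixels_per_cell=(8, 8),  # assumed
                              cells_per_block=(1, 1),  # assumed
                              visualize=True)
    return features, hog_image
```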

Original Methods vs. Improvement Methods.

In Fig. 5, we randomly select 10 adversarial examples and enlarge a local region of each (marked by a red box in the same place) to show more detail. For example, in row 8, we enlarge the sky. The FGSM generates distinct perturbations, whereas the I-FGSM produces more refined perturbations by iterating the FGSM, which confirms that iteration is useful. Notably, by integrating JND coefficients into the constraint, we obtain even smaller perturbations than with the I-FGSM. Across all images, our AdvJND algorithm clearly improves the image quality, especially in smooth regions with simple texture, and the I-FGSM-JND performs best. This shows that taking the JND coefficients as a priori information to control the distribution of gradients works.

Fig. 5.

Ten adversarial examples generated by the FGSM, FGSM-JND, I-FGSM, and I-FGSM-JND with epsilon 0.1; their locally enlarged images on the MiniImageNet dataset are shown on the right in order. (Color figure online)

As shown in Table 1, the non-attack method means taking the original images as inputs, without any epsilon. The attack success rate, namely (1 − recognition accuracy), of the AdvJND algorithm is lower than or equivalent to that of the original attack method; that is, a little attack success rate is sacrificed to improve the image fidelity. This is especially obvious for the FGSM and FGSM-JND: because the FGSM is a one-step attack method, its effect on the attack success rate is larger than its effect on the image fidelity, which makes the gap in attack success rate between the FGSM and FGSM-JND somewhat large. With iteration, the attack success rate becomes higher and the image fidelity better, while the gap in attack success rate between the I-FGSM and I-FGSM-JND decreases.

Table 1. Comparison of recognition accuracy between the original attack and AdvJND attack on the FashionMNIST, CIFAR 10, and MiniImageNet datasets.

On the other hand, the performance on the FashionMNIST dataset, in terms of both the attack success rate and the gap in attack success rate between the original attack algorithm and our AdvJND algorithm, is worse than on the other datasets. The improvement provided by our AdvJND algorithm depends somewhat on the images, because the JND coefficients are related to the texture complexity of the image. The FashionMNIST dataset consists mainly of simple textures and smooth backgrounds, whereas the MiniImageNet dataset includes more practical, real-life images with more complex textures. The JND coefficients are small in smooth images, so their effect is not obvious, which explains why our AdvJND algorithm performs better on the MiniImageNet and CIFAR10 datasets than on the FashionMNIST dataset.

3.2 I-FGSM-JND vs. DeepFool

The attack success rate of the I-FGSM-JND algorithm is slightly higher than that of DeepFool, but the average time for the I-FGSM-JND algorithm to generate an adversarial example is only approximately half that of DeepFool (Table 2). The times were measured on an NVIDIA GTX 1080Ti GPU. This is because DeepFool takes the smallest distance to the nearest classification boundary as the minimum perturbation, so it must traverse the classification boundaries to obtain the smallest distance; with 1000 classes, this time-consuming disadvantage becomes even more obvious. Thus, the efficiency of the I-FGSM-JND algorithm is significantly higher than that of DeepFool, and the I-FGSM-JND is more suitable as a universal attack method (Fig. 6).

Table 2. The efficiency of generating adversarial examples with the I-FGSM-JND and DeepFool.
Fig. 6.

I-FGSM-JND vs. DeepFool on the MiniImageNet dataset.

Fig. 7.

Adversarial examples generated by the I-FGSM and I-FGSM-JND with epsilon 0.01, 0.08, and 0.2

Similar to integrating the JND coefficients into the L∞ norm, the subjective visual information of the human eyes is used as a priori information to improve the image quality of the adversarial examples. Furthermore, we can consider embedding appropriate visual model coefficients into the L2 norm constraint as a priori information, which could provide a better search strategy or reduce the search space, thereby decreasing the number of iterations or traversals and improving the efficiency.

4 Conclusions

Large perturbations give adversarial examples a high attack success rate but poor image fidelity and concealment. To alleviate the tradeoff between the attack success rate and image fidelity, we herein proposed the AdvJND adversarial attack method, which uses JND coefficients to connect the subjective perception of the human eyes with the image quality evaluation metric. The human eyes are not sensitive to changes in complex texture regions, which allows us to embed more noise in these regions. Our experimental results demonstrated that the HOG descriptors of adversarial examples generated by the AdvJND algorithm were similar to those of the original inputs; thus, noise could be hidden effectively in the original inputs. Our approach can be incorporated into newly proposed L∞ norm-based attack methods to build adversarial examples that are similar to the original inputs. In future work, other metrics of human visual evaluation can be integrated into the L2 norm constraint to improve the efficiency of generating adversarial examples.