1 Introduction

At present, deep neural networks (DNNs) have been widely applied in various fields due to their ability to efficiently solve complex tasks. However, DNNs are highly uninterpretable, which makes them difficult to control [1], and the safety of their application in fields such as the military, autonomous driving, and medical treatment deserves attention. The concept of the adversarial example was first proposed by Szegedy et al. [1] in 2014: adding a small perturbation to an original image can produce an adversarial example that the DNN model misclassifies with high confidence. According to the knowledge of the target model's structure and parameters that is accessible to the attacker, adversarial attacks can be divided into white-box and black-box attacks. Since the black-box attack is more practicable, it has attracted more attention than the former [2]. Black-box attacks include transfer-based [3] and query-based attacks [4].

In the transfer-based black-box attack, Dong et al. [5] make the generated adversarial examples more transferable by incorporating momentum into the gradient direction. However, this approach has a low attack success rate. In [11], Papernot et al. propose a Jacobian-based dataset augmentation method to iteratively expand the training data and improve the surrogate model. However, when the dimension of the sampled image is large, computing the Jacobian matrix consumes enormous resources. Besides, it is difficult for the surrogate to fully imitate the decision boundary of the attacked model, which also leads to a low attack success rate.

Fig. 1.

Pipeline of SLIA. The attack is initialized with an example that is already adversarial, and then iteratively moves the adversarial example along the decision boundary. Taking a targeted attack as an example, we aim to obtain an adversarial image that visually looks like a cat but is misclassified as a black swan.

Since a surrogate model cannot fully imitate the target model, many researchers instead estimate the gradient information of the target model directly; the core of such black-box attacks is gradient estimation by querying the model. Chen et al. [4] utilize the finite-difference-based Zeroth-Order Optimization (ZOO) algorithm to estimate the gradient of the loss function by accessing the predicted probabilities of the target model. This method estimates each pixel one by one and therefore requires numerous queries to obtain an accurate gradient estimate in each iteration, resulting in low attack efficiency. Bhagoji et al. [13] use the finite-difference method together with random grouping to reduce the amount of computation. However, the reduced computation leads to a low attack success rate on large-size image datasets.

When the model's prediction probabilities are accessible, attackers typically prefer score-based attacks, while in more realistic scenarios where only the top-1 class prediction is available, they have to resort to decision-based attacks. The concept of the boundary-based black-box attack was first proposed by Brendel et al. [19]. It only needs the final classification output of the model to craft adversarial examples: the method performs a random walk along the decision boundary towards the original example until the result is as close as possible to the original example while remaining adversarial. This attack requires little model knowledge yet achieves attack effects comparable to white-box attacks. However, the perturbation sampling strategy in [19] is highly random, and the convergence of the perturbation cannot be guaranteed. To address this problem, the decision-based black-box attacks in [6, 20] were proposed. However, these attacks often require numerous queries to converge or produce large perturbations under a given query budget, which makes the attack computationally heavy, especially when attacking large-size images.

To improve query efficiency, we propose a decision-based boundary adversarial attack specific to large-size images, termed SLIA. SLIA optimizes both \(l_{2}\)-norm and \(l_{\infty }\)-norm distortion. The main contributions of this paper are as follows: (1) We propose a decision-based black-box attack for large-size images (named SLIA), in which adversarial images can be crafted with only a few queries to the model; (2) When performing an untargeted attack, SLIA replaces the low-frequency component of the original image with random uniform noise and reconstructs the image from this noise together with the original high-frequency components. This fools the model while retaining as much key information of the original image as possible; (3) SLIA performs a discrete wavelet decomposition of the adversarial example at the boundary and only estimates and updates the gradient of the low-frequency component, which greatly reduces the number of dimensions to be estimated and hence the number of model queries. Experiments show that our algorithm successfully attacks different ImageNet models with less distortion than state-of-the-art algorithms under the same number of queries.

2 Related Work

According to the knowledge of the network model available to the attacker, adversarial attacks are classified into white-box and black-box attacks. In the white-box setting, the attacker has full knowledge of the network. Since Szegedy et al. [1] discovered the vulnerability of DNNs, various white-box attacks [8,9,10, 12] have been developed. In practice, however, the attacker usually cannot access the structure and parameters of the model, which is more in line with realistic attack scenarios. Hence black-box attacks have received more attention recently. They are often divided into three families: transfer-based, score-based, and decision-based attacks.

2.1 Transfer-Based Black-Box Attacks

Transfer-based black-box attacks rely mainly on the phenomenon of transferability: an adversarial example crafted against one model is often also misclassified by other models. Papernot et al. [10, 11] trained a local substitute model by querying the target model and used gradients backpropagated through the substitute network to craft adversarial examples, which also fool the target model with high probability. The follow-up work [3] showed that adversarial examples generated on a single substitute network tend to transfer poorly for targeted attacks, but that transferability can be improved by using an ensemble of models. However, query-based algorithms that directly estimate the gradient of the target network outperform these methods. In addition, it is difficult to find a suitable surrogate model that learns the decision boundary of the target model.

2.2 Score-Based Black-Box Attacks

In the score-based black-box setting, the attacker crafts adversarial examples using the predicted probabilities returned by queries to the target model. Chen et al. [21] applied zeroth-order optimization and coordinate descent to estimate the gradient, but required a large number of queries to the target model. The method in [6] performs gradient estimation via Natural Evolution Strategies (NES) and then applies Projected Gradient Descent (PGD) [7], which further reduces the query complexity.

2.3 Decision-Based Black-Box Attacks

As an initial attempt in this important category of adversarial attacks, Boundary Attack [19] is highly relevant to real-world applications. It starts from an adversarial point and reduces the distortion by walking towards the original image along the decision boundary while remaining adversarial. The main issue is the trade-off between the number of queries and the quality of the adversarial example. HopSkipJumpAttack [21] significantly improves on [19] in terms of query efficiency and balances gradient-estimation accuracy against query complexity well. However, when attacking large-size images, the number of queries required to produce adversarial examples is still in the tens of thousands.

3 Problem Definition

We consider an image classifier \( f: \boldsymbol{x} \rightarrow c\), where \(\boldsymbol{x}\in \mathbb {R}^{n} \) is a normalized RGB image and c is its corresponding label, i.e., the top-1 classification label. \(F(\boldsymbol{x})\) is a k-dimensional vector referring to the probability distribution over classes, and \(c:=\arg \max _{c \in [k]} F_{c}(\boldsymbol{x})\) is the label assigned to \(\boldsymbol{x}\). Given an original image \(\boldsymbol{x}^{*}\), let \(c^{*}\) denote its label. Denoting the adversarial perturbation by \(\boldsymbol{\mu }\in \mathbb {R}^{n}\), the goal of an untargeted attack is to cause misclassification, i.e., \(c(\boldsymbol{x}^{*}+ \boldsymbol{\mu }) \ne c^{*}\), while a targeted attack aims to change the original decision \(c^{*}\) into a pre-specified class \(c^{+}\).

The process of generating adversarial examples can be formulated as an optimization problem by defining the function \(\mathcal {L}\):

$$\begin{aligned} \mathcal {L}_{\boldsymbol{x}^{*}}\left( \boldsymbol{x}\right) :=\left\{ \begin{array}{ll} \max \limits _{c \ne c^{*}} F_{c}\left( \boldsymbol{x}\right) -F_{c^{*}}\left( \boldsymbol{x}\right) &{} \text{(Untargeted) } \\ F_{c^{+}}\left( \boldsymbol{x}\right) -\max \limits _{c \ne c^{+}} F_{c}\left( \boldsymbol{x}\right) &{} \text{(Targeted) } \end{array}\right. \end{aligned}$$
(1)

Gradient-based methods can efficiently optimize this problem in the white-box setting. In the decision-based black-box setting, however, the model only provides the attacker with a hard label, without any output probabilities. In other words, only the value of sign(\(\mathcal {L}\)) is available, while the value of \(\mathcal {L}\) itself is unknown. We define the indicator function \(\mathcal {I}\) as:

$$\begin{aligned} \mathcal {I}_{\boldsymbol{x}^{*}}\left( \boldsymbol{x}\right) =\text {sign}\left( \mathcal {L}_{\boldsymbol{x}^{*}}\left( \boldsymbol{x}\right) \right) =\left\{ \begin{array}{ll} 1 &{} \quad \text {if} \quad \mathcal {L}_{\boldsymbol{x}^{*}}\left( \boldsymbol{x}\right) >0 \\ -1 &{} \quad \text {otherwise} \end{array}\right. \end{aligned}$$
(2)

In our decision-based attack, the goal of the adversary is to find an adversarial perturbation \(\boldsymbol{\mu }\) that satisfies \(\mathcal {I}(\boldsymbol{x}^{*}+\boldsymbol{\mu })=1\) by sending queries to the model; only when \(\mathcal {I}(\boldsymbol{x}^{*}+\boldsymbol{\mu })=1\) is the attack considered successful. Generating adversarial examples in the decision-based black-box setting can thus be formulated as the following optimization problem:

$$\begin{aligned} \begin{array}{ll} \min _{}&\mathcal {D}(\boldsymbol{x}^{*}, \boldsymbol{x}^{*}+\boldsymbol{\mu }) \quad s.t. \quad \mathcal {I}(\boldsymbol{x}^{*}+\boldsymbol{\mu }) = 1 \end{array} \end{aligned}$$
(3)

where \(\mathcal {D}(\cdot , \cdot )\) is the \(l_{2}\)-norm or \(l_{\infty }\)-norm distance metric. We strive to find an example that is adversarial yet as close to the original example as possible.
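For concreteness, the hard-label oracle of Eq. (2) can be sketched as follows. This is a minimal illustration, assuming `model` is a Keras-style classifier whose `predict` returns class probabilities; the function name `indicator` is ours.

```python
import numpy as np

def indicator(model, x, true_label, target_label=None):
    """Hard-label oracle I_{x*}(x) of Eq. (2): +1 if x is adversarial,
    -1 otherwise. Only the model's top-1 decision is used."""
    probs = model.predict(x[None, ...], verbose=0)   # one query to the target model
    pred = int(np.argmax(probs, axis=-1)[0])
    if target_label is None:                         # untargeted attack
        return 1 if pred != true_label else -1
    return 1 if pred == target_label else -1         # targeted attack
```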

4 Decision-Based Black-Box Attack Specific to Large-Size Images (SLIA)

In this section, we propose to use the discrete wavelet transform (DWT) to extract the low-frequency component of the attacked image and add perturbation only to this part, while maintaining a 100\(\%\) attack success rate. The pipeline of SLIA is shown in Fig. 1 and includes three steps: estimating the gradient by querying the model, moving along the estimated gradient direction, and projecting the new example back to the decision boundary by a binary search towards the original example. Details of each step are given below.

Fig. 2.

Initialization for untargeted attacks.

Fig. 3.

Overview of estimating gradient at decision boundary.

4.1 Initialization

Our SLIA starts from an adversarial image outside the boundary, and gradually reduces the distortion by moving towards the original image along the decision boundary while remaining adversarial.

Given a correctly classified original image \(\boldsymbol{x}^{*}\), the first step is to generate an initial adversarial example. (1) As shown in Fig. 2, for untargeted attacks we perform a 1-level discrete wavelet decomposition of the original image. The low-frequency component \(\boldsymbol{LL}^{*}\) is then reset to random uniform noise \(\boldsymbol{u}\) \(\sim \) \(\mathcal U\)(min(\(\boldsymbol{LL}^{*}\)), max(\(\boldsymbol{LL}^{*}\))). Next, we combine the low-frequency noise with the original high-frequency components and reconstruct the image through the inverse DWT, repeating this and querying the target model until the new image is misclassified. Unlike previous attack methods that use uniform random noise as the initialization image, SLIA's initialization retains more information of the original image without causing large distortion. Finally, we project the image onto the boundary through the binary search algorithm and take it as the initial adversarial example \(\boldsymbol{x}_{0}\); a sketch of this step is given below. (2) For targeted attacks, an image is randomly selected from the pre-specified target class, which differs from the class of the original image. Similarly, we use the binary search algorithm to find the decision boundary and take the resulting image as the initial adversarial example \(\boldsymbol{x}_{0}\).
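The untargeted initialization can be sketched as follows. This is a minimal illustration using PyWavelets for the 1-level Haar decomposition and the `indicator` oracle from Sect. 3; the function name and the `max_tries` cap are ours.

```python
import numpy as np
import pywt

def init_untargeted(model, x_orig, true_label, max_tries=100, rng=None):
    """Replace the LL subband of x_orig with uniform noise until the
    reconstructed image is misclassified (Sect. 4.1, untargeted case)."""
    rng = np.random.default_rng() if rng is None else rng
    # 1-level DWT over the spatial axes of an HxWxC image in [0, 1].
    LL, highs = pywt.dwt2(x_orig, 'haar', axes=(0, 1))
    for _ in range(max_tries):
        noise = rng.uniform(LL.min(), LL.max(), size=LL.shape)
        x_init = pywt.idwt2((noise, highs), 'haar', axes=(0, 1))
        x_init = np.clip(x_init, 0.0, 1.0)
        if indicator(model, x_init, true_label) == 1:   # misclassified: done
            return x_init
    raise RuntimeError('no adversarial initialization found')
```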

4.2 Gradient Direction Estimate at the Decision Boundary

In this subsection, we elaborate the gradient estimation step of the proposed method. Suppose that at the t-th iteration the adversarial example on the boundary is \(\boldsymbol{x}_{t}\). As shown in Fig. 3, \(\boldsymbol{x}_{t}\) is decomposed into low-frequency and high-frequency components by the DWT. The gradient direction of the loss function \(\mathcal {L}\) at this point is then estimated by sending queries to the target model,

$$\begin{aligned} {\nabla \mathcal {L}(\boldsymbol{x}_{t})}:=\frac{1}{N} \sum _{i=1}^{N} \mathcal {I}_{\boldsymbol{x}^{*}}[\text {IDWT}(\boldsymbol{LL}_{t}+\delta \boldsymbol{\eta }_{i},\boldsymbol{HL}_{t},\boldsymbol{LH}_{t},\boldsymbol{HH}_{t})] \boldsymbol{\eta }_{i}, \end{aligned}$$
(4)

where \(\delta \) is the probe step size, a small positive parameter, and t is the current iteration number. \(\{\boldsymbol{\eta }_{i}\}_{i=1}^{N}\) are normalized random noise vectors drawn from a Gaussian distribution on the unit sphere of the low-frequency subspace, whose dimension is n/4, as shown in Fig. 4 (\(\boldsymbol{x}^{*}\in \mathbb {R}^{n}\)). Each \(\delta \boldsymbol{\eta }_{i}\) is added to the low-frequency component, and by combining the result with the high-frequency components of \(\boldsymbol{x}_{t}\), the inverse DWT is used to reconstruct N samples with unknown labels.

We determine the directions of the noise vectors by querying the model and checking whether these samples receive the same label as the original example: (1) if \(\mathcal {I}_{\boldsymbol{x}^{*}}=-1\), the noise vector is flipped to its opposite direction; (2) if \(\mathcal {I}_{\boldsymbol{x}^{*}}=1\), the noise vector remains unchanged. Finally, we average these noise vectors and use the mean as the normal vector of the tangent hyperplane, i.e., the gradient direction \(\nabla \mathcal {L}(\boldsymbol{x}_{t})\) at the decision boundary.
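A sketch of this Monte Carlo estimate (Eq. (4)) in the low-frequency subband, reusing `indicator` and `pywt` from above; the helper name is ours and implementation details (e.g., clipping each probe) are our assumptions.

```python
def estimate_gradient(model, x_t, true_label, N, delta,
                      target_label=None, rng=None):
    """Estimate the gradient direction of Eq. (4) at the boundary point x_t,
    sampling N Gaussian probes in the LL subband only."""
    rng = np.random.default_rng() if rng is None else rng
    LL, highs = pywt.dwt2(x_t, 'haar', axes=(0, 1))
    grad = np.zeros_like(LL)
    for _ in range(N):
        eta = rng.standard_normal(LL.shape)
        eta /= np.linalg.norm(eta)                       # unit probe direction
        x_probe = pywt.idwt2((LL + delta * eta, highs), 'haar', axes=(0, 1))
        x_probe = np.clip(x_probe, 0.0, 1.0)
        sign = indicator(model, x_probe, true_label, target_label)
        grad += sign * eta                               # flip eta if not adversarial
    grad /= N
    return grad / (np.linalg.norm(grad) + 1e-12)         # direction in LL space
```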

Fig. 4.

Illustration of estimating the gradient at \(\boldsymbol{x}_{t}\) by sampling N Gaussian noises. \(\boldsymbol{q}\) is an arbitrary vector in the tangent space.

Because the boundary is locally flat, the noise vectors are, in theory, likely to be distributed symmetrically on both sides of the decision boundary. Therefore, the flipped and unchanged noise vectors cluster around the true gradient, and their mean is also close to the true gradient.

The gradient estimation in SLIA is essentially a Monte Carlo estimate. When the dimension of the gradient to be estimated is large, the Monte Carlo method requires more sampling points to bring the estimated gradient closer to the true gradient. In an RGB color image, each pixel is represented by three channels, and as the image size grows, the dimensionality increases dramatically (e.g., the data dimension on ImageNet is over 150k), resulting in low gradient-estimation accuracy. To reduce the dimension of the gradient to be estimated and further minimize the visual effect of the adversarial perturbation, SLIA applies the DWT to decompose the sample into low-frequency and high-frequency components. Most of the key content-defining information in natural images resides at the low-frequency end of the spectrum, while high-frequency signals are often associated with noise; hence adversarial examples are more likely to be generated by adding noise to the low-frequency component. We therefore keep the high-frequency components unchanged and perturb only the low-frequency component, which reduces the dimension to be perturbed to 1/4 of the original image. Perturbing only the low-frequency component has two advantages: (1) the dimension of the gradient to be estimated drops to 1/4 of the original image, so the same number of sampling points yields higher estimation accuracy; (2) the perturbation is spread over many pixels, so it is unlikely to form salt-and-pepper noise and has less visual impact.
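As a quick check of the 1/4 claim (PyWavelets is our choice of DWT implementation):

```python
import numpy as np
import pywt

x = np.zeros((224, 224, 3))                        # ImageNet-size input: 150,528 values
LL, highs = pywt.dwt2(x, 'haar', axes=(0, 1))      # 1-level Haar decomposition
print(x.size, LL.size, LL.size / x.size)           # 150528 37632 0.25
```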

4.3 Move Along Estimated Gradient Direction

In this part, we move one step along the gradient direction estimated in Eq. (4) to obtain an example located in the adversarial region,

$$\begin{aligned} \boldsymbol{x}_{t}^{\prime } = \boldsymbol{x}_{t} + \epsilon _{t} \cdot \frac{{\nabla \mathcal {L}(\boldsymbol{x}_{t})}}{\Vert {\nabla \mathcal {L}(\boldsymbol{x}_{t})}\Vert _{2}}, \end{aligned}$$
(5)

where \(\epsilon _{t}\) is the perturbation magnitude at the t-th iteration. It is computed from the distortion of the previous iteration and a geometric progression in the current iteration number t. We multiply the normalized estimated gradient by \(\epsilon _{t}\) and add it to \(\boldsymbol{x}_{t}\), obtaining an adversarial example \(\boldsymbol{x}_{t}^{\prime }\) that lies slightly away from the boundary, as shown in Fig. 1. Note that \(\boldsymbol{x}_{t}^{\prime }\) is on the opposite side of the boundary from \(\boldsymbol{x}^{*}\).
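A sketch of this step, under the reading that the step is taken in the LL subband and the image is then reconstructed by the inverse DWT (Eq. (5) itself is stated in image space); helper names as before.

```python
def move_along_gradient(x_t, grad_LL, eps_t):
    """Move x_t by eps_t along the (normalized) estimated gradient, applied
    here in the LL subband and mapped back to image space (cf. Eq. (5))."""
    LL, highs = pywt.dwt2(x_t, 'haar', axes=(0, 1))
    step = grad_LL / (np.linalg.norm(grad_LL) + 1e-12)
    x_new = pywt.idwt2((LL + eps_t * step, highs), 'haar', axes=(0, 1))
    return np.clip(x_new, 0.0, 1.0)
```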

4.4 Project to Decision Boundary

Since the proposed gradient direction estimation works only at the boundary, we adopt a binary search to quickly find the decision boundary and project \(\boldsymbol{x}_{t}^{\prime }\) onto it. We adjust the parameter \(\gamma \) with the following formula to control the relative position of the adversarial example with respect to the original example until the stopping condition is satisfied, i.e., we move the adversarial image \(\boldsymbol{x}_{t}^{\prime }\) towards the original image \(\boldsymbol{x}^{*}\) via

$$\begin{aligned} \boldsymbol{x}^{t+1} = \gamma _t \cdot \boldsymbol{x}^{*} + (1 - \gamma _t) \cdot \boldsymbol{x}_{t}^{\prime }, \end{aligned}$$
(6)

where \(\gamma _t\) is a positive parameter between 0 and 1, chosen so that \(\boldsymbol{x}_{t}^{\prime }\) is projected back onto the decision boundary. We denote the example projected back onto the boundary by \(\boldsymbol{x}^{t+1}\) and let it enter the next iteration as the new boundary adversarial example. The pseudo code of the complete procedure for generating adversarial images is outlined in Algorithm 1; a sketch of the binary search projection is given below.
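A sketch of the binary search over \(\gamma \) in Eq. (6); the stopping threshold `theta` corresponds to the one given in Sect. 5.1, and the function name is ours.

```python
def project_to_boundary(model, x_orig, x_adv, true_label,
                        target_label=None, theta=1e-5):
    """Binary search over gamma in Eq. (6): blend x_adv towards x_orig and
    keep the largest gamma for which the blend is still adversarial."""
    lo, hi = 0.0, 1.0     # gamma = lo is adversarial, gamma = hi may not be
    while hi - lo > theta:
        mid = 0.5 * (lo + hi)
        x_mid = mid * x_orig + (1.0 - mid) * x_adv
        if indicator(model, x_mid, true_label, target_label) == 1:
            lo = mid      # still adversarial: move closer to the original
        else:
            hi = mid
    return lo * x_orig + (1.0 - lo) * x_adv
```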

Algorithm 1. SLIA (pseudo code).
Table 1. Mean \(l_{2}\)-norm distortions for performing untargeted and targeted attacks with different query budgets.
Table 2. Mean \(l_{\infty }\)-norm distortions for performing untargeted and targeted attacks with different query budgets.

5 Experiments

5.1 Experimental Settings

Dataset and Target Models. We experiment on ImageNet [18], a public large-scale labeled image dataset, to demonstrate the efficiency of our proposed method. We randomly sample 100 correctly classified test images, evenly distributed among 10 randomly selected classes. All images are clipped to [0, 1] by default in all experiments. We perform both untargeted attacks and targeted attacks towards a random class against three prevailing models: ResNet-50 [22], VGG16 [23], and DenseNet-201 [24]. All models are pretrained on ImageNet and provided by Keras online (Footnote 1).
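For reference, the three pretrained target models are available from the Keras applications module; each application also ships its own `preprocess_input`, which we assume is applied inside the hard-label query wrapper rather than shown here.

```python
import tensorflow as tf

# Pretrained ImageNet classifiers used as black-box target models.
resnet50 = tf.keras.applications.ResNet50(weights='imagenet')
vgg16 = tf.keras.applications.VGG16(weights='imagenet')
densenet201 = tf.keras.applications.DenseNet201(weights='imagenet')
```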

Compared Baseline Methods. To demonstrate the effectiveness of our method, we compare SLIA with several state-of-the-art decision-based attacks, including the Boundary Attack [19], HopSkipJumpAttack (HSJA) [21], and the Latin Hypercube Sampling based Boundary Attack (LHS-BA) [25]. We focus mainly on LHS-BA, which outperforms the Boundary Attack [19], Limited Attack [6], and HSJA [21]. We use the publicly available implementations of the three algorithms with their suggested hyperparameters. We fix the number of queries at 1K, 5K, 10K, and 20K and mainly observe the magnitude of the average distortion when performing untargeted and targeted attacks, respectively.

Evaluation Metrics. Query efficiency is the most important indicator for evaluating decision-based adversarial attacks: a method should craft adversarial examples with fewer model queries at the same distortion. Since SLIA's attack success rate is 100\(\%\), we quantify performance along two dimensions: average \(l_{p}\)-norm distortion and the specified number of queries. The \(l_{p}\)-norm is defined as:

$$\begin{aligned} \Vert \boldsymbol{x}\Vert _{p}=\left( \sum _{i=1}^{n}\left| \boldsymbol{x}_{i}\right| ^{p}\right) ^{\frac{1}{p}}, \end{aligned}$$
(7)

where the \(l_{2}\)-norm and \(l_{\infty }\)-norm are the two most commonly used metrics in the adversarial attack field. The \(l_{2}\)-norm measures the Euclidean distance between the original example and the adversarial one, and the \(l_{\infty }\)-norm measures the maximum change of any single component of the perturbation.
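For completeness, both distortions are one-liners in NumPy:

```python
import numpy as np

def l2_distortion(x_adv, x_orig):
    return float(np.linalg.norm((x_adv - x_orig).ravel()))   # Euclidean distance

def linf_distortion(x_adv, x_orig):
    return float(np.abs(x_adv - x_orig).max())               # largest single change
```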

Hyperparameters. In our proposed attack, the number of iterations and the maximum number of queries are set to 76 and 20,000, respectively. At the t-th iteration, the probe step size used in gradient direction estimation is \(\delta _{t} = \Vert \boldsymbol{x}_{t-1}-\boldsymbol{x}^{*}\Vert _{2} / n \times 4\), and the perturbation step size used when moving along the estimated gradient direction is \(\epsilon _{t} = \Vert \boldsymbol{x}_{t-1}-\boldsymbol{x}^{*}\Vert _{2} / \sqrt{t} \times 4\), where n = 224\(\times \)224\(\times \)3 is the input dimension. The number of random vectors N is initially set to 100 and gradually increased as \(N = N \times (t+1)^{\frac{1}{4}}\). The stopping threshold \(\theta \) of the binary search is set to \(n^{-\frac{3}{2}}\).
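A sketch of this per-iteration schedule; the growth rule for N is copied as written above (one possible reading), and the helper name is ours.

```python
import numpy as np

n = 224 * 224 * 3                                # input dimension
theta = n ** (-1.5)                              # binary-search stopping threshold

def schedule(t, x_prev, x_orig, N_prev):
    """Per-iteration step sizes and sample count as described above (t >= 1)."""
    dist = np.linalg.norm((x_prev - x_orig).ravel())
    delta_t = dist / n * 4                       # probe step for gradient estimation
    eps_t = dist / np.sqrt(t) * 4                # step along the estimated gradient
    N_t = int(N_prev * (t + 1) ** 0.25)          # N = N x (t+1)^(1/4)
    return delta_t, eps_t, N_t
```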

5.2 Experimental Results

To evaluate SLIA's performance, we report the mean \(l_{2}\)-norm and \(l_{\infty }\)-norm distortions in Tables 1 and 2 for untargeted and targeted attacks. The distortion descending curves of the various algorithms under different query budgets are given in Fig. 5. Two qualitative examples of attacking ResNet-50 with different attack methods are shown in Figs. 6 and 7, respectively.

Fig. 5.

\(l_{2}\)-norm distortions across various model queries on ImageNet with ResNet-50. 1st column: untargeted attacks. 2nd column: targeted attacks.

Untargeted Attacks. As shown in the untargeted-attack sections of Tables 1 and 2, our method outperforms existing decision-based attacks by a large margin under every fixed number of model queries. SLIA also converges within fewer queries, as shown in Fig. 5.

The advantages of SLIA are especially obvious in the early stages of the attack. When the number of model queries does not exceed 10K: (1) under the \(l_{2}\)-norm distance metric, SLIA reduces the distortion to 56\(\%\) of that of HSJA and about 67\(\%\) of that of LHS-BA; (2) under the \(l_{\infty }\)-norm distance metric, the distortion of adversarial examples constructed by SLIA is about 64\(\%\) lower than that of HSJA and about 45\(\%\) lower than that of LHS-BA. These results demonstrate that our method crafts adversarial examples quickly without using too many queries.

Fig. 6.

Visualized trajectories of HSJA [21], LHS-BA [25], and SLIA for untargeted attacks on ResNet-50. 1st column: initialization. 2nd-9th columns: images after being blended with the original image and at 1K, 5K, 10K, 20K model queries. 10th column: original image. D is the \(l_2\)-norm distortion between the adversarial image and the original image.

This is due to two factors: (1) in the initialization step, we replace the low-frequency component of the original example with uniform noise and leave the high-frequency components untouched, so more details of the original example are preserved while still making the model misclassify; (2) when estimating the gradient, we apply the DWT to extract the low-frequency component of the example and estimate the gradient only for it, which reduces the dimension of the gradient to be estimated to 1/4 of the original space. With the same number of sampled Gaussian noises, the gradient can therefore be estimated more accurately than in the original full space.

Targeted Attacks. We randomly select a target label and pick one image belonging to it, which serves as the initialization image for all targeted attacks. The results are presented in the lower parts of Tables 1 and 2. SLIA not only outperforms HSJA [21] but also surpasses the latest gradient-estimation-based boundary attack, LHS-BA [25]. In the qualitative comparison in Fig. 7, when the number of model queries is fixed at 5,000 (4th column), the adversarial example crafted by SLIA is visually closer to the original example than those of the other two attacks. Under a limited number of queries, SLIA produces adversarial examples with significantly smaller distortions from the corresponding original example; in other words, under the same distortion, SLIA requires fewer queries than the state-of-the-art methods. We also find that SLIA requires more model queries to achieve a comparable distortion in targeted attacks than in untargeted attacks. This phenomenon is evident on ImageNet, which has many categories: for the same number of queries, there is often an order-of-magnitude difference in average \(l_{p}\)-norm distortion between untargeted and targeted attacks.

Fig. 7.

Visualized trajectories of HSJA [21], LHS-BA [25], and SLIA for targeted attacks on ResNet-50. 1st column: initialization. 2nd-9th columns: images after being blended with the original image and at 1K, 5K, 10K, 20K model queries. 10th column: original image. D is the \(l_2\)-norm distortion between the adversarial image and the original image.

6 Conclusion

In this work, we present a query-efficient adversarial example generation algorithm (SLIA) specific to ImageNet-scale images. SLIA maintains a 100\(\%\) attack success rate in settings where the attacker only has access to the final decisions of a model. We generate adversarial examples by estimating the gradient of the low-frequency component, which greatly reduces the dimension of the gradient to be estimated. When attacking a variety of ImageNet models, our method reduces the distortion faster than state-of-the-art attacks under different query budgets.