
1 Introduction

Deep learning models have recently found applications in autonomous driving and face recognition tasks. These tasks are safety-critical, involving potential risks to life and information privacy. It has been observed that even small perturbations to the model input can significantly alter predictions [26], leading to worst-case scenarios such as accidents in autonomous driving or information leakage in face recognition. Adversarial attacks are used to identify such perturbations that cause misclassifications. To counter these attacks, defense methods such as adversarial training [28, 32] and adversarial detection [3, 19] have been explored. However, more potent attacks are needed to develop more effective defenses.

This study targets black-box adversarial attacks, which operate under real-world constraints where only the model’s predictions can be accessed. We focus on black-box adversarial attacks that aim to maximize attack success rates within allowed perturbations. A promising approach is to repeat the process of extracting a specific image area and changing the perturbations added to it.

Existing attacks use simple rectangles as the areas where perturbations are changed in a single iteration (Sect. 3.1). However, it is natural to determine the areas based on the image’s color information, as it directly influences the perturbation to be added. Therefore, we focus on the color variance of the area where perturbations are changed in a single iteration (Sect. 3.2). Additionally, we focus on the compactness of the area, because existing attacks have adopted rectangles (Sect. 3.3). Through our analysis of the relationship among color variance, compactness, and attack success rates (Sect. 3.4), we discovered that areas that are compact and have a low color variance result in higher attack success rates (Sect. 3.5). Consequently, we propose applying superpixels, which achieve a good balance between color variance and compactness.

Additionally, we introduce versatile search, a new search method that restricts the search to the boundary of perturbation and allows for searches using areas beyond rectangles. With these advancements, we propose Superpixel Attack, a novel attack method that applies superpixels and performs versatile search (Sects. 4.1 and 4.2). To evaluate the performance of Superpixel Attack, we conducted comparison experiments with existing attacks using 19 models trained on the ImageNet dataset [13] and available on RobustBench [8] (Sect. 5). Superpixel Attack significantly enhances attack success rates, resulting in an average improvement of 2.10% compared to existing attacks. Considering that most models used in this study are robust against adversarial attacks, this improvement becomes especially noteworthy for black-box adversarial attacks. Our contributions can be summarized as follows:

  1. We analyze the relationship among color variance, compactness, and attack success rates.

  2. We propose applying superpixels to black-box adversarial attacks and introduce a new search method called versatile search.

  3. We conducted comparison experiments on Superpixel Attack, which applies superpixels and performs versatile search, and found that it improves attack success rates by an average of 2.10% compared to existing attacks.

2 Preliminaries

2.1 Problem Definition

Let \(H \in \mathbb {N}\) be the height, \(W \in \mathbb {N}\) be the width, and \(C \in \mathbb {N}\) be the number of color channels of the input image. Let \(\mathcal {D} = [0, 1]^{H \times W \times C}\) denote the image space, \(Y \in \mathbb {N}\) denote the number of classes of the model, and \(f: \mathcal {D} \rightarrow [0, 1]^Y\) denote the classification model. The output of f is the predicted probability of each class, and we denote by \(f_i(x) \in [0, 1]\) the predicted probability of class i when image \(x \in \mathcal {D}\) is the input. Adversarial attacks aim to find an image \(x_{adv} \in \mathcal {D}\) whose predicted label differs from the ground truth label \(y \in \{1, \dots , Y\}\) of the original image \(x_{org} \in \mathcal {D}\) by adding perturbations that are imperceptible to humans. The inputs generated by adversarial attacks are called adversarial examples. This study focuses on adversarial attacks that maximize attack success rates within the allowed perturbations. We set the allowed perturbation size \(\epsilon \in \mathbb {R}^+\) and the loss function \(L: [0, 1]^Y \times \{1, \dots , Y\} \rightarrow \mathbb {R}\), and solve the following constrained nonlinear optimization problem:

$$\begin{aligned} \begin{aligned} & \max _{x_{adv} \in \mathcal {D}} \quad L \left( f(x_{adv}), y \right) \\ & \quad \text {s.t.} \qquad ||x_{adv} - x_{org}||_{\infty } \le \epsilon \end{aligned} \end{aligned}$$
(1)
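To make this setting concrete, the sketch below checks whether a candidate image satisfies the constraint in Eq. (1) and fools the model. The function `model` and all variable names are illustrative stand-ins rather than part of the original formulation.

```python
import numpy as np

# Minimal sketch of the setting in Eq. (1); `model`, which returns the
# predicted probabilities f(x), and all names here are illustrative.
def is_valid_adversarial(model, x_org, x_adv, y, epsilon):
    """Check the L-infinity budget and whether x_adv is misclassified."""
    # Constraint: ||x_adv - x_org||_inf <= epsilon
    within_budget = np.max(np.abs(x_adv - x_org)) <= epsilon + 1e-12
    # Attack succeeds when the predicted label differs from the true label y.
    misclassified = int(np.argmax(model(x_adv))) != y
    return within_budget and misclassified
```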

2.2 Related Work

Parsimonious attack [20], Square Attack [4], and SignHunter [2] have been proposed as black-box adversarial attacks for the problem defined by Eq. (1). Parsimonious attack restricts the search space to the boundaries of the allowed perturbations because attacks mostly succeed even on the boundaries. Square Attack achieves high success rates despite its reliance on random sampling. It is part of AutoAttack [10], a well-known ensemble of adversarial attacks. SignHunter searches for adversarial examples by repeatedly dividing the image and estimating the gradient direction.

Black-box adversarial attacks that minimize perturbations under misclassification [22, 27] and those that reduce the number of perturbed pixels [9, 11] have also been investigated. Attacks that generate adversarial examples from the gradient information of surrogate models have also been proposed [21, 31]. These methods rely on transferability, that is, adversarial examples crafted for one model are often adversarial for others. However, training is required to make the surrogate model resemble the attacked model, which incurs high computational costs.

3 Research on Update Areas

3.1 Update Areas of Existing Methods

The most promising approach for black-box adversarial attacks defined by Eq. (1) involves searching for adversarial examples by repeating the following steps: i. Extract a specific area from the image, ii. Collectively change the perturbation added to the extracted area, iii. Calculate the value of the loss function and update the perturbations when the loss increases. In this paper, we refer to the area where perturbations are changed in a single iteration as Update Area. Existing black-box adversarial attacks have adopted simple rectangles as Update Areas. Parsimonious attack sets them using squares that divide the image equally. Square Attack sets them using randomly sampled squares from a uniform distribution. SignHunter sets them using rectangles that divide the image into equal horizontal sections.
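For orientation, a skeleton of this loop might look as follows; `extract_area` and `propose_perturbation` are hypothetical helpers standing in for the method-specific choices described above.

```python
import numpy as np

# Illustrative skeleton of the three steps above. `extract_area(t)` returns a
# mask or index for the Update Area of iteration t, and `propose_perturbation`
# suggests new values for it; both are hypothetical helpers, and existing
# attacks differ mainly in how they implement them (squares, stripes, etc.).
def iterative_attack(x_org, y, loss, extract_area, propose_perturbation,
                     epsilon, max_iters):
    delta_best = np.zeros_like(x_org)
    loss_best = loss(x_org, y)
    for t in range(max_iters):
        area = extract_area(t)                          # i. extract a specific area
        delta = delta_best.copy()
        # ii. collectively change the perturbation in the extracted area,
        #     clipped to the allowed L-infinity budget
        delta[area] = np.clip(propose_perturbation(area, t), -epsilon, epsilon)
        x_adv = np.clip(x_org + delta, 0.0, 1.0)
        l = loss(x_adv, y)                              # iii. evaluate the loss
        if l > loss_best:                               # keep the change if the loss increases
            loss_best, delta_best = l, delta
    return np.clip(x_org + delta_best, 0.0, 1.0)
```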

3.2 Color Variance of Update Areas

As described in the previous section, Update Areas of existing attacks are set using simple rectangles. However, it is natural to determine these areas by considering the image's color information, because that information directly influences the perturbation to be added. Therefore, we focus on the color variance of Update Areas. Intra-Cluster Variation (ICV) [5] has been proposed as a metric that expresses the color variance within the divided areas of an image. ICV is calculated based on the following equation:

$$\begin{aligned} \text {ICV} = \frac{1}{\#\tilde{S}} \sum _{s \in \tilde{S}}{\frac{\sqrt{\sum _{p \in s}(I(p) - \mu (s))^2}}{|s|}}, \end{aligned}$$
(2)

where \(\tilde{S}\) is the set of image segmentations. In this paper, it refers to the set of all Update Areas used in an attack. \(s\in \tilde{S}\) denotes a single Update Area, and \(p \in s\) denotes a pixel. I(p) is the value of the pixel p in the LAB color space, and \(\mu (s)\) is the average value in the LAB color space within a single Update Area. \(\#\tilde{S}\) is the number of Update Areas and |s| is the number of pixels in a single Update Area. A smaller ICV indicates smaller color variance in each Update Area.
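A direct reading of Eq. (2) can be written as follows; here `labels` is assumed to be an integer map that assigns each pixel to one Update Area, which is our own interface choice for illustration.

```python
import numpy as np
from skimage.color import rgb2lab

# A direct reading of Eq. (2); `labels` is an integer map assigning each
# pixel to one Update Area (this interface is an assumption for illustration).
def intra_cluster_variation(image_rgb, labels):
    lab = rgb2lab(image_rgb)                      # I(p) in the LAB color space
    icv_terms = []
    for s in np.unique(labels):
        pixels = lab[labels == s]                 # all pixels p in Update Area s
        mu = pixels.mean(axis=0)                  # mu(s): mean LAB value of s
        # square root of the summed squared deviation, divided by the size |s|
        icv_terms.append(np.sqrt(((pixels - mu) ** 2).sum()) / len(pixels))
    return float(np.mean(icv_terms))              # average over all Update Areas
```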

3.3 Compactness of Update Areas

Furthermore, considering that existing attacks use rectangles to set Update Areas, we focus on the compactness of Update Areas. Compactness (CO) [24] is a metric obtained by dividing the size of a segment by that of a circle with the same perimeter length. It is defined by the following equation:

$$\begin{aligned} \text {CO} = \frac{\sum _{s \in \tilde{S}} Q(s) \cdot |s|}{\sum _{s \in \tilde{S}} |s|}, \qquad Q(s) = \frac{4 \pi |s|}{|R(s)|^2}, \end{aligned}$$
(3)

where |R(s)| is the perimeter length of Update Area s (the number of pixels on its boundary). A higher CO indicates more centrally clustered Update Areas. We examine ICV, CO, and attack success rates for various Update Area constructions in Sect. 3.5.
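Under the same label-map interface as the ICV sketch, CO from Eq. (3) can be computed as below; counting boundary pixels with a 4-neighbourhood is our own reading of "number of pixels on the boundary".

```python
import numpy as np

# Area-weighted compactness from Eq. (3). |R(s)| is taken here as the number
# of pixels of s that touch a different area or the image border.
def compactness(labels):
    h, w = labels.shape
    padded = np.pad(labels, 1, mode="constant", constant_values=-1)
    # A pixel lies on the boundary if any 4-neighbour has a different label.
    boundary = np.zeros((h, w), dtype=bool)
    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        boundary |= padded[1 + dy:h + 1 + dy, 1 + dx:w + 1 + dx] != labels
    num, den = 0.0, 0.0
    for s in np.unique(labels):
        mask = labels == s
        area = mask.sum()                       # |s|
        perim = (mask & boundary).sum()         # |R(s)|
        q = 4.0 * np.pi * area / (perim ** 2)   # isoperimetric quotient Q(s)
        num += q * area
        den += area
    return num / den
```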

3.4 Superpixel Calculated by SLIC

A superpixel is a set of pixels that are close in color and position. Superpixels have applications in object detection [30], semantic segmentation [16], and depth estimation [7]. Dong et al. proposed a white-box adversarial attack that adds the same perturbation to each superpixel to avoid disrupting the local smoothness of a natural image [14]. We use superpixels to improve the efficiency of black-box adversarial attacks. To the best of our knowledge, no black-box adversarial attack that applies superpixels has been proposed. Various methods have been proposed for computing superpixels; we use one of the most popular, the Simple Linear Iterative Clustering (SLIC) algorithm [1]. It places representative points at equal intervals according to the maximum number of segments and clusters pixels based on the k-means method. Let \((h_i, w_i)\) and \((h_j, w_j)\) be the positions in the image, and \((l_i, a_i, b_i)\) and \((l_j, a_j, b_j)\) be the values in the LAB color space. Clustering is performed based on the similarity k:

$$\begin{aligned} \begin{aligned} k_{color} = \sqrt{(l_i - l_j)^2 + (a_i - a_j)^2 + (b_i - b_j)^2} \\ k_{space} = \sqrt{(h_i - h_j)^2 + (w_i - w_j)^2} \\ k = \max (0,\ k_{color} + \alpha \cdot k_{space}), \end{aligned} \end{aligned}$$
(4)

where \(\alpha \) is a hyperparameter that weighs the positional distance relative to the color distance. Superpixels are generally calculated with \(\alpha = 10\). We examine the relationship between ICV, CO, and attack success rates for \(\alpha = \pm 0.1\), \(\pm 1\), \(\pm 10\), \(\pm 100\), \(\pm 1000\) in Sect. 3.5. In addition, the SLIC implementation of scikit-image has the option to force each superpixel to be connected; the experiment in Sect. 3.5 examines both cases. For \(\alpha = 1000\), Update Areas are constructed as squares that divide the image equally, regardless of whether they are forced to be connected.
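For reference, the scikit-image call below computes superpixels with enforced connectivity. The sample image and parameter values are illustrative; the spatial weight corresponds to the `compactness` argument (\(\alpha = 10\) here), while the negative \(\alpha \) values examined in the analysis would require a modified distance.

```python
from skimage.segmentation import slic
from skimage.data import astronaut

# Computing superpixels with scikit-image's SLIC implementation.
image = astronaut()                  # any H x W x 3 RGB image
labels = slic(
    image,
    n_segments=16,                   # maximum number of segments n
    compactness=10,                  # weight of positional vs. color distance
    enforce_connectivity=True,       # force each superpixel to be connected
    start_label=0,
)
print(labels.shape, labels.max() + 1)  # the actual number of segments may differ from n
```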

3.5 Analysis of Color Variance and Compactness

Fig. 1. Relationship between ICV, CO, and attack success rates

The experiments use Salman et al. (ResNet-18) [23], trained on the ImageNet dataset and available on RobustBench. Following the RobustBench settings, we use 5,000 images randomly sampled from the ImageNet dataset, and the allowed perturbation size is set to \(\epsilon = 4/255\). We adopt versatile search, the new search method proposed in Sect. 4.2. We examine attack success rates at the maximum number of iterations \(T = 500\) for each Update Area construction. The attack success rate is calculated as (number of misclassified images after the attack) / (total number of images); the higher the attack success rate, the more powerful the attack. The seed value is fixed at 0. The experiments were run on two Intel(R) Xeon(R) Gold 5220R CPUs @ 2.20 GHz, an Nvidia RTX A6000 GPU, and 768 GB of RAM. The results are shown in Fig. 1.
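The attack success rate used throughout the experiments is the simple ratio defined above; the variable names below are ours.

```python
# Attack success rate: misclassified images after the attack / total images.
def attack_success_rate(num_misclassified_after_attack, num_images):
    return num_misclassified_after_attack / num_images
```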

Each point in Fig. 1 represents the values of CO and ICV for a different Update Area construction, and the numerical value next to each point is its attack success rate. The horizontal axis represents the value of CO, where the right side indicates that more centrally clustered Update Areas are constructed. The vertical axis represents the value of ICV, where the upper side indicates that Update Areas with lower color variance are constructed. Note that the same Update Areas are constructed for some values of \(\alpha \), so points with equal ICV, CO, and attack success rates coincide with each other. For some representative points, the Update Areas generated by the SLIC algorithm are shown in different colors. This result indicates that it is effective to set Update Areas that are compact and have low color variance.

4 Superpixel Attack

Based on the analysis in Sect. 3, we consider applying superpixels, which achieve a good balance between color variance and compactness, to black-box adversarial attacks. In this section, we describe the construction of Update Areas using superpixels (Sect. 4.1) and a new search method called versatile search (Sect. 4.2). We propose a novel attack method called Superpixel Attack that sets Update Areas using superpixels and performs versatile search. An overview of Superpixel Attack is shown in Fig. 2, and the pseudo-code is shown in Algorithm 1.

4.1 Update Areas Using Superpixels

Below, we describe the construction of Update Areas using superpixels. Inspired by existing attacks, Update Areas are set using a few superpixel segments at an early stage and many segments as the attack progresses. Specifically, given the segment ratio r, superpixels \(\mathcal {S}\) are computed with the maximum number of segments \(n = r^j\) \((j = 1, 2, \dots )\). Let S be the set of Update Areas constructed for each maximum number of segments n. The original image \(x_{org}\) is divided into superpixels \(\mathcal {S}\) for each RGB color channel \(\{1, \dots , C\}\), and these are set as the Update Areas \(S = \mathcal {S} \times \{1, \dots , C\}\). Note that the maximum number of segments n is not always equal to the number of superpixels \(\#\mathcal {S}\) actually computed by the SLIC algorithm employed in this study. The segment ratio is set to \(r=4\) based on preliminary experiments. We set \(\alpha = 10\) and force the areas to be connected.
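A compact sketch of this construction, assuming the scikit-image SLIC implementation and representing each Update Area as a (pixel mask, channel) pair, which is our own data-structure choice:

```python
import numpy as np
from skimage.segmentation import slic

# Sketch of the Update Area construction in Sect. 4.1: superpixels are computed
# with the maximum number of segments n = r**j and crossed with the color channels.
def build_update_areas(x_org, j, r=4, alpha=10):
    n = r ** j                                           # maximum number of segments
    seg = slic(x_org, n_segments=n, compactness=alpha,
               enforce_connectivity=True, start_label=0)
    areas = []
    for label in np.unique(seg):                         # each superpixel in S
        mask = seg == label
        for c in range(x_org.shape[2]):                  # each color channel
            areas.append((mask, c))                      # one Update Area (s, c)
    return areas                                         # S = superpixels x {1,...,C}
```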

Fig. 2. Flow of proposed method: Superpixel Attack

Algorithm 1. Superpixel Attack

4.2 Procedure of Versatile Search

Below, we describe a new search method called versatile search. It searches only the boundaries of the allowed perturbations \(\{-\epsilon , \epsilon \}^{H \times W \times C}\), following the analysis by Moon et al. [20]. At the beginning of the search, the perturbations are initialized as \(\mathcal {E}_{best} = \{\epsilon \}^{H \times W \times C}\). Let \(\mathcal {A}\) be the entire area of the image and initialize the set of Update Areas as \(S=\{\mathcal {A}\}\). The best loss is initialized as \(\mathcal {L}_{best}=-\infty \). The following steps are repeated until the number of iterations t reaches the maximum number of iterations T.

First, the next area in which the perturbations are changed, \(s\in S\), is extracted at random. In the first iteration, the Update Area is set to the entire image (\(s=\mathcal {A}\)). Only the perturbations in the extracted Update Area \(\mathcal {E}_{best}[s]\) are flipped to generate new perturbations \(\mathcal {E}\). These perturbations \(\mathcal {E}\) are added to the original image \(x_{org}\), and the loss \(\mathcal {L}\) is calculated. When the calculated loss \(\mathcal {L}\) is higher than the best loss \(\mathcal {L}_{best}\), the best loss \(\mathcal {L}_{best}\) and the perturbations \(\mathcal {E}_{best}\) are updated. When all Update Areas have been searched (\(S = \emptyset \)), new superpixels are computed and new Update Areas are set using them.
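The following condensed sketch puts these steps together, reusing the hypothetical `build_update_areas` helper from the Sect. 4.1 sketch; scheduling details and bookkeeping from Algorithm 1 are simplified here.

```python
import numpy as np

# Condensed sketch of versatile search; `loss_fn(x, y)` returns a scalar loss
# and `build_update_areas(x, j)` follows the sketch in Sect. 4.1.
def versatile_search(x_org, y, loss_fn, build_update_areas, epsilon, max_iters, rng):
    delta_best = np.full(x_org.shape, epsilon)       # initialize on the boundary {+eps}
    loss_best = -np.inf
    # The first Update Area is the entire image (channel index None means "all").
    areas = [(np.ones(x_org.shape[:2], dtype=bool), None)]
    j = 0
    for t in range(max_iters):
        if not areas:                                # all areas searched: new superpixels
            j += 1
            areas = build_update_areas(x_org, j)
        mask, c = areas.pop(rng.integers(len(areas)))  # pick an Update Area at random
        delta = delta_best.copy()
        if c is None:
            delta[mask] = -delta[mask]               # flip every channel of the area
        else:
            delta[mask, c] = -delta[mask, c]         # flip a single color channel
        loss = loss_fn(np.clip(x_org + delta, 0.0, 1.0), y)
        if loss > loss_best:                         # keep the flip if the loss improves
            loss_best, delta_best = loss, delta
    return np.clip(x_org + delta_best, 0.0, 1.0)
```

A generator such as `np.random.default_rng(0)` can be passed as `rng` to fix the seed, mirroring the experimental setup.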

When the attack is completed, the image with the best loss \(x_{best}\) is returned. Superpixel Attack employs the CW loss [6] (\(L_{cw}\)) as the loss function, based on preliminary experiments. The CW loss is calculated as follows:

$$\begin{aligned} L_{cw}(f(x), y) = \max _{{i \ne y}}f_i(x) - f_y(x) \end{aligned}$$
(5)
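In code, Eq. (5) amounts to a one-line comparison of predicted probabilities; `probs` is assumed to be the model output f(x) for a single image.

```python
import numpy as np

# CW loss from Eq. (5): the highest non-true-class probability minus the
# probability of the true class; positive values indicate misclassification.
def cw_loss(probs, y):
    other = np.delete(probs, y)      # f_i(x) for all i != y
    return float(other.max() - probs[y])
```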

5 Experiments

In this section, we describe the comparison experiments conducted to confirm the performance of Superpixel Attack. We compare it to Parsimonious attack (Parsimon) [20], Square Attack (Square) [4], SignHunter (SignH) [2], and Accelerated SignHunter (AccSignH) [17] as baselines. All of these are black-box adversarial attacks with the same problem setting. The experiments use 19 models trained on the ImageNet dataset and available on RobustBench. Following the RobustBench settings, we use 5,000 images randomly sampled from the ImageNet dataset, and the allowed perturbation size is set to \(\epsilon = 4/255\). We examine the attack success rates at the maximum number of iterations \(T = 100\) and 1000. The baseline hyperparameters are the same as those in the original papers. The seed value is fixed at 0. We use the same computational environment as in Sect. 3.5. Table 1 presents the results. The highest attack success rate for each iteration budget is bolded, and the difference between the best baseline method and Superpixel Attack is noted on the right side.

Table 1. Comparison experiments with baselines

The results in Table 1 show that Superpixel Attack improves the attack success rates by an average of 1.65% for 100 iterations and 2.10% for 1000 iterations compared to existing attacks. Most models used in this study are robust against adversarial attacks, so this improvement is significant for black-box adversarial attacks. In fact, the difference between the best and second-best existing attacks averaged only 0.67% for 100 iterations and 0.71% for 1000 iterations. For Wong (ResNet-50), PyTorch (ResNet-50), and Singh (ViT-S+ConvStem), we plot the attack success rates per iteration for each attack method in Fig. 3.

Fig. 3. Transition of attack success rates of each attack method

Figure 3 indicates that Superpixel Attack achieves high success rates across all iterations, including on the PyTorch (ResNet-50) model, in contrast to SignHunter, which achieves high success rates only in the early iterations. For the other models, each attack method exhibits trends similar to those on Wong (ResNet-50) and Singh (ViT-S+ConvStem). Furthermore, Fig. 4 shows the computational time for superpixel computation and forward propagation in Superpixel Attack. Although it depends on the computational environment, the computation time for superpixels is less than that for forward propagation. This indicates that applying superpixels to adversarial attacks is practical in terms of computation time. For 1000 iterations, the superpixel computation accounts for only a very small fraction of the attack time, as indicated by the orange bars.

Fig. 4. Computational time of superpixels and forward propagation

6 Conclusion

This study demonstrated that attack success rates are related to the color variance and compactness of Update Areas. The experimental results suggest that Update Areas with low color variance and high compactness are desirable. Therefore, we proposed Superpixel Attack, which employs superpixels as Update Areas to achieve a good balance between color variance and compactness. The comparison experiments show that Superpixel Attack improves attack success rates by an average of 2.10% compared with existing methods for 1000 iterations, which is significant for black-box adversarial attacks. This study indicates that adjusting Update Areas according to the image can enhance attack success rates.