1 Introduction

Deep learning has helped solve many crucial artificial intelligence problems over the past decade, some of which had remained open for a long time [1]. Nowadays, deep neural networks (DNNs) are employed to solve complex problems in various fields such as machine vision [2], natural language processing [3], big data processing [4], DNA analysis [5], and autonomous vehicles [6,7,8,9]. The rapid progress of deep learning is rooted not only in the high capability and performance of deep learning approaches but also in other factors: the concentrated efforts of researchers and the rapid improvement of deep learning models [10], the discovery of important new applications [11], hardware advancements [12], the increased power of processors and graphics processing units [13], and the development of various software libraries and platforms [14]. Today, DNNs play a vital role in our lives and can even affect our health and well-being [15]. Owing to their high precision and low error rate, and thus the trust they have earned, DNNs have been reliably employed in sensitive areas [16] such as health [17], face recognition [18], autonomous flying vehicles [19], and many other fields. Deep learning approaches, especially those associated with machine vision, play a significant role in our day-to-day activities, and this role will become even more prominent in the future [20].

Machine vision is the field in which most deep learning breakthroughs are rooted [21]. Before the advent of deep learning, conventional machine vision techniques performed considerably worse [22], and deep learning dramatically improved the precision of machine vision systems [23]. DNNs helped improve machine perception to the point that it surpasses natural human vision in some cases [24]. Within a short time, deep learning models were employed in different machine vision applications such as classification [25], segmentation [26], semantic segmentation [27], face recognition [28], object detection [29], and tracking [30].

Many high-precision object detectors based on DNN models have been introduced in recent years. The main task, and challenge, of these detectors is to locate the target objects in an image and assign them to the correct classes [31]. Object detectors are first trained on a set of labeled images; the trained DNN then performs inference on unlabeled input images, producing bounding boxes and class categories for the object types seen during training [32]. Figure 1 shows a basic overview of an object detection pipeline.

Fig. 1 The basic operation pipeline of an object detector

As can be observed in Fig. 1, after an image passes through the layers of an object detection network, the target objects and their positions in the input image are detected. Detecting objects in video recordings is an essential task for autonomous vehicles and, due to the importance of this field, has been the subject of numerous investigations [33]. Processing objects in videos is complex, because each frame extracted from a recording suffers from degraded quality and must be enhanced independently [34]. In certain applications it is also necessary to perform object detection and object tracking simultaneously; for this purpose, a new task called "Video Instance Segmentation" has been introduced in the field of video processing [35].

In the last several years, many fast and high-precision object detection models have been presented [36]. Some of the most important ones include the different versions of YOLO [37], Faster R-CNN (FRCNN) [38], and SSD [39]. These models are generic object detectors: their task is to detect all objects in an image and to outline their positions with bounding boxes [40]. Another form of object detection is salient object detection, in which the detector tries to find the visually dominant objects in an image [41].

A critical flaw of DNNs was discovered by Szegedy et al. [42], who showed that these networks are highly vulnerable to adversarial attacks. Adversarial attacks are small perturbations that are normally imperceptible to the human eye but can completely mislead a DNN [43]. Since the publication of these findings [42], a significant part of deep learning research has been devoted to adversarial attacks and defenses [44,45,46,47,48,49,50,51,52,53,54,55,56,57].

While the field of adversarial robustness has produced a great deal of sophisticated attack and defense methods, the majority of the work has focused on image classification because of its simplicity in theory and practice [58]. As a result, there has been little coherent effort to survey the state-of-the-art research on the adversarial robustness of object detectors. In the real world, however, object detectors are deployed far more widely than classifiers, and dedicated studies of their robustness are therefore required.

Considering the significance of adversarial robustness in detection DNNs, in this paper we review the most important articles on adversarial attacks, defenses, and, more generally, adversarial robustness in object detection, as outlined in Fig. 2. In the discussion of attacks, we trace their progress over time; in the discussion of defenses, we present the advantages and disadvantages of each method. Table 1 gives an overall comparison of the different adversarial attacks based on how much they reduce the mean average precision (mAP) [59], a standard metric for measuring the performance of detection models. Since most of the examined attacks can be applied to FRCNN, this model was used for comparing the mAP values.

Fig. 2 A categorization of the methods discussed

Table 2 shows an overall comparison of the defense techniques against adversarial attacks. In this table, the precision improvements are reported for the YOLO model under the DAG attack, because most of the defense strategies report their results for this model and this type of attack.

The remainder of this paper discusses recent work on adversarial attacks and defenses for object detection. The problem formulation and terminology are discussed in Sect. 2. In Sect. 3, a general description of different types of adversarial attacks is presented. Some of the more prominent adversarial defenses for robustifying DNN object detectors are introduced in Sect. 4. The performance of adversarial attacks in sample application scenarios, such as autonomous vehicles and face detection, is investigated in Sect. 5, and the conclusion is presented in Sect. 6.

2 Problem formulation and terminology

Adversarial attacks consist of small perturbations that are usually imperceptible to the human eye but can be mixed into clean images and contaminate them. As stated in the introduction, these attacks are able to mislead deep learning models and reduce their accuracy [66]. Figure 3 shows some example images perturbed by adversarial attacks. As observed in this figure, the detection models are deceived by these perturbations to a large extent and make wrong detections. Let us formally define an adversarial attack. Suppose \(O(\cdot)\) is an object detection model and let \(x\) be a clean input image. We expect the output to be a set of object labels \(L = \{ l_{1}, l_{2}, \dots, l_{n} \}\), where \(n\) is the number of detected objects in the input image. Normally, we expect the object detector to act according to the following equation:

Fig. 3 Some examples of clean images and those perturbed by adversarial attacks, and the outcomes of the DNNs used to detect objects within these images. After perturbation, the network is not able to detect the objects anymore

$$O\left(x\right)\to L=\left\{{l}_{1},{l}_{2},\dots ,{l}_{n}\right\}.$$
(1)

Now suppose we add a small perturbation \(\rho\) to the image. The output of the detector changes to:

$$O\left(x+\rho \right)\to {\overline{L}}=\left\{{{\overline{l}}_{1}},{{\overline{l}}_{2}},\dots ,{{\overline{l}}_{\overline{n}}}\right\}.$$
(2)

Experimental results show that usually \(L \ne \overline{L}\), and in some cases \(\overline{L}\) and \(L\) are even mutually exclusive; that is, in some attacks the model is deceived in such a way that it does not recognize even a single one of the original objects in the input image.
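The behavior captured by Eqs. (1) and (2) can be checked directly with any off-the-shelf detector. The sketch below uses a pretrained torchvision Faster R-CNN as a stand-in for \(O(\cdot)\) and a random (not adversarial) perturbation as a placeholder for \(\rho\); it only illustrates how the label sets \(L\) and \(\overline{L}\) are obtained and compared, assuming a recent torchvision version.

```python
# A minimal illustration of Eqs. (1)-(2): run a detector O(.) on a clean image x and on
# x + rho, then compare the resulting label sets L and L_bar. The random noise below is
# only a placeholder for an actual adversarial perturbation.
import torch
import torchvision

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

x = torch.rand(3, 480, 640)               # placeholder clean image with values in [0, 1]
rho = 0.03 * torch.randn_like(x)          # small perturbation (random here, adversarial in practice)
x_adv = (x + rho).clamp(0.0, 1.0)

with torch.no_grad():
    out_clean = detector([x])[0]          # dict with 'boxes', 'labels', 'scores'
    out_adv = detector([x_adv])[0]

confident = lambda out: out["labels"][out["scores"] > 0.5]
L_clean = set(confident(out_clean).tolist())
L_adv = set(confident(out_adv).tolist())
print("L:", L_clean, "L_bar:", L_adv, "changed:", L_clean != L_adv)
```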

There are numerous technical terms pertaining to the subject of adversarial attacks; we define the most common ones here.

The expression "adversarial perturbation" refers to a disturbance that is embedded into a clean image to turn it into an adversarial example [66]. An adversarial example, i.e., the perturbed version of a clean image, is intended to mislead or deceive a machine learning technique such as a DNN [67]. In the literature on adversarial attacks, "adversarial training" refers to a method of network training that uses images perturbed by such attacks [68]. At a high level, adversarial attacks can be divided along two axes: 'targeted' versus 'untargeted' and 'black-box' versus 'white-box'. Untargeted attacks do not care which wrong labels are produced; their only aim is to deceive the object detection model so that objects receive incorrect labels [69]. Targeted attacks, in contrast, aim to deceive a model into assigning a particular label to a specific object; in fact, they are devised for a certain class of objects [70]. Methods for generating adversarial perturbations can further be divided into 'one-shot' and 'iterative' approaches. One-shot methods produce the adversarial disturbance with a single computation (e.g., a one-time computation of a model's loss gradient [71]), whereas iterative methods perform the computation several times to generate a single disturbance, which is usually more costly than the one-shot procedure [72]. These are the most common terms used in the literature on adversarial attacks, and we will use them in the following sections; a minimal sketch contrasting the one-shot and iterative settings follows.
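The following sketch contrasts one-shot and iterative perturbation generation. The helper `detector_loss(model, x, targets)` is a hypothetical differentiable surrogate loss and is not taken from any specific paper; the iterative variant projects each step back into an \(\varepsilon\)-ball around the clean image.

```python
# A sketch contrasting one-shot and iterative perturbation generation. The function
# detector_loss(model, x, targets) is a hypothetical differentiable surrogate loss.
import torch

def one_shot_perturbation(model, x, targets, eps, detector_loss):
    # One-shot: a single gradient computation followed by one sign step (FGSM-style).
    x = x.detach().clone().requires_grad_(True)
    loss = detector_loss(model, x, targets)
    loss.backward()
    return eps * x.grad.sign()

def iterative_perturbation(model, x, targets, eps, alpha, steps, detector_loss):
    # Iterative: several smaller steps, each projected back into the eps-ball around x.
    x_adv = x.detach().clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = detector_loss(model, x_adv, targets)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)   # projection step
        x_adv = x_adv.clamp(0.0, 1.0).detach()
    return x_adv - x
```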

3 Different types of attacks on object detection

In this section, we review the most common and frequently used adversarial attacks introduced in the field of object detection. We review the effects of these attacks on numerous datasets and models. An adversarial attack is considered to be more effective if it has a higher fooling rate and is able to reduce the accuracy of a model to a greater extent [73]. In the following subsections, we will explore these attacks.

According to [45], there are three tasks in an object detector: detecting an object, forming a bounding box, and allocating a label to the bounding box. In a training sample \(\hat{x}\), there are \(n\) bounding boxes that refer to the objects in that sample. The objectness score \(\hat{C}_{i} \in [0,1]\), which determines the presence of an object in an image, is obtained by minimizing a binary cross-entropy loss (\(L_{\text{BCE}}\)) [74]. The objectness loss can then be formulated as:

$$ L_{\text{obj}}(\hat{x}; O, W) = \sum\limits_{i = 1}^{n} \left[ L_{i} L_{\text{BCE}}(1, \hat{C}_{i}) + (1 - L_{i}) L_{\text{BCE}}(0, \hat{C}_{i}) \right]. $$
(3)

In Eq. (3), it is assumed that \(L_{i} = 1\) if there is an object in the \(i\)th candidate bounding box and \(L_{i} = 0\) if the \(i\)th bounding box does not contain any object.

Regressing the bounding boxes: \((\hat{b}_{i}^{x}, \hat{b}_{i}^{y})\) and \((\hat{b}_{i}^{W}, \hat{b}_{i}^{H})\) denote the center coordinates and the width and height of bounding box \(i\), respectively, and are obtained by minimizing a sum of squared errors (\(L_{\text{SE}}\)) over the box coordinates as follows:

$$ \begin{aligned} L_{\text{bbox}}(\hat{x}; O, W) = \sum\limits_{i = 1}^{n} L_{i} \Big[ & L_{\text{SE}}\big(b_{i}^{x}, \hat{b}_{i}^{x}\big) + L_{\text{SE}}\big(b_{i}^{y}, \hat{b}_{i}^{y}\big) \\ & + L_{\text{SE}}\big(\sqrt{b_{i}^{W}}, \sqrt{\hat{b}_{i}^{W}}\big) + L_{\text{SE}}\big(\sqrt{b_{i}^{H}}, \sqrt{\hat{b}_{i}^{H}}\big) \Big] \end{aligned} $$
(4)

Finally, the object-class classification loss term is defined as:

$$ L_{\text{class}}(\hat{x}; O, W) = \sum\limits_{i = 1}^{n} L_{i} \sum\limits_{c = 1}^{C} L_{\text{BCE}}\big(p_{i}^{c}, \hat{p}_{i}^{c}\big), $$
(5)

where the \(C\)-class probability vector \(\hat{p}_{i} = (\hat{p}_{i}^{1}, \hat{p}_{i}^{2}, \dots, \hat{p}_{i}^{C})\) approximates the label of box \(i\).

Thus, the overall loss function of the deep detector considered by Chow et al. [45] is formulated by combining Eqs. (3), (4), and (5):

$$ L_{\text{object-detector}}(\hat{x}; O, W) = L_{\text{obj}}(\hat{x}; O, W) + L_{\text{bbox}}(\hat{x}; O, W) + L_{\text{class}}(\hat{x}; O, W) $$
(6)
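The following sketch shows how Eqs. (3)-(6) could be computed for a single image. The tensor shapes, the fully reduced sums, and the helper names (`obj_score`, `box_pred`, etc.) are our own simplifying assumptions and are not taken from [45].

```python
# A compact per-image sketch of Eqs. (3)-(6). L is the binary indicator L_i.
import torch
import torch.nn.functional as F

def detector_loss(obj_score, box_pred, box_gt, cls_pred, cls_gt, L):
    # obj_score: (n,) objectness scores C_i in (0, 1)
    # box_pred, box_gt: (n, 4) boxes as (x, y, w, h) with non-negative w, h
    # cls_pred, cls_gt: (n, C) class probabilities; L: (n,) float indicator in {0, 1}
    l_obj = F.binary_cross_entropy(obj_score, L, reduction="sum")            # Eq. (3)
    se = lambda a, b: (a - b) ** 2
    l_bbox = (L * (se(box_pred[:, 0], box_gt[:, 0])
                   + se(box_pred[:, 1], box_gt[:, 1])
                   + se(box_pred[:, 2].sqrt(), box_gt[:, 2].sqrt())
                   + se(box_pred[:, 3].sqrt(), box_gt[:, 3].sqrt()))).sum()  # Eq. (4)
    l_class = (L.unsqueeze(1)
               * F.binary_cross_entropy(cls_pred, cls_gt, reduction="none")).sum()  # Eq. (5)
    return l_obj + l_bbox + l_class                                          # Eq. (6)
```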

3.1 The targeted adversarial objectness gradient attacks (TOG) series of attacks

In this series of attacks, six attacks have been introduced by Chow et al. [45], who also made the relevant software available to the public. An evaluation of the obtained results shows that the TOG attacks have done well in different cases and were able to reduce the accuracy of the considered models on a variety of datasets. Here, we will touch briefly on the mathematical approaches used by the various models and also explain the attack strategies. The algorithm implemented in the TOG attacks is relatively simple.

TOG carries out its adversarial attacks by reversing the training process. Chow et al. [45] generate the adversarial examples \(x^{\prime}\) using the following iterative update:

$$ x^{\prime}_{t} = \prod\limits_{x,\varepsilon} \left[ x^{\prime}_{t - 1} - \alpha_{\text{TOG}}\, \Gamma\!\left( \frac{\partial L^{*} (x^{\prime}_{t - 1} ; O^{*}, W)}{\partial x^{\prime}_{t - 1} } \right) \right] $$
(7)

Here, \(\alpha_{\text{TOG}}\) denotes the learning rate of the attack, \(\Gamma(\cdot)\) is the sign function, and \(\prod_{x,\varepsilon}[\cdot]\) projects the result back into the \(\varepsilon\)-neighborhood of the clean image \(x\). By strategically manipulating \(L^{*}\) and choosing the auxiliary target \(O^{*}\), TOG supports not only random (untargeted) attacks but also generates three types of exclusive targeted attacks to fool its victims. We survey these attacks in the following subsections, after a minimal sketch of the update step.
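The sketch below implements one update of Eq. (7); `target_loss(model, x_adv, O_star)` is a hypothetical stand-in for the strategically chosen \(L^{*}\) evaluated against the auxiliary target \(O^{*}\), and the projection is realized as a simple clamp around the clean image.

```python
# A minimal sketch of one TOG update step (Eq. (7)).
import torch

def tog_step(model, x, x_adv, O_star, alpha_tog, eps, target_loss):
    x_adv = x_adv.detach().clone().requires_grad_(True)
    loss = target_loss(model, x_adv, O_star)
    grad, = torch.autograd.grad(loss, x_adv)
    x_adv = x_adv.detach() - alpha_tog * grad.sign()   # signed-gradient descent on L*
    x_adv = x + (x_adv - x).clamp(-eps, eps)           # projection onto the eps-ball around x
    return x_adv.clamp(0.0, 1.0)
```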

(a) The TOG-untargeted attack: This is a random attack that tries to deceive detection models so that they cannot correctly detect objects of any class. It does not target a specific class of objects, nor does it aim for a specific type of error; the induced mistake may be the concealment of an object from the detector, the allocation of a wrong label, or the fabrication of a spurious detection [75].

(b) The TOG-vanishing attack: The main goal of this targeted attack is to add noise to an image so that the object detector cannot detect or recognize any of the object classes in it [76]; in other words, the objective is an empty detection vector at the detector output.

(c) The TOG-fabrication attack: Contrary to the TOG-vanishing attack, the main objective of the TOG-fabrication attack is to add incorrect detections to the output detection vector. At the detector's output, one sees an image with many wrongly detected objects.

(d) The TOG-mislabeling attack: In this attack, the positions of the objects in an image are detected correctly, but the object detector assigns the wrong labels to the detected objects.

The outcomes of the TOG attacks on the example images and the performance of the object detector confronting these attacks are illustrated in Fig. 4.

Fig. 4 The outcomes of the TOG attacks on the considered images

3.2 The DAG attack

The Dense Adversary Generation (DAG) attack has been devised for the object detection and semantic segmentation tasks [46]. According to its authors, this attack has high transferability and is quite effective on numerous datasets and architectures. The algorithm for generating this type of attack is a simple one; to generate a DAG attack, the following procedure is carried out.

Suppose \(\hat{x}\) is an image that contains \(n\) objects to be detected, \(D = \{ d_{1}, \dots, d_{n} \}\). Each object has a ground-truth class label \(l_{n} \in \{ 1, 2, \dots, C \}\), where \(C\) indicates the number of classes, and the label set is \(L = \{ l_{1}, l_{2}, \dots, l_{n} \}\).

For a specific task and DNN, let \(f(X, d_{n}) \in \mathbb{R}^{C}\) denote the classification score vector (before the softmax normalization) for the \(n\)th target of \(\hat{x}\), with \(f_{c}(X, d_{n})\) its \(c\)th component. To generate an adversarial example, the prediction of every target object should be made erroneous, i.e., \(\forall n,\ \arg\max_{c}\{ f_{c}(X + r, d_{n}) \} \ne l_{n}\), where \(r\) is the adversarial perturbation added to \(\hat{x}\). Thus, an adversarial label \(l^{\prime}_{n}\) is assigned to each target object, and the adversarial label vector \(L^{\prime} = \{ l^{\prime}_{1}, l^{\prime}_{2}, \dots, l^{\prime}_{n} \}\) collects these labels. The loss function over all targets is then

$$ {\text{loss}}(X, D, L, L^{\prime}) = \sum\limits_{n = 1}^{N} \left[ f_{l_{n}}(X, d_{n}) - f_{l^{\prime}_{n}}(X, d_{n}) \right] $$
(8)

The value of this loss can be minimized by causing an error in the prediction of each target object, i.e., by lowering the confidence of the original correct class \(f_{l_{n}}(X + r, d_{n})\) and raising the confidence of the chosen incorrect (adversarial) class \(f_{l^{\prime}_{n}}(X + r, d_{n})\).

In this approach, the gradient descent algorithm is employed for the optimization. In the \(m\)th iteration, the current image (possibly after adding several perturbations) is denoted \(X_{m}\), and the set of still correctly predicted target objects, i.e., the set of active targets, is \(D_{m} = \{ d_{n} \mid \arg\max_{c}\{ f_{c}(X_{m}, d_{n}) \} = l_{n} \}\). The gradient with respect to the input is then computed over the active targets, yielding the perturbation \(r_{m}\) of that iteration. The final perturbation is obtained as

$$ r = \sum {r_{m} } $$
(9)
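A simplified sketch of the DAG loop corresponding to Eqs. (8) and (9) is given below. The function `scores(x)` is a hypothetical stand-in that returns the pre-softmax score vectors \(f(X, d_{n})\) for all targets, and the step-size normalization is one illustrative choice, not necessarily the one used in [46].

```python
# A simplified sketch of the DAG loop (Eqs. (8)-(9)). scores(x) returns an (n, C) tensor
# of pre-softmax scores for the n targets and must be differentiable with respect to x.
import torch

def dag_attack(scores, x, true_labels, adv_labels, gamma=0.5, max_iter=150):
    r_total = torch.zeros_like(x)
    x_m = x.detach().clone()
    for _ in range(max_iter):
        x_m.requires_grad_(True)
        f = scores(x_m)                                   # (n, C) score vectors
        active = f.argmax(dim=1) == true_labels           # still-correct targets D_m
        if not active.any():
            break                                         # every target has been fooled
        idx = active.nonzero(as_tuple=True)[0]
        loss = (f[idx, true_labels[idx]] - f[idx, adv_labels[idx]]).sum()   # Eq. (8)
        grad, = torch.autograd.grad(loss, x_m)
        r_m = -gamma * grad / grad.norm()                 # small normalized descent step
        r_total = r_total + r_m                           # accumulate as in Eq. (9)
        x_m = x_m.detach() + r_m
    return r_total
```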

The DAG attack is demonstrated in Fig. 5.

Fig. 5 Demonstration of the DAG attack

3.3 The composite Evaporate attack

This is a black-box attack that can conceal the classes of the target objects from an object detector without any knowledge of the network [47]. Figure 6 shows the overall strategy of the Evaporate attack.

Fig. 6 The overall design of an Evaporate attack

The attacks generated by the Evaporate method are iterative in nature [53]: the procedure is repeated in subsequent steps whenever the images obtained in the previous steps are not good enough. The attack is initiated by obtaining the adversarial example \(x^{\prime}\) through the following optimization problem:

$$ \mathop {\min }\limits_{{x^{\prime}}} L(x^{\prime}) = d(x^{\prime},x) - \delta (D(x^{\prime})) $$
(10)

Here, \(d(x^{\prime},x)\) denotes the MSE distance, and \(\delta (D(x^{\prime}))\) is an adversarial criterion whose value is zero when the conditions of a satisfactory attack are met and which tends to \(-\infty\) when they are not. According to [47], this attack has been able to fool the YOLOv3 model 84% of the time.
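The objective in Eq. (10) can be written down directly; a hedged sketch follows, in which `detector_output` and `criterion` are hypothetical stand-ins for \(D(\cdot)\) and \(\delta(\cdot)\). Since the attack is black-box, this objective would in practice be minimized by a gradient-free search over \(x^{\prime}\) rather than by backpropagation.

```python
# A sketch of the Evaporate objective in Eq. (10).
import torch

def evaporate_objective(x_adv, x, detector_output, criterion):
    d = torch.mean((x_adv - x) ** 2)            # MSE distance d(x', x)
    delta = criterion(detector_output(x_adv))   # 0 when the attack condition holds, -inf otherwise
    return d - delta                            # Eq. (10), minimized over x_adv
```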

3.4 The DeepFool attack

This attack adds a perturbation of minimal size. A clean image is fed into an iterative algorithm that adds some perturbation at every step, and the iterations stop once a fooling threshold is reached, i.e., once the object detector changes its decision with respect to the original detection. This attack can be made universal and has been used against many types of DNNs [48].

3.5 The RAP attack

This attack, called the Robust Adversarial Perturbation (RAP), is a black-box type of attack. It is also designed around the solution of an optimization problem and iterates until the intended effect is achieved. In this scheme, proposal-based object detectors and instance segmentation algorithms are attacked by adding minimal adversarial noise to an input image. Given an input image and a pre-trained region proposal network (RPN), a dedicated objective function is designed, and an iterative gradient-based technique is then employed to optimize this objective with respect to the input image [49].

The outcome of this attack is displayed in Fig. 7.

Fig. 7 A demonstration of the RAP attack

3.6 The generative adversarial training (GAT) method

This method is based on accurately identifying the weaknesses and strengths of the target network. The algorithm repeatedly eliminates the weaknesses of the adversarial example and reinforces its strengths to make it more effective.

In the course of each training step, the GAT scheme learns to produce the best perturbation for each input. Simultaneously, a classification network is trained by the GAT to correctly classify the original and the adversarial examples [50]. The loss function of the GAT method is expressed as

$$ L_{F} = \alpha \cdot J(\theta_{f} ,x,y) + (1 - \alpha ) \cdot J(\theta_{f} ,x + G(\Delta ),y), $$
(11)

where

$$ J(\theta_{f} ,x,y) = - \log F(X;\theta_{f} )_{y} $$
(12)

Typical values of \(\alpha\) and \(k\) could be 0.5 and 1.0, respectively; \(F\) is the classification network. The schematic of this technique is illustrated in Fig. 8.

Fig. 8 The GAT method
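A minimal sketch of the GAT training loss in Eqs. (11) and (12) is shown below, assuming a standard classifier and a perturbation `g_delta` already produced by the generator \(G(\Delta)\); the function and variable names are ours, not those of [50].

```python
# A sketch of the GAT training loss (Eqs. (11)-(12)).
import torch
import torch.nn.functional as F

def gat_loss(classifier, x, y, g_delta, alpha=0.5):
    # J(theta_f, x, y) = -log F(x; theta_f)_y is the cross-entropy with the true label y.
    j_clean = F.cross_entropy(classifier(x), y)            # Eq. (12) on clean inputs
    j_adv = F.cross_entropy(classifier(x + g_delta), y)    # Eq. (12) on perturbed inputs
    return alpha * j_clean + (1.0 - alpha) * j_adv         # Eq. (11)
```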

4 The different types of defense

4.1 The adversarial training method

In this approach, adversarial training [54] is employed as a defensive mechanism against adversarial attacks. The basic strategy is to train the network on perturbed images. Zhang et al. [50] have presented the following formulation of adversarial training for achieving robust object detectors:

$$ \min_{\theta} \left[ \max_{\bar{x} \in S_{\text{cls}} \cup S_{\text{loc}}} L\big(f_{\theta}(\bar{x}), \{ y_{k} \}, \{ b_{k} \}\big) \right] $$
(13)

in which

$$ S_{\text{cls}} = \Big\{ \bar{x} \;\Big|\; \bar{x} = \mathop{\arg\max}\limits_{\bar{x} \in S_{x}} \, {\text{loss}}_{\text{cls}}\big(f(\bar{x}), \{ y_{k} \}\big) \Big\} $$
(14)
$$ S_{\text{loc}} = \Big\{ \bar{x} \;\Big|\; \bar{x} = \mathop{\arg\max}\limits_{\bar{x} \in S_{x}} \, {\text{loss}}_{\text{loc}}\big(f(\bar{x}), \{ b_{k} \}\big) \Big\} $$
(15)

As one of the first attempts to robustify object detectors against adversarial attacks, this method has achieved good results. Zhang et al. [51] have tested the technique on the PASCAL VOC and MS COCO datasets. Amirkhani and Karimi [65] also tested it on different architectures, where, on average, it improved the adversarial accuracy of the models by about 20%.
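A hedged sketch of one training step following the min-max objective in Eqs. (13)-(15) is given below. Here, `attack_candidates`, `loss_cls`, and `loss_loc` are hypothetical helpers standing in for the candidate set \(S_{x}\) and the two task losses; the exact selection rule used in the original work may differ.

```python
# A sketch of one adversarial training step following Eqs. (13)-(15).
import torch

def adversarial_training_step(model, optimizer, x, targets,
                              attack_candidates, loss_cls, loss_loc):
    candidates = attack_candidates(model, x, targets)          # candidate perturbed images S_x
    with torch.no_grad():
        x_cls = max(candidates, key=lambda c: loss_cls(model(c), targets).item())   # Eq. (14)
        x_loc = max(candidates, key=lambda c: loss_loc(model(c), targets).item())   # Eq. (15)
        x_worst = max((x_cls, x_loc),                           # inner max of Eq. (13)
                      key=lambda c: (loss_cls(model(c), targets)
                                     + loss_loc(model(c), targets)).item())
    optimizer.zero_grad()
    total = loss_cls(model(x_worst), targets) + loss_loc(model(x_worst), targets)
    total.backward()                                            # outer min of Eq. (13)
    optimizer.step()
    return total.item()
```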

4.2 The ADNet method

The detection strategy of this method is based on an adversarial detection network (ADNet). The ADNet learns detection abilities from the input images in a hierarchical fashion, with the input images passing through convolutional and pooling layers. The first convolutional layer has 6 feature maps of size 5 × 5 and a stride of 1. Next, the ADNet performs sub-sampling using a 2 × 2 filter with a stride of 2. Then comes a second convolutional layer with 16 feature maps of size 5 × 5 and a stride of 1; in this layer, only 10 of the 16 feature maps are connected to the 6 feature maps of the preceding layer. The fourth layer is again a pooling layer with a 2 × 2 filter and a stride of 2, similar to the second layer except that it has 16 feature maps. The fifth layer is a fully connected convolutional layer with 120 feature maps of size 1 × 1; each of the 120 units in this FC5 layer is connected to all 400 nodes (5 × 5 × 16) of the fourth layer. The sixth layer is a fully connected layer with 84 neurons. Finally, there is a fully connected SoftMax output layer with 2 possible outputs, corresponding to perturbed or original images. The convolutional layers use the ReLU activation function, and the network is trained with the Adam optimization algorithm.
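The architecture described above can be sketched in a few lines of PyTorch, assuming 32 × 32 single-channel inputs (so that the fourth layer indeed yields 5 × 5 × 16 = 400 nodes), average pooling for the sub-sampling layers, and full connectivity between the 6 and 16 feature maps instead of the partial connectivity mentioned in the text.

```python
# A minimal PyTorch sketch of the ADNet architecture described above.
import torch
import torch.nn as nn

class ADNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, stride=1), nn.ReLU(),    # 6 maps, 5x5, stride 1
            nn.AvgPool2d(kernel_size=2, stride=2),                  # sub-sampling, 2x2, stride 2
            nn.Conv2d(6, 16, kernel_size=5, stride=1), nn.ReLU(),   # 16 maps, 5x5, stride 1
            nn.AvgPool2d(kernel_size=2, stride=2),                  # 16 maps, 2x2, stride 2
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(400, 120), nn.ReLU(),   # FC5: 120 units over the 5x5x16 = 400 nodes
            nn.Linear(120, 84), nn.ReLU(),    # FC6: 84 neurons
            nn.Linear(84, 2),                 # 2 outputs: perturbed vs. original image
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = ADNet()
optimizer = torch.optim.Adam(model.parameters())   # Adam, as stated in the text
```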

A desirable characteristic of the ADNet is that it can detect the adversarial examples that fool deep models in the test phase [55]. Functioning as a separate module, it detects adversarial examples independently of the protected model; it can also act as a hidden component of a larger intelligent system, which makes the ADNet inherently resistant to attacks on itself. This is in contrast to most existing decision networks, which have to rely on the internal states of a network during the test phase and are therefore exposed to potential attackers; such methods are also incapable of dealing with pixel-level attacks.

The only dependency of the ADNet on the protected network is during the training phase, in which the adversarial examples are produced by attacking that model. In that work, a ResNet was employed to train the ADNet; however, any other network or group of networks could be used to train it and further improve its ability to detect adversarial examples [52].

4.3 The JPG compression method

This method demonstrates that reducing the data volume of images by converting them to JPG files can sometimes eliminate the effects of adversarial attacks on the detection ability of DNNs, although this approach alone cannot be considered a complete defensive strategy [56]. The authors of [56] point out that most image classification datasets contain images in the JPG format. In view of this observation, they studied the effects of JPG compression on the perturbations generated by FGSM [63] and reported that it can significantly reverse the loss of classification accuracy caused by these perturbations [63]. That being said, heavy compression itself can reduce the performance of neural networks.
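As a preprocessing defense, the JPG re-encoding step is straightforward to implement; the sketch below uses Pillow and an arbitrary quality factor, which would need to be tuned against the accuracy loss mentioned above.

```python
# A sketch of the JPG-compression preprocessing defense: re-encode the (possibly perturbed)
# input at a chosen JPEG quality before it reaches the detector.
import io
from PIL import Image

def jpeg_compress(image: Image.Image, quality: int = 75) -> Image.Image:
    buffer = io.BytesIO()
    image.convert("RGB").save(buffer, format="JPEG", quality=quality)   # lossy re-encoding
    buffer.seek(0)
    return Image.open(buffer).convert("RGB")
```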

4.4 The Parseval networks

Cisse et al. [64] have presented Parseval networks as a defensive method against adversarial attacks. The idea relies on the Lipschitz constant: since a network can be viewed as a composition of functions, it can be robustified against small input perturbations by keeping the Lipschitz constant of each of these functions small. They achieve this by controlling the norm of the network weight matrices, parameterizing them as tight Parseval frames; hence the name Parseval networks.
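The following is a soft-penalty sketch of the underlying idea, pushing the weights of fully connected layers toward row-orthogonality so that the layer-wise Lipschitz constants stay close to one. The original work uses a retraction step rather than this penalty, so this should be read as an approximation of the concept, not as the method of [64].

```python
# A soft-penalty sketch of the Parseval idea for fully connected layers.
import torch

def parseval_penalty(model, beta=1e-4):
    penalty = 0.0
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            W = module.weight                                   # shape (out, in)
            I = torch.eye(W.shape[0], device=W.device)
            penalty = penalty + ((W @ W.t() - I) ** 2).sum()    # ||W W^T - I||_F^2
    return beta * penalty   # added to the task loss during training
```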

4.5 Gabor convolutional layers

Amirkhani and Karimi [65] recently proposed a new method for robustifying object detectors against adversarial attacks based on Gabor convolutional layers. In this method, the images are first decomposed into their RGB channels and then passed through a Gabor filter bank. Owing to their strong ability to extract low-level image features, Gabor filters can increase network robustness at this stage. The authors report considerable improvements in the performance of object detection models on images contaminated by adversarial attacks: in [65], five robust object detection models are presented and evaluated under different attacks, and the method improves the performance of object detectors against adversarial attacks by up to 50%.
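A hedged sketch of a fixed Gabor filter bank applied per RGB channel is given below; the kernel parameters and the way the responses are stacked are our own illustrative choices and do not necessarily match the layer design used in [65].

```python
# A sketch of a fixed Gabor filter bank applied per RGB channel.
import cv2
import numpy as np
import torch
import torch.nn.functional as F

def gabor_bank(ksize=7, sigma=2.0, lambd=4.0, gamma=0.5, n_orient=4):
    kernels = [cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, gamma)
               for theta in np.linspace(0, np.pi, n_orient, endpoint=False)]
    return torch.tensor(np.stack(kernels), dtype=torch.float32).unsqueeze(1)  # (n, 1, k, k)

def apply_gabor(rgb: torch.Tensor, bank: torch.Tensor) -> torch.Tensor:
    # rgb: (B, 3, H, W); every channel is filtered by every orientation separately.
    b, c, h, w = rgb.shape
    flat = rgb.reshape(b * c, 1, h, w)
    out = F.conv2d(flat, bank, padding=bank.shape[-1] // 2)
    return out.reshape(b, c * bank.shape[0], h, w)

features = apply_gabor(torch.rand(1, 3, 224, 224), gabor_bank())   # (1, 12, 224, 224)
```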

5 The application scenarios

5.1 Adversarial robustness in autonomous vehicles

Because of the vital importance of object detection in autonomous vehicles, we will discuss it separately in this section.

5.1.1 Adversarial attacks in autonomous vehicles

In autonomous vehicles, deep learning models can be fooled into making wrong detections and predictions by adding small imperceptible perturbations [65]. In self-driving car applications, depending on the capability of the attacker, these attacks are divided into white-box and black-box attacks. In white-box attacks, the attackers have all the information about the model being attacked; this may include the training and validation data, the model's architecture and all of its parameters, the way the model is trained, and the model's gradients during training [77]. Conversely, black-box attackers have no such information about the model [78]. The adversarial attacks on autonomous vehicles are briefly reviewed in Fig. 9. In general, there are two types of adversarial attacks: evasion attacks and poisoning attacks. Evasion (deceptive) attacks occur during inference, whereas poisoning attacks take place during model training. These attacks were initially tested on classification models.

Fig. 9 A brief review of the adversarial attacks in the autonomous vehicles

5.1.1.1 The white-box methods

White-box attacks are designed with full knowledge of the target model and its parameters. For example, three different white-box methods for producing the adversarial examples are introduced below:

(a) The Gradient-Based Methods: In these methods, the attacks, such as those in [79] and [80], are based on the Fast Gradient Sign Method (FGSM). The adversarial examples are created directly by perturbing every pixel of the original image along the direction of the cost function gradient, so as to increase the cost.

(b) The Optimization-Based Methods: These techniques ([81] and [82]) generate the adversarial examples by solving an optimization problem such as the following equation.

$$ \arg\min_{x^{\prime}} \; \alpha \left\| x - x^{\prime} \right\|_{p} + L\big(J_{\theta,c}(x^{\prime})\big) $$
(16)

The first term of this equation measures the \(p\)-norm distance between the original and the adversarial images, and the second term expresses the cost-function constraint on the adversarial image [83].

(c) The Generative Methods: These types of attacks ([84]) exploit the advantages of the generative methods to produce the adversarial examples. These techniques create a generative model \(\varsigma \) by optimizing the following function.

$$ L = L_{\gamma } + \alpha L_{\varsigma } $$
(17)

In this equation, \(L_{\gamma }\) represents the cross-entropy cost function for the adversarial examples and the target object class, and \(L_{\varsigma }\) indicates the degree of similarity between the adversarial examples and the original images.

5.1.1.2 The black-box methods

In the black-box attacks, the attackers have no information about the model being attacked; they can only feed an input to the model and then evaluate its output [85]. Three example approaches that are used in the black-box attacks to generate the adversarial examples are as follows:

(a) The Transformation Method: In this approach, the attackers use the input/output behavior of the target model to build a substitute model that mimics it, apply white-box techniques to generate adversarial examples for this substitute, and then use these examples to attack the target model. It has been shown that adversarial examples produced in this way transfer well and can be more effective than those generated by the other methods [82].

(b) The Score Method: In this approach, the gradient is estimated from the output scores (e.g., class confidences) returned by the target model, and the adversarial examples are generated based on this estimate [86].

(c) The Decision Method: In this approach, only a model's final decisions are used. Adversarial examples are first generated with large random perturbations, and the perturbations are then gradually reduced in magnitude until they satisfy the defining characteristic of adversarial examples, i.e., imperceptibility.

It is worth noting that the black-box attacks are more realistic than the white-box ones. The white-box attacks need the full information about the driving models of the autonomous vehicles, which is not available for most of the commercial vehicles.

In [87], a real-world adversarial attack on traffic signs is implemented. Zhang et al. [88] presented a physical camouflage for an adversarial attack, similar to the camouflage used in simulation programs, which was effective in leading detectors to wrong detections. A perturbed stop sign in [89] could not be detected even by the best detectors, such as the one in [90]. A technique called DeepBillboard is formulated in [91]; it creates adversarial advertising billboards that cause autonomous vehicles to deviate from their original paths, with a maximum steering deviation of 26.44°. In [92], an end-to-end driving model was attacked by means of adversarial perturbations in the driving environment, causing the vehicle to crash in the CARLA simulator. A decision-based method for producing adversarial textures to attack autonomous vehicle systems was introduced in [93]; it leads these vehicles to wrong detections.

5.1.2 Adversarial defenses in autonomous vehicles

There are numerous defensive methods against adversarial attacks on classification models; however, many of them cannot be applied to the regression models used in autonomous vehicles. Figure 10 illustrates a breakdown of the defensive techniques against adversarial attacks in autonomous vehicles.

Fig. 10 Different types of defenses against the adversarial attacks in autonomous vehicles

In the following, we will review some of the defensive techniques against the adversarial attacks in the autonomous vehicles.

5.1.2.1 The detection-based approaches

In these methods, robust models try to detect the presence of potential attacks. Zheng et al. [94] have presented a detection-based method in which an iterative algorithm detects the presence of an attack in an input sample and tries to robustify the network with respect to this attack. Such iterative methods are interesting approaches for robustifying models against adversarial attacks, but their effectiveness against white-box or image-based attacks is questionable. In some works [95, 96], the responsibility for detecting attacks is placed on the preprocessing systems that already exist in autonomous vehicles, which are expected to perform satisfactorily in detecting adversarial attacks [97].

5.1.2.2 The training-based approaches

In these approaches, as in the training-based methods in the field of object detection, the adversarial training technique is employed to robustify autonomous vehicles against adversarial attacks [57]. During adversarial training, a combination of clean and perturbed images is given to the network, and since the network has already been exposed to adversarial examples, its adversarial accuracy is expected to improve [98]. Yan et al. [99] have presented an efficient training-based method in which the input images are first perturbed by different adversarial attacks, then combined with the clean images, and the resulting dataset is used to train the network.

5.2 Face recognition/detection

Face recognition is one of the most important applications of object detection with deep learning models, and it is used in a vast spectrum of human–computer interfaces, cameras, and biometric systems [100]. Adversarial attacks in face recognition applications are therefore studied in the following subsections.

5.2.1 Adversarial attacks in face recognition

Adversarial attacks in face recognition applications have been investigated in many works. For example, a method of attacking DNNs with an adversarial-attack-generating network was presented in [101]. This technique is based on solving an optimization problem and can be scaled and applied to other networks as well. It was applied specifically to the FRCNN face recognition model and was able to reduce the precision of the network on the 300-W dataset considerably (the effective precision of the FRCNN was reduced to 0.5%).

5.2.2 Defense strategies in face recognition

Different defensive techniques are also employed in face recognition models. For example, to evaluate the efficacy of its adversarial attack against defenses, the work in [101] adopted the defense strategy presented in Sect. 4.3; this method, which uses image compression to increase resistance against adversarial attacks, was able to enhance the network precision by 5%. Various defense strategies have been presented in this field, and the defenses outlined in Sect. 4 can be extended to this setting as well.

6 Conclusion

This paper surveyed adversarial attacks, defenses, and the related research in the fields of object detection and autonomous vehicles. Despite the high precision of DNNs in various computer vision tasks, these networks are vulnerable to small imperceptible input perturbations and can produce totally different outputs when exposed to such disturbances. The formulation of effective adversarial attacks and of appropriate defenses against them has therefore become an important subject in deep learning research. In this review, we introduced and compared the most significant attacks and defenses in the fields of object detection and autonomous vehicles. Current deep learning techniques can still be attacked easily, but owing to the tremendous research effort in this field, it is hoped that in the near future deep learning methods will achieve strong robustness against the devised adversarial attacks.

Table 1 Comparing the adversarial attacks explored in this paper
Table 2 Comparing the defense techniques considered in this paper