1 Introduction

Fig. 1.

Vax-a-Net vaccinates pre-trained CNNs against adversarial patch attacks (APAs): small image regions crafted to induce image misclassification. A shark is correctly classified by a VGG CNN (left), but misclassified when an APA [3] is applied (middle). Vax-a-Net applies defensive training to improve the CNN’s resilience to the APA (right). Visualizations show CNN attention (via Grad-CAM [25]).

Convolutional neural networks (CNNs) are known to be vulnerable to adversarial examples: minor changes made to an image that significantly affect the classification outcome [10, 31]. Adversarial examples may be generated by pixel-level perturbation of the image, introducing covert yet fragile changes that induce misclassification [4, 8, 10, 19]. More recently, adversarial patches or ‘stickers’ have been proposed [3, 7, 8], creating overt changes within local image regions that exhibit robustness to affine transformation, and even to printing. Despite the increasing viability of such ‘adversarial patch attacks’ (APAs) to confound CNNs in the wild, there has been little work exploring defences against them (Fig. 1).

The core contribution of this paper is a new method to defend CNNs against image misclassification due to APAs. Existing defences typically seek to detect and remove patches in a pre-processing step prior to inference, e.g. by exploiting the high visual salience of such patches. Yet the manipulation or removal of salient content often degrades model performance (Table 1). To avoid these problems we propose adapting the method of adversarial training to the realm of APAs. We leverage the idea of generative adversarial networks (GANs) [9] to simultaneously synthesise effective adversarial patches to attack a target CNN model, whilst fine-tuning that target model to enhance its resilience against such attacks. Existing APA methods synthesise a patch via optimizations that take several minutes to converge [3, 8]. To incorporate patch synthesis into the training loop, we instead generate each patch via a single inference pass through the generator, which takes less than one second. Furthermore, patch generation is class-conditional: a single trained generator can create patches for many classes. Moreover, we demonstrate that the protection afforded to the model transfers to also defend against existing APA techniques [3, 8].

We show for the first time that adversarial training may be leveraged to adapt a pre-trained CNN model’s weights to afford it protection against state-of-the-art APAs. We demonstrate this for both untargeted attacks (seeking any misclassification) and targeted attacks (seeking misclassification as a specific class) over several contemporary CNN architectures. We demonstrate that a CNN may be ‘vaccinated’ against two state-of-the-art APA techniques [3, 8] despite neither being invoked in that process. Immunising a CNN model against APAs via further training contrasts with existing APA defences that filter images to mitigate patches at inference time. We show our method better preserves classification accuracy, and has a higher defence success rate, than inference-time defences [11, 20].

The adoption of CNNs within safety-critical autonomous systems opens a new facet of cyber-security, aimed on the one hand at training networks resilient to adversarial attacks, and on the other at evaluating that resilience by developing new attacks. This paper makes that connection explicit through adversarial training to immunise CNNs against this emerging attack vector.

2 Related Work

Szegedy et al. introduced adversarial attacks through minor perturbations of pixels [31] to induce CNN image misclassification. Goodfellow et al. later introduced the fast gradient sign method (FGSM, [10]) to induce such perturbations quickly in a single step, exploiting linearity of this effect in input space. These methods require access to the target model in order to backpropagate gradients to update pixels, inducing high frequency noise that is fragile to resampling. Later work improved robustness to affine transformation [19], whilst minimising perceptibility of the perturbations [4]. Gittings et al. [8] improved robustness using Deep Image Prior [34] to regularise perturbations to the manifold of natural images. Nevertheless current attacks remain susceptible to minor scaling or rotation. Other work made use of generative architectures to produce more effective attacks [1, 2, 28, 35].

Adversarial Patch Attacks. Brown et al. demonstrated that adversarial patches could be used to fool classifiers; they restricted the perturbation to a small region of the image and explicitly optimised for robustness to affine transformations [3]. Both Brown et al., and later Gittings et al. [8], backpropagate through the target model to generate ‘stickers’ that can be placed anywhere within the image to create a successful attack. This optimization process can take several minutes for a single patch. Karmon et al. showed in LaVAN that patches can be much smaller if robustness to affine transformation is not required [14], but their approach requires pixel-perfect positioning of the patch, which is impractical for real APAs. In the complementary area of object detection (rather than image classification, addressed in this paper) Liu et al. disabled an object detector using a small patch in one corner of the frame [16]. Eykholt et al. applied adversarial patches to traffic signs, explicitly optimising for printability [7]. Chen et al. performed a similar attack on an object detector with Stop signs [5]. Thys et al. attacked a person detector using a printable patch [33].

Defences at Training Time. Whilst introducing adversarial examples, Szegedy et al. also proposed adversarial training to defend against them [31]. Adversarial training is a form of data augmentation that introduces adversarial examples during the training process in order to promote robustness. The method was impractical when first proposed, because producing adversarial examples was too slow to be done during training; this was resolved by Goodfellow et al.’s FGSM [10], and later by others with more general fast gradient methods [17, 26]. Kurakin et al. applied adversarial training to the ImageNet dataset for the first time [15]. Jang et al. make use of a recursive attack generator for more effective adversarial training on MNIST and CIFAR-10 [13]. Papernot et al. applied the idea of distilling the knowledge of one neural network into another in a way that masks the gradients at test time and prevents an attacker from using backpropagation [21]. All of the above train or fine-tune models to defend against adversarial image examples, rather than against localised patch attacks, i.e. the APAs addressed in our work.

Defences at Inference Time. Meng and Chen observed that by approximating the manifold of natural images it is possible to remove perturbations from an adversarial image as a pre-process at inference time, by projecting the full image onto this manifold [18]; they approximate the input image using an autoencoder. Samangouei et al., and separately Jalal et al., use a GAN in place of an autoencoder [12, 24] to similarly remove adversarial perturbations.

Fig. 2.

Proposed architecture for using adversarial training to robustify a model \(f\) against adversarial patch attacks. The conditional patch generator \(G\) can synthesise adversarial patches for \(f\) attacking multiple classes. We alternately train \(G\) and \(f\) to promote the resilience of the model against APAs [3, 8].

Naseer et al. [20] have created one of the few defences against localised perturbations, i.e. APAs. They observe that adversarial patches are image regions with especially high gradient (which is likely how they draw the classifier’s attention away from other areas of the image). By applying local gradient smoothing (LGS) – conceptually the opposite of a bilateral/edge-preserving blur – patches are neutralised, but at the cost of lower classification accuracy on clean images, since classifiers rely upon structural edge detail as a recognition cue. Hayes [11] created a different method to defend against localised adversarial attacks. The defence is split into two stages: detection and removal. To detect the patch, a saliency map is created using guided backpropagation, and a collection of localised salient features is assumed to imply the presence of a patch. To remove the patch, the detected region is cleaned up via morphological filtering and an image in-painting algorithm [32] is applied to the masked region.

Rather than attempt to detect and erase adversarial patches, Vax-a-Net takes a generative adversarial approach to simultaneously create attack patches and fine-tune the model to ‘vaccinate’ it against APAs.

3 Method

Fig. 3.

Representative patches sampled from our generator at training epochs 500–2500 for two attack classes. Patches were generated to defend a VGG-19 model trained on ImageNet.

Consider a CNN classifier \(f:\mathbb {R}^m\rightarrow \mathbb {R}^k\) pre-trained to map a source image x to a vector of probabilities f(x), encoding the chance of the image containing each of a set of classes \(c \in \mathcal {Y}\). Adversarial image attacks introduce a perturbation \(r\in \mathbb {R}^m\) to that source image such that \(\arg \max f(x+r) \ne \arg \max f(x)\). We say such attacks are untargeted; seeking only to induce misclassification. If our aim is to introduce a perturbation r such that \(\arg \max f(x+r) = i\), we say the attack is targeted to a specific class (i).

Most adversarial images \(x+r\) are covert attacks; typically a barely perceptible r, distributed across the whole image, is sought. By contrast, adversarial patch attacks (APAs) have been introduced as overt attacks, in which an adversarial patch (‘sticker’) is synthesised and composited into a region of an image in order to induce misclassification. We define a region of interest (ROI) via a binary mask \(M \in [0,1]^m\). In this case we seek a perturbation r, which may be large, to create a composite image

$$\begin{aligned} \hat{x} = M \odot r + (1-M) \odot x \end{aligned}$$
(1)

where \(\odot \) is element-wise multiplication. A single adversarial patch capable of attacking multiple images can be created by sampling x in mini-batches from a set of training images (versus learning r over a single image, as is typical in the whole-image case), as we now explain.
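For concreteness, Eq. 1 amounts to a single element-wise blend. A minimal PyTorch sketch is given below; the tensor shapes and function name are our own assumptions, not the authors' code.

```python
import torch

def composite_patch(x: torch.Tensor, r: torch.Tensor, M: torch.Tensor) -> torch.Tensor:
    """Eq. 1: x_hat = M * r + (1 - M) * x.

    x : batch of source images, shape (B, 3, H, W)
    r : perturbation/patch content already placed on a full-size canvas, same shape as x
    M : region-of-interest mask, shape (B, 1, H, W); broadcasts over the channel dimension
    """
    return M * r + (1.0 - M) * x
```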

3.1 Conditional Patch Generation

Our aim is to defend a pre-trained CNN classifier against adversarial patch attacks exclusively through modifications to the training process. Although this has been achieved with good success for adversarial image examples, the adversarial training process used in that case does not apply straightforwardly to adversarial patches: it requires patches to be synthesised at each step of training, which is impractical since existing APA methods can take several minutes to synthesise a single patch. To mitigate this, we adapt the idea of a conditional Deep Convolutional Generative Adversarial Network (DC-GAN) [23] to synthesise effective adversarial patches while simultaneously training the model to defend against those patches.

Figure 2 illustrates the Vax-a-Net architecture; a conditional patch generator \(G\) is used to synthesise patches which are then applied via a differentiable affine transformation and compositing operation to a training image. The training image is then classified via the target CNN f which we wish to defend; this model plays the role of discriminator in the GAN.

Our conditional patch generator \(G\) takes as input a noise vector \(z\), accompanied by a one-hot vector encoding the class c that the attack is targeting, and produces an adversarial patch of size \(64\times 64\). It consists of five up-convolutional layers with filter size \(4\times 4\). The numbers of output channels for the hidden layers are 1024, 512, 256 and 128 respectively. The first layer has a stride of 1 and no padding; the remainder have a stride of 2 and 1 pixel of zero-padding. We use batch normalisation after each layer, and leaky-ReLU activation. Our proposed loss function for the generator is

$$\begin{aligned} L_G = \mathbb {E}_{c,z,x,t,l} J(f(A(G(z,c), x, l, t)),c), \end{aligned}$$
(2)

where \(A\) is the patch application operator, which we will define and explain further in Sect. 3.2, and \(J\) is the cross-entropy loss between the output of \(f\) and the target class.
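A minimal PyTorch sketch of such a generator, and of Eq. 2, is given below, following the layer description above (five \(4\times 4\) up-convolutions with hidden widths 1024/512/256/128, batch normalisation and leaky-ReLU). The noise dimension, the final tanh activation and all identifier names are our own assumptions, and apply_patch (the operator \(A\)) is sketched in Sect. 3.2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalPatchGenerator(nn.Module):
    """Sketch of the conditional patch generator G(z, c); nz=100 and the final
    tanh are assumptions, the layer widths and strides follow the text."""

    def __init__(self, nz: int = 100, n_classes: int = 10):
        super().__init__()

        def up(cin, cout, stride, pad):
            return nn.Sequential(
                nn.ConvTranspose2d(cin, cout, kernel_size=4, stride=stride,
                                   padding=pad, bias=False),
                nn.BatchNorm2d(cout),
                nn.LeakyReLU(0.2, inplace=True),
            )

        self.net = nn.Sequential(
            up(nz + n_classes, 1024, stride=1, pad=0),                        # 1x1   -> 4x4
            up(1024, 512, stride=2, pad=1),                                   # 4x4   -> 8x8
            up(512, 256, stride=2, pad=1),                                    # 8x8   -> 16x16
            up(256, 128, stride=2, pad=1),                                    # 16x16 -> 32x32
            nn.ConvTranspose2d(128, 3, kernel_size=4, stride=2, padding=1),   # 32x32 -> 64x64
            nn.Tanh(),                                                        # patch pixels in [-1, 1]
        )

    def forward(self, z: torch.Tensor, c_onehot: torch.Tensor) -> torch.Tensor:
        # Condition by concatenating the one-hot class code onto the noise vector.
        h = torch.cat([z, c_onehot], dim=1)[:, :, None, None]                 # (B, nz+K, 1, 1)
        return self.net(h)


def generator_loss(f, G, x, c_onehot, c, loc, scale, angle):
    """Eq. 2: cross-entropy between f's prediction on the patched image and the
    attack target class c (a LongTensor of class indices). apply_patch is the
    operator A, sketched in Sect. 3.2; the noise dimension must match G's nz."""
    z = torch.randn(x.size(0), 100, device=x.device)                          # noise ~ N(0, I)
    patched = apply_patch(G(z, c_onehot), x, scale, angle, loc)
    return F.cross_entropy(f(patched), c)
```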

Fig. 4.

Patches sampled from our conditional generator G to attack an undefended VGG-19 model. (Color figure online)

In our work we explore \(G\) capable of producing effective patches for 1–50 different ImageNet classes (Sect. 4.4). Figure 4 shows the patches that a conditional generator for 10 classes can produce after 500 epochs of training without training the discriminator, i.e. these are patches effective at attacking the undefended network. Figure 3 shows how patch content evolves as training proceeds beyond this initial phase, once the discriminator is also updated. The patches resemble abstract versions of the objects they are attacking, but with striking colour that attracts attention away from other objects.

3.2 Patch Application and Target Model

The output of our generator G(z, c) is an image of size \(64\times 64\), which we must turn into a patch and apply to the image. First we apply a circular mask to create a round patch (after [3, 8]). Next we apply the patch \(p\) to the image \(x\) at location \(l\) and with an affine transformation \(t\). We denote the output of this operation as \(A(p, x, l, t)\). We use an expectation over transformation to ensure the patch works at any location and under any affine transformation. In our training, we enable random rotation up to a fixed maximum angle, scaling to between \(1\%\) and \(25\%\) of the image, and translation to any location on the image.
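A possible implementation of \(A\) is sketched below: circular masking, rotation via affine_grid/grid_sample, scaling via interpolate, then compositing via Eq. 1. The argument layout and sampling conventions are our own assumptions rather than the authors' exact code.

```python
import math
import torch
import torch.nn.functional as F

def apply_patch(patch, x, scale, angle, loc):
    """Sketch of the patch application operator A(p, x, l, t).

    patch : (B, 3, 64, 64) output of G
    x     : (B, 3, H, W) batch of training images
    scale : fraction of image area the patch should cover (here 0.01-0.25)
    angle : rotation in radians
    loc   : (row, col) top-left placement; caller must keep the patch inside the image
    """
    B, _, H, W = x.shape
    P = patch.shape[-1]

    # 1. Circular mask, following [3, 8].
    yy, xx = torch.meshgrid(torch.arange(P), torch.arange(P), indexing="ij")
    circle = (((yy - P / 2) ** 2 + (xx - P / 2) ** 2) <= (P / 2) ** 2).float()
    circle = circle.to(patch.device).view(1, 1, P, P)

    # 2. Rotate the patch and its mask together.
    cos, sin = math.cos(angle), math.sin(angle)
    theta = torch.tensor([[cos, -sin, 0.0], [sin, cos, 0.0]], device=patch.device)
    grid = F.affine_grid(theta.expand(B, 2, 3), patch.size(), align_corners=False)
    patch_r = F.grid_sample(patch, grid, align_corners=False)
    mask_r = F.grid_sample(circle.expand(B, 1, P, P), grid, align_corners=False)

    # 3. Resize so the patch covers `scale` of the image area.
    side = int(round(math.sqrt(scale * H * W)))
    patch_r = F.interpolate(patch_r, size=(side, side), mode="bilinear", align_corners=False)
    mask_r = F.interpolate(mask_r, size=(side, side), mode="bilinear", align_corners=False)

    # 4. Place on a full-size canvas at location l and composite via Eq. 1.
    canvas = torch.zeros_like(x)
    M = torch.zeros(B, 1, H, W, device=x.device)
    r0, c0 = loc
    canvas[:, :, r0:r0 + side, c0:c0 + side] = patch_r
    M[:, :, r0:r0 + side, c0:c0 + side] = mask_r
    return M * canvas + (1.0 - M) * x
```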

The training process consists of two stages. Initially the discriminator (classifier) is frozen, and we train our generator to produce effective adversarial patches. We then alternate between training the generator and discriminator for each batch, in the usual manner for training a GAN.

The loss function for \(G\) was defined in Eq. 2. Our loss function for \(f\) is

$$\begin{aligned} L_f=\mathbb {E}_{c,z,x,w,t,l} (J(f(A(G(z,c), x, l, t)),y) + J(f(x), y) + \lambda J(f(w), c)), \end{aligned}$$
(3)

where \(w\) are images of class \(c\). Recall that \(J(f(x), y)\) is the cross-entropy loss between the output of CNN \(f\) applied to classify the image \(x\) and the ground truth class \(y\). In practice to approximate the expectation we sample x in mini-batches from a set of training images, and for each image we randomly pick \(c\ne y\) from our set of attack classes (Sect. 4), \(l\), \(t\) from fixed distributions \(\mathcal {L}\), \(\mathcal {T}\), and \(z\) from a standard normal distribution. The first term of the loss ensures that the model correctly classifies images with patches, the second ensures that the model continues to correctly classify images without patches, and the third is to ensure that it continues to correctly classify images of class \(c\). We empirically selected the weight \(\lambda \) of the third term to have a value of 2.
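Assembling Eq. 3 for a mini-batch could look as follows; apply_patch and the noise dimension follow the earlier sketches, c is a LongTensor of attack-class indices (one per image, assumed to match the batch size of w), and w is a batch of clean images of class c. This is a sketch under those assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(f, G, x, y, w, c_onehot, c, loc, scale, angle, lam=2.0):
    """Eq. 3: three cross-entropy terms, with the third weighted by lambda = 2."""
    z = torch.randn(x.size(0), 100, device=x.device)          # noise ~ N(0, I)
    patched = apply_patch(G(z, c_onehot), x, scale, angle, loc)

    loss_patched = F.cross_entropy(f(patched), y)             # classify attacked images correctly
    loss_clean = F.cross_entropy(f(x), y)                     # keep clean-image accuracy
    loss_attack_cls = F.cross_entropy(f(w), c)                # keep accuracy on the attack class itself
    return loss_patched + loss_clean + lam * loss_attack_cls
```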

3.3 Training Methodology

The architecture of our generator is close to standard for a GAN, and in place of the discriminator we have a CNN classifier which we intend to robustify. Instead of using the discriminator as a tool to enable the generator to learn to sample from some underlying distribution from which the training data are drawn (e.g. the distribution of natural images), we use a similar architecture to perform a different task. The main difference stems from our final goal: to end up with a discriminator that is not fooled by any patches (and hence a generator with a low success rate), which is the opposite of a regular GAN. Another difference is that our discriminator is a classifier over many classes (here, the 1000 ImageNet classes), not a binary real/fake classifier; this again means the generator can never achieve its goal, since the goalposts constantly move, i.e. there is no static underlying distribution for it to approximate.

We pre-train the generator for 500 epochs before alternating the training of both for each batch. For the generator we use an Adam optimiser with learning rate 0.001 and for the discriminator, Adam with learning rate of \(2 \times 10^{-7}\).
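Putting the pieces together, the alternating schedule might be implemented as below. sample_attack_class and sample_transform are hypothetical helpers standing in for the sampling of \(c\ne y\), \(l\sim \mathcal {L}\) and \(t\sim \mathcal {T}\) described above, and train_loader / num_epochs stand in for the data pipeline; only the learning rates and the 500-epoch pre-training phase come from the text.

```python
import torch

# G: ConditionalPatchGenerator, f: classifier to defend (both defined in the earlier sketches).
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)      # generator optimiser
opt_f = torch.optim.Adam(f.parameters(), lr=2e-7)      # discriminator (classifier) optimiser

for epoch in range(num_epochs):
    for x, y in train_loader:
        c, c_onehot, w = sample_attack_class(y)        # hypothetical: attack class c != y, plus images w of class c
        loc, scale, angle = sample_transform()         # hypothetical: l ~ L, t ~ T

        # Generator step (runs from the very first epoch).
        opt_G.zero_grad()
        generator_loss(f, G, x, c_onehot, c, loc, scale, angle).backward()
        opt_G.step()

        # Classifier step only after the 500-epoch generator pre-training phase.
        if epoch >= 500:
            opt_f.zero_grad()
            discriminator_loss(f, G, x, y, w, c_onehot, c, loc, scale, angle).backward()
            opt_f.step()
```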

Fig. 5.

Training losses and success rates for our VaN defence. (a) shows the losses for \(G\) and \(f\). Recall that for the first 500 epochs \(f\) is not trained, hence its line is absent. (b) shows the training success rates of patches from \(G\) applied to the current \(f\) (blue), as well as to the original \(f\) (green). It also shows the success of \(f\) at classifying images from \(\mathcal {Y}\), with (orange) and without (red) patches, and also from \(\mathcal {A}\) (purple); see Sect. 4 (Color figure online)

Figure 5 shows both the losses and the success rates on the training data for both \(G\) and \(f\). We observe that during the 500 epoch pre-training phase for \(G\) its loss \(L_G\) becomes close to zero and its attack success rate climbs to \(\sim \)80%, showing that we can produce effective adversarial patches with our conditional generator. Once the discriminator is updated, it quickly learns not to be fooled by the patches, so the success rates for \(f\) increase while those for \(G\) decrease. The success rate of patches produced by \(G\) when applied to the original model is quite erratic, but declines over time. This confirms that \(f\) is diverging from its original state, and that the set of patches effective at fooling it diverges from those that originally fooled the undefended model.

Table 1. Control: Accuracy of models over the set of test images without attacks \(\hat{\mathcal {I}}\), reported for all ImageNet classes (\(\mathcal {Y}\)) and the subset of these classes used to form patches for APA (\(\mathcal {A}\)). Reported as top-1 accuracy for the undefended model, and the model defended by our method (D-VaN) or baselines.

4 Experiments and Discussion

We evaluate our proposed Vax-a-Net (VaN) method for defending against adversarial patch attacks (APAs) on image classification models trained using three popular network architectures; VGG-19 [27], Inception-v3 [30], and Inception-ResNet-v2 (IRN-v2) [29].

Table 2. Success rate of defences against adversarial patch attacks covering \(10\%\) or \(25\%\) of the image. We report figures for our Vax-a-Net defence (D-VaN) as well as baseline defences and undefended models. The defence success rate is the proportion of images classified correctly despite the application of APA (higher is better).
Table 3. Success rate of attacks against our models defended by Vax-a-Net, as well as models defended with the baselines, and undefended models. The attack success rate is the proportion of images classified as the adversarial target class when APA is applied (lower is better).

Baselines. We compare the efficacy of our Vax-a-Net defence (D-VaN) against 2 baseline APA defences: the local gradient smoothing (D-LGS) method of Naseer et al. [20] and the watermark removal method (D-WM) of Hayes [11]. We test the effectiveness of our defence and the baseline defences against 2 baseline patch attacks: the adversarial stickers (A-ADS) method of Brown et al. [3], and the deep image prior based (A-DIP) method of Gittings et al. [8]. For all attacks we used public open-source implementations, but for the defences, in the absence of author code, we use our own implementations in the open-source PyTorch library [22]. Due to the architecture of the pre-trained network available in PyTorch and the nature of the defence, we were unable to implement D-WM on the Inception-v3 model; results for this model were not reported in the original work.

Datasets. We evaluate over the ImageNet [6] dataset containing 1k object classes \(\mathcal {Y}\), using the published training (1.2M images) and test (50k images; 50 per class) partitions. For each of the architectures tested we use a model pre-trained on ImageNet, distributed with PyTorch. We refer to these as undefended models. Our proposed defence (D-VaN) involves further training of undefended models using the same training set. The test set comprises 50k images upon which attacks are mounted, each by inserting one adversarial patch. Let this unaltered test set be \(\hat{\mathcal {I}}\). Each patch is crafted to encourage an image containing an object of ground truth class \(y \in \mathcal {Y}\) to be misclassified as a single target class \(c \in \mathcal {A}\); we use the subset of 10 attack classes \(\mathcal {A} \subset \mathcal {Y}\) proposed by Gittings et al. [8]. We evenly distribute these attack classes across the test set; let this set of attack images be \(\mathcal {I}\).

Fig. 6.

Success rates of defended VGG-19 networks against APAs for patches covering up to 25% of the image.

Metrics. We measure the attack success rate as the proportion of \(\mathcal {I}\), each image containing a patch crafted to induce misclassification as some \(c \in \mathcal {A}\), that are indeed misclassified as \(c\); i.e. the success rate of a targeted attack. We measure the defence success rate as the proportion of \(\mathcal {I}\) that are correctly classified as their true class y (despite the APA). The inverse of the defence success rate is thus the untargeted attack success rate, i.e. the rate at which any misclassification occurs due to the APA. All success rates are expressed as a percentage of the 50k attack image set \(\mathcal {I}\) constructed with the APA analysed in that experiment. All experiments were run for 1000 training iterations with 5 restarts.
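Both metrics reduce to simple comparisons over model predictions on \(\mathcal {I}\); a sketch (argument layout assumed) follows.

```python
import torch

def attack_and_defence_success(pred: torch.Tensor, y: torch.Tensor, c: torch.Tensor):
    """pred: predicted class per attacked image; y: true class; c: adversarial target class."""
    attack_success = (pred == c).float().mean().item() * 100.0   # targeted attack success rate (%)
    defence_success = (pred == y).float().mean().item() * 100.0  # defence success rate (%)
    # 100 - defence_success is the untargeted attack success rate.
    return attack_success, defence_success
```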

4.1 D-VaN vs Baseline Defences

We first evaluate the performance of our defence (D-VaN) at reducing the effectiveness of adversarial patches synthesised by the existing APA attack methods A-ADS [3] and A-DIP [8]. Both of these methods are white-box attacks that run backpropagation through the model in order to generate patches to attack it.

We mount such attacks against our defence, the two baseline defences, and an undefended model as a control. In the case of the baseline methods we use patches trained on the undefended network and then apply them to the defended network, since the defence layers are not usually differentiable. In the case of our model we attack it using patches generated on both the defended and undefended networks: D-VaN(D) and D-VaN(U) respectively. This measures the transferability of the protection learned against our generator G to the A-ADS and A-DIP attacks. We report both D-VaN(D) and D-VaN(U) because they can each highlight different flaws in a network’s defences, and both make sense as real-world attack vectors.

Fig. 7.

Success rates of defended Inception-v3 networks against APAs for patches covering up to 25% of the image. We do not include a line for D-WM since the implementation of the defence was incompatible with Inception-v3.

Fig. 8.

Success rates of defended InceptionResNet-v2 networks against APAs for patches covering up to 25% of the image.

We consider patches of a variety of sizes up to \(25\%\) of the total image area. In all experiments, patches are placed randomly, anywhere in the image, and with a random rotation.

In Table 1 we report the accuracy of our model and all the baseline models on images with no adversarial attack, for \(\mathcal {Y}\) and \(\mathcal {A}\). The two baseline defences substantially reduce the accuracy of the model on unattacked images, which is very significant for most applications since adversarial examples are relatively rare, i.e. clean images represent the overwhelming majority of samples that will be encountered in the real world. Our defended network maintains the accuracy of the undefended classifier on this set for all 3 classifiers we tested. We also note that no defence method significantly reduces model sensitivity to the classes in \(\mathcal {A}\) on clean images; a defence that did so could cheat the trial by failing to ever identify images as these adversarial target classes.

4.2 Network Architecture and Patch Size

Table 2 reports the improved resilience of models under our defence, showing significantly higher defence success rates for VGG, Inception and IRN-v2 architectures at \(41.0\%\), \(51.3\%\), and \(60.7\%\) and \(14.7\%\), \(23.5\%\), and \(35.0\%\) respectively for smaller and larger patches in the case of D-VaN(D). For smaller patches these rates are at least \(30\%\) higher than the closest baseline defence method, and for larger patches they are comparable. If we consider instead D-VaN(U), then for smaller patches the accuracy is reduced by only at most \(25\%\) from the original, and for larger patches it is still greater than \(60\%\) of its original value.

Fig. 9.

Grad-CAM [25] visualisations for our VaN defended VGG network vs the undefended model. The original image (a) is classified (correctly) as a great white shark by both the undefended and defended (D-VaN) models, whereas the patched image is misclassified by the undefended model, but classified correctly as a shark by the defended (D-VaN) model. (Color figure online)

Table 3 shows the reduced vulnerability of our defended models, for all 3 architectures. Again the reduction is most evident for smaller patches, where our defended classifier is fooled less than \(10\%\) as often as our closest competitor. The performance for larger patch sizes is closer, but we still outperform the baselines. In the case of D-VaN(U), our attack success rate is reduced to less than \(5\%\) for all networks, even for the largest patches.

Figures 6, 7 and 8 show the dependence of attack and defence success rates on patch size, for our defence method as well as the baseline methods. Our method is an effective defence for all three architectures we test, and at all scales of patch. The performance of our method degrades as the size of the patch increases, which is expected since the largest patches cover up to 25% of the image, possibly occluding salient object detail.

4.3 Attention Under Attack

Figure 9 uses Grad-CAM [25] to localise CNN attention for a particular class, for both our D-VaN defended model and the undefended model. Here the model is attacked via A-ADS with target class ‘wallaby’, whereas the true class of the image is ‘great white shark’. Note that all maps are normalised: blue/purple indicates relatively high attention, green/blue relatively low. Images flooded with green/blue had a low response for that class (c, e, g, j).

On the original image (a) with no patch, our model (d) and the undefended model (b) behave similarly. Both decide that the most likely class is shark, and both identify the region containing the shark as being of high importance. For this unattacked image the response for the counterfactual class ‘wallaby’ is naturally low, and both (c, e) pick a somewhat arbitrary area of the image that was of low importance to the correct decision (shark).

When the adversarial patch A-ADS targeting the counterfactual class is introduced (lower row), the undefended model identifies the patch region as highly salient for the wallaby class (h) and decides on wallaby, whereas our D-VaN defended model does not change its decision from shark, and does not attend to the patch (i). Forcing Grad-CAM to explain shark for the undefended model (which was not the decision outcome, so attention is low), the original model picks out the area of the shark unoccluded by the patch (g), as does our defended model (i). In our case the model can correctly identify the shark, whereas the original model cannot, since its attention is drawn to the wallaby patch. For completeness, we show that the defended model does not localise a wallaby even when forced to explain wallaby in the attacked image (j).
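Attention maps of this kind can be reproduced with a minimal Grad-CAM implementation using forward/backward hooks; the sketch below is illustrative only and is not the authors' code. The choice of conv_layer (e.g. the last convolutional layer of VGG-19's feature extractor) is an assumption.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_class, conv_layer):
    """Grad-CAM [25] attention map for `target_class` on a single image x of shape (1, 3, H, W)."""
    feats, grads = [], []
    h1 = conv_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))

    model.zero_grad()
    score = model(x)[0, target_class]   # score of the class we want to explain
    score.backward()
    h1.remove()
    h2.remove()

    fmap, grad = feats[0], grads[0]                           # both (1, C, h, w)
    weights = grad.mean(dim=(2, 3), keepdim=True)             # global-average-pooled gradients
    cam = F.relu((weights * fmap).sum(dim=1, keepdim=True))   # weighted sum of feature maps
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
    return cam / (cam.max() + 1e-8)                           # normalised to [0, 1]
```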

Fig. 10.

Success rate of defended networks as we vary the number of classes of APA that our conditional generator produces.

4.4 Class Generalization

In Fig. 10 we examine the effect of changing the number of classes that our conditional generator produces. We train each of the three network architectures to defend against between 1 and 50 classes of adversarial patch, and we evaluate their performance against both A-ADS and A-DIP attacks with patches taking up 10% or 25% of the image. We find that the defence success rate is consistent as the number of classes changes, for each network and for each patch size, showing that our method does not break down as the number of classes is increased. The attack success rate for the most part increases slightly as the number of classes increases. The exception is large A-ADS patches on the Inception-v3 and InceptionResNet-v2 architectures, for which our model loses performance when targeting a very small number of classes. This suggests there is value in the attack class diversity available during training due to our conditional patch generator G.

4.5 Timing Information

Table 4 compares the time taken for inference using our method and baselines. An inference pass on the defended model takes the same time as on the undefended model; the architecture is unchanged. However our defence does take 2–3 h of training to ‘vaccinate’ the model. This process only needs to be run once, as does training the model a priori. The baseline APA defences run as a pre-process at inference time, and so take longer (and also degrade accuracy; Table 1). All runs used an NVIDIA GeForce GTX 1080 Ti GPU.

Table 4. Inference time (seconds) for an undefended VGG model trained on ImageNet, and that model with our defence or baseline defences applied.

4.6 Physical Experiment

To test the effectiveness of our defence against attacks in the physical world, where the appearance of the patch can differ from its digital form, we generated a patch to attack a VGG network targeting ImageNet class 964 “potpie” using A-ADS. We placed this patch on or around objects of 47 different ImageNet classes found in the physical world, for a total of 126 photographs containing a patch. The photos were taken on a Google Pixel 2 smartphone. The undefended classifier returned the adversarial vs. the correct class 84 vs. 9 times (attack success rate 90.3%), whereas the Vax-a-Net defended classifier returned 5 vs. 71 (attack success rate 6.6%).

5 Conclusion

We proposed Vax-a-Net: a method to ‘immunise’ (defend) CNN classifiers against adversarial patch attacks without degrading the performance of the model on clean data and without slowing down inference. In the process we produced a conditional generator for adversarial patches, and used an adversarial training methodology that updates the generator during training rather than synthesising patches from scratch at each iteration. We showed experimentally that our method performs better than the baseline defences in both a targeted and an untargeted sense, and across three different popular network architectures. Furthermore, we showed that our network is resilient to patches produced by two different attacks, and to patches produced either on our defended network or on the original undefended network, which demonstrates that our defence taught the network genuine robustness to these patches, rather than simply hiding its gradients or ignoring a specific set of patches. Future work could extend these methodologies to defend networks for different tasks, such as against patch attacks on object detectors.