1 Introduction

Although an emerging technology, the global market for Deep Neural Networks (DNNs) is valued at $38.71 billion by 2023, with a wide range of applications across sectors such as finance, energy & utilities, retail, IT & telecom, manufacturing, aerospace & defence, and healthcare (according to Allied Market Research) [36]. Along with the popularity of DNNs comes a growing concern about the safety of DNN models in carrying out the tasks (typically classification) in those applications, especially security- or safety-critical ones such as healthcare and self-driving vehicles.

To address this concern, the foremost issue is to ensure the quality of the input data on which DNN models depend. As a data-driven technique, a DNN model will only be as good or as bad as the data it is trained on. For instance, a speech recognition system trained on clean speech will not perform well on noisy speech. However, when applied to real-world tasks, it is inevitable that the testing data differs from the training data for a variety of reasons, such as mis-operations in data collection, natural noise, untrusted data sources, etc. Such unavoidable anomalous testing data leads to severe safety problems: the DNN tends to provide high-confidence predictions while being woefully incorrect [16].

A variety of works aim to detect anomalies in testing data, most of which focus on image classification applications. We classify these works according to the types of anomalous data they handle: out-of-distribution (OOD) data, adversarial (AD) data, and noise (NS) data.

OOD:

Out-of-Distribution (OOD) data refers to inputs that do not contain any of the classes modeled in the training distribution. For example, the clothing images from Fashion-MNIST are OOD data for a DNN trained with the MNIST data set which consists of hand-written digits. The OOD data considered in this paper are collections of meaningful natural images that are not from ID, excluding those crafted images (known as adversarial data) and meaningless images (classified as noise data).

AD:

Adversarial (AD) data is generated by introducing an imperceptible perturbation to an image from the in-distribution (ID), with the intention of inducing a DNN to make wrong judgments. The adversarial methods used in our experiments include FGSM [13], BIM [22], JSMA [33] and CW [5].

NS:

We consider two types of noise (NS) data. The first type (NS-I) is merely random noise (e.g., Gaussian noise). The second type (NS-II) is often known as fooling images, which are created by evolving meaningless images in order to mislead a DNN to output classes in ID with high confidence, such as the images generated in [30] (examples shown in Figure 9).

Through a comprehensive literature study (see Table 1 in Sect. 2), we observe that existing works either focus only on OOD and NS-I detection, or focus on AD detection. Few works detect both OOD and AD (a notable exception is [25]); and no works aim to detect NS-II.

As suggested by [4], there are no known intrinsic properties that differentiate natural images from adversarial images. We believe that for real-world tasks, all types of anomalies could potentially be fed into DNN models, and there is no effective way to tell ahead of time whether inputs are in-distribution, adversarial, or out-of-distribution images. Many existing works implicitly assume that the analysers know the type of anomaly in advance and build detection approaches for that particular type, which is unrealistic. Therefore, detection approaches should be developed to handle all three known types of anomalies.

The uncertainty of the anomaly types leads to a non-trivial challenge in the anomaly detection task. During our experiments, we observe that (1) in general, an approach designed for OOD detection performs badly at detecting AD and vice versa, and no combination of the results of individual approaches outperforms both the best result for OOD and the best result for AD. In addition, we observe that (2) many existing works require pre-processing of the input data. This hinders their applicability in real-world tasks, since one needs to know in advance which data is ID and which is OOD in the case of OOD detection, and which adversarial algorithm generated the data in the case of AD detection; removing this prior knowledge largely degrades performance. Therefore, this work improves anomaly detection accuracy without requiring prior knowledge of the input data. Our approach detects anomalies using features from the Most Discriminative Layer (MDL), which provides better distinction within the sub-domain of the test input, and then combines the MDL features with those of the logit layer to cover all three types of anomalies.

Contributions This paper proposes a uniform framework for detecting anomalous inputs, which offers wider coverage of anomaly types, easier applicability to various models, and better detection accuracy than the state-of-the-art approaches.

Coverage:

Our approach is able to detect all three types of anomaly inputs, including OOD, AD and NS, which addresses the challenge of unknown anomaly types in real-world applications.

Applicability:

Our approach provides a uniform way of performing anomaly detection, and thus can be applied to any existing model without requiring extra pre-processing.

Performance:

To evaluate our approach, we conduct extensive experiments and a comprehensive comparison with the state-of-the-art approaches. The experimental results show that our approach outperforms the best existing results for OOD, AD and NS detection in most cases.

2 Related Work

Table 1 A summary of related work

As mentioned in the previous section, existing works either focus on OOD and NS-I detection, e.g., [1, 12, 16, 17, 18, 26], or focus on AD detection, e.g., [8, 11, 27, 42, 46]. The only work that can detect both is Mahalanobis distance (MD) [25].

OOD+NS-I The seminal work for OOD detection is the baseline approach [16], which observes that the softmax prediction probability of OOD inputs tends to be lower than that of correct examples, so a threshold over the predicted softmax probability can be used to detect OOD. A year later, the work ODIN observes that temperature scaling and input perturbation (pre-processing) can enlarge the gap between ID and OOD and thus improve detection performance [26]. However, ODIN's pre-processing requires access to OOD samples in advance to fine-tune the degree of perturbation, which is unrealistic in practice. To address this limitation, the Early Layer Output (ELO) work [1] proposes a one-class classifier trained on the output of an early layer instead of the softmax layer, which does not need access to OOD samples. An alternative approach, Generalized ODIN (G-ODIN) [18], is later proposed, which tunes only on ID data instead of OOD data. Differing from the above works, Outlier Exposure (OE) [17] fine-tunes the pre-trained model using an auxiliary data set that is selected from a disjoint set of OOD samples. OE includes an additional loss term that minimizes the distance between the output distribution produced by the pre-trained model on the auxiliary data set and the uniform distribution; the softmax values are then used as anomaly scores. Similar to ELO and OE, our approach requires neither pre-processing nor the use of OOD samples in training. Nevertheless, our approach extracts data from a specific sub-domain, which achieves better separation of ID and OOD (see Fig. 3).
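To make the OE objective concrete, the fine-tuning loss described above can be sketched as follows; this is a minimal illustration assuming a PyTorch classifier `model` and batches `x_id`, `y_id`, `x_aux` of ID and auxiliary data (hypothetical names), not the exact implementation of [17].

```python
import torch
import torch.nn.functional as F

def oe_loss(model, x_id, y_id, x_aux, lam=0.5):
    """Cross-entropy on ID data plus a term pushing the softmax
    output on auxiliary (outlier) data towards the uniform distribution."""
    logits_id = model(x_id)
    ce = F.cross_entropy(logits_id, y_id)

    logits_aux = model(x_aux)
    log_probs_aux = F.log_softmax(logits_aux, dim=1)
    # Cross-entropy to the uniform distribution over the d classes
    # (equivalent, up to a constant, to the uniform-matching term of OE).
    uniform_term = -log_probs_aux.mean()

    # lam is a weighting hyperparameter balancing the two terms.
    return ce + lam * uniform_term
```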

AD A variety of approaches have been proposed to distinguish adversarial samples from their normal and noisy counterparts. For instance, Feinman et al. train a logistic regression detector using a distance-based generative learning method combining kernel density and Bayesian uncertainty features (KD+BU) [11]. Ma et al. propose an intrinsic characterization of adversarial regions, the local intrinsic dimensionality (LID), as a confidence score to separate adversarial samples [27]. The IF approach quantifies the correspondence between the training data and the classification of the network using the Influence Function, and outperforms LID [8]. Differing from the above works, Xu et al. apply Feature Squeezing (FS) to distinguish adversarial samples from ID without requiring special treatment of the input data [46]. These works can only detect AD and are not suitable for detecting OOD and NS.

Finally, the work MD measures the probability density of a test sample and uses the Mahalanobis distance as the confidence score to distinguish OOD and AD from ID. It is the first work that can detect both OOD and AD [25]. Our work is able to detect an additional type (NS-II) of anomaly and outperforms MD on most of the test data sets.

3 Preliminaries

In this section, we introduce the notions that are necessary to understand the remainder of the paper. Let a deep neural network (DNN) of m layers be represented as a function \(f:\mathcal {I}\rightarrow \mathcal {O}\), where \(\mathcal {I}\) is the input domain and \(\mathcal {O}\) is the domain of output vectors of length d. Given \(x\in \mathcal {I}\), we have \(f(x)=\langle o_1,\dots ,o_d\rangle \in \mathcal {O}\), and the final classification chosen by the DNN is \(C^f(x)=\underset{i \in \{1,\cdots ,d\}}{argmax} \left( o_{i}\right) \), i.e., the index of the largest element in the vector f(x). For \(\ell =1\dots m\), we write \(f^\ell (x)\) for the output vector in the feature space of layer \(\ell \). In the literature, the second-to-last layer (i.e., \(\ell =m-1\), right before the softmax layer) is often called the logit layer.
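To make the notation concrete, the layer output \(f^\ell (x)\) and the prediction \(C^f(x)\) can be obtained from a pre-trained model with a forward hook; the following is a minimal PyTorch sketch in which `model`, `layer` and the input batch `x` are placeholder names, not part of our released code.

```python
import torch

def layer_output(model, layer, x):
    """Return f^ell(x): the features produced by `layer` during a
    forward pass of `model` on input batch `x`, plus f(x) and C^f(x)."""
    captured = {}

    def hook(module, inputs, output):
        captured["feat"] = output.detach()

    handle = layer.register_forward_hook(hook)
    with torch.no_grad():
        logits = model(x)          # f(x), the output vector of length d
    handle.remove()

    preds = logits.argmax(dim=1)   # C^f(x), the predicted class index
    return captured["feat"], logits, preds
```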

It is commonly assumed that the training data of DNN f is drawn from the distribution \(\Delta \), known as the ID. We write \(\Delta _{in}(x)\) if \(x\in \mathcal {I}\) is from the ID of f. An anomaly detector \(g_f\) for DNN f is a binary classifier such that, given input \(x\in \mathcal {I}\), \(g_f(x)\) answers whether x is an anomaly with respect to f. Since it is often difficult to define what an anomaly distribution is, we focus on the three types of anomalies (i.e., OOD, AD and NS) in our experiments.

Our anomaly detection algorithm is based on a discriminative model known as Support Vector Domain Description (SVDD) [41]. Similar to the famous Support Vector Machine [43], SVDD defines support vectors for a sphere-shaped decision boundary that encloses the class of objects represented by the (unlabeled) training data with minimal volume, as shown in the following formulation.

$$\begin{aligned} \min \limits _{R,\mathbf {a},\xi }\ R^2+ \frac{1}{n\nu }\sum _i\xi _i \end{aligned}$$
(1)
$$\begin{aligned}\text{ s.t. }\quad \forall i:\quad \Vert \mathbf {x}_i-\mathbf {a}\Vert ^2\le R^2+\xi _i,\quad \xi _i\ge 0\end{aligned}$$

The solution of the above optimization problem provides a center vector \(\mathbf {a}\), radius R and slack variables \(\xi _i\) such that the objective in Eq. (1) is minimized, provided that the squared distance from each training point \(\mathbf {x}_i\) to the center \(\mathbf {a}\) exceeds \(R^2\) by at most \(\xi _i\). Here \(\nu \) is a constant in (0, 1] and n is the size of the training set. Intuitively, a smaller \(\nu \) gives more weight to the second term of the objective in Eq. (1), which imposes smaller values for \(\xi _i\) and a larger R. The solution of Eq. (1) allows us to determine whether a test input \(\mathbf {z}\) is from the ID by checking the following condition.

$$\begin{aligned} \Vert \mathbf {z}-\mathbf {a}\Vert ^2=(\mathbf {z}\cdot \mathbf {z})-2\sum _i\alpha _i(\mathbf {z}\cdot \mathbf {x}_i)+\sum _{i,j}\alpha _i\alpha _j(\mathbf {x}_i\cdot \mathbf {x}_j)\le R^2 \end{aligned}$$
(2)

Here \(\alpha _i\) (\(\alpha _j\)) is the Lagrange multiplier associated with the constraint for the i-th (j-th) training input when solving Eq. (1), which is non-zero only if that training input is used as a support vector. Since all inputs (including the test input) appear only in the form of inner products, the inner products can be replaced by kernel functions, among which the Gaussian Radial Basis Function (RBF) provides the best performance in practice [40]. The RBF kernel is given in the following formulation, where the free parameter s controls the spread (i.e., how tight the density is) of the kernel.

$$\begin{aligned} K(\mathbf {x}_{i}, \mathbf {x}_{j}) = \exp (-\Vert \mathbf {x}_{i} - \mathbf {x}_{j}\Vert ^{2}/s^2) \end{aligned}$$
(3)
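Since SVDD with the RBF kernel is equivalent to the \(\nu \)-one-class SVM formulation, the classifier defined by Eqs. (1)-(3) can be sketched with scikit-learn's `OneClassSVM`; the feature matrices below are random placeholders and the hyperparameter values are illustrative only.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# id_features: (n, d) array of ID training features from one layer.
id_features = np.random.randn(1000, 64)      # placeholder data

# nu plays the role of the constant in Eq. (1); gamma corresponds to 1/s^2 in Eq. (3).
svdd = OneClassSVM(kernel="rbf", nu=0.001, gamma="scale")
svdd.fit(id_features)

# decision_function gives a signed, distance-like score: larger values
# mean the sample lies deeper inside the learned boundary of Eq. (2).
test_features = np.random.randn(5, 64)       # placeholder test data
scores = svdd.decision_function(test_features)
```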

Early-Layer Output (ELO) [1], the work most related to ours, trains a one-class SVDD classifier on an early-layer output of ID data in a latent space, based on the observation that there exists an early layer, called the Most Discriminative Layer (MDL), such that in its latent space the ID data and OOD data are well separated.

4 Our Approach

We propose a uniform framework for the anomaly detection task. In the training phase, for a given DNN classifier, we empirically choose the Most Discriminative Layer (MDL) using a randomly picked OOD set (mix data), in a way similar to ELO [1] (Step 1). Then we use the data generated from the MDL and the logit layer of the DNN to train two one-class SVDD classifiers for each known class (Step 2). During the testing phase, an input is first given to the DNN classifier, which produces an output class i. The data from the corresponding MDL and logit layers are forwarded to the corresponding SVDD classifiers, i.e., \(\text{ SVDD}^1_i\) and \(\text{ SVDD}^2_i\), and the scores obtained from the SVDD classifiers are combined to form a final judgment on whether the given input is anomalous (Step 3). An overview of our approach is sketched in Fig. 1.

Fig. 1 The overview of the proposed approach, which consists of three steps. Step 1: Select one of the hidden layers as the MDL, i.e., the layer with the minimum detection error for the mix data. Step 2: Feed the extracted data of the MDL and the logit layer into the corresponding SVDD\(_{i}^{\ell }\). Step 3: Compute the score of an input sample using the corresponding SVDD\(_{i}^{\ell }\) of the MDL and the logit layer, which is used to detect anomalous inputs

Let layer \(\ell \) be the MDL of a reasonable DNN classifier f with respect to ID, with sub-domains for the known classes \(1\dots d\); the set \(\{f^\ell (x)\mid \Delta _{in}(x)\}\) forms a manifold of ID in the latent space of layer \(\ell \). We conjecture that anomaly detection precision can be further improved by making predictions conditional on the output of the DNN, so we focus on the sub-domains \(\{f^\ell (x)\mid \Delta _{in}(x)\wedge C^f(x)=i\}\) for each known class \(i=1\dots d\).
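Operationally, the sub-domain restriction above simply groups the ID training features by the class predicted by the DNN before training one SVDD per group; a minimal sketch with hypothetical inputs `features` (layer outputs) and `preds` (DNN predictions):

```python
import numpy as np

def split_by_predicted_class(features, preds, num_classes):
    """Group feature vectors by the DNN's predicted class, i.e.
    {f^ell(x) | C^f(x) = i} for each known class i."""
    features = np.asarray(features)
    preds = np.asarray(preds)
    return {i: features[preds == i] for i in range(num_classes)}

# One SVDD per class is then trained on each group, e.g.:
# per_class_svdd = {i: OneClassSVM(kernel="rbf", nu=0.001).fit(feats)
#                   for i, feats in split_by_predicted_class(F, P, 10).items()}
```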

Fig. 2 A two-dimensional representation of features extracted from the MDL of a LeNet model trained on MNIST. The feature cluster consisting of data from all 10 MNIST classes is shown in green dots, and yellow dots represent F-MNIST (OOD) data

Fig. 3 Two-dimensional representations of features extracted from the MDL of a LeNet model trained on MNIST. The feature clusters for the 10 classes are shown, with green dots for MNIST (ID) data and yellow dots for F-MNIST (OOD) data

Table 2 Comparison of our approach with the baseline, ODIN, Mahalanobis distance (MD) and ELO for OOD data
Table 3 Comparison of our approach with the KD+BU, LID, MD and ELO for adversarial data
Table 4 Comparison of our approach with the baseline, ODIN and MD for noise data

In Eq. (1), the approach of ELO empirically chooses \(\nu =0.001\), which imposes a strong penalty on samples with distance larger than R (i.e., the slack variables should be very small) when minimizing the objective of Eq. (1); this is reasonable if the sub-domains for different classes are relatively far apart. Since R becomes relatively large, potentially more anomalous samples, especially those spatially closer to ID such as adversarial samples, are classified as ID. Moreover, an anomalous input positioned between two ID clusters of distinct classes in the latent space may also be classified as ID. Figure 2 provides a 2-D view of the clusters of MNIST samples (green dots) and Fashion-MNIST samples (yellow dots) in the MDL of a LeNet model. If we split the MNIST samples and Fashion-MNIST samples into 10 classes based on the classification results of the LeNet model, and study the MNIST samples and Fashion-MNIST samples confined to each class, we get the 2-D view in Fig. 3, which shows that near-perfect separation can be achieved in some classes (e.g., the 1st, 5th, 9th and 10th classes). Our experimental results shown in Tables 2 and 3 confirm that the method with sub-domain splitting is comparable (if not marginally better) to ELO for the detection of OOD inputs, and significantly outperforms ELO for the detection of adversarial inputs.

Our second observation is that the features represented by different layers of a DNN indeed have distinctive discriminative power on anomalies. In the literature, the softmax and logit layers [16, 25, 26] are used to distinguish OOD from ID, while some other works consider the use of early layers [1, 19]. Since AD anomalies are crafted by introducing imperceptible perturbations to images from the ID, they are closer to the ID in the input domain than OOD and NS in the majority of adversarial attack scenarios, especially AD with relatively small perturbations. Therefore, intuitively, more processing in the original classifier is required to separate them from ID, i.e., in the penultimate logit layer rather than the earlier MDL. This has also been described in detail in the literature [32]. Figure 4 provides a 2-D view of the clusters of MNIST ID samples (green dots) and JSMA AD samples (yellow dots) in the MDL of a LeNet model. Figures 5 and 6 show the feature clusters for the 10 classes of MNIST ID samples (green dots) and JSMA AD samples (yellow dots) in the MDL and the logit layer of a LeNet model, respectively. This conjecture is also confirmed by the results in Tables 2 and 3. We believe that combining the power of the early layers and the late layers can achieve better precision in detecting different types of anomalies.

Based on the above observations, we train two SVDD detectors for layers \(\ell _1\) and \(\ell _2\) for each class \(i\in \{1,\dots ,d\}\), and combine the results by defining \(g_i(x) = \beta _1\cdot g^{\ell _1}_{f,i}(x)^* + \beta _2\cdot g^{\ell _2}_{f,i}(x)^*\) as a score to determine whether x is an anomaly, given \(C^f(x) = i\). In this formulation, \(\ell _1\) is the MDL, which is determined empirically and is most of the time an early layer that gives better precision in detecting anomalies than any other layer, and \(\ell _2\) is the logit layer. When combining scores, \(g^{\ell }_{f,i}(x)^*\) is the normalized value of \(g^{\ell }_{f,i}(x)\); normalization is applied because \(g^{\ell _1}_{f,i}\) and \(g^{\ell _2}_{f,i}\) tend to produce scores of different scales. Coefficients \(\beta _1\) and \(\beta _2\) balance the weights of the two detectors. As shown in Fig. 7, which presents the result of a preliminary experiment, setting \(\beta _1=\beta _2=0.5\) produces close-to-optimal precision on all the given OOD, AD and NS data sets when CIFAR-10 is the ID in a ResNet model. The figure also suggests that relying only on the MDL may provide acceptable results for the detection of out-of-distribution anomalies (treating TinyIm, LSUN, iSUN and SVHN as OOD) and noise (e.g., Gaussian noise), while relying only on the logit layer may provide acceptable results for the detection of a few adversarial attacks.
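A minimal sketch of the combined score \(g_i(x)\): the two SVDD scores are first min-max normalized (as defined later in Eq. (5)) and then mixed with \(\beta _1=\beta _2=0.5\); the variable names are hypothetical.

```python
def min_max_normalize(score, score_min, score_max):
    # Min-max normalization (Eq. (5)); score_min and score_max are taken
    # from ID scores of the same class and the same layer.
    return (score - score_min) / (score_max - score_min)

def combined_score(s_mdl, s_logit, stats_mdl, stats_logit,
                   beta1=0.5, beta2=0.5):
    """g_i(x) = beta1 * normalized MDL score + beta2 * normalized logit score."""
    s1 = min_max_normalize(s_mdl, *stats_mdl)      # stats = (min, max)
    s2 = min_max_normalize(s_logit, *stats_logit)
    return beta1 * s1 + beta2 * s2
```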

Fig. 4 A two-dimensional representation of features extracted from the MDL of a LeNet model trained on MNIST. The feature cluster consisting of all 10 MNIST classes is shown in green dots, and yellow dots represent JSMA (AD) data

Fig. 5 Two-dimensional representations of features extracted from the MDL of a LeNet model trained on MNIST. The feature clusters for the 10 classes are shown, with green dots for MNIST (ID) data and yellow dots for JSMA (AD) data

Fig. 6 Two-dimensional representations of features extracted from the Logit layer of a LeNet model trained on MNIST. The feature clusters for the 10 classes are shown, with green dots for MNIST (ID) data and yellow dots for JSMA (AD) data

Fig. 7 AUROC of the combined anomaly detection model with different weight coefficients for values from the logit layer of a ResNet model (CIFAR-10 as ID)

Threshold Similar to other methods [16, 26], we need thresholds to distinguish normal inputs from anomalous inputs. Different from most other works, we define multiple thresholds based on the classes of the training samples. In the case of MNIST, there are 10 classes of ID data, so we define a threshold for each class. When a sample x is given to the DNN model, which generates output class i, our approach collects data from the MDL and the logit layer for the SVDDs to generate a score that is compared with \(\tau _i\). The threshold \(\tau _{i}\) is computed so that \(95\%\) of the test samples from class i of ID have scores above \(\tau _{i}\). The threshold-based discriminator can be formally described as follows.

$$\begin{aligned} isAnomaly(x) = \left\{ \begin{aligned}&True, \;\;\; if \;\; g_i(x) < \tau _{i} \\&False, \;\;\; if \;\; g_i(x) \ge \tau _{i} \end{aligned} \right. \end{aligned}$$
(4)
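In practice, \(\tau _i\) can be taken as the 5th percentile of the combined scores of ID samples predicted as class i, so that \(95\%\) of them score above \(\tau _i\); the following is a minimal sketch with hypothetical inputs.

```python
import numpy as np

def per_class_thresholds(id_scores, id_preds, num_classes, tpr=0.95):
    """tau_i is the (1 - tpr)-quantile of ID scores for class i, so that
    a fraction `tpr` of ID samples of class i satisfy g_i(x) >= tau_i."""
    id_scores = np.asarray(id_scores)
    id_preds = np.asarray(id_preds)
    return {i: np.quantile(id_scores[id_preds == i], 1.0 - tpr)
            for i in range(num_classes)}

def is_anomaly(score, pred_class, thresholds):
    # Eq. (4): flag the input as anomalous when its score falls below tau_i.
    return score < thresholds[pred_class]
```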

Note that SVDD performs poorly on high-dimensional data. Since the MDL is usually an early convolutional layer, its feature space is often high-dimensional. In this case, we compute the mean of each channel to reduce the dimension of the features extracted from the MDL. More precisely, let \(f^{\ell } \in \mathbb {R}^{d \times w \times h}\) be the feature maps of a convolutional layer, where d, w and h are depth, width and height, respectively. The feature size for the SVDD classifier is then reduced from \(d \times w \times h\) to d, with each dimension taking the average of all \(w \times h\) values of the same depth.
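This channel-mean reduction amounts to global average pooling over the spatial dimensions; a short sketch assuming a batch of feature maps of shape (batch, d, w, h):

```python
import torch

def reduce_feature_maps(feature_maps):
    """Reduce (batch, d, w, h) convolutional features to (batch, d)
    by averaging the w x h values of each channel."""
    return feature_maps.mean(dim=(2, 3))
```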

Normalization As mentioned before, the SVDD classifiers for the MDL and the logit layer tend to generate scores of different scales. If we simply combine scores with incomparable scales, one SVDD classifier is likely to be weighted more than the other, leading to undesirable results. Therefore, a normalization step is essential. In our approach, we apply min-max normalization.

$$\begin{aligned} score_{i}^{*} = \frac{score_{i} - score_{min}}{score_{max} - score_{min}} \end{aligned}$$
(5)

where the \(score_{min}\) and \(score_{max}\) are the minimum and maximum of the score vector, respectively.

In summary, our SVDD detectors are trained from ID data only, i.e., \(g^\ell _{f,i}\) only depends on the feature space at layer \(\ell \) of training inputs x with \(C^f(x)=i\). Most of the ID data is wrapped inside the hypersphere decision boundary in the feature space, as defined by Eq. (1), and the hyperparameter \(\nu \) controls the relative size of R and the percentage of training data allowed outside the boundary. Through initial experiments we empirically choose an early layer, i.e., the MDL, and determine the coefficients \(\beta _1\) and \(\beta _2\), such that the detector \(g_i(x) = \beta _1\cdot g^{\ell _1}_{f,i}(x)^* + \beta _2\cdot g^{\ell _2}_{f,i}(x)^*\) produces a score by combining information from the MDL (as \(\ell _1\)) and the logit layer (as \(\ell _2\)) when x is classified as class i (i.e., \(C^f(x)=i\)). This score is then used to decide whether x is an anomaly.

5 Experiments, Evaluations and Discussions

We conduct experiments on three types of pre-trained DNN models, with three data sets chosen as ID, against various OOD, AD and NS data sets. Our testing code is publicly available at https://github.com/fangzhenzhao/AnomalyDetection_NPL.

Table 5 Comparison of results with OE for the baseline, ODIN, Mahalanobis distance (MD) and ELO on OOD data
Table 6 Comparison of results with OE for KD+BU, LID, MD and ELO on adversarial data
Table 7 Comparison of results with OE for the baseline, ODIN and MD on noise data

5.1 Experiment Settings

We choose three popular DNN models used for image classification. All DNN models are pre-trained. The anomaly detection algorithm is run on a Windows 10 desktop equipped with an Intel i7-9700 3.0 GHz processor, 16 GB RAM and an Nvidia GeForce GTX 1660 Ti.

  1. A LeNet [24] model with two convolutional layers and three fully connected layers. The model is trained on the MNIST data set [24] and achieves \(99.20\%\) accuracy on the testing set. MNIST consists of 60,000 \(28\times 28\) grayscale images of hand-written digits in the training set and 10,000 images in the testing set.

  2. A ResNet [15] model for the CIFAR-10 [21] data set and another ResNet model for the SVHN [29] data set, achieving accuracies of \(91.65\%\) and \(96.12\%\), respectively. CIFAR-10 consists of 50,000 and 10,000 \(32\times 32\) color images in its training set and testing set, respectively, with each image belonging to one of the ten classes. SVHN consists of 73,257 and 26,032 colored house numbers from Google Street View images in its training set and testing set, respectively.

  3. A VGG [39] model for the CIFAR-10 data set and another VGG model for the SVHN data set, achieving accuracies of \(93.47\%\) and \(95.56\%\), respectively.

Outlier Exposure (OE) has been shown to be an effective fine-tuning method for improving the performance of existing anomaly detectors [17, 31, 34]. In this work, we also present experimental results that combine OE with our approach, shown in the 'ours + OE' columns of Tables 5, 6 and 7. The authors of [17] demonstrate that fine-tuning the pre-trained model with only 50,000 samples from an auxiliary data set is enough to improve the performance of existing anomaly detectors. In this work, we use 50,000 English letters from E-MNIST [7] as the auxiliary data set to fine-tune the LeNet model. For the VGG and ResNet models, we use 50,000 samples from the TinyImageNet data set [9] to perform the fine-tuning. Note that the training data from the auxiliary data set, the anomaly testing data and the ID testing data are pairwise disjoint.

Evaluation Metrics

Given a (binary) anomaly detector, we define a true positive (TP) as a case in which an input from ID is correctly reported as \(isAnomaly(x) = False\), and a false negative (FN) as a case in which an input from ID is incorrectly reported as True, i.e., as an anomaly. Similarly, a true negative (TN) is a case in which an anomalous input is correctly reported as True, and a false positive (FP) is a case in which an anomalous input is incorrectly reported as False, i.e., as data from ID. We adopt two commonly used metrics, TNR (True Negative Rate) at \(95\%\) TPR (True Positive Rate) and the Area Under the Receiver Operating Characteristic curve (AUROC), to evaluate the effectiveness of our method.

Since we have a detector \(g_i\) for each class i, all counted values are taken as weighted averages over the classes. For example, TPR = \(\sum _{i=1,\dots ,d}\gamma _i\cdot \text{ TPR}_i\), where TPR\(_i\) is the true positive rate calculated for inputs classified as i by the DNN and \(\gamma _i\) is the percentage of samples classified as i.
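The two metrics and the class-weighted aggregation can be computed as follows; this is a sketch assuming per-sample scores, binary ID labels and DNN predictions stored in NumPy arrays (hypothetical names), not the exact evaluation script.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def tnr_at_tpr(id_scores, anomaly_scores, tpr=0.95):
    """TNR at 95% TPR: fraction of anomalies scoring below the
    threshold that keeps `tpr` of the ID samples above it."""
    threshold = np.quantile(id_scores, 1.0 - tpr)
    return float(np.mean(anomaly_scores < threshold))

def weighted_auroc(scores, is_id, preds, num_classes):
    """Per-class AUROC weighted by the fraction of samples the DNN
    assigns to each class (the gamma_i weights in the text)."""
    total, weighted = 0.0, 0.0
    for i in range(num_classes):
        mask = preds == i
        # Skip classes without both ID and anomalous samples.
        if mask.sum() == 0 or len(np.unique(is_id[mask])) < 2:
            continue
        gamma_i = mask.mean()
        weighted += gamma_i * roc_auc_score(is_id[mask], scores[mask])
        total += gamma_i
    return weighted / total
```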

Existing works mostly focus on detecting either OOD only or AD only. Therefore, we compare our results on OOD data sets and noise data with models designed for OOD detection, and our results on AD data with models designed for adversarial attack detection, separately.

5.2 Experiment Results

OOD Detection

We consider several OOD data sets for evaluating the effectiveness of our method. In particular, Fashion-MNIST (F-MNIST) [44] and Omniglot [23] are used as OOD for the LeNet model trained on MNIST. For the ResNet model trained on CIFAR-10, the OOD sets are TinyImageNet [9], LSUN [47], iSUN [45] and SVHN [29]. For the ResNet model trained on SVHN, the OOD sets are TinyImageNet, LSUN, iSUN and CIFAR-10. The experiments for the two VGG models are set up in the same way as for the ResNet models. Note that we do not test MD with feature ensemble, which uses outputs from all layers but involves tuning with particular OOD sets; we only apply the version of MD that uses the logit layer of the DNN.

The results for OOD detection of these models are presented in Table 2, where the data set in brackets next to the model denotes the ID set, e.g., MNIST is the ID for the LeNet model. As the results show, our method has the best precision for the detection of OOD anomalies in most cases. Even in the one case where ELO is better, the difference is minor.

AD Detection

We compare with works designed for adversarial detection, including KD+BU [11], LID [27] and MD [25]. We also include the ELO method because it is the work most related to ours. The adversarial samples used in the experiment are generated by various well-known methods, including FGSM [13], BIM [22], JSMA [33] and CW [5]. For the BIM attack, we consider two scenarios: BIM-a, which stops iterating as soon as the attack is successful ('at the decision boundary'), and BIM-b, which attacks for a fixed number of iterations that is well beyond the average misclassification point ('beyond the decision boundary'). Some normal inputs and adversarial inputs are displayed in Fig. 8. Since both KD+BU and LID require training with adversarial inputs, both detectors are trained with FGSM in this experiment.

Fig. 8 Some normal images and AD anomaly images for MNIST, CIFAR-10 and SVHN

The results in Table 3 show that our method produces the best precision in about half of the cases in terms of AUROC values. For the remaining cases where other methods have better precision, our scores are not far behind, except for the two BIM-a cases on the SVHN data set.

Fig. 9 NS-II anomaly inputs

NS Detection In this experiment we have prepared three types of noise images (a generation sketch for the NS-I sets is given after the list).

  1. The Gaussian noise (NS-I) set consists of 10,000 images in which every pixel is sampled from a Gaussian distribution with mean \(\mu = 0.5\) and \(\sigma = 1\), clipped to [0, 1];

  2. The Uniform noise (NS-I) set consists of 10,000 images in which every pixel is sampled from a uniform distribution over [0, 1];

  3. The Fooling images (NS-II) are generated by evolving meaningless images in order to mislead a DNN into outputting classes in ID with high confidence. We adopt the algorithm from Section 3 of [30]. The Fooling Image set fed to the LeNet model consists of 10,000 \(28\times 28\) images (confidence \(\ge 99.99\%\)). The Fooling Image sets fed to the ResNet and VGG models both consist of 10,000 \(32 \times 32\) images (confidence \(\ge 99.9\%\)). These images are totally unrecognizable to human eyes. Two examples from NS-II are displayed in Fig. 9.
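The two NS-I sets above can be generated with a few lines of NumPy under the stated parameters; the NS-II fooling images require the evolutionary procedure of [30] and are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, h, w = 10_000, 28, 28          # 32 x 32 for the CIFAR-10/SVHN models

# NS-I: Gaussian noise images, clipped to the valid pixel range [0, 1].
gaussian_noise = np.clip(rng.normal(loc=0.5, scale=1.0, size=(n, h, w)), 0.0, 1.0)

# NS-I: Uniform noise images with pixels drawn from [0, 1].
uniform_noise = rng.uniform(low=0.0, high=1.0, size=(n, h, w))
```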

The results shown in Table 4 indicate that only our method achieves near \(100\%\) precision in detecting the given noise inputs. Note that some methods, including the baseline [16] and ODIN [26], which use the classification probability of an input to decide whether the image is abnormal, fail to detect anomalies from NS-II and in most cases classify them as ID with high confidence. In particular, these methods mostly report 0 in the column of TNR at 95% TPR. One possible explanation is that inputs from NS-II are quite different from any known anomaly distributions considered by these approaches, and are therefore relatively hard for them to detect.

5.3 Comparison of Results with Outlier Exposure

Since OE is designed for and has proved effective in enhancing Out-of-Distribution (OOD) detection tasks [17], we check whether it is also useful in our approach by using OE to fine-tune the models. However, in our experiments, it is inconclusive whether OE is an effective way to enhance precision when the target anomaly includes not only OOD but also AD. First, out of the 18 OOD detection benchmarks, OE improves the results of our method on 11 benchmarks with respect to the AUROC metric. For adversarial (AD) detection tasks, our method combined with OE produces equal or better performance in 15/25 cases. Nevertheless, the performance of our method is enhanced on a few benchmarks; e.g., for the SVHN data set, the results of the two BIM-a cases are significantly improved (\(11.74\%\) gain in the VGG model and \(3.89\%\) gain in the ResNet model). For Noise (NS) detection, the performance of our method with OE also improves in most cases (equal or better in 13/15 cases), although the improvements are mostly minor.

Regarding the other methods, similar improvements in performance are observed for the baseline, ODIN and MD on OOD detection tasks. For AD detection, the improvements for KD+BU, LID, MD and ELO are in general insignificant (some cases even suffer performance degradation). The effect of OE on NS detection is in general positive for all the tested methods.

Table 8 Comparison of results with AE, VAE, AE&KL and MemAE for OOD data
Table 9 Comparison of our approach with the AE, VAE, AE&KL and MemAE for adversarial data

5.4 Comparison of Results with AEs

Another class of approaches for anomaly detection, which is substantially different from the main methodology presented in this paper, applies autoencoder (AE) based structures, such as the AE [3] and the VAE [20], to measure the difference between the reconstruction losses of normal and anomalous examples. In general, AEs are used to detect anomalous data in an unsupervised manner. More recent works include MemAE [12], which augments the autoencoder with a memory module and targets OOD, and AE&KL [42], which measures the difference between the outputs of the DNN on the original data and on the reconstructed data, targeting mostly AD detection. To achieve better coverage of the anomaly detection works in the literature, we compare the performance of our method with a few recent AE-based methods, and demonstrate that our method delivers more stable results when facing OOD and AD.

In the literature, AEs are applied in two main ways for anomaly detection tasks. First, an AE is used to reconstruct a test sample, and the reconstruction error is calculated and directly used for detection [6, 12, 14, 49]. Second, an AE is used to learn a low-dimensional representation of the input data in its latent space, and then distance-based metrics are applied to measure the difference between a test example and the ID data set [10, 37, 42, 50]. The reconstruction error or distance metric is used as an anomaly score and compared with a given threshold; samples above the threshold are considered anomalous.

We compare our proposed method with a few standard or recent anomaly detectors using AE models, including the autoencoder (AE) [3], variational autoencoder (VAE) [20], memory-augmented autoencoder (MemAE) [12] and autoencoder combined with KL divergence (AE&KL) [42]. The AE [3] and VAE [20] are widely used for anomaly detection [2], although these models may also reconstruct anomalous data well, causing unsatisfactory detection performance [35, 48]. MemAE [12] is a more recent work that augments the autoencoder with a memory module representing the prototypical elements of the normal data, so that the reconstruction is obtained from only a restricted set of memory records. The above three works belong to the first category, i.e., they use the reconstruction error as the anomaly detection metric: a data sample with reconstruction error above a given threshold is anomalous. AE&KL defines the KL divergence as a distance measure between the output distributions of the classification model on the input data and on the reconstructed data, and thus belongs to the second category.
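For reference, the first category reduces to scoring a sample by its reconstruction error, and the AE&KL variant compares the classifier's output distributions on the original and reconstructed inputs; the following is a minimal sketch assuming a trained PyTorch autoencoder `ae` and classifier `classifier` (hypothetical names), not the exact formulations of [12] or [42].

```python
import torch
import torch.nn.functional as F

def reconstruction_score(ae, x):
    """First-category anomaly score: per-sample MSE between the input
    and its reconstruction; larger values suggest an anomaly."""
    with torch.no_grad():
        x_hat = ae(x)
    return F.mse_loss(x_hat, x, reduction="none").flatten(1).mean(dim=1)

def kl_score(classifier, ae, x):
    """AE&KL-style score: KL divergence between the classifier's softmax
    outputs on the original and the reconstructed input (a sketch only)."""
    with torch.no_grad():
        log_p = F.log_softmax(classifier(x), dim=1)
        q = F.softmax(classifier(ae(x)), dim=1)
    return F.kl_div(log_p, q, reduction="none").sum(dim=1)
```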

The detailed implementation and parameter settings are given in our public code. The experimental results are given in Tables 8 and 9. The results show that AE and VAE perform better than our model in a few experiments, e.g., in detecting OOD on the SVHN data set. However, they do not perform stably, because they also reconstruct some anomalous data well, e.g., BIM-a and CW for CIFAR-10 (ResNet) and SVHN (VGG/ResNet). For MemAE, we train 10 anomaly detectors according to the 10 subclasses, as in the original work [12]. The difference is that the original work selects one class as ID and considers the remaining classes from the same data set to be anomalous, while in our setting, the ID and the anomalous data (OOD and AD) come from different data sets. In addition, our approach detects anomalies per predicted class: for each input, we apply the target DNN to obtain a class, based on which we select the corresponding MemAE-based detector to test whether the input is anomalous. The results of MemAE are significantly inferior to those of our method and the other methods for OOD and AD, as shown in Tables 8 and 9. AE&KL demonstrates better performance than ours in some cases for AD, e.g., BIM-a for SVHN, but its results for BIM-b are not as good, and its results for OOD are significantly inferior to ours. In summary, the performance of the above AE-based anomaly detectors is unstable, and they cannot serve as a general method to tackle both OOD and AD.

5.5 Discussion on Preprocessing

As shown in Table 1, both ODIN and MD apply input pre-processing to improve their precision. To illustrate the performance of our work, we further compare our results with their best performance, i.e., ODIN and MD with pre-processing, using the VGG and ResNet models on the CIFAR-10 data set. The results (see Table 10) confirm that the performance of ODIN and MD with pre-processing is clearly better than without pre-processing (Table 2), and demonstrate that our results are better than both ODIN and MD with pre-processing. Similarly, for AD detection, knowing the adversarial attack strategy significantly improves the performance of LID and, to some extent, that of KD+BU (excluding BIM-b for the VGG model), as shown in Table 11. More importantly, our results outperform both KD+BU and LID with known adversarial samples in the majority of cases, when applied to the VGG and ResNet models on the CIFAR-10 data set (see Table 11).

Table 10 The results of ODIN and MD for pre-processing
Table 11 Comparison of our approach with the KD+BU, LID for the known adversarial data

6 Conclusion and Future Work

To enhance the applicability of DNN input anomaly detection in real-world tasks, we have proposed a novel approach that is able to detect all three types of anomalies, namely Out-of-Distribution (OOD) data, Adversarial (AD) data and Noise (NS) data. By combining the early and late layers of pre-trained DNN models and refining the analysis to the level of each sub-class, our approach generally outperforms the state-of-the-art approaches for the detection of all the aforementioned anomaly types, to the best of our knowledge, as evidenced by the experiments.

One limitation is that this work, like other anomaly detection works, focuses only on image classification. As the application domains of DNNs are expanding fast, it is interesting and necessary to explore whether the existing methodology can be adapted to other applications beyond image processing, such as speech recognition, natural language processing, and intrusion detection with network traffic monitoring.