
1 Introduction

Pattern recognition systems have been widely employed in security-sensitive settings such as biometric identity recognition, intrusion and malware detection, spam filtering, web-page ranking, and network protocol verification, to discriminate between legitimate and malicious samples. However, these applications are characterized by the presence of intelligent adversaries who can deliberately attack the classifier by carefully manipulating malicious data at test time to evade detection. From the learning perspective, this means that the underlying class-conditional probability distribution of the malicious data may change from training to test time, i.e., it is subject to an adversarial drift [18].

Accordingly, pattern classifiers often exhibit an unsatisfying trade-off between accuracy and security against evasion at test time, especially in high-dimensional feature spaces. While two-class classifiers may achieve high accuracy in the absence of attack, they can be evaded by malicious samples that are sufficiently different from the training data. This is because these classifiers minimize the classification risk (or error) over the training data, assuming a stationary distribution. Therefore, it does not make any significant difference in terms of risk whether regions of the feature space that are not densely populated by training data are classified as legitimate or malicious [9–11]. On the other hand, one-class classifiers (trained on legitimate data) have been exploited in security applications exactly to detect these outlying, anomalous attacks. However, one-class classifiers may exhibit a significantly lower accuracy in the absence of attack (in particular, in high-dimensional feature spaces), as they do not exploit any information on the (available) malicious training data [10].

Motivated by the complementarity of the aforementioned approaches, in this work we propose a Multiple Classifier System (MCS) that combines two-class and one-class classifiers to achieve a better trade-off between accuracy and security against evasion. For this reason, we name it the one-and-a-half-class (1.5C) MCS (Sect. 3). Our MCS is able to learn a more secure decision function by providing a tighter enclosing of the legitimate data in feature space, while also exploiting information from the available malicious data to retain high accuracy in the absence of attack. Compared to other secure learning techniques [12–14], we do not make any specific assumption on the kind of adversarial drift that may occur at test time, but rather only agnostically assume that malicious data may appear anywhere in feature space at test time with a non-null probability. This also allows us to reduce the computational complexity of the training phase.

To better motivate our proposal, we provide a simplified analysis of how the classification risk changes in the presence of evasion attacks (Sect. 2). Then, we evaluate the security of our approach in a fair, well-principled way, exploiting a recently-proposed framework to design well-crafted evasion attacks against each given classifier, including the proposed MCS, assuming perfect and limited knowledge of the targeted system [6–8] (Sect. 4). We finally evaluate the soundness of our approach on two real-world application examples involving spam and malware detection (Sect. 5), and conclude the paper by discussing related work (Sect. 6) and future research directions (Sect. 7).

Fig. 1.

Classification risk under evasion, in a one-dimensional feature space. The optimal hypothesis \(f\) learned on \(p(x,y)\) classifies a sample as malicious if \(x \ge 0\). Its decision boundary, achieving perfect separation between the two classes, is shown as a dashed black line. Under attack, the malicious distribution changes as \(p(a(x),y=+1)\), and this causes an increase of the evasion rate given by the light and dark red areas. The decision boundary of the optimal function \(h\) trained after attack is depicted as a solid black line. It trades a much smaller evasion rate (dark red area) for a slightly higher false positive rate (blue area) (Color figure online).

2 A Simplified Risk Analysis Under Evasion Attacks

In this section, we provide an analysis of the evasion setting under the risk minimization framework [15]. In the classical setting, it is assumed that an underlying probability distribution \(p(\varvec{x}, y)\) generates data samples \(\varvec{x} \in \mathcal {X}\) along with their class labels \(y \in \mathcal {Y}\), and risk minimization amounts to finding a hypothesis \(f : \mathcal {X} \mapsto \mathcal {Y}\) that minimizes the expected risk (or loss) \(\ell \left( y, f(\varvec{x}) \right) \) over \(p\), i.e., \(f = \mathrm{arg\,min}_{f'} R(f') = \mathbb E_{\varvec{x}, y \sim p} \{ \ell \left( y, f'(\varvec{x}) \right) \}\). For instance, if \(\ell \) is the 0–1 loss, \(R(f)\) corresponds to the minimum classification error, where \(f\) is the optimal hypothesis that would be obtained if \(p\) were known.

In the evasion setting, the malicious data may change at test time, while the legitimate samples can be considered stationary, i.e., not affected by adversarial drift. This behavior can be modeled with a function \(a : \mathcal {X} \mapsto \mathcal {X}\) that represents how the attacker modifies the malicious samples drawn from \(p\) at test time. The additional risk incurred by \(f\) at test time can be thus written as:

$$\begin{aligned} R_\mathrm{ts}(f) - R_\mathrm{tr}(f) = \mathbb E_{\varvec{x}, y } \{ \ell (y, f(a(\varvec{x}))) - \ell (y,f(\varvec{x})) \}, \end{aligned}$$
(1)

where \(R_\mathrm{tr}(f)\) and \(R_\mathrm{ts}(f)\) respectively represent the risk incurred by \(f\) before and after the attack. As the legitimate data is not affected by the function \(a(\varvec{x})\), the above difference is non-zero only for the malicious class. Thus, if \(\ell \) is the 0–1 loss, it corresponds to the increase in the evasion rate at test time (see Fig. 1).

Now, assume that \(f^{\star } = \mathrm{arg\,min}_{f' \in \mathcal {F}} R_\mathrm{ts}(f')\) is the optimal hypothesis on the test data, including the manipulated attacks. Then, the difference of the risk incurred at test time by using \(f\) instead of \(f^{\star }\) is:

$$\begin{aligned} R_\mathrm{ts}(f) - R_\mathrm{ts}(f^{\star }) = \mathbb E_{\varvec{x}, y} \{ \ell (y, f(a(\varvec{x}))) - \ell (y,f^{\star }(a(\varvec{x}))) \}. \end{aligned}$$
(2)

If \(\ell \) is the 0–1 loss, this amounts to the variation of the classification error between \(f\) and \(f^{\star }\), computed on the manipulated test data.
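As an illustration of how these quantities can be estimated in practice, the following sketch (not from the paper; all names are illustrative) computes empirical versions of Eqs. (1) and (2) with the 0–1 loss, given black-box prediction functions and an attack function \(a(\varvec{x})\) that only manipulates malicious samples.

```python
# Minimal sketch (not from the paper): empirical estimates of Eqs. (1)-(2)
# under the 0-1 loss. `predict` returns labels in {-1, +1}; `attack`
# implements a(x) and only manipulates malicious samples.
import numpy as np

def empirical_risk(predict, X, y):
    return np.mean(predict(X) != y)

def attacked_copy(attack, X, y):
    X_atk = X.copy()
    mal = (y == +1)
    X_atk[mal] = attack(X[mal])   # legitimate samples are left untouched
    return X_atk

def risk_increase(predict, attack, X, y):
    """Eq. (1): R_ts(f) - R_tr(f), i.e., the increase in evasion rate."""
    return empirical_risk(predict, attacked_copy(attack, X, y), y) - \
           empirical_risk(predict, X, y)

def risk_gap(predict_f, predict_f_star, attack, X, y):
    """Eq. (2): R_ts(f) - R_ts(f*), both evaluated on manipulated test data."""
    X_atk = attacked_copy(attack, X, y)
    return empirical_risk(predict_f, X_atk, y) - empirical_risk(predict_f_star, X_atk, y)
```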

As shown in Fig. 1, one may thus decide to trade a slightly higher number of misclassified legitimate samples for a significantly reduced evasion rate (i.e., fraction of misclassified malicious samples after attack), improving security at the expense of slightly worsening accuracy in the absence of attack. To this end, decision boundaries that more tightly enclose the legitimate class should be designed by classifying regions for which \(p(\varvec{x}) \approx 0\) as malicious. This should allow for improving security under attack by reducing the feasible attack space (i.e., regions of the feature space classified as legitimate), without significantly increasing the rate of misclassified legitimate samples. Notably, this can be a relevant problem especially in high-dimensional feature spaces, where regions that are classified as legitimate despite \(p(\varvec{x}) \approx 0\) may be significantly wider.

Fig. 2.

1.5C classification on two-dimensional toy data. Legitimate (malicious) training samples are shown as blue (red) points, and decision boundaries as solid black lines. Left: Two-class classification yields high accuracy in the absence of attack, but it can be evaded by attack samples that are sufficiently different from the training data (e.g., in the top-left corner). Middle: One-class classification (on the legitimate class) may worsen accuracy in the absence of attack, but improves security by enclosing the legitimate class. Right: 1.5C classification retains the advantages of both approaches: security is improved by enclosing the legitimate data, without significantly affecting accuracy in the absence of attack (Color figure online).

3 Secure 1.5C Classification with MCSs

The analysis reported in the previous section suggests that two-class and one-class classifiers can be considered complementary techniques, each facing different challenges (especially in high-dimensional feature spaces) and offering a different trade-off between accuracy and security against evasion at test time. To enhance this trade-off, we propose an MCS architecture in which a two-class classifier is combined with two one-class classifiers, learned respectively on legitimate and malicious data: the two-class classifier should allow for high accuracy in the absence of attack, while the two one-class classifiers should enable the detection of evasion attacks that are (expected to be) significantly different from the training samples of either class. To combine the given classifiers in a secure way, i.e., learning a decision function that encloses the legitimate data, we use a further one-class classifier trained on the outputs of the three base classifiers using only legitimate data. The trade-off between accuracy and security exhibited by one-class and two-class classification is exemplified in Fig. 2, along with an example of 1.5C classification overcoming the limitations of both approaches. The architecture of the proposed MCS is depicted in Fig. 3.

Fig. 3.

1.5C MCS: \((i)\) features are extracted from raw data; (\(ii\)) the two-class and one-class classifiers assign scores \(g_{i}(\varvec{x}) \in \mathbb R\), \(i=1,2,3\), to the feature vector \(\varvec{x}\) (assuming that higher scores are given to the malicious class); (\(iii\)) these scores are combined by the one-class combiner, providing an aggregated score \(g(\varvec{x})\); (\(iv\)) \(g(\varvec{x})\) is then compared against a decision threshold \(t\) to make the final decision (Color figure online).
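The following is a minimal sketch (not the authors' code) of this architecture using scikit-learn. The component choices mirror those used later in Sect. 5, with illustrative kernel and hyperparameter settings: a two-class SVM and two one-class SVMs with the RBF kernel, combined by a one-class SVM (also with the RBF kernel) trained on the score vectors \(\varvec{z}(\varvec{x})\) of legitimate data only.

```python
# Minimal sketch (illustrative, not the authors' code) of the 1.5C MCS of Fig. 3.
import numpy as np
from sklearn.svm import SVC, OneClassSVM

class OneAndAHalfClassMCS:
    def __init__(self, C=1.0, gamma='scale', nu=0.05):
        self.two_class = SVC(kernel='rbf', C=C, gamma=gamma)               # g1
        self.oc_legit = OneClassSVM(kernel='rbf', gamma=gamma, nu=nu)      # g2
        self.oc_malicious = OneClassSVM(kernel='rbf', gamma=gamma, nu=nu)  # g3
        self.combiner = OneClassSVM(kernel='rbf', gamma=gamma, nu=nu)

    def _scores(self, X):
        # Score vector z(x); each g_i gives higher values to the malicious class.
        return np.column_stack([
            self.two_class.decision_function(X),      # g1: positive side = malicious
            -self.oc_legit.decision_function(X),      # g2: far from legitimate support
            self.oc_malicious.decision_function(X),   # g3: close to malicious support
        ])

    def fit(self, X, y):
        # y in {-1 (legitimate), +1 (malicious)}
        self.two_class.fit(X, y)
        self.oc_legit.fit(X[y == -1])
        self.oc_malicious.fit(X[y == +1])
        # The combiner is trained on the scores of legitimate samples only,
        # so that its decision region encloses the legitimate class.
        self.combiner.fit(self._scores(X[y == -1]))
        return self

    def decision_function(self, X):
        # Aggregated score g(x): higher values indicate the malicious class.
        return -self.combiner.decision_function(self._scores(X))

    def predict(self, X, t=0.0):
        # Final decision: compare g(x) against the threshold t.
        return np.where(self.decision_function(X) >= t, +1, -1)
```

Training on \((X, y)\) with labels in \(\{-1, +1\}\) and thresholding decision_function at \(t\) then reproduces the pipeline of Fig. 3.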

4 Classifier Evasion

In this section, we consider a simplified version of the evasion attack algorithm proposed in [8], described in terms of the attack framework originally defined in [6], i.e., by clearly stating the attacker’s goal, knowledge of the system, and capability of manipulating the input data.

Attacker’s Goal. In an evasion attack, the attacker’s goal is to have malicious samples misclassified as legitimate at test time.

Attacker’s Knowledge. The attacker may have different levels of knowledge about \((i)\) the training data, \((ii)\) the feature representation, and \((iii)\) the learning model (including knowledge of the classifier’s parameters after training, and feedback on its decisions on input samples). As in previous work [6, 8], we consider here perfect knowledge (PK) and limited knowledge (LK) attacks. In the PK case, the attacker is assumed to know all the system details, including the trained classifier’s parameters (e.g., the weights assigned by a linear classifier to each feature). Although this may not be very realistic in practice, performing a security evaluation of the system under this setting allows one to assess the worst performance degradation that may be incurred by the system under attack. In the LK case, the attacker does not have access to the training data, but can collect surrogate data ideally sampled from the same distribution, and use feedback on the classifier’s decisions to re-label such samples. The feature representation and the learning model (but not the trained classifier’s parameters) are known. Under this setting, the attacker can learn a surrogate classifier \(\hat{g}(\varvec{x})\) on the re-labeled surrogate data, to approximate the discriminant function \(g(\varvec{x})\) of the targeted classifier.
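As a simple illustration (not from the paper), the surrogate classifier can be obtained as follows, assuming black-box access to the targeted classifier's decisions and that the model class (here, an RBF SVM, chosen as an example) is known:

```python
# Minimal sketch of the limited-knowledge (LK) setting: the attacker re-labels
# surrogate data using feedback from the targeted classifier and trains a
# surrogate classifier of the same (known) model class.
from sklearn.svm import SVC

def train_surrogate(target_predict, X_surrogate, C=1.0, gamma='scale'):
    """target_predict: black-box access to the targeted classifier's labels."""
    y_surrogate = target_predict(X_surrogate)        # feedback on the decisions
    surrogate = SVC(kernel='rbf', C=C, gamma=gamma)  # parameters are unknown, so re-learned
    surrogate.fit(X_surrogate, y_surrogate)
    return surrogate                                 # hat{g}(x) = surrogate.decision_function
```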

Attacker’s Capability. In an evasion attack, the attacker is assumed to be able to modify malicious data at test time, according to application-specific constraints, while she can neither access nor modify the classifier’s training data.

Attack Strategy. Similarly to [8], and according to the previously-described attack scenario, an optimal evasion strategy can thus be formulated as:

$$\begin{aligned} \varvec{x}^* = \arg \min _{\varvec{x}} \;\; \hat{g}(\varvec{x}), \quad \mathrm{s.t.} \;\; d(\varvec{x},\varvec{x}_a) \le d_\mathrm{max}, \end{aligned}$$
(3)

where the attacker’s goal is to find a set of feasible manipulations to the initial malicious sample \(\varvec{x}_{a}\) to obtain a camouflaged sample \(\varvec{x}^{*}\) that minimizes the (surrogate) classifier’s discriminant function \(\hat{g}(\varvec{x})\). Note that, without loss of generality, we are assuming here that the classification function is \(f(\varvec{x}) = \left( g(\varvec{x}) \right) _{+}\), where \(\left( a \right) _{+} = +1\) if \(a \ge 0\), and \(-1\) otherwise, and malicious samples should be assigned higher (positive) values of \(g(\varvec{x})\) to be correctly classified.

The feasible domain for \(\varvec{x}\) is defined in terms of constraints on the manipulation of the feature values of the malicious samples. For some classes of features, a simple distance metric can be adopted. For instance, if one considers the presence or absence of a given word in an email as a feature, then the \(\ell _{1}\)-norm between two feature vectors amounts to counting how many words differ between the two emails. Therefore, as also done in previous work [1, 2, 5, 6, 8, 16], we express this constraint as \(d(\varvec{x},\varvec{x}_a) \le d_\mathrm{max}\), highlighting that at most \(d_\mathrm{max}\) modifications to each sample are allowed. Varying the parameter \(d_\mathrm{max}\) is fundamental to properly understand system security under attack. In fact, as we will show in our experiments, more secure classifiers will exhibit a smaller decrease in the detection rate of malicious samples as \(d_\mathrm{max}\) increases (i.e., as a larger number of manipulations is performed on the malicious samples). Additional constraints may also be added to Problem (3), depending on the specific feature representation; e.g., if features are real-valued and normalized in \([0,1]^{d}\), one may consider an additional box constraint given as \(\varvec{x} \in [0,1]^{d}\).

The solution of Problem (3) clearly depends on the kind of discriminant function and on the given set of constraints. In [8], a general solution for nonlinear, differentiable discriminant functions was proposed, based on a simple gradient-descent algorithm. It is worth remarking that an additional component of the objective function was also considered in that work, to drive the attack point during the descent towards regions of the feature space more densely populated by legitimate samples, with the goal of increasing the probability of evading detection (especially when attacking the surrogate classifier). In this work, we disregard this component, as substantially the same effect can be obtained by running the evasion attack algorithm (given as Algorithm 1) from distinct initialization points, and retaining the attack sample that achieves the minimum value of the objective.

Algorithm 1. Gradient-descent evasion attack.
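A minimal sketch of such a gradient-descent evasion attack is given below. It is not the authors' implementation: it assumes an \(\ell_2\) distance constraint with a simple rescaling projection and a fixed step size, whereas the experiments in Sect. 5 use \(\ell_1\)-type constraints and, for discrete features, the search heuristic described in Sect. 4.1.

```python
# Minimal sketch of a gradient-descent evasion attack solving Problem (3).
# Assumptions (illustrative): d(.,.) is the Euclidean distance, the feasible
# set is handled by rescaling/clipping, and the step size is fixed.
import numpy as np

def evade(g, grad_g, x_a, d_max, step=0.05, n_iter=500, box=(0.0, 1.0)):
    """g, grad_g: (surrogate) discriminant function and its gradient."""
    x = x_a.copy()
    best_x, best_val = x.copy(), g(x)
    for _ in range(n_iter):
        x = x - step * grad_g(x)              # descend on hat{g}(x)
        delta = x - x_a                       # project onto d(x, x_a) <= d_max
        norm = np.linalg.norm(delta)
        if norm > d_max:
            x = x_a + delta * (d_max / norm)
        x = np.clip(x, box[0], box[1])        # box constraint on feature values
        val = g(x)
        if val < best_val:                    # keep the best point found so far
            best_x, best_val = x.copy(), val
    return best_x
```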

4.1 Gradient Computation

The gradients required to evaluate classifier security using Algorithm 1 are given below, for the classifiers considered in this work: linear and nonlinear Support Vector Machines (SVMs), and the proposed 1.5C MCS. Note that this approach can be easily applied to any classifier, provided that its discriminant function \(g(\varvec{x})\) is at least subdifferentiable; otherwise, one may use a different optimization strategy to solve Problem (3), or approximate the non-differentiable \(g(\varvec{x})\) with a surrogate classifier having a differentiable discriminant function [8].

Linear Classifiers. For linear discriminant functions \(g(\mathbf x) = \langle \mathbf w, \mathbf x \rangle + b\), with feature weights \(\mathbf w \in \mathbb {R}^{d}\) and bias \(b \in \mathbb {R}\), the gradient is simply \(\nabla g(\mathbf x) = \mathbf w\).

Nonlinear SVMs. For kernelized (and one-class) SVMs, the discriminant function is \(g(\mathbf x) = \sum _{i=1}^{n} \alpha _i y_i k(\mathbf x,\mathbf x_{i}) + b\), where \(\varvec{\alpha }\) and \(b\) are the SVM parameters learned during training, \(k(\mathbf x,\mathbf x_{i})\) is the kernel function, and \(\{\varvec{x}_{i}, y_{i}\}_{i=1}^{n}\) are the training samples and their labels [15]. The gradient is \(\nabla g(\mathbf x) = \sum _i \alpha _i y_i \nabla k(\mathbf x,\mathbf x_{i})\) and, thus, the feasibility of our approach depends on the kernel gradient \(\nabla k(\mathbf x,\mathbf x_{i})\), which is computable for many numeric kernels; e.g., for the RBF kernel \(k(\mathbf x,\mathbf x_{i}) = \exp \left( -\gamma \Vert \mathbf x-\mathbf x_{i}\Vert ^2\right) \), it is \(\nabla k(\mathbf x,\mathbf x_{i}) = -2 \gamma \exp \left( -\gamma \Vert \mathbf x-\mathbf x_{i}\Vert ^2\right) (\mathbf x-\mathbf x_{i})\).
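For instance, with the RBF kernel the discriminant function and its gradient can be computed as in the following sketch (illustrative code, not from the paper); with scikit-learn, the quantities involved correspond to the attributes support_vectors_, dual_coef_ and intercept_ of a trained SVC or OneClassSVM.

```python
# Minimal sketch: g(x) and its gradient for an RBF-kernel SVM.
# dual_coef[i] = alpha_i * y_i (or alpha_i for a one-class SVM).
import numpy as np

def rbf_svm_score_and_grad(x, support_vectors, dual_coef, b, gamma):
    diff = x - support_vectors                       # shape (n_sv, d)
    k = np.exp(-gamma * np.sum(diff ** 2, axis=1))   # k(x, x_i)
    g = float(np.dot(dual_coef, k) + b)
    # nabla k(x, x_i) = -2 * gamma * k(x, x_i) * (x - x_i)
    grad = -2.0 * gamma * np.dot(dual_coef * k, diff)
    return g, grad
```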

1.5C MCS. In this case, the gradient will depend on the particular choice of each of the component classifiers \(g_{1}\), \(g_{2}\), \(g_{3}\) and on the one-class combiner \(g(\varvec{x})\). To provide an example, we assume here that a one-class SVM with the RBF kernel is used to combine \(g_{1}\), \(g_{2}\), \(g_{3}\). In this case, the discriminant function can be written as \(g(\varvec{x}) = \sum _{i} \alpha _{i} \exp { \left( -\gamma \Vert \varvec{z}(\varvec{x}) - \varvec{z}(\varvec{x}_{i}) \Vert ^{2} \right) } + b\), where \(\varvec{z}(\varvec{x}) = [ g_{1} (\varvec{x}), g_{2} (\varvec{x}), g_{3} (\varvec{x})]^{\top }\). The corresponding gradient is thus given as:

$$\begin{aligned} \nabla {g(\varvec{x})} = -2 \gamma \sum _{i} \alpha _{i} e^{ -\gamma \Vert \varvec{z}(\varvec{x}) - \varvec{z}(\varvec{x}_{i}) \Vert ^{2} } (\varvec{z}(\varvec{x}) - \varvec{z}(\varvec{x}_{i}))^{\top } \frac{\partial \varvec{z}}{\partial \varvec{x}} \, , \end{aligned}$$
(4)

where \(\frac{\partial \varvec{z}}{\partial \varvec{x}} = [\nabla g_1(\varvec{x}), \nabla g_2(\varvec{x}), \nabla g_3(\varvec{x})]^{\top } \in \mathbb R^{3 \times d}\), and \(d\) is the feature set size.
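A sketch of how Eq. (4) can be computed via the chain rule is reported below (illustrative code, not from the paper); the base scores \(g_1, g_2, g_3\) and their gradients are passed as callables, e.g., obtained as in the previous sketch.

```python
# Minimal sketch of Eq. (4): gradient of the 1.5C MCS with a one-class
# RBF-SVM combiner. sv_scores holds z(x_i) for the combiner's support
# vectors (shape (n_sv, 3)); alphas are the corresponding dual coefficients.
import numpy as np

def mcs_gradient(x, score_funcs, grad_funcs, sv_scores, alphas, gamma):
    z = np.array([g(x) for g in score_funcs])        # z(x), shape (3,)
    dz_dx = np.vstack([dg(x) for dg in grad_funcs])  # dz/dx, shape (3, d)
    diff = z - sv_scores                             # z(x) - z(x_i), shape (n_sv, 3)
    k = np.exp(-gamma * np.sum(diff ** 2, axis=1))   # RBF kernel in score space
    return -2.0 * gamma * (alphas * k) @ diff @ dz_dx
```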

Descent in Discrete Spaces. In discrete spaces, it is not possible to follow the gradient-descent direction precisely, as it may map the given sample to a set of non-admissible feature values. The gradient-descent direction can nevertheless be used as a search heuristic, as follows. Starting from the current sample \(\varvec{x}\), one may generate a set of candidate neighboring points by perturbing only those features of the current sample which correspond to the maximum absolute values of the gradient, one at a time, in the correct direction. One then updates the current sample to the neighbor that attains the minimum value of the objective function, and iterates until convergence.
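A possible implementation of one step of this heuristic is sketched below (illustrative, not from the paper), assuming binary features with admissible values in \(\{0, 1\}\):

```python
# Minimal sketch: one step of gradient-based descent in a discrete space.
# Only the k features with the largest gradient magnitude are perturbed,
# one at a time, and the best admissible neighbor is retained.
import numpy as np

def discrete_descent_step(g, grad_g, x, k=10, admissible=(0, 1)):
    grad = grad_g(x)
    candidates = []
    for j in np.argsort(-np.abs(grad))[:k]:
        step = -np.sign(grad[j])              # move against the gradient
        value = x[j] + step
        if step != 0 and admissible[0] <= value <= admissible[1]:
            x_new = x.copy()
            x_new[j] = value
            candidates.append(x_new)
    if not candidates:                        # no admissible move: stop
        return x
    return min(candidates, key=g)             # neighbor minimizing the objective
```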

5 Experiments

In this section, we report an experimental evaluation of our proposal on two distinct security applications, i.e., spam filtering and PDF malware detection. Our goal is to investigate whether, and to what extent, the proposed 1.5C MCS is able to combine the advantages of two-class and one-class classification to yield a better trade-off between accuracy and security. To this end, we adopt a common experimental setup, involving the following classifiers: (\(i\)) a two-class SVM, with the linear kernel for the spam data, and the RBF kernel for the PDF data; (\(ii\)-\(iii\)) two one-class SVMs with the RBF kernel, trained respectively on legitimate and malicious samples; and (\(iv\)) a one-class SVM with the RBF kernel to combine the former three classifiers in our 1.5C MCS (see Sect. 3).

We simulate attackers that have perfect (PK) and limited (LK) knowledge of the targeted system (see Sect. 4). In the LK case, we assume that the attacker collects a surrogate training set whose size is only 20 % of the size of the data used to learn the targeted classifier. The reason is that, in practice, attackers may only perform a limited number of queries to the targeted classifier to recover the classification labels of such samples, and train the surrogate classifier.

We evaluate classifier performance by averaging the true positive (TP) rate, i.e., the fraction of correctly-classified malicious samples, achieved when the false positive (FP) rate, i.e., fraction of misclassified legitimate samples, is below \(1\,\%\), which corresponds to the so-called Area Under the ROC Curve (AUC) at 1 % FP rate. The reason is that, in these applications, false positives are more harmful than false negatives, and thus, they should be kept very low [6, 18]. Our results are averaged over five repetitions, in which different training-test splits are considered. In particular, in each repetition, 50 % of the data is randomly chosen for training (TR), while the rest is used for testing (TS).
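For completeness, one way of computing this metric (a normalized partial AUC) is sketched below; this is illustrative code, not the authors' evaluation script.

```python
# Minimal sketch: AUC restricted to FP rates in [0, 1%], normalized so that
# a perfect detector scores 1. y_true is in {-1, +1}; higher scores indicate
# the malicious class.
import numpy as np
from sklearn.metrics import roc_curve

def auc_at_1_percent_fp(y_true, scores, max_fpr=0.01):
    fpr, tpr, _ = roc_curve(y_true, scores, pos_label=+1)
    grid = np.linspace(0.0, max_fpr, 1000)    # FP-rate grid in [0, max_fpr]
    return float(np.mean(np.interp(grid, fpr, tpr)))
```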

Fig. 4.

Average \(\mathrm{AUC}_{1\,\%}\) values (with standard deviation) for spam filtering (top row) and PDF malware detection (bottom row), attained by two-class (2C) SVMs, one-class (1C) SVMs trained on legitimate (L) and malicious (M) data, and the 1.5C MCS, against PK (first column) and LK (second column) attacks. In each plot, the \(\mathrm{AUC}_{1\,\%}\) under attack is evaluated against an increasing maximum number of modifications to the malicious data, i.e., the number of modified words in each spam, and the number of keywords added to each malicious PDF.

5.1 Spam Filtering

Spam filtering is one of the most popular application examples considered in adversarial learning [1, 2, 6, 17, 18]. As in previous work, we consider the widely-used bag-of-words feature representation, in which an email is described as a set of binary features, indicating the presence (1) or absence (0) of a given word. Due to the high dimensionality of the corresponding feature space, linear classifiers have often been considered [17–19] and implemented in real anti-spam filters, including, e.g., SpamAssassin and SpamBayes. Therefore, in this investigation we implement the two-class classifier as a linear SVM. Popular attacks in spam filtering include modifications to the email's content. Attackers may obfuscate bad words (which typically appear in spam but not in legitimate emails) through misspelling (e.g., changing "cheap" to "che4p"), and add good words (which typically appear in legitimate emails but not in spam). The corresponding effect is to change features from 1 to 0, or vice versa.

Experimental Setup. We use the benchmark TREC 2007 email corpus [20], which consists of 25,220 ham and 50,199 spam emails. The first 5,000 emails are used in this experiment. Each email is represented by a feature vector using a tokenization method of SpamAssassin. We use a feature selection method based on information gain [21], reducing the number of features from more than 20,000 to 500, to limit the computational complexity. All the malicious samples in TS are manipulated according to PK and LK evasion attacks. In the latter case, only 500 samples have been used as the surrogate training data available to the attacker. To evaluate classifier security against an increasing maximum number of modifiable words in each spam, we set \(d(\varvec{x}, \varvec{x}_{a})=\Vert \varvec{x} - \varvec{x}_{a}\Vert _{1} \le d_\mathrm{max}\) as the constraint in Problem (3). In this case, \(d_\mathrm{max}\) is exactly the maximum number of modifiable words (i.e., feature values) in each spam. The parameters \(C, \gamma \in \{2^{-10}, 2^{-9}, \ldots , 2^{10}\}\) of each SVM are optimized through a 5-fold cross-validation on TR to minimize the classification error.

Results. The average \(\mathrm{AUC}_{1\,\%}\) values, along with their standard deviations, are shown in Fig. 4 (top row), against an increasing maximum number of modified words in each spam. In the absence of attack, the accuracy of the 1.5C MCS and the two-class (2C) SVM is similar, and very high, while that of the one-class (1C) SVMs is very low. The reason is that we are not only working at extremely low FP rates, but also in high-dimensional spaces; thus, detecting a high number of attacks becomes extremely challenging for these techniques. Under attack, i.e., when the number of modified words is greater than zero, the \(\mathrm{AUC}_{1\,\%}\) of the 2C SVMs drops significantly, in both the PK and the LK settings. Conversely, the performance of our 1.5C MCS decreases more gracefully, retaining a higher detection rate against evasion attempts, and thus, a higher security.

5.2 PDF Malware Detection

Another well-known application investigated in adversarial learning is PDF malware detection. PDF files are characterized by a hierarchy of interconnected objects including keywords and data streams; e.g., the keyword “PageLayout” characterizes an object describing the format of the corresponding page. Their flexible structure allows for embedding different kinds of content, including JavaScript and binary programs. This makes PDF files suitable vectors to spread malware infections, as malware can be easily embedded into them. In this experiment, we use the feature representation adopted in [8, 22], in which each feature represents the number of occurrences of a given keyword in a PDF file. In this case, evasion attacks consist of manipulating the set of keywords present in the PDF file at test time. However, since embedded objects can be easily added to, but not removed from PDF files without corrupting their structure, as in [8, 22] we assume that the number of occurrences of each keyword can be only increased. This can be implemented by adding the constraint \(\varvec{x}_a \le \varvec{x}\) (where inequality has to be fulfilled by each feature value) to Problem (3).

Experimental Setup. We use 2,000 samples from the PDF malware dataset in [8, 22], represented by 114 distinct keywords (i.e., features). Each feature value (i.e., the number of occurrences of the corresponding keyword) is normalized in \([0,1]\) by dividing its value by 100 (which is also set as the maximum value of the occurrences of each keyword). The SVMs' parameters are set as in the previous experiment on spam filtering. Classifier security is evaluated against an increasing maximum number of added keywords. To this end, we set \(d(\varvec{x}, \varvec{x}_{a})= 100 \Vert \varvec{x} - \varvec{x}_{a}\Vert _{1} \le d_\mathrm{max}\) as the constraint in Problem (3), similarly to the experiment on spam filtering. Together with the constraint on feature addition \(\varvec{x}_a \le \varvec{x}\), this lets \(d_\mathrm{max}\) correspond to the maximum number of keywords that can be added to each malicious PDF file.
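The attack constraints for this case can be enforced, for instance, with the simple projection sketched below (illustrative code, not from the paper; rescaling the additions is a heuristic rather than an exact \(\ell_1\) projection):

```python
# Minimal sketch: project a candidate attack point onto the feasible set of
# the PDF case, where keyword counts can only be increased (x >= x_a),
# features lie in [0, 1], and 100 * ||x - x_a||_1 <= d_max.
import numpy as np

def project_pdf_attack(x, x_a, d_max):
    x = np.clip(x, x_a, 1.0)                  # only additions, box in [0, 1]
    delta = x - x_a                           # non-negative by construction
    budget = d_max / 100.0                    # l1 budget in normalized units
    total = delta.sum()
    if total > budget:
        delta *= budget / total               # rescale to respect d_max
    return x_a + delta
```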

Results. As in the spam filtering case, results are reported in terms of \(\mathrm{AUC}_{1\,\%}\) values in Fig. 4 (bottom row). Similar conclusions can be drawn with respect to the results obtained for the spam filtering case, except that here the 1C SVMs perform slightly better in the absence of attack, and exhibit higher detection rates under attack than those exhibited by the 2C SVMs (for both PK and LK attacks). This means that, for a sufficiently high number of added keywords, 1C SVMs are more secure than 2C SVMs. Nevertheless, the proposed 1.5C MCS significantly outperforms the competing approaches under attack, while only performing slightly worse than the 2C SVM in the absence of attack. Notably, the detection rate of the 1.5C MCS decreases more gracefully in this case as the maximum number of added keywords increases, since the feature values of malicious samples can only be incremented.

6 Related Work

While many methods have been proposed to counter evasion attacks [1, 12, 14, 25], they are computationally demanding and make specific assumptions on how attackers should manipulate data at test time to evade detection (see also [3–6] and references therein, for a more detailed review). Inspired by the seminal work in [1], recent work has shown that game theory can be used to model interactions between the classification system and the attacker in non-zero-sum games, yielding more secure classifiers [14]. However, satisfying the conditions for the uniqueness of the equilibrium requires not only the classifier's and the attacker's objectives to fulfill specific properties, but also computationally demanding algorithms to find suitable solutions. A slightly different line of work [12, 13, 25] aims to learn secure classifiers (and SVMs, in particular) by minimizing a modified version of the loss function that accounts for worst-case manipulations of the data at test time, including feature deletion, insertion, and rescaling. This also results in a computationally intensive training phase.

Other work has proposed countermeasures to evasion based on well-motivated heuristics, e.g., that feature weights assigned by a linear classifier should be more evenly spread among the features, to require modifying more feature values to evade detection [18, 23]. Linear classifiers exhibiting the aforementioned behavior have been implemented efficiently using MCSs, by averaging an ensemble of linear classifiers [18, 23, 26, 27]. In the present work, we have shown that MCSs can also be used to learn secure, nonlinear classifiers, in an efficient manner.

7 Conclusions and Future Work

While pattern classifiers have been widely used in adversarial applications, they are often characterized by an unsatisfying trade-off between accuracy and security against evasion, especially in high-dimensional spaces: two-class classifiers may yield high accuracy but they are potentially insecure, while one-class classification may improve security at the expense of a lower accuracy in the absence of attack. Motivated by the complementarity of these approaches, we proposed a 1.5C MCS that achieves a better trade-off between accuracy and security. Our proposal is general: it is agnostic to the kind of adversarial drift that may occur at test time, and can be used to improve the security of any classifier against evasion attacks, as shown by the reported experiments on spam and malware detection. Interesting future research directions include providing a formal, well-principled characterization of the trade-off between accuracy and security, as well as of its relationship with the dimensionality of the feature space, and investigating whether the proposed 1.5C MCS is also capable of dealing with poisoning attacks in the training data [37].