1 Introduction

In the early days of computer networking, attacks could mostly be detected by signature-based detection systems, which match samples against a database of signatures of known malicious samples [1, 2].

With the exponential growth of the Internet and computer networks, a new generation of attacks has emerged, characterized by its diversity, complexity, and rapid growth. Furthermore, these attacks employ obfuscation techniques for self-protection, thereby deceiving detection systems.

Traditional signature-based methods detect previously seen malicious samples with exact matching techniques; therefore, they cannot provide an acceptable level of security against this new generation of attacks [3]. To address this weakness, machine learning techniques have been employed to confront zero-day attacks, which by definition have never been seen before [2, 4, 5]. However, security applications such as malware detection and spam filtering differ from classical machine learning applications due to their adversarial nature: attackers may use obfuscation and self-protection techniques to evade detection systems through different types of attacks. For this reason, the environment of security applications is known as an adversarial environment.

In addition, the data stationarity assumption, a fundamental assumption in machine learning, may fail in security applications due to the presence of an attacker [6]. This assumption states that the training data used to train the classifier and the test data classified by the learned classifier are drawn from the same distribution. It also underlies performance-evaluation techniques such as cross-validation and bootstrapping, as well as the empirical risk minimization principle.

Considering the nature of security applications, a new line of research has emerged to investigate the challenge of exploiting machine learning in an adversarial environment [7]. Machine learning techniques are not well equipped to address the adversarial nature of security applications. Thus, it is essential to evaluate the robustness of machine learning-based detection systems in the presence of attackers.

The contest between attackers and classifier designers leads to a reactive arms race in which an attacker analyzes a learning system and devises an attack to evade it [8]. The classifier designer then reacts by developing appropriate countermeasures, such as analyzing the new samples and re-training the model on the newly collected data.

Conversely, in the proactive arms race, the classifier designer proactively evaluates security by simulating an attacker during the learning phase [8]. In this way, the classifier designer identifies potential attacks, models them, and evaluates their impact on the performance of the learning system. Therefore, in a proactive arms race framework, a classifier designer is able to provide a more robust detection system.

Evasion attacks are the most common attack scenario encountered in an adversarial environment during system operation; they aim to modify the distribution of test data in order to mislead the detection system. For example, spammers attempt to evade detection by obfuscating the content of spam emails. Similarly, in malware detection for PDF files, malicious PDF files under this attack mimic the structure of legitimate PDF samples to mislead the detector.

Recent studies proposed an optimization problem to model the adversary’s strategy during an evasion attack in order to evaluate the robustness of the learning system under this attack. The authors of [6, 8, 9] explored gradient-based evasion attacks against classifiers with a differentiable discriminant function, such as the SVM classifier.

However, modeling this type of attack against a classifier with a non-differentiable decision boundary, such as the Random Forest classifier, has not been addressed in the literature. Non-differentiable optimization is a category of optimization that deals with objective functions that are not differentiable, and often not convex, for a variety of reasons; such functions frequently arise in real-world applications.

To tackle this problem, we propose a new approach to model an evasion attack in which the discriminant function of the target classifier, i.e. the objective function, is non-differentiable. Our proposed model consists of two main components: queryStrategyModel and applyEvasion. First, queryStrategyModel introduces a surrogate classifier with a differentiable decision boundary in place of the target classifier for devising an evasion attack. Then, the applyEvasion component manipulates the test data based on the discriminant function of the surrogate classifier in order to evade the target classifier.

We leverage the proactive arms race framework for the security evaluation of the target detection system, i.e. the Random Forest classifier. The main contributions of this paper are as follows.

  • First, we model an evasion attack against a classifier with non-differentiable decision boundary, i.e. Random Forest.

  • Second, we evaluate the robustness of the classifier, i.e. Random Forest, under the modeled evasion attack.

  • Third, we show experimentally that the evasion attack successfully degrades the performance of the Random Forest classifier.

The remainder of this paper is organized as follows: Section 2 reviews previous work. Section 3 provides the background of this research. Section 4 presents our proposed method. In Section 5, the evaluation results are discussed. Finally, Section 6 presents the conclusion and future work.

2 Related work

The adversarial nature of security applications is due to the role of an adversary; for example, in spam filtering, the spammer is an adversary who tries to deceive the classifier through bad-word obfuscation, such as misspelling the “spammy” word “cost” as “c0st”. For the sake of clarity, the terms adversary and attacker are used interchangeably in this paper; in the literature, an adversary is the party that devises an attack against the learning system.

Several studies used game-theoretic approaches to model the competition between an adversary and a classifier. Brückner et al. [10] modeled the interaction between a classifier and an adversary as a Stackelberg competition in which the classifier plays the role of the leader and the adversary may react according to the leader’s move; the authors solved an optimization problem to find the solution of this game. In [11], the interaction between a classifier and an adversary is modeled as a static game. The authors assumed conditions under which a unique Nash equilibrium exists and derived algorithms that find the equilibria of the prediction model. In general, game-theoretic methodologies try to establish secure learning and develop suitable countermeasures. However, these approaches relax many constraints, because realistic constraints are too complex to be incorporated into existing game-theoretic formulations [6].

In [8], a security evaluation framework called the proactive arms race is introduced. The aim of this framework is to provide a more robust and secure detection system. The activities of a classifier designer in the proactive arms race are illustrated in Fig. 1.

Fig. 1 Activities of a classifier designer in the proactive arms race framework

The recent studies in the proactive arms race explored the aforementioned activities of a classifier designer. In the following, we define the notation used throughout this paper and review the related work. For ease of presentation, the key symbols are listed in Table 1. Let \(f \colon \mathcal {X} \rightarrow \mathcal {Y}\) be a two-class classification algorithm that assigns a sample in the feature space \(x \in \mathcal {X}\) to a class label \(y \in \mathcal {Y} = \{-1, +1\}\), where −1 and +1 indicate the legitimate and malicious class, respectively. The classifier f is trained on data \(\mathcal D\) drawn from a distribution \(P(\mathcal {X}, \mathcal {Y})\). The classifier predicts class labels by thresholding a continuous discriminant function \(g\colon \mathcal {X} \rightarrow \mathbb {R}\). The predicted label is denoted by \(y_c = f(x)\), while y denotes the true label. A sample with g(x) < 0 is assigned to the negative class, i.e. f(x) = −1, whereas g(x) > 0 indicates a sample of the positive class, with f(x) = +1.

Table 1 Key symbols used in the paper

In [6], Biggio et al. investigated the vulnerabilities of classification algorithms by devising an evasion attack at test time. They proposed an optimization problem to model the adversary’s strategy during an evasion attack. In this optimization problem, for any target malicious sample \(x_0\), the adversary finds a sample \(x_{0}^{*}\) by minimizing g(x), subject to a constraint on its distance from \(x_0\):

$$ x_{0}^{*} = \arg\min_{x} g(x) \quad \text{s.t.} \quad d(x, x_{0}) \le d_{max} $$
(1)

Here \(d_{max}\) defines the maximum distance and controls the strength of the attack: a higher \(d_{max}\) allows larger changes in the feature values. Choosing a suitable \(d_{max}\) is application dependent. The constraint gives the adversary the freedom to modify malicious samples up to this maximum distance.

This attack manipulates malicious test samples so that they are misclassified and evade the detection system. The authors presented a gradient-based algorithm to model the evasion attack, which is applicable only against classifiers with a differentiable discriminant function, and evaluated its effectiveness against the SVM classifier.
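As an illustration, the following is a minimal sketch of such a gradient-based attack: it performs gradient descent on a differentiable discriminant function g and keeps the modified sample within an \(l_1\)-ball of radius \(d_{max}\) around the original point. The step size, iteration count, and the simple rescaling used to enforce the distance constraint are illustrative assumptions, not the exact procedure of [6].

```python
import numpy as np

def gradient_evasion(x0, g, grad_g, d_max, step=0.05, n_iter=100):
    """Sketch of a gradient-based evasion attack on a differentiable g.

    x0     : original malicious sample (1-D numpy array, features in [0, 1])
    g      : discriminant function, g(x) < 0 means "legitimate"
    grad_g : gradient of g with respect to x
    d_max  : maximum allowed l1 distance from x0 (attack strength)
    """
    x = x0.copy()
    for _ in range(n_iter):
        x = x - step * grad_g(x)          # descend on g to look legitimate
        x = np.clip(x, 0.0, 1.0)          # stay in the feature space [0, 1]
        delta = x - x0
        l1 = np.abs(delta).sum()
        if l1 > d_max:                    # crude rescaling back into the l1-ball
            x = x0 + delta * (d_max / l1)
        if g(x) < 0:                      # already classified as legitimate
            break
    return x
```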

Zhang et al. proposed a Wrapper-based Adversarial-aware Feature Selection method, referred to as WAFS, against evasion attacks [9]. WAFS considers both the classifier’s security and its accuracy during feature selection, and was shown to outperform traditional feature selection techniques. Its main drawback is a higher computational complexity than traditional wrapper-based feature selection methods.

None of the previous work has modeled an evasion attack against a classifier with non-differentiable discriminant function. To the best of our knowledge, our approach is the first to address this problem.

3 Background on random forest classifier

Ensemble methods aggregate several base prediction models to enhance the generality and robustness of a single prediction model [12]. Random Forest is an ensemble method used for classification and regression tasks. This classifier grows many classification decision trees on the training data and predicts the class label by a majority vote over all the decision trees [13]. It uses bootstrap aggregating (bagging) to learn the decision trees.

The Forest forms B bootstraps, \(T_b\), from the training data \(T_{train}\). Each bootstrap samples K random instances with replacement from the training data. Decision trees are then fitted on these bootstraps, and each tree is grown according to Algorithm 1:

Algorithm 1

To classify a test sample with input feature vector F, the vector is passed down each tree in the Forest, and each tree casts a vote for a class. Finally, the Forest assigns the test sample to the class with the most votes over all trees.
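As an illustration of this bagging-and-voting procedure, the following is a minimal sketch built on scikit-learn’s DecisionTreeClassifier. The function and parameter names (train_forest, n_trees, k, n_sub_features) are illustrative, not those of Algorithm 1, and labels are assumed to be in {−1, +1} as in the notation above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_forest(X, y, n_trees=10, k=None, n_sub_features="sqrt", seed=None):
    """Fit n_trees decision trees, each on a bootstrap of k training samples.

    n_sub_features is the size of the random feature subset examined at each
    split, i.e. the parameter trading tree strength against tree correlation.
    """
    rng = np.random.default_rng(seed)
    k = k or len(X)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=k)              # bootstrap: with replacement
        tree = DecisionTreeClassifier(max_features=n_sub_features)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def forest_predict(trees, X):
    """Majority vote over all trees, assuming labels in {-1, +1}."""
    votes = np.array([t.predict(X) for t in trees])        # shape: (n_trees, n_samples)
    return np.where(votes.sum(axis=0) >= 0, 1, -1)
```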

According to [13], the Forest error rate is controlled by two criteria:

  • The pairwise correlation between trees in the Forest: increasing the correlation leads to increasing the Forest error rate.

  • The strength of each decision tree in the Forest: a strong classifier has a low error rate. Increasing the strength of trees leads to decreasing the Forest error rate.

The Forest is quite sensitive to the number of random features in the subset, which is an adjustable parameter. Increasing this parameter increases both the correlation between trees and their strength, while decreasing it decreases both.

4 Proposed model

Figure 2 shows the overall schema for devising an evasion attack against a learning system. In this figure, applyEvasion is the component that manipulates the target test data based on the adversary’s knowledge of the target classifier’s discriminant function; playing the role of the adversary, it devises an evasion attack at test time. applyEvasion employs a gradient-based optimization algorithm in which the classifier’s discriminant function is the objective function; therefore, this function must be differentiable.

Fig. 2 Block diagram of devising an evasion attack against a classifier. TR and TS denote the training and test data, respectively; \(TS^{\prime }\) is the target test data altered by the adversary

In Fig. 2a, the discriminant function of the learning classifier is differentiable, so the applyEvasion component (the adversary) devises an attack by receiving information about the classifier directly.

In Fig. 2b, the classifier has a non-differentiable decision boundary, so queryStrategyModel first introduces a surrogate classifier with a differentiable decision boundary in place of the target classifier; this surrogate estimates the decision boundary of the target classifier. The adversary then devises an evasion attack using the information of the surrogate classifier.

4.1 Apply evasion

Figure 3 illustrates the adversary’s strategy during an evasion attack in a two-dimensional input space. In this figure, the adversary’s goal is to modify the distribution of malicious test samples in order to increase the false negative rate of the detection system.

Fig. 3 Overall schema of the adversary’s strategy. The decision boundary is shown as a black dashed line. The adversary tries to evade the classifier by moving the distribution of malicious samples (red class) toward the distribution of legitimate samples (green class)

Algorithm 2 presents an implementation of the adversary’s strategy under the evasion attack shown in Fig. 3.

Algorithm 2

In this algorithm, \(X_{target}\) denotes the set of samples targeted by the adversary, f is the target classifier, and \(d_{max}\) is the maximum distance.

The MinimizeFunc function, shown on line 6, performs a bound-constrained optimization. It takes the discriminant function of the target classifier and the target test data as inputs, and manipulates the target test data subject to some constraints. For each sample in \(X_{target}\), MinimizeFunc minimizes the discriminant function of the target classifier as the objective function, subject to the following constraints:

  • The first constraint defines a continuous feature space: each modified feature value must lie in the range [0, 1].

  • The second constraint states that the distance between the modified sample and its original point must not exceed \(d_{max}\), the maximum allowable modification of a sample under attack. For instance, in spam filtering the maximum distance is the maximum number of words an adversary is allowed to change; this constraint preserves the semantics of the spam message. The distance is therefore essential in controlling how far samples may be changed while preserving their original nature.

According to the literature, there are two common choices for the distance function, the \(l_1\)-norm and the \(l_2\)-norm, depending on the feature space and the type of attack.
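For reference, for a modified sample x, its original point \(x_0\), and n features, these two distances are the standard definitions:

$$ d_{1}(x, x_{0}) = \lVert x - x_{0} \rVert_{1} = \sum_{i=1}^{n} \lvert x_{i} - x_{0,i} \rvert, \qquad d_{2}(x, x_{0}) = \lVert x - x_{0} \rVert_{2} = \sqrt{\sum_{i=1}^{n} (x_{i} - x_{0,i})^{2}} $$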

For instance, if the adversary prefers to significantly manipulate as few features as possible rather than slightly manipulating many of them, the \(l_1\)-norm should be adopted, as it promotes sparsity; otherwise, the \(l_2\)-norm is a better choice.

We assume it is more convenient for an attacker to significantly manipulate a few features [6, 9]; therefore, in our implementation the \(l_1\)-norm is preferred to the \(l_2\)-norm.

There are different ways to implement the MinimizeFunc function. We use the L-BFGS-B solver, a limited-memory BFGS algorithm for optimization problems with an objective function and bound constraints, which is suitable for large problems. Specifically, we employ the “scipy.optimize.minimize” function in Python with the L-BFGS-B solver [14, 15], passing the discriminant function of the detection system as the objective function.

MinimizeFunc finds the optimal attacked sample \(x^{*}\). Finally, the output of this algorithm is a collection of attacked samples, \(X_{attack}\), which deceives the target classifier.
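A minimal sketch of this procedure is shown below. Since L-BFGS-B natively handles only box bounds, the \(d_{max}\) constraint is approximated here with a soft \(l_1\) penalty; the penalty weight and the exact handling of the distance constraint in the paper’s implementation are assumptions, as is the helper name apply_evasion.

```python
import numpy as np
from scipy.optimize import minimize

def apply_evasion(X_target, g, d_max, penalty=10.0):
    """Sketch of Algorithm 2 (applyEvasion).

    X_target : samples to be manipulated (features scaled to [0, 1])
    g        : differentiable discriminant function of the (surrogate) classifier,
               e.g. lambda x: svm.decision_function(x.reshape(1, -1))[0]
    d_max    : maximum allowed l1 distance from the original sample
    penalty  : weight of the soft l1-distance penalty (illustrative choice)
    """
    X_attack = []
    for x0 in X_target:
        def objective(x):
            # minimize g(x); penalize exceeding the allowed l1 distance
            excess = max(np.abs(x - x0).sum() - d_max, 0.0)
            return g(x) + penalty * excess

        res = minimize(objective, x0,
                       method="L-BFGS-B",
                       bounds=[(0.0, 1.0)] * len(x0))   # continuous feature space
        X_attack.append(res.x)
    return np.array(X_attack)
```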

4.2 Query strategy model

Algorithm 3 presents the query strategy model, and Fig. 4 illustrates an example of four queries (iterations) of this algorithm. The Random Forest classifier is the target classifier, and the algorithm tries to estimate its decision boundary using the discriminant function of an SVM classifier with RBF kernel, i.e. the surrogate classifier.

Algorithm 3

Fig. 4 A conceptual representation of the query strategy model. The red positive samples indicate the positive class of the training data and the blue negative samples the negative class. The blue stars mark the attacked points misclassified by the SVM classifier. The solid blue lines indicate the boundary of the Random Forest classifier and the dashed blue lines the decision boundary of the SVM with RBF kernel. After four queries, the decision boundary of the SVM converges to the Random Forest’s boundary

Algorithm 3 rests on the following assumption: if the SVM classifier labels data in the same way as the Random Forest classifier, then it estimates the Forest’s boundary, and the two can substitute for each other. In this case, if the adversary devises a successful evasion attack using the discriminant function of the SVM classifier, and this attack evades the SVM, then the adversary is also likely to evade the Random Forest classifier.

In sum, the algorithm attempts to bring the predictions of the SVM classifier closer to those of the Random Forest classifier at each query, and it terminates after a predefined number of queries (QN). Finally, the surrogate classifier is returned as the output. The key part of the algorithm is how the surrogate classifier follows the decisions made by the target classifier; for this purpose, at each query the SVM classifier receives the Random Forest’s feedback. In the following, the steps of this algorithm are described in detail (a code sketch is given after the list):

  • Line 1: First, both classifiers are trained on the same data, i.e. \(X_{train}\). At this stage, the two classifiers may make different decisions at test time.

  • Line 4: In each query, a certain number of positive-class samples is drawn randomly from the training data. These samples are called target points, \(X_{targetPoint}\).

  • Line 6: Leveraging Algorithm 2, the target points are manipulated in order to mislead the SVM classifier. Note that the SVM with RBF kernel has a differentiable boundary, so the adversary can apply the gradient-based evasion attack (Algorithm 2). The target points after the attack are called attacked points, \(X_{attackedPoint}\).

  • Line 7: The attacked points are labeled by both the SVM and Random Forest classifiers. The majority of attacked points are misclassified by the SVM classifier, because the adversary has complete information about its discriminant function. The Random Forest classifier, however, labels the attacked points with higher accuracy, because the adversary is not aware of its decision boundary.

  • Lines 8 & 9: The attacked points, together with the labels assigned by the Forest classifier, are added to the training data, and the SVM classifier is retrained. In this way, after each query the predictions of the SVM move closer to those of the Forest thanks to the Forest’s feedback.

  • Line 12: Finally, the output of the algorithm is a surrogate classifier with a differentiable decision boundary whose predictions are similar to those of the Forest.
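The sketch below summarizes these steps. It reuses apply_evasion from the previous sketch; the number of target points per query, the value of d_max, and the classifier hyperparameters are illustrative assumptions rather than the exact settings of Algorithm 3.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def query_strategy_model(X_train, y_train, n_queries=10, n_targets=100,
                         d_max=1.0, seed=None):
    """Sketch of Algorithm 3 (queryStrategyModel)."""
    rng = np.random.default_rng(seed)
    forest = RandomForestClassifier(n_estimators=10).fit(X_train, y_train)  # target
    svm = SVC(kernel="rbf", C=10, gamma=0.1).fit(X_train, y_train)          # surrogate
    X_aug, y_aug = X_train.copy(), y_train.copy()

    for _ in range(n_queries):
        # sample target points from the positive (malicious) class
        pos = np.flatnonzero(y_aug == 1)
        idx = rng.choice(pos, size=min(n_targets, len(pos)), replace=False)
        X_target = X_aug[idx]

        # attack the surrogate's differentiable discriminant function (Algorithm 2)
        def g(x, clf=svm):
            return clf.decision_function(x.reshape(1, -1))[0]
        X_attacked = apply_evasion(X_target, g, d_max)

        # label the attacked points with the target classifier and feed them back
        y_forest = forest.predict(X_attacked)
        X_aug = np.vstack([X_aug, X_attacked])
        y_aug = np.concatenate([y_aug, y_forest])
        svm = SVC(kernel="rbf", C=10, gamma=0.1).fit(X_aug, y_aug)  # retrain surrogate

    return svm   # surrogate with a differentiable decision boundary
```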

The adversary exploits the output of queryStrategyModel, i.e. the surrogate classifier, to devise an evasion attack against the Random Forest classifier.

5 Experiments

In this section, we first investigate the effect of the adversary on the MNIST handwritten digit classification task; this experiment graphically demonstrates how instances are modified under an evasion attack. We then present experimental results on two well-known applications in the adversarial environment: spam filtering and malware detection in PDF files.

For these experiments, we model two types of attack: the Perfect Knowledge (PK) attack and the Limited Knowledge (LK) attack. As mentioned, the adversary devises an attack based on her knowledge of the target classifier’s discriminant function; this knowledge may range from no information to complete information.

  • PK attack: a worst-case attack in which the attacker has perfect knowledge (PK) of the target system, i.e. comprehensive information about the feature space, the type of classifier, and the trained model. Although this assumption is unrealistic, it helps the classifier designer determine a lower bound on the performance of a learning system under attack. Since the adversary knows the discriminant function of the detection system, she can organize a highly effective attack against it.

  • LK attack: a more realistic setting in which the adversary’s knowledge is limited (LK) to general information about the type of classifier and the feature space, but not the trained model. In this attack, the adversary tries to estimate the classifier’s boundary by collecting data from similar data sources.

In the following, we discuss devising PK and LK attacks in spam filtering and PDF malware detection in detail.

5.1 Handwritten digits classification task

MNIST is a large dataset of handwritten digit images collected from employees of the American Census Bureau and American high-school students. It contains ten classes of digits, each represented as a gray-scale image (size-normalized to a 20×20 pixel box), with a training set of 60,000 samples and a test set of 10,000 samples. In this experiment, we define a classification problem between two digit classes, 3 and 7. The sampled dataset contains 500 samples of class 3 and 500 samples of class 7. To speed up the experiment, we uniformly rescale all images to 16×16 pixels and flatten them into gray-scale pixel feature vectors. In addition, the feature values are normalized to the range [0, 1] by dividing them by 255.
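A minimal sketch of this preprocessing is shown below, assuming the MNIST copy available through scikit-learn’s fetch_openml and Pillow for resizing (both are tooling assumptions, since the paper does not state how the data was loaded); the class-to-label mapping is also illustrative.

```python
import numpy as np
from PIL import Image
from sklearn.datasets import fetch_openml

def load_3_vs_7(n_per_class=500, seed=0):
    """Sample a binary 3-vs-7 task from MNIST, rescaled to 16x16 and normalized."""
    mnist = fetch_openml("mnist_784", version=1, as_frame=False)
    X, y = mnist.data, mnist.target
    rng = np.random.default_rng(seed)

    parts = []
    for digit, label in (("3", +1), ("7", -1)):        # 3 is the attacked class here
        idx = rng.choice(np.flatnonzero(y == digit), n_per_class, replace=False)
        imgs = X[idx].reshape(-1, 28, 28)
        small = np.stack([np.asarray(Image.fromarray(im.astype(np.uint8))
                                     .resize((16, 16))) for im in imgs])
        parts.append((small.reshape(len(idx), -1) / 255.0,   # normalize to [0, 1]
                      np.full(len(idx), label)))

    X_out = np.vstack([p[0] for p in parts])
    y_out = np.concatenate([p[1] for p in parts])
    return X_out, y_out
```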

The adversary’s goal is to have a sample from class 3 misclassified as class 7. The adversary faces two constraints when modifying the feature values of a target sample: first, all values are bounded to the range [0, 1]; second, the maximum allowable distance from the original attack point is \(d_{max} = 19.6\), a value chosen according to previous work [6].

The queryStrategyModel samples 100 target points from class 3 in each query, and the algorithm terminates after five queries.

Figure 5 graphically shows the effect of the adversary after applying the evasion attack against the surrogate of the Random Forest, i.e. an SVM with RBF kernel (C = 10, gamma = 0.1). Figure 5a illustrates a sample of class 3 before the attack. Figure 5b shows the effect of a successful attack devised against the Random Forest classifier using the query strategy model after five queries; as can be seen, the modified sample resembles class 7, and the Forest classifier misclassifies it as class 7.

Fig. 5 The effect of the adversary, causing a sample of class ’3’ to be misclassified as class ’7’

5.2 Spam filtering

A well-known application of adversarial classification is spam filtering, where the classifier designer mostly focuses on the textual content of emails using a bag-of-words feature representation. In a binary feature representation, each feature indicates the presence or absence of a given word in the email body. In the evasion attack, the spammer, i.e. the adversary, tries to manipulate the content of emails through bad-word obfuscation or good-word insertion. Misspelling the “spammy” word “cheap” as “che@p” is an example of bad-word obfuscation, while good-word insertion means adding the most common words of legitimate emails to a spam email.

5.2.1 Experimental setup

We consider the benchmark TREC 2007 email corpus, which includes 25,220 legitimate and 50,199 spam emails [16]. We use the bag-of-words feature model to represent each email as a feature vector, where the feature values are word frequencies. In addition, we employ Tf-idf term weighting to re-weight the count features into normalized values; Tf-idf is commonly used to reflect how important a word is to a document in a corpus. To generate the spam dataset, we extract a vocabulary of words from the first 5,000 emails in chronological order. We then use chi2 feature selection to reduce the number of extracted words to 300 features without significant loss in classification accuracy.
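A sketch of this feature pipeline with scikit-learn is given below; the vectorizer settings and the helper name build_spam_features are assumptions, since the paper does not specify its implementation.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.feature_selection import SelectKBest, chi2

def build_spam_features(vocab_emails, emails, labels):
    """Bag-of-words counts -> Tf-idf weighting -> chi2 selection of 300 words.

    vocab_emails : first 5,000 emails (chronological) used to build the vocabulary
    emails       : the emails to be represented as feature vectors
    labels       : their class labels (+1 spam, -1 legitimate), used by chi2
    """
    vectorizer = CountVectorizer()
    vectorizer.fit(vocab_emails)                  # vocabulary from the first emails
    counts = vectorizer.transform(emails)         # raw word-frequency features
    X_tfidf = TfidfTransformer().fit_transform(counts)
    selector = SelectKBest(chi2, k=300)
    X = selector.fit_transform(X_tfidf, labels)   # keep the 300 most relevant words
    return X, vectorizer, selector
```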

For the experiments, we randomly select 1,000 samples per class from the remaining emails in the TREC corpus. The resulting dataset of 2,000 samples is then randomly split into a training and a test set, with sixty percent of the samples used for training. This process is repeated five times to report the final classification results. For security evaluation, we report the False Negative Rate (FNR) of the detection system, i.e. the proportion of malicious samples that are classified incorrectly; a high FNR indicates an effective attack.

The surrogate classifier is an SVM with RBF kernel. The SVM parameters are selected through 5-fold cross-validation on the training data to maximize classification accuracy, mostly yielding C = 10 and gamma = 0.1. The target classifier is a Random Forest with 10 trees and a maximum depth of 30. In this experiment, the maximum number of modified words for devising the evasion attack against the Forest at test time ranges over \(d_{max} \in [0, 20]\).
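A sketch of this setup and of the FNR computation follows; the cross-validation grid and the helper names are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

def false_negative_rate(y_true, y_pred):
    """Fraction of malicious (+1) samples classified as legitimate (-1)."""
    malicious = (y_true == 1)
    return np.mean(y_pred[malicious] == -1)

def setup_classifiers(X, y, seed=0):
    X_tr, X_ts, y_tr, y_ts = train_test_split(X, y, train_size=0.6, random_state=seed)

    # surrogate: SVM-RBF tuned by 5-fold cross-validation (grid is illustrative)
    grid = GridSearchCV(SVC(kernel="rbf"),
                        {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1]}, cv=5)
    svm = grid.fit(X_tr, y_tr).best_estimator_

    # target: Random Forest with 10 trees and maximum depth 30
    forest = RandomForestClassifier(n_estimators=10, max_depth=30).fit(X_tr, y_tr)
    return svm, forest, (X_ts, y_ts)
```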

5.2.2 Experimental results

Figure 6 shows how the decision boundary of the SVM classifier approaches the Forest’s boundary when using queryStrategyModel. The figure reports the predictions of the SVM and Random Forest classifiers on the attacked points, i.e. the FNR of both classifiers as a function of the number of queries.

Fig. 6 FNR of the Random Forest classifier and the surrogate classifier on sampled attacked points of the spam data during the query strategy model

In each query of queryStrategyModel, 350 positive samples from the training set are chosen randomly as target points. In the first few queries, the SVM classifier misclassifies the majority of attacked points as false negatives, while the Random Forest remains robust against the attack. In subsequent queries, the Forest’s FNR curve rises. In each query, the adversary devises the evasion attack against the discriminant function of the SVM; as the queries proceed, the SVM’s discriminant function gets closer to the Forest’s, so an evasion attack against the SVM evades the Forest as well.

Figure 6a raises the question of why the FNR of the SVM drops after some queries, for example at query 4. In each query the SVM receives the Forest’s feedback and is retrained; after some queries it therefore tends to predict like the Forest, becomes more robust against the attack, and its evasion rate declines. The FNR curve of the SVM then fluctuates until it approaches the FNR of the Forest.

Once the SVM boundary converges to the Forest’s, devising an attack against the SVM increases the FNR of both classifiers. In other words, by minimizing the discriminant function of the SVM, both classifiers can be evaded and their evasion rates level out at 1.

Figure 6b shows an imperfect boundary estimation in which the SVM has not converged to the Forest. In queryStrategyModel, the number of queries and the number of target points affect the algorithm’s convergence. In this case, 200 positive samples from the training data are selected as target points in each query, and a smaller number of queries (QN = 8) is used for termination. Figure 6b shows that after all queries, the two evasion curves have not yet converged.

We report two types of evasion attack against the Random Forest classifier: the PK attack and the LK attack. In the PK attack, the adversary devises an evasion attack using a surrogate classifier whose decision boundary has converged to the Forest’s and thus estimates it accurately; in this way, the adversary effectively has complete information about the trained model.

queryStrategyModel has a weakness in modeling the PK attack: sometimes it does not converge to the target classifier, which means the surrogate classifier fails to estimate the discriminant function of the target classifier accurately.

In this algorithm, more target points and a higher number of queries improve convergence; in the experiments, these parameters are set empirically. However, this weakness is not a severe problem, because in the real world an adversary does not have complete information about the trained model and her knowledge is limited; the PK attack is therefore not based on a realistic assumption. The realistic assumption is made in the LK attack, where the adversary’s knowledge is limited to general information about the type of classifier but not the trained model. To model the LK attack, we devise an evasion attack in which the decision boundary of the surrogate classifier does not fully converge to the target’s boundary.

Given these considerations, queryStrategyModel can be useful even when it imperfectly estimates the target classifier’s boundary.

Figure 7 depicts the changes in the Forest’s and the surrogate classifier’s FNR as a function of the maximum allowable modification, \(d_{max}\), after applying the attack to the test data. The FNR at \(d_{max} = 0\) shows the performance of the classifiers before the attack. As the figures demonstrate, the FNR rises with increasing \(d_{max}\), since a larger \(d_{max}\) allows more modification of the spam emails during the evasion attack.

Fig. 7 Security evaluation curves for the spam data at test time. Average FNR values ± half a standard deviation are shown with error bars

In Fig. 7, the dashed green curves show the FNR of the surrogate classifier. The solid blue line in Fig. 7a represents the FNR of the Forest classifier under the PK attack, while the solid red curve in Fig. 7b shows the LK attack against the Forest. According to Fig. 7, the PK attack severely degrades the classifier’s performance, because in the PK attack the adversary has comprehensive information about the target classifier. However, the rise in the Forest’s FNR under the LK attack is also considerable. This observation demonstrates that an adversary can devise a successful attack using our proposed algorithm even with a weak estimate of the target classifier, i.e. the Random Forest.

5.3 Malware detection in PDF files

There are several reasons why PDF files are a common infection vector, the most important being the flexibility of the logical structure of the PDF format. Attackers can embed malicious JavaScript, ActionScript, binary code, and other types of attacks in a PDF file. In addition, they employ self-protection techniques such as encryption and obfuscation to hide their malicious functionality. Hence, attackers can easily devise complex attacks such as polymorphic ones.

Previous studies have shown that malicious PDFs can be detected from their extracted structural information [17–20]. However, the robustness of such detection systems is questionable when malicious PDF files mimic the structure of legitimate PDF samples [17].

5.3.1 Experimental setup

We use a sample of the Lux0r dataset presented in [21]. The original dataset contains 17,000 samples with 735 features; each feature value represents the occurrence of particular keywords, functions, constants, objects, methods, and attributes referenced by JavaScript code.

In the experiments, we randomly sample from Lux0r a PDF dataset of 2,000 unique PDF documents embedding JavaScript code: 1,000 malicious and 1,000 benign samples. After sampling, we apply the feature selection method proposed in [21] to reduce the number of features. Moreover, we cap the maximum value of each feature at 100 and then normalize all features to the range [0, 1] by dividing them by 100.

PDF files impose an additional constraint when devising an evasion attack. Because of the PDF file structure, it is far easier to insert an object into a PDF file while preserving its structure than to remove one; with this limitation, the feature values of a target PDF are only allowed to be incremented during an attack. This can be accounted for by adding the constraint \(x \ge x_{0}\) in Algorithm 2, where the constraints are defined. In addition, the maximum number of added keywords, i.e. \(d_{max}\), controls the attack power; we allow a maximum of 20 added keywords. The statistical evaluation in this experiment follows the experimental setup used for spam filtering.
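In the apply_evasion sketch given earlier, this increment-only constraint can be expressed directly through the L-BFGS-B box bounds, for example as below (an illustrative adaptation reusing the objective and x0 names from that sketch, not the paper’s exact code):

```python
# lower-bound each feature by its original value so it can only increase,
# while the upper bound keeps it inside the normalized feature space [0, 1]
bounds = [(float(x0_i), 1.0) for x0_i in x0]
res = minimize(objective, x0, method="L-BFGS-B", bounds=bounds)
```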

5.3.2 Experimental results

Figure 8 shows the surrogate classifier’s and the Forest’s FNR during the query strategy model on 350 attacked points. In the first query, the adversary perfectly evades the surrogate classifier, but the evasion attack is not very effective on the Forest. Compared with the spam filtering experiment, however, the adversary is more successful in attacking the Forest on PDF samples during the first few queries.

Fig. 8 FNR of the Random Forest classifier and the surrogate classifier on sampled attacked points of the PDF malware data during the query strategy model

In Fig. 8a, the surrogate classifier converges to the Forest after query 9. The evasion curve of the SVM in Fig. 8a behaves as in the spam filtering experiment: first the evasion rate of the SVM decreases while that of the Forest increases, then the two curves approach each other and level out at 1.

In Fig. 8b, the query strategy model terminates at query 8, before the surrogate classifier has converged to the Forest. We use these two different surrogate classifiers to devise the PK and LK attacks against the Forest classifier on the test data: in the PK attack, the adversary employs the surrogate whose decision boundary has converged to the Forest’s boundary, while in the LK attack the two classifiers have not converged.

Figure 9a presents the result of the PK attack against the Forest on the test data, while Fig. 9b shows the effect of the LK attack. As can be seen, the Forest is evaded at different rates under the two attacks; the difference lies in the peak of the two curves, which is over 0.8 in the PK case and approximately 0.45 in the LK attack. As in spam filtering, the PK attack severely degrades the Forest’s performance. The attack on the spam data is more effective than on the PDF data, because the adversary faces an extra constraint when modifying PDF samples: given the difficulty of deleting objects from PDF files, the adversary can only increment features during the attack. Therefore, the PK and LK attacks against the Forest on the PDF test data result in a lower FNR than on the spam samples.

Fig. 9 Security evaluation curves for the PDF malware data at test time. Average FNR values ± half a standard deviation are shown with error bars

6 Concluding remarks and future work

The proactive arms race provides a suitable framework for the security evaluation of a learning system: the classifier designer proactively evaluates the security of the learning system by modeling an attacker during the learning phase, and is consequently able to provide a more robust detection system. In this paper, we devised an evasion attack against a Random Forest classifier. We proposed a query strategy model to estimate the decision boundary of the Random Forest with a classifier that has a differentiable discriminant function, i.e. a surrogate classifier, and then employed the discriminant function of the surrogate to devise the evasion attack.

We modeled two types of evasion attack: Perfect Knowledge (PK) and Limited Knowledge (LK) attacks. In the PK attack, the adversary has comprehensive knowledge of the learning model, while in the LK attack her knowledge is limited to general information about the type of classifier and the feature space, not the trained model.

To devise a PK attack against the Random Forest classifier, the adversary employs a surrogate classifier whose boundary has converged to that of the Random Forest and estimates it accurately; in the LK attack, by contrast, the decision boundary of the surrogate classifier does not converge to the Random Forest’s boundary.

We graphically demonstrated a successful attack on an example from the MNIST handwritten digit classification task, and reported experimental results on two well-known applications in adversarial machine learning: spam filtering and malware detection in PDF files. The results show that an adversary can devise a successful attack with our proposed method even with imperfect information about the target classifier.

In addition, the experiments show that the proposed attack effectively degrades the performance of the Random Forest classifier; it is therefore necessary to develop an appropriate countermeasure against this attack, which we plan as future work.