
1 Introduction

Machine learning has been a prominent topic in both research and application for a long time. It comprises several subfields whose models and techniques perform well in practical domains and tasks such as computer vision, natural language processing, and autonomous driving. As machine learning becomes more widely deployed in practice, its security issues have attracted increasing attention.

Machine learning models are exposed to various kinds of security risks. Firstly, previous studies [1, 2] showed that machine learning models are vulnerable: a classifier can make serious mistakes on specifically crafted adversarial examples that would not fool a human. In other words, a rather small perturbation can fool the model [3], and such perturbations can be crafted intentionally by an adversary who has access to the model or can obtain some information about it. Secondly, under the adversary models assumed by the security evaluation frameworks in [4, 5], the training data and the model parameters can also be attacked in some situations. This means that a machine learning model can be manipulated by an adversary, which poses serious risks to both its correctness and its effectiveness.

Because a practical task can often be solved by several different machine learning models, a natural question is which model to choose in practice when effectiveness, efficiency, and security are all taken into account. This paper aims to provide a quantified framework for comparing different machine learning models. When facing the security-effectiveness trade-off, we can make our choice by referring to the evaluation results of this comparison framework.

We perform our experiments on the MNIST two-class (3 vs. 7) classification problem, since MNIST classification is one of the fundamental problems of machine learning and several models perform well on it [6]. We compare three models: a linear support vector machine, a neural network with one hidden layer, and a convolutional neural network. Although all three models achieve good accuracy, their complexities differ greatly. We compare these models under three different attack targets, namely the test data, the training data, and the model parameters, each corresponding to a different assumption about the adversary's capability. In addition, we consider adversaries who try not to make the changes to the attack target too obvious. For attacks on model parameters, we limit the adversary's capability to attacking a single layer, or even a single parameter, of the model. For attacks on training data and model parameters, we control the attack strength by limiting the range over which the target may change. For attacks on test data, we measure the mimicry behavior introduced in [2], which steers adversarial examples toward a high-density region of real samples; from another perspective, it also has the effect of concealing the adversary's attack actions.

This paper is organized as follows. Section 1 gives an introduction. Section 2 reviews previous related work. Section 3 introduces the relevant machine learning models, namely the support vector machine, the multilayer perceptron, and the convolutional neural network; it also reviews the concept of adversarial attacks and how to model the adversary under several assumptions in order to evaluate model security. Section 4 presents the comparison experiments in which the adversary attacks the test data (i.e., evasion attacks), based on the idea given by [2]. Section 5 presents the comparison experiments in which the adversary attacks the training data; here we focus on a simple method called label reversal poisoning, a subclass of poisoning attacks. Section 6 presents the comparison experiments in which the adversary attacks the model directly, that is, changes the model parameters to some extent under our assumptions. Section 7 gives our conclusions and some further thoughts.

2 Related Work

Research on machine learning in adversarial environments has been going on for many years. [9] initiated the study of adversarial attacks against machine learning and gave a framework for modelling the adversary's strategy. [10] analyzed the adversary's capability to obtain information and launch attacks under several assumptions. [11, 12] classified adversarial attacks into several categories and proposed several defense strategies. [20] explored the space of adversarial images and explained the existence of adversarial images that cannot trick human beings.

Quantified security evaluation of machine learning models has also been studied in recent years. [2] defined an adversarial attack model based on the capability and knowledge of the adversary, and provided a general security evaluation framework for machine learning models under adversarial environments in [13]. [14] proposed a forward-derivative algorithm to craft adversarial examples efficiently without relying on existing samples. [3] argued that the linear nature of neural networks, rather than their non-linearity, is the primary cause of their vulnerability to adversarial perturbations. [19] evaluated the robustness of models when different layers are modified.

Regarding different attack targets, [15] used poisoning attacks to evaluate the security of the SVM model, while [2] evaluated the security of the SVM model under evasion attacks. There are also attack methods that work when the adversary cannot obtain perfect knowledge or capability, known as transferable adversarial attacks [16]. Several papers further elaborate on how to launch adversarial attacks on practical machine learning tasks, especially models used in the security domain, for example spam email filters [7] and the malicious control of automobiles [8].

3 Preliminary

3.1 Relevant Machine Learning Models

We define a two-class classifier g(x) that classifies x as a positive (negative) sample when \(g(x)>0\) (\(g(x)<0\)). We compare the following models:

Support Vector Machine (SVM) [17] is a classification algorithm mainly applied to two-class classification tasks. It aims to find a hyperplane that separates the data of different classes with maximized margin. As a classic and representative machine learning algorithm, it exhibits good learning ability and has been applied widely in tasks such as data mining, image processing, and pattern recognition. In addition, it is relatively robust compared to neural networks. In our experiments we use a basic linear SVM model trained with the hinge loss. Note that the parameters of an SVM depend only on a small number of training samples (the support vectors), which contributes to its robustness.

Multilayer Perceptron, also known as a neural network, has been widely used in many real-world applications. It and its variants are powerful and effective in computer vision, natural language processing, and many other artificial intelligence scenarios. In our experiments we use a one-hidden-layer neural network with 20 hidden nodes and ReLU activation, denoted NN.

Convolutional Neural Network has been one of the most remarkable machine learning techniques since its introduction. Almost every state-of-the-art model in computer vision takes advantage of the convolutional structure, and CNNs show impressive performance on image classification, object detection, and many other tasks. They are also the most advanced models for the MNIST classification problem. In our experiments we use the classic LeNet with two convolutional layers and two fully connected layers, with ReLU activation [18].
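For concreteness, the three classifiers could be instantiated roughly as follows in PyTorch. This is only a sketch: the output score g(x), the hidden size of 20, the two-convolutional/two-fully-connected LeNet structure, and the ReLU activations come from the text, while the remaining details (kernel sizes, channel counts, class names) are illustrative assumptions rather than the authors' exact configuration.

```python
# Sketch of the three classifiers compared in this paper (PyTorch).
# Only the details stated in the text are taken as given; everything else
# (kernel sizes, channel counts, names) is an illustrative assumption.
import torch
import torch.nn as nn

class LinearSVM(nn.Module):
    """Linear two-class scorer g(x), trained with the hinge loss."""
    def __init__(self, in_dim=28 * 28):
        super().__init__()
        self.fc = nn.Linear(in_dim, 1)

    def forward(self, x):
        return self.fc(x.flatten(1)).squeeze(1)   # g(x) > 0 -> positive class

class OneHiddenNN(nn.Module):
    """One-hidden-layer perceptron with 20 ReLU units (the paper's NN)."""
    def __init__(self, in_dim=28 * 28, hidden=20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x.flatten(1)).squeeze(1)

class LeNetLike(nn.Module):
    """Two convolutional layers plus two fully connected layers with ReLU."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2))
        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120), nn.ReLU(), nn.Linear(120, 1))

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1)).squeeze(1)
```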

3.2 Adversarial Attack

Adversarial learning refers to a learning environment in which an adversary attacks the defender's machine learning models, resulting in a confrontation between the adversary and the defender. In general, the adversary has a goal, usually to disable the model. Under given assumptions about the adversary's capability and knowledge, he may launch different types of attacks against different targets. For a spam filter, for example, an adversary with perfect knowledge of the filter could craft a spam email that evades it by avoiding the illegitimate keywords the filter relies on. Furthermore, if he obtains the authority to manipulate the training data or the model parameters, he can crack the model by manipulating them directly.

Besides launching attacks on the three models in our experiments, we also consider a more realistic situation in which the adversary has to restrict the size of his moves. In some cases the defender may set up additional defenses, such as a second filter based on a different model or even human monitoring. If an attack is too obvious, for example setting all the model parameters to zero or crafting a totally unrecognizable handwritten digit image, the defender may detect it and deploy a new defense immediately, and the attack fails very quickly. Therefore the adversary often combines his attack with some concealment technique, such as mimicking the original data or limiting the step size of each move. We describe the corresponding method in each of the following experiments.

3.3 Security Evaluation

Based on the security evaluation framework proposed in [4], we believe that simulating different kinds of potential attack scenarios is the key step in evaluating the security of machine learning models. More specifically, we can evaluate the security empirically in three steps: first, identify possible attack scenarios; second, design the corresponding attacks; third, systematically assess the impact of those attacks. Note that the final step has to be carried out in a model-specific way. Also note that our attack model is based on specific assumptions about the adversary, including his purpose, his knowledge of the model, and his capability with regard to the data or the model.

4 Attack on Test Data

An attack on test data is also called an evasion attack: the adversary modifies test samples in order to evade the filter model. In this section we consider SVM, NN, and CNN under evasion attacks and study how to modify test samples so as to evade the filter within a limited range of movement. We then simulate several attack scenarios and compare the results.

4.1 Background and Settings

We use a gradient descent algorithm to evade the different filter models. The algorithm computes the derivative of g(x) with respect to x and then subtracts the normalized derivative from the original sample x. It is similar to the classical gradient descent training algorithm for neural networks, the only difference being that we differentiate with respect to the input rather than the weights. One attack step can be written as:

$$\begin{aligned} x_{new} = x_{old}-\frac{\frac{\partial g(x)}{\partial x}|_{x=x_{old}}}{\Vert \frac{\partial g(x)}{\partial x}|_{x=x_{old}}\Vert } \end{aligned}$$
(1)

where \(x_{new}\) is the newly generated adversarial example.
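A minimal sketch of one attack step of Eq. (1) is given below, assuming a differentiable PyTorch model that returns the score g(x); the step would be repeated until the distance budget \(d_{max}\) is used up or the sample is misclassified. This is our illustration, not the authors' original implementation.

```python
# One gradient-descent evasion step of Eq. (1): move the sample against the
# unit-norm gradient of the classifier score g(x).
import torch

def evasion_step(model, x):
    x = x.clone().detach().requires_grad_(True)
    g = model(x.unsqueeze(0)).sum()            # scalar score g(x)
    g.backward()
    grad = x.grad
    return (x - grad / grad.norm()).detach()   # x_new = x_old - unit gradient
```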

Although the adversary wants the model to misclassify the test data (classify a positive sample as negative), an overly large movement range makes the attack easy for the defender to detect. The adversary may therefore limit his movement range. In addition, he may perform the mimicry action proposed in [2], pulling the adversarial example toward real samples so that, from a human perspective, it does not look heavily modified; many noise points, for example, can indicate that an image has been intentionally altered, which often happens under gradient attacks. Note that in [2] this term was introduced to avoid local minima of the gradient descent algorithm; from another perspective, it also conceals the adversary's attack actions.

We formulate the attack as an optimization problem. Under the above assumptions, for any positive (negative) test sample \(x_0\), the optimal attack strategy is to find a modified adversarial example \(x^*\) that minimizes (maximizes) the output of the two-class classifier g(x), while keeping the distance between \(x^*\) and \(x_0\) smaller than a given \(d_{max}\). However, this may make the adversarial example look very different from a real digit image, which in turn may be caught by an additional defense deployed by the defender. To overcome this drawback, we add a mimicry term to the attack objective, which yields the following modified optimization problem:

$$\begin{aligned} \mathop {\arg \min }_{x}f(x)=g(x)-c\sum _{i|y_i=-1}K(x-x_i), d(x,x_0)\le d_{max} \end{aligned}$$
(2)

where \(y_i\) is the label of \(x_i\) and K is a kernel density estimator (KDE); in our experiments we use a Gaussian kernel.
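A sketch of one descent step on the mimicry objective of Eq. (2) follows: the classifier score is combined with a Gaussian KDE over the negative-class samples, and the result is projected back into the \(d_{max}\) ball around \(x_0\). The trade-off constant c, the kernel bandwidth h, and \(d_{max}\) are placeholders, not values taken from the paper.

```python
# One descent step on f(x) = g(x) - c * sum_i K(x - x_i) from Eq. (2),
# where the sum runs over negative-class samples and K is a Gaussian kernel.
# c, h and d_max are placeholder values, not the paper's settings.
import torch

def mimicry_step(model, x, x0, neg_samples, c=1.0, h=1.0, d_max=5.0):
    x = x.clone().detach().requires_grad_(True)
    g = model(x.unsqueeze(0)).sum()                         # classifier score g(x)
    diffs = x.flatten() - neg_samples.flatten(1)            # (N, 784) differences
    kde = torch.exp(-(diffs ** 2).sum(1) / (2 * h ** 2)).sum()
    f = g - c * kde                                         # mimicry objective
    f.backward()
    grad = x.grad
    x_new = (x - grad / grad.norm()).detach()
    delta = x_new - x0                                      # project into the d_max ball
    if delta.norm() > d_max:
        x_new = x0 + delta * (d_max / delta.norm())
    return x_new
```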

4.2 Experiments and Results

We conduct two experiments in this section. In the first one, we use the evasion attack algorithm described above to attack the SVM, NN, and CNN models, compare the results, and analyze their security positions under this attack.

In the second experiment, we take the linear SVM as an example to visually demonstrate how the mimicry action takes effect, and we compare the results of the attack on SVM with and without mimicry.

Comparison Between Models. We compare the performance of the SVM, NN, and CNN models under evasion attack. We attack every positive sample (i.e., the digit 3 in MNIST) of the test set at different attack strengths (i.e., distances between the original and adversarial samples) and calculate the average accuracy. This yields the three accuracy curves shown in Fig. 1.

Fig. 1. Accuracy changing under gradient descent attack (\(\lambda =0\))

From Fig. 1 we see that, without attack, the CNN model has the highest accuracy, followed by the NN and then the SVM, which matches common expectations. Comparing the three models on security, however, the accuracy of the CNN drops rapidly as the evasion attack intensifies. This vulnerable security position is caused by the relatively high irregularity (non-linearity) of the neural network's classification boundary: under a gradient attack, the non-linearity lets each sample move in its own direction and reach the classification boundary along a shorter route, while the linear SVM remains stable because all samples move in the same direction. We therefore infer that the irregularity of the neural network's classification boundary makes the model vulnerable to gradient attacks. On the other hand, this same characteristic is what gives the model its fitting capability, so there is a security-effectiveness trade-off when facing this kind of attack.

It is worth noting that in [3] the authors argued that the vulnerability of neural networks is caused not by non-linearity but by their linear nature. They showed that non-linear RBF-based classification models can resist the attacks to some degree at the cost of classification accuracy, and claimed that neural networks are not non-linear enough to resist the perturbation. This conclusion is not incompatible with our result, because the non-linearity we discuss in this paper refers to the bending or irregularity of the classification boundary, which is supported by [20]. That boundary, however, is warped from a hyperplane whose intrinsically linear nature comes from the linear mappings and activation functions inside neural networks. In other words, the notion of non-linearity used here can explain why neural networks are more vulnerable than SVM while retaining higher accuracy, but it is too coarse to explain all the adversarial characteristics of a model. More specific concepts and theories beyond non-linearity may be needed in the future to clarify this intricate problem.

Mimicry Attack. We take the linear SVM model as an example to illustrate the effect of the mimicry action described above, and exhibit the samples generated by the attack algorithm. Figure 2 visually compares the common attack (first row) and the mimicry attack (second row). The figure shows that the mimicry action indeed produces examples that are more confusable to humans, so the adversary may use it to conceal his attack.

Fig. 2. Adversarial examples generated by different attack strengths and styles

Fig. 3. Accuracy changing under gradient attack with/without mimicry

Furthermore, without the mimicry action the gradient descent attack follows the steepest descent path; after adding the mimicry term, the adversary needs more attack steps to achieve the same adversarial effect, as shown in Fig. 3. The mimicry action is therefore a trade-off for the adversary as well. As defenders, we may deploy additional filter models to assist in detecting attacks, which is effective mainly against adversarial actions without mimicry.

5 Attack on Train Data

An attack on training data is also called a poisoning attack: the adversary manipulates the training set before the training process begins. This kind of attack can easily disable a machine learning model because the model cannot even be trained properly. This section is based on a simple attack method called the label reversal attack and compares the security positions of SVM, NN, and CNN under this kind of attack.

5.1 Background and Settings

We experiment with a simple attack method called the label reversal attack, which reverses the labels (0 or 1) of samples in the training data and applies to two-class problems such as a positive-sample filter. Note that this kind of attack needs only limited knowledge of the model and requires no knowledge of the model structure or parameters. The attack is very easy to launch once the adversary has knowledge of, and authority over, the training set, and it can be destructive to the model: machine learning models learn the distribution of the training data, and poisoning attacks change that distribution to some extent.

In our experiments we use random label reversal: we randomly select a certain percentage of the training samples and reverse their labels. We then train the model on the contaminated training data and test it on the unchanged test data. As the percentage of poisoned data (i.e., the attack strength) increases, we analyze and compare the security positions of the SVM, NN, and CNN models.
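A minimal sketch of the random label reversal step is shown below, assuming 0/1 labels stored in a NumPy array; the poisoned labels are then used to retrain the model before testing. Function and argument names are ours, not the authors'.

```python
# Random label reversal poisoning: flip the 0/1 labels of a randomly chosen
# fraction of the training set.
import numpy as np

def poison_labels(y, fraction, seed=0):
    rng = np.random.default_rng(seed)
    y_poisoned = y.copy()
    n_flip = int(fraction * len(y))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]      # reverse the selected labels
    return y_poisoned
```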

5.2 Experiments and Results

We still experiment on the MNIST dataset. First, we train the three models (SVM, NN, and CNN) and calculate the average model output and the classification accuracy for each model. Then we randomly select \(1\%\) of the training samples and reverse their labels. After retraining the models, we recalculate the two indices under the poisoning attack. Continuing to intensify the attack in steps of \(1\%\), we finally obtain Fig. 4.

Fig. 4. Model output (left) and accuracy (right) changing under label reversal attack

From Fig. 4 we find that CNN is the most unstable model, so we need to balance classification accuracy against security position when using CNN as a classifier. Moreover, the right panel shows that when the attack strength reaches \(50\%\), the classification accuracy on the test set exhibits a cliff-like fall from 0.9 to 0.1. The left panel, which depicts the model output, explains this phenomenon: as the average output declines toward zero, the predicted classes of the test samples remain largely unchanged even though the classification confidence indicated by the output keeps decreasing; once the output crosses zero, the predictions flip, producing the cliff-like fall in accuracy near \(50\%\).

Compared to the evasion attack, the poisoning attack brings down the classification accuracy rather slowly. One possible reason is that the poisoning attack uses only limited knowledge, while the evasion attack exploits perfect knowledge of the underlying model to launch a gradient attack.

6 Attack on Model Parameters

We first define three concepts for this section. Authority means the adversary's authority to change the corresponding attack target, which in this section is the model parameters. Knowledge means the adversary's knowledge of the model parameters, which determines whether the model is a black box or a white box to him. Attack time window means the period during which the model's vulnerability is exposed to the adversary; the adversary must attack quickly to seize this opportunity.

Now imagine the following attack scenario. The adversary has perfect knowledge and capability, meaning he has obtained complete authority to manipulate the model directly, and he modifies the model parameters to disable the model. We call this kind of adversarial action an attack on model parameters. This is an extreme case that rarely appears in practice, because with complete authority the adversary could launch far more vicious attacks than manipulating parameters, such as directly attacking the defender's system to shut down the defense.

It is still worthwhile to analyze the impact of manipulating parameters directly. Complete authority is hard to obtain, but what about partial authority? If the adversary can manipulate only some of the parameters to some extent, the analysis of parameter attacks becomes valuable. Moreover, an adversary may need to conceal his action by limiting his own movement range: shutting down the system's defense directly is far easier to detect than changing the values of a few parameters. We treat this situation as an adversary with partial authority. The following subsections discuss this kind of attack under different assumptions. Note that the intercept of a weight layer is regarded as one of that layer's weight parameters, whose input is always one.

6.1 With Complete Authority

Note that complete authority implies perfect knowledge. In this situation, the most effective attack is to compute the derivative of g(x) with respect to the parameters w and train w in the direction given by that derivative. This is a kind of adversarial training process and is identical to the regular training process of a classifier, except that the parameters are trained in the opposite direction. The attack results therefore share the same ranking as the classification accuracy of the models. If we additionally consider a limited attack time window, the training speed of each model must be taken into account. For SVM, NN, and CNN, the training speed is negatively correlated with the model's complexity, i.e., the number of parameters and layers. By contrast, CNN is the hardest to train (even adversarially), which gives it the most secure position when the attack time window is limited.
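A hedged sketch of this complete-authority attack is given below: the regular training loop run in reverse, i.e., gradient ascent on the classification loss with respect to the parameters. The loss function (a soft-margin loss with labels in {-1, +1}), the optimizer, and the learning rate are assumptions of ours, not the paper's settings.

```python
# Complete-authority parameter attack: the regular training loop with the
# gradient direction reversed (ascent on the loss). Loss choice, optimizer
# and learning rate are illustrative assumptions.
import torch

def attack_parameters(model, data_loader, epochs=1, lr=0.01):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.SoftMarginLoss()                 # labels in {-1, +1}
    for _ in range(epochs):
        for x, y in data_loader:
            opt.zero_grad()
            (-loss_fn(model(x), y.float())).backward()  # negate: ascend the loss
            opt.step()
    return model
```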

6.2 With Part of Authority and Perfect Knowledge

Assume that the adversary has perfect knowledge of the model but can only attack one layer of parameters, denoted w. As in the complete-authority case, the most effective attack is still to compute the parameter gradients and launch a gradient attack. Note that one layer of parameters means something entirely different for SVM, NN, and CNN: for NN it is 50% of all parameters, for CNN about 25%, and for SVM it is 100%, which is equivalent to complete authority. The adversarial training authority over the original model is thus limited to different degrees. In addition, if we consider a limited attack time window, the convolution operation is the most expensive one with respect to computing gradients. This means that SVM exhibits the worst security position here.
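Restricting the gradient attack to a single layer can be sketched by freezing every other parameter before running the routine above; the layer name used here is a placeholder referring to the last linear layer of the CNN sketch in Sect. 3.1.

```python
# Restrict the gradient attack to one layer by freezing all other parameters.
# "classifier.2" is a placeholder name for the last layer of the CNN sketch above;
# model and data_loader are assumed to exist already.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("classifier.2")
attacked_model = attack_parameters(model, data_loader)
```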

6.3 With Part of Authority and Limited Knowledge

If the adversary has only limited knowledge, however, he cannot launch a gradient attack on the model; he can only modify the parameters he can reach. We assume the adversary can attack only one layer, or only one parameter, of the model. Regarding the robustness of models when one layer is modified, [19] conducted several experiments and concluded that this analysis is better performed with respect to the network architecture, due to the heterogeneous behavior of different layers. For the single-parameter case we conduct our own experiments. Rather than taking his chances on modifying a parameter randomly with an unpredictable adversarial effect, the adversary may use a simple attack method: in our experiment, for a given parameter within a layer, the adversary multiplies that parameter by a constant. We call this the single parameter multiplying attack. Note that this kind of attack can be carried out very easily and quickly, regardless of the attack time window.

We conduct two experiments under this assumption. The first demonstrates the different impacts on a multilayer model when different layers face this kind of attack. We use the first convolutional layer and the second fully connected layer of the CNN model, i.e., the first and the last layer of the model. We traverse all the parameters of these two layers respectively (ignoring the intercepts and focusing on the weights) and execute the single parameter multiplying attack on each parameter at different attack strengths (multiplier constants from \(-40\) to 50 in steps of 10). We then calculate the average output and the average classification accuracy. Results are shown in Fig. 5, and a sketch of the attack itself is given below.
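The sketch below shows the single parameter multiplying attack in isolation: one chosen weight of one layer is multiplied by a constant, after which the average output and accuracy would be re-measured. The layer name and flat index arguments are placeholders chosen by the attacker, not values from the paper.

```python
# Single parameter multiplying attack: multiply one weight of one layer by a
# constant. Layer name and flat index are placeholders.
import torch

@torch.no_grad()
def multiply_single_weight(model, layer_name, flat_index, multiplier):
    weight = dict(model.named_parameters())[layer_name]
    flat = weight.view(-1)              # view shares storage with the weight tensor
    flat[flat_index] *= multiplier      # in-place modification of one weight
    return model
```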

Fig. 5. Model output (left) and accuracy (right) changing under single parameter multiplying attack on the first/last layer of CNN. The figures are drawn from scatter points at a step of 10.

Fig. 6. Model output (left) and accuracy (right) changing under single parameter multiplying attack on the last layer of SVM/NN/CNN. The figures are drawn from scatter points at a step of 10. Note that the results of NN are unstable; in other words, they are more strongly affected by the trained parameter distribution of the last layer, probably because the number of neurons in the last layer of NN is relatively small.

The figures indicate that attacks on the last layer are more destructive, in line with the intuition that the absolute values of the gradients of the last layer are usually larger than those of earlier layers. We can also infer from the figures that attacks on earlier layers produce less predictable results, due to the non-linearity introduced by the deep structure of the network.

The second experiment demonstrates the correlation between the attack effect and the complexity of the model, using the same method as before. First, we attack one random weight parameter of SVM, NN, and CNN respectively (again ignoring the intercepts). As expected, CNN suffers the least, followed by NN. Then we attack the last layer of SVM, NN, and CNN respectively and compare the results. In addition, we train a new NN model with a hidden layer of 10 units, alongside the original NN model with 20 hidden units. Results are shown in Fig. 6.

In this experiment, the SVM model has 784 weight parameters, the last layer of the CNN has 1024 weight parameters, and the last layer of the NN has merely 10 or 20 weight parameters. We can infer from the figures that a higher complexity (i.e., a larger number of parameters) of the attacked layer improves the robustness of the model, resulting in less damage as the attack intensifies. This is the opposite of the result on model complexity under evasion attack. The main reason is that when the adversary has only very limited authority, as in this experiment, higher complexity acts as a kind of protection, since the adversary can influence the model only on a small scale.

7 Conclusion

In this paper, we first summarize the security issues faced by machine learning, describe the current state of research on machine learning security under adversarial environments, and classify the adversarial attacks that machine learning models might encounter according to their attack targets. We then simulate different kinds of attacks on SVM, NN, and CNN models based on an evaluation framework and compare their security positions.

In the evasion attack (attack on test data) experiments, we show that a machine learning model with high complexity has a poor security position, because its large number of parameters can be exploited under an adversarial environment. We also conduct an experiment on the adversary's mimicry attack to illustrate the impact of concealment on machine learning models. In the poisoning attack (attack on training data) experiments, we show that even the simplest label reversal poisoning can significantly affect the test accuracy of the classifier. The SVM, NN, and CNN models all exhibit a cliff-like fall in accuracy when 50% of the training data is changed, while the CNN exhibits the most unstable fluctuations. For model parameter attacks, we classify the attacks into several scenarios based on different assumptions about the adversary's authority. The models show different security positions in different scenarios because of their intrinsic properties, especially complexity and linearity. Brief robustness rankings (security positions) under the three main attacks are listed in the following table, where 1 denotes the strongest position. Note that Authority means the adversary's authority to change the corresponding attack target, and Knowledge means the minimum knowledge of the model parameters the adversary requires.

Attack target    | Authority & Knowledge | Attack type                                        | SVM | NN | CNN
Test data        | Complete & Perfect    | Evasion attack                                     | 1   | 2  | 3
Train data       | Complete & None       | Label reversal poisoning attack                    | 1   | 2  | 3
Model parameter  | Part & Limited        | Single parameter multiplying attack on all layers  | 3   | 2  | 1
Model parameter  | Part & Limited        | Single parameter multiplying attack on last layer  | Related to the number of neurons in the last layer

This paper shows that different attack scenarios lead to different security outcomes for these three kinds of models. The correlations between complexity, non-linearity, and vulnerability cannot simply be summarized as positive or negative, because, as discussed above, they can differ considerably across specific situations. When facing a real attack, we therefore need to analyze the adversary's authority, knowledge, attack goal, attack target, and movement range in detail. Further research could focus on more complicated and advanced attack methods, and could take other machine learning models into consideration.