
1 Introduction

Research on Adversarial Machine Learning (AML) has grown considerably in recent years, and the consequences of unsecured Machine Learning Systems (MLS) have been studied in detail [1,2,3,4,5,6,7,8,9,10]. The results of these works are of concern to the scientific community, especially in the field of cybersecurity, because machine learning is used to assist decision making in applications where security is paramount: healthcare, autonomous vehicles, power station operation, military operations, computer security, spam and malware detection, etc.

Due to the growing concern about the security of machine learning systems, methods have been developed for evaluating such systems [2, 11, 12]. Each of these methods conceptually defines taxonomies, threat models, and attack strategies to assess MLS, covering the adversarial properties known at the time. Because of the accelerated progress in adversarial machine learning, none of them currently contains a complete taxonomy and threat model that includes the adversarial properties found so far, and they therefore do not allow benchmarking between MLS security assessments.

This research complements the methods presented in [2, 11, 12] by proposing a different organization of the threat model and by introducing the concern for effective metrics capable of measuring the robustness of machine learning systems to adversarial examples. It is important to emphasize that the security of machine learning is a constant concern, as its security properties are not yet completely understood.

Although a defense threat model could be defined [13], this research is limited to the definition of an adversarial threat model.

Section 2 summarizes the most relevant research on adversarial threat models in order to design the theoretical adversarial threat model and the taxonomy of adversarial attacks. Section 3 provides an overview of how to perform a security assessment of a machine learning system considering a threat model, the different types of adversarial attack methods, and metrics; we also recommend software tools for the generation of adversarial samples.

2 Threat Model and Taxonomy

The adversarial threat model is composed of the goals, capabilities, and knowledge of the adversary that the MLS to be assessed will face. Conceptually defining the threat model is essential because it describes the adversary against whom the system intends to defend itself, guiding the evaluation of the machine learning system.

Several works [2, 4, 11, 12, 14, 15] define threat models and taxonomies, but these are often not compatible with one another. In [2, 11, 14], methods are proposed to evaluate MLS; the structure of these methods changes in each case according to its application. Despite these changes, the investigations share the conceptual definition of the threat model, the taxonomy, or the attack strategy. In this research, we propose an organization of the threat model and a general taxonomy for attacks that allows the comparison of MLS security assessments.

We have summarized the predominant concepts in the relevant taxonomies and looked for common features in order to find a description of each concept compatible with previous work [2, 4, 11, 12, 14, 15]. The concepts presented in Sect. 2.1 are based on taxonomies from the most relevant research in this field. The proposed taxonomy for the adversary also defines the organization of the analytical threat model.

2.1 Attack Scenario

The attack scenario must be specified in terms of the conceptual model of the adversary. As in the model of Biggio et al. [11], the following scenario is based on the assumption that the adversary acts rationally to attain a given goal, according to his/her knowledge of the classifier and his/her capability of manipulating data.

Adversary Knowledge

The adversary can have different levels of knowledge of the targeted system such as the training data, test data, feature set, learning algorithm, model architecture, model methods or trained parameters/hyperparameters.

Biggio et al. [4] characterized the adversarial knowledge of the targeted system in terms of a space:

$$ \Theta = \left( \mathcal{D}, \mathcal{X}, f, w \right) $$
(1)

Where:

  • \( {\mathcal{D}} \): Training data.

  • \( {\mathcal{X}} \): Feature set.

  • \( f \): Machine learning algorithm, along with the objective function \( {\mathcal{L}} \) minimized during training.

  • \( w \): Trained parameters/hyper-parameters.

Depending on the adversary's knowledge, one can describe three different types of attacks.

  • White-Box Attacks: the adversary is assumed to know everything about the targeted system. This setting allows a worst-case evaluation of the security of learning algorithms. It can be characterized as follows:

    $$ \Theta_{\text{WB}} = \left( \mathcal{D}, \mathcal{X}, f, w \right) $$
    (2)
  • Grey-Box Attacks: the adversary has partial information about the model. Two main cases are characterized below:

    • Surrogate-Dataset (adversary is assumed to know the feature representation \( {\mathcal{X}} \) and the kind of learning algorithm \( f \)):

      $$ \Theta_{\text{GB-SD}} = \left( \hat{\mathcal{D}}, \mathcal{X}, f, \hat{w} \right) $$
      (3)

      Where:

      • \( {\hat{\mathcal{D}}} \): Surrogate dataset from a similar source.

      • \( \hat{w} \): Estimated parameters from \( {\hat{\mathcal{D}}} \) (after training a surrogate classifier).

    • Surrogate-Learners (adversary is assumed to know only the feature representation \( {\mathcal{X}} \)):

      $$ \Theta_{\text{GB-SL}} = \left( \hat{\mathcal{D}}, \mathcal{X}, \hat{f}, \hat{w} \right) $$
      (4)

      Where:

      • \( {\hat{\mathcal{D}}} \): Surrogate dataset from a similar source.

      • \( \hat{f} \): Surrogate learning algorithm.

      • \( \hat{w} \): Estimated parameters from \( {\hat{\mathcal{D}}} \) (after training a surrogate classifier).

  • Black-Box Attacks: the adversary has no knowledge about the model except some components that can be obtained externally. It can be characterized as follows:

    $$ \Theta_{\text{BB}} = \left( \hat{\mathcal{D}}, \hat{\mathcal{X}}, \hat{f}, \hat{w} \right) $$
    (5)

    Where:

    • \( {\hat{\mathcal{D}}} \): Surrogate dataset from a similar source.

    • \( {\hat{\mathcal{X}}} \): Surrogate feature set.

    • \( \hat{f} \): Surrogate learning algorithm.

    • \( \hat{w} \): Estimated parameters from \( {\hat{\mathcal{D}}} \) (after training a surrogate classifier).

Table 1 summarizes the three types of attacks based on the adversary's knowledge and the MLS components known in each case.

Table 1. Adversary knowledge
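To make the notation above easier to operationalize in an assessment, the knowledge setting can be recorded explicitly. The following Python sketch is illustrative only (the class and field names are hypothetical choices, not part of any cited work or tool); it encodes which components of \( \Theta = (\mathcal{D}, \mathcal{X}, f, w) \) the adversary truly knows and which must be replaced by surrogates:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AdversaryKnowledge:
    """Which components of Theta = (D, X, f, w) the adversary truly knows.

    A False value means the adversary must rely on a surrogate instead
    (a surrogate dataset D-hat, feature set X-hat, learner f-hat, or
    estimated parameters w-hat).
    """
    training_data: bool       # D
    feature_set: bool         # X
    learning_algorithm: bool  # f (and its training objective L)
    parameters: bool          # w (trained parameters / hyper-parameters)

# The settings of Sect. 2.1, matching Eqs. (2)-(5):
WHITE_BOX           = AdversaryKnowledge(True,  True,  True,  True)   # Eq. (2)
GREY_BOX_SURR_DATA  = AdversaryKnowledge(False, True,  True,  False)  # Eq. (3)
GREY_BOX_SURR_LEARN = AdversaryKnowledge(False, True,  False, False)  # Eq. (4)
BLACK_BOX           = AdversaryKnowledge(False, False, False, False)  # Eq. (5)
```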

Adversary Goals

Adversary goals are formulated as the optimization of an objective function. Biggio et al. [11] argue that the adversary goal must be defined in terms of the desired security violation and the attack specificity. The attack specificity depends on whether the adversary wants to misclassify a targeted or an indiscriminate set of samples. Table 2 summarizes the attack specificity axis.

Table 2. Attack specificity axis

In [1], Papernot et al. define targeted or indiscriminate attacks depending on whether the adversary aims to cause specific or generic errors. Because this can be confused with the interpretation of targeted and indiscriminate attack specificity, Biggio et al. modified the naming convention: the error specificity can be either specific or generic. Error specificity disambiguates the notion of misclassification in multi-class problems. Table 3 summarizes the error specificity axis.

Table 3. Error specificity attacks axis
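As a purely illustrative sketch of the error specificity axis in Table 3 (the function below is hypothetical and assumes a differentiable PyTorch classifier `model` that outputs logits), the distinction can be expressed through the objective the adversary optimizes: a generic error only requires pushing the sample away from its true class, whereas a specific error requires pushing it toward a chosen class.

```python
import torch.nn.functional as F

def adversary_objective(model, x, y_true, y_target=None):
    """Objective the adversary maximizes over the perturbed input x.

    Generic error (y_target is None): increase the loss w.r.t. the true
    label, so that any misclassification counts as success.
    Specific error: decrease the loss w.r.t. the chosen target label, so
    that the sample is classified as that particular class.
    """
    logits = model(x)
    if y_target is None:                       # generic / indiscriminate error
        return F.cross_entropy(logits, y_true)
    return -F.cross_entropy(logits, y_target)  # specific / targeted error
```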

The desired security violation (Table 4) relates to the adversary's effort to compromise the system. It is important to emphasize that, in the case of MLS, integrity is of paramount importance, because attacks on system integrity and availability are closely related in goal and method.

Table 4. Security violation adversary axis

Adversarial Capabilities

Adversarial capability refers to the control that the adversary has over the training and testing data. Table 5 summarizes the influence axis.

Table 5. Adversary influence axis

Table 6 summarizes how each author defines the threat model in the literature.

Table 6. Threat model assumptions

As shown in Table 6, some authors use the terms ‘adversary’ or ‘adversarial’ to refer to the ‘attacker’; we use ‘adversary’ and ‘adversarial’ because we consider that they fit better in the context of machine learning security assessment. We also consider that the definition of the adversary knowledge involves the definition of the attack surface.

2.2 Attack Strategy

The attack strategy defines how the training and test data will be quantitatively modified to optimize the objective function characterizing the adversary goal [11]. Biggio et al. [4] characterized the optimal attack strategy as follows:

$$ \mathcal{D}_{c}^{*} \in \arg\max_{\mathcal{D}_{c}' \in \Phi(\mathcal{D}_{c})} \mathcal{A}\left( \mathcal{D}_{c}', \theta \right) $$
(6)

Where:

  • \( \theta \in\Theta \): Adversary knowledge

  • \( {\mathcal{D}}_{c} \): Initial attack samples

  • \( \Phi({\mathcal{D}}_{c}) \): Space of possible modifications

  • \( {\mathcal{A}}\left( {{\mathcal{D}}_{c}^{ '} , \theta } \right)\text{ } \in {\mathbb{R}} \): Adversary goals objective function

  • \( {\mathcal{D}}_{c}^{ '} \in\Phi ({\mathcal{D}}_{c} ) \): Set of manipulated adversarial examples
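Equation (6) is an abstract optimization problem; the appropriate solver depends on \( \theta \) and on \( \Phi(\mathcal{D}_{c}) \). The following sketch is one possible, hypothetical instantiation, assuming a differentiable adversary objective \( \mathcal{A} \) and taking \( \Phi(\mathcal{D}_{c}) \) to be an \( L_{\infty} \) ball around the initial attack samples; it is a projected gradient ascent sketch, not the general formulation of [4, 11].

```python
import torch

def attack_strategy(objective, x_init, epsilon=0.03, step=0.007, iters=40):
    """Approximate D_c* = argmax_{D_c' in Phi(D_c)} A(D_c', theta).

    `objective` plays the role of A(., theta) and is assumed differentiable;
    Phi(D_c) is taken to be the L-infinity ball of radius `epsilon` around
    the initial attack samples `x_init` (inputs scaled to [0, 1]).
    """
    x_adv = x_init.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = objective(x_adv)                     # A(D_c', theta)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()                          # ascend the objective
            x_adv = x_init + (x_adv - x_init).clamp(-epsilon, epsilon)  # project onto Phi(D_c)
            x_adv = x_adv.clamp(0.0, 1.0)                               # keep inputs valid
    return x_adv.detach()
```

In the white-box setting the objective is evaluated on the true model \( f \) with parameters \( w \); in the grey- and black-box settings it would instead be evaluated on a surrogate classifier trained on \( {\hat{\mathcal{D}}} \), relying on the transferability of adversarial examples.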

3 Security Assessment Method

Most authors have proposed security assessments focused on a specific application, classifier, and attack, performing assessment procedures based on the exploitation of problem knowledge and heuristic techniques. These works either point to a previously unknown vulnerability or assess the impact of a known attack on the security of an MLS. Here we propose an analytical method that complements the existing security assessment methods [4, 11].

As part of the evaluation method, it is necessary to identify the threat model; to illustrate the concepts needed to identify it, the organization of the axes described in Sect. 2 is presented in Sect. 3.1.

The threat model can be interpreted as a set of general guidelines for the security assessment of an MLS. Figure 1 illustrates the assumptions of our proposed threat model. The attack scenario must be defined by making assumptions about the adversarial knowledge, adversarial goals, and adversarial capability. The definition of the attack strategy is a fundamental part of the model, since it attempts to optimize the function that characterizes the adversary goals; we discuss this further below.

Fig. 1. Threat model for security assessment of machine learning systems.

3.1 Attack Strategy

As mentioned in Sect. 2.2, the attack strategy must be defined based on the function characterizing the adversary goal. The attack strategy, the adversary knowledge, and the adversarial capabilities together help determine which attack methods to use. It should be noted, however, that evaluating an MLS with as many methods as possible provides a more detailed evaluation.

After defining the threat model, the attack scenario, and the attack strategy, the adversarial attack methods must be selected or designed; Sect. 3.2 presents some state-of-the-art methods.

3.2 Adversarial Attack Methods

Adversarial attack methods should be selected according to the defined threat model to guide the security assessment. Table 7 summarizes the most relevant adversarial attack methods according to the taxonomy proposed in Sect. 2. We consider these attacks because they have shown the best results in compromising MLS [15]; the designer or adversary can also select state-of-the-art attacks not mentioned in the table that fit their attack scenario.

Table 7. Most relevant adversarial attack methods for generating adversarial examples

We recommend using attack methods that fit the assumptions about the adversary's knowledge, goals, and capabilities, while also considering the computational cost (attack frequency) and whether the attack is gradient-based or gradient-free.

In Table 7 we categorize adversarial attack methods according to the taxonomy proposed in Sect. 2 and also indicate the metric under which each attack is constrained. Sect. 3.3 discusses these metrics in detail.
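When an off-the-shelf tool is preferred over a custom implementation, libraries such as the Adversarial Robustness Toolbox (ART) package implementations of many of the attacks in Table 7. The sketch below reflects our understanding of ART's evasion-attack interface and should be checked against the installed version; the stand-in model, input shape, and test data are placeholders, not part of the method itself.

```python
import numpy as np
import torch.nn as nn
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod, ProjectedGradientDescent

# A tiny stand-in classifier; in practice this is the trained MLS under assessment.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

# Wrap the model so the attack implementations can query it.
classifier = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    input_shape=(3, 32, 32),
    nb_classes=10,
)

x_test = np.random.rand(8, 3, 32, 32).astype(np.float32)  # placeholder clean samples

# A single-step white-box attack (low computational cost) ...
fgsm = FastGradientMethod(estimator=classifier, eps=0.03)
x_adv_fgsm = fgsm.generate(x=x_test)

# ... and an iterative one (higher cost, stronger worst-case evaluation).
pgd = ProjectedGradientDescent(estimator=classifier, eps=0.03, eps_step=0.007, max_iter=40)
x_adv_pgd = pgd.generate(x=x_test)
```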

3.3 Metrics

Throughout the brief history of adversarial attacks, different metrics have been used to measure the change between the original samples and the adversarial samples. Goodfellow and others used metrics based on \( L_{p} \) norms; however, these metrics alone are not sufficient for measuring the robustness of an MLS, which is why Weng et al. [23] introduced CLEVER (Cross Lipschitz Extreme Value for nEtwork Robustness), an attack-agnostic metric for evaluating the robustness of any machine learning classifier against adversarial examples. Table 8 summarizes metrics used in adversarial settings.

Table 8. Metrics

Weng et al. [23] introduce CLEVER as an attack-agnostic metric that estimates a lower bound on robustness based on Lipschitz continuity; however, Goodfellow et al. [24] show that CLEVER can fail to correctly estimate this lower bound, even in theoretical settings. The question of how to measure robustness remains open.

We recommend the use of both distance and accuracy metrics, since attacks that remain within the limits of the corresponding \( L_{p} \) norm and still achieve a high success rate can be considered effective, and therefore the adversarial robustness of the MLS can be considered low.
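A minimal sketch of this recommendation follows, assuming the clean samples, the adversarial samples, the true labels, and the classifier's predictions on the adversarial samples are available as NumPy arrays (all names below are placeholders):

```python
import numpy as np

def perturbation_norms(x_clean, x_adv, p=np.inf):
    """Per-sample L_p distance between the clean and adversarial inputs."""
    delta = (x_adv - x_clean).reshape(len(x_clean), -1)
    return np.linalg.norm(delta, ord=p, axis=1)

def adversarial_accuracy(y_true, y_pred_adv):
    """Accuracy of the classifier on the adversarial samples; low values
    under a small perturbation budget indicate low adversarial robustness."""
    return float(np.mean(y_pred_adv == y_true))

# Example report for an L-infinity budget of 0.03 (placeholder arrays):
# norms = perturbation_norms(x_test, x_adv, p=np.inf)
# within_budget = norms <= 0.03
# print("adversarial accuracy within budget:",
#       adversarial_accuracy(y_test[within_budget], y_pred_adv[within_budget]))
```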

Derived from the threat model, we can define two types of evaluation methods: one related to the designer and the other to the adversary. Figure 2 briefly illustrates our method for a designer to perform a security assessment; it is important to emphasize that the order of the steps cannot be altered.

Fig. 2. Designer security assessment method.

Figure 3 briefly illustrates our method for an adversary to perform a security assessment; as in the designer evaluation method, the order of the steps cannot be altered.

Fig. 3. Adversary security assessment method.

4 Discussion

We can observe that the proposed evaluation method is based on modeling the adversary, which allows the designer to anticipate the adversary by identifying the threats the system may face and by simulating attacks. The organization of the threat model proposed in Fig. 1 allows us to define the attack scenario and to model the adversary depending on his/her knowledge, goal, and capability.

On the adversary's side, our threat model helps to analyze the MLS, since the adversary can identify what knowledge, goals, and capabilities he/she has with respect to the system and then choose or design an attack method. The result is a security assessment performed from the adversarial side.

We decided not to include the development of countermeasures as part of the method, as was done in [2, 11], because this research focuses only on the security assessment of MLS. However, we leave open the possibility for the reader to iterate the methods and include the development of countermeasures in order to obtain MLS that are robust to adversarial attacks.

In Fig. 1 we include the error specificity axis within the adversarial goal axis; this is because we find it helpful in evaluating multi-class classifiers. The fact that our method also considers multi-class classifiers makes it a high-level guideline.

5 Conclusions and Future Work

The security assessment method proposed in this paper provides the features necessary to perform security assessments of MLS. Each of the terms used for the conceptual definition of the threat model was compared with its counterparts in previous work, which allowed us to choose an organization of the threat model that models the adversary in detail, defining assumptions about his/her goals, knowledge, and capabilities. A limitation of the evaluation method for the designer is that it requires a full analysis of the adversary's behavior, which is sometimes difficult; the evaluation method for the adversary, in turn, is data-dependent. The unification and updating of previous security assessment methods, as well as the introduction of robustness metrics, will allow a more detailed security evaluation of MLS.

However, there are still open problems, such as analyzing the vulnerabilities of the MLS with respect to adversarial attacks and developing metrics capable of quantifying the robustness of a machine learning system to adversarial examples. These issues will need to be addressed soon to help ensure that the implementation of machine learning systems in adversarial settings is secure.

As future work, we will introduce a defense threat model and defense taxonomy, with the purpose of assessing defense methods for MLS.