
1 Introduction

Due to recent breakthroughs in deep neural network (DNN) design and training, DL architectures are now deployed to solve mainstream as well as industrial and critical applications, ranging from intelligent transportation systems [1,2,3] and natural language processing [4] to robotics [5] and healthcare [6]. This is in part owing to progress in VLSI technology, new high-performance communication systems, and the development of IoT devices. More specifically, this trend results in the generation of abundant amounts of data from embedded sensors and IT systems, which are necessary for training accurate DNN models.

Given the computing-intensive nature of DNNs, deep models are deployed by default in Cloud or private data-centers. However, such systems have practical limits and drawbacks from at least two perspectives:

  1. (i)

     From a resource and power-consumption perspective, and consequently an environmental one, this deployment scheme incurs considerable overheads.

  2. (ii)

     From a communication perspective, it requires sending raw data from the sensors to the servers over wireless and wired communication platforms.

The downside of this scheme is that data-centers are power-hungry platforms; they are estimated to account for around 1% of worldwide electricity use, with a high environmental impact [7]. These trends motivate an ML computing paradigm that overcomes these issues. Specifically, a more distributed deployment of ML at the Edge has emerged as a promising paradigm towards power-efficient, near-sensor intelligent systems. While Embedded and Edge ML offer a promising power/accuracy trade-off and push the mainstream development of ML models towards sustainable smart systems and cities, several problems still limit ML trustworthiness.

In this chapter, we focus on three aspects of ML trustworthiness, namely robustness to errors, security, and privacy.

2 ML Robustness to Errors

In a context of performance-driven design requirements, new hardware generations continuously shrink transistor dimensions, thereby increasing circuit sensitivity to external events that can negatively affect reliability. Errors occur in modern embedded systems in two scenarios:

  • Deliberate fault injection attacks such as Rowhammer [8]. Intentional attacks are a potential source of faults: the widespread usage of CNNs has led to the development of sophisticated attacks in which malicious users intentionally tamper with the parameters of the model [9].

  • Reliability-related events such as soft errors, either in memories, i.e., Single Event Upsets (SEUs), or in combinational circuits, i.e., Single Event Transients (SETs). These events are typically caused by high-energy particles striking electronic devices.

These errors can propagate through the neural network, causing accuracy loss and potentially global system failures that can be safety-critical or security-sensitive.

In this section, we provide an exploratory analysis of DNNs' vulnerability to errors.

2.1 Methodology

In most embedded ML accelerators, the model parameters are stored on-board. A memory corruption is therefore persistent and cumulative: it remains until a new model is trained and deployed.

To reproduce a model's behavior under this threat, we simulate memory corruptions by injecting bit-flips into randomly selected parameters of the model at runtime (inference). We then evaluate the model's robustness for different error rates and locations (Fig. 1).

Fig. 1
Overview of the fault injection methodology (input → select parameter → select bit → flip bit → repeat until done → output)

We consider two data representations:

  • IEEE-754 single-precision 32-bit float: This is the standard representation format for real numbers and the dominant representation in CPU and GPU architectures. For simplicity, we refer to this representation as \(\mathcal {F}\) in the rest of the chapter. \(\mathcal {F}\) numbers are composed of three parts: a sign, an exponent, and a fraction (see Fig. 2). The normalized format of IEEE-754 floating point is expressed as follows:

    $$\displaystyle \begin{aligned} val = ( -1 )^{\textit{sign}} \times 2^{exp-bias} \times ( 1.fraction ) \end{aligned} $$
    (1)
    Fig. 2

    Fixed-point representation in (a) with a bit-width of 8 and a fractional length of 2 (left) and −2 (right). In (b), the standard IEEE-754 representation of 32-bit floating-point values (1 sign bit, 8 exponent bits, 23 fraction bits)

  • Fixed-point representation: This representation uses two parameters: the bit-width and the fractional length. Negative fractional lengths can be used, in which case the representable values are multiples of a power of two. This representation is referred to as \(\mathcal {Q}\) (for quantized) in the rest of the chapter; a minimal software emulation is sketched after this list.
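As an illustration, the following sketch shows how a fixed-point value with a given bit-width and fractional length might be emulated in software. The function names and the NumPy-based emulation are our own choices and not part of the evaluated framework.

```python
import numpy as np

def to_fixed_point(x, bit_width=8, frac_len=2):
    """Quantize real values to signed fixed-point; the LSB weighs 2**(-frac_len),
    so a negative frac_len yields multiples of a larger power of two."""
    scale = 2.0 ** frac_len
    qmin, qmax = -2 ** (bit_width - 1), 2 ** (bit_width - 1) - 1
    return np.clip(np.round(x * scale), qmin, qmax).astype(np.int32)

def from_fixed_point(q, frac_len=2):
    """Recover the real value encoded by the fixed-point integer."""
    return q.astype(np.float32) / (2.0 ** frac_len)

w = np.array([1.37, -0.52, 12.0], dtype=np.float32)
print(from_fixed_point(to_fixed_point(w)))   # -> [ 1.25 -0.5  12.  ]
```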

To evaluate the robustness of a given model to faults, we create a fault injection framework that takes a trained network as input. While testing the model at inference time, bit-flips are injected into the network's weights at a tunable injection rate. After each test, we report the overall accuracy under fault injection.

These tests are repeated 100 times to obtain statistically representative results; in each run, the engine generates and injects a fresh set of errors. We then report the accuracy distribution, i.e., the average, maximum, minimum, and standard deviation of the accuracy over the runs.
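A minimal sketch of such an injection engine is given below, assuming the weights are held as a float32 NumPy tensor; it illustrates the methodology rather than the exact framework used for the experiments.

```python
import numpy as np

rng = np.random.default_rng()

def inject_bit_flips(weights, n_flips):
    """Flip n_flips randomly chosen bits in a float32 weight tensor,
    emulating persistent corruptions of the stored parameters."""
    corrupted = weights.astype(np.float32)
    bits = corrupted.view(np.uint32).ravel()   # reinterpret the same buffer as raw 32-bit words
    for _ in range(n_flips):
        idx = int(rng.integers(bits.size))     # random parameter ...
        pos = int(rng.integers(32))            # ... and random bit position
        bits[idx] ^= np.uint32(1 << pos)       # flip that bit in place
    return corrupted

w = rng.standard_normal((64, 128)).astype(np.float32)
w_faulty = inject_bit_flips(w, n_flips=10)     # one injection run; repeat 100x and average
```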

2.2 Results

The results compare weights in single-precision floating point with quantized weights in terms of the classification accuracy of the different networks. The results of the different runs are reported as the mean and standard deviation of the top-1 accuracy.

Figure 3 illustrates the result of comparing the floating-point and quantized representations. The results show that quantized models are, perhaps surprisingly, more robust to fault injection than the full-precision models; this has been consistently observed for 4 different CNNs and different fault injection rates. We believe that the reason behind this observation is the error distance after injection, denoted by \(\mathcal {A}\) in [10]. For instance, a \(\mathcal {Q}\) representation with 7 fractional bits and 1 integer bit will differ from the original value by at most ± 1. For the full-precision representation, however, the error distance on an activation can reach \(3 \times 10^{38}\), as observed in [10]. Therefore, since floating-point numbers are more sensitive to bit-flips than the fixed-point representation, quantized networks tend to show higher robustness to errors, in addition to their area and power consumption gains.
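The magnitude of this effect is easy to reproduce: flipping a low mantissa bit of a float32 weight barely changes it, while flipping the most significant exponent bit can push it to the order of \(10^{38}\). A small, framework-agnostic illustration:

```python
import struct

def flip_bit(x, pos):
    """Return the float32 obtained by flipping bit `pos` (0 = LSB) of x."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (y,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << pos)))
    return y

w = 0.75
print(flip_bit(w, 0))    # low mantissa bit: ~0.75000006, negligible error distance
print(flip_bit(w, 30))   # exponent MSB: ~2.55e38, a catastrophic error distance
```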

Fig. 3
Models' accuracy under fault injection for weights represented with 8-bit fixed point (\(\mathcal {Q}\)) and 32-bit single-precision IEEE-754 (\(\mathcal {F}\)), for AlexNet, VGG-16, GoogLeNet, and SqueezeNet

3 ML Security

ML systems have been deployed in a variety of application domains, including security-sensitive and safety-critical ones [11]. However, ML models have been shown to be vulnerable to several security threats, including adversarial examples, which consist of additive noise carefully crafted to fool ML models.

3.1 Adversarial Attacks

Adversarial examples are additive perturbations to an input that are carefully crafted by an adversary to deceive the model and force it to output a wrong label. If adversaries succeed in manipulating the decisions of an ML classifier to their advantage, they can tamper with the security and integrity of the system, and potentially threaten the safety of people in applications such as autonomous vehicles. For example, adding adversarial noise to a stop sign so that an autonomous car wrongly classifies it as a speed limit sign can lead to crashes and loss of life. In fact, adversarial examples have been shown to be effective in real-world settings [12,13,14]: when printed out, an adversarially crafted image can fool classifiers even under different lighting conditions and orientations. Therefore, understanding and mitigating these attacks is essential to developing safe and trustworthy intelligent systems.

Attacker Knowledge

When attacking a DNN-based model, we can distinguish two main attack scenarios based on attacker knowledge:

  1. (i)

    Black-box setting: the adversary has partial or no access to the victim model’s architecture and parameters. The adversary uses the results of querying the victim to reverse engineer the classifier and create a substitute model used to generate the adversarial examples. An illustration of this scenario is given by Fig. 4.

    Fig. 4

    Illustration of a black-box attack setting: the adversary queries the target model, builds a substitute model, generates adversarial examples on the substitute, and uses them to attack the black-box target

  2. (ii)

    White-box setting: in which the adversary has complete knowledge of the victim model's training data, architecture, and parameters. An illustration of this scenario is given by Fig. 5. The Fast Gradient Sign Method (FGSM) [15], Projected Gradient Descent (PGD) [16], and Carlini & Wagner (C&W) [17] attacks are the main white-box adversarial attacks.

    Fig. 5

    Illustration of a white-box attack setting: the adversary knows the target model's architecture, hyper-parameters, and training data, trains their own copy, generates adversarial examples on it, and attacks the white-box target

The attacker's intention is to slightly modify the source image so that the target model classifies it incorrectly, without any preference towards a particular output; this is known as an untargeted attack. In a targeted attack, by contrast, the attacker aims for a specific wrong target class.

Minimizing Injected Noise

An adversary, using information learned about the classifier, generates perturbations that cause incorrect classification under the constraint of minimizing the perturbation magnitude to avoid detection. For illustration purposes, consider a CNN used for image classification. More formally, given an original input image x and a target classification model f(⋅) such that f(x) = l, the problem of generating an adversarial example x* can be formulated as a constrained optimization [17]:

$$\displaystyle \begin{aligned} {} \begin{array}{rlclcl} x^* = \displaystyle \mathop{\text{argmin}}_{x^*} \mathcal{D}(x,x^*), s.t. ~ f(x^*) = l^*, ~ l \neq l^* \end{array} \end{aligned} $$
(2)

where \(\mathcal {D}\) is the distance metric used to quantify the similarity between two images, and the goal of the optimization is to minimize this added noise, typically to avoid detection of the adversarial perturbations. l and l* are the labels of x and x*, respectively: x* is considered an adversarial example if and only if the labels of the two images are different (f(x) ≠ f(x*)) and the added noise is bounded (\(\mathcal {D}(x,x^*) < \epsilon \), where \(\epsilon \geqslant 0 \)).

Distance Metrics

The adversarial perturbations should be visually imperceptible to a human eye. Since it is hard to model human perception, three metrics have been used in practice to measure the noise magnitude relative to a given input, namely \(L_0\), \(L_2\), and \(L_\infty \) [17]. Notice that these three metrics are special cases of the \(L_p\) norm defined as follows:

$$\displaystyle \begin{aligned} \left\|x\right\|{}_p = \left( \sum^{n}_{i = 1} \lvert x_i \rvert^{p} \right)^{\frac{1}{p}} \end{aligned} $$
(3)

These metrics focus on different aspects of visual significance. For example, \(L_0\) counts the number of pixels whose values differ at corresponding positions in the two inputs. \(L_2\) is the Euclidean distance between the two images x and x*, while \(L_\infty \) is the maximum difference over all pixels at corresponding positions in the two images.

Adversarial Attacks Generation

Several methods have been proposed in the literature to generate adversarial examples. In the following, we give a quick overview of the most popular ones:

Fast Gradient Sign Method (FGSM)

FGSM is a single-step, gradient-based attack. An adversarial example is generated by a one-step gradient update in the direction of the sign of the loss gradient with respect to the input, which is the direction that maximizes the target model's loss:

$$\displaystyle \begin{aligned} x_{adv} = x + \epsilon sign (\nabla_{x}J_{\theta}(x,y)) \end{aligned} $$
(4)

where \(\nabla_{x}J(\cdot)\) is the gradient of the loss function J with respect to the input, θ is the set of model parameters, and 𝜖 is the perturbation magnitude budget.
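For concreteness, here is a minimal PyTorch sketch of FGSM following Eq. (4); the function name and the [0, 1] pixel clamp are our own choices rather than part of the original attack definition.

```python
import torch

def fgsm(model, x, y, eps):
    """One-step FGSM following Eq. (4): x_adv = x + eps * sign(grad_x J(theta, x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()   # keep pixels in the valid [0, 1] range
```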

Projected Gradient Descent (PGD)

PGD is a more effective attack generation method; it is an iterative variant of FGSM in which the adversarial noise is generated adaptively as follows:

$$\displaystyle \begin{aligned} x_{adv}^{t+1} = \mathcal{P}_{\mathcal{S}_x}(x_{adv}^t + \alpha \cdot sign (\nabla_{x}\mathcal{L}_{\theta}(x_{adv}^t,y)) ) \end{aligned} $$
(5)

where \(\mathcal {P}_{\mathcal {S}_x}(\cdot)\) is a projection operator that projects the input back into the feasible region \(\mathcal {S}_x\), and α is the noise added at each iteration. PGD finds the perturbation that maximizes the model's loss on a particular input, while the projection operator keeps the size of the perturbation within the budget.
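A corresponding PGD sketch, again as an illustrative PyTorch implementation with an \(L_\infty \) projection (random start omitted for brevity):

```python
import torch

def pgd(model, x, y, eps, alpha, steps):
    """Iterative FGSM with projection onto the L-infinity ball of radius eps (Eq. (5))."""
    x = x.detach()
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)   # projection P_{S_x}: stay within the eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)              # and within the valid pixel range
    return x_adv.detach()
```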

Carlini & Wagner (C&W)

This attack has three variants based on the distance metric used (\(\ell_0\), \(\ell_2\), \(\ell_\infty \)). The \(\ell_2\) variant generates adversarial examples by solving the following optimization problem:

$$\displaystyle \begin{aligned} \min_{\delta} \ \left\Vert \delta \right\Vert_2 + c \cdot l(x+\delta) \quad \text{s.t.} \quad x+\delta \in [0,1]^{n} \end{aligned} $$
(6)

where \(\left \Vert \delta \right \Vert _2\) is the lowest noise that forces the model to misclassify, c is a constant balancing the two terms, and l(⋅) is the loss function defined as follows:

$$\displaystyle \begin{aligned} l(x) = \max \left( \max_{i \neq t } \{Z(x)_i\}-Z(x)_t ,\, - \kappa \right) \end{aligned} $$
(7)

where Z(x) is the output of the layer before the softmax, called the logits, t is the target label, and κ is the attack confidence. An adversarial example is considered successful if \(\max_{i \neq t}\{Z(x)_i\} - Z(x)_t \leq 0\).
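The margin term of Eq. (7) is straightforward to compute from the logits; the sketch below is an illustrative PyTorch helper, not the full C&W optimization loop.

```python
import torch

def cw_margin_loss(logits, target, kappa=0.0):
    """Eq. (7): max( max_{i != t} Z(x)_i - Z(x)_t , -kappa ),
    computed batch-wise from the logits Z(x) and target labels t."""
    target_logit = logits.gather(1, target.unsqueeze(1)).squeeze(1)   # Z(x)_t
    masked = logits.clone()
    masked.scatter_(1, target.unsqueeze(1), float("-inf"))            # exclude the target class
    best_other = masked.max(dim=1).values                             # max_{i != t} Z(x)_i
    return torch.clamp(best_other - target_logit, min=-kappa)         # floor at -kappa (attack confidence)
```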

3.1.1 Defenses Against Adversarial Attacks

To protect ML models against adversarial attacks, several defense techniques can be found in the literature. We briefly introduce the different categories and provide insights from an Embedded Systems perspective.

Adversarial Training (AT)

AT is one of the most effective state-of-the-art defense methods against adversarial attacks; its aim is to integrate adversarial noise within the training process. It can be formulated as follows [16]:

$$\displaystyle \begin{aligned} \min _{\theta} \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\max _{\delta \in B(x, \varepsilon)} \mathcal{L}_{c e}(\theta, x+\delta, y)\right] \end{aligned} $$
(8)

where θ indicates the parameters of the classifier, \(\mathcal {L}_{c e}\) is the cross-entropy loss, \((x, y) \sim \mathcal {D}\) represents the training data sampled from a distribution \(\mathcal {D}\), and B(x, ε) is the allowed perturbation set. In this formulation, the inner maximization problem’s objective is to explore the “adversarial surrounding” of a given training point and to take into account, not only the sample but also the worst-case noise from an adversarial perspective. The outer minimization problem is the conventional training aiming at minimizing the loss function (which includes adversarial noise) [16].

Nonetheless, the drawback of AT is its significant computational cost compared to the baseline training process, due to the nested optimization problems in the formulation, which need to be solved iteratively.
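A compact sketch of one adversarially trained epoch is shown below; it reuses the illustrative `pgd` function from the earlier sketch for the inner maximization and a standard cross-entropy update for the outer minimization, and is not a drop-in reproduction of [16].

```python
import torch

def adversarial_training_epoch(model, loader, optimizer, eps, alpha, steps, device="cpu"):
    """One epoch of PGD-based adversarial training (Eq. (8)): the inner maximization crafts
    worst-case perturbations, the outer minimization is a standard cross-entropy update."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd(model, x, y, eps, alpha, steps)   # inner max (PGD sketch above)
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x_adv), y)
        loss.backward()                               # outer min on the adversarial batch
        optimizer.step()
```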

Input Pre-processing (IP)

Input pre-processing applies transformations to the input in order to remove adversarial perturbations [18, 19]. Transformations include averaging, median, and Gaussian low-pass filters [19], as well as JPEG compression [20]. However, these defenses have been shown to be vulnerable to white-box attacks [21]: when the adversary is aware of the defense, they can integrate the pre-processing function into the noise generation process. Furthermore, pre-processing incurs computation overheads, which is not suitable for resource-constrained devices such as Embedded Systems.
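As an illustration of such transformations, the hedged sketch below applies a JPEG round-trip and a Gaussian low-pass filter to an 8-bit image array using Pillow; the parameter values are arbitrary examples, not those of a specific published defense.

```python
import io
import numpy as np
from PIL import Image, ImageFilter

def jpeg_compress(img_u8, quality=75):
    """JPEG round-trip: lossy compression tends to wash out high-frequency adversarial noise."""
    buf = io.BytesIO()
    Image.fromarray(img_u8).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.array(Image.open(buf))

def gaussian_blur(img_u8, radius=1.0):
    """Gaussian low-pass filtering of the input before classification."""
    return np.array(Image.fromarray(img_u8).filter(ImageFilter.GaussianBlur(radius)))
```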

Gradient Masking (GM)

GM leverages regularization to make the model's output less sensitive to input perturbations. Papernot et al. presented defensive distillation [22]. Nonetheless, this method is vulnerable to the C&W attack [17]. Besides, GM techniques such as defensive distillation require a retraining process, which results in time and energy overheads.

Randomization-Based Defenses

These techniques leverage randomness to protect systems from adversarial noise. Lecuyer et al. [23] propose adding random noise to the first layer of the DNN and estimating the output via Monte Carlo simulation. Raghunathan et al. [24] evaluate their approach only on a tiny neural network. Estimating the model output requires a heavy Monte Carlo simulation with a number of different model inference runs online, which cannot be afforded under resource constraints.

These defense strategies either require changing the DNN structure, modifying the training process, or retraining the model against known adversarial threats, which results in considerable overheads in time, resource utilization, and energy consumption. In the following, we present defense strategies that take this aspect into account, which we call Embedded Systems-friendly defenses.

3.2 Embedded Systems-Friendly Defenses

Another set of defense techniques is inspired by hardware-efficiency techniques such as quantization [25, 26]. The authors in [27] proposed Defensive Approximation (DA), which leverages approximate computing (AC) to build robust models.

3.2.1 Defensive Approximation

The demand for high-performance embedded and mobile devices has increased drastically in the past decades. However, the technology is physically reaching the end of Moore's law, especially with the release of TSMC and Samsung 5 nm technology [28]. On the other hand, highly accurate computation is not a must in all application domains. In fact, in a wide range of emerging applications, there are no specific accuracy requirements at the computing-element level, but rather quality-of-service requirements at the system level. These applications are inherently fault-tolerant by design and can relax the computational accuracy constraint. This observation has motivated the development of approximate computing (AC), a computing paradigm that trades accuracy for power consumption. The idea is to implement inexact/approximate elements that consume less energy, as long as the overall application tolerates the resulting imprecision. This paradigm has proven promising for inherently fault-tolerant applications such as deep learning, data analytics, and image/video/signal processing. Several AC techniques have been proposed in the literature; they can be classified into three main categories based on the computing-stack layer they target: software, architecture, and circuit level [29, 30].

Defensive approximation [27] tackles the problem of robustness to adversarial attacks from a new perspective, i.e., approximation in the underlying hardware, and leverages AC to secure DNNs. Specifically, at the lowest level, DA replaces the exact conventional multipliers used in the convolution operations with approximate multipliers. These approximate multipliers can generate inaccurate outputs, but the error distance needs to remain under control. For this reason, the approximation is applied exclusively to the mantissa multiplication, to avoid the high-magnitude noise that errors in the exponent or the sign bit of floating-point numbers would cause. The convolution layers are then built from the approximate multipliers, which injects AC-induced noise within the model layers. This noise is leveraged to protect DNNs against adversarial attacks. Moreover, in addition to the by-product resource gains due to AC, this defense requires no retraining or fine-tuning of the protected model.

DA targets both robustness and energy/resource challenges. In fact, DA exploits the inherent fault tolerance of deep learning systems to provide resilience while also obtaining the by-product gains of AC in terms of energy and resources. The AC-induced perturbations tend to help the classifier generalize, enhance its confidence, and consequently improve its robustness. Figure 6 gives an overview of the DA mechanism within a CNN. It shows the distribution of the error distance due to the approximate multiplier. This noise distribution propagates within the model and impacts the feature maps, thereby defusing the adversarial noise mechanism. In the following, we discuss the exploration of the approximation space with regard to the baseline accuracy of the models.
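To make the idea concrete, the sketch below emulates, in software, a floating-point multiplication whose approximation is confined to the mantissa (here by simply truncating low-order mantissa bits). This is a simplified stand-in under our own assumptions, not the actual approximate multiplier circuit used in [27].

```python
import struct

def truncate_mantissa(x, kept_bits=12):
    """Zero out the low-order mantissa bits of a float32 value, leaving sign and exponent exact."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    mask = (0xFFFFFFFF << (23 - kept_bits)) & 0xFFFFFFFF
    (y,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return y

def approx_mul(a, b, kept_bits=12):
    """Approximate multiply: exact product of mantissa-truncated operands.
    The error distance stays small because sign and exponent are never perturbed."""
    return truncate_mantissa(a, kept_bits) * truncate_mantissa(b, kept_bits)

print(approx_mul(3.14159, 2.71828), 3.14159 * 2.71828)   # small, bounded relative error
```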

Fig. 6
Defensive approximation overview: approximate convolution layers propagate AC-induced noise into the feature maps of the CNN

3.2.1.1 Baseline Accuracy

Before exploring the impact of AC on the security of DNNs, the protected model needs to maintain the model's utility as a bottom line. For this reason, we explore the impact of the approximate multiplier on accuracy for different levels of approximation, i.e., from approximating the full network to keeping an increasing number of exact layers alongside the approximate ones. Table 1 gives an overview of the utility as a function of the approximation level of the model for the CIFAR-10 and ImageNet datasets.

Table 1 Impact of approximation on model classification accuracy for a set of clean inputs from CIFAR-10 and ImageNet
3.2.1.2 Impact on Robustness

To evaluate the impact of AC on robustness, we consider a powerful adversary that has full access to the defense mechanism as well as the victim model architecture and parameters. Hence, we measure the model accuracy under adversarial examples created using different attacks for several noise budgets.

Figure 7 summarizes the effectiveness of the DA defense against FGSM and PGD attacks for different noise budgets (ε). The approximate hardware prevents the attacker from generating effective adversarial examples for deeper networks and complex data distributions. Even with a high amount of injected noise (𝜖 = 0.06), the DA model's accuracy remains as high as 90% under the PGD attack.

Fig. 7
Accuracy of the exact and approximate models for different noise budgets under white-box attack. (a) CIFAR-10 using FGSM. (b) CIFAR-10 using PGD. (c) ImageNet using FGSM. (d) ImageNet using PGD

3.2.2 Undervolting as a Defense

3.2.2.1 Approach

This approach explores voltage over-scaling (VOS) as a lightweight defense against adversarial attacks [31]. It consists of reducing the supply voltage at runtime, i.e., during inference, without scaling down the frequency accordingly (Fig. 8). This creates stochastic hardware-induced noise in the computation circuitry that is leveraged to defend DNNs against adversarial attacks. The rationale behind choosing VOS is as follows:

  1. (i)

    Stochastic noise: The benefit of injecting random noise for DNN robustness has been proven theoretically in [23, 32]. However, none of these works provides a practical implementation of the randomness source, especially one with low enough overhead and complexity to meet Embedded ML requirements. This approach leverages a fundamental property of VOS: the stochastic behavior of the induced timing violations within the circuit.

  2. (ii)

    Controllable noise magnitude: While injected random noise can improve the robustness of DNNs [23], its magnitude should remain under control; injecting high-magnitude noise can drastically degrade the baseline accuracy. The magnitude of VOS-induced noise is directly controllable by the supply voltage (a simulation sketch of such rate-controlled faults follows this list).
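The sketch below illustrates, at a functional level, how such stochastic, rate-controlled faults can be emulated in software. The single-bit-flip model and the `fault_rate` knob are our own simplifications standing in for the voltage-dependent timing-violation behavior of real hardware.

```python
import numpy as np

rng = np.random.default_rng()

def vos_faults(values, fault_rate):
    """Corrupt each float32 value independently with probability fault_rate by flipping
    one random bit of its 32-bit word; fault_rate plays the role of the VOS level."""
    out = values.astype(np.float32)
    bits = out.view(np.uint32).ravel()
    hit = rng.random(bits.size) < fault_rate          # stochastic selection of faulty values
    pos = rng.integers(0, 32, size=int(hit.sum()))    # one random bit position per faulty value
    bits[hit] ^= (np.uint32(1) << pos.astype(np.uint32))
    return out
```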

Fig. 8
Experimental setup for undervolted models' robustness: adversarial samples are generated on both the exact model (conventional DNN) and the approximate model (conventional DNN + VOS faults), and the accuracy of both is measured at inference

3.2.2.2 Setup

To match fault rates with voltage levels, we used a Xilinx Zynq UltraScale+ ZCU104 FPGA platform hosting a VGG-16 CNN. The device's Processing System (PS) includes a quad-core Arm Cortex-A53 application processor (APU) and a dual-core Cortex-R5 real-time processor (RPU). We leveraged an external voltage controller, the Infineon USB005, connected to the board via an I2C wire, to perform the undervolting characterization of the FPGA device. The different voltage rail supplies of the board can be read and written using the PowerIRCenter GUI.

3.2.2.3 Impact on Robustness

Figure 9 shows the accuracy of the exact model and the undervolted models for the LeNet-5, AlexNet, and ResNet-18 CNNs under \(\ell_\infty \) and \(\ell_2\) C&W attacks. While the baseline exact model (\(h_{exact}\)) yields high classification accuracy, it drops drastically under the C&W attack, reaching nearly 0 for ε = 0.4. Most importantly, the approximate model with a fault rate \(f_r = 10^{-4}\) maintains high robustness (accuracy under attack) even for high-magnitude ε. This observation holds for AlexNet and ResNet-18 as well.

Fig. 9
Robustness of VOS models under the C&W attack for both the \(\ell_\infty \) (top) and \(\ell_2\) (bottom) metrics, for LeNet-5 (MNIST), AlexNet (CIFAR-10), and ResNet-18 (CIFAR-100)

While VOS offers a practical source of randomness that enhances DNN robustness to adversarial attacks, it also comes with an obvious by-product gain in terms of power consumption, and it provides an ad-hoc defense that requires neither modifying the model nor retraining it.

Trade-off

The results show that VOS-induced noise protects DNNs against adversarial attacks. However, aggressive undervolting results in a drop in utility. A trade-off between accuracy and robustness, with by-product power savings, can be found to obtain highly robust models without an accuracy drop. An example of the robustness/accuracy trade-off is depicted in Fig. 10. Notice that \(f_r\) represents the fault rate, which is directly defined by the VOS level. The figure shows that, with a simple space exploration, we can identify a sweet spot for a given CNN that yields the highest possible robustness with the lowest possible accuracy drop.

Fig. 10
An illustration of the accuracy/robustness trade-off for AlexNet on CIFAR-10 under the HSJ attack: baseline accuracy decreases with the computational fault rate \(f_r\), while robustness on \(\ell_2\) and \(\ell_\infty \) HSJ follows a bell-shaped curve, with a sweet spot where the curves align. In the figure, \(f_r = 0\) indicates the exact model, \(h_{exact}\)

3.3 Privacy

Confidentiality is a fundamental design property, especially for systems that process, store, or communicate private and sensitive data. In ML, ensuring model privacy consists in protecting the model against information leakage, whereby an adversary aims to infer sensitive information, such as training data, by interacting with the victim. In fact, the promising performance of ML systems has spread their use to sensitive applications ranging from medical diagnosis in healthcare to surveillance and biometrics. These models are trained on varied data such as clinical/biomedical records, personal photos, genome data, financial records, social data, and location traces. Moreover, they are also trained on crowd-sourced data by cloud providers (e.g., Amazon AWS, Microsoft Azure, Google API) in an ML-as-a-Service fashion, which allows novice users to train models that often contain personally identifiable information.

ML models are vulnerable to privacy threats, which are critical when data confidentiality is at stake, e.g., when clinical records could reveal the identity of patients. Membership Inference Attacks [33] aim at determining whether a data sample belongs to the training dataset. More generically, Property Inference Attacks [34] infer properties that hold only for a fraction of the training data and are independent of the features that the DNN model aims to learn. On the other hand, Model Stealing methods [35] aim at duplicating the functionality of the ML model and extracting its parameters, while Model Inversion Attacks [36] aim to infer sensitive features of the training data.

Towards avoiding these leakages of confidential information, several privacy-preserving techniques can be employed. Homomorphic Encryption (HE) ensures that the data remains confidential, since the attacker does not have access to the decryption keys. CryptoNets [37] apply HE to perform DNN inference on encrypted data, and the work of [38] extends the encryption to the complete training process. However, HE-based techniques are very costly in terms of execution time and resources.

Another state-of-the-art approach to privacy-preserving ML is Differential Privacy (DP), which consists of injecting random noise into the stochastic gradient descent process (noisy SGD) [39], or of using Private Aggregation of Teacher Ensembles (PATE) [40], in which the knowledge learned by an ensemble of “teacher” models is transferred to a “student” model. While DP is one of the most effective defenses against information leakage, it comes at a considerable cost in terms of utility, i.e., it results in a baseline accuracy drop.
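The core of noisy SGD can be sketched in a few lines: clip each per-sample gradient to bound its sensitivity, then add calibrated Gaussian noise before the update. The micro-batched PyTorch sketch below is illustrative only; production implementations typically use vectorized per-sample gradients and a privacy accountant.

```python
import torch

def dp_sgd_step(model, x, y, optimizer, clip_norm=1.0, noise_multiplier=1.1):
    """One noisy-SGD step: per-sample gradient clipping + Gaussian noise (in the spirit of [39])."""
    accum = [torch.zeros_like(p) for p in model.parameters()]
    for xi, yi in zip(x, y):                                       # micro-batching: one sample at a time
        loss = torch.nn.functional.cross_entropy(model(xi.unsqueeze(0)), yi.unsqueeze(0))
        grads = torch.autograd.grad(loss, list(model.parameters()))
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)   # clip to the sensitivity bound
        for a, g in zip(accum, grads):
            a.add_(g * scale)
    optimizer.zero_grad()
    for p, a in zip(model.parameters(), accum):
        noise = torch.randn_like(a) * noise_multiplier * clip_norm  # calibrated Gaussian noise
        p.grad = (a + noise) / len(x)
    optimizer.step()
```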

Training a deep neural network requires a large amount of data, which is practically the most valuable asset in ML ecosystems. In some applications, data is protected by privacy regulations and user-level agreements. These can be specific to application domains, such as the HIPAA regulations in the US, which prohibit sharing patients' data, or more generic, such as the GDPR in Europe, which regulates user data collection [41]. Therefore, in medical applications, a given health institution might not be able to collect enough representative and relevant data to train an efficient ML model.

In other scenarios, data may be created on Edge devices, but owners are reluctant to share it due to privacy concerns (industrial applications, text messages, etc.), bandwidth challenges, or both.

Federated learning (FL) recently emerged as a potential solution to these issues. Specifically, FL allows ML models to be trained collaboratively across different nodes without sharing their local data [42]. Multiple participants (also called clients) train local models, which are then consolidated into a global model that benefits from all client data without the data ever being shared directly. Each client trains its model on its private data and communicates only the resulting model updates to a central server (also called the aggregator); the raw data therefore remains local to each client and private. Besides privacy, distributing the training also brings benefits in performance and network bandwidth. The server aggregates the local model updates into a single federated model and shares it back with the participants, allowing them to benefit from a model trained on the overall data. The federated model can continue to be refined as more data becomes available. This process is illustrated in Fig. 11.

Fig. 11
Overview of the FL setting: client devices send locally trained model updates to the server for aggregation into the federated model. A malicious aggregator can exploit gradient leakage to reconstruct client data, while malicious clients can mount model- or data-poisoning attacks
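A minimal sketch of the server-side aggregation (the FedAvg rule, weighted by client data size) is given below; the client training loop, communication, and the state-dict update format are assumptions of this illustration rather than a specific framework's API.

```python
import torch

def fedavg(client_states, client_sizes):
    """Weighted average of client state_dicts: the server aggregates parameters,
    never the clients' raw training data."""
    total = float(sum(client_sizes))
    return {name: sum(state[name].float() * (n / total)
                      for state, n in zip(client_states, client_sizes))
            for name in client_states[0]}

# Each round: clients train locally and send their state_dict();
# the server then calls global_model.load_state_dict(fedavg(states, sizes)).
```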

While FL has been branded by major companies such as Google as a privacy-preserving solution, it has been shown to be vulnerable to several attacks that can jeopardize its integrity and confidentiality:

Model Poisoning and Data Poisoning

In an FL setting, each client is able to arbitrarily and maliciously alter the local model it sends to the server. The model can be manipulated either directly, through its parameters, or indirectly, by poisoning the local training set, in order to degrade the quality of the aggregated model, making it misclassify more often or become more susceptible to adversarial inputs. In model poisoning, a malicious client attempts to change the global model by poisoning its local model parameters directly [43]. In contrast, in data poisoning, the attacker manipulates its local training samples, affecting the model's performance indirectly over a substantial portion of the input space [44].

Deep Leakage from Gradients

With access to the gradients of a particular client, an adversary is able to reconstruct the client's training samples. In fact, attacks such as Deep Leakage from Gradients (DLG) [45] and iDLG [46] show that it is possible to reconstruct training data samples from raw gradients alone. The recovered images are pixel-wise accurate and are generated by solving an optimization problem that minimizes the difference between the gradient of a candidate input and the real gradient.

Defenses and Limits

Differential Privacy has been used as a defense against data leakage [39]. However, it does not protect against poisoning attacks. Moreover, secure aggregation techniques such as [47] aim at preventing the server from accessing individual model updates while still allowing the aggregation operation. However, this defense makes it impossible, by construction, to detect integrity attacks.

To defend against integrity attacks and limit the influence of individual participants, robust aggregation techniques (also called Byzantine-tolerant aggregation) have been proposed [48, 49].
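A simple instance of such a rule is coordinate-wise median aggregation, sketched below; it is only one of several Byzantine-tolerant schemes (others include trimmed mean and Krum) and is given here purely as an illustration, not as the method of [48, 49].

```python
import torch

def median_aggregate(client_states):
    """Coordinate-wise median of client updates: a few arbitrarily corrupted
    (poisoned) updates cannot drag the aggregate away from the honest majority."""
    return {name: torch.stack([s[name].float() for s in client_states]).median(dim=0).values
            for name in client_states[0]}
```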

Fairness

The FL approach is designed under the assumption of non-iid data. The incentive for participants to share model updates generated on their local data is to enhance the model's accuracy, particularly on their own data distribution. However, robust aggregation techniques treat the tail of the gradient-update distribution as a potential integrity attack and cut it off during aggregation. Therefore, users with “atypical” data, i.e., in the tail of the overall data distribution, do not benefit from the FL setting since their contributions are discarded by the robust aggregation mechanism [50]. This results in a fairness problem: users with minority or atypical data distributions are disadvantaged by the FL setting.

Open Problems

FL offers an interesting solution for privately sharing “knowledge representations” without sharing raw data, which allows more general and efficient models to be trained. However, three objectives that are necessary for FL deployment appear difficult to achieve simultaneously: privacy, integrity, and fairness. In fact, secure aggregation techniques solve the privacy problem but open an attack surface on the model's integrity. On the other hand, tackling the integrity problem with robust aggregation schemes sacrifices the global model's fairness.

We believe that a fundamental problem for the community to solve is finding adaptive trade-offs between these three objectives.

4 Conclusion

This chapter focuses on three aspects of ML trustworthiness, especially in the context of embedded systems and the Edge:

  1. (i)

    The first is the robustness of ML models to errors, whether caused by hardware reliability issues or deliberately injected by malicious actors.

  2. (ii)

    The second aspect is the security of ML models, especially from an adversarial ML perspective. More specifically, we explored defense techniques that are Embedded Systems-friendly, i.e., that do not result in a high overhead in power consumption or hardware resources.

  3. (iii)

    The third is the privacy problem, where we focused on federated learning as an emerging training paradigm that is compatible with Embedded Systems and IoT applications.