
1 Introduction

1.1 Motivation

Until now, backend (on-premise & cloud) deployments were considered the single source of truth and unique point of access with regard to Enterprise Systems (ES). Nevertheless, a paradigm shift has recently been observed: ES assets are being deployed towards the Edge sectors of the landscapes, by distributing data, decentralizing applications, de-abstracting technology and integrating edge components seamlessly with the central backend systems.

Capitalizing on recent advances in High Performance Computing along with the rising amount of publicly available labeled data, Deep Neural Networks (DNN), as an implementation of AI, have revolutionized, and will continue to revolutionize, virtually every current application domain, as well as enable novel ones such as autonomous, predictive, resilient, self-managed, adaptive, and evolving applications.

Distributively deployed AI capabilities will drive the above-mentioned transition. As reported by Deloitte, “... companies are incorporating artificial intelligence – in particular, machine learning – into their Internet of Things applications and seeing capabilities grow, including improving operational efficiency and helping avoid unplanned downtime” [28].

1.2 Problem Statement

The deployment of data processing capabilities throughout Distributed Enterprise Systems raises several security challenges related to the protection of input & output data [26] as well as of DNN-based/enhanced software assets.

In the specific context of distributed intelligence, DNN-based/enhanced software assets represent key investments in infrastructure, skills and governance, as well as in the acquisition of data and talent. The software industry is therefore in direct need to safeguard these strategic investments by enforcing the protection of this new form of Intellectual Property.

Furthermore, in the wake of Data Protection (DP) regulations such as the EU-GDPR [26], Independent Software Vendors (ISVs) have the non-transferable obligation to comply with them.

Therefore, ISVs aim to protect both the data and the Intellectual Property of their DNN-based/enhanced software assets, deployed on potentially insecure edge hardware & platforms [15].

1.3 State-of-the-Art

Security of Deep Neural Networks is a current research topic taking advantage of two major cryptographic approaches: variants of Fully Homomorphic Encryption (FHE) [12] and Secure Multi-Party Computation (SMC) [8]. While FHE techniques allow addition and multiplication on encrypted data, SMC enables arithmetic operations on data shared across multiple parties.

Several approaches can be found in the literature, at different phases of the development and deployment of DNNs.

Secure Training. Secure DNN training has been addressed using FHE [16] and SMC [30], disregarding protection once the trained model is productively deployed. Other Machine Learning models, such as linear and logistic regression, have also been trained in a secure way in [24]. In those approaches, confidentiality of the training data is guaranteed, while runtime protection (i.e. input, model, output) is out of scope.

Processing on Encrypted Data. At processing phase, SMC has led to cooperative solutions where several devices work together to obtain federated inferences [21], without supporting the deployment of the trained DNN to decentralized systems. DNN processing on FHE-encrypted data is covered in CryptoNets [13] and improved in [4, 18]. More recently, in [2], the authors proposed a privacy-preserving framework for deep learning, making use of the SEAL [29] FHE library. While disclosure of data at runtime is prevented in these solutions, protection of the DNN models remains out of scope.

Intellectual Property Protection of DNN Models. In [31], the authors tackle IP protection of DNN models through model watermarking. While infringement can be detected with this method, it cannot be prevented. Furthermore, runtime protection of input, model and output is out of scope.

To the best of our knowledge, no other publication has holistically tackled the protection of both trained DNN models and data, targeting distributed untrusted systems.

1.4 Data and Intellectual Property Protection for Deep Neural Networks

In this paper we propose a novel approach for the Intellectual Property protection of DNN-based/enhanced software assets that also enables data protection at processing time, making use of concepts such as Fully Homomorphic Encryption (FHE).

Once trained, the DNN model parameters (i.e. weights, biases) are encrypted homomorphically. The resulting (encrypted) DNN can be distributed across untrusted landscapes, preserving its IP while mitigating the risk of reverse engineering. At runtime, the homomorphically encrypted DNN produces FHE-encrypted insights from encrypted input data. Confidentiality of the trained DNN as well as of the input and output data is therefore guaranteed.

Despite recent improvements of FHE schemes [3, 5] and implementations [17, 25, 29], homomorphic encryption remains computationally expensive. Hence it could represent a bottleneck, with a negative impact on overall performance and on the accuracy of encrypted DNN outputs computed from encrypted inputs. In this paper, we therefore also evaluate the overall performance (e.g. CPU, memory, disk usage) along with the accuracy of encrypted DNNs.

This paper is organized as follows: Sect. 2 details the fundamentals of our approach. Section 3 provides an overview of our solution. In Sects. 4 and 5, we present the architecture and evaluation, concluding with an outlook in Sect. 6.

2 Fundamentals

2.1 Deep Neural Network

Figure 1 depicts a DNN with multiple layers. It is composed of L layers:

  1. An input layer, the tensor of input data \(\mathbf {X}\).

  2. \(L-1\) hidden layers, mathematical computations transforming \(\mathbf {X}\) sequentially.

  3. An output layer, the tensor of output data \(\mathbf {Y}\).

Fig. 1. Deep neural network [14].

We denote the output of layer i as a tensor \(\mathbf {A^{[i]}}\), with \(\mathbf {A^{[0]}}=\mathbf {X}\) and \(\mathbf {A^{[L]}}=\mathbf {Y}\). Tensors can have different sizes and numbers of dimensions.

Each layer \(\mathbf {A^{[i]}}\) depends on the mathematical computations performed at the previous layer \(\mathbf {A^{[i-1]}}\). At each layer \(\mathbf {A^{[i]}}\), two types of functions can be computed:

  • Linear: involving polynomial operations.

  • Non-linear: involving non-linear operations, so-called activation functions, such as max, exp, division, ReLU, or Sigmoid.

Linear Computation Layer. For the sake of clarity, we exemplify the inner linear computation with a Fully Connected (FC) layer, as depicted in Fig. 2.

Fig. 2.
figure 2

Fully Connected layer with activation function [14].

A Fully Connected layer, noted \(\mathbf {A^{[i]}}\), is composed of N parallel neurons, performing an \(\mathbb {R}^M\rightarrow \mathbb {R}^N\) transformation (see Fig. 2). We define:

  • \(\mathbf {a^{[i]}} = \begin{bmatrix} a^{[i]}_0 \ldots a^{[i]}_k \ldots a^{[i]}_N \end{bmatrix}^T\) as the output of layer \(\mathrm {A}^{[i]}\);

  • \(\mathbf {z^{[i]}} = \begin{bmatrix} z^{[i]}_0 \ldots z^{[i]}_k \ldots z^{[i]}_N \end{bmatrix}^T\) as the linear output of layer \(\mathrm {A}^{[i]}\); (\(\mathbf {z^{[i]}}=\mathbf {a^{[i]}}\) if there is no activation function)

  • \(\mathbf {b^{[i]}} = \begin{bmatrix} b^{[i]}_0 \ldots b^{[i]}_k \ldots b^{[i]}_N \end{bmatrix}^T\) as the bias for layer \(\mathrm {A}^{[i]}\);

  • \(\mathbf {W^{[i]}} = \begin{bmatrix} \mathbf {w^{[i]}_0} \ldots \mathbf {w^{[i]}_k} \ldots \mathbf {w^{[i]}_N} \end{bmatrix}^T\) as the weights for layer \(\mathrm {A}^{[i]}\).

Neuron k performs a linear combination of the output of the previous layer \(\mathbf {a^{[i-1]}}\) multiplied by the weight vector \(\mathbf {w^{[i]}_k}\) and shifted with a bias scalar \(b^{[i]}_k\), obtaining the linear combination \(z^{[i]}_k\):

$$\begin{aligned} z^{[i]}_k=\left( \sum _{l=0}^{M}w^{[i]}_k[l]*a^{[i-1]}_l\right) +b^{[i]}_k={\mathbf{w}^{[\mathbf{i}]}_\mathbf{k}}*{\mathbf{a}^{[\mathbf{i}-{\mathbf {1}}]}}+b^{[i]}_k~[14] \end{aligned}$$
(1)

Vectorizing the operations for all the neurons in layer \(A^{[i]}\) we obtain the dense layer transformation:

$$\begin{aligned} \mathbf {z^{[i]}}=\mathbf {W^{[i]}}*\mathbf {a^{[i-1]}}+\mathbf {b^{[i]}}~[14] \end{aligned}$$
(2)

where \(\mathbf {W}\) and \(\mathbf {b}\) are the parameters for layer \(A^{[i]}\).

Activation Functions. Activation functions are the major source of non-linearity in DNNs. They are applied element-wise (\(\mathbb {R}\rightarrow \mathbb {R}\), thus easily vectorized), and are generally located after linear transformations such as Fully Connected layers.

$$\begin{aligned} a^{[i]}_k=f_{act}\left( z^{[i]}_k\right) \end{aligned}$$
(3)

Several activation functions have been proposed in the literature, but the Rectified Linear Unit (ReLU) is currently considered the most efficient activation function for DL. Several variants of ReLU exist, such as Leaky ReLU [23], ELU [7] or its differentiable version Softplus.

$$\begin{aligned} \begin{aligned} ReLU(z)&=z^+=max(0, z) \\ Softplus(z)&= log(e^z + 1) \end{aligned}~[14] \end{aligned}$$
(4)
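To make Eqs. (2)–(4) concrete, the following minimal NumPy sketch evaluates one Fully Connected layer followed by ReLU on plaintext data; the layer sizes and values are illustrative only and are not taken from any model discussed in this paper.

    import numpy as np

    def relu(z):
        # ReLU(z) = max(0, z), applied element-wise (Eq. 4)
        return np.maximum(0.0, z)

    def fc_layer(a_prev, W, b):
        # Dense layer transformation z = W * a_prev + b (Eq. 2)
        return W @ a_prev + b

    rng = np.random.default_rng(0)
    a_prev = rng.normal(size=(4, 1))   # a^[i-1]: previous layer with 4 outputs
    W = rng.normal(size=(3, 4))        # W^[i]: 3 neurons, 4 inputs each
    b = rng.normal(size=(3, 1))        # b^[i]
    z = fc_layer(a_prev, W, b)         # linear output z^[i]
    a = relu(z)                        # activation output a^[i] (Eq. 3)
    print(z.ravel(), a.ravel())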

2.2 Homomorphic Encryption

While preserving data privacy, Homomorphic Encryption (HE) schemes allow certain computations on ciphertexts without revealing either the inputs or the internal states. Gentry [12] first proposed a Fully Homomorphic Encryption (FHE) scheme, which can theoretically compute any kind of arithmetic circuit but is computationally intractable in practice. FHE evolved into more efficient schemes preserving addition and multiplication over encrypted data, such as BGV [3], FV [11] or CKKS [5], allowing approximations of the multiplicative inverse, the exponential and logistic functions, or the discrete Fourier transform. Similar to asymmetric encryption, a public-private key pair (pub, priv) is generated.

Definition 1

An encryption scheme is called homomorphic over an operation \(\odot \) if it supports the following

$$\begin{aligned}\begin{gathered} Enc_{\mathbf {pub}}(m) = \left\langle {m} \right\rangle _{{\varvec{pub}}}, \forall m \in \mathcal {M} \\ \left\langle {m_1\odot m_2} \right\rangle _{{\varvec{pub}}} = \left\langle {m_1} \right\rangle _{{\varvec{pub}}} \odot \left\langle {m_2} \right\rangle _{{\varvec{pub}}}, \forall m_1, m_2 \in \mathcal {M} \end{gathered}\end{aligned}$$

where \(Enc_{\mathbf {pub}}\) is the encryption algorithm and \(\mathcal {M}\) is the set of all possible messages.

Definition 2

Decryption is performed as follows

$$\begin{aligned}\begin{gathered} Enc_{\mathbf {pub}}(m) = \left\langle {m} \right\rangle _{{\varvec{pub}}}, \forall m \in \mathcal {M} \\ Dec_{\mathbf {priv}}(\left\langle {m} \right\rangle _{{\varvec{pub}}}) = m \end{gathered}\end{aligned}$$

where \(Dec_{\mathbf {priv}}\) is the decryption algorithm and \(\mathcal {M}\) is the set of all possible messages.
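As a tangible illustration of Definitions 1 and 2, the following toy Python sketch uses unpadded textbook RSA, which is homomorphic over multiplication only (unlike FHE, which supports both addition and multiplication). It is not semantically secure and the tiny key is chosen purely for readability; it merely shows an operation on ciphertexts matching the corresponding operation on plaintexts (Python 3.8+ for the modular inverse).

    # Toy, insecure textbook RSA: Enc(m) = m^e mod n is homomorphic over multiplication,
    # i.e. Enc(m1) * Enc(m2) mod n = Enc((m1 * m2) mod n).
    p, q = 61, 53                      # toy primes, far too small for real use
    n = p * q                          # modulus, part of pub and priv
    phi = (p - 1) * (q - 1)
    e = 17                             # public exponent: pub = (n, e)
    d = pow(e, -1, phi)                # private exponent: priv = (n, d)

    def enc(m):                        # Enc_pub
        return pow(m, e, n)

    def dec(c):                        # Dec_priv
        return pow(c, d, n)

    m1, m2 = 7, 11
    c_prod = (enc(m1) * enc(m2)) % n   # operate on ciphertexts only
    assert dec(c_prod) == (m1 * m2) % n
    print(dec(c_prod))                 # 77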

2.3 Challenges

Even though HE schemes seem theoretically promising, their usage comes with several drawbacks, particularly when applied to Deep Learning.

Noise Budget. In Gentry’s lattice-based HE scheme [12] and subsequent variants of it, ciphertexts contain a small term of random noise drawn from some probability distribution. Every operation performed on a ciphertext increases the noise of the resulting ciphertext, and it is important to keep the noise below a certain threshold: once the noise reaches that threshold, it is no longer possible to decrypt the ciphertext. To estimate the current magnitude of the noise, a noise budget can be calculated, which starts as a positive integer, decreases with subsequent operations, and reaches 0 exactly when the ciphertext becomes undecryptable. The noise budget is affected more strongly by multiplications than by additions.

In order to cope with that challenge, encryption parameters can be adjusted according to the required computation depth of an arithmetic circuit. In addition, Gentry introduced the so-called bootstrapping procedure, which resets the noise budget of a ciphertext but incurs significant additional computational cost. Recently, in [5], the authors proposed an optimized bootstrapping approach with improved performance.

FHE Libraries and APIs. As summarized in Table 1, multiple FHE libraries are available. Depending on the supported HE schemes, those libraries show noticeable differences in performance (e.g. computation time, memory consumption), supported operation types (e.g. addition, multiplication, negation, square, division), data types (e.g. floating point, integer), and chipset infrastructure (e.g. CPU, GPU).

In addition, and regardless of their level of maturity and performance, HE libraries can be configured through several encryption parameters, such as:

  • Polynomial modulus degree: which determines the available noise budget and strongly affects the performance.

  • Plaintext modulus: which is mostly associated with the size of the input data.

  • Security parameter: which sets the security level of the cryptosystem in bits (e.g. 128, 192, 256-bit security level).

Fine-tuning of these encryption parameters enables developers to optimize the performance of encryption and of encrypted operations. The selection of the right encryption parameters depends on the size of the plaintext data, the targeted accuracy loss and the required level of security.

Table 1. FHE implementation libraries [14].

Linear Function Support Only. By construction, linear functions, composed of addition and multiplication operations, are seamlessly protected by FHE. However, non-linear activation functions such as ReLU or Sigmoid require approximation to be computed with FHE schemes.

The challenge lies in the transformation of activation functions into polynomial approximations supported by HE schemes. We elaborate further on the approximation of activation functions in Sect. 3.2.

Supported Plaintext Type. The vast majority of HE schemes allow operations on integers [17, 29], while others use booleans [6] or floating point numbers [5, 29]. In the case of integer-supporting HE schemes, rational numbers can be approximated using fixed-point arithmetic, by scaling with a scaling factor and rounding.
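The following short NumPy sketch illustrates this fixed-point encoding for the single operation w * a + b; the scaling factor of 2^10 is an arbitrary choice made for the example, not a parameter used in our evaluation.

    import numpy as np

    SCALE = 2 ** 10  # illustrative scaling factor

    def encode(x, scale=SCALE):
        # Map rational numbers to integers before integer-only (homomorphic) arithmetic
        return np.round(np.asarray(x) * scale).astype(np.int64)

    # w * a + b computed purely on integers, as an integer-only HE scheme would
    w, a, b = 0.731, -1.25, 0.5
    z_int = encode(w) * encode(a) + encode(b, SCALE ** 2)  # the product carries SCALE^2
    z = z_int / SCALE ** 2                                 # decode back to a rational
    print(z, w * a + b)  # equal up to the fixed-point rounding error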

Performance. FHE schemes are computationally expensive and memory consuming. In addition, ciphertexts are often significantly bigger than plaintexts and thus use more memory and disk space.

Even if, in past years, the performance of FHE made it impractical, recent FHE schemes show promising throughput. New FHE libraries also take advantage of GPU acceleration.

In addition, modern implementations of HE schemes such as HELib [17], SEAL [29], or PALISADE [25] benefit from Single Instruction Multiple Data (SIMD), allowing multiple integers to be stored in a single ciphertext and vectorizing operations, which can accelerate certain applications significantly.

3 Approach

As introduced in Sect. 1.2, the delivery of DNN-enriched insights comes at a cost. ISVs aim to guarantee data security, together with the IP protection of their DNN-based/enhanced software assets, deployed on potentially insecure edge hardware & platforms. In order to achieve those security objectives on DNNs, we utilize FHE schemes to operate on ciphertexts at runtime.

Consequently, secure training of DNNs is out of scope of our approach, as we focus on runtime execution. We assume that DNN training already preserves data privacy & confidentiality, as well as the resulting trained model. Once a model is trained, as discussed in Sect. 2.1, we obtain a set of parameters for each DNN layer, i.e. weights \(\mathbf {W^{[i]}}\) and biases \(\mathbf {b^{[i]}}\) for Fully Connected layers. DNNs are not solely made of FC layers, and in [14] we identified further types of linear operation parameters within DNNs, such as Batch Normalization [19] or Convolutional layers [20]. Those parameters constitute the IP to be protected when deploying a DNN to distributed systems.

3.1 Linear Computation Layer Protection

Our approach is agnostic to the type of layer. In [14], we detail the encryption of layers such as Convolutional layers or Batch Normalization. For the sake of simplicity, we exemplify the encryption of DNN layer parameters on FC layers. Since FC layers are simply a linear transformation of the previous layer's outputs, encryption is achieved straightforwardly as follows

$$\begin{aligned} \begin{aligned} \left\langle {\mathbf {z^{[i]}}} \right\rangle _\mathbf{pub }&= \left\langle {\mathbf {W^{[i]}}*\mathbf {a^{[i-1]}}+\mathbf {b^{[i]}}} \right\rangle _\mathbf{pub } \\&= \left\langle {\mathbf {W^{[i]}}} \right\rangle _\mathbf{pub }*\left\langle {\mathbf {a^{[i-1]}}} \right\rangle _\mathbf{pub }+\left\langle {\mathbf {b^{[i]}}} \right\rangle _\mathbf{pub } \\ \end{aligned}~[14] \end{aligned}$$
(5)
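A minimal Python sketch of the idea behind Eq. (5): as long as the scheme supports addition and multiplication, the dense-layer code itself does not change when weights, biases and activations are wrapped in ciphertext objects. The Ct class below is a plaintext-backed stand-in for a real FHE ciphertext (it performs no cryptography) and only illustrates the operator flow.

    import numpy as np

    class Ct:
        # Stand-in "ciphertext": wraps a value and exposes only + and *,
        # mimicking the interface an FHE scheme provides.
        def __init__(self, v):
            self.v = v
        def __add__(self, other):
            return Ct(self.v + other.v)
        def __mul__(self, other):
            return Ct(self.v * other.v)

    def enc(x):   # placeholder for Enc_pub
        return Ct(x)

    def dec(ct):  # placeholder for Dec_priv
        return ct.v

    def encrypted_fc(W_enc, a_enc, b_enc):
        # <z^[i]> = <W^[i]> * <a^[i-1]> + <b^[i]>, using only + and * on ciphertexts
        z = []
        for k in range(len(W_enc)):
            acc = W_enc[k][0] * a_enc[0]
            for l in range(1, len(a_enc)):
                acc = acc + W_enc[k][l] * a_enc[l]
            z.append(acc + b_enc[k])
        return z

    W = np.array([[0.2, -0.5], [1.0, 0.3]])
    a = np.array([0.7, -1.1])
    b = np.array([0.1, 0.0])
    W_enc = [[enc(w) for w in row] for row in W]
    a_enc = [enc(x) for x in a]
    b_enc = [enc(x) for x in b]
    z_enc = encrypted_fc(W_enc, a_enc, b_enc)
    print([dec(z) for z in z_enc], (W @ a + b).tolist())  # identical results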

Fully Connected Layer (FC). Also known as Dense Layer, it is composed of N parallel neurons, performing an \(\mathbb {R}^M\rightarrow \mathbb {R}^N\) transformation (see Fig. 2). We define:

  • \(\mathbf {a^{[i]}} = \begin{bmatrix} a^{[i]}_0 \ldots a^{[i]}_k \ldots a^{[i]}_N \end{bmatrix}^T\) as the output of layer i;

  • \(\mathbf {z^{[i]}} = \begin{bmatrix} z^{[i]}_0 \ldots z^{[i]}_k \ldots z^{[i]}_N \end{bmatrix}^T\) as the linear output of layer i; (\(\mathbf {z^{[i]}}=\mathbf {a^{[i]}}\) if there is no activation function)

  • \(\mathbf {b^{[i]}} = \begin{bmatrix} b^{[i]}_0 \ldots b^{[i]}_k \ldots b^{[i]}_N \end{bmatrix}^T\) as the bias of layer i;

  • \(\mathbf {W^{[i]}} = \begin{bmatrix} \mathbf {w^{[i]}_0} \ldots \mathbf {w^{[i]}_k} \ldots \mathbf {w^{[i]}_N} \end{bmatrix}^T\) as the weights of layer i.

Neuron k performs a linear combination of the output of the previous layer \(\mathbf {a^{[i-1]}}\) multiplied by the weight vector \(\mathbf {w^{[i]}_k}\) and shifted with a bias scalar \(b^{[i]}_k\), obtaining the linear combination \(z^{[i]}_k\):

$$\begin{aligned} z^{[i]}_k=\left( \sum _{l=0}^{M}w^{[i]}_k[l]*a^{[i-1]}_l\right) +b^{[i]}_k=\mathbf {w^{[i]}_k}*\mathbf {a^{[i-1]}}+b^{[i]}_k~[14] \end{aligned}$$
(6)

Vectorizing the operations for all the neurons in layer i we obtain the dense layer transformation:

$$\begin{aligned} \mathbf {z^{[i]}}=\mathbf {W^{[i]}}*\mathbf {a^{[i-1]}}+\mathbf {b^{[i]}}~[14] \end{aligned}$$
(7)

Protecting FC Layer. Since FC is a linear layer, it can be directly computed in the encrypted domain using additions and multiplications. Vectorization is achieved straightforwardly:

$$\begin{aligned} \begin{aligned} \left\langle {\mathbf {z^{[i]}}} \right\rangle _\mathbf{pub }&\equiv \left\langle {\mathbf {W^{[i]}}*\mathbf {a^{[i-1]}}+\mathbf {b^{[i]}}} \right\rangle _\mathbf{pub } \\&=\left\langle {\mathbf {W^{[i]}}} \right\rangle _\mathbf{pub }*\left\langle {\mathbf {a^{[i-1]}}} \right\rangle _\mathbf{pub }+\left\langle {\mathbf {b^{[i]}}} \right\rangle _\mathbf{pub } \\ \end{aligned}~[14] \end{aligned}$$
(8)
$$\begin{aligned} \left\langle {\mathbf {a^{[i]}_k}} \right\rangle _\mathbf{pub }\equiv \left\langle {\mathbf {f_{approxact}\left( z^{[i]}_k\right) }} \right\rangle _\mathbf{pub }~[14] \end{aligned}$$
(10)
Fig. 3. Conv layer with activation for map k [14].

Convolutional Layer (Conv). Conv layers constitute a key improvement for image recognition and classification using NNs. The \(\mathbb {R}^{2|3}\rightarrow \mathbb {R}^{2|3}\) linear transformation involved is spatial convolution, where a 2D \(s*s\) filter (a.k.a. kernel) is multiplied element-wise with patches (subsets) of the 2D input image of size \(s*s\), at defined steps (strides), then summed up and shifted by a bias (see Fig. 3). For input data with several channels or maps (e.g. RGB counts as 3 channels), the filter is applied to the same patch of each map and the results are added up into a single value of the output image (cumulative sum across maps). A map in Conv layers is the equivalent of a neuron in FC layers. We define:

  • \(\mathbf {A^{[i]}_k}\) as the map k of layer i;

  • \(\mathbf {Z^{[i]}_k}\) as the linear output of map k of layer i; (\(\mathbf {Z^{[i]}_k}=\mathbf {A^{[i]}_k}\) in absence of activation function)

  • \({b^{[i]}_k}\) as the bias value for map k in layer i;

  • \(\mathbf {W^{[i]}_k}\) as the \(s*s\) filter/kernel for map k.

This operation can be vectorized by smartly replicating data [27]. The linear transformation can be expressed as:

$$\begin{aligned} \mathbf {Z^{[i]}_k}=\left( \sum _{m=0}^{M\; maps}\mathbf {A^{[i-1]}_m}\oplus \mathbf {W^{[i]}}_k\right) +{b^{[i]}_k}~[14] \end{aligned}$$
(11)

Protecting Convolutional Layers. Convolution operation can be decomposed in a series of vectorized sums and multiplications over patches of size \(s*s\):

$$\begin{aligned} \begin{aligned}&\left\langle {\mathbf {Z^{[i]}_k}} \right\rangle _\mathbf{pub }=\left\langle {\left( \sum _{m=0}^{M\; maps}\;\mathbf {A^{[i-1]}_m}\oplus \mathbf {W^{[i]}_k}\right) +b^{[i]}_k } \right\rangle _\mathbf{pub } =\\&\sum _{m=0}^{M\; maps}\left\langle {\mathbf {A^{[i-1]}_m}\oplus \mathbf {W^{[i]}_k}} \right\rangle _\mathbf{pub }+\left\langle {b^{[i]}_k} \right\rangle _\mathbf{pub } =\\&\left\{ \sum _{m=0}^{M\;}\left\langle {\mathbf {A^{[i-1]}_m}[j]} \right\rangle _\mathbf{pub }*\left\langle {\mathbf {W^{[i]}}_k} \right\rangle _\mathbf{pub }\right\} _{[s*s]}+\left\langle {b^{[i]}_k} \right\rangle _\mathbf{pub } \\ \end{aligned}~[14] \end{aligned}$$
(12)
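As a plaintext illustration of the patch decomposition in Eq. (12), the NumPy sketch below computes one output map of a convolution using only element-wise multiplications and sums over \(s*s\) patches (single input map, stride 1, no padding; sizes are illustrative). For several input maps, the patch-wise products are additionally accumulated across maps as in Eq. (11); under FHE, each product and sum is simply replaced by its homomorphic counterpart.

    import numpy as np

    def conv2d_single_map(A_prev, W_k, b_k, stride=1):
        # Spatial convolution for one map: sums of element-wise products over
        # s*s patches, plus a bias -- only additions and multiplications.
        s = W_k.shape[0]
        h = (A_prev.shape[0] - s) // stride + 1
        w = (A_prev.shape[1] - s) // stride + 1
        Z_k = np.zeros((h, w))
        for i in range(h):
            for j in range(w):
                patch = A_prev[i * stride:i * stride + s, j * stride:j * stride + s]
                Z_k[i, j] = np.sum(patch * W_k) + b_k
        return Z_k

    rng = np.random.default_rng(1)
    A_prev = rng.normal(size=(5, 5))   # one input map
    W_k = rng.normal(size=(3, 3))      # 3x3 kernel for map k
    print(conv2d_single_map(A_prev, W_k, b_k=0.1))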
Fig. 4. Max and mean packing for pooling layers [14].

Pooling Layer. This layer reduces the input size by using a packing function. The most commonly used functions are max and mean. Similarly to convolutional layers, pooling layers apply their packing function to patches (subsets) of the image of size \(s*s\), at strides (steps) of a defined number of pixels, as depicted in Fig. 4.

Protecting Pooling Layer. Max can be approximated by the sum of all the values in each patch of size \(s*s\), which is equivalent to scaled mean pooling. Mean pooling can be scaled (sum of values) or standard (multiplying by 1/N). By employing a flattened input, pooling becomes easily vectorized.
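A short NumPy sketch of the FHE-friendly pooling variants described above: sum pooling over non-overlapping \(2\times 2\) patches (equivalent to scaled mean pooling), optionally multiplied by 1/N for standard mean pooling; the input values are illustrative.

    import numpy as np

    def sum_pool_2x2(A):
        # Sum over non-overlapping 2x2 patches: only additions, hence FHE-friendly
        h, w = A.shape[0] // 2, A.shape[1] // 2
        return A[:2 * h, :2 * w].reshape(h, 2, w, 2).sum(axis=(1, 3))

    A = np.arange(16, dtype=float).reshape(4, 4)
    print(sum_pool_2x2(A))        # scaled mean pooling (sum of values)
    print(sum_pool_2x2(A) / 4.0)  # standard mean pooling (multiply by 1/N)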

Other Techniques

  • Batch Normalization (BN): reduces the range of input values by ‘normalizing’ across data batches: subtracting the mean and dividing by the standard deviation. BN also allows finer tuning using the trained parameters \(\beta \) and \(\gamma \) (\(\epsilon \) is a small constant used for numerical stability).

    $$\begin{aligned} a^{[i+1]}_k=BN_{\gamma , \beta }(a^{[i]}_k)=\gamma *\frac{a^{[i]}_k-E[a^{[i]}_k]}{\sqrt{Var[a^{[i]}_k]+\epsilon }}+\beta ~[14] \end{aligned}$$
    (13)

    Protection of BN: is achieved by treating the division as a multiplication by a precomputed inverse (see the sketch following Fig. 5).

    $$\begin{aligned} \begin{aligned} \left\langle {a^{[i+1]}_k} \right\rangle _\mathbf{pub }&=\left\langle {\gamma } \right\rangle _\mathbf{pub }*\left( \left\langle {a^{[i]}_k} \right\rangle _\mathbf{pub }-\left\langle {E[a^{[i]}_k]} \right\rangle _\mathbf{pub }\right) \\ *&\left\langle {\frac{1}{\sqrt{Var[a^{[i]}_k]+\epsilon }}} \right\rangle _\mathbf{pub }+\left\langle {\beta } \right\rangle _\mathbf{pub } \end{aligned}~[14] \end{aligned}$$
    (14)
  • Dropout and Data Augmentation: only affect the training procedure. They do not require protection.

  • Residual Block: is an aggregation of layers where the input is added unaltered at the end of the block, thus allowing the layers to learn incremental (‘residual’) modifications (Fig. 5).

    $$\begin{aligned} \mathbf {A^{[i]}}=\mathbf {A^{[i-1]}}+ResBlock\left( \mathbf {A^{[i-1]}}\right) \end{aligned}$$
    (15)

    Protection of ResBlock: is achieved by protecting the sum and the layers inside ResBlock:

    $$\begin{aligned} \left\langle {\mathbf {A^{[i]}}} \right\rangle _\mathbf{pub }=\left\langle {\mathbf {A^{[i-1]}}} \right\rangle _\mathbf{pub }+\left\langle {ResBlock\left( \mathbf {A^{[i-1]}}\right) } \right\rangle _\mathbf{pub }~[14] \end{aligned}$$
    (16)
Fig. 5. Example of a possible residual block [14].
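A minimal plaintext sketch of the BN protection idea from Eq. (14): since the mean, variance, \(\gamma \), \(\beta \) and \(\epsilon \) are fixed after training, the inverse standard deviation can be precomputed before encryption, so that the runtime evaluation only needs additions and multiplications. NumPy stands in for the homomorphic operations; the values are illustrative.

    import numpy as np

    def prepare_bn_params(gamma, beta, mean, var, eps=1e-5):
        # Precomputed in plaintext before encryption: the division becomes a
        # multiplication by the inverse standard deviation.
        inv_std = 1.0 / np.sqrt(var + eps)
        return gamma, beta, mean, inv_std  # these would then be FHE-encrypted

    def bn_mul_add_only(a, gamma, beta, mean, inv_std):
        # gamma * (a - mean) * inv_std + beta, i.e. only additions and multiplications
        # (subtraction is the addition of a negated, precomputed constant)
        return gamma * (a + (-mean)) * inv_std + beta

    a = np.array([0.2, -1.3, 0.7])
    params = prepare_bn_params(gamma=1.5, beta=0.1, mean=np.mean(a), var=np.var(a))
    print(bn_mul_add_only(a, *params))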

3.2 Activation Function Protection

Due to their innate non-linearity, activation functions need to be approximated with polynomials to be encrypted with FHE. Several approaches have been elaborated in the literature. In [13, 22], the authors proposed to use a square function as activation function; the last layer, a sigmoid activation function, is only applied during training. Chabanne et al. used Taylor polynomials around \(x=0\), studying performance based on the polynomial degree [4]. In [18], Hesamifard et al. instead approximate the derivative of the activation function and then integrate it to obtain their approximation.

Regardless of the approximation technique, we denote the approximation of an activation function \(f_{act}()\) as

$$\begin{aligned} \mathbf {f_{act}()} \approx \mathbf {f_{approxact}()}~[14] \end{aligned}$$
(17)

By construction, we have

$$\begin{aligned} \begin{aligned} \left\langle {\mathbf {a^{[i]}_k}} \right\rangle _\mathbf{pub }&= \left\langle {\mathbf {f_{act}\left( z^{[i]}_k\right) }} \right\rangle _\mathbf{pub }\\&\equiv \left\langle {\mathbf {f_{approxact}\left( z^{[i]}_k\right) }} \right\rangle _\mathbf{pub } \end{aligned}~[14] \end{aligned}$$
(18)
  • Rectifier Linear Unit (ReLU): is currently considered as the most efficient activation function for DL. Several variants have been proposed, such as Leaky ReLU [23], ELU [7] or its differentiable version Softplus.

    $$\begin{aligned} \begin{aligned} ReLU(z)&=z^+=max(0, z) \\ Softplus(z)&= log(e^z + 1) \end{aligned}~[14] \end{aligned}$$
    (19)
  • Sigmoid \(\sigma \). The classical activation function. Its efficiency has been debated in the DL community.

    $$\begin{aligned} Sigmoid(z)= \sigma (z)=\frac{1}{1+e^{-z}}~[14] \end{aligned}$$
    (20)
  • Hyperbolic Tangent (tanh): is currently being used in the industry because it is easier to train than ReLU: it avoids having any inactive neurons and it keeps the sign of the input.

    $$\begin{aligned} tanh(z)= \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}~[14] \end{aligned}$$
    (21)

Protecting Activation Functions. Due to their innate non-linearity, activation functions need to be approximated with polynomials. [13] proposed using only \(\sigma (z)\), approximating it with a square function. [4] used Taylor polynomials around \(x=0\), studying performance based on the polynomial degree. [18] instead approximate the derivative of the function and then integrate it to obtain their approximation. One alternative would be to use Chebyshev polynomials.
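As a simple illustration of such approximations (a least-squares fit, rather than the Taylor- or derivative-based constructions of [4, 18]), the following NumPy sketch fits a degree-2 polynomial to ReLU on an interval and reports the approximation error; the interval and degree are illustrative choices.

    import numpy as np

    z = np.linspace(-4.0, 4.0, 401)   # interval on which layer inputs are expected
    relu = np.maximum(0.0, z)

    # Degree-2 least-squares polynomial approximation of ReLU: only add/mul needed at runtime
    coeffs = np.polyfit(z, relu, deg=2)
    relu_approx = np.polyval(coeffs, z)

    print("coefficients (z^2, z, 1):", np.round(coeffs, 4))
    print("max abs error on the interval:", np.max(np.abs(relu - relu_approx)))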

4 Architecture

In this section we outline the architecture of our IP protection system, as depicted in Fig. 6.

4.1 Encryption of Trained DNN

At backend level, a DNN is trained by the DNN Training Agent. The training outcome (NN architecture and parameters) is pushed to the Trained DNN Protection Agent. Alternatively, an already trained DNN can be imported directly into the Protection Agent. The DNN Protection Agent obtains a Fully Homomorphic key pair from the Key Generator component. The DNN is then encrypted and stored, together with its homomorphic key pair, in the Trained and Protected DNN Database.

Fig. 6. Activity diagram in our solution [14].

4.2 Deployment of Trained and Protected DNN

At the deployment phase, the Trained DNN Deployment Agent deploys the DNN on the distributed systems, together with its public key.

4.3 DNN Processing

On the distributed system, data is collected by a Data Stream Acquisition component and forwarded to the DNN Processing Agent. The input layer does not involve any computation, and can therefore be seamlessly FHE-encrypted as follows

$$\begin{aligned} \mathbf {X} \xrightarrow {encryption} Enc_{\mathbf {pub}} (X) = \left\langle {X} \right\rangle _\mathbf{pub }~[14] \end{aligned}$$
(22)

Encrypted inferences are sent to the Decryption Agent for decryption, using the private key associated with the DNN. FHE encryption propagates across the DNN layers, from the input to the output layer. By construction, the output layer is encrypted homomorphically.

IP of the DNN, together with the computed inferences, is protected from any disclosure on the distributed system throughout the entire process.

The decryption of the last layer’s output \(\mathbf {Y}\) is done with the private encryption key priv, as in standard asymmetric encryption schemes:

$$\begin{aligned} \left\langle {\mathbf {A^{[L]}}} \right\rangle _\mathbf{pub } \xrightarrow {decryption} Dec_{\mathbf {priv}}\left( \left\langle {\mathbf {A^{[L]}}} \right\rangle _\mathbf{pub }\right) =\mathbf {Y}~[14] \end{aligned}$$
(23)

4.4 Sequential Processes

Encryption of Trained NN. Once a Neural Network is trained or imported, we encrypt all its parameters, using the Protected NN DataBase to store it and handle Homomorphic Keys (Fig. 7).

Fig. 7. Sequence diagram of trained NN Encryption [14].

Deploy Trained and Protected NN. The newly trained and protected deep neural network is deployed on the decentralized systems, including:

  1. Network architecture;

  2. Network model: encrypted parameters;

  3. Public encryption key.

Encrypted Inference. On the decentralized system, data is collected and injected into the deployed NN. We must encrypt \(\mathbf {A^{[0]}}=\mathbf {X}\) with the public encryption key associated to the deployed NN (Fig. 8).

Fig. 8. Sequence diagram of inference processing [14].

Inference Decryption. Encrypted inferences are sent to the backend, together with an identifier of the NN used for the inference. The inference is homomorphically decrypted using the corresponding private decryption key (Fig. 9).

Fig. 9. Sequence diagram of inference decryption [14].

5 Evaluation

As detailed in Sect. 2.3, FHE introduces additional computational costs at each step of the DNN life-cycle. In this section, we evaluate the performance overhead, in terms of computation time, memory load and disk usage, for DNN model encryption, encrypted processing and output decryption.

5.1 Hardware Setup

As backend, we use an NVIDIA DGX-1 server equipped with 8 Tesla V100 GPUs. This machine is theoretically not resource-constrained (computation & memory), so we reasonably neglect the performance overhead introduced by FHE for trained DNN model encryption and output decryption, which both run on the backend.

We deploy and execute our encrypted DNN on an NVIDIA Jetson-TX2. Powered by the NVIDIA Pascal architecture, this platform embeds 256 CUDA cores, an HMP CPU combining a dual-core NVIDIA Denver 2 (2 MB L2) and a quad-core ARM Cortex-A57 (2 MB L2), and 8 GB of memory. This platform is closer to the hardware configuration of a Distributed Enterprise System.

5.2 Software Setup

DNN Model. As demonstrated in Sect. 3, our approach is fully agnostic to the NN topology or implementation. For the sake of our evaluation, which involves several modifications to the NN model, we choose a simple CNN classifier, implemented with the Keras library. Two datasets have been used in our experiment: CIFAR10, for image classification, and MNIST, for handwritten digit classification.

As depicted in Fig. 10, we distinguish two main parts in this CNN: a feature extractor and a classifier. The feature extractor reduces the information contained in the input image to a set of high-level, more manageable features. This step facilitates the subsequent classification of the input data.

Composed of four layers, \([\textit{FC} \rightarrow \textit{ReLU} \rightarrow \textit{FC} \rightarrow \textit{Softmax}]\), the classifier categorizes the input data according to the extracted features, and outputs a discrete probability distribution over 10 classes of objects.

Fig. 10. Keras convolutional neural network.

As a reference point, we evaluate key performance figures at model training and processing time without encryption. Once trained, the size of the CNN plaintext model is 9.6 MB. On the Jetson TX2, a single unencrypted image classification is computed on average in 89.1 ms.

FHE Library. As introduced in Sect. 2.3, several libraries are available for FHE. We use the SEAL [29] C++ library from Microsoft Research, running on CPU. This choice is motivated by the library’s performance, support of multiple schemes such as BGV [3], stability, and documentation. Using SEAL, implemented in C++, together with the Keras Python library requires some engineering effort. To combine the fast performance of the native C++ library with rapid prototyping in Python, we use Cython.

We conduct our evaluation with the BGV scheme [3], utilizing the integer encoding with SIMD support. To handle the floating-point DNN parameters, we use fixed-point arithmetic with a fixed scaling factor, similarly to CryptoNets [13]. This has no noticeable impact on the classification accuracy, if a suitable scaling factor is applied. The SIMD operations allow for optimized performance through vectorization.

Fig. 11. Classification accuracy with ReLU approximation - MNIST dataset.

Fig. 12. Classification accuracy with ReLU approximation - CIFAR10 dataset.

5.3 Linearization

We tackle the problem of linearization of the ReLU function with the two following approaches: we approximate it with a modified square function, and we skip the activation function altogether. The modified square function \(x^2+2x\) (see Fig. 13) is derived from the ReLU approximation proposed in [4]. In order to optimize the computation of that function on ciphertexts, we used simpler coefficients.
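A minimal Keras sketch of this substitution, using the tf.keras API; the surrounding layers are placeholders and not the exact classifier used in our experiments, which only replaces the last ReLU activation.

    import tensorflow as tf
    from tensorflow import keras

    def modified_square(x):
        # FHE-friendly ReLU replacement: x^2 + 2x uses only additions and multiplications
        return tf.square(x) + 2.0 * x

    # Placeholder classifier head (the Softmax layer is omitted, see Sect. 5.4)
    classifier = keras.Sequential([
        keras.Input(shape=(128,)),
        keras.layers.Dense(64),
        keras.layers.Activation(modified_square),  # instead of Activation("relu")
        keras.layers.Dense(10),
    ])
    classifier.summary()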

In order to evaluate the impact of these approaches, we trained the CNN on the CIFAR10 and MNIST datasets, replacing the last ReLU activation. We report the accuracy loss in Figs. 11 and 12. Both approximations have only a minor impact on the output classification accuracy.

Skipping the last activation function shows good results on this simple CNN, but we do not generalize this observation to other DNNs or datasets.

Fig. 13. ReLU approximation as square function.

5.4 Experimentation Results

Model & Data Protection. Intellectual Property-wise, we consider the feature extractor to be of minor importance, as CNNs generally use state-of-the-art feature extractors. The IP of the model rather lies in the parameters, weights and biases, of the trained classifier. For that reason, we encrypt the classifier only, as a first step towards full model encryption, as depicted in Fig. 10. To better understand the impact of computation depth, we also complete our evaluation with the encryption of the last FC layer only.

Confidentiality-wise, we evaluate the impact of extracted features encryption by comparing processing performance on an encrypted model with plaintext and encrypted feature extractor outputs.

As depicted in Fig. 10, we evaluate our approach on three modified versions of the model:

  • Last FC Layer Encrypted

  • Full Classifier Encrypted with no Activation Function

  • Full Classifier Encrypted with our Modified Square Activation Function

In order to optimize our approach, we omit the Softmax layer within the classifier. This layer does not have any influence on the classification results, as the Softmax layer is mostly required at training time, to normalize the network outputs into a probability distribution for more consistent loss calculations.
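A one-line check of this argument: Softmax is monotonic, so omitting it at inference time leaves the predicted class unchanged (the logits below are illustrative).

    import numpy as np

    logits = np.array([1.2, -0.3, 3.1, 0.4])           # raw network outputs
    softmax = np.exp(logits) / np.sum(np.exp(logits))  # normalized probabilities
    assert np.argmax(logits) == np.argmax(softmax)     # same predicted class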

The overall experiment, as described in Sect. 4, has been run 5 times on each model. We report average evaluation metrics for each step: model encryption, encrypted processing and decryption.

DNN Model Encryption. Each trained CNN model is encrypted on the DGX-1's CPU. In Table 2, we report the average resource consumption for the following metrics:

  • Time to Compute: Time to encrypt the model.

  • Model Size: Size of resulting encrypted model.

  • Memory Load: Overall memory usage for model encryption.

We target three security levels: 128, 192, and 256 bits. For each of those, we optimize the SEAL parameters as introduced in Sect. 2.3, maximizing performance and minimizing the leftover noise budget. Note that the security level can have a counter-intuitive effect on performance, where for instance the 192-bit security level might be faster than the 128-bit security level. This can be explained by the fact that the 128-bit security level offers more (unnecessary) noise budget, depending on the choice of FHE scheme parameters (e.g. plaintext modulus, polynomial degree). Therefore, we target a remaining noise budget as close as possible to zero.

Compared to the plaintext model size (9.6 MB), the encrypted model size increases by a factor of 8.22 in the best case, and by up to a factor of 1173.33 in the worst case.

Table 2. Model encryption

DNN Processing Encryption. The three encrypted CNN models are deployed on the Jetson-TX2 for CPU-based encrypted processing. At this stage, we evaluate the following metrics:

  • Time to compute: Processing time for an encrypted classification.

  • Memory: Memory usage for encrypted classification.

  • Remaining Noise Budget: at the end of encrypted processing, we evaluate the remaining noise budget, which determines whether additional homomorphic operations could be performed on the output vector.

In Tables 3 and 4, we depict the performance of encrypted processing with plaintext and with encrypted previous-layer outputs, studying the impact of preserving the confidentiality of the preceding layer's outputs. The SEAL library supports computation between a plaintext and a ciphertext, producing a ciphertext. As a consequence, the output of the last MaxPooling2D layer can be fed in plaintext form to the FHE-encrypted Fully Connected layer. Such plaintext-ciphertext computation has a lower impact on performance.

We observe a slight performance improvement in computation time and memory between the 128 and 192-bit security levels. This is due to the FHE parameter optimization described above, where the initial noise budget is oversized for the 128-bit security level, which has a direct impact on performance.

Experiment results show that, depending on the achieved security level and the targeted scenario, we can perform an encrypted classification in, at best, 2.1 s (for the 128-bit security level with only one layer encrypted). In the worst case, with encrypted input and the full classifier encrypted with a modified square function as activation layer, 5627 s (93 min) are required for a single classification.

Table 3. Runtime encryption with plaintext input.
Table 4. Runtime encryption with encrypted input.
Table 5. Decryption - performance.

Decryption. Following our approach, encrypted outputs are decrypted by the backend, on the DGX-1. We therefore consider decryption as not computationally expensive compared to encryption. Results are available in Table 5.

6 Conclusion

In this paper, we discuss and evaluate a holistic approach for the protection of distributed DNN-based/enhanced software assets, i.e. confidentiality of their input & output data streams as well as safeguarding their Intellectual Property. On that matter, we take advantage of Fully Homomorphic Encryption (FHE). We evaluate the feasibility of this solution on a Convolutional Neural Network (CNN) for image classification.

Our evaluation on the NVIDIA DGX-1 and Jetson-TX2 shows promising results for the CNN image classifier. Firstly, the impact of the activation function approximation is negligible, with almost no accuracy loss on the output classification probability. Most of the overhead is introduced at processing time, affecting computation time & memory consumption. Performance varies from 2.1 s for an encrypted classification, with only 53.9 MB of consumed memory, up to 1 h 33 min with almost 5 GB of consumed memory. This requires balancing the expected classification throughput, the targeted security level and the encryption depth of the model. Currently, this approach would be unrealistic for real-time analytics with DNN-based/enhanced software assets. Still, the Industry calls for numerous scenarios – such as predictive maintenance – matching the current performance of our approach.

As future work, we aim to improve the performance of our approach by different means: following the constant evolution of FHE, such as the recent CKKS scheme [5], accelerating FHE libraries on GPU-based infrastructure, or optimizing vectorized operations on FHE-encrypted data [1]. In addition, we foresee a deployment of our solution in a Smart City scenario for risk prevention in public spaces, while expanding our approach to different types of DNNs and to the complete encryption of CNNs, including the feature extraction layers.