1 Introduction

With recent advances in deep learning, cloud service providers expose trained deep neural networks (DNNs) to end-users for prediction-as-a-service (PaaS) through their application programming interfaces (APIs) [4, 5, 7, 25, 56]. For instance, Amazon Forecast enables business analytics by performing forecasting on client time-series data [3], and Azure's Cognitive Services summarize and classify financial documents [6]. However, PaaS applications raise privacy concerns, as both the user data (e.g., client time-series, text, or health data) and the machine learning model (due to intellectual property concerns) can be sensitive information, and cloud service providers must comply with privacy regulations such as CCPA [12], GDPR [20], and HIPAA [30]. Thus, protecting the privacy of the data used in PaaS applications is now more important than ever.

To enable privacy-preserving PaaS, various works propose performing encrypted DNN inference by employing homomorphic encryption (HE) schemes, which allow computations directly on ciphertexts [8, 10, 11, 16, 17, 22, 29, 31, 33, 37, 38, 41, 42, 49]. However, to cope with the computational overhead introduced by HE operations and to account for the characteristics of modern HE schemes, e.g., their support of Single Instruction, Multiple Data (SIMD) operations, these works rely on various optimizations that are tailored to specific PaaS scenarios, the most common of which comprises a cleartext DNN model and encrypted data. As a result, existing HE-based works cannot support emerging scenarios, e.g., edge machine learning [2, 44, 55], that require outsourcing the prediction to the client (while protecting the model's intellectual property), or privacy-preserving federated learning, where inference is performed on a model that is trained in encrypted form by multiple data providers [52, 53, 57]. Moreover, these works rely on data packing schemes adapted to specific DNN architectures and application requirements, aiming either to minimize prediction latency (typically by processing one sample at a time, e.g., for real-time analytics) or to maximize the number of samples processed per second (usually by collecting and then processing in parallel a large number of samples, leveraging SIMD capabilities).

In this work, we design slytHErin, an agile framework for encrypted DNN inference. Built on HE and its multiparty variant, our framework can be adapted to various and novel PaaS scenarios where: (i) the client's data is encrypted while the model is in cleartext, (ii) the client's data is in cleartext and the model is encrypted, and (iii) both the client's data and the model are encrypted. Moreover, slytHErin features application- and model-agnostic optimizations which make it suitable for various settings. For instance, slytHErin implements an intuitive and flexible packing scheme that efficiently enables SIMD operations for arbitrary batch sizes, along with generic optimizations for encrypted matrix operations. We implement slytHErin in Go and provide the building blocks that enable the encrypted execution of any DNN model composed of fully-connected, convolutional, and pooling layers. Contrary to prior works, our implementation is not centered around a specific system model, assumptions, or DNN architectures, making it a versatile tool for securing different PaaS pipelines. Our evaluation shows that slytHErin achieves accuracy similar to performing inference on cleartext data and/or models. Moreover, it offers a flexible trade-off between latency and throughput, and its overall performance is on par with that of state-of-the-art HE-based inference solutions, while being more flexible than these specialized solutions. Our implementation can be found at https://github.com/ldsec/slytHErin.

2 Related Work

Given the potential privacy issues that might arise in PaaS, a number of works that build encrypted PaaS frameworks have been proposed. These works rely on homomorphic encryption (HE) and/or multiparty computation (MPC) to protect the confidentiality of both the ML model and the client’s evaluation data during prediction [8, 10, 11, 15, 16, 22, 29, 31, 33, 37, 38, 41, 42, 46, 48, 49].

HE-Based Solutions. Cryptonets was the first work in this research direction that enabled DNN evaluation on encrypted data using an HE scheme [22]. Its overhead, in terms of latency, was later improved by Brutzkus et al., who proposed novel approaches to represent the input data [11]. Other works focus on improving the efficiency of encrypted matrix operations [32, 39] or on designing novel techniques for the encrypted evaluation of more complex ML models such as graph convolutional networks [47]. The latter have been used in downstream tasks such as human action recognition [34], achieving better latency than [11]. Other works develop compilers that ease the deployment of trained ML models with HE libraries, e.g., SEAL [54], HElib [27], or Palisade [50], for encrypted inference. Boemer et al. [8, 9] build a graph compiler for SEAL that simplifies the use of a model trained with Tensorflow [1] or PyTorch [45] for encrypted PaaS. CHET, on the other hand, is a domain-specific optimizing compiler that allows the specification of tensor circuits suitable for HE-based DNN inference [18]. All of these works propose specific input data representations (packing) and optimizations for either latency or throughput, targeting specific scenarios (e.g., a cleartext model with encrypted data) and DNN architectures. Moreover, to cope with DNN non-linear operations that are not supported by HE schemes, e.g., activations, they either rely on interactions with the client [8], replace them with low-degree polynomial functions [11, 18, 22, 34], or use polynomial approximations [13, 47].

Hybrid Approaches. To ease the encrypted execution of non-linear functions, some works rely on hybrid approaches combining two-party computation with HE [31, 33, 37, 48], or secret sharing with garbled circuits [42, 46, 49]. For instance, Liu et al. [37] utilize HE for matrix multiplications and garbled circuits for the non-linear activations. Juvekar et al. [33] employ HE for matrix-vector multiplications and convolutions, and garbled circuits for the comparisons that are widely used in activation functions. Similarly, we provide a hybrid framework for privacy-preserving PaaS that supports a wide range of applications by relying on a multiparty variant of HE. Moreover, thanks to our generic data representation scheme and optimizations, our framework is agnostic of the DNN architecture and of parameters such as batch size, while achieving on-par performance with the state-of-the-art.

3 Background

3.1 Homomorphic Encryption

Homomorphic encryption (HE) schemes enable the execution of arithmetic operations directly on ciphertexts, i.e., without requiring decryption; this makes them ideal candidates for privacy-preserving machine learning inference applications. In this work, we employ the Cheon-Kim-Kim-Song (CKKS) scheme [14], which is suitable for machine learning tasks as it enables approximate arithmetic over \(\mathbb{C}^{\mathcal{N}/2}\) (hence, over real values as well). The ring \(R_{Q_L}=\mathbb{Z}_{Q_L}[X]/(X^\mathcal{N}+1)\) of dimension \(\mathcal{N}\) with coefficients modulo \(Q_L=\prod_{i=0}^{L}q_i\) defines the plaintext and ciphertext spaces; hence, both plaintexts and ciphertexts are represented by polynomials of degree \(\mathcal{N}-1\) whose coefficients encode a vector of \(\mathcal{N}/2\) values. The security of CKKS is based on the ring learning with errors problem [40]. CKKS supports the homomorphic evaluation of operations such as additions, multiplications, and rotations, and any operation is simultaneously performed on all encoded values, hence offering Single Instruction, Multiple Data (SIMD) parallelism. Non-linear operations, e.g., comparisons, are supported via polynomial approximations, introducing a trade-off between computation overhead and accuracy. CKKS is a leveled HE scheme, i.e., a circuit of multiplicative depth at most L can be evaluated before the ciphertext is exhausted. Then, a costly procedure called bootstrapping [21] is required to refresh the exhausted ciphertext and enable more operations on it. We refer to the traditional bootstrapping operation (performed by a single party) as centralized bootstrapping.

Multiparty Homomorphic Encryption (MHE). To make our framework adaptable to various PaaS scenarios (see Sect. 4), we also rely on a multiparty variant of the CKKS scheme [43]. In the multiparty homomorphic encryption (MHE) scheme, a set of parties (e.g., model-providers) collectively generate a public key while the corresponding secret key is secret-shared among them. This setting enables secure collaboration between \(N\) parties, as the parties use the collective public key to encrypt their inputs and perform joint operations on them using the MHE scheme. Decrypting a result, however, requires the participation of all parties. Hence, this scheme ensures confidentiality under a passive-adversary model with up to \(N-1\) colluding parties. Moreover, the multiparty CKKS scheme offers efficient multiparty computation protocols. For instance, it enables a collective bootstrapping operation, where the costly centralized bootstrapping, which homomorphically evaluates the decryption circuit and consumes many levels, is substituted by a lightweight one-round interactive protocol (\(\textsf{CBootstrap}(\cdot)\)) that does not consume levels. Moreover, the scheme supports \(\textsf{CKeySwitch}(\cdot)\), a collective key-switch operation that changes the encryption key of a ciphertext.

3.2 Deep Neural Networks

Deep neural networks (DNNs) are able to model complex non-linear relationships and find applicability in various domains such as computer vision. A DNN consists of multiple hidden layers between the input and output layers. Our framework enables the encrypted evaluation of DNNs comprising fully connected (FC), convolutional (Conv), and pooling (Pool) layers. We succinctly present the functionalities of these layer types:

  • Fully Connected layer: Given an input vector \(\textbf{x}\), a weight matrix \(\textbf{W}\), and a bias vector \(\textbf{b}\), an FC-layer computes \(\textbf{x}\mathbf{W^T}+\textbf{b}\).

  • Convolutional layer: Given an input tensor (e.g., an image) \(\textbf{X}\) with \(c_i\) channels of dimensions \(w \cdot h\) and a set of \(c_o\) kernels \(\textbf{K}\), each made up of \(c_i\) filters of size \(f_w \cdot f_h\), a Conv-layer computes a tensor \(\textbf{O}\) with \(c_o\) channels. Each channel \(\textbf{O}_i\) is computed as \(\sum_{n=1}^{c_i} \textbf{X}_n * \textbf{K}_{i,n}\), with \(\textbf{X}_n\) the n-th channel of the input image, \(\textbf{K}_{i,n}\) the n-th filter of the i-th kernel, and \(*\) the cross-correlation operator.

  • Pooling layer: It performs dimensionality reduction on the input. The most common types are SumPooling, AveragePooling, and MaxPooling, where the feature map is the sum, average, or maximum of the features in a region of the input, respectively. MaxPooling requires non-linear operations, i.e., comparisons, which are non-trivial to implement under encryption, thus we only consider the first two types.

Each layer can be paired with an activation function which is evaluated on its output. The output of the DNN’s last layer is the prediction result (output).
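
For concreteness, the following minimal Go sketch shows the FC-layer computation \(\textbf{x}\mathbf{W^T}+\textbf{b}\) on cleartext float64 slices (a plain stand-in for the encrypted tensors of Sect. 5; function names are ours, not slytHErin's API). Convolutions and pooling are later reduced to this matrix form (Sect. 5.3):

```go
package main

import "fmt"

// fc computes x·Wᵀ + b for an input vector x of length d,
// a weight matrix W of shape h×d, and a bias vector b of length h.
func fc(x []float64, W [][]float64, b []float64) []float64 {
	out := make([]float64, len(W))
	for i, row := range W {
		s := b[i]
		for j, w := range row {
			s += x[j] * w
		}
		out[i] = s
	}
	return out
}

func main() {
	x := []float64{1, 2, 3}
	W := [][]float64{{1, 0, 0}, {0, 1, 1}} // h=2, d=3
	b := []float64{0.5, -0.5}
	fmt.Println(fc(x, W, b)) // [1.5 4.5]
}
```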

4 slytHErin Overview

Building on the CKKS HE scheme and its multiparty variant (see Sect. 3), we design a framework that is flexible for various encrypted PaaS scenarios (Fig. 1). We first describe the involved entities before detailing slytHErin’s objectives and workflow for each PaaS scenario.

  • Model-provider(s): This entity (one or more) has trained an ML model and exposes it to end-users for queries (PaaS) through a prediction API hosted on a cloud service provider.

  • Client: This entity is a user of the PaaS that inputs its own sensitive data which is evaluated on the model exposed by the model-provider. The client obtains the output of the PaaS process, i.e., the prediction.

Fig. 1. Encrypted PaaS scenarios enabled by slytHErin. Encryption is depicted with a lock whose color matches the corresponding secret key. The black key (rightmost figure) corresponds to the model-providers' collective key. Scenario 1: the client sends its encrypted data to the model-provider, which evaluates it on the plaintext model. Scenario 2: the encrypted model is sent to the client for evaluation on its cleartext data. Scenario 3: the client sends encrypted data to a cohort of model-providers that retain an encrypted model.

We consider that the client and the model-provider are honest-but-curious, i.e., they follow the protocol specification, but they might try to infer information about each other's data. slytHErin's objective is to protect the confidentiality of both the client's and the model-provider's data. In particular, the model-provider should not learn any information about the client's evaluation data or the prediction result, whereas the client should not obtain any knowledge about the model beyond what can be inferred from the PaaS output.

4.1 Scenario 1: Encrypted Client Data - Cleartext Model

This is the traditional HE-based PaaS setting, where a client encrypts its data with its own public key and sends the ciphertext to the model-provider that stores its ML model in plaintext form. The model-provider evaluates its model on the client's encrypted data – without interacting with the client – and returns the encrypted prediction to the client. The client decrypts the ciphertext with its secret key and obtains the prediction result. In this scenario, the client's data confidentiality is ensured as its inputs remain encrypted throughout the DNN evaluation and the model-provider does not learn the prediction result. The model confidentiality is protected as the model remains on the model-provider's side. Scenario 1 represents a typical PaaS setting, where a model-provider exposes a prediction service that receives sensitive data as inputs [11, 16, 22, 33]. For instance, imagine a health-care insurance provider that uses its customer data to train a DNN that predicts the probability of patient re-admission to a hospital. The model is exposed through an API to clients (e.g., hospitals) who wish to obtain predictions about their own cohorts of patients. However, hospitals cannot share their patient data with third parties due to ethical and data privacy requirements; hence, slytHErin could be an enabler for such a service as it ensures data confidentiality.

4.2 Scenario 2: Cleartext Client Data - Encrypted Model

In this scenario, the model-provider outsources the computation of the prediction to the client. However, the model is intellectual property that needs to be protected. Thus, the model-provider encrypts its model with its own public key and sends it to the client in encrypted form. The client evaluates the encrypted model on its own (plaintext) data and obtains an encrypted prediction. Finally, the client sends the prediction ciphertext to the model-provider, which obliviously decrypts the result and communicates it back to the client (Sect. 5.6). The client's data confidentiality is ensured as its evaluation data is never transferred, and the model-provider does not learn the prediction result thanks to the oblivious decryption phase. The model confidentiality is protected as the model is encrypted with the model-provider's public key. Scenario 2 is suitable for applications that require outsourcing a trained model to the client side for predictions. For instance, this could be the case for model trading platforms that offer a try-before-you-buy option, where customers locally test the performance of an ML model on their data before purchasing it. Another relevant application is model outsourcing to edge devices [2, 44], e.g., mobile phones or smartwatches, that monitor their owners' activity and provide feedback to them through predictions, e.g., health recommendations or activity tracking [55]. We note that this is a novel PaaS scenario enabled by slytHErin.

4.3 Scenario 3: Encrypted Client Data - Encrypted Model

In this scenario, we assume that the model-provider is represented by a cohort of N nodes that have collectively trained a DNN on their joint data with a state-of-the-art encrypted collaborative learning framework [52, 53, 57]. For this, we rely on the multiparty variant of homomorphic encryption (MHE). In particular, the nodes (model-providers) generate a collective public key (black key in Fig. 1, Scenario 3) whose corresponding secret key is secret-shared among them (colored keys in Fig. 1, Scenario 3). We assume that the nodes collectively train a DNN on their data and retain it under encryption for PaaS, to mitigate model-targeting attacks and protect its intellectual property. In this scenario, the client encrypts its evaluation data with the collective public key, and a master node from the cohort performs the prediction (with both the model and the data encrypted) with the assistance of the other nodes for the collective interactive operations (e.g., ciphertext refresh – \(\textsf{CBootstrap}(\cdot)\), Sects. 3.1 and 5.6). Finally, the ciphertext storing the prediction result is re-encrypted (via \(\textsf{CKeySwitch}(\cdot)\), Sects. 3.1 and 5.6) under the public key of the client, which decrypts it to obtain the prediction. In this case, both the model and the client data are encrypted with the cohort's collective public key; hence, their confidentiality is ensured as long as at least one of the cohort nodes is honest and does not participate in a collusion. The confidentiality of the prediction output is protected, as only the client can decrypt it. Scenario 3 is suitable for PaaS applications after a model-provider outsources the model training procedure to a cohort of N nodes that leverage distributed learning techniques for improved efficiency, or after a federation of N model-providers, each with their own data, uses a state-of-the-art framework to train a collective ML model under encryption [19, 52, 53]. We note that previous works on encrypted DNN inference do not support (or implement) inference on encrypted models or collaborative functionalities such as bootstrapping or re-encryption.

5 Cryptographic Building Blocks

We describe slytHErin's underlying cryptographic building blocks that make it flexible and efficient for different encrypted PaaS scenarios (Sect. 4) and various DNN architectures. We first introduce the data packing approach adopted to encode/encrypt the input data (Sect. 5.1). Then, we describe the algorithms used to evaluate fully-connected layers (Sect. 5.2) as well as convolutional and pooling layers (Sect. 5.3) under encryption. We also present several optimizations that slytHErin implements (Sect. 5.4) and how non-linear activation functions are evaluated (Sect. 5.5). Finally, we present the multiparty computation protocols which allow slytHErin to support novel PaaS scenarios (Sect. 5.6).

5.1 Input Data Packing

Modern homomorphic encryption schemes can encode (pack) a vector of values into one ciphertext, thus enabling SIMD operations via the parallel computation of a function on all ciphertext slots. Designing an efficient packing scheme is crucial, yet challenging, due to the cost of re-arranging the ciphertext slots via rotations. Prior work on encrypted DNN inference [11, 22, 31, 33, 34, 37, 48] designed efficient packing schemes, but these are tailored to specific system models and assumptions (e.g., the client's availability for the evaluation of certain operations). slytHErin employs a simple yet generic data packing scheme that is agnostic of the encrypted PaaS scenario, flexible in terms of batch size, and yields optimized latency and throughput. Given a batch consisting of n input samples, each with d features, a naive approach is to encrypt/encode each feature of an input sample separately, yielding an inefficient execution due to the high number of ciphertexts/plaintexts. To leverage SIMD operations and enable efficient encrypted inference, we instead flatten the batch and encrypt/encode all values in a single ciphertext/plaintext. For an input sample represented by a tensor of size \(h \times r \times c\) (where, e.g., for an image, h is the number of channels, while r and c represent the size of the pixel matrix of each channel), we encrypt/encode a batch of size n in a tensor of size \(n \times h \times r \times c\) as follows: First, we row-flatten (RowFlatten(\(\cdot\))) each of the n tensors, such that the batch-tensor is transformed into a matrix of size \(n \times d\), with \(d = h \times r \times c\). This is done by iterating through all the channels of the input, row-flattening the corresponding 2D matrix, and horizontally stacking the flattened representations. The \(n \times d\) matrix is then transposed and row-flattened (TensorFlatten(\(\cdot\))), thus yielding a vector of size \(m = d \times n\). Our packing scheme requires that \(m \le s\), where s is the ciphertext capacity (i.e., \(s = \mathcal{N}/2\) for CKKS); when this is not possible, we employ block matrix arithmetic optimizations (see Sect. 5.4).
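
The packing can be summarized with the following plain-Go sketch (a cleartext stand-in; in slytHErin, the resulting slice is what gets encoded/encrypted into the CKKS slots, and the function name is ours). The resulting feature-major layout — feature j of sample k at slot \(j \times n + k\) — is the representation consumed by the matrix multiplication algorithm of Sect. 5.2:

```go
// pack flattens a batch of n samples, each a tensor with h channels of
// r×c entries, into a single slice of length d·n with d = h·r·c:
// each sample is row-flattened, and the resulting n×d matrix is
// transposed and row-flattened again, so that feature j of sample k
// lands at slot j·n+k.
func pack(batch [][][][]float64) []float64 {
	n := len(batch)
	h, r, c := len(batch[0]), len(batch[0][0]), len(batch[0][0][0])
	out := make([]float64, h*r*c*n)
	for k, sample := range batch {
		j := 0 // feature index within a sample
		for ch := 0; ch < h; ch++ {
			for row := 0; row < r; row++ {
				for col := 0; col < c; col++ {
					out[j*n+k] = sample[ch][row][col]
					j++
				}
			}
		}
	}
	return out
}
```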

5.2 Matrix Multiplication

To support the evaluation of fully-connected layers under encryption, slytHErin relies on the following matrix multiplication algorithm. Given two matrices \(\textbf{A}\) and \(\textbf{W}\) (either or both of which may be encrypted), where \(\textbf{A}\) is of size \(n \times d\) and \(\textbf{W}\) of size \(d \times h\), slytHErin implements their multiplication following the diagonal approach of [28]. First, \(\textbf{W}\) is represented by its generalized diagonals [28], where the j-th element of the i-th diagonal is \(d_{i,j} = \textbf{W}_{(i+j) \bmod d, j}\). Additionally, each element \(d_{i,j}\) is replicated n times. The matrix multiplication can then be evaluated as follows:

$$ \textbf{A} \times \textbf{W} = \displaystyle \sum_{i=1}^{d} \mathbf{d_i} \odot \textsf{RotateCyclic}_{n \times i}(\textsf{RowFlatten}(\textbf{A}^T)) $$

where \(\textsf{RotateCyclic}_{k}(\textbf{v})\) denotes a cyclic rotation of the values in \(\textbf{v}\) by k positions to the left and \(\odot\) denotes the Hadamard product. Figure 2 illustrates the multiplication of two \(3 \times 3\) matrices with this algorithm.
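
The following plain-Go sketch mirrors this SIMD circuit on cleartext slices (rotations and Hadamard products stand in for their homomorphic counterparts; names are illustrative, not slytHErin's API). It consumes the packed representation \(\textsf{RowFlatten}(\textbf{A}^T)\) of Sect. 5.1 and returns \(\textsf{RowFlatten}((\textbf{A} \times \textbf{W})^T)\):

```go
// rotl cyclically rotates v by k positions to the left.
func rotl(v []float64, k int) []float64 {
	n := len(v)
	k = ((k % n) + n) % n
	return append(append([]float64{}, v[k:]...), v[:k]...)
}

// diagMul multiplies an n×d matrix A, given in packed form
// (RowFlatten(Aᵀ), length d·n), by a d×h matrix W (h ≤ d) using its
// generalized diagonals: d rotations, Hadamard products, and sums.
func diagMul(packedA []float64, W [][]float64, n int) []float64 {
	d, h := len(W), len(W[0])
	acc := make([]float64, d*n)
	for i := 0; i < d; i++ { // i ranges over the d diagonals (0-based)
		rot := rotl(packedA, n*i) // shift the packing by i features
		for j := 0; j < h; j++ {
			w := W[(i+j)%d][j] // d_{i,j}, replicated n times
			for k := 0; k < n; k++ {
				acc[j*n+k] += w * rot[j*n+k]
			}
		}
	}
	return acc[:h*n] // RowFlatten((A×W)ᵀ)
}
```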

Fig. 2. Multiplication of two matrices \(\textbf{A}\) and \(\textbf{W}\) of size \(3 \times 3\).

5.3 Convolutional and Pooling Layers

To evaluate convolutional layers under encryption, slytHErin represents the convolution operation as a matrix multiplication by expressing the filter as a Toeplitz matrix [23, 26]. For ease of presentation, consider a toy example of a convolution between a single-channel input \(\textbf{I} \in \mathbb{R}^{3 \times 3}\) and a filter \(\textbf{h} \in \mathbb{R}^{2 \times 2}\) operating on the input with unit stride and no padding. We can compute the convolution as \(\textbf{O} = \textsf{TensorFlatten}(\textbf{h} * \textbf{I})^T = \mathbf{h'} \times \mathbf{I'}\), where \(\mathbf{I'} = \textsf{TensorFlatten}(\textbf{I})^T\) and \(\mathbf{h'} = \mathcal{T}(\textbf{h})\) for a function \(\mathcal{T}\) that returns a Toeplitz matrix [23] as follows:

$$\begin{aligned} \mathbf{h'} = \begin{pmatrix} h_{1,1} & h_{1,2} & 0 & h_{2,1} & h_{2,2} & 0 & 0 & 0 & 0\\ 0 & h_{1,1} & h_{1,2} & 0 & h_{2,1} & h_{2,2} & 0 & 0 & 0\\ 0 & 0 & 0 & h_{1,1} & h_{1,2} & 0 & h_{2,1} & h_{2,2} & 0 \\ 0 & 0 & 0 & 0 & h_{1,1} & h_{1,2} & 0 & h_{2,1} & h_{2,2} \end{pmatrix} \end{aligned}$$

Note that computing \(\mathbf{O^T} = \mathbf{I'^T} \times \mathbf{h'^T}\) allows us to utilize the matrix multiplication algorithm and the input data packing scheme of Sects. 5.2 and 5.1, respectively. Moreover, \(\mathbf{O^T}\) is a valid input to any subsequent layer in the DNN architecture, without requiring any re-packing, hence avoiding the cost of slot re-arrangement. slytHErin generalizes this method to convolutional layers with k kernels, each with m filters, and n inputs with m channels. slytHErin also supports SumPooling and AveragePooling layers: these are evaluated by treating them as convolutional layers and employing the method described above.
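
As an illustration, here is a plain-Go sketch of \(\mathcal{T}(\cdot)\) for the single-channel, unit-stride, no-padding case (the multi-kernel, multi-channel generalization stacks such matrices; the function name is ours, not slytHErin's API):

```go
// toeplitz returns T(filter) for a fh×fw filter sliding over an r×c
// single-channel input with unit stride and no padding, so that
// TensorFlatten(filter * I)ᵀ = T(filter) × TensorFlatten(I)ᵀ.
func toeplitz(filter [][]float64, r, c int) [][]float64 {
	fh, fw := len(filter), len(filter[0])
	or, oc := r-fh+1, c-fw+1 // output height and width
	T := make([][]float64, or*oc)
	for i := 0; i < or; i++ {
		for j := 0; j < oc; j++ {
			row := make([]float64, r*c)
			for a := 0; a < fh; a++ {
				for b := 0; b < fw; b++ {
					row[(i+a)*c+(j+b)] = filter[a][b]
				}
			}
			T[i*oc+j] = row
		}
	}
	return T
}
```

For the \(3 \times 3\) input and \(2 \times 2\) filter of the toy example, toeplitz(h, 3, 3) yields exactly the \(4 \times 9\) matrix \(\mathbf{h'}\) above.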

5.4 Optimizations

Complex-Number Trick. To optimize the input data packing scheme (Sect. 5.1), slytHErin employs the complex-number trick [51]: since the CKKS plaintext space is \(\mathbb{C}^{\mathcal{N}/2}\), we can leverage the imaginary part of complex numbers and pack (up to) two values in one plaintext slot. This allows us to evaluate two multiplications and an addition with a single homomorphic multiplication. As a toy example, consider the vectors \(\textbf{a} = (a_1,\dots), \textbf{b} = (b_1,\dots), \textbf{c} = (c_1,\dots),\) and \(\textbf{d} = (d_1,\dots)\). To compute \(\textbf{a} \odot \textbf{c} + \textbf{b} \odot \textbf{d} = (a_1c_1 + b_1d_1, \dots)\), we compress the first two and the last two vectors each into one vector with the following complex representation: \(\textbf{g} = (a_1 + ib_1, \dots)\), \(\textbf{h} = (c_1 - id_1, \dots)\). Then, \(\textbf{g} \odot \textbf{h} = (a_1c_1 + b_1d_1 + ie_1, \dots)\) for some values \(e_j\), and the real part of the result can be extracted with complex conjugation, addition, and constant multiplication. We apply this technique to the input matrix \(\textbf{A}\) and to the weight matrix \(\textbf{W}\). In particular, we embed pairs of adjacent columns of \(\textbf{A}\) into one column, i.e., column k is paired with column \(k+1 \bmod d\), where d is the number of columns; hence, the entry \(\textbf{A}_{(k,j)}\) becomes \(\textbf{A}_{(k,j)}+i\textbf{A}_{(k,j+1)}\). For \(\textbf{W}\), we compress pairs of adjacent diagonals into one, padding with an extra 0-diagonal if the number of diagonals is odd. The newly packed matrix \(\tilde{\textbf{W}}\) has \(\lceil \frac{d}{2} \rceil\) diagonals instead of d, reducing the complexity of the matrix multiplication algorithm by a factor of 2.
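
The slot-level identity is easy to check with Go's native complex arithmetic (a cleartext illustration only; under CKKS, the conjugation used to extract the real part is itself a homomorphic operation):

```go
package main

import "fmt"

func main() {
	// Pack (a,b) and (c,d) as g = a+ib and h = c-id: then
	// Re(g·h) = ac + bd, i.e., two products and a sum for the
	// price of a single multiplication.
	a, b, c, d := 2.0, 3.0, 5.0, 7.0
	g := complex(a, b)
	h := complex(c, -d)
	fmt.Println(real(g * h)) // 31 = 2·5 + 3·7
}
```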

Block Matrix Arithmetic. When the size of the input batch exceeds the ciphertext capacity, slytHErin employs block-matrix arithmetic [52]. The input matrix \(\textbf{A}\) of size \(n \times d\) is represented as a block-matrix \(\mathbf{\bar{A}}\) with \(q \times p\) partitions, i.e., a matrix consisting of blocks (or sub-matrices) of size \(\frac{n}{q} \times \frac{d}{p}\) for some divisors q and p of n and d, respectively. Similarly, the weight matrix \(\textbf{W}\) of size \(d \times h\) is partitioned to enable the multiplication \(\mathbf{\bar{O}} = \mathbf{\bar{A}} \times \mathbf{\bar{W}}\) under two constraints: (i) \(\mathbf{\bar{W}}\) must have p row partitions, and (ii) every inner block \(\textbf{W}_{k,j}\) must be compatible for matrix multiplication with the inner blocks \(\textbf{A}_{i,k}\). \(\mathbf{\bar{O}}\) is then a block-matrix of size \(n \times h\) with q row partitions and m column partitions, where m is the number of column partitions of \(\mathbf{\bar{W}}\). Each block \(\textbf{O}_{i,j}\) is computed as \(\textbf{O}_{i,j} = \sum_{k=1}^{p} \textbf{A}_{i,k}\textbf{W}_{k,j}\).

Hence, by choosing suitable partitions, each inner block is small enough to be encrypted/encoded independently, following the input data packing and the generalized-diagonals approach described earlier (Sects. 5.1 and 5.2). Figure 3 illustrates the encryption of a matrix \(\textbf{A}\) with a \(2 \times 2\) partitioning. The multiplication between two large matrices is then evaluated as a series of sums and multiplications between these smaller blocks. Given a model to evaluate (i.e., the dimensions of its layers), the number of input features, and a set of CKKS parameters, slytHErin follows a heuristic-based approach to automatically find the best batch size and partition strategy. In more detail, slytHErin explores the space of possible splits, starting from divisors of the number of samples (if provided by the user) or divisors of the feature dimension, and picks the split sequence and batch size that minimize the overall complexity of the pipeline in terms of homomorphic operations (i.e., it minimizes the number of homomorphic multiplications required to evaluate the model), thus optimizing throughput. Alternatively, the user can declare a custom batch size that overrides the optimized one, letting slytHErin operate with a (possibly sub-optimal) block-matrix representation. An advantage of the block-matrix approach is that it is amenable to parallelization: given \(q \times p \times m\) threads, each block multiplication \(\textbf{A}_{i,k} \times \textbf{W}_{k,j}\) can be delegated to its own thread, while \(q \times m\) threads combine the individual results. Moreover, for a given set of cryptographic parameters and the corresponding evaluation keys, the client does not need to regenerate the keys for the evaluation of arbitrarily sized matrices, which is a computationally intensive task.
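
A minimal sketch of the block-level multiplication follows, with plain matrices standing in for the encrypted/encoded blocks and mul/add standing in for the homomorphic matrix operations of Sects. 5.1–5.2 (all names are ours). Each inner mul is independent of the others, which is what enables the thread-level parallelization described above:

```go
type Block = [][]float64 // stand-in for one encrypted/encoded sub-matrix

func mul(A, B Block) Block { // stand-in for the encrypted matmul (Sect. 5.2)
	out := make(Block, len(A))
	for i := range A {
		out[i] = make([]float64, len(B[0]))
		for k := range B {
			for j := range B[0] {
				out[i][j] += A[i][k] * B[k][j]
			}
		}
	}
	return out
}

func add(A, B Block) Block { // stand-in for homomorphic addition
	out := make(Block, len(A))
	for i := range A {
		out[i] = make([]float64, len(A[i]))
		for j := range A[i] {
			out[i][j] = A[i][j] + B[i][j]
		}
	}
	return out
}

// blockMul computes O[i][j] = Σ_k A[i][k] × W[k][j] for a q×p block
// matrix A and a p×m block matrix W.
func blockMul(A, W [][]Block) [][]Block {
	q, p, m := len(A), len(W), len(W[0])
	O := make([][]Block, q)
	for i := 0; i < q; i++ {
		O[i] = make([]Block, m)
		for j := 0; j < m; j++ {
			acc := mul(A[i][0], W[0][j])
			for k := 1; k < p; k++ {
				acc = add(acc, mul(A[i][k], W[k][j]))
			}
			O[i][j] = acc
		}
	}
	return O
}
```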

Fig. 3. Partitioning of an input matrix \(\textbf{A}\) in a \(2 \times 2\) block matrix.

5.5 Non-Linear Operations

As non-polynomial functions, e.g., comparisons, are not computable under HE, some works replace common activation functions (e.g., \(\textsf{ReLU}\)) with simple polynomial functions [22] (e.g., \(x^{2}\)), while others use polynomial approximations [53]. slytHErin employs the second approach and relies on Chebyshev interpolants, which can approximate any Lipschitz-continuous function on any finite real interval.
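
A plain-Go sketch of the interpolation step follows. The interval \([-8, 8]\) and the use of \(\textsf{SiLU}\) are illustrative assumptions (slytHErin's actual intervals and degrees depend on the model, cf. Sect. 6.1), and under HE only the evaluation of the resulting polynomial is performed on ciphertexts:

```go
package main

import (
	"fmt"
	"math"
)

// chebCoeffs returns the m coefficients of the Chebyshev interpolant
// of f on [a,b], sampled at the m Chebyshev nodes.
func chebCoeffs(f func(float64) float64, a, b float64, m int) []float64 {
	c := make([]float64, m)
	for j := range c {
		for k := 0; k < m; k++ {
			theta := math.Pi * (float64(k) + 0.5) / float64(m)
			x := 0.5*(b-a)*math.Cos(theta) + 0.5*(b+a) // node mapped to [a,b]
			c[j] += 2 / float64(m) * f(x) * math.Cos(float64(j)*theta)
		}
	}
	return c
}

// chebEval evaluates c_0/2 + Σ_{j≥1} c_j·T_j(t) at x, with x rescaled
// to t ∈ [-1,1] and T_j computed via the Chebyshev recurrence.
func chebEval(c []float64, a, b, x float64) float64 {
	t := (2*x - a - b) / (b - a)
	s, tPrev, tCur := 0.5*c[0], 1.0, t
	for j := 1; j < len(c); j++ {
		s += c[j] * tCur
		tPrev, tCur = tCur, 2*t*tCur-tPrev // T_{j+1} = 2t·T_j - T_{j-1}
	}
	return s
}

func main() {
	silu := func(x float64) float64 { return x / (1 + math.Exp(-x)) }
	c := chebCoeffs(silu, -8, 8, 64) // degree-63 interpolant, as for NN20
	fmt.Println(silu(1.5), chebEval(c, -8, 8, 1.5)) // nearly identical
}
```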

5.6 Multiparty Computation Protocols

Recall that slytHErin relies on CKKS and its multiparty variant (MHE), which enables interactive functionalities such as \(\textsf{CBootstrap}(\cdot)\) for collective bootstrapping and \(\textsf{CKeySwitch}(\cdot)\) for collective key-switching; the latter changes the encryption key of a ciphertext. In Scenario 3, the model-providers rely on these functionalities to refresh the ciphertext noise and to change the encryption key of the prediction result, so that only the client can decrypt it.

We also design and implement an oblivious decryption protocol \(\textsf{ObvDec}(\cdot)\) for Scenario 2 (Sect. 4.2). In this protocol, the client masks its prediction result (encrypted under the model-provider's secret key) with an encryption of 0 under an ephemeral secret key, and sends the result to the model-provider, which removes one layer of encryption (by invoking the decryption procedure of CKKS) without learning the underlying plaintext. The result is finally sent back to the client, which unmasks it.

6 Experimental Evaluation

6.1 Implementation and Experimental Setup

We implemented slytHErin in Go [24], using Lattigo [35] as the cryptographic library. Our implementation is modular, reusable, and easy to adapt to several PaaS applications. Detailed documentation can be found along with our source code at https://github.com/ldsec/slytHErin. We evaluate slytHErin using the following DNN architectures:

  • NN5: A 5-layer convolutional neural network described in [22], for which we replace the square activation function with a degree-2 Chebyshev approximation of \(\textsf{Softplus}\).

  • NN20: A 20-layer DNN composed of convolutional and fully connected layers described in [16] (\(\sim\)754K model parameters), for which we replace the activation functions with a degree-63 approximation of \(\textsf{SiLU}\) and train it with the MSE loss function.

  • NN50: Similar to NN20 but comprising 50 layers (\(\sim \)1M model parameters [16]).

We use the MNIST dataset [36] for encrypted image classification, as it is the de-facto benchmark dataset used in prior work on privacy-preserving inference tasks [8, 9, 11, 16, 17, 22, 33]. All models were trained from scratch, achieving accuracy similar to the original works, with minimal accuracy loss in encrypted inference: none for NN5, \(\sim\)0.13\(\%\) for NN20, and \(\sim\)2\(\%\) for NN50. The CKKS parameters are configured to achieve 128-bit security. For the multiparty interactive protocols, we deploy slytHErin on a local cluster with an average network delay of 20 ms and 1 Gbps bandwidth. All experiments were executed on machines running Ubuntu 22.04, with 12-core Intel Xeon E5-2680 2.5 GHz CPUs and 256 GB of DDR4 RAM. The results are averaged over 3–5 runs.

6.2 Empirical Results

We first demonstrate how slytHErin supports different batch sizes by evaluating NN5 in Scenario 1 (Sect. 6.2.1). We also compare slytHErin with prior work on private PaaS, as NN5 is the predominantly used benchmark. Then, in Sect. 6.2.2, we evaluate NN20 in Scenario 3 to discuss how slytHErin scales with the number of model-providers. Finally, we demonstrate slytHErin's application and model agility by evaluating the more complex model NN50 in all the scenarios of Sect. 4 (Sect. 6.2.3).

6.2.1 Elastic Data Packing

We demonstrate the benefits of our packing approach (Sect. 5.1) by benchmarking NN5 [22] in the traditional PaaS setting (Scenario 1) for various batch sizes. For this experiment, slytHErin heuristically estimates the throughput-optimal batch size at 83, as described in Sect. 5.4; this is experimentally confirmed by Figs. 4a and 4b. In particular, Fig. 4a shows slytHErin's latency for batch sizes up to 4,096 in semi-log scale. We observe a linear increase in latency after the optimal size. This is expected, as slytHErin automatically splits batches larger than the optimal size into sub-batches of optimal size and processes them sequentially. Figure 4b shows the amortized runtime of slytHErin for variable batch sizes: we observe that a batch of size 83 is indeed the optimal point that minimizes the amortized runtime (i.e., maximizes the throughput). Finally, we compare slytHErin's performance with related works that evaluate NN5 in the same application scenario with polynomial activation functions (thus, we exclude Gazelle [33], which relies on garbled circuits). Table 1 shows that slytHErin's performance is on par with or better than previous works, while providing enhanced flexibility in terms of batch size. The approach followed by CryptoNets and works inspired by it [8, 17, 22] achieves good throughput by processing large batches of data items (up to \(\mathcal{N}/2\)), but their runtime is independent of the batch size (hence, it does not decrease for smaller batches, as per Table 1). Conversely, the approach followed by LoLa [11] achieves low latency for a single sample, but cannot amortize the runtime when processing multiple samples. With slytHErin, the end-user can define a custom batch size without a major impact on performance.

Fig. 4. slytHErin's amortized runtime with different batch sizes.

Table 1. Latency comparison between slytHErin and prior encrypted frameworks for the evaluation of NN5 and various batch sizes.
Table 2. slytHErin’s performance for NN20 on Scenario 3 (Sect. 4.3) with increasing number of parties (model-providers).
Fig. 5. Benchmarking decentralized vs. centralized bootstrapping on encrypted NN20 for a variable number of parties (model-providers).

6.2.2 Interactive MPC Protocols

We evaluate NN20 in Scenario 3, where the model is trained and retained under encryption by multiple parties using a privacy-preserving collaborative training framework [19, 52, 53] (Sect. 4.3). Note that this scenario requires the use of the collective bootstrapping protocol \(\textsf{CBootstrap}(\cdot)\); thus, it is not supported by prior encrypted inference frameworks. Table 2 shows slytHErin's latency and throughput for an increasing number of parties, while in Fig. 5 we compare slytHErin's amortized runtime when employing \(\textsf{CBootstrap}(\cdot)\) versus the centralized bootstrapping. Overall, we observe a linear increase (resp. decrease) in slytHErin's latency (resp. throughput) as the number of parties increases. We note that the \(\textsf{CBootstrap}(\cdot)\) operation is executed in an asynchronous fashion by the master model-provider, i.e., the protocol is initiated concurrently with all the model-providers and the output is generated as soon as the last party provides its share. For this reason, we can even experience lower latency when increasing the number of parties by a limited amount (3 vs. 5), as the protocol becomes particularly sensitive to the network conditions. In any case, the benefits of employing \(\textsf{CBootstrap}(\cdot)\) over the centralized version (when possible) are evident, as the former refreshes the ciphertext noise with an efficient interactive protocol rather than with a computationally expensive homomorphic circuit.

6.2.3 Application and Model Agility

Finally, we demonstrate the high degree of flexibility offered by slytHErin, both in terms of the variety of enabled use-cases and of supported architectures, by evaluating a more complex model in all the scenarios described in Sect. 4. In particular, we benchmark slytHErin with NN50 and a batch of 585 samples on: (i) Scenario 1, with encrypted data and a plaintext model (Sect. 4.1), (ii) Scenario 2, with an encrypted model and plaintext data (Sect. 4.2), and (iii) Scenario 3, with encrypted data and an encrypted model kept by \(N{=}3\) model-providers (Sect. 4.3). Note that evaluating NN50 in Scenarios 1 and 2 requires the invocation of the centralized bootstrapping operation, which is not supported by most of the related works [8, 11, 22, 33].

Table 3. slytHErin’s performance for NN50 in Scenarios 1, 2, and 3 (Sect. 4). For Scenario 3, the number of model-providers is \(N{=}3\).

Table 3 shows the performance results for all scenarios. First, we note that by leveraging our data packing approach and processing multiple samples in a SIMD fashion, slytHErin achieves reasonable runtime given the complexity of the NN50 model (Scenario 1). For reference, the original work by Chillotti et al. [16] achieves, at best, an amortized time of 37.69 s/sample and a throughput of 0.02 samples/s. We also observe that slytHErin's generic optimizations enable the efficient evaluation of encrypted models: evaluating NN50 under encryption in Scenario 2, which involves matrix multiplication, addition, polynomial activation, and centralized bootstrapping operations, is only \(\sim\)7\(\%\) slower than evaluating a plaintext model (cf. Scenario 1). slytHErin achieves its best performance in Scenario 3, thanks to its support for interactive multiparty protocols such as collective bootstrapping (Sect. 6.2.2). Overall, we remark that slytHErin is the first framework for encrypted inference that supports all of these application scenarios.

7 Conclusion

In this work, we presented slytHErin, an agile framework for privacy-preserving deep neural network inference using homomorphic encryption. Thanks to our hybrid design, which leverages HE and its multiparty variant, and to generic setting-agnostic optimizations, slytHErin supports various and novel scenarios for encrypted inference featuring untrusted model-providers and clients. These scenarios include: (i) the client sending encrypted data to an untrusted model-provider for inference, (ii) the model-provider sending an encrypted model to a client for local inference (without the need of mutual trust between them), and (iii) the client sending encrypted data to a cohort of model-providers holding an encrypted model. Thus, slytHErin extends the applicability of privacy-preserving PaaS beyond previous works. Moreover, with our intuitive and flexible input data packing scheme, slytHErin can be adapted to various deep neural network architectures and can accommodate diverse application requirements, processing an arbitrary number of samples without incurring major performance loss. Our experimental results show that the simplicity of our packing approach and the agility of our framework do not harm its performance: slytHErin is on par with, and occasionally better than, state-of-the-art related works, while offering an increased degree of flexibility.