1 Introduction

Homomorphic encryption (HE) is a class of encryption schemes that enables computation over encrypted data. The Cheon-Kim-Kim-Song (CKKS) scheme [12] is one of the most prominent fully homomorphic encryption (FHE) schemes, as it supports efficient computation on real (or complex) numbers, the usual data type for many applications such as deep learning. Since the other HE schemes are designed for different message domains, the CKKS scheme is known to be the most efficient for real numbers. For example, the Brakerski-Fan-Vercauteren (BFV) [5, 6, 17] and Brakerski-Gentry-Vaikuntanathan (BGV) [4] schemes are designed for integer messages in \(\mathbb {Z}_q\), and FHEW/TFHE [13,14,15] are designed for binary circuits.

Gentry’s blueprint of bootstrapping provides the idea of homomorphic re-encryption of a ciphertext. In CKKS bootstrapping, the modular reduction by an integer is performed homomorphically. However, the modular reduction function cannot be represented by additions and multiplications of complex numbers. Hence, approximate polynomials of trigonometric functions are used in prior works [3, 7, 9, 18, 24], which have two limitations in practice: i) they are indirect approximations, which require larger multiplicative depths, and ii) the measure of approximation error is minimax-based (minimizing the upper bound of the approximation error). This paper shows that the minimax polynomial does not guarantee the minimax bootstrapping error. We propose the error variance as a better measure than the minimax error, especially for bootstrapping.

The CKKS scheme provides a trade-off between efficiency and message precision, as its encrypted data inherently contains noise. Errors in encrypted data propagate and accumulate through homomorphic operations. Hence, the error should be carefully measured when we design a circuit for efficiency and security in CKKS. Moreover, as attacks against CKKS have recently been proposed [11, 26, 27], reducing the errors of the CKKS scheme becomes more crucial to mitigate the risk of these attacks.

1.1 Our Contributions

This paper contains contributions to the high-precision bootstrapping of the CKKS scheme. We propose i) a method to find the optimal approximate polynomial for the modular reduction in bootstrapping and ii) an efficient algorithm for homomorphic evaluation of polynomials.

First, we propose the optimal approximate polynomial for CKKS bootstrapping in terms of signal-to-noise ratio (SNR), which improves the precision of CKKS bootstrapping. As a result, we can reserve more levels after bootstrapping while achieving the best-known precision, where the level of a ciphertext is defined as the number of successive multiplications that can be performed on the ciphertext without bootstrapping. The proposed approximate polynomial has the following three features: i) an optimal measure of error for CKKS bootstrapping: we show that an approximate polynomial that achieves the least error variance is also optimal for CKKS bootstrapping in terms of SNR; ii) a direct approximation: the approximate polynomial of modular reduction is obtained directly from the whole polynomial space of degree n, i.e., \(P_n\), using the error variance-minimizing method, and thus it consumes less multiplicative depth than the previous methods (in other words, fewer bootstrappings are required for the same circuit); iii) reduction of error from noisy calculation: in polynomial evaluation over CKKS, each polynomial basis element has an error. Unlike previous bootstrapping methods, the proposed method minimizes the errors introduced by the noisy basis as well as the approximation error.

Second, we propose a novel variant of the baby-step giant-step (BSGS) algorithm, called the lazy-BSGS algorithm, which reduces the number of relinearizations by half compared to ordinary BSGS algorithms. The proposed lazy-BSGS algorithm is more efficient for higher-degree polynomials. The proposed approximate polynomial has a high degree, while the previous methods use a composition of small-degree polynomials. Thus, the lazy-BSGS algorithm makes the evaluation time of the proposed polynomial comparable to that of the previous methods.

Note for the First Contribution. Previous methods utilized the minimax approximate polynomial of the modular reduction function to reduce the bootstrapping error [18, 24]. However, in CKKS bootstrapping, a linear transformation on slot values, called \(\textsc {SlotToCoeff}\), is performed, and each entry of its resulting ciphertext is the sum of thousands of noisy values. Since many noisy values are added, the upper bound on the final error value is loose. Hence, we propose to minimize the error variance instead of the upper bound on the error.

Besides the approximation error, each polynomial basis element also has an error in CKKS, and it is amplified when multiplied by large coefficients of the approximate polynomial. The previous approximation methods could not control these errors through the approximate polynomial coefficients. Thus, they used the trigonometric function and the double-angle formula instead to keep the approximation degree small [3, 18, 24]. This indirect approximation results in a larger multiplicative depth. It is preferable to reserve more levels after bootstrapping, as this reduces the number of bootstrappings in the whole system; moreover, the number of remaining levels after bootstrapping is also important for an efficient circuit design of algorithms using CKKS. For example, in [23], the depth of the activation layers is optimized for the levels remaining after bootstrapping. The proposed method minimizes the basis error variance as well as the approximation error variance, so it consumes less multiplicative depth than the previous compositions of trigonometric functions. To the best of our knowledge, this is the first method to find the optimal approximate polynomial that minimizes both the approximation error and the error in the basis at the same time.

We show that, under the learning with errors (LWE) assumption, the input of the approximate polynomial follows a distribution similar to the Irwin-Hall distribution, and exploiting this fact raises no security concern. The proposed method exploits this property to improve the approximation accuracy. Also, we derive an analytical solution for our error variance-minimizing approximate polynomial, while the previous minimax approximate polynomials were obtained by iterative algorithms [18, 24].

Note for the Second Contribution. As rescaling and relinearization introduce additional errors, it is desirable to perform them as late as possible. In addition, the number of rescalings/relinearizations can be reduced by reordering the operations to delay the relinearization/rescaling. This technique, the so-called lazy rescaling/relinearization technique, has been applied to reduce computational complexity in [1, 8, 22]. We provide a rigorous analysis of lazy rescaling and relinearization in the BSGS algorithm. Moreover, we propose an algorithm to find the optimal approximate polynomial that fits the lazy-BSGS algorithm for odd functions.

1.2 Related Works

Bootstrapping of the CKKS Scheme. Since CKKS bootstrapping was first proposed in [9], the Chebyshev interpolation has been applied to the homomorphic evaluation of modular reduction [7, 18]. Then, techniques for direct approximation were proposed using the least squares method [25] and Lagrange interpolation [19]. However, the magnitudes of the coefficients of those approximate polynomials are too large. An algorithm to find the minimax approximate polynomial using the improved multi-interval Remez algorithm and the use of \(\arcsin \) to reduce the approximation error of the modular reduction were presented in [24]. Bootstrapping for the non-sparse-key CKKS scheme was proposed, and the computation time for homomorphic linear transformations was significantly improved by double hoisting, in [3]. Jutla and Manohar proposed to use a sine series approximation [20], but as there exists a linear transformation from the sine series \(\left\{ \sin (kx) \right\} \) to the powers of the sine function \(\left\{ \sin (x)^k \right\} \), this method is also based on trigonometric functions.

Attacks on the CKKS Scheme and High-Precision Bootstrapping. An attack that recovers the secret key using the error pattern after decryption was recently proposed by Li and Micciancio [26], and thus it becomes more crucial to reduce the error in CKKS. One possible countermeasure is to add a huge error, the so-called noise flooding technique [16], or to perform rounding after decryption to make the plaintext error-free [26]. In order to use the noise flooding technique, the CKKS scheme requires much higher precision, and the bootstrapping error is the bottleneck of precision. Although how to exploit the bootstrapping error for cryptanalysis of CKKS requires further study, high-precision bootstrapping remains an interesting topic [3, 20, 24, 25].

1.3 Organization of This Paper

The remainder of the paper is organized as follows. In Sect. 2, we provide the necessary notation and the SNR perspective on error. The CKKS scheme and its bootstrapping algorithm are summarized in Sect. 3. We provide a new method to find the optimal direct approximate polynomials for the CKKS scheme and show its optimality in Sect. 4. Section 5 provides the novel lazy-BSGS algorithm for the efficient evaluation of the approximate polynomial for CKKS bootstrapping. The implementation results and comparisons of precision and timing performance for CKKS bootstrapping are given in Sect. 6. Finally, we conclude the paper in Sect. 7.

2 Preliminaries

2.1 Basic Notation

Vectors are denoted in boldface, such as \(\boldsymbol{v}\), and all vectors are column vectors. Matrices are denoted by boldfaced capital letters, i.e., \(\mathbf {M}\). We denote the inner product of two vectors by \(\langle \cdot , \cdot \rangle \) or simply \(\cdot \). \(\left\lfloor \cdot \right\rceil \), \(\left\lfloor \cdot \right\rfloor \), and \(\left\lceil \cdot \right\rceil \) denote the rounding, floor, and ceiling functions, respectively. \([m]_q\) is the modular reduction, i.e., the remainder of m divided by q. \(x\leftarrow \mathcal {D}\) denotes sampling x according to a distribution \(\mathcal {D}\); when a set is used instead of a distribution, x is sampled uniformly at random from the set. Random variables are denoted by capital letters such as X. E[X] and Var[X] denote the mean and variance of a random variable X, respectively. For a function f, Var[f(X)] may simply be denoted by Var[f]. \(\left\| \boldsymbol{a} \right\| _2\) and \(\left\| \boldsymbol{a} \right\| _{\infty }\) denote the L2 norm and the infinity norm, respectively; when the input is a polynomial, they denote the norms of its coefficient vector. We denote the supremum norm of a function by \(\left\| f \right\| _{\textsf {sup}}:=\sup _{t\in \mathcal {D}}|f(t)|\) for a given domain \(\mathcal {D}\).

Let \(\varPhi _M(X)\) be the M-th cyclotomic polynomial of degree N; when M is a power of two, \(M = 2N\) and \(\varPhi _M(X) = X^N+1\). Let \(\mathcal {R}=\mathbb {Z}[X]/\langle \varPhi _M(X)\rangle \) be the ring of integers of the number field \(\mathcal {S}=\mathbb {Q}[X]/\langle \varPhi _M(X)\rangle ,\) where \(\mathbb {Q}\) is the set of rational numbers, and we write \(\mathcal {R}_q = \mathcal {R}/q\mathcal {R}\). A polynomial \(a(X) \in \mathcal {R}\) may be denoted by a, omitting X, when it is obvious. Since the multiplicative depth of a circuit is crucial in CKKS, from here on, the multiplicative depth is referred to as depth.

2.2 The CKKS Scheme

The CKKS scheme and its residual number system (RNS) variants [10, 18] provide operations on encrypted complex numbers, which are enabled by the canonical embedding and its inverse. Recall that the canonical embedding \(\textsf {Emb}\) of \(a(X)\in \mathbb {Q}[X]/\langle \varPhi _M(X)\rangle \) into \(\mathbb {C}^N\) is the vector of the evaluation values of a at the roots of \(\varPhi _M(X)\), and \(\textsf {Emb}^{-1}\) denotes its inverse. Let \(\pi \) denote the natural projection from \(\mathbb {H}=\{(z_j)_{j\in \mathbb {Z}_M^*}:z_j=\overline{z_{-j}}\}\) to \(\mathbb {C}^{N/2}\), where \(\mathbb {Z}_M^*\) is the multiplicative group of integers modulo M. The encoding and decoding are defined as follows.

  • Ecd \((\boldsymbol{z}; \varDelta )\): For an (N/2)-dimensional vector \(\boldsymbol{z}\), the encoding returns

    $$ m(X)=\textsf {Emb}^{-1}\left( \left\lfloor \varDelta \cdot \pi ^{-1}(\boldsymbol{z}) \right\rceil _{\textsf {Emb}(\mathcal {R})} \right) \in \mathcal {R}, $$

    where \(\varDelta \) is the scaling factor and \(\left\lfloor \cdot \right\rceil _{\textsf {Emb}(\mathcal {R})}\) denotes the discretization into an element of \(\textsf {Emb}(\mathcal {R})\).

  • Dcd\((m; \varDelta )\): For an input polynomial \(m(X)\in \mathcal {R}\), output a vector

    $$\boldsymbol{z} = \pi (\varDelta ^{-1} \cdot \textsf {Emb}(m)) \in \mathbb {C}^{N/2},$$

    where its entry of index j is given as \(z_j=\varDelta ^{-1}\cdot m(\zeta _M^j)\) for \(j\in T\), \(\zeta _M\) is the M-th root of unity, and T is a multiplicative subgroup of \(\mathbb {Z}_M^*\) satisfying \(\mathbb {Z}_M^*/T=\{\pm 1\}\). Equivalently, decoding can be represented as multiplication by an \(N/2\times N\) matrix \(\mathbf {U}\) whose entries are \(\mathbf {U}_{ji} = \zeta _j^i\), where \(\zeta _j {:}{=}\zeta _M^{5^j}\).
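For concreteness, the following is a minimal NumPy sketch of Ecd and Dcd through the matrix \(\mathbf {U}\) above, for a toy ring degree; plain coefficient rounding stands in for the discretization into \(\textsf {Emb}(\mathcal {R})\), and all parameter values are illustrative.

```python
import numpy as np

N, delta = 8, 2**10                                # toy ring degree and scaling factor
M = 2 * N
zeta = np.exp(2j * np.pi / M)                      # primitive M-th root of unity
roots = zeta ** (5 ** np.arange(N // 2) % M)       # zeta_j = zeta_M^{5^j}
U = np.vander(roots, N, increasing=True)           # U_{ji} = zeta_j^i, size N/2 x N

def dcd(m_coeffs, delta):
    """Dcd: z_j = Delta^{-1} * m(zeta_j)."""
    return (U @ m_coeffs) / delta

def ecd(z, delta):
    """Ecd: invert the real N x N system [Re U; Im U], then round to Z^N."""
    A = np.vstack([U.real, U.imag])
    target = delta * np.concatenate([z.real, z.imag])
    return np.rint(np.linalg.solve(A, target)).astype(np.int64)

z = np.array([0.5 + 0.25j, -1.0 + 0j, 2.0 + 1.0j, 0.125 + 0j])
m = ecd(z, delta)
print(np.max(np.abs(dcd(m, delta) - z)))           # small rounding error ~ 1/Delta
```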

For a real number \(\sigma \), \(\mathcal {D}\mathcal {G}(\sigma ^2)\) denotes the distribution in \(\mathbb {Z}^N\), whose entries are sampled independently from the discrete Gaussian distribution of variance \(\sigma ^2\). \(\mathcal {H}\mathcal {W}\mathcal {T}(h)\) is the set of signed binary vectors in \(\{0,\pm 1\}^N\) with Hamming weight h. Suppose that we have ciphertexts of level l for \(0 \le l \le L\).

The RNS-CKKS scheme performs all operations in RNS. The ciphertext modulus \(Q_l=q\cdot \prod _{i=1}^{l} p_i\) is used, where the \(p_i\)'s are chosen as primes satisfying \(p_i = 1 \left( \mathrm {mod}~2N\right) \) to support efficient number theoretic transform (NTT). We note that \(Q_0 = q\) is greater than the scaling factor, as the final message's coefficients should not be greater than the ciphertext modulus q. For faster computation, we use the hybrid key switching technique in [18]. First, for a predefined small integer \(\textsf {dnum}\), we define the partial products \(\left\{ \tilde{Q}_j\right\} _{0\le j<\textsf {dnum}} = \left\{ \prod _{i=j\alpha }^{(j+1)\alpha -1} p_i \right\} _{0\le j<\textsf {dnum}}\text {,}\) where \(\alpha = \left\lceil (L+1)/\textsf {dnum} \right\rceil \). For a ciphertext with level l and \(\textsf {dnum}' = \left\lceil (l+1)/\alpha \right\rceil \), we define [18]

$$\begin{aligned} \mathcal {W}\mathcal {D}_l(a) = \left( \left[ a \frac{\tilde{Q}_0}{Q_l} \right] _{\tilde{Q}_0}, \cdots , \left[ a \frac{\tilde{Q}_{\textsf {dnum}'-1}}{Q_l} \right] _{\tilde{Q}_{\textsf {dnum}'-1}} \right) \in \mathcal {R}^{\textsf {dnum}'}, \end{aligned}$$
$$\begin{aligned} \mathcal {P}\mathcal {W}_l(a) = \left( \left[ a \frac{Q_l}{\tilde{Q}_0} \right] _{{Q}_l}, \cdots , \left[ a \frac{Q_l}{\tilde{Q}_{\textsf {dnum}'-1}} \right] _{{Q}_l} \right) \in \mathcal {R}_{Q_l}^{\textsf {dnum}'}. \end{aligned}$$

Then, for any \((a,b)\in \mathcal {R}_{Q_l}^2\), we have

$$\begin{aligned} \langle \mathcal {W}\mathcal {D}_l(a) ,\mathcal {P}\mathcal {W}_l(b) \rangle = a\cdot b \left( \mathrm {mod}~Q_l\right) . \end{aligned}$$
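This identity can be checked with a toy scalar example (integers standing in for ring elements; the moduli are illustrative). Note that the fraction \(\tilde{Q}_j/Q_l\) in \(\mathcal {W}\mathcal {D}_l\) denotes the modular inverse of \(Q_l/\tilde{Q}_j\) modulo \(\tilde{Q}_j\):

```python
# Toy check of <WD_l(a), PW_l(b)> = a*b (mod Q_l) with dnum = 2 and assumed primes.
primes = [97, 101, 103, 107]                          # stand-ins for the RNS primes p_i
Qt = [primes[0] * primes[1], primes[2] * primes[3]]   # partial products ~Q_j
Q = Qt[0] * Qt[1]

def WD(a):
    # [a * (Q/~Q_j)^{-1}]_{~Q_j}: the "~Q_j/Q_l" factor is this modular inverse
    return [(a * pow(Q // q, -1, q)) % q for q in Qt]

def PW(b):
    # [b * Q/~Q_j]_{Q}
    return [(b * (Q // q)) % Q for q in Qt]

a, b = 123456, 987654
lhs = sum(w * p for w, p in zip(WD(a), PW(b))) % Q
print(lhs == (a * b) % Q)                             # True
```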

Then, the operations in the RNS-CKKS scheme are defined as follows:

  • KeyGen \((1^\lambda )\):

    • Given the security parameter \(\lambda \), we choose a power-of-two M, an integer h, an integer P, a real number \(\sigma \), and a maximum ciphertext modulus Q, such that \(Q\ge Q_L\).

    • Sample the following values: \( s \leftarrow \mathcal {H}\mathcal {W}\mathcal {T}(h). \)

    • The secret key is \(\textsf {sk}{:}{=}(1, s)\).

  • KSGen\(_\textsf {sk}(s')\): For an auxiliary modulus \(P = \prod _{i=0}^{k}p_i'\approx \max _j{\tilde{Q}_j}\), sample \(a_k'\leftarrow \mathcal {R}_{PQ_L}\) and \(e_k'\leftarrow \mathcal {D}\mathcal {G}(\sigma ^2)\) for each \(0 \le k < \textsf {dnum}'\). Output the switching key

    $$\begin{aligned} \textsf {swk} {:}{=}(\textsf {swk}_0, \textsf {swk}_1) =&( \left\{ b_k'\right\} _{k=0}^{\textsf {dnum}'-1}, \left\{ a_k'\right\} _{k=0}^{\textsf {dnum}'-1} )\in \mathcal {R}_{PQ_L}^{2\times \textsf {dnum}'} , \end{aligned}$$

    where \(b_k'= -a_k's+e_k'+P\cdot \mathcal {P}\mathcal {W}(s')_k\left( \mathrm {mod}~PQ_L\right) \).

    • Set the evaluation key as \(\textsf {evk}:= \textsf {KSGen}_\textsf {sk}(s^2)\).

  • Enc\( _{\textsf {sk}}(m)\): Sample \(a \leftarrow \mathcal {R}_{Q_L}\) and \(e \leftarrow \mathcal {D}\mathcal {G}(\sigma ^2).\) The output ciphertext is

    $$\textsf {ct}= (-a\cdot s + e + m, a) \left( \mathrm {mod}~Q_L\right) ,$$

    where \(\textsf {sk}= (1,s)\). There is also a public-key encryption method [12], but it is omitted here.

  • Dec\(_\textsf {sk}(\textsf {ct})\): Output \(\bar{m} = \langle \textsf {ct}, \textsf {sk}\rangle \).

  • Add\((\textsf {ct}_1,\textsf {ct}_2)\): For \(\textsf {ct}_1,\textsf {ct}_2 \in \mathcal {R}_{Q_l}^2\), output \( \textsf {ct}_\textsf {add} = \textsf {ct}_1 + \textsf {ct}_2 \left( \mathrm {mod}~Q_l\right) . \)

  • Mult\((\textsf {ct}_1,\textsf {ct}_2)\): For \(\textsf {ct}_1=(b_1, a_1)\) and \(\textsf {ct}_2=(b_2, a_2) \in \mathcal {R}_{Q_l}^2\), return

    $$ \textsf {ct}_\textsf {mult} = (d_0, d_1, d_2) {:}{=}(b_1b_2, a_1b_2+a_2b_1, a_1a_2) \left( \mathrm {mod}~Q_l\right) . $$
  • \(\textsf {RL}_\textsf {evk}(d_0, d_1, d_2)\): For a three-tuple ciphertext \((d_0, d_1, d_2)\) corresponding to secret key \((1,s,s^2)\), return \( (d_0,d_1) + \textsf {KS}_{\textsf {evk}}((0,d_2)). \)

  • cAdd\((\textsf {ct},\boldsymbol{a};\varDelta )\): For \(\boldsymbol{a}\in \mathbb {C}^{N/2}\) and a scaling factor \(\varDelta \), output \( \textsf {ct}_\textsf {cadd} = \textsf {ct}+ (\textsf {Ecd}(\boldsymbol{a};\varDelta ),0) . \)

  • cMult\((\textsf {ct},\boldsymbol{a};\varDelta )\): For \(\boldsymbol{a}\in \mathbb {C}^{N/2}\) and a scaling factor \(\varDelta \), output \( \textsf {ct}_\textsf {cmult} = \textsf {Ecd}(\boldsymbol{a};\varDelta )\cdot \textsf {ct}. \)

  • RS\((\textsf {ct})\): For \(\textsf {ct}\in \mathcal {R}_{Q_l}^2\), output \( \textsf {ct}_\textsf {RS} = \left\lfloor p_l^{-1}\cdot \textsf {ct} \right\rceil \left( \mathrm {mod}~Q_{l-1}\right) . \)

  • KS\(_{\textsf {swk}}(\textsf {ct})\): For \(\textsf {ct}= (b, a)\in \mathcal {R}_{Q_l}^2\) and \(\textsf {swk} {:}{=}(\textsf {swk}_0, \textsf {swk}_1)\), output

    $$\begin{aligned} \textsf {ct}_\textsf {KS} = \left( b + \left\lfloor \frac{\langle \mathcal {W}\mathcal {D}_l(a), \textsf {swk}_0\rangle }{P} \right\rceil , \left\lfloor \frac{\langle \mathcal {W}\mathcal {D}_l(a), \textsf {swk}_1\rangle }{P} \right\rceil \right) \left( \mathrm {mod}~Q_l\right) . \end{aligned}$$

The key-switching technique is used to provide various operations such as complex conjugation and rotation. To remove the error introduced by approximate scaling factors, one can use different scaling factors for each level as given in [21], or one can use the scale-invariant method proposed in [3] for polynomial evaluation. We note that (FullRNS-)\(\texttt {HEAAN}\) and \(\texttt {SEAL}\) correspond to the \(\textsf {dnum}=1\) and \(\textsf {dnum}=L+1\) cases, respectively, and \(\texttt {Lattigo}\) supports arbitrary \(\textsf {dnum}\).

2.3 Signal-to-Noise Ratio Perspective of the CKKS Scheme

There has been extensive research on noisy media in many areas such as wireless communications and data storage. In this perspective, the CKKS scheme can be considered a noisy medium; encryption and decryption correspond to transmission and reception, respectively. The message in a ciphertext is the signal, and the final output has additive errors due to ring-LWE (RLWE) security, rounding, and approximation.

The SNR is the most widely used measure of signal quality, defined as the ratio of the signal power to the noise power as follows:

$$\begin{aligned} \text {SNR} = \frac{E[S^2]}{E[N^2]}, \end{aligned}$$

where S and N denote the signal (message) and noise (error), respectively. As the definition shows, a higher SNR corresponds to better quality.
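For instance, the empirical SNR of noisy slot values can be computed as follows (the signal and error statistics here are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.normal(0.0, 1.0, 1 << 15)        # message slots
noise = rng.normal(0.0, 1e-5, signal.shape)   # additive CKKS error

snr = np.mean(signal**2) / np.mean(noise**2)
print(f"SNR = {snr:.3e} ({10 * np.log10(snr):.1f} dB)")
```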

A simple way to increase the SNR is to increase the signal power, but this is limited by regulatory or physical constraints. The CKKS scheme has the same problem: a larger scaling factor must be multiplied to the message to increase the message power, but then the ciphertext level decreases, or larger parameters are required for security. Hence, to increase the SNR, it is preferable to reduce the noise power in the CKKS scheme rather than to increase the signal power.

Error estimation of the CKKS scheme has so far focused on the high-probability upper bound of the error after several operations [9, 12] and on the minimax error for approximation [24]. However, the bound becomes quite loose as homomorphic operations continue, and its statistical significance may diminish. Thus, we maximize the SNR in this paper, which is equivalent to minimizing the error variance when the scaling factor is fixed.

3 Bootstrapping of the CKKS Scheme

3.1 Outline of the CKKS Bootstrapping

There are extensive studies on bootstrapping of the CKKS scheme [3, 7, 9, 18,19,20, 24, 25]. CKKS bootstrapping consists of the following four steps: ModRaise, CoeffToSlot, EvalMod, and SlotToCoeff.

Modulus Raising \((\textsc {ModRaise})\). \(\textsc {ModRaise}\) increases the ciphertext modulus to a larger modulus. Let \(\textsf {ct}\) be the ciphertext satisfying \(m(X) = [\langle \textsf {ct},\textsf {sk}\rangle ]_q\). Then we have \(t(X) = \langle \textsf {ct}, \textsf {sk}\rangle = qI(X) +m(X) \equiv m(X) \left( \mathrm {mod}~q\right) \) for \(I(X)\in \mathcal {R}\) with a high-probability bound \(\Vert I(X)\Vert _\infty < K = \mathcal {O}(\sqrt{h})\). The subsequent steps aim to homomorphically compute the remainder of the coefficients of t(X) modulo q.

Homomorphic Evaluation of Encoding (\(\textsc {CoeffToSlot}\)). Homomorphic operations are performed in plaintext slots, but we need component-wise operations on coefficients. Thus, to deal with t(X), we should put the polynomial coefficients in plaintext slots. In the CoeffToSlot step, \(\textsf {Emb}^{-1} \circ \pi ^{-1}\) is performed homomorphically using matrix multiplication [9] or an FFT-like hybrid method [7]. Then, we have two ciphertexts encrypting \(\boldsymbol{z}_0' = (t_0,\dots , t_{\frac{N}{2}-1})\) and \(\boldsymbol{z}_1' = (t_{\frac{N}{2}},\dots , t_{N-1})\) (when the number of slots is small, we can put \(\boldsymbol{z}_0'\) and \(\boldsymbol{z}_1'\) in a single ciphertext, see [9]), where \(t_j\) denotes the j-th coefficient of t(X). The matrix multiplication is composed of three steps [9]: i) rotate the ciphertexts, ii) multiply the diagonal components of the matrix with the rotated ciphertexts, and iii) sum up the ciphertexts.

Evaluation of the Approximate Modular Reduction \((\textsc {EvalMod})\). An approximate evaluation of the modular reduction function is performed in this step. As additions and multiplications cannot represent the modular reduction function, an approximate polynomial for \([\cdot ]_q\) is used. For the approximation, it is desirable to control the message size to ensure \(|m_i| \le \epsilon \cdot q\) for a small \(\epsilon \) [9].

Homomorphic Evaluation of Decoding \((\textsc {SlotToCoeff})\). \(\textsc {SlotToCoeff}\) is the inverse operation of CoeffToSlot. Since the matrix elements need not be as precise as in CoeffToSlot, we can use a smaller scaling factor here [3]. In \(\textsc {SlotToCoeff}\), the ciphertext is multiplied by the CRT matrix \(\mathbf {U}\), whose elements have magnitude one. Thus, the N errors in the slots are multiplied by constants of magnitude one and then added.

3.2 Polynomial Approximation of Modular Reduction

Previous works approximated the modular reduction function by \(\frac{q}{2\pi }\sin \left( \frac{2\pi t}{q}\right) \) [7, 9, 18]. An approximate polynomial of the sine function was found using the Taylor expansion of the exponential function and \(e^{it}=\cos (t)+i\cdot \sin (t)\) in [9]. The Chebyshev approximation of the sine function improved the approximation in [7]. A modified Chebyshev approximation of the cosine function together with the double-angle formula reduced the error and evaluation time in [18]. However, these approaches rely on the sine function, and thus there remains the fundamental approximation error, that is,

$$ \left| m-\frac{q}{2\pi }\sin (2\pi \frac{m}{q})\right| \le \frac{q}{2\pi }\cdot \frac{1}{3!}\left( \frac{2\pi |m|}{q}\right) ^3 . $$
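This bound is easy to verify numerically; a short sketch with q normalized to 1 and an assumed \(\epsilon \):

```python
import numpy as np

q, eps = 1.0, 2.0**-10
m = np.linspace(-eps * q, eps * q, 10001)
gap = np.abs(m - (q / (2 * np.pi)) * np.sin(2 * np.pi * m / q))
bound = (q / (2 * np.pi)) * (2 * np.pi * np.abs(m) / q) ** 3 / 6   # cubic Taylor bound
print(gap.max(), bound.max(), np.all(gap <= bound + 1e-18))        # bound holds
```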

Direct-approximation methods were proposed in [19, 25], but their coefficients are large and amplify the errors of the polynomial basis. A composition with the inverse sine function, which offers a trade-off between the precision and the remaining levels, was proposed to remove the fundamental approximation error between the sine function and the modular reduction [24]. However, the evaluation of the inverse sine function consumes a considerable multiplicative depth.

These prior works sought the minimax approximate polynomial \(p_n\), which minimizes \(\left\| f - p_n \right\| _{\textsf {sup}}\), where f is the function to approximate, such as the sine function [7, 9, 18, 24]. Lee \(\textit{et al.}\) proposed the multi-interval Remez algorithm [24], an iterative method to find the minimax approximate polynomial of an arbitrary piecewise continuous function.

Algorithm 1. The BSGS algorithm.

3.3 Baby-Step Giant-Step Algorithms

There are several baby-step giant-step algorithms for different purposes in the context of HE. In this paper, BSGS refers only to the polynomial evaluation algorithm proposed in [18] and its variants. The BSGS algorithm, presented in Algorithm 1, is composed of SetUp, BabyStep, and GiantStep. \(\textsc {SetUp}\) calculates all the polynomial bases required to evaluate the given polynomial. \(\textsc {GiantStep}\) divides the input polynomial by a polynomial of degree \(2^ik\) and calls \(\textsc {GiantStep}\) recursively for its quotient and remainder, where \(i \le \left\lfloor \log (\textsf {deg}/k) \right\rfloor \) for an integer k, and \(\textsf {deg}\) is the degree of the polynomial. When the given polynomial has degree less than k, it calls \(\textsc {BabyStep}\), which evaluates the given small-degree polynomial, namely a baby polynomial.

Originally, Han and Ki proposed to use a power-of-two k [18], and Lee \(\textit{et al.}\) generalized k to an arbitrary even number and proposed to omit the even-degree terms for odd polynomials, which reduces the number of ciphertext-ciphertext multiplications [24]. The number of ciphertext-ciphertext multiplications is given as

$$\begin{aligned} k -2 + l + 2^l \end{aligned}$$

in general, and

$$\begin{aligned} \left\lfloor \log \left( k-1 \right) \right\rfloor + k/2 -2 + l + 2^l \end{aligned}$$

for odd polynomials, where \(\textsf {deg}< k \cdot 2^l\) is satisfied. Also, Bossuat \(\textit{et al.}\) applied more recursion to the high-degree terms to optimize the multiplicative depth [3]. With the BSGS algorithm of Bossuat \(\textit{et al.}\), we can evaluate a polynomial of degree up to \(2^d-1\) within multiplicative depth d at the cost of \(O(\log k)\) additional multiplications.
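These counts can be minimized over the valid pairs (k, l); the following sketch transcribes the two formulas (k is unrestricted in the general case and even in the odd variant of [24]) and matches the degree-711 counts (58 and 46) quoted in Sect. 5.

```python
from math import floor, log2

def bsgs_mults(deg, odd=False):
    """Minimum non-scalar multiplication count over valid (k, l) with deg < k * 2**l."""
    best = None
    for l in range(1, 11):
        ks = range(2, deg + 1, 2) if odd else range(2, deg + 1)
        for k in ks:
            if deg >= k * (1 << l):
                continue
            if odd:
                cost = floor(log2(k - 1)) + k // 2 - 2 + l + (1 << l)
            else:
                cost = k - 2 + l + (1 << l)
            if best is None or cost < best[0]:
                best = (cost, k, l)
    return best

print(bsgs_mults(711))            # (58, 23, 5) for ordinary BSGS
print(bsgs_mults(711, odd=True))  # (46, 46, 4) for the odd variant
```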

4 Optimal Approximate Polynomial of Modular Reduction for Bootstrapping of the CKKS Scheme

This section proposes a new method to find the optimal approximate polynomial of the modular reduction function for CKKS bootstrapping, considering the noisy computation nature of the CKKS scheme. The optimality of the proposed approximate polynomial is proved, and the statistics of the input to the approximate polynomial are analyzed to improve the approximation.

4.1 Error Variance-Minimizing Polynomial Approximation

We use the variance of the error as the objective function for the proposed polynomial approximation and show that it is also optimal for CKKS bootstrapping. As described in the following subsections, the error in the noisy polynomial basis, namely the basis error, can be amplified by the coefficients of the approximate polynomial. Thus, the magnitude of its coefficients should be small, and using the generalized least squares method, the optimal coefficient vector \(\boldsymbol{c}^*\) of the approximate polynomial is obtained as

$$\begin{aligned} \boldsymbol{c}^* = {\arg \min }_{\boldsymbol{c}}\left( Var[e_\textsf {aprx}]+ \sum w_i \boldsymbol{c}_i^2\right) , \end{aligned}$$
(1)

where \(e_\textsf {aprx}\) is the approximation error, and the constant values \(w_i\) are determined by the basis error given by CKKS parameters such as key Hamming weight, number of slots, and scaling factor.

We call the proposed approximate polynomial obtained by (1) the error variance-minimizing approximate polynomial, and we derive an analytic solution for it. We note that the optimal solution minimizes the variance of the approximation error as well as the variance of the amplified basis error. The error variance-minimizing approximate polynomial is described in detail, taking bootstrapping as a specific example, in the following subsections. It is worth noting that the approximation can be applied to an arbitrary function.
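Since (1) is a quadratic objective in \(\boldsymbol{c}\), a discretized version is a weighted least-squares problem with per-coefficient penalties. The following minimal sketch illustrates this on an assumed basis, input density, and weights (none of which are the parameters used in our bootstrapping):

```python
import numpy as np

def var_min_fit(f, basis, ts, pr_t, w):
    # Minimize  sum_t pr_t(t) * (sum_i c_i phi_i(t) - f(t))^2  +  sum_i w_i * c_i^2
    Phi = np.stack([phi(ts) for phi in basis], axis=1)   # |ts| x n design matrix
    T = Phi.T @ (Phi * pr_t[:, None])                    # discretized E[phi_i phi_j]
    y = Phi.T @ (pr_t * f(ts))                           # discretized E[f phi_i]
    return np.linalg.solve(T + np.diag(w), y)

# Toy run: approximate sin on [-1, 1] with odd monomials under uniform Pr_T.
ts = np.linspace(-1, 1, 4001)
pr = np.full_like(ts, 1.0 / ts.size)
basis = [lambda t, d=d: t**d for d in (1, 3, 5, 7)]
c = var_min_fit(np.sin, basis, ts, pr, w=np.full(4, 1e-12))
print(c)    # close to the odd Taylor coefficients of sin
```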

4.2 Optimality of the Proposed Direct Approximate Polynomial

In this subsection, we show that the proposed error variance-minimizing approximate polynomial is optimal for CKKS bootstrapping in the following aspects. First, we show that an approximate polynomial that minimizes the error variance after \(\textsc {EvalMod}\) also minimizes the bootstrapping error variance, and thus it is optimal in terms of SNR. Next, we show that the direct approximation to the modular reduction allows a more accurate approximation than previous indirect approximations using trigonometric functions [3, 7, 9, 18, 24] for fixed multiplicative depth.

Error-Optimality of the Proposed Approximate Polynomial in CKKS Bootstrapping. Here, we show that the error variance-minimizing approximate polynomial guarantees the minimum error after bootstrapping in terms of SNR, while the minimax approach in [3, 7, 9, 18, 24] does not guarantee the minimax error after bootstrapping. In \(\textsc {EvalMod}\), no operations between different slots occur, and thus we can assume that the errors in different slots are independent. \(\textsc {SlotToCoeff}\) is the homomorphic operation of decoding, and the decoding of m(X) is given as \((m(\zeta _0), m(\zeta _1), \dots , m(\zeta _{N/2-1}))\). Hence, the error in the j-th slot after \(\textsc {SlotToCoeff}\) is given as \(e_{\textsf {boot},j} = \sum _{i=0}^{N-1} e_{\textsf {mod},i} \cdot \zeta _{j}^{i}\), which is the sum of thousands of independent random variables, where \(e_{\textsf {mod},i}\) denotes the error in the i-th slot after \(\textsc {EvalMod}\) and \(|\zeta _j|=1\).

The minimax approximate polynomial minimizes \(\left\| e_\textsf {aprx} \right\| _{\textsf {sup}}\) [7, 24]. In this case, we have \(e_{\textsf {mod},i} = e_\textsf {aprx}(t_i) + e_{\textsf {noise},i}\), where \(t_i\) is the i-th slot value after \(\textsc {CoeffToSlot}\) and \(e_{\textsf {noise},i}\) is the random error caused by the noisy polynomial basis of CKKS. Hence, the minimax approximation minimizes \(\max _i \left( \left|e_\textsf {aprx}(t_i) \cdot \zeta _{j}^{i} \right| \right) =\left\| e_\textsf {aprx} \right\| _{\textsf {sup}}\), not \(\max _j \left( \left|e_{\textsf {boot},j} \right| \right) \). In other words, the final bootstrapping error is \(e_{\textsf {boot},j}\), and we have

$$\begin{aligned} \nonumber \max \left( \left|e_{\textsf {boot},j} \right| \right)&= \max \left( \left|\sum _{i=0}^{N-1} e_{\textsf {mod},i} \cdot \zeta _{j}^{i} \right| \right) = \max \left( \left|\sum _{i=0}^{N-1} \left( e_\textsf {aprx}(t_i) + e_{\textsf {noise},i} \right) \cdot \zeta _{j}^{i} \right| \right) \\&\le \max \left( \left|\sum _{i=0}^{N-1} e_\textsf {aprx}(t_i) \cdot \zeta _{j}^{i} \right| \right) + \max \left( \left|\sum _{i=0}^{N-1} e_{\textsf {noise},i} \cdot \zeta _{j}^{i} \right| \right) , \end{aligned}$$
(2)

where

$$\begin{aligned} \max \left( \left|\sum _{i=0}^{N-1} e_\textsf {aprx}(t_i) \cdot \zeta _{j}^{i} \right| \right) \le \left\| \zeta _j^0 \cdot e_\textsf {aprx} \right\| _{\textsf {sup}} + \dots + \left\| \zeta _j^{N-1} \cdot e_\textsf {aprx} \right\| _{\textsf {sup}} . \end{aligned}$$

Hence, the minimax approximate polynomial does not guarantee the minimum infinity norm of the bootstrapping error but provides an upper bound only for the approximation error term. Besides, it is challenging to optimize the polynomial coefficients for the noisy basis within the existing minimax approximation.

In contrast, the proposed error variance-minimizing approximate polynomial minimizes \(Var{[e_{\textsf {mod},j}]}\). Thus, it also minimizes the final bootstrapping error \(Var{[e_{\textsf {boot},j}]}\), as

$$\begin{aligned} Var{[e_{\textsf {boot},j}]} = Var{[e_{\textsf {mod},0} \cdot \zeta _j^0]} + \dots + Var{[e_{\textsf {mod},N-1} \cdot \zeta _j^{(N-1)}]} . \end{aligned}$$

The above equation implies that minimizing the variance of the approximation error is optimal for reducing the bootstrapping error of the CKKS scheme in terms of SNR. Due to the characteristics of \(\textsc {SlotToCoeff}\), we obtain the exact variance of the bootstrapping error, while the minimax approach provides only an upper bound on the infinity norm. In other words, the proposed error variance-minimizing approximate polynomial optimizes the objective function itself, whereas the minimax approach optimizes an upper bound (the right-hand side of (2)) instead of the bootstrapping error (the left-hand side of (2)).

Depth Optimality of Direct Approximation. As shown in (1), the proposed method approximates the objective function directly, while the prior works approximate trigonometric functions [3, 7, 9, 18, 24]. Let \(P_\textsf {deg}\subset \mathbb {C}[X]\) be the set of all polynomials whose degree is less than or equal to \(\textsf {deg}\). When we perform a direct approximation, the algorithm finds an approximate polynomial among all elements of \(P_\textsf {deg}\), and its multiplicative depth is \(\left\lceil \log (\textsf {deg}) \right\rceil \).

When we use the approximation via trigonometric functions, the search space of the approximation algorithm is much more limited. For example, as in [3, 24], suppose that we use the double-angle formula twice and approximate polynomials for cosine and arcsine of degrees \(\textsf {deg}_1\) and \(\textsf {deg}_2\), respectively. Then the search space is

$$\begin{aligned} \left\{ f_2\circ g \circ f_1|f_1\in P_{\textsf {deg}_1}, f_2\in P_{\textsf {deg}_2}\text {, and } g(x) = (x^2-1)^2-1 \right\} . \end{aligned}$$

We can see that the search space is much smaller than \(P_{4\textsf {deg}_1\textsf {deg}_2}\), and its multiplicative depth is \(\left\lceil \log (\textsf {deg}_1+1) \right\rceil + 2 + \left\lceil \log (\textsf {deg}_2+1) \right\rceil \ge \left\lceil \log (4\textsf {deg}_1\textsf {deg}_2+1) \right\rceil \). Hence, the direct approximation in \(P_{4\textsf {deg}_1\textsf {deg}_2}\) has a better chance of finding a good approximation while consuming less multiplicative depth.

4.3 Noisy Polynomial Basis and Polynomial Evaluation in the CKKS Scheme

Let \(\{\phi _0(x),\phi _1(x),\dots , \phi _n(x)\}\) denote a polynomial basis of degree n such that \(\phi _k(x)\) is odd for every odd k. When a polynomial \(p(x)=\sum c_i \phi _i(x)\) is evaluated homomorphically, the result is expected to be \(p(x) + e\) for a small error e. In the CKKS scheme, encrypted data contains errors, and thus each \(\phi _i(x)\) carries an independent error \(e_{\textsf {basis}, i}\), namely the basis error. Thus, the output is

$$\begin{aligned} \sum c_i(\phi _i(x) + e_{\textsf {basis},i}) = p(x) + \sum c_i e_{\textsf {basis},i}. \end{aligned}$$

In general, \(\sum c_i e_{\textsf {basis},i}\) is small as the \(e_{\textsf {basis},i}\) are small. However, when the \(|c_i|\) are much greater than the magnitude of p(x), \(\sum c_i e_{\textsf {basis},i}\) dominates p(x).

The basis errors \(e_{\textsf {basis},i}\) are introduced by rescaling, key switching, and encryption errors, which are independent of the message. Each \(\phi _i(x)\) is usually obtained from smaller-degree polynomials, and thus there may be some correlation between the \(e_{\textsf {basis},i}\)'s. If we assume that each \(e_{\textsf {basis},i}\) is independent, then the variance of \(\sum c_i \cdot e_{\textsf {basis},i}\) becomes \(\sum c_i^2 \cdot Var(e_{\textsf {basis},i})\), and \(w_i\) in (1) corresponds to \(Var(e_{\textsf {basis},i})\). The experiments in Sect. 6 show that our approximation under this independence assumption is accurate for bootstrapping in practice. In other words, we do not need the exact distributions of the \(e_{\textsf {basis},i}\) in practice.
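A quick Monte Carlo check of this independence model (with assumed toy variances) confirms that the variance of the aggregated basis error matches \(\sum c_i^2 \cdot Var(e_{\textsf {basis},i})\), so coefficients of large magnitude dominate the error:

```python
import numpy as np

rng = np.random.default_rng(1)
c = np.array([1.0, -40.0, 300.0])               # coefficients of varying magnitude
var_e = np.array([1e-12, 1e-12, 1e-12])         # Var(e_basis,i), assumed equal here
e = rng.normal(0.0, np.sqrt(var_e), size=(1_000_000, 3))
print(np.var(e @ c), np.sum(c**2 * var_e))      # empirical vs. analytic: they agree
```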

In conclusion, the magnitudes of the \(c_i\)'s should be controlled when we find an approximate polynomial. The high-degree approximate polynomials for the modular reduction and piecewise cosine functions in previous works have coefficients of large magnitude [19, 24]. There has been a series of studies on approximate polynomials in the CKKS scheme [3, 7, 12, 18, 19, 24, 25], but the errors amplified by the coefficients were not considered in these previous studies.

4.4 Optimal Approximate Polynomial for Bootstrapping and the Magnitude of Its Coefficients

The most depth-consuming and noisy part of bootstrapping is \(\textsc {EvalMod}\). In this subsection, we show how to find the optimal approximate polynomial for \(\textsc {EvalMod}\) in the aspect of SNR. By scaling the modular reduction function \([\cdot ]_q\) by \(\frac{1}{q}\), we define

$$ f_\textsf {mod}: \bigcup _{i=-K+1}^{K-1}I_i \rightarrow \left[ -\epsilon , \epsilon \right] , \text { that is, } f_\textsf {mod}(t) = t - i \text { if } t \in I_i, $$

where \(I_i = \left[ i-\epsilon , i+\epsilon \right] \) for an integer \(-K< i < K\). Here, \(\epsilon \) denotes the ratio of the maximum coefficient of the message polynomial to the ciphertext modulus, that is, \({{|m_i}/{q}|}\le \epsilon \), where \(m_i\) denotes a coefficient of m(X). Let T be the random variable of the input t of \(f_\textsf {mod}(t)\). Then, \({T = R + I}\text {,}\) where R is the random variable of the rational part r, and I is the random variable of the integer part i. We note that \(\mathrm {Pr}_{{{T}}}\left( {t} \right) = \mathrm {Pr}_{{{I}}}\left( {i} \right) \cdot \mathrm {Pr}_{{R}}\left( {r} \right) \) is satisfied for \(t=r+i\), as i and r are independent and \(\bigcup _i I_i\) is identified with \([-\epsilon ,\epsilon ]\times \{0, \pm 1, \dots , \pm (K-1)\}\), where \(\mathrm {Pr}_{{T}}, \mathrm {Pr}_{{I}}, \) and \(\mathrm {Pr}_{R}\) are the probability mass or density functions of T, I, and R, respectively.
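In code, \(f_\textsf {mod}\) is simply a rounding-based recentering; a one-line sketch:

```python
import numpy as np

def f_mod(t):
    """Scaled modular reduction: maps each I_i = [i - eps, i + eps] onto [-eps, eps]."""
    return t - np.rint(t)

print(f_mod(np.array([3.01, -2.99, 0.004])))   # [0.01, 0.01, 0.004]
```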

The approximation error for t is given as

$$\begin{aligned} e_\textsf {aprx}(t) = p(t) -f_\textsf {mod}(t) = p(t) - (t-i) , \end{aligned}$$

where a polynomial \(p(t)=\sum c_i \phi _i(t)\) approximates \(f_\textsf {mod}(t)\). We can set p(t) as an odd function because \(f_\textsf {mod}(t)\) is odd. Then the variance of \(e_\textsf {aprx}\) is given as

$$\begin{aligned} Var[e_\textsf {aprx}]&= E[e_\textsf {aprx}^2] = \int _t e_\textsf {aprx}(t)^2 \cdot \mathrm {Pr}_{{{T}}}\left( {t} \right) dt\\&= \sum _{-K< i < K} \mathrm {Pr}_{{{I}}}\left( {i} \right) \int _{t=i-\epsilon }^{i+\epsilon } e_\textsf {aprx}(t)^2 \cdot \mathrm {Pr}_{{R}}\left( {t-i} \right) dt , \end{aligned}$$

where the mean of \(e_\textsf {aprx}\) is zero under the assumption that \(\mathrm {Pr}_{{{T}}}\left( {t} \right) \) is even. It is noted that the integral can be calculated directly or approximated by a sum of discretized values as in [25].

The basis error term \(\sum c_i^2 \cdot Var(e_{\textsf {basis},i})\) is also added, as discussed in Subsect. 4.3. We generalize \(Var(e_{\textsf {basis},i})\) to the weight \(w_i\). Then, we find \(\boldsymbol{c}^*\) such that

$$\begin{aligned} \boldsymbol{c}^* = {\arg \min }_{\boldsymbol{c}}\left( Var[e_\textsf {aprx}]+\sum w_i{c}_i^2 \right) , \end{aligned}$$
(3)

and its solution satisfies

$$\begin{aligned} \nabla _{\boldsymbol{c}} \left( Var[e_\textsf {aprx}]+\sum w_i{c}_i^2 \right) =0, \end{aligned}$$

where \(\boldsymbol{c}= (c_1, c_3, \dots , c_n)\) and \(\boldsymbol{w} = (w_1, w_3, \dots , w_n)\) are coefficient and weight constant vectors, respectively. We note that the objective function is convex.

It is noted that \(Var(e_{\textsf {basis},i})\) may differ by i, and thus a precise adjustment of the magnitudes of the polynomial coefficients can be made through the multiple weight constants \(w_i\). The following theorem states that we can find the approximate polynomial p(t) efficiently; the computation time of solving this system of linear equations is the same as that of finding an interpolation polynomial for given points. It is faster than the improved multi-interval Remez algorithm [24], as the Remez algorithm requires an interpolation per iteration.

Theorem 1

There exists a polynomial-time algorithm that finds the odd polynomial \(p(t) = \sum c_i\phi _i(t)\) satisfying

$$\begin{aligned} {\arg \min }_{\boldsymbol{c}}{ \left( Var[e_{ \textsf {aprx}}]+\sum w_i\boldsymbol{c}_i^2 \right) } , \end{aligned}$$

when \(\mathrm {Pr}_{{{T}}}\left( {t} \right) \) is an even function.

Proof

By substituting \(p(t)=\sum c_i \phi _i(t)\) into \( Var[e_\textsf {aprx}] = E[e_\textsf {aprx}^2] = E[f_\textsf {mod}(t)^2] - 2E[f_\textsf {mod}(t)\cdot p(t)] + E[p(t)^2], \) we have

$$\begin{aligned} \frac{\partial }{\partial c_j} Var[e_\textsf {aprx}]&= - 2E[f_\textsf {mod}(t)\phi _j(t)] + 2\sum _i c_i\cdot E[\phi _i(t)\phi _j(t)]. \end{aligned}$$

Therefore, one can find \( \boldsymbol{c}^*= {\arg \min }_{\boldsymbol{c}} {\left( Var[e_\textsf {aprx}]+\sum w_i{c}_i^2\right) } \) by solving the following system of linear equations:

$$\begin{aligned} \left( \mathbf {T}+ \boldsymbol{w}\mathbf {I}\right) \cdot \boldsymbol{c} = \boldsymbol{y} , \end{aligned}$$
(4)

where \(\boldsymbol{w}\mathbf {I}\) denotes the diagonal matrix whose i-th diagonal entry is \(w_i\),

$$\begin{aligned} \mathbf {T}= \begin{bmatrix} E[\phi _1\cdot \phi _1] &{} E[\phi _1\cdot \phi _3] &{} \dots &{} E[\phi _1\cdot \phi _n] \\ E[\phi _3\cdot \phi _1] &{} E[\phi _3\cdot \phi _3] &{} \dots &{} \vdots \\ \vdots &{} &{} \ddots &{} \vdots \\ E[\phi _n\cdot \phi _1] &{} E[\phi _n\cdot \phi _3] &{} \dots &{} E[\phi _n\cdot \phi _n] \end{bmatrix} \text {, and } \boldsymbol{y} = \begin{bmatrix} E[f_\textsf {mod}\cdot \phi _1]\\ E[f_\textsf {mod}\cdot \phi _3]\\ \vdots \\ E[f_\textsf {mod}\cdot \phi _n]\\ \end{bmatrix} . \end{aligned}$$

\(E[\phi _i\cdot \phi _j]\) and \(E[f_\textsf {mod}\cdot \phi _i]\) are integrals of polynomials, which are easily calculated. Also, the system can be simplified by the linear transformation from the monomial basis to \(\phi \), and thus the approximation of other functions is readily obtained.

   \(\square \)
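Putting the proof together, the following sketch assembles \(\mathbf {T}\) and \(\boldsymbol{y}\) by quadrature over the intervals \(I_i\) with an odd Chebyshev basis and solves (4); the pmf \(\mathrm {Pr}_{I}\), the weights \(w_i\), and the grid sizes are illustrative assumptions rather than our actual bootstrapping parameters.

```python
import numpy as np

K, eps = 4, 2.0**-6
odd_deg = np.arange(1, 32, 2)                       # odd Chebyshev degrees 1, 3, ..., 31
pr_I = {0: 0.5, 1: 0.2, -1: 0.2, 2: 0.04, -2: 0.04, 3: 0.01, -3: 0.01}  # assumed pmf

def phi(d, t):
    # Chebyshev T_d rescaled so the whole domain |t| < K maps into [-1, 1]
    return np.cos(d * np.arccos(np.clip(t / K, -1.0, 1.0)))

# Riemann-sum quadrature over the union of intervals I_i, weighted by Pr_T.
nodes, weights = [], []
for i, p in pr_I.items():
    r = np.linspace(-eps, eps, 257)
    nodes.append(i + r)
    weights.append(np.full(r.size, p / r.size))     # Pr_I(i) * (uniform Pr_R)
t = np.concatenate(nodes)
pw = np.concatenate(weights)

Phi = np.stack([phi(d, t) for d in odd_deg], axis=1)
f = t - np.rint(t)                                  # f_mod(t)
T_mat = Phi.T @ (Phi * pw[:, None])                 # entries E[phi_i * phi_j]
y = Phi.T @ (pw * f)                                # entries E[f_mod * phi_i]
w = np.full(odd_deg.size, 1e-14)                    # basis-error weights (assumed)
c = np.linalg.solve(T_mat + np.diag(w), y)
print(np.abs(Phi @ c - f).max())                    # residual of the fit on the grid
```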

4.5 Statistical Characteristics of Modular Reduction

The input distribution of the proposed approximate polynomial, represented by \(\mathrm {Pr}_{I}\) and \(\mathrm {Pr}_{R}\), is required to find \(\mathbf {T}\) and \(\boldsymbol{y}\). Unfortunately, in HE, it is not always possible to utilize the message distribution, as it might be related to security. However, we observe and show that the major part of the input distribution of the approximate polynomial is unrelated to security.

After ModRaise, the plaintext in the ciphertext \(\textsf {ct}=(b, a)\) is given as

$$\begin{aligned} t(X)&= q \cdot I(X) + m(X) = \langle \textsf {ct}, \textsf {sk}\rangle \left( \mathrm {mod}~X^N + 1\right) , \end{aligned}$$

where \(\textsf {sk}\) has Hamming weight h and each coefficient of the ciphertext (b, a) is an element of \(\mathbb {Z}_q\). The RLWE assumption states that a ciphertext is uniformly distributed over \(\mathcal {R}_q^2\), and thus each coefficient of b and a is distributed uniformly at random. Consequently, each coefficient of \(b+a\cdot s\) is a sum of \(h+1\) independent and identically distributed uniform random variables and thus follows the well-known Irwin-Hall distribution.

We note that one can exploit the distribution of I without security concerns. This is because the probability distribution \(\mathrm {Pr}_{I}\) is given by the RLWE assumption (that b and a are uniformly distributed), regardless of the message distribution. Also, the implementation results in Sect. 6 show that we can achieve high approximation accuracy with the proposed approximate polynomial using \(\mathrm {Pr}_{I}\), even if we assume the worst case for \(\mathrm {Pr}_{R}\).

Table 1. Experimental result and theoretical probability mass function of I when \(h=192\)

We can obtain the distribution of I numerically or derive it analytically. Table 1 shows the probability mass function of I, obtained numerically using \(\texttt {SEAL}\) and derived analytically using the Irwin-Hall distribution. The experimental results and our probability analysis using the Irwin-Hall distribution agree. In previous works, a heuristic assumption was used, and a high-probability upper bound \(K=O(\sqrt{h})\) for \(\Vert I\Vert _\infty \) was used for polynomial approximation [3, 9, 18, 24], but they could not utilize the distribution of I.
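The following Monte Carlo sketch reproduces this analysis for \(h=192\): each coefficient of I behaves like the rounded, centered sum of \(h+1\) i.i.d. uniform variables, which is matched closely by the Gaussian approximation of the Irwin-Hall distribution with variance \((h+1)/12\).

```python
import numpy as np
from math import erf, sqrt

h, trials = 192, 1_000_000
rng = np.random.default_rng(2)
s = np.zeros(trials)
for _ in range(h + 1):                 # sum of h+1 iid Uniform(0,1) samples
    s += rng.random(trials)
i_vals = np.rint(s - (h + 1) / 2).astype(int)   # center and round -> I

sigma = sqrt((h + 1) / 12)             # Irwin-Hall variance is (h+1)/12
for i in range(5):                     # Pr(I = i); the pmf is symmetric in i
    mc = np.mean(i_vals == i)
    th = 0.5 * (erf((i + 0.5) / (sigma * sqrt(2))) - erf((i - 0.5) / (sigma * sqrt(2))))
    print(i, round(mc, 4), round(th, 4))
```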

For \(\mathrm {Pr}_{R}\), we can assume the worst-case scenario: the message m(X) is uniformly distributed subject to \(\Vert m\Vert _\infty <\epsilon \cdot q\), as this yields the largest entropy of the message. The experimental results in Sect. 6 show that even though the worst-case scenario is used and the distribution of m(X) differs from the actual one, the error of the proposed method is comparable to the prior works [3, 24] while consuming less depth. Also, the experiment in [3] used a uniformly distributed message to simulate the bootstrapping error and exploited the fact that m(X) is highly likely to be concentrated near the center in order to use a small-degree arcsine Taylor expansion. We note that we can also heuristically assume a specific distribution for \(\mathrm {Pr}_{R}\) in (1) in our bootstrapping and improve the precision.

5 Lazy Baby-Step Giant-Step Algorithm

This section proposes error and complexity optimizations for evaluating the error variance-minimizing approximate polynomial in bootstrapping. There are two optimizations. First, we show that the error variance-minimizing approximate polynomial is odd, and thus we can ignore the even-degree terms. Second, we propose a novel evaluation algorithm, namely the lazy-BSGS algorithm, to reduce the computational complexity of \(\textsc {EvalMod}\).

5.1 Reducing Error and Complexity Using Odd Function Property

When the approximate polynomial is an odd function, we can save time both in homomorphically evaluating the polynomial and in finding it. Moreover, by omitting the even-degree terms, we can reduce the approximation error and the basis errors.

Error Variance-Minimizing Polynomial for an Odd Function. This subsection shows that the variance-minimizing polynomial is an odd function when \(\mathrm {Pr}_{{{T}}}\left( {t} \right) \) is even. Using an odd polynomial, we can reduce the approximation error and the computation time to find the proposed approximate polynomial. First, we only need to integrate over the positive domain when obtaining each element of (4). Second, the number of operations to evaluate the approximate polynomial can be reduced by omitting the even-degree terms when using the lazy-BSGS algorithm (Algorithm 2) in the following subsection. Finally, the basis error is also reduced, as only half of the terms are added.

The following theorem shows that when the objective of polynomial approximation such as \(f_\textsf {mod}(t)\) is odd and the probability density function is even, the error variance-minimizing approximate polynomial is also an odd function.

Theorem 2

If \(\mathrm {Pr}_{{{T}}}\left( {t} \right) \) is an even function and f(t) is an odd function, the error variance-minimizing approximate polynomial for f(t) is an odd function.

Proof

Existence and uniqueness: Equation (3) is a convex quadratic function of the coefficients \(\boldsymbol{c}\), and thus it has exactly one solution.

Oddness: Let \(P_m\) be the subspace of the polynomials of degree at most m and \(f_m(t)\) denote the unique element in \(P_m\) that is closest to f(t) in terms of the variance of difference. Then, \(Var[-f(-t) - p(t)] + \sum w_i c_i^2\) is minimized when \(p(t) = -f_m(-t)\), because

$$\begin{aligned} Var\left[ -f(-t) - p(t)\right]&= \int _{t} (-f(-t) - p(t))^2\cdot Pr(t) dt\\&= \int _{u} (f(u) + p(-u))^2\cdot Pr(-u) du\\&= \int _{u} \left( f(u) - (-p(-u))\right) ^2\cdot Pr(u) du\\&= Var\left[ f(t) - (-p(-t))\right] , \end{aligned}$$

and the squares of coefficients of \(f_m(t)\) and \(-f_m(-t)\) are the same. As the error variance-minimizing approximate polynomial is unique, we conclude \(f_m(t)= -f_m(-t)\).

   \(\square \)

Algorithm 2. The lazy-BSGS algorithm.

5.2 Lazy Baby-Step Giant-Step Algorithm

In this subsection, we propose a new algorithm that efficiently evaluates arbitrary polynomials over the CKKS scheme, namely the lazy-BSGS algorithm in Algorithm 2, and we extend it to odd polynomials. We apply the lazy relinearization and rescaling techniques [1, 2, 8, 22] to the BSGS algorithm to improve its time complexity and error performance. For example, when we evaluate a polynomial of degree 711 using the ordinary BSGS algorithm in [18], 58 non-scalar multiplications are required, whereas the odd-BSGS algorithm [24] requires 46. Moreover, the lazy relinearization method reduces the number of relinearizations to 33, which is the same number of relinearizations as for a polynomial of degree 220 using the ordinary BSGS algorithm.

Relinearization and rescaling introduce additional errors in the CKKS scheme, and these errors propagate through homomorphic operations. Hence, we should delay relinearization and rescaling to reduce the error of the resulting ciphertext. Moreover, these operations, especially relinearization, require many NTTs and thus much computation. For some circuits, we can reduce the numbers of relinearizations and rescalings by delaying them. We observe that we can perform plaintext addition, ciphertext addition, and scalar multiplication on a ciphertext before relinearization.

A ciphertext before relinearization is a three-tuple \((d_0, d_1, d_2) \in \mathcal {R}_{Q_L}^3\) such that \(\langle {(d_0, d_1, d_2), (1,s,s^2)}\rangle \) \(=m+e\). A plaintext \(u\in \mathcal {R}\) can be multiplied homomorphically by calculating \((u\cdot d_0, u\cdot d_1, u\cdot d_2)\), but we note that the error is amplified by the magnitude of u. When we add a ciphertext (b, a) to \((d_0, d_1, d_2)\), we get \((d_0 +b, d_1+a, d_2)\). However, as the scaling factor of a ciphertext changes along with homomorphic operations, we should ensure that the scaling factors of the two ciphertexts are identical when we add them. If not, we can multiply the ciphertext with the smaller scaling factor by the constant \(\varDelta _1 / \varDelta _2\) and then add, where \(\varDelta _1\) is the larger scaling factor and \(\varDelta _2\) is the smaller one. Alternatively, we can use the scaling factor management technique proposed in [21].

We propose the lazy-BSGS algorithm, which reduces the numbers of rescalings and relinearizations, and we analyze its computational complexity. We rigorously analyze the number of relinearizations, as its complexity is much higher than that of other operations; the number of rescalings behaves similarly. As we use the Chebyshev polynomials of the first kind as the polynomial basis, we explain the lazy-BSGS algorithm with Chebyshev polynomials. For the sake of brevity, we denote ciphertext-ciphertext multiplication by \(\cdot \), and the ciphertext of \(T_j(t_0)\) is denoted by \(\textsf {ct}_j\), where \(T_j\) is the Chebyshev polynomial of the first kind of degree j.

\(\textsc {SetUp}\) finds all the Chebyshev polynomials of degree less than or equal to k, and \(T_{2^i k}\) for \(i< l\), for given parameters k and l. We use \(T_{a} = 2\cdot T_{2^i}\cdot T_{a - 2^i} - T_{2^{i+1}- a}\) to find \(\textsf {ct}_a\), where \(i=\left\lfloor \log (a) \right\rfloor \). We note that one can alternatively use a product of odd-degree polynomials to reduce the basis error, as presented in Subsect. 5.4.

First, we find \(\textsf {ct}_{2^i}\) for \(2^i < k\), and these are used to find the other Chebyshev bases with degrees less than k. Thus, we rescale and relinearize them, which requires \(\left\lfloor \log (k-1) \right\rfloor \) rescalings and relinearizations. When calculating \(\textsf {ct}_{a} = 2\cdot \textsf {ct}_{2^i}\cdot \textsf {ct}_{a - 2^i} - \textsf {ct}_{2^{i+1}- a}\), if \(\textsf {ct}_{a - 2^i}\) is a three-tuple ciphertext, we relinearize it (and rescale it if needed). We note that lazy rescaling makes it possible to accurately subtract \(\textsf {ct}_{2^{i+1}- a}\) from \(2\cdot \textsf {ct}_{2^i}\cdot \textsf {ct}_{a - 2^i}\) without level consumption as follows. We do not rescale \(\textsf {ct}_{2^i}\cdot \textsf {ct}_{a - 2^i}\) here, and thus the scaling factor of \(2\cdot \textsf {ct}_{2^i}\cdot \textsf {ct}_{a - 2^i}\) is maintained at \(\approx q^2\). Obviously, the level of \(\textsf {ct}_{2^{i+1}-a}\) is larger than that of \(\textsf {ct}_{2^i}\cdot \textsf {ct}_{a - 2^i}\). When their scaling factors are different, we multiply \(\textsf {ct}_{2^{i+1}-a}\) by \(\left( \varDelta _{2^i}\cdot \varDelta _{a - 2^i} \right) \cdot p_l / \varDelta _{2^{i+1}-a}\) and rescale if \(\varDelta _{2^{i+1}-a}\approx q^2\), or multiply it by \(\left( \varDelta _{2^i}\cdot \varDelta _{a - 2^i} \right) / \varDelta _{2^{i+1}-a}\) if \(\varDelta _{2^{i+1}-a}\approx q\), where \(\varDelta _{j}\) denotes the scaling factor of \(\textsf {ct}_j\), and \(p_l\) is the last prime of the modulus chain for \(\textsf {ct}_{a - 2^i}\). Now, the scaling factors of \(\textsf {ct}_{2^{i+1}-a}\) and \(2 \textsf {ct}_{2^i}\cdot \textsf {ct}_{a - 2^i}\) are the same, and thus we can subtract them without the additional error that a difference in scale would cause.

To evaluate \(\textsf {ct}_{2^i}\cdot \textsf {ct}_{a - 2^i}\), we need to relinearize \(\textsf {ct}_{a - 2^i}\) if it is not relinearized yet. Hence, we need relinearized \(\textsf {ct}_j\)'s for \(j < 2^{\left\lfloor \log (k-1) \right\rfloor -1}\) to find \(\textsf {ct}_i\) for all \(i < 2^{\left\lfloor \log (k-1) \right\rfloor }\). Moreover, if \(k \ge 2^{\left\lfloor \log (k-1) \right\rfloor } +2^{\left\lfloor \log (k-1) \right\rfloor -1}\), we need \(k - 3\cdot 2^{\left\lfloor \log (k-1) \right\rfloor -1}\) more relinearizations. Each \(\textsf {ct}_{2^ik}\) should be relinearized as it is used for multiplication in \(\textsc {GiantStep}\), which requires l relinearizations. In conclusion, we perform

$$\left\lfloor \log (k-1) \right\rfloor + (2^{\left\lfloor \log {k-1} \right\rfloor -1}-1) + l $$

relinearizations in \(\textsc {SetUp}\). If \(k \ge 2^{\left\lfloor \log {k-1} \right\rfloor } +2^{\left\lfloor \log {k-1} \right\rfloor -1}\), \(\left( k - 3\cdot 2^{\left\lfloor \log {k-1} \right\rfloor -1} \right) \) additional relinearizations are required.

\(\textsc {BabyStep}\) performs only plaintext multiplications and additions. Hence, it does not require relinearization in our lazy-BSGS algorithm; the baby-step polynomial coefficients should be scaled so that the added ciphertexts have identical scaling factors, but this does not involve any additional computation. Note that the resulting ciphertext of \(\textsc {BabyStep}\) is not relinearized, i.e., it has size 3.

In \(\textsc {GiantStep}\), the quotient ciphertext \(\textsf {ct}_q\) is relinearized before being multiplied by \(\textsf {ct}_{2^ik}\). Hence, the number of relinearizations is \(2^{l-1} + 2^{l-2} + \dots + 1 = 2^{l} -1\), and the final result is not relinearized. Thus, we perform relinearization once more right before \(\textsc {SlotToCoeff}\).

Finally, the number of relinearizations in lazy-BSGS is

$$\left\lfloor \log (k-1) \right\rfloor + (2^{\left\lfloor \log {k-1} \right\rfloor -1}-1) + l + 2^l$$

if \(k < 2^{\left\lfloor \log {k-1} \right\rfloor } +2^{\left\lfloor \log {k-1} \right\rfloor -1}\) and otherwise

$$\left\lfloor \log (k-1) \right\rfloor + \left( 2^{\left\lfloor \log {k-1} \right\rfloor -1}-1 \right) + l + 2^l + \left( k - 3\cdot 2^{\left\lfloor \log {k-1} \right\rfloor -1} \right) .$$

Lazy-BSGS for Odd Polynomial. We can naturally extend lazy-BSGS to odd polynomials. Here, \(\textsc{SetUp}\) finds all the odd-degree Chebyshev polynomials of degrees less than k. To find an odd-degree Chebyshev polynomial, we need an even-degree Chebyshev polynomial, because the product of two odd-degree Chebyshev polynomials is not an odd-degree polynomial. Hence, we use \(T_{2^i}\) to find \(\textsf{ct}_a\), where \(i=\left\lfloor \log (a) \right\rfloor \), and thus we rescale and relinearize these bases, which requires \(\left\lfloor \log (k-1) \right\rfloor \) rescalings and relinearizations. Thus, the number of relinearizations in lazy-BSGS for odd polynomials is

$$\left\lfloor \log (k-1) \right\rfloor + (2^{\left\lfloor \log {k-1} \right\rfloor -1}/2-1) + l + 2^l$$

if \(k < 2^{\left\lfloor \log {k-1} \right\rfloor } +2^{\left\lfloor \log {k-1} \right\rfloor -1}\) and otherwise

$$\left\lfloor \log (k-1) \right\rfloor + \left( 2^{\left\lfloor \log {k-1} \right\rfloor -1}/2-1 \right) + l + 2^l + \left( k - 3\cdot 2^{\left\lfloor \log {k-1} \right\rfloor -1} \right) /2.$$
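For concreteness, the four counts above can be tabulated with a short script; the following sketch simply transcribes the formulas (with the floor of the logarithm written out explicitly), so any conventions follow the text rather than an independent derivation.

```python
from math import floor, log2

def lazy_bsgs_relins(k: int, l: int, odd: bool = False) -> int:
    """Relinearization count of lazy-BSGS per the formulas above:
    floor(log(k-1)) power-of-two bases, the relinearized low-degree bases in
    SetUp, l giant-step bases T_{2^i k}, and 2^l = (2^l - 1) + 1 for GiantStep
    plus the final relinearization before SlotToCoeff."""
    f = floor(log2(k - 1))
    setup = 2 ** (f - 1) - 1
    extra = (k - 3 * 2 ** (f - 1)) if k >= 2 ** f + 2 ** (f - 1) else 0
    if odd:  # the odd-polynomial variant halves the SetUp-related terms
        setup = 2 ** (f - 1) // 2 - 1
        extra //= 2
    return f + setup + l + 2 ** l + extra
```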

Using the error variance-minimizing approximate polynomial in bootstrapping requires evaluating a polynomial of higher degree than the previous composition methods. However, the lazy-BSGS algorithm reduces the time complexity by half compared to the ordinary BSGS algorithm mentioned in Sect. 2. As a result, the lazy-BSGS algorithm makes the time complexity of evaluating our polynomial comparable to the previous algorithms. Figure 1 compares our lazy-BSGS algorithm, the odd-BSGS algorithm [24], and the original BSGS algorithm [18].

Fig. 1. Number of relinearizations for the variants of BSGS algorithms.

The lazy-BSGS algorithm is given in detail in Algorithm 2. We note that the methods in [3] should be applied for optimal depth and scale-invariant evaluation, but we omit them for brevity. However, we note that Fig. 1 accounts for the depth optimization in [3], and thus the number of relinearizations is high when the degree is close to a power of two. The BSGS coefficients are pre-computed for the optimal parameters k and l to minimize the complexity.

5.3 Error Variance-Minimizing Approximate Polynomial for BSGS Algorithm

In this subsection, we propose a method to find the variance-minimizing approximate polynomial for the odd-BSGS algorithm. We generalize the amplified basis error and find the variance-minimizing coefficients for the odd-BSGS algorithm. A numerical method to select the weight constant is also proposed.

BSGS Algorithm Coefficients and Minimizing the Approximation Error Variance. In the lazy-BSGS algorithm, we divide the given polynomial by \(T_{2^ik}\) and evaluate its quotient and remainder. Hence, each polynomial basis is multiplied by a divided coefficient rather than \(c_i\). We define \(\boldsymbol{d}\) as the vector of coefficients multiplied to the bases in \(\textsc{BabyStep}\); in other words, we have \(2^l\) polynomials in \(\textsc{BabyStep}\) such that \(p_i^{\textsf{baby}}(t) = \sum _{j\in \{1,3,\dots ,k-1\}}{d_{i,j}T_j(t)}\) for \(i=0, 1,\dots , 2^l-1\), and \(\boldsymbol{d} = (d_{0,1}, d_{0,3}, \dots , d_{2^l-1,\textsf{deg}-k\cdot 2^{l-1}})\).

We should reduce the magnitude of \(\boldsymbol{d}\) to reduce the basis error. Let \(p(t) = \sum c_i T_i(t)\); then \(\boldsymbol{c}\) and \(\boldsymbol{d}\) are related by the following linear map:

$$\begin{aligned} \boldsymbol{c} = \mathbf{L} \cdot \boldsymbol{d} = \begin{bmatrix} \mathbf{A}_{2^{l-1}k} \end{bmatrix} \cdot \begin{bmatrix} \mathbf{A}_{2^{l-2}k} & \boldsymbol{0}\\ \boldsymbol{0} & \mathbf{A}_{2^{l-2}k} \end{bmatrix} \cdots \begin{bmatrix} \mathbf{A}_{k} & & \\ & \ddots & \\ & & \mathbf{A}_{k} \end{bmatrix} \cdot \boldsymbol{d}, \end{aligned}$$
(5)

where

$$\begin{aligned} \mathbf{A}_k = \begin{bmatrix} \mathbf{I}_{k/2} & \frac{1}{2}\mathbf{J}_{k/2}\\ \boldsymbol{0} & \frac{1}{2}\mathbf{I}_{k/2} \end{bmatrix}, \end{aligned}$$

\(\mathbf{I}_{k/2}\) is the \({k/2}\times {k/2}\) identity matrix, and \(\mathbf{J}_{k/2}\) is the \({k/2}\times {k/2}\) exchange matrix. Hence, the linear equation (4) for the error variance-minimizing approximate polynomial is modified for the BSGS algorithm as

$$\begin{aligned} \left( \mathbf{L}^\intercal \mathbf{T}\mathbf{L} + w\mathbf{I} \right) \cdot \boldsymbol{d} = \mathbf{L}^\intercal \boldsymbol{y} . \end{aligned}$$
(6)
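As a reference implementation of Eq. (5), the sketch below materializes \(\mathbf{A}_k\) and the product \(\mathbf{L}\) with numpy, following the paper's convention that \(\mathbf{A}_k\) is \(k\times k\) with \(k/2\)-sized blocks; the odd-polynomial variant would restrict to odd-indexed rows and columns.

```python
import numpy as np

def A(k: int) -> np.ndarray:
    """k x k matrix [[I, J/2], [0, I/2]] with k/2-sized blocks, as in Eq. (5)."""
    h = k // 2
    top = np.hstack([np.eye(h), np.fliplr(np.eye(h)) / 2])  # J = exchange matrix
    bot = np.hstack([np.zeros((h, h)), np.eye(h) / 2])
    return np.vstack([top, bot])

def build_L(k: int, l: int) -> np.ndarray:
    """L = A_{2^(l-1) k} . diag(A_{2^(l-2) k}, .) ... diag(A_k, ..., A_k)."""
    n = 2 ** (l - 1) * k  # dimension of every factor in the product
    L = np.eye(n)
    for i in range(l - 1, -1, -1):  # factors from the largest block down to A_k
        m = 2 ** i * k
        L = L @ np.kron(np.eye(n // m), A(m))
    return L
```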

Generalization of Weight Constant. Let \(E_p\) be a function of \(\boldsymbol{d}\) giving the variance of the basis error amplified by the BSGS algorithm. We simplify \(E_p\) under the heuristic assumption that the \(T_i\)'s are independent and the encryptions of \(T_{k}(t), \dots , T_{2^{l-1}k}(t)\) have small errors. Let \(\hat{T}_i\) be the product of all \(T_{2^j k}\)'s multiplied to \(p_i\) in the giant step; for example, \(\hat{T}_0 = 1\) and \(\hat{T}_3 = T_k T_{2k}\). Considering the error multiplied by \(d_{i,j}\), the dominant term is \(e_j\cdot \hat{T}_i\), since \(T_i\) is an odd polynomial and thus has zero mean for odd i. Thus, we can say that

$$E_p \approx \sum _{i}\sum _{j} d_{i,j}^2 E[\hat{T}_i^2] Var[e_{\textsf {basis},j}],$$

a quadratic function of \(\boldsymbol{d}\). In other words, we have \(E_p = \boldsymbol{d}^\intercal \mathbf{H}\boldsymbol{d},\) where \(\mathbf{H}\) is a diagonal matrix with \( \mathbf{H}_{ki+j, ki+j} = E[\hat{T}_i^2] Var[e_{\textsf{basis},j}] . \) Thus, (3) is generalized as

$$\begin{aligned} \boldsymbol{c}^* = {\arg \min }_{\boldsymbol{c}}\left( Var[e_\textsf {aprx}]+E_p \right) . \end{aligned}$$

By Equation (5), the optimal coefficient \(\boldsymbol{d}^*\) satisfies

$$\begin{aligned} \left( \mathbf {L}^\intercal \mathbf {T}\mathbf {L} + \mathbf {H} \right) \boldsymbol{d}^* = \mathbf {L}^\intercal \boldsymbol{y} . \end{aligned}$$
(7)
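Under the heuristic above, \(\mathbf{H}\) is diagonal and can be assembled directly from estimates of \(E[\hat{T}_i^2]\) and \(Var[e_{\textsf{basis},j}]\). A minimal sketch, assuming both input arrays have been estimated beforehand (e.g., by simulation):

```python
import numpy as np

def build_H(That_sq, basis_var) -> np.ndarray:
    """Diagonal H with H[k*i + j, k*i + j] = E[That_i^2] * Var[e_basis,j],
    so that E_p = d^T H d.  That_sq has one entry per giant-step product
    That_i, and basis_var one entry per baby-step basis T_j."""
    return np.diag(np.outer(np.asarray(That_sq), np.asarray(basis_var)).ravel())
```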

Numerical Method of Finding Optimal Approximate Polynomial. Instead of deriving \(E_p\), a simple numerical method can also be used; in practice, it shows good error performance in the implementation in Subsect. 6.1. We let \(w_i = w\) for all i and find w numerically. As w increases, the magnitude of the coefficients decreases and \(Var[e_\textsf{aprx}]\) increases, and thus their sum is a convex function of w. The magnitude of the basis errors amplified by the coefficients \(\boldsymbol{d}\) is of the order of the rescaling error, whose variance is \(\frac{2n(h+1)}{12\cdot q^2}\), where n is the number of slots. In other words, we adjust w to minimize

$$\begin{aligned} Var[e_\textsf {aprx}]+ w \cdot \Vert \boldsymbol{d}\Vert _2^2 , \end{aligned}$$
(8)

where \(w \approx \frac{2n(h+1)}{12\cdot q^2}\). The odd-BSGS coefficients \(\boldsymbol{d}\), which minimize (8), satisfy

$$\begin{aligned} (\mathbf {L}^\intercal \mathbf {T}\mathbf {L} + w\mathbf {I}) \boldsymbol{d} = \mathbf {L}^\intercal \boldsymbol{y}. \end{aligned}$$
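In either variant, finding \(\boldsymbol{d}\) is a ridge-regularized least-squares solve. A minimal sketch, assuming the Gram matrix \(\mathbf{T}\) and the target vector \(\boldsymbol{y}\) from (4) are available as numpy arrays; replacing the term w * I with \(\mathbf{H}\) from the previous sketch solves (7) instead:

```python
import numpy as np

def solve_d(L: np.ndarray, T: np.ndarray, y: np.ndarray, w: float) -> np.ndarray:
    """Solve (L^T T L + w I) d = L^T y for the odd-BSGS coefficients d."""
    G = L.T @ T @ L + w * np.eye(L.shape[1])
    return np.linalg.solve(G, L.T @ y)
```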

Lemma 1

(Rescaling error [9]). The variance of the rescaling error is \(\frac{2n(h+1)}{12}\), where h is the key Hamming weight and n is the number of slots.

We can fine-tune w numerically: perform bootstrapping, measure the bootstrapping error variance, and adjust w accordingly. Once we decide on \(\boldsymbol{d}\), it becomes just part of the implementation; one can even hard-wire it.
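Since the objective is convex in w and the optimum sits near the rescaling-error variance, the fine-tuning can be as simple as a grid search around \(w_0 = \frac{2n(h+1)}{12\cdot q^2}\). In the sketch below, solve(w) returns \(\boldsymbol{d}\) for a given weight (e.g., the solver above) and measure_boot_var(d) runs bootstrapping and returns the measured error variance; both callbacks are assumptions for illustration, not library functions.

```python
import numpy as np

def fine_tune_w(solve, measure_boot_var, w0: float,
                span: float = 3.0, steps: int = 13) -> float:
    """Grid-search w over [w0 * 2^-span, w0 * 2^span] and return the value
    minimizing the measured bootstrapping error variance."""
    candidates = w0 * np.logspace(-span, span, steps, base=2.0)
    variances = [measure_boot_var(solve(w)) for w in candidates]
    return float(candidates[int(np.argmin(variances))])
```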

Fig. 2. Variance of basis error in \(T_i(t)\) for even i using the \(\texttt{HEAAN}\) (a) and \(\texttt{SEAL}\) (b) libraries with various parameters, where \(h=64\).

5.4 Basis Error Variance Minimization for Even-Degree Terms

In this subsection, we show that the even-degree Chebyshev polynomial bases in CKKS have large errors and propose a method to compute Chebyshev bases with small error. In the BSGS algorithm, we use even-degree Chebyshev polynomials, namely \(T_{2^ik}(t)\). For depth efficiency and simplicity, we usually obtain \(T_{a}(t)\) by using

$$\begin{aligned} T_{a}(t) = 2\cdot T_{2^i}(t)\cdot T_{a - 2^i}(t) - T_{2^{i+1}- a}(t), \end{aligned}$$

where \(i=\left\lfloor \log (a) \right\rfloor \). Let \(\textsf{ct}_i\) be the ciphertext of message \(T_i(t)\) with scaling factor \(\varDelta \), containing error \(e_{\textsf{basis},i}\). Then, the error in \(\textsf{ct}_{i+j}\) obtained by \(\textsf{ct}_{i+j} = 2\textsf{ct}_{i}\cdot \textsf{ct}_{j} -\textsf{ct}_{|i-j|}\) is given as

$$\begin{aligned} (2T_{i}(t)e_{\textsf {basis},j} + 2T_{j}(t)e_{\textsf {basis},i})\varDelta + 2e_{\textsf {basis},i}e_{\textsf {basis},j} - e_{\textsf {basis},|i-j|} . \end{aligned}$$
(9)

Since \(\varDelta \gg e_{\textsf{basis},i}, e_{\textsf{basis},j}\), the dominant term of the error variance in (9) is

$$\begin{aligned} Var[2T_{i}(t) e_{\textsf{basis},j} + 2T_{j}(t) e_{\textsf{basis},i}] \approx 4E[T_{i}(t)^2]Var[e_{\textsf{basis},j}] + 4E[T_{j}(t)^2]Var[e_{\textsf{basis},i}] . \end{aligned}$$
(10)

As a simple example, \(E[T_{i}(t)^2]\) is close to one when i is even, at least for low-degree polynomials, where t is the value after \(\textsc{CoeffToSlot}\). Meanwhile, \(E[T_{i}(t)]\) is zero and \(Var[T_{i}(t)]\) is small when i is odd. Thus, following (10), the error remains large when it is multiplied by an even-degree Chebyshev polynomial in the calculation of the next Chebyshev polynomial. Therefore, when a is even, \(\textsf{ct}_{a}\) should be calculated by \(\textsf{ct}_{a} = 2\textsf{ct}_{2^i - 1}\cdot \textsf{ct}_{a + 1-2^i} - \textsf{ct}_{2^{i+1} - 2 - a}\) rather than \(\textsf{ct}_{a} = 2\textsf{ct}_{2^i}\cdot \textsf{ct}_{a-2^i} - \textsf{ct}_{2^{i+1} - a}\). It is also noted that, for the above reasons, the power-of-two bases are expected to have a large basis error.
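The reordering only changes which product is used when a is even. Using the identity \(2T_iT_j = T_{i+j} + T_{|i-j|}\), the index selection can be written compactly as in the following sketch, with \(2^i\), \(i = \lfloor \log a\rfloor\), computed explicitly:

```python
from math import floor, log2

def cheb_indices(a: int):
    """Return (i, j, s) with T_a = 2*T_i*T_j - T_s.  For even a, both factors
    are chosen odd, so only the small zero-mean odd-basis errors are amplified."""
    p = 2 ** floor(log2(a))  # 2^i with i = floor(log a)
    if a % 2 == 0:
        return p - 1, a + 1 - p, abs(2 * p - 2 - a)  # reordered: both factors odd
    return p, a - p, abs(2 * p - a)                  # standard decomposition
```

For example, cheb_indices(74) returns (63, 11, 52), i.e., \(T_{74} = 2T_{63}T_{11} - T_{52}\), with both factors of odd degree.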

Table 2. The second moment of \(T_i(t)\) when t is the value after \(\textsc{CoeffToSlot}\) and \(N=2^{15}\)

Figure 2 shows the experimental variance of the error in the encryption of \(T_i(t)\) for even i, where t is the output value of \(\textsc{CoeffToSlot}\). Square and x marks show the results with and without operation reordering, respectively; in other words, the square marks are results from \(\textsf{ct}_a = 2 \textsf{ct}_{2^i-1}\cdot \textsf{ct}_{a - 2^i+1} - \textsf{ct}_{2^{i+1} - 2 -a}\) for even a. The experimental results in Fig. 2 support our argument that multiplying even-degree Chebyshev polynomials amplifies the error, and we can see that the basis error is significantly reduced by reordering operations. For example, the variance of the error in \(\textsf{ct}_{74}\) is reduced by a factor of 1973 compared to that without reordering (Fig. 2).

6 Performance Analysis and Comparison

In this section, several implementation results and comparisons with the previous bootstrapping algorithms are presented. The bootstrapping using the proposed approximate polynomial is implemented on the well-known HE library \(\texttt{Lattigo}\), as \(\texttt{Lattigo}\) is the only open-source library that supports bootstrapping of RNS-CKKS at the time of writing. We also provide a proof-of-concept implementation of high-precision bootstrapping, reaching 93 bits, based on the \(\texttt{HEAAN}\) library.

6.1 Error Analysis

Weight Parameter and Approximation Error. In Subsect. 5.3, we discussed analytic and numerical solutions for the error variance-minimizing approximate polynomial. In this subsection, these methods are implemented and verified. We confirm that the numerical method in Subsect. 5.3 finds a polynomial very close to the optimal one, with only a slightly larger error, and that the optimal weight is \(w \approx \frac{(h+1) 2n }{q^2 12}\), where n is the number of slots.

Fig. 3. Theoretical variance of the approximation error and the amplified basis error, together with experimental results implemented in \(\texttt{HEAAN}\); polynomials of degree 81 are used. (Color figure online)

The experimental results are shown in Fig. 3 with parameters \(N = 2^{16}, h = 64\), and slot size \(n = 2^3\). The blue lines with triangle marks show the polynomial approximation error as \(2n \cdot q^2 \cdot Var[e_\textsf{aprx}]\). The green lines with x marks show the amplified basis errors as \(2n \cdot q^2 \cdot E_p\), and the red lines with square marks are the mean square of the unscaled bootstrapping errors obtained by experiments using the proposed approximate polynomial in (8). The gray dotted line is the variance of the unscaled bootstrapping error achieved by the analytic solution (7) of the error variance-minimizing approximate polynomial of the same degree, which is the lower bound of the bootstrapping error variance. The factor 2n comes from \(\textsc{SlotToCoeff}\), as discussed in Subsect. 4.2. As a worst-case assumption, we assume that m is distributed uniformly at random.

In Fig. 3, the sum of the blue lines (triangle marks) and the green lines (x marks) meets the red lines (square marks); in other words, the theoretical derivation and the experimental results agree. It can also be seen that the proposed numerical method yields an approximate polynomial with a small error, though slightly larger than that of the analytical solution. It is noted that the optimal w is close to the variance of the rescaling error, \(\frac{(h+1) 2n }{q^2 12}\).

Polynomial Degree and Minimum Error. This subsection presents the experimental approximation error variance of the proposed error variance-minimizing polynomial for a given degree and constant w. In the above paragraphs, we showed that the variance of the approximation error is minimized when \(w\approx \frac{(h+1)2n}{q^{2}12}\). Unlike the previous methods, which find the approximate polynomial without considering the CKKS parameters, the proposed approximation algorithm finds an approximate polynomial that is optimal for the given CKKS parameters, such as the number of slots, the key Hamming weight, and the scaling factor.

In Fig. 4, we plot the variance of the approximation error with \(w = \frac{(h+1)2n}{q^{2}12}\), where \(\left\| m/q \right\| _{\infty } < 2^{-5}\). The value \(w=2^{-104}\) corresponds to \(q\approx 2^{60}\) and slot size \(n=2^{14}\), and \(w=2^{-200}\) corresponds to \(q\approx 2^{109}\) for the same slot size. In this figure, we can see that the proposed method approaches the maximal accuracy of polynomial approximation for \(q\approx 2^{60}\) within depth 10. Moreover, the proposed error variance-minimizing approximate polynomial achieves an approximation error variance of \(2^{-209}\) within depth only 11.

Fig. 4. Error variance of the proposed polynomial, for \(w=2^{-104}\) and \(2^{-200}\).

Table 3. Comparison of the variance of the bootstrapping error of the proposed error variance-minimizing polynomial and prior arts. Columns "\(\cos \)" and "\(\sin ^{-1}\)" give the degree of the corresponding approximate polynomial, and "double" gives the number of applications of the double-angle formula of cosine. The proposed method uses direct approximation, so the degree of its approximate polynomial is indicated under \(f_\textsf{mod}\).

6.2 Comparison of Bootstrapping and High-Precision Bootstrapping

Experimental Result of Bootstrapping Error. The proposed method is implemented using \(\texttt{Lattigo}\) and compared with the most accurate bootstrapping techniques in the literature [3, 24] in Table 3. In this table, the proposed error variance-minimizing polynomial directly approximates \(f_\textsf{mod}\), whereas the previous methods approximate the cosine function and apply the double-angle formula. To reach the high precision of [3, 24], approximate polynomials of \(\frac{1}{2\pi }\arcsin {(t)}\), obtained by the multi-interval Remez algorithm and by Taylor expansion, respectively, are additionally evaluated, which consumes three more levels. For a fair comparison, we fix the message precision at \(\approx 31\) bits and compare the depth of the modular reduction. The timing results are measured on an Intel Xeon Silver 4210 CPU @ 2.20 GHz, single core. The scale-invariant evaluation [3] is also applied for a precise evaluation. The same parameter set as [3] is used for the proposed method, and thus the same levels are consumed for \(\textsc{CoeffToSlot}\) and \(\textsc{SlotToCoeff}\).

In the experiment, we sample each slot value \(a+bi \in \mathbb{C}\), where a and b are uniformly distributed over \([-1,1]\); thus, by the central limit theorem, the coefficients of the encoded plaintext follow a Gaussian distribution. On the other hand, the proposed error variance-minimizing approximate polynomial is obtained under the assumption that the coefficients of the plaintext are distributed uniformly at random, that is, \(\mathrm{Pr}_{{R}}\left( {r} \right) = \frac{1}{2\epsilon }\) for all \(r\in [-\epsilon ,\epsilon ]\), the worst-case assumption discussed in Sect. 4. We note that this mismatch between the message distribution assumed for the approximation and that of the actual experiment is a harsh setting for the proposed error variance-minimizing approximate polynomial.

The first three rows of Table 3 show that the proposed method requires less depth than the prior arts; this is because the previous methods rely on indirect approximation via trigonometric functions. Compared to the previous method with the same depth of 10, our method has 9-bit higher precision. Viewed differently, the proposed approximate polynomial achieves the same precision as the previous methods with a depth of only 10. The proposed bootstrapping consumes one to two fewer levels in \(\textsc{EvalMod}\); thus we used smaller parameters in the experiment, which improves security. We can utilize the additional levels depending on the application: for example, one can exploit them for efficient circuit design to reduce the total number of bootstrappings of the whole system (e.g., inference in privacy-preserving deep learning [23]), or one might speed up \(\textsc{CoeffToSlot}\) or \(\textsc{SlotToCoeff}\) using the remaining levels. However, in terms of bootstrapping runtime for \(\approx 31\)-bit precision, our method is slower than previous methods due to the evaluation of a high-degree polynomial. Our algorithm is more advantageous for higher precision as it scales efficiently, which is discussed in the next subsection.

Comparison of Numerical and Analytical Error. Experiments in Fig. 3 show that the error variance-minimizing approximate polynomial has \(Var[e_\textsf{aprx}]+ \sum w_i \boldsymbol{d}_i^2 = 2^{-103.33}\) when \(w = 2^{-104}\). From this value, the expected bootstrapping error variance follows easily. The error variance is multiplied by 2n in \(\textsc{SlotToCoeff}\); thus, the error variance after \(\textsc{SlotToCoeff}\) should be \(2^{-88.33}\). The scaling factor in bootstrapping is \(\approx q\), and thus the unscaled error is \(2^{-88.33} \cdot q^2 \approx 2^{31.67}\). The scaling factor of a message is \(\approx 2^{45}\), and thus the expected bootstrapping error variance is \(2^{31.67}/p^2 \approx 2^{-58.33}\). Compared with the experimental result in Table 3, \(2^{-61.12}\), the numerical result roughly matches the analysis. The difference seems to come from the various error-reduction methods introduced in Sect. 5.
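The chain of estimates is simple enough to check in a few lines of arithmetic, with the parameters implied by the text (\(n = 2^{14}\) slots, so \(2n = 2^{15}\); \(q \approx 2^{60}\); message scaling factor \(p \approx 2^{45}\)):

```python
from math import log2

var_poly = 2.0 ** -103.33               # Var[e_aprx] + sum w_i d_i^2 at w = 2^-104
n = 2 ** 14                             # number of slots
after_s2c = var_poly * 2 * n            # SlotToCoeff multiplies the variance by 2n
unscaled = after_s2c * (2.0 ** 60) ** 2 # remove the bootstrapping scale q ~ 2^60
boot_var = unscaled / (2.0 ** 45) ** 2  # divide by the message scale p^2

print(round(log2(after_s2c), 2))  # -88.33
print(round(log2(unscaled), 2))   #  31.67
print(round(log2(boot_var), 2))   # -58.33
```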

Scalability and High-Precision Bootstrapping. The last two rows in Table 3 present the proof-of-concept implementation of high-precision CKKS bootstrapping. In the table, we can see that the proposed method achieves high precision, such as 93 bits with \(2^{12}\) slots. It is worth noting that our method uses the same depth as the previous methods [3, 24] but achieves much higher accuracy. We use the variance-minimizing approximate polynomial of degree 1625 with parameter \(w=2^{-200}\), which minimizes (8) (the minimum value is \(\approx 2^{-209}\)), as shown in Fig. 4. We can also say that the bottleneck of the bootstrapping error is the scaling factor, not the approximation.

The previous bootstrapping methods cannot achieve such high accuracy, and even if they could, they would require a huge multiplicative depth to evaluate a high-degree polynomial approximation. Jutla and Manohar proposed a sine series approximation and 100-bit accurate CKKS bootstrapping using it, which consumes a modulus of \(2^{2157}\) (see Table 1 in [20]). In contrast, owing to its direct approximation, the proposed method needs much less depth than indirect approximations, and thus we can achieve the same accuracy (shown in the last row of Table 3) with a modulus of about \(2^{1495}\), which corresponds to six more levels after bootstrapping. Also, the proposed lazy-BSGS algorithm reduces the number of relinearizations; the previous BSGS algorithm for odd polynomials [24] requires 66 relinearizations to evaluate a polynomial of degree 1625.

This high accuracy is essential in the presence of the Li-Micciancio attack [26]. "Noise flooding" is currently the only known way to make CKKS provably secure against the Li-Micciancio attack, but it has been impractical with bootstrapping, as it makes CKKS noisy, losing about 30-40 bits of accuracy [26]. Although more research is required on how the bootstrapping error can be exploited for cryptanalysis, we can at least directly apply the noise flooding technique [16] on top of the high-precision bootstrapping.

7 Conclusion

In this paper, we made two contributions toward accurate and fast bootstrapping of the CKKS scheme: we proposed i) a method to find the optimal approximate polynomial of modular reduction for bootstrapping, together with its analytical solution, and ii) a more efficient algorithm to homomorphically evaluate a high-degree polynomial. The proposed error variance-minimizing approximate polynomial guarantees the minimum error after bootstrapping in the aspect of SNR; in contrast, the previous minimax approximation does not guarantee the minimum infinity norm of the bootstrapping error. Moreover, we proposed an efficient algorithm, the lazy-BSGS algorithm, to evaluate the approximate polynomial. The lazy-BSGS algorithm reduces the number of relinearizations by half compared to the ordinary BSGS algorithm, and the error is also reduced. We also proposed an algorithm to find the error variance-minimizing approximate polynomial tailored to the lazy-BSGS algorithm.

The proposed algorithm reduces the level consumption of the most depth-consuming part of bootstrapping, the approximate modular reduction. Thus, we can reserve more levels after bootstrapping or use them to speed up bootstrapping. The number of levels remaining after bootstrapping is significant for the efficient circuit design of algorithms using CKKS [23], and it also reduces the required number of bootstrappings.

The bootstrapping performance improvement of the proposed algorithm was verified by implementation. The implementation showed that we can reduce the multiplicative depth of modular reduction in CKKS bootstrapping while achieving the best-known accuracy. We also showed that the proposed method achieves CKKS bootstrapping with very high accuracy, so the noise flooding technique can be directly applied to the CKKS scheme for IND-CPA\(^\text{D}\) security.