1 Introduction

Homomorphic encryption (HE) is a powerful cryptographic primitive that enables performing computations over encrypted data without having access to the secret key. The HE research area has seen a lot of progress since the formulation of the first fully homomorphic encryption construction by Gentry in 2009 [17], and the schemes implemented in modern HE libraries are multiple orders of magnitude faster than the initial implementation of Gentry’s scheme [18]. The most common HE schemes are typically grouped into three classes based on the data types they support computations on. The first class primarily works with Boolean circuits and decision diagrams, similar to the original Gentry scheme, and includes the FHEW and TFHE schemes [12, 15]. The second class supports modular arithmetic over finite fields, which typically correspond to vectors of integers mod t, where t is a prime power commonly called as the plaintext modulus. The second class is also sometimes used for small-integer arithmetic. This class includes Brakerski-Gentry-Vaikuntantan (BGV) and Brakerski/Fan-Vercauteren (BFV) schemes [8, 9, 16]. The third, and most recent, class supports approximate computations over vectors of real and complex numbers, and is represented by the Cheon-Kim-Kim-Song (CKKS) scheme [11]. All these schemes are based on the hardness of the Ring Learning With Errors (RLWE) problem, where noise is added during encryption and key generation to achieve the hardness properties. The noise grows as encrypted computations are performed, and the main functional parameter in all these schemes, the ciphertext modulus Q, needs to be large enough to accommodate the noise growth, or a special bootstrapping procedure may be used to reset the noise and keep the value of Q relatively small.

Our work focuses on the HE schemes of the second class. Although the BGV and BFV schemes work with the same plaintext algebra, they use different strategies for encoding the message composed of integers in \(\mathbb {Z}_t\) and controlling the noise. The BGV scheme encodes the message in the least significant digit (LSD) of integers in \(\mathbb {Z}_Q\) and applies the modulus switching technique to keep the noise magnitude constant, i.e., it scales down Q by a factor that corresponds to the noise added after the previous modulus switching call. The BFV scheme encodes the message in the most significant digit (MSD) of integers in \(\mathbb {Z}_Q\) and uses a special form of homomorphic multiplication, where ciphertext polynomials are multiplied without modular reduction and then scaled down by Q/t. In BFV, the value of Q is typically constant and the noise magnitude increases at a rate similar to how Q decreases in BGV. The difference in noise management strategies between BGV and BFV affects the noise growth and efficiency of the schemes. Costache and Smart performed a noise growth comparison, which suggested that BGV has better noise growth for larger t than BFV [13]. However, the authors did not examine the computational complexity difference, and it has not been clear up to this moment how the schemes compare in terms of practical performance, both from the perspective of computational complexity and actual experimental measurements.

The main goal of this paper is to present improved variants of BFV and BGV schemes, which also close the gap between the schemes. The other goal is to compare the theoretical complexity of their primitive operations, and experimental performance of BGV and BFV for several different scenarios using our software implementation in the PALISADE library [2].

Modified BFV Scheme. We propose two modifications for the BFV scheme. The first modification deals with encryption, and the second modification revises the homomorphic multiplication operation. The net effects of these modifications are smaller noise growth and faster homomorphic multiplication in BFV.

The encryption in BFV can be represented as \(\boldsymbol{a} \cdot \boldsymbol{s} + \boldsymbol{e} + \Delta \boldsymbol{m}\) (for simplicity, we focus here on the secret-key formulation), where \(\boldsymbol{a}\) is a uniformly random ring element in cyclotomic ring \(\mathcal {R}_Q\), \(\boldsymbol{s}\) and \(\boldsymbol{e}\) are the secret key and Gaussian noise ring elements in \(\mathcal {R}\), \(\boldsymbol{m}\) is a message in \(\mathcal {R}_t\), and \(\Delta = \left\lfloor {Q/t}\right\rfloor \) is the scaling factor. Our analysis shows that the difference between \(\Delta \) and Q/t, which is often described in terms of \(r_t(Q) := Q - t\Delta \), brings about a significant error (proportional to \(r_t(Q)\)) that affects the first homomorphic multiplication and increases the noise growth in BFV as compared to BGV for larger t. If this error is removed, i.e., \(r_t(Q) \approx 0\), the noise growth in BFV becomes the same, or actually somewhat better, as in BGV. In view of this, our first modification suggested for BFV is to replace the encryption operation in BFV with \(\boldsymbol{a} \cdot \boldsymbol{s} + \boldsymbol{e} + \left\lceil {\frac{Q}{t} \boldsymbol{m}}\right\rfloor \), which is a more natural choice as compared to the one in the original BFV. This encryption function also significantly simplifies the noise analysis and estimates for BFV homomorphic multiplication. Note that this modification can be likewise applied to the original Brakerski LWE scheme [8].

The most expensive operation in BFV is homomorphic multiplication as it requires a multiplication of two ciphertexts \(\mathbf {c}_1\) and \(\mathbf {c}_2\) without modular reduction, followed by scaling the results of the tensor product by t/Q. Algorithmically, this requires extending both ciphertexts to modulus QP, where P is sufficiently larger than Q, performing a tensor product which involves expensive Number Theoretic Transforms (NTTs), scaling down the result by Q/t, and finally switching the scaling result from P to Q. We propose a more efficient procedure for homomorphic multiplication where the values of \(P \approx Q\), which saves some expensive modulus extension operations and NTTs. The main idea is to apply modulus switching to one of the ciphertexts, e.g., \(\mathbf {c}_2\), to change it from Q to P (denote it as \(\mathbf {c}'_2\)), and then do scaling by t/P after the tensor product. This removes the requirement for extending the scaling result from P to Q (it will already be in Q) at the expense of doing a smaller number of modulus extensions during the modulus switching of \(\mathbf {c}_2\). The other benefit is that the tensor product of \(\mathbf {c}_1\) and \(\mathbf {c}'_2\) can be scaled by t/P directly in PQ, i.e., we have a tensor product mod PQ instead of a tensor product without modular reduction.

We also introduce a leveled version of BFV homomorphic multiplication, where ciphertexts modulo a larger modulus Q are internally scaled down to a smaller modulus \(Q_\ell \) (or \(P_\ell \)), the standard homomorphic multiplication operations are performed, and then the results are scaled back up to Q. The benefit of this approach is that the ciphertexts still look the same (modulo Q) outside the homomorphic multiplication operations, but we get BGV-like benefits of working with smaller moduli in multiplication. The combined effect of our improvements in homomorphic multiplication is the speed-up of up to 4x, as compared to a prior state-of-the-art BFV implementation, when dealing with multiplications at deeper levels of computation.

BFV Scheme Optimizations. We also introduce several algorithmic optimizations that equally apply to the classical BFV and our modified variant. The first optimization is for the scenarios where we need to add multiple BFV ciphertexts that were just obtained by BFV multiplication. The standard way is to perform many expensive BFV multiplications and then add up the result. However, we can delay the scaling by t/Q (or by t/P in our BFV variant) in each homomorphic multiplication until the sum is computed, and then just do one scaling at the end. This saves many expensive NTTs and modulus extension operations. We denote this optimization as lazy scaling. The lazy scaling can be combined with previously known lazy relinearization to push most of the expensive computations in a homomorphic multiplication, i.e., scaling and relinearization, to the end, after the aggregation is done.

Some of the other optimizations apply to Residue Number Systems (RNS) variants of BFV, where multi-precision integers in \(\mathbb {Z}_Q\) are split into vectors of smaller integers using the Chinese Remainder Theorem (CRT) to perform operations efficiently using native (64-bit) integer types. The RNS variants are now predominately used in practice, and are implemented in the SEAL [29], PALISADE [2], and Lattigo [1] software libraries. There are two main RNS variants of BFV: the Bajard-Eynard-Hasan-Zucca (BEHZ) variant based on modular integer arithmetic and Montgomery reductions and the Halevi-Polyakov-Shoup (HPS) variant based on a combination of modular integer arithmetic and floating-point approximations [5, 21].

A significant limitation of the HPS approach is that high-precision (“long double” or even quad-precision) floating-point arithmetic is required to support larger CRT moduli: long doubles are needed for CRT moduli from 47 to 58 bits, and quad-precision floats are needed for higher CRT moduli [21]. We introduce a general-purpose digit decomposition technique (inspired by digit decomposition in key switching) and apply it to the HPS decryption procedure to add support for arbitrary CRT moduli using only regular double-precision floating-point arithmetic, thus overcoming this limitation of the HPS variant. This digit decomposition technique can be applied to other mixed integer/floating-point RNS operations to reduce precision requirements for floating-point arithmetic.

We also apply the full RNS variant of hybrid key switching [24] recently proposed for the CKKS scheme to both BFV RNS variants, and demonstrate how some auxiliary CRT moduli needed for homomorphic multiplication can be reused for hybrid key switching. This key switching method has some benefits (smaller noise growth, better efficiency for deeper computations) over the residue decomposition key switching method previously used in both RNS variants of BFV.

BGV Scheme Optimizations and Usability Improvements. We use the Gentry-Halevi-Smart (GHS) variant of BGV as the basis for our BGV instantiation [20, 23]. Although the original GHS variant performs some operations in RNS, it still uses multiprecision integer arithmetic for key switching and some scenarios of modulus switching. For instance, although the GHS paper originally introduced the hybrid key switching technique, the authors used multiprecision arithmetic for the digit decomposition step. We apply the full RNS version of hybrid key switching to our BGV instantiation and eliminate any multiprecision arithmetic from our BGV implementation, thus significantly improving its efficiency.

One of the challenges in the GHS variant is the need to perform dynamic noise estimation, which makes the BGV implementation less robust and usable as compared to the BFV variants where noise estimation is typically needed only at the parameter generation phase. We develop a more usable and robust variant of BGV that is essentially as simple to use as the current BFV implementations. This variant only needs to know the multiplicative depth and maximum number of additions per level for many common scenarios. The main advantage of this BGV variant is that it is significantly faster than our BFV implementations for certain practical scenarios, yet its usability matches that of BFV.

Implementation and Performance Comparison. We implement the improved variants of BFV and BGV in PALISADE, and provide their comparison for specific benchmark computations. To the best of our knowledge, this is the first publicly available implementation of both schemes in the same software library. We also perform theoretical comparison of the computational complexity for the operations that differ between BFV and BGV.

The comparison results can be summarized as follows:

  • Our improved variant of BFV has somewhat better noise growth than BGV, in contrast to prior results for the original BFV scheme that showed better noise growth for BGV at larger plaintext moduli [13].

  • Our best variant of BFV is faster than BGV for small plaintext moduli, while BGV is faster for intermediate and large plaintext moduli used in many practical scenarios.

  • The speed-up in homomorphic multiplication of our best BFV variant compared to a prior state-of-the-art RNS implementation of BFV goes up to 4x for deeper computations.

Related work. Costache et al. further examine the difference between the noise growth in BFV and BGV [14] to improve the analysis presented in [13]. They explore an alternative heuristic noise analysis approach to obtain tighter noise bounds. We point out that this new analysis has some inacurracies, e.g., the effect of extra noise due to \(r_t(Q)\) in BFV homomorphic multiplication is not accounted for. We show that this extra noise determines the higher noise growth in BFV for large plaintext moduli, and demonstrate how this noise is removed in our BFV variant. Moreover, we show that this analysis can be carried out independently of the chosen heuristic for noise analysis. The authors also do not consider the difference in the complexity of homomorphic encryption operations between BGV and BFV, which affects their conclusions. In view of the above, we primarily compare our results with the prior work [13].

The encoding of a message in the MSD of a ciphertext as \(\lceil \frac{Q}{t}\boldsymbol{m}\rfloor \) was already used in the Key Encapsulation Mechanism (KEM) Kyber [7]. But in the case of Kyber, the plaintext modulus \(t=2\), i.e., the coefficients of the messages are either 0 or 1. Therefore, the messages can be recovered directly during decryption by checking whether the coefficients are closer to \(\lceil Q/2\rfloor \) or 0, and the noise is not affected.

SEAL also independently added to v3.4.0 a modification of BFV encryption similar to what we describe in our work [29]. However, no underlying noise analysis was presented, and the prior paper related to SEAL [14] included noise analysis inaccuracies involving the rounding term \(r_t(Q)\), suggesting that the full effect of this change was not well-understood.

Note that the FHEW and TFHE schemes can also support arithmetic over finite fields for small plaintext moduli (typically up to 4 bits) [28], and can be considered as an alternative to BGV and BFV for these scenarios. These schemes support fast bootstrapping (the latency is much lower than for BGV and BFV bootstrapping [22]), but their main limitation is the lack of support for CRT packing, which makes the BGV/BFV approach much more appealing when large arrays of numbers need to be computed on/bootstrapped because one ciphertext operation can perform thousands of integer operations at once.

Organization. The rest of the paper is organized as follows. In Sect. 2 we provide the necessary background on BGV and BFV. In Sects. 3 and 4, we present our improved variants of BFV and BGV, respectively. Section 5 includes the theoretical comparison of the schemes, and discussion of the experimental results. Section 6 provides the conclusions and outlines the ideas for future work.

2 Background

All logarithms are expressed in base 2 if not indicated otherwise. Let N be a power of two. We denote the 2N-th cyclotomic ring \(\mathcal {R}=\mathbb {Z}[X] / (X^N + 1)\) and \(\mathcal {R}_Q := \mathcal {R}/Q\mathcal {R}\).Footnote 1 Ring elements are indicated in bold, e.g. \(\boldsymbol{a}\). For an integer \(Q>1\), we identify the ring \(\mathbb {Z}_Q\) with \((-Q/2, Q/2]\cap \mathbb {Z}\) as a representative interval and for \(z\in \mathbb {Z}\), \([z]_Q\in \mathbb {Z}_Q\) denotes the centered remainder of z modulo Q, while \(r_Q(z)\) denotes the classical Euclidean remainder in \([0,Q)\cap \mathbb {Z}\). For \(x\in \mathbb {Q}\), \(\lfloor x \rfloor \), \(\lfloor x \rceil \) and \(\lceil x \rceil \) denote the rounding to the lower, closest and higher integer, respectively. We extend these notations to elements of \(\mathcal {R}\) by applying them coordinate-wise. For \(\boldsymbol{a}=a_0+a_1\cdot X + \cdots + a_{N-1}\cdot X^{N-1}\in \mathcal {R}\), we denote the \(\ell _\infty \) norm of \(\boldsymbol{a}\) as \(\Vert \boldsymbol{a}\Vert _\infty = \max _{0\le i<N}\{|a_i|\}\). There exists a constant \(\delta _\mathcal {R}\) such that \(\Vert \boldsymbol{a}\cdot \boldsymbol{b}\Vert _\infty \le \delta _\mathcal {R}\Vert \boldsymbol{a}\Vert _\infty \Vert \boldsymbol{b}\Vert _\infty \) for any \((a,b)\in \mathcal {R}^2\). It is well-known that for \(\mathcal {R}=\mathbb {Z}[X] / (X^N + 1)\), \(\delta _\mathcal {R}= N\). However in practice this bound is only reached with exponentially low probability. As shown in [21], the bound \(\delta _\mathcal {R}= 2\sqrt{N}\) is much closer to what we observe experimentally, and can be used to achieve tighter noise bounds. Another approach consists in estimating the noise size using the canonical embedding norm [20], as currently done in HElib [23]. Nonetheless, in this work we estimate the noise size using the expansion factor \(\delta _\mathcal {R}\) with the method of [21] as it is simpler and precise enough for our purpose.

We use \(\boldsymbol{a} \leftarrow \chi \) to denote the sampling of \(\boldsymbol{a}\in \mathcal {R}\) according to a distribution \(\chi \). \(\chi _{\text {key}}\) denotes the uniform ternary distribution, i.e., all the coefficients of \(\boldsymbol{a} \leftarrow \chi _{\text {key}}\) are selected uniformly and independently from \(\left\{ -1,0,1 \right\} \). This distribution is commonly used for secret key generation as it is the most efficient option conforming to the HE standard [4]. \(\chi _{\texttt {err}}\) denotes a discrete Gaussian distribution with standard deviation \(\sigma _{\texttt {err}}\), i.e. all the coefficients of \(\boldsymbol{a} \leftarrow \chi _{\texttt {err}}\) are selected independently from a truncated discrete Gaussian distribution with standard deviation \(\sigma _{\text {err}}\). Truncated discrete Gaussian distributions are commonly used to generate error polynomials to meet the desired hardness requirement [4]. We assume that the polynomials sampled from \(\chi _{\texttt {key}}\) and \(\chi _{\texttt {err}}\) have their coefficients bounded by \(B_{\texttt {key}} = 1\) and \(B_{\texttt {err}} = 6\sigma _{\texttt {err}}\), respectively. Although a Gaussian distribution is not bounded by nature, the probability for a Gaussian coefficient to be larger than \(B_{\texttt {err}} = 6\sigma _\texttt {err}\), is less than \(2^{-30}\), therefore the two distributions are very close in practice. \(\mathcal {U}_Q\) denotes the uniform distribution over \(\mathcal {R}_Q\), where every coefficient of \(\boldsymbol{a}\) is sampled uniformly and independently from \(\mathbb {Z}_Q\).

2.1 Plaintext Space

We are interested in the BGV and BFV homomorphic encryption schemes which both share the same plaintext space \(\mathcal {R}_t\) for some integer \(t > 1\). Hence, the most natural way to represent plaintext messages of these schemes is to think of them as vectors of size N with their coefficients taken modulo t. However, \(\mathcal {R}_t\) has many algebraic properties, in particular when \(t = p^r\) is a prime power with p coprime to 2N. In this case \(\mathcal {R}_t\) is actually a \(\mathbb {Z}_t\)-algebra, which means that it contains a subring isomorphic to \(\mathbb {Z}_t\). In this paper we focus on the case \(r = 1\), where \(t=p\) is a prime. The interested reader can nonetheless refer to [23] for further details regarding the general case. \(\mathbb {Z}_t\)-algebra supports efficent Single-Instruction Multiple-Data (SIMD) packing/batching. For more details on the packing, the reader is referred to [30].

2.2 Homomorphic Encryption Schemes for Finite Field Arithmetic

The two schemes studied in this work: BGV and BFV are actually two instantiations of the same idea, and share, therefore, many common features. First, according to the desired security level \(\lambda \) and the targeted application, one starts by selecting public parameters for the considered scheme: ring dimension \(N=2^d\), the plaintext modulus t, a ciphertext modulus Q and two probability distributions \(\chi _{key}\) and \(\chi _{err}\) on the ring \(\mathcal {R}\). In both cases, the secret key will be an element \(\boldsymbol{s}\leftarrow \chi _{\texttt {key}}\). Note that BGV  and BFV  may be viewed as different modes of a unified scheme, where the ciphertexts may be switched from one mode/scheme to the other (see the full version for details).

Original BGV Scheme. In 2011, Brakerski et al. designed a leveled homomorphic scheme, namely capable of evaluating circuits of arbitrary size, but known beforehand [9]. The key tool of their construction is the modulus switching procedure which allows to switch a ciphertext \(\texttt {ct}\) encrypted under a modulus Q to a smaller modulus \(Q'\) in order to maintain the noise level “constant”. As a consequence, one must select a chain of \(L+1\) moduli \(Q_0 \mid Q_1 \mid \ldots \mid Q_L = Q\) such that t and \(Q_L\) are coprime. The public key is formed as:

$$\begin{aligned} \texttt {pk}= \left( \left[ \boldsymbol{a}\cdot \boldsymbol{s} + t\boldsymbol{e} \right] _{Q_L},-\boldsymbol{a} \right) \in \mathcal {R}_{Q_L}^2, \end{aligned}$$

which is equivalent to the Ring-LWE sample \((\left[ \boldsymbol{a}/t\cdot \boldsymbol{s} + \boldsymbol{e} \right] _{Q_L},[-\boldsymbol{a}/t]_{Q_L})\) (since t and \(Q_L\) are coprime) associated to \(\boldsymbol{s}\) and \(Q_L\) with \(\boldsymbol{a}\leftarrow \mathcal {U}_{Q_L}\) and \(\boldsymbol{e}\leftarrow \chi _\texttt {err}\).

A ciphertext \(\texttt {ct}=(\boldsymbol{c}_0,\boldsymbol{c}_1)\in \mathcal {R}_Q^2\) corresponds to a degree 1 polynomial whose coefficients lie in \(\mathcal {R}_Q\). The message \(\boldsymbol{m}\in \mathcal {R}_t\) is hidden in the LSD of the first coefficient \(\boldsymbol{c}_0\) of the ciphertext as follows:

with \(\boldsymbol{u}\leftarrow \chi _\texttt {key}\) and \(\boldsymbol{e}_0, \boldsymbol{e}_1\leftarrow \chi _\texttt {err}\). The noise contained in a ciphertext \(\texttt {ct}=(\boldsymbol{c}_0,\boldsymbol{c}_1)\) appears explicitly once the ciphertext is evaluated on the secret key \(\boldsymbol{s}\):

$$\begin{aligned} \boldsymbol{c}_0 + \boldsymbol{c}_1\cdot \boldsymbol{s} = [\boldsymbol{m}]_t + t(\boldsymbol{u}\cdot \boldsymbol{e}+ \boldsymbol{e}_1\cdot \boldsymbol{s}+ \boldsymbol{e}_0) = [\boldsymbol{m}]_t + t\boldsymbol{v}_{\texttt {fresh}} \bmod Q_L, \end{aligned}$$
(1)

where the term \(\boldsymbol{v}_{\texttt {fresh}}=\boldsymbol{u}\cdot \boldsymbol{e}+ \boldsymbol{e}_1\cdot \boldsymbol{s}+\boldsymbol{e}_0 \) is the noise inherent to a “freshly” encrypted ciphertext. Since \(Q_0\mid Q_1\mid \ldots \mid Q_L\), encryptions can be performed equivalently at any level i, i.e., modulo \(Q_i\).

To decrypt a ciphertext \(\texttt {ct}=(\boldsymbol{c}_0,\boldsymbol{c}_1)\in \mathcal {R}_{Q_i}^2\) with \(i\in [0,L]\), one computes \(\boldsymbol{m}' = [\boldsymbol{c}_0+\boldsymbol{c}_1\cdot \boldsymbol{s}]_{Q_i}\) and then outputs \([\boldsymbol{m}']_t\). To ensure correctness of the decryption, the noise \(\boldsymbol{v}\) must be “small enough” such that \(\boldsymbol{m}'=[\boldsymbol{m}]_t + t\boldsymbol{v}\) does not wrap-around modulo \(Q_i\). As a consequence, decryption remains correct as long as:

$$\begin{aligned} \left\Vert {\boldsymbol{v}}\right\Vert _\infty < \frac{Q_0}{2t} - \frac{1}{2}. \end{aligned}$$

One can add two ciphertexts \(\texttt {ct}\) and \(\texttt {ct}'\) encrypting \(\boldsymbol{m}\) and \(\boldsymbol{m}'\), respectively, at the same level i to yield:

$$ \boldsymbol{c}_0+\boldsymbol{c}'_0 + (\boldsymbol{c}_1+\boldsymbol{c}_1')\cdot \boldsymbol{s} = [\boldsymbol{m}+\boldsymbol{m}']_t + t(\boldsymbol{v}+\boldsymbol{v}' + \boldsymbol{u}) \bmod Q_i,$$

with \(\left\Vert {\boldsymbol{u}}\right\Vert _\infty \le 1\). This means that

$$\begin{aligned} \texttt {ct}_\texttt {add}= \left( [\boldsymbol{c}_0+\boldsymbol{c}_{0}']_{Q_i},[\boldsymbol{c}_1+\boldsymbol{c}_1']_{Q_i} \right) \end{aligned}$$

is a level-i encryption of \([\boldsymbol{m}+\boldsymbol{m}']_t\) and its noise is almost the sum of the noises of \(\texttt {ct}\) and \(\texttt {ct}'\):

$$\begin{aligned} \left\Vert {\boldsymbol{v}_\texttt {add}}\right\Vert _\infty = \Vert \boldsymbol{v}+\boldsymbol{v}'+\boldsymbol{u}\Vert _\infty \le \left\Vert {\boldsymbol{v}}\right\Vert _\infty + \left\Vert {\boldsymbol{v}'}\right\Vert _\infty + 1. \end{aligned}$$

Similarly to addition, we can multiply two level-i ciphertexts \(\texttt {ct}\) and \(\texttt {ct}'\) to obtain the following congruence modulo \(Q_i\):

$$\begin{aligned} (\boldsymbol{c}_0 + \boldsymbol{c}_1\cdot \boldsymbol{s})\cdot (\boldsymbol{c}'_0 + \boldsymbol{c}'_1\cdot \boldsymbol{s}) = [\boldsymbol{m}\cdot \boldsymbol{m}']_t+t([\boldsymbol{m}]_t\cdot \boldsymbol{v}' + \boldsymbol{v}\cdot [\boldsymbol{m}']_t+ t\boldsymbol{v}\cdot \boldsymbol{v}' + \boldsymbol{r}_m) \end{aligned}$$

with \([\boldsymbol{m}]_t\cdot [\boldsymbol{m}']_t = [\boldsymbol{m}\cdot \boldsymbol{m}']_t + t\boldsymbol{r}_m\) and \(\Vert \boldsymbol{r}_m\Vert _\infty \le \delta _\mathcal {R}t/2\). This means that

$$ \texttt {ct}_\texttt {mult}= \left( [\boldsymbol{c}_0\cdot \boldsymbol{c}_{0}']_{Q_i},[\boldsymbol{c}_0\cdot \boldsymbol{c}'_1+\boldsymbol{c}_1\cdot \boldsymbol{c}_0']_{Q_i},[\boldsymbol{c}_1\cdot \boldsymbol{c}_1']_{Q_i} \right) \in \mathcal {R}_{Q_i}^3$$

is a degree-2 ciphertext encrypting \([\boldsymbol{m}\cdot \boldsymbol{m}']_t\) and its noise is bounded by

$$\begin{aligned} \left\Vert {\boldsymbol{v}_\texttt {mult}}\right\Vert _\infty&= \left\| [\boldsymbol{m}]_t\cdot \boldsymbol{v}' + \boldsymbol{v}\cdot [\boldsymbol{m}']_t+ t\boldsymbol{v}\cdot \boldsymbol{v}' + \boldsymbol{r}_m\right\| _\infty \nonumber \\&\le \frac{\delta _\mathcal {R}t}{2}\left( 2\left\Vert {\boldsymbol{v}}\right\Vert _\infty \left\Vert {\boldsymbol{v}'}\right\Vert _\infty + \left\Vert {\boldsymbol{v}}\right\Vert _\infty +\left\Vert {\boldsymbol{v}'}\right\Vert _\infty + 1 \right) . \end{aligned}$$

Remark 1

The reader can notice that the degree, and thus the size, of a ciphertext increases after each multiplication, increasing therefore the future communication and computational costs. Since this is something one wants to avoid in practice, the degree-2 ciphertexts are “relinearized” after a homomorphic multiplication to degree-1 ciphertexts using a key-switching procedure (see the full version).

The main issue with homomorphic multiplication is its quadratic noise growth, which implies that by choosing \(Q_L \approx \Vert \boldsymbol{v}_{\texttt {fresh}}\Vert _\infty ^L\) one could only perform \(\log _2 L\) consecutive multiplications. The idea of modulus switching is to reduce the size of the noise after each multiplication to keep it constant and prevent the quadratic blow-up. This is achieved by scaling the ciphertext \(\texttt {ct}\) by \(Q_i/Q_j\) for \(i<j\), which scales down the noise by roughly the same factor. More precisely, let \(\texttt {ct}= (\boldsymbol{c}_0,\boldsymbol{c}_1)\) be a level \(j\in (0,L]\cap \mathbb {Z}\) encryption of a message \(\boldsymbol{m}\) with noise \(\boldsymbol{v}\) and let i be an integer smaller than j, then set:

$$\begin{aligned} \boldsymbol{\delta } = \left( t[-\boldsymbol{c}_0/t]_{Q_j/Q_i},t[-\boldsymbol{c}_1/t]_{Q_j/Q_i} \right) \in \mathcal {R}^2. \end{aligned}$$

Then one can compute

$$\begin{aligned} \texttt {ct}' = \frac{Q_i}{Q_j}\cdot (\boldsymbol{c}_0+\boldsymbol{\delta }_0,\boldsymbol{c}_1+\boldsymbol{\delta }_1) \bmod Q_i. \end{aligned}$$

Brakerski et al. showed that if \(\texttt {ct}= (\boldsymbol{c}_0,\boldsymbol{c}_1)\) is such that

$$\begin{aligned} \left\Vert {[\boldsymbol{c}_0 + \boldsymbol{c}_1\cdot \boldsymbol{s}]_{Q_j}}\right\Vert _\infty < \frac{Q_j}{2} - \frac{tQ_j}{2Q_i}(1+\delta _\mathcal {R}B_\texttt {key}), \end{aligned}$$

then \(\texttt {ct}'\) is an encryption of \([Q_i/Q_j\boldsymbol{m}]_t\) whose noise \(\boldsymbol{v}'\) is bounded by

$$\begin{aligned} \left\Vert {\boldsymbol{v}'}\right\Vert _\infty \le \frac{Q_i}{Q_j}\left\Vert {\boldsymbol{v}}\right\Vert _\infty + \Vert \boldsymbol{v}_{\texttt {ms}}\Vert _\infty \end{aligned}$$

with \(\Vert \boldsymbol{v}_{\texttt {ms}}\Vert _\infty \le (1+\delta _\mathcal {R}B_\texttt {key})/2\). Therefore by choosing the encryption parameters such that performing modulus switching after a homomorphic multiplication (plus a key-switching) brings the noise back to its initial level, one can perform L consecutive multiplications instead of approximately \(\log _2 L\) initially.

Gentry-Halevi-Smart (GHS) Variant of BGV. Since the modulus-switching procedure outputs an encryption of the message scaled by a factor \([Q_i/Q_j]_t\), Brakerski et al. proposed to choose the moduli \(Q_i = 1 \bmod t\). This approach, although very convenient in theory, becomes challenging in practice when using a large t since it reduces significantly the range of possible moduli for the \(Q_i\). As a consequence, Gentry, Halevi, and Smart proposed to keep track of the scaling factor for each ciphertext instead [20]. In particular, they suggested to encrypt \([Q_L \boldsymbol{m}]_t\) instead of \([\boldsymbol{m}]_t\) in Eq. (1), which provides natural downscaling to \([Q_{i} \boldsymbol{m}]_t\) as modulus switching operations are applied. However in this case, one has to pay attention when adding two ciphertexts with different scaling factors. Nonetheless this can be achieved without impacting significantly the noise by following the methodology of [25].

Gentry et al. also proposed several optimizations related to noise management. The first one is to perform modulus switching after encryption and before first multiplication, in order to reduce the noise from \(\Vert \boldsymbol{v}_{\texttt {fresh}}\Vert _\infty \) to \(\Vert \boldsymbol{v}_{\texttt {ms}}\Vert _\infty \). The second one is to perform modulus switching just before the next multiplication instead of just after a multiplication. This permits to reduce the noise accumulated due to other operations, such as additions or key switching, that are performed between two subsequent multiplications.

Original BFV Scheme. In [8], Brakerski proposed a scale-invariant construction that achieves asymptotically the same noise growth as BGV, but does not explicitly call the modulus-switching procedure, embedding it internally in the homomorphic multiplication. Fan and Vercauteren then ported Brakerski’s construction to the Ring-LWE setting [16], and the scheme is now commonly referred to as BFV. The BFV scheme uses a public key

$$\begin{aligned} \texttt {pk}= \left( \left[ \boldsymbol{a}\cdot \boldsymbol{s} + \boldsymbol{e} \right] _{Q},-\boldsymbol{a} \right) \in \mathcal {R}_{Q}^2, \end{aligned}$$

which corresponds exactly to a Ring-LWE sample associated to \(\boldsymbol{s}\) and Q with \(\boldsymbol{a}\leftarrow \mathcal {U}_{Q}\) and \(\boldsymbol{e}\leftarrow \chi _\texttt {err}\). The main difference between BGV and BFV is that BFV ciphertexts encode messages in their MSD instead of LSD:

with \(\Delta = \lfloor Q/t \rfloor \), \(\boldsymbol{u}\leftarrow \chi _\texttt {key}\) and \(\boldsymbol{e}_0, \boldsymbol{e}_1\leftarrow \chi _\texttt {err}\). Similarly to BGV, the noise contained in a ciphertext \(\texttt {ct}=(\boldsymbol{c}_0,\boldsymbol{c}_1)\) appears explicitly once the ciphertext is evaluated on the secret key \(\boldsymbol{s}\):

$$\begin{aligned} \boldsymbol{c}_0 + \boldsymbol{c}_1\cdot \boldsymbol{s} = \Delta [\boldsymbol{m}]_t + \boldsymbol{u}\cdot \boldsymbol{e}+ \boldsymbol{e}_1\cdot \boldsymbol{s}+ \boldsymbol{e}_0 = \Delta [\boldsymbol{m}]_t + \boldsymbol{v}_{\texttt {fresh}} \bmod Q, \end{aligned}$$

where the “fresh” noise \(\boldsymbol{v}_{\texttt {fresh}}=\boldsymbol{u}\cdot \boldsymbol{e}+ \boldsymbol{e}_1\cdot \boldsymbol{s}+\boldsymbol{e}_0 \) is the same as for BGV.

To decrypt the ciphertext \(\texttt {ct}\), one needs to scale and round \(\texttt {ct}(\boldsymbol{s})\) by t/Q to remove the factor \(\Delta \). Hence the decryption procedure requires computing

$$ \boldsymbol{m}' = \left\lfloor \frac{t}{Q}\left[ \boldsymbol{c}_0+ \boldsymbol{c}_1\cdot \boldsymbol{s}\right] _Q \right\rceil ,$$

and the decryption will be correct as long as:

$$\begin{aligned} \left\Vert {\boldsymbol{v}}\right\Vert _\infty < \frac{Q}{2t} - \frac{r_t(Q)}{2}. \end{aligned}$$
(2)

Note that the term \(r_t(Q)/2\) is the error inherited from the difference between \(\Delta ^{-1}\) and t/Q: \(\Delta t/Q = 1 - r_t(Q)/Q\). Addition of two ciphertexts \(\texttt {ct}\) and \(\texttt {ct}'\) is done like in BGV, but the noise growth is slightly different since the carry of the addition of the two messages is scaled by \(\Delta \):

$$ \boldsymbol{c}_0+\boldsymbol{c}'_0 + (\boldsymbol{c}_1+\boldsymbol{c}_1')\cdot \boldsymbol{s} = \Delta [\boldsymbol{m}+\boldsymbol{m}']_t + \boldsymbol{v}+\boldsymbol{v}' -r_t(Q)\boldsymbol{u} \bmod Q,$$

with \(\left\Vert {\boldsymbol{u}}\right\Vert _\infty \le 1\). This implies that

$$\begin{aligned} \texttt {ct}_\texttt {add}= \left( [\boldsymbol{c}_0+\boldsymbol{c}_{0}']_{Q},[\boldsymbol{c}_1+\boldsymbol{c}_1']_{Q} \right) \end{aligned}$$

is an encryption of \([\boldsymbol{m}+\boldsymbol{m}']_t\), and its noise is bounded by

$$\begin{aligned} \left\Vert {\boldsymbol{v}_\texttt {add}}\right\Vert _\infty = \Vert \boldsymbol{v}+\boldsymbol{v}'+ r_t(Q)\boldsymbol{u}\Vert _\infty \,\le \, \left\Vert {\boldsymbol{v}}\right\Vert _\infty + \left\Vert {\boldsymbol{v}'}\right\Vert _\infty + r_t(Q). \end{aligned}$$
(3)

The BFV multiplication of two ciphertexts \(\texttt {ct}\) and \(\texttt {ct}'\) is done differently, as compared to BGV, since once the product is computed, it gets scaled by \(\Delta ^2\), which has two important consequences. First, the product of \(\texttt {ct}\) and \(\texttt {ct}'\) cannot be reduced modulo Q, therefore it must be done in \(\mathcal {R}\), i.e., without any modular reduction. Second, the product must be scaled down by \(t/Q \approx \Delta ^{-1}\) to remove the extra \(\Delta \) factor and reduce the noise similarly to modulus switching in BGV. We describe the two steps of the homomorphic multiplication separately. The first part, called the tensoring, consists in computing the product of two ciphertexts in \(\mathcal {R}\):

$$\begin{aligned} \texttt {ct}_\texttt {tensor}= (\boldsymbol{c}_0\cdot \boldsymbol{c}_{0}',\boldsymbol{c}_0\cdot \boldsymbol{c}'_1+\boldsymbol{c}_1\cdot \boldsymbol{c}_0',\boldsymbol{c}_1\cdot \boldsymbol{c}_1')\in \mathcal {R}^3. \end{aligned}$$

When evaluating \(\texttt {ct}_{\texttt {tensor}}\) on the secret key, one obtains:

$$\begin{aligned} (\boldsymbol{c}_0\ + \boldsymbol{c}_1\cdot \boldsymbol{s})\cdot (\boldsymbol{c}'_0\ + \boldsymbol{c}'_1\cdot \boldsymbol{s})&= (\Delta [\boldsymbol{m}]_t + \boldsymbol{v} + Q\boldsymbol{k})\cdot (\Delta [\boldsymbol{m}']_t + \boldsymbol{v}' + Q\boldsymbol{k}') \\&= \frac{Q}{t}\Delta \left[ \boldsymbol{m}\cdot \boldsymbol{m}' \right] _t+ \frac{Q}{t}\boldsymbol{v}_\texttt {tensor}+ \frac{Q^2}{t}\boldsymbol{k}_\texttt {tensor}, \end{aligned}$$

where

$$\begin{aligned} \boldsymbol{v}_\texttt {tensor}&=\frac{t\boldsymbol{v}\cdot \boldsymbol{v}'}{Q}+\frac{t\Delta }{Q}\left( [\boldsymbol{m}]_t\cdot \boldsymbol{v}'+[\boldsymbol{m}']_t\cdot \boldsymbol{v} \right) +t(\boldsymbol{v}\cdot \boldsymbol{k}'+\boldsymbol{v}'\cdot \boldsymbol{k})\\&-r_t(Q)\left( [\boldsymbol{m}]_t\cdot \boldsymbol{k}'+[\boldsymbol{m}']_t\cdot \boldsymbol{k} + \boldsymbol{r}_m + \frac{\Delta }{Q}[\boldsymbol{m}]_t\cdot [\boldsymbol{m}']_t \right) , \end{aligned}$$
$$\begin{aligned} \boldsymbol{k}_\texttt {tensor}= [\boldsymbol{m}]_t\cdot \boldsymbol{k}'+[\boldsymbol{m}']_t\cdot \boldsymbol{k} + t\boldsymbol{k}\cdot \boldsymbol{k}' + \boldsymbol{r}_m \end{aligned}$$

with \(\Vert \boldsymbol{r}_m\Vert _\infty \,\le \, \delta _\mathcal {R}t/2\) like for BGV. Also note that \(\boldsymbol{k} = (\boldsymbol{c}_0+\boldsymbol{c}_1\cdot \boldsymbol{s} - \Delta [\boldsymbol{m}]_t - \boldsymbol{v})/Q\) and \(\boldsymbol{k}' = (\boldsymbol{c}'_0+\boldsymbol{c}'_1\cdot \boldsymbol{s} - \Delta [\boldsymbol{m}']_t - \boldsymbol{v}')/Q\) have their norm bounded by \((\delta _\mathcal {R}B_\texttt {key}+ 3)/2\).

The scaling operation is done in \(\mathcal {R}_Q\) and outputs a result modulo Q:

$$\begin{aligned} \texttt {ct}_\texttt {scale}= \left[ \left\lceil {\frac{t}{Q}\texttt {ct}_\texttt {tensor}}\right\rfloor \right] _Q\in \mathcal {R}_Q^3. \end{aligned}$$

The scaling leads to

$$\begin{aligned} \frac{t}{Q}\left( \boldsymbol{c}_{\texttt {tensor}_0} + \boldsymbol{c}_{\texttt {tensor}_1}\cdot \boldsymbol{s} + \boldsymbol{c}_{\texttt {tensor}_2}\cdot \boldsymbol{s}^2\right)&= \Delta \left[ \boldsymbol{m}\cdot \boldsymbol{m}' \right] + \boldsymbol{v}_\texttt {tensor}+Q\boldsymbol{k}_\texttt {tensor}. \end{aligned}$$

The rounding of scaled terms introduces an additional error \(\boldsymbol{v}_{r}\) such that

$$\begin{aligned} \texttt {ct}_{\texttt {scale}_0} + \texttt {ct}_{\texttt {scale}_1}\cdot \boldsymbol{s} + \texttt {ct}_{\texttt {scale}_2}\cdot \boldsymbol{s}^2 = \Delta \left[ \boldsymbol{m}\cdot \boldsymbol{m}' \right] + \boldsymbol{v}_\texttt {tensor}+ \boldsymbol{v}_{\texttt {r}} \bmod Q, \end{aligned}$$

with \(\Vert \boldsymbol{v}_r\Vert _\infty \,\le \, (1 + \delta _\mathcal {R}B_{\texttt {key}} + \delta ^2_\mathcal {R}B^2_{\texttt {key}})/2\). Hence the total multiplication noise \(\boldsymbol{v}_\texttt {mult}=\boldsymbol{v}_\texttt {tensor}+ \boldsymbol{v}_{\texttt {r}} \) is bounded by

$$\begin{aligned} \left\Vert {\boldsymbol{v}_\texttt {mult}}\right\Vert _\infty \le \,&\frac{\delta _\mathcal {R}t}{2}\left( \frac{2\left\Vert {\boldsymbol{v}}\right\Vert _\infty \left\Vert {\boldsymbol{v}'}\right\Vert _\infty }{Q}+\left( 4+\delta _\mathcal {R}B_\texttt {key} \right) \left( \left\Vert {\boldsymbol{v}}\right\Vert _\infty +\left\Vert {\boldsymbol{v}'}\right\Vert _\infty \right) \right. \nonumber \\&+ r_t(Q)\left( \delta _\mathcal {R}B_\texttt {key}+ 5\right) \bigg ) + \frac{1 + \delta _\mathcal {R}B_{\texttt {key}} + \delta ^2_\mathcal {R}B^2_{\texttt {key}}}{2}. \end{aligned}$$
(4)

For the same reasons as for BGV, one needs to perform a key-switching operation to relinearize the resulting degree-2 ciphertext. The key-switching procedure is the same for both schemes, and we refer to the full version for further details.

2.3 RNS Representation

The Chinese Remainder Theorem (CRT) permits decomposing multi-precision integers in \(\mathbb {Z}_Q\) into vectors of smaller integers to perform operations efficiently using native (64-bit) integer data types. The integer CRT representation is also often referred to as the Residue-Number-System (RNS) representation. As a consequence, the ciphertext modulus is usually chosen as a product of “small”, i.e., fitting in a machine word, co-prime moduli so that elements of \(\mathcal {R}_Q\) are represented with their residues modulo the different \(q_i\)’s. For BGV, we choose \(Q = q_0\cdots q_L\) and we denote \(Q_i= q_0\cdots q_i\) for \(0\le i\le L\), where each \(q_i = 1 \bmod 2N\), to use the efficient NTT algorithm for the multiplication of elements in \(\mathcal {R}_Q\). For BFV, we choose \(Q = q_1\cdots q_k\), with \(q_i = 1 \bmod 2N\) for \(1\le i\le k\) and similarly to BGV we denote \(Q_i=q_1\cdots q_i\) for \(1\le i \le k\). Note that we have chosen different notations on purpose since in the original BGV the size of the moduli is directly related to the noise reduction we want to achieve by modulus switching, and is therefore dependent on the circuit one wants to evaluate. However, this constraint can be removed by considering a granular approach with dynamic noise estimation, as implemented in HElib [23] (see the discussion in Sect. 4 for more details on dynamic vs static noise estimation in BGV). On the other hand, in BFV the size of the moduli is independent of the circuit and, hence, the moduli are usually chosen as large as possible within the limit of a machine word.

When performing computations in RNS, and more particularly when implementing BGV and BFV, it is sometimes needed to switch the RNS basis, i.e., convert \(\boldsymbol{a}\in \mathcal {R}_Q\) from its residues modulo \(Q= q_1\cdots q_k\) to \([\boldsymbol{a}]_Q\) modulo P for some \(P= p_1\cdots p_{k'}\). This can be achieved using a basis extension formulated as

$$\begin{aligned} \texttt {FastBaseExtension}(\boldsymbol{a},Q,P) = \sum _{i=1}^k \left[ \boldsymbol{a}\left( \frac{Q}{q_i}\right) ^{-1}\right] _{q_i} \frac{Q}{q_i}\bmod p_j. \end{aligned}$$
(5)

Note that the basis extension does not yield \([\boldsymbol{a}]_Q \bmod P\) but rather \([\boldsymbol{a}]_Q + \boldsymbol{u}Q \bmod P\) with \(\Vert \boldsymbol{u}\Vert _\infty < k/2\). When the result of this extension is divided by Q, as in many procedures of BGV and BFV, the error caused by this Q-overflow \(\boldsymbol{u}\) can be neglected most of the times. However in certain cases, as in the BFV decryption procedure, this overflow cannot be tolerated and needs to be removed/corrected. This can be achieved either using integer instructions with the so-called \(\gamma \)-correction technique of [5], or using floating-point instructions to retrieve \(\boldsymbol{u}\) as in [21] since

$$\begin{aligned} \boldsymbol{u} = \left\lfloor \sum _{i=1}^k \left[ \boldsymbol{a}\left( \frac{Q}{q_i}\right) ^{-1}\right] _{q_i} \frac{1}{q_i} \right\rceil . \end{aligned}$$
(6)

The same problem occurs during BFV homomorphic multiplication, and if it is not handled using either of the techniques of [5] or [21], the impact of \(\boldsymbol{u}\) on the noise growth will be significant [6].

2.4 Hybrid Key Switching in RNS

Key switching transforms a ciphertext \(\texttt {ct}= (\boldsymbol{c}_0,\boldsymbol{c}_1)\in \mathcal {R}_Q^2\), which can be decrypted with \(\boldsymbol{s}_A\), into a ciphertext \(\texttt {ct}' = (\boldsymbol{c}'_0, \boldsymbol{c}'_1)\in \mathcal {R}_Q^2\) encrypting the same message as \(\texttt {ct}\), but decryptable with another secret key \(\boldsymbol{s}_B\). This procedure is needed to compute automorphisms (rotations) of the ciphertexts [19], or to relinearize ciphertexts after a homomorphic multiplication. Note that this procedure adds a noise \(\boldsymbol{v}_{\texttt {ks}}\) to the ciphertexts.

Several ways of performing the key-switching procedure have been found over the years. The first one was formulated by Brakerski and Vaikuntanathan (BV) in [10] and extended to RNS in [5]. This technique is based on digit decomposition of one ring element in the ciphertext. Unfortunately the BV key switching requires a quadratic number of NTTs to be computed, and hence becomes the main bottleneck of the scheme (asymptotically, and often in practice), and causes a relatively large noise growth. In [20], Gentry, Halevi, and Smart proposed another alternative for key switching. Their method, which we refer to as the GHS key switching, has a smaller noise growth than BV, and is also more efficient (asymptotically, and in many practical cases) since it only requires a linear number of NTTs. The drawback of this method is that one either needs to double the dimension N or reduce the size of Q by a factor of 2 for security reasons. Gentry, Halevi, and Smart also presented a hybrid key switching technique, which combines BV digit decomposition and larger modulus from GHS to provide the best tradeoff between the two techniques. The RNS versions of hybrid key switching were later derived for the CKKS scheme in [26] (for one small special prime) and in [24] for the more general case. The hybrid key switching technique [24] is the most efficient one in practice, both in terms of performance and noise growth, as our detailed comparison of the BV, GHS, and Hybrid key switching in the full version shows. Hence we use the hybrid key switching in our implementation.

3 Improved BFV Scheme

One can notice from Eqs. (2), (3) and (4) that the noise of BFV is impacted by the \(r_t(Q)\) factor which does not appear in BGV. This factor causes faster noise growth for BFV when using larger plaintext moduli, as compared to BGV. In this section we show that this problem is not inherent to the MSD encoding of BFV, but rather comes from the choice for its instantiation in [16] and prior LWE-based Brakerski scheme [8]. We show that by instantiating the scheme in a more natural way, we can get rid of this \(r_t(Q)\) term. We also present a modified homomorphic multiplication procedure that significantly improves the complexity of BFV homomorphic multiplication, as compared to all prior variants of BFV.

In this section, \(\texttt {ct} = (\boldsymbol{c}_0, \boldsymbol{c}_1)\) and \(\texttt {ct}' = (\boldsymbol{c}'_0, \boldsymbol{c}'_1)\) denote two BFV ciphertexts encrypting, respectively, the messages \([\boldsymbol{m}]_t\) and \([\boldsymbol{m}']_t\) with noise \(\boldsymbol{v}\) and \(\boldsymbol{v}'\).

3.1 Noise Reduction

To fully understand the problem with faster noise growth in BFV for larger plaintext moduli, let us examine more carefully the noise bound after a multiplication in BFV (Eq. (4)). This bound can be simplified by analyzing only the dominant terms, which determine the noise magnitude. More precisely, if we assume that the two ciphertexts have their noise bounded by V, and \(B_\texttt {key}=1\), the size of the noise of their multiplication can be reasonably approximated by

$$\begin{aligned} \delta _\mathcal {R}t\left( \left( 5+\delta _\mathcal {R} \right) V + \frac{r_t(Q)}{2}\left( \delta _\mathcal {R}+5\right) \right) + \frac{\delta ^2_\mathcal {R}}{2} \approx \delta _\mathcal {R}^2 t\left( V + \frac{r_t(Q)}{2}\right) . \end{aligned}$$
(7)

Since the noise grows significantly with homomorphic multiplications, V becomes larger than \(r_t(Q)/2 < t\) after we perform the first multiplication. However, this is not necessarily true for the first multiplication itself since, like in BGV, the noise of fresh ciphertexts in BFV is bounded by \( B_\texttt {err}(2\delta _\mathcal {R}B_\texttt {key}+ 1) \approx 2\delta _\mathcal {R}B_\texttt {err}\). The homomorphic encryption standard [4] recommends using an error distribution with \(\sigma _\texttt {err}= 3.2\). Therefore, the noise of a fresh ciphertext can be estimated as \(V_{\texttt {init}} = 2\times 6\times 3.2 \times 2\sqrt{N} < 77\sqrt{N}\). Since in practice the dimension N typically does not exceed \(2^{16}\), a fresh ciphertext always has its noise size not higher than 14 bits, while \(r_t(Q)\) can be as large as t. As a consequence, when \( r_t(Q)\) is larger than \(2^{15}\), it becomes responsible for the larger noise growth after the first multiplication in BFV. For instance, if \(t = 2^{32}\) and \(r_t(Q) \approx t/2 \approx 2^{31}\), the noise after the first multiplication will be at least 16 bits larger than in the case when \(r_t(Q) < V_{\texttt {init}}\). Note that this difference will not lead to a larger noise growth on the next multiplications since, as shown in Eq. (7), the noise growth after a multiplication is linear in V. However, this difference of \(16+\) bits will be carried through until the end of the computation. In the case of \(t=2^{60}\), this difference would become at least 44 bits.

The easiest way to circumvent this problem would be to choose the moduli \(q_i\), as in the original BGV, i.e., such that \(q_i = 1 \bmod t\), which would lead to \(r_t(Q)=1\). However, for the same reasons as in BGV, this restriction would make the finding of the moduli challenging for large t. Although it is possible to relax this condition by choosing, for instance, \(r_t(Q) < \sqrt{N}\), i.e., finding \(r_t(Q)\) through trial and error, we believe this would be a patch rather than a real solution. We show next that there is a more natural way to fix this problem.

Indeed, the \(r_t(Q)\) term comes from the difference between \(\Delta ^{-1}\) and t/Q since when computing \(\Delta t/Q\) (during decryption and homomorphic multiplication), one obtains \(1 - r_t(Q)/Q\). Therefore, to solve this issue we propose to modify the original BFV encryption algorithm by encoding \([\boldsymbol{m}]_t\) in the ciphertext in a more natural way as \(\lfloor Q[\boldsymbol{m}]_t/t\rceil \) instead of \(\Delta [\boldsymbol{m}]_t\). The first benefit is seen in the decryption bound since now

$$\begin{aligned} \left\lfloor \frac{t}{Q}[\boldsymbol{c}_0 + \boldsymbol{c}_1\cdot \boldsymbol{s}]_Q\right\rceil&= \left\lfloor \frac{t}{Q}\left( \frac{Q}{t}[\boldsymbol{m}]_t + \boldsymbol{v} + \boldsymbol{\varepsilon } + \boldsymbol{k}Q\right) \right\rceil = [\boldsymbol{m}]_t + \left\lfloor \frac{t}{Q}\left( \boldsymbol{v} + \boldsymbol{\varepsilon }\right) \right\rceil + t\boldsymbol{k} \\&=[\boldsymbol{m}]_t + \left\lfloor \frac{t}{Q}\left( \boldsymbol{v} + \boldsymbol{\varepsilon }\right) \right\rceil \bmod t, \end{aligned}$$

where \(\boldsymbol{k}\in \mathcal {R}\) and \(\boldsymbol{\varepsilon }\) is the rounding error coming from \(\lfloor Q[\boldsymbol{m}]_t/t \rceil = Q[\boldsymbol{m}]_t/t + \boldsymbol{\varepsilon }\), such that \(\Vert \boldsymbol{\varepsilon }\Vert _\infty \le 1/2\). Therefore the decryption will be correct as long as:

$$ \frac{t}{Q}\left\Vert { \boldsymbol{v} + \boldsymbol{\varepsilon }}\right\Vert _\infty < \frac{1}{2}, $$

which is satisfied if

$$\begin{aligned} \Vert \boldsymbol{v}\Vert _\infty < \frac{Q}{2t} - \frac{1}{2}. \end{aligned}$$
(8)

Remark 2

Note that one can compute \(\lfloor Q[\boldsymbol{m}]_t/t\rceil \bmod Q\) directly in RNS as long as t and Q are coprime since

$$ \left\lfloor \frac{Q[\boldsymbol{m}]_t}{t}\right\rceil = \frac{Q[\boldsymbol{m}]_t - [Q\boldsymbol{m}]_t}{t} = -\frac{[Q\boldsymbol{m}]_t}{t} \bmod Q.$$

The second benefit is observed in the addition since now we have

$$\begin{aligned} \boldsymbol{c}_0 + \boldsymbol{c}_1\cdot \boldsymbol{s} + \boldsymbol{c}'_0+\boldsymbol{c}'_1\cdot \boldsymbol{s}&= \frac{Q}{t}\left( [\boldsymbol{m}]_t + [\boldsymbol{m}']_t\right) + \boldsymbol{v} + \boldsymbol{v}' + \boldsymbol{\varepsilon } + \boldsymbol{\varepsilon }' \\&= \frac{Q}{t}\left( [\boldsymbol{m}+\boldsymbol{m}']_t + t\boldsymbol{u}\right) + \boldsymbol{v} + \boldsymbol{v}' + \boldsymbol{\varepsilon } + \boldsymbol{\varepsilon }' \\&= \frac{Q}{t}[\boldsymbol{m}+\boldsymbol{m}']_t + \boldsymbol{v} + \boldsymbol{v}' + \boldsymbol{\varepsilon } + \boldsymbol{\varepsilon }' \bmod Q. \end{aligned}$$

Hence the noise after a homomorphic addition is bounded by

$$\begin{aligned} \Vert \boldsymbol{v}_{\texttt {new-add}}\Vert _\infty \le \left\Vert {\boldsymbol{v}}\right\Vert _\infty + \left\Vert {\boldsymbol{v}'}\right\Vert _\infty + 1. \end{aligned}$$
(9)

Note that the decryption bound (8) and the addition bound (9) are now exactly the same as for BGV.

The equations for homomorphic multiplication can be simplified the same way. Denoting \(\boldsymbol{\tilde{v}} = \boldsymbol{v} + \boldsymbol{\varepsilon }\), \(\texttt {ct}_\texttt {tensor}\) is computed as

$$\begin{aligned} (\boldsymbol{c}_0+\boldsymbol{c}_1\cdot \boldsymbol{s})\cdot (\boldsymbol{c}'_0+\boldsymbol{c}'_1\cdot \boldsymbol{s})&= \left( \frac{Q}{t}[\boldsymbol{m}]_t + \boldsymbol{\tilde{v}} + \boldsymbol{k}Q\right) \cdot \left( \frac{Q}{t}[\boldsymbol{m}']_t + \boldsymbol{\tilde{v}}' + \boldsymbol{k}'Q\right) \\&= \frac{Q^2}{t^2}[\boldsymbol{m}\cdot \boldsymbol{m}']_t + \frac{Q}{t}\boldsymbol{v}_{\texttt {new-tensor}} + \frac{Q^2}{t}\boldsymbol{k}_{\texttt {new-tensor}}, \end{aligned}$$

where

$$\begin{aligned} \boldsymbol{v}_{\texttt {new-tensor}} = [\boldsymbol{m}]_t\cdot \boldsymbol{\tilde{v}}' + [\boldsymbol{m}']_t\cdot \boldsymbol{\tilde{v}} + \frac{t}{Q}\boldsymbol{\tilde{v}}\cdot \boldsymbol{\tilde{v}'} + t(\boldsymbol{\tilde{v}}\cdot \boldsymbol{k}' + \boldsymbol{\tilde{v}}'\cdot \boldsymbol{k}), \\ \boldsymbol{k}_{\texttt {new-tensor}} = [\boldsymbol{m}]_t\cdot \boldsymbol{k}' + [\boldsymbol{m}']_t\cdot \boldsymbol{k} + t\boldsymbol{k}\cdot \boldsymbol{k}' + \boldsymbol{r}_m. \end{aligned}$$

Then after the scaling by t/Q and rounding, similarly to the original BFV scheme, the noise of the multiplication is given by: \(\boldsymbol{v}_{\texttt {new-mult}} = \boldsymbol{v}_{\texttt {new-tensor}} + \boldsymbol{v}_r\), and is bounded by:

$$\begin{aligned} \Vert \boldsymbol{v}_{\texttt {new-mult}} \Vert _\infty \le&\frac{\delta _\mathcal {R}t}{2} \left( \frac{2\left\Vert {\boldsymbol{\tilde{v}}}\right\Vert _\infty \left\Vert {\boldsymbol{\tilde{v}}'}\right\Vert _\infty }{Q}+ \left( 4+\delta _\mathcal {R}B_\texttt {key} \right) \left( \left\Vert {\boldsymbol{\tilde{v}}}\right\Vert _\infty +\left\Vert {\boldsymbol{\tilde{v}}'}\right\Vert _\infty \right) \right) \nonumber \\&+ \frac{1 + \delta _\mathcal {R}B_{\texttt {key}} + \delta ^2_\mathcal {R}B^2_{\texttt {key}}}{2}. \end{aligned}$$
(10)

We will see in Sect. 5 that this bound is similar to the bound for BGV.

Similarly to [27], we can derive a bound on the noise after having evaluated a binary tree of depth L from (10). By assuming that the size of the noise of ciphertexts is bounded by V before the first multiplication, the noise of the resulting ciphertext will be bounded by

$$\begin{aligned} C_1^L V + C_2 \sum _{i=0}^{L-1} C_1^i \le C_1^LV + LC_2C_1^{L-1} \end{aligned}$$

with \(C_1 = \delta _\mathcal {R}t(5 + \delta _\mathcal {R}B_{\texttt {key}})\) and \(C_2 = (1 + \delta _\mathcal {R}B_{\texttt {key}} + \delta ^2_\mathcal {R}B^2_{\texttt {key}})/2 + V_{\texttt {ks}}\) where \(V_{\texttt {ks}}\) represents the noise added by the key-switching (see the full version).

Last we want to highlight further the similarity between BGV and BFV. Just like in the GHS variant of BGV, one can encrypt a ciphertext ct in BFV using a slightly larger modulus Qp and then rescale the ciphertext by 1/p, which will have the same effect as modulus switching in BGV:

$$ \texttt {ct}_{\texttt {scale}} = \left\lfloor \frac{1}{p} \texttt {ct} \right\rceil \bmod Q,$$

so that its noise after scaling is bounded by

$$ \Vert \boldsymbol{v}_{\texttt {scale}}\Vert _\infty \,\le \, \frac{\Vert \boldsymbol{v}\Vert _\infty }{p} + \frac{1}{2p} + \frac{1 + \delta _\mathcal {R}B_{\texttt {key}}}{2}. $$

Therefore, like in BGV, this allows to reduce the noise of a freshly encrypted to \((1 + \delta _\mathcal {R}B_{\texttt {key}})/2\approx \delta _\mathcal {R}/2\). Note that the noise benefit of the BFV encryption proposed in our work will become more significant as the fresh noise is several bits larger than the modulus switching noise. Moreover, when using GHS or Hybrid key switching, p can be chosen as one of the moduli of the key-switching basis, and therefore this technique will not impact the selection of Ring-LWE security parameters.

3.2 Modified Homomorphic Multiplication

In the previous subsection, we showed how to instantiate BFV in such a way that it is not worse than BGV in terms of noise growth. Now the main difference left between the two schemes is in the complexity of their homomorphic multiplication procedure. In a nutshell, in BGV the tensoring can be done directly modulo Q while in BFV it must be done without any modular reduction. In practice, this requires using a second RNS basis \(P = p_1\cdots p_{k'}\) such that \(\texttt {ct}(\boldsymbol{s})\cdot \texttt {ct}'(\boldsymbol{s}')\) does not wrap around modulo PQ. More precisely, the critical part in practice is to avoid the wraparound of the dominant term \(Qt\boldsymbol{k}\cdot \boldsymbol{k}'\) modulo P during the scaling (see Sect. 2.2). This requires to choose \(P>t\delta _\mathcal {R}^3 Q/4\), which in practice is satisfied by setting \(k'=k+1\) for the RNS instantiation. Algorithm 1 recalls the original homomorphic multiplication procedure of BFV.

figure a

We propose a new homomorphic multiplication algorithm, with a reduced computational complexity. Instead of multiplying two ciphertexts modulo Q and dealing with a multiple of \(Q^2\) modulo QP, the idea is to switch one of the two ciphertexts to modulus P so that after the tensoring we obtain a multiple of PQ that vanishes modulo PQ. As explained in the above paragraph, this would allow us to reduce the size of P since the original dominant term would now disappear. More precisely, the procedure starts as usual with two ciphertexts encrypted modulo Q:

$$ \texttt {ct}(\boldsymbol{s}) = \frac{Q}{t}[\boldsymbol{m}]_t + \boldsymbol{\tilde{v}} + \boldsymbol{k}Q \text { and } \texttt {ct}'(\boldsymbol{s}) = \frac{Q}{t}[\boldsymbol{m}']_t + \boldsymbol{\tilde{v}}' + \boldsymbol{k}'Q. $$

with \(\boldsymbol{\tilde{v}} = \boldsymbol{v} + \boldsymbol{\varepsilon }\) and \(\boldsymbol{\tilde{v}}' = \boldsymbol{v}' + \boldsymbol{\varepsilon }'\), like in Sect. 3.1.

Then one can perform the modulus switching of one of the two ciphertexts, say \(\texttt {ct}\), to convert it to modulus P by computing

$$ \hat{\texttt {ct}} = \left\lfloor \frac{P}{Q} \texttt {ct}\right\rceil \bmod P, $$

which satisfies:

$$ \hat{\texttt {ct}}(\boldsymbol{s}) = \frac{P}{t}[\boldsymbol{m}]_t + \frac{P\tilde{\boldsymbol{v}}}{Q} + \boldsymbol{k}P + \boldsymbol{\varepsilon }_{\texttt {round}} \bmod P,$$

where the rounding error \(\boldsymbol{\varepsilon }_{\texttt {round}} = \hat{\texttt {ct}}(\boldsymbol{s}) - P\texttt {ct}(\boldsymbol{s}) /Q\) has its norm bounded by \((1+\delta _\mathcal {R}B_\texttt {key})/2\) as usual. From there, \(\hat{\texttt {ct}}\) is expanded from P to QP, \(\texttt {ct}'\) is expanded from Q to QP, and one can perform the tensoring as usual to obtain

$$\begin{aligned} \hat{\texttt {ct}}_\texttt {tensor}(\boldsymbol{s}) =~&\frac{PQ}{t^2}[\boldsymbol{m}\cdot \boldsymbol{m}']_t + \frac{P}{t}\hat{\boldsymbol{v}}_\texttt {tensor}+ \frac{QP}{t}\hat{\boldsymbol{k}}_\texttt {tensor}\\&+ \boldsymbol{\varepsilon }_{\texttt {round}}\cdot \left( \frac{Q}{t}[\boldsymbol{m}']_t + \boldsymbol{\tilde{v}}' + \boldsymbol{k}'Q \right) \bmod PQ, \end{aligned}$$

with \( \hat{\boldsymbol{v}}_\texttt {tensor}= \boldsymbol{v}_{\texttt {new-tensor}}\) from Sect. 3.1 and

$$ \hat{\boldsymbol{k}}_\texttt {tensor}= [\boldsymbol{m}]_t\cdot \boldsymbol{k}' + [\boldsymbol{m}']_t\cdot \boldsymbol{k} + \boldsymbol{r}_m\in \mathcal {R}.$$

Note that this time \(\hat{\boldsymbol{k}}_\texttt {tensor}\) does not contain a multiple of \(\boldsymbol{k}\cdot \boldsymbol{k}'\). Then to get back a valid ciphertext modulo Q, one must scale down the result by t/P, instead of t/Q in the original case, which leads to

$$\begin{aligned} \frac{t}{P}\hat{\texttt {ct}}_\texttt {tensor}(\boldsymbol{s}) =~&\frac{Q}{t}[\boldsymbol{m}\cdot \boldsymbol{m}']_t + \hat{\boldsymbol{v}}_\texttt {tensor}+ Q\hat{\boldsymbol{k}}_\texttt {tensor}+ \frac{t\boldsymbol{\varepsilon }_{\texttt {round}}}{P}\cdot \left( \frac{Q}{t}[\boldsymbol{m}']_t + \boldsymbol{\tilde{v}}' + \boldsymbol{k}'Q \right) . \end{aligned}$$

After the rounding, the multiple of Q will vanish modulo Q and one will have to take into account the rounding error term \(\boldsymbol{v}_r\) of norm bounded by \((1+\delta _\mathcal {R}B_\texttt {key}+ \delta _\mathcal {R}^2 B_\texttt {key}^2)/2\), which adds to the noise, like in the original BFV scheme. Therefore, the noise of this variant of homomorphic multiplication is bounded by

$$\begin{aligned} \Vert \hat{\boldsymbol{v}}_{\texttt {new-mult}} \Vert _\infty \,\le \, \Vert \boldsymbol{v}_{\texttt {new-mult}} \Vert _\infty + \frac{t\delta _\mathcal {R}(\delta _\mathcal {R}B_\texttt {key}+ 1)}{2P}\left( \left\Vert {\tilde{\boldsymbol{v}}}\right\Vert _\infty ' + \frac{Q(\delta _\mathcal {R}B_\texttt {key}+ 4)}{2} \right) . \end{aligned}$$
(11)

Notice that the only difference in the noise growth between Eq. (10) and Eq. (11) is due to rounding error \(\boldsymbol{\varepsilon }_{\texttt {round}}\) occuring during the first modulus switching. However, we can control the size of this noise with P. As explained in Sect. 3.1, the norm of \(\boldsymbol{v}_{\texttt {new-mult}}\) is dominated by \(\delta _\mathcal {R}^2 t V\), where V is the bound on the size of the noise of \(\tilde{\boldsymbol{v}}\) and \(\tilde{\boldsymbol{v}}'\). If we look carefully at the other term and choose \(P\approx Q\), it will be dominated by \(\delta _\mathcal {R}^3t/4\). Since the noise of a fresh ciphertext, even scaled-down after encryption, has always a size larger than \(\delta _\mathcal {R}/2\), this error term will add at most half a bit to the noise of the first multiplication in the worst case, which can be considered as neglible in practice as a larger cushion is typically added to the heuristic expression for \(\delta _R\), or, more precisely, \(\delta _R^3\) in this case. This means that one can choose \(P \approx Q\) and hence \(k' = k\), instead of \(k' = k+1\) in the original case, which reduces the size of P and still achieves the same noise growth as in Sect. 3.1. We summarize our new multiplication algorithm in Algorithm 2.

figure b

Remark 3

Notice that since one must scale the tensored ciphertext by t/P instead of t/Q, it can be done directly in the basis Q. Therefore the homomorphic multiplication procedure requires 2 basis extensions from Q to P for \(\texttt {ct}=(\boldsymbol{c}_0,\boldsymbol{c}_1)\) in the beginning, 4 more basis extensions to expand the two ciphertexts \(\hat{\texttt {ct}}=(\hat{\boldsymbol{c}}_0,\hat{\boldsymbol{c}}_1)\) and \(\texttt {ct}'=(\boldsymbol{c}'_0,\boldsymbol{c}'_1)\) from P and Q, respectively, to PQ. Finally one needs 3 additional basis extensions from PQ to Q for \(\hat{\texttt {ct}}_\texttt {tensor}\in \mathcal {R}_{PQ}^3\) to perform the downscaling modulo Q. Thus it requires a total of 9 basis extensions instead of \(4 + 3 + 3 = 10\) in the original BFV algorithm. Moreover, since P is slightly smaller, each basis extension will be cheaper to compute.

Leveled BFV Multiplication. If one wants to make BFV even closer to BGV in terms of performance, one could consider a “leveled” approach to BFV by working with ciphertexts modulo \(Q_\ell = q_1\cdots q_\ell \) and performing modulus switching as the computation progresses. However, as in BGV, one would have to manage ciphertexts at different levels and deal with more challenging noise estimation.

To keep the usability of BFV, we propose instead a “leveled” multiplication that pre-scales both ciphertexts by \(\frac{Q_\ell }{Q}\) (using internal modulus switching to \(Q_\ell \)), and then multiplies the result by \(\frac{Q}{Q_\ell }\) after the multiplication procedure. In this case, the ciphertexts will always stay modulo Q outside the multiplication procedure, but the multiplication will be done internally modulo \(Q_\ell < Q\) and hence will be more efficient.

In this case, the noise of input ciphertexts after internal modulus switching from Q to \(Q_\ell \) will be equal to

$$ \hat{\boldsymbol{v}} = \frac{Q_\ell }{Q}\boldsymbol{v} + \boldsymbol{\varepsilon }_{\texttt {round}} \text { and } \hat{\boldsymbol{v}}' = \frac{Q_\ell }{Q}\boldsymbol{v}' + \boldsymbol{\varepsilon }'_{\texttt {round}},$$

where \( \boldsymbol{\varepsilon }_{\texttt {round}}\) and \( \boldsymbol{\varepsilon }'_{\texttt {round}}\) have their norm bounded by \((1+\delta _\mathcal {R}B_\texttt {key})/2 \approx \delta _\mathcal {R}/2\). On the one hand, the main consideration here is to choose \(\ell \) such that \(\frac{Q_\ell }{Q}\left\Vert {\boldsymbol{v}}\right\Vert _\infty \) remains significantly larger than \(\left\Vert {\boldsymbol{\varepsilon }_{\texttt {round}}}\right\Vert _\infty \approx \frac{\delta _\mathcal {R}}{2}\), so that the noise brought about by the modulus switching procedure will not significantly impact the overall noise growth. This is equivalent to

$$\begin{aligned} \left\Vert {\boldsymbol{v}}\right\Vert _\infty \gg \frac{Q\delta _\mathcal {R}}{2Q_\ell } \text { and } \left\Vert {\boldsymbol{v}}\right\Vert _\infty ' \gg \frac{Q\delta _\mathcal {R}}{2Q_\ell }, \end{aligned}$$

or in practice

$$\begin{aligned} \left\Vert {\boldsymbol{v}}\right\Vert _\infty> 8 \frac{Q\delta _\mathcal {R}}{Q_\ell } \text { and } \left\Vert {\boldsymbol{v}}\right\Vert _\infty ' > 8 \frac{Q\delta _\mathcal {R}}{Q_\ell }. \end{aligned}$$
(12)

On the other hand, in order to gain as much as possible in efficiency, \(Q_\ell \) must be chosen as the smallest modulus satisfying inequalities (12). Theoretically this requires to have a precise estimate of the average (or lower bound within a certain confidence interval) noise size. But in practice it is enough to add a heuristic “cushion” to our worst-case bound. See the full version for details.

figure c

Remark 4

Note that this “leveled” optimization can be equally applied to key switching. The only difference in this case would be to ensure that the noise of the scaled ciphertext remains larger than the noise brought about by the key-switching procedure itself.

Table 1. Complexities of different multiplication methods.

Remark 5

Modulus switching from Q to \(Q_\ell \) and then from \(Q_\ell \) to \(P_\ell \) in Algorithm 3 can be combined into a single modulus switching from Q directly to \(P_\ell \). This reduces the number of integer multiplications from \((k \ell + \ell ^2 + 2 \ell ) n\) to \((k \ell + \ell ) n\). Note that an approximate modulus switching (instead of an exact one with a floating-point correction technique from [21]) can be employed by adding extra \(\log k\) bits to the noise estimate used for reducing the number of levels inside homomorphic multiplication. Both exact and approximate procedures for switching the moduli of ciphertexts between arbitrary RNS bases are described in the full version.

Table 1 summarizes the computational complexities of different multiplication algorithms by assuming that the extension from Q to QP is performed using the technique from [21] with floating-point operations.

Lazy Scaling in BFV Multiplication. An additional optimization can be implemented by noticing that tensoring and scaling can be separated to optimize some evaluation circuits. For example, consider the inner product circuit of two vectors of ciphertexts. We evaluate it by multiplying (tensoring and scaling) the pairs of ciphertexts and then adding the results \(\pmod {\mathcal {R}_Q}\). It is more efficient to do this in a different way: first we apply the tensoring subroutine to the pairs of ciphertexts, then add the results \(\pmod {\mathcal {R}_{QP}}\), and finally perform the expensive scaling subroutine only once. Indeed, after tensoring we already have the information about the multiplicative noise \(\boldsymbol{v}_\texttt {tensor}\), thus changing the order of scaling and additions does not affect the \(\boldsymbol{v}_\texttt {tensor}\) noise. Moreover, as we perform the scaling down only once instead of doing it for each pair of ciphertexts, the total noise from the inner product is actually reduced. We call this technique lazy scaling and describe the pseudocode in the full version. The experimental results in the full version suggest that this optimization can speed up inner products by more than 2x in practice.

3.3 Improved Decryption in the HPS RNS Variant

A significant practical limitation of the HPS approach is that high-precision (“long double” or even quad-precision) floating-point arithmetic is required to support larger CRT moduli [21]. We introduce a general-purpose digit decomposition technique (inspired by digit decomposition in key switching) and apply it to the HPS decryption procedure to add support for arbitrary CRT moduli using only regular double-precision floating-point arithmetic, hence overcoming this limitation of the HPS variant.

The idea of HPS scaling [21] for decryption can be briefly explained as follows: for \(x\in \mathbb {Z}_{Q}\) with CRT representation \((x_1,\dots ,x_k)\) we want to compute an integer \(y = \left\lceil {t/Q x}\right\rfloor \in \mathbb {Z}_t\), and use a CRT composition formula to derive the following expression:

$$\begin{aligned} \begin{aligned} y :=&\left\lceil {\frac{t}{Q}x}\right\rfloor = \left\lceil {\left( \sum _{i=1}^k x_i \cdot \left[ \left( \frac{Q}{q_i}\right) ^{-1} \right] _{q_i} \cdot \frac{Q}{q_i} \cdot \frac{t}{Q} \right) - u \cdot Q \cdot \frac{t}{Q}}\right\rfloor \\ =&\left[ \left\lceil {\left( \sum _{i=1}^k x_i \cdot \left( \left[ \left( \frac{Q}{q_i}\right) ^{-1} \right] _{q_i} \cdot \frac{t}{q_i} \right) \right) }\right\rfloor \right] _t = \left[ \left( \sum _{i=1}^k x_i \cdot \omega _i \right) + \left\lceil {\sum _{i=1}^k x_i \cdot \theta _i}\right\rfloor \right] _t, \end{aligned} \end{aligned}$$
$$ \text{ where } \left[ {\left( \frac{Q}{q_i}\right) }^{-1} \right] _{q_i} \cdot \frac{t}{q_i} = \omega _i + \theta _i ~\text{ with }~ \omega _i\in \mathbb {Z}_t ~\text{ and }~ \theta _i \in \left[ -\frac{1}{2}, \frac{1}{2}\right) . $$

As we can only store approximate values \(\tilde{\theta }_i = \theta _i + \epsilon _i\), the magnitude of the error term \(|\epsilon '| = |\sum _{i} x_{i}\epsilon _{i}|\) in the fractional part is limited by \(kq_m\epsilon _m\), where \(q_m = \max _i(q_i)\) and \(\epsilon _m = \max _i(\epsilon _i)\). If we restrict the floating-point precision to “doubles”, which are natively supported by modern CPUs, we have to introduce a constraint \(kq_m < 2^{51}\). To support larger CRT moduli, we need “long doubles” or even quad-precision arithmetic: long doubles are needed for CRT moduli from 47 to 58 bits, and quad-precision floats are needed for higher CRT moduli [21].

Our main idea is to perform digit decomposition, somewhat similar to how digit decomposition is done in BV key-switching, to replace the factor \(q_m\) with a smaller digit of it. For base \(B_s\in \mathbb {Z}\), \(B_s \ge 2\), let \(d_s = \left\lceil {\log (q_m)/\log (B_s)}\right\rceil \). Let \(x_i = \sum _{j=0}^{d_s-1} x_{i,j}\cdot B_s^j\) be the \(B_s\) base decomposition of \(x_i\). Then the expression for y can be rewritten as

$$\begin{aligned} \begin{aligned} y =&\left[ \left\lceil {\left( \sum _{i=1}^k\sum _{j=0}^{d_s-1} x_{i,j}\cdot \left( \frac{t}{q_i}\cdot \left[ \left[ {\left( \frac{Q}{q_i}\right) }^{-1} \right] _{q_i} \cdot B_s^j \right] _{q_i} \right) \right) }\right\rfloor \right] _t\\ =&\left[ \left( \sum _{i=1}^k\sum _{j=0}^{d_s-1} \left[ x_{i,j}\cdot \omega _{i,j} \right] _t \right) + \left\lceil {\sum _{i=1}^k\sum _{j=0}^{d_s-1} x_{i,j}\cdot \theta _{i,j}}\right\rfloor \right] _t, \end{aligned} \end{aligned}$$
$$ \text{ where } \frac{t}{q_i}\cdot \left[ \left[ {\left( \frac{Q}{q_i}\right) }^{-1} \right] _{q_i} \cdot B_s^j \right] _{q_i} = \omega _{i,j} + \theta _{i,j},~ ~\text{ with }~\omega _{i,j}\in \mathbb {Z}_t ~\text{ and }~\theta _{i,j} \in \left[ -\frac{1}{2}, \frac{1}{2}\right) . $$

Note that \(\omega _{i,j}\) and \(\theta _{i,j}\) are the new precomputation factors instead of \(\omega _{i}\) and \(\theta _{i}\).

Error Analysis. The magnitude of the error term \(|\epsilon '| = |\sum _{i,j} x_{i_j}\epsilon _{i,j}|\) is now limited by \(\left| \sum _{i,j} x_{i,j} \epsilon _{i,j}\right| < kd_sB_s\epsilon _m\). In practice, our moduli \(q_i\) are normally bounded by 64 (or often by 60) bits. We have already considered the case of \(kq_m < 2^{51}\). If \(kq_m > 2^{51}\), we can take \(d_s=2\), \(B_s = 2^{\left\lceil {\log _2{q_m}/2}\right\rceil }\). Then \(|\epsilon '|< 2^{-19}k< 1/4\) for \(k<2^{17}\). Hence the floating-point error will have no effect on the result for any practically reasonable value of k.

Complexity Analysis. The procedure takes \(kd_s\) floating-point multiplications, \(kd_s\) modular integer multiplications, some modular additions, and one rounding to compute \(\left\lceil {u}\right\rfloor \). However, if \(tkd_sB_s < 2^{64}\), then we can replace modular multiplications and modular additions by plain integer multiplications and additions respectively, and do one modular reduction at the end.

Remark 6

Note that this digit decomposition technique can be applied to other mixed integer/floating-point RNS operations to reduce precision requirements for floating-point arithmetic or avoid extra noise due to floating-point rounding. For instance, it can be used in the scaling for homomorphic multiplication.

4 More Usable BGV Scheme

The practical use of the BGV scheme requires accurate dynamic noise estimation to decide when the modulus operation should be executed, and what scaling factor should be chosen for modulus switching [20]. Each modulus switching decision may significantly impact the noise not only for the current operation, but also for all subsequent operations. An error in a noise estimate may eventually lead even to a decryption failure. Therefore, fine-tuned noise estimation techniques are used to estimate the noise for various operations (see [23] for a more detailed discussion). In contrast, the BFV scheme is much more robust to inaccuracies in noise estimation and typically only requires an upper bound on the error for the desired multiplicative depth. This robustness of BFV is related to the use of the MSD encoding and scaling down by a large factor Q/t during BFV homomorphic multiplication, and the “fragility” of BGV is caused by the LSD encoding and scaling down by a small factor, comparable in magnitude to the noise incurred in operations after the previous modulus switching. For this reason, many modern homomorphic encryption libraries implement BFV as the scheme for finite fields, while only HELib and PALISADE (since quite recently) provide efficient implementations of BGV.

We present an alternative approach for instantiating BGV, which does not require dynamic noise estimation. For this instantiation, one only needs to specify the multiplicative depth of the computation, maximum number of additions per multiplicative level, and the number of additions and automorphisms before first multiplication. Then all moduli \(Q_L, Q_{L-1}, \ldots , Q_1, Q_0\) are chosen so that a small, constant level of noise can be maintained throughout the computation. All modulus switching operations are automatically performed right before a homomorphic multiplication. Conceptually, this BGV instantiation is similar in usability to BFV, where only the multiplicative depth needs to be specified upfront and all “modulus switching” operations are performed automatically.

The logic for choosing the moduli is as follows. We start with a fresh encryption that has a noise \(\left\Vert {\boldsymbol{v}_\texttt {fresh}}\right\Vert _\infty = B_\texttt {err}(2\delta _\mathcal {R}B_\texttt {key}+ 1)\). Then we perform automorphism operations and additions, and apply modulus switching right before the first multiplication. This additional modulus switching before first multiplication allows us to reset the noise to a value comparable to the modulus switching noise, which will be the constant noise level \(\left\Vert {\boldsymbol{v}_{\texttt {c}}}\right\Vert _\infty \) we will maintain throughout the computation. This can be expressed as

$$ \frac{Q_{L}}{Q_{L+1}}\left( (n_{\text {add}}+1) \left\Vert {\boldsymbol{v}_\texttt {fresh}}\right\Vert _\infty + n_{\text {ks}}\left\Vert {\boldsymbol{v}_\texttt {ks}}\right\Vert _\infty \right) + \frac{1+\delta _\mathcal {R}B_\texttt {key}}{2} \le \left\Vert {\boldsymbol{v}_{\texttt {c}}}\right\Vert _\infty , $$

where \(n_{\text {add}}\) and \(n_{\text {ks}}\) are the numbers of additions and automorphism operations, respectively, that are performed before first multiplication, and \(\left\Vert {\boldsymbol{v}_\texttt {ks}}\right\Vert _\infty \) is the bound on key switching noise. Note that here we introduced a new level and corresponding new modulus \(Q_{L+1}\) to account for an extra level we added before first multiplication. It is best to choose \(Q_{L+1}/Q_L\) such that \(\left\Vert {\boldsymbol{v}_{\texttt {c}}}\right\Vert _\infty \approx 1+\delta _\mathcal {R}B_\texttt {key}\) to achieve the smallest constant error because this error will allow us to minimize the subsequent values of \(Q_{i+1}/Q_i\) for most practical scenarios, hence resulting in the minimum value of ciphertext modulus \(Q_{L+1}\) (see the full version for more details, and more general expression for optimal \(\left\Vert {\boldsymbol{v}_{\texttt {c}}}\right\Vert _\infty \)).

Then for multiplication levels (from L to 1), we have to satisfy

$$\begin{aligned}&\frac{Q_{i}}{Q_{i+1}}\left( (n'_{\text {add}}+1) \frac{\delta _\mathcal {R}t}{2} (2\left\Vert {\boldsymbol{v}_{\texttt {c}}}\right\Vert _\infty ^2 + 2\left\Vert {\boldsymbol{v}_{\texttt {c}}}\right\Vert _\infty + 1) + (n'_{\text {ks}}+1) \left\Vert {\boldsymbol{v}_\texttt {ks}}\right\Vert _\infty \right) \\&\qquad \qquad \qquad \qquad \qquad \,\,\,\,\,\,\, +\,\frac{1+\delta _\mathcal {R}B_\texttt {key}}{2} \le \left\Vert {\boldsymbol{v}_{\texttt {c}}}\right\Vert _\infty , \end{aligned}$$

where \(n'_{\text {add}}\) and \(n'_{\text {ks}}\) are the maximum numbers of additions and key switching operations, respectively, allowed per any multiplication level (going from L down to 1). For simplicity we use these maximum values across all levels so that \(Q_{i+1}/Q_i\) could have roughly same value for all \(i \in \{1,\ldots ,L-1\}\). Note that for Hybrid key switching and relatively large plaintext moduli, such as \(t = 2^{16}+1\), which is often used for CRT packing, the multiplication noise is always much higher than \(\left\Vert {\boldsymbol{v}_\texttt {ks}}\right\Vert _\infty \) (see derivations in the full version). Hence for this case we can rewrite the expression as

$$\begin{aligned} \frac{Q_{i}}{Q_{i+1}}\left( (n'_{\text {add}}+1) \frac{\delta _\mathcal {R}t}{2} (2\left\Vert {\boldsymbol{v}_{\texttt {c}}}\right\Vert _\infty ^2 + 2\left\Vert {\boldsymbol{v}_{\texttt {c}}}\right\Vert _\infty + 1) \right) + \frac{1+\delta _\mathcal {R}B_\texttt {key}}{2} \le \left\Vert {\boldsymbol{v}_{\texttt {c}}}\right\Vert _\infty . \end{aligned}$$

The last modulus \(Q_0\) is chosen such that decryption is correct for a ciphertext with noise bounded by \(\left\Vert {\boldsymbol{v}_{\texttt {c}}}\right\Vert _\infty \). This implies that \(Q_0 > 2 t \left\Vert {\boldsymbol{v}_{\texttt {c}}}\right\Vert _\infty - t\).

It is easy to show that once \(L, n_{\text {add}}, n_{\text {ks}}, n'_{\text {add}}\), and \(n'_{\text {ks}}\) (only needed for small t) are given, all moduli \(Q_i\) can be derived. First we compute \(Q_0\), then all values \(Q_1,\ldots ,Q_L\), and finally we can find \(Q_{L+1}\).

This logic is simple to implement and avoids any dynamic noise estimation during the computation. It is also robust to inaccurate estimates as long as the upper bound for \(\delta _R\) is chosen appropriately, which is very similar to what is done for BFV. There is a cost for this simplicity and robustness. The moduli \(Q_{L+1}\) may be larger than the values obtained using the more granular approach with dynamic noise estimation [23] because we use the maximum values of \(n'_{\text {add}}\) and \(n'_{\text {ks}}\) over all intermediate levels. However, our experimental results show that this instantiation of BGV can be significantly faster than the improved BFV implementation described in Sect. 3.

Remarks on the RNS Instantiation. Recall that for original BGV, we choose \(Q = q_0\cdots q_L\) and denote \(Q_i= q_0\cdots q_i\) for \(0\le i\le L\), where all \(q_i = 1 \bmod 2N\) and co-prime to each other. In the case of our BGV variant, an extra \(q_{L+1}\) is introduced to reset the “fresh” noise to modulus switching noise. It is easy to show that for this setup, \(Q_0 = q_0\), \(Q_{i+1}/Q_i = q_{i+1}\), and \(Q_{L+1}/Q_L = q_{L+1}\).

The expressions for finding \(q_0\), \(q_i\), \(q_{L+1}\), where \(i \in \{1, \ldots , L\}\), can be written as follows:

$$\begin{aligned} q_0 > 2 t \left\Vert {\boldsymbol{v}_{\texttt {c}}}\right\Vert _\infty - t, \end{aligned}$$
(13)
$$\begin{aligned} q_i > 2 \left( (n'_{\text {add}}+1) \frac{\delta _\mathcal {R}t}{2} (2\left\Vert {\boldsymbol{v}_{\texttt {c}}}\right\Vert _\infty + 2 + \frac{1}{\left\Vert {\boldsymbol{v}_{\texttt {c}}}\right\Vert _\infty }) + (n'_{\text {ks}}+1) \frac{\left\Vert {\boldsymbol{v}_\texttt {ks}}\right\Vert _\infty }{\left\Vert {\boldsymbol{v}_{\texttt {c}}}\right\Vert _\infty } \right) , \end{aligned}$$
(14)
$$\begin{aligned} q_{L+1} > 2 \left( (n_{\text {add}}+1) \frac{\left\Vert {\boldsymbol{v}_{\texttt {fresh}}}\right\Vert _\infty }{\left\Vert {\boldsymbol{v}_{\texttt {c}}}\right\Vert _\infty } + n_{\text {ks}}\frac{\left\Vert {\boldsymbol{v}_\texttt {ks}}\right\Vert _\infty }{\left\Vert {\boldsymbol{v}_{\texttt {c}}}\right\Vert _\infty } \right) , \end{aligned}$$

where we take \(\left\Vert {\boldsymbol{v}_{\texttt {c}}}\right\Vert _\infty = 1+\delta _\mathcal {R}B_\texttt {key}\).

Handling Crosslevel Operations and Scaling Factors. The GHS variant implemented in HELib uses ciphertext-specific scaling factors, which introduces some complications that may affect the usability and may require additional scalar multiplications to bring two ciphertexts to the same scaling factor. In our BGV variant, we chose a simpler approach where the same scaling factor is used for all ciphertexts at a specific level, which reduces the number of scalar multiplications. This approach was originally introduced for the CKKS scheme in [25], and in our work we adapt it to BGV.

5 Comparison of BFV and BGV

5.1 Noise Growth

When comparing BGV and BFV, it is convenient to use the leveled approach of BGV, first comparing \(Q_0\), then \(Q_i\), and finally \(Q_{L+1}\).

For \(Q_0\), our modified variant of BFV has identical noise as BGV, i.e., Eq. (8) is exactly the same as Eq. (13).

For \(Q_{i+1}/Q_i\), which corresponds to each multiplicative level, the dominant term in the BFV expression given by Eq. (11) is \(\delta ^2_R t B_\texttt {key}V\), where V is the largest of the errors in two multiplied ciphertexts. For BGV, Eq. (14) suggests that the dominant term is \(2 \delta ^2_R t B_\texttt {key}V\). In other words, the expressions for BFV and BGV are identical except for the extra multiplicative factor of 2. This factor appears in BGV because we ensure that at each level the downscaled noise matches the added modulus switching noise, keeping the noise level constant at twice the modulus switching noise (see the full version for details). In the case of BFV, the quadratic noise (product of prior noises for each ciphertext) is negligible as we downscale the ciphertext by a large factor Q/t, and we only observe the pure modulus switching noise. In other words, BFV has a small benefit of using one bit less per multiplication level.

There is also an extra advantage of BFV for small plaintext moduli, e.g., \(t=2\). As the analysis in the full version shows, the key switching noise in this case becomes comparable to multiplication noise for BGV, which implies higher values of \(Q_{i+1}/Q_i\). In contrast, the key switching noise may only affect the initial level in BFV, as afterwards the accumulated noise from prior multiplications will be much higher than additive key switching noise, which is independent of current ciphertext noise. When we switch to larger plaintext moduli, this BFV advantage disappears as the key-switching noise in BGV becomes negligible compared to multiplication noise (as shown in the full version).

Using the \((L+1)\)-th level (\(q_{L+1}\) in the RNS version) is preferred in BGV to achieve the smallest constant noise (twice the modulus switching noise). If \({(L+1)}\)-th level is not used, then the fresh noise will make each \(Q_{i+1}/Q_i\) larger by a factor \(\left\Vert {\boldsymbol{v}_\texttt {fresh}}\right\Vert _\infty /\left\Vert {\boldsymbol{v}_{\texttt {c}}}\right\Vert _\infty \approx 2B_\texttt {err}\approx 37\). Although one could use an auxiliary modulus in hybrid key switching during encryption instead (see the end of Sect. 3.1), extra noise can be accumulated from additions and/or key switching operations performed before first multiplication, which would increase all subsequent \(Q_{i+1}/Q_i\). So the least level of constant noise in BGV, and hence smallest \(Q_{i+1}/Q_i\), can be guaranteed only by introducing a relatively small extra “noise budget” for pseudo-level \(L+1\). Note that in BFV it is best to use an auxiliary modulus to reset the fresh noise to smaller modulus switching noise, without increasing the ciphertext modulus (see Sect. 3.1). Hence no pseudo-level \(L+1\) is needed in BFV, which is another small advantage of BFV over BGV.

In summary, the improved variants of BFV and BGV presented in this work have very similar noise growth, but BFV has some minor advantages over BGV, resulting in somewhat reduced ciphertext moduli needed to support the same computations.

5.2 Computational Complexity

The main difference between BFV and BGV in terms of computational complexity is due to the scaling method used in the multiplication operation. As was previously mentioned, BFV uses the MSD encoding and scales down the tensor product by a large Q/t factor, while BGV uses the LSD encoding technique to scale the tensor product only by a relatively small factor, comparable to the noise of previous multiplication. Considering that the noise growth is very similar in both schemes, one can expect that the computational complexity of BFV multiplication will be significantly higher. The purpose of this section is to quantify this difference, and examine the effect of plaintext moduli on this difference. Note that all other operations, such as addition and automorphism, use the same approach in both schemes, and do not have any significant difference in terms of theoretical complexity. Hence we focus on the operation of multiplication.

The analysis in Sect. 3.2 shows that our leveled BFV multiplication takes \(14\ell \) NTTs and \((4k\ell + 5\ell ^2 + 2k + 18\ell )n\) integer multiplications (we ignore for simplicity a much smaller contribution of floating-point operations). We also add the computational cost of hybrid key switching used for relinearization as there is a difference in its cost between BFV and BGV. For key-switching we assume that the ciphertext element is decomposed into \(d_{\text {num}} = \ell /\alpha \) digits, i.e. each digit is considered modulo \(\alpha \) moduli. The cost of relinearization for BFV is \(4\ell +2\alpha \) NTTs and \(n(3\alpha \ell + 2d_{\text {num}}\ell + 2\alpha + 5\ell )\) integer multiplications (see the full version for details). Here, for simplicity of analysis we assume that \(\ell \) is the same for leveled BFV multiplication and subsequent relinearization. The total cost of multiplication and relinearization in BFV is \(17\ell + 2\alpha \) NTTs and \(n(4k\ell + 5\ell ^2 + 2k + 23\ell + 3\alpha \ell + 2d_{\text {num}}\ell + 2\alpha )\) integer multiplications.

In the case of our BGV variant, the total cost of multiplication includes two modulus switching operations for input ciphertexts, the tensor product, and, finally, the relinearization. The cost of modulus switching is \(4(\ell '+1)\) NTTs and \(4n(\ell ' + 2)\) integer multiplications, where \(\ell '\) is the number of CRT moduli after modulus switching. The cost of tensor product is \(4n\ell '\) integer multiplications. The cost of relinearization in the case of BGV is: \(4 \ell ' + 2 \alpha '\) NTTs and \(n(3\alpha '\ell ' + 2d_{\text {num}}\ell ' + 4\alpha ' + 7\ell ')\) integer multiplications. Hence the total cost of multiplication and relinearization is \(8 \ell ' + 4 + 2 \alpha '\) NTTs and \(n(3\alpha '\ell ' + 2d_{\text {num}}\ell ' + 4\alpha ' + 15\ell ' + 8)\) integer multiplications.

One can observe that the number of NTTs needed for BFV multiplication appears to be 2x or even higher than for BGV. But we should keep in mind that typically \(\ell ' > \ell \). For example, when t = 2, we can even have \(\ell ' > 3 \ell \) since in BFV we work with large (60-bit) moduli vs the moduli of size \(\delta _R^2 t\) (less than 20 bits) in BGV. On the other hand, the cost of integer multiplications in BFV appears to be significantly higher due to multiple basis extension operations. The above may suggest that the complexity of BFV could be lower than for BGV at small t, while more significant benefits of BGV are expected as t is increased, when the ratio of \(\ell '/\ell \) becomes smaller than 2, which corresponds to the typical value of \(t = 2^{16}+1\) used for CRT packing. One could argue that this is essentially due to the assumption that the computations modulo each CRT moduli are implemented on different machine words, which is typically true for practical implementations of homomorphic encryption. As a consequence, while BGV might be practically slower than BFV at small t for classical implementations, we stress that this is only due to the way the CRT representation is usually implemented and that BGV still has a lower theoretical complexity than BFV even for small plaintext moduli.

Remark 7

To reduce even further the computational cost of BGV, one could trunk some CRT moduli together in the same 64-bit machine word. This would allow one to divide the number of moduli \(\ell '\), and thus of NTTs, by a factor of 2 when the moduli are smaller than 30 bits \((t\approx 2^{11})\) and by a factor of 3 when they are smaller than 20 bits \((t\approx 2)\).

Fig. 1.
figure 1

Comparison of homomorphic multiplication runtimes for BFV and BGV variants at various depths as a function of plaintext modulus t. Hybrid key switching with 3 digits, i.e., \(d_{\text {num}} = 3\), was used, and N was set to \(2^{15}\).

5.3 Software Implementation and Experimentation Setup

We implemented all variants of BFV and BGV in PALISADE v1.10.4. The evaluation environment was a commodity desktop computer system with an Intel(R) Core(TM) i7-9700 CPU @ 3.00 GHz and 64 GB of RAM, running Ubuntu 18.04 LTS. The compiler was g++ 9.3.0. All experiments were executed in the single-threaded mode.

PALISADE includes the implementation of both BEHZ and HPS variants of BFV. The runtime results and noise growth for both variants are roughly the same (as shown in Sect. 5.4). We chose the HPS variant as the main RNS variant for our BFV modifications due to its relative simplicity. We denote our modified BFV variant as BFV-NEW, our modified BFV variant with leveled multiplication as BFV-NEW-LVL, and our BGV variant as BGV-NEW. Note that our implementation does not trunk small CRT moduli in BGV for small values of t, i.e., it does not include the optimization suggested in Remark 7.

5.4 Performance Comparison

Figure 1 illustrates the comparison of homomorphic multiplication runtimes for the BFV and BGV variants developed in this work to the baseline for the prior state-of-the-art BFV implementation of the HPS variant [21]. The first major observation is that BFV-NEW-LVL outperforms BGV-NEW for small plaintext moduli (at least up to depth 20), while BGV-NEW runs significantly faster than BFV-NEW-LVL for intermediate and large plaintext moduli, i.e., \(t = 2^{16} + 1\) and \(t = 2^{30}-2^{18}+1\). This observation is in agreement with our theoretical complexity analysis in Sect. 5.2 since our implementation does not include the optimization suggested in Remark 7, i.e., small moduli are not trunked together. The second significant observation is that our best BFV variant, labeled as BFV-NEW-LVL, speeds up the runtime of deeper multiplications (depth-20 for \(t=2\) and depth-10 for higher t) by 3x-4x, as compared to the BFV baseline.

Table 2 shows the comparison of noise growth and runtimes for a binary tree computation ranging in multiplicative depth from 1 to 7. First, we want to point out that the noise growth and runtimes for the BEHZ and HPS variants are very close, with HPS having somewhat better runtime efficiency, which agrees well with the noise analysis in [6] and runtime comparison in [3]. In view of this, we chose HPS as the main variant for our BFV improvements (but similar gains can be expected for the BEHZ variant). Our next observation is that BGV has a slightly faster noise growth, as compared to all BFV variants, with the difference in noise increasing with depth (as predicted in Sect. 5.1). Note that the original BFV variants have somewhat higher noise (by almost constant number of bits) as compared to our BFV variants because they do not use the technique of encrypting with a slightly larger modulus Qp, followed with scaling by p. Our final observation is that \(\texttt {BGV-NEW}\) has a minor speed-up over the best BFV variant for the chosen plaintext modulus \(t=2^{16}+1\). Note that the speed-up is observed only for this or higher plaintext moduli, with \(\texttt {BFV-NEW-LVL}\) becoming faster for \(t=2\) (see the full version for details). Tables in the appendix of the full version also show the more significant effect of \(r_t(Q)\) on noise magnitude at larger plaintext moduli for the original BFV, as theoretically predicted in Sect. 3.

Table 2. Comparison of noise growth and runtimes of BFV and BGV variants for a benchmark computation \(\prod _{i=1}^{2^k} x_i\). Hybrid key switching with 3 digits, i.e., \(d_{\text {num}} = 3\), was used, t was set to \(2^{16}+1\), and \(\lambda \ge 128\). Here, e denotes the current noise magnitude, \(\log Q\), the size of the BFV ciphertext modulus, and \(\log Q_L\), the equivalent ciphertext modulus in BGV without the last CRT modulus \(q_{L+1}\).

Table 3 illustrates the comparison of noise growth and runtimes for a polynomial evaluation benchmark. Our first observation is that BGV-NEW has a significantly higher noise than all BFV variants because the moduli \(q_i\) in this case require extra room for the additions at each level (the deepest level has the most significant effect on all \(q_i\)’s). BGV-NEW again has a minor advantage in terms of runtime as compared to our best BFV variant for \(t=2^{16}+1\), but BFV-NEW-LVL becomes faster when we decrease t to smaller values (see the full version for details). Note that for \(k=8\), BGV-NEW has a smaller ring dimension than all BFV variants, which is an effect of the automated logic for hybrid key switching used in the implementation, rather than a result of better noise growth in BGV (since \(\log Q\) in BFV is significantly smaller than \(\log Q_L\) in BGV).

Table 3. Comparison of noise growth and runtimes of BFV and BGV variants for a benchmark computation \(\prod _{i=0}^{k} a_ix^i\): \(|a_i| < 16\). Hybrid key switching with 3 digits, i.e., \(d_{\text {num}} = 3\), was used, t was set to \(2^{16}+1\), and \(\lambda \ge 128\). Here, e denotes the current noise magnitude, \(\log Q\), the BFV ciphertext modulus, and \(\log Q_L\), the equivalent ciphertext modulus in BGV without the last CRT modulus \(q_{L+1}\).

6 Concluding Remarks

Our theoretical analysis and experimental results show that the modified BFV  variant has somewhat better noise growth than BGV for all plaintext moduli, though previous results suggested that BGV has a better noise growth than BFV for larger plaintext moduli [13, 14]. This result is mainly due to our modification of the BFV encryption procedure. The other major conclusion is that, when the moduli of BGV  are not trunked together, BFV is significantly faster for small plaintext moduli, e.g., \(t=2\), with BGV becoming faster as the plaintext modulus is increased.

The variant of BGV presented in this paper was mainly motivated by improving the usability of the scheme, which is known to be more challenging for use than BFV. From this perspective, this BGV variant is as easy to use as the implementation of BFV in PALISADE. However, the usability also has some performance cost, e.g., we have to choose the size of CRT moduli more conservatively. It would be interesting to examine how the performance of our BGV variant compares to the BGV design with dynamic noise estimation, which is implemented in HElib. It would not be fair to compare the PALISADE implementation directly with the HElib implementation as one would mostly observe the effect of differences in the efficiency of primitive ring operations, such as NTTs, rather than the differences between the BGV variants. For a fair comparison, a PALISADE implementation of the dynamic-noise BGV variant would be needed. Another potential improvement for BGV is to consider the idea of trunking multiple small CRT moduli mentioned in Remark 7. We plan to examine both ideas in our future work.