
1 Introduction

The position of integer factorization and the discrete logarithm problem as a cornerstone for asymmetric cryptography is being threatened by quantum computers, as their ability to efficiently solve these mathematical problems compromises the security of current asymmetric primitives. These developments have led to the emergence of post-quantum cryptography and motivated NIST to organize a post-quantum cryptography standardization process, with the goal of standardizing one or more quantum-resistant public-key cryptographic primitives. Submissions originate from various fields within post-quantum cryptography, such as lattice-based, code-based and multivariate cryptography.

Lattice-based cryptography has recently developed into one of the main research areas in post-quantum cryptography. Lattice-based submissions to the NIST Post-Quantum process can be broadly put into one of two categories: NTRU-based schemes (e.g. [39, 47]) and schemes based on the learning with errors (LWE) hard problem [36]. A lot of research has been done on their security, such as provable post-quantum secure transformations from IND-CPA to IND-CCA secure schemes [25, 29, 38, 46], security estimates of post-quantum primitives [3, 4] and provable reductions for various hard problems underlying the schemes [7, 11, 32, 35, 36].

A striking observation is that numerous proposed Key Encapsulation Mechanisms (KEMs) have a small failure probability during decryption, in which the involved parties fail to derive a shared secret key. This is the case for the majority of schemes based on lattices, codes or Mersenne primes. The probability of such a failure varies from \(2^{-64}\) in Ramstake [45] to \(2^{-216}\) in New Hope [41], with most of the failure probabilities lying around \(2^{-128}\). As this failure is dependent on the secret key, it might leak secret information to an adversary. However, reducing this probability has a price, as the parameters should be adjusted accordingly, resulting in a performance loss. An approach used by some schemes is to use error-correcting codes to decrease the failure probability. This reduces the communication overhead, but makes the scheme more vulnerable to side-channel attacks.

As suggested by the wide range of failure probabilities in the NIST submissions, the implications of failures are still not well understood. In the absence of a clear evaluation technique for the impact of the failure rate, most NIST submissions have chosen a bound on the decryption failure probability around \(2^{-128}\) based on educated guessing. As far as we know, only NIST-submission Kyber [40] provides an intuitive reasoning for its security against decryption failure attacks, but this approximation is not tight. They introduce a methodology that uses Grover’s search algorithm to find ciphertexts that have a relatively high probability of triggering a decryption failure.

1.1 Related Works

The idea of exploiting decryption errors has been around for a long time and applies to all areas of cryptography [9]. For lattice-based encryption systems, the Ajtai-Dwork scheme and NTRU have been a target for attacks using decryption failures. Hall, Goldberg, and Schneier [23] developed a reaction attack which recovers the Ajtai-Dwork private key by observing decryption failures. Hoffstein and Silverman [24] adapted the attack to NTRU and suggested modifying NTRU to use the Fujisaki-Okamoto transform [18] to protect against such attacks. Further work in this direction is given in [28], [26] and [19]. Fluhrer [17] described an attack against Ring-Learning with Errors (RLWE) schemes. In [15] his work was extended to more protocols and in [8] a chosen-ciphertext attack on the proposal HILA5 [37] was given, using decryption failures.

These attacks are chosen-ciphertext attacks on proposals with only IND-CPA-security and can be thwarted using an appropriate transformation to a chosen-ciphertext secure scheme, such as the Fujisaki-Okamoto transformation [18]. Hofheinz et al. [25] and later Jiang et al. [29] proved a bound on the impact of the failure rate on an IND-CCA secure KEM that is constructed using this transformation, but their bounds are squared in the failure probability in the quantum oracle setting, which seems a very conservative result. Guo, Johansson and Stankovski [22] proposed a key-recovery attack against the IND-CCA-secure version of QC-MDPC, which is a code-based scheme. It uses a distinguishing property that “colliding pairs” in the noise and the secret can change the decryption failure rate.

1.2 Contributions

In this paper we investigate the requirements for KEMs to resist decryption failure cryptanalysis. Having better security estimates can benefit the parameter selection process, resulting in improved security and efficiency. We focus on IND-CCA secure KEMs based on the (Ring/Module-)Learning with Errors and (Ring/Module-)Learning with Rounding paradigms. Nonetheless, the general method can also be applied to investigate the impact of failures on other schemes.

The exploitation of decryption failures of an IND-CCA secure cryptographic scheme proceeds in two main steps: obtaining ciphertexts that result in a decryption failure and estimating the secret based on these ciphertexts. In the first step, an adversary can use failure boosting to find ‘weak’ input vectors that artificially enlarge the failure rate of the scheme. In Sect. 3, we examine how an adversary can generate these ‘weak’ ciphertexts that increase the failure probability. We provide a theoretical framework and a Python implementation to calculate an estimate of the minimum effort required for an adversary to obtain one failing ciphertext.

Once ciphertexts that trigger a decryption failure are collected, they can be used to estimate the secret. In Sect. 4, we study how much information is leaked by the collected failures. We develop a statistical model to estimate the secret from the failures and determine the residual entropy of the secret after a certain number of failures is collected. The estimate of the secret can be used to construct an easier problem that can be solved faster.

Section 5 combines failure boosting and the secret estimation technique from Sect. 4 to estimate the security of schemes based on (Ring/Module)-Learning with Errors and (Ring/Module)-Learning with Rounding under an attack exploiting decryption failures. We show that an attacker could significantly reduce the security of some schemes if he is able to perform sufficient decryption queries. However, for most NIST submissions, the number of decryption queries required is above practical limits.

In Sect. 6 we propose a new generic weak-key (multi-target) model exploiting the fact that a fraction of keys employed can have much higher error probability if the chosen weak ciphertexts satisfy certain key-related properties. The detailed attack procedure is similar to the attack discussed in the previous sections. It first consists of a precomputation phase where special messages and their corresponding error vectors are generated. Secondly, the messages are submitted for decryption and some decryption errors are observed. Finally, a phase with a statistical analysis of the messages/errors causing the decryption errors reveals the secret key.

In Sect. 7 we apply the model to ss-ntru-pke, a version of NTRUEncrypt targeting the security level of NIST-V. The proposed attack is an adaptive CCA attack with complexity below the claimed security level. We provide a Rust implementation where parts of the attack are simulated.

2 Preliminaries

2.1 Notation

Let \({\mathbb {Z}}_q\) be the ring of integers modulo q represented in \((-q/2,q/2]\), let \(R_q\) denote the ring \({\mathbb {Z}}_q[X] / (X^n + 1)\) and let \(R^{k_1 \times k_2}_q\) denote the ring of \(k_1 \times k_2\) matrices over \(R_q\). Matrices will be represented with bold uppercase letters, while vectors are represented in bold lowercase. Let \({\varvec{A}}_{ij}\) denote the element on the \(i^\text {th}\) row and \(j^\text {th}\) column of matrix \({\varvec{A}}\), and let \({\varvec{A}}_{ijk}\) denote the \(k^\text {th}\) coefficient of this element. Denote with \({\varvec{A}}_{:j}\) the \(j^{\text {th}}\) column of \({\varvec{A}}\).

The rounding operation \(\lfloor x \rceil _{q \rightarrow p}\) is defined as \(\lfloor p/q \cdot x \rceil \in {\mathbb {Z}}_p\) for an element \(x \in {\mathbb {Z}}_q\), while \(\mathtt {abs}(\cdot )\) takes the absolute value of the input. These operations are extended coefficient-wise for elements of \(R_q\) and \(R^{k_1 \times k_2}_q\). The two-norm of a polynomial \(a \in R_q\) is written as \(\left||a \right||_2\) and defined as \(\sqrt{\sum _i a_i^2}\), which is extended to vectors as \(\left||{\varvec{a}} \right||_2 = \sqrt{\sum _i \left||{\varvec{a}}_i \right||_2^2}\). The notation \(a \leftarrow \chi (R_q)\) will be used to represent the sampling of \(a \in R_q\) according to the distribution \(\chi \). This can be extended coefficient-wise for \({\varvec{A}} \in R^{k_1 \times k_2}_q\) and is denoted as \({\varvec{A}} \leftarrow \chi (R^{k_1 \times k_2}_q)\). Let \({\mathcal {U}}\) be the uniform distribution. Denote with \(\chi _1*\chi _2\) the convolution of the two distributions \(\chi _1\) and \(\chi _2\), and denote with \(\chi ^{*n} = \underbrace{\chi * \chi * \chi * \cdots * \chi * \chi }_n\) the \(n^{th}\) convolutional power of \(\chi \).
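As a small illustration of this notation (with arbitrary example values of q and p, not taken from any particular scheme), the rounding operation can be written in Python as follows:

```python
def round_q_to_p(x, q, p):
    """The rounding operation floor(x)_{q->p} = round(p/q * x) in Z_p,
    applied coefficient-wise when x is a list of coefficients."""
    if isinstance(x, list):
        return [round_q_to_p(xi, q, p) for xi in x]
    return round(p * x / q) % p

# example: going from q = 2^13 to p = 2^10 keeps roughly the 10 most
# significant bits of every coefficient
print(round_q_to_p([0, 5, 4097, 8191], 2**13, 2**10))  # -> [0, 1, 512, 0]
```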

2.2 Cryptographic Definitions

A Public Key Encryption (PKE) is defined as a triple of functions PKE = \((\mathtt {KeyGen}, \mathtt {Enc}, \mathtt {Dec} )\), where the key generation \(\mathtt {KeyGen}\) returns a secret key sk and a public key pk, where the encryption \(\mathtt {Enc}\) produces a ciphertext c from the public key pk and the message \(m \in {\mathcal {M}}\), and where the decryption \(\mathtt {Dec}\) returns the message \(m'\) given the secret key sk and the ciphertext c.

A Key Encapsulation Mechanism (KEM) consists of a triple of functions KEM = \((\mathtt {KeyGen}, \mathtt {Encaps}, \mathtt {Decaps})\), where \(\mathtt {KeyGen}\) generates the secret and public keys sk and pk respectively, where \(\mathtt {Encaps}\) generates a key \(k \in {\mathcal {K}}\) and a ciphertext c from a public key pk, and where \(\mathtt {Decaps}\) requires the secret key sk, the public key pk and the ciphertext c to return a key k or the decryption failure symbol \(\perp \). The security of a KEM can be defined under the notion of indistinguishability under chosen ciphertext attacks (IND-CCA),

$$ \text {Adv}^{\text {ind-cca}}_{\text {KEM}}(A) = \left| P \left( b' = b : \begin{array}{c} (pk, sk) \leftarrow \mathtt {KeyGen}( ),\ b \leftarrow {\mathcal {U}}(\{0,1\}) , \\ (c, d, k_0) \leftarrow \mathtt {Encaps}(pk), \\ k_1 \leftarrow {\mathcal {K}},\ b' \leftarrow A^{\mathtt {Decaps}}(pk, c, d, k_b), \end{array} \right) - \frac{1}{2} \right| \, . $$

2.3 LWE/LWR Problems

The decisional Learning with Errors problem (LWE) [36] consists of distinguishing a uniform sample \(({\varvec{A}}, {\varvec{U}}) \leftarrow {\mathcal {U}}({\mathbb {Z}}^{k_1 \times k_2}_q \times {\mathbb {Z}}^{k_1 \times m}_q)\) from an LWE-sample \(({\varvec{A}}, {\varvec{B}}={\varvec{A}} {\varvec{S}} + {\varvec{E}})\), where \({\varvec{A}} \leftarrow {\mathcal {U}}({\mathbb {Z}}^{k_1 \times k_2}_q)\) and where the secret vectors \({\varvec{S}}\) and \({\varvec{E}}\) are generated from the small distributions \(\chi _s({\mathbb {Z}}^{k_2 \times m}_q)\) and \(\chi _e({\mathbb {Z}}^{k_1 \times m}_q)\) respectively. The search LWE problem states that it is hard to recover the secret \({\varvec{S}}\) from the LWE sample.

This definition can be extended to Ring- or Module-LWE [30, 32] by using vectors of polynomials. In this case, the problem is to distinguish the uniform sample \(({\varvec{A}}, {\varvec{U}}) \leftarrow {\mathcal {U}}(R^{k_1 \times k_2}_q \times R^{k_1 \times m}_q)\) from a generalized LWE sample \(({\varvec{A}}, {\varvec{B}}={\varvec{A}} {\varvec{S}} + {\varvec{E}})\) in which \({\varvec{A}} \leftarrow {\mathcal {U}}(R^{k_1 \times k_2}_q)\) and where the secret vectors \({\varvec{S}}\) and \({\varvec{E}}\) are generated from the small distributions \(\chi _s(R^{k_2 \times m}_q)\) and \(\chi _e(R^{k_1 \times m}_q)\) respectively. Analogous to the LWE case, the search problem is to recover the secret \({\varvec{S}}\) from a generalized LWE sample.

The decisional generalized Learning with Rounding (LWR) problem [7] is defined as distinguishing the uniform sample \(({\varvec{A}}, \lfloor {\varvec{U}} \rceil _{q \rightarrow p})\), with \({\varvec{A}} \leftarrow {\mathcal {U}}(R^{k_1 \times k_2}_q)\) and \({\varvec{U}} \leftarrow {\mathcal {U}}(R^{k_1 \times m}_q)\) from the generalized LWR sample \(({\varvec{A}}, {\varvec{B}}=\lfloor {\varvec{A}} {\varvec{S}} \rceil _{q \rightarrow p})\) with \({\varvec{A}} \leftarrow {\mathcal {U}}(R^{k_1 \times k_2}_q)\) and \({\varvec{S}} \leftarrow \chi _s(R^{k_2 \times m}_q)\). In the analogous search problem, one has to find \({\varvec{S}}\) from a generalized LWR sample.

2.4 (Ring/Module-)LWE Based Encryption

Let \(\mathtt {gen}\) be a pseudorandom generator that expands \(\text {seed}_{{\varvec{A}}}\) into a uniformly distributed matrix \({\varvec{A}} \in R^{k \times k}_q\). Define \(\mathtt {enc}\) as an encoding function that transforms a message m into a polynomial representation, and \(\mathtt {dec}\) as the inverse decoding function. A general (Ring/Module-)LWE based PKE, consisting of a key generation, an encryption and a decryption phase, can then be constructed as described in Algorithms 1, 2 and 3 respectively. The randomness required for the generation of the secrets \({\varvec{S}}'_B\), \({\varvec{E}}'_B\) and \({\varvec{E}}''_B\) during the encryption is generated pseudorandomly from the uniformly distributed seed r that is given as an input.

[Algorithms 1–3: key generation, encryption and decryption of the general (Ring/Module-)LWE based PKE]
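Since the algorithm listings are not reproduced here, the following minimal Python sketch illustrates the structure of Algorithms 1–3 for the special case of a Ring-LWE scheme (\(l=1\)) without rounding (\(p=q=t\)); all parameters and distributions are toy values chosen for readability, not those of any submission.

```python
import numpy as np

n, q = 256, 2**13                 # toy ring parameters (assumption)
rng = np.random.default_rng(0)

def poly_mul(a, b):
    """Multiplication in R_q = Z_q[X]/(X^n + 1) (negacyclic convolution)."""
    res = np.zeros(2 * n, dtype=np.int64)
    for i in range(n):
        res[i:i + n] += a[i] * b
    return (res[:n] - res[n:]) % q

def small():
    """Small secret/error polynomial (uniform in [-2, 2] as a toy chi)."""
    return rng.integers(-2, 3, n)

def keygen():                      # Algorithm 1 (sketch)
    a = rng.integers(0, q, n)      # A, in practice expanded from seed_A by gen()
    s, e = small(), small()        # S_A and E_A
    b = (poly_mul(a, s) + e) % q   # B = A*S_A + E_A (no rounding, p = q)
    return (a, b), s

def encrypt(pk, m_bits):           # Algorithm 2 (sketch)
    a, b = pk
    s1, e1, e2 = small(), small(), small()                # S'_B, E'_B, E''_B
    b1 = (poly_mul(a, s1) + e1) % q                       # B' = A*S'_B + E'_B
    v1 = (poly_mul(b, s1) + e2 + (q // 2) * m_bits) % q   # V' = B*S'_B + E''_B + enc(m)
    return b1, v1

def decrypt(sk, ct):               # Algorithm 3 (sketch)
    b1, v1 = ct
    d = (v1 - poly_mul(b1, sk)) % q          # V' - V
    d = np.where(d > q // 2, d - q, d)       # center around 0
    # dec(): a bit decodes incorrectly when the noise exceeds q_t = q/4
    return (np.abs(d) > q // 4).astype(int)

pk, sk = keygen()
m = rng.integers(0, 2, n)
assert np.array_equal(decrypt(sk, encrypt(pk, m)), m)
```

With these toy parameters the noise term stays far below the decoding threshold, so decryption failures are essentially impossible; real schemes operate much closer to the threshold, which is exactly what the attacks in this paper exploit.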

Using this general framework, specific schemes can be described with appropriate parameter choices. When the ring \(R_q\) is chosen as \({\mathbb {Z}}_q\), the encryption is LWE-based, as can be seen in FrodoKEM [33] and Emblem [42]. A value of \(l=1\) indicates a Ring-LWE based scheme, including New Hope [5], LAC [31], LIMA [43] or R.Emblem [42]. If \(l \ne 1\) and \(R_q \ne {\mathbb {Z}}_q\), the scheme is based on the Module-LWE hard problem, such as Kyber [10]. When referring to Kyber throughout this paper, we will consider the original version that includes rounding. The special case that \(\chi _e = 0\) corresponds to (Module/Ring)-LWR-based schemes such as Round2 [6] and Saber [13]. In Lizard [12], a combination of an LWE and an LWR problem is proposed. In most (Ring/Module-)LWE based schemes, \(q=p\) and no rounding is performed in the calculation of \({\varvec{B}}\) and \({\varvec{B}}'\), while t is typically much smaller than q, leading to a drastic rounding of \({\varvec{V}}'\).

We define \({\varvec{U}}_A\), \({\varvec{U}}'_B\) and \({\varvec{U}}''_B\) as the errors introduced by the rounding operations, which is formalized as follows:

$$\begin{aligned} {\varvec{U}}_A= & {} {\varvec{A}} {\varvec{S}}_A + {\varvec{E}}_A - {\varvec{B}}_r \,, \end{aligned}$$
(1)
$$\begin{aligned} {\varvec{U}}'_B= & {} {\varvec{A}}^T {\varvec{S}}'_B + {\varvec{E}}'_B - {\varvec{B}}'_r \,, \end{aligned}$$
(2)
$$\begin{aligned} {\varvec{U}}''_B= & {} {\varvec{B}}_r^T {\varvec{S}}'_B + {\varvec{E}}''_B + \mathtt {enc}(m) - {\varvec{V}}'_r \,. \end{aligned}$$
(3)

Let \({\varvec{S}}\) be the vector constructed as the concatenation of the vectors \(-{\varvec{S}}_A\) and \({\varvec{E}}_A+{\varvec{U}}_A\), let \({\varvec{C}}\) be the concatenation of \({\varvec{E}}'_B+{\varvec{U}}'_B\) and \({\varvec{S}}'_B\), and let \({\varvec{G}}={\varvec{E}}''_B+{\varvec{U}}''_B\). An attacker that generates ciphertexts can compute \({\varvec{C}}\) and \({\varvec{G}}\) and tries to obtain information about \({\varvec{S}}\). These variables are summarized below:

$$\begin{aligned} {\varvec{S}} = \begin{pmatrix} - {\varvec{S}}_A \\ {\varvec{E}}_A+{\varvec{U}}_A \end{pmatrix}, \quad {\varvec{C}} = \begin{pmatrix}{\varvec{E}}'_B+{\varvec{U}}'_B \\ {\varvec{S}}'_B \end{pmatrix},\quad {\varvec{G}}={\varvec{E}}''_B+{\varvec{U}}''_B \,. \end{aligned}$$
(4)

After the execution of this protocol, the two parties will arrive at the same key if the decoding \(\mathtt {dec}({\varvec{V}}'_r-{\varvec{V}})\) equals m. The term \({\varvec{V}}'_r-{\varvec{V}}\) can be rewritten as \(({\varvec{E}}_A + {\varvec{U}}_A)^T {\varvec{S}}'_B - {\varvec{S}}_A^T ({\varvec{E}}'_B + {\varvec{U}}'_B) + ({\varvec{E}}''_B+{\varvec{U}}''_B) + \mathtt {enc}(m) = {\varvec{S}}^T{\varvec{C}}+{\varvec{G}}+ \mathtt {enc}(m)\). The message can be recovered if and only if \(\mathtt {abs}({\varvec{S}}^T{\varvec{C}}+{\varvec{G}}) < q_t\) for a certain scheme-dependent threshold \(q_t\).

We will say that a (decryption) failure occurred if the parties do not arrive at a common key due to a coefficient of \(\mathtt {abs}({\varvec{S}}^T{\varvec{C}}+{\varvec{G}})\) that is larger than \(q_t\), and will define \(F({\varvec{C}}, {\varvec{G}})\) as the probability of a decryption failure given \({\varvec{C}}\) and \({\varvec{G}}\) averaged over all \({\varvec{S}}\), which can be expressed as \(\sum _{{\varvec{S}}} P(\mathtt {abs}({\varvec{S}}^T{\varvec{C}} + {\varvec{G}}) > q_t \ | \ {\varvec{S}}) P({\varvec{S}})\).

2.5 Fujisaki-Okamoto Transformation

Using the Fujisaki-Okamoto transform [18, 25], one can transform a chosen plaintext secure PKE to an IND-CCA secure KEM. On top of the encryption from the PKE, the KEM defines an encapsulation and decapsulation function as described in Algorithms 4 and 5, using hash functions \({\mathcal {H}}\) and \({\mathcal {G}}\).

[Algorithms 4–5: encapsulation and decapsulation of the resulting KEM]
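The following Python sketch shows the generic shape of Algorithms 4 and 5: encapsulation derives the encryption randomness from the message, and decapsulation re-encrypts to verify the ciphertext. It assumes a deterministic pke_enc(pk, m, r) and a pke_dec(sk, c) operating on byte strings (for instance a serialized version of the PKE sketched in Sect. 2.4); the concrete hash construction is illustrative only and differs between submissions.

```python
import hashlib, os

def H(*parts):
    """Hash helper standing in for the hash functions of the transform."""
    h = hashlib.sha3_256()
    for p in parts:
        h.update(p)
    return h.digest()

def encaps(pk, pke_enc):                   # Algorithm 4 (sketch)
    m = os.urandom(32)                     # random message
    r = H(b"coins", pk, m)                 # deterministic encryption randomness
    c = pke_enc(pk, m, r)
    return c, H(b"key", m, c)              # ciphertext and shared key

def decaps(sk, pk, c, pke_enc, pke_dec):   # Algorithm 5 (sketch)
    m = pke_dec(sk, c)
    r = H(b"coins", pk, m)
    if pke_enc(pk, m, r) != c:             # re-encryption check
        return None                        # reject (the failure symbol)
    return H(b"key", m, c)
```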

3 Weak-Ciphertext Failure Boosting

In this section, we will develop a method to estimate the minimum amount of work to obtain one ciphertext that triggers a decryption failure. In contrast to an honest party that generates ciphertexts randomly, an attacker can search for ciphertexts that have a higher failure probability than average, which will be called ‘weak’. As \({\varvec{C}}\) and \({\varvec{G}}\) are the only terms with which an attacker can influence decryption failures, the search for weak ciphertexts boils down to the search for weak \(({\varvec{C}}, {\varvec{G}})\). However, the pair \(({\varvec{C}}, {\varvec{G}})\) is generated through a hash H() with random seed r, and during decryption it is checked whether the generator of the ciphertext knew the preimage r of \(({\varvec{C}}, {\varvec{G}})\). Therefore an attacker is forced to resort to a brute force search, which can be sped up at most quadratically using Grover’s algorithm [20].

To find a criterion for our search, we sort all possible \(({\varvec{C}}, {\varvec{G}})\) according to increasing failure probability \(F({\varvec{C}}, {\varvec{G}})\). This list can then be divided into two sets using a threshold failure probability \(f_{t}\): weak vectors with a failure probability higher than or equal to \(f_{t}\), and strong vectors with a lower failure probability. Let f() be the deterministic function that generates \({\varvec{C}}\) and \({\varvec{G}}\) from the random seed r. For a certain \(f_t\), we can calculate the probability of generating a weak pair: \(\alpha = P( F({\varvec{C}}, {\varvec{G}})>f_t \,|\, r \leftarrow {\mathcal {U}}, ({\varvec{C}}, {\varvec{G}}) = f(H(r)) )\), and the probability of a decryption failure when a weak pair is used: \(\beta = P( \mathtt {abs}({\varvec{S}}^T{\varvec{C}}+{\varvec{G}})> q_t \,|\, r \leftarrow {\mathcal {U}}, ({\varvec{C}}, {\varvec{G}}) = f(H(r)), F({\varvec{C}}, {\varvec{G}})>f_t )\).

The amount of work for an adversary to find a weak pair \(({\varvec{C}}, {\varvec{G}})\) is proportional to \(\alpha ^{-1}\), but can be sped up quadratically using Grover’s algorithm on a quantum computer, resulting in an expected workload of \(\sqrt{\alpha ^{-1}}\). On the other hand, the probability of a decryption failure given a weak pair cannot be improved using quantum computation assuming that the adversary has no quantum access to the decryption oracle. This assumption is in agreement with the premise in the NIST Post-Quantum Standardization Call for Proposals [2]. The expected work required to find a decryption failure given \(f_t\) is therefore the expected number of queries using weak ciphertexts times the expected amount of work to find a weak ciphertext, or \((\alpha \cdot \beta )^{-1}\) with a classical computer and \((\sqrt{\alpha } \cdot \beta )^{-1}\) with a quantum computer. An optimization over \(f_t\) gives the minimal effort to find one decryption failure.

3.1 Practical Calculation

For most schemes, the full sorted list \(({\varvec{C}}, {\varvec{G}})\) is not practically computable, but using some observations and assumptions, an estimate can be found. The next three steps aim at calculating the distribution of the failure probability \(F({\varvec{C}}, {\varvec{G}})\), i.e. what is the probability of finding a \(({\varvec{C}}, {\varvec{G}})\) pair with a certain failure probability f. This distribution gives enough information to calculate \(\alpha \) and \(\beta \) for a certain \(f_t\).

First, we can remove the hash H(.) in the probability expression by assuming that the output of f(H(.)) on random input r behaves according to the probability distributions \((\chi _{C}, \chi _{G})\), resulting in: \(\alpha = P( F({\varvec{C}}, {\varvec{G}})>f_t \,|\, ({\varvec{C}}, {\varvec{G}}) \leftarrow (\chi _{C}, \chi _{G}) )\) and \(\beta = P( \mathtt {abs}({\varvec{S}}^T{\varvec{C}}+{\varvec{G}})> q_t \,|\, ({\varvec{C}}, {\varvec{G}}) \leftarrow (\chi _{C}, \chi _{G}), F({\varvec{C}}, {\varvec{G}})>f_t )\).

Secondly, we assume that the coefficients of \({\varvec{S}}^T{\varvec{C}}\) are normally distributed, which is reasonable as each coefficient is a sum of \(2(l\cdot n)\) terms whose distributions are somewhat close to a Gaussian. The coefficients of the polynomial \(({\varvec{S}}^T{\varvec{C}})_{ij}\) will be distributed with mean \(\mu =0\) because of symmetry around 0, while the variance can be calculated as follows, after defining \(\chi _{e+u}\) as the distribution of the coefficients of \({\varvec{E}}_A+{\varvec{U}}_A\):

$$\begin{aligned} \mathtt {var}( ({\varvec{S}}^T{\varvec{C}})_{ijk} )= & {} \mathtt {var}(\sum _{i=0}^{l-1} \sum _{k=0}^{n-1} {\varvec{C}}_{ijk} s_{ijk} + \sum _{i=l}^{2l-1} \sum _{k=0}^{n-1} {\varvec{C}}_{ijk} e_{ijk}) \end{aligned}$$
(5)
$$\begin{aligned}&\text {where: } s_{ijk} \leftarrow \chi _{s} \text { and } e_{ijk} \leftarrow \chi _{e+u} \end{aligned}$$
(6)
$$\begin{aligned}= & {} \sum _{i=0}^{l-1} \sum _{k=0}^{n-1} {\varvec{C}}_{ijk}^2 \mathtt {var}( \chi _s ) + \sum _{i=l}^{2l-1} \sum _{k=0}^{n-1} {\varvec{C}}_{ijk}^2 \mathtt {var}( \chi _{e+u} )\end{aligned}$$
(7)
$$\begin{aligned}= & {} ||({\varvec{E}}'_B+{\varvec{U}}'_B)_{:j} ||_2^2 \mathtt {var}( \chi _s ) + ||({\varvec{S}}'_B)_{:j} ||_2^2 \mathtt {var}( \chi _{e+u} ) \,. \end{aligned}$$
(8)

Therefore, the variance of the coefficients of \({\varvec{S}}^T{\varvec{C}}\) for a given \({\varvec{C}}\) is the same for all coefficients in the same column. This variance will be denoted as \(\sigma ^2_{ j}\) for coefficients in the \(j^\text {th}\) column of \({\varvec{S}}^T{\varvec{C}}\). Furthermore, following the Gaussian assumption, the failure probability given \(\sigma ^2_j\) is the same as the failure probability given the \(j^\text {th}\) column of \({\varvec{C}}\).

In the third step we gradually calculate the distribution of the failure probability. We start from the distribution of the failure probability of the coefficient at the \(ijk^{\text {th}}\) position given \(\sigma _j\), denoted with \(\chi _{coef\, | \, \sigma }\). This distribution expresses the probability of finding a \({\varvec{G}}\) so that the failure probability is equal to \(f_{ijk}\) given a certain value of \({\varvec{C}}\) (or equivalently \(\sigma ^2_j\)) and can be expressed as follows:

$$\begin{aligned}&P( f_{ijk} \, | \, {\varvec{G}} \leftarrow \chi _G, {\varvec{C}}) \,, \end{aligned}$$
(9)

where:

$$\begin{aligned} f_{ijk}&= P( \mathtt {abs}({\varvec{S}}^T{\varvec{C}}+{\varvec{G}})_{ijk} > q_t \, | \, {\varvec{G}}, {\varvec{C}}) \end{aligned}$$
(10)
$$\begin{aligned}&\approx P( \mathtt {abs}(x +{\varvec{G}}_{ijk}) > q_t \, | \, {\varvec{G}}, x \leftarrow {\mathcal {N}}(0,\sigma ^2_{j}), \sigma ^2_{j} ) \,. \end{aligned}$$
(11)

The distribution \(\chi _{col\, | \, \sigma }\), which models the probability of a failure in the \(j^{\text {th}}\) column of \(\mathtt {abs}({\varvec{S}}^T{\varvec{C}}+{\varvec{G}})\) given \(\sigma ^2_j\), can be calculated using the convolution of the distributions of the mn individual coefficient failures \(\chi _{coef\, | \, \sigma }\) as follows:

$$\begin{aligned} \chi _{col \, | \, \sigma } = \chi _{coef \, | \, \sigma }^{* nm } \,. \end{aligned}$$
(12)

The conditioning on \(\sigma ^2_{j}\) is necessary to counter the dependency between the coefficients of the columns of \(\mathtt {abs}({\varvec{S}}^T{\varvec{C}}+{\varvec{G}})\), which are dependent as a result of sharing the same variance \(\sigma ^2_{j}\).

The distribution of failure probabilities in the \(j^{\text {th}}\) column of \({\varvec{S}}^ T {\varvec{C}}\), denoted as \(\chi _{col}\), can then be calculated using a weighted average over the possible values of \(\sigma ^2_j\) as follows:

$$\begin{aligned} \chi _{col} = \sum _{l_c} P( f \, | \, f \leftarrow \chi _{col \, | \, \sigma } ) P(\sigma ^2_{j} = l_c) \,. \end{aligned}$$
(13)

Finally we can calculate the full failure distribution \(\chi _{\text {FAIL}}\) as the convolution of the m probability distributions corresponding to the failure distributions of the different columns. This convolution does not have the dependency on \(\sigma ^2_{j}\) as failures of different columns are independent conditioned on \({\varvec{C}}\) and \({\varvec{G}}\), therefore:

$$\begin{aligned} \chi _{\text {FAIL}} = \chi _{col}^{* m } \,. \end{aligned}$$
(14)

From this failure distribution, we can calculate \(\alpha \) and \(\beta \) for an arbitrary value of \(f_t\):

$$\begin{aligned} \alpha&= P( f > f_t \, | \, f \leftarrow \chi _{\text {FAIL}}) \,, \end{aligned}$$
(15)
$$\begin{aligned} \beta&= \frac{\sum _{f>f_t} f \cdot P( f \, | \, f \leftarrow \chi _{\text {FAIL}}) }{\alpha } \,. \end{aligned}$$
(16)
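The three steps above can be turned into a short script. The sketch below is a simplified stand-in for the published Python implementation: it uses an invented toy parameter set, buckets failure probabilities on a logarithmic grid, and approximates the failure probability of a column by the sum of the (rare) per-coefficient failure probabilities, which is what the convolutions express.

```python
import numpy as np
from math import erfc, sqrt, log2

# toy parameters, purely illustrative (far smaller than any real scheme)
n, l, m = 32, 1, 1                         # ring dimension, rank, columns of C/G
q_t     = 48                               # failure threshold
chi_C   = {-1: 0.25, 0: 0.5, 1: 0.25}      # coefficient distribution of C (both halves)
chi_G   = {-8: 0.25, 0: 0.5, 8: 0.25}      # coefficient distribution of G
var_s   = 0.5                              # var(chi_s) = var(chi_{e+u}) in this toy
BUCKET  = 0.1                              # log2 bucket width for failure probabilities

def tail(x):                               # P(N(0,1) > x)
    return 0.5 * erfc(x / sqrt(2))

def norm2_dist(dist, k):
    """Distribution of the squared norm of k iid coefficients drawn from dist."""
    out = {0: 1.0}
    for _ in range(k):
        new = {}
        for s2, p in out.items():
            for c, pc in dist.items():
                new[s2 + c * c] = new.get(s2 + c * c, 0.0) + p * pc
        out = new
    return out

def bucket(f):
    return None if f <= 0 else round(log2(f) / BUCKET) * BUCKET

def convolve(d1, d2):
    """Distribution of the sum of two independent failure probabilities."""
    out = {}
    for k1, p1 in d1.items():
        for k2, p2 in d2.items():
            k = bucket((2**k1 if k1 is not None else 0) + (2**k2 if k2 is not None else 0))
            out[k] = out.get(k, 0.0) + p1 * p2
    return out

def power(dist, k):
    """k-fold convolution, as in Eq. (12)/(14)."""
    out = {None: 1.0}
    for _ in range(k):
        out = convolve(out, dist)
    return out

chi_col = {}
for s2, p_s2 in norm2_dist(chi_C, 2 * l * n).items():    # distribution of sigma_j^2
    sigma = sqrt(max(s2 * var_s, 1e-12))                 # Eq. (8) with equal variances
    chi_coef = {}                                        # chi_{coef | sigma}, Eq. (9)-(11)
    for g, pg in chi_G.items():
        b = bucket(tail((q_t - g) / sigma) + tail((q_t + g) / sigma))
        chi_coef[b] = chi_coef.get(b, 0.0) + pg
    for k, p in power(chi_coef, n * m).items():          # chi_{col | sigma}, Eq. (12)
        chi_col[k] = chi_col.get(k, 0.0) + p * p_s2      # Eq. (13)
chi_fail = power(chi_col, m)                             # Eq. (14)

# alpha and beta for every threshold f_t (Eq. (15)-(16)) and the optimal work
items = sorted((2**k, p) for k, p in chi_fail.items() if k is not None)
fs, ps = np.array([f for f, _ in items]), np.array([p for _, p in items])
alpha = np.cumsum(ps[::-1])[::-1]
beta  = np.cumsum((fs * ps)[::-1])[::-1] / alpha
print("min classical work 2^%.1f, min quantum work 2^%.1f"
      % (np.log2(1 / (alpha * beta)).min(), np.log2(1 / (np.sqrt(alpha) * beta)).min()))
```

For a real scheme one would substitute the actual distributions and threshold and optimize \(f_t\) over the resulting curve, as in Fig. 1.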

We want to stress that this calculation is not exact, mainly due to the Gaussian assumption in the second step. More accurate estimates could be obtained with a more accurate approximation in step 2, tailored for a specific scheme. In this case, the assumptions and calculations of step 1 and step 3 remain valid. For the estimations of LAC [31] in subsequent paragraphs, we followed their approach for the calculation of the effect of the error correcting code. Note that this is not an exact formula as the inputs of the error correcting code are correlated through their polynomial structure.

In Fig. 1 we compare the values of \(\alpha \) and \(\beta \) calculated using the technique described above, with exhaustively tested values on a variant of LAC128 without error correction. For step 2 of the practical calculation, we use both a Gaussian approximation as well as a binomial approximation that is more tailored for LAC. We can observe that our estimation of the effect of failure boosting is relatively close to reality.

Fig. 1. The failure rate of one weak ciphertext (\(\beta \)) as a function of the work required to generate one weak ciphertext (\(\alpha \)) on a classical computer for LAC128 without error correction.

3.2 Applications of Failure Boosting

Failure boosting is a useful technique in at least three scenarios: first, if there is no multi-target protection, second, if the adversary can only perform a limited number of queries to the decryption oracle and third, if the adversary has access to a quantum computer.

In some (Ring/Module-)LWE/LWR schemes, the seed r is the only input to the pseudorandom generator that generates \({\varvec{C}}\) and \({\varvec{G}}\). This paves the way to a multi-target attack where precomputed weak values of r can be used against multiple targets: after choosing the parameter \(f_t\), the adversary can generate weak ciphertexts in approximately \(\alpha ^{-1}\) time (\(\sqrt{\alpha ^{-1}}\) if he has access to a quantum computer). Each precomputed sample then has a failure probability of \(\beta \) against every target. Figure 2 shows the failure probability of one weak ciphertext versus the amount of work to generate that ciphertext on a classical computer. Multi-target protection, for example by including the public key into the generation of \({\varvec{C}}\) and \({\varvec{G}}\) as proposed in Kyber [10] and Saber [13], is a relatively cheap option to resolve this issue.

Fig. 2. The failure rate of one weak ciphertext (\(\beta \)) as a function of the work required to generate one weak ciphertext (\(\alpha \)) on a classical computer.

If the adversary can only perform a limited number of decryption queries, for example \(2^{64}\) in the NIST Post-Quantum Standardization Call for Proposals [2], the adversary can use failure boosting to reduce the number of required decryption queries. To this end, he chooses the parameter \(f_t\) so that the inverse of the failure probability \(\beta ^{-1}\) equals the decryption query limit \(n_d\), which results in a probability of finding a decryption failure of approximately \(1-e^{-1} \approx 0.63\). To find i failures with similar probability, the failure probability should be increased so that \(\beta ^{-1} = n_d/i\). Since the amount of work to generate one input of the decryption query is approximately \(\alpha ^{-1}\) (\(\sqrt{\alpha ^{-1}}\) quantumly), the total amount of work expected is \(\alpha ^{-1} \beta ^{-1}\) (\(\sqrt{\alpha ^{-1}}\beta ^{-1}\) quantumly). Figure 3 shows the expected total amount of work to find one decryption failure with a classical computer, versus the failure rate of one weak ciphertext.
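As a purely illustrative example (the numbers are hypothetical and not taken from any submission): if \(f_t\) is chosen such that \(\beta = 2^{-64}\) and the corresponding weak-ciphertext density is \(\alpha = 2^{-30}\), then an adversary limited to \(n_d = 2^{64}\) queries finds one failure with probability about 0.63 at an expected cost of \(\alpha ^{-1} \beta ^{-1} = 2^{94}\) classically, or \(\sqrt{\alpha ^{-1}} \beta ^{-1} = 2^{79}\) with Grover-accelerated precomputation.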

Fig. 3. The expected amount of work (\(\alpha ^{-1} \beta ^{-1}\)) on a classical computer, as a function of the failure rate of one weak ciphertext (\(\beta \)). The red dotted line indicates a failure rate of \(2^{-64}\).

An adversary with a quantum computer always benefits from failure boosting, as the search for weak ciphertexts can be sped up using Grover’s algorithm. However, this speedup is not quadratic if the adversary has no quantum access to the decryption oracle. Figure 4 shows the total amount of expected work to find one decryption failure, versus the amount of work to find one weak ciphertext on a quantum computer \(\sqrt{\alpha ^{-1}}\).

Fig. 4. The expected amount of work (\(\sqrt{\alpha ^{-1}} \beta ^{-1}\)) as a function of the work required to generate one weak ciphertext (\(\sqrt{\alpha ^{-1}}\)) on a quantum computer.

4 Estimation of the Secret

Finding a decryption failure does not immediately break the security of the KEM, but it does provide extra information to an adversary. In this section we will investigate how much this information leaks about the secret. An adversary that has obtained ciphertexts that produce decryption failures can use them to make an estimation of the secret \({\varvec{S}}\).

When a failure occurs, we know that at least one coefficient of \(\mathtt {abs}({\varvec{S}}^T{\varvec{C}}+{\varvec{G}})\) is larger than the threshold \(q_t\). This leads to a classification of the coefficients into the set of fail coefficients \(v_f\) and the set of no-fail coefficients \(v_s\). To each coefficient at position (ijk), a vector of integers \({\varvec{s}}\) can be associated by taking the coefficients of \({\varvec{S}}_{:i}\). Similarly, the coefficient can be linked to a vector of integers \({\varvec{c}}\) calculated as a function of \({\varvec{C}}_{:j}\) and k, so that the multiplication \({\varvec{s}}{\varvec{c}}\) equals that coefficient.

No-fail vectors will contain negligible information about the secret \({\varvec{s}}\), but failure vectors do carry clues, as the threshold-exceeding value of the coefficients of \({\varvec{S}}^T{\varvec{C}}+{\varvec{G}}\) implies a correlation between the corresponding \({\varvec{c}}\) and \({\varvec{s}}\). This correlation can be positive, in case of a large positive value of the coefficient, or negative, in case of a large negative value of the coefficient. Consequently, the fail coefficients can be further divided into the sets of positive \(v_{fp}\) and negative \(v_{fn}\) fail coefficients. Moreover, negative fail vectors can be transformed into positive fail vectors by multiplication with \(-1\). Note that failure coefficients at position (ijk) will only contain information about the \(j^\text {th}\) column of \({\varvec{S}}\), which is why the estimation of the columns of \({\varvec{S}}\) can be performed independently.

4.1 One Positive Failure Vector

We will first examine the case where we know one positive fail vector \({\varvec{c}}\) and associated coefficient \({\varvec{G}}_{i,j,k}\), which we will denote with g. This corresponds to the case where one failing ciphertext and the location and sign of the error is known. The question is how much the knowledge about \({\varvec{c}}\) and g can improve our estimate of the associated secret \({\varvec{s}}\). Applying Bayes’ theorem and assuming independence between the coefficients of \({\varvec{c}}\) and \({\varvec{s}}\) that are on different positions, we can write:

$$\begin{aligned} P({\varvec{s}}_i \,|\, {\varvec{c}}, g, {\varvec{s}}{\varvec{c}}> q_t-g)\approx & {} P({\varvec{s}}_i \,|\, {\varvec{c}}_i, g, {\varvec{s}}{\varvec{c}} > q_t-g) \end{aligned}$$
(17)
$$\begin{aligned}= & {} \frac{P({\varvec{s}}{\varvec{c}}> q_t-g \,|\, {\varvec{s}}_i, {\varvec{c}}_i, g)P({\varvec{s}}_i \,|\, {\varvec{c}}_i, g)}{P({\varvec{s}}{\varvec{c}} > q_t-g \,|\, {\varvec{c}}_i, g)} \end{aligned}$$
(18)
$$\begin{aligned}= & {} \frac{P(\sum ^{j \ne i}_j {\varvec{s}}_j {\varvec{c}}_j> q_t-g-{\varvec{s}}_i {\varvec{c}}_i \,|\, {\varvec{s}}_i, {\varvec{c}}_i, g)P({\varvec{s}}_i)}{P({\varvec{s}}{\varvec{c}} > q_t-g \,|\, {\varvec{c}}_i, g )} \,. \end{aligned}$$
(19)

The improved estimates for the coefficients of \({\varvec{s}}\) can in turn be used to get an estimate \({\varvec{s}}_{est}\) that minimizes its variance \(E[({\varvec{s}}_{est}-{\varvec{s}})^2]\) as follows:

$$\begin{aligned} 0= & {} \frac{d}{d{\varvec{s}}_{est,i}} E( ({\varvec{s}}_{est, i}-{\varvec{s}}_i)^2 ) \end{aligned}$$
(20)
$$\begin{aligned}= & {} 2 \sum _{{\varvec{s}}_i} ({\varvec{s}}_{est, i}-{\varvec{s}}_i) P({\varvec{s}}_i) \,, \end{aligned}$$
(21)
$$\begin{aligned} \text {or:} \quad {\varvec{s}}_{est, i}= & {} \sum _{{\varvec{s}}_i} {\varvec{s}}_i \cdot P({\varvec{s}}_i) \,. \end{aligned}$$
(22)

The estimate of \({\varvec{s}}\) gives the estimate of the \(j^{\text {th}}\) column of \({\varvec{S}}\), which can be divided trivially in an approximation \({\varvec{S}}_{A, est}\) of \(({\varvec{S}}_{A})_{:j}\) and \({\varvec{E}}_{A, est}\) of \(({\varvec{E}}_A +{\varvec{U}}_A)_{:j}\). These vectors can be used to transform the original (Ring/Module-)LWE/LWR sample \(({\varvec{A}}, {\varvec{A}} ({\varvec{S}}_A)_{:j} + ({\varvec{E}}_A +{\varvec{U}}_A)_{:j} )\) into a (Ring/Module-)LWE alike problem with a smaller secret variance by subtracting \({\varvec{A}}{\varvec{S}}_{A, est} + {\varvec{E}}_{A, est}\). This results in the sample \(({\varvec{A}}, {\varvec{A}} (({\varvec{S}}_A)_{:j}-{\varvec{S}}_{A, est}) + ({\varvec{E}}_A +{\varvec{U}}_A)_{:j} - {\varvec{E}}_{A, est})\), which is a problem with smaller secret \(({\varvec{S}}_A)_{:j}-{\varvec{S}}_{A, est}\) and noise \(({\varvec{E}}_A +{\varvec{U}}_A)_{:j} - {\varvec{E}}_{A, est}\). We will call this new problem the simplified problem.
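The estimation itself can be sketched in a few lines of Python; the secret distribution, the threshold and the fail vector below are invented toy values, and the sum over the remaining positions is approximated as a zero-mean Gaussian, as in Eq. (19).

```python
import numpy as np
from math import erfc, sqrt

vals  = np.array([-2, -1, 0, 1, 2])          # toy support of chi_s (assumption)
probs = np.array([1, 4, 6, 4, 1]) / 16.0     # centered binomial, eta = 2
var_s = float(np.sum(probs * vals**2))

def tail(x):                                 # P(N(0,1) > x)
    return 0.5 * erfc(x / sqrt(2))

def posterior(c, g, q_t):
    """P(s_i | c, g, s.c > q_t - g) for every position i (Eq. (19)), with the
    sum over the other positions approximated as a zero-mean Gaussian."""
    post = np.zeros((len(c), len(vals)))
    for i, ci in enumerate(c):
        sigma = sqrt(max(var_s * (np.sum(c**2) - ci**2), 1e-12))
        lik = np.array([tail((q_t - g - si * ci) / sigma) for si in vals])
        post[i] = lik * probs
        post[i] /= post[i].sum()
    return post

def mmse_estimate(post):
    """Variance-minimizing estimate s_est,i = sum_s s * P(s) (Eq. (22))."""
    return post @ vals

# demo: condition on a hypothetical positive failure (s.c > q_t - g with g = 0)
rng = np.random.default_rng(1)
c = rng.choice(vals, size=256, p=probs).astype(float)
est = mmse_estimate(posterior(c, g=0, q_t=48))
print(est[:8])   # estimates are pulled towards the sign of the corresponding c_i
```

Positions where \(|{\varvec{c}}_i|\) is large receive the strongest update, which is why failing ciphertexts with large entries are the most informative.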

4.2 Multiple Fail Vectors

Having access to m positive fail vectors \({\varvec{c}}^{(1)} \dots {\varvec{c}}^{(m)}\) from the same column, with corresponding values of G as \(g^{(1)} \dots g^{(m)}\), an adversary can improve his estimate of \(P({\varvec{s}})\) and therefore obtain a better estimate \({\varvec{s}}_{est}\), assuming that the failure vectors \({\varvec{c}}_i\) are independent conditioned on \({\varvec{s}}\). This corresponds to knowing m failing ciphertexts and the location and sign of their errors.

$$\begin{aligned} P({\varvec{s}}_i \,|\, {\varvec{c}}^{(1)} \dots {\varvec{c}}^{(m)}, g^{(1)} \dots g^{(m)} )\approx & {} P({\varvec{s}}_i \,|\, {\varvec{c}}^{(1)}_{i} \dots {\varvec{c}}^{(m)}_{i}, g^{(1)} \dots g^{(m)} ) \end{aligned}$$
(23)
$$\begin{aligned}= & {} \frac{P({\varvec{c}}^{(1)}_{i} \dots {\varvec{c}}^{(m)}_{i} \,|\, {\varvec{s}}_i, g^{(1)} \dots g^{(m)} ) P( {\varvec{s}}_i \,|\, g^{(1)} \dots g^{(m)} )}{P({\varvec{c}}^{(1)}_{i} \dots {\varvec{c}}^{(m)}_{i} \,|\, g^{(1)} \dots g^{(m)})} \end{aligned}$$
(24)
$$\begin{aligned}= & {} \frac{ P( {\varvec{s}}_i ) \prod _{k=1}^m P({\varvec{c}}^{(k)}_{i} \,|\, {\varvec{s}}_i, g^{(k)} ) }{\prod _{k=1}^m P({\varvec{c}}^{(k)}_{i} \,|\, g^{(k)})} \,. \end{aligned}$$
(25)

Similar to Eq. 19, \(P({\varvec{c}}_{i} \,|\, {\varvec{s}}_i, g^{(k)} )\) can be calculated as:

$$\begin{aligned} P({\varvec{c}}_i \,|\, {\varvec{s}}_i, g, {\varvec{s}}{\varvec{c}}> q_t-g)= & {} \frac{P({\varvec{s}}{\varvec{c}}> q_t-g \,|\, {\varvec{s}}_i, {\varvec{c}}_i, g)P({\varvec{c}}_i \,|\, {\varvec{s}}_i, g)}{P({\varvec{s}}{\varvec{c}} > q_t-g \,|\, {\varvec{s}}_i, g)} \end{aligned}$$
(26)
$$\begin{aligned}= & {} \frac{P(\sum ^{j \ne i}_j {\varvec{s}}_j {\varvec{c}}_j> q_t-g-{\varvec{s}}_i {\varvec{c}}_i \,|\, {\varvec{s}}_i, {\varvec{c}}_i, g)P({\varvec{c}}_i)}{P({\varvec{s}}{\varvec{c}} > q_t-g \,|\, {\varvec{s}}_i, g )} \,. \end{aligned}$$
(27)

In subsequent calculations, each value of the coefficient of g is taken as the maximum possible value.
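Building on the sketch of Sect. 4.1, the combination of Eq. (25) can be approximated as follows; as a simplification, the \({\varvec{s}}_i\)-dependent normalization of Eq. (27) is treated as a constant, which is reasonable since a single coefficient barely changes the overall failure probability.

```python
def combined_posterior(cs, gs, q_t):
    """Approximate P(s_i | c^(1)..c^(m), g^(1)..g^(m)) of Eq. (25) by multiplying
    the per-failure likelihoods and applying the prior once (reuses posterior(),
    vals and probs from the sketch in Sect. 4.1)."""
    log_post = np.zeros((cs.shape[1], len(vals)))
    for c, g in zip(cs, gs):
        # divide out the prior that posterior() already applied per failure
        log_post += np.log(np.maximum(posterior(c, g, q_t), 1e-300)) - np.log(probs)
    post = np.exp(log_post - log_post.max(axis=1, keepdims=True)) * probs
    return post / post.sum(axis=1, keepdims=True)
```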

4.3 Classification of Vectors

The above approach assumes prior knowledge of the exact position and sign of the errors. This information is needed to link coefficients of \({\varvec{C}}\) with their corresponding coefficient of \({\varvec{S}}\). However, this is not always a trivial problem. For most schemes there are three sources of extra information that allow an adversary to perform this classification with high probability using only a few decryption failures.

Firstly, a large coefficient of \({\varvec{G}}\) would induce a higher failure probability for the corresponding coefficient of the error term \({\varvec{S}}^T{\varvec{C}}+{\varvec{G}}\). Thus, failures are more likely to happen at positions linked to that coefficient of \({\varvec{G}}\). Moreover, a positive value of the coefficient suggests a positive error so that \({\varvec{c}} \in v_{fp}\), while a negative value hints at a negative error, or \({\varvec{c}} \in v_{fn}\).

Secondly, as vectors \({\varvec{c}} \in v_f\) are correlated with the secret \({\varvec{s}}\), they are also correlated with each other. Therefore, vectors \({\varvec{c}} \in v_{f}\) are more correlated with each other than with vectors \({\varvec{c}} \in v_{s}\). Moreover, a high positive correlation suggests that the vectors share the same class \(v_{fp}\) or \(v_{fn}\), while a high negative correlation indicates that the vectors belong to different classes. This allows for a clustering of the fail vectors using the higher-than-average correlation, under the condition that the correlation difference is high enough. This correlation difference is related to the failure rate: a low failure rate implies a higher correlation, because only ciphertexts that are highly correlated with the secret lead to a failure in this case. For example, Fig. 5 shows an estimate of the correlations between vectors of the classes \(v_{fp}\) (pos), \(v_{fn}\) (neg) and \(v_s\) (nofail) in Kyber768. This approach does not work for schemes with strong error correcting codes (ECC) such as LAC, as the bit error rate before correction is relatively high for these types of algorithms, leading to a relatively low correlation between failure vectors.

Fig. 5. The probability of a certain value of the correlation between different classes of vectors in Kyber768.
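The clustering argument can be checked with a small simulation (toy dimensions and an invented threshold, so the failure rate is far higher than in any real scheme): vectors that caused a failure for the same secret show a visibly higher mutual correlation than fail/no-fail pairs.

```python
import numpy as np

rng = np.random.default_rng(2)
n, q_t, trials = 256, 48, 20000        # toy parameters (assumption)

def cbd(shape):                        # centered binomial with eta = 2
    b = rng.integers(0, 2, size=(4,) + shape, dtype=np.int8)
    return (b[0] + b[1] - b[2] - b[3]).astype(float)

s = cbd((n,))                          # toy secret column
C = cbd((trials, n))                   # candidate ciphertext vectors
fail = C @ s > q_t                     # positive decryption failures
F, N = C[fail], C[~fail][:1000]        # fail vectors and some no-fail vectors

def mean_corr(A, B):
    An = A / np.linalg.norm(A, axis=1, keepdims=True)
    Bn = B / np.linalg.norm(B, axis=1, keepdims=True)
    M = An @ Bn.T
    if A is B:                         # drop the trivial self-correlations
        M = M[~np.eye(len(A), dtype=bool)]
    return M.mean()

print("fail-fail   mean correlation: %.4f" % mean_corr(F, F))
print("fail-nofail mean correlation: %.4f" % mean_corr(F, N))
```

In this toy setting the fail/fail correlations average around the square of the correlation between a fail vector and the secret, while fail/no-fail pairs average around zero.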

In the case of a ring/module structure of the coefficients of \({\varvec{S}}\), an additional structure arises, leading to an artifact in which some pairs of no-fail coefficients within the same polynomial also have a high correlation between their corresponding vectors. Imagine a pair of failure coefficients at positions \((i,j,k_1)\) and \((i, j, k_2)\) from different decryption failures a, b, with corresponding matrices \({\varvec{C}}^{(a)}\) and \({\varvec{C}}^{(b)}\). The correlation of the vectors \({\varvec{c}}^{(a)}\) and \({\varvec{c}}^{(b)}\) can be written as \(X^{k_1}{\varvec{C}}^{(a)T}_{:,j} X^{k_2} {\varvec{C}}^{(b)}_{:,j} = X^{k_1+k_2} {\varvec{C}}^{(a)T}_{:,j} {\varvec{C}}^{(b)}_{:,j}\), from which it is clear that the vectors from \({\varvec{C}}^{(a)}\) and \({\varvec{C}}^{(b)}\) with respective positions \((i,j,k_1-t)\) and \((i, j, k_2+t)\) have the same correlation. The clustering will thus result in n classes, with one class containing the failure vectors. Combining this information with the information of the first method gives an adversary the failure vectors with high probability. Otherwise, an adversary can estimate the secret n times and check the validity of the result using the (Ring/Module-)LWE/LWR problem.

Finally, for schemes that use error correcting codes to reduce their failure probability, side channel leakage during the error correction might reveal information on the presence or position of failure coefficients. Note that if this is the case, it might not even be necessary to obtain a decryption failure since failing coefficients could also be collected on successful decryptions where there is at least one failing coefficient.

4.4 Implications

Figure 6 depicts the relative variance reduction of the secret as a function of the number of positive failure vectors for various schemes. For schemes that have a very low failure probability for individual coefficients of \({\varvec{S}}^T{\varvec{C}}+{\varvec{G}}\), such as Kyber, Saber and FrodoKEM, the variance of the secret drastically reduces upon knowing only a few failing ciphertexts. Assuming that the simplified problem, which takes into account the estimate of the secret, has the same complexity as a regular (Ring/Module-)LWE problem with similar secret variance, we can calculate the remaining hardness of the simplified problem as a function of the number of positive failure vectors as shown in Fig. 7, using the toolbox provided by Albrecht et al. [4] with the Q-core sieve estimate.

The effectiveness of the attack declines as the failure probability of the individual coefficients increases, since the correlation between the secret and the failing ciphertext is lower in this case. This can be seen in the case of LAC, where the individual coefficients have relatively high failure rates due to a strong error correcting code. On the other hand, a failing ciphertext will contain multiple errors, making it easier to collect multiple failure vectors.

Note that once one or more failures are found, they can be used to estimate the secret. This estimate in turn can be used to improve the search for weak ciphertexts by considering \(F({\varvec{C}}, {\varvec{G}})\) as \(\sum _{{\varvec{S}}} P(\text {FAIL}({\varvec{C}}, {\varvec{G}}), {\varvec{S}})\), where \({\varvec{S}}\) is not sampled from \(\chi _{{\varvec{S}}}\), but from the new probability distribution \(\chi _{{\varvec{S}}_{est}}\). Therefore, the search for weak ciphertexts could become easier the more failures have been found. However, we do not take this effect into account in this paper.

Fig. 6. The relative reduction in entropy as a function of the number of positive failure vectors.

Fig. 7. The hardness of the simplified problem as a function of the number of positive failure vectors.

5 Weak-Ciphertext Attack

Using the failure boosting technique from Sect. 3 and the secret estimation method from Sect. 4, we can lower the security of a (Ring/Module-)LWE/LWR scheme on the condition that its failure rate is high enough. To this end, we first collect i decryption failures using the failure boosting technique, which would cost approximately \(i \sqrt{\alpha ^{-1}} \beta ^{-1}\) work. Then, the exact error position and failure type should be determined for all of the failure vectors using the techniques of Subsect. 4.3. Based on this information, the secret can be estimated, which in turn can be used to simplify the (Ring/Module-)LWE/LWR problem. These last two operations require a negligible amount of work compared to finding decryption failures. Finally, we need to solve the simplified problem, which has a complexity \(S_{\text {simplified}}(i)\) as estimated in Sect. 4. The total amount of work is therefore \({\mathcal {O}}(S_{\text {simplified}}(i) + i \sqrt{\alpha ^{-1}} \beta ^{-1})\), which is depicted in Fig. 8 as a function of the number of failures i. Note that the practical security of Kyber relies on an error term \({\varvec{E}}_A\) as well as a rounding term \({\varvec{U}}_A\). Both are taken into account in the security calculation.

Fig. 8. The full amount of work to break the scheme as a function of the number of collected decryption failures.
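A brute-force sweep of the kind used for Table 1 can be sketched as follows; the cost curves are placeholders invented for illustration, whereas a real evaluation would plug in the \(\alpha \), \(\beta \) values from Sect. 3 and a lattice-estimator value for \(S_{\text {simplified}}(i)\).

```python
import numpy as np

# placeholder cost models, purely illustrative
log2_alpha = np.linspace(0, -80, 81)                # log2 of weak-ciphertext density
log2_beta  = -150 + 1.2 * np.abs(log2_alpha)        # log2 failure rate of a weak ciphertext
S_simplified = lambda i: 190 - 15 * np.log2(1 + i)  # hardness after i failures (placeholder)

best = min(
    (np.log2(2.0**S_simplified(i) + i * 2.0**(-0.5 * la - lb)), i, la)
    for i in (1, 2, 4, 8, 16, 32, 64)
    for la, lb in zip(log2_alpha, log2_beta)
)
print("log2(total work) = %.1f at i = %d failures, log2(alpha) = %.0f" % best)
```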

Table 1 gives an overview of the original hardness of the scheme before decryption failure usage S, and the attack cost \(S_{\text {simplified}}(i) + i \sqrt{\alpha ^{-1}} \beta ^{-1}\) using decryption failures for ideal values of i and \(f_t\), which are calculated through a brute force sweep. The number of collected decryption failures i and the expected number of decryption queries \(i \beta ^{-1}\) are also included. These values are calculated assuming that the adversary can perform an unlimited number of decryption queries. From this table we can see that the security of Kyber and Saber is considerably reduced. This is due to the fact that finding a failure is easier than breaking the security of the scheme S. For FrodoKEM976, the security is not affected, as the work to obtain a failure is considerably larger than the work to break the security S.

In other situations, such as a multi-target attack or having only a limited number of decryption queries, other values of \(f_t\) and i will yield optimal results. For example, in a multi-target attack scenario one would select a higher threshold \(f_t\) to be able to efficiently re-use the precomputation work \(\alpha ^{-1}\) for weak ciphertexts and therefore reduce the overall work. A limit on the number of decryptions \(n_d\) could make it necessary to increase the amount of precomputational work \(\alpha ^{-1}\) in order to increase the failure probability \(\beta \) such that \(\beta ^{-1} < n_d/i\). This would make the attack more expensive or might even invalidate it. For example, the NIST Post-Quantum Standardization Process decryption limit is set to \(2^{64}\), which rules out a decryption failure attack on schemes with a low enough failure rate, such as Saber and Kyber, as can be deduced from Fig. 3. As such, the security of these schemes is not affected within the NIST framework.

Table 1. The security of different schemes with and without decryption failures

6 A Weak-Key Attack Model

In this section we elaborate a weak-key (multi-target) attack model when the adversary can only have a limited number of decryption queries to one user but multiple users can be queried. We observe that for certain keys, the error probability can be much higher when applying the failure boosting technique, i.e., choosing ‘weak’ ciphertexts as discussed in Sect. 3, if the chosen ciphertexts satisfy certain key-related properties. The major targets are the same as before – lattice-based NIST post-quantum proposals with CCA security using some CCA transformations.

We set the maximum number of ciphertexts that can be submitted to a node with a public key to \(2^K\), and we set the maximum number of public keys in the system to \(2^L\). Referring again to the NIST Post-Quantum Standardization Process, they have indicated in their call that at least \(K=64\) can be considered. In the discussion forum [1] for the same project, we have also seen researchers mentioning that \(L=64\) can be considered. We will adopt \(K=L=64\) in the following sections since these values seem uncontested, although larger values of \(K\) and \(L\) can give more powerful attacks and could definitely be relevant. For comparison, attacks on symmetric schemes may require a number of plaintext-ciphertext pairs close to the number of possible keys (like \(2^{200}\)) and are still considered valid attacks.

The proposed attack procedure is split into three steps.

1. Do a precomputation step to establish pairs of messages and corresponding ciphertexts, and let the set \({\mathcal {F}}\) informally denote the error vectors corresponding to the different messages, which are equivalent to the \(({\varvec{C}},{\varvec{G}})\) pairs chosen before. These selected error vectors should have particular properties, e.g. a large norm and/or several large entries in certain positions.

2. Send the ciphertexts corresponding to the error vectors in \({\mathcal {F}}\) for decryption and assume that we learn the decrypted messages. Assume further that a subset has been erroneously decrypted (wrong decoding due to a too-large error) and let \({\mathcal {F}}'\) be the error vectors causing decryption failures. The cardinality of this set could be larger than average if certain properties (related to \({\mathcal {F}}\)) of the secret vector hold. We therefore submit the set of ciphertexts to each node holding a public key; the node giving the largest decryption failure rate is selected as the target public key for the attack.

3. Do statistical testing on the set \({\mathcal {F}}'\) (and possibly the set \({\mathcal {F}}\)) to establish relationships between the secret key and the noise vectors leading to a decryption failure. By analyzing their correlation, we may be able to recover partial secrets, which can considerably reduce the solving complexity of the underlying hard problem. We are then able to perform a full key-recovery attack via classic approaches such as lattice reduction algorithms.

Note that the above procedure is very close to the weak-ciphertext attack described in the previous sections. One major difference is that here we choose the set \({\mathcal {F}}\) of ‘weak’ ciphertexts to be related to the targeted ‘weak’ keys, while in the former, the ‘weak’ ciphertexts are chosen to have a larger decryption failure rate averaged over all keys.

We discuss the three steps briefly. In the precomputation step, we can observe a first difference between different schemes. Most schemes include the public key in the generation of the noisy vectors (as input in the hash function generating the noise). This means that a constructed set \({\mathcal {F}}\) can only be used for a single public key and a new such set must be constructed for the next public key. For simplicity, we assume \(|{\mathcal {F}}|=2^K\) and note that the number of nodes with a public key is \(2^{L}\). If we set the computational complexity of precomputing a set \({\mathcal {F}}\) to be \(2^{\lambda }\), the overall complexity of this first step is \(2^{\lambda +L}\). On the other hand, there are also schemes where error vectors are generated independent of the public key (e.g. LAC). In such a case the same set \({\mathcal {F}}\) can be used on all public keys and the complexity is only \(2^{\lambda }\). We could also use Grover’s search algorithm to accelerate the pre-computation step, as discussed in Sect. 3. However, since the pre-quantum and post-quantum security goals in the NIST Post-Quantum Standardization Process are different for a certain security level, this quantum acceleration may not help us to break the claimed security level of a submission.
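As a worked example with the numbers that will appear in Sect. 7: if each useful error vector costs about \(2^{120}\) attempts to find and \(|{\mathcal {F}}|=2^{K}=2^{64}\), then \(2^{\lambda }=2^{184}\) per public key; with key-dependent error vectors and \(2^{L}=2^{64}\) keys this becomes \(2^{\lambda +L}=2^{248}\), whereas key-independent error vectors would keep the cost at \(2^{184}\).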

For the second step, the idea is that among many public keys, there will be one where the corresponding secret values have a property that causes more decryption errors than on average. So to increase the decryption error probability to a reasonable and detectable level, we consider a special property of the secret value that holds with probability at least \(p'\), where \(0< p' <1\). We then assume that \(p'=2^{-L}\), so we can expect this special property of the secret value to hold for one public key. As mentioned, with respect to the CCA security, NIST restricts the adversary to at most \(2^{64}\) decryption calls per user (public key). So in order to distinguish a special property in the secret value corresponding to a public key, one needs to get the failure rate for this case to be larger than \(2^{-64}\).

Finally, in the statistical testing part, we have a set of error vectors that have caused decryption errors. There seems to be a plethora of methods that can be used to recover secret values. For instance, the strong maximum-likelihood approach has been discussed in Sect. 4 and heuristic approaches can also be applied. A general approach that we can adopt is to consider a smaller part of the secret vector under reconstruction, and select the most probable values for this part, based on the observed error vectors in \({\mathcal {F}}\). Then one combines such guesses for different parts and builds an approximation of a secret vector. A good approximation will mostly be sufficient as it can be used in lattice-basis reduction algorithms.

We note that in many applications, the challenge is to detect the first decryption failure, since we can usually have adaptive approaches to find more failures afterwards with a lower complexity. This idea is further demonstrated in the next section where an adaptive CCA attack on ss-ntru-pke will be presented, and also in a code-based application [34].

7 A Weak-Key Attack on ss-ntru-pke

We have applied the described weak-key approach and provide the details of attacking ss-ntru-pke, a version of the NTRUEncrypt [47] submission to the NIST Post-Quantum Standardization Process. Related is also the provably secure NTRU [44], whose security is based purely on the hardness of Ring-LWE. NTRUEncrypt with different parameter choices has been around for a long time and is one of the most competitive lattice-based schemes when it comes to performance.

Note that our attack in this section is in the pre-quantum (classic) security framework due to the different security goal for NIST-V when Grover’s algorithm is considered. We adopt the notations from the NTRUEncrypt submission [47] throughout this section.

7.1 The ss-ntru-pke Scheme

ss-ntru-pke is the version of NTRUEncrypt targeting the highest security level, being 256 bits. This scheme achieves CCA2 security via the NAEP transform [27], a transform similar to the Fujisaki-Okamoto transformation with an additional mask. We give a very brief explanation of the scheme. For most of the description and details, we refer to [47]. In the key generation (see Algorithm 6), two secret polynomials \(\mathbf f,\mathbf g\in {\mathcal {R}}\) are selected, where the coordinates are chosen from a discrete Gaussian distribution \({\mathcal {X}}_{\sigma }\) with standard deviation \( \sigma \). A public key is formed by computing \(\mathbf {h} = \mathbf {g}/\mathbf {f} \bmod q\).

[Algorithm 6: ss-ntru-pke key generation]

We show in Algorithm 7 the encryption algorithm of ss-ntru-pke and in Algorithm 8 the decryption algorithm, both from the original proposal [47]. In these descriptions, Hash() represents a hash function, and \({\mathcal {B}}\) represents a set including all binary polynomials with degree at most \(N-1\). The Pad() operation is a function to ensure the message has sufficient entropy, and the Extract() operation is the inverse of Pad().

In each encryption of a message \(\mathbf m\), two polynomials \(\mathbf r,\mathbf e\in {\mathcal {R}}\) are generated, where the coordinates are again chosen from a discrete Gaussian distribution \({\mathcal {X}}_{\sigma }\) with standard deviation \( \sigma \). This randomness source uses a seed generated as Hash\((\mathbf m,\mathbf h)\). This means that each choice of a message \(\mathbf m\) also determines the polynomials \(\mathbf r,\mathbf e\in {\mathcal {R}}\). Let us denote this by

$$\begin{aligned} (\mathbf r,\mathbf e) = \mathbf{{G}}(\mathbf m,\mathbf h). \end{aligned}$$
[Algorithms 7–8: ss-ntru-pke encryption and decryption]

In decryption, with ciphertext \(\mathbf c\), one recovers the message by computing

$$\begin{aligned} \mathbf{f * c = } p\cdot \mathbf{r*g} + p\cdot \mathbf{e * f} + \mathbf{m' * f}. \end{aligned}$$

A decryption error occurs if \(|| p\cdot \mathbf{r}*\mathbf{g} + p\cdot \mathbf{e} * \mathbf{f} + \mathbf{m}' * \mathbf{f}||_\infty > q/2\). Since \(p=2\) and the last term is much smaller than the first two, this essentially reduces to \(|| \mathbf{r}*\mathbf{g} + \mathbf{e} * \mathbf{f}||_\infty > q/4\).
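To make this condition concrete, the following sketch checks it numerically. It is only an illustration under assumptions of ours: the multiplications are done in the cyclic ring \({\mathbb {Z}}[x]/(x^N-1)\), and \(N = 1024\), \(q = 2^{30}+2^{13}+1\) are taken as the NIST-V values from Table 2 (\(\sigma = 724\) is given further below); the exact ring and parameters are specified in [47].

```python
import numpy as np

# Assumed parameters for this sketch (see Table 2 and [47] for the proposal's values).
N, q, sigma = 1024, 2**30 + 2**13 + 1, 724

def cyclic_mul(a, b):
    """Multiply two integer polynomials modulo x^N - 1, without reducing mod q,
    since the failure condition concerns the size of the un-reduced noise."""
    res = np.zeros(len(a), dtype=np.int64)
    for i, ai in enumerate(a):
        if ai:
            res += ai * np.roll(b, i)
    return res

def decryption_fails(f, g, r, e):
    """Simplified failure condition ||r*g + e*f||_inf > q/4 from the text."""
    noise = cyclic_mul(r, g) + cyclic_mul(e, f)
    return int(np.max(np.abs(noise))) > q // 4

rng = np.random.default_rng(1)
f, g, r, e = (np.rint(rng.normal(0, sigma, N)).astype(np.int64) for _ in range(4))
print(decryption_fails(f, g, r, e))   # almost surely False for honest samples
```

For honestly sampled Gaussian vectors the noise stays far below \(q/4\); the attack below pushes it over the bound by planting a few large aligned entries in both \((\mathbf f, \mathbf g)\) and \((\mathbf r, \mathbf e)\).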

The proposed parameters for ss-ntru-pke for the security level of NIST-V are shown in Table 2. The decoding error probability is estimated to be less than \(2^{-80}\) in [47].

Table 2. Proposed ss-ntru-pke parameters.

7.2 The Attack

We now follow the approach of the previous section and describe an attack. The detailed attack is shown in Algorithm 9, where a more efficient CCA2 version is adopted. We define two polynomials \(u(x)\), \(v(x) \in {\mathcal {R}}\) to be equivalent if \(u(x) = x^{i}\cdot v(x)\), or if \(u(x) = -x^{i}\cdot v(x)\), for some \(i \in {\mathbb {Z}}\).

Algorithm 9 (the weak-key attack).

Attack step 1 – pre-computation.

We pick random messages \(\mathbf m\) and generate the corresponding \((\mathbf r,\mathbf e) = G(\mathbf m,\mathbf h)\) for a given public key \(\mathbf h\). We keep only vectors equivalent to a polynomial whose first \(l\) (e.g., \(l =2\)) positions have the same sign and are each of size larger than \(c\cdot \sigma \), where \(c\) is a constant determining the computational effort of finding such error vectors. These vectors form our chosen set \({\mathcal {F}}\).

We set \(l=2\) to illustrate the idea in a concrete attack. For one position, the probability that the entry is larger than \(c\sigma \) in absolute value is \(1-\text{ erf }(c/\sqrt{2})\). As the pair can start at any position, the probability of having two consecutive positions with the same sign and entries larger than \(c\sigma \) is \(p_\mathbf{e} = N\cdot (1-\text{ erf }(c/\sqrt{2}))^{2}/2\). If we set \(p_\mathbf{e}\) to be \(2^{-120}\), then \(c\) can be as large as \(9.193\).

Attack step 2 – submit ciphertexts for decryption.

We then send the ciphertexts corresponding to the noise vectors in \({\mathcal {F}}\) to the decryption algorithm. If the targeted secret key \(\mathbf f\) is also equivalent to a polynomial whose first \(l\) (e.g., \(l =2\)) positions have the same sign and are each of size larger than \(c_{s}\cdot \sigma \), where \(c_{s}\) is another constant, then decryption errors become detectable. We expect to collect several errors and store their corresponding error vectors \((\mathbf r, \mathbf e)\). The probability of having two consecutive positions with the same sign and entries larger than \(c_\mathbf{s}\sigma \) is \(p_\mathbf{s} = N\cdot (1-\text{ erf }(c_\mathbf{s}/\sqrt{2}))^{2}/2\). If we set \(p_\mathbf{s}\) to be \(2^{-64}\), then \(c_\mathbf{s}\) can be as large as \(6.802\).
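As a numerical sanity check of these threshold constants, the following sketch (ours; it only evaluates the estimate \(N(1-\text{ erf }(c/\sqrt{2}))^{2}/2\) with an assumed \(N = 1024\)) recovers \(c\) and \(c_\mathbf{s}\) by bisection.

```python
from math import erfc, log2, sqrt

N = 1024  # ring dimension (assumed NIST-V value)

def log2_prob_two_big(c):
    """log2 of N * erfc(c/sqrt(2))^2 / 2, the estimated probability that some
    two consecutive coefficients share a sign and both exceed c*sigma."""
    return log2(N) + 2 * log2(erfc(c / sqrt(2))) - 1

def solve_c(target_log2, lo=0.0, hi=20.0):
    """Bisect for the threshold c that gives the target log2-probability."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if log2_prob_two_big(mid) > target_log2:  # probability still too large
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(solve_c(-120), 3))  # ~9.193: threshold c for the chosen ciphertexts
print(round(solve_c(-64), 3))   # ~6.802: threshold c_s for the weak secret keys
```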

If we run \(2^{120}\) precomputation steps for each stored error vector with the desired properties, the overall complexity is \(2^{248}\) since \(p_\mathbf{s} = 2^{-64}\). Let \(C_{1}\) denote \( 2\cdot c_\mathbf{s} c\sigma ^{2}\). We then have a coefficient in \(\mathbf r * \mathbf g + \mathbf e* \mathbf f\) whose absolute contribution from these two big entries is at least \(C_{1} = 2^{25.97}\). We next consider the probabilistic behavior of the remaining \((2N-2)\) positions. As the coefficients of \(\mathbf r, \mathbf g, \mathbf e, \mathbf f\) are all sampled from a Gaussian distribution with mean \(0\) and standard deviation \(\sigma = 724\), the expected norm of the remaining part of \((\mathbf{f},\mathbf{g})\) with \(2N-2\) entries is about \(\sqrt{2N-2}\cdot \sigma \). Given a public key, \((\mathbf{f},\mathbf{g})\) is fixed. Thus, this coefficient of \(\mathbf r * \mathbf g + \mathbf e* \mathbf f\) can be approximated as \(C_{1} + {\varPhi }_{0}\), where \({\varPhi }_{0}\) is a Gaussian random variable with mean \(0\) and standard deviation \(\sqrt{2N-2}\cdot \sigma ^{2}\). As an error occurs when this coefficient is larger than \(q/4\), the error probability can be approximated as

$$ P_{e} = \left( 1 - \text{ erf }(\frac{q/4 - C_{1}}{\sqrt{2(2N-2)}\sigma ^{2}})\right) \cdot \frac{1}{2}. $$

We obtain a decoding error probability of \(2^{-57.3}\) for this example.

Thus we can obtain about \(2^{6.7}\) errors from the \(2^{64}\) decryption trials.
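These numbers can be reproduced directly from the formula above; \(N\) and \(q\) are again the assumed NIST-V values, while \(\sigma = 724\), \(c = 9.193\) and \(c_\mathbf{s} = 6.802\) are taken from the text.

```python
from math import erfc, log2, sqrt

N, q, sigma = 1024, 2**30 + 2**13 + 1, 724   # assumed NIST-V parameters
c, c_s = 9.193, 6.802                         # thresholds from the steps above

C1 = 2 * c_s * c * sigma**2                   # contribution of the two big entries
print(log2(C1))                               # ~25.97

std_rest = sqrt(2 * N - 2) * sigma**2         # std deviation of Phi_0
P_e = erfc((q / 4 - C1) / (sqrt(2) * std_rest)) / 2
print(log2(P_e))                              # ~ -57.3
print(64 + log2(P_e))                         # ~ 6.7: expected errors in 2^64 trials
```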

An adaptive CCA attack. If we keep the previous setting, i.e., a CCA1 attack, the cost is larger than \(2^{248}\). However, we can adopt a much more powerful attack model, namely an adaptive CCA (CCA2) attack, consisting of two phases. In the first phase, the attacker spends half of his computational power on determining a weak key; in the second phase, he puts all his remaining resources into attacking this weak key.

To be more specific, we first prepare \(2^{63}\) messages/ciphertexts for each of the \(2^{64}\) public keys. We then expect about two decryption errors for one of the keys, which can then be flagged as a weak key.

We can also reduce the precomputation work for each key to \(2^{89}\) if there are \(2^{64}\) public keys. We then have \(c=7.956\) and an error probability of \(2^{-62.0}\), so we expect two errors in the testing stage. We then spend \(2^{216}\) work on another precomputation to obtain \(2^{63}\) messages with \(c = 10.351\), done only for this weak key. The error probability in the second phase is estimated as \(2^{-53.0}\), so we expect about \(2^{10}\) errors. The overall complexity is \(2^{217}\).
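The same kind of computation (with the same assumed \(N\), \(q\) and \(\sigma\)) reproduces the figures of the two phases: the per-ciphertext precomputation cost and the failure probability against a weak key for the thresholds \(c = 7.956\) and \(c = 10.351\).

```python
from math import erfc, log2, sqrt

N, q, sigma = 1024, 2**30 + 2**13 + 1, 724   # assumed NIST-V parameters
c_s = 6.802                                   # weak-key threshold (p_s ~ 2^-64)

def log2_precomp(c):
    """log2 cost of finding one ciphertext whose (r, e) passes threshold c."""
    return -(log2(N) + 2 * log2(erfc(c / sqrt(2))) - 1)

def log2_error_prob(c):
    """log2 failure probability against a weak key, for ciphertext threshold c."""
    C1 = 2 * c_s * c * sigma**2
    return log2(erfc((q / 4 - C1) / (sqrt(2 * (2 * N - 2)) * sigma**2)) / 2)

print(log2_precomp(7.956), log2_error_prob(7.956))    # ~89,  ~-62.0  (phase 1)
print(log2_precomp(10.351), log2_error_prob(10.351))  # ~153, ~-53.0  (phase 2)
# Phase 2: 2^153 per ciphertext * 2^63 ciphertexts ~ 2^216 precomputation,
# and 2^63 queries with failure probability 2^-53 give about 2^10 errors.
```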

Attack step 3 – statistical analysis.

In this step we will try to recover the secret \( \mathbf{f}\). Let us first assume that \( \mathbf{f}\) has its two big entries in the first two positions of the vector. Then the position in \(\mathbf e * \mathbf f\) where the error occurs, denoted \(i_{0}\), is the position where the two significant coefficients in \(\mathbf e\) and those in \( \mathbf{f}\) coincide. We now transform each \(\mathbf e\) so that its two big entries are also found in the first two positions. This is done by replacing \(\mathbf e\) with the corresponding equivalent vector whose two big entries are in the first two positions. Assuming \(M\) decryption errors, we obtain the following knowledge from the received failures:

$$\begin{aligned} \sum _{i=2}^{N-1}e^{(j)}_if_i +N^{(j)}> q/4- 2\cdot c_\mathbf{s} c\sigma ^{2}, \end{aligned}$$

for \(j=1, \ldots, M\), where \(N^{(j)}\) denotes the remaining contribution to the noise. Finally, we note that assuming \( \mathbf{f}\) has its two big entries in the first two positions is not a restriction, as such an \( \mathbf{f}\) is just an equivalent vector of the true \( \mathbf{f}\). We therefore only need to recover this representative and then check all of its equivalent vectors.

We next show how to derive more knowledge of \(\mathbf{f},\mathbf{g}\) with statistical tools.

A heuristic approach. As we have assumed that the two big entries in \((\mathbf{f},\mathbf{g})\) (or \((\mathbf e, \mathbf r)\)) are the first two entries, we use \(\mathbf K\) (or \(\mathbf V_{i}\) for \(1 \le i\le M\)) to denote the vector consisting of the remaining \(2N-2\) entries. Thus, the size of \(\mathbf K\) (or \(\mathbf V_{i}\)) can be estimated as \(\sqrt{2N-2}\,\sigma \).

We adopt the heuristic assumption from [19] that all the errors are very close to the folding bound \(q/4\), meaning that all the error vectors leading to a decryption failure lie approximately on the hyperplane

$$\begin{aligned} \langle \mathbf{V}, \mathbf{K}\rangle = q/4 - C_{1}, \end{aligned}$$

where \(C_{1}\) is the contribution from the two significant entries.

Thus, the mean vector \({\hat{\mathbf{V}}}\) of the \(\mathbf V_{i}\) should be close to a scaled version of \(\mathbf K\), i.e.,

$$\begin{aligned} {\hat{\mathbf{V}}} \approx \frac{q/4 -C_{1}}{\left||\mathbf K\right||^{2}}\,\mathbf{K}. \end{aligned}$$

Since \(\left||\mathbf K\right||^{2} \approx (2N-2)\sigma ^{2}\), we obtain the estimate \(\hat{\mathbf{K}}\) \(=\frac{(2N-2)\sigma ^{2}}{q/4 -C_{1}}{\hat{\mathbf{V}}}\). If we round the entries of \(\hat{\mathbf{K}}\) to the nearest integer in \({\mathbb {Z}}_q\), we obtain an estimate \((\hat{\mathbf{f}}, {\hat{\mathbf{g}}})\) of the secret vector \((\mathbf{f},\mathbf{g})\).
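A minimal sketch of this averaging-and-rescaling estimator is given below; the function name is ours, and we assume the failing error vectors have already been rotated into the canonical equivalent form described in Step 3.

```python
import numpy as np

def estimate_K(V_list, sigma, q, C1):
    """Estimate the tail K (the last 2N-2 entries of (f, g)).

    V_list holds the (2N-2)-entry tails of the aligned error vectors that
    triggered decryption failures, with the two big entries already rotated
    to the front and stripped off."""
    V_hat = np.mean(np.vstack(V_list), axis=0)     # mean vector of the V_i
    dim = V_hat.shape[0]                            # 2N - 2
    scale = dim * sigma**2 / (q / 4 - C1)           # ~ ||K||^2 / (q/4 - C1)
    return np.rint(scale * V_hat)                   # rounded estimate of K
```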

The remaining question is how good this estimate can be. We answer this heuristically using the central limit theorem.

Each observation \(\mathbf V_{i}\), with approximate norm \(\sqrt{2N-2}\,\sigma \), can be viewed as the sum of the signal point

$$ \frac{q/4 -C_{1}}{\left||\mathbf K\right||^{2}} \mathbf{K}, $$

and a noise vector with squared norm

$$ (2N-2)\sigma ^{2} - \frac{(q/4 -C_{1})^{2}}{(2N-2)\sigma ^{2}}. $$

By the central limit theorem, if we have \(M\) observations, then the squared norm (variance) of the noise can be reduced by a factor of \(M\). Hence, the error norm should be

$$ \sqrt{\frac{1}{M}\cdot \left( (2N-2)\sigma ^{2} - \frac{(q/4 -C_{1})^{2}}{(2N-2)\sigma ^{2}}\right) }. $$

As we consider \(\hat{\mathbf{K}}\) instead of \(\hat{\mathbf{V}}\), the true error norm should be rescaled as

$$\begin{aligned} \frac{(2N-2)\sigma ^{2}}{q/4 -C_{1}} \cdot \sqrt{\frac{1}{M}\cdot \left( (2N-2)\sigma ^{2} - \frac{(q/4 -C_{1})^{2}}{(2N-2)\sigma ^{2}}\right) }. \end{aligned}$$
(28)

Using this formula, and assuming that \(1024\) errors have been collected, we obtain a candidate with error norm \(0.169\sqrt{2N-2}\,\sigma \).
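Evaluating Eq. (28) with the assumed NIST-V parameters and, as an assumption of ours, the phase-2 thresholds of the adaptive attack reproduces this factor:

```python
from math import sqrt

N, q, sigma = 1024, 2**30 + 2**13 + 1, 724   # assumed NIST-V parameters
c, c_s = 10.351, 6.802                        # assumed phase-2 thresholds
C1 = 2 * c_s * c * sigma**2
M = 1024                                      # collected decryption failures

A = (2 * N - 2) * sigma**2                    # ~ ||K||^2
noise_sq = A - (q / 4 - C1)**2 / A            # per-observation noise energy
err_norm = (A / (q / 4 - C1)) * sqrt(noise_sq / M)   # Eq. (28)
print(err_norm / (sqrt(2 * N - 2) * sigma))   # ~0.169
```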

Attack step 4 – lattice reduction.

If \(({\varDelta } \mathbf{f}, {\varDelta }{} \mathbf{g})= (\mathbf{f, g}) - ({\hat{\mathbf{f}}}, {\hat{\mathbf{g}}})\) is small, we can recover it using lattice reduction algorithms efficiently. Thus, we obtain the correct value of \((\mathbf{f}, \mathbf{g})\).

If the error size is only \(0.169\sqrt{2N-2}\,\sigma \), as assumed in the previous step, then according to the LWE estimator from Albrecht et al. [4] the reduction takes about \(2^{181}\) time and \(2^{128}\) memory if one uses sieving to implement the SVP oracle in BKZ. Though the authors of [47] discussed the memory constraints of applying sieving in lattice-based cryptanalysis, we believe it is reasonable to assume \(2^{128}\) memory when considering a scheme targeting the classic \(256\)-bit security level. Another possibility is to implement the SVP oracle using tuple sieving, further reducing the memory complexity to \(2^{117}\). The time complexity then increases to \(2^{223}\), but this is still far below the claimed \(256\)-bit security level.

Table 3. The simulated error rates vs. the estimated error rates.

7.3 Experimental Results

We have implemented some parts of the attack to check the performance against theory. We chose exactly the same parameters in ss-ntru-pke as well as in the attack, except for the value of \(q\), which in the experiments was set to the values shown in Table 3. The reason is that we wanted to lower the decryption error rate so that simulation was feasible.

We planted two consecutive entries in \(\mathbf f\), each of size \(6.2 \cdot \sigma \), and generated error vectors with two large positive entries, each of size \(9.2 \cdot \sigma \). For this choice, we first verified the decryption error probabilities, as seen in Table 3. They match the theoretical results well.

Table 4. The simulated error norm vs. the estimated error norm. (\(M = 1024\))
Table 5. The simulated error norm vs. the estimated error norm. (\(q=2^{29}+2^{27}+2^{26}+2^{25}\))
Fig. 9. Error norm as a function of the number of collected error vectors.

For each choice of \(q\) we then collected up to \(M = 2^{10} + 2^9 = 1536\) error vectors and processed them in a statistical analysis step to obtain a good approximation of (\(\mathbf{f},\mathbf{g}\)). Following the heuristic approach described above, we first created an approximation (\({\hat{\mathbf{f}}}, {\hat{\mathbf{g}}}\)) of (\(\mathbf{f},\mathbf{g}\)) by simply computing \({\hat{f}}_i= E\cdot \frac{\sum _{j=0}^{M-1}e^{(j)}_i}{M}\) as the value in the \(i\)th position. Here \(E\) is a constant that scales the vector to the expected norm of \(\mathbf f\). Clearly, this is a very simple way of exploiting the dependence between \(f_i\) and \(e_i\), but it already seems to be sufficient.
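A toy version of this estimator is sketched below. The data generation is purely synthetic (a crude correlation model of ours, not the real failure distribution) and only illustrates the averaging-and-rescaling step.

```python
import numpy as np

def simple_estimate(errors, expected_norm):
    """Average the aligned error vectors and rescale to the expected norm,
    i.e. f_hat_i = E * mean_j e_i^(j), as in the experiments."""
    mean_vec = np.mean(np.vstack(errors), axis=0)
    E = expected_norm / np.linalg.norm(mean_vec)
    return E * mean_vec

# Synthetic usage: pretend each failing error vector is weakly correlated with f.
rng = np.random.default_rng(0)
sigma, N, M = 724.0, 1024, 1536
f = rng.normal(0, sigma, N)
errors = [0.05 * f + rng.normal(0, sigma, N) for _ in range(M)]
f_hat = simple_estimate(errors, np.linalg.norm(f))
print(np.linalg.norm(f - f_hat) / np.linalg.norm(f))  # relative error of the estimate
```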

We have plotted the simulated error norms for various \(q\) and \(M\) in Figure 9. Furthermore, we show in Tables 4 and 5 the comparison between the simulated error norms and the estimated error norms according to Eq. 28.

In the former table, \(M\) is fixed to \(1024\) and \(q\) varies, while in the latter, \(q\) is fixed to \(2^{29}+2^{27}+2^{26}+2^{25}\) and \(M\) varies. In all cases the simulated data match the estimated data well, though the simulated error norms are consistently somewhat smaller than the estimated ones. Another observation from Table 5 is that the estimate based on the central limit theorem becomes more accurate as \(M\) grows, which is to be expected.

7.4 Summarizing the Attack

The best attack is a CCA2-type attack in which we use \(2^{89+63}=2^{152}\) precomputation operations to derive \(2^{63}\) special ciphertexts that are submitted for decryption. With probability \(2^{-64}\) the secret \(\mathbf f\) has the desired property of two big consecutive entries. If so, we will most likely observe several decryption errors, and such a weak key has then been detected. Once the weak key is detected, we perform another precomputation of \(2^{216}\) operations to derive \(2^{63}\) additional special ciphertexts, again submitted for decryption. We expect about \(1024\) decryption errors, and the knowledge from the corresponding error vectors allows us to reconstruct \(\mathbf f\) without much trouble using lattice reduction algorithms, as the experimental results strongly indicate. The overall complexity is thus approximately \(2^{217}\) if the SVP oracle in BKZ is implemented via lattice sieving. In fact, the cost of the lattice reduction in the final stage is not the bottleneck, since we can employ other powerful statistical tools in Step \(3\) (e.g., the maximum-likelihood approach) to make this cost negligible.

8 Conclusion

In this paper we introduced a method to increase the decryption failure rate of a scheme, based on the search for ‘weak’ ciphertexts. This method benefits an adversary in at least three scenarios: if he has access to a quantum computer, if he can only perform a limited number of decryption queries, or if he wants to stage a multi-target attack on schemes that lack the appropriate protection. We explicitly calculated the effect of failure boosting in these scenarios for various (Ring/Module-)LWE/LWR schemes. We also proposed a method to estimate the secret key given ciphertexts that lead to decryption failures, and we calculated the remaining security after a certain number of decryption failures, given the exact location of the errors. We suggested three methods to obtain the exact location of errors in failing ciphertexts. Finally, we estimated the security of several schemes under an attack that optimally uses these decryption failures and showed that for some schemes the security is drastically reduced if an attacker can perform sufficiently many decryption queries. However, for most NIST post-quantum standardization candidates, the expected number of required decryption queries is too high for a practical attack. We also identified how this attack changes in a multi-target scenario or when an attacker only has access to a limited number of decryption queries.

We further proposed a generic weak-key attack model against lattice-based schemes, slightly different from the previous attack and based on the observation that the error probability can be much higher for certain ‘weak’ keys. We applied this model to attack ss-ntru-pke, a version of NTRUEncrypt in its submission to the NIST Post-Quantum Standardization Process. Specifically, we presented an adaptive CCA attack on the claimed \(256\)-bit classical security level (NIST-V) of ss-ntru-pke. This attack can be viewed as an extension of the reaction attacks [16, 22] that already jeopardize the CCA security of MDPC- and LDPC-based cryptosystems.