1 Introduction

Lattice-based cryptography is known for its versatility, bringing forth, among others, encryption schemes [4, 23], digital signatures [24, 26], fully homomorphic encryption [16] and identity-based encryption [17]. Moreover, lattice-based cryptographic schemes are among the most promising candidates for post-quantum cryptography, i.e. cryptography that remains secure even in the presence of quantum computers.

In 2016, the United States National Institute of Standards and Technology (NIST) announced a standardization process with the goal of standardizing one or more post-quantum encryption and digital signature schemes [1]. July 2020 saw the start of the third round of this process, with 3 of the 4 public key encryption finalists being lattice-based (as are 2 of the 5 alternate ‘backup’ schemes).

To improve efficiency, many lattice-based encryption schemes are not perfectly correct, which means that even after a correct execution of the protocol, it is possible that the decryption fails to retrieve the correct message or key. Such an event is called a decryption failure, and the ciphertext that caused the failure is referred to as a failing ciphertext. Three of the lattice-based NIST candidates are subject to such decryption failures: Saber [10], Kyber [8] and FrodoKEM [25].

While the probabilities of these decryption failures are chosen sufficiently small to avoid any impact on performance, they have been used to stage attacks against these schemes. Decryption failure attacks can be roughly divided into two categories: chosen-ciphertext attacks and valid-ciphertext attacks. The first type was introduced by Jaulmes and Joux [21] and can efficiently recover the secret key if it is reused, by crafting specific ciphertexts that fail based on properties of the secret key. However, this attack type can be prevented with a transformation providing chosen-ciphertext security, such as the Fujisaki-Okamoto transformation.

The second type of decryption failure attacks remains a threat even in the presence of chosen-ciphertext security measures. The idea behind this type of attack is to input a large number of correctly generated ciphertexts in search of failing ciphertexts. The authors of Kyber [8] noted that it is possible to do a Grover search for ciphertexts with higher than average failure probability. D’Anvers et al. [12] showed how to retrieve the secret key based on correctly generated but failing ciphertexts, and introduced ‘failure boosting’, a framework to speed up the search for failing ciphertexts. This was later extended in [11] to ‘directional failure boosting’, which introduced a method that further speeds up the failing ciphertext search when one or more failing ciphertexts have already been found. The latter work studied a simplified lattice-based scheme and focussed on attacking a single target, showing that the cost of a decryption failure attack is dominated by the cost of finding the first failure. Moreover, it introduced a simple multitarget attack specifically designed for scenarios where a maximum number of decapsulations can be performed per target. Around the same time, Guo et al. proposed specific decryption failure attacks on ss-ntru-pke [18] and LAC [19].

As opposed to attacks focusing on decryption failures, Bindel and Schanck [7] showed that correctly generated ciphertexts also provide a small amount of information about the secret. While the errors in individual message bits were assumed to occur independently in many NIST submission documents, D’Anvers et al. [13] showed that these errors are in fact correlated, implying an underestimation of the decryption failure probability for schemes that use error correction and thus an overestimation of their security. Dachman-Soled et al. [9] developed a tool to include ‘hints’ in an LWE hard problem and showed that it can be used to retrieve the secret key using failing ciphertexts.

Our Contributions: We first improve the state-of-the-art multitarget decryption failure attack using a levelled approach in Sect. 4, leading to a more efficient attack, especially for schemes with low failure probability. Secondly, we enhance the techniques for estimating the cost of decryption failure attacks and extend them to practical schemes such as Saber and Kyber: Sect. 5 points out three inaccuracies in the directional failure boosting calculation for the simplified scheme of [11], which are discussed and remedied. Section 6 shows that this traditional approach of calculating the directional failure boosting cost is not directly applicable to practical schemes such as Kyber and Saber due to the compression of the ciphertexts, and introduces new methods that adapt the traditional directional failure boosting approach to these real-world schemes. Thirdly, Sect. 7 introduces two additional constraints an attacker might face when mounting a decryption failure attack, which have not been taken into account in previous failure boosting attacks. Finally, in Sect. 8 we discuss the impact of decryption failure attacks on Kyber and Saber.

2 Preliminaries

2.1 Notation

Denote with \(\mathbb {Z}_q\) the ring of integers modulo q, represented in \((-q/2, q/2]\). Let \(R_q\) be the ring \(\mathbb {Z}_q[X]/(X^N+1)\), with N a power of two, and let \(R^{l_1 \times l_2}\) be the set of \(l_1 \times l_2\) matrices over \(R_q\). We denote matrices with bold upper case (e.g. \(\mathbf {A}\)) and vectors and polynomials with bold lower case (e.g. \(\mathbf {b}\)).

Denote with \(\lfloor \cdot \rfloor \) flooring to the nearest lower integer, with \(\lfloor \cdot \rceil \) rounding to the nearest integer where ties are rounded upwards, and with \(\lfloor \cdot \rceil _{q \rightarrow p}\) multiplying by p/q followed by rounding, i.e. \(\lfloor x \rceil _{q \rightarrow p} = \lfloor p/q \cdot x \rceil \). Let \(\left| \cdot \right| \) denote taking the absolute value. These notations are extended to vectors, matrices and polynomials element wise. The \(l_2\) norm of a polynomial or vector of integers \(\mathbf {x}\) is defined as \(||\mathbf {x}||_2 = \sqrt{\sum _i \mathbf {x}_i^2}\) and for a vector of polynomials \(\mathbf {y}\) as \(||\mathbf {y}||_2 = \sqrt{\sum _i ||\mathbf {y}_i||_2^2}\).
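A minimal Python sketch of these rounding operators (the function names are ours; for simplicity the result is reduced to [0, p) rather than \((-p/2, p/2]\)):

```python
import math

def round_ties_up(x):
    # \lfloor x \rceil: round to the nearest integer, ties rounded upwards
    return math.floor(x + 0.5)

def mod_round(x, q, p):
    # \lfloor x \rceil_{q -> p} = round(p/q * x), here reduced to [0, p)
    return round_ties_up(p / q * x) % p

assert round_ties_up(2.5) == 3 and round_ties_up(-2.5) == -2
assert mod_round(1664, q=3329, p=2) == 1 and mod_round(832, q=3329, p=2) == 0
```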

Let \(x \leftarrow \chi \) mean sampling x according to a probability distribution \(\chi \), and let \(\mathbf {X} \leftarrow \chi (R^{l_1 \times l_2})\) denote sampling \(\mathbf {X} \in R^{l_1 \times l_2}\) with polynomial coefficients according to the distribution \(\chi \). When the values are sampled pseudorandomly based on a seed r, this is denoted as \(\mathbf {X} \leftarrow \chi (R^{l_1 \times l_2}; r)\). The uniform distribution is denoted \(\mathcal {U}\).

We write P[E] to denote the probability of an event E. To simplify notation we denote with P[a] the probability of sampling an element a from a certain distribution \(\chi \) when this distribution is clear from the context, i.e. \(P[x=a \ | \ x \leftarrow \chi ]\). Analogously, we denote with \(\mathbb {E}[a]\) the expected value of an element a as sampled from its distribution \(\chi \) when this distribution is clear from the context.

2.2 Cryptographic Definitions

We define a Public Key Encryption scheme (PKE) as a triplet of functions \((\mathtt {KeyGen}, \mathtt {Encrypt}, \mathtt {Decrypt})\), where the key generation \(\mathtt {KeyGen}\) generates a public key pk and a secret key sk, where the encryption \(\mathtt {Encrypt}\) takes a public key pk and a message m from the message space \(\mathcal {M}\) and generates a ciphertext ct, and where the decryption \(\mathtt {Decrypt}\) retrieves the message m from the ciphertext ct with high probability using the secret key sk. A PKE is \(\delta \)-correct if:

$$ \mathbb {E}\left[ P[\mathtt {Decrypt}(\mathtt {Encrypt}(m, pk), sk) \ne m ] \right] \le \delta . $$

Similarly, we define a Key Encapsulation Mechanism (KEM) as the functions \((\mathtt {KeyGen}, \mathtt {Encaps}, \mathtt {Decaps})\), where \(\mathtt {KeyGen}\) generates a public key pk and secret key sk, where \(\mathtt {Encaps}\) generates a key k from keyspace \(\mathcal {K}\) and a ciphertext ct given a public key pk, and where \(\mathtt {Decaps}\) outputs a key \(k'\) or \(\perp \) when given a ciphertext ct and corresponding secret key sk. We say that a KEM is \(\delta \)-correct if:

$$ \mathbb {E}\left[ P[\mathtt {Decaps}(ct, sk) \ne k : (ct, k) \leftarrow \mathtt {Encaps}(pk) ]\right] \le \delta . $$

The Module Learning with Errors (Mod-LWE) problem is a hard mathematical problem introduced by Langlois and Stehlé [22] as a generalization of the Learning with Errors (LWE) [26] and Ring Learning with Errors (Ring-LWE) [24] problems. Given integers N, q and l, the ring \(R_q = \mathbb {Z}_q[X] / (X^N + 1)\), a distribution with limited variance \(\chi \) and a secret element \(\mathbf {s}\in R^l_q\), samples from the Mod-LWE distribution \(\mathcal {L}_{R, N, q, l, \chi , \mathbf {s}}\) are generated as:

$$\begin{aligned}&(\mathbf {a}, \mathbf {b}:=\mathbf {a}^T \mathbf {s}+ \mathbf {e}) \end{aligned}$$
(1)
$$\begin{aligned}&\text {where: } \mathbf {a}\leftarrow \mathcal {U}(R^l_q); \mathbf {e}\leftarrow \chi (R_q) \end{aligned}$$
(2)

We will specifically focus on the case where N is a power of two. The decision Mod-LWE problem is then, given k samples, to determine whether they were generated as Mod-LWE samples from \(\mathcal {L}_{R, N, q, l, \chi , \mathbf {s}}\) or from the uniform distribution \(\mathcal {U}(R^l_q \times R_q)\). The search Mod-LWE problem consists of recovering the secret \(\mathbf {s}\) from k Mod-LWE samples. LWE is a specific instance where \(R_q = \mathbb {Z}_q\) and Ring-LWE the specific instance where \(l=1\).

Learning with Rounding (LWR), as introduced by Banerjee et al. [5], is a similar problem where the error \(\mathbf {e}\) is replaced with a deterministic error obtained by rounding. Analogous to the LWE problem, variants of LWR include Ring-LWR and Mod-LWR. Given two moduli q and p, where \(q>p\), sampling from the Mod-LWR distribution can be described as:

$$\begin{aligned}&(\mathbf {a}, \mathbf {b}:=\lfloor \mathbf {a}^T \mathbf {s} \rceil _{q \rightarrow p}) \end{aligned}$$
(3)
$$\begin{aligned}&\text {where: } \mathbf {a}\leftarrow \mathcal {U}(R^l_q) \end{aligned}$$
(4)

In this paper we will specifically consider the case where p|q. The Mod-LWR decisional and search problems are defined similarly to their Mod-LWE counterparts: in the decisional problem an adversary has to distinguish Mod-LWR samples from uniform ones, and in the search problem an adversary is tasked with retrieving the secret \(\mathbf {s}\) from k Mod-LWR samples.
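To make the two distributions concrete, the following toy sketch samples one Mod-LWE and one Mod-LWR instance (the parameters, the `polymul` helper and the centered binomial stand-in for \(\chi \) are our own illustrative choices, not a secure instantiation):

```python
import numpy as np

N, q, p, l = 8, 8192, 1024, 2          # toy parameters; N a power of two, p | q

def polymul(a, b):
    """Multiply two polynomials in Z_q[X]/(X^N + 1) (negacyclic convolution)."""
    c = np.zeros(N, dtype=np.int64)
    for i in range(N):
        for j in range(N):
            if i + j < N:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - N] -= a[i] * b[j]    # X^N = -1
    return c % q

def chi(shape, rng):
    """Small-variance noise: a centered binomial distribution."""
    return rng.binomial(2, 0.5, shape) - rng.binomial(2, 0.5, shape)

rng = np.random.default_rng(0)
s = chi((l, N), rng)                           # secret s in R_q^l
a = rng.integers(0, q, (l, N))                 # uniform a in R_q^l

# Mod-LWE sample (Eqs. 1-2): b = a^T s + e
b_lwe = (sum(polymul(a[i], s[i]) for i in range(l)) + chi(N, rng)) % q

# Mod-LWR sample (Eqs. 3-4): b = round(p/q * a^T s); the error is implicit
ats = sum(polymul(a[i], s[i]) for i in range(l)) % q
b_lwr = np.floor(p / q * ats + 0.5).astype(np.int64) % p
```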

2.3 Lattice-Based Encryption

A generic PKE based on the Mod-LWE or Mod-LWR assumption is given in Algorithms 1 to 3, where q, \(p_1\), \(p_2\) and t are scheme-dependent integers, where \(\chi _s\) and \(\chi _e\) are scheme-specific probability distributions with small variance, where \(r \in \mathcal {R} = \{0,1\}^{256}\) and where the message space \(\mathcal {M}\) consists of polynomials in \(R_q\) with coefficients in \(\{0,1\}\).

This generic protocol can be used to describe Saber, Kyber and the scheme studied in [11], which was designed to simplify the study of failure boosting and will be referred to as Katana. The parameters of these schemes are given in Table 1. For Saber and Kyber we consider the round 3 submissions as described in [6] and [27] respectively, which are the most recent versions at the time of writing.

For Kyber, the distributions \(\chi _s\) and \(\chi _e\) are centered binomial distributions with limited variance. There is no public key compression (i.e. \(q=p_1\)) but there is ciphertext compression (i.e. \(q>p_2>t\)). Saber similarly uses a centered binomial distribution for \(\chi _s\), but its distribution \(\chi _e\) always returns zero. Saber compresses both the public key and the ciphertext (i.e. \(q>p_1=p_2>t\)). Katana is an idealized scheme with Gaussian distributions for \(\chi _s\) and \(\chi _e\) and without compression of the public key or ciphertext (i.e. \(q=p_1=p_2=t\)).

[Algorithms 1 to 3: key generation, encryption and decryption of the generic lattice-based PKE]
Table 1. Parameters of Katana, Saber and Kyber. The security is based on the estimates of Albrecht et al. [2, 3]

2.4 Chosen-Ciphertext Security

To protect against chosen-ciphertext attacks, designers typically convert their passively secure PKE into an actively secure KEM using a generic transformation such as a post-quantum variant [20, 28] of the Fujisaki-Okamoto [14, 15] transformation. The obtained KEM has a similar key generation, while the encapsulation and decapsulation are constructed as described in Algorithms 4 and 5 respectively. The idea behind this transformation is that the input ciphertext is checked using a re-encryption of the decrypted message, and rejected if it is not valid. As a result of this procedure, an adversary does not learn anything from submitting invalid ciphertexts. However, for a valid ciphertext that leads to a decryption failure, the re-encryption check still fails, and we will assume that an attacker is able to recognize such an event.

[Algorithms 4 and 5: encapsulation and decapsulation of the KEM obtained through the Fujisaki-Okamoto transformation]
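The flow of the transformation can be sketched as follows. This is a simplified sketch only: the hash functions, their exact inputs and the explicit rejection are illustrative assumptions, and deployed schemes differ in such details (e.g. implicit rejection, additional hashed values):

```python
import hashlib
import os

def H(tag, *parts):
    """Domain-separated hash (SHA3-256 chosen arbitrarily for this sketch)."""
    h = hashlib.sha3_256(tag)
    for part in parts:
        h.update(part)
    return h.digest()

def encaps(pk, pke_encrypt):
    m = os.urandom(32)                    # random message
    r = H(b"G", m, pk)                    # derandomization: r = G(m, pk)
    ct = pke_encrypt(pk, m, r)            # deterministic encryption
    return ct, H(b"K", m, ct)             # ciphertext and shared key k

def decaps(sk, pk, ct, pke_encrypt, pke_decrypt):
    m2 = pke_decrypt(sk, ct)              # wrong message on a decryption failure
    r2 = H(b"G", m2, pk)
    if pke_encrypt(pk, m2, r2) == ct:     # re-encryption check
        return H(b"K", m2, ct)
    return None                           # reject: output bottom
```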

2.5 Decryption Failures

A decryption failure is an event in which one fails to recover the message or key, which can happen even after a correct execution of the protocol. The occurrence of decryption failures depends on the secret terms \(\mathbf {s}, \mathbf {s}', \mathbf {e}, \mathbf {e}', \mathbf {e}''\) in combination with the rounding errors \(\mathbf {u}, \mathbf {u}', \mathbf {u}''\), which are defined as:

$$\begin{aligned}&\mathbf {u}:=\mathbf {b}_q- (\mathbf {A}\mathbf {s}+ \mathbf {e}) \end{aligned}$$
(5)
$$\begin{aligned}&\mathbf {u}' :=\mathbf {b}_q' - (\mathbf {A}^T \mathbf {s}' + \mathbf {e}') \end{aligned}$$
(6)
$$\begin{aligned}&\mathbf {u}'' :=\mathbf {v}_q' - (\mathbf {b}_q^T \mathbf {s}' + \mathbf {e}'' + \mathbf {m}) \end{aligned}$$
(7)

Expanding the value of the received message \(m'\), we get:

$$\begin{aligned} \mathbf {m}'&= \lfloor 2/q \cdot (\mathbf {v}_q' - \mathbf {b}_q'^T \mathbf {s}) \rceil \end{aligned}$$
(8)
$$\begin{aligned}&= \mathbf {m} + \lfloor 2/q \cdot ((\mathbf {e}+ \mathbf {u})^T \mathbf {s}' - \mathbf {s}^T (\mathbf {e}' + \mathbf {u}') + (\mathbf {e}''+\mathbf {u}'')) \rceil \end{aligned}$$
(9)

and a decryption failure occurs if any coefficient of this error term exceeds the threshold \(q_t= q/4\), which can be formalized as follows:

$$ ||(\mathbf {e}+ \mathbf {u})^T \mathbf {s}' - \mathbf {s}^T (\mathbf {e}' + \mathbf {u}') + (\mathbf {e}''+\mathbf {u}'')||_\infty > q_t$$

Failure Vectors: Following [12] we define the failure vectors \(\mathbf {\mathcal {S}}\), \(\mathbf {\mathcal {C}}\), \(\mathbf {\mathcal {G}}\) as:

$$\begin{aligned} \mathbf {\mathcal {S}}= \begin{pmatrix} - \mathbf {s}\\ \mathbf {e}+\mathbf {u}\end{pmatrix} \quad \mathbf {\mathcal {C}}= \begin{pmatrix} \mathbf {e}'+\mathbf {u}' \\ \mathbf {s}' \end{pmatrix}\quad \mathbf {\mathcal {G}}= \mathbf {e}''+\mathbf {u}'' \end{aligned}$$
(10)

which simplifies the failure condition to:

$$ ||\mathbf {\mathcal {S}}^T \mathbf {\mathcal {C}}+ \mathbf {\mathcal {G}}||_\infty > q_t$$
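Checking this condition numerically is straightforward; a small sketch (coefficients as signed integers, since the error term is small and needs no modular reduction):

```python
import numpy as np

def polymul(a, b):
    """Multiply two integer polynomials in Z[X]/(X^N + 1)."""
    N = len(a)
    c = np.zeros(N, dtype=np.int64)
    for i in range(N):
        for j in range(N):
            if i + j < N:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - N] -= a[i] * b[j]    # X^N = -1
    return c

def decryption_fails(S, C, G, q):
    """True iff ||S^T C + G||_inf > q_t = q/4, for S and C vectors of polynomials."""
    err = sum(polymul(s_i, c_i) for s_i, c_i in zip(S, C)) + np.asarray(G)
    return int(np.max(np.abs(err))) > q // 4
```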

Geometric Notation: To streamline notation, we will use the geometric notation as introduced in [11]. The vector \(\overline{\mathbf {\mathcal {S}}}\in \mathbb {Z}_q^{2lN \times 1}\) is an integer vector representation of \(\mathbf {\mathcal {S}}\), obtained by arranging all coefficients of the polynomials of \(\mathbf {\mathcal {S}}\) in a vector. Additionally, the rotation of a vector of polynomials \(\mathbf {\mathcal {C}}\) is defined as:

$$\begin{aligned} \mathbf {\mathcal {C}}^{(r)} :=X^{r} \cdot \mathbf {\mathcal {C}}(X^{-1}) \mod \ X^N +1. \end{aligned}$$
(11)

Using this notation, the \(i^\text {th}\) coefficient of \(\mathbf {\mathcal {S}}^T\mathbf {\mathcal {C}}\) can be calculated as \(\overline{\mathbf {\mathcal {S}}}^T\overline{\mathbf {\mathcal {C}}^{(i)}}\). An illustration of these concepts is given in Example 1. For more information about this representation we refer to [11].

Example 1

[11] For a secret \(\mathbf {\mathcal {S}}\) and a ciphertext \(\mathbf {\mathcal {C}}\) in \(\mathbb {Z}^{2 \times 1}_q[X]/(X^3+1)\):

$$\begin{aligned} \mathbf {\mathcal {S}}= \begin{bmatrix} \mathbf {\mathcal {S}}_{0,0} + \mathbf {\mathcal {S}}_{0,1} X + \mathbf {\mathcal {S}}_{0,2} X^2 \\ \mathbf {\mathcal {S}}_{1,0} + \mathbf {\mathcal {S}}_{1,1} X + \mathbf {\mathcal {S}}_{1,2} X^2 \end{bmatrix}, \quad \mathbf {\mathcal {C}}= \begin{bmatrix} \mathbf {\mathcal {C}}_{0,0} + \mathbf {\mathcal {C}}_{0,1} X + \mathbf {\mathcal {C}}_{0,2} X^2 \\ \mathbf {\mathcal {C}}_{1,0} + \mathbf {\mathcal {C}}_{1,1} X + \mathbf {\mathcal {C}}_{1,2} X^2 \end{bmatrix} \end{aligned}$$

we get the following vectors:

$$\begin{aligned} \overline{\mathbf {\mathcal {S}}}= \begin{bmatrix} \mathbf {\mathcal {S}}_{0,0} \\ \mathbf {\mathcal {S}}_{0,1} \\ \mathbf {\mathcal {S}}_{0,2} \\ \mathbf {\mathcal {S}}_{1,0} \\ \mathbf {\mathcal {S}}_{1,1} \\ \mathbf {\mathcal {S}}_{1,2} \end{bmatrix}, \quad \overline{\mathbf {\mathcal {C}}^{(0)}} = \begin{bmatrix} \mathbf {\mathcal {C}}_{0,0} \\ - \mathbf {\mathcal {C}}_{0,2} \\ - \mathbf {\mathcal {C}}_{0,1} \\ \mathbf {\mathcal {C}}_{1,0} \\ - \mathbf {\mathcal {C}}_{1,2} \\ - \mathbf {\mathcal {C}}_{1,1} \end{bmatrix} \quad \overline{\mathbf {\mathcal {C}}^{(1)}} = \begin{bmatrix} \mathbf {\mathcal {C}}_{0,1} \\ \mathbf {\mathcal {C}}_{0,0} \\ - \mathbf {\mathcal {C}}_{0,2} \\ \mathbf {\mathcal {C}}_{1,1} \\ \mathbf {\mathcal {C}}_{1,0} \\ - \mathbf {\mathcal {C}}_{1,2} \end{bmatrix} \quad \overline{\mathbf {\mathcal {C}}^{(3)}} = \begin{bmatrix} - \mathbf {\mathcal {C}}_{0,0} \\ \mathbf {\mathcal {C}}_{0,2} \\ \mathbf {\mathcal {C}}_{0,1} \\ - \mathbf {\mathcal {C}}_{1,0} \\ \mathbf {\mathcal {C}}_{1,2} \\ \mathbf {\mathcal {C}}_{1,1} \end{bmatrix} \dots \end{aligned}$$
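The rotation and the identity \((\mathbf {\mathcal {S}}^T \mathbf {\mathcal {C}})_i = \overline{\mathbf {\mathcal {S}}}^T\overline{\mathbf {\mathcal {C}}^{(i)}}\) are easy to verify numerically; a small sketch consistent with Example 1:

```python
import numpy as np

def polymul(a, b):
    """Negacyclic convolution in Z[X]/(X^N + 1)."""
    N = len(a)
    c = np.zeros(N, dtype=np.int64)
    for i in range(N):
        for j in range(N):
            if i + j < N:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - N] -= a[i] * b[j]
    return c

def rotate(C, r):
    """Concatenated coefficient vector of X^r * C(X^{-1}) mod X^N + 1 (Eq. 11)."""
    out = []
    for c in C:
        N = len(c)
        v = np.zeros(N, dtype=np.int64)
        for j in range(N):                         # the term c_j X^{r-j}
            sign = 1 if (r - j) % (2 * N) < N else -1
            v[(r - j) % N] = sign * c[j]
        out.append(v)
    return np.concatenate(out)

rng = np.random.default_rng(1)
l, N = 2, 3                                        # matches Example 1
S = rng.integers(-2, 3, (l, N))
C = rng.integers(-2, 3, (l, N))
STC = sum(polymul(S[i], C[i]) for i in range(l))   # S^T C in the ring
assert all(STC[i] == S.reshape(-1) @ rotate(C, i) for i in range(N))
```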

Definitions: We will denote with F a decryption failure, and with S a successful decryption. \(F_i\) will denote an error at the \(i^\text {th}\) coefficient of \(\mathbf {\mathcal {S}}^T \mathbf {\mathcal {C}}+ \mathbf {\mathcal {G}}\), which happens when the absolute value of this coefficient is bigger than \(q_t\). Similarly \(S_i\) will denote a successful decryption of the \(i^\text {th}\) coefficient. Using the geometric notation we can say that an error \(F_i\) occurs if:

$$ \left| \overline{\mathbf {\mathcal {S}}}^T\overline{\mathbf {\mathcal {C}}^{(i)}} + \mathbf {\mathcal {G}}_i\right| > q_t$$

We will use the shorthand \(P_F[ct]\) to denote the failure probability P[F|ct] for a certain ciphertext ct, which can be formalized as:

$$\begin{aligned} P_F[ct] = \sum _{\forall \mathbf {\mathcal {S}}} P[\mathbf {\mathcal {S}}] \cdot P[F | ct, \mathbf {\mathcal {S}}] \end{aligned}$$

Sometimes, we will group ciphertexts in classes, where a class cl bundles all ciphertexts with certain properties, e.g. \(cl = \{ \forall ct : ||\mathbf {\mathcal {C}}||_2 = c, \mathbf {\mathcal {G}}= g \}\). In this case \(P_F[cl]\) denotes the weighted average of the failure probabilities of all ciphertexts in the class cl, which can be formalized as:

$$ P_F[cl] = \sum _{\forall ct: ct \in cl} P[ct] \cdot P[F | ct] $$

3 Failure Boosting Attacks

By exploiting decryption failures, an attacker can mount an attack that retrieves the secret key. The crux of such an attack is that failing ciphertexts give information that can be used to reconstruct the secret key as described in [11, 12] and [9]. In this paper we will focus on the process to obtain these failing ciphertexts as efficiently as possible.

We specifically target schemes that are IND-CCA secure, which implies that invalid ciphertexts are rejected by the decapsulation regardless of the occurrence of a decryption failure, and thus that they cannot give any information. As such, the attack surface is limited to submitting valid ciphertexts and observing whether a failure occurs.

Failure boosting [12] is a technique to increase the failure probability of valid ciphertexts submitted for decapsulation. It is a two-step process consisting of a precomputation step and a query step. We will discuss the cost of a failure boosting attack using two metrics: work \(\mathcal {W}\) and queries \(\mathcal {Q}\). Work describes the cost of precomputation, where \(1 \mathcal {W}\) is defined as the cost of generating one ciphertext, while \(\mathcal {Q}\) describes the total number of decapsulation queries performed.

Precomputation: During precomputation, the adversary performs an offline search for weak ciphertexts, i.e. valid ciphertexts with a high failure probability. This is accomplished by randomly generating ciphertexts until a ciphertext with failure probability above a certain threshold \(f_t\) is found. The probability that a randomly generated ciphertext is weak can be expressed as follows:

$$\begin{aligned} \alpha (f_t)= \sum _{\forall ct: P_F[ ct ] > f_t} P[ ct ]. \end{aligned}$$
(12)

Finding a weak ciphertext will take on average \(\alpha (f_t)^{-1}\) work, but this can be sped up quadratically using a quantum computer, to \(\sqrt{\alpha (f_t)^{-1}}\) work.

Querying: Once a weak ciphertext is found, it is submitted for decapsulation and the adversary observes whether it triggers a decryption failure. A failure happens with probability \(\beta (f_t)\) for a given threshold \(f_t\), which can be calculated as follows:

$$\begin{aligned} \beta (f_t)&= \frac{\sum _{\forall ct: P_F[ ct ]> f_t} P[ct] \cdot P_F[ct] }{\sum _{\forall ct: P_F[ ct ]> f_t} P[ct]} = \frac{\sum _{\forall ct: P_F[ ct ] > f_t} P[ct] \cdot P_F[ct] }{\alpha (f_t)}. \end{aligned}$$
(13)

The query step cannot be sped up using quantum computers, as an adversary typically has no quantum access to the decapsulation oracle. An adversary needs on average \(\beta (f_t)^{-1}\) queries to obtain one decryption failure.

Attack Cost: For a given threshold \(f_t\), finding a decryption failure costs on average \(\alpha (f_t)^{-1} \beta (f_t)^{-1}\) work and \(\beta (f_t)^{-1}\) queries, which can be reduced to \(\sqrt{\alpha (f_t)^{-1}} \beta (f_t)^{-1}\) work when Grover search is used during precomputation.
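As a toy illustration of Eqs. 12 and 13 and of the resulting cost trade-off, with an entirely fictitious set of ciphertext classes:

```python
import numpy as np

# Fictitious classes of ciphertexts with probability P[cl] and failure rate P_F[cl]
P_cl = np.array([0.60, 0.30, 0.09, 0.01])          # sums to 1
P_F  = np.array([1e-12, 1e-10, 1e-8, 1e-6])

def alpha_beta(f_t):
    weak = P_F > f_t                               # weak-ciphertext classes
    alpha = P_cl[weak].sum()                       # Eq. 12 (over classes)
    beta = (P_cl[weak] * P_F[weak]).sum() / alpha  # Eq. 13
    return alpha, beta

for f_t in [1e-11, 1e-9, 1e-7]:
    a, b = alpha_beta(f_t)
    work, queries = a**-0.5 * b**-1, b**-1         # Grover-assisted precomputation
    print(f"f_t={f_t:.0e}: work=2^{np.log2(work):.1f}, queries=2^{np.log2(queries):.1f}")
```

Raising \(f_t\) lowers the number of queries at the price of more precomputation, which is exactly the trade-off pictured in failure boosting curves.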

3.1 Directional Failure Boosting

Directional failure boosting [11] improves failure boosting and can be used when at least one other failure has been found. It specifically uses information of previously found failing ciphertexts to improve the search for new failures. In [11], this is done by calculating \(\mathbf {\mathcal {E}}\), an estimate of the direction of the secret \(\overline{\mathbf {\mathcal {S}}}\), and taking this into account in the failure estimation \(P_F[ct, \mathbf {\mathcal {E}}]\).

Directional failure boosting dramatically reduces the cost of finding additional failures once the first failure has been found. As a result, in a single target attack the work and number of queries are dominated by finding the first failure, and thus the cost of a single target attack can be approximated as the cost of finding the first failure. An in-depth discussion of directional failure boosting can be found in [11].

3.2 Estimation of Efficiency

The cost of (directional) failure boosting is described by Eqs. 12 and 13, which require summing over all possible ciphertexts. This is clearly infeasible, but can be simplified by using an approximate failure model and grouping ciphertexts with similar failure probability. Two such models have been presented in the literature: the Gaussian approximation and the geometric approximation.

Gaussian Approximation [12]: The Gaussian approximation considers the coefficients of \(\mathbf {\mathcal {S}}^T \mathbf {\mathcal {C}}\) to follow a Gaussian distribution with zero mean and variance depending on \(\mathbf {\mathcal {C}}\). This assumption can be used to accurately estimate failure boosting efficiency, but does not work for directional failure boosting estimations. The calculation method as presented in [12] takes both \(\mathbf {\mathcal {C}}\) and \(\mathbf {\mathcal {G}}\) into account in the weak ciphertext selection. For more information about the exact calculation methodology we refer the reader to [12].

Geometric Approximation [11]: The geometric approximation assumes that the angle \(\phi \) between \(\overline{\mathbf {\mathcal {S}}}\) and \(\overline{\mathbf {\mathcal {C}}^{(i)}}\) behaves as a uniformly random angle in dimension 2Nl. This approximation corresponds to the assumption that \(\chi _s\) and \(\chi _e\) are continuous Gaussian distributions with zero mean. Using the geometric approximation, the condition for an error at the \(i^\text {th}\) coefficient can be rewritten from:

$$\begin{aligned} \left| \overline{\mathbf {\mathcal {S}}}^T \overline{\mathbf {\mathcal {C}}^{(i)}} + \mathbf {\mathcal {G}}_i \right| > q_t\end{aligned}$$
(14)

to:

$$\begin{aligned} \left| \ ||\mathbf {\mathcal {S}}||_2 \cdot ||\mathbf {\mathcal {C}}||_2 \cdot \cos (\phi ) + \mathbf {\mathcal {G}}_i \right| > q_t \end{aligned}$$
(15)

In directional failure boosting, the vectors \(\overline{\mathbf {\mathcal {S}}}\) and \(\overline{\mathbf {\mathcal {C}}^{(i)}}\) are first expanded in a part parallel and a part orthogonal to the estimate of the secret \(\mathbf {\mathcal {E}}\):

$$\begin{aligned} \left| \overline{\mathbf {\mathcal {S}}}_{\perp }^T \overline{\mathbf {\mathcal {C}}^{(i)}}_{\perp } + \overline{\mathbf {\mathcal {S}}}_{\parallel }^T \overline{\mathbf {\mathcal {C}}^{(i)}}_{\parallel } + \mathbf {\mathcal {G}}_i \right| > q_t\end{aligned}$$
(16)

which can be further expanded to:

$$\begin{aligned}&\left| \begin{array}{l}||\mathbf {\mathcal {S}}||_2 \cdot ||\mathbf {\mathcal {C}}||_2 \cdot \cos (\theta _{SE}) \cdot \cos (\theta _{C^iE}) + \\ ||\mathbf {\mathcal {S}}||_2 \cdot ||\mathbf {\mathcal {C}}||_2 \cdot \sin (\theta _{SE}) \cdot \sin (\theta _{C^iE}) \cdot \cos (\psi ) + \mathbf {\mathcal {G}}_i \end{array} \right| > q_t \end{aligned}$$
(17)

with \(\psi \) a uniformly random angle in dimension \(2Nl - 1\). In D’Anvers et al. [11], the \(\mathbf {\mathcal {G}}\) term was neglected in the calculations. A more detailed explanation of this technique can be found in [11].

Attack Cost Estimation: Using the above approximations, one can bundle ciphertexts with similar failure probability in classes cl to reduce the cost of calculating \(\alpha (f_t)\) and \(\beta (f_t)\). The values of \(\alpha (f_t)\) and \(\beta (f_t)\) can be calculated using the formulas below, with the difference that P[cl] is the probability that a randomly generated ciphertext belongs to the specific class cl, and \(P_F[ cl ]\) the failure probability of ciphertexts in that class.

$$\begin{aligned} \alpha (f_t)&\approx \sum _{\forall cl: P_F[ cl ] > f_t} P[ cl ] \end{aligned}$$
(18)
$$\begin{aligned} \beta (f_t)&\approx \frac{\sum _{\forall cl: P_F[ cl ] > f_t} P[cl] \cdot P_F[cl] }{\alpha (f_t)} \end{aligned}$$
(19)

For example, under the geometric approximation, one bundles all ciphertexts with similar \(||\mathbf {\mathcal {C}}||_2\) for failure boosting. Directional failure boosting in the geometric approximation defines classes based on \(||\mathbf {\mathcal {C}}||_2\) and the closest angle \(\mathrm {maxcos}_i(\theta _{C^{(i)}E})\) between the rotations of the ciphertext and the estimate of the secret \(\mathbf {\mathcal {E}}\).

4 Multitarget Attacks

One of the main constraints in a practical attack is the number of queries that can be performed. For example, NIST [1] set a maximum of \(q_{limit}= 2^{64}\) decapsulation queries per target that can be performed during an attack. One possibility to circumvent such limitation is to consider multiple targets, with the goal of breaking one of these targets.

Such a multitarget attack queries a certain number of targets \(T^{(0)}\), where each target has an individual query limit. The goal is to retrieve the secret key for at least one of these targets. We assume that multitarget protection is in place, so that ciphertexts are only valid for one given public key and thus target. Such multitarget protection is easily obtained by incorporating (a hash of) the public key in the ciphertext generation, which is the case for Saber and Kyber.

4.1 Naive Multitarget

A naive variant of the multitarget attack was introduced in [11], which proceeds as follows: First, find the first failure by performing at most \(q_{limit}/ 2\) queries per target, which in total implies a maximum of \(T^{(0)} \cdot q_{limit}/ 2\) queries. Then, focus on the target that caused the failure and continue with a single target attack on this target with query limit \(q_{limit}/2\).

First note that due to multitarget protection, each generated weak ciphertext is linked to a specific public key and can only be used for that target. Moreover, one can assume that given only the public key the adversary has no efficient way to retrieve information about the secret key \(\mathbf {\mathcal {S}}\) without solving the Mod-LWE/LWR problem. This implies that he has no efficient way to distinguish between targets with higher or lower failure probability and thus that generating a weak ciphertext and querying it has exactly the same failure probability at each target.

Assuming that successful queries do not contribute any information about the targets, the failure probability at each target stays the same until a decryption failure has been found. Therefore, we can say that finding one failure at \(T^{(0)}\) targets with a maximum of \(q_{limit}/2\) queries per target has the same cost as finding one failure at one target with a maximum of \(T^{(0)} \cdot q_{limit}/ 2\) queries, so that the cost of finding the first failure in the naive multitarget attack can be described with:

$$\begin{aligned}&\sqrt{\alpha _{0}^{-1}} \beta _{0}^{-1} \text { work, and } \beta _{0}^{-1} \text { queries, } \end{aligned}$$
(20)

under the condition that:

$$\begin{aligned}&\beta _{0}^{-1} < T^{(0)} \cdot q_{limit}/ 2, \end{aligned}$$
(21)

where \(\alpha _{i}\) and \(\beta _{i}\) denote the optimal values for \(\alpha (f_t)\) and \(\beta (f_t)\) for the \(i^\text {th}\) failure, which can be determined by selecting the value of \(f_t\) that optimally reduces the work while fulfilling the query limit constraint.

To estimate the cost of finding the follow-up failures, we can use the approximation from [11], which states that in a single target attack the attack cost is dominated by finding the first failure. In this case, the first failure of the single target attack is the second overall failure, so that the cost of finding the follow-up failures can be calculated as:

$$\begin{aligned}&\sqrt{\alpha _{1}^{-1}} \beta _{1}^{-1} \text { work, and } \beta _{1}^{-1} \text { queries, } \end{aligned}$$
(22)

under the condition that:

$$\begin{aligned}&\beta _{1}^{-1} < q_{limit}/ 2, \end{aligned}$$
(23)

One can easily see that the total number of queries per target is always under \(q_{limit}\) in this scenario.

4.2 Levelled Multitarget

When the cost of finding the second failure is the dominant factor, this naive multitarget attack can be improved using a levelled approach. Notice that the naive multitarget attack essentially reduces the cost of the attack by relaxing the query limit constraint for finding the first failure. To reduce the cost of finding the second failure, we can similarly focus on multiple targets to relax the query constraint. However, this requires the attacker to find multiple failing ciphertexts in the first step of the attack.

Fig. 1. Example of multitarget attacks on Katana, with \(2^{64}\) targets and maximum \(2^{64}\) queries. The cost of finding one failure is indicated with x. The cost of finding \(T^{(1)}\) failures using failure boosting in the first phase is given by the blue dot, and the corresponding number of queries can be found as \(\beta ^{-1}\), where \(\beta \) is the x-axis value of this point. In the naive multitarget attack the cost is dominated by finding the second failure in under \(2^{64}\) queries. In the levelled approach the cost of the two phases is equalized.

More specifically, in the first phase the attacker aims at obtaining a first failing ciphertext for each of \(T^{(1)}\) targets using under \(q_{limit}/ 3\) queries per target (a total of at most \(T^{(0)} \cdot q_{limit}/ 3\) queries). This has a cost of:

$$\begin{aligned}&T^{(1)} \sqrt{\alpha _{0}^{-1}} \beta _{0}^{-1} \text { work and } T^{(1)} \beta _{0}^{-1} \text { queries}. \end{aligned}$$
(24)

Under the condition that:

$$\begin{aligned} T^{(1)} \beta _{0}^{-1} < T^{(0)} \cdot q_{limit}/ 3 . \end{aligned}$$
(25)

The attacker can then use \(T^{(1)} \cdot q_{limit}/ 3\) queries to find the next failure, which has a cost of:

$$\begin{aligned}&\sqrt{\alpha _{1}^{-1}} \beta _{1}^{-1} \text { work and } \beta _{1}^{-1} \text { queries}. \end{aligned}$$
(26)

Under the condition that:

$$\begin{aligned} \beta _{1}^{-1} < T^{(1)} \cdot q_{limit}/ 3. \end{aligned}$$
(27)

Once a second failure is found for a given target, the attack continues with a single target attack on that target using at most \(q_{limit}/ 3\) queries. An overview of this levelled multitarget approach is given in Table 2. Note that the query limit per phase is chosen so that the total number of queries at each target over all failures is always under \(q_{limit}\). Figure 1 gives a graphical comparison of the naive and levelled multitarget attacks on Katana.

Table 2. Comparison of the naive and levelled multitarget attack. Note that \(\alpha \) and \(\beta \) values are not the same between both methods as the difference in query limits leads to a different optimal \(f_t\).

In principle it is possible to extend this approach to more levels: if the third failure were more expensive than the previous two failures, one could target \(T^{(2)}\) targets to reduce the cost of finding the third failure. However, we did not find a situation in which this was applicable, as finding the third failure is typically much cheaper than finding the previous failures.
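The trade-off between the two attacks can be illustrated with a toy cost model, in which the work to find a failure decreases as the allowed number of queries grows (the constants and curve shapes below are invented for illustration; in reality \(\alpha \) and \(\beta \) follow from optimizing \(f_t\)):

```python
import math

# Toy model: work w_i(Q) ~ sqrt(1/alpha)/beta to find failure i when up to Q
# decryption queries may be spent on it (capped once more queries stop helping).
def w0(Q): return 2**140 / min(Q, 2**80)   # first failure
def w1(Q): return 2**130 / min(Q, 2**70)   # second failure

T0, qlim = 2**64, 2**64

# Naive multitarget (Sect. 4.1): budget qlim/2 for each phase
naive = w0(T0 * qlim / 2) + w1(qlim / 2)
print(f"naive:    2^{math.log2(naive):.1f} work")

# Levelled (Sect. 4.2): find T1 first failures, relaxing the phase-2 query budget
for T1 in [2**2, 2**4, 2**6]:
    cost = T1 * w0(T0 * qlim / (3 * T1)) + w1(T1 * qlim / 3)
    print(f"levelled, T1=2^{int(math.log2(T1))}: 2^{math.log2(cost):.1f} work")
```

In this toy model the naive attack costs about \(2^{67}\) work, while the levelled attack with \(T^{(1)} = 2^4\) costs about \(2^{65}\), with the cost of the two phases roughly balanced, mirroring Fig. 1.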

5 Better Failure Boosting Estimation

The calculation of the work necessary to perform a multitarget attack is not straightforward. In particular, the cost of directional failure boosting is expensive to determine and requires multiple approximations to be practically computable. D’Anvers et al. [11] introduced crude approximations to reduce the computational cost of this calculation.

Apart from the geometric approximation explained in Subsect. 3.2, they did not consider \(\mathbf {\mathcal {G}}\), simplified the distribution of \(||\mathbf {\mathcal {S}}||_2\) to its average and used a simplified formula for the calculation of \(\theta _{SE}\). Additionally, there is a weak key effect in multitarget attacks which has not been addressed before.

These simplifications are justifiable in the single target attack, where the cost of the second failure is significantly lower than that of the first. However, in multitarget attacks, where the cost of the second failure might be dominant, it is important to have an accurate estimate of this cost. We will first detail the weak key effect, then improve the estimation of \(\cos (\theta _{SE})\), and finally take the distributions of \(||\mathbf {\mathcal {S}}||_2\) and \(\mathbf {\mathcal {G}}\) into account, comparing each improvement with the state-of-the-art. In this section we focus on schemes where \(\chi _s = \chi _e\) and without rounding, while in Sect. 6 we extend the estimation techniques to more general schemes, including the NIST finalists Kyber and Saber.

5.1 Weak Keys

Some targets might have secret keys that are more prone to decryption failures, which we will call weak keys. It does not seem possible to efficiently identify targets with weak keys from their public key. However, in a multitarget attack, weak key targets are more prone to produce a failing ciphertext. This means that in the second phase of the attack, when looking for the second failure of a certain target, this target will have higher failure probability compared to a single target attack.

In particular, the norm of the secret \(||\mathbf {\mathcal {S}}||_2\) determines the failure probability of a given target. We will show that, in a multitarget attack where \(T^{(0)}\) targets are considered in the first phase with failure boosting threshold \(f_t\), the a posteriori distribution of \(||\mathbf {\mathcal {S}}||_2\) can be approximated as:

$$\begin{aligned} P[ ||\mathbf {\mathcal {S}}||_2 ] \cdot \frac{ T^{(0)} P\left[ F \ \left| \ ||\mathbf {\mathcal {S}}||_2, f_t\right. \right] }{P\left[ F \ \left| \ ||\mathbf {\mathcal {S}}||_2, f_t\right. \right] + (T^{(0)} - 1) \cdot P\left[ F \ \left| \ f_t\right. \right] } \end{aligned}$$
(28)

To derive this formula, we first introduce the notation F(t, q) to describe the event where the overall first failure occurs at target t on the \(q^\text {th}\) query. Similarly, we define S(t, q) as a success at target t on the \(q^\text {th}\) query. \(F(t, \cdot )\) signifies the event where the first failure occurs at target t, regardless of at which query this happens. Without loss of generality we denote the target where the first failure occurs as target \(t=0\), which implies that \(||\mathbf {\mathcal {S}}||_2\) denotes the norm of \(\mathbf {\mathcal {S}}\) for the \(0^\text {th}\) target. To simplify the derivation, we will assume that the \(i^\text {th}\) query is performed at all targets at the same time, after which they are all checked for decryption failures. We can then write:

$$\begin{aligned}&P[ ||\mathbf {\mathcal {S}}||_2 \ | \ F(0, \cdot ), f_t] \end{aligned}$$
(29)
$$\begin{aligned}&= P[ ||\mathbf {\mathcal {S}}||_2 \ | \ f_t] \cdot \frac{ P[ F(0, \cdot ) \ \ | \ \ ||\mathbf {\mathcal {S}}||_2, f_t] }{ P[ F(0, \cdot ) \ | \ f_t] } \end{aligned}$$
(30)
$$\begin{aligned}&\approx T^{(0)} \cdot P[ ||\mathbf {\mathcal {S}}||_2 ] \cdot P[ F(0, \cdot ) \ \ | \ \ ||\mathbf {\mathcal {S}}||_2, f_t] \end{aligned}$$
(31)

where the latter step uses the fact that, without extra information about the norms \(||\mathbf {\mathcal {S}}||_2\) of the targets, a failure occurs with equal probability at each of the \(T^{(0)}\) targets.

The term \(P[ F(0, \cdot ) \ | \ ||\mathbf {\mathcal {S}}||_2, f_t]\) can then be extended by explicitly writing it out as a sum over the probabilities of failures at each query round:

$$\begin{aligned}&P[ F(0, \cdot ) \ | \ ||\mathbf {\mathcal {S}}||_2, f_t] \end{aligned}$$
(32)
$$\begin{aligned}&= \sum _{q=0}^\infty P\left[ \left. \begin{matrix} F(0, q), S(i, j) \\ \forall i \in \{0,\dots , T^{(0)} - 1\}, j \in \{0,\dots , q\}: (i, j) \ne (0, q) \end{matrix} \ \right| \ ||\mathbf {\mathcal {S}}||_2, f_t\right] \end{aligned}$$
(33)
$$\begin{aligned}&= \sum _{q=0}^\infty P\left[ \left. \begin{matrix} F(0, q), S(0, j) \\ \forall j \in \{0\dots ,q-1\} \end{matrix} \ \right| \ ||\mathbf {\mathcal {S}}||_2, f_t\right] \cdot P\left[ \left. \begin{matrix} S(1, j) \\ \forall j \in \{0\dots ,q\} \end{matrix} \ \right| \ f_t\right] ^{T^{(0)} - 1} \end{aligned}$$
(34)

The failure probability of a target is reduced slightly when successful ciphertexts are found. However, this effect is small, as the information embedded in successful ciphertexts is limited. We therefore assume that the failure probability of ciphertexts does not change when finding successful ciphertexts. This allows us to simplify the expression as:

$$\begin{aligned}&\approx \sum _{q=0}^\infty P\left[ F \ \left| \ ||\mathbf {\mathcal {S}}||_2, f_t\right. \right] \cdot P\left[ S \ \left| \ ||\mathbf {\mathcal {S}}||_2, f_t\right. \right] ^q \cdot P\left[ S \ \left| \ f_t\right. \right] ^{(T^{(0)} - 1)(q + 1)} \nonumber \\&\approx P\left[ F \ \left| \ ||\mathbf {\mathcal {S}}||_2, f_t\right. \right] \cdot P\left[ S \ \left| \ f_t\right. \right] ^{(T^{(0)} - 1)} \sum _{q=0}^\infty \left( P\left[ S \ \left| \ ||\mathbf {\mathcal {S}}||_2, f_t\right. \right] \cdot P\left[ S \ \left| \ f_t\right. \right] ^{(T^{(0)} - 1)} \right) ^q \end{aligned}$$
(35)
$$\begin{aligned}&\approx \frac{P\left[ F \ \left| \ ||\mathbf {\mathcal {S}}||_2, f_t\right. \right] \cdot P\left[ S \ \left| \ f_t\right. \right] ^{(T^{(0)} - 1)} }{1 - P\left[ S \ \left| \ ||\mathbf {\mathcal {S}}||_2, f_t\right. \right] \cdot P\left[ S \ \left| \ f_t\right. \right] ^{(T^{(0)} - 1)}} \nonumber \\&\approx \frac{P\left[ F \ \left| \ ||\mathbf {\mathcal {S}}||_2, f_t\right. \right] }{P\left[ F \ \left| \ ||\mathbf {\mathcal {S}}||_2, f_t\right. \right] + (T^{(0)} - 1) \cdot P\left[ F \, \left| \, f_t\right. \right] } \end{aligned}$$
(36)

where Eq. 35 is an infinite geometric sum, and Eq. 36 takes a Taylor approximation where only the highest order terms are kept. We will discuss the effect of weak keys in the next section, after its effects on \(\theta _{SE}\) have been addressed.
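Equation 28 is straightforward to evaluate numerically once \(P[F \mid ||\mathbf {\mathcal {S}}||_2, f_t]\) is available; a sketch with an invented norm distribution and failure model:

```python
import numpy as np

# Invented discretized a priori distribution of ||S||_2 (roughly Gaussian)
norms = np.linspace(60, 90, 100)
P_norm = np.exp(-0.5 * ((norms - 75) / 4) ** 2)
P_norm /= P_norm.sum()

# Invented conditional failure probability P[F | ||S||_2, f_t], growing in the norm
P_F_given_norm = 2.0 ** (-120 + 0.8 * (norms - 75))
P_F = (P_norm * P_F_given_norm).sum()            # P[F | f_t]

T0 = 2**64                                       # number of targets
posterior = P_norm * T0 * P_F_given_norm / (P_F_given_norm + (T0 - 1) * P_F)
posterior /= posterior.sum()                     # renormalize the approximation

print("a priori  E[||S||_2]:", (norms * P_norm).sum())
print("posterior E[||S||_2]:", (norms * posterior).sum())  # shifted upwards
```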

5.2 Calculating \(\theta _{SE}\)

The angle \(\theta _{SE}\) can be estimated using the simplified failure equation. Assuming a failure occurred at the \(i^\text {th}\) location we know:

$$\begin{aligned} \overline{\mathbf {\mathcal {S}}}^T \overline{\mathbf {\mathcal {C}}^{(i)}} > q_t,\end{aligned}$$
(37)

which can be rewritten as:

$$\begin{aligned} \cos (\theta _{SE}) > \frac{q_t}{||\mathbf {\mathcal {S}}||_2 ||\mathbf {\mathcal {C}}||_2}.\end{aligned}$$
(38)

The fact that uniform angles in high dimensions strongly tend to orthogonality can be used to approximate this to:

$$\begin{aligned} \cos (\theta _{SE}) = \frac{q_t}{||\mathbf {\mathcal {S}}||_2 ||\mathbf {\mathcal {C}}||_2}.\end{aligned}$$
(39)

As such, we can estimate the expected value of \(\cos (\theta _{SE})\), assuming independence between \(||\mathbf {\mathcal {S}}||_2\) and \(||\mathbf {\mathcal {C}}||_2\), as:

$$\begin{aligned} \mathbb {E}\left[ \cos (\theta _{SE}) \right] = \frac{q_t}{\mathbb {E}\left[ ||\mathbf {\mathcal {S}}||_2 \right] \mathbb {E}\left[ ||\mathbf {\mathcal {C}}||_2 \right] }.\end{aligned}$$
(40)

In [11], the values of \(\mathbb {E}\left[ ||\mathbf {\mathcal {S}}||_2 \right] \) and \(\mathbb {E}\left[ ||\mathbf {\mathcal {C}}||_2 \right] \) were estimated over the original a priori distributions. However, failure boosting increases the expected norm \(\mathbb {E}\left[ ||\mathbf {\mathcal {C}}||_2 \right] \) and the weak key effect increases the expected norm \(\mathbb {E}\left[ ||\mathbf {\mathcal {S}}||_2 \right] \). Both effects decrease \(\mathbb {E}\left[ \cos (\theta _{SE}) \right] \) and therefore diminish the efficiency of directional failure boosting.

We take these effects into account by considering the a posteriori distributions as follows:

$$\begin{aligned} \mathbb {E}\left[ ||\mathbf {\mathcal {C}}||_2 \right]&= \sum _{||\mathbf {\mathcal {C}}||_2} ||\mathbf {\mathcal {C}}||_2 \cdot P[||\mathbf {\mathcal {C}}||_2 \ | \ f_t] \end{aligned}$$
(41)
$$\begin{aligned} \mathbb {E}\left[ ||\mathbf {\mathcal {S}}||_2 \right]&= \sum _{||\mathbf {\mathcal {S}}||_2} ||\mathbf {\mathcal {S}}||_2 \cdot P[||\mathbf {\mathcal {S}}||_2 \ | \ F(0, \cdot ), f_t] \end{aligned}$$
(42)

Note that our expression for \(\mathbb {E}\left[ \cos (\theta _{SE}) \right] \) is now significantly more accurate than in previous works, but still not exact, for the following reasons: First, \(\mathbb {E}\left[ ||\mathbf {\mathcal {C}}||_2 \right] \) will be slightly higher than calculated above, as failures happen with higher probability for larger values of \(||\mathbf {\mathcal {C}}||_2\). However, this effect is limited: failure boosting pushes \(||\mathbf {\mathcal {C}}||_2\) to high values where the tails of the distribution decrease rapidly, so the values of \(||\mathbf {\mathcal {C}}||_2\) will be strongly focussed around the cut-off value. Secondly, the independence assumption used to obtain Eq. 40 is not exact. Nevertheless, the approximation is good enough for our purposes.
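Given discretized a posteriori distributions, Eqs. 40 to 42 amount to a few lines (the distributions below are placeholders; in the real computation they come from failure boosting and Eq. 28):

```python
import numpy as np

q_t = 832.0                                       # toy threshold q/4

def expectation(values, probs):
    probs = np.asarray(probs) / np.sum(probs)
    return float(np.sum(np.asarray(values) * probs))

# placeholder a posteriori distributions of the norms
S_norms, P_S = [70.0, 75.0, 80.0], [0.2, 0.5, 0.3]       # P[||S||_2 | F(0,.), f_t]
C_norms, P_C = [95.0, 100.0, 105.0], [0.25, 0.5, 0.25]   # P[||C||_2 | f_t]

E_S = expectation(S_norms, P_S)                   # Eq. 42
E_C = expectation(C_norms, P_C)                   # Eq. 41
print("E[cos(theta_SE)] ~", q_t / (E_S * E_C))    # Eq. 40
```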

Comparison to State-of-the-art: Figure 2a shows the effect of including the weak key effect and of improving the \(\cos (\theta _{SE})\) estimation. On the one hand, one can see that the weak key effect increases the failure probability, which is the leftmost point on the curve, from \(2^{-115}\) to \(2^{-107}\). On the other hand, the increase in \(\mathbb {E}\left[ ||\mathbf {\mathcal {S}}||_2 \right] \) and \(\mathbb {E}\left[ ||\mathbf {\mathcal {C}}||_2\right] \), and the subsequent reduction of \(\mathbb {E}\left[ \cos (\theta _{SE}) \right] \), reduce the effectiveness of directional failure boosting, an effect that becomes more pronounced with higher precomputation.

Fig. 2. Effect of the inclusion of weak keys and of \(||\mathbf {\mathcal {S}}||_2\) and \(\mathbf {\mathcal {G}}\) on Katana. The red cross indicates the failure probability of Katana (or, equivalently, the cost of finding a failure by random guessing).

5.3 Inclusion of \(\mathbf {\mathcal {S}}\) and \(\mathbf {\mathcal {G}}\)

In [11], the distributions of \(||\mathbf {\mathcal {S}}||_2\) and \(\mathbf {\mathcal {G}}\) were simplified to their mean to speed up calculations. However, the side-effect of this is an underestimation of the failure probability and the attack efficiency. In our calculations, we take into account the distribution of both \(||\mathbf {\mathcal {S}}||_2\) and \(\mathbf {\mathcal {G}}\).

Failure Boosting: Under the geometric approximation, the failure boosting cost can be calculated by forming classes based on \(||\mathbf {\mathcal {C}}||_2\) and using Eqs. 18 and 19 to determine \(\alpha (f_t)\) and \(\beta (f_t)\).

Including \(\mathbf {\mathcal {S}}\) and \(\mathbf {\mathcal {G}}\) does not change the ciphertext class probability P[cl], but does impact the failure probability \(P_F[cl]\) needed to calculate \(\alpha (f_t)\) and \(\beta (f_t)\). A more exact expression for this failure probability, which takes \(||\mathbf {\mathcal {S}}||_2\) and \(\mathbf {\mathcal {G}}\) into account, can be derived as follows:

$$\begin{aligned}&P_F[cl] = P_F[||\mathbf {\mathcal {C}}||_2] \end{aligned}$$
(43)
$$\begin{aligned}&= \sum _{||\mathbf {\mathcal {S}}||_2} P[||\mathbf {\mathcal {S}}||_2] \cdot P[F \ | \ ||\mathbf {\mathcal {C}}||_2, ||\mathbf {\mathcal {S}}||_2] \end{aligned}$$
(44)
$$\begin{aligned}&= \sum _{||\mathbf {\mathcal {S}}||_2} P[||\mathbf {\mathcal {S}}||_2] \cdot \left( 1 - \prod _{i=0}^{N-1} \left( 1 - P[F_i \ | \ ||\mathbf {\mathcal {C}}||_2, ||\mathbf {\mathcal {S}}||_2] \right) \right) \end{aligned}$$
(45)
$$\begin{aligned}&= \sum _{||\mathbf {\mathcal {S}}||_2} P[||\mathbf {\mathcal {S}}||_2] \cdot \left( 1 - \prod _{i=0}^{N-1} \left( 1 - \sum _{\mathbf {\mathcal {G}}_i} P[\mathbf {\mathcal {G}}_i] \cdot P[F_i \ | \ ||\mathbf {\mathcal {C}}||_2, ||\mathbf {\mathcal {S}}||_2, \mathbf {\mathcal {G}}_i] \right) \right) \end{aligned}$$
(46)

where \(P[F_i \ | \ ||\mathbf {\mathcal {C}}||_2, ||\mathbf {\mathcal {S}}||_2, \mathbf {\mathcal {G}}_i]\) can be calculated following the geometric approximation of Eq. 15 as:

$$\begin{aligned} P[ F_i \ | \ ||\mathbf {\mathcal {S}}||_2, ||\mathbf {\mathcal {C}}||_2, \mathbf {\mathcal {G}}_i ] = \begin{array}{l} P[ \cos (\phi ) > \frac{q_t- \mathbf {\mathcal {G}}_i}{||\mathbf {\mathcal {S}}||_2 \cdot ||\mathbf {\mathcal {C}}||_2} \ | \ ||\mathbf {\mathcal {C}}||_2, ||\mathbf {\mathcal {S}}||_2, \mathbf {\mathcal {G}}_i ] \\ + P[ \cos (\phi ) < \frac{-q_t- \mathbf {\mathcal {G}}_i}{||\mathbf {\mathcal {S}}||_2 \cdot ||\mathbf {\mathcal {C}}||_2} \ | \ ||\mathbf {\mathcal {C}}||_2, ||\mathbf {\mathcal {S}}||_2, \mathbf {\mathcal {G}}_i ] \end{array}, \end{aligned}$$
(47)

and where \(\phi \) can be modelled as a uniformly random angle in dimension 2Nl.
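The tail probability of \(\cos (\phi )\) can be computed with the regularized incomplete beta function, since \(\cos ^2(\phi ) \sim \mathrm {Beta}(1/2, (d-1)/2)\) for a uniformly random direction in dimension d. A sketch of Eq. 47 with toy inputs:

```python
from scipy.special import betainc

def P_cos_gt(x, d):
    """P[cos(phi) > x] for a uniformly random angle phi in dimension d."""
    x = min(max(x, -1.0), 1.0)
    half_two_sided = 0.5 * (1.0 - betainc(0.5, (d - 1) / 2.0, x * x))  # P[|cos| > |x|] / 2
    return half_two_sided if x >= 0 else 1.0 - half_two_sided

def P_Fi(norm_S, norm_C, G_i, q_t, d):
    """Eq. 47: error probability of coefficient i under the geometric model."""
    hi = (q_t - G_i) / (norm_S * norm_C)
    lo = (-q_t - G_i) / (norm_S * norm_C)
    return P_cos_gt(hi, d) + (1.0 - P_cos_gt(lo, d))

# toy numbers: N = 256, l = 3, so the dimension is 2*N*l = 1536
print(P_Fi(norm_S=75.0, norm_C=100.0, G_i=0.0, q_t=832.0, d=1536))
```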

Directional Failure Boosting: The procedure for directional failure boosting is more complicated, as one should enumerate all combinations of \(||\mathbf {\mathcal {C}}||_2\) and \(\mathrm {maxcos}_i(\theta _{C^{(i)}E})\). As before, the calculation of P[cl] is the same as in [11], but the calculation of \(P_F[cl]\) should additionally take \(||\mathbf {\mathcal {S}}||_2\) and \(\mathbf {\mathcal {G}}\) into account.

Without loss of generality we will assume that the highest value of \(\cos (\theta _{C^{(i)}E}) \) occurs at \(i=0\), so that \(\mathrm {maxcos}_i(\theta _{C^{(i)}E})= \cos (\theta _{C^{(0)}E})\). Similar to the derivation of Eq. 46, the failure probability can then be calculated as:

$$\begin{aligned}&P_F[cl] = P_F[||\mathbf {\mathcal {C}}||_2, \theta _{C^{(0)}E}] \end{aligned}$$
(48)
$$\begin{aligned}&= \sum _{||\mathbf {\mathcal {S}}||_2} P[||\mathbf {\mathcal {S}}||_2] \cdot P[F \ | \ ||\mathbf {\mathcal {C}}||_2, \theta _{C^{(0)}E}, ||\mathbf {\mathcal {S}}||_2] \end{aligned}$$
(49)
$$\begin{aligned}&= \sum _{||\mathbf {\mathcal {S}}||_2} P[||\mathbf {\mathcal {S}}||_2] \cdot \left( 1 - \prod _{i=0}^{N-1} \left( 1 - P[F_i \ | \ ||\mathbf {\mathcal {C}}||_2, \theta _{C^{(0)}E}, ||\mathbf {\mathcal {S}}||_2] \right) \right) \end{aligned}$$
(50)
$$\begin{aligned}&\approx \sum _{||\mathbf {\mathcal {S}}||_2} P[||\mathbf {\mathcal {S}}||_2] \cdot \left( 1 - \left( \begin{array}{l} \left( 1 - P[F_0 \ | \ ||\mathbf {\mathcal {C}}||_2, \theta _{C^{(0)}E}, ||\mathbf {\mathcal {S}}||_2] \right) \cdot \\ \prod _{i=1}^{N-1} \left( 1 - P[F_i \ | \ ||\mathbf {\mathcal {C}}||_2, \cos (\theta _{C^{(i)}E}) \le \cos (\theta _{C^{(0)}E}), ||\mathbf {\mathcal {S}}||_2] \right) \end{array} \right) \right) \end{aligned}$$
(51)
$$\begin{aligned}&\approx \sum _{||\mathbf {\mathcal {S}}||_2} P[||\mathbf {\mathcal {S}}||_2] \cdot \\&\quad \left( 1 - \left( \begin{array}{l} \left( 1 - \sum _{\mathbf {\mathcal {G}}_0} P[\mathbf {\mathcal {G}}_0] \cdot P[F_0 \ | \ ||\mathbf {\mathcal {C}}||_2, \theta _{C^{(0)}E}, ||\mathbf {\mathcal {S}}||_2, \mathbf {\mathcal {G}}_0] \right) \cdot \\ \prod _{i=1}^{N-1} \left( 1 - \sum _{\mathbf {\mathcal {G}}_i} P[\mathbf {\mathcal {G}}_i] \cdot P[F_i \ | \ ||\mathbf {\mathcal {C}}||_2, \cos (\theta _{C^{(i)}E}) \le \cos (\theta _{C^{(0)}E}), ||\mathbf {\mathcal {S}}||_2, \mathbf {\mathcal {G}}_i] \right) \end{array} \right) \right) \nonumber \end{aligned}$$
(52)

\(P[F_i \ | \ ||\mathbf {\mathcal {C}}||_2, \theta _{C^{(i)}E}, ||\mathbf {\mathcal {S}}||_2, \mathbf {\mathcal {G}}_i]\) can be estimated using the geometric assumption and Eq. 17 as:

$$\begin{aligned}&\begin{array}{l} P[ \cos (\psi ) > \frac{q_t- \mathbf {\mathcal {G}}_i - ||\mathbf {\mathcal {S}}||_2 \cdot ||\mathbf {\mathcal {C}}||_2 \cdot \cos (\theta _{SE}) \cdot \cos (\theta _{C^{(i)}E})}{||\mathbf {\mathcal {S}}||_2 \cdot ||\mathbf {\mathcal {C}}||_2 \cdot \sin (\theta _{SE}) \cdot \sin (\theta _{C^{(i)}E})} \ | \ ||\mathbf {\mathcal {S}}||_2, ||\mathbf {\mathcal {C}}||_2, \mathbf {\mathcal {G}}_i, \cos (\theta _{C^{(i)}E}) ] \\ + P[ \cos (\psi ) < \frac{-q_t- \mathbf {\mathcal {G}}_i - ||\mathbf {\mathcal {S}}||_2 \cdot ||\mathbf {\mathcal {C}}||_2 \cdot \cos (\theta _{SE}) \cdot \cos (\theta _{C^{(i)}E})}{||\mathbf {\mathcal {S}}||_2 \cdot ||\mathbf {\mathcal {C}}||_2 \cdot \sin (\theta _{SE}) \cdot \sin (\theta _{C^{(i)}E})} \ | \ ||\mathbf {\mathcal {S}}||_2, ||\mathbf {\mathcal {C}}||_2, \mathbf {\mathcal {G}}_i, \cos (\theta _{C^{(i)}E}) ] \end{array} \end{aligned}$$

with \(\psi \) a uniformly random angle in dimension \(2Nl-1\).

The value \(P[F_i \ | \ ||\mathbf {\mathcal {C}}||_2, \cos (\theta _{C^{(i)}E}) \le \cos (\theta _{C^{(0)}E}), ||\mathbf {\mathcal {S}}||_2, \mathbf {\mathcal {G}}_i]\) can be calculated by taking a weighted average over all \(\theta _{C^{(i)}E}\) values for which \(\cos (\theta _{C^{(i)}E}) \le \cos (\theta _{C^{(0)}E})\) as:

$$\begin{aligned}&P[F_i \ | \ ||\mathbf {\mathcal {C}}||_2, \cos (\theta _{C^{(i)}E}) \le \cos (\theta _{C^{(0)}E}), ||\mathbf {\mathcal {S}}||_2, \mathbf {\mathcal {G}}_i] \end{aligned}$$
(53)
$$\begin{aligned}&= \sum _{\forall \theta _{C^{(i)}E} : \cos (\theta _{C^{(i)}E}) \le \cos (\theta _{C^{(0)}E})} P[\theta _{C^{(i)}E}] \cdot P[F_i \ | \ ||\mathbf {\mathcal {C}}||_2, \theta _{C^{(i)}E}, ||\mathbf {\mathcal {S}}||_2, \mathbf {\mathcal {G}}_i] \end{aligned}$$
(54)

Approximate Distributions. Note that both the failure boosting and directional failure boosting methods require looping over all possible values of \(||\mathbf {\mathcal {C}}||_2, \theta _{C^{(0)}E}, ||\mathbf {\mathcal {S}}||_2\) and \(\mathbf {\mathcal {G}}_i\), which is a costly process. To reduce the calculation time, these distributions are approximated using a subset of points of the distribution. We use 200 points to approximate \(||\mathbf {\mathcal {C}}||_2\) and \(\theta _{C^{(i)}E}\), 100 points to approximate \(||\mathbf {\mathcal {S}}||_2\) and a maximum of 40 points to approximate \(\mathbf {\mathcal {G}}_i\).
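One possible reduction (the exact method is an implementation choice) is to bucket consecutive outcomes into groups of roughly equal probability mass and keep each bucket's probability-weighted mean; a sketch:

```python
import numpy as np

def compress(values, probs, n):
    """Approximate a distribution by at most n representative points."""
    values, probs = np.asarray(values, float), np.asarray(probs, float)
    cuts = np.searchsorted(np.cumsum(probs), np.linspace(0, 1, n + 1)[1:-1])
    out_v, out_p = [], []
    for lo, hi in zip(np.r_[0, cuts], np.r_[cuts, len(values)]):
        mass = probs[lo:hi].sum()
        if hi > lo and mass > 0:
            out_v.append((values[lo:hi] * probs[lo:hi]).sum() / mass)
            out_p.append(mass)
    return np.array(out_v), np.array(out_p)

v, p = compress(np.arange(-50, 51), np.ones(101) / 101, 10)
assert abs(p.sum() - 1.0) < 1e-9 and len(v) <= 10
```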

Comparison to State-of-the-art: From Fig. 2b, we see that the method that does not take \(||\mathbf {\mathcal {S}}||_2\) and \(\mathbf {\mathcal {G}}\) into account does indeed underestimate the failure probability. This effect becomes larger for realistic schemes such as Saber and Kyber, which have a larger variance in the distribution of \(\mathbf {\mathcal {G}}\). Our new methodology, which takes \(||\mathbf {\mathcal {S}}||_2\) and \(\mathbf {\mathcal {G}}\) into account, matches the reference calculation using the Gaussian approximation, which further confirms our method. Note that this figure presents failure boosting (for the first failure), and that the Gaussian approximation cannot be used for directional failure boosting.

6 Dealing with Uneven Distributions

The cost estimation described above cannot directly be used for practical schemes that use rounding, such as Kyber or Saber, or more generally for schemes with uneven distributions for the coefficients of \(\mathbf {\mathcal {S}}\) and \(\mathbf {\mathcal {C}}\). The reasons are twofold: first, when the distributions of \(\mathbf {s}\) and \(\mathbf {e}\) do not have the same variance, the values of \(||\mathbf {e}' + \mathbf {u}'||_2\) and \(||\mathbf {s}'||_2\) have a different impact on the overall failure probability. Therefore, using \(||\mathbf {\mathcal {C}}||_2\) as a predictor of the failure probability, as in the traditional calculation of directional failure boosting [11], does not give accurate results. Secondly, when rounding occurs, the distributions of \(\mathbf {e}\) and \(\mathbf {e}'\) are typically not centered, and thus the zero-mean assumption underlying the geometric approximation is no longer valid.

Note that the Gaussian approximation which is used for the failure boosting (first failure) does not have these problems. Unfortunately it does not seem possible to port the Gaussian assumption to directional failure boosting due to the skew introduced in the distribution of \(\mathbf {\mathcal {S}}^T \mathbf {\mathcal {C}}\) when directional failure boosting is applied.

The problems described above have a significant effect on the accuracy of the failure boosting estimation (blue), as can be seen in Fig. 3. First, one can see that the leftmost point on the curve, which corresponds to the failure probability when no precomputation is performed, misses the actual failure probability by a large margin. As an additional check we plotted the Gaussian estimate (green) for finding the first failure, which further shows the discrepancy between the two estimations. Looking ahead, we also plot the geometric-uneven estimate (orange), which will be developed in this section.

Fig. 3. Comparison of the estimated cost of (directional) failure boosting for Saber. Geometric refers to the method of Sect. 5, while geometric-uneven indicates the improved method of Sect. 6. The red cross indicates the failure probability (when no precomputation is performed). The Gaussian estimate is given for failure boosting as a reference.

Fig. 4. Comparison of the estimated cost of (directional) failure boosting for Katana. Geometric refers to the method of Sect. 5, while geometric-uneven indicates the improved method of Sect. 6. The red cross indicates the failure probability (when no precomputation is performed). The Gaussian estimate is given for failure boosting as a reference.

6.1 Uneven Distributions

When the variance of the coefficients of \(\mathbf {s}\) and \(\mathbf {e}+ \mathbf {u}\) differs, the impacts of \(||\mathbf {e}' + \mathbf {u}'||_2\) and \(||\mathbf {s}'||_2\) differ as well, and they should be considered separately instead of being combined in the term \(||\mathbf {\mathcal {C}}||_2\). For the sake of brevity, we will use the following abbreviations:

$$\begin{aligned} \begin{array}{ll} \mathbf {\mathcal {C}}_0 = \mathbf {e}' + \mathbf {u}' &\quad \mathbf {\mathcal {S}}_0 = -\mathbf {s}\\ \mathbf {\mathcal {C}}_1 = \mathbf {s}' &\quad \mathbf {\mathcal {S}}_1 = \mathbf {e}+ \mathbf {u}\end{array} \end{aligned}$$
(55)

Uneven Failure Boosting: Instead of grouping ciphertexts based on \(||\mathbf {\mathcal {C}}||_2\), ciphertexts are grouped into classes based on \(||\mathbf {\mathcal {C}}_0||_2\) and \(||\mathbf {\mathcal {C}}_1||_2\). The probability of a class P[cl] can easily be calculated as \(P[||\mathbf {\mathcal {C}}_0||_2] \cdot P[||\mathbf {\mathcal {C}}_1||_2]\), where the distributions of the norms can be calculated exhaustively. The failure probability \(P_F[cl]\) is more involved to calculate.
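For concreteness, the exhaustive norm-distribution computation can be sketched in Python. This is a minimal sketch under illustrative assumptions: the coefficient PMF chi and the dimension n = 16 are placeholders, and for a full dimension such as lN one would replace the naive convolution by an FFT-based or square-and-multiply convolution.

```python
def norm_squared_pmf(chi, n):
    """PMF of ||x||_2^2 for a vector x of n i.i.d. coefficients.

    chi: dict mapping coefficient value -> probability.
    Returns a dict mapping squared-norm value -> probability, obtained
    by an n-fold convolution of the squared-coefficient PMF.
    """
    sq = {}
    for v, p in chi.items():
        sq[v * v] = sq.get(v * v, 0.0) + p
    acc = {0: 1.0}  # distribution of an empty sum
    for _ in range(n):
        nxt = {}
        for a, pa in acc.items():
            for b, pb in sq.items():
                nxt[a + b] = nxt.get(a + b, 0.0) + pa * pb
        acc = nxt
    return acc

# Illustrative centered binomial coefficient distribution (eta = 1).
chi = {-1: 0.25, 0: 0.5, 1: 0.25}
pmf_sq = norm_squared_pmf(chi, n=16)
# P[||C_0||_2 = sqrt(v)] = pmf_sq[v]; the class probability P[cl] is
# then the product P[||C_0||_2] * P[||C_1||_2].
```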

Similar to the approach of Subsect. 5.3, we first include the effect of \(\mathbf {\mathcal {S}}\) and \(\mathbf {\mathcal {G}}\), with the difference that we split \(||\mathbf {\mathcal {S}}||_2\) into \(||\mathbf {\mathcal {S}}_0||_2\) and \(||\mathbf {\mathcal {S}}_1||_2\) which leads to:

$$\begin{aligned}&P_F[ cl ] = P_F[||\mathbf {\mathcal {C}}_0||_2, ||\mathbf {\mathcal {C}}_1||_2 ] = \\&\quad \sum _{||\mathbf {\mathcal {S}}_0||_2} \sum _{||\mathbf {\mathcal {S}}_1||_2} \left( \begin{array}{l} P[||\mathbf {\mathcal {S}}_0||_2] \cdot P[||\mathbf {\mathcal {S}}_1||_2] \cdot \\ \left( 1 - \left( 1 - \sum _{\mathbf {\mathcal {G}}_i} P[\mathbf {\mathcal {G}}_i] \cdot P[ F_i \ | \ ||\mathbf {\mathcal {C}}_0||_2, ||\mathbf {\mathcal {C}}_1||_2, ||\mathbf {\mathcal {S}}_0||_2, ||\mathbf {\mathcal {S}}_1||_2, \mathbf {\mathcal {G}}_i ] \right) ^N \right) \end{array} \right) \nonumber \end{aligned}$$
(56)

To find an expression for \(P[ F_i \ | \ ||\mathbf {\mathcal {C}}_0||_2, ||\mathbf {\mathcal {C}}_1||_2, ||\mathbf {\mathcal {S}}_0||_2, ||\mathbf {\mathcal {S}}_1||_2, \mathbf {\mathcal {G}}_i ]\) we go back to the failure term which we rewrite as:

$$\begin{aligned}&\mathbf {\mathcal {S}}^T \mathbf {\mathcal {C}}+ \mathbf {\mathcal {G}}_i \end{aligned}$$
(57)
$$\begin{aligned}&= \mathbf {\mathcal {S}}_0^T \cdot \mathbf {\mathcal {C}}_0 + \mathbf {\mathcal {S}}_1^T \cdot \mathbf {\mathcal {C}}_1 + \mathbf {\mathcal {G}}_i \end{aligned}$$
(58)
$$\begin{aligned}&= ||\mathbf {\mathcal {S}}_0||_2 \cdot ||\mathbf {\mathcal {C}}_0||_2 \cdot \cos (\phi _0) + ||\mathbf {\mathcal {S}}_1||_2 \cdot ||\mathbf {\mathcal {C}}_1||_2 \cdot \cos (\phi _1) + \mathbf {\mathcal {G}}_i \end{aligned}$$
(59)

Under the geometric assumption, the distribution of \(\phi _0\) and \(\phi _1\) can be approximated as angles from the uniform angle distribution in dimension lN. This allows us to calculate the error probability at the \(i^\text {th}\) location for given values of \(cond_1 :=(||\mathbf {\mathcal {S}}_0||_2, ||\mathbf {\mathcal {S}}_1||_2, ||\mathbf {\mathcal {C}}_0||_2, ||\mathbf {\mathcal {C}}_1||_2, \mathbf {\mathcal {G}}_i)\) as:

$$\begin{aligned}&P[ F \ | \ cond_1 ] \end{aligned}$$
(60)
$$\begin{aligned}&= P[ \left| \, ||\mathbf {\mathcal {S}}_0||_2 \cdot ||\mathbf {\mathcal {C}}_0||_2 \cdot \cos (\phi _0) + ||\mathbf {\mathcal {S}}_1||_2 \cdot ||\mathbf {\mathcal {C}}_1||_2 \cdot \cos (\phi _1) + \mathbf {\mathcal {G}}_i\right| > q_t\ | \ cond_1 ] \end{aligned}$$
(61)
$$\begin{aligned}&= \sum _{\phi _0} P[\phi _0] \left( \begin{array}{l} P[ \cos (\phi _1) > \frac{q_t- \mathbf {\mathcal {G}}_i - ||\mathbf {\mathcal {S}}_0||_2 \cdot ||\mathbf {\mathcal {C}}_0||_2 \cdot \cos (\phi _0)}{||\mathbf {\mathcal {S}}_1||_2 \cdot ||\mathbf {\mathcal {C}}_1||_2} \ | \ cond_1 ] + \\ P[ \cos (\phi _1) < \frac{-q_t- \mathbf {\mathcal {G}}_i - ||\mathbf {\mathcal {S}}_0||_2 \cdot ||\mathbf {\mathcal {C}}_0||_2 \cdot \cos (\phi _0)}{||\mathbf {\mathcal {S}}_1||_2 \cdot ||\mathbf {\mathcal {C}}_1||_2} \ | \ cond_1 ] \end{array}\right) \end{aligned}$$
(62)
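The inner probabilities over \(\cos (\phi _1)\) follow from the geometric assumption: for a uniformly random angle \(\theta \) in dimension n, \((\cos (\theta )+1)/2\) follows a Beta\(((n-1)/2, (n-1)/2)\) distribution. Below is a minimal numerical sketch of Eq. (62), assuming SciPy is available; the 256-point discretization of \(\phi _0\) is an illustrative choice, and the argument names mirror \(q_t\), \(\mathbf {\mathcal {G}}_i\), the four norms and \(n = lN\).

```python
import numpy as np
from scipy.stats import beta

def cos_tail(c, n):
    """P[cos(theta) > c] for a uniformly random angle theta in dimension n,
    using (cos(theta) + 1) / 2 ~ Beta((n - 1) / 2, (n - 1) / 2)."""
    c = np.clip(c, -1.0, 1.0)
    return beta.sf((c + 1.0) / 2.0, (n - 1) / 2.0, (n - 1) / 2.0)

def fail_prob_eq62(qt, g, s0, c0, s1, c1, n, points=256):
    """Eq. (62): sum over a discretized phi_0; assumes s1 * c1 > 0."""
    t = np.linspace(-1.0, 1.0, points)        # grid for cos(phi_0)
    w = beta.pdf((t + 1.0) / 2.0, (n - 1) / 2.0, (n - 1) / 2.0)
    w = w / w.sum()                            # discrete weights P[phi_0]
    hi = (qt - g - s0 * c0 * t) / (s1 * c1)    # threshold for +q_t
    lo = (-qt - g - s0 * c0 * t) / (s1 * c1)   # threshold for -q_t
    return float(np.sum(w * (cos_tail(hi, n) + (1.0 - cos_tail(lo, n)))))
```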

Uneven Directional Failure Boosting: Directional failure boosting not only considers \(||\mathbf {\mathcal {C}}_0||_2\) and \(||\mathbf {\mathcal {C}}_1||_2\), but also the angle between the ciphertext and the estimate \(\mathbf {\mathcal {E}}\). Similar to the splitting of \(||\mathbf {\mathcal {C}}||_2\), these angles and the estimate \(\mathbf {\mathcal {E}}\) should also be split. We denote with \(\mathbf {\mathcal {E}}_0\) the estimate of the direction of the secret \(\mathbf {\mathcal {S}}_0\) and with \(\mathbf {\mathcal {E}}_1\) the estimate of the direction of the secret \(\mathbf {\mathcal {S}}_1\). The angles \(\theta _{C^i_0E_0}\) and \(\theta _{C^i_1E_1}\) denote the angle between \(\overline{\mathbf {\mathcal {C}}^{(i)}}_0\) and \(\mathbf {\mathcal {E}}_0\) and between \(\overline{\mathbf {\mathcal {C}}^{(i)}}_1\) and \(\mathbf {\mathcal {E}}_1\) respectively.

Ciphertexts are then grouped into classes based on both the norms and the angles. Ideally, one would take the maximal angle out of the lN available angles, similar to [11]:

$$\begin{aligned} cl:=\left( ||\mathbf {\mathcal {C}}_0||_2, ||\mathbf {\mathcal {C}}_1||_2, \max _i \cos (\theta _{C^i_0E_0}), \max _i \cos (\theta _{C^i_1E_1}) \right) . \end{aligned}$$

However, for computational efficiency we only consider failures \(F_0\) at the zero\(^\text {th}\) coefficient, so that the classes are defined by:

$$\begin{aligned} cl:=\left( ||\mathbf {\mathcal {C}}_0||_2, ||\mathbf {\mathcal {C}}_1||_2, \theta _{C^0_0E_0}, \theta _{C^0_1E_1} \right) . \end{aligned}$$

Under the same approximation, the failure probability equals:

$$\begin{aligned} P_F[cl] \approx P[F_0 | cl] \end{aligned}$$

For the calculation of \(\alpha (f_t)\) and \(\beta (f_t)\), the class probability P[cl] can be simplified using independence between the class properties as: \(P[||\mathbf {\mathcal {C}}_0||_2] \cdot P[||\mathbf {\mathcal {C}}_1||_2] \cdot P[ \theta _{C^0_0E_0} ] \cdot P[ \theta _{C^0_1E_1} ]\). For the failure probability \(P_F[cl]\) we first include the influence of \(||\mathbf {\mathcal {S}}_0||_2\), \(||\mathbf {\mathcal {S}}_1||_2\) and \(\mathbf {\mathcal {G}}_0\) as:

$$\begin{aligned}&P_F[ cl ] \approx P[F_0 \ | \ cl]\\&= \sum _{||\mathbf {\mathcal {S}}_0||_2} \sum _{||\mathbf {\mathcal {S}}_1||_2} \sum _{\mathbf {\mathcal {G}}_0} \left( \begin{array}{l} P[||\mathbf {\mathcal {S}}_0||_2] \cdot P[||\mathbf {\mathcal {S}}_1||_2] \cdot P[\mathbf {\mathcal {G}}_0] \cdot \\ P[ F_0 \ | \ ||\mathbf {\mathcal {C}}_0||_2, ||\mathbf {\mathcal {C}}_1||_2, ||\mathbf {\mathcal {S}}_0||_2, ||\mathbf {\mathcal {S}}_1||_2, \mathbf {\mathcal {G}}_0, \theta _{C^0_0E_0}, \theta _{C^0_1E_1} ] \end{array} \right) ,\nonumber \end{aligned}$$
(63)

and further denoting \(cond_2 :=\left( ||\mathbf {\mathcal {C}}_0||_2, ||\mathbf {\mathcal {C}}_1||_2, ||\mathbf {\mathcal {S}}_0||_2, ||\mathbf {\mathcal {S}}_1||_2, \mathbf {\mathcal {G}}_0, \theta _{C^0_0E_0}, \theta _{C^0_1E_1}\right) \), this becomes:

$$\begin{aligned} = \sum _{||\mathbf {\mathcal {S}}_0||_2} \sum _{||\mathbf {\mathcal {S}}_1||_2} \sum _{\mathbf {\mathcal {G}}_0} P[||\mathbf {\mathcal {S}}_0||_2] \cdot P[||\mathbf {\mathcal {S}}_1||_2] \cdot P[\mathbf {\mathcal {G}}_0] \cdot P[ F_0 \ | \ cond_2 ]. \end{aligned}$$
(64)

To find an expression for the error probability \(P[ F_0 \ | \ cond_2 ]\), we rewrite the failure term as follows:

$$\begin{aligned}&\overline{\mathbf {\mathcal {S}}}^T \overline{\mathbf {\mathcal {C}}^{(0)}} + \mathbf {\mathcal {G}}_0 \end{aligned}$$
(65)
$$\begin{aligned}&= \overline{\mathbf {\mathcal {S}}}_0^T \overline{\mathbf {\mathcal {C}}^{(0)}}_0 + \overline{\mathbf {\mathcal {S}}}_1^T \overline{\mathbf {\mathcal {C}}^{(0)}}_1 + \mathbf {\mathcal {G}}_0 \end{aligned}$$
(66)
$$\begin{aligned}&= \overline{\mathbf {\mathcal {S}}}_{0, \parallel }^T \overline{\mathbf {\mathcal {C}}^{(0)}}_{0, \parallel } + \overline{\mathbf {\mathcal {S}}}_{0, \perp }^T \overline{\mathbf {\mathcal {C}}^{(0)}}_{0, \perp } + \overline{\mathbf {\mathcal {S}}}_{1, \parallel }^T \overline{\mathbf {\mathcal {C}}^{(0)}}_{1, \parallel } + \overline{\mathbf {\mathcal {S}}}_{1, \perp }^T \overline{\mathbf {\mathcal {C}}^{(0)}}_{1, \perp } + \mathbf {\mathcal {G}}_0 \end{aligned}$$
(67)
$$\begin{aligned}&= ||\mathbf {\mathcal {S}}_0||_2 ||\mathbf {\mathcal {C}}_0||_2 \cos (\theta _{S_0E_0}) \cos (\theta _{C^0_0E_0}) + ||\mathbf {\mathcal {S}}_0||_2 ||\mathbf {\mathcal {C}}_0||_2 \sin (\theta _{S_0E_0}) \sin (\theta _{C_0^0E_0}) \cos (\psi _0) \nonumber \\&+ ||\mathbf {\mathcal {S}}_1||_2 ||\mathbf {\mathcal {C}}_1||_2 \cos (\theta _{S_1E_1}) \cos (\theta _{C^0_1E_1}) + ||\mathbf {\mathcal {S}}_1||_2 ||\mathbf {\mathcal {C}}_1||_2 \sin (\theta _{S_1E_1}) \sin (\theta _{C_1^0E_1}) \cos (\psi _1) \nonumber \\&+ \mathbf {\mathcal {G}}_0, \end{aligned}$$
(68)

with \(\theta _{S_0E_0}\) and \(\theta _{S_1E_1}\) the angles between \(\mathbf {\mathcal {S}}_0\) and \(\mathbf {\mathcal {E}}_0\), and \(\mathbf {\mathcal {S}}_1\) and \(\mathbf {\mathcal {E}}_1\) respectively. Following the geometric approximation, \(\psi _0\) and \(\psi _1\) are uniformly random angles in dimension \(Nl-1\). The failure probability can then be calculated as:

$$\begin{aligned}&P[ F_0 \ | \ cond_2 ] = \\&\sum _{\psi _0} P[\psi _0] \left( \begin{array}{l} P[ \cos (\psi _1) > \frac{ q_t- \mathbf {\mathcal {G}}_0 - w }{ ||\mathbf {\mathcal {S}}_1||_2 ||\mathbf {\mathcal {C}}_1||_2 \sin (\theta _{S_1E_1}) \sin (\theta _{C_1^0E_1}) } \ | \ cond_2, \psi _0 ] \\ + P[ \cos (\psi _1) < \frac{ -q_t- \mathbf {\mathcal {G}}_0 - w }{ ||\mathbf {\mathcal {S}}_1||_2 ||\mathbf {\mathcal {C}}_1||_2 \sin (\theta _{S_1E_1}) \sin (\theta _{C_1^0E_1}) } \ | \ cond_2, \psi _0 ] \end{array} \right) , \nonumber \end{aligned}$$
(69)

where:

$$\begin{aligned} w&= \left( \begin{array}{l} ||\mathbf {\mathcal {S}}_1||_2 ||\mathbf {\mathcal {C}}_1||_2 \cos (\theta _{S_1E_1}) \cos (\theta _{C^0_1E_1}) \\ + ||\mathbf {\mathcal {S}}_0||_2 ||\mathbf {\mathcal {C}}_0||_2 \cos (\theta _{S_0E_0}) \cos (\theta _{C^0_0E_0}) \\ + ||\mathbf {\mathcal {S}}_0||_2 ||\mathbf {\mathcal {C}}_0||_2 \sin (\theta _{S_0E_0}) \sin (\theta _{C_0^0E_0}) \cos (\psi _0) \end{array} \right) . \end{aligned}$$
(70)

6.2 Meet-in-the-middle Speedup

While the uneven directional failure boosting method is much more precise for schemes with uneven distributions than the original method of [11], it is computationally very demanding. The prescribed calculation in Subsect. 6.1 sums over the distributions of \(\mathbf {\mathcal {C}}_0\), \(\mathbf {\mathcal {C}}_1\), \(\mathbf {\mathcal {S}}_0\), \(\mathbf {\mathcal {S}}_1\), \(\mathbf {\mathcal {G}}_0\), \(\theta _{C_0^0E_0}\), \(\theta _{C_1^0E_1}\) and \(\psi _0\). Even when these distributions are approximated, the trade-off between computational cost and accuracy remains unsatisfactory. In this section we will introduce a meet-in-the-middle approach to reduce the computational cost of this method.

From Eq. 68, we can see that the failure equation can be written as:

$$\begin{aligned} x_0 \cos (\psi _0) + x_1 \cos (\psi _1) + z + \mathbf {\mathcal {G}}_0 \end{aligned}$$
(71)

where:

$$\begin{aligned} x_0 =&||\mathbf {\mathcal {C}}_0||_2 \cdot ||\mathbf {\mathcal {S}}_0||_2 \cdot \sin (\theta _{C_0^0E_0}) \cdot \sin (\theta _{S_0E_0}) \end{aligned}$$
(72)
$$\begin{aligned} x_1 =&||\mathbf {\mathcal {C}}_1||_2 \cdot ||\mathbf {\mathcal {S}}_1||_2 \cdot \sin (\theta _{C_1^0E_1}) \cdot \sin (\theta _{S_1E_1}) \end{aligned}$$
(73)
$$\begin{aligned} z =&\left( \begin{array}{l} ||\mathbf {\mathcal {C}}_0||_2 \cdot ||\mathbf {\mathcal {S}}_0||_2 \cdot \cos (\theta _{C_0^0E_0}) \cdot \cos (\theta _{S_0E_0}) +\\ ||\mathbf {\mathcal {C}}_1||_2 \cdot ||\mathbf {\mathcal {S}}_1||_2 \cdot \cos (\theta _{C_1^0E_1}) \cdot \cos (\theta _{S_1E_1}) \end{array} \right) \end{aligned}$$
(74)
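Eqs. (72)-(74) translate directly into a small helper; this is a sketch with argument names of our own choosing, where theta_c0/theta_c1 are the angles between the ciphertext halves and the estimates, and theta_s0/theta_s1 the angles between the secret halves and the estimates.

```python
import math

def mitm_coordinates(norm_c0, norm_c1, norm_s0, norm_s1,
                     theta_c0, theta_c1, theta_s0, theta_s1):
    """Map class and secret parameters to the coordinates (x0, x1, z)
    of Eqs. (72)-(74)."""
    x0 = norm_c0 * norm_s0 * math.sin(theta_c0) * math.sin(theta_s0)
    x1 = norm_c1 * norm_s1 * math.sin(theta_c1) * math.sin(theta_s1)
    z = (norm_c0 * norm_s0 * math.cos(theta_c0) * math.cos(theta_s0)
         + norm_c1 * norm_s1 * math.cos(theta_c1) * math.cos(theta_s1))
    return x0, x1, z
```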

The work can then be split into a precomputation, where the failure probability given \(x_0\), \(x_1\) and z is calculated (i.e. \(P_F[x_0, x_1, z]\)), and the directional failure boosting calculation itself, which can now use the precomputed values of \(P_F[x_0, x_1, z]\) to reduce calculations. During precomputation \(P_F[x_0, x_1, z]\) is calculated for a wide range of \(x_0, x_1\) and z values as:

$$\begin{aligned}&P_F[x_0, x_1, z] \approx P[F_0 \ | \ x_0, x_1, z]\end{aligned}$$
(75)
$$\begin{aligned}&= P[ \left| x_0 \cos (\psi _0) + x_1 \cos (\psi _1) + z + \mathbf {\mathcal {G}}_0\right| > q_t\ | \ x_0, x_1, z] \end{aligned}$$
(76)
$$\begin{aligned}&= \sum _{\mathbf {\mathcal {G}}_0} \sum _{\psi _0} P[\mathbf {\mathcal {G}}_0] \cdot P[\psi _0] \cdot P[ \left| x_0 \cos (\psi _0) + x_1 \cos (\psi _1) + z + \mathbf {\mathcal {G}}_0\right| > q_t\ | \ x_0, x_1, z, \mathbf {\mathcal {G}}_0, \psi _0] \end{aligned}$$
(77)
$$\begin{aligned}&= \sum _{\mathbf {\mathcal {G}}_0} \sum _{\psi _0} P[\mathbf {\mathcal {G}}_0] \cdot P[\psi _0] \cdot \left( \begin{array}{l}P[ \cos (\psi _1) > \frac{q_t- z - \mathbf {\mathcal {G}}_0 - x_0 \cos (\psi _0)}{x_1} \ | \ x_0, x_1, z, \mathbf {\mathcal {G}}_0, \psi _0] + \\ P[ \cos (\psi _1) < \frac{-q_t- z - \mathbf {\mathcal {G}}_0 - x_0 \cos (\psi _0)}{x_1} \ | \ x_0, x_1, z, \mathbf {\mathcal {G}}_0, \psi _0] \end{array}\right) \end{aligned}$$
(78)

Using this precomputation, the directional failure boosting calculation of \(P_F[cl]\) can then be simplified as:

$$\begin{aligned}&P_F[cl] \approx P[F_0 | cl] \end{aligned}$$
(79)
$$\begin{aligned}&= \sum _{||\mathbf {\mathcal {S}}_0||_2} \sum _{||\mathbf {\mathcal {S}}_1||_2} P[||\mathbf {\mathcal {S}}_0||_2] \cdot P[||\mathbf {\mathcal {S}}_1||_2] \cdot P[F_0 | cl, ||\mathbf {\mathcal {S}}_0||_2, ||\mathbf {\mathcal {S}}_1||_2] \end{aligned}$$
(80)
$$\begin{aligned}&= \sum _{||\mathbf {\mathcal {S}}_0||_2} \sum _{||\mathbf {\mathcal {S}}_1||_2} \left( \begin{array}{l} P[||\mathbf {\mathcal {S}}_0||_2] \cdot P[||\mathbf {\mathcal {S}}_1||_2] \cdot \\ P\left[ F_0 \left| \begin{array}{l} x_0 = ||\mathbf {\mathcal {C}}_0||_2 \cdot ||\mathbf {\mathcal {S}}_0||_2 \cdot \sin (\theta _{C_0^0E_0}) \cdot \sin (\theta _{S_0E_0}), \\ x_1 = ||\mathbf {\mathcal {C}}_1||_2 \cdot ||\mathbf {\mathcal {S}}_1||_2 \cdot \sin (\theta _{C_1^0E_1}) \cdot \sin (\theta _{S_1E_1}), \\ z = \left( \begin{array}{l} ||\mathbf {\mathcal {C}}_0||_2 \cdot ||\mathbf {\mathcal {S}}_0||_2 \cdot \cos (\theta _{C_0^0E_0}) \cdot \cos (\theta _{S_0E_0}) +\\ ||\mathbf {\mathcal {C}}_1||_2 \cdot ||\mathbf {\mathcal {S}}_1||_2 \cdot \cos (\theta _{C_1^0E_1}) \cdot \cos (\theta _{S_1E_1}) \end{array}\right) \end{array}\right. \right] \end{array} \right) \end{aligned}$$
(81)

with the values of \(P[F_0 \ | \ x_0, x_1, z]\) as calculated in the precomputation.

The precomputation loops over a grid of \((x_0, x_1, z)\) values, and for each gridpoint sums over the distributions of \(\mathbf {\mathcal {G}}_0\) and \(\psi _0\). In total, the precomputation thus only loops over five distributions. The \((x_0, x_1, z)\) grid uses 100 values for each element, and intermediate values of \(P[F_0 \ | \ x_0, x_1, z]\) are linearly interpolated.
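The grid-plus-interpolation structure can be sketched as follows. Note that the inner probability below is only a Gaussian stand-in for Eq. (78), using the fact that the cosine of a uniform angle in dimension n has variance roughly 1/n, and that all numerical parameters are illustrative assumptions rather than actual scheme values.

```python
import numpy as np
from scipy.stats import norm
from scipy.interpolate import RegularGridInterpolator

N_DIM, QT, VAR_G = 512, 100.0, 10.0      # illustrative placeholders

# Gaussian stand-in for the precomputed P[F_0 | x0, x1, z] of Eq. (78):
# the failure term is approximated as N(z, x0^2/n + x1^2/n + Var[G_0]).
x_axis = np.linspace(0.0, 50.0, 100)     # 100 values per element,
z_axis = np.linspace(-50.0, 50.0, 100)   # as described in the text
X0, X1, Z = np.meshgrid(x_axis, x_axis, z_axis, indexing="ij")
sigma = np.sqrt((X0**2 + X1**2) / N_DIM + VAR_G)
table = norm.sf(QT, loc=Z, scale=sigma) + norm.cdf(-QT, loc=Z, scale=sigma)

# Linear interpolation between the precomputed grid points.
pf = RegularGridInterpolator((x_axis, x_axis, z_axis), table)
print(pf([(12.3, 7.7, -3.1)]))           # P_F at an off-grid (x0, x1, z)
```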

The directional failure boosting calculation loops over the distributions of \(\mathbf {\mathcal {C}}_0\), \(\mathbf {\mathcal {C}}_1\), \(\mathbf {\mathcal {S}}_0\), \(\mathbf {\mathcal {S}}_1\), \(\theta _{C_0^0E_0}\) and \(\theta _{C_1^0E_1}\), a total of six distributions. This can be compared to the loop over eight distributions in the direct method that does not use the meet-in-the-middle calculation. As a result, our meet-in-the-middle approach makes it practical to calculate the cost of directional failure boosting for real-world schemes such as Saber and Kyber.

6.3 Removing the Bias

One of the assumptions explicitly used in the geometric estimation of (directional) failure boosting is that the angles \(\psi _0\) and \(\psi _1\) are uniformly random. This corresponds to the idealized scenario where the secret is drawn from a continuous Gaussian distribution, which is well approximated by the binomial distributions typically used in practical designs. In the case of rounding, however, the distribution typically has a non-zero mean, and the resulting bias introduces a 'sense of direction' in \(\mathbf {\mathcal {C}}_0\) and \(\mathbf {\mathcal {S}}_1\).

To remove this ‘sense of direction’ we subtract the mean of the distribution of the coefficients of \(\mathbf {\mathcal {C}}_0\) and \(\mathbf {\mathcal {S}}_1\):

$$\begin{aligned} \mathbf {\mathcal {C}}_0'&= \mathbf {\mathcal {C}}_0 - \mu _{\chi _e + \chi _s} \end{aligned}$$
(82)
$$\begin{aligned} \mathbf {\mathcal {S}}_1'&= \mathbf {\mathcal {S}}_1 - \mu _{\chi _e + \chi _s}. \end{aligned}$$
(83)

This subtraction needs to be compensated for in order to keep the failure equation correct, which can be done as follows:

$$\begin{aligned}&\mathbf {\mathcal {S}}_0^T \mathbf {\mathcal {C}}_0 + \mathbf {\mathcal {S}}_1^T \mathbf {\mathcal {C}}_1 + \mathbf {\mathcal {G}}\end{aligned}$$
(84)
$$\begin{aligned}&= \mathbf {\mathcal {S}}_0^T \mathbf {\mathcal {C}}_0' + \mathbf {\mathcal {S}}_1'^T \mathbf {\mathcal {C}}_1 + (\mathbf {\mathcal {G}}+ \mu _{\chi _e + \chi _s} \cdot \mathbf {\mathcal {S}}_0 + \mu _{\chi _e + \chi _s} \mathbf {\mathcal {C}}_1) \end{aligned}$$
(85)

And thus by selecting:

$$\begin{aligned} \mathbf {\mathcal {G}}' = \mathbf {\mathcal {G}}+ \mu _{\chi _e + \chi _s} \cdot \mathbf {\mathcal {S}}_0 + \mu _{\chi _e + \chi _s} \cdot \mathbf {\mathcal {C}}_1, \end{aligned}$$
(86)

we can use the failure term \(\mathbf {\mathcal {S}}_0^T \mathbf {\mathcal {C}}_0' + \mathbf {\mathcal {S}}_1'{}^T \mathbf {\mathcal {C}}_1 + \mathbf {\mathcal {G}}'\), which has exactly the same failure probability. However, this term gives a slightly lower failure boosting efficiency, as an adversary only considers \(\mathbf {\mathcal {C}}_0\) and \(\mathbf {\mathcal {C}}_1\), and not \(\mathbf {\mathcal {G}}\), to determine the weakness of ciphertexts. To apply this adjustment to the previous techniques, one simply uses \(\mathbf {\mathcal {C}}_0'\), \(\mathbf {\mathcal {S}}_1'\) and \(\mathbf {\mathcal {G}}'\) instead of \(\mathbf {\mathcal {C}}_0\), \(\mathbf {\mathcal {S}}_1\) and \(\mathbf {\mathcal {G}}\).
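A sketch of this adjustment, treating the vectors as flat NumPy arrays and \(\mu _{\chi _e + \chi _s}\) as a scalar (so that \(\mu \cdot \mathbf {\mathcal {S}}_0\) and \(\mu \cdot \mathbf {\mathcal {C}}_1\) amount to \(\mu \) times the coefficient sums):

```python
import numpy as np

def remove_bias(c0, s1, g, s0, c1, mu):
    """Subsect. 6.3: center C_0 and S_1 and fold the correction into G.

    mu is the mean of the coefficient distribution chi_e + chi_s. The
    returned triple satisfies
    s0 @ c0p + s1p @ c1 + gp == s0 @ c0 + s1 @ c1 + g.
    """
    c0p = c0 - mu
    s1p = s1 - mu
    gp = g + mu * np.sum(s0) + mu * np.sum(c1)
    return c0p, s1p, gp
```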

6.4 Discussion

Figure 3 and Fig. 4 give an indication of the accuracy of our newly developed geometric-uneven method. First, one can see that for both Saber and Katana, the attack cost when performing no precomputation (the leftmost point on the curves) approximately corresponds to the failure probability. This is the expected behaviour, but it does not hold for Saber under the geometric calculation of Sect. 5. This is a first indication that the geometric-uneven method is more accurate than the standard geometric method in this case.

Secondly, one can see that the geometric-uneven curve is relatively close to the Gaussian curve in the failure boosting (first failure) case. For Saber the geometric-uneven approximation gives a significantly more accurate result compared to the geometric approximation. Overall, the geometric-uneven estimation gives an overestimation of the attack cost, which is logical in view of the assumptions and approximations made in its derivation (e.g. only considering \(F_0\) and making the distributions symmetric). On the other hand, for Katana the geometric approach is more accurate than the geometric-uneven approach, which makes sense as the scheme has \(\chi _s = \chi _e\) and does not perform rounding.

One can therefore conclude that the geometric approach is best suited for symmetric non-rounding schemes like Katana, while the geometric-uneven approach is considerably better than the geometric approach for practical schemes such as Saber and Kyber.

7 Attack Constraints

In the previous derivations, as in the literature [11, 12], it is assumed that there is an unlimited number of possible ciphertexts. However, for schemes that use the FO transformation, ciphertexts are generated deterministically from a message \(m \in \mathcal {M}\), and as such there are only \(\left| \mathcal {M}\right| \) possible ciphertexts for each public key. When an attacker performs strong failure boosting, this maximum number of ciphertexts \(\left| \mathcal {M}\right| \) might limit the number of weak ciphertexts an adversary can generate, which in turn could limit or even obstruct an attack.

In a failure boosting attack an adversary first searches for weak ciphertexts, which occur with a probability \(\alpha (f_t)\). This means that there are on average \(\left| \mathcal {M}\right| \cdot \alpha (f_t)\) weak ciphertexts that can be found at each target, and thus \(\left| \mathcal {M}\right| \cdot \alpha (f_t)\cdot T_1\) in total for \(T_1\) targets. It is expected that an attacker needs \(\beta (f_t)^{-1}\) of these weak ciphertexts to find one decryption failure and thus an adversary that wants to collect \(T_2\) failures would need \(\beta (f_t)^{-1} \cdot T_2\) weak ciphertexts. In short, there are on average \(\left| \mathcal {M}\right| \cdot \alpha (f_t)\cdot T_1\) weak ciphertexts available, and an adversary would need on average \(\beta (f_t)^{-1} \cdot T_2\) of them to proceed to the next phase of the attack.

From the above we can conclude that if \(\beta (f_t)^{-1}\cdot T_2 > \left| \mathcal {M}\right| \cdot \alpha (f_t)\cdot T_1\), it is probable that the attacker will not find sufficient unique ciphertexts to obtain \(T_2\) decryption failures. Even in the case where \(\beta (f_t)^{-1} \cdot T_2 \approx \left| \mathcal {M}\right| \cdot \alpha (f_t)\cdot T_1\), the attack will become less efficient as the adversary will with high probability generate non-unique weak ciphertexts, which requires him to restart the precomputation. For \(\beta (f_t)^{-1} \cdot T_2 < \left| \mathcal {M}\right| \cdot \alpha (f_t)\cdot T_1\), these effects can be expected to be negligible, as there will be enough weak ciphertexts to avoid duplication. To take this observation into account one can add an additional constraint in the attack calculations using the following restriction on \(f_t\): \(\beta (f_t)^{-1} \cdot \alpha (f_t)^{-1} < \left| \mathcal {M}\right| \cdot T_1 / T_2\).
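This constraint is easy to check in logarithmic form; a minimal sketch (the parameter names are ours):

```python
import math

def enough_weak_ciphertexts(alpha, beta, m_size, t1, t2):
    """Check the constraint beta^-1 * alpha^-1 < |M| * T1 / T2 of Sect. 7.

    alpha: weak-ciphertext probability alpha(f_t); beta: failure
    probability beta(f_t) of a weak ciphertext; m_size: |M|;
    t1: number of targets; t2: number of failures needed.
    Works in log2 to handle quantities such as 2^256 without overflow.
    """
    lhs = -math.log2(alpha) - math.log2(beta)
    rhs = math.log2(m_size) + math.log2(t1) - math.log2(t2)
    return lhs < rhs

# Illustrative numbers: 2^64 targets, one failure wanted.
enough_weak_ciphertexts(2**-40, 2**-80, m_size=2**256, t1=2**64, t2=1)
```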

Another possible obstacle for an attacker is the maximum depth \(D_{max}\) of the quantum computer used for the precomputation. Such a depth limit reduces the success probability of the Grover search if \(\sqrt{\alpha (f_t)^{-1}} \gg D_{max}\). This can be compensated for by splitting the search space into p partitions and performing a Grover search of depth \(D_{max}\) in each partition. Asymptotically, one needs \(\alpha (f_t)^{-1} / D_{max}^2\) partitions to find a weak ciphertext with probability close to 1.

Thus, when \(\sqrt{\alpha (f_t)^{-1}} \le D_{max}\), the maximum depth does not restrict the Grover search and the cost to find a weak ciphertext is \(\sqrt{\alpha (f_t)^{-1}}\), but when \(\sqrt{\alpha (f_t)^{-1}} > D_{max}\), the cost is \(D_{max} \cdot \alpha (f_t)^{-1} / D_{max}^2 = \alpha (f_t)^{-1} / D_{max}\).
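The resulting cost of finding one weak ciphertext can be summarized in a small helper. This is a sketch of the formulas above, with an extra parameter d (the depth of one encapsulation, used in Subsect. 8.1); d = 1 recovers the formula of this section.

```python
import math

def grover_cost(alpha, d_max, d=1.0):
    """Cost (in encapsulations W) of finding one weak ciphertext with
    Grover search under a maximum circuit depth d_max."""
    full = math.sqrt(1.0 / alpha)    # unconstrained Grover cost
    if full <= d_max / d:
        return full                  # the depth limit is inactive
    # Split into ~alpha^-1 / (d_max/d)^2 partitions of depth d_max each.
    return (1.0 / alpha) * d / d_max
```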

8 Results

We calculated the multitarget attack cost using the geometric-uneven approach for all parameter sets of Saber, Kyber and uSaber, with a query limit of \(2^{64}\) per target. In Table 3, we first give the attack cost for \(2^{40}\) and \(2^{64}\) targets following the procedure described up to Sect. 6, where \(\left| \mathcal {M}\right| = \infty \) and \(D_{max} = \infty \).

We then recalculate the results for \(2^{64}\) targets with the following restrictions: in the first instance \(\left| \mathcal {M}\right| = 2^{256}\), which is the case for the current designs of these schemes; in the second instance \(\left| \mathcal {M}\right| \) is taken equal to the equivalent AES strength, i.e. \(2^{128}\) for schemes in NIST category 1, \(2^{192}\) for schemes in NIST category 3 and \(2^{256}\) for schemes in NIST category 5. The maximum depth is in both cases set to \(D_{max} = 2^{96}\), which is the worst-case scenario put forward by NIST [1]. A graphical overview of the attack for all parameter sets of Saber and Kyber is given in the eprint version of this paper, where the full line represents \(D_{max} = \infty \) and the dotted line represents \(D_{max} = 2^{96}\).

An interested reader can generate their own numbers and figures for specific constraints using the Python source code, which is available at https://github.com/KULeuven-COSIC/PQCRYPTO-decryption-failures.

Table 3. Cost of failure boosting attacks on various schemes. Security values are given in \(\log _2\). Empty cells indicate that an attack is not possible in under \(2^{256} \mathcal {W}\).

8.1 Impact on Saber and Kyber

Before discussing the security impact of our attack on the targeted schemes, we want to go into some considerations concerning the attack model. The failure boosting attack cost is expressed in terms of precomputational work \(\mathcal {W}\) and queries \(\mathcal {Q}\): \(1 \mathcal {W}\) refers to the cost of 1 offline encapsulation, where the quantum speedup is assumed to be quadratic, ignoring subexponential costs; \(1 \mathcal {Q}\) describes the cost of 1 decapsulation, which is performed classically.

In a real-life scenario, one needs to take into account the fact that \(1 \mathcal {Q}\) involves performing a decapsulation query online on the target's hardware, which might be a critical constraint in mounting a practical attack. For Saber, in an ideal scenario our attack requires at least \(2^{98}\) queries, and thus decapsulations performed on the attacked hardware, for an attack that costs \(2^{168} \mathcal {W}\). For the attack reported in Table 3, the query cost is \(2^{126}\) queries.

Moreover, in the offline precomputation step one has to take into account the cost of performing the encapsulation (\(1 \mathcal {W}\)). The Grover search is additionally constrained when one encapsulation requires a circuit of depth d, leading to a cost of \(\alpha (f_t)^{-1} \cdot d / D_{max}\ \mathcal {W}\) when \(\sqrt{\alpha (f_t)^{-1}} > D_{max} / d\), where the cost of one encapsulation is still counted as \(1 \mathcal {W}\).
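With the grover_cost sketch from Sect. 7, this regime corresponds to calling it with d > 1 (illustrative numbers, not scheme parameters):

```python
grover_cost(2**-80, d_max=2**96)            # 2^40 W: depth limit inactive
grover_cost(2**-160, d_max=2**96, d=2**30)  # 2^94 W: depth-limited regime
```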

Our analysis shows that the category 3 instance of Saber is theoretically vulnerable to a decryption failure attack. A decryption failure attack on Saber would cost \(2^{145} \mathcal {W}\) and \(2^{126}\mathcal {Q}\) in the specific setting where \(q_{limit} = 2^{64}\) and \(T^{(0)} = 2^{64}\), which can be compared to the claimed \(2^{172}\) coreSVP security. However, practical execution of the attack would not be straightforward due to the constraints outlined above. The other parameter sets of Saber and Kyber are not vulnerable to the decryption failure attack we developed; for Kyber1024 and FireSaber this is due to the constraint on the number of available ciphertexts imposed by \(\left| \mathcal {M}\right| \). The uSaber parameter sets are not vulnerable to the decryption failure attacks we developed, even without additional constraints.

8.2 Increasing the Attack Cost

One option to increase the attack cost is to reduce \(\left| \mathcal {M}\right| \). Such a design change does not incur an efficiency cost, but it is limited by the security of the overall scheme, as too low a value for \(\left| \mathcal {M}\right| \) could impact the security under traditional attacks. The effect of a reduction of \(\left| \mathcal {M}\right| \) to \(2^{128}\) and \(2^{192}\) for schemes of category 1 and 3 respectively is detailed in the last column of Table 3. Note that this change especially restricts the efficiency of finding follow-up failures, as the term \(\left| \mathcal {M}\right| \cdot T_1 / T_2\) is typically much higher for finding the first failure due to the high number of targets \(T_1\). Therefore, a reduction in \(\left| \mathcal {M}\right| \) is also a good precaution against future advances in decryption failure attacks, as will be discussed in Subsect. 8.3.

Looking at the error term \((\mathbf {e}+ \mathbf {u})^T \mathbf {s}' - \mathbf {s}^T (\mathbf {e}' + \mathbf {u}') + (\mathbf {e}''+\mathbf {u}'')\), the compression error \(\mathbf {u}''\) can be a significant factor in decryption failures for schemes with strong compression of \(\mathbf {v}'\) (i.e. large q/t). In this case the attack cost can be increased by increasing t. This comes at a modest cost in ciphertext size, but generally has no impact on the security of the scheme under non-decryption-failure attacks. For Saber, increasing t to 2t would make the attack more expensive than solving the Mod-LWR problem, while increasing the ciphertext size by only 256 bits. The impact of such a change for Saber is given in the last rows of Table 3.

If increasing t is not sufficient, one needs to adapt the distributions of \(\chi _s\) and \(\chi _e\), which would impact both security and design, and would thus require a more in-depth analysis.

8.3 Possible Future Advances

In this subsection we go into detail on possible future advances in failure boosting and its cost estimations.

Failure Boosting The cost calculation of failure boosting takes into account both \(\mathbf {\mathcal {C}}\) and \(\mathbf {\mathcal {G}}\) and makes two assumptions. The first is that errors at different coefficients of the message are independent, which has been shown by D'Anvers et al. [13] to be a valid assumption for schemes without error correction. The second is the Gaussian assumption as discussed in Subsect. 3.2. As a result, the attack cost calculation of failure boosting is nearly optimal within the failure boosting framework.

Directional Failure Boosting The directional failure boosting calculation uses more assumptions and approximations, which make the estimate less accurate. Specifically, the attack relies on two assumptions: the geometric-uneven assumption states that \(\mathbf {\mathcal {S}}_0\), \(\mathbf {\mathcal {C}}_0\), \(\mathbf {\mathcal {S}}_1\) and \(\mathbf {\mathcal {C}}_1\) are multivariate Gaussian distributed with zero mean and equal variance for each coefficient. This is a fairly good approximation for binomial distributions with large variance, but it is less accurate for small-variance binomial distributions or uniform distributions, as is the case in Kyber and Saber. The second assumption is the independence assumption that is also used in the failure boosting calculation and is valid for schemes without error correction.

Furthermore, the directional failure boosting calculation in this work considers a slightly suboptimal attack, as some terms are not taken into account in the weak ciphertext selection criterion: first, the attack does not take \(\mathbf {\mathcal {G}}\) into account in the weak ciphertext selection (but it does for the failure probability calculation). Secondly, it removes the bias of \(\mathbf {\mathcal {C}}_0\) and \(\mathbf {\mathcal {S}}_1\) due to rounding and adds it to the term \(\mathbf {\mathcal {G}}\), as explained in Subsect. 6.3. The above approximations therefore correspond to executable attacks, but the attack is slightly suboptimal, as a better weak ciphertext selection criterion (e.g. taking \(\mathbf {\mathcal {G}}\) into account) would lead to a more efficient attack.

Finally, the directional failure boosting calculation makes two significant approximations: first, in the geometric-uneven directional failure boosting approach, only the error probability of the first bit of the message is considered. This leads to an underestimation of the failure probability and thus an overestimation of the attack cost. Secondly, the distributions of the different variables are approximated using a limited number of points.

The previous assumptions and approximations are necessary to allow efficient calculation of the attack cost. However, they could result in a less optimal attack and a less accurate cost estimation for directional failure boosting. During the development of our cost estimation methods in Sects. 5 and 6, we showed that our calculation methods are still reasonably accurate using three checks:

First, we checked the failure probability when no precomputation is performed, which should correspond to the failure probability of the scheme. As shown in the paper, this is always approximately the case for our cost estimation methods (but not for Saber or Kyber in the geometric case, which led us to argue that this method is not appropriate for those schemes).

Secondly, we checked our geometric and geometric-uneven methods in the failure boosting case against the more accurate Gaussian approximation, where we could see that our newly developed methods give approximately the same result. Note that this comparison is not possible in the directional failure boosting case.

Thirdly, we verified the geometric-uneven method against the geometric method in the case of Katana. As the latter method makes fewer approximations and its assumptions are valid for Katana, this comparison can be used to verify some of the new assumptions made in the geometric-uneven method compared to the geometric method (i.e. removing the bias in Subsect. 6.3 and only considering errors at the first coefficient in Subsect. 6.1).

Table 4. Cost (\(\log _2\)) of obtaining the first and second failure in our multitarget attack, and the cost of obtaining only the first failure if the second failure were free. \(q_{limit} = 2^{64}\) and \(T^{(0)} = 2^{64}\). The dominating factor in the attack cost is shown in bold. When performing a levelled multitarget attack where \(T^{(1)} \ne 1\), the search for the second failure is considered dominant.

Conclusion For schemes where the attack cost is dominated by finding the first failure, the calculated cost will be close to the optimal decryption failure attack cost (unless a radically new attack is discovered that outperforms failure boosting). For schemes with an attack cost dominated by directional failure boosting, the estimation will be less accurate. In a worst-case attack scenario (from the designer's perspective), one could assume the directional failure boosting cost to be reduced even further, leading to an attack that is essentially dominated by finding the first failure. Note that this is a very conservative approach and does not correspond to an existing attack scenario. An overview of the dominant attack costs can be found in Table 4.