1 Introduction

In 2016, the U.S. National Institute of Standards and Technology (NIST) announced a Post-Quantum Cryptography (PQC) standardization process aimed at updating NIST’s public-key cryptographic standards to include post-quantum cryptography, that is, cryptographic algorithms that are thought to be secure against attacks by a quantum computer. One of the remaining code-based candidates in the NIST PQC Standardization process is BIKE, a cryptosystem based on quasi-cyclic moderate density parity check (QC-MDPC) codes.

The BIKE cryptosystem was originally designed for ephemeral use, that is, for settings where a KEM key pair is generated for every key exchange. The requirement that BIKE be used ephemerally serves as a countermeasure to the reaction attack of Guo, Johansson, and Stankovski (GJS) [10], in which an attacker uses knowledge of which messages lead to decoding failures to recover the private key. During the second and third rounds of the NIST PQC process, BIKE proposed parameter sets designed to provide security in the static-key setting [1], that is, a setting where a KEM key pair may be reused for several key exchanges. In fact, all the parameter sets in the third-round specification of BIKE are designed to be secure in the static-key setting, although the specification does not formally claim security in this setting. While security in the ephemeral setting can be provided by a scheme meeting the weaker IND-CPA security notion, security in the static-key setting requires a scheme meeting the stronger IND-CCA2 security notion. Achieving IND-CCA2 security requires that BIKE’s decoder have a sufficiently low decoding failure rate (DFR), both because the security proof of BIKE in the IND-CCA2 setting assumes a low DFR, and because a QC-MDPC cryptosystem with too high a DFR, if used in the static-key setting, would allow an attacker to mount the GJS attack with a high probability of success.

By design, it is not feasible to directly compute an average DFR for BIKE at cryptographically relevant security levels. It is, however, possible to measure DFRs for smaller code sizes and then use extrapolation methods to estimate the DFR for larger parameters [9, 17]. To avoid underestimating the DFR at larger code sizes, one must account for the phenomenon known as the error floor region of DFR curves. It is known that for LDPC and MDPC codes, as the signal-to-noise ratio increases, the logarithm of the DFR first drops significantly faster than linearly and then only linearly [15, 21]. A typical DFR curve thus contains a concave waterfall region followed by a near-linear error floor region. Accurately predicting the DFR at cryptographically relevant code sizes therefore requires accurately predicting the error floor of the DFR curve.

The error floor regions of low density parity check (LDPC) codes have been extensively analyzed in the literature. These are codes defined by parity check matrices \(H \in \mathbb {F}_2^{(n-k) \times n}\) with row Hamming weight O(1), or at most \(O(\log (2n))\). Each parity check matrix has a corresponding bipartite graph, known as a Tanner graph. Much of the analysis of iterative LDPC decoding behavior focuses on properties of Tanner graph representations of the code [4, 14, 15, 16, 24], such as identifying stopping sets and trapping sets.

Recent work [22, 23] has considered several factors affecting the DFR of QC-MDPC codes: the choice of decoder [17, 20], classes of weak keys, and sets of problematic error patterns. It was noted that error vectors within a small Hamming distance of problematic error patterns (error vectors of low weight whose syndromes have low weight) are significant contributors to the error floors of QC-MDPC codes, and it was concluded that such vectors are rare enough not to affect the overall DFR predictions at larger code sizes.

In this work, we examine the error floor behavior of QC-MDPC codes, focusing on a scaled-down version of BIKE. Existing analyses of the DFR for BIKE [9, 17] rely on extrapolations obtained by varying only the block size, and such analyses are accurate only if an upper bound can be established on the DFR at which the transition to error floor behavior occurs (see, e.g., Assumption 3 on page 7 of [17]). Vasseur’s thesis uses experiments with error vectors derived from known classes of codewords and near-codewords to give an upper bound on this transition DFR. We attempt to measure the transition point directly and to see whether it can be modeled from the known contributions to error floor behavior described in Vasseur’s thesis; we cannot measure the transition point directly for cryptographic-size parameters, since there the transition occurs at too low a DFR. We use the Black-Grey-Flip decoder [9], the recommended BIKE decoder as of the time of writing, and filter out any keys belonging to the classes of weak keys defined in [22]. We consider the three sets of near codewords defined in [22] and find that error vectors leading to decoding failures have small (between 2 and 8 bits) support intersections with elements of these sets. We conclude that error vectors whose syndromes have low weight are significant contributors to decoding failures, but that they are not fully captured by the sets of near codewords defined in [22].

2 Background

2.1 Coding Theory and QC-MDPC Codes

Throughout this document, let \(\mathbb {F}_2\) denote the finite field of two elements. For \(r \in \mathbb {N}\) and \(x \in \mathbb {F}_2^r\), let \(|x |\) denote the Hamming weight of x. For two vectors \(x,y \in \mathbb {F}_2^r\), let \(x \star y = (x_0 \cdot y_0, x_1 \cdot y_1, \ldots , x_{r-1} \cdot y_{r-1})\) denote the Schur (componentwise) product. Let \(\mathcal {C}(n,k)\) be a binary linear code with \(n,k \in \mathbb {N}\). Then \(\mathcal {C}:\mathbb {F}_2^k \rightarrow \mathbb {F}_2^n\) maps information words to codewords, and the set of \(2^k\) codewords forms a k-dimensional subspace of \(\mathbb {F}_2^n\). Let \(\mathcal {B}=\{b_0, b_1, \ldots , b_{k-1}\}\), \(b_i \in \mathbb {F}_2^{n}\), be a basis for this subspace. Then the code \(\mathcal {C}\) can be described by a generator matrix

$$\begin{aligned} G= \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{k-1} \end{bmatrix}. \end{aligned}$$

The code can equivalently be described by a parity check matrix \(H \in \mathbb {F}_2^{(n-k) \times n}\), which is a generator matrix for the dual code \(\mathcal {C}^{\perp } = \{x \in \mathbb {F}_2^{n} : \forall c \in \mathcal {C}, x \cdot c = 0\}\). Thus the relationship \(HG^T = 0 \in \mathbb {F}_2^{(n-k) \times k}\) holds. For any vector \(y \in \mathbb {F}_2^{n}\) and parity check matrix H, the matrix-vector product \(Hy^T = s \in \mathbb {F}_2^{n-k}\) is known as the syndrome of y. Any y such that \(Hy^T = 0 \in \mathbb {F}_2^{n-k}\) is a codeword (i.e., \(y \in \mathcal {C}\)).

A \(v \times v\) circulant matrix is a square matrix in which each row \(r_{i+1}\) is the cyclic shift of the previous row \(r_i\) by one position to the right, for \(i \in \{0, 1, \ldots , v-2\}\). The ring of \(v \times v\) circulant matrices over \(\mathbb {F}_2\) is isomorphic to the polynomial ring \(\mathbb {F}_2[x]/\langle x^v+1 \rangle \). A quasi-cyclic (QC) matrix is a block matrix whose blocks are circulant matrices.
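
To make this correspondence concrete, the following minimal sketch (plain Python/NumPy rather than the SageMath used in our experiments; the helper names are ours) checks on a toy size v that multiplying circulant matrices agrees with multiplying their first rows as polynomials modulo \(x^v + 1\).

```python
import numpy as np

def circulant(first_row):
    """Circulant matrix whose row i is the first row cyclically shifted i places right."""
    return np.array([np.roll(first_row, i) for i in range(len(first_row))], dtype=np.uint8)

def poly_mul_mod(a, b):
    """Multiply two F_2 polynomials (given as coefficient vectors) modulo x^v + 1."""
    c = np.zeros(len(a), dtype=np.uint8)
    for i in range(len(a)):
        if a[i]:
            c ^= np.roll(b, i)   # x^i * b(x) mod (x^v + 1) is a cyclic shift of b
    return c

v = 7
a = np.array([1, 0, 1, 0, 0, 0, 0], dtype=np.uint8)   # a(x) = 1 + x^2
b = np.array([0, 1, 1, 0, 0, 0, 0], dtype=np.uint8)   # b(x) = x + x^2
assert np.array_equal((circulant(a) @ circulant(b)) % 2, circulant(poly_mul_mod(a, b)))
```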

2.2 BIKE

Bit-flipping Key Encapsulation (BIKE) is a cryptosystem based on binary linear codes with quasi-cyclic structure and moderately sparse private keys [1]. The private key \(H \in \mathbb {F}_2^{r \times 2r}\) is composed of two circulant blocks: \(H_0, H_1\) of size \(r \times r\) with r prime and such that \(x^r-1\) has only two irreducible factors modulo 2. The columns of H have weight d and the rows \(h_i\) of H are such that \(|h_i |=w=2d\) for all \(i \in \{0, \ldots , r-1\}\). MDPC code parameters satisfy row weight \(w \approx \sqrt{n}\) for n the length of the code.

At a high level, the public-key encryption scheme underlying the BIKE KEM is composed of three algorithms: key generation, encryption, and decryption. Key generation produces a private key \(H=[H_0 | H_1] \in \mathbb {F}_2^{r \times 2r}\) and a public key \(H'\), which is H in systematic form (\(H' = H_0^{-1}H\)). To encrypt a message m, the sender encodes m into a vector e of suitable weight t and computes the syndrome \(s = H'e^T\). The receiver decrypts by decoding the syndrome s using the secret key H and a predefined syndrome decoding algorithm. The recommended BIKE syndrome decoder as of the time of writing is the Black-Grey-Flip decoder [9].
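
The following toy sketch (plain Python/NumPy, not BIKE itself or its reference implementation; the tiny parameters \((r, d, t) = (13, 3, 2)\) and the helper names are illustrative only) traces this flow: key generation, the systematic-form public key, syndrome computation by the sender, and the receiver's recovery of \(He^T = H_0 (H'e^T)\) before decoding.

```python
import numpy as np

def circulant(first_row):
    return np.array([np.roll(first_row, i) for i in range(len(first_row))], dtype=np.uint8)

def random_sparse(length, weight, rng):
    v = np.zeros(length, dtype=np.uint8)
    v[rng.choice(length, size=weight, replace=False)] = 1
    return v

def gf2_inv(M):
    """Invert a square binary matrix by Gauss-Jordan elimination over F_2."""
    n = M.shape[0]
    A = np.concatenate([M % 2, np.eye(n, dtype=np.uint8)], axis=1)
    for col in range(n):
        pivot = col + next(i for i in range(n - col) if A[col + i, col])
        A[[col, pivot]] = A[[pivot, col]]
        for row in range(n):
            if row != col and A[row, col]:
                A[row] ^= A[col]
    return A[:, n:]

rng = np.random.default_rng(0)
r, d, t = 13, 3, 2                        # toy sizes; BIKE uses much larger r, d, t
H0 = circulant(random_sparse(r, d, rng))  # weight-d circulant blocks
H1 = circulant(random_sparse(r, d, rng))
H = np.concatenate([H0, H1], axis=1)      # private key H = [H0 | H1]
H_pub = (gf2_inv(H0) @ H) % 2             # public key H' = H0^{-1} H (systematic form)

e = random_sparse(2 * r, t, rng)          # message encoded as a weight-t error vector
s_pub = (H_pub @ e) % 2                   # sender computes the syndrome H' e^T

s_priv = (H0 @ s_pub) % 2                 # receiver recovers H e^T = H0 (H' e^T)...
assert np.array_equal(s_priv, (H @ e) % 2)
# ...and would now run a syndrome decoder (e.g. BGF) on (H, s_priv) to recover e.
```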

Let \(\lambda \) denote the security parameter and let H denote a BIKE secret key. The security of BIKE rests on the infeasibility of solving (variants of) the syndrome decoding problem. The best known attacks are information set decoding (ISD) algorithms, first introduced by Prange in 1962 [13] and improved in dozens of subsequent works, each yielding only a small change in the overall asymptotic cost (see [5, 12, 19] for a non-exhaustive list). For BIKE to achieve \(\lambda \) bits of security against the best known ISD attacks [7], the BIKE team determined that

$$\begin{aligned} \lambda \approx t - \frac{1}{2}\log _2 r \approx w - \log _2 r \end{aligned}$$

where r denotes the circulant block size of H, w denotes the row weight of H, and t denotes the weight of the error vector in which a message is encoded [1].
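
As a quick sanity check of this rule of thumb, the BIKE Level 1 parameters \((r, w, t) = (12323, 142, 134)\) from the specification give values close to the targeted \(\lambda = 128\):

```python
from math import log2

r, w, t = 12323, 142, 134      # BIKE Level 1 parameters (lambda = 128)
print(t - 0.5 * log2(r))       # ~127.2
print(w - log2(r))             # ~128.4, both close to the 128-bit target
```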

2.3 Weak Keys and Near Codewords

For security level \(\lambda \), the average decoding failure rate \(\text {DFR}_{\mathcal {D},\mathcal {H}}\) for an IND-CCA secure cryptosystem should be \(\le 2^{-\lambda }\) where \(\mathcal {D}\) denotes the decoder and \(\mathcal {H}\) the key space. A set \(\mathcal {W}\subset \mathcal {H}\) of keys is said to be weak if:

$$\begin{aligned} \frac{|\mathcal {W}|}{|\mathcal {H}|}\text {DFR}_{\mathcal {D},\mathcal {W}}> 2^{-\lambda } \ge \text {DFR}_{\mathcal {D},\mathcal {H}}. \end{aligned}$$

In [22, Chapter 15], Vasseur identifies three types of weak keys for the BIKE cryptosystem (a simple check in the spirit of the first type is sketched after this list):

  • Type I: keys with many consecutive nonzero bits in the rows of one of the cyclic blocks, first identified by [8].

  • Type II: keys with nonzero bits at many regular intervals in the rows of one of the cyclic blocks.

  • Type III: keys with many intersections between the columns of the two cyclic blocks.
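
As a rough illustration (this is a simplification of ours, not Vasseur’s Algorithm 15.3, which also covers Types II and III and uses his exact counting rules), a Type I-style check can scan the first row of each circulant block for a long cyclic run of nonzero bits:

```python
def longest_cyclic_run(h):
    """Length of the longest run of 1s in h, treating the vector as cyclic."""
    r = len(h)
    if all(h):
        return r
    best = run = 0
    for bit in list(h) + list(h):          # doubling the vector handles runs that wrap around
        run = run + 1 if bit else 0
        best = max(best, run)
    return min(best, r)

def looks_type_i_weak(h0, h1, T):
    """Flag a key as Type I-suspicious if either block has a cyclic run of at least T ones."""
    return max(longest_cyclic_run(h0), longest_cyclic_run(h1)) >= T

# Toy example: the first block has a run of four consecutive ones.
print(looks_type_i_weak([0,1,1,1,1,0,0,0,1,0,0,0,0], [1,0,0,1,0,0,0,0,0,1,0,0,0], T=4))  # True
```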

It is known that some sets of vectors are more likely than average to cause decoding failures. A (u, v)-\(\textit{near codeword}\) for a parity-check matrix H is an error vector e of Hamming weight u whose syndrome \(s=He^T\) has weight v [11]. When u and v are small, these near codewords can be likely to cause decoding failures [15]. Based on the structure of BIKE, Vasseur defines three such sets with small u and v as follows (a small construction sketch is given after the list):

  • \(\mathcal {C}\): vectors which form the rows of the generator matrix \(G=[H_1^T|H_0^T]\); these are codewords of weight w for the secret key \(H=[H_0|H_1]\).

  • \(\mathcal {N}\): the set of (d, d)-near codewords of the form \((v_0,{\textbf {0}})\) or \(({\textbf {0}},v_1)\), where \({\textbf {0}}\in \mathbb {F}_2^r\) and \(v_i\) is a row of the circulant block \(H_i\) of the parity check matrix.

  • \(2\mathcal {N}\): the set of vectors formed by sums of two vectors in \(\mathcal {N}\). Due to the small chance of cancellation, one may consider the set \(2\mathcal {N}\) as consisting of \((w-\epsilon _0, w-\epsilon _1)\)-near codewords for some small \(\epsilon _i \ge 0\), \(i\in \{0,1\}\).
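
The following sketch shows one way to materialize these sets from the first rows \(h_0, h_1\) of the secret key blocks (a dense Python/NumPy representation of our own, not Vasseur’s code; since \(2\mathcal {N}\) grows quadratically in r, a dense enumeration like this is only practical at small scale):

```python
import numpy as np
from itertools import combinations

def circulant(h):
    return np.array([np.roll(h, i) for i in range(len(h))], dtype=np.uint8)

def special_sets(h0, h1):
    r = len(h0)
    H0, H1 = circulant(h0), circulant(h1)
    zero = np.zeros(r, dtype=np.uint8)
    # C: the r rows of G = [H1^T | H0^T], i.e. weight-w codewords of the secret code.
    C = [np.concatenate([H1.T[i], H0.T[i]]) for i in range(r)]
    # N: vectors (v0, 0) and (0, v1) built from the rows of H0 and H1 respectively.
    N = [np.concatenate([H0[i], zero]) for i in range(r)] \
      + [np.concatenate([zero, H1[i]]) for i in range(r)]
    # 2N: sums of two distinct elements of N; |2N| grows quadratically in r.
    twoN = [u ^ v for u, v in combinations(N, 2)]
    return C, N, twoN

# Toy example only (real BIKE block sizes make 2N far too large to store densely).
h0 = np.array([1, 1, 0, 1, 0, 0, 0], dtype=np.uint8)
h1 = np.array([1, 0, 1, 0, 0, 1, 0], dtype=np.uint8)
C, N, twoN = special_sets(h0, h1)
print(len(C), len(N), len(twoN))   # 7 14 91
```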

3 Methods

Cryptographically relevant DFRs are too low (\(< 2^{-128}\)) to measure directly; it is only possible to measure DFRs for smaller code sizes and then use extrapolation methods to estimate the DFR for larger parameters. Some examples of this approach can be found in [8, 9, 18]. In this ongoing work, we begin by analyzing the decoding behavior of BIKE parameter sets targeting 20 bits of security in several experiments.

Parameters were selected according to the BIKE design principles, with the maximum error weight t reduced to avoid any inadvertent increase in decoding failures. The initial parameters are \((r,w,t,\lambda ) = (523,30,18,20)\). Later we include all primes r with \(389 \le r \le 827\) such that \(x^r - 1\) has only two irreducible factors modulo 2.

We use the Black-Grey-Flip (BGF) decoder in all experiments. We use the original threshold selection function, defined in Section 2.5.1 of the BIKE v1.0 specification [2], to compute the bit-flip threshold for all instances; the affine threshold functions in the current version of BIKE are derived from this original threshold rule. We precompute the values used in the threshold function and store them in a hash table for efficiency.
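
For orientation, the following is a schematic single bit-flipping iteration of the kind that BGF builds on with its additional black and grey passes (plain Python/NumPy; the majority-style threshold table at the end is a placeholder of ours, not the BIKE v1.0 threshold function that we actually precompute):

```python
import numpy as np

def bitflip_iteration(H, s, e_hat, threshold):
    """One parallel bit-flipping pass: flip every position whose unsatisfied-check count meets the threshold."""
    counters = (H * s[:, None]).sum(axis=0)              # counters[j] = # unsatisfied checks touching position j
    flips = (counters >= threshold[int(s.sum())]).astype(np.uint8)
    e_hat = e_hat ^ flips
    s = (s + H @ flips) % 2                               # syndrome of the residual error
    return e_hat, s

def bitflip_decode(H, s, threshold, max_iters=10):
    """Iterate until the residual syndrome is zero (success) or give up (decoding failure)."""
    e_hat = np.zeros(H.shape[1], dtype=np.uint8)
    for _ in range(max_iters):
        e_hat, s = bitflip_iteration(H, s, e_hat, threshold)
        if not s.any():
            return e_hat
    return None

# Placeholder threshold table (Gallager-style majority rule), indexed by syndrome weight.
# Our experiments instead precompute the BIKE v1.0 rule [2, Sect. 2.5.1] into such a table.
d, r = 15, 587
threshold = {sw: (d + 1) // 2 for sw in range(r + 1)}
```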

Vasseur identifies three classes of weak keys that impede decoding (see Sect. 2.3 for the definitions of these classes) and describes an algorithm for filtering out weak keys [22, Algorithm 15.3]. We implement this algorithm and use it to reject weak keys. The definition of weak key depends on a parameter T, which Vasseur sets to 10 for BIKE parameters in the cryptographically relevant range (\(\lambda \ge 128\)). (Note that smaller values of T mean that more keys are excluded.)

We instead use \(T = 3\) for the weak key threshold, the smallest value of T for which finding non-weak keys is feasible. This is justified by the following empirical observation: If we set \(T = 4\), the decoding failure rate increases enormously; for example, an experiment with \((r, T) = (587, 4)\) observed a DFR on the order of \(2^{-8}\), compared to around \(2^{-20}\) for \((r, T) = (587, 3)\). Thus, to measure the DFR for non-weak keys, we must set \(T = 3\).

We use the Boston University Shared Computing Cluster [6], a heterogeneous Linux-based computing cluster with approximately 21000 cores, to run SageMath implementations of the BGF decoder [1, 9] in all experiments. The experiments yielded a graph with both the waterfall and error floor regions for our parameter set in addition to many explicit examples of decoding failures that can be used for future analysis. All raw data and the decoder used for this paper are available at [3].

4 Average DFR over Full Message Space

We first compute an average DFR for all suitable block lengths r as follows. For each r in Table 1, we sample a random key H, rejecting any weak keys of Types I, II, and III [22], sample a random vector e of weight t, compute \(s=He^T\), run the BGF decoder on input (H, s), and record the total number of failures. This procedure is run N times, where N varies (\(N \in \{10^3, 10^4, 10^5, 10^6, 10^7, 10^8\}\)) so as to observe enough decoding failures at each r for robust statistical analysis. In the waterfall region, fewer decoding trials were needed to obtain a statistically adequate number of decoding failures; as r increased, the number of trials needed increased. For \(r>587\), decoding failures were exceptionally sparse. Since these computations become quite expensive and the log-DFR was decreasing only linearly for \(r>587\), we chose not to continue increasing the number of trials. The error vectors tested in the DFR experiment all had weight 18. The results of this experiment are displayed in Table 1 and plotted with best fit curves in Fig. 1.
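
In condensed form, the per-r estimation loop looks as follows (a sketch only; sample_non_weak_key, sample_error, and bgf_decode stand in for our key sampler with weak-key rejection, the uniform weight-t error sampler, and the BGF decoder):

```python
def estimate_dfr(r, t, N, sample_non_weak_key, sample_error, bgf_decode):
    failures = 0
    for _ in range(N):
        H = sample_non_weak_key(r)        # rejects Type I/II/III weak keys (threshold T = 3)
        e = sample_error(2 * r, t)        # uniform weight-t error vector
        s = (H @ e) % 2
        e_prime = bgf_decode(H, s)
        if e_prime is None or (e_prime != e).any():
            failures += 1
    return failures / N
```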

Table 1. Decoding failure rates for r-values such that \(389 \le r \le 827\), r is prime, and \(x^r - 1\) has only two irreducible factors modulo 2. The data was computed using the parameters and methods described above.
Fig. 1. Decoding failure rates as in Table 1 on a semi-log graph, with a quadratic best fit (blue) in the waterfall region \(r < 587\) and a linear best fit (red) in the error floor region \(r \ge 587\). (Color figure online)

We define a decoding failure as any instance where, on input (H, s) with s of the form \(s=He^T\), the syndrome decoder outputs \(e'\) such that \(He'^T \ne s\) or \(e' \ne e\). The experiment was also designed to record any decoding instances in which \(He'^T=s\) but \(e'\ne e\); none were observed.
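
Stated as a predicate (a small sketch of ours; decode stands in for whatever decoder is under test), the possible outcomes are:

```python
def classify_decode(H, e, decode):
    """Return 'success', 'failure', or 'syndrome-match-wrong-e' (the last case was never observed)."""
    s = (H @ e) % 2
    e_prime = decode(H, s)
    if e_prime is None or ((H @ e_prime) % 2 != s).any():
        return "failure"
    return "success" if (e_prime == e).all() else "syndrome-match-wrong-e"
```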

5 DFR on \(\mathcal {A}_{t,\ell }(\mathcal {S})\) Sets

Vasseur identified and studied the influence on the DFR of the proximity of error vectors to the sets \(\mathcal S \in \{\mathcal {C},\mathcal {N},2\mathcal {N}\}\) described in Sect. 2.3 [22]. To quantify how close an error vector is to such a set \(\mathcal S\), Vasseur introduces the set

$$ \mathcal A_{t,\ell }(\mathcal S) = \{v \in \mathbb F_2^{2r} : |v| = t \text { and } |v \star c |= \ell \text { for some } c \in \mathcal S\}, $$

where t is the error vector weight and \(\ell \) is the number of overlaps with an element of \(\mathcal {S}\). To convert \(\ell \) into a distance, for \(v \in \mathcal A_{t,\ell }(\mathcal S)\) we define

$$ \delta (v) = |c |+t-2\ell $$

where c is a vector in \(\mathcal S\) with \(|v \star c |= \ell \). When \(\delta \) is low (equivalently, when \(\ell \) is high), decoding failures are extremely common; see Fig. 2 for evidence at the 20-bit security level.
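
Both quantities are straightforward to compute by brute force for a single vector v and a set \(\mathcal S\) stored as a list of dense 0/1 vectors (a sketch in plain Python/NumPy; the helper names are ours):

```python
def max_overlap(v, S):
    """Largest ell with v in A_{t,ell}(S): the maximum Schur-product weight |v * c| over c in S."""
    return max(int((v & c).sum()) for c in S)

def delta_dist(v, S):
    """delta(v) = |c| + t - 2*ell, computed against an element c of S realising the maximum overlap."""
    c_best = max(S, key=lambda c: int((v & c).sum()))
    ell = int((v & c_best).sum())
    return int(c_best.sum()) + int(v.sum()) - 2 * ell
```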

Fig. 2. 20-bit security DFR versus \(\delta \) for near-codeword sets \(\mathcal {C},\mathcal {N}, 2\mathcal {N}\) for \(r = 523\).

It is natural to consider the extent to which \(\mathcal A_{t,\ell }(\mathcal S)\), for some \(\ell \) and some \(\mathcal S \in \{\mathcal C, \mathcal N, 2\mathcal N\}\), captures the vectors which cause decoding failures. Our simulations indicate that it is extremely unlikely for a typical decoding failure vector to lie in \(\mathcal A_{t,\ell }(\mathcal S)\) with high \(\ell \) for any \(\mathcal S\). We define the max overlap of a decoding failure vector v with the sets \(\mathcal A_{t,\ell }(\mathcal S)\), for fixed \(\mathcal {S}\), to be the largest value of \(\ell \) for which \(v \in \mathcal A_{t,\ell }(\mathcal S)\). Using experimental data from \(r=587\), \(N=10^8\), we recorded 128 decoding failures in total and stored the 128 error vectors that led to them. The relationship between these decoding failure vectors and the sets \(\mathcal S\) is shown in Fig. 3a. We repeated the experiments for \(r = 613\) and \(r = 619\) with \(N = 10^8\), recording 61 and 60 decoding failures, respectively; see Figs. 4 and 5 for this data.

Although the maximum possible value of \(\ell \) is \(t = 18\), the recorded values of \(\ell \) never exceed 10, and cases of \(\ell =10\) are quite rare. The values of \(\ell \) recorded for vectors involved in decoding failures are greater than those of randomly sampled vectors, but near-codewords and codewords of low weight are expected to overwhelmingly influence decoding failures in the error floor region [11]. From our results, it appears that only a minority of the error vectors producing a decoding failure are unusually close to a near-codeword or codeword of low weight. More analysis is needed to assess the relationship between the special sets \(\mathcal {S}\) and decoding failures.

Notice that vectors close to a set \(\mathcal S\) also have low syndrome weight (see Fig. 6); moreover, as \(\ell \) decreases, the syndrome weights approach the average. This motivates us to analyze to what extent syndrome weight predicts decoding failures.

6 Distribution of Syndrome Weight

We investigate the syndrome weights of error vectors causing decoding failures and compare them with those of generic vectors.

Fig. 3. For the 128 vectors v with \(r = 587\), \(d = 15\), \(t = 18\) which caused decoding failures, we compute the distances from the sets \(\mathcal {C}, \mathcal {N}, 2\mathcal {N}\), measured by the maximum number of intersections with an element of these sets; here \(\ell := |v \star c |\) for \(c\in \mathcal {C}, \mathcal {N}, 2\mathcal {N}\). We perform the same computation for 128 randomly generated vectors under the same parameters.

Fig. 4. For the 61 vectors v with \(r = 613\), \(d = 15\), \(t = 18\) which caused decoding failures, we compute the distances from the sets \(\mathcal {C}, \mathcal {N}, 2\mathcal {N}\), measured by the maximum number of intersections with an element of these sets; here \(\ell := |v \star c |\) for \(c\in \mathcal {C}, \mathcal {N}, 2\mathcal {N}\). We perform the same computation for 61 randomly generated vectors under the same parameters.

Fig. 5. For the 60 vectors v with \(r = 619\), \(d = 15\), \(t = 18\) which caused decoding failures, we compute the distances from the sets \(\mathcal {C}, \mathcal {N}, 2\mathcal {N}\), measured by the maximum number of intersections with an element of these sets; here \(\ell := |v \star c |\) for \(c\in \mathcal {C}, \mathcal {N}, 2\mathcal {N}\). We perform the same computation for 60 randomly generated vectors under the same parameters.

Fig. 6. Syndrome weight of error vectors in \(\mathcal A_{t,\ell }(\mathcal {S})\) as \(\ell \) (the maximum number of overlaps with an element of the set \(\mathcal {S}\)) varies, for \(r = 587\), \(t = 18\). The average syndrome weight for an error vector of weight \(t = 18\) was approximately 180.712, plotted as the dotted horizontal line.

Fig. 7. Syndrome weights of random vectors with \(t = 18\) (red circles) and vectors causing decoding failures (blue diamonds). (Color figure online)

Fig. 8. A comparison of syndrome weights for \(r = 587\) between the 128 error vectors which were found to be involved in decoding failures and \(10^5\) random vectors. The vertical axis is frequency, and the horizontal axis is syndrome weight.

Table 2. Hypothesis test results for \(509 \le r \le 827\), with the corresponding test statistic values and p-values, indicating the vectors causing decoding failures do have lower syndrome weights than generic vectors for \(509 \le r \le 701\), notably a selection of r-values where the waterfall region meets the error floor in the DFR graph of Fig. 1.

Figures 7 and 8 are obtained by generating \(10^3\) instances of non-weak parity check matrices H and random error vectors e and computing the average weight of their syndromes \(s=He^T\). For the vectors causing decoding failures, we extract the corresponding parity check matrices and error vectors from our DFR computations and compute the average weight of their syndromes.

We observe that the syndrome weights of generic vectors tend to follow a normal distribution, while the error vectors causing decoding failures have syndrome weights that are more concentrated around their mean, which we hypothesize to be lower than that of the generic vectors; see Fig. 8 for the case \(r=587\), where we compare the syndrome weights of the 128 vectors which caused decoding failures with those of \(10^5\) randomly generated vectors of the same weight \(t = 18\).

Figure 8 displays histograms of the syndrome weights of generic vectors and of error vectors causing decoding failures for \(r = 587\). Similarly, for the ten r-values with \(509\le r\le 653\), we use data from the previous DFR computation together with an additional \(10^3\) simulations of random error vectors to compare their syndrome weights. Using this data, we test whether there is convincing evidence that the syndrome weights of error vectors causing decoding failures are lower than those of generic vectors. The null hypothesis is that there is no difference between the two groups, while the alternative hypothesis is that the generic vectors have higher syndrome weights. Both samples come from random, independent sampling and contain more than 30 observations, so the difference in sample means may be modeled using a t-distribution. For each r, we compute the point estimate \(m_{\text {generic}} - m_{\text {DF}}\) of the population difference \(\mu = \mu _{\text {generic}} - \mu _{\text {DF}}\) and the standard error of the point estimate

$$SE = \sqrt{\frac{\sigma _{\text {generic}}^2}{N_{\text {generic}} } +\frac{\sigma _{\text {DF}}^2}{N_{\text {DF}} } }.$$

With this information, we compute the test statistic for this one-tailed test as \(T = \frac{(m_{\text {generic}} - m_{\text {DF}}) - 0}{SE}\). Using either a t-table or statistical software, we find the appropriate degrees of freedom and, from there, the p-value for each r. For all sixteen r-values in the range \(509 \le r \le 827\), the p-value is less than the significance level \(\alpha = 0.01\), so we reject the null hypothesis and conclude that the syndrome weights of error vectors causing decoding failures are lower than those of generic vectors. A summary of the test statistics and the corresponding p-values can be found in Table 2.
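
For reference, the same one-tailed comparison can be reproduced per r with a Welch t-test, e.g. via SciPy (a sketch; the arrays syn_wt_generic and syn_wt_df would hold the measured syndrome weights, and the numbers in the example below are synthetic, not our data):

```python
import numpy as np
from scipy import stats

def compare_syndrome_weights(syn_wt_generic, syn_wt_df, alpha=0.01):
    # H0: equal means; H1: generic vectors have the higher mean syndrome weight.
    res = stats.ttest_ind(syn_wt_generic, syn_wt_df, equal_var=False, alternative="greater")
    return res.statistic, res.pvalue, res.pvalue < alpha

# Example with synthetic data only (illustrative numbers, not our measurements).
rng = np.random.default_rng(1)
generic = rng.normal(180.7, 9.0, size=1000)
df_vecs = rng.normal(170.0, 6.0, size=128)
print(compare_syndrome_weights(generic, df_vecs))
```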

7 Conclusion

In order to claim IND-CCA2 security with confidence for the proposed parameter sets of the BIKE cryptosystem, it is necessary to demonstrate that the BIKE decoder fails with cryptographically low probability on honestly generated ciphertexts. Such a low decoding failure rate cannot be directly measured, but is instead estimated by extrapolation from parameters with directly measurable decoding failure rates. In order for this analysis to be accurate, one must account for error floor behavior.

In our analysis of the BIKE cryptosystem at the 20-bit security level, we find that vectors which cause decoding failures have lower than average syndrome weight. However, identifying where these low-syndrome-weight vectors come from remains an open question. In [22, 23], Vasseur proposes three classes of low-syndrome-weight vectors, \(\mathcal {C}\), \(\mathcal {N}\), and \(2\mathcal {N}\), and describes sets \(\mathcal {A}_{t,\ell }(\mathcal S)\) of vectors which are close to the sets \(\mathcal S\in \{\mathcal {C}, \mathcal {N}, 2\mathcal {N}\}\). While we find that Vasseur’s sets do contain many vectors that cause decoding failures, we do not find that these classes of vectors are responsible for the bulk of the decoding failures.

It therefore remains for future work to identify further classes of error vectors that might account for the decoding failures observed in our experiments. If such classes can be identified, it may become possible to predict error floor behavior for larger parameters and thereby identify parameter sets with a sufficiently low decoding failure rate to be used for IND-CCA2 security in the BIKE cryptosystem.