1 Introduction

The Learning Parity with Noise problem (LPN) is a well-known problem studied in cryptography, coding theory and machine learning. In the LPN problem, one has access to queries of the form (v, c), where v is a random vector and c is the inner product of v with a secret vector s, to which some noise is added. Given these queries, one has to recover the value of s. In short, the problem asks to recover a secret vector s given access to noisy inner products of it with random vectors.

It is believed that LPN is resistant to quantum computers, so it is a good alternative to the number-theoretic problems (e.g. factorization and discrete logarithm) which can be solved easily with quantum algorithms. Also, due to its simplicity, it is a nice candidate for lightweight devices. Among the applications where LPN or LPN variants are deployed, we first have the HB family of authentication protocols: HB [27], HB+ [28], HB++ [11], HB# [21] and AUTH [31]. An LPN-based authentication scheme secure against man-in-the-middle attacks was presented at Crypto 2013 [35]. There are also several encryption schemes based on LPN: Alekhnovich [3] presents two public-key schemes that encrypt one bit at a time. Later, Gilbert, Robshaw and Seurin [21] introduce LPN-C, a public-key encryption scheme proved to be IND-CPA secure. Two schemes that improve upon Alekhnovich's scheme are introduced in [16] and [15]. In PKC 2014, Kiltz et al. [30] propose an alternative scheme to [16]. Duc and Vaudenay [18] introduce HELEN, an LPN-based public-key scheme for which they propose concrete parameters for different security levels. A PRNG based on LPN is presented in [8] and [4].

The LPN problem can also be seen as a particular case of the LWE [38] problem where we work in \(\mathbb {Z}_{2}\). While in the case of LWE the reduction from hard lattice problems attests to its hardness [10, 37, 38], there are no such results in the case of LPN. The problem is believed to be hard and is closely related to the long-standing open problem of efficiently decoding random linear codes.

In the current literature, there are few references when it comes to the analysis of LPN. The most well-known algorithm is BKW [9]. When introducing the HB+ protocol [28], which relies on the hardness of LPN, the authors propose parameters for different levels of security according to the BKW performance. These parameters were later shown to be weaker than thought [20, 33]. Fossorier et al. [20] provide a new variant that brings an improvement over the BKW algorithm. Levieil and Fouque [33] also give a formal description of the BKW algorithm and introduce two improvements over it. For their algorithm based on the fast Walsh-Hadamard transform, they provide the level of security achieved by different instances of LPN. This analysis is referenced by most of the papers that make use of the LPN problem. While they offer a theoretical analysis and propose secure parameters for different levels of security, the authors do not discuss how their theoretical bounds compare to practical results. As we will see, there is a gap between theory and practice. In the domain of machine learning, [22, 40] also cryptanalyse the LPN problem. The best algorithm for solving LPN was presented at Asiacrypt 2014 [24]. This new variant of BKW introduces covering codes as a novelty.

While these algorithms solve the general case where the secret is random, the literature contains no analysis or implementation of an algorithm specially conceived for the sparse-secret case, i.e. where the secret has a small Hamming weight.

The BKW algorithm can also be adapted to solve the LWE problem in exponential time. Implementation results and improvements of it were presented in [1, 2, 17]. In terms of variants of LPN, we have Ring-LPN [25] and Subspace LPN [31]. As an application of Ring-LPN we have the Lapin authentication protocol [25] and its cryptanalysis in [6, 23].

Motivation & contribution

Our paper addresses exactly the aforementioned open problems, i.e. the gap between theory and practice and the analysis of an LPN solving algorithm that proves to be better than BKW and its variants in the case of a sparse secret. First, we present the existing LPN solving algorithms in a unified framework. For these algorithms, we provide experimental results and give a better theoretical analysis that brings an improvement over the work of Levieil and Fouque [33]. Furthermore, we implement and analyse three new algorithms for the case where the secret is sparse. Our results show that for a sparse secret the BKW family of algorithms is outperformed by an algorithm that uses Gaussian elimination. Our motivation is to provide a theoretical analysis that matches the experimental results. Although this does not prove that LPN is hard, it gives tighter bounds for the parameters used by the aforementioned cryptographic schemes. It can also be used to obtain a tighter complexity analysis of algorithms related to LPN solving. Our results were actually used in [24] and also for LWE solving in [17].

Organization

In Section 2 we introduce the definition of LPN and present the main LPN solving algorithms. We also present the main ideas of how the analysis was conducted in [33]. We introduce novel theoretical analyses and show what improvements we bring in Section 3. Besides analysing the existing algorithms, we propose three new algorithms and analyse their performance in Section 4. In Section 5, we provide the experimental results for the algorithms described in Sections 3 & 4. We compare the theory with the practical results and show the tightness of our query complexity. We provide a comparison between all these algorithms in Section 6 and propose practical parameters for an 80-bit security level.

Notations and preliminaries

Let 〈⋅,⋅〉 denote the inner product, \(\mathbb {Z}_{2} = \{0,1\}\) and ⊕ denote the bitwise XOR. For a domain \(\mathcal {D}\), we denote by \(x \overset {U}{\leftarrow } \mathcal {D}\) the fact that x is drawn uniformly at random from \(\mathcal {D}\). We use small letters for vectors and capital letters for matrices. We represent a vector v of size k as \(v = (v_{1},\ldots ,v_{k})\), where \(v_{i}\) is the i-th bit of v. We denote the Hamming weight of a vector v by \(HW(v)\). By \(\mathsf{Ber}_{\tau}\) we denote the Bernoulli distribution with parameter τ, i.e. for a random variable X, \(\Pr [X = 1] = \tau = 1 - \Pr [X = 0]\). The bias of a Boolean random variable X is defined as \(\delta = E((-1)^{X})\). Thus, for a Bernoulli variable we have δ = 1−2τ.

2 LPN

In this section we introduce the LPN problem and the algorithms that solve it. For ease of understanding, we present the LPN solving algorithms in a unified framework.

2.1 The LPN problem

Intuitively, the LPN problem asks to recover a secret vector s given access to noisy inner products of it with random vectors. More formally, we present below the definition of the LPN problem.

Definition 1 (LPN oracle)

Let \(s \overset {U}{\leftarrow } \mathbb {Z}_{2}^{k}\) and let \(\tau \in ] 0, \frac {1}{2} [\) be a constant noise parameter. Denote by \(D_{s,\tau}\) the distribution defined as

$$\left\{ (v, c) \mid v \overset{U}{\leftarrow} \mathbb{Z}_{2}^{k}, c = \langle v,s \rangle \oplus d, d \leftarrow \mathsf{Ber}_{\tau}\right\} \in \mathbb{Z}_{2}^{k+1}. $$

An LPN oracle \(\mathcal {O}^{\mathsf {LPN}}_{s,\tau }\) is an oracle which outputs independent random samples according to \(D_{s,\tau}\).
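To make the definition concrete, here is a minimal Python sketch simulating such an oracle; the function name, the list-of-bits representation and the example parameters are our own illustrative choices.

import random

def lpn_oracle(s, tau):
    """Return one sample (v, c) from D_{s,tau}: v uniform in Z_2^k and
    c = <v, s> xored with a Bernoulli(tau) noise bit."""
    v = [random.randrange(2) for _ in range(len(s))]
    noise = 1 if random.random() < tau else 0
    c = (sum(vi & si for vi, si in zip(v, s)) % 2) ^ noise
    return v, c

# example usage with illustrative parameters
k, tau = 16, 0.125
secret = [random.randrange(2) for _ in range(k)]
samples = [lpn_oracle(secret, tau) for _ in range(10)]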

Definition 2 (Search LPN problem)

Given access to an LPN oracle \(\mathcal {O}^{\mathsf {LPN}}_{s,\tau }\), find the vector s. We denote by \(\mathsf{LPN}_{k,\tau}\) the LPN instance where the secret has size k and the noise parameter is τ. Let \(k^{\prime } \leq k\). We say that an algorithm \(\mathcal {M}~(n,t,m,\theta ,k^{\prime })\)-solves the search \(\mathsf{LPN}_{k,\tau}\) problem if

$$\Pr\left[ \mathcal{M}^{\mathcal{O}^{\mathsf{LPN}}_{s,\tau}}(1^{k}) = \left( s_{1} {\ldots} s_{k^{\prime}}\right) \mid s \overset{U}{\leftarrow}\mathbb{Z}_{2}^{k} \right] \geq \theta, $$

and \(\mathcal {M}\) runs in time t, uses memory m and asks at most n queries from the LPN oracle.

Note that we consider here the problem of recovering the first \(k^{\prime }\) bits of the secret. We will show in Section 3 that for all the algorithms we consider, the cost of recovering the full secret s is dominated by the cost of recovering the first block of \(k^{\prime }\) bits of s.

An equivalent way to formulate the search \(\mathsf{LPN}_{k,\tau}\) problem is as follows: given access to a random matrix \(A \in \mathbb {Z}_{2}^{n \times k}\) and a column vector c over \(\mathbb {Z}_{2}\), such that \(As^{T} \oplus d = c\), find the vector s. Here the matrix A corresponds to the matrix that has the vectors v on its rows, s is the secret vector of size k and c corresponds to the column vector that contains the noisy inner products. The column vector d is of size n and contains the corresponding noise bits.

One may observe that with τ = 0, the problem is solved in polynomial time through Gaussian elimination given n = Θ(k) queries. The problem becomes hard once noise is added to the inner product. The value of τ can be either independent of or dependent on the value of k. Usually the value of τ is constant and independent of k. A case where τ is taken as a function of k occurs in the construction of the encryption schemes [3, 15]. Intuitively, a larger value of τ means more noise and makes the search LPN problem harder. The value of the noise parameter is a trade-off between the hardness of \(\mathsf{LPN}_{k,\tau}\) and the practical impact on the applications that rely on this problem.
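As an illustration of the noiseless case, the following sketch recovers s from a system \(As^{T} = c\) by Gaussian elimination over \(\mathbb{Z}_{2}\); it is a toy implementation that assumes the queries provide a full-rank system.

def solve_noiseless(A, c):
    """Recover s from A s^T = c over Z_2 by Gauss-Jordan elimination.
    A is an n x k list of 0/1 rows (n >= k), c the corresponding bits.
    Returns the k secret bits, or None if A does not have rank k."""
    n, k = len(A), len(A[0])
    M = [A[i][:] + [c[i]] for i in range(n)]        # augmented matrix [A | c]
    for col in range(k):
        # find a pivot row with a 1 in this column
        p = next((r for r in range(col, n) if M[r][col] == 1), None)
        if p is None:
            return None                             # rank deficient: ask for more queries
        M[col], M[p] = M[p], M[col]
        # eliminate this column from every other row
        for r in range(n):
            if r != col and M[r][col] == 1:
                M[r] = [x ^ y for x, y in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]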

The LPN problem also has a decisional form. The decisional \(\mathsf{LPN}_{k,\tau}\) asks to distinguish between the uniform distribution over \(\mathbb {Z}_{2}^{k+1}\) and the distribution \(D_{s,\tau }\). A similar definition can be adopted for an algorithm that solves decisional LPN. Let \(\mathcal {U}_{k+1}\) denote an oracle that outputs random vectors of size k+1. We say that an algorithm \(\mathcal {M} (n,t,m,\theta )\)-solves the decisional \(\mathsf{LPN}_{k,\tau}\) problem if

$$\left|\Pr\left[ \mathcal{M}^{\mathcal{O}^{\mathsf{LPN}}_{s,\tau}}(1^{k}) = 1 \right] - \Pr\left[ \mathcal{M}^{\mathcal{U}_{k+1}}(1^{k}) = 1\right] \right| \geq \theta $$

and \(\mathcal {M}\) runs in time t, uses memory m and needs at most n queries.

Search and decisional LPN are polynomially equivalent. The following lemma expresses this result.

Lemma 1 ([8, 29])

If there is an algorithm \(\mathcal {M}\) that (n,t,m,θ)-solves the decisional \(\mathsf{LPN}_{k,\tau}\) problem, then one can build an algorithm \(\mathcal {M^{\prime }}\) that \((n^{\prime },t^{\prime },m^{\prime },\theta ^{\prime },k)\)-solves the search \(\mathsf{LPN}_{k,\tau}\) problem, where \(n^{\prime } = \mathcal {O}(n \cdot \theta ^{-2} \log {k})\), \(t^{\prime } = \mathcal {O}(t \cdot k \cdot \theta ^{-2} \log {k} )\), \(m^{\prime } = \mathcal {O}(m \cdot \theta ^{-2} \log {k})\) and \(\theta ^{\prime } = \frac {\theta }{4}\).

We do not go into details as this is outside the scope of this paper. We only analyse the solving algorithms for search LPN. From now on we will refer to it simply as LPN.

2.2 LPN solving algorithms

In the current literature there are several algorithms to solve the LPN problem. The first to appear, and the most well known, is BKW [9]. This algorithm recovers the secret s of an \(\mathsf{LPN}_{k,\tau}\) instance in sub-exponential \(2^{\mathcal {O}\left (\frac {k}{\log k}\right )}\) time complexity by requiring a sub-exponential number \(2^{\mathcal {O}\left (\frac {k}{\log k}\right )}\) of queries from the \(\mathcal {O}^{\mathsf {LPN}}_{s,\tau }\) oracle. Levieil and Fouque [33] propose two improvements which are called LF1 and LF2. Fossorier et al. [20] also introduce a new algorithm, which we denote FMICM, that brings an improvement over BKW. The best algorithm to solve LPN was recently presented at Asiacrypt 2014 [24]. It can be seen as a variant of LF1 where covering codes are introduced as a new method to improve the overall algorithm. All these algorithms still require a sub-exponential number of queries and have a sub-exponential time complexity.

Using BKW as a black box, Lyubashevsky [34] introduces a "pre-processing" phase and solves an \(\mathsf{LPN}_{k,\tau}\) instance with \(k^{1+\eta}\) queries and with a time complexity of \(2^{\mathcal {O}\left (\frac {k}{\log \log k}\right )}\). The queries given to BKW have a worse noise parameter of \(\tau ^{\prime } = \frac {1}{2} - \frac {1}{2}\left (\frac {1-2\tau }{4} \right )^{\frac {2k}{\eta \log {k}}}\). Thus, this variant requires a polynomial number of queries but has a worse time complexity. Given only n = Θ(k) queries, the best algorithms run in exponential time \(2^{\Theta(k)}\) [36, 39].

An easy-to-solve instance of LPN was introduced by Arora and Ge [5]. They show that in the k-wise version, where the k-tuples of noise bits can be expressed as the solution of a polynomial (e.g. there are no 5 consecutive errors in the sequence of queries), the problem can be solved in polynomial time. What makes the problem easy is the fact that an adversary is able to structure the noise.

In this paper we are interested in the BKW algorithm and its improvements presented by Levieil and Fouque [33] and by Guo et al. [24]. The common structure of all these algorithms is the following: given n queries from the \(\mathcal {O}^{\mathsf {LPN}}_{s,\tau }\) oracle, the algorithm tries to reduce the problem of finding a secret s of k bits to one where the secret \(s^{\prime }\) has only \(k^{\prime }\) bits, with \(k^{\prime }<k\). This is done by applying several reduction techniques. We call this phase the reduction phase. Afterwards, during the solving phase, we apply a solving algorithm that recovers the secret \(s^{\prime }\). We then update the queries with the recovered bits and restart to fully recover s. For ease of understanding, we describe all the aforementioned LPN solving algorithms in this setting where we separate the algorithms into two phases. We emphasize the main differences between the algorithms and discuss which improvements they bring.

First, we assume that k = ab. Thus, we can visualise each k-bit vector v as a blocks of b bits each.

2.2.1 BKW algorithm

The BKW algorithm as described in [33] works in two phases:

Reduction phase

Given n queries from the LPN oracle, we group them in equivalence classes. Two queries are in the same equivalence class if they have the same value on a set \(q_{1}\) of b bit positions. These b positions are chosen arbitrarily in {1,…,k}. There are at most \(2^{b}\) such equivalence classes. Once this separation is done, we perform the following steps for each equivalence class: pick one query at random, the representative vector, and xor it with the rest of the queries from the same equivalence class. Discard the representative vector. This gives vectors with all bits set to 0 on those b positions. These steps are also illustrated in Algorithm 1 (steps 5–10). We are left with at least \(n-2^{b}\) queries where the secret is reduced to k−b effective bits (the others being multiplied by 0 in all queries).
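A minimal sketch of this reduction step, with queries represented as (v, c) pairs of bit lists (the function name and the choice of the first element of each class as representative are ours):

from collections import defaultdict

def bkw_reduction_step(queries, positions):
    """One BKW reduction: partition the queries by their value on the given
    b positions, xor every query with a representative of its class, and
    discard the representatives.  All returned vectors are 0 on those positions."""
    classes = defaultdict(list)
    for v, c in queries:
        classes[tuple(v[i] for i in positions)].append((v, c))
    reduced = []
    for group in classes.values():
        rep_v, rep_c = group[0]                     # representative (could be picked at random)
        for v, c in group[1:]:
            reduced.append(([x ^ y for x, y in zip(v, rep_v)], c ^ rep_c))
    return reduced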

In total, we apply this reduction technique a−1 times on disjoint position sets \(q_{1}, q_{2},\ldots ,q_{a-1}\) from {1,…,k} and end up with at least \(n-(a-1)2^{b}\) queries where the secret is reduced to k−(a−1)b = b bits. The bias of the new queries is \(\delta ^{2^{a-1}}\), as shown by the following lemma with \(w = 2^{a-1}\).

Lemma 2 ([9, 33])

If \((v_{1}, c_{1}),\ldots ,(v_{w}, c_{w})\) are the results of w queries from \(\mathcal {O}^{\mathsf {LPN}}_{s,\tau }\), then the probability that:

$$\langle v_{1} \oplus v_{2} \oplus {\ldots} \oplus v_{w},s \rangle = c_{1} \oplus {\ldots} \oplus c_{w} $$

is equal to \(\frac {1+\delta ^{w}}{2}\).
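For instance, for τ = 0.125 (so δ = 0.75) and w = 4 xored queries, the resulting query is noise-free with probability

$$\frac{1+\delta^{w}}{2} = \frac{1+0.75^{4}}{2} \approx 0.658, $$

compared to 0.875 for an original query.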

It is easy to see that the complexity of performing this reduction step is \(\mathcal {O}(kan)\).

[Algorithm 1: the BKW algorithm (pseudocode figure)]

After a−1 iterations, we are left with at least \(n-(a-1)2^{b}\) queries, and a secret of size of b effective bits at positions 1,…,b. The goal is to keep only those queries that have Hamming weight one (step 11 of Algorithm 1). Given \(n-(a-1)2^{b}\) queries and one bit position \(j \in \{ 1, \ldots ,k\} \backslash \{ q_{1} \cup {\ldots } \cup q_{a-1} \}\), only \(n^{\prime } = \frac {n-(a-1)2^{b}}{2^{b}}\) will have a single non-zero bit on position j and 0 on all the others. These queries represent the input to the solving phase. The bias does not change since we do not alter the original queries. The complexity of performing this step for \(n-(a-1)2^{b}\) queries is \(\mathcal {O}\left (b(n - (a-1)2^{b})\right )\) as the algorithm just checks whether the queries have Hamming weight 1.

The bit c is also part of the query: it gets updated during the xoring operations, but we do not consider this bit when partitioning or when computing the Hamming weight of a query. Later on, the information stored in this bit will be used to recover bits of the secret.

Remark 1

Given that we have performed xors between pairs of queries, we note that the noise bits are no longer independent. In the analysis of BKW, this was overlooked by Levieil and Fouque [33]. The original BKW [9] algorithm overcomes this problem in the following manner: each query that has Hamming weight 1 is obtained with a fresh set of queries. Given \(a2^{b}\) queries the algorithm runs the xoring process and is left with \(2^{b}\) vectors. From these \(2^{b}\) queries, with a probability of \( 1 - \left (1 - 2^{-b}\right )^{2^{b}} \approx 1 - \frac {1}{e}\), where e = 2.718, there is one with Hamming weight 1 on a given position i. In order to obtain more such queries the algorithm repeats this process with fresh queries. This means that for guessing 1 bit of the secret, the original algorithm requires \(n = a \cdot 2^{b} \cdot \frac {1}{1- 1/e} \cdot n^{\prime }\) queries, where \(n^{\prime }\) denotes the number of queries needed for the solving phase. This is larger than \(n = 2^{b} n^{\prime } + (a-1)2^{b}\), which is the number of queries given by Levieil and Fouque [33]. We implemented and ran BKW as described in Algorithm 1 and we discovered that this dependency does not affect the performance of the algorithm. That is, the number of queries computed by the theory that ignores the dependency of the error bits matches the practical results. We need \(n = n^{\prime } + (a-1)2^{b}\) (and not \(n = 2^{b} n^{\prime } + (a-1)2^{b}\)) queries in order to recover one block of the secret. The theoretical and practical results are presented in Section 5. Given our practical experiments, we keep the “heuristic” assumption of independence and the algorithm as described in [33], which we call BKW. Thus, we assume from now on the independence of the noise bits and the independence of the queries.

Another discussion on the independence of the noise bits is presented in [19]. There one can see the probability of having a collision, i.e. two queries that share an error bit, among the queries formed during the xoring steps.

We can repeat the algorithm a times, with the same queries, to recover all the k bits. The total time complexity of the reduction phase is \(\mathcal {O}\left (ka^{2}n\right )\) as we perform the steps described above a times (instead of \(\mathcal {O}(kan)\) as given in [33]). However, by making the selection of a and b adaptive, with ab close to the remaining number of bits to recover, we can show that the total complexity is dominated by that of recovering the first block. So, we can typically concentrate on the algorithm to recover a single block. We provide a more complete analysis in Section 3.

Solving phase

The BKW solving method recovers the secret bit by bit by applying the majority rule. The queries from the reduction phase are of the form \(c_{j}^{\prime }= s_{i} \oplus d_{j}^{\prime }\), with \(d_{j}^{\prime } \leftarrow \mathsf {Ber}_{\left (1-\delta ^{2^{a-1}}\right )/2}\) and \(s_{i}\) being the i-th bit of the secret s. Given that the probability for the noise bit to be set to 1 is smaller than \(\frac {1}{2}\), in more than half of the cases these queries will equal \(s_{i}\). Thus, we decide that the value of \(s_{i}\) is given by the majority rule (steps 12–14 of Algorithm 1). By applying the Chernoff bounds [13], we find how many queries are needed so that the probability of guessing one bit of the secret incorrectly is bounded by some constant θ, with 0 < θ < 1.

The time complexity of performing the majority rule is linear in the number of queries.
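A sketch of this majority vote, assuming the reduced queries are given as b-bit vectors of Hamming weight 1 together with their noisy bit (names are illustrative):

def majority_solve(queries, b):
    """BKW solving phase (sketch): each remaining query has a single non-zero
    bit among the b positions, and its bit c is a noisy copy of the secret bit
    at that position, so a majority vote recovers each secret bit."""
    votes = [[0, 0] for _ in range(b)]      # votes[j] = [#zeros, #ones] seen for s_j
    for v, c in queries:
        j = v.index(1)                      # position of the single non-zero bit
        votes[j][c] += 1
    return [1 if ones > zeros else 0 for zeros, ones in votes]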

Complexity analysis

With their analysis, Levieil and Fouque [33] obtain the following result:

Theorem 1 (Theorem 1 from [33])

For k=a⋅b, the BKW algorithm heuristically (\(n = 20 \cdot \ln (4k) \cdot 2^{b} \cdot \delta ^{-2^{a}} + (a-1)2^{b}, t = \mathcal {O}(kan), m=kn, \theta = \frac {1}{2},b\))-solves the LPN problem.

In Section 3 we will see that our theoretical analysis, which we believe to be more intuitive and simpler, gives tighter bounds for the number of queries.

2.2.2 LF1 algorithm

During the solving phase, the BKW algorithm recovers the value of the secret bit by bit. Given that we are interested only in queries with Hamming weight 1, many queries are discarded at the end of the reduction phase. As first noted in [33], this can be improved by using a Walsh-Hadamard transform instead of the majority rule. This improvement of BKW is denoted in [33] by LF1. Again, we present the algorithm in pseudocode in Algorithm 2. As in BKW, we can concentrate on the complexity of recovering the first block.

Reduction phase

The reduction phase of LF1 follows the same steps as in BKW, obtaining new queries as xors of \(2^{a-1}\) initial queries in order to reduce the secret to size b. At this point, the algorithm no longer discards queries but proceeds directly with the solving phase (see steps 3–10 of Algorithm 2). We now have \(n^{\prime } = n - (a-1)2^{b}\) queries after this phase.

[Algorithm 2: the LF1 algorithm (pseudocode figure)]

Solving phase

The solving phase consists of applying a Walsh-Hadamard transform in order to recover b bits of the secret at once (steps 11–13 in Algorithm 2). We can recover the b-bit secret by computing the Walsh-Hadamard transform of the function \(f(x) = {\sum }_{i} 1_{v_{i}^{\prime }=x}(-1)^{c_{i}^{\prime }}\). The Walsh-Hadamard transform is \(\hat {f}(\nu )= {\sum }_{x} (-1)^{\langle \nu , x \rangle } f(x) = {\sum }_{x} (-1)^{\langle \nu , x \rangle }{\sum }_{i} 1_{v_{i}^{\prime }=x}(-1)^{c_{i}^{\prime }} = {\sum }_{i} (-1)^{\langle v_{i}^{\prime }, \nu \rangle + c_{i}^{\prime }} = n^{\prime } - 2HW\left (A^{\prime }\nu ^{T} + c^{\prime }\right ) \). For ν = s, we have \( \hat {f}(s) = n^{\prime } - 2 \cdot HW(d^{\prime })\), where \(d^{\prime }\) is the noise vector after the reduction phase. We know that most of the noise bits are set to 0. So, \(\hat {f}(s)\) is large and we suppose it is the largest value in the table of \(\hat {f}\). Thus, we have to look at the maximum value of the Walsh-Hadamard transform in order to recover the value of s. A naive implementation of the Walsh-Hadamard transform would give a complexity of \(2^{2b}\) since we apply it on a space of size \(2^{b}\). Since we apply a fast Walsh-Hadamard transform, we get a time complexity of \(b2^{b}\) [14].
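The sketch below illustrates this solving phase: it builds the table of f and applies an in-place fast Walsh-Hadamard transform, encoding the b-bit vectors as integers (the function name is ours):

def lf1_solve(queries, b):
    """LF1 solving phase (sketch): build the table of f, apply an in-place
    fast Walsh-Hadamard transform, and return the argmax as the b-bit secret."""
    f = [0] * (1 << b)
    for v, c in queries:                     # v is a b-bit vector, c its noisy bit
        x = sum(bit << i for i, bit in enumerate(v))
        f[x] += (-1) ** c
    # in-place fast Walsh-Hadamard transform, O(b * 2^b)
    h = 1
    while h < len(f):
        for start in range(0, len(f), 2 * h):
            for i in range(start, start + h):
                f[i], f[i + h] = f[i] + f[i + h], f[i] - f[i + h]
        h *= 2
    best = max(range(len(f)), key=lambda nu: f[nu])
    return [(best >> i) & 1 for i in range(b)]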

Complexity analysis

The following theorem states the complexity of LF1:

Theorem 2 (Theorem 2 from [33])

For k=a⋅b and a>1, the LF1 algorithm heuristically (\(n = (8b + 200)\delta ^{-2^{a}} + (a-1)2^{b}, t = \mathcal {O}\left (kan+b2^{b}\right ), m=kn + b2^{b}, \theta = \frac {1}{2},b\))-solves the LPN problem.

The analysis is similar to the one done for BKW, except that we now work with blocks of the secret s and not bits. Thus, we bound by \(\frac {1}{2a}\) the probability that \(\hat {f}(s^{\prime }) > \hat {f}(s)\), where \(s^{\prime }\) is any of the \(2^{b}-1\) values different from s. As for BKW, we will provide a more intuitive and tighter analysis for LF1 in Section 3.2.

BKW vs. LF1

We can see that compared to BKW, LF1 brings a significant improvement in the number of queries needed. As expected, the factor \(2^{b}\) disappears as we no longer discard any query at the end of the reduction phase. There is an increase in the time and memory complexity because of the fast Walsh-Hadamard transform, but these terms are not the dominant ones.

2.2.3 LF2 algorithm

LF2 is a heuristic algorithm, also introduced in [33], that applies the same Walsh-Hadamard transform as LF1, but has a different reduction phase. We provide the pseudocode for LF2 below.

[Pseudocode figure: the LF2 algorithm]

Reduction phase

Similarly to BKW and LF1, the n queries are grouped into equivalence classes. Two queries are in the same equivalence class if they have the same value on a window of b bits. In each equivalence class we perform the xor of all pairs from that class. Thus, we do not choose any representative vector that is discarded afterwards. Given that an equivalence class contains \(n/2^{b}\) queries, we expect to have \(2^{b} \binom{n/2^{b}}{2}\) queries at the end of the xoring. One interesting case is when n is of the form \(n = 3\cdot 2^{b}\): with this reduction phase we expect to preserve the number of queries, since \(\binom{3}{2} = 3\). For any \(n>3\cdot 2^{b}\), the number of queries will grow exponentially and will also affect the time and memory complexity.
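A sketch of this reduction step, under the same query representation as before (the function name is ours):

from collections import defaultdict
from itertools import combinations

def lf2_reduction_step(queries, positions):
    """LF2 reduction (sketch): partition on b positions and xor every pair
    inside each equivalence class, keeping all resulting queries."""
    classes = defaultdict(list)
    for v, c in queries:
        classes[tuple(v[i] for i in positions)].append((v, c))
    reduced = []
    for group in classes.values():
        for (v1, c1), (v2, c2) in combinations(group, 2):
            reduced.append(([x ^ y for x, y in zip(v1, v2)], c1 ^ c2))
    return reduced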

Solving phase

This works as in LF1.

In a scenario where the attacker has access to a restricted number of queries, this heuristic algorithm helps in increasing the number of queries. With LF2, the attacker might produce enough queries to recover the secret s.

2.2.4 FMICM algorithm

Another algorithm, by Fossorier et al. [20], uses ideas from fast correlation attacks to solve the LPN problem. While it improves on the BKW algorithm, it does not perform better than LF1 and LF2. Given that it does not bring better results, we present only the main steps of the algorithm.

Like the previous algorithms, it can be split into two phases: a reduction and a solving phase. The reduction phase first decimates the number of queries and keeps only those queries that have 0 bits on a window of a given size. Then, it performs xors of several queries in order to further reduce the size of the secret. The algorithm used for this step is similar to the one that constructs parity checks of a given weight in correlation attacks. The solving phase makes use of the fast Walsh-Hadamard transform to recover part of the secret. By iterating, the whole secret is recovered.

2.2.5 Covering codes algorithm

The new algorithm [24], presented at Asiacrypt 2014, introduces a new type of reduction. There is a difference between [24] and what was presented at the Asiacrypt conference (mostly due to our results). We concentrate here on [24] and in the next section we present the suggestions we provided to the authors.

Reduction phase

The first step of this algorithm is to transform the LPN instance, where the secret s is chosen uniformly at random, into an instance where the secret follows a Bernoulli distribution. This method was described in [4, 6, 32].

Given n queries from the LPN oracle: \((\bar {v}_{1},c_{1})\), \((\bar {v}_{2}, c_{2}), {\ldots } , (\bar {v}_{n}, c_{n})\), select k linearly independent vectors \(\bar {v}_{i_{1}}, \ldots , \bar {v}_{i_{k}}\). Construct the k×k target matrix M that has the aforementioned vectors as its columns, i.e. \(M = \left [\bar {v}_{i_{1}}^{T}\, \bar {v}_{i_{2}}^{T} {\ldots } \bar {v}_{i_{k}}^{T}\right ]\). Compute \(\left (M^{T}\right )^{-1}\), the inverse of \(M^{T}\), where \(M^{T}\) is the transpose of M. We can rewrite the k queries corresponding to the selected vectors as \(M^{T}s^{T}+d^{\prime }\), where \(d^{\prime }\) is the k-bit column vector \(d^{\prime } = \left (d_{i_{1}}, d_{i_{2}}, \ldots , d_{i_{k}}\right )^{T}\). We denote \(c^{\prime } = M^{T}s^{T}+d^{\prime }\). For any \(\bar {v}_{j}\) that is not used in the matrix M, do the following computation:

$$\bar{v}_{j} \left( M^{T}\right)^{-1} c^{\prime} + c_{j} = \left\langle \bar{v}_{j}\left( M^{T}\right)^{-1},d^{\prime} \right\rangle +d_{j}. $$

We discard the matrix M. From the initial set of queries, we have obtained a new set where the secret value is \(d^{\prime }\). This can be seen as a reduction to a sparse secret. The complexity of this transform is \(\mathcal {O}\left (k^{3} + n k^{2}\right )\) with the schoolbook matrix inversion algorithm. This can be improved as follows: for a fixed χ, one can split the matrix \(\left (M^{T}\right )^{-1}\) into \(a^{\prime } = \left \lceil \frac {k}{\chi } \right \rceil \) parts \(\left [\begin {array}{c}M_{1} \\ M_{2} \\ {\vdots} \\ M_{a^{\prime }} \end {array}\right ]\) of χ rows each. By pre-computing \(\bar {v} M_{i}\) for all \(\bar {v} \in \{0,1\}^{\chi }\), the operation of computing \(\bar {v}_{j}\left (M^{T}\right )^{-1}\) takes \(\mathcal {O}\left (ka^{\prime }\right )\). The pre-computation takes \(\mathcal {O}(2^{\chi })\) and is negligible if the memory required by the BKW reduction is bigger. With this pre-computation the complexity is \(\mathcal {O}(nka^{\prime })\).
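The sketch below illustrates the transform itself (without the pre-computation trick); for simplicity it assumes that the first k query vectors are linearly independent, and the helper names are ours.

def gf2_inverse(M):
    """Gauss-Jordan inversion of a square 0/1 matrix over Z_2 (None if singular)."""
    n = len(M)
    A = [M[i][:] + [1 if j == i else 0 for j in range(n)] for i in range(n)]
    for col in range(n):
        p = next((r for r in range(col, n) if A[r][col] == 1), None)
        if p is None:
            return None
        A[col], A[p] = A[p], A[col]
        for r in range(n):
            if r != col and A[r][col] == 1:
                A[r] = [x ^ y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]

def to_sparse_secret(queries, k):
    """Rewrite the queries so that the new secret is the noise vector d' of the
    first k queries (assumed linearly independent), as described above."""
    MT = [queries[i][0][:] for i in range(k)]     # rows of M^T are the selected vectors
    c_prime = [queries[i][1] for i in range(k)]   # c' = M^T s^T + d'
    inv = gf2_inverse(MT)                         # (M^T)^{-1}
    if inv is None:
        raise ValueError("selected vectors are not linearly independent")
    new_queries = []
    for v, c in queries[k:]:
        # w = v (M^T)^{-1}, computed as a row vector times a matrix over Z_2
        w = [sum(v[m] & inv[m][l] for m in range(k)) % 2 for l in range(k)]
        # new noisy bit: <w, c'> xor c, which equals <w, d'> xor d
        new_c = (sum(wi & ci for wi, ci in zip(w, c_prime)) % 2) ^ c
        new_queries.append((w, new_c))
    return new_queries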

Afterwards, the algorithm follows the usual BKW reduction steps, where the size of the secret is reduced to \(k^{\prime }\) by the xoring operation. Again, the vector of k bits is seen as being split into blocks of size b. The BKW reduction is applied a times. Thus, we have \(k^{\prime } = k - ab\).

The secret s of \(k^{\prime }\) bits is split into two parts: one part, denoted \(s_{2}\), of \(k^{\prime \prime }\) bits and the other part, denoted \(s_{1}\), of \(k^{\prime } - k^{\prime \prime }\) bits. The next step in the reduction is to guess the value of \(s_{1}\) by making an assumption on its Hamming weight: \(HW(s_{1})\leq w_{0}\). The remaining queries are of the form \(\left (v_{i},c_{i}=\langle v_{i},s_{2}\rangle \oplus d_{i}\right )\), where \(v_{i},s_{2} \in \{0,1\}^{k^{\prime \prime }}\) and \(d_{i} \leftarrow \mathsf {Ber}_{\frac {1 - \delta ^{2^{a}}}{2}}\). Thus, the problem is reduced to a secret of \(k^{\prime \prime }\) bits.

At this moment, the algorithm approximates the \(v_{i}\) vectors by the nearest codeword \(g_{i}\) in a \(\left [k^{\prime \prime },\ell \right ]\) linear code, where \(k^{\prime \prime }\) is the length and \(\ell\) is the dimension. By observing that \(g_{i}\) can be written as \(g_{i}=g_{i}^{\prime }G\), where G is the generating matrix of the code, we can write the equations in the form

$$c_{i}=\langle v_{i},s_{2}\rangle\oplus d_{i} = \left\langle g_{i}^{\prime}G,s_{2} \right\rangle \oplus \langle v_{i} - g_{i},s_{2} \rangle \oplus d_{i} = \left\langle g_{i}^{\prime},s_{2}^{\prime} \right\rangle \oplus d_{i}^{\prime} $$

with \(s^{\prime }_{2} = s_{2}G^{T}\) and \(d^{\prime }_{i}=\langle v_{i}-g_{i},s_{2}\rangle \oplus d_{i}\), where \(g_{i}^{\prime }, s_{2}^{\prime }\) have length \(\ell\). If the code has optimal covering radius ρ, then \(v_{i} - g_{i}\) is a random vector of weight bounded by ρ, while \(s_{2}\) is a vector of some small weight bounded by \(w_{1}\), with some probability. So, \(\langle v_{i} - g_{i},s_{2}\rangle\) is biased and we can treat \(d^{\prime }_{i}\) in place of \(d_{i}\).

In [24], the authors approximate the bias of \(\langle v_{i} - g_{i},s_{2}\rangle\) by \(\delta ^{\prime } = \left (1-2\frac {\rho }{k^{\prime \prime }}\right )^{w_{1}}\), as if all bits were independent. As discussed in the next section, this approximation is far from good.

No queries are lost during this covering code operation and the secret is now reduced to \(\ell\) bits. We have \(n^{\prime } = n - k - a2^{b}\) queries after this phase.

Solving phase

The solving phase of this algorithm follows the same steps as LF1, i.e. it employs a fast Walsh-Hadamard transform. One should notice that the solving phase recovers relations between the bits of the secret and not actual bits of the secret.

Complexity analysis

Recall that in the algorithm two assumptions are made regarding the Hamming weight of the secret: that \(s_{2}\) has a Hamming weight of at most \(w_{1}\) and that \(s_{1}\) has a Hamming weight of at most \(w_{0}\). This holds with probability \(\Pr \left (w_{0},k^{\prime }-k^{\prime \prime }\right ) \cdot \Pr \left (w_{1},k^{\prime \prime }\right )\), where

$$\Pr(w,m) = \sum\limits_{i=0}^{w} (1-\tau)^{m-i} \tau^{i} \binom{m}{i}. $$

The total complexity is given by the complexity of one iteration multiplied by the expected number of times the iteration has to be repeated. We state below the result from [24]:

Theorem 3 (Theorem 1 from [24])

Let n be the number of samples required and \(a,a^{\prime },b,w_{0},w_{1},\ell ,k^{\prime },k^{\prime \prime }\) be the algorithm parameters. For the \(\mathsf{LPN}_{k,\tau}\) instance, the number of bit operations required for a successful run of the new attack is equal to

$$t = \frac{t_{\mathsf{sparse~reduction}} + t_{\mathsf{bkw~reduction}} + t_{\mathsf{guess}} + t_{\mathsf{covering~code}} + t_{\mathsf{Walsh~transform}}}{\Pr(w_{0},k^{\prime}-k^{\prime\prime}) \Pr(w_{1},k^{\prime\prime})}, $$

where

  • \(t_{\mathsf {sparse~reduction}} = nka^{\prime }\) is the cost of reducing the LPN instance to a sparse secret

  • \(t_{\mathsf {bkw~reduction}} = (k+1)an\) is the cost of the BKW reduction steps

  • \(t_{\mathsf {guess}} = n^{\prime } {\sum }_{i=0}^{w_{0}} \binom{k^{\prime }-k^{\prime \prime }}{i} i\) is the cost of guessing \(k^{\prime }-k^{\prime \prime }\) bits and \(n^{\prime } = n - k - a 2^{b}\) represents the number of queries at the end of the reduction phase

  • \(t_{\mathsf {covering~code}} = \left (k^{\prime \prime } -\ell \right ) \left (2n^{\prime } + 2^{\ell }\right ) \) is the cost of the covering code reduction and \(n^{\prime }\) is again the number of queries

  • \(t_{\mathsf {Walsh~transform}} = \ell 2^{\ell } {\sum }_{i=0}^{w_{0}} \binom{k^{\prime }-k^{\prime \prime }}{i}\) is the cost of applying the fast Walsh-Hadamard transform for every guess of \(k^{\prime }-k^{\prime \prime }\) bits

under the condition that \(n-a2^{b} > \frac {1}{\delta ^{2^{a+1}} \cdot \delta ^{\prime 2}},\) where δ=1−2τ, \(\delta ^{\prime } = \left (1-2\frac {\rho }{k^{\prime \prime }}\right )^{w_{1}}\) and ρ is the smallest integer such that \({\sum }_{i=0}^{\rho } \binom{k^{\prime \prime }}{i} > 2^{k^{\prime \prime } -\ell }\).

The condition \(n-a2^{b} > \frac {1}{\delta ^{2^{a+1}} \cdot \delta ^{\prime 2}}\) proposed in [24] imposes a lower bound on the number of queries needed in the solving phase for the fast Walsh-Hadamard transform. In our analysis, we will see that this is underestimated: the Chernoff bounds dictate a larger number of queries.

3 Tighter theoretical analysis

In this section we present a theoretical analysis of the solving phases of the LPN solving algorithms that differs from the one of Levieil and Fouque [33]. A complete comparison is given in Section 5. Our analysis gives tighter bounds and aims at closing the gap between theory and practice. For the new algorithm from [24], we present the main points that we found to be incomplete.

We first show how the cost of solving one block of the secret dominates the total cost of recovering s. The main intuition is that after recovering a first block of \(k^{\prime }\) secret bits, we can apply a simple back-substitution mechanism and consider solving an \(\mathsf {LPN}_{k-k^{\prime },\tau }\) problem. The same strategy is applied in [1, 17] when solving LWE. Note that this is simply a generalisation of the classic Gaussian elimination procedure for solving linear systems, where we work over blocks of bits.

Specifically, let \(k_{1} = k\) and \(k_{i} = k_{i-1} - k_{i-1}^{\prime }\) for i>1, with \(k^{\prime }_{i-1} < k_{i-1}\). Now, suppose we were able to \(\left (n_{i},t_{i},m_{i},\theta _{i},k_{i}^{\prime }\right )\)-solve an \(\mathsf {LPN}_{k_{i},\tau }\) instance (meaning we recover a block of size \(k_{i}^{\prime }\) from the secret of size \(k_{i}\) with probability \(\theta_{i}\), in time \(t_{i}\) and with memory \(m_{i}\)). One can see that for \(k_{i+1}<k_{i}\) we need fewer queries to solve the new instance (the number of queries depends on the size \(k_{i+1}\) and on the noise level). With a smaller secret, the time complexity also decreases. Having a shorter secret and fewer queries, the memory needed is also smaller. Then, we can (n, t, m, θ, k)-solve the problem \(\mathsf{LPN}_{k,\tau}\) (i.e. recover s completely), with \(n = \max (n_{1},n_{2},\ldots )\), \(\theta = \theta_{1}+\theta_{2}+\ldots\), \(t = t_{1}+k_{1}^{\prime } n_{1} + t_{2}+ k_{2}^{\prime } n_{2} + \ldots \) (the terms \(k_{i}^{\prime } n_{i}\) are due to query updates by back substitution) and \(m = \max (m_{1},m_{2},\ldots )\). Finally, by taking \(\theta_{i} = 3^{-i}\), we obtain \(\theta \leq \frac {1}{2}\) and thus recover the full secret s with probability over 50 %.

It is easily verified that for all the algorithms we consider, we have n = n 1, m = m 1, and t is dominated by t 1. We provide an example on a concrete L P N instance in Appendix B.

For all the solving algorithms presented in this section we assume that \(n^{\prime }\) queries remain after the reduction phase and that the bias is \(\delta ^{\prime }\). For the solving techniques that recover the secret block-by-block, we assume the block size to be \(k^{\prime }\).

3.1 BKW algorithm

Given an LPN instance, the BKW solving method recovers a secret bit by applying the majority rule. Recall that the queries are of the form \(c_{j}^{\prime } = s_{i} \oplus d_{j}^{\prime }\), \(d_{j}^{\prime } \leftarrow \mathsf {Ber}_{(1-\delta ^{\prime })/2}\). The majority of these queries will most likely have \(c_{j}^{\prime } = s_{i}\). It is intuitive to see that the majority rule fails when more than half of the noise bits are 1 for a given bit. Any wrong guess of a bit gives a wrong value of the k-bit secret s. In order to bound the probability of such a scenario, we use the Hoeffding bounds [26] with \(X_{j} = d_{j}^{\prime}\) (see Appendix A). We have \(\Pr [X_{j} =1] = \frac {1-\delta ^{\prime }}{2}\). For \(X = {\sum }_{j=1}^{n^{\prime }} X_{j}\), we have \(E(X) = \frac {(1-\delta ^{\prime })n^{\prime }}{2}\); we apply Theorem 12 with \(\lambda = \frac {\delta^{\prime} n^{\prime }}{2}\), \(\alpha_{j} = 0\) and \(\beta_{j} = 1\) and obtain

$$\Pr\left[\mathsf{incorrect~guess~on~}s_{i}\right] = \Pr \left[ X \geq \frac{n^{\prime}}{2} \right] \leq e^{-\frac{n^{\prime}\delta^{\prime 2}}{2}}. $$

As discussed in Remark 1, the assumption of independence is heuristic.

Using the above result for every bit 1,…,b, we can bound by a constant θ the probability that we guess a block of s incorrectly, with 0 < θ < 1. Using the union bound, we get \(n^{\prime }=2 \delta ^{\prime -2} \ln (\frac {b}{\theta })\). Given that \(n^{\prime } = \frac {n - (a-1)2^{b}}{2^{b}}\) and that \(\delta ^{\prime } = \delta ^{2^{a-1}}\), we obtain the following result.

Theorem 4

For k≤a⋅b, the BKW algorithm heuristically (\(n = 2^{b+1} \delta ^{-2^{a}} \ln \left (\frac {b}{\theta }\right ) + (a-1)2^{b}, t = \mathcal {O}(kan),~m=kn, \theta ,b\))-solves the LPN problem.

We note that we obtained the above result using the union bound. One could make use of the independence of the noise bits and obtain \(n = 2^{b+1} \delta ^{-2^{a}} \ln \left (\frac {1}{1- 2^{-1/k}} \right ) + (a-1)2^{b}\), but this would bring a very small improvement.
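For reference, a small sketch evaluating the query count of Theorem 4 (our own helper; for large parameters one would work with logarithms to avoid floating-point overflow):

from math import log

def bkw_queries(k, tau, a, b, theta=0.5):
    """n from Theorem 4: 2^{b+1} * delta^{-2^a} * ln(b/theta) + (a-1)*2^b,
    with delta = 1 - 2*tau.  Assumes k <= a*b."""
    delta = 1 - 2 * tau
    return 2 ** (b + 1) * delta ** (-(2 ** a)) * log(b / theta) + (a - 1) * 2 ** b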

In terms of query complexity, we compare our theoretical results with the ones from [33] in Tables 1 and 2. We provide the \(\log _{2}(n)\) values for k varying from 32 to 100 and we take different Bernoulli noise parameters that vary from 0.01 to 0.4. Overall, our theoretical results bring an improvement of a factor 10 over the results of [33].

Table 1 BKW query complexity - our theory
Table 2 BKW query complexity - theory [33]

In Section 5.1 we show that Theorem 4 gives results that are very close to the ones we measure experimentally.

We note that our BKW algorithm, for which we have stated the above theorem, follows the steps from Algorithm 1 for k = ab. For k < ab the algorithm is slightly different. In this case we have a−1 blocks of size b and an incomplete block of size smaller than b. During the reduction phase, we first partition on the incomplete block and then apply a−2 reduction steps on the complete blocks. We finally have b bits to recover. Other than this small change, the algorithm remains the same.

If the term \(2^{b+1} \delta ^{-2^{a}} \ln \left (\frac {b}{\theta _{i}}\right )\) dominates n, the next iteration can use a value of a decreased by 1, leading to a new \(n \approx 2^{b+1} \delta ^{-2^{a-1}} \ln \left (\frac {b}{\theta _{i+1}}\right )\) which is roughly the square root of the previous n. So, the complexity of recovering this block is clearly dominated by the cost of recovering the previous block. If the term \((a-1)2^{b}\) dominates, we can decrease b by one for the next block and reach the same conclusion.

3.2 LF1 algorithm

For the LF1 algorithm, the secret is recovered by choosing the highest value of a Walsh-Hadamard transform. Recall that the Walsh transform is \(\hat {f}(\nu )= n^{\prime } - 2HW\left (A^{\prime }\nu ^{T} + c^{\prime }\right )\). For ν = s, we obtain that the Walsh transform has the value \( \hat {f}(s) = n^{\prime } - 2HW(d^{\prime }) \). We have \(E(\hat {f}(s))=n^{\prime } \delta ^{\prime }\).

The failure probability for LF1 is bounded by the probability that there is another vector ν≠s such that \(HW\left (A^{\prime } \nu ^{T} +c^{\prime }\right ) \leq HW\left (A^{\prime }s^{T} +c^{\prime }\right )\). Recall that \(A^{\prime }s^{T} + c^{\prime } =d^{\prime }\). We define x = s⊕ν so that \(A^{\prime } \nu ^{T} + c^{\prime } = A^{\prime }x^{T} + d^{\prime }\). We obtain that the failure probability is bounded by \(2^{k^{\prime }}\) times the probability that \(HW\left (A^{\prime }x^{T}+ d^{\prime }\right ) \leq HW(d^{\prime })\), for a fixed \(k^{\prime }\)-bit non-zero vector x. As \(A^{\prime }\) is uniformly distributed, independent from \(d^{\prime }\), and x is fixed and non-zero, \(A^{\prime }x^{T} + d^{\prime }\) is uniformly distributed, so we can rewrite the inequality as \(HW(y) \leq HW(d^{\prime })\), for a random y.

To bound the failure probability, we again use the Hoeffding inequality [26]. Let \(X_{1}, X_{2}, \ldots , X_{n^{\prime }}\) be random independent variables with \(X_{j}=y_{j}-d_{j}^{\prime }\), \(\Pr (X_{j} \in [-1,1]) =1\). We have \(E\left (y_{j}-d_{j}^{\prime }\right )=\frac {\delta ^{\prime }}{2}\). We can take \(\lambda = E[X]= \frac {\delta ^{\prime } n^{\prime }}{2} \) in Theorem 12 and obtain:

$$\Pr\left[\mathsf{incorrect~guess~on~one~block}\right] \leq 2^{k^{\prime}} \Pr \left[\sum\limits_{j=1}^{n^{\prime}} \left( y_{j} -d_{j}^{\prime}\right) \leq 0 \right] \leq 2^{k^{\prime}} e^{- \frac{n^{\prime} \delta^{\prime 2}}{8}}. $$

Again we can bound by θ the probability of incorrectly guessing one block of s. With \(n^{\prime }=8 \left (\ln \frac {2^{k^{\prime }}}{\theta }\right ) \delta ^{\prime -2}\), the probability of failure is smaller than θ. The total number of queries will be \(n = n^{\prime } + (a-1)2^{b}\) and we have \(\delta ^{\prime }=\delta ^{2^{a-1}}\), \(k^{\prime } = b\). Similarly to BKW, we obtain the following theorem:

Theorem 5

For k≤a⋅b, the LF1 algorithm heuristically (\(n = 8 \ln \left (\frac {2^{b}}{\theta }\right ) \delta ^{-2^{a}} +(a-1)2^{b}, t = \mathcal {O}\left (kan + b 2^{b}\right ), m=kn + b2^{b}, \theta ,b\))-solves the LPN problem.

By comparing the term \((8b + 200)\delta ^{-2^{a}}\) in Theorem 2 with our value of \(8 \ln \left (\frac {2^{b}}{\theta }\right ) \delta ^{-2^{a}}\), one can check that our term is roughly a factor 2 smaller than that of [33] for practical values of a and b. For example, for an \(\mathsf{LPN}_{768,0.01}\) instance (with a = 11, b = 70), our analysis requires \(2^{68}\) queries for the solving phase while the Levieil and Fouque analysis requires \(2^{69}\) queries.

3.3 LF2 algorithm

Having the new bounds for LF1, we can state a similar result for LF2. Recall that when \(n = 3\cdot 2^{b}\), LF2 preserves the number of queries during the reduction phase. For \(3 \cdot 2^{b} \geq n^{\prime }\) we have that:

Theorem 6

For k≤a⋅b and \(n = 3 \cdot 2^{b} \geq 8 \ln \left (\frac {2^{b}}{\theta }\right ) \delta ^{-2^{a}}\), the LF2 algorithm heuristically (\(n = 3 \cdot 2^{b}, t = \mathcal {O}\left (kan + b2^{b}\right ), m=kn + b2^{b}, \theta ,b\))-solves the LPN problem.

One can observe that we may allow n to be smaller than \(3\cdot 2^{b}\). Given that the solving phase may require fewer than \(3\cdot 2^{b}\) queries, we could start with fewer queries, let their number decrease during the reduction and end up with the exact number of queries needed for the solving phase.

3.4 Covering codes algorithm

Recall that the algorithm first reduces the size of the secret to \(k^{\prime \prime }\) bits by running BKW reduction steps. Then it approximates the \(v_{i}\) vectors by the nearest codeword \(g_{i}\) in a \([k^{\prime \prime },\ell ]\) linear code with generator matrix G. The noisy inner products can be rewritten as

$$c_{i} = \left\langle g_{i}^{\prime} G,s_{2} \right\rangle \oplus \langle v_{i}-g_{i},s_{2} \rangle\oplus d_{i} = \left\langle g_{i}^{\prime}, s_{2} G^{T} \right\rangle \oplus d_{i}^{\prime} = \left\langle g_{i}^{\prime}, s_{2}^{\prime} \right\rangle \oplus d_{i}^{\prime}, $$

where \(g_{i} = g_{i}^{\prime } G\), \(s_{2}^{\prime } = s_{2}G^{T}\) and \(d_{i}^{\prime } = \langle g_{i}-v_{i},s_{2} \rangle \oplus d_{i}\).

Given that the code has a covering radius of ρ and that the Hamming weight of \(s_{2}\) is at most \(w_{1}\), the bias of \(\langle g_{i} - v_{i},s_{2}\rangle\) is computed as \(\delta ^{\prime } = \left (1-2\frac {\rho }{k^{\prime \prime }}\right )^{w_{1}}\) in [24], where \(k^{\prime \prime }\) is the size of \(s_{2}\). We stress that this approximation is far from good.

Indeed, with the [3,1] repetition code given as an example in [24], the xor of two error bits is unbiased. Even worse, the xor of three error bits has a negative bias. So, when using the code obtained by 25 concatenations of this repetition code and \(w_{1} = 6\), with probability 36 % at least two error bits fall in the same concatenation and the bias makes this approach fail.

We can do the same computation with the concatenation of five [23,12] Golay codes and \(w_{1} = 15\), as suggested in [24]. With probability 0.21 %, the bias is zero or negative, so the algorithm fails. With probability 8.3 %, the bias is too low.

In any case, we cannot take the error bits as independent. When the code has optimal covering radius ρ, we can actually give an explicit formula for the bias of \(\langle v_{i} - g_{i},s_{2}\rangle\), assuming that \(s_{2}\) has weight \(w_{1}\):

$$\Pr\left[\langle v_{i}-g_{i},s_{2}\rangle=1 \mid HW(s_{2})=w_{1}\right]= \dfrac{1}{S(k^{\prime\prime},\rho)}\sum\limits_{i\leq \rho,\, i\mathsf{~odd}} \binom{w_{1}}{i} S(k^{\prime\prime}-w_{1},\rho-i) $$

where \(S(k^{\prime \prime },\rho )\) is the number of \(k^{\prime \prime }\)-bit strings with weight at most ρ.

To solve \(\mathsf{LPN}_{512,0.125}\), [24] proposes the following parameters

$$a=6\quad a^{\prime}=9\quad b=63\quad \ell=64\quad k^{\prime\prime}=124\quad w_{0}=2\quad w_{1}=16 $$

and obtains \(n = 2^{66.3}\) and a complexity of \(2^{79.92}\). With these parameters, [24] approximated the bias to \(\left (1-2\frac {\rho }{k^{\prime \prime }}\right )^{w_{1}}=2^{-5.91}\) (with ρ = 14). With our exact formula, the bias should rather be \(2^{-7.05}\). So, n should be multiplied by 4.82 (the square of the ratio).

Also, we stress that all this assumes the construction of a code with optimal covering radius, such as the Golay codes, or the repetition codes of odd length and dimension 1. But such codes do not exist for all \([k^{\prime \prime },\ell ]\). If we use concatenations of repetition codes, given as an example in [24], the formula for the bias changes. Given a concatenation of \(\ell\) repetition codes \([k_{i},1]\), with \(k_{1} + {\ldots } + k_{\ell } = k^{\prime \prime }\), \(k_{i} \approx \frac {k^{\prime \prime }}{\ell }\) and \(1\leq i\leq \ell\), we have to split the secret \(s_{2}\) into chunks of \(k_{1},\ldots ,k_{\ell }\) bits. We write \(w_{11}+{\ldots }+w_{1\ell } = w_{1}\), where \(w_{1i}\) is the weight of \(s_{2}\) on the i-th chunk. In this case the bias for each repetition code is

$$ \delta_{i} = 1 - 2 \times \dfrac{1}{S(k_{i},\rho_{i})}\sum\limits_{j\leq \rho_{i},\, j\mathsf{~odd}} \binom{w_{1i}}{j} S(k_{i}-w_{1i},\rho_{i}-j), $$
(1)

where \(\rho _{i} = \left \lfloor \frac {k_{i}}{2} \right \rfloor \).

The final bias is

$$ \delta^{\prime} = \delta_{1} {\cdots} \delta_{\ell}. $$
(2)
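The following sketch evaluates (1) and (2) for a concatenation of repetition codes; it is only an illustration of the formulas, and the two checks at the end reproduce the behaviour of the [3,1] repetition code discussed above.

from math import comb

def S(k, rho):
    """Number of k-bit strings of weight at most rho."""
    return sum(comb(k, i) for i in range(rho + 1))

def repetition_bias(k_i, w_1i):
    """Bias delta_i of <v - g, s_2> on one [k_i, 1] repetition-code chunk
    carrying w_1i non-zero secret bits (formula (1), rho_i = floor(k_i / 2))."""
    rho = k_i // 2
    p = sum(comb(w_1i, j) * S(k_i - w_1i, rho - j)
            for j in range(1, rho + 1, 2)) / S(k_i, rho)
    return 1 - 2 * p

def concatenated_bias(chunks):
    """Final bias delta' = delta_1 * ... * delta_l (formula (2));
    chunks is a list of (k_i, w_1i) pairs."""
    delta = 1.0
    for k_i, w_1i in chunks:
        delta *= repetition_bias(k_i, w_1i)
    return delta

# two error bits in one [3,1] chunk give bias 0; three give a negative bias
print(repetition_bias(3, 2))   # 0.0
print(repetition_bias(3, 3))   # -0.5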

We emphasize that the value of n is underestimated in [24]. Indeed, with \(n^{\prime }=\mathsf {bias}^{-2}\), the probability that \(\arg \max \left (\hat {f}(\nu )\right )=s_{2}^{\prime }\) is too low in LF1. To have a constant probability of success, our analysis says that we should multiply \(n^{\prime }\) by \(8\ln \left (\frac {2^{\ell }}{\theta }\right )\). For \(\mathsf{LPN}_{512,0.125}\) and \(\theta =\frac {1}{3}\), this factor is 363.

When presenting their algorithm at Asiacrypt 2014, the authors of [24] updated their computation by using our suggested formulas for the bias and the number of queries. In order to obtain a complexity smaller than \(2^{80}\), they further improved their algorithm by the following observation: instead of assuming that the secret \(s_{2}\) has a Hamming weight smaller than or equal to \(w_{1}\), the algorithm now takes into account all the Hamming weights that would give a good bias for the covering code reduction. That is, the algorithm takes into account all the Hamming weights w for which \(\delta ^{\prime } > \varepsilon _{\mathsf {set}}\), where \(\varepsilon _{\mathsf {set}}\) is a preset bias. The probability of a good secret changes from \(\Pr (w_{1},k^{\prime \prime })\) to \(\Pr (HW)\), which we define below. They further adapted the algorithm by using the LF2 reduction steps. Recall that for \(n = 3\cdot 2^{b}\), the number of queries is preserved during the reduction phase. With these changes they propose the following parameters for \(\mathsf{LPN}_{512,0.125}\):

$$a=5\quad b=62\quad \ell=60\quad k^{\prime\prime}=180\quad w_{0}=2\quad \varepsilon_{\mathsf{set}} = 2^{-14.18} $$

Using two [90,30] linear codes, they obtain that \(n = 2^{63.6} = 3\cdot 2^{b}\) queries are needed, the memory used is \(m = 2^{72.6}\) bits and the time complexity is \(t = 2^{79.7}\). Thus, this algorithm gives better performance than LF2 and shows that this LPN instance does not offer a security of 80 bits.

With all the above observations, we update Theorem 3.

Theorem 7

Let \(a,a^{\prime },b,w_{0},w_{1},\ell ,k^{\prime },k^{\prime \prime },\varepsilon _{\mathsf {set}}\) be the algorithm parameters. The covering codes algorithm (\(n = 8 \ln \left (\frac {2^{\ell }}{\theta }\right ) \frac {1}{\delta ^{2^{a+1}} \varepsilon _{\mathsf {set}}^{2}} + a2^{b}\), \( t, m=kn + 2^{k^{\prime \prime } -\ell } + \ell 2^{\ell }, \theta ,\ell \))-solves the LPN problem, where δ=1−2τ and \(\varepsilon _{\mathsf {set}}\) is a preset bias. The code chosen for the covering code reduction step can be expressed as the concatenation of one or more linear codes. The time complexity t can be expressed as

$$t = \frac{t_{\mathsf{sparse~reduction}} + t_{\mathsf{bkw~reduction}} + t_{\mathsf{guess}} + t_{\mathsf{covering~code}} + t_{\mathsf{Walsh~transform}}}{\Pr(w_{0},k^{\prime}-k^{\prime\prime}) \Pr(HW)}, $$

where

  • \(t_{\mathsf {sparse~reduction}} = nka^{\prime }\) is the cost of reducing the LPN instance to a sparse secret

  • \(t_{\mathsf {bkw~reduction}} = (k+1)an\) is the cost of the BKW reduction steps

  • \(t_{\mathsf {guess}} = n^{\prime } {\sum }_{i=0}^{w_{0}} \binom{k^{\prime }-k^{\prime \prime }}{i} i\) is the cost of guessing \(k^{\prime }-k^{\prime \prime }\) bits and \(n^{\prime } = n - k - a 2^{b}\) represents the number of queries at the end of the reduction phase

  • \(t_{\mathsf {covering~code}} = (k^{\prime \prime } -\ell ) (2n^{\prime } + 2^{\ell }) \) is the cost of the covering code reduction and \(n^{\prime }\) is again the number of queries

  • \(t_{\mathsf {Walsh~transform}} = \ell 2^{\ell } {\sum }_{i=0}^{w_{0}} \binom{k^{\prime }-k^{\prime \prime }}{i}\) is the cost of applying the fast Walsh-Hadamard transform for every guess of \(k^{\prime }-k^{\prime \prime }\) bits

  • \(\Pr (HW) = {\sum }_{w_{i}} (1-\tau )^{k^{\prime \prime } - w_{i}} \tau ^{w_{i}} \binom{k^{\prime \prime }}{w_{i}}\), where \(w_{i}\) is chosen such that the bias \(\delta ^{\prime }\) (computed following (1) and (2)), which depends on \(w_{i}\) and the covering radius ρ of the chosen code, is larger than \(\varepsilon _{\mathsf {set}}\)

4 Other LPN solving algorithms

Most LPN-based encryption schemes use τ as a function of k, e.g. \(\tau = \frac {1}{\sqrt {k}}\) [3, 15]. The bigger the value of k, the lower the level of noise. For k = 768, we have τ≈0.036. For such a value we say that the noise is sparse. Given that these LPN instances are used in practice, we consider how we can construct other algorithms that take advantage of this extra information.

The first two algorithms presented in this section bring new ideas for the solving phase. The third one provides a method to recover the whole secret and does not need any reduction phase.

We maintain the notations used in the previous section: \(n^{\prime }\) queries remain after the reduction phase, the bias is \(\delta ^{\prime }\) and the block size is \(k^{\prime }\).

For these solving algorithms, we assume that the secret is sparse. Even if the secret is not sparse, we can exploit the sparseness of the noise: we can transform an LPN instance into an instance of LPN where the secret is actually a vector of noise bits, by the method presented in [32]. The details of this transform were given in Section 2.2.5 for the covering codes algorithm.

We denote by Δ the sparseness of the secret, i.e. \(\Pr [s_{i} = 1] = \frac {1 - {\Delta }}{2}\) for any \(1\leq i\leq k\). We say that the secret is Δ-sparse. We can take Δ=δ.

The assumption we make is that the Hamming weight of the \(k^{\prime }\)-bit secret s is in a given range. On average we have \(HW(s)=k^{\prime }\left (\frac {1-{\Delta }}{2}\right )\), so an appropriate range is \(\left [0,k^{\prime }\left (\frac {1-{\Delta }}{2}\right ) + \frac {\sigma }{2} \sqrt {k^{\prime }}\right ]\), where σ is a constant. We denote \(k^{\prime }\left (\frac {1-{\Delta }}{2}\right )\) by \(E_{HW}\) and \(\frac {\sigma }{2} \sqrt {k^{\prime }}\) by \(\mathsf{dev}\). Thus, we are searching in the range \([0,E_{HW} + \mathsf{dev}]\). We can bound the probability that the secret has a Hamming weight outside the range by using the Hoeffding bound [26].

Let \(X_{1}, X_{2}, \ldots , X_{k^{\prime }}\) be independent random variables that correspond to the secret bits, i.e. \(\Pr [X_{i}=1] = \frac {1-{\Delta }}{2}\) and \(\Pr (X_{i} \in [0,1]) =1\). We have \(E(X)=\frac {1-{\Delta }}{2} k^{\prime }\). Using Theorem 12, we get that

$$\Pr[HW(s)\mathsf{~not~in~range}] = \Pr \left[ HW(s) - \frac{(1-{\Delta})}{2} k^{\prime} \geq \sigma \sqrt{\frac{k^{\prime}}{4}} \right] \leq e^{-\frac{\sigma^{2}}{2}}. $$

If we want to bound by θ/2 the probability that H W(s) is not in the correct range for one block, we obtain that \(\sigma = \sqrt {2 \ln \left (\frac {2}{\theta }\right )}\).

4.1 Exhaustive search on sparse secret

We have \(S = {\sum }_{i=0}^{E_{HW} + \mathsf {dev}} \binom{k^{\prime }}{i}\) vectors ν with Hamming weight in our range. A first idea is to perform an exhaustive search over the sparse secret. We denote this algorithm by Search1. For every such value ν, we compute \(HW\left (A\nu ^{T} +c\right )\). In order to compute the Hamming weight we have to compute the multiplication between A and all ν with Hamming weight in the correct range. This operation would take \(\mathcal {O}\left (S n^{\prime } k^{\prime }\right )\) time, but we can save a factor \(k^{\prime }\) by the following observation from [7]: computing \(A \nu ^{T}\), with HW(ν)=i, means xoring i columns of A. If we have the values of \(A \nu ^{T}\) for all ν with HW(ν)=i, then we can compute \(A \nu ^{\prime T}\) for \(HW(\nu ^{\prime }) = i+1\) by adding one extra column to the previous results.

We use here a reasoning similar to the one done for the Walsh-Hadamard transform. When ν = s, the value of \(HW\left (As^{T} + c\right )\) equals HW(d), and we assume that this is the smallest value, as more noise bits are set to 0 than to 1. Thus, going through all possible values of ν and keeping the minimum gives us the value of the secret. The time complexity of Search1 is the complexity of computing the Hamming weights, i.e. \(\mathcal{O}(S n^{\prime })\).

Besides Search1, which requires a matrix multiplication for each trial, we also discovered that a Walsh transform can be used for a sparse secret. We call this algorithm Search2. The advantage is that a Walsh transform is faster than a naive exhaustive search and thus improves the time complexity. We thus compute the fast Walsh-Hadamard transform and search for the maximum of \(\hat {f}\) only over those S values with Hamming weight in the correct range. Given that we apply a Walsh transform, the complexity of this solving algorithm is \(\mathcal {O}(k^{\prime } 2^{k^{\prime }})\). So, it is more interesting than Search1 when \(Sn^{\prime } >k^{\prime }2^{k^{\prime }}\).
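A sketch of Search2, using the same integer encoding and in-place transform as in the LF1 sketch, with the argmax restricted to the S candidates of small Hamming weight:

def search2_solve(queries, b, max_weight):
    """Search2 (sketch): fast Walsh-Hadamard transform as in LF1, but the
    argmax is taken only over candidates of Hamming weight at most max_weight."""
    f = [0] * (1 << b)
    for v, c in queries:
        x = sum(bit << i for i, bit in enumerate(v))
        f[x] += (-1) ** c
    h = 1
    while h < len(f):                        # in-place fast Walsh-Hadamard transform
        for start in range(0, len(f), 2 * h):
            for i in range(start, start + h):
                f[i], f[i + h] = f[i] + f[i + h], f[i] - f[i + h]
        h *= 2
    candidates = [nu for nu in range(1 << b) if bin(nu).count("1") <= max_weight]
    best = max(candidates, key=lambda nu: f[nu])
    return [(best >> i) & 1 for i in range(b)]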

For both algorithms the failure probability is given by the scenario where there exists another sparse value νs such that \(HW\left (A\nu ^{T} +c\right ) \leq HW\left (As^{T}+c\right )\). As we search through S possible values for the secret we obtain that

$$\Pr[\mathsf{incorrect~guess~on~one~block}] \leq S e^{- \frac{n^{\prime} \delta^{\prime 2}}{8}}. $$

The above probability accounts for only one block of the secret. Thus we can say that with \(\sigma = \sqrt {2 \ln \left (\frac {2}{\theta }\right )}\) and \(n= 8\left (\ln \frac {2S}{\theta }\right ) \delta ^{-2^{a}} + (a-1)2^{b}\), the probability of failure is smaller than θ.

Another failure scenario, that we take into account in our analysis, occurs when the secret has a Hamming weight outside our range.

Complexity analysis

Taking \(n = n^{\prime } + (a-1)2^{b}\), \(k^{\prime } = b\), \(\delta ^{\prime } = \delta ^{2^{a-1}}\) and Δ=δ, we obtain the following theorems for S e a r c h 1 and S e a r c h 2:

Theorem 8

Let \(S = {\sum }_{i = 0}^{E_{HW} + \mathsf {dev}} \binom {b}{i}\) where \(E_{HW} = b\left (\frac {1-{\Delta }}{2}\right )\) and \(\mathsf {dev} = \frac {\sigma }{2} \sqrt {b}\) and let \(n^{\prime } = 8\ln \left (\frac {2S}{\theta }\right ) \delta ^{-2^{a}}\). For k≤a⋅b and a secret s that is Δ-sparse, the S e a r c h 1 algorithm heuristically (\(n = 8 \ln \left (\frac {2S}{\theta }\right ) \delta ^{-2^{a}} + (a-1)2^{b}, t = \mathcal {O}(kan + n^{\prime } S), m = kn + b \left (\begin {array}{c}b\\E_{HW}+ \mathsf {dev} \end {array}\right ) , \theta ,b\))-solves the L P N problem.

Theorem 9

Let \(S = {\sum }_{i = 0}^{E_{HW} + \mathsf {dev}} \binom {b}{i}\) where \(E_{HW} = b\left (\frac {1-{\Delta }}{2}\right )\) and \(\mathsf {dev} = \frac {\sigma }{2} \sqrt {b}\). For k≤a⋅b and a secret s that is Δ-sparse, the S e a r c h 2 algorithm heuristically (\(n = 8 \ln \left (\frac {2S}{\theta }\right ) \delta ^{-2^{a}} + (a-1)2^{b}, t = \mathcal {O}(kan + b 2^{b}), m = kn, \theta ,b\))-solves the L P N problem.

Here, we take the probability of each of the two failure scenarios to be θ/2. A search for the optimal values such that their sum is θ brings very little improvement to our results. Taking \(k^{\prime }=b\), we stress that S is much smaller than the \(2^{k^{\prime }} = 2^{b}\) term that is used for L F 1. For example, for k = 768, a = 11, b = 70 and τ = 0.05, we have that \(S \approx 2^{33}\), which is smaller than \(2^{b} = 2^{70}\), and we get \(n^{\prime }= 2^{67.33}\) and \(n = 2^{73.34}\) (compared to \(n^{\prime }=2^{68.32}\) and \(n = 2^{73.37}\) for L F 1). We thus expect to require fewer queries for exhaustive search compared to L F 1. As the asymptotic time complexity of S e a r c h 2 is the same as that of L F 1 and the number of queries is smaller, we expect this algorithm to run faster than L F 1.

4.2 Meet in the middle on sparse secret (MITM)

Given that A s T+d = c, we split s into s 1 and s 2 and rewrite the equation as \(A_{1}{s_{1}^{T}} +d =A_{2}{s_{2}^{T}} + c\). With this split, we try to construct a meet-in-the-middle attack by looking for \(A_{2}{s_{2}^{T}} + c\) close to \(A_{1} {s_{1}^{T}}\). The secret s has size \(k^{\prime }\) and we split it into s 1 of size k 1 and s 2 of size k 2 such that \(k_{1} + k_{2} = k^{\prime }\). We consider that both s 1 and s 2 are sparse. Thus the Hamming weight of s i lies in the range \(\left [0,k_{i}\left (\frac {1-{\Delta }}{2}\right ) + \frac {\sigma ^{\prime }}{2} \sqrt {k_{i}}\right ]\). We denote \(k_{i}\left (\frac {1-{\Delta }}{2}\right ) + \frac {\sigma ^{\prime }}{2} \sqrt {k_{i}}\) by m a x H W (k i ). In order to bound the probability that both estimates are correct we use the same bound shown in Section 4 and obtain that \(\sigma ^{\prime } = \sqrt {2 \ln \left (\frac {4}{\theta }\right )}\).

For our MITM attack we have a pre-computation phase. We compute and store \(A_{1} {s_{1}^{T}}\) for all \(S_{1} = {\sum }_{i = 0}^{\mathsf {max_{HW}}(k_{1})} \binom {k_{1}}{i}\) possible values for s 1. We do the same for s 2, i.e. we compute \(A_{2}{s_{2}^{T}} + c\) for all \(S_{2} = {\sum }_{i = 0}^{\mathsf {max_{HW}}(k_{2})} \binom {k_{2}}{i}\) vectors s 2. The pre-computation phase takes \((S_{1} + S_{2}) n^{\prime }\) steps in total. Afterwards we pick ξ bit positions and hope that the noise d has only values of 0 on these positions. If this is true, then we can build a mask μ that has Hamming weight ξ such that d∧μ = 0. The probability for this to happen is \(\left (\frac {1+\delta ^{\prime }}{2}\right )^{\xi } = e^{-\xi \ln {\frac {2}{1+\delta ^{\prime }}}}\).

We build our meet-in-the-middle attack by constructing a hash table where we store, for all s 2 values, \(A_{2}{s_{2}^{T}}+c\) at position \(h\left ((A_{2}{s_{2}^{T}}+c) \wedge \mu \right )\). We have S 2 vectors s 2, so we expect to have \(S_{2} 2^{-\xi }\) vectors on each position of the hash table. For all S 1 values of s 1, we check for collisions, i.e. \(h((A_{1}{s_{1}^{T}}) \wedge \mu ) = h((A_{2}{s_{2}^{T}}+c) \wedge \mu )\). If this happens, we check if \(A_{1}{s_{1}^{T}}\) xored with \(A_{2}{s_{2}^{T}}+c\) gives a vector d with a small Hamming weight. Remember that with the pre-computed values we can compute d with only one xor operation. If the resulting vector has a Hamming weight in our range, then we believe we have found the correct s 1 and s 2 values and we can recover the value of s. Given that \(A_{1}{s_{1}^{T}}+A_{2}{s_{2}^{T}} + d = c\), we expect to have \(\left (A_{2}{s_{2}^{T}} +c\right ) \wedge \mu = A_{1}{s_{1}^{T}} \wedge \mu \) only when d∧μ = 0. The condition d∧μ = 0 holds with a probability of \(\left (\frac {1+\delta ^{\prime }}{2}\right )^{\xi }\), so we expect to repeat our algorithm \(\left (\frac {2}{1+\delta ^{\prime }}\right )^{\xi }\) times until this condition is fulfilled.
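One iteration of this procedure, for a fixed mask μ, can be sketched as follows (illustrative; table1 and table2 stand for the precomputed maps \(s_{1} \mapsto A_{1}{s_{1}^{T}}\) and \(s_{2} \mapsto A_{2}{s_{2}^{T}}+c\) over the S 1 and S 2 sparse candidates, packed as integers, and we use the masked value itself as the hash key).

```python
# Sketch of one MITM iteration for a fixed mask mu of Hamming weight xi.
from collections import defaultdict

def mitm_iteration(table1, table2, mu, max_hw_noise):
    buckets = defaultdict(list)                 # hash table keyed by masked value
    for s2, y in table2.items():                # y = A2 s2^T + c
        buckets[y & mu].append((s2, y))

    for s1, x in table1.items():                # x = A1 s1^T
        for s2, y in buckets.get(x & mu, []):   # collision on the masked bits
            d = x ^ y                           # candidate noise vector (one xor)
            if bin(d).count("1") <= max_hw_noise:
                return s1, s2                   # believed halves of the secret
    return None                                 # repeat with another mask mu
```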

As for exhaustive search, we have two scenarios that could result in a failure. One scenario is when s 1 or s 2 have a Hamming weight outside the range. The second one happens when there is another vector νs such that \(HW\left (A_{1} {\nu _{1}^{T}}+A_{2} {\nu _{2}^{T}}+c\right ) \leq HW\left (A_{1}{s_{1}^{T}}+A_{2}{s_{2}^{T}}+c\right )\) and \(\left (A_{1}{\nu _{1}^{T}}+A_{2} {\nu _{2}^{T}}+c\right ) \wedge \mu =0\). This occurs with probability smaller than \(S_{1}S_{2} e^{-\frac {n^{\prime } \delta ^{\prime 2}}{8}}\).

Complexity analysis

The time complexity of constructing the MITM attack is \((S_{1} + S_{2})n^{\prime } + ((S_{1}+S_{2})\xi + S_{1}S_{2}2^{-\xi } n^{\prime }) \cdot \left (\frac {2}{1+\delta ^{\prime }}\right )^{\xi }\). We include here the cost of the pre-computation phase and the actual MITM cost. We obtain that the time complexity is \(\mathcal {O}\left ((S_{1} + S_{2})n^{\prime } + (S_{1} + S_{2}) \xi \left (\frac {2}{1+\delta ^{\prime }}\right )^{\xi } + S_{1}S_{2} n^{\prime } \left (\frac {1}{1+\delta ^{\prime }}\right )^{\xi } \right )\). Taking again \(n^{\prime } = n - (a-1)2^{b}\), \(k^{\prime } = b\), \(\delta ^{\prime } = \delta ^{2^{a-1}}\), Δ=δ, we obtain the following result for MITM.

Theorem 10

Let \(n^{\prime } = 8\ln \left (\frac {2}{\theta } S_{1}S_{2}\right ) \delta ^{-2^{a}}\) . Take k 1 and k 2 values such that b=k 1 +k 2. Let \(S_{j} = {\sum }_{i = 0}^{\mathsf {max_{HW}}(k_{j})} \binom {k_{j}}{i} \) where \(\mathsf {max_{HW}}(k_{j}) = k_{j} \left (\frac {1-{\Delta }}{2}\right ) + \frac {\sigma ^{\prime }}{2} \sqrt {k_{j}}\) for j∈{1,2}. For k≤a⋅b and a secret s that is Δ-sparse, the MITM algorithm heuristically ( \(n = 8 \ln \left (\frac {2}{\theta } S_{1}S_{2}\right ) \delta ^{-2^{a}} + (a-1)2^{b}, t = \mathcal {O}\left (kan + (S_{1} + S_{2}) n^{\prime }+ (S_{1} + S_{2}) \xi \left (\frac {2}{1+\delta ^{2^{a-1}}}\right )^{\xi } + S_{1}S_{2} n^{\prime } \left (\frac {1}{1+\delta ^{2^{a-1}}}\right )^{\xi } \right ), m = kn + S_{2} + (S_{1} + S_{2}) n^{\prime }, \theta ,b\))-solves the L P N problem.

4.3 Gaussian elimination

In the case of a sparse noise, one may try to recover the secret s by using Gaussian elimination. It is well known that L P N with noise 0, i.e. τ = 0, is an easy problem. This idea was used in [12] in order to mount a passive attack on the HB and HB+ protocols. If we are given Θ(k) queries for which the noise is 0, one can just run Gaussian elimination and recover the secret s in \(\mathcal {O}(k^{3})\). For an L P N k, τ instance, the event of having no noise for k queries happens with a probability p n o n o i s e = (1−τ)k.

We design the following algorithm for solving L P N: first, we have no reduction phase. For each k new queries, we assume that the noise is 0. We recover a candidate ν through Gaussian elimination. We must test if this value is the correct secret by computing the Hamming weight of \(A^{\prime } \nu ^{T} + c^{\prime }\), where \(A^{\prime }\) is the matrix that contains \(n^{\prime }\) fresh queries and \(c^{\prime }\) is the vector containing the corresponding noisy inner products. We expect to have a Hamming weight in the range \(\left [0, \left (\frac {1-\delta }{2}\right )n^{\prime } + \sigma \frac {\sqrt {n^{\prime }}}{2}\right ]\), where σ is a constant. From the previous results we know that for a correct secret we have

$$\Pr\left[HW\left( A^{\prime} s^{T}+ c^{\prime}\right)\mathsf{~not~in~range}\right] \leq e^{-\frac{\sigma^{2}}{2}}. $$

If we want to bound by θ/2 the probability that the Hamming weight of the noise is not in the correct range, for the correct secret, we obtain that \(\sigma = \sqrt {2 \ln \left (\frac {2}{\theta }\right )}\).

For a νs, we use the Hoeffding inequality to bound the probability that \(HW\left (A^{\prime } \nu ^{T} + c^{\prime }\right )\) is in the correct range. Let \(X_{1}, \ldots , X_{n^{\prime }}\) be the random variables that correspond to X i = 〈v i ,ν〉⊕c i . Let \(X = X_{1} + {\ldots } + X_{n^{\prime }}\). We have \(E(X) =\frac {n^{\prime }}{2}\). Using the Hoeffding inequality, we take \(\lambda = \frac {\delta n^{\prime }}{2} -\sigma \frac {\sqrt {n^{\prime }}}{2}\) and obtain

$$\begin{array}{@{}rcl@{}} \Pr[\mathsf{failure}] & \leq& 2^{k} \Pr\left[HW\left( A^{\prime} \nu^{T} + c^{\prime}\right)\mathsf{~in~correct~range}\right] \\ & =& 2^{k} \Pr[X - E(X) \leq -\lambda ] \\ & \leq& 2^{k} e^{- \frac{2 \left( \frac{\delta n^{\prime}}{2} - \sigma\frac{\sqrt{n^{\prime}}}{2}\right)^{2}}{n^{\prime}}} = 2^{k} e^{- \frac{ \left( \delta \sqrt{n^{\prime}} -\sigma\right)^{2}}{2}} \end{array} $$

If we bound this probability of failure by θ/2 we obtain that we need at least \(n^{\prime } = \left (\sqrt {2\ln {\frac {2^{k+1}}{\theta }}} + \sigma \right )^{2} \delta ^{-2}\) queries besides the k that are used for the Gaussian elimination.

As aforementioned, with a probability of p n o n o i s e = (1−τ)k, the Gaussian elimination will give the correct secret. Thus, we have to repeat our algorithm \(\frac {1}{p_{\mathsf {no noise}}}\) times.
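The whole procedure can be sketched as follows (illustrative; the oracle is assumed to return (v, c) pairs with v packed into a k-bit integer, verif_queries is a fixed set of \(n^{\prime }\) fresh queries, and max_hw is the upper end of the range above — all names are ours).

```python
# Sketch of the Gaussian elimination algorithm.
def gaussian_solve(vs, cs, k):
    # Gauss-Jordan elimination over GF(2); returns a k-bit candidate, or None
    # if the k queries are linearly dependent.
    eqs = list(zip(vs, cs))
    for col in range(k):
        piv = next((i for i in range(col, k) if (eqs[i][0] >> col) & 1), None)
        if piv is None:
            return None
        eqs[col], eqs[piv] = eqs[piv], eqs[col]
        pv, pc = eqs[col]
        for i in range(k):
            if i != col and (eqs[i][0] >> col) & 1:
                eqs[i] = (eqs[i][0] ^ pv, eqs[i][1] ^ pc)
    return sum((eqs[col][1] & 1) << col for col in range(k))

def gaussian_attack(oracle, k, verif_queries, max_hw):
    while True:                                  # about 1/p_nonoise iterations
        vs, cs = zip(*[oracle() for _ in range(k)])
        nu = gaussian_solve(list(vs), list(cs), k)
        if nu is None:
            continue                             # singular system: redraw k queries
        hw = sum((bin(v & nu).count("1") + c) % 2 for v, c in verif_queries)
        if hw <= max_hw:
            return nu                            # Hamming weight in range: accept
```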

Complexity analysis

The computation of the Hamming weight has a cost of \(\mathcal {O}(n^{\prime }k^{2})\). Given that we run the Gaussian elimination and the verification step \(\frac {1}{p_{\mathsf {no noise}}}\) times, we obtain the following theorem for this algorithm:

Theorem 11

Let \(n^{\prime } = \left (\sqrt {2 \ln {\frac {2^{k+1}}{\theta }}} + \sqrt {2 \ln \left (\frac {2}{\theta }\right )} \right )^{2} \delta ^{-2}\). The Gaussian elimination algorithm (\(n = \frac {k + 2}{(1-\tau )^{k}} + n^{\prime } , t = \mathcal {O} \left (\frac {n^{\prime }k^{2} + k^{3}}{(1-\tau )^{k}} \right ), m =k^{2} + n^{\prime } k, \theta ,k\) )-solves the L P N problem.

Remark 2

Notice that this algorithm recovers the whole secret at once and the only assumption we make is that the noise is sparse. We do not need to apply the transform that makes the secret sparse, and no queries are lost during a reduction phase.

Remark 3

In the extreme case where (1−τ)k>θ, the Gaussian elimination algorithm can just assume that k queries have noise 0 and retrieve the secret s without verifying that this is the correct secret.

5 Tightness of our query complexity

In this section we compare the theoretical analysis with implementation results of all the L P N solving algorithms described in Sections 3 & 4.

We implemented the B K W, L F 1 and L F 2 algorithms as they are presented in [33] and in pseudocode in Algorithms 1–3. The implementation was done in C on an Intel Xeon 3.33 GHz CPU. We used a custom bit library to store and handle bit vectors. Using the OpenMP library, we have also parallelized certain crucial parts of the algorithms. The xor-ing in the reduction phases as well as the majority phases, for instance, are easily distributed onto multiple threads to speed up the computation. Furthermore, we implemented the exhaustive search and MITM algorithms described in Section 4. The various matrix operations performed for the sparse L P N solving algorithms are done with the M4RI library. Regarding the memory model used, we implemented the one described in [33] in order to accommodate the L F 2 algorithm. The source code of our implementation can be found at http://lasec.epfl.ch/lpn/lpn_source_code.zip.

We ran all the algorithms for different L P N instances where the size of the secret varies from 32 to 100 bits and the Bernoulli parameter τ takes different values from 0.01 to 0.4. A value of τ = 0.01 for a small k such as the ones we are able to test means that very few, if any, of the queries have the noise bits set to 1. For this sparse case, an exhaustive search is the optimal strategy. Also, τ = 0.4 might seem an extreme case. Still, we provide the query complexity for these extreme cases to fully observe the behaviour of the L P N solving algorithms.

For each L P N instance, we try to find the theoretical number of oracle queries required to get a 50 % probability of recovering the full secret while optimizing the time complexity. This means that in half of our instances we recover the secret correctly. In the other half of the cases it may happen that one or more bits are guessed wrong. We thus take \(\theta = \frac {1}{3}\) as the probability of failure for the first block. We choose a and b that would minimize the time complexity and we apply this split in our theoretical bounds in order to compute the theoretical number of initial queries. We apply the same split in practice and try to minimize the number of initial queries such that we maintain a 50 % probability of success. We thus experimented with different values for the original number of oracle samples, and ran multiple instances of the algorithms to approximate the success probability. One can observe that in our practical and theoretical results the a, b parameters are the same and the comparison is consistent. We were limited by the power of our experimental environment and thus we were not able to provide results for instances that require more than \(2^{30}\) queries.

5.1 B K W

The implementation results for B K W are presented in Table 3. Each entry in the table is of the form \(\log _{2}(n)(a)\), where n is the number of oracle queries that were required to obtain a 50 % success rate for the full recovery of the secret. Parameter a is the algorithm parameter denoting the number of blocks into which the vectors were split. We take \(b= \lceil \frac {k}{a} \rceil \). By maintaining the value of a, we can easily compute the number of queries and the time & memory complexity. In Table 4 we present the theoretical results for B K W obtained by using Theorem 4. We can see that our theoretical and practical results are within a factor of at most 2.

Table 3 B K W query complexity - practice
Table 4 B K W query complexity - theory

If we take the example of L P N 100,0.01, we need \(2^{20.78}\) queries and our theoretical analysis gives a value of \(2^{21.47}\). These two values are very close compared with the value predicted by [33], \(2^{25.64}\), which is more than a factor 10 larger. We emphasize again that for both the theory and the practice we use the split that optimizes the time complexity and from this optimal split we derive the number of queries.

Remark 4

For the B K W algorithm we tried to optimize the average final bias of the queries, i.e. to obtain a better value than \(\delta ^{2^{a-1}}\). Recall that at the beginning of the reduction phase, we group the queries into equivalence classes and then choose a representative vector that is xored with the rest of the queries from the same class. One variation of this reduction operation would be to change the representative vector several times. The incentive for doing so is the following: a single representative vector whose error bit is set to 1 affects the bias δ of all the queries in its class, while by choosing several representative vectors this situation may be improved, since more than half of them are expected to have their error bit set to 0. We implemented this new approach but found that it does not bring any significant improvement. Another change that we tested concerns the majority rule applied during the solving phase. Queries have a worst-case bias of \(\delta ^{2^{a-1}}\) (See Lemma 2), but some have a larger bias. So, we could apply a weighted majority rule. This would decrease the number of queries needed for the solving phase. We implemented the idea and discovered that the complexity advantage is very small.

5.2 L F 1

Below we present the experimental and theoretical results for the L F 1 algorithm. As a first observation we can see that, for all instances, this algorithm is a clear optimization over the original B K W algorithm. As before, each entry is of the form \(\log _{2}(n)(a)\), where n and a are selected to obtain a 50 % success rate for the full recovery of the secret and \(b = \lceil \frac {k}{a} \rceil \).

Table 6 shows our theoretical results for L F 1 using Theorem 5. When we compare the experimental and the theoretical results for L F 1 (See Tables 5 and 6) we can see that the gap between them is of a factor up to 3.

Table 5 L F 1 query complexity - practice
Table 6 L F 1 query complexity - theory

Remark 5

One may observe a larger difference for the L P N 48,0.4 instance: the implementation requires \(n = 2^{19.74}\) initial queries, while the theory requires \(n = 2^{24.01}\) queries. Here we have a = 2 and b = 24 and the term \((a-1)2^{b}\) dominates the query complexity. The discrepancy comes from the worst-case analysis of the reduction phase, where we say that at each reduction step we discard \(2^{b}\) queries. With this reasoning, we predict to lose \(2^{24}\) queries. If we analyse more closely, we discover that in the average case we discard only \(2^{b} \cdot \left [1 - \left (1-\frac {1}{2^{b}}\right )^{n}\right ]\) queries (the expected number of non-empty equivalence classes). Thus, with only \(2^{19.74}\) initial queries, we run the reduction phase and discard \(2^{19.70}\) queries instead of \(2^{24}\). We are left with \(2^{14.45}\) queries, which are sufficient for the solving phase. We note that for large L P N instances, this difference between the worst-case and average-case analysis of the number of deleted queries during reduction rounds becomes negligible.
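The average-case count used above can be reproduced with a short computation (illustrative; the function name is ours):

```python
# Sketch: expected number of non-empty equivalence classes (i.e. discarded
# queries) in one reduction step, for the LPN_{48,0.4} example above.
import math

def expected_discarded(n, b):
    return 2 ** b * (1 - (1 - 2 ** (-b)) ** n)

print(math.log2(expected_discarded(2 ** 19.74, 24)))   # about 19.70, not 24
```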

Remark 6

Recall that in L F 1, like in all L P N solving algorithms, we perform the reduction phase by splitting the queries into a blocks of size b. When this split is not possible, we consider that we have a−1 blocks of size b and a last, shorter block of size \(b^{\prime }\) with \(b^{\prime }<b\). We also considered a variant of L F 1 that makes use of the Walsh transform but where the split into blocks is done differently: we now allow the last block to be larger than the rest. The gain of this strategy could be the following: given that we recover a larger block of the key, we run our solving phase fewer times. Although the complexity of the transform is bigger as we work with a bigger block, the reduction phase has to be applied fewer times. From our experiments, there seems to be no difference between the performance of the two algorithms.

5.3 L F 2

We tested the L F 2 heuristic on the same instances as for B K W and L F 1. The results are summarized in Table 7. To illustrate the performance of the heuristic, we concentrate on a particular instance, L P N 100,0.1 with a = 5,b = 20. As derived in [33], the L F 1 algorithm for this parameter set should require less than \((8\cdot b+200)\cdot \delta ^{-2^{a}} \approx 2^{18.77}\) queries for a solving phase and \((a-1)\cdot 2^{b} + (8\cdot b+200)\cdot \delta ^{-2^{a}} \approx 2^{22.13}\) queries overall to achieve a success probability of 50 %. Using our theoretical analysis, the L F 1 algorithm for this parameter set requires \(8 \ln (3 \cdot 2^{b}) \delta ^{-2^{a}} + (a-1)2^{b} \approx 2^{22.05}\) queries overall and \(2^{17.20}\) queries for the solving phase. Our experimental results for L F 1 were a bit lower than our theoretical ones: \(2^{21.38}\) oracle samples were sufficient. If we use the L F 2 heuristic starting with \(3\cdot 2^{20} \approx 2^{21.58}\) samples, we get about the same amount of vectors for the solving phase. In this case there are no queries lost during reduction. We thus have many more queries than should actually be required for a successful solving phase and correctly solve the problem with success probability close to 100 %. So we can try to start with fewer. By starting off with \(2^{20.65}\) queries and thus losing some queries in each reduction round, we also solved the L P N problem in slightly over 50 % of the cases. The gain in total query complexity for L F 2 is thus noticeable but not extremely important.
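The behaviour described above can be checked with a short computation (illustrative). Recall that L F 2 xors all pairs of queries inside each equivalence class; assuming the queries spread evenly over the \(2^{b}\) classes, each class holds about \(n/2^{b}\) queries and contributes \(\binom{n/2^{b}}{2}\) xored pairs, so starting from \(3 \cdot 2^{b}\) queries the count is preserved through each round.

```python
# Sketch: heuristic query count after each LF2 reduction round for
# LPN_{100,0.1} with a = 5, b = 20 (so a - 1 = 4 rounds).
import math

def lf2_round(n, b):
    per_class = n / 2 ** b                  # queries per class, assumed even
    return 2 ** b * per_class * (per_class - 1) / 2

n = 3 * 2 ** 20                             # starting amount, about 2^21.58
for _ in range(4):
    n = lf2_round(n, 20)
    print(math.log2(n))                     # stays at ~21.58: no queries lost
```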

Table 7 L F 2 query complexity - practice

As another example, consider the parameter set k = 768,τ = 0.05 proposed at the end of [33]. The values for a, b which minimize the query complexity are a = 9,b = 86 (ab = 774>k). Solving the problem with L F 1 should thus require about \(2^{87}\) vectors for the solving phase and \(2^{89}\) oracle samples overall. Using L F 2, as \(3\cdot 2^{b} \approx 2^{87}\) oracle samples would be sufficient, we obtain a reduction by a factor ≈4.

Even though L F 2 introduces linear dependencies between queries, this doesn’t seem to have any noticeable impact on the success probability in recovering the secret value.

Remark 7

A general observation for all these three algorithms, shown also by our results, is that the bias has a big impact on the number of queries and the complexity. Recall that the bias has value \(\delta ^{2^{a-1}}\) at the end of the reduction phase. We can see from our tables that the lower the value of τ, i.e. the larger the value of δ = 1−2τ, the higher a can be chosen to solve the L P N instance. Also, for a constant τ, the larger the size of the secret, the higher a can be chosen.

Remark 8

The L F 2 algorithm is a variation of L F 1 that offers a different heuristic technique to decrease the number of initial queries. The same trick could be used for B K W , exhaustive search and MITM.

While the same analysis can be applied for exhaustive search and MITM as for L F 2, B K W is a special case. We denote by B K W 2 this variation of B K W where we use the reduction phase from L F 2. Recall that for B K W we need to have \(n = 2^{b+1} \delta ^{-2^{a}} \ln \left (\frac {b}{\theta }\right ) + (a-1)2^{b}\) queries and here the dominant term is \(2^{b+1} \delta ^{-2^{a}} \ln \left (\frac {b}{\theta }\right )\). Thus, we need to start with \(3 \cdot 2^{b+\varepsilon }\) queries, where ε>0, and increase their number such that at the end of the last iteration of the reduction we get the required number of queries. This improves the initial number of queries and we gain a factor a in the time complexity. For an L P N 48,0.1 instance, our implementation of B K W 2 requires \(n = 2^{13.82} = 3.54 \cdot 2^{12}\) initial queries and increases it, during the reduction phase, up to \(2^{19.51}\), the amount of queries needed for the solving phase. Thus, there is an improvement from \(2^{19.99}\) (See Table 3) to \(2^{13.82}\) and the time complexity is better. While this is an improvement over B K W, it still performs worse than L F 1 and L F 2.

5.4 Exhaustive search

Recall that for exhaustive search we have two variants. The results for S e a r c h 1 are displayed in Tables 8 and 9. For S e a r c h 1 we observe that the gap between theory and practice is of a factor smaller than 4. In terms of number of queries, S e a r c h 1 brings a small improvement compared to L F 1. We will see in the next section the complete comparison between all the implemented algorithms. The same \((a-1)2^{b}\) dominant term causes the bigger difference for the instances L P N 48,0.4 and L P N 64,0.25.

Table 8 S e a r c h 1 query complexity - practice
Table 9 S e a r c h 1 query complexity - theory

The results for S e a r c h 2 are displayed in Tables 10 and 11.

Table 10 S e a r c h 2 query complexity - practice
Table 11 S e a r c h 2 query complexity - theory

We notice that for both S e a r c h 1 and S e a r c h 2 the instances L P N 32,0.01, L P N 48,0.01 and L P N 64,0.01 require a very low number of queries. This is due to the following observation: for n ≤ 68 linearly independent queries and τ = 0.01 we have that the noise bits are all 0 with a probability larger than 50 %. Thus, for k ≤ 64 we hope that the k queries we receive from the oracle have all the noise set on 0. With k noiseless, linearly independent queries we can just recover s with Gaussian elimination. This is an application of Remark 3.

5.5 MITM

In the case of MITM, the experimental and theoretical results are illustrated in Tables 12 and 13. There is a very small difference between MITM and the exhaustive search algorithms for a sparse secret: in practice, MITM requires just a few tens of queries fewer than S e a r c h 1 and S e a r c h 2 for the same a and b parameters.

Table 12 MITM query complexity - practice
Table 13 MITM query complexity - theory

5.6 Gaussian elimination

As aforementioned, in the Gaussian elimination the only assumption we need is that the noise is sparse. We don’t run any reduction technique and the noise is not affected. As the algorithm depends on the probability of having a 0 noise on k linearly independent vectors, the complexity degrades very quickly once we are outside the sparse noise scenario. We present in Table 14 the theoretical results obtained for this algorithm.

Table 14 Gaussian elimination query complexity - theory

In the next section we will show the effectiveness of this simple idea in the sparse case scenario and compare it to the other L P N solving algorithms.

Again for L P N 32,0.01, L P N 48,0.01 and L P N 64,0.01 we apply Remark 3.

5.7 Covering codes

The covering code algorithm requires the existence of a code with optimal coverage. For each instance one has to find an optimal code that minimizes the query and time complexity. Unlike the previous algorithms, this algorithm cannot be fully automated. In practice we could test only the cases that were suggested in [24]. Thus, we are not able to compare the theoretical and practical values. Nevertheless, we will give theoretical values for different practical parameters in the next section.

6 Complexity analysis of the L P N solving algorithms

We have compared our theoretical bounds with our practical results and we saw that there is a small difference between the two. Our theoretical analysis also gives tighter bounds compared with the results from [33]. We now extend our theoretical results and compare the asymptotic performance of all the L P N algorithms for practical parameters used by the L P N-based constructions. We consider the family of \(\mathsf {LPN}_{k,\frac {1}{\sqrt {k}}}\) instances proposed in [3, 15]. Although the covering code cannot be automated, as for each instance we have to try different codes with different sizes and dimensions, we provide results also for this algorithm. When dealing with the covering code reduction, we always assume the existence of an ideal code and compute the bias introduced by this step. We do not consider here concatenation of ideal codes and we will see that we obtain a worse result for the L P N 512,0.125 instance compared with the result from [24], although the difference is small. In the covering code algorithm we also stick with the B K W reduction steps and don’t use the L F 2 reduction. As aforementioned, the L F 2 reduction brings a small improvement to the final complexity. This does not affect the comparison between all the L P N solving algorithms.

We analyse the time complexity of each algorithm, by which we mean the number of bit operations the algorithm performs while solving an L P N problem. For each algorithm, we consider values of k for which the parameters (a, b) minimising the time complexity are such that k = ab. For the L F 2 algorithm, we select the initial number of queries such that we are left with at least \(n^{\prime }=8\ln (3 \cdot 2^{b})\delta ^{-2^{a}}\) queries after the reduction phase. Recall that by S e a r c h 1 we denote the standard exhaustive search algorithm and that S e a r c h 2 makes use of a Walsh-Hadamard transform. The results are illustrated in Fig. 1. We recall the time complexity and the initial number of queries for each algorithm in Table 15, where S represents the number of sparse secrets, with \(S < 2^{b}\). For MITM, the values S 1 (resp. S 2) represent the number of possible values for the first (resp. second) half of the secret, \(n^{\prime } = 8 (\ln (6 S_{1}S_{2})) \delta ^{-2^{a}}\) represents the number of queries left after the reduction phase and ξ represents the Hamming weight of the mask used. In the case of the covering codes algorithm, all \(a,b,a^{\prime },k^{\prime },k^{\prime \prime },l,w_{0}, \varepsilon _{set}\) are parameters of the algorithm and \(n^{\prime }\) represents the number of queries left after the reduction phase. Recall that θ is \(\frac {1}{3}\).

Fig. 1 Time Complexity of L P N Algorithms on instances \(\mathsf {LPN}_{k, \frac {1}{\sqrt {k}}}\)

Table 15 Query & Time complexity for L P N solving algorithms for recovering the first b bits

We can bound the logarithmic complexity of all these algorithms from above by \(\frac {k}{\log _{2}(k)} + c_{1}\) and from below by \(c_{3} \log _{2}(k) + \frac {\sqrt {k}}{\ln (2)} + c_{2}\). The lower bound is given by the asymptotic complexity of the Gaussian elimination, which can be expressed as \(c \log _{2}{k} + \frac {\sqrt {k}}{\ln (2)}\) when \(\tau = \frac {1}{\sqrt {k}}\).

The complexity of B K W can be written as \(\min _{k=ab} \left (\mathsf {poly} \cdot 2^{b} \cdot \delta ^{-2^{a}}\right )\) and for the other algorithms (except the Gaussian elimination) the formula is \(\min _{k=ab} \left (\mathsf {poly} \cdot \left (2^{b} + \delta ^{-2^{a}}\right )\right )\), where p o l y denotes a polynomial factor. By searching for the optimal a, b values, we have two cases:

  • for a>1, we find \(a\sim \log _{2}\frac {k}{(\log _{2}k)^{2}\ln \frac 1\delta }\) and \(b = \frac {k}{a}\) and obtain that \(2^{b}\) dominates \(\delta ^{-2^{a}}\). For \(\delta = 1 - \frac {2}{\sqrt {k}}\) we obtain the complexity \(\mathsf {poly} \cdot 2^{\frac {k}{\log _{2}(k)}}\).

  • for a = 1, we have that for

    • B K W the complexity is \(\mathsf {poly} \cdot 2^{k}\)

    • L F 1, L F 2, S e a r c h 2 the complexity is \(\mathsf {poly} + k 2^{k}\)

    • S e a r c h 1 , MITM the complexity is \(\mathsf {poly} \cdot S_{r}\) and \(\mathsf {poly} \cdot S_{r^{\prime }}^{2}\), respectively, where we define S r to be \(\#\{v\in \{0,1\}^{k} \mid HW(v)\leq r\}\). We need to bound the value of S r . By induction we can show that \(S_{r} \leq \frac {k}{k-r-1} \cdot \frac {k^{r}}{r!}\). For \(\tau \approx \frac {1}{\sqrt {k}}\), we have that \(r \approx \left (1 + \frac {\sigma }{2}\right ) \sqrt {k}\) and \(r^{\prime } \approx \left (\frac {1}{2} + \frac {\sigma }{2 \sqrt {2}}\right ) \sqrt {k}\). We obtain that the complexity for both algorithms is \(\mathsf {poly} \cdot 2^{\gamma \sqrt {k} \log _{2}{k} + \mathcal {O}\left (\sqrt {k}\right )}\), where γ is a constant. This is not better than \(2^{\frac {k}{\log _{2}(k)}}\) for k < 200 000, but asymptotically this gives a better complexity.

For Gaussian elimination, the complexity is \(\frac {\mathsf {poly}}{(1-\tau )^{k}}\) which is \(\mathsf {poly} \cdot 2^{\sqrt {k}}\) for \(\tau = \frac {1}{\sqrt {k}}\).

We see in Fig. 1 that in some cases increasing the value of k may result in a decrease in time complexity. The reason for this is that we are considering L P N instances where the noise parameter τ takes value \(\frac {1}{\sqrt {k}}\). Thus, as k grows, the noise is reduced, which leads to an interesting trade-off between the complexity of the solving phase and the complexity of the reduction phase of the various algorithms. This behaviour does not seem to occur for the B K W algorithm. In this case, the query complexity \(n=2^{b+1}\left (1-\frac {2}{\sqrt {k}}\right )^{-2^{a}}\ln (2k) + (a -1)2^{b}\) is largely dominated by the first term, which grows exponentially not only in terms of the noise parameter, but also in terms of the block size b.

Remark 9 (L F 1 vs. S e a r c h 2 )

As shown in Fig. 1, the overall complexity of the L F 1 and S e a r c h 2 algorithms is quasi identical. From Theorems 5 and 9, we deduce that for the same parameters (a, b), the S e a r c h 2 algorithm should perform better as long as S < 2b−1. This is indeed the case for the instances we consider here, although the difference in complexity is extremely small.

We can see clearly that for the \(\mathsf {LPN}_{k,\frac {1}{\sqrt {k}}}\) family of instances, the Gaussian elimination outperforms all the other algorithms for k>500. No k < 1000 makes \(\mathsf {LPN}_{k,\frac {1}{\sqrt {k}}}\) reach 80 bit security; this level is achieved for k = 1090.

Selecting secure parameters

We recall that for each algorithm we considered, our analysis made use of a heuristic assumption of query and noise independence after reduction. In order to propose security parameters, we simply consider the algorithm which performs best under this assumption.

By taking all the eight algorithms described in this article, Tables 16, 17, 18, 19, 20, 21, 22, 23 display the logarithmic time complexity for various L P N parameters. For instance, the L F 2 algorithm requires \(2^{84}\) steps to solve an L P N 384,0.25 instance.

Table 16 Security of L P N against B K W
Table 17 Security of L P N against L F 1
Table 18 Security of L P N against L F 2
Table 19 Security of L P N against S e a r c h 1
Table 20 Security of L P N against S e a r c h 2
Table 21 Security of L P N against MITM
Table 22 Security of L P N against Gaussian elimination
Table 23 Security of L P N against Covering codes

We recall here the result from [24]: the instance L P N 512,0.125 offers a security level of 79.7 bits. We obtain a value of 82 bits. The difference comes mainly from the use of the L F 2 reduction in [24] and from a search for an optimal concatenation of linear codes.

When comparing all the algorithms, we have to keep in mind that the Gaussian elimination recovers the whole secret, while for the rest of the algorithms we give the complexity to recover a block of the secret. Still, this does not affect our comparison as we have proven in Section 3 that the complexity of recovering the first block dominates the total complexity.

We highlight with red the best values obtained for different L P N instances. We observe the following behaviour: for a sparse case scenario, i.e. τ = 0.05 for k≥576 or \(\tau = \frac {1}{\sqrt {k}} < 0.05\), the Gaussian elimination offers the best performance. For \(\tau = \frac {1}{\sqrt {k}}\), no k from our tables offers an 80 bit security. Once we are outside the sparse case scenario, we have that L F 2 and the covering code algorithms are the best ones. The covering code proves to be better than L F 2 for a noise level of 0.125. While the performance of the covering code reduction highly depends on the sparseness of the noise, L F 2 has a more general reduction phase and is more efficient for noise parameters of 0.25 and 0.4. Also, for τ>0.05 the covering code is better than the Gaussian elimination.

Thus, for different scenarios, there are different algorithms that prove to be efficient. This comparison clearly shows that for the family of instances \(\mathsf {LPN}_{k,\frac {1}{\sqrt {k}}}\) neither B K W nor its variants are the best choice. One should use the Gaussian elimination algorithm.

As we have shown, there still remains a small gap between the theoretical and practical results for the algorithms we analysed. It thus seems reasonable to take a safety margin when selecting parameters to achieve a certain level of security.

Based on this analysis, we could recommend the L P N instances L P N 512,0.25, L P N 640,0.125, L P N 1280,0.05 or \(\mathsf {LPN}_{1280, \frac {1}{\sqrt {1280}}}\) to achieve 80 bit security for different noise levels. We note that the value L P N 768,0.05 that Levieil and Fouque suggest as a secure instance to use actually offers only 66 bit security and thus is not recommended.

7 Conclusion

In this article we have analysed and presented the existing L P N algorithms in a unified framework. We introduced a new theoretical analysis and this has improved the bounds of Levieil and Fouque [33]. In order to give a complete analysis of the L P N solving algorithms, we also presented three algorithms that take advantage of a sparse secret. We also analysed the latest algorithm, presented at Asiacrypt 2014. While the covering code and the L F 2 algorithms perform best in the general case where the Bernoulli noise parameter is constant, the Gaussian elimination shows that for the sparse case scenario the length of the secret should be bigger than 1100 bits. Also, we show that some values proposed by Levieil and Fouque are insecure in the sparse case scenario.