1 Introduction

The hardness of noisy learning problems such as learning parity with noise (LPN) [BFKL93, BKW03] and learning with errors (LWE) [Reg05] has proved to be a goldmine in modern cryptography. The hardness of \(\mathsf {LWE}\) has been instrumental in solving long-standing problems such as fully homomorphic encryption [Gen09, BV11]. Both \(\mathsf {LPN}\) and \(\mathsf {LWE}\) have given us efficient and plausibly quantum-proof cryptographic constructions [KPC+11, BCD+16, ADPS16]. However, while we know several structural results about LWE, relatively little is known about the 25-year-old LPN problem.

Before we proceed, let us define the LPN and LWE problems. In the (search version of the) \(\mathsf {LPN}\) problem, the algorithm is given access to an oracle that produces samples \((\mathbf {{a}}_i, \mathbf {{s}}^T \mathbf {{a}}_i + e_i)\) where \(\mathbf {{s}}\in \mathbb {Z}_2^n\) is the “secret” vector, the \(\mathbf {{a}}_i \in \mathbb {Z}_2^n\) are uniformly distributed and the \(e_i \in \mathbb {Z}_2\) come from the Bernoulli distribution (that is, each \(e_i\) is 1 with probability \(\epsilon \) and 0 otherwise). The goal is to recover \(\mathbf {{s}}\). The (search version of the) \(\mathsf {LWE}\) problem is the same except for two key changes: first, the vectors \(\mathbf {{a}}_i \in \mathbb {Z}_q^n\) are uniformly random with entries from some large enough finite field \(\mathbb {Z}_q\), and second, each error term \(e_i\) is chosen from the discrete Gaussian distribution over the integers. The exact choice of the error distribution does not matter much: what is important is that in \(\mathsf {LWE}\), each sample has an error with bounded absolute value (at least with high probability). These seemingly minor differences matter a great deal: we know worst-case to average-case reductions for \(\mathsf {LWE}\) [Reg05, Pei09, BLP+13] while no such result is known for \(\mathsf {LPN}\); we know that (a decisional version of) \(\mathsf {LWE}\) is in the complexity class \(\mathcal {SZK}\) [MV03] (statistical zero-knowledge) while no such result is known for \(\mathsf {LPN}\); and we can build a dizzying array of cryptographic primitives assuming the hardness of \(\mathsf {LWE}\) (e.g., attribute-based encryption and homomorphic encryption, to name the more exotic examples) while the repertoire of \(\mathsf {LPN}\) is essentially limited to one-way functions and public-key encryption (and primitives that can be constructed generically from them). In particular, we do not know how to construct even simple, seemingly “unstructured”, primitives such as a collision-resistant hash function from the hardness of \(\mathsf {LPN}\), even with extreme parameter choices. Can we bridge this puzzling gap between \(\mathsf {LWE}\) and \(\mathsf {LPN}\)?

In a nutshell, the goal of this paper is to solve all three of these problems. Our main tool is a smoothing lemma for binary linear codes. We proceed to describe our results and techniques in more detail.

1.1 Overview of Our Results and Techniques

Worst-Case to Average-Case Reduction. We consider the promise nearest codeword problem (NCP), a worst-case analog of the learning parity with noise problem. Roughly speaking, in the search version of the (n, m, w)-promise nearest codeword problem, one is given the generator matrix \(\mathbf {{C}} \in \mathbb {Z}_2^{n \times m}\) of a linear code, along with a vector \(\mathbf {{t}} \in \mathbb {Z}_2^m\) such that \(\mathbf {{t}}^T = \mathbf{s}^T \mathbf {{C}} + \mathbf {{x}}^T\) for some \(\mathbf{s}\in \mathbb {Z}_2^n\) and \(\mathbf {{x}} \in \mathbb {Z}_2^m\), with the promise that \(\mathsf {wt}(\mathbf {{x}}) = w\). The problem is to find \(\mathbf{s}\). The non-promise version of this problem (which is commonly called the nearest codeword problem) is known to be \(\mathcal {NP}\)-hard, even to solve approximately [ABSS93], and the promise problem is similarly \(\mathcal {NP}\)-hard in the large-error regime (that is, when the Hamming weight of \(\mathbf {{x}}\) exceeds \((1/2+\epsilon )d\) where d is the minimum distance of the code and \(\epsilon >0\) is an arbitrarily small constant) [DMS99].

In terms of algorithms, Berman and Karpinski [BK02] show how to find an \(O(n/\log n)\)-approximate nearest codeword in polynomial time. In particular, this means that if the Hamming weight of \(\mathbf {{x}}\) in the promise version is at most \(O(d\cdot \log n/n)\), their algorithm efficiently recovers the \(\mathbf{s}\) corresponding to the unique nearest codeword. To the best of our knowledge, this result is the current limit of polynomial-time solvability of the promise nearest codeword problem. Alon, Panigrahy and Yekhanin [APY09] show a deterministic nearly-polynomial time algorithm with the same parameters. In this work, we consider the promise NCP for balanced codes, where all nonzero codewords have Hamming weight between \((1/2-\beta )m\) and \((1/2+\beta )m\) for some balance parameter \(\beta >0\). We are not aware of improved NCP algorithms that apply to balanced codes.

Our first result (in Sect. 4) shows a reduction from the worst-case promise NCP for balanced codes where \(w/m \approx \frac{\log ^2 n}{n}\) to the average-case hardness of \(\mathsf {LPN}^{n}_{{\epsilon }}\) with very high error-rate \(\epsilon = 1/2 - 1/O(n^4)\). We note that a random linear code is \(\beta \)-balanced with overwhelming probability when \(\beta \ge 3\sqrt{n/m}\), so for a sufficiently large m the restriction on \(\beta \) is satisfied by most codes. Thus, qualitatively speaking, our result shows that solving LPN with very high error on the average implies solving NCP with very low error for most codes. While the parameters we achieve are extreme, we emphasize that no worst-case to average-case reduction for LPN was known prior to our work.

The worst-case to average-case reduction is a simple consequence of a smoothing lemma for codes that we define and prove in Sect. 3. In a nutshell, our smoothing lemma gives a simple randomized procedure that maps a worst-case linear code \(\mathcal{C}\) and a vector \(\mathbf {{t}}\) to a random linear code \(\mathcal{C}'\) and a vector \(\mathbf {{t}}'\) such that if \(\mathbf {{t}}\) is very close to \(\mathcal{C}\), then \(\mathbf {{t}}'\) is somewhat close to \(\mathcal{C}'\). Our worst-case to average-case reduction then follows simply by applying the smoothing lemma to the worst-case code and vector. We give a simple Fourier-analytic proof of the smoothing lemma, in a way that is conceptually similar to analogous statements in the context of lattices [MR04] (see more details at the end of Sect. 3). Similar statements have been shown before in the list-decoding high-error regime [KS10], whereas our setting for NCP is in the unique-decoding (low-error) regime.

Statistical Zero-Knowledge. Another consequence of our smoothing lemma is a statistical zero-knowledge proof for the NCP problem for balanced codes with low noise, namely where \(w/m \approx \frac{\log ^2 n}{n}\). In particular, we show that the search problem is in \(\mathcal {BPP}^{\mathcal {SZK}}\). Membership in \(\mathcal {BPP}^{\mathcal {SZK}}\) should be viewed as an easiness result: a consequence of this result and a theorem of Mahmoody and Xiao [MX10] is that NCP with low noise cannot be \(\mathcal {NP}\)-hard unless the polynomial hierarchy collapses. Our result is the first non-\(\mathcal {NP}\)-hardness result we know for NCP, complementing the \(\mathcal {NP}\)-hardness result of Dumer, Micciancio and Sudan [DMS99] for noise slightly larger than half the minimum distance, namely where \(w/m \approx 1/2\) (but leaving a large gap in between). This is the LPN/codes analog of a result for LWE/lattices that we have known for over a decade [MV03]. We refer the reader to Sect. 5 for this result.

Collision-Resistant Hashing. Finally, we show a new cryptographic consequence of the hardness of LPN with low noise, namely a construction of a collision-resistant hash (CRH) function. Again, collision-resistant hashing from LWE/lattices has been known for over two decades [Ajt96, GGH96] and we view this result as an LPN/codes analog. The construction is extremely simple: the family of hash functions is parameterized by a matrix \(\mathbf {A} \in \mathbb {Z}_2^{n\times n^{1+c}}\) for some \(c>0\), its domain is the set of vectors \(\mathbf {{x}} \in \mathbb {Z}_2^{n^{1+c}}\) with Hamming weight \(2n/(c\log n)\) and the output is simply \(\mathbf {A}\mathbf {{x}} \pmod {2}\). This is similar to a CRH construction from the recent work of Applebaum et al. [AHI+17] modulo the setting of parameters; what is new in our work is a reduction from the LPN problem with error rate \(O(\log ^2 n/n)\) to breaking this CRH function.

Related Work. Our LPN-based collision-resistant hash function was used in [BLSV17] as a basis for constructing an identity based encryption scheme based on LPN with very low noise. Concurrently with, and independently from, our work, Yu et al. [YZW+17] constructed a family of collision-resistant hash functions based on the hardness of LPN using the same main idea as in Sect. 6 of the present work. While the core ideas of the construction in the two works are identical, [YZW+17] further discusses different parameter settings as well as heuristics under which one can obtain a tighter connection between the hardness of the CRH and the LPN problem.

2 Preliminaries

2.1 Notation

Throughout the paper, we will be working with elements in the additive group \(\mathbb {Z}_2\) with the usual addition operation. We will denote by bold lower-case letters vectors over \(\mathbb {Z}_2^n\) for \(n>1\), and by bold upper-case letters matrices over \(\mathbb {Z}_2^{m\times n}\) for \(m,n>1\). We will make the assumption that all vectors are column vectors and write \(\mathbf{a}^T\) to denote the row vector which is the transpose of \(\mathbf{a}\). The Hamming weight of \(\mathbf{a}\in \mathbb {Z}_2^n\), written as \(\mathsf {wt}(\mathbf{a})\), denotes the number of 1’s in \(\mathbf{a}\). For a set S, we write \(s\leftarrow S\) to denote that s is chosen uniformly at random from S. When D is some probability distribution, then \(s\leftarrow D\) means that s is chosen according to D.

The \(\mathsf {Ber}_\epsilon \) distribution over \(\mathbb {Z}_2\) is the Bernoulli distribution that outputs 1 with probability \(\epsilon \) and 0 with probability \(1-\epsilon \). Let \(\mathcal {S}^{m}_{k}\) be the set of all the elements \(\mathbf{s}\in \mathbb {Z}_2^m\) such that \(\mathsf {wt}(\mathbf{s})=k\).

A negligible function \(\mathrm{negl}(n)\) is any function that vanishes faster than the inverse of every polynomial in n. In particular, for every polynomial p there is an \(n_0 \in \mathbb {N}\) such that for every \(n>n_0\), \(\mathrm{negl}(n) < 1/p(n)\).

2.2 The Learning Parity with Noise (LPN) Problem

For an \(\mathbf{s}\in \mathbb {Z}_2^n\), and an \(\epsilon \in [0,\frac{1}{2}]\) let \(\mathcal {O}^n_{\mathbf{s},\epsilon }\) be an algorithm that, when invoked, chooses a random \(\mathbf{a}\leftarrow \mathbb {Z}_2^n\) and \(e\leftarrow \mathsf {Ber}_\epsilon \) and outputs \((\mathbf{a},\mathbf{s}^T\mathbf{a}+e)\). An algorithm \(\mathsf A\) is said to solve the search \(\mathsf {LPN}^{n}_{{\epsilon }}\) problem with probability \(\delta \) if

$$\Pr [\mathsf A^{\mathcal {O}^n_{\mathbf{s},\epsilon }} \Rightarrow \mathbf{s}~;~ \mathbf{s}\leftarrow \mathbb {Z}_2^n]\ge \delta .$$

Let \(\mathcal {U}^n\) be an algorithm that, when invoked, chooses random \(\mathbf{a}\leftarrow \mathbb {Z}_2^n\) and \(b\leftarrow \mathbb {Z}_2\) and outputs \((\mathbf{a},b)\). We say that an algorithm \(\mathsf A\) has advantage \(\delta \) in solving the decisional \(\mathsf {LPN}^{n}_{{\epsilon }}\) problem if

$$\left| \Pr [\mathsf A^{\mathcal {O}^n_{\mathbf{s},\epsilon }}\Rightarrow 0;~\mathbf{s}\leftarrow \mathbb {Z}_2^n] - \Pr [\mathsf A^{\mathcal {U}^n}\Rightarrow 0]\right| \ge \delta .$$

The LPN problem has a search to decision reduction (cf. [KS06]). Namely, if there is an algorithm that runs in time t and has advantage \(\delta \) in solving the decisional \(\mathsf {LPN}^{n}_{{\epsilon }}\) problem, then there is an algorithm that runs in time \(O(nt/\delta )\) that solves the search \(\mathsf {LPN}^{n}_{{\epsilon }}\) problem with probability \(\approx 1\).
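To make the oracles above concrete, the following is a minimal Python sketch (our own illustration, assuming numpy; all names are illustrative and not part of the original text) of the sample oracle \(\mathcal {O}^n_{\mathbf{s},\epsilon }\) and the uniform oracle \(\mathcal {U}^n\).

```python
# A minimal sketch of the LPN sample oracle O^n_{s,eps} and the uniform oracle U^n.
import numpy as np

def lpn_oracle(s, eps, rng):
    """One sample (a, s^T a + e mod 2) with a uniform in Z_2^n and e ~ Ber_eps."""
    a = rng.integers(0, 2, size=len(s))
    e = int(rng.random() < eps)
    return a, (int(s @ a) + e) % 2

def uniform_oracle(n, rng):
    """One sample (a, b) with a and b uniform and independent."""
    return rng.integers(0, 2, size=n), int(rng.integers(0, 2))

# Example usage with toy parameters.
rng = np.random.default_rng(0)
s = rng.integers(0, 2, size=16)                       # the secret
samples = [lpn_oracle(s, 0.125, rng) for _ in range(10)]
```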

The following fact is known in some contexts as The Piling-Up Lemma [Mat93].

Lemma 2.1

For all \(\epsilon \in [0,\frac{1}{2}]\) it holds that \(\Pr [e_1+\ldots +e_k = 0;~e_i\leftarrow \mathsf {Ber}_\epsilon ]=\frac{1}{2}+\frac{1}{2}\cdot \left( 1-2\epsilon \right) ^k\).
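For completeness, here is a short inductive derivation of Lemma 2.1 (ours, not part of the original text). Writing \(p_k = \Pr [e_1+\ldots +e_k = 0]\), the base case is immediate, and the inductive step conditions on whether \(e_{k+1}\) flips the running sum:

$$\begin{aligned} p_1&= 1-\epsilon = \tfrac{1}{2}+\tfrac{1}{2}(1-2\epsilon ), \\ p_{k+1}&= (1-\epsilon )\,p_k + \epsilon \,(1-p_k) = \epsilon + (1-2\epsilon )\,p_k = \tfrac{1}{2}+\tfrac{1}{2}(1-2\epsilon )^{k+1}. \end{aligned}$$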

2.3 The Nearest Codeword Problem

A (binary) (n, m, d)-code \(\mathcal{C}\) is a subset of \(\{0,1\}^m\) such that \(|\mathcal{C}| = 2^n\) and for any two distinct codewords \(\mathbf {{x}}, \mathbf {{y}}\in \mathcal{C}\), \(\mathsf {wt}(\mathbf {{x}}\oplus \mathbf {{y}}) \ge d\). The code is linear (denoted [n, m, d]-code) if \(\mathcal{C}\) is the row span of some matrix \(\mathbf {{C}}\in \{0,1\}^{n \times m}\).

Definition 2.1

(Nearest Codeword Problem \((\mathsf {NCP})\)). The nearest codeword problem \(\mathsf {NCP}_{n, m, w}\) is characterized by \(n,m,w \in \mathbb {Z}\) and is defined as follows. The input consists of a matrix \(\mathbf {{C}} \in \mathbb {Z}_2^{n \times m}\) which is the generator of a code, along with a vector \(\mathbf {{t}} \in \mathbb {Z}_2^m\) such that \(\mathbf {{t}}^T = \mathbf{s}^T \mathbf {{C}} + \mathbf {{x}}^T\) for some \(\mathbf{s}\in \mathbb {Z}_2^n, \mathbf {{x}} \in \mathbb {Z}_2^m\) with \(\mathsf {wt}(\mathbf {{x}}) = w\). The problem is to find \(\mathbf{s}\).

Note that our definition requires \(\mathsf {wt}(\mathbf {{x}})=w\), as opposed to the more relaxed requirement \(\mathsf {wt}(\mathbf {{x}})\le w\). However, since w comes from a polynomial-size domain \(\{0,\ldots , m\}\), the difference is not very substantial (in particular, to solve the relaxed version one can go over all polynomially-many relevant values of w and try solving the exact version).

In this work, we consider a variant of the problem which is restricted to balanced codes, namely codes in which every non-zero codeword has Hamming weight close to m/2. We start by defining balanced codes and then present balanced \(\mathsf {NCP}\).

Definition 2.2

A code \(\mathcal{C}\subseteq \{0,1\}^m\) is \(\beta \)-balanced if its minimum distance is at least \(\tfrac{1}{2}(1-\beta )m\) and maximum distance is at most \(\tfrac{1}{2}(1+\beta )m\).

Definition 2.3

(balanced \(\mathsf {NCP}\) (\(\mathsf {balNCP}\))). The balanced nearest codeword problem \(\mathsf {balNCP}_{n, m, w, \beta }\) is characterized by \(n,m,w \in \mathbb {Z}\) and \(\beta \in (0,1)\), and is defined as follows. The input consists of a matrix \(\mathbf {{C}} \in \mathbb {Z}_2^{n \times m}\) which is the generator of a \(\beta \)-balanced code, along with a vector \(\mathbf {{t}} \in \mathbb {Z}_2^m\) such that \(\mathbf {{t}}^T = \mathbf{s}^T \mathbf {{C}} + \mathbf {{x}}^T\) for some \(\mathbf{s}\in \mathbb {Z}_2^n, \mathbf {{x}} \in \mathbb {Z}_2^m\) with \(\mathsf {wt}(\mathbf {{x}}) = w\). The problem is to find \(\mathbf{s}\).

The \(\mathsf {balNCP}_{n, m, w, \beta }\) problem has a unique solution when \(w \le \frac{1}{4}(1- \beta )m\), since any two distinct codewords are then at distance at least \(\frac{1}{2}(1-\beta )m \ge 2w\) from each other, so at most one codeword can lie within distance w of \(\mathbf {{t}}\).

Standard decoding algorithms allow one to solve \(\mathsf {NCP}\) in polynomial time with success probability \((1-\tfrac{w}{m})^n\) [BK02], or even deterministically in time \((1-\tfrac{w}{m})^{-n}\cdot \mathrm{poly}(n,m)\) [APY09]. We are not aware of improved methods that apply to balanced codes.

To conclude this section, we show via a straightforward probabilistic argument that most sparse linear codes are indeed balanced (this is essentially the Gilbert-Varshamov Bound). This serves as a sanity check that the definition is not vacuous, and will also be useful when we apply our \(\mathcal {SZK}\) results to the \(\mathsf {LPN}^{n}_{{\epsilon }}\) problem, which naturally induces random codes.

Lemma 2.2

A random linear code \(\mathcal{C}\subseteq \mathbb {Z}_2^m\) of dimension n is \(\beta \)-balanced with probability at least \(1 - 2^{n - \beta ^2 m/4 +1}\). In particular, when \(\beta \ge 3\sqrt{n/m}\) a random linear code is \(\beta \)-balanced with probability \(1-\mathrm{negl}(n)\).

Proof

Let \(\mathbf {{C}} \leftarrow \mathbb {Z}_2^{n \times m}\) be a randomly chosen generator matrix. Then the associated code \(\mathcal{C}\) fails to be \(\beta \)-balanced if and only if there exists some \(\mathbf {{s}} \ne \mathbf {0} \in \mathbb {Z}_2^n\) such that \(|\mathsf {wt}(\mathbf {{s}}^T \mathbf {{C}}) - \frac{m}{2}| > \tfrac{\beta }{2}m\). For any fixed \(\mathbf {{s}}\ne \mathbf {0}\) the vector \(\mathbf {{s}}^T\mathbf {{C}}\) is uniformly random in \(\mathbb {Z}_2^m\) and therefore by the Chernoff bound:

$$ \Pr \left[ \left| ~\mathsf {wt}(\mathbf {{s}}^T \mathbf {{C}}) - \frac{m}{2} \right| > \tfrac{\beta m}{2} \right] \le 2 \exp \left( -\frac{\beta ^2 m}{4}\right) $$

By the union bound, the probability that the code is not \(\beta \)-balanced is at most

$$2^{n+1} \exp \left( -\frac{\beta ^2 m}{4}\right) \le 2^{n -\frac{\beta ^2 m}{4} + 1}.$$

This is negligible in n when \(\beta \ge 3\sqrt{n/m}\).
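As a quick numerical sanity check of Lemma 2.2 (a sketch of ours, assuming numpy; the parameters are illustrative), one can sample random generator matrices and test balancedness by enumerating all nonzero messages:

```python
# Estimate the fraction of random codes of dimension n and length m that are beta-balanced.
import numpy as np

def is_balanced(C, beta):
    n, m = C.shape
    for s_int in range(1, 2 ** n):                        # all nonzero messages s
        s = np.array([(s_int >> j) & 1 for j in range(n)])
        w = int(((s @ C) % 2).sum())                      # wt(s^T C)
        if abs(w - m / 2) > beta * m / 2:
            return False
    return True

rng = np.random.default_rng(1)
n, m = 8, 640
beta = 3 * (n / m) ** 0.5
trials = 50
frac = sum(is_balanced(rng.integers(0, 2, size=(n, m)), beta)
           for _ in range(trials)) / trials
print(f"fraction of beta-balanced random codes: {frac:.2f}")  # should be close to 1
```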

2.4 Statistical Zero Knowledge

Statistical zero-knowledge (\(\mathcal {SZK}\)) is the class of all problems that admit a zero-knowledge proof [GMR89] in which the zero-knowledge (simulation) guarantee holds statistically. Sahai and Vadhan [SV03] showed that the following problem is complete for \(\mathcal {SZK}\).

Definition 2.4

The promise problem Statistical Distance (SD) is defined by the following YES and NO instances. For a circuit \(C:\{0,1\}^n\rightarrow \{0,1\}^m\), we let \(C(U_n)\) denote the probability distribution on m-bit strings obtained by running C on a uniformly random input. Let \(\mathsf {SD}(D_0,D_1)\) denote the statistical (variation) distance between the distributions \(D_0\) and \(D_1\).

$$\begin{aligned} \varPi _{YES} := \{(C_0,C_1):\ C_0, C_1: \{0,1\}^n\rightarrow \{0,1\}^m \text{ and } \mathsf {SD}(C_0(U_n),C_1(U_n)) \ge 2/3\} \\ \varPi _{NO} := \{(C_0,C_1):\ C_0, C_1: \{0,1\}^n\rightarrow \{0,1\}^m \text{ and } \mathsf {SD}(C_0(U_n),C_1(U_n)) \le 1/3\} \end{aligned}$$

By \(\mathcal {BPP}^{\mathcal {SZK}}\), we mean decision problems that can be reduced to the statistical distance problem using randomized reductions. While in general such a reduction could query the SD oracle on inputs that violate the promise (namely, a pair of circuits/distributions whose statistical distance lies strictly between 1/3 and 2/3), the reductions we present in this paper will respect the SD promise. Search-\(\mathcal {BPP}^{\mathcal {SZK}}\) is defined analogously.

3 A Smoothing Lemma for Noisy Codewords

Let \(\mathcal{C}\subseteq \mathbb {Z}_2^m\) be a binary linear code with generating matrix \(\mathbf {{C}} \in \mathbb {Z}_2^{n\times m}\). We say that a distribution \(\mathcal{R}\) over \(\mathbb {Z}_2^m\) smooths \(\mathcal{C}\) if the random variable \(\mathbf {{C}}\mathbf {{r}}\) for \(\mathbf {{r}}\leftarrow \mathcal{R}\) is statistically close to uniform over \(\mathbb {Z}_2^n\). We say that \(\mathcal{R}\) also smooths noisy codewords if for every vector \(\mathbf {{x}}\) of sufficiently low Hamming weight, it holds that \((\mathbf {{C}}\mathbf {{r}}, \mathbf {{x}}^T\mathbf {{r}})\) is statistically close to the distribution \(\mathcal{U}_{\mathbb {Z}_2^n}\times \mathsf {Ber}_{\epsilon }\) for some \(\epsilon \).

The notion of smoothing will play an important role in our reductions in this work. In particular, we would like to characterize families of codes that are smoothed by distributions supported over low Hamming weight vectors. To this end, we show that for balanced codes, there exist such smoothing distributions. (Similar statements have been shown before in the high-error regime, e.g., by Kopparty and Saraf [KS10].)

We note that while our proof uses harmonic analysis, it is also possible to prove the lemma using the Vazirani XOR Lemma [Vaz86, Gol95]. However, we find that our method of using harmonic analysis demonstrates more straightforwardly the analogy of our lemma to smoothing in the lattice world (which is most often proved using harmonic analysis); see the comparison at the end of this section. Furthermore, this approach suggests a route towards analyzing the non-binary setting.

We start by defining our family of smoothing distributions \(\mathcal{R}_{d,m}\).

Definition 3.1

Let \(d, m \in {\mathbb N}\). The distribution \(\mathcal{R}_{d,m}\) over \(\mathbb {Z}_2^m\) is defined as follows. Sample (with replacement) d elements \(t_1, \ldots , t_d\) uniformly and independently from [m]. Output \(\mathbf {{x}} = \oplus _{i=1}^d \mathbf {{u}}_{t_i}\), where \(\mathbf {{u}}_j\) is the j-th standard basis vector. One can easily verify that \(\mathcal{R}_{d,m}\) is supported only over vectors of Hamming weight at most d.
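A minimal sketch (ours, assuming numpy) of sampling from \(\mathcal{R}_{d,m}\); coordinates hit an even number of times cancel, which is why the output has Hamming weight at most d.

```python
# Sample r = u_{t_1} xor ... xor u_{t_d} with each t_i uniform in [m] (with replacement).
import numpy as np

def sample_R(d, m, rng):
    r = np.zeros(m, dtype=int)
    for t in rng.integers(0, m, size=d):
        r[t] ^= 1
    return r

rng = np.random.default_rng(0)
r = sample_R(d=5, m=32, rng=rng)
assert int(r.sum()) <= 5          # wt(r) <= d
```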

We can now state and prove our smoothing lemma for noisy codewords.

Lemma 3.1

Let \(\beta \in (0,1)\) and let \(\mathbf {{C}} \in \mathbb {Z}_2^{n \times m}\) be a generating matrix for a \(\beta \)-balanced binary linear code \(\mathcal{C}\subseteq \mathbb {Z}_2^m\). Let \(\mathbf {{c}} \in \mathbb {Z}_2^m\) be a word of distance w from \(\mathcal{C}\). Let \(\mathbf {{s}}, \mathbf {{x}}\) be s.t. \(\mathbf {{c}}^T = \mathbf {{s}}^T\mathbf {{C}}+\mathbf {{x}}^T\) and \(\mathsf {wt}(\mathbf {{x}})=w\).

Consider the distribution \((\mathbf {{a}}, b)\) generated as follows. Sample \(\mathbf {{r}} \leftarrow \mathcal{R}_{d,m}\) and set \(\mathbf {{a}} = \mathbf {{C}}\mathbf {{r}}\), \(b = \mathbf {{c}}^T\mathbf {{r}}\). Then it holds that the joint distribution of \((\mathbf {{a}}, b-\mathbf {{s}}^T\mathbf {{a}})\) is within statistical distance \(\delta \) from the product distribution \(\mathcal{U}_{\mathbb {Z}_2^n}\times \mathsf {Ber}_{\epsilon }\), where

$$\begin{aligned} \delta\le & {} 2^{(n+1)/2} \cdot (\beta + \tfrac{2w}{m})^{d} \text { and } \\ \epsilon= & {} \tfrac{1}{2}-\tfrac{1}{2}(1-\tfrac{2w}{m})^{d}. \end{aligned}$$

Proof

Let e denote the value \(b-\mathbf {{s}}^T\mathbf {{a}}\). We bound the distance of \((\mathbf {{a}}, e)\) from \(\mathcal{U}_{\{0,1\}^n}\times \mathsf {Ber}_{\epsilon }\) using simple harmonic analysis. Let f be the probability density function of \((\mathbf {{a}}, e)\), and consider its (binary) Fourier Transform:

$$\begin{aligned} \hat{f}(\mathbf {{y}},z) = \mathop {{\mathbb E}}_{\mathbf {{a}},e}[(-1)^{\mathbf {{y}}^T \mathbf {{a}} + z e}] = \mathop {{\mathbb E}}_{\mathbf {{r}}}[(-1)^{(\mathbf {{y}}^T \mathbf {{C}}+z\mathbf {{x}}^T) \mathbf {{r}}}], \end{aligned}$$
(1)

It immediately follows that \(\hat{f}(\mathbf {{0}},0)=1\). Moreover

$$\begin{aligned} \hat{f}(\mathbf {{0}},1) = \mathop {{\mathbb E}}_{\mathbf {{r}}}[(-1)^{\mathbf {{x}}^T\mathbf {{r}}}]. \end{aligned}$$
(2)

Recalling that \(\mathbf {{r}} = \oplus _{i=1}^d \mathbf {{u}}_{t_i}\) we have

$$ \mathop {{\mathbb E}}_{\mathbf {{r}}}[(-1)^{\mathbf {{x}}^T\mathbf {{r}}}] = \prod _{i=1}^d \mathop {{\mathbb E}}_{t_i}[(-1)^{\mathbf {{x}}^T\mathbf {{u}}_{t_i}}] = (1-\tfrac{2w}{m})^d, $$

since each \(t_i\) is sampled uniformly and independently in [m] and thus has probability \(\tfrac{w}{m}\) of hitting a coordinate where \(\mathbf {{x}}\) is one. Recalling the definition of \(\epsilon \), we have \(\hat{f}(\mathbf {{0}},1)=1-2\epsilon \).

Now let us consider the setting where \(\mathbf {{y}} \ne \mathbf {{0}}\). In that case, let us denote \(\mathbf {{v}}^T = \mathbf {{y}}^T\mathbf {{C}}\), a nonzero codeword in \(\mathcal{C}\). Since \(\mathcal{C}\) is balanced it follows that \(\mathsf {wt}(\mathbf {{v}}) \in [\tfrac{1}{2}(1- \beta )m, \tfrac{1}{2}(1+ \beta )m]\). Let us further denote \((\mathbf {{v}}')^T = \mathbf {{y}}^T\mathbf {{C}}+z \mathbf {{x}}^T\); since \(\mathsf {wt}(\mathbf {{x}}) \le w\), it follows that \(\mathsf {wt}(\mathbf {{v}}') \in \tfrac{1}{2}(1\pm \beta ')m\) for \(\beta ' = \beta + \tfrac{2w}{m}\). For \(\mathbf {{y}} \ne \mathbf {{0}}\) we thus get

$$\begin{aligned} \hat{f}(\mathbf {{y}},z) = \mathop {{\mathbb E}}_{\mathbf {{r}}}[(-1)^{(\mathbf {{v}}')^T\mathbf {{r}}}] = \prod _{i=1}^d \mathop {{\mathbb E}}_{t_i}[(-1)^{(\mathbf {{v}}')^T\mathbf {{u}}_{t_i}}]. \end{aligned}$$
(3)

Since each \(t_i\) is sampled uniformly from [m], it follows that \((\mathbf {{v}}')^T\mathbf {{u}}_{t_i} \pmod {2}=0\) with probability \(\epsilon _i \in \frac{1}{2} (1\pm \beta ')\). Therefore for all \(i\in [d]\) it holds that

$$\begin{aligned} \left| {\mathop {{\mathbb E}}_{t_i}[(-1)^{(\mathbf {{v}}')^T\mathbf {{u}}_{t_i}}]} \right| = \left| {1-2\epsilon _i} \right| \le \beta '. \end{aligned}$$
(4)

We conclude that

$$\begin{aligned} \left| {\hat{f}(\mathbf {{y}},z)} \right| \le (\beta ')^d. \end{aligned}$$
(5)

Now we are ready to compare with \(\mathcal{U}_{\mathbb {Z}_2^n}\times \mathsf {Ber}_{\epsilon }\). Let g be the probability density function of \(\mathcal{U}_{\mathbb {Z}_2^n}\times \mathsf {Ber}_{\epsilon }\), and let \(\hat{g}\) be its Fourier Transform. Then \(\hat{g}(\mathbf {{0}},0)=1\), \(\hat{g}(\mathbf {{0}},1)=1-2\epsilon \) and \(\hat{g}(\mathbf {{y}},z)=0\) for all \(\mathbf {{y}} \ne 0\). Therefore

$$\begin{aligned} \left\| {\hat{f} - \hat{g}} \right\| _2^2 =\sum _{\mathbf {{y}},z} \left| {\hat{f}(\mathbf {{y}},z) - \hat{g}(\mathbf {{y}},z)} \right| ^2 \le \sum _{\begin{array}{c} \mathbf {{y}} \in \mathbb {Z}_2^n \setminus \{0\} \\ z\in \mathbb {Z}_2 \end{array}} (\beta ')^{2d} \le 2^{n+1} (\beta ')^{2d}. \end{aligned}$$
(6)

By Parseval’s theorem, we have that

$$\begin{aligned} \left\| {{f} - {g}} \right\| _2^2 = \frac{1}{2^{n+1}} \left\| {\hat{f} - \hat{g}} \right\| _2^2 \le (\beta ')^{2d}, \end{aligned}$$
(7)

and passing to the \(\ell _1\) norm (using the Cauchy-Schwarz inequality) we have

$$\begin{aligned} \left\| {{f} - {g}} \right\| _1 \le 2^{(n+1)/2} \cdot \left\| {{f} - {g}} \right\| _2 \le 2^{(n+1)/2} \cdot (\beta ')^{d}, \end{aligned}$$
(8)

which completes the proof.   \(\square \)

Relation to Lattice Smoothing. To conclude this section, let us briefly explain the analogy to smoothing lemmas for lattices [MR04]. Our explanation is intended mostly for readers who are familiar with lattice smoothing and the notion of q-ary lattices, and wish to better understand the connection to our notion of smoothing. Other readers may safely skip this paragraph. We recall that the goal of smoothing in the lattice world is to find a distribution \(\mathcal{D}\) (a Gaussian in the lattice case), such that reducing it modulo a lattice \(\mathcal{L}\) results in an almost uniform distribution over the cosets of the lattice. Let us restrict our attention to integer lattices, integer distributions and integer cosets. Formally, \(\mathcal{D}\pmod {\mathcal{L}}\) is uniform over \(\mathbb {Z}^m/\mathcal{L}\). Now let \(\mathbf {{C}}\) be a generating matrix for a binary code, and consider the so-called “perp lattice” \(\mathcal{L}= \varLambda ^{\perp }_2(\mathbf {{C}}) = \{\mathbf {{x}} \in \mathbb {Z}^m : \mathbf {{C}}\mathbf {{x}}=0\pmod {2}\} = \mathcal{C}^{\perp }+2\mathbb {Z}^m\). That is, the lattice corresponding to the dual code of \(\mathcal{C}\) plus all even vectors. Each integer coset of the lattice \(\mathcal{L}\) corresponds to a vector \(\mathbf {{y}}\), where the respective coset is \(\mathcal{L}_{\mathbf {{y}}}=\{\mathbf {{x}} \in \mathbb {Z}^m : \mathbf {{C}}\mathbf {{x}}=\mathbf {{y}}\pmod {2}\}\). Thus a smoothing distribution \(\mathcal{D}\) is one where drawing \(\mathbf {{r}}\) from \(\mathcal{D}\) and computing \(\mathbf {{C}}\mathbf {{r}}\pmod {2}\) gives a distribution close to uniform. Therefore, our smoothing lemma above shows that for 2-ary lattices one can devise non-trivial (and useful) smoothing distributions, and these distributions are not discrete Gaussians as usually considered in the lattice literature. Finally, the fact that we can smooth the code together with a noisy codeword is somewhat analogous to Gaussian leftover hash lemmas in the context of lattices.

4 A Worst Case Balanced NCP to Average Case LPN Reduction

Theorem 4.1

Assume there is an algorithm that solves the search \(\mathsf {LPN}^{n}_{{\epsilon }}\) problem with success probability \(\alpha \) in the average case by running in time T and making q queries. Then, for every \(d \le m \in \mathbb {Z}\) there is an algorithm that solves search \(\mathsf {balNCP}_{n, m, w, \beta }\) in the worst case in time \(T \cdot \mathrm{poly}(n,m)\) with success probability at least \(\alpha - q \cdot \delta \) where

$$\begin{aligned} \delta\le & {} 2^{(n+1)/2} \cdot (\beta + \tfrac{2w}{m})^{d} \\ \epsilon= & {} \tfrac{1}{2}-\tfrac{1}{2}(1-\tfrac{2w}{m})^{d}. \end{aligned}$$

Proof

Assume \(\mathcal{A}\) is an algorithm for the LPN problem as in the theorem. Define \(\mathcal{B}\) as follows:

  • Input: \(\mathbf {{C}} \in \mathbb {Z}_2^{n \times m}\), \(\mathbf {{t}}\in \{0,1\}^m\). By assumption \(\mathbf {{C}}\) is the generator of a \(\beta \)-balanced code and \(\mathbf {{t}}^T = \mathbf{s}^T \mathbf {{C}} + \mathbf {{x}}^T\) for some \(\mathbf{s}\in \mathbb {Z}_2^n, \mathbf {{x}} \in \mathbb {Z}_2^m\) with \(\mathsf {wt}(\mathbf {{x}}) = w\).

  1. Sample \(\mathbf{s}' \leftarrow \mathbb {Z}_2^n\) and set \(\mathbf {{c}}^T = \mathbf {{t}}^T + (\mathbf{s}')^T\mathbf {{C}} = (\mathbf{s}+ \mathbf{s}')^T\mathbf {{C}} + \mathbf {{x}}^T\).

  2. Run the algorithm \(\mathcal{A}\). Every time \(\mathcal{A}\) requests a new LPN sample, choose \(\mathbf {{r}} \leftarrow \mathcal{R}_{d,m}\), set \(\mathbf {{a}} = \mathbf {{C}}\mathbf {{r}}\), \(b = \mathbf {{c}}^T\mathbf {{r}}\), and give \((\mathbf {{a}}, b)\) to \(\mathcal{A}\).

  3. If at some point \(\mathcal{A}\) outputs \(\mathbf{s}^* \in \mathbb {Z}_2^n\) then output \(\mathbf{s}^* - \mathbf{s}'\).

By Lemma 3.1, each of the values \((\mathbf {{a}}, b)\) given to \(\mathcal{A}\) during step 2 is \(\delta \)-close to a fresh sample from \(\mathcal {O}^n_{\mathbf{s}^*,\epsilon }\) where \(\mathbf{s}^* = \mathbf{s}+ \mathbf{s}'\) is uniformly random over \(\mathbb {Z}_2^n\). By assumption, if \(\mathcal{A}\) were actually given samples from \(\mathcal {O}^n_{\mathbf{s}^*,\epsilon }\) in step 2 it would output \(\mathbf{s}^*\) in step 3 with probability \(\alpha \). Therefore if \(\mathcal{A}\) makes q queries in step 2, the probability that it outputs \(\mathbf{s}^*\) in step 3 is at least \(\alpha - q \delta \). This proves the theorem.    \(\square \)
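The following Python sketch (ours; the lpn_solver interface is illustrative and not part of the original text) spells out the three steps of the reduction \(\mathcal{B}\) above.

```python
# A sketch of the worst-case to average-case reduction B from Theorem 4.1.
import numpy as np

def sample_R(d, m, rng):                            # the smoothing distribution of Sect. 3
    r = np.zeros(m, dtype=int)
    for t in rng.integers(0, m, size=d):
        r[t] ^= 1
    return r

def reduction_B(C, t, d, lpn_solver, rng):
    """C: n x m generator of a beta-balanced code; t: noisy codeword of length m."""
    n, m = C.shape
    s_prime = rng.integers(0, 2, size=n)            # step 1: re-randomize the secret
    c = (t + s_prime @ C) % 2                       # c^T = (s + s')^T C + x^T

    def lpn_sample():                               # step 2: answer A's sample queries
        r = sample_R(d, m, rng)
        return (C @ r) % 2, int(c @ r) % 2          # (a, b) = (Cr, c^T r)

    s_star = lpn_solver(lpn_sample)                 # A outputs s* = s + s' (w.h.p.)
    return (s_star + s_prime) % 2                   # step 3: s = s* - s' (over Z_2)
```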

Corollary 4.1

Let \(m = n^{c}\) for some constant \(c>1\), \(\beta = \frac{1}{\sqrt{n}}, w = \lceil m \frac{\log ^2 n}{n} \rceil \). Assume that search \(\mathsf {balNCP}_{n, m, w, \beta }\) is hard in the worst case, meaning that for every polynomial time algorithm its success probability on the worst case instance is at most \(\mathrm{negl}(n)\). Then for some \(\epsilon < \frac{1}{2} - \frac{1}{O(n^4)}\) search \(\mathsf {LPN}^{n}_{{\epsilon }}\) is hard in the average case, meaning that for every polynomial time algorithm its success probability on a random instance is at most \(\mathrm{negl}(n)\).

Proof

Follows directly from the theorem by setting \(d = \lceil 2n/ \log n \rceil \) and noting that:

$$\begin{aligned} \delta\le & {} 2^{(n+1)/2} \cdot (\beta + \tfrac{2w}{m})^{d} \le 2^{(n+1)/2 - (d/2) \log n + O(1)} \le 2^{-n/2 +O(1)} = \mathrm{negl}(n)\\ \epsilon= & {} \tfrac{1}{2}-\tfrac{1}{2}(1-\tfrac{2w}{m})^{d} \le \frac{1}{2} - 2^{-(4\frac{w}{m}d+1)} \le \frac{1}{2} - 1/O(n^4) \end{aligned}$$

   \(\square \)

The above says that the worst-case hardness of \(\mathsf {balNCP}\) with very low error-rate \(w/m \approx \frac{\log ^2 n}{n}\) implies the average-case hardness of \(\mathsf {LPN}^{n}_{{\epsilon }}\) with very high error-rate \(\epsilon = 1/2 - 1/O(n^4)\). Note that a random linear code is \(\beta \)-balanced with overwhelming probability when \(\beta \ge 3\sqrt{n/m}\), so for a sufficiently large m the restriction on \(\beta \) is satisfied by most codes.

Other choices of parameters may also be interesting. For example, we can set the error-rate to be \(w/m \approx 1/\sqrt{n}\) and \(d = 2 n/\log n\), while keeping \(m = n^c\) for some \(c>1\) and \(\beta = 1/\sqrt{n}\) the same as before. Then if we assume that \(\mathsf {balNCP}_{n,m,w,\beta }\) is \((T(n), \delta (n))\) hard in the worst case (meaning that for every T(n) time algorithm the success probability on the worst case instance is at most \(\delta (n)\)), this implies that \(\mathsf {LPN}^{n}_{{\epsilon }}\) is \((T'(n), \delta '(n))\) hard on the average, where \(\epsilon (n) = 1/2 - 2^{-\sqrt{n}/\log n}\), \(T'(n) = T(n)/\mathrm{poly}(n)\) and \(\delta '(n) = \delta (n) + T'(n) 2^{-(n-1)/2}\). Note that, as far as we know, \(\mathsf {balNCP}_{n,m,w,\beta }\) with noise rate \(w/m = 1/\sqrt{n}\) may be \((T(n), \delta (n))\) hard for some \(T(n) = 2^{\varOmega (\sqrt{n})}\), \(\delta (n) = 2^{-\varOmega (\sqrt{n})}\), which would imply the same asymptotic hardness for \(\mathsf {LPN}^{n}_{{\epsilon }}\). Although the error-rate \(\epsilon = 1/2 - 2^{-\sqrt{n}/\log n}\) is extremely high, it is not high enough for the conclusion to hold statistically, and therefore this connection may also be of interest.

5 Statistical Zero Knowledge for Balanced NCP and LPN

In this section, we show that for certain parameter regimes, \(\mathsf {balNCP}\in \mathrm {{Search}\hbox {-}\mathcal {BPP}}^\mathcal {SZK}\) and is thus unlikely to be \(\mathcal {NP}\)-hard [MX10]. Towards this end, we use a decision to search reduction analogous to the canonical one known for the \(\mathsf {LPN}\) problem. We consider the following randomized samplers (with an additional implicit parameter d):

  • Randomized sampler \(\mathsf {Samp}_0(\mathbf {{C}}, \mathbf {{t}})\) takes as input a matrix \(\mathbf {{C}} \in \{0,1\}^{n \times m}\) and a word \(\mathbf {{t}}\in \{0,1\}^m\). It samples \(\mathbf {{r}} {\mathop {\leftarrow }\limits ^{\scriptscriptstyle {\$}}}\mathcal{R}_{d,m}\) and outputs \((\mathbf {{C}}\mathbf {{r}}, \mathbf {{t}}^T\mathbf {{r}})\).

  • Randomized sampler \(\mathsf {Samp}_{i, \sigma }(\mathbf {{C}}, \mathbf {{t}})\) is parameterized by \(i \in [n]\), \(\sigma \in \{0,1\}\), and takes as input a matrix \(\mathbf {{C}} \in \{0,1\}^{n \times m}\) and a word \(\mathbf {{t}}\in \{0,1\}^m\). It samples \(\mathbf {{r}} {\mathop {\leftarrow }\limits ^{\scriptscriptstyle {\$}}}\mathcal{R}_{d,m}\) and \(\rho {\mathop {\leftarrow }\limits ^{\scriptscriptstyle {\$}}}\{0,1\}\) and outputs \((\mathbf {{C}}\mathbf {{r}}+\rho \mathbf {{u}}_{i}, \mathbf {{t}}^T\mathbf {{r}} + \rho \sigma )\).
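A minimal sketch (ours, assuming numpy and the sample_R routine from the Sect. 3 sketch; names are illustrative) of the two samplers:

```python
import numpy as np

def samp_0(C, t, d, rng):
    """Output (Cr, t^T r) for r <- R_{d,m}."""
    r = sample_R(d, C.shape[1], rng)
    return (C @ r) % 2, int(t @ r) % 2

def samp_i_sigma(C, t, d, i, sigma, rng):
    """Output (Cr + rho*u_i, t^T r + rho*sigma) for r <- R_{d,m} and rho <- {0,1}."""
    r = sample_R(d, C.shape[1], rng)
    rho = int(rng.integers(0, 2))
    a = (C @ r) % 2
    a[i] = (a[i] + rho) % 2
    return a, (int(t @ r) + rho * sigma) % 2
```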

Lemma 5.1

Consider a generating matrix \(\mathbf {{C}} \in \{0,1\}^{n \times m}\) for a \(\beta \)-balanced code, and let \(\mathbf {{t}}^T = \mathbf {{s}}^T \mathbf {{C}}+\mathbf {{x}}^T\) for some \(\mathbf {{s}}\in \{0,1\}^n\) and \(\mathbf {{x}}\) with Hamming weight w. Then the following hold:

  1. The sampler \(\mathsf {Samp}_0(\mathbf {{C}}, \mathbf {{t}})\) samples from a distribution that is \(\delta \)-close to \(\mathcal{U}_{\{0,1\}^n}\times \mathsf {Ber}_{\epsilon }\).

  2. If \(\mathbf {{s}}_i=\sigma \) then \(\mathsf {Samp}_{i, \sigma }(\mathbf {{C}}, \mathbf {{t}})\) samples from a distribution that is \(\delta \)-close to \(\mathcal{U}_{\{0,1\}^n}\times \mathsf {Ber}_{\epsilon }\).

  3. If \(\mathbf {{s}}_i\ne \sigma \) then \(\mathsf {Samp}_{i, \sigma }(\mathbf {{C}}, \mathbf {{t}})\) samples from a distribution that is \(\delta \)-close to \(\mathcal{U}_{\{0,1\}^n}\times \mathcal{U}_{\{0,1\}}\).

Here, \(\epsilon = \tfrac{1}{2}-\tfrac{1}{2}(1-\tfrac{2w}{m})^{d}\), \(\delta = 2^{(n+1)/2} \cdot (\beta + \tfrac{2w}{m})^{d}\).

Proof

Assertion 1 follows directly from Lemma 3.1.

For Assertion 2 we note that if \(\mathbf {{s}}_i=\sigma \) then

$$ (\mathbf {{C}}\mathbf {{r}}+\rho \mathbf {{u}}_{i}, \mathbf {{t}}^T\mathbf {{r}} + \rho \sigma ) = (\mathbf {{C}}\mathbf {{r}}, \mathbf {{t}}^T\mathbf {{r}})+\rho (\mathbf {{u}}_{i}, \sigma ) = (\mathbf {{C}}\mathbf {{r}}, \mathbf {{t}}^T\mathbf {{r}})+(\rho \mathbf {{u}}_{i}, \mathbf {{s}}^T(\rho \mathbf {{u}}_i)). $$

By Lemma 3.1 this distribution is within \(\delta \) statistical distance to

$$ (\mathbf {{a}}, \mathbf {{s}}^T\mathbf {{a}}+e)+(\rho \mathbf {{u}}_{i}, \mathbf {{s}}^T(\rho \mathbf {{u}}_i)), $$

with \((\mathbf {{a}}, e)\) distributed \(\mathcal{U}_{\{0,1\}^n}\times \mathsf {Ber}_{\epsilon }\). Finally, we can write

$$ (\mathbf {{a}}, \mathbf {{s}}^T\mathbf {{a}}+e)+(\rho \mathbf {{u}}_{i}, \mathbf {{s}}^T(\rho \mathbf {{u}}_i)) = ((\mathbf {{a}}+\rho \mathbf {{u}}_i), \mathbf {{s}}^T (\mathbf {{a}}+\rho \mathbf {{u}}_i) + e), $$

and since \(\mathbf {{a}}' = \mathbf {{a}}+\rho \mathbf {{u}}_i\) is also uniformly distributed, the assertion follows.

For Assertion 3 we note that when \(\mathbf {{s}}_i\ne \sigma \), i.e. \(\sigma = \mathbf {{s}}_i+1\) then

$$ (\mathbf {{C}}\mathbf {{r}}+\rho \mathbf {{u}}_{i}, \mathbf {{t}}^T\mathbf {{r}} + \rho \sigma ) = (\mathbf {{C}}\mathbf {{r}}, \mathbf {{t}}^T\mathbf {{r}})+\rho (\mathbf {{u}}_{i}, \sigma ) = (\mathbf {{C}}\mathbf {{r}}, \mathbf {{t}}^T\mathbf {{r}})+(\rho \mathbf {{u}}_{i}, \mathbf {{s}}^T(\rho \mathbf {{u}}_i))+(\mathbf {{0}}, \rho ). $$

As above, by Lemma 3.1, this distribution is within \(\delta \) statistical distance to

$$ (\mathbf {{a}}, \mathbf {{s}}^T\mathbf {{a}} + e) + (\rho \mathbf {{u}}_{i}, \mathbf {{s}}^T(\rho \mathbf {{u}}_i)) + (\mathbf {{0}}, \rho ) = (\underbrace{(\mathbf {{a}} + \rho \mathbf {{u}}_i)}_{ \mathbf {{a}}'}, \mathbf {{s}}^T (\mathbf {{a}} + \rho \mathbf {{u}}_i) + e) + (\mathbf {{0}}, \rho ) = (\mathbf {{a}}', \mathbf {{s}}^T\mathbf {{a}}'+ e+ \rho ), $$

with \((\mathbf {{a}}, e)\) distributed \(\mathcal{U}_{\{0,1\}^n}\times \mathsf {Ber}_{\epsilon }\), and thus also \((\mathbf {{a}}', e)\) distributed \(\mathcal{U}_{\{0,1\}^n}\times \mathsf {Ber}_{\epsilon }\) and independent of \(\rho \). Since \(\rho \) is uniform and independent of \((\mathbf {{a}}', e)\) it follows that \((\mathbf {{a}}', \mathbf {{s}}^T\mathbf {{a}}'+e+\rho )\) is distributed \(\mathcal{U}_{\{0,1\}^n}\times \mathcal{U}_{\{0,1\}}\).   \(\square \)

The following is an immediate corollary of Lemma 5.1.

Corollary 5.1

If \(\mathbf {{s}}_i=\sigma \) then the distributions generated by \(\mathsf {Samp}_{i, \sigma }(\mathbf {{C}}, \mathbf {{t}})\) and \(\mathsf {Samp}_0(\mathbf {{C}}, \mathbf {{t}})\) are within statistical distance at most \(2\delta \).

If \(\mathbf {{s}}_i\ne \sigma \) then the distributions generated by \(\mathsf {Samp}_{i, \sigma }(\mathbf {{C}}, \mathbf {{t}})\) and \(\mathsf {Samp}_0(\mathbf {{C}}, \mathbf {{t}})\) are within statistical distance at least \((1-2\epsilon ) - 2\delta \).

Proof

A direct calculation shows that the statistical distance between \(\mathsf {Ber}_{\epsilon }\) and \(\mathcal{U}_{\{0,1\}}\) is \(1-2\epsilon \). Plugging in Lemma 5.1, the result follows.   \(\square \)

We define the notion of a direct product sampler. This is just a sampler that outputs multiple independent samples.

Definition 5.1

Let \(\mathcal{D}\) be a distribution and let \(k \in {\mathbb N}\), then \(\mathcal{D}^{({k})}\) is the distribution defined by k independent samples from \(\mathcal{D}\).

Lemma 5.2

Consider distributions \(\mathcal{D}_1, \mathcal{D}_2\) and values \(0 \le \delta _1 \le \delta _2 \le 1\) s.t. \(\mathsf {dist}(\mathcal{D}_1, \mathcal{D}_2) \in (\delta _1, \delta _2)\). Then for every \(k \in {\mathbb N}\) it holds that \(\mathsf {dist}(\mathcal{D}_1^{({k})},\mathcal{D}_2^{({k})}) \in (1-c_1 e^{-c_2 \delta _1^2 k},\ k\delta _2)\) for some positive constants \(c_1, c_2\).

Proof

The upper bound follows by union bound and the lower bound from the Chernoff bound.    \(\square \)

Theorem 5.1

There exists a \(\mathrm {{Search}\hbox {-}\mathcal {BPP}}^\mathcal {SZK}\) algorithm for solving \(\mathsf {balNCP}\) on instances of the following form. Letting \(\mathbf {{C}} \in \{0,1\}^{n \times m}\), \(\mathbf {{t}}\in \{0,1\}^m\), \(w \in [m]\) denote the \(\mathsf {balNCP}\) input, we require that the code generated by \(\mathbf {{C}}\) is \(\beta \)-balanced and that \(n,m,w,\beta \) are such that there exist \(d \in [m]\) and \(k \le \mathrm{poly}(n,m)\) for which

$$\begin{aligned} 2\delta k < 1/3 \end{aligned}$$
(9)

for \(\delta = 2^{(n+1)/2} \cdot (\beta + \tfrac{2w}{m})^{d}\), and

$$\begin{aligned} c_1 e^{-c_2 (1-2\epsilon - 2 \delta )^2 k} < 1/3 \end{aligned}$$
(10)

for \(\delta \) as above, \(\epsilon = \tfrac{1}{2}-\tfrac{1}{2}(1-\tfrac{2w}{m})^{d}\), and \(c_1, c_2\) are the constants from Lemma 5.2.

Proof

We recall the problem Statistical Distance (\(\mathsf {SD}\)), which is in \(\mathcal {SZK}\). This problem takes as input two sampler circuits and outputs 0 if the sampled distributions are within statistical distance \(<1/3\) and 1 if the sampled distributions are at statistical distance \(>2/3\). We will show how to solve \(\mathsf {balNCP}\) for the above parameters using an oracle to \(\mathsf {SD}\).

Specifically, for all \(i=1,\ldots , n\) and \(\sigma \in \{0,1\}\), the algorithm will call the \(\mathsf {SD}\) oracle on input \((\mathsf {Samp}^{({k})}_0(\mathbf {{C}}, \mathbf {{t}}), \mathsf {Samp}^{({k})}_{i, \sigma }(\mathbf {{C}}, \mathbf {{t}}))\), where \(\mathsf {Samp}^{({k})}_{(\cdot )}\) is the algorithm that runs the respective \(\mathsf {Samp}\) k times and outputs all k generated samples.

Let \(\alpha _{i,\sigma }\) denote the oracle response on the \((i,\sigma )\) call. If for some i it holds that \(\alpha _{i,0}=\alpha _{i,1}\), then return \(\bot \). Otherwise set \(\mathbf {{s}}_i\) to the value \(\sigma \) for which \(\alpha _{i,\sigma }=0\). Return \(\mathbf {{s}}\).

By definition of our samplers, they run in polynomial time, so if k is polynomial then our inputs to \(\mathsf {SD}\) are indeed valid. Combining Corollary 5.1 and Lemma 5.2, it holds that \(\alpha _{i,\sigma }=0\) if and only if \(\mathbf {{s}}^*_i=\sigma \), where \(\mathbf {{s}}^*\) is the vector for which \(\mathbf {{t}}^T = (\mathbf {{s}}^*)^T\mathbf {{C}}+\mathbf {{x}}^T\) and \(\mathsf {wt}(\mathbf {{x}})=w\). The correctness of the algorithm follows.   \(\square \)
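A sketch (ours) of the decoding procedure in the proof above, assuming a hypothetical sd_oracle that, given two sampler circuits, returns 0 when the sampled distributions are statistically close and 1 when they are far, and reusing samp_0 and samp_i_sigma from the sketch earlier in this section:

```python
def decode_with_sd(C, t, d, k, sd_oracle, rng):
    """Recover s bit by bit by comparing k-fold direct-product samplers via the SD oracle."""
    n = C.shape[0]
    s = []
    for i in range(n):
        alpha = {}
        for sigma in (0, 1):
            P = lambda: [samp_0(C, t, d, rng) for _ in range(k)]
            Q = lambda sig=sigma, ii=i: [samp_i_sigma(C, t, d, ii, sig, rng) for _ in range(k)]
            alpha[sigma] = sd_oracle(P, Q)          # 0 = close, 1 = far
        if alpha[0] == alpha[1]:
            return None                             # corresponds to outputting bot
        s.append(0 if alpha[0] == 0 else 1)         # s_i is the sigma with alpha = 0
    return s
```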

Corollary 5.2

Let \(m = n^{c}\) for some constant \(c>1\), \(\beta = \frac{1}{\sqrt{n}}, w = \lceil m \frac{\log ^2 n}{n} \rceil \). Then search \(\mathsf {balNCP}_{n, m, w, \beta } \in \mathrm {{Search}\hbox {-}\mathcal {BPP}}^\mathcal {SZK}\).

Proof

In Theorem 5.1 set \(d = \lceil 2n/ \log n \rceil \) and \(k = n^9\). By the same calculation as in Corollary 4.1 we have \(\delta = \mathrm{negl}(n)\) and \(\epsilon \le \frac{1}{2} - 1/O(n^4)\). Therefore for large enough n we have \(2\delta k < 1/3\) and \(c_1 e^{-c_2 (1-2\epsilon - 2 \delta )^2 k} = e^{- \varOmega (n)} < 1/3\) as required by the theorem.    \(\square \)

On Statistical Zero Knowledge and LPN. We notice that since sparse random codes are balanced with overwhelming probability (Lemma 2.2), our results in this section also imply that the \(\mathsf {LPN}\) problem is in \(\mathrm {{Search}\hbox {-}\mathcal {BPP}}^\mathcal {SZK}\) for error rate \(\frac{\log ^2 n}{n}\). We note that even though in LPN the weight of the noise vector (the distance from the code) is not fixed as in our definition of \(\mathsf {balNCP}\), the domain of possible weights is of polynomial size and thus the exact weight can be guessed with polynomial success probability. Once a successful guess has been made, it can be verified after a solution has been found.

6 Collision-Resistant Hashing

In this section, we describe a collision-resistant hash function family whose security is based on the hardness of the (decisional) \(\mathsf {LPN}^{n}_{{O(\log ^2n/n)}}\) problem. For any positive constant \(c\in \mathbb {R}^+\) and a matrix \(\mathbf{A}\in \mathbb {Z}_2^{n\times n^{1+c}}\), define the function

$$\begin{aligned} h_\mathbf{A}: \mathcal {S}^{n^{1+c}}_{2n/(c\log {n})} \rightarrow \mathbb {Z}_2^n\quad \text { as }\quad h_\mathbf{A}(\mathbf{r}):=\mathbf{A}\mathbf{r}. \end{aligned}$$
(11)

Notice that because

$$\left| \mathcal {S}^{n^{1+c}}_{2n/(c\log {n})}\right| = {n^{1+c} \atopwithdelims ()2n/(c\log {n})}> \left( \frac{n^{1+c}}{2n/(c\log {n})}\right) ^{2n/(c\log {n})}>2^{2n}$$

and the size of \(\mathbb {Z}_2^n\) is exactly \(2^n\), the function \(h_\mathbf{A}\) is compressing.
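A minimal sketch (ours, assuming numpy; the parameters are toy values) of the hash family in (11): the key is a uniformly random matrix \(\mathbf{A}\), the input is a binary vector of the prescribed Hamming weight, and the output is \(\mathbf{A}\mathbf{r} \pmod {2}\).

```python
import numpy as np
from math import log2

def keygen(n, c, rng):
    """Sample the key: a uniformly random n x n^{1+c} binary matrix."""
    return rng.integers(0, 2, size=(n, round(n ** (1 + c))))

def hash_A(A, r):
    """h_A(r) = A r mod 2, an n-bit digest."""
    return (A @ r) % 2

# Example with toy parameters n = 16, c = 1.
rng = np.random.default_rng(0)
n, c = 16, 1.0
A = keygen(n, c, rng)
m = A.shape[1]
w = int(2 * n / (c * log2(n)))                      # input weight 2n/(c log n)
r = np.zeros(m, dtype=int)
r[rng.choice(m, size=w, replace=False)] = 1         # a random weight-w input
digest = hash_A(A, r)                               # n output bits, domain larger than 2^{2n}
```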

We now relate the hardness of finding collisions in the function \(h_\mathbf{A}\), for a random \(\mathbf{A}\), to the hardness of the decisional \(\mathsf {LPN}^{n}_{{\epsilon }}\) problem.

Theorem 6.1

For any constant \(c>0\), if there exists an algorithm \(\mathsf A_1\) running in time t such that

$$\Pr \left[ \mathsf A_1(h_\mathbf{A}) \Rightarrow (\mathbf{r}_1,\mathbf{r}_2) \in \mathcal {S}^{n^{1+c}}_{2n/(c\log {n})}\text { s.t. }\mathbf{r}_1\ne \mathbf{r}_2\text { and } h_\mathbf{A}(\mathbf{r}_1)=h_\mathbf{A}(\mathbf{r}_2);~ \mathbf{A}\leftarrow \mathbb {Z}_2^{n\times n^{1+c}}\right] \ge \delta ,$$

then there exists an algorithm \(\mathsf A_2\) that runs in time \(\approx t\) and solves the decisional \(\mathsf {LPN}^{n}_{{\epsilon }}\) problem for any \(\epsilon \le \frac{1}{4}\) with advantage at least \(\delta \cdot 2^{-16n\epsilon /(c\log {n})-1}\).

In particular, for \(\epsilon =O(\log ^2 n /n)\) and any \(\delta =1/\mathrm{poly}(n)\), the advantage is \(1/\mathrm{poly}(n)\).

Proof

The algorithm \(\mathsf A_2\) has access to an oracle that is either \(\mathcal {O}^n_{\mathbf{s},\epsilon }\) or \(\mathcal {U}^n\). It calls the oracle \(n^{1+c}\) times to obtain samples of the form \((\mathbf{a}_i,b_i)\). It arranges the \(\mathbf{a}_i\) and \(b_i\) into a matrix \(\mathbf{A}\) and vector \(\mathbf{b}\) as

$$\mathbf{A}= \begin{bmatrix}~\mathbf{a}_1~|~ \cdots ~|~\mathbf{a}_{n^{1+c}}~\end{bmatrix}\in \mathbb {Z}_2^{n\times n^{1+c}},~\mathbf{b}=\begin{bmatrix} b_1 \\ \vdots \\ b_{n^{1+c}}\end{bmatrix}\in \mathbb {Z}_2^{n^{1+c}}$$

and sends \(\mathbf{A}\) to \(\mathsf A_1\). If \(\mathsf A_1\) fails to return a valid answer, then \(\mathsf A_2\) outputs \(\mathsf {ans}\leftarrow \{0,1\}\). If \(\mathsf A_1\) does return valid distinct \(\mathbf{r}_1\) and \(\mathbf{r}_2\) such that \(h_\mathbf{A}(\mathbf{r}_1)=h_\mathbf{A}(\mathbf{r}_2),\) then \(\mathsf A_2\) returns \(\mathsf {ans}=\mathbf{b}^T(\mathbf{r}_1-\mathbf{r}_2)\).

We first look at the distribution of \(\mathsf {ans}\) when the oracle that \(\mathsf A_2\) has access to is \(\mathcal {U}^n\). In this case it’s easy to see that regardless of whether \(\mathsf A_1\) returns a valid answer, we’ll have \(\Pr [\mathsf {ans}=0]=\frac{1}{2}\) because \(\mathbf{b}\) is completely uniform in \(\mathbb {Z}_2^{n^{1+c}}\).

On the other hand, if the oracle is \(\mathcal {O}^n_{\mathbf{s},\epsilon }\), then we know that for all i, \(b_i=\mathbf{s}^T \mathbf{a}_i + e_i\), where \(e_i\leftarrow \mathsf {Ber}_\epsilon \). This can be rewritten as \(\mathbf{s}^T\mathbf{A}+\mathbf{e}^T=\mathbf{b}^T\) where \(\mathbf{e}=\begin{bmatrix}e_1\\ \vdots \\ e_{n^{1+c}}\end{bmatrix}\). Therefore

$$\mathbf{b}^T(\mathbf{r}_1-\mathbf{r}_2) = \mathbf{s}^T\mathbf{A}(\mathbf{r}_1-\mathbf{r}_2) + \mathbf{e}^T(\mathbf{r}_1-\mathbf{r}_2) = \mathbf{e}^T(\mathbf{r}_1-\mathbf{r}_2).$$

Since \(\mathsf {wt}(\mathbf{r}_i)=2n/(c\log {n})\), we know that \(\mathsf {wt}(\mathbf{r}_1-\mathbf{r}_2)\le 4n/(c\log {n})\). Since the \(\mathbf{A}\) that is sent to \(\mathsf A_1\) is independent of \(\mathbf{e}\), we have that

$$\begin{aligned} \Pr [\mathbf{e}^T\cdot (\mathbf{r}_1-\mathbf{r}_2) = 0 ~;~ e_i\leftarrow \mathsf {Ber}_\epsilon ] \ge \frac{1}{2}+\frac{1}{2}(1-2\epsilon )^{4n/(c\log {n})}\ge \frac{1}{2}+ 2^{-16n\epsilon /(c\log {n})-1}, \end{aligned}$$
(12)

where the first inequality follows from Lemma 2.1 and the second inequality is due to the assumption that \(\epsilon \le \frac{1}{4}\) and the fact that \(1-x \ge 2^{-2x}\) for \(x\le 1/2\).

Thus when the oracle is \(\mathcal {O}^n_{\mathbf{s},\epsilon }\), we have

$$\Pr [\mathsf {ans}=0]\ge \frac{1}{2}\cdot (1-\delta )+\left( \frac{1}{2}+ 2^{-16n\epsilon /(c\log {n})-1}\right) \cdot \delta = \frac{1}{2}+ \delta \cdot 2^{-16n\epsilon /(c\log {n})-1}.$$

   \(\square \)
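A sketch (ours) of the distinguisher \(\mathsf A_2\) from the proof, where oracle() returns one sample from either \(\mathcal {O}^n_{\mathbf{s},\epsilon }\) or \(\mathcal {U}^n\) and collision_finder plays the role of \(\mathsf A_1\); both interfaces are illustrative assumptions.

```python
import numpy as np

def distinguisher_A2(n, c, oracle, collision_finder, rng):
    m = round(n ** (1 + c))
    pairs = [oracle() for _ in range(m)]
    A = np.column_stack([a for a, _ in pairs])      # n x m matrix of the a_i
    b = np.array([bi for _, bi in pairs])           # the m values b_i
    out = collision_finder(A)                       # A_1's attempt to find a collision
    if out is None:                                 # no valid collision: random guess
        return int(rng.integers(0, 2))
    r1, r2 = out                                    # distinct, with A r1 = A r2 (mod 2)
    return int(b @ ((r1 + r2) % 2)) % 2             # ans = b^T (r1 - r2) over Z_2
```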

6.1 Observations and Other Parameter Regimes

As far as we know, the best attack against the hash function in (11) with \(c=1\) requires \(2^{\varOmega (n)}\) time, whereas the \(\mathsf {LPN}^{n}_{{\log ^2n/n}}\) problem, from which we can show a polynomial-time reduction, can be solved in time \(2^{O(\log ^2{n})}\). Thus there is possibly a noticeable loss in the reduction for this parameter setting. It was observed in [YZW+17, Theorem 2, Theorem 3] that there are other ways to set the parameters in Theorem 6.1 which achieve different connections between the hash function and the underlying LPN problem. For example, defining \(n=\log ^2{m}\) and \(c=\log {m}/\log \log {m}-1\) implies that there exists a hash function defined by the matrix \(\mathbf{A}\in \mathbb {Z}_2^{\log ^2{m}\times 2m}\) such that succeeding with probability \(\delta \) in finding collisions in this hash function is at least as hard as solving the \(\mathsf {LPN}^{\log ^2{m}}_{{\epsilon }}\) problem with advantage \(\delta \cdot m^{-O(\kappa \epsilon )}\) for a constant \(\kappa \). This is exactly the parameter setting in [YZW+17, Theorem 3].

Based on the state of the art of today’s algorithms, it is clear that using a hash function defined by an \(n\times n^2\) matrix \(\mathbf{A}\) is more secure than one defined by a \(\log ^2{n}\times 2n\) matrix (since one can trivially find collisions in the latter in time \(2^{O(\log ^2{n})}\)). There is, however, no connection that we are aware of between the LPN problems on which they are based via Theorem 6.1. In particular, we do not know of any polynomial-time (in n) reductions that relate the hardness of the \(\mathsf {LPN}^{n}_{{\log ^2n/n}}\) problem to the \(\mathsf {LPN}^{\log ^2n}_{{\epsilon }}\) problem for a constant \(\epsilon \).