Keywords

1 Introduction

Constructions in the random oracle model (ROM) have shaped our understanding of the cryptographic world. Being a simple information-theoretic model, the ROM was found to be a very useful framework for understating what can be done (sometimes only heuristically), and what is unlikely to be achieved using (merely) symmetric-key cryptography. A notable example for the above is key-agreement protocols. Merkle [Mer82] has constructed a key-agreement protocol in the ROM with a quadratic gap between the query complexity of the players and the eavesdropper. Barak and Mahmoody-Ghidary [BM17], building on the seminal work of Impagliazzo and Rudich [IR89], proved that the quadratic gap achieved by [Mer82] is optimal, and Haitner, Mazor, Oshman, Reingold, and Yehudayoff [HMORY19], showed that for a large family of constructions, the communication complexity of [Mer82] is optimal.

Another primitive whose constructions in the ROM have high impact is Succinct Non-interactive Argument systems (SNARGs): non-interactive computationally sound proofs (arguments) for \(\textrm{NP}\) of succinct proof length (sublinear in the instance length). The first construction of SNARGs was given by Micali [Mic00] in the ROM. This feasibility result turned out to be very influential both theoretically and practically. In theory, it was shown how to instantiate SNARGs in the standard model for many languages of interest by instantiating the Fiat and Shamir [FS86] paradigm with a specific family of hash functions [CCHLRR18]. In practice, the succinctness of the proof is imperative in applications such as cryptocurrency and blockchain, where proofs are broadcast in a peer-to-peer network and (redundantly) stored at every network node, c.f., [BCGGMTV14, Zc14]. As such, improving the concrete efficiency of SNARGs is the focus of long line of work c.f.,  [Gro16, ZGKPP17, AHIV17, BBHR19, WTSTW18, BBBPWM18, BCRSVW19, CHMMVW20, BFS20, COS20, Sta18, LSTW21, CY21b, CY21a, GNS21].

ROM-SNARGs, like the one of [Mic00], have several attractive features. First, to date, they are the most efficient approach for post-quantum security with public verification (i.e., the verifier has no secrets). Moreover, from a practical perspective, one can heuristically instantiate the random oracle with a suitable cryptographic hash function. The result is a SNARG that uses lightweight cryptography (no need for public-key primitives), is easy to deploy (users only need to agree on a hash function), and has no trusted setup. The best ROM-SNARG appeared in the recent work of Chiesa and Yogev [CY21a], who constructed a \((t,\varepsilon )\)-sound ROM-SNARG of proof length of \({O}( \log (t/\varepsilon ) \cdot \log t\cdot \log n)\), where n is the instance length. A ROM-SNARG is \((t,\varepsilon )\)-sound if no \(t\)-queries (malicious) prover can convince the verifier to accept a false statement with probability larger than \(\varepsilon \).Footnote 1

Interestingly, and in contrast to other important primitives such as key-agreement protocols [IR89, HMORY19] and digital signatures [GGKT05, BMG07], we are lacking crucial lower bounds on the length of SNARGs in the ROM. Apart from the weak (folklore) lower bound of \({\varOmega }(\log (t/\varepsilon ))\) (which appears in the full version of the paper), the only exception is the recent bound of Chiesa and Yogev [CY20], who proved that the verifier query complexity of SNARGs cannot be too small. However, their bound does not rule out short ROM-SNARGs with verifier query complexity \(\varOmega (\log 1/\varepsilon )\), which is common for SNARG constructions.

This state-of-affairs naturally leads to the question of finding the shortest ROM-SNARG. Is it \({O}( \log (t/\varepsilon ) \cdot \log t\cdot \log n)\), as the best-known construction achieve, or is it as short as \({O}(\log (t/\varepsilon )\cdot \log n)\), as achieved in other security models (see Sect. 1.2.2). In this work, we advance our understanding about the existence of short ROM-SNARGs (with arbitrary verifier query complexity).

1.1 Our Results

Assuming the (randomized) exponential time hypothesis (rETH), see details below, we prove that for a large family of constructions, the current state-of-the art ROM-SNARG is (essentially) optimal. Specifically, we show that, for this family of constructions, a proof of \(\textrm{3SAT}\) over n variables is of length \(\tilde{\varOmega }( \log (t/\varepsilon ) \cdot \log t)\) (hiding \(\log n\) factors). Matching (up to \(\log n\) factors) the construction of the [CY21a]. The family of constructions we consider includes all constructions that have: (i) non-adaptive verifier and (ii) salted soundness. This includes all types of constructions we are aware of [Mic00, BCS16, CY21b, CY21a]). See details below.

  • Exponential time hypothesis. The (randomized) Exponential Time Hypothesis (rETH) is a stronger version \(\textrm{P}\ne \textrm{NP}\) that states that solving \(\textrm{3SAT}\) on n variables takes (randomized) time \(2^{\varOmega (n)}\). Note that some complexity assumption is inevitable for proving lower bounds on a SNARGs length.Footnote 2

  • Non-adaptive verifier. The oracle queries are asked by a non-adaptive (deterministicFootnote 3) verifier. That is, the queries are a function of the proof and are independent of the answers to other queries.Footnote 4

  • Salted soundness. This is a natural strengthening of the standard soundness of SNARG, which was introduced in Chiesa and Yogev [CY20]. A \((t,\varepsilon )\)-salted-soundness ROM-SNARG allows a cheating prover to request the random oracle to re-sample the answer for a chosen query (similar to changing a “salt” for this query). Each re-sampling costs a unit from the total \(t\) query budget allowed. The cheating prover can also return to previously sampled query answers at no cost.Footnote 5

    While one can easily construct contrived ROM-SNARGs for which salted soundness does not hold, we are not aware of any ROM-SNARG that exploits the fact that the prover cannot resample some of the oracle answers in a meaningful way. All constructions we are aware of satisfy salted soundness.Footnote 6

With these notions, we are ready to state our main result.(The precise statements of the following results are given in the main body of the paper, see Paper Organization for references.)

Theorem 1

(Conditional lower bound on ROM-SNARG length. Informal). Let \(\textsf{ARG} = (\textsf{P}, \textsf{V})\) be an \(\textsf{s}\)-length ROM-SNARG for n-variable \(\textrm{3SAT}\), with \((t, \varepsilon )\)-salted-soundness, perfect completeness, and (deterministic) non-adaptive verifier. Let \(\mathsf {q_{\textsf{P}}}\) and \(\mathsf {q_{\textsf{V}}}\) be the query complexity of \(\textsf{P}\) and \(\textsf{V}\), respectively, and let \(\lambda \) denote the random oracle input and output length.

Assuming rETH, if \(\mathsf {q_{\textsf{V}}}\cdot \lambda \in o(n)\), and \(\log ^2 (t/\varepsilon ) \cdot \log ^{-1} \mathsf {q_{\textsf{P}}}\in o(n)\) then \(\textsf{s}\ge c \cdot \log t\cdot \log \frac{t}{\varepsilon } \cdot \log ^{-1} \mathsf {q_{\textsf{P}}}\), for some universal constant \(c > 0\).

We argue that the assumptions on the parameters regime in our theorem are reasonable and consider the most interesting settings (see Theorem 13 for the precise list of requirements). The goal of a SNARG is to have the proof length and the verifier complexity be much smaller than the instance size n. Usually, proportional to \(\textrm{poly}(\lambda ,\log n)\). Thus, our assumption that \(\mathsf {q_{\textsf{V}}}\cdot \lambda \), and \(\log t\cdot \log \frac{t}{\varepsilon }/ \log \mathsf {q_{\textsf{P}}}\) are of order o(n) is rather mild. The third requirement of \(\mathsf {q_{\textsf{V}}}\le t^{1/10}\) is almost trivial. It says that the query complexity of the verifier is much smaller than the query bound \(t\) of the adversary, which is very much expected from any reasonable SNARG.

The proof of Theorem 1 immediately follows by combing the following lemma with the recent lower bound of Chiesa and Yogev [CY20] on the length ROM-SNARG with low query-complexity verifiers.

Lemma 1

(Short ROM-SNARG \(\rightarrow \) low query ROM-SNARG. Informal). Let \(\textsf{ARG} = (\textsf{P},\textsf{V})\) be a ROM-SNARG for a language \({\mathcal {L}}\) with a deterministic non-adaptive verifier and \((t, \varepsilon )\)-salted-soundness, perfect completeness, proof length \(\textsf{s}\), and verifier query complexity \(\mathsf {q_{\textsf{V}}}\). Then there exists a verifier \(\mathsf {V'}\) of query complexity \(\textsf{s}/\log t\), running time \(2^{\mathsf {q_{\textsf{V}}}\cdot \log t}\) times that of \(\textsf{V}\), such that \((\textsf{P},\mathsf {V'})\) is a ROM-SNARG for \({\mathcal {L}}\) with \((t, \varepsilon )\)-soundness and completeness \(\omega (\varepsilon )\).

That is, the larger the salted-soundness of \(\textsf{ARG}\), the smaller the number of queries made by \(\mathsf {V'}\), and the better the completeness. While the completeness and verifier running time of the resulting scheme are rather poor, and we do not encourage to use it as an actual proof system, it is still non-trivial for the parameters in consideration: \(\mathsf {V'}\) running time is \(2^{o(n)}\), for n being the instance length, and the completeness is larger than the soundness error. By [CY20], the existence of such ROM-SNARG for \(\textrm{3SAT}\) contradicts rETH.

Using similar means, we can compile \(\textsf{ARG}\) into (\(\mathsf {P'}\),\(\mathsf {V'}\)), with (almost) perfect completeness, but with inefficient prover and slightly longer proof (see details in Sect. 2). Since this transformation does not yield better lower bounds, and the resulting scheme is impractical, we present the simpler transformation above.

Lower bound on the length of ROM subvector commitments. A subvector commitment (SVC) [LM19] allows to succinctly commit to a sequence of values, and later open the commitment for a subset of positions (an adversary cannot open any location into two different values). Ideally, the commitment string and the opening size of the SVC are independent (or at least not strongly related) of the length of the committed vector and the number of positions to open. This generalization of vector commitments [CF13] has a variety of applications, including SNARGs, verifiable databases with efficient updates, updatable zero-knowledge databases, universal dynamic accumulators, and more. Since SVCs in the (bare) ROM are the main building blocks in all ROM-SNARGs constructions, finding shorter ROM-SVCs is the obvious approach towards construction shorter ROM-SNARGs. For this very reason, Theorem 1 yields a lower bound on ROM-SVCs for an analog family of constructions: non-adapter receiver and salted-binding (i.e., the sender can resample the oracle outputs).

Theorem 2

(Conditional lower bound on the length of ROM subvector commitments. Informal). Let \(\textsf{CM}\) be a \((t, \varepsilon )\)-salted-sound, non-adaptive (deterministic) verification ROM-SVC for vectors of length n. Let \(\mathsf {q_{S}}\) and \(\mathsf {q_{R}}\) be the query complexity of the sender and receiver, respectively. Let \(\alpha \) denote the commitment length, and \(\beta (\ell )\) denote the opening length for subsets of size \(\ell \).

Assuming rETH, if \(\mathsf {q_{R}} \cdot \lambda \in o(n)\), and \(\log ^2(t/\varepsilon ) \cdot \log ^{-1} \mathsf {q_{S}} \in o(n)\), then \(\alpha + \beta (\log \frac{t}{\varepsilon }) \in \varOmega (\log t\cdot \log \frac{t}{\varepsilon } / \log n)\).

That is, unless the commitment itself is large, the opening of subsets of size \(\log \frac{t}{\varepsilon }\) must be large: about \(\log t/\log n\) bits per element. SVCs are relatively a strong primitive as they imply SNARGs for \(\textrm{NP}\) via the Micali construction (the other direction is not known to hold). However, we only know how to derive lower bounds for them by a reduction to SNARGs. An interesting open question is to directly get lower bounds for SVC, presumably for a larger class of constructions. Moreover, we can hope to get a lower bound for SVCs (in the ROM) without assuming rETH (or any complexity assumption). Indeed, even \(\textrm{P}=\textrm{NP}\) is not known to yield trivial SVCs in the ROM (which is not the case for SNARGs).

1.1.1 Hitting High-Entropy Distributions

The crux of Lemma 1 proof is analyzing the completeness of the resulting low verifier query scheme. We manage to translate this challenge into the following task of hitting high-entropy distributions.

Let \(X= (X_1,\ldots ,X_m)\) be a random variable uniformly distributed over \((\{0, 1\}^\lambda )^m\), let W be an event, and consider the random variable \(X|_W\), i.e., X conditioned on W. It is instructive to think of this question as “How does X appear to an adversary who received \(\log (1/{\textrm{Pr}}\left[ W\right] )\) bits of information about X?” A long sequence of works have studied the question of how “close” \(X|_W\) is to the uniformly distributed (unconditioned) X. In particular, these works considered the question of indistinguishability: showing that parts of \(X|_W\) are close to being uniform. Some works, see [EIRS01, Raz98, SV10] to name a few, proved that the distribution of \((X|_W)_i\) is close in statistical distance to the uniform one, apart from a size \(\log (1/{\textrm{Pr}}\left[ W\right] )\) set of bad i’s. Other works extended the above to bounded-query adversaries [Unr07, DGK17, CDGS18, GSV18, GLLZ20].

Unlike the above works, the focus of our result is forgeability: can we hit/sample from the conditional distribution \(X|_W\) using a simple distribution? We show that after putting aside some bad indices, one can hit the support of \(X|_W\), conditioned on its value in these bad indices, using a large enough product distribution. Like some of the above works, we state our result for high-entropy distributions, and not only for the uniform distribution conditioned on a high probability event.Footnote 7

Theorem 3

(Hitting high-entropy distributions using product sets, informal). Let \(X=(X_1, \ldots , X_m)\) be a random variable over the product set \((\{0, 1\}^\lambda )^m\) with \(H(X) \ge \lambda m - \ell \), and let \(\left\lceil \log m\right\rceil \le \gamma \le \lambda \). Then with probability at least 1/2 over \(x\leftarrow X\), there exists an \(O(\ell /\gamma )\)-size set \({\mathcal {B}}\subseteq [m]\) (of bad indices) such that

$$\begin{aligned} {\textrm{Pr}}_{S \leftarrow (\{0, 1\}^{\lambda })^{m-\left| {\mathcal {B}}\right| }}\left[ S \cap \textsf{Supp}\bigl (X_{[m] \setminus {\mathcal {B}}} \mid X_{\mathcal {B}}= x_{\mathcal {B}}\bigl ) \ne \emptyset \right] \in \varOmega ({1}/{\lambda m}). \end{aligned}$$

Letting H be the Shannon entropy function, and \(v_{{\mathcal {I}}}\), for a vector v, denote the ordered vector \((v_i)_{i\in {\mathcal {I}}}\). That is, with high probability over \(x\leftarrow X\), and after a few “bad” locations (indexed by \({\mathcal {B}}\)) are exposed, one can hit (i.e., forge a sample from) the conditional distribution \(X_{[m] \setminus {\mathcal {B}}} \mid X_{\mathcal {B}}= x_{\mathcal {B}}\) by sampling a tiny, in relative terms, product set.

Note that Theorem 3 does not state that \(X_{[m] \setminus {\mathcal {B}}} \mid X_{\mathcal {B}}= x_{\mathcal {B}}\) is close to the uniform distribution. Actually, it might be very far from that, e.g., for \(X = (U_1,\ldots ,U_m) \mid \bigoplus U_i =0^\lambda \) where the \(U_i\)’s are uniform and independent random variables over \(\{0, 1\}^\lambda \), there is no choice of \({\mathcal {B}}\), apart from the trivial one of \({\mathcal {B}}=[m]\), that makes \(X_{[m] \setminus {\mathcal {B}}} \mid X_{\mathcal {B}}= x_{\mathcal {B}}\) being close to uniform. (And this example demonstrates why the “pre-sampling” approach and alike, c.f., [Unr07], do not seem to be relevant for proving bounds of the type stated in the theorem.)It is also worth mentioning that one cannot prove Theorem 3 using the simple observation that after fixing some bad indices, the projection of \(X' \mathbin {{\mathop {=}\limits ^\textrm{def}}}(X \mid {X_{\mathcal {B}}= x_{\mathcal {B}}})\) on all other coordinates has large support. While the latter guarantees that, with high probability, each random subset \(S_i \leftarrow \{0, 1\}^\gamma \) intersects the support of \(X'_i\), appending these samples together does not necessarily form an element in \(X'\). Rather, we prove the theorem by showing that the number of points in \(S\cap \textsf{Supp}(X'_{[m] \setminus {\mathcal {B}}})\) is well-concentrated around its mean.

In our application of Theorem 3, the event W is the proof sent by \(\textsf{P}\) being a fixed \(\ell \)-bit value \(\pi \), and the size of the bad set \({\mathcal {B}}\) translates to the query complexity of the new verier \(\mathsf {V'}\). The theorem yields, see Sect. 2, that if \(\mathsf {V'}\) makes all queries is \({\mathcal {B}}\), and samples the potential answers for the other queries by itself, then it will accept (i.e., hitting the support of the accepting distribution) with good probability.

1.2 Related Work

1.2.1 SNARGs in the Random Oracle Model

There are several approaches to construct ROM-SNARGs. Micali [Mic00] (building on [Kil92, FS86]) showed a transformation that compiles a probabilistically checkable proof (PCP) and a commitment scheme into ROM-SNARG. Using the best know PCPs, the proof length of Micali’s construction, to get \((t, \varepsilon )\)-soundness, is \({O}( (\log (t/\varepsilon ))^2 \cdot \log n)\), where n is the instance size. Even when using the best-conjectured parameters for PCPs, known as the Sliding Scale Conjecture [BGLR93], the proof length remains the same up to the \(\log n\) factors (see [CY21b] for a tight analysis of the Micali construction). Ben-Sasson, Chiesa, and Spooner [BCS16] (hereon BCS) transformed a public-coin interactive oracle proofs (IOPs) into ROM-SNARG. The benefit of their is approach is that we are much better at constructing IOPs, with good parameters, than PCPs. Still, even when using the best known (or conjectured) IOP, the proof length of the BCS construction remains \({O}( (\log (t/\varepsilon ))^2 \cdot \log n)\).

Recently, Chiesa and Yogev [CY21a] have constructed a ROM-SNARG of proof length of \({O}( \log (t/\varepsilon ) \cdot \log t\cdot \log n)\), and hence slightly overcome the above “quadratic” barrier. Yet, the proof length of their construction is still far from the only (folklore) lower bound of \({\varOmega }(\log (t/\varepsilon ))\). Thus, the question of how to close this gap remains a major open question in this area.

1.2.2 SNARGs in Other Models

The security of SNARGs is unlikely to be proven in a non-idealized model (using falsifiable assumptions) Gentry and Wichs [GW11], but if one is willing to rely on “more structured” non-falsifiable assumptions (in addition or instead of the random oracle), much shorter SNARGs become feasible. Treating \(t\) as the running time of the adversary, constructions that use group-based and pairing-based assumptions achieve the optimal length (or close to optimal) of \(O(\log (t/\varepsilon ))\) (c.f., [Gro10, GGPR13, BCIOP13, BCCGP16, BBBPWM18, BFS20, PGHR13, MBKM19, CHMMVW20, Set19]). These constructions are insecure against quantum adversaries. Lattice based constructions, which are plausibly post-quantum, either achieve private-verifiability [BISW17, BISW18, GMNO18, ISW21, Nit19], or are public-verifiable, but with large proof length in practice (moreover, they typically use a random oracle as an additional assumption) [BBCPGL18, BLNS20, BCS21, CMSZ21]. (All of the above works assume a common random or reference string.)

To date, relying on the ROM is the best way to construct SNARGs that overcome all of the drawbacks mentioned above (alas, at the price of larger proofs).

Paper Organization

In Sect. 2, we give a high-level overview of the techniques for proving Lemma 1 (from short ROM-SNARGs to short ROM-SNARGs with low verifier query complexity). A formal definition of our notion of salted soundness, along with notations, definitions, and general statements used throughout the paper are given in Sect. 3. Theorem 3 (hitting high-entropy events using product sets) is proved in Sect. 4. Theorem 1 (lower bound on the length of ROM-SNARGs) and its accompanied Lemma 1 are proved in Sect. 5, and Theorem 2 (lower bound on the length of ROM subvector commitments) is proved in the full version of the paper.

2 Techniques

In this section, we give a high-level overview of our proof for Lemma 1, explaining how to transform a short salted-soundness, perfect completeness, deterministic non-adaptive verifier ROM-SNARG into a low verifier query ROM-SNARG for the same language.

Fix a deterministic non-adaptive ROM-SNARG \(\textsf{ARG} = (\textsf{P}, \textsf{V})\) for a language \({\mathcal {L}}\) with \((t,\varepsilon )\)-slated-soundness and perfect completeness. Let s denote the proof length \(\textsf{ARG}\), and let \(\mathsf {q_{\textsf{P}}}\) and \(\mathsf {q_{\textsf{V}}}\) denote the query complexity of \(\textsf{P}\) and \(\textsf{V}\), respectively.

2.1 Warmup

As a warmup, assume that the honestly generated proof \(\pi \), sent by \(\textsf{P}\), only contains information about outputs of \(k\) (“important”) queries, whose identity is independent of the oracle. (The proof might contain additional information depending only on the instance \(\mathbbm {x}\) and the witness \(\mathbbm {w}\).) For this simple scenario, the construction of a \(k\)-query \(\mathsf {V'}\) is rather straightforward:

Algorithm 4

(Low-query verifier \(\mathsf {V'}\) . Warmup).

Oracle: \(\zeta :\{0, 1\}^\lambda \mapsto \{0, 1\}^\lambda \).

Input: Instance \(\mathbbm {x}\) and a proof \(\pi \).

Operation: 

  1. 1.

    Emulate \(\textsf{V} (\mathbbm {x},\pi )\) till it produces a list of oracle queries \((w_1, \ldots , w_{\mathsf {q_{\textsf{V}}}})\). (Recall that \(\textsf{V}\) is non-adaptive.)

  2. 2.

    Sample a random \(k\)-size subset \({\mathcal {J}}\subseteq [\mathsf {q_{\textsf{V}}}]\).

  3. 3.

    For \(i=1,\ldots , \mathsf {q_{\textsf{V}}}\):

       If \(i \in {\mathcal {J}}\), set \(y_i = \zeta (w_i)\).

       Otherwise, sample \(y_i \leftarrow \{0, 1\}^\lambda \).

  4. 4.

    Accept if \(\textsf{V}\) accepts on the emulation with \((y_1, \ldots , y_{\mathsf {q_{\textsf{V}}}})\) as the answers to its oracle queries

Namely, \(\mathsf {V'}\) guesses the identity of the important queries, and then uses the oracle \(\zeta \) to answer them. It samples the answers to the other queries uniformly at random. The query complexity of \(\mathsf {V'}\) is small if the number of important queries is small. Let us quickly argue about the completeness and soundness of \(\mathsf {ARG'} = (\textsf{P},\mathsf {V'})\).

  • Completeness. If the set \({\mathcal {J}}\) happens to contain all important queries, then the given proof \(\pi \), the instance \(\mathbbm {x}\), and the witness \(\mathbbm {w}\), the oracle answers provided to the emulated \(\textsf{V}\) have exactly the same distribution as in its non-emulated execution. Since we assume \(\textsf{ARG}\) has perfect completeness, the completeness of \(\mathsf {ARG'}\) is at least \(1/\left| \mathsf {q_{\textsf{V}}}\atopwithdelims ()k\right| \)—the probability that \({\mathcal {J}}\) contains all important queries.

  • Soundness: Here we rely on the salted soundness of the original SNARG scheme. Assume there exists a \((t- \mathsf {q_{\textsf{V}}})\)-query cheating prover \(\widetilde{\mathsf {P'}}\) that makes \(\mathsf {V'}\) accept \(\mathbbm {x}\notin {\mathcal {L}}\) with probability \(\varepsilon \). Consider the following t-query cheating prover \(\widetilde{\textsf{P}}\) for violating the salted-soundness of \(\textsf{ARG}\).Footnote 8

    1. 1.

      Run \(\widetilde{\mathsf {P'}}^\zeta \) to generate a proof \(\pi \). Emulate \(\textsf{V} (\mathbbm {x},\pi )\) till it produces a list of oracle queries \((w_1, \ldots , w_{\mathsf {q_{\textsf{V}}}})\).

    2. 2.

      For \(i=1,\ldots , \mathsf {q_{\textsf{V}}}\):

      Query \(\zeta \) on \(w_i\) with a fresh salt. Set \(S_i = \{y_i\}\) for \(y_i\) be the query answer.

      If \(w_i\) was asked by \(\widetilde{\mathsf {P'}}\) in Step 1, add the retrieved answer to \(S_i\).

    3. 3.

      If there exists \((y_1, \ldots , y_{\mathsf {q_{\textsf{V}}}}) \in S_{1}\times \ldots \times S_{\mathsf {q_{\textsf{V}}}}\) that would make \(\textsf{V}\) accept \((\mathbbm {x},\pi )\) with \((y_1, \ldots , y_{\mathsf {q_{\textsf{V}}}})\) as the answers to its oracle queries, program \(\zeta (w_i) = y_i\) for each \(i \in [\mathsf {q_{\textsf{V}}}]\) (this programming is allowed by the salted soundness security game).

    4. 4.

      Output \(\pi \).

    By definition, if \(\widetilde{\textsf{P}}\) outputs a proof \(\pi \) then \(\textsf{V}\) accepts \(\pi \) on the programmed oracle. In addition, the probability that \(\widetilde{\textsf{P}}\) outputs the proof \(\pi \) generated in Step 1, is at least as large as the probability that \(\mathsf {V'} \) accepts \(\pi \) on the non-programmed oracle: \(\widetilde{\textsf{P}}\) considers for each query the original output of the oracle, as seen by \(\mathsf {V'} \) on queries in \({\mathcal {J}}\), and a uniform output, as sampled by \(\mathsf {V'} \) on inputs not in \({\mathcal {J}}\).

2.2 Actual Scenario

Things get way more challenging when the proof \(\pi \) depends on the queries made by \(\textsf{P}\), even in a slightly more complicated way. For instance, suppose \(\pi \) contains the XOR of some \(k\) queries, and \(\textsf{V}\) verifies that the XOR of these queries is consistent with \(\pi \). Since \(k\) might be arbitrarily large, i.e., much larger than \(\pi \), there is no low-query verifier that makes all these queries. So the challenge is to design a verifier that does not make all queries that effect the value of \(\pi \), but still has non-trivial soundness and completeness.

The key observation is that for the general case, where \(\pi \) depends arbitrarily on all oracle answers, we can modify the verifier so that the completeness and soundness are not that different from the naïve example considered in the warmup. Very informally, with high probability over the value of \(\pi \) and apart from \(k= s/\gamma \) “important” queries, the verification verdict does not depend “too much” on the answer to all other “non-important” queries. That is, there are many possible answers for the non-important queries that lead to acceptance (compared with all possible answers in the warmup case). See Sect. 2.3 for details. It follows that the answers for the non-important queries can be emulated by the verifier (without querying the oracle). Equipped with this understanding, the low query \(\mathsf {V'}\) is defined as follows:

Algorithm 5

(Low-query verifier \(\mathsf {V'}\) ).

Oracle: \(\zeta :\{0, 1\}^\lambda \mapsto \{0, 1\}^\lambda \).

Paramters: \(\gamma < \lambda \).

Input: Instance \(\mathbbm {x}\) and a proof \(\pi \).

Operation: 

  1. 1.

    Emulate \(\textsf{V} (\mathbbm {x},\pi )\) till it produces a list of oracle queries \((w_1, \ldots , w_{\mathsf {q_{\textsf{V}}}})\). (Recall that \(\textsf{V}\) is non-adaptive.)

  2. 2.

    Sample \(k' \in [k]\) at random and sample, a random \(k' = \left\lceil s/\gamma \right\rceil \)-size subset \({\mathcal {J}}\subseteq [\mathsf {q_{\textsf{V}}}]\).

  3. 3.

    For \(i=1,\ldots , \mathsf {q_{\textsf{V}}}\):

       If \(i \in {\mathcal {J}}\), set \(S_i = \{\zeta (w_i)\}\).

       Otherwise, let \(S_i\) be a \(2^\gamma \)-size random subset of \(\{0, 1\}^\lambda \).

  4. 4.

    Accept if there exists \((y_1, \ldots , y_{\mathsf {q_{\textsf{V}}}}) \in S_{1}\times \ldots \times S_{\mathsf {q_{\textsf{V}}}}\) that make \(\textsf{V}\) accepts on the emulation, with \((y_1, \ldots , y_{\mathsf {q_{\textsf{V}}}})\) as the answers to its oracle queries

That is, similar to the warmup scenario, \(\mathsf {V'}\) only uses the oracle to answer the \(k= \left\lceil s/\gamma \right\rceil \) queries in the guessed set \({\mathcal {J}}\). For each other query, \(\mathsf {V'}\) samples \(2^\gamma \) candidates answers. It accepts if there is a choice from the candidate answers that jointly with the oracle answers to the queries in \({\mathcal {J}}\), leads to acceptance. The running-time of \(\mathsf {V'}\) is (roughly) \(2^{ \mathsf {q_{\textsf{V}}}\cdot \gamma }\), and the following claim states the completeness and soundness of \(\mathsf {ARG'} = (\textsf{P},\mathsf {V'})\):

Claim

(Informal). \(\mathsf {ARG'}\) has \(\bigl ({\lambda \cdot \mathsf {q_{\textsf{P}}}} \cdot k \cdot {\mathsf {q_{\textsf{V}}}\atopwithdelims ()s/\gamma }\bigl )^{-1}\)-completeness and \(( t- \mathsf {q_{\textsf{V}}}\cdot 2^\gamma ,\varepsilon )\)-soundness.

We argue completeness in Sect. 2.3, using the observation we made above regarding the small number of important queries, and argue soundness in Sect. 2.4, by extending the approach we took for proving soundness in the warmup case.

2.3 Completeness

Let \(\varPi \) and denote the proof and the random oracle answers to honest prover \(\textsf{P}\) queries on instance \(\mathbbm {x}\) and witness \(\mathbbm {w}\), respectively. Since the \(Y_i\)’s are independent uniform values in \(\{0, 1\}^\lambda \), it holds that

$$\begin{aligned} H(Y) = \mathsf {q_{\textsf{P}}}\cdot \lambda \end{aligned}$$
(1)

where H(Y) is the Shannon entropy of Y. A standard entropy argument yields that with probability at least 1/2 over \(\pi \leftarrow \varPi \):

$$\begin{aligned} H(Y \mid \varPi = \pi ) \ge \mathsf {q_{\textsf{P}}}\cdot \lambda - 2|\pi | \end{aligned}$$
(2)

In the following, fix \(\pi \in \textsf{Supp}(\varPi )\) for which Equation (2) holds. Applying Theorem 3 with respect to \(Y |_{ \varPi = \pi }\) and \(\ell = 2|\pi |\), yields that with probability 1/2 over the value of there exists a set \({\mathcal {B}}\subseteq [\mathsf {q_{\textsf{P}}}]\) of size \(\ell /\gamma \) (omitting constant factors) such that

$$\begin{aligned} {\textrm{Pr}}\left[ (S_1 \times \dots \times S_{\mathsf {q_{\textsf{P}}}-\left| {\mathcal {B}}\right| }) \cap \textsf{Supp}(Y'_{[\mathsf {q_{\textsf{P}}}] \setminus {\mathcal {B}}}) \ne \emptyset \right] \in \varOmega ({1}/{\lambda \cdot \mathsf {q_{\textsf{P}}}}) \end{aligned}$$
(3)

where each of the \(S_i\)’s is an independent \(2^\gamma \)-size subset of \(\{0, 1\}^\lambda \), \(Y' \mathbin {{\mathop {=}\limits ^\textrm{def}}}Y|_{Y_{\mathcal {B}}= y_{\mathcal {B}},\varPi = \pi }\), and \(Y'_{\mathcal {I}}\) is the ordered vector \((Y'_i)_{i\in {\mathcal {I}}}\).

Assume for simplicity that \(\textsf{V}\) and \(\textsf{P}\) make exactly the same queries. By Equation (3), if the random set \({\mathcal {J}}\) (sampled by \(\mathsf {V'}\)) is exactly \({\mathcal {B}}= {\mathcal {B}}(\pi )\), then with probability \(\varOmega ({1}/{\lambda \cdot \mathsf {q_{\textsf{P}}}})\) over the choice of the sets \(S_i\)’s sampled by \(\mathsf {V'}\), exit answers \(\{y_j\in S_j\}_{j\notin {\mathcal {J}}}\) that when combined with the oracle answers \(\{y_j\in S_j\}_{j\in {\mathcal {J}}}\), it holds that \(y=(y_1,\ldots ,y_{\mathsf {q_{\textsf{P}}}}) \in \textsf{Supp}(Y |_{ \varPi = \pi })\). Since such a vector y is possible to occur as random oracle answers in an honest execution of \(\textsf{P}\) that results in \(\pi \), the perfect completeness of \(\textsf{ARG} \) yields that \(\textsf{V}\) accepts on (the answers in) y with probability one. We conclude that \(\mathsf {V'}\) accepts with probability \(\varOmega ({1}/{\lambda \cdot \mathsf {q_{\textsf{P}}}}) \) times \({\textrm{Pr}}\left[ {\mathcal {J}}= {\mathcal {B}}\right] \ge 1/k \cdot 1/{\mathsf {q_{\textsf{V}}}\atopwithdelims ()s/\gamma }\).

Remark 1

(Improved completeness). We note that one could slightly modify the transformation to improve the completeness significantly (at the cost of proof length and prover running time). However, as this does not improve our lower bound, we only sketch the idea here. Instead of having the verifier guess the set \({\mathcal {J}}\), let the prover find \({\mathcal {J}}\), and send its description to the verifier. The completeness error now would come only from the error in Equation (2) (i.e., an error of \((\lambda \cdot \mathsf {q_{\textsf{P}}})^{-1}\)), and not from the probability of choosing the right set \({\mathcal {J}}\). The proof would be slightly larger (as it needs to contain the description of \({\mathcal {J}}\)), and the running-time of the honest prover would increase, as it needs to find the right set \({\mathcal {J}}\) (query complexity will stay the same). Even more so, using a prefix salt for all queries (included in the proof), one can make the completeness error exponentially small.

2.4 Soundness

Assume there exists a \((t- \mathsf {q_{\textsf{V}}}\cdot 2^\gamma )\)-query cheating prover \(\widetilde{\mathsf {P'}}\) that makes \(\mathsf {V'}\) accepts \(\mathbbm {x}\notin {\mathcal {L}}\) with probability \(\varepsilon \), and consider the following t-query cheating prover \(\widetilde{\textsf{P}}\) for violating the salted-soundness of \(\textsf{ARG}\).

Algorithm 6

(\(\widetilde{\textsf{P}}\)).

Oracle: \(\zeta :\{0, 1\}^\lambda \mapsto \{0, 1\}^\lambda \).

Input: Instance \(\mathbbm {x}\).

  1. 1.

    Run \(\widetilde{\mathsf {P'}}^\zeta (\mathbbm {x})\) to generate a proof \(\pi \).

  2. 2.

    Emulate \(\textsf{V} \) on \((\mathbbm {x},\pi )\) to determine its list of oracle queries \((w_1, \ldots , w_{\mathsf {q_{\textsf{V}}}})\).

  3. 3.

    For \(i=1,\ldots , \mathsf {q_{\textsf{V}}}\):

    1. (a)

      Query \(\zeta \) on \(w_i\) for \(2^{\gamma }\) times. Let \(S_i\) be the set of answers.

    2. (b)

      If \(w_i\) was asked by \(\widetilde{\mathsf {P'}}\) in Step 1, add the retrieved answer to \(S_i\).

  4. 4.

    If there exists \((y_1, \ldots , y_{\mathsf {q_{\textsf{V}}}}) \in S_{1}\times \ldots \times S_{\mathsf {q_{\textsf{V}}}}\) that make \(\textsf{V}\) accept \((\mathbbm {x},\pi )\) with \((y_1, \ldots , y_{\mathsf {q_{\textsf{V}}}})\) as the answers to its oracle queries, program \(\zeta (w_i) = y_i\) for each \(i \in [\mathsf {q_{\textsf{V}}}]\).

  5. 5.

    Output \(\pi \).

The cheating probability of \(\widetilde{\textsf{P}}\) it as least as high as that of \(\widetilde{\mathsf {P'}}\). This is shown via a coupling argument, and the precise details are given in Sect. 5.2.2.

3 Preliminaries

3.1 Notations

We use calligraphic letters to denote sets, uppercase for random variables, and lowercase for values and functions. Let \(\textrm{poly}\) stand for the set of all polynomials. Throughout the paper, \(\log \) is the base 2 logarithm. For \(n \in \mathbb {N}\), let \([n] = \{1,\ldots ,n\}\). Given a vector \(v\in \Sigma ^n\), let \(v_i\) denote its ith entry. Similarly, for a set \({\mathcal {I}}\subseteq [n]\), let \(v_{\mathcal {I}}\) be the ordered sequence \((v_i)_{i\in {\mathcal {I}}}\), let \(v_{-{\mathcal {I}}} \mathbin {{\mathop {=}\limits ^\textrm{def}}}v_{[n] \setminus {\mathcal {I}}}\). For a set \({\mathcal {S}}\) and \(k\in \mathbb {N}\), let \({\mathcal {P}}_k({\mathcal {S}})\) denote all k-size subsets of \({\mathcal {S}}\). The support of a random variable X, denoted \(\textsf{Supp}(X)\), is defined as \(\{ x :{\textrm{Pr}}[X=x]> 0\}\). For an event E, we write \(X|_E\) to denote the random variable X conditioned on E.

The language \(\textrm{3SAT}\) over n variables is the set of all satisfiable formulas in conjunctive normal form where each clause is limited to at most three literals. The class \(\textrm{BPTIME}[T]\) refers to all languages that can be decided by a probabilistic TM that runs in time T(n), on inputs of length n.

Some basic inequalities. We use the following well-known facts:

Fact 7

\(\log (1-x) \le -x\) for \(x\in [0,1]\), and \(\log (1-x) \ge -2 x\), for any \(x\in [0,1/2]\).

Theorem 8

(Paley-Zygmund inequality). For any finite non-negative random variable X it holds that \({\textrm{Pr}}[X > 0]\ge {{\mathrm E}[X]^2}/{{\mathrm E}[X^2]}\) .

3.2 Entropy Measures

We refer to several measures of entropy. The relation and motivation of these measures are best understood by considering a notion that we will refer to as the sample-entropy: for a random variable X and \(x\in \textsf{Supp}(X)\), the sample-entropy of x with respect to X is the quantity

$$H_X(x) \mathbin {{\mathop {=}\limits ^\textrm{def}}}\log \tfrac{1}{{\textrm{Pr}}[X=x]},$$

letting \(H_X(x) = \infty \) for \(x\notin \textsf{Supp}(X)\), and \(2^{-\infty } = 0\).

The sample-entropy measures the amount of “randomness" or “surprise" in the specific sample x, assuming that x has been generated according to X. Using this notion, we can define the Shannon entropy \(H(X)\) and min-entropy \(\text {H}_{\infty }(X)\) as follows:

$$\begin{aligned} H(X) \mathbin {{\mathop {=}\limits ^\textrm{def}}}{\textrm{E}}_{x\leftarrow X}\left[ H_X(x)\right] , \quad \; \text {H}_{\infty }(X) \mathbin {{\mathop {=}\limits ^\textrm{def}}}\min _{x\in \textsf{Supp}(X)} H_X(x). \end{aligned}$$

We will also discuss the max-entropy \(H_0(X) \mathbin {{\mathop {=}\limits ^\textrm{def}}}\log |\textsf{Supp}(X)|\). The term “max-entropy” and its relation to the sample-entropy will be made apparent below.

It can be shown that \(\text {H}_{\infty }(X) \le H(X) \le H_0(X)\) with each inequality being an equality if and only if X is flat (uniform on its support). Thus, saying that \(\text {H}_{\infty }(X)\ge k\) is a strong way of saying that X has “high entropy” and \(H_0(X)\le k\) a strong way of saying that X has “low entropy”.

Conditional entropies. We will also be interested in conditional versions of entropy. For jointly distributed random variables (XY) and \((x,y)\in \textsf{Supp}(X,Y)\), we define the conditional sample-entropy to be \(H_{X| Y}(x| y) = \log \tfrac{1}{{\textrm{Pr}}_{X|Y}\left[ x|y\right] }=\log \tfrac{1}{{\textrm{Pr}}[X=x\mid Y=y]}\). Then the standard conditional Shannon entropy can be written as

$$H(X\mid Y) = {\textrm{E}}_{(x,y)\leftarrow (X,Y)}\left[ H_{X\mid Y}(x\mid y)\right] = {\textrm{E}}_{y\leftarrow Y}\left[ H(X|_{Y=y})\right] = H(X,Y)-H(Y).$$

The following fact gives a bound on the amount of entropy that is reduced when conditioning on an event for uniformly distributed random variables.

Fact 9

Let X be a random variable uniform over a set \({\mathcal {S}}\) and let W be an event. Then \(H(X\mid W) \ge \log (\left| {\mathcal {S}}\right| ) - \log 1/{\textrm{Pr}}\left[ W\right] \).

3.3 Randomized Exponential Time Hypothesis

Definition 1

(rETH; [DHMTW14]). The randomized Exponential Time Hypothesis (rETH) states that there exist \(\varepsilon > 0 \) and \(c > 1\) such that \(\textrm{3SAT}\) on n variables and with \(c \cdot n\) clauses cannot be solved by probabilistic algorithms that run in time \(2^{\varepsilon \cdot n}\).

3.4 Random Oracles

We denote by \(\mathcal {U}(\lambda )\) the uniform distribution over all functions \(\zeta :\{0, 1\}^{*} \rightarrow \{0, 1\}^\lambda \). Given an oracle algorithm \(\textsf{A} \) and an oracle \(\zeta \in \mathcal {U}(\lambda )\), \(\textsf{queries}(A, \zeta )\) is the set of oracle queries that \(\textsf{A} ^\zeta \) makes. We say that \(\textsf{A} \) is \(t\)-query if \(|\textsf{queries}(A, \zeta )| \le t\) for every \(\zeta \in \mathcal {U}(\lambda )\). We say that \(\textsf{A} \) is non-adaptive if its queries do not depend on the responses of the random oracle to previous queries. Finally, we consider the length of oracle queries, i.e., the number of bits used to specify the query: we say that \(\textsf{A} \) has queries of length \(\lambda \) if for every \(\zeta \in \mathcal {U}(\lambda )\) and \(x \in \textsf{queries}(\textsf{A}, \zeta )\) it holds that \(|x| \le \lambda \).

3.5 Non-interactive Arguments in the ROM

We consider non-interactive arguments in the ROM, where security holds against query-bounded, yet possibly computationally-unbounded, adversaries. Recall that a non-interactive argument typically consists of a prover algorithm and a verifier algorithm that prove and validate statements for a binary relation, which represents the valid instance-witness pairs.

A pair of polynomial-time oracle algorithms \(\textsf{ARG} = (\textsf{P}, \textsf{V})\) is a ROM-SNARG with \(\alpha \)-completeness and \( (t, \mathsf {\epsilon })\)-soundness, for a relation \(\mathcal {R}\), if the following holds.

  • Completeness. For every \(\lambda \in \mathbb {N}\) and \((\mathbbm {x}, \mathbbm {w}) \in \mathcal {R}\):

    $$\begin{aligned} \underset{\overset{\zeta \leftarrow \mathcal {U}(\lambda )}{\pi \leftarrow \textsf{P} ^{\zeta }(\mathbbm {x}, \mathbbm {w})} }{{\textrm{Pr}}}\left[ \textsf{V} ^{\zeta }(\mathbbm {x}, \pi )=1\right] \ge \alpha (\left| \mathbbm {x}\right| , \lambda ) . \end{aligned}$$
  • Soundness.Footnote 9 For every \(\lambda \in \mathbb {N}\), \(t\)-query \(\widetilde{\textsf{P}}\) and \(\mathbbm {x}\notin {\mathcal {L}}(\mathcal {R})\):

    $$\begin{aligned} \underset{\overset{\zeta \leftarrow \mathcal {U}(\lambda )}{\pi \leftarrow \widetilde{\textsf{P}}^{\zeta }} }{{\textrm{Pr}}}\left[ \textsf{V} ^{\zeta }(\mathbbm {x}, \pi )=1\right] \ge \mathsf {\epsilon }(\left| \mathbbm {x}\right| , \lambda , t) . \end{aligned}$$

Complexity measures. We consider several complexity measures beyond soundness error. All of these complexity measures are, implicitly, functions of \(\mathbbm {x}\) and the security parameter \(\lambda \).

  • argument length: \(\textsf{s}:= |\pi | \).

  • times: the prover \(\textsf{P}\) runs in time \(\textsf{pt}\); the verifier \(\textsf{V}\) runs in time \(\textsf{vt}\).

  • queries: the prover \(\textsf{P}\) is a \(\mathsf {q_{\textsf{P}}}\)-query algorithm the verifier \(\textsf{V}\) is a \(\mathsf {q_{\textsf{V}}}\)-query algorithm.

3.5.1 Salted Soundness

Chiesa and Yogev [CY20] introduced a stronger notion of soundness for ROM-SNARG that they named salted soundness. This notion requires soundness to hold also against a malicious prover that has limited ability to program the oracle: it can obtain a set of random, independent strings as candidates for random oracle answers to a specific query. After obtaining such sets to the queries of his choice, the malicious prover can pick an answer of his desire from each set to be the random oracle answer.Footnote 10 This notion is formalized via the following salted soundness game defined as follows:

Game 10

(\(\textsf{SaltedSoundess}_{{\textsf{V}}, \lambda , t}(\textsf{A},\mathbbm {x})\)). 

Parameters: Algorithm \(\textsf{V}\) and \(\lambda ,t\in \mathbb {N}\).

Input: \(\mathbbm {x}\in \{0, 1\}^*\)

Player: \(\textsf{A}\).

Operation:  

  1. 1.

    Initialize keyed-map \(S\) of lists (each entry is initialized with the empty list).

  2. 2.

    Repeat the following \(t\) times:

    1. (a)

      \({\textsf{A}}\) sends a query \(x\in \{0, 1\}^*\).

    2. (b)

      Send \(y \leftarrow \{0, 1\}^{\lambda }\) to \(\textsf{A} \), and add it to the list \(S[x]\).

  3. 3.

    \({\textsf{A}}\) outputs a proof string \(\pi \) and query-answer list \(\sigma = [(x_1, y_1), \ldots , (x_n, y_n)]\).

  4. 4.

    Abort if \(y_i \notin S[x_i]\) for some \( i \in [n]\).

  5. 5.

    Output \(\textsf{V} ^{\zeta _{\sigma }}(\mathbbm {x}, \pi )\).

Definition 2

(Salted soundness). We say that ROM-SNARG \((\textsf{P}, \textsf{V})\) has \((t, \varepsilon )\)-salted-soundness for a language \({\mathcal {L}}\), if for any \(\lambda \), \(\mathbbm {x}\notin {\mathcal {L}}\) and \(\widetilde{\textsf{P}}\) it holds that \({\textrm{Pr}}\left[ \textsf{SaltedSoundess}_{{\textsf{V}}, \lambda , t}(\widetilde{\textsf{P}},\mathbbm {x})=1\right] \le \varepsilon (\left| \mathbbm {x}\right| ,\lambda , t)\).

Remark 2

(Known constructions satisfy salted soundness). Known constructions of ROM-SNARGs are usually proven to have standard soundness (as opposed to salted soundness). However, we observe that the constructions of [Mic00, BCS16, CY21b, CY21a] actually achieve this stronger notion of security. In particular, the tight analysis given in [CY21b] and in [CY21a] explicitly allowed the adversary to choose a salt for each query in the construction (e.g., see remark 3.2 in [CY21b]).

Amplification. It turns out that salted soundness can be easily amplified (at the expense of the query complexity). The proof of Lemma 2 is proved in the full version of the paper.

Lemma 2

Let \(\textsf{ARG} \) be an ROM-SNARG for a language \({\mathcal {L}}\) with \((t,\varepsilon )\)-salted-soundness for \(\varepsilon \le 1/4\). Then \(\textsf{ARG} \) has \((t/k, 2\varepsilon /k)\)-salted-soundness for any \(k\in \mathbb {N}\).

4 Hitting High-Entropy Distribution Using Product Sets

In this section we formally state and prove Theorem 3. Recall that for a set \({\mathcal {S}}\) and \(k\in \mathbb {N}\), we let \({\mathcal {P}}_k({\mathcal {S}})\) denote all k-size subsets of \({\mathcal {S}}\). Thus, a uniform sample from \(({\mathcal {P}}_{2^\gamma }(\{0, 1\}^\lambda ))^{ m-\left| {\mathcal {B}}\right| }\) is a random product in \((\{0, 1\}^{\lambda })^{m-\left| {\mathcal {B}}\right| }\) of width \(2^\gamma \).

Theorem 11

(Hitting high-entropy distributions using product sets, restatement of Theorem 3). Let \(\gamma \le \lambda \in \mathbb {N}\), and let \(X=(X_1, \ldots , X_m)\) be a random variable over \((\{0, 1\}^\lambda )^m\). If \(H(X) \ge \lambda m - \ell \) and \(\gamma \ge 4\left\lceil \log m\right\rceil + 4\), then with probability at least 1/2 over \(x\leftarrow X\), then there exists a set \({\mathcal {B}}\subseteq [m]\) of size at most \(8\ell /{\gamma }+4\) such that

$$\begin{aligned} {\textrm{Pr}}_{S \leftarrow \left( {\mathcal {P}}_{2^\gamma }(\{0, 1\}^\lambda )\right) ^{ m-\left| {\mathcal {B}}\right| }}\left[ S \cap \textsf{Supp}(X_{[m] \setminus {\mathcal {B}}} \mid _{X_{{\mathcal {B}}} = x_{{\mathcal {B}}}}) \ne \emptyset \right] \ge {1}/{32 \lambda m}.\end{aligned}$$

Remark 3

(Tightness of Theorem 11). The size of \({\mathcal {B}}\) in Theorem 11 is tight up to a constant: Let \(m ,\lambda ,\gamma \in \mathbb {N}\) be as in Theorem 11, let \(X= (X_1,\ldots ,X_m)\) be uniform over \((\{0, 1\}^\lambda )^m\) and let W be the event that \(X_{1}= \ldots = X_t = 0^\lambda \), for some \(t\in [m]\). Clearly, \(H(X|_W) = (m-t)\lambda \). It is also clear that for every x and every set \({\mathcal {B}}\subseteq [m]\) of size \(t' < t\), it holds that

$$\begin{aligned} {\textrm{Pr}}_{S \leftarrow \left( {\mathcal {P}}_{2^\gamma }(\{0, 1\}^\lambda )\right) ^{m- t'}}\left[ S \cap \textsf{Supp}(X_{[m] \setminus {\mathcal {B}}} \mid _{X_{{\mathcal {B}}} = x_{{\mathcal {B}}}}) \ne \emptyset \right] \le 2^{\gamma - \lambda }, \end{aligned}$$

which is negligible for sufficiently small \(\gamma \), e.g., \(\gamma = \lambda /2\). This matches, up to a constant, Theorem 11, which states that with high probability over \(x\leftarrow X\mid _{W}\), there exists a set \({\mathcal {B}}\) of size at most \(16 t + 4\) for which that the above event occurs with probability at least \(1/32\lambda m\).

Proving Theorem 11. We start with describing the high-level approach of the proof. We need to prove that with high probability over \(x \leftarrow X\), there exists a small (i.e., with size at most \(8\ell /{\gamma }+4\)) subset \({\mathcal {B}}\subseteq [m]\) such that

$$\begin{aligned} {\textrm{Pr}}_{S \leftarrow \left( {\mathcal {P}}_{2^\gamma }(\{0, 1\}^\lambda )\right) ^{{\widehat{m}}}}\left[ S \cap \textsf{Supp}({\widehat{X}}) \ne \emptyset \right] \ge {1}/{32 \lambda m}, \end{aligned}$$

for \({\widehat{X}}= X_{[m] \setminus {\mathcal {B}}} \mid _{X_{\mathcal {B}}= x_{\mathcal {B}}}\) and \({\widehat{m}}= m- \left| {\mathcal {B}}\right| \). We assume, without loss of generality, that the elements of each \(S_i\) are chosen in a uniform order, and denote the jth element of \(S_i\), according to this order, by \(S_i[j]\). For \(y = (y_1,\ldots ,y_{\widehat{m}}) \in [2^{\gamma }]^{\widehat{m}}\), let \(S^y \in \{0, 1\}^{\lambda \times {\widehat{m}}}\) be the random variable defined by \((S^y)_i = S_i[y_i]\). Let \(Z^y\) be the indicator for the event \(S^y\in \textsf{Supp}({\widehat{X}})\), and let \(Z \mathbin {{\mathop {=}\limits ^\textrm{def}}}\sum _{y\in [2^{\gamma }]^{\widehat{m}}} Z^y\). That is, \(Z^y\) is event that the yth element of S is in \(\textsf{Supp}({\widehat{X}})\). Given this notation, we need to prove that \({\textrm{Pr}}\left[ Z >0\right] \ge 1/32\lambda m\). We start by proving that the expected value of Z is large. By linearity of expectation,

$$\begin{aligned} {\textrm{E}}\left[ Z\right] = \sum _{y\in [2^{\gamma }]^{\widehat{m}}} {\textrm{E}}\left[ Z^y\right] = 2^{\gamma {\widehat{m}}} \cdot |\textsf{Supp}({\widehat{X}})|/ 2^{{\widehat{m}}\lambda } = 2^{(\gamma - \lambda ){\widehat{m}}}\cdot |\textsf{Supp}({\widehat{X}})| \end{aligned}$$
(4)

To guarantee that \({\textrm{E}}\left[ Z\right] \) is at least one, we chose \({\mathcal {B}}\) to be a maximal subset of [m] with

$$\begin{aligned} H_{X_{{\mathcal {B}}}}(x_{{\mathcal {B}}}) \le (\lambda - \gamma )\cdot \left| {\mathcal {B}}\right| \end{aligned}$$
(5)

for \(H_Y(y)\) be the sample entropy of y according to Y (see Sect. 3.2). It is rather straightforward to show that with respect to this choice of \({\mathcal {B}}\), the expected value of Z is indeed at least one. Furthermore, since, by assumption, X has high entropy, the expected size of \({\mathcal {B}}\), as a function of x, is small, and therefore, with high probability over x the size of \({\mathcal {B}}\) is also small. (See proof in Lemma 3).

The above would suffice for lower-bounding \({\textrm{Pr}}\left[ Z>0\right] \), if the random variables \(\{Z^y\}\) would have been independent. This, however, is clearly not the case since most \(Z^y\) are not even pairwise independent: for a pair \(y,y'\in [2^{\gamma }]^{\widehat{m}}\) with \(y_{\mathcal {I}}= y'_{\mathcal {I}}\) for some \({\mathcal {I}}\subseteq [{\widehat{m}}]\), the event \(Z^{y}=1\), implying \((S^{y'})_{\mathcal {I}}\in \textsf{Supp}({\widehat{X}}_{\mathcal {I}})\), is likely to increase the probability of \(Z^{y'}=1\). Yet, we manage to show that the expected value of \(Z^2\) is small enough, implying that Z is well concentrated around its mean, and therefore \({\textrm{Pr}}\left[ Z>0\right] \) is large. To do that, we notice that for the maximal set \({\mathcal {B}}\) defined above, it holds that

$$\begin{aligned} H_{X_{{\mathcal {I}}} \mid _{ X_{{\mathcal {B}}} = x_{{\mathcal {B}}}}}(x_{{\mathcal {I}}}) > (\lambda - \gamma ) \cdot \left| {\mathcal {I}}\right| \end{aligned}$$
(6)

for every \({\mathcal {I}}\subseteq [m]\setminus {\mathcal {B}}\). This condition implies that for every \(y,y'\) with \(y_{\mathcal {I}}= y'_{\mathcal {I}}\), the probability of \(Z^y \wedge Z^{y'}\) is sufficiently small (quantified by the size of \({\mathcal {I}}\)), implying that \({\textrm{E}}\left[ Z^2\right] \) is small.

Moving to the formal proof, Theorem 11 is an immediate corollary of the following two lemmata: Lemma 3 states that with high probability over x, there exists a small set \({\mathcal {B}}\) for which Equation (6) holds, and Lemma 4 completes the job by proving the conclusion of the theorem for the random variable \(X_{[m] \setminus {\mathcal {B}}} \mid _{X_{\mathcal {B}}= x_{\mathcal {B}}}\).

Lemma 3

(High-entropy events have an almost full-entropy large projection). Let \(\gamma \le \lambda \in \mathbb {N}\), and let \(X=(X_1, \ldots , X_m)\) be a random variable over \((\{0, 1\}^\lambda )^m\). If \(H(X) \ge \lambda \cdot m - \ell \) and \(\gamma \ge 2\cdot \left\lceil \log m\right\rceil + 2\), then with probability at least 1/2 over \(x\leftarrow X\), exists a set \({\mathcal {B}}\subseteq [m]\) of size at most \(4\ell /\gamma + 4\) such that for every \({\mathcal {I}}\subseteq [m]\setminus {\mathcal {B}}\):

$$\begin{aligned} H_{X_{{\mathcal {I}}} \mid _{ X_{{\mathcal {B}}} = x_{{\mathcal {B}}}}}(x_{{\mathcal {I}}}) \ge (\lambda - \gamma ) \left| {\mathcal {I}}\right| . \end{aligned}$$

Lemma 4

(Hitting almost full-entropy events using product sets). Let \(\gamma \le \lambda \in \mathbb {N}\), let \(X=(X_1, \ldots , X_m)\) be a random variable over \((\{0, 1\}^\lambda )^m\). Assume \(\gamma \ge 2\cdot \left\lceil \log m\right\rceil +3\), and that for every \(x\in \textsf{Supp}(X)\) and \({\mathcal {I}}\subseteq [m]\), it holds that \(H_{X_{{\mathcal {I}}}}(x_{{\mathcal {I}}})\ge (\lambda - \gamma /2) \cdot \left| {\mathcal {I}}\right| \). Then

$$\begin{aligned} {\textrm{Pr}}_{S \leftarrow \left( {\mathcal {P}}_{2^\gamma }(\{0, 1\}^\lambda )\right) ^{m}} \left[ S \cap \textsf{Supp}(X) \ne \emptyset \right] \ge 1/{32 \lambda m}. \end{aligned}$$

We prove Lemmas 3 and 4 in Sects. 4.1 and 4.2, receptively, but first use them for proving Theorem 11.

Proof

of Theorem 11: Let \(t \mathbin {{\mathop {=}\limits ^\textrm{def}}}8\ell /\gamma + 4\), and let

$$\begin{aligned} {\mathcal {T}}\mathbin {{\mathop {=}\limits ^\textrm{def}}}\{x\in \textsf{Supp}(X) \, :\, \exists {\mathcal {B}}\subseteq [m], \left| {\mathcal {B}}\right| \le t :\forall {\mathcal {I}}\subseteq [m]\setminus {\mathcal {B}}, H_{X_{{\mathcal {I}}} \mid _{X_{{\mathcal {B}}} = x_{{\mathcal {B}}}}}(x_{{\mathcal {I}}}) \ge (\lambda - \gamma /2)\cdot \left| {\mathcal {I}}\right| \} . \end{aligned}$$

Since, by assumption, \(\gamma /2 \ge 2\left\lceil \log m\right\rceil +2\), Lemma 3 yields that

$$\begin{aligned} {\textrm{Pr}}\left[ X\in {\mathcal {T}}\right] \ge 1/2 . \end{aligned}$$
(7)

Fix \(x\in {\mathcal {T}}\), let \({\mathcal {B}}\) be the set guaranteed by the definition of \({\mathcal {T}}\) (choose an arbitrary one, if there is more than one), and let \(X' \mathbin {{\mathop {=}\limits ^\textrm{def}}}X_{[m] \setminus {\mathcal {B}}}|_{X_{{\mathcal {B}}} = x_{{\mathcal {B}}}}\), and let \(m' \mathbin {{\mathop {=}\limits ^\textrm{def}}}m-\left| {\mathcal {B}}\right| \). By Lemma 4

$$\begin{aligned} {\textrm{Pr}}_{S \leftarrow \left( {\mathcal {P}}_{2^\gamma }(\{0, 1\}^\lambda )\right) ^{m'}} \left[ S \cap \textsf{Supp}(X') \ne \emptyset \right] \ge 1/{32\lambda m'} \ge 1/{32 \lambda m} . \end{aligned}$$
(8)

Combining Equations (7) and (8), concludes the proof.    \(\square \)

4.1 High-Entropy Distributions Have an (Almost) Uniform Large Projection, Proving Lemma 3

Proof

of Lemma 3. Let \(m ,\lambda ,\gamma \) and X be as in Lemma 3. For \(x \in \textsf{Supp}(X)\), let \({\mathcal {B}}^x\) be the (lex. first) maximalFootnote 11 subset of [m] with

$$\begin{aligned} H_{X_{{\mathcal {B}}^x}}(x_{{\mathcal {B}}^x}) \le (\lambda - \gamma ) \left| {\mathcal {B}}^x\right| \end{aligned}$$
(9)

Since Equation (9) holds for the empty set, \({\mathcal {B}}^x\) is always defined. We prove Lemma 3 using the following two claims, proven below.

Claim

For every \(x \in \textsf{Supp}(X)\) and \({\mathcal {I}}\subseteq [m] \setminus {\mathcal {B}}^x\), it holds that \(H_{X_{{\mathcal {I}}} \mid X_{{\mathcal {B}}^x} = x_{{\mathcal {B}}^x}}(x_I)\ge (\lambda - \gamma )\cdot \left| {\mathcal {I}}\right| \).

Claim

If \(H(X)\ge \lambda \cdot m - \ell \), then for every random variable \(I \subseteq [m]\) it holds that \(H(X_I \mid I) \ge (\lambda -\left\lceil \log m\right\rceil )\cdot {\textrm{E}}\left[ \left| I\right| \right] - \ell - \left\lceil \log m\right\rceil \).

By Sect. 4.1, for every \(x \in \textsf{Supp}(X)\) and \({\mathcal {I}}\subseteq [m] \setminus {\mathcal {B}}^x\), it holds that

$$\begin{aligned} H_{X_{{\mathcal {I}}} \mid X_{{\mathcal {B}}^x} = x_{{\mathcal {B}}^x}}(x_I)\ge (\lambda - \gamma ) \left| {\mathcal {I}}\right| \end{aligned}$$
(10)

Hence, to conclude the proof, it is left to argue that with high probability over \(x\leftarrow X\), the size of \({\mathcal {B}}^x\) is small. For \({\mathcal {I}}\subseteq [m]\), let \(f_{\mathcal {I}}(x) =x_{\mathcal {I}}\) if \({\mathcal {B}}^x ={\mathcal {I}}\), and \(f_{\mathcal {I}}(x) =\; \perp \) otherwise, and let \(p_{\mathcal {I}}= {\textrm{Pr}}\left[ f_{\mathcal {I}}(X) =\; \perp \right] \). Compute

$$\begin{aligned}&H(X_{{\mathcal {B}}^X} \mid {\mathcal {B}}^X) = {\textrm{E}}_{{\mathcal {B}}\leftarrow {\mathcal {B}}^X}\left[ H( X_{\mathcal {B}}\mid {\mathcal {B}}^X = {\mathcal {B}})\right] \\&= {\textrm{E}}_{{\mathcal {B}}\leftarrow {\mathcal {B}}^X}\left[ H(f_{\mathcal {B}}(X ) \mid {\mathcal {B}}^X = {\mathcal {B}})\right] \nonumber \le \sum _{{\mathcal {I}}} {\textrm{E}}_{{{\mathcal {B}}} \leftarrow {\mathcal {B}}^X}\left[ H(f_{\mathcal {I}}(X ) \mid {\mathcal {B}}^X = {\mathcal {B}})\right] \nonumber \\&= \sum _{{\mathcal {I}}} H( f_{\mathcal {I}}(X ) \mid {\mathcal {B}}^X) \nonumber \le \sum _{{\mathcal {I}}} H( f_{\mathcal {I}}(X ) )\nonumber \\&= \sum _{{\mathcal {I}}} \bigl (\sum _{x:{\mathcal {B}}^x = {\mathcal {I}}} {\textrm{Pr}}\left[ X=x\right] \cdot H_{X_{\mathcal {I}}}(x_{\mathcal {I}})\bigl ) + p_{\mathcal {I}}\cdot \log (1/p_{\mathcal {I}})\nonumber \\&\le \sum _{{\mathcal {I}}} {\textrm{Pr}}\left[ {\mathcal {B}}^X ={\mathcal {I}}\right] \cdot (\lambda - \gamma ) \cdot \left| {\mathcal {I}}\right| + p_{\mathcal {I}}\cdot \log (1/p_{\mathcal {I}})\end{aligned}$$
(11)
$$\begin{aligned}&= (\lambda - \gamma ) {\textrm{E}}\left[ \left| {\mathcal {B}}^X\right| \right] + \sum _{{\mathcal {I}}} p_{\mathcal {I}}\cdot \log (1/p_{\mathcal {I}})\nonumber \\&\le (\lambda - \gamma ) {\textrm{E}}\left[ \left| {\mathcal {B}}^X\right| \right] + 1+ \sum _{{\mathcal {I}}, p_{\mathcal {I}}\ge 1/2 } -p_{\mathcal {I}}\cdot \log (p_{\mathcal {I}})\nonumber \\&\le (\lambda - \gamma ) {\textrm{E}}\left[ \left| {\mathcal {B}}^X\right| \right] + 1+ \sum _{{\mathcal {I}}, p_{\mathcal {I}}\ge 1/2 } p_{\mathcal {I}}\cdot 2(1-p_{\mathcal {I}})\end{aligned}$$
(12)
$$\begin{aligned}&= (\lambda - \gamma ) {\textrm{E}}\left[ \left| {\mathcal {B}}^X\right| \right] + 1+ 2 \cdot \sum _{{\mathcal {I}}, p_{\mathcal {I}}\ge 1/2 } p_{\mathcal {I}}\cdot {\textrm{Pr}}\left[ {\mathcal {B}}^X ={\mathcal {I}}\right] \nonumber \\&\le (\lambda - \gamma ) {\textrm{E}}\left[ \left| {\mathcal {B}}^X\right| \right] + 3. \nonumber \end{aligned}$$
(13)

Inequality 12 holds by the definition of \({\mathcal {B}}^x\), and Inequality 13 holds since \(\log (1-x) \ge -2x\) for \(x\in [0,1/2]\).

On the other hand since, by assumption, \(H(X)\ge \lambda \cdot m - \ell \), Sect. 4.1 yields that

$$\begin{aligned} H(X_{{\mathcal {B}}^X}\mid {\mathcal {B}}^X) \ge (\lambda -\left\lceil \log m\right\rceil ) \cdot {\textrm{E}}\left[ \left| {\mathcal {B}}^X\right| \right] - \ell -\left\lceil \log m\right\rceil \end{aligned}$$
(14)

Combining Equations (11) and (14), we conclude that \({\textrm{E}}\left[ \left| {\mathcal {B}}^X\right| \right] \le \frac{\ell + \left\lceil \log m\right\rceil +3}{\gamma -\left\lceil \log m\right\rceil }\le {2\ell }/{\gamma }+2\), where the 2nd inequality follows from the fact that \(\gamma \ge 2\cdot \left\lceil \log m\right\rceil +3\). The proof follows by Markov inequality.    \(\square \)

Proving Section 4.1 .

Proof

of Section 4.1. Let \({\mathcal {B}}= {\mathcal {B}}^x\). Since for every disjoint sets \({\mathcal {A}},{\mathcal {C}}\subseteq [m]\) and \(x \in \textsf{Supp}(X)\)

$$\begin{aligned} {\textrm{Pr}}[X_{\mathcal {A}}= x_{\mathcal {A}}] \cdot {\textrm{Pr}}[X_{\mathcal {C}}= x_{\mathcal {C}}\mid X_{\mathcal {A}}= x_{\mathcal {A}}] ={\textrm{Pr}}[X_{{\mathcal {A}}\cup {\mathcal {C}}} = x_{{\mathcal {A}}\cup {\mathcal {C}}}], \end{aligned}$$

for every \({\mathcal {I}}\subseteq [m]\setminus {\mathcal {B}}\)

$$\begin{aligned} H_{X_{{\mathcal {B}}}}(x_{{\mathcal {B}}}) + H_{X_{{\mathcal {I}}}\mid _{ X_{{\mathcal {B}}} = x_{{\mathcal {B}}}}}(x_{{\mathcal {I}}})= H_{X_{{\mathcal {I}}\cup {\mathcal {B}}}}(x_{{\mathcal {I}}\cup {\mathcal {B}}}). \end{aligned}$$

Assume towards a contradiction that \(H_{X_{\mathcal {I}}\mid X_{{\mathcal {B}}} = x_{{\mathcal {B}}}}(x_{\mathcal {I}}) < (\lambda - \gamma ) \left| {\mathcal {I}}\right| \). Since, by definition, \(H_{X_{{\mathcal {B}}}}(x_{{\mathcal {B}}}) \le (\lambda - \gamma ) \left| {\mathcal {B}}\right| \), it follows that

$$\begin{aligned} H_{X_{{\mathcal {I}}\cup {\mathcal {B}}}}(x_{{\mathcal {I}}\cup {\mathcal {B}}}) < (\lambda - \gamma ) \cdot (\left| {\mathcal {B}}\right| + \left| {\mathcal {I}}\right| ) = (\lambda - \gamma ) \cdot \left| {\mathcal {B}}\cup {\mathcal {I}}\right| , \end{aligned}$$

in contradiction to the maximality of \({\mathcal {B}}\).    \(\square \)

Proving Section 4.1 .

Proof

Since, by assumption, \(H(X) \ge \lambda m - \ell \), and since

$$\begin{aligned} H( I) = H( I,\left| I\right| ) \le \left\lceil \log m\right\rceil + H(I\mid \left| I\right| ) \le \left\lceil \log m\right\rceil + {\textrm{E}}\left[ \left| I\right| \right] \cdot \left\lceil \log m\right\rceil = \left\lceil \log m\right\rceil ({\textrm{E}}\left[ \left| I\right| \right] +1), \end{aligned}$$

we conclude that

$$\begin{aligned} H(X\mid I) \ge \lambda m - \ell - ({\mathrm E}_{x\leftarrow X}[\left| I\right| ]+1)\left\lceil \log m\right\rceil \end{aligned}$$
(15)

Therefore,

$$\begin{aligned} H(X \mid I) = H(X_I, X_{[m] \setminus I} \mid I ) \le H(X_I\mid I)+ H( X_{[m] \setminus I} \mid I) \end{aligned}$$
(16)

Finally, since \(H(X_{[m] \setminus I} \mid I) \le H_0(X_{[m] \setminus I}) \mid I ) \le \lambda \cdot (m - {\mathrm E}_{x\leftarrow X}[\left| I\right| ])\), we conclude that

$$\begin{aligned} H(X_I\mid I ) \ge&\lambda \cdot m - \ell - \left\lceil \log m\right\rceil ({\textrm{E}}\left[ \left| I\right| \right] +1) - \lambda \cdot (m - {\textrm{E}}\left[ \left| I\right| \right] ) \\ =&(\lambda - \left\lceil \log m\right\rceil )\cdot {\textrm{E}}\left[ \left| I\right| \right] - \ell - \left\lceil \log m\right\rceil . \end{aligned}$$

   \(\square \)

4.2 Hitting Almost Full-Entropy Distributions Using Product Set, Proving Lemma 4

We start by proving the following variant of Lemma 4, stated for flat distributions, i.e., X is uniform over a set. In Sect. 4.2.1, we use this variant for proving Lemma 4.

Lemma 5

(Hitting flat distributions). Let \(m,\gamma \le \lambda \in \mathbb {N}\) be such that \(\gamma \ge 2\cdot \left\lceil \log m\right\rceil +2\), let \(\delta >0\), and let \({\mathcal {T}}\subseteq \{0, 1\}^{\lambda \cdot m}\) be a non empty set. If for all \({\mathcal {I}}\subseteq [m]\) and \(a \in \{0, 1\}^{\lambda \cdot \left| {\mathcal {I}}\right| }\), it holds that

$$\begin{aligned} \left| \{x \in {\mathcal {T}}:x_{\mathcal {I}}= a\}\right| \le {\left| {\mathcal {T}}\right| } \cdot 2^{(\gamma /2 -\lambda ) \left| {\mathcal {I}}\right| }/\delta , \end{aligned}$$
(17)

then

$$\begin{aligned} {\textrm{Pr}}_{S \leftarrow \left( {\mathcal {P}}_{2^\gamma }(\{0, 1\}^\lambda )\right) ^{m}} \left[ S \cap {\mathcal {T}}\ne \emptyset \right] \ge \delta /2. \end{aligned}$$

Proof

Let \(S=(S_1,\ldots ,S_m)\) be as in the lemma statement, i.e., uniformly distributed over \( \left( {\mathcal {P}}_{2^\gamma }(\{0, 1\}^\lambda )\right) ^{m}\). We assume, without loss of generality, that the elements of each \(S_i\) are chosen in a uniform order and denote the jth element of \(S_i\), according to this order, by \(S_i[j]\). For \(y = (y_1,\ldots ,y_m) \in [2^{\gamma }]^m\), let \(S^y \in \{0, 1\}^{\lambda \times m}\) be the random variable defined by \((S^y)_i \mathbin {{\mathop {=}\limits ^\textrm{def}}}S_i[y_i]\). Let \(Z^y\) be the indicator for the event \(S^y\in {\mathcal {T}}\), and let \(Z \mathbin {{\mathop {=}\limits ^\textrm{def}}}\sum _{y\in [2^{\gamma }]^m} Z^y\). By the Paley-Zygmund inequality, Theorem 8, it holds that

$$\begin{aligned} {\textrm{Pr}}_{S \leftarrow \left( {\mathcal {P}}_{2^\gamma }(\{0, 1\}^\lambda )\right) ^{m}} \left[ S \cap {\mathcal {T}}\ne \emptyset \right] = {\textrm{Pr}}[Z>0] \ge {{\mathrm E}[Z]^2}/{{\mathrm E}[Z^2]} . \end{aligned}$$
(18)

Thus, we prove Lemma 5 by properly bounding \({\mathrm E}[Z]\) and \({\mathrm E}[Z^2]\). Let \(\rho \mathbin {{\mathop {=}\limits ^\textrm{def}}}\frac{\left| {\mathcal {T}}\right| }{2^{m\lambda }}\). Since we associate a random order with the elements of each \(S_i\), for every \(y \in [2^{\gamma }]^m\) it holds that \({\textrm{E}}\left[ Z^y\right] = \rho \). Hence,

$$\begin{aligned} {\textrm{E}}\left[ Z\right] = \sum _{y\in [2^{\gamma }]^m} {\textrm{E}}\left[ Z^y\right] = 2^{\gamma m} \rho . \end{aligned}$$
(19)

For upper bounding \({\mathrm E}[Z^2]\), we use the following claim (proved in Sect. 4.2). In the following for \(y, y' \in [2^{\gamma }]^m\), let \({\mathcal {K}}_{y,y'} \mathbin {{\mathop {=}\limits ^\textrm{def}}}\{i \in [m] : y_i = y'_i\}\).

Claim 12

For every \(y, y' \in [2^{\gamma }]^m\) it holds that \({\textrm{Pr}}[Z^y \wedge Z^{y'}] \le 2^{\gamma \cdot |{\mathcal {K}}_{y,y'}|/2} \cdot \rho ^2/\delta \).

For \({\mathcal {K}}\subseteq [m]\), let \({\mathcal {A}}_{\mathcal {K}}\mathbin {{\mathop {=}\limits ^\textrm{def}}}\{(y,y')\in [2^{\gamma }]^{m}:{\mathcal {K}}_{y,y'} ={\mathcal {K}}\}\). Using Claim 12, we deuce that

$$\begin{aligned} {\textrm{E}}\left[ Z^2\right]&= \sum _{y, y'\in [2^{\gamma }]^m}{\textrm{Pr}}[Z^y\wedge Z^{y'}] \\&= \sum _{{\mathcal {K}}\subseteq [m]}\;\;\sum _{y,y'\in {\mathcal {A}}_{\mathcal {K}}} {\textrm{Pr}}[Z^y\wedge Z^{y'}] \nonumber \\&\le \sum _{{\mathcal {K}}\subseteq [m]}\;\;\sum _{y,y'\in {\mathcal {A}}_{\mathcal {K}}} 2^{\gamma \left| {\mathcal {K}}\right| /2}\cdot \rho ^2/\delta \nonumber \\&\le \frac{\rho ^2}{\delta } \cdot \sum _{k = 0}^m \sum _{{\mathcal {K}}\subseteq [m], \left| {\mathcal {K}}\right| = k} 2^{\gamma k} \cdot (2^{2\gamma })^{m-k} \cdot 2^{\gamma k/2}\nonumber \\&= \frac{\rho ^2}{\delta } \cdot 2^{2\gamma m}\cdot \sum _{k = 0}^m \left( {\begin{array}{c}m\\ k\end{array}}\right) \cdot 2^{-\gamma k/2}\nonumber \\&\le \frac{\rho ^2}{\delta } \cdot 2^{2\gamma m}\cdot \sum _{k = 0}^m 2^{-k \cdot ({\gamma }/2-\log m)}\nonumber \le {2\cdot }\frac{\rho ^2}{\delta } \cdot 2^{2\gamma m}.\nonumber \end{aligned}$$
(20)

The first inequality holds by Claim 12, and the last one by holds since, by assumption, \(\gamma \ge 2\cdot \left\lceil \log m\right\rceil +2\). Combining Equations (18) to (20), prove the lemma by deducing that

$$\begin{aligned} {\textrm{Pr}}[Z > 0] \ge \frac{{\mathrm E}[Z]^2}{{\mathrm E}[Z^2]} \ge \frac{(2^{\gamma m}\cdot \rho )^2}{{2\cdot }\frac{\rho ^2}{\delta } \cdot 2^{2\gamma m} } =\delta /2. \end{aligned}$$

   \(\square \)

Proving Claim 12 .

Proof

Let \({\mathcal {K}}= {\mathcal {K}}_{y,y'}\), and for \(a\in \{0, 1\}^{\lambda \left| {\mathcal {K}}\right| }\) let \({\mathcal {T}}_a = \{x \in {\mathcal {T}}:x_{{\mathcal {K}}} = a\}\). Compute

$$\begin{aligned}&{\textrm{Pr}}\left[ Z^y \wedge Z^{y'}\right] = \sum _{a \in \{0, 1\}^{\lambda \cdot \left| {\mathcal {K}}\right| } }{\textrm{Pr}}\left[ S^y_{\mathcal {K}}= a\right] \cdot {\textrm{Pr}}\left[ Z^y\wedge Z^{y'}\mid S^y_{\mathcal {K}}= a\right] \\&= \sum _{a \in \{0, 1\}^{\lambda \cdot \left| {\mathcal {K}}\right| } }{\textrm{Pr}}\left[ S^y_{\mathcal {K}}= a\right] \cdot \left( \frac{\left| {\mathcal {T}}_a\right| \cdot (\left| {\mathcal {T}}_a\right| -1)}{2^{2\lambda (m - \left| {\mathcal {K}}\right| )}}\right) \\&\le \sum _{a \in \{0, 1\}^{\lambda \cdot \left| {\mathcal {K}}\right| } }2^{-\lambda \left| {\mathcal {K}}\right| } \cdot \left( \frac{\left| {\mathcal {T}}\right| }{2^{\lambda (m-\left| {\mathcal {K}}\right| )}}\right) ^2\cdot \left( \frac{\left| {\mathcal {T}}_a\right| }{\left| {\mathcal {T}}\right| }\right) ^2 \\&\le \sum _{a \in \{0, 1\}^{\lambda \left| {\mathcal {K}}\right| } }2^{-\lambda \left| {\mathcal {K}}\right| } \cdot \left( \frac{\left| {\mathcal {T}}\right| }{2^{\lambda (m-\left| {\mathcal {K}}\right| )}}\right) ^2\cdot \frac{\left| {\mathcal {T}}_a\right| }{\left| {\mathcal {T}}\right| } \cdot 2^{(\gamma /2-\lambda )\cdot \left| {\mathcal {K}}\right| }/\delta \\&= \frac{1}{\delta } \cdot \left( \frac{\left| {\mathcal {T}}\right| }{2^{\lambda m}}\right) ^2\cdot 2^{\gamma \left| {\mathcal {K}}\right| /2}\cdot \sum _{a \in \{0, 1\}^{\lambda \left| {\mathcal {K}}\right| }} \frac{\left| {\mathcal {T}}_a\right| }{\left| {\mathcal {T}}\right| } = \frac{1}{\delta } \cdot \rho ^2 \cdot 2^{\gamma \left| {\mathcal {K}}\right| /2}. \end{aligned}$$

The second inequality holds by the assumption of the lemma (Equation (17)).    \(\square \)

4.2.1 Proving Lemma 4

Proof

of Lemma 4. Define

$$\begin{aligned} {\mathcal {T}}\mathbin {{\mathop {=}\limits ^\textrm{def}}}\{x \in \textsf{Supp}(X) :\forall {\mathcal {I}}\subseteq [m], H_{X_{{\mathcal {I}}}}(x_{{\mathcal {I}}})\ge (\lambda - \gamma /2) \cdot \left| {\mathcal {I}}\right| \} \end{aligned}$$

We partition the set \({\mathcal {T}}\) into \(2 \lambda m\) subsets, such that the elements of each part have roughly the same probability under X. Specifically, for \(i \in [2 \lambda m]\) let

$$\begin{aligned} {\mathcal {T}}^i \mathbin {{\mathop {=}\limits ^\textrm{def}}}\{x \in {\mathcal {T}}:H_X(x)\in [i-1,i)\}, \end{aligned}$$

and let \( {\mathcal {T}}^{0} \mathbin {{\mathop {=}\limits ^\textrm{def}}}\{x \in {\mathcal {T}}:H_X(x) \ge 2\lambda m\}\). By definition,

$$\begin{aligned} {\textrm{Pr}}[X\in {\mathcal {T}}^{0}] = \sum _{x\in {\mathcal {T}}^{0}} {\textrm{Pr}}[X = x] \le 2^{\lambda \cdot m} \cdot 2^{-2 \cdot \lambda \cdot m} = 2^{-\lambda \cdot m}, \end{aligned}$$

and therefore \(2^{-\lambda \cdot m} +\sum _{i\in [2\cdot \lambda \cdot m]} {\textrm{Pr}}[X\in {\mathcal {T}}^i] \ge 1\). Hence, by averaging argument, exists \(i \in [2 \lambda m]\) such that

$$\begin{aligned} {\textrm{Pr}}[X\in {\mathcal {T}}^i] \ge \frac{1 - 2^{-\lambda \cdot m}}{2 \lambda m} \ge \frac{1}{4 \lambda m} \end{aligned}$$
(21)

The second inequality hold since, by assumption, \(\lambda \ge \gamma \ge 2\). In the rest of the proof we use Lemma 5 to prove that \({\textrm{Pr}}_{S\leftarrow {\mathcal {P}}_{2^\gamma }(\{0, 1\}^\lambda )}\left[ S\cap {\mathcal {T}}^i \ne \emptyset \right] \). Let \(X^i = X \mid _{X \in {\mathcal {T}}^i}\), and for \({\mathcal {I}}\subseteq [m]\) and \(a \in \textsf{Supp}(X^i_{{\mathcal {I}}})\), let \({\mathcal {T}}^{i}_{{\mathcal {I}}, a} \mathbin {{\mathop {=}\limits ^\textrm{def}}}\{x \in {\mathcal {T}}^i :x_{{\mathcal {I}}} = a\}\). Since \(X^i\) is almost flat, for every \(a \in \textsf{Supp}(X^i_{{\mathcal {I}}})\) and \(x \in {\mathcal {T}}^{i}_{{\mathcal {I}}, a}\):

$$\begin{aligned} {\textrm{Pr}}[X^i_{{\mathcal {I}}} = a] = \sum _{x'\in {\mathcal {T}}^{i}_{{\mathcal {I}}, a}} {\textrm{Pr}}[X^i= x'] \ge \left| {\mathcal {T}}^{i}_{{\mathcal {I}}, a}\right| \cdot {\textrm{Pr}}[X^i = x]/2. \end{aligned}$$

Similarly,

$$\begin{aligned} 1&= \sum _{a \in \textsf{Supp}(X^i_{{\mathcal {I}}})}{\textrm{Pr}}[X^i_{\mathcal {I}}= a] = \sum _{a \in \textsf{Supp}(X^i_{\mathcal {I}})} \, \sum _{x'\in {\mathcal {T}}^{i}_{{\mathcal {I}}, a}}{\textrm{Pr}}[X^i = x'] \\ {}&\le \sum _{a \in \textsf{Supp}(X^i_{\mathcal {I}})}\left| {\mathcal {T}}^{i}_{{\mathcal {I}}, a}\right| \cdot 2 \cdot {\textrm{Pr}}[X^i = x] = 2\cdot \left| {\mathcal {T}}^i\right| \cdot {\textrm{Pr}}[X^i = x] \nonumber . \end{aligned}$$

Combing the above two inequalities, we get that

$$\begin{aligned} {\textrm{Pr}}[X^i_{{\mathcal {I}}} = a] \ge \frac{1/2\cdot \left| {\mathcal {T}}^{i}_{{\mathcal {I}}, a}\right| \cdot {\textrm{Pr}}[X^i=x]}{2\cdot \left| {\mathcal {T}}^i\right| \cdot {\textrm{Pr}}[X^i = x]} = \frac{\left| {\mathcal {T}}^{i}_{{\mathcal {I}}, a}\right| }{4\cdot \left| {\mathcal {T}}^i\right| } \end{aligned}$$
(22)

By assumption, for every \( x\in {\mathcal {T}}\) and \( {\mathcal {I}}\subseteq [m]\):

$$\begin{aligned} {\textrm{Pr}}[X_{{\mathcal {I}}} = x_{{\mathcal {I}}}] \le 2^{(\gamma /2-\lambda ) \left| {\mathcal {I}}\right| } \end{aligned}$$
(23)

Therefore, for every \(a \in \textsf{Supp}(X^i_{{\mathcal {I}}})\):

$$\begin{aligned} \frac{\left| {\mathcal {T}}^{i}_{{\mathcal {I}}, a}\right| }{\left| {\mathcal {T}}^i\right| }\le 4\cdot {\textrm{Pr}}[X^i_{\mathcal {I}}= a] \le 4\cdot \frac{{\textrm{Pr}}[X_{\mathcal {I}}= a]}{{\textrm{Pr}}[X\in {\mathcal {T}}^i]} \le 16\lambda m \cdot 2^{(\gamma /2-\lambda ) \left| {\mathcal {I}}\right| } \end{aligned}$$
(24)

The first inequality holds by Equation (22), and the third by Equation (23). Applying Lemma 5 for the set \({\mathcal {T}}^i\) with parameter \(\delta = 1/{16\lambda m}\), yields that

$$\begin{aligned} {\textrm{Pr}}_{S\leftarrow {\mathcal {P}}_{2^\gamma }(\{0, 1\}^\lambda )}\left[ S\cap {\mathcal {T}}^i \ne \emptyset \right] \ge \frac{1}{32\lambda m}, \end{aligned}$$

and we deduce that \({\textrm{Pr}}_{S\leftarrow {\mathcal {P}}_{2^\gamma }(\{0, 1\}^\lambda )}\left[ S\cap \textsf{Supp}(X) \ne \emptyset \right] \ge \frac{1}{32\lambda m}\).    \(\square \)

5 Lower Bound on the Length of ROM-SNARGs

In this section, we present our lower bound on the proof length of ROM-SNARGs, formally stated below (see Definition 1 for the formal definition of rETH, and Sect. 3.5 for that of salted-soundness ROM-SNARGs).

Theorem 13

(Conditional lower bound on ROMSNARGs length). Let \(\textsf{ARG} = (\textsf{P}, \textsf{V})\) be an \(\textsf{s}\)-length ROM-SNARG for n-variable \(\textrm{3SAT}\), with \((t, \varepsilon )\)-salted-soundness, perfect completeness, and deterministic non-adaptive verifier. Let \(\mathsf {q_{\textsf{P}}}\) and \(\mathsf {q_{\textsf{V}}}\) be the query complexity of \(\textsf{P}\) and \(\textsf{V}\), respectively, let \(v\) denotes \(\textsf{V}\) ’s running time, and let \(\lambda \) denote the random oracle input and output length. Assuming rETH, if

  1. 1.

    \(\varepsilon \le 1/4\);

  2. 2.

    \(\mathsf {q_{\textsf{V}}}\cdot \lambda \in o(n)\), \(\mathsf {q_{\textsf{V}}}+ \lambda \le t^{1/10}\);

  3. 3.

    \(\log ^2(t/\varepsilon ) \cdot \log ^{-1} \mathsf {q_{\textsf{P}}}\in o(n)\); and

  4. 4.

    \(v \in 2^{o(n)}\),

then \(\textsf{s}\ge 2^{-15} \cdot \log t\cdot \log \frac{t}{\varepsilon }/ \log \mathsf {q_{\textsf{P}}}\).

Theorem 13 is proved using the following two lemmata. Lemma 6 states that the verifier query complexity of a short ROM-SNARG can be significantly reduced, and Lemma 7, taken from [CY20], states that the existence of a low verifier query complexity ROM-SNARGs contradicts rETH.

Lemma 6

(Short ROMSNARGs \(\rightarrow \) Low Query ROMSNARGs). Let \(\textsf{ARG} = (\textsf{P}, \textsf{V})\) be as in Theorem 13, then for any \(\gamma \in \mathbb {N}\), there exists a verifier \(\mathsf {V'}\) such that \(\mathsf {ARG'} \mathbin {{\mathop {=}\limits ^\textrm{def}}}(\textsf{P},\mathsf {V'})\) is a ROM-SNARG for \({\mathcal {L}}\) with the following properties:

  1. 1.

    completeness \(\left( \lambda \cdot \mathsf {q_{\textsf{P}}}\cdot \mathsf {q_{\textsf{V}}}^{20 \cdot \left\lceil \textsf{s}/\gamma \right\rceil }\right) ^{-1}\);

  2. 2.

    \(( t- \mathsf {q_{\textsf{V}}}\cdot 2^\gamma ,\varepsilon )\)-soundness;

  3. 3.

    verifier query complexity \(20 \cdot \left\lceil \textsf{s}/\gamma \right\rceil \); and

  4. 4.

    verifier running time \(O(2^{\mathsf {q_{\textsf{V}}}\cdot \log t }\cdot v)\).

Furthermore, the transformation from \(\textsf{V}\) to \(\mathsf {V'}\) is efficient (in the description length of \(\textsf{V}\)).

In words, Lemma 6 states that there exists a generic transformation from short ROM-SNARGs into the same length ROM-SNARGs with low verifier query complexity (but worse completeness and soundness). Lemma 6 is proven in Sect. 5.2.

While not explicit in their work, the following lemma follows by similar arguments to the main proof in [CY20]. A formal proof is given in the full version of the paper.

Lemma 7

(Follows from [CY20]). Let \(\textsf{ARG} = (\textsf{P}, \textsf{V})\) be a \((t, \varepsilon )\)-sound ROM-SNARG for n-variable \(\textrm{3SAT}\) with random oracle (input and output) length \(\lambda \), argument length \(\textsf{s}\), and let \(\mathsf {q_{\textsf{V}}}\) and \(\mathsf {q_{\textsf{P}}}\) denote \(\textsf{P}\) ’s and \(\textsf{V}\) ’s query complexity, respectively. Assume

  1. 1.

    \(\textsf{s}+ \lambda \cdot \mathsf {q_{\textsf{V}}}\in o(n)\);

  2. 2.

    \(\mathsf {q_{\textsf{V}}}\le 1/4 \cdot \log (1/\varepsilon ) \cdot \log ^{-1} \mathsf {q_{\textsf{P}}}\);

  3. 3.

    completeness \(\ge \varepsilon ^{2/3}\);

  4. 4.

    \(\log ^2 (1/\varepsilon )\cdot \log ^{-1} \mathsf {q_{\textsf{P}}}\le o(n)\); and

  5. 5.

    \(\textsf{V}\) ’s running time \(2^{o(n)}\),

then \(\textrm{3SAT}\in \textrm{BPTIME}[2^{o(n)}]\).

Note that Lemma 7 does not require \(\textsf{V}\) to be deterministic or non adaptive.

5.1 Proof of Theorem 13

Proof

of Theorem 13. Suppose we are given a SNARG \(\textsf{ARG} \) for \(\textrm{3SAT}\) that satisfies the conditions of the theorem, and assume without loss of generality that \(\mathsf {q_{\textsf{P}}}\le t^{1/10}\). (Otherwise, for \(\mathsf {q_{\textsf{P}}}> t^{1/10}\), the lower bound we need to prove can be written as \(\textsf{s}\ge 2^{-15} \cdot \log \frac{t}{\varepsilon }\), which follows by the folklore lower boundFootnote 12). Assume towards contradiction that \(\textsf{s}\le 2^{-15} \cdot \log t\cdot \log \frac{t}{\varepsilon }/ \log \mathsf {q_{\textsf{P}}}\). Theorem 13 is proved via the following steps:

  1. 1.

    Apply Lemma 2 with parameter \(k=t^{0.5}\) which yields a scheme \(\textsf{ARG} \) that has \((t', \varepsilon ')\)-salted-soundness, where \(t' = t^{1/2}\), and \(\varepsilon ' = 2\varepsilon /t^{1/2}\).

  2. 2.

    Apply Lemma 6 with \(\gamma = 1/10 \cdot \log t\), to get a ROM-SNARG \(\mathsf {ARG'} \) for \(\textrm{3SAT}\) with the following parameters:

    1. (a)

      completeness \(\left( \lambda \cdot \mathsf {q_{\textsf{P}}}\cdot \mathsf {q_{\textsf{V}}}^{20 \cdot \left\lceil \textsf{s}/\gamma \right\rceil }\right) ^{-1}\);

    2. (b)

      \((t' - \mathsf {q_{\textsf{V}}}\cdot 2^\gamma ,\varepsilon ')\)-soundness.

    3. (c)

      verifier query complexity \(\mathsf {q_{\textsf{V}}}' = 20 \cdot \left\lceil \textsf{s}/\gamma \right\rceil \); and

    4. (d)

      verifier running time \(v' = O(2^{\mathsf {q_{\textsf{V}}}\cdot \log t }\cdot v)\).

  3. 3.

    Apply Lemma 7 on \(\mathsf {ARG'} \) to contradict rETH. For this, we need to verify that all five conditions of the lemma apply. Indeed,

    1. (i)

      \(\textsf{s}+ \lambda \cdot \mathsf {q_{\textsf{V}}}' \in o(n)\): First, observe that \(\textsf{s}\le 2^{-15} \cdot \log t\cdot \log \frac{t}{\varepsilon }/ \log \mathsf {q_{\textsf{P}}}\in o(n)\). Then, since \(\lambda \cdot \mathsf {q_{\textsf{V}}}\in o(n)\), we get that \(\lambda \cdot \mathsf {q_{\textsf{V}}}' = O(\lambda \cdot \textsf{s}/\gamma ) = O(\log t\cdot \textsf{s}/\log t) = o(n)\). Together, we have that \(\textsf{s}+ \lambda \cdot \mathsf {q_{\textsf{V}}}' \le o(n) + o(n) = o(n)\):

    2. (ii)

      \(\mathsf {q_{\textsf{V}}}' \le 1/4 \cdot \log (1/\varepsilon ') \cdot \log ^{-1} \mathsf {q_{\textsf{P}}}\): the query complexity of the verifier of \(\mathsf {ARG'} \) is

      $$\begin{aligned} \mathsf {q_{\textsf{V}}}'&\le 20 \cdot \left\lceil \textsf{s}/\gamma \right\rceil \le 20 \cdot \left\lceil \frac{2^{-15} \cdot \log t\cdot \log \frac{t}{\varepsilon }/ \log \mathsf {q_{\textsf{P}}}}{1/10 \cdot \log t}\right\rceil \le 1/8 \cdot \log \frac{t}{\varepsilon } \cdot \log ^{-1} \mathsf {q_{\textsf{P}}}\\ {}&\le 1/4 \cdot \log \frac{t^{1/2}}{2\varepsilon } \cdot \log ^{-1} \mathsf {q_{\textsf{P}}}= 1/4 \cdot \log \frac{1}{\varepsilon '} \cdot \log ^{-1} \mathsf {q_{\textsf{P}}}. \end{aligned}$$
    3. (iii)

      completeness \(\ge \varepsilon '^{2/3}\): Observe that \(20\left\lceil \textsf{s}/\gamma \right\rceil \le 2^{-10} \cdot \log (t/\varepsilon ) \cdot \log ^{-1} \mathsf {q_{\textsf{P}}}\). Thus, the completeness of our scheme satisfies:

      $$\begin{aligned}&\left( \lambda \cdot \mathsf {q_{\textsf{P}}}\cdot \mathsf {q_{\textsf{V}}}^{20 \cdot \left\lceil \textsf{s}/\gamma \right\rceil }\right) ^{-1} \ge \left( t^{1/10} \cdot t^{1/10} \cdot \mathsf {q_{\textsf{V}}}^{2^{-10} \cdot \log (t/\varepsilon ) \cdot \log ^{-1} \mathsf {q_{\textsf{P}}}}\right) ^{-1} \\ {}&\ge 2^{- 2/10\log t- 2^{-10} \cdot \log (t/\varepsilon )} \ge 2^{- 2/10\log t- 2^{-9} \cdot \log (t^{1/2}/2\varepsilon )} \\ {}&\ge 2^{-3/10 \cdot \log (t^{1/2}/2\varepsilon )} = 2^{3/10 \cdot \log (\varepsilon ') } \ge \varepsilon '^{2/3} . \end{aligned}$$
    4. (iv)

      \(\log ^2 (1/\varepsilon ') \cdot \log ^{-1} \mathsf {q_{\textsf{P}}}\le o(n)\): By the definition of \(\varepsilon '\) and the conditions of the theorem we get that \(\log ^2(1/\varepsilon ') \cdot \log ^{-1} \mathsf {q_{\textsf{P}}}= O(\log ^2(t/\varepsilon ) \cdot \log ^{-1} \mathsf {q_{\textsf{P}}}) = o(n)\).

    5. (v)

      \(\textsf{V}\) ’s running time \(2^{o(n)}\): The verifier running time of the scheme is \(O(2^{\mathsf {q_{\textsf{V}}}\cdot \log t }\cdot v)\). Since \(\mathsf {q_{\textsf{V}}}\cdot \log t= o(n)\) and \(v= 2^{o(n)}\), its total running time is \(2^{o(n)}\).

  4. 4.

    We conclude that \(\textrm{3SAT}\in \textrm{BPTIME}[2^{o(n)}]\), contradicting rETH.

   \(\square \)

5.2 Short ROM-SNARGs to Low Query ROM-SNARGs, Proving Lemma 6

In this section, we prove Lemma 6 (see Sect. 2 for a high-level overview of the proof). Let \(\textsf{ARG} = (\textsf{P}, \textsf{V})\) be ROM-SNARG with \((t, \varepsilon )\)-salted soundness, random oracle of length \(\lambda \), a non-adaptive deterministic verifier, prover query complexity \(\mathsf {q_{\textsf{P}}}\), and verifier query complexity \(\mathsf {q_{\textsf{V}}}\). The low query verifier \(\mathsf {V'}\) is defined as follows:

Algorithm 14

(Low-query verifier \(\mathsf {V'}\) ).

Oracle: \(\zeta :\{0, 1\}^\lambda \mapsto \{0, 1\}^\lambda \).

Parameter: \(\gamma \le \lambda \). Let \(k =20 \left\lceil \textsf{s}/\gamma \right\rceil \).

Input: Instance \(\mathbbm {x}\) and proof \(\pi \).

Operation: 

  1. 1.

    Emulate \(\textsf{V}\) on \((\mathbbm {x},\pi )\) to get a list of queries \(w=(w_1, \ldots , w_{\mathsf {q_{\textsf{V}}}})\).

  2. 2.

    Sample \(k' \in [k]\), uniformly st random and uniformly sample a \(k'\)-size subset \({\mathcal {J}}\subseteq [\mathsf {q_{\textsf{V}}}]\).

  3. 3.

    For each \(i \in [\mathsf {q_{\textsf{V}}}]\):

       If \(i \in {\mathcal {J}}\), set \(S_i = \{\zeta (w_i)\}\).

       Otherwise, let \(S_i\) be a \(2^\gamma \)-size random subset of \(\{0, 1\}^\lambda \).

  4. 4.

    Accept if there exists \((y_1, \ldots , y_{\mathsf {q_{\textsf{V}}}}) \in S_{1}\times \ldots \times S_{\mathsf {q_{\textsf{V}}}}\) that make \(\textsf{V}\) accepts given \((y_1, \ldots , y_{\mathsf {q_{\textsf{V}}}})\) as answers to its oracle queries.

It is easy to observe that \(\mathsf {V'}\) has the desired query complexity and running time. Thus, it is left to prove that \(\mathsf {ARG'} = (\textsf{P},\mathsf {V'})\) has the desired completeness and soundness. The completeness of \(\mathsf {ARG'}\) is analyzed in Sect. 5.2.1 and its soundness in Sect. 5.2.2. We put things together in Sect. 5.2.3.

5.2.1 Completeness

We prove the following lower bound on the completeness of \(\mathsf {ARG'}\).

Claim

\(\mathsf {ARG'}\) has completeness \(\ge \left( \lambda \cdot \mathsf {q_{\textsf{P}}}\cdot \mathsf {q_{\textsf{V}}}^{20 \cdot \left\lceil \textsf{s}/\gamma \right\rceil }\right) ^{-1}\).

In the following, we assume for simplicity that the \(\textsf{V}\) ’s queries are (always) a subset of the \(\textsf{P}\) ’s queries. (The proof without this assumption follows very similar lines, though with more complicated notation. Also, one could always modify the honest prover to perform all the verifier’s queries, this comes with a negligible cost that has no effect on our results.)

Proof

We associate the following random variable with the probability space defined by the choice of \(\zeta \) over the (honest) execution of \((\textsf{P} ^\zeta (\mathbbm {w}),\mathsf {V'} ^\zeta )(\mathbbm {x})\): denote \(\textsf{P}\) ’s queries by \(X=(X_1,\ldots ,X_{\mathsf {q_{\textsf{P}}}})\), define \(Z=(Z_1,\ldots ,Z_{\mathsf {q_{\textsf{P}}}})\) by \(Z_i =\zeta (X_i)\), and let \(\varPi \) denote the proof sent by \(\textsf{P}\). We assume for ease of notation that the queries that \(\textsf{V}\) would have made on the proof \(\varPi \) are just \(X_1,\ldots ,X_{\mathsf {q_{\textsf{V}}}}\).

The length of \(\varPi \) is \(\textsf{s}\), thus a standard argument yields that \(H(\varPi ) \le H_0(\varPi ) \le \textsf{s}.\) Since each \(Z_i\) is a bit string of length \(\lambda \) (recall that \(\lambda \) is the output length of \(\zeta \)), it holds that \(H(Z \mid \varPi ) \ge H(Z) - H(\varPi ) \ge \lambda \cdot \mathsf {q_{\textsf{P}}}- \textsf{s}\).

Since (by definition) \(H(Z \mid \varPi ) = {\mathrm E}_{\pi \leftarrow \varPi }[ H(Z \mid {\varPi = \pi })]\), with probability at least 1/2 over \(\pi \leftarrow \varPi \), it holds that \(H(Z\mid {\varPi = \pi })\ge \lambda \cdot \mathsf {q_{\textsf{P}}}- 2\cdot \textsf{s}\). Fix any such proof \(\pi \), and let \(Y=(Y_1,\ldots ,Y_{\mathsf {q_{\textsf{P}}}}) = Z \mid _{\varPi = \pi }\). For \(\ell = 2 \cdot \textsf{s}\), it holds that \(H(Y) \ge \lambda \cdot \mathsf {q_{\textsf{P}}}- \ell .\) Applying Theorem 11 on Y yields that with probability 1/2 over \(y\leftarrow Y\) there exists a subset \({\mathcal {B}}\subseteq [\mathsf {q_{\textsf{P}}}]\) with \(\left| {\mathcal {B}}\right| \le \left\lfloor 8 \ell /{\gamma }\right\rfloor +4\) such that:

$$\begin{aligned} {\textrm{Pr}}_{S \leftarrow \left( {\mathcal {P}}_{2^\gamma }(\{0, 1\}^\lambda )\right) ^{\mathsf {q_{\textsf{P}}}- |{\mathcal {B}}|}}\left[ S \cap \textsf{Supp}(Y \mid _{Y_{{\mathcal {B}}}= y_{{\mathcal {B}}}}) \ne \emptyset \right] \ge \frac{1}{32\cdot \lambda \cdot \mathsf {q_{\textsf{P}}}} . \end{aligned}$$
(25)

An immediate corollary of Equation (25) is that with probability at least 1/2 over the choice of \(y\leftarrow Y\), the following process outputs 1 with probability \(\frac{1}{32\cdot \lambda \cdot \mathsf {q_{\textsf{P}}}}\):

  1. 1.

    For each \(i \in [\mathsf {q_{\textsf{V}}}]\): If \(i \in {\mathcal {B}}\), set \(S_i = \{y_i\}\). Otherwise, let \(S_i\) be a \(2^\gamma \)-size random subset of \(\{0, 1\}^\lambda \).

  2. 2.

    Output 1 if \((S_{1}\times \ldots \times S_{\mathsf {q_{\textsf{V}}}}) \cap \textsf{Supp}((Y \mid _{Y_{{\mathcal {B}}}= y_{{\mathcal {B}}}})_{[\mathsf {q_{\textsf{V}}}]}) \ne \emptyset \).

The perfect completeness of the argument scheme \(\textsf{ARG}\) yields that for any \(\pi \in \textsf{Supp}(\varPi )\), it holds that \(\textsf{V} (\mathbbm {x},\pi )\) accepts on any value of \(z\in \textsf{Supp}((Y=Z|_{\varPi =\pi })_{[\mathsf {q_{\textsf{V}}}]})\) given as oracle answers. Thus, it accepts any value of \(z\in \textsf{Supp}((Y \mid _{Y_{{\mathcal {B}}}= y_{{\mathcal {B}}}})_{[\mathsf {q_{\textsf{V}}}]})\) for any \(y\in \textsf{Supp}(Y)\).

We deduce that \(\mathsf {V'}\) accepts with this probability, assuming that \({\mathcal {J}}= {\mathcal {B}}\cap [\mathsf {q_{\textsf{V}}}]\). Noting that \(\left| {\mathcal {B}}\right| \le \left\lfloor 8 \ell /{\gamma }\right\rfloor +4 = \left\lfloor 16\textsf{s}/\gamma \right\rfloor + 4 \le 20 \left\lceil \textsf{s}/\gamma \right\rceil = k\), the latter happens with probability at least \(k^{-1} \cdot {\mathsf {q_{\textsf{V}}}\atopwithdelims ()k}^{-1}\). We conclude that \(\mathsf {V'}\) accepts with probability at least

$$\begin{aligned}&\frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{32\cdot \lambda \cdot \mathsf {q_{\textsf{P}}}} \cdot \frac{1}{k} \cdot \frac{1}{{\mathsf {q_{\textsf{V}}}\atopwithdelims ()k}} \ge \frac{1}{128\cdot \lambda \cdot \mathsf {q_{\textsf{P}}}} \cdot \frac{1}{k} \cdot \frac{(k/e)^k}{\mathsf {q_{\textsf{V}}}^k} \\ {}&\ge \frac{1}{e \cdot 128 \cdot \lambda \cdot \mathsf {q_{\textsf{P}}}} \frac{(k/e)^{k-1}}{\mathsf {q_{\textsf{V}}}^k} \ge \frac{1}{e \cdot 128 \cdot \lambda \cdot \mathsf {q_{\textsf{P}}}} \frac{(20/e)^{19}}{\mathsf {q_{\textsf{V}}}^k} \ge \frac{1}{\lambda \cdot \mathsf {q_{\textsf{P}}}\cdot \mathsf {q_{\textsf{V}}}^k} . \end{aligned}$$

   \(\square \)

5.2.2 Soundness

We prove the following upper bound on the soundness error of \(\mathsf {ARG'}\).

Claim

\(\mathsf {ARG'}\) has \(( t- \mathsf {q_{\textsf{V}}}\cdot 2^\gamma ,\varepsilon )\)-soundness.

Proof

Let \(\widetilde{\mathsf {P'}}\) be a \(t' \,{:=}\, t- \mathsf {q_{\textsf{V}}}\cdot 2^{\gamma }\)-query cheating prover such that \(\textrm{Pr}\left[ \langle \widetilde{\mathsf {P'}},\right. \left. \mathsf {V'} (\mathbbm {x}) \rangle = 1\right] > \varepsilon \), for some \(\mathbbm {x}\notin {\mathcal {L}}\). We show how to use \(\widetilde{\textsf{P}}\) to construct the following \(t\)-query cheating prover \(\widetilde{\textsf{P}}\) such that \({\textrm{Pr}}\left[ \textsf{SaltedSoundess}_{{\textsf{V}}, \lambda , t'}(\widetilde{\textsf{P}},\mathbbm {x})=1\right] > \varepsilon \), violating the assumed salted-soundness of (\(\textsf{P}\), \(\textsf{V}\)).

We assume without loss of generality that \(\widetilde{\mathsf {P'}}\) is deterministic. Indeed, since \(\widetilde{\textsf{P}}\) is computationally unbounded (it is only bounded by its query complexity to the random oracle), it has sufficient time to enumerate all random strings and choose the best one.

Algorithm 15

(\(\widetilde{\textsf{P}}\)).

Oracle: \(\zeta :\{0, 1\}^\lambda \mapsto \{0, 1\}^\lambda \).

Input: Instance \(\mathbbm {x}\).

  1. 1.

    Run \(\widetilde{\mathsf {P'}}^\zeta (\mathbbm {x})\) to generate a proof \(\pi \).

  2. 2.

    Emulate \(\textsf{V} \) on \((\mathbbm {x},\pi )\) to determine its list of oracle queries \((w_1, \ldots , w_{\mathsf {q_{\textsf{V}}}})\).

  3. 3.

    For \(i=1,\ldots , \mathsf {q_{\textsf{V}}}\):

    1. (a)

      Query \(\zeta \) on \(w_i\) for \(2^{\gamma }\) times. Let \(S_i\) be the set of answers.

    2. (b)

      If \(w_i\) was asked by \(\widetilde{\mathsf {P'}}\) in Step 1, add the retrieved answer to \(S_i\).

  4. 4.

    If there exists \((y_1, \ldots , y_{\mathsf {q_{\textsf{V}}}}) \in S_{1}\times \ldots \times S_{\mathsf {q_{\textsf{V}}}}\) that make \(\textsf{V}\) accept \((\mathbbm {x},\pi )\) with \((y_1, \ldots , y_{\mathsf {q_{\textsf{V}}}})\) as the answers to its oracle queries, program \(\zeta (w_i) = y_i\) for each \(i \in [\mathsf {q_{\textsf{V}}}]\). Output \(\pi \).

Recall that for \(i\in {\mathcal {J}}\), the verifier \(\mathsf {V'}\) sets \(S_i\) to be the output of a single call to the oracle, and for \(i\notin {\mathcal {J}}\), it sets \(S_i\) to \(2^\gamma \) random strings in \(\{0, 1\}^\lambda \). Hence, for every choice of \(\zeta \), there exists a coupling between the sets \(S_i\) sampled by \(\mathsf {V'}\) to the sets \(\widetilde{S}_i\) sampled by \(\widetilde{\textsf{P}}\) with \(\widetilde{S}_i \supseteq S_i\) for every i. It follows that the probability that \(\widetilde{\textsf{P}}\) makes \(\textsf{V}\) accept \(\mathbbm {x}\) is at least as high as the probability that \(\widetilde{\mathsf {P'}}\) makes \(\mathsf {P'}\) accept \(\mathbbm {x}\), which by assumption is at least \(\varepsilon \). This concludes the proof since by construction, \(\widetilde{\mathsf {P'}}\) makes \(t'\) queries.    \(\square \)

5.2.3 Putting It Together

Proof

of Lemma 6. Immediately follows by Sects. 5.2.1 and 5.2.2.    \(\square \)