1 Introduction

A wide class of cryptographic primitives can be constructed from one-way functions, whose existence is the minimal assumption for cryptography. Informally, a function f is called a one-way function if it is easy to compute, but hard to invert by polynomial-time algorithms. Two important primitives that can be constructed from one-way functions are pseudorandom generators (PRGs) [5, 22] and universal one-way hash functions (UOWHFs) [19]. These two primitives are useful for constructing even more powerful primitives such as encryption, digital signatures and commitments. Thus, an improvement in the efficiency of constructions for PRGs and UOWHFs would affect other primitives as well. Yet, the optimal efficiency of these two basic primitives is not fully understood.

There are several important efficiency measures to account for when considering PRGs and UOWHFs. For PRG constructions, one aims to minimize the seed length and the number of calls to the one-way function f. For UOWHF constructions, there is a need to minimize the key length and the number of calls to f. Besides these two measurements, another important parameter is the adaptivity of the calls. That is, if the inputs for the one-way function are independent of the output of previous calls, then the construction can be implemented in parallel. By contrast, if the calls are adaptive, one must make them sequentially.

Constructions. Much progress has been made since the notion of PRGs was introduced. The first construction of pseudorandom generators was given by Blum and Micali [5], based on the assumption that a specific function is hard to invert. This construction was generalized by Yao [22] to work with any one-way permutation. Since then, many subsequent works have sought to construct PRGs based on arbitrary one-way functions. Notably, by introducing the randomized iterateFootnote 1 method, Goldreich, Krawczyk and Luby [8] gave a PRG construction from any unknown-regular one-way function. The notion of a regular one-way function generalizes that of a one-way permutation: a one-way function f is called regular if for every n and \(x,x'\) with \(\left| x\right| = \left| x'\right| =n\) it holds that \(\left| f^{-1}(f(x))\right| = \left| f^{-1}(f(x'))\right| \). We say that the function is unknown-regular if the regularity parameter, \(\left| f^{-1}(f(x))\right| \), may not be a computable function of n. More recently, the randomized iterate method was further studied by [11, 23], who obtained a construction of PRGs from any unknown-regular one-way function with \(O(n\log n)\) seed length and \(O(n/\log n)\) calls to the one-way function. [25] improved the seed length to \(\omega (n)\) by using a transformation that converts any unknown-regular function into a function that is known-regular on its image.

For arbitrary one-way functions, a seminal work by Håstad, Impagliazzo, Levin and Luby [15] gave the first PRG construction. Since then, the efficiency has been improved by many works [10, 13, 16, 21]. Currently, the state-of-the-art construction of PRGs, due to [21], uses \(O(n^3)\) bits of random seed and \(O(n^3)\) adaptive calls to the one-way function, or alternatively a seed of size \(O(n^4)\) with non-adaptive calls [13, 21].Footnote 2

The constructions of UOWHFs use ideas similar to those of the PRG constructions. Still, the best PRG constructions from arbitrary one-way functions are more efficient than the best known UOWHF constructions. Rompel [20] gave the first UOWHF construction from arbitrary one-way functions. The efficiency was improved by [12], who gave a construction of a UOWHF using \(O(n^6)\) adaptive calls with a key of size \(O(n^7)\). Constructing a UOWHF using \(O(n^3)\) calls to the one-way function is still an interesting open question.

The efficiency of UOWHFs based on unknown-regular one-way functions is similar to that of the unknown-regular-based PRGs. Interestingly, this was shown by [2] using the same randomized iterate method, resulting in a construction that uses \(\varTheta (n)\) key length and \(\varTheta (n)\) calls. We stress that when the regularity of f is known (i.e., can be computed efficiently given n), there are much more efficient constructions for both PRGs and UOWHFs [7, 9, 19, 23].

Lower Bounds. The known lower bounds for black-box constructions are relatively far from the upper bounds. In this line of work, there are two incomparable types of results. The first type, due to [6], is stated in terms of the stretching and compression of the PRG and UOWHF, respectively. Specifically, [6] showed that any black-box PRG construction \(G:\left\{ 0,1\right\} ^m \rightarrow \left\{ 0,1\right\} ^{m+s}\) from f must use \(\varOmega (s/\log n)\) calls to f. Similarly, any black-box UOWHF construction with input size m and output size \(m-s\) must use \(\varOmega (s/\log n)\) calls. For the second type of results, [17] showed that any black-box PRG construction from f must use \(\varOmega (n/\log n)\) calls to f, even for 1-bit stretching. [3] showed similar results for 1-bit-compressing UOWHFs.

As mentioned, there is a substantial gap between the aforementioned lower and upper bounds. One explanation for this gap is that all of the above lower bounds hold even when the one-way function f is unknown-regular. In this case, the bounds are known to be tight, matching the constructions mentioned above, which are based on randomized iterates. These constructions, however, are adaptive.

1.1 Our Contribution

In this paper, we give non-adaptive constructions with tight call complexity for PRGs and UOWHFs from unknown-regular one-way functions. Both of our constructions are quite simple and are very similar to each other. As in previous results, the security of our constructions also holds if f is only almost-regular [23], meaning that for every \(\left| x\right| =\left| x'\right| \), the ratio between \(\left| f^{-1}(f(x))\right| \) and \(\left| f^{-1}(f(x'))\right| \) is bounded by a polynomial in \(\left| x\right| \) (compared to a ratio of 1 in the case of regular functions).

The seed (or key) length in our construction for PRGs (or UOWHFs, respectively) is \(O(n^2)\), compared to the nearly linear number of bits in the previous adaptive constructions. This seems unavoidable and raises an interesting open question.Footnote 3

Our Constructions and Results. In this section, we present our constructions. The results here are stated for regular one-way functions but extend naturally to almost-regular functions, as stated in Sects. 3 and 4. The crux of the construction is the following observation. For a regular f and i.i.d. uniform random variables \(X_1\), \(X_2\) over \(\left\{ 0,1\right\} ^n\), given any fixing of \(f(X_1)\), both the entropy and the min-entropy of the pair \(X_1,f(X_2)\) are exactly n. To see this, recall that for a regular f with (unknown) regularity parameter r, there are exactly r possible values for \(X_1\) given \(f(X_1)\), and exactly \(2^n/r\) possible values for \(f(X_2)\). Thus, the regularity parameter r “cancels out” when counting the possible values (given \(f(X_1)\)) of the pair \(X_1,f(X_2)\): there are \(r\cdot 2^n/r=2^n\) of them. In the PRG construction, we exploit this fact by using a universal family of hash functions \(\mathcal {H}\) (and the Goldreich-Levin theorem) in order to extract pseudo-uniform bits. In the UOWHF construction, we use similar ideas in order to compress the pair \(X_1,f(X_2)\) without creating too many collisions. For both constructions, we need additional properties from the universal family \(\mathcal {H}\) that we ignore in this introduction. See more details in Sects. 3 and 4. We next present the constructions. The main ideas of the proofs of the following theorems are described in Sect. 1.2.
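The counting argument above can be verified on a toy example. The following Python sketch uses a contrived regular function (dropping the low-order bits) purely as a stand-in; it only counts possible values and makes no one-wayness claim.

```python
# Toy check of the counting argument: for a regular f with regularity r,
# conditioned on the value of f(X1), the pair (X1, f(X2)) ranges over
# exactly r * (2^n / r) = 2^n values.
n = 4
r = 4  # regularity: every image point has exactly r preimages

def f(x):
    # a contrived regular function on n-bit inputs: drop the low log2(r) bits
    return x // r

domain = range(2 ** n)
image = {f(x) for x in domain}
y1 = f(5)  # fix an arbitrary output, playing the role of f(X1)
preimages = [x for x in domain if f(x) == y1]            # possible values of X1
pairs = {(x1, y2) for x1 in preimages for y2 in image}   # possible (X1, f(X2))

assert len(preimages) == r
assert len(image) == 2 ** n // r
assert len(pairs) == 2 ** n  # the regularity parameter cancels out
```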

A Simple Construction of PRGs From Regular One-Way Functions. We start with a description of our PRG construction. Let \(\mathcal {H}= \left\{ h:\left\{ 0,1\right\} ^{2n}\rightarrow \left\{ 0,1\right\} ^{n+\log n}\right\} \) be a family of 2-universal hash functions. For a regular one-way function \(f:\left\{ 0,1\right\} ^n\rightarrow \left\{ 0,1\right\} ^n\) and an integer \(t\in \mathbb {N}\),Footnote 4 the generator \(G_{t}:\mathcal {H}\times \left\{ 0,1\right\} ^{ n(t+1)}\rightarrow \mathcal {H}\times \left\{ 0,1\right\} ^{t\cdot (n+\log n)}\) is given by

$$\begin{aligned} G_t\big (h,x_1,\dots , x_{t+1}\big )=\left( h,h(x_1,f(x_{2})), \dots ,h(x_t,f(x_{t+1}))\right) \end{aligned}$$

We show that for every polynomial t, the distribution \(G_{t}(H, X_1,\dots , X_{t+1})\), for uniform \(H\leftarrow \mathcal {H}\), is pseudorandom. Note that the input length of \(G_{t}\) is \(|h| + n\cdot (t+1)\) and the output length is \(|h| + t\cdot (n+\log n)\), so \(G_t\) stretches whenever \(t > n/\log n\). By making \(t = \varTheta (n/\log n)\) calls, we show that \(G_{t}\) is indeed a pseudorandom generator.
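To make the construction concrete, here is a minimal Python sketch of \(G_t\), assuming a matrix-based hash h (the family used in Sect. 3, applied over GF(2) with bit strings packed into integers) and a contrived 2-regular stand-in for the one-way function f; both choices are illustrative only.

```python
import random

def sample_matrix_hash(in_bits, out_bits, rng):
    """Sample h from the 2-universal family of binary matrices: h(v) = v*M over GF(2)."""
    rows = [rng.getrandbits(out_bits) for _ in range(in_bits)]
    def h(v):
        acc = 0
        for i in range(in_bits):
            if (v >> i) & 1:
                acc ^= rows[i]  # XOR in row i when bit i of v is set
        return acc
    return h

def G(f, n, t, h, xs):
    """G_t(h, x_1..x_{t+1}) = (h, h(x_1, f(x_2)), ..., h(x_t, f(x_{t+1})))."""
    assert len(xs) == t + 1
    blocks = []
    for i in range(t):
        pair = xs[i] | (f(xs[i + 1]) << n)  # concatenate x_i with f(x_{i+1})
        blocks.append(h(pair))
    return blocks  # the description of h is also part of the output

rng = random.Random(0)
n, t = 8, 4
out_bits = n + (n - 1).bit_length()      # n + log n output bits per block
h = sample_matrix_hash(2 * n, out_bits, rng)
xs = [rng.getrandbits(n) for _ in range(t + 1)]
f = lambda x: x & ~1                     # toy 2-regular function: clear the low bit
print(G(f, n, t, h, xs))                 # t hash blocks
```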

Theorem 1.1

[Main theorem for PRG, informal] Let \(f:\left\{ 0,1\right\} ^n\rightarrow \left\{ 0,1\right\} ^n\) be an unknown-regular one-way function and let \(t(n) \ge n/\log n+1\) be some polynomial. Then, \(G_t\) is a PRG with seed length \(O(n^2+n(t(n)+1))\). Furthermore, \(G_t\) makes t(n) non-adaptive calls to f.

A Simple Construction of UOWHFs From Regular One-Way Functions. We now introduce the construction of the UOWHFs. It is a well-known fact that in order to construct a UOWHF, it suffices to construct a function for which it is hard to find a collision with a random input. Let f be a one-way function, let t be a parameter and let \(\mathcal {H}= \left\{ h:\left\{ 0,1\right\} ^{2n}\rightarrow \left\{ 0,1\right\} ^{n-\log n}\right\} \) be a family of hash functions. We define the function \(C_{t}:\mathcal {H}\times \left\{ 0,1\right\} ^{ n\cdot t}\rightarrow \mathcal {H}\times \left\{ 0,1\right\} ^{(t-1)\cdot (n-\log n)+2n}\) as

$$\begin{aligned} C_{t}\left( h,x_1,\dots , x_t\right) = \left( h,f(x_1),h(x_1,f(x_{2})), \dots ,h(x_{t-1},f(x_{t})),x_t\right) \end{aligned}$$

The main difference between this construction and the PRG construction is that h is now a shrinking function. In addition, we also output \(f(x_1)\) and the very last input of \(C_t\). Since the output length of a UOWHF has to be shorter than its input length, we must make up for the additional output \((f(x_1), x_t)\) by taking t to be \(\varTheta (n/\log n)\).

The UOWHF can now be defined using \(C_t\). Let \(k=\log \left| \mathcal {H}\right| + n\cdot t\), and for a string \(z\in \left\{ 0,1\right\} ^k\), let \(C_z\) be the function defined by \(C_z(w)= C_t(w\oplus z)\) for every \(w\in \left\{ 0,1\right\} ^k\). Our main theorem for this part is stated as follows.
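The function \(C_t\) and the keyed family \(C_z\) can be sketched in Python as follows. For simplicity the key here masks only the x-block, whereas the paper's key z covers the whole input, including the description of h; the matrix hash and the 2-regular f are again illustrative stand-ins.

```python
import random

rng = random.Random(1)
n, t = 8, 4
ell = n - (n - 1).bit_length()            # n - log n output bits of h
rows = [rng.getrandbits(ell) for _ in range(2 * n)]

def h(v):
    """h(v) = v*M over GF(2), M a random (2n) x (n - log n) binary matrix."""
    acc = 0
    for i in range(2 * n):
        if (v >> i) & 1:
            acc ^= rows[i]
    return acc

f = lambda x: x & ~1                      # toy 2-regular stand-in for a OWF

def C(xs):
    """C_t(h, x_1..x_t) = (h, f(x_1), h(x_1,f(x_2)), ..., h(x_{t-1},f(x_t)), x_t)."""
    assert len(xs) == t
    out = [f(xs[0])]
    for i in range(t - 1):
        out.append(h(xs[i] | (f(xs[i + 1]) << n)))  # hash the pair (x_i, f(x_{i+1}))
    out.append(xs[-1])
    return out                                      # h's description is also output

def C_z(z, xs):
    """Keyed family C_z(w) = C_t(w XOR z); here z masks only the x-block."""
    return C([x ^ zi for x, zi in zip(xs, z)])

xs = [rng.getrandbits(n) for _ in range(t)]
z = [rng.getrandbits(n) for _ in range(t)]
print(C_z(z, xs))
```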

Theorem 1.2

[Main theorem for UOWHF, informal] Let \(f:\left\{ 0,1\right\} ^n\rightarrow \left\{ 0,1\right\} ^n\) be an unknown-regular one-way function and let \(t(n) \ge n/\log n+2\) be some polynomial. Then, \(\left\{ C_z\right\} _{z\in \left\{ 0,1\right\} ^k}\) is a family of universal one-way hash functions with key length \(k=O(n^2+n\cdot t(n))\) and output length \(O(n^2+n\cdot t(n))\). Furthermore, for every \(z\in \left\{ 0,1\right\} ^k\), \(C_z\) makes t(n) non-adaptive calls to f.

1.2 Proof Overview

Here we give a short overview of our proofs. For both constructions, the proof boils down to showing that each input pair \(x_i,x_{i+1}\) induces a weak version of the desired primitive. For the PRG, the main part of the security proof is showing that given \(f(x_1)\) and h, it is hard to distinguish between \(h(x_1,f(x_2))\) and a uniform string. For the UOWHF, we prove security by showing that given \(h,x_1,x_2\), it is hard to find a collision \(h,x'_1,x'_2\) for the function \(C(h,x_1,x_2)=h,f(x_1),h(x_1,f(x_2))\). Note that it may be easy to find \(x'_2 \ne x_2\) with \(f(x'_2)=f(x_2)\). To rule this out, we further demand that \(f(x'_2)\ne f(x_2)\).Footnote 5 To show that this suffices, we prove that any collision in our UOWHF must contain a collision of the above form for at least one input pair. Below we describe the main ideas in more detail.

The PRG Construction. We start by sketching the security proof for the PRG. Let \(X_1\) and \(X_2\) be uniform random variables over \(\left\{ 0,1\right\} ^n\), and let h be a hash function uniformly sampled from a universal family of hash functions \(\mathcal {H}= \left\{ h:\left\{ 0,1\right\} ^{2n}\rightarrow \left\{ 0,1\right\} ^{n+\log n}\right\} \). Recall that we want to show that given h and \(f(X_1)\), the value \(h(X_1,f(X_2))\) is computationally indistinguishable from \(n+\log n\) uniform bits. For simplicity, assume that we are only interested in proving that the distinguishing advantage is at most \(n^{-c}\), for some constant \(c>1\).

The main observation is that for a regular f, given \(f(X_1)\), the pair \(X_1,f(X_2)\) has exactly n bits of min-entropy. Thus, by the leftover hash lemma, the first \(n-O(c\log n)\) bits of \(h(X_1,f(X_2))\) are \(n^{-c}/2\)-close (in statistical distance) to uniform. To argue that the suffix of \(h(X_1,f(X_2))\) looks uniform, we show that \(g(x_1,y)=h,f(x_1),h(x_1,y)_{1,\dots ,n-O(c\log n)}\) is a one-way function,Footnote 6 and thus we can use Goldreich-Levin in order to extract an additional \(O(c\log n)\) pseudorandom bits from \(X_1,f(X_2)\).

The UOWHF Construction. We now sketch the security proof for the UOWHF. Let \(\mathcal {H}\) be a universal family of hash functions \(\left\{ h:\left\{ 0,1\right\} ^{2n}\rightarrow \left\{ 0,1\right\} ^{n-\log n}\right\} \). We show that given random h and uniformly sampled \(x_1\) and \(x_2\) from \(\left\{ 0,1\right\} ^n\), it is hard to find \((x'_1,x'_2)\ne (x_1,x_2)\) such that \(f(x_1)=f(x'_1)\), \(f(x_2)\ne f(x'_2)\) and yet \(h(x_1,f(x_2))=h(x'_1,f(x'_2))\). For \(x_1,x_2 \in \left\{ 0,1\right\} ^n\) and \(h\in \mathcal {H}\) we define

$$\begin{aligned} {\mathcal {G}}_{h,x_1,x_2}:=\left\{ (x'_1,y):h(x_1,f(x_2))= h(x'_1,y) \ \wedge \ f(x_1)=f(x'_1) \ \wedge \ y\in Im(f)\right\} . \end{aligned}$$

That is, the set \({\mathcal {G}}_{h,x_1,x_2}\) contains all the pairs \((x'_1,f(x'_2))\) for which \(h,x'_1,x'_2\) collides with \(h,x_1,x_2\). The main observation here is that, since h outputs \(n-\log n\) bits, and there are exactly \(2^n\) pairs \((x'_1,y)\) such that \(y\in Im(f)\) and \(f(x'_1)=f(x_1)\), the expected size of \({\mathcal {G}}_{h,x_1,x_2}\) is at most \(2^n/2^{n-\log n}=n\). Thus, we can use an algorithm \(\mathsf{A}\) that finds a collision in the above function in order to invert f: given an input y, we choose random \(x_1,x_2\in \left\{ 0,1\right\} ^n\) and plant y in \({\mathcal {G}}_{h,x_1,x_2}\). That is, we choose a random h conditioned on the event that \(h(x_1,f(x_2))=h(x'_1,y)\) for some \(x'_1 \in f^{-1}(f(x_1))\). Since there are about n such pairs, we can hope that the planted pair \((x'_1,y)\) will be output by \(\mathsf{A}\) with good probability.

However, we need to find an \(x'_1\) for which the pair \((x'_1,y)\) has a good probability of being output by \(\mathsf{A}\). To do so, we also use \(\mathsf{A}\) in order to find a pre-image \(x'_1\) of \(f(x_1)\), and then show that \(x'_1\) has a good probability of being output again by \(\mathsf{A}\).Footnote 7 For more details, see Sect. 4.
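The bound on the expected size of \({\mathcal {G}}_{h,x_1,x_2}\) can be checked empirically at toy parameters. The sketch below samples random binary matrices as h (the family of Lemma 2.13) and a contrived 4-regular f; the average comes out slightly above \(n = 2^n/2^{n-\log n}\) because the pair \((x_1, f(x_2))\) itself always collides with itself.

```python
import random
from statistics import mean

n = 4
ell = 2                                  # n - log n output bits of h
f = lambda x: x // 4                     # toy 4-regular function on 4-bit inputs
domain = range(2 ** n)
image = sorted({f(x) for x in domain})

def sample_hash(rng):
    rows = [rng.getrandbits(ell) for _ in range(2 * n)]
    def h(v):                            # h(v) = v*M over GF(2)
        acc = 0
        for i in range(2 * n):
            if (v >> i) & 1:
                acc ^= rows[i]
        return acc
    return h

rng = random.Random(0)
x1, x2 = 5, 9
sizes = []
for _ in range(500):
    h = sample_hash(rng)
    target = h(x1 | (f(x2) << n))
    G_set = [(xp, y) for xp in domain if f(xp) == f(x1)
                     for y in image if h(xp | (y << n)) == target]
    sizes.append(len(G_set))

avg = mean(sizes)
print(avg)   # roughly 1 + (2^n - 1) / 2^ell, i.e. close to n
```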

1.3 Additional Related Work

Arbitrary One-Way Functions. In [12], the notion of inaccessible entropy (introduced in [14]) was used in order to construct a UOWHF. Similar techniques were later used in [10] to construct a PRG, where the notion of inaccessible entropy was replaced with next-block pseudoentropy. This construction was later simplified by [21], who also improved the seed length at the cost of adaptivity. Recently, [1] pointed out that the notions of accessible entropy and next-block pseudoentropy are deeply related to each other.

Regular One-Way Functions. As mentioned above, the constructions from regular one-way functions are more efficient. Besides almost-regularity, a few refinements of regularity were considered in past works. [4] showed a construction of a UOWHF that uses \(O(ns^6(n))\) key length under the assumption that \(f^{-1}(f(x))\) is concentrated in an interval of size \(2^{s(n)}\). [24] considered unknown-weakly-regular functions. These are functions for which the set of inputs with the maximal number of siblings has density at least \(n^{-c}\) for some constant c. For such functions, [24] presented a PRG with \(O(n\log n)\) seed length and \(O(n^{2c+1})\) calls. [23] considered known-almost-regular and unknown-weakly-regular functions; for the latter, [23] showed a tight construction of a UOWHF based on the randomized iterate method.

1.4 Paper Organisation

Formal definitions are given in Sect. 2. The PRG construction and proof of Theorem 1.1 are in Sect. 3. The UOWHF construction and proof of Theorem 1.2 are in Sect. 4.

2 Preliminaries

2.1 Notations

We use calligraphic letters to denote sets, uppercase for random variables, and lowercase for values and functions. For \(n \in {\mathbb {N}}\), let \([n] :=\left\{ 1,\dots ,n\right\} \). Given a vector \(s\in \left\{ 0,1\right\} ^n\), let \(s_i\) denote its i-th entry, and \(s_{1,\dots , i}\) denote its first i entries. For \(s,w\in \left\{ 0,1\right\} ^*\) we use \(s\circ w\) to denote their concatenation and for \(s,w\in \left\{ 0,1\right\} ^n\), we use \(s\oplus w \in \left\{ 0,1\right\} ^n\) to denote their bit-wise XOR.

The support of a distribution P over a finite set \(\mathcal {S}\) is defined by \({\text {Supp}}(P) :=\left\{ x\in \mathcal {S}: P(x)>0\right\} \). For a (discrete) distribution D let \(d\leftarrow D\) denote that d was sampled according to D. Similarly, for a set \(\mathcal {S}\), let \(s\leftarrow \mathcal {S}\) denote that s is drawn uniformly from \(\mathcal {S}\). For a function \(f:\left\{ 0,1\right\} ^n\rightarrow \left\{ 0,1\right\} ^n\), let \(y\leftarrow f(\left\{ 0,1\right\} ^n)\) denote that y is sampled from the following distribution: sample x uniformly from \(\left\{ 0,1\right\} ^n\), and let \(y=f(x)\). Let \(\mathsf{Im}(f) ~:=\left\{ f(x) :x\in \left\{ 0,1\right\} ^n\right\} \) be the image of f. The statistical distance (also known as variation distance) of two distributions P and Q over a discrete domain \(\mathcal {X}\) is defined by \(\mathsf {\textsc {SD}}({P},{Q}) :=\max _{\mathcal {S}\subseteq \mathcal {X}} \left| P(\mathcal {S})-Q(\mathcal {S})\right| = \frac{1}{2} \sum _{x \in \mathcal {X}}\left| P(x)-Q(x)\right| \). The min-entropy of a distribution X, denoted by \(\mathrm {{H}_{\infty }}(X)\), is defined by \(\mathrm {{H}_{\infty }}(X):=-\log (\max _{x\in {\text {Supp}}(X)}\left\{ \Pr \left[ X=x\right] \right\} )\).
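The two quantities just defined can be computed directly for small explicit distributions, represented here as dictionaries mapping outcomes to probabilities (a convention of this sketch, not of the paper).

```python
import math

def statistical_distance(P, Q):
    """SD(P, Q) = (1/2) * sum_x |P(x) - Q(x)| over the union of the supports."""
    support = set(P) | set(Q)
    return sum(abs(P.get(x, 0.0) - Q.get(x, 0.0)) for x in support) / 2

def min_entropy(P):
    """H_inf(P) = -log2 of the largest point mass."""
    return -math.log2(max(P.values()))

P = {0: 0.5, 1: 0.5}               # uniform over {0,1}
Q = {0: 0.75, 1: 0.25}
print(statistical_distance(P, Q))  # 0.25
print(min_entropy(Q))              # -log2(0.75), about 0.415
```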

Let \({\text {poly}}\) denote the set of all polynomials, and let PPT stand for probabilistic polynomial time. A function \(\nu :{\mathbb {N}}\rightarrow [0,1]\) is negligible, denoted \(\nu (n) = neg(n)\), if \(\nu (n) < 1/p(n)\) for every \(p\in {\text {poly}}\) and large enough n. Lastly, we identify a matrix \(M\in \left\{ 0,1\right\} ^{n \times m}\) with a function \(M:\left\{ 0,1\right\} ^n\rightarrow \left\{ 0,1\right\} ^m\) by \(M(x):=x\cdot M\), thinking of \(x\in \left\{ 0,1\right\} ^n\) as a vector with dimension n.

2.2 One-Way Functions

We now formally define basic cryptographic primitives. We start with the definition of one-way function.

Definition 2.1

(One-way function). A polynomial-time computable function

\(f:\{0,1\}^{*}\rightarrow \{0,1\}^{*}\) is called a one-way function if for every probabilistic polynomial time algorithm \(\mathsf{A}\), there is a negligible function \(\nu :\mathbb {N}\rightarrow [0,1]\) such that for every \(n\in {\mathbb {N}}\)

$$ \Pr _{x\leftarrow \{0,1\}^{n}}\left[ \mathsf{A}(f(x))\in f^{-1}(f(x))\right] \le \nu (n) $$

For simplicity we assume that the one-way function f is length-preserving. That is, \(\left| f(x)\right| =\left| x\right| \) for every \(x\in \left\{ 0,1\right\} ^*\). This can be assumed without loss of generality, and is not crucial for our constructions.

In this paper we focus on almost-regular one-way functions, formally defined below.

Definition 2.2

(Almost-regular function). A function family \(f=\{f_n: \left\{ 0,1\right\} ^n\rightarrow \left\{ 0,1\right\} ^n\}\) is \(\beta \) -almost-regular for \(\beta \ge 0\) if for every \(n\in {\mathbb {N}}\) and \(x\in \left\{ 0,1\right\} ^n\) it holds that

$$\begin{aligned} \frac{2^n}{\left| \mathsf{Im}(f) ~\right| }\cdot n^{-\beta } \le \left| f^{-1}(f(x))\right| \le \frac{2^n}{\left| \mathsf{Im}(f) ~\right| }\cdot n^\beta . \end{aligned}$$

f is almost-regular if there exists \(\beta \ge 0\) such that f is \(\beta \)-almost-regular, and regular if it is 0-almost-regular.

Note that we do not assume that the regularity of f can be computed efficiently. That is, we only assume that f is unknown-(almost)-regular.
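Definition 2.2 can be checked by brute force at small input lengths. The helper below (its name and the toy functions are invented for illustration) tabulates the preimage sizes and tests the two bounds.

```python
from collections import Counter

def is_beta_almost_regular(f, n, beta):
    """Check the bounds of Definition 2.2 for a single input length n by brute force."""
    sizes = Counter(f(x) for x in range(2 ** n))   # |f^{-1}(y)| for each y in Im(f)
    avg = 2 ** n / len(sizes)                      # 2^n / |Im(f)|
    return all(avg * n ** (-beta) <= s <= avg * n ** beta
               for s in sizes.values())

regular = lambda x: x // 4           # every image point has exactly 4 preimages
lopsided = lambda x: 0 if x == 0 else 1

assert is_beta_almost_regular(regular, 4, 0)       # 0-almost-regular, i.e. regular
assert not is_beta_almost_regular(lopsided, 4, 0)
assert is_beta_almost_regular(lopsided, 4, 1.5)    # preimage sizes within n^1.5 of average
```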

Immediately from the definition of a one-way function, we get the following simple observation.

Claim 2.3

For every one-way function \(f:\left\{ 0,1\right\} ^n\rightarrow \left\{ 0,1\right\} ^n\) there exists a negligible function \(\nu (n)\) such that for every input \(x\in \left\{ 0,1\right\} ^n\) it holds that \(\left| f^{-1}(f(x))\right| \le 2^n\cdot \nu (n)\).

2.3 Pseudorandom Generators

In Sect. 3 we use one-way functions in order to construct PRGs. The latter are formally defined below.

Definition 2.4

(Pseudorandom generator). Let n be a security parameter. A polynomial-time computable function \(G:\{0,1\}^{n}\rightarrow \{0,1\}^{m(n)}\) is called a pseudorandom generator if \(m(n)>n\) for every \(n>0\) and, for every probabilistic polynomial-time algorithm \(\mathsf{D}\), there is a negligible function \(\nu :\mathbb {N}\rightarrow [0,1]\) such that for every \(n>0\),

$$ \left| \Pr _{x\leftarrow \left\{ 0,1\right\} ^{n}}\left[ \mathsf{D}(G(x))=1\right] -\Pr _{x\leftarrow \left\{ 0,1\right\} ^{m(n)}}\left[ \mathsf{D}(x)=1\right] \right| \le \nu (n). $$

A key ingredient in the construction of PRG from one-way function is the Goldreich-Levin hardcore predicate. The following lemma follows almost directly from [9].

Lemma 2.5

Let n be a security parameter. Let \(f:\left\{ 0,1\right\} ^n\rightarrow \left\{ 0,1\right\} ^n\) be a function, and D a distribution on \(\left\{ 0,1\right\} ^n\), such that for every PPT \(\mathsf{A}\)

$$\begin{aligned} \Pr _{x\leftarrow D}\left[ \mathsf{A}(f(x))\in f^{-1}(f(x))\right] = neg(n). \end{aligned}$$

Then for every PPT \(\mathsf{P}\),

$$\begin{aligned} \Pr _{x\leftarrow D, r\leftarrow \left\{ 0,1\right\} ^n}\left[ \mathsf{P}(f(x),r)={\text {GL}}(x,r)\right] \le 1/2+neg(n) \end{aligned}$$

where \({\text {GL}}(x,r):=\langle x,r \rangle \) is the Goldreich-Levin predicate.

Proof

By the proof of Goldreich-Levin [9], for every \(p\in {\text {poly}}\) there is an oracle-aided PPT algorithm \(\mathsf{A}\) such that for every algorithm \(\mathsf{P}\) and x with

$$\begin{aligned} \Pr _{r\leftarrow \left\{ 0,1\right\} ^n}\left[ \mathsf{P}(f(x),r)={\text {GL}}(x,r)\right] \ge 1/2+1/p(n) \end{aligned}$$

it holds that

$$\begin{aligned} \Pr \left[ \mathsf{A}^{\mathsf{P}}(f(x))=x\right] \ge 1/p^2(n). \end{aligned}$$

Thus, since f is hard to invert on inputs drawn from D, it holds for every \(p\in {\text {poly}}\) that

$$\begin{aligned} \Pr _{x\leftarrow D}\left[ \Pr _{r\leftarrow \left\{ 0,1\right\} ^n}\left[ \mathsf{P}(f(x),r)={\text {GL}}(x,r)\right] \ge 1/2+1/p(n)\right] = neg(n) \end{aligned}$$

which implies that

$$\begin{aligned} \Pr _{x\leftarrow D, r\leftarrow \left\{ 0,1\right\} ^n}\left[ \mathsf{P}(f(x),r)={\text {GL}}(x,r)\right] \le 1/2+1/p(n) +neg(n) \end{aligned}$$

for every \(p\in {\text {poly}}\), which implies the lemma.
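The predicate \({\text {GL}}\) itself is a single inner product over GF(2). The following minimal sketch, with x and r packed into Python integers, illustrates it and its linearity in r; the encoding is a convention of this sketch.

```python
def GL(x, r):
    """Goldreich-Levin predicate <x, r>: the parity of the bitwise AND of x and r."""
    return bin(x & r).count("1") & 1

# GL is linear in r: GL(x, r ^ r') = GL(x, r) ^ GL(x, r')
assert GL(0b1011, 0b0001) == 1   # selects the low bit of x
assert GL(0b1011, 0b1010) == 0   # bits 1 and 3 of x are both set, so parity 0
assert GL(0b1011, 0b1011) == GL(0b1011, 0b1010) ^ GL(0b1011, 0b0001)
```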

The next lemma, stated in [22], is useful for showing that a sequence of bits is pseudorandom. The proof of the lemma is given in Appendix A.

Lemma 2.6

(Distinguishability to prediction). There exists an oracle-aided PPT algorithm \(\mathsf{P}\) such that the following holds. Let Q be a distribution over \(\left\{ 0,1\right\} ^*\times \left\{ 0,1\right\} ^n\), let \(\mathsf{D}\) be an algorithm and \(\alpha \in [0,1]\) such that,

$$\begin{aligned} \Pr _{(x,y)\leftarrow Q, z\leftarrow \left\{ 0,1\right\} ^n}\left[ \mathsf{D}(x,z)=1\right] -\Pr _{(x,y)\leftarrow Q}\left[ \mathsf{D}(x,y)=1\right] \ge \alpha . \end{aligned}$$

Then there exists \(i\in [n]\) such that

$$\begin{aligned} \Pr _{(x,y) \leftarrow Q}\left[ \mathsf{P}^{\mathsf{D}}(x,y_{1,\dots ,i-1})=y_i\right] \ge 1/2 + \alpha /n. \end{aligned}$$

2.4 Universal One Way Hash Function

Lastly, we formally define UOWHF.

Definition 2.7

(Universal one-way hash function) 

Let k be a security parameter. A family of functions

\(\mathcal {F}=\left\{ f_z:\left\{ 0,1\right\} ^{n(k)}\rightarrow \left\{ 0,1\right\} ^{m(k)}\right\} _{z \in \left\{ 0,1\right\} ^k}\) is a family of universal one-way hash functions (UOWHFs) if it satisfies:

  1. Efficiency: Given \(z\in \left\{ 0,1\right\} ^k\) and \(x\in \left\{ 0,1\right\} ^{n(k)}\), \(f_z(x)\) can be evaluated in time \({\text {poly}}(n(k),k)\).

  2. Shrinking: \(m(k)<n(k)\).

  3. Target Collision Resistance: For every probabilistic polynomial-time adversary \(\mathsf{A}\), the probability that \(\mathsf{A}\) succeeds in the following game is negligible in k:

     (a) Let \((x,state)\leftarrow \mathsf{A}(1^k)\in \left\{ 0,1\right\} ^{n(k)}\times \left\{ 0,1\right\} ^*\).

     (b) Choose \(z\leftarrow \left\{ 0,1\right\} ^k\).

     (c) Let \(x'\leftarrow \mathsf{A}(state, z)\in \left\{ 0,1\right\} ^{n(k)}\).

     (d) \(\mathsf{A}\) succeeds if \(x\ne x'\) and \(f_z(x)=f_z(x')\).

A relaxation of the target collision resistance property can be obtained by requiring the function to be collision resistant only on random inputs.

Definition 2.8

(Collision resistance on random inputs). Let n be a security parameter. A function \(f:\left\{ 0,1\right\} ^{n}\rightarrow \left\{ 0,1\right\} ^{m(n)}\) is collision resistant on random inputs if for every probabilistic polynomial-time adversary \(\mathsf{A}\), the probability that \(\mathsf{A}\) succeeds in the following game is negligible in n:

  1. Choose \(x\leftarrow \left\{ 0,1\right\} ^{n}\).

  2. Let \(x'\leftarrow \mathsf{A}(x)\in \left\{ 0,1\right\} ^{n}\).

  3. \(\mathsf{A}\) succeeds if \(x\ne x'\) and \(f(x)=f(x')\).

The following lemma states that in order to get a UOWHF, it is enough to construct a function that is collision resistant on random inputs.

Lemma 2.9

(From random inputs to targets, folklore). Let n be a security parameter. Let \(F:\left\{ 0,1\right\} ^{n} \rightarrow \left\{ 0,1\right\} ^{m(n)}\) be a length-decreasing function. Suppose F is collision-resistant on random inputs.

Then \(\left\{ F_y:\left\{ 0,1\right\} ^n\rightarrow \left\{ 0,1\right\} ^m\right\} _{y\in \left\{ 0,1\right\} ^n}\), for \(F_y(x):=F(y\oplus x)\), is a family of target collision-resistant hash functions.

2.5 2-Universal Hash Families

2-universal families are an important ingredient in our constructions. In this section, we formally define this notion, together with some useful properties of such families.

Definition 2.10

(2-universal family). A family of functions \(\mathcal {F}=\left\{ f:\left\{ 0,1\right\} ^n\rightarrow \left\{ 0,1\right\} ^\ell \right\} \) is 2-universal if for every \(x \ne x' \in \left\{ 0,1\right\} ^n\) it holds that \(\Pr _{f \leftarrow \mathcal {F}}\left[ f(x)=f(x')\right] = 2^{-\ell }\).

A universal family is explicit if, given a description of a function \(f \in \mathcal {F}\) and \(x\in \left\{ 0,1\right\} ^n\), f(x) can be computed in polynomial time (in \(n,\ell \)). Such a family is constructible if it is explicit and there is a PPT algorithm that, given \(x,x'\in \left\{ 0,1\right\} ^n\), outputs a uniform \(f \in \mathcal {F}\) conditioned on \(f(x) =f(x')\).
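For the family of binary matrices used later (Lemma 2.13 and Sect. 3), the 2-universality condition can be verified exhaustively at toy sizes, since a collision of \(x \ne x'\) happens exactly when \((x\oplus x')M=0\):

```python
from itertools import product

n, ell = 3, 2

def apply(M, x):
    """h(x) = x * M over GF(2), with M given as a tuple of n row-ints of ell bits."""
    acc = 0
    for i in range(n):
        if (x >> i) & 1:
            acc ^= M[i]
    return acc

matrices = list(product(range(2 ** ell), repeat=n))  # all 2^(n*ell) binary matrices
for x in range(2 ** n):
    for xp in range(2 ** n):
        if x == xp:
            continue
        collisions = sum(apply(M, x) == apply(M, xp) for M in matrices)
        assert collisions * 2 ** ell == len(matrices)  # exactly a 2^-ell fraction
print("the matrix family is 2-universal at n=3, ell=2")
```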

An important property of 2-universal families is that they can be used to construct a strong extractor. This is stated in the leftover hash lemma:

Lemma 2.11

(Leftover hash lemma [18]). Let \(n\in {\mathbb {N}}\), \(\epsilon \in [0,1]\), and let X be a random variable over \(\left\{ 0,1\right\} ^n\). Let \(\mathcal {H}=\left\{ h:\left\{ 0,1\right\} ^n\rightarrow \left\{ 0,1\right\} ^\ell \right\} \) be a 2-universal hash family with \(\ell \le \mathrm {{H}_{\infty }}(X)-2\log 1/\epsilon \). Then,

$$\begin{aligned} SD((H,H(X)),(H, U_\ell )) \le \epsilon \end{aligned}$$

for \(U_\ell \) being the uniform distribution over \(\left\{ 0,1\right\} ^\ell \) and H being the uniform distribution over \(\mathcal {H}\).
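The lemma can be illustrated numerically with tiny parameters (chosen here for enumerability, so the bound \(\epsilon = 2^{-(\mathrm {H}_{\infty }(X)-\ell )/2}\) is loose): the joint statistical distance equals the average over h of \(\mathsf {SD}(h(X), U_\ell )\), computed below exactly over the matrix family.

```python
from itertools import product

n, ell = 6, 2
support = list(range(16))         # X uniform on 16 points of {0,1}^6: H_inf(X) = 4

def apply(M, x):
    """h(x) = x * M over GF(2), M a tuple of n row-ints of ell bits."""
    acc = 0
    for i in range(n):
        if (x >> i) & 1:
            acc ^= M[i]
    return acc

matrices = list(product(range(2 ** ell), repeat=n))
total_sd = 0.0
for M in matrices:
    counts = [0] * (2 ** ell)
    for x in support:
        counts[apply(M, x)] += 1
    # SD(h(X), U_ell) for this particular h
    total_sd += sum(abs(c / len(support) - 2 ** -ell) for c in counts) / 2

avg_sd = total_sd / len(matrices)  # equals SD((H, H(X)), (H, U_ell))
eps = 2 ** -((4 - ell) / 2)        # lemma's bound, from ell = H_inf(X) - 2*log(1/eps)
print(avg_sd, "<=", eps)
assert avg_sd <= eps
```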

The family of all binary matrices of size \(n\times \ell \), \(\left\{ m :m\in \left\{ 0,1\right\} ^{n\times \ell }\right\} \), is a constructible 2-universal family. This family has an additional property that is useful in the proof. This property is defined below.

Definition 2.12

(Approximately flat family). A family of functions \(\mathcal {H}=\left\{ h :\left\{ 0,1\right\} ^{2n} \rightarrow \left\{ 0,1\right\} ^{\ell }\right\} \) is approximately-flat if for every set \(\mathcal {Y}\subseteq \left\{ 0,1\right\} ^n\), \(x_1,x_2 \in \left\{ 0,1\right\} ^n\) and \(y_1 \in \mathcal {Y}\) it holds that,

$$\begin{aligned} \Pr _{h \leftarrow \mathcal {H}}\left[ \exists y_2\in \mathcal {Y}\text { s.t. } h(x_1,y_1)= h(x_2,y_2)\right] \ge 2^{-10}\cdot \min \left\{ \left| \mathcal {Y}\right| \cdot 2^{-\ell }, 1\right\} . \end{aligned}$$

The proof of the next lemma is in Appendix A.

Lemma 2.13

For every \(\ell ,n \in {\mathbb {N}}\) such that \(\ell \le n\), the family \(\left\{ m :m\in \left\{ 0,1\right\} ^{n\times \ell }\right\} \) is approximately-flat.

2.6 Useful Inequalities

The following well-known inequalities will be useful later on.

Lemma 2.14

(Jensen Inequality). Let X be a distribution over \({\mathbb R}\) and let \(f:{\mathbb R}\rightarrow {\mathbb R}\) be a convex function. It holds that

$$\begin{aligned} f\left( \mathbb {E}\left[ X\right] \right) \le \mathbb {E}\left[ f(X)\right] \end{aligned}$$

Lemma 2.15

(Cauchy–Schwarz inequality). Let \(n \in {\mathbb {N}}\) and let \(a_1,\dots ,a_n \in {\mathbb R}\) be real numbers. Then,

$$\begin{aligned} (\sum _{i\in [n]} a_i)^2 \le n \cdot \sum _{i\in [n]} a_i^2 \end{aligned}$$

Lastly, the following lemma will be useful in the security proof of the UOWHF. Let \(\mathsf{A}\) be an algorithm such that for every x, the output of \(\mathsf{A}(x)\) lies in some small set \(\mathcal {S}_x\). The lemma roughly states that the event of two executions of \(\mathsf{A}\) returning the same value is not too rare.

Lemma 2.16

Let \(\varOmega \subseteq \left\{ 0,1\right\} ^n\) and \(\mathcal {X}\) be some set, let X be a distribution over \(\mathcal {X}\), and let \(S:\mathcal {X}\rightarrow P(\varOmega )\) be a function that maps elements in \(\mathcal {X}\) to subsets of \(\varOmega \). Let \(\mathsf{A}\) be an algorithm, such that for every \(x\in \mathcal {X}\), \(\mathsf{A}(x)\in S(x)\cup \{\bot \}\). Assume that for every \(u \in \varOmega \), it holds that \(0<\Pr _{x\leftarrow X}\left[ u\in S(x)\right] \le \ell /\left| \varOmega \right| \), and that \(\Pr _{x\leftarrow X}\left[ \mathsf{A}(x)\in S(x)\right] \ge p\). Then

$$\begin{aligned} \sum _{u\in \varOmega } \Pr _{x \leftarrow X}\left[ \mathsf{A}(x)=u\right] \Pr _{x \leftarrow X}\left[ \mathsf{A}(x)=u\mid u \in S(x)\right] \ge p^2/\ell . \end{aligned}$$


Proof

Using the Cauchy–Schwarz inequality (Lemma 2.15), it holds that:

$$\begin{aligned}&\sum _{u\in \varOmega } \Pr _{x \leftarrow X}\left[ \mathsf{A}(x)=u\right] \Pr _{x \leftarrow X}\left[ \mathsf{A}(x)=u\mid u \in S(x)\right] \\&= \sum _{u\in \varOmega } \Pr _{x \leftarrow X}\left[ \mathsf{A}(x)=u\right] ^2/\Pr _{x \leftarrow X}\left[ u \in S(x)\right] \\&\ge \sum _{u\in \varOmega } \Pr _{x \leftarrow X}\left[ \mathsf{A}(x)=u\right] ^2\cdot \left| \varOmega \right| /\ell \\&\ge \left( \sum _{u\in \varOmega } \Pr _{x \leftarrow X}\left[ \mathsf{A}(x)=u\right] \right) ^2/\ell \\&\ge p^2/\ell . \end{aligned}$$
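The inequality of Lemma 2.16 can be sanity-checked on a toy deterministic instance with exact rational arithmetic; all names below (\(\varOmega \), S, the two-point distribution) are invented for this example, and the first equality of the proof is used to compute the conditional probabilities.

```python
from fractions import Fraction

# A toy instance of Lemma 2.16: Omega = {0,1,2}, X uniform on {"a","b"},
# S maps each x to a subset of Omega, and A is deterministic with A(x) in S(x).
Omega = [0, 1, 2]
X = ["a", "b"]                       # uniform distribution over two inputs
S = {"a": {0, 1}, "b": {1, 2}}
A = {"a": 0, "b": 2}                 # A(x) is always in S(x), so p = 1

def pr_x(pred):
    return Fraction(sum(pred(x) for x in X), len(X))

ell = 3                              # Pr[u in S(x)] <= ell / |Omega| = 1 for all u
p = pr_x(lambda x: A[x] in S[x])     # = 1

lhs = Fraction(0)
for u in Omega:
    pr_eq = pr_x(lambda x: A[x] == u)
    pr_in = pr_x(lambda x: u in S[x])
    if pr_in > 0:
        # Pr[A(x)=u | u in S(x)] = Pr[A(x)=u] / Pr[u in S(x)], since A(x) in S(x)
        lhs += pr_eq * (pr_eq / pr_in)

print(lhs, ">=", p ** 2 / ell)
assert lhs >= p ** 2 / ell
```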

3 The PRG Construction

In this section we prove the security of our PRG construction. We start with a description of the construction. Let \(f:\left\{ 0,1\right\} ^n\rightarrow \left\{ 0,1\right\} ^n\) be an almost-regular one-way function, let t be a parameter and let \(\mathcal {H}= \left\{ m :m \in \left\{ 0,1\right\} ^{2n\times ( n+\log n)}\right\} \) be the 2-universal family induced by the set of matrices of size \(2n\times (n+\log n)\).Footnote 8 The generator \(G:\mathcal {H}\times \left\{ 0,1\right\} ^{ n(t+1)}\rightarrow \mathcal {H}\times \left\{ 0,1\right\} ^{t\cdot (n+\log n)}\) is given by

$$\begin{aligned} G\big (h,x_1,\dots , x_{t+1}\big )=\left( h,h(x_1,f(x_{2})), \dots ,h(x_t,f(x_{t+1}))\right) . \end{aligned}$$
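To make the data flow concrete, here is a minimal executable sketch of G for toy parameters (\(n=8\)). The placeholder f below is an affine permutation, regular but trivially invertible, so it stands in for a one-way function for illustration only; the matrix hash represents \(\mathcal {H}\):

```python
import secrets

# Toy sketch of the generator G (n = 8, so log n = 3): h is a random binary
# matrix mapping 2n = 16 bits to n + log n = 11 bits over GF(2), stored as
# n + log n row masks of 2n bits each.
N, LOGN = 8, 3

def f(x):
    return (3 * x + 1) % (1 << N)   # placeholder "one-way" function (NOT one-way)

def apply_hash(h, value):
    # output bit i is the GF(2) inner product of row mask i with `value`
    return sum(((bin(row & value).count("1") & 1) << i) for i, row in enumerate(h))

def G(h, xs):
    # G(h, x_1, ..., x_{t+1}) = (h, h(x_1, f(x_2)), ..., h(x_t, f(x_{t+1})))
    blocks = tuple(
        apply_hash(h, (xs[i] << N) | f(xs[i + 1])) for i in range(len(xs) - 1)
    )
    return (tuple(h), blocks)

t = 4
h = [secrets.randbits(2 * N) for _ in range(N + LOGN)]
seed = [secrets.randbits(N) for _ in range(t + 1)]
out = G(h, seed)
# t output blocks of n + log n bits each; all t calls to f are non-adaptive
assert len(out[1]) == t and all(0 <= b < 1 << (N + LOGN) for b in out[1])
```

Note that the t calls to f (on \(x_2,\dots ,x_{t+1}\)) are independent of one another, matching the non-adaptivity claim below.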

The main theorem of this part is as follows.

Theorem 3.1

[Main theorem for PRG] Let \(f:\left\{ 0,1\right\} ^n\rightarrow \left\{ 0,1\right\} ^n\) be an almost-regular one-way function and let \(t(n) \ge n/\log n+1\) be some polynomial. Then G is a PRG with seed length \(O(n^2+n(t+1))\). Furthermore, G uses t non-adaptive calls to f.
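For concreteness, the parameter accounting behind Theorem 3.1 is a direct computation from the type of G (the description of \(\mathcal {H}\) takes \(\log \left| \mathcal {H}\right| = 2n(n+\log n)\) bits):

```latex
\text{seed length} = \log\left|\mathcal{H}\right| + n(t+1)
  = 2n(n+\log n) + n(t+1) = O\!\left(n^2 + n(t+1)\right),
\qquad
\text{stretch} = t(n+\log n) - n(t+1) = t\log n - n ,
```

and the stretch is positive exactly when \(t > n/\log n\).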

Note that the stretch of G is \(t\cdot \log n - n\), which is tight with [6] for large values of t. We now prove Theorem 3.1. Our main lemma states that given h and \(f(x_1)\), the hash \(h(x_1,f(x_2))\) looks uniform to a computationally bounded algorithm.

Lemma 3.2

Let \(f:\left\{ 0,1\right\} ^n\rightarrow \left\{ 0,1\right\} ^n\) be an almost-regular one-way function. For any PPT algorithm \(\mathsf{D}\), it holds that

$$\begin{aligned}&\left| \Pr _{\begin{array}{c} x_1\leftarrow \left\{ 0,1\right\} ^{n},\\ h \leftarrow \mathcal {H},\\ u \leftarrow \left\{ 0,1\right\} ^{n+\log n} \end{array}}\left[ \mathsf{D}(h,f(x_1),u)=1\right] -\Pr _{\begin{array}{c} x_1,x_2 \leftarrow \left\{ 0,1\right\} ^{n},\\ h \leftarrow \mathcal {H} \end{array}}\left[ \mathsf{D}(h,f(x_1),h(x_1,f(x_{2})))=1\right] \right| \\ \end{aligned}$$

is a negligible function of n.

We prove Lemma 3.2 below, but first we use it in order to give the proof of Theorem 3.1, which is straightforward.

Proof

(Proof of Theorem 3.1). Let f and t be as in Theorem 3.1. By construction, G makes t non-adaptive calls to f. Additionally, the output of G is longer than its seed, since \(t(n+\log n) > n(t+1)\) when \(t\ge n/\log n +1\). It is left to show that the output of G is indistinguishable from uniform. The proof is by a hybrid argument. Let H be a uniform random variable over \(\mathcal {H}\), and let \(X_1,\dots ,X_{t+1}\) be i.i.d. uniform random variables over \(\left\{ 0,1\right\} ^n\). Assume toward a contradiction that there is a PPT algorithm \(\mathsf{\widehat{D}}\) that distinguishes \(G(H,X_1,\dots ,X_{t+1})\) from uniform. We show that the following algorithm \(\mathsf{D}\) then contradicts Lemma 3.2.

[Figure a: the algorithm \(\mathsf{D}\)]

For each \(\ell \in [t+1]\), let the distribution \(Hyb_\ell \) be defined as

$$\begin{aligned} Hyb_\ell :=\left( H,H(X_1,f(X_{2})), \dots ,H(X_{\ell -1},f(X_{\ell })),U_{(t+1-\ell )\cdot (n+\log n)}\right) \end{aligned}$$

where \(U_{(t+1-\ell )\cdot (n+\log n)}\) is the uniform distribution over \(\left\{ 0,1\right\} ^{(t+1-\ell )\cdot (n+\log n)}\). That is, \(Hyb_\ell \) agrees with \(G(H,X_1,\dots ,X_{t+1})\) on the first \(\ell -1\) blocks, and is uniform on the rest. Observe that for every fixing of \(\ell \) in the algorithm, the distribution of w on input \(h \leftarrow \mathcal {H}, y \leftarrow f(U_n), z \leftarrow \left\{ 0,1\right\} ^{n+\log n}\) is exactly the distribution \(Hyb_{\ell }\). Similarly, the distribution of w on input \(h \leftarrow \mathcal {H}, y \leftarrow f(U_n)\) and \(z = h(X',Y')\) for \(X'\leftarrow f^{-1}(y)\) and \(Y' \leftarrow f(\left\{ 0,1\right\} ^n)\) is exactly the distribution \(Hyb_{\ell +1}\). Thus, it holds that,

$$\begin{aligned}&\left| \Pr _{\begin{array}{c} x_1\leftarrow \left\{ 0,1\right\} ^{n}, \\ h \leftarrow \mathcal {H},\\ u \leftarrow \left\{ 0,1\right\} ^{n+\log n} \end{array}}\left[ \mathsf{D}(h,f(x_1),u)=1\right] -\Pr _{\begin{array}{c} x_1,x_2 \leftarrow \left\{ 0,1\right\} ^{n},\\ h \leftarrow \mathcal {H} \end{array}}\left[ \mathsf{D}(h,f(x_1),h(x_1,f(x_{2})))=1\right] \right| \nonumber \\&= \left| 1/t \cdot \sum _{\ell =1}^t\bigg (\Pr _{w\leftarrow Hyb_{\ell }}\left[ \mathsf{\widehat{D}}(w)=1\right] -\Pr _{w\leftarrow Hyb_{\ell +1}}\left[ \mathsf{\widehat{D}}(w)=1\right] \bigg )\right| \nonumber \\&= 1/t \cdot \left| \Pr _{w\leftarrow Hyb_{1}}\left[ \mathsf{\widehat{D}}(w)=1\right] -\Pr _{w\leftarrow Hyb_{t+1}}\left[ \mathsf{\widehat{D}}(w)=1\right] \right| \nonumber \\&=1/t \cdot \left| \Pr _{w\leftarrow \left\{ 0,1\right\} ^{\log \left| \mathcal {H}\right| +(n+\log n)\cdot t}}\left[ \mathsf{\widehat{D}}(w)=1\right] -\Pr _{w\leftarrow G(H,X_1,\dots ,X_{t+1})}\left[ \mathsf{\widehat{D}}(w)=1\right] \right| . \end{aligned}$$
(1)

where the last equality holds since \(Hyb_{t+1}\equiv G(H,X_1,\dots ,X_{t+1})\) and \(Hyb_1\) is the uniform distribution. We conclude by Lemma 3.2 that the distinguishing advantage of \(\mathsf{\widehat{D}}\) is negligible.

3.1 Proving Lemma 3.2

In the rest of this section we prove Lemma 3.2. Fix \(n\in {\mathbb {N}}\), \(\beta \ge 0\) and a \(\beta \)-almost-regular one-way function \(f:\{0,1\}^{n}\rightarrow \{0,1\}^{n}\). Recall that we want to show that \(h(x_1,f(x_2))\) looks uniform to computationally bounded algorithms, given h and \(f(x_1)\). By the leftover hash lemma, every prefix \(p(x_1,x_2)\) of the above hash \(h(x_1,f(x_2))\) is somewhat close to uniform. In order to show that the suffix looks uniform as well, we prove that the concatenation of \(h,f(x_1)\) and \(p(x_1,x_2)\) is a one-way function, and then apply the Goldreich–Levin theorem. The next claim states that the described function is indeed one-way on part of its domain.

Claim 3.4

For every \(i \in [n+\log n]\), let \(g_i:\mathcal {H}\times \left\{ 0,1\right\} ^n\times \left\{ 0,1\right\} ^n\rightarrow \mathcal {H}\times \left\{ 0,1\right\} ^n\times \left\{ 0,1\right\} ^{i-1}\) be the following function

$$ g_i(h,x_1,y) :=\left( h,f(x_1),h(x_1, y)_{1,\dots ,i-1} \right) . $$

Then it holds that for every PPT \(\mathsf{A}\) and every function \(i=i(n)\)

$$\begin{aligned}&\Pr _{\begin{array}{c} h\leftarrow \mathcal {H}, x_1,x_2 \leftarrow \left\{ 0,1\right\} ^{n}\\ z=(h,x_1,f(x_2)) \end{array}}\left[ \mathsf{A}(g_i(z))\in g^{-1}_i(g_i(z))\right] =neg(n). \end{aligned}$$
(2)

Proof

Assume toward contradiction that the claim does not hold. That is, there exist a PPT algorithm \(\mathsf{A}\), a function i(n) and a constant \(d\in {\mathbb {N}}\) such that

$$\begin{aligned}&\Pr _{\begin{array}{c} h\leftarrow \mathcal {H}, x_1,x_2 \leftarrow \left\{ 0,1\right\} ^{n}\\ z=(h,x_1,f(x_2)) \end{array}}\left[ \mathsf{A}(g_i(z))\in g_i^{-1}(g_i(z))\right] \ge n^{-d} \end{aligned}$$
(3)

for infinitely many \(n\in {\mathbb {N}}\). Fix such an n and consider the following algorithm \(\mathsf{\widehat{A}}\); below we show that \(\mathsf{\widehat{A}}\) can be used to invert f.

[Figure b: the algorithm \(\mathsf{\widehat{A}}\)]

That is, \(\mathsf{\widehat{A}}\) tries to invert y using \(\mathsf{A}\) and only a prefix of \(h(x_1, f(x_2))\). It does so by iterating over all the possible values of the missing input bits

$$\begin{aligned} h(f^{-1}(y),f(x_2))_{n-(4d+2\beta )\log n+1,\dots ,n+\log n} \end{aligned}$$

and every possible index \(j\in [n+\log n]\). Clearly \(\mathsf{\widehat{A}}\) runs in polynomial time, as there are only \(n^{4d+2\beta +1}\) possible guesses for the \((4d+2\beta +1)\log n\) missing bits, and \(n+\log n\) possible indices. Let \(x_1\) be some preimage of y and let \(x_2\) be some element in \(\left\{ 0,1\right\} ^n\). Note that when the guess w is equal to \(h(x_1,f(x_2))_{n-(4d+2\beta )\log n+1,\dots ,n+\log n}\), and the index j is equal to i, the tuple \(h,y,(z\circ w)_{1,\dots ,j-1}\) computed by the algorithm is equal to the output of \(g_i(h,x_1,f(x_2))\). Thus, the success probability of \(\mathsf{\widehat{A}}\) is at least that of \(\mathsf{A}\). Formally, we get that,

$$\begin{aligned}&\Pr _{h\leftarrow \mathcal {H}, x_1,x_2 \leftarrow \left\{ 0,1\right\} ^{n}}\left[ \mathsf{\widehat{A}}(h,f(x_1),h(x_1,f(x_2))_{1,\dots ,n-(4d+2\beta )\log n})\in f^{-1}(f(x_1))\right] \nonumber \\&\ge \Pr _{h\leftarrow \mathcal {H}, x_1,x_2 \leftarrow \left\{ 0,1\right\} ^{n}}\left[ \mathsf{A}(g_i(h,x_1,f(x_2)))\in g_i^{-1}(g_i(h,x_1,f(x_2)))\right] \nonumber \\&\ge n^{-d}. \end{aligned}$$
(4)

Next, we show that \(\mathsf{\widehat{A}}\) succeeds even when the prefix \(h(x_1,f(x_2))_{1,\dots ,n-(4d+2\beta )\log n}\) is replaced with a uniform string. Indeed, recall that by the \(\beta \)-almost-regularity of f, given any fixing of \(f(x_1)\), the min-entropy of \((x_1,f(x_2))\) is at least \(n-2\beta \log n\). Thus, by the leftover hash lemma, \(h(x_1,f(x_2))_{1,\dots ,n-(4d+2\beta )\log n}\) is \(n^{-d}/2\)-close to uniform given h and \(f(x_1)\) (the entropy deficiency is \(4d\log n\), so the statistical distance is at most \(n^{-2d}/2\le n^{-d}/2\)). Let \(k=n-(4d+2\beta )\log n\). Combining the above with Eq. (4),

$$\begin{aligned}&\Pr _{h\leftarrow \mathcal {H}, x_1 \leftarrow \left\{ 0,1\right\} ^{n}, u \leftarrow \left\{ 0,1\right\} ^{k}}\left[ \mathsf{\widehat{A}}(h,f(x_1),u)\in f^{-1}(f(x_1))\right] \nonumber \\&=\mathbb {E}_{y\leftarrow f(\left\{ 0,1\right\} ^n)}\left[ \Pr _{\begin{array}{c} h\leftarrow \mathcal {H}, x_1\leftarrow f^{-1}(y),\\ u \leftarrow \left\{ 0,1\right\} ^{k} \end{array}}\left[ \mathsf{\widehat{A}}(h,y,u)\in f^{-1}(f(x_1))\right] \right] \nonumber \\&\ge \mathbb {E}_{y}\left[ \Pr _{\begin{array}{c} h\leftarrow \mathcal {H}, x_1\leftarrow f^{-1}(y),\\ x_2 \leftarrow \left\{ 0,1\right\} ^{n} \end{array}}\left[ \mathsf{\widehat{A}}(h,y,h(x_1,f(x_2))_{1,\dots ,k})\in f^{-1}(f(x_1))\right] - n^{-d}/2\right] \nonumber \\&= \Pr _{h\leftarrow \mathcal {H}, x_1,x_2 \leftarrow \left\{ 0,1\right\} ^{n}}\left[ \mathsf{\widehat{A}}(h,f(x_1),h(x_1,f(x_2))_{1,\dots ,k})\in f^{-1}(f(x_1))\right] - n^{-d}/2\nonumber \\&\ge n^{-d}/2. \end{aligned}$$
(5)

Finally, let \(\mathsf{Inv}\) be the algorithm that given \(f(x_1)\) samples \(h\leftarrow \mathcal {H}\) and \(u \leftarrow \left\{ 0,1\right\} ^{n-(4d+2\beta )\log n}\), and executes \(\mathsf{\widehat{A}}(h,f(x_1),u)\). By Eq. (5), \(\mathsf{Inv}\) inverts \(f(x_1)\) successfully with probability at least \(n^{-d}/2\) for uniformly sampled \(x_1 \in \left\{ 0,1\right\} ^n\), for infinitely many \(n\in {\mathbb {N}}\), which is a contradiction.

We are now ready to prove Lemma 3.2. The proof follows directly from Claim 3.4 together with Lemmas 2.5 and 2.6.

Proof

(Proof of Lemma 3.2.). Assume toward a contradiction that Lemma 3.2 does not hold. That is, there exist a PPT algorithm \(\mathsf{D}\) and a constant \(c\in {\mathbb {N}}\) such that

$$\begin{aligned}&\left| \Pr _{\begin{array}{c} x_1\leftarrow \left\{ 0,1\right\} ^{n},\\ h\leftarrow \mathcal {H},\\ u \leftarrow \left\{ 0,1\right\} ^{n+\log n} \end{array}}\left[ \mathsf{D}(h,f(x_1),u)=1\right] -\Pr _{\begin{array}{c} x_1,x_2 \leftarrow \left\{ 0,1\right\} ^{n},\\ h \leftarrow \mathcal {H} \end{array}}\left[ \mathsf{D}(h,f(x_1),h(x_1,f(x_{2})))=1\right] \right| \nonumber \\&~~~~~~~~~~~~~~\ge n^{-c} \end{aligned}$$
(6)

for infinitely many \(n\in {\mathbb {N}}\). We assume without loss of generality that for infinitely many \(n\in {\mathbb {N}}\) it holds that

$$\begin{aligned}&\Pr _{\begin{array}{c} x_1\leftarrow \left\{ 0,1\right\} ^{n},\\ h\leftarrow \mathcal {H},\\ u \leftarrow \left\{ 0,1\right\} ^{n+\log n} \end{array}}\left[ \mathsf{D}(h,f(x_1),u)=1\right] -\Pr _{\begin{array}{c} x_1,x_2 \leftarrow \left\{ 0,1\right\} ^{n},\\ h \leftarrow \mathcal {H} \end{array}}\left[ \mathsf{D}(h,f(x_1),h(x_1,f(x_{2})))=1\right] \nonumber \\&\ge n^{-c} \end{aligned}$$
(7)

as otherwise we can flip the output of \(\mathsf{D}\). By Lemma 2.6 there is an oracle-aided PPT algorithm \(\mathsf{P}\) such that for infinitely many \(n\in {\mathbb {N}}\) and some function \(i=i(n)\) it holds that

$$\begin{aligned} \Pr _{\begin{array}{c} x_1,x_2 \leftarrow \left\{ 0,1\right\} ^{n},\\ h \leftarrow \mathcal {H} \end{array}}\left[ \mathsf{P}^{\mathsf{D}}(h, f(x_1),h(x_1,f(x_2))_{1,\dots ,i-1})=h(x_1,f(x_2))_i\right] \ge 1/2 + n^{-c-4}. \end{aligned}$$

Recall that, by definition, \(h,f(x_1),h(x_1,f(x_{2}))_{1,\dots , i-1} = g_{i}(h,x_1,f(x_2))\). Additionally, by our choice of the family \(\mathcal {H}\), \(h(x_1,f(x_{2}))_{i}\) is the \({\text {GL}}\) predicate of the function \(g_{i}\) at the input \((h,x_1,f(x_2))\). Thus, the above contradicts Claim 3.4 and Lemma 2.5.

4 The UOWHF Construction

In this section we prove the security of our UOWHF construction. We start with a full description of the construction. Let \(f:\left\{ 0,1\right\} ^n\rightarrow \left\{ 0,1\right\} ^n\) be an almost-regular one-way function, let t be a parameter and let \(\mathcal {H}= \left\{ m :m \in \left\{ 0,1\right\} ^{2n\times ( n-\log n)}\right\} \) be the 2-universal family induced by the set of matrices of size \(2n\times (n-\log n)\).

The function \(C:\mathcal {H}\times \left\{ 0,1\right\} ^{ n\cdot t}\rightarrow \mathcal {H}\times \left\{ 0,1\right\} ^{(t-1)\cdot (n-\log n)+2n}\) is given by

$$\begin{aligned} C\big (h,x_1,\dots , x_t\big )=\left( h,f(x_1),h(x_1,f(x_{2})), \dots ,h(x_{t-1},f(x_{t})),x_t\right) . \end{aligned}$$
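Analogously to the PRG section, here is a minimal executable sketch of C for toy parameters (\(n=8\)); the placeholder f is again an affine permutation, regular but trivially invertible, for illustrating the data flow only:

```python
import secrets

# Toy sketch of the function C (n = 8, log n = 3): h is a random binary
# matrix mapping 2n = 16 bits to n - log n = 5 bits over GF(2), stored as
# n - log n row masks of 2n bits each.
N, LOGN = 8, 3

def f(x):
    return (5 * x + 7) % (1 << N)   # placeholder "one-way" function (NOT one-way)

def apply_hash(h, value):
    # output bit i is the GF(2) inner product of row mask i with `value`
    return sum(((bin(row & value).count("1") & 1) << i) for i, row in enumerate(h))

def C(h, xs):
    # C(h, x_1, ..., x_t) = (h, f(x_1), h(x_1, f(x_2)), ..., h(x_{t-1}, f(x_t)), x_t)
    mids = tuple(
        apply_hash(h, (xs[i] << N) | f(xs[i + 1])) for i in range(len(xs) - 1)
    )
    return (tuple(h), f(xs[0]), mids, xs[-1])

t = 4
h = [secrets.randbits(2 * N) for _ in range(N - LOGN)]
xs = [secrets.randbits(N) for _ in range(t)]
out = C(h, xs)
# t - 1 middle blocks of n - log n bits each, plus f(x_1) and x_t in the clear
assert len(out[2]) == t - 1 and all(0 <= b < 1 << (N - LOGN) for b in out[2])
```

In contrast to G, the hash blocks here are shorter than n bits, which is what makes C shrinking for large enough t.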

Let \(k=\log \left| \mathcal {H}\right| + n\cdot t\). For a string \(z\in \left\{ 0,1\right\} ^k\), let \(C_z(w):=C(w\oplus z)\). Our main theorem for this part is stated as follows.

Theorem 4.1

[Main theorem for UOWHF] Let \(f:\left\{ 0,1\right\} ^n\rightarrow \left\{ 0,1\right\} ^n\) be an almost-regular one-way function and let \(t(n) \ge n/\log n+2\) be some polynomial. Then \(\mathcal {F}_k=\left\{ C_z\right\} _{z\in \left\{ 0,1\right\} ^k}\) is a family of universal one-way hash functions with key length \(k=O(n^2+n\cdot t(n))\) and output length \(O(n^2+n\cdot t(n))\). Furthermore, for every \(z\in \left\{ 0,1\right\} ^k\), \(C_z\) uses t non-adaptive calls to f.

In the rest of this section we prove Theorem 4.1. Note that by Lemma 2.9, in order to prove Theorem 4.1 it is enough to show that it is hard to find a collision of C for a random input. The main lemma of this part is the following one, which essentially states that no efficient algorithm can find a collision in a simpler function, \(\widehat{C}(h,x_1,x_2) = (h, f(x_1), h(x_1,f(x_2)))\). Note that \(\widehat{C}\) is not a UOWHF, as it is not shrinking, and that we are only interested in collisions \((h,x'_1,x'_2)\) in which \(f(x_2) \ne f(x'_2)\).

Lemma 4.2

Let \(f:\left\{ 0,1\right\} ^n\rightarrow \left\{ 0,1\right\} ^n\) be an almost-regular one-way function. For every PPT algorithm \(\mathsf{A}\), it holds that,

$$\begin{aligned} \Pr _{\begin{array}{c} h\leftarrow \mathcal {H}, x_1,x_2 \leftarrow \left\{ 0,1\right\} ^n,\\ (x'_1,x'_2)\leftarrow \mathsf{A}(h,x_1,x_2) \end{array}}\left[ f(x_1)=f(x'_1)\wedge f(x_2)\ne f(x'_2) \wedge h(x_1,f(x_2)) = h(x'_1,f(x'_2)) \right] \end{aligned}$$

is a negligible function of n.

We prove Lemma 4.2 below, but first let us prove the security of C using Lemma 4.2. The proof is by reduction, stated in the next claim. Informally, we show that an algorithm that breaks the security of C can be used in order to find a collision in the function \(\widehat{C}\) defined above.

Claim 4.3

There exists an oracle-aided PPT algorithm \(\mathsf{A}\) such that the following holds. Let f be a one-way function, \(t\in {\text {poly}}\) and C be the function described above. Let \(n\in {\mathbb {N}}\), \(\alpha \in [0,1]\) and let \(\mathsf{ColFinder}\) be an algorithm such that

$$\begin{aligned} \Pr _{w\leftarrow \mathcal {H}\times (\left\{ 0,1\right\} ^n)^t, w' \leftarrow \mathsf{ColFinder}(w)}\left[ w' \ne w \wedge C(w)=C(w')\right] = \alpha . \end{aligned}$$

Then,

$$\begin{aligned} \Pr _{\begin{array}{c} h\leftarrow \mathcal {H}, x_1,x_2 \leftarrow \left\{ 0,1\right\} ^n,\\ (x'_1,x'_2)\leftarrow \mathsf{A}^{\mathsf{ColFinder}}(h,x_1,x_2) \end{array}}\left[ \begin{array}{c} f(x_1)=f(x'_1)\\ \wedge f(x_2)\ne f(x'_2) \wedge h(x_1,f(x_2)) = h(x'_1,f(x'_2)) \end{array}\right] \ge (\alpha -\nu (n))/t, \end{aligned}$$

where \(\nu \) is a negligible function, depending only on f and t.

The proof of Theorem 4.1 is now immediate.

Proof

(Proof of Theorem 4.1.). Let f, t and \(C_z\) be as in Theorem 4.1. It is clear that \(C_z\) is efficiently computable for every \(z\in \left\{ 0,1\right\} ^k\), and that C is shrinking since \(\log \left| \mathcal {H}\right| + n\cdot t > \log \left| \mathcal {H}\right| + (t-1)\cdot (n-\log n)+2n\) for \(t\ge n/\log n+2\).

Next, we show that C is collision-resistant on random inputs. Assume toward contradiction that there exist a PPT \(\mathsf{ColFinder}\) and \(p \in {\text {poly}}\) such that

$$\begin{aligned} \Pr _{\begin{array}{c} w\leftarrow \mathcal {H}\times (\left\{ 0,1\right\} ^n)^t,\\ w' \leftarrow \mathsf{ColFinder}(w) \end{array}}\left[ w' \ne w \wedge C(w)=C(w')\right] \ge 1/p(n) \end{aligned}$$

for infinitely many \(n \in {\mathbb {N}}\). Then, by Claim 4.3, for infinitely many \(n \in {\mathbb {N}}\) it holds that

$$\begin{aligned}&\Pr _{\begin{array}{c} h\leftarrow \mathcal {H}, x_1,x_2 \leftarrow \left\{ 0,1\right\} ^n,\\ (x'_1,x'_2)\leftarrow \mathsf{A}^{\mathsf{ColFinder}}(h,x_1,x_2) \end{array}}\left[ \begin{array}{c} f(x_1)=f(x'_1)\wedge \\ f(x_2)\ne f(x'_2) \wedge h(x_1,f(x_2)) = h(x'_1,f(x'_2)) \end{array} \right] \ge 1/(2t\cdot p(n)). \end{aligned}$$

Note that by the choice of t, \(1/(2t\cdot p(n))\) is not negligible, and that since both \(\mathsf{A}\) and \(\mathsf{ColFinder}\) are efficient, \(\mathsf{A}^{\mathsf{ColFinder}}(\cdot )\) can be efficiently implemented. Thus, the above contradicts Lemma 4.2.

4.1 Proving Claim 4.3

We next prove Claim 4.3. The next simple claim will be useful in the proof, as it states that given \((h,x_1,\dots ,x_t)\), with high probability there is no collision \((h,x'_1,\dots ,x'_t)\) of C in which for some \(j\in [t-1]\) it holds that \(x_j\ne x'_j\) while \(f(x_j)=f(x'_j)\) and \(f(x_{j+1})=f(x'_{j+1})\).

Claim 4.4

For every one-way function f and polynomial t, there exists a negligible function \(\nu \) such that the following holds. For every \(x_1,\dots , x_t \in \left\{ 0,1\right\} ^n\),

$$\begin{aligned} \Pr _{h \leftarrow \mathcal {H}}\left[ \begin{array}{c} \forall j \in [t-1],\ \forall x'_j \in f^{-1}(f(x_{j}))\setminus \left\{ x_j\right\} \text { it holds that }\\ h(x'_j,f(x_{j+1}))\ne h(x_j,f(x_{j+1})) \end{array}\right] \ge 1-\nu (n). \end{aligned}$$

Proof

Fix \(x_1,\dots , x_t \in \left\{ 0,1\right\} ^n\), \(j\in [t-1]\) and \(x'_j\in f^{-1}(f(x_{j}))\setminus \left\{ x_j\right\} \). Since \(\mathcal {H}\) is 2-universal, it holds that

$$\begin{aligned} \Pr _{h \leftarrow \mathcal {H}}\left[ h(x'_j,f(x_{j+1}))= h(x_j,f(x_{j+1}))\right] = n/2^n. \end{aligned}$$

By the union bound,

$$\begin{aligned}&\Pr _{h \leftarrow \mathcal {H}}\left[ \begin{array}{c} \exists j\in [t-1], x'_j \in f^{-1}(f(x_{j}))\setminus \left\{ x_j\right\} \text { s.t. }\\ h(x'_j,f(x_{j+1}))=h(x_j,f(x_{j+1})) \end{array}\right] \\&\le \sum _{j\in [t-1]}\sum _{x'_j \in f^{-1}(f(x_{j}))\setminus \left\{ x_j\right\} } \Pr _{h\leftarrow \mathcal {H}}\left[ h(x'_j,f(x_{j+1}))=h(x_j,f(x_{j+1}))\right] \\&\le t(n)\cdot |f^{-1}(f(x_{j}))| \cdot n/ 2^{n}. \end{aligned}$$

Since f is a one-way function, by Claim 2.3 it holds that \(|f^{-1}(f(x_{j}))|\le 2^n\cdot neg(n)\) for every j, and thus the claim follows.
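The collision probability of the matrix family used above can be checked exhaustively on tiny stand-in parameters (4 input bits and 2 output bits, playing the roles of \(2n\) and \(n-\log n\)): for any fixed distinct inputs, the fraction of matrices that collide on them is exactly \(2^{-\text{(output bits)}}\), matching the \(n/2^n = 2^{-(n-\log n)}\) used in the proof.

```python
from itertools import product

# Exhaustive 2-universality check for random binary matrices over GF(2),
# represented as tuples of row masks: 4 input bits, 2 output bits.
IN_BITS, OUT_BITS = 4, 2

def apply_hash(rows, value):
    # output bit i is the GF(2) inner product of row mask i with `value`
    return sum(((bin(r & value).count("1") & 1) << i) for i, r in enumerate(rows))

a, b = 0b0101, 0b0011   # arbitrary distinct inputs
collisions = sum(
    1 for rows in product(range(1 << IN_BITS), repeat=OUT_BITS)
    if apply_hash(rows, a) == apply_hash(rows, b)
)
total = (1 << IN_BITS) ** OUT_BITS   # number of matrices in the family

# collision probability is exactly 2^{-OUT_BITS}
assert collisions * (1 << OUT_BITS) == total
```

The reason is that a collision on \(a\ne b\) means every row mask r satisfies \(\langle r, a\oplus b\rangle = 0\) over GF(2), which happens independently per row with probability exactly 1/2.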

Proof

(Proof of Claim 4.3.). Let f, t, n, \(\alpha \) and \(\mathsf{ColFinder}\) be as in Claim 4.3. Let \(\mathsf{A}\) be the following algorithm.

[Figure c: the algorithm \(\mathsf{A}\)]

We next show that with all but negligible probability over the choice of \(w=(h, x_1,\dots , x_t)\), the following must hold: for every \(w'=(h',x'_1,\dots , x'_t)\) with \(w\ne w'\) and \(C(w)=C(w')\), there exists some \(i \in [t-1]\) such that \(f(x_i)=f(x'_i)\) and \(f(x_{i+1})\ne f(x'_{i+1})\). The claim then follows easily.

Indeed, fix such w and \(w'\). First note that since \(C(w)=C(w')\), it holds that \(h=h'\). Let j be the first index for which \(x_j \ne x'_j\), and observe that by the definition of C, \(j \in [t-1]\). We split into cases:

  • If \(f(x_j)\ne f(x'_j)\), then \(j>1\) (since \(C(w)=C(w')\) implies that \(f(x_1)=f(x'_1)\)) and for \(i=j-1\) it holds that \(f(x_i)=f(x'_i)\) and \(f(x_{i+1})\ne f(x'_{i+1})\).

  • For the other case, assume that \(f(x_j)=f(x'_j)\). By Claim 4.4, with all but negligible probability over the choice of w it holds that \(h(x_j,f(x_{j+1})) \ne h(x'_j,f(x_{j+1}))\), and thus it must hold that \(f(x_{j+1})\ne f(x'_{j+1})\). We get that for \(i=j\), it holds that \(f(x_i)=f(x'_i)\) and \(f(x_{i+1})\ne f(x'_{i+1})\).

Since i is chosen uniformly by \(\mathsf{A}\), and since the distribution of \(h,z_1,\dots , z_t\) constructed by \(\mathsf{A}\) is uniform for every \(i\in [t-1]\) and uniformly chosen input \(h,x_1,x_2\), we conclude that the success probability of \(\mathsf{A}^{\mathsf{ColFinder}}\) is at least \((\alpha -neg(n))/t\).

4.2 Proving Lemma 4.2

We now prove Lemma 4.2. For the rest of this section, fix \(\beta \ge 0\) and a \(\beta \)-almost-regular one-way function f. In order to prove the lemma, we show how to invert f using any algorithm that contradicts the lemma. Formally,

Claim 4.6

There exists a PPT oracle-aided algorithm \(\mathsf{Inv}\) such that the following holds. Let \(n\in {\mathbb {N}}\), \(\alpha \in [0,1]\) and let \(\mathsf{A}\) be an algorithm such that

$$\begin{aligned} \Pr _{\begin{array}{c} h\leftarrow \mathcal {H}, x_1,x_2 \leftarrow \left\{ 0,1\right\} ^n,\\ (x'_1,x'_2)\leftarrow \mathsf{A}(h,x_1,x_2) \end{array}}\left[ \begin{array}{c} f(x_1)=f(x'_1)\wedge f(x_2)\ne f(x'_2) \wedge h(x_1,f(x_2)) = h(x'_1,f(x'_2)) \end{array} \right] =\alpha . \end{aligned}$$

Then,

$$\begin{aligned} \Pr _{x \leftarrow \left\{ 0,1\right\} ^n}\left[ \mathsf{Inv}^\mathsf{A}(f(x)) \in f^{-1}(f(x))\right] \ge \alpha ^2\cdot n^{-2\beta -2}\cdot 2^{-12}. \end{aligned}$$

The proof of Lemma 4.2 is immediate from Claim 4.6, as \(\Pr _{x \leftarrow \left\{ 0,1\right\} ^n}\left[ \mathsf{Inv}^\mathsf{A}(f(x)) \in f^{-1}(f(x))\right] \) must be negligible.

Proof

(Proof of Lemma 4.2.). Assume toward contradiction that there exist a PPT algorithm \(\mathsf{A}\) and \(p\in {\text {poly}}\) such that

$$\begin{aligned} \Pr _{\begin{array}{c} h\leftarrow \mathcal {H}, x_1,x_2 \leftarrow \left\{ 0,1\right\} ^n,\\ (x'_1,x'_2)\leftarrow \mathsf{A}(h,x_1,x_2) \end{array}}\left[ \begin{array}{c} f(x_1)=f(x'_1)\wedge \\ f(x_2)\ne f(x'_2) \wedge h(x_1,f(x_2)) = h(x'_1,f(x'_2)) \end{array} \right] \ge 1/p(n) \end{aligned}$$

for infinitely many \(n\in {\mathbb {N}}\). Then, by Claim 4.6 it holds that

$$\begin{aligned} \Pr _{x \leftarrow \left\{ 0,1\right\} ^n}\left[ \mathsf{Inv}^\mathsf{A}(f(x)) \in f^{-1}(f(x))\right] \ge 1/p(n)^2\cdot n^{-2\beta -2}\cdot 2^{-12} \end{aligned}$$

for infinitely many \(n\in {\mathbb {N}}\), which is a contradiction to f being a one-way function.

The rest of this part is dedicated to proving Claim 4.6. Let n, \(\alpha \) and \(\mathsf{A}\) be as in Claim 4.6. In the following we assume that \(\mathsf{A}\) outputs either a valid pair \((x'_1,x'_2)\), i.e., one with \(f(x_1)=f(x'_1)\wedge f(x_2)\ne f(x'_2) \wedge h(x_1,f(x_2)) = h(x'_1,f(x'_2))\), or \((\bot ,\bot )\). For \(x_1,x_2\) and h, we define,

$$\begin{aligned} {\mathcal {G}}_{h,x_1,x_2}:=\left\{ (x'_1,y) \in f^{-1}(f(x_1)) \times \mathsf{Im}(f) : h(x_1,f(x_2))=h(x'_1,y) \right\} . \end{aligned}$$

For ease of notation, we say that \(x \in {\mathcal {G}}_{h,x_1,x_2}\) if there exists \(y\in \mathsf{Im}(f)\) such that \((x,y)\in {\mathcal {G}}_{h,x_1,x_2}\). Let \(\mathsf{Inv}\) be the following algorithm. Note that \(\mathsf{Inv}\) can be implemented efficiently, by the constructibility of \(\mathcal {H}\).

[Figure d: the algorithm \(\mathsf{Inv}\)]

That is, in order to invert its input y, \(\mathsf{Inv}\) samples \(x_1,x_2\) and h. It then uses \(\mathsf{A}\) in order to find \(x'_1\) with \(f(x'_1)=f(x_1)\). Lastly, it samples \(h'\) with \(h'(x_1,f(x_2))=h'(x'_1,y)\) and uses \(\mathsf{A}\) in order to find a collision for \(h',x_1,x_2\). By the choice of \(h'\), a possible collision is \((x'_1,x'_2)\) for some \(x'_2\) with \(f(x'_2)=y\). We observe that if \(\mathsf{A}\) finds such a collision, then \(\mathsf{Inv}\) successfully inverts y.

For \(x_1,x_2 \in \left\{ 0,1\right\} ^n\), \(x'_1 \in f^{-1}(f(x_1))\) and \(y \in \mathsf{Im}(f)\), let

$$\begin{aligned}&p_\mathsf{A}(x_1,x_2,x'_1,y) \\ {}&:=\Pr _{ h' \leftarrow \mathcal {H}}\left[ \mathsf{A}(h',x_1,x_2)\in \left\{ x'_1\right\} \times f^{-1}(y) \mid h'(x_1,f(x_2))=h'(x'_1,y) \right] \\&= \Pr _{ h' \leftarrow \mathcal {H}}\left[ \mathsf{A}(h',x_1,x_2)\in \left\{ x'_1\right\} \times f^{-1}(y) \mid (x'_1,y)\in {\mathcal {G}}_{h',x_1,x_2}\right] \end{aligned}$$

and define \(p_\mathsf{A}(x_1,x_2,\bot ,y)=0\). By the above observation, it holds that

$$\begin{aligned}&\Pr _{x \leftarrow \left\{ 0,1\right\} ^n}\left[ \mathsf{Inv}^\mathsf{A}(f(x)) \in f^{-1}(f(x))\right] \ge \mathbb {E}_{\begin{array}{c} h \leftarrow \mathcal {H}, x_1,x_2 \leftarrow \left\{ 0,1\right\} ^n\\ y \leftarrow f(\left\{ 0,1\right\} ^n) \\ (x'_1,x'_2) \leftarrow \mathsf{A}(h,x_1,x_2) \end{array}}\left[ p_\mathsf{A}(x_1,x_2,x'_1,y)\right] \end{aligned}$$
(8)

and thus it is enough to bound the latter. We do so using the following two claims: the first shows that it suffices to bound the probability that \(\mathsf{A}\) outputs \((x'_1,\cdot )\), and the second bounds that probability.

Claim 4.8

For every \(x_1,x_2\in \left\{ 0,1\right\} ^n\) and \(x' \in f^{-1}(f(x_1))\) the following holds.

$$\begin{aligned}&\mathbb {E}_{y \leftarrow f(\left\{ 0,1\right\} ^n)}\left[ p_\mathsf{A}(x_1,x_2,x',y)\right] \\&\ge \Pr _{h' \leftarrow \mathcal {H}}\left[ \mathsf{A}(h',x_1,x_2)=(x',\cdot ) \mid x'\in {\mathcal {G}}_{h',x_1,x_2}\right] \cdot n^{-\beta -1}\cdot 2^{-10}. \end{aligned}$$

Proof

Fix \(x_1,x_2 \in \left\{ 0,1\right\} ^n\) and \(x' \in f^{-1}(f(x_1))\), and for every \(h\in \mathcal {H}\), let \(\mathsf{A}(h):=\mathsf{A}(h,x_1,x_2)\) and \({\mathcal {G}}_{h}:={\mathcal {G}}_{h,x_1,x_2}\). Then, by the definition of \(p_\mathsf{A}\), it holds that

$$\begin{aligned}&\mathbb {E}_{y \leftarrow f(\left\{ 0,1\right\} ^n)}\left[ p_\mathsf{A}(x_1,x_2,x',y)\right] \\&=\mathbb {E}_{y \leftarrow f(\left\{ 0,1\right\} ^n)}\left[ \Pr _{h' \leftarrow \mathcal {H}}\left[ \mathsf{A}(h')\in \left\{ x'\right\} \times f^{-1}(y) \mid (x',y)\in {\mathcal {G}}_{h'}\right] \right] \\&= \mathbb {E}_{y \leftarrow f(\left\{ 0,1\right\} ^n)}\left[ \frac{ \Pr _{h' \leftarrow \mathcal {H}}\left[ (x',y)\in {\mathcal {G}}_{h'}\wedge \mathsf{A}(h')\in \left\{ x'\right\} \times f^{-1}(y) \mid x'\in {\mathcal {G}}_{h'}\right] }{\Pr _{h' \leftarrow \mathcal {H}}\left[ (x',y)\in {\mathcal {G}}_{h'}\mid x'\in {\mathcal {G}}_{h'}\right] }\right] \\&= \mathbb {E}_{y \leftarrow f(\left\{ 0,1\right\} ^n)}\left[ \frac{ \Pr _{h' \leftarrow \mathcal {H}}\left[ \mathsf{A}(h')\in \left\{ x'\right\} \times f^{-1}(y) \mid x'\in {\mathcal {G}}_{h'}\right] }{\Pr _{h' \leftarrow \mathcal {H}}\left[ (x',y)\in {\mathcal {G}}_{h'}\mid x'\in {\mathcal {G}}_{h'}\right] }\right] \\&= \mathbb {E}_{y \leftarrow f(\left\{ 0,1\right\} ^n)}\left[ \Pr _{h' \leftarrow \mathcal {H}}\left[ \mathsf{A}(h')\in \left\{ x'\right\} \times f^{-1}(y) \mid x'\in {\mathcal {G}}_{h'}\right] \cdot \frac{\Pr _{h' \leftarrow \mathcal {H}}\left[ x'\in {\mathcal {G}}_{h'}\right] }{\Pr _{h' \leftarrow \mathcal {H}}\left[ (x',y)\in {\mathcal {G}}_{h'}\right] }\right] \\ \end{aligned}$$

Since by our assumption on \(\mathsf{A}\), for every \((x',y)\) with \(\Pr \left[ \mathsf{A}(h)\in \left\{ x'\right\} \times f^{-1}(y)\right] >0\) it holds that \((x',y) \ne (x_1,f(x_2))\), we get that for every such pair \(\Pr _{h' \leftarrow \mathcal {H}}\left[ (x',y)\in {\mathcal {G}}_{h'}\right] = n/2^n\). Continuing,

$$\begin{aligned}&\mathbb {E}_{y \leftarrow f(\left\{ 0,1\right\} ^n)}\left[ p_\mathsf{A}(x_1,x_2,x',y)\right] \\&= \sum _{y \in \mathsf{Im}(f)} \Pr _{x\leftarrow \left\{ 0,1\right\} ^n}\left[ f(x)=y\right] \cdot \Pr _{h' \leftarrow \mathcal {H}}\left[ \mathsf{A}(h')\in \left\{ x'\right\} \times f^{-1}(y) \mid x'\in {\mathcal {G}}_{h'}\right] \\ {}&~~~~~~~~~~~~~~~~~\cdot \frac{2^n}{ n} \cdot \Pr _{h' \leftarrow \mathcal {H}}\left[ x'\in {\mathcal {G}}_{h'}\right] \\&\ge \sum _{y \in \mathsf{Im}(f)}\frac{1}{\left| \mathsf{Im}(f)\right| \cdot n^\beta } \cdot \Pr _{h' \leftarrow \mathcal {H}}\left[ \mathsf{A}(h')\in \left\{ x'\right\} \times f^{-1}(y) \mid x'\in {\mathcal {G}}_{h'}\right] \\ {}&~~~~~~~~~~~~~~~~~ \cdot \frac{2^n}{ n} \cdot \Pr _{h' \leftarrow \mathcal {H}}\left[ x'\in {\mathcal {G}}_{h'}\right] \\&= \frac{1}{\left| \mathsf{Im}(f)\right| \cdot n^\beta }\cdot \frac{2^n}{ n} \cdot \Pr _{h' \leftarrow \mathcal {H}}\left[ x'\in {\mathcal {G}}_{h'}\right] \\ {}&~~~~~~~~~~~~~~~~~\cdot \sum _{y \in \mathsf{Im}(f)} \Pr _{h' \leftarrow \mathcal {H}}\left[ \mathsf{A}(h')\in \left\{ x'\right\} \times f^{-1}(y) \mid x'\in {\mathcal {G}}_{h'}\right] \\&= \frac{2^n}{\left| \mathsf{Im}(f)\right| \cdot n^{\beta +1}} \cdot \Pr _{h' \leftarrow \mathcal {H}}\left[ x'\in {\mathcal {G}}_{h'}\right] \cdot \Pr _{h' \leftarrow \mathcal {H}}\left[ \mathsf{A}(h')=(x',\cdot ) \mid x'\in {\mathcal {G}}_{h'}\right] \\ \end{aligned}$$

where the inequality holds since f is \(\beta \)-almost-regular. Recall that the family \(\mathcal {H}\) is approximately-flat. That is,

$$\begin{aligned}&\Pr _{h' \leftarrow \mathcal {H}}\left[ \exists y\in \mathsf{Im}(f) \text { s.t. } h'(x_1,f(x_2))= h'(x',y)\right] \\&\ge 2^{-10}\cdot \min \left\{ \left| \mathsf{Im}(f)\right| \cdot 2^{-(n-\log n)}, 1\right\} . \end{aligned}$$

Thus,

$$\begin{aligned}&\frac{2^n}{\left| \mathsf{Im}(f)\right| \cdot n^{\beta +1}} \cdot \Pr _{h' \leftarrow \mathcal {H}}\left[ x'\in {\mathcal {G}}_{h'}\right] \cdot \Pr _{h' \leftarrow \mathcal {H}}\left[ \mathsf{A}(h')=(x',\cdot ) \mid x'\in {\mathcal {G}}_{h'}\right] \\&\ge \frac{2^n}{\left| \mathsf{Im}(f)\right| \cdot n^{\beta +1}} \cdot 2^{-10}\cdot \min \left\{ \left| \mathsf{Im}(f)\right| \cdot 2^{-(n-\log n)}, 1\right\} \\ {}&~~~~~~~~~~~~~~~~~ \cdot \Pr _{h' \leftarrow \mathcal {H}}\left[ \mathsf{A}(h')=(x',\cdot ) \mid x'\in {\mathcal {G}}_{h'}\right] \\&\ge n^{-\beta -1}\cdot 2^{-10}\cdot \Pr _{h' \leftarrow \mathcal {H}}\left[ \mathsf{A}(h')=(x',\cdot ) \mid x'\in {\mathcal {G}}_{h'}\right] \end{aligned}$$

and the claim holds.

The next claim uses Lemma 2.16 in order to show that in a random execution of \(\mathsf{Inv}\), with good probability \(\mathsf{A}\) outputs the same element \(x'_1\) in Items 2 and 4.

Claim 4.9

For every \(x_1,x_2 \in \left\{ 0,1\right\} ^n\) the following holds. Let \(\alpha _{x_1,x_2}:=\Pr _{h \leftarrow \mathcal {H}}\left[ \mathsf{A}(h,x_1,x_2)\ne \bot \right] \). Then,

$$\begin{aligned}&\sum _{x'_1 \in f^{-1}(f(x_1))}\Pr _{h \leftarrow \mathcal {H}}\left[ \mathsf{A}(h,x_1,x_2)=(x'_1,\cdot )\right] \\&~~~~~~~~~~~~~~\cdot {\Pr _{\begin{array}{c} h' \leftarrow \mathcal {H} \end{array}}\left[ \mathsf{A}(h',x_1,x_2)=(x'_1,\cdot ) \mid x'_1 \in {\mathcal {G}}_{h',x_1,x_2}\right] }\\&\ge \alpha ^2_{x_1,x_2}\cdot n^{-\beta -1}/4. \end{aligned}$$

Proof

Fix \(x_1,x_2\in \left\{ 0,1\right\} ^n\), and let \(\alpha _{x_1,x_2}\) be as in Claim 4.9. Let \(\alpha _1 :=\Pr _{h \leftarrow \mathcal {H}}\left[ \mathsf{A}(h,x_1,x_2)=(x_1,\cdot ) \right] \) and let \(\alpha _2 :=\Pr _{h \leftarrow \mathcal {H}}\left[ \mathsf{A}(h,x_1,x_2)\notin \left\{ (x_1,\cdot ),\bot \right\} \right] \). Notice that \(\alpha _{x_1,x_2} = \alpha _1+\alpha _2\).

Define \(\widetilde{\mathsf{A}}(h)\) to be the algorithm that outputs the first coordinate of \(\mathsf{A}\)'s output (\(\mathsf{A}(h,x_1,x_2)_1\)) if it is different from \(x_1\), and \(\bot \) otherwise. Let \({\mathcal {G}}_{h}:={\mathcal {G}}_{h,x_1,x_2}\). Note that by the assumption on \(\mathsf{A}\), \(\widetilde{\mathsf{A}}\) always outputs an element of \(S(h)\cup \{\bot \}\), where \(S(h)=\left\{ x \in {\mathcal {G}}_{h,x_1,x_2}:x\ne x_1\right\} \). We get that \(\alpha _{2}=\Pr _{h \leftarrow \mathcal {H}}\left[ \widetilde{\mathsf{A}}(h)\ne \bot \right] \). Let \(\varOmega = f^{-1}(f(x_1))\setminus \left\{ x_1\right\} \). It holds that,

$$\begin{aligned}&\sum _{x'_1 \in f^{-1}(f(x_1))}\Pr _{h \leftarrow \mathcal {H}}\left[ \mathsf{A}(h,x_1,x_2)=(x'_1,\cdot )\right] \\&~~~~~~~~~~~~~~~\cdot {\Pr _{\begin{array}{c} h' \leftarrow \mathcal {H} \end{array}}\left[ \mathsf{A}(h',x_1,x_2)=(x'_1,\cdot ) \mid x'_1 \in {\mathcal {G}}_{h',x_1,x_2}\right] }\\&=\sum _{x'_1\in \varOmega }\Pr _{h \leftarrow \mathcal {H}}\left[ \mathsf{A}(h,x_1,x_2)=(x'_1,\cdot )\right] \cdot {\Pr _{\begin{array}{c} h' \leftarrow \mathcal {H} \end{array}}\left[ \mathsf{A}(h',x_1,x_2)=(x'_1,\cdot ) \mid x'_1 \in {\mathcal {G}}_{h',x_1,x_2}\right] }\\&~~~~~~+\Pr _{h \leftarrow \mathcal {H}}\left[ \mathsf{A}(h,x_1,x_2)=(x_1,\cdot )\right] \cdot {\Pr _{\begin{array}{c} h' \leftarrow \mathcal {H} \end{array}}\left[ \mathsf{A}(h',x_1,x_2)=(x_1,\cdot ) \mid x_1 \in {\mathcal {G}}_{h',x_1,x_2}\right] } \\&\begin{aligned} =\sum _{x'_1\in \varOmega }&\Pr _{h \leftarrow \mathcal {H}}\left[ \widetilde{\mathsf{A}}(h)=x'_1\right] \cdot {\Pr _{\begin{array}{c} h' \leftarrow \mathcal {H} \end{array}}\left[ \widetilde{\mathsf{A}}(h')=x'_1 \mid x'_1 \in {\mathcal {G}}_{h',x_1,x_2}\right] }\\&+\Pr _{h \leftarrow \mathcal {H}}\left[ \mathsf{A}(h,x_1,x_2)=(x_1,\cdot )\right] \cdot {\Pr _{\begin{array}{c} h' \leftarrow \mathcal {H} \end{array}}\left[ \mathsf{A}(h',x_1,x_2)=(x_1,\cdot ) \right] } \end{aligned}\\&=\sum _{x'_1\in \varOmega }\Pr _{h \leftarrow \mathcal {H}}\left[ \widetilde{\mathsf{A}}(h)=x'_1\right] \cdot \Pr _{\begin{array}{c} h' \leftarrow \mathcal {H} \end{array}}\left[ \widetilde{\mathsf{A}}(h')=x'_1 \mid x'_1 \in S(h')\right] +\alpha _1^2, \end{aligned}$$

where the second equality holds by the definition of \(\widetilde{\mathsf{A}}\) and since \(x_1\) is always a member of \({\mathcal {G}}_{h',x_1,x_2}\). We next show that

$$\begin{aligned} \sum _{x'_1\in \varOmega }\Pr _{h \leftarrow \mathcal {H}}\left[ \widetilde{\mathsf{A}}(h)=x'_1\right] \cdot \Pr _{h' \leftarrow \mathcal {H}}\left[ \widetilde{\mathsf{A}}(h')=x'_1 \mid x'_1 \in S(h')\right] \ge \alpha _2^2 \cdot n^{-\beta -1}. \end{aligned}$$
(9)

Indeed, assume that \(\varOmega \) is not empty, as otherwise the above holds trivially. We observe that for every \(x \in \varOmega \),

$$\begin{aligned} 0<\Pr _{h'\leftarrow \mathcal {H}}\left[ x \in S(h')\right] \le \left| \mathsf{Im}(f) ~\right| \cdot n/2^n \le n^{\beta +1}/\left| f^{-1}(f(x))\right| \le n^{\beta +1}/\left| \varOmega \right| . \end{aligned}$$
(10)

Thus, we can apply Lemma 2.16 with \(\mathcal {X}=\mathcal {H}\) to obtain Eq. (9).
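To see the role of Lemma 2.16 here (we sketch the presumed argument; the precise statement appears in the preliminaries), note that \(\widetilde{\mathsf{A}}(h')=x'\) implies \(x'\in S(h')\), so Eq. (10) gives \(\Pr _{h' \leftarrow \mathcal {H}}\left[ \widetilde{\mathsf{A}}(h')=x' \mid x'\in S(h')\right] \ge \Pr _{h' \leftarrow \mathcal {H}}\left[ \widetilde{\mathsf{A}}(h')=x'\right] \cdot \left| \varOmega \right| \cdot n^{-\beta -1}\). Hence,

$$\begin{aligned} \sum _{x'\in \varOmega }\Pr _{h \leftarrow \mathcal {H}}\left[ \widetilde{\mathsf{A}}(h)=x'\right] \cdot \Pr _{h' \leftarrow \mathcal {H}}\left[ \widetilde{\mathsf{A}}(h')=x' \mid x' \in S(h')\right]&\ge \frac{\left| \varOmega \right| }{n^{\beta +1}}\cdot \sum _{x'\in \varOmega }\Pr _{h \leftarrow \mathcal {H}}\left[ \widetilde{\mathsf{A}}(h)=x'\right] ^2\\&\ge \frac{\left| \varOmega \right| }{n^{\beta +1}}\cdot \frac{\alpha _2^2}{\left| \varOmega \right| }=\alpha _2^2\cdot n^{-\beta -1}, \end{aligned}$$

where the last inequality is Cauchy–Schwarz, using that \(\widetilde{\mathsf{A}}\)’s non-\(\bot \) outputs lie in \(\varOmega \) (a valid collision with \(x_1\) is a preimage of \(f(x_1)\)) and thus \(\sum _{x'\in \varOmega }\Pr _{h \leftarrow \mathcal {H}}\left[ \widetilde{\mathsf{A}}(h)=x'\right] =\alpha _2\).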

Combining the above, we conclude that

$$\begin{aligned}&\sum _{x'_1\in f^{-1}(f(x_1))}\Pr _{h \leftarrow \mathcal {H}}\left[ \mathsf{A}(h,x_1,x_2)=(x'_1,\cdot )\right] \\&~~~~~~~~~~~~~~~~~\cdot {\Pr _{h' \leftarrow \mathcal {H}}\left[ \mathsf{A}(h',x_1,x_2)=(x'_1,\cdot ) \mid x'_1 \in {\mathcal {G}}_{h',x_1,x_2}\right] }\\&\ge \alpha _2^2 \cdot n^{-\beta -1} + \alpha _1^2. \end{aligned}$$

The claim follows since either \(\alpha _1\) or \(\alpha _2\) is at least \(\alpha _{x_1,x_2}/2\).
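Unpacking this final step: if \(\alpha _1 \ge \alpha _{x_1,x_2}/2\), then \(\alpha _1^2 \ge \alpha _{x_1,x_2}^2/4 \ge (\alpha _{x_1,x_2}^2/4)\cdot n^{-\beta -1}\) (using \(n^{-\beta -1}\le 1\)); if instead \(\alpha _2 \ge \alpha _{x_1,x_2}/2\), then \(\alpha _2^2 \cdot n^{-\beta -1} \ge (\alpha _{x_1,x_2}^2/4)\cdot n^{-\beta -1}\). In either case,

$$\begin{aligned} \alpha _2^2 \cdot n^{-\beta -1} + \alpha _1^2 \ge \frac{\alpha _{x_1,x_2}^2}{4}\cdot n^{-\beta -1}, \end{aligned}$$

which is where the factor \(2^{-2}\) absorbed into the \(2^{-12}\) below originates.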

We are now ready to prove Claim 4.6.

Proof

(Proof of Claim 4.6). For fixed \(x_1\) and \(x_2\), let \(\alpha _{x_1,x_2}\) be as in Claim 4.9. We start by showing that

$$\begin{aligned} \Pr _{x \leftarrow \left\{ 0,1\right\} ^n}\left[ \mathsf{Inv}^\mathsf{A}(f(x)) \in f^{-1}(f(x))\right] \ge \mathop {{\mathrm{E}}}_{x_1,x_2 \leftarrow \left\{ 0,1\right\} ^n}\left[ \alpha ^2_{x_1,x_2}\right] \cdot n^{-2\beta -2}\cdot 2^{-12}. \end{aligned}$$
(11)

Indeed, by Eq. (8),

$$\begin{aligned}&\Pr _{x \leftarrow \left\{ 0,1\right\} ^n}\left[ \mathsf{Inv}^\mathsf{A}(f(x)) \in f^{-1}(f(x))\right] \ge \mathop {{\mathrm{E}}}_{\begin{array}{c} h \leftarrow \mathcal {H}, x_1,x_2 \leftarrow \left\{ 0,1\right\} ^n\\ y \leftarrow f(\left\{ 0,1\right\} ^n) \\ (x'_1,x'_2) \leftarrow \mathsf{A}(h,x_1,x_2) \end{array}}\left[ p_\mathsf{A}(x_1,x_2,x'_1,y)\right] \\&=\mathop {{\mathrm{E}}}_{x_1,x_2 \leftarrow \left\{ 0,1\right\} ^n}\left[ \mathop {{\mathrm{E}}}_{\begin{array}{c} h \leftarrow \mathcal {H}, y \leftarrow f(\left\{ 0,1\right\} ^n) \\ (x'_1,x'_2) \leftarrow \mathsf{A}(h,x_1,x_2) \end{array}}\left[ p_\mathsf{A}(x_1,x_2,x'_1,y)\right] \right] , \end{aligned}$$

and thus it is enough to show that for every fixed \(x_1,x_2\in \left\{ 0,1\right\} ^n\),

$$\begin{aligned} \mathop {{\mathrm{E}}}_{\begin{array}{c} h \leftarrow \mathcal {H}, y \leftarrow f(\left\{ 0,1\right\} ^n) \\ (x'_1,x'_2) \leftarrow \mathsf{A}(h,x_1,x_2) \end{array}}\left[ p_\mathsf{A}(x_1,x_2,x'_1,y)\right] \ge \alpha ^2_{x_1,x_2}\cdot n^{-2\beta -2}\cdot 2^{-12}. \end{aligned}$$

Recall that, by definition, \(p_\mathsf{A}(x_1,x_2,\bot ,y) =0\). Therefore,

$$\begin{aligned}&\mathop {{\mathrm{E}}}_{\begin{array}{c} h \leftarrow \mathcal {H}, y \leftarrow f(\left\{ 0,1\right\} ^n) \\ (x'_1,x'_2) \leftarrow \mathsf{A}(h,x_1,x_2) \end{array}}\left[ p_\mathsf{A}(x_1,x_2,x'_1,y)\right] \\&=\sum _{x'_1\in f^{-1}(f(x_1))}\Pr _{h\leftarrow \mathcal {H}}\left[ \mathsf{A}(h,x_1,x_2)=(x'_1,\cdot )\right] \cdot \mathop {{\mathrm{E}}}_{ y \leftarrow f(\left\{ 0,1\right\} ^n)}\left[ p_\mathsf{A}(x_1,x_2,x'_1,y)\right] \\&\ge \sum _{x'_1\in f^{-1}(f(x_1))}\Pr _{h\leftarrow \mathcal {H}}\left[ \mathsf{A}(h,x_1,x_2)=(x'_1,\cdot )\right] \\&~~~~~~~~~~\cdot \Pr _{h' \leftarrow \mathcal {H}}\left[ \mathsf{A}(h',x_1,x_2)=(x'_1,\cdot ) \mid x'_1\in {\mathcal {G}}_{h',x_1,x_2}\right] \cdot n^{-\beta -1}\cdot 2^{-10}\\&\ge \alpha ^2_{x_1,x_2}\cdot n^{-2\beta -2}\cdot 2^{-12}. \end{aligned}$$

The equality holds by the assumption that \(\mathsf{A}\) always outputs either a valid collision or \(\bot \); the first inequality holds by Claim 4.8, and the second by Claim 4.9.

It remains to lower-bound \(\mathop {{\mathrm{E}}}_{x_1,x_2 \leftarrow \left\{ 0,1\right\} ^n}\left[ \alpha ^2_{x_1,x_2}\right] \). Observe that, by definition, \(\mathop {{\mathrm{E}}}_{x_1,x_2 \leftarrow \left\{ 0,1\right\} ^n}\left[ \alpha _{x_1,x_2}\right] = \alpha \), and thus, by Jensen's inequality, \(\mathop {{\mathrm{E}}}_{x_1,x_2 \leftarrow \left\{ 0,1\right\} ^n}\left[ \alpha ^2_{x_1,x_2}\right] \ge \alpha ^2\), which concludes the proof.
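Spelled out, plugging the Jensen bound \(\mathop {{\mathrm{E}}}_{x_1,x_2 \leftarrow \left\{ 0,1\right\} ^n}\left[ \alpha ^2_{x_1,x_2}\right] \ge \alpha ^2\) into Eq. (11) yields the final inversion probability:

$$\begin{aligned} \Pr _{x \leftarrow \left\{ 0,1\right\} ^n}\left[ \mathsf{Inv}^\mathsf{A}(f(x)) \in f^{-1}(f(x))\right] \ge \alpha ^2 \cdot n^{-2\beta -2}\cdot 2^{-12}. \end{aligned}$$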