Keywords

1 Introduction

Quantum walks are a powerful algorithmic tool which has been used to provide state-of-the-art algorithms for various important problems in post-quantum cryptography, such as the shortest vector problem (SVP) via lattice sieving [9], the subset sum problem [4], information set decoding [22], etc.

These applications are all established under a particular quantum walk framework called the MNRS framework [26], and the quantum walks look for marked nodes in a so-called Johnson graph [22] (or a product of Johnson graphs). When walking on this particular graph, the MNRS framework is somewhat rigid. First, it requires setting up the uniform superposition of all nodes along with their attached data structures; then it applies, multiple times, reflection operators that move this quantum state close to the uniform superposition of all marked nodes.

Due to this rigidity, previously, the best way to find k different marked nodes was to run the whole quantum walk (including the setup) k times. In [9] the authors noticed that a way to output multiple solutions instead of a single one with quantum walks would improve the quantum time complexity of their algorithm for solving the SVP.

A natural observation which guides us throughout this paper is that in certain cases, after obtaining the uniform superposition of all marked nodes via the MNRS quantum walk, it is possible to retrieve part of the solution and start another MNRS quantum walk using the remaining part of the quantum state as the new starting state. By doing so, we avoid repeating the setup cost for each new quantum walk, and we now benefit from a trade-off.

In particular, using this observation, we tackle the following problem:

Problem 1

(Multiple collision search). Let \(f : \{0,1\}^n \rightarrow \{0,1\}^m\), \(n \le m \le 2n\) be a random function. Let \(k \le 2n-m\). Find \(2^k\) collision pairs, that is, pairs of distinct inputs x, y such that \(f(x) = f(y)\).

The constraints on the input and output domain are such that a significant (\(\varTheta \left( 2^{2n-m}\right) \)) number of collision pairs exist in the random case. This problem has several applications in both asymmetric and symmetric cryptography. For example, the problem of finding multiple vectors close to a target vector, which appears in lattice sieving (as mentioned above), can be seen as a special case. The limited-birthday problem in symmetric cryptanalysis (e.g., impossible differential attacks and rebound distinguishers [16]) is another example.

Lower Bounds. While quantum query lower bounds for the collision problem (with a single solution) have been known for some time, Liu and Zhandry more recently proved in [25] a query lower bound of \(\Omega \left( 2^{2k/3 + m/3} \right) \) for finding \(2^k\) solutions, which holds for all values of \(m \ge n\).

For relatively small values of k and m (precisely, \(k \le 3n -2m\), as we explain in Sect. 6), the BHT collision search algorithm [8] reaches this bound. Besides this algorithm, Ambainis’ algorithm [2] uses a quantum walk to find one collision in time \( \widetilde{\mathcal {O}} \left( 2^{m/3} \right) \). However, no matching algorithm was known for other values, neither in time nor in queries.

Contributions. Our main contribution in this paper is a chained quantum walk algorithm to solve the multiple collision search problem. We formalize the intuitive idea that the output state of a quantum walk can be reused, to some extent, as the starting state of another. For any admissible values of k, n, m such that \(k \le \frac{m}{4}\), our algorithm requires \( \mathcal {O} \left( 2^{\frac{2}{3}k+\frac{m}{3}} \right) \) queries, and also \( \widetilde{\mathcal {O}} \left( 2^{\frac{2}{3}k+\frac{m}{3}} \right) \) quantum gates (i.e., time) and space in the qRAM model.

Theorem 4 (Sect. 4). Let \(f : \{0,1\}^n \rightarrow \{0,1\}^m\), \(n \le m \le 2n\) be a random function. Let \(k \le \min (2n-m, m/4)\). There exists a quantum algorithm making \( \mathcal {O} \left( 2^{2k/3 + m/3} \right) \) quantum queries to f and with a gate count \( \widetilde{\mathcal {O}} \left( 2^{2k/3 + m/3} \right) \), that outputs \(2^k\) collision pairs of f.

By combining our algorithm with the BHT approach, we can now meet the lower bound over all values of k, n, m, except for a range of (k, m) contained in \(\left[ \frac{n}{3}, n\right] \times \left[ n , 1.6 n\right] \), as summarized in Fig. 1. Nevertheless, our approach also improves the known complexities in this range.

Fig. 1. Gate count exponent of the algorithm depending on the relative values of k, m and n. Both our algorithm and the BHT approach can be extended to the whole triangle, but we show only the one achieving the best complexity. In the purple region (bottom left), both approaches reach the same complexity exponent \(\frac{2k}{3} + \frac{m}{3}\).

Using our new algorithm, we improve the state-of-the-art time complexity of quantum sieving to solve the SVP in [9] from \(2^{0.2570d + o(d)}\) to \(2^{0.2563d + o(d)}\) quantum gates. We also provide time-memory trade-offs that are conjectured to be tight [15]:

Theorem 7 (Sect. 4). Let \(f : \{0,1\}^n \rightarrow \{0,1\}^m\), \(n \le m \le 2n\) be a random function. For all \(k \le \ell \le \max (2n-m, m/2)\), there exists an algorithm that computes \(2^k\) collisions using \( \widetilde{\mathcal {O}} \left( 2^\ell \right) \) qubits and \( \widetilde{\mathcal {O}} \left( 2^{k+m/2-\ell /2} \right) \) quantum gates and quantum queries to f.

Organization. In Sect. 2 we provide several technical preliminaries on quantum algorithms, especially Grover’s quantum search algorithm. Indeed, an MNRS quantum walk actually emulates a quantum search, and these results are helpful in analyzing the behavior of such a walk. In Sect. 3, we give important details on the MNRS framework, and in particular, the vertex-coin encoding, which is a subtlety often omitted from depictions of the framework in the previous literature. In Sect. 4 we detail our algorithm assuming a suitable quantum data structure is given, and in Sect. 5 we detail the quantum radix trees. While they were already proposed in [21], we give new (or previously omitted) details relative to the radix tree operations, memory allocation, and how we can efficiently and robustly extract collisions. We give a general summary of the multiple collision search problem in Sect. 6 and our applications in Sect. 7.

2 Preliminaries

In this section, we give some preliminaries on collision search, quantum algorithms and Grover search, which are important for the analysis of quantum walks and their data structures.

2.1 Collision Search

In this paper, we study the problem of collision search in random functions.

Problem 2

Let \(f~: \{0,1\}^n \rightarrow \{0,1\}^m\) (\(n \le m\)) be a random function. Find a collision of f, that is, a pair \((x,y), x \ne y\) such that \(f(x) = f(y)\).

The case \(m < n\) can be solved by the same algorithms as the case \(m = n\) by restricting f to a subset of its domain. This is why in the following, we focus only on \(m \ge n\). The average number of collisions is \(\varTheta \left( 2^{2n-m} \right) \). When \(m \ge 2n\), we can assume that exactly one collision exists, or none. Distinguishing between these two cases is the problem of element distinctness, which is solved by searching for the collision. In all cases, the collision problem can be solved in:

  • \(\varTheta (2^{m/2})\) classical time (and queries to f). When \(m = n\), the problem is the easiest, as it requires only \( \mathcal {O} \left( 2^{n/2} \right) \) time and \(\textsf{poly}(n)\) memory using Pollard’s rho method. When \(m = 2n\), the problem is harder since the best algorithm also uses \(\varTheta (2^n)\) memory.

  • \(\varTheta (2^{m/3})\) quantum time (and quantum queries to f). A first algorithm was given by Brassard, Høyer and Tapp to reach this for \(m = n\) [8], then the lower bound was proven to be \(\Omega (2^{m/3})\) [1], and afterwards, Ambainis solved the element distinctness problem (the case \(m = 2n\)) by a quantum walk algorithm [2] which can be adapted for any value of m.
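For \(m = n\), Pollard's rho method mentioned above admits a short implementation: iterate f from a random start, detect the cycle of the iteration with Floyd's algorithm, and return the two distinct predecessors of the cycle entry point. The following is only a sketch, using a hypothetical toy function:

```python
import random

# Sketch of Pollard's rho method for m = n (poly(n) memory): iterate f from
# a random start, detect the cycle with Floyd's algorithm, then locate the
# cycle entry point; its two distinct predecessors form a collision.
def rho_collision(f, n, seed=0):
    rng = random.Random(seed)
    while True:
        x0 = rng.randrange(2 ** n)
        tortoise, hare = f(x0), f(f(x0))           # Phase 1: Floyd meeting
        while tortoise != hare:
            tortoise, hare = f(tortoise), f(f(hare))
        tortoise, prev_t, prev_h = x0, None, hare  # Phase 2: find cycle entry
        while tortoise != hare:
            prev_t, prev_h = tortoise, hare
            tortoise, hare = f(tortoise), f(hare)
        if prev_t is None:
            continue           # start was already on the cycle: retry
        return prev_t, prev_h  # f(prev_t) == f(prev_h), prev_t != prev_h

# toy quadratic map on 16 bits (hypothetical example function)
f = lambda x: (x * x + 1) % (2 ** 16)
a, b = rho_collision(f, 16, seed=1)
```

The expected number of f-evaluations before the cycle closes is \( \mathcal {O} \left( 2^{n/2} \right) \), and only a constant number of points is stored at any time.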

In our case, we want to solve the problem of multiple collision search: since we expect many collisions among the outputs of f, we want to find a significant (exponential in n) number of them.

Problem 3

Let \(f : \{0,1\}^n \rightarrow \{0,1\}^m\), \(n \le m \le 2n\), \(k \le 2n-m\). Find \(2^k\) distinct collisions of f.
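The constraint \(k \le 2n-m\) matches the number of available collisions: a random f has about \(2^{2n-m-1}\) collision pairs, which can be checked empirically on a toy instance (hypothetical parameters, illustration only):

```python
import random

# Empirical check (toy parameters) that a random f : {0,1}^n -> {0,1}^m
# has about 2^(2n-m-1) collision pairs, i.e. Theta(2^(2n-m)) as used above.
def count_collision_pairs(n, m, seed=0):
    rng = random.Random(seed)
    f = [rng.randrange(2 ** m) for _ in range(2 ** n)]   # random function table
    buckets = {}
    for x in range(2 ** n):
        buckets.setdefault(f[x], []).append(x)
    # an image with t preimages contributes t*(t-1)/2 collision pairs
    return sum(t * (t - 1) // 2 for t in map(len, buckets.values()))

pairs = count_collision_pairs(n=12, m=16)   # expected around 2^7 = 128
```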

Here the state of the art differs between the classical and quantum settings:

  • Classically, it is well known that the problem can be solved for any m and k in \(\varTheta (2^{(k+m)/2})\) queries (as long as \(2^k\) does not exceed the average number of collisions of f).

  • Quantumly, Liu and Zhandry [25] gave a query lower bound \(\Omega (2^{2k/3 + m/3})\). However, a matching algorithm is only known for small m. For example, this lower bound is matched for \(m = n\) by adapting the BHT algorithm [17, 25].

Note that we assume that the collision pairs are fully distinct. In the case \(m < n, k \ge m\), there are not enough distinct images, and we only obtain multicollision tuples. The lower bound of [25] does not apply here. If \(m < n\) and \(k \le m\), we restrict the inputs of the function to a set of size \(2^m\), and this case is covered by a variant of the BHT algorithm. Thus, like in the case of a single collision, we will only consider \(n \le m\).

On the Memory Complexity. For \(m = n\), the best known classical algorithm for multiple collision-finding is the parallel collision search (PCS) algorithm by van Oorschot and Wiener [28]. It generalizes Pollard’s rho method which finds a single collision in \( \mathcal {O} \left( 2^{n/2} \right) \) time and \(\textsf{poly}(n)\) memory. Dinur [12] showed that in this regime, the time-space trade-off of the PCS algorithm is optimal. Using a restricted model of computation, it can also be shown optimal for larger values of m.

Quantumly, a time-space lower bound of \(T^3S \ge \Omega \left( 2^{3k+m} \right) \) has been shown [15]. However, the authors conjecture this bound can be improved to \(T^2S \ge \Omega \left( 2^{2k+m} \right) \). All known quantum algorithms for collisions, including our new algorithms, match this conjectured lower bound.

2.2 Quantum Algorithms

We refer to [27] for an introduction to quantum computation. We write our quantum algorithms in the standard quantum circuit model, where algorithms are written as a sequence of standard quantum gates. We are interested in the minimal achievable gate count. This means that we do not consider any parallelization trade-offs, even though there is some literature on the topic for SVP algorithms [24]. By default, we use the (universal) Clifford+T gate set, although our complexity analysis remains asymptotic, and we do not detail our algorithms at the gate level.

Memory Models. Many memory-intensive quantum algorithms require some kind of quantum random-access model (qRAM), which can be stronger than the standard quantum circuit model. One can encounter two types of qRAM:

  • Classical memory with quantum random access (QRACM): a classical memory of size M can be addressed in quantum superposition in \(\textsf{polylog}(M)\) operations.

  • Quantum memory with quantum random access (QRAQM): M qubits can be addressed in quantum superposition in \(\textsf{polylog}(M)\) operations.

The QRAQM model is required by most quantum-walk-based algorithms for cryptographic problems, e.g., subset-sum [3, 4], information set decoding [22] and the most recent quantum algorithm for lattice sieving [9]. It requires augmenting the available gate set with a “qRAM” gate addressing all M memory cells (e.g., individual bits) in superposition. In this paper, we use a definition taken from [2]:

$$\begin{aligned} \textsf{qRAM}~: \left| i \right\rangle \left| b \right\rangle \left| x_1, \ldots , x_M \right\rangle \mapsto \left| i \right\rangle \left| x_i \right\rangle \left| x_1, \ldots , x_{i-1}, b, x_{i+1}, \ldots , x_M \right\rangle \end{aligned}$$
(1)

This operation implies the ability to read in superposition by querying the cell at index i, but also to write. This is necessary for efficient data structures such as the ones studied in [2] or the quantum radix trees from the literature (see Sect. 5).

While the \(\textsf{qRAM}\) gate can be simulated with \( \widetilde{\mathcal {O}} \left( M \right) \) Clifford+T gates, in the following, the gate count of our algorithms is given asymptotically on the “Clifford + T + qRAM” gate set, so we assume the qRAM has unit cost, as is required by previous works.
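The qRAM gate of [2] swaps the data register with the addressed cell, which is what gives both read and write access. A minimal classical model of its action on computational basis states (illustration only; the actual gate acts linearly on superpositions):

```python
# Classical model of the qRAM gate's action on computational basis states:
# the data register b is swapped with the addressed cell x_i, which gives
# both read (load x_i into b) and write (store b into cell i) access.
def qram_gate(i, b, cells):
    cells = list(cells)
    cells[i], b = b, cells[i]
    return i, b, cells

# reading cell 2 (with b = 0) while writing 0 back into it
i, b, cells = qram_gate(2, 0, [5, 7, 9, 11])
# b == 9, cells == [5, 7, 0, 11]
```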

Collision Finding Without qRAM. To date, the best quantum algorithms for collision finding, and the ones that reach the query lower bound, require the qRAM model: the BHT algorithm [8] uses QRACM and Ambainis’ quantum walk uses QRAQM [2] to define gate-efficient quantum data structures. Initially Ambainis used a skip list. We will focus on the more recent quantum radix tree, but the QRAQM requirement remains the same.

To some extent, it is possible to get rid of qRAM. For \(m = n\), the complexity rises from \( \mathcal {O} \left( 2^{m/3} \right) \) to \( \mathcal {O} \left( 2^{2m/5} \right) \) gates [10]. For \(m = 2n\), the complexity rises to \( \mathcal {O} \left( 2^{3m/7} \right) \) [20]. These algorithms can also be adapted for multiple collision finding, where they will outperform the classical ones for some parameter ranges (but not all).

2.3 Grover’s Algorithm

In this section, we recall Grover’s quantum search algorithm [14] and give a few necessary results for the rest of our analysis. Indeed, as shown in [26], an MNRS quantum walk actually emulates a quantum search, up to some error. If we manage to put this error aside, the analysis of the walk follows from the following lemmas.

Original Quantum Search. In the original setting of Grover’s search, we have a function \(g : \{0,1\}^{n} \rightarrow \{0,1\}\) and the goal is to find x such that \(g(x) = 1\) using queries to g. In the quantum setting, we have access to the unitary \(O_g : \left| x \right\rangle \left| b \right\rangle \rightarrow \left| x \right\rangle \left| b \oplus g(x) \right\rangle \), which is an efficient quantum unitary if g is efficiently computable. In particular, we can compute \(\left| \psi _U \right\rangle = \frac{1}{\sqrt{2^n}}\sum _{x \in \{0,1\}^n} \left| x \right\rangle \left| g(x) \right\rangle \) with a single call to \(O_g\). Let \(\varepsilon = \frac{|\{x : g(x) = 1\}|}{2^n}\). We also define the normalized states

$$ \left| \psi _{B} \right\rangle = \frac{1}{\sqrt{(1-\varepsilon )2^n}}\sum _{x: g(x) = 0 } \left| x \right\rangle \left| g(x) \right\rangle , \quad \left| \psi _{G} \right\rangle = \frac{1}{\sqrt{\varepsilon 2^n}}\sum _{x: g(x) = 1 } \left| x \right\rangle \left| g(x) \right\rangle $$

and \(\left| \psi _U \right\rangle = \sqrt{1-\varepsilon }\left| \psi _{B} \right\rangle + \sqrt{\varepsilon }\left| \psi _{G} \right\rangle \). Let \(\mathcal {H}= \textrm{span}(\{\left| \psi _B \right\rangle ,\left| \psi _G \right\rangle \})\). Let \(\textrm{Rot}_{\theta }\) be the \(\theta \)-rotation unitary in \(\mathcal {H}\):

$$\begin{aligned} \textrm{Rot}_\theta (\cos (\alpha )\left| \psi _B \right\rangle + \sin (\alpha )\left| \psi _G \right\rangle ) = \cos (\alpha + \theta )\left| \psi _B \right\rangle + \sin (\alpha + \theta )\left| \psi _G \right\rangle . \end{aligned}$$

For a fixed \(\varepsilon \), let \(\alpha = \arcsin (\sqrt{\varepsilon })\) so that

$$\left| \psi _U \right\rangle = \sqrt{1-\varepsilon }\left| \psi _B \right\rangle + \sqrt{\varepsilon }\left| \psi _G \right\rangle = \cos (\alpha )\left| \psi _B \right\rangle + \sin (\alpha )\left| \psi _G \right\rangle .$$

For a state \(\left| \psi \right\rangle \in \mathcal {H}\), let \(\textrm{Ref}_{\left| \psi \right\rangle }\) be the reflection over \(\left| \psi \right\rangle \) in \(\mathcal {H}\):

$$\begin{aligned} Ref_{\left| \psi \right\rangle } \left| \psi \right\rangle = \left| \psi \right\rangle \quad \text {and} \quad Ref_{\left| \psi \right\rangle } \left| \psi ^\bot \right\rangle = - \left| \psi ^\bot \right\rangle \end{aligned}$$

where \(\left| \psi ^\bot \right\rangle \) is any state in \(\mathcal {H}\) orthogonal to \(\left| \psi \right\rangle \). We have

$$\begin{aligned} \textrm{Ref}_{\left| \psi _U \right\rangle }\textrm{Ref}_{\left| \psi _{B} \right\rangle } = \textrm{Rot}_{2\alpha } . \end{aligned}$$

Assume that we have access to a checking oracle \(O_{\textrm{check}}\) which performs:

$$\begin{aligned} {\left\{ \begin{array}{ll} O_{\textrm{check}} \left| \psi _{B} \right\rangle \left| 0 \right\rangle &{} = \left| \psi _{B} \right\rangle \left| 0 \right\rangle \\ O_{\textrm{check}} \left| \psi _{G} \right\rangle \left| 0 \right\rangle &{} = \left| \psi _{G} \right\rangle \left| 1 \right\rangle \end{array}\right. } \end{aligned}$$

In the standard setting described above, this is just copying the last register. Starting from an “initial state” \(\left| \psi _U \right\rangle \), we repeatedly apply an iterate consisting of a reflection over \(\left| \psi _U \right\rangle \) and a reflection over \(\left| \psi _B \right\rangle \). This progressively transforms the current state into the “good state” \(\left| \psi _G \right\rangle \). Typically \(\textrm{Ref}_{\left| \psi _U \right\rangle }\) is constructed from a circuit that computes \(\left| \psi _U \right\rangle \) and \(\textrm{Ref}_{\left| \psi _{B} \right\rangle }\) is implemented using the checking oracle above: in that case, we are actually performing an amplitude amplification [7].

Proposition 1

(Grover’s algorithm, known \(\alpha \)). Consider the following algorithm, with \(\alpha \le \pi /4\):

  1. Start from \(\left| \psi _U \right\rangle \).

  2. Apply \(\textrm{Rot}_{2\alpha } = \textrm{Ref}_{\left| \psi _U \right\rangle }\textrm{Ref}_{\left| \psi _{B} \right\rangle }\) N times on \(\left| \psi _U \right\rangle \) with \(N = \lfloor \frac{\pi /2 - \alpha }{2\alpha }\rfloor \).

  3. Apply \(O_{\textrm{check}}\) and measure the last qubit.

This procedure measures 1 with probability at least \(1 - 4\alpha ^2\) and the resulting state is \(\left| \psi _{G} \right\rangle \).

Proof

Let us define \(\gamma = \alpha + 2N\alpha \). We have

$$(\textrm{Rot}_{2\alpha })^N \left| \psi _U \right\rangle = \cos (\alpha + 2N\alpha )\left| \psi _B \right\rangle + \sin (\alpha + 2N\alpha )\left| \psi _G \right\rangle = \cos (\gamma )\left| \psi _B \right\rangle + \sin (\gamma )\left| \psi _G \right\rangle .$$

Notice that we chose N such that \(\gamma \le \frac{\pi }{2} < \gamma + 2\alpha \), so \(\frac{\pi }{2} - \gamma \in [0,2\alpha )\). After applying the checking oracle, we obtain the state

$$ \cos (\gamma )\left| \psi _B \right\rangle \left| 0 \right\rangle + \sin (\gamma )\left| \psi _G \right\rangle \left| 1 \right\rangle .$$

Measuring the last qubit gives outcome 1 with probability \(\sin ^2(\gamma )\) and the resulting state in the first register is \(\left| \psi _G \right\rangle \). In order to conclude, we compute

$$ \sin ^2(\gamma ) = \cos ^2(\pi /2 - \gamma ) \ge \cos ^2(2\alpha ) \ge 1 - 4\alpha ^2. \qquad \Box $$
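The bound of Proposition 1 is easy to check numerically by tracking the rotation angle directly (a sketch, not part of the proof):

```python
import math

# Numerical check of Proposition 1: after N rotations by 2*alpha starting
# from angle alpha, the final angle gamma satisfies pi/2 - gamma < 2*alpha,
# hence the success probability sin^2(gamma) is at least 1 - 4*alpha^2.
def grover_success_probability(eps):
    alpha = math.asin(math.sqrt(eps))
    N = math.floor((math.pi / 2 - alpha) / (2 * alpha))
    gamma = alpha + 2 * N * alpha
    return math.sin(gamma) ** 2, alpha

for eps in [0.5, 0.1, 0.01, 1e-4]:
    p, alpha = grover_success_probability(eps)
    assert p >= 1 - 4 * alpha ** 2
```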

In our algorithms, we will start not from the uniform superposition \(\left| \psi _U \right\rangle \), but from the bad subspace \(\left| \psi _B \right\rangle \). We show that this makes little difference.

Proposition 2

(Starting from \(\left| \psi _B \right\rangle \) , known \(\alpha \)). Consider the following algorithm, with \(\alpha \le \pi /4\):

  1. Start from \(\left| \psi _B \right\rangle \).

  2. Apply \(\textrm{Rot}_{2\alpha } = \textrm{Ref}_{\left| \psi _U \right\rangle }\textrm{Ref}_{\left| \psi _{B} \right\rangle }\) \(N'\) times on \(\left| \psi _B \right\rangle \) with \(N' = \lfloor \frac{\pi /2}{2\alpha }\rfloor \).

  3. Apply the checking oracle and measure the last qubit.

This procedure measures 1 with probability at least \(1 - 4\alpha ^2\) and the resulting state is \(\left| \psi _{G} \right\rangle \).

Proof

The proof is essentially the same as the previous one. Let \(\gamma ' = 2N'\alpha \). We have

$$(\textrm{Rot}_{2\alpha })^{N'} \left| \psi _B \right\rangle = \cos (2N'\alpha )\left| \psi _B \right\rangle + \sin (2N'\alpha )\left| \psi _G \right\rangle = \cos (\gamma ')\left| \psi _B \right\rangle + \sin (\gamma ')\left| \psi _G \right\rangle .$$

Notice that we chose \(N'\) such that \(\gamma ' \le \frac{\pi }{2} < \gamma ' + 2\alpha \), so \(\frac{\pi }{2} - \gamma ' \in [0,2\alpha )\). After applying the checking oracle, we obtain the state

$$ \cos (\gamma ')\left| \psi _B \right\rangle \left| 0 \right\rangle + \sin (\gamma ')\left| \psi _G \right\rangle \left| 1 \right\rangle .$$

Measuring the last qubit gives outcome 1 with probability \(\sin ^2(\gamma ')\) and the resulting state in the first register is \(\left| \psi _G \right\rangle \). In order to conclude, we compute

$$ \sin ^2(\gamma ') = \cos ^2(\pi /2 - \gamma ') \ge \cos ^2(2\alpha ) \ge 1 - 4\alpha ^2. \qquad \Box $$

After applying the check and measuring, if we do not succeed, we obtain the state \(\left| \psi _B \right\rangle \) again, so we can simply run the quantum search again.

In Grover’s algorithm, we have a procedure to construct \(\left| \psi _U \right\rangle \), and we use this procedure both to initialize the algorithm and to perform the operation \(\textrm{Ref}_{\left| \psi _U \right\rangle }\). A quantum walk has the same general structure as Grover’s algorithm, but it manipulates very large states \(\left| \psi _U \right\rangle \). Though \(\left| \psi _U \right\rangle \) is costly to construct (the setup operation), performing \(\textrm{Ref}_{\left| \psi _U \right\rangle }\) is much cheaper.

In the MNRS framework, \(\left| \psi _U \right\rangle \) is chosen as the unique eigenvector of eigenvalue 1 of an operator related to a random walk in a graph. To perform \(\textrm{Ref}_{\left| \psi _U \right\rangle }\) efficiently, we use phase estimation on this operator.

3 Quantum Walks for Collision Finding

In this section, we present MNRS quantum walks, which underlie most cryptographic applications of quantum walks to date, and give important details on their actual implementation using a vertex-coin encoding.

3.1 Definition and Example

We consider a regular, undirected graph \(G = (V, E)\), which in cryptographic applications (e.g., collision search), is usually a Johnson graph (as in this paper) or a product of Johnson graphs (a case detailed e.g. in [22]).

Definition 1

(Johnson graph). The Johnson graph J(NR) is a regular, undirected graph whose vertices are the subsets of [N] containing R elements, with an edge between two vertices v and \(v'\) iff \(|v\cap v'|=R-1\). In other words, v is adjacent to \(v'\) if \(v'\) can be obtained from v by removing an element and adding an element from \([N] \backslash v\) in its place.
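For intuition, the definition can be checked on a toy instance: in J(N, R), each vertex has exactly \(R(N-R)\) neighbors (R choices for the removed element, \(N-R\) for the added one). A brute-force sketch:

```python
from itertools import combinations

# Toy check of Definition 1: in J(N, R) every vertex has exactly R*(N-R)
# neighbors (R choices for the removed element, N-R for the added one).
def johnson_degrees(N, R):
    vertices = [frozenset(c) for c in combinations(range(N), R)]
    return {sum(1 for w in vertices if len(v & w) == R - 1) for v in vertices}

assert johnson_degrees(5, 2) == {2 * (5 - 2)}   # J(5, 2) is 6-regular
assert johnson_degrees(6, 3) == {3 * (6 - 3)}   # J(6, 3) is 9-regular
```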

In collision search, a vertex in the graph specifies a set of R inputs to the function f under study, where its domain \(\{0,1\}^n\) is identified with \([2^n]\). Let \(M \subseteq V\) be a set of marked vertices, e.g., all the subsets \(S \subseteq \{0,1\}^n\) which contain a collision of f: \(\exists x,y \in S, x \ne y, f(x) = f(y)\). A classical random walk on G finds a marked vertex using Algorithm 1.

[Algorithm 1: classical random walk for finding a marked vertex]

The quantum walk is analogous to this process. Let \(\varepsilon = \frac{|M|}{|V|}\) be the proportion of marked vertices and \(\delta \) be the spectral gap of G. Starting from any vertex, after \( \mathcal {O} \left( \frac{1}{\delta } \right) \) updates, we sample a vertex of the graph uniformly at random. For a Johnson graph J(NR), \(\delta = \frac{N}{R(N-R)} \simeq \frac{1}{R}\). Let \(\textsf{S}\) be the time to Setup, \(\textsf{U}\) the time to Update, \(\textsf{C}\) the time to Check a given vertex. Then Algorithm 1 finds a marked vertex in time: \( \mathcal {O} \left( \textsf{S} + \frac{1}{\varepsilon } \left( \frac{1}{\delta } \textsf{U} + \textsf{C} \right) \right) \). Magniez et al. [26] show how to translate this generically in the quantum setting, provided that quantum analogs of these operations (SETUP, UPDATE, CHECK) can be implemented.

Theorem 1

(From [26]). Assume that quantum algorithms SETUP, UPDATE and CHECK are given. Then there exists a quantum algorithm that finds a marked vertex with gate count: \( \widetilde{\mathcal {O}} \left( \textsf{S} + \frac{1}{\sqrt{\varepsilon }} \left( \frac{1}{\sqrt{\delta }} \textsf{U} + \textsf{C} \right) \right) \) instead of \( \mathcal {O} \left( \frac{1}{\sqrt{\varepsilon }} \left( \textsf{S}+\textsf{C}\right) \right) \) with a naive search.

Using this framework generically, we can recover the complexity of Ambainis’ algorithm for collision search: \( \widetilde{\mathcal {O}} \left( 2^{m/3} \right) \) for any codomain bit-size m. We use the Johnson graph \(J(2^n, 2^{m/3})\). Its spectral gap is approximately \(2^{-m/3}\). A vertex is marked if and only if it contains a collision, so the probability of being marked is approximately \(2^{2m/3-m} = 2^{-m/3}\). Using a quantum data structure for unordered sets, we can implement SETUP in gate count \( \widetilde{\mathcal {O}} \left( 2^{m/3} \right) \), UPDATE and CHECK in \(\textsf{poly}(n)\). The formula of Theorem 1 gives the complexity \( \widetilde{\mathcal {O}} \left( 2^{m/3} \right) \).
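The complexity claim can be verified by evaluating the exponents in the formula of Theorem 1 (a back-of-the-envelope sketch; exponents are base-2 logarithms and polynomial factors are ignored):

```python
# Back-of-the-envelope evaluation of Theorem 1's cost formula
# S + (1/sqrt(eps)) * (U / sqrt(delta) + C), working with base-2 exponents
# and ignoring polynomial factors. For collision search on J(2^n, 2^{m/3}):
# S = m/3, eps = delta = 2^{-m/3}, U = C = poly, giving total exponent m/3.
def mnrs_exponent(setup, eps_exp, delta_exp, update=0.0, check=0.0):
    walk = -eps_exp / 2 + max(update - delta_exp / 2, check)
    return max(setup, walk)

m = 24.0  # any toy value of m
assert mnrs_exponent(setup=m / 3, eps_exp=-m / 3, delta_exp=-m / 3) == m / 3
```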

3.2 Details of the MNRS Framework

In the d-regular graph \(G = (V,E)\), for each \(x \in V\), let \(N_x\) be the set of neighbors of x, of size d. In the case \(G = J(N,R)\), we have \(d = R(N-R)\). For a vertex x, let \(\left| x \right\rangle \) be an arbitrary encoding of x as a quantum state, let D(x) be a data structure associated to x, and let \(\left| \widehat{x} \right\rangle = \left| x \right\rangle \left| D(x) \right\rangle \).

Remark 1

The encoding of x is commonly thought of as the set itself, and the data structure as the images of the set by f. But whenever we look at quantum walks from the perspective of gate count (and not query complexity), an efficient quantum data structure is already required for x itself, i.e., an unordered set data structure in the case of a Johnson graph, and one cannot really separate x from D(x). This is why we will favor the notation \(\left| \widehat{x} \right\rangle \).

For a vertex x, let \(\left| p_x \right\rangle \) be the uniform superposition over its neighbors: \(\left| p_x \right\rangle = \frac{1}{\sqrt{d}} \sum _{y \in N_x} \left| y \right\rangle \), and: \(\left| \widehat{p_x} \right\rangle = \frac{1}{\sqrt{d}} \sum _{y \in N_x} \left| \widehat{y} \right\rangle \). From now on, we consider a walk on edges rather than vertices in the graph, and introduce:

$$\begin{aligned} {\left\{ \begin{array}{ll} \left| \psi _U \right\rangle = \frac{1}{\sqrt{|V|}} \sum _{x \in V} \left| \widehat{x} \right\rangle \left| p_x \right\rangle \text { the superposition of vertices (and neighbors)} \\ \left| \psi _M \right\rangle = \frac{1}{\sqrt{|M|}} \sum _{x \in M} \left| \widehat{x} \right\rangle \left| p_x \right\rangle \text { the superposition of marked vertices} \\ A = \textrm{span} \{\left| \widehat{x} \right\rangle \left| p_x \right\rangle \}_{x \in V} \\ B = \textrm{span} \{\left| \widehat{p_y} \right\rangle \left| y \right\rangle \}_{y \in V} \\ \end{array}\right. } \end{aligned}$$

Let \(\textrm{Ref}_A\) and \(\textrm{Ref}_B\) be respectively the reflection over the space A and the space B. The core of the MNRS framework is to use these operations to emulate a reflection over \(\left| \psi _U \right\rangle \). By alternating such reflections with reflections over \(\left| \psi _M \right\rangle \) (using the checking procedure), the quantum walk behaves exactly as a quantum search, and the analysis of Sect. 2.3 applies.

Proposition 3

(From [26]). Let \(W = \textrm{Ref}_B \textrm{Ref}_A\). We have \( \left\langle \psi _{U} \right| W\left| \psi _{U} \right\rangle = 1\). For any other eigenvector \(\left| \psi \right\rangle \) of W, we have \(\left\langle \psi \right| W\left| \psi \right\rangle = e^{i\theta }\) with \(\theta \in [2\sqrt{\delta },\pi /2]\).

To reflect over \(\left| \psi _U \right\rangle \), we perform a phase estimation of the unitary W, which allows us to separate the part with eigenvalue 1 from the part with eigenvalue \(e^{i\theta }\), \(\theta \in [2\sqrt{\delta },\pi /2]\). The phase estimation circuit needs to call W a total of \( \mathcal {O} \left( \frac{1}{\sqrt{\delta }} \right) \) times to estimate \(\theta \) up to sufficient precision. It has some error, which can be made insignificant with a polynomial increase in complexity; thus in the following, we will consider the reflection \(\textrm{Ref}_{U}\) to be exact.

To construct W, we need to implement \(\textrm{Ref}_A\) and \(\textrm{Ref}_B\). We first remark that:

$$\begin{aligned} \textrm{Ref}_B = \text {SWUP}\circ \textrm{Ref}_A \circ \text {SWUP}, \end{aligned}$$
(2)

where \(\text {SWUP}\left| \widehat{x} \right\rangle \left| y \right\rangle = \left| \widehat{y} \right\rangle \left| x \right\rangle \). This \(\text {SWUP}\) (Swap-Update) operation can furthermore be decomposed into an update of the database (\(\text {UP}_D\)) followed by a register swap:

$$\begin{aligned} \left| \widehat{x} \right\rangle \left| y \right\rangle = \left| x \right\rangle \left| D(x) \right\rangle \left| y \right\rangle \xrightarrow {\text {UP}_D} \left| x \right\rangle \left| D(y) \right\rangle \left| y \right\rangle \xrightarrow {\text {Swap}} \left| y \right\rangle \left| D(y) \right\rangle \left| x \right\rangle = \left| \widehat{y} \right\rangle \left| x \right\rangle , \end{aligned}$$
(3)

so \(\text {SWUP}= \text {Swap}\circ \text {UP}_D\).

We would then implement \(\textrm{Ref}_A\) using an update unitary that, from a vertex x, constructs the uniform superposition of its neighbors. However, this would require us to write \(\log _2(|V|)\) bits of data, and in practice, |V| is doubly exponential (a vertex is represented by an exponential number of bits). Thankfully, in d-regular graphs, and in particular in Johnson graphs, we can avoid this issue by making the encoding of edges more compact. Instead of storing a pair of vertices (x, y), which would eventually force us to rewrite entire vertices, we can store a single vertex and a direction, or coin.

3.3 Vertex-Coin Encoding

The encoding is a reversible operation: \(O_{\text {Enc}} \left| \widehat{x} \right\rangle \left| y \right\rangle = \left| \widehat{x} \right\rangle \left| c_{x \rightarrow y} \right\rangle \), which compresses an edge (xy) by replacing y by a much smaller register of size \(\lceil \log _2(d) \rceil \). Note that we only need the existence of such a circuit. We never use it during the algorithms; all operations are directly performed using the compact encoding.

Let \(\left| \psi _{\textrm{Unif}}^{coin} \right\rangle = \frac{1}{\sqrt{d}} \sum _{c} \left| c \right\rangle \) be the uniform superposition of coins. In the vertex-coin encoding, \(\textrm{Ref}_A\) corresponds to \(I \otimes Ref_{\left| \psi _{\textrm{Unif}}^{coin} \right\rangle }\):

$$ \textrm{Ref}_A = O_{\text {Enc}}^{-1} \circ \left( I \otimes \textrm{Ref}_{\left| \psi _{\textrm{Unif}}^{coin} \right\rangle }\right) \circ O_{\text {Enc}}.$$

Now, for the \(\text {SWUP}\) operation, we have to decompose again \(\text {UP}_D\) and \(\text {Swap}\) in the encoded space. First, we define \(\text {UP}'_D\) such that:

$$\left| x \right\rangle \left| D(x) \right\rangle \left| c_{x \rightarrow y} \right\rangle \xrightarrow {UP'_{D}} \left| x \right\rangle \left| D(y) \right\rangle \left| c_{x \rightarrow y} \right\rangle .$$

Moreover, we define \(\text {Swap}'\) such that:

$$\left| x \right\rangle \left| c_{x \rightarrow y} \right\rangle \xrightarrow {\text {Swap}'} \left| y \right\rangle \left| c_{y \rightarrow x} \right\rangle .$$

and we define \(\text {SWUP}' = \text {Swap}' \circ \text {UP}'_D\) (we abuse notation here by extending \(\text {Swap}'\) to act as the identity on the middle register), so:

$$\begin{aligned} \text {SWUP}' \left| \widehat{x} \right\rangle \left| c_{x \rightarrow y} \right\rangle = \left| \widehat{y} \right\rangle \left| c_{y \rightarrow x} \right\rangle , \end{aligned}$$

and \(\text {SWUP}' = O_{\text {Enc}} \circ \text {SWUP}\circ O_{\text {Enc}}^{-1}\). So we define

$$\begin{aligned} {\left\{ \begin{array}{ll} \textrm{Ref}'_A = I \otimes \textrm{Ref}_{\left| \psi _{\textrm{Unif}}^{coin} \right\rangle } = O_{\text {Enc}} \circ \textrm{Ref}_A \circ O_{\text {Enc}}^{-1} \\ \textrm{Ref}'_B = \text {SWUP}' \circ \textrm{Ref}'_A \circ \text {SWUP}' = O_{\text {Enc}} \circ \textrm{Ref}_B \circ O_{\text {Enc}}^{-1} \\ W' = \textrm{Ref}'_B \circ \textrm{Ref}'_A \end{array}\right. } \end{aligned}$$
(4)

By putting everything together, we have \(W' = O_{\text {Enc}} \circ W \circ O_{\text {Enc}}^{-1}\). Since \(O_{\text {Enc}}\) is a unitary operator, W and \(W'\) are unitarily equivalent, i.e., they have the same eigenvalues. Thus, Proposition 3 applies to \(W'\) the same as it does to W, and gives its spectral properties. We can perform phase estimation on \(W'\), and afterwards combine with Proposition 1. Since constructing the uniform superposition of coins is trivial, everything relies on the unitary \(\text {SWUP}'\).

Theorem 2

(MNRS, adapted). Let \(\left| \widehat{x} \right\rangle \) be an encoding of the vertex x (incl. data structure) and assume that a vertex-coin encoding is given. Let \(\alpha = \arcsin \sqrt{\varepsilon }\). Starting from the state: \(\frac{1}{\sqrt{|V|}} \sum _{x \in V} \left| \widehat{x} \right\rangle \left| \psi _{\textrm{Unif}}^{coin} \right\rangle \), apply \(\left\lfloor \frac{\pi /2 -\alpha }{2\alpha } \right\rfloor \) iterates of: \(\bullet \) a checking procedure which flips the phase of marked vertices; \(\bullet \) a phase estimation of \(W'\); then apply the checking again and measure. With probability at least \(1-4\alpha ^2\), we measure 1 and collapse on the uniform superposition of marked vertices.

Finally, we can adapt this analysis by starting from the bad vertices, with a proof analogous to that of Proposition 2. This will be the main building block of our new algorithm.

Theorem 3

(MNRS, starting from bad vertices). Starting from the state: \(\frac{1}{\sqrt{|V| - |M|}} \sum _{x \in V \backslash M} \left| \widehat{x} \right\rangle \left| \psi _{\textrm{Unif}}^{coin} \right\rangle \) (the superposition of unmarked vertices), apply \(\left\lfloor \frac{\pi /2}{2\alpha } \right\rfloor \) iterates of: \(\bullet \) a checking procedure which flips the phase of marked vertices; \(\bullet \) a phase estimation of \(W'\); then apply the checking again and measure. With probability at least \(1-4\alpha ^2\), we measure 1 and collapse on the uniform superposition of marked vertices. Otherwise, we collapse on the uniform superposition of unmarked vertices.

Coins for a Johnson Graph. In a Johnson graph J(N, R), a coin \(c = (j,z)\) is a pair where:

  • \(j \in [R]\) is the index of the element that will be removed from the current vertex (given an arbitrary ordering, e.g. the lexicographic ordering of bit-strings).

  • \(z \in [N-R]\) is the index of an element that does not belong to the current vertex, and will be added as a replacement.
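For concreteness, the coin decoding can be sketched classically as follows (a sketch only: the function name is ours, and the actual quantum implementation decodes the coin in \(\textsf{poly}(n)\) operations using the tree invariants of Sect. 5, not a linear scan).

```python
def decode_coin(S, N, coin):
    """Decode a Johnson-graph coin (j, z) for the vertex S (a subset of [N]):
    j indexes the element of S to remove, and z indexes the element of the
    complement [N] \\ S to add (both under the natural ordering)."""
    j, z = coin
    S_sorted = sorted(S)
    complement = [c for c in range(N) if c not in set(S)]
    return S_sorted[j], complement[z]   # (removed element, added element)
```

For example, on the vertex {1, 4, 6} of J(8, 3), the coin (1, 2) removes 4 and adds 3, the element of index 2 in the complement {0, 2, 3, 5, 7}.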

4 A Chained Quantum Walk to Find Many Collisions

In this section, we prove our main result.

Theorem 4

Let \(f : \{0,1\}^n \rightarrow \{0,1\}^m\), \(n \le m \le 2n\) be a random function. Let \(k \le \min (2n-m, m/4)\). There exists a quantum algorithm making \( \mathcal {O} \left( 2^{2k/3 + m/3} \right) \) quantum queries to f and using \( \widetilde{\mathcal {O}} \left( 2^{2k/3 + m/3} \right) \) Clifford+T+qRAM gates, that outputs \(2^k\) collision pairs of f.

Our new algorithm, which is detailed in Sect. 4.1 and Sect. 4.2, solves the case \(k \le \frac{m}{4}\). The case \(k \le 2n-m\) was already solved by adapting the BHT algorithm, as detailed in Sect. 6.

Note that if we are only interested in the query complexity, our technique is still necessary to improve over previous results, but the radix tree data structure that we detail in Sect. 5 can be replaced by a simple ordered list with expensive update operations (see [19]).

4.1 New Algorithm

We detail here our chained quantum walk algorithm. Recall that the Johnson graph J(N, R) is the regular, undirected graph whose vertices are subsets of size R of [N], and edges connect each pair of vertices which differ in exactly one element. We identify [N] with \(\{0,1\}^n\), the domain of f.

We assume that an efficient quantum unordered set data structure is given, which makes vertices in the Johnson graph correspond to quantum states, while allowing efficient implementation of the operations required for the MNRS quantum walks. It will be detailed in Sect. 5. In the following we write \(\left| S \right\rangle \) for the quantum state corresponding to a set S.

Idea of Our Algorithm. After running a quantum walk on a Johnson graph, we obtain a superposition of vertices which contain a collision. We remove the collision from the vertex, and we measure the elements that form this collision: we still obtain a superposition of sets, which can be exploited for the next walk. The sets in this superposition have a very important property: because we just removed the collision (more generally, we will remove all collisions that the vertex contains), they are now guaranteed to contain none. Thus, we do not have the uniform superposition of vertices of our next MNRS walk, but the uniform superposition of unmarked vertices. However, we have seen that this makes little difference, and we can continue using Theorem 3. When we measure the result of a walk, it succeeds with at least constant probability. In case of failure, we collapse on the superposition of unmarked vertices again, which means we simply have to restart the walk. The extraction of collisions modifies the walk parameters (vertex size, graph, marked vertices) in a way that we track throughout the algorithm, and which is detailed below.

Technical Details. Let C be a table in classical memory of all the multi-collisions found so far. This table contains entries of the form \(u : (x_1, \ldots , x_r)\), where \(f(x_1) = \ldots = f(x_r) = u\) forms a multicollision of f, indexed by the image u. We define its set of preimages and its set of images:

$$\begin{aligned} {\left\{ \begin{array}{ll} \textsf{Preim}(C) := \bigcup _{ u : (x_1, \ldots , x_r) \in C } \{ x_1, \ldots , x_r \} \\ \textsf{Im}(C) := \bigcup _{ u : (x_1, \ldots , x_r) \in C } \{ u \} \end{array}\right. } \end{aligned}$$
(5)

Given the table C, given a size parameter R, we define the two sets of sets:

$$\begin{aligned} {\left\{ \begin{array}{ll} V_R^C &{}:= \{ S \subseteq \left( \{0,1\}^n \backslash \textsf{Preim}(C)\right) , |S| = R \} \\ M_R^C &{}:= \{ S \subseteq \left( \{0,1\}^n \backslash \textsf{Preim}(C)\right) , |S| = R , \\ &{}\qquad \qquad \left( \exists x \ne y \in S, f(x) = f(y) \vee \exists z \in S, f(z) \in \textsf{Im}(C) \right) \} \end{array}\right. } \end{aligned}$$
(6)

The first one will be the set of vertices for the current walk, and the second one its set of marked vertices. As we can see, the current walk excludes a set of previously measured inputs, and a vertex is marked if it leads to a new collision, or to a preimage of one of the previously measured images. The second case simply extends one of the currently known multicollision tuples. The probability for a vertex to be marked can be easily computed, and we just need to bound it as follows:

$$\begin{aligned} \max \left( \frac{R |\textsf{Im}(C)|}{2^m} , \frac{ R(R-1) }{ 2^{m+1}} \right) \le \varepsilon _{R, C} \le \frac{R |\textsf{Im}(C)|}{2^m} + \frac{ R(R-1) }{ 2^{m+1}} , \end{aligned}$$

since any vertex containing a collision, or a preimage from the table C, is marked.
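The marking condition of Equation (6) is straightforward to express classically; the sketch below (the names is_marked, C_images, C_preimages are ours) checks whether a given vertex S is marked.

```python
def is_marked(S, f, C_images, C_preimages):
    """Return True iff the vertex S is marked in the sense of Eq. (6):
    S contains a fresh collision of f, or an element whose image already
    appears in the multicollision table C."""
    assert not (set(S) & C_preimages)   # vertices exclude Preim(C) by definition
    seen = set()
    for x in S:
        u = f(x)
        if u in C_images or u in seen:  # extends C, or collides inside S
            return True
        seen.add(u)
    return False
```

For instance, with f(x) = x mod 4, the vertex {0, 1, 2} is unmarked, while {0, 1, 4} is marked (0 and 4 collide).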

In Sect. 5, we will show that with an appropriate data structure, there exists an extraction algorithm \(\textrm{EXTRACT}{}\) which does the following:

$$\begin{aligned} \textrm{EXTRACT}{} : C, R, \frac{1}{\sqrt{|M_R^C|}} \sum _{ S \in M_R^C} \left| S \right\rangle \mapsto C', R', \frac{1}{\sqrt{| V_{R'}^{C'} \backslash M_{R'}^{C'}|}} \sum _{S \in V_{R'}^{C'} \backslash M_{R'}^{C'}} \left| S \right\rangle , \end{aligned}$$

where \(R' = R-r\) for some value r, and \(C'\) contains exactly r new elements (collisions adding new entries, or preimages going into previous entries). Thus, \(\textrm{EXTRACT}{}\) transforms the output of a successful walk into the uniform superposition of unmarked vertices for the next walk.

We can now give Algorithm 2, depending on a tunable parameter \(\ell \).

[Algorithm 2: pseudocode]

4.2 Complexity Analysis

Theorem 5

(Time-memory tradeoff). For all \(k \le \ell \le \min (2k/3+m/3,m/2)\), Algorithm 2 computes \(2^k\) collisions using \( \widetilde{\mathcal {O}} \left( 2^\ell \right) \) qubits and \( \widetilde{\mathcal {O}} \left( 2^{k+m/2-\ell /2} \right) \) Clifford+T+qRAM gates.

Proof

We start by noticing that although Algorithm 2 outputs a set of multicollisions rather than collisions, the numbers of collisions and multicollisions actually obtained are closely related. Indeed, for a function from \([2^n]\) to \([2^n]\), there is a polynomial (in n) limit to the width of the multicollisions that can appear for a non-negligible fraction of the functions: by Theorem 4 in [13], the average number of r-collisions in such a random function is \(\frac{2^n e^{-1}}{r!}\). Thus, there exists a universal constant c such that with probability \(1 - o(2^{-n})\), such a random function does not have any r-collision with \(r \ge cn\).

This means that regardless of the state of the current table C, we have:

$$\begin{aligned} |\textsf{Im}(C)| \le |\textsf{Preim}(C)| \le cn |\textsf{Im}(C)|. \end{aligned}$$

In particular, by taking \(2^\ell \) greater than \(cn 2^{k+1}\), we ensure that during the algorithm, \(R > 2^{\ell -1}\). This means that we never run out of elements.
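The multicollision-width heuristic is easy to check empirically. The sketch below (names ours) draws a random function on \(2^{12}\) points, tallies how many images have exactly r preimages, and compares with the prediction \(2^n e^{-1}/r!\); no wide multicollision appears.

```python
import math
import random

def multicollision_widths(n_bits, seed):
    """Tally, for a random function [2^n] -> [2^n], how many images have
    exactly r preimages; by Theorem 4 in [13] the expected number of
    r-collisions is 2^n * e^{-1} / r!, so wide multicollisions are rare."""
    rng = random.Random(seed)
    N = 2 ** n_bits
    preimages = {}
    for _ in range(N):
        u = rng.randrange(N)               # image of a fresh input
        preimages[u] = preimages.get(u, 0) + 1
    widths = {}
    for r in preimages.values():
        widths[r] = widths.get(r, 0) + 1
    return widths

widths = multicollision_widths(12, seed=2023)
N = 2 ** 12
for r in (1, 2, 3):
    # empirical count vs. predicted average 2^n e^{-1} / r!
    print(r, widths.get(r, 0), round(N * math.exp(-1) / math.factorial(r)))
```

For \(N = 2^{12}\), the predictions are about 1507, 753 and 251 values with exactly 1, 2 and 3 preimages, and the empirical counts concentrate around them.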

Secondly, we can bound \(\varepsilon _{R,C} \ge \frac{R(R-1)}{2^{m+1}}\). This allows us to easily upper bound the time complexity of any of the walks: if the current vertex size is R, then the walk runs for \( \mathcal {O} \left( 2^{m/2} / R \right) \) iterates, and each iterate contains \( \widetilde{\mathcal {O}} \left( \sqrt{R} \right) \) operations. The constants in the \(\mathcal {O}\) are the same throughout the algorithm. This means that we can upper bound the complexity of each walk by \( \widetilde{\mathcal {O}} \left( 2^{m/2} / \sqrt{R} \right) \le \widetilde{\mathcal {O}} \left( 2^{m/2 - \ell / 2} \right) \).

By Theorem 3, the success probability of this walk is at least \(1 - 4 \varepsilon _{R,C}\). If we do not succeed, the \(\textrm{CHECK}{}\) followed by a measurement makes the current state collapse again on the superposition of unmarked vertices, and we run the exact same walk again. Note that for this algorithm to work, we must have \(\varepsilon _{R,C} < 0.5\); this quantity is the probability that the vertex contains a collision, or a new preimage of \(\textsf{Im}(C)\), which is \( \widetilde{\mathcal {O}} \left( 2^{2\ell -m} \right) \). Hence, we must have \(\ell \le m/2\).

Then, as \(\ell \le 2k/3+m/3\), the final complexity of the algorithm is

$$\begin{aligned} \widetilde{\mathcal {O}} \left( 2^\ell + 2^k 2^{m/2 - \ell / 2} \right) = \widetilde{\mathcal {O}} \left( 2^{k+m/2-\ell /2} \right) , \end{aligned}$$

where \(2^{\ell }\) is the cost of the SETUP, and the second term accounts for all the walk steps.    \(\square \)
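The trade-off of Theorem 5 is simple enough to tabulate; here is a small sketch (the function name is ours) of the resulting exponent.

```python
def chained_walk_exponent(k, m, ell):
    """Base-2 exponent of the gate count in Theorem 5: a setup costing
    2^ell, plus about 2^k walks costing 2^{m/2 - ell/2} each."""
    assert k <= ell <= min(2 * k / 3 + m / 3, m / 2)
    return max(ell, k + m / 2 - ell / 2)    # up to polynomial factors

# The two terms balance at ell = 2k/3 + m/3 (whenever this is <= m/2),
# recovering the exponent 2k/3 + m/3 of Theorem 4.
k, m = 10, 60
ell_star = min(2 * k / 3 + m / 3, m / 2)
print(chained_walk_exponent(k, m, ell_star))    # ~ 2k/3 + m/3 = 26.67 here
```

Taking a smaller \(\ell \) trades memory for time: e.g., with k = 10, m = 60 and \(\ell = 20\), the exponent becomes \(k + m/2 - \ell /2 = 30\).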

5 Quantum Radix Trees and Extractions

In this section, we detail the quantum radix tree data structure, a history-independent unordered set data structure introduced in [21]. We show that it supports, exactly and in a polynomial number of Clifford+T+qRAM gates, the two main operations required for our walk: \(\text {SWUP}'\) and \(\textrm{EXTRACT}{}\). We describe these operations in pseudocode, while ensuring that they are reversible and polynomial-time.

5.1 Logical Level

Following [21], the quantum radix tree is an implementation of a radix tree storing an unordered set S of n-bit strings. It has one additional property: its concrete memory layout is history-independent. Indeed, there are many ways to encode a radix tree in memory, and as elements are inserted and removed, we cannot have a unique bit-string T(S) representing a set S. We use instead a uniform superposition of all memory layouts of the tree, which makes the quantum state \(\left| T(S) \right\rangle \) unique, and independent of the order in which the elements were inserted or removed. Only the entry point (the root) has a fixed position.

We separate the encoding of S into \(\left| T(S) \right\rangle \) in two levels: first, a logical level, in which S is encoded as a unique radix tree R(S); second, a physical level, in which R(S) is encoded into a quantum state \(\left| T(S) \right\rangle \). The logical mapping \(S \rightarrow R(S)\) is standard.

Definition 2

(From [21]). Let S be a set of n-bit strings. The radix tree R(S) is a binary tree in which each leaf is labeled with an element of S, and each edge with a substring, so that the concatenation of all substrings on the path from the root to the leaf yields the corresponding element. Furthermore, the labels of two children of any non-leaf node start with different bits.

By convention, we put the “0” bit on the left, and “1” on the right. In addition to the n-bit strings stored by the tree, we append to each node the value of an \(\ell \)-bit invariant which can be computed from its children, and depends only on the logical structure of the radix tree, not its physical structure. Typically the invariant can count the number of elements in the tree.

Fig. 2. Tree R(S) representing the set \(S = \{\texttt{0000},\texttt{0010}, \texttt{1001}, \texttt{1011}, \texttt{1111}\}\) (the example is taken from [21]).
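The logical level is entirely classical; a compact sketch of Definition 2, with the leaf-count invariant maintained on insertion, might look as follows (class and function names are ours).

```python
class Node:
    """A node of the radix tree R(S): outgoing edges are indexed by their
    first bit, and `count` is the invariant (number of leaves below)."""
    def __init__(self):
        self.edges = {}   # first bit '0'/'1' -> (edge label, child Node)
        self.count = 0

def insert(node, rest):
    """Insert the bit-string `rest` below `node`, splitting an edge at the
    first divergence if needed, and update the leaf-count invariant."""
    node.count += 1
    if not rest:
        return
    b = rest[0]
    if b not in node.edges:
        leaf = Node()
        leaf.count = 1
        node.edges[b] = (rest, leaf)
        return
    label, child = node.edges[b]
    p = 0
    while p < len(label) and p < len(rest) and label[p] == rest[p]:
        p += 1
    if p == len(label):                       # edge fully matched: descend
        insert(child, rest[p:])
    else:                                     # split the edge at position p
        mid = Node()
        mid.count = child.count
        mid.edges[label[p]] = (label[p:], child)
        node.edges[b] = (label[:p], mid)
        insert(mid, rest[p:])

def elements(node, prefix=""):
    """Read S back: concatenate edge labels from the root to each leaf."""
    if not node.edges:
        return [prefix]
    return [s for label, child in node.edges.values()
            for s in elements(child, prefix + label)]
```

On the set of Fig. 2, the root invariant equals 5 and the leaves spell back the five elements.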

5.2 Memory Representation

We now detail the correspondence from R(S) to \(\left| T(S) \right\rangle \). We suppose that a quantum bit-string data structure is given, that handles bit-strings of length between 0 and n and performs operations such as concatenation, computing shared prefixes, testing if the bit-string has a given prefix, in time \(\textsf{poly}(n)\).

State of the Memory. We suppose that \( \mathcal {O} \left( M n \right) \) qubits of memory are given, where \(M \ge R\) will be set later on. We divide these qubits into M cells of \( \mathcal {O} \left( n \right) \) qubits each, which we index from 0 to \(M-1\). We encode cell addresses on \(m = \left\lceil \log _2 M \right\rceil + 1\) bits, and we also define an “empty” address \(\bot \). Each cell will be either empty, or contain a node of the radix tree, encoded as a tuple \((i, a_l, a_r, \ell _l, \ell _r)\) where:

  • i is the value of the invariant

  • \(a_l\) and \(a_r\) (m-bit strings) are respectively the memory addresses of the cells holding the left and right children, either valid indices or \(\bot \). A node with \(a_l = a_r = \bot \) is a leaf.

  • \(\ell _l\) and \(\ell _r\) are the labels of the left and right edges. (\(\varepsilon \) if the node is a leaf, where \(\varepsilon \) is the empty string).

In other words, we have added to the tree R(S) a choice of memory locations for the nodes, which we informally call the memory layout of the tree. The structure of R(S) itself remains independent of its memory layout.

The root of the tree is stored in cell number 0. In Fig. 3, we give an example of a memory representation of the tree R(S) of Fig. 2. We take as invariant the number of leaves which, at the root, gives the number of elements in the set. It is important to note that memory cells have an “empty” default state, which allows the radix tree to support size changes. Whether a cell is empty or not depends on the memory layout.

Fig. 3. Example of memory layout for the tree of Fig. 2, holding the set \(S = \{\texttt{0000},\texttt{0010}, \texttt{1001}, \texttt{1011}, \texttt{1111}\}\).

A radix tree encoding a set of size R contains \(2 R-1\) nodes (including the root), which means that we need (a priori) no more than \(M=2 R-1\) cells in our memory. In addition to the bit-strings x, we could add any data \(d_x\) to which x serves as a unique index. (This means adding another register which is non-empty for leaf nodes only). Finally, it is possible to account for multiplicity of elements in the tree by adding multiplicity counters, but since this is unnecessary for our applications, we will stick to the case of unique indices.

Definition. Let S be a set of size R, encoded in a radix tree with \(2R-1\) nodes. We can always take an arbitrary ordering of the nodes in the tree, for example the lexicographic ordering of the paths from the root (left = 0, right = 1). This means that, for any sequence of non-repeating cell addresses I of length \(2R-1\), we can define a mapping \(S, I \mapsto T_I(S)\) which specifies how the tree is written in memory, by choosing the addresses \(I = (i_1 = 0, \ldots , i_{2R-1})\) for the nodes. For example, the tree of Fig. 3 corresponds to the sequence (0, 1, 3, 4, 2, 5, 8, 9, 7). We can then define the quantum radix tree encoding S as the quantum state:

$$\begin{aligned} \left| T(S) \right\rangle = \sum _{\text {valid sequences } I} \left| T_I(S) \right\rangle , \end{aligned}$$
(7)

where we take a uniform superposition over all valid memory layouts.

For two different sets S and \(S'\), and for any pair \(I,I'\) (even if \(I' = I\)), we have \(T_{I'}(S) \ne T_I(S')\): the encodings always differ. This means that, as expected, we have \(\left\langle T(S) | T(S') \right\rangle = 0\).

Memory Allocator. In order to maintain this uniform superposition over all possible memory layouts, we need an implementation of a memory allocator. This unitary \(\textrm{ALLOC}{}\) takes as input the current state of the memory, and returns a uniform superposition over the indices of all currently unoccupied cells. Possible implementations of \(\textrm{ALLOC}{}\) are detailed in Sect. 5.4.

5.3 Basic Operations

We show how to operate on the quantum radix trees in \(\textsf{poly}(n)\) Clifford+T+qRAM gates. We start with the basics: lookup, insertion and deletion.

Lookup. We define a unitary \(\textrm{LOOKUP}{}\) which, given S and a new element x, returns whether x belongs to S:

$$\begin{aligned} \textrm{LOOKUP}~: \left| x \right\rangle \left| T(S) \right\rangle \left| 0 \right\rangle \mapsto \left| x \right\rangle \left| T(S) \right\rangle \left| x \in S \right\rangle . \end{aligned}$$
(8)

We implement \(\textrm{LOOKUP}{}\) by descending in the radix tree R(S); the pseudocode is given in Algorithm 3. Since the “while” loop contains at most n iterates, quantumly all n iterates are performed, controlled on a flag that says whether the computation has already ended. After obtaining the result, the iterates are uncomputed to erase the intermediate registers.

[Algorithm 3: pseudocode of LOOKUP]

Insertion. We define a unitary \(\textrm{INSERT}{}\), which, given a new element x, inserts x in the set S. If x already belongs to S, its behavior is unspecified.

$$\begin{aligned} \textrm{INSERT}~: \left| x \right\rangle \left| T(S) \right\rangle \mapsto \left| x \right\rangle \left| T(S\cup \{x\}) \right\rangle . \end{aligned}$$
(9)

The implementation of \(\textrm{INSERT}{}\) is more complex, but the operation is still reversible. The pseudocode is given in Algorithm 4. First, we find the point of insertion in the tree, then we call \(\textrm{ALLOC}{}\) twice to obtain memory addresses for two new nodes. We locally modify the layout to insert these new nodes, including a new leaf for the new element x. Then, we update the invariants on the path to the new leaf. Finally, we uncompute the path to the new leaf (all the addresses of the nodes on this path). To do so, we perform a loop similar to \(\textrm{LOOKUP}{}\), given the knowledge of the newly inserted element x.

Deletion. The deletion can be implemented by uncomputing \(\textrm{INSERT}{}\), since it is a reversible operation. It performs:

$$\begin{aligned} \textrm{INSERT}{}^\dagger ~: \left| x \right\rangle \left| T(S\cup \{x\}) \right\rangle \mapsto \left| x \right\rangle \left| T(S) \right\rangle . \end{aligned}$$
(10)

The deletion of an element that is not in S is unspecified.

[Algorithm 4: pseudocode of INSERT]

Quantum Lookup. We can implement a “quantum lookup” unitary \(\textrm{QLOOKUP}{}\) which produces a uniform superposition of elements in S having a specific property P. The only requirement is that the invariant of nodes has to store the number of nodes in the subtree having this property (and so, leaf nodes will indicate if the given x satisfies P(x) or not).

$$\begin{aligned} \textrm{QLOOKUP}{}~: \left| T(S) \right\rangle \left| 0 \right\rangle \mapsto \left| T(S) \right\rangle \sum _{x \in S | P(x)} \left| x \right\rangle . \end{aligned}$$
(11)

This unitary is implemented by descending in the tree coherently (i.e., in superposition over the left and right paths), with a weight that depends on the number of solutions in the left and right subtrees. First, we initialize an address register \(\left| a \right\rangle \) to the root. Then, n times (n being the maximal depth of the tree), we update the current address register as follows:

  • We count the number of solutions in the left and right subtrees of the node at address \(\left| a \right\rangle \) (say, \(t_l\) and \(t_r\)).

  • We map \(\left| a \right\rangle \) to \(\left| a \right\rangle \left( \sqrt{ \frac{t_l}{t_l + t_r}} \left| \text {left child of } a \right\rangle + \sqrt{ \frac{t_r}{t_l + t_r}} \left| \text {right child of } a \right\rangle \right) \). (We do nothing if \(\left| a \right\rangle \) is a leaf).

In the end, we obtain a uniform superposition of the paths to all elements satisfying P. We can query these elements, then uncompute the paths using an inverse LOOKUP. Likewise, we can also perform a quantum lookup of pairs satisfying a given property, e.g., retrieve a uniform superposition of all collision pairs in S.
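The classical analogue of this descent samples a uniformly random solution from the subtree counts. The sketch below (heap layout and names ours) computes the induced leaf distribution exactly, confirming it is uniform over the leaves satisfying P.

```python
def solution_distribution(flags):
    """Probability assigned to each leaf by the QLOOKUP-style descent:
    at each internal node, branch with weights t_l/(t_l+t_r) and
    t_r/(t_l+t_r), where t_l, t_r count solutions in the two subtrees."""
    n = len(flags)                          # number of leaves; a power of two
    tree = [0] * n + [int(f) for f in flags]
    for i in range(n - 1, 0, -1):           # heap: node i has children 2i, 2i+1
        tree[i] = tree[2 * i] + tree[2 * i + 1]
    probs = [0.0] * n

    def descend(i, p):
        if i >= n:                          # reached leaf i - n
            probs[i - n] = p
            return
        t_l, t_r = tree[2 * i], tree[2 * i + 1]
        if t_l:
            descend(2 * i, p * t_l / (t_l + t_r))
        if t_r:
            descend(2 * i + 1, p * t_r / (t_l + t_r))

    if tree[1]:
        descend(1, 1.0)
    return probs
```

For the flags [1, 0, 1, 1, 0, 0, 1, 0], each of the four solutions receives probability exactly 1/4.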

5.4 Quantum Memory Allocators

We now define the unitary \(\textrm{ALLOC}{}\), which given the current state of the memory, creates the uniform superposition of unallocated cells:

$$\begin{aligned} \textrm{ALLOC}~: \left| \text {current memory} \right\rangle \left| 0 \right\rangle \mapsto \left| \text {current memory} \right\rangle \sum _{i \text { unoccupied}} \left| i \right\rangle . \end{aligned}$$
(12)

We do not need to define a different unitary for un-allocation; we only have to recompute \(\textrm{ALLOC}{}\) to erase the addresses of cells that we are currently cleaning. To implement \(\textrm{ALLOC}{}\), we add to each memory cell a flag indicating if it is allocated. We propose two approaches.

Quantum Search Allocation. Classically, we can allocate new cells by simply choosing addresses at random and checking whether they are already allocated. Quantumly, we can follow this approach using a quantum search over all the cells for unallocated ones. Obviously, for this approach to be efficient, we need the proportion of unallocated cells to remain lower bounded by a constant. Besides, if we keep a counter of the number of allocated cells (which does not vary during our quantum walk steps anyway), we can make this operation exact using Amplitude Amplification (Theorem 4 in [7]). Indeed, this counter gives the proportion of allocated cells, so we know exactly the success probability of the amplified algorithm.

We can implement this procedure with a single iteration of quantum search as long as we have a \(33\%\) overhead on the maximal number of allocated cells (similarly to the case of searching with a single query studied in [11]).
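The exactness claim follows from the standard amplitude-amplification angle calculation: a 33% overhead guarantees that at least 1/4 of the cells are free, and one Grover iteration maps success probability \(\sin ^2\theta \) to \(\sin ^2(3\theta )\), which equals 1 exactly when \(\sin ^2\theta = 1/4\). A quick numerical sketch of this calculation:

```python
import math

# A 33% overhead (M = 4/3 * allocated cells) guarantees that at least a
# 1/4 fraction of the M cells is always unallocated.
free_fraction = 1 / 4
theta = math.asin(math.sqrt(free_fraction))   # theta = pi/6
# One Grover iteration amplifies sin^2(theta) to sin^2(3*theta).
p_one_iteration = math.sin(3 * theta) ** 2    # 3*theta = pi/2, so this is 1
print(p_one_iteration)
```

With any larger free fraction, one iteration would overshoot; the counter of allocated cells lets the procedure pick the exact rotation.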

Quantum Tree Allocation. A more standard, but less time-efficient approach to implement \(\textrm{ALLOC}{}\) is to organize the memory cells in a complete binary tree (a heap), so that each node of the tree stores the number of unallocated cells in its children. This tree is not a quantum radix tree, since its size never changes, and no elements are inserted or removed. In order to obtain the uniform superposition of free cell addresses, we mimic the approach of \(\textrm{QLOOKUP}{}\).

5.5 Higher-Level Operations for Collision Walks

We now implement efficiently the higher-level operations required by our algorithms: setting up the initial vertex (SETUP), performing a quantum walk update (\(\text {SWUP}'\)), looking for collisions (\(\textrm{CHECK}{}\)) and extracting them (\(\textrm{EXTRACT}{}\)).

Representation. We consider the case of (multi-)collision search. Here the set S is a subset of \([N] = \{0,1\}^n\), but we also need to store the images of these elements by the function \(f~: \{0,1\}^n \rightarrow \{0,1\}^{m}\). Let \(F = \{ f(x) || x, x \in S \}\). A collision of f is a pair (f(x) ||x), (f(y) ||y) such that \(f(x) = f(y)\), i.e., the bit-strings have the same value on the first m bits.

Since our goal is to retrieve efficiently the collision pairs, we will store both a radix tree T(S) to keep track of the elements, and T(F) to keep track of the collisions. One should note that the sets F and S have the same size. When inserting or deleting elements, we insert and delete both in T(S) and T(F). These trees are stored in two separate chunks of memory cells.

SETUP. The unitary SETUP starts from an empty state \(\left| 0 \right\rangle \) and initializes the tree to a uniform superposition of subsets of a given set. As long as sampling uniformly at random from this set is efficient, we can implement SETUP using a sequence of insertions in a tree that starts empty.

SWUP’. We show an efficient implementation of the unitary \(\text {SWUP}'\):

$$\begin{aligned} \text {SWUP}' \left| T(S) \right\rangle \left| T(F) \right\rangle \left| c_{S \rightarrow S'} \right\rangle = \left| T(S') \right\rangle \left| T(F') \right\rangle \left| c_{S' \rightarrow S} \right\rangle \end{aligned}$$
(13)

where \(c_{S \rightarrow S'}\) is the coin register which contains information on the transition of a set S to a set \(S'\). As we have detailed before, the coin is encoded as a pair (jz) where \(j \in [R]\) is the index of an element in S, which has to be removed, and \(z \in [N-R]\) is the index of an element in \(\{0,1\}^n \backslash S\), which has to be inserted. We implement \(\text {SWUP}'\) as follows:

  1. First, we convert the coin register to a pair x, y where: \(\bullet \) y is the z-th element of \(\{0,1\}^n\) which is not in S, and \(\bullet \) x is the j-th element of S (according to the lexicographic ordering of bit-strings). For the first, we need a specific algorithm detailed in the full version of the paper [5], which accesses the tree T(S). The second can be done easily if the invariant of each node stores the number of leaves in its subtree. Note that both the mapping from z to y and the mapping from j to x are reversible. At this point the state is \(\left| T(S) \right\rangle \left| T(F) \right\rangle \left| x,y \right\rangle \).

  2. We use \(\textrm{INSERT}{}^\dagger \) to delete x from T(S), and delete f(x)||x from T(F).

  3. We use \(\textrm{INSERT}{}\) to insert y in T(S) and f(y)||y in T(F). At this point the state is \(\left| T(S') \right\rangle \left| T(F') \right\rangle \left| x, y \right\rangle \), where \(S' = (S \backslash \{x\}) \cup \{y\}\) and \(F'\) is the set of corresponding images.

  4. Finally, we convert the pair x, y back to a coin register.
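Classically, the whole step is a small involution; the sketch below (name ours) traces it on plain sets, while the quantum version performs the same maps reversibly on the tree encodings.

```python
def swup(S, N, coin):
    """One SWUP'-style step on J(N, R), classically: decode the coin (j, z)
    into the pair (x, y), replace x by y in the vertex, and return the
    reverse coin c_{S' -> S}, so that applying swup twice is the identity."""
    j, z = coin
    S = sorted(S)
    comp = [c for c in range(N) if c not in set(S)]
    x, y = S[j], comp[z]                    # element removed / element added
    S2 = sorted(set(S) - {x} | {y})
    comp2 = [c for c in range(N) if c not in set(S2)]
    return S2, (S2.index(y), comp2.index(x))
```

On J(8, 3), the step maps ({1, 4, 6}, (1, 2)) to ({1, 3, 6}, (1, 2)), swapping 4 for 3, and applying the same step again restores the original vertex and coin.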

Remark 2

(Walking in a reduced set). In our walk, we actually reduce the set of possible elements, due to the previously measured collisions. So the coin does not encode an element of \(\{0,1\}^n \backslash S\), but of \(\{0,1\}^n \backslash S \backslash \textsf{Preim}(C)\), where C is our current table of multicollisions. An adapted algorithm is also given in the full version of the paper [5] for this case.

Checking. We make the CHECK operation trivial, by defining an appropriate invariant of the tree T(F). For each node in the tree, we count the number of multicollisions and preimages of \(\textsf{Im}(C)\) that the subtree rooted at this node contains. Then, the unitary CHECK simply tests whether the invariant at the root is zero.

During the operations of insertion and deletion in the tree, the invariant can be updated appropriately. Besides checking if the inserted element creates a new collision (resp., the deleted element removes one), we also need to check whether the image belongs to the set \(\textsf{Im}(C)\). During the run of the algorithm, \(\textsf{Im}(C)\) is classical, and can be stored in quantum-accessible classical memory.

Extracting. The most important property for our chained quantum walk is the capacity to extract multicollisions from the radix tree, in a way that preserves the rest of the state, and allows us to reuse a superposition of marked vertices for the current walk as a superposition of unmarked vertices for the next one. Recall from Sect. 4.1 that we have defined a table of multicollisions C, a set \(V_R^C\) of sets of size R in \(\{0,1\}^n \backslash \textsf{Preim}(C)\), and a set \(M_R^C \subseteq V_R^C\) of marked vertices, which contain either a new element mapping to \(\textsf{Im}(C)\), or a new collision. Recall also from the proof of Theorem 5 that a random function, with probability \(1 - o(2^{-n})\), does not admit an r-collision \((x_1, \ldots , x_r)\) with \(r \ge cn\) for an appropriate constant c. This limit on the size of multicollisions ensures that the extraction does not reduce too much the size of the current vertex.

The operation \(\textrm{EXTRACT}{}\) does:

$$\begin{aligned} \textrm{EXTRACT}{} : C, R, \frac{1}{\sqrt{|M_R^C|}} \sum _{ S \in M_R^C} \left| S \right\rangle \mapsto C', R', \frac{1}{\sqrt{| V_{R'}^{C'} \backslash M_{R'}^{C'}|}} \sum _{S \in V_{R'}^{C'} \backslash M_{R'}^{C'}} \left| S \right\rangle , \end{aligned}$$

i.e., it updates the current vertex state, but also reduces R to a smaller value \(R'\), and updates the table C into a bigger table \(C'\). It is implemented as Algorithm 5. Although it is not strictly necessary, we have separated the subroutine CHECK into: CHECKP, which finds whether the set contains a new preimage of C, and CHECKC, which finds whether there is a new collision.

[Algorithm 5: pseudocode of EXTRACT]

We now prove the correctness of Algorithm 5. We start with the uniform superposition of marked vertices, i.e., sets \(S \subseteq \{0,1\}^n \backslash \textsf{Preim}(C)\) of size R, which contain at least one solution tuple \(x_1, \ldots , x_r\), which is either a (multi-)collision, or a new preimage.

The first loop removes all new preimages. Each time we measure an element, we collapse on the superposition of sets which contained it. After CHECKP returns 0 for the first time, the state collapses on the uniform superposition of all sets S such that:

$$\begin{aligned} S \subseteq \left( \{0,1\}^n \backslash \textsf{Preim}(C')\right) , |S| = R' = R -t, \Big (\forall z \in S, f(z) \notin \textsf{Im}(C')\Big ) , \end{aligned}$$

where t is the number of iterates of the loop that we had to perform. There is a variable number of such iterates but we expect only one to occur on average, since the typical case is for vertices to contain only one solution.

The second loop will run until there are no collisions anymore. New preimages cannot appear since we extract entire multicollision tuples. At the first loop iterate, assuming that CHECKC returns 1, we collapse on the uniform superposition of sets:

$$\begin{aligned}&S \subseteq \left( \{0,1\}^n \backslash \textsf{Preim}(C')\right) , |S| = R' , \\&\qquad \qquad \qquad \qquad \Big (\forall z \in S, f(z) \notin \textsf{Im}(C') \wedge \exists x,y \in S, x\ne y, f(x) = f(y)\Big ) . \end{aligned}$$

We select one of the solutions \((x_1, \ldots , x_r)\) at random, remove it, and measure the tuple \(x_1, \ldots , x_r\). Let \(u = f(x_1)\). After measurement, the state collapses on all sets that do not contain \(x_1, \ldots , x_r\), and contain no preimage of u.

Since we update \(R'\) and \(C'\) accordingly, we obtain the sets:

$$\begin{aligned} S \subseteq \left( \{0,1\}^n \backslash \textsf{Preim}(C')\right) , |S| = R' , \Big (\forall z \in S, f(z) \notin \textsf{Im}(C')\Big ) . \end{aligned}$$

After repeatedly calling CHECKC and measuring, we will continue extracting collisions until CHECKC returns 0, i.e., we have collapsed on the sets which do not contain a collision. At this point, we have a uniform superposition of:

$$\begin{aligned}&S \subseteq \left( \{0,1\}^n \backslash \textsf{Preim}(C')\right) , |S| = R' , \\&\qquad \qquad \qquad \qquad \qquad \quad \Big ( \forall z \in S, f(z) \notin \textsf{Im}(C') \wedge \forall x,y \in S, f(x) \ne f(y) \Big ) . \end{aligned}$$

This is, by definition, the set of unmarked vertices (see Equation (6)).

Note that for this algorithm to work, we need to maintain invariants on the number of solutions (new preimages and multicollisions) that any subtree contains. These invariants only decrease during the loop iterations, and they are updated accordingly when we remove the solutions from the tree.
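The set-and-table bookkeeping performed by these two loops can be illustrated with a classical analogue. The sketch below only mirrors the data updates (the superpositions, reflections, and measurements of Algorithm 5 are omitted), and all names, including `check`-style inline tests, are ours:

```python
from collections import defaultdict

def extract_solutions(S, f, C):
    """Classical analogue of the extraction loops of Algorithm 5.
    S: set of inputs, f: the function, C: dict mapping image -> tuple of
    known preimages. Returns the residual set S and the enlarged table C."""
    S, C = set(S), dict(C)
    # First loop: remove new preimages of images already stored in C (CHECKP).
    while True:
        new_pre = [z for z in S if f(z) in C]
        if not new_pre:
            break
        z = new_pre[0]              # analogue of measuring one preimage
        S.remove(z)                 # R' = R - 1
        C[f(z)] = C[f(z)] + (z,)    # the table C grows
    # Second loop: remove entire (multi)collision tuples (CHECKC).
    while True:
        buckets = defaultdict(list)
        for x in S:
            buckets[f(x)].append(x)
        cols = {u: xs for u, xs in buckets.items() if len(xs) >= 2}
        if not cols:                # no collision left: "unmarked" set
            break
        u, xs = next(iter(cols.items()))  # analogue of measuring one tuple
        S -= set(xs)                # R' decreases by the tuple size
        C[u] = tuple(xs)            # C' gains the whole multicollision
    return S, C
```

Because whole tuples are removed, the second loop never reintroduces a preimage of \(C'\), matching the argument above.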

6 Searching for Many Collisions, in General

As we have seen, our new algorithm is valid (and tight) for all values of n, m and \(k \le 2n-m\) such that \(k \le \frac{m}{4}\). Two approaches can be used for higher k.

BHT. A standard approach to find multiple collisions, which works when m is small, is to extend the BHT algorithm [8]. We select a parameter \(\ell \), then make \(2^\ell \) queries, and look for \(2^k\) collisions on this list of queries. This is done by a quantum search on \(\{0,1\}^n\) for an input colliding with the list.

There are on average \(2^{2n-m}\) collision pairs in the function, so a random element of \(\{0,1\}^n\) has probability \( \mathcal {O} \left( 2^{n-m} \right) \) of being in a collision pair. This gives \( \mathcal {O} \left( 2^{\ell - m + n} \right) \) collision pairs for the initial list.

Thus, a search for a collision with the list has \( \mathcal {O} \left( 2^{\ell - m + n} \right) \) solutions in a search space of size \(2^n\), and it requires \(\sqrt{\frac{2^n}{2^{\ell - m + n}}} = 2^{(m-\ell )/2}\) iterates.

If this procedure is to output \(2^k\) collisions, we need \(\ell \) such that \(2^{\ell - m + n} \ge 2^k\), i.e., \(\ell - m + n \ge k\). Equalizing the complexities of the two steps gives \(\ell = k + \frac{m-\ell }{2} \implies \ell = \frac{2k}{3} + \frac{m}{3}\), which is valid only for \(k \le 3n - 2m\). For a bigger k, we can repeat this: we find \(2^{3n-2m}\) collisions in time (and memory) \(2^{2n-m}\), and we do this \(2^{k - (3n-2m)}\) times, for a total time \( \widetilde{\mathcal {O}} \left( 2^{k + m -n} \right) \). If we want to restrict the memory to \( \mathcal {O} \left( 2^\ell \right) \), we obtain a time of \( \widetilde{\mathcal {O}} \left( 2^{k+m/2-\ell /2} \right) \).
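As a sanity check of the arithmetic above, the exponents of the extended BHT approach can be tabulated numerically. This is a sketch: exponents are base-2 logarithms, polynomial factors are ignored, and the function name is ours:

```python
def bht_time_memory(k, m, n):
    """(time, memory) exponents for finding 2^k collisions with extended BHT."""
    if k <= 3 * n - 2 * m:
        # balanced regime: ell = 2k/3 + m/3 equalizes the 2^ell queries
        # and the 2^(k + (m - ell)/2) Grover iterations
        ell = 2 * k / 3 + m / 3
        return max(ell, k + (m - ell) / 2), ell
    # otherwise repeat: 2^(k - (3n - 2m)) runs, each finding 2^(3n - 2m)
    # collisions in time and memory 2^(2n - m)
    return k - (3 * n - 2 * m) + (2 * n - m), 2 * n - m
```

In the repeated regime the time exponent simplifies to \(k + m - n\), as stated above.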

Using Our Method. If \(k > m/4\), then the memory limitation in Theorem 5 on \(\ell \) becomes relevant. In that case, as we are restricted to \(\ell \le m/2\), the minimal achievable time is \( \widetilde{\mathcal {O}} \left( 2^{k+m/2-\ell /2} \right) = \widetilde{\mathcal {O}} \left( 2^{k+m/4} \right) \).

Conclusion. The time and memory complexities of the problem are the following (in \(\log _2\) and without polynomial factors):

  • If \(k \le 3n-2m\): \(\frac{2k}{3} + \frac{m}{3}\) time and memory using BHT

  • Otherwise, if \(k \le \frac{m}{4}\): \(\frac{2k}{3} + \frac{m}{3}\) time and memory using our algorithm

  • Otherwise, if \(m \le \frac{4}{3} n\): \(k + m -n\) time and \(2n-m\) memory using BHT

  • Otherwise, if \(m \ge \frac{4}{3} n\): \(k + \frac{m}{4}\) time and \(\frac{m}{2}\) memory using our algorithm

This situation is summarized in Fig. 1, and it allows us to conclude:

Theorem 6

Let \(f : \{0,1\}^n \rightarrow \{0,1\}^m\), \(n \le m \le 2n\) be a random function. Let \(k \le 2n-m\). There exists an algorithm finding \(2^k\) collisions in \( \widetilde{\mathcal {O}} \left( 2^{C(k, m,n)} \right) \) Clifford+T+qRAM gates, and using \( \widetilde{\mathcal {O}} \left( 2^{C(k, m,n)} \right) \) quantum queries to f, where:

$$\begin{aligned} C(k,m,n) = \max \left( \frac{2k}{3} + \frac{m}{3} , k + \min \left( m-n , \frac{m}{4} \right) \right) . \end{aligned}$$
(14)

Proof

We check that: \(k \le 3n-2m \iff \frac{2k}{3} + \frac{m}{3} \ge k + m-n\) and \(k \le \frac{m}{4} \iff \frac{2k}{3} + \frac{m}{3} \ge k + \frac{m}{4}\).    \(\square \)
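Equation (14) and the four-case summary of the conclusion can be cross-checked mechanically. A small sketch (function names are ours):

```python
def C(k, m, n):
    """Exponent C(k, m, n) of Theorem 6, Equation (14)."""
    return max(2 * k / 3 + m / 3, k + min(m - n, m / 4))

def C_by_cases(k, m, n):
    """The same exponent, following the four cases of the conclusion."""
    if k <= 3 * n - 2 * m:
        return 2 * k / 3 + m / 3      # BHT, balanced regime
    if k <= m / 4:
        return 2 * k / 3 + m / 3      # our algorithm
    if m <= 4 * n / 3:
        return k + m - n              # BHT, repeated
    return k + m / 4                  # our algorithm with ell = m/2
```

A scan over all admissible integer parameters \(n \le m \le 2n\), \(k \le 2n-m\) confirms that the two expressions agree.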

We conjecture that the best achievable complexity is, in fact, \(C(k,m,n) = \frac{2k}{3} + \frac{m}{3}\) for any admissible values of k, m and n. It would however require a non-trivial extension of our algorithm, capable of outputting collisions at a higher rate than what we currently achieve.

In terms of time-memory trade-offs, we can summarize the results as:

Theorem 7

(General Time-memory tradeoff). For all \(k \le \ell \le \min (2k/3+m/3,\max (2n-m, m/2))\), there exists an algorithm that computes \(2^k\) collisions using \( \widetilde{\mathcal {O}} \left( 2^\ell \right) \) qubits and \( \widetilde{\mathcal {O}} \left( 2^{k+m/2-\ell /2} \right) \) Clifford+T+qRAM gates and quantum queries to f.

Similarly, as in [15], we conjecture that the trade-off should be achievable for all \(\ell \le 2k/3+m/3\).
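The trade-off curve of Theorem 7 is easy to tabulate (a sketch with an illustrative function name; exponents are in \(\log_2\)):

```python
def tradeoff_time(k, m, n, ell):
    """Time exponent for 2^k collisions with 2^ell memory (Theorem 7)."""
    assert k <= ell <= min(2 * k / 3 + m / 3, max(2 * n - m, m / 2)), \
        "ell outside the admissible range"
    return k + m / 2 - ell / 2
```

At the maximal admissible \(\ell = \frac{2k}{3} + \frac{m}{3}\) this meets the exponent of Theorem 6; the conjecture above states that the same curve should extend to all \(\ell \le 2k/3 + m/3\).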

7 Applications

In this section, we show how our algorithm can be used as a building block for lattice sieving and to solve the limited birthday problem. We also discuss the problem of multicollision search.

7.1 Improvements in Quantum Sieving for Solving the Shortest Vector Problem

In this section, we present the improvement of our reusable quantum walks to lattice sieving algorithms. A lattice \({{\,\mathrm{\mathcal {L}}\,}}= {{\,\mathrm{\mathcal {L}}\,}}(\textbf{b}_1, \ldots , \textbf{b}_d) := \{\sum _{i=1}^d z_i \textbf{b}_i : z_i \in {{\,\mathrm{\mathbb {Z}}\,}}\}\) is the set of all integer combinations of linearly independent vectors \(\textbf{b}_1,\dots ,\textbf{b}_d \in {{\,\mathrm{\mathbb {R}}\,}}^d\). We call d the rank of the lattice and \((\textbf{b}_1, \ldots , \textbf{b}_d)\) a basis of the lattice. The most important computational problem on lattices is the Shortest Vector Problem (SVP). Given a basis for a lattice \({{\,\mathrm{\mathcal {L}}\,}}\subseteq {{\,\mathrm{\mathbb {R}}\,}}^d\), SVP asks to compute a non-zero vector in \({{\,\mathrm{\mathcal {L}}\,}}\) with the smallest Euclidean norm. The main lattice reduction algorithm used for lattice-based cryptanalysis is the famous BKZ algorithm [29]. It internally uses an algorithm for solving (near) exact SVP in lower-dimensional lattices. Therefore, finding faster algorithms to solve exact SVP is critical to choosing security parameters of cryptographic primitives.

Previously, the fastest quantum algorithm solved SVP under heuristic assumptions in \(2^{0.2570d + o(d)}\) time [9]. It applies the MNRS quantum walk technique to the state-of-the-art classical algorithm, called lattice sieving, in which close vectors are combined to obtain shorter vectors at each step. It was noted in [9] that the algorithm could be slightly improved if we could find many marked vertices in a quantum walk without repaying the setup each time, which is exactly what we showed in Sect. 4. We redid the analysis of [9] with this improvement and obtain the following proposition.

Proposition 4

There exists a quantum algorithm that solves SVP under heuristic assumptions in time \(2^{0.2563d + o(d)}\).

Proving this statement requires restating the whole framework and analysis of [9]. We briefly present the main calculation leading to our result, and refer to the full version of the paper [5] for a more comprehensive analysis. Let \(V_d(\alpha )\) be the ratio of the volume of a spherical cap of angle \(\alpha \) to the volume of the d-dimensional sphere. We have \(V_d(\alpha ) = {\text {poly}}(d)\sin ^d(\alpha ) .\)

Proposition 5

The algorithm of [9] has the following asymptotic running time:

$$\begin{aligned} T = \max \{1,N^{c - \zeta }\} \cdot (N + N^{1-c}FAS_1). \end{aligned}$$
(15)

where \( N = \frac{1}{V_d(\pi /3)}, \ \alpha \text { st. } V_d(\alpha ) = N^{-(1-c)}, \ \theta ^*_\alpha = 2\arcsin (\frac{1}{2\sin (\alpha )}), \zeta \text { st. } N^\zeta = N^{2c}V_d(\theta ^*_\alpha ),\) and \(FAS_1\) is the running time of the \(FAS_1\) subroutine.

The authors of [9] use a quantum walk in order to solve the \(FAS_1\) problem.

Proposition 6

([9]). For a parameter \(c_1\), let \(\beta \) st. \(V_d(\beta ) = \frac{1}{N^{c_1}}\), \(\rho _0\) st. \(N^{\rho _0} = \frac{V_d(\beta )}{W_d(\beta ,\theta ^*_\alpha )},\) where \(W_d(\beta ,\theta ^*_\alpha ) = \textrm{poly}(d)\cdot \left( 1 - \frac{2 \cos ^2(\beta )}{1 + \cos (\theta ^*_\alpha )}\right) ^{d/2}\). In order to solve \(FAS_1\) with parameter \(c_1\), it is enough to repeat \(N^{\rho _0}\) times a quantum walk on a graph where we each time need to find \(N^{\zeta -\rho _0}\) marked elements with the following properties

$$ \textsf{S} = N^{c_1}, \ \delta = N^{-c_1}, \ \varepsilon = N^{2c_1 - \rho _0} V_{d-1}(\theta ^*_\alpha ), \ \textsf{U} = 1, \ \textsf{C} = 1.$$

Using Theorem 4, we obtain \(FAS_1 = N^{\rho _0} \cdot \left( \textsf{S} + \frac{N^{\zeta - \rho _0} }{\sqrt{\varepsilon }}\left( \frac{1}{\sqrt{\delta }}\textsf{U} + \textsf{C}\right) \right) .\) We take the following set of parameters: \(c \approx 0.3875, c_1 \approx 0.27\), which gives \(\zeta \approx 0.1568\) and \(\rho _0 \approx 0.1214\). Notice that with these parameters, we are indeed in the range of Theorem 4, since the number of solutions we extract is \(2^k = N^{\zeta - \rho _0} \approx N^{0.0354}\) and the range of the function f on which we search for collisions has size \(2^m = N^{c_1} \approx N^{0.27}\) (the number of points in the code), so we indeed have \(k \le \frac{m}{4}\). The parameters of the quantum walk become:

$$ \textsf{S} \approx N^{0.27}, \ \varepsilon \approx N^{-0.2}, \ \delta \approx N^{-0.27}, \ \textsf{U} = \textsf{C} = 1 . $$

This gives \(FAS_1 \approx N^{\rho _0 + 0.27} \approx N^{0.3915}\). Plugging this into Equation (15), we get a total running time of \(T = N^{1.2347}\), which is equal to \(T = 2^{0.2563d + o(d)}\), slightly improving the previous running time of \(2^{0.2570d + o(d)}\).
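The numerical claims above can be reproduced with a short computation. The sketch below works with exponents in base \(N = 1/V_d(\pi/3)\), so that \(\log_N V_d(\alpha) = \ln \sin (\alpha ) / \ln N^{1/d}\) once all \({\text{poly}}(d)\) factors are dropped; the function names are ours:

```python
from math import asin, cos, log, pi, sin

LN = -log(sin(pi / 3))  # ln(N)/d, since N = 1/V_d(pi/3) = (1/sin(pi/3))^d

def logN_V(alpha):
    """log_N of V_d(alpha) = poly(d) * sin(alpha)^d, poly factors dropped."""
    return log(sin(alpha)) / LN

def sieving_exponents(c, c1):
    """Base-N exponents of Propositions 5 and 6, o(d) terms dropped."""
    alpha = asin(sin(pi / 3) ** (1 - c))        # V_d(alpha) = N^-(1-c)
    theta = 2 * asin(1 / (2 * sin(alpha)))      # theta*_alpha
    zeta = 2 * c + logN_V(theta)                # N^zeta = N^2c V_d(theta*)
    beta = asin(sin(pi / 3) ** c1)              # V_d(beta) = N^-c1
    logN_W = log(1 - 2 * cos(beta) ** 2 / (1 + cos(theta))) / (2 * LN)
    rho0 = -c1 - logN_W                         # N^rho0 = V_d(beta)/W_d(beta, theta*)
    S, delta = c1, -c1
    eps = 2 * c1 - rho0 + logN_V(theta)         # eps = N^(2c1 - rho0) V_{d-1}(theta*)
    # FAS_1 = N^rho0 * (S + N^(zeta - rho0)/sqrt(eps) * (U/sqrt(delta) + C)),
    # with U = C = 1 (exponent 0)
    fas1 = rho0 + max(S, (zeta - rho0) - eps / 2 - delta / 2)
    T = max(0, c - zeta) + max(1, 1 - c + fas1)  # Equation (15), in exponents
    return zeta, rho0, fas1, T
```

With \(c = 0.3875\) and \(c_1 = 0.27\), this returns \(\zeta \approx 0.1568\), \(\rho_0 \approx 0.1214\), \(FAS_1 \approx N^{0.3916}\) and \(T \approx N^{1.2348} \approx 2^{0.2562d}\), matching the stated values up to rounding of the parameters.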

7.2 Solving the Limited Birthday Problem

The following problem is very common in symmetric cryptanalysis. It appears for example in impossible differential attacks [6], but also in rebound distinguishers [16]. In the former case we use generic algorithms to solve the problem for a black-box E, and in the latter, a valid distinguisher for E is defined as an algorithm outputting the pairs faster than the generic one.

Problem 4

(Limited Birthday). Given access to a black-box permutation \(E~: \{0,1\}^n \rightarrow \{0,1\}^n\) and possibly its inverse \(E^{-1}\), given two vector spaces \(\mathcal {D}_{\textrm{in}}\) and \(\mathcal {D}_{\textrm{out}}\) of sizes \(2^{\varDelta _{\textrm{in}}}\) and \(2^{\varDelta _{\textrm{out}}}\) respectively, find \(2^k\) pairs \(x,x'\) such that \(x \ne x', x \oplus x' \in \mathcal {D}_{\textrm{in}}, E(x) \oplus E(x') \in \mathcal {D}_{\textrm{out}}\).

For simplicity, we will focus only on the time complexity of the problem, although some parameter choices require a large memory as well. Classically the best known time complexity is given in [6]:

$$\begin{aligned} \max \left( \min _{\varDelta \in \{ \varDelta _{\textrm{in}}, \varDelta _{\textrm{out}}\}} \left( \sqrt{2^{k + n+1 - \varDelta }} \right) , 2^{k + n+1 - \varDelta _{\textrm{in}}- \varDelta _{\textrm{out}}} \right) . \end{aligned}$$
(16)

This complexity is known to be tight for \(2^k = 1\) [16].

In the quantum setting, we need superposition access to E (and possibly \(E^{-1}\)) to obtain a speedup on this problem. Previously, the methods used in [23] involved only individual calls to Ambainis' algorithm (when there are few solutions) or an adaptation of the BHT algorithm (when there are many solutions).

The quantum algorithm, as the classical one, relies on the definition of structures of size \(2^{\varDelta _{\textrm{in}}}\), which are subsets of the inputs of the form \(T_x = \{ x \oplus v, v \in \mathcal {D}_{\textrm{in}}\}\) for a fixed x. For a given structure \(T_x\), we can define a function \(h_x~: \{0,1\}^{\varDelta _{\textrm{in}}} \rightarrow \{0,1\}^{n - \varDelta _{\textrm{out}}}\) such that any collision of \(h_x\) yields a pair solution to the limited birthday problem. The expected number of collisions of a single \(h_x\) is \(C := 2^{2\varDelta _{\textrm{in}}+ \varDelta _{\textrm{out}}- n}\), and there are three cases:

  1. \(C < 1\): we follow the approach of [23], which is to repeat \(2^k\) times a Grover search among structures, to find one that contains a pair (this is done with Ambainis' algorithm). The time exponent is \(k + \frac{n-{\varDelta _{\textrm{out}}}}{2} - \frac{\varDelta _{\textrm{in}}}{3}\).

  2. \(1< C < 2^k\): we need to consider several structures and to extract all of their collision pairs. Using Theorem 6, this gives a time exponent:

     $$\begin{aligned} \max \left( k + \frac{2}{3}(n-\varDelta _{\textrm{in}}-\varDelta _{\textrm{out}}) ,k + \min \left( n-\varDelta _{\textrm{out}}- \varDelta _{\textrm{in}}, \frac{n-\varDelta _{\textrm{out}}}{4} \right) \right) \end{aligned}$$

  3. \(2^k < C\): we need only one structure. To recover \(2^k\) pairs, we need a time exponent (by Theorem 6):

     $$\begin{aligned} \max \left( \frac{2k}{3} + \frac{n - \varDelta _{\textrm{out}}}{3}, k + \min \left( n-\varDelta _{\textrm{out}}- \varDelta _{\textrm{in}}, \frac{n-\varDelta _{\textrm{out}}}{4} \right) \right) \end{aligned}$$

Finally, we can swap the roles of \(\varDelta _{\textrm{in}}\) and \(\varDelta _{\textrm{out}}\) and take the minimum. Unfortunately this does not lead to an equation as simple as Equation (16).
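Although the resulting expression is not as compact as Equation (16), the case analysis is easy to assemble into a single routine. This is a sketch; the function names and the tie-breaking at the case boundaries are ours:

```python
def lb_exponent_oriented(k, n, d_in, d_out):
    """Time exponent (log2) for the limited-birthday problem, with
    structures of size 2^d_in on the input side."""
    C_exp = 2 * d_in + d_out - n        # log2 of C, collisions per structure
    if C_exp < 0:                        # C < 1: repeat Ambainis' algorithm
        return k + (n - d_out) / 2 - d_in / 3
    if C_exp < k:                        # 1 < C < 2^k: several structures
        return max(k + 2 * (n - d_in - d_out) / 3,
                   k + min(n - d_out - d_in, (n - d_out) / 4))
    # 2^k <= C: a single structure suffices
    return max(2 * k / 3 + (n - d_out) / 3,
               k + min(n - d_out - d_in, (n - d_out) / 4))

def lb_exponent(k, n, d_in, d_out):
    """Swap the roles of D_in and D_out and take the minimum."""
    return min(lb_exponent_oriented(k, n, d_in, d_out),
               lb_exponent_oriented(k, n, d_out, d_in))
```

For instance, with \(n = 128\), \(\varDelta_{\textrm{in}} = 60\), \(\varDelta_{\textrm{out}} = 40\) and \(k = 20\), the swapped orientation (case 2) is the cheaper one.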

7.3 On Multicollision-Finding

A natural extension of this work would be to look for multicollisions.

Problem 5

(r -collision search). Let \(f~: \{0,1\}^n \rightarrow \{0,1\}^m\) be a random function. Find an r-collision of f, that is, a tuple \((x_1, \ldots , x_r)\) of distinct elements such that \(f(x_1) = \ldots = f(x_r)\).

As with collisions, the lower bound by Liu and Zhandry [25] is known to be tight when \(m \le n\). The corresponding algorithm is an extension of the BHT algorithm which constructs increasingly smaller lists of i-collisions, starting with 1-collisions (evaluations of the function f on arbitrary points) and ending with a list of r-collisions.

This algorithm, given in [17, 18], finds \(2^k\) r-collisions in time and memory:

$$\begin{aligned} \widetilde{\mathcal {O}} \left( 2^{k\frac{2^{(r-1)}}{2^r-1}} 2^{m\frac{2^{(r-1)}-1}{2^r-1}} \right) . \end{aligned}$$

As with 2-collisions, it is possible to extend it when \(m > n\). There is, however, a constraint: the list of i-collisions must contain more tuples that extend to an \((i+1)\)-collision than the size of the list of \((i+1)\)-collisions.

The size of each i-collision list is \(N_i = 2^{k\frac{2^r-2^{r-i}}{2^r-1}} 2^{m \frac{2^{r-i}-1}{2^r-1}}\). The probability that an i-collision extends to an \(i+1\)-collision is of order \(2^{n-m}\). Hence, for the algorithm to work, we must have, for all i, \(N_{i+1}/N_i \le 2^{n-m}\). This means:

$$ k \frac{2^{r-i-1}}{2^r-1} - m\frac{2^{r-i-1}}{2^r-1} \le n-m . $$

This constraint is most restrictive for the largest possible i, namely \(i = r-1\). We obtain the following constraint, which subsumes the others:

$$ k \frac{1}{2^r-1}+m\left( 1-\frac{1}{2^r-1} \right) \le n . $$

This gives the point up to which this algorithm meets the lower bound. We could use our new algorithm as a subroutine in this one to find 2-collisions, which would relax the constraint on \(N_2/N_1\). Unfortunately, this does not help to find multicollisions, as the other constraints are more restrictive. More generally, these constraints show that it is not possible to increase the range of the BHT-like r-collision algorithm solely by using an \((r-i)\)-collision algorithm with an increased range.
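The complexity formula and the validity constraints above can be checked numerically. The sketch below (with illustrative function names) computes the time exponent and verifies the ratio conditions through the list-size exponents \(N_i\):

```python
def r_collision_exponent(k, m, r):
    """log2 time/memory of the BHT-like algorithm for 2^k r-collisions."""
    return (k * 2 ** (r - 1) + m * (2 ** (r - 1) - 1)) / (2 ** r - 1)

def list_sizes_ok(k, m, n, r):
    """Check N_{i+1}/N_i <= 2^(n-m) for all i, via the exponents of N_i."""
    def Ni(i):
        return (k * (2 ** r - 2 ** (r - i)) + m * (2 ** (r - i) - 1)) / (2 ** r - 1)
    # each i-collision extends to an (i+1)-collision with probability ~2^(n-m)
    return all(Ni(i + 1) - Ni(i) <= n - m + 1e-9 for i in range(1, r))
```

For \(r = 2\) the exponent collapses to \(\frac{2k}{3} + \frac{m}{3}\), consistent with Theorem 6 in the regime where BHT applies.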