1 Introduction

The generalized birthday problem (GBP) is a generalization of the classical birthday problem of finding a collision between two elements in two lists, introduced by Wagner in 2002 [15]. Since its introduction, Wagner’s K-tree algorithm for GBP has become a widely applicable tool used in cryptanalysis of code-based cryptosystems [2] (that are important designs in post-quantum cryptography), hash functions (such as FSB [4]) and stream ciphers, where it is used as a procedure in fast correlation attacks [6, 10]. Furthermore, it is an important component in improved algorithms for hard instances of the knapsack problem [1, 9].

We consider the most relevant GBP variant in cryptanalysis. For integer parameters \(K>0\) and \(0 \le \ell \le n\), we are given access to a function \(H:\{0,1\}^{\ell } \rightarrow \{0,1\}^n\), and the goal is to find K distinct inputs to H, \(\{x_i\}_{i=1}^{K}\), such that \(\sum _{i=1}^{K}H(x_i) = 0\). For simplicity, we assume that addition is performed bitwise over GF(2), but our algorithms easily extend to work with addition modulo \(2^n\). We view H as a random oracle whose outputs are selected independently and uniformly at random from \(\{0,1\}^n\). The number of K-tuples over \(\ell \)-bit words is about \(2^{K\ell }\), and as the problem places an n-bit constraint on the solution, the expected number of solutions is \(2^{K\ell -n}\). In particular, we expect a solution only if \(K\ell \ge n\). For \(K \ge 2\), the problem can be solved in time \(2^{n/2}\) using a simple collision search. Wagner’s observation was that for values of \(K \ge 4\), the problem can be solved much more efficiently assuming that the number of expected solutions is sufficiently large.
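To make the problem statement concrete, the following is a minimal Python sketch of a naive exponential-time GBP solver (the function name is ours; bit strings are modeled as integers and XOR implements the bitwise addition over GF(2)):

```python
from itertools import combinations
from functools import reduce

def gbp_bruteforce(H, ell, K):
    """Reference GBP solver: scan all K-subsets of the domain {0,1}^ell for a
    tuple of distinct inputs whose H-values XOR to zero. Exponential time;
    only meant to pin down the problem statement for tiny parameters."""
    for xs in combinations(range(1 << ell), K):   # distinct by construction
        if reduce(lambda acc, x: acc ^ H(x), xs, 0) == 0:
            return xs
    return None
```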

As a specific application of GBP to cryptanalysis, we consider the problem of breaking a code-based cryptosystem, which can typically be reduced to solving a syndrome decoding (SD) problem. The input to the SD problem consists of a matrix \(B \in \{0,1\}^{n \times m}\), a word \(s \in \{0,1\}^n\) and an integer \(w > 0\). The goal is to find a word \(e \in \{0,1\}^{m}\) of Hamming weight (at most) w such that \(Be = s\). In other words, we are looking for w columns of B that sum to s over GF(2). We can reduce this problem to GBP with \(K = w\) by defining H(i) to output the i’th column of the matrix (for \(i \in [m]\)). We note that instances of SD which originate from cryptography typically have many solutions, and hence GBP algorithms are relevant. For more details about the application of GBP to cryptanalysis of code-based cryptosystems, we refer the reader to [8].

Wagner’s K-tree algorithm for GBP with \(K=2^k\) (where k is a positive integer) receives as input K lists \(\{L_i\}_{i=1}^{K}\), each containing about \(2^{n/(k+1)}\) strings of n bits, which are assumed to be uniform in \(\{0,1\}^n\). The algorithm returns a K-tuple \(\{y_i\}_{i=1}^{K}\), where \(y_i \in L_i\) such that \(\sum _{i=1}^{K}y_i = 0\). The algorithm can be used to solve GBP assuming that \(\ell \ge n/(k+1)\) by initializing the lists \(\{L_i\}_{i=1}^{K}\) with elements of the form \(y = H(x)\) for arbitrary values of \(x \in \{0,1\}^\ell \).

At a high level, the K-tree algorithm merges its \(2^k\) input lists in a full binary tree structure with k layers. In each layer, the lists are merged in pairs, where each merged pair gives a new list that is input to the next layer and contains words with a larger zero prefix. Finally, the last layer yields a zero word which can be traced back to a K-tuple \(\{x_i\}_{i=1}^{K}\) such that \(\sum _{i=1}^{K}H(x_i) = 0\) as required. The time and memory complexities of the K-tree algorithm are about \(2^{n/(k+1)} = N^{1/(\log K +1)}\), where \(N = 2^n\) (up to constants and small multiplicative factors in n and K), as detailed in Sect. 3. Since any GBP algorithm for a certain value of K can be extended with the same complexity to any \(K' > K\), the time and memory complexities of the K-tree algorithm for general K are \(N^{1/(\log K +1)}\), where \(\log K\) is rounded down to the nearest integer.

Due to the high memory consumption of the K-tree algorithm, an important challenge (already pointed out by Wagner) is to investigate time-memory tradeoff algorithms for GBP, which optimize the time complexity T given only \(2^m = M < 2^{n/(k+1)}\) memory. The trivial time-memory tradeoff algorithm repeatedly builds K lists of size M from arbitrary inputs to H and executes the K-tree algorithm until a solution is found. Simple analysis shows that this algorithm achieves a tradeoff of \(TM^{\log K} = N\), which is very inefficient even when K is moderately large, as the time complexity T increases sharply for memory \(M < N^{1/(\log K + 1)}\).

Improved tradeoffs were first published in [3, 4] (by Bernstein and Bernstein et al., respectively), where the main idea is to execute (what we call) a preparation phase before running the K-tree algorithm. This phase iterates over a portion of the domain space and builds lists that are input to the K-tree algorithm (rather than building them arbitrarily), thus increasing its probability to find a solution. A different approach to the preparation phase based on partial collisions in H was published independently by Nikolić and Sasaki [12] and by Biryukov and Khovratovich [5]. This technique gives the currently best known GBP time-memory tradeoff of \(T^2M^{\log K -1} = N\).

In this paper, we devise a sub-linear time-memory tradeoff \(T^{\lceil (\log K)/2 \rceil + 1 }M^{\lfloor (\log K)/2 \rfloor } = N\) for GBP with \(K \ge 8\). This improves upon the currently best known tradeoff \(T^2M^{\log K -1} = N\) for all \(K \ge 8\). Our tradeoff is applicable whenever \(T^{1/2} \le M \le T\). For the range \(1 < M \le T^{1/2}\), we also improve upon the best known tradeoff for \(K \ge 8\), but our curve formula becomes more complex as K grows. In general, given an instance of GBP, one selects a GBP algorithm that is applicable for the available amount of memory M.

We plot our GBP tradeoff curves for \(K \in \{8,16,32\}\) in Fig. 1, while comparing them to the best known ones [5, 12]. Note that the tradeoff \(T^{\lceil (\log K)/2 \rceil + 1 }M^{\lfloor (\log K)/2 \rfloor } = N\) for the range \(T^{1/2} \le M \le T\) corresponds to the rightmost linear piece of our curve (for \(K=16\) it extends beyond this range), and its improvement compared to the previously best known curve grows with K (visually, the angle between the curves grows with K).

Fig. 1: GBP time-memory tradeoff curves for \(K \in \{8,16,32\}\)

We further improve the best known tradeoffs for values of K that are not powers of 2. Our improved tradeoffs for \(K \in \{6,14\}\) are plotted in Fig. 2.

Fig. 2: GBP time-memory tradeoff curves for \(K \in \{6,14\}\)

Finally, we consider practical settings in which the domain size \(L= 2^\ell \) of H is limited and the K-tree algorithm cannot be directly applied. The best known algorithm for such cases is an extension of the K-tree algorithm, published by Minder and Sinclair [11]. In this restricted setting, we show how to solve many GBP instances more efficiently than the extended K-tree algorithm, improving both its time and memory complexities.

As noted above, previous time-memory tradeoffs applied a preparation phase to initialize several lists and search for a solution among them in (what we call) a list sum phase. In these works the focus was placed on the preparation phase, while the list sum phase applied the K-tree algorithm. In contrast, we focus on the list sum phase and develop algorithms that are superior to the straightforward application of the K-tree algorithm when the available memory is limited. We then carefully combine these algorithms with previous preparation phase techniques to obtain improved time-memory tradeoffs for GBP.

We begin by considering a list sum problem whose input consists of K sorted lists \(\{L_i\}_{i=1}^{K}\) of n-bit words, and the goal is to find a certain number of K-tuples \(\{y_i\}_{i=1}^{K}\), where \(y_i \in L_i\), such that \(\sum _{i=1}^{K}y_i = 0\). There are several memory-efficient exhaustive list sum algorithms known for this problem that output all solutions. These algorithms are the starting points of the framework we develop in this paper. Our framework transforms such an exhaustive list sum algorithm into an efficient GBP algorithm for a given amount of memory. Obviously, an exhaustive list sum algorithm can be directly applied to solve GBP (after initializing \(\{L_i\}_{i=1}^{K}\) accordingly), as the goal is to find only one out of all solutions. However, this trivial application is inefficient, since it does not exploit the fact that we only search for a single solution and, moreover, it does not use a preparation phase.

Our framework consists of three main parts. First, we transform a given exhaustive list sum algorithm to efficiently output a limited number of solutions, obtaining a basic list sum algorithm, optimized for a specific value of \(K = P\). The second part of the framework composes basic algorithms for \(K=P\) in a layered tree structure which resembles the K-tree algorithm (but with arity P instead of 2). This gives optimized list sum algorithms for values of \(K = P^k\) (and additional values) where k is a positive integer. Finally, after optimizing the list sum phase, we combine it with a preparation phase to obtain a memory-efficient GBP algorithm.

Arguably, the most interesting part of our framework is the first part, in which we analyze exhaustive list sum algorithms and transform them to efficiently output a limited number of solutions. There are two classes of exhaustive memory-efficient list sum algorithms relevant to this work: the first class consists of variants of the Schroeppel–Shamir algorithm for solving knapsacks, devised by Howgrave-Graham and Joux [9] (which focused on \(K=4\)) and by Becker et al. [1] (which applied a recursive variant for \(K=16\)). The second class consists of dissection algorithms [7], published by Dinur et al. and used to efficiently solve certain search problems with a limited amount of memory.

In general, both classes of exhaustive list sum algorithms partition the problem on K lists into smaller subproblems, solve these subproblems and merge the solutions to solve the original problem. The difference between the classes is in the way the problem is partitioned: while Schroeppel–Shamir variants partition the problem symmetrically into subproblems of equal sizes, dissection algorithms partition the problem asymmetrically into smaller and larger subproblems. Thus, Schroeppel–Shamir variants work best on values of K which are powers of 2, while dissection works best on “magic numbers” of K (which are not necessarily powers of 2) such as 7 and 11 that exhibit ideal asymmetric partitions.

Even though the focus of this work is on memory-efficient GBP algorithms, for many GBP instances with restricted domains our techniques yield improvements in both time and memory complexities compared to the extended K-tree algorithm [11] (which is the current state-of-the-art). This occurs for values of K which are not powers of 2 (such as \(K=7\)), where our new algorithms extend the dissection framework. The time complexity improvement is due to the fact that symmetric algorithms round the value of K down to the nearest power of 2 and ignore many of the possible solutions. On the other hand, the efficiency of GBP algorithms depends on their ability to find one out of many solutions, and ignoring a large fraction of them in advance is a suboptimal approach.

The rest of the paper is organized as follows. In Sect. 2 we introduce our notations and conventions, and describe preliminaries and previous work in Sect. 3. The first part of our framework that transforms exhaustive list sum algorithms to basic ones is introduced in Sect. 4, while the second part that constructs new layered algorithms is described in Sect. 5. In Sect. 6 we focus on the third part of the framework that combines preparation and list sum phase algorithms to solve GBP. Finally, in Sect. 7 we apply our new algorithms to GBP instances with a restricted domain and conclude the paper in Sect. 8.

2 Notations and conventions

Given an n-bit string x, we label its bits as \(x[1],x[2],\ldots ,x[n]\) (where x[1] is the least significant bit or LSB). Given integers \(1 \le a \le b \le n\), we denote by \(x[a\text {--}b]\) the \((b-a+1)\)-bit string \(x[a],x[a+1],\ldots ,x[b]\).

Let \(F:\{0,1\}^{\ell } \rightarrow \{0,1\}^n\) be a function for \(\ell \le n\). Given parameters \(\ell ' \le \ell \) and \(n' \le n\), define the truncated function \(F_{|\ell ',n'}:\{0,1\}^{\ell '} \rightarrow \{0,1\}^{n'}\) as \(F_{|\ell ',n'}(x') = F(x)[1\text {--}n']\) (where the \(\ell \)-bit string x is constructed by appending \(\ell -\ell '\) zero most significant bits to the \(\ell '\)-bit string \(x'\)).
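A minimal Python rendering of this convention, modeling bit strings as non-negative integers with x[1] the LSB (the helper name is ours):

```python
def truncated(F, ell_p, n_p):
    """Return F_{|ell',n'}: the ell'-bit input x' is padded with zero most
    significant bits (which leaves its integer value unchanged), F is applied,
    and the n' least significant bits of the output are kept."""
    def F_trunc(x_p):
        assert 0 <= x_p < (1 << ell_p)        # x' is an ell'-bit string
        return F(x_p) & ((1 << n_p) - 1)      # keep F(x)[1..n']
    return F_trunc
```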

The generalized birthday problem (GBP) GBP with parameter K is given oracle access to a function \(H:\{0,1\}^{\ell } \rightarrow \{0,1\}^n\) for \(\ell \le n\), and the goal is to find a K-tuple \(\{x_i\}_{i=1}^{K}\), where \(x_i \in \{0,1\}^{\ell }\) are distinct (i.e., \(x_i \ne x_j\) for \(i \ne j\)) such that \(\sum _{i=1}^{K}H(x_i) = 0\). The addition is performed bitwise over GF(2).

We assume in this paper that H is a pseudo-random function. Our goal is to optimize the time complexity T of solving GBP with parameters K and \(\ell \), given \(M = 2^m\) words of memory, each of length n bits. In some settings (such as in [4]) each \(x_i\) in the output K-tuple needs to come from a different domain. This can be modeled using K functions \(\{H_i\}_{i=1}^{K}\). The adaptation of the algorithms we consider to this setting is mostly straightforward.

Typically, GBP algorithms evaluate the function H in a preparation phase in order to set up an instance of the list sum problem.

The list sum problem Given K sorted lists \(\{L_i\}_{i=1}^{K}\), each of \(M=2^m\) words (chosen uniformly at random) of length at least n, the goal is to find S (one, several, or all) K-tuples \(\{y_i\}_{i=1}^{K}\), where \(y_i \in L_i\), such that \((\sum _{i=1}^{K}y_i)[1\text {--}n] = 0\). The number of required solutions S is a parameter of the problem. We note that in our framework, list sum algorithms use about \(M = 2^m\) memory (which is the input size, up to the constant K).

The list sum problem is related to the well-known K-SUM problem, which searches for a single solution \(\sum _{i=1}^{K}y_i=0\) in one input set, as opposed to several lists. Moreover, typically the distribution of words in the input set of the K-SUM problem is arbitrary and it is a worst case problem (whereas we are interested in average case complexity).

2.1 Naming conventions and notations for list sum algorithms

An algorithm that finds all solutions to the list sum problem is called an exhaustive list sum algorithm. Otherwise, we name algorithms that output a limited number of solutions (with a bound on S) according to their internal structure: in general we have basic algorithms and layered algorithms that compose basic algorithms in a tree structure (similarly to the K-tree algorithm).

We refer to a list sum algorithm that solves the list sum problem for a specific value of K as a K-way list sum algorithm. However, when referring to specific list sum algorithms we mostly need a more refined notation that distinguishes them according to the values of K and S (the number of required solutions), and also the type of basic algorithms composed in layered algorithms (which determines the arity of the tree). These parameters are sufficient to uniquely identify each list sum algorithm considered in this paper.

The (unique) non-layered list sum algorithm with parameters KS is denoted by \(A_{K,S}\), where S is the number of solutions it produces. In case the algorithm is exhaustive (outputs all solutions), we simply write \(A_{K}\). For example, \(A_{4}\) is an exhaustive 4-way list sum algorithm, while \(A_{4,1}\) produces only a single solution (and hence can be naturally used to solve GBP with \(K=4\)). In general, we will be interested in exhaustive \(A_{K}\) algorithms, basic \(A_{K,1}\) algorithms that output a single solution and basic \(A_{K,2^m}\) algorithms that produce \(S=2^m\) solutions, allowing to compose basic algorithms and form layered ones.

The (unique) layered list sum algorithm with \(S=1\) and arity P is denoted by \(A_{K}^{P}\) (we do not consider layered algorithms with \(S>1\) in this paper). For example, the K-tree algorithm is denoted by \(A_K^2\), as it merges its input lists in pairs. We note that layered algorithms can be uniquely distinguished by their arity P, while K is left in symbolic form (unlike basic algorithms). We also remark that for any specific value of K, the algorithm \(A_{K}^{K}\) has a single layer and is actually the basic algorithm \(A_{K,1}\) (which is our preferred notation).

When composing basic list sum algorithms in layers, the LSBs of the words in the input lists to the algorithms may already be zeroed by a previously applied basic algorithm, and the task of the succeeding basic algorithm is to output solutions where the next sequence of bits is nullified. In such cases, we can ignore the zero LSBs in the input lists, resulting in a problem that complies with the above definition of the list sum problem (which requires nullifying LSBs).

2.2 Complexity evaluation

The time and memory complexities of our algorithms are functions of the parameters n and K (and also \(L = 2^{\ell }\) for some GBP instances). In the complexity analysis, we ignore multiplicative polynomial (and constant) factors in n and K, which is common practice in exponential-time algorithms. Nevertheless, we note that these factors are relatively small for the algorithms considered.

3 Preliminaries and previous work

The literature relevant to this work is vast and consists of [1, 3,4,5, 7, 9, 12, 14, 15]. In this section we summarize it constructively so we can build upon it in the rest of this paper. We first describe general properties of list sum algorithms in Sect. 3.1. Then, we describe previous exhaustive list sum algorithms \(A_{K}\): in Sect. 3.2 we focus on exhaustive symmetric Schroeppel–Shamir variants for several values of K which are powers of 2 (\(K \in \{4,16\}\) and the basic \(K=2\)), while in Sect. 3.3 we deal with asymmetric exhaustive dissection algorithms. In Sect. 3.4, we show how exhaustive algorithms for \(K=2\) and \(K=4\) were efficiently adapted to basic algorithms that output a limited number of solutions. In Sect. 3.5, we describe the (layered) K-tree algorithm. Next, in Sect. 3.6 we focus on the preparation phase algorithms parallel collision search (PCS) and clamping through precomputation (CTP). We end this section by summarizing the currently best known time-memory tradeoff for GBP in Sect. 3.7.

3.1 General properties of list sum algorithms

From a combinatorial viewpoint, the number of K-tuples \(\{y_i\}_{i=1}^{K}\) in the K lists input to the list sum problem is \(2^{Km}\). Since the problem imposes an n-bit restriction on them, the number of expected solutions is \(2^{Km-n}\). Hence the list sum problem is interesting only if \(m \ge n/K\). If we impose an additional b-bit constraint on the tuples (e.g., by requiring that \((y_1+y_2)[1\text {--}b] = c\) for an arbitrary b-bit value c), then the number of expected solutions drops to \(S = 2^s = 2^{Km-b-n}\).

Nullifying bits In this paper, we mostly use an equivalent statement, where we view n as a parameter: if we search for \(S = 2^s\) solutions to the problem and set an additional b-bit constraint on the solutions, then the number of bits we can nullify is

$$\begin{aligned} n = Km-b-s. \end{aligned}$$
(1)

By default, a list sum algorithm is given the n-bit target value 0, but it can easily be adapted and applied with similar complexity to an arbitrary n-bit target value w, such that it outputs K-tuples \(\{y_i\}_{i=1}^{K}\) where \((\sum _{i=1}^{K}y_i)[1\text {--}n] = w\). This can be done by XORing w into all the words of \(L_1\), re-sorting the list and solving the problem with a target of 0.

A basic list sum algorithm that searches for a single solution (\(S=1\)) for a certain value of K using M memory (such that \(m \ge n/K\)) can be applied with the same complexity to any \(K'> K\). This is done by choosing an arbitrary \((K'-K)\)-tuple \(\{y_i\}_{i=K+1}^{K'}\) from lists \(\{L_i\}_{i=K+1}^{K'}\), and applying the given algorithm with input lists \(\{L_i\}_{i=1}^{K}\) and target value \((\sum _{i=K+1}^{K'}y_i)[1\text {--}n]\). Hence the list sum problem for \(S=1\) does not become harder as K grows, in contrast to exhaustive list sum problems that require all solutions.

We now describe exhaustive list sum algorithms of type \(A_{K}\) for several values of K. We denote by \(T = 2^{\tau _{K}m}\) (for a parameter \(\tau _{K}\)) the time complexity of \(A_{K}\).

3.2 Exhaustive symmetric list sum algorithms \(A_{K}\) for \(K=2^k\)

3.2.1 \(A_{2}\)

The standard list sum algorithm \(A_{2}\) looks for all matches on n bits between two sorted lists of size \(2^m\). There are \(2^{2m-n}\) possible solutions to the problem and \(A_{2}\) finds them in time \(T = 2^m\) (\(\tau _{2} =1\)) assuming that their number is at most \(2^m\), namely, \(2^{2m-n} \le 2^m\) or \(m \le n\).
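A minimal Python sketch of \(A_2\) (here and in the later sketches, words are integers with bit 1 the LSB, XOR implements bitwise addition over GF(2), and a hash index replaces the sorted-list merge at the same cost up to logarithmic factors):

```python
from collections import defaultdict

def a2_all(L1, L2, n):
    """Exhaustive A_2: stream all pairs (y1, y2) in L1 x L2 with
    (y1 ^ y2)[1..n] = 0, i.e. y1 and y2 agree on their n LSBs."""
    mask = (1 << n) - 1
    index = defaultdict(list)
    for y1 in L1:                      # index L1 by its n LSBs
        index[y1 & mask].append(y1)
    for y2 in L2:
        for y1 in index[y2 & mask]:    # each match is a solution
            yield (y1, y2)
```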

3.2.2 \(A_{4}\) [9]

This algorithm was devised in [9] by Howgrave-Graham and Joux as a practical variant of the Schroeppel–Shamir algorithm [13] (see Fig. 3).

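The listing follows the description in [9]: the algorithm iterates over an m-bit intermediate target c; for each c, it stores all pairs from \(L_1 \times L_2\) whose sum equals c on the m LSBs, and streams the analogous pairs from \(L_3 \times L_4\), matching the two partial sums on all n bits. A minimal Python sketch under the integer/XOR conventions of the \(A_2\) sketch above (helper names are ours):

```python
from collections import defaultdict

def pairs_with_low_sum(LA, LB, c, m):
    """Stream all pairs (a, b) in LA x LB with (a ^ b)[1..m] = c."""
    mask = (1 << m) - 1
    index = defaultdict(list)
    for a in LA:
        index[(a ^ c) & mask].append(a)    # b must equal a ^ c on the m LSBs
    for b in LB:
        for a in index[b & mask]:
            yield (a, b)

def a4_all(L1, L2, L3, L4, n, m):
    """Exhaustive A_4: stream all 4-tuples with (y1 ^ y2 ^ y3 ^ y4)[1..n] = 0,
    using ~2^m memory and ~2^{2m} time."""
    mask_n = (1 << n) - 1
    for c in range(1 << m):                # m-bit intermediate target
        table = defaultdict(list)          # n-bit partial sums of the left half
        for y1, y2 in pairs_with_low_sum(L1, L2, c, m):
            table[(y1 ^ y2) & mask_n].append((y1, y2))
        for y3, y4 in pairs_with_low_sum(L3, L4, c, m):
            for y1, y2 in table[(y3 ^ y4) & mask_n]:   # full n-bit match
                yield (y1, y2, y3, y4)
```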

The \(A_{4}\) algorithm enumerates all possible solutions, as any specific solution \(y_1+y_2+y_3+y_4 = 0\) is output when the value of c is set to \((y_1+y_2)[1\text {--}m]\). The expected number of solutions is \(2^{4m-n}\), and assuming that their number is at most \(2^{2m}\) (i.e., \(2^{4m-n} \le 2^{2m}\) or \(m \le n/2\)), its time complexity is \(T = 2^{2m}\), namely \(\tau _{4}=2\) (since the time complexity of the loop body for each value of c is \(2^m\)).

The algorithm may produce up to \(2^{2m}\) solutions, but its memory complexity is only \(2^m\). This is possible since we are not required to store the solutions, but may stream them to another algorithm that will process them on-the-fly. This is an important property that holds for all list sum algorithms described in this paper.

Fig. 3: \(A_{4}\) with \(M=2^{n/4},T=2^{n/2}\)

3.2.3 \(A_{16}\) [1]

An extension of the \(A_{4}\) algorithm will allow us to construct algorithms that find a single solution (\(S=1\)) for a limited range of the memory parameter M. An extension of the \(A_{16}\) algorithm below will yield algorithms that find a single solution for a broader range of memory complexities. The exhaustive \(A_{16}\) algorithm is a recursive variant of the previous \(A_{4}\) algorithm, published by Becker et al. [1] (see Fig. 4).

For each choice of 3m-bit intermediate targets \(c_1,c_2,c_3,c_4\) satisfying \(c_1+c_2+c_3+c_4 = 0\) (a 9m-bit enumeration), the algorithm applies \(A_{4}\) to merge each group of 4 consecutive input lists toward its target \(c_i\) on 3m bits, obtaining four intermediate lists of expected size \(2^m\), and then merges these four lists with an additional application of \(A_{4}\) to nullify the remaining bits.

We iterate over all possible solutions in time \(T = 2^{9m+2m}= 2^{11m}\) (\(\tau _{16}=11\)), assuming the number of solutions satisfies \(2^{16m-n} \le 2^{11m}\) or \(m \le n/5\).

Fig. 4: \(A_{16}\) with \(M=2^{n/16},T=2^{11n/16}\)

3.3 Exhaustive asymmetric list sum algorithms \(A_{K}\) [7]

Dissection is an algorithmic framework for solving certain combinatorial search problems with optimized time-memory tradeoffs. It was introduced by Dinur et al. at CRYPTO 2012 [7] and used in order to break multiple encryption (i.e., an iterated construction of a block cipher with independent keys) and to solve the knapsack and Rubik’s cube problems with improved combinations of time and space complexities. In our context, we view dissection algorithms as memory-efficient asymmetric list sum algorithms of class \(A_{K}\).

Given an \(A_{K'}\) algorithm, it can be trivially utilized as an \(A_{K}\) algorithm for \(K > K'\) (with no additional memory) by enumerating all the \(2^{m(K-K')}\) possible tuples in the first \(K-K'\) lists, and applying \(A_{K'}\) on the remaining \(K'\) lists (with the target sum set to be the sum of the current \((K-K')\)-tuple). However, for certain values of K we can do better than this trivial algorithm, and dissection algorithms define a sequence of values of K for which this efficiency gain occurs.

The first dissection algorithm is defined for \(K=2\) (namely \(A_{2}\)), and it looks for matches in its 2 sorted input lists. The next number in the sequence is \(K=4\) and this dissection algorithm essentially coincides with \(A_{4}\) described above. Next is the 7-way dissection algorithm, which utilizes the 3-way list sum algorithm \(A_{3}\) described below.

3.3.1 \(A_{3}\)

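A minimal Python sketch under the same conventions as above (the algorithm enumerates \(L_1 \times L_2\) and completes each pair from an index of \(L_3\)):

```python
from collections import defaultdict

def a3_all(L1, L2, L3, n):
    """Exhaustive A_3: stream all triples with (y1 ^ y2 ^ y3)[1..n] = 0."""
    mask = (1 << n) - 1
    index = defaultdict(list)
    for y3 in L3:                              # index L3 by its n LSBs
        index[y3 & mask].append(y3)
    for y1 in L1:
        for y2 in L2:                          # 2^{2m} pairs
            for y3 in index[(y1 ^ y2) & mask]: # y3 must cancel y1 ^ y2 on n bits
                yield (y1, y2, y3)
```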

The algorithm enumerates over all \(2^{3m-n}\) solutions in time \(T=2^{2m}\) (\(\tau _{3}=2\)) assuming \(m \le n\).

3.3.2 \(A_{7}\)

For \(K \ge 7\), the asymmetry in dissection algorithms becomes more apparent, as they partition the problem of size K into two subproblems of different sizes. We begin by describing the \(K=7\) algorithm.

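The algorithm iterates over a 2m-bit intermediate target c: for each c, it stores all triples from \(L_1,L_2,L_3\) summing to c on 2m bits (their expected number is \(2^m\)), streams all 4-tuples from \(L_4,\ldots ,L_7\) summing to c on 2m bits via an \(A_4\)-style merge, and matches the two sides on the full n bits. A minimal Python sketch under the conventions above (helper names are ours):

```python
from collections import defaultdict

def index_by_lsbs(L, bits):
    """Index a list by the `bits` least significant bits of its words."""
    idx = defaultdict(list)
    mask = (1 << bits) - 1
    for y in L:
        idx[y & mask].append(y)
    return idx

def a7_all(L, n, m):
    """Exhaustive 7-way dissection A_7 over lists L[0..6]: stream all 7-tuples
    XOR-summing to 0 on n bits, using ~2^m memory and ~2^{4m} time."""
    mask_n, mask_m, mask_2m = (1 << n) - 1, (1 << m) - 1, (1 << 2 * m) - 1
    idx3 = index_by_lsbs(L[2], 2 * m)       # L_3 indexed on 2m LSBs
    idx5 = index_by_lsbs(L[4], m)           # L_5 indexed on m LSBs
    idx7 = index_by_lsbs(L[6], m)           # L_7 indexed on m LSBs
    for c in range(1 << 2 * m):             # 2m-bit intermediate target
        # left side: store all triples from L_1 x L_2 x L_3 with sum = c on 2m bits
        triples = defaultdict(list)         # n-bit partial sum -> matching triples
        for y1 in L[0]:
            for y2 in L[1]:
                for y3 in idx3[(y1 ^ y2 ^ c) & mask_2m]:
                    triples[(y1 ^ y2 ^ y3) & mask_n].append((y1, y2, y3))
        # right side: stream all 4-tuples from L_4..L_7 with sum = c on 2m bits,
        # via an A_4-style merge with an inner m-bit target e
        for e in range(1 << m):
            left = defaultdict(list)        # 2m-bit partial sum -> pairs (y4, y5)
            for y4 in L[3]:
                for y5 in idx5[(y4 ^ e) & mask_m]:
                    left[(y4 ^ y5) & mask_2m].append((y4, y5))
            for y6 in L[5]:
                for y7 in idx7[(y6 ^ e ^ c) & mask_m]:
                    key = ((y6 ^ y7) & mask_2m) ^ c
                    for y4, y5 in left[key]:
                        s = (y4 ^ y5 ^ y6 ^ y7) & mask_n
                        for t in triples[s]:    # full n-bit match with the left side
                            yield t + (y4, y5, y6, y7)
```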

The algorithm enumerates all possible solutions to the problem, since each solution can be decomposed as above. The time complexity of each of the 3-way and 4-way list sum steps in the loop is \(2^{2m}\), while we iterate over \(2^{2m}\) possible values of c. Hence, the expected time complexity is \(T = 2^{4m}\) (\(\tau _{7}=4\)) as long as the number of solutions is at most \(2^{4m}\). Since the expected number of solutions is \(2^{7m-n}\), we require \(7m-n \le 4m\) or \(m \le n/3\).

We also note that the algorithm splits the problem on 7 lists into 2 subproblems of respective sizes 3, 4, while the size 4 problem itself is internally split into two subproblems of sizes 2, 2 by the \(A_{4}\) algorithm. Altogether, the problem of size 7 is split into 3 subproblems of respective sizes 3, 2, 2.

3.3.3 General dissection

We now give the details and analysis of general dissection algorithms. For \(i=0,1,2,\ldots \), we construct a \(K_i\)-way list sum algorithm that runs in time \(2^{\tau _{K_i}m}\), where \(K_i = 1+i(i+1)/2\) and \(\tau _{K_i} = 1+ i(i-1)/2\). Note that, as always, we require \(m \ge n/K_i\) so that at least one solution is expected. Moreover, the expected number of solutions cannot be larger than the runtime, i.e., \(2^{K_im - n} \le 2^{\tau _{K_i}m}\) or \((K_i - \tau _{K_i})m \le n\), giving the requirement \(m \le n/i\) (for a positive i).

In particular, after \(A_{7}\) which corresponds to \(K_3 = 7, \tau _{7}=4\), we have \(K_4 = 11\) and \(\tau _{11} = 7\), i.e. the \(A_{11}\) dissection algorithm has time complexity \(2^{7m}\) (assuming \(m \le n/4\)). Internally, \(A_{K_i}\) recursively splits the problem of size \(K_i = 1+i(i+1)/2\) into i subproblems of sizes \(i,i-1,i-2,\ldots ,3,2,2\) and applies the algorithms \(A_{i},A_{i-1},\ldots ,A_{3},A_{2},A_{2}\), respectively, for various choices of intermediate target values (such as the 2m-bit word c in \(A_{7}\)).

Assume that we have an \(A_{K'}\) algorithm that, given \(M = 2^m\) memory, runs in time complexity \(T' = 2^{\tau _{K'}m} < 2^{n}\) (for an integer \(\tau _{K'} \ge 1\)), and that the algorithm can be applied in streaming mode (as can all list sum algorithms described in this paper). Namely, given a target of \(n' = m(K' - \tau _{K'})\) bits with arbitrary value, it finds all the \(2^{K'm - n'} = 2^{\tau _{K'}m}\) expected solutions in time \(T' = 2^{\tau _{K'}m}\) (assuming \(m \ge n/K'\)). Then, we can construct an \(A_{K}\) algorithm for \(K = 2K' - \tau _{K'}+1\) with time complexity \(T = 2^{K'm}\) (\(\tau _{K} = K'\)) and memory complexity \(M = 2^m\) as follows.

For each value of an \(n'\)-bit intermediate target c, where \(n' = m(K' - \tau _{K'})\), the algorithm applies \(A_{K' - \tau _{K'} + 1}\) to the first \(K' - \tau _{K'} + 1\) lists with target c, storing the expected \(2^m\) partial solutions, and then applies \(A_{K'}\) to the remaining \(K'\) lists with target c in streaming mode, matching each streamed solution against the stored ones on the remaining bits.

The memory complexity of the algorithm is indeed \(M = 2^m\). In terms of time complexity, since \(\tau _{K'} \ge 1\), the \(A_{K'}\) algorithm dominates the \(A_{K' - \tau _{K'} + 1}\) algorithm in the inner loop (it is applied to a subproblem at least as large). Overall, the time complexity is \(T = 2^{m(K' - \tau _{K'}) + \tau _{K'}m} = 2^{K'm}\) as claimed.

The algorithm can be applied in streaming mode and therefore it can be applied recursively, giving a sequence of dissection algorithms. The dissection sequence starts with \(K_0 = 1\) and \(\tau _{K_0} = \tau _{1} = 1\) (a trivial 1-way list sum algorithm). Next, we obtain \(K_1 = 2K_0 - \tau _{K_0} + 1= 2\) with \(\tau _{K_1} = \tau _{2} = K_0 = 1\), and then \(K_2 = 2K_1 - \tau _{K_1} + 1 = 4\) with \(\tau _{K_2} = \tau _{4} = K_1 = 2\). In general, it is easy to verify that \(K_i = 1+i(i+1)/2\) and \(\tau _{K_i} = 1+ i(i-1)/2\), matching the formulas stated at the beginning of this section.

3.4 Basic list sum algorithms

3.4.1 \(A_{2,1}\) and \(A_{2,2^m}\)

The \(A_{2,1}\) algorithm is an extension of the \(A_{2}\) algorithm of Sect. 3.2. It searches for a single solution (\(S = 2^s = 1\)) to the problem by looking for an n-bit match in the two lists. According to (1), \(A_{2,1}\) can nullify \(n = 2m\) bits in time \(T = 2^m\).

For \(A_{2,2^m}\), instead of matching and nullifying 2m bits using \(A_{2,1}\) with \(S=1\), we require \(S = 2^m\) solutions, or \(s=m\). Plugging \(K=2, b=0, s=m\) into (1), we conclude that we match and nullify \(n=m\) bits (in time complexity \(T = 2^m\)).

3.4.2 \(A_{3,1}\)

This algorithm extends the \(A_{3}\) algorithm of Sect. 3.3. It searches for a single solution to the problem and hence can nullify \(n = 3m\) bits in time \(2^{2m}\).

3.4.3 \(A_{4,1}\) and \(A_{4,2^m}\) [9]

The \(A_{4,1}\) algorithm is an extension of the \(A_{4}\) algorithm of Sect. 3.2, also described in [9]. When we search for a limited number of solutions to the list sum problem, we can enumerate over fewer values of the m-bit intermediate target value c in the \(A_{4}\) algorithm. This is equivalent to placing another constraint on the 4-tuple solutions: for \(0 \le v \le m\), we place an \((m-v)\)-bit constraint by only enumerating over \(2^{v}\) values of c. Setting \(v=0\) places an m-bit constraint and reduces the time complexity of the algorithm to \(2^m\), while nullifying \(n = 4m - m = 3m\) bits (according to (1)). In general, we can nullify \(n = 4m - (m-v) = 3m+v\) bits in time \(T = 2^{m+v}\), giving the tradeoff

$$\begin{aligned} TM^2 = 2^{m+v} \cdot 2^{2m} = 2^{3m+v} = 2^n = N. \end{aligned}$$

The tradeoff is only applicable for \(0 \le v \le m\) or \(T^{1/2} \le M \le T\).
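In terms of the \(A_4\) sketch of Sect. 3.2.2, this transformation only restricts the outer loop and stops at the first solution; a hedged fragment, assuming pairs_with_low_sum from that sketch is in scope:

```python
from collections import defaultdict

def a4_single(L1, L2, L3, L4, n, m, v):
    """A_{4,1} sketch: enumerate only 2^v of the 2^m intermediate targets c
    (an (m-v)-bit constraint on the solutions) and stop at the first 4-tuple.
    Expected to succeed when n <= 3m + v; time ~2^{m+v}."""
    mask_n = (1 << n) - 1
    for c in range(1 << v):                 # 2^v arbitrary targets instead of 2^m
        table = defaultdict(list)
        for y1, y2 in pairs_with_low_sum(L1, L2, c, m):
            table[(y1 ^ y2) & mask_n].append((y1, y2))
        for y3, y4 in pairs_with_low_sum(L3, L4, c, m):
            for y1, y2 in table[(y3 ^ y4) & mask_n]:
                return (y1, y2, y3, y4)     # S = 1: first solution wins
    return None
```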

For \(A_{4,2^m}\) we apply a similar algorithm, but (according to (1)) since s is increased by m then n is reduced by m. Denote by \(n'\) the parameter of \(A_{4,1}\), then \(n = n' - m\). Using the tradeoff formula above, we obtain \(TM^2 = 2^{n'} = 2^{n+m} = NM\) or \(TM = N\), applicable once again for \(T^{1/2} \le M \le T\).

3.5 The K-tree algorithm: \(A_K^2\) [15]

We begin by describing the K-tree algorithm for \(K = 4\) (namely, \(A_4^2\)) below. This algorithm is also shown in Fig. 5.

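A minimal Python sketch of \(A_4^2\) under the conventions of Sect. 3 (helper names are ours; for simplicity the final match keeps one pair per partial sum):

```python
from collections import defaultdict

def ktree_4(L1, L2, L3, L4, n, m):
    """A_4^2 (the K-tree algorithm for K = 4): layer 1 applies A_{2,2^m} to each
    pair of lists, layer 2 applies A_{2,1} to the two merged lists."""
    mask_m, mask_n = (1 << m) - 1, (1 << n) - 1

    def merge_pair(LA, LB):
        """A_{2,2^m}: partial sums a ^ b whose m LSBs are zero, tagged with (a, b)."""
        index = defaultdict(list)
        for a in LA:
            index[a & mask_m].append(a)
        return [((a ^ b) & mask_n, (a, b))
                for b in LB for a in index[b & mask_m]]

    Lp1 = merge_pair(L1, L2)                 # ~2^m sums with m LSBs zero
    Lp2 = merge_pair(L3, L4)
    index = {s: pair for s, pair in Lp1}     # sketch: duplicate sums are dropped
    for s, (y3, y4) in Lp2:                  # A_{2,1}: full n-bit match
        if s in index:
            y1, y2 = index[s]
            return (y1, y2, y3, y4)
    return None
```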

The algorithm is composed of 2 layers. The input to the first layer consists of the 4 input lists; it merges the lists in pairs by applying \(A_{2,2^m}\), outputting the lists \(L'_1,L'_2\), each containing \(2^m\) words whose m LSBs are zero (since \(A_{2,2^m}\) nullifies m bits). These lists are input to layer 2, which applies \(A_{2,1}\) to obtain the final solution, nullifying an additional 2m bits. Altogether, \(n = 3m\) bits are nullified in \(T = 2^m = 2^{n/3} = N^{1/3}\) time and \(M = 2^m = N^{1/3}\) memory.

Note that \(A_{4,1}\) above for \(M=T\) is, in fact, the \(A_4^2\) algorithm. Indeed, when \(M=T\), \(A_{4,1}\) is composed of 2 layers of 2-way list sum algorithms.

The general algorithm. The K-tree algorithm for general \(K = 2^k\) is a \(A_K^2\) algorithm that works in k layers as summarized next. For more details, refer to [15].

For any integer \(0 \le j \le k-1\), the input to layer j consists of \(K/2^j\) sorted input lists, where for each word in each list the \(j \cdot m\) LSBs are zero. At layer \(0 \le j < k-1\) the algorithm merges the lists in pairs by applying \(A_{2,2^m}\) and outputting \(K/2^{j+1}\) new lists, each containing \(2^m\) words whose \((j+1) \cdot m\) LSBs are zero (as \(A_{2,2^m}\) nullifies m bits). These lists are input to the next layer \(j+1\).

Finally, the input to layer \(j=k-1\) consists of \(K/2^{k-1} = 2\) lists of expected size \(2^m\) containing words whose \((k-1) \cdot m\) LSBs are zero. The K-tree algorithm then applies \(A_{2,1}\) to obtain the final solution, nullifying an additional 2m bits. Altogether, \(n = (k+1) \cdot m\) bits are nullified in \(T = 2^m = 2^{n/(k+1)} = N^{1/(\log K + 1)}\) time and \(M = 2^m = N^{1/(\log K + 1)}\) memory.

Fig. 5: \(A_4^2\) with \(M=2^{n/3},T=2^{n/3}\)

3.6 Preparation phase algorithms

3.6.1 Parallel collision search [14]

The parallel collision search (PCS) algorithm was published by van Oorschot and Wiener [14] as a memory-efficient technique for finding collisions in an r-bit function \(F:\{0,1\}^r \rightarrow \{0,1\}^r\). Given \(2^m \le 2^r\) words of memory, the algorithm builds a chain structure containing \(2^m\) chains, where a chain starts at an arbitrary point and is computed by iteratively applying F. Each chain is terminated after about \(2^{(r-m)/2}\) evaluations, hence the structure contains a total of about \(2^{m} \cdot 2^{(r-m)/2} = 2^{(r+m)/2}\) distinct points. The fact that the chains are of length \(2^{(r-m)/2}\) ensures that each chain collides with a different chain in the structure with high probability according to the birthday paradox, as the number of relevant pairs of points is \(2^{(r-m)/2} \cdot 2^{(r+m)/2} = 2^r\). Therefore, the structure contains about \(2^m\) collisions.

The collisions can be recovered efficiently by defining a set of \(2^{(r+m)/2}\) distinguished points according to an easily verifiable condition (e.g., the \((r-m)/2\) LSBs of the r-bit word are 0). Each chain in the structure is terminated at a distinguished point (and hence its expected length is \(2^r/2^{(r+m)/2} = 2^{(r-m)/2}\) as required). The PCS algorithm stores the distinguished points sorted in memory and collisions between chains are detected at their distinguished points. The actual collisions are obtained by recomputing the colliding chains.

In total, PCS finds \(2^m\) collisions in an r-bit function in time complexity

$$\begin{aligned} T=2^{(r+m)/2}. \end{aligned}$$
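A minimal Python sketch of the distinguished-point technique under the integer conventions of Sect. 2 (function names are ours; for simplicity all distinguished points are kept in a single table):

```python
import random

def chain_to_dp(F, start, dp_mask):
    """Recompute a chain, returning all its points up to the distinguished point."""
    pts = [start]
    while pts[-1] & dp_mask != 0:
        pts.append(F(pts[-1]))
    return pts

def locate_collision(F, start1, start2, dp_mask):
    """Given two chains ending at the same distinguished point, recompute them
    and return (x, y) with x != y and F(x) == F(y), if the chains properly merge."""
    c1 = chain_to_dp(F, start1, dp_mask)
    c2 = chain_to_dp(F, start2, dp_mask)
    # align both chains at equal distance from the common endpoint
    if len(c1) > len(c2):
        c1 = c1[len(c1) - len(c2):]
    else:
        c2 = c2[len(c2) - len(c1):]
    for x, y, fx, fy in zip(c1, c2, c1[1:], c2[1:]):
        if x != y and fx == fy:
            return (x, y)
    return None                               # identical chains: no proper collision

def pcs(F, r, m, num_collisions):
    """Find collisions in an r-bit function F with ~2^m memory: chains of
    expected length 2^{(r-m)/2}, terminated at distinguished points."""
    dp_bits = (r - m) // 2
    dp_mask = (1 << dp_bits) - 1              # distinguished: dp_bits LSBs are zero
    table, found = {}, []
    while len(found) < num_collisions:
        start = x = random.getrandbits(r)
        for _ in range(1 << (dp_bits + 2)):   # cap the chain length against rare cycles
            if x & dp_mask == 0:
                break
            x = F(x)
        else:
            continue                          # chain entered a cycle; restart
        if x in table and table[x] != start:
            col = locate_collision(F, table[x], start, dp_mask)
            if col is not None:
                found.append(col)
        table[x] = start
    return found
```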

3.6.2 Parallel collision search in expanding functions [14]

Assume that our goal is to find \(2^m\) collisions using \(2^m\) memory in the expanding function \(F:\{0,1\}^{\ell } \rightarrow \{0,1\}^r\), where \((r+m)/2 \le \ell \le r\) (ensuring that \(2^m\) collisions indeed exist in F). We apply PCS to the truncated function \(F_{|\ell ,\ell }\) to find \(2^m\) collisions on \(\ell \) bits of F in \(2^{(\ell + m)/2}\) time. Each such collision extends to a full r-bit collision with probability \(2^{\ell -r}\). In other words, the PCS execution gives an expected number of \(2^{m + \ell - r}\) full collisions in F. To find the required \(2^m\) collisions, we repeat the process \(2^{r-\ell }\) times, giving time complexity of

$$\begin{aligned} T = 2^{r-\ell } \cdot 2^{(\ell +m)/2} = 2^{r + (m-\ell )/2}. \end{aligned}$$

We note that in order to make the different PCS executions essentially independent, we have to use a different flavor of \(F_{|\ell ,\ell }\) in each execution (as done in [14]). For example, we can define the i’th flavor as \(F_{|\ell ,\ell }^i(x) = F_{|\ell ,\ell }(x) + i\).

3.6.3 Clamping through precomputation [3, 4]

The goal here is to find \(2^m\) values \(x_i\) such that \(F(x_i)[1\text {--}r] = 0\) for a parameter r, given a function \(F:\{0,1\}^{\ell } \rightarrow \{0,1\}^n\) (for \(m+r \le \ell \le n\)). This can be done by using clamping through precomputation (CTP) [3, 4], as detailed below.

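A minimal Python sketch of CTP under the integer conventions of Sect. 2 (the function name is ours):

```python
def ctp(F, ell, r, m):
    """Clamping through precomputation: collect 2^m inputs x with F(x)[1..r] = 0
    by trying ~2^{m+r} domain points (assumes m + r <= ell)."""
    mask_r = (1 << r) - 1
    L = []
    for x in range(1 << ell):           # in expectation only ~2^{m+r} iterations run
        if F(x) & mask_r == 0:          # the r LSBs of F(x) are zero
            L.append(x)
            if len(L) == 1 << m:
                break
    return L
```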

After exhausting \(T= 2^{m+r}\) values of \(x_i\), we expect L to contain \(2^{m+r - r} = 2^{m}\) strings that satisfy the r-bit condition \(F(x_i)[1\text {--}r] = 0\), as required.

3.7 Previous GBP tradeoff algorithms for \(K=2^k\) [5, 12]

The best known time-memory tradeoff algorithm for GBP was published independently by Nikolić and Sasaki [12] and by Biryukov and Khovratovich [5]. We describe this algorithm for GBP with \(K=2^k\) and \(M = 2^m\) words of memory, such that \(m \le n/(k+1)\) (see Fig. 6).

Given an integer parameter \(r \ge m\) to be determined later, the algorithm uses the truncated function \(H_{|r,r}\) (as defined in Sect. 2).

The algorithm proceeds in two steps. First, it runs PCS on \(H_{|r,r}\) (with a different flavor per run) to build \(2^{k-1}\) lists, each containing \(2^m\) words of the form \(H(x_1)+H(x_2)\) whose r LSBs are zero. Second, it runs the K-tree algorithm on these \(2^{k-1}\) lists to nullify the remaining \(n-r\) bits.

We now calculate the value of r that ensures that the algorithm succeeds in finding a solution (with high probability). The K-tree algorithm is run using \(2^{k'} = 2^{k-1}\) lists, each of size \(2^m\), and it has to zero the remaining \(n' = n-r\) bits after the PCS is executed. Therefore, according to the analysis of the K-tree algorithm in Sect. 3.5, we require \(m = n'/(k'+1) = (n-r)/k\). This gives \(r = n-mk\).

The time complexity of the PCS algorithm is \(2^{(r+m)/2}= 2^{(n-m(k-1))/2}\), while the time complexity of the K-tree algorithm is \(2^m \le 2^{(r+m)/2}\) (and can be neglected).

Therefore, the total time complexity is \(T = 2^{(n-m(\log K-1))/2}\). This gives a time-memory tradeoff of

$$\begin{aligned} T^2M^{\log K -1} = N, \end{aligned}$$

assuming that \(M \le N^{1/(\log K + 1)}\). Note that when \(M = N^{1/(\log K + 1)}\), the PCS is similar to the first layer of the K-tree algorithm, as the function \(H_{|r,r}\) is iterated only once. In addition, observe that the two steps of the algorithm are not balanced since the time complexity of PCS is larger than the time complexity of the K-tree algorithm.

Fig. 6: Previous GBP tradeoff algorithm for \(K=8\) with \(M=2^{n/8},T=2^{3n/8}\)

4 Construction of new basic list sum algorithms

The first part of our framework transforms exhaustive list sum algorithms (of type \(A_{K}\)) into basic ones of types \(A_{K,1}\) and \(A_{K,2^m}\) (that are useful for devising layered algorithms). In this section we transform both exhaustive symmetric and asymmetric algorithms described in Sects. 3.2 and 3.3, respectively. The most relevant basic list sum algorithms obtained in this section and in [9] are summarized in the first four entries of Table 1.

Table 1 Basic list sum algorithms

4.1 Preliminary construction and analysis of basic list sum algorithms

Recall that we denote the time complexity of \(A_{K}\) by \(2^{\tau _{K}m}\) for a parameter \(\tau _{K}\). As we show next, the time-memory tradeoff for \(A_{K,1}\) is of the form \(TM^{\alpha _K}=N\) for \(\alpha _K = K-\tau _{K}\).

The basic idea generalizes the one used to construct \(A_{4,1}\) in Sect. 3.4. We deal with an algorithm \(A_{K}\) that partitions the problem of size K into several smaller subproblems, solves each one for various choices of intermediate target values and, for each such choice, merges the outputs, aiming to obtain a final solution. When we iterate over a subset that contains a \(2^{-b}\) fraction of the possible intermediate target values, we essentially set an additional b-bit constraint on the returned solutions. Ideally, this allows us to reduce the time complexity of the algorithm by a factor of \(2^b\) to \(2^{\tau _{K}m - b}\) at the expense of nullifying fewer bits: recall from (1) that by setting a b-bit constraint on the solutions, we can nullify \(n = Km-b-s = Km-b\) bits (as \(s=0\) for \(A_{K,1}\)). Therefore, we obtain a tradeoff of

$$\begin{aligned} TM^{K-\tau _{K}}=N. \end{aligned}$$
(2)

Indeed, after setting the b-bit constraint, we hope to reduce the time complexity to \(T = 2^{\tau _{K}m - b}\) and obtain \(TM^{K-\tau _{K}} = 2^{\tau _{K}m - b + m(K-\tau _{K})}= 2^{Km-b} = 2^n = N\). We stress that this is an ideal formula which cannot always be achieved using a concrete algorithm. We carefully design such algorithms below, aiming to apply the ideal formula to the widest range of parameters possible. This will be relatively simple for symmetric list sum algorithms, but requires deeper insight for asymmetric algorithms.

As deduced above in (2), ideally the tradeoff curve of a basic list sum algorithm of type \(A_{K,1}\) is of the form

$$\begin{aligned} TM^{\alpha _K}=N \end{aligned}$$
(3)

for a constant \(\alpha _K = K-\tau _{K}\). When considering the variant \(A_{K,2^m}\), we require \(2^m\) solutions and the number of bits that can be nullified is reduced from n to \(n-m\). Consequently, the tradeoff becomes \(TM^{\alpha _K}=N'\) for \(N' = 2^{n-m}\), giving

$$\begin{aligned} TM^{\alpha _K-1}=N. \end{aligned}$$
(4)

As a result, we do not need to analyze \(A_{K,2^m}\) separately.

4.2 Basic symmetric list sum algorithms

The algorithms \(A_{4,1}\) and \(A_{4,2^m}\) for \(K=4\) (that are applicable in the range \(T^{1/2} \le M \le T\)) were already constructed in Sect. 3.4, where we showed that indeed \(\alpha _4 = 4 - \tau _{4} = 2\). We continue with \(K=16\).

4.2.1 \(A_{16,1}\) and \(A_{16,2^m}\)

Following the general approach above, we extend the 16-way list sum algorithm \(A_{16}\) of Sect. 3.2. Since the time complexity of \(A_{16}\) is \(2^{11m}\), we have \(\tau _{16} = 11\) and \(\alpha _{16} = 16-\tau _{16} = 5\) as given by (2) and (3). Below, we perform this computation in more detail and calculate the range for which this tradeoff is applicable.

If we fix all the words \(c_1,c_2,c_3,c_4\) in \(A_{16}\) (such that their sum is 0), we cast a constraint of 9m bits on the 16-tuples and can nullify \(n = 16m - 9m = 7m\) bits in time complexity \(2^{2m}\) (which is the time complexity of the \(A_{4}\) algorithms).

More generally, when we iterate over \(2^v\) choices of \(c_1,c_2,c_3,c_4\), we cast a \((9m-v)\)-bit constraint on the 16-tuples and nullify \(n = 16m - (9m-v)= 7m+v\) bits in time complexity \(T=2^{2m+v}\), giving a tradeoff of

$$\begin{aligned} TM^5=N, \end{aligned}$$

namely \(\alpha _{16} = 5\) as expected. Since we can choose any \(0 \le v \le 9m\), the tradeoff is applicable for \(T^{1/11} \le M \le T^{1/2}\).

4.2.2 Beyond 16-way list sum algorithms

In order to extend the tradeoff curve of \(A_{4,1}\) to smaller memory ranges of \(M \le T^{1/2}\) we squared K. We can continue to extend the curve to very small memory values in a similar way by defining \(A_{K}\) for \(K = 16^2 = 256\) and transforming it to \(A_{K,1}\). For even smaller memory ranges, we use \(K = 256^2 = 2^{16}\) and so forth.

4.3 Basic asymmetric list sum algorithms

4.3.1 \(A_{7,1}\) and \(A_{7,2^m}\)

We extend the 7-way dissection \(A_{7}\) of Sect. 3.3, whose time complexity is \(2^{\tau _{7}m} = 2^{4m}\) to \(A_{7,1}\). According to the preliminary analysis above, we have \(\alpha _{7} = 7-\tau _{7} = 3\), as obtained in more detail below.

If we fix the 2m-bit value c in the loop of \(A_{7}\), we set a 2m-bit constraint and can nullify \(7m-2m = 5m\) bits in time \(2^{4m - 2m}= 2^{2m}\). In general, when we iterate over \(2^v\) values of c, we nullify \(n = 5m+v\) bits in time \(T=2^{2m+v}\), giving

$$\begin{aligned} TM^3=N, \end{aligned}$$

namely \(\alpha _{7} = 3\) as obtained above. Since we can choose any \(0 \le v \le 2m\), the tradeoff is applicable for \(T^{1/4} \le M \le T^{1/2}\). The algorithm for the specific parameters \(M=2^{n/5},T=2^{2n/5}\) is sketched in Fig. 7.

Fig. 7: \(A_{7,1}\) with \(M=2^{n/5},T=2^{2n/5}\)

4.3.2 \(A_{11,1}\) and \(A_{11,2^m}\)

We consider the next value of \(K=11\) in the dissection sequence described in Sect. 3.3, which has time complexity of \(T = 2^{7m}\) (nullifying 11m bits), i.e., \(\tau _{11}=7\). The main loop of \(A_{11}\) splits the problem into subproblems of sizes 4 and 7, while iterating over \(2^{3m}\) values of a 3m-bit intermediate target. To construct \(A_{11,1}\), we can simply fix this 3m-bit value, which reduces the time complexity to \(2^{4m}\) (nullifying only 8m bits). In general, we obtain the tradeoff \(TM^4 = N\) (\(\alpha _{11} = 11-\tau _{11} = 4\)) for \(T^{1/7} \le M \le T^{1/4}\). Interestingly, we can recursively fix more values and extend this tradeoff to \(T^{1/7} \le M \le T^{1/2}\), which is crucial when the number of solutions is large. Below, we describe the algorithm for \(M = T^{1/2}\) (i.e., \(M=2^{n/6},T=2^{2n/6}\)). This algorithm is sketched in Fig. 8.

The algorithm fixes both the 3m-bit intermediate target of the top-level split into subproblems of sizes 4 and 7 and the 2m-bit internal target c of the \(A_{7}\) subproblem. In the first step, it solves the 4-list subproblem with \(A_{4}\) toward the fixed 3m-bit target, storing the expected \(2^m\) partial solutions; in the second step, it runs \(A_{7}\) with its internal target fixed, matching each streamed solution against the stored ones.

The complexity of both steps is \(2^{2m}\), hence \(T = 2^{2m}\). Altogether, \(3m + 2m = 5m\) bits are fixed and \(n = 11m-5m = 6m\) bits are nullified. Therefore, \(M=2^{n/6},T=2^{2n/6}\) as claimed. The ability to recursively fix target values (while maintaining the tradeoff of (3)) is a distinct feature of asymmetric algorithms. Next, we elaborate on which and how many values can be fixed this way for general K.

Fig. 8: \(A_{11,1}\) with \(M=2^{n/6},T=2^{2n/6}\)

4.3.3 Generic analysis of basic asymmetric list sum algorithms

We analyze the transformation of the \(K_i\)-way dissection algorithm \(A_{K_i}\) (mentioned in Sect. 3.3) to \(A_{K_i,1}\). As described in Sect. 3.3, \(A_{K_i}\) for \(K_i = 1+i(i+1)/2\) runs in time \(2^{\tau _{K_i}m}\) for \(\tau _{K_i} = 1+ i(i-1)/2\). Therefore \(\alpha _{K_i} = K_i - \tau _{K_i} = i\), ideally giving the tradeoff

$$\begin{aligned} TM^i = N. \end{aligned}$$

Determining the range of parameters for which this tradeoff applies is more subtle, as demonstrated for \(A_{11,1}\) above. Recall from Sect. 3.3 that \(A_{K_i}\) splits the problem into subproblems of sizes \(i,i-1,i-2,\ldots ,3,2,2\), and applies the algorithms \(A_{i},A_{i-1},\ldots ,A_{3},A_{2},A_{2}\), respectively. Hence, the time complexity of the algorithm cannot be reduced below the time complexity of \(A_{i}\), which is \(2^{\tau _{i}m}\). In conclusion, the tradeoff is applicable in the range \(T^{1/\tau _{K_i}} \le M \le T^{1/\tau _{i}}\).

For example, for \(i=3\), we have \(K_3 = 7, \tau _{7} = 4\) (as the time complexity of \(A_{K_3} = A_{7}\) is \(2^{4m}\)) and \(\tau _{3} = 2\) (as the time complexity of \(A_{i} = A_{3}\) is \(2^{2m}\)). Therefore, we obtain the tradeoff \(TM^3=N\), applicable for \(T^{1/4} \le M \le T^{1/2}\), which indeed coincides with the \(A_{7,1}\) tradeoff obtained above. For \(i=4\), we have \(K_4 = 11\) and obtain the tradeoff \(TM^4=N\), applicable for \(T^{1/7} \le M \le T^{1/2}\).

5 Construction of new multiple-layer list sum algorithms

The second part of our framework uses the basic \(A_{P,1}\) and \(A_{P,2^m}\) algorithms developed in the previous sections in order to construct multi-layered algorithms \(A_{K}^{P}\). We recall that P is the arity of the tree of the multi-layered list sum algorithm (e.g., the K-tree algorithm uses \(P=2\), as it merges the lists in pairs). In this section we construct the algorithm \(A_{K}^{P}\) for any given choice of K, P, M. However, recall from Sect. 2 that P is not a formal parameter of the list sum problem, hence it is a free parameter. Therefore, given the parameters K and M, one should apply the algorithm \(A_{K}^{P}\) for a value of P that minimizes the time complexity.

The most relevant multi-layer list sum algorithms obtained in this section and in [15] are summarized in Table 2.

Table 2 Multiple-layer list sum algorithms

The analysis for \(A_{K}^{P}\) will use the parameter \(\alpha _{P}\) established for \(A_{P,1}\) according to its tradeoff curve (3).

5.1 Generic construction and analysis of multiple-layer algorithms

Given K and P, we write \(K = P^{k} \cdot Q\), where \(1 \le Q < P\) and P, k, Q are integers (if K is not of the required form, we round it down to the largest \(K'\) of this form and apply the algorithm for \(K'\)). For example, if \(K = 32\) and \(P = 4\) then \(Q = 2\), since \(32 = 4^2 \cdot 2\). In this decomposition, we have \(k = \log _{P} K\) (the logarithm is rounded down to the nearest integer). We now construct the algorithm \(A_{K}^{P}\).

If \(Q=1\), \(A_{K}^{P}\) has \(k-1\) layers of \(A_{P,2^m}\) (merging the lists in groups of P, where each merge outputs a list of \(2^m\) inputs into the next layer) and a final layer of \(A_{P,1}\). If \(Q>1\), \(A_{K}^{P}\) has k layers of \(A_{P,2^m}\) and a final layer of \(A_{Q,1}\). For example, \(A_{16}^{4}\) has \(\log _{P} K - 1 = 2-1 = 1\) layer of \(A_{4,2^m}\) and one layer of \(A_{4,1}\), while \(A_{32}^{4}\) has \(\log _{P} K = 2\) layers of \(A_{4,2^m}\) and one layer of \(A_{2,1}\).

First, we analyze the case of \(K = P^{k}\) (namely, \(Q=1\)), based on the tradeoff parameter \(\alpha _{P}\) for \(A_{P,1}\) (as specified in Eq. (3)). Fix a parameter \(n'\) such that each of the \(k-1\) layers of \(A_{P,2^m}\) nullifies \(n'\) bits in time complexity \(2^{n' - (\alpha _{P}-1)m}\), according to the tradeoff curve (4) for \(A_{P,2^m}\). Altogether, \(n'(k-1)\) bits are nullified in these layers and \(n - n'(k-1)\) remain to be nullified by the final layer \(A_{P,1}\) in time \(2^{n - n'(k-1) - \alpha _{P} m}\). In order to balance the algorithm, we equate the time complexities of the layers by setting \(n' - (\alpha _{P}-1)m = n - n'(k-1) - \alpha _{P} m\) or \(n' = (n-m)/k\). Consequently, the time complexity of the layered algorithm is \(T = 2^{n' - (\alpha _{P}-1)m} = 2^{(n-m)/k - (\alpha _{P}-1)m} = 2^{(n-m(k(\alpha _{P}-1)+1))/k}\). This gives a time-memory tradeoff of \(T^{k}M^{k(\alpha _{P}-1)+1} = N\) or

$$\begin{aligned} T^{\log _{P} K}M^{\log _{P}K \cdot (\alpha _{P}-1)+1} = N. \end{aligned}$$
(5)

It is applicable for the same time-memory range as \(A_{P,1}\) (and \(A_{P,2^m}\)).

For example, in the K-tree algorithm, we have \(P =2\) and \(\alpha _2= 2 - \tau _{2} = 1\). Hence, the tradeoff of \(T^{\log _{P} K}M^{\log _{P}K \cdot (\alpha _2-1)+1}\) collapses to \(T^{\log K}M = N\). Setting \(T=M\) (which is the only point in which the K-tree algorithm is directly applicable) gives the standard formula of \(T^{\log K +1} = N\) or \(T = N^{1/(\log K +1)}\).

In case \(K = P^{k} \cdot Q\) for \(1< Q < P\), the generic analysis becomes more involved and depends on the time-memory tradeoff curve of \(A_{Q,1}\).

In this paper we focus on \(Q=2\), where the final merge is a basic \(A_{2,1}\) algorithm that runs in fixed time complexity of \(2^m\) and nullifies 2m bits. Fix a parameter \(n'\) such that each of the k layers of \(A_{P,2^m}\) nullifies \(n'\) bits in time complexity \(2^{n' - (\alpha _{P}-1)m}\). Altogether, \(n'k\) bits are nullified in these layers and \(n - n'k\) remain to be nullified by the final layer \(A_{2,1}\) in time \(2^{m}\). Since the final 2-way list sum algorithm nullifies 2m bits, we have \(n - n'k = 2m\) or \(n' = (n-2m)/k\).

The algorithm’s time complexity is dominated by the first k layers and is therefore \(T = 2^{n' - (\alpha _{P}-1)m} = 2^{(n-2m)/k - (\alpha _{P}-1)m} = 2^{(n- m(k(\alpha _P-1)+2))/k}\). This gives a time-memory tradeoff of

$$\begin{aligned} T^{\log _{P}K}M^{\log _{P}K \; \cdot (\alpha _{P}-1)+2} = N, \end{aligned}$$
(6)

applicable for the same time-memory parameter range as \(A_{P,1}\) (and \(A_{P,2^m}\)).

5.2 Analysis of specific multiple-layer list sum algorithms

We analyze the \(A_{K}^4\) algorithm according to the generic approach above. The analysis of the rest of the layered algorithms \(A_{K}^P\) summarized in Table 2 is obtained in a similar manner by plugging the relevant \(\alpha _{P}\) parameter into (5).

5.2.1 \(A_{K}^4\)

For arity \(P=4\), we only analyze values of K which are powers of 2 (as this allows direct comparison to previous tradeoffs for GBP). For \(A_{4,1}\), we have \(\alpha _4 = 2\), established in Sect. 4.2.

In case \(K=4^{k}\), we plug \(\log _{P} K = \log _{4} K = \log K /2\) and \(\alpha _P = \alpha _4 = 2\) into (5), obtaining

$$\begin{aligned} T^{(\log K)/2}M^{(\log K)/2 +1} = N, \end{aligned}$$

applicable for \(T^{1/2} \le M \le T\) (as the \(A_{4,1}\) and \(A_{4,2^m}\) algorithms).

We demonstrate the \(A_{16}^{4}\) algorithm (that works in two layers with \(Q=1\)) below.

The first layer merges the 16 input lists in groups of 4 by applying \(A_{4,2^m}\), nullifying \(n' = (n-m)/2\) bits per group and outputting 4 lists of \(2^m\) words each; the second layer applies \(A_{4,1}\) to these 4 lists, nullifying the remaining \(n - n'\) bits.

In case \(K=2 \cdot 4^{k}\), we have \(P = 4\), \(Q=2\) and \(\log _{4} K = (\log K - 1) /2\). Plugging these values into (6) we obtain

$$\begin{aligned} T^{(\log K - 1) /2}M^{(\log K - 1) /2 + 2} = N, \end{aligned}$$

applicable for \(T^{1/2} \le M \le T\).

Unified formula for powers of 2 Unifying the tradeoff formulas obtained above to any \(K = 2^{k}\), we obtain

$$\begin{aligned} T^{\lfloor (\log K)/2 \rfloor }M^{\lceil (\log K)/2 \rceil +1} = N, \end{aligned}$$

applicable for \(T^{1/2} \le M \le T\).

6 Construction of new algorithms for the generalized birthday problem

The third part of our framework combines preparation phase and list sum algorithms (for \(S=1\)) to obtain new GBP algorithms. It combines either PCS (memory-consuming parallel collision search) or CTP (clamping through precomputation) with a given list sum algorithm. The generic formulas are then used to obtain improved tradeoffs for various specific K values (some of which are specified in Table 3). We focus on values of K that allow direct comparison to previous tradeoff curves in addition to values that are relevant to Sect. 7, where we analyze concrete GBP instances.

Table 3 Time-memory tradeoffs of GBP algorithms

For a parameter K, we construct a GBP algorithm assuming we have a list sum algorithm A (which can be a basic or layered algorithm) with \(S=1\) for a parameter \(K'\) (whose value will be either \(K' = K\) or \(K' = K/2\)). We assume that the time-memory tradeoff curve of A for \(K'\) is \(T^{\beta }M^{\gamma } = N\) for parameters \(\beta ,\gamma \). Our analysis also uses a parameter \(\delta \), which specifies the lower range limit for the tradeoff algorithm of A as \(M \ge T^{1/\delta }\). For example, we established the tradeoff \(TM^3=N\) for \(A_{7,1}\) in the range \(T^{1/4} \le M \le T^{1/2}\), hence we have \(\beta =1,\gamma =3,\delta = 4\). We will derive a basic tradeoff for \(M \ge T^{1/\delta }\) and then extend it to \(M < T^{1/\delta }\). Furthermore, the tradeoffs depend on the input range size \(L = 2^{\ell }\), since the complexity of the preparation phase algorithm varies according to whether or not \(\ell \) is below a certain threshold. For example, when applying PCS with a small value of \(\ell \), we have to apply it to an expanding function and adapt the complexity as specified in Sect. 3.6.

Altogether, a GBP tradeoff formula depends on a triplet of parameters (preparation phase algorithm, memory range, input size range) and there are \(2^3=8\) such possible triplets. However, only 6 are relevant in this paper and they are summarized in Table 4. Deriving these formulas is a mechanical procedure of parameter optimization. Below, we explicitly derive the first 4 tradeoffs for the specific case of \(K=14\) with the PCS preparation phase.

Table 4 Generic GBP time-memory tradeoffs formulas

6.1 Tradeoff algorithm for \(K=14\)

For \(K=14\) with the PCS preparation phase, we use an algorithm A for \(K' = K/2 = 7\), namely \(A_{7,1}\). The tradeoff formula for \(A_{7,1}\) is \(TM^3=N\) (i.e., its time complexity is \(T = 2^{n-3m}\)) in the range \(T^{1/4} \le M \le T^{1/2}\). The combination algorithm uses the truncated function \(H_{|r,r}\) (as defined in Sect. 2) for a parameter \(r \ge m\), set below to optimize the algorithm.


In the first step, we execute PCS with parameter r in time \(2^{(r+m)/2}\) (according to Sect. 3.6). In the second step we nullify the remaining \(n' = n-r\) bits by using \(A_{7,1}\) in time complexity \(2^{n'-3m} = 2^{n-r-3m}\). Then, to balance the two steps we require \((r+m)/2 = n-r-3m\) or \(r = (2n-7m)/3\), giving time complexity of \(T = 2^{(n-2m)/3}\) and a tradeoff of

$$\begin{aligned} T^3 M^2 = N. \end{aligned}$$

This matches Tradeoff 1 for \(K=14\) in Table 4 (recall that for \(A_{7,1}\), \(\beta =1,\gamma =3,\delta = 4\)). This tradeoff is valid for \(T^{1/4} \le M \le T^{1/2}\) (as \(A_{7,1}\)). The algorithm for parameters \(T=N^{1/4},M=N^{1/8}\) is sketched in Fig. 9.

When \(M < T^{1/4}\), we can extend the tradeoff by applying \(A_{7,1}\) to nullify fewer bits (i.e., with a smaller value of \(n'\)), implying that PCS will nullify more bits and dominate the complexity of the algorithm, which becomes \(T = 2^{(r+m)/2} = 2^{(n-n'+m)/2}\). In order to calculate \(n'\), we use the tradeoff curve \(\hat{T}M^3=N'\) of \(A_{7,1}\) at its lower range \(M = \hat{T}^{1/4}\) or \(\hat{T} = M^{4}\) (here, \(\hat{T}\) denotes the time complexity of \(A_{7,1}\)) and obtain \(N' = M^{7}\), namely \(n' = 7m\). Therefore, \(T = 2^{(n-n'+m)/2} = 2^{(n - 6m)/2}\), giving the tradeoff

$$\begin{aligned} T^2 M^6 = N. \end{aligned}$$

This matches Tradeoff 2 in Table 4.

6.1.1 Restricted domain

Recall from the first tradeoff that \(r = (2n-7m)/3\). In case \(\ell < r = (2n-7m)/3\) (the domain size of H is \(2^\ell < 2^r\)), we are forced to use \(H_{|\ell ,r}\), which is an expanding function. The time complexity of PCS for the expanding function \(H_{|\ell ,r}\) is \(2^{r + (m-\ell )/2}\) (as specified in Sect. 3.6), while the time complexity of \(A_{7,1}\) remains \(T = 2^{n-r-3m}\). Balancing the steps in this case gives \(r + (m-\ell )/2 = n-r-3m\) or \(r = n/2 - 7m/4 + \ell /4\), and time complexity of \(T = 2^{r + (m-\ell )/2} = 2^{n/2 - 5m/4 - \ell /4}\). This gives the formula

$$\begin{aligned} T^{2} M^{5/2} L^{1/2} = N, \end{aligned}$$

matching Tradeoff 3 in Table 4.

When \(M < T^{1/4}\), the steps cannot be balanced as above and the time complexity becomes \(2^{r + (m-\ell )/2}\) (dominated by PCS). Here, \(r = n-n' = n - 7m\) (\(n' = 7m\) as in the case where the domain is not restricted). We obtain \(T = 2^{n - 13m/2 - \ell /2}\), giving the formula

$$\begin{aligned} TM^{13/2}L^{1/2} = N \end{aligned}$$

and matching Tradeoff 4 in Table 4.

Fig. 9: GBP algorithm for \(K=14\) with \(M=2^{n/8},T=2^{n/4}\)

6.2 Tradeoff formulas for \(K=2^k\) (and \(K \ge 8\))

We now derive improved GBP tradeoffs for \(K=2^k\) assuming that \(K \ge 8\). These tradeoffs are obtained by combining PCS with multiple layers of the 4-way list sum algorithm \(A_{K/2}^4\) (described in Sect. 5) as the generic algorithm A. The formula is calculated according to Tradeoff 1 in Table 4.

We recall the formula of \(A_{K/2}^4\) from Table 2, which is \(T^{\lfloor (\log K-1)/2 \rfloor }M^{\lceil (\log K-1)/2 \rceil +1} = N\), namely \(\beta = \lfloor (\log K-1)/2 \rfloor = \lceil (\log K)/2 \rceil - 1\) and \(\gamma = \lceil (\log K-1)/2 \rceil +1 = \lfloor (\log K)/2 \rfloor + 1\). Adding 2 to \(\beta \) and reducing \(\gamma \) by 1 (as in Tradeoff 1 in Table 4), we obtain

$$\begin{aligned} T^{\lceil (\log K)/2 \rceil + 1 }M^{\lfloor (\log K)/2 \rfloor } = N, \end{aligned}$$

applicable for \(T^{1/2} \le M \le T\).

6.2.1 Extending the tradeoffs to \(M < T^{1/2}\)

In case \(M < T^{1/2}\), PCS dominates the algorithm’s time complexity and the formula is given in Tradeoff 2 in Table 4 as \(T^{2}M^{\delta \beta + \gamma -1}\), where \(\delta =2\). We obtain \(T^2M^{2 \lceil (\log K)/2 \rceil - 2 + \lfloor (\log K)/2 \rfloor } = N\), or

$$\begin{aligned} T^2M^{3 \lceil (\log K)/2 \rceil - 2 - (\log K)\bmod 2} = N. \end{aligned}$$

The tradeoff is applicable for \(M \le T^{1/2}\).

6.2.2 Tradeoff formulas for \(K=16\), \(K=32\) and beyond

The tradeoff curve for \(T^{1/2} \le M \le T\) above is the best we can obtain for \(K=2^k\). On the other hand, the curve for \(M < T^{1/2}\) is only optimal for \(K=8\) as for larger K values it is possible to apply more complex list sum algorithms as described in Sect. 5.

While we cannot obtain generic optimal formulas that are applicable to all values of \(K=2^k\) and \(M < T^{1/2}\), we further extend the tradeoffs for \(K=16\) and \(K=32\) in Table 3. The results were obtained using the formulas of the combination algorithms (Tradeoffs 1,2 in Table 4) after plugging in the parameters of the relevant list sum algorithms (specified as building blocks in Table 3). The methods to obtain tradeoffs for \(K = 64\) and beyond are similar.

6.3 Tradeoff formulas for \(K \in \{6,7,14,15\}\)

As for values of \(K=2^k\) analyzed above, the results for \(K \in \{6,7,14,15\}\) in Table 3 were obtained using the formulas of the combination algorithms with the list sum algorithms specified as building blocks in the table.

6.3.1 Restricted domains

Using combination algorithms for restricted domains \(L = 2^{\ell } < N\), we derive additional tradeoffs in Table 5. These tradeoffs are applicable to the concrete GBP instances analyzed in the next section.

Table 5 Some GBP algorithms for restricted domains
Table 6 Complexities for concrete GBP instances

7 Restricted domain instances of the generalized birthday problem

The best known algorithm for GBP instances with restricted domain is the extended K-tree algorithm [11], which can be directly applied to values of \(K=2^k\) for any \(N^{1/K} \le L \le N^{1/(\log K + 1)}\). Here, we use our GBP tradeoffs derived in Sect. 6.3 (specified in Table 5) and compare the results for some GBP instances in Table 6.

For fair comparison, we multiply the time and memory complexities obtained by our general formulas above by K. For example, for \(K=6,n=120,\ell =37\) the relevant curve in Table 5 is \(TM^{5/2}L^{1/2} = N\) for \(M \le T^{1/2}\). To optimize the time complexity we set \(M = T^{1/2}\), giving \(T = N^{4/9}L^{-2/9} \approx 2^{45}\). After multiplication by \(K=6\), we obtain \(M \approx 2^{25}, T \approx 2^{48}\).
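As a sanity check of this arithmetic, a short Python computation (the function name is ours):

```python
from math import log2

def optimal_point(n, ell, a, b, c, K):
    """Evaluate the curve T^a * M^b * L^c = N at the memory-optimal point
    M = T^(1/2), then add the factor K used for fair comparison in Table 6."""
    log_T = (n - c * ell) / (a + b / 2)      # from T^(a + b/2) = N * L^(-c)
    return log_T + log2(K), log_T / 2 + log2(K)

# K = 6, n = 120, ell = 37, curve T * M^(5/2) * L^(1/2) = N:
t, m = optimal_point(120, 37, 1, 5 / 2, 1 / 2, 6)
print(f"T ~ 2^{t:.1f}, M ~ 2^{m:.1f}")      # T ~ 2^47.7, M ~ 2^25.1
```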

Generally, our tradeoffs give better results for values of K that are far away from their nearest smaller power of 2. However, we remark that our algorithms also give interesting tradeoffs for some instances with values of K that are powers of 2. For example, for \(K=16,n=120,\ell =15\), the extended K-tree algorithm gives \(T=M=2^{34}\). Assume we want to keep the memory close to \(2^{15}\); then we can directly apply the 16-way list sum algorithm \(A_{16}^{4}\) of Sect. 5 (with the tradeoff \(T^2M^3 = N\)) and obtain (after multiplication by \(K=16\)) \(M = 2^{19}\) and \(T \approx 2^{41.5}\). Hence we reduce the memory complexity by a factor of \(2^{15}\) and increase the time complexity by \(2^{7.5}\), which may be preferable in practical settings.

8 Conclusions

In this paper, we devised new GBP time-memory tradeoff algorithms, improving the state-of-the-art for a large range of parameters. The improvement is mainly due to a careful transformation of exhaustive list sum memory-efficient algorithms (i.e., variants of the Schroeppel–Shamir algorithm and dissection algorithms) to algorithms that efficiently find a single solution to the list sum problem. It is thus plausible that future improvements to exhaustive list sum algorithms will also result in improved time-memory tradeoffs for GBP using our transformation.