Keywords

1 Introduction

Cryptographic hash functions take an arbitrary length message as input and generate a fixed-length bit string. One of the most important security criteria is collision resistance. For a hash function \(\mathcal {H}: \{0,1\}^* \rightarrow \{0,1\}^n\), the complexity to find two distinct values \(x_1\) and \(x_2\) such that \(\mathcal {H}(x_1) = \mathcal {H}(x_2)\) should be \(O(2^{n/2})\). The collision resistance is a practically relevant notion. For example, Stevens et al. [37], in their attack against SHA-1, forged two PDF documents with the same hash digest that display different arbitrarily-chosen visual contents.

The SHA-2 family is one of the most important hash functions at the present time, which is specified and standardized by NIST [32]. There are two core algorithms; SHA-256 and SHA-512, depending on the word size. Moreover four schemes are additionally specified depending on the output size. SHA-2 are used in wide range of communication protocols such as TLS/SSL, SSH, and IPsec. SHA-2 are also used by the digital currency such as Bitcoin. After the recent break of SHA-1 [22], industry accelerated the migration to SHA-2.

History of SHA-2 Cryptanalysis. SHA-2 received a massive amount of security analysis. Preimage attacks were studied in [1, 13, 18, 20] and a conversion to pseudo-collisions was studied in [23]. Those would work relatively a large number of rounds, say 52 out of 64 steps of SHA-256 [20], while those only achieve a marginal amount of speed up. Those are interesting theoretical results but not strongly related to this research. As a non-random property, second-order differential collisions defined over four distinct inputs were studied [3].

More relevant works to this research are the attempts to apply previous collision finding techniques to SHA-2 or to find collisions on reduced-step SHA-2. The challenge to find a collision on reduced-step SHA-2 was initiated by [31], which found a collision on 19 out of 64 steps of SHA-224. This is a pioneering work to construct differential characteristic only having a single local collision. Then, this type of local collisions were manually optimized to find collisions of 21 steps of SHA-256 [33], and later improved to 22 steps [35], and to 24 steps and extended to SHA-512 [17, 36]. However, it was indicated that the local collision by [33] could work only up to 24 rounds [17] and indeed this was the last work for improving the manually detected local collision.

The most recent technical innovation is the development of automated differential characteristic search tools, which was initiated by Mendel et al. [27] to find a collision on 27 steps of SHA-256. Because of the search space, the efficiency of the algorithm is crucial for the automated search tool. Mendel et al. improved the algorithm and presented a 31-step collision attack and a 38-step semi-free-start collision attack against SHA-256 [29].Footnote 1 This is the current best (semi-free-start) collision attacks for SHA-256. The algorithm was further improved to apply it to SHA-512 [10], SHA-512/224 and SHA-512-256 [7]. For SHA-512, 27-step collisions and 39-step semi-free-start collisions [7] are the current best results.

Techniques for Finding SHA-2 Collisions. For the attack on SHA-256, Mendel et al. [29] presented a framework to convert semi-free-start collision attacks having some special property into a 2-block collision. The framework is illustrated in Fig. 1. The attacker first analyzes the second block without fixing IV for the second block, \(\mathrm {IV}_{\mathrm {second}}\). A semi-free-start collision attack that can work for \(2^X\) choices, typically for any unfixed X bits, of \(\mathrm {IV}_{\mathrm {second}}\) is located in the second block. Then, the attacker tests \(2^{n-X}\) messages for the first block to hit one of \(2^X\) choices of \(\mathrm {IV}_{\mathrm {second}}\), typically to hit the fixed \(n-X\) bits of \(\mathrm {IV}_{\mathrm {second}}\). Finally, the attacker determines the rest part of the second block to generate a 2-block collision.

Fig. 1.
figure 1

Converting Semi-free-start Collisions into 2-block Collisions.

The cost for the first block is \(2^{256-X}\) for SHA-256. To be faster than the birthday paradox, X must satisfy \(X>128\). To achieve such a semi-free-start collision attack, the previous work [29] generated a differential characteristic such that the characteristic can be satisfied for any value of the first five message words. Hence, it achieves \(X = 160\). (As explained later, those five message words can be adjusted to achieve a fixed 160-bit internal state value for any 160 bits of \(\mathrm {IV}_{\mathrm {second}}\).)

Dedicated Quantum Collision Attacks. Recently, it has been shown that collision attacks on hash functions with quantum machines can break more rounds than the attacks with classical machines [14]. Whether a hash function is attacked or not is judged by comparing the complexity of the generic attack (birthday paradox) and a dedicated attack. To find a collision, dedicated attacks mostly apply differential cryptanalysis. With quantum machines, the speed of finding a value satisfying a differential characteristic becomes a square root compared to classical machines, while the speed of the generic collision attack cannot be a square root of the birthday paradox, \(O(2^{n/4})\). Indeed, the tight bound of the query complexity to find a collision was proven to be \(O(2^{n/3})\) [38]. As a result, dedicated attacks can be stronger when quantum machines are available. In fact, such cases were observed for AES hashing modes [9, 14] and Gimli [11].

In the quantum setting, the generic attack complexity of finding collisions depends on settings about the resource that an attacker can use. The previous work discussed three settings. In the first setting, a small (polynomial size) quantum computer and a large (exponential size) qRAM. In the second setting, a small (polynomial size) quantum computer and a large (exponential size) classical memory, In the third setting, efficiency of quantum algorithms are evaluated by their time-space tradeoff.

In this paper, we focus on the third setting, of which details are as follows. Note that we do not take error corrections into account and consider that the running time of a quantum circuit is proportional to the depth of the circuit.

  • Cost metric of time-space tradeoff. The efficiency of an attack is evaluated by the tradeoff between T and S, where T is the attack time complexity (or, the depth of the quantum circuit) and S is the hardware size required for the attack (i.e., S is the maximum size of quantum computers (or, width of quantum circuits) and classical computers). S can be exponentially large, and we do not make distinction between qubits for computation and qubits for memory. Bernstein [2] observed that, when a classical computer of size S is available, by using the parallel rho method [34] we can find a collision of a random function in time \(T=O(2^{n/2}/S)\). There does not exist a quantum attack on a random function that achieves a better tradeoff than this classical attack.Footnote 2 Hence, a dedicated quantum collision attack on a concrete hash function that uses a quantum computer of size S is considered to be valid if its time complexity T is less than \(2^{n/2}/S\).

The condition \(T < 2^{n/2}/S\) is equivalent to \(T \cdot S < 2^{n/2}\). Hence the efficiency of a quantum attack in the time-space tradeoff metric is evaluated by the multiplication of T and S, and the threshold for the attack to be valid is \(2^{n/2}\).

Jaques and Schanck [19] showed that when error correction is necessary and quantum memory is actively corrected, it is realistic to model that the cost of a quantum attack is proportional to the multiplication of the depth and the width of the quantum circuit used in the attack. Therefore, although we do not care about error corrections in our complexity analysis, the cost metric of time-space tradeoff is in fact reasonable from the view point of cost estimation including quantum error correction (when quantum memory is actively corrected).

Research Challenge. The collision resistance of SHA-2 family in the quantum setting has not been studied before.Footnote 3 In fact, this is not a simple task. As mentioned before, the current differential characteristics for SHA-2 collision attacks consist of a single local collision. The previous work showed [14] that the cost to satisfy an uncontrolled part of the differential characteristic can be square root, while the differential characteristic for SHA-2 does not have such a form. Thus this issue deserves careful investigation.

Our Contributions. In this paper, we present quantum collision attacks on SHA-256 and on SHA-512 that break more rounds than the attacks in the classical setting. Our attacks are valid in the time-space tradeoff cost metric. The number of attacked steps is compared in Table 1.

Table 1. Comparison of the attack results. The quantum attacks on SHA-256 and SHA-512 are faster than the generic attack as long as \(S < 2^{12}\) and \(S < 2^{6.6}\), respectively.

To generate collisions, we follow the same approach as the previous work. Namely, we locate a semi-free-start collision in the second block and find a first-block message to hit one of IVs that is acceptable for the second block. In the previous work, it is principally inevitable that the semi-free-start collision attack must work for at least \(2^X\) choices of IVs, where \(X > 128\) for SHA-256. This is a strong requirement, which significantly restricts the search space to find a suitable differential characteristic. We observe that if quantum machines are available, we can construct an attack with an intuitive condition of \(X > 0\) by ignoring the constant factor. In practice, the constant factor cannot be ignored and we will show a rigorous complexity analysis.

For SHA-256, the previous work [29] found a differential characteristic with \(X > 128\) up to 31 steps, while unconditioned semi-free-start collisions could be generated for 38 steps. Hence we start from the 38-step semi-free-start collision example generated by [29] and slightly modify its message words so that semi-free-start collisions can be generated for multiple IVs. We achieve \(X \approx 20\) for 38-step SHA-256. If we have a quantum computer of size S, the attack complexity is about \(c \cdot \sqrt{2^{256 - 20}/S} = 2^{122}/\sqrt{S}\), where c is a small constant and rigorous analysis shows \(c \approx 2^4\). Because the generic attack cost under the time-space metric is \(2^{128}/S\), our attack is faster than the generic attack when \(S < 2^{12}\).

For SHA-512, it seems difficult to build a differential characteristic with a lot of degrees of freedom such as \(X > 256\). In fact, the previous work [7] could not apply the 2-block conversion, and the current strategy is limited to be a single-block attack. In this paper, we observe that the 39-step semi-free-start collision attack [7] can accept multiple choices of \(\mathrm {IV}\) with some X that is much smaller than 256, and will convert it into 2-block collision in the quantum setting.

As we mentioned before, the previous work [14] discussed three settings depending on available computational resources. In fact our attacks are valid only in the setting of time-space tradeoff because the time complexity exceed the generic complexity in other settings. Nevertheless, we would like to remark that dedicated attacks that are valid in this setting (including our attacks) are always better than the generic attacks in other settings from the viewpoint of time-space tradeoff. This is because the generic attacks in other settings have time-space tradeoff \(T^2 \cdot S = 2^n\), which is worse than the trade-off \(T \cdot S = 2^n\) of the generic attack in our setting.Footnote 4

Some readers may think that our attacks are invalid because the margin of our attacks (compared to the generic attack) are too small while we do not take the overhead for quantum computation, or their complexity does not significantly outperform the classical complexity. However, security of symmetric-key primitives is generally measured under the most vulnerable environment (they must resist any attacks in any nitpicked setting like \(S=1\)). The principle of security under the most vulnerable environment makes it natural to ignore the overhead because the overhead for quantum computation may drastically be reduced by future technical developments. In addition, when reduced-step variants of symmetric-key primitives are analyzed, the most important factors is the number of attacked steps rather than the attack cost. Our quantum attacks break significantly more steps than the classical attacks.

Remark 1

For reference, we also provide discussions on comparison between our attacks and a generic collision attack based on the multi-target preimage search. See Section B of this paper’s full version [15] for details.

Future Directions. Due to its simplicity, we believe that the idea of our 2-block quantum collision attacks is applicable to other cryptographic hash functions. It will also be interesting to study optimizations of differential characteristics for the classical semi-free-start collision attack with respect to the conversion to the quantum collision attack. Some observations and initial work will be provided in the last part of the paper.

Paper Organization. Section 2 is preliminaries. Section 3 explains the previous collision and semi-free-start collision attacks. Section 4 explains our observation that is used in our quantum attacks. Sections 5 and 6 show the attack algorithms and their evaluations. Section 7 provides discussion toward future applications of our attack idea. Finally, we conclude this paper in Sect. 8.

2 Preliminaries

For n-bit strings x and y, \(\lnot x\), \(x \wedge y\), \(x \vee y\) and \(x \oplus y\) denote the bit-wise negation of x, the bit-wise AND on x and y, the bit-wise OR on x and y, and the bit-wise XOR on x and y, respectively. For an n-bit string x and a non-negative integer m such that \(m \le n\), \(x \gg m\) (resp., \(x \ggg m\)) denotes the m-bit right shift operation on x (resp., the m-bit circular right shift operation on x). We identify the set of n-bit strings \(\{0,1\}^n\) with the sets \(\{0,\ldots ,2^n-1\}\) and \(\mathbb {Z}/2^n\mathbb {Z}\). \(x + y\) denotes the modular addition of x and y for \(x, y \in \mathbb {Z}/2^n\mathbb {Z}\), unless otherwise noted. Sometimes we use the symbol \(\boxplus \) instead of \(+\). We assume that readers are familiar with basics on quantum computationFootnote 5.

2.1 Specification of SHA-256 and SHA-512

SHA-256 and SHA-512 adopt the Merkle-Damgård construction, and their compression functions adopt the Davies-Meyer construction. Let w be the word size, which is 32 for SHA-256 and 64 for SHA-512. The length of message blocks is 16w bits (512 bits for SHA-256 and 1024 bits for SHA-512), and the length of chaining values and final outputs is 8w bits (256 bits for SHA-256 and 512 bits for SHA-512).

Given a chaining value (or the initial value IV) \(H = (H_0, \ldots ,H_7) \in (\{0,1\}^{w})^8\) and a message block \(M = (M_0,\ldots ,M_{15})\in (\{0,1\}^w)^{16}\), the output value of the compression function f(HM) is computed by iteratively updating internal states as follows. The number of steps, which is denoted by r, is 64 for SHA-256 and 80 for SHA-512.

  1. 1.

    (Message expansion.) Compute \(W_i\) (\(i=0,\ldots ,r-1\)) by

    $$\begin{aligned} W_i := {\left\{ \begin{array}{ll} M_i &{}\text { for } i=0,\ldots ,15, \\ \sigma _1(W_{i-2}) + W_{i-7} + \sigma _0(W_{i-15}) + W_{i-16} &{}\text { for } i=16,\ldots ,r-1. \end{array}\right. } \end{aligned}$$

    The functions \(\sigma _0,\sigma _1 : \{0,1\}^w \rightarrow \{0,1\}^w\) are defined later.

  2. 2.

    (Iterative state updates.) Set \(st_{-1} := H\). For \(i=0,\ldots ,r-1\), update the 8w-bit state \(st_{i-1} = (A_{i-1},A_{i-2},A_{i-3},A_{i-4},E_{i-1},E_{i-2},E_{i-3},E_{i-4})\) to \(st_{i} = (A_{i},A_{i-1},A_{i-2},A_{i-3},E_{i},E_{i-1},E_{i-2},E_{i-3}),\) where

    $$\begin{aligned} E_i&:= E_{i-4} + A_{i-4} + \varSigma _1(E_{i-1}) + \mathrm {IF}(E_{i-1},E_{i-2},E_{i-3}) + K_i + W_i, \\ A_i&:= \varSigma _0(A_{i-1}) + \mathrm {MAJ}(A_{i-1},A_{i-2},A_{i-3}) + E_i - A_{i-4}. \end{aligned}$$

    The functions \(\mathrm {IF},\mathrm {MAJ} : (\{0,1\}^w)^3 \rightarrow \{0,1\}^w\) and \(\varSigma _0,\varSigma _1 : \{0,1\}^w \rightarrow \{0,1\}^w\) are defined later. \(K_i\) is a step-dependent constant. Since the value of \(K_i\) does not affect our attacks, we omit the value of \(K_i\). See also Fig. 2.

  3. 3.

    Compute the next chaining value f(HM) as \( f(H,M) := st_{r-1} + H.\) (Only here, the symbol “+” denotes the word-wise modular addition.)

Fig. 2.
figure 2

This is an alternative representation of the state update function devised by the previous work [27]. The operation “\(\times (-1)\)” denotes the multiplication by \((-1)\) in \(\mathbb {Z}/2^w\mathbb {Z}\).

The functions \(\mathrm {IF},\mathrm {MAJ} : (\{0,1\}^w)^3 \rightarrow \{0,1\}^w\) are defined as

$$\begin{aligned} \mathrm {IF}(x,y,z) = (x \wedge y) \oplus ((\lnot x) \wedge z), \quad \mathrm {MAJ}(x,y,z) = (x \wedge y) \oplus (y \wedge z) \oplus (z \wedge x) \\ \end{aligned}$$

for both of SHA-256 and SHA-512. In addition, \(\varSigma _0,\varSigma _1,\sigma _0,\sigma _1\) are defined by

$$\begin{aligned} \varSigma _0(x)&= (x \ggg 2) \oplus (x \ggg 13) \oplus (x \ggg 22),\\ \sigma _0(x)&= (x \ggg 7) \oplus (x \ggg 18) \oplus (x \gg 3), \\ \varSigma _1(x)&= (x \ggg 6) \oplus (x \ggg 11) \oplus (x \ggg 25),\\ \sigma _1(x)&= (x \ggg 17) \oplus (x \ggg 19) \oplus (x \gg 10) \end{aligned}$$

for SHA-256, and

$$\begin{aligned} \varSigma _0(x)&= (x \ggg 28) \oplus (x \ggg 34) \oplus (x \ggg 39),\\ \sigma _0(x)&= (x \ggg 1) \oplus (x \ggg 8) \oplus (x \gg 7), \\ \varSigma _1(x)&= (x \ggg 14) \oplus (x \ggg 18) \oplus (x \ggg 41),\\ \sigma _1(x)&= (x \ggg 19) \oplus (x \ggg 61) \oplus (x \gg 6) \end{aligned}$$

for SHA-512.

Let \(W_{i,j}\) denote bit j of \(W_i\), where \(W_{i,0}\) is the least significant bit and \(W_{i,w-1}\) is the most significant bit. We also use the same notation to denote bit positions for other variables such as \(A_i\) and \(E_i\).

2.2 Quantum Computation

We use the quantum circuit model as the model of quantum computation. Let H denote the Hadamard operator defined by \(H \mathinner {|{b}\rangle } = \sum _{c \in \{0,1\}}(-1)^{b \cdot c}\mathinner {|{c}\rangle }\) for \(b \in \{0,1\}\). The quantum oracle of a function \(f:\{0,1\}^m \rightarrow \{0,1\}^n\) is the unitary operator \(O_f\) defined by \(O_f\mathinner {|{x}\rangle } \mathinner {|{y}\rangle } = \mathinner {|{x}\rangle }\mathinner {|{y \oplus f(x)}\rangle }\) for \(x \in \{0,1\}^m\) and \(y \in \{0,1\}^n\).

Grover’s Algorithm. Grover’s algorithm [12] is the quantum algorithm to solve the following database search problem.

Problem 1

Let \(F : \{0,1\}^n \rightarrow \{0,1\}\) be a function such that \(|F^{-1}(1)| > 0\). Given a (quantum) oracle access to F, find x such that \(F(x)=1\).

Let \(t := |F^{-1}(1)|\). We always consider the case that \(t/2^n \ll 1\). Though \(O(2^n/t)\) queries are required for classical algorithms to solve the problem, Grover’s algorithm solves the problem only with \(O(\sqrt{2^n/t})\) quantum queries.

More precisely, suppose that there exists a quantum circuit that computes F in time \(T_F\) by using \(S_F\) qubits (i.e., the depth and width of the circuit are \(T_F\) and \(S_F\), respectively). Then, Grover’s algorithm finds a solution in time \(T_F \cdot (\pi /4) \cdot \sqrt{2^n/t}\), by using \(S_F + 1\) qubits.

Details of Grover’s Algorithm. For a positive integer i, let \(\mathsf {Grov}(F,i)\) be the quantum algorithm that runs the following procedure:

  1. 1.

    Prepare the initial state \(\mathinner {|{\psi _{\mathsf {init}}}\rangle } := H^{\otimes (n+1)}\mathinner {|{0^n}\rangle }\mathinner {|{1}\rangle }\).

  2. 2.

    Let \(\theta \) be the value that satisfies \(\sin ^2\theta = t/2^n\) and \(0 \le \theta \le \pi /2\). Apply the unitary operator \(Q_F := - (H^{\otimes n} \otimes I)(O_0 \otimes I)(H^{\otimes n} \otimes I) O_F \) iteratively i times on \(\mathinner {|{\psi _{\mathsf {init}}}\rangle }\). Here, \(O_F\) is the quantum oracle of F, and \(O_0\) is the operator such that \(O_0\mathinner {|{x}\rangle } = (-1)^{\delta _{x,0^n}}\mathinner {|{x}\rangle }\) (\(\delta _{x,y}\) is Kronecker’s delta such that \(\delta _{x,y} = 1\) if \(x = y\) and \(\delta _{x,y} = 0\) if \(x \ne y\)).

  3. 3.

    Measure the resulting state \(Q^{i}_F \mathinner {|{\psi _{\mathsf {init}}}\rangle }\), and output the most significant n bits.

Boyer et al. showed that, when we set the number of iterations i to be \(\lfloor \pi /4 \theta \rfloor \), the algorithm \(\mathsf {Grov}(F,\lfloor \pi /4 \theta \rfloor )\) outputs x such that \(F(x) = 1\) with a probability at least \(1 - t/N\) [4]. Since \(\pi /4 \theta \le \pi /(4 \sin \theta ) = (\pi /4)\sqrt{2^n/t}\) holds, the running time of \(\mathsf {Grov}(F,\lfloor \pi /4 \theta \rfloor )\) is at most \(T_F \cdot (\pi /4) \sqrt{2^n/t}\).

Remark 2

In the above arguments, we implicitly assume that t is known in advance. If t is not known in advance, we have to perform a more sophisticated procedure, which increases the total number of queries to F by a constant factor [4].

Parallelization. When \(P \ge 2\) quantum computers are available, by running P copies of \(\mathsf {Grov}(F,\lfloor \pi /4\theta \sqrt{P} \rfloor )\) in parallel, we can find a solution in time \(T_F \cdot \frac{\pi }{4}\sqrt{2^n/(t\cdot P)}\) with a probability at least \(1-1/e\) (we always consider the case that \((t\cdot P)/2^n \ll 1\)). For completeness, we provide detailed explanations on the success probability in Section C of this paper’s full version [15]

Cost Evaluation. As mentioned in Sect. 1, we evaluate the complexity of the attacks in the setting of time-space tradeoff. We do not take costs of quantum error corrections into account, and we consider that the running time of a quantum circuit is proportional to the depth of the circuit.

In each attack, we assume that there exists an implementation of the attack target primitive (i.e., SHA-256 or SHA-512) on a quantum circuit \(\mathcal {C}\), and we regard that the unit of depth (resp., width) of quantum circuits is the depth (resp., width) of \(\mathcal {C}\), so that our cost estimation will be independent from implementation methods of primitives.

In addition, we do not take communication costs into account. That is, we assume that arbitrary two-qubit quantum gate can be applied to arbitrary pair of qubits. The communication costs will not be significant in our attacks because we use quantum circuits just for running the Grover search (or, its simple parallelization) that requires only several times as much qubits as implementations of SHA-2 use.

3 Previous Works

This section provides an overview on the collision attack on 31-step SHA-256 in [29], the semi-free-start collision attack on 38-step SHA-256 in [29], and the semi-free-start collision attack on 39-step SHA-512 in [7, 8].

3.1 Collision Attack on 31-Step SHA-256

The collision attack on 31-step SHA-256 in [29] finds a 2-block collision with time complexity \(2^{65.5}\). Intuitively, a 2-block collision \((\tilde{M}||M,\tilde{M}||M')\) (here, \(\tilde{M}, M, M'\) are in \(\{0,1\}^{512}\), and \(M \ne M'\)) is constructed by searching for a random message \(\tilde{M}\) for the first block and a semi-free-start collision \((M,M')\) for the second block such that the output of the first block is the IV of the second block.

Semi-free-start collisions in the second block are constructed based on a local collision that starts at step 5 and ends at step 18, which is found by using heuristic automated search tools. The tool finds both of differential characteristics and conditions for message pairs \((M,M')\) at the same time. See Table 2 for the differential characteristic and conditions for \((M,M')\) shown in [29]. The meanings of the notations in Table 2 are as follows:

  1. 1.

    -” indicates that the bit associated with M at the position must be equal to the corresponding bit associated with \(M'\).

  2. 2.

    0” indicates that the bit at the position must be 0 for both of M and \(M'\).

  3. 3.

    1” indicates that the bit at the position must be 1 for both of M and \(M'\).

  4. 4.

    u” indicates that the bit at the position must be 1 for M and 0 for \(M'\).

  5. 5.

    n” indicates that the bit at the position must be 0 for M and 1 for \(M'\).

See also Remark 3. For each i, by \(A_i,E_i,W_i\) we denote the words of internal states and expanded messages as described in Sect. 2.1.

Table 2. The 31-step differential characteristic for SHA-256 shown in [29].

The authors of [29] also show an example of a semi-free-start collision of 31-step SHA-256 that satisfies the differential characteristic. See Table 6 of this paper’s full version [15] for details.

Table 3. The position of the message words where non-zero differences appear. “\(\bigcirc \)” indicates that the word has non-zero difference. “\(\times \)” indicates that the word is computed from previous words with non-zero differences but the difference is canceled out. (\(W_i\) is computed from \(W_{i-2}\), \(W_{i-7}\), \(W_{i-15}\), and \(W_{i-16}\) for \(i \ge 16\).)

Attack Procedure. Next, we describe the attack procedure. The important features of the differential characteristic in Table 2 are summarized as follows:

  1. 1.

    Only seven message words \((W_5,\ldots ,W_9,W_{16},W_{18})\) have differences. Since \(W_0,\ldots ,W_4,W_{10},\ldots ,W_{15}\) do not have differences, \(W_{17},W_{19},W_{26},\ldots , W_{30}\) do not have differences, either. The differences at \(W_{20},\ldots ,W_{25}\) need to be canceled out (see Table 3).

  2. 2.

    No condition is imposed on the first five message words \(W_0,\ldots ,W_4\), thus those can be chosen freely.

By using these properties, the authors of [29] first show an attack with complexity \(2^{99.5}\), and then show how to reduce the complexity to \(2^{65.5}\).

The First Attack with Complexity \(2^{99.5}\). Let f denote the (31-step) compression function. The procedure of the collision attack with complexity \(2^{99.5}\) is as follows.

  1. I.

    Use the automatic search tool to determine the message words \(W_5,\ldots ,W_{12}\) and the internal states from the beginning of step 5 to the end of step 12 (in the second block). Though \(W_0,\ldots ,W_4\) have not been chosen yet at this step, the values of the variables \(E_1, \ldots , E_4\) and \(A_{-3},\ldots ,A_4\) are completely determined by the internal state at the beginning of step 5 (see also Fig. 4 of this paper’s full version [15] for details). Note that \(A_{-1}||A_{-2}||A_{-3}\) correspond to the 96 most significant bits of the initial value of the second block.

  2. II.

    Find a message \(\tilde{M}\) for the first block such that the 96 most significant bits of \(f(\mathrm {IV},\tilde{M})\) is equal to \(A_{-1} || A_{-2} || A_{-3}\). Compute the (uniquely determined) values \(W_0,\ldots ,W_4\) that is compatible with the chaining value \(f(\mathrm {IV},\tilde{M})\) and the state at the beginning of step 5.

  3. III.

    Now, \(W_0,\ldots ,W_{12}\) have been chosen. Use degrees of freedom in \(W_{13},W_{14},W_{15}\) to fulfill the conditions on \(E_{13},E_{14},E_{15}\), \(W_{16}\), and \(W_{18}\) (in addition to the cancellation of differences at \(W_{20},\ldots ,W_{25}\)). If it fails, go back to Step II.

Step II requires time \(2^{96}\). According to the authors of [29], Step I of the attack takes only seconds, and Step III succeeds with a probability about 1/12 due to the lack of degrees of freedom in \(W_{13},W_{14},W_{15}\), which was verified experimentally. The total time complexity is estimated as \(12 \cdot 2^{96} \approx 2^{99.5}\).

The Second Attack with Complexity \(2^{65.5}\). The attack complexity is reduced from \(2^{99.5}\) to \(2^{65.5}\) by computing many solutions in Step I. The idea is as follows. Suppose that \(\ell \) solutions can be found for Step I (they are stored in a list). Then, the complexity of Step II can be reduced from \(2^{96}\) to \(2^{96}/\ell \). If a single solution in Step I can be found in time \(T_{I}\), then the overall complexity of the attack becomes \(T_{I} \cdot \ell + 12 \cdot 2^{96}/\ell \).

The authors of [29] claim that \(T_I \approx 2^{25.5}\), and their experiments indicate that they can expect \(\ell \approx 2^{34}\). Based on these observations, they deduced that a collision can be found with complexity \(2^{25.5} \cdot 2^{34} + 12 \cdot 2^{96}/2^{34} \approx 2^{65.5}\).

3.2 Semi-Free-Start Collision Attack on 38-Step SHA-256

As well as the semi-free-start collisions in the 31-step collision attack, the semi-free-start collision of 38-step SHA-256 in [29] is constructed based on a local collision that starts at step 7 and ends at step 24, which are also found by using the heuristic automated search tool.

See Table 4 for the differential characteristic and the conditions for confirming message pairs shown in [29]. (See also Remark 3.) The semi-free-start collision of 38-step SHA-256 given in [29] is shown in Table 7 of this paper’s full version [15].

Table 4. The 38-step differential characteristic for SHA-256 shown in [29].

3.3 Semi-Free-Start Collision Attack on 39-Step SHA-512

The semi-free-start collision of 39-step SHA-512 in [7, 8] is also constructed based on a local collision that starts at step 8 and ends at step 26, which is also found by using the heuristic automated search tool.

See Table 5 of this paper’s full version [15] for the differential characteristic and the conditions for confirming message pairs shown in [8]. (See also Remark 3.) The semi-free-start collision of 39-step SHA-512 given in [7, 8] is shown in Table 8 of this paper’s full version [15].

Remark 3

To be precise some bits in the differential characteristics in Tables 2, 4, and 5 of this paper’s full version [15] have additional conditions. They are shown in the original papers [7, 8, 29] but we omit to show them because they are not significantly relevant to the basic idea of our attacks.

4 Observations and Ideas for Quantum Collision Attacks

For SHA-256, the previous 38-step semi-free-start collision is not converted into a collision while the 31-step semi-free-start collision is converted. For SHA-512, the previous 39-step semi-free-start collision is not converted.

In Sect. 4.1, we explain details on the reason that the semi-free-start collisions of 38-step SHA-256 and 39-step SHA-512 are not converted into collisions in the classical setting, based on the explanation in [7, 8]. In Sect. 4.2, we explain our basic ideas on how to apply the conversion in the quantum setting.

4.1 Obstacles for Conversions in the Classical Setting

Summary of 31-Step SHA-256. Recall that the 31-step collision is obtained by matching the IV produced from the first block and semi-free-start collisions in the second block. Also recall that the attack consists of three steps.

In Step II of the attack, the degrees of freedom in the message words \(W_0,\ldots , W_4\) are used to make the output of the first block and the local collision in the second block compatible. Let \(\alpha \) denote the number of free bits in the message words that can be used to make those two values compatible. Since \(W_0,\ldots ,W_4\) can be chosen freely, \(\alpha = 5 \cdot 32 = 160\) holds. For a randomly chosen \(\tilde{M}\) and a single solution in Step I, the probability that they can be compatible is \(2^\alpha /2^n\).

If we have \(\ell \) solutions in Step I and if Step III succeeds with a probability p, a randomly chosen \(\tilde{M}\) leads to a collision with probability \(\ell \cdot (2^\alpha /2^n) \cdot p\). Thus the time complexity T is estimated as \(T = \frac{1}{\ell \cdot (2^\alpha /2^n) \cdot p} = 2^{n}/(\ell \cdot 2^\alpha \cdot p)\) (by ignoring the complexity of Step I).

It is claimed in [29] that one can expect \(\ell = 2^{34}\) and \(p \approx 1/12 \approx 2^{-3.5}\), and the complexity \(2^{65.5}\) is obtained as \(T =2^{256}/(2^{34} \cdot 2^{160} \cdot 2^{-3.5}) = 2^{65.5}\).

Remark 4

Let \(2^X\) be the number of IVs of the second block that will be compatible with the local collisions in the second block. Then, the time complexity to find the first message block will be \(T=2^n/2^X\). The attack is valid as long as \(X > n/2 = 128\).

Lack of Degrees of Freedom in 38-Step SHA-256. We observe that in the differential characteristic for the 38-step semi-free-start collision of SHA-256 (Table 4), almost all the bits of state variable \(E_i\) have conditions for \(i=7,\ldots ,20\), which implies that both of the values and the differences for \(W_7,\ldots ,W_{20}\) will be fixed. When 16 successive message words are fixed, all the message words are fixed (due to the message expansion). Thus, among the message words \(W_0,\ldots ,W_7\) that can be used to make the first block and the local collision in the second block compatible, only the two words \(W_5\) and \(W_6\) will have degrees of freedom, and the number of free bits is \(\alpha = 2 \cdot 32 =64\) in total.

Thus the time complexity will be \(2^n/(\ell \cdot 2^{\alpha } \cdot p) {= 2^{192}/(\ell \cdot p)}\) when \(\ell \) solutions are available. Considering that \(\ell \) is about \(2^{34}\) for 31-step collisions, the complexity will be larger than \(2^{128}\) of the birthday paradox.

Remark 5

From another point of view, the 38-step semi-free-start collision cannot be converted into a collision because \(X < {128}\).

Lack of Degrees of Freedom in 39-Step SHA-512. The 39-step semi-free-start collision of SHA-512 in [8, 29] cannot be converted into a collision for the same reason.

In the differential characteristic (Table 5 of this paper’s full version [15]), almost all bits of the internal state variable \(E_i\) have some conditions for \(i=8,\ldots ,22\), which implies that both of the internal states and the message words in steps 8–22 will be fixed. Due to the constraint derived from the message expansion, only the single word \(W_7\) will have degrees of freedom among the first 8 message words that can be used to make the first block and the local collision in the second block compatible (i.e., \(\alpha = 64\)). In addition, \(\ell \) will not be large since the differential characteristic has dense conditions for \(i=8,\ldots ,22\). Thus the time complexity \(2^n/(\ell \cdot 2^{\alpha } \cdot p)\) will be larger than \(2^{256}\) of the birthday paradox.

4.2 Observations and Ideas on Conversion in the Quantum Setting

As mentioned in Remark 4, \(X > {n/2}\) must be satisfied to be a valid attack. On the other hand, in the quantum setting of time-space tradeoff, it may be possible to mount valid 2-block collision attacks even if \(X < {n/2}\). For example, assume that we can decrease the time complexity of 2-block collision attacks from \(2^n/2^X\) to \(\sqrt{2^n/2^X}\) by applying the Grover search and the Grover search requires negligible memory. It becomes a valid quantum collision attack in the setting of time-space tradeoff if \(\sqrt{2^n/2^X} < 2^{n/2}\), which is equivalent to \(X > 0\). Actually this idea is too naive and we cannot achieve a valid quantum attack in such a simple way. Nevertheless, this idea shows the possibility of valid 2-block quantum collision attacks with the Grover search.

With this in mind, we mount quantum collision attacks on 38-step SHA-256 and 39-step SHA-512 by converting the semi-free-start collisions into 2-block collisions with the Grover search. To achieve this goal, we have to take the following two points into account.

  1. 1.

    In the classical attack on 31-step SHA-256, by storing \(\ell \) solutions in Step I, the complexity of Step II is decreased by the factor of \(\ell \). This strategy works well since memory is relatively cheap in the classical setting. On the other hand, memory is expensive in the quantum setting of time-space tradeoff, and memory-less algorithms are favorable.

  2. 2.

    In the classical attack on 31-step SHA-256, we can choose \(W_0,\ldots ,W_4\) freely because those values do not affect the steps with dense conditions in the differential characteristic (i.e., steps 5–12). On the other hand, in the attack on 38-step SHA-256 (resp., 39-step SHA-512), we have to choose the message words \(W_0,\ldots ,W_6\) (resp., \(W_0,\ldots ,W_7\)) carefully because they affect on some of the message words in the steps with dense conditions, i.e., \(W_7,\ldots ,W_{20}\) (resp., \(W_8,\ldots ,W_{22}\)), through the message expansion.

We will set \(\ell = 1\) to minimize the required memory size. On the choice of the message words \(W_0,\ldots ,W_6\) for 38-step SHA-256, we observe the following.

  • We can modify \(W_6\) to another value \(\hat{W}_6\) without changing \(W_7,\ldots ,W_{21}\) by modifying \(W_j\) to \(\hat{W}_j := W_j - (\sigma _0(\hat{W}_{j+1}) - \sigma _0(W_{j+1}))\) for \(j=5,4,\ldots ,0\) step by step.

Indeed, if the value of \(W_6\) is changed to another value \(\hat{W}_6\), then \(W_{21}\) and \(W_{22}\) will be changed because \(W_i = \sigma _1(W_{i-2}) + W_{i-7} + \sigma _0(W_{i-15}) + W_{i-16}\) holds for \(i \ge 16\). However, the change of the value of \(W_{21}\) can be canceled out by modifying \(W_5\) to \(\hat{W}_5 := W_5 - (\sigma _0(\hat{W}_6) - \sigma _0(W_6))\). By modifying \(W_j\) to \(\hat{W}_j := W_j - (\sigma _0(\hat{W}_{j+1}) - \sigma _0(W_{j+1}))\) similarly for \(j=4,\ldots ,0\), we can also keep \(W_{20},\ldots ,W_{16}\) unchanged. Since \(W_7,\ldots ,W_{15}\) are not affected by the modification of \(W_0,\ldots ,W_6\), the words \(W_7,\ldots ,W_{15}\) are also kept unchanged.

We obtain a similar observation on the choice of the message words \(W_0,\ldots ,W_7\) for 39-step SHA-512. That is, we can modify \(W_7\) to another value \(\hat{W}_7\) without changing \(W_8,\ldots ,W_{22}\), by modifying \(W_j\) to \(\hat{W}_j := W_j - (\sigma _0(\hat{W}_{j+1}) - \sigma _0(W_{j+1}))\) for \(j=6,\ldots ,0\) step by step.

We mount quantum 2-block collision attacks based on these observations.

Attack Idea. Here we explain basic ideas of our quantum attacks that are common between 38-step SHA-256 and 39-step SHA-512. We will explain details that are specific to each function in the next section.

Let i denote the number of the step where the local collision starts in the differential characteristic (\(i=7\) for 38-step SHA-256 and \(i=8\) for 39-step SHA-512). The attack procedure is as follows (see also Fig. 5 of this paper’s full version [15]).

  1. I.

    Find a pair of messages \((M,M')\) and an initial value for the second block that yield a semi-free-start collision. Let \(S_{\mathrm {start}}\) be the internal state at the beginning of step i. For each j, let \(W_j\) and \(W'_j\) denote message word j expanded from M and \(M'\), respectively. Note that \(W_0 = W'_0,\ldots ,W_{i-1} =W'_{i-1}\) hold.

  2. II.

    With the Grover search, find a message \(\tilde{M}\) (for the first block) that satisfies the followings.

    1. (a)

      \(S_{\mathrm {start}}\) and the input chaining value for the second block \(\mathrm {IV}_{\mathrm {second}}\) derived from \(\tilde{M}\) can be compatible by modifying the message words \(W_0,\ldots ,W_{i-1},W'_0,\ldots ,W'_{i-1}\), while keeping the message words \(W_i,\ldots ,W_{i + 14}, W'_i, \ldots , W'_{i+14}\) unchanged. Let \(\hat{M}\) and \(\hat{M}'\) be the messages for the second block after the modification, i.e., \(\hat{M} := \hat{W}_0|| \cdots || \hat{W}_{i-1} ||W_{i} || \cdots || W_{15}\) and \(\hat{M}' := \hat{W}_0 || \cdots || \hat{W}_{i-1} ||W'_{i} || \cdots || W'_{15}\).

    2. (b)

      \(S_{\mathrm {start}}\) and the modified message pair \((\hat{M},\hat{M}')\) yield a collision at the end of the second block.

  3. III.

    By using \(\tilde{M}\) found in Step II, perform the computations in Steps II-(a) and II-(b) again to obtain the pair \((\hat{M},\hat{M}')\) that yield a collision at the end of the second block. (This step may seem redundant, but we separate this step from Step II so that we can apply the Grover search on \(\tilde{M}\) in Step II.) Output \((\tilde{M}||\hat{M},\tilde{M}||\hat{M}')\).

Step I of the above procedure corresponds to Step I of the classical collision attack on 31-step SHA-256. We store only a single solution in Step I of our attack so that the attack will be memory-less. Since only a single solution is required in this step, we just use the values shown in the previous works (i.e., the values \(M,M',h_0\) in Table 7 and Table 8 of this paper’s full version [15]).

Step II-(a) corresponds to Step II of the classical collision attack on 31-step SHA-256. Step II-(b) corresponds to Step III of the classical collision attack on 31-step SHA-256. We allow the remaining words \(W_{i+15},W_{i+16},\ldots \) and \(W'_{i+15},W'_{i+16},\ldots \) to be changed since the steps with dense conditions are up to \(i+14\) and thus to probabilistically satisfy all the conditions from step \(i+15\) by randomly changed \(W_{i+15},W_{i+16},\ldots \) and \(W'_{i+15},W'_{i+16},\ldots \) is not difficult.

Attack Complexity and Validity. Let F be the Boolean function to which Grover’s algorithm is applied in Step II of our attackFootnote 6. That is, F is defined by \(F(\tilde{M}) := 1\) if and only if \(\tilde{M}\) satisfies the two conditions II-(a) and II-(b). Let p be the probability that \(F(\tilde{M})=1\) when we pick a message \(\tilde{M}\) for the first block uniformly at random. In addition, suppose that F can be implemented on a quantum circuit of which width is \(S_F\) and depth is \(T_F\).

The time complexity of Step I is negligible since we just use the values from previous works. The time complexity of Step III is also negligible compared to that of Step II. Thus the time complexity of our attacks is dominated by the time complexity of the Grover search on F, which is at most \(T_F \cdot \frac{\pi }{4}\sqrt{1/p}\).

If a quantum computer of size \(S ({>}{S_F})\) is available, the Grover search can be parallelizedFootnote 7 and sped up by the factor of \(\sqrt{S/S_F}\), and the attack time complexity becomes

$$\begin{aligned} \left( T_F \cdot ({\pi }/{4})\sqrt{{1}/{p}}\right) /\sqrt{S/S_F} = T_F\cdot ({\pi }/{4}) \cdot \sqrt{S_F/pS}. \end{aligned}$$
(1)

Let n be the output length of the hash function. Since the time complexity of the generic attack is \(2^{n/2}/S\) when a quantum computer of size S is available, our attack is valid as long as

$$\begin{aligned} T_F \cdot ({\pi }/{4}) \cdot \sqrt{S_F/pS} < 2^{n/2}/S \end{aligned}$$
(2)

holds.

Remark 6

When we run the same procedure in the classical setting, the Grover search is replaced with the usual exhaustive search and the attack time complexity will be \((T_F \cdot S_F)/p\) (here we do not consider parallelizations for simplicity). Since the generic complexity is \(2^{n/2}\), the attack becomes valid if and only if \((T_F \cdot S_F)/p < 2^{n/2}\), which is equal to

$$\begin{aligned} p > (T_F \cdot S_F)/ 2^{n/2}. \end{aligned}$$
(3)

In particular, the classical attack is invalid if \(p < 2^{-n/2}\). On the other hand, the condition (2) is equivalent to \(p > S_F \cdot (\pi ^2/16) \cdot T^2_F /2^n\) (when \(S=1\)), and the quantum attack may be valid even if \(p < 2^{-n/2}\).

5 Quantum Collision Attack on 38-Step SHA-256

This section shows a quantum collision attack on 38-step SHA-256 based on the attack idea in Sect. 4.2.

Let \((M,M')\) and \(h_0\) be the semi-free-start collision and the initial value shown in the previous work (i.e., \((M,M')\) and \(h_0\) in Table 7 of this paper’s full version [15]). Let \(W_j\) and \(W'_j\) denote message word j associated with M and \(M'\), respectively. Recall that the local collision starts at step 7 in the differential characteristic in Table 4. Let \(S_{\mathrm {start}}\) be the state at the beginning of step 7 that is computed from \((M,M')\) and \(h_0\).

Section 5.1 provides some observations on Step II of the quantum attack. Section 5.2 provides an implementation of F and analyzes the depth and width of the circuit of F. In Sect. 5.3 we analyze the total complexity when the quantum attack is mounted with the implementation of F in Sect. 5.2.

5.1 Observation on Step II

We provide two observations.

First Observation. The internal state variables \(A_{-1},\ldots ,A_6\) and \(E_3,\ldots ,E_6\) are determined from \(S_{\mathrm {start}} = A_{6}|| \cdots || A_3 || E_6 || \cdots || E_3\). (These variables are common between M and \(M'\). See also Fig. 4 of this paper’s full version [15].) There exists a tuple \((\hat{W}_0,\ldots ,\hat{W}_6)\) that is compatible with \(\mathrm {IV}_{\mathrm {second}}\) and \(S_{\mathrm {start}}\) if and only if \(A_{-1}\) matches the most significant 32 bits of \(\mathrm {IV}_{\mathrm {second}}\). If \(A_{-1}\) matches, \(A_{-2},A_{-3},A_{-4}, E_{-1},\ldots ,E_{-4}\) are determined by the equation \(\mathrm {IV}_{\mathrm {second}} = A_{-1} || \cdots || A_{-4} ||E_{-1}|| \cdots ||E_{-4}\), and the message words \(\hat{W}_0,\ldots ,\hat{W}_6\) are uniquely determined from \(A_{-4},\ldots ,A_6\) and \(E_{-4},\ldots ,E_{-1},E_3,\ldots ,E_6\).

Second Observation. By exhaustively checking all the possible values for \(\hat{W}_6 \in \{0,1\}^{32}\), we verified that there exist 1179647 \((>2^{20})\) tuples \((\hat{W}_0,\ldots ,\hat{W}_6)\) that satisfy the following conditions.Footnote 8

  1. (i)

    \(\hat{W}_j = W_j - (\sigma _0(\hat{W}_{j+1}) - \sigma _0(W_{j+1}))\) holds for \(j=0,\ldots ,5\).

  2. (ii)

    \(S_{\mathrm {start}}\) and the messages \((\hat{M},\hat{M}')\) for the second block, where \(\hat{M} := \hat{W}_0|| \cdots || \hat{W}_6 ||W_7 || \cdots || W_{15}\) and \(\hat{M}' := \hat{W}_0 || \cdots || \hat{W}_6 ||W'_7 || \cdots || W'_{15}\), yield a collision at the end of the second block.

Remark 7

From another point of view, the second observation shows that we can make semi-free-start collisions for at least \(2^{20}\) initial values.

5.2 Implementation and Analysis of F

Below we provide an implementation of F and its analysis. In particular, we show \(T_F \le 6.8\) and \(S_F \le 3.9\), where \(S_F\) denotes the width of the quantum circuit of F and \(T_F\) denotes the running time (depth) of the circuit.

Implementation of F: Basic Idea. Before describing a formal implementation of F with notations of quantum computation, we give a basic idea behind the implementation.

First, we compute the following values (from Table 7 of this paper’s full version [15]) and store them into memory.

  1. (a)

    The internal state \(S_{\mathrm {start}}\) at the beginning of step 7.

  2. (b)

    The message words \(W_0=W'_0,\ldots ,W_6=W'_6,W_7,\ldots ,W_{21},W'_7,\ldots W'_{21}\).

  3. (c)

    The internal state variable \(A_{-1}\) that is uniquely determined from \(S_{\mathrm {start}}\).

Note that these values are computed and stored before the start and kept unchanged throughout the attack.

Given an input \(\tilde{M}\), the output value \(F(\tilde{M})\) is computed as follows.

  1. 1.

    Compute the output of the first block from \(\tilde{M}\), and let \(\mathrm {IV}_{\mathrm {second}}\) denote the output.

  2. 2.

    Check if the condition that the most significant 32 bits of \(\mathrm {IV}_{\mathrm {second}}\) is equal to \(A_{-1}\) is satisfied. If it is satisfied, proceed to the next step. Otherwise output 0 and abort.

  3. 3.

    Compute the unique \((\hat{W}_0,\ldots ,\hat{W}_6)\) that is compatible with \(\mathrm {IV}_{\mathrm {second}}\) and \(S_{\mathrm {start}}\).

  4. 4.

    Check if the following conditions are satisfied.

    1. (i)

      \(\hat{W}_j = W_j - (\sigma _0(\hat{W}_{j+1}) - \sigma _0(W_{j+1}))\) holds for \(j=0,\ldots ,5\).

    2. (ii)

      \(S_{\mathrm {start}}\) and the messages \((\hat{M},\hat{M}')\) yield a collision at the end of the second block, where \(\hat{M} := \hat{W}_0|| \cdots || \hat{W}_6 || W_7 || \cdots || W_{15}\) and \(\hat{M}' := \hat{W}_0 || \cdots || \hat{W}_6 || W'_7 || \cdots || W'_{15}\).

    If both of (i) and (ii) are satisfied, output 1. Otherwise output 0.

Remark 8

When we implement a quantum circuit, each computational step has to be reversible, and the running time of the circuit has to be independent from inputs. We ignored such properties in the above explanations for simplicity but they are taken into account in the formal description below.

Implementation of F: Formal Description. Let L be the list to store the values explained in (a)–(c) above, and f be the 38-step compression function. Given an input \(\tilde{M}\), the output value \(F(\tilde{M})\) is computed as follows.

  1. 0.

    At the beginning, the quantum state is \(\mathinner {|{\tilde{M}}\rangle }\mathinner {|{L}\rangle }\mathinner {|{y}\rangle }\). (\(\mathinner {|{y}\rangle }\) is the single qubit register where the value \(F(\tilde{M})\) will be added.)

  2. 1.

    Compute the output of the first block from \(\tilde{M}\). Let \(\mathrm {IV}_{\mathrm {second}}\) denote the output. Check if \(A_{-1}\) is equal to the most significant 32 bits of \(\mathrm {IV}_{\mathrm {second}}\). If they are equal, set \(b := 1\). If they are not equal, set \(b := 0\). The current quantum state is \( \mathinner {|{\tilde{M}}\rangle }\mathinner {|{L}\rangle } \mathinner {|{y}\rangle }\otimes \mathinner {|{\mathrm {IV}_{\mathrm {second}}}\rangle }\mathinner {|{b}\rangle }. \)

  3. 2.

    Let \(\mathrm {IV}'_{\mathrm {second}}\) denote the concatenation of \(A_{-1}\) and the least significant 224 bits of \(\mathrm {IV}_{\mathrm {second}}\) (\(\mathrm {IV}'_{\mathrm {second}} = \mathrm {IV}_{\mathrm {second}}\) holds if \(b=1\)). Compute the unique \((\hat{W}_0,\ldots ,\hat{W}_6)\) that is compatible with the initial chaining value \(\mathrm {IV}'_{\mathrm {second}}\) and \(S_{\mathrm {start}}\). The current quantum state is \( \mathinner {|{\tilde{M}}\rangle }\mathinner {|{L}\rangle } \mathinner {|{y}\rangle }\otimes \mathinner {|{\mathrm {IV}_{\mathrm {second}}}\rangle } \mathinner {|{b}\rangle }\mathinner {|{\hat{W}_0,\ldots ,\hat{W}_6}\rangle }.\)

  4. 3.

    Let \(\hat{M}\) denote \(\hat{W}_0|| \cdots || \hat{W}_6 ||W_7 || \cdots || W_{15}\) and \(\hat{M}'\) denote \(\hat{W}_0 || \cdots || \hat{W}_6 ||W'_7 || \cdots || W'_{15}\). Compute the values \(f(\mathrm {IV}'_{\mathrm {second}},\hat{M})\), \(f(\mathrm {IV}'_{\mathrm {second}},\hat{M}')\). The current quantum state is \( \mathinner {|{\tilde{M}}\rangle }\mathinner {|{L}\rangle }\mathinner {|{y}\rangle } \otimes \mathinner {|{\mathrm {IV}_{\mathrm {second}}}\rangle } \mathinner {|{b}\rangle } \mathinner {|{\hat{W}_0,\ldots ,\hat{W}_6}\rangle } \mathinner {|{f(\mathrm {IV}'_{\mathrm {second}},\hat{M})}\rangle } |f( \mathrm {IV}'_{\mathrm {second}},\hat{M}')\rangle . \)

  5. 4.

    Recall that \(F(\tilde{M})=1\) if and only if \(b=1\) and the following (i) and (ii) hold.

    1. (i)

      \(f(\mathrm {IV}'_{\mathrm {second}},\hat{M}) = f(\mathrm {IV}'_{\mathrm {second}},\hat{M}')\).

    2. (ii)

      \(\hat{W}_j = W_j - (\sigma _0(\hat{W}_{j+1}) - \sigma _0(W_{j+1}))\) holds for \(j=0,\ldots ,5\).

    Compute \(F(\tilde{M})\) by checking if \(b=1\), and (i) and (ii) hold, and add the value \(F(\tilde{M})\) to the \(\mathinner {|{y}\rangle }\) register. The current quantum state is \( \mathinner {|{\tilde{M}}\rangle }\mathinner {|{L}\rangle } \mathinner {|{y \oplus F(\tilde{M})}\rangle } \otimes \mathinner {|{\mathrm {IV}_{\mathrm {second}}}\rangle } \mathinner {|{b}\rangle } \mathinner {|{\hat{W}_0,\ldots ,\hat{W}_6}\rangle }\mathinner {|{f(\mathrm {IV}'_{\mathrm {second}},\hat{M})}\rangle }\mathinner {|{f(\mathrm {IV}'_{\mathrm {second}},\hat{M}')}\rangle }. \)

  6. 5.

    Uncompute Steps 1–3 to obtain \( \mathinner {|{\tilde{M}}\rangle }\mathinner {|{L}\rangle } \mathinner {|{y \oplus F(\tilde{M})}\rangle }. \)

Analysis. We regard that the unit of depth (resp., width) of quantum circuits is the depth (resp., width) required to implement 38-step SHA-256 that takes 1-block inputs. In particular, we regard that the depth required to compute a single step of SHA-512 is equal to 1/38. Since the input length of 1-block SHA-256 is 512 bits and the output length is 256 bits, at least \(512 + 256 = 768\) qubits are required to implement the function on a quantum circuit.

Depth (\(T_F\)). Step 1 of the implementation computes the compression function once. The depth required to Step 2 is 7/38 since the message words in the first 7 steps in the second block are computed in Step 2. Step 3 computes the compression function twice. The cost of Step 4 is dominated by the computation for (ii), of which cost is at most 6 steps of SHA-256. Thus the depth required for Step 4 is at most 6/38. In summary, the depth required to implement Steps 1–4 is \(1 + 7/38 + 2 + 6/38 \le 3.4\). Since we have to perform uncomputations in Step 5, we have \(T_F \le 3.4 \times 2 = 6.8.\)

Width (\(S_F\)). The length of \(\tilde{M}\) is 16 words. L contains data of \(8 + (7 + 15 + 15) + 1 = 46\) words in total. y is a single bit. Thus, \((16+46) \times 32 + 1 =62 \times 32 + 1\) qubits are used in Step 0 of the implementation. Step 1 requires additional \(8 \times 32 + 1\) qubits to store \(\mathrm {IV}_{\mathrm {second}}\) and b. Step 2 requires additional \(7 \times 32\) qubits to store \(\hat{W}_0,\ldots ,\hat{W}_6\). Step 3 requires additional \((8 + 8) \times 32 = 16 \times 32\) qubits to store \(f(\mathrm {IV}_{\mathrm {second}},\hat{M})\) and \(f(\mathrm {IV}_{\mathrm {second}},\hat{M}')\). Therefore, to store intermediate values shown in the above implementation, \((62 + 8 + 7 + 16) \times 32 + 2 = 2978\) qubits are used in total. Hence we have \( S_F \le 2978/768 \le 3.9. \)

Remark 9

On the estimation of the width \(S_F\), more ancilla qubits may be required to compute the intermediate variables (such as \(\mathrm {IV}_{\mathrm {second}}\)) used in the implementation of F. However, we expect that they will be as much ancilla qubits as required to implement 1-block 38-step SHA-256. In particular, we expect that the ratio between the number of qubits to implement F and the number of qubits to implement 1-block 38-step SHA-256 will be about 3.9, even if we take the ancilla qubits to compute the intermediate variables into account.

Remark 10

Note that we could remove \(\mathinner {|{L}\rangle }\) from the computation since L is a list of classical data and computations that depend on L can be executed by classically controlling the gates. However, this has no consequence on the Time-memory tradeoff sine it is just converting qubits into classical bits.

5.3 Total Complexity

This section analyzes the total complexity when the quantum attack in Sect. 4.2 is mounted with the implementation of F in Sect. 5.2.

Let p denote the probability that \(F(\tilde{M})=1\) holds when \(\tilde{M}\) is randomly chosen. \(F(\tilde{M})=1\) holds if and only if \(\tilde{M}\) satisfies the conditions in the second and fourth steps of the implementation of F. A random \(\tilde{M}\) satisfies the condition in the second step with probability \(2^{-32}\). From the observation on Step II in Sect. 5.1, (i) and (ii) in the fourth step are satisfied with probability \(\left( 1179647/(2^{32})^7\right) > 2^{20}/2^{224} \). Therefore

$$\begin{aligned} p = 2^{-32} \cdot \left( 1179647/(2^{32})^7\right) > 2^{-32} \cdot \left( 2^{20}/2^{224}\right) = 2^{-236} \end{aligned}$$
(4)

holds.

The attack time complexity can be computed as in Eq. (1). We showed \(T_F \le 6.8\) and \(S_F \le 3.9\) in Sect. 5.2, and \(p > 2^{-236}\) in (4). Therefore, when a quantum computer of size S is available, our attack finds a collision in time \(6.8 \cdot (\pi /4) \sqrt{3.9 /(2^{-236} \cdot S) } = \frac{6.8\pi \sqrt{3.9}}{4} \cdot 2^{118} /\sqrt{S} \le 2^{122}/\sqrt{S}\). In addition, the attack time complexity \(2^{122}/\sqrt{S}\) is lower than the generic complexity \(2^{128}/S\) when \(S < 2^{12}\). Therefore our attack is valid as long as \(3.9 \le S < 2^{12}\).

Remark 11

Some may consider that our complexity analysis is invalid since the first equality in (4) holds only if the output distribution of the first block is exactly equal to the uniform distribution over \(\{0,1\}^{256}\), which will not be the case for 38-step SHA-256. However, still we can reasonably expect that the analysis is valid. See Section D of this paper’s full version [15] for details.

6 Quantum Collision Attack on 39-Step SHA-512

This section shows a quantum collision attack on 39-step SHA-512 based on the attack idea in Sect. 4.2.

Let \((M,M')\) and \(h_0\) be the semi-free-start collision and the initial value shown in the previous work (i.e., \((M,M')\) and \(h_0\) in Table 8 of this paper’s full version [15]). Let \(W_j\) and \(W'_j\) denote message word j associated with M and \(M'\), respectively.

The difference between the attack on 39-step SHA-512 from the one on 38-step SHA-256 is summarized as follows.

  1. 1.

    The local collision starts from step 8 but not step 7 (we denote the internal state at the beginning of step 8 by \(S_{\mathrm {start}}\)).

  2. 2.

    The probability \(p~(= |F^{-1}(1)|/2^{512})\) satisfies \(p > 2^{-498.4}\).

  3. 3.

    The implementation of F satisfies \(T_F \le 6.8\) and \(S_F \le 4.1\).

The attack finds a collision in time \(2^{252.7}/\sqrt{S}\), which is valid when \(4.1< S < 2^{6.6}\).

Section 6.1 provides some observations on Step II of the quantum attack. Section 6.2 provides an implementation of F and analyzes the depth and width of the circuit of F. In Sect. 6.3 we analyze the total complexity when the quantum attack is mounted with the implementation of F in Sect. 6.2.

6.1 Observation on Step II

We provide two observations.

First Observation. Given a chaining initial input value \(\mathrm {IV}_{\mathrm {second}}\), there always exists a unique tuple \((\hat{W}_0,\ldots ,\hat{W}_7)\) that is compatible with \(\mathrm {IV}_{\mathrm {second}}\) and \(S_{\mathrm {start}}\). This is because the local collision starts at step 8 in the differential characteristic for 39-step SHA-512 (see also Fig. 4 of this paper’s full version [15] for details).

Second Observation. We experimentally verified that there exist 13184 (\({>}{2^{13.6}}\)) tuples \((\hat{W}_0,\ldots ,\hat{W}_7)\) that satisfies the following conditions.

  1. (i)

    \(\hat{W}_j = W_j - (\sigma _0(\hat{W}_{j+1}) - \sigma _0(W_{j+1}))\) holds for \(j=0,\ldots ,6\).

  2. (ii)

    \(S_{\mathrm {start}}\) and the messages \((\hat{M},\hat{M}')\), where \(\hat{M} := \hat{W}_0|| \cdots || \hat{W}_7 ||W_8 || \cdots || W_{15}\) and \(\hat{M}' := \hat{W}_0 || \cdots || \hat{W}_7 ||W'_8 || \cdots || W'_{15}\), yield a collision at the end of the second block.

  3. (iii)

    \(\hat{W}_{23,j} = {W}_{23,j}\) for \(j=5,\ldots ,29\), where \(\hat{W}_{23,j}\) and \({W}_{23,j}\) are bit j of message word 23 derived from \(\hat{M}\) and M, respectively.

The condition (iii) is added to decrease the search space for \(\hat{W}_7\). We chose bit 5, bit 6, \(\ldots \), bit 29 because the differential characteristic (Table 5 of this paper’s full version [15]) has strict conditions on these bit positions of \(E_{23}\).

Remark 12

From another view of point, the second observation shows that we can make semi-free-start collisions for at least \(2^{13.6}\) initial values.

6.2 Implementation and Analysis of F

In what follows we provide description and analysis of the implementation of F used in the attack on 39-step SHA-512 and show \(T_F \le 6.8\) and \(S_F \le 4.1\).

Implementation of F: Basic Idea. Since the basic idea of the implementation is similar to that for 38-step SHA-256 in Sect. 5.2, here we only provide the difference from Sect. 5.2.

In the attack on 39-step SHA-512, there always exists a unique tuple \((\hat{W}_0,\ldots , \hat{W}_7)\) that is compatible with \(\mathrm {IV_{second}}\) and \(S_{\mathrm {start}}\) for arbitrary \(\mathrm {IV_{second}}\) due to the first observation in Sect. 6.1. Therefore we skip the step (in the implementation of F in Sect. 5.2) to check if \(A_{-1}\) is equal to the most significant 32 bits of \(\mathrm {IV_{second}}\).

Let \(\tilde{M}\) be a message for the first block, and \(\mathrm {IV}_\mathrm {second}\) be the initial vector for the second block that is computed from \(\tilde{M}\). We define \(F(\tilde{M}) := 1\) if and only if the conditions (i)–(iii) in the second observation in Sect. 6.1 are satisfied for the unique tuple \((\hat{W}_0,\ldots , \hat{W}_7)\) that is compatible with \(\mathrm {IV}_\mathrm {second}\) and \(S_{\mathrm {start}}\).

Formal Implementation of F. First, we compute the following values (from Table 8 of this paper’s full version [15]) and store them into a list L.

  1. (a)

    The internal state \(S_{\mathrm {start}}\) at the beginning of step 8.

  2. (b)

    The message words \(W_0=W'_0,\ldots ,W_7=W'_7,W_8,\ldots ,W_{22},W'_8,\ldots W'_{22},W_{23}\).

Given an input \(\tilde{M}\), the output value \(F(\tilde{M})\) is computed as follows.

  1. 0.

    At the beginning, the quantum state is \(\mathinner {|{\tilde{M}}\rangle }\mathinner {|{L}\rangle }\mathinner {|{y}\rangle }\). (\(\mathinner {|{y}\rangle }\) is the single qubit register where the value \(F(\tilde{M})\) will be added.)

  2. 1.

    Compute the output of the first block from \(\tilde{M}\). Let \(\mathrm {IV}_{\mathrm {second}}\) denote the output. The current quantum state is \( \mathinner {|{\tilde{M}}\rangle }\mathinner {|{L}\rangle }\mathinner {|{y}\rangle } \otimes \mathinner {|{\mathrm {IV}_{\mathrm {second}}}\rangle }. \)

  3. 2.

    Compute the unique \((\hat{W}_0,\ldots ,\hat{W}_7)\) that is compatible with \(\mathrm {IV}_{\mathrm {second}}\) and \(S_{\mathrm {start}}\). The current quantum state is \( \mathinner {|{\tilde{M}}\rangle }\mathinner {|{L}\rangle }\mathinner {|{y}\rangle } \otimes \mathinner {|{\mathrm {IV}_{\mathrm {second}}}\rangle }\mathinner {|{\hat{W}_0,\ldots ,\hat{W}_7}\rangle }.\)

  4. 3.

    Let \(\hat{M}\) denote \(\hat{W}_0|| \cdots || \hat{W}_7 ||W_8 || \cdots || W_{15}\) and \(\hat{M}'\) denote \(\hat{W}_0 || \cdots || \hat{W}_7 ||W'_8 || \cdots || W'_{15}\). Compute \(f(\mathrm {IV}_{\mathrm {second}},\hat{M})\), \(f(\mathrm {IV}_{\mathrm {second}},\hat{M}')\), and \(\hat{W}_{23}\), where \(\hat{W}_{23}\) is word 23 derived from \(\hat{M}\). The current quantum state is \( \mathinner {|{\tilde{M}}\rangle }\mathinner {|{L}\rangle }\mathinner {|{y}\rangle } \otimes \mathinner {|{\mathrm {IV}_{\mathrm {second}}}\rangle }\mathinner {|{\hat{W}_0,\ldots ,\hat{W}_7}\rangle } \mathinner {|{f(\mathrm {IV}_{\mathrm {second}},\hat{M})}\rangle } \mathinner {|{f(\mathrm {IV}_{\mathrm {second}},\hat{M}')}\rangle }\mathinner {|{\hat{W}_{23}}\rangle }. \)

  5. 4.

    Recall that \(F(\tilde{M}):=1\) if and only if the following (i)–(iii) hold.

    1. (i)

      \(f(\mathrm {IV}_{\mathrm {second}},\hat{M}) = f(\mathrm {IV}_{\mathrm {second}},\hat{M}')\).

    2. (ii)

      \(\hat{W}_j = W_j - (\sigma _0(\hat{W}_{j+1}) - \sigma _0(W_{j+1}))\) holds for \(j=0,\ldots ,6\).

    3. (iii)

      \(\hat{W}_{23,j} = W_{23,j}\) holds for \(j=5,\ldots ,29\)

    Compute \(F(\tilde{M})\) by checking if (i)–(iii) hold, and add the value \(F(\tilde{M})\) to the \(\mathinner {|{y}\rangle }\) register. The current quantum state is \( \mathinner {|{\tilde{M}}\rangle }\mathinner {|{L}\rangle } \mathinner {|{y \oplus F(\tilde{M})}\rangle }\otimes \mathinner {|{\mathrm {IV}_{\mathrm {second}}}\rangle }\mathinner {|{\hat{W}_0,\ldots ,\hat{W}_7}\rangle }\mathinner {|{f(\mathrm {IV}_{\mathrm {second}},\hat{M})}\rangle }\mathinner {|{f(\mathrm {IV}_{\mathrm {second}},\hat{M}')}\rangle }\mathinner {|{\hat{W}_{23}}\rangle }. \)

  6. 5.

    Uncompute Steps 1–3 to obtain \( \mathinner {|{\tilde{M}}\rangle }\mathinner {|{L}\rangle }\mathinner {|{y \oplus F(\tilde{M})}\rangle }. \)

Analysis. We regard that the unit of depth (resp., width) of quantum circuits is the depth (resp., width) required to implement 39-step SHA-512 that takes 1-block inputs. In particular, we regard that the depth required to compute a single step of SHA-512 is equal to 1/39. Since the input length of 1-block SHA-512 is 1024 bits and the output length is 512 bits, at least \(1024 + 512 = 1536\) qubits are required to implement the function on a quantum circuit.

Depth (\(T_F\)). Step 1 of the implementation computes the compression function once. Since the message words in the first 8 steps in the second block are computed in Step 2, the depth required for Step 2 is 8/39. Step 3 computes the compression function twice. The cost of Step 4 is dominated by the computation for (ii), which is at most 7 steps of SHA-512. Thus the depth required for Step 4 is at most 7/39. In summary, the depth required to implement Steps 1–4 is \(1 + 8/39 + 2 + 7/39 \le 3.4\). Since we have to perform uncomputations in Step 5, we have \(T_F \le 3.4 \times 2 = 6.8.\)

Width (\(S_F\)). The length of \(\tilde{M}\) is 16 words. L contains data of \(8 + (8 + 15 + 15 + 1) = 47\) words in total. y is a single bit. Thus, \((16 + 47) \times 64 + 1 = 63 \times 64 + 1\) qubits are used in Step 0 of the implementation. Step 1 requires additional \(8 \times 64\) qubits to store \(\mathrm {IV}_{\mathrm {second}}\). Step 2 requires additional \(8 \times 64\) qubits to store \(\hat{W}_0,\ldots ,\hat{W}_7\). Step 3 requires additional \((8 + 8 + 1) \times 64 = 17 \times 64\) qubits to store \(f(\mathrm {IV}_{\mathrm {second}},\hat{M})\), \(f(\mathrm {IV}_{\mathrm {second}},\hat{M}')\), and \(\hat{W}_{23}\). Therefore, to store intermediate values shown in the above implementation, \((63 + 8 + 8 + 17) \times 64 + 1 = 6145\) qubits are used in total. Hence we have \( S_F \le 6145/1536 \le 4.1. \)

Remark 13

On the estimation of the width \(S_F\), more ancilla qubits may be required to compute the intermediate variables (such as \(\mathrm {IV}_{\mathrm {second}}\)) used in the implementation of F. However, we expect that they will be as much as ancilla qubits required to implement 2-block 39-step SHA-512. In particular, we expect that the ratio between the number of qubits to implement F and the number of qubits to implement 2-block 39-step SHA-512 will be as much as 4.1, even if we take the ancilla qubits to compute the intermediate variables into account.

6.3 Total Complexity

Let p denote the probability that \(F(\tilde{M})=1\) holds when \(\tilde{M}\) is randomly chosen. \(F(\tilde{M})=1\) holds if and only if the conditions (i)–(iii) in the fourth step of the implementation of F are satisfied. From the second observation on Step II in Sect. 6.1, (i)–(iii) are satisfied with probability at least \(2^{13.6}/(2^{64})^8\). Therefore \(p > 2^{13.6}/(2^{64})^8 = 2^{-498.4}\) holds.

The attack time complexity can be computed as in Eq. (1). We showed \(T_F \le 6.8\) and \(S_F \le 4.1\) in Sect. 6.2, and \(p > 2^{-498.4}\) above. Therefore, when a quantum computer of size S is available, our attack finds a collision in time \( 6.8 \cdot (\pi /4) \sqrt{4.1 /(2^{-498.4} \cdot S) } = \frac{6.8\sqrt{4.1}\pi }{4} \cdot 2^{249.2} /\sqrt{S} \le 2^{252.7}/\sqrt{S}\). In addition, the attack time complexity \(2^{252.7}/\sqrt{S}\) is lower than the generic complexity \(2^{256}/S\) when \(S < 2^{6.6}\). Therefore our attack is valid as long as \(4.1 \le S < 2^{6.6}\).

7 Discussion

The previous sections exploited the existing semi-free-start collision attacks to mount quantum collision attacks for SHA-256 and SHA-512. This brings the following two questions. First, is it possible to optimize differential characteristics for the classical semi-free-start collision attack with respect to the conversion to the quantum collision attack? Second, is it possible to extend the conversion framework so that a wider class of the classical attack on other computation structure can be converted into a quantum collision attack? This section answers those questions. We hope those will provide future researchers with useful knowledge to find new quantum collision attacks.

7.1 Towards Searching for New Semi-Free-Start Collision Attacks

The attacks on SHA-256 and SHA-512 in Sects. 5 and 6 directly used the differential characteristics from the previous works, but it is possible to search for new differential characteristics from scratch in future works to be optimized in our conversion. More importantly, differential characteristics that cannot be exploited in the classical setting may still be exploited in the quantum setting.

Properties Required for Differential Characteristics. Our conversion is applied when the differential characteristic for the semi-free-collision attack satisfies the following properties.

  • The characteristic is dense, i.e. requiring many conditions, only in a relatively small number of steps. Let \(FIX_{\text {start}}\) and \(FIX_{\text {end}}\) be the input and output state values of these steps, respectively.

  • For multiple choices of \(IV_{\text {second}}\), it is possible to modify message words \(W_0\) to \(W_{s-1}\) so that \(IV_{\text {second}}\) and \(FIX_{\text {start}}\) are connected.

  • The probability to satisfy the characteristic from \(FIX_{\text {end}}\) is high enough to be faster than the generic attack.

Given those, we can view that the characteristic is composed of three parts as shown in Fig. 3.Footnote 9

Fig. 3.
figure 3

Form of Semi-free-start Collision Attacks that can be Converted into Collisions.

Properties for the Sparse Part. In our attacks on SHA-256 and SHA-512, (almost) all the message words are fixed after modifying \(W_0\) to \(W_{s-1}\), thus degrees of freedom to satisfy the characteristic from \(FIX_{\text {end}}\) to the end is provided by generating the first message block many times, which requires significant computational cost. Besides, the probability from \(FIX_{\text {end}}\) is reasonably high, thus the attack procedure could be provided without special attention. Here we give a decent analysis with respect to the condition to be faster than the generic attack, which should be taken into account for finding new characteristic.

Suppose that the first k message words are independently chosen and the dense part of the characteristic is located between step s and \(j-1\), where \(0< s< j < k\). Then, after modifying \(W_0\) to \(W_{s-1}\) to connect \(IV_{\text {second}}\) and \(FIX_{\text {start}}\), the attacker can still have degrees of freedom in message words \(W_{j}\) to \(W_{k-1}\). Let \(2^{d}\) and \(2^{-p}\) be the amount of degrees of freedom available to the attacker and the probability of the differential characteristic in the remaining steps. If \(d \ge p\), degrees of freedom in \(W_{j}\) to \(W_{k-1}\) is sufficient, thus only the single choice of \(IV_{\text {second}}\) is sufficient to find a collision. This is advantageous because the cost of computing the first block directly impacts to the overall attack complexity. If \(d < p\), degrees of freedom in \(W_{j}\) to \(W_{k-1}\) is insufficient, thus the generation of \(IV_{\text {second}}\) in the first block must be repeated multiple times.

Whether the attack can be faster than the generic attack depends on the relationship between d and p. To evaluate the attack complexity, we first discuss the complexity in the classical setting. Let \(2^f\) be the complexity to generate an \(IV_{\text {second}}\).Footnote 10 In the classical setting, when \(d \ge p\), the attacker needs to generate a single choice of \(IV_{\text {second}}\) and examine \(2^p\) choices of message words for the last part of the characteristic. Hence, the attack complexity is \(2^{f}+2^{p}\). When \(d < p\), the attacker needs to generate \(2^{p-d}\) choices of \(IV_{\text {second}}\) and, for each of them, examine \(2^d\) choices of message words for the last part of the characteristic. Hence, the attack complexity is \(2^{f}\cdot 2^{p-d} + 2^{p-d} \cdot 2^{d}\), which is equal to \(2^{f}\cdot 2^{p-d} + 2^{p}\). To be a valid attack, this complexity must be faster than the generic attack complexity, which is \(2^{n/2}\) in the classical setting. Hence, \(\max (f,p) < n/2\) when \(d \ge p\) and \(\max (p,f+p-d) \le n/2\) when \(d < p\). The closed formula for both cases is \(\max (p,f + \max (p-d,0)) < n/2\). Therefore, \(p < n/2\) must be satisfied in the classical setting, namely, the probability of the differential characteristic for the last part cannot be smaller than \(2^{-n/2}\).

The complexity evaluation in the quantum setting is as follows. When \(d \ge p\), the cost for generating an \(IV_\mathrm {second}\) and examining \(2^p\) choices of message for the last part of the characteristic decreases to \(\sqrt{2^f}\) and \(\sqrt{2^p}\), respectively, by using the Grover search. Hence the attack complexity becomes \(\sqrt{2^f} + \sqrt{2^p}\). When \(d < p\), similarly we obtain quadratic speed up for each subroutine with the Grover search, and the attack complexity decreases to \(\sqrt{2^{f}\cdot 2^{p-d}} + \sqrt{2^{p}}\). A quantum attack can be valid in the cost metric of time-space tradeoff if its complexity is below \(2^{n/2}\), i.e., \(\max (p,f + \max (p-d,0)) < n\). In particular, a quantum attack can be valid even if \(p \ge n/2\), i.e., the probability of the differential characteristic for the last part can be smaller than \(2^{-n/2}\).

Note that \(d \ge p\) may occur in practice. In fact, the 31-step (not 38-step) semi-free-start collision attack by Mendel et al. against SHA-256 is exactly the case with \(d \ge p\). As introduced in Sect. 3.1, the authors of [29] explained that Step III succeeds with a probability about 1/12 due to the lack of degrees of freedom in \(W_{13},W_{14},W_{15}\). However, if it is analyzed carefully, 1/12 is a part of probability that the generated \(IV_{\text {second}}\) is suitable, i.e. there are additional condition of 3.5 bits besides the match of the most significant 96 bits. The sparse characteristic in Fig. 3 corresponds to the characteristic from Step 13 to 31 in Table 2. There are 28 conditions on \(\varDelta E_{13}\) to \(\varDelta E_{16}\), thus \(p=28\).Footnote 11 Degrees of freedom exist in all bits of \(W_{13}\) to \(W_{15}\), thus \(d=96\). Hence, this is the case with \(d \ge p\).

Remark 14

Roughly speaking, the attack of Sect. 5 (resp., Sect. 6) is the case with \(s=7\), \(j=22\), \(d=0\), \(p >20\), and \(f=32\) (resp., \(s=8\), \(j=23\), \(d=0\), \(p>13.6\), and \(f=0\)).

Suitable Choice of s. The step index s is the border between the first and the second part of the characteristic in Fig. 3, where the state value is fully fixed after Step s. Suppose that the length of each message word is w-bit and the internal state size of hash functions is \(t\cdot w\)-bit. (In the case of SHA-256, \(w=32\) and \(t=8\).) Then the parameter f increases much and the attack may not work if s is too small or too large compared to t. The reason is as follows.

If s is too small compared to t (e.g., \(s < t/2\)), then degrees of freedom in \(W_0,\ldots ,W_{s-1}\) become too small and the probability that a randomly chosen \(IV_\mathrm {second}\) can be connected to \(FIX_\mathrm {start}\) becomes too small. Hence the parameter f becomes too large.

If s is too large (e.g., \(s > 2t\)), then the degrees of freedom in \(W_0,\ldots ,W_{s-1}\) will remain enough. However, this time it may be unclear which choice of \(W_0,\ldots ,W_{s-1}\) is compatible with \(IV_\mathrm {second}\) and \(FIX_\mathrm {start}\) for a random given \(IV_\mathrm {second}\), and the complexity to find a compatible choice becomes high. This implies that the parameter f also increases in this case.

Therefore we expect that an index s that is close to t (e.g., \(t/2< s < 2t\)) will be suitable. (Indeed s is close to t in our attacks in Sects. 5 and 6.)

Remark on Memory. Memory is quite expensive in the quantum setting while cheap in the classical setting.Footnote 12 Thus differential characteristics that lead to memory-less attacks but seem non-optimal in the classical setting are worth investigating in the quantum setting.

7.2 Towards Application to Other Hash Functions

A natural question that arises after seeing our results is whether we can apply the same idea to other hash functions by using similar differential characteristics shown in previous works. In earlier sections we focused on semi-free-start collisions on SHA-256 and SHA-512, which are single-branch hash functions, but we do not have to restrict ourselves to semi-free-start collisions nor single-branch hash functions: Differential characteristics for free-start collisions or double-branch hash functions may also lead to quantum collision attacks if their structures are close to Fig. 3.Footnote 13

Indeed, some previous works use such differential characteristics. Examples are the semi-free-start collision attack on reduced HAS-160 in [26], the free-start collision attacks on reduced SHA-2 family in [7] and reduced SM3 in [28], and the semi-free-start collision attacks on full RIPEMD-128 in [21] and reduced RIPEMD-160 in [24, 25, 30]. The differential characteristics used in these attacks look similar to Fig. 3 and they are found by using automated search tools in a similar way to the differential characteristics that we used in Sects. 5 and 6.

We investigated whether we could use those differential characteristics to mount quantum 2-block collision attacks. We elaborate observations that we obtained so far in Section G of this paper’s full version [15]. Unfortunately, we have not succeeded yet, and with this respect, the analysis here is a failure report. Nevertheless, we believe that those are valuable to report because we observe that some of the applications are close to be valid collision attacks while others are very far. By sharing the experience of those analysis, it would be possible to search for new differential characteristics that satisfy properties in Sect. 7.1 in order to break more rounds than classical collision attack.

8 Concluding Remarks

In this paper, we showed collision attacks on 38 and 39 steps of SHA-256 and SHA-512, respectively, when the attacker can access to quantum machines under the time-space tradeoff metric. The complexity is \(2^{122}/\sqrt{S}\) and \(2^{252.7}/\sqrt{S}\) where \(S < 2^{12}\) and \(S < 2^{6.6}\) for SHA-256 and SHA-512, respectively.

Both attacks followed the same approach as the previous work, where a semi-free-start collision attack that works for \(2^X\) choices of IVs (\(X > \frac{n}{2}\)) is converted into a 2-block collision. We observed that even a small X may lead to an attack faster than the generic one.

A possible future direction is to study applications to other cryptographic hash functions. Since the idea behind our quantum collision attacks is very simple, we believe that it has broad applications. It will also be interesting to study optimizations of differential characteristics for the classical semi-free-start collision attack with respect to the conversion to the quantum collision attack.