Keywords

1 Introduction

Cryptographic hash functions are one of the most versatile primitives and have many practical applications like integrity checks, message authentication, digital signature or password protection. Often they are a critical part of more complex systems whose security might fall apart if hash a function does not provide the properties we expect it to have.

Cryptographic hash functions take as input a string of arbitrary finite length and produce a fixed-sized output of \(n\) bits called hash. As a consequence, thefollowing main security requirements are defined for cryptographic hash functions:

  • Preimage Resistance: For a given output \(y\) it should be computationally infeasible to find any input \(x'\) such that \(y=h(x')\).

  • Second Preimage Resistance: For given \(x,y=h(x)\) it should be computationally infeasible to find any \(x' \ne x\) such that \(y=h(x')\).

  • Collision Resistance: It should be computationally infeasible to find two distinct inputs \(x,x'\) such that \(h(x)=h(x')\).

For any ideal hash function with \(n\)-bit output size, we can find preimages or second preimages with a complexity of \(2^n\), and collisions with a complexity of \(2^{n/2}\) using generic attacks.

Most cryptographic hash functions are constructed iteratively by splitting the message into evenly sized blocks \(m_i\) and using a compression function \(f\) to update the state. We call the intermediate results \(x_i\) chaining values and the final output \(h\) hash value (Fig. 1).

Fig. 1.
figure 1

Iterative construction for a cryptographic hash function.

The security proofs for the hash function rely on the difficulty of finding a collision for this compression function, hence it is also of interest to consider the properties of the compression function and find properties which distinguish it from an ideal function.

  • semi-free start collision: Find \(x, m, m'\) such that \(f(x, m) = f(x, m')\).

  • free-start collision: Find \(x, x', m, m'\) such that \(f(x, m) = f(x', m')\).

  • near collision: Find \(x, m, m'\) such that \(f(x, m) \oplus f(x, m')\) has a low Hamming weight.

To sum up the various types with respect to their relevance: a semi-free-start collision is more interesting than a free-start collision, and a collision is more interesting than a near-collision.

1.1 Motivation

Cryptanalytic attacks are often hard to verify. Cryptanalysts often concentrate on the total running time of the attack, which is boiled down to a single number. While one can argue about the exact transition point between cryptanalytic attacks of practical and theoretical time complexity, it is often placed around an equivalent of \(2^{64}\) calls to the primitive [1]. While this is a reasonable assumption for state-level adversaries, it is out of reach for academic research labs. However, the ability to fully implement and verify attacks is crucial, as this is often the only way to make sure that all details are modelled correctly in the theoretical analysis. In this paper we therefore aim at attacks that can actually be executed (and verified) with limited budget computing resources.

In this paper we show a new practical attack on a class of AES-like hash functions. We show attacks on reduced round versions of the ISO/IEC 10118-3 standard Whirlpool [2] and the new Russian federal standard GOST R 34.11-2012 [3]. The model we consider is semi-free-start attacks on the compression function, which in contrast to the free-start attacks do not allow the attacker to choose different chaining values in a pair of inputs. This reduced degree of freedom makes the task of cryptanalysts harder, but is more relevant as it is closer to the actual use in the hash function.

1.2 Contribution

Despite a lot of attention on rebound-attacks of AES and AES-like primitives, we show that more improvements are possible in the inbound phase.

To the best of our knowledge, currently no practical attacks on reduced round GOST R have been published. However, there exists a practical 4-round free-start collision attack on the Whirlpool compression function [4]. It seems very hard to apply this specific attack directly to GOST R due to the extra round in the key schedule, which gives GOST R additional security against these free-start attacks.

In this paper we show a new method to carry out a 4-round practical attack on the Whirlpool and GOST R compression function. Additionally, and in contrast to many other attacks known on GOST R, we do not need the freedom to add half a round at the end to turn a near-collision into a collision. As the full hash function also does not end with a half round, we argue that a result on 4 rounds can actually be more informative than a result on 4.5 rounds.

New message modification technique. The attack is based on the rebound attack and start-in-the-middle techniques, and it carefully chooses the key input to significantly reduce the complexity resulting in a very low complexityFootnote 1. We are also able to improve the results on \(6.5\) rounds by extending this attack. We give an actual example for such a collision, and have the source code of both the attack and the general framework publicly available to facilitate further research on practical attacksFootnote 2. The method is not specific to a particular primitive, but is an algorithmic technique that however depends on two conditions in a primitive to hold (see also Sect. 4).

1.3 Related Work

In Table 1 we summarize the practical results on Whirlpool and GOST R. As the GOST R compression function uses a design similar to the Whirlpool hash function [2], many of the previous results on Whirlpool can be applied to GOST R. We would also like to note on adding half a round at the end for GOST R. This does not always make an attack more difficult, and in some cases it makes it easier, as it makes it possible to turn a near-collision into a collision, therefore we distinguish for our attacks if it applies for both cases.

Table 1. Summary of attacks with a complexity up to \(2^{64}\) on AES-based hash functions. Time is given in compression function calls and memory in bytes.

There have also been practical attacks on other AES-based hash functions like Maelstroem (6 out of 10 rounds [7]), Grøstl (6 out of 10 rounds [8]) and Whirlwind (4.5 out of 12 rounds [9]).

1.4 Rebound Attacks

The rebound attack is a powerful tool in the cryptanalysis of hash functions, especially for finding collisions for AES-based designs [10, 11]. The cipher is split into three sub-ciphers

$$\begin{aligned} E = E_{fw} \circ E_{in} \circ E_{bw} \end{aligned}$$

and the attack proceeds in two steps. First, the inbound phase which is an efficient meet-in-middle in \(E_{in}\) using the available degree of freedom. This is followed by a probabilistic part, the outbound phase in \(E_{fw}\) and \(E_{bw}\) using the solutions from the inbound phase. The basic 4-round rebound attack uses a differential characteristic with \(1-8-64-8-1\) active bytes per round and has a complexity of \(2^{64}\). There are many techniques extending and improving this attack. Some can even improve this very basic and simple setting of a square geometry, like start-from-the-middle [8], super S-Box [12, 13] or solving three fully active states in the middle [14, 15]. Other generic extensions exploit additional degrees of freedom or non-square geometries to improve results, like and using multiple inbounds [12, 16]. In these settings, improved list-matching techniques [17, 18] are also a generic improvement.

2 Description of GOST R

This section gives a short description of the GOST R compression function as we will use it for describing our attack in detail. As we are only looking at the compression function, we leave out some details not relevant for the upcoming attack in order to simplify the description. For a more detailed description of GOST R we refer to [3].

The compression function \(g\) uses two \(512\)-bit inputs (the message block \(m\) and the chaining value \(h\)) to update the state in the following way (see Fig. 2)

$$\begin{aligned} g_N(h, m) = E(L \circ P \circ S(h), m) \oplus h \oplus m \end{aligned}$$
(1)

where \(E\) is an AES-based block cipher using a state of \(8 \times 8\) bytes and \(S, P, L\) are the same functions as used in this block cipher (see below).

Fig. 2.
figure 2

An outline of the GOST R compression function. The chaining input is processed through an additional round before entering \(E\)

If we want to find a collision for the compression function, the following equation must hold

$$\begin{aligned} \Delta m_i \oplus \Delta h_i \oplus \Delta E(h_i, m_i) = 0 \end{aligned}$$
(2)

2.1 Block Cipher E

The block cipher \(E\) takes two \(512\)-bit inputs \(M\) and \(K^0\) and produces a \(512\)-bit output \(C\). The state update consists of \(12\) rounds \(r\) and a final key addition.

$$\begin{aligned} L_1&= L \circ P \circ S \circ AK(M, K^0)\\ L_{i+1}&= L \circ P \circ S \circ AK(L^i, K^i) \quad i=1\ldots 11\\ C&= AK(L^{12}, K^{12}) \end{aligned}$$

The following four operations are used in one round (see Fig. 3):

  • AK Adds the key byte-wise by XORing it to the state.

  • S Substitutes each byte of the state independently using an 8-bit S-Box.

  • P Transposes the state.

  • L Multiplies each row by an \(8\times 8\) MDS matrix.

The 512-bit key input is expanded to 13 subkeys \(K_0, \ldots , K_{12}\). This is done similar to the state update but AK is replaced with the addition of a round-dependent constant \(RC^r\).

$$\begin{aligned}\\ L_{i+1}&= L \circ P \circ S \circ AK(K^0, RC^0) \quad i=0\ldots 11\\ K^{12}&= AK(L^{12}, K^{12}) \end{aligned}$$
Fig. 3.
figure 3

The four operations used in one round of GOST R.

2.2 Notation

The notation we use for naming the states is:

  • The state after applying the round function \(\{AK, S, P, L\}\) in round \(r\) is named \(\{AK^r, S^r, P^r, L^r\}\)

  • The byte at position \(x, y\) in state \(X^r\) is named \(X^r_{x, y}\)

  • A row is denoted by \(X^r_{*, y}\) and a column by \(X^r_{x, *}\)

  • figure a

    and

    figure b

    denote that there is a difference in a byte.

  • figure c

    and

    figure d

    are used for highlighting values of a byte.

2.3 Differential Properties

The attacks in this paper are based on differential cryptanalysis, and the resulting complexity correlates with the differential properties of the round functions. Therefore, to ease understanding, we give a short overview of the properties that are relevant for our attack.

The linear layer \(L\) has a strong influence on the number of active S-Boxes. There is no proof given that the linear layer \(L\) is MDS or has a branch number of \(9\) in the GOST R reference [3], but it was shown that this is the case in [19]. Hence, if we have one active byte at the input we will get 8 active bytes at the output with probability one. If we have \(a\) active bytes at the input the probability that there will be \(b\) active bytes at the output under the condition \(a \ne 0, b \ne 0\) and \(a + b \ge 9\) is \(2^{(b-8)8}\).

The properties of the S-Box have a strong influence on the complexity of our attack, as will be seen later. Given a S-Box \(S: \mathbb {F}^n_2 \rightarrow \mathbb {F}^n_2\)

$$\begin{aligned} \{x\ |\ S(x) \oplus S(x \oplus a) = b\} \end{aligned}$$
(3)

is the number of solutions for an input \(a\) and output difference \(b\). Table 2 gives the number of solutions for some S-Box designs used in practice.

To get a bound on the probability of the differential characteristics we are interested in the maximum value of Eq. 3 which we will refer to as the maximum differential probability (mdp) of an S-Box. A 4-round differential characteristic has at least 81 active bytes due to the properties of the linear layer, therefore any 4-round characteristic has a probability of \(\le \!\!\text {mdp}^{81}\).

For the rebound attack it is also important to know the average number of possible output differences, given a non-zero input difference. We will refer to this as the average number of solutions (ANS) for an S-Box which can be computed by constructing the differential distribution table (DDT). The ANS corresponds to the average number of non-zero entries in each row of the DDT.

This property influences the complexity for the matching step in the inbound phase and increases the costs of finding one solution. For the GOST R S-Box we get on average \(107.05\) solutions.

Table 2. Comparison of different 8-bit S-Box designs used in AES-based hash functions.

3 Attack on GOST R

In this section we describe our 4-round practical attack in detail and also show how it can be applied to more rounds. The description of the attack is split into two parts. First, we find a differential characteristic leading to a collision. Then we show how to construct a message pair following this characteristic in a very efficient way.

Fig. 4.
figure 4

The 4-round differential characteristic used in our attack.

3.1 Constructing the Differential Characteristic

For our 4-round attack we use a characteristic of the form \(1-8-64-8-1\) (see Fig. 4). This truncated differential path has the minimal number of possible active S-Boxes for 4 rounds and is the starting point for many attacks. Next, we will determine the values of the differences before continuing with the construction of the message pair.

The approach we use for this is based on techniques from the rebound attack, like the start-in-the-middle technique used in [8]. This approach would also give us an efficient way to find both the characteristic and message pair for a characteristic of the form \(1-8-64-8\). However this would still lead to a higher attack complexity if extended to 4 rounds. Hence, we only use ideas from this approach to determine the differential characteristic and do not assume the key input as constant. Precomputation. First we pre-compute the differential distribution table (DDT) of the S-Box and we also construct a list \(M_{lin}\). This list contains all possible \(255\) non-zero values of \(P_{0,0}\) and the result after applying \(L\) (see Fig. 5).

Fig. 5.
figure 5

Computing list \(M_{lin}\) for all \(255\) values of \(P^0_{0, 0}\) (blue) to find all possible transitions from \(1\) to \(8\) bytes. Gray bytes are set to zero (Color figure online).

Construction

  1. 1.

    Start with a random difference in \(AK^4_{0,0}\) and propagate it back to \(S^2\) through the inverse round functions. For the linear steps this is deterministic, and for propagating through the S-Box we choose a random possible input difference to the given output difference. After this step we will have a full active state in \(S^2\).

  2. 2.

    For each difference in \(S^2\) we look up the set of all possible input differences from the DDT for each byte of the state.

  3. 3.

    Check for each row of \(AK^2\) whether there is a possible match with the rows stored in \(M_{lin}\) (see Fig. 6).

    • The probability that a single byte matches is \(107.05 / 255 \approx 2^{-1.252}\) therefore a row matches with a probability of \(2^{-10.018}\).

    • If we take into account that \(M_{lin}\) has \(255\) entries we expect to find a match with a probability of \(1 - (1 - 2^{-10.018})^{255} \approx 2^{-2.2}\).

    • Therefore the probability for a match of all 8 rows is given by

      $$\begin{aligned} (2^{-2.2}) ^ 8 = 2^{-17.6} \end{aligned}$$
      (4)
Fig. 6.
figure 6

The matching step in the middle is done on each row individually. There are \(2^8\) possible values for each row \(AK^2_{*, j}\) for \(j=0,1,\ldots , 7\).

After this step we have found a characteristic spanning from \(S^1\) to \(AK^4\). Now we have to repeat the previous process for a single row to find the right differences in \(AK^1\). This has a probability of \(2^{-2.2}\) of succeeding. Hence we need to repeat the whole process \(2^{19.8}\) times to obtain one solution.

Note that we can only choose \(255\) differences for \(AK^4_{0,0}\), but we can also freely choose from the set of possible differences when propagating from \(S^3\) to \(AK^3\). This gives us an additional \(107.05\) choices for each row in \(S^2\) leading to \(\approx 2^{54}\) possible values for the state \(S^2\). Hence, we have enough starting points for finding our differential characteristic.

3.2 Finding the Message Pair

Now we want to find a message pair which follows the previously constructed characteristic. At this point only the differences, but not the values of the state, are fixed. We start by fixing the values of \(AK^2\) such that the 64 differential transitions \(S^2 = S(AK^2)\) are fulfilled.

Next we use the key input to solve any further conditions on the active S-Boxes in order to lower the complexity of our attack. This step is split into solving the conditions on \(S^1_{*, 0} = S(AK^1_{*, 0})\) and \(S^3_{0, *} = S(AK^3_{0, *})\).

Fig. 7.
figure 7

The values of \(AK^2\) are fixed. We solve 7 of the conditions on \(S^1\) by using the freedom in \(K^2\) (bytes marked orange), which allows us to influence the values on the bytes in \(S^1\) (orange slash pattern).

Solving Conditions at the Start. We have 8 conditions on \(S^1_{*, 0}\) which we need to solve. These conditions can be solved row-wise by choosing the corresponding values in \(K^2\) such that \(P^{-1}(L^{-1}(AK^2 \oplus K^2)) = S^1\). We can do this step row-wise by solving a linear equation. As there is only a single byte condition for each row, we only need one byte in the corresponding row of \(K^2\) to solve the equation (see Fig. 7). The remaining bytes are fixed to arbitrary values as we have no further conditions to fulfill at this step. These bytes could be used to solve more conditions for other differential characteristics or to construct additional solutions, as we will do for extending the attack on more rounds.

In this step we can generate up to \(2^{56}\) solutions per row. Note that we only do this step for \(7\) rows, as we need the last row in the next step. Solving Conditions at the End. For solving the conditions \(S^3 = S(AK^3)\), we can use the bytes in \(K^2_{*, 7}\). These bytes form a column in \( \mathsf{KP} ^3_{7, *}\) (see Fig. 8), which allows us to solve a single byte condition per row for \(\text {AK}^3\).

  1. 1.

    Assume that \(K^2_{*, 0-6}\) are fixed and propagate them forward to \(\text {KP}^3\).

  2. 2.

    We can now solve the conditions for each row individually. In each row there are \(7\) bytes fixed in \(KP^3\) and a single byte in \(K^3\) (from \(AK^3\)). This gives us a linear equation with one solution per row and allows us to solve all conditions on \(AK^3\).

Fig. 8.
figure 8

Solving all the conditions on \(AK^3\). The orange values are fixed from the previous step and the purple values are used to fulfill the conditions on \(AK^3\).

Remaining Conditions. We still need to solve one byte condition on \(S^1_{0, 7}\), which can be done by repeating the previous procedure \(2^8\) times. The bytes which are used to solve the conditions on \(AK^3\) form a row in \(K^2\) and influence the values of \(L^1\) resp. \(P^1\) and \(S^1\) (see Fig. 11 in Appendix A). This implies that we can change the value of \(S^1_{0, 7}\) by constructing different solutions for \(K^2_{*,7}\).

The only remaining condition is \(\Delta AK^0_{0,0} = \Delta AK^4_{0,0}\), which can again be solved by repeating the previous steps \(2^8\) times. It follows that we need to repeat the algorithm shown in Sect. 3.2 about \(2^{16}\) times. Complexity. We can construct the differential characteristic with a complexity of \(2^{19.8}\). Finding a message pair following this characteristic requires \(2^{16}\) steps using our message modification technique. Hence, the total complexity of the attack is \(\approx \!\!2^{19.9}\). We have implemented this attack and verified our results. The un-optimized proof-of-concept implementation in Python is publily available [20]. An example for a 4-round collision can be found in Appendix B.

3.3 Extending the Attack

As we only need to control \(15\) bytes of the key, we can extend the attack on 6.5 rounds by using a characteristic of the form \(8-1-8-64-8-1-8\). In this case we would use the same approach to find the differential characteristic for 4 rounds and in the message modification part we would construct more solutions by using the additional freedom in the key. This will influence the differences at the input/output of the 6.5 rounds. The complexity of this attack is \(\approx \!\!2^{64}\), as the 8-byte difference at the input/output needs to be equal (Fig. 9).

Fig. 9.
figure 9

The 4-round attack is extended by one round in the beginning and one round in the end to mount an attack on 6.5 rounds.

4 Application to Other AES-based Hash Functions

The message modification technique presented is not specific to GOST R, but requires a few criteria to be met. First the transposition layer has to have the property that every byte of a single row/column is moved to a different row/column (see Fig. 10). This is true for all AES-based hash functions we consider in this paper, as it is a desired property against other attacks.

The second criteria is that there is a key addition in every round, hence our attack is applicable to both Whirlpool and GOST R. Permutation-based designs like Grøstl do not have this property. The attacker has less control of the input for each round, which makes the hash function more resistant against these types of attacks.

Fig. 10.
figure 10

The transposition layer used in Whirlpool and GOST R.

The complexity of the attack depends on the choice of the S-Box, as this directly influences the costs of constructing the differential characteristic. Given the average number of solutions \(\overline{s}\) for \(\Delta out = S(\Delta in)\) with a fixed value \(\Delta in\), this directly gives the complexity for the matching step of the attack

$$\begin{aligned} \left( 1 - \left( 1 - \frac{\overline{s}}{255}\right) ^{255}\right) ^8 \end{aligned}$$
(5)

and the number of possible states for \(S^2\) is \(\approx \overline{s}^{~8}\). A comparison of the different S-Boxes used in AES-based hash functions is given in Table 3.

Table 3. Comparing the maximum differential probability (MDP) and average number of solutions (ANS) for different 8-bit S-Boxes in AES-based designs.

5 Conclusion

In this paper, we have shown new practical attacks for both the Whirlpool and GOST R compression function. We presented a 4-round attack with very low complexity of \(2^{25.10}\) resp. \(2^{19.8}\). Importantly, the attack is fully verified and source-code for it is available. In the case of GOST R the attack can be extended to find collisions for \(6.5\) rounds with a complexity of \(2^{64}\) and for Whirlpool we can extend it to construct a near-collision in \(50\) bytes with a complexity of \(2^{25.10}\) for 6.5 rounds of the compression function. The difference in the results for GOST R and Whirlpool is due to the ShiftColumns operation which does not align the bytes to lead to a collision for the differential characteristic we use.

Our attack is applicable to all AES-based primitives where it is possible for the attacker to control the key input for a few rounds. This significantly reduces the complexity of previous attacks and might be useful to speed up other attacks on AES-based hash-function designs.