1 Introduction

Hash functions are one of the most important primitives in symmetric-key cryptography. They are simply functions that, given an input of variable length, produce an output of a fixed size. They are needed in several scenarios, like integrity check, authentication, digital signatures, so we want them to verify some security properties, for instance: preimage resistance, collision resistance (i.e., for an n-bit hash function, finding two distinct inputs mapping to the same output should require at least 2n/2 computations), second preimage resistance, and so on.

Since 2005, several new attacks on hash functions have appeared. In particular, the hash standards MD5 and SHA-1 were cryptanalyzed by Wang et al. [26, 27]. Due to the resemblance of the standard SHA-2 with SHA-1, the confidence in the former was also somewhat undermined. This is why the American National Institute of Standards and Technology (NIST) decided to launch in 2008 a competition in order to find a new hash standard, SHA-3. This competition received 64 hash function submissions and accepted 51 to enter the first round. Three years and two rounds later, only 5 hash functions remained in the final phase of the competition.

Among the candidates, many functions were AES-based (they reuse some AES components or the general AES design strategy), like the SHA-3 finalist Grøstl [6]. This design trend is at the origin of the introduction of the rebound attack [18], a new cryptanalysis technique that has been widely deployed, improved and applied to a large number of SHA-3 candidates, hash functions and other types of AES-based constructions (such as block ciphers in the known/chosen-key model). It has become one of the most important tools used to analyze the security margin of many SHA-3 candidates as well as their building blocks.

The rebound attack was proposed as a method to derive a pair of internal states that verifies some truncated differential path with lower complexity than a generic attack. It was formed by two steps: a first one, the controlled part (or inbound), where solutions for two rounds of an unkeyed AES-like permutation were found with negligible complexity, and a second one, uncontrolled part (or outbound), where the solutions found during the inbound phase were used to verify probabilistically the remaining differential transitions. Assuming an AES-like internal state composed of a t×t matrix of c-bit cells, the rebound attack was then extended to three rounds by the start-from-the-middle [17] and the SuperSBox variants [7, 14] for a negligible average complexity per found pair, but with a higher minimal complexity of 2tc computations. Since most rebound-based attacks actually required many such pairs, this was not much of a constraint. In parallel, other improvements on the truncated differential paths utilized [25] or on methods to merge lists [21] were proposed.

In this article, we describe a method based on lists merging in order to control truncated differences over four rounds of an unkeyed AES-like permutation [12] with complexity 2tcx computations, where x is a parameter depending on the differential path considered. While the cost per pair found in the controlled part is much increased, solving four rounds directly allows to handle much better truncated differential paths for the uncontrolled part. Note that whether it was possible or not to reach four rounds remained an open problem among the research community. We also generalize the global reasoning by considering as well non-fully-active truncated differential paths [25] during both the controlled and uncontrolled phases, eventually obtaining the best known results for many attack scenarios of an AES-like permutation.

As an application, we concentrated our efforts on the Grøstl internal permutation. Rebound-like attacks on this function have already been applied and improved in several occasions [7, 17, 19, 21, 24], Grøstl being one of the most studied SHA-3 candidates. When entering the final round, a tweak of the function was proposed, which prevents the application of the attacks from [24]. We denote Grøstl-0 the original submission [5] of the algorithm and Grøstl its tweaked version [6]. Apart from the rebound results, the other main analysis communicated on Grøstl is a higher order property on 10 rounds of its internal permutation [3] with a complexity of 2509 computations. In Table 1, we give a summary of the best known results on both the 256- and 512-bit tweaked versions of Grøstl, including the ones that we present in this article.

Table 1. Best attacks on targets where our analysis is applicable. By best analysis, we mean the ones on the highest number of rounds.

Namely, we provide the best known rebound distinguishers on 9 rounds of the internal permutation and we show how to make some nontrivial observations on the corresponding compression function, providing the best known analysis of the Grøstl compression function exploiting the properties of the internal permutations. For Grøstl-512, we considerably increase the current largest number of analyzed rounds, from 7 to 10. Additionally, we provide in Appendix the direct application of our new techniques to the AES-based hash function PHOTON [8].

These results do not threaten the security of Grøstl, but we believe they will play an important role in better understanding its security, and AES-based functions in general. In particular, we believe that our work will help determining the bounds and limits of rebound-like attacks in this type of constructions.

2 Generalities

In this section, we start by describing a generic view of an AES-like permutation to capture various cryptographic primitives such as AES [4], Grøstl [5], ECHO [2], Whirlpool [1], LED [9], or PHOTON [8].

2.1 Description of AES-like Permutations

We define an AES-like permutation as a permutation that applies N r rounds of a round function to update an internal state viewed as a square matrix of t rows and t columns, where each of the t 2 cells has a size of c bits. As we will show later, our techniques can also be adapted when the matrix is not square (as it is the case for Grøstl-512), but we focus on square matrices for ease of description.

The round function (Fig. 1) starts by xoring a round-dependent constant to the state in the AddRoundConstant operation (AC). Then, it applies a substitution layer SubBytes (SB) which relies on a c×c nonlinear bijective SBox. Finally, the round function performs a linear layer, composed of the ShiftRows transformation (SR), that moves each cell belonging to the x-th row by x positions to the left in its own row, and the MixCells transformation (MC), that linearly mixes all the columns C of the matrix separately by multiplying each one with a matrix M implementing a Maximum Distance Separable (MDS) code: CM×C.

Fig. 1.
figure 1

One round of the AES-like permutation instantiated with t=8.

Note that this description encompasses permutations that really follow the AES design strategy, but very similar designs (for example with a slightly modified ShiftRows function or with a MixCells layer not implemented with an MDS matrix) are likely to be attacked by our techniques as well. In the case of AES-like block ciphers analyzed in the known/chosen-key model, the subkeys generated by the key schedule are incorporated into the known constant addition layer AddRoundConstant. We note that all the rounds considered in this article are full rounds: they all have the MixCells transformation, even the last one as opposed to the full version of the AES.

2.2 Description of Grøstl

The hash function Grøstl-0 has been submitted to the SHA-3 competition under two different versions: Grøstl-0-256, which outputs a 256-bit digest and Grøstl-0-512 with a 512-bit one. For the final round of the competition, the candidate has been tweaked to Grøstl, with corresponding versions Grøstl-256 and Grøstl-512.

The Grøstl hash function handles messagesFootnote 1 by dividing them into blocks after some padding and uses them to update iteratively an internal state (initialized to a predefined IV) with a compression function. This function is itself built upon two different permutations, namely P and Q. Each of those two permutations are built upon the well-understood wide-trail strategy of the AES. As an AES-like Substitution-Permutation Network, Grøstl enjoys a strong diffusion in each of the two permutations and by its wide-pipe design, the size of the internal state is ensured to be at least twice as large as the final digest.

The compression function f 256 of Grøstl-256 uses two 256-bit permutations, P 256 and Q 256, which are similar to the two 512-bit permutations, P 512 and Q 512, used in the compression function f 512 of Grøstl-512. More precisely, for a chaining value h and a message block m, the compression function (Fig. 2) produces the output (⊕ denotes the XOR operation):

$$\begin{aligned} &f_{256}(h,m) = P_{256}(h\oplus m) \oplus Q_{256}(m) \oplus h,\quad \text{or:} \end{aligned}$$
(1)
$$\begin{aligned} &f_{512}(h,m) = P_{512}(h\oplus m) \oplus Q_{512}(m) \oplus h. \end{aligned}$$
(2)
Fig. 2.
figure 2

The compression function of Grøstl using the permutations P w and Q w , with w∈{256,512}.

The internal states are viewed as matrices of bytes of size 8×8 for the 256-bit version and 8×16 for the 512-bit one. The permutations strictly follow the design of the AES and are constructed as N r iterations of the composition of four basic transformations:

$$\begin{aligned} R \stackrel{\text{def}}{:=} \textit {MixCells} \ \circ\ \textit {ShiftBytes} \ \circ\ \textit {SubBytes} \ \circ\ \textit {AddRoundConstant} . \end{aligned}$$
(3)

All the linear operations are performed in the same finite field GF(28) as in the AES, defined via the irreducible polynomial x 8+x 4+x 3+x+1 over GF(2). The AddRoundConstant (AC) operation adds a predefined round-dependent constant, which significantly differs between P and Q to prevent the internal differential attack [24] that takes advantage of the similarities between P and Q. The SubBytes (SB) layer is the nonlinear layer of the round function R and applies the same SBox as in the AES to all the cells of the internal state. The ShiftBytes (Sh) transformation shifts cells in row i by τ P [i] positions to the left for permutation P and τ Q [i] positions for permutation Q. We note that τ also differs from P to Q to emphasize the asymmetry between the two permutations. Finally, MixCells (MC) is implemented in Grøstl by the MixBytes (Mb) operation that applies a circulant MDS constant matrix M independently to all the columns of the state. In Grøstl-256, N r =10, τ P =[0,1,2,3,4,5,6,7] and τ Q =[1,3,5,7,0,2,4,6], whereas for Grøstl-512, N r =14 and τ P =[0,1,2,3,4,5,6,11] and τ Q =[1,3,5,11,0,2,4,6].

Once all the message blocks of the padded input message have been processed by the compression function, a final output transformation is applied to the last chaining value h to produce the final n-bit hash value h′=trunc n (P(h)⊕h), where trunc n only keeps the last n bits.

2.3 Distinguishers

In this article, we describe algorithms that find input pairs (X,X′) for an AES-like permutation P, such that the input difference Δ IN =XX′ belongs to a subset of size IN and the output difference Δ OUT =P(X)⊕P(X′) belongs to a subset of size OUT. The best known generic algorithm (this problem is different than the one studied in [14] where linear subspaces are considered) in order to solve this problem, known as limited-birthday problem, has been given in [7] and later a very close lower bound has been proven in [22]. For a randomly chosen n-bit permutation π, the generic algorithm can find such a pair with complexity

$$\begin{aligned} \max \bigl\{\min \bigl\{\sqrt{2^n/\mathit{IN}},\sqrt{2^n/ \mathit{OUT}} \bigr\},2^n/(\mathit{IN} \cdot \mathit{OUT}) \bigr\}. \end{aligned}$$
(4)

If one is able to describe an algorithm requiring less computation power, then we consider that a distinguisher exists on the permutation π.

In the case of Grøstl, it is also interesting to look at not only the internal permutations P and Q, but also the compression function f itself. For that matter, we will generate compression function input values (h,m) such that Δ IN =mh belongs to a subset of size IN, and such that Δ IN Δ OUT =f(h,m)⊕f(m,h)⊕hm belongs to a subset of size OUT. Then, one can remark that:

$$\begin{aligned} &f(h,m) \oplus f(m,h) \\ &\quad{}= P_{256}(h\oplus m) \oplus Q_{256}(m) \oplus P_{256}(m\oplus h) \oplus Q_{256}(h) \oplus h \oplus m, \end{aligned}$$
(5)
$$\begin{aligned} &f(h,m) \oplus f(m,h) = Q_{256}(m) \oplus Q_{256}(h) \oplus h \oplus m. \end{aligned}$$
(6)

Hence, it follows that:

$$\begin{aligned} f(h,m) \oplus f(m,h) \oplus h \oplus m = Q_{256}(m) \oplus Q_{256}(h). \end{aligned}$$
(7)

Since the permutation Q is supposed to have no structural flaw, the best known generic algorithm requires \(\max\{\min\{\sqrt{2^{n}/\mathit{IN}},\sqrt{2^{n}/\mathit{OUT}}\},2^{n}/(\mathit{IN} \cdot \mathit{OUT})\}\) operations (the situation is exactly the same as the permutation distinguisher with permutation Q) to find a pair (h,m) of inputs such that hmIN and f(h,m)⊕f(m,h)⊕hmOUT. Note that both IN and OUT are specific to our attacks.

We emphasize that even if trivial distinguishers are already known for the Grøstl compression function (for example fixed-points), no distinguisher is known for the internal permutations. Moreover, our observations on the compression function use the differential properties of the internal permutations.

2.4 Truncated Differential Characteristics

In the following, we will consider truncated differential characteristics, originally introduced by Knudsen [13] for block cipher analysis. With this technique, already proven to be efficient for AES-based hash functions cryptanalysis [10, 11, 16, 18, 23], the attacker only checks if there is a difference in a cell (active cell, denoted by a black square in the figures) or not (inactive cell, denoted by an empty square in the figures) without caring about the actual value of the difference.

In this model, all AddRoundConstant and SubBytes layers can be ignored since they have no impact on truncated differences. ShiftBytes will only move the difference positions and the diffusion will come from the MixCells layers. More precisely, we denote xy a non-null truncated differential transition mapping x active cells to y active cells in a column through a MixCells (or MixCells −1) layer, and the MDS property ensures x+yt+1. Its differential probability is determined by the number (ty) of inactive cells on the output: 2c(ty) if the MDS property is verified, 0 otherwise.

3 Distinguishers for AES-like Permutations

In this section, we describe a distinguisher for 9 rounds of an AES-like permutation with certain parameters t and c. For the sake of clarity, we first describe the attack for a truncated differential characteristic with three fully active states in the middle, but we will generalize our method in the next section by introducing a characteristic parameterized by variables controlling the number of active cells in some particular states.

Let us remark that before our work, the best known such distinguishers on this type of constructions could only reach 8 rounds, being an open problem whether reaching more rounds would be possible.

3.1 A First Truncated Differential Characteristic

The truncated differential characteristic we use has the sequence of active cells

$$\begin{aligned} t \stackrel{R_1}{\longrightarrow} 1 \stackrel{R_2}{ \longrightarrow} t \stackrel{R_3}{\longrightarrow} t^{2} \stackrel{R_4}{\longrightarrow} t^{2} \stackrel{R_5}{\longrightarrow} t^{2} \stackrel{R_6}{\longrightarrow} t \stackrel{R_7}{ \longrightarrow} 1 \stackrel{R_8}{\longrightarrow} t \stackrel{R_9}{\longrightarrow} t^{2}, \end{aligned}$$
(8)

where the sizes of the input and output difference subsets are both IN=OUT=2ct, since there are t active c-bit cells in the input of the truncated characteristic, and the t 2 active cells in the output are linearly generated from only t active cells. The actual truncated characteristic instantiated with t=8 is described in Fig. 3.

Fig. 3.
figure 3

The 9-round truncated differential characteristic used to distinguish an AES-like permutation from an ideal permutation.

Note that we have three fully active internal states in the middle of the differential characteristic, and this kind of path is impossible to solve with previous rebound or SuperSBox techniques since the number of controlled rounds would be too small and the cost for the uncontrolled part would be extremely high.

3.2 Finding a Conforming Pair

The method to find a pair of inputs conforming to this truncated differential characteristic is similar to the rebound technique: we first find many solutions for the middle rounds (beginning of round 3 to the end of round 6) and then we filter them out during the outward probabilistic transitions through the MixCells layers (round 2 and round 7). Since in our case we have two MixCells transitions t→1 (see Fig. 3), the outbound phase has a success probability of 2−2c(t−1) and is straightforward to handle once we found enough solutions for the inbound phase.

In order to find solutions for the middle rounds (see Fig. 4), we propose an algorithm inspired by the ones in [20, 21]. As in [7, 14], instead of dealing with the classical t 2 parallel c-bit SubBytes SBox applications, one can consider t parallel tc-bit SBoxes (named SuperSBoxes) each composed of two SBox layers surrounding one MixCells and one AddRoundConstant function. Indeed, the ShiftBytes can be taken out from the SuperSBoxes since it commutes with SubBytes. The part of the internal state modified by one SuperSBox is a SuperSBox set. The total state is formed by t such sets, and their particularity is that their transformation through the SuperSBox can be computed independently.

Fig. 4.
figure 4

Inbound phase for the 9-round distinguisher attack on an AES-like permutation instantiated with t=8. The four rounds represented are the rounds 3 to 6 from the whole truncated differential characteristic. A gray cell indicates an active cell; hatched and colored cells emphasize one SuperSBox set: there are seven similar others for each one of the two hatched senses. (Color figure online)

We start by choosing the input difference δ IN after the first SubBytes layer in state S1 and the output difference δ OUT after the last MixCells layer in state S12. Both δ IN and δ OUT are exact differences, not truncated ones, but they are chosen so that they are compliant with the truncated characteristic in S0 and S12. Since we have t active cells in S1 and S12, there are as many as 22ct different ways of choosing (δ IN ,δ OUT ). Note that differences in S1 can be directly propagated to S3 since MixCells is linear. We continue by computing the t forward SuperSBox sets independently by considering the 2ct possible input values for each of them in state S3. This generates t independent lists, each of size 2ct and composed by paired values in S3 (that can be used to compute the corresponding paired values in S8). Doing the same for the t backward SuperSBox sets from state S12, we again get t independent lists of 2ct elements each, and we can compute for each element of each list the pair of values of the SuperSBox set in state S8, where the t forward and the t backward lists overlap. In the sequel, we denote L i the ith forward SuperSBox list and \(L'_{i}\) the ith backward one, for 1≤it.

In terms of freedom degrees in state S8, we want to merge 2t lists of 2ct elements each for a merging condition on 2×ct 2 bits, where we use the definition of list merging from [21] (ct 2 for values and ct 2 for differences) since the merging state is fully active: we then expect \(2^{2t\times ct} 2^{-2ct^{2}}=1\) solution as a result of the merging process on average.

In the following, we describe a method to find this solution and compute its complexity afterwards (see Fig. 5). In comparison to the algorithm suggested in [12] where the case t=8 is treated, we generalize the concept to any t, even odd ones where the direct extension of [12] is not applicable. To detail this algorithm, we use a temporary parameter t′∈[1,t] such that the time complexity will be written in terms of t′. In the end, we give the best choice for t′ such that the time complexity is minimal for any t.

Step 1.:

We start by considering every possible combination of elements in each of the t′ first lists \(L'_{1},\dots,L'_{t'}\) There are 2ctt possibilities.

Step 2.:

Each choice in Step 1 fixes the first t′ columns of the internal state (both values and differences) and thus forces 2c constraints on t′ cells in each of the t lists L i , 1≤it. For each list L i , we then expect on average 2ct2−2ct=2c(t−2t′) elements to match this constraint for each choice in Step 1, and these elements can be found with one operation by sorting the lists L i beforehand.Footnote 2

Step 3.:

We continue by considering every possible combination of elements in each of the tt′ last lists L tt′+1,…,L t . Depending on the value of t′, we may have different scenarios at this point: if t−2t′≥0, then the time complexity is multiplied by 2c(t−2t′)(tt′), which is the number of expected elements in the lists. Otherwise, the probability of success decreases from 1 to 2c(t−2t′)(tt′), as the constraints imposed by the previous step are too strong and elements in those lists would exist only with probability smaller than 1.

Step 4.:

We now need to ensure that the t′ first lists L 1,…,L t and the tt′ last lists \(L'_{t-t'+1},\dots,L'_{t}\) contain a candidate fulfilling the constraints deduced in the previous steps. In the \(L'_{i}\) lists, we already determined 2c(tt′) bits so that there are 2ct−2c(tt′) elements remaining in each of those. Again, we can check if these elements exist with one operation by sorting the lists beforehand. Finally, the value and difference of all the cells have been determined, which leads to a probability 2ct−2ct=2ct of finding a valid element in each of the t′ first lists L i .

Fig. 5.
figure 5

In the case where t=8, the figure shows the steps to merge the 2×t lists. Gray cells denote cells fully constrained by a choice of elements in \(L'_{1}, \ldots, L'_{t/2}\) during the first step.

All in all, trying all the 2ctt elements in Step 1, we find

$$2^{c\cdot t\cdot t' + c(t-2t')(t-t') + (ct-2c(t-t'))(t-t') - ct\cdot t'}=1 $$

solution during the merge process. We find this solution in time T m operations, with two cases to consider. Either t−2t′≥0, in which case we enumerate 2ctt elements in Step 1 followed by the enumeration of 2c(t−2t′) elements in Step 2. In that case, we have log2(T m )=ctt′+c(t−2t′)(tt′)=2t2−2tt′+t 2. If t−2t′≤0, the conditions imposed by the elements enumerated in the first steps make the lists from Step 2 to be nonempty with probability smaller than 1. Hence, we simply have log2(T m )=ctt′. This can be summarized by:

$$\begin{aligned} \log_2(T_m) = \begin{cases} c\cdot P_t(t') & \text{if } t-2t'\geq 0\text{ with } P_t = 2X^2-2tX+t^2,\\ c\cdot Q_t(t') & \text{if } t-2t'\leq 0\text{ with } Q_t = tX. \end{cases} \end{aligned}$$
(9)

To find the value t′ that minimizes the time complexity, we need to determine for which value the minimum of both polynomials P t and Q t is reached. For P t , we get \(\frac{t}{2}\) and the nearest integer value satisfying t−2t′≥0 is \(\lceil\frac{t}{2}\rceil\). For Q t , we get \(\lfloor\frac{t}{2}\rfloor\). For example, see Figs. 6a and 6b, when t equals 8 and 7, respectively.

Fig. 6.
figure 6

Plot of the two polynomials P t and Q t in two cases: t=8 and t=7.

Consequently, if t is even we set \(t'=\frac{t}{2}\), which leads to an algorithm running in \(2^{ct^{2}/2}\) operations and t′⋅2ct memory. If t is odd, then we need to decide whether t′ should be \(\lfloor\frac{t}{2}\rfloor\) or \(\lceil\frac{t}{2}\rceil\). If we write t=2k+1, this is equivalent to find the smallest value between P t (k) and Q t (k+1). We find P t (k)=2k 2+2k+1 and Q t (k+1)=2k 2+3k+1 so that P t (k)<Q t (k+1) (see for example Fig. 6b when t=7). Hence, when t is odd, we fix \(t'=\lceil\frac{t}{2}\rceil\). Note that \(\frac{t}{2}=\lceil\frac{t}{2}\rceil\) if t is even.

Summing up, for any t, our algorithm performing the merge runs in T m operations, with:

$$ \log_2(T_m)=c\cdot P_t \biggl( \biggl \lceil\frac{t}{2} \biggr\rceil \biggr)= ct^2-2c \biggl\lfloor \frac{t}{2} \biggr\rfloor \biggl\lceil\frac{t}{2} \biggr\rceil $$
(10)

and a memory requirement of 2t′⋅2ct.

Hence, from a pair of random fixed differences (δ IN ,δ OUT ), we show how to find a pair of internal states of the permutation that conforms to the middle rounds. To pass the probabilistic transitions of the outbound phase, we need to repeat the merging 22c(t−1) times by picking another couple of differences (δ IN ,δ OUT ). In total, we find a pair of inputs to the permutation that conforms to the truncated differential characteristic in time T 9=22c(t−1)T m operations, that is:

$$ \log_2(T_9)=ct(t+2)-2c \biggl( \biggl\lfloor \frac{t}{2} \biggr\rfloor \biggl\lceil\frac{t}{2} \biggr\rceil+1 \biggr) $$
(11)

with a memory requirement of t⋅2ct.

3.3 Comparison with the Ideal Case

In the ideal case [7], obtaining a pair whose input and output differences lie in a subset of size IN=OUT=2ct for a ct 2-bit permutation requires

$$\begin{aligned} 2^{\max \{ {ct(t-1)}/{2}, ct^2-ct-ct \}}=2^{ct(t-2)}, \end{aligned}$$
(12)

computations (assuming t≥3). Therefore, our algorithm distinguishes an AES-like permutation from a random one if and only if its time complexity is smaller than the generic one. This occurs when log2(T 9)≤ct(t−2), which happens as soon as t≥8. Note that for the AES in the known-key model, we have t=4 and thus our attack does not apply.

One can also derive slightly cheaper distinguishers by aiming at less rounds: instead of using the 9-round truncated characteristic from Fig. 3, it is possible to remove either round 2 or 8 and spare one t→1 truncated differential transition. Overall, the generic complexity remains the same and this gives a distinguishing attack on the 8-round reduced version of the AES-like permutation with T 8 computations, with:

$$ \log_2(T_8)=\log_2(T_m)+c(t-1)=ct(t+1)-c \biggl(2 \biggl\lfloor\frac{t}{2} \biggr\rfloor \biggl\lceil \frac{t}{2} \biggr\rceil+1 \biggr) $$
(13)

and still 2ct memory provided that t≥6. If we spare both t→1 transitions, we end up with a 7-round distinguishing attack with time complexity T 7=T m and t⋅2ct memory for any t≥4. Note that those reduced versions of this attack can have a greater time complexity than other techniques: we provide them only for the sake of completeness.

4 Using Non-fully-active Characteristics

4.1 The Generic Truncated Characteristic

In [25], Sasaki et al. present new truncated differential characteristics that are not totally active in the middle. Their analysis allows to derive distinguishers for 8 rounds of AES-like permutations with no totally-active state in the middle, provided that the state-size verifies t≥5. In this section, we reuse their idea by introducing an additional round in the middle of their trail, which is the unique fully active state of the characteristic. With a similar algorithm as in the previous section, we show how to find a pair conforming to that case.

To keep our reasoning as general as possible, we parameterize the truncated differential characteristic by four variables (see Fig. 7) such that trade-offs will be possible by finding the right values for each one of them. Namely, we denote n B the number of active diagonals in the plaintext (alternatively, the number of active cells in the second round), n F the number of active diagonals in the ciphertext (alternatively, the number of active cells in the eighth round), m B the number of active cells in the third round and m F the number of active cells in the seventh round.

Fig. 7.
figure 7

Non-fully-active truncated differential characteristic on 9 rounds of an AES-like permutation instantiated with t=8.

Hence, the sequence of active cells in the truncated differential characteristic becomes:

$$\begin{aligned} tn_{B} &\stackrel{R_1}{\longrightarrow} n_{B} \stackrel{R_2}{\longrightarrow} m_{B} \stackrel{R_3}{\longrightarrow} tm_{B} \stackrel{R_4}{\longrightarrow} t^{2} \stackrel{R_5}{\longrightarrow} tm_{F} \\ &\stackrel{R_6}{\longrightarrow} m_{F} \stackrel{R_7}{\longrightarrow} n_{F} \stackrel{R_8}{\longrightarrow} tn_{F} \stackrel{R_9}{\longrightarrow} t^{2}, \end{aligned}$$
(14)

with the constraints n F +m F t+1 and n B +m B t+1 that come from the MDS property. The amount of solutions that can be generated for the differential path equals to (log2):

$$\begin{aligned} &ct^2 + ctn_B - c(t-1)n_B - c(t-m_B) - ct (t-m_F) - c(t-1)m_F - c(t-n_F) \\ &\quad{}=c(n_B + n_F + m_B + m_F - 2t). \end{aligned}$$
(15)

From the MDS constraints m B +n B t+1 and m F +n F t+1, we can bound the amount of expected solutions by 2c(t+1+t+1−2t)=22c. This means that, there will always be at least 22c freedom degrees, independently of t.

4.2 Finding a Conforming Pair

As in the previous case, the algorithm that finds a pair of inputs conforming to this characteristic first produces many pairs for the middle rounds and then exhausts them outwards until one passes the probabilistic filter. The cost of those uncontrolled rounds is given by:

$$\begin{aligned} 2^{c(t-n_B)}2^{c(t-n_F)} = 2^{c(2t-n_B-n_F)}, \end{aligned}$$
(16)

since we need to pass one n B m B transition in the backward direction and one m F n F in the forward direction.

We now detail a way to find a solution for the middle rounds (Fig. 8) when the input difference δ IN after the first SubBytes layer in state S1 and the output difference δ OUT after the last MixCells layer in state S12 are fixed in a way that the truncated characteristic holds in S0 and S12. The beginning of the attack is exactly the same as before in the sense that once the output differences have been fixed, we generate the 2t lists that contains the paired values of the t forward SuperSBox sets and the t backward SuperSBox sets. Again, the same 2t lists overlap and we show how to find the solution of the merging problem in \(2^{ct \cdot\min(m_{F}, m_{B},\lceil {t}/{2}\rceil)}\) operations and m B ⋅2ct memory. We recall that L i is the ith forward SuperSBox list (orange) and \(L'_{i}\) is the ith backward one (blue), for 1≤it.

Fig. 8.
figure 8

Inbound phase for the 9-round distinguisher attack on an AES-like permutation instantiated with t=8 with a single fully-active state in the middle. A gray cell indicates an active cell; hatched and colored cells emphasize one SuperSBox set: there are seven similar others. (Color figure online)

We proceed in three steps, where the first guesses the elements from some lists, this determines the remaining cells and we finish by checking probabilistic events. Without loss of generality, we assume in the sequel that m B m F ; if this is not the case, then we start Step 1 by guessing elements of lists L i in S8. We split the analysis into two cases, whether \(m_{B}\leq\lceil\frac{t}{2}\rceil\) or \(m_{B}>\lceil\frac{t}{2}\rceil\).

First case: \(m_{B}\leq\lceil\frac{t}{2}\rceil\).:

In this case, we use the strong constraints on the vector spaces spanned by the m B differences on each columns to find a solution to the merge problem.

Step 1.:

We start by guessing the elements of the m B lists \(L'_{1},\dots,L'_{m_{B}}\) in state S6. There is a total of \(2^{c t m_{B}}\) possible combinations.

Step 2.:

In particular, the previous step sets the differences of the m B first diagonals of S6 such that there are exactly m B known differences on each of the t columns of the state. This allows to determine all the differences in S5 since there are exactly m B independent differences in each column of that state. Consequently, we linearly learn all the differences of S6.

Step 3.:

Since all differences are known in S6, we determine 1 element in each of the tm B remaining \(L'_{i}\) lists: they are of size 2ct and we count ct bits of constraints coming from t differences. From the known differences, we also get a suggestion of \(2^{ct-cm_{B}}\) values for the cells of each column. Indeed, the elements of the t lists L i in S5 can be represented as disjointed sets regarding the values of the differences, since the differences can only take \(2^{cm_{B}}\) values per column. Assuming that they are uniformly distributed,Footnote 3 we get \(2^{ct}/2^{cm_{B}}=2^{ct-cm_{B}}\) elements per disjointed set for each list: they all share the same value of the differences, but have different values. Additionally, the ct-bit constraints of each list L i allows to find one element in each, and therefore a solution to the merge problem, with probability \(2^{((ct-cm_{B})-ct)t}=2^{-ctm_{B}}\).

Step 4.:

Finally, trying all the \(2^{ctm_{B}}\) elements in \((L'_{1}, \dots, L'_{m_{B}})\), we expect to find \(2^{ctm_{B}}2^{-ctm_{B}}=1\) solution that gives a pair of internal states conforming to the four middle rounds with a few operations.

Second case: \(m_{B}>\lceil\frac{t}{2}\rceil\).:

The columns of differences are less constrained, and it is enough to guess \(\lceil\frac{t}{2}\rceil\) lists in the first step to find a solution to the merge problem.

Step 1.:

We start by guessing the elements of the \(\lceil\frac{t}{2}\rceil\) lists \(L'_{1},\dots,L'_{m_{B}}\) in state S6. There is a total of 2ctt/2⌉ possible combinations.

Step 2.:

The previous step allows to filter 2c(t−2⌈t/2⌉) elements in each of the t lists L i . Depending of the parity of t, we get 1 element per list for even t, and 2c for odd ones.Footnote 4 In the latter case, there are then a probability 2ct that the t elements are found in the t lists L i .

Step 3.:

In the event that elements have been found in the previous step, we determine completely the remaining \(2ct(t-\lceil\frac{t}{2}\rceil)\) values and differences of the remaining \(t-\lceil\frac{t}{2}\rceil=\lfloor\frac{t}{2}\rfloor\) lists \(L'_{i}\). We find a match in those lists with probability 2ct×2(ct−2ct)(t−⌈t/2⌉)=2ct(1+⌊t/2⌋).

Step 4.:

Finally, trying all the \(2^{ct \lceil\frac{t}{2}\rceil}\) elements in \((L'_{1}, \dots, L'_{\lceil{t}/{2}\rceil})\), we expect to find 2ctt/2⌉2ct(1+⌊t/2⌋)=1 solution that gives a pair of internal states that conforms to the four middle rounds with a few operations.

Hence, in any case, from random differences (δ IN ,δ OUT ), we find a pair of internal states of the permutation that conforms to the middle rounds in time \(2^{ct\min (m_{B}, \lceil\frac{t}{2} \rceil )}\) and memory m B 2ct. To pass the probabilistic transitions of the outbound phase, we need to repeat the merging \(2^{c(2t-n_{B}-n_{F})}\) times by picking another couple of differences (δ IN ,δ OUT ). In total, we find a pair of inputs to the permutation conforming to the truncated differential characteristic in time complexity \(2^{ct\min(m_{B},\lceil{t}/{2}\rceil)}2^{c(2t-n_{B}-n_{F})}=2^{c(t(\min(m_{B},\lceil{t}/{2}\rceil)+2)-n_{B}-n_{F})}\) and memory complexity m B ⋅2ct.

Finally, without assuming m B m F , the time complexity T of the algorithm generalizes to:

$$\begin{aligned} \log_2(T)=c \biggl(t\cdot\min \biggl\{m_B,m_F, \biggl\lceil\frac{t}{2} \biggr\rceil \biggr\}+2t-n_B-n_F \biggr), \end{aligned}$$
(17)

with n F +m F t+1 and n B +m B t+1, and memory requirement of m B ⋅2ct.

4.3 Comparison with Ideal Case

In the ideal case, the generic complexity C(a,b) is given by the limited birthday distinguisher:

$$\begin{aligned} \log_{2^{c}} \bigl(C(a,b) \bigr)=\max \biggl\{\min \biggl\{ \frac{t^2-a}{2}, \frac{t^2-b}{2} \biggr\}, t^2-a-b \biggr\}, \end{aligned}$$
(18)

since we get an input space of size IN=2ca and output space of size OUT=2cb. Without loss of generality, assume that ab: this only selects whether we attack the permutation or its inverse. In that case, we have:

$$\begin{aligned} \log_{2^c} \bigl(C(a,b) \bigr) =\begin{cases} C_1(a,b) := (t^2-b)/2, &\text{if: } t^2 < 2a+b,\\ C_2(a,b) := a, &\text{if: } t^2=2a+b,\\ C_3(a,b) := t^2-a-b, &\text{if: } t^2 > 2a+b. \end{cases} \end{aligned}$$
(19)

In the case of the 9-round distinguisher, the generic complexity equals C(tn B ,tn F ) since there are n B active diagonals at the input, and n F active diagonals at the output. Let us compare T and the case of C 3(tn B ,tn F ) where t>2n B +n F corresponding to the limited birthday distinguisher. We want to find set of values for the parameters (t,n F ,n B ,m F ,m B ) such that our algorithm runs faster that the generic one, that is T is smaller than C 3(tn B ,tn F ). In the event that \(\min (m_{F},m_{B}, \lceil\frac{t}{2} \rceil )\) is either m F or m B , we can show that T is always greater than C 3(tn B ,tn F ), and so are the cases involving C 2(tn B ,tn F ) and C 1(tn B ,tn F ).

We consider the case \(\min (m_{F},m_{B}, \lceil\frac{t}{2} \rceil )= \lceil\frac{t}{2} \rceil\):

$$\begin{aligned} &\log_{2^c} \bigl(C_3 (t\cdot n_B, t\cdot n_F ) \bigr)-\log_{2^c} (T ) \\ &\quad{}= t (t-n_F-n_B ) - t \biggl\lceil \frac{t}{2} \biggr\rceil-2t+n_B+n_F. \end{aligned}$$
(20)

With t as a parameter and n F ,n B ∈{1,…,t}, our algorithm turns out to be a distinguisher when the quantity from (20) is positive, which is true as soon as

$$\begin{aligned} (n_{B}+n_{F}) (1-t) + t \biggl(t-2- \biggl\lceil \frac{t}{2} \biggr\rceil \biggr)\geq0. \end{aligned}$$
(21)

Since \(t- \lceil\frac{t}{2} \rceil= \lfloor\frac{t}{2} \rfloor\), we can show that if n F ∈{1,…,t} and n B ∈{1,…,t} are chosen such that

$$\begin{aligned} 2\leq n_F+n_B\leq \frac{t}{t-1} \biggl( \biggl\lfloor\frac{t}{2} \biggr\rfloor-2 \biggr), \end{aligned}$$
(22)

then our algorithm is more efficient than the generic one. Note that this may happen only when t≥8 and that m F and m B are still constrained by the MDS bound: n F +m F t+1 and n B +m B t+1.

We can also consider an 8-round case by considering the characteristic from Fig. 7 where the last round is removed:Footnote 5 the generic complexity becomes C(tn B ,n F ). Note that the complexity of our algorithm remains unchanged: there are still two probabilistic transitions to pass. For t≥4, we can show that there are many ways to set the parameters (n F ,n B ,m F ,m B ) so that TC(tn B ,n F ), and the best choice providing the most efficient distinguisher happens when the MDS bounds are tight, i.e.: n F +m F =t+1 and n B +m B =t+1.

For the sake of completeness, we can also derive distinguishers for 7-round of the permutation by considering the characteristic from Fig. 7 where the first and last rounds are removed, as soon as t≥4. The generic complexity in that scenario is C(n B ,n F ). Again, there are several ways to set the parameters, but the one that minimizes the runtime T of our algorithm also verifies the MDS bounds: n B =1, m B =t, m F =1 and n F =t.

We give examples of more different cases in Table 2, which for instance match AES and Grøstl instantiation. We note that the complexities of our algorithm may be worse that other published results.

Table 2. Examples of reached time complexities for several numbers of rounds and different (t,c) scenarios.

5 Applications to Grøstl-256 Permutations

The permutations of the Grøstl-256 hash function implement the previous generic algorithms will the following parameters: t=8, c=8 and N r =10.

Three Fully-Active States

From the analysis of Sect. 3, we can directly conclude that this leads to a distinguishing attack on the 9-round reduced version of the Grøstl-256 permutation with \(2^{c(t^{2}/2+2(t-1))}=2^{368}\) computations and 2ct=264 memory, when the ideal complexity requires 2ct(t−2)=2384 operations.

As detailed previously, we could derive distinguishers for 8-round Grøstl-256 with \(2^{c(t^{2}/2+t-1)}=2^{312}\) operations and for 7-round Grøstl-256 with \(2^{ct^{2}/2}=2^{256}\), but those results are more costly than previous known results.

Similarly, as explained in Sect. 2.3, this result also induces a nontrivial observation on the 9-round reduced version of the Grøstl-256 compression function with identical complexity.

Non-fully-active Characteristic

With the generic analysis of Sect. 4 that uses a single fully-active middle state, t=8 only allows to instantiate the parameterized truncated differential characteristic with n F =n B =1, which determines m F =m B =8. Indeed, (22) imposes \(2\leq n_{B}+n_{F}\leq \frac{16}{7}\), which gives integer values n F =n B =1. Note that it is exactly the case of the three fully-active states in the middle treated in Sect. 3, with the same complexities.

For 8-round distinguishers, the case t=8 where n B n F may give the parameters n B =5, m B =4, m F =1 and n F =8 with the last round of the characteristic of Fig. 7 is removed. If n B >n F , we instantiate the characteristic with the first round removed with the values n B =8, m B =1, m F =4 and n F =5. In both cases, the time complexity of the distinguishers is 288 operations with 264 of memory requirement, whereas the generic algorithm terminates in about 2128 operations. As for 7-round distinguishers, removing both first and last rounds of the characteristic of Fig. 7 leads to an efficient distinguishers for Grøstl-256 when n B =8, m B =1, m F =1 and n F =8. The corresponding algorithm runs in 264 operations with 264 of memory requirement, when the corresponding generic algorithm needs 2384 operations to terminate. We note that those 8- and 7-round distinguishers are not as efficient as other available techniques: we provide them for the sake of completeness.

6 Conclusion

In this article, we have provided a new and improved cryptanalysis method for AES-like permutations, by using a rebound-like approach as well as an algorithm that allows us to control four rounds in the middle of a truncated differential path, with a lower complexity than a general probabilistic approach. To the best of our knowledge, all previously known methods only manage to control three rounds in the middle and we close the open problem whether this was possible or not.

We apply our algorithm on several algorithms and in particular on the building blocks of both the 256 and 512-bit versions of the SHA-3 finalist Grøstl. We could provide the best known distinguishers on 9 rounds of the internal permutations of Grøstl-256, while for Grøstl-512, we have considerably increased the number of analyzed rounds, from 7 to 10.

These results do not threaten the security of Grøstl, but we believe they will have an important role in better understanding AES-based functions in general. In particular, we believe that our work will help determining the bounds and limits of rebound-like attacks in these types of constructions. Future works could include the study of more AES-like functions in regards to this new cryptanalysis method.