Keywords

1 Introduction

AES (Advanced Encryption Standard) [6] is probably the most used and studied block cipher, and many constructions employ reduced-round AES as part of their design. Determining its security is therefore one of the most important problems in cryptanalysis. Since there is no known attack which can break the full AES significantly faster than exhaustive search, researchers have focused on attacks which can break reduced-round versions of AES. Especially within the last couple of years, new cryptanalysis results on the AES have appeared regularly (e.g., [1, 9, 12, 15]). While those papers do not pose any practical threat to the AES, they do give new insights into the internals of what is arguably the cipher that is responsible for the largest fraction of encrypted data worldwide.

Among many others, a new technique called “Mixture Differential Cryptanalysis” [9] has been recently presented at FSE/ToSC 2019, which is a way to translate the (complex) “multiple-of-8” 5-round distinguisher [12] into a simpler and more convenient one (though, on a smaller number of rounds). Given a pair of chosen plaintexts, the idea is to construct new pairs of plaintexts by mixing the generating variables of the initial pair of plaintexts. As proved in [9], for 4-round AES the corresponding ciphertexts of the initial pair of plaintexts lie in a particular subspace if and only if the corresponding pairs of ciphertexts of the new pairs of plaintexts have the same property. Such a secret-key distinguisher, which is also independent of the details of the S-box and of the MixColumns matrix, has been reconsidered in [4], where the authors show that it is an immediate consequence of an equivalence relation on the input pairs, under which the difference at the output of the round function is invariant. Moreover, it is also the starting point for practical and competitive key-recovery attacks on 5-round AES-128 and 7-round AES-192 [1], breaking the record for these attacks which was obtained 18 years ago by the classical Square attack.

In this paper, we reconsider this distinguisher on a smaller number of rounds in order to set up new (competitive) low-data distinguishers and key-recovery attacks on reduced-round AES. A summary of our results can be found in Tables 1, 2 and 3.

Our Contribution and Related Work

Cryptanalysis of block ciphers has focused on maximizing the number of rounds that can be broken without exhausting the full code book or key space. This often leads to attacks marginally close to that of brute force. Even if these attacks are important to e.g. determine the security margin of a cipher (i.e., the ratio between the number of rounds which can be successfully attacked and the number of rounds in the full cipher), they are not practical.

For this reason, low-data distinguishers/attacks on reduced-round ciphers have recently gained renewed interest in the literature. Indeed, it seems desirable to also consider other approaches, such as restricting the attacker’s resources, in order to adhere to “real-life” scenarios. In this case, the time complexity of the attack is not limited (besides the natural bound of exhaustive search), but the data complexity is restricted to only a few known or chosen plaintexts.

Table 1. Secret-key distinguishers on 3-round AES which are independent of the secret key. The data complexity corresponds to the minimum number of chosen plaintexts/ciphertexts (CP/CC) and/or adaptive chosen plaintexts/ciphertexts (ACP/ACC) which are needed to distinguish the AES permutation from a random permutation with a success probability denoted by \(\text {Prob}\).

Attacks in this scenario have been studied in various papers, which include low-data Guess-and-Determine and Meet-in-the-Middle techniques [3], low-data truncated differential cryptanalysis [11], polytopic cryptanalysis [17], and, if adaptive chosen plaintexts/ciphertexts are allowed, yoyo-like attacks [15].

“Mixture Integral” Key-Recovery Attacks. In Sect. 4 we show that “Mixture Differential Cryptanalysis” [9] can be exploited in order to set up low-data attacks on reduced-round AES. Given a set of chosen plaintexts defined as in [9], our attacks are based on the fact that the XOR sum of the corresponding texts after 2-round AES encryptions is equal to zero with prob. 1. Using the same strategy proposed in a classical square/integral attack [5, 14], this zero-sum property can be exploited to set up competitive attacks on 3- and 4-round AES, which require only 4 and 6 chosen plaintexts, respectively. A comparison of all known low-data attacks on AES and our attacks is given in Table 2. Since (1) the pairs of plaintexts used to set up the attacks share the same generating variables – which are mixed in the same way proposed by the Mixture Differential Distinguisher – and since (2) such attacks exploit the zero-sum property (instead of a differential one), we call this attack a mixture integral attack.

“Impossible Mixture Integral” Secret-Key Distinguisher. In Sect. 3.2, we show that the previous distinguishers/attacks can also be exploited to set up a new 3-round secret-key distinguisher on AES, which is independent of the key, of the details of the S-box, and of the MixColumns operation. For a success probability of \(\approx 95\%\), such a distinguisher requires only 10 chosen plaintexts (or ciphertexts), i.e., half of the data required by the most competitive distinguisher currently present in the literature (which does not require adaptive chosen texts).

The Property Exploited by this New Distinguisher. Consider a zero-sum key-recovery attack on 3-round AES (based on a 2-round zero-sum distinguisher). An integral attack assumes that the zero-sum property is always satisfied when decrypting under the secret key. Thus, if there is no key for which the zero-sum property is satisfied, the ciphertexts have likely been generated by a random permutation, and not by AES. Such a strategy can be used as a distinguisher, but requires key guessing and is thus not independent of the secret key. In Sect. 3.2, we show how to evaluate this property without guessing any key material by providing a property which is independent of the secret key, and which holds for the ciphertexts only in the case in which the key-recovery (mixture integral) attack just proposed fails. The obtained 3-round distinguisher can also be used to set up new key-recovery attacks on reduced-round AES.

Table 2. Attacks on reduced-round AES-128. The data complexity corresponds to the number of required chosen plaintexts (CP). The time complexity is measured in reduced-round AES encryption equivalents (E), while the memory complexity is measured in plaintexts (16 bytes). Precomputation is given in parentheses. The case in which the final MixColumns operation is omitted is denoted by “r.5 rounds” (r full rounds \(+\) the final round). “Key sched.” highlights whether the attack exploits the details of the key schedule of AES.
Table 3. Comparison of attacks on reduced-round AES with a secret S-box. The data complexity corresponds to the number of required chosen plaintexts/ciphertexts (CP/CC). The time complexity is measured in reduced-round AES encryption equivalents (E), in memory accesses (M), or XOR operations (XOR). The memory complexity is measured in plaintexts (16 bytes). The case in which the final MixColumns operation is omitted is denoted by “r.5 rounds”, that is, r full rounds and the final round. New attacks are in bold. Strategy 1 (S1) denotes an attack that requires finding the details of the S-box, while Strategy 2 (S2) denotes an attack that directly finds the key.

AES with a Single Secret S-Box. Finally, in Sect. 5 we show that a competitive mixture integral attack can also be set up on reduced-round AES with a single secret S-box, i.e., the case in which the AES S-box is replaced by a secret 8-bit one while keeping everything else unchanged. In the literature, two possible strategies are considered to set up the attack:

  • Strategy S1. The attacker first determines the secret S-box up to additive constants (that is, S-box\((\cdot \oplus a)\oplus b\) for unknown a and b), and then they use this knowledge and apply attacks present in the literature (e.g., the integral one) to derive the whitening key.

  • Strategy S2. They exploit a particular property of the MixColumns matrix (i.e., the fact that two elements for each row of the matrix are equal) to directly find the secret key (no information of the secret S-box is found/used).

Examples for attacks based on the first strategy are given in [7, 18], while examples for attacks based on the second strategy are given in [8, 11, 16]. In Sect. 5 we exploit the first strategy in order to set up a competitive attack on 3-round AES with a single secret S-box. A comparison of all known attacks on reduced-round AES with a single secret S-box and our attack is given in Table 3.

Practical Verification. We implemented Algorithm 1, Algorithm 2, Algorithm 4, and Algorithm 5 in practice and could verify the theoretical results. We also implemented a method to find the affine equivalent of a secret S-box. All the source code files can be found online.Footnote 1

2 Preliminaries

2.1 Brief Description of AES

AES [6] is a substitution-permutation network (SPN) that supports key sizes of 128, 192, and 256 bits. The 128-bit plaintext initializes the internal state as a \(4 \times 4\) matrix of bytes as values in the finite field \(\mathbb {F}_{2^8}\equiv \mathbb {F}_2[X]/(X^8+ X^4+X^3+X+1)\). Depending on the version of AES, \(N_r\) rounds are applied to the state, where \(N_r=10\) for AES-128, \(N_r=12\) for AES-192, and \(N_r=14\) for AES-256. An AES round applies four operations to the state matrix:

  • SubBytes (S-box) – applying the same 8-bit to 8-bit invertible S-box 16 times in parallel on each byte of the state (provides non-linearity in the cipher).

  • ShiftRows (SR) – cyclic shift of each row to the left.

  • MixColumns (MC) – multiplication of each column by a constant \(4 \times 4\) MDS matrix (SR and MC provide diffusion in the cipher).

  • AddRoundKey (ARK) – XORing the state with a 128-bit subkey.

One round of AES can be described as \( R(x) = K \oplus MC\circ SR \circ \text { S-box} (x)\). In the first round an additional AddRoundKey operation (using a whitening key) is applied, and in the last round the MixColumns operation is omitted.

The Notation Used in the Paper. Let x denote a plaintext, a ciphertext, an intermediate state, or a key. Then \(x_{i, j}\) with \(i, j \in \{0, \dots , 3\}\) denotes the byte in the \(j\)-column of the \(i\)-row. We denote by R one round of AES, while we denote r rounds of AES by \(R^{r}\). Finally, in the paper we often use the term partial collision (or collision) when two texts belong to the same coset of a given subspace \(\mathcal {X}\). We recall that given a subspace \(\mathcal {X}\), the cosets \(\mathcal {X}\oplus a\) and \(\mathcal {X}\oplus b\) (where \(a\ne b\)) are equal (that is, \(\mathcal {X}\oplus a \equiv \mathcal {X} \oplus b\)) if and only if \(a \oplus b \in \mathcal {X}\).

2.2 Subspace Trail Notation for AES

Here we briefly recall the subspace trail notation for AES [11]. We work with vectors and vector spaces over \(\mathbb {F}_{2^8}^{4 \times 4}\), and we denote by \(\{e_{0,0}, \dots , e_{3,3}\}\) the unit vectors of \(\mathbb {F}_{2^8}^{4 \times 4}\) (e.g., \(e_{i,j}\) has a single 1 in row i and column j).

Definition 1

For \(i\in \{0,1,2,3\}\):

  • The column spaces \(\mathcal {C}_i\) are denoted as \(\mathcal {C}_i = \langle e_{0, i}, e_{1, i}, e_{2, i}, e_{3, i} \rangle \).

  • The diagonal spaces \(\mathcal {D}_i\) are denoted as \(\mathcal {D}_i = SR^{-1}(\mathcal {C}_i)\). Similarly, the inverse-diagonal spaces \(\mathcal {ID}_i\) are denoted as \(\mathcal {ID}_i = SR(\mathcal {C}_i)\).

  • The i-th mixed spaces \(\mathcal {M}_i\) are denoted as \(\mathcal {M}_i = MC (\mathcal {ID}_i)\).

Definition 2

For \(I \subseteq \{0, 1, 2, 3\}\), let \(\mathcal {C}_I\), \(\mathcal {D}_I\), \(\mathcal {ID}_I\) and \(\mathcal {M}_I\) be denoted as

$$ \mathcal {C}_I = \bigoplus _{i\in I} \mathcal {C}_i, \qquad \mathcal {D}_I = \bigoplus _{i\in I} \mathcal {D}_i, \qquad \mathcal {ID}_I = \bigoplus _{i\in I} \mathcal {ID}_i, \qquad \mathcal {M}_I = \bigoplus _{i\in I} \mathcal {M}_i. $$

Let \(\mathcal {X}^c\) be the complementary space of \(\mathcal {X}\). As shown in detail in [11],

(1):

For any coset \(\mathcal {D}_I \oplus a\) there exists a unique \(b\in \mathcal {C}_I^c\) s.t. \(R(\mathcal {D}_I \oplus a) = \mathcal {C}_I \oplus b\).

(2):

For any coset \(\mathcal {C}_I \oplus a\) there exists a unique \(b\in \mathcal {M}_I^c\) s.t. \(R(\mathcal {C}_I \oplus a) = \mathcal {M}_I \oplus b\).

Theorem 1

([11]). For each \(I\subseteq \{0,1,2,3\}\) and for each \(a \in \mathcal {D}_I^c\), there exists one and only one \(b\in \mathcal {M}_I^c\) s.t. \(R^{2}(\mathcal {D}_I \oplus a) = \mathcal {M}_I \oplus b\).

Observe that if (1) X is a subspace, (2) \(X\oplus a\) is a coset of X, and (3) x and y are two elements of the (same) coset \(X\oplus a\), then \(x\oplus y \in X\). We hence give the following result.

Lemma 1

For all xy and for all \(I\subseteq \{0,1,2,3\}\),

$$\begin{aligned} \mathrm {Prob}(R^{2}(x) \oplus R^{2}(y) \in \mathcal {M}_I \, | \, x \oplus y \in \mathcal {D}_I) = 1. \end{aligned}$$

All these results can be redescribed using a more “classical” truncated differential notation. E.g., if two texts \(t^1\) and \(t^2\) are equal except for the bytes in the i-th diagonalFootnote 2 for each \(i \in I\), then they belong to the same coset of \(\mathcal {D}_I\). A coset of \(\mathcal {D}_I\) corresponds to a set of \(2^{32 \cdot |I|}\) texts with |I| active diagonals. Again, two texts \(t^1\) and \(t^2\) belong to the same coset of \(\mathcal {ID}_I\) if the difference of the bytes that lie in the i-th anti-diagonal for each \(i \notin I\) is equal to zero. Similar considerations hold for the column space \(\mathcal {C}_I\) and the mixed space \(\mathcal {M}_I\).

We finally introduce a notation that we largely use in the following.

Definition 3

([9]). Let \(\mathcal {X}\) be one of the previous subspaces, that is, \(\mathcal {C}_I\), \(\mathcal {D}_I\), \(\mathcal {ID}_I\) or \(\mathcal {M}_I\). Let \(x_0, \dots , x_{n-1} \in \mathbb {F}_{2^8}^{4\times 4}\) be a basis of \(\mathcal {X}\), i.e., \(\mathcal {X} \equiv \langle x_0, x_1, \dots , x_{n-1} \rangle \), where \(n = 4\cdot |I|\). Let t be an element of an arbitrary coset of \(\mathcal {X}\), that is, \(t\in \mathcal {X}\oplus a\). We say that t is “generated” by the generating variables \((t^0, \dots , t^{n-1})\) (where \(t^i \in \mathbb {F}_{2^8}\)) – for the following, \(t\equiv (t^0, \dots , t^{n-1})\) – if and only if \( t =a\oplus \bigoplus _{i=0}^n t^i \cdot x_i\).

As an example, let \(\mathcal {X}=\mathcal {M}_0\equiv \langle MC(e_{0,0}), MC(e_{3,1}), MC(e_{2,2}), MC(e_{1,3})\rangle \), and let \(p\in \mathcal {M}_0 \oplus a\). Then \(p\equiv (p^0, p^1, p^2, p^3)\) if and only if \(p \equiv p^0 \cdot MC(e_{0,0}) \oplus p^1 \cdot MC(e_{1,3}) \oplus p^2 \cdot MC(e_{2,2}) \oplus p^3 \cdot MC(e_{3,1}) \oplus a\). Similarly, let \(\mathcal {X}=\mathcal {C}_0\equiv \langle e_{0,0}, e_{1,0}, e_{2,0}, e_{3,0}\rangle \), and let \(p\in \mathcal {C}_0 \oplus a\). Then \(p\equiv (p^0, p^1, p^2, p^3)\) if and only if \(p \equiv a \oplus p^0 \cdot e_{0,0} \oplus p^1 \cdot e_{1,0} \oplus p^2 \cdot e_{2,0} \oplus p^3 \cdot e_{3,0}\).

3 Mixture Integral Distinguisher on 2-Round AES

3.1 Mixture Differential Secret-Key Distinguisher

In order to present our result, we recall the “mixture differential distinguisher” [9] on reduced-round AES proposed at FSE/ToSC 2019.

Theorem 2

([9]). Given the subspace \(\mathcal {C}_0 \cap \mathcal {D}_{0,3} \equiv \langle e_{0,0}, e_{1,0}\rangle \subseteq \mathcal {C}_0\), consider two plaintexts \(p^1\) and \(p^2\) in the same coset \(\left( \mathcal {C}_0 \cap \mathcal {D}_{0,3} \right) \oplus a\) generated by \(p^1 \equiv (z^1, w^1)\) and \(p^2 \equiv ( z^2, w^2)\) (where \(z^i, w^i \in \mathbb {F}_{2^8}\) for \(i=1,2\)). Let \(\tilde{p}^1, \tilde{p}^2 \in \mathcal {C}_0 \oplus a \equiv \langle e_{0,0}, e_{1,0}, e_{2,0}, e_{3,0} \rangle \oplus a \) be two other plaintexts generated by

$$ \tilde{p}^1 \equiv (z^1, w^1, \varPsi , \varPhi ), \, \tilde{p}^2 \equiv (z^2, w^2, \varPsi , \varPhi ) \text { or } \tilde{p}^1 \equiv (z^1, w^2, \varPsi , \varPhi ), \, \tilde{p}^2 \equiv (z^2, w^1, \varPsi , \varPhi ), $$

where \(\varPsi \) and \(\varPhi \) can take any possible value in \(\mathbb {F}_{2^8}\). Then

$$\begin{aligned} R^2(p^1) \oplus R^2(p^2) = R^2(\tilde{p}^1) \oplus R^2(\tilde{p}^2) \end{aligned}$$
(1)

or equivalently \(R^2(p^1) \oplus R^2(p^2) \oplus R^2(\tilde{p}^1) \oplus R^2(\tilde{p}^2) = 0\) and

$$\begin{aligned} R^4(p^1) \oplus R^4(p^2) \in \mathcal {M}_J \iff R^4(\tilde{p}^1) \oplus R^4(\tilde{p}^2) \in \mathcal {M}_J \end{aligned}$$

holds with prob. 1 for 4-round AES, independently of the secret key, of the details of the S-box, and of the MixColumns matrix.

For completeness, we mention that such a result has been revisited recently in [4], where the authors show that the above property is an immediate consequence of an equivalence relation on the input pairs, under which the difference at the output of the round function is invariant. Moreover, we mention that a generalization of such a result, the exchange attack, has been presented in [2].

The property given in Eq. (1) is the starting point for our distinguisher and key-recovery attacks on reduced-round AES. Indeed, since this event occurs with prob. \(2^{-128}\) if the ciphertexts are generated by a random permutation, it allows to set up a secret-key distinguisher for 2-round AES.

3.2 Impossible Mixture Integral Distinguisher on 3-Round AES

The zero-sum property given in Eq. (1) is independent of the secret key and of the S-box, and it can be used to set up a key-recovery attack on reduced-round AES. In particular, consider \(p^1, p^2, \tilde{p}^1, \tilde{p}^2\) as in Eq. (1) and the corresponding ciphertexts \(c^1= R^3( p^1), c^2= R^3(p^2), \tilde{c}^1= R^3(\tilde{p}^1), \tilde{c}^2 = R^3(\tilde{p}^2)\) after 3-round AES encryptions. Assuming the last MixColumns operation is omitted and since

$$\begin{aligned} R^2(p^1) \oplus R^2(p^2) \oplus R^2(\tilde{p}^1) \oplus R^2(\tilde{p}^2) = 0, \end{aligned}$$

it follows that the secret key k must satisfy

$$\begin{aligned} \begin{aligned}&\text { S-box}^{-1}(c^1_{j,l} \oplus k_{j,l}) \oplus \text {S-box}^{-1}(c^2_{j,l} \oplus k_{j,l}) \\ \oplus&\text { S-box}^{-1}(\tilde{c}^1_{j,l} \oplus k_{j,l}) \oplus \text {S-box}^{-1}(\tilde{c}^2_{j,l} \oplus k_{j,l}) = 0 \end{aligned} \end{aligned}$$
(2)

for each \(j,l =0, \dots , 3\) as for a classical integral attack [5, 14] (all the details of the attack are given in the next section). The crucial point here is that there exists at least one key (namely, the secret key) that satisfies the previous equivalence.

Lemma 2

Let \(\{c^1, c^2, \tilde{c}^1, \tilde{c}^2\}\) be the set of ciphertexts corresponding to the 3-round encryptions of \(p^1, p^2, \tilde{p}^1, \tilde{p}^2\). With prob. 1 there exists (at least) one key k that satisfies Eq. (2).

Proof

Let \([R^2(p)]_{i,j}\) be the byte in position (ij) of the 2-round encryption of p. Let \(\hat{k}\) be the secret key, and let k be the guessed key. From Eq. (1),

$$\begin{aligned} \begin{aligned}&{{\;S^{-1}\left[ S\bigl ([R^2(p^1)]_{j,l} \oplus \hat{k}_{j,l}\bigl ) \oplus k_{i,j}\right] \oplus S^{-1}\left[ S\bigl ([R^2(p^2)]_{j,l} \oplus \hat{k}_{j,l}\bigl ) \oplus k_{i,j}\right] }} \\ {{\oplus }}&{\small {\;S^{-1}\left[ S\bigl ([R^2(\tilde{p}^1)]_{j,l} \oplus \hat{k}_{j,l}\bigl ) \oplus k_{i,j}\right] \oplus S^{-1}\left[ S\bigl ([R^2(\tilde{p}^2)]_{j,l} \oplus \hat{k}_{j,l}\bigl ) \oplus k_{i,j}\right] = 0}}, \end{aligned} \end{aligned}$$

where

$$ c^1 = R^3(p^1) \equiv S\bigl (R^2(p^1) \oplus \hat{k}\bigl ) \text { and } S^{*}(\cdot ) \equiv \text {S-box}^{*}(\cdot ) $$

and similarly for the other texts. Due to Eq. (1), the equality is always satisfied for \(\hat{k}_{j,l} = k_{i,j}\). Hence, there exists at least one key that satisfies Eq. (2).    \(\square \)

If there is no key for which the previous condition is satisfied, then the set of ciphertexts \(\{c^1, c^2, \tilde{c}^1, \tilde{c}^2\}\) is not generated by 3-round AES, but by a random permutation. That is, if there is no key \(k_{j,l}\) that satisfies Eq. (2), then the ciphertexts \(\{c^1, c^2, \tilde{c}^1, \tilde{c}^2\}\) are not the 3-round AES encryptions of \(p^1, p^2, \tilde{p}^1, \tilde{p}^2\).

However, since our goal is to set up a distinguisher which is independent of the secret key, we need a way to check this property without checking the existence of a key. So the problem is to rewrite this property in order to avoid key guessing. To solve this issue, the idea is to look for values of \(\{c^1, c^2, \tilde{c}^1, \tilde{c}^2\}\) for which Eq. (2) does not admit any solution k. As a result, we are going to show that a particular property, which is independent of the secret key, holds for the ciphertexts only in the case in which the key-recovery (mixture integral) attack just proposed fails.

Theorem 3

Given the subspace \(\mathcal {C}_0 \cap \mathcal {D}_{0,3} \equiv \langle e_{0,0}, e_{1,0}\rangle \subseteq \mathcal {C}_0\), consider two plaintexts \(p^1\) and \(p^2\) in the same coset \(\left( \mathcal {C}_0 \cap \mathcal {D}_{0,3} \right) \oplus a\) generated by \(p^1 \equiv ( z^1, w^1)\) and \(p^2 \equiv ( z^2, w^2)\). Let \(p^3, p^4 \in \mathcal {C}_0 \oplus a\) be two other plaintexts generated by

$$ p^3 \equiv (z^1, w^1, \varPsi , \varPhi ), \, p^4 \equiv (z^2, w^2, \varPsi , \varPhi ) \text { or } p^3 \equiv (z^1, w^2, \varPsi , \varPhi ), \, p^4 \equiv (z^2, w^1, \varPsi , \varPhi ), $$

where \(\varPsi \) and \(\varPhi \) can take any possible value in \(\mathbb {F}_{2^8}\). For all \(i, j =0, \dots , 3\) and for all pairwise distinctFootnote 3 \(\alpha , \beta , \gamma , \delta \in \{1,2,3,4\}\), the condition

$$\begin{aligned} \bigl [R^3(p^\alpha ) \oplus R^3(p^\beta )]_{i, j} = 0 \quad \text {and} \quad \bigl [R^3(p^\gamma ) \oplus R^3(p^\delta )]_{i, j} \ne 0, \end{aligned}$$
(3)

where \([\cdot ]_{i,j}\) denotes the byte in row i and column j, can never hold for 3-round AES (without the final MixColumns operation), independently of the secret key, of the details of the S-box, and of the MixColumns matrix.

Note that the event given in Eq. (3) occurs with a probability of around

$$\begin{aligned} 1-\left( \frac{256\cdot 255 \cdot 254\cdot 253 + 3\cdot 256 \cdot 255 + 256}{256^4}\right) ^{16}\approx 2^{-1.675} \approx 31.34\% \end{aligned}$$
(4)

if the ciphertexts are generated by a random permutationFootnote 4 \(\varPi \), where

(1):

\(256\cdot 255 \cdot 254\cdot 253/256^4\) corresponds to the probability of all bytes \([\varPi (p^\alpha )]_{i, j}\), \([\varPi (p^\beta )]_{i, j}\), \([\varPi (p^\gamma )]_{i, j}\) ,\([\varPi (p^\delta )]_{i, j}\) being different,

(2):

\(3\cdot 256 \cdot 255/256^4\) corresponds to the probability of \([\varPi (p^\alpha )]_{i, j} = [\varPi (p^\beta )]_{i, j}\) and \([\varPi (p^\gamma )]_{i, j} = [\varPi (p^\delta )]_{i, j}\), where the first value is different from the second one, for all three independent combinations of \(\alpha , \beta , \gamma , \delta \in \{1,2,3,4\}\), and

(3):

\(256/256^4\) corresponds to the probability of all bytes being equal.

As a result, this property can be exploited to set up a secret-key distinguisher which is independent of the secret key.

Proof

We prove this result by contradiction. Assume there exist \(i,j \in \{0, \dots , 3\}\) and there exist pairwise distinct \(\alpha , \beta , \gamma , \delta \in \{1,2,3,4\}\) such that

$$\begin{aligned} \bigl [R^3(p^\alpha ) \oplus R^3(p^\beta )]_{i, j} = 0 \quad \text { and } \quad \bigl [R^3(p^\gamma ) \oplus R^3(p^\delta )]_{i, j} \ne 0. \end{aligned}$$

According to Lemma 2, there exists at least one key k for 3-round AES that satisfies Eq. (2). Moreover, \([R^3(p^\alpha )]_{i, j} = [R^3(p^\beta )]_{i, j} \implies c^\alpha _{i, j} = c^\beta _{i, j}\), and hence

$$ \text {S-box}^{-1}(c^\alpha _{i, j} \oplus k_{i, j}) \oplus \text {S-box}^{-1}(c^\beta _{i, j} \oplus k_{i, j}) = 0. $$

It follows that Eq. (2) reduces to

$$ \text {S-box}^{-1}(c^\gamma _{i, j} \oplus k_{i, j}) \oplus \text {S-box}^{-1}(c^\delta _{i, j} \oplus k_{i, j}) = 0. $$

Since \(\bigl [R^3(p^\gamma ) \oplus R^3(p^\delta )]_{i, j} \ne 0\), that is, \(c^\gamma _{i,j} \ne c^\delta _{i,j}\), it follows that

$$ \forall k_{j,l}: \qquad \text {S-box}^{-1}(c^\gamma _{i, j} \oplus k_{i, j}) \ne \text {S-box}^{-1}(c^\delta _{i, j} \oplus k_{i, j}), $$

which contradicts Lemma 2. As a result, for all pairwise distinct \(\alpha , \beta , \gamma , \delta \in \{1,2,3,4\}\), the condition

$$ \forall i, j =0, \dots , 3: \quad \bigl [R^3(p^\alpha ) \oplus R^3(p^\beta )]_{i, j} = 0 \quad \text { and } \quad \bigl [R^3(p^\gamma ) \oplus R^3(p^\delta )]_{i, j} \ne 0 $$

can never hold for 3-round AES.    \(\square \)

We decided to call this an impossible mixture integral distinguisher because it exploits a property which holds with prob. 0 and because it extends the mixture integral distinguisher presented before.

Notation. For the follow-up, we introduce a notation in order to easily explain the costs of the distinguisher and of the attacks based on the impossible zero-sum property just proposed. Let \(x^1, y^1, x^2, y^2 \in \mathbb {F}_{2^8}\) s.t. \(x^1\ne x^2, y^1\ne y^2\) arbitrary but fixed. Let \(\mathfrak T^{x, y}_{ \varPsi , \varPhi }\) be a set of two plaintexts defined by

$$\begin{aligned} \mathfrak T^{x, y}_{ \varPsi , \varPhi } := \{ p^1 = (x^1, y^1, \varPsi , \varPhi ), p^2 = (x^2, y^2, \varPsi , \varPhi )\}, \end{aligned}$$
(5)

where \(\varPsi , \varPhi \in \mathbb {F}_{2^8}\) and where \(p^1, p^ 2 \in \mathcal {C}_0 \oplus a\). Since \(x^1, y^1, x^2, y^2\) are fixed, we usually denote \(\mathfrak T^{x, y}_{ \varPsi , \varPhi }\) by \(\mathfrak T_{ \varPsi , \varPhi }\), that is, \(\mathfrak T^{x, y}_{ \varPsi , \varPhi } \equiv \mathfrak T_{ \varPsi , \varPhi }\).

Let \(\mathfrak S\) be the union of two sets \(\mathfrak T^{x, y}\) (i.e., as set of four plaintexts) defined as

$$\begin{aligned} \begin{aligned} \mathfrak S = \mathfrak T_{ \varPsi , \varPhi } \cup \mathfrak T_{ \psi , \phi } \equiv \bigl \{&p^1 = (x^1, y^1, \varPsi , \varPhi ), p^2 = (x^2, y^2, \varPsi , \varPhi ),\\&p^3 = (x^1, y^1, \psi , \phi ), p^4 = (x^2, y^2, \psi , \phi ) \bigl \}, \end{aligned} \end{aligned}$$
(6)

where \((\varPsi , \varPhi ) \ne (\psi , \phi )\). Note that given \(p^1, p^2, p^3, p^4 \in \mathfrak S\), the corresponding ciphertexts after 3-round AES encryptions satisfy Theorem 3.

figure a

Data Cost of the Distinguisher. If the goal is to distinguish 3-round AES from a random permutation with a probability higher than \(95\%\), one needs at least 8 different sets \(\mathfrak S\) defined as before, since \(1-(1-2^{-1.675})^N \ge 0.95\) if \(N\ge 8\). In order to generate 8 sets \(\mathfrak S\), one needs at least 5 different sets \(\mathfrak T\), since \(\left( {\begin{array}{c}5\\ 2\end{array}}\right) =10\ge 8\), which results in a data cost of \(5\cdot 2= 10\) chosen plaintexts.

We emphasize that the corresponding 3-round encryptions of

$$ p^1 \equiv (z^1, w^1, \varPsi , \varPhi ), \, p^2 \equiv (z^2, w^2, \varPsi , \varPhi ), p^3 \equiv (z^1, w^2, \varPsi ^\prime , \varPhi ^\prime ), \, p^4 \equiv (z^2, w^1, \varPsi ^\prime , \varPhi ^\prime ) $$

satisfy Lemma 2 even if \(\varPsi \ne \varPsi ^\prime \) and \(\varPhi \ne \varPhi ^\prime \). For completeness, if a success probability of \(65\%\) is sufficient, then one needs only \(3\) chosen plaintexts (by analogous computation, \(1 - (1 - 2^{-1.675})^N \ge 0.65\) if \(N\ge 3\), which implies that \(N \ge 3\) sets \(\mathfrak S\) are required, or equivalently 3 sets \(\mathfrak T\), that is 6 chosen plaintexts).

Remark. The sets \(\mathfrak S\) just defined are not independent. Here we explain why this is just a minor problem for the distinguisher. For simplicity, we work at byte level and consider three sets:

$$ \mathfrak X_1 = \{a_0, a_1, b_0, b_1\}, \quad \mathfrak X_2 = \{a_0, a_1, c_0, c_1\}, \quad \mathfrak X_3 = \{b_0, b_1, c_0, c_1\}, $$

where \(a_0, a_1, b_0, b_1, c_0, c_1\) correspond to a (fixed position) byte of \([\varPi (p^i)]\) for several (related) plaintexts \(p^i\).

Assume that both the sets \(\mathfrak X_1\) and \(\mathfrak X_2\) do not satisfy the event described in Eq. (3). Here we analyze the impact on \(\mathfrak X_3\).

  • If all bytes of \(\mathfrak X_1\) and of \(\mathfrak X_2\) are different, it is still possible that the set \(\mathfrak X_3\) satisfies the event described in Eq. (3), e.g., if \(b_0 = c_0\) and \(b_1\ne c_1\) (or vice-versa). Hence, the set \(\mathfrak X_3\) can still satisfy the event in Eq. (3), even if both the sets \(\mathfrak X_1\) and \(\mathfrak X_2\) do not satisfy it.

  • If all bytes of \(\mathfrak X_1\) are equal (or, more generally, if \(a_0 = a_1, b_0 = b_1, a_0 \ne b_0\)) then \(\mathfrak X_2\) does not satisfy the event described in Eq. (3) if and only if \(c_0 = c_1\) as well. In this case, it follows that \(\mathfrak X_3\) does not satisfy the event described in Eq. (3) with prob. 1. A similar conclusion holds if \(a_0 = b_0, a_1 = b_1, a_0 \ne a_1\).

Note that the first case happens with a probability much larger than the second one (that is, with prob. \(\frac{256\cdot 255 \cdot 254\cdot 253}{256\cdot 255 \cdot 254\cdot 253 + 3\cdot 256 \cdot 255 + 256}\approx 99.99\%\) for each set \(\mathfrak X_i\) where \(i =1,2\)). As a result, if the sets \(\mathfrak S\) are generated as before (namely, reusing sets \(\mathfrak T\)), it follows that the real probability is actually smaller than \(1-(1-2^{-1.675})^N \). However, our practical experiments show that the real probability is just slightly smaller than the approximated theoretical one given before. Hence, the approximated theoretical number of sets given before corresponds in many cases to the real number of sets required in practice (or it is just slightly smaller).

Computational Cost of the Distinguisher. The property needs to be tested for each pair of the \(5\) input sets, and for each of the \(16\) state bytes. Testing the property requires \(\left( {\begin{array}{c}4\\ 2\end{array}}\right) \cdot 2\) table lookups and XOR operations. Hence, the total cost consists of at most \( \left( {\begin{array}{c}5\\ 2\end{array}}\right) \cdot \left( {\begin{array}{c}4\\ 2\end{array}}\right) \cdot 2 \cdot 16 = 1920 \approx 2^{10.9} \) table lookups and XOR operations, i.e., about \(2^5\) 3-round AES encryptions (for 20 S-boxes \(\approx \) 1-roundFootnote 5).

Before going on, we mention that this cost is roughly of the same order as the one required to set up a 3-round AES distinguisher based on the truncated differential property (see e.g. [11, Sect. 4.3] for more details).

Practical Verification. We implemented the distinguisher and found that by using 10 chosen plaintexts, it indeed successfully detects 3-round AES or a random permutation in around \(95\%\) of the cases. Our sample size was \(300000 \approx 2^{18}\) and we used \(21\)-round AES to simulate a random permutation. Moreover, by using 6 or 8 chosen plaintexts instead of 10, this probability decreases to \(61\%\) and \(82.9\%\), respectively. Using 12, 14, or 16 chosen plaintexts instead of 10, this probability increases to \(98.5\%\), \(99.7\%\), and \(99.95\%\), respectively.

Note that using 6 chosen plaintexts, we would expect a success probability of \(\approx 65\%\) using our approximated theoretical formula. As explained before, this small gap is due to the way in which the sets \(\mathfrak S\) are constructed. A similar consideration also holds for the case in which 10 plaintexts are used, where the practical success probability is slightly smaller than the theoretical one.

A Similar Distinguisher on 4-round AES. In [13], we consider the possibility to set up a similar impossible mixture integral distinguisher on 4-round AES (by extending the 3-round integral distinguisher). However, as we show there, it seems that a trivial application of such a distinguisher on 4 rounds requires more than the full code book. An open future problem is to study the possibility to set up a similar distinguisher on 4 (or even more) rounds of AES.

4 Mixture Integral Attacks on Reduced-Round AES

4.1 Mixture Integral Key-Recovery Attack on 3-Round AES

Equation (1) is the starting point for a key-recovery attack on 3- and 4-round AES. The attack works in the same way as a classical integral key-recovery attack [5, 14], with the crucial difference that it has a data cost of only 4 chosen plaintexts.

Given the subspace \(\mathcal {C}_0 \cap \mathcal {D}_{0,3} \equiv \langle e_{0,0}, e_{1,0}\rangle \subseteq \mathcal {C}_0\), consider two plaintexts \(p^1\) and \(p^2\) in the same coset \(\left( \mathcal {C}_0 \cap \mathcal {D}_{0,3} \right) \oplus a\) generated by \(p^1 \equiv ( z^1, w^1)\) and \(p^2 \equiv ( z^2, w^2)\). Let \(\tilde{p}^1, \tilde{p}^2 \in \mathcal {C}_0 \oplus a\) be two other plaintexts generated by

$$ \tilde{p}^1 \equiv (z^1, w^1, \varPsi , \varPhi ), \, \tilde{p}^2 \equiv (z^2, w^2, \varPsi , \varPhi ) \text { or } \tilde{p}^1 \equiv (z^1, w^2, \varPsi , \varPhi ), \, \tilde{p}^2 \equiv (z^2, w^1, \varPsi , \varPhi ), $$

where \(\varPsi \) and \(\varPhi \) can take any possible value in \(\mathbb {F}_{2^8}\). Moreover, let \(c^1, c^2, \tilde{c}^1, \tilde{c}^2\) denote the corresponding ciphertexts after 3-round AES, i.e., \(c^1 = R^3(p^1), c^2 = R^3(p^2), \tilde{c}^1 = R^3(\tilde{p}^1)\) and \(\tilde{c}^2 = R^3(\tilde{p}^2)\). Assume that the final MixColumns operation has been omittedFootnote 6. Due to the proposed zero-sum distinguisher and working at byte level, we know that the secret key \(k_{i,j}\) for each \(i, j = 0, \dots , 3\) satisfies

$$\begin{aligned} \begin{aligned}&\text { S-box}^{-1}(c^1_{i,j} \oplus k_{i,j}) \oplus \text {S-box}^{-1}(c^2_{i,j} \oplus k_{i,j}) \\ \oplus&\text { S-box}^{-1}(\tilde{c}^1_{i,j} \oplus k_{i,j})\oplus \text {S-box}^{-1}(\tilde{c}^2_{i,j} \oplus k_{i,j}) = 0 \end{aligned} \end{aligned}$$
(7)

with prob. 1 independently of the S-box. Since a wrongly guessed key satisfies the previous equality with a probability of \(2^{-8}\), it is possible to find the right one. The data cost of the attack is 4 chosen plaintexts, while the computational cost is 16 (= number of bytes) \(\cdot \; 2^8\) (= number of \(k_{i,j}\)) \(\cdot \; 4\) (= number of S-boxes) \(= 2^{14}\) S-box operations, i.e., \(2^{8.1}\) 3-round AES (assuming 20 S-boxes \(\approx \) 1-round).

figure b

An Optimal Implementation of the Attack. The previous attack does not require any memory cost. In order to reduce the computational cost, another version of the attack can be considered. The idea is simply to generate a table with the solutions \(\kappa \) of the equation

$$\begin{aligned} \text {S-box}^{-1}(i \oplus \kappa )\oplus \text {S-box}^{-1}(j \oplus \kappa ) \oplus \text {S-box}^{-1}(h \oplus \kappa ) \oplus \text {S-box}^{-1}(l \oplus \kappa ) = 0 \end{aligned}$$
(8)

for each \(i, j, h, l \in \mathbb {F}_{2^8}\).

We briefly analyze the number of solutions of Eq. (8). If \(i=j=h=l\) or if \(i= j\) and \(h=l\) (where \(j\ne h\) – analogous for the other six cases), then the previous equality is always satisfied for each \(\kappa \). Moreover, due to the result presented in Sect. 3.2 we know that the caseFootnote 7 \(i=j\) and \(h\ne l\) (analogous for the other six cases) can never occur for 3-round AES (where ijhl correspond to \([R^3(p^\alpha )]_{x, y}, [R^3(p^\beta )]_{x, y}, [R^3(p^\gamma )]_{x, y}, [R^3(p^\delta )]_{x, y}\) in Theorem 3). Hence, in the case \(i\ne j, i\ne h, i\ne l, j\ne h, j\ne l, h\ne l\) (useful for the attack), the average number of solutions is 1 (as shown in [10, Sect. 5.2] and by practical experiments).

Once such a table is generated and the ciphertexts \(c^1, c^2, \tilde{c}^1, \tilde{c}^2\) are given, the attacker needs only 16 table lookups to find the secret key (one lookup for each byte of the key), which is much less than a single encryption – a complete pseudo code is given in Algorithm 2. An overall estimation for the precomputation cost is given by \(2^{32} \cdot 2^8 \cdot 4 = 2^{42}\) S-box operations, that is, \(2^{36.1}\) 3-round AES encryptions.

figure c

4.2 Mixture Integral Key-Recovery Attack on 4-Round AES

The previous attack can be extended to 4-round AES using the same technique proposed in [5, 14] in order to extend an integral attack at the end. The idea is to guess the final anti-diagonal of the key, partially decrypt one round, and to use the previous attack on 3 rounds to filter out wrongly guessed keys:

where \(c^i = R^4(p^i), d^i = R^4(q^i)\) for \(i=1,2\) and where \(\mathfrak S = \{p^1, p^2, q^1, q^2\}\) is defined as in Eq. (6). We refer to Algorithm 3 for a complete pseudo code. Since the technique used to extend the attack is well-known in the literature, here we only analyze its cost. We remark that no details of the key schedule are used to set up the attack. The pseudo-code is given in Appendix A.

Data Cost. We consider 2 pairs of texts, that is, \((p^1, q^1)\) and \((p^2, q^2)\) generated by mixing variables. The attacker guesses 4 bytes of the last key, i.e., \((k^4_{0,0}, k^4_{1,3}, k^4_{2,2}, k^4_{3,1})\) and 4 bytes of the second-to-last key, i.e., \((k^3_{0,0}, k^3_{1,0}, k^3_{2,0}, k^3_{3,0})\) (analogous for the other four cases). Using the subkey bytes, after 2-round decryptions they can verify the zero-sum property on (at most) four bytes of the texts. Moreover, using only 4 chosen plaintexts, the probability the a key passes the test is \(2^{-32}\), which means that the number of remaining keys is \({2^{32}}\) (= values of \(k^4\)) \(\cdot \; {2^{32}}\) (= values of \(k^3\)) \(\cdot \; 2^{-32} = 2^{32}\). Without exploiting the key schedule, the attacker thus needs at least another pair of texts \((p^3, q^3)\) to detect wrongly guessed keys (without using exhaustive search), for a total of 6 chosen plaintexts.

Computational Cost. First of all, a 1-round decryption costs \(2^{32}\) (anti-diagonal of \(k^4\)) \(\cdot \; 4\) (number of S-boxes) \(\cdot \; 6\) (number of texts) \(= 3\cdot 2^{35}\) S-box lookups. For each anti-diagonal of \(k^4\), the cost of finding 4 bytes of \(k^3\) is \(2\cdot (1+ 2^{-8}+2^{-16}+2^{-24}) = 2\) table lookups. Indeed, note that since the probability that the first condition holds (\(A[g(\tilde{c}^1_{0,0},\tilde{d}^1_{0,0}, \tilde{c}^2_{0,0},\tilde{d}^2_{0,0})] = A[g(\tilde{c}^1_{0,0},\tilde{d}^1_{0,0}, \tilde{c}^3_{0,0},\tilde{d}^3_{0,0})]\) in Algorithm 3) is \(2^{-8}\), it is \(2^{-16}\) for both the first and the second condition, and so on. If the first condition is not satisfied, the attacker does not need to evaluate the others (and similarly for the other cases). Hence, the cost of finding the full key is around \(4\cdot 3\cdot 2^{35} \cdot 2 = 3\cdot 2^{38}\) S-box lookups, i.e., \(2^{33.3}\) 4-round encryptions.

Practical Verification. We implemented and practically verified Algorithm 2 in C++, which allows us to find the last secret round key almost instantly on our tested machine (Intel i7-8550U @ 4.00 GHz).

4.3 Impossible Mixture Integral Attack on 4-Round AES

In Sect. 3.2, we presented a new 3-round AES secret-key distinguisher which is independent of the key (namely, the impossible mixture integral distinguisher, which is different from the one just used for the key-recovery attacks just presented). Using the techniques just described, it is possible to set up a key-recovery attack on 4-round AES (see Appendix A for details). However, since the corresponding attack is not competitive w.r.t. other attacks in the literature, we only present its details in the extended version of the paper [13].

5 Mixture Integral Attacks on Reduced-Round AES with a Single Secret S-Box

The 3-round mixture integral attack proposed in Sect. 4.1 exploits a property which holds with prob. 1 and which is independent of the secret key and of the details of the S-box. For this reason, we are going to show that a similar attack can be set up on 3-round AES with a single secret S-box, exploiting an idea similar to the one proposed in [18]. The strategy consists of two steps:

  1. 1.

    The attacker finds the S-box up to additive constants, i.e., \(\text {S-box}^{-1}(\cdot \oplus a) \oplus b\).

  2. 2.

    The attacker exploits the previous information in order to find the key up to \(2^8\) equivalents, like \((k_0, k_1 \oplus k_0, \dots , k_{15} \oplus k_0)\).

5.1 Strategy of the Attack

Finding the S-Box (Up to Additive Constants). In order to find \(S' = \text {S-box}^{-1}(\cdot \oplus a) \oplus b\), we make use of Eq. (7), namely \(\bigoplus _{i=1,2} [\text { S-box}^{-1}(c^i_{0,0} \oplus k_{0,0}) \oplus \text {S-box}^{-1}(\tilde{c}^i_{0,0} \oplus k_{0,0})] = 0\). This is similar to what is done in [18], where the authors exploit the fact that

$$ \bigoplus _{x \in (\mathcal {D}_0 \cap \mathcal {C}_0)} \text {S-box}^{-1}([R^4(x)]_{0,0} \oplus k_{0,0}) = 0, $$

which is a well-known property of the integral attack on 4-round AES. We emphasize that this equality involves 256 different texts, while the one exploited in this paper requires only 4 texts (even if on a smaller number of rounds).

Working as in [18], taking different sets (\(c^1, c^2, \tilde{c}^1, \tilde{c}^2\)) of ciphertexts corresponding to the 3-round encryptions of plaintexts that share the same generating variables, we can now try to generate sufficiently many linear equations to be able to determine \(L \circ \text {S-box}^{-1}(\cdot \oplus a) \oplus b\). However, we are only able to determine

$$ \{L \circ \text {S-box}^{-1}(\cdot \oplus a) \oplus b \, | \, L: \mathbb {F}_{2^8} \rightarrow \mathbb {F}_{2^8} \text { linear permutation} \,\, \& \,\, a,b \in \mathbb {F}_{2^8} \}, $$

which is of size \(2^{70.2}\). In the following, \(L \circ \text {S-box}^{-1}(\cdot \oplus a) \oplus b \equiv A \circ \text {S-box}^{-1}(\cdot \oplus a)\), where A is an affine permutation. This is sufficient to find the secret key, since

$$ \bigoplus _{x \in \mathcal {X}} \text {S-box}^{-1}(x \oplus a) = 0 \iff \bigoplus _{x \in \mathcal {X}} A \circ \text {S-box}^{-1}(x \oplus a) = 0, $$

where the last condition holds since \(\mathcal {X}\) has an even number of elements and since A is an affine operation. This also implies that for the attack it is sufficient to recover the linear part of the affine operation A, since the constant b does not change the result of the sum due to the fact that \(\mathcal {X}\) has an even number of elements (for this reason we only focus on the linear part in the following).

As each linear equation gives us one byte of information and as we can only determine the S-box up to \(2^{70.2}\le 2^{72} = 2^{9\cdot 8}\) variants, there can at most be \(256 - 9 = 247\) linearly independent equations like Eq. (7). By practical experiments, we found that using \(3 \cdot 256 = 768 \approx 2^{9.6}\) different sets \(\mathfrak S\) of pairs of texts as defined in Eq. (6) are sufficient in most cases to generate a set of equations with rank \(247\) (for a cost of \(4 \cdot 2^{9.6} = 2^{11.6}\) chosen plaintexts).Footnote 8

figure d

With this set of equations and working as in [18], we now start to assign linearly independent values to variables in order to potentially increase the rank of the corresponding coefficient matrix to \(256\). However, when assigning a value, it might happen that we do not increase the rank of the matrix. If this is the case, we remove the assignment for this variable and try a different one instead, in order to avoid variable assignments which result in a system with no solutions (this countermeasure is sufficient due to the Rouché-Capelli theoremFootnote 9). We repeat this approach until we have found \(9\) variables such that fixing them to linearly independent values in \(\mathbb {F}_{2^8}\) results in a rank-\(256\) coefficient matrix. Such a set of variable assignments can easily be found after a small number of trials.

Finding the Key Given the S-Box. Given \(A\circ \text {S-box}^{-1}(\cdot \oplus k_{0,0})\) for some unknown A and \(k_{0,0}\), it is possible to find \(k_{0,0} \oplus k_{i, j}\) for \(0 \le i \le 3, 0 \le j \le 3\) (except where \(i = j = 0\)), where \(k\) denotes the third round key. Indeed, given \(p^1, p^2, q^1, q^2\) as before and the corresponding ciphertexts \(c^1, c^2, \tilde{c}^1, \tilde{c}^2\) after 3 rounds, we know that

$$\begin{aligned} \begin{aligned}&\left[ S^{-1}\biggl ([c^1_{i,j} \oplus (k_{i,j} \oplus k_{0,0})] \oplus k_{0,0}\biggl )\right] \oplus \left[ S^{-1}\biggl ([c^2_{i,j} \oplus (k_{i,j} \oplus k_{0,0})] \oplus k_{0,0}\biggl )\right] \\ \oplus&\left[ S^{-1}\biggl ([\tilde{c}^1_{i,j} \oplus (k_{i,j} \oplus k_{0,0})] \oplus k_{0,0}\biggl )\right] \oplus \left[ S^{-1}\biggl ([\tilde{c}^2_{i,j} \oplus (k_{i,j} \oplus k_{0,0})] \oplus k_{0,0}\biggl )\right] = 0, \end{aligned} \end{aligned}$$
(9)

is satisfied (see Eq. (2)), where \(S(\cdot ) = \text {S-box}(\cdot )\). Since \(A\circ \text {S-box}^{-1}(\cdot \oplus k_{0,0})\) is known (note that the affine layer \(A(\cdot )\) plays no role), it is possible to find \(k_{i,j} \oplus k_{0,0}\) by guessing \(k_{i,j} \oplus k_{0,0}\) (\(2^8\) values) and verifying that Eq. (2) is fulfilled.

Computational Cost. As we have just seen, \(2^{11.6}\) chosen plaintexts are needed for our attack to work with high probability. Finding the S-box consists of matrix rank calculations and the solving step for the system of linear equations. The rank of an \(m \times n\) matrix with coefficients in \(\mathbb {F}_2\) can be found in \(\mathcal {O}(mn^2)\) XOR operations involving single bits using Gaussian elimination. In our case, \(n = 256\) and \(m = 768\) (the value of m is obtained via practical tests – see before), therefore we need at most \(768 \cdot 256^2 \approx 2^{25.6}\) single-bit XOR operations for the initial rank calculation, which amounts to \(\approx 2^{22.6}\) \(8\)-bit XOR operations. For the additional variable assignments, where we need to recalculate the rank for each assignment, note that we assign values to variables such that we can (1) reuse the previous matrix (which is in row echelon form) and (2) minimize the number of operations needed by choosing variables efficiently.Footnote 10 For example, if \(10\) trials are needed to find \(9\) suitable variables (which is sufficient in most cases according to our tests), we can choose them such that at most \(10^4 \approx 2^{13.3}\) XOR operations are needed (note that these include \(8\)-bit XOR operations now, since we are adding arbitrary elements of \(\mathbb {F}_{2^8}\) to our augmented matrix).

We still need to evaluate the cost of solving the final system of linear equations. However, note that this is almost free, since our final matrix is already in row echelon form due to the previous computations.

In order to find the \(15\) key relations \(k_{i,j} \oplus k_{0,0}\), we need \(15 \cdot 256 \cdot 4 = 2^{13.91}\) table lookups, which amounts to about \(2^{8}\) \(3\)-round AES encryptions (assuming that the cost of one encryption round is approximately the same as the cost of \(20\) table lookups). Hence, the complexity of the whole attack is given by \(\approx 2^{25.6}\) single-bit XOR operations (in order to find the rank of the first \(768 \times 256\) matrix) and of \(2^8\) \(3\)-round AES encryptions (in order to find the \(15\) key relations \(k_{i,j} \oplus k_{0,0}\)).

Practical Verification. We implemented the attack in practice and both finding an S-box representative and finding the key relations take less than \(0.2\) seconds on our tested machine (without the time needed for the encryption oracle). We note that the bounds for XOR operations given in our theoretical estimation above are actually upper bounds. We expect the real number of operations to be lower, mainly because our initial systems are relatively sparse.