1 Introduction

AES (Advanced Encryption Standard) [9] is probably the most widely used and studied block cipher. Since the development of cryptanalysis of AES and AES-like constructions in the late 1990s, sets of inputs that differ only in one diagonal have had special importance. Indeed, such sets appear in several attacks and distinguishers, including various (truncated) differential [16, 17], integral [8], and impossible differential attacks [4], among others. In particular, given a diagonal set of plaintexts and the corresponding ciphertexts after 4 rounds, it is well known that the XOR-sum of the ciphertexts is equal to zero [8], and that no pair of ciphertexts can be equal in any of the four anti-diagonals, as shown by Biham and Keller in [5].

While a lot is known about the encryption of a diagonal set of plaintexts – that is, a set of plaintexts with one (or more) active diagonal(s) – for up to 4-round AES, an analysis for 5 or more rounds of AES is still missing. At Eurocrypt 2017, a new property which is independent of the secret key was found for 5-round AES [14]. By appropriate choices of a number of input pairs, it is possible to make sure that the number of times that the difference of the resulting output pairs lies in a particular subspace \(\mathcal{I}\mathcal{D}\) is always a multiple of 8. Such a distinguisher has since been exploited in, e.g., [2, 11] for setting up new competitive distinguishers and key-recovery attacks on round-reduced AES.

At the same time, some open questions arise from the result provided in [14]: does this property influence the average number of output pairs that lie in such a particular subspace (i.e., the mean)? Are other parameters (including the variance and the skewness) affected by the multiple-of-8 property?

In this paper, given a diagonal set of plaintexts, we consider the probability distribution of the corresponding number of pairs of ciphertexts that are equal in one fixed anti-diagonal after 5-round AES (without the final MixColumns operation) – equivalently, that belong to the same coset of a particular subspace \(\mathcal{I}\mathcal{D}\) – denoted in the following as the “(average) number of collisions”.

1.1 Contributions

As the main contribution, we perform for the first time a differential analysis of this distribution after 5-round AES, and find significant deviations from random, supported by practical implementations and verification. For a theoretical explanation we have to resort to an APN-like assumption on the S-Box, which is nearly met by the AES S-Box. A numerical summary is given in Table 1. All the results presented in this paper are independent of the secret key.

Table 1. Expected properties of a diagonal set after 5-round encryption. Given a set of \(2^{32}\) chosen plaintexts all equal in three diagonals (that is, a diagonal set), we consider the distribution of the number of different pairs of ciphertexts that are equal in one anti-diagonal (equivalently, that lie in a particular subspace \(\mathcal{I}\mathcal{D}_I\) for \(I\subseteq \{0,1,2,3\}\) fixed with \(|I|=3\)). Expected values for mean and variance of these distributions are given in this table for 5-round AES and for a random permutation. Practical results on AES are close and are discussed in Sect. 7.2.

Mean of 5-Round AES. Firstly, by an appropriate choice of \(2^{32}\) plaintexts in a diagonal space \(\mathcal D\), we prove for the first time that the average number of times that the resulting output pairs are equal in one fixed anti-diagonal (equivalently, the average number of times that the difference of the resulting output pairs lie in a particular subspace \(\mathcal{I}\mathcal{D}\)) is (a little) bigger for 5-round AES than for a random permutation, independently of the secret key. A complete proof of this result – under an “APN-like” assumption on the S-Box which closely resembles the AES S-Box – can be found in Sect. 6.

Variance of 5-Round AES. Secondly, we theoretically compute the variance of the probability distribution just defined, and we show that it is higher (by a factor of approximately 36) for 5-round AES than for a random permutation. As we are going to show, this result is mainly due to the “multiple-of-8” result [14] proposed at Eurocrypt 2017. For this reason, in contrast to the mean value, the variance is independent of the details of the S-Box.

Practical Verification and Influence of the S-Box Details on the Mean. We practically verified the mean on small-scale 5-round AES (namely, AES defined over \(\mathbb F_{2^4}^{4\times 4}\) as proposed in [7]), and the variance both for small-scale and real 5-round AES. As discussed in Sect. 7, practical results are close to the theoretical ones in both cases. Before going on, we mention that the theoretical and the practical results regarding the mean (almost) match if the S-Box satisfies an “APN-like” assumption (which is nearly met by the AES S-Box), namely, if the solutions of the equality \(\text {S-Box}(\cdot \oplus \varDelta _I) \oplus \text {S-Box}(\cdot )=\varDelta _O\) are uniformly distributed for each pair of non-zero input/output differences \(\varDelta _I, \varDelta _O\ne 0\). If this assumption – also used in other related works such as [1, 3] – is not satisfied, then a gap between the theoretical and the practical results can occur, as shown and discussed in detail in the extended version of this paper – see [13, App. C].

Probability Distribution of 5-Round AES. By combining the multiple-of-8 property presented in [14], the mixture differential cryptanalysis [11, 12] and the results just mentioned about the mean and the variance, in Sect. 3 we show the following: given a diagonal space of \(2^{32}\) plaintexts with one active diagonal, the probability distribution of the number of different pairs of ciphertexts which are equal in one fixed anti-diagonal after 5-round AES (without the final MixColumns operation) with respect to (1st) all possible secret keys and (2nd) all possible initial diagonal spaces is well described by a sum of independent binomial distributions \(\mathfrak B(n, p)\), that is

$$ 2^3 \times \mathfrak B(n_3, p_3) + 2^{10} \times \mathfrak B(n_{10}, p_{10}) + 2^{17} \times \mathfrak B(n_{17}, p_{17}) $$

where the values of \(n_3, n_{10}, n_{17}\) and \(p_3, p_{10}, p_{17}\) are provided in the following.

1.2 Follow-Up Works: Truncated Differentials for 5-/6-Round AES

Before going on, we recall the other results concerning truncated differentials for 5- or 6-round AES present in the literature.

In [1], Bao, Guo and List presented “extended expectation cryptanalysis” (or “extended truncated differential”) on round-reduced AES. By making use of expectation-based distinguishers, they are able to show how to extend the well-known 3-round integral distinguisher to truncated differential secret-key distinguishers over 4, 5 and even 6 rounds. The technique exploited to derive such a result is based on results by Patarin [20], who observed that the expected (average) number of collisions differs slightly for a sum of permutations from the ideal. At the same time, the authors showed that their results (namely, the expectation distinguishers over 4-, 5- and 6-round AES proposed in the main part of [1]) can be derived by exploiting the same technique/strategy that we are going to propose in this paper in Sect. 6, as shown in detail in [1, App. C].

Later on, in [3] Bardeh and Rønjom developed another technique in order to set up equivalent truncated differential distinguishers for up to 6-round AES. This technique – called the “exchange equivalence attack” – resembles the yoyo technique [21] and the mixture differential cryptanalysis [11], and it allows one to give a precise estimation of the average number of pairs of ciphertexts that are equal in fixed anti-diagonal(s), given a particular set of chosen plaintexts. The corresponding secret-key distinguisher on 6-round AES has complexity of about \(2^{88.2}\) computations and chosen texts.

Remark. Before going on, we remark that all these results are valid only under the “APN” assumption of the S-Box previously mentioned. Namely, both our and the theoretical results proposed in [1, 3] regarding the average number of collisions after 5 or more rounds of AES hold only in the case in which the solutions of the equality \(\text {S-Box}(\cdot \oplus \varDelta _I) \oplus \text {S-Box}(\cdot )=\varDelta _O\) are uniformly distributed for each non-zero input/output differences \(\varDelta _I, \varDelta _O\ne 0\), an assumption that is (almost) satisfied by the AES S-Box. More details about this are provided in the following.

2 Preliminary

2.1 Advanced Encryption Standard (AES)

AES [9] is a Substitution-Permutation network based on the “Wide Trail Design” strategy [10], that supports key sizes of 128, 192 and 256 bits. The 128-bit plaintext initializes the internal state as a \(4 \times 4\) matrix of bytes as values in the finite field \(\mathbb F_{2^8}\). Depending on the version of AES, \(N_r\) rounds are applied to the state: \(N_r=10\) for AES-128, \(N_r=12\) for AES-192 and \(N_r=14\) for AES-256. An AES round applies four operations to the state matrix:

  • SubBytes (S-Box) - applying the same 8-bit to 8-bit invertible S-Box 16 times in parallel on each byte of the state (provides non-linearity in the cipher);

  • ShiftRows (SR) - cyclic shift of each row to the left;

  • MixColumns (MC) - multiplication of each column by a constant \(4 \times 4\) invertible matrix (MC and SR provide diffusion in the cipher);

  • AddRoundKey (ARK) - XORing the state with a 128-bit subkey k.

One round of AES can be described as \( R(x) =k \oplus MC\circ SR \circ \text { S-Box} (x)\). In the first round an additional AddRoundKey operation (using a whitening key) is applied, and in the last round the MixColumns operation is omitted.

Notation Used in the Paper. Let x denote a plaintext, a ciphertext, an intermediate state or a key. Then, \(x_{i, j}\) with \(i, j \in \{0, \ldots , 3\}\) denotes the byte in the row i and in the column j. We denote by R one round of AES (and \(R_f\) if the MixColumns operation is omitted), while we denote r rounds of AES by \(R^{r}\) (where we use the notation \(R^r_f\) in the case in which the last MixColumns operation is omitted). We also define the diagonal and the anti-diagonal of a text as follows. The i-th diagonal of a \(4 \times 4\) matrix A is defined as the elements that lie on row r and column c such that \(r- c \equiv _4 i\). The i-th anti-diagonal of a \(4 \times 4\) matrix A is defined as the elements that lie on row r and column c such that \(r+c \equiv _4 i\).
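As a small illustration of the two definitions above, the following Python sketch lists the (row, column) positions of the i-th diagonal and of the i-th anti-diagonal of a \(4 \times 4\) state. The helper names `diagonal` and `anti_diagonal` are ours and are introduced only for this example.

```python
def diagonal(i):
    """Positions (r, c) of a 4x4 state with r - c = i (mod 4)."""
    return [(r, c) for r in range(4) for c in range(4) if (r - c) % 4 == i]


def anti_diagonal(i):
    """Positions (r, c) of a 4x4 state with r + c = i (mod 4)."""
    return [(r, c) for r in range(4) for c in range(4) if (r + c) % 4 == i]


if __name__ == "__main__":
    for i in range(4):
        print(i, diagonal(i), anti_diagonal(i))
```

Each diagonal and each anti-diagonal contains exactly four bytes, one per row and one per column.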

2.2 Properties of an S-Box

Given a bijective S-Box function on \(\mathbb F_{2^n}\), let \(\varDelta _I, \varDelta _O \in \mathbb F_{2^n}\). Let \(N_{\varDelta _I, \varDelta _O}\) denote the number of solutions of the equation

$$\begin{aligned} \text {S-Box}(x \oplus \varDelta _{I}) \oplus \text {S-Box}(x) = \varDelta _{O} \end{aligned}$$
(1)

for each \(\varDelta _I\ne 0\) and \(\varDelta _O \ne 0\). Obviously, (i) x is a solution if and only if \(x\oplus \varDelta _I\) is a solution, and (ii) if \(\varDelta _O = 0\), then any \(x \in \mathbb F_{2^n}\) is a solution if and only if \(\varDelta _I = 0\) (the S-Box is bijective).

Let’s analyze the probability distribution related to \(N_{\varDelta _I, \varDelta _O}\).

Mean Value. Independently of the details of the S-Box, the mean value (or the average value) of \(N_{\varDelta _I, \varDelta _O}\) is equal to \(\mathbb E[N_{\varDelta _I, \varDelta _O}] = \frac{2^n}{2^n-1}\). Indeed, observe that for each x and for each \(\varDelta _I \ne 0\) there exists exactly one \(\varDelta _O \ne 0\) (since the S-Box is bijective) that satisfies Eq. (1). Thus, the \(2^n\cdot (2^n-1)\) solutions are spread over the \((2^n-1)^2\) pairs \((\varDelta _I, \varDelta _O)\) of non-zero differences, and the average number of solutions is \(\frac{2^n\cdot (2^n-1)}{(2^n-1)^2} = \frac{2^n}{(2^n-1)}\) independently of the details of the (bijective) S-Box.

Variance. The variance \(\mathtt {\texttt{Var}}(N_{\varDelta _I, \varDelta _O})\) depends on the details of the S-Box. For the AES S-Box, for each \(\varDelta _I \ne 0\) there are 128 values of \(\varDelta _O \ne 0\) for which Eq. (1) has no solution, 126 values of \(\varDelta _O \ne 0\) for which Eq. (1) has 2 solutions (\(\hat{x}\) is a solution if and only if \(\hat{x} \oplus \varDelta _I\) is a solution) and finally 1 value of \(\varDelta _O \ne 0\) for which Eq. (1) has 4 solutions. The variance for the AES S-Box is therefore equal to \(\mathtt {\texttt{Var}}_{AES}(N_{\varDelta _I, \varDelta _O}) = 2^2\cdot \frac{126}{255} + 4^2 \cdot \frac{1}{255} - \left( \frac{256}{255}\right) ^2 =\frac{67\,064}{65\,025}\).

Maximum Differential Probability. The Maximum Differential Probability \(\mathtt {DP_{max}}\) of an S-Box is defined as

$$\begin{aligned} \mathtt {DP_{max}} = 2^{-n} \cdot \max _{\varDelta _I \ne 0, \varDelta _O} N_{\varDelta _I, \varDelta _O}\,. \end{aligned}$$
(2)

Since \(\max _{\varDelta _I \ne 0, \varDelta _O} N_{\varDelta _I, \varDelta _O}\ge 2\), \(\mathtt {DP_{max}}\) is always bigger than or equal to \(2^{-n+1}\). Permutations with \(\mathtt {DP_{max}} = 2^{-n+1}\) are called Almost Perfect Nonlinear (APN).

“Homogeneous” S-Box. Finally, given \(\varDelta _I\ne 0\) (respectively, \(\varDelta _O\ne 0\)), consider the probability distribution of \(N_{\varDelta _I, \varDelta _O}\) with respect to \(\varDelta _O\ne 0\) (respectively, \(\varDelta _I\ne 0\)): we say that the S-Box is (differential) “homogeneous” if such distribution is independent of \(\varDelta _I\) (respectively, \(\varDelta _O\)). As a concrete example, the AES S-Box is differential “homogeneous”, since for each \(\varDelta _I\ne 0\) (fixed), \(\text {{Pr}}(N_{\varDelta _I, \varDelta _O} = 2) = \frac{126}{255}\) and \(\text {{Pr}}(N_{\varDelta _I, \varDelta _O} = 4) = \frac{1}{255}\). Other examples of S-Boxes that are/are not differential “homogeneous” are given in the extended version of this paper – see [13, App. C].
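The quantities of this section can be checked numerically. The following Python sketch rebuilds the AES S-Box from its standard definition (inversion in \(\mathbb F_{2^8}\) followed by the affine map) and computes, for every \(\varDelta _I \ne 0\), the histogram of \(N_{\varDelta _I, \varDelta _O}\) over \(\varDelta _O \ne 0\). All function and variable names are ours, and homogeneity is checked here only with respect to \(\varDelta _I\) (rows of the difference table), not with respect to \(\varDelta _O\).

```python
from collections import Counter
from fractions import Fraction


def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(a):
    """Inverse computed as a^254; the value 0 is mapped to 0."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def aes_sbox():
    """AES S-Box: field inversion followed by the affine transformation."""
    box = []
    for x in range(256):
        y = gf_inv(x)
        z = y
        for s in range(1, 5):  # y xor rotl(y,1) xor ... xor rotl(y,4)
            z ^= ((y << s) | (y >> (8 - s))) & 0xFF
        box.append(z ^ 0x63)
    return box


def row_histogram(sbox, d_in):
    """Histogram of N_{d_in, d_out} over all non-zero d_out."""
    row = Counter(sbox[x ^ d_in] ^ sbox[x] for x in range(256))
    return Counter(row.get(d_out, 0) for d_out in range(1, 256))


SBOX = aes_sbox()
hists = [row_histogram(SBOX, d) for d in range(1, 256)]
hist = hists[0]
mean = Fraction(sum(k * v for k, v in hist.items()), 255)
var = Fraction(sum(k * k * v for k, v in hist.items()), 255) - mean**2
homogeneous = all(h == hist for h in hists)
dp_max = Fraction(max(max(h) for h in hists), 256)

if __name__ == "__main__":
    print(dict(hist), mean, var, homogeneous, dp_max)
```

Exact fractions are used so that the variance can be compared directly with the closed-form value given above.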

3 Probability Distribution for 5-Round AES

In this section, we first recall some results already published in the literature about round-reduced AES. Then, given a diagonal space of \(2^{32}\) plaintexts with one active diagonal, we present the probability distribution of the number of different pairs of ciphertexts which are equal in one fixed anti-diagonal after 5-round AES (without the final MixColumns operation).

3.1 Truncated Differentials for 2-Round AES

Here we recall the truncated differential for 2-round AES using the subspace trail notation introduced in [15]. In the following, we only work with vectors and vector spaces over \(\mathbb F_{2^n}^{4 \times 4}\), and we denote by \(\{e_{0,0}, \ldots , e_{3,3}\}\) the unit vectors of \(\mathbb F_{2^n}^{4 \times 4}\) (e.g., \(e_{i,j}\) has a single 1 in row i and column j).

Definition 1

For each \(i\in \{0,1,2,3\}\):

  • The column spaces \(\mathcal C_i\) are defined as \(\mathcal C_i = \langle e_{0, i}, e_{1, i}, e_{2, i}, e_{3, i} \rangle \).

  • The diagonal spaces \(\mathcal D_i\) are defined as \(\mathcal D_i = SR^{-1}(\mathcal C_i)\). Similarly, the inverse-diagonal spaces \(\mathcal{I}\mathcal{D}_i\) are defined as \(\mathcal{I}\mathcal{D}_i = SR(\mathcal C_i)\).

  • The i-th mixed spaces \(\mathcal M_i\) are defined as \(\mathcal M_i = MC (\mathcal{I}\mathcal{D}_i)\).

Definition 2

For each \(I \subseteq \{0, 1, 2, 3\}\), let \(\mathcal C_I\), \(\mathcal D_I\), \(\mathcal{I}\mathcal{D}_I\) and \(\mathcal M_I\) be defined as

$$ \mathcal C_I = \bigoplus _{i\in I} \mathcal C_i\,, \qquad \mathcal D_I = \bigoplus _{i\in I} \mathcal D_i\,, \qquad \mathcal{I}\mathcal{D}_I = \bigoplus _{i\in I} \mathcal{I}\mathcal{D}_i\,, \qquad \mathcal M_I = \bigoplus _{i\in I} \mathcal M_i\,. $$

Definition 3

Let \(t \in \mathbb F_{2^n}^{4\times 4}\) be a text in a coset of a space \(\mathcal X \subseteq \mathbb F_{2^n}^{4\times 4}\) such that \(\mathcal X= \langle x_0, x_1, \ldots , x_{d-1}\rangle \) where \(\dim (\mathcal X) = d\), namely \(t \in \mathcal X \oplus \gamma \). Given \(\gamma \), \((t_0, t_1,\ldots , t_{d-1})\in \mathbb F_{2^n}^d\) are the generating variables of t if the following holds:

$$ t\equiv ( t_0,t_1,\ldots , t_{d-1}) \qquad \text { if and only if} \qquad t = \gamma \oplus \bigoplus _{j=0}^{d-1} t_j \cdot x_{j}. $$

As shown in detail in [15], for any coset \(\mathcal D_I \oplus \alpha \) there exists \(\beta \in \mathbb F_{2^8}^{4\times 4}\) such that \(R(\mathcal D_I \oplus \alpha ) = \mathcal C_I \oplus \beta \). In a similar way, for any coset \(\mathcal C_I \oplus \beta \) there exists \(\gamma \in \mathbb F_{2^8}^{4\times 4}\) such that \(R(\mathcal C_I \oplus \beta ) = \mathcal M_I \oplus \gamma \).

Theorem 1

([15]). For each \(I\subseteq \{0,1,2,3\}\) and for each \(\alpha \in \mathbb F_{2^8}^{4\times 4}\), there exists \(\beta \in \mathbb F_{2^8}^{4\times 4}\) such that \(R^{2}(\mathcal D_I \oplus \alpha ) = \mathcal M_I \oplus \beta \). Equivalently:

$$\begin{aligned} \text {Prob}(R^{2}(x) \oplus R^{2}(y) \in \mathcal M_I \, | \, x \oplus y \in \mathcal D_I) = 1\,. \end{aligned}$$
(3)

3.2 Multiple-of-8 Property and Mixture Differential Cryptanalysis

As already recalled in the introduction, the first known property independent of the secret-key for 5-round AES – called “multiple-of-8” property [14] – has been presented at Eurocrypt 2017.

Theorem 2

([14]). Let \(\{p^i\}_{i\in \{0, 1, \ldots , 2^{32\cdot d} -1\}}\) be \(2^{32\cdot d}\) plaintexts with \(1\le d\le 3\) active diagonals, or equivalently in the same coset of a diagonal subspace \(\mathcal D_I\) for a certain \(I \subseteq \{0,1,2,3\}\) with \(|I|=d\). Consider the corresponding ciphertexts after 5 rounds (without the final MixColumns operation), that is, \((p^i, c^i)\) for \(i \in \{ 0, \ldots , 2^{32\cdot |I|}-1\}\) where \(c ^i = R^5_f(p^i)\). The number of different pairs (see Footnote 1) of ciphertexts \((c^i, c^j)\) that are equal in \(1\le a\le 3\) anti-diagonals (i.e., that belong to the same coset of a subspace \(\mathcal{I}\mathcal{D}_J\) for a certain \(J \subseteq \{0,1,2,3\}\) with \(|J|=4-a\)) is always a multiple of 8, independently of the secret key, of the details of the S-Box and of the MixColumns matrix.

We refer to [6, 11, 14] for details. Such a result is strictly related to the mixture differential cryptanalysis [11] proposed at FSE/ToSC’19.

Theorem 3

([11]). Let \(t^1,t^2\) be two texts in \(\mathcal C_i \oplus \gamma \) for a certain \(i\in \{0,1,2,3\}\), namely two plaintexts that differ in the i-th column only. Let \(t^1 \equiv ( x^1_0, x^1_1, x^1_2, x^1_3)\) and \(t^2 \equiv ( x^2_0, x^2_1, x^2_2, x^2_3)\) be their generating variables. Let \(s^1, s^2 \in \mathcal C_i \oplus \gamma \) be defined as follows:

  • if \(x^1_i \ne x^2_i\) for a certain \(i\in \{0,1,2,3\}\): the i-th generating variable \(s^1_i\) of \(s^1\) is either \(x^1_i\) or \(x^2_i\), and the i-th generating variable of \(s^2\) is \(\{x^1_i, x^2_i\} \setminus s^1_i\);

  • if \(x^1_i = x^2_i\) for a certain \(i\in \{0,1,2,3\}\): the i-th generating variable \(s^1_i\) of \(s^1\) is equal to the i-th generating variable of \(s^2\) (no condition on the value).

The following holds:

  1. \(R^2(t^1) \oplus R^2(t^2) = R^2(s^1) \oplus R^2(s^2)\);

  2. for each \(J\subseteq \{0,1,2,3\}\):

    $$\begin{aligned} R^4(t^1) \oplus R^4(t^2) \in \mathcal M_J \quad \text { if and only if } \quad R^4(s^1) \oplus R^4(s^2) \in \mathcal M_J\,. \end{aligned}$$
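The construction of \(s^1, s^2\) in Theorem 3 can be made concrete with a short enumeration. The Python sketch below builds, for a given pair of generating-variable tuples, all unordered pairs \(\{s^1, s^2\}\) allowed by the two rules of the theorem. The function name `mixture_class` and the parameter `q` (the field size, 256 for AES) are our own choices for illustration. If v variables are equal, the enumeration yields \(2^{3-v}\cdot q^{v}\) pairs, that is \(2^{7\cdot v+3}\) for \(q=2^8\), which matches the set cardinalities used in Sect. 5.

```python
from itertools import product


def mixture_class(t1, t2, q=256):
    """All unordered pairs {s1, s2} derived from (t1, t2) as in Theorem 3.

    t1 and t2 are 4-tuples of generating variables with values in range(q).
    """
    choices = []
    for a, b in zip(t1, t2):
        if a != b:
            # Differing variable: s1 takes one value, s2 takes the other.
            choices.append([(a, b), (b, a)])
        else:
            # Equal variable: s1 and s2 share the same, arbitrary, value.
            choices.append([(v, v) for v in range(q)])
    pairs = set()
    for combo in product(*choices):
        s1 = tuple(c[0] for c in combo)
        s2 = tuple(c[1] for c in combo)
        pairs.add(frozenset((s1, s2)))  # unordered: {s1, s2} == {s2, s1}
    return pairs


if __name__ == "__main__":
    print(len(mixture_class((0, 1, 2, 3), (4, 5, 6, 7))))  # no equal variable
    print(len(mixture_class((0, 1, 2, 3), (0, 5, 6, 7))))  # one equal variable
```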

3.3 Main Result: Probability Distribution for 5-Round AES

Given a set of \(2^{32\cdot d}\) plaintexts with \(1\le d\le 3\) active diagonal(s), consider the probability distribution of the number of pairs of ciphertexts which are equal in \(1\le a\le 3\) fixed anti-diagonal(s) (without the final MixColumns operation):

  • what can we say about the mean, the variance and the skewness of this distribution?

  • does the multiple-of-8 property influence the average number of output pairs that lie in a particular subspace (i.e., the mean)? Are other parameters (as the variance and the skewness) affected by the multiple-of-8 property?

Here we answer these questions.

Theorem 4

Given an AES-like cipher that works with texts in \(\mathbb F_{2^8}^{4\times 4}\), assume that (1st) the MixColumns matrix is an MDS matrix and that (2nd) the solutions of the equation \(\text {S-Box}(x \oplus \varDelta _{I}) \oplus \text {S-Box}(x) = \varDelta _{O}\) are uniformly distributed for each non-zero input/output difference \(\varDelta _I \ne 0\) and \(\varDelta _O\ne 0\).

Given \(2^{32}\) plaintexts \(\{p^i\}_{i\in \{0, 1, \ldots , 2^{32} -1\}}\) with one active diagonal (i.e., in a coset of a diagonal subspace \(\mathcal D_i\) for \(i\in \{0,1,2,3\}\)), consider the number of different pairs of ciphertexts \((c^h, c^j)\) for \(h\ne j\) that belong to the same coset of \(\mathcal{I}\mathcal{D}_J\) for any fixed \(J\subseteq \{0,1,2,3\}\) with \(|J|=3\). The corresponding probability distribution – denoted in the following by \(\mathfrak {D}_{\text {5-AES}}\) – with respect to

  • all possible initial cosets of the diagonal space \(\mathcal D_i\), and

  • all possible secret keys

is given by

$$\begin{aligned} \mathfrak {D}_{\text {5-AES}} = 2^3 \times \mathfrak B(n_3, p_3) + 2^{10} \times \mathfrak B(n_{10}, p_{10}) + 2^{17} \times \mathfrak B(n_{17}, p_{17}), \end{aligned}$$
(4)

where \(\mathfrak B_i \sim \mathfrak B(n_i, p_i)\) for \(i\in \{3,10,17\}\) are binomial distributions, and where \(n_i\) and \(p_i\) for \(i\in \{3,10,17\}\) are equal to

$$\begin{aligned} n_3&= 2^{28} \cdot (2^8-1)^4\,,&\qquad&p_3 = 2^{-32} + 2^{-53.983}\,;\\ n_{10}&=2^{23} \cdot (2^8-1)^3\,,&\qquad&p_{10} = 2^{-32} - 2^{-45.989}\,;\\ n_{17}&= 3\cdot 2^{15}\cdot (2^8-1)^2\,,&\qquad&p_{17} = 2^{-32} + 2^{-37.986}\,. \end{aligned}$$

This distribution has mean value \(\mu = 2\,147\,484\,685.6\) and standard deviation \(\sigma = 277\,204.426\).

In order to prove Theorem 4, we first derive the values \(n_i\) for \(i=3,10,17\) and prove the result given in Eq. (4). In the next sections, we formally compute the probabilities \(p_i\) for \(i\in \{3, 10,17\}\), the value of the mean and the variance.

4 Initial Considerations

About the S-Box: “Uniform Distribution of the Solutions of \({{\textbf {S-Box}}}(\cdot \oplus \varDelta _{I}) \oplus {{\textbf {S-Box}}}(\cdot ) = \varDelta _{O}\)”. Before going further, we discuss the assumptions of Theorem 4, focusing on the one related to the properties/details of the S-Box. The fact that “the solutions of Eq. (1) are uniformly distributed for each \(\varDelta _I \ne 0\) and \(\varDelta _O\ne 0\)” basically corresponds to an S-Box that satisfies the following properties:

  1. it is “homogeneous” (defined in Sect. 2.2);

  2. its variance \(\mathtt {\texttt{Var}}(N_{\varDelta _I, \varDelta _O})\) is as low as possible (see Footnote 2).

This is close to being true if the S-Box is APN, or if the S-Box is “close” to being APN. Although much is known for (bijective) APN permutations in odd dimension, it is known that there is no APN permutation of dimension 4 [18] and that there is at least one APN permutation, up to equivalence, of dimension 6 (that is, Dillon’s permutation), while the question of finding an APN bijective (n, n)-function for even \(n \ge 8\) is still open. As a result, in the case of dimensions equal to a power of 2 (e.g., \(\mathbb F_{2^4}\) or \(\mathbb F_{2^8}\)), the only (known) S-Box that (approximately) matches the assumptions of the Theorem in dimensions 4 or 8 is the one generated by the multiplicative-inverse permutation (see Footnote 3), as for example the AES S-Box, which is not APN but differentially 4-uniform [19] (e.g., note that the variance of the AES S-Box is \(67\,064/65\,025\) vs \(2^2\cdot \frac{128}{255} - \left( \frac{256}{255}\right) ^2 = 65\,024/65\,025\) for an APN S-Box). As we are going to show, our practical results on small-scale AES (for which the S-Box has the same property as the full-size AES one) are very close to the ones predicted by the previous Theorem.

We remark that even if the assumptions on the S-Box of Theorem 4 are restrictive, they match criteria used to design an S-Box which is strong against differential and linear cryptanalysis. As a result, many ciphers in the literature are built using S-Boxes that satisfy (or are close to satisfying) the assumptions of Theorem 4.

Influence of the S-Box. If the S-Box does not satisfy the required properties related to the assumption of the Theorem, then the average number of collisions can be different from the one previously given. To be more concrete, in the extended version of this paper [13, App. C], we provide several practical examples of the dependency of the average number of collisions for small-scale AES-like ciphers with respect to the properties of the S-Box. We also mention that, in the case in which the assumption about the S-Box is not fulfilled, it turned out (by practical tests) that also the details of the MixColumns matrix can influence the average number of collisions.

Probability Distribution of a Random Permutation. Here we briefly compare the probability distribution for 5-round AES and the one of a random permutation. The difference between the two can be used to set up new truncated differential distinguishers for 5-round AES, as we are going to show concretely in the extended version of this paper [13, Sect. 8].

Proposition 1

Consider \(2^{32}\) plaintexts \(\{p^i\}_{i\in \{0, 1, \ldots , 2^{32} -1\}}\) with one active diagonal (equivalently, a coset of a diagonal space \(\mathcal D_i\) for \(i\in \{0,1,2,3\}\)), and the corresponding (cipher)texts generated by a random permutation \(\varPi \), that is \(c^i = \varPi (p^i)\). The probability distribution of the number of different pairs of ciphertexts \((c^h, c^j)\) that belong to the same coset of \(\mathcal{I}\mathcal{D}_J\) for any fixed \(J\subseteq \{0,1,2,3\}\) with \(|J|=3\) is given by a binomial distribution \(\mathfrak B(n, p)\), where \( n = \left( {\begin{array}{c}2^{32}\\ 2\end{array}}\right) = 2^{31} \cdot (2^{32}-1)\) and \(p = \frac{2^{96}-1}{2^{128}-1}\approx 2^{-32}\). The average number of collisions of such distribution is equal to \(2^{31}-0.5 = 2\,147\,483\,647.5\), while its variance is equal to \(2\,147\,483\,647 \simeq 2^{31}\).
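The mean and the variance stated in Proposition 1 follow directly from the binomial formulas \(n\cdot p\) and \(n\cdot p\cdot (1-p)\). The following Python sketch evaluates them with exact rational arithmetic; the variable names are ours.

```python
from fractions import Fraction

# Parameters of Proposition 1.
n = 2**31 * (2**32 - 1)              # number of pairs among 2^32 texts
p = Fraction(2**96 - 1, 2**128 - 1)  # probability of a collision in ID_J

mean = n * p                # binomial mean
variance = n * p * (1 - p)  # binomial variance

if __name__ == "__main__":
    print(float(mean), float(variance))
```

Using `Fraction` avoids the rounding that would occur when multiplying a number close to \(2^{63}\) by a probability close to \(2^{-32}\) in floating point.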

It follows that:

  • independently of the secret key, the average number of pairs of ciphertexts which are equal in one fixed anti-diagonal is (a little) bigger for 5-round AES than for a random permutation (approximately \(1\,038.1\) more collisions);

  • independently of the secret key, the variance of the probability distribution of the number of collisions is much bigger for 5-round AES than for a random permutation (by a factor of approximately 36).
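The two figures quoted in the list above can be re-derived from the values stated in Theorem 4 and Proposition 1. The short Python sketch below only performs this arithmetic; the variable names are ours and the input numbers are copied from the text.

```python
# Values stated in Theorem 4 (5-round AES).
mu_aes = 2_147_484_685.6
sigma_aes = 277_204.426

# Values stated in Proposition 1 (random permutation).
mu_rand = 2_147_483_647.5
var_rand = 2_147_483_647.0

extra_collisions = mu_aes - mu_rand        # gap between the two means
variance_ratio = sigma_aes**2 / var_rand   # ratio between the two variances

if __name__ == "__main__":
    print(round(extra_collisions, 1), round(variance_ratio, 2))
```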

To highlight this difference, Fig. 1 proposes a comparison between the probability distribution of the number of collisions for the AES case (approximated here for simplicity by a normal distribution) in red and of the random case in blue.

Fig. 1. Comparison between the theoretical probability distribution of the number of collisions between 5-round AES (approximated – only here – by a normal distribution) and a random permutation. Remark: since the AES probability distribution – in red – satisfies the multiple-of-8 property, the probability that the number of collisions n is not a multiple of 8 is equal to zero, namely \(Prob( n \ne 8 \cdot n^\prime ) = 0\). (Color figure online)

5 Proof of Theorem 4: Sum of Binomial Distributions

Consider a set of \(2^{32}\) plaintexts with one active diagonal and the corresponding ciphertexts after 5-round AES (without the final MixColumns operation). As shown by the multiple-of-8 property [14] and by the mixture differential cryptanalysis [11], the corresponding pairs of ciphertexts of such a set of plaintexts are not independent/unrelated. In particular, these pairs of texts can be divided into \(n_3+n_{10}+n_{17} + n_{24}\) sets defined as in [11] (recalled in Theorem 3) such that

  1. for each \(i\in \{3,10,17,24\}\), exactly \( n_{i}\) sets have cardinality \(2^{i}\);

  2. each set of cardinality \(2^{i}\) contains pairs of texts for which \(v = (i-3)/7\) out of the four generating variables are equal (and \(4-v\) are different) after 1-round encryption;

  3. given each one of such sets, it is not possible that some pairs of ciphertexts are equal in \(1\le a\le 3\) anti-diagonals (i.e., that belong to the same coset of \(\mathcal{I}\mathcal{D}_J\)) after 5-round, while other pairs of ciphertexts in the same set are not equal in those a anti-diagonals;

  4. pairs of texts of different sets are independent (in the sense that pairs of texts of different sets do not satisfy the property just given for the case of pairs of texts that belong to the same set).

The values of \(n_3, n_{10}, n_{17}, n_{24}\) are computed in detail in the next paragraph.

Due to the impossible differential trail on 4-round AES [5, 15], if three out of the four generating variables of the input plaintexts are equal after 1-round encryption, then the corresponding ciphertexts cannot be equal in any anti-diagonal. In other words, the probability \(p_{24}\) is equal to zero. For this reason, we will only focus on \(n_3, n_{10}, n_{17}\) in the following.

About the Values of \(\boldsymbol{n_3, n_{10}, n_{17}}\). Given a set of \(2^{32}\) chosen texts with one active column (see Footnote 4), the number of pairs of texts with \(0\le v\le 3\) equal generating variables (and \(4-v\) different generating variables) after one round is given by

$$\begin{aligned} \left( {\begin{array}{c}4\\ v\end{array}}\right) \cdot 2^{31} \cdot (2^{8}-1)^{4-v}\,. \end{aligned}$$
(5)

Indeed, note that if v variables are equal for the two texts of the given pair, then these variables can take \((2^8)^v\) different values. For each one of the remaining \(4-v\) variables, the variables must be different for the two texts. Thus, these \(4-v\) variables can take exactly \(\bigl [2^8 \cdot (2^{8}-1)\bigr ]^{4-v} / 2\) different values. The result follows from the fact that there are \(\left( {\begin{array}{c}4\\ v\end{array}}\right) \) different combinations of v variables.

Due to Eq. (5), the numbers \(n_{7\cdot v+3}\) of sets of pairs of texts with “no equal generating variables” (namely, \(v = 0\)), with “one equal and three different generating variables” (namely, \(v = 1\)) and finally with “two equal and two different generating variables” (namely, \(v = 2\)) are given by:

$$\begin{aligned} \forall v\in \{0,1,2\}:\qquad n_{7\cdot v +3} = \left( {\begin{array}{c}4\\ v\end{array}}\right) \cdot \frac{2^{31} \cdot (2^{8}-1)^{4-v}}{2^{7\cdot v +3}}\,. \end{aligned}$$
(6)
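The counting argument behind Eq. (5) can be checked by brute force on a toy field. In the Python sketch below, the field size \(2^8\) is replaced by a small q (an assumption made only to keep the enumeration feasible), so that \(2^{31}\) becomes \(q^4/2\). The function names `count_pairs_by_equal_vars` and `eq5` are ours. The last lines evaluate Eq. (6) for the real case \(q = 2^8\).

```python
from collections import Counter
from itertools import combinations, product
from math import comb


def count_pairs_by_equal_vars(q):
    """Count unordered pairs of 4-tuples over range(q) by number of equal entries."""
    texts = list(product(range(q), repeat=4))
    counts = Counter()
    for t1, t2 in combinations(texts, 2):
        v = sum(a == b for a, b in zip(t1, t2))
        counts[v] += 1
    return counts


def eq5(q, v):
    """Eq. (5) with 2^8 replaced by q: binom(4, v) * q^4 / 2 * (q - 1)^(4 - v)."""
    return comb(4, v) * (q**4 // 2) * (q - 1) ** (4 - v)


counts = count_pairs_by_equal_vars(4)  # toy field with q = 4 elements

# Eq. (6) for AES (q = 2^8): number of sets of cardinality 2^(7v + 3).
n_sets = {7 * v + 3: eq5(256, v) // 2 ** (7 * v + 3) for v in range(3)}

if __name__ == "__main__":
    print({v: (counts[v], eq5(4, v)) for v in range(4)})
    print(n_sets)
```

Summing Eq. (5) over \(0\le v\le 3\) gives the total number of pairs, \(\left( {\begin{array}{c}2^{32}\\ 2\end{array}}\right) \) for AES, which is a quick consistency check of the formula.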

About Binomial Distributions \(\boldsymbol{B_i \sim \mathfrak B(n_i, p_i)}\) for \(\boldsymbol{i \in \{ 3, 10, 17\}}\). Due to the previous facts, it follows that the probability of the event “\(n=8 \cdot n^\prime \) pairs of ciphertexts equal in one fixed anti-diagonal” for \(n^\prime \in \mathbb N\) – equivalently, “\(n=8 \cdot n^\prime \) collisions” in a coset of \(\mathcal{I}\mathcal{D}_J\) for \(J\subseteq \{0,1,2,3\}\) with \(|J|=3\) – corresponds to the sum of the probabilities to have “\(2^3\cdot k_3\) collisions in the first set and \( 2^{10} \cdot k_{10}\) collisions in the second set and \(2^{17} \cdot k_{17}\) collisions in the third set” for each \(k_3, k_{10}, k_{17}\) such that \(2^3 \cdot k_3 + 2^{10} \cdot k_{10} + 2^{17} \cdot k_{17} = n\).

Each one of these (independent) events is well characterized by a binomial distribution. By definition, a binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. In our case, given n pairs of texts, each one of them satisfies or not the above property/requirement with the same probability p.

Probability Distribution. Due to all these initial considerations (based on the multiple-of-8 property and on the mixture differential cryptanalysis), it follows that the distribution \(\mathfrak {D}_{\text {5-AES}}\) of the number of collisions for the AES case is well described by

$$ \mathfrak {D}_{\text {5-AES}} = 2^3 \times \mathfrak B_3 + 2^{10} \times \mathfrak B_{10} + 2^{17} \times \mathfrak B_{17}\,, $$

where \(\mathfrak B_i \sim \mathfrak B(n_i, p_i)\) for \(i=3, 10, 17\) are independent binomial distributions. In the following, we formally compute the values of \(n_i\) and of \(p_i\).

Mean Value and Variance. Due to the results just presented, it follows that the mean value \(\mu \) of \(\mathfrak {D}_{\text {5-AES}}\) is given by

$$\begin{aligned} \mu =&\mathbb E[\mathfrak {D}_{\text {5-AES}}] = \mathbb E[2^3 \times \mathfrak B_3 + 2^{10} \times \mathfrak B_{10} + 2^{17} \times \mathfrak B_{17}] \\ =&2^3 \cdot \mathbb E[\mathfrak B_3] + 2^{10} \cdot \mathbb E[\mathfrak B_{10}] + 2^{17} \cdot \mathbb E[\mathfrak B_{17}] \\ =&2^3 \cdot n_3 \cdot p_3 + 2^{10} \cdot n_{10} \cdot p_{10} + 2^{17} \cdot n_{17} \cdot p_{17}\, , \end{aligned}$$

where \(\mathbb E[a \cdot X + b \cdot Y + c] = a \cdot \mathbb E[X] + b \cdot \mathbb E[Y] + c\) for each \(a, b, c \in \mathbb R\) and for each random variable X and Y. Similarly, the variance \(\sigma ^2\) is given by

$$\begin{aligned} \sigma ^2&=\mathtt {\texttt{Var}}(\mathfrak {D}_{\text {5-AES}}) = \mathtt {\texttt{Var}}(2^3 \times \mathfrak B_3 + 2^{10} \times \mathfrak B_{10} + 2^{17} \times \mathfrak B_{17}) \\&= 2^6 \cdot \mathtt {\texttt{Var}}(\mathfrak B_3) + 2^{20} \cdot \mathtt {\texttt{Var}}(\mathfrak B_{10}) + 2^{34} \cdot \mathtt {\texttt{Var}}(\mathfrak B_{17}) \\&= 2^6 \cdot n_3 \cdot p_3 \cdot (1-p_3)+ 2^{20} \cdot n_{10} \cdot p_{10} \cdot (1-p_{10}) + 2^{34} \cdot n_{17} \cdot p_{17} \cdot (1-p_{17}), \end{aligned}$$

where \(\mathtt {\texttt{Var}}(a \cdot X + b \cdot Y + c) = a^2 \cdot \mathtt {\texttt{Var}}(X) + b^2 \cdot \mathtt {\texttt{Var}}(Y)\) for each \(a, b, c \in \mathbb R\) under the assumption that X and Y are independent random variables (remember that \(\mathfrak B_3, \mathfrak B_{10}, \mathfrak B_{17}\) are independent).
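The two rules above (weights enter the mean linearly and the variance quadratically) can be checked on a toy instance. The sketch below, in Python with exact rational arithmetic, compares the closed formulas with a brute-force enumeration of the joint distribution. The toy parameters and the function names are illustrative assumptions; the real \(n_i\) are far too large for enumeration.

```python
from fractions import Fraction
from itertools import product
from math import comb

def moments(terms):
    """Mean and variance of sum(c * B) for independent B ~ Binomial(n, p); terms = [(c, n, p), ...]."""
    mean = sum(c * n * p for c, n, p in terms)
    var = sum(c**2 * n * p * (1 - p) for c, n, p in terms)  # weights are squared here
    return mean, var

def brute_force_moments(terms):
    """Same quantities, obtained by enumerating every outcome (small n only)."""
    mean = Fraction(0)
    second = Fraction(0)
    for ks in product(*(range(n + 1) for _, n, _ in terms)):
        prob = Fraction(1)
        value = 0
        for k, (c, n, p) in zip(ks, terms):
            prob *= comb(n, k) * p**k * (1 - p) ** (n - k)
            value += c * k
        mean += prob * value
        second += prob * value**2
    return mean, second - mean**2

# Toy stand-in for 2^3 * B_3 + 2^10 * B_10 with tiny, made-up parameters.
toy = [(2**3, 3, Fraction(1, 2)), (2**10, 2, Fraction(1, 4))]
print(moments(toy), brute_force_moments(toy))
```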

6 Proof of Theorem 4: About the Probabilities \(p_3,p_{10}, p_{17}\)

6.1 Reduction to the Middle Round

In order to compute the probabilities \(p_3, p_{10}\) and \(p_{17}\) given before for 5-round AES, the idea is to work on an equivalent result on a single round. Due to the 2-round truncated differential with prob. 1 recalled in Sect. 3.1, we have that

$$\begin{aligned} \mathcal D_i \oplus \delta \xrightarrow [\text {prob. } 1]{R^{2}(\cdot )} \mathcal M_i \oplus \omega \xrightarrow []{R(\cdot )} \mathcal D_J \oplus \delta ^{\prime } \xrightarrow [\text {prob. } 1]{R_f^{2}(\cdot )} \mathcal{I}\mathcal{D}_J \oplus \omega ^{\prime }\,. \end{aligned}$$
(7)

For this reason, it is sufficient to focus on the middle round \(\mathcal M_i \oplus \omega \xrightarrow []{R(\cdot )} \mathcal D_J \oplus \delta ^{\prime }\) in order to compute the desired result.

Sketch and Organization of the Proof. W.l.o.g., we limit ourselves to consider plaintexts in the same coset of \(\mathcal M_0\) and to count the number of pairs of texts that are equal in the first diagonal after one round (the other cases are analogous). By definition of \(\mathcal M_0\), if \(p^1, p^2 \in \mathcal M_0 \oplus \omega \), then there exist \(x^i, y^i, z^i, w^i \in \mathbb F_{2^8}\) for \(i\in \{1,2\}\) such that:

$$ p^i = \omega \oplus \begin{bmatrix} 2 \cdot x^i &{} y^i &{} z^i &{} 3\cdot w^i \\ x^i &{} y^i &{} 3 \cdot z^i &{} 2 \cdot w^i \\ x^i &{} 3 \cdot y^i &{} 2 \cdot z^i &{} w^i \\ 3 \cdot x^i &{} 2 \cdot y^i &{} z^i &{} w^i \end{bmatrix} \,, $$

where \(2 \equiv \texttt{0x02}\) and \(3 \equiv \texttt{0x03}\). In the following, we say that \(p^1\) is “generated” by the generating variables \(( x^1, y^1, z^1, w^1 )\) and that \(p^2\) is “generated” by the generating variables \(( x^2, y^2, z^2, w^2 )\). As before, we use the notation \(p^i \equiv ( x^i, y^i, z^i, w^i )\). The proof is organized as follows:

  1. first of all, we limit ourselves to consider a subset of \(2^{16}\) texts with only 2 active bytes. Since this case is much simpler to analyze than the generic one, it allows us to highlight the crucial points of the proof;

  2. we then present the complete proof for the case of \(2^{32}\) texts in the same coset of \(\mathcal M_0\). Roughly speaking, this case is split in various sub-cases: each one of them is studied/analyzed independently of the others using the same strategy proposed for the simplest case of \(2^{16}\) texts. The final result is obtained by simply combining the results of each one of these sub-cases.

We emphasize that the following computations are influenced neither by the value of the secret key nor by the initial coset of the diagonal subspace \(\mathcal D_i\). That is, the following results are the average with respect to these two values.

6.2 A “Simpler” Case: \(2^{16}\) Texts with Two Equal Generating Variables

As a first case, we consider \(2^{16}\) texts for which two generating variables are equal, e.g., \(z^1=z^2\) and \(w^1=w^2\). Given two texts \(p^1\) generated by \((x^1, y^1, 0, 0)\) and \(p^2\) generated by \((x^2, y^2, 0, 0)\), they are equal in the first diagonal after one round if and only if the following four equations are satisfied

$$\begin{aligned} (R(p^1) \oplus R(p^2))_{0, 0} =&2\cdot (\text {S-Box}(2 \cdot x^1 \oplus a_{0,0}) \oplus \text {S-Box}(2 \cdot x^2 \oplus a_{0,0})) \\&\oplus 3\cdot (\text {S-Box}(y^1 \oplus a_{1,1}) \oplus \text {S-Box}(y^2 \oplus a_{1,1})) = 0\,,\\ (R(p^1) \oplus R(p^2))_{1, 1} =&\text {S-Box}(3 \cdot x^1 \oplus a_{3,0}) \oplus \text {S-Box}(3 \cdot x^2 \oplus a_{3,0}) \\&\oplus \text {S-Box}(y^1 \oplus a_{0,1}) \oplus \text {S-Box}(y^2 \oplus a_{0,1})= 0\,,\\ (R(p^1) \oplus R(p^2))_{2, 2} =&2\cdot (\text {S-Box}(x^1 \oplus a_{2,0}) \oplus \text {S-Box}( x^2 \oplus a_{2,0})) \\&\oplus 3\cdot (\text {S-Box}(2 \cdot y^1\oplus a_{3,1}) \oplus \text {S-Box}(2 \cdot y^2 \oplus a_{3,1}))= 0\,,\\ (R(p^1) \oplus R(p^2))_{3, 3} =&\text {S-Box}(x^1 \oplus a_{1,0}) \oplus \text {S-Box}(x^2 \oplus a_{1,0}) \\&\oplus \text {S-Box}(3 \cdot y^1 \oplus a_{2,1}) \oplus \text {S-Box}(3 \cdot y^2 \oplus a_{2,1})= 0\,, \end{aligned}$$

where \(a_{\cdot , \cdot }\in \mathbb F_{2^8}\) depends on the initial key and on the constant \(\omega \in \mathbb F_{2^8}^{4\times 4}\) that defines the coset. Equivalently, four equations of the form

$$\begin{aligned} \begin{aligned}&A\cdot \left( \text {S-Box}(B \cdot x^1 \oplus a) \oplus \text {S-Box}(B \cdot x^2 \oplus a) \right) \\ \oplus&\,C\cdot \left( \text {S-Box}(D\cdot y^1 \oplus c) \oplus \text {S-Box}(D\cdot y^2 \oplus c) \right) = 0 \end{aligned} \end{aligned}$$
(8)

must be satisfied, where \(A, B, C, D\in \mathbb F_{2^8}\) depend on the MixColumns matrix, while \(a, c\in \mathbb F_{2^8}\) depend on the secret key and on the initial constant \(\omega \).
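The four equations can be cross-checked against a direct computation of one AES round. The Python sketch below builds the AES S-Box from its algebraic definition, applies SubBytes, ShiftRows and MixColumns to two texts of \(\mathcal M_0 \oplus \omega \) generated by \((x^1, y^1, 0, 0)\) and \((x^2, y^2, 0, 0)\), and compares the difference on the first diagonal with the right-hand sides written above. It is only a minimal sketch: the round-key addition is omitted because it cancels in the difference, the constants \(a_{i,j}\) are taken equal to \(\omega _{i,j}\) (an assumption made for the test), and all function names are illustrative.

```python
import random

def gmul(a, b):
    """Multiplication in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r

def make_sbox():
    """AES S-Box: field inversion (v^254) followed by the affine map."""
    sbox = []
    for v in range(256):
        inv = 0
        if v:
            inv = 1
            for _ in range(254):
                inv = gmul(inv, v)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = make_sbox()
MC = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]

def round_no_key(state):
    """SubBytes, ShiftRows, MixColumns on state[row][col]; key addition is omitted."""
    sb = [[SBOX[v] for v in row] for row in state]
    sr = [[sb[r][(c + r) % 4] for c in range(4)] for r in range(4)]
    out = [[0] * 4 for _ in range(4)]
    for r in range(4):
        for c in range(4):
            for k in range(4):
                out[r][c] ^= gmul(MC[r][k], sr[k][c])
    return out

def text_in_m0(x, y, z, w, omega):
    """Element of the coset M_0 + omega generated by (x, y, z, w)."""
    m = [[gmul(2, x), y, z, gmul(3, w)],
         [x, y, gmul(3, z), gmul(2, w)],
         [x, gmul(3, y), gmul(2, z), w],
         [gmul(3, x), gmul(2, y), z, w]]
    return [[m[r][c] ^ omega[r][c] for c in range(4)] for r in range(4)]

def diagonal_difference(x1, y1, x2, y2, omega):
    """(R(p^1) + R(p^2))_{i,i} for i = 0..3, computed through the round function."""
    r1 = round_no_key(text_in_m0(x1, y1, 0, 0, omega))
    r2 = round_no_key(text_in_m0(x2, y2, 0, 0, omega))
    return [r1[i][i] ^ r2[i][i] for i in range(4)]

def closed_form(x1, y1, x2, y2, a):
    """Right-hand sides of the four equations, with a[i][j] in the role of a_{i,j}."""
    def d(coef, v1, v2, const):
        return SBOX[gmul(coef, v1) ^ const] ^ SBOX[gmul(coef, v2) ^ const]
    return [
        gmul(2, d(2, x1, x2, a[0][0])) ^ gmul(3, d(1, y1, y2, a[1][1])),
        d(3, x1, x2, a[3][0]) ^ d(1, y1, y2, a[0][1]),
        gmul(2, d(1, x1, x2, a[2][0])) ^ gmul(3, d(2, y1, y2, a[3][1])),
        d(1, x1, x2, a[1][0]) ^ d(3, y1, y2, a[2][1]),
    ]

def check(trials=100, seed=1):
    """Compare both computations on random inputs and random cosets."""
    rng = random.Random(seed)
    for _ in range(trials):
        omega = [[rng.randrange(256) for _ in range(4)] for _ in range(4)]
        x1, y1, x2, y2 = (rng.randrange(256) for _ in range(4))
        if diagonal_difference(x1, y1, x2, y2, omega) != closed_form(x1, y1, x2, y2, omega):
            return False
    return True
```

Calling `check()` exercises the comparison on random cosets; a pair is a collision in the first diagonal exactly when all four returned bytes are zero.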

Number of Solutions of Each Equation. Consider one of these four equations. By simple observation, Eq. (8) is satisfied if and only if the following system of equations is satisfied

$$\begin{aligned} \begin{aligned} \text {S-Box}(\hat{x} \oplus \varDelta _{I}) \oplus \text {S-Box}(\hat{x})&= \varDelta _{O}\\ \text {S-Box}(\hat{y} \oplus \varDelta _{I}^\prime ) \oplus \text {S-Box}(\hat{y})&= \varDelta _{O}^\prime \\ \varDelta _{O}^\prime = C^{-1} \cdot A \cdot \varDelta _{O} \end{aligned} \end{aligned}$$
(9)

for some value of \(\varDelta _{O}\), where \(\hat{x} = B \cdot x^1 \oplus a\), \(\varDelta _{I} = B\cdot (x^1 \oplus x^2)\), \(\hat{y} = D\cdot y^1 \oplus c\) and \(\varDelta _{I}^\prime = D\cdot (y^1 \oplus y^2)\). We emphasize that we exclude null solutions.

What is the number of different (not null) solutions \(\{(x^1, y^1), (x^2, y^2)\}\) of Eq. (8)? Given \(\varDelta _O\ne 0\), each one of the first two equations of (9) admits 256 different solutions \((\hat{x}, \varDelta _I)\) (respectively, \((\hat{y}, \varDelta _I^\prime )\)), since for each value of \(\hat{x} \in \mathbb F_{2^8}\), there exists \(\varDelta _I \ne 0\) that satisfies the first equation (similar for \(\hat{y}\) and \(\varDelta _I^\prime \)). It follows that the number of different solutions \(\{(x^1, y^1), (x^2, y^2)\}\) of Eq. (8) considering all the 255 possible values of \(\varDelta _O\) is exactly equal to

$$ \frac{1}{2} \cdot 255 \cdot ({256})^2 = 255 \cdot 2^{15}\,, $$

independently of the details of the S-Box. The factor 1/2 is due to the fact that we consider only different solutions, that is, two solutions of the form \((p^1\equiv (x^1, y^1), p^2\equiv (x^2, y^2))\) and \((p^2\equiv (x^1, y^1), p^1\equiv (x^2, y^2))\) are equivalent. In other words, a solution \(\{(x^1, y^1), (x^2, y^2)\}\) is valid if \(x^2 \ne x^1\) and \(y^1 < y^2\).
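The count \(255 \cdot 2^{15}\) can be reproduced by brute force. The Python sketch below tabulates, for each side of Eq. (8), how many ordered pairs of distinct inputs produce each weighted output difference, and then matches the two tables. To illustrate that only bijectivity of the S-Box matters, it uses a randomly shuffled permutation instead of the AES S-Box; the coefficients and constants are arbitrary choices made for the example, not values from the paper.

```python
import random
from collections import Counter

def gmul(a, b):
    """Multiplication in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r

def count_solutions(sbox, A, B, C, D, a, c):
    """Unordered pairs {(x1, y1), (x2, y2)} with x1 != x2 and y1 != y2 satisfying Eq. (8)."""
    def weighted_differences(outer, inner, const):
        values = [sbox[gmul(inner, v) ^ const] for v in range(256)]
        scale = [gmul(outer, d) for d in range(256)]
        return Counter(scale[values[v1] ^ values[v2]]
                       for v1 in range(256) for v2 in range(256) if v1 != v2)
    hx = weighted_differences(A, B, a)
    hy = weighted_differences(C, D, c)
    # Eq. (8) holds exactly when the two weighted differences coincide.
    ordered = sum(hx[d] * hy[d] for d in hx)
    return ordered // 2  # swapping p^1 and p^2 gives the same solution

rng = random.Random(0)
random_sbox = list(range(256))
rng.shuffle(random_sbox)  # any bijection works; this is not the AES S-Box

n_solutions = count_solutions(random_sbox, 2, 2, 3, 1, 0x3A, 0xC5)
print(n_solutions)
```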

Probability of Common Solutions. Knowing the number of solutions of Eq. (8), what is the number of common (different) solutions \(\{(x^1, y^1),\) \((x^2, y^2)\}\) of four equations of the form (8)? We have just seen that each equation of the form (8) has exactly \(255 \cdot 2^{15}\) different (not null) solutions \(\{(x^1, y^1), (x^2, y^2)\}\). Under the APN-like assumption on the S-Box and given that the MixColumns is defined by an MDS matrix, the probability that two equations admit the same solution (i.e., that \(\{(x^1, y^1), (x^2, y^2)\}\) – solution of one equation – is equal to \(\{(\hat{x}^1, \hat{y}^1), (\hat{x}^2, \hat{y}^2)\}\) – solution of another equation) is

$$\begin{aligned} (256 \cdot 255)^{-1} \cdot (255 \cdot 128)^{-1} = 255^{-2} \cdot 2^{-15}\,. \end{aligned}$$
(10)

To explain this probability, the first term \((256 \cdot 255)^{-1}\) is due to the fact that \(x^1 = \hat{x}^1\) with probability \(256^{-1}\), while \(x^2 = \hat{x}^2\) with probability \(255^{-1}\), since by assumption \(x^2\) (respectively, \(\hat{x}^2\)) cannot be equal to \(x^1\) (respectively, \(\hat{x}^1\)). The second term \((128 \cdot 255)^{-1}\) is due to the assumption on the second variable, that is \(y^1 < y^2\). To explain it, note that the possible number of pairs \((y^1, y^2)\) with \(y^1 < y^2\) is \(\sum _{i=0}^{255} i = \frac{255 \cdot (255 + 1)}{2} = 255 \cdot 128\) (see Footnote 5). It follows that \(y^1\) and \(y^2\) are equal to \(\hat{y}^1\) and \(\hat{y}^2\) with probability \((128 \cdot 255)^{-1}\).

Total Number of (Different) Common Solutions. In conclusion, the average number of common (different) solutions \(\{(x^1, y^1), (x^2, y^2)\}\) of 4 equations of the form (8) is given by

$$\begin{aligned} (255 \cdot 2^{15})^4 \cdot (255^{-2} \cdot 2^{-15})^3 = \frac{2^{15}}{255^2} \simeq 0.503929258 \simeq 2^{-1}+2^{-7.992}\,. \end{aligned}$$

For comparison, in the case in which the ciphertexts are generated by a random permutation, the average number of pairs of ciphertexts that satisfy the previous property is approximately given by

$$ \left( {\begin{array}{c}2^{16}\\ 2\end{array}}\right) \cdot (2^{-8})^4 = \frac{2^{16}-1}{2^{17}} \simeq 0.499992371 \simeq 2^{-1}-2^{-17}\,. $$
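The two averages can be recomputed exactly with rational arithmetic. The following Python sketch only re-evaluates the expressions above; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb, log2

solutions_per_equation = 255 * 2**15
# Eq. (10): probability that two equations share a given solution.
p_common = Fraction(1, 256 * 255) * Fraction(1, 255 * 128)

# Four equations: one set of solutions, three "agreement" events.
aes_average = solutions_per_equation**4 * p_common**3

# Random permutation: each of the C(2^16, 2) pairs matches on four bytes.
random_average = comb(2**16, 2) * Fraction(1, 2**8) ** 4

print(float(aes_average), log2(aes_average - Fraction(1, 2)))
print(float(random_average), log2(Fraction(1, 2) - random_average))
```

The second value on each line is the base-2 logarithm of the distance from \(2^{-1}\), which is the form used in the text.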

Remark: About the MDS Assumption. We highlight that the probability (10) strongly depends on the assumptions that

  • the solutions of Eq. (1) – hence, the numbers \(N_{\varDelta _I, \varDelta _O}\) – are uniformly distributed for each \(\varDelta _I \ne 0\) and \(\varDelta _O\ne 0\);

  • there is “no (obvious/non-trivial) relation” between the solutions of the studied system of four equations of the form (8). This means that the four Eqs. (8) must be independent/unrelated, in the sense that the solution of one equation is not a solution of another one with probability different than the one given in (10).

Focusing here on this second requirement, a relation among solutions of different equations can arise if some relations hold between the coefficients \(A, B, C, D\) of different equations of the form (8). Since these are the coefficients of the MixColumns matrix and since such matrix is MDS, no non-trivial linear relation among the rows/columns of any submatrix exists.
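The MDS property invoked here is equivalent to every square submatrix of the MixColumns matrix being non-singular over \(\mathbb F_{2^8}\), which is cheap to verify exhaustively. The Python sketch below does so with a naive Laplace expansion; the helper names are illustrative.

```python
from itertools import combinations

def gmul(a, b):
    """Multiplication in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r

def det(m):
    """Determinant over GF(2^8) by Laplace expansion (signs vanish in characteristic 2)."""
    if len(m) == 1:
        return m[0][0]
    result = 0
    for j, coef in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        result ^= gmul(coef, det(minor))
    return result

def is_mds(m):
    """True if every square submatrix of m is non-singular."""
    n = len(m)
    for size in range(1, n + 1):
        for rows in combinations(range(n), size):
            for cols in combinations(range(n), size):
                sub = [[m[r][c] for c in cols] for r in rows]
                if det(sub) == 0:
                    return False
    return True

MIX_COLUMNS = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
print(is_mds(MIX_COLUMNS))
```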

6.3 Generic Case: \(2^{32}\) Texts

As a next step, we adapt the strategy just presented in order to analyze the case of \(2^{32}\) texts in the same coset of \(\mathcal M_0\). Two texts \(p^1, p^2\) are equal in one diagonal after one round if and only if four equations of the form

$$\begin{aligned} \begin{aligned}&A\cdot \left( \text {S-Box}(B \cdot x^1 \oplus b) \oplus \text {S-Box}(B \cdot x^2 \oplus b) \right) \\ \oplus&C\cdot \left( \text {S-Box}(D\cdot y^1 \oplus d) \oplus \text {S-Box}(D\cdot y^2 \oplus d)\right) \\ \oplus&E\cdot \left( \text {S-Box}(F\cdot z^1 \oplus f) \oplus \text {S-Box}(F\cdot z^2 \oplus f) \right) \\ \oplus&G\cdot \left( \text {S-Box}(H\cdot w^1 \oplus h) \oplus \text {S-Box}(H\cdot w^2 \oplus h)\right) = 0 \end{aligned} \end{aligned}$$
(11)

are satisfied, where \(A, B, C, D, E, F, G, H\in \mathbb F_{2^8}\) depend only on the MixColumns matrix, while \(b, d, f, h\in \mathbb F_{2^8}\) depend on the secret key and on the constant \(\omega \) that defines the initial coset, as before. Each one of these equations is equivalent to a system of equations like (9), that is:

$$\begin{aligned} \text {S-Box}(\hat{x} \oplus \varDelta _{I}) \oplus \text {S-Box}(\hat{x}) = \varDelta _{O}&\quad&\text {S-Box}(\hat{y} \oplus \varDelta _{I}^\prime ) \oplus \text {S-Box}(\hat{y}) = \varDelta _{O}^\prime \\ \text {S-Box}(\hat{z} \oplus \varDelta _{I}^{''}) \oplus \text {S-Box}(\hat{z}) = \varDelta _{O}^{''}&\quad&\text {S-Box}(\hat{w} \oplus \varDelta _{I}^{'''}) \oplus \text {S-Box}(\hat{w}) = \varDelta _{O}^{'''} \end{aligned}$$

together with one of the following conditions

  1. \(\varDelta _{O}^{'''} = \varDelta _{O}^{''}=0\) and \(\varDelta _{O}^\prime = C^{-1} \cdot A \cdot \varDelta _{O}\ne 0\), or analogous (six possibilities in total);

  2. \(\varDelta _{O}^{'''} = 0\) and \(\varDelta _{O}, \varDelta _{O}^{'}, \varDelta _{O}^{''}\ne 0\) and \( \varDelta _{O}^{''} = E^{-1} \cdot (A \cdot \varDelta _{O} \oplus C \cdot \varDelta _{O}^\prime ), \) or analogous (four possibilities in total);

  3. \(\varDelta _{O}, \varDelta _{O}^{'}, \varDelta _{O}^{''}, \varDelta _{O}^{'''}\ne 0\) and \( \varDelta _{O}^{'''} = G^{-1} \cdot (A \cdot \varDelta _{O} \oplus C \cdot \varDelta _{O}^\prime \oplus E \cdot \varDelta _{O}^{''})\).

First Case. Since the first case (\(\varDelta _{O}^{'''} = \varDelta _{O}^{''}=0\)) is analogous to the case in which two generating variables are equal, we can limit ourselves to re-use the previous computation. In the case \(\varDelta _{O}^{'''} = \varDelta _{O}^{''}=0\) and \(\varDelta _{O}^\prime = C^{-1} \cdot A \cdot \varDelta _{O}\ne 0\), the only possible solutions of the third and fourth equations are of the form \((\hat{z}, \varDelta _I^{''} = 0)\) and \((\hat{w}, \varDelta _I^{'''} = 0)\) for each possible value of \(\hat{z},\hat{w}\in \mathbb F_{2^8}\). Using the same computation as before, the average number of common solutions for this case is

$$\begin{aligned} \left( {\begin{array}{c}4\\ 2\end{array}}\right) \cdot 256^2 \cdot \frac{2^{15}}{255^2} = \frac{2^{32}}{21\,675} \simeq 198\,153.047\,. \end{aligned}$$
(12)

About Probability \(p_{17}\). By definition of probability, the probability \(p_{17}\) – given in Theorem 4 – that pairs of texts with two equal (and two different) generating variables are equal in one diagonal after one round is given by:

$$\begin{aligned} p_{17} = \frac{1}{2^{17} \times n_{17}} \cdot \frac{2^{32}}{21\,675} = 2^{-32} + 2^{-37.98588}\,, \end{aligned}$$
(13)

where \(2^{17} \times n_{17}\) is the total number of pairs of texts with two equal (and two different) generating variables.

Second Case. Consider now the case \(\varDelta _{O}^{'''} = 0\) and \(\varDelta _{O}, \varDelta _{O}^\prime , \varDelta _{O}^{''} \ne 0\) (i.e., \(\varDelta _{I}, \varDelta _{I}^\prime , \varDelta _{I}^{''} \ne 0\)). First of all, note that \(\varDelta _{O} \ne 0\) can take 255 different values, while \(\varDelta _{O}^\prime \ne 0\) can take only 254 different values (since it must be different from 0 and from \(C^{-1} \cdot A \cdot \varDelta _{O}\)).

Using the same argumentation given before, for each Eq. (11) the number of different solutions \(\{(x^1, y^1, z^1, w^1), (x^2, y^2, z^2, w^2)\}\) – with \(z^1 < z^2\) and where \(w^1 = w^2\) – is given by \( \left( {\begin{array}{c}4\\ 1\end{array}}\right) \cdot 256 \cdot \left( \frac{1}{2} \cdot 255 \cdot 254 \cdot ( {256})^3 \right) = 2^{10} \cdot \left( 32\,385 \cdot 2^{24}\right) \). Here the initial factor \(\left( {\begin{array}{c}4\\ 1\end{array}}\right) \cdot 256\) accounts for the 256 possible values of \(w^1 = w^2\) and for the fact that there are four analogous cases (namely, \(w^1 = w^2\) or \(x^1 = x^2\) or \(y^1=y^2\) or \(z^1 = z^2\)).

Similar to before, the probability that two equations of the form (11) – where \(w^1 = w^2\) – have a common solution is given by \( (256\cdot 255)^{-2} \cdot (128 \cdot 255)^{-1} = 2^{-23}\cdot 255^{-3} \) under (1st) the assumption of uniform distribution of the solutions of Eq. (1) and (2nd) the assumption that there is “no (obvious/non-trivial) relation” between the solutions of the studied system of four equations of the form (11). It follows that the average number of common solutions for the four equations of the form (11) is

$$\begin{aligned} \left( {\begin{array}{c}4\\ 1\end{array}}\right) \cdot 256\cdot (32\,385 \cdot 2^{24})^4 \cdot (2^{-23}\cdot 255^{-3})^3 = \frac{127^4 \cdot 2^{37}}{255^5} \simeq 33\,160\,710.047\,. \end{aligned}$$
(14)

About Probability \(p_{10}\). As before, the probability \(p_{10}\) – given in Theorem 4 – that pairs of texts with one equal (and three different) generating variable(s) are equal in one diagonal after one round is given by:

$$\begin{aligned} p_{10} = \frac{1}{2^{10} \times n_{10}} \cdot \frac{127^4 \cdot 2^{37}}{255^5} = 2^{-32} - 2^{-45.98874}\,. \end{aligned}$$
(15)

Third Case. We finally consider the case \(\varDelta _{O}, \varDelta _{O}^{\prime }, \varDelta _{O}^{''}, \varDelta _{O}^{'''}\ne 0\). By simple computation, the number of different values \((\varDelta _{O}, \varDelta _{O}^{\prime }, \varDelta _{O}^{''})\) that satisfy \( \varDelta _{O}^{'''} = G^{-1} \cdot (A \cdot \varDelta _{O} \oplus C \cdot \varDelta _{O}^\prime \oplus E \cdot \varDelta _{O}^{''})\) with \(\varDelta _{O}^{'''}\ne 0\) is given by \(255^3 - (255 \cdot 254) = 16\,516\,605\). Indeed, the total number of \(\varDelta _{O}, \varDelta _{O}^{\prime }, \varDelta _{O}^{''}\ne 0\) is \(255^3\), while \(255\cdot 254\) is the total number of values \(\varDelta _{O}, \varDelta _{O}^{\prime }, \varDelta _{O}^{''}\ne 0\) for which \(\varDelta _{O}^{'''}\) is equal to zero (which is not possible since \(\varDelta _{O}^{'''}\ne 0\) by assumption). In more detail, firstly observe that for each value of \(\varDelta _{O}\) there is exactly one value of \(\varDelta _{O}^{'}\) that satisfies \(A \cdot \varDelta _{O} = C\cdot \varDelta _{O}^\prime \). For this pair of values \((\varDelta _O, \varDelta _{O}^\prime = C^{-1} \cdot A \cdot \varDelta _{O})\), the previous equation reduces to \(\varDelta _{O}^{'''} = G^{-1} \cdot E \cdot \varDelta _{O}^{''}\), which is always different from zero, since \(\varDelta _{O}^{''} \ne 0\). Secondly, for each one of the \(255 \cdot 254\) values of the pair \((\varDelta _O, \varDelta _{O}^\prime \ne C^{-1} \cdot A \cdot \varDelta _{O})\), there is exactly one value of \(\varDelta _{O}^{''}\) for which \(\varDelta _{O}^{'''}\) is equal to zero.
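The value \(16\,516\,605\) can be confirmed by direct enumeration. The Python sketch below counts the triples of non-zero output differences for which the forced fourth difference is non-zero as well. The coefficients \(A, C, E\) are set to illustrative values, and \(G\) does not appear because multiplying by \(G^{-1}\) does not change whether a value is zero.

```python
from collections import Counter

def gmul(a, b):
    """Multiplication in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r

def count_all_nonzero(A, C, E):
    """Triples (d, d', d'') of non-zero bytes with A*d + C*d' + E*d'' != 0."""
    # Histogram of the partial sum A*d + C*d' over all non-zero (d, d').
    partial = Counter(gmul(A, d) ^ gmul(C, d1)
                      for d in range(1, 256) for d1 in range(1, 256))
    e_values = [gmul(E, d2) for d2 in range(1, 256)]
    # The fourth difference is non-zero exactly when E*d'' differs from the partial sum.
    return sum(mult * sum(1 for e in e_values if e != t)
               for t, mult in partial.items())

n_values = count_all_nonzero(2, 3, 1)
print(n_values)
```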

Hence, the total number of different solutions \(\{(x^1, y^1, z^1, w^1), (x^2, y^2, z^2, w^2)\}\) with \(w^1<w^2\) of each equation corresponding to (11) is \( \frac{1}{2} \cdot 16\,516\,605 \cdot ({256})^4 = 16\,516\,605 \cdot 2^{31}\). Since the probability that two solutions \(\{(x^1, y^1, z^1, w^1),(x^2, y^2, z^2, w^2)\}\) and \(\{(\hat{x}^1, \hat{y}^1, \hat{z}^1, \hat{w}^1),(\hat{x}^2, \hat{y}^2, \hat{z}^2, \hat{w}^2)\}\) are equal is \((255\cdot 256)^{-3} \cdot (255 \cdot 128)^{-1} = 255^{-4} \cdot 2^{-31}\) under (1st) the assumption of uniform distribution of the solutions of Eq. (1) and (2nd) the assumption that there is “no (obvious/non-trivial) relation” between the solutions of the studied system of four equations of the form (11), the average number of common solutions (with no equal generating variables) is

$$\begin{aligned} \bigl (16\,516\,605 \cdot 2^{31}\bigr )^4 \cdot (255^{-4} \cdot 2^{-31})^3 = \frac{64\,771^4\cdot 2^{31}}{255^{8}} \simeq 2\,114\,125\,822.5 \,. \end{aligned}$$
(16)

About Probability \(p_{3}\). As before, the probability \(p_{3}\) given in Theorem 4 that pairs of texts with no equal generating variable are equal in one diagonal after one round is given by:

$$\begin{aligned} p_{3} = \frac{1}{2^{3} \times n_{3}} \cdot \frac{64\,771^4\cdot 2^{31}}{255^{8}} = 2^{-32} + 2^{-53.98306}\,. \end{aligned}$$
(17)
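As a cross-check, the three probabilities admit compact closed forms that make the sign and the size of each deviation from \(2^{-32}\) visible. The identities below are a direct simplification of Eqs. (13), (15) and (17), obtained by inserting the values of \(n_i\) from Eq. (6) and using \(254 \cdot 2^8 = 255^2 - 1\) and \(64\,771 \cdot 2^8 = 255^3 + 1\):

$$\begin{aligned} p_{17}&= \frac{1}{255^{4}} = 2^{-32} \cdot \left( 1 + 255^{-1}\right) ^4, \\ p_{10}&= \frac{254^4}{255^{8}} = 2^{-32} \cdot \left( 1 - 255^{-2}\right) ^4, \\ p_{3}&= \frac{64\,771^4}{255^{12}} = 2^{-32} \cdot \left( 1 + 255^{-3}\right) ^4. \end{aligned}$$

To first order, the deviations from \(2^{-32}\) are therefore \(+4 \cdot 255^{-1} \cdot 2^{-32} \approx 2^{-38}\), \(-4 \cdot 255^{-2} \cdot 2^{-32} \approx -2^{-46}\) and \(+4 \cdot 255^{-3} \cdot 2^{-32} \approx 2^{-54}\), in agreement with the exponents reported in Eqs. (13), (15) and (17).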

Total Number of (Different) Common Solutions. Based on the results just proposed, given plaintexts in the same coset of \(\mathcal M_0\), the number of different pairs of texts that are equal in one fixed diagonal after one round (equivalently, the number of collisions in \(\mathcal D_J\) for \(|J|=3\)) is

$$ 2\,114\,125\,822.5 + 33\,160\,710.047 + 198\,153.047 \simeq 2\,147\,484\,685.594 \simeq 2^{31} + 2^{10.02}\,. $$

Since the total number of pairs of texts is \(2^{31} \cdot (2^{32}-1)\), the probability for the AES case that a pair of ciphertexts \((c^1, c^2)\) satisfies \(c^1 \oplus c^2 \in \mathcal D_J\) for \(|J|=3\) fixed is equal to

$$ p_{AES} \simeq \frac{2\,147\,484\,685.594}{2^{31}\cdot (2^{32}-1)} \simeq 2^{-32} + 2^{-52.9803} $$

versus \(\approx 2^{-32} - 2^{-128}\) for the case of a random permutation.
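The final figures can be re-derived exactly from Eqs. (12), (14) and (16). The Python sketch below adds the three contributions as rational numbers and prints the total together with the base-2 logarithms of the deviations quoted above; the variable names are illustrative.

```python
from fractions import Fraction
from math import log2

case_no_equal = Fraction(64771**4 * 2**31, 255**8)   # Eq. (16)
case_one_equal = Fraction(127**4 * 2**37, 255**5)    # Eq. (14)
case_two_equal = Fraction(2**32, 21675)              # Eq. (12)

collisions = case_no_equal + case_one_equal + case_two_equal
total_pairs = 2**31 * (2**32 - 1)
p_aes = collisions / total_pairs

print(float(collisions))                  # average number of collisions
print(log2(collisions - 2**31))           # excess over 2^31, as a power of two
print(log2(p_aes - Fraction(1, 2**32)))   # excess of p_AES over 2^-32
```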

7 Practical Results for 5-Round AES

We have practically verified the mean and the variance for 5-round AES given above (in Theorem 4) using a C/C++ implementation (see Footnote 6). In particular, we have verified the mean value on a small-scale AES as proposed in [7], and the variance value both on full-size and on the small-scale AES.

7.1 Probability Distribution of 5-Round AES over \((\mathbb F_{2^n})^{4\times 4}\)

Firstly, we generalize Theorem 4 for the case of 5-round AES defined over \(\mathbb F_{2^n}^{4\times 4}\).

Proposition 2

Consider an AES-like cipher that works with texts in \(\mathbb F_{2^n}^{4\times 4}\), such that (1st) the MixColumns matrix is an MDS matrix and such that (2nd) the solutions of Eq. (1) are uniformly distributed for each input/output difference \(\varDelta _I \ne 0\) and \(\varDelta _O\ne 0\). Given \(2^{4n}\) plaintexts \(\{p^i\}_{i\in \{0, 1, \ldots , 2^{4n} -1\}}\) with one active diagonal (equivalently, in a coset of a diagonal space \(\mathcal D_i\) for \(i\in \{0,1,2,3\}\)), consider the corresponding ciphertexts after 5 rounds without the final MixColumns operation, that is, \(c^i = R^5_f(p^i)\). Independently of

  • the initial coset of \(\mathcal D_i\), and

  • the value of the secret key,

the average number of different pairs of ciphertexts \((c^h, c^j)\) for \(h\ne j\) that belong to the same coset of \(\mathcal{I}\mathcal{D}_J\) for any fixed \(J\subseteq \{0,1,2,3\}\) with \(|J|=3\) is equal to

$$\begin{aligned} \frac{2^{4n-1} \cdot (2^{2n}-3\cdot 2^n +3)^4}{(2^n-1)^{8}} + \frac{(2^{n-1}-1)^4 \cdot 2^{4n+5}}{(2^n-1)^5} + 3\cdot \frac{2^{4n}}{(2^n-1)^2}\,, \end{aligned}$$
(18)

and the variance of such distribution is given by

$$\begin{aligned} \frac{2^{4n+2} \cdot (2^{2n}-3\cdot 2^n +3)^4}{(2^n-1)^{8}} + \frac{(2^{n-1}-1)^4 \cdot 2^{5n+7}}{(2^n-1)^5} + \frac{3\cdot 2^{6n+1} }{(2^n-1)^2}\,. \end{aligned}$$
(19)

The proof is analogous to the one just given for \(\mathbb F_{2^8}^{4\times 4}\).
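Formulas (18) and (19) are easy to evaluate for any word size. The Python sketch below implements them with exact rational arithmetic; for \(n = 8\) the mean reproduces the value \(2\,147\,484\,685.594\) obtained in Sect. 6.3, and for \(n = 4\) it yields the small-scale mean and standard deviation used in the next subsection. The function names are illustrative.

```python
from fractions import Fraction
from math import sqrt

def mean_collisions(n: int) -> Fraction:
    """Average number of collisions, Eq. (18), for words of n bits."""
    q = 2**n
    return (Fraction(2 ** (4 * n - 1) * (q * q - 3 * q + 3) ** 4, (q - 1) ** 8)
            + Fraction((q // 2 - 1) ** 4 * 2 ** (4 * n + 5), (q - 1) ** 5)
            + Fraction(3 * 2 ** (4 * n), (q - 1) ** 2))

def variance_collisions(n: int) -> Fraction:
    """Variance of the number of collisions, Eq. (19), for words of n bits."""
    q = 2**n
    return (Fraction(2 ** (4 * n + 2) * (q * q - 3 * q + 3) ** 4, (q - 1) ** 8)
            + Fraction((q // 2 - 1) ** 4 * 2 ** (5 * n + 7), (q - 1) ** 5)
            + Fraction(3 * 2 ** (6 * n + 1), (q - 1) ** 2))

for n in (4, 8):
    print(n, float(mean_collisions(n)), sqrt(variance_collisions(n)))
```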

7.2 Practical Results for 5-Round AES over \(\mathbb F_{2^n}^{4\times 4}\) for \(n\in \{4,8\}\)

Practical Results: Variance of 5-round AES over \(\mathbb F_{2^8}^{4\times 4}\). Our practical results regarding the variance \(\sigma ^2\) for full-size AES over 320 different initial cosets and keys are

$$ \sigma ^2_{T} = 76\,842\,293\,834.905\simeq 2^{36.161}\quad \textit{versus} \quad \sigma ^2_{P}= 73\,288\,132\,411.36 \simeq 2^{36.093}\,, $$

where the subscript \(\cdot _T\) denotes the theoretical value and the subscript \(\cdot _P\) the practical one.

Practical Results for 5-round AES over \(\mathbb F_{2^4}^{4\times 4}\). Our practical results for small-scale AES regarding the mean \(\mu \) over \(125\,000 \simeq 2^{17}\) different initial cosets and keys are

$$\begin{aligned} \mu _{AES}^T&= 32\,847.124{} & {} \textit{versus}&\mu _{AES}^P&= 32\,848.57 \,; \\ \mu _{rand}^T&= 32\,767.5{} & {} \textit{versus}&\mu _{rand}^P&= 32\,768.2 \,. \end{aligned}$$

Our practical results for small-scale AES regarding the standard deviation \(\sigma \) over 100 different initial cosets and keys are

$$\begin{aligned} \sigma _{AES}^T&= 1036.58{} & {} \textit{versus}&\sigma _{AES}^P&= 1027.93 \,; \\ \sigma _{rand}^T&= 181.02{} & {} \textit{versus}&\sigma _{rand}^P&= 182.42\,. \end{aligned}$$
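The theoretical values for the random case can be recovered with a simple model. The Python sketch below treats the number of collisions for a random permutation as a binomial variable over all \(\binom{2^{4n}}{2}\) pairs with success probability \(2^{-4n}\). This binomial model and the rounded probability are assumptions made for the illustration (the exact probability differs negligibly), and the function name is illustrative.

```python
from fractions import Fraction
from math import comb, sqrt

def random_baseline(n: int):
    """Mean and standard deviation of the number of collisions for a random permutation."""
    pairs = comb(2 ** (4 * n), 2)      # number of pairs of texts
    p = Fraction(1, 2 ** (4 * n))      # assumed collision probability per pair
    mean = pairs * p
    std = sqrt(pairs * p * (1 - p))
    return mean, std

mean_rand, std_rand = random_baseline(4)
print(float(mean_rand), std_rand)
```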
Fig. 2. Comparison between the probability distribution of the number of collisions between theoretical small-scale 5-round AES (approximated by a normal distribution) and the practical one. Remark: since the AES probability distribution satisfies the multiple-of-8 property, then the probability in the case in which the number of collisions n is not a multiple of 8 is equal to zero.

The Probability Distribution for 5-Round AES Is not Symmetric. Figure 2 highlights the difference between the practical probability distribution of the number of collisions for small-scale AES and for a random permutation.

Figure 2 shows that the distribution for small-scale 5-round AES has a positive skew, while the skew of the random distribution is approximately equal to zero. The skewness is the parameter that measures the asymmetry of the probability distribution of a real-valued random variable about its mean. We practically derived the values of the skewness \(\gamma \) both for full-size AES and for the small-scale one using \(2^{9}\) initial cosets, and we got the following results:

$$ \gamma ^{AES} \simeq 0.43786 \qquad \textit{and} \qquad \gamma ^{AES}_{\text {small-scale}} \simeq 0.4687\,, $$

whereas the skew for a random permutation is close to zero. We leave as future work the open problem of theoretically computing the skew for small-scale and full-size AES (and of setting up a corresponding distinguisher, if possible).
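For reference, a sample skewness can be estimated from observed collision counts as sketched below in Python. The text does not specify which estimator was used in the experiments, so the moment-based estimator \(m_3 / m_2^{3/2}\) is an assumption, and the sample values are made up for the illustration.

```python
def sample_skewness(samples):
    """Moment-based skewness m3 / m2^(3/2) of a list of numbers."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n   # second central moment
    m3 = sum((s - mean) ** 3 for s in samples) / n   # third central moment
    return m3 / m2 ** 1.5

symmetric = [8, 16, 24, 32, 40]      # symmetric about the mean
right_tailed = [0, 0, 0, 0, 8]       # one large value on the right
print(sample_skewness(symmetric), sample_skewness(right_tailed))
```

A positive value indicates a longer right tail, which is the behaviour reported above for AES.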