Keywords

1 Introduction

Advanced Encryption Standard (AES) [8], published by NIST, is widely used in the field of symmetric ciphers. For instances, AES and reduced-round versions of AES are usually used as components for hash functions, authentication encryption schemes and so on. Since the goal of distinguishing attack is to distinguish a target cipher from random permutations with some special property, studying the distinguishers on AES can help designers and cryptanalysts to evaluate the security of target cipher, which is meaningful.

In secret-key distinguishing attack, adversary needs to distinguish the target cipher from random permutations without knowing the key and internal states. Such distinguisher can be used in key-recovery attack. Furthermore, reduced round AES are often utilized to design authentication encryptions such as the third-round candidates AES-OTR [20] in CAESAR competition [6]. It is necessary to research the secret-key distinguisher on AES. Beside that, the performances of block ciphers under known key settings need be considered. Block ciphers, because of their security and simplicity, are often adopted as components of hash functions by designers, such as Whirlpool [3] and Photon [13]. Since the attacker can fully control the inter behaviour of a hash function, if a block cipher is used to design hash function, its resistance to known-key attack or chosen-key attack, where the adversaries know the key or can choose the key, should be considered. The first known-key security model is proposed by Knudsen and Rijmen for block cipher in [15] where the secret key is known to the attacker and the goal is to distinguish the block cipher from a random permutation by constructing a set of plaintext/ciphertext pairs satisfying a special property. Such a property is easy to check but impossible to achieve for any random permutation with the same complexity and a non-negligible probability by using oracle accesses to this random permutation and its inverse. Since its establishment, several types of known-key distinguishers have been proposed, such as distinguishers with integral property [1, 12, 15, 21], subspace distinguishers [17, 18], (multiple) limited-birthday distinguishers [11, 14], and the known-key distinguisher for PRESENT by combining meet-in-the-middle technique and truncated differential [5]. Moreover, the chosen-key distinguishing attack on the full AES-256 has been provided in [4].

Integral attack is an important cryptanalytic technique for symmetric-key ciphers, which was firstly put forward by Daemen et al. in [7], then unified as integral attack by Knudsen and Wagner in [16]. In an integral distinguisher, one fixes a part of plaintext bits and takes all possible values for the other plaintext bits such that the values on partial bits of ciphertext are uniformly distributed, to distinguish an actual cipher from a random permutation. If one additional linear layer is considered, the property will be that the XOR of all possible values of the specific part of ciphertext becomes zero, which is referred as zero-sum property [2]. In order to reduce the data complexity, Wang et al. applied statistical technique on original integral distinguisher and proposed a statistical integral distinguisher at FSE’16 [24], which consists of applying a statistical technique to the original integral distinguisher with the active property. As a result, this statistical integral distinguisher requires less data complexity than that of the original integral distinguisher. However, Wang et al. only considered the case that only one integral property on ciphertext, they didn’t discuss the cases that there are several integral properties on ciphertext and multiple structures of data should be used at the same time. These limit the effect of integral attacks on block ciphers, especially for known-key distinguishing attacks.

In this paper, we consider the cases omitted in [24] and use our statistical integral model to improve secret-key and known-key distinguishing attacks on AES with further less data and time complexities.

1.1 Our Contributions

Statistical Integral Distinguisher with Multiple Structures. We propose a statistical integral distinguisher with multiple structures on input and integral properties on output. In some situations of integral attacks such as known-key distinguishing attack on AES, multiple structures of input have to be used where for each structure s input bits take all possible values and the corresponding b t-bit outputs are uniformly distributed respectively. The statistical integral distinguisher in [24] can reduce the data complexity from \(\mathcal {O}(2^s)\) to \(\mathcal {O}(2^{s-t/2})\) by using one t-bit integral property if only one structure is used. But if there are \(N_s\) structures involved, the model in [24] cannot be applied. For the sake of reducing the data requirements for the original integral distinguisher with multiple structures, we construct a new statistical integral distinguisher. In our new distinguisher, the data complexity is

$$\begin{aligned} \mathcal {O}(\sqrt{N_s/b} \cdot 2^{s-\frac{t}{2}}), \end{aligned}$$

while the data complexity of the original distinguisher is

$$\begin{aligned} \mathcal {O}(N_s \cdot 2^s). \end{aligned}$$

In order to verify our theoretical model, we implement the experiments for mini version of AES. It shows that the experimental results are in good accordance with the theoretic results.

Improved Secret-Key Integral Distinguisher on AES. AES is one of the most famous block ciphers. Until 2015, the best secret-key distinguishers on AES were 4 rounds, such as impossible differential, zero-correlation linear hull and integral distinguisher. Then at CRYPTO’16, Sun et al. proposed a 5-round distinguisher on AES with secret S-box under chosen-ciphertext mode with integral zero-correlation technique in [23]. But the data complexity of this distinguisher is up to \(2^{128}\). Recently, Grassi et al. put forward a 5-round distinguisher on AES with secret S-box by utilizing a 4-round impossible differential in [9]. The data complexity is \(2^{98.2}\). Later, they proposed another one on 5-round AES in [10]. That distinguisher is independent with the details of S-box, MC operation and secret-key, and its data complexity is reduced to \(2^{32}\). However, it utilizes the property of AES structure and has nothing with the secret-key, this weakness limits it to be used in key recovery attacks. In this paper, we will evaluate the security of AES from the point of integral distinguishing attack. We present a secret-key distinguisher on 5-round AES with secret S-box by adopting our statistical integral model under chosen-ciphertext mode. The data and time complexities are \(2^{114.32}\) chosen ciphertexts and \(2^{110}\) encryptions respectively. Its memory requirements are \(2^{33.32}\) blocks. This is the best integral distinguisher on AES with secret S-box under secret-key setting so far (Table 1).

Improved Known-Key Distinguishers on AES. We apply the statistical integral distinguisher with multiple structures into the known-key distinguishing attacks on AES. The first known-key distinguisher on AES was proposed by Knudsen and Rijmen in [15], where they gave an integral known-key distinguisher for 7-round AES. At ASIACRYPT’14, Gilbert provided a very important untwisted representation of AES and used this representation to distinguish 8-round AES and the full 10-round AES with the complexity \(2^{64}\) under the known-key model in [12]. Besides the integral known-key distinguishers, the known-key distinguisher with match-in-the-middle technique for 7-round AES was presented in [19], with rebound technique for 8-round AES were provided in [11, 14] whose complexities are \(2^{48}\) and \(2^{44}\) 8-round encryptions respectively. In this paper, we take advantage of our statistical integral model to improve known-key distinguisher on 8-round AES and full 10-round AES, whose respective time complexities are \(2^{42.61}\) computations and \(2^{59.60}\) computations. These distinguishers are the best known-key ones on AES according to the time complexity so far. See Table 2.

Table 1. Summary of secret-key integral distinguishers on AES
Table 2. Summary of known-key distinguishing attacks on AES

1.2 Ontline of This Paper

In Sect. 2, some preliminaries are given. Then we present a statistical integral model with multiple structures on input and integral properties on output in Sect. 3. In Sects. 4 and 5, secret-key statistical integral distinguisher and improved known-key distinguishers on AES are put forward respectively. At last, we conclude this paper in Sect. 6.

2 Preliminaries

2.1 Description of AES

AES is a byte-orient Substitution-Permutation Network (SPN). It has three versions, namely AES-128, -192 and -256. The block-size/key-size/total-rounds of these versions are 128/128/10, 128/192/12 and 128/256/14 respectively. Each round function includes 4 components:

  • SubBytes (SB): A nonlinear bijective mapping \(\mathbb {F}^{8}_{2} \rightarrow \mathbb {F}^{8}_{2}\) for each byte of state;

  • ShiftRows (SR): Left rotate the i-th row by i bytes, where \(i=0,1,2,3\);

  • MixColumns (MC): Left multiply with an MDS matrix over the field \(GF(2^{8})\) on each column;

  • AddRoundKey (AK): XOR with a 128 bits subkey.

It is worth noting that there is a whiten key XORed with plaintext before the first round function and the MC operation is omitted in the last round. Since we do not use the key schedule in this paper, we ignore it here.

All in all, 2r-round AES can be described as follows:

$$\begin{aligned} AES_{2r} = AK \diamond (SB \diamond SR \diamond MC \diamond AK)^{2r-1} \diamond SB \diamond SR \diamond AK \end{aligned}$$
(1)

where \(A\diamond B\) denotes to implement A operation firstly, then B operation.

In [12], Gilbert proposed a new representation of AES. Firstly he defined two operations T and SC as follows, then built two special byte permutations \(P=SR\diamond T\diamond SR^{-1}\) and \(Q=SR^{-1}\diamond T\diamond SR\diamond SC\). With these two permutations, Gilbert proposed two transformations \(S=Q^{-1}\diamond SB \diamond MC \diamond AK \diamond SB \diamond P^{-1}\) and \(R=P\diamond SR \diamond MC \diamond AK \diamond SR \diamond Q\), which operate on columns and rows respectively.

$$\begin{aligned}T: \begin{pmatrix} a_{0} &{} a_{4} &{} a_{8} &{} a_{12}\\ a_{1} &{} a_{5} &{} a_{9} &{} a_{13}\\ a_{2} &{} a_{6} &{} a_{10} &{} a_{14}\\ a_{3} &{} a_{7} &{} a_{11} &{} a_{15}\\ \end{pmatrix} \mapsto \begin{pmatrix} a_{0} &{} a_{1} &{} a_{2} &{} a_{3}\\ a_{4} &{} a_{5} &{} a_{6} &{} a_{7}\\ a_{8} &{} a_{9} &{} a_{10} &{} a_{11}\\ a_{12} &{} a_{13} &{} a_{14} &{} a_{15}\\ \end{pmatrix} \end{aligned}$$
$$\begin{aligned} SC: \begin{pmatrix} a_{0} &{} a_{4} &{} a_{8} &{} a_{12}\\ a_{1} &{} a_{5} &{} a_{9} &{} a_{13}\\ a_{2} &{} a_{6} &{} a_{10} &{} a_{14}\\ a_{3} &{} a_{7} &{} a_{11} &{} a_{15}\\ \end{pmatrix} \mapsto \begin{pmatrix} a_{0} &{} a_{12} &{} a_{8} &{} a_{4}\\ a_{1} &{} a_{13} &{} a_{9} &{} a_{5}\\ a_{2} &{} a_{14} &{} a_{10} &{} a_{6}\\ a_{3} &{} a_{15} &{} a_{11} &{} a_{7}\\ \end{pmatrix} \end{aligned}$$

As a result, 2r-round AES has three equivalent representations:

$$\begin{aligned} AES_{2r}&=AK \diamond SR \diamond Q \diamond (S \diamond R)^{r-1} \diamond S \diamond P \diamond SR \diamond AK,\end{aligned}$$
(2)
$$\begin{aligned} AES_{2r}&=AK \diamond P^{-1} \diamond SB \diamond R \diamond (S \diamond R)^{r-1} \diamond SB \diamond Q^{-1} \diamond AK,\end{aligned}$$
(3)
$$\begin{aligned} AES_{2r}&=AK \diamond SB \diamond SR \diamond MC \diamond AES_{2r-2} \diamond AK^{-1} \diamond MC \diamond AK \cdot SB \diamond SR \diamond AK. \end{aligned}$$
(4)

Throughout this paper, we use \(X_{(i)}\) and \(X_{(i\sim j)},i,j=0,1,\ldots ,15\) to denote the i-th byte and \(i\sim j\)-th bytes of state X respectively.

2.2 Brief Description of Known-Key Distinguishers on AES in [12]

In this subsection, we briefly recall the known-key distinguishers for 8-round and 10-round AES proposed by Gilbert at ASIACRYPT’14 [12].

In order to mount a known-key distinguisher for \(AES_{8}\), Gilbert firstly proposed two integral distinguishers shown in Fig. 1, where \((A^{j}_{1}, A^{j}_{2}, A^{j}_{3},A^{j}_{4}), j =0,1,\ldots ,4\), A and C denote uniform distribution on 4 bytes, uniform distribution on 1 bytes and constant respectively. Then given \(2^{64}\) data \(\mathcal {Z}=\{R(x,0,0,0)\oplus (y,0,0,0) | x, y \in (0,1)^{32}\}\), this set \(\mathcal {Z}\) can be divided into \(2^{32}\) structures according to different values of x, and each structure takes all \(2^{32}\) values on the first column and constants on other columns. So the set \(\mathcal {Z}\) satisfies the first integral distinguisher in Fig. 1. Since R operation is an affine mapping, \(R(\mathcal {Z}^{-1})=\{(x,0,0,0)\oplus R^{-1}(y,0,0,0)\}\) can be divided into \(2^{32}\) structures according to different values of y, thus the set \(R(\mathcal {Z}^{-1})\) satisfies the second integral distinguisher in Fig. 1.

Combining with these two integral distinguishers with R operation above, a known-key distinguisher on \(AES_8\) is built that all input and output bytes resulted from \(2^{64}\) middle texts \(\mathcal {Z}\) are uniformly distributed. However, for random permutations, the upper bound of the probability satisfying the uniformly distributed property for each byte is \(\frac{1}{2^{128-1}}\) with \(q \le N =2^{64}\) oracle queries.

Furthermore, with the representation of Eq. (3), Gilbert mounted a known-key distinguisher for \(AES_{10}\). This distinguisher is implemented by extending one round on each side based on the distinguisher for \(AES_8\). The same \(2^{64}\) middle texts \(\mathcal {Z}\) as for the known-key distinguisher on \(AES_8 \) are used. For the corresponding input-output pairs \((p_i, c_i), i=1, \ldots , 2^{64}\), the adversary can find at least one value \((\varDelta , \varGamma )\), where \(\varDelta ,\varGamma \in (0,1)^{128}\), to make each byte of \(R \circ SB(P^{-1}(p_i)\oplus \varDelta )\) and \(R^{-1} \circ SB^{-1}(Q(c_i)\oplus \varGamma )\) be uniform distribution within time complexity \(2^{64}\). However, for a random permutation, the upper bound of the probability satisfying the uniformly distributed property for each byte is \(2^{-16.5}\) with \(q \le N =2^{64}\) oracle queries.

Since Gilbert’s work is based on the integral distinguisher and uses the active propertyFootnote 1, if we can improve the statistical integral model proposed by Wang et al. in [24], we can further improve Gilbert’s work and widely utilize the new method to all AES-like ciphers. With the improved known-key distinguishers, 10-round AES-like ciphers cannot be regarded as ideal random permutations, and the time complexities of new distinguishers are less than previous ones.

Fig. 1.
figure 1

Two integral distinguishers under the new representation of AES in [12]

2.3 Statistical Integral Distinguisher

In this subsection, we recall the statistical integral distinguisher proposed by Wang et al. in [24].

Assume that \(H: \mathbb {F}^{n}_{2} \rightarrow \mathbb {F}^{n}_{2}\) is a part of a block cipher, its input and output both can be split into two parts as follows:

$$\begin{aligned}H: \mathbb {F}^{r}_{2} \times \mathbb {F}^{s}_{2} \rightarrow \mathbb {F}^{t}_{2} \times \mathbb {F}^{u}_{2}, H(x,y)=\left( \begin{array}{c} H_{1}(x,y)\\ H_{2}(x,y)\end{array} \right) .\end{aligned}$$

If the first r bits of input are fixed as a constant \(\lambda \) and only the first t bits of output are considered, then the function H can be denoted as \(T_{\lambda }\):

$$\begin{aligned}T_{\lambda }: \mathbb {F}^{s}_{2}\rightarrow \mathbb {F}^{t}_{2}, T_{\lambda }(y)=H_{1}(\lambda ,y).\end{aligned}$$

When y takes over all possible values, the outputs \(T_{\lambda }(y)\) are uniformly distributed, then an integral distinguisher is constructed.

If the adversary only takes \(N<2^{s}\) different y, sets a counter \(V[T_{\lambda }(y)]\) and initializes this counter as zero, a statistical integral distinguisher can be constructed by investigating the distribution of the statistic as follows:

$$\begin{aligned} T=\sum ^{2^{t}-1}_{i=0} \frac{(V[T_{\lambda }(y)]-N \cdot 2^{-t})^{2}}{N\cdot 2^{-t}} \end{aligned}$$
(5)

For the right key guess (the target cipher), the statistic T follows a \(\chi ^{2}\) distribution with mean \(\mu _{0}=(2^{t}-1)\frac{2^{s}-N}{2^{s}-1}\) and variance \(\sigma ^{2}=2(2^{t}-1)(\frac{2^{s}-N}{2^{s}-1})^{2}\), but for the wrong key guess (a random permutation), it follows a \(\chi ^{2}\) distribution with mean \(\mu _{0}=(2^{t}-1)\) and variance \(\sigma ^{2}=2(2^{t}-1)\). The relation of data complexity, type-I error probability \(\alpha _{0}\) and type-II error probability \(\alpha _{1}\) is as follows

$$\begin{aligned} N=\frac{(2^{s}-1)(q_{1-\alpha _{0}}+q_{1-\alpha _{1}})}{\sqrt{(2^{t}-1)/2}+q_{1-\alpha _{1}}}+1, \end{aligned}$$
(6)

3 Statistical Integral Distinguisher with Multiple Structures on Input and Integral Properties on Output

In some integral distinguishers, there are b groups of t output bits with the active property. If we can utilize all properties at the same time, the data complexity can be further reduced. What’s more, in some attack settings, \(N_{s}\) structures, i.e. that \(N_{s}\) different \(\lambda \), should be used together. For these special settings, we construct the new statistical integral distinguisher in this section.

Firstly, we split the input into two parts and output into \(b+1\) parts.

$$\begin{aligned} H:\mathbb {F}_2^r \times \mathbb {F}_2^{s} \rightarrow \mathbb {F}_2^{t} \times \mathbb {F}_2^{t}\times \ldots \times \mathbb {F}_2^{t}\times \mathbb {F}_2^{u},\ H(x,y)=\left( \begin{array}{c} H_1(x,y) \\ H_2(x,y)\\ \ldots \\ H_{b+1}(x,y) \end{array}\right) . \end{aligned}$$

Then we use \(T_\lambda ^i\) to denote the function \(H_i\) where the first r bits of its input are fixed to the value \(\lambda \) and b outputs \(H_i, 1\le i\le b\), are considered:

$$\begin{aligned} T_\lambda ^{i}:\mathbb {F}_2^s\rightarrow \mathbb {F}_2^{t},\ T^i_\lambda (y)=H_i(\lambda ,y), i=1,2,\ldots ,b. \end{aligned}$$

For a special integral distinguisher, when y iterates all possible values of \(\mathbb {F}_2^{s}\), \(T_\lambda ^{i}(y), i =1,2,\ldots ,b\) are all uniformly distributed with probability one. Further more, if we take \(N_s\) values for \(\lambda \), i.e. \(N_s\) structures and in each structure y iterates all possible values of \(\mathbb {F}_2^{s}\), the integral properties on output are satisfied as well.

Now assume we need \(N < 2^s\) values of y under each structure and we use \(N_s\) structures which are independent. \(T_\lambda ^{i}(y) \in \mathbb {F}_2^{t}, i=1,2,\ldots ,b\) are computed for each y and we allocate a counter vector \(V_{i}[T_\lambda ^{i}(y)]\) to store the occurrences of \(T_\lambda ^{i}(y)\). Then we investigate the distribution of the following statistic:

$$\begin{aligned} C=\sum _{\lambda =1}^{N_s} \sum _{i=1}^{b} \sum _{T_\lambda ^i(y)=0}^{2^t-1}\frac{(V_{i}[T^i_\lambda (y)]-N\cdot 2^{-t})^2}{N\cdot 2^{-t}}. \end{aligned}$$
(7)

The statistic C follows different distributions determined by whether we are dealing with an actual cipher or a random permutation.

Proposition 1

For sufficiently large N, and t, the statistic \(\frac{2^s-1}{2^s-N}C_{cipher}\) (\(C_{cipher}\) is the statistic C for cipher) follows a \(\chi ^2\)-distribution with degree of freedom \(b\cdot N_s\cdot (2^t-1)\), which means that \(C_{cipher}\) approximately follows a normal distribution with mean and variance

$$\begin{aligned} \mu _0=Exp(C_{cipher})=b\cdot N_s \cdot ( 2^t-1)\frac{2^s-N}{2^s-1}, \ \, \sigma _0^2=Var(C_{cipher})=2b\cdot N_s \cdot ( 2^t-1)(\frac{2^s-N}{2^s-1})^2. \end{aligned}$$

The statistic \(C_{random}\) (\(C_{random}\) is the statistic C for randomly drawn permutation) follows a \(\chi ^2\)-distribution with degree of freedom \(b\cdot N_s\cdot (2^t-1)\), which means that \(C_{random}\) approximately follows a normal distribution with mean and variance

$$\begin{aligned} \mu _1=Exp(C_{random})=b\cdot N_s\cdot (2^t-1)\ \text {and}\ \sigma _1^2=Var(C_{random})=2b\cdot N_s\cdot ( 2^t-1). \end{aligned}$$

Proof

Deduced from Proposition 1 in [24], for a randomly drawn permutation, the statistic \(\sum _{T_\lambda ^i(y)=0}^{2^t-1}\frac{(V_{i}[T^i_\lambda (y)]-N\cdot 2^{-t})^2}{N\cdot 2^{-t}}\) follows a \(\chi ^2\)-distribution with degree of freedom \(2^t-1\) for any \(\lambda \) and i. Then the statistic \(C'_{random}\) for the randomly drawn permutation

$$\begin{aligned}C_{random}=\sum _{\lambda =1}^{N_s} \sum _{i=1}^{b} \sum _{T_\lambda ^i(y)=0}^{2^t-1}\frac{(V_{i}[T^i_\lambda (y)]-N\cdot 2^{-t})^2}{N\cdot 2^{-t}}\end{aligned}$$

is the sum of \(N_s\cdot b\) independent \(\chi ^2\) statistics with degree of freedom \(2^t-1\), so the statistic \(C_{random}\) follows a \(\chi ^2\)-distribution with degree of freedom \(b\cdot N_s\cdot (2^t-1)\). Then for sufficiently large N and t, \(C_{random}\) approximately follows a normal distribution with the expected value and variance:

$$\begin{aligned}Exp(C_{random})=b\cdot N_s\cdot (2^t-1)\ \text {and}\ Var(C_{random})=2b\cdot N_s\cdot (2^t-1).\end{aligned}$$

Since the statistic for the cipher \(\frac{2^s-1}{2^s-N}\sum _{T_\lambda ^i(y)=0}^{2^t-1}\frac{(V_{i}[T^i_\lambda (y)]-N\cdot 2^{-t})^2}{N\cdot 2^{-t}}\), for any \(\lambda \) and i, follows a \(\chi ^2\)-distribution with degree of freedom \(2^t-1\) deduced from [24]. Then the statistic \(\frac{2^s-1}{2^s-N}C'_{cipher}\) for the cipher

$$\begin{aligned}\frac{2^s-1}{2^s-N}C_{cipher}=\sum _{\lambda =1}^{N_s} \sum _{i=1}^{b}\frac{2^s-1}{2^s-N} \sum _{T_\lambda ^i(y)=0}^{2^t-1}\frac{(V_{i}[T^i_\lambda (y)]-N\cdot 2^{-t})^2}{N\cdot 2^{-t}}\end{aligned}$$

is the sum of \(N_s\cdot b\) independent \(\chi ^2\) statistics with degree of freedom \(2^t-1\), so the statistic \(\frac{2^s-1}{2^s-N}C_{cipher}\) follows a \(\chi ^2\)-distribution with degree of freedom \(b\cdot N_s\cdot (2^t-1)\). Then for sufficiently large N and t, \(C_{cipher}\) approximately follows a normal distribution with the expected value and variance:

$$\begin{aligned}Exp(C_{cipher})=b\cdot N_s\cdot (2^t-1)\cdot \frac{2^s-1}{2^s-N}\ \text {and}\ Var(C_{cipher})=2b\cdot N_s\cdot (2^t-1)\cdot (\frac{2^s-1}{2^s-N})^2.\end{aligned}$$

   \(\square \)

Corollary 1

Under the assumption of Proposition 1, for type-I error probability \(\alpha _0\) (the probability to wrongfully discard the cipher), and type-II error probability \(\alpha _1\) (the probability to wrongfully accept a randomly chosen permutation as the cipher), to distinguish a cipher and a random permutation based on b independent t-bit outputs when randomly choosing \(N_s\) values for r-bit inputs and N values for s-bit inputs, then the following equation holds.

$$\begin{aligned} N=\frac{(2^s-1)(q_{1-\alpha _0}+q_{1-\alpha _1})}{\sqrt{(b \cdot N_s \cdot (2^t-1))/2}+q_{1-\alpha _0}}+1, \end{aligned}$$
(8)

where \(q_{1-\alpha _0}\) and \(q_{1-\alpha _1}\) are the respective quantiles of the standard normal distribution.

Corollary 1 is obtained from the equation about the decision threshold \(\tau =\mu _0+\sigma _0q_{1-\alpha _0}=\mu _1-\sigma _1q_{1-\alpha _1}\). And the statistic test is also based on the decision threshold \(\tau \): if \(C\le \tau \), the test outputs ‘cipher’; Otherwise, if the statistic \(C>\tau \), the test outputs ‘random’. Note that in this statistical method the success probability \(Ps=1-\alpha _0\), and the relation between \(\alpha _1\) and the advantage of the attack a is \(\alpha _1=2^{-a}\).

In order to verify the theoretical model in Corollary 1, we implement the experiments for mini version of AES in Appendix A.1. It shows that the experimental results are in good accordance with the theoretic results.

From Eq. (8), we know that the data complexity for the statistical distinguisher is \(N \cdot N_s\). For the given values of \(n, s, t, \alpha _0, \alpha _1\), the ratio of the data complexity with \(N_s\) structures to that with one structure is \(\sqrt{N_s}\). It means that more structures will result in high data complexity, so we should avoid to utilize more structures. However, for the known-key integral distinguisher for AES etc., we have to use enough structures to make the plaintexts and the ciphertexts satisfying the desired properties simultaneously. Moreover, if b is increased, the data complexity can be reduced, but as b increases, the time complexity in some situations will be increased accordingly. Thus, we should take the proper value for b according to the time-data tradeoff.

4 Secret-Key Statistical Integral Distinguisher on Reduced 5-Round AES

In this section, we propose a secret-key distinguisher on 5-round AES with our statistical integral model based on the work of Sun et al. in [23]. In this distinguisher, the S-box used in AES is secret.

Firstly, we slightly modify the zero-correlation linear hull for 5-round decryption of AES under chosen-ciphertext mode proposed by Sun et al. in [23] (Lemma 3). Let \(V=\{(x_{(i)})\in F^{16}_{2^{8}} | x_{(0)}\oplus x_{(13)}=(k_{5})_{(0)} \oplus (k_{5})_{(13)}\}\), and assume that the input mask \(\varGamma _{I}=(a_{(i)})_{0\le i\le 15}\) and output mask \(\varGamma ^{0}_{O}=(\beta _{(i)})_{0\le i\le 15}\) satisfy:

$$\begin{aligned} a_{(i)}= {\left\{ \begin{array}{ll} a, i=0, 13,\\ 0,otherwise. \end{array}\right. } \beta _{(j)}= {\left\{ \begin{array}{ll} nonzero, j=\{0,5,10,15\}\\ 0,otherwise. \end{array}\right. } \end{aligned}$$

Then the correlation for \(\varGamma _{I}\rightarrow \varGamma ^{0}_{O}\) on V is always 0. Note that there are three other zero-correlation linear hulls as well, when \(j=\{1,6,11,12\}\), \(\{2,7,8,13\}\), \(\{3,4,9,14\}\). The corresponding output masks are \(\varGamma ^{1}_{O}\), \(\varGamma ^{2}_{O}\) and \(\varGamma ^{3}_{O}\) respectively. One of the four cases is shown in Fig. 2.

Fig. 2.
figure 2

Zero-correlation linear hull on 5-round AES with secret S-box under secret-key setting. Gray and white cells denote nonzero and zero masks respectively. The two cells with a or b are exactly the same mask.

With the technique proposed by Sun et al. in [22], these four zero-correlation linear hulls can be transformed into integral ones. Taking the linear hull \(\varGamma _{I}\rightarrow \varGamma ^{0}_{O}\) as an example, the corresponding integral distinguisher is that if the adversary takes over \(2^{120}\) different values of ciphertexts c satisfying \(c_{(0)}\oplus c_{(13)}=(k_{5})_{(0)}\oplus (k_{5})_{(13)}\), then the values on 4 bytes of plaintext \((p_{(0)},p_{(5)},p_{(10)},p_{(15)})\) are uniformly distributed.

figure a

Based on these integral distinguishers, we can implement a statistical integral distinguisher for each candidate \(\varDelta =(k_{5})_{(0)}\oplus (k_{5})_{(13)}\), where \(s=120\) and \(t=32\). In order to have the success probability \((1-\alpha _{0})^{2^{8}}=(1-\alpha _{1})^{2^{8}}=95\%\), we set \(\alpha _{0}=\alpha _{1}=0.0002\), then \(q_{1-\alpha _{0}}=q_{1-\alpha _{1}}\approx 3.54\). Meanwhile, we can use these four integral distinguishers above together within one structure, so \(b=4\) and \(N_{s}=1\). Thus by Eq. (8), \(N=2^{106.32}\) chosen ciphertexts. The decision threshold is about \(\tau \approx 17179212992.15\). As there are \(2^{8}\) different values of \(\varDelta \), the total data complexity of this distinguisher is \(N'=2^{106.32}\times 2^{8}=2^{114.32}\) chosen ciphertexts.

What’s more, we can see from Algorithm 1, the main time complexity happens on Step 5, which is about \(2^{8}\times 2^{106.32}\times 4\times 1/16\times 1/5\approx 2^{110}\) encryptions, if we regard one simple operation as \(\frac{1}{16}\) one round encryption. Beside that, memory requirements are about \(4\times 2^{32}\times 10\approx 2^{37.32}\) bytes \(=2^{33.32}\) blocks.

As far as we know, this distinguisher is the best secret-key integral one on 5-round AES with secret S-box.

5 Improved Known-Key Distinguishers on AES

In this section, we will use our new statistical integral model to reduce the complexities of known-key distinguishers on AES proposed by Gilbert at ASIACRYPT’14 in Sects. 5.1 and 5.2.Footnote 2 The time complexity is reduced to \(2^{42.61}\) in the known-key distinguisher on 8-round AES. For the 10-round AES, the time complexity is reduced to \(2^{59.60}\). Compared to all the public known-key distinguishers for 8-round AES, our distinguisher is the best one according to both time and memory complexities. Moreover, our known-key distinguisher on 10-round AES is the best one according to the time complexity.

5.1 Improved Known-Key Distinguisher on 8-Round AES

As described in Subsect. 2.2, the known-key distinguisher for \(AES_8\) is based on the uniformly distributed integral property with \(2^{32}\) structures and each structure takes \(2^{32}\) texts. This integral property can be transformed to a statistical integral property by using Proposition 1. So in our known-key distinguisher on \(AES_{8}\), we utilize the statistical integral properties on each byte of input and output to distinguish the actual cipher and random permutations. In this way, the required number of structures and texts of one structure can be reduced. The process to distinguish the actual cipher \(AES_8\) from the random permutation is described in Algorithm 2.

Since in the middle of the distinguisher, the numbers of structures before and after R operation should be the same, i.e. that \(N =N_s\). By applying Proposition 1 in above case, we have \(s=32\), \(t=8\), \(b=16\) and \(N=N_s\). If we set the error probabilities \(\alpha _0=2^{-128}\) and \(\alpha _1=2^{-128}\) (the values of \(\alpha _0\) and \(\alpha _1\) can be different and take any suitable values), then \(q_{1-\alpha _0}=q_{1-\alpha _1} \approx 13.06\). According to Eq. (8), \(N=N_s\approx 2^{20.81}\) and the threshold value \(\tau \approx 7478730631.39\).

figure b

For the case of \(AES_8\), as \(\alpha _0=2^{-128}\), the probability to wrongly regard \(AES_8\) as a random permutation is \(\alpha _0+(1-\alpha _0)\alpha _0\approx 2^{-127}\), which means that the success probability to correctly identify AES cipher is about \((1-\alpha _0)^2 \approx 1-2^{-127}\).

While for the case of random permutation, the adversary can implement encryption and decryption oracle queries to the cipher and random permutation. But statistical integral property (exploiting \(\chi ^{2}\) distribution) is different from traditional integral property (utilizing uniform distribution). At the best of times the adversary chooses the data which automatically satisfy the statistical property on the input, but to satisfy the statistical property on the output, the probability is \(\alpha _1=2^{-128}\). In order to satisfy the statistical properties both on the input and output, the probability to wrongly regard this random permutation as AES cipher is \( 1\times \alpha _1= 2^{-128}\).

To summarize, the advantage to distinguish AES cipher from random permutation is not negligible. The total time complexity of this known-key distinguisher is about \(2\times 2^{41.61}=2^{42.61}\) computations. The memory requirements are about \(16\times 2^{8}\times 2=2^{13}\) bytes used for storing the counter vector \(V[16][2^{8}]\).

5.2 Improved Known-Key Distinguisher on 10-Round AES

The statistical integral distinguisher on \(AES_{10}\) is based on the distinguishing property of \(AES_{10}\) in [12], which is represented according to Eq. (4), see Fig. 3.

Along with the idea within the distinguisher on \(AES_{10}\) in [12], in our known-key distinguisher on \(AES_{10}\), we use \(N_s < 2^{32}\) structures, each of which takes \(N=N_s\) middle texts, to obtain \(N^2\) input/output pairs. For AES cipher, there is one value for \((\varDelta , \varGamma )\) to let each byte of \(R\circ SB(R^{-1}(input\oplus \varDelta ))\) and \(R^{-1}\circ SB^{-1}(Q(output\oplus \varGamma ))\) satisfy the statistical integral property with a high probability. But for any random permutation, the probability to have one solution for \((\varDelta , \varGamma )\) to obtain the same property is very low.

However, in above way, the distinguisher has high time complexity. In order to reduce the time complexity, we implement the distinguisher in the following way. As \(N_s\) structures are used, we divide them into \(N_s/n_s\) groups and each group has \(n_s\) structures. Then we compute the statistic value for each group. There is one value \((\varDelta , \varGamma )\) to make all the statistics for \(N_s/n_s\) groups on both states \(Input'=MC\circ SR \circ SB(input \oplus \varDelta )\) and \(Output'=MC^{-1}\circ SB^{-1}\circ SR^{-1}(output \oplus \varGamma )\) less than the given threshold \(\tau \) for \(AES_{10}\). However, for the random permutation, even if the attacker can carefully choose the inputs to find one value of \(\varDelta \) to satisfy the statistical property on the state \(Input'\) with probability one, the probability to find one value \(\varGamma \) to satisfy the statistical property on the state \(Output'\) is very low.

In order to further reduce the time complexity, we focus on statistics on 8-byte states – \(Input'_{(0\sim 3)}\) and \(Output'_{(0\sim 3)}\). So we only need to find two 32-bit values for \(\varDelta '=(\varDelta _{(0)},\varDelta _{(5)},\varDelta _{(10)},\varDelta _{(15)})\) and \(\varGamma '=(\varGamma _{(0)},\varGamma _{(7)},\varGamma _{(10)},\varGamma _{(13)})\). The detailed process for this known-key distinguisher on \(AES_{10}\) is described in Algorithm 3.

In this setting, by applying Proposition 1, \(s=32\), \(t=8\), \(b=1\) and \(n_s=2^{8}\). If we set the error probabilities \(\alpha _0 =2^{-50}\) and \(\alpha _1={2^{-10.51}}\), then \(N=2^{27.92}\) and \(\tau =64123.53\) according to Eq. (8).

Fig. 3.
figure 3

Known-key distinguisher for \(AES_{10}\). \((A_{1},A_{2},A_{3},A_{4})\) and A denote uniform distribution on 4 bytes and 1 byte respectively. C denotes constant byte.

figure c

In Algorithm 3, we filter out the wrong values for \(\varDelta '=(\varDelta _{(0)},\varDelta _{(5)},\varDelta _{(10)},\) \(\varDelta _{(15)})\) with the statistics on \(Input'_{(0\sim 3)}\) one by one. At last, the probability that one wrong \(\varDelta '\) is remained after all \(2^{27.92-8}\) filtering processes is about \((2^{32}-1) \cdot \alpha _1^{4\times 2^{19.92}}\approx 0\), while the probability that the right candidate \(\varDelta \) cannot pass the filtering process is \(1-(1-\alpha _0)^{4\times 2^{19.92}}\approx 2^{-28.08}\).

In the similar way, we filter out the wrong values for \(\varGamma '=(\varGamma _{(0)},\varGamma _{(7)},\) \(\varGamma _{(10)},\varGamma _{(13)})\) with the statistics for \(Output'_{(0\sim 3)}\) one by one. Finally, the probability that one wrong \(\varGamma '\) can pass the filtering process is also about 0, while the probability that the right \(\varGamma '\) cannot pass the filtering process is also \(2^{-28.08}\). Therefore, for the case of \(AES_{10}\), the probability to correctly identify the \(AES_{10}\) cipher is about \((1-2^{-28.08})^2\approx 1-2^{-27.08}\).

While for the case of random permutation, at the best of the times the adversary can choose the inputs that there is always at least one value of \(\varDelta '\) remaining after the filtering process, but the probability that there is at least one \(\varGamma '\) surviving after the filtering process is about 0.

So the success probability of this distinguisher is about \(1-2^{-27.08}\). The advantage to distinguish \(AES_{10}\) from random permutation is not negligible. The time complexity of Steps \(2\sim 3\) is \(N\times N=2^{55.84}\) full round encryptions. Then the time complexity of Steps \(4\sim 9\) is about \(2^{16}\times n_s \times N= 2^{51.92}\) memory accesses (MA). Steps \(10\sim 14\) take \(2^{16}\times 2^{8} \times n_s\times 2^{24} =2^{56}\) MA, and Steps \(15 \sim 21\) require about \(2^{32}\times n_s \times 2^{16} = 2^{56}\) MA. Since \(\alpha _1=2^{-10.51}\), Steps \(22 \sim 29\) take \(2^{32}\times \alpha _1 \times n_s\times N=2^{57.41}\) MA and Step 30 needs about \((2^{32}\times \alpha _1^2+2^{32}\times \alpha _1^3) \times n_s \times N\approx 2^{46.91}\) MA. After one filter process, the number of candidates for \(\varDelta \) is about 1. Consequently by filtering with other \(N/n_s-1\) groups of structures, the time complexity of Step 31 is \( (N/n_s-1)\times n_s\times N\approx 2^{55.84}\) MA. Then if we roughly set one access to a table is equivalent to one full round encryption, the total complexity from Step \(4\sim 31\) is about \(2^{51.92}+2^{56}+2^{56} +2^{57.41}+2^{46.91}+2^{55.84}\approx 2^{58.49}\) encryptions. Since Step 34 also takes \(2^{58.49}\) encryptions, the total time complexity of the whole attack is about \(2^{55.84}+2\times 2^{58.49}\approx 2^{59.60}\) full round encryptions. In addition, the dominant memory requirements happen on V[N][N] and \(V'[N][N]\), which need about \(2\times 4 \times N \times N=2^{58.84}\) bytes.

6 Conclusion

In this paper, we propose a statistical integral distinguisher with multiple structures on input and multiple integral properties on output based the work of Wang et al. at FSE’16. With this distinguisher, we give the known-key distinguishing attack on 8-round and full round AES-128 based on the Gilbert’s work at ASIACRYPT’14, which are the best known-key distinguishers for AES so far according to the time complexity. Beside that, we present a secret-key statistical integral distinguisher on 5-round AES with secret S-box under chosen-ciphertext mode. This is the best integral distinguisher on AES with secret S-box under secret-key setting. As a future work, we try to apply more statistical techniques into the field of symmetric ciphers and find improved attack on AES and AES-like ciphers.