1 Introduction

Integral attack is an important cryptanalytic technique for symmetric-key ciphers, which was firstly put forward by Daemen et al. in [7], then unified as integral attack by Knudsen and Wagner in [18]. In an integral distinguisher, in order to distinguish an actual cipher from any random permutation, one chooses plaintexts by taking all values of part bits and fixing other bits as constant to check whether the values on partial bits of the ciphertext are uniformly distributed (integral property) or not. If one additional linear layer is considered, the property will be that the XOR of all possible values on the specific part bits of ciphertext becomes zero, which is referred as zero-sum property [2]. With the purpose of reducing the data complexity, Wang et al. applied statistical techniques on original integral distinguisher to build a statistical integral distinguisher at FSE’16 in [25]. As a result, this statistical integral distinguisher requires less data complexity than that of the original integral one. However, Wang et al. only considered the case that there was only one integral property on output, they didn’t discuss the cases that several integral properties existed on output and multiple structures of data should be used at the same time. These limit the effect of certain integral attacks on block ciphers, especially for known-key distinguishing attack. We want to extend the statistical integral distinguisher for more situations. This is the first motivation for this work.

Block ciphers, because of their security and simplicity, are often adopted as components of hash functions by designers, such as Whirlpool [3] and PHOTON [14]. Since the attacker can fully control the inter behaviour of a hash function, the block cipher used in it shall resist known-key and chosen-key attacks. The first known-key security model was proposed by Knudsen and Rijmen for block cipher in [17] where the secret key is known to the attacker. Its goal is to distinguish the block cipher from a random permutation by constructing a set of plaintext/ciphertext pairs satisfying a special property. Such property is easy to check but impossible to achieve for any random permutation with the same complexity and a non-negligible probability by using oracle accesses to this random permutation and its inverse. Since the establishment of known-key model, several types of known-key distinguishers have been proposed, such as distinguishers with integral property [1, 10, 17, 22], subspace distinguishers [19, 20], (multiple) limited-birthday distinguishers [11, 16], and the known-key distinguisher for PRESENT by combining meet-in-the-middle technique with truncated differential [5]. Moreover, the chosen-key distinguishing attack on the full AES-256 has been proposed in [4].

The best known-key distinguisher on AES was proposed by Gilbert at ASIACRYPT’14 in [10]. He utilized 264 middle-texts belonging to 232 structures and integral property to put forward a known-key distinguisher on full-round AES-128. If we can use the statistical integral property into this known-key distinguisher, we can reduce the time complexity so as to improve it. Furthermore, we can implement improved known-key distinguishers on all AES-like ciphers which adopt the wide trail strategy as AES. This is the second motivation of this work.

Our Contributions

In this paper, we propose a statistical integral distinguisher with multiple structures on input and integral properties on output. We found that in some situations of integral attacks such as known-key distinguishing attack on AES, multiple structures of inputs have to be used. For instance, N s structures of inputs are needed. In each structure, all 2s possible values on s input bits (out of n bits) are taken and the corresponding outputs on b different t bits are uniformly distributed respectively. The statistical integral distinguisher in [25] can reduce the data complexity from \(\mathcal {O}(2^{s})\) to \(\mathcal {O}(2^{s-t/2})\) by using one t-bit integral property if only one structure is used. However, if there are N s structures involved, the model in [25] cannot be applied. For the sake of reducing the data requirements, we construct a new statistical integral distinguisher. In our new distinguisher, the data complexity is

$$\mathcal{O}(\sqrt{N_{s}/b} \cdot 2^{s-\frac{t}{2}}), $$

while the data complexity of the original distinguisher is

$$\mathcal{O}(N_{s} \cdot 2^{s}). $$

In order to verify our theoretical model, we implement the experiments for mini version of AES. It shows that the experimental results are in good accordance with the theoretical results.

As an application, we put our statistical integral method into known-key distinguishing attack and generalize the new known-key distinguishers on 8-round and 10-round AES-like ciphers. Taking AES, the permutations of Whirlpool, PHOTON and Grøstl as examples (regard these AES-like permutations as block ciphers), we show the actual distinguishers and compare them with previous results. As far as we know, our distinguishers on AES-like ciphers are the best ones under known-key setting. The results are summarized in Table 1.

Table 1 Summary of KK and CK distinguisherson AES-like Ciphers. Note that we only consider the security of the AES-like permutation used in hash function and such permutation is regarded as block cipher

Besides that, we propose the improved secret-key integral distinguisher on AES with secret S-box, which has a great improvement comparing with the previous integral one proposed by Sun et al. at CRYPTO’16 in [23]. The results are summarized in Table 2.

Table 2 Summary of secret-key distinguishers on AES

Outline of This Paper

In Section 2, some preliminaries are given. Then we present a statistical integral model with multiple structures on input and several integral properties on output in Section 3. In Section 4 new general known-key distinguishers on AES-like ciphers are given and are applied on several actual ciphers in Section 5. Then a new secret-key statistical integral distinguisher on AES is put forward in Section 6. At last, we conclude this paper in Section 7.

2 Preliminaries

2.1 Notations

We define some notations in this part which are used through this paper.

  1. (1)

    AB: implement A operation firstly, then B operation;

  2. (2)

    AB: implement B operation firstly, then A operation;

  3. (3)

    X(i): the i-th cell of state X;

  4. (4)

    X(ij): the cells of X from the i-th one to j-th one.

To describe the integral distinguishers, we use C, A, (A1,A2,…,A n ) and \(({A^{j}_{1}}\), \({A^{j}_{2}}\), …, \({A^{j}_{n}})\) to denote different properties as follows.

  1. (1)

    C: a constant byte where all values in a set are fixed as one constant;

  2. (2)

    A: a active byte where all values in a set are distributed uniformly;

  3. (3)

    (A1,A2,…,A n ): n active bytes where all values in a set are not only distributed uniformly in each byte but also distributed uniformly on n bytes;

  4. (4)

    \(({A^{j}_{1}},{A^{j}_{2}},\ldots ,{A^{j}_{n}})\): n active bytes in one column of a state where all values in a set are not only distributed uniformly in each byte but also distributed uniformly on n bytes.

2.2 Description of AES-like cipher

AES-like cipher adopts wide trail strategy which is used in the design of AES cipher [8]. Such cipher is constructed by iterating Even-Mansour construction for target rounds. Its internal state can be viewed as square matrix of m rows and m columns. Each of these m2 cells is c bits. The round function includes 4 components:

  • AddRoundKey (AK): Xor with a subkey, sometimes a constant.

  • SubBytes (SB): A nonlinear bijective mapping for each cell of state;

  • ShiftRows (SR): Left rotate the i-th row by i cells, where i = 0,1,…,m − 1;

  • MixColumns (MC): Left multiply with an MDS matrix on each column;

So a r-round AES-like cipher can be described as follows:

$$ AES^{like}_{r} = (AK \diamond SB \diamond SR \diamond MC)^{r} \diamond AK. $$
(1)

Usually the MC operation is omitted in the last round.

Note that this description for AES-like cipher follows the design of AES. Sometimes a cipher is regarded as an AES-like cipher as well if its SR operation is slightly modified and MC operation does not adopt an MDS matrix. In our paper, we also consider the ciphers whose SR and MC are replaced by ShiftColumns (SC) and MixRows (MR) as AES-like ciphers.

In [10], Gilbert proposed a new representation of AES. Firstly, he defined two operations T and CC as follows, then built two special byte permutations P = SRTSR− 1 and Q = SR− 1TSRCC. With these two permutations, Gilbert proposed a new observation as follows.

Observation 1 (From [10])

Transformations S = Q− 1SBMCAKSBP− 1 and R = PSRMCAKSRQ operate on columns and rows respectively.

$$T: \left( \begin{array}{llll} a_{0} & a_{4} & a_{8} & a_{12}\\ a_{1} & a_{5} & a_{9} & a_{13}\\ a_{2} & a_{6} & a_{10} & a_{14}\\ a_{3} & a_{7} & a_{11} & a_{15}\\ \end{array}\right) \mapsto \left( \begin{array}{llll} a_{0} & a_{1} & a_{2} & a_{3}\\ a_{4} & a_{5} & a_{6} & a_{7}\\ a_{8} & a_{9} & a_{10} & a_{11}\\ a_{12} & a_{13} & a_{14} & a_{15}\\ \end{array}\right) $$
$$CC: \left( \begin{array}{llll} a_{0} & a_{4} & a_{8} & a_{12}\\ a_{1} & a_{5} & a_{9} & a_{13}\\ a_{2} & a_{6} & a_{10} & a_{14}\\ a_{3} & a_{7} & a_{11} & a_{15}\\ \end{array}\right) \mapsto \left( \begin{array}{llll} a_{0} & a_{12} & a_{8} & a_{4}\\ a_{1} & a_{13} & a_{9} & a_{5}\\ a_{2} & a_{14} & a_{10} & a_{6}\\ a_{3} & a_{15} & a_{11} & a_{7}\\ \end{array}\right) $$

With this new observation, r-round AES has three equivalent representations:

$$\begin{array}{@{}rcl@{}} \!\!\!\!\!\!\!AES_{r}\! &=&AK \diamond SR \diamond Q \diamond(S \diamond R)^{r/2-1} \diamond S \diamond P \diamond SR \diamond AK, \end{array} $$
(2)
$$\begin{array}{@{}rcl@{}} &=&AK \diamond P^{-1} \diamond SB \diamond R \diamond (S \diamond R)^{r/2-1} \diamond SB \diamond Q^{-1} \diamond AK, \end{array} $$
(3)
$$\begin{array}{@{}rcl@{}} &=&AK \diamond SB \diamond SR \diamond MC \diamond AES_{r/2-2} \diamond AK^{-1} \diamond MC \diamond AK \diamond SB \diamond SR \diamond AK. \end{array} $$
(4)

Easily, these representations can be directly applied on AES-like ciphers by choosing suitable T and CC operations. But note that the MC operation in the last round function is omitted in above three representations.

2.3 Brief description of known-key distinguishers on AES in [10]

In this subsection, we briefly recall the known-key distinguishers for 8-round and 10-round AES proposed by Gilbert at ASIACRYPT’14 [10].

In order to mount a known-key distinguisher for AES8, Gilbert firstly proposed two integral distinguishers shown in Fig. 1. Then given 264 data \(\mathcal {Z}=\{R(x,0,0,0)\oplus (y,0,0,0) | x, y \in \{0,1\}^{32}\}\), this set \(\mathcal {Z}\) can be divided into 232 structures according to different values of x, and each structure takes all 232 values on the first column and constants on other columns. So the set \(\mathcal {Z}\) satisfies the first integral distinguisher in Fig. 1. Since R operation is an affine mapping, \(R^{-1}(\mathcal {Z})=\{(x,0,0,0)\oplus R^{-1}(y,0,0,0)\}\) can be divided into 232 structures according to different values of y, thus the set \(R^{-1}(\mathcal {Z})\) satisfies the second integral distinguisher in Fig. 1.

Combining these two integral distinguishers with R operation, a known-key distinguisher on AES8 is built so that all input and output bytes resulted from 264 middle texts \(\mathcal {Z}\) are uniformly distributed. However, for random permutations, the upper bound of the probability satisfying the uniformly distributed property for each byte is \(\frac {1}{2^{128-1}}\) with qN = 264 oracle queries.

Furthermore, with the representation of (3), Gilbert mounted a known-key distinguisher for AES10. This distinguisher is implemented by extending one round on each side based on the distinguisher for AES8. The same 264 middle texts \(\mathcal {Z}\) as for the known-key distinguisher on AES8 are used. For the corresponding input-output pairs (p i ,c i ),i = 1,…,264, The adversary can find at least one value (Δ,Γ), where Δ,Γ ∈{0,1}128, to make each byte of RSB(P− 1(p i ) ⊕Δ) and R− 1SB− 1(Q(c i ) ⊕Γ) uniformly distributed within time complexity 264. However, for a random permutation, the upper bound of the probability satisfying the uniformly distributed property for each byte is 2− 16.5 with qN = 264 oracle queries.

Fig. 1
figure 1

Two integral distinguishers under the new representation of AES in [10]

Since Gilbert’s work is based on the integral distinguisher and uses the active propertyFootnote 1, if we can improve the statistical integral model proposed by Wang et al. in [25], we can further improve Gilbert’s work and widely utilize the new method to all AES-like ciphers. With the improved known-key distinguishers, 10-round AES-like ciphers cannot be regarded as ideal random permutations, and the time complexities of new distinguishers are less than previous ones.

2.4 Statistical integral distinguisher

In this subsection, we recall the statistical integral distinguisher proposed by Wang et al. in [25].

Assume that \(H: \mathbb {F}^{n}_{2} \rightarrow \mathbb {F}^{n}_{2}\) is a part of a block cipher, its input and output both can be splitted into two parts as follows:

$$H: \mathbb{F}^{r}_{2} \times \mathbb{F}^{s}_{2} \rightarrow \mathbb{F}^{t}_{2} \times \mathbb{F}^{u}_{2}, H(x,y)=\left( \begin{array}{c} H_{1}(x,y)\\H_{2}(x,y)\end{array} \right). $$

If the first r bits of input are fixed as a constant λ and only the first t bits of output are considered, then the function H can be denoted as T λ :

$$T_{\lambda}: \mathbb{F}^{s}_{2}\rightarrow \mathbb{F}^{t}_{2}, T_{\lambda}(y)=H_{1}(\lambda,y). $$

When y takes over all possible values, the outputs T λ (y) are uniformly distributed, then an integral distinguisher is constructed.

If the adversary only takes N < 2s different y, sets a counter vector V [T λ (y)], \(T_{\lambda }(y) \in \mathbb {F}^{t}_{2}\) to keep track of the number of each value T λ (y) and initializes this counter as zero, a statistical integral distinguisher can be constructed by investigating the distribution of the statistic as follows:

$$ T=\sum\limits^{2^{t}-1}_{T_{\lambda}(y)= 0} \frac{(V[T_{\lambda}(y)]-N \cdot 2^{-t})^{2}}{N\cdot 2^{-t}}. $$
(5)

For the right key guess (the target cipher), the statistic T follows a χ2 distribution with mean \(\mu _{0}=(2^{t}-1)\frac {2^{s}-N}{2^{s}-1}\) and variance \(\sigma ^{2}= 2(2^{t}-1)(\frac {2^{s}-N}{2^{s}-1})^{2}\), but for the wrong key guess (a random permutation), it follows a χ2 distribution with mean μ0 = (2t − 1) and variance σ2 = 2(2t − 1). The relation of data complexity, type-I error probability α0 and type-II error probability α1 is as follows

$$ N=\frac{(2^{s}-1)(q_{1-\alpha_{0}}+q_{1-\alpha_{1}})}{\sqrt{(2^{t}-1)/2}+q_{1-\alpha_{1}}}+ 1, $$
(6)

where \(q_{1-\alpha _{0}}\) and \(q_{1-\alpha _{1}}\) are the respective quantiles of the standard normal distribution.

3 Statistical integral distinguisher with multiple structures on input and integral properties on output

In some integral distinguishers, there are b groups of t output bits with the active property. If we can utilize all properties at the same time, the data complexity can be further reduced. What’s more, in some attack settings, N s structures, i.e. that N s different λ, should be used together. For these special settings, we construct the new statistical integral distinguisher in this section.

Firstly, we split the input into two parts and output into b + 1 parts.

$$H:\mathbb{F}_{2}^{r} \times \mathbb{F}_{2}^{s} \rightarrow \mathbb{F}_{2}^{t} \times \mathbb{F}_{2}^{t}\times \ldots \times \mathbb{F}_{2}^{t}\times \mathbb{F}_{2}^{u},\ H(x,y)=\left( \begin{array}{c} H_{1}(x,y) \\ H_{2}(x,y)\\ \ldots \\ H_{b + 1}(x,y) \end{array}\right). $$

Then we use \(T_{\lambda }^{i}\) to denote the function H i where the first r bits of its input are fixed to the value λ and b outputs H i ,1 ≤ ib, are considered:

$$T_{\lambda}^{i}:\mathbb{F}_{2}^{s}\rightarrow\mathbb{F}_{2}^{t},\ T^{i}_{\lambda}(y)=H_{i}(\lambda,y), i = 1,2,\ldots,b. $$

For a special integral distinguisher, when y iterates all possible values of \(\mathbb {F}_{2}^{s}\), \(T_{\lambda }^{i}(y), i = 1,2,\ldots ,b\) are all uniformly distributed with probability one. Further more, if we take N s values for λ, i.e.N s structures and in each structure y iterates all possible values of \(\mathbb {F}_{2}^{s}\), the integral properties on output are satisfied as well.

Now assume we need N < 2s values of y under each structure and we use N s structures which are independent. \(T_{\lambda }^{i}(y) \in \mathbb {F}_{2}^{t}, i = 1,2,\ldots ,b\) are computed for each y and we allocate a counter vector \(V_{i}[T_{\lambda }^{i}(y)]\) to store the occurrences of \(T_{\lambda }^{i}(y)\). Then we investigate the distribution of the following statistic:

$$ C=\sum\limits_{\lambda= 1}^{N_{s}} \sum\limits_{i = 1}^{b} \sum\limits_{T_{\lambda}^{i}(y)= 0}^{2^{t}-1}\frac{(V_{i}[T^{i}_{\lambda}(y)]-N\cdot2^{-t})^{2}}{N\cdot2^{-t}}. $$
(7)

The statistic C follows different distributions determined by whether we are dealing with an actual cipher or a random permutation.

Proposition 1

For sufficiently largeNandt, thestatistic\(\frac {2^{s}-1}{2^{s}-N}C_{cipher}\)(C c i p h e r is the statisticCfor cipher) followsaχ2-distribution withdegree of freedombN s ⋅ (2t − 1),which means thatC c i p h e r approximately follows a normal distribution with mean and variance

$$\begin{array}{@{}rcl@{}} \mu_{0}&=&Exp(C_{cipher})=b\cdot N_{s} \cdot(2^{t}-1)\frac{2^{s}-N}{2^{s}-1},\\ {\sigma_{0}^{2}}&=&Var(C_{cipher})= 2b\cdot N_{s} \cdot(2^{t}-1)(\frac{2^{s}-N}{2^{s}-1})^{2}. \end{array} $$

The statisticC r a n d o m (C r a n d o m is the statisticCfor randomly drawn permutation)follows aχ2-distributionwith degree of freedombN s ⋅ (2t − 1),which means thatC r a n d o m approximately follows a normal distribution with mean and variance

$$\mu_{1}=Exp(C_{random})=b\cdot N_{s}\cdot (2^{t}-1)\ \text{and}\ {\sigma_{1}^{2}}=Var(C_{random})= 2b\cdot N_{s}\cdot(2^{t}-1). $$

Proof

Deduced from Proposition 1 in [25], for a randomly drawn permutation, the statistic \(\sum _{T_{\lambda }^{i}(y)= 0}^{2^{t}-1}\frac {(V_{i}[T^{i}_{\lambda }(y)]-N\cdot 2^{-t})^{2}}{N\cdot 2^{-t}}\) follows a χ2-distribution with degree of freedom 2t − 1 for any λ and i. Then the statistic \(C^{\prime }_{random}\) for the randomly drawn permutation

$$C_{random}=\sum\limits_{\lambda= 1}^{N_{s}} \sum\limits_{i = 1}^{b} \sum\limits_{T_{\lambda}^{i}(y)= 0}^{2^{t}-1}\frac{(V_{i}[T^{i}_{\lambda}(y)]-N\cdot2^{-t})^{2}}{N\cdot2^{-t}}$$

is the sum of N s b independent χ2 statistics with degree of freedom 2t − 1, so the statistic C r a n d o m follows a χ2-distribution with degree of freedom bN s ⋅ (2t − 1). Footnote 2 Then for sufficiently large N and t, C r a n d o m approximately follows a normal distribution with the expected value and variance:

$$Exp(C_{random})=b\cdot N_{s}\cdot(2^{t}-1)\ \text{and}\ Var(C_{random})= 2b\cdot N_{s}\cdot(2^{t}-1). $$

Since the statistic for the cipher \(\frac {2^{s}-1}{2^{s}-N}\sum _{T_{\lambda }^{i}(y)= 0}^{2^{t}-1}\frac {(V_{i}[T^{i}_{\lambda }(y)]-N\cdot 2^{-t})^{2}}{N\cdot 2^{-t}}\), for any λ and i, follows a χ2-distribution with degree of freedom 2t − 1 deduced from [25]. Then the statistic \(\frac {2^{s}-1}{2^{s}-N}C^{\prime }_{cipher}\) for the cipher

$$\frac{2^{s}-1}{2^{s}-N}C_{cipher}=\sum\limits_{\lambda= 1}^{N_{s}} \sum\limits_{i = 1}^{b}\frac{2^{s}-1}{2^{s}-N} \sum\limits_{T_{\lambda}^{i}(y)= 0}^{2^{t}-1}\frac{(V_{i}[T^{i}_{\lambda}(y)]-N\cdot2^{-t})^{2}}{N\cdot2^{-t}} $$

is the sum of N s b independent χ2 statistics with degree of freedom 2t − 1, so the statistic \(\frac {2^{s}-1}{2^{s}-N}C_{cipher}\) follows a χ2-distribution with degree of freedom bN s ⋅ (2t − 1). Then for sufficiently large N and t, C c i p h e r approximately follows a normal distribution with the expected value and variance:

$$Exp(C_{cipher})=b\cdot N_{s}\cdot(2^{t}-1)\cdot \frac{2^{s}-1}{2^{s}-N}\ \text{and}\ Var(C_{cipher})= 2b\cdot N_{s}\cdot(2^{t}-1)\cdot (\frac{2^{s}-1}{2^{s}-N})^{2}. $$

Corollary 1

Under the assumption of Proposition 1, for type-I error probabilityα0(theprobability to wrongfully discard the cipher), and type-II error probabilityα1(the probability to wrongfully accept a randomly chosen permutation asthe cipher), to distinguish a cipher and a random permutation based onbindependentt-bit outputs whenrandomly choosingN s values forr-bitinputs andNvaluesfors-bit inputs,then the following equation holds.

$$ N=\frac{(2^{s}-1)(q_{1-\alpha_{0}}+q_{1-\alpha_{1}})}{\sqrt{(b \cdot N_{s} \cdot (2^{t}-1))/2}+q_{1-\alpha_{0}}}+ 1, $$
(8)

where \(q_{1-\alpha _{0}}\)and \(q_{1-\alpha _{1}}\)are the respective quantiles of the standard normal distribution.

Corollary 1 is obtained from the equation about the decision threshold \(\tau =\mu _{0}+\sigma _{0}q_{1-\alpha _{0}}=\mu _{1}-\sigma _{1}q_{1-\alpha _{1}}\). While the statistic test is also based on the decision threshold τ: if Cτ, the test outputs ‘cipher’; Otherwise, the test outputs ‘random’. Note that in this statistical method the success probability Ps = 1 − α0 and the relation between α1 and the advantage of the attack a is α1 = 2a.

In order to verify the theoretical model in Corollary 1, we implement the experiments for mini version of AES in Appendix AA.1. It shows that the experimental results are in good accordance with the theoretical results.

From (8), we know that the data complexity for the statistical distinguisher is NN s . For the given values of n,s,t,α0,α1, the ratio of the data complexity with N s structures to that with one structure is \(\sqrt {N_{s}}\). It means that more structures will result in high data complexity, so we should avoid to utilize more structures. However, for the known-key integral distinguisher for AES etc., we have to use enough structures to make the plaintexts and the ciphertexts satisfying the desired properties simultaneously. Moreover, if b is increased, the data complexity can be reduced, but as b increases, the time complexity in some situations will be increased accordingly. Thus, we should take the proper value for b according to the time-data tradeoff.

4 Known-key distinguishers on AES-like cipher

In this section, we follow the definition of known-key distinguisher in [10] and use our new statistical integral model to put forward known-key distinguishers on AES-like ciphers based on Gilbert’s work at ASIACRYPT’14.Footnote 3

4.1 Definition of known-key distinguisher

Known-key model was introduced by Knudsen and Rijmen in [17] to learn something about the security margin of a cipher. There exist differences between secret-key model and known-key model. In secret-key model, the key used in the cipher is secret to the adversary so that the cipher is more like a black box. The adversary needs to recover the key or distinguish this cipher from any random permutation by accessing to this black box. While in known-key model, the adversary knows the key used in the cipher. The goal of such adversary is to achieve an input-output correlation which she can not achieve with the inputs and outputs obtained by accessing to any random permutation. In this paper, we follow the definition of known-key distinguisher in [10] below.

Definition 1 (T-Intractable Relation [10])

Let E : (K,X) ∈{0,1}k ×{0,1}nE K (X) ∈{0,1}n denote a block cipher of block size n bits. Let N ≥ 1 and \(\mathcal {R}\) denote an integer and any relation over the set S of N-tuples of n-bit blocks. \(\mathcal {R}\) is said to be T-intractable relatively to E if, given any algorithm \(\mathcal {A}^{\prime }\) that is given an oracle access to a perfect random permutation π of {0,1}n and its inverse, it is impossible for \(\mathcal {A}^{\prime }\) to construct in time TT two N-tuples \(\mathcal {X}^{\prime } = (X^{\prime }_{i})\) and \(\mathcal {Y}^{\prime } = (Y^{\prime }_{i})\) such that \(Y^{\prime }_{i} = {\Pi }(X^{\prime }_{i}), i = 1\ldots N\) and \(\mathcal {X}^{\prime } \mathcal {R} \mathcal {Y}^{\prime }\) with a success probability \(p^{\prime } \ge \frac {1}{2}\) over π and the random choices of \(\mathcal {A}^{\prime }\). The computing time T of \(\mathcal {A}^{\prime }\) is measured as an equivalent number of computations of E, with the convention that the time needed for one oracle query to π or π− 1 is equal to 1. Thus if q denotes the number of queries of \(\mathcal {A}^{\prime }\) to π or π− 1, qT.

Definition 2 (Known-Key Distinguisher [10])

Let E : (K,X) ∈{0,1}k ×{0,1}nE K (X) ∈{0,1}n denote a block cipher of block size n bits. A known-key distinguisher \((\mathcal {R}, \mathcal {A})\) of order N ≥ 1 consists of (1) a relation \(\mathcal {R}\) over the N-tuples of n-bit blocks (2) an algorithm \(\mathcal {A}\) that on input a k-bit key K produces in time \(T_{\mathcal {A}}\), i.e. in time equivalent with \(T_{\mathcal {A}}\) computations of E, an N-tuple \(\mathcal {X} = (X_{i}), i = 1\ldots N\) of plaintext blocks and an N-tuple \(\mathcal {Y} = (Y_{i}), i = 1\ldots N\) of ciphertext blocks related by \(Y^{\prime }_{i} = E(X^{\prime }_{i})\), The two following conditions must be met:

  • The relation \(\mathcal {R}\) must be \(T_{\mathcal {A}}\)-intractable relatively to E.

  • The validity of \(\mathcal {R}\) must be efficiently checkable: we formalize this requirement by incorporating the time for checking whether two N-tuples are related by \(\mathcal {R}\) in the computing time \(T_{\mathcal {A}}\) of algorithm \(\mathcal {A}\).

This definition can be understood as follows. Firstly, the adversary can derive an N-tuples of input and output blocks satisfying a relation \(\mathcal {R}\) with time T, while she cannot derive from oracle queries to any random permutation and its inverse an N-tuples of input/output pairs satisfying the same relation with time less than T. Secondly, this relation \(\mathcal {R}\) must be achieved under each key from the key space and must be checkable without knowing K. In this paper, we use statistical integral property to implement known-key distinguisher on AES-like cipher instead of integral property, so we slightly modify the definition that an N-tuples of input and output blocks satisfies a relation \(\mathcal {R}\) with a high probability while the same relation exists on random permutation with a very low probability.

4.2 New known-key distinguisher on 8-round AES-like cipher

As described in Section 2.3, the known-key distinguisher for AES8 is based on the uniformly distributed integral property with 232 structures and each structure takes 232 texts. This integral property can be transformed into a statistical integral property for AES-like cipher by using Proposition 1. So in our known-key distinguisher on \(AES^{like}_{8}\), we utilize the statistical integral property on each cell of input and output to distinguish the actual cipher from random permutation. In this way, the required number of structures and texts in one structure can be reduced. The process to distinguish the actual cipher \(AES^{like}_{8}\) from random permutation is described in Algorithm 1.

figure a

Since we use the statistical integral distinguisher twice, both type I error probabilities are set to be α0, for the case of \(AES^{like}_{8}\), the probability to wrongly regard AES8 as a random permutation is α0 + (1 − α0)α0, which means the success probability of correctly identifying AES-like cipher is about (1 − α0)2.

While for the case of random permutation, the adversary can implement encryption and decryption oracle queries to the cipher and random permutation. But statistical integral property (exploit χ2 distribution) is different from traditional integral property (utilize uniform distribution). At the best of times the adversary chooses the data which automatically satisfy the statistical property on the input, but satisfy the statistical property on the output with probability α1. In order to satisfy the statistical properties both on the input and output, the probability of wrongly regarding this random permutation as AES cipher is 1 × α1.

To summarize, the relation \(\mathcal {R}\) used in this known-key distinguisher is that N2-tuples of input and output satisfy statistical integral property respectively. For the AES cipher, this relation exists with probability of (1 − α0)2, while the same relation can be achieved with probability of only α1 by oracle queries to any random permutation. The advantage to distinguish AES cipher from random permutation is (1 − α0)2α1, which should be not negligible. The total time complexity of this known-key distinguisher is about 2 × N2 computations. The memory requirements are about m2 × 2c used for storing the counter vector V [m2][2c].

4.3 New known-key distinguisher on 10-round AES-like cipher

The statistical integral distinguisher on AES10 is based on the distinguishing property of AES10 in [10], which is represented according to (4), see Fig. 2.

Fig. 2
figure 2

Take known-key distinguisher for AES10 as an example

Along with the idea within the distinguisher on AES10 in [10], in our known-key distinguisher on \(AES^{like}_{10}\), we use N s < 2m×c structures, each of which takes N = N s middle texts, to obtain N2 input/output pairs. For AES-like cipher, there is one value for (Δ,Γ) to let each byte of MCSRSB(input ⊕Δ) and MC− 1SB− 1SR− 1(output ⊕Γ) satisfy the statistical integral property with a high probability. But for any random permutation, the probability of having one solution for (Δ,Γ) to obtain the same property is very low.

However, just simplely in above way, the distinguisher has high time complexity. In order to reduce the time complexity, we implement the distinguisher in the following way. As N s structures are used, we divide them into N s /n s groups and each group has n s structures. Then we compute the statistic value for each group. There is one value (Δ,Γ) to make all statistics for N s /n s groups on both states Input = MCSRSB(input ⊕Δ) and Output = MC− 1SB− 1SR− 1(output ⊕Γ) less than the given threshold τ for AES10. Meanwhile, for any random permutation, even if the attacker can carefully choose the inputs to find one value of Δ to satisfy the statistical property on the state Input with probability one, the probability of finding one value Γ to satisfy the statistical property on the state Output is very low.

In order to further reduce the time complexity in condition of ensuring a non-negligible distinguishing advantage, we only focus on statistics on m-cell states – \(Input^{\prime }_{(0\sim m-1)}\) and \(Output^{\prime }_{(0\sim m-1)}\). So we only need to find two (m × c)-bit values for Δ and Γ. The detailed process for this known-key distinguisher on \(AES^{like}_{10}\) is described in Algorithm 2.

In Algorithm 2, we filter out the wrong values for Δ with the statistics on \(Input^{\prime }_{(0\sim m-1)}\) one by one. At last, the probability that one wrong Δ is remained after all N s /n s filtering processes is about \((2^{m\times c}-1) \cdot \alpha _{1}^{m\times N_{s}/n_{s}}\), while the probability that the right candidate Δ cannot pass the filtering process is \(1-(1-\alpha _{0})^{m\times N_{s}/n_{s}}\).

In the similar way, we filter out the wrong values for Γ with the statistics for \(Output^{\prime }_{(0\sim m-1)}\) one by one. Finally, the probability that one wrong Γ can pass the filtering process is also \((2^{m\times c}-1) \cdot \alpha _{1}^{m\times N_{s}/n_{s}}\), while the probability that the right Γ cannot pass the filtering process is \(1-(1-\alpha _{0})^{m\times N_{s}/n_{s}}\) as well. Therefore, for the case of AES10, the probability to correctly identify the AES10 cipher is about \((1-(1-\alpha _{0})^{m\times N_{s}/n_{s}})^{2}\).

While for the case of random permutation, at the best of the times the adversary can choose the inputs that there is always at least one value of Δ remaining after the filtering process, but the probability that at least one Γ survives after the filtering process is about 0.

To summarize, the relation \(\mathcal {R}\) used in this known-key distinguisher is that there exists one pair (Δ,Γ) such that N2-tuples of MCSRSB(input ⊕Δ) and MC− 1SB− 1SR− 1(output ⊕Γ) satisfy the statistical integral property respectively. For AES cipher, this relation exists with probability of \((1-(1-\alpha _{0})^{4N_{s}/n_{s}})^{2}\), while the same relation can be achieved with probability of about 0 for any random permutation. So the success probability of this distinguisher is about \((1-(1-\alpha _{0})^{4N_{s}/n_{s}})^{2}\) which is the distinguishing advantage as well.

The time complexity of Steps 2 ∼ 3 is N2 full round encryptions. Then the time complexity of Steps 4 ∼ 9 is about 2m×c/2 × n s × N memory accesses (MA). Steps 10 ∼ 15 take m/2 × 2(m+ 2)×c × n s MA, Steps 19 ∼ 26 take 2m×c × α1 × n s × N MA and Step 27 needs about \(2^{m\times c}\times ({\alpha _{1}^{2}}+{\alpha _{1}^{3}}+\ldots + \alpha _{1}^{m-1}) \times n_{s} \times N\) MA. The time of Step 27 depends on the value of α1. Usually we only need to calculate one or two items. After one filter process, the number of candidates for Δ is about 1. Consequently by filtering with other N/n s − 1 groups of structures, the time complexity of Step 28 is (N/n s − 1) × n s × N MA. Since Step 34 also takes the same times of encryptions as Steps 4 ∼ 28, if we roughly set that one access to a table is equivalent to one full round encryption, the total time complexity of the whole attack is about \(N^{2}+ 2(2^{m\times c/2}\times n_{s} \times N+m/2 \times 2^{(m + 2)\times c} \times n_{s}+ 2^{m\times c}\times (\alpha _{1}+ {\alpha _{1}^{2}}+\ldots + \alpha _{1}^{m-1}) \times n_{s} \times N+(N/n_{s}-1)\times n_{s}\times N)\)full round encryptions. In addition, the dominant memory requirements happen on V [N][N] and V[N][N], which need about 2 × m × N2.

figure b

5 Application on AES-like ciphers

In this section, we apply the known-key distinguishers on actually AES-like ciphers including AES, Whirlpool, Grøstl and PHOTON permutations. These distinguishers are the best ones under known-key setting.

5.1 Application to AES

Advanced Encryption Standard (AES) [8], published by NIST, is the first target for our method. Take AES-128 as an example. Its block size is 128 bits and it totally has 10 rounds. By applying known-key distinguishers on AES, we have the following results.

8-round known-key distinguisher

According to Proposition 1, we have s = 32, t = 8, b = 16 and N = N s . If we set the error probabilities α0 = 2− 128 and α1 = 2− 128 (the values of α0 and α1 can be different and take any suitable values), then \(q_{1-\alpha _{0}}=q_{1-\alpha _{1}} \approx 13.06\). Calculating with (8), N = N s ≈ 220.81 and the threshold value τ ≈ 7478730631.39. Following the algorithm in Section 4.2, this distinguisher can be implemented successfully with 2 × (220.81)2 = 242.61 computations and 16 × 28 × 2 = 213 bytes of memory. Meanwhile, for the case of AES8, the probability of correctly identifying AES cipher is about 1 − 2− 127, but for the case of random permutation, the probability of wrongly regarding this random permutation as AES cipher is 2− 128. So the distinguishing advantage is 1 − 2− 127 − 2− 128 ≈ 1, which is not negligible.

10-round known-key distinguisher

In this setting, by applying Proposition 1, we have s = 32, t = 8, b = 1 and n s = 28. If we set the error probabilities α0 = 2− 50 and α1 = 2− 10.51, then N = 227.92 and τ = 64123.53 according to (8). Following the algorithm in Section 4.3, this known-key distinguisher needs 259.60 full-round encryptions and 258.84 bytes of memory. The distinguishing advantage is about 1 − 2− 27.08 which is non-ignorable.

The best previous known-key distinguisher on AES-128 is 10 rounds and was introduced by Gilbert [10], in which the time complexity is 264. Comparing with this distinguisher, ours are the best one according to the time complexity.

5.2 Application to Whirlpool

Whirlpool hash function [3] is an ISO/IEC standard hash function designed by Barreto and Rijmen in 2000. It processes 512-bit message blocks and produces a 512-bit hash value. Its compression function is based on an AES-like block cipher E with the Miyaguchi-Preneel mode: \(H_{j}=h(H_{j-1},M_{j})=E_{H_{j-1}}(M_{j})\oplus M_{j}\oplus H_{j-1}\), where H j is the chaining value and M j is the j-th message block. This block cipher E employs two similar 10-round permutations: one takes the chaining value as input to produce 11 subkeys for the second permutation; the second one updates 8 × 8 bytes message block with the subkeys from the first permutation with 10 rounds. The round transformation includes 4 operations: SubBytes(SB), ShiftColumns(SC), MixRows(MR) and AddRoundKey(AK) or AddRoundConstant(AC) in key schedule, which is similar to AES. For more detail, please refer to [3]. Known-key distinguishers on block cipher E (regard this permutation as block cipher) can be obtained by following the method in Section 4. The only difference is that R and S operate on columns and rows respectively.

8-round known-key distinguisher

By applying Proposition 1 on the known-key distinguisher, we have s = 64, t = 8, b = 64 and N = N s . If we set the error probabilities α0 = 2− 512 and α1 = 2− 512, then \(q_{1-\alpha _{0}}=q_{1-\alpha _{1}} \approx 26.48\). According to (8), N = N s ≈ 242.15 and the threshold value τ ≈ 79819582776240513.99. According to the complexity analysis in Section 4.2, the known-key distinguisher on 8-round E needs 285.31 computations and 216.32 bytes of memory. The distinguishing advantage is about 1 − 2− 511 − 2− 512 ≈ 1.

10-round known-key distinguisher

In this setting, by applying Proposition 1, s = 64, t = 8, b = 1 and n s = 230. If we set the error probabilities α0 = 2− 64 and α1 = 2− 34, then N = 249.46 according to (8). According to Section 4.3, the time and memory complexities are 2113.90 encryptions and 2102.92 bytes respectively. The distinguishing advantage is 1 − 2− 40.54 which is non-ignorable.

The best known-key distinguisher on Whirlpool was proposed by Lamberger et al. in [20] and is only 8 rounds. But there are chosen-key distinguishers for 10-round Whirlpool in [16, 20]. Comparing with the previous results, our known-key distinguisher on 8-round E block cipher is the best one according to the time complexity and on 10-round E is the first published one under known-key distinguishing setting.

5.3 Application to Grøstl-256

Grøstl [9] is a SHA-3 candidate proposal. It is an iterated hash function with a compression function built from two distinct permutations. These permutations are constructed using wide trail strategy so that they are AES-like ciphers. Grøstl has two versions Grøstl-256 and Grøstl-512. The permutation used in Grøstl-256 has 10 rounds. Its internal state is 512 bits (8 × 8 cells). The known-key distinguishers on 8-round and 10-round of this permutation are similar to that for Whirlpool except that R and S operations are implemented on rows and columns respectively. As a result, the distinguisher on 8-round permutation of Grøstl-256 needs 285.31 encryptions and 216.32 bytes of memory and on 10-round permutation needs 2113.90 encryptions and 2102.92 bytes. The latter is the best one so far under known-key setting compared with the best previous ones for 9-round Grøstl proposed in [15, 16].

5.4 Application to PHOTON

PHOTON, introduced by Guo et al. at CRYPTO’11 in [14], is a family of lightweight hash functions. It is designed based on AES-like block cipher in a sponge construction. The security of PHOTON family directly depend on the security of the internal permutations. It is necessary to study the distinguishers for this component. There are five different permutations P100,P144,P196,P256 and P288 being used for PHOTON-80/20/16, PHOTON-128/16/16, PHOTON-160/36/36, PHOTON-224/32/32 and PHOTON-256/32/32. The number of rounds are always 12 rounds. Taking P100 as an example, we propose the known-key distinguishers on it. Since the distinct among these permutations is the size of state, similar method is applied on other 4 ones. We show the results directly in Table 1.

10-round known-key distinguisher

In this setting, by applying Proposition 1, s = 20, t = 5, b = 1 and n s = 210 for P100. If we set the error probabilities α0 = 2− 32 and α1 = 2− 64 (We set the same α0 and α1 and n s = 215, 220, 225 and 230 sequentially for other four permutations), then N = 217.34 according to (8). According to Section 4.3, the time and memory complexities are 240.71 encryptions and 237.00 bytes respectively. The distinguishing advantage is 1 − 2− 21.66 which is non-ignorable.

Compared with the best previous results for up to 9-round permutations of PHOTON in [16], our known-key distinguishers on 10-round permutations are the best ones under known-key distinguishing setting.

6 Secret-key statistical integral distinguisher on 5-round AES

In this section, we propose a secret-key distinguisher on 5-round AES with our statistical integral model based on the work of Sun et al. in [23]. In this distinguisher, the S-box used in AES is secret.

Firstly, we slightly modify the zero-correlation linear hull for 5-round decryption of AES under chosen-ciphertext mode proposed by Sun et al. in [23] (Lemma 3). Let \(V=\{(x_{(i)})\in F^{16}_{2^{8}} | x_{(0)}\oplus x_{(13)}=(k_{5})_{(0)} \oplus (k_{5})_{(13)}\}\), and assume that the input mask Γ I = (a(i))0≤i≤ 15 and output mask \({{\Gamma }^{0}_{O}}=(\beta _{(i)})_{0\le i\le 15}\) satisfy:

$$a_{(i)}= \left\{\begin{array}{ll} a, i = 0, 13;\\ 0,otherwise. \end{array}\right. \beta_{(j)}= \left\{\begin{array}{ll} nonzero, j=\{0,5,10,15\};\\ 0,otherwise. \end{array}\right. $$

Then the correlation for \({\Gamma }_{I}\rightarrow {{\Gamma }^{0}_{O}}\) on V is always 0. Note that there are three other zero-correlation linear hulls as well, when j = {1,6,11,12}, {2,7,8,13}, {3,4,9,14}. The corresponding output masks are \({{\Gamma }^{1}_{O}}\), \({{\Gamma }^{2}_{O}}\) and \({{\Gamma }^{3}_{O}}\) respectively. One of the four cases is shown in Fig. 3.

Fig. 3
figure 3

Zero-correlation linear hull on 5-round AES with secret S-box under secret-key setting. Gray and white cells denote nonzero and zero masks respectively. the two cells with a or b are exactly the same mask

With the technique proposed by Sun et al. in [24], these four zero-correlation linear hulls can be transformed into integral ones. Taking the linear hull \({\Gamma }_{I}\rightarrow {{\Gamma }^{0}_{O}}\) as an example, the corresponding integral distinguisher is that if the adversary takes over 2120 different values of ciphertexts c satisfying c(0)c(13) = (k5)(0) ⊕ (k5)(13), then the values on 4 bytes of plaintext (p(0),p(5),p(10),p(15)) are uniformly distributed.

Based on these integral distinguishers, we can implement a statistical integral distinguisher for each candidate Δ = (k5)(0) ⊕ (k5)(13), where s = 120 and t = 32. In order to have the success probability \((1-\alpha _{0})^{2^{8}}=(1-\alpha _{1})^{2^{8}}= 95\%\), we set α0 = α1 = 0.0002, then \(q_{1-\alpha _{0}}=q_{1-\alpha _{1}}\approx 3.54\). Meanwhile, we can use these four integral distinguishers above together within one structure, so b = 4 and N s = 1. Thus by (8), N = 2106.32 chosen ciphertexts. The decision threshold is about τ ≈ 17179212992.15. As there are 28 different values of Δ, the total data complexity of this distinguisher is N = 2106.32 × 28 = 2114.32 chosen ciphertexts.

What’s more, we can see from algorithm 3, the main time complexity happens on Step 5, which is about 28 × 2106.32 × 4 × 1/16 × 1/5 ≈ 2110 encryptions, if we regard one simple operation as \(\frac {1}{16}\) one round encryption. Besides that, memory requirements are about 4 × 232 × 10 ≈ 237.32 bytes = 233.32 blocks.

figure c

As far as we know, this distinguisher is the best secret-key integral one on 5-round AES with secret S-box.

7 Conclusion

In this paper, we propose a statistical integral distinguisher with multiple structures on input and multiple integral properties on output based the work of Wang et al. at FSE’16. With this distinguisher, we give the known-key distinguishing attack on 8-round and 10-round AES-like ciphers such as AES, the permutation of Whirlpool, PHOTON and Grøstl-256 hash funtions based on Gilbert’s work at ASIACRYPT’14, which are the best known-key distinguishers so far. Besides that, we present a secret-key statistical integral distinguisher on 5-round AES under chosen-ciphertext mode. This is the best integral distinguisher on AES with secret S-box under secret-key setting. As a future work, we try to apply more statistical techniques to the field of symmetric cipher and find improved attack on AES and AES-like ciphers.