Keywords

1 Introduction

In recent years many lightweight block ciphers have been proposed targeting resource-constrained platforms. They adopt simple key schedules to get competitive performance figures in terms of the resource requirements. In this regard not a few of them have linear key schedules. (e.g. Gift [3], Skinny [6], Midori [2], Simon [4], Zorro [21], Prince [14], Led [22], Piccolo [39], Katan [15].) However, there are little cryptanalytic techniques that are applicable to general block ciphers of such a kind. In this work, we will present a framework for the related-key linear attack that can be applied to generic iterative block ciphers with linear key schedules. Since the linear attack was publicized by Matsui [33], there have been many extensions such as attacks using linear hulls [34, 36], multiple linear attacks [7, 27], multidimensional linear attacks [17, 23, 25] and zero-correlation attacks [13]. Though there are lots of works regarding the related-key attacks against block ciphers using differential characteristics, there are not many dealing with the related-key linear attacks. The current work tries to address this issue.

Table 1. Attack results on Simon

1.1 Our Contributions

  • We present a general framework for the related-key linear attack that is applicable to block ciphers with linear key schedules. It is based on classical Matsui’s Algorithms and makes use of a related-key linear approximations that can be obtained from an ordinary linear trail in a straightforward way. We also provide a statistical model for the attack from which we derive estimates for the success probability and the attack complexities.

  • We present experimental results that confirm the validity of our framework including the appropriateness of the statistical model we presume. We consider small-scale variants of Simon and a variant of Present with a linear key schedule for the experiments.

  • We present related-key linear attacks on Simon whose results are summarized in Table 1. The attacks cover more rounds than existing attacks and can be regarded to be better than the generic related-key attack [28] with known key differences and random plaintexts in terms of the attack complexities.

1.2 Related Works

Related-Key Linear Attacks. The idea of using related keys in linear attacks appears in a small number of previous works. Vora et al. [40] described an attack against a round-reduced DES based on a coding theory framework claiming that using related keys one can marginally improve the single-key linear attacks. Hermelin et al. [24] claim a related-key linear attack against the full Present-128 using very special types of chosen key differences based on some assumption regarding the capacity of multidimensional approximations. Bogdadov et al. [10] presented a key recovery attack using related-key linear distinguishers with chosen key differences. Their method works against block ciphers whose key schedules admit certain invariance property. The related-key attack presented in this work uses keys with known differences though it works against block ciphers with linear key schedules.

Linear Attacks Using a Linear Approximation with the Small Correlation. A linear approximation of a block cipher with correlation \(2^{-n/2}\) is usually considered not of much use except when it is exploited in a multiple linear attack together with other approximations. Beierle et al. [6] argue that the Skinny ciphers are secure against related-tweakey linear attacks by presenting bounds on the correlations of linear trails as the number of rounds increases, taking into account the fact that the attacker may utilize the tweakey as the additional data source. Kranz et al. [31] show that the linear tweak trails in such ciphers which can be used in the linear attack are exactly those coming from the ordinary trails. Ashur et al. [1] described a \(\chi ^2\) distinguisher detecting correlations smaller than \(2^{-n/2}\) in the multi-key setting. But they were not able to use the distinguisher for key recovery.

Generic Related-Key Attacks. Kelsey et al. [28] mentioned a related-key attack that can be applied to any block cipher, referring to [42]. The attack uses keys with known differences and the product of the number of related keys and the computational complexity is \(2^k\) in the attack. But the plaintexts are required to be the same regardless of the keys in the attack. So in the known plaintext setting the attacker needs to get about \(M 2^n\) pairs of key difference and plaintext to get \(M \gg 1\) related keys with the same plaintext. Thus the product of the data size and the computational complexity is about \(2^{k+n}\) in such a setting.

1.3 Organization of the Paper

In Sect. 2 we introduce the terminology and notations used in the paper. In Sect. 3 we describe the framework for the related-key linear cryptanalysis against block ciphers with linear key schedules. In Sect. 4 we present attack results on Simon obtained from the framework presented in Sect. 3 together with some dedicated analysis. In Sect. 5 we provide experimental results that corroborate the claims of the paper. In Sect. 6 we discuss the validity and the usefulness of the framework in more detail. We conclude in Sect. 7.

2 Terminology and Notations

\(\mathbb {F}_2\) and \(\mathbb {Z}\) denote the field with two elements and the ring of integers, respectively. A word is a bit string of the length w = 12, 16, 24, 32, or 64. For integers i, j with \(i \le j\), [i..j] denotes the set of integers x such that \(i \le x \le j\). The LSB (least significant bit) of a word is indexed as 0 and is located at the rightmost position. The \((i+1)\)-th rightmost bit of a word x is denoted by x[i] so that x[0] denotes the LSB of x. For a w-bit word x, x[i] with \(i \notin [0..(w-1)]\) means \(x[i \mod w]\). Also x[i..j] denotes the bit string \(x[j]\Vert \ldots \Vert x[i]\) for \(0 \le i \le j < w\). \(x \lll a\) and \(x \ggg a\) denote the circular shift of a word x to the left and right by a bits, respectively. \(\wedge \) represents the bitwise-and of two words. The inner product of a w-bit mask \(\gamma \) and a w-bit value x is defined to be \(\oplus _{i=0}^{w-1} \gamma [i] x[i]\) and is denoted by \( \langle \gamma , x \rangle \). For a Boolean function \(G: \mathbb {F} _2 ^l \rightarrow \mathbb {F}_2\), the correlation of G is defined to be the imbalance \( (| \{x: G(x) = 0 \} | - | \{x: G(x) = 1 \} | ) / 2^l \). For a vectorial Boolean function \(F: \mathbb {F} _2 ^l \rightarrow \mathbb {F} _2 ^m\), an l-bit mask \(\gamma \), and an m-bit mask \(\lambda \), the (linear) correlation of F with respect to the mask pair \((\gamma ,\lambda )\) is defined to be the correlation of the Boolean function G given by \(G(x) = \langle \gamma , x \rangle \oplus \langle \lambda , F(x) \rangle \) and is denoted by \(\varepsilon _F (\gamma , \lambda )\). The Hamming weight of a word x, denoted by \(\text {wt}(x)\), is the number of the nonzero bits of x. For a bit string X with even length, \(X_{\mathrm {L}}\) and \(X_{\mathrm {R}}\) denote the left half and right half of X, respectively. The support of a w-bit word x is defined to be the set of indices \(\{i \in [0..(w-1)] : x[i] \ne 0 \}\) and is denoted by \(\mathsf {supp}(x)\). \(\Vert \) denotes the concatenation of bit strings. Bit strings are expressed in the hexadecimal representations. For example, \(\texttt {c201}\) represents the bit string 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1. For real numbers \(\mu \) and \(\sigma >0\), \(\mathcal {N} (\mu , \sigma ^2)\) denotes the normal distribution with the mean \(\mu \) and the standard deviation \(\sigma \). \(\varPhi \) denotes the cumulative distribution function of the standard normal distribution.

3 Description of the Framework

Fig. 1.
figure 1

A long-key cipher

In our related-key linear attack, the attacker takes advantage of related-key data such that each entry in the data is a triplet \((P, C, \varDelta K)\) of a plaintext P, a ciphertext C, and a key difference \(\varDelta K\) for which the ciphertext is the encrypted value \(E_{K^* \oplus \varDelta K} (P)\) of the plaintext under the key that is the xor of the unknown base key \(K^{*}\) to be recovered and the key difference \(\varDelta K\). The attack proceeds as in the classical Matsui’s Algorithms [33]. In the classical Algorithm 2 using a linear trail, for example, the attacker uses a linear approximation that involves masked intermediate values and a parity bit expressed as an xor of masked base round keys. But in our related-key linear attack, the attacker makes use of a related-key linear approximation that involves masked key differences together with the masked intermediate values and the parity bit. The related-key linear attack is based on the following features of the block ciphers with linear key schedules:

  • When a key K is the xor of an unknown base key \(K^*\) and a known key difference \(\varDelta K\), the difference of round keys derived from K and \(K^*\) can be computed directly from \(\varDelta K\) though two keys are unknown.

  • The intermediate state obtained from a plaintext by performing several encryption rounds with a key K can be also computed using P, \(\varDelta K\), and the round keys derived from \(K^*\). The same holds for the decryption rounds.

3.1 A Related-Key Linear Approximation

Let R, r, and s be integers with \(0 \le s \le s+r \le R\) and let \(E: \mathbb {F}_2^k \times \mathbb {F}_2^n \rightarrow \mathbb {F}_2^n\) be an R-round key-alternating iterative block cipher with k-bit keys and n-bit blocks. Let \(\tilde{E}\) be the long-key cipher corresponding to E and \(\psi \) be the key scheduling function that is linear. That is, \(\tilde{E}\) is a function \(\mathbb {F}_2^{Rn} \times \mathbb {F}_2^n \rightarrow \mathbb {F}_2^n\) defined by

$$\tilde{E}(rk_0 \Vert rk_1 \Vert \cdots \Vert rk_{R-1}, x) = F_R( rk_{R-1} \oplus \cdots F_2( rk_1 \oplus F_1 (rk_0 \oplus x)) \cdots )$$

as in Fig. 1, where each \(F_i\) is a fixed n-bit permutation, \(\psi \) is a function \(\mathbb {F}_2^k \rightarrow \mathbb {F}_2^{Rn}\), and \(E(K,x) = \tilde{E}(\psi (K), x)\) for \((K,x) \in \mathbb {F}_2^k \times \mathbb {F}_2^n\). Suppose that we have an r-round linear trail \([\gamma _s, \gamma _{s+1}, \ldots , \gamma _{s+r}]\) for \(\tilde{E}\) such that the correlation \(\varepsilon _{F_{i+1}}(\gamma _i, \gamma _{i+1})\) for the \((i+1)\)-th round is \(\epsilon _i\) for each \(i \in [s..(s+r-1)]\). It is well-known that the average of the correlations over long keys is \(\epsilon = \epsilon _s \cdots \epsilon _{s+r-1}\). That is,

(1)

where \(\mathbf {rk} = rk_0 \Vert rk_{1} \Vert \cdots \Vert rk_{R-1}\) and \(\tilde{E}_i^{j}\) is the subcipher of \(\tilde{E}\) spanning from the \((i+1)\)-th round to the \((j+1)\)-th round. (See e.g. [35].) Let \(K^*\) be the fixed unknown key to be recovered. Let \(\psi (K^*) = \mathbf {rk}^* = rk_0^* \Vert rk_1^* \Vert \cdots \Vert rk_{R-1}^*\). For each key K, let \(\varDelta K = K \oplus K^*\). Since \(\psi \) is linear, \(\mathbf {\delta rk} := \psi (K) \oplus \psi (K^*)\) is determined by \(\varDelta K\). Let \(\mathbf {\delta rk} = \delta rk_0 \Vert \delta rk_{1} \Vert \cdots \Vert \delta rk_{R-1}.\) By (1),

which means that the correlation of the approximation

(2)

i.e. the imbalance of the approximation as \((x, \varDelta K) \) takes all the values in \(\mathbb {F}_2^{n+k}\), is the same regardless of \(K^*\) and is very close to \(\epsilon \). Since \(\psi \) is linear, we have a linear function \(L_\psi \) and a constant \(C_\psi \) with \(\psi (K) = L_\psi (K) \oplus C_\psi \) for each key K. So for each \(i \in [0..(r-1)]\), we have a linear relation

$$\begin{aligned} \langle \bar{\gamma } _{s+i}, \varDelta K\rangle \oplus \langle \gamma _{s+i}, \delta rk_{s+i} \rangle = 0 \end{aligned}$$
(3)

where \(\bar{\gamma } _{s+i}\) is a mask determined from \(\gamma _{s+i}\) and \(L_\psi \). Now, using the approximation (2) and the relations (3), we get a linear approximation

(4)

whose correlation, i.e. the imbalance of the approximation as \((x, \varDelta K) \) takes all the values in \(\mathbb {F}_2^{n+k}\), is very close to \(\epsilon \). We will call each of (2) and (4) a related-key linear approximation.

Assumption 1

The correlation of the related-key linear approximation (2) is \(\epsilon \).

We will also call the parity bit of the approximation. In Matsui’s Algorithm 1, the attacker tries to recover only the parity bit and the number of attacked rounds is the same as the number of the rounds that the linear approximation spans over. Matsui’s Algorithm 2 tries to add outer rounds to the linear approximation and recover some outer round key bits and, if possible, the parity bits. We will describe the corresponding attacks in our related-key setting.

3.2 Description of Algorithm RKLC-1

In this variant of Matsui’s Algorithm 1, we try to recover the parity bit without added outer rounds. So \(s = 0\) and \(r = R\). Suppose that we have a related-key linear approximation (4) with the correlation \(\epsilon \). Let a random related-key data \(D =\{(P_i , C_i, \varDelta K_i): i =1, \ldots , N \}\) for a key \(K^*\) be given. Compute

If \( \epsilon \tau _0(K^*, D) > 0\), then determine the parity bit to be 0 and otherwise determine it to be 1.

3.3 Description of Algorithm RKLC-2

Fig. 2.
figure 2

The outline of RKLC-2

Now we will describe the related-key attack that tries to add outer rounds to an r-round related-key linear approximation (4) and recover some of the outer round key bits together with the parity bit. Assume that we have the approximation (4) with the correlation \(\epsilon \). Let a random related-key data \(D =\{(P_i , C_i, \varDelta K_i): i =1, \ldots , N \}\) be given. We let \(z_I^* = \bigoplus _{i=0}^{r-1} \langle \gamma _{s+i}, rk_{s+i}^* \rangle \) be the parity bit and \(z_I\) be the candidate value for \(z_I^*\). We perform an attack using (4) as the distinguisher: We identify the positions of the outer round key bits that are required to compute \(\langle \gamma _s, X_s^K \rangle \oplus \langle \gamma _{s+r}, X_{s+r}^K \rangle \) with the triplets of plaintext, ciphertext and the key difference where \(X_s^K\) and \(X_{s+r}^K\) are the intermediate states for the start of the s-th round and the end of the \((s+r-1)\)-th round, respectively, that is \(X_{s}^K = E_0^{s-1}(K \oplus \varDelta K, P)\) and \( E_{s+r}^{R-1} (K \oplus \varDelta K, X_{s+r}^K) = C\). (See Fig. 2. Here bits of \(\delta rk_i\) for outer rounds that are used in the computation are computable directly from \(\varDelta K\).) By allocating a bit value in each of the positions and then concatenating, we get a candidate outer key z. Thus can be expressed as \(g(z, P, C, \varDelta K)\) for some function g. We denote

$$ |\{i: g(z, P_i, C_i, \varDelta K_i) = 0 \}| - |\{i: g(z, P_i, C_i, \varDelta K_i) = 1 \}|$$

by \(\tau (K^*, D, z)\) and calll \(\tau (K^*, D, z)/N\) the observed sample imbalance. Let \(z^*\) denote the correct outer key, i.e. the value of z obtained from \(K^*\). Heuristically, if z is correct, then \(\tau (K^*, D, z)/N\) is likely to be close to \((-1)^{z_I^*} \epsilon \) and otherwise the imbalance is likely to be close to 0. For the actual attack one usually performs the data compression first to reduce the computational complexity. The data compression in a linear attack is a process that collapses the data into a new data with multiplicity considering the outer round computations. It is part of the distillation [7] and is also called the linear compression by others [32]. But the data compression in our related-key setting needs to handle the key differences unlike that in the single-key linear attacks. So the “compression function” \(H_c: \mathbb {F}_2 ^{2n+k} \rightarrow \mathbb {F}_2^d\) with \(2^d \ll N\) we need to get for the data compression is one such that the computation of \(g(z, P,C,\varDelta K)\) can be carried out using z and \( H_c (P, C, \varDelta K)\) or such that there is a function h such that \(g(z, P, C,\varDelta K) = h(z, H_c (P, C, \varDelta K))\) for any \((P, C, \varDelta K)\). Once we have a compression function, we apply it to the data to get the compressed data \(\{(v,n_v) \in \mathbb {F}_2^d \times \mathbb {Z}: n_v = |\{(P,C,\varDelta K) \in D: H_c (P,C,\varDelta K) = v \}| \} \). We determine whether \((z, z_I)\) is correct or not by the decision rule \( (-1)^{z_I} \tau (K^*, D, z) \epsilon \ge t N \epsilon ^2\). Here t is the threshold parameter that enables us to get a tradeoff between the computational complexity and the success probability. To summarize, the attack proceeds as in Algorithm 1. If h(zv) can be expressed as \(h'(z \oplus v)\), we can use the FWHT to reduce the computational complexity as described in [18]. Consider the list of \((z, z_I)\)’s for which \( (-1)^{z_I} \tau (K^*, D,z) \epsilon \ge t N \epsilon ^2\) in the attack. The attack is successful if \((z^*, z_I^*)\) is in the list, and the list may also contain many wrong entries that are called the false alarms.

figure a

3.4 Statistical Model, Success Probability and Attack Complexities

For our related-key attacks, we presume the “standard” key hypotheses that are similar to the ones accepted as valid in the ordinary linear attack using a dominant trail. But for that, we assume that the data D is random and that the round function of the cipher in consideration is not too simple. We will see in Sect. 5.2 that when the trail is not dominant and the number of plaintexts per each key difference in the data gets larger, such hypotheses get less pertinent. Under the standard hypotheses, we get the same estimates for the success probability and the attack complexities in terms of the data size and the correlation as in many previous works (e.g. [8, 38]). But we will clarify our hypotheses in the related-key setting and elaborate on the details. We let \(c_{N,\epsilon } := \sqrt{N}|\epsilon |\) for each \(N>0\) and \(\epsilon \).

Algorithm RKLC-1. Let us consider the attack in Sect. 3.2 that uses a random data D of size N. If we fix \(K^*\) and let D vary, \(\tau _0 (K^*, D)/N\) can be regarded as a random variable. For the attack, we presume the following:

Hypothesis 1

For each \(K^*\), \(\tau _0 (K^*, D)/N\) follows \(\mathcal {N}((-1)^b \epsilon , 1/N)\) where b is the parity bit.

With this hypothesis, the success probability of the attack is \(\varPhi (\sqrt{N}|\epsilon |)\) by Lemma 1.

Lemma 1

Let \(\sigma >0, b, \mu \) be real numbers and let Y be a random variable with \(Y \sim \mathcal {N} (\mu , \sigma ^2)\) . Then \(\mathrm {Pr} \left( Y \ge b \right) = \varPhi \left( (\mu - b)/\sigma \right) \) and \(\mathrm {Pr} \left( Y \le b \right) = \varPhi \left( (b - \mu )/\sigma \right) \).

Algorithm RKLC-2. Now we consider the attack in Sect. 3.3. We fix \(K^*\) and let \(z^*\) be the correct outer key. For each outer key z, we can regard \(\tau (K^*, D,z)/N\) as a random variable letting D vary. The right key hypothesis and the wrong key hypothesis we presume are the following:

Hypothesis 2

(Right Key Hypothesis) . For each \(K^*\), \(\tau (K^*, D,z^*)/N\) follows \(\mathcal {N} ( (-1)^{b^*}\epsilon , (1-\epsilon ^2)/N) \approx \mathcal {N} ((-1)^{b} \epsilon , 1/N)\), where \(b^*\) is the parity bit.

Hypothesis 3

(Wrong Key Hypothesis) . For each \(K^*\), \(\tau (K^*, D,z)/N\) follows \(\mathcal {N} (0, 1/N)\) when \(z \ne z^*\).

We let \(\mathcal {Z}\) be the set of the candidate outer keys and let \(k_O = \log _2 |\mathcal {Z}|\). Let t be the threshold parameter. The success probability of the attack is \(\mathrm {Pr}((-1)^{b^*}\tau (K^*, D,z^*)\epsilon \) \(\ge tN\epsilon ^2)\), which equals \(\varPhi ((1-t) c_{N,\epsilon })\) by Lemma 1 under Hypothesis 2. Using the same Lemma, we also have

  • for \((z, z_I)\) with \(z \ne z^*\), the probability that \((-1)^{z_I} \tau (K^*, D,z) \epsilon \ge t N \epsilon ^2\) is \(\varPhi (-t c_{N,\epsilon })\), and

  • for \((z, z_I)\) with \(z = z^*\) and \(z_I \ne z_I^*\), the probability that \((-1)^{z_I} \tau (K^*, D,z) \epsilon \ge t N \epsilon ^2\) is \(\varPhi ((-1-t)c_{N,\epsilon })\).

under Hypothesis 3. So the false alarm probability is

$$ \begin{array}{l} \mathrm {Pr}( \mathbf {cond}, (z,z_I) \ne (z^*, z_I^*)) =\mathrm {Pr}(\mathbf {cond}, z \ne z^*) + \mathrm {Pr}(\mathbf {cond}, z = z^* , z_I \ne z_I^*) \\ = \mathrm {Pr}(\mathbf {cond} \; | \; z \ne z^*) \mathrm {Pr}(z \ne z^*) + \mathrm {Pr}(\mathbf {cond} \; | \; z = z^* , z_I \ne z_I^*) \mathrm {Pr}( z = z^* , z_I \ne z_I^*) \\ = (2^{k_O}- 1)\varPhi (-tc_{N,\epsilon })/2^{k_O} + \varPhi ((-1-t)c_{N,\epsilon })/2^{k_O + 1}, \end{array} $$

where \(\mathbf {cond}\) is short for the statement \( (-1)^{z_I} \tau (K^*, D,z) \epsilon \ge t N \epsilon ^2 \).

Theorem 1

With N, \(\epsilon \), t as described, the success probability of RKLC-2 is \(\varPhi ((1-t) c_{N,\epsilon })\) and the false alarm probability \(p_\mathrm{{fa}}(t)\) is \( (2^{k_O}- 1)\varPhi (-tc_{N,\epsilon })/2^{k_O} + \varPhi ((-1-t)c_{N,\epsilon })/2^{k_O + 1}\).

Note that \(p_\mathrm{{fa}}(t) \approx \varPhi (-tc_{N,\epsilon })\) when \(k_O\) is not too small. To compare the computational complexity of the attack with that of the exhaustive key search, we say that the complexity of 1 encryption (including the key schedule) is 1. Let \(c_p\) be the complexity of 1 computation of \(H_c\) and \(c_o\) be the complexity of 1 computation of h using an entry in the compressed data and a candidate outer key. Then the computational complexity of RKLC-2 is \(c_p N + c_o 2^{d + k_O} + 2^{k} p_\mathrm{{fa}}(t) \) by Theorem 1. The amount of memory required for the attack is \(O(2^d)\). Let \(c_a\) be the complexity of addition or subtraction of two integers. In many cases, we can reduce the computational complexity by using FWHT:

Theorem 2

The computational complexity of RKLC-2 using FWHT is

$$c_p N + 3 c_a k_O 2^{k_O+1} + 2^{k} p_\mathrm{{fa}}(t), $$

with the success probability \(\varPhi ((1-t) c_{N,\epsilon })\).

But in this case, the memory complexity is \(O(2^{k_O })\). Here we have assumed that the restored outer key z and the parity bit reveal simple independent relations between the bits of \(K^*\), meaning that using the \((k_O+1)\)-bit information that \((z^*, z_I^*)\) reveals about \(K^*\), we can recover the whole k bits of \(K^*\) using other simple \((k-k_O-1)\) relations between the bits of \(K^*\). This is mostly the case when the key schedule is linear.

Fig. 3.
figure 3

A round of Simon and a 1-round linear trail

4 Related-Key Linear Attacks on Round-Reduced Simon

The NSA published two families of lightweight block ciphers Simon and Speck [4]. They have remarkable performance figures on most software and hardware platforms and Simon is the more hardware-oriented of the two. They have been the subject of intensive security analysis since their publication. The designers of Simon expect that it is secure against related-key attacks [5].

4.1 The Simon Family of Block Ciphers

Simonn/k is a block cipher of the classical Feistel structure with k-bit keys and n-bit blocks. Its round function f sends an n/2-bit input x onto \(((x \lll 8) \wedge (x \lll 1)) \oplus (x \lll 2)\).(See Fig. 3.) It has a linear key schedule. We focus on the following ciphers whose key lengths are double the block lengths: Simon32/64, Simon48/96, Simon64/128, and Simon128/256. The details of this section can be applied equally well to the variant of Simon to be used in Sect. 5.

Fig. 4.
figure 4

2-round computations of Simon

4.2 Related-Key Linear Approximations of Simon

Since Simonn/k has the classical Feistel structure, an r-round linear trail can be represented as a sequence of \((r+2)\) n/2-bit masks: \(\varGamma _s. \varGamma _{s+1}. \cdots .\varGamma _{s+r+1}\) represents a linear trail such that at the (\(i+1\))-th round, the input and output masks are \(\varGamma _i \Vert \varGamma _{i+1}\) and \(\varGamma _{i+1}\Vert \varGamma _{i+2}\), respectively, for each \(i \in [s .. (s+r-1)]\). (See Fig. 3.) Such a linear trail leads to the related-key linear approximation

(5)

between \(X_s\), \(\varDelta K\), \(X_{s+r}\), where \(\varLambda \) is a mask with \( \langle \varLambda , \varDelta K \rangle \) = . Such a mask \(\varLambda \) can be easily obtained.

4.3 Adding Outer Rounds

One of the pivotal processes of the related-key attacks is to get effective data compression with the related-key linear approximation. For Simon, we can get effective data compression for prepending and appending many rounds when both the initial mask and the final mask of the linear trail have small Hamming weights. In this subsection, we will explain in detail how to add 2+2 rounds, i.e., how to prepend 2 rounds and append 2 rounds at the same time. How to add \(s + s\) rounds for \(s=3,4,5\) will be explained in Sect. C of the Appendix, from which how to add \(s+s'\) rounds for \(s \ne s'\) and \(2 \le s, s' \le 5\) will be obvious. For simplicity \(a+b\) and ab denote the XOR and AND of \(a, b \in \mathbb {F}_2\) in this section, respectively.

2-round Computation. Let \(rk_0\) and \(rk_1\) be the round keys derived from the candidate key K for the first 2 rounds. For a plaintext \(P = P_{\mathrm {L}} \Vert P_{\mathrm {R}}\) and a key difference \(\varDelta K\), let \(\delta rk_0\) and \(\delta rk_1\) be the derived round key differences for the first 2 rounds. Note that \(\delta rk_0\) and \(\delta rk_1\) can be computed directly from \(\varDelta K\). We want to express each bit of \(X_2 = X_{2,\mathrm {L}}\Vert X_{2,\mathrm {R}} = E_0^1(K \oplus \varDelta K, P)\) in terms of P, \(rk_0\), \(rk_1\), \(\delta rk_0\), and \(\delta rk_1\). Let \(A = f(P_{\mathrm {L}}) \oplus P_{\mathrm {R}} \oplus \delta rk_0\), and \(B = P_{\mathrm {L}} \oplus \delta rk_1 \). (See Fig. 4.) Since \(X_{2,\mathrm {R}} = X_{1,\mathrm {L}} = f(P_{\mathrm {L}}) \oplus P_{\mathrm {R}} \oplus \delta rk_0 \oplus rk_0 = rk_0 \oplus A\) and \(X_{2,\mathrm {L}} = f(X_{1,\mathrm {L}}) \oplus P_{\mathrm {L}} \oplus \delta rk_1 \oplus rk_1 = f(rk_0 \oplus A) \oplus B \oplus rk_1\),

$$\begin{aligned} X_{2,\mathrm {L}}[i]= & {} (rk_0[i-1] + A[i-1])(rk_0[i-8] + A[i-8]) \\&+ \underline{rk_0[i-2]}+A[i-2] + B[i] + \underline{rk_1[i]}, \\ X_{2,\mathrm {R}}[i]= & {} \underline{rk_0[i]} + A[i] \end{aligned}$$

Here \(X_{2,\mathrm {L}}[i]\) can be computed in terms of \(rk_0[i-1]+ A[i-1], rk_0[i-8]+ A[i-8]\), up to \(A[i-2]+B[i]\) xored with a constant determined only by \(rk_0, rk_1\). Note that the underlined terms do not mingle with the plaintext so that we will xor them with the parity bit to get an “adjusted parity bit” in RKLC-2. Otherwise the number of round key bits to restore and, hence, the attack complexity can be increased. By symmetry of the cipher structure, we get similar expressions for bits of \(X_{R-2, \mathrm {R}}\) and \(X_{R-2, \mathrm {L}}\) in terms of \(A' = f(C_{\mathrm {R}}) \oplus C_{\mathrm {L}} \oplus \delta rk_{R-1}\), \(B' = C_{\mathrm {R}} \oplus \delta rk_{R-2}\), \(rk_{R-2}\), and \(rk_{R-1}\). (See Fig. 4.)

The Data Compression. The above arguments tell us how to compress the data when adding 2+2 rounds. Suppose that we want to make use of the related-key linear approximation represented as (5) with \(s = 2\) and \(s+r+2 = R\). Let \(w = n/2\) be the word size. Let \(\mathcal {I}_{\mathrm {L}}\) = \(\mathsf {supp} (\varGamma _s)\) = \(\{ i \in [0 .. (w-1)]: \varGamma _s [i] \ne 0 \}\), \(\mathcal {I}_{\mathrm {R}} = \mathsf {supp} (\varGamma _{s+1})\), \(\mathcal {I}'_{\mathrm {L}} = \mathsf {supp} (\varGamma _{R-2})\), and \(\mathcal {I}'_{\mathrm {R}} = \mathsf {supp}(\varGamma _{R-1})\). The compression function extracts the following values from each data entry \((P, C, \varDelta K)\):

  • A[i] for i such that \((i+1) \; \mathrm {mod} \; w \in \mathcal {I}_{\mathrm {L}}\) or \( (i+8) \; \mathrm {mod} \; w \in \mathcal {I}_{\mathrm {L}}\)

  • \(A'[i]\) for i such that \((i+1) \; \mathrm {mod} \; w \in \mathcal {I}'_{\mathrm {R}}\) or \( (i+8) \; \mathrm {mod} \; w \in \mathcal {I}'_{\mathrm {R}}\)

The outer keys consists of the following outer round key bits that we need to guess:

  • \(rk_0[i]\) for i such that \((i+1) \; \mathrm {mod} \; w \in \mathcal {I}_{\mathrm {L}}\) or \( (i+8) \; \mathrm {mod} \; w \in \mathcal {I}_{\mathrm {L}}\)

  • \(rk_{R-1}[i]\) for i such that \((i+1) \; \mathrm {mod} \; w \in \mathcal {I}'_{\mathrm {R}}\) or \( (i+8) \; \mathrm {mod} \; w \in \mathcal {I}'_{\mathrm {R}}\)

The adjusted parity bit is . Note that the number \(k_O\) of guessed round key bits for outer rounds is at most \(2 \text {wt}(\varGamma _s) + 2 \text {wt} (\varGamma _{R-1})\) and d, \(\log _2\) of the size of the compressed data, is \(k_O + 1\). Note also that using above data compression, we can use FWHT in RKLC-2.

4.4 Attacks on Round-Reduced Simon

Now we will present the attacks on round-reduced Simon summarized in Table 1. Note that we compare our attacks with the current best linear attacks [16] and differential attacks [41] only since the differential/linear attacks are the most limiting attacks on Simon as noted in [5]. Note also that there does not exist a related-key attack that is more efficient than such attacks yet (cf. [30]).

Each of our attacks is an instance of the RKLC-2 and we will specify the positions of the outer round key bits that will constitute the outer keys. We use the linear trails presented in Sect. B of the Appendix. Note that each of them has squared correlation somewhat larger than \(2^{-(n+k)/2}\) and the product of the computational complexity and the data complexity of the presented attack using it is less than \(2^{k+n}\). So each presented attack is an effective one not covered by the generic attack in [28]. We set the threshold parameter t for each attack to 1 so that the success probabilities are all \(\varPhi (0)= 0.5\) by Theorem 1. Also we have \(k_O \ll {\mathrm {log}}_2 (N)\) and use FWHT in each attack so that we estimate the computational complexity as \(c_p N + 2^k \varPhi (-\sqrt{N}|\epsilon |)\) by Theorem 2. We use the estimate \(c_p = R_{\text {add}}/R\), where R is the number of rounds of the round-reduced Simon and \(R_{\text {add}}\) is the number of the added outer rounds. The attack method presented in this work can be applied to other Simonn/k with \(k>n\) in a straightforward manner.

Simon 32/64. We use a 16-round linear trail with the correlation \(2^{-21}\) whose initial and final mask are 40000001 and 00400110, respectively. We can add 4+3 rounds to this linear trail with \(k_O = 35 \): The guessed outer round key bits are \(rk_0[0,2,3,4,6,7,9,10\) 11, 12, 13, 14], \(rk_1[4,5,6,11,12,13,14]\), \(rk_2[6,13]\), \(rk_{21}[0,3,7,12]\), and \(rk_{22}\) [1,2,4,5,6,8,10,11,14,15]. Letting \(N = 2^{46.3}\), \(c_{N,\epsilon } =2^{2.15}\) so that the complexity of RKLC-2 on the 23-round reduced Simon32/64 is \(2^{46.65}\) by Theorem 2.

Simon 48/96. We use a 20-round linear trail with the correlation \(2^{-33}\) whose initial and final mask are 400000000001 and 400000100001, respectively. We add 5+3 rounds to this linear trail with \(k_O = 53\). Letting \(N = 2^{70.9}\), \(c_{N,\epsilon } = 2^{2.45}\) and the complexity of RKLC-2 on the 28-round reduced Simon48/96 is \(2^{71.07}\).

Simon 64/128. We use a 26-round linear trail with the correlation \(2^{-45}\) whose initial mask is 0000000100004044 and final mask is 0000100000004400. We add 4+4 rounds to this linear trail with \(k_O < 80\). Letting \(N = 2^{95.32}\), \(c_{N,\epsilon } = 2^{2.66}\) and the complexity of RKLC-2 on the 34-round reduced Simon64/128 is \(2^{95.5}\).

Simon 128/256. We use a 51-round linear trail with the correlation \(2^{-92}\) whose initial and final mask are 00...004\(\Vert \)00...00 and 00...001\(\Vert \)400...004, respectively. We get an attack on the 62-round reduced Simon128/256 with data complexity \(2^{190.4}\) and computational complexity \(2^{190.76}\) by adding 6+5 rounds. Also, using a 45-round subtrail with the correlation \(2^{-84}\) whose initial and final mask are 100...001\(\Vert \)4400...004 and 400...004\(\Vert \)100...00, respectively, we get an attack on the 55-round reduced Simon128/256 with data complexity \(2^{174.73}\) and computational complexity \(2^{175}\) by adding 5+5 rounds.

5 Experiments

In this section, we carry out experiments using three block ciphers. Two of them are small-scale variants of Simon. The other is Present-L that is a variant of Present-128 that has a linear key schedule.

5.1 Experiments with Variants of Simon

Related-Key Attacks on the 22-round Simon24. We describe experimental results on Simon24/48 that is a 22-round cipher with 48-bit keys and 24-bit blocks. The round function and the key schedule of the cipher are defined exactly in the same way as Simon32/64. The 31-bit constant used in the key schedule is also the same. We try to add 2+2 rounds to the 18-round linear trail 001.000.001.410.001.000.001.410.001.000.001.410.001.000.001.410.001.000.001.400 with the correlation \(\epsilon = 2^{-17}\). The guessed outer round key bits are \(rk_0[4,11], rk_{21}[2,9]\), the number \(k_O\) of the guessed outer round key bits is 4, and the size of the compressed data is \(2^5\) by arguments in Sect. 4. The additional bit to be guessed is the adjusted parity bit \( rk_0[10] \oplus rk_1[0] \oplus rk_{20}[10] \oplus rk_{21}[0] \oplus rk_{21}[8] \oplus \langle \mathtt{{000}}, rk_2\rangle \oplus \langle \mathtt{{001}}, rk_3\rangle \oplus \cdots \oplus \langle \mathtt{{001}}, rk_{19}\rangle \). In the experiment we repeat the key recovery tests using 1,000 different keys \(K^*\). For each key, we generate data of size N for \(N = 2^i \epsilon ^{-2}\) with \(i=-1,0,1,2,3,\) and 4. The number \(\nu \) of data entries per key difference was fixed to as large as \(2^{16}\). For each N, we compute the threshold parameters corresponding to \(p_S = 0.5, 0.6, 0.7, 0.8\), and 0.9 using Theorem 1 and proceed as in Algorithm 1. We count the number of the successful attempts and measure the average of the number of false alarms for the 1,000 tests. The result is shown in Fig. 5 from which we can see that the experimental probabilities are close to the theoretical ones.

Fig. 5.
figure 5

Experimental results for 22-round key recovery on Simon24

Related-Key Attacks on the 16-round Simon32. We try to add 2+2 round to the 12-round trail 0005.0000.0005.c001.1005.0110.0040.0100.0000.0100.0040.0110.0004.0111 with the correlation \(\epsilon = -2^{-17}\). The number \(k_O\) of guessed outer round key bits is 10 and we proceed as in the preceding section. Then we get the results as in Fig. 6.

Fig. 6.
figure 6

Experimental results for 16-round key recovery on Simon32

Fig. 7.
figure 7

Experimental results for 18-round single-key attack on Simon24

Single-Key Attacks on the 18-round Simon24. For comparison with the related-key linear attacks, we also perform a single-key linear attack using the linear hull containing the 14-round trail with correlation \(2^{-13}\) represented as 001.000.001.410.001.000.001.410.001.000.001.410.001.000.001.400. We try to add 2+2 rounds to the linear hull. By analyzing the distribution of the squared correlation of the linear hull over 1,000 keys, we noticed that the linear probability of the linear hull is about \(2^{-23}\). We also observed that the distribution of the correlation of the linear hull is close to \(\mathcal {N}(0, 2^{-23})\) as predicted by arguments in [19]. So we assume that the linear probability \(\epsilon _H^2\) of the linear hull is \(2^{-23}\) and the distribution of the correlation of the linear hull is \(\mathcal {N}(0, \epsilon _H^2)\). Then we apply Matsui’s Algorithm 2 presuming some adjusted key hypotheses for attacks using data sampled without replacement [8]: For the attack we let the data size N be \(2^{19}, 2^{20}, \ldots , 2^{23}\), or \(2^{24}\). We use the decision rule \(|\tau (K^*, D,z)/N| \ge t|\epsilon _H|\) with threshold parameters \(t=0.5, 1.0, 1.5\). The theoretical success probability and the false alarm probability for each (tN) are \(2 \varPhi ( -t \sqrt{N} |\epsilon _H|/\sqrt{1 - N/2^n + N \epsilon _H^2})\) and \(2 \varPhi (-t \sqrt{N} |\epsilon _H|)\), respectively [8, 9]. We observe that the experimental probabilities are close to the theoretical ones as shown in Fig. 7, confirming the analyses in [8] with linear hulls. Considering the success probabilities for each fixed (aN), where a is the advantage \(-{\mathrm {log}}_2(p_{\mathrm {fa}}(t))\) and N is the data size, the single-key linear attack using the linear hull with the linear probability \(2^{-23}\) is not so advantageous compared with the related-key linear attack (RKLC-2) using a linear trail with the linear correlation \(2^{-13}\) as we see in Fig. 8.

Fig. 8.
figure 8

Success probabilities for the single-key/related-key attack

5.2 Experiments with a Variant of Present-128

Let Present-L be a block cipher that originates from the same long key cipher as Present-128 and has a linear key schedule: The key schedule of Present-L is the same as that of Present-128 except that all the 4-bit S-boxes in the key schedule of Present-128 are removed. So each key schedule round of Present-L is just a rotation of the 128-bit state followed by xoring with a round constant. Let T-4R-B21 and T-4R-B42 be the 4-round trails with correlations \(2^{-8}\) such that the input and output mask for each round is 0000000000200000 and 0000040000000000, respectively. Let T-6R-B21 and T-6R-B42 be the 6-round trails with correlations \(2^{-12}\) defined similarly. Considering linear trails such that all the constituent masks have Hamming weight 1, we see that T-4R-B21 and T-4R-B42 have 2 and 1 other trails with the correlation \(\pm 2^{-8}\) in their linear hulls, respectively (cf. [36]). We also see that T-6R-B21 and T-6R-B42 have at least 26 and 7 other trails with the correlation \(\pm 2^{-12}\) in their linear hulls, respectively. In the experiments, we set the data size N to be \(\epsilon ^{-2}\), \(4\epsilon ^{-2}\), or \(16\epsilon ^{-2}\). We also set the number \(\nu \) of data entries per key difference to be 1, 8, or 64. We try to prepend 2 rounds before the 4-round trails or the 6-round trails. The results are as in Figs. 9 and 10. When \(\nu \) is large, the experimental probabilities may deviate considerably from the estimates given by Theorem 1. But when \(\nu \) is close to 1, as in the case of random sampling, they are close to the theoretically predicted ones. Rather surprisingly, results with 6-round trails are closer to predicted ones than with the 4-round ones, though the former are far less dominant in their linear hulls than the latter. We suspect that this has been caused by the nonrandomness of the data we have used. The details of the data are provided in the Appendix.

Fig. 9.
figure 9

Experimental results for 6-round key recovery using T-4R-B21

Fig. 10.
figure 10

Experimental results for 6-round key recovery using T-6R-B42

6 Discussions

6.1 Statistical Models

The standard key hypotheses are not adequate for most single-key linear attacks [8, 9, 11, 12]. But we claim that the standard key hypotheses we presume are adequate for our attacks in the related-key scenario where the attacker has random related-key data though the results in Sect. 5.1 shows that sometimes such key hypotheses are suitable even when the trail is far from dominant and the number of data entries per key difference is quite large. First, the standard right key hypothesis is applicable in our related-key setting since the correlation of the related-key linear approximation is the same regardless of \(K^*\) as mentioned in Sect. 3 while that of the single-key linear approximation obtained from a linear trail varies greatly depending on the key if the trail is not dominant. We do not claim that the standard wrong key hypotheses are adequate in the related-key attack regardless of the round structure of the block cipher. But we claim that such hypotheses are appropriate in the related-key scenario with random data if the round function is not too simple. In the extreme nontypical case when we have only one related key, our attack would become one in the single-key setting and we might have to consider some adjusted wrong key hypotheses especially when the data size is large. The reason is that the distribution of the correlations for wrong keys in the single-key setting has deviation \(O(2^{-n/2})\) [11] from Daemen and Rijmen’s analysis [19] on the distribution of the correlations of linear approximations for n-bit permutations. But in our related-key setting where the data is random, we just need to consider \((k+n)\)-bit-to-n-bit functions where the additional k bits come from key differences so the deviation of the distribution might be \(O(2^{-(k+n)/2})\) that is negligible compared to \(1/\sqrt{N}\) with the data size N being much smaller than \(2^{k+n}\). The standard wrong key hypothesis in Matsui’s Algorithm 2 in the single-key linear attack is not so valid especially when the linear probability of the linear hull is close to \(2^{-n}\). But the experimental results in Sect. 5.1 using a linear trail with the correlation whose absolute value is considerably less than \(2^{-n/2}\) corroborate our claims on the wrong key hypotheses as well as the right key hypotheses in our related-key attacks. We also note that the choice between sampling data with replacement and without replacement yields little difference in the success probability and the attack complexities due to the size of the whole data space.

6.2 Linear Trails, Linear Hulls, and Multiple Linear Approximations

The formulations in Sect. 3 indicate that in our related-key linear attacks it seems natural to use linear trails. The experimental results in Sect. 5 show that our estimates regarding the related-key linear attacks are accurate regardless of whether the trail is dominant or not when random data is used. In fact, none of the linear trail used in the experiments in Sect. 5 are dominant. For example, the linear correlation of the 18-round linear trail for Simon24 is \(2^{-17}\) whose square is quite less than the linear probability of the linear hull containing the trail that is larger than \(2^{-30}\). But the use of the key differences in our attacks seems to make it hard or inherently impossible to utilize the linear hulls. On the other hand, we would get related-key multiple linear attacks and related-key multidimensional attacks as straightforward derivations of the ones presented in [7, 23]. We expect that the key dependency issue in the original attacks will be mitigated in our related-key setting.

6.3 Comparison with Single-Key Linear Attacks

Our attack has significance even when it requires slightly more computation and data than the single-key linear attacks since it can utilize related-key data that the single-key attacks cannot exploit. But it is more meaningful when it is more effective than the generic related-key attack and covers more rounds than the single-key attacks. For a block cipher with a linear key schedule whose key length k is larger than the block length n, we need to find a linear trail with the correlation \(\approx \pm 2^{-(k+n)/4}\) considering the generic related-key attack. The attack will be likely to cover longer rounds than the single-key linear attack only when we can find such a trail that is longer than any single-key linear approximation with the linear probability \(\approx 2^{-n}\). The advantage of the related-key linear attack will be more visible for block ciphers with linear key schedules such that the linear hulls exploited in the single-key linear attacks contain dominant trails. For each of Simon ciphers, we could not cover much longer rounds with the related-key attacks than with the single-key attacks mainly because it admits considerably longer single-key linear characteristics with the linear probability \(\approx 2^{-n}\) originating from linear hulls than single-key linear trails with the correlation \(\approx \pm 2^{-n/2}\). For example, the linear attack on the 25-round reduced Simon48/k in [16] exploits a 16-round linear hull with linear probability \(2^{-42.92}\), while there does not exist a 16-round linear trail whose squared correlation is larger than \(2^{-50}\).

6.4 Application to Tweakable Blockciphers

The statistical model in this work can be slightly modified to provide a relevant framework for the linear attack on the tweakable blockciphers constructed from the tweakey framework [6, 26]. Though the linear attack using a linear approximation with squared correlation much smaller than \(2^{-n}\) against such a cipher is rather straightforward and was considered in [6, 31], it requires suitable statistical models. It seems that Assumption 1 can be adjusted to provide an adequate right key hypothesis for the attack. For example, it may be assumed that the average of the correlations over \((K^*, T)\) as the tweak T varies are very close to the correlation of the linear trail regardless of the base key \(K^*\). Note that the correlations of the related-key linear approximations are the same regardless of \(K^*\), and we just assume that they are the same as the correlation of the linear trail. A wrong key hypothesis that is similar to the one given in this work might be adopted for the attack.

7 Conclusions

We have introduced a general framework for the related-key linear cryptanalysis on block ciphers with linear key schedules. The attack is likely to cover more rounds than the existing linear attacks if the key length of the cipher is much larger than its block length. Using the framework, we are able to get effective related-key linear attacks on Simon that cover longer rounds than the previous attacks. Experiments with small-scale variants of Simon and a variant of Present corroborate the validity of the attack together with the suitability of the statistical model concerning the right keys and wrong keys. Some lightweight ciphers do not consider security against related-key attacks in its design criteria and the attack may be hard to apply in practice due to the requirement of large related-key data. But the attack is certainly one that should be taken into account in the design of a block cipher with a small block length and a linear key schedule that might be used in circumstances where frequent key update is inevitable. An additional feature of our attack is that the complexity of the attack does not depend much on the detail of the key schedule once the key schedule is linear, in contrast with other related-key attacks. As further works, one may try to apply the framework presented in this work to various block ciphers with linear key schedules other than Simon. Another important line of works would be to investigate the multiple linear attacks and multidimensional linear attacks in the related key setting to reduce the attack complexities.