1 Introduction

1.1 Background and Previous Work

Cryptanalysis of Block Ciphers. A block cipher is an encryption scheme which accepts an n-bit plaintext and transforms it into an n-bit ciphertext using a k-bit secret key. Block ciphers are the most widely used class of symmetric key primitives nowadays. Most of the modern block ciphers are iterative, i.e., consist of a sequence of simple operations called rounds repeated multiple times with small alterations. We denote a plaintext/ciphertext pair of a block cipher by (PC) and the n-bit state at the beginning of the r’th round of the encryption process by \(X_r\).

While the design of block ciphers is a well-developed field and various block cipher designs (most notably, the AES [36]) are widely accepted to provide strong security, there is no block cipher with a security proof that is fast enough for being used in practice. Instead, our confidence in the security of block ciphers stems from analyzing their resistance with respect to all known cryptanalytic techniques. Thus, development of cryptanalytic techniques is the main means for understanding the practical security of block ciphers.

Differential Cryptanalysis and Linear Cryptanalysis. The two central statistical techniques in cryptanalysis of block ciphers are differential cryptanalysis, introduced by Biham and Shamir [8], and linear cryptanalysis, introduced by Matsui [31].

Differential cryptanalysis studies the development of differences between two encrypted plaintexts through the encryption process. An r-round differential with probability p of a cipher is a property of the form \(\Pr [X_{i+r} \oplus X'_{i+r} = \varDelta _O | X_i \oplus X'_i = \varDelta _I] = p\), denoted in short \(\varDelta _I \xrightarrow {p} \varDelta _O\). Differential attacks exploit long (with many rounds) differentials with a non-negligible probability.

Linear cryptanalysis studies the development of parities of subsets of the state bits through the encryption process of a single plaintext. An r-round linear approximation with bias q is a property of the form \(\Pr [X_{i+r} \cdot \lambda _O = X_i \cdot \lambda _I] = \frac{1}{2}+q\), denoted in short \(\lambda _I \xrightarrow {q} \lambda _O\). (Recall that the scalar product of \(x,y \in \{0,1\}^n\) is defined as \((\sum _{i=1}^n x_i y_i) \bmod 2\).) Linear attacks exploit “long” approximations with a non-negligible bias.

Differential and linear cryptanalysis were used to mount the best known attacks on numerous block ciphers, most notably DES [35]. As a result, resistance to these two cryptanalytic techniques, and in particular, non-existence of high-probability differentials or high-bias linear approximations spanning many rounds of the cipher, has become a central criterion in block cipher design.

Differential-Linear Cryptanalysis and Other Combined Attacks on Block Ciphers. While precluding long differentials and linear approximations seems to be sufficient for making the cipher immune to differential and linear attacks, it turned out that in many cases, short characteristics and approximations can also be exploited to break the cipher. The first cryptanalytic technique to demonstrate this was differential-linear cryptanalysis (in short: DL technique), introduced by Langford and Hellman [27] in 1994. Langford and Hellman showed that if the cipher E can be decomposed as a cascade \(E=E_1 \circ E_0\), then a high-probability differential for \(E_0\) and a high-bias linear approximation for \(E_1\) can be combined into an efficient distinguisher for the entire cipher E. The DL technique was used to attack many block ciphers, and in particular, yields the best known attacks on the AES finalist Serpent [20, 30], the CAESAR [16] candidate ICEPOLE [22], COCONUT98 [4], Chaskey [28], CTC2 [30], etc.

Differential-linear cryptanalysis was followed by several other combined attacks. In particular, boomerang [38], amplified boomerang [24], and rectangle [3] attacks show that high-probability differentials in \(E_0\) an \(E_1\) can also be combined into an attack on the entire cipher. Other combinations include differential-bilinear, higher-order differential-linear, boomerang-linear attacks, etc. [7]. These combined attacks make non-existence of high-probability short differential and linear approximations a desirable (but harder to fulfill) criterion in block cipher design.

An Informal Description of the Differential-Linear Attack. The DL attack works as follows. Assume that we have a differential \(\varDelta _I \xrightarrow {p} \varDelta _O\) for \(E_0\) and a linear approximation \(\lambda _I \xrightarrow {q} \lambda _O\) for \(E_1\). In order to distinguish E from a random permutation, the adversary considers plaintext pairs with input difference \(\varDelta _I\) and checks, for each pair, whether the corresponding ciphertexts agree on the parity of the mask \(\lambda _O\).

Denote the plaintexts by \(P,P'\), the ciphertexts by \(C,C'\), and the intermediate values between \(E_0\) and \(E_1\) by \(X,X'\), respectively. The attack combines three approximations: The values \(C \cdot \lambda _O\) and \(C' \cdot \lambda _O\) are correlated to \(X \cdot \lambda _I\) and \(X' \cdot \lambda _I\), respectively, by the linear approximation for \(E_1\). The values \(X \cdot \lambda _I\) and \(X' \cdot \lambda _I\) are correlated, as consequence of the differential for \(E_0\). Hence, \(C \cdot \lambda _O\) is correlated to \(C' \cdot \lambda _O\). Figure 1 depicts the relations.

Fig. 1.
figure 1

Differential-linear cryptanalysis

Computation shows that under some randomness assumptions to be discussed below, the equality \(C \cdot \lambda _O=C' \cdot \lambda _O\) holds with probability \(\frac{1}{2}+2pq^2\). Hence, if pq are sufficiently large, then the adversary can distinguish E from a random permutation using \(O(p^{-2} q^{-4})\) chosen plaintexts. As usual, the distinguisher can be transformed into a key recovery attack by guessing some key material, performing partial encryption/decryption, and applying the distinguisher.

Randomness Assumptions Behind the DL Attack. The attack analysis described above (initially presented in [4]) crucially depends on two assumptions:

  1. 1.

    Among the cases where the differential is not satisfied, \(X' \cdot \lambda _I=X \cdot \lambda _I\) holds in half of the cases, as the cipher behaves randomly.

  2. 2.

    There is independence between the two subciphers \(E_0\) and \(E_1\). In particular, the bias of the linear approximations in \(E_1\) is not affected by the fact that they are applied to two intermediate values which correspond to plaintexts with a fixed difference.

As for the first assumption, already in [4] the authors noted that it may fail in many cases, and suggested to check the overall bias of the approximation experimentally whenever possible. Several subsequent papers aimed at formalizing the assumption and at taking into consideration multiple linear approximations for \(E_1\) instead of a single one. The first of those were by Liu et al. [29] and by Lu [30]. Recently, Blondeau et al. [10] presented a formal treatment of the DL attack, based on a general link between differential and linear attacks introduced by Chabaud and Vaudenay [13] and developed by Blondeau and Nyberg [11]. The formal treatment provides an exact expression for the bias of the approximation under the sole assumption that the two parts of the cipher are independent.

Independence Between the Subciphers in the Boomerang Attack. While the assumption on independence between \(E_0\) and \(E_1\) was not studied in previous works on the DL attack, it was studied for another combined attack – the boomerang attack. In 2011, Murphy [33] showed that in various cases of interest, the dependency between \(E_0\) and \(E_1\) may significantly affect the complexity and even the possibility of success of the boomerang attack. Murphy’s claims were supported by several concrete examples given in other papers. In particular, in [9] and [21], dependency between the subciphers was used to significantly reduce the complexity of the boomerang attacks on SAFER++ and on KASUMI, respectively. On the other hand, it was shown in [25] that the boomerang attack on KASUMI presented in [6] fails (i.e., never succeeds), due to dependency between the subciphers. In [21], Dunkelman et al. proposed the sandwich framework in order to take into account the dependency between the subciphers in the attack analysis.

The Boomerang Connectivity Table (BCT). The inspiration to our work comes from the boomerang connectivity table (BCT), proposed by Cid et al. [14] at Eurocrypt’2018 as a new tool for the boomerang attack. The BCT allows computing the complexity of the boomerang attack more accurately, and moreover, enables the adversary to choose the differentials of \(E_0\) and \(E_1\) in a way that exploits the dependency between the subciphers to amplify the overall probability of the boomerang distinguisher. Cid et al. applied the BCT to improve the boomerang attack on the CAESAR finalist Deoxys [23] and to explain an unsolved probability amplification for generating a quartet in the tweakable block cipher SKINNY [2].

1.2 Our Results

In this paper we study the effect of dependency between the subciphers on differential-linear cryptanalysis.

Inaccuracy of Previous Analysis Due to Dependency Between the Subciphers. We show that in differential-linear attacks on several cryptosystems, due to the effect of dependency, complexity analysis using the standard DL framework led to incorrect estimates, which sometimes were very far from the correct value found experimentally. One concrete example is the attack of Dobraunig et al. [19] on a 5-round variant of the CAESAR finalist Ascon [18]. The authors of [19] state that while by the theory of the DL attack, the overall bias of their approximation is expected to be \(2^{-20}\), experiments show that the bias is significantly larger: \(2^{-2}\). The discrepancy is attributed in [19] to multiple linear approximations that affect the overall bias. We show that the huge discrepancy comes mainly from dependency between the two subciphers, and in fact, when we take dependency into account using our new tool presented below, the bias estimate is increased from \(2^{-20}\) all the way to \(2^{-5}\). (The rest of the difference is indeed explained by the effect of other approximations, as claimed in [19]).

The Differential-Linear Connectivity Table. In order to (partially) take the effects of dependency into account, we introduce a new tool: the differential-linear connectivity table (DLCT). For a vectorial Boolean function \(S: \{0,1\}^n \rightarrow \{0,1\}^m\) (e.g., an n-to-m bit S-box in a block cipher), the DLCT of S is an \(2^n \times 2^m\) table whose rows correspond to input differences to S and whose columns correspond to bit masks of outputs of S. The value in the cell \((\varDelta ,\lambda )\), where \(\varDelta \) is a difference and \(\lambda \) is a mask, is

$$ DLCT_S(\varDelta ,\lambda ) = |\{x: S(x) \cdot \lambda = S(x \oplus \varDelta ) \cdot \lambda \}| - 2^{n-1}. $$

We replace the decomposition \(E = E_1 \circ E_0\) used in the standard DL attack by the decomposition \(E=E_1' \circ E_m \circ E_0'\), where \(E_0'\) is covered by the differential, \(E_m\) is covered by the DLCT, and \(E_1'\) is covered by the remainder of the linear approximation. Usually, \(E_m\) covers the first round of \(E_1\) and thus consists of several DLCTs of individual S-boxes applied in parallel. In this case, when computing the overall bias of the DL distinguisher, we replace the biases computed in the first round of the linear approximation by the entries of the DLCT in the corresponding S-boxes. Thus, the DLCT fully addresses the issue of dependency in the switch between \(E_0\) and \(E_1\). Note however that it does not resolve the possible effect of other characteristics and approximations, which still has to be handled using the framework of Blondeau et al. [10] (see Sect. 2).

Relation of the DLCT to the Fourier Transform. We show that each row of the DLCT is equal (up to normalization) to the Fourier transform of the Boolean function represented by the corresponding row of the Differential Distribution Table (DDT) constructed in differential cryptanalysis.

As a result, the DLCT can be computed efficiently using the Fast Fourier Transform. Specifically, each row of the DLCT can be constructed in time \(O(2^n + m 2^m)\) operations (instead of the trivial \(2^{m+n}\)), and thus, the entire DLCT can be computed in time \(O(2^{2n}+ m 2^{m+n})\) operations. This makes computation of the DLCT feasible even for larger encryption units (e.g., when one wants to compute a single row of the 32-bit Super S-box of AES [36]).

Applications of the DLCT. While the basic use of the DLCT is obtaining a more accurate complexity analysis of the DL attack, it can be used to obtain improved DL attacks as well. Indeed, the adversary can use the DLCT to choose the differential for \(E_0\) and the linear approximation for \(E_1\) in a way that exploits the dependency between the subciphers in her favor. We demonstrate this on two concrete ciphers.

Improved DL Attack on ICEPOLE. ICEPOLE [32] is a hardware-oriented authenticated cipher designed by Morawiecki et al. in 2014 and submitted as a candidate to the CAESAR competition. In [22], Huang et al. presented a state-recovery attack in the repeated-nonce settings on 128-bit ICEPOLE with data and time complexity of about \(2^{46}\), using differential-linear cryptanalysis. This attack is the best known attack on ICEPOLE.

We show that by using better differentials which exploit the dependency between the two underlying subciphers, one can reduce the complexity of the attack to \(2^{42}\). Furthermore, by exploiting using a better method for choosing the plaintexts, the complexity can be further reduced \(2^{36}\). We have fully implemented and verified our attack.

Improved DL Attack on 8-round DES. One of the first applications of the DL technique is an attack on 8-round DES [35] presented by Biham et al. [4]. The attack is based on a 7-round differential-linear distinguisher with bias \(2^{-5.91}\). By analyzing the DLCT of the DES S-boxes, we show that the differential and the linear approximation used in the attack can be replaced which leads to an improved bias of \(2^{-5.6}\), thus reducing the complexity of the attack from about 30,000 plaintexts to about 20,000 plaintexts.

As in the case of ICEPOLE, we have fully implemented and verified the attack. While the improvement of our attack over the result of [4] is rather modest, it is another clear example of the applicability of the DLCT and of its ability to exploit dependency in favor of the adversary.

1.3 Organization of the Paper

The rest of the paper is organized as follows. In Sect. 2 we give an overview of the DL attack, and then we present the DLCT and prove that it can be computed efficiently. We use the newly introduced tool to revisit the cryptanalytic results on Ascon [19] in Sect. 3 and on Serpent [5, 20] in Sect. 4, and explain the discrepancy between the theoretical estimate and the experimental results in these two works. In Sects. 5 and 6 we present improved DL attacks on ICEPOLE and reduced-round DES, respectively. We conclude the paper with a few open problems for future research in Sect. 7.

2 The Differential-Linear Connectivity Table

In this section we introduce and discuss the DLCT. We begin with an overview of the DL attack, then we present the DLCT and obtain a new formula for the bias of the DL distinguisher, and finally, we discuss the relation of the DLCT to the Fourier transform and its implications on the DL technique.

2.1 The Differential-Linear Attack

Let E be a cipher that can be decomposed into a cascade \(E= E_1 \circ E_0\). Assume that we have a differential \(\varDelta _I \xrightarrow {p} \varDelta _O\) for \(E_0\), i.e., an input difference \(\varDelta _I\) to \(E_0\) leads to an output difference \(\varDelta _O\) from \(E_0\) with probability p, and a linear approximation \(\lambda _I \xrightarrow {q} \lambda _O\) for \(E_1\), i.e., for \(1/2+q\) of the input/output pairs \((I_i,O_i)\) of \(E_1\) satisfy \(\lambda _I \cdot I_i = \lambda _O \cdot O_i\). Denote plaintexts by \(P,P'\), ciphertexts by \(C,C'\), and intermediate values between \(E_0\) and \(E_1\) by \(X,X'\), respectively.

The Procedure of the DL Attack. As mentioned above, the attack procedure is very simple. In order to distinguish E from a random permutation, the adversary considers plaintext pairs \((P,P')\) such that \(P \oplus P' = \varDelta _I\) and checks whether the corresponding ciphertext pairs \((C,C')\) satisfy \(C \cdot \lambda _O = C' \cdot \lambda _O\). Following Blondeau et al. [10], we denote the overall bias of the DL distinguisher by

$$\begin{aligned} \mathcal {E}_{\varDelta _I,\lambda _O} = \Pr [C \cdot \lambda _O = C' \cdot \lambda _O | P \oplus P' = \varDelta _I]. \end{aligned}$$
(1)

Naive Analysis of the Attack Complexity. The attack uses a combination of three approximations:

  1. 1.

    We have \(\Pr [C \cdot \lambda _O=X \cdot \lambda _I] = \frac{1}{2}+q\), by the linear approximation for \(E_1\).

  2. 2.

    We have \(\Pr [X' \cdot \lambda _I=X \cdot \lambda _I] =\frac{1}{2} \pm \frac{p}{2}\) (where the sign depends on the parity of \(\varDelta _O \cdot \lambda _I\)). This is because by the differential for \(E_0\), for fraction p of the plaintext pairs we have \(X \oplus X' = \varDelta _O\), and in particular, \(X' \cdot \lambda _I = X \cdot \lambda _I \oplus \varDelta _O \cdot \lambda _I\), and we assume that among the rest of the plaintext pairs, \(X' \cdot \lambda _I=X \cdot \lambda _I\) holds in half of the cases.

  3. 3.

    We have \(\Pr [C' \cdot \lambda _O=X' \cdot \lambda _I]=\frac{1}{2}+q\), by the linear approximation for \(E_1\).

Note that the equality \(C \cdot \lambda _O = C' \cdot \lambda _O\) holds if among the three equalities (1),(2), (3), either all three hold or exactly one holds. Using Matsui’s Piling-up lemma [31] (similar analysis holds also when using correlation matrices [17]), we have

$$\begin{aligned} \mathcal {E}_{\varDelta _I,\lambda _O} = \Pr [C \cdot \lambda _O=C' \cdot \lambda _O] = \frac{1}{2}+2pq^2. \end{aligned}$$
(2)

Hence, if pq are sufficiently large, then the adversary can distinguish E from a random permutation using \(O(p^{-2} q^{-4})\) chosen plaintexts (see [10, 37] for the exact relation between the data complexity and the success probability of the distinguisher). As usual, the distinguisher can be transformed into a key recovery attack by guessing some key material, performing partial encryption/decryption, and applying the distinguisher.

The Exact Complexity Analysis of Blondeau et al. [10]. As mentioned above, the naive complexity analysis crucially depends on two randomness assumptions. The first is that the equality \(X' \cdot \lambda _I=X \cdot \lambda _I\) holds in approximately half of the cases in which the differential in \(E_0\) fails; the second is that \(E_0\) and \(E_1\) are independent. Blondeau et al. [10] provided an exact expression for \(\mathcal {E}_{\varDelta _I,\lambda _O}\), relying only on the latter assumption. In order to present their result, we need a few more notations (adapted from [10]).

Consider an encryption function \(E'\) and denote its inputs by \(Z,Z'\) and its outputs by \(W,W'\). We use the notation \(\varDelta _I \xrightarrow [E']{} \varDelta _O\) for the differential transition \(\varDelta _I \rightarrow \varDelta _O\) through \(E'\), and the notation \(\lambda _I \xrightarrow [E']{} \lambda _O\) for the linear transition \(\lambda _I \rightarrow \lambda _O\) through \(E'\). For an input difference \(\varDelta _I\) and an output mask \(\lambda \), we denote

$$ \epsilon ^{E'}_{\varDelta _I,\lambda } = \Pr [W \cdot \lambda = W' \cdot \lambda | Z \oplus Z' = \varDelta _I] - \frac{1}{2}, $$

and for two masks \(\lambda _I,\lambda _O\), we denote

$$ c^{E'}_{\lambda _I,\lambda _O} = 2\left( \Pr [W \cdot \lambda _O = W' \cdot \lambda _O \Big | Z \cdot \lambda = Z' \cdot \lambda ] - \frac{1}{2}\right) . $$

Note that \(c^{E'}_{\lambda _I,\lambda _O}/2\) is the bias of the linear approximation \(\lambda _I \xrightarrow [E']{} \lambda _O\).

By [10, Theorem 2], assuming only independence between \(E_0\) and \(E_1\), we have:

$$\begin{aligned} \mathcal {E}_{\varDelta _I,\lambda _O} = \sum _{\lambda _I} \epsilon ^{E_0}_{\varDelta _I,\lambda _I} (c^{E_1}_{\lambda _I,\lambda _O})^2. \end{aligned}$$
(3)

Of course, the expression (3) is usually hard to evaluate, and thus, in practice one mostly has to rely (at least partially) on randomness assumptions and verify the results experimentally.

2.2 The Differential-Linear Connectivity Table and Its Properties

Definition of the DLCT. Let \(S: \{0,1\}^n \rightarrow \{0,1\}^m\) be a vectorial Boolean function. The DLCT of S is an \(2^n \times 2^m\) table whose rows correspond to input differences to S and whose columns correspond to bit masks of outputs of S. Formally, for \(\varDelta \in \{0,1\}^n\) and \(\lambda \in \{0,1\}^m\), the DLCT entry \((\varDelta ,\lambda )\) is

$$ DLCT_S(\varDelta ,\lambda ) \triangleq \left| \left\{ x\Big |\lambda \cdot S(x) = \lambda \cdot S(x \oplus \varDelta ) \right\} \right| - 2^{n-1}. $$

Sometimes it will be more convenient for us to use the normalized DLCT entry

$$ \overline{DLCT}_S(\varDelta ,\lambda ) \triangleq \frac{DLCT_S(\varDelta ,\lambda )}{2^n} = \Pr [\lambda \cdot S(x) = \lambda \cdot S(x \oplus \varDelta )]-\frac{1}{2} $$

instead of \(DLCT_S(\varDelta ,\lambda )\). The DLCT of Serpent’s S-box S0 is given in Table 1.

Table 1. The DLCT of Serpent’s S0

A natural interpretation of the DLCT is the following. Assume that S is equal to the entire encryption function E. Then \(DLCT_S(\varDelta ,\lambda )\) is equal (up to normalization) to the bias we obtain when we apply to E a DL distinguisher with \(\varDelta _I=\varDelta \) and \(\lambda _O=\lambda \) (that is, to the bias \(\mathcal {E}_{\varDelta ,\lambda }\)). Thus, if we could construct a DLCT for the entire encryption scheme E, then the DLCT would completely capture the DL attack. As such a construction is mostly infeasible, we construct the DLCT for small components of the cryptosystem (usually, single S-boxes or Super S-boxes) that lie on the boundary between \(E_0\) and \(E_1\), in order to obtain accurate analysis of the transition between the two subciphers.Footnote 1

The DLCT Framework. Like in the sandwich [21] and the BCT [14] frameworks of the boomerang attack, when we use the DLCT, we divide the cipher E into three subciphers: \(E= E_1' \circ E_m \circ E_0'\), where \(E_0'\) is covered by the differential \(\varDelta _I \rightarrow \varDelta \), \(E_m\) is covered by the DLCT (or by several DLCTs applied in parallel), and \(E_1'\) is covered by the remainder of the linear approximation \(\lambda \rightarrow \lambda _O\). Usually, \(E_m\) covers the first round of \(E_1\) and thus it consists of several DLCTs of single S-boxes applied in parallel. However, if it is feasible to cover by the DLCT a larger part of the cipher, this is advantageous, as the DLCT gives the exact result for the part of the cipher it covers. For example, we construct such a (partial) DLCT for three rounds in our improved DL attack on 8-round DES presented in Sect. 6.

Complexity Analysis. Assume that we have a differential \(\varDelta _I \xrightarrow {p} \varDelta \) for \(E_0'\) and a linear approximation \(\lambda \xrightarrow {q'} \lambda _O\) for \(E_1'\). (Note that since \(E_1'\) is typically a subcipher of \(E_1\), it is expected that \(|q'|>|q|\)). Denote the intermediate values after \(E_0'\) by \(X,X'\) and the intermediate values after \(E_m\) by \(Y,Y'\).

Adapting the naive analysis of the DL attack complexity presented above (i.e., Eq. (2)), we obtain

$$\begin{aligned} \mathcal {E}_{\varDelta _I,\lambda _O} = 4p \cdot \frac{DLCT_{E_m}(\varDelta ,\lambda )}{2^n} \cdot (q')^2 = 4p \cdot \overline{DLCT}_{E_m}(\varDelta ,\lambda ) \cdot (q')^2. \end{aligned}$$
(4)

Note that in the degenerate case where \(E_m = Id\), we have \(DLCT_{E_m}(\varDelta ,\lambda ) = 2^{n-1}\) for all \((\varDelta ,\lambda )\) and \(q'=q\), and thus, we obtain \(\mathcal {E}_{\varDelta _I,\lambda _O} = 2pq^2\) which is equivalent to Eq. (2) above. Interestingly, when \(E_1'=Id\), the resulting bias is \(\mathcal {E}_{\varDelta _I,\lambda _O} = p\cdot \overline{DLCT}_{E_m} (\varDelta ,\lambda _O)\).

In order to adapt the exact analysis of [10] (i.e., Eq. (3)), a bit more computation is needed. Note that for any \(\lambda \), we have

$$\begin{aligned} \epsilon ^{E_m \circ E_0'}_{\varDelta _I,\lambda }&= \sum _{\varDelta } (\Pr [\varDelta _I \xrightarrow [E_0']{} \varDelta ] \Pr [Y \cdot \lambda = Y' \cdot \lambda | X \oplus X' = \varDelta ]) - \frac{1}{2} \\&= \frac{1}{2^n} \sum _{\varDelta } \Pr [\varDelta _I \xrightarrow [E_0']{} \varDelta ] \cdot DLCT_{E_m}(\varDelta ,\lambda ) = \sum _{\varDelta } \Pr [\varDelta _I \xrightarrow [E_0']{} \varDelta ] \cdot \overline{DLCT}_{E_m}(\varDelta ,\lambda ). \end{aligned}$$

Plugging this expression into Eq. (3) (where \(E_1'\) is used instead of \(E_1\) and \(E_m \circ E_0'\) is used instead of \(E_0\)), we obtain that the exact bias of the DL distinguisher is

$$\begin{aligned} \mathcal {E}_{\varDelta _I,\lambda _O} = \sum _{\varDelta ,\lambda } \Pr [\varDelta _I \xrightarrow [E_0']{} \varDelta ] \cdot \overline{DLCT}_{E_m}(\varDelta ,\lambda ) (c^{E_1'}_{\lambda ,\lambda _O})^2. \end{aligned}$$
(5)

Note that Eq. (5) is still not free of randomness assumptions; e.g., it relies on round independence within \(E_0'\) and \(E_1'\) (see [10]). However, it is the most accurate expression for the bias of the DL distinguisher obtained so far. On the other hand, Eq. (5) is usually hard to evaluate, and in the actual applications of the DLCT we do rely on some randomness assumptions and verify our results experimentally.

Properties of the DLCT. A trivial property of the DLCT is that for any S, we have \(DLCT_S(0,\lambda )=2^{n-1}\) for all \(\lambda \). Indeed, if two inputs to S are equal then the corresponding outputs agree on any bit mask. This means that in the DL attack, if for some S-box in the first round of \(E_1\), the difference \(\varDelta _O\) is zero in the entire S-box, then the linear approximation in that S-box holds for sure, while without the dependency between the intermediate values \(X,X'\) it was anticipated to hold only probabilistically. This trivial property corresponds to the middle round S-box trick used in [9] to speed up the boomerang attack and covered by the BCT [14]. Surprisingly, this feature was not noted before explicitly in the context of the DL attack. For example, even if we take into account only this trivial type of dependency, the bias of the DL distinguisher of Dobraunig et al. [19] on Ascon increases from \(2^{-20}\) to \(2^{-8}\). Interestingly, the authors of [19] chose the linear approximation deliberately in such a way that the active S-boxes in its first round correspond to inactive S-boxes in \(\varDelta \). However, they did not take this dependency into account in the computation of the bias, as it is neglected in the classical DL model.

Another trivial property of the DLCT is that for any S, we have \(DLCT_S(\varDelta ,0)=2^{n-1}\) for all \(\varDelta \)’s. Indeed, if the the input mask is zero (i.e., no output bits are approximated), then their actual value (and by proxy, their input difference), is of no importance.

Inspection of the DLCT of Serpent’s S-box S0 presented in Table 1 shows that it contains the value \(\pm 2^{n-1}\) not only in the trivial entries of the form \(DLCT_S(0,\lambda )\) or \(DLCT_S(\varDelta ,0)\), but also in 9 additional places. Moreover, it contains many very high/very low values that can be used by an adversary, if she can adjust the differential and the linear approximation such that these high/low values are used. The existence of such high/low value entries is not surprising, as current design of S-boxes does not take the DLCT into account. Hence, the DLCT can serve as a new design criterion for S-boxes, partially measuring immunity of the cipher with respect to DL attacks.

2.3 Relation of the DLCT to the Fourier Transform

We now show that the DLCT is closely related to the Fourier transform of the DDT and that this relation can be used to efficiently compute the DLCT. We begin with a few preliminaries.

The Fourier-Walsh Transform of Boolean Functions. Let \(f:\{0,1\}^m \rightarrow \mathbb {R}\) be a Boolean function. (Note that f does not have to be two-valued.) The Fourier-Walsh transform of f is the function \(\hat{f}:\{0,1\}^m \rightarrow \mathbb {R}\) defined by

$$ \hat{f}(y) = \frac{1}{2^m} \sum _{x \in \{0,1\}^m} f(x) \cdot (-1)^{x \cdot y} = \frac{1}{2^m} \left( \sum _{\{x: x \cdot y = 0\}} f(x) - \sum _{\{x: x \cdot y = 1\}} f(x) \right) . $$

The DDT and the LAT. The DLCT resembles in its structure the two central tools used in differential and linear cryptanalysis – the Difference Distribution Table (DDT) and the Linear Approximation Table (LAT). For a vectorial Boolean function \(S: \{0,1\}^n \rightarrow \{0,1\}^m\), the DDT of S is an \(2^n \times 2^m\) table whose rows correspond to input differences to S and whose columns correspond to output differences of S. Formally, for \(\varDelta _I \in \{0,1\}^n\) and \(\varDelta _O \in \{0,1\}^m\), we have

$$ DDT_S(\varDelta _I,\varDelta _O) = \left| \left\{ x\Big |S(x) \oplus S(x \oplus \varDelta _I) = \varDelta _O \right\} \right| . $$

Similarly, the LAT of S is an \(2^n \times 2^m\) table whose rows correspond to bit masks of inputs to S and whose columns correspond to bit masks of outputs of S. Formally, for \(\lambda _I \in \{0,1\}^n\) and \(\lambda _O \in \{0,1\}^m\), we have

$$ LAT_S(\lambda _I,\lambda _O) = \left| \left\{ x\Big |\lambda _O \cdot S(x) = \lambda _I \cdot x \right\} \right| - 2^{n-1}. $$

Relation of the DLCT to the Fourier-Walsh Transform of the DDT. We assert that each row of the DLCT is equal (up to normalization) to the Fourier-Walsh transform of the corresponding row of the DDT. Formally, for each \(\varDelta \in \{0,1\}^n\), denote the Boolean function which corresponds to the \(\varDelta \)’s row of the DDT by \(f_{\varDelta }\). That is, \(f_{\varDelta }:\{0,1\}^m \rightarrow \mathbb {R}\) is defined by

$$ f_{\varDelta }(\varDelta ') = DDT_{S}(\varDelta ,\varDelta ') = \left| \{x \in \{0,1\}^n\Big | S(x) \oplus S(x \oplus \varDelta ) = \varDelta '\}\right| . $$

Proposition 1

For any \(\lambda \in \{0,1\}^m\), we have \(DLCT(\varDelta ,\lambda ) = 2^{m-1} \hat{f}_{\varDelta }(\lambda )\).

Proof

By the definitions of the DLCT and of the Fourier-Walsh transform, we have

$$\begin{aligned}&DLCT_S(\varDelta ,\lambda ) = \left| \left\{ x\Big |\lambda \cdot S(x) = \lambda \cdot S(x \oplus \varDelta ) \right\} \right| - 2^{n-1} \\&= \frac{1}{2} \left( \left| \left\{ x\Big |\lambda \cdot S(x) = \lambda \cdot S(x \oplus \varDelta ) \right\} \right| - \left| \left\{ x\Big |\lambda \cdot S(x) \ne \lambda \cdot S(x \oplus \varDelta ) \right\} \right| \right) \\&= \frac{1}{2} \left( \left| \left\{ x\Big |\lambda \cdot (S(x) \oplus S(x \oplus \varDelta )) = 0 \right\} \right| - \left| \left\{ x\Big |\lambda \cdot (S(x) \oplus S(x \oplus \varDelta )) =1 \right\} \right| \right) \\&= \frac{1}{2} \left( \sum _{\{\varDelta ': \varDelta ' \cdot \lambda = 0\}} f_{\varDelta }(\varDelta ') - \sum _{\{\varDelta ': \varDelta ' \cdot \lambda =1\}} f_{\varDelta }(\varDelta ') \right) = 2^{m-1} \hat{f}_{\varDelta }(\lambda ), \end{aligned}$$

as asserted. \(\blacksquare \)

A Theoretical Implication. The relation of the DLCT to the Fourier-Walsh transform of the DDT yields an interesting theoretical insight on the differential-linear attack.

It is well-known that the DDT and the LAT have the following mathematical interpretations.

  • The DDT: If we model the evolution of differences through the encryption process of a plaintext pair as a Markov chain, then the DDT is simply the transition matrix of the chain (see, e.g., [26]). In this regard, a differential attack utilizes a classical probability-theoretic tool for cryptanalysis.

  • The LAT: For each mask \(\lambda \), the \(\lambda \)’s column of the LAT is equal (up to normalization) to the Fourier-Walsh transform of the Boolean function \(x \mapsto \lambda \cdot S(x)\) (see, e.g., [15]). In this regard, linear cryptanalysis studies the function S via its Fourier transform, as is commonly done in Boolean function analysis (see, e.g., [34]).

Proposition 1 shows that each row of the DLCT is equal (up to normalization) to the Fourier-Walsh transform of the corresponding row of the DDT. Since the DLCT of the entire encryption scheme E completely captures DL attacks as shown above, this implies that the differential-linear attack utilizes an interesting combination of probabilistic and Fourier-analytic techniques: it considers the probability-theoretic transition matrix of a Markov chain associated with the function, and studies it via its Fourier-Walsh transform.

A Practical Implication. It is well-known that the Fourier-Walsh transform of a function \(f:\{0,1\}^m \rightarrow \mathbb {R}\) can be computed in time \(O(m \cdot 2^m)\) operations. Since each row of the DDT of S can be easily constructed in time \(O(2^n)\) operations, Proposition 1 implies that each row of the DLCT can be computed in time \(O(2^n + m 2^m)\) operations, and that the entire DLCT can be computed in time \(O(2^{2n} + m 2^{m+n})\) operations. This significantly improves over the trivial algorithm which requires \(O(2^{2n+m})\) operations.

This speedup is practically important as it allows us to compute the DLCT for larger parts of the cipher, and thus, obtain a more accurate estimate of the complexity of the DL attack. For example, in the attack on 8-round DES presented in Sect. 6, we compute one DLCT entry for three rounds of DES as a single unit, and so the ability to compute the DLCT efficiently is crucial.

3 Differential-Linear Cryptanalysis of Ascon, Revisited

Ascon [18] is an authenticated encryption algorithm that was recently selected to the final round of the CAESAR [16] competition. In [19], Dobraunig et al. presented a practical differential-linear attack on up to 5 rounds of the Ascon permutation, based on a 4-round DL distinguisher. The authors of [19] stated that while by the theory of the DL attack, the overall bias of the approximation is expected to be \(2^{-20}\), experiments show that the bias is significantly higher – \(2^{-2}\). They attributed the difference between practice and the theoretical estimate to multiple linear approximations that affect the overall bias, and used the correct value in their attack. We recompute the bias of the distinguisher using the DLCT and show that a large part of the discrepancy results from dependency between the two subciphers.

In order to recompute the bias, we have to provide some more details on the specific distinguisher used in [19]. We present the distinguisher only schematically.

The Theoretical Analysis of [19]. The DL distinguisher of [19] targets a 4-round reduced variant of Ascon denoted by E and decomposed as \(E=E_1 \circ E_0\), where \(E_0\) consists of rounds 1–2 and \(E_1\) consists of rounds 3–4. For \(E_0\), the distinguisher uses a differential characteristic of the form

$$ \varDelta _0 \xrightarrow [L \circ S]{p_0\,=\,2^{-2}}\varDelta _1 \xrightarrow [L \circ S]{p_1\,=\,2^{-3}} \varDelta _2, $$

where \(\varDelta _2\) is a truncated difference. For \(E_1\), the distinguisher uses a linear approximation of the form

$$ \lambda _0 \xrightarrow [L \circ S]{q_0\,=\,2^{-7}} \lambda _1 \xrightarrow [L \circ S]{q_1\,=\,2^{-2}}\lambda _2, $$

where \(\lambda _2\) consists of a single bit, and all nonzero bits of the mask \(\lambda _0\) are included in S-boxes in which the all the input bits are known to be zero in \(\varDelta _2\). Using the naive complexity analysis of the DL attack (i.e., Eq. (2) above), the authors of [19] concluded that the theoretical estimate for the overall bias of the approximation is \(2 \cdot 2^{-5} \cdot (2^{-8})^2 = 2^{-20}\). On the other hand, they found experimentally that the bias is as high as \(2^{-2}\).Footnote 2

Partial Explanation of the Discrepancy Using Only the Trivial Property of the DLCT. As mentioned above, the linear approximation of \(E_1\) was chosen by Dobraunig et al. in such a way that all nonzero bits of the mask \(\lambda _0\) are included in S-boxes in which all the input bits are known to be zero in \(\varDelta _2\). By the trivial property of the DLCT presented in Sect. 2, this implies that the linear approximation in round 3 holds with bias 1 / 2, instead of the theoretical bias \(q_0=2^{-7}\). Therefore, the estimated bias of the approximation is increased to \(2 \cdot 2^{-5} \cdot (2^{-2})^2=2^{-8}\), which is already much higher than \(2^{-20}\).

Analysis Using the DLCT. We now obtain a higher bias of \(2^{-5}\) by revisiting the analysis using the DLCT. Let us decompose E into \(E=E_2 \circ E_m \circ E_0\), where \(E_0\) consists of rounds 1–3, \(E_m\) consists of round 4, and \(E_2=Id\). Note that since in the DL distinguisher of [19], the output mask \(\lambda _2\) consists of the MSB in the output of S-box no. 9, we are only interested in the entries of the DLCT of that S-box (which we denote by S). For \(E_0\), we use a differential of the form \(\varDelta _0 \xrightarrow [3 \text{ rounds }]{p\,=\,2^{-3}} \varDelta _3\), where \(\varDelta _0\) is the input difference of the DL distinguisher of [19]. In our value of \(\varDelta _3\), three of the input bits to S-box no. 9 are known to be zero; specifically, the input is of the form ?0?00. (Note that the S-box is from 5 bits to 5 bits). The relevant normalized entries of the DLCT of S satisfy:

$$\begin{aligned}&\overline{DLCT}_S(16,16)=0, \overline{DLCT}_S(4,16)=0, \\&\overline{DLCT}_S(20,16)=2^{-1}, \text{ and } \overline{DLCT}_S(0,16)=2^{-1}. \end{aligned}$$

Hence, assuming that each input difference of S occurs in \(\varDelta _3\) with the same probability \(2^{-2}\) and using Eq. (4), we obtain the estimate

$$ 4 \cdot 2^{-3} \cdot 2^{-2} (0+0+2^{-1}+2^{-1}) \cdot (2^{-1})^2 = 2^{-5} $$

for the overall bias of the DL distinguisher of [19]. This value is, of course, much lower than the experimentally obtained bias of \(2^{-2}\) (which may be explained by the effect of other differentials and linear approximations). On the other hand, it is significantly higher than the value \(2^{-20}\) which follows from the classical DL framework. This demonstrates the importance of taking the dependency between subciphers into account, which the DLCT facilitates in an easy manner.

4 Differential-Linear Cryptanalysis of Serpent, Revisited

One of the first applications of the DL technique is an attack on the AES finalist Serpent [1] presented in [5]. The attack is based on a 9-round DL distinguisher with bias of \(2^{-59}\) and targets an 11-round variant of the cipher. An improved attack was presented in [20]. The authors of [20] performed experiments with reduced round variants of Serpent, and concluded that the actual bias of the approximation is \(2^{-57.75}\) and not \(2^{-59}\). Using the improved bias, they extended the attack to 12 rounds of Serpent (out of its 32 rounds) yielding the best currently known attack on the cipher.

In [20], the increased bias was attributed to the existence of other approximations that affect the overall bias. In this section we recompute the bias of the distinguisher using the DLCT and obtain the value \(2^{-57.68}\), which is very close to the experimental value. Thus, we conclude that the increased bias in the experiment results mostly from the dependency between the two subciphers.

In order to recompute the bias, we have to provide some more details on the specific distinguisher used in [5]. For sake of clarity, we present it only schematically, and refer the reader to [5] for the exact difference and mask values.

The Analysis of [5]. The DL distinguisher of [5] targets a 9-round reduced variant of Serpent that starts with round 2 of the cipher. This variant is denoted by E and decomposed as \(E=E_1 \circ E_0\), where \(E_0\) consists of rounds 2–4 and \(E_1\) consists of rounds 5–10. For \(E_0\), the distinguisher uses a differential characteristic of the form

$$ \varDelta _0 \xrightarrow [L \circ S_2]{p_0\,=\,2^{-5}} \varDelta _1 \xrightarrow [L \circ S_3]{p_1\,=\,2^{-1}} \varDelta _2 \xrightarrow [L \circ S_4]{p_2\,=\,1} \varDelta _3, $$

where \(\varDelta _2,\varDelta _3\) are truncated differences. For \(E_1\), the distinguisher uses a linear approximation of the form

$$ \lambda _0 \xrightarrow [L \circ S_5]{q_0\,=\,2^{-5}} \lambda _1 \xrightarrow [5 \text{ rounds }]{q_1\,=\,2^{-23}} \lambda _7, $$

where all nonzero bits of the mask \(\lambda _0\) are included in the bits that are known to be zero in \(\varDelta _3\). Using the naive complexity analysis of the DL attack (i.e., Eq. (2) above), the authors of [5] concluded that the overall bias of the approximation is \(2 \cdot 2^{-6} \cdot (2^{-27})^2 = 2^{-59}\). (Actually, in their attack they used the lower value of \(2^{-60}\) due to the effect of other differentials.)

The Experimental Results of [20]. The authors of [20] checked experimentally the first 4 rounds of the DL distinguisher of [5] (i.e., a 4-round distinguisher which starts with the difference \(\varDelta _0\) and ends with the mask \(\lambda _1\)) and found that its bias is \(2^{-13.75}\), instead of the theoretical estimate \(2 \cdot 2^{-6} \cdot (2^{-5})^2 = 2^{-15}\). They concluded that the overall bias of the 9-round distinguisher is \(2^{-57.75}\) instead of \(2^{-59}\), and used the conclusion to extend the key-recovery attack based on the distinguisher from 11 rounds to 12 rounds.

Analysis Using the DLCT. We considered a 3-round variant of Serpent that starts at round 3, denoted it by \(E'\), and computed the normalized DLCT entry \(\overline{DLCT}_{E'}(\varDelta _1,\lambda _1)\). (Note that computing the entire DLCT for \(E'\) is infeasible. However, due to the low diffusion of Serpent, one can compute efficiently part of the entries, including the entry we needed). We obtained \(\overline{DLCT}_S(\varDelta _1,\lambda _1) = 2^{-8.68}\).

Using Eq. (4) (for the case of \(E_1'=Id\)) we conclude that the bias of the 4-round distinguisher examined in [20] is \(p_1 \cdot \overline{DLCT}_S(\varDelta _1,\lambda _1) = 2^{-5} \cdot 2^{-8.68} = 2^{-13.68}\), which is very close to the experimental result \(2^{-13.75}\) of [20].

Note that we obtained an estimate which is very close to the actual value, using only the naive Eq. (4) and not the more accurate Eq. (5) that takes into account the effect of other differentials. This shows that the increased bias found experimentally in [20] follows almost solely from dependency between the subciphers, and demonstrates how the DLCT methodology can be used for obtaining an accurate estimate of the DL attack complexity.

5 Improved Differential-Linear Attack on ICEPOLE

ICEPOLE is an authenticated encryption cipher based on the duplex construction proposed by Morawiecki et al. submitted to the CAESAR competition [32]. The main two versions, ICEPOLE128 and ICEPOLE128a are initialized with a 128-bit key. In addition ICEPOLE128 accepts 128-bit nonce and 128-bit secret message number, in comparison, ICEPOLE128a accepts 96-bit nonce and 0-bit secret message number (to serve as a drop-in replacement for AES-128-GCM).Footnote 3 After initialization, the associated data is absorbed into the 1280-bit state. For the processing of the plaintext (encryption and authentication), a block of 1024 bits is extracted (to be XORed to the plaintext) and the plaintext is XORed into 1024 bits of the state.Footnote 4 This state is then updated using 6-round Permutation \(P_6\) which iterates a round function P 6 times over 1280 bits. After the entire plaintext has been encrypted, a tag is produced by extracting bits of the internal state. The entire process is depicted in Fig. 2.Footnote 5

Fig. 2.
figure 2

General structure of ICEPOLE

After ICEPOLE has been introduced, Huang et al. presented a differential-linear attack against ICEPOLE-128 and ICEPOLE-128a [22]. The attack recovers the internal state using a differential-linear attack, where the bias of the differential-linear depends on the value of some bits. Hence, observing the bias in the output allows identifying internal state bits. After full recovery of the internal state, one can extract the secret key (as long as the scheme is not using secret message numbers) or forge new messages (when using secret message numbers).

5.1 A Short Description of ICEPOLE-128

We first note that there are three variants of ICEPOLE (ICEPOLE-128, ICEPOLE-128a, and ICEPOLE-256), but our attacks and description concern only the 128-bit variants, ICEPOLE-128 and ICEPOLE-128a.

The internal state of ICEPOLE, denoted by S is composed of 20 64-bit words organized in a 4-by-5 matrix. We follow [32] notations: the bit S[x][y][z] is the z’th bit of the word at position (xy) where \(0\le x\le 3\), \(0\le y\le 4\), and \(0\le z \le 63\). This bit is considered to be bit \(64(x+4y)+z\) of the state. The first n bits of S are denoted by by \(S_{\lfloor n \rfloor }\). The z’th slice of S is a 4-by-5 binary matrix \((S[x][y][z])_{x,y}\). When z and x are fixed, the 5-bit vector \(S([x][y][z])_y\) is called a row.

The round function P is composed of five operations, \(P=\kappa \circ \psi \circ \pi \circ \rho \circ \mu \) which are:

  • \(\mu \) operates on each of the 64 slices independently by treating each 20-bit slice as a vector \((Z_0,Z_1,Z_2,Z_3) \in GF(2^5)\) and multiplying this vector by the MDS matrix

    $$ M=\left( \begin{array}{cccc} 2 &{} 1 &{} 1 &{} 1\\ 1 &{} 1 &{} 18 &{} 2\\ 1 &{} 2 &{} 1 &{} 18\\ 1 &{} 18 &{} 2 &{} 1\\ \end{array} \right) . $$

    The multiplication is done over \(GF(2^5)\) (with the irreducible polynomial \(x^5+x^2+1\)).

  • \(\rho \) is a cyclic rotation applied to each of the 20 64-bit words of S. Each word (xy) is rotated by a different constant, i.e., \(S[x][y] = S[x][y] \lll \mathrm {offsets}[x][y]\) where the table of \(\mathrm {offsets}[x][y]\) can be found in [32].

  • \(\pi \) reorders the words in S by moving the word S[x][y] into \(S[x'][y']\) according to the formula:

    $$ \left\{ \begin{array}{l} x' = (x+y) \bmod {4}\\ y' = (((x+y) \bmod {4}) + y + 1)\bmod {5} \end{array} \right. $$
  • \(\psi \) applies a 5-bit S-box to each of the 256 rows of the state.

  • \(\kappa \) adds a round constant (\(\mathrm {constant}[\mathrm {round}]\)) to S[0][0]. The constants are generated by an LFSR and can be found in [32].

For the sake of clarity, we shall denote the linear parts of the round function by \(\mathcal{L} = \pi \circ \rho \circ \mu \).

As mentioned before, the internal state S is initialized using a constant which is XORed with (key||nonce) value. After than \(P_{12}\) (12 rounds of P are used to mix the key and nonce into the state). If secret message numbers are used, they are encrypted using the duplex operation. The associated data is chopped into blocks of 1026-bit each (after padding). They are absorbed into the state S, and then the encryption/authentication of the plaintext takes place following the duplex operation using \(P_6\). After the entire plaintext is processed, \(P_6\) is applied to the internal state (or \(P_{12}\) in ICEPOLEv2), and the 128-bit tag is computed as \(T_0 = S[0][0]\), \(T_1 = S[1][0]\).

In the followings, we use \(e_i\) to denote a 64-bit word which is 0 in all bits, besides the i’th bit.

5.2 Huang et al.’s Differential-Linear Attack on ICEPOLE-128/ICEPOLE-128a

Huang et al. have presented a differential-linear attack against ICEPOLE-128 and ICEPOLE-128a in the repeated nonce settings [22]. As the attack recovers the internal state, if the scheme is not using a secret message number, then one can obtain the key by inverting the internal state. Otherwise, the recovery of the internal state allows encrypting/authenticating any data.

The attack targets the first application of \(P_6\) after the plaintext is introduced by injecting differences through the plaintext block \(\sigma _0\) and observing biases in the key stream used to encrypt \(\sigma _1\). Its general structure is depicted in Fig. 3. For the sake of its description, we denote by I the input to \(P_6\) (after the XOR with the plaintext) and by O the output of \(P_6\). Moreover, we denote by \(\psi _i\), \(\mathcal{L}_i\), and \(\kappa _i\) the \(\psi \), \(\mathcal L\), and \(\kappa \) of the i’th round.

Fig. 3.
figure 3

Differential-linear attack on the “First” \(P_6\) of ICEPOLE

The attack of [22] uses the 1024 bits of I and O which can be easily obtained by knowing the plaintext and ciphertext blocks for a 2-block long message. The attack introduces differences in the first plaintext block, which does not affect the fifth column of I. This difference is transformed into an input difference \(\varDelta \) after the application of the linear layer \(\mathcal L\) (i.e., the introduced difference into the state is \(\mathcal{L}^{-1}(\varDelta )\)). Due to the MDS property of \(\mu \) and the zero difference in the fifth column, \(\varDelta \) must have at least two active S-boxes. The behavior of these two active S-boxes, namely, the probability of the differential transition through them highly depends on the actual value of some bit \(b_i\) (or two bits). Hence the input difference \(\varDelta ^*\) of the differential-linear approximation appears with different probabilities, depending on the value of this bit. Luckily for us, this bit (or pair of bits) is the outcome of XORing an unknown input bit (from the fifth column of I) with known bits (which are controlled by the adversary). This allows the adversary to partition the plaintext pairs into sets according to the possible values of \(b_i\), where for the “correct” set, we expect a significantly higher bias.

The differential-linear approximation \(\varDelta ^* \rightarrow \lambda \) covers the 4 rounds until round 6, and can be extended until the end of the linear layer \(\mathcal{L}_6\). We note that in \(\psi _1\), the differential characteristic in use is \(\varDelta ^* \rightarrow \varDelta = \varDelta ^*\). Interestingly, \(\psi \) has a very useful property – given the 4 least significant bits of the output, one can partially recover the input. Table 2 suggests the values, and the probability that partial information can be found given these 4 output bits. Hence, any differential-linear whose output mask can be deduced from the partial information can be used with some probability, which we denote by \(p_L\).

Table 2. Deducing input bits of \(\psi \) from the four LSBs of the output

Due to the structure of ICEPOLE, any differential characteristic and any linear approximation can be rotated (by rotating each word, respectively). Hence, the attack is repeated with the 64 rotated versions of the differential-linear approximation. Each time, it recovers the bit \(b_i\) (which affects bias of the approximation). This is done by encrypting multiple pairs of plaintexts with input difference \(\varDelta \) with two active S-boxes (and covering all possible values of the recovered bit \(b_i\)), and observing the set which has the highest bias.

Actually, instead of taking all ciphertext pairs, only the pairs which can be used to predict the input of \(\psi _6\) from their 1024-bit outputs O are considered (as each ciphertext can be used with probability, this is actually probability \(p_L^2\)). The different approach of extending the differential-linear approximation to cover \(\psi _6\) leads to much worse performance (as there are many active S-boxes in the \(\psi _6\) layer).

After studying the differentials and linear approximations that can be used for the attack, Huang et al. [22] identified 5 input difference patterns \(\varDelta _1,\varDelta _2,\ldots ,\varDelta _5\) that after \(\mathcal L\) activate only two S-boxes, as well as two good linear approximations \(\lambda ^{mid}_1 \rightarrow \lambda _1\) and \(\lambda ^{mid}_2\rightarrow \lambda _2\). Given the word-oriented nature of the permutation P, one can rotate the differences (or masks) by rotating each word of the mask/difference. Hence, for each bit position it is possible to consider all the combinations of \(\varDelta _i\) and \(\lambda _j\) and experimentally calculate the bias of the resulting differential-linear.

The actual attack tries to find the last column of I (as the first four can be trivially deduced). Denote this fifth column by the four words \((U_0,U_1,U_2,U_3)\). The first phase recovers \(U_0\) and \(U_3\), the second phase recovers \(U_2\), and finally the third phase recovers \(U_1\). All phases follow an essentially similar process – a differential-linear approximation is built from a differential characteristic which probability (in the first round) depends on the value of some specific (unknown) bits. Then, by observing enough plaintext/ciphertext pairs and evaluating the bias, one can determine the unknown bits, from which corresponding bits of \(U_i\) are computed.

Following the above steps, we give detailed explanation on how to find the first bits of \(U_0\) and \(U_3\) in the attack. For that phase, Huang et al. propose to use the following \(\varDelta _2\):

$$ \varDelta _2 = \left( \begin{array}{ccccc} 0 &{} 0 &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{} 0 &{} 0\\ 0 &{} e_{10} &{} e_{41} &{} 0 &{} 0 \end{array}\right) $$

which under \(\mathcal{L}^{-1}\) has differences in the LSBs of the words S[0][2], S[1][0], S[1][1], S[1][2], S[1][3], S[2][1], S[2][3], S[3][0], and S[3][2]. Another technicality is that the adversary fixes 18 bits (by knowing the first four columns of I she can select the corresponding \(\sigma ^0\)), then the probability of the differential transition depends on two unknown bits – for one of the four possibilities it is significantly higher probability than for the rest, as presented in Table 3. The specific fixed bits and their values is given in [22].

The linear mask in use ends with the mask

$$ \lambda _1 = \left( \begin{array}{ccccc} 0 &{} 0 &{} 0 &{} 0 &{} 0\\ 0 &{} e_{33} &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{} 0 &{} 0 \end{array}\right) \xrightarrow {\mathcal{L}_6} \lambda _{1}' = \left( \begin{array}{ccccc} e_{18} &{} e_0 &{} 0 &{} e_{43} &{} 0\\ 0 &{} e_2 &{} 0 &{} 0 &{} 0\\ 0 &{} e_{21} &{} e_{61} &{} 0 &{} 0\\ 0 &{} 0 &{} e_{41} &{} e_{56} &{} 0 \end{array}\right) . $$

Using Table 2, this suggests that with probability \(p_L = 2^{-6.45}\) one can compute the output mask from a given output O. As each pair requires the evaluation of two O’s, the probability that a pair can be used for the analysis is \(p_L^2 = 2^{-11.9}\).

For the specific differential-linear characteristics presented above the two unknown at the entrance of \(\psi _1\) which can be recovered are \(b_1 = U_3^{31} \oplus a_0\) and \(b_3 = U_0^{43} \oplus U_3^{43}\oplus a_1\), where \(a_0\) and \(a_1\) can be computed from the four known columns of I. Table 3 offers the different biases as a function of \(b_1\) and \(b_3\). These biases were experimentally computed in [22] by taking \(2^{30}\) plaintexts pairs with the required values fixed.

To conclude, given the above differential-linear characteristic, the recovery of two bits \(b_1\) and \(b_3\) is as follows:

  1. 1.

    Collect N plaintext pairs with the required input difference and the 18 bits fixed.

  2. 2.

    For each pair, check whether one can deduce the bits that enter the linear mask of \(\psi _6\) for both ciphertexts.

  3. 3.

    Divide the remaining pairs into four sets according to the value of unknown values of \(b_1\) and \(b_3\).Footnote 6

  4. 4.

    Find the set with the maximal bias (which suggests the correct values of \(b_1\) and \(b_3\)). Compute from \(b_1,b_3\) and the known bits the value of the unknown bits.

The analysis shows that when taking \(N=2^{33.9}\) plaintext pairs (with 18 bits fixed) we obtain about \(2^{33.9} \cdot p_L^2 = 2^{21}\) pairs which can be analyzed (as we know the input linear mask to \(\psi _6\)). These pairs can be divided into four sets of about \(2^{19}\) pairs each, one of which with a bias of \(2^{-7.3}\), which is significantly higher than the rest, and thus can be easily detected. The data complexity is thus \(2^{33.9}\) pairs of 2-block messages for each pair of bits \(b_1,b_3\) recovered, or a total of \(64 \cdot 2 \cdot 2 \cdot 2^{33.9} = 2^{41.9}\) 1024-bit data blocks. We list in Table 3 the different bits recovered in each phase, the relevant \(p_L\), and the data complexity. The full details are available at [22].

5.3 Our New Results on ICEPOLE-128/ICEPOLE-128a

The main reason the attack of [22] used a single-bit mask for the output is to ensure a low hamming weight mask. This was chosen to optimize the two conflicting effects of \(\lambda \) on the complexity of the attack – the more active bits in \(\mathcal{L}_6(\lambda )\) there are (which translates to more active bits in \(\lambda \) when \(\lambda \) is of low hamming weight), the lower \(p_L\) is. At the same time, \(\lambda \) affects the bias as it sets the output mask of \(\psi _5\), suggesting that constraints on \(\lambda \) may lead to sub-optimal linear approximations.

Moreover, as the actual biases were measured experimentally (rather than analytically) we decided to pick a slightly different approach. Instead of studying single-bit \(\lambda \) we decided to try output masks with a single active S-box. This allowed raising the bias of the transition in \(\psi _5\) from at most 3/16 to 4/16 (which is significant due to the quadratic effect on the bias, which translates to a quadratic effect on the data and time complexities).

Table 3. The different phases of the attack of [22]

The increase in the number of possible output masks carries with it a computational problem – one needs to cover more masks in the process of computing the bias, by a factor of almost 6, for any chosen input difference. Hence, instead of relying on multiple time consuming experiments for each input difference \(\varDelta \), we use the DLCT of \(\psi \) to obtain estimates for the bias of the differential-linear approximation. This is done by taking the input difference \(\varDelta \) and computing for each S-box in \(\psi _5\) the distribution of input differences (i.e., if the input difference is \(\delta \), determining \(p_\delta \)). Then, for all the active S-boxes in the mask leaving \(\psi _5\), for each S-box’ mask \(\omega \) we compute \(\sum _{\delta } p_\delta \cdot DLCT(\delta ,\omega )\) to evaluate the probability of the differential-linear transition in \(\psi _5\). As the evaluation of \(p_d\) for each S-box is independent of the mask \(\omega \), and as the DLCT is computed once, this offers a very efficient procedure.

The result is the discovery of better differential-linear approximation for the second and third phase. We give in Table 4 the new differential-linear approximations used in the second and third phase. Due to the reduced data required in the later phases, we also reduce the data complexity in the first round (to reduce the total data complexity) and change a bit the constraints on the actual values (but they serve the same purpose as in the original attack). One main difference is that the constraints are not on the values, but rather on the parity of some subsets of bits. We list these subsets in Table 5.

Table 4. Our new different-linear approximations for phases 2 and 3 of the attack
Table 5. Bit subsets fixed for our attack

Using the new differential-linear approximations (and bit-fixing) we obtain an attack on the full ICEPOLE in complexity of \(2^{41.58}\) data and time. Its phases are listed in Table 6. We note that the first phase has a slightly lower success rate. After presenting a new approach for generating the data that further reduces the data and time (to \(2^{35.85}\)) we discuss how to mitigate this slightly lower success rate.

Table 6. The different phases of the new attack

Efficient Data Generation. We note that each of the three phases of the attack is composed of 64 applications of the same attack up to rotating the differences/masks. Hence, if each such attack requires about N plaintext pairs, a trivial implementation requires 64N plaintext pairs. Luckily, there is a more efficient way to do so.

Our key observation is that one can select \(P_i\) in advance to satisfy all the conditions of the 64 possible rotations. This can be easily done when there is at least one word which has no conditions/restrictions. For such a \(P_i\) we test for each of the 64 rotations whether one can deduce the needed bits at the input of \(\psi _6\). If so, we generate its counterpart \(P_i^*\) which satisfies the required difference, and apply the attack as before (with probability of \(p_L\) that \(P_i^*\) allows recovering the input of \(\psi _6\)).

This reduces the data complexity of Phase 1 from \(2\cdot 2 \cdot 64 \cdot 2^{32.8}\) plaintexts to \(2\cdot (2^{32.8}+64\cdot 2^{32.8-6.45}) = 2^{34.6}\) plaintexts (as for each of the 64 rotations of the differential-linear approximation there is probability of \(p_L\) that \(P_i\) is useful). Similar analysis reduces the data complexity of the second phase to \(2\cdot (2^{31.33} + 64 \cdot 2^{31.33-4.77}) = 2^{34.07}\) which is the same also for the third phase. Hence, the total data complexity of the attack is \(2^{35.85}\) chosen plaintexts.

We note that the reduced data complexity of the first phase may negatively affect the success rate. Moreover, an error in the first phase is expected to cause errors in the next phases. However, we note that one can easily test the obtained values for correctness. If the recovered internal state is not accurate, the adversary can exhaustively test internal states of hamming distance up to 5 from the extracted one in time of \(\left( \begin{array}{c}256 \\ 5\end{array}\right) \approx 2^{33.1}\) recomputations (using simple linear algebra). In other words, as long as the attack has at most 5 wrong bits, the correct internal value can be extracted.

5.4 Experimental Verification of Our Attack

We have experimentally verified our attack. We run the full attack 16 times, using a random key and nonce. The machine was a virtual machine on the Azure infrastructure (instance Standard_F64s_v2). The machine has 64 vCPUs (Intel Xeon 8168 processor) with 128 GB RAM running Ubuntu 18.04.1 TLS. We have used the official ICEPOLE code (written in C), while our attack was written in C++. Compiling with gcc-7.3.0 using the optimization flag -O3, each of the attack’s instances took about an hour. Its code is available at https://github.com/cryptobiu.

Out of the 16 experiments, 11 recovered the exact internal state. In 4 of them, a single-bit error took place in phase 3 of the attack (resulting in a single-bit error in the proposed internal state). Finally, in one experiment, a single-bit error took place in the first phase, resulting in three bits error (single-bit error in the second phase and in the third-phase). Of course, once the single-bit error in the first phase is fixed, then the errors in the other phases are resolved as well. Hence, we conclude that all experiments succeeded to recover the internal state (or were sufficiently close to the correct one) using \(2^{34.85}\) 2-block plaintexts.

6 Improved Differential-Linear Attack on 8-Round DES

refined analysis of the DLCT, we found out that the attack can be improved by replacing the differential characteristic and the linear approximation with another combination of a characteristic and an approximation, which leads to a higher bias due to dependency between the two underlying subciphers. First we briefly recall the structure of DES and describe the attack of [4], and then we present our improved attack.

In this section we use the DLCT methodology to revisit the DL attack on 8-round DES [35] presented by Biham et al. [4]. We show that the attack can be improved by replacing the differential characteristic and the linear approximation with another combination of a characteristic and an approximation, which leads to a higher bias due to dependency between the two underlying subciphers.

6.1 The DL Attack of [4] on 8-Round DES

The attack of [4] is based on a 7-round DL distinguisher. Denote a 7-round variant of DES by E. The distinguisher uses the decomposition \(E= E_1 \circ E_0\), where \(E_0\) consists of rounds 1–4 and \(E_1\) consists of rounds 5–7. For \(E_0\), it uses the truncated differential

$$ 0x0080 8200~6000 0000 = \varDelta _I \xrightarrow [E_0]{p\,=\,\frac{14}{64}} \varDelta _O = 0x???? M???~00W0 XY0Z, $$

where \(M \in \{0,1,2,\ldots ,7\}, W,X \in \{0,8\},\) and \(Y,Z \in \{0,2\}\). The characteristic is composed of a 1-round characteristic with probability \(\frac{14}{64}\) and a 3-round truncated characteristic with probability 1.

For \(E_1\) (rounds 5–7), it uses the linear approximation

$$ 0x2104 0080~0000 8000 = \lambda _I \xrightarrow [E_1]{q\,=\,2 \cdot (\frac{-20}{64})^2} \lambda _I. $$

Note that all nonzero bits of \(\lambda _I\) are included in the bits that are known to be 0 in \(\varDelta _O\). Using the naive complexity analysis of the DL attack (i.e., Eq. (2) above), the authors of [4] concluded that the overall bias of the approximation is \(2pq^2 = 2^{-5.91}\).

6.2 Our Improved DL Attack on 8-Round DES

At a first glance, it seems unlikely that the distinguisher of [4] can be improved. Indeed, the linear approximation it uses is known to be the best 3-round linear approximation of DES, and the only round in the differential characteristic whose probability is less than 1, is almost the best possible (the highest possible probability being \(\frac{16}{64}\)). In fact, we verified experimentally that for any other combination of a differential with probability \(p'\) and a linear approximation with bias \(q'\), we have \(2p'(q')^2 < 2^{-5.91}\).

Nevertheless, we obtain a higher bias, using the dependency between the subciphers. We decompose E as \(E=E_1' \circ E_m \circ E_0'\), where \(E_0'\) consists of rounds 1–2, \(E_m\) consists of rounds 3–5, and \(E_1'\) consists of rounds 6–7. For \(E_0'\), we use the differential

$$ 0x0020 0008~0000 0400 = \varDelta _I \xrightarrow [E_0']{p'\,=\,\frac{16}{64}}\varDelta _O = 0x6000 0000~0000 0000. $$

For \(E_1'\), we use the linear approximation

$$ 0x0080 8202~0000 0000 = \lambda _I \xrightarrow [E_1']{q'\,=\,\frac{-18}{64}} \lambda _O = 0080 8202~8000 0000. $$

For \(E_m\), we use the DLCT entryFootnote 7

$$ \overline{DLCT}_{E_m}(0x6000 0000~0000 0000, 0x0080 8202~0000 0000) \approx 0.26. $$

Using Eq. (4) above, we find that the overall bias of our approximation is

$$\begin{aligned} 4p' \cdot \overline{DLCT}_{E_m}(\varDelta _O,\lambda _I) \cdot (q')^2 = 4 \cdot \left( \frac{16}{64}\right) ^2 \cdot 0.26 \cdot \left( \frac{-18}{64}\right) ^2 = 2^{-5.6}. \end{aligned}$$
(6)

Since the data complexity of the DL attack is quadratic in the bias, the improvement from \(2^{-5.91}\) to \(2^{-5.6}\) reduces the data complexity of the attack of [4] on 8-round DES by a factor of about 1.5.

Comparison Between our Distinguisher and the Distinguisher of [4]. In order to compare our distinguisher to that of [4], we present the latter within the DLCT framework. It is composed of the differential \(0x0020 0008~0000 0400 \xrightarrow {p'\,=\,\frac{14}{64}} 0x0000 0400~0000 0000\) for \(E_0\), the linear approximation \(0x2104 0080~0000 0000 \xrightarrow {q'\,=\,\frac{-18}{64}} 0x2104 0080~0000 8000\) for \(E_2\), and the DLCT entry \(\overline{DLCT}_{E_m}(0x0000 0400~0000 0000,0x2104 0080~0000 0000) \approx 0.24\). Using Eq. (4), its overall bias is \(2^{-5.81}\). Note that while the value \(p' (q')^2\) in the distinguisher of [4] is larger than the corresponding value in our distinguisher, the overall bias we obtain is higher due to the larger value in the DLCT. This emphasizes that the advantage of our new DL distinguisher stems mainly from dependency between the two subciphers, reflected in the DLCT.

Experimental Verification. We experimentally verified the bias of our DL distinguisher, using 100 different keys and 500,000 plaintext pairs for each key. The average bias found in the experiments was \(2^{-5.58}\), and the standard deviation was \(2^{-10.43}\). This shows that the theoretical estimate of the bias using Eq. (4) is tight in our case, and thus, demonstrates the strength of the DLCT as a tool for accurate evaluation of the DL attack complexity.

For sake of completeness, we verified experimentally also the distinguisher of Biham et al. We checked 100 different keys and 500,000 plaintext pairs for each key. The average bias found in the experiments was \(2^{-5.72}\), and the standard deviation was \(2^{-10.56}\). In addition, we verified that our DL distinguisher has the maximal bias among all 7-round DL distinguishers that start and end with a single active S-box. While we could not check 7-round DL distinguishers with more than one active S-box in the input difference or in the output bias, it seems highly unlikely that such a distinguisher will have a higher bias, even if it exploits the dependency between the subciphers.

Another 7-round DL Distinguisher Used in [4]. The authors of [4] present another 7-round DL distinguisher of DES, which they use in the key recovery attack on 9-round DES. (Its bias is somewhat lower, but it activates less S-boxes in the round before the distinguisher). We checked this distinguisher using the DLCT framework and found that its bias is \(2^{-5.95}\), instead of \(2^{-6.13}\) computed in [4]. We verified experimentally this result as well, and obtained average bias of \(2^{-5.94}\) and standard deviation of \(2^{-10.53}\). This slightly improved bias reduces the data complexity of the attack of [4] on 9-round DES by a factor of about 1.3.

7 Summary and Conclusions

In this paper we studied the effect of the dependency between the subciphers on the differential-linear attack. We showed that in various cases of interest, including previously published DL attacks on Ascon and Serpent, the dependency significantly affects the attack’s complexity. We presented a new tool – the differential-linear connectivity table (DLCT) – which allows to (partially) take the dependency into account and to use it for making DL attacks more efficient. We showed a relation of the DLCT to the Fourier transform and deduced from it a new theoretical insight on the differential-linear attack. Finally, we demonstrated the strength of our new tool, by improving previously published DL attacks against ICEPOLE and 8-round DES.

Our objective in this paper was to introduce the DLCT and to present a few initial applications. Thus, several natural research directions are left for future work. The first is formalizing the relation of the DLCT with consideration of multiple linear approximations, as was done for the basic DL framework by Blondeau, Leander, and Nyberg [10]. The second is finding a way to extend the DLCT methodology so that it will cover more rounds at the boundary between \(E_0\) and \(E_1\). The third direction is studying properties of the DLCT, in a similar way to the way the properties of the BCT were recently studied by Boura and Canteaut [12]. The fourth direction is finding other applications of the DLCT. We believe that the DLCT is a useful generic tool, and so, we expect more applications to be found.