Keywords

1 Introduction

The meet-in-the-middle (MITM) approach is a generic technique for cryptanalysis of symmetric-key primitives, which was first introduced by Diffie and Hellman in 1977 for attacking block ciphers [18]. Many variants of this technique can be found in the literature [10, 19,20,21, 25]. Its basic idea is best illustrated by performing an MITM attack on a block cipher deliberately made susceptible to this type of attacks. Let \(E_K(\cdot )\) be a block cipher whose block size is n-bit such that \(C = E_K(P) = F_{K_2}(F_{K_1}(P))\), where \(K = K_1 || K_2\), and \(K_1\) and \(K_2\) are independent key materials. Therefore, for a given pair of plaintext-ciphertext pair (PC), the intermediate value V can be computed independently as \(F_{K_1}(P)\) and \(F_{K_2}^{-1}(C)\) with independent guesses of \(K_1\) and \(K_2\). The correct key guess necessarily satisfies \(F_{K_1}(P) = F_{K_2}^{-1}(C)\). Therefore, by searching collisions on the intermediate values computed from P and C, one can reduce the search space from \(2^{|K|} = 2^{|K_1| + |K_2|}\) to \(2^{|K_1| + |K_2| - n}\) with time complexity \(2^{|K_1|} + 2^{|K_2|}\). The remaining key space with \(2^{|K_1| + |K_2| - n}\) candidates can be tested against several known plaintext-ciphertext pairs to identify the unique secret key.

However, in practice, it is rare that a target cipher can be clearly separated into two independent halves as the above doubly cascaded F with independent key materials. When a clear separation into two independent chunks is not possible, a variant of the basic MITM strategy (known as three-subset MITM attack) is available. This method was originally proposed by Bogdanov and Rechberger [12], applied to many ciphers [12, 35, 47, 51], and was well summarized by Isobe [33]. Again, let us briefly demonstrate this technique on an ill-designed example with respect the three-subset MITM attack. Let \(E_K(\cdot )\) be a block cipher whose block size is n-bit such that it can be divided into three chunks as \(C = E_K(P) = H_{K_3 || K_2} (G_{K_1||K_2||K_3}(F_{K_1 || K_2}(P)))\), where \(K = K_1 || K_2 || K_3\) and \(K_1\), \(K_2\), \(K_3\) are independent. Moreover, some m-bit (\(m<n\)) information of a state value inside G can be partially computed along the forward direction from \(F_{K_1 || K_2}(P)\) without the knowledge of \(K_3\), or computed along the backward direction from \(H_{K_3 || K_2}^{-1}(C)\) without the knowledge of \(K_1\). The three-subset MITM attack partitions the search space with \(2^{|K|} = 2^{|K_1| + |K_2| + |K_3|}\) elements into \(2^{|K_2|}\) subspaces of equal size according to the value of \(|K_2|\). For each subspace, where the value of \(|K_2|\) is fixed, one can perform the basic MITM attack with partial match to reduce the size of the search space from \(2^{|K_1| + |K_3|}\) to \(2^{|K_1| + |K_3| - m }\) with time complexity \(2^{|K_1|} + 2^{|K_3|}\). Under our terminology, which will be introduced in Sect. 2, one run of the basic version of the MITM attack with a fixed \(K_2\) is called one MITM episode. To identify the correct key, \(2^{|K_2|}\) episodes have to be performed. Therefore, the overall time complexity can be estimated as \(2^{|K_2|}( 2^{|K_1|} + 2^{|K_3|} + 2^{|K_1| + |K_3| - m} )\). This technique has been applied to many block ciphers [10, 12, 33, 34, 47, 51].

Although the MITM technique was originally introduced for attacking block ciphers, its development seems to be largely cultivated and promoted in the cryptanalysis of hash functions. In 2008, Sasaki and Aoki successfully achieved preimage attacks on several full versions of HAVAL by combining the MITM approach with the local collision technique [48]. From then on, many MITM preimage attacks together with their enhancements and improvements targeting various hash functions emerged in the literature [1, 2, 4, 6, 30, 31, 40, 46, 49, 56, 57]. Along the way, several important techniques arise which significantly enhance and enrich the MITM methodology, including the splice-and-cut technique [3], the concept of initial structure [49], (indirect-)partial matching [3, 49], sieve-in-the-middle [15] and match-box technique [27]. Some techniques are formalized as bicliques [11, 39] and further perceived from differential views [26, 40]. These developments in the context of cryptanalysis of hash functions were finally found to be applicable in the MITM attacks on block ciphers. In [58], Wei et al. first applied the splice-and-cut technique to the MITM attacks on block ciphers by connecting the plaintext and ciphertext states with encryption or decryption oracles.

Despite that the principle of how to combine all these techniques in MITM attacks is quite clear, to actually apply them in practice effectively and efficiently is complicated, tedious, and error-prone. Recently, (semi) automatic tools are developed to explore the configuration space of MITM attacks in a more systematic approach. In [47], Sasaki proposed an MILP-based method to search for optimal independent key bits used in the three-subset MITM key-recovery attacks on GIFT [5]. However, Sasaki’s model is not general enough and the possible positions of neutral words are prefixed. At EUROCRYPT 2021, the MITM preimage attacks on AES-like hashing was throughly modeled as constrained optimization problems which were solved with MILP techniques [6]. This approach outperforms previous work done manually, and many attacks on AES-like hashing [41, 46, 59] are shown to have room to be further improved. However, this method is described in a way specific to preimage attacks and do not translate directly to MITM-based key-recovery or collision attacks.

Our Contribution. We describe the MITM attacksFootnote 1 in a unified way as MITM attacks on the so-called closed computation path. This view has been long known to our community. Nevertheless, we believe that our treatment is more formal and general. In particular, by introducing some new concepts, we make the description of MITM attacks more expressive and accurate.

Then, we focus our attention on MITM key-recovery and collision attacks on block ciphers and hash functions. We identify the peculiarities specific to these scenarios and show how to deal with them automatically. For the MITM characteristics employed in key-recovery attacks, the degrees of freedom originated from the states in the key schedule data path must not be depleted, while the degrees of freedom originated from the encryption data path must be used up. Also, when searching for candidate configurations for the MITM key-recovery attacks, we should avoid those configurations that lead to attacks requiring the full codebook. We apply our methods to concrete block ciphers SKINNY and ForkSkinny. and we identify the first 23-round attack on SKINNY-n-3n in the single-key model, penetrating one more round than the designers have expected: We conclude that meet-in-the-middle attack may work up to at most 22 rounds [9, Sect. 4.2, page 22]. Interestingly, the characteristics we employed in these attacks impose nontrivial constraints on the neutral words from the key states, which has not been seen before. For collision attacks, they are based on a generalized version of the t-cell partial target preimage attacks, where the words of the target value fulfill t (word-oriented) equations.

Table 1. Single-key attacks (SK) on SKINNY-n-3n and ForkSkinny-n-3n, where ID and DS-MITM denote impossible differential and Demirci-Selçuk MITM attacks, respectively.

Finally, we perform MITM preimage and collision attacks on concrete hash functions (e.g., Romulus-H [36], Saturnin [14], WHIRLPOOL [8], and Grøstl [28]). In the attacks on certain hash functions, we encounter some special MITM characteristics where the neutral words are nonlinearly constrained. In previous work, the neutral words are linearly constrained and thus the solution space of the neutral words can be obtained efficiently by solving the corresponding system of linear equations. For nonlinear equations, this approach would significantly increase the complexities. We propose a technique that is applicable to both the non-linearly and linearly constrained neutral words, overcoming this difficulty without increasing the time complexity of the attacks. Based on this technique, we improve the (pseudo) preimage attacks on round-reduced Grøstl-256 and its output transformation by one round. For collision attacks, the first 6-round classical collision attack on WHIRLPOOL is provided, breaking a 10-year record for collision attacks on WHIRLPOOL in the classical setting. Also, we give the first 6-round collision attack and 8-round collision attack on the output transformations of Grøstl -256 and Grøstl -512, respectively. Interestingly, we notice that all competitive collision attacks on these AES-like hashings are based on the rebound technique [44]. In addition, we offer the first third-party cryptanalysis of Saturnin-Hash [14], a second round candidate of the NIST LWC project. A summary of our results on concrete primitives is given in Table 1 and Table 2. The source code of the paper is available at https://github.com/siweisun/mitm-attacks-revisited.

Table 2. A Summary of the results. Note that we only consider preimage and collision attacks. Distinguishing attacks [13, 37, 42, 50] are not included. Also, note that the complexity of the preimage attack on Romulus-H is \(2^{248}\). This attack does not break 23-round Romulus-H since the designers only claim 128-bit security. However, this complexity is better than an exhaustive search, whose complexity is \(2^{256}\). Similarly, Saturnin claims only 224-bit security.

2 A Formal Description of the MITM Technique

We now formally describe the MITM attacks with the notations introduced by Bao et al.’s work [6] in a more unified way. We encourage the readers to carefully go through this section since it not only serves as a recall of Bao et al.’s work, but also introduces some new terminologies that enhance the expressiveness and accuracy of the descriptions of MITM attacks.

Given a computation path that forms a “closed loop”, the ultimate goal of the meet-in-the-middle attack is to find a particular value for some intermediate states with which the values for all the states involved in the computation path can be determined, such that the values are compatible with the whole computation path (there are no conflicts between the values due to the involved computation). Let us descend from the abstract highland and consider the closed computation path shown in Fig. 1. The upper segment of the computation path constitutes an iterative block cipher with an iterative key schedule, and we assume that the states involved in the encryption data path and key schedule data path contains n and \(\bar{n}\) w-bit words respectively, which are typically visualized as rectangles with n and \(\bar{n}\) cells, respectively. The lower segment of the computation path can be arbitrary. In our context, it can be an oracle of the block cipher appearing in the upper segment of the computation path when we consider an MITM key-recovery attack, or a simple exclusive-or of a given target value when we consider preimage attacks. Before we can perform an MITM attack on the computation path, a configuration or an MITM characteristic has to be identified.

Fig. 1.
figure 1

A high-level overview of the MITM attacks

MITM Characteristics and Their Visualization. The MITM attack entails the identification of several special states: the starting state \(\# S^{\mathtt {ENC}}\) (see Fig. 1) in the encryption data path, the starting state \(\# S^{\mathtt {KSA}}\) in the key schedule data path, the ending state \(\# E^{+}\) for the forward computation (the computation path starting from \((\# S^{\mathtt {ENC}}, \# S^{\mathtt {KSA}})\) leading to \(\# E^{+}\)), and the ending state \(\# E^{-}\) for the backward computation (the computation path starting from \((\# S^{\mathtt {ENC}}, \# S^{\mathtt {KSA}})\) leading to \(\# E^{-}\)). Moreover, the cells of \((\# S^{\mathtt {ENC}}, \# S^{\mathtt {KSA}})\) are partitioned into different subsets with different meanings. Let \(\mathcal {B}^\mathtt {ENC}\), \(\mathcal {B}^\mathtt {KSA}\), \(\mathcal {R}^\mathtt {ENC}\), \(\mathcal {R}^\mathtt {KSA}\), \(\mathcal {M}^{+}\), and \(\mathcal {M}^{-}\) be some ordered subsets of \(\mathcal {N} = \{0, 1, \cdots , n-1 \}\) or \(\overline{\mathcal {N}} = \{0, 1, \cdots , \bar{n}-1 \}\) such that \(\mathcal {B}^\mathtt {ENC} \cap \mathcal {R}^\mathtt {ENC} = \emptyset \), \(\mathcal {B}^\mathtt {KSA} \cap \mathcal {R}^\mathtt {KSA} = \emptyset \), \(\mathcal {G}^{\texttt {ENC}} = \mathcal {N} - \mathcal {B}^{\mathtt {ENC}} \cup \mathcal {R}^{\mathtt {ENC}}\) and \(\mathcal {G}^{\texttt {KSA}} = \overline{\mathcal {N}} - \mathcal {B}^{\mathtt {KSA}} \cup \mathcal {R}^{\mathtt {KSA}}\). We will use these index sets to reference the cells of the states. For example, for a 16-cell state \(\#S\) and \(\mathcal {M}^{+} = [0, 1, 3]\), we have \(\#S[\mathcal {M}^{+}] = \#S[0,1,3] = (\#S[0], \#S[1], \#S[3])\).

The cells \((\# S^{\mathtt {ENC}}[\mathcal {B}^\mathtt {ENC}], \# S^{\mathtt {KSA}}[\mathcal {B}^\mathtt {KSA}])\), visualized as cells, are called neutral words of the forward computation, and the cells \((\# S^{\mathtt {ENC}}[\mathcal {R}^\mathtt {ENC}], \# S^{\mathtt {KSA}}[\mathcal {R}^\mathtt {KSA}])\), visualized as cells, are called neutral words of the backward computation. The initial degrees of freedom for the forward and backward computation are defined as \(\lambda ^{+} = |\mathcal {B}^\mathtt {ENC}| + |\mathcal {B}^\mathtt {KSA}|\) and \(\lambda ^{-} = |\mathcal {R}^\mathtt {ENC}| + |\mathcal {R}^\mathtt {KSA}|\) respectively, that is, the numbers of cells and cells in the starting states. In addition, \(E^{+}[\mathcal {M}^{+}]\) are visualized as cells, and \(E^{-}[\mathcal {M}^{-}]\) are visualized as cells. Finally, \(\# S^{\mathtt {ENC}}[\mathcal {G}^{\texttt {ENC}}]\) and \(\# S^{\mathtt {KSA}}[\mathcal {G}^{\texttt {KSA}}]\) are visualized as cells.

We then define a sequence of \(l^{+}\) functions \(\boldsymbol{\pi ^{+}} = ( \pi ^{+}_1, \cdots , \pi ^{+}_{l^{+}})\) whose values can be computed with the knowledge of the cells \((\# S^{\mathtt {ENC}}[\mathcal {G}^{\texttt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\texttt {KSA}}])\) and cells \((\# S^{\mathtt {ENC}}[\mathcal {B}^\mathtt {ENC}], \# S^{\mathtt {KSA}}[\mathcal {B}^\mathtt {KSA}])\) in the starting states, where

$$\pi ^{+}_i: \mathbb {F}_2^{w \cdot (|\mathcal {G}^{\texttt {ENC}}| + |\mathcal {G}^{\texttt {KSA}}| + |\mathcal {B}^\mathtt {ENC}| + |\mathcal {B}^\mathtt {KSA}|) } \rightarrow \mathbb {F}_2^w$$

is a function mapping \((\# S^{\mathtt {ENC}}[\mathcal {G}^{\texttt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\texttt {KSA}}], \# S^{\mathtt {ENC}}[\mathcal {B}^\mathtt {ENC}], \# S^{\mathtt {KSA}}[\mathcal {B}^\mathtt {KSA}] )\) to a w-bit word \(\pi ^{+}_i(\# S^{\mathtt {ENC}}[\mathcal {G}^{\texttt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\texttt {KSA}}], \# S^{\mathtt {ENC}}[\mathcal {B}^\mathtt {ENC}], \# S^{\mathtt {KSA}}[\mathcal {B}^\mathtt {KSA}] )\). Similarly, we define a sequence of \(l^{-}\) functions \(\boldsymbol{\pi ^{-}} = ( \pi ^{-}_1, \cdots , \pi ^{-}_{l^{-}})\) whose values can be computed with the knowledge of the cells \((\# S^{\mathtt {ENC}}[\mathcal {G}^{\texttt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\texttt {KSA}}])\) and cells \((\# S^{\mathtt {ENC}}[\mathcal {R}^\mathtt {ENC}], \# S^{\mathtt {KSA}}[\mathcal {R}^\mathtt {KSA}])\). \(\boldsymbol{\pi ^{+}}\) and \(\boldsymbol{\pi ^{-}}\) will be used to represent certain constraints on the neutral words of the forward and backward computations, respectively. A valid MITM characteristic satisfies the following property.

Property 1

For any fixed \(\mathfrak {c}^{+} = (a_1, \cdots , a_{l^{+}}) \in \mathbb {F}_2^{ w \cdot l^{+} }\) and \(\mathfrak {c}^{-} = (b_1, \cdots , b_{l^{-}}) \in \mathbb {F}_2^{ w \cdot l^{-} }\), when the cells \((\# S^{\mathtt {ENC}}[\mathcal {G}^{\texttt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\texttt {KSA}}] )\) are fixed to an arbitrary constant, and the neutral words for the forward computation and backward computation paths fulfill the following systems of equations:

$$\begin{aligned} {\left\{ \begin{array}{ll} \pi ^{+}_1(\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}], \# S^{\mathtt {ENC}}[\mathcal {B}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {B}^{\mathtt {KSA}}] ) = a_1\\ \pi ^{+}_2(\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}], \# S^{\mathtt {ENC}}[\mathcal {B}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {B}^{\mathtt {KSA}}] ) = a_2\\ ~~~~~~~~~~~~~~~~~~\cdots ~~\cdots \\ \pi ^{+}_{l^{+}}(\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}], \# S^{\mathtt {ENC}}[\mathcal {B}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {B}^{\mathtt {KSA}}] ) = a_{l^{+}} \end{array}\right. } \end{aligned}$$
(1)

and

$$\begin{aligned} {\left\{ \begin{array}{ll} \pi ^{-}_1(\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}], \# S^{\mathtt {ENC}}[\mathcal {R}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {R}^{\mathtt {KSA}}] ) = b_1\\ \pi ^{-}_2(\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}], \# S^{\mathtt {ENC}}[\mathcal {R}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {R}^{\mathtt {KSA}}] ) = b_2\\ ~~~~~~~~~~~~~~~~~~\cdots ~~\cdots \\ \pi ^{-}_{l^{-}}(\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}], \# S^{\mathtt {ENC}}[\mathcal {R}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {R}^{\mathtt {KSA}}] ) = b_{l^{-}} \end{array}\right. } \end{aligned}$$
(2)

respectively, then the values of the cells \(\# E^{+}[\mathcal {M}^{+}]\) can be derived from the starting states \((\# S^{\mathtt {ENC}}, \# S^{\mathtt {KSA}})\) along the forward computation path without the knowledge of the neutral words for the backward computation, and the values of the cells \(\# E^{-}[\mathcal {M}^{-}]\) can be derived from the starting states \((\# S^{\mathtt {ENC}}, \# S^{\mathtt {KSA}})\) along the backward computation path without the knowledge of the neutral words for the forward computation. In short, computations for deriving \(\# E^{-}[\mathcal {M}^{+}]\) and \(\# E^{-}[\mathcal {M}^{-}]\) can be carried out independently.

Let us talk more about Property 1. For any given \((\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}])\) and \(\mathfrak {c}^{+} = (a_1, \cdots , a_{l^{+}})\), the solution space of \((\# S^{\mathtt {ENC}}[\mathcal {B}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {B}^{\mathtt {KSA}}])\) induced by Eq. (1) is denoted by

$$ \mathbb {B}(\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}], \mathfrak {c}^{+}). $$

Since there are \(\lambda ^{+} = |\mathcal {B}^{\mathtt {ENC}}| + |\mathcal {B}^{\mathtt {KSA}}|\) w-bit variables and \(l^{+}\) equations, we expect \(2^{w \cdot (\lambda ^{+} - l^{+})}\) solutions, and we call \(\mathrm {DoF}^{+} = \lambda ^{+} - l^{+}\) the degrees of freedom for the forward computation. Similarly, the solution space of \((\# S^{\mathtt {ENC}}[\mathcal {R}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {R}^{\mathtt {KSA}}])\) induced by Eq. (2) is denoted by \(\mathbb {R}(\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}], \mathfrak {c}^{-})\). Since there are \(\lambda ^{-} = |\mathcal {R}^{\mathtt {ENC}}| + |\mathcal {R}^{\mathtt {KSA}}|\) w-bit variables and \(l^{-}\) equations, we expect \(2^{w \cdot (\lambda ^{-} - l^{-})}\) solutions, and we call \(\mathrm {DoF}^{-} = \lambda ^{-} - l^{-}\) the degrees of freedom for the backward computation.

Let \(F^{+}\) be the function computing \(\# E^{+}[\mathcal {M}^{+}]\) from \((\# S^{\mathtt {ENC}}, \# S^{\mathtt {KSA}})\), that is, \(\# E^{+}[\mathcal {M}^{+}]\) can be computed as

$$\begin{aligned} F^{+}( \# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], ~\# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}],~ \# S^{\mathtt {ENC}}[\mathcal {B}^{\mathtt {ENC}}],\# S^{\mathtt {KSA}}[\mathcal {B}^{\mathtt {KSA}}], \# S^{\mathtt {ENC}}[\mathcal {R}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {R}^{\mathtt {KSA}}] ), \end{aligned}$$

and similarly, \(\# E^{-}[\mathcal {M}^{-}]\) can be computed as

$$\begin{aligned} F^{-}( \# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}], \# S^{\mathtt {ENC}}[\mathcal {B}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {B}^{\mathtt {KSA}}], \# S^{\mathtt {ENC}}[\mathcal {R}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {R}^{\mathtt {KSA}} ). \end{aligned}$$

Property 1 implies that

$$\begin{aligned} F^{-}( \alpha , x, \# S^{\mathtt {ENC}}[\mathcal {R}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {R}^{\mathtt {KSA}}] ) = F^{-}( \alpha , y, \# S^{\mathtt {ENC}}[\mathcal {R}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {R}^{\mathtt {KSA}}] ) \end{aligned}$$

for any given \(x, y \in \mathbb {B}(\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}], \mathfrak {c}^{+})\) and \(\alpha \in \mathbb {F}_2^{|\mathcal {G}^{\mathtt {ENC}}|+|\mathcal {G}^{\mathtt {KSA}}|}\). Similarly, for any \(u, v \in \mathbb {R}(\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}], \mathfrak {c}^{-})\), we have

$$\begin{aligned} F^{+}( \alpha , \# S^{\mathtt {ENC}}[\mathcal {B}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {B}^{\mathtt {KSA}}], u) = F^{+}( \alpha , \# S^{\mathtt {ENC}}[\mathcal {B}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {B}^{\mathtt {KSA}}], v). \end{aligned}$$

Consequently, for any given \((\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}]) = \alpha \), and \(\mathfrak {c}^{+}\), and \(\mathfrak {c}^{-}\), we can perform a matching process given in Algorithm 1.

figure n

In real MITM attacks, Algorithm 1 will be performed multiple times for many different \(\alpha \), \(\mathfrak {c}^{+}\), and \(\mathfrak {c}^{-}\), each time is called one MITM episode. Variables that remain constant within each episode are called episodic constants, and variables remain constant in the whole life cycle of an attack (remaining constant across different episodes) are called global constants. Thus global constants are always episodic constants. The cells used in [6] and this work capture the episodic constants, whose values can change across different episodes.

Within each episode, \((2^w)^{\mathrm {DoF}^{+}}\) times of forward computation are carried out, and \((2^w)^{\mathrm {DoF}^{-}}\) times of backward computations are carried out, which are referred to as forward threads and backward threads. Each forward thread and backward thread within the same episode gives a pair of values for \((\# E^{+}[\mathcal {M}^+], \# E^{-}[\mathcal {M}^-])\) which are computed along the forward and backward computation paths from a common value of the starting states \((\# S^{\mathtt {ENC}}, \# S^{\mathtt {KSA}})\), and thus can be tested for match according to the computation connecting \(\#E^{+}\) and \(\#E^{-}\) in the closed loop. Note that testing pairs computed from different values of the starting point (e.g., pairs formed from different episodes) is meaningless. In each episode, we have \((2^w)^{\mathrm {DoF}^{+}+\mathrm {DoF}^{-}}\) paired threads. If the computation connecting \(\# E^{+}[\mathcal {M}^+]\) and \(\#E^{-}[\mathcal {M}^-]\) forms an m-cell filter, then there are about \((2^w)^{\mathrm {DoF}^{+}+\mathrm {DoF}^{-} - m}\) paired threads will pass the filter and be tested for a full match. We call \(\mathrm {DoM} = m\) the degrees of match or the strength of the filter. Finally, we emphasize again that the MITM procedure given in Algorithm 1 is performed for some fixed \((\# S^{\mathtt {ENC}}[\mathcal {G}^{\texttt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\texttt {KSA}}],\mathfrak {c}^{+}, \mathfrak {c}^{-})\), and we say \((\# S^{\mathtt {ENC}}[\mathcal {G}^{\texttt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\texttt {KSA}}],\mathfrak {c}^{+}, \mathfrak {c}^{-})\) defines the context of the MITM episode.

Automatic Search for MITM Characteristics. For a given closed computation path shown in Fig. 1, a configuration of the states \(\# S^{\mathtt {ENC}}\), \(\# S^{\mathtt {KSA}}\), \(\# E^{+}\), \(\# E^{-}\), and the parameters \(\mathcal {B}^\mathtt {ENC}\), \(\mathcal {B}^\mathtt {KSA}\), \(\mathcal {R}^\mathtt {ENC}\), \(\mathcal {R}^\mathtt {KSA}\), \(\mathcal {M}^{+}\), \(\mathcal {M}^{-}\), \(\mathrm {DoF}^{+}\), \(\mathrm {DoF}^{-}\), \(\boldsymbol{\pi ^{+}}\), \(\boldsymbol{\pi ^{-}}\), and \(\mathrm {DoM}\) satisfying Property 1 is called an MITM characteristic. At EUROCRYPT 2021, Bao et al. presented an MILP-based method for finding optimal MITM characteristics for preimage attacks, and we refer the reader to [6] for more details. Here, we only mention that an MILP characteristic can be visualized with the following coloring scheme on the states of the closed computation path and the ith cell of a state \(\#S\) is encoded with a pair of 0-1 variables \((x_i^{\#S}, y_i^{\#S})\) in the MILP models according to the following rule:

  •   Gray (G), \((x_i^{\#S}, y_i^{\#S})=(1,1)\): known episodic constants.

  •   Red (R), \( (x_i^{\#S}, y_i^{\#S})=(0,1)\): neutral words for backward computation or dependent on cells and neutral words for backward computation.

  •   Blue (B), \((x_i^{\#S}, y_i^{\#S})=(1,0)\): neutral words for forward computation or dependent on cells and neutral words for forward computation.

  •   White (W), \((x_i^{\#S}, y_i^{\#S})=(0,0)\): dependent on cells in the backward computation or dependent on cells in the forward computation.

3 Automatic MITM Key-Recovery Attacks

We describe the MITM key-recovery attack on a block cipher based on Fig. 1 with the lower segment being an encryption or decryption oracle. Before going any further, we introduce some new notations. The initial degrees of freedom from the encryption and key schedule data paths for the forward computation are defined as \(\lambda _{\mathtt {ENC}}^{+} = |\mathcal {B}^{\mathtt {ENC}}|\) and \(\lambda _{\mathtt {KSA}}^{+} = |\mathcal {B}^{\mathtt {KSA}}|\), respectively. Similarly, The initial degrees of freedom from the encryption and key schedule data paths for the backward computation are defined as \(\lambda _{\mathtt {ENC}}^{-} = |\mathcal {R}^{\mathtt {ENC}}|\) and \(\lambda _{\mathtt {KSA}}^{-} = |\mathcal {R}^{\mathtt {KSA}}|\), respectively. Under these notations, we have \(\lambda ^{+} = \lambda _{\mathtt {ENC}}^{+} + \lambda _{\mathtt {KSA}}^{+}\) and \(\lambda ^{-} = \lambda _{\mathtt {ENC}}^{-} + \lambda _{\mathtt {KSA}}^{-}\).

For an MITM characteristic, we say that the degrees of freedom from the encryption data path for the forward computation is used up if for any given \((\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}], \mathfrak {c}^{+})\), we partition the solution space

$$\mathbb {B}(\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}], \mathfrak {c}^{+})$$

of \((\# S^{\mathtt {ENC}}[\mathcal {B}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {B}^{\mathtt {KSA}}])\) due to Eq. (1) into subspaces according to the value of \(\# S^{\mathtt {KSA}}[\mathcal {B}^{\mathtt {KSA}}]\), then each space contains exactly one element. That is, the values of the cells in \(\#S^{\mathtt {ENC}}\) can be fully determined by the cells in \(\#S^{\mathtt {KSA}}\) for a given \((\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}], \mathfrak {c}^{+})\). Similarly, we say that the degrees of freedom from the encryption data path for the backward computation is used up if the values of the cells in \(\#S^{\mathtt {ENC}}\) can be fully determined by the cells in \(\#S^{\mathtt {KSA}}\) for a given \((\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}], \mathfrak {c}^{-})\).

Now, Let us recall from Sect. 2 that the goal of the MITM attack is to find a particular value for some intermediate states in the closed computation path shown in Fig. 1 with which the values for all the states involved in the computation path can be determined, such that the values derived are compatible with the whole computation path. Specifically, in the context of MITM key-recovery attacks, our goal can be formulated as follows.

Goal 1

Identify a value K for the key register hosting the master key, and a value for one full state in the encryption data path, with which we can derive the values of all states involved. We require that the values for all states are compatible and K equals to the secret key hiding in the oracle.

The above goal indicates that in the MITM key-recovery attack, the full key space must be (implicitly) tested, since a compatible assignment of values to the states is not enough (unlike MITM preimage attacks), and we must identify the unique secret key. Secondly, in the key-recovery attack, we prefer not to exhaust the full codebook of the targeted cipher. These particularities result in the following requirements for the MITM characteristic:

  1. I.

    The degrees of freedom for the forward computation or backward computation from \(\# S^{\mathtt {KSA}}\) cannot be depleted (i.e., \(\mathrm {DoF}^{+} > 0\) and \(\mathrm {DoF}^{-} > 0\)), while the degrees of freedom for both the forward computation and backward computation from \(\# S^{\mathtt {ENC}}\) should be used up.

  2. II.

    In the MITM characteristic, we require that there is at least one cell (episodic constant) in the plaintext state, which will be set to global constant in the actual attack to avoid using the full codebook.

To ensure (I), we require the corresponding systems of equations of the MITM characteristic given in Eqs. (1) and (2) to satisfy the following conditions. For Eq. (1), there are \(l^{+}_{\mathtt {KSA}}\) equations (without loss of generality, we assume these are the first \(l^{+}_{\mathtt {KSA}}\) equations) do not involve \(\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}]\) and \(S^{\mathtt {ENC}}[\mathcal {B}^{\mathtt {ENC}}]\). The remaining \(l^{+} - l^{+}_{\mathtt {KSA}}\) equations are used to exhaust the degrees of freedom from the encryption data path, and thus \(|\lambda _{\mathtt {ENC}}^{+}| = |\mathcal {B}^{\mathtt {ENC}}| = l^{+} - l^{+}_{\mathtt {KSA}}\). Under this, we have \(\mathrm {DoF}^{+} = \lambda _{\mathtt {KSA}}^{+} - l_{\mathtt {KSA}}^{+}\). In addition, for each constant \(( \# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}], \mathfrak {c}^{+} )\), and each solution for \(\#S^{\mathtt {KSA}}[\mathcal {B}^{\mathtt {KSA}}]\) of the first \(l^{+}_{\mathtt {KSA}}\) equations, we can derive one and only one solution for \(\# S^{\mathtt {ENC}}[\mathcal {B}^{\mathtt {ENC}}]\) by solving the remaining equations. For Eq. (2), there are \(l^{-}_{\mathtt {KSA}}\) equations (without loss of generality, we assume these are the first \(l^{-}_{\mathtt {KSA}}\) equations) do not involve \(\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}]\) and \(S^{\mathtt {ENC}}[\mathcal {R}^{\mathtt {ENC}}]\). The remaining \(l^{-} - l^{-}_{\mathtt {KSA}}\) equations are used to exhaust the degrees of freedom from the encryption data path, and thus \(|\lambda _{\mathtt {ENC}}^{-}| = |\mathcal {R}^{\mathtt {ENC}}| = l^{-} - l^{-}_{\mathtt {KSA}}\). Under this, we have \(\mathrm {DoF}^{-} = \lambda _{\mathtt {KSA}}^{-} - l_{\mathtt {KSA}}^{-}\). In addition, for each constant \(( \# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}], \mathfrak {c}^{-} )\), and each solution for \(\#S^{\mathtt {KSA}}[\mathcal {R}^{\mathtt {KSA}}]\) of the first \(l^{-}_{\mathtt {KSA}}\) equations, we can derive one and only one solution for \(\# S^{\mathtt {ENC}}[\mathcal {R}^{\mathtt {ENC}}]\) by solving the remaining equations.

Requirement (I) may be less obvious than (II), and we will explain it by looking into the algorithmic framework given in Algorithm 2. But before we go into the details, we emphasize that due to these peculiarities, almost all MITM characteristics found by the the method presented in [6] are useless in the context of key-recovery attacks.

figure ac

From now on, we use \(|\#S|\) denote the number of cells in a state \(\#S\). In Line 1 of Algorithm 2, we set \(|\# S^{\mathtt {ENC}}|\) gray cells, including all the gray cells in the plaintext state to global constants, where \(|\# S^{\mathtt {ENC}}|\) denotes the number of cells in \(\# S^{\mathtt {ENC}}\). Since the gray cells in the plaintext states are set to global constant, the attack will not use the full codebook. These \(|\# S^{\mathtt {ENC}}|\) gray cells are not necessarily within one single state along the computation path. Instead, they can be distributed over multiple states. Moreover, we require that the values of these cells can be set independently to arbitrary values without leading to a conflict along the computation path (excluding the computations connecting the ending states). When these constants are set, for any given key, we can derive the values of all the states (including \(\# S^{\mathtt {ENC}}\)), along the computation path (excluding the computation connecting the ending states), which indicates that if the degrees of freedom of \(\# S^{\mathtt {ENC}}\) are not exhausted, this constant setting process may lead to conflicts, which is equivalent to setting more than \(|\# S^{\mathtt {ENC}}|\) cells of \(\# S^{\mathtt {ENC}}\) to constants. Then, each MITM episode is performed within the context defined by the outer loops surrounding the code segment from Line 8 to Line 15.

Complexity Analysis. In Line 2 of Algorithm 2, suppose there are \(\varepsilon \) gray cells in the plaintext state, then the data complexity \((2^w)^{n-\varepsilon }\). Suppose the states in the encryption data and key schedule data paths contains n and \(\bar{n}\) cells, respectively, and the matching part forms an m-cell filter. According Algorithm 2, there are \( (2^w)^{\bar{n} - \lambda _{\mathtt {KSA}}^+ - \lambda _{\mathtt {KSA}}^- } \cdot (2^w)^{ l^{+}_{\mathtt {KSA}} } \cdot (2^w)^{l^{-}_{\mathtt {KSA}} } = (2^w)^{\bar{n}-(\mathrm {DoF}^{+}+\mathrm {DoF}^{-})} \) MITM episodes, and in each episode \((2^w)^{\mathrm {DoF}^{+}+\mathrm {DoF}^{-}}\) different keys are tested, where \((2^w)^{\mathrm {DoF}^{+}+\mathrm {DoF}^{-} - m}\) of them will pass the m-cell filter. Therefore, the overall time complexity can be estimated as \( (2^w)^{\bar{n} - \mathrm {DoF}^{+} - \mathrm {DoF}^{+}} ( (2^w)^{\mathrm {DoF}^{+}} + (2^w)^{\mathrm {DoF}^{-}} + (2^w)^{\mathrm {DoF}^{+}+\mathrm {DoF}^{-} - m} )\), which is approximately

$$\begin{aligned} (2^w)^{\bar{n} - \min \{\mathrm {DoF}^{+}, \mathrm {DoF}^{-}, m \}}. \end{aligned}$$
(3)

4 MITM Attacks on SKINNY and ForkSkinny

SKINNY is a family of lightweight block ciphers designed by Beierle et al. [9] based on the TWEAKEY framework [38]. In this section, we apply our method to SKINNY-n-3n (The version with an n-bit block size, a 3n-bit key, and a 0-bit tweak) with \(n \in \{64, 128 \}\). The overall structure of SKINNY-n-3n and its round function are given in Fig. 2.

Fig. 2.
figure 2

The hight-level structure of SKINNY-n-3n and its round function (Thanks to https://www.iacr.org/authors/tikz/).

The internal state is viewed as a \(4 \times 4\) square with 16 cells. In each round, the state is updated with five operations: SubCells (SC), AddConstants (AC), AddRoundTweakey (ART), ShiftRows (SR) and MixColumns (MC). The key register is arranged into three \(4 \times 4\) squares denoted as \(TK_1\), \(TK_2\), and \(TK_3\) respectively. Note that the in each round only the first two rows of the internal state are affected by ART, and the MC operation is non-MDS and thus quite different from the AES-like structures analyzed in [6]. Specifically, we have

$$\begin{aligned} \mathtt{MC} \begin{pmatrix} a\\ b\\ c\\ d \end{pmatrix} = \begin{pmatrix} a \oplus c \oplus d\\ a\\ b \oplus c\\ a \oplus c \end{pmatrix}~~ \mathrm {and}~~ \mathtt{MC} ^{-1} \begin{pmatrix} \alpha \\ \beta \\ \gamma \\ \delta \end{pmatrix} = \begin{pmatrix} \beta \\ \beta \oplus \gamma \oplus \delta \\ \beta \oplus \delta \\ \alpha \oplus \delta \end{pmatrix}. \end{aligned}$$
(4)

4.1 Programming the MITM Attacks on SKINNY-n-3n with MILP

Based on the analysis of Sect. 3, we show how to build the MILP model for finding MITM characteristics of SKINNY-n-3n in the context of key-recovery attacks. We employ the same encoding scheme from [6], where the ith cell of a state \(\#S\) is encoded with a pair of 0-1 variables \((x_i^{\#S}, y_i^{\#S})\) according to the rule given in Sect. 2. Firstly, due to the complexity estimation given by Eq. (3), \(\min \{\mathrm {DoF}^+,\mathrm {DoF}^-,\mathrm {DoM}\}\) should be maximized in our model. To this end, we introduce an auxiliary variable \(v_\mathtt{Obj}\), impose the constraints

$$ \{ v_\mathtt{Obj} \le \mathrm {DoF}^+, v_\mathtt{Obj} \le \mathrm {DoF}^-, v_\mathtt{Obj} \le \mathrm {DoM}\} $$

and set the objective function to maximize \(v_\mathtt{Obj}\). In what follows, we describe the constraints for the starting states, ending states, and the states in the computation paths with a special focus on what is different from Bao et al.’s work [6]. First of all, the tweakey schedule algorithm of SKINNY-n-3n only involves in-cell operations and permutations changing the positions of the cells in the tweakey register, which will not alter the color of a cell in our model (only their positions are changed). Therefore, we will not discuss the constraints imposed solely by the tweakey schedule algorithm in the following.

Constraints for the Starting States. As discussed in Sect. 3, we distinguish the sources of degrees of freedom from the encryption data path (denoted by \(\lambda _{\mathtt {ENC}}^+\) and \(\lambda _{\mathtt {ENC}}^-\)) and the key schedule data path (denoted by \(\lambda _{\mathtt {KSA}}^+\) and \(\lambda _{\mathtt {KSA}}^-\)), and the initial degrees of freedom satisfies \(\lambda ^+ = \lambda _{\mathtt {ENC}}^+ + \lambda _{\mathtt {KSA}}^+\) and \(\lambda ^- = \lambda _{\mathtt {ENC}}^- + \lambda _{\mathtt {KSA}}^-\), where \(\lambda _{\mathtt {ENC}}^+ = |\mathcal {B}^{\mathtt {ENC}}|\), \(\lambda _{\mathtt {KSA}}^+ = |\mathcal {B}^{\mathtt {KSA}}|\), \(\lambda _{\mathtt {ENC}}^- = |\mathcal {R}^{\mathtt {ENC}}|\), and \(\lambda _{\mathtt {KSA}}^- = |\mathcal {R}^{\mathtt {KSA}}|\). We introduce two variables \(\alpha _i\) and \(\beta _i\) for each cell in \((\# S^{\mathtt {ENC}}, \# S^{\mathtt {KSA}})\), where \(\alpha _i=1\) if and only if \((x_i^{\#S}, y_i^{\#S})=(1,0)\) and \(\beta _i=1\) if and only if \((x_i^{\#S}, y_i^{\#S})=(0,1)\). Then we have the following constraints:

$$\begin{aligned} \lambda _{\mathtt {ENC}}^+= {\sum }_i \alpha _i^{\mathtt {ENC}},~ \lambda _{\mathtt {KSA}}^+= {\sum }_i \alpha _i^{\mathtt {KSA}},~ \lambda _{\mathtt {ENC}}^-= {\sum }_i \beta _i^{\mathtt {ENC}}, ~ \lambda _{\mathtt {KSA}}^-= {\sum }_i \beta _i^{\mathtt {KSA}}, \end{aligned}$$

and

$$\begin{aligned} {\left\{ \begin{array}{ll} x_i^{\#S^{\mathtt {ENC}}} - \alpha _i^{\mathtt {ENC}} \ge 0 \\ y_i^{\#S^{\mathtt {ENC}}} - x_i^{\#S^{\mathtt {ENC}}} + \alpha _i^{\mathtt {ENC}} \ge 0 \\ y_i^{\#S^{\mathtt {ENC}}} + \alpha _i^{\mathtt {ENC}} \le 1 \end{array}\right. }, \qquad {\left\{ \begin{array}{ll} y_i^{\#S^{\mathtt {ENC}}} - \beta _i^{\mathtt {ENC}} \ge 0 \\ x_i^{\#S^{\mathtt {ENC}}} - y_i^{\#S^{\mathtt {ENC}}} + \beta _i^{\mathtt {ENC}} \ge 0 \\ x_i^{\#S^{\mathtt {ENC}}} + \beta _i^{\mathtt {ENC}} \le 1 \end{array}\right. }, \end{aligned}$$
$$\begin{aligned} {\left\{ \begin{array}{ll} x_i^{\#S^{\mathtt {KSA}}} - \alpha _i^{\mathtt {KSA}} \ge 0 \\ y_i^{\#S^{\mathtt {KSA}}} - x_i^{\#S^{\mathtt {KSA}}} + \alpha _i^{\mathtt {KSA}} \ge 0 \\ y_i^{\#S^{\mathtt {KSA}}} + \alpha _i^{\mathtt {KSA}} \le 1 \end{array}\right. }, \qquad {\left\{ \begin{array}{ll} y_i^{\#S^{\mathtt {KSA}}} - \beta _i^{\mathtt {KSA}} \ge 0 \\ x_i^{\#S^{\mathtt {KSA}}} - y_i^{\#S^{\mathtt {KSA}}} + \beta _i^{\mathtt {KSA}} \ge 0 \\ x_i^{\#S^{\mathtt {KSA}}} + \beta _i^{\mathtt {KSA}} \le 1 \end{array}\right. }. \end{aligned}$$

Constraints for the Ending States. We assume that the matching only happens at the MixColumns. Let \((\#E^+[4j],\#E^+[4j+1],\#E^+[4j+2],\#E^+[4j+3])^T\) and \((\#E^-[4j],\#E^-[4j+1],\#E^-[4j+2],\#E^-[4j+3])^T\) be the jth column of the ending states \(\#E^+\) and \(\#E^-\) linked by the MC operation. Since MC is non-MDS, its constraints are quite different from Bao et al.’s model for MDS matrix, where there is a \((\varSigma -4)\)-cell filter if and only if \(\varSigma \ge 5\) out of 8 cells of the two columns are or cells (see [6, Property 1, page 14]).

For the MC operation of SKINNY, there may exist an m-cell (\(m > 0\)) filter even if \(\varSigma <5\). For example, according to Eq. (4), if , and all other cells are  , we still get a 1-cell filter due to \(\#E^+[4j]=\#E^-[4j+1]\). We can enumerate all possible patterns and convert these local constraints into linear inequalities using the convex hull computation method. In Fig. 3, we list some of the possible matching patterns with their filtering strength measured in cells. We introduce a variable \(\gamma _j \ge 0\) for the j-th columns of \({\#E^+}\) and \({\#E^-}\) such that there is a \(\gamma _j\)-cell filter due to the coloring patterns of \({\#E^+}\) and \({\#E^-}\), then we get a \(\mathrm {DoM}\)-cell filter at the matching point, where \(\mathrm {DoM} = \sum _j \gamma _j\) and should be positive according to the complexity analysis given by Eq. (3).

Fig. 3.
figure 3

Some possible coloring patterns at the matching point

Constraints Imposed by the Computation Paths. Along the computation paths leading to the ending states, the initial degrees of freedom are consumed according to the MITM characteristic. Forward computation consumes the degrees of freedom of the neutral words for backward computation while backward computation consumes the degrees of freedom of the neutral words for the forward computation. The consumption of degrees of freedom is counted in cells. Let \(\sigma _{\mathtt {ENC}}^+\), \(\sigma _{\mathtt {KSA}}^+\) and \(\sigma _{\mathtt {ENC}}^-\), \(\sigma _{\mathtt {KSA}}^-\) be the accumulated degrees of freedom that have been consumed in the backward and forward computation in the encryption and key schedule data paths. Since the degrees of freedom from the encryption data paths for both directions should be used up and the degrees of freedom originated from the key schedule data path should not be exhausted, we require

$$\begin{aligned} {\left\{ \begin{array}{ll} \lambda _{\mathtt {ENC}}^+-\sigma _{\mathtt {ENC}}^+=0,~~~ \lambda _{\mathtt {ENC}}^--\sigma _{\mathtt {ENC}}^-=0\\ \mathrm {DoF}^+ = \lambda _{\mathtt {KSA}}^+ - \sigma _{\mathtt {KSA}}^+ \ge 1,~~~ \mathrm {DoF}^- = \lambda _{\mathtt {KSA}}^- - \sigma _{\mathtt {KSA}}^- \ge 1 \end{array}\right. }. \end{aligned}$$

According to the semantics of the colors, how a coloring pattern of the input and output states of an operation consumes the degrees of freedom should be be different for the forward and the backward computation paths. Therefore, we will give two sets of rules for different directions of the computation.

XOR. The XOR operations exist in the ART and MC, and we can reuse the XOR-RULE\(^+\) (for forward computation) and XOR-RULE\(^-\) (for backward computation) rules gvien in [6]. The coloring patterns and how the degrees of freedom are consumed are visualized in Fig. 4.

Fig. 4.
figure 4

Rules for XOR, where a “*” means that the cell can be any color

AddRoundTweakey. ART is the operation that the first two rows of the three tweakey states are XORed into the encryption data path. There are three XOR operations and four input cells (three from the tweakey state and one from the encryption data path) involved to produce an output cell. Certainly, we can use the XOR-RULE three times to get the constraints. However, this approach misses some important coloring patterns that may lead to better attacks. We take the forward computation for example as shown in Fig. 5. If we use XOR\(^+\)-RULE three times successively as shown in Fig. 5(a), when the and are the input cells of the XOR, the output cell will be  , eventually leading to a output cell. However, if we change the order of the XOR operations as shown in Fig. 5(b), then \(\oplus \) may produce a cell by consuming one degree of freedom, leading to a output cell. To take this into account, we model the rule for three XORs as a whole, named as 3-XOR\(^+\)-RULE, with Fig. 5(c) as an example.

Fig. 5.
figure 5

The inaccuracy of modeling \(\mathtt{3}\)-XOR\(^+\) by applying XOR\(^+\) successively

For the 3-XOR operation in the forward computation, we have the following set of rules (denoted by 3-XOR\(^+\)-RULE):

  • \(\blacktriangleright \) 3-XOR\(^+\)-RULE-1. If there are cells but no and cells in the input, the output cell is or (partially cancel the impacts of the input cells by consuming \(\lambda _{\mathtt {ENC}}^-\) or \(\lambda _{\mathtt {KSA}}^-\)).

  • \(\blacktriangleright \) 3-XOR\(^+\)-RULE-2. If there are and cells but no cells in the input, the output cell is or (partially cancel the impacts from on by consuming \(\lambda _{\mathtt {ENC}}^-\) or \(\lambda _{\mathtt {KSA}}^-\)).

  • \(\blacktriangleright \) 3-XOR\(^+\)-RULE-3. If there are cells but no and cells in the input, the output cell is .

  • \(\blacktriangleright \) 3-XOR\(^+\)-RULE-4. If all the input cells are , then the output cell is .

  • \(\blacktriangleright \) 3-XOR\(^+\)-RULE-5. If there is at least one cell in the input, the output is .

We introduce variables \(\delta _{\mathtt {ENC}}^-\) and \(\delta _{\mathtt {KSA}}^-\) to denote the consumed degrees of freedom due to 3-XOR\(^+\)-RULE. For example, \(\delta _{\mathtt {ENC}}^-=1\) means that we consume one degree of freedom from \(\lambda _{\mathtt {ENC}}^-\) by applying the rule. In order to use up all the degrees of freedom from \(\#S^{\mathtt {ENC}}\), we should consume \(\lambda _{\mathtt {ENC}}^-\) first whenever possible. As shown in Fig. 6, when there are degrees of freedom in the encryption path, i.e., cells, the consumption of degree of freedom is always from \(\lambda _{\mathtt {ENC}}^-\), i.e., \(\delta _{\mathtt {ENC}}^-=1\) and \(\delta _{\mathtt {KSA}}^-=0\).

Let \(\#a\), \(\#b\), \(\#c\), \(\#d\) be the input cells and \(\#e\) be the output cell. Then, the set of rules 3-XOR\(^+\)-RULE restricts \((x^{\#a},y^{\#a},x^{\#b},y^{\#b},x^{\#c},y^{\#c},x^{\#d},y^{\#d},x^{\#e},y^{\#e},\) \(\delta _{\mathtt {ENC}}^-)\) and \((x^{\#a},y^{\#a},x^{\#b},y^{\#b},x^{\#c},y^{\#c},x^{\#d},y^{\#d},x^{\#e},y^{\#e}, \delta _{\mathtt {KSA}}^-)\) to subsets of \(\mathbb {F}_2^{11}\), which can be described by a system of linear inequalities by using the convex hull computation method. Some valid coloring patterns due to 3-XOR\(^+\)-RULE are given in Fig. 6. Note that 3-XOR\(^-\)-RULE can be obtained from 3-XOR\(^+\)-RULE by exchanging the cells and cells, since the meanings of and are dual for the forward and backward computations.

Fig. 6.
figure 6

3-XOR\(^+\)-RULE, where a “*” means that the cell can be any color

Fig. 7.
figure 7

MC-RULE

MixColumn. Since MC contains only XOR operations, we can use XOR-RULE to generate the set of rules MC-RULE for MC. According to Eq. (4), there exists one equation that XORs three cells together to get one cell. We use a similar approach we employed for 3-XOR\(^+\)-RULE and 3-XOR\(^-\)-RULE to handle this special equation. Finally, we get the valid propogations of the coloring patterns and list some of them in Fig. 7. Note that there are no key additions involved in MC, and thus all the consumed degrees of freedom are from \(\lambda _{\mathtt {ENC}}^+\) and \(\lambda _{\mathtt {ENC}}^-\).

4.2 The MITM Key-Recovery Attack on SKINNY-n-3n

Solving the model built in Sect. 4.1, we identify a 23-round MITM characteristic as shown in Fig. 8. The starting states are \(\# S^{\mathtt {ENC}} = Y_1\) and the three tweakey words \(\# S^{\mathtt {KSA}} = (TK_1^{(1)}, TK_2^{(1)}, TK_3^{(1)})\). The matching process happens at the MC operation between the ending states \(\#E^{+} = Z_{12}\) and \(\#E^{-} = X_{13}\). There are 3 cells and 3 cells in \(\# S^{\mathtt {KSA}}\), providing \(\lambda _{\mathtt {KSA}}^-=\lambda _{\mathtt {KSA}}^+=3\) cells of initial degrees of freedom originated from the key schedule data path. For \(\#S^{\mathtt {ENC}}\), \(Y_1\) provides \(\lambda _{\mathtt {ENC}}^-=8\) and \(\lambda _{\mathtt {ENC}}^+=1\) cells of initial degrees of freedom from the encryption data path. The \(\lambda _{\mathtt {ENC}}^+=1\) cells of degrees of freedom is used up when computing \(X_1\) from \(Y_1\) by XORing the subtweakey. In the forward computation, the \(\lambda _{\mathtt {ENC}}^-=8\) cells of degrees of freedom are used up when computing \(Y_4\) from \(Y_1\). For the forward computation, we require \(TK_1^{(6)}[7] \oplus TK_2^{(6)}[7] \oplus TK_3^{(6)}[7]\) and \(TK_1^{(8)}[1] \oplus TK_2^{(8)}[1] \oplus TK_3^{(8)}[1]\) to be constants, consuming \(\sigma _{\mathtt {KSA}}^-=2\) cells of degrees of freedom originated from the key schedule data path. Hence, we get \(\mathrm {DoF}^-=\lambda _{\mathtt {KSA}}^--\sigma _{\mathtt {KSA}}^-=1\). Similarly, we get \(\mathrm {DoF}^+=\lambda _{\mathtt {KSA}}^+-\sigma _{\mathtt {KSA}}^+=1\). At the matching point, we have \(\mathrm {DoM}=2\) from the first two column of \(\#E^+\) and \(\#E^-\) with Eq. (4). The 23-round key-recovery attack is given in Algorithm 3. The data and memory complexity is bounded by Line 2, which is \(2^{104}\) for SKINNY-128-384 and \(2^{52}\) for SKINNY-64-192. According to Eq. (3), the time complexity is about \(2^{376}\) for SKINNY-128-384 and \(2^{188}\) for SKINNY-64-192.

figure bs

Remark. The designers of SKINNY claimed that: “We conclude that meet-in-the-middle attack may work up to at most 22 rounds (see [9], Sect. 4.2, page 22)”. Our attack penetrates one more round than expected and is the first 23-round single-key attack on SKINNY-128-384 and SKINNY-64-192. Using the same method, we also analyze ForkSkinny (see the full version of the paper). In addition, we report on some results on Romulus-H as a by-product of the analysis of SKINNY (see the full version of the paper).

Fig. 8.
figure 8

An MITM key-recovery attack on 23-round SKINNY-n-3n

5 Exploiting Nonlinearly Constrained Neutral Words in MITM Attacks and Its Applications

According to Property 1 in Sect. 2, in order to compute the allowable values for the neutral words, one has to solve two systems of equations, i.e., Eq. (1) and (2). In previous MITM preimage attacks [6, 46], the two systems of equations are linear (or can be reduced to linear equations involving certain cells not from the starting states that implicitly define the spaces of the neutral words). Hence, it is easy to derive the solution spaces \( \mathbb {B}(\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}], \mathfrak {c}^{+}) \) and \( \mathbb {R}(\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}], \mathfrak {c}^{-}) \) by solving the systems of equations, whose cost can be ignored compared with the overall complexity. However, in practice, we encounter many interesting MITM characteristics with nonlinear constrained neutral words, and there is no efficient method for solving them. We present a table based technique in Algorithm 4 which can be applied in attacks relying on such MITM characteristics without solving the equations or increasing the overall time complexities.

figure bt

Algorithm 4 obtains the solution spaces of the neutral words for all possible \(\mathfrak {c}^{+}\) and \(\mathfrak {c}^{-}\) under a given value of \((\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}] )\) with time complexity \((2^w)^{\lambda ^+} + (2^w)^{\lambda ^-}\) and memory complexity \((2^w)^{\lambda ^+} + (2^w)^{\lambda ^-}\). After running Algorithm 4, \(V[\boldsymbol{v}]\) stores the solution space of

$$\boldsymbol{\pi }^+(\# S^{\mathtt {ENC}}[\mathcal {G}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {G}^{\mathtt {KSA}}], \# S^{\mathtt {ENC}}[\mathcal {B}^{\mathtt {ENC}}], \# S^{\mathtt {KSA}}[\mathcal {B}^{\mathtt {KSA}}]) = \boldsymbol{v},$$

which consists about \(2^{w\cdot (\lambda ^+-l^+)}=2^{w\cdot \mathrm {DoF}^+}\) values for the neutral words for the forward computation. Similarly, under each index \(\boldsymbol{u}\) of U, there are about \(2^{w\cdot (\lambda ^--l^-)}=2^{w\cdot \mathrm {DoF}^-}\) values for the neutral words for the backward computation. Algorithm 4 can be plugged into the procedure for MITM attacks to deal with MITM characteristics with nonlinearly constrained neutral words. For example, applying the technique to the MITM preimage attack gives Algorithm 5. Next, we show the time complexity is not increased.

figure bu

Complexity Analysis. In each MITM episode within the context defined by the “For” loops surrounding the code segment from Line 6 to Line 14 of Algorithm 5, we test \(2^{w \cdot (\mathrm {DoF}^{+} + \mathrm {DoF}^{-} )}\) messages and we expect \(2^{w \cdot (\mathrm {DoF}^{+} + \mathrm {DoF}^{-} - m)}\) of them to pass the m-cell filter, and averagely, there are about \(2^{w \cdot (\mathrm {DoF}^{+} + \mathrm {DoF}^{-} - h )}\) preimages passing the check at Line 13 for each episode. The time complexity to perform one MITM episode is

$$\begin{aligned} (2^w)^{\mathrm {DoF}^+}+(2^w)^{\mathrm {DoF}^-}+(2^w)^{\mathrm {DoF}^++\mathrm {DoF}^--m}. \end{aligned}$$
(5)

Then, we estimate the size of \(\mathbb {G}\) in Line 1 of Algorithm 5, which determines the number of MITM episodes performed. Suppose \(|\mathbb {G}| = (2^w)^x\), to produce one preimage, we require that \( (2^w)^x \cdot (2^w)^{l^+ + l^-} \cdot (2^w)^{\mathrm {DoF}^+ + \mathrm {DoF}^-}=(2^w)^h \) or \(x=h-(\lambda ^+ + \lambda ^-)\). Hence, we consider two situations depending on \(\lambda ^++\lambda ^-\).

  • \(\lambda ^++\lambda ^-\ge h\): In this case, we set \(x=0\), then \(|\mathbb {G}| =1\). At Line 3 and Line 4 of Algorithm 5, we only need to traverse \((2^w)^{h-(\mathrm {DoF}^{+}+\mathrm {DoF}^{-})}\) values of (\(\mathfrak {c}^{+},\mathfrak {c}^{-} \))\(\in \mathbb {F}_2^{ w \cdot l^{+}+ w\cdot l^{-}}\), where \(h-(\mathrm {DoF}^{+}+\mathrm {DoF}^{-})\le l^{+}+ l^{-}\) due to \(\lambda ^++\lambda ^-\ge h\), to find the preimage. Then, together with Eq. (5), we have the overall time complexity: \( (2^w)^{\lambda ^+}+(2^w)^{\lambda ^-} + (2^w)^{h-{\min }({\mathrm {DoF}^+},~{\mathrm {DoF}^-},~m)}. \)

  • \(\lambda ^++\lambda ^-< h\): Set \(x=h-(\lambda ^++\lambda ^-)\), and we need to build \(2^x\) V and U in Line 2 of Algorithm 5. Hence, we get the overall complexity:

    $$\begin{aligned} (2^w)^{h-\lambda ^+}+(2^w)^{h-\lambda ^-} + (2^w)^{h-{\min }({\mathrm {DoF}^+},~{\mathrm {DoF}^-},~m)}. \end{aligned}$$
    (6)

Moreover, the memory complexity for both situations is

$$\begin{aligned} (2^w)^{\lambda ^+}+(2^w)^{\lambda ^-}+(2^w)^{\min (\mathrm {DoF}^+,~\mathrm {DoF}^-)}. \end{aligned}$$
(7)

We apply Algorithm 5 to Grøstl-256, Saturnin -Hash, and AES-256 hashing and improved cryptanalytic results are obtained (see the full version of the paper). In particular, employing the new representation of the AES key schedule due to Leurent and Pernot (EUROCRYPT 2021), we identify the first preimage attack on 10-round AES-256 hashing.

6 MITM-Based Collision Attacks and Its Applications

Suppose that there is an algorithm that can produce a different t-cell partial target preimage. Then we expect to find a collision by running the algorithm \(2^{w \cdot (h - t)/2}\) times to identify a collision on the h-cell hash value. At FSE 2012 [43], Li, Isobe, and Shibutani employed this strategy to convert the MITM-based partial target preimage attacks into pseudo collision attacks. First, we consider a generalization of partial target preimage attacks.

Let \(\mathbb {T}\) be the space of all possible values of the output of the hash function. For a predefined partition of \(\mathbb {T}\) into \((2^w)^t\) subspaces with an equal size. We call an algorithm a t-cell partial target preimage attack if it can produce a message whose hash value is a random element in a given subspace. For example, an algorithm generating a message such that the first word of its hash value is always 0 is a 1-cell partial target preimage attack. An algorithm generating a message such that the XOR of the first and second words of its hash value is always 0 is also a 1-cell partial target preimage attack. Given an MITM characteristic, the framework for a collision attack is described in Algorithm 6. Note that the call to Algorithm 6 can be replaced by an ordinary equation solving procedure to save the memory if the involved equations are linear or easy to solve. To be clear on how to set the objective functions in our MILP models, we need to understand how the complexity of the attack is related to the parameters specified in the MITM characteristic.

figure bv

Complexity Analysis. In the MITM t-cell partial target preimage attack, if the matching process results in an m-cell filter, then we have \(m \le t\), because the matching information is derived from the known cells of the target T. To determine the overall complexity of the algorithm, we need to determine how many MITM episodes (Line 9 to 18 of Algorithm 6) are required. According to the analysis of Algorithm 4 in Sect. 5, the time complexity for building U and V is \((2^w)^{\lambda ^+} + (2^w)^{\lambda ^-}\). In each MITM episode within the context defined by the “For” loops surrounding the code segment from Line 9 to Line 18, we test \(2^{w \cdot (\mathrm {DoF}^{+} + \mathrm {DoF}^{-} )}\) messages and we expect \(2^{w \cdot (\mathrm {DoF}^{+} + \mathrm {DoF}^{-} - m)}\) of them to pass the m-cell filter, and averagely, there are about \(2^{w \cdot (\mathrm {DoF}^{+} + \mathrm {DoF}^{-} - t )}\) messages are inserted into the hash table H. Therefore, we need about \( (2^w)^{ \frac{h-t}{2} - (\mathrm {DoF}^{+} + \mathrm {DoF}^{-} - t ) } \) episodes to produce one collision. The time to perform one MITM episode is

$$\begin{aligned} (2^w)^{\mathrm {DoF}^+}+(2^w)^{\mathrm {DoF}^-}+(2^w)^{\mathrm {DoF}^++\mathrm {DoF}^--m}+(2^w)^{\mathrm {DoF}^++\mathrm {DoF}^--t}. \end{aligned}$$
(8)

Suppose in Line 3 of Algorithm 6 we have \(\mathbb {G} = 2^{w\cdot x}\). Then, \((2^w)^{x}\cdot (2^w)^{l^+}\cdot (2^w)^{l^-}\) matching episodes are performed. Hence, we have

$$\begin{aligned} (2^w)^{x}\cdot (2^w)^{l^+}\cdot (2^w)^{l^-} =(2^w)^{ \frac{h-t}{2} - (\mathrm {DoF}^{+} + \mathrm {DoF}^{-} - t ) }. \end{aligned}$$

We get \(x=\frac{h}{2}-(\lambda ^++\lambda ^--\frac{t}{2})\). Hence, we consider two situations:

  • \(\lambda ^++\lambda ^-\ge \frac{h+t}{2}\): In this case, we set \(x=0\). At Line 6 and Line 7 of Algorithm 6, we only need to traverse \((2^w)^{\frac{h-t}{2}-(\mathrm {DoF}^{+}+\mathrm {DoF}^{-}-t)}\) values of (\(\mathfrak {c}^{+},\mathfrak {c}^{-} \))\(\in \mathbb {F}_2^{ w \cdot l^{+}+ w\cdot l^{-}}\), where \(\frac{h-t}{2}-(\mathrm {DoF}^{+}+\mathrm {DoF}^{-}-t)\le l^{+}+ l^{-}\) due to \(\lambda ^++\lambda ^-\ge \frac{h+t}{2}\), to find the collision. Then, together with Eq. 8, we have the overall time complexity:

    $$\begin{aligned} (2^w)^{\lambda ^+}+(2^w)^{\lambda ^-} + (2^w)^{\frac{h}{2}- {\min }\{\mathrm {DoF}^{+}-\frac{t}{2},~ \mathrm {DoF}^{-}-\frac{t}{2},~ m-\frac{t}{2},~\frac{t}{2}\}}. \end{aligned}$$
    (9)
  • \(\lambda ^++\lambda ^-<\frac{h+t}{2}\): Set \(x=\frac{h}{2}-(\lambda ^++\lambda ^--\frac{t}{2})\), and we need to build \(2^x\) V and U in Line 5 of Algorithm 6. Hence, we get the overall complexity:

    $$\begin{aligned} (2^w)^{\frac{h}{2} - (\lambda ^+ - \frac{t}{2})}+ (2^w)^{\frac{h}{2} - (\lambda ^- - \frac{t}{2})}+ (2^w)^{\frac{h}{2}- {\min }\{\mathrm {DoF}^{+}-\frac{t}{2},~ \mathrm {DoF}^{-}-\frac{t}{2},~ m-\frac{t}{2},~\frac{t}{2}\}}, \end{aligned}$$
    (10)

    which is approximately \((2^w)^{\frac{h}{2}- {\min }\{\mathrm {DoF}^{+}-\frac{t}{2},~ \mathrm {DoF}^{-}-\frac{t}{2},~ m-\frac{t}{2},~\frac{t}{2}\}}\), since we always have \(\mathrm {DoF}^{+} \le \lambda ^+\) and \(\mathrm {DoF}^{-} \le \lambda ^-\).

The memory complexity in both situations is

$$\begin{aligned} (2^w)^{\lambda ^+}+(2^w)^{\lambda ^-} + (2^w)^{\min \{\mathrm {DoF}^{+},\mathrm {DoF}^{-}\}}+(2^w)^{\frac{h-t}{2}}. \end{aligned}$$
(11)

where the \((2^w)^{\frac{h-t}{2}}\) is to store the t-cell partial target preimages in H. Consequently, for an attack efficient than the trivial birthday attack, we have \({\min }\{\mathrm {DoF}^{+}-\frac{t}{2},~ \mathrm {DoF}^{-}-\frac{t}{2},~ m-\frac{t}{2},~\frac{t}{2}\} > 0\), \(\lambda ^+<\frac{h}{2}\) and \(\lambda ^-<\frac{h}{2}\), or

$$\begin{aligned} {\left\{ \begin{array}{ll} \mathrm {DoF}^{+}> \frac{t}{2},~ \mathrm {DoF}^{-} > \frac{t}{2}\\ \frac{t}{2}< m \le t\\ \lambda ^+<\frac{h}{2},~\lambda ^-<\frac{h}{2} \end{array}\right. }. \end{aligned}$$

6.1 Automatic Search for MITM-Based Collision Attacks

First of all, The objective function of the model is to maximize

$${\min }(\mathrm {DoF}^{+}-\frac{t}{2},\mathrm {DoF}^{+}-\frac{t}{2},m-\frac{t}{2},\frac{t}{2})$$

according to Eq. (10). In what follows, we only discuss the main particularity of MITM-based collision attacks, which lies in the matching part. To be more specific, the degree of match (DoM) is derived differently from other attacks discussed in the work. To be concrete, we consider AES-like hashings like WHIRLPOOL and Grøstl, which includes the MixColumn(MC) or MixRows(MR) operation in their last rounds. To determine the degree of match, we consider two situations according to the position where the match happens.

The Matching Point is Placed at the Last Round. Suppose that the MDS matrix of the MC operation at the matching point operates on k cells, which links the state Z in the last round to the XOR sum of the input state X of the first round and the target T, i.e., \(\mathtt{MC} (Z)=X\oplus T\). Suppose that from the forward and backward computation \(\alpha \) cells and \(\beta \) cells are known. Without loss of generality, we assume \((Z[0],\cdots , Z[\alpha -1])^T\) of Z is known as , and \((X[0], \cdots , X[\beta -1])^T\) of X is known as . From

figure ca

we get \(\beta \) linear equations with k variables \(Z[0],Z[1],\cdots , Z[k-1]\) on the left, and \(2\beta \) variables \(X[0],\cdots , X[\beta -1]\), \(T[0],\cdots , T[\beta -1]\) on the right. There are \(k-\alpha \) unknowns \(Z[\alpha ],\cdots , Z[k-1]\) on the left. Hence, if \(\beta > k-\alpha \), we can represent the \(k - \alpha \) unknowns by other variables by consuming \(k-\alpha \) linear equations. At last, we have \(\varSigma =\beta - (k-\alpha )\) linear equations left:

$$\begin{aligned} \left\{ \begin{array}{l} {\zeta _1}(Z[0], \cdots ,Z[\alpha - 1]) = {\phi _1}(X[0], \cdots ,X[\beta - 1]) \oplus {\varphi _1}(T[0], \cdots ,T[\beta - 1]),\\ {\zeta _2}(Z[0], \cdots ,Z[\alpha - 1]) = {\phi _2}(X[0], \cdots ,X[\beta - 1]) \oplus {\varphi _2}(T[0], \cdots ,T[\beta - 1]),\\ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\vdots \\ {\zeta _{\varSigma }}(Z[0], \cdots ,Z[\alpha - 1]) = {\phi _{\varSigma }}(X[0], \cdots ,X[\beta - 1]) \oplus {\varphi _{\varSigma }}(T[0], \cdots ,T[\beta - 1]),\\ \end{array} \right. \end{aligned}$$
(12)

where \(\zeta _i(\cdot ),~{\phi _i}(\cdot ),~\varphi _i(\cdot )\) are linear equations. By assigning \(t \le \varSigma =\beta +\alpha -k\) conditions on the target T in the Eq. (12):

$$\begin{aligned} \left\{ \begin{array}{l} {\varphi _1}(T[0], \cdots ,T[\beta - 1]) = \tau _1,\\ {\varphi _2}(T[0], \cdots ,T[\beta - 1])=\tau _2,\\ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\vdots \\ {\varphi _{t}}(T[0], \cdots ,T[\beta - 1])=\tau _t,\\ \end{array} \right. \end{aligned}$$
(13)

where \(\boldsymbol{\tau }=(\tau _1,\cdots , \tau _t)\in \mathbb {F}_2^{ w \cdot t }\), we get a t-cell filter:

$$\begin{aligned} \left\{ \begin{array}{l} {\zeta _1}(Z[0], \cdots ,Z[\alpha - 1]) = {\phi _1}(X[0], \cdots ,X[\beta - 1]) \oplus \tau _1,\\ {\zeta _2}(Z[0], \cdots ,Z[\alpha - 1]) = {\phi _2}(X[0], \cdots ,X[\beta - 1]) \oplus \tau _2,\\ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\vdots \\ {\zeta _{\tau _t}}(Z[0], \cdots ,Z[\alpha - 1]) = {\phi _{\varSigma }}(X[0], \cdots ,X[\beta - 1]) \oplus \tau _t.\\ \end{array} \right. \end{aligned}$$

In summary, we have the constraints \(\mathrm {DoF} = t \le \varSigma =\beta +\alpha -k\) and \(\beta +\alpha \ge k\). Therefore, in the MILP model for this case, we can ignore the coloring information of T. After identifying an MITM characteristic with configurations for \((\alpha ,\beta , m, t)\), the t conditions on T can be derived accordingly with Eq. (13).

Fig. 9.
figure 9

The matching point is not placed at the last round.

The Matching Point is not at the Last Round. In this case, the XOR of the target T can happen in the forward computation (see Fig. 9(a)) or in the backward computation (see Fig. 9(b)). The yellow cells are prefixed constants, which can be represented as 0-1 variables in the same way as the Gray (G) cells: If the ith cell of T is yellow, then \((x^T_i,y^T_i)=(1,1)\). Other cells of T are White (W), encoded as \((x^T_j,y^T_j)=(0,0)\).

In the case shown in Fig. 9(a), the rules of xoring the tag T is the same to the XOR\(^+\)-RULE by regarding the cells as cells. Moreover, we require that the cells in T align with the cells in X as shown in Fig. 9(a). Hence, the constraint \(x^T_i\le x^{X}_i\) is added to avoid the transition . Therefore, for the number t of conditions imposed on T, we have \(t=\sum _ix^T_i\).

In the case of Fig. 9(b), we consider the positions of cells in MC\(^{-1}(T)\). The rules of xoring the tag T is the same to the XOR\(^+\)-RULE by regarding the cells as cells. In addition, we require that the cells in MC\(^{-1}(T)\) align with the cells in Z. Hence, the constraint \(y^{\mathtt{MC}^{-1}(T)}_i\le y^{Z}_i\) is added to avoid the transition . Therefore, for the number t of conditions imposed on T, we have \(t=\sum _iy^{\mathtt{MC}^{-1}(T)}_i\).

6.2 Collision Attacks on WHIRLPOOL and Grøstl

The WHIRLPOOL hash function, designed by Barreto and Rijmen, is an ISO/IEC standard. Its compression function is built by plug an AES-like cipher into the Miyaguchi-Preneel construction . During the last 20 years, WHIRLPOOL has withstood extensive cryptanalysis [32, 44, 52] and the best collision attack in the classical setting reaches 5 rounds [29, 42]. Recently, Hosoyamada and Sasaki introduced a quantum collision attack on 6-round WHIRLPOOL [32].

We give the first 6-round collision attack on WHIRLPOOL in the classical setting, breaking the 10-year record for collision attacks on WHIRLPOOL. Applying the automatic model of MITM collision attack to WHIRLPOOL, we find a new 6-round MITM characteristic shown in Fig. 10. We apply Algorithm 6 to WHIRLPOOL based on this MITM characteristic. The starting state is \(X_3\). Then, we have \(\lambda ^+=10\) and \(\lambda ^-=20\), \(w=8\). According to Property 1, we have \(l^+=8\) and \(\mathfrak {c}^{+} = (a_1, \cdots , a_{8}) \in \mathbb {F}_2^{ 8 \times 8 }\); \(l^-=16\) and \(\mathfrak {c}^{-} = (b_1, \cdots , b_{16}) \in \mathbb {F}_2^{ 8 \times 16 }\). Then we build similar equations in the attack on Grøstl (See Section D in the full version of the paper). Therefore, we call Algorithm 4 to build V and U. \(\mathrm {DoF}^+=\lambda ^+-l^+=2\), \(\mathrm {DoF}^-=\lambda ^--l^-=4\), \(t=m=2\) and \(h=64\). The time complexity is \( (2^8)^{\frac{64}{2} - (10 - \frac{2}{2})}+ (2^8)^{\frac{64}{2} - (20 - \frac{2}{2})}+ (2^8)^{\frac{64}{2}- \min \{2-\frac{2}{2},~ 4-\frac{2}{2},~ 2-\frac{2}{2},~\frac{2}{2}\}} \approx 2^{248} \) according to Eq. (10), and the memory complexity is about \( 2^{248}\). We also apply the method to Grøstl, and the results are given in Section F of the full version of the paper.

Fig. 10.
figure 10

An MITM attack on 6-round WHIRLPOOL

7 Conclusion and Open Problems

We formulate the MITM attacks in a more formal, expressive, and accurate way. Based on this formulation, we investigate the peculiarities of MITM-based key-recovery attacks on block ciphers and collision attacks on AES-like hash functions and model them in the constraint programming paradigm. Now, we have a fairly powerful tool for finding exploitable MITM characteristics in key-recovery, (pseudo) preimage, and collision attacks on word oriented designs. Moreover, we present a generic procedure for dealing with nonlinearly constrained neutral words without increasing the overall time complexities of the attacks relying on them. We apply our method to concrete keyed and unkeyed primitives, leading to attacks improving the state-of-the-art. At this point, we would like propose an open problem: Is it possible to search for bit-level MITM characteristics automatically, and to what extent it can improve the current cryptanalytic results? For bit-oriented models, we think the work from Fuhr, Minaud, and Yu [27, 47] is good starting point.