Keywords

1 Introduction

In lightweight cryptography, cryptographic primitives are required to be suitable for resource-constrained environments, such as low area, latency or energy consumption. Therefore, it’s usually difficult to design such ciphers since one has to make trade-offs between implementation efficiency and security. Hence, the security of such ciphers is worth to be evaluated thoroughly.

In CHES 2021, Leander et al. proposed a family of ultra-low-latency block ciphers named as SPEEDY [9] to resolve the problem: How to design a secure encryption algorithm whose hardware implementation is fast. To gain such cipher, they first considered which type of logic gates and circuit topologies are suitable for ultra low-latency encryption. As a result, SPEEDY adopts a lightweight S-box, whose coordinate functions are realized as two-level NAND trees, and a hardware-friendly linear layer. After careful combination, SPEEDY achieves lower latency in hardware than most cryptographic primitives. SPEEDY contains three versions SPEEDY-5-192, SPEEDY-6-192 and SPEEDY-7-192, whose number of rounds are 5, 6, and 7, respectively. Note that, as the number of rounds decreases, the SPEEDY cipher gains better performance in hardware implementations but has a weak bound of security claim.

Recently, Boura et al. [5] announced that the full-round security of SPEEDY-7-192 is broken under the chosen-ciphertext setting, where a round-reduced attack on SPEEDY-6-192 is also proposed. However, due to the more restricted security parameters, no valid attack on SPEEDY-5-192 is given. The best key recovery attack on this variant only covers 3 rounds proposed by Rohit et al. [16].

In this paper, we aim to evaluate the security of SPEEDY thoroughly using differential cryptanalysis, linear cryptanalysis and differential-linear cryptanalysis. Proposed by Biham and Shamir in 1990, differential cryptanalysis [4] has been an important method in evaluating the security of symmetric-key ciphers. Using a differential distinguisher with high probability, one can recover the secret key after adding several rounds before or/and after it. Such an attack can be mounted only when the adversary is under the chosen-plaintext/ciphertext setting. Linear cryptanalysis [13], proposed by Matsui in 1992, only requires the adversary under the known-plaintext setting, which is more realistic than the differential attack. It exploits the linear distinguisher with high correlation considering the linear relation between some bits of plaintexts, ciphertexts and the secret key. Differential-Linear cryptanalysis [8] is another chosen plaintext/ciphertext method, which is proposed by Langford and Hellman in 1994. This method uses a distinguisher with a high correlation that consists of a differential distinguisher followed by a linear distinguisher. To evaluate its correlation more accurately, Achiya and Dunkelman introduced DLCT (short for Differential Linear Connectivity Table) [1] to connect the differential distinguisher and the linear one. This method works efficiently on ciphers whose differential probabilities and linear correlations are very high when the distinguisher covers small rounds, but decrease very quickly when the number of rounds increases.

1.1 Contributions

In this paper, we propose several key recovery attacks on full-round SPEEDY-7-192 and 4-round SPEEDY-5-192 by exploiting key-recovery friendly distinguishers constructed following the divide-and-conquer strategy. Detailed contributions are shown as follows.

Strategy of Searching Key-Recovery Friendly Differential and Linear Trail of SPEEDY. Our searching strategy includes two parts: (a) control the propagation of active pattern deduced from the input difference/mask and output difference/mask; (b) construct long distinguishers in the divide-and-conquer way. To fulfill (a), we introduce TDDT (short for truncated differential distribution table) to show the non-randomness of S-box with bit-wise truncated differential propagation. With this table, one can depict clearly in the searching algorithm on how to choose input/output differences of the target distinguisher. While for (b), we construct part of the distinguisher by concatenating two distinguishers covering the small number of rounds. For instance, the 5.5-round distinguisher is first built with a 4-round distinguisher. To gain a better 4-round distinguisher, we split it into two parts with each covering two rounds. The input difference of the MixColumn operation in the middle is chosen to minimize the total number of activated S-boxes involved in its two neighborhood S-box layers. For the other MixColumns contained in the 5.5-round distinguisher, no restriction on their input differences is necessary.

Improved Key Recovery Attacks Against Full-Round SPEEDY-7-192. By exploiting the above strategy, we gain a 5.5-round differential distinguisher, which has a higher probability than the one found by [5]. With this differential trail, we mount the first chosen-plaintext attack on its full-round variant. Besides, since it’s key-recovery-friendly, it also leads to a full-round chosen-ciphertext attack, which requires slightly less complexities than the one proposed by [5]. Further, we also mount the first known-plaintext 7-round linear attack with a 5-round linear distinguisher by adopting a similar search strategy. Although this one costs slightly higher complexities than the differential attack, it is a known-plaintext attack which requires weaker capabilities of the adversary than the chosen-plaintext attack required in the differential cryptanalysis. Therefore, such linear key recovery attack is also valuable. All our attack results along with previous published ones are summarized in Table 1.

Improved Key Recovery Attack on Round-Reduced SPEEDY-5-192. We also provide two attacks on 4-round SPEEDY-5-192 using differential and differential-linear cryptanalysis, respectively. More precisely, the differential attack is mounted with a two-round distinguisher where one round is added at the top and bottom, respectively, while the differential-linear attack is given by exploiting a three-round distinguisher and adding one half-round at both sides. Note that they are the best valid attacks on it in terms of the number of rounds.

Table 1. Summary of valid attacks on SPEEDY-7-192 and SPEEDY-5-192. KP, CP and CC separately denote the data collected in the known-plaintext, chosen-plaintext and chosen-ciphertext settings. Time complexities are evaluated in encryption units, while memory costs are evaluated in block size.

1.2 Organization

First, we briefly recall SPEEDY, as well as differential, linear and differential-linear cryptanalysis in Sect. 2. In Sect. 3, we show how to exploit the propagation properties of SB and MC operations, and introduce the TDDT (short for truncated differential distribution table). Using these properties, we introduce the automated search method oriented to key recovery for differential, linear and differential-linear cryptanalysis in Sect. 4. Detailed attack procedures are given in Sect. 5. Finally, we summarize our work in Sect. 6. Besides, the code for searching distinguishers in this paper is available at the following repository:

https://github.com/Jin-liang-Wang/Cryptanalysis-of-SPEEDY.

2 Preliminaries

2.1 Brief Introduction of SPEEDY

SPEEDY [9] is a family of ultra-low latency block ciphers proposed at CHES 2021. SPEEDY uses a 6-bit bijective S-box and can be instantiated with different block sizes, key sizes, and numbers of rounds. For instance, SPEEDY-r-6l is the version of block size and key size of 6l-bits, and r is the number of iterated rounds. The internal state of SPEEDY-r-6l can be viewed as an \(l \times 6\) rectangle array of bits where \(l=32\). Following the notation of its design document [9], we use \(x_{[i,j]}\) to denote the bit located at row i and column j of the state x where \(0 \le i< l, 0 \le j < 6\).

The design document of SPEEDY suggests \(l = 32\), and the number of rounds \(r\in \{5,6,7\}\), which are called SPEEDY-5-192, SPEEDY-6-192, SPEEDY-7-192, respectively. As for the security claim, SPEEDY-r-192 achieves 128-bit security when iterated over \(r = 6\) rounds and full 192-bit security when iterated over \(r = 7\) rounds, while the \(r = 5\) rounds variant provides \(2^{128}\) time complexity and \(2^{64}\) data complexity. To make it clear, we denote SPEEDY-r-192 as SPEEDY in this paper, and briefly recall the cipher as follows.

Initialization. SPEEDY receives a 192-bit plaintext and initializes the internal state with a two-dimensional matrix x. Specifically, it initializes the k-th MSB (short for the most significant bit) into \(x_{[i,j]}\), where \(k = 6i + j\).

Round Function. Its round function consists of five different operations: SubBox (SB), ShiftColumns (SC), MixColumns (MC), AddRoundKey (A\(_{k_r}\)) and AddRoundConstant (A\(_{c_r}\)). A high-level overview of its round function is shown in Fig. 1. Note that the last round function only has three operations.

Fig. 1.
figure 1

r rounds of SPEEDY block cipher.

SB is composed of 32 parallel 6-bit Sbox, where each Sbox takes the i-th row \((x_{[i,0]},x_{[i,1]},x_{[i,2]},x_{[i,3]},x_{[i,4]},x_{[i,5]})\) as input and its output is placed in the same row. The detail of S-box is shown in Appendix A. SC rotates the j-th column of the state upwards by j bits. In other words, the output bit \(y_{[i,j]}\) equals to the input bit \(x_{[(i + j)\bmod 32, j]}\). MC also operates on each column, where seven specifically chosen input bits are XORed as a new bit of its output, as shown in Eq. (1). It also can be seen as multiplying each column by a cyclic matrix.

$$\begin{aligned} y_{[i,j]} = x_{[i, j]} \oplus x_{[i + 1, j]} \oplus x_{[i + 5, j]} \oplus x_{[i + 9, j]} \oplus x_{[i + 15, j]} \oplus x_{[i + 21, j]} \oplus x_{[i + 26, j]}, \quad {\forall {i,j}. } \end{aligned}$$
(1)

The 192-bit round key \(k_r\) and 192-bit constant \(c_r\) are respectively XORed with the entire state in A\(_{k_r}\) and A\(_{c_r}\).

Key Schedule. The 192-bit master key of SPEEDY is regarded as the first round key \(k_0\). Relation between the r-th round key \(k_r\) and the \((r+1)\)-the round key \(k_{r+1}\) is constructed by a bit permutation PB. To be clear, the j-th bit of \(k_{r+1}\) equals to the i-th bit of \(k_r\), where \(j\equiv ( 7i + 1) \bmod 192\).

2.2 Differential, Linear and Differential-Linear Crytanalysis

Differential Cryptanalysis. Differential cryptanalysis is a chosen plaintext attack proposed by Biham and Shamir [4] in 1990. Starting from a well-chosen difference \(\delta _{in}\), the distribution of \(E_k (x)\oplus E_k (x\oplus \delta _{in})\) is non-uniform. More precisely, there exits a specific \(\delta _{out}\) such that \({\text {Pr}}_{x,k}[E_k (x)\oplus E_k (x\oplus \delta _{in})= \delta _{out}]\) is significantly higher than \(2^{-n}\), where n is the block size. In order to distinguish a cipher with a high probability differential \((\delta _{in} \rightarrow \delta _{out})\) from a random permutation, one can collect D ciphertexts corresponding to pairs of plaintexts \((P,P\oplus \delta _{in})\), and compute the number of pairs following the differential:

$$ Q=\#\left\{ P: E_k(P) \oplus E_k(P \oplus \delta _{in})=\delta _{out}\right\} . $$

The expected value of Q for the cipher is \(D\times {\text {Pr}}[\delta _{in}\rightarrow \delta _{out}]\) and \(D\times 2^{-n}\) for the random permutation. Thus, the distinguisher succeeds with high probability when \(D=\mathcal {O}(1/{\text {Pr}}[\delta _{in}\rightarrow \delta _{out}])\).

Linear Cryptanalysis. Linear cryptanalysis, proposed by Matsui at EUROCRYPT 1993 [13], exploits the linear approximations of the round function in order to obtain a biased approximation of the cipher. The linear approximation is a pair of masks \((\alpha ,\alpha ^{\prime })\) such that the distribution of the XORed masked plaintext and ciphertext \(x\cdot \alpha \oplus E_k(x)\cdot \alpha ^{\prime }\) is biased. More precisely, we have (\(|{\text {Pr}}_{x}[x\cdot \alpha \oplus E_k(x)\cdot \alpha ^{\prime }]-\frac{1}{2}| \gg 2^{-n/2})\) for most keys k), where \(x\cdot y = \bigoplus _i x_i y_i\) denotes the inner product. Then, we have the correlation which is expected to be zero when averaged over all keys is:

$$ \text {Cor}_k\left( \alpha \rightarrow \alpha ^{\prime }\right) =2 \underset{x}{{\text {Pr}}}\left[ x \cdot \alpha =E_k(x) \cdot \alpha ^{\prime }\right] -1. $$

Similarly, to distinguish a cipher with a biased linear approximation \((\alpha ,\alpha ^{\prime })\) from a random permutation, we collect D known plaintexts/ciphertexts, and evaluate the experimental correlation:

$$ Q=\left( \#\left\{ P, C: P \cdot \alpha \oplus C \cdot \alpha ^{\prime }=0\right\} -\#\left\{ P, C: P \cdot \alpha \oplus C \cdot \alpha ^{\prime }=1\right\} \right) / D . $$

The expected value Q is larger (in absolute value) for the cipher than for a random permutation and can be observed with high probability when \(D=\mathcal {O}(\text {Cor}(\alpha \rightarrow \alpha ^{\prime })^{-2})\).

Differential-Linear Cryptanalysis. Differential-linear cryptanalysis, proposed by Langford and Hellman in 1994 [8], is a technique to combine differential and linear cryptanalysis. Let E be a cipher which can be described as a cascade of three subciphers, \(E_0\), \(E_m\) and \(E_1\), i.e., \(E = E_1 \circ E_m \circ E_0 \). Let \(\delta _{in}\) and \(\delta _{out}\) denote the input and output difference of \(E_0\), \(\alpha \) and \(\alpha ^{\prime }\) be the input and output linear mask of \(E_1\). Asume that the differential trail \(\delta _{in} \xrightarrow {} \delta _{out}\) in \(E_0\) is satisfied with probability p, and the linear trail \(\alpha \xrightarrow {} \alpha ^{\prime }\) in \(E_1\) is satisfied with correlation q. The correlation of \(\delta _{out} \xrightarrow {} \alpha \) in \(E_m\) is satisfied with

$$ r = \text {Cor}\left( \delta _{out} \xrightarrow {E_m} \alpha \right) =2 \underset{x}{{\text {Pr}}}\left[ E_m(x) \cdot \alpha = E_m(x \oplus \delta _{out}) \cdot \alpha \right] -1 . $$

Then, the correlation of the trail \( \delta _{in} \xrightarrow {E} \alpha ^{\prime }\) is satisfied with correlation \(\text {Cor}(\delta _{in} \xrightarrow {E} \alpha ^{\prime }) =pq^2r\). One can collect D ciphertexts corresponding to pairs of plaintexts \((P,P \oplus \delta _{in})\) and evaluate the experimental correlation:

figure a

Similarly, the expected value is larger (in absolute value) for the cipher than for a random permutation and can be observed with high probability when \(D=\mathcal {O}(\text {Cor}(\delta _{in} \xrightarrow {E} \alpha ^{\prime })^{-2})\).

3 Further Study on Operations of SPEEDY

In general, the extended path before and after the distinguisher determines the number of guessed key bits, the attack rounds, and the time complexity. To control the number of active S-boxes in the extended path, we add the constraint of the propagation through key recovery rounds and propose the concept of TDDT (short for truncated differential distribution table). Since the security of SPEEDY highly relies on the 6-bit S-box and the MC operation, we did in-depth research for SB operation with the TDDT in Sect. 3.1 and for MC operation in Sect. 3.2, respectively. With these newly observed properties, we can construct the automated search method for the differential and linear distinguishers by taking the key recovery into account in Sect. 4.

3.1 Truncated Differential Distribution Table

In general, the time complexity of differential cryptanalysis is affected by the number of active bits in plaintext and ciphertext. To control the number of active bits, we limit some bits being inactive before the second SB operation to reduce the number of active bits in plaintext, as shown in Sect. 4.

On average, each bit is limited inactive in rounds of key recovery with probability \(2^{-1}\). However, we have found that in a specific S-box, e.g., the S-box of SPEEDY, this probability is fluctuate considerably. Two examples are provided in Equations (2a) and (2b) below, although the input sets of these two situations are both one bit inactive, the probabilities of Eqs. (2a) and (2b) are much higher than \(2^{-1}\).

$$\begin{aligned}&{\texttt {**0***}} \xrightarrow [p = 2^{-0.54}]{{\textsf{SB}}} \texttt{010000}\,, \end{aligned}$$
(2a)
$$\begin{aligned}&{{\texttt {0*****}}} \xrightarrow [p = 2^{-0.83}]{{\textsf{SB}}} {\texttt{010000}}\,, \end{aligned}$$
(2b)

where * denotes a arbitrary bit \(\in \{0,1\}\). The probability is computed as follows. Taking Eq. (2b) as an example. The input set \(\varDelta _{in}\) contains \(2^{5}\) differences, i.e., {0b000000,0b000001,\(\cdots \),0b011111}. We assume that each difference in \(\varDelta _{in}\) appears with equal probability. Besides, the output difference is \(\delta _{out}=\) 0b010000. Then this probability is computed by \( {\text {Pr}}(\varDelta _{in} \xrightarrow []{{{\textsf{SB}}}} \delta _{out} ) = \frac{\sum _{\delta _{in} \in \varDelta _{in}} {\text {Pr}}(\delta _{in} \xrightarrow []{{{\textsf{SB}}}} \delta _{out})}{|\varDelta _{in}|}. \)

In order to get an optimal probability in rounds of key recovery, we need to consider how to select inactive bits. In this section, we introduce the TDDT and show how it can select inactive bits using the automatic solver in Sect. 4.

Compared to DDT (short for differential distribution table [4]) which contains the number of transition situations of each fixed input-output difference, for a N-bit S-box, the TDDT is a \(2^N \times 2^N\) table which contains the number of transition situation from a set of input differences to a set of output differences. To clarify, we define the following notation to represent a set of differences named bit-string \({\textbf {k}}\). For an n-bit bit-string \({\textbf {k}} = (k_0, k_1, ..., k_{n-1}), {\textbf {k}} \in \mathbb {F}^{n}_{2}\), we define \({\textbf {k}}^{\prime } \preceq {\textbf {k}}\), if \(k^{\prime }_i \le k_i\) for all i. Then, we denote \(\mathcal {B}_{\varDelta _{in}}/\mathcal {B}_{\varDelta _{out}}\) as the bit-string representation of the input/output difference set \(\varDelta _{in}/\varDelta _{out}\). For example, when a truncated input difference \(\varDelta _{in}\) is "00*00*", the bit-string representation \(\mathcal {B}_{\varDelta _{in}}\) is \(\texttt{0b001001}_{\mathcal {(S)}}\)Footnote 1, i.e., \(\mathcal {B}_{\varDelta _{in}} = {\texttt{0b001001}}_{\mathcal {(S)}}\) denotes the differential set "00*00*" which contains differences {0b000000,0b000001,0b001000,0b001001}. Thus, we can construct TDDT from the DDT using Algorithm 1 and the probability of the truncated difference propagation from \(\varDelta _{in}\) to \(\varDelta _{out}\) is:

$$\begin{aligned} {\text {Pr}}(\varDelta _{in} \xrightarrow []{{{\textsf{SB}}}} \varDelta _{out} ) = \frac{ {{\textsf{TDDT}}}[\mathcal {B}_{\varDelta _{in}}][\mathcal {B}_{\varDelta _{out}]}}{|\varDelta _{in}| \cdot 2^N}, \end{aligned}$$
(3)

where N is the size of S-box and \(|\varDelta _{in}|\) denotes the number of input differences \(\delta _{in}\) that fulfills \(\delta _{in}\preceq \varDelta _{in}\).

figure b

By exploiting the differential property of the transition from a set of differences that are inactive in certain bits to another set of differences with different inactive bits, TDDT can help us make a universal search model for key recovery. Moreover, we also consider the cases of differential propagation from the fixed input difference to a set of output differences. Then, we define FTDDT(short for fixed-input truncated difference distribution table) to describe the probability distribution of this transition and introduce it as follows.

Fixed-Input Truncated Difference Distribution Table . As aforementioned, FTDDT contains the number of transition situations of a fixed input difference to a set of output differences. For the special case, when the input difference is zero, the output difference must be zero, i.e., the set of output differences \(\varDelta _{out}\) contains only zero differences. More precise, when \(\delta _{in} = 0\), we have

$$ {{\textsf{FTDDT}}}[0][0] = 2^{N}\quad \text {and} \quad {{\textsf{FTDDT}}}[0][i] = 0, i\in [1,2^{N})\text {.} $$

We show the construction of FTDDT in Algorithm 2. By modeling these difference distribution tables into our automated search model, we can find a key recovery-oriented differential which contains key recovery rounds as well as distinguishers. Time complexity of Algorithm 2 is\( (2^N - 1) \cdot \sum _{i = 1} ^{N} \left( {\begin{array}{c}N\\ i\end{array}}\right) 2^i =\mathcal {O}(2^{N \cdot \log _2 6}) \text {,} \) which is evaluated as the number of the look-up table operations.

figure c

In summary, FTDDT shows the non-randomness of S-box with bit-wise truncated differential propagation. Considering the differential propagation through a random permutation, the fixed input difference \(\delta _{in}\) will propagate to the output difference set \(\varDelta _{out}\) with probability \(2^{-i}\) where the number of zero bits in \(\mathcal {B}_{\varDelta _{out}}\) is i. As mentioned before, due to the non-randomness of a specific S-box, the probability is changed to \({{\textsf{FTDDT}}}[\delta _{in}][\mathcal {B}_{\varDelta _{out}}]/2^{N}\). In order to verify the correctness of FTDDT, we have conducted an experiment in Appendix B. With this accurate propagation probability of differential, we can gain a better-extended path for a key recovery attack as shown in Sect. 4.

Inverse Probability of FTDDT. During key recovery, we need to partially encrypt (resp. decrypt) the plaintext pair (resp. ciphertext pair) to validate the input (resp. output) of the distinguisher. To filter the good pairs of plaintext more precisely in validating the input of the distinguisher, we utilize the inverse of FTDDT. We denote \(N_a\) as the number of active bits in \(\varDelta _{out}\). For example, \(N_a=5\) if \(\varDelta _{out}=\) 0b110111, i.e., **0***. Then, when considering a fixed difference \(\delta _{out} \preceq \varDelta _{out}\) propagating to \(\delta _{in}\) which is the inverse of FTDDT[\(\delta _{in}\)][\(\mathcal {B}_{\varDelta _{out}}\)], the average probability of this transition is \( {{\textsf{FTDDT}}}[\delta _{in}][\mathcal {B}_{\varDelta _{out}}] / 2^{N + N_a}. \) The proof is shown in Appendix C. With the inverse probability of FTDDT, we will get the more precise time complexity of the key-recovery procedure.

3.2 Differential Propagation Property of MC

The MC operation of SPEEDY generates each new bit in a column by XORing seven current bits. It also can be seen as multiplying each column by a cyclic matrix M. In this section, we research the MC operation in-depth by considering the matrix M. Here we traverse all possible inputs with HW (short for Hamming weight) from one to eight and find that their minimum output \(\text {HW are } (7,8,7,8,7,6,5,4)\). Table 2 summarizes which choices of active differential bits are associated with these minimal output HW cases. Specifically, we use \(\{ a_{0}, a_{1},\cdots \}\) to denote that the \(a_i\)-th bit of the input difference is active. Moreover, due to the property of the MC operation of SPEEDY, we rotate the array and make the 0-th bit active to represent the rotation invariant array class. From Table 2, we can observe that when the input HW is one, there is a minimum sum of the input and output minimum HW. Meanwhile, when the input HW is two and three, the minimal sum of the input and output HW is relatively small.

Table 2. Summary of active differentials yielding low output HW from M. When the input HW is greater than 3, the cases of active bits are too large to list. We record their number to simplify. For the input differences with different HW, we list its output difference which have the minimum HW.

When building the automated search model, one can utilize the property of MC to control the diffusion of the differential, which can reduce the size of the search space. In this work, we utilize a divide-and-conquer method to search for a longer differential by dividing it into several parts. More precisely, for a four rounds differential, we first search several two rounds differentials with the input difference of the last MC is of relatively low HW as in Table 2. Then, we search the last two rounds of differential and connect them with the middle MC. By limiting the sum of input-output HW of the middle MC, we can control the diffusion of the 4-round differential, i.e. minimizing the total number of activated S-boxes involved in the middle, and achieve a higher probability.

We note here that Boura et al. [5] proposed a different strategy to utilize the properties of the MC operation. In their approach, a differential is divided into several one-round parts and they consider the case that the HW of both input and output are small.

4 Automated Search Method Oriented to Key Recovery

In this section, we introduce the automated search strategies of differential, linear and differential-linear distinguishers against SPEEDY following the divide-and-conquer strategy, as well as taking the key recovery procedure into consideration.

Usually, cryptanalysts perform key recovery attacks based on a strong distinguisher. By appending several rounds before and after the distinguisher, one can obtain an extended path containing both distinguisher and key recovery rounds. However, this two-step process of key recovery cannot promise optimal results as shown in [15, 18]. Therefore, automated search oriented to key recovery can obtain better attack results. To find such key-recovery-friendly distinguishers, properties of SB and MC are introduced in Sect. 3 are utilized.

We implement our search algorithms using STPFootnote 2, which is a solver developed based on the SAT (short for boolean satisfiability problem) [12]/SMT(short for satisfiability modulo theories) [2] problem and has been widely used in the field of cryptography [6, 10, 11, 14].

4.1 Differential Search Model Against SPEEDY

Since we aim to search for distinguishers in the single-key setting, we can omit A\(_{k_r}\) and A\(_{c_r}\) in our model. We denote \(S_j^i\) as the j-th intermediate state of the i-th round and set the first round indexed with zero as the key recovery round.

Fig. 2.
figure 2

7-round differential attack of SPEEDY.

Generally, the variable number and the size of each variable will determine the total scale of the search model and affect the complexity of solving the SAT/SMT problem. As the number of rounds increases, the time and memory complexity of solving the problem increases exponentially. Specifically, when the model exceeds four rounds, we cannot obtain a solution in a reasonable time with STP in the case of SPEEDY. For obtaining a longer distinguisher, we employ a divide-and-conquer [18] approach to search several models with a short round to form the entire differential. As shown in Fig. 2, our automated search model is divided into several steps. Motivated by [15], we also consider a key recovery-oriented search model where the distinguisher rounds and key recovery rounds are all included.

Model One: 7-Round Differential Contains 1.5-Round of Key Recovery. The 7-round differential consists of 5.5-round differential distinguisher \(S_0^1\rightarrow S_1^6\) and 1.5 key recovery rounds \(S_0^0 \rightarrow S_0^1,S_1^6 \rightarrow S_3^6\). For convenience, we use backward FTDDT to represent the FTDDT of SPEEDY S-box and forward FTDDT to represent the FTDDT of the inverse of SPEEDY S-box.

  • Step 1: From \(S_3^1\) to \(S_4^3\), the search strategies are as follows.

    • Rule 1: Model SB to comply with DDT.

    • Rule 2: Model each active S-box in \(S_3^1\) to comply with \(2^{-3\times 2}\) probability. This strategy is mainly used to limit the active S-box in the \(S^1_3\). More precisely, we measure this trail with probability \(2^{-6 \times N_a} \times p\), where \(N_a\) is the number of active S-box in \(S^1_3\) and p is the probability generated from Rule 1 in Step 1. Indeed, \(N_a\) active S-boxes imply that there will have \(2 \times N_a\) S-boxes in trail \(S^1_0 \rightarrow S^1_3\). Since the maximum probability transition through the S-box is \(2^{-3}\), we measure each active S-box in \(S^1_3\) with probability \(2^{-6}\).

    • Rule 3: Constrain only one of the columns to be activated in \(S_4^3\). For the input before MC, consider only entries with low HW from Table 2.

  • Step 2: From \(S^3_4\) to \(S^6_0\), the strategies to reduce search space are as follows.

    • Rule 1: Due to the differential must be linked, we fix the input differences as those in \(S^3_4\).

    • Rule 2: Model the active S-box in \(S^6_0\) to comply with a maximum probability of \(2^{-3}\).

  • Step 3: After running the above two steps in parallel, we obtain several four rounds of differentials with high probability. Then, we choose the differential whose probability is higher than \(2^{-170}\) and search the corresponding last 1.5-round differential of distinguisher and 1.5-round of key recovery as follows.

    • Rule 1: Model \(S^1_0 {\mathop {\longrightarrow }\limits ^{{\textsf{SB}}}} S^1_1 \) and \( S^1_2 {\mathop {\longrightarrow }\limits ^{{\textsf{SB}}}} S^1_3\) to comply with DDT.

    • Rule 2: Model \(S^0_1 {\mathop {\longrightarrow }\limits ^{{\textsf{SB}} ^ {-1}}} S^0_0\) and \( S^6_0 {\mathop {\longrightarrow }\limits ^{{\textsf{SB}}}} S^6_1\) to comply with FTDDT.

    • Rule 3: Constrain the active S-box in \(S^0_1\) and \(S^6_2\) less than 32.

    • Rule 4: Constrain the active S-box number in \(S^0_1\) less than 16.

    • Rule 5: Constrain the active S-box number in \(S^6_2\) more than 6.

    Here, rules 3 to 5 are used to constrain the active bits to reduce the time complexity of the key recovery phase as shown in Sect. 5.1.

  • Step 4: Following step one to three, we can get several 5.5-round distinguishers with 1.5-round of key recovery. Then, we choose the distinguisher with maximum probability and search an extended path for key recovery as shown in Algorithm 6 (Appendix G). To reduce the time complexity, we consider the differential rotation invariant of SPEEDY. More precisely, we use one of the 32 rotation-invariant distinguishers to implement the key recovery attack, and the chosen one makes the lowest time complexity.

With the above four steps, we find a differential trail with probability \(2^{-185.53}\) which can be used to attack on the 7-round SPEEDY. More precisely, this trail consists of a 5.5-round differential distinguisher with probability \(2^{-182.49}\) and a 1.5-round key-recovery phase with probability \(2^{-3.04}\). We illustrate it as Fig. 3.

Model Two: 4-Round Differential Contains Two Rounds of Key Recovery. From Step 2, we can get several two rounds distinguisher from \(S^3_4\) to \(S^6_0\). Then, we use backward FTDDT and forward FTDDT to search 2-round of key recovery. We show this distinguisher with probability \(2^{-59.35}\) in Appendix J. For the property of linear key relation between \(k_0\) and \(k_4\), the rotation invariant property does not affect the time complexity when attacking the 4-round SPEEDY.

4.2 Searching Linear Distinguishers Against SPEEDY

For linear cryptanalysis, we use a similar strategy used for differential search, which is a model oriented toward key recovery. Then, we search for a five-round distinguisher with two rounds of key recovery by following a four-step process. Due to the page number limit of the paper, we have placed the specifics in Appendix D.

Finally, we found a 5-round distinguisher combined with two key recovery rounds with correlation \(2^{-93.01}\). We show the detail of the linear distinguisher in Appendix I. In this distinguisher, the bits of \(k_7\) which is derived from \(k_0\) by linear key schedule is 139. We reduce the guessed key bits to 185 bits in the key recovery phase.

4.3 Searching Differential-Linear Distinguishers Against SPEEDY

Similarly, we search for distinguishers against 4-round SPEEDY. More specifically, we search for three rounds of differential-linear distinguishers and extend half a round forward and backward for key recovery. The detailed strategy is shown in Appendix E.

Finally, the distinguisher is shown in Fig. 5 of Appendix F. The correlation of this 3-round distinguisher is \(2^{-23.095}\) and the probability of rounds of key-recovery is \(2^{-6.972}\). The bits of \(k_7\) deduced from \(k_0\) by linear key schedule are 29.

5 Key Recovery Attack Against SPEEDY

In this section, we first implemented both differential and linear cryptanalysis on full SPEEDY with the newly obtained distinguishers in Sect. 4. Then, the key recovery attack against other versions of SPEEDY are also provided.

5.1 7-Round Differential Attack Against Full SPEEDY

Using the newly obtained 7-round differential containing distinguishers and key recovery rounds, we can perform key recovery attacks on full SPEEDY. Before the attack, we first introduce some techniques that will be used in this section as follows.

Fig. 3.
figure 3

Key recovery attack on 7-round SPEEDY with differential cryptanalysis. When the output set of a FTDDT has only one single bit, it is shown as a single difference in this figure since it has one possible case for such an output set, e.g., 0b00*000 is shown as 0b001000 since the active difference is impossible to transit to the 0b000000 through the S-box of SPEEDY.

Deduce \(\boldsymbol{k_{0}}\) from \(\boldsymbol{k_{7}}\) by Linear Key Schedule. Due to the linear key schedule, we can get the relationship between \(k_{0}\) and \(k_{7}\) and deduce the key bits of \(k_{0}\) from \(k_{7}\). We call these bits of \(k_{0}\) deduced key bits from \(k_{7}\). The detailed relationship between \(k_{0}\) and \(k_{7}\) is shown as the purple squares in Fig. 3.

Function to Sieve Possible Key Guesses. The FirstSboxSieve function uses the key bits of \(k_7\) to deduce the six bits of each row in \(k_0\) using the linear key schedule. After deducing the six bits of \(k_0\) from \(k_7\), we can filter the corresponding key guessing of the plaintext pair with the zero difference in \(S^0_1\). Then, for each plaintext pair and the corresponding six bit keys of each row, there will leave \(2^{6-d-n}\) key guesses after sieving by the first S-box, where d is the bit number deduced from \(k_7\) and n is the inactive bit number after the first S-box. We show the details of FirstSBoxSieve in Algorithm 3. The SecondSboxSieve is used to combine the sets \(L_i\) of possible key bits in a bigger set. The function can sieve the elements of the bigger set using the difference in \(S^0_3\). The detail is shown in Algorithm 4.

figure d
figure e

Distinguisher with Rounds of Key Recovery. The 5.5-round differential distinguisher is shown in Fig. 3, where the black square and the blue square denote the active bits from \(S^1_4\) to \(S^5_4\) and the active bits from \(S^0_3\) to \(S^1_3\), respectively. To distinguish the differential propagation of the key recovery round, we use different colors to indicate the propagation of each active bit in the extended rounds, where the gray square indicates the active bits whose differences are forced zero. Further, we use the shaded square to denote the bits random from \(\{0,1\}\). Then, we use the red square to denote the active bits in \(S^0_0\) and \(S^6_3\). To represent the bits in \(k_0\) that are derived from \(k_7\) using the linear key schedule, we use the purple square to denote the deduced bits in \(k_0\) and the green square to denote the guessed key bits when performing the key recovery attack. Note that 180 key bits in \(k_0\) are involved in the key recovery attack.

The attack procedure contains three parts, i.e., data collection, key recovery and brute force the master key.

Data Collection. Consider structure S which consists of \(2^{156}\) plaintexts and for each plaintext pair \(P_0,P_1 \in S\), \( P_0 \,\oplus \, P_1 = (N,N,N,N,N,N,N,N,N,N,N,N,N,\) 0, NNNNNNNNNNN, 0, 0, 0, 0, 0, NN), where N denotes any 6-bit cell. We require the oracle to encrypt those \(2^{156}\) plaintexts and collect the ciphertexts by the hash table which is indexed by the 156-bit inactive bits in ciphertexts. From the hash table, we expect to get \(2^{155}\) pairs of plaintext-ciphertext pair \((P_0,C_0),(P_1,C_1)\) that satisfy \(C_0 \oplus C_1= (0,N,0,N,0,0,0,N,0,0,0,0,N,0,0,\) \(0,N,0,0,0,0,0,0,0,0,N,0,0,0,0,0,0)\text {.}\)

Next, we briefly introduce the key recovery and brute force phases, the detail of the pseudocode is shown in Appendix G. We use \(\delta _{in}^i\) to denote the i-th row of the input difference of the 5.5-round distinguisher.

Key Recovery. For each pair obtained from the data collection phase, we first use \(C_0 \oplus C_1\) to deduce the 36-bit \(k_7\) from DDT. Note that, since we evaluate the propagation probability of key-recovery rounds by FTDDT, the number of deduced 36-bit \(k_7\) should be calculated by the inverse probability of FTDDT as shown in Sect. 3.1. For each deduced 36-bit \(k_7\), we fix the deduced keys in \(k_0\) from \(k_7\) due to the linear key schedule and guess other 144-bit \(k_0\) as follows.

Firstly, we use the early abort technique which is guessing the key-bits of row 14,15,17 and partial encrypt to run FisrtSboxSieve. If there are no satisfied key guesses to validate the output difference of the SubBox operation, we consider the 36-bit deduced \(k_7\) is wrong and discard it. If all the deduced \(k_7\) are wrong, we consider this pair of plaintext-ciphertext is wrong and discard it. Next, for each plaintext-ciphertext pair, we sieve \(k_0\) in row 11,12,13,16 with FisrtSboxSieve and partial encrypt to \(\delta _{in}^{11}\) with SecondSboxSieve. After this step, the plaintext-ciphertext pair and corresponding \(k_0\) can validate the input difference \(\delta _{in}^{11}\). Similarly, we discard the deduced \(k_7\) if there are no satisfied \(k_0\) guesses to validate \(\delta _{in}^{11}\) and discard the pair if no deduced \(k_7\) is left. Then, we employed the same method to those deduced keys in the row of \(k_0\) with FisrtSboxSieve and partial encrypt plaintext pairs to \(\delta _{in}^{14}, \delta _{in}^{15}, \delta _{in}^{18}, \delta _{in}^{19}, \delta _{in}^{7}, \delta _{in}^{5}, \delta _{in}^{4}, \delta _{in}^{3}, \delta _{in}^{2}, \delta _{in}^{1}, \delta _{in}^{30},\delta _{in}^{29}, \delta _{in}^{21}\) to further sieve with SecondSboxSieve.

Finally, it will leave \(2^{135.46}\) pairs and \(2^{6.5}\) 180-bit keys for each pair on average, and we will get \(2^{141.96}\) candidate keys of this 180-bit master key for each structure. Then, we use the brute force procedure to sieve the wrong candidate keys.

Brute Force. For each candidate key from the key recovery procedure, we guess the remaining 12-bit of the master key to get the complete 192-bit candidate master key \(k^{c}\). Then we use a test pair \((m_t,c_t)\) from encrypt oracle to check if \(E_{k^{c}}(m_t) = c_t\) to verify if \(k^{c}\) is the right key.

Complexity and Success Probability. Since the memory access of storing the ciphertexts to the hash table is negligible compared to the time for collecting the ciphertexts, the time complexity of data collection for each structure is \(2^{156}\). Since the hash table storing \(2^{155}\) pairs is very large, we treat the time to access this table as one encryption. Further, since the 7-round SPEEDY encryption has \(7 \times 2 \times 32 = 448\) SubBox operations, the time complexity of the key recovery phase is less than \(2^{155} + (70\times 2^{155})/448\) for each structure (details are shown in Appendix ). Then, the time complexity of Brute Force phase is \(2^{151.96}\) times 7-round SPEEDY encryption for each structure. Overall, the total time complexity of each structure is \(2^{156} + 2^{155} + (70\times 2^{155})/448 + 2^{153.96} \approx 2^{156.86}\) 7-round \({\texttt {SPEEDY}}\) encryptions.

The signal-to-noise-ratio is Using \(2^{185.53}/2^{155} \approx 2^{30.53}\) structures, we can ensure the success rate of 79.15% with 7.51-bit advantage according to [17]. Hence, the data complexity is \(2^{186.53}\), the time complexity is \(2^{30.53} \times 2^{156.86} \approx 2^{187.39}\) and the memory required is \(2^{156}\).

Reducing Memory Requirement. The above attack require \(2^{156}\) bits in memory, which is too large. To obtain lower memory complexity, we can use the chosen ciphertext attack mode and select the active 36-bit structure at the ciphertext. After using the inactive 36 bits in the plaintext as the index, we get \(2^{35}\) pairs, and then one can follow the key recovery procedure used in the above attack to obtain the right key.

5.2 7-Round Linear Attack Against Full SPEEDY

With the 5-round linear distinguisher shown in Appendix I, we compute the attack parameter on SPEEDY-7-192 using the symbols borrowed from [7]. With the \(\mathcal {FWT}\) approach in [7], the time complexity is \( 2(\rho _M + \rho _A)2^{|k_0 |+ |k_7 |- l_{07}}\text {,} \) where the \(|k_i |\) means the bit number guessed in \(k_i\), and \(l_{07}\) means the key bits of \(k_7\) deduced from \(k_0\). Following the notation in [7], \(\rho _A\) is the cost of an addition, and \(\rho _M\) is the cost of a multiplication. Assuming that two multiplications correspond to roughly one evaluation of the cipher, we have the time complexity of key recovery phase \(2^{186}\) encryptions. The memory required is \(2^{144} + 2^{180} + 2^{185} \approx 2^{185.04}\). According to [17], we need \(2^{188.50}\) data and the advantage \(a = 4\) to achieve the success rate 69.12%. Finally, we exhaustively search the remaining 7 key bits. The total time complexity of the attack is \(2^{188.50} + 2^{186} + 2^{188} \approx 2^{189.41}\).

5.3 4-Round Differential Attack Against SPEEDY.

As shown in Appendix J, we choose ciphertexts in \(S^3_3\) to make a 60-bit structure and filter the corresponding plaintext pairs in \(S^2_0\). After running this step, it will leave \(\left( {\begin{array}{c}2^{60}\\ 2\end{array}}\right) /2^{36} \approx 2^{83}\) pairs. Then, we sieve the key bits of the 108-bit involved \(k_4\) according to the \(2^{83}\) pairs and leave \(2^{43.692}\) possible \(k_4\) for each pair. Then, for the combination of these \(2^{83}\times 2^{43.692} = 2^{126.692}\) plaintext-ciphertext pairs and corresponding key guesses, we guess the 156-bits \(k_0\) using the method in Sect. 5.1.

Due to the differential probability being \(2^{-59.35}\) and one structure can only provide \(2^{60}\) data, we construct two structures to collect the data. The data complexity of this attack is \(2^{61}\), the time complexity is \(2^{119.692}\) and the memory required is \(2^{83}\). With above attack parameters, the success rate is \(94.17\%\).

5.4 4-Round Differential-Linear Attack Against SPEEDY

With the distinguisher searching in Sect. 4.3, we obtain a 4-round differential-linear attack against SPEEDY with \(2^{61}\) data complexity \(2^{105}\) time complexity and \(2^{105}\) memory requirement. The detail of this attack is shown in Appendix F.

6 Conclusion

In this paper, we first show how to find key-recovery friendly distinguishers for SPEEDY by following the divide-and-conquer strategy as well as some other new techniques. With such strategies, we found a 5.5-round differential distinguisher which is key-recovery friendly and with higher probability than the one used in the previous result. Using this distinguisher, we are able to mount the first chosen-plaintext full-round attack on SPEEDY-7-192. Besides, with the same distinguisher, we can also mount a full-round attack under the chosen-ciphertext setting, which slightly reduces the attack complexities compared with the previous one proposed in the same setting. Meanwhile, using a similar search strategy, we also found a 5-round linear distinguisher which leads to the first known-plaintext full-round attack. Although the full-round security of this variant has recently been announced to be broken, our attacks here need a weaker ability requirement for the adversary. Besides, for SPEEDY-5-192, which requires more restricted attack parameters, we also propose two 4-round key recovery attacks using a differential distinguisher and a differential-linear one, respectively. According to our best knowledge, both attacks are the best ones on this variant in terms of the number of rounds. However, its full-round security is not threatened.