1 Introduction

In recent years, the design of lightweight block ciphers is one of the most important research areas and has attracted lots of attention from all over the world. Lightweight encryption is motivated by the need to provide acceptable security for specific applications used in resource constrained environment, such as RFID, sensor network, and microcontroller in the Internet of Things (IoT). The most important concern is an optimal implementation with much lower area, or power consumption than traditional encryption standard such as AES [16]. A variety of lightweight schemes aiming at various goals have been proposed in the last few years. The first generation of lightweight block ciphers such as PRESENT [8] and KATAN [13], mainly focused on hardware implementation performance such as area cost. Then software-oriented designs such as nibble-wise block cipher LBlock [23] and TWINE [22] have emerged, which take not only hardware but also 8/16/32-bit multiple software platforms into account. Nowadays, designs aiming at other goals have also been studied, such as the lightweight tweakable block cipher SKINNY [7], Midori [3] which is focusing on low energy, CRAFT [6] with efficient protection against DFA attacks. Moreover, designs capable of serialized implementation such as SIMON [5], Piccolo [20], and WARP [2] can achieve ultra lightweight with a very small hardware footprint compared to round-based implementation but at the cost of long cycles.

On the other hand, some new designs aiming at low-latency have been proposed rather recently. It is motivated by the need to provide real-time security for some specific applications, such as instant authentication, storage or memory encryption, high-speed encryption for secure processor architectures, and embedded systems with real-time requirements. For all these cases, low-latency encryption and instant response time are highly desirable. Hence an encryption scheme should be optimized for latency and the entire process should be completed within the shortest possible delay. Therefore, the most important feature of low-latency block cipher is to encrypt a block of data within one single clock cycle. However, for this kind of fully-unrolled implementation it will be a huge challenge to achieve acceptable chip area. Therefore, the number of rounds must be moderate and the round function may be relatively complex. For example, SP-type or SPS-type round function with lightweight low latency components such as 4-bit S-box and heavy linear layer such as MDS matrix are usually employed.

So far, only a few low-latency ciphers have been proposed. PRINCE [9] is the first low-latency block cipher proposed by Borghoff et al. at ASIACRYPT 2012, and it has already been deployed in a number of products including LPC55S of NXP Semiconductors [17]. Later, Bozilov et al. proposed an updated version called PRINCEv2 [12] which claims to increase the security level of PRINCE by making only small modifications. Inspired by the design of PRINCE, Beierle et al. proposed a low-latency tweakable block cipher called MANTIS [7]. Qualcomm company has also proposed a low-latency block cipher family called QARMA [1], which targets at applications such as memory encryption and pointer authentication. QARMA has already been used to achieve control flow integrity (CFI) in the products of ARMv8.3 [18]. In 2021, Banik et al. proposed Orthros [4], a low-latency pseudorandom function (PRF) which ignors the support of decryption to achieve ultra low-latency. Leander et al. proposed another ultra low-latency block cipher family called SPEEDY at CHES 2021 [15]. It primarily targets at secure process architectures embedded in high-end CPUs. It achieves ultra low-latency in single-cycle encryption speed, while the decryption is less efficient. Up to now, SPEEDY outperforms all the other known encryption primitives with the lowest latency in hardware.

On the other hand, in the security aspect, there exists more risk for low-latency cipher since the number of rounds must be as small as possible in order to achieve ultra low-latency. Not surprisingly, many low-latency block ciphers suffer serious security problems. Several theoretical attacks which can achieve even full-round have been reported. For example, Soleimany et al. [21] proposed an effective reflection attack which can apply to full-round \({\texttt {PRINCE}_{{core}}}\), and Dobraunig et al. [14] proposed a practical clustering differential attack on full-round MANTIS\(_5\). Therefore, for low-latency block cipher designs, study about the security margin and accurate security evaluation against known attacks may be very important and desirable.

Related Work: The notation of branch number is usually used to evaluate the minimum number of active S-boxes for SP-type round function. However, for block ciphers employing SPS-type round function, it will be difficult to obtain the accurate number of active S-boxes based on branch number directly. Therefore, the combination of SPS is usually considered together as a Super S-box and then the minimum number of active Super S-boxes is evaluated instead. Obviously, it will be less accurate and this method is usually used as a rough security evaluation against differential analysis in the design of block ciphers with SPS-type round function.

As a newly proposed low-latency block cipher, SPEEDY has not only outstanding performance in latency, but also has an unusual structure in contrast to other known low-latency block ciphers. Instead of the \(\alpha \)-reflection structure [9] with SP-type round function used in PRINCE, MANTIS and QARMA, SPEEDY employs an SPS-type round function with two layers of 6-bit S-box connected by bit rotations and then a binary matrix multiplication as the linear function. The block size and key size of SPEEDY are both 192-bit, and the number of rounds can be 5/6/7 which corresponds to different levels of security. In order to evaluate its security against differential and linear attacks, the designers try to analyze the minimum number of active S-boxes to give an upper bound for the probabilities of differential and linear trails. A direct way to compute the minimum number of S-boxes is by making use of the notation of branch number.

However, since the round function of SPEEDY consists of two layers of 6-bit S-box connected by bit rotations which can be considered as a Super S-box, it will be difficult to obtain the number of active 6-bit S-boxes directly. Therefore, in [15] the designers only considered 1-bit to 1-bit transitions through Super S-box whose maximum differential probability is \(2^{-6}\), and then the binary matrix with branch number \(bn = 8\) can ensure that the maximum expected differential probability of differential trails over tow rounds of SPEEDY is \(2^{-6 \times 8}\). Moreover, they proposed an extension for the definition of branch number called higher-order branch number \(bn_r\), which represents the minimum number of 1-bit to 1-bit transitions over r rounds. Then the maximum expected differential probability over r-round SPEEDY is estimated by \(2^{-6 \times bn_r}\).

Obviously, this is inaccurate since only 1-bit to 1-bit transitions through Super S-box are considered. Without the restriction of 1-bit difference, there should exist differential trails with less active 6-bit S-boxes. Not surprisingly, when they search for the minimum number of active S-boxes by assuming that all the differentials through the S-box are possible and there are at most 8 active S-boxes in the internal state, the results of minimum number of active 6-bit S-boxes for 2–4 rounds are all significantly smaller than the values estimated according to branch number [15]. This may threaten the security margin since the round number for low-latency block ciphers are usually rather small. Therefore, for block ciphers employing SPS-type round function, accurate evaluation of the minimum number of active S-boxes may be vital to the designers, especially when heuristic search for differential trail of long rounds is impractical.

Moreover, we have noticed that after the submission of our paper and during the review phase, Boura et al. reported some similar results about differential attack of SPEEDY in [10, 11]. They implemented a search for finding optimal trails under some constraints. The main search strategy was to precompute all good one-round trails such that both columns x and M(x) have at most 7 active bits each. Then they chained them to create longer trails and finally obtained a 5.5-round differential distinguisher with probability \(2^{-183.59}\). Their main contributions focused on improvements of the key-recovery part. They proposed some non-trivial techniques of efficient data and key-sieving, and based on them they presented a full-round differential attack of SPEEDY-7-192 with a time complexity of \(2^{187.84}\) and data complexity of \(2^{187.28}\).

Our Contribution: In this paper, we present some new observations on the branch number and study concrete differential analysis of SPEEDY. First of all, we propose a new notation of partition branch number, which can describe the minimum number of active S-boxes for SPEEDY more accurately. We also give an efficient algorithm to compute the value of partition branch number. Then by extending the notation to higher-order partition branch number, we can obtain more accurate results of the minimum number of active S-boxes for more rounds. Based on this, we have obtained the minimum number of active 6-bit S-boxes for 2–7 rounds SPEEDY are as follows:

$$\begin{aligned} pbn_2 = 13, \;\; pbn_3 = 23, \;\; pbn_4 = 35, \;\; pbn_5 = 45, \;\; pbn_6 = 57, \;\; pbn_7 = 67. \end{aligned}$$

It is noteworthy that these results are significantly smaller than the minimum number of active S-boxes in [15]. Hence, it will contradict the security margin evaluated by the designers. Moreover, our computation of higher-order partition branch number can be used to evaluate the minimum number of active S-boxes more directly and efficiently. Compared to the time-consuming search method, we can obtain accurate results for arbitrary rounds while automatic search may be infeasible for long rounds.

On this basis, we also search for the optimal differential characteristics of SPEEDY. Instead of the assumption that all the differential transitions through the S-box are possible, we take the Difference Distribution Table (DDT) of the 6-bit S-box into consideration. Therefore, we can study concrete differential characteristics which satisfy the actual difference distribution table and evaluate the security margin of SPEEDY more precisely. We first present a special kind of 1-bit to 1-bit differential trails for 2–7 rounds SPEEDY together with the maximum differential probabilities. Then, based on the computation of higher-order partition branch number, we present examples of optimal differential characteristics for 2–7 rounds SPEEDY, whose differential probabilities are \(2^{-46.2}\), \(2^{-76.72}\), \(2^{-129.2}\), \(2^{-170.0}\), \(2^{-216.0}\) and \(2^{-266.2.0}\) respectively.

Furthermore, by utilizing the simple bit-permutation key schedule of SPEEDY, we can extend the differential trail search method and construct an efficient 6-round related-key differential trail with probability \(2^{-179.2}\). Based on it, we can present a related-key differential attack on full round SPEEDY-7-192 with data complexity of \(2^{186.2}\) chosen-plaintexts and time complexity of \(2^{160.13}\) encryptions. Compared to other third-party cryptanalysis results on SPEEDY [10, 19], this is the first full-round related-key differential attack reported so far and the attack complexity outperforms all the previous known results.

Organization of the Paper: First of all, we give a brief description of SPEEDY in Sect. 2. Then, some observations of the branch number are explained in Sect. 3. In Sect. 4, we present the definition of partition branch number and based on it we give some more accurate results about the security evaluation of SPEEDY against differential analysis. Concrete differential trails for 2–7 rounds SPEEDY and experimental validation are provided in Sect. 5. In Sect. 6, we present related-key differential attack on full round SPEEDY-7-192. Finally, Sect. 7 concludes the paper.

2 Specification of SPEEDY

SPEEDY is a family of ultra low-latency block ciphers with different block size and varying number of rounds. For example, SPEEDY-r-6l is an instance with block and key sizes 6l bits and it iterates r rounds. The suggested parameter is SPEEDY-r-192 which achieves 128-bit security when iterated \(r=6\) rounds and full 192-bit security when iterated \(r=7\) rounds, while a decent security level (\(\ge 2^{128}\) time complexity when data complexity is limited to \(\le 2^{64}\)) when iterated \(r=5\) rounds.

For SPEEDY-r-192, the internal state is viewed as an \(32 \times 6\) array of bits. The notation x[ij] denotes the bit located at row i and column j of the state x with \(0 \le i < 32\) and \(0 \le j < 6\). First of all, the internal state is initialized with the 192-bit plaintext with the same order used for indexing bits. Then, round functions are applied on the internal state. Each round function is composed of the following operations: AddRoundKey (\(\texttt {A}_{k_r}\)), SubBox(SB), ShiftColumns(SC), SubBox(SB), ShiftColumns(SC), MixColumns (MC), AddRoundConstant (\(\texttt {A}_{c_r}\)). Note that in the last round, the linear layer and constant addition are omitted, and instead an extra key addition is applied. Therefore, the encryption procedure can be expressed as follows:

$$\begin{aligned} \texttt {A}_{k_0} \rightarrow \texttt {SB} \rightarrow \texttt {SC} \rightarrow \texttt {SB} \rightarrow \texttt {SC} \rightarrow \texttt {MC} \rightarrow \texttt {A}_{c_0} \rightarrow \cdots \rightarrow \texttt {A}_{k_{r-1}} \rightarrow \texttt {SB} \rightarrow \texttt {SC} \rightarrow \texttt {SB} \rightarrow \texttt {A}_{k_r} \end{aligned}$$
  • SubBox(SB): The 6-bit S-box S is applied to each row of the state. The contents of the 6-bit S-box is given in Table 1.

  • ShiftColumns(SC): The j-th column of the state is rotated upside by j bits.

  • MixColumns(MC): A cyclic binary matrix is multiplied to each column of the state. Namely, it can be computed as follows and the addition in the indices of the state is in module 32 for the first (row) index.

    $$\begin{aligned} y[i,j]=x_{i,j} \oplus x_{[i+1,j]} \oplus x_{[i+5,j]} \oplus x_{[i+9,j]} \oplus x_{[i+15,j]} \oplus x_{[i+21,j]} \oplus x_{[i+26,j]}, \forall i,j. \end{aligned}$$
  • AddRoundKey(\(\texttt {A}_{k_r}\)): The 192-bit round key \(k_r\) is XORed to the state.

  • AddRoundConstant(\(\texttt {A}_{c_r}\)): The 192-bit constant \(c_r\) is XORed to the state.

Table 1 Contents of the 6-bit S-box used in SPEEDY (in hexadecimal notation)

The KeySchedule of SPEEDY receives a 192-bit master key as the first round key and then applies a simple bit permutation to compute the next round key iteratively. We will omit the description of RoundConstant values since they are not related to our work. Interested readers can refer to [15] for more details.

3 Observations on branch number

3.1 (Higher-order) branch number

The SP-type round function is one of the most widely used structures in the design of block cipher. Normally, a layer of non-linear S-box is used to provide confusion effect and a linear function is used to provide diffusion effect. For measuring diffusion effect, the notation of branch number is defined to represent the minimum number of active S-boxes in consecutive 2-round. However, for block ciphers employing SPS-type round function, such as SPEEDY and Piccolo, it will be difficult to obtain the number of active S-boxes directly based on branch number. Therefore, they have to consider the two layers of S-boxes together as Super S-box and then evaluate the minimum number of active Super S-box instead [15, 20].

For the round function of SPEEDY, the linear function MC can be considered as a cyclic binary matrix multiplied to each column of the state. Denote the corresponding binary cyclic matrix of MC as M, and then its branch number is defined as:

$$\begin{aligned} bn = \mathop {\min }\limits _{x \in F_2^{l} \backslash \left\{ 0 \right\} } hw(x)+hw(M \times x^T) \end{aligned}$$

where \(hw(\cdot )\) denotes the hamming weight of a binary array. According to the differential analysis in [15], they only considered 1-bit to 1-bit differential over \(\texttt {SB} \circ \texttt {SC} \circ \texttt {SB}\) where the first and second SB operations are both 1-bit to 1-bit transitions, and its maximum differential probability is \(2^{-6}\). Then, considering that the branch number of MC is \(bn = 8\), they can ensure that the maximum expected differential probability of this kind of 1-bit to 1-bit differential trails over 2-round SPEEDY is \((2^{-6})^8 = 2^{-48}\).

Moreover, in order to evaluate the maximum differential probability of 1-bit to 1-bit differential trails over several rounds of SPEEDY, they proposed a new notation of higher-order branch number \(bn_r\). For example, the higher-order branch number for 3-round SPEEDY can be computed as follows:

$$\begin{aligned} bn_3 = \mathop {\min }\limits _{\begin{array}{c} {i_1 ,i_2 ,i_3 \ne 0}\\ {H[i_1 ][i_2 ] = H[i_2 ][i_3 ] = 1} \end{array}} i_1 + i_2 + i_3 \end{aligned}$$

where H is a binary table such that the element in the position H[i][j] is 1 if and only if there is an \(x \in F_2^{l} \backslash \{ 0\}\) with \(hw(x) = i\) and \(hw(M \times x^T) = j\). Obviously, the branch number bn is the same as \(bn_2\) defined as follows:

$$\begin{aligned} bn_2 = \mathop {\min }\limits _{\begin{array}{c} {i_1 ,i_2 \ne 0}\\ {H[i_1 ][i_2 ] = 1} \end{array}} i_1 + i_2 \end{aligned}$$

Then the maximum expected differential probability of 1-bit to 1-bit differential trails over r-round SPEEDY can be estimated as \(2^{-6 \times bn_r}\).

3.2 Differential analysis of 2-round SPEEDY

In the above analysis, they have only considered 1-bit to 1-bit differential over \(\texttt {SB} \circ \texttt {SC} \circ \texttt {SB}\) in order to use the branch number of MC to evaluate the maximum expected differential probability over two rounds of SPEEDY. However, for all the differential trails over \(\texttt {SB} \circ \texttt {SC} \circ \texttt {SB}\), if we can consider multiple nonzero bits differential transition for the second SB operation, the number of active 6-bit S-boxes can be reduced significantly.

Without loss of generality, we set one column of the input and output states of MC as active, and then the operations for two rounds of SPEEDY can be simplified as follows:

$$\begin{aligned} \overleftarrow{\texttt {SB} \circ \texttt {SC} \circ \texttt {SB}} \circ \texttt {SC} \circ \texttt {MC} \circ \overrightarrow{\texttt {SB} \circ \texttt {SC} \circ \texttt {SB}}. \end{aligned}$$

Take the central operation MC as a starting point, and assume the second SB operation in both forward and backward directions can take n-bit differential transitions with \(1 \le n \le 6\). Then we can obtain differential trail of 2-round SPEEDY with much higher probability. We give a simple example to illustrate the differential transitions of this kind of trails in the following.

Example 1

We present an example of 2-round differential trail for SPEEDY. Fig. 1 illustrates its differential propagations in detail. The black box denotes ’1’ difference and the empty box denotes ’0’ difference, respectively. There are only 13 active S-boxes and the maximum differential probability should be \((2^{-3})^{13} = 2^{-39}\). Obviously, it is much higher than the maximum differential probability of 1-bit to 1-bit differential trails which is estimated to be \(2^{-48}\).

Fig. 1
figure 1

An example of 2-round differential trail for SPEEDY

4 Partition branch number

Inspired by the analysis of 2-round differential trail of SPEEDY, we can define a new form of branch number to describe the least number of active S-boxes more accurately. For the round function of SPEEDY, since SC does not change the column position and MC is a cyclic matrix, we can omit the second SC in the computation of branch number as follows.

$$\begin{aligned} \texttt {SB} \circ \texttt {SC} \circ \texttt {SB} \circ \texttt {MC} \circ \texttt {SB} \circ \texttt {SC} \circ \texttt {SB}. \end{aligned}$$

Similar to the definition of hamming weight of a binary array \(hw(\cdot )\), we can define a new form of partition hamming weight as follows.

Definition 1

(Partition Hamming Weight) For a binary array x, partition all the active bits into several sets in order and each set satisfies that the maximum distance does not exceed k. Then the minimum number of active sets is defined as the partition hamming weight of a binary array, which is denoted as \(phw_k(x)\).

Considering the round function of SPEEDY, the binary array x defined in the above definition can be considered as a column of the state. Since the operation of SC is a rotation with 5 bits at most, we will only consider the situation of \(k = 5\). Therefore, in the remaining of this paper, we will abbreviate \(phw_5(x)\) as phw(x). The definition of partition hamming weight represents the minimum number of active S-boxes for the output difference after the operations of SB and SC. In Fig. 1, the difference propagation of the second round illustrates a sample of partition. Moreover, we will give a more detailed example as follows.

For example, given the following binary array x, which can be considered as a column of the state. There are 7 active bits in the positions {0,1,5,9,15,21,26}.

$$\begin{aligned} x^T = [1,1,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0] \end{aligned}$$

They can be partitioned into 4 active sets as follows. The first set contains the 0-th and 1-th bits and its maximum distance is 1. The second set contains the 5-th to 9-th bits and its maximum distance is 4. The third set contains only the 15-th bit and its maximum distance is denoted as 0. The last set contains the 21-th to 26-th bits and its maximum distance is 5. Obviously, this is a valid partition. By checking all the possible valid partitions, we can get the minimum number of active sets is 4, namely \(phw(x) = 4\).

$$\begin{aligned} \left\{ 1,1\right\} ,0,0,0,\left\{ 1,0,0,0,1\right\} ,0,0,0,0,0,\left\{ 1\right\} ,0,0,0,0,0,\left\{ 1,0,0,0,0,1\right\} ,0,0,0,0,0 \end{aligned}$$

On this basis, we can define a new kind of branch number to evaluate the minimum number of active S-boxes considering multi-bit differentials instead of only 1-bit to 1-bit differentials. Similar to the above analysis, we set one column of input and output states of MC as active, and assume the second SB operation in both forward and backward directions can take n-bit differential transitions with \(1 \le n \le 6\). According to the definition of partition hamming weight, the maximum distance of each active set does not exceed 5. Therefore, after the operations of SB and SC, they can be transited into the same S-box. Then an improved branch number can be defined as follows.

Definition 2

(Partition Branch Number) For the binary cyclic matrix M of the MC operation, we define its partition branch number as follows:

$$\begin{aligned} pbn = \mathop {\min }\limits _{x \in F_2^{l} \backslash \left\{ 0 \right\} } hw(x)+hw(M \times x^T) + phw(x) + phw(M \times x^T). \end{aligned}$$

The partition branch number can ensure that there are at least pbn active S-boxes over two rounds. Here we give a brief explanation for this claim. For any input difference \(x \in F_2^{l} \backslash \left\{ 0 \right\} \), the output difference after MC should be \(M \times x^T\). We can use \(hw(x)+hw(M \times x^T)\) to denote the number of active S-boxes for the operations of \(\texttt {SB} \circ \texttt {MC} \circ \texttt {SB}\). Then according to the definition of partition hamming weight, \(phw(x)+phw(M \times x^T)\) denotes the minimum number of active S-boxes for the operations of \(\texttt {SB} \circ \texttt {SC}\). Therefore, the partition branch number defined above can represent the the minimum number of active S-boxes through two rounds operations of \(\texttt {SB} \circ \texttt {SC} \circ \texttt {SB} \circ \texttt {MC} \circ \texttt {SB} \circ \texttt {SC} \circ \texttt {SB}\).

According to the difference distribution table (DDT) of the 6-bit S-box used in SPEEDY, the maximum differential probability is \(2^{-3}\). Therefore, the maximum expected differential probability of differential trails over two rounds of SPEEDY can be estimated as \((2^{-3})^{pbn}\). It is noteworthy that based on partition branch number, we can directly evaluate the minimum number of active S-boxes. Compared to the traditional branch number which is used to evaluate the minimum number of active \(\texttt {SB} \circ \texttt {SC} \circ \texttt {SB}\), we can obtain more accurate results about the maximum differential probability of SPEEDY.

4.1 Computation of pbn

In order to compute the value of pbn, the main problem is to compute the partition hamming weight of a binary array. For an arbitrary binary array x, we can use a recursive algorithm to compute the value of phw(x). First of all, we choose all the non-zero bits to form a set. Then we use a recursive algorithm to traverse all the possible partitions and check whether the condition is satisfied for each partition. Finally, the minimum number of active sets for all valid partitions is output as phw(x).

Denote an l-bit binary array as \(x=[x_0, x_1, \ldots x_{l-1}]\), and choose the indices of all non-zero bits in order to form a set \(\{{i_1}, {i_2}, \ldots , {i_j}\}\), where \(0 \leqslant {i_1}< {i_2}< \ldots < {i_j} \leqslant {l-1}\) and \(x_{i_1} = x_{i_2} = \ldots = x_{i_j} = 1\). Moreover, since SC is a bit rotation operation, the indexes \({i_1}, {i_2}, \ldots , {i_j}\) can be considered as modular l in the computation of maximum distance. For example, for the binary array displayed above after Definition 1, the index set formed by all non-zero bits is \(\{0,1,5,9,15,21,26\}\).

In order to compute the partition hamming weight, we present the following Algorithm 1 to traverse and check all the valid partitions for the set \(\{{i_1}, {i_2}, \ldots , {i_j}\}\). Since there are j elements in the set, the following trivial partition with j subsets is always valid.

$$\begin{aligned} \{{i_1}\}, \{{i_2}\}, \ldots , \{{i_j}\} \end{aligned}$$

Therefore, we can start to traverse and check if there is a valid partition with \(j-1\) subsets. If we obtain a valid partition, then we can abort the traversal and go on to check partitions with \(j - 2\) subsets and so on. The algorithm stops when all the possible partitions with \(n \; (1 \le n \le j-1)\) subsets are invalid and the output is \(phw = n + 1\). As an exceptional case, if the partition with \(n = 1\) namely the set \(\{{i_1}, {i_2}, \ldots , {i_j}\}\) is also valid, then the output is \(phw = 1\).

Algorithm 1
figure a

Computation of partition hamming weight

For partitions with n subsets, we use the following recursive function TraverseCheck to traverse and check if there is a valid partition. According to the definition of phw, a partition is valid if it satisfies the condition that the maximum distance of each subset does not exceed 5. Moreover, since MC is a cyclic binary matrix, without loss of generality, all the partitions can start from the first element \(i_1\). Therefore, the set \(\{{i_1}, {i_2}, \ldots , {i_j}\}\) can be partitioned into n subsets in order, and each subset contains \(s_1, s_2, \ldots , s_n\) elements, where \(s_1 + s_2 + \ldots + s_n = j\). Note that all the subsets should not be empty, namely \(1 \le s_1, s_2, \ldots , s_n \le j\). Considering that the element in each set is just the index of active bit, the maximum distance of each subset can be simply computed as the subtraction of the last element and the first element in the subset. For example, the maximum distance of the first subset is equal to \(i_{s_1} - i_1\), and the maximum distance of the second subset is equal to \(i_{s_1 +s_2} - i_{s_1 +1}\), etc.

Algorithm 2
figure b

TraverseCheck

Algorithm 3
figure c

Traverse

4.2 Higher-order partition branch number

Similarly, in order to evaluate the minimum number of active S-boxes over several rounds, we can further define the notation of higher-order partition branch number \(pbn_r\). For example, the higher-order partition branch number for 3-round SPEEDY can be computed as follows:

$$\begin{aligned} pbn_3 = \mathop {\min }\limits _{\begin{array}{c} {i_{10}, i_{11} ,i_{20}, i_{21} ,i_{30}, i_{31} \ne 0}\\ {R[i_{10} ][i_{11} ] = R[i_{20} ][i_{21} ] = R[i_{30} ][i_{31} ]= 1}\\ {H[i_{11} ][i_{20} ] = H[i_{21} ][i_{30} ] = 1} \end{array}}i_{10} + i_{11} + i_{20} + i_{21} + i_{30} + i_{31} \end{aligned}$$

where H is a binary table such that the element in the position H[i][j] is 1 if and only if there is an \(x \in F_2^{l} \backslash \{ 0\}\) with \(hw(x) = i\) and \(hw(M \times x^T) = j\), and R is a binary table such that the element in the position R[i][j] is 1 if and only if there is an \(x \in F_2^l \backslash \{ 0\}\) with \(hw(x) = i\) and \(phw(x) \le j \le hw(x)\). Namely, the table R records all the possible differential trails over \(\texttt {SB} \circ \texttt {SC} \circ \texttt {SB}\) where the second layer of SB can be n-bit to 1-bit differential.

Based on the above definition, we can compute the higher-order partition branch number for 3–7 rounds SPEEDY and the results are as follows:

$$\begin{aligned} pbn_3 = 21, \;\; pbn_4 = 31, \;\; pbn_5 = 41, \;\; pbn_6 = 51, \;\; pbn_7 = 61. \end{aligned}$$

Considering that the best differential probability of the 6-bit S-box is \(2^{-3}\), the maximum probability of differential trails over 3–7 rounds is estimated to be smaller than \(2^{-63}\), \(2^{-93}\), \(2^{-123}\), \(2^{-153}\) and \(2^{-183}\).

Notice that in the above computation, we only consider the hamming weight of a binary array x, which is rather imprecise especially taking into account the operation MC between two rounds. In order to improve the accuracy, we can precompute a table R2 to record all the possible differential trails over 2-round SPEEDY (namely \(\texttt {SB} \circ \texttt {SC} \circ \texttt {SB} \circ \texttt {MC} \circ \texttt {SB} \circ \texttt {SC} \circ \texttt {SB}\)). The table R2 is first initialized with 0, and then the element in the position R2[i][j] is updated with the minimum value of \((hw(x) + hw(M \times x^T) + i + j)\) if and only if there is an \(x \in F_2^l \backslash \{ 0\}\) with \(phw(x) \le i \le hw(x)\) and \(phw(M \times x^T) \le j \le hw(M \times x^T)\). Namely, the element in R2[i][j] stores the minimal number of active S-boxes for all the possible differential trails over 2-round where the hamming weight of input difference is i and the hamming weight of output difference is j.

Then, we can compute the value of higher-order partition branch number more accurately. For even rounds, it can be computed as follows:

$$\begin{aligned} pbn_4= & {} \mathop {\min }\limits _{\begin{array}{c} {i_{10}, i_{11} ,i_{20}, i_{21} \ne 0}\\ {R2[i_{10} ][i_{11} ] \ne 0, R2[i_{20} ][i_{21} ] \ne 0}\\ {H[i_{11} ][i_{20} ] = 1} \end{array}} R2[i_{10} ][i_{11} ] + R2[i_{20} ][i_{21} ] \\ pbn_6= & {} \mathop {\min }\limits _{\begin{array}{c} {i_{10}, i_{11} ,i_{20}, i_{21} ,i_{30}, i_{31} \ne 0}\\ {R2[i_{10} ][i_{11} ] \ne 0, R2[i_{20} ][i_{21} ] \ne 0, R2[i_{30} ][i_{31} ] \ne 0}\\ {H[i_{11} ][i_{20} ] = H[i_{21} ][i_{30} ] = 1} \end{array}} R2[i_{10} ][i_{11} ] + R2[i_{20} ][i_{21} ] + R2[i_{30} ][i_{31} ] \end{aligned}$$

For odd rounds, we can combine the two tables R and R2 together, and the value of higher-order partition branch number is computed as follows:

$$\begin{aligned} pbn_3= & {} \mathop {\min }\limits _{\begin{array}{c} {i_{10}, i_{11} ,i_{20}, i_{21} \ne 0}\\ {R2[i_{10} ][i_{11} ] \ne 0}\\ {H[i_{11} ][i_{20} ] = 1}\\ {R[i_{20} ][i_{21} ] = 1} \end{array}} R2[i_{10} ][i_{11} ] + i_{20} + i_{21} \\ pbn_5= & {} \mathop {\min }\limits _{\begin{array}{c} {i_{10}, i_{11} ,i_{20}, i_{21} ,i_{30}, i_{31} \ne 0}\\ {R2[i_{10} ][i_{11} ] \ne 0, R2[i_{20} ][i_{21} ] \ne 0}\\ {H[i_{11} ][i_{20} ] = H[i_{21} ][i_{30} ] = 1}\\ {R[i_{30} ][i_{31} ] = 1} \end{array}}R2[i_{10} ][i_{11} ] + R2[i_{20} ][i_{21} ] + i_{30} + i_{31} \end{aligned}$$

Note that the partition branch number pbn is the same as \(pbn_2\) defined as

$$\begin{aligned} pbn_2 = \mathop {\min }\limits _{\begin{array}{c} {i_{10}, i_{11} \ne 0}\\ {R2[i_{10} ][i_{11} ] \ne 0} \end{array}} R2[i_{10} ][i_{11} ] \end{aligned}$$

Based on the above formulas of higher-order partition branch number, we can obtain more accurate results about the minimum number of active 6-bit S-boxes for 3–7 rounds SPEEDY as follows:

$$\begin{aligned} pbn_2 = 13, \;\; pbn_3 = 23, \;\; pbn_4 = 35, \;\; pbn_5 = 45, \;\; pbn_6 = 57, \;\; pbn_7 = 67. \end{aligned}$$

Considering that the best differential probability of the S-box is \(2^{-3}\), the maximum probability of differential trails over 2–7 rounds SPEEDY is estimated to be smaller than \(2^{-39}\), \(2^{-69}\), \(2^{-105}\), \(2^{-135}\), \(2^{-171}\) and \(2^{-201}\). It is noteworthy that these results are significantly higher than the maximum expected differential probabilities estimated by the designers in [15].

Moreover, these results are also consist with the search results of minimum number of active S-boxes obtained in [15]. By considering that there are at most 8 active words (of 6-bit) per state, they searched for the minimum number of active S-boxes for up to 4 rounds and the result are 13, 23 and 35 for 2, 3 and 4 rounds. Compared to the time-consuming search method, our computation of higher-order partition branch number can directly evaluate the minimum number of active S-boxes over several rounds more efficiently. Moreover, it can obtain accurate results for arbitrary rounds while automatic search may be infeasible for long rounds.

5 Concrete differential trails

In the above analysis, we always assume that all the differential transitions through the S-box are possible. However, when taking the actual Difference Distribution Table (DDT) of 6-bit S-box used in SPEEDY into consideration, the expected differential trail with maximum differential probability may not exist. Therefore, in order to verify the effectiveness of higher-order partition branch number obtained above and evaluate the security margin of SPEEDY more precisely, we will study concrete differential trails which satisfy the differential distribution table and compute its maximum differential probability.

In this section, we first present a special kind of 1-bit to 1-bit differential trails for 2–7 rounds SPEEDY and evaluate the corresponding concrete differential probabilities. Then, based on the computation of higher-order partition branch number, we present several effective n-bit concrete differential trails for 2–5.5 rounds SPEEDY and evaluate the maximum differential probabilities.

5.1 Concrete 1-bit to 1-bit differential trails for SPEEDY

First of all, according to the analysis of MC having branch number \(bn = 8\) and the maximum differential probability of 1-bit to 1-bit transitions over \(\texttt {SB} \circ \texttt {SC} \circ \texttt {SB}\) is \(2^{-6}\), the maximum expected differential probability over two rounds of SPEEDY is supposed to be \((2^{-6})^8=2^{-48}\). However, we can prove that there is no possible 1-bit to 1-bit transition with such maximum differential probability.

Similar to the notation used in [15], we also use a table \(T_1[i,j]\) to present the 1-bit to 1-bit differential probabilities of the SPEEDY S-box, and a table \(T_2[i,j]\) to present the 1-bit to 1-bit differential probabilities over \(\texttt {SB} \circ \texttt {SC} \circ \texttt {SB}\).

$$\begin{aligned} T_2[i,j] = \mathop {\max }\limits _{k} T_1[i,k] \cdot T_1[k,j] \end{aligned}$$

The results of \((T_1, T_2)\) are listed in Table 2. We find that for most of the entries in \(T_2\), there is a unique solution of k to achieve the maximum differential probability. For example, for the entry \(T_2[5,2] = 16\), the only possible transition is \(T_1[5,1] \cdot T_1[1,2]\). There are only seven entries which have 2 or 3 solutions of k, and for simplicity we can choose the first solution as the transition.

Table 2 1-bit to 1-bit differential probabilities of 6-bit S-box (\(T_1\)) and \( \texttt {SB} \circ \texttt {SC} \circ \texttt {SB}\) (\(T_2\))

Moreover, since SC and MC do not change the column position of active bits, the 1-bit to 1-bit differential trail over two rounds of SPEEDY can be illustrated as the following connection:

$$\begin{aligned} \texttt {SB} \circ \texttt {SC} \circ \texttt {SB} \xrightarrow {\texttt {MC}} \texttt {SB} \circ \texttt {SC} \circ \texttt {SB}. \end{aligned}$$

Based on the branch number of MC, there are at least 8 active 1-bit to 1-bit transitions over \(\texttt {SB} \circ \texttt {SC} \circ \texttt {SB}\). All the possible patterns satisfy that there is one active \(\texttt {SB} \circ \texttt {SC} \circ \texttt {SB}\) in the first round and seven active \(\texttt {SB} \circ \texttt {SC} \circ \texttt {SB}\) in the second round. Therefore, we can define a table \(T_4\) to store the maximum differential probability of the above connection, where each entry \(T_4[i,j]\) denotes the maximum possible differential probability that the active input bits in the column i transit to active output bits in the column j. Then, the values of \(T_4[i,j]\) can be computed by the following equation:

$$\begin{aligned} T_4[i,j] = \mathop {\max }\limits _{k} T_2[i,k] \cdot (T_2[k,j])^7. \end{aligned}$$

The results of \(T_4\) are listed in Table 3. We can see that the maximum probability of concrete 1-bit to 1-bit differential trail over two rounds of SPEEDY is \(2^{-50}\), which is slightly lower than the estimated maximum probability of \(2^{-48}\).

Table 3 Probabilities of 1-bit to 1-bit differential trails for 2-round SPEEDY (\(T_4\))

Similarly, we can analyze the maximum concrete probability of 1-bit to 1-bit differential trails for 3–7 rounds SPEEDY. Based on the above analysis, in order to keep the active columns as few as possible, all the active \(\texttt {SB} \circ \texttt {SC} \circ \texttt {SB}\) in one round should transit to the same active column. Moreover, since SC and MC are both column-rotation equivalent, the number of active bits in each column will not be affected by SC. Therefore, the minimum number of active 1-bit to 1-bit transitions over \(\texttt {SB} \circ \texttt {SC} \circ \texttt {SB}\) for r-round can be computed as:

$$\begin{aligned} \mathop {\min }\limits _{x \in F_2^{l} \backslash \left\{ 0 \right\} } \sum \limits _{i = 1}^r {hw(M^{i-1} \times x^T)} \end{aligned}$$

where M is the corresponding binary matrix of MC and \(M^0\) denotes the identity matrix I. By traversing all the possible values of x, we can know that the results of minimum number of active 1-bit to 1-bit transitions over \(\texttt {SB} \circ \texttt {SC} \circ \texttt {SB}\) for 3–7 rounds are 13, 32, 36, 48 and 56 respectively. Then, similar to the computation of \(T_4\), we can traverse all the possible patterns of x minimizing the above equation and compute the maximum concrete probability of 1-bit to 1-bit differential trails for 3–7 rounds. For example, the table \(T_6\) stores the maximum concrete probabilities of 1-bit to 1-bit differential trails for 3-round SPEEDY, and the value of each entry \(T_6[i,j]\) can be computed as follows.

$$\begin{aligned} T_6[i,j] = \mathop {\max }\limits _{k1, k2} (T_2[i,k1])^{hw(x)} \cdot (T_2[k1,k2])^{hw(M \cdot x)} \cdot (T_2[k2,j])^{hw(M^2 \cdot x)}. \end{aligned}$$

Finally, the results of \(T_6\) are listed in Table 4, and more results of \(T_8\)-\(T_{14}\) for 4–7 rounds are listed in Table 9 in Appendix A.

Table 4 Probabilities of 1-bit to 1-bit differential trails for 3-round SPEEDY (\(T_6\))

To sum up, for this special kind of 1-bit to 1-bit differential trails, the maximum concrete probabilities for 2–7 rounds are \(2^{-50}\), \(2^{-82.08}\), \(2^{-196.49}\), \(2^{-229.96}\), \(2^{-306.60}\) and \(2^{-356.60}\). Moreover, in order to provide experimental verification, we present example differential trails for 2–7 rounds SPEEDY in Table 5. The input difference of each round is represented column-wise in hexadecimal and R/R\(_{-1}\) denotes one complete round and the final round respectively.

Table 5 Example 1-bit to 1-bit differential trails for 2–7 rounds SPEEDY

5.2 Concrete n-bit differential trails for SPEEDY

In the above analysis of 1-bit to 1-bit differential trails, we have only considered 1-bit to 1-bit differential transitions over the 6-bit S-box. In this section, we present concrete differential trails which can take n-bit differential transitions over the S-box with \(1 \le n \le 6\). According to the analysis in Sect. 4, the least number of active S-boxes for this kind of differential trails can be reduced significantly. Therefore, we can construct effective differential trails with much higher probability. Based on the result of higher-order partition branch number \(pbn_r\), we can obtain the least number of active S-boxes for r-round. Then by traversing all the possible patterns and taking the actual Difference Distribution Table (DDT) of the S-box into consideration, we can get concrete differential trail with maximum differential probability for r-round SPEEDY.

5.2.1 2-Round differential trail

According to the analysis of partition branch number \(pbn_2 = 13\), there are at least 13 active S-boxes over two rounds. Moreover, based on the analysis in Sect. 4.2, in order to compute the value of pbn, we precompute a table R2[i][j] to store the minimal number of active S-boxes for all the possible differential trails over 2-round SPEEDY, where the hamming weight of input difference is i and the hamming weigh of output difference is j. Then, by checking all the entries of table R2, we can find that there is only one entry satisfying \(R2[1][4] = 13\). Namely, the only possible pattern of 2-round differential trail with 13 active S-boxes satisfies that there is only one active bit in the input difference and 4 active S-boxes in the output difference.

Therefore, by traversing all the possible patterns and taking the actual difference distribution table of the S-box into consideration, we can get the maximum differential probability of 2-round differential trail for SPEEDY. In Fig. 2, we present an example differential trail for 2-round SPEEDY, where the black box denotes ‘1’ difference and the empty box denotes ‘0’ difference. There are 13 active S-boxes and the concrete differential probability is equal to \(2^{-3.4} \times 2^{-4} \times (2^{-4})^5 \times (2^{-3})^4 \times (2^{-3.4})^2 \approx 2^{-46.2} \). Obviously, it is much better than the concrete 1-bit to 1-bit differential trail for 2-round SPEEDY presented in Sect. 5.1.

Fig. 2
figure 2

Concrete differential trail for 2-round SPEEDY

5.2.2 3-Round differential trail

Similarly, according to the analysis of \(pbn_3 = 23\), there are at least 23 active S-boxes over three rounds. Moreover, based on the definition of \(pbn_3\),

$$\begin{aligned} pbn_3 = \mathop {\min }\limits _{\begin{array}{c} {i_{10}, i_{11} ,i_{20}, i_{21} \ne 0}\\ {R2[i_{10} ][i_{11} ] \ne 0}\\ {H[i_{11} ][i_{20} ] = 1}\\ {R[i_{20} ][i_{21} ] = 1} \end{array}} R2[i_{10} ][i_{11} ] + i_{20} + i_{21} \end{aligned}$$

we traverse all the possible values of \(i_{10}, i_{11} ,i_{20}, i_{21}\) and find that there is only one solution satisfying 23, namely \(i_{10} = 1, i_{11} = 7 ,i_{20} = 5, i_{21} = 2\). Note that in this pattern the first two rounds R2[1][7] just represents a 1-bit to 1-bit differential trail. Therefore, we can exploit the results of concrete 1-bit to 1-bit differential probabilities of 2-round SPEEDY in Table 3. By checking all the possible connections of 2-round 1-bit to 1-bit differential trail with 1-round n-bit differential trail satisfying the given pattern, we can get the maximum differential probability of concrete 3-round differential trail for SPEEDY.

In Fig. 3, we present an example of the best differential trail for 3-round SPEEDY, where the black box denotes ’1’ difference and the empty box denotes ’0’ difference. There are 23 active S-boxes and the first 2-round (\(\texttt {2R}\)) is 1-bit to 1-bit differential trail corresponding to \(T_4[1,4]\) in Table 3 whose differential probability is \(2^{-51.32}\). Therefore, the total probability of the 3-round differential trail is equal to \(2^{-51.32} \times (2^{-3})^2 \times (2^{-4})^3 \times 2^{-4} \times 2^{-3.4} \approx 2^{-76.72} \). As a contrast, the best differential probability of concrete 1-bit to 1-bit differential trail for 3-round SPEEDY is only \(2^{-82.08} \) as listed in Table 4.

Fig. 3
figure 3

Concrete differential trail for 3-round SPEEDY

5.2.3 4-Round differential trail

There are at least 35 active S-boxes for 4-round SPEEDY since \(pbn_4=35\),

$$\begin{aligned} pbn_4 = \mathop {\min }\limits _{\begin{array}{c} {i_{10}, i_{11} ,i_{20}, i_{21} \ne 0}\\ {R2[i_{10} ][i_{11} ] \ne 0}\\ {R2[i_{20} ][i_{21} ] \ne 0}\\ {H[i_{11} ][i_{20} ] = 1} \end{array}} R2[i_{10} ][i_{11} ] + R2[i_{20} ][i_{21} ] \end{aligned}$$

Unfortunately, there are only four solutions of \(i_{10}, i_{11} ,i_{20}, i_{21}\) satisfying there are 35 active S-boxes, and by traversing all the possible patterns we find that it is impossible to construct a concrete differential trail when the actual difference distribution table of the S-box is considered. Therefore, we further check patterns with the almost least number of active S-boxes. By traversing all the possible patterns with 36 and 37 active S-boxes, we can get the maximum differential probability of concrete differential trails for 4-round SPEEDY.

In Table 6, we present an example of the best concrete differential trail for 4-round SPEEDY. The input and output differences of each operation are expressed column-wise in hexadecimal as follows. The total probability of this 4-round differential trail is \( 2^{-129.2} \). Obviously, it is significantly higher than the best concrete 1-bit to 1-bit 4-round differential trail listed in Table 5, whose probability is \(2^{-196.49} \) exceeding the bound of 192-bit security already.

Table 6 Concrete differential trail for 4-round SPEEDY

5.2.4 5-Round differential trail

Similar to the situation of 4-round concrete differential trail, there is also no possible 5-round differential trail satisfying \(pbn_5 = 45\) when the actual difference distribution table of the S-box is considered. Therefore, we further check all the possible patterns with 46-48 active S-boxes and search for the best concrete differential trail for 5-round SPEEDY. As a result, we present an example of the best concrete differential trail for 5-round SPEEDY in Table 7. The input difference and output difference can be expressed column-wise in hexadecimal as follows.

$$\begin{aligned} (\texttt {00200000},\texttt {00200000},0,0,0,0) \xrightarrow {\texttt {5R}} (0,\texttt {00001000},0,\texttt {01001000},\texttt {01000040},0) \end{aligned}$$

The total probability of this 5-round differential trail is \( 2^{-170.0} \). It is noteworthy that this is still higher than \(2^{-192}\) and hence it can be used to mount a key-recovery attack on SPEEDY-7-192.

Table 7 Concrete differential trail for 5-round SPEEDY

5.2.5 Differential trails for more rounds

We have also searched for the optimal differential trails for more rounds by considering the least number of active S-boxes (\(pbn_6=57\) and \(pbn_7=67\)). Unsurprisingly, there is no valid differential trail for 6-round or 7-round SPEEDY with probability higher than \(2^{192}\), when considering the difference distribution table of S-box. For simplicity, we only present input/output differences and the probabilities of 6-round and 7-round differential trails we have obtained.

The input and output differences of 6-round differential trail can be expressed column-wise in hexadecimal as follows, and the probability is \( 2^{-216.0} \).

$$\begin{aligned} (\texttt {00200000},\texttt {00200000},0,0,0,0) \xrightarrow {\texttt {6R}} (0,\texttt {08200000},0,\texttt {08000200},0,\texttt {00200200}) \end{aligned}$$

Similarly, we can extend to get the following 7-round differential trail and its differential probability is \( 2^{-266.2} \).

$$\begin{aligned}{} & {} (\texttt {00200000},\texttt {00200000},0,0,0,0) \xrightarrow {\texttt {7R}} (0,\texttt {00000040},\texttt {04100040},\texttt {00000040},\\ {}{} & {} \quad 0,\texttt {00010000}) \end{aligned}$$
Fig. 4
figure 4

6-Round related-key differential trail and full-round key recovery attack

6 Full-round related-key differential attack

Utilizing the above differential characteristic, together with the simple linear key schedule of SPEEDY, we can construct an efficient 6-round related-key differential trail and mount a key recovery attack on full-round SPEEDY-7-192.

The key schedule of SPEEDY receives a 192-bit master key and initializes it to the state of the round key (\(k_0\)). Then, it applies a simple bit permutation \(\texttt {PB}\) to compute the next round key. Contents of the bit-permutation \(\texttt {PB}\) are listed in Table 10 in Appendix A.

6.1 Related-key differential trail

Based on the simple bit-permutation of key schedule, if we choose one active bit in the master key, then we can get a pair of related-key with only one active bit in each round. Moreover, all the subkey differences can be directly determined with probability one.

Therefore, by carefully choosing the position of active bit in master key, we can construct efficient related-key differential by trying best to cancel the input difference and control the difference propagation for each round. We combine the subkey difference with the differential trail search method in Sect. 5 and traverse all the possible positions of one active bit . Finally, we have obtained the following 6-round related-key differential trail and its probability is about \( 2^{-179.2} \). Its difference propagation is illustrated in Fig.4 in detail.

$$\begin{aligned} {\begin{array}{*{20}c} {\Delta X^0 = (\texttt {04000000},0,0,0,0,0)} \\ {\Delta K^0 = (\texttt {04000000},0,0,0,0,0)} \\ \end{array}} \xrightarrow {\texttt {6R}} \Delta X^6 = (0,0,0,0,0,\texttt {97440108}) \end{aligned}$$

6.2 Full-round key recovery attack

We can extend the above 6-round related-key differential trail one round forwards to reach the full 7-round and recover the key used in the last round.

Table 8 Time complexity evaluation of the attack procedure

Note that the output difference \(\Delta X^6\) has only 9 active bits in the 5-th column and all the other bits are zero. In the last round, after the first subkey addition \(\texttt {Ak}_6\), there will be another active bit in the 0-th column. Then after the operations of \(\texttt {SB} \circ \texttt {SC} \circ \texttt {SB}\), all the rows of \(\Delta C\) may be active. In order to enhance the filter of ciphertext pairs, we restrict the output differences of the first layer of SB in Rows 23 and 28 to be 0x10 and 0x04 respectively. Then the differential probability is reduced to \( 2^{-185.2} \), and the possible difference propagation of the last round is illustrated in Fig.4. We denote the empty box as ’0’ difference, the black box as ’1’ difference, the grey box as ’?’ difference, and the red box as key difference respectively.

The attack procedure can be described briefly as follows. First, choose randomly \( 2^{185.2} \) pairs of plaintexts with \(\Delta P = (\texttt {04000000},0,0,0,0,0)\), and generate the corresponding ciphertext pairs under a pair of related-key with \(\Delta K = (\texttt {04000000},0,0,0,0,0)\). Second, filter the ciphertext pairs with zero-difference conditions. There are seven inactive rows which means a filter probability of \(2^{-42}\). Hence, after this step, there remains about \( 2^{143.2} \) ciphertext pairs. Third, guess the value of subkey \(k_7\) in Row 22 and 25 respectively, and partially decrypt through the second layer of SB to check if it satisfies the difference condition. The time complexity of this step is \( 2^{143.2} \cdot 2^{6} + 2^{143.2} \cdot 2^{-6} \cdot 2^{12}= 2^{150.2}\), and after this step there remains about \( 2^{131.2}\) pairs. Last, guess the value of subkey \(k_7\) row by row (from Row 17 to Row 0 and then Row 31 to Row 27) and partially decrypt through the second layer of SB to check if it satisfies the difference condition. Moreover, in some steps the filter condition of first layer of SB can bring another data filter probability of \(2^{-6}\). The data filter probability and time complexity of each step are listed in Table 8. The overall time complexity is about \(2^{19.73} \cdot 2^{143.2} = 2^{162.93}\), and there remains about \( 2^{-6.8} \) pairs. Therefore, if there still remains a pair after all the filter steps, the corresponding subkey guess is correct.

To sum up, the data and time complexities of related-key differential attack on full-round SPEEDY are \(2^{185.2} \times 2 = 2^{186.2}\) chosen-plaintexts and \(2^{162.93} \div 7 \approx 2^{160.13}\) encryptions. We have successfully recovered 150-bit keys, and the remaining 42-bit keys can be recovered simply by exhaustive search with negligible complexity.

7 Conclusion

In this paper, we present concrete differential analysis of SPEEDY based on some new observations on the branch number. First of all, we propose (higher-order) partition branch number, which can describe the minimum number of active S-boxes for SPEEDY more accurately. Then by utilizing an efficient algorithm to compute the value of \(pbn_r\), we can obtain more accurate results about the minimum number of active S-boxes for 2–7 rounds SPEEDY. It is noteworthy that these results are significantly higher than the maximum expected differential probabilities estimated by the designers. Based on this, we search for optimal differential characteristics of SPEEDY while considering the difference distribution table of the 6-bit S-box. We present examples of optimal differential characteristics for 2–7 rounds SPEEDY, whose probabilities are \(2^{-46.2}\), \(2^{-76.72}\), \(2^{-129.2}\), \(2^{-170.0}\), \(2^{-216.0}\) and \(2^{-266.2.0}\) respectively.

Furthermore, by utilizing the simple bit-permutation key schedule of SPEEDY, we can extend the differential trail search method and construct an efficient 6-round related-key differential trail with probability \(2^{-179.2}\). Based on it, we can present related-key differential attack on full round SPEEDY-7-192 with data complexity of \(2^{186.2}\) chosen-plaintexts and time complexity of \(2^{160.13}\) encryptions. Moreover, our work will provide new insights in evaluating the security against differential attack more accurately for block ciphers employing SPS-type round function with bit-wise rotation as the linear layer.