Keywords

1 Introduction

Primitive designers are required to evaluate security against various cryptanalyses, particularly differential cryptanalysis [1, 2] and linear cryptanalysis [3, 4]. For a block cipher with a substitution-permutation network (SPN), a popular approach is to design an S-box that has a small maximum differential probability (MDP) and a linear layer that ensures a large number of differentially active S-boxes as represented by the wide trail strategy [5]. Let minAS be the minimum number of differentially active S-boxes. Then, the probability to satisfy any differential characteristic is upper-bounded by \(( MDP )^{minAS}\).

An advantage of this method is simplicity. It enables to search for an S-box and a linear layer independently. A disadvantage is that an effect derived by an interplay between the S-box and the linear layer can be overlooked. Our goal is to develop an S-box-designing method that takes into account such an interplay.

Bitslice-Friendly Ciphers and SbPN Ciphers. The interplay between the S-box and the linear layers has already been considered for ciphers in which one block computation can be processed efficiently in a bitslice manner such as Serpent [6], Noekeon [7], LS-designs [8], and Rectangle [9]. In those designs, an internal state is divided into several slices, then a linear function is applied for each slice and an S-box is applied over multiple slices. To increase security, the following (4-bit) S-box criteria were introduced.

  1. A)

    S-box differential branch number of 3

  2. B)

    MDP of \(2^{-2}\) (best possible for 4-bit S-box)

The differential branch number of an S-box S is the minimum value of \(hw(x\oplus y) + hw(S(x)\oplus S(y))\) for any distinct x and y, where hw is the Hamming weight.

PRESENT [10] adopts the SPN structure where the linear layer is a simple bit-permutation. We denote ciphers with this structure by “SbPN ciphers.” Both criteria A and B are important for SbPN ciphers. In fact, the PRESENT S-box was designed to satisfy both. After a while, two SbPN ciphers GIFT [11] and TRIFLE-BC [12] refined those criteria. In TRIFLE-BC, criterion A was relaxed: an input difference of weight 1 can propagate to an output difference of weight 1 if its probability is \(2^{-3}\).Footnote 1 GIFT relaxed both criteria A and B. First, MDP of the S-box is \(2^{-1.4}\), while GIFT ensures \(hw(x\oplus y) + hw(S(x)\oplus S(y)) \ge 4\) for any differential transitions with probability \(2^{-2}\) or higher. Second, GIFT introduced a new criterion called BOGI. In short, it classifies the bit positions into good or bad depending on whether a single-bit difference on this bit can propagate with probability \(2^{-2}\) or not. Then, the bit-permutation is chosen to map bad outputs (BO) to good inputs (GI).

To sum up, the branch number of the S-box is important for bitslice-friendly and SbPN ciphers to increase the number of active S-boxes. GIFT and TRIFLE-BC considered the interplay between the S-box and the linear layer, particularly by forcing a small probability for differential transitions with a small weight.

Challenges for Other Constructions. To the best of our knowledge, other constructions do not use S-boxes that are considered the interplay with the linear layer. One reason is that important S-box criteria, corresponding to the branch number for the bitslice-friendly and the SbPN ciphers, are non-trivial for other linear layers. In fact, for AES, it is difficult to exploit the property of the linear layer to give a more accurate evaluation than \(( MDP )^{minAS}\).

1.1 Our Contributions

In this paper, we tackle the above-mentioned challengeFootnote 2. We first identify input and output differences of the linear layer that result in a small number of active S-boxes. We then design an S-box so that the identified differences for the linear layer cannot be satisfied with a high probability. In such a manner, we reveal a property of an S-box that is imposed by a linear layer. For convenience, we call the property “attention-required property (ARP)” in this paper.

The goal of this paper is to design S-boxes by taking into account the ARP of the linear layer for several ciphers other than the bitslice-friendly and the SbPN ciphers. We search for S-boxes for several existing schemes and show that the resistance against differential cryptanalysis can be improved by replacing the existing S-box with new ones.

Fig. 1.
figure 1

Results of Lilliput

Extended Generalized Feistel Network. We start our discussion from an extended generalized Feistel network (EGFN) proposed at SAC2013 [15] because this is a typical example where the interplay between the S-box and linear layers can significantly increase the resistance against differential cryptanalysis.

We observe that the differential characteristic probability of Lilliput tends to be high when the S-box has identity-differential transition, i.e. an input difference \(\varDelta \) is mapped to the same difference \(\varDelta \) via active S-boxes. In fact, Lilliput’s S-box maps the difference 0x6 to 0x6 with probability \(2^{-2}\) and this appears many times in the best differential characteristic. From this observation, we define that the ARP of Lilliput is identity-differential transitions with MDP. Then, we modify Lilliput’s S-box so that any identity-differential transition occurs with probability \(2^{-3}\). To confirm the impact of our modification, we evaluate the maximum differential characteristic probability with the new S-box. We show that the number of rounds to guarantee \(2^{-64}\) is reduced by 1 round as shown in Fig. 1. Note that the new S-box is obtained only by swapping the LSB and the second LSB of the original S-box, thus implementation overhead is negligible.

AES-bMC Ciphers. Our next target is AES-like ciphers in which MixColumns applies the multiplication by a binary matrix. We call such ciphers “AES-bMC ciphers.” AES-bMC ciphers are popular for lightweight cryptography. Midori [16], SKINNY  [17], MANTIS [17], and CRAFT [18] are examples of AES-bMC Ciphers.

Fig. 2.
figure 2

Best differential characteristic for 5-round Midori64. Numbers in gray cells represent the difference in the hexadecimal, and there are no differences in white cells.

We observe that the ARP of the AES-bMC ciphers is completely different from that of the EGFN. As an example, Fig. 2 shows the best differential characteristic for 5-round Midori64. The 1st round uses differential transitions \(\mathtt {0x1} \rightarrow \mathtt {0x2}\) with MDP, and the 2nd round uses differential transitions \(\mathtt {0x2} \rightarrow \mathtt {0x1}\). We call such a chain of differential transitions “high-probability chain ”. We define the ARP of AES-bMC ciphers as the maximum length of the high-probability chain. Then, we observe that the differential characteristic probability can be lowered by choosing an S-box that only allows short high-probability chains.

We then need to find S-boxes that satisfy the ARP. Regarding 4-bit S-boxes, we can exhaustively check all S-boxes belonging to the 16 classes available in the previous works [19, 20]. As a result, we find S-boxes where the maximum length of the high-probability chain is the shortest of S-boxes belonging to the 16 classes.

Fig. 3.
figure 3

Results of Midori64 and SKINNY64

To demonstrate the effect, we replaced the original S-boxes of Midori64 and SKINNY64 with our S-box without modifying their diffusion parts. Figure 3 shows comparisons of their maximum differential characteristic probabilities (MDCPs). Our S-box reduces the number of rounds to reach \(2^{-64}\) by one for Midori64.

Regarding 8-bit S-boxes, exhaustive check is impossible and we need to reduce the search space by introducing some heuristic. In the appendix, we show a SKINNY-like S-box whose high-probability chain is at most 2. The comparison of the MDCP with the new S-box and SKINNY128 is shown in the full version of this paper.

2 Preliminaries

2.1 Basic Knowledge About S-Boxes

The differential probability between input difference \(\varDelta _{in}\) and output difference \(\varDelta _{out}\) of an S-box S is expressed as

$$\begin{aligned} DP_S[\varDelta _{in}, \varDelta _{out}] = \frac{ \# \{ x \in \{0,1\}^n | S(x) \oplus S(x \oplus \varDelta _{in}) = \varDelta _{out} \} }{2^n}. \end{aligned}$$

Let MDP be the maximum probability of \(DP_S\) for all possible pairs of \((\varDelta _{in},\varDelta _{out})\) excluding \((\varDelta _{in}, \varDelta _{out}) = (0,0)\). The smaller the MDP, the more secure an S-box is against differential cryptanalysis. Therefore, many cryptographers have studied the design methods of such S-boxes [21, 22]. When n is odd, S-boxes whose MDP is \(2^{-n+1}\) can be designed [21]. However, the problem is generally open when n is even. Only for \(n=6\), an S-box whose MDP is \(2^{-5}\) is known [22].

The affine-equivalent class is useful to understand the property of S-boxes.

Definition 1

Let \(M_{in}\) and \(M_{out}\) be two invertible matrices and \(c_{in}\) and \(c_{out}\) be two vectors. Then, the S-box \(S'\) defined by two affine transformations as

$$\begin{aligned} S'(x) = M_{out} \times S(M_{in} \times x \oplus c_{in}) \oplus c_{out} \end{aligned}$$

is affine equivalent with the S-box S.

\(c_{in}\) and \(c_{out}\) do not affect the differential probability. Moreover, \(DP_{S'}[\varDelta _{in}, \varDelta _{out}]\) is equal to \(DP_S[M_{in} \times \varDelta _{in}, M_{out}^{-1} \times \varDelta _{out}]\). Therefore, the affine-equivalent S-box preserves not only the MDP but also the frequency that each probability appears because \(M_{in}\) and \(M_{out}\) are invertible.

Cryptographers’ main interest is 8-bit and 4-bit S-boxes, where 8-bit S-boxes are widely applied to various block ciphers such as AES [23] and 4-bit S-boxes are applied to many lightweight block ciphers. It was already proven by exhaustive analysis that the best possible 4-bit S-box has an MDP of \(2^{-2}\) [19, 24]. The number of 4-bit S-boxes is \(2^{4} ! \approx 2^{44.25}\), but the number of affine-inequivalent S-boxes is 302 [20]. Among them, 16 classes also achieve the highest security against linear cryptanalysis [19, 20].

A differential branch number of S-boxes is also one of the well-known design criteria. It is commonly used in bitslice-friendly ciphers such as Serpent [6] or SbPN ciphers such as PRESENT [10]. Note that the branch number of S-boxes is not useful for block ciphers whose linear layer consists of byte-wise XOR such as SKINNY or Midori, which are the main focus in our paper, because it never divides the output of one S-box into each bit.

2.2 Basic Knowledge About Linear Layers

Some ciphers adopt a multiplication with a matrix to diffuse outputs of several S-boxes. Let \(M \in (\mathbb {F}_{2^n})^{m \times m}\) be a diffusion matrix, and m outputs of n-bit S-boxes denoted as \(x \in (\mathbb {F}_{2^n})^m\) are diffused as \(y = M \times x\). We say that the branch number of M is d when the sum of input and output non-zero n-bit differences is at least d for any non-zero \(\varDelta x\).

In this paper, we focus on the linear layer consisting of byte-wise XOR, i.e., entries of M are 0 or 1. Such a matrix is often used to design lightweight block ciphers [16,17,18]. For example, Midori diffuses four bytes as

$$\begin{aligned} \begin{pmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 0 &{} 1 &{} 1 &{} 1 \\ 1 &{} 0 &{} 1 &{} 1 \\ 1 &{} 1 &{} 0 &{} 1 \\ 1 &{} 1 &{} 1 &{} 0 \end{pmatrix} \times \begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} x_1 \oplus x_2 \oplus x_3 \\ x_0 \oplus x_2 \oplus x_3 \\ x_0 \oplus x_1 \oplus x_3 \\ x_0 \oplus x_1 \oplus x_2 \end{pmatrix}, \end{aligned}$$

and the branch number of this matrix is 4. We call such a matrix binary matrix.

2.3 Differential Characteristics and Their Probabilities

We often evaluate differential characteristics and their probabilities. Block cipher \(E_k\) is composed of an iteration of round functions as \(E_k := F_r \circ \cdots \circ F_1\). The probability of a differential characteristic denoted as \((\varDelta _0, \varDelta _1, \ldots , \varDelta _r)\) is defined as \(\prod _{i=1}^{r} DP_{F_i}[\varDelta _{i-1}, \varDelta _i]\). Designers of block ciphers must guarantee that the maximum differential characteristic probability (MDCP) will be sufficiently low so that attackers cannot distinguish the cipher from an ideal one.

The so-called “differential” is used to consider multiple differential characteristics with fixed \(\varDelta _0\) and \(\varDelta _r\). If there are multiple differential characteristics from \(\varDelta _0\) to \(\varDelta _r\), the total probability is the sum of them. We call this the differential effect to avoid confusion.

2.4 How to Search for the Best Differential Characteristic

There are many known techniques to search for the differential characteristic with the MDCP, i.e., the best differential characteristic. One of the most famous methods is the so-called branch-and-bound algorithm [4], which was used to show the best differential and linear characteristics on DES.

After the proposal of the MILP-based method by Mouha et al. [25], the recent trend is turning to the method aided by solvers such as MILP, SAT, or CP. The method by Mouha et al. can show the lower bound of the number of active S-boxes but it does not always give the tight bound. Namely, found characteristics are not always used for attacks directly. Sun et al. [26, 27] proposed a rigorous algorithm to search for the differential characteristics with consideration of differential transitions of active S-boxes. It enables us to show the tight bound of the minimum number of active S-boxes. On the other hand, the differential characteristic with the minimum number of active S-boxes does not always yield the best differential characteristic. Evaluating differential characteristic probability is also important. They also showed how to evaluate the differential characteristic probability rather than the number of active S-boxes, but it was not efficient. A more practical algorithm was shown in [28]. They decomposed a differential distribution table (DDT) into several tables for every probability and constraints are generated for every table. This method allows us to evaluate the MDCP for given ciphers.

The focus of our paper is to design S-boxes lowering differential characteristic probability. After designing such S-boxes, to demonstrate the effect, we replace the original S-boxes of Lilliput, SKINNY, and Midori64 with ours, and re-evaluate the MDCP by using the method shown in [28].

3 Designing S-Boxes Suited for EGFN

The upper bound \(( MDP )^{minAS}\) can be tight only when the probability of the differential transition is MDP in all active S-boxes. However, the choice of differential transitions with MDP is limited.

Therefore, when differential characteristics with minAS (and close to minAS) always involve low-probability transitions in active S-boxes as many as possible, the MDCP can be lower than \(( MDP )^{minAS}\).

We start concrete discussion from the extended generalized Feistel network (EGFN). It is an easy example to demonstrate why considering high- and low- probability transitions are significant.

3.1 Extended Generalized Feistel Network

Fig. 4.
figure 4

Round function of Lilliput

An EGFN, which was proposed in [15], is an extension of the block-shuffle generalized Feistel network [29]. It has an additional linear layer to diffuse active bytes more quickly. Later, Lilliput was proposed as a concrete substantiation of the EGFN in [30]. Figure 4 shows the round function, where there is the additional linear layer between the S-box layer and the permutation layer.

Table 1. The maximum differential characteristic probability of Lilliput. The experiment continued until the number of rounds when the probability is lower than \(2^{-64}\).

Our goal is to design an S-box suited to this network, and we show the original S-box of Lilliput as a reference.

$$\begin{aligned} S = \{ \mathtt 0x4, 0x8, 0x7, 0x1, 0x9, 0x3, 0x2, 0xE, 0x0, 0xB, 0x6, 0xF, 0xA, 0x5, 0xD, 0xC \}. \end{aligned}$$

Table 1 summarizes the minimum number of active S-boxes and MDCP when the original S-box is used. The minimum number of active S-boxes has been discussed in [31], but the evaluation of the MDCP has not been reported yet. We evaluated the MDCP by using a SAT/SMT solver. As shown in Table 1, differential characteristics with \( minAS \) do not lead to the best differential characteristic in 10 rounds and later.

3.2 Identity-Differential Transitions

To reveal the attention-required property (ARP) of this network, we first analyze the condition that differential transitions lead to the best differential characteristic. A notable fact derived from Table 1 is that the minus log base 2 of the probability of the best differential characteristics is always double the number of active S-boxes in the corresponding characteristics. In other words, the best differential characteristics use only transitions with \(2^{-2}\) in all active S-boxes.

Fig. 5.
figure 5

DDT of Lilliput S-box (left) and the best 13-round characteristic (right)

Fig. 6.
figure 6

DDT of modified S-box (left) and the best 12-round characteristic (right)

Figure 5 shows the DDT of the S-box and the best 13-round differential characteristic. In all active S-boxes, the input and output differences are always \(\mathtt 0x6\), and all XORes of two active differences are inactive. Here, we observe an interesting property, particularly in the EGFN. Unlike SP networks, the input difference of S-boxes transits to the next round directly. To cancel two active differences by XORing in the additional linear layer, the input difference should transit to the same difference via S-boxes. Thus, we expect that ARP of this network is identity-differential transitions with MDP, such as \(\mathtt{0x6} \rightarrow \mathtt{0x6}\) with probability \(2^{-2}\). Assuming that a replaced S-box has \(\mathtt{0x1} \rightarrow \mathtt{0x1}\) as such a transition instead of \(\mathtt{0x6} \rightarrow \mathtt{0x6}\), there always exist characteristics whose all non-zero differences are replaced from \(\mathtt{0x6}\) to \(\mathtt{0x1}\). Therefore, avoiding identity-differential transitions with MDP is at least necessary to lower the 13-round differential characteristic probability of Lilliput from the original.

3.3 Replacing S-Box

We now modify the specification of the 4-bit S-box to avoid identity-differential transitions with MDP. To avoid degradation of implementation efficiency, we search for S-boxes belonging to the same bit-permutation equivalent class as the original S-box. The following modified S-box

$$\begin{aligned} S' = \{ \mathtt 0x4, 0x8, 0x7, 0x2, 0xA, 0x3, 0x1, 0xD, 0x0, 0xB, 0x5, 0xF, 0x9, 0x6, 0xE, 0xC \}, \end{aligned}$$

in which the LSB and the second LSB of the output of the original S-box are swapped, successfully avoids identity-differential transitions with MDP (see Fig. 6).

We finally use a solver-aided method to evaluate the MDCP of Lilliput using the modified S-box to confirm the validity of our replacement. The row labeled “modified” in Table 1 shows the MDCP when the modified S-box is used. The best 12-round differential characteristic is shown in Fig. 6. It is clear that this best differential characteristic is more complicated than the original one in which all non-zero transitions are \(\mathtt 0x6 \rightarrow \mathtt 0x6\).

We finally stress the impact on this replacement. The probability of the 12-round best differential characteristic is \(2^{-61}\), which is almost the same as the 13-round MDCP in the original. Besides, the new 13-round MDCP is lower than \(2^{-64}\). Roughly, the gain of about 1 round can be achieved only by swapping the LSB and the 2nd LSB in the output of the S-box.

4 Designing S-Boxes Suited for AES-bMC Ciphers

Unlike the case of EGFN, avoiding the identity high-probability differential transitions is not significant for the AES-bMC ciphers. We introduce a new S-box design criterion called high-probability chain as the ARP for AES-bMC ciphers. We show that using S-boxes with shorter high-probability chain length plays a critical role to lower the MDCP.

4.1 Definition of AES-bMC Ciphers

Intuitively, AES-bMC ciphers are AES-like ciphers that apply a binary matrix during the MixColumns operation. The state of AES-bMC ciphers can be composed of several cells of c bits, typically \(c=4\) or \(c=8\), and cells are arranged in an \(n \times m\) 2-dimensional array. For example, \(n=m=4\) and \(c=8\) for SKINNY128 and \(n=m=4\) and \(c=4\) for Midori64. In each round, the AES-bMC ciphers operate the following computations to update the state.

  • SubCell(SB). A c-bit to c-bit S-box is applied to each cell in the state.

  • ShuffleCell(SR). Cell positions in each row is permuted.

  • MixColumn(MC). Each column is multiplied with a binary matrix.

  • AddRoundKey(AK). A subkey of \(n\times m\times c\) bits is XORed to the state.

We regard that all subkeys are different and distributed uniformly at random.

4.2 High-Probability Chain

The strategy to reveal ARP is the same as the case for EGFN. We focus on high-probability transitions of S-boxes and design S-boxes such that characteristics with minAS (and close to minAS) become invalid when only the high-probability transitions are used.

Fig. 7.
figure 7

DDT of S-box of Midori64 and its high-probability chain

The number of active S-boxes of AES-bMC ciphers can be small when output differences from two active S-boxes cancel each other by the XOR in the MC operation. Considering that cell-positions move to different columns by the SR operation, the number of active S-boxes tends to be small when all the active S-boxes in every round propagate from the same input difference to the same output difference, as shown in Fig. 2.

It is impossible to avoid that all the active S-boxes in a certain single round propagate from the same input difference to the same output difference with MDP, say \(\varDelta A \rightarrow \varDelta B\) in round i. An efficient differential characteristic appears when the same event also occurs in subsequent rounds, say \(\varDelta B \rightarrow \varDelta C\) with MDP in round \(i+1\), \(\varDelta C \rightarrow \varDelta D\) with MDP in round \(i+2\), and so on. In particular, if such a chain makes a loop after a certain number of rounds, say \(\varDelta A \rightarrow \varDelta B \rightarrow \varDelta C \rightarrow \varDelta D \rightarrow \cdots \rightarrow \varDelta A\), a high-probability differential characteristic may occur for a large (or infinite) number of rounds. We call such a chain of the differential transitions with MDP high-probability chain. Then, ARP of the multiplication by a binary matrix in AES-bMC ciphers can be defined as the maximum length of the high-probability chain.

Definition 2 (High-probability chain length)

We say that an S-box has a high-probability chain of length L when there exists a set of differences \((\varDelta _0, \varDelta _{1}, \ldots , \varDelta _{L})\) such that \(\varDelta _i \rightarrow \varDelta _{i+1}\) is a high-probability transition on the S-box for all \(i \in \{0,1,\ldots ,L-1\}\). We also define “the maximum high-probability chain length” as the longest high-probability chain length among all high-probability chains.

If there exists a high-probability chain such that \(\varDelta _0 = \varDelta _L\), we say that this chain has the iterative property, or the chain length is infinite. Figure 7 shows the DDT of Midori64’s S-box and its high-probability chain. Many high-probability chains have the iterative property, i.e., the maximum high-probability chain length of Midori64’s S-box is infinite.

Fig. 8.
figure 8

5-round differential characteristic with minAS for Midori64 with S-box such that the maximum high-probability chain length is 2

To demonstrate the effect of the high-probability chain, we recall the 5-round differential characteristic shown in Fig. 2 with 23 active S-boxes. In this characteristic, all the active cells in a state have the same difference, i.e. 0x1 at the input and 0x2 after the first SB operation, and so on. Differences are preserved before and after the MC operation because of the property of the binary matrix. With Midori’s S-box, all the active S-boxes can be bypassed with probability \(2^{-2}\) which results in probability \(2^{-2\times 23}=2^{-46}\) for the entire characteristic. This is because the maximum high-probability chain length of Midori64’s S-box is infinite. In contrast, suppose that we can adopt an S-box whose maximum high-probability chain length is 2. Then, at least one out of three rounds require \(2^{-3}\) to be bypassed. For example, in Fig. 8, three active S-boxes in the 3rd round require \(2^{-3}\), thus the entire probability decreases to \(2^{-23 \times 2 - 3}=2^{-49}\). Note that this impact is as big as increasing the number of active S-boxes by 1.

We assume that differences are preserved before and after the MC operation in the best differential characteristics as shown in this example. Of course, it does not always hold. We emphasize that the high-probability chain is still meaningful enough even if differences are not preserved before and after all the MC operations. This is because we can still expect that differences are preserved before and after most of the MC operations.

4.3 Design of 4-Bit S-Boxes with Shortest High-Probability Chains

We discuss how to design S-boxes with the shortest maximum high-probability chain length. We focus on 16 classes of 4-bit S-boxes. We first introduce useful properties of S-boxes related to high-probability chain lengths.

Table 2. Maximum high-probability chain lengths of 20160 S-boxes belonging to each affine-equivalent class

Property 1

Two S-boxes S and \(S'\) with the relation \(S'(x) = M^{-1} \times S(M \times x \oplus c_{in}) \oplus c_{out}\) have the same high-probability chain length.

Proof

Let us consider the DDTs of S and \(S'\). Since two constants \(c_{in}\) and \(c_{out}\) do not affect the differential distribution table (DDT), they are independent of the high-probability chain. Next, the sequence of operations corresponding to a high-probability chain can be written as

$$\begin{aligned}&\cdots \rightarrow S \rightarrow M^{-1} \rightarrow M \rightarrow S \rightarrow M^{-1} \rightarrow M \rightarrow \cdots \end{aligned}$$

Since \(M^{-1}\) and M are always cancelled out, these two S-boxes clearly have the same high-probability chain length. Note that assuming the high-probability chain length of S is 1, the high-probability chain length of \(S'\) is also 1. This is because, if it is 2, the high-probability chain length of S is 2.    \(\square \)

In other words, Property 1 implies that S-boxes belonging to the same affine equivalent class have the same high-probability chain length when \(M_{out} = M_{in}^{-1}\).

Property 2

If for any non-zero \(\delta _{in}\) (resp. \(\delta _{out}\)), there exist \(\delta _{out}\) (resp. \(\delta _{in}\)) such that the transition \(\delta _{in} \rightarrow \delta _{out}\) is high-probability, the high-probability chain length is always infinite.

Proof

Let us focus on one high-probability transition \(\delta _{in} \rightarrow \delta _{out}\). When Property 2 holds, we always continue high-probability transitions from \(\delta _{out}\) (resp, to \(\delta _{in}\)). Therefore, the high-probability chain length is infinite.    \(\square \)

We call that high-probability transitions are uniformly distributed in the DDT when the DDT fulfills Property 2.

Property 3

If the S-box is an involution, i.e., \(S(S(x))=x\) for any x, the high-probability chain length is always infinite.

Proof

In any involution S-box, when an input difference \(\varDelta _{in}\) propagates to \(\varDelta _{out}\) with high probability, \(\varDelta _{out}\) also propagates to \(\varDelta _{in}\) with high probability. Therefore, the high-probability chain length is always infinite.    \(\square \)

Table 3. Forty S-boxes with the shortest high-probability chain lengths
Fig. 9.
figure 9

DDT of S-box described in Example 1 and its high-probability chain

Property 1 implies that the exhaustive evaluation is not difficult. The number of invertible \(4 \times 4\) matrices is 20160, and the search space is only \(16 \times 20160\) for the exhaustive evaluation of 16 classes. We evaluated all S-boxes belonging to 16 classes, and each maximum high-probability chain length is summarized in Table 2. Since high-probability transitions are uniformly distributed in \(G_3\) and \(G_5\), there is no S-box whose high-probability chain length is finite because of Property 2. As a result, the shortest maximum high-probability chain length is 2, and we can find 10, 10, and 20 S-boxes in \(G_7\), \(G_{11}\), and \(G_{12}\), respectively. Table 3 summarizes these 40 S-boxes. Note that all S-boxes with the form \(M^{-1} \times S(M \times x \oplus c_{in}) \oplus c_{out}\) have the shortest maximum high-probability chain length 2 when S is chosen from 40 S-boxes listed in Table 3.

Example 1 (Example of S-boxes with maximum high-probability chain length 2)

The following S-box

$$ \{ \mathtt 0xA, 0x4, 0xD, 0x5, 0x7, 0x3, 0xE, 0xF, 0x0, 0x6, 0x8, 0x2, 0x1, 0xC, 0x9, 0xB \} $$

belongs to class \(G_7\), and the maximum high-probability chain length is 2. Figure 9 shows the DDT and its high-probability chain. Note that this S-box is generated as \(M^{-1} \times S(M \times x \oplus \mathtt{0x5}) \oplus \mathtt{0x3}\), where S-box S and (bijective) linear transformation M are specified as

$$\begin{aligned} S&= \{ \mathtt 0x0, 0x4, 0x2, 0xB, 0xA, 0xC, 0x9, 0x8, 0x5, 0xF, 0xD, 0x3, 0x7, 0x1, 0x6, 0xE \}, \\ M&= \{ \mathtt 0x0, 0x5, 0xE, 0xB, 0xC, 0x9, 0x2, 0x7, 0xA, 0xF, 0x4, 0x1, 0x6, 0x3, 0x8, 0xD \}. \end{aligned}$$

4.4 Case Study with Midori64

Specification of Midori 64. Midori64 is an SPN block cipher. The block size is 64 bits and the state is denoted as a \(4 \times 4\) nibble array as follows.

$$\begin{aligned} \begin{pmatrix} s_0 &{} s_4 &{} s_8 &{} s_{12} \\ s_1 &{} s_5 &{} s_9 &{} s_{13} \\ s_2 &{} s_6 &{} s_{10} &{} s_{14} \\ s_3 &{} s_7 &{} s_{11} &{} s_{15} \end{pmatrix} \end{aligned}$$

The round function applies the following four operations to the state.

  • SubCell: A 4-bit to 4-bit S-box is applied to 16 nibbles.

  • ShuffleCell: Each cell of the state is permuted as follows:

    \((s_0, s_1, \ldots , s_{15}) \leftarrow (s_0, s_{10}, s_5, s_{15}, s_{14}, s_4, s_{11}, s_1, s_9, s_3, s_{12}, s_6, s_7, s_{13}, s_2, s_8).\)

  • MixColumn: Four nibbles in each column are multiplied by the binary matrix, where all entries except for the first diagonal take 1.

  • KeyAdd: A round key is XORed to some of the nibbles of the state. We omit their details because these operations do not have any impact on MDCP.

Tight MDCP on Original Midori 64. The designers of Midori evaluated the minimum number of active S-boxes, and it reaches 35 after 7 rounds. Then, the upper bound of the MDCP given by the minimum number of active S-boxes is lower than \(2^{-64}\) because MDP of the S-box is \(2^{-2}\). The tightness of the upper bound was reported in [32].

Replacing S-Box. We replace the original S-box of Midori64 with that described in Example 1 and evaluate the tight MDCP.

Table 4. The maximum differential characteristic probabilities of Midori64

Table 4 shows the detail of the result. To evaluate the MDCP, we used the SAT/SMT solver STP [33]. Our modeling is based on the method shown by CryptoSMT [34], but the modeling for the S-box is replaced with that for our designed S-box. As described above, the upper bounds of the MDCP given by the minimum number of active S-boxes are always tight in the original S-box of Midori64. However, when the S-box shown in Example 1 is used, the MDCP becomes lower than the original from three rounds because the high-probability chain length is at most 2. Moreover, we can see significant gain in 6 rounds because the chain is disconnected in two out of six rounds. While the original 6-round Midori64 is insufficient to prove that the MDCP is lower than \(2^{-64}\), 6 rounds are enough to prove it when our designed S-box is adopted.Footnote 3

4.5 Case Study with SKINNY

We applied the same 4-bit S-box with SKINNY64, and its result is summarized in Fig. 3. To apply the same idea to SKINNY128, we need to design an 8-bit S-box whose high-probability chain length is short. We also investigate such S-boxes and applied to SKINNY128. The result is shown in the full version of this paper in detail.

4.6 Remark for Implementation Performance

Unlike the case of Lilliput, our S-box with the shortest high-probability chain does not satisfy the original criteria of Midori and SKINNY regarding the implementation performance. Therefore, we do not claim that our S-box is better than the original, though our S-box enhances security against differential cryptanalysis. Our goal of those demonstrations is to show the proof-of-concept about the possibility of enhancing security against differential cryptanalysis by considering the interplay between S-boxes and the linear layer even in AES-bMC ciphers. We believe that the approach we present should be taken into consideration for future designs.

5 Discussion and Conclusion

5.1 Evaluation of Differential Effect

Our S-boxes lower the MDCP, but a more accurate probability should be estimated by considering the differential effect. For example, even if the MDCP is lower, the differential probability including the differential effect could be higher if the number of characteristics with the MDCP is relatively larger. Unfortunately, it is difficult to predict the differential effect in advance.

Table 5. Differential effect on 6-round Midori64 with our modified S-box

As an exercise to understand the differential effect of our S-box, we compared the original S-box and our one by evaluating the differential effect on the 6-round Midori64. The differential effect of the original Midori64 was discussed in [32]. The differential effect is extremely strong, which increases the probability from \(2^{-60}\) to \(2^{-48.36}\). Thus, the gain is \(2^{11.64} = 2^{-48.36}/2^{-60}\). Similarly to [32], we evaluated the differential effect of 6-round Midori64 using our S-box. The best differential characteristic of the original Midori64 no longer leads to the highest probability when our S-box is used. Instead, the following input and output differences

$$\begin{aligned} \begin{pmatrix} \mathtt {E} &{} \mathtt {0} &{} \mathtt {0} &{} \mathtt {0} \\ \mathtt {E} &{} \mathtt {0} &{} \mathtt {0} &{} \mathtt {E} \\ \mathtt {E} &{} \mathtt {E} &{} \mathtt {0} &{} \mathtt {0} \\ \mathtt {E} &{} \mathtt {0} &{} \mathtt {E} &{} \mathtt {0} \end{pmatrix} \xrightarrow {\mathrm{6~rounds}} \begin{pmatrix} \mathtt {0} &{} \mathtt {3} &{} \mathtt {3} &{} \mathtt {3} \\ \mathtt {0} &{} \mathtt {3} &{} \mathtt {0} &{} \mathtt {3} \\ \mathtt {0} &{} \mathtt {3} &{} \mathtt {3} &{} \mathtt {0} \\ \mathtt {0} &{} \mathtt {0} &{} \mathtt {3} &{} \mathtt {3} \end{pmatrix} \end{aligned}$$

include the best differential characteristic with probability \(2^{-68}\), where the last ShuffleCell and MixColumn are removed in the last round. Table 5 summarizes the differential effect of this input/output differences. While the MDCP is \(2^{-68}\), the probability increases to about \(2^{-61}\) due to the differential effect. Thus, the gain is about \(2^{5} = 2^{-61}/2^{-66}\). Compared to the gain \(2^{11.64}\) in the original Midori64, the gain \(2^5\) in ours is relatively small.

5.2 On Application to Linear Cryptanalysis

Linear cryptanalysis is also an important method similarly to differential cryptanalysis. Therefore, we observe the case of linear cryptanalysis.

Corresponding to the DDT of differential cryptanalysis, the linear approximation table (LAT) is constructed in linear cryptanalysis. The LAT also includes transition with both high and low correlations. Thus, our idea can be also applied to linear cryptanalysis conceptually by focusing on the ARP and transitions with the highest correlation in the LAT.

Fig. 10.
figure 10

Results of Lilliput for linear cryptanalysis

The number of transitions with the highest correlation in the LAT tends to be more than that in the DDT. However, avoiding the identity high-correlation transitions is still possible. We discuss the case of the EGFN in the full version of this paper and show that our S-box is useful for linear cryptanalysis too. Figure 10 shows our result of Lilliput for the linear cryptanalysis. While 15 rounds are necessary to lower the probability to less than \(2^{-64}\) in the original, 14 rounds are enough in ours. Unfortunately, avoiding infinite chain length is more difficult due to more high-correlation transitions, and application to AES-bMC ciphers looks difficult. In practice, when we evaluated high-probability chain lengths for LAT, all S-boxes belonging to the sixteen 4-bit S-box classes have infinite high-probability chains. On the other hand, considering the SKINNY-like 8-bit S-box, the maximum (linear) high-probability chain length can be 3: \(\mathtt 0x04 \rightarrow 0x40 \rightarrow 0x08 \rightarrow 0x01\) and \(\mathtt 0x20 \rightarrow 0x02 \rightarrow 0x80 \rightarrow 0x10\).

5.3 Concluding Remarks

In this paper, we applied a method of designing S-boxes to take care of the ARP of the linear layer for EGFN and AES-bMC ciphers. We observed that ARP of the EGFN ciphers is the identity-differential transition with MDP. For its instance Lilliput, we showed that such a property can be avoided only by swapping the two bits of the S-box output, which imposes negligible performance overhead. By replacing Lilliput’s S-box with the new one, we show that the number of rounds to ensure \(2^{-64}\) can be reduced. We then observed that ARP of the AES-bMC ciphers is the length of the high-probability chain. We list 4-bit S-boxes achieving the maximum chain length of 2. The effect of using such S-boxes is demonstrated by replacing the original S-box of Midori64 and SKINNY with the new ones.

We aimed to show that it would be possible to go beyond the simple bound of \(( MDP )^{minAS}\) by designing the S-box considering the crucial property of the linear layer. Applications of AES-bMC ciphers and EGFN ciphers are just the first step in this direction, and we believe that the same idea can be extended to more structures. We hope that this paper will provide useful knowledge for future block cipher designers.