
1 Introduction

Differential cryptanalysis [5] evaluates the propagation of an input difference \(\delta X = X\oplus X'\) between two plaintexts X and \(X'\) through the ciphering process. Differential attacks exploit the fact that the probability of observing a specific output difference given a specific input difference is not uniformly distributed. Today, differential cryptanalysis is public knowledge, and block ciphers such as AES have proven bounds against differential attacks. A classical extension of differential cryptanalysis is the so-called related-key differential cryptanalysis [4], which allows an attacker to inject differences not only between the plaintexts X and \(X'\) but also between the keys K and \(K'\) (even though the secret key K remains unknown to the attacker). This attack has recently been extended to tweakable block ciphers [3]. In addition to the key, these ciphers take a public value called a tweak. Thus, related-tweakey differential attacks allow related-key differences but also related-tweak differences (i.e. differences in a pair of tweaks \((T, T')\)). In differential attacks, two notions are considered: differentials, where only the input and the output differences are known, and differential characteristics, where the difference after each round is completely specified. A classical approach to evaluate the resistance against differential attacks is to compute the probability of the best differential characteristic of the cipher.

Finding optimal (related-tweakey) differential characteristics is a highly combinatorial problem that scales poorly. To limit this explosion, a common solution consists in using a truncated representation [16] in which each cell (i.e. byte or nibble) is abstracted by a single bit (or, equivalently, a Boolean value) that indicates whether the cell contains a difference or not. In this case, the goal is no longer to find the exact input and output differences, but to find the positions of these differences, i.e., the presence or absence of a difference for every cell. When a difference is present at the input of an S-box, we talk about an active S-box or an active byte/nibble. However, some truncated representations may not be valid (i.e., there do not exist actual byte values corresponding to these difference positions) because some constraints at the byte level are relaxed when reasoning on difference positions.

Hence, the optimal (related-tweakey) differential characteristic problem is usually solved in two steps [1, 6]. In the first one, every differential byte is abstracted by a Boolean variable, denoted by \(\varDelta \), that indicates whether there is a difference or not at this position, and we search for all truncated representations of low weight, since the fewer differences pass through S-boxes, the higher the probability. Then, for each of these low-weight truncated representations, the second step aims at deciding whether it is valid (i.e., whether it is possible to find actual cell values, denoted \(\delta \), for every Boolean variable) and, if it is valid, at finding the actual cell values that maximize the probability of obtaining the output difference given the input difference.

Related Work. Many techniques have been proposed to search for the Step 1 solutions using automatic tools such as Boolean satisfiability (SAT) [21, 26, 27], Mixed Integer Linear Programming (MILP) [3, 24, 30] and Satisfiability Modulo Theories (SMT) [17]. Dedicated solutions have also been proposed [20].

Regarding the search for the best instantiation of a truncated characteristic, most approaches were ad hoc and dedicated to a specific cipher [6, 9,10,11, 18, 28]. Concerning the use of SAT solvers, [28] implements a SAT model for differential cryptanalysis based on CryptoMiniSat5 [26] for Midori64 and LED64. This model is applicable because a sufficiently small number of clauses is enough to model the non-zero values of the difference distribution table (DDT). However, no result concerning 8-bit S-boxes is given. As SAT uses Boolean formulas, it seems that the same problem as for MILP appears when modeling the S-box: a huge number of Boolean formulas is necessary to correctly model this step, even if dedicated tools such as Logic Friday or the Espresso algorithm [1] are used. In [1], 16 days are needed to find the best related-tweakey differential characteristics on SKINNY-128 for the SK model. Recently, in [11, 12], the authors introduced Constraint Programming (CP) models for Step 2, and the performance results are very promising for AES-192 and AES-256.

Our Contribution. In this paper, we refine the security bounds on the SKINNY-n tweakable block cipher regarding differential cryptanalysis for the four following attack models according to the size of the tweakey: the SK model focuses on single-key attack, the TK1 model considers related-tweakey attack when the tweakey has only one component, the TK2 model in the related-tweakey settings considers 2 components and the TK3 model, 3 components.

To do so, we implement Step 1 using an ad-hoc method inspired by [10]. We also propose a CP model for Step 2 that takes as input the solutions output by Step 1. Thus, we provide, for the first time, the best related-tweakey differential characteristics up to 14 rounds for the TK1 model. We also consider the TK2 and TK3 models, and we were able to find some differential characteristics up to 16 rounds for the TK2 model and up to 17 rounds for the TK3 model of SKINNY-128. However, we were not able to test all the Step 1 solutions, and thus these differential characteristics are not necessarily optimal. This is an important improvement compared to previous results. For instance, in [19] Liu et al. could only find the best differential characteristics up to 7 and 9 rounds for TK1 and TK2. Finally, we also show that there is no differential characteristic with probability higher than \(2^{-128}\) against 15 rounds in the TK1 model, 19 rounds for TK2 and 23 rounds for TK3. All those results clearly show that SKINNY is much more resistant to differential cryptanalysis than one would expect from counting the number of active S-boxes alone.

As feedback, we also report the running times obtained when implementing Step 1 with another tool, a MILP model, for the four attack settings. We show that MILP is not always the best choice. First, for Step 1, the ad-hoc method outperforms the MILP model. Second, the CP model proposed for Step 2 is much faster than the MILP model proposed in [1], which requires 16 days according to their paper.

All the code needed to reproduce these results can be found at [7].

Organization of the Paper. Section 2 gives a short description of SKINNY-n; Sect. 3 presents our Ad-Hoc tool and gives performance results comparing our Ad-Hoc model with a MILP one; Sect. 4 presents our dedicated modeling for Step 2 based on CP and analyzes the obtained results. Finally, Sect. 5 concludes this paper.

2 Cipher Under Study: SKINNY-n

In this section, we briefly review the tweakable block cipher SKINNY-n where n denotes the block size and can be equal to 64 or 128 bits. All the details that have been overlooked can be found in [3].

As its name indicates, it enciphers blocks of length 64 or 128 bits seen as a \(4 \times 4\) matrix of cells (nibbles for \(n=64\) or bytes for \(n=128\)). We denote by \(x_{i,j,k}\) the cell at row i and column j of the internal state at the beginning of round k (i.e. \( 0 \le i,j \le 3\) and \(0 \le k \le r+1\) where r is the number of rounds, which depends on the tweak length and on the key length). SKINNY-n follows the TWEAKEY framework from [15]. SKINNY-n has three main tweakey size versions: the tweakey size can be equal to \(t=64\) or 128 bits, \(t=128\) or 256 bits and \(t=192\) or 384 bits, and we denote by \(z=t/n\) the tweakey size to block size ratio. The number of rounds is then directly derived from the z value and ranges from 32 rounds for the 64/64 version to 56 for the 128/384 version.

The tweakey state is also viewed as a collection of z \(4 \times 4\) square arrays of cells (nibbles for \(n=64\) or bytes for \(n=128\)). We denote these arrays TK1 when \(z=1\), TK1 and TK2 when \(z= 2\), and finally TK1, TK2 and TK3 when \(z= 3\). We also denote by \(TKk_{i,j}\) the nibble or the byte at position [i, j] in TKk. Moreover, we define the associated adversarial model SK (resp. TK1, TK2 or TK3) where the attacker cannot (resp. can) introduce differences in the tweakey state.

One encryption round of SKINNY is composed of five operations applied in the following order: SubCells (SC), AddConstants (AC), AddRoundTweakey (ART), ShiftRows (SR) and MixColumns (MC) (see Fig. 1).

Fig. 1. The SKINNY round function with its five transformations [14].

  • SubCells. A 4-bit (\(n=64\)) or an 8-bit (\(n=128\)) S-box is applied to each cell of the state. See [3] for the details of the S-boxes.

  • AddConstants. A 6-bit affine LFSR is used to generate round constants \(c_0\) and \(c_1\) that are XORed to the state at position [0, 0] and [1, 0] whereas the constant \(c_2=\mathtt {0x02}\) is XORed to the position [2, 0].

  • AddRoundTweakey. The first and second rows of all tweakey arrays are extracted and bitwise exclusive-ored to the cipher internal state, respecting the array positioning. More formally, we have:

    • \(x_{i,j} = x_{i,j} \oplus TK1_{i,j}\) when \(z = 1\),

    • \(x_{i,j} = x_{i,j} \oplus TK1_{i,j} \oplus TK2_{i,j}\) when \(z = 2\),

    • \(x_{i,j}=x_{i,j}\oplus TK1_{i,j}\oplus TK2_{i,j} \oplus TK3_{i,j}\) when \(z = 3\).

    Then, the tweakey arrays are updated. First, a permutation \(P_T\) is applied on the cell positions of all tweakey arrays: if \(\ell =4*i+j\) where i is the row index and j is the column index, then the cell \(\ell \) is moved to position \(P_T(\ell )\) where \(P_T = [9, 15, 8, 13, 10, 14, 12, 11, 0, 1, 2, 3, 4, 5, 6, 7]\). Second, every cell of the first and second rows of TK2 and TK3 is individually updated with an LFSR on 4 bits (when \(n=64\)) or on 8 bits (when \(n=128\)) with a period equal to 15.

  • ShiftRows. The rows of the cipher state cell array are rotated to the right. More precisely, the second (resp. third and fourth) cell row is rotated by 1 position (resp. 2 and 3 positions).

  • MixColumns. Each column of the cipher internal state array is multiplied by the \(4\times 4\) binary matrix M:

    $$ M = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \end{pmatrix} $$

Since 2016 and the birth of SKINNY-128, the cryptographic community has never stopped trying to attack it. Among all the cryptanalysis results, we cite the following ones in the related-tweakey setting, classified according to the type of attack. First, in [19, 25, 31], boomerang and rectangle related-tweakey attacks are considered. The best result is on 28 rounds with a time complexity of \(2^{315}\), based on a boomerang distinguisher of 23 rounds in the TK3 scenario. Concerning impossible related-tweakey attacks [19, 29], the best attack covers 23 rounds using a distinguisher of 15 rounds in the TK2 scenario. Even if the distinguishers presented here cover fewer rounds, they do not address the same attack scenario. This paper essentially goes further than [1] concerning the search for the best related-tweakey differential trails and aims at refining the best security bounds of SKINNY in this attack model.

3 Models and Results for Step 1

As explained in the introduction, in a first step called Step 1, we abstract each possible difference at cell (nibble or byte) level by a binary variable which symbolizes the presence/absence of a difference value at a given position of the cipher. The main concern regarding this step is the combinatorial explosion induced by the abstract XOR operation for which the sum of two non-zero values can lead to the presence or the absence of a difference.

3.1 Possible Transitions

Since the S-box is bijective and the ShiftRows operation only permutes cells, neither of these operations affects truncated differences. For the AddRoundTweakey and MixColumns transformations, however, we need to take care of the XOR operation. More precisely, given two truncated differences a and b, we know that the possible values of \((a, b, a \oplus b)\) are:

$$ (0,0,0), (0,1,1), (1,0,1), (1,1,0), (1,1,1) $$

However we have to pay attention to uninstantiable solutions. For instance, given three truncated differences a, b and c, (1, 1, 1, 0, 0, 1) is a possible value for \((a,b,c, a \oplus b, a \oplus c, b\oplus c)\) but it is impossible to instantiate it because if \(a = b\) and \(a = c\) then \(b = c\).
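As an illustration, the following Python sketch (not the authors' code; the function names and the use of 4-bit cells are assumptions made for brevity) recovers the five truncated XOR triples by exhaustive enumeration and checks by brute force that the pattern (1, 1, 1, 0, 0, 1) admits no instantiation.

```python
from itertools import product


def truncate(value):
    """Truncated view of a cell difference: 1 if non-zero, 0 otherwise."""
    return int(value != 0)


def truncated_xor_triples(bits=4):
    """All truncated values of (a, b, a XOR b) over `bits`-bit cells."""
    cells = range(1 << bits)
    return sorted({(truncate(a), truncate(b), truncate(a ^ b))
                   for a, b in product(cells, repeat=2)})


def instantiable(pattern, bits=4):
    """True if some (a, b, c) has truncated (a, b, c, a^b, a^c, b^c) == pattern."""
    cells = range(1 << bits)
    for a, b, c in product(cells, repeat=3):
        values = (a, b, c, a ^ b, a ^ c, b ^ c)
        if tuple(truncate(v) for v in values) == pattern:
            return True
    return False


triples = truncated_xor_triples()
bad_pattern_ok = instantiable((1, 1, 1, 0, 0, 1))   # a = b and a = c but b != c
good_pattern_ok = instantiable((1, 1, 1, 0, 0, 0))  # a = b = c, non-zero
```

Each XOR taken alone is compatible with the list of triples, yet the three XORs together are contradictory, which is exactly the situation the rewritten system below is meant to exclude.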

Hence we rewrite the equation \(y = \texttt {MixColumns} \circ \texttt {AddRoundTweakey} (x, k)\) to avoid such patterns:

  • \(y[1] = x[0] \oplus k[0]\),

  • \(y[3] = y[1] \oplus x[2]\),

  • \(y[0] = y[3] \oplus x[3]\),

  • \(y[2] = x[1] \oplus k[1] \oplus x[2]\).

We experimentally verified that each truncated solution of this system can be instantiated.

Key schedule. At the cell level and for truncated differential characteristics, the key schedule of SKINNY is mostly a simple cell permutation. In the SK model, there are no differences in the round keys. In the TKx models, differences in the round keys are possible. If the number of targeted rounds is at most 30, the rule for active cells in the round keys is quite simple: either the cell is inactive for all round keys, or it is active for all round keys but one (TK2) or two (TK3).

3.2 Ad-hoc Models for Step 1

To the best of our knowledge, the most efficient algorithm to search for truncated differential characteristics on SPN ciphers is the one described in [10] by Fouque et al., which was applied to the 3 versions of AES. It is essentially dynamic programming: round i is independent of the paths followed in rounds \(0, 1, \ldots , i-1\), so at each step we only have to save, for each truncated state, the minimal number of active S-boxes needed to reach it. Hence, the complexity of this algorithm is exponential in the state size but linear in the number of rounds. The algorithm is specified in Algorithm 1. At the end of the algorithm we obtain an array C such that C[r][s] contains the minimal number of active S-boxes required to reach state s after r rounds. Retrieving the truncated representations is then done quite easily using C, going from the last state back to the first. Let us say we want to exhaust all truncated differential characteristics on R rounds with at most b active S-boxes ending with state s. From \(C[R-1][s]\), we know whether such a characteristic exists or not. If \(C[R-1][s] \le b\), we exhaust all states \(s'\) such that the transition \(s' \rightarrow s\) through one round is possible and, for each of them, we now need to exhaust all truncated differential characteristics on \(R-1\) rounds with at most \(b - \vert s \vert \) active S-boxes ending with state \(s'\).

[Figure: Algorithm 1]
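The dynamic programming idea and the backward enumeration can be sketched on a toy example. The Python code below is not Algorithm 1: it is a minimal sketch that assumes a one-column "cipher" whose round is a single multiplication by the SKINNY MixColumns matrix (taken from the SKINNY specification), with 4-bit cells and no tweakey, and all function names are hypothetical. It builds the array C and then enumerates every truncated path within a given budget of active cells.

```python
from itertools import product

# MixColumns matrix of SKINNY as given in its specification (assumed here).
M = [(1, 0, 1, 1), (1, 0, 0, 0), (0, 1, 1, 0), (1, 0, 1, 0)]


def mix_column(col):
    """Multiply one column of cell differences by M over GF(2)."""
    out = []
    for row in M:
        acc = 0
        for coeff, cell in zip(row, col):
            if coeff:
                acc ^= cell
        out.append(acc)
    return tuple(out)


def truncate(col):
    return tuple(int(c != 0) for c in col)


def truncated_transitions(bits=4):
    """Map each truncated input column to the set of reachable truncated outputs."""
    trans = {}
    for col in product(range(1 << bits), repeat=4):
        trans.setdefault(truncate(col), set()).add(truncate(mix_column(col)))
    return trans


def min_active(trans, rounds):
    """C[r][s]: minimal number of active cells on a path of r+1 states ending in s."""
    C = [{s: sum(s) for s in trans if any(s)}]
    for _ in range(1, rounds):
        cur = {}
        for s_prev, cost in C[-1].items():
            for s in trans[s_prev]:
                cur[s] = min(cur.get(s, cost + sum(s)), cost + sum(s))
        C.append(cur)
    return C


def characteristics(trans, C, r, s, budget):
    """Yield all truncated paths ending in s at index r with at most `budget` active cells."""
    if C[r].get(s, budget + 1) > budget:
        return  # pruned: C proves that no such path exists
    if r == 0:
        yield (s,)
        return
    for s_prev in C[r - 1]:
        if s in trans[s_prev]:
            for prefix in characteristics(trans, C, r - 1, s_prev, budget - sum(s)):
                yield prefix + (s,)


trans = truncated_transitions()
C = min_active(trans, 4)
```

The table C is filled once, in time linear in the number of rounds, and the enumeration never explores a branch that cannot stay within the budget, since each recursive call is guarded by a lookup in C.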

The complexity of the algorithm in the single key model is very low, and we experimentally counted around \((R-1) \times 2^{20}\) simple operations for R rounds. A naive solution to search for truncated representations in the TK1, TK2 and TK3 models would be to apply the previous algorithm for each possible configuration of the key. While for TK1 this would only increase the overall complexity by a factor \(2^{16}\), the search would not be practical for both the TK2 and TK3 models. Indeed, because of the possible cancellations occurring in the round keys, the number of configurations is very high:

$$ \left( \sum_{k = 0}^{8} \binom{8}{k} \left( \sum_{i = 0}^{tk-1} \binom{\left\lfloor (R-1)/2 \right\rfloor}{i} \right)^{k} \right) \left( \sum_{k = 0}^{8} \binom{8}{k} \left( \sum_{i = 0}^{tk-1} \binom{\left\lceil (R-1)/2 \right\rceil}{i} \right)^{k} \right). $$

For instance, for \(R = 30\), there are more than \(2^{64}\) configurations in the TK2 model.
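The formula can be evaluated directly. The short Python sketch below is only a numerical check; it assumes that tk denotes the number of tweakey arrays (1, 2 or 3), a reading consistent with the factor \(2^{16}\) quoted above for TK1, and the function name is hypothetical.

```python
from math import comb


def key_configurations(rounds, tk):
    """Number of round-key activity configurations given by the formula above."""
    def half(n):
        # Choices for one active key cell: fewer than tk cancellations among n slots.
        per_cell = sum(comb(n, i) for i in range(tk))
        # Sum over the number k of active cells among 8.
        return sum(comb(8, k) * per_cell ** k for k in range(9))

    n = rounds - 1
    return half(n // 2) * half((n + 1) // 2)  # floor and ceiling of (R-1)/2


tk1_count = key_configurations(30, 1)
tk2_count = key_configurations(30, 2)
```

For \(R = 30\) and TK2 the two factors simplify to \(16^8\) and \(17^8\) by the binomial theorem, so the product is about \(2^{64.7}\), in line with the bound stated above.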

In the following we present the first practical algorithm which tackles the problem for the TK models without relying on a black-box solver such as a MILP, SAT or CP solver. Actually, this is the only algorithm fast enough to generate all the Step 1 solutions required to perform Step 2. Indeed, the best differential characteristic is rarely based on the truncated differential characteristics minimizing the number of active S-boxes, and thus we need to generate a large number of truncated characteristics to find the one that can be instantiated with the best probability. As we will explain in Sect. 3.4, all other approaches we tried to generate them failed.

The idea of our ad-hoc method is quite similar to the one used in the single-key model. To compute the minimal number of active S-boxes at round \(r+1\), we only need to know the minimal number of active S-boxes for each possible state at round r together with the number of cancellations that have occurred so far for each key cell. Indeed, we do not need to know at which rounds the cancellations occurred but only how many times they did. A simplified version of this algorithm is described in Algorithm 2. The most important part is related to the variable cancelled, which counts how many times each key cell is cancelled through the encryption. It is a vector of 16 cells, each cell taking values among \(\{0,1, \ldots , x\,-\,1, r\}\) for the TKx model. The main advantage of our representation is that at each step of the algorithm, C[r][s] contains at most \((x+1)^{16}\) elements for the TKx model, which is much lower than the number of possible sequences of round keys.

[Figure: Algorithm 2]

Finally, we introduce a new improvement which greatly speeds up the search procedure. It is based on the so-called early-abort principle, and the idea is to handle the key cell by cell. Indeed, we expect that the best truncated differential characteristics do not involve many active cells in the round key, and so we want to quickly cut those branches during the search. To do so, we first pick a key cell and guess whether it is active or not. At this step we have not yet decided whether any cancellations occur, nor their positions, but only whether the cell is always 0 or is 1 at least once. Then we apply the algorithm partially and guess another key cell if and only if it seems possible to find a truncated differential characteristic with a small enough number of active S-boxes. More precisely, along the search we have the relation \(y = x \oplus k\) where k is the round key. We introduce a new 16-bit variable g such that \(g_i = 0\) if we made a choice for bit i of k and 1 otherwise. To compute the possible truncated transitions from x to y through k for all the possible keys (according to g), we can restrict ourselves to looking at the possible truncated transitions from (x|g) to y through (k|g), where | is the bitwise OR. Indeed, we use the fact that in the truncated setting \(1 \oplus 1\) is 0 or 1, and thus our technique allows us to handle all the possible keys by looking at only a few transitions.
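The OR trick can be checked cell by cell. The following Python sketch (hypothetical helper names, one truncated cell at a time) compares the set of outcomes obtained by enumerating every key bit compatible with the guess mask against the set obtained from the single transition from (x|g) through (k|g).

```python
from itertools import product


def xor_outcomes(x, k):
    """Possible truncated values of x XOR k for one cell."""
    return {0, 1} if x and k else {x ^ k}


def outcomes_by_enumeration(x, k, g):
    """Union over every key bit compatible with the guess-mask bit g."""
    candidates = (k,) if g == 0 else (0, 1)  # g = 1: the key bit is not decided yet
    return set().union(*(xor_outcomes(x, c) for c in candidates))


def outcomes_by_or_trick(x, k, g):
    """Single lookup on (x | g, k | g), as described in the text."""
    return xor_outcomes(x | g, k | g)


agree = all(outcomes_by_enumeration(x, k, g) == outcomes_by_or_trick(x, k, g)
            for x, k, g in product((0, 1), repeat=3))
```

When the key bit is undecided, both computations return {0, 1}; when it is decided, both reduce to the ordinary truncated XOR rule.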

3.3 Results for Step 1

For Step 1, we ran our ad-hoc tool on the four attack scenarios (SK, TK1, TK2, and TK3), varying the number of rounds between 3 and 20. We conducted all our experiments on a server with two AMD EPYC 7742 64-core processors and 1 TB of RAM. In particular, we were able to complete the security analysis made in [2, 3] and claim that the minimal numbers of active S-boxes in TK1 for 28, 29 and 30 rounds are 105, 109 and 113 respectively (as shown in Table 1).

Table 1. Lower bounds on the number of active S-boxes in SKINNY.

However, the optimal solution of Step 2, in terms of differential characteristic probability, may be obtained for a number of active S-boxes which is not the optimal one. Hereafter, we denote by \(Obj_{Step1}\) the number of active S-boxes we consider when solving the problems. For example, assume that, when processing Step 2, one obtains a differential characteristic with best probability equal to \(2^{-3 \times 6}=2^{-18}\) with \(Obj_{Step1}=6\), whereas the optimal differential probability of the S-box is \(2^{-2}\). This means that one has to test all the Step 1 solutions with \(Obj_{Step1} < 18/2 = 9\) to be sure that none has a better differential characteristic probability. This is exactly what happened for the case of SKINNY-128 in the TK models. We only want to stress here that computing the optimal bounds is often not enough and we need to go further. However, increasing the value of \(Obj_{Step1}\) increases the number of possible Step 1 solutions, as illustrated in the third column of Table 4. As one can see, this number of solutions tends to grow exponentially with the number v of active S-boxes considered. For example, for SKINNY-128 with 14 rounds in the TK1 model, for the optimal value \(v^*=45\), Step 1 outputs only 3 solutions, whereas we have 897 solutions for \(v=v^*+5=50\), 137 019 solutions for \(v=v^*+10=55\) and finally 7 241 601 solutions for \(v=59\). So, the time required to output all those Step 1 solutions and the time required for the Step 2 computations on each solution output by Step 1 become the bottleneck of the overall process.
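The stopping rule used in this example amounts to a one-line computation. A truncated characteristic with n active S-boxes has probability at most \(2^{-m \cdot n}\), where \(2^{-m}\) is the best differential probability of the S-box, so it can only beat a characteristic of probability \(2^{-p}\) if \(m \cdot n < p\). The Python sketch below (the function name is hypothetical) returns the largest value of \(Obj_{Step1}\) that still has to be examined.

```python
from math import ceil


def largest_objective_to_test(best_weight, sbox_best_weight):
    """Largest n with sbox_best_weight * n < best_weight.

    best_weight      : p, where 2^-p is the best probability found so far
    sbox_best_weight : m, where 2^-m is the best S-box differential probability
    """
    return ceil(best_weight / sbox_best_weight) - 1


example_above = largest_objective_to_test(18, 2)   # probability 2^-18, S-box 2^-2
skinny64_tk2 = largest_objective_to_test(55, 2)    # probability 2^-55, S-box 2^-2
```

The first call reproduces the bound \(Obj_{Step1} < 9\) of the example above. The second corresponds to the 13-round TK2 case of SKINNY-64 discussed in Sect. 4.3, where the Step 1 solutions with 26 and 27 active S-boxes must also be enumerated.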

3.4 Other Approaches

We tried different approaches to solve the Step 1 problem, including MILP, SAT and CP models.

Our SAT model is encoded through the high-level modeling language MiniZinc, while our CP model is based on the Choco-solver. Unfortunately, the results of both the SAT and the CP models were poor: for example, for all instances with more than 16 rounds we were unable to obtain the solutions in reasonable time. For SAT, this is mainly due to the need to enumerate solutions, which requires forbidding all previously found solutions. For CP, on the other hand, this has to do with the nature of the Boolean variables themselves, as the Choco-solver cannot efficiently propagate lower and upper bounds on Boolean variables.

Our MILP model was much better than our SAT and CP ones. We started from the original model presented in [3] but made several optimizations. First, we added constraints in the SK model to obtain all solutions up to column shifts in order to remove symmetries. Moreover, as the original model only describes the way to find the minimal number of active S-boxes, we added a constraint in each model to set a lower bound on the number of active S-boxes and thus to be able to enumerate all the Step 1 solutions given a particular lower bound on the number of active S-boxes. Then, in the original MILP model all XOR operations were modeled using dummy variables, which is known to be inefficient. Thus we replaced the corresponding inequalities, using the fact that \(x \,\oplus \, y \,\oplus \,z = 0\) can be described with the three inequalities:

$$\begin{aligned} \{x + y \ge z\},~\{x + z \ge y\},~\{y + z \ge x\}. \end{aligned}$$
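A quick exhaustive check, given here as a Python sketch over binary variables (the function name is hypothetical), confirms that these three inequalities accept exactly the truncated XOR triples of Sect. 3.1 and reject the three assignments in which a single variable is active.

```python
from itertools import product


def satisfies_inequalities(x, y, z):
    """The three linear inequalities modeling the truncated x XOR y XOR z = 0."""
    return x + y >= z and x + z >= y and y + z >= x


allowed = [t for t in product((0, 1), repeat=3) if satisfies_inequalities(*t)]
rejected = [t for t in product((0, 1), repeat=3) if not satisfies_inequalities(*t)]
```

No dummy variable is needed: each inequality simply states that a variable cannot be the only active one among the three.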

Finally, regarding the resolution of the MILP models, the parallelization was left to the Gurobi solver.

We compared the MILP model to our ad-hoc tool and we found that our MILP model is much slower in most cases and actually too slow to output all the Step 1 solutions needed to perform Step 2. Running times are given in Table 2.

Table 2. Comparison of the running times required to generate all Step 1 solutions between our MILP and ad-hoc approaches.

Note that while our ad-hoc tool gave very good running times, it may require a lot of memory to store the array C. For instance, for 30 rounds in the TK3 model, our tool required up to 500 GB of RAM to finish the search. It is also important to note that it did not take full advantage of the 128 cores of our server, and most often used fewer than 40 cores.

4 Modeling Step 2 with CP

The aim of Step 2 is to try to instantiate the abstracted solutions provided by Step 1 while maximizing the probability of the differential characteristic. Thus, Step 2 takes as input a solution of Step 1, with the objective of maximizing the probability of the differential characteristic. However, some solutions of Step 1 cannot be instantiated in Step 2, as refining the abstraction level in Step 2 may reveal inconsistencies. In the literature, this step has been modeled using ad-hoc methods [6], MILP [1], SAT [28] or CP [12]. As MILP [1] and SAT [28] seem to scale poorly due to prohibitive computational times (linked with the size of the 8-bit S-boxes that must be represented in the form of linear inequalities or of clauses), we focus here on a dedicated CP method implemented using the Choco solver [22]. We also provide, in the second part of this section, the results we obtain when instantiating the differential characteristics in the 4 attack scenarios.

4.1 Constraint Programming

Although less usual than MILP to tackle cryptanalytic problems, CP has already been used in e.g. [9, 13]. We recall some basic principles of CP and we refer the reader to [23] for more details.

CP is used to solve Constraint Satisfaction Problems (CSPs). A CSP is defined by a triple (X, D, C) such that \(X = \{x_1,x_2,\dots ,x_n\}\) is a finite set of variables, D is a function that maps every variable \(x_i\in X\) to its domain \(D(x_i)\) and \(C=\{c_1,c_2,\dots ,c_m\}\) is a set of constraints. \(D(x_i)\) is a finite ordered set of integer values to which the variable \(x_i\) can be assigned, whereas \(c_j\) defines a relation between some variables \(vars(c_j) \subseteq X\). This relation restricts the set of values that may be assigned simultaneously to \(vars(c_j)\). Each constraint is equipped with a filtering algorithm which removes from the domains of \(vars(c_j)\) the values that cannot satisfy \(c_j\).

In CP, constraints are classified in two categories. Extensional constraints, also called table constraints, explicitly define the allowed (or forbidden) tuples of the relation. Intensional constraints define the relation using mathematical operators. For instance, in a CSP with \(X=\{x_1,x_2,x_3\}\) such that \(D(x_1)=D(x_2)=D(x_3)=\{0,1\}\), a constraint ensuring that the sum of the variables in X is different from 1 can be expressed either in extension (1) or in intension (2):

  1. \((x_1,x_2,x_3) \in \{(0,0,0), (0,1,1), (1,0,1), (1,1,0), (1,1,1)\}\)
  2. \(x_1+x_2+x_3\ne 1\)

Actually, any intensional constraint can be encoded as an extensional one provided enough memory space is available, and conversely [8]. However, the two encodings may offer different performances.

The purpose of a CSP is to find a solution, i.e. an assignment of all variables to a value from their respective domains such that all the constraints are simultaneously satisfied. When looking for a solution, a two-phase mechanism is operated: the search space exploration and the constraint propagation. The exploration of the search space is processed using a depth-first search. At each step, a decision is taken, i.e. a non-assigned variable is selected and its domain is reduced to a singleton. This modification requires checking the satisfiability of all the constraints. This is achieved thanks to constraint propagation, which applies each constraint's filtering algorithm. Any application may trigger modifications in turn; the propagation ends when either no modification occurs and all constraints are satisfied, or a failure is thrown, i.e., at least one constraint cannot be satisfied. In the former case, if all variables are assigned, a solution has been found. Otherwise a new decision is taken and the search is pursued. In the latter case, a backtrack to the first refutable decision is made and the search is resumed.

Turning a CSP into a Constrained Optimisation Problem (COP) is done by adding an objective function. Such a function is defined over variables of X, and the purpose is then to find the solution that optimizes the objective function. Finding the optimal solution is done by repeatedly applying the two-phase mechanism above, and by adding a cut on the objective function that prevents solutions of the same cost from being found again.
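The two-phase mechanism and the cut on the objective can be sketched in a few lines of Python. This is a deliberately naive illustration, not how Choco-solver is implemented: every constraint is a table constraint given as a pair (scope, allowed tuples), the objective is an ordinary variable of the CSP, and all names are hypothetical. The toy instance reuses the constraint \(x_1+x_2+x_3\ne 1\) from above, adds a variable p equal to the sum and forces \(x_1 = 1\).

```python
from itertools import product


def propagate(domains, constraints):
    """Filter the domains until a fixpoint is reached; return False on failure."""
    changed = True
    while changed:
        changed = False
        for scope, allowed in constraints:
            # Tuples still compatible with the current domains.
            live = [t for t in allowed
                    if all(v in domains[x] for x, v in zip(scope, t))]
            for i, x in enumerate(scope):
                supported = {t[i] for t in live}
                if not supported:
                    return False          # a domain is wiped out: failure
                if supported != domains[x]:
                    domains[x] = supported
                    changed = True
    return True


def minimise(domains, constraints, objective, best=None):
    """Depth-first search with propagation and a cut on the objective variable."""
    domains = {x: set(d) for x, d in domains.items()}
    if best is not None:
        # Cut: only strictly better objective values remain acceptable.
        domains[objective] = {v for v in domains[objective] if v < best[objective]}
    if not domains[objective] or not propagate(domains, constraints):
        return best                        # backtrack
    if all(len(d) == 1 for d in domains.values()):
        return {x: next(iter(d)) for x, d in domains.items()}
    x = next(v for v in sorted(domains) if len(domains[v]) > 1)
    for value in sorted(domains[x]):       # decision: reduce x to a singleton
        trial = dict(domains)
        trial[x] = {value}
        best = minimise(trial, constraints, objective, best)
    return best


bits = list(product((0, 1), repeat=3))
constraints = [
    (("x1", "x2", "x3"), [t for t in bits if sum(t) != 1]),
    (("x1", "x2", "x3", "p"), [t + (sum(t),) for t in bits]),
    (("x1",), [(1,)]),
]
domains = {"x1": {0, 1}, "x2": {0, 1}, "x3": {0, 1}, "p": {0, 1, 2, 3}}
solution = minimise(domains, constraints, "p")
```

On this instance the branch p = 1 fails during propagation, the first solution is found with p = 2, and the cut then discards every remaining branch.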

4.2 Modeling Step 2 with CP

Given a Boolean solution for Step 1, Step 2 aims at searching for the byte-consistent solution with the highest (related-tweakey) differential characteristic probability (or proving that there is no byte-consistent solution). In this section, Model 1 describes the CP model we used for SKINNY-128 (SK). The models used for the other variants, as well as for SKINNY-64, are rather similar.

[Figure: Model 1]

For each Boolean variable \(\varDelta X_{r,i,j}\) of Step 1, we define an integer variable \(\delta X_{r,i,j}\). The domain of this integer variable depends on the value of the Boolean variable in the Step 1 solution: If \(\varDelta X_{r,i,j}=0\), then the domain is \(D(\delta X_{r,i,j})=\{0\}\) (i.e., \(\delta X_{r,i,j}\) is also assigned to 0); otherwise, the domain is \(D(\delta X_{r,i,j})=[1,255]\). For each byte that passes through an S-box, we define an integer variable \(\delta S\!B_{r,i,j}\) which corresponds to the difference after the S-box. Its domain is \(\{0\}\) if \(\varDelta X_{r,i,j}\) is assigned to 0 in the Step 1 solution; otherwise, it is \(D(\delta S\!B_{r,i,j})=[1,255]\). This is expressed in (3) of Model 1.

Finally, as we look for a byte-consistent solution with maximal probability, we also add an integer variable \(P_{r,i,j}\) for each byte in an S-box: this variable corresponds to the absolute value of the base 2 logarithm of the probability of the transition through the S-box. A factor 10 is applied to avoid handling floats. Thus we define a Table constraint (4) composed of valid triplets of the form \((\delta X_{r,i,j},\delta S\!B_{r,i,j}, P_{r,i,j})\). Note that these triplets only contain non-zero values and that \(P_{r,i,j}\) takes only 2 different values for the 4-bit S-box (SKINNY-64) and 7 different values for the 8-bit S-box (SKINNY-128). There are roughly \(2^{14}\) triplets in the Table constraint for the SKINNY-128 case. As the S-box layer is the only non-linear layer, the other operations can be directly implemented in a deterministic way at the cell level. The associated constraints thus follow the SKINNY-128 linear operations. When possible, i.e. when one element is known to be zero, we replace XOR constraints (encoded using Table constraints) by a simple equality constraint. This corresponds to Table constraints (5), (6), (7) and (8) in Model 1.
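The triplets of such a Table constraint can be derived from the difference distribution table of the S-box. The Python sketch below does so for the 4-bit S-box of SKINNY-64; the S-box table is written from the SKINNY specification [3] and should be checked against it before any reuse, and the function names are hypothetical. The same functions apply unchanged to an 8-bit S-box given as a list of 256 values.

```python
from math import log2

# 4-bit S-box of SKINNY-64, as recalled from the specification (assumption).
SBOX4 = [0xC, 0x6, 0x9, 0x0, 0x1, 0xA, 0x2, 0xB,
         0x3, 0x8, 0x5, 0xD, 0x4, 0xE, 0x7, 0xF]


def ddt(sbox):
    """Difference distribution table: ddt[din][dout] = #{x : S(x) ^ S(x ^ din) = dout}."""
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for x in range(n):
        for din in range(n):
            table[din][sbox[x] ^ sbox[x ^ din]] += 1
    return table


def sbox_table_tuples(sbox, scale=10):
    """Triplets (din, dout, P) with P = round(-scale * log2(probability)), non-zero cells only."""
    n = len(sbox)
    table = ddt(sbox)
    return [(din, dout, round(-scale * log2(table[din][dout] / n)))
            for din in range(1, n)
            for dout in range(1, n)
            if table[din][dout]]


tuples4 = sbox_table_tuples(SBOX4)
weights4 = {p for _, _, p in tuples4}
```

With the factor 10, the transition probabilities \(2^{-2}\) and \(2^{-3}\) of the 4-bit S-box become the integer weights 20 and 30, which matches the two values of \(P_{r,i,j}\) mentioned above for SKINNY-64.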

The overall goal is finally to find a byte-consistent solution which maximizes the differential characteristic probability. Thus, we define an integer variable \(Obj_{Step2}\) equal to the sum of all \(P_{r,i,j}\) variables, which is to be minimized (1). This value mainly depends on the number of active S-boxes given by Step 1, \(Obj_{Step1}\), and can be bounded to \([\![20 \cdot Obj_{Step1}, 70\cdot Obj_{Step1}]\!]\) (2).

The models for TK1, TK2 and TK3 differ in the modeling of the XORs induced by the lanes of the tweakey, which is done through XOR table constraints. Each XOR constraint depicted in Model 1 provides high-quality filtering but requires 65536 tuples to be stored, which results in prohibitive memory usage. This may limit the number of threads that can be used for the resolution, which was the case for TK2 and TK3. To get around this issue, we encoded the XOR constraint in intension (by defining filtering rules), providing a more memory-efficient algorithm at the expense of filtering strength. This last choice was applied for TK2 and TK3 (SKINNY-128 only). We also rely on Table constraints to model the LFSRs applied on TK2 and TK3.
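The trade-off between the two encodings can be illustrated as follows. The Python sketch below builds the 65536-tuple table of a byte-level XOR and contrasts it with one possible filtering rule; this rule (force the third variable once two are fixed) is only an assumption chosen for illustration, since the exact rules used in our models are not detailed here, and the function names are hypothetical.

```python
def xor_table(bits=8):
    """Extensional encoding: every tuple (a, b, c) with a ^ b ^ c == 0."""
    size = 1 << bits
    return [(a, b, a ^ b) for a in range(size) for b in range(size)]


def filter_xor(dom_a, dom_b, dom_c):
    """Intensional encoding: once two domains are singletons, the third is forced.

    Returns the filtered domains; an empty domain signals a failure.
    """
    doms = [set(dom_a), set(dom_b), set(dom_c)]
    for i in range(3):
        j, k = (i + 1) % 3, (i + 2) % 3
        if len(doms[j]) == 1 and len(doms[k]) == 1:
            forced = next(iter(doms[j])) ^ next(iter(doms[k]))
            doms[i] &= {forced}
    return doms


table_size = len(xor_table())
forced = filter_xor({3}, {5}, set(range(256)))
conflict = filter_xor({3}, {5}, {7})
```

The rule stores nothing but filters only late in the search, whereas the table can remove unsupported values as soon as any domain shrinks, at the cost of keeping all tuples in memory for each constraint.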

Concerning the search space strategy, for the TK2 and the TK3 attack settings, the Step 1 only outputs the truncated value of the sum of the TKi. Thus, the search space strategy first looks at the cancellation places of the sum of the TKi and then instantiates the TKi values according to those positions. For the TK1 setting, we simply apply the default Choco-solver strategy.

Concerning the parallelization, we assign one solution output by Step 1 to each thread and we share the value of \(Obj_{Step2}\) between the threads.

4.3 Step 2 Performance Results

We ran our Step 2 model on the two versions of SKINNY (SKINNY-64 and SKINNY-128) using our CP models written in Choco-solver. All experiments were conducted on a server with \(2\times \) AMD EPYC 7742 64-Core processors and 1 TB of RAM. All the reported times are real system times.

To the best of our knowledge, only [1] reports timing results for finding the best SK differential characteristic probability on SKINNY-128, using a MILP tool based on Gurobi.

More precisely, the authors say: “In our experiments, we used Gurobi Optimizer with Xeon Processor E5-2699 (18 cores) in 128 GB RAM.” and, for 13 rounds, “in our environment, the test of 6 classes [Step 1 solutions with 58 active S-boxes without symmetries] finished in 16 days. Finally, it is proven that the tight bound on the probability of differential characteristic for 13 rounds is \(2^{-123}\)” in the SK model.

Regarding the TK models, the best known results were obtained by Liu et al. also using MILP models [19]. They could only find the best differential characteristics up to 7, 9 and 13 rounds for TK1, TK2 and TK3 respectively.

Results for SKINNY-64. We sum up in Table 3 all the results we obtain for SKINNY-64 in the four attack models (SK, TK1, TK2 and TK3). The overall time, in this case, is not a bottleneck. We only give results for the numbers of rounds that are at the limit (just below and just above) with respect to the number of active S-boxes, which is equal to 32 in the case of SKINNY-64, as the state size is 64 bits and the best differential probability of the S-box is equal to \(2^{-2}\). Thus, once 32 S-boxes are active, the best overall differential characteristic probability is at most \(2^{-64}\).
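The threshold of 32 active S-boxes follows from a short computation, assuming only that each active S-box contributes a probability of at most \(2^{-2}\). For a characteristic with \(n\) active S-boxes and probability \(p\):

\[ p \;\le\; \left(2^{-2}\right)^{n} = 2^{-2n}, \qquad 2^{-2n} \le 2^{-64} \;\Longleftrightarrow\; n \ge 32. \]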

Note that sometimes, we need to browse several \(Obj_{Step1}\) bounds to find the optimal differential characteristic probability when the number of rounds is fixed. Indeed, we need to adapt the probability bound to the best probability found so far. For example, in the case of TK2 SKINNY-64 with 13 rounds, the optimal \(Obj_{Step1}\) is equal to 25 and when providing the Step 2 process with this \(Obj_{Step1}\) bound, we find a best differential characteristic probability equal to \(2^{-55}\). Since 26 or 27 active S-boxes could still yield a probability of at least \(2^{-55}\), we need to enumerate all the Step 1 solutions with \(Obj_{Step1}=26\) and \(Obj_{Step1}=27\) to be sure that the previous probability is really the best one. Then, before running Step 2 again on those new results, we update the probability bound to \(2^{-55}\) instead of the old bound \(2^{-64}\).
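The set of additional \(Obj_{Step1}\) values to enumerate can be computed as in the following Python sketch. The function name and parameters are hypothetical; the sketch assumes that each active S-box costs a weight of at least 2 (best S-box probability \(2^{-2}\)), as is the case for SKINNY-64.

```python
def step1_values_to_check(v_star, best_weight, sbox_weight=2):
    """Values of Obj_Step1 above v_star that could still match or beat a
    characteristic of probability 2^-best_weight, assuming every active
    S-box costs at least sbox_weight."""
    upper = best_weight // sbox_weight  # largest count whose minimal weight fits
    return list(range(v_star + 1, upper + 1))

# TK2 SKINNY-64, 13 rounds: optimal Obj_Step1 = 25, best probability found 2^-55.
print(step1_values_to_check(25, 55))  # [26, 27]
```

With 28 active S-boxes the weight is at least 56, so such truncated characteristics cannot reach \(2^{-55}\) and need not be enumerated.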

We also provide in Appendix A the details of the best found differential characteristics.

Table 3. Overall results concerning SKINNY-64 in the four attack models. Step 2 time corresponds to the Step 2 time taken over all Step 1 solutions when \(Obj_{Step1}\) takes the values given in the first column. Best Pr corresponds to the best probability found for a differential characteristic.
Table 4. Overall results concerning SKINNY-128 in the four attack models. Step 2 time corresponds to the Step 2 time taken over all solutions of Step1-enum when \(Obj_{Step1}\) takes the values given in the first column. Best Pr corresponds to the best probability found for a differential characteristic.
Table 5. Overall results concerning SKINNY-128 with exactly one active cell in the tweakey.

Results for SKINNY-128. In the same way, we provide in Table 4 the best differential characteristic probability with the total time required for this search for the four attack models. As one can see, we also check all the possible values of \(Obj_{Step1}\) for a given number of rounds, depending on the probability value previously found. This time, the number of solutions output by Step 1 can be huge when we move away from the optimal Step 1 value \(v^*\). However, as the time spent to solve one solution is reasonable (at least when considering SK and TK1), our model scales reasonably well: the worst case requires 25 days of real time on our server with 8 threads and 31 GB of RAM (see footnote 2). Our TK2 and TK3 models are based on XOR constraints encoded in intension (and not using tables), and these experiments were launched using the 128 threads of our server.

Concerning TK2 and TK3, we were not able to perform all the computations due to the huge number of Step 1 solutions. Hence we decided to handle only the Step 1 solutions with exactly one active byte in the round keys in order to limit the number of truncated characteristics to instantiate. Those results are given in Table 5. We provide in Appendix B the best TK2 differential characteristic we found for 16 rounds, and the best TK3 differential characteristic we found for 17 rounds. Note that we do not know if these differential characteristics are optimal in terms of probability, as we were not able to test all the Step 1 solutions.

Lessons Learnt. The main difficulty is not to find the optimal value \(Obj_{Step1}=v^*\) for a given number of rounds and to enumerate the corresponding solutions, provided the Step 1 model is sufficiently tight. The real difficulty arises when the value obtained for \(Obj_{Step2}\) is far from the ideal bound (here equal to \(2 \times v^*\), as the best differential probability for the S-box is equal to \(2^{-2}\)): we then have to increase \(Obj_{Step1}\) up to the bound \(\lfloor Obj_{Step2}/2\rfloor \). The further we are from \(v^*\) in the Step 1 resolution, the more numerous the Step 1 solutions are (in fact, this number grows exponentially, as can be seen in Table 4). Thus, the time for the Step 2 resolution becomes the bottleneck.

5 Conclusion

In this paper, we improve the security bounds regarding differential characteristic search on the block cipher SKINNY. As is usually done, we have divided the search procedure into two steps: Step 1, which abstracts the difference values into Boolean variables and finds the truncated characteristics with the smallest number of active S-boxes; and Step 2, which takes the results of Step 1 as input and outputs the best possible probability by instantiating the abstract solutions output by Step 1. Of course, not every Step 1 solution can be instantiated in Step 2.

For Step 1, we propose an ad-hoc method which heavily uses the structure of the problem. For Step 2, we have implemented a Choco-solver model, which is much faster than any other approach. It allowed us to find, for the first time, the best (related-tweakey) differential characteristics in the TK1 model up to 14 rounds for SKINNY-128 and to show that there is no differential trail on 15 rounds with a probability better than \(2^{-128}\). Regarding the TK2 model, we were able to find the best differential trails up to 16 rounds. For TK3, we are able to exhibit a differential characteristic up to 17 rounds. Note that in [19] Liu et al. were only able to reach 7 and 9 rounds in the TK1 and TK2 models respectively. Our approach is thus an important improvement.