Keywords

1 Introduction

Cryptanalysis aims at finding non-random properties and distinguishers against classical cryptographic primitives. In particular, differential cryptanalysis [5] is a powerful tool against block and stream ciphers. It studies the propagation of the difference \(\delta X = X\oplus X'\) between two plaintexts X and \(X'\) through the cipher E or a part of the cipher, where \(\oplus \) is the exclusive OR (xor). If the distribution of the output difference \(\delta C=E_K(X) \oplus E_K(X')\) is non uniform over all the keys K, then an adversary has a distinguisher and can exploit this non-uniformity to guess part of the key K faster than exhaustive search. Most of the times, the distinguisher is constructed on a reduced number of rounds. Today, differential cryptanalysis is a public knowledge, and block ciphers have proven security bounds against differential attacks. Hence, [4] proposed to consider differences not only between the plaintexts X and \(X'\) but also between the keys K and \(K'\) to mount related-key attacks. In this case, the cryptanalyst is interested in finding optimal related-key differentials, i.e., input and output differences that maximize the probability of obtaining the output difference given the input difference. In other words, we search for \(\delta X\), \(\delta K\), and \(\delta C\) that maximize the probability that \(\delta C\) is equal to \(E_K(X) \oplus E_{K\oplus \delta K}(X \oplus \delta X)\) for a plaintext X and a key K.

Finding an optimal related-key differential characteristic is a highly combinatorial problem that hardly scales. To simplify this problem, Knudsen [16] introduced truncated differential characteristics where byte or nibble differences are abstracted by single bits that indicate if there is a difference at a given position or not. Thus, the search for related-key differential characteristics is divided into two steps as done in [6, 10]. In Step 1, each byte or nibble difference is abstracted by a Boolean value and the aim of this step is to find the trail which minimizes the number of active S-boxes. The goal here is to find the positions of the differences. Then, for each solution found at Step 1, Step 2 aims at instantiating each Boolean value into a valid byte or nibble difference while maximizing the overall probability of the differential characteristic crossing the S-boxes. However, some truncated differential characteristics found at Step 1 may not be valid (i.e., there do not exist byte or nibble values corresponding to these difference positions). Scaling from the AES to Rijndael can only be made with a tight model for Step 1. Models described in [6, 10] do not scale when increasing the block size and the key size. Only the model described in [13] has a sufficiently small number of solutions found at Step 1 to hope that the computational time will be reasonable.

In this paper, we show how to adapt the two-step solving process of [13] dedicated to the AES to compute optimal related-key differential characteristics for Rijndael [8]. Both steps are solved with Constraint Programming (CP) solversFootnote 1: Picat-SAT for Step 1 and Choco for Step 2. We improve the approach of [13] by better interleaving Steps 1 and 2 and exploiting bounds to stop the search sooner. We also improve the Step 2 process of [13] by decomposing the constraints associated with \(\texttt{MixColumns}\). These improvements allow us to compute the optimal differential characteristics for all Rijndael instances but one within a reasonable amount of time.

Rijndael is a family of block ciphers (more precisely it is composed of 25 instances of the same cipher where the block size and the key size vary) originally proposed at the AES competition. But the NIST only retained as a standard its 128-bit-block version under the key sizes 128, 192 and 256 bits. Studying the security of Rijndael is interesting to enlighten the AES standardization process. Among the most interesting results, we obtain a 12-round (over 13 rounds) related-key differential distinguisher and a 12-round (over 13 rounds) attack for Rijndael with a block size equal to 128 bits and a key size equal to 224 bits. We also obtain an 11-round related-key differential distinguisher for Rijndael with a block size equal to 160 bits and a key size equal to 256 bits leading to an attack on 12 rounds out of 14.

When looking at the state of the art concerning the cryptanalysis of Rijndael, some of the results are in the single key scenario [11, 15, 18, 19, 25, 27] or in the related-key scenario [26] and none of those attacks exceeds 10 rounds.

The rest of this paper is organized as follows: in Sect. 2, we recall the full description of Rijndael; in Sect. 3, we detail the methods and our CP models. in Sect. 4, we sum up all the related-key differential characteristics distinguishers we obtained, give all resolution times and compare them with those of [13]; in Sect. 5, we present two attacks based on the most efficient distinguishers and finally, in Sect. 6, we conclude this paper.

2 Rijndael

Rijndael-\(C_{len}\)-\(K_{len}\)(where \(C_{len}\) is the block size and \(K_{len}\) is the key size) is a set of 25 different SPN block ciphers designed by Joan Daemen and Vincent Rijmen [9]. Each instance varies according to the block size (128, 160, 192, 224 or 256 bits) and to the key size (128, 160, 192, 224 or 256 bits) but the ciphering process is the same for all variants, except for the \(\texttt{ShiftRows}\) operation (given in Table 1) and the number of rounds (given in Table 2). It has been chosen as the new advanced encryption standard by the NIST [1] with a 128-bit block size and a key length that can be set to 128, 192 or 256 bits. The number of rounds \(N_r\) depends on the text size \(C_{len}\) and on the key size \(K_{len}\) and varies between 10 and 14. For all the versions, the current block at the input of the round i is represented by a \(4 \times N_b\) matrix of bytes \(X_i\) where \(N_b=(C_{len}/32)\) is the number of columns and where each byte at row j and column k is denoted by \(X_i[j,k]\). The round function, repeated \(N_r-1\) times, involves four elementary mappings, all linear except the first one. Round i consists of the following transformations:

  • \(\texttt{SubBytes}\). A bytewise transformation is applied on each byte of the current block using an 8-bit to 8-bit non linear S-box, denoted by \(\texttt{SBOX}\): \(SX_i[j, k] = \mathtt {SBOX(}X_i[j, k]\mathtt {)}\), \(\forall j \in [0,3], \forall k \in [0,N_b-1]\).

  • \(\texttt{ShiftRows}\). A linear mapping rotates to the left all the rows of the current matrix \(SX_i\). The values of the shifts denoted \(P_{N_b}\) (given in Table 1) depend on \(N_b\): \(Y_i[j, k] = SX_i[j, (P_{N_b}[j] + k) \mod N_b]\), \(\forall j \in [0,3], \forall k \in [0,N_b-1]\).

  • \(\texttt{MixColumns}\) is a linear multiplication of each column of the current state by a constant matrix M in the Galois field GF(\(2^8\)), that provides the corresponding column of the new state. For a given column \(k \in [0, N_b-1]\), if we denote by \(\otimes \) the multiplication in GF(\(2^8\)), we have:

    $$ Z_i[l, k] = \sum _{j \in [0,3]} M[l, j] \otimes Y_i[j, k] $$

    with M the \(4 \times 4\) circulant matrix defined by its first row \(= [2, 3, 1, 1]\)

  • \(\texttt{AddRoundKey}\) performs a bitwise xor between the subkey \(RK_{i}\) of round i and the current state \(Z_i\): \( X_{i + 1}[j, k] = Z_i[j, k] \oplus RK_{i}[j, k]\), \(\forall j \in [0,3], \forall k \in [0,N_b-1]\).

figure a
Table 1. \(\texttt{ShiftRows}\) table \(P_{C_{len}}\). This table specifies the required number of byte shifts to the left according to the row number, e.g., \(P_{224}[3] = 4\).
Table 2. The number of rounds \(N_r\) of Rijndael-\(C_{len}\)-\(K_{len}\).

The subkeys \(RK_i\) are generated from the master key K using a \(\texttt{KeySchedule}\) algorithm composed of byte shifting, \(\texttt{SBOX}\) substitutions and xors which is fully described in Algorithm 1. We denote by \(N_k=\) \(K_{len}\)/32 the number of columns of the master key K. Note that each subkey \(RK_i\) is extracted from a main register WK in the following way: \(RK_i[j, k] = WK[j, (i+1) \times N_b + k]\), \(\forall j \in [0,3], \forall k \in [0, N_b-1]\).

Those \(Nr-1\) rounds are surrounded at the top by an initial key addition with the subkey \(RK_0\) and at the bottom by a final transformation composed by a call to the round function where the \(\texttt{MixColumns}\) operation is omitted.

Our notations are summarized below.

  • \(X_i\): the state at the beginning of round i. Note that \(X_{i}\) is also the state after applying the \(\texttt{AddRoundKey}\) function on the previous round \(X_{i-1}\);

  • \(SX_i\): the state of round i, after applying \(\texttt{SubBytes}\);

  • \(Y_i\): the state of round i, after applying \(\texttt{ShiftRows}\);

  • \(Z_i\): the state of round i, after applying \(\texttt{MixColumns}\).

  • \(RK_i\): the subkey of round i.

3 The Solving Process

In this section, we describe how to compute the optimal related-key differential characteristics for Rijndael by adapting and improving the approach introduced in [13] for the AES. We first recall some basic principles of CP; then we describe the two-step solving process; finally, we describe the CP models associated with each of these two steps.

3.1 Constraint Programming

CP is a generic framework for solving Constraint Satisfaction Problems (CSPs), i.e., finding an assignment of values to variables such that (i) each variable is assigned to a value that belongs to its domain, and (ii) a given set of constraints is satisfied. Each constraint is a relation between some variables which restricts the set of values that may be assigned simultaneously to these variables. This relation may be defined in intention, by using mathematical operators, or in extension, by listing all allowed tuples. For example, let us consider the constraint that ensures that the sum of three variables \(x_1\), \(x_2\), and \(x_3\) is different from 1. This constraint may be defined in intention by using operators \(+\) and \(\ne \): \(x_1+x_2+x_3\ne 1\) or it may be defined in extension by using a table constraint: \(\langle x_1,x_2,x_3\rangle \in \mathtt{T_{sum\ne 1}}\) where \(\mathtt{T_{sum\ne 1}}\) is a table which enumerates every triple of values the sum of which is different from 1. For example, if the domain of \(x_1\), \(x_2\), and \(x_3\) is \(\{0,1\}\), then \(\mathtt{T_{sum\ne 1}}\) contains the following triples: \(\langle 0,0,0\rangle , \langle 0,1,1\rangle , \langle 1,0,1\rangle , \langle 0,1,1\rangle \), and \(\langle 1,1,1\rangle \).

CP may also be used to solve Constrained Optimisation Problems (COPs), i.e., CSPs with an additional objective function that must be optimised. CSPs and COPs are defined by using a modelling language such as MiniZinc [20]. Then, they can be solved by CP solvers such as, Choco [21], Gecode [12], Chuffed [7], or Picat-SAT [28]. We refer the reader to [22] for more details on CP.

SAT is a special case of CSP, where all variables have boolean domains and all constraints are logical formulae (clauses). Also, MILPs (Mixed Integer Linear Programs) are special cases of COPs where all variables have numerical domains, constraints are linear inequalities, and the objective function is linear. To compute differential characteristics with SAT or MILP solvers, it is necessary to model the problem by means of logical formulae (for SAT) or linear inequalities (for MILP). In particular, the non linear \(\texttt{DDT}\) associated with an S-box must be represented by a large number of clauses (for SAT) or inequalities (for MILP), and the resulting models hardly scale [2, 17, 24].

When using CP, constraints do not need to be linear and \(\texttt{DDT}\)s are modelled in a straightforward way by using table constraints. In particular, [13] recently showed that CP solvers can compute optimal differential characteristics very efficiently and outperform the dedicated approaches of [6] and [10] for the AES.

3.2 Two-Step Solving Process

Finding the best related-key differential characteristic is a highly combinatorial problem. To improve the scalability of the attack, Knudsen has introduced truncated differentials [16]. The core idea is to solve the problem in two steps: In Step 1, we compute a truncated differential characteristic \(S_1\) where each differential byte \(\delta A\) of the ciphering process is replaced with a boolean variable \(\varDelta A\) that indicates whether \(\delta A\) contains a difference or not (i.e., \(\varDelta A = 0 \iff \delta A = 0\) and \(\varDelta A = 1 \iff \delta A \in \llbracket 1, 2^8 - 1\rrbracket \)); In Step 2, we instantiate \(S_1\) into a differential characteristic \(S_2\): for each boolean variable \(\varDelta A\), if \(\varDelta A\) is equal to 0 in \(S_1\), then \(\delta A\) is equal to 0 in \(S_2\); otherwise \(\delta A\) must belong to \(\llbracket 1, 2^8 - 1\rrbracket \). Note that some truncated characteristics cannot be instantiated to a characteristic because some abstractions are done at Step 1.

As \(\texttt{SBOX}\) is the only non-linear operation, the probability of a differential characteristic only depends on the values of the differential bytes that pass through S-boxes, under the Markov assumption that rounds are independent. We denote \(\delta _{SB}\) this set of bytes (including those in the key schedule), and \(\varDelta _{SB}\) the corresponding set of boolean variables.

A theoretical upper bound on the probability of the best differential characteristic may be computed by searching for the truncated differential characteristic which minimises the number of active S-boxes \({ NB}_{\texttt{SBOX}} = \# \{ \varDelta A \mid \varDelta A \in \varDelta _{SB} \wedge \varDelta A = 1 \} \). As \(2^{-6}\) is the maximal differential probability of the Rijndael \(\texttt{SBOX}\), the best probability is upper bounded by \({ UB} = 2^{-6 \cdot { NB}_{\texttt{SBOX}}}\).

figure b

UB may be larger than the actual best probability because it may be possible that the best truncated differential characteristic cannot be instantiated into a differential characteristic, or because some non null differential bytes that go through S-boxes have a probability equal to \(2^{-7}\) instead of \(2^{-6}\). Hence, the best differential characteristic is searched by alternating Step 1 and Step 2 in an iterative process which is described in Algorithm 2. First, we call Step1-opt to compute \({ NB}_{\texttt{SBOX}}\), a lower bound of the number of active S-boxes in a truncated differential, and this number is used to compute a first upper bound UB on the probability. The lower bound LB on the probability is initialised to 0. Then, at each iteration of the while loop, we call Step1-next to compute the next truncated differential characteristic with \({ NB}_{\texttt{SBOX}}\) active S-boxes: each time this function is called, it returns a new Step 1 solution with \({ NB}_{\texttt{SBOX}}\) active S-boxes until they all have been computed (in this latter case, Step1-next returns null). If a new Step 1 solution \(S_1\) has been computed, then Step2 is called to search for the differential characteristic \(S_2\) corresponding to \(S_1\) whose probability is larger than LB and maximal: if such a characteristic exists, then the best characteristic \(S^*\) is updated to \(S_2\) and LB is updated to the probability of \(S_2\). When Step1-next returns null, all truncated characteristics with \({ NB}_{\texttt{SBOX}}\) active S-boxes have been enumerated. In this case, we increment \({ NB}_{\texttt{SBOX}}\) and update consequently the upper bound UB. We stop iterating when UB becomes smaller than or equal to LB: in this case \(S^*\) is equal to the optimal differential characteristic.

Algorithm 2 is different from the one used in [13]: it avoids computing useless Step 1 solutions by updating LB and UB and stopping the process when \({ LB \ge UB}\). Step1-opt, Step1-next and Step2 are implemented with CP solvers and the corresponding CP models are described in the next two sections.

3.3 Step 1

Both Step1-opt and Step1-next compute truncated differential characteristics: Step1-opt searches for the truncated characteristic that minimises \({ NB}_{\texttt{SBOX}}\), whereas Step1-next searches for the next truncated characteristic given \({ NB}_{\texttt{SBOX}}\). Both problems share the same constraints which are described in this section. Step1-opt is a COP which is obtained by adding the objective function: minimise \({ NB}_{\texttt{SBOX}}\). Step1-next is a CSP which is obtained by assigning the variable \({ NB}_{\texttt{SBOX}}\) to the optimal solution of Step1-opt.

A key point for Algorithm 2 to be efficient is to avoid as much as possible computing truncated characteristics which cannot be instantiated at Step 2. To this aim, we consider the model introduced in [13] which is tighter than the model of [14], i.e., it computes fewer truncated characteristics that cannot be instantiated at Step 2. This model has been defined for the AES, and we show in this section how to extend it to Rijndael.

figure c

Constraints Associated with Rijndael Transformations. A basic Step 1 model for Rijndael is displayed in Model 1. Constraint (A1) relates \({ NB}_{\texttt{SBOX}}\) with the number of active S-boxes. The other constraints are derived from Rijndael round function transformations.

  • \(\texttt{SubBytes}\): As \(\texttt{SBOX}\) is bijective, there is an output difference if and only if there is an input difference. The \(\texttt{SubBytes}\) transformation at Boolean level is thus abstracted by an identity mapping \(\varDelta X_i\) and \(\varDelta SX_i\) (Constraint (A2)).

  • \(\texttt{ShiftRows}\): As \(\texttt{ShiftRows}\) is just a shift at byte level, its abstraction in Step 1 is directly expressed as the equivalent shift as defined in Constraint (A3).

  • \(\texttt{MixColumns}\): Multiplications of \(\texttt{MixColumns}\) cannot be mapped into the Boolean domain as the coefficients of M belong to GF(\(2^8\)). Thus, instead of encoding multiplications, we exploit the MDS (Maximum Distance Separable) property of the \(\texttt{MixColumns}\) transformation as defined in Constraint (A4).

  • \(\texttt{AddRoundKey}\): It is a simple xor between bytes of the current state \(Z_i\) and bytes of the subkey \(RK_i\). It is modelled by constraint (A5) which prevents every triple of boolean variables involved in a same xor from having exactly one difference. This constraint also relates variables associated with the subkey \({ RK}_i\) with variables associated with the expanded key WK.

  • \(\texttt{KeySchedule}\): the whole \(\texttt{KeySchedule}\) process of Rijndael is described in Algorithm 1. The variables that pass through \(\texttt{SBOX}\)es are unchanged, as stated in Constraint (A6). Xors are modelled by Constraint (A7) which prevents every triple of boolean variables involved in a same xor from having exactly one difference.

However, this simple model generates many truncated characteristics which cannot be instantiated at Step 2. This mainly comes from the fact that xors performed by \(\texttt{AddRoundKey}\) and \(\texttt{KeySchedule}\) are modelled by constraints which simply prevent the sum of differences to be equal to 1. Thus, we show how to refine this in the next two paragraphs.

Inference of New XOR Equations from the \(\texttt{KeySchedule}\) . In Model 1, every xor equation \(\delta A \oplus \delta B \oplus \delta C = 0\) is represented by a sum constraint \(\varDelta A + \varDelta B + \varDelta C \ne 1\). This simple model is not sharp enough and generates a lot of truncated solutions that cannot be instantiated at Step 2. For example, the two xor equations \(\begin{array}{l} \delta A \oplus \delta B \oplus \delta C = 0 \text { and } \delta B \oplus \delta C \oplus \delta D = 0 \end{array}\) are represented by the two sum constraints \(\begin{array}{r} \varDelta A + \varDelta B + \varDelta C \ne 1 \text { and } \varDelta B + \varDelta C + \varDelta D \ne 1 \end{array}\). When reasoning at the byte level, we easily infer that we cannot have \(\delta A=0\) and \(\delta D\ne 0\), whatever the values of \(\delta B\) and \(\delta C\) are. However, when reasoning at the boolean level, the two sum constraints may be satisfied when \(\varDelta A=0\) and \(\varDelta D=1\) (e.g., when \(\varDelta B=\varDelta C=1\)).

To sharpen the Step 1 model and reduce the number of Step 1 solutions that cannot be instantiated at Step 2, we generate new xor equations from the initial set of equations, by xoring them. These new equations do not change the set of solutions at the byte level. However, at the boolean level, they remove some of the truncated solutions that cannot be instantiated at Step 2. For example, when xoring the two xor equations of our previous example, we obtain the equation \(\delta A\oplus \delta D = 0\). When adding the constraint \(\varDelta A + \varDelta D\ne 1\) to the two sum constraints, we prevent the search from generating solutions with \(\varDelta A=0\) and \(\varDelta D=1\).

This trick has been introduced in [13] for the AES, and we extend it to Rijndael in a straightforward way. More precisely, we consider the set of all xor equations coming from the \(\texttt{KeySchedule}\) (this set corresponds to Constraint (A7) of Model 1). From this set, we generate all possible equations that involve no more than 4 variables by recursively xoring these equationsFootnote 2. This set of new equations is denoted ExtXOR.

Introduction of diff Variables. As done in [13], we also introduce \({ diff}\) variables to reason on differences at the byte level: every variable \({ diff}_{A, B}\) is a boolean variable which is true if \(\delta A\ne \delta B\), and false otherwise. diff variables are associated with variables involved in the \(\texttt{KeySchedule}\), in \(\texttt{AddRoundKey}\) and in \(\texttt{MixColumns}\). This new Model is presented in Model 2.

Each variable \({ diff}_{A, B}\) is related with \(\varDelta A\) and \(\varDelta B\) by ensuring: \({ diff}_{A, B}+\varDelta A+\varDelta B\ne 1\). In other words, \({ diff}_{A, B}=0\) whenever \(\varDelta _A=\varDelta _B=0\) and \({ diff}_{A, B}=1\) whenever \(\varDelta _A\ne \varDelta _B\). This corresponds to Constraints (E1) (for \(\texttt{KeySchedule}\)) and (E2) (for the \(\texttt{MixColumns}\)). These constraints also ensure symmetry, i.e., \({ diff}_{A, B}={ diff}_{B, A}\). Constraints (E3) and (E4) ensure transitivity (i.e., if \(\delta A=\delta B\) and \(\delta B=\delta C\), then \(\delta A=\delta C\)) by constraining the sum of the corresponding diff variables to be different from 1.

Constraint (E5) relates diff variables associated with the subkey \({ RK}_i\) with diff variables associated with the expanded key WK.

Constraints (E6) and (E7) are associated with the new xor equations in ExtXOR. Two cases are considered: equations with three variables in Constraint (E6), and equations with four variables in Constraint (E7). In both cases, if at least one variable involved in the equation belongs to \(\varDelta _{SB}\), then the constraint simply prevents the sum of the variables to be equal to 1. Otherwise, we exploit diff variables to tighten the constraint.

Finally, Constraints (E8) and (E9) ensure the MDS property of \(\texttt{MixColumns}\) on differences between pairs of columns (this constraint is partly equivalent with the linear incompatibility of [10]). Indeed, the MDS property holds between each input and output column before and after applying \(\texttt{MixColumns}\) but it also holds when xoring different columns. More precisely, if \(i_1,i_2\in [0,r-2]\) are two round numbers, and \(k_1,k_2\in [0,3]\) are two column numbers, for every row \(j\in [0,3]\), we have

$$\begin{aligned} \begin{array}{lll} \delta Z_{i_1}[j][k_1]\oplus \delta Z_{i_2}[j][k_2] &{}=&{} \left( \bigoplus _{l=0}^3 M[j][l] \cdot \delta Y_{i_1}[l][k_1]\right) \oplus \left( \bigoplus _{l=0}^3 M[j][l] \cdot \delta Y_{i_2}[l][k_2]\right) \\ &{}=&{} \bigoplus _{l=0}^3 M[j][l] \cdot \left( \delta Y_{i_1}[l][k_1] \oplus \delta Y_{i_2}[l][k_2]\right) \end{array} \end{aligned}$$

Therefore, the MDS property also holds for the result of the xor of two different columns. This is modelled by Constraint (E8). This constraint removes many Step 1 solutions that cannot be instantiated at Step 2. In a rather similar way, Constraint (E9) is derived by xoring equations coming from \(\texttt{AddRoundKey}\).

figure d

Incomplete Step 1 Solutions. As pointed out in [13], some AES instances have a huge number of Step 1 solutions. Many of these solutions have exactly the same values for the boolean variables in \(\varDelta _{ SB}\) (corresponding to S-boxes), and they only differ on values of other boolean variables (that do not correspond to S-boxes). For example, when the key has 192 bits and the number of rounds is equal to 10, there are 27,548 different Step 1 solutions. However, there are only 7 different assignments of values to the variables in \(\varDelta _{ SB}\).

As Rijndael is a generalisation of the AES, this is also true for Rijndael. Hence, as proposed in [13], we enumerate incomplete solutions such that only the variables in \(\varDelta _{ SB}\) are assigned.

3.4 Step 2

Given a Step 1 solution (corresponding to a truncated characteristic), Step 2 aims at searching for the corresponding characteristic which has the largest probability, and such that this largest probability is larger than the best probability found so far (LB).

The CP model used to solve Step 2 is described in Model 3. For each boolean variable \(\varDelta A\) of Step 1, this model uses an integer variable \(\delta A\) to represent the corresponding differential byte. If this byte passes through an S-box (i.e., \(\delta A \in \delta _{ SB}\)), then the initial domain of \(\delta A\) depends on the value of \(\varDelta A\) in the Step 1 solution: If \(\varDelta A = 0\), then \(\delta A\) is assigned to 0; Otherwise the domain of \(\delta A\) is \(\llbracket 1, 255\rrbracket \). For each variable \(\delta A\not \in \delta _{ SB}\), the domain of \(\delta A\) is \(\llbracket 0, 255\rrbracket \) as the associated boolean variable \(\varDelta A\) is not assigned in the Step 1 solution.

\(\texttt{SBOX}\). We introduce new variables in order to represent probabilities associated with S-boxes. More precisely, for each byte \(\delta A\in \delta _{ SB}\) that passes through an S-box, we introduce an integer variable \(p_{A}\). This variable represents the binary logarithm of the probability to observe the output difference \(\delta { SA}\) given the input difference \(\delta A\). We consider the binary logarithm of the probability (instead of the probability), in order to avoid rounding errors. As the probability for Rijndael S-boxes is 0, \(2^{-7}\), \(2^{-6}\), or 1, and as we only consider values with non null probabilities, the values that may be assigned to \(p_{A}\) are \(-7\), \(-6\), and 0.

Constraint (C2) of Model 3 relates input differences, output differences and probabilities for the S-boxes applied on the plaintext, whereas Constraint (C9) relates them for the S-boxes in the \(\texttt{KeySchedule}\). In both cases, we use a table constraint which ensures that the triple of variables belongs to a table denoted \(\mathtt {T_{SBOX}}\): This table contains a triple \(\langle \delta _{in}, \delta _{out}, p\rangle \) for each couple of differential bytes \((\delta _{in},\delta _{out})\) such that the \(\texttt{DDT}\) content for \((\delta _{in},\delta _{out})\) is different from 0, and such that p is equal to: \( \log _2(\frac{ \# \{ (X, X') \in \llbracket 0, 255\rrbracket ^2 \mid ( X \oplus X' = \delta _{in} ) \wedge ( \texttt{SBOX}(X) \oplus \texttt{SBOX}(X') = \delta _{out} ) \} }{256}) \).

Objective Function. We introduce an integer variable obj to represent the binary logarithm of the probability of the differential characteristic. The objective function is: maximise obj. The actual probability is computed as \(2^{ obj}\).

figure e

Constraint (C1) of Model 3 ensures that obj is equal to the sum of every \(p_A\) such that \(\delta A\) is a byte that passes through an S-box. It also ensures that obj is strictly greater than the current lower bound LB.

\(\texttt{ShiftRows}\). Constraint (C3) of Model 3 is the straightforward translation of \(\texttt{ShiftRows}\).

\(\texttt{MixColumns}\). Constraints (C4) to (C7) represent the \(\texttt{MixColumns}\) operation. We introduce new integer variables to represent the result of applying the Galois Field multiplication to a byte: for each value \(v\in \{2,3\}\), and each byte \(\delta Y_i[j,k]\), the variable \(v\delta Y_i[j,k]\) is constrained to be equal to \(v\otimes \delta Y_i[j,k]\) by the table constraint (C4), where \(\texttt {T}_\texttt {MULv}\) contains every couple \((\delta A,\delta B)\in \llbracket 0,255\rrbracket ^2\) such that \(\delta B = v\otimes \delta A\). Then, Constraint (C5) ensures that \(\delta Z_i[j,k]\) is equal to the result of xoring four bytes (corresponding to the bytes at column k of \(\delta Y_i\) multiplied by the coefficients at row j of M). Again, this is done by using table constraints. The main novelty with respect to the model introduced in [13] for the AES is that we do not use a single table containing every tuple of five bytes such that the xor of these bytes is equal to 0, as this table is very large (\(2^{40}\) tuples of five bytes). Instead, we introduce new variables (denoted \(a_i[k]\), \(b_i[k]\), etc.), and we decompose the relation into three constraints such that each constraint ensures that the xor of three variables is equal to zero. For example, the relation

$$\begin{aligned} \delta Z_i[0,k] \oplus 2\delta Y_i[0,k]\oplus 3\delta Y_i[1,k]\oplus \delta Y_i[2,k]\oplus \delta Y_i[3,k] = 0 \end{aligned}$$

is decomposed into the three following constraints:

$$\begin{aligned} \langle 2\delta Y_i[0,k], 3\delta Y_i[1,k], a_i[k]\rangle \in \texttt {T}_\oplus \\ \langle \delta Y_i[2,k], \delta Y_i[3,k], b_i[k]\rangle \in \texttt {T}_\oplus \\ \langle a_i[k], b_i[k], \delta Z_i[0,k]\rangle \in \texttt {T}_\oplus \end{aligned}$$

where \(\texttt {T}_\oplus \) is the table which contains every triple \((\delta A,\delta B, \delta C)\in \llbracket 0,255\rrbracket ^3\) such that \(\delta A\oplus \delta B \oplus \delta C = 0\). This decomposition allows us to remove some variables and simplify constraints when we know that some variables are equal to 0. For example, if \(\varDelta Y_i[0,k]=0\) in the truncated characteristic, then we infer that \(2\delta Y_i[0,k] = 0\) (because \(2\otimes 0=0\)) and \(a_i[k] = 3\delta Y_i[1,k]\) (because \(0\oplus \delta Y_i[1,k] \oplus a_i[k] = 0\)). Hence, in this case the three previous constraints are replaced with: \( \langle \delta Y_i[2,k], \delta Y_i[3,k], b_i[k]\rangle \in \texttt {T}_\oplus \) and \( \langle 3\delta Y_i[1,k], b_i[k], \delta Z_i[0,k]\rangle \in \texttt {T}_\oplus \).

Constraints (C6) and (C7) are redundant constraints that model the \(\texttt{MixColumns}\) \(^{-1}\) operation: They do not change the solutions, but they speed up the solution process by allowing the solver to propagate in both forward (from the plaintext to the ciphertext) and backward (from the ciphertext to the plaintext) directions.

\(\texttt{AddRoundKey}\). Constraint (C8) is a straightforward implementation of \(\texttt{AddRoundKey}\), using table \(\texttt {T}_\oplus \).

\(\texttt{KeySchedule}\). Constraints (C9) and (C10) model the \(\texttt{KeySchedule}\). Constraint (C9) models the S-boxes of the \(\texttt{KeySchedule}\) (as described in Sect. 3.4). Constraint (C10) models the xors of the \(\texttt{KeySchedule}\), using table \(\texttt {T}_\oplus \). Note that we do not represent xors with constants as they are cancelled by differential cryptanalysis.

Fig. 1.
figure 1

Comparison of the approach of [13] (in green) with our new approach (in red), our new approach without \(\texttt{MixColumns}\) decomposition (in orange), and our new approach without exploiting bounds (in purple). Each point (xy) corresponds to an AES instance (with \(K_{len}=128\) on the left, \(K_{len}=192\) on the middle, and \(K_{len}=256\) on the right): y gives the CPU time in seconds needed to solve it (logscale) when there are \(N_r= x\) rounds. (Color figure online)

4 Results

The Step 1 model is implemented with the MiniZinc 2.4.3 modelling languageFootnote 3. This language is accepted by many CP solvers and preliminary experiments have shown us that the best performing solver is Picat-SATFootnote 4 2.8.6: This solver first translates the MiniZinc model into a SAT instance and then uses the Lingeling SAT solver to solve the SAT instance. The Step 2 model is implemented and solved with the CP library ChocoFootnote 5 v4.10.2.

In Fig. 1, we compare solving times of the approach of [13] with those of our new approach on the AES instances in order to evaluate the interest of our two modifications, i.e., (i) the interleaving of Steps 1 and 2 and the active use of LB and UB to stop the search whenever LB \(\ge \) UB (see Sect. 3.2), and (ii) the decomposition of the \(\texttt{MixColumns}\) constraint into 3 smaller table constraints (see Sect. 3.4). For this experiment, all runs have been performed on a single core of an Intel Xeon CPU E3 at 3.50 GHz with 4 cores under a Linux Ubuntu 20.04.1 (Focal Fossa) using at most 16 GB of RAM. There are two instances for which our new approach needs slightly more time than the approach of [13]: AES-128 when \(N_r=3\) (48 s instead of 23 s) and AES-256 when \(N_r=13\) (567 s instead of 479 s). For the 21 remaining instances our new approach is faster and, in some cases the difference is very large, e.g., 4,217 s instead of 95,389 s for AES-128 when \(N_r=5\), or 5,163 s instead of 30,059 s for AES-192 when \(N_r=10\). To evaluate the interest of each of our two modifications separately, we also display our new approach without (ii), and our new approach without (i). In many cases, each modification improves the solution process, and the combination of these two modifications is even better. However, modification (i) deteriorates the solution process when \(N_r\ge 10\) for AES-256. This comes from the fact that, for these instances, the optimal solution is strictly smaller than \(2^{-6 \cdot NB_{{\small sbox}}}\) so that the lower bound LB cannot be used to stop the search.

We give in Tables 3 and 4 the results of Algorithm 2 for every key length \(K_{len}\in \{128, 160, 192, 224, 256\}\), every block size \(C_{len}\in \{128,160,192,\) \(224,256\}\), and every number of rounds \(N_r\in \llbracket 3,x\rrbracket \) where x is the maximum number of rounds authorized (i.e., the maximal number of rounds for which \({ NB}_{\texttt{SBOX}}\) is smaller than the key length divided by 6 and the number of active S-boxes in the plaintext part is smaller than the block length divided by 6). For this experiment, all runs have been performed on a single core of an Intel Xeon E5-2630 v4 at 3.10 Ghz with 10 cores under a Linux Debian 10 (Buster) using at most 16 GB of RAM (default JVM configuration). This architecture was provided by the Grid5000 cluster [3].

Please note that there is a slight difference between the model used for Fig. 1 and the model used for Tables 3 and 4. Indeed, the model in [13] ignores the sboxes of the last round subkey. When the key has 128 or 256 bits, this does not change anything. However, when the key has 192 bits this may change results. To allow a fair comparison with [13], we also ignore the sboxes of the last round key in all models compared in Fig. 1. However, in Tables 3 and 4, we do consider the sboxes of the last round subkey. Therefore, when the key has 192 bits and the text 128 bits, some probabilities may be greater than those reported in [13], e.g., for the instance Rijndael-128-192 with 7 rounds the maximal probability is \(2^{-84}\) instead of \(2^{-78}\).

One instance (when \(C_{len}=128\), \(K_{len}=160\), and \(N_r=8\)) is still not completely solved at the time we submit the paper, after 38 days of computation. For this instance, the output value of Step1-opt is 23. Step1-enum has enumerated 7 truncated characteristics with 23 active S-boxes and none of them can be instantiated into a Step 2 characteristic. So far, we have enumerated 213 truncated characteristic with 24 active S-boxes and none of them can be instantiated into a Step 2 characteristic. Hence, for this instance the current upper bound is \(UB = 2^{-150}\). We have computed 189 instances with 25 active S-boxes and 1048 instances with 26 active S-boxes and the smaller probability is \(LB = 2^{-160}\).

All other instances have been solved within a reasonable amount of time: 82 are solved within 1,000 s; 24 need more than 1,000 s and less than 10,000 s (i.e., less than three hours); 10 need more than 10,000 s and less than 100,000 s (i.e., less than 28 h); and finally 2 need more than 28 h and less than 3 days.

In Tables 3 and 4, \(o_1\) is the output value of Step1-opt (called at line 1 of Algorithm 2), i.e., the initial value of \({ NB}_\texttt {SBOX}\); p is the output value of Algorithm 2, i.e., the probability of the optimal related-key differential characteristic; and time is the total CPU time spent by Algorithm 2 in seconds (this time both includes the running times of Picat-SAT and of Choco). We also report the number \(o_2\) of active S-boxes in the optimal differential characteristic. In most cases (91 out of 122 cases), \(o_1=o_2\) and \(p > 2^{-6\cdot (o_1+1)}\). In these cases, Algorithm 1 has enumerated Step 1 solutions for only one value of \({ NB}_\texttt {SBOX}\), and LB became larger than or equal to UB the first time \({ NB}_\texttt {SBOX}\) has been incremented.

In 17 cases (marked with \(^c\) just after \(o_2\)), \(o_1=o_2\) but it has been necessary to increment \({ NB}_\texttt {SBOX}\) at least once in order to check that no better characteristic can be found with more active S-boxes. For example, when \(C_{len}=128\), \(K_{len} = 224\), and \(r=9\), the best differential characteristic has 22 active S-boxes and its probability is \(2^{-139}\). As \(2^{-139} < 2^{-6 \cdot 23}\), Algorithm 1 has incremented \({ NB}_\texttt {SBOX}\) in order to check that it is not possible to have a larger probability (equal to \(2^{-139}\)) with 23 active S-boxes.

In 2 cases (marked with just after \(o_2\)), \(o_2 \ge o_1+1\) because none of the step 1 truncated characteristic with \(o_1\) active S-boxes can be instantiated into a Step 2 characteristic. In these two cases, Algorithm 1 has incremented \({ NB}_\texttt {SBOX}\) in order to enumerate Step 1 solutions with \(o_1+1\) active S-boxes and find the best differential characteristic.

Finally, in 13 cases (marked with just after \(o_2\)), \(o_2 \ge o_1+1\) because a better characteristic has been found with \(o_1+n\) active S-boxes (though at least one Step 1 solution can be instantiated into a Step 2 solution).

Table 3. Summary of the best related-key differential characteristics for Rijndael when \(C_{len}\in \{128,160\}\). The time is given in seconds.
Table 4. Summary of the best related-key differential characteristics for Rijndael when \(C_{len}\in \{192,224,256\}\). The time is given in seconds.

5 Attacks

We describe in this section the best attacks we could mount based on the distinguishers found in the previous section. More precisely, two particular distinguishers have a real interest in terms of attacks. The first one is an 11-round related-key differential characteristic distinguisher on Rijndael-128-224 (presented in Table 5 that allows us to mount an attack on 12 rounds (out of 13) of this cipher. There also exists a 12-round distinguisher for Rijndael-128-224 but due to its very low probability (equal to \(2^{-127}\)) for the data path, we do not manage to transform this distinguisher into an attack. And second, we also mount an attack on 12 rounds of Rijndael-160-256 (it has 14 rounds) based on the 11-round related-key differential characteristic distinguisher (presented in Table 6).

5.1 Attack on 12 Rounds of Rijndael-128-224

First, remember that the 12th round of Rijndael-128-224 is the last round for our attack so it does not contain a \(\texttt{MixColumns}\) operation. We base our attack on the distinguisher presented in Table 5. This distinguisher has a probability equal to \(2^{-169}\): \(2^{-103}\) coming from the state and \(2^{-66}\) coming from the key.

Table 5. The Best related key differential characteristic we found on 11 rounds of Rijndael-128-224 with probability equal to \(2^{-169}\). The four words represent the four rows of the state and are given in hexadecimal notation. Note that the last round does not contain the \(\texttt{MixColumns}\) operation.

Thus, the attack process is the following one. We submit \(M=2^{103 + \epsilon }\) pairs of plaintexts X and \(X'\) with the difference specified in the first line of Table 5 under the keys K and \(K'=K \oplus \delta K\) with the difference specified in the first line (second column) of Table 5. Then a possible propagation of the difference is the one shown in Table 5, and we obtain the corresponding ciphertexts C and \(C'\).

We know from Table 5 that the output of the 11th round (and the beginning of the 12th round) is of the form \(\delta X_{12} = \begin{pmatrix} 01 &{} 01 &{} 1F &{} 1F\\ 0 &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{} 0\\ \end{pmatrix}\). After passing through SubBytes and ShiftRows, it becomes: \(\delta SX_{12} = \begin{pmatrix} ? &{} ? &{} ? &{} ?\\ 0 &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{} 0\\ \end{pmatrix}\). From the keyschedule, the subkey difference \(\delta K_{12}\) will be of the form \(\begin{pmatrix} 21 &{} A \oplus 01 &{} A &{} A \oplus 01\\ 1F &{} B &{} B &{} B\\ 1F &{} C &{} C &{} C\\ 21 &{} D &{} D &{} D\\ \end{pmatrix}\) where ABC and D are unknown difference. Thus the difference between C and \(C'\) will be of the form \(\delta C= \begin{pmatrix} ? &{} ? &{} ? &{} ?\\ 1F &{} B &{} B &{} B\\ 1F &{} C &{} C &{} C\\ 21 &{} D &{} D &{} D\\ \end{pmatrix}\).

So the attack works as follows:

  1. 1.

    We filter on the values 1F, 1F, and 21 at positions (1, 0), (2, 0), and (3, 0) in \(\delta C\) before the last ShiftRows. It remains \(2^{103+\epsilon -24}=2^{79+\epsilon }\) pairs of plaintexts/ciphertexts. Moreover, we know that the three bytes at positions (1, 1), (1, 2) and (1, 3) must be equal (this remark also holds for the second and the third rows). This leads to another filter of 48 bits.

  2. 2.

    We guess the byte value of \(K_{12}\) at position (0, 0) with a cost of \(2^{8}\). Then, we decipher this byte from C and \(C'\) to check if it is equal to 01 at the input of the 12th round. Then, it filters \(2^{-8}\) values. Moreover, the known byte at position (0, 0) in \(K_{12}\) gives us the difference D (due to the keyschedule construction).

  3. 3.

    We can guess the byte at position (1, 0) in \(K_{12}\) and check the value at the input of the 12th round at position (1, 0), by deciphering from C and \(C'\). Then, it filters \(2^{-8}\) values. Moreover, we can compute the difference A.

  4. 4.

    We can guess the three bytes at positions (0, 1), (0, 2), and (0, 3) in \(K_{12}\) and check the value at the input of the 12th round at the same position knowing the difference A, by deciphering from C and \(C'\). Then, it filters \(2^{-24}\) values.

  5. 5.

    We can guess the byte at position (3, 0) in \(K_{12}\) and check the value at the input of the 12th round at position (3, 0), by deciphering from C and \(C'\). Then, it filters \(2^{-8}\) values.

  6. 6.

    We can guess the byte at position (2, 0) in \(K_{12}\) and check the value at the input of the 12th round at position (2, 0), by deciphering from C and \(C'\). Then, it filters \(2^{-8}\) values.

Then, we have guessed 7 bytes of the subkey \(K_{12}\), 56 bits of key, and we have filtered an equivalent of 72 bits, leading to keep \(2^{103+\epsilon -72}=2^{31+\epsilon }\) pairs of plaintexts/ciphertexts. After guessing the 7-byte difference in the subkey \(K_{12}\), \(\delta _{K_{12}}\) is fully determined. Thus, guessing the bytes of one key state could determine the bytes of the related-key state.

The related-key differential characteristic given in Table 5 has a probability to happen for the state part equals to \(2^{-103}\). Thus, if we use \(2^{104}\) plaintexts/ciphertexts in the related-key differential attack on 12 rounds of Rijndael-128-224, then the right difference of the 56 bits of the last subkey will be counted at least twice on average whereas the probability that a bad key appears twice is really low (around \(2^{32-72}=2^{-40}\)). The success probability computed using the results of [23] is around 97%. The time complexity of the attack is about \(2^{105}\) encryptions and the attack succeeds if the key follows the characteristic described in Table 5. In other words, we have a set of weak keys of size \(2^{224-66}=2^{158}\).

The 168 remaining bits of the master key can be guessed through guessing more bytes in \(K_{11}\) and in \(K_{12}\) and filtering according to the remaining values in \(\delta X_{11}\) and the \(\texttt{SBox}\)es of the key schedule.

5.2 Attack on 12 Rounds of Rijndael-160-256

Table 6. The Best related key differential characteristic we found on 11 rounds of Rijndael-160-256 with probability equal to \(2^{-204}\). The four words represent the four rows of the state and are given in hexadecimal notation. Note that the last round does not contain the \(\texttt{MixColumns}\) operation.

In the same way, from the related-key differential characteristic distinguisher on 11-round of Rijndael-160-256 presented in Table 6, we can easily mount a 12-round attack against Rijndael-160-256 that has 14 rounds. Note that the 12th round does not contain the \(\texttt{MixColumns}\) operation as it is the last round of our attack. The 11-round related-key differential characteristic distinguisher presented in Table 6 has a probability equal to \(2^{-204}\): \(2^{-128}\) coming from the difference in the state and \(2^{-76}\) coming from the key.

Thus, the attack process is the following one. We submit \(M=2^{128 + \epsilon }\) pairs of plaintexts X and \(X'\) with the difference specified in the first line of Table 6 under the keys K and \(K'=K \oplus \delta K\) with the difference specified in the first line (second column) of Table 6. Then a possible propagation of the difference is the one shown in Table 6. Then, we obtain the corresponding ciphertexts C and \(C'\).

Then, we know from Table 6 that the output of the 11th round (and the beginning of the 12th round) is of the form

\(\delta X_{12} = \begin{pmatrix} 82 &{} FF &{} 0 &{} 0 &{} E0\\ 0 &{} F2 &{} 0 &{} 0 &{} 0\\ 0 &{} F2 &{} 0 &{} 0 &{} 0\\ 0 &{} ED &{} 70 &{} 70 &{} 70\\ \end{pmatrix}\). After passing through SubBytes and ShiftRows, it becomes: \(\delta SX_{12} = \begin{pmatrix} ? &{} ? &{} 0 &{} 0 &{} ?\\ ? &{} 0 &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{} 0 &{} ?\\ ? &{} ? &{} 0 &{} ? &{} ?\\ \end{pmatrix}\). From the keyschedule, the subkey differences \(\delta K_{12}\) will be of the form \(\begin{pmatrix} 82 &{} 0 &{} 82 &{} 0 &{} F\\ A &{} A &{} A &{} A &{} D\\ B &{} B &{} B &{} B &{} E\\ C &{} C &{} C &{} C &{} E0\\ \end{pmatrix}\) where ABCDE and F are unknown difference. Thus the difference between C and \(C'\) will be of the form \(\delta C= \begin{pmatrix} ? &{} ? &{} 82 &{} 0 &{} ?\\ ? &{} A &{} A &{} A &{} D\\ B &{} B &{} B &{} B &{} ?\\ ? &{} ? &{} C &{} ? &{} ?\\ \end{pmatrix}\).

So the attack works as follows:

  1. 1.

    For all the \(2^{128 + \epsilon }\) encrypted pairs of plaintexts/ciphertexts, we filter on the values 82 and 0 at positions (0, 2) and (0, 3) in \(\delta C\). This filters \(2^{-16}\) values. Then, it remains \(2^{112 + \epsilon }\) encrypted pairs of plaintexts/ciphertexts.

  2. 2.

    We guess the three bytes at positions (1, 4), (2, 4), and (3, 4) in \(K_{11}\) for a cost of \(2^{24}\). This gives us the values of differences A, B and C. With those known values, we filter on \(\delta SX_{12}\) on the 8 positions that are equal to 0 after removing A, B, and C (positions (1, 1), (1, 2), (1, 3), (2, 0), (2, 1), (2, 2), (2, 3) and (3, 2)). This filters \(2^{-64}\) values.

  3. 3.

    We guess 6 bytes of \(K_{12}\) (those at positions (0, 0), (0, 1), (1, 0), (3, 0), (3, 1) and (3, 3)). We filter the corresponding \(2^{-48}\) values on \(\delta X_{12}\) (before the \(\texttt{SBOX}\)es) at the same positions.

  4. 4.

    We guess the byte at position (2, 3) in \(K_{12}\) to get one new byte in \(\delta K_{12}\) at position (1, 4) equal to D and check if the difference is equal to 0 at position (1, 4) in \(\delta X_{12}\). It filters \(2^{-8}\) values.

  5. 5.

    We guess the byte at position (3, 4) in \(K_{12}\) to filter one byte of value in \(\delta X_{12}\) at position (3, 4). We guess another byte at position (2, 4) in \(K_{12}\) to filter the byte value at position (2, 4) in \(\delta X_{12}\). And finally, we guess the two bytes at positions (1, 3) and (0, 4) in \(K_{12}\) to filter the byte value at position (0, 4) in \(\delta X_{12}\).

We have guessed a total of 112 key bits and we have filtered, in the initial step 16 bits and the equivalent of 32 bits in the second step and the last step leading to stay with \(2^{80+\epsilon }\) pairs of plaintexts/ciphertexts.

The related-key differential characteristic given in Table 6 has a probability to happen for the state part equals to \(2^{-128}\). Thus, if we use \(2^{129}\) plaintexts/ciphertexts in the related-key differential attack on 12 rounds of Rijndael-160-256, then the right difference of the 112 bits of the last subkey will be counted at least twice on average whereas the probability that a bad key appears twice is really low (around \(2^{81-112}=2^{-31}\)). The success probability computed using the results of [23] is around 97% also. The time complexity of the attack is about \(2^{130}\) encryptions and the attack succeeds if the key follows the characteristic described in Table 6. In other words, we have a set of weak keys of size \(2^{256-76}=2^{180}\).

The 144 remaining bits of the master key can be guessed through guessing more bytes in \(K_{11}\) and in \(K_{12}\) and filtering according the remaining values in \(\delta X_{11}\) and the \(\texttt{SBox}\)es of the key schedule.

6 Conclusion

In this paper, we have extended and improved the models initially proposed in [13] for the AES to the 25 instances of Rijndael. This allowed us to compute optimal related-key differential characteristics for all Rijndael instances but one (and provide upper and lower bounds for the remaining one). We sum up in Table 7 the best attacks described in this paper.

Table 7. Summary of the best related-key differential attacks we found on different Rijndael instances. The last column displays the number of keys for which the attack works.

Those results are obtained using a two-step strategy that is feasible in terms of memory usage and time consumption. This strategy is modelled in MiniZinc, and it is solved by combining two solvers: Picat-SAT for Step 1 and Choco for Step 2. It essentially means that we could today automate a big part of cryptanalysis through the use of generic solvers: we have gone from the artisan age of cryptanalysis to its industrial age.