1 Introduction

Cryptanalysis of block ciphers has witnessed remarkable progress since Biham and Shamir proposed the differential attack on DES [8] in 1990. The differential attack is a basic and widely used cryptanalytic approach against block ciphers, and it has been generalised and combined with other cryptanalytic techniques to reduce attack complexity. High probability differential characteristics are the first and foremost requirement for the attack to succeed. In 1994, Matsui proposed a method based on the branch-and-bound technique [17] to search for high probability differential characteristics. In 2011, Mouha et al. proposed a new technique using mixed integer linear programming (MILP) for this search [18]; the MILP-based method uses optimization solvers to construct high probability differential characteristics. Most block ciphers follow Shannon's principles [14] and the wide trail design strategy [11] to thwart the differential attack. In practice, we need a differential with probability greater than \(2^{-n}\) to distinguish r rounds of an n-bit block cipher from random data. Any r-round characteristic with probability less than \(2^{-n}\) cannot be used to mount the differential attack on r or more rounds of a block cipher: a differential characteristic is useful only as long as it requires less data than the available limit, i.e. \(2^n\) pairs. The motivation of this paper is to find a technique which can extend classical differential characteristics without (much) increasing the data complexity. A machine learning based differential cryptanalysis approach works well to solve this problem.

Machine learning techniques are used to discover subtle relations in data. Since such relations define the security strength of a cipher, identifying them plays an important role. In the cryptanalysis domain, machine learning techniques have been explored only recently to mount a key recovery attack using differential cryptanalysis [12].

In this paper, we combine classical and machine learning techniques to design an ML based generic extension for any classical differential distinguisher. This approach provides better results with (much) lower data complexity. We extend an r-round high probability classical differential distinguisher with an s-round ML based differential distinguisher; the extended distinguisher distinguishes \((r+s)\) rounds of a block cipher using less data. With this extension, the hybrid distinguisher outperforms both the classical and the ML based distinguisher. We call this hybrid distinguisher a differential-ML distinguisher. The technique is demonstrated on three different types of lightweight block ciphers, SPECK32 [4], SIMON32 [4], and GIFT64 [3], and better results are obtained with very high accuracy.

The remaining part of the paper is organised as follows. In Sect. 2, we compare our technique with previous work. In Sect. 3, we provide a brief description of the lightweight block ciphers SIMON32, SPECK32, and GIFT64. We discuss the classical differential distinguisher and the machine learning based differential distinguisher in Sect. 4, where we also describe existing work on differential distinguishers using machine learning. In Sect. 5, we propose a novel technique to construct the differential-ML distinguisher. We demonstrate our technique on the SPECK32, SIMON32, and GIFT64 block ciphers and present the results in Sect. 6. The paper is concluded in Sect. 7.

Notations: We have used the following notations in this paper:

[Table of notations]

Conventions: Throughout the paper, we refer to an r-round differential distinguisher with a single input and single output difference as a classical differential distinguisher \(D^{CD}_{1\cdots r}\).

2 Comparison with the Previous Work

A. Gohr [12] used machine learning techniques and proposed the idea of learning differences to mount a key recovery attack. He presented a technique to construct an ML based differential distinguisher and used it in a key recovery attack on SPECK32. Gohr compared this technique with the classical differential attack and showed that the complexity of the key recovery attack is reduced by using the ML distinguisher. Baksi et al. [2] used the same approach to design ML distinguishers for the GIMLI cipher and GIMLI hash [5]. Various ML architectures are compared in [2], and it is claimed that the ML distinguisher for GIMLI outperforms the classical differential distinguisher.

A. Gohr presented an 11-round key recovery attack on SPECK32. In this attack, a 7-round ML based distinguisher is extended to 9 rounds by prepending a 2-round high probability differential distinguisher. The accuracy and the data complexity of the 9-round extended distinguisher are not discussed explicitly in Gohr's approach; although the accuracy of the extended distinguisher is quite low, it is still usable for key recovery with various cipher specific optimizations. In this paper, we present a new technique to extend an r-round classical differential distinguisher using an s-round ML distinguisher. The extended distinguisher works as an \((r+s)\)-round differential-ML distinguisher. The proposed technique ensures that the accuracy of the differential-ML distinguisher is high and comparable to the classical differential distinguisher. We experimentally show that the proposed differential-ML distinguisher achieves an exponential reduction in the data complexity of the \((r+s)\)-round distinguisher.

3 Block Ciphers: SPECK32, SIMON32, and GIFT64

SPECK and SIMON are two families of block ciphers proposed by Beaulieu et al. [4] in 2013. These block ciphers are designed to provide high performance across a range of devices. There are 10 versions of each cipher based on block and key size combinations, which makes them suitable for a wide range of applications. We discuss the encryption algorithm of the 32-bit block size and 64-bit key variant of each cipher. We omit the key expansion algorithms; the original paper [4] can be referred to for more details.

GIFT is designed by improving the bit permutation of the lightweight block cipher PRESENT. Based on the input plaintext block size, there are two versions of GIFT, namely GIFT64 and GIFT128. In each version, a 128-bit key is used to encrypt the input plaintext. A brief description of the SPECK32, SIMON32, and GIFT64 block ciphers is provided in the following subsections.

3.1 Description of SPECK32

SPECK32 is a block cipher with a 32-bit block size and 64-bit key size. There are 22 rounds in total. SPECK32 is based on the Feistel network and can be represented as the composition of two Feistel maps. Its encryption algorithm divides the 32-bit input into two 16-bit words (\( L_{r},R_{r}\)), and the key expansion algorithm extracts a 16-bit round subkey \((RK_r)\) for each round. The round function comprises addition modulo \(2^{16}\), bitwise XOR, and left and right circular shift operations, as described in Algorithm 1.

[Algorithm 1: SPECK32 encryption]
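As an illustration, the round function above can be sketched in Python. This is an unofficial sketch, not the authors' code; it also includes the SPECK32/64 key schedule from [4], which the text omits (the schedule reuses the round function with the round index as round key).

```python
MASK16 = 0xFFFF  # 16-bit words

def rol(x, r):
    return ((x << r) | (x >> (16 - r))) & MASK16

def ror(x, r):
    return ((x >> r) | (x << (16 - r))) & MASK16

def speck32_round(L, R, k):
    # L' = ((L >>> 7) + R mod 2^16) XOR RK, R' = (R <<< 2) XOR L'
    L = (ror(L, 7) + R) & MASK16
    L ^= k
    R = rol(R, 2) ^ L
    return L, R

def speck32_round_keys(l2, l1, l0, k0):
    # Key schedule of SPECK32/64 [4]: round i applies the round
    # function to (l_i, k_i) with the round index i as "round key".
    l, k = [l0, l1, l2], [k0]
    for i in range(21):
        l_next, k_next = speck32_round(l[i], k[i], i)
        l.append(l_next)
        k.append(k_next)
    return k  # 22 round keys

def speck32_encrypt(L, R, key):
    for rk in speck32_round_keys(*key):
        L, R = speck32_round(L, R, rk)
    return L, R
```

With the published test vector from [4] (key 0x1918 0x1110 0x0908 0x0100, plaintext 0x6574 0x694C), this sketch produces the ciphertext 0xA868 0x42F2.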

3.2 Description of SIMON32

SIMON32 is a block cipher with a 32-bit block size and 64-bit key size. There are 32 rounds in total, and SIMON32 is also based on the Feistel network. Its encryption algorithm divides the 32-bit input into two 16-bit words \((L_{r}, R_r)\). The key expansion algorithm expands the 64-bit master key to provide a 16-bit round subkey (\(RK_r\)) for each round. In each round, a round function consisting of bitwise XOR, bitwise AND, and left circular shift operations is applied to the left 16-bit word. The encryption algorithm of SIMON32 is described in Algorithm 2.

[Algorithm 2: SIMON32 encryption]
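The SIMON round function described above can be sketched as follows; this is an illustrative sketch (not the authors' code), paired with its Feistel inverse to make the structure explicit.

```python
MASK16 = 0xFFFF  # 16-bit words

def rol(x, r):
    return ((x << r) | (x >> (16 - r))) & MASK16

def f(x):
    # Non-linear part of the round: (x <<< 1 AND x <<< 8) XOR (x <<< 2)
    return (rol(x, 1) & rol(x, 8)) ^ rol(x, 2)

def simon32_round(L, R, k):
    # Feistel step: L' = R XOR f(L) XOR RK, R' = L
    return R ^ f(L) ^ k, L

def simon32_round_inverse(L, R, k):
    # Undo one round: previous left word is R,
    # previous right word is L XOR f(R) XOR RK
    return R, L ^ f(R) ^ k
```

Because each round is a Feistel step, applying `simon32_round_inverse` with the same round key recovers the previous state exactly.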

3.3 Description of GIFT64

GIFT64 encrypts a 64-bit plaintext block using a 128-bit key and generates a 64-bit ciphertext block [3]. There are 28 rounds in total. In each round, the round function applies an S-box, a bit permutation, and round subkey and constant additions. The key expansion algorithm extracts 32-bit subkeys \((RK_r)\) from the 128-bit key. The encryption algorithm uses a 4-bit S-box S (Table 1), a bit permutation \(P_{64}\) (Table 2), 6-bit round constants \(C_r\) (Table 3), and 32-bit round subkeys \(RK_r\), as described in Algorithm 3.

[Algorithm 3: GIFT64 encryption]

S-box: The 4-bit S-box (Table 1) is applied 16 times in parallel in each round.

Table 1. S-Box
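The S-box layer can be sketched as below. The table contents are not reproduced here, so the S-box values in this sketch are taken from the GIFT specification [3]; the function name is illustrative.

```python
# 4-bit S-box of GIFT as specified in [3]
GIFT_SBOX = [0x1, 0xA, 0x4, 0xC, 0x6, 0xF, 0x3, 0x9,
             0x2, 0xD, 0xB, 0x7, 0x5, 0x0, 0x8, 0xE]

def sub_cells(state):
    # Apply the 4-bit S-box to each of the 16 nibbles
    # of the 64-bit state in parallel
    out = 0
    for i in range(16):
        nibble = (state >> (4 * i)) & 0xF
        out |= GIFT_SBOX[nibble] << (4 * i)
    return out
```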

Bit Permutation: The diffusion layer uses a permutation \(P_{64}\) (Table 2) on 64 bits in each round.

Table 2. Bit permutation

Round Constants: In each round, the 6-bit round constant \(C_r\) given in Table 3 is used, where \(c_0\) refers to the least significant bit. For subsequent rounds, it is updated as follows:

$$(c_5, c_4, c_3, c_2, c_1, c_0) \leftarrow (c_4, c_3, c_2, c_1, c_0, c_5 \oplus c_4 \oplus 1)$$
Table 3. Round constants
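The update rule above amounts to a 6-bit LFSR step. A sketch follows; the initial constant 0b000001 for the first round is taken from the GIFT specification [3].

```python
def next_round_constant(c):
    # (c5, c4, c3, c2, c1, c0) <- (c4, c3, c2, c1, c0, c5 XOR c4 XOR 1)
    new_bit = ((c >> 5) ^ (c >> 4) ^ 1) & 1
    return ((c << 1) & 0x3F) | new_bit
```

Starting from c = 0b000001, repeated application yields the constant sequence 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3E, 0x3D, ...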

4 Differential Cryptanalysis

Differential cryptanalysis was first applied to DES [19], reducing its exhaustive attack complexity, and it paved the way for other cryptanalytic techniques, e.g. linear [16], impossible differential [6], and algebraic [10] attacks [9]. While designing a block cipher, its output is tested for indistinguishability from a random permutation. Even when there is no exploitable relationship between individual inputs and outputs, there may exist non-random relations between input and output differences. The basic approach of the differential attack is to study the propagation of input differences and to exploit non-random relations between input and output differences. The attack relies on differential characteristics that provide a high probability relation between the input and output differences. High probability differential characteristics are used in key recovery attacks by adding some rounds on top of and below the differential characteristic.

4.1 Classical Differential Distinguisher

There exist several automated techniques to search for optimal differential distinguishers of block ciphers [13]. In this paper, we use the available differential distinguishers for SPECK32 [1] and SIMON32 [7]. We extend the 6-round distinguisher for SPECK32 and the 7-round distinguisher for SIMON32 using ML distinguishers. For GIFT64, we construct a high probability differential characteristic for 4 rounds using the branch-and-bound based search technique [15] and extend this distinguisher with an ML distinguisher.

4.1.1 Differential Characteristic for SPECK32 

Abed et al. [1] presented 9-round differential characteristics for SPECK32 with probability \(2^{-31}\). We choose the 8-round differential characteristic presented in Table 4 [1] and use its 6-round part (\(\varDelta _0\rightarrow \varDelta _6\)), which has probability \(2^{-13}\), in our experiments.

Table 4. Differential characteristic of SPECK32 [1]

4.1.2 Differential Characteristic for SIMON32 

Biryukov et al. [7] presented 12-round differential characteristics for SIMON32 with probability \(2^{-34}\). From the 12-round characteristic presented in Table 5 [7], we use the 7-round differential characteristic (\(\varDelta _0\rightarrow \varDelta _7\)) with probability \(2^{-16}\) in our experiments.

Table 5. Differential characteristic of SIMON32 [7]

4.1.3 Differential Characteristic for GIFT64 

We construct a 4-round optimal differential characteristic with high probability using the branch-and-bound based search algorithm [15]. We use this 4-round differential characteristic, which has probability \(2^{-12}\), to construct the differential-ML distinguisher for GIFT64 (Table 6).

Table 6. Differential characteristic of GIFT64

4.2 Differential Distinguisher Using Machine Learning

For a chosen input difference, we use the neural distinguisher design proposed by A. Gohr [12]. We also incorporate the improvements to this design suggested by Baksi et al. [2] and use dense layers of an MLP (multilayer perceptron) instead of convolutional networks. We use two hidden layers with 1024 neurons each and train the model on ciphertext differences rather than ciphertext pairs. These improvements increase the learning efficiency of the model.

The model is trained on data with chosen and random differences. This approach works well because the model learns both sensitivity and specificity in the data: sensitivity corresponds to true positive predictions, while specificity corresponds to true negative predictions. Initially, we generate a set of random plaintexts \((P_{1},P_{2},\cdots ,P_{N})\) and assign a label 0 or 1 to each plaintext randomly. If the label of plaintext \(P_i\) is 1, then we generate another plaintext \(P^{'}_{i}\) having difference \(\varDelta _r\) with \(P_i\); otherwise \(P^{'}_{i}\) is generated randomly. The difference \(\varDelta _r\) corresponds to the output difference of the classical differential distinguisher. We encrypt the plaintexts \(P_{i}\) and \(P^{'}_{i}\) using the s-round CIPHER\(_s\) to get the ciphertexts \(C_{i}\) and \(C^{'}_{i}\). The set of ciphertext differences \((C_{i} \oplus C^{'}_{i})\), along with the labels, is used as the training data (TD) for the training phase. In addition to the training data, we also generate validation data (VD), which is used to determine the validation accuracy of the trained model M. The sizes of TD and VD are subject to the available computing resources. We train the model M on the training data until the validation accuracy saturates, i.e. there is negligible improvement in the validation accuracy of the \(i^{th}\) training epoch \((\alpha _{s_{i}})\) in comparison to the validation accuracies \((\alpha _{s_{i-1}}\) and \( \alpha _{s_{i-2}})\) of the last two training epochs. We consider the model M a valid distinguisher \((D^{ML}_{r+1\cdots r+s})\) if the validation accuracy \((\alpha _{s})\) is at least 0.51 (Algorithm 4).
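The data generation just described can be sketched as follows. This is a simplified illustration, not the authors' code: `encrypt` stands in for the s-round CIPHER\(_s\) and `delta` for \(\varDelta _r\).

```python
import random

def make_training_data(encrypt, delta, n, block_bits=32):
    # Label 1: second plaintext differs from the first by delta.
    # Label 0: second plaintext is drawn at random.
    # Each sample is the resulting ciphertext difference.
    X, y = [], []
    for _ in range(n):
        label = random.getrandbits(1)
        p1 = random.getrandbits(block_bits)
        p2 = p1 ^ delta if label else random.getrandbits(block_bits)
        X.append(encrypt(p1) ^ encrypt(p2))
        y.append(label)
    return X, y
```

The labelled pairs `(X, y)` can then be split into the training data TD and validation data VD described above.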

Once a valid ML based distinguisher \((D^{ML}_{r+1\cdots r+s})\) is obtained, we generate a pair of plaintexts with the chosen difference \((\varDelta _{r})\). The ORACLE is queried for the corresponding ciphertexts, and \(D^{ML}_{r+1\cdots r+s}\) makes a prediction on the ciphertext difference. If the prediction probability is at least 0.51, then we consider the ciphertext pair to belong to CIPHER\(_s\); otherwise we do not. The accuracy of such a prediction is expected to be \(\alpha _{s}\).

[Algorithm 4: training of the ML distinguisher]

5 Differential-ML Distinguisher: Extending Classical Differential Distinguisher Using Machine Learning

Accuracy plays an important role in the design of a machine learning based differential distinguisher. There is a trade-off between the accuracy and the number of rounds covered: if we increase the number of rounds, the accuracy of the ML distinguisher may decrease. The data complexity of a low accuracy distinguisher cannot be compared with that of a classical distinguisher due to the large number of false positives and false negatives produced by the ML distinguisher. Therefore, we propose a new technique which uses the ML distinguisher to extend an existing classical distinguisher. Since the accuracy of the proposed extended distinguisher is high, we can compare its data complexity with the classical distinguisher.

Fig. 1. Extending the classical distinguisher using the ML distinguisher (1. Classical distinguisher: \(D^{CD}_{1\cdots r}\) 2. ML distinguisher: \(D^{ML}_{r+1\cdots r+s}\) 3. Differential-ML distinguisher: \(D^{CD\rightarrow ML}_{1\cdots r+s}\))

[Algorithm 5: construction and prediction phases of the differential-ML distinguisher]

To extend the r-round classical differential distinguisher \(D^{CD}_{1\cdots r}\) (\(\varDelta _0 \rightarrow \varDelta _r\)), we use the difference \(\varDelta _r\) to model an s-round distinguisher (\(D^{ML}_{r+1\cdots r+s}\)) with accuracy \(\alpha _s\). The accuracy \(\alpha _s\) defines the distinguishing ability of the distinguisher; better accuracy gives better predictions. The distinguisher \(D^{ML}_{r+1\cdots r+s}\) can then be used to distinguish the output of CIPHER\(_{s}\) with accuracy \(\alpha _s\).

The data complexity of the r-round classical differential distinguisher (\(\varDelta _0 \rightarrow \varDelta _r\), probability \(2^{-p_r}\)) is \(2^{p_r}\) chosen plaintext pairs, since we expect at least one occurrence of \(\varDelta _r\) among the output differences. If we provide \(2^{p_r}\) ciphertext pairs after \((r+s)\) rounds of encryption to the distinguisher \(D^{ML}_{r+1\cdots r+s}\), then we expect it to correctly predict the one occurrence corresponding to the difference \(\varDelta _r\). Since the ML distinguisher learns multiple output differences, we expect it to learn the pattern of differences suggested by the classical differential distinguisher. Therefore, we require at least \(2^{p_r}\) data to model the \((r+s)\)-round differential-ML distinguisher (\(D^{CD\rightarrow ML}_{1\cdots r+s}\)). The accuracy \(\alpha _s\) of the s-round ML distinguisher now plays a significant role: if \(\alpha _{s}\) is low, then the accuracy \(\alpha _{r+s}\) of the distinguisher \(D^{CD\rightarrow ML}_{1\cdots r+s}\) on \(2^{p_r}\) data will also be low. The accuracy of the differential-ML distinguisher must be high for it to be comparable with the (\(r+s\))-round classical differential distinguisher. To increase the accuracy \(\alpha _{r+s}\), we propose a novel technique which requires additional data (\(2^{\delta }\)). The data complexity of the differential-ML distinguisher \(D^{CD\rightarrow ML}_{1\cdots r+s}\) therefore becomes \(2^{p_r + \delta }\), where \(\delta \) defines the additional data required to increase the accuracy of predictions (Fig. 1).

In our technique, we define the differential-ML distinguisher \(D^{CD\rightarrow ML}_{1\cdots r+s}\) with five parameters (\(D^{CD}_{1\cdots r}\), \(D^{ML}_{r+1\cdots r+s}\), \(T=\alpha _s\), \(C_T\), \(\beta \)), where T is the threshold probability, \(C_T\) is the cutoff on the number of pairs with prediction probability \(\ge \) T, and \(\beta \) is the data complexity of the differential-ML distinguisher. These parameters are required to construct the differential-ML distinguisher. We set \(\alpha _s\) as the threshold probability (T) and propose an experimental approach to calculate \(C_T\) and \(\beta \) (Algorithm 5). We start with the minimum data (\(2^{p_r}\)), i.e. \(\delta = 0\). We generate a set of \(2^{p_r}\) plaintext pairs with the difference \(\varDelta _0\) and another set of \(2^{p_r}\) plaintext pairs with random differences. These pairs are encrypted using CIPHER\(_{r+s}\). The distinguisher \(D^{ML}_{r+1\cdots r+s}\) is used to get the prediction probabilities \(p_{\varDelta _{0} }\) and \(p_{R}\), as explained in Algorithm 5.

Using these probabilities, we obtain the True Positive (TP) and True Negative (TN) counts. We repeat this process 10 times and plot the curves of the TP and TN values. If the TP and TN curves intersect, we increase the data requirement and repeat the process with the increased data, until non-intersecting curves are obtained. Once such curves are obtained, the data complexity (\(\beta \)) of the differential-ML distinguisher \(D^{CD\rightarrow ML}_{1\cdots r+s}\) becomes \(2^{p_{r}+\delta }\). To calculate \(C_T\), we take the average of the closest points on the TP and TN curves; these closest points correspond to the minimum number of predictions on the TP curve and the maximum number of predictions on the TN curve. The value of \(C_T\) is taken as the separation cutoff between the TP and TN curves, and it is used by the distinguisher (\(D^{CD\rightarrow ML}_{1\cdots r+s}\)) to classify a data sample correctly. The complete procedure to construct the differential-ML distinguisher is described in Algorithm 5.
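The separation test and cutoff computation just described can be sketched as follows (hypothetical helper names; `tp_counts` and `tn_counts` hold the prediction counts from the 10 repetitions):

```python
def curves_intersect(tp_counts, tn_counts):
    # The TP and TN curves are separated only when every TP count
    # exceeds every TN count
    return min(tp_counts) <= max(tn_counts)

def cutoff_ct(tp_counts, tn_counts):
    # C_T: average of the closest points, i.e. the minimum TP count
    # and the maximum TN count
    return (min(tp_counts) + max(tn_counts)) / 2
```

If `curves_intersect` returns True, the data size \(2^{p_r+\delta }\) is increased and the experiment repeated; otherwise `cutoff_ct` gives \(C_T\).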

The differential-ML distinguisher \(D^{CD\rightarrow ML}_{1\cdots r+s}\) acts as an (\(r+s\))-round distinguisher. We choose a set of \(\beta \) plaintext pairs with the difference \(\varDelta _0\) and get the ciphertext pairs after (\(r+s\)) rounds. The distinguisher \(D^{CD\rightarrow ML}_{1\cdots r+s}\) makes a prediction for each ciphertext pair. If the number of pairs with prediction probability greater than T is above the cutoff \(C_T\), then the distinguisher \(D^{CD\rightarrow ML}_{1\cdots r+s}\) classifies the given data as output of the target CIPHER\(_{r+s}\); otherwise it does not. The prediction procedure is described in the prediction phase of Algorithm 5.
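The prediction phase can be sketched as follows (hypothetical function name; `probs` are the per-pair prediction probabilities output by \(D^{ML}_{r+1\cdots r+s}\) on the \(\beta \) ciphertext differences):

```python
def differential_ml_predict(probs, T, C_T):
    # Count the pairs predicted with probability >= T and compare
    # the count against the cutoff C_T
    hits = sum(p >= T for p in probs)
    return hits >= C_T  # True: sample attributed to CIPHER_{r+s}
```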

With the proposed distinguisher \(D^{CD\rightarrow ML}_{1\cdots r+s}\), we can distinguish the output of CIPHER\(_{r+s}\) with very high accuracy, and the distinguisher can be used to mount a key recovery attack. The experiments to construct the differential-ML distinguishers are presented in the next section.

6 Experimental Results

We construct differential-ML distinguishers for the 32-bit variants of the lightweight block ciphers SPECK and SIMON and the 64-bit variant of GIFT. We experimented on 32-bit and 64-bit block ciphers due to constraints on the available resources; with more computing power, ciphers with larger block sizes can be explored. We extend the classical differential distinguishers discussed in Sect. 4 with ML distinguishers. Using this technique, we construct differential-ML distinguishers for 9-round SPECK32, 12-round SIMON32, and 8-round GIFT64 with (much) lower data complexity than the classical distinguishers.

We used the Keras library with GPU support in Google Colab for model training and predictions. In each experiment, the ADAM optimizer is used for adaptive learning rates and mean squared error is used as the loss function. The validation batch accuracy is considered as the accuracy (\(\alpha _{s}\)) of the trained model.

6.1 Differential-ML Distinguisher: SPECK32

For SPECK32, we use the 6-round classical differential characteristic (\(\varDelta _0\rightarrow \varDelta _6\)) described in Table 4, which yields the output difference 0x850A9520 after 6 rounds with probability \(2^{-13}\). We train the 3-round ML distinguisher using \(\varDelta _6\) as the input difference; the model is trained to an accuracy of 0.79 using Algorithm 4. The batch accuracy and loss are described in Appendix A.

6.1.1 Construction 

The probability of the 6-round classical differential distinguisher is \(2^{-13}\); therefore, we require at least \(2^{13}\) data to make predictions with the 9-round differential-ML distinguisher. We calculate T, \(C_T\), and \(\beta \) as discussed in Algorithm 5 and construct the 9-round differential-ML distinguisher \(D^{CD\rightarrow ML}_{1\cdots r+s}\) by extending the 6-round classical distinguisher. We draw the graphs of the TP and TN values (Fig. 2) and calculate the data complexity (\(\beta \)) and cutoff (\(C_T\)). We experimented with samples of various sizes and obtained a clear separation between the true positive and true negative curves for a sample size of \(2^{20}\). With the help of graph (d) in Fig. 2, we calculate the value of \(C_T\) as 73100 and \(\beta \) as \(2^{20}\).

Fig. 2. Calculation of \(C_{T}\) and data complexity (\(\beta \)) for SPECK32 (\(D^{CD\rightarrow ML}_{1\cdots r+s}\))

6.1.2 Prediction 

We constructed the 9-round differential-ML distinguisher \(D^{CD\rightarrow ML}_{1\cdots r+s}\) in the previous subsection. The accuracy \((\alpha _{r+s})\) of this differential-ML distinguisher in different experiments is reported in Table 7.

Table 7. Accuracy for SPECK32 with \(T = 0.79\), \(C_{T} = 73100\) and \(\beta =2^{20}\)

In the experiments, we take 50 samples belonging to the plaintext difference \(\varDelta _0\) (= 0x0211 0A04) of the classical distinguisher and another 50 samples belonging to random input differences. The differential-ML distinguisher \(D^{CD\rightarrow ML}_{1\cdots r+s}\) predicts whether a given sample belongs to the difference \(\varDelta _0\) using Algorithm 5. We used \(2^{20}\) data in each sample and achieved an accuracy (\(\alpha _{r+s}\)) of more than 96% in each experiment.

Therefore, the data complexity of the 9-round differential-ML distinguisher for SPECK32 is \(2^{20}\), whereas the data complexity of the 9-round classical differential distinguisher is \(2^{31}\), as presented in [1]. The best known differential characteristic for SPECK32 covers 9 rounds with a data complexity of \(2^{30}\) [7]. Using the differential-ML technique, we have constructed a 9-round distinguisher with a data complexity far lower than that of the existing classical differential distinguishers.

6.2 Differential-ML Distinguisher: SIMON32

For SIMON32, we use the 7-round classical differential characteristic described in Table 5, which yields the output difference 0x1D014200 (\(\varDelta _7\)) after 7 rounds with probability \(2^{-16}\). We use \(\varDelta _7\) as the input difference for the training phase of the 5-round ML distinguisher and train the model to an accuracy of 0.57 using Algorithm 4. The batch accuracy and loss are described in Appendix A.

6.2.1 Construction 

The probability of the 7-round classical differential distinguisher is \(2^{-16}\), so we require at least \(2^{16}\) data for the 12-round differential-ML distinguisher of SIMON32, and additional data \((2^{\delta })\) is required to increase its accuracy. As in the SPECK32 case, we require T, \(C_T\), and \(\beta \) to construct the 12-round differential-ML distinguisher \(D^{CD\rightarrow ML}_{1\cdots r+s}\), which extends the existing 7-round classical distinguisher. We calculate the data complexity (\(\beta \)) and cutoff (\(C_T\)) using Algorithm 5 and the graphs of the TP and TN values (Fig. 3). The graphs show a clear separation between the true positive and true negative values for a sample size of \(2^{22}\). On the basis of this separation, we calculate the value of \(C_T\) as 656300 and the data complexity (\(\beta \)) as \(2^{22}\).

Fig. 3. Calculation of \(C_{T}\) and data complexity (\(\beta \)) for SIMON32 (\(D^{CD\rightarrow ML}_{1\cdots r+s}\))

6.2.2 Prediction 

The 5-round ML distinguisher (\(D^{ML}_{r+1\cdots r+s}\)) is trained with a validation accuracy of 0.57 and is used to extend the 7-round classical differential distinguisher. The accuracy of the 12-round differential-ML distinguisher \(D^{CD\rightarrow ML}_{1\cdots r+s}\) in different experiments is reported in Table 8.

Table 8. Accuracy for SIMON32 with \(T = 0.57\), \(C_{T} = 656300\) and \(\beta =2^{22}\)

As in the previous case, we take 50 samples belonging to the initial input difference \(\varDelta _0\) (= 0x04001900) of the classical distinguisher and another 50 samples belonging to random input differences. We make predictions with \(2^{22}\) data using the value of \(C_T\) calculated in the previous step, and an accuracy (\(\alpha _{r+s}\)) greater than 97% is achieved in each experiment. From these experiments, a 12-round differential-ML distinguisher \(D^{CD\rightarrow ML}_{1\cdots r+s}\) with a data complexity of \(2^{22}\) is constructed, while the data complexity of the 12-round classical differential distinguisher is \(2^{34}\) (Table 5). In this case, we present the first 12-round distinguisher with a data complexity less than \(2^{32}\). This shows that the differential-ML distinguisher provides better results than the classical differential distinguisher for SIMON32 as well.

6.3 Differential-ML Distinguisher: GIFT64

For GIFT64, we searched for an optimal 4-round differential characteristic, which is described in Table 6. The output difference after 4 rounds is \(\varDelta _4\) = 0x0044000000110000 with probability \(2^{-12}\). The difference \(\varDelta _4\) is used to train the 4-round ML based distinguisher; we train the model to an accuracy of 0.65 using Algorithm 4. The batch accuracy and loss are described in Appendix A.

6.3.1 Construction 

The probability of the 4-round classical differential characteristic is \(2^{-12}\); therefore, the data complexity of the 4-round differential distinguisher is \(2^{12}\), and the 8-round differential-ML distinguisher for GIFT64 requires at least \(2^{12}\) data. We calculate T, \(C_T\), and the data complexity (\(\beta \)) using Algorithm 5; these are required to construct the 8-round differential-ML distinguisher by extending the 4-round classical differential distinguisher. It can be inferred from the graphs in Fig. 4 that a clear separation between the true positive and true negative values exists for a sample size of \(2^{20}\). We use this separation to obtain the cutoff (\(C_T=103750\)) and data complexity (\(\beta =2^{20}\)) for \(D^{CD\rightarrow ML}_{1\cdots r+s}\).

Fig. 4. Calculation of \(C_{T}\) and data complexity (\(\beta \)) for GIFT64 (\(D^{CD\rightarrow ML}_{1\cdots r+s}\))

6.3.2 Prediction 

The 4-round ML distinguisher \(D^{ML}_{r+1\cdots r+s}\) is trained with a validation accuracy of 0.65 and is used to extend the 4-round classical differential distinguisher. The accuracy of the 8-round differential-ML distinguisher \(D^{CD\rightarrow ML}_{1\cdots r+s}\) in different experiments is reported in Table 9.

Table 9. Accuracy for GIFT64 with \(T = 0.65\), \(C_{T} = 103650\) and \(\beta =2^{20}\)

As in the SPECK32 and SIMON32 cases, we take 50 samples belonging to the input difference \(\varDelta _0\) (= 0x000000000000000A) of the classical distinguisher and another 50 samples belonging to random input differences. For each sample, we use \(2^{20}\) data and achieve an accuracy (\(\alpha _{r+s}\)) greater than 98% in each experiment. Therefore, the data complexity of the 8-round differential-ML distinguisher is \(2^{20}\), while the data complexity of the 8-round classical differential distinguisher was \(2^{38}\) [20].

6.4 Comparison with the Classical Differential Distinguishers

We have constructed differential-ML distinguishers for block ciphers based on three different types of structures (Feistel, SPN, and ARX). We are able to distinguish the same number of rounds using less data in comparison to the classical distinguishers. These results indicate that our technique provides better results for block ciphers based on all these types of structures. The source code for the above mentioned experiments is available on GitHub. We provide a comparison of the data complexities of the differential-ML distinguishers and the classical differential distinguishers in Table 10.

Table 10. Summary of results

7 Conclusion

In this paper, we have proposed a novel technique to extend a classical differential distinguisher using machine learning. We have constructed high accuracy (more than \(96\%\)) differential-ML distinguishers for 9-round SPECK32, 12-round SIMON32, and 8-round GIFT64. For SPECK32, we extended the 6-round classical differential distinguisher with a 3-round ML distinguisher, and the data complexity of the 9-round differential-ML distinguisher is \(2^{20}\). For SIMON32, the 7-round classical differential distinguisher is extended with a 5-round ML distinguisher, and the data complexity of the 12-round differential-ML distinguisher is \(2^{22}\). For GIFT64, the 8-round differential-ML distinguisher is constructed with a data complexity of \(2^{20}\), whereas the data complexity of the 8-round classical differential distinguisher was \(2^{38}\). The data complexity of the distinguishers for SPECK32, SIMON32, and GIFT64 is thus significantly reduced using differential-ML distinguishers in comparison to the classical distinguishers.