A New Heuristic Based on the Cuckoo Search for Cryptanalysis of Substitution Ciphers

Jain, Ashish; Chaudhari, Narendra S.

doi:10.1007/978-3-319-26535-3_24

Ashish Jain¹⁷ &
Narendra S. Chaudhari^17,18

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 9490))

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

The efficient utilization of one of the latest search heuristic, namely, cuckoo search for automated cryptanalysis (or attack) of substitution ciphers is addressed. A previously proposed genetic algorithm based attack of the simple substitution cipher is enhanced. A comparison is obtained between various attack algorithms that are based on cuckoo search, genetic algorithm, enhanced genetic algorithm, tabu search and scatter search where cuckoo search shows the best performance. It is worth pointing out that the proposed cuckoo search algorithm provides a valid and efficient option for solving similar permutation problems.

Access provided by Autonomous University of Puebla. Download conference paper PDF

Cuckoo Search Strategies for Solving Combinatorial Problems (Solving Substitution Cipher: An Investigation)

On the Cryptanalysis of S-DES Using Binary Cuckoo Search Algorithm

A Review on Metaheuristic Techniques in Automated Cryptanalysis of Classical Substitution Cipher

Keywords

1 Introduction

The goal of the cryptanalyst is to systematically recover the original text (plaintext) and/or key by mounting an attack on the cipher. The attack may involve several ciphertexts and/or some plaintexts, intelligent mathematical computer algorithms and usually some luck. A ciphertext-only attack is one where the objective of the adversary (or cryptanalyst) is to recover the plaintext and/or deduce decryption key by observing only the “known ciphertext”. This class of attack is most challenging and in this paper, we study such a cryptanalytic attack on a classical cipher (a most general form –mono-alphabetic (or simple) substitution cipher, hereinafter, substitution cipher). Although classical ciphers, namely, substitution and transposition ciphers were used hundreds of years ago, a particular interest in the study of these ciphers has been retained due to the fact that most of the modern cryptosystems still utilize the functions of these ciphers as their basic building blocks. For instance, the Advanced Encryption Standard (AES) used all over the world by finance community is based on the principle of the substitution-permutation network. Due to such important facts, the classical ciphers are normally considered as first choice when investigating new attack strategies such as the one discussed in this paper. In literature, several heuristics have been applied successfully for cryptanalysis of classical ciphers. For example, [1–8], where the main goal of researchers was to develop efficient cryptanalytic algorithms by utilizing the search (or optimization) heuristics.

1.1 Simple Substitution Cipher

In this type of cipher a key can be represented as a permutation of the plaintext alphabet (for example, if the plaintext alphabet consists of 26 English alphabets and the space character, then a permutation of these 27 alphabet characters usually forms a key). During the encryption process each letter of the plaintext message is replaced by the corresponding key element (i.e. ciphertext alphabet) that forms a ciphertext of length equal to the length of the plaintext message. The original plaintext message from the ciphertext is then recovered by intended recipient using decryption process. For details about this cipher we refer the interested reader to [4].

1.2 Our Contributions

It is surprising that in the last one decade no heuristic attack more efficient than the tabu search attack has been reported for breaking the substitution cipher. In this paper, we overcome the limitation by developing a new attack algorithm by utilizing one of the latest search heuristic, namely, cuckoo search. Moreover, for optimizing the cryptanalysis process we fine-tune some of the parameters of the cuckoo search. Additionally, two related and efficient attacks of the substitution cipher proposed by Clark [4] and are based on the genetic algorithm and tabu search are implemented. Furthermore, the genetic algorithm attack proposed by Clark [4] is enhanced by incorporating a new adaptive mutation operator. Although the scatter search based attack of the substitution cipher proposed by Garici and Drias [5] is not efficient (with respect to time complexity), it is being considered to obtain a fair comparison between previously proposed evolutionary heuristic based attacks and those presented in this paper.

2 Cryptanalysis of Substitution Ciphers

The main weakness of substitution ciphers is that their character frequency statistics (or n-grams) are not changed by the encryption algorithm. In other words, for every grouping of characters in the plaintext there is a distinct and corresponding grouping of characters in the ciphertext [4]. Therefore, most attacks of substitution ciphers attempt to match the n-grams of the encrypted text with those of known language (for example, English). In this paper, we use this fact as the basis to attack the substitution cipher and at the same time automate the search for the required n-gram frequencies by developing efficient algorithms based on the optimization heuristics.

2.1 Fitness Function or Cost

During the optimization process, each candidate key (or individual solution) is used to decrypt the known ciphertext and at the same time the n-gram statistics of the decrypted text is compared to the language statistics. In general, Eq. (1) can be used for comparison of these statistics.

$$\begin{aligned} Cost_k=\alpha \left( \sum _{i\in \mathscr {A}} |\mathcal {K}_{i}^{u}-\mathcal {D}_{i}^{u}|\right) +\beta \left( \sum _{i\in \mathscr {A}} |\mathcal {K}_{i}^{b}-\mathcal {D}_{i}^{b}|\right) +\gamma \left( \sum _{i\in \mathscr {A}} |\mathcal {K}_{i}^{t}-\mathcal {D}_{i}^{t}|\right) \!, \end{aligned}$$

(1)

where $\mathscr {A}$ denotes the language alphabet (e.g. in English language: A, B, ..., Z, _), and $\mathcal {K}$ and $\mathcal {D}$ denote statistics of the known language and decrypted text, respectively. The indices u, b and t denote the unigram, bigram and trigram statistics, respectively. Different weights in the range (0.0–1.0) with step size 0.1 can be assigned to $\alpha $, $\beta $ and $\gamma $. However, the following constraint must be satisfied to keep the number of combinations of $\alpha $, $\beta $ and $\gamma $ workable.

$$\begin{aligned} \alpha +\beta +\gamma =1.0. \end{aligned}$$

(2)

Generally, in order to do more accurate evaluation of candidate keys, the n-grams should be higher in number. Also as per literature, the inclusion of trigram statistics in the fitness function are generally the most effective basis for cryptanalysis of classical ciphers [4]. However, for cryptanalysis of the substitution cipher the fitness function used in literature (for example, [1, 4]) is purely based only on the bigrams. The reason is that the complexity associated with computation of trigram statistics is high (order of $N^3$, where N is the key size), while the benefit over the bigrams is minimal. Moreover, even though small amount of ciphertexts are known, the attack on the substitution cipher is more effective using fitness function which is based on the bigrams only than one which utilizes trigrams alone [4]. Due to these facts, the following fitness function which is purely based on the bigrams is used in this paper to attack the substitution cipher.

$$\begin{aligned} Cost_k=\sum _{i\in \mathscr {A}} |\mathcal {K}_{i}^{b}-\mathcal {D}_{i}^{b}|. \end{aligned}$$

(3)

2.2 Cryptanalysis Using Genetic Algorithms

In this section, we enhance the genetic algorithm attack that was proposed by Clark (particularly, mutation operator). Note, the crossover (or mating) operator employed in Algorithm 1 is identical to that proposed by Clark. Due to lack of space, we do not present the crossover operator, but we refer readers to [4].

Mutation-I. This mutation operator is utilized in most of the previously reported attacks on the classical cipher. It is based on the simple way of perturbing a cipher key where two randomly selected elements of a key are swapped. In addition to this mutation operator, we use a similar mutation operator but an adaptive one which can be described as follows.

Mutation-II. Interchange three randomly chosen elements of the key. However, this mutation operator is adapted when evolution starts to languish, i.e. no improvement in the solution is observed after some number of iterations. We limit this number to seven based on the experiments. The main benefit of the adaptive mutation operator is that it improves the rate of convergence and the success probability (in terms of full key recovery) of the genetic algorithm attack.

2.3 Cryptanalysis Using Cuckoo Search Combined with Lévy Flights

In 2009, Yang and Deb [9, 10] have proposed a new nature-inspired population based metaheuristic known as cuckoo search via Lévy flights. “This metaheuristic is formed by inspiration from the obligate brood parasitic behavior of few cuckoo species in combination with Lévy flight behavior of some birds and fruit flies [9]”. For simplicity, the standard cuckoo search method is described in Algorithm 2.

To generate a new nest (or a solution vector, see Algorithm 2, step 3) $\mathbf x _{j}(t+1)$ from an existing nest $\mathbf x _{j}(t)$, for, a cuckoo j; Lévy flight is performed as [9, 11]:

$$\begin{aligned} \mathbf x _j(t+1)=\mathbf x j(t)+\mu . \end{aligned}$$

(4)

The above equation is essentially a stochastic equation for a random walk. In general, a random walk is a Markov chain whose next state/location depends only on the current location (the first term) and the transition probability (the second term) [9]. In the second term, $\mu >0$ is a step-size scaling factor which should be associated to the scales of the problem of interest [12]. The term l is the step-size which is drawn from a probability distribution with a power law tail (also known as Lévy stable distribution) [13]. It is worth mentioning that the cuckoo search via random walk using Lévy flights is more efficient than other popular swarm intelligence techniques (e.g. PSO) in exploring the search space as its step-size distribution is pseudo-random [12]. Nonetheless, it is not a trivial task to produce pseudo-random steps that accurately obey the Lévy stable distribution. However, from the implementation aspects, Mantegna’s [13] algorithm is one of the most efficient and yet straightforward method that generates a stochastic variable [12]. Note that the stochastic variable has probability density arbitrarily close to Lévy stable distribution characterized by an arbitrarily chosen control parameter (0.3$ \le \lambda \le $ 1.99) [13]. Finally, the step-size l using Mantegna’s algorithm can be calculated as [13]:

$$\begin{aligned} l=\frac{u}{|v|^{1/\lambda }}, \end{aligned}$$

(5)

where u and v are two Gaussian stochastic variables with a zero mean and standard deviations of $\sigma _{u}(\lambda )$ and $\sigma _{v}(\lambda )$, respectively, where $\sigma _{u}(\lambda )$ and $\sigma _{v}(\lambda )$ can be given by Eq. (6). Here, $\varGamma (z)$=$\int _{0}^{\infty } t^{z-1} e^{-z} dt$ [13]. For detailed description on the standard cuckoo search interested reader can refer [9, 11, 12].

$$\begin{aligned} \sigma _{u}(\lambda )=\left[ \frac{\varGamma (1+\lambda ) \text {sin}(\pi \lambda /2)}{\varGamma ((1+\lambda )/2)\lambda 2^{(\lambda -1)/2}}\right] ^{1/\lambda }=0.696575 \text { and } \sigma _{v}(\lambda )=1 (\text {if: }\lambda =1.5), \end{aligned}$$

(6)

Cuckoo Search Attack. The cuckoo search attack algorithm for cryptanalysis of the substitution cipher is intuitive (see Algorithm 3). Therefore, here we discuss only the mapping of main components of the cuckoo search (i.e. nest, egg and Lévy flights) to the problem under consideration. In most applications of the cuckoo search, often it is assumed that each nest has one egg. However, in case of cryptanalysis of the substitution cipher, we consider each nest has N distinct eggs/elements (i.e. N distinct characters of a key: $n_1$, $n_2$,..., $n_n$, where N=27 (i.e. A —Z and the space character). For simplicity, consider each egg has a unique number ($\in $ [1, N]) associated with it. That is there must be a unique identity of each elements in the nest/solution. In other words, a key cannot have two similar characters. In this regard, the difficulty is how to preserve distinctness property of the key elements, since we can clearly observe that the Eq. (4) will destroy the distinctness property of the elements of the current solution during updation. Note that the importance of Eq. (4) is that it builds the new solution from an existing solution via Lévy flights which is an efficient approach, since the step-size is heavy tailed and any large step is possible. That is the existing solution has better chance to get changed to a better quality solution in lesser computations. Hence, we do not change this equation, rather we utilize its efficiency for improving existing solutions using the current best solution and new solutions generated by Eq. (4) (see Algorithm 3, step 7–11).

2.4 Cryptanalysis Using Tabu Search

The main focus of this optimization algorithm [4, 14] is to provide a heuristic for searching a good solution to the problem under consideration without becoming trapped in a local optima. This algorithm maintains a tabu list as a short term memory list where at each iteration the current key is added to the tabu list, and the key remains ‘tabu’ for a fixed number of iterations. An intuitive level heuristic for cryptanalysis of the substitution cipher is shown in Algorithm 4. Here, we mention that the choice of $N_{poss}$ parameter must be less than N(N-1), since this is the maximum number of distinct keys which can be created from the best key of the tabu by swapping two elements [4], where N is the key size (e.g. 27). For a detailed description of this heuristic we refer readers to [3, 4].

3 Parameters, Experimental Setup and Results

The inputs to all the above presented algorithms are: known ciphertext, its length and bigram statistics of the language (which are assumed to be known). The output of each algorithm is either full or partial substitution cipher key. Note that in order to recover the message that is readable, it is not essential to recover every element of the key, i.e. considerable partial key recovery is also significant to understand the message. The parameters such as m (i.e. population/tabu-list size), Maximum-Iterations (i.e. maximum number of iterations) and $N_{poss}$ (i.e. size of possibilities list in case of tabu search) were fine-tuned by a combination of several experiments. Note that the fine-tuning was performed separately for each of the algorithms in order to optimize the cryptanalysis process.

In some scenarios guidelines are helpful, for example in Algorithm 3, $\mu $=0.01 and $\lambda $=1.5 are taken that has been reported in [10] as general choice. Furthermore, in order to obtain a clear comparison between above algorithms, the guidelines reported in [4] followed, i.e. three criteria were used: amount of known ciphertext available for the attack, number of keys examined prior to determination of correct solution and the time needed to find the correct solution.

Now, we discuss the experimental setup and their details. We tested each of the algorithms on 100 different messages. For each message, each of the cryptanalytic algorithms was run 3 times (it comes to a total of 300 times) and the best of the 3 was recorded. Afterwards, 100 best recorded results corresponding to each algorithm were then averaged. The above described process of recording and averaging is repeated for known ciphertext of size 100 to 800 with step size of 100. The average results of each of the algorithms are then plotted in Fig. 1. From Fig. 1, we can clearly observe that each of the attack algorithms perform well and comparably. Nevertheless, the best way to identify the proper efficiency of the approximation algorithms is; to compare them on the basis of two factors: state space searched (i.e. number of keys examined before evolving the best result) and complexity of the attack (i.e. performance time). For this purpose, we tested each of the attack algorithms on the 100 different known ciphertext of length 1000 characters and then recorded the amount of keys examined and the time taken by the attack. The average results are shown in Table 1. Note that the scatter search method has not been considered in Table 1 (and not re-implemented in this paper), since Garici and Drias (investigators of this algorithm) have concluded that the scatter search method takes 75 % more time than the genetic algorithms (please see Sect. 4.2 in [5]), while the quality of the results is only 15 % better. From Table 1, we can clearly see that the mean performance time of enhanced genetic algorithm is comparatively lesser than the genetic algorithm proposed by Clark, while the average number of keys examined is approximately equal. From Table 1, we can note that the tabu search is more efficient in both respects than genetic algorithms. However, it is also clear from the table that the cuckoo search outperforms tabu search in both respects. Most importantly, the overall performance of cuckoo search is much better among all the attack algorithms (including mean and standard deviation of the key elements correctly found).

Table 1. A comparison based on: mean performance time ($T_{mean}$), average number of keys examined before the best solution found ($M_{avg}$) and number of key elements correctly found–mean ($\bar{X}$) & standard deviation (S).

Full size table

4 Conclusion

This paper presents various attacks on the substitution cipher where genetic algorithms, tabu search and cuckoo search have been utilized. We examined that the adaptive mutation operator along with appropriate selection procedure improves the performance of previously proposed genetic algorithm attack. It is worth pointing out that the cuckoo search attack has shown the best performance among all attacks. Most importantly, we needed to fine-tune lesser number of parameters for cuckoo search than genetic algorithms and tabu search. This study indicates that the developed attack algorithm (which utilizes the cuckoo search) is able to produce results that are clearly better than previous attack algorithms of the substitution cipher, and therefore it can be used as a valid and efficient alternative for solving this kind of permutation problems.

References

Forsyth, W.S., Safavi-Naini, R.: Automated cryptanalysis of substitution ciphers. Cryptologia 17(4), 407–418 (1993)
Article MathSciNet MATH Google Scholar
Spillman, R., Janssen, M., Nelson, B., Kepner, M.: Use of a genetic algorithm in the cryptanalysis of simple substitution ciphers. Cryptologia 17(1), 31–44 (1993)
Article Google Scholar
Clark, A.: Modern optimisation algorithms for cryptanalysis. In: Proceedings of the 1994 Second Australian and New Zealand Conference on Intelligent Information Systems, pp. 258–262. IEEE (1994)
Google Scholar
Clark, A.J.: Optimisation heuristics for cryptology. Ph.D. thesis (1998)
Google Scholar
Garici, M.A., Drias, H.: Cryptanalysis of substitution ciphers using scatter search. In: Mira, J., Álvarez, J.R. (eds.) IWINAC 2005. LNCS, vol. 3562, pp. 31–40. Springer, Heidelberg (2005)
Chapter Google Scholar
Song, J., Yang, F., Wang, M., Zhang, H.: Cryptanalysis of transposition cipher using simulated annealing genetic algorithm. In: Kang, L., Cai, Z., Yan, X., Liu, Y. (eds.) ISICA 2008. LNCS, vol. 5370, pp. 795–802. Springer, Heidelberg (2008)
Chapter Google Scholar
Cowan, M.J.: Breaking short playfair ciphers with the simulated annealing algorithm. Cryptologia 32(1), 71–83 (2008)
Article MATH Google Scholar
Boryczka, U., Dworak, K.: Genetic transformation techniques in cryptanalysis. In: Nguyen, N.T., Attachoo, B., Trawiński, B., Somboonviwat, K. (eds.) ACIIDS 2014, Part II. LNCS, vol. 8398, pp. 147–156. Springer, Heidelberg (2014)
Chapter Google Scholar
Yang, X.S., Deb, S.: Cuckoo search via lévy flights. In: World Congress on Nature & Biologically Inspired Computing, 2009, NaBIC 2009, pp. 210–214. IEEE (2009)
Google Scholar
Yang, X.S., Deb, S.: Engineering optimisation by cuckoo search. Int. J. Math. Model. Numer. Optimisation 1(4), 330–343 (2010)
Article MATH Google Scholar
Yang, X.S., Cui, Z., Xiao, R., Gandomi, A.H., Karamanoglu, M.: Swarm Intelligence and Bio-Inspired Computation: Theory and Applications. Elsevier, Waltham (2013)
Book Google Scholar
Yang, X.S.: Nature-Inspired Optimization Algorithms. Elsevier, Amsterdam (2014)
MATH Google Scholar
Mantegna, R.N.: Fast, accurate algorithm for numerical simulation of levy stable stochastic processes. Phys. Rev. E 49(5), 4677–83 (1994)
Article Google Scholar
Glover, F.: Tabu search: a tutorial. Interfaces 20(4), 74–94 (1990)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Discipline of Computer Science and Engineering, Indian Institute of Technology Indore, Indore, India
Ashish Jain & Narendra S. Chaudhari
Visvesvaraya National Institute of Technology Nagpur, Nagpur, India
Narendra S. Chaudhari

Authors

Ashish Jain
View author publications
You can also search for this author in PubMed Google Scholar
Narendra S. Chaudhari
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ashish Jain .

Editor information

Editors and Affiliations

University of Istanbul, Istanbul, Turkey
Sabri Arik
University at Qatar, Doha, Qatar
Tingwen Huang
Tunku Abdul Rahman University College, Kuala Lumpur, Malaysia
Weng Kin Lai
University of Science Technology, Wuhan, China
Qingshan Liu

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Jain, A., Chaudhari, N.S. (2015). A New Heuristic Based on the Cuckoo Search for Cryptanalysis of Substitution Ciphers. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science(), vol 9490. Springer, Cham. https://doi.org/10.1007/978-3-319-26535-3_24

Download citation

DOI: https://doi.org/10.1007/978-3-319-26535-3_24
Published: 10 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26534-6
Online ISBN: 978-3-319-26535-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us