1 Introduction

Cryptanalysis of stream ciphers has various directions, like, algebraic attacks, statistical attacks, side channel attacks etc. Algebraic or statistical attacks evaluate the mathematical robustness of ciphers. There is another kind of attack which exploits practical implementations, called side channel attacks (SCA). Hardware/software implementations may leak information through power consumption, timing etc. SCA exploits these side channel information to attack systems. Scan chain-based attack [1, 16, 21, 24] is one such side channel attack. It exploits the design for testability features built in almost all modern-day ICs. Scan chain is a testability feature built in devices, where, all the flip-flops (FFs) are connected via a scan chain, data may be input in the device through scan_in line and after normal mode of operation scanned out through the scan_out line. Through scan chain feature, user may both observe the contents of a FF and set any FF to a desired state. However, both of these features may be exploited to break a system. It has been shown in [21, 24, 25] that crypto-systems may be broken, practically, exploiting the scan technique. Scan attacks are thus a threat to cryptographic devices implementations. Evidently, by disabling scan chains this vulnerability may be removed. But, this increases the possibility of defective supply of hardware. However, extensive effort is also given toward countermeasures against scan attacks [2, 9,10,11,12, 15, 17].

Scan attacks and their countermeasures are studied by many researchers. Literature shows that such kind of attacks are effective against both block ciphers and stream ciphers. Even strong ciphers that are resistant to other existing attacks are vulnerable against scan attack. Scan-based attacks on data encryption standard (DES) and advanced encryption standard (AES) have been shown in [24] and [25], respectively. Till date, no mathematical attacks are known against full round Trivium, but [1] mounts a scan-based side channel attack against Trivium, while [21] presents an attack on the RC4 stream cipher. On the other hand, a number of schemes are suggested to prevent these attacks [10, 21, 25]. However, all these countermeasures are application specific. Hence, scan attack is still a threat for cryptographic hardware.

In this paper, we propose a general strategy for attacking any stream cipher using scan-based side channels. The proposed strategy is employed to hardware efficient finalists of eStream [3, 22], MICKEY-128 2.0, Trivium, Grain-128 and are shown that all three are vulnerable against this attack in practical scenario.

This paper is organized as follows. Following this introduction, Sect. 2 briefly discusses scan attack and states the proposed methodology of scan-based side channel attack on stream ciphers. Section 3 illustrates the proposed attack on MICKEY-128 2.0, Trivium and Grain-128. Finally, Sect. 5 concludes the paper.

2 Scan attack on stream ciphers

The DFT feature provides the ability to run the system in two modes, normal and test. In the test mode, a scan_in line allows test input to the IC, which after normal mode of operation of the chip may be scanned out through the scan_out line that shows the contents of one register per clock cycle. Scan_in/scan_out is a universally accepted feature practised in VLSI test community. But it opens a side channel for attacking crypto-hardware.

A scan-based side channel attack works in two phases. In the first phase, the attacker obtains correspondence of the scan chain bit positions with the hardware FFs. This way the attacker is able to obtain the internal state of the cipher. Once, the internal state of the cipher is known, in the next phase, the attacker inverts the state to its initial state to get back the key of the cipher. As the scan chain connects the registers of the IC in an arbitrary manner, the main challenge of an adversary is to determine the exact contents of the state of the cipher from the permutation of FFs in the scan chain. The success of this phase depends on the design of the algorithm. The second phase of attack is concentrated on the possibility of reversibility of cipher states from one state to its previous. However, the second phase is redundant as once the full state of the cipher is known at any cycle, we can predict the next sequence of keystreams easily according to the cipher algorithm.

In this section, a new scan-based attack is employed successfully on hardware efficient finalists of eStream, MICKEY-128 2.0, Trivium and Grain-128. Our attacking principle require an interface through which we can feed key and IV to the stream cipher as shown in Fig. 1. The proposed methodology for scan-based attack is applicable on any stream cipher when we have the interface, shown in Fig. 1 to the cipher. In this section, we propose this general strategy. It may be mentioned that the strategy is equally applicable against block ciphers as well. The attack model is described next.

2.1 Physical and mathematical model of scan attack

User may input key and IV bits through the interface (Fig. 1). Key is the secret private key of the stream cipher, while IV is a public initialization vector of the stream cipher. The user may change IV with same key to get different keystreams. The hardware implementation has a scan architecture with \(scan\_in\) controllability and \(scan\_out\) observability.

  • The user is able to run the cipher for any number of cycles in normal mode by keeping \(scan\_in = 0\).

  • The state of the internal registers can be obtained through the \(scan\_out\) line when \(scan\_in = 1\).

  • The order in which scan chain connects the internal registers is unknown to the user.

  • The attacker may use different IV (since it is publicly changeable), restart the system and use scan features. The IV is an input to the interface of the crypto-hardware. Obviously, the attacker cannot change the key.

Fig. 1
figure 1

I/O interface of stream cipher for scan attack

Since, the state of the internal register may be fully known only when the order of getting the bits is known, the challenge of an adversary is to find correspondence of the output bits with the internal registers used in the hardware implementation.

Let the internal registers of the crypto-chip implementing the crypto-algorithm be the ordered sequence, \(R =\langle r_0, r_1, \ldots , r_{n-1}\rangle \). We take R in such a way that \(R=\langle K,V,X\rangle \), where, K, V are the exact register sequence holding initial private key and public IV of the algorithm in correct order. X is the rest of the registers in the chip connected to scan chains. Let the positions \(P=\langle p_0, p_1, \ldots , p_{n-1}\rangle \) denotes the first, second, \(\ldots n\mathrm{th}\) position of the \(scan\_out\) in order. Let \(S_j=Content^j(R)\) and \(Q_j=Content^j(P)\), where \(Content^j(Y)\) is the content of Y after j rounds of operation of the algorithm following the \(\langle key, IV\rangle \) loading phase. Also, let, \(s_{ij} = Content^j(r_i)\) and \(q_{ij} = Content^j(p_i)\).

Then, by the principle of scan chains,

$$\begin{aligned} P=\Pi (R), \end{aligned}$$

where \(\Pi \) represents a permutation operation.

Now, \(S_j=Round^j(S_0)\), where

$$\begin{aligned} S_0=Content_0(R)=\langle k,iv,x\rangle \end{aligned}$$

the initial internal state of the chip and exponentiation refers to execution of the Round operation of the cipher. Clearly, knowing \(S_0\) gives away the private key, k, initial vector iv and auxiliary bits x of the algorithm.

P and \(Q_j, \forall j\) are the leakages through the \(scan\_out\) line. We exploit this information to obtain \(S_0\) and thus k.

2.1.1 Obtaining \(\Pi ^{-1}\)

The goal to obtain \(\Pi ^{-1}\) is equivalent to obtaining a bijection between R and P.

We define an equivalence relation, I on \(\{r_0, r_1, \ldots , r_{n-1}\}\) as follows:

For a fixed m and \(\langle key,IV\rangle \) pair,

$$\begin{aligned} (r_i, r_j)\in I \quad \mathrm{if}\ \mathrm{and}\ \mathrm{only}\ \mathrm{if}, \quad s_{il}=s_{jl} \quad \forall l=0,1,\ldots , m. \end{aligned}$$

Then, we form equivalence classes in I. The set of \(r_i, i=0, 1, \ldots , {n-1}\) having same value as \(q_{ij}\) for \(j=0,1,2 \ldots , m\) form an equivalence class \(E_i\). In other words,

$$\begin{aligned} E_i=\{r_l:s_{lj}=q_{ij} \forall j=0,1, \ldots , m\} \end{aligned}$$

We then gradually increase m to increase the number of equivalence classes, ||E||. Clearly, when \(||E||=n\), we have obtained a bijection. Let, for \(m=M\), this bijection is obtained. We give in the theorem below the proof that M exists.

Theorem 1

For all i, The sequence \(\langle s_{i0}, s_{i1}, \ldots , s_{im}\rangle \) is pseudo-random for large values of m.

Proof

A good stream cipher generates pseudo-random sequences as output. In other words, \(Z_j=C^j(K,V)\) is not polynomial-time distinguishable from a random sequence. If possible, let \(\exists i\), s.t., \(\langle s_{i0}, s_{i1}, \ldots , s_{im}\rangle \) is polynomial-time distinguishable \(\forall m\). Since \(Z_j=C^j(K,V)\), after a number of iterations from the \(\langle key, IV\rangle \) loading phase, say, \(\alpha , Z_\alpha \) is a function of \(\langle s_{i0}, s_{i1}, \ldots , s_{i\alpha }\rangle \). Since \(\langle s_{i0}, s_{i1}, \ldots , s_{i\alpha }\rangle \) is not pseudo-random, after \(\alpha \) iterations, we can polynomial-time distinguish \(Z_0, Z_1, \ldots , Z_\alpha \) from a random sequence as \(Z_j=f(s_{kj}|k=0,\ldots )\) is a polynomial, for stream ciphers. A contradiction. Hence, after large enough values of \(m (=\alpha ), \langle s_{i0}, s_{i1}, \ldots , s_{im}\rangle \) is pseudo-random.

It can be mentioned that there may be unused state bits in the algorithm which are directly or indirectly never used in computation of key streams. These bits can be clearly ignored. Thus, X registers are not required in our attack. \(\square \)

Theorem 2

N is a characteristic of a crypto-algorithm measuring maximum number of iterations the cipher algorithm to be run so that each internal register (consisting of K and V only) have distinct state sequence over the N iterations for a fixed \(\langle key, IV\rangle \) pair. N exists for good crypto-systems.

Proof

\(\forall i\) and large enough \(m, \langle s_{i0}, s_{i1}, \ldots , s_{im}\rangle \) is pseudo-random for good crypt-systems. Let, \(\beta \) = \(max_i(m), i=0,1,\ldots , n-1\). Then, the probability that after \(m \rangle \beta \) iterations, \(T = \{\langle s_{00}, s_{01}, \ldots , s_{0m}\rangle ,\) \(\langle s_{10}, s_{11}, \ldots , s_{1m}\rangle ,\) \(\ldots , \langle s_{(n-1)0}, s_{(n-1)1}, \ldots , s_{(n-1)m}\rangle \}\) will each generate distinct m-bit Boolean sequences is given by,

$$\begin{aligned} Pr(U_m)= & {} \frac{(2^m-1)(2^m-2)\cdots (2^m-n+1)}{2^{m(n-1)}} \\= & {} \left( 1-\frac{1}{2^m}\right) \left( 1-\frac{2}{2^m}\right) \cdots \left( 1-\frac{n-1}{2^m}\right) \end{aligned}$$

Clearly, \(1-Pr(U_m) \langle \epsilon \), for some large value of m and however small \(\epsilon \). That is, this uniqueness is guaranteed. Hence, for any good crypto-system, \(\exists N\), for which, \(Pr(U_m)= 1\).

There may exist some \(\langle key, IV\rangle \) pairs that do not allow this uniqueness which are ideally not admissible in a good crypto-system. We call such pairs, weak pair. Though possible practically, such pairs should be few for any practical crypto-system. A random choice of \(\langle key,IV\rangle \) pair avoids using weak pairs and thus reaching uniqueness with high probability.

Theorem 3

The number of iterations the chip should be run to reach unique state is, \(log_2n \le M \le N\).

Proof

It is proved that eventually n equivalence classes can be reached. The maximum number of iterations the cipher is to be simulated to reach this uniqueness is by definition N. Also, m iterations produces \(\frac{n}{2^m}\) equivalence classes. Hence, minimum number of iterations required to reach uniqueness is, \(log_2n\).

Theorem 4

The average number of iterations required to reach n equivalence classes is,

$$\begin{aligned} \Sigma _{m>log_2n}m\frac{(2^m-1)(2^m-2)\cdots (2^m-n+1)}{2^{m(n-1)}}. \end{aligned}$$

Proof

It follows directly from the value of \(Pr(U_m)\) (Theorem 2).

2.1.2 Breaking the system

Once, we have obtained \(\Pi ^{-1}\), we know,

$$\begin{aligned} R=\Pi ^{-1}(P). \end{aligned}$$

If it is obtained after M rounds of operation, we know, \(Q_M\) and thus, due to the bijection, \(S_M\). But,

$$\begin{aligned} S_M=Round^M(S_0) \end{aligned}$$

for that particular secret key and IV which is embedded into the crypto-device. Hence,

$$\begin{aligned} S_0=Round^{-M}(S_M)=\left( Round^{-1}(S_M)\right) ^M. \end{aligned}$$

If the cipher operation is reversible, i.e., \(Round^{-1}\) is known, we get after M inversions \(S_0=\langle k,v,x\rangle \), from which we get the secret key, k. If, however, \(Round^{-1}\) is not known, since

$$\begin{aligned} S_{M+1}=Round(S_M) \end{aligned}$$

we can always predict the following output key streams from this knowledge of internal state, thus breaking the system.

2.2 Scan attack

In this subsection, we present a general methodology to attack any stream cipher using the knowledge of output of scan chain side channel.

It is a general method. It differs from the previously published scan attacks on ciphers in that it does not use any knowledge of the design of the cipher. For example, in [1], to obtain the correspondence between scan chain positions and the internal registers, the adversary chooses a particular \(\langle key,IV\rangle \) pair to input to the chip (Fig. 1) exploiting the design of Trivium. In this proposed methodology, the adversary does not need to know the design of the cipher. In this procedure, the cipher is treated as a black box having interface like Fig. 1. A random \(\langle key, IV\rangle \) pair is chosen matching certain criteria as described in step 1 later. This random pair is utilized to obtain the correspondence between scan chain positions and the internal registers of the chip.

The general scan-based side channel attack is based upon the observation that the state register bits of a stream cipher are each expected to be in different states over a time period. For example, if a cipher has states \(r_0, r_1 \ldots , r_{n-1}\), over a run of m cycles, \(r_0, r_1, \ldots , r_{n-1}\) are expected have gone through different m length Boolean sequences. In other words, \(r_0 , r_1 \ldots , r_{n-1}\) are expected to be (possibly weak) pseudo-random sequence generators. Otherwise, if \(r_0\) and \(r_1\) hold same states always, at least one of \(r_0\) or \(r_1\) may be removed from the design without reducing the security of the cipher. Clearly, this is true irrespective of the key and initialization vector used in the cipher. However, weak pairs may exist. But,

  • It is highly unlikely that the distinction be not made for large m (Theorem 2), owing to the pseudo-randomness described earlier. In fact, as we have shown N exists for all crypto-systems.

  • There is always a key, IV combination that may distinguish between two state registers. Otherwise, those two states may be merged to one or one state may be ignored.

The scan-based attack can, therefore, be framed on any stream cipher through the following two steps:

  • Step 1—Preprocessing phase (\(\Pi ^{-1}\)) The first step of our attack is done offline both in software and in hardware with the chip to be analyzed. It consists of three subphases. The first phase requires the knowledge of the cipher algorithm. The second phase uses a hardware chip with the interface given in Fig. 1. This chip has the same scan chain configuration as the one the adversary is going to break.

    • Simulation phase In this step, we simulate the cipher in software. The cipher is initialized with any specific key and initialization vector. It is then run for m cycles. The Boolean sequences generated by the state registers \(r_0, r_1, \ldots , r_{n-1}\) of the cipher are noted in a table. If the sequences generated by \(r_0, r_1, \ldots , r_{n-1}\) are unique (all different from others), we have the Boolean sequences to distinguish among the state register bits of the stream cipher. On the other hand, if, the sequences generated are not unique, we change the key, initialization vector combination or increase number of rounds of simulation until M to obtain unique sequences (since, N is not known). Thus, at the end of this phase, we have

      • A key and an initialization vector.

      • Number of rounds the cipher is to be run, m.

      • The unique Boolean sequences for \(r_0, r_1, \ldots , r_{n-1}\) over the m cycles of operation, which is a \(n~\times ~m\) table, \(T_1\). The ith column of \(T_1\) denotes the Boolean string generated by \(r_i\) over m cycles of operation.

    • Online phase During this phase, the adversary has access to the hardware of the algorithm and the scan chain. Due to the interface (Fig. 1) assumed, she can input any key and IV pair to the cipher. In this phase, we start to run the hardware of the cipher with the key and initialization vector for m cycles noted during the previous phase. We scan out full states of the cipher for the m cycles of its execution. The scanned-out bit sequences are noted in a table over the m cycles, with ith scanned-out bit in column i over the m cycles. Thus, we create another \(n~\times ~m\) table, \(T_2\) with ith column denoting the output of the i-th scanned-out bit position for the m cycles of operation.

    • Deduction phase This phase finds the correspondence between scan chain bits and cipher state bits. In this phase, we match each column of \(T_2\), with \(T_1\), thus obtaining the correspondence of scan chain bits with cipher state bits. For example, if i-th column of \(T_2\) matches with j-th column of \(T_1\), scan chain bit position i, \(p_i\) corresponds to cipher state bit \(r_j\). Hence, after this step, we obtain the full correspondence of the scan chain bits with the state bits of the cipher.

    Thus, after this state, we have obtained the correspondence between internal states of the cipher and the scan-output positions (\(\Pi ^{-1}\)). The complexity of the attack is determined by the number of state bits of the cipher, n and the number of rounds the cipher needs to be simulated to reach at unique sequences, m. Let \(t_s\) denote the time required to simulate each round of the cipher and \(t_o\) denote the execution time of each round of the cipher including its full state scan-out. Then, the time required to obtain the full correspondence is, \(O(t_s \times m + t_o \times m + nlogn)\). The number of restarts required during the entire process is, m. Clearly, the whole process is practically executable.

  • Step 2—Actual attack We now are given a crypto-chip having the same scan chain configuration as before with a secret key and public IV input to it. Since the adversary knows \(\Pi ^{-1}\) for this chip, she can obtain the internal state of the cipher after any number of iterations. Hence, the content of R after m cycles, \(S_m\) is known. If the internal state is invertible, after m inversions, we get back \(\langle k,iv,x\rangle \) and thus, the secret key. If, however, the inversion operation is not known, we can compute \(S_{m+1}=Round(S_m)\), therefore, computing the future key streams and essentially breaking the system.

Note that here we have considered only a single scan chain. An implementation might contain multiple scan chains. If the scan chains are independent, we choose any one among the scan chains and perform the attack as before. On the other hand, if the scan chains are dependent, it becomes hard to determine actual values of the scan flip-flops. For example, if 4 scan chains are compacted through a XOR as the final scan output, we need to know exact values of a set of scan chains connected to all the scan flip-flops. From the output of the compacted scan chain, we will guess one of the \(2^4\) possible values of the 4 scan chains. The attack procedure will then consider one scan chain, ignore others and follow the algorithm described above to determine the correspondence. This incurs exponential number of operations due to \(2^4\) guesses per step.

Table 1 Boolean sequence of MICKEY-128 2.0 state bits

If the crypto-system is embedded in a larger implementation, this attack can be applied in a similar way if the input–output characteristics of the whole system are known. In this case, we can apply our algorithm in a straightforward manner. Note that our algorithm treats the system under investigation as a black box and only needs to know the input–output characteristics of the system. Hence, the algorithm remains the same, while the complexity is determined by the number of scan flip-flops in the entire implementation.

The attack is practical for crypto-systems when n is practical number of iterations the system can be run. The offline simulations are performed in software. Hence, as n-iterations is practical, the offline simulations can be performed in few minutes. The online phase requires n cycles of operation of the system. It also requires an unprotected scan chain access of the system. The scan chains may be multiple in number and compacted as described earlier.

3 Scan attack on hardware based eStream winners

In this section, we describe applications of the proposed general strategy on the hardware efficient finalists of eStream, MICKEY-128 2.0, Trivium and Grain-128.

3.1 Scan attack against MICKEY-128 2.0

MICKEY-128 2.0 is an eStream winner. To the best of our knowledge, almost no reasonable cryptanalysis or side channel attacks except fault attack are known against MICKEY-128 2.0. The cipher is designed using two registers, linear and nonlinear, with clock-based updates. It can be mentioned that the cipher does not use shift registers in its design. The detailed specification of MICKEY-128 2.0 may be found in [4].

Scan-based side channel attack using the general strategy is successfully applied to MICKEY-128 2.0. The key and IV used are both 1 for all positions. We used 128 bit IV. With only 18 cycles of operation, after initialization, the state bits of MICKEY-128 2.0 may be distinguished and thus correspondence of the scan chain bits and the cipher state bits may be obtained in 18 cycles of operation only. The total time required for the attack is, \(O(t_s \times 18 + t_o \times 18 + 320\mathrm{log}320)\) which is quite low and practically implementable in a few minutes. The Boolean sequences generated by the r and s registers over the period of 18 cycles are given in Table 1. As knowing the full internal state of a cipher at any instant of operation lends the attacker to predict keystream bits, it can be claimed that the general strategy for scan attack works against MICKEY-128 2.0.

3.2 Scan attack on Trivium

Trivium is also one of the eStream winner stream cipher. The detailed description of Trivium may be obtained in [6]. A number of cryptanalysis and side channel attacks are known against Trivium. Most of the attacks on Trivium are on its reduced round versions. A scan-based side channel attack on Trivium is depicted on [21]. The proposed attack in [21] takes \(4~\times ~288\) cycles to obtain the full correspondence of the scan chain bits with the state register bits. Our attack based on the general strategy takes only 108 cycles of operation to retrieve full internal state of Trivium. The key and IV used are both set to 1 for all positions. The correspondence of state bits with the Boolean sequence simulated is shown in Table 2. The complexity of the attack is, \(O(t_s \times 108 + t_o \times 108 + 288\mathrm{log}288)\), which again takes only few minutes to realize in practice. We have here used states \(s_0\) to \(s_{287}\) instead of \(s_1\) to \(s_{288}\) used in the original specification of Trivium. In 108 cycles full internal state of Trivium may be revealed. Since, Trivium takes 1152 rounds to initialize, the attack takes place within the initialization rounds. After the full internal state of Trivium is known, due to the reversibility of its round operations depicted in [21] we are able to revert back to the initial configuration of the cipher, therefore, obtaining its secret key. Thus, the general strategy of scan attack also works well on Trivium.

Table 2 Sample Boolean sequences of Trivium

3.3 Scan attack on Grain-128

Grain-128 is also an eStream winner. Till date, a moderate number of cryptanalysis are reported against Grain-128 ( [5] etc.). The cipher is designed using two registers, linear and nonlinear left shift registers. The most least significant bits \((s_{127}\) and \(b_{127})\) are updated by a linear and a nonlinear feedback per cycle of operation, respectively. The keystream is output based on a nonlinear mixing of internal state bits of this cipher. During initialization, the keystream bit is not output but is XOR-ed with both linear and nonlinear feedbacks. A detailed specification of Grain-128 may be found in [8].

Scan-based side channel attack using the general strategy can be applied to Grain-128. The key and IV used are both 1 for all positions. We used 128 bit key and 96 bit IV. In only 167 cycles of operation, the full internal state of Grain-128 may be distinguished. Thus correspondence of the scan chain bits, and the cipher state bits may be obtained in 167 cycles of operation only. It may be noted that this lies within its initialization phase. The total time required for the attack is, \(O(t_s \times 167 + t_o \times 167 + 256\mathrm{log}256)\) which is quite low and practically implementable in a few minutes. The Boolean sequences generated by the b and s registers over the period of 167 cycles are given in Table 3. Once, the full internal state of Grain is known, we should be able to revert iterations of Grain-128 due to its reversibility property suggested in [5] to retrieve back its original key. Thus, scan attack succeeds against Grain in practical scenario.

3.4 Performance

All the three crypto-systems may thus be seen to be extremely vulnerable against scan-based side channel attack methodology proposed in the previous section. In Table 4, we summarize the cost of our proposed attack on the three stream ciphers. Table 4 also compares the performance of the present attack with the existing attacks in terms of number of cycles of operations required. It can be mentioned that only Trivium has a reported scan attack.

In [19], the authors present a differential scan attack on advanced DFT scan structures. They have shown a differential scan attack with multiple scan chains on state-of-the-art ciphers. Our presentation in this paper is lacking the analysis of the attack for multiple scan chains, but our attack on single scan chain is better than this result in that we do not assume any knowledge of the cipher being attacked and also the complexity of our attack is much less in terms of number of restarts required and the simulations and online time required. It can be stated that a differential scan attack using our strategy is possible for multiple scan chains.

Table 3 Sample Boolean sequences of Grain-128

4 Countermeasure

In this section, we introduce a countermeasure against scan attack on stream ciphers. This method may be extended to other cryptographic systems such as block ciphers and hash functions without any algorithm specific changes.

Our countermeasure assumes that data may be scanned out in normal mode and test mode. In both modes, the countermeasure strategy remains the same. Our strategy works on scan chain scrambling. We generate a scrambled sequence from the original sequence to hide the actual data inside the chip-registers. The actual scan output \(c_i\) is given by

$$\begin{aligned} c_i = r_i + f \end{aligned}$$

where \(r_i\) is the actual chip-register, + is Boolean XOR and f is a cryptographically robust nonlinear random bit sequence generator. A schematic diagram of the design is given in Fig. 2. We discuss the design and security of f in this section.

Table 4 Comparison of attack complexity
Fig. 2
figure 2

A secure scan chain design

Note that this obfuscation does not obstruct scan chain functionality of the chip. A valid user has knowledge of f and its input, so, she can obtain \(r_i\) by the equation, \(r_i = c_i + f\). However, an adversary though knows f does not know its input, therefore, essentially obtains a scrambled scan chain output.

Cellular automata (CA) is shown to posses good pseudo-random properties [23]. It is also shown to possess robust cryptographic properties [13]. CA also shows regular and compact design in hardware [14]. For these advantages, we have used CA for the design of f. Before discussing design of f, we briefly discuss basics of CA here.

4.1 Cellular automata

Definition 1

Cellular automata: a cellular automaton is a finite array of cells. Each cell is a finite state machine \(C = (Q, f_c)\) where Q is a finite set of states and \(f_c\) a mapping \(f_c : Q^n \rightarrow Q\). The mapping \(f_c\) called local transition function. n is the number of cells the local transition function depends on. On each iteration of the CA each cell of the CA updates itself with respective \(f_c\). Dimension of the cell array is called the dimension of the CA. Adjacent cells of a cell are called the neighborhood of the CA.

The number of neighboring cells \(f_c\) depends on may be same or different on different directions of the automaton. \(f_c\) may be same or different for cells across the automaton.

Definition 2

Rule: the local transition function for a 3-neighborhood CA cell can be expressed as follows:

$$\begin{aligned} q_i(t + 1) = f_c[q_i(t), q_{i+1}(t), q_{i-1}(t)] \end{aligned}$$

where \(f_c\) denotes the local transition function realized with a combinational logic and is known as a rule of CA cell [7]. The decimal value of the truth table of the local transition function is defined as the rule number of the CA.

For example, 3-neighborhood CA,

Rule 30: \(f_i = q_{i-1}(t) \oplus (q_{i+1}(t)+q_i(t))\), where + is the Boolean OR operator and \(\oplus \) is the Boolean XOR operator. Similarly,

Rule 60: \(f_i = q_{i-1}(t) \oplus q_i(t)\).

Rule 90: \(f_i = q_{i-1}(t) \oplus q_{i+1}(t)\).

Table 5 ANF of CA rules used in ruleset 5

4.2 Design

f is a random sequence generator which is XOR-ed with the actual scan out sequence \(r_i\) to generate the actual scan out sequence \(c_i\). The random sequence generator f is constructed using a hybrid nonlinear cellular automata. A linear CA lacks cryptographic properties like nonlinearity, algebraic degree and hence can be cryptanalyzed [18]. A nonlinear CA is weak in resiliency. It is shown in [13] that hybridizing linear and nonlinear rules to form CA creates cryptographically robust CA having good cryptographic properties like, balancedness, resiliency, nonlinearity and also have good d-monomial characteristics [20]. Hence, we have hybridized linear rules, \(L_1=\{60,90,150,240\}\) and nonlinear rules, \(L_2= \{30, 120,\) \( 180, 210\}\) to choose the hybrid CA used in f. The CA is exactly ruleset 5 CA presented in [13].

Our countermeasure function f is designed as follows:

  • It is a CA consisting of cells with rules \(30, 60, 90, 120, 150, 180, 210, 240\) spaced alternatively. The actual functions these rules operate on are given in Table 5.

  • We use a null-boundary CA of length 16 for this design.

  • The input to the CA is 14 random bit positions from the cipher registers and two Boolean 1s. Boolean 1s are input to the cell operating with rule 30; rest 14 random positions are connected to the other CA cells of the hybrid CA.

  • Before generating the actual scan_out sequence, CA is run for one cycle and output of cell 9 is XOR-ed with the internal scan output. Cells of the CA are numbered from left to right starting at 1. So cell 9 is the middle cell of the 16 cell CA.

Evidently, this design may be adapted to crypto-systems other than stream ciphers.

We present the cryptographic properties of the hybrid CA in the following subsection.

4.2.1 Cryptographic Properties of f

The nonlinear hybrid CA used in construction of f is studied in [13], and it is called ruleset 5 by the authors. We briefly reproduce the characteristics here.

Ruleset 5 is tested over three iterations for cryptographic properties like, balancedness, nonlinearity, resiliency and algebraic degree. It is also tested against d-monomial test [20]. The results are tabulated in Tables 6 and 7.

Table 6 Cryptographic properties of ruleset 5
Table 7 d-Monomial characteristics of hybrid ruleset 5 CA [13]
Table 8 Hardware performance of f

It can be seen that over all the iterations, the CA generates balanced output and has a fast nonlinearity growth. Resiliency of the CA is constant, and it has good algebraic degree growth rate. Also, results of d-monomial test are satisfactory.

Table 9 Comparison with other schemes

4.3 Hardware requirement

The hardware requirement of the design is very small. The countermeasure function is synthesized in Xillinx 8.1. In Table 8, we show the area requirement and throughput of f. We have not included the comparison with the countermeasure with the crypto-core implemented in the design, since the design is varied with different crypto-algorithms. But an experiment with Grain-128 is performed where the overhead of the implementation of the countermeasure is merely about \(5\%\). It is seen that very small hardware overhead is added to the cipher design. Also the throughput shows that it is operable with high frequency.

4.4 Security and fault coverage

Consider a stream cipher implementation based on n internal registers. Hence, obtaining the exact 16 bit positions used in the CA will have a complexity of \(n \atopwithdelims ()16\) which is \(\Omega (2^{n})\). Clearly, as the scrambled output is scanned out, the complexity of knowing the exact state of the cipher remains \(2^{n}\). Also, since the actual bit positions used in the hybrid CA are not known, no particular scan in sequence may give us important information. It may thus be claimed that the countermeasure is secure.

The function f takes input from the internal states of the cipher, which is unknown and scrambles the output of the scan chain. So security is ensured as discussed above. To show fault coverage, since the function f is deterministic, when a known value is fed to the internal state of the cipher, any faulty scan flip-flop will be discovered \(100\%\) of time because the tester knows the function f, the internal states and the scan-chain-ordering. This is obvious as \(scan\_out \oplus f = I\), where I is the internal state. Now from the scan-chain-ordering, exact faulty scan flip-flops can be identified as if there is no scrambling. We have verified this assertion experimentally.

Below (Table 9) we compare our strategy of countermeasure with some of the existing countermeasures. As the countermeasure, we have presented is nothing, but a scrambling of the scan output and its security and fault coverage is expected to be identical to any other scrambling technique; however, the advantage here is on the hardware overhead. Only 16 single bit registers are added to the design and minimal combinational logic, so in terms of hardware overhead, this design is optimal.

The countermeasure presented in [1] can be attacked as by setting all registers to 1, and the adversary knows the positions of the XORs.

It can be noted that the registers of f may be included in the scan chain. As far as the scan-output is scrambled, the adversary has no advantage.

5 Conclusion

In this paper, we have shown that if the state-of-the-art testable design techniques can be employed for cryptographic hardware, then the design is vulnerable against scan-based side channel attack. We also propose a methodology by which internal state of any stream cipher may be extracted using scan chains in realistic time. The proposed strategy is successfully employed on hardware efficient finalists of eStream, MICKEY-128 2.0, Trivium and Grain-128. The attack is practically mountable as it works within few cycles of operations of the ciphers. We have also proposed a countermeasure to prevent the presented attack using cellular automata.