1 Introduction

As it was shown before [15], fault countermeasures often lower the implementation resistance against side-channel attacks. Therefore, it is necessary to consider these two classes of physical attacks together when protecting the algorithm. In this chapter, we show how to automatically construct a code-based countermeasure that significantly reduces the success of a fault injection attack while keeping low information leakage via side-channels.

There are two main countermeasure classes to protect implementations against side-channel attacks. Masking [9] is a software-level countermeasure which tries to “mask” the relationship between the intermediate values and power leakage. Hiding [20] tries to reduce the signal and increase noise by utilizing various techniques—it “hides” the operations performed by the device. While masking can make fault attacks more challenging, it does not help to prevent them. On the other hand, some hiding techniques, such as dual-rail precharge logic (DPL), help in preventing fault attacks by detecting faults [18].

In 2011, DPL was extended to software by Hoogvorst et al. [10], by using balanced encoding schemes. Since then, there were several other proposals [6, 13, 14, 19], all of them using various coding techniques to prevent side-channel leakage. However, it was shown that unlike hardware DPL representation, its software counterpart is not fault resistant by default [4]. Therefore, to prevent both attack techniques, it is necessary to design the coding scheme from the beginning with this goal in mind.

In this chapter, we focus on approach presented in [2]. We first explain the theoretical background necessary for designing software hiding countermeasures that are resistant to both side-channel and fault attacks. We provide an algorithm to automatically find optimal codes for various code distances and number of codewords with given code length. We also provide evaluation of the codes—by using detection and correction probabilities and by an automated fault simulator. This simulation is done by using a general-purpose microcontroller implementation and an instruction set simulator that is capable of injecting different fault models into any instruction of the code. Our evaluation shows that the codes generated by our algorithm provide a high security level with respect to both side-channel and fault attacks.

The rest of the chapter is organized as follows: Section 7.2 provides an overview of the related work in this field, together with necessary background on coding theory. Section 7.3 defines the properties of codes with respect to fault attacks. Section 7.4 details our algorithm and provides estimated and simulated results on chosen codes. These results are further discussed in Sect. 7.5. Finally, Sect. 7.6 summarizes this chapter.

2 General Background

In this section we provide a necessary background on software encoding-based side-channel countermeasures and on coding theory necessary for developing a combined countermeasure. Section 7.2.1 overviews the related work in the field. Section 7.2.2 provides basic definitions that are used later in this chapter.

2.1 Related Work

After the paper by Hoogvorst et al. [10], who presented a method to extend the DPL to software implementations, several works were published in the area of software hiding schemes.

Rauzy et al. [14] developed a scheme that encodes the data by using bit-slicing, where only one bit of information is processed at a time. They claim that this kind of protection is 250 times more resistant to power analysis attacks compared to the unprotected implementation, while being 3 times slower. For testing, they used PRESENT cipher, running on an 8-bit microcontroller.

Chen et al. [6] proposed an encoding scheme that adds a complementary bit to each bit of the processed data, resulting in a constant Hamming weight code. Their countermeasure was implemented on a Prince cipher, using an 8-bit microcontroller.

Servant et al. [19] introduced a constant weight implementation for AES, by using a (3,6) code. To improve the performance, they split 8-bit variables into two 4 bit words and encode them separately. This implementation was also capable of detecting faults with 93.75% probability. Their implementation used a 16-bit microcontroller.

Maghrebi et al. [13] proposed an encoding scheme that differs from the previous proposals. For their case, they did not assume the Hamming weight leakage model for register bits; therefore, they concluded that balanced codes might not be the optimal ones to use generally. In their method, they first obtain the profile of a device to get a vector of register bit leakages. Then they estimate leakage values for each codeword and build a code by using codewords with the lowest leakage. Their algorithm selects the optimal code by ranking the codes based on the difference in power consumption between the codewords and on the power consumption variance. Our algorithm extends this idea by adding the variance of register bits in order to achieve better leakage characteristics and by adding conditions for error detection and correction.

In general, none of the previous schemes has been designed for fault resistance. Schemes proposed in [6, 14] have been analyzed with respect to fault attacks by Breier et al. [4], concluding that without additional modifications to assembly code, the probability of a successful fault attack is non-negligible. Therefore, in this chapter we focus on design and automated generation of fault tolerant and side-channel resistant coding schemes.

When it comes to combined countermeasures, in [17], Schneider et al. proposed a hardware countermeasure based on combining threshold implementation with linear codes. As stated in the paper, their proposal is not considered for software targets. In the execution process, there are multiple checking steps that protect the implementation against faults. However, in software, it would be easy to overcome such checks by multiple fault injections [21]. Also, it would be possible to inject faults that are impossible with hardware implementations, such as instruction skips [3].

This chapter provides the reader with the following information:

  • We specify theoretical bounds for encoding schemes with respect to fault attacks that are necessary to be taken into account when designing a fault resistant scheme.

  • We show how to automatically design a code that is capable of protecting the implementation against side-channel and fault attacks and we show trade-offs between these two resistances.

  • We adopt the ranking algorithm proposed in [13] and show how to improve it for constructing side-channel resistant codes with better properties—by ranking the codes according to the codeword with the highest leakage, and by calculating the register bit variance. We add the conditions for selecting the codes with the desired error-detection/correction capabilities in an automated way.

  • We analyze the codes constructed by the code generation algorithm—we calculate leakages, fault detection, and correction probabilities, and we simulate the assembly code implementing the codes on a general-purpose microcontroller.

2.2 Coding Theory Background

A binary code, denoted by \({{\mathcal C}}\), is a subset of the n-dimensional vector space over \({{\mathbb F}}_2\)-\({{\mathbb F}}_2^n\), where n is called the length of the code \({{\mathcal C}}\). Each element \({\boldsymbol c}\in {{\mathcal C}}\) is called a codeword in \({{\mathcal C}}\) and each element \({{\boldsymbol x}}\in {{\mathbb F}}_2^n\) is called a word [11, p. 6]. Take two codewords \({\boldsymbol c},{\boldsymbol c}'\in {{\mathcal C}}\), the Hamming distance between c and c , denoted by \({\mathrm {dis}}\left ({\boldsymbol c},{\boldsymbol c}'\right )\), is defined to be the number of places at which c and c differ [11, p. 9]. More precisely, if c = c 1 c 2c n and \({\boldsymbol c}'=c^{\prime }_1c^{\prime }_2\dots c^{\prime }_n\), then

$$\displaystyle \begin{aligned} {\mathrm{dis}}\left({\boldsymbol c},{\boldsymbol c}'\right)=\sum_{i=1}^n{\mathrm{dis}}\left(c_i,c^{\prime}_i\right), \end{aligned}$$

where c i and \(c^{\prime }_i\) are treated as binary words of length 1 and hence

$$\displaystyle \begin{aligned} {\mathrm{dis}}\left(c_i,c^{\prime}_i\right)= \begin{cases} 1&\text{if }c_i\neq c^{\prime}_i\\ 0&\text{if }c_i=c^{\prime}_i \end{cases}. \end{aligned}$$

Furthermore, for a binary code \({{\mathcal C}}\), the (minimum) distance of \({{\mathcal C}}\), denoted by \({\mathrm {dis}}\left ({{\mathcal C}}\right )\), is [11, p. 11]

$$\displaystyle \begin{aligned} {\mathrm{dis}}\left({{\mathcal C}}\right)=\min\{{\mathrm{dis}}\left({\boldsymbol c},{\boldsymbol c}'\right):{\boldsymbol c},{\boldsymbol c}'\in{{\mathcal C}},{\boldsymbol c}\neq{\boldsymbol c}'\}. \end{aligned}$$

Definition 7.1 ([7, p. 75])

For a binary code \({{\mathcal C}}\) of length n, \({\mathrm {dis}}\left ({{\mathcal C}}\right )=d\), let \(M=|{{\mathcal C}}|\) denote the number of codewords in \({{\mathcal C}}\). Then \({{\mathcal C}}\) is called an (n, M, d)-binary code.

This minimum distance of a binary code is closely related to the error-detection and error-correction capabilities of \({{\mathcal C}}\).

Definition 7.2 ([11, p. 12])

Let u be a positive integer. \({{\mathcal C}}\) is said to be u-error-detecting if, whenever there is at least one but at most u errors that occur in a codeword in \({{\mathcal C}}\), the resulting word is not in \({{\mathcal C}}\).

From the definition, it is easy to prove that \({{\mathcal C}}\) is u-error-detecting if and only if \({\mathrm {dis}}\left ({{\mathcal C}}\right )\geq u+1\) [11, p. 12]. A common decoding method that is used is nearest neighbor decoding, which decodes a word \({{\boldsymbol x}}\in {{\mathbb F}}_2^n\) to the codeword c x such that

$$\displaystyle \begin{aligned} {\mathrm{dis}}\left({{\boldsymbol x}},{\boldsymbol c}_{{\boldsymbol x}}\right)=\min_{{\boldsymbol c}\in{{\mathcal C}}}{\mathrm{dis}}\left({{\boldsymbol x}},{\boldsymbol c}\right). \end{aligned} $$
(7.1)

When there are more codewords c x satisfies (7.1), the incomplete decoding rule requires a retransmission [11, p. 10].

Definition 7.3 ([11, p. 13])

Let v be a positive integer. \({{\mathcal C}}\) is v-error correcting if minimum distance decoding with incomplete decoding rule is applied, v or fewer errors can be corrected.

Remark 7.1

\({{\mathcal C}}\) is v-error correcting if and only if \({\mathrm {dis}}\left (C\right )\geq 2v+1\) [11, p. 13].

Definition 7.4 ([8])

An (n, M, d)-binary code \({{\mathcal C}}\) is called an equidistant code if \(\forall {\boldsymbol c},{\boldsymbol c}'\in {{\mathcal C}},\ {\mathrm {dis}}\left ({\boldsymbol c},{\boldsymbol c}'\right )={\mathrm {dis}}\left ({{\mathcal C}}\right )\).

For our purpose, we will use binary code for protecting the underlying implementation.

We propose two choices of look-up tables:

  1. 1.

    Correction Table: This table will treat a word \({{\boldsymbol x}}\in {{\mathbb F}}_2^n\) the same as the codeword \({\boldsymbol c}_{{\boldsymbol x}}\in {{\mathcal C}}\) which satisfies \({\mathrm {dis}}\left ({\boldsymbol c}_{{\boldsymbol x}},{{\boldsymbol x}}\right )\leq \lfloor \frac {d-1}{2}\rfloor \), where d is the distance of \({{\mathcal C}}\). Note that this is equivalent to using bounded distance decoding [12, p. 36] and taking the bounded distance to be \(\lfloor \frac {d-1}{2}\rfloor \). To use this table we require that \({\mathrm {dis}}\left ({{\mathcal C}}\right )\geq 3\).

  2. 2.

    Detection Table: This is a normal look-up table that returns a null value when \({{\boldsymbol x}}\notin {{\mathcal C}}\) is accessed.

We will give a theoretical criterion to measure the bit flip fault resistant capability of a binary code when it is used as an encoding countermeasure against fault injection attacks in Sect. 7.3. Afterwards we propose three coding schemes. The encoding scheme will be simulated (and implemented) and evaluated in Sect. 7.4.

Let m be a positive integer such that 1 ≤ m ≤ n, where n is the code length.

Definition 7.5

An m-bit fault is a fault injected in the codeword that flips exactly m bits. We assume each bit has equal probability to be flipped.

Definition 7.6

When the fault is analyzed, we adopt the following terminologies:

  • Corrected: Fault is detected and corrected.

  • Null: Fault is detected and results into zero output.

  • Invalid: Fault is detected and results into an output that is not a codeword.

  • Valid: Fault is not detected and fault injection is successful, i.e., it results in the output of a valid but incorrect codeword.

3 Theoretical Analysis

In this section we will first give the theoretical analysis for the fault resistant capabilities of binary code in general. Then we propose two different coding schemes and analyze their fault resistant probabilities.

3.1 Correction Table

Definition 7.7

For an (n, M, d)-binary code \({{\mathcal C}}\) such that d ≥ 3, let

$$\displaystyle \begin{aligned} F_{{\boldsymbol c},m}:=\left\{{{\boldsymbol x}}\in{{\mathbb F}}_2^n:{\mathrm{dis}}\left({\boldsymbol c},{{\boldsymbol x}}\right)=m\text{ and }\exists{\boldsymbol c}'\in{{\mathcal C}}\text{ such that }{\mathrm{dis}}\left({{\boldsymbol x}},{\boldsymbol c}'\right)\leq\left\lfloor\frac{d-1}{2}\right\rfloor\right\}. \end{aligned}$$

Then

$$\displaystyle \begin{aligned} p_{m,(e)}:= \begin{cases} 1&m\leq\lfloor\frac{d-1}{2}\rfloor\\ 1-\frac{1}{M\binom{n}{m}}\sum_{{\boldsymbol c}\in{{\mathcal C}}}|F_{{\boldsymbol c},m}|&m>\lfloor\frac{d-1}{2}\rfloor \end{cases} \end{aligned} $$
(7.2)

is called the m-bit fault resistance probability with error correction for \({{\mathcal C}}\).

As mentioned earlier, when a Correction Table is used, it is equivalent to using bounded distance decoding. When \(m\leq \lfloor \frac {d-1}{2}\rfloor \) bits are flipped, by Remark 7.1, the error will be corrected and hence p m,(e) = 1. When \(m>\lfloor \frac {d-1}{2}\rfloor \) bits are flipped, the fault will be valid if the resulting word is at distance at most \(\lfloor \frac {d-1}{2}\rfloor \) from any codeword. Thus by Definition 7.6, 1 − p m,(e) gives the theoretical probability of a Valid fault and the bigger the p m,(e) is, the more resistant the binary code to m-bit fault. Furthermore, when m = 1, the fault will be corrected and most of the cases are expected to return Corrected.

Another interesting fault model is random fault, i.e., assuming there is an equal probability for m-bits fault to occur ∀1 ≤ m ≤ n. Taking this into account, we define the following.

Definition 7.8

For an (n, M, d)-binary code \({{\mathcal C}}\) such that d ≥ 3, let p m,(e) be its m-bit fault resistance probability with error for 1 ≤ m ≤ n, then

$$\displaystyle \begin{aligned} p_{{\mbox{rand}},(e)}:=\sum_{m=1}^n\frac{1}{n}p_{m,(e)}\end{aligned} $$

is called the overall resistance index with error correction for \({{\mathcal C}}\).

As suggested by the name, the bigger the p rand,(e) is, the more resistant the code \({{\mathcal C}}\) is to random faults.

3.2 Detection Table

Now we consider Detection Table.

Definition 7.9

For an (n, M, d)-binary code \({{\mathcal C}}\) such that d ≥ 2, let

$$\displaystyle \begin{aligned} S_m:=\sum_{{\boldsymbol c}\in{{\mathcal C}}}|\{{\boldsymbol c}'\in{{\mathcal C}}:{\mathrm{dis}}\left({\boldsymbol c}',{\boldsymbol c}\right)=m\}|. \end{aligned}$$

Then

$$\displaystyle \begin{aligned} p_m:=1-\frac{S_m}{M\binom{n}{m}} \end{aligned} $$
(7.3)

is called the m-bit fault resistance probability for \({{\mathcal C}}\).

When an m-bit fault is injected in the codeword, if the resulting word is not a codeword then the value will be set to Null. The only case when the fault is valid is when after m bits are flipped, the resulting word is still a codeword. Thus by Definition 7.6, 1 − p m gives the theoretical probability of a Valid fault. Hence, the bigger the p m, the better the m-fault resistance of the binary code.

Remark 7.2

When m ≤ d, no codeword is at distance m from each other and hence p m = 1.

Note that if S n = M, i.e., for each codeword \({\boldsymbol c}\in {{\mathcal C}}\), there exists a \({\boldsymbol c}'\in {{\mathcal C}}\) such that \({\mathrm {dis}}\left ({\boldsymbol c},{\boldsymbol c}'\right )=n\), then we have

$$\displaystyle \begin{aligned} p_n=1-\frac{M}{M\binom{n}{n}}=1-1=0. \end{aligned}$$

That means, for this code, n-bit fault will always be injected successfully. In view of this, we exclude these kind of codes from our selection (see Algorithm 1). In practice, n and M are the fixed known values, from Eq. (7.3), to get bigger p m the goal of choosing the code \({{\mathcal C}}\) is to make S m small. There are several ways of achieving this depending on the preference of the user:

  1. 1.

    For small values of m, make p m = 0: Choose code with a bigger minimum distance d, then p m will be 1 for more values of m. Of course, there is a limit for the minimum distance that can be achieved (see Table 7.1). This particular scheme will be discussed in Sect. 7.3.3, where it is called Detection Scheme.

    Table 7.1 Possible (n, M, d)-binary codes for n = 8, 9, 10, M = 16 and n = 8, M = 4
  2. 2.

    A certain m 0-bit fault resistance is desired: Choose code such that \(S_{m_0}=0\).

  3. 3.

    Sacrificing one m 0-bit fault resistance to achieve m-bit fault resistance for all other values of m ≠ m 0: This is possible by using equidistant codes. That is, take code such that \(|S_{m_0}|=M\). This particular scheme will be discussed in Sect. 7.3.3, where it is called Equidistant Detection Scheme.

  4. 4.

    Making all p m almost equally large: Choose \({{\mathcal C}}\) such that S m are similar for all m > d. Note that

    $$\displaystyle \begin{aligned} \sum_{m=d+1}^nS_m=2M \end{aligned}$$

    is always true.

Similar to last subsection, considering random fault, we define the following.

Algorithm 1: Ranking algorithm that chooses the code with the optimal leakage properties

Definition 7.10

For an (n, M, d)-binary code \({{\mathcal C}}\) such that d ≥ 2, let p m be its m-bit fault resistance probability for 1 ≤ m ≤ n, then

$$\displaystyle \begin{aligned} p_{{\mbox{rand}}}:=\sum_{m=1}^n\frac{1}{n}p_m \end{aligned}$$

is called the overall resistance index for \({{\mathcal C}}\).

Note that the bigger the p rand is, the more resistant the code \({{\mathcal C}}\) is to random faults.

Lemma 7.1

For an (n, M, d)-binary code \({{\mathcal C}}\) , if it is equidistant, then

$$\displaystyle \begin{aligned} p_m= \begin{cases} 1&m\neq d\\ 1-\frac{M-1}{\binom{n}{d}}&m=d \end{cases},\ \ \mathit{\mbox{ and }}\ \ p_{{\mathit{\mbox{rand}}}}=1-\frac{M-1}{\binom{n}{d}n}. \end{aligned}$$

3.3 Coding Schemes

Here we propose two different coding schemes:

  1. 1.

    Detection Scheme: Using binary code which has minimum distance at least 2.

  2. 2.

    Correction Scheme: Using binary code which has minimum distance at least 3 with error correction enabled look-up table.

Furthermore, as will be seen from the rest of this chapter, equidistant codes have different behaviors than codes that are not equidistant. Hence when equidistant codes are used, we emphasize the usage by referring to the schemes as “Equidistant detection scheme” and “Equidistant correction scheme,” respectively.

We will analyze the m-bit fault resistant probability (with error) as well as overall resistance index (with error) for each of them using (n, M, d) binary codes for n = 8, 9, 10 and M = 4, 16. We chose M = 4 because it is easy to analyze and explain, and M = 16 because it can encode one nibble of the data; therefore, it is usable in a practical scenario. To illustrate the usage of the schemes we refer the reader to Appendix 2 for calculations of the probabilities for some specific codes as examples.

First, we discuss the possible values of the minimum distance d. As is well known in coding theory, fixing the length of the code n and minimum distance d, M is upper bounded by certain value. This upper bound is tight for small values n and d and still open for a lot of other values [7, p. 247]. In particular, for n = 8, 9, 10 and different values of d we know the exact possible values of M. In return, the possible values of d are known when n, M are fixed. In Table 7.1 we list the possible minimum distances that can be achieved for n = 8, 9, 10 and M = 4 or 16. Note that the values are taken from [7, p. 247, 248] and [5].

For equidistant binary code, we have the following constraint on d.

Lemma 7.2

Let \({{\mathcal C}}\) be an (n, M, d) equidistant binary code such that M ≥ 3, then d is even.

Proof

Recall the Hamming weight of a word \({{\boldsymbol x}}\in {{\mathbb F}}_2^n\) denoted by wt(x) is defined to be the number of nonzero coordinates in x [11, p. 46]. And we have the following relation (see [11, Corollary 4.3.4 and Lemma 4.3.5]):

$$\displaystyle \begin{aligned} wt({{\boldsymbol x}})+wt({{\boldsymbol y}})\equiv{\mathrm{dis}}\left({{\boldsymbol x}},{{\boldsymbol y}}\right)~{\mathrm{mod}}~2. \end{aligned}$$

Take an (n, M, d) equidistant binary code \({{\mathcal C}}\) and any three distinct codewords \({{\boldsymbol x}},{{\boldsymbol y}},{{\boldsymbol z}}\in {{\mathcal C}}\), we have

$$\displaystyle \begin{aligned} {\mathrm{dis}}\left({{\boldsymbol x}},{{\boldsymbol y}}\right)+{\mathrm{dis}}\left({{\boldsymbol y}},{{\boldsymbol z}}\right)+{\mathrm{dis}}\left({{\boldsymbol z}},{{\boldsymbol x}}\right)\equiv2wt({{\boldsymbol x}})+2wt({{\boldsymbol y}})+2wt({{\boldsymbol z}})\equiv0~{\mathrm{mod}}~2. \end{aligned}$$

Hence, d cannot be odd.

Furthermore we have M ≤ n + 1[8]. Thus we will only consider (8, 4, 2) and (8, 4, 4) equidistant binary codes. The fact that such codes exist can be derived from [8].

4 Automated Generation and Evaluation of Codes

In this section, we will utilize the findings stated in Sect. 7.3 to design the algorithm that automatically generates codes with the optimal side-channel and fault detection properties for a given code length. First, we present the algorithm that finds the codes based on searching criteria in Sect. 7.4.1. Then we show properties of the codes that were produced by the algorithm in Sect. 7.4.2. To verify our theoretical results, we simulate fault injections into these codes, by using an automated fault simulator which will be explained in Sect. 7.4.3. Finally, we present and discuss the simulation results in Sect. 7.4.4.

4.1 Code Generation and Ranking Algorithm

When it comes to device leakage, it normally depends on the processed intermediate values. In [13], they proposed the first encoding scheme that assumed a stochastic leakage model over the Hamming weight model. In such model, leakage is formulated as follows:

$$\displaystyle \begin{aligned} T(x) = L(x) + \epsilon, \end{aligned} $$
(7.4)

where L is the leakage function mapping the deterministic intermediate value (x) processed in the register to its side-channel leakage, and 𝜖 is the (assumed) mean-free Gaussian noise. For 8-bit microcontroller case, we can specify this function as L(x) = α 0 + α 1 x 1 + ⋯α 8 x 8, where x i is the ith bit of the intermediate value, and α i is the ith bit weight leakage for specific register [16]. The α i values can be obtained by using the following equation:

$$\displaystyle \begin{aligned} \alpha = ({\mathbf{A}}^T\mathbf{A})^{-1}{\mathbf{A}}^T\mathbf{T}, \end{aligned} $$
(7.5)

where A is a matrix of intermediate values and T is a set of traces. After the device profiling which obtains the α values, we can use our ranking algorithm to select the optimal code with given inputs (Algorithm 1). Note that one can still use the Hamming weight model—for that case, α has to be defined as unity. In the following, we will explain how the algorithm works.

First, the inputs have to be specified—length (n), number of the codewords (M), minimum distance (d), and leakages of the register bits (α i). Depending on these values, the algorithm analyzes every possible set of M codewords that can be a potential code candidate. Lines 2–3 iterate over every combination of two codewords. Lines 4–6 test if the minimum distance condition is fulfilled. Then, lines 7–10 check, whether for each codeword there exists another codeword which is at distance n from it—if yes, we skip this set. This condition is necessary in order to get a code resistant against n-bit flip (we will detail such case in Sect. 7.5). Lines 11–13 compute the 3 values that are used in order to calculate the values for the whole code in the later phase: estimated power consumption for the codeword, stored in table A, estimated variance for bit leakages in the codeword, stored in table B, and the highest bit leakage value, stored in table C. Next, the codeword value is stored in the index table I.

Lines 14–16 use the values from tables A, B, C to compute the register leakage variance (μ S[x] denotes the mean leakage for a word S[x]), highest variance for bit leakages within registers, and value of the highest bit leakage within registers for the set S. These values are stored in tables D, E, F, respectively, and are used in the final evaluation.

The final evaluation is the last phase of the algorithm. First, it takes a subset of D with the best register leakage variance (μ S denotes the mean leakage for codewords in S). It narrows this subset to candidate codes with the lowest value of the highest bit leakage according to set E. From these, it chooses the code with the lowest bit leakage variance using table F.

4.2 Properties of Generated Codes

Codes with the best side-channel and fault resistance properties according to Algorithm 1 with 4 codewords and length 8 can be found in Table 7.2. Their detailed properties are stated in Table 7.3. More codes with cardinality 16 and various distances can be found in Appendix 1.

Table 7.2 Codes used in evaluation
Table 7.3 Side-channel and fault properties of the codes

For calculating the register variance, we follow the similar methodology as used in [13], together with their generated α values, but we improved their ranking algorithm by calculating the bit variances inside registers and by selecting the code which has the lowest leakage value for the highest leaking codeword. First part of Table 7.3 shows these three values, with the order of preference according to our ranking algorithm. Second part of the table shows bit fault resistance probabilities, denoted by p m for m-bit flips in the codeword, as well as overall resistance index, denoted by p rand for the code. The last part of the table shows the fault resistance probabilities with error correction, denoted by p m,(e), as well as overall resistance index with error correction, which is denoted by p rand,(e). We do not consider codes with distance 1 because such codes do not provide protection against 1-bit flips and therefore the fault protection would be very low. However, such codes can still be used for minimizing the side-channel leakage.

In general, if we aim for higher distance values, we get better detection and correction capabilities, but the side-channel leakage is higher as well. That is because if the distance is higher, it is more likely that the variance of leakage among the codewords is bigger. Also, we can see that equidistant codes have a constant detection probability of 1 except the case when number of bit flips is the same as the code distance. Moreover, if we sum up the probabilities of all the bit flip faults for non-equidistant codes, the overall detection probability is lower. However, the side-channel leakage of equidistant codes is more than 10 times higher compared to non-equidistant codes.

4.3 Automated Fault Simulation

The fault simulator we used was customized for the purpose of evaluating a microcontroller assembly table look-up implementation of the encoding schemes presented in this chapter. More details on this simulator are provided in [1]. This simulator helps us to extend the theoretical results to real-world results, where one has to use capabilities of microprocessors for computing the results.

A high-level overview is given in Fig. 7.1. There are three instructions in total—the first two LDI load the two operands into registers r0 and r1. Both of the operands are already encoded according to one of the coding schemes. The LPM instruction loads the data from the look-up table stored in the memory by using the values in r0 and r1, and the result is stored to register r2. This part works as a standard instruction set simulator. During each execution, a fault is injected into the code. For each type of fault, we test all the possible combinations of codewords, and we disturbed all the instructions in our code. We have tested the following fault models:

  • Bit faults: In this fault model, one to n bits in the destination register change its value to a complementary one.

  • Random byte faults: The random byte fault model changes random number of bits in the destination register.

  • Instruction skip: Instruction skip is a very powerful model that is capable of removing some countermeasures completely. We have tested a single instruction skip on all three instructions in the code.

  • Stuck-at fault: In this fault model, the value of the destination register changes to a certain value, usually to all zeroes. Therefore, we have tested this value in our simulator.

After the output is produced under a faulty condition, it is analyzed by the output checker, which decides on its classification. Outputs can be of four types (Corrected, Valid, Invalid, and Null), and these types are described in detail in Sect. 7.2.2.

Fig. 7.1
figure 1

Fault simulator operation overview

4.4 Simulated Results

Figure 7.2 shows plots for \({{\mathcal C}}_{8,4,min4}\) and \({{\mathcal C}}_{8,4,eq4}\), with and without the error correction. Instruction skip faults and stuck-at faults show zero success when attacking any of the generated codes. When it comes to bit flips, we can see that for better fault tolerance, one should not use the error-correction capabilities, since the properties of such codes allow changing the faulty codeword into another codeword, depending on the number of bit flips and minimum distance of the code. When deciding whether to choose an equidistant code or not, situation is the same as in Table 7.3—equidistant codes have slightly better fault detection properties, but worse side-channel leakage protection. Therefore, it depends on the implementer to choose a compromise between those two.

Fig. 7.2
figure 2

Simulation results for \({{\mathcal C}}_{8,4,eq4}\) with equidistant detection scheme in (a) and with equidistant correction scheme in (b); \({{\mathcal C}}_{8,4,min4}\) with detection scheme in (c) and with correction scheme in (d)

5 Discussion

First, we would like to explain the difference between the calculated results in Table 7.3 and the simulated results in Fig. 7.2 in equidistant code \({{\mathcal C}}_{8,4,min4}\). Table 7.3 shows theoretical results assuming that error happens before using the look-up table. However, in a real-world setting, fault can be injected at any point of the execution, including the table look-up, or even after obtaining the result from the table. That is also why there are Invalid faults, despite the table always outputs Null in case of being addressed by a word that does not correspond to any codeword. Because there are three instructions in the assembly code, faulting the destination register of the last one after returning the value from the table results into 1/3 of Invalid faults in all the cases except instruction skips.

To explain the condition on lines 7–8 of Algorithm 1, we can take the code with n = 8, M = 16, and d = 4 as an example. The simulation result for this code is stated in Fig. 7.3a. Full results for this code are then in Table 7.5 in the appendix. There are no codes with these parameters that could satisfy the abovementioned condition—all 480 codes that can be constructed have the property that if any codeword is faulted by n bit flip, it will change to other codeword. Therefore, such codes are not suitable for protecting implementations against fault attacks. For this reason, it is more suitable to use the \({{\mathcal C}}_{8,16,min3}\) code, stated in Fig. 7.3b, that does not suffer from such property.

Fig. 7.3
figure 3

Simulation results for the codes: (a) \({{\mathcal C}}_{8,16,min4}\) and (b) \({{\mathcal C}}_{8,16,min3}\)

Table 7.4 Codes generated by Algorithm 1
Table 7.5 Side-channel and fault properties of the codes from Table 7.4

To summarize the evaluation results, we point out the following findings:

  • Correction scheme is not suitable for fault tolerant implementations—while it can be helpful in non-adversary environments, where it can be statistically verified, how many bits are usually faulted, and therefore, a proper error-correction function can be specified, in adversary-based settings, one cannot estimate the attacker capabilities. In case of correcting 1-bit error, for example, attacker who can flip multiple bits will have a higher probability of producing Valid faults, compared to using detection scheme with the same code.

  • We can find an optimal code either from the fault tolerance perspective or from side-channel tolerance perspective—if we consider both, a compromise has to be made, depending on which attack is more likely to happen or how powerful an attacker can be in either setting. If we sacrifice the fault tolerance, we will normally get a code with distance 2 (e.g., side-channel resistant codes in [13] all have distance 2 and they are not equidistant codes); therefore, such codes will be vulnerable to 2-bit faults. On the other hand, by relaxing the power consumption variance condition, we will be able to choose codes with bigger distance, being able to resist higher number of bit faults.

  • Both types of resistances can be improved if we sacrifice the memory and choose codes with greater lengths.

  • Equidistant detection schemes is a good option in case the implementation can be protected against certain number of bit flips—because all the Valid faults are achieved only if the attacker flips the same number of bits as is the distance. However, this condition does not hold in case of equidistant correction schemes.

6 Chapter Summary

In this chapter, we provided a necessary background for constructing side-channel and fault attack resistant software encoding schemes. Current encoding schemes only cover side-channel resistance, and either do not discuss fault resistance or only state it as a side product of the construction, such as [19]. Our work defines theoretical bounds for fault detection and correction and provides an automated way to construct efficient codes that are capable of protecting the underlying computation against both physical attack classes.

To support our result with a practical case study, we designed an automated simulator to evaluate the table look-up operation under faulty conditions, by using a microcontroller assembly code. As expected, the codes constructed using the stated algorithm provide robust fault resistance, while keeping the side-channel leakage at the minimum.