1 Introduction

Digital information is handled by electronic devices, such as smartphones or servers. Some information, such as keys, is sensitive, in the sense that it shall remain confidential. In general, information exists in three states within devices: at rest, in transit, and in computation. The protection of information at rest can be ensured by on-chip encryption of the memories. The same technique applies to data in transit: the buses can be encrypted (or, in a lightweight way, scrambled). Therefore, the protection of information during computation is the main remaining issue. It is a real challenge, as computing devices inadvertently leak some information about the data they manipulate. In this context, three questions are of interest:

  1.

    How does an attacker best exploit the leaked information? The situation is similar to that of a decoding problem, and one aims at finding the optimal decoder.

  2.

    The designer (and the end user) aim at being protected against such attacks. Their goal is thus to weaken the side-channel. Randomization is one option, referred to as masking in the literature. We will illustrate that it can be seen as the use of a code to optimally mix some random bits into the computations, with the possibility to eventually get rid of this entropy, e.g., at the end of the computation. Another interesting usage of codes is to detect faults in circuits. This dual use of codes is of interest in general security settings, where attacks can be either passive or active. It is also very relevant when the circuit is booby-trapped with a Hardware Trojan Horse.

  3.

    It is interesting to know to what extent the circuit leakage itself favors attacks. In particular, we will investigate the effect of glitches as a threat to masking schemes.

Outline. We start with the adversarial strategies in Sect. 2. Protection strategies, especially masking, are presented in Sect. 3. We show how the circuit itself can contribute to the attack, through the analysis of glitches, in Sect. 4. Conclusions are drawn in Sect. 5. Finally, Appendix A gives some computational evidence that masking can be seen as reducing the signal-to-noise ratio, by increasing the noise.

2 Side-Channel Analysis as a Decoding Problem

In this section, we first describe the setup and the objective of the attacker. Second, we derive the attacker's optimal strategy in several different setups.

2.1 Setup

We assume the device manipulates some data known by the attacker, such as a plaintext or a ciphertext, called T. This data is mixed with some secret, say a key \(k^*\). The attacker manages to capture some noisy function of T and \(k^*\), and attempts to extract \(k^*\). For this purpose, he will enumerate (manageable) parts of the key (e.g., bytes), denoted k, and choose the key candidate \(\hat{k}\) which is the most likely. Therefore, the attack resembles a communication channel, where the input is \(k^*\) and the output is \(\hat{k}\). The attack is termed successful if \(\hat{k}=k^*\).

Two kinds of leakage models are realistic in practice:

  1. 1.

    direct probing model, where the attacker uses some kind of probes, each being able to measure one bit,

  2. 2.

    indirect measurement of an aggregated function of the bits, using for instance an electromagnetic probe.

These two ways of capturing the signal are, by nature, very different. They are illustrated in Fig. 1.

Fig. 1. Settings for side-channel analysis. In the probing model (a), a few bits (here, \(d=3\)) are measured with dedicated probes. In the bounded moments model (b), the attacker measures an integrated quantity of several bits.

The first one is noiseless. However, the bits in integrated circuits are nanometric, whereas probes are mesoscopic. Therefore, only a few such probes can be used simultaneously. The security parameter is thus linked to the ability of the attacker to recover some useful information out of d probes (where d is typically 1, 2, 3 or 4). Besides, probing requires physical access to the wires, which is challenging, since the contact may break the very bit to be probed. Such an attack is termed semi-invasive, since it leaves evidence that the circuit has been tampered with (an opening is necessary to insert the probes).

The second one is noisy and also leaks some function of the bits. Therefore, the attacker needs to capture more than one trace to extract some information. This is why we model, in the sequel, traces by random variables. By convention, variables are printed with capital letters, such as X, when designating a random variable, and with small letters, such as x, when designating a realization of the random variable. We also denote by Q the number of queries (i.e., of measurements), and by \(\mathbf {x}=(x_1,\ldots ,x_Q)\) the vector of measurements. This attack requires a statistical analysis, which in general consists in studying the leakage probability distribution, starting with the analysis of the leakage moments.

We will link the two models in the case of the RSM countermeasure (Sect. 3.5). The next Sect. 2.2 discusses the channel \(k^* \rightarrow \hat{k}\) in the second case.

2.2 Example of AWGN Channel

Fig. 2. Side-channel analysis as a communication channel

The key recovery setup is illustrated in Fig. 2 (see Fig. 1 in [24]). When the noise is Gaussian and independent from one measurement to the next, it is referred to as AWGN (additive white Gaussian noise). We write:

$$\begin{aligned} X = y(T,k^*) + N, \qquad \text {where } N\sim \mathcal {N}(0,\sigma ^2). \end{aligned}$$
(1)

The random variable \(y(T,k^*)\) is the aggregated leakage model, and N is the noise (independent of \(y(T,k^*)\)). Let n be the bitwidth of the key k and of the texts T. The function \(y:\mathbb {F}_2^n\times \mathbb {F}_2^n\rightarrow \mathbb {R}\) is, in practice, the composition of two functions \(y=\varphi \circ f\), where:

  • f is an algorithmic function called sensitive variable, such as \(f(T,k^*) = S(T\oplus k^*)\), where S is a substitution box, and

  • \(\varphi : \mathbb {F}_2^n\rightarrow \mathbb {R}\) accounts for the way the sensitive variable leaks, such as the Hamming weight \(\varphi : z\mapsto w_H(z)=\sum _{i=1}^n z_i\) (a simulation sketch of this model follows the list).
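
To make the model concrete, here is a minimal simulation sketch of Eq. (1) in Python (our own illustration, not taken from the references): a toy 8-bit substitution box stands in for S, and the leakage is the Hamming weight of \(S(T\oplus k^*)\) plus Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 8                                   # bitwidth of keys and texts
SBOX = rng.permutation(2 ** n)          # toy S-box standing in for, e.g., AES's

def hamming_weight(z):
    """phi(z) = w_H(z) for an array of 8-bit integers."""
    return np.unpackbits(np.asarray(z, dtype=np.uint8)[:, None], axis=1).sum(axis=1)

def leak(t, k, sigma):
    """X = y(T, k) + N, with y = phi o f and N ~ N(0, sigma^2), cf. Eq. (1)."""
    y = hamming_weight(SBOX[t ^ k])
    return y + rng.normal(0.0, sigma, size=t.shape)

k_star = 0x2B                           # the secret key byte
t = rng.integers(0, 2 ** n, size=1000)  # Q = 1000 known plaintext bytes
x = leak(t, k_star, sigma=2.0)          # the measured traces
```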

2.3 Absence of Countermeasures

The optimal distinguisher is the key guess \(\hat{k}\) which maximizes the success probability, that is the probability that \(\hat{k}\) is actually \(k^*\).

When there is no protection, all the uncertainty resides in the measurement noise. Thus, as the attacker knows T, he also knows \(Y=y(T,k)\) for every key guess k.

Theorem 1

([24, Theorem 4]). In the AWGN setup, the optimal distinguisher is demonstrated to be equal to:

$$\begin{aligned} \mathcal {D}_{opt}(\mathbf {x},\mathbf {t}) = {{\mathrm{argmin}}}_k \left\| \mathbf {x}-\mathbf {y}(\mathbf {t},k) \right\| _2^2 = {{\mathrm{argmax}}}_k \langle \mathbf {x} | \mathbf {y}(\mathbf {t},k) \rangle -\frac{1}{2} \left\| \mathbf {y}(\mathbf {t},k) \right\| _2^2 , \end{aligned}$$
(2)

where \(\left\| \cdot \right\| _2\) is the Euclidean norm and \(\langle \cdot | \cdot \rangle \) is the canonical scalar product.
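
As an illustration, the distinguisher of Eq. (2) can be run directly on simulated traces. The following self-contained sketch (ours, under the same toy leakage model as above) recovers the key byte by least-squares matching.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
SBOX = rng.permutation(256)             # the same toy S-box as above

def hw(z):
    return np.unpackbits(np.asarray(z, dtype=np.uint8)[:, None], axis=1).sum(axis=1)

# Simulate Q traces under the AWGN model of Eq. (1).
Q, k_star, sigma = 2000, 0x2B, 3.0
t = rng.integers(0, 256, size=Q)
x = hw(SBOX[t ^ k_star]) + rng.normal(0.0, sigma, size=Q)

# Optimal distinguisher of Eq. (2): minimize ||x - y(t, k)||_2^2 over k.
scores = [-np.sum((x - hw(SBOX[t ^ k])) ** 2) for k in range(256)]
k_hat = int(np.argmax(scores))
print(f"recovered key 0x{k_hat:02X}", k_hat == k_star)
```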

2.4 Multivariate and Multimodel Setting

In the multivariate and multimodel case, the attacker is able to collect:

  • not only one sample, but D (dimensionality) samples, and

  • each function of the bits (e.g., \(z\mapsto 1\), \(z\mapsto z_i\) for \(1\le i\le n\), but also any monomial \(z\mapsto \bigwedge _{i\in I} z_i\) where \(I\subseteq \{1,\ldots ,n\}\)) has a different contribution.

We call S the number of models, and \(\alpha \) the \(D\times S\) matrix of the leakages, such that Eq. (1) is generalized as:

$$\begin{aligned} \mathbf {X} = \alpha \mathbf {y}(\mathbf {T},k^*) + \mathbf {N}, \qquad \text {where } \mathbf {N}\sim \mathcal {N}(\mathbf {0},\varSigma ), \end{aligned}$$
(3)

where \(\mathbf {N}\) is multivariate normal with \(D\times D\) covariance matrix \(\varSigma \), and \(\mathbf {Y}=\mathbf {y}(\mathbf {T},k^*)\) is a set of S models (e.g., \(S=1\) if the leakage model is the Hamming weight, or \(S=n+1\) if there is a non-zero offset (such an offset is modeled by \(z\mapsto 1\)) and each bit \(1\le i\le n\) of the leakage model leaks differently). In this case also, boldface variables are vectorial (either multivariate or multimodel).

We have a generalization of Theorem 1:

Theorem 2

([7, Theorem 1]). Let us define \(\mathbf {x}'=\varSigma ^{-1/2}\mathbf {x}\) and \(\alpha '=\varSigma ^{-1/2}\alpha \). Then, in the multivariate and multimodel AWGN setup, the optimal distinguisher is demonstrated to be equal to:

$$\begin{aligned} \mathcal {D}^{D,S}_{opt}(\mathbf {x},\mathbf {t})&= {{\mathrm{argmin}}}_k \ \sum _{d=1}^D \Vert \mathbf {x}'_d-\alpha '_d\mathbf {y}(\mathbf {t},k) \Vert _2^2 \\&= {{\mathrm{argmax}}}_k \ {{\mathrm{tr}}}\left( \mathbf {x}' {\left( \alpha ' \mathbf {y}(\mathbf {t},k) \right) }^\mathsf {T}\right) -\frac{1}{2} \left\| \alpha ' \mathbf {y}(\mathbf {t},k) \right\| _F^2 , \end{aligned}$$

where \({{\mathrm{tr}}}\left( \cdot \right) \) is the trace operator of a square matrix and \(\Vert \cdot \Vert _F\) is the Frobenius norm of a (rectangular) matrix.
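
The whitening step is the only new ingredient compared to Theorem 1. A possible sketch (ours; it assumes, as the theorem does, that the attacker knows \(\alpha \) and \(\varSigma \)) with \(D=3\) samples and \(S=8\) single-bit models:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
D, S = 3, 8                              # 3 samples, 8 single-bit models
SBOX = rng.permutation(256)

def bit_models(z):
    """y(t, k) as an S-vector: the individual bits of the sensitive variable."""
    return np.unpackbits(np.asarray(z, dtype=np.uint8)[:, None], axis=1).astype(float)

alpha = rng.normal(0.0, 1.0, size=(D, S))        # per-sample, per-bit weights
A = rng.normal(0.0, 1.0, size=(D, D))
Sigma = A @ A.T + D * np.eye(D)                  # some SPD noise covariance

Q, k_star = 5000, 0x3C
t = rng.integers(0, 256, size=Q)
X = bit_models(SBOX[t ^ k_star]) @ alpha.T \
    + rng.multivariate_normal(np.zeros(D), Sigma, size=Q)

# Whitening: x' = Sigma^{-1/2} x and alpha' = Sigma^{-1/2} alpha.
w, V = np.linalg.eigh(Sigma)
W = V @ np.diag(w ** -0.5) @ V.T                 # Sigma^{-1/2}
Xp, alphap = X @ W.T, W @ alpha

# Least-squares matching in the whitened domain, as in Theorem 2.
scores = [-np.sum((Xp - bit_models(SBOX[t ^ k]) @ alphap.T) ** 2)
          for k in range(256)]
print(int(np.argmax(scores)) == k_star)
```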

2.5 Collision

In some situations, the attacker does not know the leakage function \(y=\varphi \circ f\), but knows that it is reused several times for different bytes, say \(L>1\) of them. We denote by \(x^{(\cdot )}=(x^{(1)},\ldots ,x^{(\ell )},\ldots ,x^{(L)})\) the L leakages. The optimal attack then consists in a collision attack where all the coefficients of the leakage function are regressed.

Theorem 3

([5, Theorem 2.5]). The optimal collision attack is:

$$\begin{aligned} \mathcal {D}^L_{opt}(\mathbf {x}^{(\cdot )}, \mathbf {t}^{(\cdot )})&= {{\mathrm{argmax}}}_{k^{(\cdot )}\in (\mathbb {F}_2^n)^L} \quad \sum _{u\in \mathbb {F}_2^n} \frac{\left( \sum _\ell \sum _{q / t_q^{(\ell )} \oplus k^{(\ell )} = u} \ x^{(\ell )}_q \right) ^2}{\sum _\ell \sum _{q / t_q^{(\ell )} \oplus k^{(\ell )} = u} \ 1} . \end{aligned}$$

Notice that, in general, this attack allows the recovery of \((L-1)\) n-bit keys when the collision involves L samples with identical leakage model; indeed, the criterion is invariant under a common shift of all the \(k^{(\ell )}\), so only the differences \(k^{(\ell )}\oplus k^{(1)}\) are identifiable.
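
A brute-force sketch of Theorem 3 (our own illustration) with \(L=2\) leakages on a small \(n=4\)-bit toy S-box; as noted above, only key differences matter, so \(k^{(1)}\) is fixed to 0:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(seed=2)
n, L, Q, sigma = 4, 2, 2000, 1.0
SBOX = rng.permutation(2 ** n)          # unknown leakage function, reused L times

def hw(z):
    return np.array([bin(v).count("1") for v in np.asarray(z).ravel()])

k_star = (0x0, 0x9)                     # w.l.o.g. the first key is 0
T = rng.integers(0, 2 ** n, size=(L, Q))
X = np.stack([hw(SBOX[T[l] ^ k_star[l]]) + rng.normal(0, sigma, Q)
              for l in range(L)])

def score(ks):
    """Criterion of Theorem 3 for the key tuple ks."""
    s = 0.0
    for u in range(2 ** n):
        sel = [(T[l] ^ ks[l]) == u for l in range(L)]
        num = sum(X[l][sel[l]].sum() for l in range(L))
        den = sum(int(sel[l].sum()) for l in range(L))
        if den:
            s += num ** 2 / den
    return s

best = max(((0,) + ks for ks in product(range(2 ** n), repeat=L - 1)), key=score)
print(best == k_star)
```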

2.6 General Setting, with Countermeasures

In general, the device defends itself by implementing protections. Masking is one of them. In the expression of y, in addition to T and k, another random variable M is introduced, called the mask, which is unknown to the attacker. It is usually assumed to be uniformly distributed.

Theorem 4

([8, Proposition 8]). The optimal attack in case of masking countermeasure is:

$$\begin{aligned} \mathcal {D}^{M;L}_{opt}(\mathbf {x}^{(\cdot )}, \mathbf {t}^{(\cdot )}) = {{\mathrm{argmax}}}_k \ \sum _{q=1}^Q \log \left\{ \sum _{m} \exp \Bigl \{ \sum _{d=1}^D \frac{1}{\sigma ^{{(d)}^2}}\bigl (x_q^{(d)}y_q^{(d)}-\frac{1}{2}{y_q^{(d)}}^2\bigr )\Bigr \} \right\} , \end{aligned}$$

assuming that the noise at each sample d is normal of variance \(\sigma ^{{(d)}^2}\).
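
A sketch of this criterion (ours), for a first-order Boolean masking on \(n=4\) bits: the two shares \(S(T\oplus k)\oplus M\) and \(M\) leak their Hamming weights at \(D=2\) samples, and the attacker marginalizes over the unknown mask with a numerically stabilized log-sum-exp.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n, Q, sigma = 4, 3000, 0.7
SBOX = rng.permutation(2 ** n)
MASKS = np.arange(2 ** n)               # uniform masks, unknown to the attacker

def hw(z):
    return np.array([bin(v).count("1") for v in np.asarray(z).ravel()])

k_star = 0xA
t = rng.integers(0, 2 ** n, size=Q)
m = rng.integers(0, 2 ** n, size=Q)
x1 = hw(SBOX[t ^ k_star] ^ m) + rng.normal(0, sigma, Q)   # leakage of share 1
x2 = hw(m) + rng.normal(0, sigma, Q)                      # leakage of share 2

Y2 = hw(MASKS).astype(float)            # model of share 2, for every mask guess

def score(k):
    """Criterion of Theorem 4, marginalizing the mask per query."""
    s = 0.0
    for q in range(Q):
        y1 = hw(SBOX[t[q] ^ k] ^ MASKS).astype(float)     # share-1 model per mask
        ll = (x1[q] * y1 - y1 ** 2 / 2 + x2[q] * Y2 - Y2 ** 2 / 2) / sigma ** 2
        s += ll.max() + np.log(np.exp(ll - ll.max()).sum())  # log-sum-exp
    return s

print(max(range(2 ** n), key=score) == k_star)
```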

2.7 Link Between Success Probability, SNR and Leakage Function

The optimal distinguishers given in the various scenarios (\(\mathcal {D}_\text {opt}\) for the nominal case in Sect. 2.3, \(\mathcal {D}^{D,S}_\text {opt}\) for the multivariate and multimodel case in Sect. 2.4, \(\mathcal {D}^L_\text {opt}\) for the collision case in Sect. 2.5, and \(\mathcal {D}^{M;L}_\text {opt}\) for the masked case in Sect. 2.6) allow the attacker to recover the secret key with the largest success rate (denoted SR), but do not help in predicting the number of traces needed to reach a given success rate (or vice-versa).

Such a relationship can easily be derived from the analysis of so-called first-order exponents [23]. Let us denote by \(\mathcal {A}_\text {opt}(\mathbf {x}, \mathbf {t}, k)\) the argument of the maximization in any of \(\mathcal {D}_\text {opt}\), \(\mathcal {D}^{D,S}_\text {opt}\), \(\mathcal {D}^L_\text {opt}\) or \(\mathcal {D}^{M;L}_\text {opt}\). We have:

Theorem 5

([23, Corollary 1]).

$$\begin{aligned} 1-\mathrm {SR}(\mathcal {D}) \approx e^{ - Q \cdot \mathrm {SE}(\mathcal {D}) } \end{aligned}$$
(4)

where the first-order success exponent \(\mathrm {SE}(\mathcal {D})\) is equal to:

$$\begin{aligned} \mathrm {SE}(\mathcal {D}) = \frac{1}{2}\ \min _{k\ne k^*}\ \frac{{{\mathrm{\mathbb {E}}}}\bigl ( \mathcal {A}_{opt}(\mathbf {x}, \mathbf {t}, k^*)-\mathcal {A}_{opt}(\mathbf {x}, \mathbf {t}, k)\bigr )^2}{\mathrm {Var}\bigl ( \mathcal {A}_{opt}(\mathbf {x}, \mathbf {t}, k^*)-\mathcal {A}_{opt}(\mathbf {x}, \mathbf {t}, k)\bigr )}. \end{aligned}$$
(5)

For the sake of introducing a signal-to-noise ratio, we rewrite Eq. (1) as:

$$\begin{aligned} X = \alpha y(T,k^*) + N, \text { where } {{\mathrm{\mathbb {E}}}}(y(T,k^*))=0, \mathrm {Var}(y(T,k^*))=1 \text { and } N\sim \mathcal {N}(0,\sigma ^2). \end{aligned}$$

Let us introduce generalized confusion coefficients [20]:

Definition 6

(General 2-way confusion coefficients [23, Definitions 8 and 10]). For \(k\ne k^{*}\) we define

$$\begin{aligned} \kappa (k^*,k)&= {{\mathrm{\mathbb {E}}}}\Bigl \{\Bigl (\frac{Y(k^{*})-Y(k)}{2}\Bigr )^2\Bigr \} , \end{aligned}$$
(6)
$$\begin{aligned} \kappa '(k^*,k)&= {{\mathrm{\mathbb {E}}}}\Bigl \{\Bigl (\frac{Y(k^{*})-Y(k)}{2}\Bigr )^4\Bigr \} . \end{aligned}$$
(7)

For example, for the optimal distinguisher in the nominal case, the success exponent expression is:

Lemma 7

(SE for the optimal distinguisher, [23, Proposition 5]). The success exponent for the optimal distinguisher takes the closed-form expression

$$\begin{aligned} \mathrm {SE}(\mathcal {D})&=\frac{1}{2}\ \min _{k\ne k^*} \; \frac{\alpha ^2\kappa ^2(k^*,k)}{\sigma ^2 \kappa (k^*,k) + \alpha ^2\bigl (\kappa '(k^*,k) - \kappa ^2(k^*,k)\bigr )}. \end{aligned}$$
(8)

For high noise (\(\sigma \gg \alpha \)), this closed-form expression simplifies into a simple equation:

Corollary 8

([23, Corollary 2]).

$$\begin{aligned} \mathrm {SE}(\mathcal {D}) \ \approx \ \frac{1}{2} \min _{k\ne k^*} {\frac{\alpha ^2 \kappa ^2(k^*,k)}{\sigma ^2 \kappa (k^*,k)}} \ =\ \frac{1}{2} \cdot \text {SNR} \cdot \min _{k\ne k^*} \kappa (k^*, k), \end{aligned}$$
(9)

where \(\mathrm {SNR}=\alpha ^2/\sigma ^2\) is the signal-to-noise ratio (see [6] for the definition of SNR in the multivariate case).
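
These quantities are directly computable. The sketch below (ours) evaluates the confusion coefficients of Eq. (6) for a toy 8-bit S-box under the normalized Hamming-weight model, then applies Eqs. (9) and (4) to predict the success rate at a given SNR.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
n = 8
SBOX = rng.permutation(2 ** n)
t = np.arange(2 ** n)                   # T uniform: enumerate all texts

def hw(z):
    return np.unpackbits(np.asarray(z, dtype=np.uint8)[:, None], axis=1).sum(axis=1)

def Y(k):
    """Normalized model y(T, k): zero mean, unit variance over T."""
    y = hw(SBOX[t ^ k]).astype(float)
    return (y - y.mean()) / y.std()

k_star = 0x42
kappa = np.array([np.mean(((Y(k_star) - Y(k)) / 2) ** 2)        # Eq. (6)
                  for k in range(2 ** n) if k != k_star])

snr = 0.1                               # alpha^2 / sigma^2 (high-noise regime)
se = 0.5 * snr * kappa.min()            # success exponent, Eq. (9)
for Q in (200, 1000, 5000):
    print(Q, "traces -> SR ~", 1 - np.exp(-Q * se))             # Eq. (4)
```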

3 Side-Channel Protection

Side-channel attacks threaten the security of cryptographic implementations. Protections against such attacks can be devised using coding theory. We illustrate in this section several techniques which randomize leakages with a view to decorrelating them from the internally manipulated data, and which (in some cases) also allow the detection of malicious fault injections.

3.1 Strategies to Thwart Side-Channel Attacks

As discussed in Sect. 2.7 (especially in (9)), an attack is all the more successful as the confusion (6) of the leakage function and the SNR are high. However, the impact of confusion is limited, since \(0\le \min _{k\ne k^*} \kappa (k^*, k)\le 1/2\) is bounded. Moreover, the defender cannot always change the algorithm or its leakage model, that is, \(\min _{k\ne k^*} \kappa (k^*, k)\) is fixed. Thus, the defender is better off focusing on the reduction of the SNR.

This can be achieved in two flavors:

  1.

    reduce the signal, as done in strategies aiming at flattening the leakage. This is easily achieved for some side-channels, such as timing: the execution time is made constant, e.g., by inserting dummy instructions or by balancing the code in each branch when the control flow forks. However, balancing an analogue quantity (such as power or electromagnetic field) is more challenging, if only because of process variations: two identical gates or structures behave differently after fabrication. For instance, this is the working principle of physically unclonable functions (PUFs). Therefore, the quality of the protection depends on the ability of the fabrication plant to produce reproducible patterns. This fact naturally limits the quality of the designer's work, hence does not encourage reaching very high levels of security. In this case, the second option is preferred;

  2.

    increase the noise, by resorting to some extra random variables independent of those involved in the leakage function. Obviously, some artificial noise can easily be produced: one practical example consists in running an algorithm known to produce a lot of leakage (such as an asymmetric engine, e.g., RSA) in parallel to the algorithm to protect. However, there remains the risk that the attacker manages, by a subtle placement of the probes, to limit or completely avoid the externally added noise; imagine an attacker with a very selective electromagnetic probe, placed over the targeted algorithm, which is micrometers apart from the noise source (RSA). Therefore, it sounds wiser to entangle the computation and the random variables. This is what is achieved by so-called masking schemes. Appendix A explains why masking reduces the SNR.

Notice that the two strategies are orthogonal, that is, it is beneficial to employ them at the same time. Still, in the sequel, we focus on masking, since it allows (at least in theory) increasing the noise to the maximal extent.

3.2 Masking Schemes

Masking schemes have been introduced to obfuscate the internals of a computation, with a view to making it more difficult to attack. The strategy in masking is based on randomization:

  • for data (e.g., in algorithms with constant-execution flow, such as AES), and

  • for operations (e.g., in algorithms where the sequence of operations leak some secrets, such as RSA).

In practice, a masking scheme consists in four algorithms, as depicted in Fig. 3.

Fig. 3. Masking schemes

Initially, the input data must be masked, thanks to a first algorithm. Second, the masked data is manipulated, so as to implement the intended cryptographic operation. Many techniques exist. One way to envision masking is to see all the operations making up the cryptographic function as look-up tables. In this case, the masked look-up tables can be implemented as [37, Table 1]:

  • new larger look-up tables, where the masking material is now part of the addressing strategy,

  • table recomputation specifically for the current mask, or

  • computation style which is able to operate on masked data.

After the operation has been computed, it can be necessary to refresh the masks. Indeed, if the value is intended to be used more than once, then some masks would be duplicated during the computation. It is thus wise to re-randomize the current masks. Finally, at the end of the computation, the masked data must be freed from its mask: hence a demasking step. The first three algorithms require entropy, whereas the last one destroys entropy.
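
As a minimal sketch of these four algorithms (our illustration, for a first-order Boolean masking of bytes; real schemes use more shares and must also handle non-linear operations):

```python
import secrets

N_BITS = 8

def mask(x):
    """Masking: split x into two Boolean shares; consumes entropy."""
    m = secrets.randbits(N_BITS)
    return (x ^ m, m)

def masked_xor(a, b):
    """Masked operation: linear operations act share-wise."""
    return (a[0] ^ b[0], a[1] ^ b[1])

def refresh(z):
    """Refresh: re-randomize the sharing without changing the hidden value."""
    r = secrets.randbits(N_BITS)
    return (z[0] ^ r, z[1] ^ r)

def demask(z):
    """Demasking: recombine the shares; destroys the masking entropy."""
    return z[0] ^ z[1]

a, b = mask(0x3A), mask(0xC5)
c = refresh(masked_xor(a, b))           # refresh before the value is reused
assert demask(c) == 0x3A ^ 0xC5
```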

3.3 Security of Masking Schemes

It is easy to measure the amount of entropy consumed by a masking scheme (see top of Fig. 3). However, this does not necessarily reflect its actual security level. Indeed, the entropy can be wasted, e.g., by being badly used: XORing random bits together reduces the entropy, while bringing no additional difficulty for the attacker.

The first attempt to measure security arises from [1, Definition 1]. The order is defined as the minimum number of intermediate values an attacker must collect to recover part of the secret. In this framework, the overall security is that of the weakest link.

Still, the exact definition of an intermediate variable is unclear. The difficulty arises from the fact that the designer would like to link the security to properties of his design. However, the intermediate variables encompass different notions depending on the refinement stage: after compilation, variables are mapped to internal resources. Thus, the granularity [1, Sect. 3] can change between the cryptographic algorithm specification, the source code, the machine code, and what is actually executed on the device.

Some early works considered intermediate values to be bits, as in private circuits [25, 26]. This makes sense for hardware circuits, for which (in general CMOS processes) an equipotential has only two licit values, that is, carries one bit. However, private circuits have been extended to software implementations (see e.g. [40]), where intermediate variables become bitvectors of the machine word length. But after considering some new threats, such as glitches, a new trend has consisted in looking back to bit-oriented masking. This is typically the case of threshold implementations [35], where the granularity is again the bit.

In this article, we are interested in the lowest possible level of security analysis, hence we consider that intermediate variables are bits.

3.4 Orthogonal Direct Sum Masking (ODSM), a Masking Scheme Based on Codes

We illustrate in this section several masking schemes, and show how they relate to coding theory.

We will show that the two security notions related to masking (probing and bounded-moment models) are equivalent when conducting analyses at bit-level. We model a circuit as a parallel composition of bits, seen as elements of \(\mathbb {F}_2\). For example, when there are n wires in the circuit, we model the circuit state as an element of \(\mathbb {F}_2^n\), that is, the Cartesian product \(\mathbb {F}_2 \times \ldots \times \mathbb {F}_2\).

At this stage, we use the following new notations. Let X be a k-bit information word to be concealed. Let Y be an \((n-k)\)-bit mask used to protect X. The protected variable is \(Z=X G + Y H\), where:

  • G is a \(k\times n\) generator matrix of a code,

  • H is an \((n-k)\times n\) generator matrix of a code of dual distance \(d+1\),

  • \(+\) is the bitwise addition in \(\mathbb {F}_2^n\), sometimes also denoted by \(\oplus \).

The random variable YH is the mask. In practice, the bits making up Z can be manipulated in any order, i.e., they can even be scheduled one after the other, as in a bitslice implementation. We call Z an encoding with codes, or ODSM [3]. A small executable sketch of this encoding is given below.
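
The sketch (our illustration, with smaller parameters than in the sequel: \(k=4\), \(n=8\)) takes for H a generator of the self-dual [8,4,4] extended Hamming code, whose dual distance 4 gives order \(d=3\), and for G a complement of it in \(\mathbb {F}_2^8\):

```python
import numpy as np

# Generator of the mask code D (self-dual [8,4,4] extended Hamming code).
H = np.array([[1,0,0,0,0,1,1,1],
              [0,1,0,0,1,0,1,1],
              [0,0,1,0,1,1,0,1],
              [0,0,0,1,1,1,1,0]], dtype=np.uint8)
# Generator of the information code C, a complement of D in F_2^8.
G = np.eye(8, dtype=np.uint8)[4:]

def encode(x, y):
    """ODSM encoding Z = xG + yH over F_2 (x: 4 info bits, y: 4 mask bits)."""
    return (x @ G + y @ H) % 2

def demask(z):
    """Recover x: here H is systematic on the first 4 bits, so y = z[:4]."""
    y = z[:4]
    return (z + y @ H)[4:] % 2

rng = np.random.default_rng(seed=5)
x = rng.integers(0, 2, size=4, dtype=np.uint8)
y = rng.integers(0, 2, size=4, dtype=np.uint8)   # fresh uniform mask bits
z = encode(x, y)
assert np.array_equal(demask(z), x)
```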

Then, we have the following two theorems.

Theorem 9

Encoding with codes is secure against probing of order d.

Proof

By definition of a code of dual distance \(d+1\), any tuple of at most d coordinates is uniformly distributed [9]. Thus, if the attacker probes up to d (inclusive) wires, the probed word, seen as an element of \(\mathbb {F}_2^d\), is perfectly masked. Therefore, no information on X can be recovered.    \(\square \)

Theorem 10

(Masking with codes is d-th order secure in the bounded-moments model). For every pseudo-Boolean function \(\psi :\mathbb {F}_2^n\rightarrow \mathbb {R}\) (leakage function, denoted \(y=\varphi \circ f\) in Sect. 2.2) of degree \(d^\circ (\psi ) \le d\), we have

$$\begin{aligned} \mathsf {Var}\bigl (\mathbb {E}(\psi (X G + Y H) \mid X)\bigr ) = 0 . \end{aligned}$$
(10)

Proof

Let \(\psi '\) be the indicator of the code generated by H. Since H has dual distance \(d+1\), we have, for all \(z\in \mathbb {F}_2^n\) such that \(0<w_H(z)\le d\), \(\hat{\psi '}(z)=0\), where \(\hat{\psi '}(z)=\sum _{z'\in \mathbb {F}_2^n} \psi '(z') (-1)^{z \cdot z'}\). Now, owing to Lemma 1 in [4], we also know that for all \(z\in \mathbb {F}_2^n\) such that \(w_H(z)>d^\circ (\psi )\), \(\hat{\psi }(z)=0\).

Now, we must prove that \(\mathsf {Var}\bigl (\mathbb {E}(\psi (X G + Y H) \mid X)\bigr ) = 0\), that is, that for all \(x\in \mathbb {F}_2^k\), the quantity \(\sum _{y\in \mathbb {F}_2^{n-k}} \psi (x G + y H) = \sum _{z\in \mathbb {F}_2^n} \psi (x G + z) \psi '(z) = (\psi \otimes \psi ')(x G)\) is the same, where \(\otimes \) denotes the convolution product.

Actually, we can prove more than that, namely that \(\psi \otimes \psi '\) is constant on the full \(\mathbb {F}_2^n\). This is equivalent to proving that \(\widehat{\psi \otimes \psi '} = \hat{\psi } \hat{\psi '}\) is equal to zero on \(\mathbb {F}_2^n {\setminus }\{0\}\). Indeed, let \(z\in \mathbb {F}_2^n\), \(z\ne 0\). If \(w_H(z)>d^\circ (\psi )\), then \(\hat{\psi }(z)=0\). And if \(w_H(z)\le d^\circ (\psi )\le d\), then \(\hat{\psi '}(z)=0\). So, in both cases, we have \(\hat{\psi }(z) \hat{\psi '}(z) = 0\).    \(\square \)

Notice that the function \(\psi : \mathbb {F}_2^n\rightarrow \mathbb {R}\) such that \(\psi (z)=\sum _{i=0}^{n-1} z_i 2^i\) has degree one. It is sometimes (abusively) referred to as the identity function. Obviously, if the attacker gets to know \(\psi (Z)\), then he can recover Z, hence deduce X by projection onto the code C generated by G. But this is not our security hypothesis. Our result from Theorem 10 (and in particular its Eq. (10)) is that the inter-class variance of \(\psi (Z)\) knowing X is equal to zero, for every \(\psi \) with \(d^\circ (\psi )\le d\).
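
Theorem 10 can be checked exhaustively on the small ODSM instance sketched in the previous section (again our own code): for every monomial \(\psi \) of degree at most \(d=3\), the class means \(\mathbb {E}(\psi (Z)\mid X=x)\) coincide, whereas some degree-4 monomials do depend on x.

```python
import numpy as np
from itertools import combinations, product

H = np.array([[1,0,0,0,0,1,1,1],
              [0,1,0,0,1,0,1,1],
              [0,0,1,0,1,1,0,1],
              [0,0,0,1,1,1,1,0]], dtype=np.uint8)
G = np.eye(8, dtype=np.uint8)[4:]

words = lambda k: list(product((0, 1), repeat=k))

def class_means(idx):
    """E(psi(Z) | X = x) for the monomial psi(z) = prod_{i in idx} z_i."""
    means = []
    for x in words(4):
        vals = [np.prod(((np.array(x) @ G + np.array(y) @ H) % 2)[list(idx)])
                for y in words(4)]
        means.append(np.mean(vals))
    return np.array(means)

for deg in (1, 2, 3):                    # Eq. (10) holds up to degree d = 3
    for idx in combinations(range(8), deg):
        assert class_means(idx).var() == 0.0

leaking = [idx for idx in combinations(range(8), 4)
           if class_means(idx).var() > 0]
print(len(leaking), "degree-4 monomials leak")   # the order is exactly 3
```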

In Eq. (10), the degree of \(\psi \) can be accounted for by two reasons:

  1.

    High-order leakage in \(y=\varphi \circ f\), owing to glitches (see Sect. 4), capacitive coupling, IR drop, etc. (refer to [18, Sect. 4.2]);

  2.

    The combination function applied by the attacker, which can be either multivariate (involving a product of shares) or monovariate (hence necessarily high-order with zero offset).

As another remark, we notice that, although it is not strictly mandatory, the randomized variable Z can be manipulated by subwords, a bit like in classical masking, where the subwords coincide with shares.

Let us give the example of the look-up table, in the case \(k=8\) and \(n=16\). We know that we can reach 4th-order security [4]. But we can decide not to manipulate Z as a whole, but to cut it into two parts, \(Z=(Z_H, Z_L)\), where \(Z_H, Z_L\in \mathbb {F}_2^8\). This cut is motivated by the match between the masking scheme and the machine architecture, where the basic register size may be 8 bits. Then, we also cut the T-table(s) into two tables, namely \(T_H\) and \(T_L\), both of 256 bytes. Algorithm 1 allows the evaluation of the T-table using bytes only, i.e., without ever placing \(Z_H\) and \(Z_L\) side by side for any data Z.

Algorithm 1. Byte-wise evaluation of the T-table through \(T_H\) and \(T_L\) (not reproduced here)
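
Since Algorithm 1 itself is not reproduced above, here is a purely hypothetical sketch of its spirit (our guess, not the authors' exact procedure): a table with 16-bit outputs is stored as two 256-byte halves \(T_H\) and \(T_L\), so that the two output halves are produced and consumed as separate bytes.

```python
import numpy as np

rng = np.random.default_rng(seed=6)
# A table mapping a byte to a 16-bit encoded word, stored as two 256-byte
# halves T_H and T_L (placeholder contents, for illustration only).
T16 = rng.integers(0, 2 ** 16, size=256, dtype=np.uint16)
T_H = (T16 >> 8).astype(np.uint8)
T_L = (T16 & 0xFF).astype(np.uint8)

def lookup(index: int) -> tuple[int, int]:
    """Evaluate the table byte-wise: Z_H and Z_L never meet in one register."""
    return int(T_H[index]), int(T_L[index])
```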

3.5 Illustration for Some Coding-Based Masking Schemes

In the previous section, we have shown with Theorems 9 and 10 that the two models (bit-level probing and bounded moments) are equivalent, which motivates considering the probing model at bit level (as opposed to word level, as done in many papers, to cite a few: [16, 19]). We give hereafter some examples of masking with codes at bit-level.

Perfect Masking. The masks \(M_1\), \(M_2\), etc., are chosen uniformly in \(\mathbb {F}_2^k\). We assume here that k divides n. It is possible to see perfect masking as a special case of ODSM [3], where:

$$\begin{aligned} G = \begin{pmatrix} I_k&0&\cdots&0 \end{pmatrix} \quad \text {and} \quad H = \begin{pmatrix} I_k&I_k&0&\cdots&0\\ I_k&0&I_k&\cdots&0\\ \vdots&&&\ddots&\\ I_k&0&0&\cdots&I_k \end{pmatrix}, \end{aligned}$$
(11)

where \(I_k\) is the \(k\times k\) identity matrix, so that \(Z=(X\oplus M_1\oplus \cdots \oplus M_{n/k-1}, M_1, \ldots , M_{n/k-1})\) is the usual Boolean sharing.

Rotating Substitution-Box Masking (RSM [32]). Let us illustrate RSM on \(n=8\) bits. The mask M is chosen uniformly in:

  • the set \(\mathcal {C}_0 = \{\mathtt {0x00}\}\) for no resistance,

  • the set \(\mathcal {C}_1 = \{\mathtt {0x00},\mathtt {0xff}\}\) for resistance to first-order attacks,

  • the set \(\mathcal {C}_2\), a non-linear code of length 8, size 12 and dual distance \(d^\perp _{\mathcal {C}_2}=3\), for resistance to second-order attacks,

  • the set \(\mathcal {C}_3\), a linear code of length 8, dimension 4 and dual distance \(d^\perp _{\mathcal {C}_3}=4\), for resistance to third-order attacks. This code is fully described in [15]. It is a self-dual code of parameters [8, 4, 4].

The case of \(\mathcal {C}_3\) is interesting since there are sixteen masks, hence (in hardware) the sixteen substitution boxes (S) of an algorithm such as AES can be implemented masked. When \(\varphi =w_H\) and \(Z=f(T, k^*)=S(T\oplus k^*)\), the leakage distributions of \(X=\varphi (Z\oplus M)\) are represented in Fig. 4.
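
These distributions can be reproduced by direct enumeration. The sketch below (ours) uses the extended Hamming code as a representative [8,4,4] self-dual code for \(\mathcal {C}_3\) (the exact code of [15] may differ by equivalence) and histograms \(X=w_H(Z\oplus M)\):

```python
import numpy as np
from itertools import product
from collections import Counter

Hgen = np.array([[1,0,0,0,0,1,1,1],
                 [0,1,0,0,1,0,1,1],
                 [0,0,1,0,1,1,0,1],
                 [0,0,0,1,1,1,1,0]], dtype=np.uint8)

def to_int(bits):
    return int("".join(str(b) for b in bits), 2)

# The 16 codewords of C_3, used as the RSM masks.
C3 = sorted({to_int(np.array(c) @ Hgen % 2) for c in product((0, 1), repeat=4)})
assert len(C3) == 16

def leakage_histogram(z):
    """Distribution of X = w_H(Z xor M) for M uniform over C_3."""
    return Counter(bin(z ^ m).count("1") for m in C3)

# The two extreme sensitive values give the same histogram (by symmetry),
# and all moments of order <= 3 are independent of Z (dual distance 4).
print(leakage_histogram(0x00))
print(leakage_histogram(0xFF))
```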

Fig. 4. Leakage distribution of RSM using \(M\sim \mathcal {U}(\mathcal {C}_3)\) on \(n=8\) bits

RSM involves a random index, namely the choice of the initial codeword in \(\mathcal {C}_d\), for a protection order d. This choice can be done in a leak-free manner by using a one-hot representation. In the case of \(\mathcal {C}_3\), sixteen such indices can be selected. The one-hot representation is given in Fig. 5. The random index is selected at random initially; then, from round to round, it is simply shifted.

Fig. 5. Example of one-hot counter (out of 16), used to designate the round index position

Leakage Squeezing (LS). In leakage squeezing, the shares are as in perfect masking, except that some bijective functions are applied to them, thereby mixing the bits better [10, 12, 13, 17].

Results. For the illustration of the bounded moment model, we use the Hamming weight leakage model. Notice that any other first-order leakage model would yield comparable results.

Also, we illustrate the leakage based on two extreme plaintexts, that is 0x00 and 0xff. However, in some situations, these two plaintexts lead to the same leakage (e.g., for symmetry reasons).

In all the presented schemes, security holds only provided there is no high-order leakage. Said differently, any high-order leakage degrades the effective security order. For instance, in the recap of Fig. 6, the indicated security order is the total attack order, which combines, multiplicatively, the contribution of the hardware and that of the operations carried out by the attacker (see Sect. 4 for an example). That is, poor hardware which couples bits facilitates attacks, by combining bits on behalf of the attacker.

Fig. 6. Security level of several masking schemes. The order \(d=1,2,3,4\) corresponds both to the number of probes (see Fig. 1(a)) used by the attacker and to the moment of leakage when the attacker uses an integrating probe (see Fig. 1(b))

3.6 Masking and Faults Detection

Codes are also suitable tools when side-channel leakage must be masked and faults must be detected at the same time. This need is general in cryptography, and has specific applications in thwarting Hardware Trojan Horses (HTH) [11, 33, 34]. Indeed, the activation part of an HTH is impeded by masking, whereas the payload part is caught red-handed by a detection code.

4 Leakage Model, and Glitches

The term glitch refers to non-functional transitions occurring in combinational logic. Glitches exist because combinational gates are non-synchronizing, i.e., they evaluate as soon as one input arrives. In terms of hardware description languages (VHDL, Verilog, etc.), they are modelled as processes where all inputs belong to the sensitivity list. Thus, for the vast majority of gates with many inputs, there is the possibility of a race between the inputs. Therefore, some gates can evaluate several times within one clock period. Actually, the deeper the combinational logic, the more likely it is that:

  • there is a large timing difference between the inputs, thereby generating new glitches, and

  • some input is already the output of a glitching gate, thereby amplifying the number of glitches.

It is known that glitches can defeat masking schemes [28,29,30]. Some masking schemes which somehow tolerate [21, 22, 35, 39] or avoid glitches [27, 31] have been put forward. However, the real negative effect of glitches on security is usually perceived in a qualitative manner.

Fig. 7. Example of a 4th-order glitch occurring upon the 4th-order conjunction \(\bigwedge _{i=1}^{4} x_i\)

Therefore, we would like to account quantitatively for the effect of glitches. Let us start with an illustrative example, provided in Fig. 7. The upper part of this figure represents a pipeline, where some combinational gates (AND gates and an XOR gate) form a partial netlist between two barriers of flip-flops (DFF gates). For the sake of this explanation, all the gates are assumed to have the same propagation time, namely 1 ns. The lower part of this figure gives the chronograms of the execution of this netlist, when initially all signals are set to zero. It appears that, owing to the difference of path lengths between the two inputs of the final XOR gate, this gate generates a glitch, which lasts 3 ns, between times 1 and 4 ns within the depicted clock period. The condition for this glitch to appear is the following: \(x_1 \wedge x_2 \wedge x_3 \wedge x_4\). This means that this glitch is a 4th-order leakage. So, if the masking scheme is only 3rd-order resistant, the setup of Fig. 7 generates a glitch which compromises the security through a 1st-order side-channel attack. That is, the circuit itself contributes to the attack, by combining the bits on behalf of the attacker.
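
The chronograms of Fig. 7 can be reproduced with a toy unit-delay simulation (our own sketch; the signal names are ours). The AND chain settles 3 ns after the inputs, while the direct input reaches the XOR after 1 ns, so the output exhibits a transient pulse exactly when the direct input has toggled and \(x_1\wedge x_2\wedge x_3\wedge x_4 = 1\), a 4th-order condition:

```python
from itertools import product

def xor_waveform(x0, x1, x2, x3, x4, horizon=6):
    """Output of the final XOR at t = 0 .. horizon-1 ns, all gates taking
    1 ns, all signals starting at 0 before the inputs switch at t = 0."""
    conj = x1 & x2 & x3 & x4
    wave = []
    for t in range(horizon):
        and_chain = conj if t >= 3 else 0        # three cascaded AND gates
        out = (x0 ^ and_chain) if t >= 1 else 0  # plus 1 ns for the XOR itself
        wave.append(out)
    return wave

def has_glitch(wave):
    """More than one transition within the clock period betrays a glitch."""
    return sum(a != b for a, b in zip(wave, wave[1:])) > 1

for x in product((0, 1), repeat=5):
    if has_glitch(xor_waveform(*x)):
        print(x, xor_waveform(*x))   # pulses iff x0 = 1 and x1&x2&x3&x4 = 1
```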

Fig. 8. Example of two 2nd-order glitches occurring upon the 2nd-order conjunctions \(\bigwedge _{i=1}^{2} x_i\) and \(\bigwedge _{i=3}^{4} x_i\)

Assume now a setup slightly simpler than that of Fig. 7, where there is only one AND gate behind the second input of the XOR gate. However, we assume such a pattern is present twice, once computing \(y_0 = x_0 \oplus (x_1 \wedge x_2)\), and a second time computing \(y_5 = x_5 \oplus (x_4 \wedge x_3)\). Then, in this case, depicted in Fig. 8, the leakage incurred by the glitches at the output of the XOR gates only combines two bits amongst the \(x_i\) (namely \(x_1\) and \(x_2\), and \(x_3\) and \(x_4\)). Therefore, it suffices for the attacker to conduct a 2nd-order attack on the glitchy traces to succeed in a \(2\times 2=4\)th-order attack on the masking scheme. The circuit and the attacker collaborate towards realizing a 4th-order attack: half of the combination is carried out by the circuit (\(x_1 \wedge x_2\) and \(x_3 \wedge x_4\)), while the other half is left to the attacker. Indeed, by raising the traces to the second power, the attacker obtains a term \((x_1 \wedge x_2) \times (x_3 \wedge x_4)\), which coincides with the leakage condition of Fig. 7, that is, \(\bigwedge _{i=1}^{4} x_i\).

To conclude on the complexification of the leakage model, we underline that it has a negative impact in two situations:

  • on low-entropy masking schemes, where the individual shares are not protected at the maximum order (see for instance RSM in Sect. 3.5), and

  • on any masking schemes, where shares interact between themselves by some combinational logic.

In those two cases, great care must be taken; tools such as the one described in [18] can help check whether the design is secure (or not).

5 Conclusion

Throughout this paper, we have seen how coding theory and side-channel analysis can benefit from one another, both for attack and for protection.

This is a nice example of cross fertilization between disciplines, in which Claude Carlet played a decisive role. Thanks to you, Claude!