1 Protection as a coding problem

Cryptographic algorithms are subject to attacks aiming at extracting their keys. When the adversary has access to the device, he is able to target the implementation of the cryptographic algorithm. Two attack paths customarily encountered are side-channel attacks (where the attacker reads some leakage from the implementation while it is running) and fault injection attacks (where the attacker modifies some intermediate variables inside the implementation).

In this article, we analyze algorithmic protections combining both side-channel prevention and fault injection detection. We survey security models for a given set of security parameters. In general, several such models can be defined, each addressing a particular kind of attacker. Proving the equivalence, or even a reduction, between security models is hard and is currently at the core of intensive research. However, when one focuses on a specific implementation operated in a given context, security notions can be clarified (e.g., shown to be equivalent).

In this paper, we focus on a protection against side-channel and fault injection attacks where the state of the cryptographic algorithm is encoded. From the security model and its parameters, we can thus derive desirable protection properties. Those result from a statistical analysis of the leakage in the presence of countermeasures.

Contributions

Regarding side-channel analysis protections, we identify that the inner product masking scheme [3, 58] is an instance of the leakage squeezing protection (see [18, 19, 37, 38] for 2 shares, and [16] for strictly more than 2 shares) using linear bijections (Section 6.4). The papers about the inner product masking scheme explain the engineering aspects related to the secure computation of finite field laws (addition and multiplication), whereas the papers about leakage squeezing highlight the accurate security level of the data representation. In this article, we bridge the gap by showing how to design inner product masking schemes with a quantifiable security level against bit-level side-channel attacks. For the first time, we relate the dual distance of the code used in the countermeasure, the mutual information between sensitive variable and leakage, and the attack success rate.

A second contribution of this paper is to analyze joint side-channel and fault attack protections. Specifically, we issue a warning: fault protections and side-channel protections combine nicely only if a careful analysis of their combined implementation is carried out (Section 6.3). Without such an analysis, the combination can be destructive security-wise.

Finally, we present a novel method to derive Boolean codes from codes over \(\mathbb {F}_{2^{k}}\) (Section 7).

Outline

The rest of the paper is structured as follows. We start in Section 2 by explaining how error correcting codes can provide a protection against both side-channel and fault injection attacks. Then, we review in Section 3 existing security models, and select some of them. Relevant security parameters are given in Section 4. The impact on the architecture of the protections is then analyzed in Section 5. Known constructions are revisited in Section 6, and a new one is given in Section 7. Our contributions beyond the state of the art are in Sections 6 and 7. Finally, conclusions are drawn in Section 8.

2 Introduction

2.1 Principle of coding

One purpose of codes is to detect (and correct) errors. Another purpose is to allow multiple users to use the same channel without interference, while maximizing the use of its capacity. In the context of protections against side-channel attacks, one user will be the cryptographic computation, and the others are noise sources, which aim at making the leakage passing through the channel as difficult as possible to interpret for an eavesdropper. Clearly, this dual use of codes allows one to kill two birds with one stone, which makes it appealing.

Let us insist more in detail on the protection against side-channel attacks. We denote by \(X{\in \mathbb {F}_{2}^{k}}\) (where \({\mathbb {F}_{2}^{k}}\) is \(\{0,1\}^{k}\) equipped with an additive group law, denoted by “⊕”) a sensitive variable that we intend to protect. It is usually a word, of bit length k. As usual in statistics, we shall use capital letters (such as X) for random variables, and small letters (such as x) for their realizations. The AES [45] block cipher will be our running example, because it is very widespread in the field and is well studied in academic papers. As AES is byte-oriented, we will consider that every variable can be represented by one or more bytes, hence k = 8 bits. In a cryptographic implementation, such a variable leaks some non-injective and noisy information. The non-injective function is denoted as \(\varphi :{\mathbb {F}_{2}^{k}}\to \mathbb {R}\), and N denotes the additive noise. Both are represented in Fig. 1, as well as the leakage \(X \rightsquigarrow \varphi (X)+N\).

Fig. 1 Leakage arising from the manipulation of variable X

Typically, φ is an extensive function (that is, it is the weighted sum over \(\mathbb {R}\) of its coordinates), such as the Hamming weight (denoted as \(w_{H}\)). This model is attested in many devices, such as smartcards, whose leakage is analyzed in Fig. 2 [31].
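
As a concrete illustration of this leakage model (our own sketch, not part of the original material; the byte width and the noise level are merely illustrative), the following Python snippet draws noisy Hamming-weight leakage samples as in Fig. 1:

import random

k = 8            # bit width of the sensitive variable (an AES byte)
sigma = 1.0      # standard deviation of the additive noise N (illustrative)

def hamming_weight(x: int) -> int:
    return bin(x).count("1")

def leak(x: int) -> float:
    """One noisy, non-injective leakage sample for the value x: phi(x) + N."""
    return hamming_weight(x) + random.gauss(0.0, sigma)

print([round(leak(0x3B), 2) for _ in range(3)])            # three noisy samples of one value
print(sorted({hamming_weight(x) for x in range(2**k)}))    # phi collapses 256 values onto 9 classes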

Fig. 2 Decomposition of the leakage per value of φ(X) = \(w_{H}(X)\) ∈ {0,…,8}

To protect against a straightforward analysis of the leakage, the masking countermeasure was initially presented by Thomas S. Messerges [40] as a two-step process:

  1. the algorithmic parameters (e.g., substitution boxes) are recomputed for a given, randomly chosen, mask by replacing each sensitive variable X by \((X \oplus \bigoplus _{i = 1}^{t} Y_{i}, Y_{1},Y_{2},{\ldots } ,Y_{t})\), where t is some security parameter and where the \(Y_{i}\)'s are chosen randomly and independently in the same additive group \({\mathbb {F}_{2}^{k}}\) as X, and then

  2. the masked algorithm is executed with masked plaintext and masked key as inputs.

While this strategy works well from a theoretical point of view, some criticisms have emerged over time:

  • from a security point of view, it has been noted that the recomputation stage (which does not depend on the key) leaks a lot of information, which can be combined in a constructive way with the leakage of the algorithm execution (where masked sensitive data, that is, data which depend on the key and on inputs/outputs known by the attacker, are used), and that such an attack path is hard to counter [14, 47, 55],

  • from a performance point of view, the recomputation takes a longer time (Footnote 1) than the execution of the recomputed algorithm, which obviously limits the advantage of such a solution.

Therefore, solutions which are free from the preliminary recomputation stage are favored in practice in many applications (except low-cost smartcards, which do not have enough resources to get rid of the security-wise weak table recomputation stage). Historically, the data and the masking material are processed together during the execution of the algorithm. For instance, in the case where t = 1 above, the computation is organized by duplicating the state: one half contains the masking material \(Y{\in \mathbb {F}_{2}^{k}}\), whereas the second half contains the masked data \(X\oplus Y{\in \mathbb {F}_{2}^{k}}\). This is illustrated in Fig. 3. It shall be noted that the leakage is now bi-variate, hence harder to exploit by the attacker, because the latter must combine two values to recover useful information. However, some implementations manage to handle X ⊕ Y and Y side-by-side; when the non-injective leakage function φ is extensive, we thus have φ(X ⊕ Y, Y) = φ(X ⊕ Y) + φ(Y), hence it is convenient to describe the masking as an encoding of (X, Y). Namely, the sensitive variable X is encoded by a linear code of generating matrix (I ∥ 0), the mask is encoded using the repetition code of generating matrix (I ∥ I), where I is the k × k identity matrix over \(\mathbb {F}_{2}\), and these two codewords are added together.

Fig. 3 Leakage arising from the manipulation of the masked variable \(X \oplus Y\) and of its (single) mask Y, here of the same size k = 8 as that of X

Thus, we see that protection against side-channel attacks can also be expressed in terms of codes. In the previous example, the two binary codes are:

  1. C, of parameters [n = 2k, k, 1], of generating matrix (I ∥ 0), and

  2. D, of parameters [n = 2k, k, 2], of generating matrix (I ∥ I),

such that any element \(Z=(X \oplus Y, Y) {\in \mathbb {F}_{2}^{n}}\) is the direct sum of the encoding of X through C and of Y through D.
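
As a sanity check of this encoding view (our own sketch; we assume row-vector-times-matrix products over \(\mathbb {F}_{2}\) and least-significant-bit-first bit vectors), the following Python snippet builds the generating matrices (I ∥ 0) and (I ∥ I) and verifies that x G ⊕ y H is exactly the pair (x ⊕ y, y):

import random

k = 8

def bits(x, width=k):
    return [(x >> i) & 1 for i in range(width)]            # LSB-first bit vector

def from_bits(v):
    return sum(b << i for i, b in enumerate(v))

def vec_mat(v, M):
    """Row vector times matrix, over F_2."""
    return [sum(v[i] & M[i][j] for i in range(len(v))) % 2 for j in range(len(M[0]))]

I = [[1 if i == j else 0 for j in range(k)] for i in range(k)]
O = [[0] * k for _ in range(k)]
G = [I[i] + O[i] for i in range(k)]     # (I || 0): code C, parameters [2k, k, 1]
H = [I[i] + I[i] for i in range(k)]     # (I || I): code D, parameters [2k, k, 2]

x = random.randrange(2**k)              # sensitive variable
y = random.randrange(2**k)              # uniformly drawn mask
z = [a ^ b for a, b in zip(vec_mat(bits(x), G), vec_mat(bits(y), H))]

# The codeword z is exactly the pair (x XOR y, y) handled by the implementation.
assert from_bits(z[:k]) == x ^ y and from_bits(z[k:]) == y
print("z =", z)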

This coding approach is well suited to the physical leakage as represented in Fig. 1, since side-channel analysis can be reinterpreted as a decoding problem: the aim of the attacker is the recovery of X after its encoding with masks and its transformation through the non-injective (owing to φ) and noisy (owing to N) leakage function. Notice that high-order masking schemes are examined in greater detail, under the coding-theoretic viewpoint, in Section 6.1.

However, we stress that the attacker has other means to recover information on Z (X after coding):

  • with a probing station, the attacker is able to read and/or write selected bits,

  • on multicore platforms running an operating system, cache hit/miss probing (Footnote 2) can be used as an attack, especially if data are used as addresses to memories.

2.2 Design choice

In the previous section, we took the example (recall Fig. 3) of a mask (Y) and information (X) of the same bitwidth. However, we have already seen that this can be more general, with a value of t larger than 1 in traditional masking. Typically, Y can be made smaller (as small as 1 bit, e.g., in [8, 40, 54]; for instance, in [40], a 1-bit masking is used to perform a Boolean-to-arithmetic transform). But also, for enhanced security, Y can be larger than X, especially in so-called high-order masking schemes [52] (Footnote 3). The general encoding of X using linear codes is as follows:

$$ Z = X G \oplus Y H , $$
(1)

where:

  • G is the generating matrix of a code of length n and of dimension k, and

  • H is the generating matrix of a code of length n and of dimension (n − k).

Typically, k = 8 bits for AES. For high-order protections, the masks are used as multiple k-bit words. Therefore, a typical setting is that (n − k) is a multiple of k.

However, probing attacks do target individual bits.

Therefore, we will consider two kinds of codes: codes over \(\mathbb {F}_{2^{k}}\) and codes over \(\mathbb {F}_{2}\). Notice that a code over \(\mathbb {F}_{2^{k}}\) can be expanded over \(\mathbb {F}_{2}\). In MAGMA [56], this operation can easily be realized on a code C using the command

$$\texttt{C\_expanded := SubfieldRepresentationCode(C, GF(2));}$$

If C has parameters \([n, k, d]_{2^{m}}\), then C_expanded has parameters \([mn, mk, d^{\prime}]_{2}\), where \(d^{\prime} \geq d\). A concrete example will be given in Section 7.
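
For readers without access to MAGMA, the expansion can be emulated as follows (our own Python sketch; it fixes the polynomial basis {1, X, …, X^{m−1}}, whereas MAGMA may pick another basis, yielding an equivalent code):

def gf_mul(a, b, poly, m):
    """Multiply a and b in GF(2^m) defined by the irreducible polynomial 'poly' (bitmask)."""
    r = 0
    for i in range(m):
        if (b >> i) & 1:
            r ^= a << i
    for i in range(2 * m - 2, m - 1, -1):      # reduce modulo poly
        if (r >> i) & 1:
            r ^= poly << (i - m)
    return r

def expand_generator(G, poly, m):
    """Expand a k x n generator over GF(2^m) into an (mk) x (mn) binary generator."""
    rows = []
    for grow in G:
        for j in range(m):                     # binary row corresponding to X^j times grow
            row = []
            for entry in grow:
                e = gf_mul(entry, 1 << j, poly, m)
                row += [(e >> i) & 1 for i in range(m)]
            rows.append(row)
    return rows

# Example: the [2, 1, 2] code over GF(16) = F_2[X]/(X^4 + X + 1) generated by
# (1 || 1 + X) expands into a binary [8, 4] code (its parameters are discussed in Section 7).
for row in expand_generator([[0b0001, 0b0011]], poly=0b10011, m=4):
    print(row)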

3 Security models

3.1 Side-channel analysis

Masking consists in adding some randomness to the computations, which forces the attacker to perform a high-order attack, a process during which several leakage sources are combined. In turn, if the leakage samples are noisy, the combination results in a so-called noise amplification.

There are mainly two security models:

  • Probing model (cf. Section 3), as in [32] (and many other papers [9, 23, 36, 50] which stem from this seminal publication).

  • Bounded moment model (cf. Section 3), initially defined in [44, Section 4], and then reintroduced in [6].

Probing model

The probing model states the following:

Definition 1 (Probing model)

A masking scheme is secure at order t in the probing model if no tuple of t intermediate variables depends on the secret.

An unprotected implementation is secure at order t = 0 (recall Fig. 2). A protected implementation is secure at order t ≥ 1.

When the algorithm handles bitvectors (elements of \({\mathbb {F}_{2}^{k}}\)), there is an ambiguity as to whether Definition 1 refers to intermediate variables as bitvectors or as individual bits. Thus, in the sequel, we shall clarify this point when talking about the probing model.

An automated method to test for the security of an algorithm with respect to this model at bitvector-level is given in [4, 5].

Bounded moment model

The bounded moment model states the following:

Definition 2 (Bounded moment model)

A masking scheme is secure at order t in the bounded moment model if no moment of degree t in the intermediate variables depends on the secret.

With this definition, we also have that an unprotected implementation is secure at order t = 0, while a protected implementation is secure at order t ≥ 1.

Definition 2 was initially introduced in the context of low entropy masking schemes (LEMS [8, 44]). The concept has been rediscovered independently [42, 43] by noting that attacks at many orders are possible, but that in usual situations (see the exception in [14]) the lowest-order attack is the most successful.

Reductions between leakage security models are studied in [6]. When the probing and bounded moment models are considered at the bit level, they are equivalent (see Theorems 9 and 10 of [30]).
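
As a small numerical illustration of Definition 2 (our own sketch, for first-order Boolean masking under a Hamming-weight leakage; the shares are taken noiseless since independent zero-mean noise does not change how these moments depend on the secret), one can check that the first-order moments are constant in x while a degree-2 mixed central moment is not:

k = 4
wH = lambda v: bin(v).count("1")

def moments(x):
    l1 = [wH(x ^ y) for y in range(2**k)]      # leakage of the masked data
    l2 = [wH(y) for y in range(2**k)]          # leakage of the mask
    m1, m2 = sum(l1) / 2**k, sum(l2) / 2**k
    mixed = sum((a - m1) * (b - m2) for a, b in zip(l1, l2)) / 2**k
    return m1, m2, mixed

for x in range(2**k):
    m1, m2, mixed = moments(x)
    print(f"x={x:2d}  E[L1]={m1}  E[L2]={m2}  E[(L1-E[L1])(L2-E[L2])]={mixed:+.2f}")
# E[L1] and E[L2] equal k/2 for every x (security at order 1), whereas the
# degree-2 mixed moment equals (k - 2*wH(x))/4, hence depends on the secret.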

3.2 Fault injection analysis

Protection of block ciphers with codes is a topic which has been studied for a long time [2, 35]. Basically, the security metric relates to the error detection probability of the code. However, we notice that few constructions have simultaneously tackled protection against both side-channel and fault injection analyses.

3.3 Combination of side-channel and fault injection

The ODSM countermeasure (analyzed in Section 6.3) is the first joint protection against side-channel and fault injection analyses. Carlet et al. noticed that masks are not sensitive by themselves (in that they do not leak information “standalone”); thus, faults can be detected by verifying that the masks have not been altered. This strategy is all the more relevant in first-order masking schemes, where security can be attained by reusing the same mask throughout the algorithm to be protected, hence the possibility to perform the integrity check at any arbitrarily chosen time while the algorithm unfolds. A careful warning is nonetheless formulated in Section 6.3.

4 Security parameters

Security at order one is nowadays considered insufficient for most practical operational environments. Indeed, many attacks defeating first-order protections (such as second-order correlation power analysis [41], collision-correlation [26], MIA [7], etc.) are known and well mastered by most adversaries.

Regarding fault injection attacks, it is known that very powerful exploitation techniques exist for block ciphers [33]. Thus, once again, detecting a single fault is insufficient.

However, it shall be noted that some palliative countermeasures are usually implemented in addition to the two abovementioned curative countermeasures. Palliative countermeasures typically consist in the artificial insertion of horizontal noise (desynchronized start dates, random interrupts, dummy (decoy) operations, etc.), which drastically raises the bar for succeeding in higher-order attacks.

To conclude, second-order resistance (t ≥ 2) to both side-channel analysis and fault injection is, in most cases, sufficient if well complemented by other protection means, in a construction known as defense in depth.

5 Architectural options for protection

Protecting against both side-channel and fault injection attacks can rely on the mask verification strategy of ODSM. More generally, one can imagine implementing orthogonal protections on top of each other. Both approaches have pros and cons:

  • encode then mask raises no security issue. Encoding increases the data bitwidth while making the encoded data redundant, thus reducing the density of the new sensitive variable; however, this is harmless, as masking remains secure even if the variable to protect is not uniformly distributed (which is the case here, since the sensitive variable belongs to a codebook). “Encode then mask” suffers more from performance than from security issues: after encoding, the sensitive variable encounters a blow-up in size corresponding to the inverse of the code rate, and after application of the side-channel protection, this overhead is multiplied by the order of the masking scheme.

  • mask then encode is thus more efficient in terms of variable size growth. But care must be taken regarding the way the redundancy is applied. Indeed, encoding with a linear code consists in computing some redundancy on top of the masked data, and this redundancy is a linear transformation. It is well known that some linear transformations combine destructively with the masking: e.g., adding all the shares together clearly unmasks the data completely. Besides, computing on an encoded state becomes non-obvious. The only proposal in this direction is [51], which handles security at the bit level.

In addition to those considerations, it shall be noticed that verification can be performed either at word or at bit level. Further investigations are left for future work.

6 Some known constructions revisited

In this section, we present several masking schemes through the prism of coding theory. We highlight the links between their definition and their security level. The perfect additive masking (Section 6.1) is typically word-oriented.

6.1 Perfect additive masking scheme [9]

In this section, we answer the question “why is masking an encoding?”. Actually, it is straightforward to show that share-based masking schemes (e.g., [27, 50]) consist in encodings. We denote by t the order of the masking, and by d = t + 1 the number of shares, which are elements of \({\mathbb {F}_{2}^{k}}\). The protection rationale is as follows:

  • \(x{\in \mathbb {F}_{2}^{k}}\) is the clear data,

  • \(y=(y_{1}, y_{2}, \ldots , y_{t})\in ({\mathbb {F}_{2}^{k}})^{t}\) are the masks, and the protected data is:

  • \(z=(x\oplus \bigoplus _{i = 1}^{t} y_{i}, y_{1}, y_{2}, {\ldots} , y_{t})\in ({\mathbb {F}_{2}^{k}})^{d}\).

So we have n = d × k = (t + 1) × k, and z = x Gy H, where

$$ G = \left( \begin{array}{ccccc} I_{k} & 0 & 0 & {\cdots} & 0 \end{array}\right) \quad \text{and} \quad H = \left( \begin{array}{ccccc} I_{k} & I_{k} & 0 & {\cdots} & 0 \\ I_{k} & 0 & I_{k} & {\cdots} & 0 \\ {\vdots} & {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ I_{k} & 0 & 0 & {\cdots} & I_{k} \end{array}\right). $$
(2)

Notice that \(G H^{T} \neq 0\), thus the codes generated by G and H are not complementary dual [22].
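
The encoding (2) can be checked mechanically; the sketch below (ours, with the same row-vector conventions as before and illustrative values of k and t) builds G and H for t masks of k bits, verifies that x G ⊕ y H is the share vector (x ⊕ y_1 ⊕ … ⊕ y_t, y_1, …, y_t), and confirms that G H^T ≠ 0:

import random

k, t = 4, 3
n = (t + 1) * k

def eye(m):
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]

def hcat(*blocks):
    return [sum((b[i] for b in blocks), []) for i in range(len(blocks[0]))]

def vec_mat(v, M):                             # row vector times matrix, over F_2
    return [sum(v[i] & M[i][j] for i in range(len(v))) % 2 for j in range(len(M[0]))]

bits = lambda x, w=k: [(x >> i) & 1 for i in range(w)]

I, O = eye(k), [[0] * k for _ in range(k)]
G = hcat(I, *([O] * t))                                        # (I 0 0 ... 0)
H = [row for i in range(t)
         for row in hcat(I, *[I if j == i else O for j in range(t)])]

x = random.randrange(2**k)
y = [random.randrange(2**k) for _ in range(t)]
z = [a ^ b for a, b in zip(vec_mat(bits(x), G),
                           vec_mat(sum((bits(yi) for yi in y), []), H))]

xor_masks = 0
for yi in y:
    xor_masks ^= yi
assert z[:k] == bits(x ^ xor_masks)                            # first share
assert all(z[(i + 1) * k:(i + 2) * k] == bits(y[i]) for i in range(t))

GHt = [[sum(G[i][l] & H[j][l] for l in range(n)) % 2 for j in range(t * k)]
       for i in range(k)]
print("G.H^T is nonzero:", any(any(row) for row in GHt))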

6.2 Inner product (IP [3]) masking scheme

The perfect masking scheme depicted in (2) presents intrinsic weaknesses (Footnote 4): for instance, it does not correspond to individual bit masking, but rather works word-wise. Individual bits in the shares can be attacked independently of one another, thereby enabling k parallel divide-and-conquer mono-bit strategies. Hence there is a need for a secondary, bit-oriented security objective. The publications dealing with inner product masking [3, 58] therefore attempt to shuffle the bits within one share. However, neither the choice to focus on one share nor the code selection method is currently discussed mathematically in the published literature. Still, there is a way to select linear functions in line with a security objective; this will be made clear in Section 6.4, devoted to the leakage squeezing countermeasure.

6.3 Orthogonal direct sum masking (ODSM [11])

ODSM refers to the masking scheme where the data to protect is represented as in (1).

Implementation

An example of an implementation of ODSM for AES (n = 2k = 16) is given in Appendix B of [12]. These indications shall suffice to reproduce a protected design. The only practical detail to be specified in the implementation is the computation of the linear transformation \(\mathbb {F}_{2}^{16}\to \mathbb {F}_{2}^{16}\). It can be implemented as a vector-matrix product, as explained in Algorithm 2 of [12]. Thus, there is no need to store a \(2^{16} \times 16\) table corresponding to all the values of \(xL\) for \(x\in \mathbb {F}_{2}^{16}\). Besides, if a table-based implementation is nonetheless desired, it is possible to split the \(2^{16} \times 16\) table into tables of size \(2^{16} \times 8\) (see Alg. 1 in [30]).

Security against fault injection attacks

In ODSM, the transformations (e.g., the calls to substitution boxes) are presented as operating in parallel on the whole state \(z{\in \mathbb {F}_{2}^{n}}\). It is described in [11] how linear and non-linear operations can be tabulated. When k | n, the ODSM scheme can be interpreted as a computation which can be carried out on k-bit words. In this case, one knows that linear operations can be safely implemented as the parallel composition of the linear operation on each of the d = n/k shares. However, this should not be understood as meaning that arbitrary linear operations can be securely implemented on the whole state. Indeed, for instance, the projection of z on x is linear and is clearly insecure. Therefore, care must be taken when implementing (linear) operations between shares. For instance, it is secure to project z on the code of generating matrix H in parallel to the code of generating matrix G, and thereby to obtain y, but the projection algorithm shall be scrutinized. Indeed, the following method:

  Step 1: z is projected on the code of generating matrix G in parallel to the code of generating matrix H to retrieve x,

  Step 2: then y is retrieved as the subtraction z − xG over \(\mathbb {F}_{2}\),

is not desirable from a security standpoint, owing to the demasking of the variable at Step 1.

6.4 Leakage squeezing

Background about leakage squeezing countermeasure

The leakage squeezing idea [28, 38] is based on masking but additionally applies some bijective functions (linear or non-linear) to the shares. A quantitative analysis of the gain in terms of the bounded moment leakage security model is carried out in [37], where it is found that the best bijections can be non-linear, in relation with non-linear codes (e.g., the Nordstrom-Robinson code for k = 8, n = 16). A comprehensive search of functions / codes suitable in the bounded moment leakage model is carried out in [20]. The suitable codes are nicknamed Complementary Information Set (CIS) codes. A survey of the usage of codes in the field of side-channel analysis is conducted by the first author in [15]. In parallel, an approach using cellular automata to build codes is proposed in [34]. Following [37], the conditions for building better codes are made precise in [18]. This journal paper also shows that the leakage squeezing countermeasure resists model imperfections. The mutual information between the sensitive data and the leakage is computed empirically in [18, 37]. In [19], it is demonstrated mathematically that this mutual information vanishes exponentially with the noise variance, at a rate which is proportional to the countermeasure's first non-constant moment (known as the HCI, or High-order Correlation Immunity). In this section, we relate bounded moments, mutual information, and attack success rate. That is, we show that attacks are all the more difficult as the first non-constant moment of the leakage is high, and that this behavior tracks that of the mutual information.

Finally, notice that leakage squeezing with more than two shares has already been studied, from a security perspective in [16] and from the code construction point of view in [25] (where HO-CIS codes are introduced as a generalization of CIS codes). The most recent survey on codes in side-channel analysis is available in [30]. In the rest of this section on leakage squeezing, we only detail leakage squeezing with two shares.

Definition and use-case

Leakage squeezing (LS) consists in masking \(X{\in \mathbb {F}_{2}^{k}}\) using representation

$$ (X\oplus Y, F(Y)), $$
(3)

where F is a bijective function from \({\mathbb {F}_{2}^{k}}\) to itself. The security order of LS is studied in [37]. We compare hereafter LS at various orders (we use indices, e.g., \(F_{t}\) for t = 0, 1, …, to distinguish the different functions F):

  • 0 (no protection): the leakage involves only one share, that is, X in the clear.

  • 1: \(F_{1} = \mathrm{Id}\), i.e., (3) represents perfect masking (\(F_{1}(y) = y\)).

  • 2: \(F_{2}\) is a linear function, whose matrix is:

    $$M_{2}= \left( \begin{array}{cccc} 0&0&1&1\\ 1&1&0&1\\ 0&1&1&1\\ 1&1&0&0 \end{array}\right). $$
  • 3: \(F_{3}\) is a linear function (which is optimal, cf. F3 in [18, Section 5.2]), of matrix:

    $$M_{3}= \left( \begin{array}{cccc} 0&1&1&1\\ 1&0&1&1\\ 1&1&0&1\\ 1&1&1&0 \end{array}\right). $$

Alternatively, the truth tables of the \(F_{t}\) are (in hexadecimal notation):

  • \(\{F_{1}(y), 0 \leq y < 2^{4}\}\) = {0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f},

  • \(\{F_{2}(y), 0 \leq y < 2^{4}\}\) = {0,a,e,4,5,f,b,1,7,d,9,3,2,8,c,6},

  • \(\{F_{3}(y), 0 \leq y < 2^{4}\}\) = {0,e,d,3,b,5,6,8,7,9,a,4,c,2,1,f}.

When the bijective functions \(F_{t}\) are linear, leakage squeezing is a special instance of ODSM, with the generating matrices G and H of (1) equal to \(G = (I_{k} \| 0)\) and \(H = (I_{k} \| M_{t})\).
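
The truth tables listed above can be recomputed from the matrices; the sketch below (ours) assumes the convention \(F_{t}(y) = M_{t} \cdot y\) with bits taken least significant first, which indeed reproduces the tables:

k = 4
M2 = [[0, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 1], [1, 1, 0, 0]]
M3 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]

def apply(M, y):
    """Matrix-vector product over F_2, bit i of y being the coefficient of 2^i."""
    yb = [(y >> j) & 1 for j in range(k)]
    out = [sum(M[i][j] & yb[j] for j in range(k)) % 2 for i in range(k)]
    return sum(b << i for i, b in enumerate(out))

print([format(apply(M2, y), "x") for y in range(2**k)])   # 0 a e 4 5 f b 1 7 d 9 3 2 8 c 6
print([format(apply(M3, y), "x") for y in range(2**k)])   # 0 e d 3 b 5 6 8 7 9 a 4 c 2 1 f
# With a linear F_t, the pair (x ^ y, F_t(y)) is a codeword of the ODSM encoding
# with G = (I_k || 0) and H = (I_k || M_t), up to the bit-ordering/transposition
# convention adopted here.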

Leakage distributions for leakage squeezing

The resulting (uni-variate) distributions in Hamming weight, when \(X \oplus Y\) and \(F_{t}(Y)\) are manipulated in parallel, are represented in Fig. 4a, when the noise has variance \(\sigma^{2} = 1\). The versions resisting attacks at orders 1, 2 and 3 are represented in Figs. 5a, 6a, and 7a. The scale is the same for all plots. It can be seen that:

  • distributions in Fig. 4a do not have the same mean,

  • distributions in Fig. 5a have the same mean (= n = 4), but not the same variance (informally, some distributions are wider than others),

  • distributions in Fig. 6a have the same mean (= n = 4), the same variance (= n/2 + \(\sigma^{2}\)), but not the same skewness (informally, some distributions bend to the right, others to the left, while the rest are straight),

  • distributions in Fig. 7a have the same mean (= n = 4), the same variance (= n/2 + \(\sigma^{2}\)), no skewness, but different kurtosis (informally, some distributions have smaller tails than others).

Fig. 4 Leakage distribution without countermeasure (n = 4). Due to the absence of masking, the leakage traces consist in Gaussian functions centered at 0, 1, …, k

Fig. 5 Leakage distribution with the leakage squeezing countermeasure at order 1 (k = 4, see Section 6.4)

Fig. 6 Leakage distribution with the leakage squeezing countermeasure at order 2 (k = 4, see Section 6.4)

Fig. 7 Leakage distribution with the leakage squeezing countermeasure at order 3 (k = 4, see Section 6.4)

Remark 1

The distributions represented in Figs. 4a, 5a, 6a, and 7a are the convolution of the \(2^{n}\) cosets of the weight distribution of the graph of the functions \(F_{t}\), for 0 ≤ t ≤ 3.

The bi-variate distributions, that is:

$$(w_{H}(X\oplus Y)+N, w_{H}(F_{t}(Y))+N^{\prime}) \in\mathbb{R}^{2}, \quad\text{where } N,N^{\prime}\sim\mathcal{N}(0,\sigma^{2}), $$

which represent the word-oriented case, are represented in Figs. 4b, 5b, 6b, and 7b. It can be seen that Fig. 4a is merely the value along the abscissa of the corresponding bi-variate distribution (somewhat artificial, since this implementation uses only one share; however, the representation allows one to contrast the leakage of unprotected and protected implementations) represented in Fig. 4b. Besides, Figs. 5a, 6a, and 7a are merely the diagonals of the corresponding bi-variate distributions represented in Figs. 5b, 6b, and 7b.

It is interesting to see that some distributions are identical for some values of x. We group identical distributions into classes, labeled in lexicographical order. In the uni-variate case (see (5)), the number of classes is respectively 5, 5, 6 and 3 (for bijections \(F_{0}\), \(F_{1}\), \(F_{2}\) and \(F_{3}\)), as represented in Table 1. The bi-variate case (see (4)) is represented in the bottom line of Table 1. In these tables, the layout is as given below:

$$\left[\begin{array}{cccc} \mathrm{x}=\texttt{0x0} & \mathrm{x}=\texttt{0x1} & \mathrm{x}=\texttt{0x2} & \mathrm{x}=\texttt{0x3} \\ \mathrm{x}=\texttt{0x4} & \mathrm{x}=\texttt{0x5} & \mathrm{x}=\texttt{0x6} & \mathrm{x}=\texttt{0x7} \\ \mathrm{x}=\texttt{0x8} & \mathrm{x}=\texttt{0x9} & \mathrm{x}=\texttt{0xa} & \mathrm{x}=\texttt{0xb} \\ \mathrm{x}=\texttt{0xc} & \mathrm{x}=\texttt{0xd} & \mathrm{x}=\texttt{0xe} & \mathrm{x}=\texttt{0xf} \end{array}\right].$$
Table 1 Classes of identical uni-variate and bi-variate distributions, in leakage squeezing with functions \(F_{t}\), for 0 ≤ t ≤ 3
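
The class counts of Table 1 can be recomputed: the noise being the same Gaussian for every x, two values of x yield identical leakage distributions if and only if the multisets of noiseless leakage values, taken over the uniform mask y, coincide. The following sketch (ours) counts the distinct multisets; the unprotected case \(F_{0}\) trivially gives one class per Hamming weight, i.e., 5 classes:

from collections import Counter

k = 4
wH = lambda v: bin(v).count("1")
F1 = list(range(16))
F2 = [0x0, 0xa, 0xe, 0x4, 0x5, 0xf, 0xb, 0x1, 0x7, 0xd, 0x9, 0x3, 0x2, 0x8, 0xc, 0x6]
F3 = [0x0, 0xe, 0xd, 0x3, 0xb, 0x5, 0x6, 0x8, 0x7, 0x9, 0xa, 0x4, 0xc, 0x2, 0x1, 0xf]

def nb_classes(F, bivariate):
    signatures = set()
    for x in range(2**k):
        if bivariate:           # multiset of noiseless pairs, as in (4)
            sig = Counter((wH(x ^ y), wH(F[y])) for y in range(2**k))
        else:                   # multiset of noiseless sums, as in (5)
            sig = Counter(wH(x ^ y) + wH(F[y]) for y in range(2**k))
        signatures.add(frozenset(sig.items()))
    return len(signatures)

for name, F in [("F1", F1), ("F2", F2), ("F3", F3)]:
    print(name, "uni-variate classes:", nb_classes(F, False),
                "bi-variate classes:", nb_classes(F, True))
# The counts should match the corresponding entries of Table 1.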

Uni- and bi-variate attacks on leakage squeezing

Attacks have been simulated, both in uni- and bi-variate settings. In the bi-variate setting, the attacker gets the leakages \(L^{(1)}\) and \(L^{(2)}\) corresponding to the masked data and to the mask (with the bijection \(F_{t}: {\mathbb {F}_{2}^{k}}{\to \mathbb {F}_{2}^{k}}\) applied to it):

$$\begin{array}{@{}rcl@{}} \left\{ \begin{array}{ll} L^{(1)} = w_{H}(T\oplus k^{*}\oplus Y)+N \\ L^{(2)} = w_{H}(F_{t}(Y))+N^{\prime} \end{array} \right., \end{array} $$
(4)

where \(X = T \oplus k^{*}\) is the sensitive variable (known text \(T{\in \mathbb {F}_{2}^{k}}\) and secret key \(k^{*}{\in \mathbb {F}_{2}^{k}}\)). Equation (4) is the application of the Hamming weight leakage model to the two shares of (3), followed by the addition of noise. In the uni-variate setting, the attacker gets only one leakage sample:

$$ L=L^{(1)}+L^{(2)}. $$
(5)

For the sake of a fair comparison, we focus on the optimal attack [13], that is, the attack which maximizes the probability of success in secret key recovery. Notice that the bijections used in the leakage squeezing countermeasure are assumed to be public information.

  • The uni-variate attack measures the sum \(l_{q}^{(1)}+l_{q}^{(2)}\) of the leakages for each trace q (1 ≤ q ≤ Q), hence the optimal attack estimates the correct key \(k^{*}\) as:

    $$ \hat{k^{*}} = \underset{k{\in\mathbb{F}_{2}^{k}}}{\text{argmax}} \sum\limits_{q = 1}^{Q} \log \sum\limits_{y{\in\mathbb{F}_{2}^{k}}} \exp\left( -\frac{1}{4\sigma^{2}} \left( l_{q}^{(1)}+l_{q}^{(2)} - w_{H}\left( t_{q} \oplus k \oplus y, F_{t}(y) \right) \right)^{2} \right). $$
    (6)
  • The bi-variate attack measures each leakage \(l_{q}^{(1)}\) and \(l_{q}^{(2)}\) independently, hence the optimal attack estimates the correct key \(k^{*}\) as:

    $$ \hat{k^{*}} = \underset{k{\in\mathbb{F}_{2}^{k}}}{\text{argmax}} \sum\limits_{q = 1}^{Q} \log \sum\limits_{y{\in\mathbb{F}_{2}^{k}}} \exp\left( -\frac{1}{2\sigma^{2}} \left( \left( l_{q}^{(1)} - w_{H}(t_{q} \oplus k \oplus y) \right)^{2} + \left( l_{q}^{(2)} - w_{H}(F_{t}(y)) \right)^{2} \right) \right). $$
    (7)

Notice that the noise in the uni-variate case is \(N+N^{\prime }\sim \mathcal {N}\left (0,2\sigma ^{2}\right )\), whereas in the bi-variate case it is \((N,N^{\prime })\sim \mathcal {N}\left (\left (\begin {array}{c}0\\0 \end {array}\right ),\sigma ^{2}\left (\begin {array}{cc}1&0\\0&1 \end {array}\right ) \right )\); this explains the different factors in the exponentials of expressions (6) and (7). Results in terms of success rate (\(\mathsf {SR}=\mathbb {P}(\hat {k^{*}}=k^{*})\)) are shown in Fig. 8 for σ = 1. The success rates are obtained after 100 independent attacks, and the estimation errors of the curves are superimposed (they correspond to ± the standard deviation of the \(\mathsf{SR}\) estimator; refer to [39] for their calculation).
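
For concreteness, the bi-variate optimal distinguisher (7) can be simulated as follows (our own Monte Carlo sketch; k, σ, Q and the choice of \(F_{3}\) are illustrative, and the Hamming-weight model (4) is assumed):

import numpy as np

rng = np.random.default_rng(0)
k, sigma, Q = 4, 1.0, 200
F3 = np.array([0x0, 0xe, 0xd, 0x3, 0xb, 0x5, 0x6, 0x8, 0x7, 0x9, 0xa, 0x4, 0xc, 0x2, 0x1, 0xf])
wH = np.array([bin(v).count("1") for v in range(2**k)])

k_star = int(rng.integers(2**k))                 # secret key
t_q = rng.integers(2**k, size=Q)                 # known texts
y_q = rng.integers(2**k, size=Q)                 # unknown masks
l1 = wH[t_q ^ k_star ^ y_q] + rng.normal(0, sigma, Q)    # leakage of the masked data
l2 = wH[F3[y_q]] + rng.normal(0, sigma, Q)               # leakage of the squeezed mask

y = np.arange(2**k)
scores = np.empty(2**k)
for key in range(2**k):
    # log sum_y exp( -((l1 - wH(t^key^y))^2 + (l2 - wH(F3(y)))^2) / (2 sigma^2) )
    e1 = (l1[:, None] - wH[t_q[:, None] ^ key ^ y]) ** 2
    e2 = (l2[:, None] - wH[F3[y]]) ** 2
    scores[key] = np.log(np.exp(-(e1 + e2) / (2 * sigma**2)).sum(axis=1)).sum()

# With enough traces Q, the argmax converges to k_star; the success rate as a
# function of Q is what Fig. 8 reports.
print("guessed key:", int(np.argmax(scores)), " true key:", k_star)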

Fig. 8 Attack result for σ = 1

It can be seen that the security increases (i.e., more and more traces are needed to recover the key) with the resistance order t, 0 ≤ t ≤ 3. Said differently, the larger the dual distance of the code generated by H, the more difficult the attack. Moreover, it appears clearly that bi-variate attacks are more successful than uni-variate attacks, since information is lost when the two leakages are summed up (recall that in Table 1, there are fewer classes in the uni-variate case than in the bi-variate case). This observation provides a quantitative explanation of why so-called zero-offset uni-variate attacks [57] are less efficient than their truly multi-variate counterparts. The two functions \(F_{2}\) and \(F_{3}\) seem to yield a similar security level, at least for the low noise σ = 1. However, when the noise increases, \(F_{3}\) increases the resistance of the implementation against attacks more than \(F_{2}\) does, as illustrated in Fig. 9 for σ = 2. One can see the “staggering” of the number of traces needed to succeed for a given order: the success rate curve without protection (\(F_{0}\)) is squared to obtain that with 1st-order protection (\(F_{1}\)). This fact has already been reported in [29].

Fig. 9 Attack result for σ = 2

Information leakage under leakage squeezing protection

Besides, we also evaluate the information leakage of the four levels of protection. We compute I(L;X), where \(X = T \oplus k^{*}\) is uniformly distributed over \({\mathbb {F}_{2}^{k}}\), and where the leakage L is the uni- or bi-variate leakage function.

  • In the uni-variate case, L is

    $$L^{(1)}+L^{(2)}=w_{H}(T\oplus k^{*}\oplus Y)+N+ w_{H}(F_{t}(Y))+N^{\prime}\in\mathbb{R};$$
  • In the bi-variate case, L is

    $$(L^{(1)},L^{(2)})=(w_{H}(T\oplus k^{*}\oplus Y)+N, w_{H}(F_{t}(Y))+N^{\prime})\in\mathbb{R}^{2},$$

where:

  • Y is uniformly distributed over \(\mathbb {F}_{2}^{n-k}\) (here \(={\mathbb {F}_{2}^{k}}\) since n = 2k) and

  • N and N′ are two additive (recall Fig. 1) and independent noises following a centered normal law with identical standard deviation σ.

The resulting mutual information values are given in Fig. 10 for uni- and bi-variate attacks.
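
The uni-variate mutual information can be estimated numerically; the sketch below (ours, a rough Monte Carlo estimator with illustrative parameters) uses I(L;X) = E[log₂ p(L|X) − log₂ p(L)], where p(l|x) is a uniform mixture over the mask of Gaussians of variance \(2\sigma^{2}\):

import numpy as np

rng = np.random.default_rng(1)
k, sigma, trials = 4, 1.0, 20_000
F3 = np.array([0x0, 0xe, 0xd, 0x3, 0xb, 0x5, 0x6, 0x8, 0x7, 0x9, 0xa, 0x4, 0xc, 0x2, 0x1, 0xf])
wH = np.array([bin(v).count("1") for v in range(2**k)])

y = np.arange(2**k)
means = wH[np.arange(2**k)[:, None] ^ y] + wH[F3[y]]    # noiseless leakage for every (x, y)

x_s = rng.integers(2**k, size=trials)
y_s = rng.integers(2**k, size=trials)
l_s = means[x_s, y_s] + rng.normal(0, np.sqrt(2) * sigma, trials)   # N + N' has variance 2 sigma^2

var = 2 * sigma**2
dens = np.exp(-(l_s[:, None, None] - means[None, :, :]) ** 2 / (2 * var))
p_l_given_x = dens.mean(axis=2) / np.sqrt(2 * np.pi * var)          # shape (trials, 2^k)
p_l = p_l_given_x.mean(axis=1)                                      # X is uniform
mi_bits = np.mean(np.log2(p_l_given_x[np.arange(trials), x_s] / p_l))
print(f"I(L;X) ~ {mi_bits:.4f} bit for sigma = {sigma}")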

Fig. 10 Mutual information analysis for uni- and bi-variate leakage

Interestingly, in the presence of large noise, the mutual information decreases affinely with σ (in log-log scale), with a slope − 2(t + 1) = − 2d, where:

  • t is the protection order, and

  • d = t + 1 is the minimum order of a successful attack (also denoted High-order Correlation Immunity or HCI in [19, Def. 2]).

This observation is demonstrated mathematically in [19, Theorem 1].

It can thus be stated that LS with bijection \(F_{t}\) has the same bit-level security with two shares as perfect masking with t + 1 = d shares.

Link between attacks and information leakage

It is demonstrated in [29] that, for additive distinguishers, there exists a coefficient E, called the first-order exponent, such that the number of traces q needed to extract the key \(k^{*}\) with success probability \(\mathsf{SR}\) satisfies:

$$ 1-\mathsf{SR} \approx \exp-q \cdot E , $$
(8)

where ≈ is an asymptotic equivalence (detailed in [29]).

It is hinted in [53] that such an exponent is proportional to the mutual information (as computed above in Section 6.4), provided the distinguisher is the template attack. Now, with perfect profiling, the template attack [24] coincides with the optimal distinguisher [13]. Thus, we aim in this section at validating this finding on the bi-variate (but higher-order secure) LS masking scheme, using the distinguishers (6) and (7) for the optimal attacks.

To validate experimentally that the first-order exponent E involved in (8) is proportional to I(L;X), we extract the number of measurements q needed to recover the key \(k^{*}\) with probability \(\mathsf{SR}\) = 80%. Figures 8 and 9 allow us to extract 16 values of the number of traces. The corresponding values (for σ = 1 and 2) of I(L;X) are extracted from Fig. 10. These data are represented in Fig. 11.

Fig. 11 Number of traces to extract the secret key \(k^{*}\) with probability 80% as a function of the mutual information between the leakage and the sensitive variable \(X = T \oplus k^{*}\)

In the case of the bi-variate attack, it is possible to fit these data by linear regression with the relationship:

$$\log(q) = \log(-\log(0.80)) - \log(\alpha \cdot I(L;X)) , $$

where the estimated parameter α is found to be α = 0.0396361 ± 0.0002805. This good fit with a law where q × I(L;X) is constant (curve of slope − 1 in Fig. 11) validates that, in the case of the optimal attack on bi-variate leakage, (8) holds with a first-order exponent equal to:

$$ E = \alpha \cdot I(L;X). $$
(9)

We underline that this result holds, surprisingly, for 4 different leakage scenarios (corresponding to the use of \(F_{t}\), t ∈ {0,1,2,3}). Therefore, the relationship (9) seems very general. In the uni-variate case, on the contrary, the law (8) fits less nicely (the interpolated slope of the curve is > − 1), which might be explained by the fact that summing the two leakages is an ad hoc operation.

7 A new construction for leakage squeezing and inner product masking

7.1 Rationale of the construction

In this section, we explain how to obtain CIS (and HO-CIS) codes based on code expansion from \(\mathbb {F}_{2^{k}}\) to \(\mathbb {F}_{2}\). The procedure is the following:

  1. Decide on a number m of shares of k-bit words.

  2. Search for a code of parameters \([m,1]_{2^{k}}\) of minimum distance m; basically, this means that the generating matrix of the code consists in a line of m non-zero values of \(\mathbb {F}_{2^{k}}\).

  3. Expand the code over \(\mathbb {F}_{2}\). This code is HO-CIS of order m (see Proposition 2.2 of [1]). The protection order of this code in bit-level security models is equal to its minimum distance minus one.

  4. Write its generating matrix as \((M_{1} || M_{2} || {\ldots} || M_{m})\), where the \(M_{i}\) (1 ≤ i ≤ m) are k × k matrices with entries in \(\mathbb {F}_{2}\).

  5. As explained in [17, Appendix B, page 21], the linear function to apply to share i (1 ≤ i ≤ m) is generated by the matrix \(M_{i}^{-1}\).

7.2 Example on a non-optimal code

We detail in Listing 1 an example of a masking with m = 2 shares of k = 4 bits, which has order 1 security at word level and order 2 security at the bit level (Footnote 5).

Listing 1 Example of program for obtaining codes (in MAGMA [56] language)

The construction of this code, as needed in leakage squeezing, is explained below:

  1. we opt for a leakage squeezing with a mask Y of bitwidth equal to that of the data X to protect,

  2. the code C5 over \(\mathbb {F}_{2^{k}}\) is generated by (1 || 1 + X), where \(\mathbb {F}_{2^{4}}=\mathbb {F}_{2}[X]/(1+X+X^{4})\), hence has parameters \([2,1,2]_{16}\),

  3. this code is expanded into C5_expanded, which has parameters \([8,4,3]_{2}\). Therefore, the security order at the bit level is 3 − 1 = 2, which is one more than that of C5 at the word level,

  4. the generating matrix of C5_expanded is written in systematic form as

     $$(I_{4} || \text{\texttt{M5\_inv}}) = \left( \begin{array}{ccccccccc} 1&0&0&0& &1&1&0&0\\ 0&1&0&0& &0&1&1&0\\ 0&0&1&0& &0&0&1&1\\ 0&0&0&1& &1&1&0&1 \end{array}\right)$$

  5. the sought linear bijection has matrix \(M5 = \text {\texttt {M5\_inv}}^{-1} = \left (\begin {array}{cccc} 0&1&1&1\\ 1&1&1&1\\ 1&0&1&1\\ 1&0&0&1 \end {array}\right )\).

The resulting linear function has truth table:

$$\{F_{5}(y), 0\leq y<2^{4}\} = \{\texttt{0}, \texttt{e}, \texttt{3}, \texttt{d}, \texttt{7}, \texttt{9}, \texttt{4}, \texttt{a}, \texttt{f}, \texttt{1}, \texttt{c}, \texttt{2}, \texttt{8}, \texttt{6}, \texttt{b}, \texttt{5}\}.$$
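
The construction above can be checked mechanically. The Python sketch below (ours; it fixes the polynomial basis and the least-significant-bit-first, matrix-acting-on-the-left conventions) re-derives the expanded generator, its minimum distance, M5_inv, M5 and the truth table of \(F_{5}\):

from itertools import product

m, poly = 4, 0b10011                             # GF(16) = F_2[X]/(1 + X + X^4)

def gf_mul(a, b):
    r = 0
    for i in range(m):
        if (b >> i) & 1:
            r ^= a << i
    for i in range(2 * m - 2, m - 1, -1):        # reduce modulo 1 + X + X^4
        if (r >> i) & 1:
            r ^= poly << (i - m)
    return r

# Binary expansion of the generator (1 || 1 + X): row j is the expansion of X^j * (1, 1 + X).
G = [[(gf_mul(1, 1 << j) >> i) & 1 for i in range(m)] +
     [(gf_mul(0b0011, 1 << j) >> i) & 1 for i in range(m)] for j in range(m)]

def codeword(msg):
    return [sum(msg[j] & G[j][c] for j in range(m)) % 2 for c in range(2 * m)]
dmin = min(sum(codeword(msg)) for msg in product([0, 1], repeat=m) if any(msg))
print(f"C5_expanded has parameters [8, 4, {dmin}]_2")

M5_inv = [row[m:] for row in G]                  # G is already systematic: (I_4 || M5_inv)

def inverse_gf2(M):
    n = len(M)
    A = [row[:] + [1 if i == j else 0 for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c])
        A[c], A[p] = A[p], A[c]
        for r in range(n):
            if r != c and A[r][c]:
                A[r] = [a ^ b for a, b in zip(A[r], A[c])]
    return [row[n:] for row in A]

M5 = inverse_gf2(M5_inv)
F5 = [sum(((sum(M5[i][j] & ((y >> j) & 1) for j in range(m)) % 2) << i) for i in range(m))
      for y in range(2**m)]
print("F5 truth table:", [format(v, "x") for v in F5])   # 0 e 3 d 7 9 4 a f 1 c 2 8 6 b 5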

7.3 Example on an optimal code

In the case k = 4 and n = 2k = 8, we detail how the (self-dual, and unique of type II; Footnote 6) code with parameters \([8,4,4]_{2}\) and generating matrix

$$ \left( \begin{array}{ccccccccc} 1&0&0&0& &0&1&1&1\\ 0&1&0&0& &1&0&1&1\\ 0&0&1&0& &1&1&0&1\\ 0&0&0&1& &1&1&1&0 \end{array}\right) $$
(10)

can be derived from a linear code of parameters \([2,1]_{16}\) over \(\mathbb {F}_{2^{4}}\). We aim to find an irreducible polynomial P(X) such that \(\mathbb {F}_{16}=\mathbb {F}_{2}[X]/P(X)\) and the code over \(\mathbb {F}_{16}\) is generated by \((1 \,||\, X + X^{2} + X^{3})\). Equivalently, this means that:

  1. \(\deg(P(X)) = 4\), and

  2. the three following conditions are met:

     • \(X(X + X^{2} + X^{3}) = 1 + X^{2} + X^{3} \bmod P(X)\),

     • \(X^{2}(X + X^{2} + X^{3}) = 1 + X + X^{3} \bmod P(X)\),

     • \(X^{3}(X + X^{2} + X^{3}) = 1 + X + X^{2} \bmod P(X)\).

The three conditions in item 2 mean that P(X) is a divisor of \(\gcd (X(X+X^{2}+X^{3})+ 1+X^{2}+X^{3},\ X^{2}(X+X^{2}+X^{3})+ 1+X+X^{3},\ X^{3}(X+X^{2}+X^{3})+ 1+X+X^{2})=\gcd (1+X^{4},\ 1+X+X^{4}+X^{5},\ 1+X+X^{2}+X^{4}+X^{5}+X^{6})\).

Now, it happens indeed that \(1 + X^{4}\) divides both \(1 + X + X^{4} + X^{5}\) and \(1 + X + X^{2} + X^{4} + X^{5} + X^{6}\). However, \(\mathbb {F}_{2}[X]/(1+X^{4})\) is a ring, and not a field, because \(1 + X^{4} = (1 + X)^{4}\). Thus, the code is defined over a ring, which has never been analyzed this way in masking, and which opens the door to interesting perspectives.
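
These divisibility claims are easy to verify mechanically; in the small sketch below (ours), binary polynomials are represented as integer bitmasks (bit i is the coefficient of X^i):

def pdivmod(a, b):
    """Euclidean division of binary polynomials encoded as bitmasks."""
    q = 0
    while a and a.bit_length() >= b.bit_length():
        shift = a.bit_length() - b.bit_length()
        q ^= 1 << shift
        a ^= b << shift
    return q, a

def pgcd(a, b):
    while b:
        a, b = b, pdivmod(a, b)[1]
    return a

p1 = 0b0010001       # 1 + X^4
p2 = 0b0110011       # 1 + X + X^4 + X^5
p3 = 0b1110111       # 1 + X + X^2 + X^4 + X^5 + X^6
print(pdivmod(p2, p1)[1] == 0, pdivmod(p3, p1)[1] == 0)    # True True: 1 + X^4 divides both
print(bin(pgcd(pgcd(p1, p2), p3)))                         # 0b10001, i.e., 1 + X^4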

We have tested all 4! permutations of the four last columns in the \([8,4,4]_{2}\) code generating matrix (10), without success in lifting this code to a \([2,1,2]_{16}\) code over \(\mathbb {F}_{16}\). Actually, this code can be obtained as a binary image (through a Gray map) of a [2,1] code over \(\mathbb {F}_{4}[X]/(X^{2})\) or over \(\mathbb {F}_{2}[X]/(X^{4})\), and of a code of parameters [4,2] over \(\mathbb {F}_{2}[X]/(X^{2})\).

8 Conclusions

In this paper, we have studied the statistical distribution of uni- and multi-variate leakage functions of cryptographic implementations when countermeasures against fault injection and side-channel analyses are applied.

We have observed that the previously studied protection called leakage squeezing is a generalization of the variants of perfect masking, including inner product masking (see Table 2 for a recap). In this sense, we extend the work [48], which explores the links between inner product masking and direct sum masking. We show that leakage squeezing is all the more secure as its underlying code has a high minimum distance d. Side-channel attacks of orders 1,…,t = d − 1 are impossible. We relate this value to the slope − 2d of the mutual information between sensitive variables and the leakage (represented in log-log scale), and show that, in practice, the success rate of attacks is lower when d is large. We also reveal that the uni-variate mutual information (resp. uni-variate attack success probability) is smaller than its bi-variate counterpart.

Table 2 Hierarchy between masking styles

Finally, we propose a new method to build (HO-)CIS codes based on code expansion, which is promising in the context of leakage squeezing, i.e., when a high level of security is required both at the word and at the bit level.