1 Context About the Protection Problem

Sensitive computations must be secured against non-invasive attacks, which attempt to correlate the leakage of some operations with a hypothetical model [13].

A protection against this threat is the masking [13, Chap. 9] countermeasure. Masking consists in changing the intermediate variables of the computation into randomized versions, which are thus decorrelated from the unprotected variables, each being a potential target for a side-channel attack. Indeed, leakage localization tests have been put forward to pinpoint leakages [4, 10, 14]; hence masking schemes must have a full coverage.

In particular, in the field of symmetrical encryption, the mainstream approach consists in the use of purely Boolean structures. This is the case of widely used (and standardized) ciphers, such as DES [15] and AES [16]. In both these examples, the operations (except for simple data move, which will simply amplify the leakage signal-to-noise ratio, but not create new leakage model) simply consist in XORs and in look-up tables (LUTs). It is therefore natural to restrict to so-called Boolean masking, where the only operation in terms of masking is the XOR. This is innately compatible with the functional XOR operations, and LUTs are also easy to protect, e.g., using recomputation [20].

1.1 Nature of Computation

Complex computations, like cryptographic algorithms, can be seen as a sequence of parallel basic operations. In hardware, the basic operations are logic gates. They take as input a small amount of bits (e.g., 1, 2, 3, up to maximum 6 usually) and yield another bit according to a Boolean function. In software, the basic operations are instructions. They take a couple of operands and yield another one, computed through a deterministic function. For instance, xor r1 r2 r3 computes the exclusive-or of 32-bit registers r2 and r3 and saves the result r2 xor r3 in r1.

1.2 Combinational or Sequential?

We notice the spatio-temporal nature of computation: many bits are manipulated in parallel, and sometimes, the computation has loops. In hardware, this is called an iterative implementation, for instance, the instantiation of gates for one round of AES-128, which are evaluated ten times. In software, interestingly, the loops (in the underlying hardware, i.e., the processor) cannot be unrolled, because all operations pass through a unique integer unit. Typically, the accumulated register is updated again and again at each instruction (at least, for most instructions—with the exception of instructions where the result is saved directly in memory). Footnote 1

1.3 Outline of the Article

The rest of this article is structured as follows: First of all, the mainstream masking algorithms are presented and challenged from a security standpoint in Sect. 2. Second, masking is analyzed vis-à-vis technological and logical high-order leakage function in Sect. 3. A new definition of realistic security objectives is given in Sect. 4. Eventually, this chapter is concluded in Sect. 5.

2 Definition of t-Order Security by ISW [12]

2.1 Revisiting of ISW Definition

Ishai, Sahai, and Wagner (ISW) define as stateless circuits [12, Sec. 2] fully combinational circuits, which are acyclic. This models fully unrolled hardware implementations, which are usually prohibitive in cost, but all the same implemented in some contexts like for extremely low latency or for some sort of side-channel resistance [3]. On the other hand, stateful circuits are circuits with loops.

Definition 1 (t-Order Security, as per ISW)

According to ISW, a circuit is t-order secure if the attacker can get no information about the unprotected variables by:

  1. 1.

    Using t probes at arbitrary positions in one loop Footnote 2

  2. 2.

    But with the possibility to re-probe (and even to move the t probes) for free at every loop Footnote 3

2.2 Ill-Formed Definition

The problem with this definition of probing security is that it does not characterize well some countermeasures. For instance, perfect masking [5] of order t can be either secure or insecure depending on the implementation. The secure implementation is the parallel one (see Fig. 1a), because indeed t probes are needed, whereas the insecure implementation is the sequential one (see Fig. 1b), because one probe suffices to read out one bit of the t shares over t clock cycles (even without changing the probe location). This kind of linear readout attack scenario is illustrated in Fig. 2; once the probe tip is installed on top on the register bit to probe, the consecutive values held in this DFF (Data Flip-Flop) are read out non-invasively, one after the other. This definition shall not be confused with the more general (word) probing model where whole words can be read out at once.

Fig. 1
figure 1figure 1

Parallel (a) vs. sequential (b) implementation of perfect masking for t = 2

Fig. 2
figure 2figure 2

Linear readout of one bit in a word of n = 32 bits, using one prober tip

Admittedly (this was the hypothesis in seminal paper of ISW [12]), the relevant security parameter in probing is the number of probes. Indeed:

  1. 1.

    The probe tips are very small, but the probe itself is large (see Fig. 3a); hence only few of them can be placed over a circuit.

    Fig. 3
    figure 3figure 3

    Probe example, courtesy of http://www.bridgetec.com/holders.html (a) and operation for the probe tip to properly contact the targeted area (b)

  2. 2.

    The step consisting in placing the probe is costly, for at least two reasons. First of all, the identification of the probe’s location is time-consuming (it consists, as to say, to identify a needle in a haystack). Second, the positioning of the probe (see Fig. 3a) is slightly invasive, in that it requires to scratch the chip surface to get a reliable electrical contact with the resource to spy (see Fig. 3b).

2.3 Attack on Coron’s Higher-Order Masking of Look-Up Tables [8]

Coron’s higher-order masking of look-up tables [8] is a software variant of ISW scheme [12] (more precisely, it is a variant of the word-level variant of ISW, namely [22]). This scheme is proven high-order secure, but the proof is incorrect, because the given implementation and the one assumed in the proof do not match: in the given implementation (algorithms), some resources are reused over time, hence creating a security weakness, as we shall detail in this section.

We recall Coron’s masked computation of look-up table \(S:\mathbb {F}_2^n\to \mathbb {F}_2^n\) in Algorithm 1, which makes use of masks sharing refresh procedure given in Algorithm 2. In this latter algorithm, the operator “\(\gets _{\mathcal {R}}\)” stands for uniformly randomized affectation.

Algorithm 1: Masked computation of y = S(x) (Alg. 1 in [8])

Algorithm 2: The RefreshMasks function (Alg. 2 in [8])

We show that there is a second-order attack (in the sense of ISW) on Coron’s scheme:

  • The attacker probes at line 2 of Algorithm 2; then [one bit of] all the random numbers injected in Algorithm 1 are known. Let us call them ri,j for 0 ≤ i ≤ t (the ith time the RefreshMasks function is called) and for 1 ≤ j ≤ t (the jth fresh mask in invocation i of RefreshMasks).

  • In parallel, the attacker also probes [one bit of] y0 at line 12 of Algorithm 1. This value is equal to \(S(\bigoplus _{i=0}^{t} x_i)\oplus \bigoplus _{i'=0}^{i}\bigoplus _{j=1}^{t} r_{i',j}\). As the attacker knows [one bit of] all the ri,j, he can deduce [one bit of] \(S(\bigoplus _{i=0}^{t} x_i) = S(x)\).

With one probe, only one bit of the n = 8 bit register can be probed, which allows nonetheless to recover one bit of S(x) in the clear. However, this is sufficient information to break the AES: after knowing about n values of S(x) targeted bit for x = p ⊕ k (plaintext \(p\in \mathbb {F}_2^n\) xored with the key byte \(k\in \mathbb {F}_2^n\)) knowing the values of p, a unique k can be derived.

The values to probe can be found as well in the reference code of https://github.com/coron/htable/blob/master/src/aes_htable.c (hash a9e88df, put online on 25 Sep 2015):

  • Line 39, in function subbyte_table (line 12 of Algorithm 1)

  • Line 51, in function refreshword (line 2 of Algorithm 2)

The online version of file aes_htable.c shall thus not be used as is. Its security problem can be fixed easily, by avoiding the reuse t − 1 times of tables T and T′ and (t − 1) × t times of variable tmp. The corrected algorithm is Algorithm 3 (which calls Algorithms 4 and 5 as subfunctions).

Algorithm 3: Masked computation of y = S(x) (fixed version of Algorithm 1)

Algorithm 4: The RefreshMasks function (fixed version of Algorithm 2)

Algorithm 5: The generation of internal masks InitRefreshMasks

2.4 Motivation for Bit-Mixing Masking Schemes

The attack in the previous Sect. 2.3 has revealed structural weaknesses in state-of-the-art masking schemes. In particular, the attack exploiting the consecutive values taken by one bit allows to break arbitrary high-order masking schemes such as perfect masking scheme [5] or Coron’s table-based masking [8].

In reaction to this weakness, so-called inner product masking schemes have been proposed [1, 2, 18, 23], which make such attack more chancy. A comprehensive analysis between probing security at bit versus word levels is carried out in [7, 19].

3 Analysis of the Security Issue

In the previous section, we showed how one single probe is able to defeat at bit level high-order masking schemes proved at word level. Therefore, we recommended in Sect. 2.4 masking schemes which combine, by design, several bits together. In this section, we examine how multi-bit high-order leakage might arise, created either by the hardware or the software themselves (to the free benefit of the attacker).

3.1 Hardware Case

In the hardware case, coupling between bits can be due:

  • Spatially, to:

    • Glitches: in combinational logic, gates do not evaluate in their order in the netlist (since they are non-synchronizing); for more information on how glitches appear in combinational circuits and contribute to lower the security with respect to side-channel attacks, we refer the reader to the didactic explanations provided in section 4 of [11] devoted to this topic.

    • IR drop: individual gates cannot be considered independent, since they share the same power/ground network; the effect is well illustrated in Fig. 9(b) of [9, Sec. 4.2.3].

    • Capacitive coupling: some gates, physically placed close one to each other, can have a capacitive coupling of their output nets; the effect is well illustrated in Fig. 9(c) of [9, Sec. 4.2.3].

    • Unselected gates: some gates are instantiated in a netlist and supposed to be have a useful functionality only at some times. But actually, being there (i.e., being instantiated and thus activated), they contribute to the leakage continuously, even when they handle data which is eventually not selected (i.e., not used downstream). For example, Fig. 4a illustrates a complete masking scheme, made up of four algorithms: (1) data masking, (2) operation masking, (3) masks refresh (optional), and (4) data unmasking. The last algorithm should, obviously, be executed only at the end of the computation. In the example of a masked iterative block cipher with Ω rounds, the shares can be combined only to recover the ciphertext. However, Fig. 4b shows a faulty implementation of a perfect masking scheme, wherein the data unmasking logic is executed at each round 0 ≤ ω ≤ Ω of the block cipher, thereby leaking information on all intermediate rounds. Footnote 4

      Fig. 4
      figure 4figure 4

      Masking scheme steps (a) and illustration of a netlist for perfect masking with t = 2 shares (b), with an unselected gate’s leakage flaw

  • Temporally, when some gates are reused over time, as explained in the linear probing issue (recall Fig. 2 of Sect. 2.2).

3.2 Software Case

In this section, we tackle the question of software security with respect to side-channel leakage. Let us first precise what is implied under the term “software.” Software means that some control is written in a memory, but the execution is carried out by one (or several) processor(s). Now, processors are pieces of hardware and hence suffer from the same leakage sources as mentioned in previous Sect. 3.1.

Let us recall two optimizations occurring at compilation stage, which make software execution more amenable to side-channel attacks [17]:

  • Register packing consists in regrouping several variables into one register. Indeed, registers are usually wide and hence can accommodate the concatenation of several words (say shares), to be processed in parallel, e.g., using bitslice operations.

  • In Static Single Assignment (SSA) mode, any new variable is affected to a new virtual register. However, in the next pass, registers are allocated. Dead registers are considered as fresh resources and hence are reused, which opens the door to linear probing issues.

For the sake of pedagogy, let us make explicit some unusual sources of leakage occurring in software. One important point to make clear is that glitches do exist in software. In particular, we find in CPUs the case of leakage of unselected gates, owing to unselected logic mentioned in the previous section. Let us illustrate this on the example of two CPUs: 1. 6502, 2. LEON3. For the sake of legibility, lines which are too long (ending by a “\” sign) have been folded.

3.2.1 6502

Let us analyze the integer unit of 6502 processor, described in VHDL in https://github.com/chenxiao07/vhdl-nes/tree/master. Lines 765–772 of source file vhdl-nes-master/src/free6502.vhd are recalled below:

3.2.2 LEON3

Let us also analyze the integer unit of LEON3 processor, described in VHDL in iu3.vhd excerpt below:

3.2.3 Analysis of the 6502 and LEON3 Codes

It clearly appears that both CPUs (6502 and LEON3) do compute all the possible bitwise operations in parallel before selecting the one actually relevant for the computation indicated by the current instruction. This behavior is explained in Fig. 5 and corresponds to an unselected gate’s flaw. In particular, the arithmetic addition combines all the bits (since there is a carry propagation, i.e., the last bit depends on all previous bits) and hence defeats the register packing strategy. This is illustrated in Fig. 6. The structure of the full adder (FA) is recalled in Fig. 7.

Fig. 5
figure 5figure 5

Illustration of the unselected gate’s flaw on the ALU (Arithmetic and Logic Unit) of a CPU

Fig. 6
figure 6figure 6

Illustration addition (+) and exclusive-or (̂) operations: (a) the addition (a + b + cin = 2ncout + d) couples bits of the accumulator (here assumed n = 4 bit), through carries represented as red arrows, (b) the exclusive-or operation (a ⊕ b = d) is bitslice, meaning that each bit is computed independently (as illustrated by the “ ” separation line)

Fig. 7
figure 7figure 7

Full adder, as used in the addition of Fig. 6a

4 New Definition of Security Order

Therefore, in the probing model of ISW, the defender has only one option to enhance the security: disallow the adversary from probing more than once with one probe by

  • In hardware: unrolling circuits, hence designing fully combinational logic (as in [3, 24])

  • In software: unrolling loops, hence using n2 times more memory than announced in [8]

For this reason, we propose a new definition of security order. We call this definition the security order in the Noisy Non-Injective (NNI) model.

Definition 2 (t-Order Security, in the NNI Model)

The implementation is t-order secure in the NNI model if no information can be recovered by measuring

  • tspace different bits, at

  • ttime different times,

where tspace × ttime ≤ t.

In this definition,

  • tspace relates to the “high-order” aspect of side-channel attacks in the NNI model.

  • ttime relates to the “multivariate” aspect of side-channel attacks.

We introduce the following result to motive for the definition:

Proposition 1

Let \(\mathcal {L}\) be a pseudo-Boolean function \(\mathbb {F}_2^n\to \mathbb {R}\) , of degree one. Then, assuming the attacker performs a zero-offset attack (i.e., t time = 1), we have that

$$\displaystyle \begin{aligned}\forall i, 0\leq i\leq t, \quad\mathbb{E}(\mathcal{L}(Z)^i|X=x) \quad \mathit{\text{does not depend on }}x \end{aligned}$$

if the implementation is t-order secure.

Proof

See, for instance, Theorem 2 and/or Proposition 3 in [6].

This means that the attacker will need to raise the leakage traces to the power tspace; hence a noise variance raised to the power tspace. Assuming that the noise is independent from sample to sample, then we also have that the noise variance is raised to the power ttime in t-variate attacks [21].

Thus, the relevant quantity is indeed the product between tspace × ttime.

This model is more satisfactory, as it allows for the designer to do:

  • In hardware, fully combinational circuits (ttime = 1)

  • In software, fully sequential bitslice implementations (tspace = 1))

The security notion of Definition 2 is thus more flexible in terms of design solutions to thwart attacks and also more realistic than Definition 1.

However, this model is relevant only in the case where the noise is minimal, so that taking two consecutive measurements decreases the effectiveness of the attack.

5 Conclusion

In this article, we reviewed some masking schemes of the scientific literature. We present their effectiveness with respect to real-world analysis methods and suggest some adaptations. The goal of this article is mostly to underline the discrepancy which can exist between attacks and designs. As of today, attacks arise from the academic world and are pretty virulent. The protection of sensitive circuits is evolving less fast, but a key for a good protection is to understand the risk. The analysis of several adversarial models is performed, and the attacks are confronted to real implementations. We clearly identify a gap between attacks and countermeasures and contribute to bridge it.