1 Introduction

Differential Power Analysis (DPA) was introduced as a means of extracting cryptographic keys by Kocher et al. [16] in 1999, who noted that the power consumption of a device was dependent on the operations being performed, and the value of the operands used. They showed that one could acquire the power consumption over time while a device was computing a cryptographic algorithm, and analyze the acquisitions to determine the cryptographic key. Subsequently, it was shown that the same analyses could be conducted by exploiting other side channels, e.g., the changes in the electromagnetic field around a microprocessor [1, 10, 25].

A typical DPA attack involves acquiring a series of acquisitions while a device is operating on varying inputs and analyzing the power traces by comparing what occurred at the same point in time in each trace. The simplest analysis is to choose one bit of an intermediate state and divide the set of acquisitions depending on the value of this bit, make two mean traces and subtract one trace from the other point by point. A significant difference should be visible in the trace corresponding to where this intermediate state was created by the device. This is typically referred to as a first-order analysis, as each point in the output trace is dependent on the same point in time in the acquisitions. If two (or more) points in each trace are combined, we refer to as a second-order (or higher-order) analysis.

To prevent the side-channel analyses of a cryptographic implementation, one would typically apply a random mask to the input such that operating on the masked data is indistinguishable from random data. A common masking technique is Boolean masking, where an input word gets masked by a random value. All operations are then performed using the Boolean-masked data. However, there exist many cryptographic algorithms that require both Boolean and arithmetic operations, such as the addition of integers, e.g., SHA-2 [22], ChaCha [5], Blake [2], Skein [9], IDEA [17], RC6 [27]. Masked versions of these algorithms therefore require changing Boolean masks into arithmetic masks, and vice versa, which we refer to as “Boolean-to-arithmetic" and “Arithmetic-to-Boolean" mask conversions, respectively.

In 2001, Goubin [12] proposed an efficient constant-time method for Boolean-to-arithmetic mask conversion. His method is secure against first-order analysis, but does not resist second-order attacks. The solutions in the literature use recursive methods [7, 28], where the missing carry bits are calculated using a masked-adder structure, or look-up table-based methods [30, 31], that perform precomputations and store intermediates in memory. It has also been suggested that higher-order versions of Boolean-to-arithmetic mask conversion cannot be done in constant time [30].

In this paper, we present novel algorithms for higher-order secure Boolean-to-arithmetic mask conversion. All proposed methods run in constant time and are independent on the input-word size. In particular, we present a second-order secure algorithm that requires only 31 instructions and a third-order secure algorithm that requires only 74 instructions. Our algorithms are significantly faster than the best recursive methods in the literature [7].

This paper is an extended version of previous work first published on the IACR’s eprint server in 2016 [13]. After a weakness was identified in our third, and higher-order, algorithms [8], the algorithms were updated and are presented here. Our second-order secure algorithm remains secure and is at least one order of magnitude faster than previous work. The algorithms we present were tested using exhaustive simulation, inspired by the state-of-the-art strong non-interference (SNI) notion. Proofs based on this notion are also provided. We show that our corrected algorithms are, at a minimum, about twice as fast as related work—including recent work [8].

In addition to the new contributions, we highlight the importance of the number of random numbers when comparing algorithms. Many algorithms give an artificially low instruction count hiding a large number of required random numbers. In comparing our work with other published works, we take into account the number of random values required to implement an algorithm by considering the number of instructions required to compute an Xorshift random number generator [21] used by Coron’s explicit implementation [8]. We demonstrate that our work is about twice as fast as Coron’s in all cases. Moreover, our algorithms have quadratic complexity with regard to the security order, whereas Coron’s algorithms have exponential complexity.

The paper is organized as follows. In Sect. 2, we describe the general Boolean-to-arithmetic mask conversion problem and discuss previous work. In Sect. 3, we present a novel constant-time algorithm to perform a secure second-order Boolean-to-Arithmetic mask conversion, and generalize it to higher orders in Sect. 4. In Sect. 5, we compare our work with other algorithms in the literature, and discusses implementation considerations in both software and hardware in Sect. 6. Conclusions are drawn in Sect. 7.

2 Boolean-to-arithmetic masking

In this paper, we shall consider operations available in a typical microprocessor with registers of a fixed bit length. Specifically, we shall consider values that are in the field \((\mathbb {Z}_{2^k}, \oplus , +)\) where \(k \in \mathbb {Z}_{\ge 0}\) is the bit length of the registers used, \(\oplus \) is a bitwise XOR operation and \(+\) is integer addition. Other operations are available in a typical microprocessor, but are not relevant to the algorithms described in this paper.

We define the problem of changing a Boolean mask into an arithmetic mask as follows:

Definition 1

(The problem of converting Boolean to arithmetic masks) Given \(x^{\prime }=x \oplus r\), where \(x, r \in (\mathbb {Z}_{2^k}, \oplus , +)\), as a Boolean-masked secret x and r is a random value taken from \(\mathbb {Z}_{2^k}\), we wish to be able to compute \(x^{\prime \prime }= x + s\), with \(s \in (\mathbb {Z}_{2^k}, \oplus , +)\) where \(k \in \mathbb {Z}_{\ge 0}\), without revealing any information on x through some side channel, where \(x^{\prime \prime }\) is the arithmetically masked secret x and s is a random value taken from \(\mathbb {Z}_{2^k}\).

One naïve approach would be to perform the conversion directly by simply removing the Boolean mask and by adding an arithmetic mask afterward, i.e.,

$$\begin{aligned} (x^{\prime } \oplus r) + s = ((x \oplus r) \oplus r) + s = x + s = x^{\prime \prime }, \end{aligned}$$

using the notation given in Definition 1. This, however, would manipulate x directly, allowing an attacker to use side-channel analysis to determine that a hypothesized value of x is manipulated during the mask conversion. Hence, one needs to use an algorithm where all intermediates are statistically independent of the secret x.

Definition 1 is generalized to higher-order masking schemes as follows:

Definition 2

(The problem of converting Boolean to arithmetic masks of higher order) Assuming a masking scheme of order n. Then, given \(x^{\prime }=x \oplus r_1 \oplus \cdots \oplus r_n\), where \(x, r_i \in (\mathbb {Z}_{2^k}, \oplus , +)\), \(k \in \mathbb {Z}_{\ge 0}\) for \(i \in \{ 1, \ldots , n\}\), as a Boolean-masked secret x and n a random values, \(r_i\) for \(i \in \{ 1, \ldots , n\}\), taken from \(\mathbb {Z}_{2^k}\), we wish to compute \(x^{\prime \prime }=x + s_1 + \cdots + s_n\), with \(s_i \in (\mathbb {Z}_{2^k}, \oplus , +)\) for \(i \in \{ 1, \ldots , n\}\), without revealing any information on x through some side channel, where \(x^{\prime \prime }\) is the arithmetically masked secret x and \(s_i\), for \(i \in \{ 1, \ldots , n\}\), are random values taken from \(\mathbb {Z}_{2^k}\).

Higher-order mask conversion methods require that the masks used for the arithmetically masked output are not related to the Boolean-masked input to avoid any side-channel leakage. If we consider, without loss of generality, a second-order secure Boolean-to-arithmetic mask conversion that uses the same input masks \(r_1\) and \(r_2\) to mask the output, information would leak through the carries generated from the arithmetic masks. For ease of expression, we shall consider an attacker able to XOR two intermediate states together in a second-order side-channel attack (a very rough approximation of a second-order side-channel attack, we refer the interested reader to Mangard et al. [19] for a more detailed discussion). If an attacker can combine the input \(x^{\prime }\) and the output \(x^{\prime \prime }\) using some side-channel information, they obtain the following:

$$\begin{aligned} x^{\prime } \oplus x^{\prime \prime }&= (x \oplus r_1 \oplus r_2) \oplus (x + r_1 + r_2) \\&= (x \oplus r_1 \oplus r_2) \oplus ((x \oplus r_1 \oplus c_1)) \oplus r_2 \oplus c_2) \\&= c_1 \oplus c_2, \end{aligned}$$

where \(c_1\) and \(c_2\) represent the carries produced in the additions \(x + r_1\) and \((x + r_1) + r_2\), respectively, as an XOR difference. That is, \(c_1 = (x + r_1) \oplus x \oplus r_1\) and \(c_1 = (x + r_1 + r_2) \oplus x \oplus r_1 \oplus r_2\). We note that \(c_1\) and \(c_2\) are dependent on x and could be used to conduct a side-channel attack.

To avoid this source of higher-order leakage, the output of the mask conversion needs to be masked with values that are independent of the input Boolean masks. This can be achieved through refreshing the masks during the conversion, either once or periodically, as required [7].

In the following, we describe some of the methods for mask conversion that have been presented in the literature.

2.1 Goubin’s method

Goubin proposed an efficient method of converting a Boolean mask to an arithmetic mask at CHES 2001 [12]. His method requires a constant number of instructions, is resistant to first-order side-channel analysis and, at the time of writing, remains the most efficient algorithm known.

The essential observation of Goubin was that the function

$$\begin{aligned} \varPhi _\mathbb {Z}(a, b) : \mathbb {Z}^2 \longrightarrow \mathbb {Z}: a, b \longmapsto (a \oplus b) + b \end{aligned}$$
(1)

is affine over \(\mathbb {F}_2\). It follows that \(( \varPhi (a, b) \oplus \varPhi (a, 0) )\) is linear for any \(b \in \mathbb {Z}\). Trivially, we note the same function is valid in the field \((\mathbb {Z}_{2^k}, \oplus , +)\), for any \(k \in \mathbb {Z}_{\ge 0}\), and in the remainder of this paper, we shall consider the function:

$$\begin{aligned} \varPhi (a, b) \, : \, (\mathbb {Z}_{2^k}, \oplus , +)^2&\longrightarrow (\mathbb {Z}_{2^k}, \oplus , +) \\ a, b&\longmapsto (a \oplus b) + b \nonumber \end{aligned}$$
(2)

for some \(k \in \mathbb {Z}_{\ge 0}\).

Taking the notation from Definition 1, for some arbitrary k in \(\mathbb {Z}_{\ge 0}\), the above allows one to mask the computation of \(\varPhi (x^{\prime }, r)=(x^{\prime } \oplus r) + r\) with an additional random value \(\gamma \in \mathbb {Z}_{2^k}\). We recall \(x, r \in (\mathbb {Z}_{2^k}, \oplus , +)\) and \(x^{\prime } = x \oplus r\). Then,

$$\begin{aligned} \varPhi (x^{\prime }, \gamma \oplus r)=(x^{\prime } \oplus (\gamma \oplus r)) + (\gamma \oplus r), \end{aligned}$$
(3)

which can be followed by an unmasking step using

$$\begin{aligned} \varPhi (x^{\prime }, \gamma )=(x^{\prime } \oplus \gamma ) + \gamma . \end{aligned}$$
(4)

Hence, a secure Boolean-to-arithmetic mask conversion can be performed using the following relationship:

$$\begin{aligned} x^{\prime \prime }&= x^{\prime } \oplus \varPhi (x^{\prime }, \gamma ) \oplus \varPhi (x^{\prime }, \gamma \oplus r) \nonumber \\&= x^{\prime } \oplus [(x^{\prime } \oplus \gamma ) + \gamma ] \oplus [(x^{\prime } \oplus (\gamma \oplus r)) + (\gamma \oplus r)] \end{aligned}$$
(5)

where following the notation in Definition 1, \(s=r\), i.e., \(x^{\prime \prime } = x + r\). One can implement this conversion using 7 instructions (2 additions and 5 XOR operations), as described by Goubin, and is recalled in Algorithm 1.

figure a

Goubin then proceeds to give a proof of the following:

Lemma 1

An implementation of Algorithm 1 is resistant to first-order side-channel analysis.

Proof

From Algorithm 1, we can obtain the list of intermediate values \(V_0,{\ldots },V_6\) that appear during the computation of (5):

$$\begin{aligned} V_0= & {} \gamma \\ V_1= & {} \gamma \oplus r \\ V_2= & {} x^{\prime } \oplus \gamma \\ V_3= & {} (x^{\prime } \oplus \gamma ) + \gamma \\ V_4= & {} \left[ (x^{\prime } \oplus \gamma ) + \gamma \right] \oplus x^{\prime } \\ V_5= & {} x^{\prime } \oplus \gamma \oplus r \\ V_6= & {} (x^{\prime } \oplus \gamma \oplus r) + (\gamma \oplus r) \end{aligned}$$

If we suppose that \(\gamma \) is uniformly distributed on \(\mathbb {Z}_{2^k}\), for some arbitrary \(k \in \mathbb {Z}_{\ge 0}\), it is easy to see that:

  • The values \(V_0, V_1, V_2\), and \(V_5\) are uniformly distributed on \(\mathbb {Z}_{2^k}\).

  • The distributions of \(V_3, V_4\), and \(V_6\) are dependent on \(x^{\prime }\) but not on r. \(\square \)

We note that this proof holds in the field \((\mathbb {Z}_{2^k}, \oplus , +)\), for any \(k \in \mathbb {Z}_{\ge 0}\), but not in \(\mathbb {Z}\) since the carry produced by the most significant bits of x combined with the arithmetic mask will depend on x.

2.2 Recursive methods

One can also convert a Boolean-masked value into an arithmetically masked value using an addition operation, which generates the required carries that can then be applied to the Boolean-masked input value bit-by-bit. The first application was proposed by Goubin [12] as a means of converting an arithmetic mask to a Boolean mask (a topic beyond the scope of this paper), and a similar technique was described by Golić in 2007 who proposed using the same method for Boolean-to-arithmetic mask conversion in hardware [11]. Both conversion methods have a complexity of \(\mathcal {O}(n)\) with regard to the bit length of the inputs because all n bits of the input word are processed individually.

Another hardware-oriented design was proposed by Schneider et al. [28], who presented a conversion method based on a Carry Look-ahead Adder (CLA) structure which reduces the complexity to \(\mathcal {O}(\log n)\). They adopted a threshold implementation [23, 24] approach to avoid first and second-order side-channel leakage.

Recursive software implementations were proposed by, for example, Karroumi et al. They described a method adding two Boolean-masked values in \(\mathcal {O}(n)\) time [15]. Coron et al. [6] were the first to propose the use of Carry Look-ahead Adders in software, thus reducing the complexity to \(\mathcal {O}(\log n)\). Both works made use of masked AND operations, as defined by Trichina [29] and Ishai et al. [14], respectively.

2.3 Higher-order Boolean-to-arithmetic masking

Coron et al. [7] proposed a method for conducting a higher-order Boolean-to-arithmetic mask conversion (see Definition 2) at CHES 2014. Their algorithm calculates carries recursively and is built on masked AND and XOR operations that are resistant to higher-order side-channel analysis. Using these secure operations, one can construct an adder resistant to higher-order side-channel analysis with which one can also convert an arithmetic mask to a Boolean mask (the latter topic being beyond the scope of this paper). The authors reported that their fastest hth-order Boolean-to-arithmetic mask conversion has a minimum time complexity of \(\mathcal {O}((2\,h+1)^2 n)\), with regard to the bit length of the inputs n.

The first look-up table-based conversion algorithm that resists second-order attacks was proposed by Vadnala and Großschädl [30], where, to achieve the desired level of resistance, the algorithm adopts the generic second-order secure S-box implementation of Rivain et al. [26]. Using this method, following the notation in Definition 2, one computes \(x_i + r\) for fixed r, where \(x_i \in \{ 0, \dots , 2^k \}\), and then chooses the correct masked output from all the possible values generated. However, a table with \(2^k\) entries is required which is problematic if k is not small.

An improved version was proposed by Vadnala and Großschädl [31], where an input k-bit word would be split into p words with smaller bit widths of \(\ell \le 8\) bits. The conversion is then done on each word individually, and the results combined. Their final solution has a time complexity of \(\mathcal {O}(2^{\ell +2} p)\) and a memory requirement of \(\mathcal {O}(2^{\ell +2} (\ell +2))\).

3 Constant-time second-order Boolean-to-arithmetic mask conversion

In this section, we present a novel method to perform second-order secure Boolean-to-arithmetic mask conversion whose time complexity is independent of the input-word size. Following the notation in Definition 2, we consider a Boolean-masked input \(x^{\prime }=x \oplus r_1 \oplus r_2\), where \(x, r_1, r_2 \in (\mathbb {Z}_{2^k}, \oplus , +)\), and an arithmetically masked output \(x^{\prime \prime }=x + s_1 + s_2\), where \(s_1, s_2 \in (\mathbb {Z}_{2^k}, \oplus , +)\).

In the following, we try to express the ideas behind the mask conversion as clearly as possible. This leads to many instances where an expedient expression will leak if implemented as shown because of the ordering of operations. We highlight some of the issues in the text but do not attempt to enumerate all the possible leaks that could be caused by incorrectly ordering operations.

3.1 Definitions

We recall (2), defined over the field \((\mathbb {Z}_{2^k}, \oplus , +)\), for any \(k \in \mathbb {Z}_{\ge 0}\)

$$\begin{aligned} \varPhi (a, b) \, : \, (\mathbb {Z}_{2^k}, \oplus , +)^2&\longrightarrow (\mathbb {Z}_{2^k}, \oplus , +) \\ a, b&\longmapsto (a \oplus b) + b \nonumber \end{aligned}$$
(6)

for any \(k \in \mathbb {Z}_{\ge 0}\). We shall also use the function

$$\begin{aligned} \bar{\varPhi }(a, b) \, : \, (\mathbb {Z}_{2^k}, \oplus , +)^2&\longrightarrow (\mathbb {Z}_{2^k}, \oplus , +) \\ a, b&\longmapsto (a \oplus b) - b \nonumber \end{aligned}$$
(7)

for any \(k \in \mathbb {Z}_{\ge 0}\). While subtraction is not a field operation, we shall use it as a convenient way of expressing the addition with the additive inverse of an operand. Similar to \(\varPhi \), Goubin notes that

$$\begin{aligned} x - r = x^{\prime } \oplus \bar{\varPhi }(x^{\prime }, \gamma ) \oplus \bar{\varPhi }(x^{\prime }, \gamma \oplus r), \end{aligned}$$
(8)

using the notation in Definition 1, and that \(\bar{\varPhi }\) is also affine over \(\mathbb {F}_2\) [12].

3.2 The Algorithm

Our conversion method consists of three steps.

  1. 1.

    We compute \(\left( x + \left( r_1 \oplus r_2 \oplus \alpha \right) \right) + s_1\) for some random values \(\alpha , s_1 \in \mathbb {Z}_{2^k}\).

  2. 2.

    We compute \(s_2 - (r_1 \oplus r_2 \oplus \alpha )\) for some random \(s_2 \in \mathbb {Z}_{2^k}\).

  3. 3.

    Add the results of Steps 1 and 2, resulting in \(x + s_1 + s_2\).

We describe these steps in detail below.

Step 1 We consider Goubin’s solution to the first-order Boolean-to-arithmetic mask conversion (5),

$$\begin{aligned} x + r = (x \oplus r) \oplus \varPhi (x \oplus r, \gamma ) \oplus \varPhi (x \oplus r, \gamma \oplus r) . \end{aligned}$$
(9)

Let \(r=r_1 \oplus r_2\) and \(\gamma = \gamma _1 \oplus \gamma _2\), where \(r_1, r_2, \gamma _1, \gamma _2 \in \mathbb {Z}_{2^k}\), then

$$\begin{aligned} x + (r_1 \oplus r_2)&= (x \oplus r_1 \oplus r_2) \oplus \varPhi (x \oplus r_1 \oplus r_2, \gamma _1 \oplus \gamma _2) \nonumber \\&\qquad \oplus \varPhi (x \oplus r_1 \oplus r_2, \gamma _1 \oplus \gamma _2 \oplus r_1 \oplus r_2), \end{aligned}$$
(10)

or, more succinctly, using the notation from Definition 2,

$$\begin{aligned} x + (r_1 \oplus r_2)&= x^{\prime } \oplus \varPhi (x^{\prime }, \gamma _1 \oplus \gamma _2) \nonumber \\&\qquad \oplus \varPhi (x^{\prime }, \gamma _1 \oplus \gamma _2 \oplus r_1 \oplus r_2) . \end{aligned}$$
(11)

Given that \(\varPhi \) is affine over \(\mathbb {F}_2\), we can split the first \(\varPhi \) operation giving,

$$\begin{aligned} x + (r_1 \oplus r_2)&= \varPhi (x^{\prime }, \gamma _1) \oplus \varPhi (x^{\prime }, \gamma _2) \nonumber \\&\qquad \oplus \varPhi (x^{\prime }, \gamma _1 \oplus \gamma _2 \oplus r_1 \oplus r_2). \end{aligned}$$
(12)

If one were to compute \(x + (r_1 \oplus r_2)\) using the above, a second-order side-channel attack would be possible for same reason that we require the input and output mask to be different. That is, the combined leakage of the input \(x^{\prime }\) and \(x + (r_1 \oplus r_2)\) will depend on x (see Sect. 2).

To overcome this problem, we apply an extra Boolean mask, \(\alpha \in \mathbb {Z}_{2^k}\), to \(x^{\prime }\) as follows:

$$\begin{aligned} (x \oplus \alpha ) + (r_1 \oplus r_2)&= \varPhi (x^{\prime } \oplus \alpha , \gamma _1) \oplus \varPhi (x^{\prime } \oplus \alpha , \gamma _2) \nonumber \\&\qquad \oplus \varPhi (x^{\prime } \oplus \alpha , \gamma _1 \oplus \gamma _2 \oplus r_1 \oplus r_2). \end{aligned}$$
(13)

However, \((x \oplus \alpha ) + (r_1 \oplus r_2)\) is not useful but can be modified given that \(\varPhi \) is affine over \(\mathbb {F}_2\), resulting in

$$\begin{aligned} x + (r_1 \oplus r_2 \oplus \alpha )&= \varPhi (x^{\prime } \oplus \alpha , \gamma _1) \oplus \varPhi (x^{\prime } \oplus \alpha , \gamma _2) \nonumber \\&\quad \oplus \varPhi (x^{\prime } \oplus \alpha , \gamma _1 \oplus \gamma _2 \oplus r_1 \oplus r_2 \oplus \alpha ). \end{aligned}$$
(14)

If we consider (14), we note that the combination of \(x + (r_1 \oplus r_2 \oplus \alpha )\) with either \(\varPhi (x^{\prime } \oplus \alpha , \gamma _1)\) or \(\varPhi (x^{\prime } \oplus \alpha , \gamma _2)\) will not be statistically independent of x. In both cases, the combination will be dependent on the carry bits produced by the arithmetic operations, as all variables occur an even number of times. The effect is similar to that seen if masks are not changed in a higher-order conversion algorithm, as discussed in Sect. 2. We can, prevent this by applying a Boolean mask \(s_1 \in \mathbb {Z}_{2^k}\), giving:

$$\begin{aligned} (x + (r_1 \oplus r_2 \oplus \alpha )) \oplus s_1&= \varPhi (x^{\prime } \oplus \alpha , \gamma _1) \oplus \varPhi (x^{\prime } \oplus \alpha , \gamma _2) \nonumber \\&\quad \oplus \varPhi (x^{\prime } \oplus \alpha , \gamma _1 \oplus \gamma _2\nonumber \\&\quad \oplus r_1 \oplus r_2 \oplus \alpha ) \oplus s_1 . \end{aligned}$$
(15)

The order that (15) is computed is important to avoid combining masks that would allow a second-order side-channel attack. However, this is quite straightforward and will not be detailed here.

Then, given \(\left( x + (r_1 \oplus r_2 \oplus \alpha )\right) \oplus s_1\), one can apply Goubin’s first-order Boolean to arithmetic mask conversion, as described in Algorithm 1, which will produce

$$\begin{aligned} x + (r_1 \oplus r_2 \oplus \alpha ) + s_1 \end{aligned}$$
(16)

without any first or second-order leakage.

Step 2 The second step is another Boolean-to-arithmetic mask conversion to securely compute \(s_2 - (r_1 \oplus r_2 \oplus \alpha )\), where \(s_2\) represents one of the two output masks. For this purpose, one can use the first-order secure Boolean-to-arithmetic mask conversion defined in (5), where we define \(s_{2}^{\prime }=s_2 \oplus (r_1 \oplus r_2 \oplus \alpha )\) as the Boolean-masked input and \(s_2^{\prime \prime }= s_2 - (r_1 \oplus r_2 \oplus \alpha )\) as the arithmetically masked output of the following conversion. Then, given (5), we have

$$\begin{aligned} s_2^{\prime \prime } = s_{2}^{\prime } \oplus \bar{\varPhi }(s_{2}^{\prime }, \delta ) \oplus \bar{\varPhi }(s_{2}^{\prime }, \delta \oplus r_1 \oplus r_2 \oplus \alpha ), \end{aligned}$$
(17)

where \(\delta \) is a random value taken from \(\mathbb {Z}_{2^k}\). If we let \(\delta =r_1\), then

$$\begin{aligned} s_2^{\prime \prime } = s_{2}^{\prime } \oplus \bar{\varPhi }(s_{2}^{\prime }, r_1) \oplus \bar{\varPhi }(s_{2}^{\prime }, r_2 \oplus \alpha ), \end{aligned}$$
(18)

and, given that \(\bar{\varPhi }\) is affine over \(\mathbb {F}_2\), this can be rewritten as

$$\begin{aligned} s_2 - (r_1 \oplus r_2 \oplus \alpha ) = \bar{\varPhi }(s_{2}^{\prime }, r_1) \oplus \bar{\varPhi }(s_{2}^{\prime }, r_2) \oplus \bar{\varPhi }(s_{2}^{\prime }, \alpha ). \end{aligned}$$
(19)

Equation (18) requires a total of 7 XORs and 2 additions, whereas Eq. (19) requires 5 XORs and 3 additions. Thus, the first equation might be attractive for hardware implementations in cases where additions are more expensive than XOR operations.

Step 3 We can now compute the desired arithmetically masked value \(x^{\prime \prime }\) by combining the output of (16) and (19), i.e.,

$$\begin{aligned} x^{\prime \prime }= & {} \left( \left( x + \left( r_1 \oplus r_2 \oplus \alpha \right) + s_1 \right) \right) + \left( s_2 - \left( r_1 \oplus r_2 \oplus \alpha \right) \right) \\= & {} x + s_1 + s_2 . \end{aligned}$$

3.3 Implementation details

Algorithm 2 shows the second-order secure Boolean-to-arithmetic mask conversion described above, which requires 31 instructions. We made some effort to reduce the number of instructions and random values required without affecting the level of security or the sequence of steps. That is, we compute the steps given at the beginning of Sect. 3.2. The details of the steps given above remain unmodified for clarity.

figure b

We prove the security of Algorithm 2 using the probing model proposed by Ishai et al. [14], where we seek to show that it is secure for up to two probes. For this, we use the refined model proposed by Barthe et al. [4] where we make use of the t-SNI (strong non-interference) construction, with t being the security order. This allows us to prove that an algorithm is only vulnerable to a side-channel attack of order n, where \(n \ge t + 1\) (rather than \(n \ge 2\,t + 1\) required by Ishai et al.).

Lemma 2

(2-SNI of Algorithm 2) Let \(\{ x^{\prime }, r_1, r_2 \}\) be the input shares of Algorithm 2, and \(\{ x^{\prime \prime }, s_1, s_2 \}\) be the output shares for any set of t intermediate variables and any subset \(|k| \le t_k\) of output shares such that \(t + t_k \le 2\), there exists a subset I of intermediate variables with \(|I| \le 2\), such that the distribution of those t intermediate variables, and the output shares can be perfectly simulated from \(\{ x^{\prime }, r_1, r_2 \}\).

Proof

We construct two sets \(I = \{ x^{\prime }, r_1, r_2 \}\) and \(J = \{ \gamma _1, \gamma _2, \alpha , s_1, s_2 \}\) corresponding to the input shares and the random values required, respectively. We denote \(a_i\), for \(1 \le i \le 31\), as the intermediate values in Algorithm 2, the definition of which means that is easy to see that each \(a_i\) can be perfectly simulated from the input shares and/or the required random values. That is, any internal variable within Algorithm 2 can be perfectly simulated from a subset of elements from I and/or J. \(\square \)

This was validated by implementing a simulator and verifying that the distributions of each \(a_i\), for \(1 \le i \le 31\), are identical for all values of \(x \in \mathbb {Z}_{2^4}\), without loss of generality. Likewise, the simulator also verified the joint distribution of all possible combinations of pairs of elements in \(I \cup J \cup \{a_1,\ldots ,a_{31}\}\) (i.e., the union of the set of inputs, required random values, and intermediate states) are identical for all values of \(x \in \mathbb {Z}_{2^4}\), without loss of generality. Thus, demonstrating that Algorithm 2 is resistant to first and second-order side-channel analysis. We chose the field \(\mathbb {Z}_{2^4}\) to be small enough to make verification trivial and large enough that carries are propagated as they would be in an arbitrarily large field.

Remark 1

We note our proof is somewhat different to the proofs described by Barthe et al. [4]. Typically, one would seek to model intermediate states as random values to ease the computation complexity of the verification. However, the combination of Boolean and arithmetic operations makes this difficult, and it is simpler to model the random values as inputs to determine whether the distribution of each intermediate state is identical for all values that the masked input can take.

4 Higher-order Boolean-to-arithmetic masking

To generalize the algorithm described in Sect. 3, we consider an nth-order Boolean masking scheme, for \(n > 2\), that masks the secret value x with random masks \(r_1, \ldots , r_n\). That is, we wish to take \(x^{\prime } = x \oplus \bigoplus _{i=1}^n r_i\) and compute \(x^{\prime \prime } = x + \sum _{i=1}^n s_i\) without allowing any nth-order leakage to occur (see Definition 2).

In the following, we try to express the ideas behind the mask conversion as clearly as possible. This leads to many instances where an expedient expression will leak if implemented as shown because of the ordering of operations. We highlight some of the issues in the text but do not attempt to enumerate all the possible leaks that could be caused by incorrectly ordering operations.

4.1 The Algorithm

Our conversion method consists of four steps.

  1. 1.

    We compute \(x + (\alpha \oplus \bigoplus _{i=1}^n r_i) + \bigoplus _{i=1}^{n-1} \mu _i\) for some random values \(\alpha , \mu _i \in \mathbb {Z}_{2^k}\).

  2. 2.

    We compute \((\alpha \oplus \bigoplus _{i=1}^n r_i) + \bigoplus _{i=1}^{n-1} \mu _i + \bigoplus _{i=1}^{n-1} \kappa _i\) for some random values \(\kappa _i \in \mathbb {Z}_{2^k}\).

  3. 3.

    We compute \(\bigoplus _{i=1}^{n-1} \kappa _i + \sum _{i=1}^{n} s_i\) for all output masks \(s_i \in \mathbb {Z}_{2^k}\).

  4. 4.

    We combine the results of Steps 1, 2, and 3 to obtain \(x + \sum _{i=1}^n s_i\).

We describe these steps in detail below.

Step 1 We consider Goubin’s solution to the first-order Boolean-to-arithmetic mask conversion (5):

$$\begin{aligned} x + r = (x \oplus r) \oplus \varPhi (x \oplus r, \gamma ) \oplus \varPhi (x \oplus r, \gamma \oplus r). \end{aligned}$$
(20)

Let \(r=r_1 \oplus \cdots \oplus r_n\) and \(\gamma = \gamma _1 \oplus \ldots \oplus \gamma _n\), where \(r_1,\ldots , r_n, \gamma _1,\ldots , \gamma _n \in \mathbb {Z}_{2^k}\); then following the reasoning given in Sect. 3.2, we can state

$$\begin{aligned} x + \bigoplus _{i=1}^n r_i = x^{\prime } \oplus \varPhi \left( x^{\prime }, \bigoplus _{i=1}^n \gamma _i \right) \oplus \varPhi \left( x^{\prime }, \bigoplus _{i=1}^n \gamma _i \oplus r_i \right) . \end{aligned}$$
(21)

Given that \(\varPhi \) is affine over \(\mathbb {F}_2\), we can split the first \(\varPhi \) operation giving,

$$\begin{aligned} x + \bigoplus _{i=1}^n r_i= & {} ((n \wedge 1 ) x^{\prime } ) \oplus \left( \bigoplus _{i=1}^n \varPhi (x^{\prime }, \gamma _i) \right) \nonumber \\&\oplus \varPhi \left( x^{\prime }, \bigoplus _{i=1}^n \gamma _i \oplus r_i \right) , \end{aligned}$$
(22)

where \(\wedge \) is a logical-AND operation. That is, we require an XOR with \(x^{\prime }\) only when n is odd.

To prevent second-order leakage caused by the combination of the input \(x^{\prime }\) and the output of (22), we apply an extra Boolean mask, \(\alpha \in \mathbb {Z}_{2^k}\), following the reasoning given in Sect. 3, i.e.,

$$\begin{aligned} x&+ \left( \alpha \oplus \bigoplus _{i=1}^n r_i \right) = (( n \wedge 1 ) ( x^{\prime } \oplus \alpha )) \nonumber \\&\oplus \left( \bigoplus _{i=1}^n \varPhi (x^{\prime } \oplus \alpha , \gamma _i) \right) \oplus \varPhi \left( x^{\prime } \oplus \alpha , \alpha \oplus \bigoplus _{i=1}^n \gamma _i \oplus r_i \right) , \end{aligned}$$
(23)

where we compute \(\varPhi (x^{\prime } \oplus \alpha , \gamma _i)\), for \(i \in \{ 1, \ldots , n \}\), as

$$\begin{aligned} \varPhi ( x^{\prime },\alpha , \gamma _i ) \longmapsto (( x^{\prime } \oplus \gamma _i) \oplus \alpha ) + \gamma _i \end{aligned}$$

to avoid any second-order leakage caused by combining \((x^{\prime } \oplus \alpha )\) with the output of (23).

However, the computation would still cause a higher-order leak, i.e., when \(x^{\prime }\), \(\alpha \), and (23) get combined. Thus, we are required to add extra masks to prevent this leakage, and we use \(\mu _i\) for \(i \in \{1, \ldots , n-1 \}\) and also \(\xi _i\) for \(i \in \{1, \ldots , n-2 \}\), as follows:

$$\begin{aligned}&\left( x + \left( \alpha \oplus \bigoplus _{i=1}^n r_i \right) \right) \oplus \bigoplus _{i=1}^{n-1} \mu _i \nonumber \\&\quad = \bigoplus _{i=1}^{n-2} \xi _i \oplus (( n \wedge 1 ) ( x^{\prime } \oplus \alpha )) \oplus \left( \bigoplus _{i=1}^{n-1} \varPhi (x^{\prime } \oplus \alpha , \gamma _i) \oplus \mu _i \right) \nonumber \\&\quad \quad \oplus \varPhi (x^{\prime } \oplus \alpha , \gamma _n) ) \oplus \varPhi \left( x^{\prime } \oplus \alpha , \alpha \oplus \bigoplus _{i=1}^n \gamma _i \oplus r_i \right) \oplus \bigoplus _{i=1}^{n-2} \xi _i . \end{aligned}$$
(24)

Note that the masks \(\xi _i\) are used to protect the intermediate values of (24) as there is not a secure way of ordering these terms without causing leakage. The masks, therefore, need to be interleaved with the computation and removed at the end.

The result is then passed through a function that will perform a Boolean-to-arithmetic mask conversion to replace the Boolean operation with an arithmetic operation, i.e.,

$$\begin{aligned} x +&\left( \alpha \oplus \bigoplus _{i=1}^n r_i \right) + \bigoplus _{i=1}^{n-1} \mu _i. \end{aligned}$$
(25)

Note that the first-order conversion requires sufficient additional masks (\(\delta _i\) for \(i=1...n-1\)) to not cause any leakage, i.e.,

$$\begin{aligned} x +&\left( \alpha \oplus \bigoplus _{i=1}^n r_i \right) + \bigoplus _{i=1}^{n-1} \mu _i = ( (n-1) \wedge 1 )\zeta ) \nonumber \\&\oplus \left( \bigoplus _{i=1}^{n-1} \varPhi (\zeta , \delta _i) \right) \oplus \varPhi \left( \zeta , \bigoplus _{i=1}^{n-1} \delta _i \oplus \mu _i \right) , \end{aligned}$$
(26)

where \(\zeta = (x + (\alpha \oplus \bigoplus _{i=1}^n r_i )) \oplus \bigoplus _{i=1}^{n-1} \mu _i\).

Step 2 In the second step, we wish to compute

$$\begin{aligned} \left( \alpha \oplus \bigoplus _{i=1}^n r_i \right) + \bigoplus _{i=1}^{n-1} \mu _i + \bigoplus _{i=1}^{n-1} \kappa _i , \end{aligned}$$
(27)

for some random values \(\kappa _i \in \mathbb {Z}_{2^k}\). In which, we view the combination of any elements of \(\{ \kappa _1, \ldots , \kappa _{n-1} \}\), \(\{ \mu _1 \), \(\ldots \), \(\mu _{n-1} \}\), and, likewise, the combination of any elements of \((r_1 \oplus \ldots \oplus r_n)\) as secret. Let

$$\begin{aligned} \epsilon = \alpha \oplus \bigoplus _{i=1}^n r_i \oplus \bigoplus _{i=1}^{n-1} \kappa _i , \end{aligned}$$
(28)

then, given (22), we can compute

$$\begin{aligned}&\left( \alpha \oplus \bigoplus _{i=1}^n r_i \right) + \bigoplus _{i=1}^{n-1} \kappa _i = \left( \left( (n-1) \wedge 1 \right) \epsilon \right) \nonumber \\&\quad \oplus \left( \bigoplus _{i=1}^{n-1} \varPhi (\epsilon , \beta _i) \right) \oplus \varPhi \left( \epsilon , \bigoplus _{i=1}^{n-1} \kappa _i \oplus \beta _i \right) , \end{aligned}$$
(29)

where \(\beta _i\) are random values taken from \(\mathbb {Z}_{2^k}\) for \(i \in \{ 1, \ldots , n-1 \}\). We note that the order in which operands are treated is particularly important. For example, the terms of the XOR sums need to be computed separately, i.e., \(\bigoplus _{i=1}^n r_i \oplus \bigoplus _{i=1}^{n-1} \kappa _i = \left( \kappa _1 \oplus r_1 \right) \oplus \left( \kappa _2 \oplus r_2 \right) \oplus \cdots \oplus \left( \kappa _{n-1} \oplus r_{n-1} \right) \oplus r_n \,\).

As a next step, we add \(n-1\) additional masks, as in Step 1, i.e., we add random values \(\mu _i\) for \(i \in \{1, \ldots , n-1 \}\) as follows:

$$\begin{aligned}&\left( \alpha \oplus \bigoplus _{i=1}^n r_i + \bigoplus _{i=1}^{n-1} \kappa _i \right) \oplus \bigoplus _{i=1}^{n-1} \mu _i = \left( \left( (n-1) \wedge 1 \right) \epsilon \right) \nonumber \\&\qquad \qquad \oplus \left( \bigoplus _{i=1}^{n-1} \varPhi (\epsilon , \beta _i) \oplus \mu _i \right) \oplus \varPhi \left( \epsilon , \bigoplus _{i=1}^{n-1} \kappa _i \oplus \beta _i \right) . \end{aligned}$$
(30)

Finally, we can perform a first-order secure Boolean-to-arithmetic mask conversion (similar as described in Step 1) to replace the single Boolean operation with an arithmetic operation giving

$$\begin{aligned}&\left( \alpha \oplus \bigoplus _{i=1}^n r_i + \bigoplus _{i=1}^{n-1} \kappa _i \right) + \bigoplus _{i=1}^{n-1} \mu _i, \end{aligned}$$
(31)

which equals to (27).

Step 3 In the third step, we wish to compute \(\bigoplus _{i=1}^{n-1} \kappa _i + \sum _{i=1}^n s_i\), for some random values to be used as output masks \(s_i \in \mathbb {Z}_{2^k}\), for \(i \in \{ 1, \ldots , n \}\). This can be achieved by conducting an \((n-2)\)th-order secure Boolean-to-arithmetic mask conversion (e.g., using Algorithm 1 when \(n=3\) or Algorithm 2 where \(n=4\) etc.) using the input \(\bigoplus _{i=1}^{n-1} \kappa _i \oplus \bigoplus _{i=1}^{n-2} \lambda _i\), \(\lambda _i\) (for \(i \in \{1, \ldots , n-2 \}\)) are random values, resulting in \(\bigoplus _{i=1}^{n-1} \kappa _i + \sum _{i=1}^{n-2} s_i \,\). Note that we choose the output masks of the lower-order conversion to be used as output masks for the order we are considering.

Then one can add \(s_{n-1}\) and \(s_n\) to get the desired result, i.e.,

$$\begin{aligned} \bigoplus _{i=1}^{n-1} \kappa _i + \sum _{i=1}^{n} s_i . \end{aligned}$$
(32)

Step 4 We add the output of each step. Adding the output of Step 1 to the output of Step 3 produces

$$\begin{aligned} x + \left( \alpha \oplus \bigoplus _{i=1}^n r_i \right) + \bigoplus _{i=1}^{n-1} \mu _i + \bigoplus _{i=1}^{n-1} \kappa _i + \sum _{i=1}^{n} s_i . \end{aligned}$$
(33)

Then subtracting the output of Step 2 results in

$$\begin{aligned} x + \sum _{i=1}^{n} s_i \,. \end{aligned}$$
(34)

4.1.1 Complexity

Each of the steps described above, without the use of Boolean-to-arithmetic mask conversions of a lower order, will have a linear increase in time complexity with regard to the order of the side-channel resistance, that is, have time complexity \(\mathcal {O}(n)\). The recursive call to Boolean-to-arithmetic mask conversions of a lower order will increase the time complexity to \(\mathcal {O}( n^2)\).

4.2 Implementation details

Algorithm 3 gives an explicit implementation of a third-order secure Boolean-to-arithmetic mask conversion as an example of the method described above, which requires 74 instructions.

figure c

As previously, we proceed with a 3-SNI proof:

Lemma 3

(3-SNI of Algorithm 3) Let \(\{ x^{\prime }, r_1, r_2, r_3 \}\) be the input shares of Algorithm 3, and \(\{ x^{\prime \prime }, s_1, s_2, s_3 \}\) be the output shares for any set of t intermediate variables and any subset \(|k| \le t_k\) of output shares such that \(t + t_k \le 3\), there exists a subset I of intermediate variables with \(|I| \le 3\), such that the distribution of those t intermediate variables, and the output shares can be perfectly simulated from \(\{ x^{\prime }, r_1, r_2, r_3 \}\).

Proof

We construct two sets \(I = \{ x^{\prime }, r_1, r_2, r_3 \}\) and \(J = \{ \gamma _1, \gamma _2\), \(\gamma _3\), \(\beta _1\), \(\beta _2\), \(\delta _1\), \(\delta _2\), \(\kappa _1\), \(\kappa _2\), \(\alpha \), \(\mu _1\), \(\mu _2\), \(s_1\), \(s_2\), \(s_3 \}\) corresponding to the input shares and the random values required, respectively. We denote \(a_i\), for \(1 \le i \le 74\), as the intermediate values in Algorithm 3, the definition of which means that is easy to see that each \(a_i\) can be perfectly simulated from the input shares and/or the required random values. That is, any internal variable within Algorithm 3 can be perfectly simulated from a subset of elements from I and/or J. \(\square \)

Our proof was validated by implementing a simulator and verifying that the distributions of each \(a_i\), for \(1 \le i \le 74\) is identical for all values of \(x \in \mathbb {Z}_{2^4}\), without loss of generality. Likewise, the simulator also verified the joint distribution of all possible combinations of pairs and triplets of elements in \(I \cup J \cup \{a_1,\ldots ,a_{74}\}\) (i.e., the union of the set of inputs, required random values, and intermediate states) are identical for all values of \(x \in \mathbb {Z}_{2^4}\), without loss of generality. Thus, demonstrating that Algorithm 3 is resistant to first, second, and third-order side-channel analysis. In this case the choice of the field \(\mathbb {Z}_{2^4}\) is more important as a large field size could not be tested in a reasonable amount of time.

The number of inputs required for Algorithm 3 would seem to be too large for an efficient exhaustive search through all the possible sources of third-order leakage. However, we note that in simulating individual operations only a subset of the inputs or random values are required. The effect is similar to the use of gadgets in SNI proofs [4].

More concretely, to conduct a search of this algorithm efficiently the only combinations where at least one element in the elements chosen from the set \(I \cup J \cup \{a_1,\ldots ,a_{74}\}\) is dependent on x. That is, any combination where none of the elements were computed from variables dependent on x they can be safely discarded. For elements in \(\{a_1,\ldots ,a_{74}\}\) the simplest known expression using elements of \(I \cup J\) was taken by either following the sequence of instructions, or from the equations above. These expressions were used to generate a C source file to analyze that combination, which was then complied with the -Ofast flag using gcc. Thus, \(1.29 \times 10^5\) C source files were automatically created and compiled. Executing the resulting binaries required 48 CPU cores running for twelve weeks.

We test our algorithm as one block because, to date, breaking up our algorithm into gadgets that can be independently verified and creating an algorithm equivalent to those we propose has a prohibitive cost in performance [8]. That is, Coron’s algorithm has exponential complexity, with regard to the security order, compared to the quadratic complexity of the algorithm presented in this paper.

5 Comparison

Table 1 compares the performance of our method with previous work. We consider the work of Coron et al. [7] who proposed a high-order secure Boolean-to-arithmetic algorithm in 2014. We also discuss the recent results from Coron [8] and compare it with our results. We do not consider LUT-based methods as they would require a precomputation phase and additional memory (see Sect. 2). For this comparison, we estimated the operation count of all methods by considering all necessary operations excluding the generation of random numbers, loop-instruction overheads, and variable initialization.

Table 1 Operation count for different Boolean-to-arithmetic mask conversion methods up to a security order of eight

We estimate the costs for Coron et al. [7] higher-order Boolean-to-arithmetic mask conversion method as follows. For a single-masked AND (SecAnd) operation [7, Sect. 3], we estimate the number of required instructions to be

$$\begin{aligned} \begin{aligned} 2 \cdot (n+1)\cdot n+25, \end{aligned} \end{aligned}$$

with n being the security order. Furthermore, we estimate the higher-order secure masked addition function (SecAddGoubin) as defined in [7, Sect. 3.2] to be

$$\begin{aligned} \begin{aligned} (2 \cdot&(n+1)\cdot n+26+n) \\&+(k-1)\cdot [2\cdot (n+1)\cdot n+27]+(2\cdot (n+1)), \end{aligned} \end{aligned}$$

where k represents the bit-width of the operands. The Expand function has an estimated complexity of \(2\cdot (n+1)\) and the FullXor function requires \(2 \cdot n + n\).

Using these estimations, we calculated the total operation count for a higher-order Boolean-to-arithmetic mask conversion as defined in [7, Sect. 5] for register sizes of 8, 16, 32, and 64 bits and provide the results in Table 1. It shows that our solution is faster than Coron et al. method from [7] for all considered register widths and security orders, often by several orders of magnitude. Compared to [8], who adopts our original idea from [13], our solution is faster for any security order. Specifically, our third-order secure algorithm is more than twice as fast, and the difference increases to more than an order of magnitude by the eight order. Our second-order secure algorithm is at least one order of magnitude faster than previous work.

5.1 Performance details

Table 2 lists the number of required instructions for our proposed algorithms in terms of arithmetic and Boolean operations up to a security order of 8. As a reference, we also list Goubin’s solution in the first-order case, and our solution in the other cases.

Table 2 Number of arithmetic and Boolean operations required for our proposed method up to a security order of eight
Table 3 Comparison of required number of random variables

In Table 3, we list the number of required random variables to perform a Boolean-to-arithmetic mask conversion. Many mask conversion proposals give an artificially low instruction count hiding a large number of required random numbers. We list the number of random values required to compute Goubin’s original method [12], Coron et al. method [7], our original solution from 2016 [13], and Coron’s recent approach [8]. We compare these numbers with the number of random values required by our proposed algorithm. For the second- and third-order algorithms, we give the number of random values required by our explicit algorithms, where some random values are used to fulfill several purposes.

The results show that our solution requires significantly fewer random variables when compared to all other algorithms in the literature. In particular, when compared to [8], we require almost half the number of random values for the second-order case, and more than half the number random number is all other cases, increasing to an order of magnitude by the eighth order.

Finally, we provide a comparison of all solutions that takes into account the number of randomness required. Table 4 lists the total instruction count for the algorithms. We added the number of instructions listed in Table 1 with the number of instructions required to generate all random numbers that are required for the mask conversion. For this purpose, we consider a practical implementation of a Pseudo Random Number Generator (PRNG), like the Marsaglia’s Xorshift RNG used by Coron [8]. We consider only the XOR instructions, since modern processors are often able to perform shift operations for free. We also did not consider typical overheads that are caused by function calls or loading constants from memory. Therefore we assume that generating a random number takes five instructions in the comparison given in Table 4.

By considering the cost of generating random values, we show that our proposal is about twice as fast as the next best algorithm for lower orders and increases to more than ten times faster for the eighth order case and above.

Table 4 Total operation count including the generation of random numbers using Marsaglia’s Xorshift PRNG [21] from Coron’s the open-source C implementation

6 Implementation considerations

All algorithms described in this paper have the property that all calculated intermediates (and relevant higher-order combinations thereof) are statistically independent of the secret value x. In the past, it has been shown that the claimed security order of those algorithms is usually lower when they are directly applied in software or hardware. For example, in a software implementation intermediate values are often unintentionally combined by the underlying hardware architecture. One typical cause of leakage is where intermediate values of the algorithm, which are stored in some registers, get overwritten with other intermediate results of the algorithms. Other sources of leakage include the combination of internal signals that depend on two or more intermediate values which are either stored in registers (register interferences) or currently (or previously) used in operations in the processor’s datapath. Hence, implementations of first-order side-channel resistant algorithms may show first-order leakages in practice, and the same holds true for higher-order secure algorithms whose resistance level has shown to be actually lower than claimed [3].

Direct applications of secure algorithms in hardware require similar care when implemented. Integrated circuits in CMOS, for example, have the property that many gates make output transitions several times per clock cycle. Such transitions (glitches) contain information about the secret value, even though all intermediates have been carefully masked at the algorithm level [20]. State-of-the-art countermeasures try to get rid of those physical effects by applying (additional) countermeasures at the gate level (e.g., using secure logic styles such as dual-rail logic [18]) or algorithm level (e.g., using secret sharing and multi-party computation such as threshold implementations [23]).

Naïve implementation of algorithms that have been proven secure—in the sense that every calculated intermediate is statistically independent of the secret value—can, therefore, not be automatically considered resistant to side-channel analysis. However, the algorithms proposed in this paper can be combined with other countermeasures in order to guarantee resistance at the claimed security order. We do not provide any further details here since the countermeasures required will vary from one platform to another.

7 Conclusions

In this paper, we present Boolean-to-arithmetic mask conversion methods that can be computed in constant time for a masking scheme of second and third order. We present explicit algorithms for a second-order secure mask conversion that requires 31 instructions, and a third-order secure mask conversion that requires 74 instructions. Our second-order secure algorithm is at least one order of magnitude faster than previous work; and our corrected algorithm for third order is more than twice as fast.

Our algorithms are shown to be secure under the SNI model, although we treat our algorithms as a single gadget, rather than break it into smaller gadgets that can be independently verified. Attempts at achieving this have a large performance cost [8], i.e., exponential complexity with regard to the security order, compared to the quadratic complexity of the algorithm presented in this paper. An efficient method for that would allow one to break our algorithms into gadgets is left for future research. Also, given the resources required to validate the SNI proof for Algorithm 3, a similar proof for orders greater than three is also left for future research.