Constant-time higher-order Boolean-to-arithmetic masking

Hutter, Michael; Tunstall, Michael

doi:10.1007/s13389-018-0191-z

Constant-time higher-order Boolean-to-arithmetic masking

Regular Paper
Published: 23 May 2018

Volume 9, pages 173–184, (2019)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Journal of Cryptographic Engineering Aims and scope Submit manuscript

Constant-time higher-order Boolean-to-arithmetic masking

Download PDF

287 Accesses
10 Citations
3 Altmetric
Explore all metrics

Abstract

Converting a Boolean mask to an arithmetic mask, and vice versa, is often required in implementing side-channel-resistant instances of cryptographic algorithms that mix Boolean and arithmetic operations. In this paper, we describe a method for converting a Boolean mask to an arithmetic mask that runs in constant time for a fixed order and has quadratic complexity as the security order increases, a significant improvement in previous work that has exponential complexity. We propose explicit algorithms for a second-order secure Boolean-to-arithmetic mask conversion that uses 31 instructions and for a third-order secure mask conversion that uses 74 instructions. We show that our second-order secure algorithm is at least an order of magnitude faster and our third-order secure algorithm is more than twice as fast as other algorithms in the literature.

High-Order Conversion from Boolean to Arithmetic Masking

Secure Conversion between Boolean and Arithmetic Masking of Any Order

Conversion from Arithmetic to Boolean Masking with Logarithmic Complexity

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Differential Power Analysis (DPA) was introduced as a means of extracting cryptographic keys by Kocher et al. [16] in 1999, who noted that the power consumption of a device was dependent on the operations being performed, and the value of the operands used. They showed that one could acquire the power consumption over time while a device was computing a cryptographic algorithm, and analyze the acquisitions to determine the cryptographic key. Subsequently, it was shown that the same analyses could be conducted by exploiting other side channels, e.g., the changes in the electromagnetic field around a microprocessor [1, 10, 25].

A typical DPA attack involves acquiring a series of acquisitions while a device is operating on varying inputs and analyzing the power traces by comparing what occurred at the same point in time in each trace. The simplest analysis is to choose one bit of an intermediate state and divide the set of acquisitions depending on the value of this bit, make two mean traces and subtract one trace from the other point by point. A significant difference should be visible in the trace corresponding to where this intermediate state was created by the device. This is typically referred to as a first-order analysis, as each point in the output trace is dependent on the same point in time in the acquisitions. If two (or more) points in each trace are combined, we refer to as a second-order (or higher-order) analysis.

To prevent the side-channel analyses of a cryptographic implementation, one would typically apply a random mask to the input such that operating on the masked data is indistinguishable from random data. A common masking technique is Boolean masking, where an input word gets masked by a random value. All operations are then performed using the Boolean-masked data. However, there exist many cryptographic algorithms that require both Boolean and arithmetic operations, such as the addition of integers, e.g., SHA-2 [22], ChaCha [5], Blake [2], Skein [9], IDEA [17], RC6 [27]. Masked versions of these algorithms therefore require changing Boolean masks into arithmetic masks, and vice versa, which we refer to as “Boolean-to-arithmetic" and “Arithmetic-to-Boolean" mask conversions, respectively.

In 2001, Goubin [12] proposed an efficient constant-time method for Boolean-to-arithmetic mask conversion. His method is secure against first-order analysis, but does not resist second-order attacks. The solutions in the literature use recursive methods [7, 28], where the missing carry bits are calculated using a masked-adder structure, or look-up table-based methods [30, 31], that perform precomputations and store intermediates in memory. It has also been suggested that higher-order versions of Boolean-to-arithmetic mask conversion cannot be done in constant time [30].

In this paper, we present novel algorithms for higher-order secure Boolean-to-arithmetic mask conversion. All proposed methods run in constant time and are independent on the input-word size. In particular, we present a second-order secure algorithm that requires only 31 instructions and a third-order secure algorithm that requires only 74 instructions. Our algorithms are significantly faster than the best recursive methods in the literature [7].

This paper is an extended version of previous work first published on the IACR’s eprint server in 2016 [13]. After a weakness was identified in our third, and higher-order, algorithms [8], the algorithms were updated and are presented here. Our second-order secure algorithm remains secure and is at least one order of magnitude faster than previous work. The algorithms we present were tested using exhaustive simulation, inspired by the state-of-the-art strong non-interference (SNI) notion. Proofs based on this notion are also provided. We show that our corrected algorithms are, at a minimum, about twice as fast as related work—including recent work [8].

In addition to the new contributions, we highlight the importance of the number of random numbers when comparing algorithms. Many algorithms give an artificially low instruction count hiding a large number of required random numbers. In comparing our work with other published works, we take into account the number of random values required to implement an algorithm by considering the number of instructions required to compute an Xorshift random number generator [21] used by Coron’s explicit implementation [8]. We demonstrate that our work is about twice as fast as Coron’s in all cases. Moreover, our algorithms have quadratic complexity with regard to the security order, whereas Coron’s algorithms have exponential complexity.

The paper is organized as follows. In Sect. 2, we describe the general Boolean-to-arithmetic mask conversion problem and discuss previous work. In Sect. 3, we present a novel constant-time algorithm to perform a secure second-order Boolean-to-Arithmetic mask conversion, and generalize it to higher orders in Sect. 4. In Sect. 5, we compare our work with other algorithms in the literature, and discusses implementation considerations in both software and hardware in Sect. 6. Conclusions are drawn in Sect. 7.

2 Boolean-to-arithmetic masking

In this paper, we shall consider operations available in a typical microprocessor with registers of a fixed bit length. Specifically, we shall consider values that are in the field $(\mathbb {Z}_{2^k}, \oplus , +)$ where $k \in \mathbb {Z}_{\ge 0}$ is the bit length of the registers used, $\oplus $ is a bitwise XOR operation and $+$ is integer addition. Other operations are available in a typical microprocessor, but are not relevant to the algorithms described in this paper.

We define the problem of changing a Boolean mask into an arithmetic mask as follows:

Definition 1

(The problem of converting Boolean to arithmetic masks) Given $x^{\prime }=x \oplus r$, where $x, r \in (\mathbb {Z}_{2^k}, \oplus , +)$, as a Boolean-masked secret x and r is a random value taken from $\mathbb {Z}_{2^k}$, we wish to be able to compute $x^{\prime \prime }= x + s$, with $s \in (\mathbb {Z}_{2^k}, \oplus , +)$ where $k \in \mathbb {Z}_{\ge 0}$, without revealing any information on x through some side channel, where $x^{\prime \prime }$ is the arithmetically masked secret x and s is a random value taken from $\mathbb {Z}_{2^k}$.

One naïve approach would be to perform the conversion directly by simply removing the Boolean mask and by adding an arithmetic mask afterward, i.e.,

$$\begin{aligned} (x^{\prime } \oplus r) + s = ((x \oplus r) \oplus r) + s = x + s = x^{\prime \prime }, \end{aligned}$$

using the notation given in Definition 1. This, however, would manipulate x directly, allowing an attacker to use side-channel analysis to determine that a hypothesized value of x is manipulated during the mask conversion. Hence, one needs to use an algorithm where all intermediates are statistically independent of the secret x.

Definition 1 is generalized to higher-order masking schemes as follows:

Definition 2

(The problem of converting Boolean to arithmetic masks of higher order) Assuming a masking scheme of order n. Then, given $x^{\prime }=x \oplus r_1 \oplus \cdots \oplus r_n$, where $x, r_i \in (\mathbb {Z}_{2^k}, \oplus , +)$, $k \in \mathbb {Z}_{\ge 0}$ for $i \in \{ 1, \ldots , n\}$, as a Boolean-masked secret x and n a random values, $r_i$ for $i \in \{ 1, \ldots , n\}$, taken from $\mathbb {Z}_{2^k}$, we wish to compute $x^{\prime \prime }=x + s_1 + \cdots + s_n$, with $s_i \in (\mathbb {Z}_{2^k}, \oplus , +)$ for $i \in \{ 1, \ldots , n\}$, without revealing any information on x through some side channel, where $x^{\prime \prime }$ is the arithmetically masked secret x and $s_i$, for $i \in \{ 1, \ldots , n\}$, are random values taken from $\mathbb {Z}_{2^k}$.

Higher-order mask conversion methods require that the masks used for the arithmetically masked output are not related to the Boolean-masked input to avoid any side-channel leakage. If we consider, without loss of generality, a second-order secure Boolean-to-arithmetic mask conversion that uses the same input masks $r_1$ and $r_2$ to mask the output, information would leak through the carries generated from the arithmetic masks. For ease of expression, we shall consider an attacker able to XOR two intermediate states together in a second-order side-channel attack (a very rough approximation of a second-order side-channel attack, we refer the interested reader to Mangard et al. [19] for a more detailed discussion). If an attacker can combine the input $x^{\prime }$ and the output $x^{\prime \prime }$ using some side-channel information, they obtain the following:

$$\begin{aligned} x^{\prime } \oplus x^{\prime \prime }&= (x \oplus r_1 \oplus r_2) \oplus (x + r_1 + r_2) \\&= (x \oplus r_1 \oplus r_2) \oplus ((x \oplus r_1 \oplus c_1)) \oplus r_2 \oplus c_2) \\&= c_1 \oplus c_2, \end{aligned}$$

where $c_1$ and $c_2$ represent the carries produced in the additions $x + r_1$ and $(x + r_1) + r_2$, respectively, as an XOR difference. That is, $c_1 = (x + r_1) \oplus x \oplus r_1$ and $c_1 = (x + r_1 + r_2) \oplus x \oplus r_1 \oplus r_2$. We note that $c_1$ and $c_2$ are dependent on x and could be used to conduct a side-channel attack.

To avoid this source of higher-order leakage, the output of the mask conversion needs to be masked with values that are independent of the input Boolean masks. This can be achieved through refreshing the masks during the conversion, either once or periodically, as required [7].

In the following, we describe some of the methods for mask conversion that have been presented in the literature.

2.1 Goubin’s method

Goubin proposed an efficient method of converting a Boolean mask to an arithmetic mask at CHES 2001 [12]. His method requires a constant number of instructions, is resistant to first-order side-channel analysis and, at the time of writing, remains the most efficient algorithm known.

The essential observation of Goubin was that the function

$$\begin{aligned} \varPhi _\mathbb {Z}(a, b) : \mathbb {Z}^2 \longrightarrow \mathbb {Z}: a, b \longmapsto (a \oplus b) + b \end{aligned}$$

(1)

is affine over $\mathbb {F}_2$. It follows that $( \varPhi (a, b) \oplus \varPhi (a, 0) )$ is linear for any $b \in \mathbb {Z}$. Trivially, we note the same function is valid in the field $(\mathbb {Z}_{2^k}, \oplus , +)$, for any $k \in \mathbb {Z}_{\ge 0}$, and in the remainder of this paper, we shall consider the function:

$$\begin{aligned} \varPhi (a, b) \, : \, (\mathbb {Z}_{2^k}, \oplus , +)^2&\longrightarrow (\mathbb {Z}_{2^k}, \oplus , +) \\ a, b&\longmapsto (a \oplus b) + b \nonumber \end{aligned}$$

(2)

for some $k \in \mathbb {Z}_{\ge 0}$.

Taking the notation from Definition 1, for some arbitrary k in $\mathbb {Z}_{\ge 0}$, the above allows one to mask the computation of $\varPhi (x^{\prime }, r)=(x^{\prime } \oplus r) + r$ with an additional random value $\gamma \in \mathbb {Z}_{2^k}$. We recall $x, r \in (\mathbb {Z}_{2^k}, \oplus , +)$ and $x^{\prime } = x \oplus r$. Then,

$$\begin{aligned} \varPhi (x^{\prime }, \gamma \oplus r)=(x^{\prime } \oplus (\gamma \oplus r)) + (\gamma \oplus r), \end{aligned}$$

(3)

which can be followed by an unmasking step using

$$\begin{aligned} \varPhi (x^{\prime }, \gamma )=(x^{\prime } \oplus \gamma ) + \gamma . \end{aligned}$$

(4)

Hence, a secure Boolean-to-arithmetic mask conversion can be performed using the following relationship:

$$\begin{aligned} x^{\prime \prime }&= x^{\prime } \oplus \varPhi (x^{\prime }, \gamma ) \oplus \varPhi (x^{\prime }, \gamma \oplus r) \nonumber \\&= x^{\prime } \oplus [(x^{\prime } \oplus \gamma ) + \gamma ] \oplus [(x^{\prime } \oplus (\gamma \oplus r)) + (\gamma \oplus r)] \end{aligned}$$

(5)

where following the notation in Definition 1, $s=r$, i.e., $x^{\prime \prime } = x + r$. One can implement this conversion using 7 instructions (2 additions and 5 XOR operations), as described by Goubin, and is recalled in Algorithm 1.

Goubin then proceeds to give a proof of the following:

Lemma 1

An implementation of Algorithm 1 is resistant to first-order side-channel analysis.

Proof

From Algorithm 1, we can obtain the list of intermediate values $V_0,{\ldots },V_6$ that appear during the computation of (5):

$$\begin{aligned} V_0= & {} \gamma \\ V_1= & {} \gamma \oplus r \\ V_2= & {} x^{\prime } \oplus \gamma \\ V_3= & {} (x^{\prime } \oplus \gamma ) + \gamma \\ V_4= & {} \left[ (x^{\prime } \oplus \gamma ) + \gamma \right] \oplus x^{\prime } \\ V_5= & {} x^{\prime } \oplus \gamma \oplus r \\ V_6= & {} (x^{\prime } \oplus \gamma \oplus r) + (\gamma \oplus r) \end{aligned}$$

If we suppose that $\gamma $ is uniformly distributed on $\mathbb {Z}_{2^k}$, for some arbitrary $k \in \mathbb {Z}_{\ge 0}$, it is easy to see that:

The values $V_0, V_1, V_2$, and $V_5$ are uniformly distributed on $\mathbb {Z}_{2^k}$.
The distributions of $V_3, V_4$, and $V_6$ are dependent on $x^{\prime }$ but not on r. $\square $

We note that this proof holds in the field $(\mathbb {Z}_{2^k}, \oplus , +)$, for any $k \in \mathbb {Z}_{\ge 0}$, but not in $\mathbb {Z}$ since the carry produced by the most significant bits of x combined with the arithmetic mask will depend on x.

2.2 Recursive methods

One can also convert a Boolean-masked value into an arithmetically masked value using an addition operation, which generates the required carries that can then be applied to the Boolean-masked input value bit-by-bit. The first application was proposed by Goubin [12] as a means of converting an arithmetic mask to a Boolean mask (a topic beyond the scope of this paper), and a similar technique was described by Golić in 2007 who proposed using the same method for Boolean-to-arithmetic mask conversion in hardware [11]. Both conversion methods have a complexity of $\mathcal {O}(n)$ with regard to the bit length of the inputs because all n bits of the input word are processed individually.

Another hardware-oriented design was proposed by Schneider et al. [28], who presented a conversion method based on a Carry Look-ahead Adder (CLA) structure which reduces the complexity to $\mathcal {O}(\log n)$. They adopted a threshold implementation [23, 24] approach to avoid first and second-order side-channel leakage.

Recursive software implementations were proposed by, for example, Karroumi et al. They described a method adding two Boolean-masked values in $\mathcal {O}(n)$ time [15]. Coron et al. [6] were the first to propose the use of Carry Look-ahead Adders in software, thus reducing the complexity to $\mathcal {O}(\log n)$. Both works made use of masked AND operations, as defined by Trichina [29] and Ishai et al. [14], respectively.

2.3 Higher-order Boolean-to-arithmetic masking

Coron et al. [7] proposed a method for conducting a higher-order Boolean-to-arithmetic mask conversion (see Definition 2) at CHES 2014. Their algorithm calculates carries recursively and is built on masked AND and XOR operations that are resistant to higher-order side-channel analysis. Using these secure operations, one can construct an adder resistant to higher-order side-channel analysis with which one can also convert an arithmetic mask to a Boolean mask (the latter topic being beyond the scope of this paper). The authors reported that their fastest hth-order Boolean-to-arithmetic mask conversion has a minimum time complexity of $\mathcal {O}((2\,h+1)^2 n)$, with regard to the bit length of the inputs n.

The first look-up table-based conversion algorithm that resists second-order attacks was proposed by Vadnala and Großschädl [30], where, to achieve the desired level of resistance, the algorithm adopts the generic second-order secure S-box implementation of Rivain et al. [26]. Using this method, following the notation in Definition 2, one computes $x_i + r$ for fixed r, where $x_i \in \{ 0, \dots , 2^k \}$, and then chooses the correct masked output from all the possible values generated. However, a table with $2^k$ entries is required which is problematic if k is not small.

An improved version was proposed by Vadnala and Großschädl [31], where an input k-bit word would be split into p words with smaller bit widths of $\ell \le 8$ bits. The conversion is then done on each word individually, and the results combined. Their final solution has a time complexity of $\mathcal {O}(2^{\ell +2} p)$ and a memory requirement of $\mathcal {O}(2^{\ell +2} (\ell +2))$.

3 Constant-time second-order Boolean-to-arithmetic mask conversion

In this section, we present a novel method to perform second-order secure Boolean-to-arithmetic mask conversion whose time complexity is independent of the input-word size. Following the notation in Definition 2, we consider a Boolean-masked input $x^{\prime }=x \oplus r_1 \oplus r_2$, where $x, r_1, r_2 \in (\mathbb {Z}_{2^k}, \oplus , +)$, and an arithmetically masked output $x^{\prime \prime }=x + s_1 + s_2$, where $s_1, s_2 \in (\mathbb {Z}_{2^k}, \oplus , +)$.

In the following, we try to express the ideas behind the mask conversion as clearly as possible. This leads to many instances where an expedient expression will leak if implemented as shown because of the ordering of operations. We highlight some of the issues in the text but do not attempt to enumerate all the possible leaks that could be caused by incorrectly ordering operations.

3.1 Definitions

We recall (2), defined over the field $(\mathbb {Z}_{2^k}, \oplus , +)$, for any $k \in \mathbb {Z}_{\ge 0}$

$$\begin{aligned} \varPhi (a, b) \, : \, (\mathbb {Z}_{2^k}, \oplus , +)^2&\longrightarrow (\mathbb {Z}_{2^k}, \oplus , +) \\ a, b&\longmapsto (a \oplus b) + b \nonumber \end{aligned}$$

(6)

for any $k \in \mathbb {Z}_{\ge 0}$. We shall also use the function

$$\begin{aligned} \bar{\varPhi }(a, b) \, : \, (\mathbb {Z}_{2^k}, \oplus , +)^2&\longrightarrow (\mathbb {Z}_{2^k}, \oplus , +) \\ a, b&\longmapsto (a \oplus b) - b \nonumber \end{aligned}$$

(7)

for any $k \in \mathbb {Z}_{\ge 0}$. While subtraction is not a field operation, we shall use it as a convenient way of expressing the addition with the additive inverse of an operand. Similar to $\varPhi $, Goubin notes that

$$\begin{aligned} x - r = x^{\prime } \oplus \bar{\varPhi }(x^{\prime }, \gamma ) \oplus \bar{\varPhi }(x^{\prime }, \gamma \oplus r), \end{aligned}$$

(8)

using the notation in Definition 1, and that $\bar{\varPhi }$ is also affine over $\mathbb {F}_2$ [12].

3.2 The Algorithm

Our conversion method consists of three steps.

1.
We compute $\left( x + \left( r_1 \oplus r_2 \oplus \alpha \right) \right) + s_1$ for some random values $\alpha , s_1 \in \mathbb {Z}_{2^k}$.
2.
We compute $s_2 - (r_1 \oplus r_2 \oplus \alpha )$ for some random $s_2 \in \mathbb {Z}_{2^k}$.
3.
Add the results of Steps 1 and 2, resulting in $x + s_1 + s_2$.

We describe these steps in detail below.

Step 1 We consider Goubin’s solution to the first-order Boolean-to-arithmetic mask conversion (5),

$$\begin{aligned} x + r = (x \oplus r) \oplus \varPhi (x \oplus r, \gamma ) \oplus \varPhi (x \oplus r, \gamma \oplus r) . \end{aligned}$$

(9)

Let $r=r_1 \oplus r_2$ and $\gamma = \gamma _1 \oplus \gamma _2$, where $r_1, r_2, \gamma _1, \gamma _2 \in \mathbb {Z}_{2^k}$, then

$$\begin{aligned} x + (r_1 \oplus r_2)&= (x \oplus r_1 \oplus r_2) \oplus \varPhi (x \oplus r_1 \oplus r_2, \gamma _1 \oplus \gamma _2) \nonumber \\&\qquad \oplus \varPhi (x \oplus r_1 \oplus r_2, \gamma _1 \oplus \gamma _2 \oplus r_1 \oplus r_2), \end{aligned}$$

(10)

or, more succinctly, using the notation from Definition 2,

$$\begin{aligned} x + (r_1 \oplus r_2)&= x^{\prime } \oplus \varPhi (x^{\prime }, \gamma _1 \oplus \gamma _2) \nonumber \\&\qquad \oplus \varPhi (x^{\prime }, \gamma _1 \oplus \gamma _2 \oplus r_1 \oplus r_2) . \end{aligned}$$

(11)

Given that $\varPhi $ is affine over $\mathbb {F}_2$, we can split the first $\varPhi $ operation giving,

$$\begin{aligned} x + (r_1 \oplus r_2)&= \varPhi (x^{\prime }, \gamma _1) \oplus \varPhi (x^{\prime }, \gamma _2) \nonumber \\&\qquad \oplus \varPhi (x^{\prime }, \gamma _1 \oplus \gamma _2 \oplus r_1 \oplus r_2). \end{aligned}$$

(12)

If one were to compute $x + (r_1 \oplus r_2)$ using the above, a second-order side-channel attack would be possible for same reason that we require the input and output mask to be different. That is, the combined leakage of the input $x^{\prime }$ and $x + (r_1 \oplus r_2)$ will depend on x (see Sect. 2).

To overcome this problem, we apply an extra Boolean mask, $\alpha \in \mathbb {Z}_{2^k}$, to $x^{\prime }$ as follows:

$$\begin{aligned} (x \oplus \alpha ) + (r_1 \oplus r_2)&= \varPhi (x^{\prime } \oplus \alpha , \gamma _1) \oplus \varPhi (x^{\prime } \oplus \alpha , \gamma _2) \nonumber \\&\qquad \oplus \varPhi (x^{\prime } \oplus \alpha , \gamma _1 \oplus \gamma _2 \oplus r_1 \oplus r_2). \end{aligned}$$

(13)

However, $(x \oplus \alpha ) + (r_1 \oplus r_2)$ is not useful but can be modified given that $\varPhi $ is affine over $\mathbb {F}_2$, resulting in

$$\begin{aligned} x + (r_1 \oplus r_2 \oplus \alpha )&= \varPhi (x^{\prime } \oplus \alpha , \gamma _1) \oplus \varPhi (x^{\prime } \oplus \alpha , \gamma _2) \nonumber \\&\quad \oplus \varPhi (x^{\prime } \oplus \alpha , \gamma _1 \oplus \gamma _2 \oplus r_1 \oplus r_2 \oplus \alpha ). \end{aligned}$$

(14)

If we consider (14), we note that the combination of $x + (r_1 \oplus r_2 \oplus \alpha )$ with either $\varPhi (x^{\prime } \oplus \alpha , \gamma _1)$ or $\varPhi (x^{\prime } \oplus \alpha , \gamma _2)$ will not be statistically independent of x. In both cases, the combination will be dependent on the carry bits produced by the arithmetic operations, as all variables occur an even number of times. The effect is similar to that seen if masks are not changed in a higher-order conversion algorithm, as discussed in Sect. 2. We can, prevent this by applying a Boolean mask $s_1 \in \mathbb {Z}_{2^k}$, giving:

$$\begin{aligned} (x + (r_1 \oplus r_2 \oplus \alpha )) \oplus s_1&= \varPhi (x^{\prime } \oplus \alpha , \gamma _1) \oplus \varPhi (x^{\prime } \oplus \alpha , \gamma _2) \nonumber \\&\quad \oplus \varPhi (x^{\prime } \oplus \alpha , \gamma _1 \oplus \gamma _2\nonumber \\&\quad \oplus r_1 \oplus r_2 \oplus \alpha ) \oplus s_1 . \end{aligned}$$

(15)

The order that (15) is computed is important to avoid combining masks that would allow a second-order side-channel attack. However, this is quite straightforward and will not be detailed here.

Then, given $\left( x + (r_1 \oplus r_2 \oplus \alpha )\right) \oplus s_1$, one can apply Goubin’s first-order Boolean to arithmetic mask conversion, as described in Algorithm 1, which will produce

$$\begin{aligned} x + (r_1 \oplus r_2 \oplus \alpha ) + s_1 \end{aligned}$$

(16)

without any first or second-order leakage.

Step 2 The second step is another Boolean-to-arithmetic mask conversion to securely compute $s_2 - (r_1 \oplus r_2 \oplus \alpha )$, where $s_2$ represents one of the two output masks. For this purpose, one can use the first-order secure Boolean-to-arithmetic mask conversion defined in (5), where we define $s_{2}^{\prime }=s_2 \oplus (r_1 \oplus r_2 \oplus \alpha )$ as the Boolean-masked input and $s_2^{\prime \prime }= s_2 - (r_1 \oplus r_2 \oplus \alpha )$ as the arithmetically masked output of the following conversion. Then, given (5), we have

$$\begin{aligned} s_2^{\prime \prime } = s_{2}^{\prime } \oplus \bar{\varPhi }(s_{2}^{\prime }, \delta ) \oplus \bar{\varPhi }(s_{2}^{\prime }, \delta \oplus r_1 \oplus r_2 \oplus \alpha ), \end{aligned}$$

(17)

where $\delta $ is a random value taken from $\mathbb {Z}_{2^k}$. If we let $\delta =r_1$, then

$$\begin{aligned} s_2^{\prime \prime } = s_{2}^{\prime } \oplus \bar{\varPhi }(s_{2}^{\prime }, r_1) \oplus \bar{\varPhi }(s_{2}^{\prime }, r_2 \oplus \alpha ), \end{aligned}$$

(18)

and, given that $\bar{\varPhi }$ is affine over $\mathbb {F}_2$, this can be rewritten as

$$\begin{aligned} s_2 - (r_1 \oplus r_2 \oplus \alpha ) = \bar{\varPhi }(s_{2}^{\prime }, r_1) \oplus \bar{\varPhi }(s_{2}^{\prime }, r_2) \oplus \bar{\varPhi }(s_{2}^{\prime }, \alpha ). \end{aligned}$$

(19)

Equation (18) requires a total of 7 XORs and 2 additions, whereas Eq. (19) requires 5 XORs and 3 additions. Thus, the first equation might be attractive for hardware implementations in cases where additions are more expensive than XOR operations.

Step 3 We can now compute the desired arithmetically masked value $x^{\prime \prime }$ by combining the output of (16) and (19), i.e.,

$$\begin{aligned} x^{\prime \prime }= & {} \left( \left( x + \left( r_1 \oplus r_2 \oplus \alpha \right) + s_1 \right) \right) + \left( s_2 - \left( r_1 \oplus r_2 \oplus \alpha \right) \right) \\= & {} x + s_1 + s_2 . \end{aligned}$$

3.3 Implementation details

Algorithm 2 shows the second-order secure Boolean-to-arithmetic mask conversion described above, which requires 31 instructions. We made some effort to reduce the number of instructions and random values required without affecting the level of security or the sequence of steps. That is, we compute the steps given at the beginning of Sect. 3.2. The details of the steps given above remain unmodified for clarity.

We prove the security of Algorithm 2 using the probing model proposed by Ishai et al. [14], where we seek to show that it is secure for up to two probes. For this, we use the refined model proposed by Barthe et al. [4] where we make use of the t-SNI (strong non-interference) construction, with t being the security order. This allows us to prove that an algorithm is only vulnerable to a side-channel attack of order n, where $n \ge t + 1$ (rather than $n \ge 2\,t + 1$ required by Ishai et al.).

Lemma 2

(2-SNI of Algorithm 2) Let $\{ x^{\prime }, r_1, r_2 \}$ be the input shares of Algorithm 2, and $\{ x^{\prime \prime }, s_1, s_2 \}$ be the output shares for any set of t intermediate variables and any subset $|k| \le t_k$ of output shares such that $t + t_k \le 2$, there exists a subset I of intermediate variables with $|I| \le 2$, such that the distribution of those t intermediate variables, and the output shares can be perfectly simulated from $\{ x^{\prime }, r_1, r_2 \}$.

Proof

We construct two sets $I = \{ x^{\prime }, r_1, r_2 \}$ and $J = \{ \gamma _1, \gamma _2, \alpha , s_1, s_2 \}$ corresponding to the input shares and the random values required, respectively. We denote $a_i$, for $1 \le i \le 31$, as the intermediate values in Algorithm 2, the definition of which means that is easy to see that each $a_i$ can be perfectly simulated from the input shares and/or the required random values. That is, any internal variable within Algorithm 2 can be perfectly simulated from a subset of elements from I and/or J. $\square $

This was validated by implementing a simulator and verifying that the distributions of each $a_i$, for $1 \le i \le 31$, are identical for all values of $x \in \mathbb {Z}_{2^4}$, without loss of generality. Likewise, the simulator also verified the joint distribution of all possible combinations of pairs of elements in $I \cup J \cup \{a_1,\ldots ,a_{31}\}$ (i.e., the union of the set of inputs, required random values, and intermediate states) are identical for all values of $x \in \mathbb {Z}_{2^4}$, without loss of generality. Thus, demonstrating that Algorithm 2 is resistant to first and second-order side-channel analysis. We chose the field $\mathbb {Z}_{2^4}$ to be small enough to make verification trivial and large enough that carries are propagated as they would be in an arbitrarily large field.

Remark 1

We note our proof is somewhat different to the proofs described by Barthe et al. [4]. Typically, one would seek to model intermediate states as random values to ease the computation complexity of the verification. However, the combination of Boolean and arithmetic operations makes this difficult, and it is simpler to model the random values as inputs to determine whether the distribution of each intermediate state is identical for all values that the masked input can take.

4 Higher-order Boolean-to-arithmetic masking

To generalize the algorithm described in Sect. 3, we consider an nth-order Boolean masking scheme, for $n > 2$, that masks the secret value x with random masks $r_1, \ldots , r_n$. That is, we wish to take $x^{\prime } = x \oplus \bigoplus _{i=1}^n r_i$ and compute $x^{\prime \prime } = x + \sum _{i=1}^n s_i$ without allowing any nth-order leakage to occur (see Definition 2).

In the following, we try to express the ideas behind the mask conversion as clearly as possible. This leads to many instances where an expedient expression will leak if implemented as shown because of the ordering of operations. We highlight some of the issues in the text but do not attempt to enumerate all the possible leaks that could be caused by incorrectly ordering operations.

4.1 The Algorithm

Our conversion method consists of four steps.

1.
We compute $x + (\alpha \oplus \bigoplus _{i=1}^n r_i) + \bigoplus _{i=1}^{n-1} \mu _i$ for some random values $\alpha , \mu _i \in \mathbb {Z}_{2^k}$.
2.
We compute $(\alpha \oplus \bigoplus _{i=1}^n r_i) + \bigoplus _{i=1}^{n-1} \mu _i + \bigoplus _{i=1}^{n-1} \kappa _i$ for some random values $\kappa _i \in \mathbb {Z}_{2^k}$.
3.
We compute $\bigoplus _{i=1}^{n-1} \kappa _i + \sum _{i=1}^{n} s_i$ for all output masks $s_i \in \mathbb {Z}_{2^k}$.
4.
We combine the results of Steps 1, 2, and 3 to obtain $x + \sum _{i=1}^n s_i$.

We describe these steps in detail below.

Step 1 We consider Goubin’s solution to the first-order Boolean-to-arithmetic mask conversion (5):

$$\begin{aligned} x + r = (x \oplus r) \oplus \varPhi (x \oplus r, \gamma ) \oplus \varPhi (x \oplus r, \gamma \oplus r). \end{aligned}$$

(20)

Let $r=r_1 \oplus \cdots \oplus r_n$ and $\gamma = \gamma _1 \oplus \ldots \oplus \gamma _n$, where $r_1,\ldots , r_n, \gamma _1,\ldots , \gamma _n \in \mathbb {Z}_{2^k}$; then following the reasoning given in Sect. 3.2, we can state

$$\begin{aligned} x + \bigoplus _{i=1}^n r_i = x^{\prime } \oplus \varPhi \left( x^{\prime }, \bigoplus _{i=1}^n \gamma _i \right) \oplus \varPhi \left( x^{\prime }, \bigoplus _{i=1}^n \gamma _i \oplus r_i \right) . \end{aligned}$$

(21)

Given that $\varPhi $ is affine over $\mathbb {F}_2$, we can split the first $\varPhi $ operation giving,

$$\begin{aligned} x + \bigoplus _{i=1}^n r_i= & {} ((n \wedge 1 ) x^{\prime } ) \oplus \left( \bigoplus _{i=1}^n \varPhi (x^{\prime }, \gamma _i) \right) \nonumber \\&\oplus \varPhi \left( x^{\prime }, \bigoplus _{i=1}^n \gamma _i \oplus r_i \right) , \end{aligned}$$

(22)

where $\wedge $ is a logical-AND operation. That is, we require an XOR with $x^{\prime }$ only when n is odd.

To prevent second-order leakage caused by the combination of the input $x^{\prime }$ and the output of (22), we apply an extra Boolean mask, $\alpha \in \mathbb {Z}_{2^k}$, following the reasoning given in Sect. 3, i.e.,

$$\begin{aligned} x&+ \left( \alpha \oplus \bigoplus _{i=1}^n r_i \right) = (( n \wedge 1 ) ( x^{\prime } \oplus \alpha )) \nonumber \\&\oplus \left( \bigoplus _{i=1}^n \varPhi (x^{\prime } \oplus \alpha , \gamma _i) \right) \oplus \varPhi \left( x^{\prime } \oplus \alpha , \alpha \oplus \bigoplus _{i=1}^n \gamma _i \oplus r_i \right) , \end{aligned}$$

(23)

where we compute $\varPhi (x^{\prime } \oplus \alpha , \gamma _i)$, for $i \in \{ 1, \ldots , n \}$, as

$$\begin{aligned} \varPhi ( x^{\prime },\alpha , \gamma _i ) \longmapsto (( x^{\prime } \oplus \gamma _i) \oplus \alpha ) + \gamma _i \end{aligned}$$

to avoid any second-order leakage caused by combining $(x^{\prime } \oplus \alpha )$ with the output of (23).

However, the computation would still cause a higher-order leak, i.e., when $x^{\prime }$, $\alpha $, and (23) get combined. Thus, we are required to add extra masks to prevent this leakage, and we use $\mu _i$ for $i \in \{1, \ldots , n-1 \}$ and also $\xi _i$ for $i \in \{1, \ldots , n-2 \}$, as follows:

$$\begin{aligned}&\left( x + \left( \alpha \oplus \bigoplus _{i=1}^n r_i \right) \right) \oplus \bigoplus _{i=1}^{n-1} \mu _i \nonumber \\&\quad = \bigoplus _{i=1}^{n-2} \xi _i \oplus (( n \wedge 1 ) ( x^{\prime } \oplus \alpha )) \oplus \left( \bigoplus _{i=1}^{n-1} \varPhi (x^{\prime } \oplus \alpha , \gamma _i) \oplus \mu _i \right) \nonumber \\&\quad \quad \oplus \varPhi (x^{\prime } \oplus \alpha , \gamma _n) ) \oplus \varPhi \left( x^{\prime } \oplus \alpha , \alpha \oplus \bigoplus _{i=1}^n \gamma _i \oplus r_i \right) \oplus \bigoplus _{i=1}^{n-2} \xi _i . \end{aligned}$$

(24)

Note that the masks $\xi _i$ are used to protect the intermediate values of (24) as there is not a secure way of ordering these terms without causing leakage. The masks, therefore, need to be interleaved with the computation and removed at the end.

The result is then passed through a function that will perform a Boolean-to-arithmetic mask conversion to replace the Boolean operation with an arithmetic operation, i.e.,

$$\begin{aligned} x +&\left( \alpha \oplus \bigoplus _{i=1}^n r_i \right) + \bigoplus _{i=1}^{n-1} \mu _i. \end{aligned}$$

(25)

Note that the first-order conversion requires sufficient additional masks ($\delta _i$ for $i=1...n-1$) to not cause any leakage, i.e.,

$$\begin{aligned} x +&\left( \alpha \oplus \bigoplus _{i=1}^n r_i \right) + \bigoplus _{i=1}^{n-1} \mu _i = ( (n-1) \wedge 1 )\zeta ) \nonumber \\&\oplus \left( \bigoplus _{i=1}^{n-1} \varPhi (\zeta , \delta _i) \right) \oplus \varPhi \left( \zeta , \bigoplus _{i=1}^{n-1} \delta _i \oplus \mu _i \right) , \end{aligned}$$

(26)

where $\zeta = (x + (\alpha \oplus \bigoplus _{i=1}^n r_i )) \oplus \bigoplus _{i=1}^{n-1} \mu _i$.

Step 2 In the second step, we wish to compute

$$\begin{aligned} \left( \alpha \oplus \bigoplus _{i=1}^n r_i \right) + \bigoplus _{i=1}^{n-1} \mu _i + \bigoplus _{i=1}^{n-1} \kappa _i , \end{aligned}$$

(27)

for some random values $\kappa _i \in \mathbb {Z}_{2^k}$. In which, we view the combination of any elements of $\{ \kappa _1, \ldots , \kappa _{n-1} \}$, $\{ \mu _1 $, $\ldots $, $\mu _{n-1} \}$, and, likewise, the combination of any elements of $(r_1 \oplus \ldots \oplus r_n)$ as secret. Let

$$\begin{aligned} \epsilon = \alpha \oplus \bigoplus _{i=1}^n r_i \oplus \bigoplus _{i=1}^{n-1} \kappa _i , \end{aligned}$$

(28)

then, given (22), we can compute

$$\begin{aligned}&\left( \alpha \oplus \bigoplus _{i=1}^n r_i \right) + \bigoplus _{i=1}^{n-1} \kappa _i = \left( \left( (n-1) \wedge 1 \right) \epsilon \right) \nonumber \\&\quad \oplus \left( \bigoplus _{i=1}^{n-1} \varPhi (\epsilon , \beta _i) \right) \oplus \varPhi \left( \epsilon , \bigoplus _{i=1}^{n-1} \kappa _i \oplus \beta _i \right) , \end{aligned}$$

(29)

where $\beta _i$ are random values taken from $\mathbb {Z}_{2^k}$ for $i \in \{ 1, \ldots , n-1 \}$. We note that the order in which operands are treated is particularly important. For example, the terms of the XOR sums need to be computed separately, i.e., $\bigoplus _{i=1}^n r_i \oplus \bigoplus _{i=1}^{n-1} \kappa _i = \left( \kappa _1 \oplus r_1 \right) \oplus \left( \kappa _2 \oplus r_2 \right) \oplus \cdots \oplus \left( \kappa _{n-1} \oplus r_{n-1} \right) \oplus r_n \,$.

As a next step, we add $n-1$ additional masks, as in Step 1, i.e., we add random values $\mu _i$ for $i \in \{1, \ldots , n-1 \}$ as follows:

$$\begin{aligned}&\left( \alpha \oplus \bigoplus _{i=1}^n r_i + \bigoplus _{i=1}^{n-1} \kappa _i \right) \oplus \bigoplus _{i=1}^{n-1} \mu _i = \left( \left( (n-1) \wedge 1 \right) \epsilon \right) \nonumber \\&\qquad \qquad \oplus \left( \bigoplus _{i=1}^{n-1} \varPhi (\epsilon , \beta _i) \oplus \mu _i \right) \oplus \varPhi \left( \epsilon , \bigoplus _{i=1}^{n-1} \kappa _i \oplus \beta _i \right) . \end{aligned}$$

(30)

Finally, we can perform a first-order secure Boolean-to-arithmetic mask conversion (similar as described in Step 1) to replace the single Boolean operation with an arithmetic operation giving

$$\begin{aligned}&\left( \alpha \oplus \bigoplus _{i=1}^n r_i + \bigoplus _{i=1}^{n-1} \kappa _i \right) + \bigoplus _{i=1}^{n-1} \mu _i, \end{aligned}$$

(31)

which equals to (27).

Step 3 In the third step, we wish to compute $\bigoplus _{i=1}^{n-1} \kappa _i + \sum _{i=1}^n s_i$, for some random values to be used as output masks $s_i \in \mathbb {Z}_{2^k}$, for $i \in \{ 1, \ldots , n \}$. This can be achieved by conducting an $(n-2)$th-order secure Boolean-to-arithmetic mask conversion (e.g., using Algorithm 1 when $n=3$ or Algorithm 2 where $n=4$ etc.) using the input $\bigoplus _{i=1}^{n-1} \kappa _i \oplus \bigoplus _{i=1}^{n-2} \lambda _i$, $\lambda _i$ (for $i \in \{1, \ldots , n-2 \}$) are random values, resulting in $\bigoplus _{i=1}^{n-1} \kappa _i + \sum _{i=1}^{n-2} s_i \,$. Note that we choose the output masks of the lower-order conversion to be used as output masks for the order we are considering.

Then one can add $s_{n-1}$ and $s_n$ to get the desired result, i.e.,

$$\begin{aligned} \bigoplus _{i=1}^{n-1} \kappa _i + \sum _{i=1}^{n} s_i . \end{aligned}$$

(32)

Step 4 We add the output of each step. Adding the output of Step 1 to the output of Step 3 produces

$$\begin{aligned} x + \left( \alpha \oplus \bigoplus _{i=1}^n r_i \right) + \bigoplus _{i=1}^{n-1} \mu _i + \bigoplus _{i=1}^{n-1} \kappa _i + \sum _{i=1}^{n} s_i . \end{aligned}$$

(33)

Then subtracting the output of Step 2 results in

$$\begin{aligned} x + \sum _{i=1}^{n} s_i \,. \end{aligned}$$

(34)

4.1.1 Complexity

Each of the steps described above, without the use of Boolean-to-arithmetic mask conversions of a lower order, will have a linear increase in time complexity with regard to the order of the side-channel resistance, that is, have time complexity $\mathcal {O}(n)$. The recursive call to Boolean-to-arithmetic mask conversions of a lower order will increase the time complexity to $\mathcal {O}( n^2)$.

4.2 Implementation details

Algorithm 3 gives an explicit implementation of a third-order secure Boolean-to-arithmetic mask conversion as an example of the method described above, which requires 74 instructions.

As previously, we proceed with a 3-SNI proof:

Lemma 3

(3-SNI of Algorithm 3) Let $\{ x^{\prime }, r_1, r_2, r_3 \}$ be the input shares of Algorithm 3, and $\{ x^{\prime \prime }, s_1, s_2, s_3 \}$ be the output shares for any set of t intermediate variables and any subset $|k| \le t_k$ of output shares such that $t + t_k \le 3$, there exists a subset I of intermediate variables with $|I| \le 3$, such that the distribution of those t intermediate variables, and the output shares can be perfectly simulated from $\{ x^{\prime }, r_1, r_2, r_3 \}$.

Proof

We construct two sets $I = \{ x^{\prime }, r_1, r_2, r_3 \}$ and $J = \{ \gamma _1, \gamma _2$, $\gamma _3$, $\beta _1$, $\beta _2$, $\delta _1$, $\delta _2$, $\kappa _1$, $\kappa _2$, $\alpha $, $\mu _1$, $\mu _2$, $s_1$, $s_2$, $s_3 \}$ corresponding to the input shares and the random values required, respectively. We denote $a_i$, for $1 \le i \le 74$, as the intermediate values in Algorithm 3, the definition of which means that is easy to see that each $a_i$ can be perfectly simulated from the input shares and/or the required random values. That is, any internal variable within Algorithm 3 can be perfectly simulated from a subset of elements from I and/or J. $\square $

Our proof was validated by implementing a simulator and verifying that the distributions of each $a_i$, for $1 \le i \le 74$ is identical for all values of $x \in \mathbb {Z}_{2^4}$, without loss of generality. Likewise, the simulator also verified the joint distribution of all possible combinations of pairs and triplets of elements in $I \cup J \cup \{a_1,\ldots ,a_{74}\}$ (i.e., the union of the set of inputs, required random values, and intermediate states) are identical for all values of $x \in \mathbb {Z}_{2^4}$, without loss of generality. Thus, demonstrating that Algorithm 3 is resistant to first, second, and third-order side-channel analysis. In this case the choice of the field $\mathbb {Z}_{2^4}$ is more important as a large field size could not be tested in a reasonable amount of time.

The number of inputs required for Algorithm 3 would seem to be too large for an efficient exhaustive search through all the possible sources of third-order leakage. However, we note that in simulating individual operations only a subset of the inputs or random values are required. The effect is similar to the use of gadgets in SNI proofs [4].

More concretely, to conduct a search of this algorithm efficiently the only combinations where at least one element in the elements chosen from the set $I \cup J \cup \{a_1,\ldots ,a_{74}\}$ is dependent on x. That is, any combination where none of the elements were computed from variables dependent on x they can be safely discarded. For elements in $\{a_1,\ldots ,a_{74}\}$ the simplest known expression using elements of $I \cup J$ was taken by either following the sequence of instructions, or from the equations above. These expressions were used to generate a C source file to analyze that combination, which was then complied with the -Ofast flag using gcc. Thus, $1.29 \times 10^5$ C source files were automatically created and compiled. Executing the resulting binaries required 48 CPU cores running for twelve weeks.

We test our algorithm as one block because, to date, breaking up our algorithm into gadgets that can be independently verified and creating an algorithm equivalent to those we propose has a prohibitive cost in performance [8]. That is, Coron’s algorithm has exponential complexity, with regard to the security order, compared to the quadratic complexity of the algorithm presented in this paper.

5 Comparison

Table 1 compares the performance of our method with previous work. We consider the work of Coron et al. [7] who proposed a high-order secure Boolean-to-arithmetic algorithm in 2014. We also discuss the recent results from Coron [8] and compare it with our results. We do not consider LUT-based methods as they would require a precomputation phase and additional memory (see Sect. 2). For this comparison, we estimated the operation count of all methods by considering all necessary operations excluding the generation of random numbers, loop-instruction overheads, and variable initialization.

Table 1 Operation count for different Boolean-to-arithmetic mask conversion methods up to a security order of eight

Full size table

We estimate the costs for Coron et al. [7] higher-order Boolean-to-arithmetic mask conversion method as follows. For a single-masked AND (SecAnd) operation [7, Sect. 3], we estimate the number of required instructions to be

$$\begin{aligned} \begin{aligned} 2 \cdot (n+1)\cdot n+25, \end{aligned} \end{aligned}$$

with n being the security order. Furthermore, we estimate the higher-order secure masked addition function (SecAddGoubin) as defined in [7, Sect. 3.2] to be

$$\begin{aligned} \begin{aligned} (2 \cdot&(n+1)\cdot n+26+n) \\&+(k-1)\cdot [2\cdot (n+1)\cdot n+27]+(2\cdot (n+1)), \end{aligned} \end{aligned}$$

where k represents the bit-width of the operands. The Expand function has an estimated complexity of $2\cdot (n+1)$ and the FullXor function requires $2 \cdot n + n$.

Using these estimations, we calculated the total operation count for a higher-order Boolean-to-arithmetic mask conversion as defined in [7, Sect. 5] for register sizes of 8, 16, 32, and 64 bits and provide the results in Table 1. It shows that our solution is faster than Coron et al. method from [7] for all considered register widths and security orders, often by several orders of magnitude. Compared to [8], who adopts our original idea from [13], our solution is faster for any security order. Specifically, our third-order secure algorithm is more than twice as fast, and the difference increases to more than an order of magnitude by the eight order. Our second-order secure algorithm is at least one order of magnitude faster than previous work.

5.1 Performance details

Table 2 lists the number of required instructions for our proposed algorithms in terms of arithmetic and Boolean operations up to a security order of 8. As a reference, we also list Goubin’s solution in the first-order case, and our solution in the other cases.

Table 2 Number of arithmetic and Boolean operations required for our proposed method up to a security order of eight

Full size table

Table 3 Comparison of required number of random variables

Full size table

In Table 3, we list the number of required random variables to perform a Boolean-to-arithmetic mask conversion. Many mask conversion proposals give an artificially low instruction count hiding a large number of required random numbers. We list the number of random values required to compute Goubin’s original method [12], Coron et al. method [7], our original solution from 2016 [13], and Coron’s recent approach [8]. We compare these numbers with the number of random values required by our proposed algorithm. For the second- and third-order algorithms, we give the number of random values required by our explicit algorithms, where some random values are used to fulfill several purposes.

The results show that our solution requires significantly fewer random variables when compared to all other algorithms in the literature. In particular, when compared to [8], we require almost half the number of random values for the second-order case, and more than half the number random number is all other cases, increasing to an order of magnitude by the eighth order.

Finally, we provide a comparison of all solutions that takes into account the number of randomness required. Table 4 lists the total instruction count for the algorithms. We added the number of instructions listed in Table 1 with the number of instructions required to generate all random numbers that are required for the mask conversion. For this purpose, we consider a practical implementation of a Pseudo Random Number Generator (PRNG), like the Marsaglia’s Xorshift RNG used by Coron [8]. We consider only the XOR instructions, since modern processors are often able to perform shift operations for free. We also did not consider typical overheads that are caused by function calls or loading constants from memory. Therefore we assume that generating a random number takes five instructions in the comparison given in Table 4.

By considering the cost of generating random values, we show that our proposal is about twice as fast as the next best algorithm for lower orders and increases to more than ten times faster for the eighth order case and above.

Table 4 Total operation count including the generation of random numbers using Marsaglia’s Xorshift PRNG [21] from Coron’s the open-source C implementation

Full size table

6 Implementation considerations

All algorithms described in this paper have the property that all calculated intermediates (and relevant higher-order combinations thereof) are statistically independent of the secret value x. In the past, it has been shown that the claimed security order of those algorithms is usually lower when they are directly applied in software or hardware. For example, in a software implementation intermediate values are often unintentionally combined by the underlying hardware architecture. One typical cause of leakage is where intermediate values of the algorithm, which are stored in some registers, get overwritten with other intermediate results of the algorithms. Other sources of leakage include the combination of internal signals that depend on two or more intermediate values which are either stored in registers (register interferences) or currently (or previously) used in operations in the processor’s datapath. Hence, implementations of first-order side-channel resistant algorithms may show first-order leakages in practice, and the same holds true for higher-order secure algorithms whose resistance level has shown to be actually lower than claimed [3].

Direct applications of secure algorithms in hardware require similar care when implemented. Integrated circuits in CMOS, for example, have the property that many gates make output transitions several times per clock cycle. Such transitions (glitches) contain information about the secret value, even though all intermediates have been carefully masked at the algorithm level [20]. State-of-the-art countermeasures try to get rid of those physical effects by applying (additional) countermeasures at the gate level (e.g., using secure logic styles such as dual-rail logic [18]) or algorithm level (e.g., using secret sharing and multi-party computation such as threshold implementations [23]).

Naïve implementation of algorithms that have been proven secure—in the sense that every calculated intermediate is statistically independent of the secret value—can, therefore, not be automatically considered resistant to side-channel analysis. However, the algorithms proposed in this paper can be combined with other countermeasures in order to guarantee resistance at the claimed security order. We do not provide any further details here since the countermeasures required will vary from one platform to another.

7 Conclusions

In this paper, we present Boolean-to-arithmetic mask conversion methods that can be computed in constant time for a masking scheme of second and third order. We present explicit algorithms for a second-order secure mask conversion that requires 31 instructions, and a third-order secure mask conversion that requires 74 instructions. Our second-order secure algorithm is at least one order of magnitude faster than previous work; and our corrected algorithm for third order is more than twice as fast.

Our algorithms are shown to be secure under the SNI model, although we treat our algorithms as a single gadget, rather than break it into smaller gadgets that can be independently verified. Attempts at achieving this have a large performance cost [8], i.e., exponential complexity with regard to the security order, compared to the quadratic complexity of the algorithm presented in this paper. An efficient method for that would allow one to break our algorithms into gadgets is left for future research. Also, given the resources required to validate the SNI proof for Algorithm 3, a similar proof for orders greater than three is also left for future research.

References

Agrawal, D., Archambeault, B., Rao, J.R., Rohatgi, P.: The EM side-channel(s). In: Kaliski Jr., B.S., Koç, Ç.K., Paar, C. (eds.) CHES 2003, Volume 2532 of LNCS, pp. 29–45. Springer, Heidelberg (2003)
Google Scholar
Aumasson, J.-P., Henzen, L., Meier, W., C.-W. Phan, R.: SHA-3 Proposal BLAKE. https://131002.net/blake (2010) . Accessed May 2018
Balasch, J., Gierlichs, B., Grosso, V., Reparaz, O., Standaert, F.-X.: On the cost of lazy engineering for masked software implementations. In: Joye, M., Moradi, A. (eds.) CARDIS 2014, Volume 8968 of LNCS, pp. 64–81. Springer, Heidelberg (2014)
Google Scholar
Barthe, G., Belaïd, S., Dupressoir, F., Fouque, P.-A., Grégoire, B., Strub, P.-Y., Zucchini, R.: Strong non-interference and type-directed higher-order masking. In: Weippl, E.R., Katzenbeisser, S., Kruegel, C., Myers, A.C., Halevi, S. (eds.) ACM Conference on Computer and Communications Security 2016, pp. 116–129. Springer, Heidelberg (2016)
Google Scholar
Bernstein, D. J.: Chacha, a variant of salsa20 (2008)
Coron, J.-S., Großschädl, J., Tibouchi, M., Vadnala, P.K.: Conversion from arithmetic to Boolean masking with logarithmic complexity. In: Leander, G. (ed.) FSE 2015, Volume 8731 of LNCS, pp. 130–149. Springer, Heidelberg (2015)
Google Scholar
Coron, J.-S., Großschädl, J., Vadnala, P.K.: Secure Conversion between Boolean and Arithmetic Masking of Any Order. In: Batina, L., Robshaw, M. (eds.) CHES 2014, Volume 8731 of LNCS, pp. 188–205. Springer, Heidelberg (2014)
Google Scholar
Coron, J.-S.: Higher-order conversion from Boolean to arithmetic masking. IACR Cryptol. ePrint Arch. 2017, 252 (2017)
Google Scholar
Ferguson, N., Lucks, S., Schneier, B., Whiting, D., Bellare, M., Kohno, T., Callas, J., Walker, J.: The skein hash function family. http://www.skein-hash.info (2010). Accessed May 2018
Gandolfi, K., Mourtel, C., Olivier, F.: Electromagnetic analysis: concrete results. In: Koç, C.K., Naccache, D., Paar, C. (eds.) CHES 2001, Volume 2162 of LNCS, pp. 251–261. Springer, Heidelberg (2001)
Google Scholar
Golić, J.D.: Techniques for random masking in hardware. IEEE Trans. Circuits Syst. 54(2), 291–300 (2007)
Article MathSciNet MATH Google Scholar
Goubin, L.: A sound method for switching between Boolean and arithmetic masking. In: Koç, C.K., Naccache, D., Paar, C. (eds.) CHES 2001, Volume 2162 of LNCS, pp. 3–15. Springer, Heidelberg (2001)
Google Scholar
Hutter, M., Tunstall, M.: Constant time higher-order Boolean-to-arithmetic masking. Cryptol. ePrint Arch. Rep. 2016/1023/20161222:183711. https://eprint.iacr.org/2016/1023/20161222:183711 (2016). Accessed May 2018
Ishai, Y., Sahai, A., Wagner, D.: Private circuits: securing hardware against probing attacks. In: Boneh, D. (ed.) CRYPTO 2003, Volume 2729 of LNCS, pp. 463–481. Springer, Heidelberg (2003)
Google Scholar
Karroumi, M., Richard, B., Joye, M.: Addition with blinded operands. In: Prouff, E. (ed.) COSADE 2014, Volume 8622 of LNCS, pp. 41–55. Springer, Heidelberg (2014)
Google Scholar
Kocher, P., Jaffe, J., Jun, B.: Differential power analysis. In: Wiener, M.J. (ed.) CRYPTO ’99, Volume 1666 of LNCS, pp. 388–397. Springer, Heidelberg (1999)
Google Scholar
Lai, X., Massey, J.L.: A proposal for a new block encryption standard. In: Damgård, I. (ed.) Workshop on the Theory and Application of Cryptographic Techniques, Volume 473 of LNCS, pp. 389–404. Springer, Heidelberg (1990)
Google Scholar
Leiserson, A.J., Marson, M.E., Wachs, M.A.: Gate-level masking under a path-based leakage metric. In: Batina, L., Robshaw, M. (eds.) CHES 2014, Volume 8731 of LNCS, pp. 580–597. Springer, Heidelberg (2014)
Google Scholar
Mangard, S., Oswald, E., Popp, T.: Power Analysis Attacks—Revealing the Secrets of Smart Cards. Springer, New York (2007)
MATH Google Scholar
Mangard, S., Popp, T., Gammel, B.M.: Side-channel leakage of masked CMOS gates. In: Menezes, A. (ed.) CT-RSA 2005, Volume 3376 of LNCS, pp. 351–365. Springer, Heidelberg (2005)
Google Scholar
Marsaglia, G.: Xorshift RNGs. J. Stat. Softw. 8(14), 1–6 (2003)
Article Google Scholar
National Institute of Standards and Technology (NIST). FIPS-180-4: Secure Hash Standard. http://csrc.nist.gov/publications/fips/fips180-4 (2015). Accessed May 2018
Nikova, S., Rechberger, C., Rijmen, V.: Threshold implementations against side-channel attacks and glitches. In: Ning, P., Qing, S., Li, N. (eds.) ICICS 2006, Volume 4307 of LNCS, pp. 529–545. Springer, Heidelberg (2006)
Google Scholar
Nikova, S., Rijmen, V., Schläffer, M.: Secure hardware implementation of nonlinear functions in the presence of glitches. J. Cryptol. 24(2), 292–321 (2011)
Article MathSciNet MATH Google Scholar
Quisquater, J.-J., Samyde, D.: Electromagnetic analysis (EMA): measures and counter-measures for smart cards. In: Attali, I., Jensen, T.P. (eds.) E-smart 2001, Volume 2140 of LNCS, pp. 200–210. Springer, Heidelberg (2001)
Google Scholar
Rivain, M., Dottax, E., Prouff, E.: Block ciphers implementations provably secure against second order side channel analysis. In: Nyberg, K. (ed.) FSE 2008, Volume 5086 of LNCS, pp. 127–143. Springer, Heidelberg (2008)
Google Scholar
Rivest, R.L., Robshaw, M.J.B., Sidney, R., Yin, Y.L.: The RC6 Block Cipher. http://people.csail.mit.edu/rivest/pubs/RRSY98.pdf (1998). Accessed May 2018
Schneider, T., Moradi, A., Güneysu, T.: Arithmetic addition over Boolean masking-towards first- and second-order resistance in hardware. In: Malkin, T., Kolesnikov, V., Lewko, A.B., Polychronakis, M. (eds.) ACNS 2015, Volume 9092 of LNCS, pp. 559–578. Springer, Heidelberg (2015)
Google Scholar
Trichina, E.: Combinational logic design for AES subbyte transformation on masked data. IACR Cryptol. ePrint Arch. 2003, 236 (2003)
Google Scholar
Vadnala, P.K., Großschädl, J.: Algorithms for switching between boolean and arithmetic masking of second order. In: Gierlichs, B., Guilley, S., Mukhopadhyay, D. (eds.) SPACE 2013, Volume 8204 of LNCS, pp. 95–110. Springer, Heidelberg (2013)
Google Scholar
Vadnala, P.K., Großschädl, J.: Faster mask conversion with lookup tables. In: Mangard, S., Poschmann, A.Y. (eds.) COSADE 2015, Volume 9064 of LNCS, pp. 207–221. Springer, Heidelberg (2015)
Google Scholar

Download references

Author information

Authors and Affiliations

Cryptography Research, 425 Market Street, 11th Floor, San Francisco, CA, 94105, USA
Michael Hutter & Michael Tunstall

Authors

Michael Hutter
View author publications
You can also search for this author in PubMed Google Scholar
Michael Tunstall
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Michael Tunstall.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hutter, M., Tunstall, M. Constant-time higher-order Boolean-to-arithmetic masking. J Cryptogr Eng 9, 173–184 (2019). https://doi.org/10.1007/s13389-018-0191-z

Download citation

Received: 02 August 2017
Accepted: 08 May 2018
Published: 23 May 2018
Issue Date: 01 June 2019
DOI: https://doi.org/10.1007/s13389-018-0191-z

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Constant-time higher-order Boolean-to-arithmetic masking

Abstract

Similar content being viewed by others

High-Order Conversion from Boolean to Arithmetic Masking

Secure Conversion between Boolean and Arithmetic Masking of Any Order

Conversion from Arithmetic to Boolean Masking with Logarithmic Complexity

1 Introduction

2 Boolean-to-arithmetic masking

Definition 1

Definition 2

2.1 Goubin’s method

Lemma 1

Proof

2.2 Recursive methods

2.3 Higher-order Boolean-to-arithmetic masking

3 Constant-time second-order Boolean-to-arithmetic mask conversion

3.1 Definitions

3.2 The Algorithm

3.3 Implementation details

Lemma 2

Proof

Remark 1

4 Higher-order Boolean-to-arithmetic masking

4.1 The Algorithm

4.1.1 Complexity

4.2 Implementation details

Lemma 3

Proof

5 Comparison

5.1 Performance details

6 Implementation considerations

7 Conclusions

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation