Keywords

1 Introduction

The mass adoption of connected devices, like smartphones, tablets or smart-watches, implied a deep change in the industry. From basic cellular phones, mobile devices evolved into indispensable microcomputers of everyday life. Our smartphone collects our information, verifies our identity, secures our credit card transactions, replaces our car keys, enables us to watch movies and series, and can perform many other “useful” operations.

It is thus mandatory that such smart objects provide users with enough security for the collected data. This turns out to be the role of trusted execution environments (TEE) that solved the security problem by delegating the sensitive tasks to an embedded secure element (eSE). However, smartphones are far from being standardized in all their aspects. For example the same OS version can run on multiple platforms. These platforms may have different flavors of eSEs or none at all. Thus, applications must adapt the security to the device at hand. Far from being an easy task, providers must even protect the device from the legitimate user itself, which in some cases may behave wickedly.

The white-box model takes on its full meaning in such an environment, where the cryptographic implementation is exposed. The white-box model can be seen as the opposite of the black box model, where the attacker has only access to the inputs and the outputs. Indeed, the white-box model assumes that the cryptographic primitive runs in an untrusted environment. In such a scenario the attacker has full access to the device: he can access the binary code of the application and observe or interact with the device’s execution, in order for example to extract the encryption key from the implementation. It is thus very difficult to provide efficient security solutions against such threats. Nonetheless, due to the mass adoption of smart-devices, solutions are required by the industry to mitigate the problem.

Besides, the market needs white-box solutions for DRMs, Pay-TV, secure storage, etc., motivating researchers all over the world to keep working on it. The competitive spirit of researchers, together with the market pressure, also motivated the creation of international challenges, like the WhiBox contest [1, 2], where users are invited to submit white-box AES implementations, and each participant can try to break others’ submissions. In these challenges, a white-box is considered unbroken until its key is found or its functionality reversed. Points are assigned to the participants who implement the longer lasting unbroken white-box and to those breaking the strongest ones. For example, an interesting result of the 2019 edition of the WhiBox competition [2] is that the best white-boxes withstood attacks for more than one month. Thus, despite theoretically flawed, the security offered by actual implementations can be sufficient for content which value is limited to a short-term period (like for example a live football world cup).

One of the many techniques used in the contest of white-box is the use of encodings [20], affine or non-linear functions applied to the input/output of tables, and unknown by the attacker. These random encodings provide a map from the clear world and the encoded world, and allow randomization of key-dependent data. Such countermeasures are however expensive to deploy. The designer often needs to find a tradeoff between security and complexity (as a combination of memory requirements and running time).

The purpose of our work is thus to suggest an encoding principle that can help to reduce such complexity rise, while maintaining the security at the state-of-the-art. In particular, we suggest a novel way to create encodings that are both cheap and safe. We show how to use a degraded semi-homomorphic encryption scheme, based on the work of Benaloh [6] to build non-linear encodings that provide better security against algebraic attacks as well as security against differential computation analysis (DCA). Our encodings allow dropping the use of tables for the most used operations in crytographic algorithms. Also, our proposition provides better performances for most operations by exchanging lookup table accesses with CPU operations. We provide both a security evaluation of our proposition together with an application to an AES-128 white-box. Based on our suggested AES-128 white-box, we provide memory and performances comparisons against state-of-the-art implementations.

This paper is organized as follows. Section 2 is dedicated to the state-of-the-art in white-box designs and attacks. Afterwards, Sect. 3 introduces our new encoding scheme. In Sect. 4 we provide a security analysis of our suggested white-box encoding. The complexity in terms of memory and time of our new design is evaluated in Sect. 5. Finally some suggestions for further developments of our idea are provided in Sect. 6. Section 7 concludes this work.

2 State-of-the-Art

In 2002, two seminal papers from Chow et al. [19, 20] introduced a new way to implement cryptographic algorithms that provided some risk mitigation. Such solutions to resist white-box attacks are known as “white-box cryptography” (WBC). They paved the way for a new and very active research field for both theoreticians and practitioners.

2.1 White-Box Designs

The work of Chow et al. [19, 20] exposed the first white-box descriptions for both DES and AES ciphers. Their design has been the white-box implementation reference since then. Chow et al. introduced the use of tables to perform computations. Also, similar to the randomization technique of Kilian [31], by encoding each table with input/output functions unknown to the attacker, Chow et al. provided an initial solution to the problem of obfuscating a program.

After the first white-box propositions were published, Link et al. suggested an improved version of the DES white-box in [36], to better resist attacks. However, several attacks that allowed to recover the key hidden in the DES white-box were proposed by Wyseur et al. [47] and by Goubin et al. [27]. In parallel, the AES white-box was broken using an algebraic attack by Billet et al. [7], which was further refined and generalized by Michiels et al. [39]. In order to thwart such attacks, other authors proposed further white-box designs of the AES-128, for example Bringer et al. in [16] suggested an approach based on polynomials, but their suggestion was broken by De Mulder et al. in [23]. In 2009, Xiao and Lai proposed an improvement of Chow et al. AES white-box in [48], but again, their proposal was broken by De Mulder et al. [22]. In 2010, Karroumi suggested in [30] to use dual ciphers to protect the AES white-box. Unfortunately such a design was also broken by Lepoint et al. [35].

Generally speaking, all propositions of AES and DES white-box designs have been shown to be theoretically broken in the sense that the embedded key can be extracted. For each proposition a theoretical attack was found, see for example [7, 35, 39].

In a parallel thread of work, researchers tried to clarify the security notions related to the white-box context, thus works like those of Delerablée et al. [24], Saxena et al. [44] and Bock et al. [12] appeared.

A further step in the understanding of the security of white box implementations was brought forward by the introduction, in 2015 and 2016, of two attacks borrowed from the field of physical security. These attacks were used to break AES white-box suggestions appeared meanwhile, like the work of Luo et al. [37] and Lee et al. [32]. Such attacks were fault injection attacks, presented by Sanfelix et al. [42] and side-channel analysis (a.k.a. differential computation analysis, DCA for short), by Bos et al. [14]. Eventually such attacks provided easier methods to break all previous contributions, and researchers started to shift their interest from the algebraic security, to such a physical security dimension.

Indeed, several practical attacks have further reduced the security margin provided by a white-box implementation [14, 42]. However, all is not lost, as such advanced attacks motivated the study of advanced countermeasures. During the WhiBox contest editions, for example, a few implementations stood more than one month, while the hacking community tried to break them. From such implementations stemmed new understanding and improved countermeasures (see for example [9, 13, 28, 41, 43]), that allowed to thwart, or mitigate, the attacks explained so far.

2.2 White-Box Encoding

In order to counteract such new attacks, designers suggested to adapt known embedded security countermeasures, like the masking countermeasure [29, 40]. For example the work of Lee et al. [34] suggested a masked AES white-box implementation. Although providing an undeniable improvement on the security of white-box instances, such countermeasures deteriorate white-box implementations in terms of memory and performances.

Encodings are one of the key concepts introduced by Chow et al. Despite their use in [20] to counteract algebraic attacks authors worked in the recent publications to improve the effectiveness of encodings and to provide a masking stage to the algorithm. The two notions of encodings and masking are sometimes overlapping, and an encoding scheme may act as a masking scheme, and vice-versa. Loosely speaking, masking is a technique that removes the correlation between a value and its representation by, for example, using Shamir’s secret sharing [46], while encoding is the application of (secret) input and output bijections to a transformation. Thus a secret sharing scheme can be seen as the application of the XOR bijection with a mask to the identity transformation, while an encoding can be interpreted as the application of some (non) linear masking scheme like for example [17] to a secret value.

It has been shown by various authors [9, 13, 28, 41, 43] how an accurate choice of encoding is paramount to the security of the white-box. In particular it seems that the best approach, as suggested by recent works [9, 28, 33, 45] is to use a linear masking on top of a non linear one. Such countermeasures are however expensive to deploy. The designer often needs to find a tradeoff between security and complexity (as a combination of memory requirements and running time). As an example, the winner of the 2019 edition of the WhiBox competition [2] used such an encoding (as revealed by the reverse engineering attack by Goubin et al. [28]) and the smallest implementation was about 20 MB for an AES-128 encryption.

In the following section we suggest a new encoding scheme that provides an improved security, fast operations, and a reduced memory footprint.

3 New Encoding Design for White-Box Constructions

Our proposal is directly inspired by the Benaloh cryptosystem, suggested in [6]. The original scheme is partially homomorphic, meaning that it allows to perform only one type of operation on plaintexts in the cipher domain. We modified the original Benaloh cryptosystem while preserving the semi-homomorphic properties in order to provide a new encoding scheme. Below, we recall the mathematical background that is used in the rest of the paper. Afterwards, we provide a brief explanation of the Benaloh cryptosystem together with our suggested modifications to use it as a white-box encoding.

3.1 Preliminaries

In the rest of this paper we will use notions such as quadratic residue, or higher residue. Such notions are detailed below.

Definition 1

(Quadratic Residue). Let \(m \in \mathbb {Z}_{n}^*\) for an odd integer n. Then m is a quadratic residue modulo n if there exists \(x \in \mathbb {Z}_{n}^*\) such that:

$$\begin{aligned} x^2 \equiv m \bmod n. \end{aligned}$$

If no such x exists, then m is a quadratic non-residue modulo n.

Definition 2

(Legendre symbol). Given a prime p and \(m \in \mathbb {Z}_{p}^*\), the Legendre symbol of m modulo p is denoted \(\left( \frac{m}{p}\right) \) and defined as follows:

$$ \left( \frac{m}{p}\right) = \left\{ \begin{array}{ll} 0 &{} {if m \equiv 0 \mod p}, \\ 1 &{} if \ m \ is \ a quadratic residue modulo \ p, \\ -1 &{} \text {otherwise.} \end{array} \right. $$

Definition 3

(Quadratic Residuosity Problem). The quadratic residuosity problem (QRP) is the following: given an odd composite integer n and \(m \in \mathbb {Z}_{n}^*\), decide whether m is a quadratic residue or a quadratic non-residue modulo n.

Remark 1

If n is prime, then the QRP can easily be solved by Euler’s criterion: for any \(m \in \mathbb {Z}_{n}^*\), \(\left( \frac{m}{n}\right) \equiv m^{(n-1)/2} \mod n\) (see for instance [38]).

Definition 4

(Higher Residue). Let \(m \in \mathbb {Z}_{n}^*\) for an odd integer n, m is said to be a d-residue modulo n if there exists \(x \in \mathbb {Z}_{n}^*\) such that

$$\begin{aligned} x^d \equiv m \bmod n. \end{aligned}$$

If no such x exists, then m is said to be a d-non-residue modulo n.

Definition 5

(Higher Residuosity Problem). The higher residuosity problem (HRP) is the following: given an odd composite integer n and \(m \in \mathbb {Z}_{n}^*\), decide whether m is a d-residue or a d-non-residue modulo n.

Remark 2

If the factorization of n is known, then the HRP can easily be solved (see [38]).

3.2 Original Description of Benaloh Cryptosystem

The Benaloh cryptosystem, introduced by Benaloh in 1994 [6] and improved by Fousse et al. [25], is an extension of the Goldwasser-Micali cryptosystem (GM) [26]. The latter’s security relies on the QRP, while the former’s on the HRP. Where the GM cryptosystem encrypts bits individually, Benaloh’s improvement allows blocks of bits to be encrypted at once. Both schemes are probabilistic cryptosystems in the sense that several encryptions of the same message under the same key yield different ciphertexts. In this section, we describe the original Benaloh’s cryptosystem.

Key Generation. The public and private key are generated as follows.

  • Choose a block size r and two large prime numbers p and q such that:

    • \(r | (p-1)\),

    • \(\gcd (r, (p-1)/r) = 1\),

    • \(\gcd (r, q-1) = 1\).

  • Set \(n=p \times q\) and compute \(\phi (n) = (p-1)(q-1)\).

  • Select \(y \in \mathbb {Z}_{n}^*\) such that, for any prime factor \(r_i\) of r:

    • \(y^{\phi (n)/r_i} \not \equiv 1 \bmod n\).

The public key is (nry), and the private key is (pq).

Encryption. Given the public parameters (nry) and a an element of \(\mathbb {Z}_{r}\), the encryption \(E_r\) is defined as:

$$\begin{aligned} E_r(a) = y^au^r \bmod n, \end{aligned}$$

where u is a random number in \(\mathbb {Z}_{n}^*\).

Decryption. Given decryption key (pq), and ciphertext c, the decryption \(D_r\) is defined as:

$$\begin{aligned} D_r(c) = \log _x( c^{\phi (n)/r}) \bmod n, \end{aligned}$$

where \(x = y^{\phi (n)/r} \bmod n\).

The homomorphic property is easily verified:

$$ \begin{array}{r c l} E_r(a) \times E_r(b) &{} \equiv &{} y^{a}u_0^r \times y^{b}u_1^r \mod n \\ &{} \equiv &{} y^{a + b}(u_0u_1)^r \mod n \\ &{} \equiv &{} E_r(a+b) \mod n. \end{array} $$

Our work aims at using this cryptosystem as an encoding. This allows to homomorphically perform some operations on the encoded values and thus reduce the overall memory cost of the white-box. In the following, we propose some modifications to achieve our goal.

3.3 Modified Benaloh Cryptosystem

We describe and motivate in this section our adaptations of the Benaloh cryptosystem to make it suitable for using as a white-box encoding. We deal with the encoding itself in Sect. 3.4.

Key Generation. The public and private key are generated as follows.

  • Choose a prime number p.

  • Choose a block size \(r = 2^k\) such that \(k \ge 2\) and r is the highest power of 2 which divides \(p-1\).

  • Select randomly y a generator of \(\mathbb {Z}_{p}^*\).

  • Select randomly \(t \in \mathbb {Z}_{p}^*\).

The public key is (pr) and the private key is (ty).

Compared to the original key generation, the private key t is introduced, u is fixed to 1 and the modulus is a prime number instead of a composite of two prime numbers. For the sake of simplicity, we keep the expression “private key” despite the fact that we use our private key both for encryption and decryption.

Encryption. Given the private key (ty) and the public parameter p, the encryption \(E_{t}\) is defined as:

$$\begin{aligned} E_{t}(m) = t\,y^m \bmod p, \end{aligned}$$
(1)

where m is an element of \(\mathbb {Z}_{p}\).

It is easily verified that:

$$\begin{aligned} E_{t}(m_0) \times E_{t}(m_1) \equiv E_{t^2}(m_0+m_1) \mod p. \end{aligned}$$

Compared to the original encryption, the definition set of m is extended to the entire group \(\mathbb {Z}_{p}\).

Decryption. Given the decryption key (ytpr) and the ciphertext c, the decryption function \(D_{t}\) is defined as:

$$\begin{aligned} D_{t}(c) = \log _x((t^{-1}c)^{(p-1)/r}) \bmod p, \end{aligned}$$
(2)

where \(x = y^{(p-1)/r} \bmod p\).

The decryption differs from the original one only by the multiplication by \(t^{-1} \bmod p\).

Motivations

About the Modulus. The modulus has been chosen to be a prime number. Since our proposal will rely on the QRP and will use a small modulus, there is no security benefit in choosing a composite modulus as the QRP is easy to solve even for small composite moduli.

About the Block Size. The block size r has been chosen to be a power of 2 in order to ensure that the least significant bit (LSB) of \(D_{t}(E_t(a + b))\) equals the exclusive-or between the LSB of a and the LSB of b. Our proposal is based on this property, with the condition \(r = 2^k\) and \(k \ge 2\).

About the Base. The base y has been chosen to be a generator of \(\mathbb {Z}_{p}^*\) as it guarantees that \(y^{(p - 1)/r} \not \equiv 1 \bmod p\). It is a requirement from the original key generation algorithm. The secrecy of y is a consequence of Sect. 4.1.

About the Key. In our proposal, the senstive data bits will be carried by the LSB of the exponent of y. The private key t has been introduced to hide this bit. If t was not present, with only the three previous modifications to the original Benaloh scheme, the sensitive bit m could be guessed by using the Legendre symbol of \(E_1(m)\), which equals 1–2 m. The multiplication of \(E_1(m)\) by a uniformly random number t makes its Legendre symbol equal to \(\left( \frac{t}{p}\right) (1 - 2m)\), which is equal to 1 or \(-1\) with the same probability 1/2.

3.4 Modified Benaloh Cryptosystem as White-Box Encoding

In this section we show how our modified Benaloh cryptosystem can be used as a white-box encoding. Our proposal relies on three well-known facts:

  1. 1.

    Any Boolean function can be expressed as a logical circuit composed of \(\mathtt {XOR}\), \(\mathtt {AND}\) and \(\mathtt {NOT}\) gates.

  2. 2.

    The sum in \(\mathbb {Z}_{}\) of two bits a and b is \(a + b = (a\, \wedge \, b)\ ||\ (a\, \oplus \, b)\), where || denotes the concatenation operator.

  3. 3.

    The sum in \(\mathbb {Z}_{}\) of one bit a with 1 is \(a + 1 = a \ ||\ (a\, \oplus \, 1) = a \ ||\ \bar{a}\).

We thus propose to consider the cryptographic algorithm as a logical circuit which gates are modified-Benaloh encoded. We describe hereafter how to encode (resp. decode) the circuit’s input (resp. output) and how to evaluate its gates.

Encoding and Decoding Functions 

Encoding Step. To encode a bit a, the modified key generation method (see Sect. 3.3) is run and the modified encrypting function (cf. Eq. (1)) is applied to \(2s + a\), where \(0 \le s < (p-1)/2\) is uniformly drawn at random. We will denote the encoding function by \(\mathsf {Enc}\):

$$ \mathsf {Enc}(a) = E_t(2s + a). $$

Here, s is introduced to make our encoding probabilistic, as does the random u in the original Benaloh scheme. Introducing this s instead of keeping the original u, we save one entropy bit: indeed, if the bit x was encoded as \(E_t(x) = t y^x u^r \bmod p\), then \(E_t(x) = t y^{vr + x} \bmod p\) with \(v = \log _y u\), and since \(r \ge 4\) and \(r\,|\,(p - 1)\), \((vr + x \bmod p - 1) \equiv x \bmod 4\). In other words, the second LSB of the exponent of y would always be 0. On the other hand, having both u and s does not provide more entropy to \(\mathsf {Enc}(x)\). Therefore, we discard u but introduce s to keep the probabilistic property of the Benaloh scheme.

Decoding Step. To decode a value \(\mathsf {Enc}(a)\), the modified decryption function (cf. Eq. (2)) is applied and the result is reduced modulo 2 to get a single bit. We will denote the decoding function by \(\mathsf {Dec}\):

$$ \mathsf {Dec}(\mathsf {Enc}(a)) = D_t(\mathsf {Enc}(a)) \bmod 2. $$

Evaluating Logical Gates 

As recalled before, a circuit can be constructed using only \(\mathtt {XOR}\), \(\mathtt {AND}\) and \(\mathtt {NOT}\) gates. We describe hereafter how each gate can be evaluated under the modified Benaloh-encoding.

From now, we consider that the encodings of the bits a and b are:

$$ \begin{aligned} \mathsf {Enc}(a)&= E_{t_0}(2s + a) = t_0\,y^{2s + a} \bmod p \\ \mathsf {Enc}(b)&= E_{t_1}(2s' + b) = t_1\,y^{2s' + b} \bmod p\text {.} \end{aligned} $$

\(\mathtt {XOR}\) Implementation. To compute an encoding of \(a \oplus b\), it is sufficient to multiply the encoding of the two bits:

$$\begin{aligned} \mathsf {Enc}(a)\mathsf {Enc}(b)&\equiv t_0t_1\,y^{2(s + s') + a + b} \bmod p \nonumber \\&\equiv t_0t_1\,y^{2(s + s' + ab) + (a\ \oplus \ b)} \bmod p\text {.} \end{aligned}$$
(3)

Let us verify that indeed \(\mathsf {Dec}(\mathsf {Enc}(a)\mathsf {Enc}(b)) = a \oplus b\). Let be \(\alpha \equiv 2(s + s' + ab) + (a \oplus b) \bmod p-1\). Since \(p - 1\) is even, \(\alpha \bmod 2 = a \oplus b\). Setting \(t = t_0t_1\), the decryption function \(D_t\) returns \(\alpha \) reduced modulo \(r = 2^k\), thus this step preserves the k least significant bits of \(\alpha \). Therefore, the result of the decoding function is \(a \oplus b\).

It follows that an arbitrary number of \(\mathtt {XOR}\) gates can be evaluated in a row without caring for carries, that is to say, if \(\{a_1,\dots ,a_n\}\) is a set of bits:

$$ \mathsf {Dec}\left( \prod _{i=1}^{n}\mathsf {Enc}(a_i)\right) = a_1 \oplus \dots \oplus a_n. $$

It is worth noticing that only modular multiplications, hence only CPU operations, are needed to evaluate \(\mathtt {XOR}\) gates. Thus, the evaluation of any linear function comes at no memory cost by using our modified-Benaloh encodings.

\(\mathtt {AND}\) Implementation. Contrary to the \(\mathtt {XOR}\) operation, we use tables to implement the \(\mathtt {AND}\) operation. A naive solution is to use one table with two operands as inputs. This table is used to decode each operand, evaluate the \(\mathtt {AND}\) gate and re-encode the result. Each table admits \(p^2\) entries and returns a \(\log _2 p\)-bit value, which implies a memory consumption of \(p^2 \lceil \log _2 p\rceil \) bits per table.

We present instead another solution based on right-shift tables admitting only p entries, thus reducing the memory consumption per table down to \(p \lceil \log _2 p\rceil \) bits. This solution is based on the fact that when \(\mathsf {Enc}(a)\) and \(\mathsf {Enc}(b)\) are multiplied together, y is raised to the power \(2(s + s' + ab) + (a \oplus b)\) (see Eq. (3)). Then, right-shifting it results in \(s + s' + ab\). Similarly, right-shifting the exponent of y in \(\mathsf {Enc}(a)\) (resp. \(\mathsf {Enc}(b)\)) gives s (resp. \(s'\)). By adding these three numbers, we have \(2(s + s') + ab\). Thus, we obtain with our solution the encoding of the desired bit ab as \(\mathsf {Enc}(ab) = E_t(2s'' + ab)\) for some t and \(s'' = s + s'\).

The first step is thus to right-shift the power of y in the expression of \(\mathsf {Enc}(a)\mathsf {Enc}(b)\), \(\mathsf {Enc}(a)\) and \(\mathsf {Enc}(b)\). The fact that \(r = 2^k\) impedes these three right shifts to be performed in a homomorphic way by successive left shifts. Thus, they have to be tabulated. Each table decrypts its entry (instead of decoding it, otherwise the random s’s would be lost), then right-shifts the result and finally encrypts the shifted value to which is added a random even number (see Fig. 1).

Fig. 1.
figure 1

Input and output of the three right-shift tables.

The output of the three tables are multiplied together to get, according to the notations in Fig. 1, \(\mathsf {Enc}(ab) = ty^{2(S + s + s') + ab} \bmod p\) with \(t = t_2t_4t_6\) and \(S = s_2 + s_4 + s_6\).

\(\mathtt {NOT}\) Implementation. To get an encoding of \(\bar{a} = a \oplus 1\), one must multiply \(\mathsf {Enc}(a)\) by any non-quadratic residue modulo p: let v be one of them, then there exists an integer \(\alpha \) such that \(v \equiv y^{2\alpha + 1} \bmod p\). Thus \(v\mathsf {Enc}(a) \equiv t_0y^{2(s + \alpha ) + a + 1} \equiv t_0y^{2(s + \alpha + a) + \bar{a}} \bmod p\).

Another solution for the \(\mathtt {NOT}\) implementation can be used: it is sufficient to integrate the \(\mathtt {NOT}\) operation to the next table. For instance, the next table can be an \(\mathtt {AND}\) table. The evaluation of the \(\mathtt {NOT}\) gate can thus be delegated to the next table due to the linear property of the \(\mathtt {XOR}\) gate.

From now, sequences of binary operations composed with \(\mathtt {AND}\)s, \(\mathtt {XOR}\)s and \(\mathtt {NOT}\)s can be computed under our modified-Benaloh encoding.

3.5 Using the Modified-Benaloh Encoding in a White-Box Design

Hereafter, we give a way to design a white-boxed cipher by using the modified-Benaloh encoding. As an example, we give in Sect. 5 an AES implementation using our proposal.

Overall Parameters. The modified-Benaloh key generation algorithm (see Sect. 3.3) is run to get the parameters p, r and y. They have to be common to all encoded bits within the white-box, in order for the logical gates to be correctly evaluated as described in Sect. 3.4. These parameters might be refreshed at any moment, at the cost of an extra decoding-then-encoding step on each encoded bit to switch from the former set of parameters to the new one.

Encoding Data Bits 

Encoding the Key. The \(\ell \)-bit cipher key is embedded within the white-box in a modified-Benaloh encoded form. The white-box embeds a table with \(\ell \) entries, each one being of the form \(t_j\,y^{2s_j + k_j} \bmod p\), where \(1 \le j \le \ell \) and \(k_j\) is the jth key bit.

Encoding the Plaintext. The white-box turns each plaintext bit to a modified-Benaloh encoding using a 1-bit input table. Each bit of the plaintext has its own encoding table. Such tables are of the form:

$$ T_i = \{\mathsf {Enc}(0), \mathsf {Enc}(1)\} = \{t_i\,y^{2s_i} \bmod p,\ t_i\,y^{2s_i' + 1} \bmod p\}. $$

Note that \(t_i\) here is fully independent from the \(t_j\)’s that protect the cipher key.

Let us detail here the use of the private key \(t_i\). We can remark that it must be the same for the two possible encodings of the same bit: suppose that we could have \(\mathsf {Enc}(0) = E_t(2s)\) and \(\mathsf {Enc}(1) = E_{t'}(2s' + 1)\) with \(t \ne t'\) and \((t,t')\) being different for each bit to encode. Let us consider a table that, within the circuit, decrypts its entry. As an example, it could be a table evaluating an \(\mathtt {AND}\) gate. For this decryption step, the private key of the table entry, or a product of several of them, if the entry is a product of several encodings, has to be known. Then, when the entry of such a table would be a combination of n encoded input bits, \(2^n\) tables (one per possible n-tuple of encoded bits) would be necessary to perform the operation delegated to the table. Therefore, the private key \(t_i\) has to be fixed per encoded bit and not per encoding.

Cipher Evaluation. Once each bit of the plaintext is encoded, the cipher, designed as a logical circuit, can be evaluated. One has just to perform binary operations on the encoded bits as explained in Sect. 3.4.

Decoding Data Bits. The ciphertext’s bits are decoded from their modified-Benaloh encoded form by a table that applies the decoding function on its entries. Note that one table per ciphertext bit is necessary, as each decoding table uses a different private key from the other decoding tables.

4 Security Considerations

With our modified-Benaloh scheme, we proposed to encode each bit b as:

$$\begin{aligned} \mathsf {Enc}(b) = t\,y^{2s+b} \mod p \end{aligned}$$

for some random numbers t and s. All logical operations between encoded bits are done in the encoding domain with modular multiplications and table accesses.

In this section we provide a security analysis of our proposal, by addressing some potential flaws.

4.1 About the Shift Tables

In this section, we study two attack paths opened by the shift tables used to evaluate the \(\mathtt {AND}\) gates.

The input-output-squared attack. Let \(n_{i,b} = t_i\, y^{2s_b + b} \bmod p\) be the Benaloh-encoded input of a shift table and \(n_{o,b} = t_o\, y^{2s + s_b} \bmod p\) the corresponding output. For the sake of clarity, we set here \(s = 0\) as it will not be useful in this section. Then \(n_{i,b}\, n_{o,b}^2 \equiv t_i\, t_o^2\, y^{4s_b + b} \bmod p\).

The attacker can collect \(n_{i,0}, n_{o,0}, n_{i,1}\) and \(n_{o,1}\) and compute:

$$ \begin{aligned} z&= n_{i,0}\, n_{i,1}\, n_{o,0}^2\, n_{o,1}^2 \bmod p \\&= t_i^2\, t_o^4\, y^{4(s_0+s_1) + 1} \bmod p. \end{aligned} $$

Since y is a generator of \(\mathbb {Z}_{p}^*\), there exists \(\tau _i, \tau _o\in \{0, \dots , p - 1\}\) such that \(t_i = y^{\tau _i} \bmod p\) and \(t_o = y^{\tau _o} \bmod p\). This implies that \(z = y^{4(\tau _o + s_0 + s_1) + 2\tau _i + 1} \bmod p\).

If the attacker can guess the least significant bit (LSB) of \(\tau _i\) from z, then he can compute \(\left( \frac{t_i}{p}\right) = \tau _i \bmod 2\) and therefore b. However, whereas the LSB of \(\log _y z\) is actually independent from the chosen generator y, it is not the case of its second LSB: let \(y'\) be another generator of \(\mathbb {Z}_{p}^*\) and a be the integer such that \(y = y'^{2a + 1} \bmod p\); then \(\log _{y'}(z) = \log _{y'}(y)\log _y(z) \equiv 2(a + \tau _i) + 1 \bmod 4\). Since a can be odd or evenFootnote 1, the knowledge of z does not give any information on \(\tau _i \bmod 2\).

The Frequency Attack. Suppose that we want to \(\mathtt {AND}\) two bits \(a_0\) and \(a_1\). Let \(s_0^{(0)}, s_0^{(1)}, s_1^{(0)}\) and \(s_1^{(1)}\) be four integers modulo \((p - 1)/2\) and let \(\mathsf {Enc}(a_0) = t_0\,y^{2s_0^{(a_0)} + a_0} \bmod p\) and \(\mathsf {Enc}(a_1) = t_1\,y^{2s_1^{(a_1)} + a_1} \bmod p\). The table fed by \(\mathsf {Enc}(a_0)\times \mathsf {Enc}(a_1)\) during the evaluation of the \(\mathtt {AND}\) gate (see Fig. 1) returns some \(\alpha = t\,y^{2\delta + \beta } \bmod p\) where \(\beta \) is a bit that equals \(a_0a_1 \oplus ((s_0^{(a_0)} + s_1^{(a_1)}) \bmod 2)\). Since \(\left( \frac{\alpha }{p}\right) = \left( \frac{t}{p}\right) (1 - 2\beta )\), \(\left( \frac{\alpha }{p}\right) = \left( \frac{t}{p}\right) \) with probability 3/4 if \(s_0^{(0)} \equiv s_0^{(1)} \bmod 2\) and \(s_1^{(0)} \equiv s_1^{(1)} \bmod 2\), which leaks the value of \(a_0a_1\). Therefore, the LSB of the random \(s_0^{(0)}, s_0^{(1)}, s_1^{(0)}\) and \(s_1^{(1)}\) should be adjusted to avoid this, for instance by imposing \(s_i^{(0)} \bmod 2 = s_i^{(1)} + 1 \bmod 2\) for \(i = 0, 1\). In the case where such a solution would not be tractable, we suggest to implement a Boolean-masked \(\mathtt {AND}\) to decorrelate the content of the shift tables from the bits to \(\mathtt {AND}\) together.

4.2 About the Key

We recall that in our proposal in Sect. 3.3, the cipher key is embedded in a Benaloh-encoded form within the white-box. Encoded this way, no information can be extracted on the key just looking at its encoded form: since each key bit b has its own random private subkey t, \(E_{t}(b)\) is indistinguishable from a random number in \(\mathbb {Z}_{p}\) and so t acts like a one-time pad.

On the other hand, some information about the key may be revealed by the following DCA-like attack. Seeing the cipher as a logical circuit, any gate output is Benaloh-encoded with a random t unknown from the attacker but fixed over all executions of the white-box. Then any variation of the Legendre symbol of the gate output is only due to a variation of the encoded bit.

Therefore, the attacker can focus on an \(\mathtt {AND}\) gate which inputs depend on a few key bits. By making an assumption on these key bits, the attacker can compute an expected sequence of outputs of his targeted \(\mathtt {AND}\) gate when the input plaintexts vary. Then, by comparing this sequence to the Legendre symbols actually output by the white-box, the attacker can accept or reject his hypothesis on the subkey. Repeating this procedure with different gates depending on other key bits, he can reduce the subset of possible cipher keys.

In order to thwart such an attack, a possible countermeasure consists in implementing a Boolean-masked circuit (in addition to applying the Benaloh-encoding). The impact of the masking on the \(\mathtt {XOR}\) gates comes at no memory cost. However, concerning the \(\mathtt {AND}\) masking, we suggest to use the secure \(\mathtt {AND}\) proposed by Biryukov et al. [8]. The impact on the number of tables is thus limited to only a factor 4.

4.3 Summary

In order to prevent the identified security issues, we decline our Benaloh-encoded white-box into two flavours:

  1. 1.

    Proposition 1: a lightweight white-boxed cipher without countermeasure,

  2. 2.

    Proposition 2: a white-boxed cipher implemented as a Boolean-masked circuit to protect vulnerable \(\mathtt {AND}\) gates (Sect. 4.2).

We close this section with a brief estimation of the computational effort needed to defeat our propositions:

  1. 1.

    Proposition 1 can be defeated by the Legendre symbol attack of Sect. 4.2. It is equivalent to a differential computational analysis (DCA) with a Legendre symbol leakage model. The results summarized in [9, Table 1] imply that the cost of the DCA Legendre symbol attack is \(O(nk\log _2 p)\), where n is the length of the trace, k is the number of key hypotheses and \(\log _2 p\) the cost of performing Euler’s criterion.

  2. 2.

    Proposition 2 can be defeated by a \(2^\text {nd}\)-order DCA if the sensitive data are shared into 2 bits. [9, Table 1] implies in this case that the cost of the Legendre symbol attack is \(O(n^2 k\log _2 p)\).

5 Performances: Example with AES-128 Encryption

In this section we provide performances estimations for an AES white-box implementation designed with our two propositions with the security improvements described in Sect. 4.3. Furthermore, we compare our AES-128 white-box design performances against other state-of-the-art designs, with respect to execution time and space requirements.

The AES can be written with only elementary gates. In particular, an AES can be only composed of \(\mathtt {XOR}\), \(\mathtt {NOT}\) \(\mathtt {XOR}\) (\(\mathtt {NXOR}\)), and \(\mathtt {AND}\) operations. The AES requires only \(\mathtt {XOR}\) gates to implement, except the SubBytes function that requires also \(\mathtt {AND}\) gates. For SubBytes, we use the bitsliced software implementation proposed by Calik [18, Sect. 7] which is an improvement of the Boyar and Peralta [15] circuit. He proposed an AES SBox with 113 gates, composed of \(77\,\mathtt {XOR}\)s, \(4\,\mathtt {NXOR}\)s and \(32\,\mathtt {AND}\)s. Following our proposal in Sect. 3.4), only \(\mathtt {AND}\) gates require tables. Hence, the memory consumption of an AES-128 with our proposition is \(15\,360 \times p \times \lceil \log _2(p)\rceil \) bits, where \(15\,360 = 32 \times 10 \times 16 \times 3\):

  • \(32\ \mathtt {AND}\)s are required for each SBox.

  • There are 10 rounds in the AES-128.

  • There are 16 input bytes.

  • There are 3 tables for each \(\mathtt {AND}\) with our solution (see Sect. 3.4).

Besides, the secret key is considered to be embedded within the white-box in a Benaloh-encoded form: it corresponds to 128 tables of \(\lceil \log _2(p)\rceil \) bits. Finally, the white-box requires one table to encode each plaintext bit, and one table per ciphertext bit to remove all the random masks \(t_i\) accumulated through the circuit: it corresponds to \(128 \times 2 = 256\) tables of \(p\lceil \log _2(p)\rceil \) bits. It leads to a total of \(15\,744 = 15\,360 + 128 + 256\) tables. We can note that whatever the size of the chosen parameter p, the number of tables is always the same, i.e. \(15\,744\).

In a same way, the execution time is constant and is not dependent on the size of the parameters (as long as they fit in the architecture registers). Indeed, whatever the chosen parameters of the implementation, we have \(15\,744\) tables to access and \(30\,520 = (92+128+((77+4+3\times 32)\times 16))\times 10\) \(\mathtt {XOR}\) gates to evaluate, where:

  • 92 represents the number of \(\mathtt {XOR}\)s in the MixColumns.

  • 128 represents the number of \(\mathtt {XOR}\)s for the AddRoundKey.

  • \(77+4\) represents the number of \(\mathtt {XOR}\)s in the SBox computations.

  • \(3\times 32\) represents the number of \(\mathtt {AND}\)s in the SBox computations, and the number of \(\mathtt {XOR}\)s required during the \(\mathtt {AND}\) calculations.

  • 16 represents the number of SBox in the AES-128.

  • 10 represents the number of rounds in the AES-128.

It leads to an execution time of \(15\,744\) table accesses and \(30\,520\) short modular multiplications. The \(\mathtt {NOT}\) gates are not taken into consideration. Indeed, we consider that these gates can be delegated to the next table of the circuit.

Table 1 gives the min and max bounds for memory consumption of an AES white-box using our proposition, where the bounds depend on the used prime number p.

Table 1. Memory consumption according to the bit size of p.

In Fig. 2 we provide a comparison of sizes and estimated execution time of published AES-128 white-boxes and our suggestions Proposition 1 and Proposition 2. The entries are sorted by publication date from the left to the right. The comparison aims at providing an overview of the evolution of white-box design sizes according to the execution time of the implementation. Figure 2 thus compares the size of a reference AES implementation [21] with the size of the white-box implementations of Chow et al. [20], Bringer et al. [16]Footnote 2, Xiao et al. [48], Karroumi [30], Lee et al. 1 [32], Lee et al. 2 [34], Lee et al. 3 [33], Luo et al. [37], Bai et al. [3], Biryukov et al. [9], Seker et al. [45]Footnote 3 and this work.

The execution time estimations are obtained by using the number of LUT accesses multiplied by 0.8 ns (typical RAM access times for DDR3 memory). For the works that did not use any table, we accounted for 0.5 ns per computation (typical operation time for a 2 GHz processor).

For our implementation, we choose a 6-bit prime number. Given the remarks of Sect. 4.3, a 6-bit prime number does not significantly weaken our white-box compared to longer primes, while allowing a competitive memory footprint. Concerning memory size, the overall memory cost of the complete implementation is dominated by the \(\mathtt {AND}\) tables. For example, by choosing \(p = 53\), with the implementation described in Sect. 3, Proposition 1 leads to an implementation of \(5,\!014,\!144\) bits (626.76 kilobytes), coherently with Table 1. The execution time is constant and is not dependent of the size of the prime number: \(15\,744 \times 0.8 + 30\,520\times 0.5 = 27\,855\) ns.

We remark that our new encoding allows a more efficient white-box design than the Chow et al. [20] one, and we also argue that our design may be adapted to a masked implementation with reduced size impact compared to the one of Biryukov et al. [9].

Fig. 2.
figure 2

Memory performances and estimated execution time of published AES-128 white-boxes.

6 Further Work

In this section we provide a few ideas to further develop our encodings. We organized such ideas in two main sections. The first section suggests improvements to the side-channel security of white-boxes. Afterwards we present ideas to thwart fault attacks.

6.1 Against Side-Channel Attacks

White-box implementations are vulnerable to attacks exploiting software execution traces containing information about the memory addresses being accessed or about manipulated data. In order to complexify such attacks, one may add countermeasures. For example, one can:

  • Shuffle and randomize the computations by introducing dummy operations as suggested in [10]. This can be achieved for example by computing \(\mathsf {Enc}(0) \times \mathsf {Enc}(b)\) for some b, at random time.

  • It is possible to mask the AES circuit prior to encoding it into tables. The impact on the size and speed would be balanced by the augmented security. We also remark that such countermeasure can easily thwart the Legendre symbol attack presented in Sect. 4.2.

6.2 Against Fault Attacks

White-box implementations are particularly vulnerable to fault attacks. An attacker can easily change the execution flow of the implementation or substitute the value of a variable [11, Sect. 7.2]. Hence, the design of a white-box must integrate countermeasures against such attacks. Typically redundancy (use of redundant representations such as a residue number system – RNS [4], or use of redundant information), error detecting or correcting techniques, or infective countermeasures [5] are used to thwart such attacks. For example, in order to introduce redundancy, one can observe that the exponents used in the encodings are values modulo r. Thus by using a composite r, it is possible to perform smaller computations modulo each prime dividing r. The result can then be recomputed by using the Chinese remainder theorem. Each of the two submodules can be used to:

  • Encode the same sensitive value, which provides redundancy.

  • Encode different bits of the plaintext, which provides efficiency.

  • Encode the correct value on one submodule, a random value on the second one, which provides randomization.

7 Conclusion

This work addresses the problem of encoding data when building a white-box implementation. We suggest a new encoding scheme based on the Benaloh cryptosystem that allows both compactness and speed. In particular, the semi-homomorphic property of our encoding allows to drop half of the tables (in our example those used for the \(\mathtt {XOR}\) operations) and to speed up computations by exchanging part of the tables (used for example in [20]) with homomorphic operations. We modify the Benaloh scheme in order to obtain an encoding that inherits the semi-homomorphic properties of the original design, while fitting the size constraints of the white-box context. Our new proposition allows the white-box designer to tune the performances and adapt the security of the implementation to meet its requirements.

As future work, we remark that it seems possible to enhance speed and memory consumption of our proposal. In this regard, a promising line of research is to parallelize multiplications and table accesses. Another direction for further work is the study of other homomorphic encryption schemes (e.g. lattice based). In particular, the study of fully homomorphic schemes may turn out advantageous. Indeed, we have shown in this paper that one can modify a semi-homomorphic scheme and use it as encoding. Thus it would be interesting to investigate the modification of a fully homomorphic scheme in a similar way. It could enable to enhance the security and the memory consumption by only keeping the input and output tables.