1 Introduction

1.1 Background

With the development of information technology, our view of cryptosystem has been changed. In the traditional model, endpoints are considered trustworthy and data need only to be protected when transmitted through insecure channels. However, this model can no longer be applied to all situations now. According to the capability of an adversary to attack a cryptosystem, three attack models are proposed, black-box model, gray-box model and white-box model [8].

The black-box model is the traditional one in which an adversary only has access to the inputs and outputs of a cryptographic algorithm. In the gray-box model, an adversary can take advantage of some leaked information from within a cryptosystem to deploy side-channel cryptanalysis. In the white-box model, an adversary is assumed to have the ability to observe, even modify, the implementation of a cryptographic algorithm and complete access to the execution environment. In this attack model, the endpoints are open devices and no longer trustworthy.

Typical examples of a white-box model can be found in digital content protection systems such as DRM (Digital Rights Management) or Pay-TV systems, where key-instantiated cryptographic algorithms are implemented on e.g. set-top boxes, PCs, smart phones or tablets. A malicious user can fully control a terminal device and use decompiler and debugger tools both in static and dynamic ways to find out the secret key from a cipher’s software implementation. The extraction of the secret key would compromise the content protection. The attack could even be carried out by a malware program which is running in a terminal device and transmits the necessary key material through a communication channel, e.g. the Internet.

The attack under the white-box model is now one of the most harmful attacks. Yet unfortunately, almost all of the existing cipher schemes are not intended to operate in a white-box attack context. The secret key is usually stored in the form of plaintext and easy to be found. White-box cryptography, introduced by Chow et al. in 2002, is an essential technique to minimize security risks for open devices and aims to prevent the exposure of the confidential information [8]. The original key is obfuscated by random data and blended into the program codes. Then it is very difficult to determine the key even though the cryptographic algorithm is openly observable and modifiable.

Given the increasing demands for software-only applications, it comes as no surprise that white-box cryptography receives a lot of attention from industry. For example, white-box cryptography has been used to add additional security to HCE(Host Card Emulation)-based mobile payments to mitigate the risk caused by the absence of hardware security. It hides sensitive data within the mobile application. To achieve more security, in a practical application, white-box cryptography could be used together with other techniques to resist reverse engineering threats, e.g. code obfuscation and name mangling.

1.2 Previous work

In 2002, Chow et al. designed a white-box implementation to protect AES software [8]. In their scheme, AES is converted into network-encoded key-dependent lookup tables. Because of the fixed and public S-boxes and MixColumn matrices, their white-box AES implementation was broken soon. Billet et al. presented an efficient and practical attack in 2005, called BGE attack [2]. The secret key can be extracted with negligible memory and worst time complexity 230. In 2013, Lepoint et al. improved BGE attack and gave a new attack with a work factor of 222 [14]. In 2009, Michiels et al. proposed an algorithm to attack white-box implementations of SLT (substitution-linear transformation) ciphers and claimed that SLT ciphers are less suited for white-box implementations unless new white-box techniques were designed [17].

Some efforts have been made to improve Chow et al.’s design, but all failed. In [22], Xiao and Lai introduced a secure implementation of white-box AES to resist BGE attack by using enhanced lookup tables. In 2012, Mulder et al. [10] presented a practical cryptanalysis of Xiao et al.’s scheme, using the linear equivalence algorithm presented in [4] as a building block. The AES key was efficiently extracted with a work factor of about 232. In 2011, Karroumi presented an improved white-box AES implementation [13] using dual ciphers to modify the state and key representations in each round as well as two operations: SubBytes and MixColumns. He claimed that the complexity of BGE attack is raised to 291. However, Lepoint et al. pointed out that Karroumi’s white-box AES implementation was actually the same as Chow et al.’s [14].

At present, most studies focus on transforming existing cipher schemes into white-box implementation. The advantage of doing so is that the security of the original ciphers has been well demonstrated. Apart from white-box AES, there are some other white-box implementations, e.g., DES [7, 15], SHARK [20]. Cryptanalysis of white-box DES implementations can be found in [12, 21] and [11]. Similar to AES, the public and fixed confusion and diffusion operations of the original ciphers lead to the insecurity of their white-box implementations.

1.3 Characteristics of white-box cryptography

So far, there is not a standard specification for white-box cryptography. So the implementations may vary. Through observing published implementations, we summarize the characteristics of white-box cryptography as follows.

  • Practical Security: White-box cryptography is a technique to provide practical security for software applications in open devices. Though there still lacks solid theoretic foundations, it has been used in many actual industrial applications, e.g. Irdeto’s DRM solution, Bell ID’s HCE solution and whiteCryption’s Cryptanium Secure Key Box.

  • Security vs. Efficiency: The general way to hide the secret key is to use key-dependent lookup tables. It makes the application more secure, but meanwhile, it causes more complexity and cost. The designer should keep the balance between security and efficiency.

    However, from another point of view, the large size of white-box implementations could even be a security measure. It forces the attacker to consume a large amount of memory while computing a target function and makes the attack of code-lifting impossible. Using the concept of memory hardness, dedicated and tunable white-box ciphers are designed [3, 6]. They could be used to relieve the damage of having malware in a top secret network system.

  • Server Load: In a white-box cryptographic solution, the server is still considered trustworthy. So it is not necessary to use white-box implementations of cryptographic algorithms on the server side. Then in the situation of serving many users, the total memory size will not be a burden for the server.

1.4 Our contributions

In this paper, instead of merely transforming AES into white-box implementations, we design a dedicated white-box AES-like implementation. Firstly, we propose an AES-like block cipher. In [1], Bai et al. merely put forward a scheme to change S-boxes into key-dependent. We develop a method to construct key-dependent S-boxes and key-dependent MDS (Maximum Distance Separable) matrices to replace the corresponding components in AES. We demonstrate that our AES-like cipher has the similar security with AES in the black-box model.

Secondly, with a chosen key, our AES-like block cipher is transformed into white-box implementation by using Chow et al.’s method [8]. We demonstrate that our white-box AES-like implementation can resist known white-box attacks.

Our method could be taken as an example to develop new white-box implementations from other existing ciphers.

1.5 Paper organization

We begin by reviewing Chow et al.’s white-box AES implementation in Section 2. The main known white-box attacks toward it are also introduced in this section. In Section 3, we introduce our white-box AES-like implementation. The security is analyzed in Section 4. Section 5 is the assessment of size and performance. Finally, Section 6 raises the conclusions and further discussion.

2 Preliminaries

2.1 Terminology and notation

  • “ ⊕” denotes the bit-wise exclusive-or(xor). ⊕ c means the function f(x) = xc.

  • “ ∘” denotes the composition of two functions, i.e., (fg)(x) = f(g(x)). \(\underset {i=0}{\overset {N}{\circ }} f_{i}\) means f N f N−1∘…∘f 0.

  • “ ∥” denotes the concatenation of two bit-strings. If f and g are two functions, f(x)∥g(y) is denoted by (fg)(xy). \(\underset {i=0}{\overset {N}{\parallel }} f_{i}\) means f 0f 1∥…∥f N .

Definition 1 (encoding)

Let X be a function which maps m bits to n bits. F is a m-bit bijection and G is a n-bit bijection. X = GXF −1 is called an encoded version of X. F is the input encoding and G is the output encoding.

Definition 2 (network-encoding)

Let X and Y be encoded versions of X and Y. A network-encoding for XY is X Y = H∘(XY)∘F −1=(HXG −1)∘(GYF −1).

Definition 3 (linear transformation)

A linear transformation is a bit-string to bit-string transformation L such that L(xy) = L(x)⊕L(y), for any inputs x, y. If L is linear, then it can be represented using matrix-vector multiplication over G F(2).

2.2 AES

We give a brief review about AES here. For more details, please refer to [19]. For definiteness and without loss of generality, the encryption process of the AES variant with a 128-bit key is used throughout this paper. AES is a block cipher that maps a 16-byte input to a 16-byte output. Let K denote an AES key. According to the AES specification, AES encryption consists of 10 rounds and has 11 round keys, K r (r=0,1,…,10), which are expanded from K by using a key scheduling process. Each round updates a 16-byte state variable by applying a combination of four basic transformations.

  • AddRoundKey uses “ ⊕” to add a 16-byte round key K r into the 16-byte state and is denoted by \(ARK_{K^{r}}\). Then \(ARK_{K^{r}}(state)=\underset {i = 0}{\overset {15}{\parallel }} {\oplus _{{K_{i}^{r}}}(state_{i})}\), where \({K_{i}^{r}}\) is the ith byte of K r and s t a t e i is the ith byte of state.

  • SubBytes applies the AES S-box transformation, denoted by S, to every byte of the state. We denote SubBytes by SB. Then \(SB(state)=\underset {i = 0}{\overset {15}{\parallel }} S(state_{i})\).

    The bytes of state are interpreted as elements in G F(28) (modulo the irreducible polynomial of x 8 + x 4 + x 3 + x+1 on G F(2)). S is a composition of the multiplicative inverse transformation and an affine transformation.

    $$ S(state_{i}) = A \cdot state_{i}^{-1} \oplus c, $$
    (1)

    where A is a constant 8×8 matrix over G F(2) and c is a constant in G F(2)8.

  • ShiftRows is a permutation on the indices of the 16 bytes of the state and is denoted by SR. \(SR(state)=\underset {i = 0}{\overset {15}{\parallel }}state_{((i\mod 4) \times 4 + i) \mod 16}\).

  • MixColumns mixes every 4 consecutive bytes (every column, if a state is regarded as a 4×4 matrix) of one state and is denoted by M i x C.

    One mix operation is the product of a constant matrix MC and one column of a state. All the operations are over G F(28). Thus,

    $$ MixC(state)=\underset{i=0}{\overset{3}{\parallel}} MC\cdot\left( \begin{array}{*{20}c}state_{i \times 4}\\ state_{i \times 4+1}\\ state_{i \times 4+2}\\ state_{i \times 4+3} \end{array}\right). $$
    (2)

A conventional way to describe the whole AES encryption is as follows:

$$\begin{array}{l} AES_{K} = ARK_{K^{10}} \circ SR \circ SB \circ \left( \underset{r = 1}{\overset{9}{\circ}} (ARK_{K^{r}} \circ MixC \circ SR \circ SB)\right) \circ ARK_{K^{0}}. \end{array} $$

2.3 Chow et al.’s white-box AES implementation

In [8], the first step of Chow et al.’s white-box AES implementation is to convert AES into a series of lookup tables and hide the secret keys into these tables. The next step is to protect the table-based implementation with random network-encodings.

2.3.1 Table-based implementation

Compared with standard AES, without impacting the final result, the boundaries between rounds can be defined in different ways. Chow et al. rearranged AES as follows:

$$\begin{array}{@{}rcl@{}} AES_{K} &=& ARK_{K^{10}} \circ SB \circ SR \circ ARK_{K^{9}} \circ \left( {\underset{r=1}{\overset{9}{\circ}} \left( {MixC \circ SB \circ SR \circ ARK_{K^{r - 1}}} \right)} \right) \\ &=& ARK_{K^{10}} \circ SB \circ ARK_{\hat {K}^{9}} \circ SR \circ \left( {\underset{r=1}{\overset{9}{\circ}} \left( {MixC \circ SB \circ ARK_{\hat {K}^{r-1}} \circ SR} \right)} \right). \end{array} $$

Here, \(\hat {K}^{r - 1}\;(r = 1,2,\ldots ,10)\) is the result of applying ShiftRows to the round key K r−1. In each round, ARK and SB can be combined into a series of sixteen lookup tables that map 8 bits to 8 bits. Those tables are called T-boxes, defined as follows:

$$\begin{array}{@{}rcl@{}} && {T_{i}^{r}} (state_{i}) = S(state_{i} \oplus \hat {K}_{i}^{r - 1}), \;i = 0,\ldots,15 \;and\; r = 1,\ldots,9, \\ && T_{i}^{10} (state_{i}) = S(state_{i} \oplus \hat {K}_{i}^{9}) \oplus K_{i}^{10}, \;i = 0,\ldots,15. \end{array} $$

In rounds 1 to 9, after a byte is mapped through a T-box, it then goes into a M i x C transformation. Let Y denote a transformation as follows:

$$ Y(state_{i}) = \left\{\begin{array}{l} MC \cdot \left( \begin{array}{cccc} {state_{i}} & {00} & {00} & {00} \end{array}\right)^{T},\quad \text{if}\;i\;\bmod \;4 = 0 \\ MC \cdot \left( \begin{array}{cccc} {00} & {state_{i}} & {00} & {00} \end{array} \right)^{T},\quad \text{if}\;i\;\bmod \;4 = 1 \\ MC \cdot \left( \begin{array}{cccc} {00} & {00} & {state_{i}} & {00} \end{array}\right)^{T},\quad \text{if}\;i\;\bmod \;4 = 2 \\ MC \cdot \left( \begin{array}{cccc} {00} & {00} & {00} & {state_{i}} \end{array} \right)^{T},\quad \text{if}\;i\;\bmod \;4 = 3. \end{array}\right. $$
(3)

T-boxes can be combined with the transformation Y. We call the combinations \(T{Y_{i}^{r}}\;(r = 1,\ldots ,9\;\text {and}\;i =0,\ldots ,15)\) tables. Each \(T{Y_{i}^{r}}\) maps 8 bits to 32 bits. Then the final result of round r can be written as follows:

$$\left( \begin{array}{cccc} {state_{i}} & {state_{i + 1}} & {state_{i + 2}} & {state_{i + 3}} \end{array} \right) = \underset{j = 0}{\overset{3}{\oplus}} TY_{i + j}^{r} (state_{i + j})\;, \;i = 0,4,8,12. $$

To accomplish xor operations in the upper expression, XOR tables are needed for network encoding. Define a lookup table, XOR, which takes two 4-bit values as inputs and maps them to their xor:

$$XOR(x\parallel y) = x \oplus y. $$

Thus, except for ShiftRows, all transformations in AES are implemented as lookup tables.

2.3.2 Protecting table-based implementation

Random encodings are composed with each table. A mixing bijection \({P_{i}^{r}} \)(an invertible 8×8 matrix over G F(2), r=1,2,…,9 and i=0,1,…,15) is inserted before the \(T{Y_{i}^{r}}\) table and a mixing bijection MB (an invertible 32×32 matrix over G F(2)) is inserted after the \(T{Y_{i}^{r}}\) table. This forms type II tables.

The effect of MB must be eliminated before the next round. Similar with the transformation Y, extra 8 bits to 32 bits tables are required to calculate M B −1. \({Q_{i}^{r}} \) is the inverse of the next round’s \(P_{i}^{r+1}\). M B −1 and \({Q_{i}^{r}} \) are combined into type III tables. XOR tables are type IV.

External encodings are used to make the white-box AES implementation difficult to separate from its containing application. Select two mixing bijections (128×128 matrices over G F(2)) F and G. F is inserted prior to the first round and combined with the inverse of \({P_{i}^{1}}\). G is inserted after the AddRoundKey step of the last round. Extra 8 bits to 128 bits tables are implemented as type I to complete these two operations.

Finally, a set of non-linear 4-bit to 4-bit bijections are implemented on the input and output of each table. The non-linear bijections applied on the output of one table can be eliminated by the non-linear bijections applied on the input of the following table. Type I–IV tables are shown in Fig. 1.

Fig. 1
figure 1

Four table types in Chow et al.’s white-box AES implementation (r=1,…,9, i=0,…,15)

There are 16×2=32 type I tables, 16×9=144 type II tables, 16×9=144 type III tables and 480+(96+96)×9+480=2688 type IV tables. Each type I table needs 4 lookup operations. Then, the total number of lookups in Chow et al.’s white-box implementation is 3104. The needed storage space is 770048 bytes. For details about Chow et al’s white-box AES implementation, please refer to [8] and [18].

2.4 Main known attacks

An attacker’s goal with respect to the white-box model is to extract the secret key. All known attacks toward Chow et al.’s white-box AES implementation are based on the fact that AES’s specification is publicly known and the coefficients of the two AES operations, SubBytes and MixColumns, are fixed. In this section, we review two practical attacks, BGE attack and Lepoint et al.’s attack.

2.4.1 BGE attack

In Chow et al.’s white-box AES implementation, a round is composed of four 4-byte to 4-byte mappings which consist of type II tables, type III tables and XOR tables of type IV. Each mapping is the main analysis object in the BGE attack, as shown in Fig. 2. We can notice that the effects of mixing bijection MB are canceled when just considering the inputs and outputs of the mapping, i.e., the MixColumn matrix MC is no longer protected by MB.

Fig. 2
figure 2

One of the mappings in one round of Chow et al.’s white-box AES (r=1,…,9 and i=0,4,8,12)

Here, the input byte-encoding \({P_{i}^{r}}\) is a little different from which is mentioned in Section 2.3. It is the composition of two concatenated 4-bit to 4-bit input encodings and one 8-bit to 8-bit mixing bijection. The output encoding \({Q_{i}^{r}}\) is the inverse transformation of \(P_{i}^{r+1}\).

The BGE attack uses the relations between the inputs and outputs of the mapping in Fig. 2. Let x 0,x 1,x 2,x 3 denote four input bytes and let y 0,y 1,y 2,y 3 denote the resulting four output bytes. The relations can be written as follows:

$$ \left\{ \begin{array}{l} y_{0} = Q_{i}^{r}\left( 02 \cdot \bar{T}_{i}^{r}\left( x_{0}\right) \oplus 03 \cdot \bar{T}_{i + 1}^{r} \left( x_{1}\right)\oplus 01 \cdot \bar{T}_{i + 2}^{r}\left( x_{2}\right) \oplus 01 \cdot \bar{T}_{i + 3}^{r}\left( x_{3}\right) \right)\\ y_{1} = Q_{i + 1}^{r}\left( 01 \cdot \bar{T}_{i}^{r}\left( x_{0}\right) \oplus 02 \cdot \bar{T}_{i + 1}^{r}\left( x_{1} \right)\oplus03 \cdot \bar{T}_{i + 2}^{r}\left( x_{2}\right) \oplus 01 \cdot \bar{T}_{i + 3}^{r}\left( x_{3}\right) \right)\\ y_{2} = Q_{i + 2}^{r}\left( 01 \cdot \bar{T}_{i}^{r}\left( x_{0} \right) \oplus 01 \cdot \bar{T}_{i + 1}^{r}\left( x_{1} \right)\oplus02 \cdot \bar{T}_{i + 2}^{r}\left( x_{2} \right) \oplus 03 \cdot \bar{T}_{i + 3}^{r}\left( x_{3} \right) \right)\\ y_{3} = Q_{i + 3}^{r}\left( 03 \cdot \bar{T}_{i}^{r}\left( x_{0} \right) \oplus 01 \cdot \bar{T}_{i + 1}^{r}\left( x_{1} \right)\oplus01 \cdot \bar{T}_{i + 2}^{r}\left( x_{2} \right) \oplus 02 \cdot \bar{T}_{i + 3}^{r}\left( x_{3} \right) \right) \end{array} \right. $$
(4)

where r=1,…,9 and i=0,4,8,12. \(\bar {T}_{i}^{r}\) is a shorthand for \({T_{i}^{r}} \circ {P_{i}^{r}}\). The constant coefficients in (4) are the elements of the AES MixColumn matrix.

The BGE attack comprises three steps:

  • Step 1 removes the non-linear parts from the parasites \({Q_{i}^{r}}\) to get unknown affine bijections which are still marked as \({Q_{i}^{r}}\).

  • Step 2 recovers the unknown affine bijections \({Q_{i}^{r}}\).

  • Step 3 extracts the AES key by using the results of the two former steps.

In Step 2, each \({Q_{i}^{r}}\) can be expressed as \({M_{i}^{r}}(x)\oplus {q_{i}^{r}}\), where \({M_{i}^{r}}\) is a linear bijection and \({Q_{i}^{r}}\) is in G F(2)8. First, the linear part of \({Q_{i}^{r}}\) is recovered up to \({{\Lambda }_{{\gamma _{i}^{r}}} }\), for some non-zero \({\gamma _{i}^{r}}\) in G F(28), \({\Lambda }_{\gamma _{i}^{r}}\) denote the matrix over G F(2)8 of the multiplication by \({\gamma _{i}^{r}}\). There exists unique pairs \(\left (\delta _{i}^{r},c_{i}^{r} \right )\) of elements in G F(28), \(\delta _{i}^{r}\) being non-zero, such that:

$$\begin{array}{@{}rcl@{}} \tilde{P}_{i}^{r}:x &\mapsto& \left( {S^{- 1}}^{\circ} {{\Lambda}_{{\delta_{i}^{r}}}}^{\circ} \left( \tilde{M}_{i}^{r} \right)^{- 1} \right)\left( y_{i} \left( x,00,00,00 \right) \oplus c_{i}^{r} \right)\\ \tilde{P}_{i + 1}^{r}:x &\mapsto& \left( {S^{- 1}}^{\circ} {{\Lambda}_{\delta_{i + 1}^{r}}}^{\circ} \left( \tilde{M}_{i}^{r} \right)^{- 1} \right)\left( y_{i}\left( 00,x,00,00\right) \oplus c_{i + 1}^{r} \right)\\ \tilde{P}_{i + 2}^{r}:x &\mapsto& \left( {S^{- 1}}^{\circ} {{\Lambda}_{\delta_{i + 2}^{r}}}^{\circ} \left( \tilde{M}_{i}^{r} \right)^{- 1} \right)\left( y_{i}\left( 00,00,x,00 \right) \oplus c_{i + 2}^{r} \right)\\ \tilde{P}_{i + 3}^{r}:x &\mapsto& \left( {S^{- 1}}^{\circ} {{\Lambda}_{\delta_{i + 3}^{r}}}^{\circ} \left( \tilde{M}_{i}^{r} \right)^{- 1}\right)\left( y_{i}\left( 00,00,00,x\right) \oplus c_{i + 3}^{r} \right) \end{array} $$
(5)

are affine mappings, where S is the AES S-box and \(\tilde {M}_{i}^{r}=M_{i}^{r}\circ {\Lambda }_{1 \left / \gamma _{i}^{r}\right .}\). Those mappings are exactly \(\tilde {P}_{i}^{r}={P_{i}^{r}}(x)\oplus {K_{i}^{r}}\). From (5), \({\Lambda }_{\gamma _{i}^{r}}\) is derived. Then \({M_{i}^{r}}\) is recovered. \({Q_{i}^{r}}\) can be recovered at the same time.

Each \(\tilde {Q}_{i}^{r}\) can be determined in 224 work-steps. Thus, for an entire AES round, the computing is 16×224=228 steps. The total complexity of the attack is approximate to 230 because the attack should be applied to three consecutive rounds. For details about BGE attack, please refer to [2].

2.4.2 Lepoint et al.’s attack

Firstly, Lepoint et al. attacked the first round of Chow et al.’s white-box AES implementation. The main analysis object is the same as in the BGE attack, i.e., the mapping shown in Fig. 2.

Define \(S_{j} = S \circ \oplus _{\hat {K_{j}^{0}}} \circ P_{j}^{1},\quad j = 0,\ldots ,15\). The attack consists in finding collisions in output of the mapping:

$$Q_{i}^{1} \left( 02 \cdot S_{i} \left( x \right) \oplus 03 \cdot S_{i + 1} \left( 0 \right) \oplus c \right) = Q_{i}^{1} \left( 02 \cdot S_{i} \left( 0 \right) \oplus 03 \cdot S_{i + 1} \left( y \right) \oplus c \right), $$

where i=0,4,8,12 a n d c=01⋅S i+2(0)⊕01⋅S i+3(0), implying that

$$ 02 \cdot \left( S_{i} \left( x \right) \oplus S_{i} \left( 0 \right) \right) \oplus 03 \cdot \left( S_{i + 1} \left( y \right) \oplus S_{i + 1} \left( 0 \right) \right) = 0 $$
(6)

x and y vary over G F(28). Then, 255 linear equations can be obtained. Consider \(Q_{i + 1}^{1} \), we can obtain

$$ 01 \cdot \left( S_{i} \left( x \right) \oplus S_{i} \left( 0 \right) \right) \oplus 02 \cdot \left( S_{i + 1} \left( y \right) \oplus S_{i + 1} \left( 0 \right) \right) = 0 $$
(7)

Another 255 linear equations are obtained. Go on with \(Q_{i + 2}^{1} \) and \(Q_{i + 3}^{1} \), in all, we can get 4×255 linear equations involving 512 unknowns. By solving the system of equations, S i and S i+1 can be recovered. S i+2 and S i+3 can be achieved in the same way. With these recovered S j , j=0,1,…,15, the output byte-encodings \({Q_{i}^{1}} \), i=0,1,…,15, can be recovered, i.e., the inverse of the input byte-encodings of the second round.

The next step is to recover the secret key of the second round with recovered encodings. Lepoint et al.’s attack recovers the AES secret key form Chow et al.’s white-box AES implementation in complexity 222. For details about this attack, refer to [14]. In [14], Lepoint et al. also proposed several improvements to the BGE attack and reduced the overall work factor to 222 as well.

2.4.3 Other attacks

Michiels et al. presented a generalization of the BGE attack which can be applied to white-box implementations derived from generic SLT ciphers (if some specific conditions are met) by using Chow et al.’s white-box techniques [17]. The time complexity of this attack varies with the choice of the original SLT ciphers, i.e., AES or Serpent or some other SLT ciphers.

In [5], Biryukov et al. proposed a structural attack toward SASAS structures. S and A represent substitution and affine transformations. They hinted that their method could be adopted to attack Chow et al.’s white-box AES implementation [3].

3 Our white-box AES-like implementation

3.1 Motivation

Almost all candidates of white-box implementations are transformed from existing ciphers which are designed without consideration of running in a white-box attack context. Because the confusion and diffusion operations are fixed and public, these white-box implementations are easy to be broken.

To increase the difficulty of attacking, the designers naturally think about modifying these operations into key-dependent. But, how to ensure the reliability of the modified cipher is worth considering, i.e., the key-dependent confusion and diffusion components should not weaken the security in the black-model. In this paper, we design a white-box AES-like implementation. The security of our scheme is similar with AES.

3.2 Design strategy

In AES, the confusion operations are S-boxes and the diffusion operations are ShiftRows and MixColumn matrices. We take AES for reference to design our AES-like cipher:

  • Firstly, to get key-dependent S-boxes, we change the constants in the affine transformation (1) into key-dependent.

  • Secondly, we replace the MC matrix in (2) with a key-dependent 4×4 MDS matrix over G F(28).

  • Finally, we use Chow et al.’s techniques (see Section 2.3) to transform our AES-like cipher into a white-box version.

3.3 Secret key

A sequence of secret bytes would be used to generate key-dependent components: 5 bytes for one S-box and 4 bytes for one MDS matrix (see Sections 3.4.2 and 3.5.2). Taking AES as reference, 160 key-dependent S-boxes and 36 key-dependent MDS matrices are totally needed in our scheme. Then the optimal number of secret bytes would be 944.

In a practical application, however, it is difficult to handle a secret key with a length of 944 bytes. A better way is to derive a pseudo-randomness sequence from a short seed key. In our scheme, we use RC4 as an expanding process to derive 944 bytes from a 16-byte seed key. Actually, by RC4, the seed key’s length could vary from 1 byte to 255 bytes according to actual needs. Of course, if necessary, the expanding process could be replaced with other approaches.

3.4 Key-dependent S-box

3.4.1 Generation of 8×8 invertible matrix over G F(2)

In our scheme, two invertible 4×4 matrices are used to generate one invertible 8×8 matrix over G F(2). Two notations:

  • Let G s e e d be the group of all invertible 4 × 4 matrices over G F(2).

  • Let M 1M 2 denote \(\left (\begin {array}{ll} M_{1} & M_{1} \\ M_{1} & M_{1}+M_{2} \end {array} \right )\) , where M 1 and M 2 are in G s e e d .

The following Algorithm 1 is designed to prepare all elements in G s e e d and index them. One element M in G s e e d can be expressed by a word (2 bytes). The expression is

$$a_{3,3} a_{3,2} a_{3,1} a_{3,0} a_{2,3} a_{2,2} a_{2,1} a_{2,0} \parallel a_{1,3} a_{1,2} a_{1,1} a_{1,0} a_{0,3} a_{0,2} a_{0,1} a_{0,0} $$

among which a i,j (i=0,1,2,3 and j=0,1,2,3) are elements of matrix M in G F(2).

figure d

The maximum index is 20159. G s e e d requires a storage of 20160×16 bits, i.e., 39.375KB. We can use two elements in G s e e d to generate one invertible 8×8 matrix over G F(2). The process is based on the following Theorem 1.

Theorem 1

Let M 1 and M 2 be elements in G seed . Then M 1 ⊗M 2 is an invertible 8×8 matrix over GF(2) and the number of possible results is 20160 2 ≈2 29 when M 1 and M 2 run through G seed separately.

Proof

Noticing the fact that \(M_{1} \otimes M_{2} = \left (\begin {array}{ll} M_{1} & M_{1} \\ M_{1} & {M_{1}+M_{2}} \end {array}\right )\) can be transformed from \(\left (\begin {array}{ll} M_{1} & 0 \\ 0 & M_{2} \end {array} \right )\) after row and column elementary transformations and there are 20160 elements in G s e e d , the results of the theorem are obvious. □

With the indexed group G s e e d , we can design an nonreversible mapping to map two random bytes to one element in G s e e d . The mapping is described as Algorithm 2.

figure e

The whole process of Algorithm 2 can be denoted by M = F i n d M(x,y). Then we can use 4 random bytes, a 0, a 1, a 2, a 3, to determine an invertible 8×8 matrix A over G F(2), A = F i n d M(a 0, a 1)⊗F i n d M(a 2, a 3).

3.4.2 Generation of key-dependent S-box

A Key-Dependent S-box K e y S can be generated by changing the constant A and c in (1) into key-dependent A and c . Using 5 bytes, K 0, K 1, K 2, K 3, K 4, from the byte sequence of the secret key, K e y S can be expressed as

$$ KeyS(x) = (\left( {FindM\left( {K_{0} ,\;K_{1} } \right) \otimes FindM\left( {K_{2} ,\;K_{3} } \right)} \right) \cdot x^{-1}) \oplus K_{4} = A^{\prime}\cdot x^{-1} \oplus c^{\prime}, $$
(8)

where x is in G F(28).

3.5 Key-dependent MDS matrix

3.5.1 MDS matrix

The concept of MDS (Maximum Distance Separable) matrix comes from coding theory, in particular from maximum distance separable codes. An MDS matrix offers optimal linear diffusion and can be used to build diffusion layers in block ciphers. The MC matrix of AES in (2) is an example of MDS matrix. The following Lemma 1 provides an equivalent definition of MDS matrix [16].

Lemma 1

A square matrix A over a finite field R is an MDS matrix if and only if every square submatrix of A is nonsingular.

3.5.2 Generation of key-dependent MDS matrix

We use four bytes, K 0,K 1,K 2,K 3, from the secret key to generate one MDS matrix MM. The process is denoted by M M = G e n M D S(K 0,K 1,K 2,K 3) and is described as Algorithm 3. Theorem 2 proves the correctness of Algorithm 3.

figure f

Theorem 2

The 4×4 matrix MM over GF(2 8 ) which is generated by Algorithm 3 is an MDS matrix.

Proof

All the operations are over G F(28). If x,yG F(28), then xy means xy. Because b 0,b 1,b 2,b 3,b 4,b 5,b 6,b 7 are pairwise distinct constants, according to the Cauchy determinant formula, the determinant of MM is

$$\begin{array}{@{}rcl@{}} \left| MM \right| &=& \frac{\prod\limits_{0 \le i < j \le 3}\left( u_{j} - u_{i} \right)\left( v_{j} - v_{i} \right)}{\prod\limits_{i = 0}^{3} \prod\limits_{j = 0}^{3} \left( u_{i} + v_{j} \right)} \\ &=& \frac{\prod\limits_{0 \le i < j \le 3} \left( \left( a_{i} \oplus a_{j} \right)\parallel \left( b_{i} \oplus b_{j} \right) \right)\left( \left( a_{i + 4} \oplus a_{j + 4} \right)\parallel \left( b_{i + 4} \oplus b_{j + 4} \right) \right)}{\prod\limits_{i = 0}^{3} \prod\limits_{j = 0}^{3} \left( a_{i} \oplus a_{j + 4} \right)\parallel \left( b_{i} \oplus b_{j + 4} \right)}\\ &\ne& 0 \end{array} $$
(9)

i.e., MM is a nonsingular matrix. Similar to (9), every square submatrix of MM is nonsingular. Then, according to Lemma 1, we can draw the conclusion that MM is an MDS matrix. □

3.5.3 Number of possible results

In this section, we figure out the number of possible results of Algorithm 3 when K 0, K 1, K 2, K 3 run through G F(28) separately.

Theorem 3

Suppose \(K_{i}, K^{\prime }_{i} \in GF{\left (2^{8} \right )}, i=0,1,2,3\) , then we have

$$GenMDS\left( {K_{0}, K_{1}, K_{2}, K_{3}}\right)=GenMDS\left( {K^{\prime}_{0}, K^{\prime}_{1}, K^{\prime}_{2}, K^{\prime}_{3}}\right) $$

if and only if all \(a^{\prime }_{i}-a_{i}\) are equal, i=0,1,⋯ ,7, a i and \(a^{\prime }_{i}\) are halves of K i and \(K^{\prime }_{i}\) as described in Algorithm 3.

Proof

“ ⇐”: Suppose

\(GenMDS\left ({K_{0}, K_{1}, K_{2}, K_{3}}\right )=GenMDS\left ({K^{\prime }_{0}, K^{\prime }_{1}, K^{\prime }_{2}, K^{\prime }_{3}}\right )\),then we have

$$ \left\{ \begin{array}{l} {{u_{0}} + {v_{0}} = {u^{\prime}_{0}} + {v^{\prime}_{0}}}\\ {{u_{0}} + {v_{1}} = {u^{\prime}_{0}} + {v^{\prime}_{1}}}\\ {{u_{0}} + {v_{2}} = {u^{\prime}_{0}} + {v^{\prime}_{2}}}\\ {{u_{0}} + {v_{3}} = {u^{\prime}_{0}} + {v^{\prime}_{3}}} \end{array} \right. \Rightarrow {u^{\prime}_{0}} - {u_{0}} = {v^{\prime}_{i}} - {v_{i}}, i = 0,1,2,3 $$
(10)

and

$$ \left\{ \begin{array}{l} {{u_{0}} + {v_{0}} = {u^{\prime}_{0}} + {v^{\prime}_{0}}}\\ {{u_{1}} + {v_{0}} = {u^{\prime}_{0}} + {v^{\prime}_{1}}}\\ {{u_{2}} + {v_{0}} = {u^{\prime}_{0}} + {v^{\prime}_{2}}}\\ {{u_{3}} + {v_{0}} = {u^{\prime}_{0}} + {v^{\prime}_{3}}} \end{array} \right. \Rightarrow {v^{\prime}_{0}} - {v_{0}} = {u^{\prime}_{i}} - {u_{i}}, i = 0,1,2,3. $$
(11)

From (10) and (11), we know: all \({u^{\prime }_{i}} - {u_{i}}\) and \({v^{\prime }_{i}} - {v_{i}}\) are equal, i=0,1,2,3. Then we can draw the conclusion: all \(a^{\prime }_{i}-a_{i}\) are equal, i=0,1,⋯ ,7.

“ ⇒”: Reverse process of the above demonstration. □

From Theorem 3, we know that: for a fixed set of K i G F(28),i=0,1,2,3, there are other 15 sets of random 4-bytes can lead to the same result by using Algorithm 3. Then, we can work out the number of possible results of Algorithm 3 when K 0, K 1, K 2, K 3 run through G F(28) separately. The number is 232/16=228.

3.6 Our AES-like cipher

The operation of AddRoundKey in AES is no longer needed in our AES-like cipher. AES’s SubBytes and MixColumns are replaced with KeySubBytes and KeyMixColumns, as described below.

  • KeySubBytes applies the key-dependent S-boxes to the corresponding byte of the state and is denoted by K e y S B.

  • KeyMixColumns mixes every 4 consecutive bytes (every column) of one state by multiplying them with the corresponding key-dependent MDS matrix and is denoted K e y M i x C.

The whole process of our AES-like cipher can be accomplished in 4 steps. A pre-computation process is needed to prepare all the key-dependent S-boxes and MDS matrices which includes STEP1-3.

STEP1 – generation of 944 secret bytes

K is our 16-byte secret key. Using K as the key of RC4, we can get a pseudo-random sequence of bytes. Choose the first 944 bytes as K . Let \(K^{\prime }_{i}\) denote the ith byte of K .

STEP2 – generation of 160 key-dependent S-boxes

Denote the 160 key-dependent S-boxes as \(Key{S_{i}^{r}}\), r=1,…,10, i=0,1,…,15. According to Section 3.4.2, each \(Key{S_{i}^{r}}\) can be generated from 5 bytes: \(K^{\prime }_{16\times r + i + k}\), k=0,1,2,3,4. Then 800 bytes are needed to generate all the S-boxes.

STEP3 – generation of 36 key-dependent MDS matrices

Denote the 36 key-dependent MDS matrices as \(M{M_{j}^{r}}\), r=1,…,9, j=0,1,2,3. According to Section 3.5.2, each \(M{M_{j}^{r}}\) can generated from 4 bytes: \(K^{\prime }_{800 + 4\times r + j + l}\), l=0,1,2,3.

STEP4 – replacement

Replace AES’s SubBytes and MixColumns. We denote our scheme by K e y A E S. The whole cipher is expressed as:

$$\begin{array}{@{}rcl@{}} KeyAES_{K} &=& KeyS{B^{10}} \circ SR \circ \left( \underset{r = 1}{\overset{9}{\circ}} (KeyMix{C^{r}} \circ KeyS{B^{r}} \circ SR)\right)\\ &\,=\,& \left( \underset{i = 0}{\overset{15}{\parallel}} KeyS_{i}^{10}\right) \circ SR \circ \left( \underset{r = 1}{\overset{9}{\circ}} \left( \left( \underset{j = 0}{\overset{3}{\parallel}} M{M_{j}^{r}}\right) \circ \left( \underset{i = 0}{\overset{15}{\parallel}} Key{S_{i}^{r}}\right) \circ SR\right)\right). \\ \end{array} $$
(12)

3.7 White-box AES-like implementation

We use Chow et al.’s techniques to implement our AES-like cipher in a white-box model. According to (12), the lookup tables of our white-box AES-like implementation has the same structures with Chow et al.’s and can also be defined as Type I, Type II, Type III and Type IV like in Fig. 1. The difference is Type II table, as shown in Fig. 3.

Fig. 3
figure 3

Type II table in our white-box AES-like implementation (r=1,…,9, i=0,…,15 and j=0,…,3)

4 Security analysis

4.1 Security in the black-box model

4.1.1 Key space

Same with AES, the secret key’s length of our AES-like cipher is 16 bytes. It is impossible to carry out exhaustive key search. Another form of brute force attack is to guess all the key-dependent S-boxes and key-dependent MDS matrices. With a set of selected S-boxes and MDS matrices, the attacker could find an equivalent secret key in reverse.

From Theorem 1 and (8), we can see that to guess one key-dependent S-box, an attacker will face about 229×28=237 choices. According to the conclusion of Section 3.5.3, there are 228 choices to guess one key-dependent MDS matrix. There are totally 160 key-dependent S-boxes and 36 key-dependent MDS matrices used in our AES-like cipher. The total choices will be 26928. Then, it is also impossible to carry out this attack.

4.1.2 Properties of key-dependent S-box and MDS matrix

Key-dependent S-box

According to [9], the design criteria for AES S-box are:

  • Non-linearity. This is to prevent linear and differential cryptanalysis and is of more importance than other criteria.

  • Algebraic complexity. This is to prevent algebraic attacks.

As shown in (1), an AES S-box S is a composition of the multiplicative inverse transformation on G F(28) and an affine transformation on G F(2). Our key-dependent S-box K e y S have the same structure, shown in (8). The non-linearity is determined by the multiplicative inverse transformation and will not be changed by the affine transformation. Therefore, compared to AES, our key-dependent S-box has the same ability to resist linear and differential attacks.

The linear transformation over G F(2)8, induced by an arbitrary 8×8 invertible matrix M over G F(2), is expressed in [23] as \(M\left (x \right ) = {\sum }_{i = 0}^{7} {{a_{i}}{x^{{2^{i}}}}}\) over G F(28). Then, the transformation Mx −1a can be described as

$$\begin{array}{@{}rcl@{}} M\left( x^{- 1} \right) \oplus a &=& \sum\limits_{i = 0}^{7} a_{i}x^{-2^{i}} \oplus a = \sum\limits_{i = 0}^{7} a_{i}x^{2^{8} - 2^{i} - 1} \oplus a \\ &=& a_{0}x^{254} \oplus a_{1}x^{253} \oplus a_{2}x^{251} \oplus a_{3}x^{247} \oplus a_{4}x^{239} \\ &&\oplus a_{5}x^{223} \oplus a_{6}x^{191} \oplus a_{7} x^{127} \oplus a, \end{array} $$
(13)

a i and a are coefficients in G F(28). For AES S-box, they are 05, 09, f9, 25, f4, 01, b5, 8f, 63. For our key-dependent S-box, the coefficients are randomly changing with the secret key.

The algebraic complexity is measured by terms and degree of (13). For AES S-box, they are 9 and 254, respectively. If all the coefficients are not zero at the same time, our key-dependent S-box has the same algebraic complexity. The probability is ((28−1)/28)9≈0.965. It means that weak secret keys account for approximately 4 % of the key space in our scheme.

However, we could lengthen the secret key and filter weak ones to eliminate the effects of weak keys. To ensure that our key-dependent S-box has the same ability with AES S-box to resist algebraic attacks, the length of the secret key should be changed to 16/0.965≈17 bytes. It is an feasible method because the secret seed key could be lengthened up to 255 bytes in our scheme.

Key-dependent MDS matrix

The quality of a linear diffusion layer is evaluated by its branch number. A high branch number implies that changing a single bit of the input will change the output a lot, which is exactly what we expect from a good diffusion layer. MixColumn matrix MC in AES and our key-dependent matrix MM are both 4×4 MDS matrices over G F(28). Their linear branch numbers and differential branch numbers are the same. All the numbers are 5 [16]. Then, our key-dependent matrix has the same diffusion property with MixColumn matrix in AES.

From the above analysis on our key-dependent S-box and key-dependent MDS matrix, we can draw the conclusion that our AES-like cipher could be as secure as AES in a black-box environment.

4.2 Security in the white-box model

In the white-box model, an attacker is able to fully access inputs and outputs of each table in a round. His goal is to extract the secret key from the tables of our white-box AES-like implementation. In one of his analysis objects, shown in Fig. 2, a set of 4 key-dependent S-boxes and 1 key-dependent MDS matrix are hidden, which are determined by 5×4+4=24 bytes of the secret key.

A selected set of 4 S-boxes and 1 MDS matrix can be used to retrieve the 24 secret bytes or their equivalence. The attacker could guess all the 237×4+28=2176 choices of the set. But, due to the protection given by the encodings, it is difficult to do so. The difficulty can be described by white-box diversity and ambiguity. Known attacks could be used to improve the guessing attack. But the improved attack is still not feasible.

4.2.1 White-box diversity and ambiguity

Chow et al. discussed the white-box analogs of key space as white-box diversity and white-box ambiguity [8]. For a given table type, white-box diversity means in how many ways a table of this type can be constructed and white-box ambiguity is the number of distinct constructions which produce the same given table.

The white-box diversity for the various kinds of tables in our white-box implementation is:

  • Type I: (16!)2×2016064×(16!)32≈22420;

  • Type II: (16!)2×262.2×237×2256×(16!)8≈2798(higher than Chow et al.’s);

  • Type III: (16!)2×2256×(16!)8≈2699;

  • Type IV: (16!)2×16!≈2133.

The white-box ambiguity of a given table is:

  • a table of Type I: (16!)2×2016032≈2546;

  • a table of Type II: (16!)2×201608≈2203(the lower bound without considering the impact of S-boxes);

  • a table of Type III: (16!)2×201608≈2203;

  • a table of Type IV: 16!×16≈248.

In the above calculations, 16! means the sum of 4-bit non-linear encodings and 20160 means the sum of 4×4 invertible matrices over G F(2). For the key-dependent S-boxes, our white-box diversity of Type II is higher than Chow et al.’s. The other items are the same. From the above results, we know that it is impossible for an attacker to guess all the tables at the same time.

4.2.2 Against BGE attack

For our white-box AES-like implementation, BGE attack can be used to determine whether a selected set of 4 key-dependent S-boxes and 1 key-dependent MDS matrix represents the 24 bytes of secret key in an analysis object as shown in Fig. 2. From (4) and (5), we can know that the coefficients of the 4 S-boxes and 1 MDS matrix should be guessed at the same time.

When the 4 S-boxes and 1 MDS matrix are selected, the time complex of attacking one mapping in Fig. 2 is 224. Then the time complex of attacking an entire round is about 4×(237)4×228×224=2202. The entire time complex of recovering all the bytes of secret key in our white-box AES-like implementation will exceed 9×2202≈2205.

4.2.3 Against Lepoint et al.’s attack

Lepoint et al.’s attack can also be used to determine whether a selected set of 4 key-dependent S-boxes and 1 key-dependent MDS matrix represents the 24 bytes of secret key in an analysis object as shown in Fig. 2. From (6) and (7), we can know that an attacker should guess 2 key-dependent S-boxes and 1 key-dependent MDS matrix at the same time. The sum of choices is (237)2×228=2102. The guess of the other 2 S-boxes could be omitted for a much lower order of magnitude.

If the S-boxes and MDS matrices are fixed, the time complex of Lepoint et al.’s attack is 222. Then, the entire time complex of recovering all the bytes of secret key in our white-box AES-like implementation is about 2102×222=2124.

4.2.4 Against other attacks

By now, the main known attacks to Chow et al.’s white-box AES take advantage of the properties of the public and fixed operations of the MixColumn matrix and S-box. In our scheme, the confusion and diffusion operations are changing key-dependently. In addition to BGE attack and Lepoint et al.’s attack, the time complex of other attacks would be also increased, e.g., Michiels et al.’s attack [17], that they are no longer practical.

Biryukov et al.’s structural attack provided a way to recover the substitution (S) and affine (A) transformations in an equivalent form from an SASAS structure. If the attack is implemented toward our white-box AES-like implementation, the resulted S-boxes and diffusion matrices may not be the original ones and may not be simultaneously in the key space of our scheme, i.e., not in the group of 237 S-boxes and 228 MDS matrices. So, they cannot be used to recover the secret key. Extra guess should be needed. On the other hand, Biryukov et al’s attack can not exactly recover the external encodings. The attacker cannot lift the codes and distribute them to illegal users arbitrarily.

5 Size and performance

The implementations of AES and our AES-like cipher in the black-box model can be optimized by using lookup tables. For the same S-box and MixColumn matrix in each round, AES needs only 5 tables: one 8-bit to 8-bit table for SubBytes and four 8-bit to 32-bit tables for MixColumns. 304 table lookups and 11+3×4×9=119 xors are needed to finish the whole cipher. Our AES-like cipher needs 304 tables: 160 8-bit to 8-bit tables for KeySubBytes and 144 8-bit to 32-bit tables for KeyMixColumns. During the whole process, 304 table lookups and 12×9=108 xors are needed. The white-box implementation of our AES-like cipher has the same structure with Chow et al.’s white-box AES implementation. Table 1 lists the needed storage spaces and numbers of operations.

Table 1 Needed storage spaces and numbers of operations

We did performance tests of encrypting data, from 1M to 20M, in an environment as: 2.53 GHz Intel M540 processor, 4 GB memory, Windows 7(32-bit) OS and Visual C++ 2010. To prepare key-dependent S-boxes and key-dependent MDS matrices, a process of pre-computation should be done once at the beginning of our AES-like cipher. This process takes about 109ms. From Fig. 4, we can see that our AES-like cipher runs slightly slower than AES. Yet, because of the same structure, the two white-box implementations undoubtedly have equal efficiency which is about 15 times slower than the original ciphers.

Fig. 4
figure 4

Comparisons of the encryption efficiencies

Considering the pre-computation process and the extra storage space, our AES-like cipher may not be as efficient as AES. But as far as the white-box attack context is concerned, the white-box application of our AES-like cipher scheme has the advantage of safety as well as the same efficiency with Chow et al.’s white-box AES. In a practical application, if a white-box AES implementation of Chow et al.’s has been carried out on a device, it can be replaced by ours easily. Replacement of lookup tables is enough.

6 Conclusion and discussion

In this paper, we proposed a white-box AES-like implementation. First, we developed key-dependent S-boxes and key-dependent MDS matrices to replace the corresponding components in AES. We demonstrated that the security of our AES-like cipher is similar with AES. Second, we showed that the white-box implementation of our AES-like cipher can resist known attacks. With the same structure of lookup tables, our implementation has not only the same efficiency with Chow et al.’s white-box AES implementation, but also extra security.

Recently, some dedicated white-box block ciphers have been proposed [3, 6]. Using key-dependent confusion and diffusion operations is a design strategy, e.g., in [3]. How to construct these operations while keeping good cryptographic properties would be worth discussing.