Keywords

1 Introduction

A lot of researches have been published about various methods to secure implementations of different kinds of cryptographic algorithms, after Kocher et al. [11] introduced Simple Power Analysis(SPA) and Differential Power Analysis(DPA), types of power analysis. SCA attacks are physical attacks to find out secure data by using Side Channel Information such as power consumption, electromagnetic wave, timing, and so on during the execution. The attacks are based on the statistical dependency between the intermediate values and leaked information. It means that it is possible for adversaries to determine the entire secret key related to the intermediate values.

It is very common to randomize the sensitive variables by masking techniques when a countermeasure is used to protect implementations of block ciphers against SCA. One or several random values are added to the secret data during the execution of cryptographic algorithms, which means that every intermediate value is independent of any secret variable. But, the first-order masking method is vulnerable to a second-order SCA. The second-order masking schemes should be considered to resist the attacks, but decrease performance.

Some cryptographic algorithms use the boolean and arithmetic operations to make the security. To counteract the SCA attacks, it is necessary to convert back and forth between the boolean masking and the arithmetic masking. Thus, Goubin et al. [3] has suggested a secure method to convert between masks, which is only applicable to the first-order masking. However, it is shown that it is impossible to apply the Goubin’s method to the second-order masking schemes [4]. Vadnala et al. [4] has proposed masking conversion for the second-order masking, but it requires 1027 times more operations to convert 8-bit size of masks. It is not possible to use it on embedded devices.

In this paper, we suggest a new countermeasure method which randomizes all the intermediate values of cryptographic algorithms. We call it a function masking. Our scheme makes lookup tables which randomly convert all the functions and operations in the algorithms with encoding and decoding(we call it linear and non-linear function masks). The algorithms are reconstructed by using these lookup tables. Actually, this method is similar to white-box cryptography because of encoded lookup tables. However, our method dynamically inputs the round keys while white-box cryptography includes the round keys in lookup tables. Thus, it is possible to change the round keys depending on the environments and the attackers cannot predict all the intermediate values during the execution of cryptographic algorithms. We show the security of the function masking, apply it to HIGHT algorithm, and compare with second order masking.

The remainder of this paper is organized as follows. Section 2 describes the existing countermeasures of SCA and HIGHT cryptography algorithm which we applied the function masking. In Sect. 3, we introduce the concept of function masking and explain the implementation method of HIGHT algorithm to apply function masking. We show the security and performance analysis in Sect. 4. Finally, in Sect. 5, we offer the conclusion.

2 Related Work

2.1 Countermeasures Against Side Channel Attacks

To the best of our knowledge, the most widely used technique protecting against DPA is to mask key-dependent intermediate data by random values. This is called masking. For a key-dependent intermediate byte \(x\) and a random mask \(m\), masking requires a function \(f(x, m) = x \cdot m\), where \(\cdot \) is defined as bitwise XOR(boolean masking), modulo addition (additive masking) or multiplication (multiplicative masking).

$$\begin{aligned} y \oplus m'&=\mathbf{MaskedSbox}(x \oplus m \oplus k)\\ \quad m,m'&\,:\ {\text {random values(mask)}}, y = \mathbf{Sbox}(x \oplus k) \end{aligned}$$

However, using only one mask which is called a first-order masking is vulnerable to a second-order DPA.

$$\begin{aligned}&y_1 \oplus m' = \mathbf{MaskedSbox}(x_1 \oplus m \oplus k_1), y_1 = \mathbf{Sbox}(x_1 \oplus k_1) \nonumber \\&y_2 \oplus m' = \mathbf{MaskedSbox}(x_2 \oplus m \oplus k_2), y_2 = \mathbf{Sbox}(x_2 \oplus k_2) \end{aligned}$$
(1)
$$\begin{aligned} y_1 \oplus m' \oplus y_2 \oplus m' = y_1 \oplus y_2 = \mathbf{Sbox}(x_1 \oplus k_1) \oplus \mathbf{Sbox}(x_2 \oplus k_2) \end{aligned}$$
(2)

To be specific, Eqs. (1) and (2) show that an attacker can obtain a non-masked result value of XORing two S-box outputs by XORing two masked S-box outputs. This is due to the fact that \(m'\) is canceled out by the XOR operation. A second-order DPA is therefore started by making two target points of a power trace as one point using subtractions or multiplications. The next step is to mount DPA based on a hypothetical value computed by XORing two S-box outputs [2].

Protection of second-order DPA requires more than two masks, and all intermediate values have to be masked through out the execution of the algorithm. Especially, each of input and output bytes of S-box must use different masks. For this reason, a masked AES implementation requires 16 masked S-boxes. As a result, a high-order masking of AES gives rise to an efficient implementation of S-boxes. Unfortunately, Table 1 shows that implementing a high-order masking scheme affects the performance of AES. To be more precise, the countermeasures are 150–300 times slower than a straightforward implementation. This might be an intolerable performance for a practical solution. HIGHT algorithms, which we will apply function masking, includes both boolean and arithmetic operations. To properly apply data masking, it is required to use a secure boolean-from/to-arithmetic mask conversion without exposing non-masked intermediate values against a second-order DPA. Goubin proposed secure mask conversion which can hide sensitive intermediate in convert process [3]. However, this conversion can only resist for first-order DPA. Vadnala et al. [4] proposed new conversion method which can work for second-order DPA. This method as shown in Algorithm 1 requires \(4 \times 2^k + 3\) operations for conversion of a \(k\)-bit mask. An arithmetic-to-boolean conversion also requires the similar number of operations. It must be a critical overhead when it is applied to all mask conversions.

Table 1. Performance of the high-order masking scheme in AES
figure a

2.2 HIGHT Algorithm

The HIGHT(HIGh security and light weigHT) [8] is a symmetric cipher which encrypts and decrypts data with a 64-bit block cipher using a key of size 128 bits. It provides light-weight and low-powered hardware implementation for ubiquitous computing devices. We will briefly introduce the algorithm of HIGHT. The 64-bit plaintext and ciphertext are denoted by concatenations of 8 bytes such as \(P=P_7 \Vert P_6\Vert P_5\Vert P_4\Vert P_3\Vert P_2\Vert P_1\Vert P_0\) and \(C=C_7 \Vert C_6 \Vert C_5 \Vert C_4 \Vert C_3 \Vert C_2 \Vert C_1 \Vert C_0\). Round functions are consisted of several mathematical operations: \(\boxplus \) addition mod \(2^8\), \(\boxminus \) subtraction mod \(2^8\), \(\oplus \) XOR, and \(\lll r\) r-bit left rotation. The encryption of HIGHT algorithm is totally made up of initial transformation, round function, final transformation, and key schedule. It is described in detail below.

figure b

\(WK_{0 \le i \le 7}\) means whitening key and \(SK_{0 \le i \le 127}\) is subkey. Round function uses functions \(F_{0}\) and \(F_{1}\):

$$\begin{aligned} F_{0}=(x \lll 1) \oplus (x \lll 2) \oplus (x \lll 7)\\ F_{1}=(x \lll 3) \oplus (x \lll 4) \oplus (x \lll 6) \end{aligned}$$

The decryption process is similar to the encryption of HIGHT.

3 Function Masking for Symmetric Cryptography Algorithm

3.1 Function Masking

Our function masking is inspired by a white-box implementation [9] of block ciphers. Protection of a key-customized encryption function \(E_k\) in a white-box implementation is replaced by \(E'_k = G \cdot E_k \cdot F^{-1}\), where F and G are input and output encoding, respectively. Being chosen randomly without reference to \(k\), the use of G and F unlikely weakens the ordinary black-box security of \(E_k\). However, one of the serious problems of this solution is the large size of the lookup tables. Our motivation in this matter is that an attacker in a gray-box model is not fully privileged to access the lookup table. For this reason, we try to generate a dynamic-key lookup table which takes both a key and an operand as an input. To be specific, it can be represented by

$$\begin{aligned} E (k, x)&= G (E(k, F^{-1} (x) ) ) \end{aligned}$$

where \(x\) is an operand to be involved with \(k\).

By generating a lookup table for \(E (k, x)\), we can significantly reduce the total size of the lookup table than a white-box implementation because the table can be shared throughout all rounds. Also, this yields an additional advantage over a white-box lookup table: it can support dynamic key applications. In other words, this method can be also used when a secret key is updated from time to time like in the case of a session key. A potential problem is how to design the lookup table within practical size because a key is added to an input to the table. In the following, we explain how to apply function masking to HIGHT in such an efficient way.

3.2 Applying Function Masking to HIGHT Algorithm

Function Masking Method

Encoding \( \varvec{ \& }\) Decoding. It is required to conceal all of the intermediate values. The function masking method uses non-linear, linear encoding and random masking. Chow et al. [9] has suggested input and output encodings to protect a table. An encoding is a bijection. Encodings are networked with input and output of tables. If a table T is prevented with chosen bijections G,H

$$\begin{aligned} T' = H \circ T \circ G^{-1} \end{aligned}$$

G is the input encoding and H is the output encoding. In case of two tables for lookup operations, it is expressed in a networked fashion. For example, tables \(T_1\) and \(T_2\) are protected with encodings as follows.

$$\begin{aligned} T_2' \circ T_1' = (H \circ T_2 \circ G^{-1}) \circ (G \circ T_1 \circ H^{-1}) = H \circ T_2 \circ T_1 \circ H^{-1} \end{aligned}$$

Encodings make all lookup tables to obfuscate in Function Masking method. Furthermore, linear functions \(L\) and \(M\) are used to achieve diffusion for security, defined by Shannon [12]. There is also random mask to conceal \(2\times 4\)-bit output values. Random mask is used to encode the modular addition in round.

  • 4-bit non-linear function mask \(G, H, G^{-1},H^{-1}\): \(\{0,1\}^{4} \rightarrow \{0,1\}^{4}\)

  • 8-bit linear function mask \(L, M, L^{-1},M^{-1}\): \(\{0,1\}^{8} \rightarrow \{0,1\}^{8}\)

  • 8-bit random mask \(C_1,C_2(0 \le C_1 \le 255,~0 \le C_2 \le 255)\)

Several types of lookup tables (See Figs. 1, 2 and 3) could be generated with above masks.

Reduction of Lookup Table Size. The modular addition and XOR operation result in a value with two operands. If two input values are 8-bit, it can be shown that all of the \(2^{16}(=65536)\) possible output values produce distinct lookup tables. However, it is too much big to store in memory sometimes. To overcome this problem, it could be transformed into \(2 \times 2^{12}(=8192)\) lookup tables. Then, it could significantly reduce the size of lookup tables. At first, an 8-bit operand and a high 4-bit of another operand will become input values of the first lookup table. An 8-bit output of the first table and a low 4-bit of another operand produce an 8-bit result value of the modular addition or XOR operation by using the second table. For example, the XOR operation of two 8-bit operands can be computed with type III-1 and III-2 tables. There are also two tables of type IV-1 and IV-2 for the modular addition.

Applying Function Masking to HIGHT Algorithm. The Function Masking method is applied to the HIGHT algorithm. It is required to make 12 lookup tables of 5 types for HIGHT algorithm.

Initial Transformation downsizing the size of lookup tables is applied to the modular addition and XOR with 2 \(\times \)8-bit input. \(P_0\) and \(P_1\) are encoded by a type I-2 table. In the case of \(P_0\), the encoded value is added with Whitening Key by using two tables of type IV-1 and IV-2 tables. Then \(X_{0,0}\) is obtained after changing the mask from type I-4 table. The intermediate value \(X_{0,1}\) is the encoded value of \(P_1\). Moreover, \(P_2\) and \(P_3\) are encoded by type I-1 table. The encoded value of \(P_2\) is XORed with Whitening Key by using two tables of type III-1 and III-2 tables. Thus, a table lookup of type I-3 yields the intermediate value \(X_{0,2}\). The intermediate value \(X_{0,3}\) is the encoded value of \(P_3\). \(P_4,P_5\) and \(P_6,P_7\) are the same process as above lookup operations \(P_0,P_1\) and \(P_2,P_3\) respectively.

Round Transformation. Let’s take a close look at the first two 8-bit values of the round inputs shown in Fig. 4. Subkey is protected by encoding through type I-1 table. Type II-1 and III-2 tables operate functions F and XOR. A high 4-bit of the encoded subkey and the 8-bit value \(X_{i-1,0}\) are the input value of the type II-1 table. Thus, The output and a low 4-bit of the encoded subkey go into the type III-2 table. And the 8-bit value of \(X_{i-1,1}\) and a high 4-bit of the XORed value make the 8-bit output by using a type IV-1 table. The intermediate value \(X_{i,2}\) is obtained by a type IV-2 table with the central output and a low 4-bit of the XORed value. \(X_{i-1,0}\) becomes \(X_{i,1}\) just as it is.

The next process is similar to the previous process but the modular addition and XOR operation are out of order. A type I-2 table encodes a subkey to conceal. A high 4-bit of the encoded subkey and the 8-bit value \(X_{i-1,2}\) are the input of a type II-2 table. The result and a low 4-bit of the encoded subkey calculate the modular addition by the type IV-2 table. After computing the modular addition by lookup operations, the output is divided into 2 \(\times \) 4-bit values. Thus, the high 4-bit output and the value \(X_{i-1,3}\) are the input of a type III-1 table. The 8-bit outcome value of the type III-1 table and the low 4-bit output make the intermediate value \(X_{i,4}\). \(X_{i,3}\) is gained by the \(X_{i-1,2}\). The rest of process is the same as before. \(X_{i,5},X_{i,6}\) could be output of \(X_{i-1,4},X_{i-1,5}\) by the same process of the first one. The later process makes \(X_{i,7},X_{i,0}\) with input of \(X_{i-1,6},X_{i-1,7}\). Lastly all of the output values are rearranged by a left cyclic shift.

Final Transformation. It is easy to look into the final transformation since it is similar to the initial transformation. The value \(X_{32,0}\) is added with a Whitening Key by using lookup tables, type IV-1 and IV-2 tables. A first byte \(C_0\) of ciphertext is obtained after decoding table of a type V-1. \(C_1\) is the output of the type V-1 from the intermediate value \(X_{32,1}\). The value \(X_{32,2}\) XOR with a Whitening key by using lookup tables of type III-1 and III-2. \(C_3\) is obtained by a type V-2 table with an input value \(X_{32,3}\). \(C_4,C_5\) and \(C_6,C_7\) are derived from \(X_{32,4},X_{32,5}\) and \(X_{32,6},X_{32,7}\) by the same process of \(X_{32,0},X_{32,1}\) and \(X_{32,2},X_{32,3}\) respectively.

Fig. 1.
figure 1

Tables of Type I

Fig. 2.
figure 2

Tables of Type II and Type III

Fig. 3.
figure 3

Tables of Type IV and Type V

4 Security and Performance Analysis

4.1 Security Analysis

To demonstrate the security of the proposed method against side channel attack, we mainly show that a masked intermediate value is independent from a non-masked value. To do so, we first compare each bit of a masked and a non-masked intermediate values using the proposed and the original HIGHT implementations, respectively. The target intermediate value to be compared is \(X_2\)(third byte, see Algorithm 2) in the first round output because it is affected by the first byte of the first round key. The main step of single-bit DPA is to compute a differential trace after dividing power traces into two sets according the value of a target bit. The protection of DPA can be then justified if two bits of the non-masked and the masked \(X_2\) at each bit position are different with probability 1/2. For the verification, we have performed encryption for 10,000,000 different plaintexts using the two HIGHT implementations, and also compared each bit of the masked and the non-masked values of \(X_2\). As a result, Table 2 shows that they are different with a nearly 1/2 probability for every bit position. This property prevents a DPA attacker from constructing the correct sets of power traces and thus DPA is unlikely to work when using function masking.

Fig. 4.
figure 4

Round transformation

Table 2. Probability of different bit between function masking and no masking intermediate

In the case of CPA(correlation power analysis), an attacker computes a correlation value between the Hamming weights of a hypothetical value and the power consumption [13]. This is due to the fact that the power consumption of a micro-controller at a given point is known to be proportional or inversely proportional to the Hamming weight of a processed data. To demonstrate the protection of CPA, we show that the Hamming weights of a masked and a non-masked values of \(X_2\) are independent from each other. Let \(HW_{\alpha }\) denote the set of plaintexts that lead to the Hamming weight \({\alpha }\) of the non-masked value of \(X_2\). Then, we have \(\alpha \in [0, 8]\) because there are nine possible Hamming weights for an 8-bit value. We have performed encryption for 10,000,000 random plaintexts using the original HIGHT implementation and divided the plaintexts into \(HW_{\alpha }\), where \(\alpha \in [0,8]\). The next step is to show that the plaintexts in \(HW_{\alpha }\) lead to well-distributed Hamming weights of \(X_2\) in our implementation. For this purpose, we have repeated encryption on our proposed implementation for each set of plaintexts in \(HW_{\alpha }\), where \(\alpha \in [0,8]\). If the Hamming weights of the masked values of \(X_2\) are uniformly distributed, they will show the probabilities for the Hamming weights of an 8-bit value shown in Table 3. For \(\alpha \) \(\in \) [0,8], our experimental result shown in Table 4 gives us that the plaintexts in \(HW_{\alpha }\) cause the Hamming weights of \(X_2\) to be almost uniformly distributed in our implementation. This means that a masked and a non-masked values are not correlated to each other with overwhelming probability. We can therefore conclude that our function masking can also protect against CPA.

Table 3. Probability distribution for the Hamming weight of a uniformly distributed 8-bit value [10]
Table 4. Probability distribution for the Hamming weight of a function masked value

4.2 Performance Analysis

In this section, we compared the performance of the data masking and function masking. There is no secure implementation of HIGHT with second-order masking so far. Thus, it was tried to estimate the approximate overhead by calculating the number of operations required additional. Conversion of boolean and arithmetic mask is needed 10 times for one round function, when an implementation is used converting algorithm of [4]. Initial and final transformation are required two times mask conversion.

If data masking is applied at the beginning and end of 4 round, namely 8 rounds, initial and final transformation, the required operations of mask conversion are 86,268\((((8 \times 10) + (2 \times 2)) \times 1027)\) because one mask conversion needs 1,027 additional operations. In the case of no masking HIGHT, 392 operations are required because initial and final transformation need 4 operations in each and one round needs 12 operations where the HIGHT is composed of 32 rounds. Thus, it can be estimated that data masking version is over 200 times slower than the straightforward version. Even this is optimistic estimate excluding random number creation for mask conversion.

HIGHT applied function masking requires 16 times table lookup for initial transformation, 20 times for final transformation and 20 times for each round. For 8 rounds masking, table lookup will be 196 times. Since rest of unmasked 24 rounds require 288 operations, total operations for function masking are 484 times. Although this means function masking is 1.2 times slower than original HIGHT, actual runtime should be slower than the expectation because memory operation takes longer than ALU operation in CPU.

We implemented the function masked HIGHT in C language using a Intel core i7. Table 5 shows that lookup tables are around 25 Kbytes and it takes 1.79 times longer than original HIGHT.

Table 5. Lookup table size and time complexity of function masked HIGHT

5 Conclusion

Prior works have documented the masking methods against the standard DPA attack. However, The masking method is vulnerable to the high-order DPA attacks since the attacks use correlation coefficient between two points or more. To resist the high-order attack, the high-order masking schemes have been proposed but it is not easy to implement in reality because of bad performance. In this study, it is possible to implement our function masking scheme which needs only a little overhead in reality. Thus, our scheme takes only 1.79 times more than the original HIGHT algorithm, but spends almost 200 times less than the second-order masking method. It means that it is possible to implement the masked HIGHT algorithm on the microprocessor against SCA by using 25 KB memory.

In the future, we should consider about the reduction of table size. The efficiency and security of the masked HIGHT should be verified by applying the function masking to the standard cryptographic algorithms, AES or ARIA. We expect that it is possible to compare with the high-order masked AES since many researches of high-order masking AES have been published. And it will be confirmed on the small processor devices as well as PC with different environments.