
1 Introduction

Due to the widespread usage of Internet of Things (IoT) technology, the need for protection from security threats on resource-constrained devices has been continuously growing. Since 2003, the cryptography community has recognized the importance of this need, and researchers and developers have focused on cryptography tailored to limited computation resources in hardware and software implementations. This has resulted in opening up a new subfield of cryptography, namely, lightweight cryptography, which led to the launch of the eSTREAM project. This project, which ran from 2004 to 2008, can be viewed as the most important research activity in the area of lightweight stream ciphers. The eSTREAM portfolio contains four software-oriented ciphers and three hardware-oriented ciphers.

From an industrial point of view, it has been widely recognized that maturity is important regarding deployment of cryptographic mechanisms. In fact, the ISO/IEC 18033-1 [32] standard states this property as one criterion for inclusion of cryptographic mechanisms. The concept behind this is that, if cryptographic mechanisms are to be standardized, they should have been in the public domain for many years. In this way, security and performance analysis of them can be performed by third parties, which gives users a significant amount of confidence in their security. The above-mentioned eSTREAM project activity affected industry: one of the eSTREAM portfolio ciphers, Trivium [14], is standardized in the lightweight stream cipher standard, ISO/IEC 29192-3 [31], together with Enocoro [48]. Grain-128a, which is based on the eSTREAM portfolio cipher Grain v1, is standardized in ISO/IEC 29167-13 [33] for the RFID application standard.

Despite the above extensive academic and industry efforts, there is still an important gap to fill. There has been no authenticated encryption with associated data (AEAD) mechanism that meets very severe performance requirements in hardware and still offers 128-bit security, accompanied by serious evidence on cryptanalysis. In 2013, NIST initiated a lightweight cryptography project, followed by two workshops on the same subject. In 2017, NIST published a call for submissions for lightweight cryptographic mechanisms. One remarkable feature is that NIST requires each submission to implement the AEAD functionality. In [9], it was shown that lightweight stream ciphers are typically more suitable than lightweight block ciphers for energy optimization when encrypting longer messages, in particular when the speed can be increased at the expense of moderate extra hardware. Thus, a lightweight stream cipher seems to be a good starting point for a lightweight AEAD design.

This paper presents Grain-128AEAD, an authenticated encryption algorithm with support for associated data. The specification is in line with the requirements given by NIST and is based on the Grain stream cipher family. More specifically, it is closely based on Grain-128a, introduced in 2011, which has, already for several years, been analyzed in the literature. To benefit from the maturity of the Grain family, our strategy in the design of Grain-128AEAD is to have the changes made to Grain-128a as small as possible. Grain-128a is in turn based on Grain v1 and Grain-128, which have both been extensively analyzed, providing much insight into the security of the design approach. All Grain stream ciphers also allow the throughput to be increased by adding additional copies of the Boolean functions involved.

Industrial relevance of the Grain family can be explained as follows: Grain-128a receives a lot of attention from industry. ISO/IEC 29167-13:2015 specifying Grain-128a has been adopted in industrial applications. For instance, the passive IT70 RFID tag [30] that Honeywell has designed for automotive applications implements this security standard.

The outline of the paper is as follows. In Sect. 2 the specification of the new primitive is given. Then the overall design rationale, motivating the design choices, is given in Sect. 3. A security analysis, focusing on cryptanalysis of Grain-128a, is then given in Sect. 4. The hardware implementation is described in Sect. 5 and the paper is concluded in Sect. 6.

2 Design Details

Grain-128AEAD consists of two main building blocks. The first is a pre-output generator, which is constructed using a Linear Feedback Shift Register (LFSR), a Non-linear Feedback Shift Register (NFSR) and a pre-output function, while the second is an authenticator generator consisting of a shift register and an accumulator. The design is very similar to Grain-128a, but has been modified to allow for larger authenticators and to support AEAD. Moreover, the modes of usage have been updated.

2.1 Building Blocks and Functions

The pre-output generator generates a stream of pseudo-random bits, which are used for encryption and the authentication tag. It is depicted in Fig. 1. The content of the 128-bit LFSR is denoted \(S_t = [s_0^t, s_1^t,\ldots ,s_{127}^t]\) and the content of the 128-bit NFSR is similarly denoted \(B_t = [b_0^t, b_1^t,\ldots ,b_{127}^t]\). These two shift registers represent the 256-bit state of the pre-output generator.

Fig. 1. An overview of the building blocks in Grain-128AEAD.

The primitive feedback polynomial of the LFSR, defined over GF(2) and denoted f(x), is defined as

$$ f(x) = 1 + x^{32} + x^{47} + x^{58} + x^{90} + x^{121} + x^{128}. $$

The corresponding update function of the LFSR is given by

$$\begin{aligned} s_{127}^{t+1}= & {} s_0^t + s_7^t + s_{38}^t + s_{70}^t + s_{81}^t+ s_{96}^t \\ ~= & {} \mathcal {L}(S_t). \end{aligned}$$
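The LFSR update above can be sketched in a few lines of Python. This is only an illustration, not a reference implementation: the register is modeled as a list of 128 bits with `S[i]` standing for \(s_i^t\), the names `LFSR_TAPS`, `lfsr_feedback` and `lfsr_step` are hypothetical, and the shift direction \(s_i^{t+1} = s_{i+1}^t\) for \(i < 127\) is an assumption made here (it is the usual convention for a shift register whose new bit enters at position 127).

```python
# Tap positions read off the update function s_127^{t+1} = L(S_t).
LFSR_TAPS = (0, 7, 38, 70, 81, 96)


def lfsr_feedback(S):
    """Return L(S_t): the sum over GF(2), i.e. XOR, of the tapped bits."""
    bit = 0
    for i in LFSR_TAPS:
        bit ^= S[i]
    return bit


def lfsr_step(S):
    """Clock the LFSR once.

    Assumption: every bit moves one position towards index 0 and the
    feedback bit becomes the new s_127.
    """
    return S[1:] + [lfsr_feedback(S)]
```

Each tap index is 128 minus an exponent of f(x), for example 128 - 121 = 7 and 128 - 32 = 96, which is a quick way to cross-check the polynomial against the update function.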

The nonlinear feedback polynomial of the NFSR, denoted g(x) and also defined over GF(2), is defined as

$$\begin{aligned} g(x)&= 1 +\, x^{32} + x^{37} + x^{72} + x^{102} + x^{128} + x^{44}x^{60} + x^{61}x^{125} \\&\quad +\, x^{63}x^{67} + x^{69}x^{101} + x^{80}x^{88} + x^{110}x^{111} + x^{115}x^{117}\\&\quad +\, x^{46}x^{50}x^{58} + x^{103}x^{104}x^{106} + x^{33}x^{35}x^{36}x^{40} \end{aligned}$$

and the corresponding update function is given by

$$\begin{aligned} b_{127}^{t+1}&= s_0^t + b_0^t + b_{26}^t + b_{56}^t + b_{91}^t + b_{96}^t + b_{3}^tb_{67}^t + b_{11}^tb_{13}^t \\&\quad + \, b_{17}^tb_{18}^t + b_{27}^tb_{59}^t + b_{40}^tb_{48}^t + b_{61}^tb_{65}^t + b_{68}^tb_{84}^t\\&\quad +\, b_{22}^tb_{24}^tb_{25}^t + b_{70}^tb_{78}^tb_{82}^t + b_{88}^tb_{92}^tb_{93}^tb_{95}^t\\&= \, s_0^t + \mathcal {F}(B_t). \end{aligned}$$
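As a companion illustration, the NFSR feedback \(\mathcal {F}(B_t)\) can be written as a table of monomials. The sketch below assumes the same list-of-bits model (`B[i]` for \(b_i^t\)) and the same shift direction as a conventional shift register; the names `NFSR_LINEAR`, `NFSR_PRODUCTS`, `nfsr_feedback` and `nfsr_step` are hypothetical.

```python
# Linear terms and product terms of F(B_t), read off the update function.
NFSR_LINEAR = (0, 26, 56, 91, 96)
NFSR_PRODUCTS = (
    (3, 67), (11, 13), (17, 18), (27, 59), (40, 48), (61, 65), (68, 84),
    (22, 24, 25), (70, 78, 82), (88, 92, 93, 95),
)


def nfsr_feedback(B):
    """Return F(B_t): XOR of the linear taps and of each AND-ed monomial."""
    bit = 0
    for i in NFSR_LINEAR:
        bit ^= B[i]
    for term in NFSR_PRODUCTS:
        prod = 1
        for i in term:
            prod &= B[i]
        bit ^= prod
    return bit


def nfsr_step(B, s0):
    """Clock the NFSR once; s0 is the LFSR bit s_0^t that masks the feedback.

    Assumption: bits move one position towards index 0 and the new bit
    enters at index 127.
    """
    return B[1:] + [s0 ^ nfsr_feedback(B)]
```

Keeping the monomials in a table makes it easy to compare them one by one with g(x): each index is 128 minus the corresponding exponent.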

Nine state variables are taken as input to a Boolean function h(x). Two of these bits are taken from the NFSR and seven are taken from the LFSR. The function is defined as

$$ h(x) = x_0x_1 + x_2x_3 + x_4x_5 + x_6x_7 + x_0x_4x_8, $$

where the variables \(x_0,\ldots ,x_8\) correspond to, respectively, the state variables \(b_{12}^t, s_{8}^t, s_{13}^t, s_{20}^t, b_{95}^t, s_{42}^t, s_{60}^t, s_{79}^t\) and \(s_{94}^t\).

The output of the pre-output generator is then given by the pre-output function

$$ y_t=h(x) + s_{93}^t + \sum _{j \in \mathcal {A}}b_j^t, $$

where \(\mathcal {A} = \{2, 15, 36, 45, 64, 73, 89\}\).
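The function h and the pre-output function can be sketched as follows. This is a minimal sketch under the assumption that `S` and `B` are lists of 128 bits holding the LFSR and NFSR contents at time t; the names `h`, `pre_output` and `PREOUT_NFSR_TAPS` are chosen here for illustration.

```python
# The set A of NFSR positions added linearly to the pre-output.
PREOUT_NFSR_TAPS = (2, 15, 36, 45, 64, 73, 89)


def h(x):
    """h(x) = x0x1 + x2x3 + x4x5 + x6x7 + x0x4x8 over GF(2)."""
    return (
        (x[0] & x[1]) ^ (x[2] & x[3]) ^ (x[4] & x[5])
        ^ (x[6] & x[7]) ^ (x[0] & x[4] & x[8])
    )


def pre_output(S, B):
    """Return y_t = h(x) + s_93 + sum of b_j for j in A."""
    # x_0..x_8 in the order given in the text: two NFSR bits, seven LFSR bits.
    x = (B[12], S[8], S[13], S[20], B[95], S[42], S[60], S[79], S[94])
    y = h(x) ^ S[93]
    for j in PREOUT_NFSR_TAPS:
        y ^= B[j]
    return y
```

For instance, with all 256 state bits set to one, h evaluates to 1 (five monomials, each equal to 1), the term \(s_{93}\) cancels it, and the seven NFSR taps sum to 1, so the pre-output bit is 1.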

The authenticator generator consists of a shift register, holding the most recent 64 odd bits from the pre-output, and an accumulator. Both are of size 64 bits. We denote the content of the accumulator at instance i as \(A_i =\) \([a_0^i,a_1^i,\ldots ,a_{63}^i]\). Similarly, the content of the shift register is denoted \(R_i = [r_0^i,r_1^i,\ldots ,r_{63}^i]\).

2.2 Key and IV Initialization

Before the pre-output can be used as keystream and for authentication, the internal state of the pre-output generator and the authenticator generator registers are initialized with a key and IV. Denote the key bits as \(k_i\), \(0\le i \le 127\) and the IV bits as IV\(_i\), \(0 \le i \le 95\). Then the state is initialized as follows. The 128 NFSR bits are loaded with the bits of the key \(b_i^0=k_i\), \(0 \le i \le 127\) and the first 96 LFSR elements are loaded with the IV bits, \(s_i^0=\textit{IV}_i\), \(0 \le i \le 95\). The last 32 bits of the LFSR are filled with 31 ones and a zero, \(s_i^0=1, 96 \le i \le 126,~s_{127}^0=0\). Then, the cipher is clocked 256 times, feeding back the pre-output function and XORing it with the input to both the LFSR and the NFSR, i.e.,

$$\begin{aligned} s_{127}^{t+1}= & {} \mathcal {L}(S_t) + y_t, \quad 0 \le t \le 255, \\ b_{127}^{t+1}= & {} s_0^t + \mathcal {F}(B_t) + y_t, \quad 0 \le t \le 255. \end{aligned}$$

Once the pre-output generator has been initialized, the authenticator generator is initialized by loading the register and the accumulator with the pre-output keystream as

$$\begin{aligned} a_j^{0}= & {} y_{256+j}, \qquad 0 \le j \le 63, \\ r_j^{0}= & {} y_{320+j}, \qquad 0 \le j \le 63. \end{aligned}$$

When the register and the accumulator are initialized, the key is simultaneously shifted into the LFSR,

$$\begin{aligned} s_{127}^{t+1}= & {} \mathcal {L}(S_t) + k_{t-256}, \quad 256 \le t \le 383, \end{aligned}$$

while the NFSR is updated as

$$\begin{aligned} b_{127}^{t+1}= & {} s_0^t + \mathcal {F}(B_t), \quad 256 \le t \le 383. \end{aligned}$$

Thus, when the cipher has been fully initialized the LFSR and the NFSR states are given by \(S_{384}\) and \(B_{384}\), respectively, and the register and accumulator are given by \(R_0\) and \(A_0\), respectively. The initialization procedure is summarized in Fig. 2.

Fig. 2. An overview of the initialization in Grain-128AEAD. Note that, in hardware, the accumulator initialization is realized by first loading 64 pre-output bits into the register, followed by moving them to the accumulator.
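The initialization procedure of this subsection can be summarized in a short Python sketch. It is an illustration under stated assumptions, not a reference implementation: registers are lists of bits, bits shift towards index 0 with the new bit entering at index 127, and the helper names (`_xor`, `lfsr_fb`, `nfsr_fb`, `pre_output`, `initialize`) are hypothetical. Because bit and byte ordering conventions are not fixed here, the output should not be expected to match official test vectors.

```python
NFSR_PRODUCTS = (
    (3, 67), (11, 13), (17, 18), (27, 59), (40, 48), (61, 65), (68, 84),
    (22, 24, 25), (70, 78, 82), (88, 92, 93, 95),
)


def _xor(bits, idx):
    out = 0
    for i in idx:
        out ^= bits[i]
    return out


def lfsr_fb(S):
    """L(S_t)."""
    return _xor(S, (0, 7, 38, 70, 81, 96))


def nfsr_fb(B):
    """F(B_t)."""
    out = _xor(B, (0, 26, 56, 91, 96))
    for term in NFSR_PRODUCTS:
        prod = 1
        for i in term:
            prod &= B[i]
        out ^= prod
    return out


def pre_output(S, B):
    """y_t = h(x) + s_93 + sum of b_j for j in A."""
    h = ((B[12] & S[8]) ^ (S[13] & S[20]) ^ (B[95] & S[42])
         ^ (S[60] & S[79]) ^ (B[12] & B[95] & S[94]))
    return h ^ S[93] ^ _xor(B, (2, 15, 36, 45, 64, 73, 89))


def initialize(key, iv):
    """Return (S_384, B_384, A_0, R_0) for a 128-bit key and 96-bit IV."""
    assert len(key) == 128 and len(iv) == 96
    B = list(key)                     # b_i^0 = k_i
    S = list(iv) + [1] * 31 + [0]     # IV, then 31 ones and a zero
    # Clocks 0..255: the pre-output is fed back into both registers.
    for _ in range(256):
        y = pre_output(S, B)
        S, B = S[1:] + [lfsr_fb(S) ^ y], B[1:] + [S[0] ^ nfsr_fb(B) ^ y]
    # Clocks 256..383: the pre-output loads A and R, the key enters the LFSR.
    loaded = []
    for t in range(256, 384):
        loaded.append(pre_output(S, B))
        S, B = S[1:] + [lfsr_fb(S) ^ key[t - 256]], B[1:] + [S[0] ^ nfsr_fb(B)]
    A = loaded[:64]   # a_j^0 = y_{256+j}
    R = loaded[64:]   # r_j^0 = y_{320+j}
    return S, B, A, R
```

Note that both registers are updated in one tuple assignment so that the NFSR update uses \(s_0^t\) from the LFSR state before it is shifted.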

2.3 Operating Mode

For a message m of length L, denoted \(m_0,m_1,\ldots ,m_{L-1}\), set \(m_L=1\) as padding in order to ensure that m and m\(\Vert 0\) have different tags.

After initializing the pre-output generator, the pre-output is used to generate keystream bits \(z_i\) for encryption and authentication bits \(z_i'\) to update the register in the authenticator generator. The keystream is generated as

$$ z_i = y_{384+2i}, $$

i.e., every even bit (counting from 0) from the pre-output generator is taken as a keystream bit. The authentication bits are generated as

$$ z_i' = y_{384+2i+1}, $$

i.e., every odd bit from the pre-output generator is taken as an authentication bit. The message is encrypted as

$$ c_i = m_i \oplus z_i, \quad 0\le i < L. $$

The accumulator is updated as

$$ a_j^{i+1} = a_j^i + m_ir_j^i, \qquad 0 \le j \le 63,\quad 0\le i\le L, $$

and the shift register is updated as

$$\begin{aligned} r_{63}^{i+1}= & {} z_i',\\ r_{j}^{i+1}= & {} r_{j+1}^i, \qquad 0 \le j \le 62. \end{aligned}$$

An AEAD scheme allows for data that is authenticated, but unencrypted. Grain-128AEAD achieves this simply by explicitly setting \(y_{384+2i}=0\) for bits that should not be encrypted, but should still be authenticated. This means that it is possible to control the associated data on bit level, and this data can appear anywhere in the message.
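The operating mode can be sketched as follows. To stay independent of how the pre-output is produced, the sketch takes the pre-output bits \(y_{384}, y_{385}, \ldots\) as a plain list `y`, together with the initial accumulator and register contents. The function name, the optional `encrypt_mask` argument (marking which bits are encrypted and which are associated data) and the choice to return the final accumulator as the tag are assumptions made for this illustration; the text above does not spell out how the tag is read out.

```python
def encrypt_and_authenticate(m, y, A0, R0, encrypt_mask=None):
    """Encrypt message bits m and accumulate the authenticator.

    m            : list of message bits m_0..m_{L-1}
    y            : pre-output bits, y[k] standing for y_{384+k}
    A0, R0       : 64-bit accumulator and shift register after initialization
    encrypt_mask : optional list of booleans; False marks a bit that is
                   authenticated but left unencrypted (associated data)
    Returns (ciphertext bits, final accumulator).
    """
    length = len(m)
    padded = list(m) + [1]                 # padding bit m_L = 1
    assert len(y) >= 2 * len(padded)
    A, R = list(A0), list(R0)
    c = []
    for i, m_i in enumerate(padded):
        z_auth = y[2 * i + 1]              # z'_i = y_{384+2i+1}
        if i < length:
            encrypted = encrypt_mask is None or encrypt_mask[i]
            z = y[2 * i] if encrypted else 0   # z_i = y_{384+2i}, or 0
            c.append(m_i ^ z)
        if m_i:                            # a_j^{i+1} = a_j^i + m_i r_j^i
            A = [a ^ r for a, r in zip(A, R)]
        R = R[1:] + [z_auth]               # shift in the authentication bit
    return c, A
```

The accumulator is updated with the register content \(R_i\) before the register is shifted, matching the index i on both sides of the update equations. With the padding bit, the messages m and m\(\Vert 0\) generally end in different accumulator values because the final addition uses a different register content.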

3 Design Rationale

This section presents a short overview of the Grain stream ciphers and how the design has evolved through the different versions. It also enumerates and discusses the differences between Grain-128a and the proposed Grain-128AEAD.

3.1 A Short History of the Grain Family of Stream Ciphers

The Grain family of stream ciphers is based on the idea behind the nonlinear filter generator. In a nonlinear filter, an LFSR is used to provide a sequence with large period, and a nonlinear function, taking parts of the LFSR sequence as input, is used to add nonlinearity to the keystream sequence. Much work has been put into analyzing the nonlinear filter generator and it is clear that it is very difficult to design a secure nonlinear filter generator with a reasonable hardware footprint [13]. In particular, algebraic attacks have been shown to be very strong against this design, see e.g., [17, 41].

In order to better withstand algebraic attacks, and to make the relation between state/key and keystream more complex, Grain adds an NFSR to the nonlinear combiner. The initial submission to the ECRYPT eSTREAM project was analyzed in [10, 37], showing that the nonlinear functions required higher resiliency and nonlinearity. The modified design was subsequently published as Grain v1 [28] and was later selected for the final portfolio in eSTREAM. Grain v1 uses an 80-bit key, and a 128-bit key variant was proposed in [27]. Based on previous results on the Grain construction, Grain-128 was more aggressively designed, making the nonlinear NFSR feedback function of degree 2, but with high nonlinearity and resiliency. The relatively small functions compensated for the fact that the shift registers were increased to 128 bits each, which increased the hardware footprint. The low degree functions were exploited in [3, 44] in order to cryptanalyze a significant number of initialization rounds. These results suggested that the nonlinear functions needed a higher security margin. Grain-128a was proposed in [50], and in addition to increasing the degree of the nonlinear feedback function, an optional authentication mode was added. The attacks on Grain-128 were subsequently improved [19, 20, 21, 35], emphasizing the need for more complex Boolean functions; Grain-128 is now considered broken and should not be used. The design proposed in this paper, Grain-128AEAD, is closely based on Grain-128a, using the same feedback and output functions. However, slight modifications have been made in order to add security and make it resistant to the attack proposed in [46].

3.2 Differences Between Grain-128AEAD and Grain-128a

Grain-128AEAD takes Grain-128a as starting point, but introduces a number of slight modifications. The modifications are primarily motivated by the NIST Lightweight Cryptography Standardization Process, but inspiration also comes from recent results in [25, 46].

Larger MACs. The register and the accumulator have been increased to 64 bits (instead of 32 bits) in order to allow for authentication tags (MACs) of size 64 bits.

No Encryption-Only Mode. Grain-128a allowed for an operation mode with only encryption, where the authentication was removed. This mode resulted in smaller hardware footprint since the two additional registers, and their associated logic, could be left out from an implementation. The encryption-only mode was also more efficient since the initialization process does not include initializing the register and the accumulator, and every pre-output bit was used as keystream. The proposed Grain-128AEAD is a pure authenticated encryption algorithm, and authentication of data is always supported. Thus, there is only one mode of operation.

Initialization Hardening. Based on the ideas in [25] and used in Lizard [26], Grain-128AEAD re-introduces the key into the internal state during the initialization clock cycles. More specifically, it is serially shifted into the LFSR in parallel to the initialization of the register and the accumulator. Several variants can be considered here, including where and when to add the key. The LFSR is chosen due to the fact that if the LFSR is recovered (e.g., in a fast correlation attack as in [46]), it is comparably easy to recover the NFSR state. Moreover, since the LFSR output is XORed with the NFSR input, the key bits will continue to affect also the NFSR during pre-output generation. As for when, we choose to re-introduce it during the last 128 clocks of the initialization. This provides maximum separation between its first introduction in the key loading part, where the key is loaded into the NFSR, and when it is re-introduced. For example, relations between keys are more difficult to exploit if the key is properly mixed into the state before the key is re-introduced.

By introducing the key as the last part of the initialization, we achieve the attractive effect that a state recovery attack does not immediately imply key recovery, as was the case for previous versions of Grain. While a state recovery would still render the cipher to be considered broken, the practical effect to deployed devices is highly limited. Recovering the state will only compromise the security of the current message, and not all messages using the same key.

Keystream Limitation. Grain stream ciphers have been designed to allow for encrypting large chunks of data using the same key/IV pair. Previously, the Grain ciphers have not had any explicit limitation on the keystream length. However, to rule out attacks that use very large keystream sequences, Grain-128AEAD restricts the number of keystream bits for each key/IV to \(2^{80}\). We believe that this is well above what will be needed in the foreseeable future. Restricting the number of keystream bits will also make attacks that use linear approximations more difficult, e.g., [46].

4 Security Analysis

The security of the Grain family of stream ciphers has been investigated by a large number of third-party analysts, publishing various analysis results on the different variants of Grain. Since its first introduction in 2005, much has been learned about the construction and the design approach. There have also been several published ciphers inspired by the design, e.g., Sprout [2] and its successor Plantlet [42]. Also Fruit [23] and Fruit-80 [1] are based on the same design idea. These ciphers have in common that they attempt to realize extremely resource-constrained encryption. To minimize the hardware footprint, the key is assumed to be stored in non-volatile memory (NVM) on a device, and this memory is made part of the cryptographic algorithm. Since the key needs to be stored on a device anyway, using the key directly from NVM in the algorithm does not add hardware to the construction. This is not the case for Grain, as we allow the key to be updated in the device, and the key storage is not a part of the cipher. Still, the fact that the above-mentioned ciphers use the Grain design idea shows that the design seems to be very suitable for lightweight cryptography.

4.1 General Security Analysis

A main class of attacks on stream ciphers is the Time/Memory/Data tradeoff (TMD-TO) attack, an efficient method of finding either the key or the state of ciphers by balancing between time, memory and keystream data. This can sometimes be much more efficient and more practically applicable than a simple exhaustive key search attack. Some stream ciphers are vulnerable to TMD-TO attacks and their effective key lengths could then be reduced. This typically happens if the state size is too small. A famous practical TMD-TO attack on A5/1 was given in [12].

A TMD-TO attack consists of two parts. The first is a preprocessing phase, during which a table is constructed. The mapping of different keys or internal states to some keystream segment is computed and stored in the table. It is sorted on keystream segments and this process is assumed to use time complexity P and memory M. In the second (real-time) phase, the attacker has intercepted D keystream segments and searches for a collision with the table with time complexity T. A collision will recover the corresponding input. By a trade-off between the parameters P, D, M, and T, attackers can devise attacks according to available time, memory and data. Examples of tradeoffs are Babbage-Golic (BG) [4, 24] and Biryukov-Shamir (BS) [11] with curves \({TM = N}\), \({P = M}\) with \({T \le D}\); and \({MT^2D^2=N^2}\), \({P=N/D}\) with \({T \ge D^{2}}\), respectively, where N is the size of the input space.

For Grain-128AEAD, attackers have no direct way to reconstruct the internal state, since the cipher has an internal state of size 256 bits (128-bit LFSR + 128-bit NFSR), i.e. \(N=2^{256}\). The best attack complexity achieved under BG tradeoff is with \({T = M = D = N^{1/2} = 2^{128}}\), which is not favourable compared to exhaustive key search. Also the BS tradeoff does not give complexity parameters of particular interest. Some improvements to TMD-TO attacks can be achieved through so called BSW sampling [12] and the performance of such an approach is characterized by the sampling resistance of the stream cipher. Various generalizations of the concept of sampling resistance can be considered, e.g. [34], but it seems unlikely that this will lead to an attack with better performance than a standard Hellman-type time-memory tradeoff attack on the keyspace, a generic attack applicable to any cipher. Also, our limit on the length of keystreams affects such attacks.
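As a worked illustration of these numbers, both curves can be evaluated for \(N = 2^{256}\). The BG point is the one stated above; the BS point (\(D = 2^{64}\), \(T = D^2 = 2^{128}\)) is chosen here purely as an example satisfying \(T \ge D^2\) and is not taken from the cited analyses.

$$\begin{aligned} \text {BG:}\quad&TM = 2^{128}\cdot 2^{128} = 2^{256} = N, \qquad P = M = 2^{128}, \\ \text {BS:}\quad&MT^2D^2 = 2^{128}\cdot \left( 2^{128}\right) ^2\cdot \left( 2^{64}\right) ^2 = 2^{512} = N^2, \qquad P = N/D = 2^{192}. \end{aligned}$$

In both cases the online time is no better than the \(2^{128}\) of exhaustive key search, and in the BS example the precomputation alone exceeds it.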

Another class of general attacks is algebraic attacks, where the attacker derives a system of nonlinear equations in unknown key bits or unknown state bits and then solves the system. In general, a system of nonlinear equations is not known to be solvable in polynomial time, but there might be special cases that can be solved efficiently [16]. Due to the NFSR, the degree of the equations will gradually increase, and it does not look promising to try to derive a system of nonlinear equations, both because of this property and because of the algebraic degree of the h function.

A general cryptanalytic technique is a guess-and-determine attack, where one guesses parts of the state and then from the keystream tries to determine other parts of the state. The goal is to guess as few positions as possible and determine as many as possible from equations involving the keystream. Again, since the dependence between a keystream symbol and the state includes many different positions in the state and some of them in nonlinear expressions, one has to guess a large portion of state variables in order to use an equation to determine a single state variable.

Being a binary additive stream cipher, Grain-128AEAD does not allow reuse of a key/IV pair since this will leak information about the corresponding plaintexts. Moreover, since Grain-128AEAD closely resembles Grain-128a, a key/IV pair used in one cipher may also not be reused in the other. Such cross-cipher key/IV reuse in a related cipher model is outside the security model of Grain-128AEAD.

In the subsequent subsections, we now describe the attacks that we consider as the main threat against lightweight stream ciphers in general and Grain-128AEAD in particular.

4.2 Correlation Attacks

Grain-128a was designed to resist conventional (fast) correlation attacks that exploit correlations between the state of the LFSR and the corresponding keystream. A fast correlation attack on small-state Grain-like stream ciphers was devised in [49]. Due to a much bigger state, this attack does not apply to Grain-128a. On the other hand, a recent paper [46] reveals that there are multiple linear approximations in Grain-128a that, together with a viewpoint based on a finite field, allow a fast correlation attack on the raw encryption mode of Grain-128a (and on the other members of the Grain family), where every keystream bit is assumed to be accessible by an opponent. This attack recovers the state of Grain-128a with data and time complexity of about \(2^{114}\). The data needs to come from the same secret key and the same IV.

It should be noted that this fast correlation attack does not apply to Grain-128a in authentication mode, as then only every second key stream bit may be accessible to an opponent. Thus, it does not apply to Grain-128AEAD.

4.3 Chosen IV Attacks

A variety of chosen IV attacks on Grain have been proposed, in both the fixed key scenario and the related key setting, and either for distinguishing purposes or for key recovery. In a fixed key scenario, chosen IV attacks have been devised on reduced-round versions using conditional differentials and using cube attacks, or combinations of both [22, 38, 39, 40]. On Grain-128, a dynamic cube attack has been developed that succeeds in finding the secret key for the full 256-round initialization for a fraction of keys [19]. Dynamic cube attacks have not been successful on Grain-128a thus far. Most of these results are experimental in nature, and work only if the computational effort is practically feasible.

More recently, the division property has been developed to improve cube attacks. The division property is an iterated technique for integral distinguishers introduced by Todo in [45] and was applied initially to block ciphers. It turned out that it also applies to the initialization of stream ciphers, not only for distinguishers but also for key recovery. As opposed to conventional cube attacks, it can provide theoretical results. The latest result on Grain-128a in this direction is a key recovery on 184 initialization rounds [47]. The data complexity is \(2^{95}\), and the computational complexity corresponds to about \(2^{110}\) operations.

An attack that reaches the largest number of initialization rounds of Grain-128a in a fixed key scenario thus far is a conditional differential distinguishing attack and reaches 195 initialization rounds, but it works only for a fraction of all keys, [40].

The relevance of related key cryptanalysis of stream ciphers has been a subject of debate. A related key attack on Grain-128a in [18] recovers the secret key with a computational complexity \(2^{96}\), requiring \(2^{96}\) chosen IVs and about \(2^{104}\) keystream bits. It requires only 2 related keys. Another related key attack in [8] recovers the secret key using \(2^{64}\) chosen IVs and \(2^{32}\) related keys, where these figures need to be multiplied by some factor (about \(2^8\)). Due to the modified initialization procedure, related key attacks on Grain-128AEAD are expected to be less efficient than those against Grain-128a.

4.4 Fault Attacks

In the scenario of fault attacks on stream ciphers, the attacker is allowed to inject faults into the internal state, which means either flipping a binary value in memory or setting a value to zero. By analyzing the difference in keystreams for the faulty and the fault-free case, one attempts to deduce complete or partial information about the internal state or the secret key. Fault attacks on stream ciphers have recently received some attention, starting with the work of Hoch and Shamir [29]. The most common methods of injecting faults are by using a laser or through clock glitches. Fault attacks usually rely on assumptions that are beyond the model of cryptanalysis and for this reason one can often find rather efficient fault attacks on most ciphers. In some scenarios they are, however, not unrealistic and the exact complexity and the related requirements are of interest to study.

Fault attacks on the Grain family of stream ciphers were studied in [15] and [36]. More recently, a number of papers have provided improved attacks [5, 6, 7, 43]. In [43] the model is the most realistic one as it considers that the cipher has to be re-initialized only a few times and faults are injected at any random location and at any random clock cycle. No further assumptions are needed on the location and timing of injections. In the attack one constructs algebraic equations based on the description of the cipher by introducing new variables so that the degrees of the equations do not increase. Following algebraic cryptanalysis, such equations based on both fault-free and faulty keystream bits are collected. Then a solving phase using a SAT solver recovers the state of any Grain member in minutes. For Grain v1, Grain-128 and Grain-128a, it uses only 10, 4 and 10 injected faults, respectively.

We stress that we are not claiming resistance against fault attacks for Grain-128AEAD. Rather, when fault attacks are a realistic threat, one has to implement protection mechanisms against fault injection.

5 Implementation

Lightweight ciphers are important in constrained devices. A minimal design is desirable, e.g., minimum area and very low power consumption, since such devices often must operate for an extended period of time without a battery change. In some cases, devices run without their own power supply, something that is often the case with RFID tags.

Table 1. The gate count for different functions.

Grain-128AEAD can be constructed using primitive hardware building blocks, such as NAND gates, XOR gates and flip flops. In order to get an idea of the hardware footprint related to an implementation of the cipher, we implement the stream cipher using 65 nm library from ST Microelectronics, stm065v536. For synthesis and power simulation, the Synopsys Design Compiler 2013.12 is used. It can be noted that the result is highly dependent on what kind of gates are available and how the tool utilizes the standard cells. We define a 2-input NAND gate to have a gate count of 1 and other gate counts are given in relation to this NAND gate. An excerpt from the standard-cell library documentation is given in Table 1.

Table 2. Gate count for the different building blocks, for different levels of parallelization, s.

We synthesize the design and extract the gate count for each building block. A summary of the gate count for each building block, and for different parallelization levels, is given in Table 2. The control logic and accumulator logic are the extra circuitry and state machines for controlling the stream cipher, i.e., loading key and IV, multiplexing data, etc.

The gate count remains constant during synthesis, but the physical area, power and speed change based on the optimization techniques employed. First, we synthesize the design at a clock frequency of 100 kHz. The design is synthesized for three levels of parallelization: 1, 2, and 32 times. The result is given in Table 3.

Table 3. Implementation results running at 100 kHz, for different levels of parallelization.

We also synthesize for the maximum possible speed, to achieve maximum throughput, without constraints on area. The results are given in Table 4.

Table 4. Implementation results running at maximum possible speed, for different levels of parallelization.

6 Conclusions

We have presented Grain-128AEAD, a new cipher in the Grain family. It is closely based on Grain-128a and takes advantage of the well-analyzed design principle behind the Grain stream ciphers. By making slight modifications to Grain-128a, the cipher meets the requirements in the NIST lightweight standardization process, providing a 64-bit MAC, a 128-bit key and a 96-bit IV. The hardware footprint makes the cipher well suited for constrained environments, but the design is flexible enough to also meet very high speed requirements at the expense of additional hardware.