1 Introduction

Authenticated encryption (AE) is a symmetric-key cryptographic primitive that provides both confidentiality and authenticity. With the recent rise of communication networks built from small devices (the era of the so-called Internet of Things), AE is expected to play a key role in securing these networks.

In this paper, we study blockcipher modes for AE with a primary focus on the hardware implementation size. Here, we consider the overhead in size; thus, the state memory size beyond the underlying blockcipher itself (including the key schedule) is the criterion we want to minimize, which is particularly relevant for hardware implementation. We observe that this direction had not received much attention until the launch of the CAESAR competition (see below), although it is relevant for future communication devices requiring ultra-low-power operation.

Generic Approaches One generic approach for reducing the implementation size of blockcipher modes is to use lightweight blockciphers. They cover a broad range of use cases where the standard AES is not suitable due to implementation constraints, and one of the major design criteria is area minimization. One of the most popular lightweight blockciphers is PRESENT [21], proposed in 2007. Since then, many have been proposed in the last decade, such as KATAN [24], LED [36], PICCOLO [62], PRINCE [23] and TWINE [63]. SIMON and SPECK [17] were proposed by the NSA in 2014. More recent designs are SKINNY (a tweakable blockcipher [18]) and GIFT [15, 16].

The other approach is to use the standard AES implemented in a tiny, serialized core [51]; the latter is shown to be effective for various schemes, including the popular CCM [5] and OCB [43] modes, as shown in [22] and [13]. Still, this requires a much larger number of clock cycles per AES encryption than the standard round-based implementation and hence is not desirable when speed or energy is also a criterion in addition to size.

AE Modes with Small Memory CAESAR [3] is a competition for AE that started in 2012. It attracted 57 AE schemes, some of which were designed to minimize the implementation size while operating as a blockcipher mode (i.e., using a blockcipher as a black box). Among them, JAMBU [66] is one of the most relevant modes for our purpose, as it can be implemented with \((1.5n+k)\)-bit state memory, using an n-bit blockcipher with a k-bit key. However, a provable security result has not been published for this scheme, and the security claim about confidentiality in the nonce-misuse scenario was shown to be flawed [54]. We also point out that the rate of JAMBU is 1/2, i.e., it makes two blockcipher calls to process one input block. CLOC and SILC [39, 40] have provable security results and were designed to minimize the implementation size; however, they have \((2n+k)\)-bit state memory, and their rate is also 1/2.

NIST Lightweight Cryptography Project Recently, the growing importance of lightweight applications has also been addressed by NIST’s lightweight cryptography project [48], which recognizes the apparent lack of suitable AE standards for lightweight applications. The project highlighted the requirements against the backdrop of several emerging applications, such as sensor networks, health care, and distributed control systems, where highly resource-constrained devices communicate among themselves.

A variant [14] of the COFB mode instantiated with the GIFT blockcipher has been submitted to the NIST lightweight crypto project [48]. This submission incorporates a longer nonce to be compatible with the submission guidelines and adopts a hardware-optimized feedback function with a lower XOR count, at the cost of a one-bit degradation in authenticity.

We next summarize our contributions.

A New Type of Feedback Function To reduce the state memory, it is natural to use feedback from the blocks involved in each blockcipher call, at the cost of losing parallelizability. There are existing feedback modes (such as the ciphertext feedback of CBC encryption); however, we found that none of them fulfills our needs. We first formalize the feedback function as a linear function that takes the blockcipher output (Y) and a plaintext block (M) and produces the corresponding ciphertext block (C) and the chaining value used as the next blockcipher input (X). This formalization covers all previous popular feedback functions. Then, we propose a new type of feedback function, called combined feedback, where X is a linear function (not a simple XOR) of M and Y. We show that if this linear function satisfies certain conditions, we can build a provably secure, small-state AE scheme. We first present a mode of a tweakable random function, which takes an additional input called a tweak in addition to the n-bit block input, to demonstrate the effectiveness of combined feedback and to provide intuition for the provable security. The proposed scheme, iCOFB (for idealized COmbined FeedBack), has quite high provable security, comparable to \({\Theta }\)CB3 presented in the proof of OCB3 [43], and has small memory (an n-bit block memory plus the memory needed for the primitive). In addition, it needs one primitive call to process an n-bit message block.

Blockcipher AE Mode with Combined Feedback Function Starting from iCOFB, we take a further step and propose a blockcipher mode using combined feedback. The main obstacle is the instantiation of the tweakable random function (or, equivalently, a tweakable blockcipher [47]) using a blockcipher. We could use an existing tweakable blockcipher mode for this purpose, e.g., XEX [55] by Rogaway, and thanks to the standard birthday-type security of XEX, the resulting blockcipher mode would also have standard birthday-type security. However, the implementation of XEX or similar constructions needs n-bit memory for the input mask to the blockcipher, in addition to the main n-bit state block, implying \((2n+k)\)-bit state memory. Therefore, instead of relying on existing tweakable blockcipher modes, we instantiate the tweakable random function using only an n/2-bit mask and provide a dedicated security proof for our final proposal (mode), which we call COFB. We show that COFB achieves almost birthday-bound security, roughly up to \(O(2^{n/2}/n)\) queries, based on the standard PRP assumption on the blockcipher.

COFB needs an n/2-bit register for the mask in addition to the registers required for holding the round keys and the internal n-bit state for the blockcipher computation. Hence, the state size of COFB is \(1.5n+k\) bits. The rate of COFB is 1, i.e., it makes one blockcipher call to process one input block, meaning it is as fast as encryption-only modes. On the downside, \(\textsf {COFB}\) is completely serial for both encryption and decryption, which is inherent to the use of combined feedback. However, we argue that this is a reasonable trade-off, as tiny devices are the primary target platform for \(\textsf {COFB}\). See Table 1 for a comparison of COFB with other modes. The description and the security analysis of COFB in Sects. 4 and 5 appeared in the proceedings version of our paper at CHES 2017 [25].

Instantiations and Hardware Implementations We instantiate and implement COFB with the 128-bit version of the blockcipher AES, known as AES-128. We also implement COFB with the 128-bit version of the blockcipher GIFT (described as GIFT-128 in [15, 16]) to gauge the lightweight potential of the COFB mode, i.e., how small (in hardware area) it can be with a lightweight blockcipher. For the sake of completeness, we compare our implementation figures with various schemes (not limited to blockcipher modes) listed in the hardware benchmark framework ATHENa [1]. The implementation details of COFB[AES] have already been described in [25, 26]. COFB[AES] shows impressive performance figures both in size and in speed compared to other AES-based AE modes. Moreover, COFB implemented with GIFT achieves a much smaller area than COFB[AES] and is competitive even with ad hoc designs (see Sect. 6). The implementation details of COFB[GIFT] are also described in Sect. 6, which is a new contribution compared to [25]. For the sake of fair benchmarking, we also provide CAESAR hardware API-supported hardware implementation results for both versions. We believe this comparison demonstrates the good performance of COFB even with the standard AES-128 and suggests that COFB with a lightweight blockcipher approaches the limit of blockcipher-based AE in speed and size.

Table 1 Comparison of AE modes, using an n-bit blockcipher with k-bit keys

2 Preliminaries

Notation We fix a positive integer n which is the block size in bits of the underlying blockcipher \(E_K\). Typically, we consider \(n = 128\) and AES-128 [8] is the underlying blockcipher, where K is the 128-bit AES key. The empty string is denoted by \(\lambda \). For any \(X \in \{0,1\}^{*}\), where \(\{0,1\}^*\) is the set of all finite bit strings (including \(\lambda \)), we denote the number of bits of X by |X|. Note that \(|\lambda | = 0\). For two bit strings X and Y, \(X\Vert Y\) denotes the concatenation of X and Y. A bit string X is called a complete (or incomplete) block if \(|X| = n\) (or \(|X| < n\), respectively). We write the set of all complete (or incomplete) blocks as \(\mathcal {B}\) (or \(\mathcal {B}^{<}\), respectively). Let \(\mathcal {B}^{\le } = \mathcal {B}^{<} \cup \mathcal {B}\) denote the set of all blocks. For \(B\in \mathcal {B}^{\le }\), we define \(\overline{B}\) as follows:

$$\begin{aligned} \overline{B} = {\left\{ \begin{array}{ll} 0^{n} &{}\quad \text{ if } B = \lambda \\ B \Vert 10^{n-1- |B|} &{}\quad \hbox { if}\ 0<|B|<n \\ B &{}\quad \hbox { if}\ |B|=n \end{array}\right. } \end{aligned}$$
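A minimal transcription of this padding rule in Python (our own illustration, representing bit strings as Python strings over '0'/'1'; the function name `pad` is ours):

```python
def pad(B: str, n: int = 128) -> str:
    """10*-padding of a (possibly incomplete or empty) block B, |B| <= n, to n bits."""
    if len(B) == 0:          # empty block: all-zero block
        return "0" * n
    if len(B) == n:          # complete block: left unchanged
        return B
    # incomplete block: append a single 1 bit, then 0s up to length n
    return B + "1" + "0" * (n - 1 - len(B))
```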

Given non-empty \(Z\in \{0,1\}^*\), we define the parsing of Z into n-bit blocks as

$$\begin{aligned} (Z[1], Z[2],\ldots ,Z[z]) \xleftarrow {\scriptscriptstyle n} Z, \end{aligned}$$
(1)

where \(z=\lceil |Z|/n \rceil \), \(|Z[i]|=n\) for all \(i<z\) and \(1\le |Z[z]|\le n\) such that \(Z=(Z[1]\,\Vert \,Z[2]\,\Vert \,\ldots \,\Vert \,Z[z])\). If \(Z=\lambda \), we let \(z=1\) and \(Z[1]=\lambda \). We write \(||Z|| = z\) (the number of blocks present in Z). We similarly write \((Z[1], Z[2],\ldots ,Z[z]) \xleftarrow {\scriptscriptstyle m} Z\) to denote the parsing of the bit string Z into m-bit strings \(Z[1], Z[2],\ldots , Z[z-1]\) and a final block Z[z] with \(1 \le |Z[z]| \le m\). Given any sequence \(Z = (Z[1], \ldots , Z[s])\) and \(1 \le a \le b \le s\), we represent the subsequence \((Z[a], \ldots , Z[b])\) by \(Z[a\ldots b]\). For integers \(a \le b\), we write \([a\ldots b]\) for the set \(\{a,a+1,\ldots ,b\}\). For two bit strings X and Y with \(|X| \ge |Y|\), we define the extended xor-operation as
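The parsing convention can be sketched as follows (Python, function name ours; the empty string parses to a single empty block, matching the convention above):

```python
def parse(Z: str, n: int = 128):
    """Split bit string Z into n-bit blocks; the last block has between 1 and n bits."""
    if Z == "":
        return [""]          # z = 1 and Z[1] is the empty string
    return [Z[i:i + n] for i in range(0, len(Z), n)]
```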

$$\begin{aligned} X\underline{\oplus }Y&= X[1\ldots |Y|]\oplus Y \text { and }\nonumber \\ X~\overline{\oplus }~Y&= X \oplus (Y\Vert 0^{|X|-|Y|}), \end{aligned}$$

where \((X[1], X[2],\ldots ,X[x]) \xleftarrow {\scriptscriptstyle 1}X\), and thus, \(X[1\ldots |Y|]\) denotes the first |Y| bits of X. Also note that,

$$\begin{aligned} X \underline{\oplus } Y = (X \overline{\oplus } Y)[1\ldots |Y|]. \end{aligned}$$

When \(|X|=|Y|\), both operations reduce to the standard \(X\oplus Y\).
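The two operations can be transcribed directly (a Python sketch under the same bit-string representation as above; function names ours):

```python
def bitxor(a: str, b: str) -> str:
    """Bitwise XOR of two equal-length bit strings."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))

def under_xor(X: str, Y: str) -> str:
    """X underline-xor Y: XOR Y into the first |Y| bits of X (output length |Y|)."""
    return bitxor(X[:len(Y)], Y)

def over_xor(X: str, Y: str) -> str:
    """X overline-xor Y: XOR the zero-padded Y into X (output length |X|)."""
    return bitxor(X, Y + "0" * (len(X) - len(Y)))
```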

Let \(\gamma = (\gamma [1], \ldots , \gamma [s])\) be a tuple of equal-length strings. We define \(\textsf {mcoll}(\gamma ) = r\) if there exist distinct \(i_1, \ldots , i_r \in [1\ldots s]\) such that \(\gamma [{i_1}] = \cdots = \gamma [{i_r}]\) and r is the maximum such integer. We say that \(\{i_1, \ldots , i_r\}\) is an r-multi-collision set for \(\gamma \). Finally, we use the notation \(E^c\) to denote the complement of an event E.
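In other words, \(\textsf {mcoll}\) is just the maximum multiplicity of a value in the tuple (a Python sketch; the name is reused from the text):

```python
from collections import Counter

def mcoll(gamma) -> int:
    """Largest r such that some value occurs r times in the (non-empty) tuple gamma."""
    return max(Counter(gamma).values())
```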

Authenticated Encryption and Security Definitions An authenticated encryption (AE) is an integrated scheme that provides both privacy of a plaintext \(M \in \{0,1\}^*\) and authenticity of M as well as associated data \(A \in \{0,1\}^*\). Taking a nonce N (a value unique for each encryption) together with associated data A and plaintext M, the encryption function of AE, \(\mathcal {E}_K\), produces a tagged ciphertext \((C, T)\), where \(|C| = |M|\) and \(|T| = t\). Typically, t is fixed, and we assume \(n=t\) throughout the paper. The corresponding decryption function, \(\mathcal {D}_K\), takes \((N, A, C, T)\) and returns the decrypted plaintext M when the verification of \((N, A, C, T)\) is successful; otherwise, it returns the atomic error symbol denoted by \(\bot \).

Privacy Given an adversary \(\mathcal {A}\), we define the PRF advantage of \(\mathcal {A}\) against \(\mathcal {E}\) as \(\mathbf{Adv }_{\mathcal {E}}^{\text {prf}}(\mathcal {A}) = |\Pr [\mathcal {A}^{\mathcal {E}_K} = 1] - \Pr [\mathcal {A}^{\$} = 1]|\), where \(\$ \) returns a random string of the same length as the output of \(\mathcal {E}_K\), assuming that the output length of \(\mathcal {E}_K\) is uniquely determined by the query. The PRF advantage of \(\mathcal {E}\) is defined as

$$\begin{aligned} \mathbf{Adv }_{\mathcal {E}}^{\text {prf}}(q, \sigma , t) = \max _{\mathcal {A}} \mathbf{Adv }_{\mathcal {E}}^{\text {prf}}(\mathcal {A})\, , \end{aligned}$$

where the maximum is taken over all adversaries running in time t and making q queries with the total number of blocks in all the queries being at most \(\sigma \). If \(\mathcal {E}_K\) is the encryption function of an AE scheme, we call it the privacy advantage and write it as \(\mathbf{Adv }_{\mathcal {E}}^{\text {priv}}(q, \sigma , t)\), defined as the maximum over all nonce-respecting adversaries (that is, the adversary can arbitrarily choose nonces provided all nonce values in the encryption queries are distinct).

Authenticity We say that an adversary \(\mathcal {A}\) forges an AE scheme \((\mathcal {E},\mathcal {D})\) if \(\mathcal {A}\) is able to compute a tuple \((N, A, C, T)\) satisfying \(\mathcal {D}_K(N, A, C, T) \ne \bot \), without querying \((N, A, M)\) for some M to \(\mathcal {E}_K\) and receiving \((C, T)\); i.e., \((N, A, C, T)\) is a non-trivial forgery.

In general, a forger is nonce-respecting with respect to encryption queries but can make \(q_f\) forging attempts without any restriction on N in the decryption queries; that is, N can be repeated across decryption queries, and an encryption query and a decryption query can use the same N. The forging advantage for an adversary \(\mathcal {A}\) is written as \(\mathbf{Adv }_{\mathcal {E}}^{\text {auth}}(\mathcal {A}) = \Pr [\mathcal {A}^{\mathcal {E}_K,\mathcal {D}_K} \text{ forges }]\), and we write

$$\begin{aligned} \mathbf{Adv }_{\mathcal {E}}^{\text {auth}}((q_e, q_f), (\sigma _e, \sigma _f), t) = \max _{\mathcal {A}} \mathbf{Adv }_{\mathcal {E}}^{\text {auth}}(\mathcal {A}) \end{aligned}$$

to denote the maximum forging advantage for all adversaries running in time t, making \(q_e\) encryption and \(q_f\) decryption queries with total number of queried blocks being at most \(\sigma _e\) and \(\sigma _f\), respectively.

Unified Security Notion for AE The privacy and authenticity advantages can be unified into a single security notion as introduced in [34, 57]. Let \(\mathcal {A}\) be an adversary that only makes non-repeating queries to \(\mathcal {D}_K\). Then, we define the AE advantage of \(\mathcal {A}\) against \(\mathcal {E}\) as

$$\begin{aligned} \mathbf{Adv }_{\mathcal {E}}^{\text {AE}}(\mathcal {A}) = |\Pr [\mathcal {A}^{\mathcal {E}_K, \mathcal {D}_K} = 1] - \Pr [\mathcal {A}^{\$, \bot } = 1]|, \end{aligned}$$

where the \(\bot \)-oracle always returns \(\bot \) and the \(\$ \)-oracle is as in the privacy definition. We similarly define \(\mathbf{Adv }_{\mathcal {E}}^{\text {AE}}((q_e, q_f), (\sigma _e, \sigma _f), t) = \max _{\mathcal {A}} \mathbf{Adv }_{\mathcal {E}}^{\text {AE}}(\mathcal {A})\), where the maximum is taken over all adversaries running in time t, making \(q_e\) encryption and \(q_f\) decryption queries with the total number of blocks being at most \(\sigma _e\) and \(\sigma _f\), respectively.

Blockcipher Security We use a blockcipher E as the underlying primitive, and we assume the security of E as a PRP (pseudorandom permutation). The PRP advantage of a blockcipher E is defined as \(\mathbf{Adv }_{E}^{\text {prp}}(\mathcal {A}) =|\Pr [\mathcal {A}^{E_K} = 1] - \Pr [\mathcal {A}^{\mathsf {P}} = 1]|\), where \(\mathsf {P}\) is a random permutation uniformly distributed over all permutations over \(\{0,1\}^n\). We write

$$\begin{aligned} \mathbf{Adv }_{E}^{\text {prp}}(q,t) = \max _{\mathcal {A}} \mathbf{Adv }_{E}^{\text {prp}}(\mathcal {A}), \end{aligned}$$

where the maximum is taken over all adversaries running in time t and making q queries. Here, \(\sigma \) does not appear as each query has a fixed length.

Coefficients-H Technique We outline the Coefficients-H technique developed by Patarin, which serves as a convenient tool for bounding the advantage (see [53, 64]). We will use this technique (without giving a proof) to prove our main theorem. Consider two oracles \(\mathcal {O}_0 = (\$, \bot )\) (the ideal oracle for the relaxed game) and \(\mathcal {O}_1\) (the real one, i.e., our construction in the same relaxed game). Let \(\mathcal {V}\) denote the set of all possible views an adversary can obtain. For any view \(\tau \in \mathcal {V}\), we denote the probability of realizing the view when interacting with the real (or ideal) oracle by \({\mathsf {ip}_{\mathsf {real}}}(\tau )\) (or \({\mathsf {ip}_{\mathsf {ideal}}}(\tau )\), respectively). We call these interpolation probabilities. Without loss of generality, we assume that the adversary is deterministic and fixed. Then, the probability space for the interpolation probabilities is uniquely determined by the underlying oracle. As we deal with stateless oracles, these probabilities are independent of the order of query responses in the view. Suppose we have a set of views \(\mathcal {V}_{\text {good}} \subseteq \mathcal {V}\), which we call good views, such that the following conditions hold:

  1. In the game involving the ideal oracle \(\mathcal {O}_0\) (and the fixed adversary), the probability of getting a view in \(\mathcal {V}_{\text {good}}\) is at least \(1-\epsilon _1\).

  2. For any view \(\tau \in \mathcal {V}_{\text {good}}\), we have \(\mathsf {ip}_{\mathsf {real}}(\tau ) \ge (1-\epsilon _2) \cdot \mathsf {ip}_{\mathsf {ideal}}(\tau )\).

Then, we have \(|\Pr [\mathcal {A}^{\mathcal {O}_0}=1]-\Pr [\mathcal {A}^{\mathcal {O}_1}=1]|\le \epsilon _1+\epsilon _2\). The proof can be found in, e.g., [64].
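For intuition, the bound follows by splitting the sum over accepting views into good and bad ones (a standard computation, sketched here; the reverse direction of the absolute value follows by the same argument applied to the rejecting views):

```latex
\begin{align*}
\Pr[\mathcal{A}^{\mathcal{O}_0}=1]-\Pr[\mathcal{A}^{\mathcal{O}_1}=1]
&= \sum_{\tau:\,\mathcal{A}(\tau)=1}
   \bigl(\mathsf{ip}_{\mathsf{ideal}}(\tau)-\mathsf{ip}_{\mathsf{real}}(\tau)\bigr)\\
&\le \sum_{\tau\in\mathcal{V}_{\mathrm{good}}:\,\mathcal{A}(\tau)=1}
     \epsilon_2\,\mathsf{ip}_{\mathsf{ideal}}(\tau)
   + \sum_{\tau\notin\mathcal{V}_{\mathrm{good}}}\mathsf{ip}_{\mathsf{ideal}}(\tau)
 \;\le\; \epsilon_2+\epsilon_1.
\end{align*}
```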

3 Idealized Combined Feedback Mode

In this section, we introduce our idealized combined feedback mode. Let \(E_K\) be the underlying primitive, a blockcipher with key K. Depending on how the next input block of \(E_K\) is determined from the previous output of \(E_K\), a plaintext block, or a ciphertext block, we can categorize different types of feedback modes. Some of them are illustrated in Fig. 1. The first three are known as the message feedback mode, the ciphertext feedback mode, and the output feedback mode, respectively. Examples using the first three modes can be found in basic encryption schemes [4] or AE schemes [5, 39, 40, 69]. The fourth mode, which uses an additional (linear) operation \(G:\mathcal {B}\rightarrow \mathcal {B}\), is new; we call it combined feedback. In the combined feedback mode, the next input block X[i] of the underlying primitive \(E_K\) depends on at least two of the following three values: (1) the previous output \(E_K(X[i-1])\), (2) the plaintext block M[i], and (3) the ciphertext block C[i]. With an appropriate choice of G, this feedback mode turns out to be useful for building small and efficient AE schemes. We provide a unified presentation of all types of feedback functions below.

Fig. 1 Different types of feedback modes. We introduce the last feedback mode (called the combined feedback mode) in our construction

Definition 1

(Feedback Function) A function \(\rho : \mathcal {B} \times \mathcal {B} \rightarrow \mathcal {B} \times \mathcal {B}\) is called a feedback function (for an encryption) if there exists a function \(\rho ': \mathcal {B} \times \mathcal {B} \rightarrow \mathcal {B} \times \mathcal {B}\) (used for decryption) such that

$$\begin{aligned} \forall Y, M \in \mathcal {B},~~ \rho (Y, M) = (X, C) \Rightarrow \rho '(Y, C) = (X, M). \end{aligned}$$
(2)

\(\rho \) is called a plaintext or output feedback if X depends only on M or Y, respectively (e.g., the first and third modes in Fig. 1). Similarly, it is called ciphertext feedback if X depends only on C in the function \(\rho '\) (e.g., the second mode in Fig. 1). All other feedback functions are called combined feedback.

The condition stated in Eq. (2) is sufficient for inverting the feedback computation from the ciphertext. Given the previous output block \(Y=E_K(X[i-1])\) and a ciphertext block \(C= C[i]\), we are able to compute \((X,M)=(X[i], M[i])\) by using \(\rho '(Y,C) \).

In particular, when G is neither the zero function nor the identity function, the combined feedback mode using this G does not reduce to the remaining three modes. It can be described as \(\rho (Y,M) = (X,C)=(G(Y) \oplus M, Y \oplus M)\).
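As a concrete toy instance (our own illustration, not the final specification), one may take G to be multiplication by a primitive element \(\alpha \) of \( GF (2^{64})\): this G is linear and invertible, and \(\varvec{I} + \varvec{G}\) (multiplication by \(1+\alpha \)) is invertible as well. Representing blocks as 64-bit integers:

```python
N = 64
POLY = (1 << 4) | (1 << 3) | (1 << 1) | 1   # reduction term of x^64 + x^4 + x^3 + x + 1
MASK = (1 << N) - 1

def G(y: int) -> int:
    """Multiplication by alpha in GF(2^64): linear, invertible, with I + G invertible."""
    out = (y << 1) & MASK
    if y >> (N - 1):          # reduce when the top bit would overflow
        out ^= POLY
    return out

def rho(Y: int, M: int):
    """Combined feedback: (X, C) = (G(Y) xor M, Y xor M)."""
    return G(Y) ^ M, Y ^ M

def rho_prime(Y: int, C: int):
    """Decryption direction: recover (X, M) from (Y, C)."""
    M = Y ^ C                 # since C = Y xor M
    return G(Y) ^ M, M        # equals ((I + G)Y xor C, Y xor C)
```

One can check that \(\rho '(Y, C) = (X, M)\) whenever \(\rho (Y, M) = (X, C)\), i.e., the inversion condition of Eq. (2) holds for this choice.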

3.1 iCOFB Construction

The idealized version of our construction is described in Fig. 3 and illustrated in Fig. 2. Here we idealize a real, implementable AE construction in several ways; this serves as a warm-up to convey the basic structure of our main construction. In the following construction, we simply assume that the last message block is a complete block. In other words, all messages are elements of \(\mathcal {B}^+ {\mathop {=}\limits ^{\text {def}}}\cup _{i \ge 1} \mathcal {B}^i\). We denote the set of all nonnegative integers by \(\mathbb {Z}_{\ge 0}\). We also consider a tweakable random function \(\mathsf {R}\) which takes a tweak \((N, A, i, j) \in \mathcal {N} \times \{0,1\}^* \times \mathbb {Z}_{\ge 0} \times \mathbb {Z}_{\ge 0}\), where N is a nonce chosen from a nonce space \(\mathcal {N}\), A is associated data, and the pair of nonnegative integers \((i, j)\) is called a position tweak.

Fig. 2 \(\textsf {iCOFB}\): It is based on a tweakable random function \(\mathsf {R}_{N, A, (a,b)}\) and a feedback function \(\rho \). The diagram shows how the tag and ciphertext are computed for a message of three complete blocks

Fig. 3 Encryption and decryption algorithms of the \(\textsf {iCOFB}\) AE mode. Here \(M,C \in \mathcal {B}^+\) and \(\rho , \rho ': \mathcal {B}^2 \rightarrow \mathcal {B}^2\). The choices of these functions are described in Sect. 3.2

3.2 The Feedback Function \(\rho \)

It is easy to see that for a feedback function \(\rho \), the decryption algorithm correctly decrypts a ciphertext. Looking closely into the correctness property, what we need is that, given \((Y, C)\), the value of M is uniquely computable. Once M is computed, X can be computed by applying \(\rho \) again. In this paper, we require \(\rho \) to be a very lightweight function, e.g., a linear function. If \(\rho \) is a linear function, then we can express \(\rho \) by a \(2n \times 2n\) binary matrix

$$\begin{aligned} \begin{pmatrix} E_{1,1} &{}\quad E_{1,2} \\ E_{2,1} &{}\quad E_{2,2} \end{pmatrix} \end{aligned}$$

where the \(E_{i,j}\)’s are \(n \times n\) binary matrices, and line 7 in the encryption algorithm of Fig. 3 becomes

$$\begin{aligned} X[i]&= E_{1,1} \cdot Y[i-1] + E_{1,2} \cdot M[i],\\ C[i]&= E_{2,1} \cdot Y[i-1] + E_{2,2} \cdot M[i]. \end{aligned}$$

We have the following lemma.

Lemma 1

If \(\rho \) is a linear function satisfying Eq. (2), then \(E_{2,2}\) must be invertible.

Proof

If not, then there exist \(M \ne M'\) with \(E_{2,2} \cdot M = E_{2,2} \cdot M'\). Then, for any Y, \(\rho (Y, M) =(X,C)\) and \(\rho (Y, M') = (X', C)\) for some \(X, X'\). However, \(\rho '(Y, C)\) cannot be both \((X, M)\) and \((X', M')\). \(\square \)

Assuming \(E_{2,2}\) is invertible, let \(\rho \) be a linear feedback function satisfying Eq. (2). Then, \(\rho '\) can be chosen to be a linear function defined as follows:

$$\begin{aligned} X[i]&= (E_{1,1} + E_{1,2} E_{2,2}^{-1}E_{2,1}) \cdot Y[i-1] + E_{1,2}E_{2,2}^{-1} \cdot C[i], \\ M[i]&= E_{2,2}^{-1}E_{2,1} \cdot Y[i-1] + E_{2,2}^{-1}\cdot C[i] . \end{aligned}$$

We also express the above system of linear equations as

$$\begin{aligned} \begin{pmatrix} D_{1,1} &{}\quad D_{1,2} \\ D_{2,1} &{}\quad D_{2,2} \end{pmatrix} \cdot \begin{pmatrix} Y[i-1] \\ C[i] \end{pmatrix} = \begin{pmatrix} X[i] \\ M[i] \end{pmatrix} \end{aligned}$$

where the \(D_{i,j}\)’s are \(n \times n\) matrices determined from the above linear equations. In particular, \(D_{1,1} = E_{1,1} + E_{1,2} E_{2,2}^{-1}E_{2,1}\), \(D_{1,2} = E_{1,2}E_{2,2}^{-1}\), \(D_{2,1} = E_{2,2}^{-1}E_{2,1}\), and \(D_{2,2} = E_{2,2}^{-1}\).
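As a sanity check, the following toy computation over GF(2) (with \(2\times 2\) blocks and matrices of our own choosing; all names are ours) verifies that the decryption map built from \(E_{2,2}^{-1}\) indeed inverts \(\rho \) in the sense of Eq. (2):

```python
from itertools import product

def mul(A, B):
    """Matrix product over GF(2)."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % 2 for j in range(n)]
            for i in range(n)]

def mvec(A, v):
    """Matrix-vector product over GF(2)."""
    return [sum(a * x for a, x in zip(row, v)) % 2 for row in A]

def vadd(u, v):
    """Vector addition over GF(2) (i.e., XOR)."""
    return [(a + b) % 2 for a, b in zip(u, v)]

def inv(A):
    """Inverse over GF(2) by Gauss-Jordan elimination (A assumed invertible)."""
    n = len(A)
    M = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col])
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col]:
                M[r] = [(x + y) % 2 for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

# Toy choice of the four n x n sub-matrices (n = 2), with E22 invertible.
E11, E12 = [[1, 1], [0, 1]], [[1, 0], [0, 1]]
E21, E22 = [[1, 0], [0, 1]], [[0, 1], [1, 0]]

def rho(Y, M):
    X = vadd(mvec(E11, Y), mvec(E12, M))
    C = vadd(mvec(E21, Y), mvec(E22, M))
    return X, C

def rho_prime(Y, C):
    M = mvec(inv(E22), vadd(C, mvec(E21, Y)))   # M = E22^{-1} (C + E21 * Y)
    X = vadd(mvec(E11, Y), mvec(E12, M))        # then X as in rho
    return X, M
```

Exhaustively checking \(\rho '(Y, C) = (X, M)\) for all \(Y, M \in \{0,1\}^2\) confirms the correctness of the decryption direction for this toy choice.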

Let \(\varvec{I}\) and \(\varvec{O}\) denote the identity matrix and the zero matrix, respectively, of size n. In all the feedback modes above, the ciphertext block C is defined as \(M \oplus Y\); they differ in how the next input block X is defined. Let \(\rho _{\mathsf {OFB}}\), \(\rho _{\mathsf {CFB}}\), \(\rho _{\mathsf {PFB}}\) denote the feedback functions for the output, ciphertext, and plaintext feedback modes, respectively. Then, we have

$$\begin{aligned} \rho _{\mathsf {OFB}} = \begin{pmatrix} \varvec{I} &{}\quad \varvec{O} \\ \varvec{I} &{}\quad \varvec{I} \end{pmatrix}, \quad \rho _{\mathsf {CFB}} = \begin{pmatrix} \varvec{I} &{}\quad \varvec{I} \\ \varvec{I} &{}\quad \varvec{I} \end{pmatrix}, \quad \rho _{\mathsf {PFB}} = \begin{pmatrix} \varvec{O} &{}\quad \varvec{I} \\ \varvec{I} &{}\quad \varvec{I} \end{pmatrix}. \end{aligned}$$

In this paper, we consider a combined feedback function. By combined feedback, we mean that the four matrices \(E_{1,1}, E_{1,2}, D_{1,1}\) and \(D_{1,2}\) are all nonzero. Note that these matrices represent the influence of the output block and the plaintext or ciphertext block on the next input block. We fix our choice of \(\rho \) (and \(\rho '\)) as

$$\begin{aligned} \rho = \begin{pmatrix} \varvec{G} &{}\quad \varvec{I} \\ \varvec{I} &{}\quad \varvec{I} \end{pmatrix}, \quad \rho ' = \begin{pmatrix} \varvec{I} + \varvec{G} &{}\quad \varvec{I} \\ \varvec{I} &{}\quad \varvec{I} \end{pmatrix} \end{aligned}$$

where \(\varvec{G}\) is an invertible matrix such that \(\varvec{I} + \varvec{G}\) is also invertible (see Fig. 1). We will specify one choice of \(\varvec{G}\) later.

3.3 Security Analysis of the Idealized Construction

In this section, we provide the security analysis of the idealized construction. We prove that, under very minimal assumptions on \(\rho \), the idealized version has perfect privacy and authenticity with negligible advantage. We say that a linear feedback function \(\rho \) is valid (which is true for our choice of the feedback function) if

(P1) \(E_{2,1}\) is invertible,   (A1) \(D_{1,2}\) is invertible, and   (A2) \(D_{1,1}\) is invertible,

where (P1) is needed for the privacy notion, and (A1) and (A2) are needed for the authenticity notion. We remark that the invertibility of \(E_{1,1}\) is not required. Here, (A1) implies that for any two \(C \ne C'\) and for any Y, \(D_{1,1}\cdot Y + D_{1,2}\cdot C \ne D_{1,1}\cdot Y + D_{1,2}\cdot C'\). Note that we assume \(E_{2,2}\) is invertible for correctness. Thus, (A2) means that the \(2n \times 2n\) feedback matrix for \(\rho \) is also invertible. Another important implication of (A2) is the following:

$$\begin{aligned} \Pr [Y{\mathop {\leftarrow }\limits ^{{\scriptscriptstyle \$}}}\mathcal {B}: D_{1,1}\cdot Y + D_{1,2}\cdot C = X] = 2^{-n}, ~\forall (C, X) \in \mathcal {B}^2. \end{aligned}$$

We have the following theorem.

Theorem 1

If \(\rho \) is valid, then for any adversary \(\mathcal {A}\) making \(q_e\) encryption queries and \(q_f\) forging attempts, each consisting of at most \(\ell _f\) blocks, we have

$$\begin{aligned} \mathbf{Adv }_{\mathsf {iCOFB}}^{\text {priv}}(\mathcal {A}) = 0,\quad \mathbf{Adv }_{\mathsf {iCOFB}}^{\text {auth}}(\mathcal {A}) \le \frac{q_f(\ell _f+1)}{2^{n}}. \end{aligned}$$

Proof

We consider an adversary \(\mathcal {A}\) which makes \(q_e\) nonce-respecting encryption queries \((N_i, A_i, M_i)\) and receives \((C_i, T_i)\), \(1 \le i \le q_e\), and makes \(q_f\) decryption queries \((N^*_i, A^*_i, C^*_i, T^*_i)\), \(1 \le i \le q_f\). The intermediate variables Z appearing in both the encryption and decryption algorithms are represented by \(Z_i[j]\) for the jth computation of the ith query, where Z can be A, M, C, X, Y, or t (recall that t is a position tweak, i.e., t is a function of the indices of the current data block). Note that \(T_i\), \(T^*_i\), and the \(N_i\)’s are single blocks.

Perfect Privacy We prove perfect privacy under the assumption that \(E_{2,1}\) is invertible (i.e., (P1)). To show perfect privacy, it suffices to show that \(C_1, \ldots , C_{q_e}\) are uniformly and independently distributed, and this holds provided \(Y_1, \ldots , Y_{q_e}\) are uniformly and independently distributed (due to (P1), which says that, keeping everything else fixed, the influence from \(Y_i[j]\) to \(C_i[j]\) is bijective). Note that \(Y_i[j] = \mathsf {R}_{N_i, A_i, t_i[j]}(X_i[j])\). We know that a tweakable random function returns a random string if the input concatenated with the tweak is fresh. So it is sufficient to show that for all (i, j), \((N_i, A_i, t_i[j], X_i[j])\) is fresh. This is easy to see: \(\mathcal {A}\) is a nonce-respecting adversary, and for any i, the values \(t_i[j]\) are distinct; hence, the \((N_i, t_i[j])\)’s are distinct over all (i, j).

Authenticity Advantage We first consider the case of a single forging attempt. Let \((N^*,A^*,C^*,T^*)\) be the forging attempt, and let \(m^*\) be the length of \(C^*\) in n-bit blocks. If \((N^*, A^*) = (N_\alpha , A_\alpha )\) for some \(1\le \alpha \le q_e\), we define p as the length of the largest common prefix of \(((C_\alpha [1], t_\alpha [1]), \ldots , (C_\alpha [m_\alpha ], t_\alpha [m_\alpha ]))\) and \(((C^*[1], t^*[1]), \ldots , (C^*[m^*], t^*[m^*]))\). Since all \(N_i\)’s are distinct, \(\alpha \) is unique if it exists. We define \(p=0\) if \((N^*, A^*)\ne (N_i, A_i)\) for all \(1\le i\le q_e\).

When \(p>0\), from the definition of tweak \(t[\cdot ]\), it is easy to see that \(p < \min \{m_\alpha , m^*\}\). So, we have

$$\begin{aligned} Y_\alpha [p] = Y^*[p], ~~(C_\alpha [p+1], t_\alpha [p+1]) \ne (C^*[p+1], t^*[p+1]). \end{aligned}$$

Claim

\((N^*, A^*, t^*[p+1], X^*[p+1])\) is fresh among all tweaked inputs.

Proof

(of Claim) We prove this in two sub-cases. We first note that for all \(i \ne p+1\), \(t_\alpha [i] \ne t^*[p+1]\), so it suffices to show that \((t_\alpha [p+1], X_\alpha [p+1]) \ne (t^*[p+1], X^*[p+1])\). If \(C_\alpha [p+1] = C^*[p+1]\), then \(X_\alpha [p+1] = X^*[p+1]\) but \(t_\alpha [p+1] \ne t^*[p+1]\). Similarly, when \(C_\alpha [p+1] \ne C^*[p+1]\), by condition (A1), the next tweaked inputs are distinct. When \(p=0\), \((N^*, A^*) \ne (N_i, A_i)\) for all \(1\le i\le q_e\); hence, the claim holds. \(\square \)

Therefore, \(Y^*[p+1]\) is uniformly distributed given the values obtained so far, including the case \(p=0\). By condition (A2), the next input remains fresh with probability at least \((1 - 2^{-n})\). We can continue this argument until the last tweaked input, so the last tweaked input remains fresh with probability at least \(1 - {(m^*-p)}/{2^{n}}\); and if the last tweaked input is fresh, the n-bit tag is completely random. Hence, the forging probability is at most \((m^*-p)/2^n +1/2^n\). When \(p=0\), the first tweaked input \((N^*, A^*, t^*[0], 0^{n})\) is fresh; hence, the forging probability is at most \((m^*+1)/2^n\). Thus, the case \(p=0\) gives the maximum forging probability \((m^*+1)/{2^{n}}\) for a single attempt.

In the case of \(q_f>1\) forging attempts, the success probability is at most \(q_f\) times the bound for a single forging attempt. Hence, it is at most \({q_f(\ell _f+1)}/{2^{n}}\), as \(m^* \le \ell _f\) by definition. This completes the proof. \(\square \)

Now we see that P1 and A1 are also necessary. For example, if P1 is not satisfied, then we can find a nonzero block d such that \(d^{tr}\cdot E_{2,1} = 0^{n}\), where \(d^{tr}\) denotes the transpose of the vector d. Then, for any Y, \(d^{tr}\cdot E_{2,2} \cdot M = d^{tr} \cdot C\), where \(C = E_{2,1}\cdot Y + E_{2,2} \cdot M\). This observation can be used as a privacy distinguisher.

Similarly, if A1 is not satisfied, then \(D_{1,2}\) is not invertible, so there exists a nonzero d such that \(D_{1,2}\cdot d = 0^{n}\). Thus, \(D_{1,1}\cdot Y + D_{1,2}\cdot C^* = D_{1,1}\cdot Y + D_{1,2}\cdot C\), where \(C^* = C + d\). This observation can be extended to an authenticity attack.

4 COFB: A Small-State, Rate-1, Inverse-Free AE Mode

In this section, we present our proposal, \(\textsf {COFB}\), which is rate-1 (i.e., it needs one blockcipher call per input block) and inverse-free, i.e., it does not need the blockcipher inverse (decryption). In addition to these features, the mode has a quite small state size, namely \(1.5n+k\) bits, in case the underlying blockcipher has an n-bit block and a k-bit key. We first specify the basic building blocks and parameters used in our construction.

4.1 Specification

Key and Blockcipher The underlying cryptographic primitive is an n-bit blockcipher, \(E_K\). We assume that n is a multiple of 4. The key of the scheme is the key of the blockcipher, i.e., K.

Masking Function We define the masking function \(\mathsf {mask}:\{0,1\}^{n/2}\times \mathbb {N}^2 \rightarrow \{0,1\}^{n/2}\) as follows:

$$\begin{aligned} \mathsf {mask}(\varDelta ,a,b) = \alpha ^a \cdot (1 + \alpha )^b \cdot \varDelta \end{aligned}$$
(3)

We may write \(\mathsf {mask}_{\varDelta }(a,b)\) to mean \(\mathsf {mask}(\varDelta ,a,b)\). Here, \(\cdot \) denotes the multiplication over \( GF (2^{n/2})\), and \(\alpha \) denotes a primitive element of the field. For the primitive polynomial defining the field, we choose the lexicographically first one, that is, \(p(x) = \texttt {x}^{64} + \texttt {x}^4 + \texttt {x}^3 + \texttt {x} +1\), following [6, 38]. Rogaway [55] showed that for all \((a, b) \in \{0,\ldots ,2^{51}\} \times \{0,\ldots ,2^{10}\}\), the values of \(\alpha ^a \cdot (1 + \alpha )^b\) are distinct. In the notation of [55], the right-hand side of Eq. (3) would be written as \(2^a 3^b\varDelta \). For other values of n, we need to identify a primitive element \(\alpha \) of the field and an integer L such that the values \(\alpha ^a \cdot (1 + \alpha )^b\) are distinct for all \((a, b) \in \{0,\ldots ,L\}\times \{0,\ldots ,4\}\). The total allowed size of a message and associated data is then at most nL bits. We need this condition to prove the security claim. In particular, we have the following properties of the masking function.

Lemma 2

  For any \((a, b) \ne (a', b')\) chosen from the set \(\{0,\ldots ,L\}\times \{0,\ldots ,4\}\) (as described above), \(c \in \{0,1\}^{n/2}\), and a uniformly random n/2-bit string \(\varDelta \), we have

$$\begin{aligned} \Pr [\mathsf {mask}_{\varDelta }(a,b) \oplus \mathsf {mask}_{\varDelta }(a',b') = c] = \frac{1}{2^{n/2}}, \text { and } \Pr [\mathsf {mask}_{\varDelta }(a,b) = c] = \frac{1}{2^{n/2}}. \end{aligned}$$

The first equation follows directly from the fact that the values \(\alpha ^a \cdot (1 + \alpha )^b\) are distinct and nonzero for all \((a, b) \in \{0,\ldots ,L\}\times \{0,\ldots ,4\}\); the second follows from the uniformity of \(\varDelta \).

Similar masking functions are frequently used in other modes, such as [10, 49, 55]; however, there the masks are full n bits. The use of an n-bit masking function usually allows one to redefine the AE scheme as a mode of the XE or XEX tweakable blockcipher [55], which significantly reduces the proof complexity. In our case, to reduce the state size, we decided to use the n/2-bit masking function, and as a result the proof is ad hoc and does not rely on XE or XEX.
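To make the masking concrete, the following sketch (our own illustration, not part of the specification) implements \(\mathsf {mask}_{\varDelta }(a,b)\) for \(n=128\), i.e., multiplication in \( GF (2^{64})\) defined by the polynomial \(p(x)\) above. The helper name xtime is ours.

```python
# Hypothetical sketch of the n/2-bit masking function for n = 128, i.e.,
# arithmetic in GF(2^64) with p(x) = x^64 + x^4 + x^3 + x + 1.
# The names (xtime, mask) are ours, not from the specification.

POLY = (1 << 4) | (1 << 3) | (1 << 1) | 1  # reduction bits of p(x) below x^64
WIDTH = 64

def xtime(v: int) -> int:
    """Multiply v by alpha = x in GF(2^64)."""
    v <<= 1
    if v >> WIDTH:                          # overflow: reduce modulo p(x)
        v = (v ^ POLY) & ((1 << WIDTH) - 1)
    return v

def mask(delta: int, a: int, b: int) -> int:
    """Compute alpha^a * (1 + alpha)^b * delta, i.e., 2^a 3^b Delta in [55]'s notation."""
    for _ in range(a):
        delta = xtime(delta)                # multiply by 2
    for _ in range(b):
        delta = xtime(delta) ^ delta        # multiply by 3 = (1 + alpha)
    return delta
```

In an implementation, the two loops would be replaced by incremental doubling/tripling of a cached \(\varDelta \), so each block costs one shift-and-conditional-XOR.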

Feedback Function Let \(Y\in \{0,1\}^n\) and \((Y[1], Y[2], Y[3], Y[4]) \xleftarrow {\scriptscriptstyle n/4} Y\), where \(Y[i] \in \{0,1\}^{n/4}\). We define \(G:\mathcal {B}\rightarrow \mathcal {B}\) as \(G(Y) = (Y[1]\oplus Y[4], Y[1], Y[2], Y[3])\).Footnote 4 We also view G as an \(n \times n\) non-singular matrix, so we write G(Y) and \(G\cdot Y\) interchangeably. For \(M \in \mathcal {B}^{\le }\) and \(Y \in \mathcal {B}\), we define \(\rho _1(Y, M) = G \cdot Y \oplus \overline{M}\). The feedback function \(\rho \) and its corresponding \(\rho '\) are defined as

$$\begin{aligned} \rho (Y, M)&= (\rho _1(Y,M), Y\,\underline{\oplus }\,M), \\ \rho '(Y, C)&= (\rho _1(Y,Y\,\underline{\oplus }\,C), Y\,\underline{\oplus }\,C). \end{aligned}$$

Note that when \((X, M) = \rho '(Y,C)\), we have \(X = (G \oplus I)\cdot Y \,\underline{\oplus }\,C\). Our choice of G ensures that \(G \oplus I\) is also an invertible matrix. So when Y is chosen uniformly at random, X also behaves randomly for both computations of X (through \(\rho \) and \(\rho '\)). We need this property when we bound the probability of bad events later.
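The feedback functions can be sketched as follows (our own illustration for \(n=128\), holding Y as four 32-bit words); it also makes explicit that \(\rho '\) is the decryption counterpart of \(\rho \): both produce the same next input X, and \(\rho '\) recovers M from (Y, C).

```python
# A minimal sketch (ours) of the feedback functions rho and rho' for n = 128,
# with an n-bit value represented as a tuple of four 32-bit words.

W = 32  # n/4 bits per word

def G(y):
    """G(Y) = (Y[1] xor Y[4], Y[1], Y[2], Y[3])."""
    y1, y2, y3, y4 = y
    return (y1 ^ y4, y1, y2, y3)

def xor(a, b):
    return tuple(x ^ y for x, y in zip(a, b))

def rho(y, m):
    """(X, C) = rho(Y, M): next blockcipher input and ciphertext block."""
    return xor(G(y), m), xor(y, m)

def rho_prime(y, c):
    """(X, M) = rho'(Y, C): recovers M = Y xor C, then computes the same X."""
    m = xor(y, c)
    return xor(G(y), m), m
```

For any Y, applying \(\rho \) and then \(\rho '\) on the resulting ciphertext block returns the same pair (X, M), which is exactly the encryption/decryption consistency used in the mode.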

Furthermore, in order to handle the case that the last ciphertext block is shorter than n bits, we need more conditions. That is, for the last ciphertext block (C[m] with \(|C[m]|=b\le n\)), the state S is updated by \(\rho \) as

$$\begin{aligned} S' = G(S) + \mathsf {ozp}(\mathsf {msb}_b(S)+C[m]) = G(S) + (\mathsf {msb}_b(S)\,\Vert \,0\ldots 0) + \mathsf {ozp}(C[m]). \end{aligned}$$

To make sure that \(S'\) is random whenever S is random (otherwise the tag may have smaller entropy), we also need the following condition:

  1. Condition

    \(G + M_{\mathsf {msb}[i]}\) is regular (i.e., a binary matrix of rank n) for any \(i=1,2,\ldots ,n\), where \(M_{\mathsf {msb}[i]}\) denotes the matrix extracting the first i bits: \(M_{\mathsf {msb}[i]}\cdot X = \mathsf {msb}_i(X)\).

Since \(M_{\mathsf {msb}[n]} = I\), this also covers the case \(G+I\).Footnote 5

The matrix G of the current specification corresponds to

$$\begin{aligned} G = \begin{bmatrix} 1&\quad 0&\quad 0&\quad 1 \\ 1&\quad 0&\quad 0&\quad 0 \\ 0&\quad 1&\quad 0&\quad 0 \\ 0&\quad 0&\quad 1&\quad 0 \\ \end{bmatrix}. \end{aligned}$$

It is also the transpose of the matrix called M14 in [39] and fulfills the condition mentioned above.
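As a sanity check, the rank condition can be verified mechanically for small n. The following sketch (our own test harness, not from the specification) builds G from its \(4\times 4\) word-block structure and checks that \(G + M_{\mathsf {msb}[i]}\) has rank n over GF(2) for every \(i=1,\ldots ,n\).

```python
# Our own verification harness for the rank condition, for small n (n a
# multiple of 4). Rows of a binary matrix are stored as n-bit integers.

def rank_gf2(rows, n):
    """Rank over GF(2) by Gaussian elimination."""
    rows, r = list(rows), 0
    for col in range(n - 1, -1, -1):
        piv = next((i for i in range(r, len(rows)) if rows[i] >> col & 1), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i] >> col & 1:
                rows[i] ^= rows[r]
        r += 1
    return r

def G_rows(n):
    """Rows of the n x n matrix G: output word i is the XOR of the listed input words."""
    w = n // 4
    blocks = [(0, 3), (0,), (1,), (2,)]  # word structure of G(Y)
    rows = []
    for bi, srcs in enumerate(blocks):
        for j in range(w):
            row = 0
            for s in srcs:
                row |= 1 << (n - 1 - (s * w + j))
            rows.append(row)
    return rows

def condition_holds(n):
    """Check that G + M_msb[i] has full rank for all i = 1..n."""
    g = G_rows(n)
    for i in range(1, n + 1):
        # M_msb[i] adds an identity entry to each of the first i rows.
        rows = [g[r] ^ ((1 << (n - 1 - r)) if r < i else 0) for r in range(n)]
        if rank_gf2(rows, n) != n:
            return False
    return True
```

Running `condition_holds` for, e.g., \(n=8\) or \(n=16\) confirms the condition for the word-block structure of G; the same check scales directly to \(n=128\).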

Tweak Value for The Last Block Given \(B\in \{0,1\}^*\), we define \(\delta _B\in \{1,2\}\) as follows:

$$\begin{aligned} \delta _B = {\left\{ \begin{array}{ll} 1 &{}\quad \text{ if } B \ne \lambda \text{ and } n \text{ divides } |B| \\ 2 &{}\quad \text{ otherwise. } \end{array}\right. } \end{aligned}$$
(4)

This will be used to differentiate whether the last block of B is n bits or shorter, for B being associated data, plaintext, or ciphertext. We also define a formatting function \(\mathsf {Fmt}\) for a pair of bit strings (A, Z), where A is associated data and Z is either a plaintext or a ciphertext. Let \((A[1], \ldots , A[a]) \xleftarrow {\scriptscriptstyle n} A\) and \((Z[1], \ldots , Z[z]) \xleftarrow {\scriptscriptstyle n}Z\). We define \(\textsf {t}[i]\) as follows:

$$\begin{aligned}\textsf {t}[i] = {\left\{ \begin{array}{ll} (i, 0) &{}\quad \text{ if } i< a \\ (a-1, \delta _{A}) &{}\quad \text{ if } i =a\\ (i-1, \delta _{A}) &{}\quad \text{ if } a< i < a + z \\ (a+z-2, \delta _{A} + \delta _{Z}) &{}\quad \text{ if } i = a + z \end{array}\right. } \end{aligned}$$

Now, the formatting function \(\textsf {Fmt}(A, Z)\) returns the following sequence:

$$\begin{aligned} \big ((A[1], \textsf {t}[1]), \ldots , (\overline{A[a]}, \textsf {t}[a]), (Z[1], \textsf {t}[a+1]), \ldots , (\overline{Z[z]}, \textsf {t}[a+z])\big ), \end{aligned}$$

where the first coordinate of each pair specifies the input block to be processed, and the second coordinate specifies the exponents of \(\alpha \) and \(1+\alpha \) that determine the constant over \(\text {GF}(2^{n/2})\). Let \(\mathbb {Z}_{\ge 0}\) be the set of nonnegative integers and \(\mathcal {X}\) be some non-empty set. We say that a function \(f: \mathcal {X} \rightarrow (\mathcal {B} \times \mathbb {Z}_{\ge 0} \times \mathbb {Z}_{\ge 0})^+\) is prefix-free if for all \(X \ne X'\), \(f(X) = (Y[1],\ldots ,Y[\ell ])\) is not a prefix of \(f(X') = (Y'[1],\ldots ,Y'[\ell '])\) (in other words, whenever \(\ell \le \ell '\), \((Y[1],\ldots ,Y[\ell ]) \ne (Y'[1],\ldots ,Y'[\ell ])\)). Here, for a set \(\mathcal {S}\), \(\mathcal {S}^+\) means \(\mathcal {S}\cup \mathcal {S}^2\cup \cdots \), and we have the following lemma.

Lemma 3

The function \(\mathsf {Fmt}(\cdot )\) is prefix-free.

The proof is more or less straightforward, and hence, we skip it.

We present the specification of COFB in Fig. 5, where \(\alpha \) and \((1+\alpha )\) in Eq. (3) are written as 2 and 3. See also Fig. 4. The encryption and decryption algorithms are denoted by \(\textsf {COFB}\)-\(\mathcal {E}_K\) and \(\textsf {COFB}\)-\(\mathcal {D}_K\). We remark that the nonce length is n/2 bits, which is enough for security up to the birthday bound. The nonce is processed as \(E_K(0^{n/2}\,\Vert \,N)\) to yield the first internal chaining value. The encryption algorithm takes non-empty A and non-empty M, and outputs C and T such that \(|C|=|M|\) and \(|T|=n\). The decryption algorithm takes (N, A, C, T) with \(|A|,|C|\ne 0\) and outputs M or \(\bot \). Note that some of the building blocks described above are not presented in Fig. 5, since they are introduced for the proof. An equivalent presentation using them is given in Fig. 6.

Fig. 4 Encryption of COFB for 3-block associated data and plaintext

Fig. 5 Encryption and decryption algorithms of COFB

Fig. 6 A presentation of COFB using the Fmt function. This is equivalent to Fig. 5

5 Security of COFB

We present the security analysis of COFB in Theorem 2. Before going to the proof, we note that, as mentioned earlier, we use the function Fmt and Lemma 3 in the proof to make it easier to follow. We also note that COFB instantiates iCOFB by choosing

$$\begin{aligned}\mathsf {R}_{N, A, (i, j)}(X) = {\left\{ \begin{array}{ll} f(N, A) &{}\quad \text{ if } i = 0, j = 0 \\ E_K(X \oplus \textsf {mask}_{\varDelta }(a+i-1, \delta _A)) &{}\quad \text{ if } i < m, j = 0\\ E_K(X \oplus \textsf {mask}_{\varDelta }(a+m-2, \delta _A + \delta _M)) &{}\quad \text{ if } i = m, j = 1 \end{array}\right. } \end{aligned}$$

where f(N, A) is the function that simulates the associated data phase and outputs Y[a] (lines 1–11 of Fig. 5; K is implicit and chosen uniformly from the key space, and \(X = 0^n\) in this case). \(\varDelta \) (computed using \(E_K\) and N), a, m, \(\delta _A\), and \(\delta _M\) are as described in the previous section, and we instantiate \(\rho \) by the feedback function described in the previous section. However, the security proof of COFB does not follow from that of iCOFB, since, as a tweakable PRF, the security of R is only guaranteed up to n/4 bits, and thus we cannot rely on the hybrid argument to show the security of COFB. We next proceed with the proof for our instantiation.

Theorem 2

(Main theorem)

$$\begin{aligned} \mathbf{Adv }_{\mathsf {COFB}}^{\text {AE}}((q_e, q_f), (\sigma _e, \sigma _f), t)&\le \mathbf{Adv }_{\text {AES}}^{\text {prp}}(q', t')+ \frac{0.5(q')^2}{2^n} \\&\quad + \frac{4\sigma _e}{2^{n/2}}+ \frac{(q_e + \sigma _e+2\sigma _f )\cdot \sigma _f + q_f}{2^n}. \end{aligned}$$

where \(q'=q_e+q_f+\sigma _e +\sigma _f\), which corresponds to the total number of blockcipher calls throughout the game, and \(t' = t + O(q')\). To be precise, \(\sigma _e\) is the total number of blocks in the \(q_e\) encryption queries and \(\sigma _f\) is the total number of blocks in the \(q_f\) decryption queries.
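To get a feel for the bound, the following one-liner (ours; it omits the computational term \(\mathbf{Adv }_{\text {AES}}^{\text {prp}}\)) evaluates the information-theoretic part of Theorem 2. For typical parameters with \(n=128\), the birthday-type term \(4\sigma _e/2^{n/2}\) dominates.

```python
# Numeric sketch (ours) of the information-theoretic part of the Theorem 2
# bound; the PRP advantage of the underlying blockcipher is not included.

def cofb_bound(n, qe, qf, se, sf):
    """n: block size; qe/qf: #enc/#dec queries; se/sf: their total block counts."""
    qp = qe + qf + se + sf              # q': total number of blockcipher calls
    return (0.5 * qp * qp / 2**n        # PRP-PRF switching + reduction terms
            + 4 * se / 2**(n // 2)      # dominant n/2-bit birthday term
            + ((qe + se + 2 * sf) * sf + qf) / 2**n)
```

For example, with \(q_e=q_f=2^{20}\) and \(\sigma _e=\sigma _f=2^{30}\), the bound is about \(2^{-32}\), driven almost entirely by \(4\sigma _e/2^{64}\).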

Proof

Fix a deterministic, non-repeating-query-making distinguisher \(\textsf {adv}\) that interacts with either the

  1. 1.

    Real oracle \((\textsf {COFB}\)-\({\mathcal {E}_K})\) or the

  2. 2.

    Ideal oracle \((\$)\)

making exactly \(q_e\) encryption queries \((N_i,A_i,M_i), {i=1\ldots q_e}\), with \(\sigma _e\) blocks in total, and tries to forge with exactly \(q_f\) queries \((N^*_i, A^*_i, C^*_i, T^*_i), {i=1\ldots q_f}\), having a total of \(\sigma _f\) blocks. We also assume that the adversary always makes fresh forging attempts. More formally, it cannot submit \((N_i, A_i, C_i, T_i)\) for some i as a forging attempt.

Let \(M_i\) and \(A_i\) have \(m_i\) and \(a_i\) blocks, respectively, and \(C^*_i\) and \(A^*_i\) have \(c^*_i\) and \(a^*_i\) blocks, respectively, for every i. We use X and Y to denote the intermediate variables (blockcipher inputs and outputs) corresponding to the encryption queries, and \(X^*\) and \(Y^*\) to denote the intermediate variables corresponding to the forging queries. Note that some of the inputs and outputs for forging queries can be determined by those of encryption queries.

Without loss of generality, we can assume \(q'{:}= q_e+q_f+\sigma _e +\sigma _f \le 2^{\frac{n}{2}-1}\), since otherwise the bound obviously holds, as the right-hand side becomes more than one.

We use the notation \(\mathsf {X}, \mathsf {Y}\), etc. (mathsf notation) to denote random variables and the capital letters X, Y, etc. to denote fixed values. Note that all values appearing in the transcript or internally are random variables, with randomness coming either from the real world or from the ideal world.

5.1 Hybrid Argument (Transition from \(\textsf {COFB}\)-\({\mathcal {E}_K}\) to \(\textsf {COFB}\)-\(\textsf {R}\))

The first transition we make is to use an n-bit (uniform) random permutation \(\mathsf {P}\) instead of \(E_K\), and then to use an n-bit (uniform) random function \(\mathsf {R}\) instead of \(\mathsf {P}\). This two-step transition accounts for the first two terms of our bound: one from the standard PRP–PRF switching lemma, and one from the computational-to-information-theoretic security reduction (e.g., see [19]). What remains is a bound for COFB using \(\mathsf {R}\), denoted by COFB-R. Let \(\textsf {COFB}^{-1}\)-\({\textsf {R}}\) be the corresponding decryption function. So it is sufficient to prove

$$\begin{aligned} \mathbf{Adv }_{\textsf {COFB-R}}^{\text {AE}}((q_e, q_f), (\sigma _e, \sigma _f), \infty ) \le \frac{4\sigma _e}{2^{n/2}} + \frac{(q_e + \sigma _e+2\sigma _f )\cdot \sigma _f + q_f}{2^n}. \end{aligned}$$
(5)

5.2 Description of the Real Oracle

The Real oracle simulates COFB-R to \(\textsf {adv}\) honestly. In case of \(q_e\) encryption queries, for \(i=1,\dots ,q_e\), we write \((N_i,A_i,M_i)\) and \((C_i,T_i)\) to denote the ith encryption query and response, such that

$$\begin{aligned} (C_i,T_i) = \textsf {COFB-R}(N_i,A_i,M_i). \end{aligned}$$

Here,

$$\begin{aligned} A_i=(A_i[1], \ldots , A_i[a_i]), M_i=(M_i[1], \ldots , M_i[m_i]) , C_i=(C_i[1], \ldots , C_i[m_i]). \end{aligned}$$

Let \(\ell _i = a_i + m_i\), which denotes the total input block length of the ith encryption query. We write \(X_i[j]\) (resp. \(Y_i[j]\)) for \(i=1,\dots ,q_e\) and \(j=0,\dots ,\ell _i\) to denote the jth input (resp. output) of the internal \(\mathsf {R}\) invoked at the ith encryption query, where the order of invocation follows the specification shown in Fig. 5. We remark that \(X_i[0] = 0^{n/2} \Vert N_i\) and \(Y_i[\ell _i] = T_i\) for all \(i=1,\dots ,q_e\). Similarly, we write \(\varDelta _i\) to denote \(Y^2_{i}[0] \Vert Y^3_{i}[0]\), where \(Y^1_{i}[0] \Vert \cdots \Vert Y^4_{i}[0] \xleftarrow {\scriptscriptstyle n/4} Y_i[0]\). Note that the total number of input blocks queried to the encryption oracle (over the \(q_e\) queries) is \(\sigma _e\).

In case of all the \(q_f\) forge queries, for \(i'=1,\dots ,q_f\), we write \((N^*_{i'}, A^*_{i'}, C^*_{i'}, T^*_{i'})\) and \(Z^*_{i'}\) to denote the \(i'\)th forging attempt and response, such that

$$\begin{aligned} Z^*_{i'} = \textsf {COFB}^{-1}\text{- }{\textsf {R}}(N^*_{i'}, A^*_{i'}, C^*_{i'}, T^*_{i'}), \end{aligned}$$

where \(Z^*_{i'}\) is a valid message \(M^*_{i'}=(M^*_{i'}[1], \ldots , M^*_{i'}[c^*_{i'}])\) or \(\perp \). Here, \(A^*_{i'}=(A^*_{i'}[1], \ldots , A^*_{i'}[a^*_{i'}])\) and \(C^*_{i'}=(C^*_{i'}[1], \ldots , C^*_{i'}[c^*_{i'}])\). Let \(\ell ^*_{i'} = a^*_{i'} + c^*_{i'}\), which denotes the total input block length of the \(i'\)th forge query. We write \(X^*_{i'}[j']\) (resp. \(Y^*_{i'}[j']\)) for \(i'=1,\dots ,q_f\) and \(j'=0,\dots ,\ell ^*_{i'}\) to denote the \(j'\)th input (resp. output) of the internal \(\mathsf {R}\) invoked at the \(i'\)th forging attempt, where the order of invocation follows the specification shown in Fig. 5.

Definition 2

For any i, let \(p_i\) denote the length of the longest common prefix of \(\textsf {Fmt}(A^*_i, C^*_i)\) and \(\textsf {Fmt}(A_j, C_j)\) where \(N_j = N^*_i\). Note that there cannot be more than one j as the adversary is nonce respecting. If there is no such j, we define \(p_i = -1\).

In the above definition, it holds that \(p_i < \min \{\ell ^*_i, \ell _j\}\), as \(\textsf {Fmt}\) is prefix-free. So, \(X_i^*[0\ldots p_i]\) is the same as \(X_j[0\ldots p_i]\) due to the common prefix. Moreover, \(X_i^*[p_i+1]\) is also determined as \(X_i^*[p_i+1] = M^*_i[p_i -a^*_i + 1] \oplus G\cdot Y^*_i[p_i] \oplus \textsf {mask}_{\varDelta _j}(t_j[p_i])\).
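The quantity \(p_i\) of Definition 2 can be computed as in the following sketch (a hypothetical helper of ours; formatted inputs are represented as lists of (block, tweak) pairs as produced by Fmt, indexed by nonce for the encryption queries).

```python
# Our own illustration of Definition 2: the length p_i of the longest common
# prefix between a decryption query's formatted input and the (unique)
# encryption query sharing its nonce; -1 if no encryption query matches.

def common_prefix_len(fmt_dec, fmt_enc_by_nonce, nonce):
    """fmt_dec: list of (block, tweak) pairs; fmt_enc_by_nonce: dict nonce -> list."""
    if nonce not in fmt_enc_by_nonce:
        return -1                      # no j with N_j = N*_i
    fmt_enc = fmt_enc_by_nonce[nonce]  # unique, as the adversary is nonce-respecting
    p = 0
    for x, y in zip(fmt_dec, fmt_enc):
        if x != y:
            break
        p += 1
    return p
```

Prefix-freeness of Fmt guarantees that the loop always stops strictly before the end of the shorter sequence, matching \(p_i < \min \{\ell ^*_i, \ell _j\}\).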

5.3 Releasing More Information

We release more information to the adversary, which can only increase its advantage. First, after completing all encryption queries and forging attempts (i.e., decryption queries), let the adversary \(\textsf {adv}\) learn all the Y-values for the encryption queries (which also allows the adversary to learn the X-values). In addition, we also release the \(Y^*\)-values (and consequently the \(X^*\)-values) for the decryption queries (they are sampled consistently with the Y-values from the encryption queries). To be precise, we sample \(Y^*\) in the following way. We first denote the set of all prefixes of the formatted inputs of all \(q_e\) encryption queries by \(P_{enc}\) (see the following example). We similarly denote the set of all formatted input prefixes of all \(q_f\) decryption queries by \(P_{dec}\).

Example 1

Let \(q_e = 2\) and \(((N,t[1]), (A, t[2]), (M, t[3])) \leftarrow \textsf {Fmt}(N, A, M)\) and \(((N', t'[1]), (A', t'[2]), (M', t'[3])) \leftarrow \textsf {Fmt}(N', A', M')\) with \(N\ne N'\) be the two encryption queries. Hence,

$$\begin{aligned} P_{enc}= & {} \{(N, t[1]), (N', t'[1]), ((N, t[1]), (A, t[2])), ((N,t[1]), (A, t[2]), (M, t[3])), \\&((N', t'[1]), (A', t'[2])), ((N', t'[1]), (A', t'[2]), (M', t'[3]))\}. \end{aligned}$$

\(\square \)

Note that, \(|P_{enc}| = \varSigma _{i=1}^{q_e} (\ell _i+1)\) as nonces do not repeat. However, \(|P_{dec}| \le \varSigma _{i=1}^{q_f} (\ell ^*_i+1)\) as nonces can repeat in the decryption queries. For all \(x \in P_{dec}\), we sample \(\mathsf {Z}_x\) uniformly from \(\{0, 1\}^n\). Let

$$\begin{aligned} (Z^*_i[1], \ldots , Z^*_i[\ell ^*_i]) \leftarrow (A^*_i[1], \ldots , A^*_i[a^*_i], C^*_i[1], \ldots C^*_i[c^*_i]). \end{aligned}$$

Note that, the prefix set corresponding to \(Y^*_i[j]\) denoted by \(\textsf {prefix}(Y^*, i, j)\) is

$$\begin{aligned} ((N^*_i, *), (Z^*_i[1], *), \ldots , (Z^*_i[j], *)), \end{aligned}$$

where \(*\) denotes certain t values. Similarly, the prefix set denoted by \(\textsf {prefix}(Y, i, j)\) corresponding to \(Y_i[j]\) is

$$\begin{aligned} ((N_i, *), (Z_i[1], *), \ldots , (Z_i[j], *)), \end{aligned}$$

where

$$\begin{aligned} (Z_i[1], \ldots , Z_i[\ell _i]) \leftarrow (A_i[1], \ldots , A_i[a_i], M_i[1], \ldots M_i[m_i]). \end{aligned}$$

We now sample the \(Y^*\) values in the following way:

$$\begin{aligned} \mathsf {Y}^*_i[j] = {\left\{ \begin{array}{ll} \mathsf {Y}_{i'}[j'] &{}\quad \text{ if } j \le p_i \ \text{ and } \ \textsf {prefix}(\mathsf {Y}^*, i, j) = \textsf {prefix}(\mathsf {Y}, i', j') \\ \mathsf {Z}_x &{}\quad \text{ if } j > p_i, \ \text{ and } \ \textsf {prefix}(\mathsf {Y}^*, i, j) = x. \end{array}\right. } \end{aligned}$$

Note that, \(\mathsf {X}^*\)s are determined from the \(\mathsf {Y}^*\)s and \(C^*\)s.

5.4 Description of the Ideal Oracle

The original ideal oracle returns all ciphertext \(C_i\) and \(T_i\) randomly for encryption queries. In case of all the \(q_f\) forge queries \((N^*_{i'}, A^*_{i'}, C^*_{i'}, T^*_{i'})\), \(i'\in \{1,\ldots ,q_f\}\), the oracle returns \(\perp \) (here we assume that the adversary makes only fresh queries). The encryption of the ideal oracle can equivalently be sampled in the following way:

We first sample all \(Y_i[j]\) blocks uniformly and independently from \(\{0, 1\}^n\), for all \(i \in \{1, \ldots , q_e\}\) and \(j \in \{0, \ldots , \ell _i \}\). The ciphertext blocks during the message processing phase are computed as

$$\begin{aligned} C_i[j-a_i+1] = M_i[j-a_i+1] \oplus Y_i[j], \forall i \in \{1, \ldots q_e\}, j \in \{a_i, \ldots , \ell _i-2\}. \end{aligned}$$

Note that, \(\forall i \in \{1, \ldots , q_e\}\), \(\varDelta _i\) values are computed as \(Y^2_{i}[0] \Vert Y^3_{i}[0]\). Also, \(\forall i \in \{1, \ldots q_e\}\), \(C_i[m_i]\) is computed as

$$\begin{aligned} C_i[m_i] = M_i[m_i] \underline{\oplus } Y_i[\ell _i-1]. \end{aligned}$$

Note that the last message block of the ith encryption query, \(M_i[m_i]\), may not be full (i.e., \(|M_i[m_i]| < n\)). The tag \(T_i\), for all \(i \in \{1, \ldots , q_e\}\), is computed as

$$\begin{aligned} T_i = Y_i[\ell _i]. \end{aligned}$$

As the ciphertext is computed by XORing the message blocks with uniformly sampled Y-values, the equivalent ideal oracle indeed samples \(C_i\) uniformly at random. The same is true for the tag values.

5.5 Overview of Attack Transcripts

Now we redefine the transcript random variable of the adversary. We replace the ciphertext and tag by all \(\mathsf {Y}\)-values of the encryption queries; note that the ciphertext and tag can be uniquely computed from the \(\mathsf {Y}\)-values. Thus, the transcript of the adversary is \(\mathcal {T}{:}{=} (\mathcal {T}_e,\mathcal {T}_f)\), where

  1. 1.

    \(\mathcal {T}_e=(\textsf {N}_i, \textsf {A}_i, \textsf {M}_i, \textsf {Y}_i), {i=1\ldots q_e}\) and

  2. 2.

    \(\mathcal {T}_f=(\textsf {N}^*_{i'}, \textsf {A}^*_{i'}, \textsf {C}^*_{i'}, \textsf {T}^*_{i'}, \textsf {Z}^*_{i'}), {i'=1\ldots q_f}\).

Here, \(\textsf {Z}_{i'}^*\) denotes the output of the decryption oracle \(\mathcal {D}\) (it is always \(\bot \) when we interact with the ideal oracle) for the \(i'\)th decryption query \((\textsf {N}^*_{i'}, \textsf {A}^*_{i'}, \textsf {C}^*_{i'}, \textsf {T}^*_{i'})\). Note that \(\textsf {Y}_i\) denotes \((\textsf {Y}_i[0], \ldots , \textsf {Y}_i[\ell _i]) = \textsf {Y}_i[0\ldots \ell _i]\), where \(\ell _i = a_i + m_i\), and \(a_i\) (resp. \(m_i\)) denotes the block length of \(\textsf {A}_i\) (resp. \(\textsf {M}_i\)). Similarly we define \(c^*_{i'}\) and \(a^*_{i'}\) for forging queries and write \(\ell ^*_{i' }= a^*_{i'} + c^*_{i'}\). The values of \(\textsf {X}_i\) are uniquely determined from \(\textsf {Y}_i\), \(\textsf {A}_i\), and \(\textsf {M}_i\). We use \(\mathcal {T}^{\textsf {real}} = (\mathcal {T}_e^{\textsf {real}}, \mathcal {T}_f^{\textsf {real}})\) (or \(\mathcal {T}^{\textsf {ideal}} = (\mathcal {T}_e^{\textsf {ideal}}, \mathcal {T}_f^{\textsf {ideal}})\)) to denote the random variable corresponding to the transcripts in the \(\textsf {Real}\) world (or in the \(\textsf {Ideal}\) world, respectively).

Notations on probabilities realizing transcripts Consider a fixed transcript \(\mathcal {T}= (\mathcal {T}_e, \mathcal {T}_f)\) where

  1. 1.

    \(\mathcal {T}_e = (N_i, A_i, M_i, Y_i)_{i = 1\ldots q_e}\) and

  2. 2.

    \(\mathcal {T}_f = (N^*_{i'}, A^*_{i'}, C^*_{i'}, T^*_{i'}, Z^*_{i'})_{{i'} = 1\ldots q_f}\).

We use the notation \(\Pr _{\mathsf {real}}[\mathcal {T}]\) (or \(\Pr _{\mathsf {ideal}}[\mathcal {T}]\)), to denote the probability

$$\begin{aligned} \Pr [(\mathsf {N}_i, \mathsf {A}_i, \mathsf {M}_i, \mathsf {Y}_i)_{i = 1\ldots q_e} =\mathcal {T}_e \ \wedge \ (\mathsf {N}^*_{i'}, \mathsf {A}^*_{i'}, \mathsf {C}^*_{i'}, \mathsf {T}^*_{i'}, \mathsf {Z}^*_{i'})_{{i'} = 1\ldots q_f} =\mathcal {T}_f], \end{aligned}$$

in the real world (or in the ideal world, respectively), where the probability is defined over the randomness of the transcript. \(\mathcal {T}\) is called realizable in the real world (or in the ideal world) if \(\Pr _{\mathsf {real}}[\mathcal {T}] > 0\) (or \(\Pr _{\mathsf {ideal}}[\mathcal {T}] > 0\), respectively).

5.6 Definition and Analysis of Bad Transcripts

Bad Views A transcript \(\mathcal {T}{:}{=} (\mathcal {T}_e,\mathcal {T}_f)\) is called “bad” if one of the following events occurs:

B1::

\(\mathsf {L}_i[j] = 0^{n/2}\) for some \(i \in [1\ldots q_e]\) and \(j > 0\).

B2::

\(\mathsf {X}_i[j] = \mathsf {X}_{i'}[j']\) for some \((i,j) \ne (i', j')\) where \(j, j' > 0\).Footnote 6

B3::

\(\textsf {mcoll}(\mathsf {R}) > n/2\), where \(\mathsf {R}\) is the tuple of all \(\mathsf {R}_i[j]\) values.

B4::

\(\mathsf {X}_i^*[j] = \mathsf {X}_{i_1}[j_1]\) or \(\mathsf {X}_i^*[j] = \mathsf {X}_{i'}^*[j']\) for some \(i, i', j', i_1, j_1\) such that \(p_i < j \le \ell ^*_i\) and \(\textsf {prefix}(\mathsf {Y}^*, i, j) \ne \textsf {prefix}(\mathsf {Y}^*, i', j')\) (where \(\mathsf {Y}^*_i[j] = \textsf {R}(\mathsf {X}^*_i[j])\) and \(\mathsf {Y}^*_{i'}[j'] = \textsf {R}(\mathsf {X}^*_{i'}[j'])\)).

B5::

There exists i, such that \({T}^*_i\) is a correct guess of \(\mathsf {Y}^*_i[\ell ^*_i]\). This clearly cannot happen for the ideal oracle case.

B6::

There exists i, such that \(\mathsf {Z}^*_i \ne \bot \). This clearly cannot happen for the ideal oracle case.

Some Intuitions on the Bad Events We add some intuitions on these events.

  • When B1 does not hold, then \(\mathsf {X}_i[j] \ne \mathsf {X}_{i'}[0]\) for all \(i, i'\), and \(j > 0\). Hence, \(\mathsf {\varDelta }_i\) will be completely random.

  • When B2 does not hold, then all the inputs for the random function are distinct for encryption queries, which makes the responses from encryption oracle completely random in the Real game.

  • When B3 does not hold, then on the right half of the \(\mathsf {X}_i[j]\) we see at most n/2 multi-collisions. A successful forgery would have to choose one of the at most n/2 multi-colliding blocks and forge the left part so that the entire block collides. Forging the left part succeeds with probability \(2^{-n/2}\) due to the randomness of the masking.

  • When B4 does not hold, then the \((p_i+1)\)st to \(\ell ^*_i\)th inputs of the ith forging attempt will be fresh with high probability, and so all the subsequent inputs will remain fresh with high probability. It also ensures that there is no collision among these inputs.

  • Finally, when B5 does not hold, all the guesses for the tag in the decryption queries are wrong.

If \(\textsf {adv}\) interacts with the real oracle, we always have

  1. 1.

    \(\mathsf {X}_i[0] = 0^{n/2}\Vert N_i\)

  2. 2.

    \(\textsf {R}(\mathsf {X}_i[j])=\mathsf {Y}_i[j]\), \(i=1\ldots q_e, j=0\ldots \ell _i\),

where \(\mathsf {X}_i[j]\) and \(\mathsf {Y}_i[j]\)s are computed via \(\rho \) and \(\mathsf {\varDelta }\) values.

The following lemma bounds the probability of not realizing a good view while interacting with a random function (this establishes the first condition of the Coefficients-H technique).

Lemma 4

For any transcript \(\mathcal {T}\),

$$\begin{aligned} \Pr _{\mathsf {ideal}}[\mathcal {T}\not \in \mathcal {V}_{\text {good}}]\le & {} \Pr [\mathbf{B1 }] + \Pr [\mathbf{B2 }] + \Pr [\mathbf{B3 }] + \Pr [\mathbf{B4 } \wedge \mathbf{B1 }^c \wedge \mathbf{B3 }^c] + \Pr [\mathbf{B5 }]\\\le & {} \frac{4\sigma _e}{2^{n/2}} + \frac{(q_e + \sigma _e+2\sigma _f )\cdot \sigma _f + q_f}{2^n}. \end{aligned}$$

Proof

(of Lemma 4) Throughout the proof, we assume all probability notations are defined over the ideal game. We bound all the bad events individually, and then by using the union bound, we will obtain the final bound. We first develop some more notation. Let \((\mathsf {Y}_i^1[j], \mathsf {Y}_i^2[j], \mathsf {Y}_i^3[j], \mathsf {Y}_i^4[j]) \xleftarrow {\scriptscriptstyle n/4}\mathsf {Y}_i[j]\). Similarly, we denote \((M_i^1[j], M_i^2[j]) \xleftarrow {\scriptscriptstyle n/2} M_i[j]\).

  1. (1)

    \(\Pr [\mathbf{B1 }] \le \sigma _e/2^{n/2}\): We fix a pair of integers (i, j) for some \(i \in [1\ldots q_e]\) and \(j \in [1\ldots \ell _i]\). Now, \(\mathsf {L}_i[j]\) can be expressed as

    $$\begin{aligned} (\mathsf {Y}_{i}^2[j-1] \Vert \mathsf {Y}_{i}^3[j-1]) \oplus (\alpha ^{a} \cdot (1 + \alpha )^{b} \cdot \mathsf {\varDelta }_i) \oplus M_i^1[j] \end{aligned}$$

    for some a and b. Note that when \(j > 1\), \(\mathsf {\varDelta }_i\) and \(\mathsf {Y}_i[j-1]\) are independently and uniformly distributed, and hence for those j we have \(\Pr [\mathsf {L}_i[j] = 0^{n/2}] = 2^{-n/2}\) (apply Lemma 2 after conditioning on \(\mathsf {Y}_i[j-1]\)). When \(j =1\), we have the following three possibilities: (i) \(\mathsf {L}_i[1] = (1+\alpha )\cdot \mathsf {\varDelta }_i\oplus \mathsf {Cons}\) if \(a_i \ge 2\); (ii) \(\mathsf {L}_i[1] = \alpha \cdot \mathsf {\varDelta }_i\oplus \mathsf {Cons}\) if \(a_i =1\) and the associated data block is full; and (iii) \(\mathsf {L}_i[1] = \alpha ^2 \cdot \mathsf {\varDelta }_i\oplus \mathsf {Cons}\) if \(a_i =1\) and the associated data block is not full, for some constant \(\mathsf {Cons}\). In all cases, applying Lemma 2 and the union bound over the at most \(\sigma _e\) pairs (i, j) gives \(\Pr [\mathbf{B1 }] \le \sigma _e/2^{n/2}\).

  2. (2)

    \(\Pr [\mathbf{B2 }] \le \sigma _e/2^{n/2}\): For any \((i, j) \ne (i', j')\) with \(j, j' \ge 1\), the equality event \(\mathsf {X}_{i}[j] = \mathsf {X}_{i'}[j']\) has probability at most \(2^{-n}\), since this event is a non-trivial linear equation in \(\mathsf {Y}_{i}[j-1]\) and \(\mathsf {Y}_{i'}[j'-1]\), which are independent of each other. Summing over the at most \(\sigma _e^2\) pairs, we note that \(\sigma _e^2/2^{n} \le \sigma _e/2^{n/2}\), as \(\sigma _e \le 2^{n/2}\).

  3. (3)

    \(\Pr [\mathbf{B3 }] \le 2\sigma _e/2^{n/2}\): The event \(\mathbf{B3 }\) is a multi-collision event for \(\sigma _e\) randomly chosen n/2-bit strings, as the \(\mathsf {Y}\) values are mapped in a regular manner (see the feedback function) to the \(\mathsf {R}\) values. From the union bound, we have

    $$\begin{aligned} \Pr [\mathbf{B3 }]&\le \left( {\begin{array}{c}\sigma _e\\ n/2\end{array}}\right) \frac{1}{2^{(n/2)\cdot ((n/2)-1)}} \le \frac{\sigma _e^{n/2}}{2^{(n/2)\cdot ((n/2)-1)}} \le \left( \frac{\sigma _e}{2^{(n/2)-1}}\right) ^{n/2} \le \frac{2\sigma _e}{2^{n/2}}, \end{aligned}$$

    where the last inequality follows from the assumption (\(\sigma _e\le 2^{(n/2)-1}\)).

  4. (4)

    \(\Pr [\mathbf{B4 } \wedge \mathbf{B1 }^c \wedge \mathbf{B3 }^c] \le \frac{(q_e + \sigma _e+2\sigma _f)\cdot \sigma _f}{2^n}\): We fix some i and bound the probability \(\Pr [\mathsf {X}_i^*[j] = \mathsf {X}_{i_1}[j_1] \wedge \mathbf{B1 }^c \wedge \mathbf{B3 }^c]\) for some \(i_1, j_1\) and \(p_i < j \le \ell ^*_i\). We also bound the probability \(\Pr [\mathsf {X}_i^*[j] = \mathsf {X}^*_{i'}[j'] \wedge \mathbf{B1 }^c \wedge \mathbf{B3 }^c]\) for some \(i', j'\) and j. The number of bad pairs is at most

    $$\begin{aligned} (q_e+\sigma _e+\ell ^*_i)\cdot \ell ^*_i + \sigma _f \cdot \ell ^{*}_i \le (q_e + \sigma _e+\sigma _f)\cdot \ell ^*_i + \sigma _f \cdot \ell ^{*}_i. \end{aligned}$$

    Therefore, the total number of bad pairs is \(\sum _{1\le i\le q_f}(q_e + \sigma _e+\sigma _f)\cdot \ell ^*_i + \sigma _f \cdot \ell ^{*}_i \le (q_e + \sigma _e+ 2\sigma _f)\cdot \sigma _f\). Thus, \(\Pr [\mathbf{B4 } \wedge \mathbf{B1 }^c \wedge \mathbf{B3 }^c]\) is at most \(\frac{(q_e + \sigma _e+ 2\sigma _f)\cdot \sigma _f}{2^n}\).

  5. (5)

    \(\Pr [\mathbf{B5 }] \le q_f/2^{n}\): The event \(\mathbf{B5 }\) is bounded by \(\frac{q_f}{2^n}\), since \(Y^*_i[\ell ^*_i]\) is uniform and there are \(q_f\) decryption queries in total.

Summarizing, we have

$$\begin{aligned} \Pr _{\mathsf {ideal}}[\mathcal {T}\not \in \mathcal {V}_{\text {good}}]&\le \Pr [\mathbf{B1 }] + \Pr [\mathbf{B2 }] + \Pr [\mathbf{B3 }] + \Pr [\mathbf{B4 } \wedge \mathbf{B1 }^c \wedge \mathbf{B3 }^c] + \Pr [\mathbf{B5 }] \\&\le \frac{\sigma _e}{2^{n/2}} + \frac{\sigma _e}{2^{n/2}} + \frac{2\sigma _e}{2^{n/2}} + \frac{(q_e + \sigma _e+2\sigma _f)\cdot \sigma _f}{2^n} + \frac{q_f}{2^n}\\&= \frac{4\sigma _e}{2^{n/2}} + \frac{(q_e + \sigma _e+2\sigma _f )\cdot \sigma _f + q_f}{2^n}, \end{aligned}$$

which concludes the proof. \(\square \)
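The bound just established can be checked mechanically. Below is a small sanity-check script (the parameters n, \(q_e\), \(\sigma _e\), \(q_f\), \(\sigma _f\) are arbitrary illustrative values, not taken from the paper) verifying with exact rational arithmetic that the five per-event bounds sum to the stated closed form, and that the last step of the \(\mathbf{B3 }\) chain holds under the assumption \(\sigma _e\le 2^{(n/2)-1}\):

```python
from fractions import Fraction

# Illustrative sample parameters (not values from the paper).
n = 128
q_e, sigma_e = 2**20, 2**24
q_f, sigma_f = 2**20, 2**24
assert sigma_e <= 2**(n // 2 - 1)  # assumption used for the B3 bound

# The five per-event bounds as derived above.
b1 = Fraction(sigma_e, 2**(n // 2))
b2 = Fraction(sigma_e, 2**(n // 2))
b3 = Fraction(2 * sigma_e, 2**(n // 2))
b4 = Fraction((q_e + sigma_e + 2 * sigma_f) * sigma_f, 2**n)
b5 = Fraction(q_f, 2**n)

total = b1 + b2 + b3 + b4 + b5
closed_form = Fraction(4 * sigma_e, 2**(n // 2)) \
    + Fraction((q_e + sigma_e + 2 * sigma_f) * sigma_f + q_f, 2**n)
assert total == closed_form

# Last step of the B3 chain:
# (sigma_e / 2^{(n/2)-1})^{n/2} <= 2 * sigma_e / 2^{n/2}.
lhs = Fraction(sigma_e, 2**(n // 2 - 1)) ** (n // 2)
assert lhs <= Fraction(2 * sigma_e, 2**(n // 2))
```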

Analysis of Good Transcripts We define a view to be good if none of the bad events hold. Let \(\mathcal {V}_{\text {good}}\) be the set of all such good views. For any view \(\mathcal {T}\in \mathcal {V}_{\text {good}}\), we want to show \(\mathsf {ip}_{\mathsf {real}}(\mathcal {T}) \ge (1-\epsilon _2) \cdot \mathsf {ip}_{\mathsf {ideal}}(\mathcal {T})\) for some \(\epsilon _2\). Hence, we proceed by computing the ratio between \(\mathsf {ip}_{\mathsf {real}}(\mathcal {T})\) and \(\mathsf {ip}_{\mathsf {ideal}}(\mathcal {T})\) for \(\mathcal {T}\in \mathcal {V}_{\text {good}}\). Note that here \(\mathsf {ip}_{\mathsf {real}}(\mathcal {T})\) is actually \(\Pr _{\mathsf {real}}[\mathcal {T}]\) and \(\mathsf {ip}_{\mathsf {ideal}}(\mathcal {T})\) is \(\Pr _{\mathsf {ideal}}[\mathcal {T}]\). We first fix a realizable (with respect to the ideal world) good view

$$\begin{aligned} \mathcal {T}= ((N_i, A_i, M_i, Y_i)_{i \in \{1,\ldots ,q_e\}}, (N^*_{i'}, A^*_{i'},C^*_{i'}, T^*_{i'}, Z^*_{i'})_{i' \in \{1,\ldots ,q_f\}}), \end{aligned}$$

where \(Z^{*}_{i'}=\bot \). We separate \(\mathcal {T}\) into

$$\begin{aligned} \mathcal {T}_{e}=(N_i, A_i, M_i, Y_i)_{i \in \{1,\ldots ,q_e\}} \text { and } \mathcal {T}_{f}=(N^*_{i'}, A^*_{i'},C^*_{i'}, T^*_{i'}, Z^*_{i'})_{i' \in \{1,\ldots ,q_f\}}. \end{aligned}$$

Lemma 5

For a good and realizable view \(\mathcal {T}\),

$$\begin{aligned} \Pr _{\mathsf {ideal}}[\mathcal {T}] = {1}/{2^{n(q_e + \sigma _e)}}. \end{aligned}$$

Proof of Lemma 5

All the \(q_e + \sigma _e\) internal \(\mathsf {Y}\)s are chosen uniformly from \(\{0, 1\}^n\). Hence, it is easy to see that for a good view \(\mathcal {T}\), \(\Pr _{\mathsf {ideal}}[\mathcal {T}]\) is equal to \({1}/{2^{n(q_e+\sigma _e)}}\). \(\square \)

Lemma 6

For a good and realizable view \(\mathcal {T}\),

$$\begin{aligned} \Pr _{\mathsf {real}}[\mathcal {T}] = \Pr _{\mathsf {ideal}}[\mathcal {T}]. \end{aligned}$$

Proof of Lemma 6

Now we consider the real case. Since \(\mathbf{B1 }\) and \(\mathbf{B2 }\) do not hold with \(\mathcal {T}\), all inputs of the random function inside \(\mathcal {T}_e\) are distinct, which implies that the released \(\mathsf {Y}\)-values are independent and uniformly random. The variables in \(\mathcal {T}_e\) are uniquely determined given these \(\mathsf {Y}\)-values, and there are exactly \(q_e + \sigma _e\) distinct input–output pairs of \(\mathsf {R}\). Therefore, \(\Pr _{\mathsf {real}}[\mathcal {T}_e]\) is exactly \(2^{-n(q_e + \sigma _e)}\) (note that \(\Pr _{\mathsf {real}}[\mathcal {T}_e]\) is defined in exactly the same way as \(\Pr _{\mathsf {real}}[\mathcal {T}]\)). We also use the notation \(\Pr _{\mathsf {real}}[\mathcal {T}_f| \mathcal {T}_e]\), which is defined in a similar way. We next evaluate

$$\begin{aligned} \Pr _{\mathsf {real}}[\mathcal {T}] = \Pr _{\mathsf {real}}[\mathcal {T}_e]\cdot \Pr _{\mathsf {real}}[\mathcal {T}_f|\mathcal {T}_e] = \frac{1}{2^{n(q_e + \sigma _e)}}\cdot \Pr _{\mathsf {real}}[\mathcal {T}_f|\mathcal {T}_e]. \end{aligned}$$
(6)

Lower Bounding \(\Pr _{\mathsf {real}}[ \mathcal {T}_f|\mathcal {T}_e]\) We observe that \(\Pr _{\mathsf {real}}[\mathcal {T}_f|\mathcal {T}_e]\) is equal to \(\Pr _{\mathsf {real}}[\bot _\mathsf {all}|\mathcal {T}_e]\), where \(\bot _\mathsf {all}\) denotes the event that \(\mathsf {Z}^{*}_{i}=\bot \) for all \(i=1,\dots ,q_f\), as the other variables in \(\mathcal {T}_f\) are determined by \(\mathcal {T}_e\).

Let \(\eta \) denote the event that, for all \(i=1,\ldots ,q_f\), \(\mathsf {X}^*_i[j]\) for \(p_i<j\le \ell ^*_i\) does not collide with the \(\mathsf {X}\)-values in \(\mathcal {T}_e\) or with \(\mathsf {X}^*_i[j']\) for any \(j'\ne j\). For \(j=p_i+1\), this condition is fulfilled since \(\mathbf{B4 }\) does not hold, and thus \(\mathsf {Y}^*_i[p_i+1]\) is uniformly random; hence, \(\mathsf {X}^*_i[p_i+2]\) is also uniformly random, due to the property of the feedback function (here, observe that the mask addition between the chains of \(\mathsf {Y}^*_i[j]\) to \(\mathsf {X}^*_i[j+1]\) does not reduce the randomness).

Now we have

$$\begin{aligned} \Pr _{\mathsf {real}}[\bot _\mathsf {all}|\mathcal {T}_e] = 1-\Pr _{\mathsf {real}}[(\bot _\mathsf {all})^c|\mathcal {T}_e], \end{aligned}$$

and we also have

$$\begin{aligned} \Pr _{\mathsf {real}}[(\bot _\mathsf {all})^c|\mathcal {T}_e] = \Pr _{\mathsf {real}}[(\bot _\mathsf {all})^c,\eta |\mathcal {T}_e] + \Pr _{\mathsf {real}}[(\bot _\mathsf {all})^c,\eta ^c|\mathcal {T}_e]. \end{aligned}$$

Here, \(\Pr _{\mathsf {real}}[(\bot _\mathsf {all})^c,\eta |\mathcal {T}_e]\) is the probability that at least one \(T^*_i\) for some \(i=1,\dots ,q_f\) is a correct guess of \(\mathsf {Y}^*_i[\ell ^*_i]\). Since \(\mathbf{B5 }\) does not hold with \(\mathcal {T}\), this probability is 0.

For \(\Pr _{\mathsf {real}}[(\bot _\mathsf {all})^c,\eta ^c|\mathcal {T}_e]\), which is at most \(\Pr _{\mathsf {real}}[\eta ^c|\mathcal {T}_e]\), the above observation suggests that it can be evaluated by counting the number of possible bad pairs (i.e., pairs whose collision violates \(\eta \)) among all the \(\mathsf {X}\)-values in \(\mathcal {T}_e\) and all \(\mathsf {X}^*\)-values in \(\mathcal {T}_f\), as well as collisions between two \(\mathsf {X}^*\)-values within \(\mathcal {T}_f\), in the same manner as the collision analysis of, e.g., CBC-MAC using \(\mathsf {R}\). Since \(\mathbf{B4 }\) does not hold with \(\mathcal {T}\), \(\Pr _{\mathsf {real}}[(\bot _\mathsf {all})^c,\eta ^c|\mathcal {T}_e]\) is 0. Hence, \(\Pr _{\mathsf {real}}[\mathcal {T}_f|\mathcal {T}_e]\) is

$$\begin{aligned} \Pr _{\mathsf {real}}[\mathcal {T}_f|\mathcal {T}_e] = \Pr _{\mathsf {real}}[\bot _\mathsf {all}|\mathcal {T}_e] = 1. \end{aligned}$$

Combining all, we have

$$\begin{aligned} \mathsf {ip}_{\mathsf {real}}(\mathcal {T}) = \Pr _{\mathsf {real}}[\mathcal {T}]&= \frac{1}{2^{n(q_e + \sigma _e)}}\cdot \Pr _{\mathsf {real}}[\mathcal {T}_f|\mathcal {T}_e] = \Pr _{\mathsf {ideal}}[\mathcal {T}] = \mathsf {ip}_{\mathsf {ideal}}(\mathcal {T}). \end{aligned}$$

\(\square \)

6 Hardware Implementation of COFB

6.1 Overview

COFB primarily aims to achieve a lightweight implementation on small hardware devices. For such devices, the hardware resource for implementing memory is often the dominant factor in the size of the entire implementation, and scalability by parallelizing the internal components is not needed. In this respect, COFB’s small state size and completely serial operation are quite desirable.

For implementation aspects, COFB is simple, as it consists of a blockcipher and several basic operations (bitwise XOR, the feedback function, and constant multiplications over \(\text {GF}(2^{n/2})\)). Combined with the small state size, this implies that the implementation size of COFB is largely dominated by the underlying blockcipher. In this section, we provide hardware implementation details of COFB using two blockciphers, AES and GIFT. Here, GIFT is a family of lightweight blockciphers proposed by Banik et al. [15]. It employs a structure similar to PRESENT [21] while improving efficiency by carefully chosen S-boxes and bit permutations. It has 64-bit and 128-bit block versions, both with a 128-bit key. We write GIFT-128, or simply GIFT, to denote the 128-bit block version. We write COFB[AES] and COFB[GIFT] to denote COFB using AES-128 and GIFT-128, respectively.

Table 2 Clock cycles per message byte for \({\textsf {COFB[AES]}}\)

We provide the number of clock cycles needed to process input bytes, as a conventional way to estimate the speed. Here, COFB[AES] taking an a-block AD (associated data) and an m-block message needs \(12(a+m) + 23\) cycles. Table 2 shows the average number of cycles per input message byte, which we call cycles per byte (cpb), assuming the AD has the same length as the message and the underlying blockcipher has a 128-bit block. That is, the table shows \((12\cdot 2m + 23)/16m\).

Table 3 Clock cycles per message byte for COFB[GIFT]

Similarly, COFB[GIFT] needs \(41\cdot (a + m) + 81\) cycles for an a-block AD and an m-block message. Table 3 shows the corresponding cpb under the same assumptions, i.e., \((41\cdot 2m + 81)/16m\).
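The cpb figures behind Tables 2 and 3 follow directly from the two cycle-count formulas above; a small script computing them (assuming, as in the text, AD of the same length as the message and 16-byte blocks):

```python
# Cycles-per-byte (cpb) for COFB[AES] and COFB[GIFT], with a-block AD equal
# in length to the m-block message and a 128-bit (16-byte) block.
def cpb_aes(m: int) -> float:
    # COFB[AES]: 12(a+m) + 23 cycles in total, with a = m.
    return (12 * 2 * m + 23) / (16 * m)

def cpb_gift(m: int) -> float:
    # COFB[GIFT]: 41(a+m) + 81 cycles in total, with a = m.
    return (41 * 2 * m + 81) / (16 * m)

# As m grows, the constant terms amortize: cpb tends to 24/16 = 1.5 for
# COFB[AES] and 82/16 = 5.125 for COFB[GIFT].
for m in (1, 16, 1024):
    print(m, round(cpb_aes(m), 3), round(cpb_gift(m), 3))
```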

6.2 Hardware Implementation of \(\textsf {COFB[BC]}\) Without CAESAR Hardware API

We describe the implementation details of both \(\textsf {COFB[AES]}\) and \(\textsf {COFB[GIFT]}\) without the CAESAR hardware API. These are basic round-based implementations without any pipelining, employing a modular architecture. We primarily focus on the encryption-only circuit; however, the combined encryption and decryption circuit should incur only a small overhead thanks to the inverse-freeness (i.e., no blockcipher decryption routine is needed) and the simplicity of the mode. Due to the similarity between the associated data and the message processing phases, the same hardware modules are used in both phases. A single-bit switch is used to distinguish between the two types of input data. The main architecture consists of the modules described below. We remark that there is also a Finite State Machine (FSM) which controls the flow by sending signals to these modules. The FSM has a rather simple structure and is described below. The overall hardware architecture is depicted in Fig. 8. We would like to mention that both versions can be described with the same hardware architecture as they have exactly the same interface. Hence, we often write BC for the underlying blockcipher, where BC \(\in \)\(\{\textsf {AES-128}, \textsf {GIFT-128}\}\). We also assume that BC comprises r rounds.

  1. 1.

    State Registers The state registers are used to store the intermediate states after each iteration. We use a 128-bit \(\mathbf{State }\) register to store the 128-bit BC block state, a 64-bit \(\varDelta \) register to store the 64-bit mask applied to each BC input, and a 128-bit \(\mathbf{Key }\) register to store the 128-bit key. The round key of BC is stored in an additional 128-bit register (\(\mathbf{Round Key }\)); however, this is included in the BC module.

  2. 2.

    BC Round The BC round function module runs one BC round computation and produces a 128-bit output, using two 128-bit inputs: one from the \(\mathbf{State }\) register and the other from the (internal) \(\mathbf{Round Key }\) register. The latter register is initialized by loading the master key, stored in the \(\mathbf{Key }\) register, each time the BC function is invoked. The output of the BC module is stored in the \(\mathbf{State }\) register, which is the input for the next round. The entire operation is serial, while the internal round computation and the round key generation run in parallel, and a full BC encryption needs \({r+1}\) cycles.

  3. 3.

    Feedback Function \(\rho \) The \(\rho \) module computes the linear feedback function \(\rho \) on the 128-bit data block and the 128-bit intermediate state value (the output of the BC computation). The outputs are a 128-bit ciphertext block and a 128-bit intermediate state (to be masked and stored in the \(\mathbf{State }\) register).

  4. 4.

    Mask Update The \(\mathbf{uMask }\) module updates the mask stored in the \(\varDelta \) register. It receives the current mask value and updates it by multiplying with \(\alpha \), \((1 + \alpha )\), or \((1 + \alpha )^2\), based on signals generated by the FSM that indicate the end of the message and whether the final block is complete.

  5. 5.

    FSM The control of the complete design can be described by a finite state machine (FSM). We provide a separate and simple view of the FSM in Fig. 7. The FSM consists of 9 states and starts with \(Reset\_St\). This is an idle state, followed by \(Load\_St\), which initializes the BC state by loading the nonce (before the first BC invocation). After the initialization, the FSM enters the BC invocation phase to encrypt the nonce. This phase consists of \(BC\_Reset\_St\) to reset the BC parameters, \(BC\_Start\_St\) for key whitening, \(BC\_Round\_St\) to run one BC round, and \(BC\_Done\_St\) to indicate the end of the BC invocation. Depending on whether the current blockcipher call is the final one, the FSM either releases the tag or enters \(Compute\_\rho \_Add\_Mask\_St\), which computes the \(\rho \) function, updates the mask, and partially masks the blockcipher input. The FSM sends two additional bits: \(\textsf {EOM}\) to denote the end of the data blocks and \(\textsf {isComplete}\) to denote whether the last data block is complete. Next, it enters \(BC\_Reset\_St\) for the next blockcipher invocation. After the last BC invocation, it enters \(Release\_Tag\_St\). Finally, the FSM enters the end state. We use a 4-bit register to keep track of the states. It is to be noted that, in addition to the state transitions, the FSM also sends the relevant signals to the top modules.
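To make the mask update of the \(\mathbf{uMask }\) module concrete, the following sketch implements the three multiplications in \(\text {GF}(2^{64})\). Both the field polynomial (\(x^{64}+x^4+x^3+x+1\), a common choice for \(\text {GF}(2^{64})\)) and the mapping of the \(\textsf {EOM}\)/\(\textsf {isComplete}\) signals to the multipliers are illustrative assumptions here; the COFB specification fixes its own representation and signal semantics.

```python
# Sketch of the uMask update: the 64-bit mask Delta is multiplied by alpha,
# (1 + alpha), or (1 + alpha)^2 in GF(2^64), selected by the FSM signals.
# ASSUMPTIONS: field polynomial x^64 + x^4 + x^3 + x + 1 and the
# signal-to-multiplier mapping below are placeholders for illustration.
MASK64 = (1 << 64) - 1
POLY = 0b11011  # low 64 bits of x^64 + x^4 + x^3 + x + 1

def times_alpha(delta: int) -> int:
    """Multiply Delta by alpha (the class of x): shift, then reduce if needed."""
    out = delta << 1
    if out >> 64:                      # a degree-64 term appeared: reduce
        out = (out & MASK64) ^ POLY
    return out

def update_mask(delta: int, eom: bool, is_complete: bool) -> int:
    if not eom:                        # intermediate block: alpha * Delta
        return times_alpha(delta)
    d = delta ^ times_alpha(delta)     # (1 + alpha) * Delta
    if is_complete:                    # complete final block (assumed mapping)
        return d
    return d ^ times_alpha(d)          # (1 + alpha)^2 * Delta
```

Multiplication by \(\alpha \) is just a shift with a conditional XOR of the reduction polynomial, which is why the mask update is cheap in hardware.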

Fig. 7
figure 7

FSM for \(\textsf {COFB}[BC]\) hardware implementation

Fig. 8
figure 8

Hardware circuit diagram

Basic Implementation We describe the basic flow of our implementation, which generally follows the pseudocode of Fig. 5. Prior to the initialization, the \(\mathbf{State }\) register is loaded with \(0^{64}\,\Vert \,N\). Once the \(\mathbf{State }\) register is initialized, the initialization process starts by encrypting the nonce (\(0^{64}\,\Vert \,N\)) with BC. Then, 64 bits of the encrypted nonce are chopped by the "\(\text {chop}\)" function as shown in Fig. 8, and this chopped value is stored in the \(\varDelta \) register (this initializes \(\varDelta \)). After the initialization, 128-bit associated data blocks are fetched and sent to the \(\rho \) module along with the previous BC output to produce a 128-bit intermediate state. This state is partially masked with the 64-bit \(\varDelta \) for every BC call. After all the associated data blocks are processed, the message blocks are processed in the same manner, except that the \(\rho \) function produces 128-bit ciphertext blocks in addition to the intermediate state values. Finally, after the message processing is over, the tag is generated using an additional BC call.
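The flow above can be sketched schematically. The following is emphatically NOT the real COFB algorithm: bc is a hash-based stand-in for the blockcipher, rho is a placeholder for the feedback function of Fig. 5, and the padding, masking rule, and mask update are simplified placeholders chosen only to make the control flow (nonce encryption, chop, AD phase, message phase, tag) concrete.

```python
# Schematic dataflow only; bc(), rho(), and the mask update are placeholders,
# not the definitions from the COFB specification.
import hashlib

def bc(key: bytes, block: bytes) -> bytes:
    """Placeholder 'blockcipher': a hash truncated to one 16-byte block."""
    return hashlib.sha256(key + block).digest()[:16]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def rho(y: bytes, m: bytes):
    """Placeholder feedback function: ciphertext block and next BC input."""
    c = xor(y, m)
    x = xor(y[8:] + y[:8], m)          # stand-in for the real linear map
    return c, x

def cofb_like_encrypt(key: bytes, nonce: bytes, ad: list, msg: list):
    state = bytes(8) + nonce           # State <- 0^64 || N (8-byte nonce here)
    y = bc(key, state)                 # encrypt the nonce
    delta = y[:8]                      # "chop": initialize the 64-bit Delta
    cipher = []
    for i, blk in enumerate(ad + msg):
        c, x = rho(y, blk)             # feedback function on BC output + data
        if i >= len(ad):               # ciphertext released only for message
            cipher.append(c)
        delta = delta[1:] + delta[:1]  # placeholder mask update
        x = xor(x[:8], delta) + x[8:]  # partial masking of the BC input
        y = bc(key, x)
    return cipher, y                   # final BC output serves as the tag
```

The point of the sketch is the single-pass, serial structure: every 16-byte block triggers exactly one bc call, and only one block-sized state plus the half-block mask is carried between iterations.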

Combined Encryption and Decryption As mentioned earlier, we focus here on the encryption-only circuit. However, due to the similarity between the encryption and the decryption modes, a combined hardware for encryption and decryption can be built with a small increase in area, with the same throughput. This can be done by adding control flow driven by a binary signal for mode selection.

6.3 Implementation Results Without CAESAR Hardware API

We have implemented both \(\textsf {COFB[AES]}\) and COFB[GIFT] on Xilinx Virtex 6 and Virtex 7, using VHDL and Xilinx ISE 13.4. Table 4 presents the mapped implementation results of both schemes on Virtex 7 with the target device \(\textsf {xc7vx330t}\) and on Virtex 6 with the target device \(\textsf {xc6vlx760}\). We employ an RTL approach and a basic iterative architecture (128-bit round-based implementation). The areas are listed in the number of Slice Registers, Slice LUTs, and Occupied Slices. We also report frequency (MHz), throughput (Gbps), and throughput–area efficiency. In this paper, we have slightly optimized the implementation in [25, 26] to get a better estimate of the number of slice registers.

Table 4 FPGA implementation results of COFB[AES] and COFB[GIFT]

For AES-128, we use the implementation available from Athena [1] maintained by George Mason University. This implementation stores all the round subkeys in a single register to make the AES implementation faster and parallelizable. However, the main motivation of COFB is to reduce hardware footprint. Hence, we change the above implementation to a sequential one such that it processes only one AES round in a single clock cycle. This in turn eliminates the need to store all the round subkeys in a single register and reduces the hardware area consumed by the AES module.

For GIFT-128, we use our own FPGA implementation. The implementation is round based without any pipelining. The architecture uses three registers, State, RK, and Round, to hold the blockcipher state, the current round key, and the round counter, respectively. The architecture is divided into the modules SN, BP, ARK, ARC, and UKEY. The SN module applies a 4-bit S-box to each of the 4-bit nibbles of the state. BP applies the bit permutation on the state. ARK performs the round key addition on the state, and ARC applies the round constant addition on the state. UKEY updates the round key and stores it in RK. The architecture also uses another module, EXT, to extract a part of the round key to be added to the state. The hardware implementation results in terms of slice registers, slice LUTs, and slices are presented in Table 4.

6.4 Hardware Flexibility of the COFB Design

COFB is itself very lightweight; it uses only a few operations other than the blockcipher computation. In Table 5, we present the hardware area occupied by the blockcipher and the other modules for both COFB[AES] and COFB[GIFT] on Virtex 6. We observe that COFB[AES] has a low hardware footprint, the majority of which is used by AES, whereas the implementation size of COFB[GIFT] is much smaller as the underlying blockcipher GIFT is much lighter than AES. This shows that an implementation-area-optimized blockcipher yields the most efficient instantiation.

Table 5 COFB[AES] and COFB[GIFT]: area utilization by modules in Virtex 6

6.5 Hardware Implementation of \(\textsf {COFB}[BC]\) with CAESAR Hardware API

We implement \(\textsf {COFB}[BC]\) (both encryption and decryption) using the CAESAR Hardware API with the motivation of a fair comparison. We would like to mention that the architecture described in Sect. 6.2 differs from this one in terms of the user interface and the design of the control unit. For example, in the previous implementation, we assume that the nonce, the associated data blocks, and the message blocks are supplied separately to the main module. In this case, all of them are supplied as input blocks with a corresponding data type. Also, the previous architecture is encryption-only, whereas this architecture supports both encryption and decryption. The implementation with the GMU API adds a significant overhead over the previous results.

The top-level module is \(\textsf {AEAD}\), which invokes 4 low-level modules: \(\textsf {PreProcessor}\), \(\textsf {PostProcessor}\), \(\textsf {CMD FIFO}\), and \(\textsf {Cipher-Core}\). We primarily focus on the \(\textsf {Cipher-Core}\) module, as the GMU package [7] already provides code for the other modules; for those, we need not concentrate on the internal structures and only set certain parameters according to our design criteria. A top-level block diagram of this API is given in [41].

Cipher-core Module The Cipher-core module describes the core architecture of \(\textsf {COFB}[BC]\). It invokes Cipher-core-controller and Cipher-core-datapath modules. The controller describes the finite state machine for the design, and the datapath describes the data processing flow between the internal circuit and sub-modules. The Cipher-core-controller and the Cipher-core-datapath modules are described below.

Cipher-core-controller Module The Cipher-core-controller module describes the finite state machine (FSM) for the CAESAR hardware API-based implementation of COFB. At the top layer, the controller uses the standard state descriptions provided in the specification for the CAESAR hardware API [41], along with a few extra states compatible with our design. The standard states are listed below:

  • \(S\_RESET\)

  • \(S\_KEY\_CHECK\)

  • \(S\_NPUB\_READ\)

  • \(S\_DATA\_LOAD\)

  • \(S\_AD\_PROCESS\)

  • \(S\_AD\_PROCESS\_LAST\_BLOCK\)

  • \(S\_DATA\_PROCESS\)

  • \(S\_DATA\_PROCESS\_LAST\_BLOCK\)

  • \(S\_GEN\_VER\_TAG\).

Our design requires a few extra states:

  • \(S\_NONCE\_PROCESS\)

  • \(S\_BC\_START\)

  • \(S\_BC\_DONE\)

  • \(S\_COMPUTE\_RHO\_AND\_TWEAK\).

The internal BC module runs its own FSM with \(BC\_Reset\_St\), \(BC\_Start\_St\), \(BC\_Round\_St\), and \(BC\_Done\_St\). The FSM for the BC module follows the same structure as described in Sect. 6.2.

The control enters through the \(S\_RESET\) state and consequently passes through the \(S\_KEY\_CHECK\) and \(S\_NPUB\_READ\) states according to the specification. \(S\_NPUB\_READ\) is followed by \(S\_DATA\_LOAD\), which in turn is followed by \(S\_NONCE\_PROCESS\), \(S\_AD\_PROCESS\), or \(S\_DATA\_PROCESS\), depending on the data type. If the block data input is not valid, the control returns to \(S\_DATA\_LOAD\). Otherwise, from each of these 3 states, the control enters \(S\_BC\_START\) to denote a BC invocation. Note that there are four intermediate states between \(S\_BC\_START\) and \(S\_BC\_DONE\) corresponding to the BC module. After the BC computation is complete (as signaled by \(S\_BC\_DONE\)), the control goes to \(S\_COMPUTE\_RHO\_AND\_TWEAK\). For the last input block of a certain data type, the control enters \(S\_AD\_PROCESS\_LAST\_BLOCK\) (or \(S\_DATA\_PROCESS\_LAST\_BLOCK\)). If it is not the last block, the control reverts to \(S\_AD\_PROCESS\) (or \(S\_DATA\_PROCESS\), respectively). Finally, after the message is processed, the FSM enters \(S\_GEN\_VER\_TAG\), where the tag is either generated (for encryption) or verified (for decryption).
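The happy-path transitions just described can be summarized as a small transition function. The sketch below elides the BC-internal states, decryption, and error handling; the successor states chosen for the two LAST_BLOCK states are a plausible reading of the text, not taken verbatim from the specification.

```python
# Simplified, encryption-only sketch of the Cipher-core-controller FSM.
# BC-internal states and error handling are elided; LAST_BLOCK successors
# are assumptions for illustration.
def next_state(state: str, ev: str = "") -> str:
    """Return the successor state; ev carries the data-type/last-block info."""
    fixed = {
        "S_RESET": "S_KEY_CHECK",
        "S_KEY_CHECK": "S_NPUB_READ",
        "S_NPUB_READ": "S_DATA_LOAD",
        "S_NONCE_PROCESS": "S_BC_START",
        "S_AD_PROCESS": "S_BC_START",
        "S_DATA_PROCESS": "S_BC_START",
        "S_BC_START": "S_BC_DONE",                 # four BC-internal states elided
        "S_BC_DONE": "S_COMPUTE_RHO_AND_TWEAK",
        "S_AD_PROCESS_LAST_BLOCK": "S_DATA_LOAD",  # assumed: fetch message next
        "S_DATA_PROCESS_LAST_BLOCK": "S_GEN_VER_TAG",
        "S_GEN_VER_TAG": "S_RESET",                # assumed return to idle
    }
    if state == "S_DATA_LOAD":
        # Branch on the data type; invalid input returns to S_DATA_LOAD.
        return {"npub": "S_NONCE_PROCESS", "ad": "S_AD_PROCESS",
                "msg": "S_DATA_PROCESS"}.get(ev, "S_DATA_LOAD")
    if state == "S_COMPUTE_RHO_AND_TWEAK":
        return {"last_ad": "S_AD_PROCESS_LAST_BLOCK",
                "last_msg": "S_DATA_PROCESS_LAST_BLOCK",
                "ad": "S_AD_PROCESS", "msg": "S_DATA_PROCESS"}[ev]
    return fixed[state]
```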

Control Signals The controller uses signals described in [41] as well as a few additional signals. The controller accepts the following additional input signals:

  1. 1.

    \(\textsf {bc\_done}\) (to denote the completion of BC invocation) and

  2. 2.

    \(\textsf {tag\_match}\) (to denote verification).

It outputs the following additional output signals:

  1. 1.

    \(\textsf {delta\_load}\) signal (to denote whether to load the \(\varDelta \) register or to update it)

  2. 2.

    \(\textsf {sel\_ed}\) signal (to denote whether the input to the BC module is the nonce or the internal feedback value)

  3. 3.

    \(\textsf {bc\_start}\) (to denote the invocation of the blockcipher)

  4. 4.

    \(\textsf {init\_key}\) (to initialize the key register)

Cipher-core-datapath Module This module accepts inputs from both the Cipher-core and the Cipher-core-controller modules. It accepts the secret key, nonce, message blocks, associated data blocks, and several other control signals described in the specification [41]. It also accepts the additional signals \(\textsf {delta\_load}\), \(\textsf {sel\_ed}\), \(\textsf {bc\_start}\), and \(\textsf {init\_key}\) generated by the Cipher-core-controller. It outputs the ciphertext block, the tag, and two additional signals, \(\textsf {bc\_done}\) and \(\textsf {tag\_match}\). It invokes internal modules for the BC round functions, \(\rho \) computations, and \({\varDelta }\) update functions. These modules are already described in Sect. 6.2.

6.6 Implementation Results with CAESAR Hardware API

We implement both \(\textsf {COFB[AES]}\) and COFB[GIFT] using the CAESAR hardware API on the same platform. We also implement our design on Spartan 6 with the target device \(\textsf {xc6slx150t}\) for the sake of comparison with the other results on Spartan 6. Table 6 presents the detailed implementation results for \(\textsf {COFB[AES]}\) and \(\textsf {COFB[GIFT]}\).

Table 6 Implementation results of COFB[AES] and COFB[GIFT] using CAESAR hardware API

6.7 Area Overhead Due to CAESAR Hardware API

We observe from Tables 4 and 6 that there are significant overheads both in hardware area (for example, a 40% increase in LUTs and a 30% increase in Slices for COFB[AES] on Virtex 6) and in throughput (for example, a 42% decrease in throughput for COFB[GIFT] on Virtex 6). Though the CAESAR API favors a basic iterative architecture, the strict requirements on the control unit design and the input handling mechanism cut the efficiency significantly.

Table 7 Comparison on Virtex 6 [2] (Results are taken by selecting \(\mathbf{CAESAR Round 2 }\) and \(\mathbf{Standards }\) implemented results (following CAESAR API) in [2])
Table 8 Comparison on Virtex 7 [2]

6.8 Benchmarking with ATHENa Database

We compare the implementation results of \(\textsf {COFB}[BC]\) with the results published in the ATHENa Database [2], taking Virtex 6 and Virtex 7 as our target platforms. We also include the optimized results from [45] for benchmarking, only on the Virtex 6 platform. In the first implementation, we ignore the overhead to support the CAESAR API, the fact that ours is encryption-only while the others (to the best of our knowledge) support both encryption and decryption, and the differences in the achieved security level, both quantitative and qualitative. To provide a fair benchmarking, we also provide CAESAR hardware API-based implementations of \(\textsf {COFB}[BC]\). We found that, even though these implementations incur a significant overhead, we still achieve a highly competitive result after adding the circuitry for supporting the CAESAR API and decryption. In Table 7, we provide comparisons for Virtex 6, and in Table 8, for Virtex 7. Note that we also add two custom API-based implementation results (for CLOC-AES-Optimized and NORX-Optimized) provided in [45]. The other results in [45] were obtained on an ASIC platform and hence are not comparable with our figures.

Finally, in Table 9, we provide a benchmark of \(\textsf {COFB}[BC]\)-CAESAR-API on Spartan 6. We use the results in [32, 68] as well as a few others (such as JAMBU-SIMON) in [2]. We observe that some of the implementations (such as ASCON, SILC, and Ketje) on this platform support a specialized lightweight API (in addition to the CAESAR API) and hence achieve a smaller hardware area than ours, which supports the standard CAESAR hardware API with a basic iterative architecture.

Table 9 Comparison on Spartan 6 using results from [2, 32, 68]

We also remark that it is inherently difficult to compare \(\textsf {COFB}\) using AES-128 or GIFT-128 with other non-blockcipher-based AE schemes in a fair way, because of the differences in the primitives and in the types of security guarantee. For example, ACORN is built from scratch, does not have any provable security result, and has been subjected to several cryptanalyses [29, 46, 58, 59]. Sponge AE schemes (ASCON, Ketje, NORX, and PRIMATES-HANUMAN) use a keyless permutation of a large block size to avoid the key scheduling circuit and have provable security relying on the random permutation model.

7 Conclusion

This paper presents \(\textsf {COFB}\), a blockcipher mode for AE focusing on minimizing the state size. When instantiated with an n-bit blockcipher, \(\textsf {COFB}\) operates at rate 1, requires a state size of 1.5n bits, and is provably secure up to \(O(2^{n/2}/n)\) queries based on the standard PRP assumption on the blockcipher. In fact, this is the first scheme fulfilling all these features at once. A key idea of \(\textsf {COFB}\) is a new type of feedback function combining both plaintext and ciphertext blocks. We first present an idealized version of COFB, named iCOFB, along with its provable security analysis. We then instantiate COFB with the AES-128 blockcipher. We also present hardware implementation results for COFB with the AES-128 and GIFT-128 blockciphers (both with and without the CAESAR Hardware API). These implementations demonstrate the effectiveness of our approach.