1 Introduction

The reversible logic has demonstrated promising results in emerging applications of computing paradigm such as quantum computation and nanotechnology [1, 2]. Reversible logic was first related to power when Landauer stated that information loss due to functional irreversibility leads to power dissipation [3]. This principle is further supported by Bennett that zero power dissipation can be achieved only when the circuit contains reversible gates [4]. A cryptographic algorithm is an essential part in building secure systems to protect information against attacks. A well-known cryptographic algorithm is the Data Encryption Standard [5], which has been widely adopted in many security products. However, serious considerations arise for long-term security because of the relatively short key word length of only 56 bits and due to the highly successful cryptanalysis attacks.

In November 2001, the National Institute of Standards and Technology (NIST) of the United States chose the Rijndael algorithm as the suitable Advanced Encryption Standard (AES) [6] to replace the DES algorithm. Since then, many hardware implementations have been proposed in literature. Some of the works target field programmable gate arrays (FPGA) [7] as the implementation platform and some target application-specific integrated circuits (ASIC) [8]. But both FPGA and ASIC implementations consume power from its source which leads to information leakage through side channel attacks.

Quantum computing is a new paradigm for implementing cryptographic algorithms using reversible gates [9]. The primary reason for this is the increasing demands for low power and increased security for the computing devices. As our computing demands become more complex, the power requirements tend to increase. This further leads to a prominent side channel attack namely power analysis attacks on cryptographic systems, notably categorized as simple power analysis (SPA), differential power analysis (DPA) and high order power analysis (HO-PA) attacks. Cryptographic systems implemented with reversible gates ideally consume zero power and hence thwarts all side channel attacks related to power analysis. Energy efficient implementations of reversible building blocks to counterattack power analysis attacks were proposed in [10]. By using proper charge sharing mechanism, resistance to power analysis attacks has been achieved in their proposed implementations.

In Ref. [11], an attempt has been made to apply reversible logic to develop secure cryptosystems against power analysis attacks. A prototype of reversible Arithmetic and Logic Unit (ALU) for crypto-processor was presented in that work. Reversible Montgomery multipliers were proposed to resist power analysis attacks in crypto hardware [12]. The various functional blocks in AES have been synthesized using Toffoli family of reversible gates in [13]. The reversible gate design of AES SubBytes and Inverse SubBytes transformations were proposed in [14]. The reversible gate design of finite field architectures for Elliptic Curve Cryptography was proposed in [15].

In the current work, a well optimized reversible gate design of 128-bit AES cryptographic algorithm is presented. First, the key transformations in the algorithm are synthesized by using conventional logic gates. Then the transformations are mapped to reversible logic and are synthesized by using reversible logic gates. The reversible logic synthesis is optimized by reusing the existing reversible gates wherever applicable so that the performance metrics are improved.

The paper is organized as follows. The background of reversible logic gates and their quantum cost are given in Sect. 2. Section 3 highlights the transformation steps in AES cryptographic algorithm. The proposed reversible building blocks of AES algorithm are explained in Sect. 4. The performance analysis of the proposed reversible design with the existing designs is given in Sect. 5. Section 6 concludes the paper with necessary references.

2 Background on Reversible Logic Gates

Reversible logic marks a promising new direction where all operations are performed in an invertible manner. That is, in contrast to traditional logic, all computations can be reverted (i.e. the inputs can be obtained from the outputs and vice versa). In addition, reversible logic gates are bijective transformations on the inputs, i.e. number of input and outputs are equal and every distinct input gives a distinct output. Reversible gates do not allow any fanout or feedback. The commonly used reversible gates are NOT, controlled-NOT alias CNOT (also called as Feynman gate), controlled–controlled-NOT alias CCNOT (also called as Toffoli gate), SWAP and Fredkin gates as shown in Fig. 1. Among these, NOT, CNOT and Toffoli gates are basically n-bit controlled Toffoli gate with control lines n = 0, 1 and 2 respectively. The behavior of few reversible gates is defined as follows:

Fig. 1
figure 1

Reversible gates

  • NOT: a′ = 1 ⊕ a

  • CNOT: a′ = a, b′ = a ⊕ b

  • TOFFOLI: a′ = a, b′ = b, c′ = c ⊕ ab

  • SWAP: b′ = a, a′ = b

  • FREDKIN: a′ = a, b′ = a′b + ac, c′ = ab + a′c.

The reversible logic synthesis can be done either with Toffoli gate family or Fredkin gate family since both are universal gates. In our proposed reversible design, Toffoli family of reversible gates has been used for reversible logic synthesis. The quantum cost of Toffoli gate with 0, 1 and 2 control lines is 1, 1, and 5 respectively. The performance metrics considered for reversible gate design are number of ancilla inputs, number of garbage outputs, number of reversible gates used, its quantum cost and delay in terms of number of stages. Ancilla inputs (Constants with either 0 or 1) and Garbage outputs are information that is not needed for the actual computation. They are required since the reversibility necessitates an equal number of outputs and inputs. Quantum cost denotes the effort needed to transform a reversible circuit to a quantum circuit [16].

3 Background on AES Algorithm

The AES algorithm (FIPS 2001) is a symmetric block cipher that processes data blocks of 128 bits using a cipher key of length 128, 192, or 256 bits. Each data block consists of a 4 × 4 array of bytes called the state S, on which the basic operations of the AES algorithm are performed. In the encryption process, after an initial round key addition, a round function consisting of four different transformations—SubBytes, ShiftRows, MixColumns and AddRoundKey—is applied to the data block in the encryption process.

The SubBytes transformation is a nonlinear byte substitution that operates independently on each byte of the state S using a substitution table (S-Box). The ShiftRows operation is a circular shifting on the rows of the state with different numbers of bytes (offsets). The MixColumns transformation mixes the bytes in each column of the state by the multiplication with a fixed polynomial modulo x4 + 1. AddRoundKey is an XOR operation that adds a round key to the state S in each iteration, where the round keys are generated during the key expansion phase. The round function is performed iteratively 10, 12, or 14 times (Nr), depending on the key length of 128, 192 or 256-bits respectively.

The MixColumns transformation is not applied to the final round. The complete AES encryption steps are outlined in Fig. 2. The AES decryption steps can be performed by two ways such as inverse cipher and equivalent inverse cipher. In the Inverse Cipher process, shown in Fig. 3, the sequence of the transformations differs from that of the encryption, while the sequence of the key schedules for encryption and decryption remains the same. In the equivalent inverse cipher process, shown in Fig. 4, the sequence of transformations is same as the encryption. This is accomplished with a change in the key schedule.

Fig. 2
figure 2

AES Encryption process

Fig. 3
figure 3

AES decryption using inverse cipher process

Fig. 4
figure 4

AES decryption using equivalent inverse cipher process

4 Proposed Reversible Building Blocks of AES Transformations

In this research, the major transformations of AES algorithm such as SubBytes [14], MixColumns, ShiftRows, AddRoundKey and Key scheduler are deduced using conventional logic gates. Then, conventional logic gates are mapped to reversible logic gates and are reused wherever possible in order to optimize the performance metrics.

The proposed reversible gate design of AES algorithm utilizes Toffoli family of reversible gates for their logic synthesis and is optimized in terms of reduced number of ancilla inputs, garbage outputs, gate count, quantum cost and delay. The functional verification of the proposed reversible gate designs is carried out by writing Verilog code for each reversible gate and integrating them. Xilinx ISE 13.2 is used for simulation purpose.

4.1 SubBytes and InvSubBytes Transformations

In the encryption module, the SubBytes transformation is a non-linear transformation, which computes the multiplicative inverse of each byte of the state S in GF(28) with irreducible polynomial P(x) = x8 + x4 + x3 + x + 1 followed by an affine transformation. The transformation in the decryption module performs the inverse of the corresponding transformation in the encryption module.

4.1.1 Design Approaches

The SubBytes and InvSubBytes transformations can be implemented by two different approaches. They can be either constructed as a single circuit whose input–output relation is directly equivalent to the SubBytes transformation known as Look-up Table (LUT) approach or constructed as a multiplicative inversion circuit and an affine transformation circuit independently. Then, these two circuits are cascaded to design the SubBytes transformation known as composite field arithmetic approach.

4.1.2 Composite Field Arithmetic Approach

Composite fields are frequently used in the design of Galois Field arithmetic where arithmetic operations rely on table lookups. Composite fields also known as sub-field arithmetic can be used to reduce lookup-related costs. Hence, another approach to design SubBytes and InvSubBytes transformations is the use of composite field arithmetic in the design of multiplicative inversion module. The idea of applying composite field arithmetic to AES algorithm is explored in detail in [17,18,19]. Applying composite field arithmetic, the elements of large-order fields are mapped to those of small-order fields in which the field operations can be carried out in a simpler way with reduced hardware cost. The functional blocks to design multiplicative inverse module in GF(28) is shown in Fig. 5.

Fig. 5
figure 5

Basic building blocks to design multiplicative inverse module in GF(28)

The details of each block in the GF(28) multiplicative inversion module are illustrated below.

  • δ → Isomorphic mapping from GF(28) to composite fields

  • x2 → Squarer in GF(24)

  • xλ → Multiplication with constant λ in GF(24)

  • ⊕ → Addition operation in GF(24)

  • x−1 → Multiplicative inversion in GF(24)

  • X → Multiplication operation in GF(24)

  • δ−1 → Inverse isomorphic mapping to GF(28).

The SubBytes transformation can be performed by taking the multiplicative inverse in GF(28) followed by affine transformation and the InvSubBytes transformation can be performed by inverse affine transformation followed by multiplicative inverse in GF(28). The sequence of steps to carry out both transformations is shown below.

  • SubBytes transformation: multiplicative inversion in GF(28) followed by Affine transformation.

  • InvSubBytes transformation: inverse affine transformation followed by Multiplicative inversion in GF(28).

4.1.3 Proposed Reversible Multiplicative Inverse Module in GF(28)

Since reversible gate designs are functionally reversible, it is sufficient to design either forward isomorphic mapping or inverse isomorphic mapping. The number of XOR operations required in the forward isomorphic mapping is 24 whereas inverse isomorphic mapping takes only 23 XOR operations. Hence, inverse isomorphic mapping function has been designed by using reversible gates in this research and the same design can be used for forward isomorphic mapping also.

The corresponding reversible gate design requires 23 CNOT gates which results in a quantum cost of 23. This calculation is based on the assumption that each XOR operation requires one CNOT gate when the reversible logic synthesis is performed by conventional design using one-to-one mapping from logic operations to equivalent reversible gates. However, in the proposed design, the gate count and quantum cost is optimized by properly reusing the existing reversible gates. The proposed reversible inverse isomorphic mapping block takes only 15 CNOT gates and has a quantum cost of 15 which gives 35% savings in both gate count and quantum cost compared to the conventional design. The proposed design has zero ancilla inputs, zero garbage outputs and has delay of 13 as shown in Table 1. The reversible gate design of forward/inverse isomorphic mapping block is shown in Fig. 6.

Table 1 Performance analysis of proposed reversible building blocks of multiplicative inverse module in GF(28)
Fig. 6
figure 6

Reversible IsoMap/InvIsoMap block

The reversible squarer and multiplication with constant λ block in GF(24) is shown in Fig. 7. The squarer block takes 4 CNOT gates and multiplication with constant λ block takes 4 CNOT gates when they are synthesized separately by using reversible gates. But the proposed reversible gate design of the merged squarer and multiplication with constant λ block takes only 4 CNOT gates and has a quantum cost of 4 which gives 50% savings in gate count and quantum cost compared to the individual designs. Also, it takes zero ancilla input, zero garbage output and has a delay of 3 as shown in Table 1.

Fig. 7
figure 7

Reversible squarer and multiplication with constant block

The addition operation in GF(24) can be performed by simple logical XOR operation. The reversible gate design of the GF(24) adder block is shown in Fig. 8, which takes 4 CNOT gates and has a quantum cost of 4. Also, it takes zero ancilla inputs, 4 garbage outputs and has a delay of 1 as shown in Table 1.

Fig. 8
figure 8

Reversible adder in GF(24)

The reversible gate design of GF(24) multiplier block requires 9 AND operations and 46 XOR operations. The reversible logic synthesis by one-to-one mapping takes 9 CCNOT gates and 46 CNOT gates which results in a quantum cost of 91. The proposed design is optimized by reusing the reversible gates properly so that the reversible GF(24) multiplier block takes only 9 CCNOT gates and 25 CNOT gates and has a quantum cost of 70 which results in 38% savings in Gate count and 23% savings in quantum cost. The proposed design takes 13 ancilla inputs, 17 garbage outputs and has a delay of 18 as shown in Table 1. The reversible gate design of the multiplication in GF(24) is shown in Fig. 9.

Fig. 9
figure 9

Reversible multiplier in GF(24)

The Multiplicative inverse in GF(24) can be calculated by three different approaches namely Square–Multiply approach, Multiple-Decomposition approach and Direct Mapping approach [18]. The computation of multiplicative inverse using Square and multiply approach is shown in Fig. 10. The corresponding reversible gate design can be obtained by using reversible squarer block and reversible GF(24) multiplier. The reversible gate design of multiplicative inverse in GF(24) using Square and Multiply approach takes 70 CNOT, 18 CCNOT gates and has a quantum cost of 160. Also, it takes 34 ancilla inputs, 42 garbage outputs and has a delay of 56 as shown in Table 2.

Fig. 10
figure 10

Multiplicative inverse using square and multiply approach

Table 2 Performance comparison of reversible multiplicative inverse modules in GF(24)

In multiple-decomposition approach, the multiplicative inverse in GF(24) can be computed by using composite field approach. In this case, the element in GF(24) is decomposed into a composite field of GF(22)2. The computation of multiplicative inverse in composite field GF(22)2 is shown in Fig. 11 [18]. The reversible gate design of multiplicative inverse in composite field GF(22)2 is shown in Fig. 12. The reversible gate design takes 22 CNOT, 9 CCNOT gates, and has a quantum cost of 67. Also, it takes 14 ancilla inputs, 14 garbage outputs, and has a delay of 19 as shown in Table 2.

Fig. 11
figure 11

Multiplicative inverse using multiple decomposition approach

Fig. 12
figure 12

Reversible gate design of multiplicative inverse module using multiple decomposition approach

Let m = {m3m2m1m0}2 be an element in GF(24) and its inverse is given by m−1 = {m −13 m −12 m −11 m −10 }2 which can be obtained by direct mapping approach as given in Eq. (1) [18].

$$\begin{aligned} {\text{m}}_{ 3}^{ - 1} = {\text{m}}_{ 3} \oplus {\text{m}}_{ 3} {\text{m}}_{ 2} {\text{m}}_{ 1} \oplus {\text{m}}_{ 3} {\text{m}}_{ 0} \oplus {\text{m}}_{ 2} \hfill \\ {\text{m}}_{ 2}^{ - 1} = {\text{m}}_{ 3} {\text{m}}_{ 2} {\text{m}}_{ 1} \oplus {\text{m}}_{ 3} {\text{m}}_{ 2} {\text{m}}_{ 0} \oplus {\text{m}}_{ 3} {\text{m}}_{ 0} \oplus {\text{m}}_{ 2} \oplus {\text{m}}_{ 2} {\text{m}}_{ 1} \hfill \\ {\text{m}}_{ 1}^{ - 1} = {\text{m}}_{ 3} \oplus {\text{m}}_{ 3} {\text{m}}_{ 2} {\text{m}}_{ 1} \oplus {\text{m}}_{ 3} {\text{m}}_{ 1} {\text{m}}_{ 0} \oplus {\text{m}}_{ 2} \oplus {\text{m}}_{ 2} {\text{m}}_{ 0} \oplus {\text{m}}_{ 1} \hfill \\ {\text{m}}_{ 0}^{ - 1} = {\text{m}}_{ 3} {\text{m}}_{ 2} {\text{m}}_{ 1} \oplus {\text{m}}_{ 3} {\text{m}}_{ 2} {\text{m}}_{ 0} \oplus {\text{m}}_{ 3} {\text{m}}_{ 1} \oplus {\text{m}}_{ 3} {\text{m}}_{ 1} {\text{m}}_{ 0} \oplus \hfill \\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,{\text{m}}_{ 3} {\text{m}}_{ 0} \oplus {\text{m}}_{ 2} \oplus {\text{m}}_{ 2} {\text{m}}_{ 1} \oplus {\text{m}}_{ 2} {\text{m}}_{ 1} {\text{m}}_{ 0} \oplus {\text{m}}_{ 1} \oplus {\text{m}}_{ 0} \hfill \\ \end{aligned}$$
(1)

By analysing Eq. (1), it can be inferred that the direct mapping approach requires 25 AND operations and 21 XOR operations to calculate the multiplicative inverse. The reversible logic synthesis of Eq. (1) using one-to-one mapping takes 25 CCNOT gates, 21 CNOT gates and has a quantum cost of 146. In the proposed reversible gate design of GF(24) multiplicative inversion module, the reversible gates are properly reused so that it takes only 8 CCNOT and 14 CNOT gates with a quantum cost of 54 as shown in Fig. 13. The reusability of reversible gates results in 52% savings in gate count and 63% savings in quantum cost. The proposed design takes 8 ancilla inputs and 8 garbage outputs and has a delay of 19 as shown in Table 2.

Fig. 13
figure 13

Reversible gate design of multiplicative inverse module using direct mapping approach

From Table 2, it can be inferred that the multiplicative inverse in GF(24) can be efficiently computed with direct mapping approach, since it takes less number of reversible gates and the quantum cost involved is also less. This is because, the composite field approach will not give optimum results when the order of the field involved is small such as GF(24). Hence, in the proposed reversible SubBytes and InvSubBytes transformations, the multiplicative inverse in GF(24) is calculated by direct mapping approach.

The performance metrics of the reversible building blocks of GF(28) multiplicative inversion module are given in Table 1. The proposed reversible GF(28) multiplicative inversion module requires two reversible IsoMap/InvIsoMap block, two reversible GF(24) adder, one reversible Squarer and Multiplication by constant λ block, three reversible GF(24) multiplier, and one reversible GF(24) multiplicative inversion module as shown in Fig. 5 [18]. The performance metrics of the proposed reversible GF(28) multiplicative inversion module are tabulated in Table 3.

Table 3 Performance analysis of proposed reversible multiplicative inverse module in GF(28)

4.1.4 Proposed Reversible Affine Transformation Block

The reversible gate design of affine transformation is shown in Fig. 14. Since the design is functionally reversible, it is enough if either affine transformation or inverse affine transformation is considered. In this research, reversible gate design of affine transformation has been carried out which actually requires 4 NOT operations and 32 XOR operations. The reversible logic synthesis using one-to-one mapping approach requires 4 NOT and 32 CNOT gates with a quantum cost of 36. By properly reusing the existing reversible gates, the proposed reversible affine transformation block is optimized to 4 NOT and 21 CNOT gates with a quantum cost of 25. This optimization results in 31% savings in both gate count and quantum cost. The proposed design takes zero ancilla inputs, zero garbage outputs and has a delay of 21 as shown in Table 4.

Fig. 14
figure 14

Reversible affine transformation block

Table 4 Performance analysis of proposed reversible subbytes/invsubbytes transformation module

4.1.5 Proposed Reversible SubBytes/InvSubBytes Transformation Blocks

The reversible SubBytes transformation can be obtained by cascading reversible GF(28) multiplicative inversion module and reversible affine transformation block as shown in Fig. 15 [18]. The proposed reversible SubBytes transformation takes 4 NOT gates, 152 CNOT gates, 35 CCNOT gates and has a quantum cost of 331. Also, it takes 47 ancilla inputs, 67 garbage outputs and has a delay of 104 as shown in Table 4. The simulation output of the proposed reversible SubBytes transformation is shown in Fig. 16.

Fig. 15
figure 15

Block diagram of the proposed reversible SubBytes transformation

Fig. 16
figure 16

Functional verification of the proposed reversible SubBytes transformation block

The reversible InvSubBytes transformation can be obtained by cascading reversible affine transformation and reversible GF(28) multiplicative inversion module as shown in Fig. 17 [18]. The performance metrics of the proposed reversible InvSubBytes transformation is similar to reversible SubBytes transformation since the same building blocks are used for both transformations. The simulation output of the proposed reversible InvSubBytes transformation is shown in Fig. 18. The proposed reversible InvSubBytes transformation takes 4 NOT gates, 152 CNOT gates, 35 CCNOT gates and has a quantum cost of 331. Also, it takes 47 ancilla inputs, 67 garbage outputs and has a delay of 104. The performance metrics of the proposed reversible SubBytes/InvSubBytes transformation are given in Table 4.

Fig. 17
figure 17

Block diagram of the proposed reversible InvSubBytes transformation

Fig. 18
figure 18

Functional verification of the proposed reversible InvSubBytes transformation block

4.1.6 Performance Analysis

Table 5 summarizes the performance improvement in the proposed reversible SubBytes/InvSubBytes transformation module compared to the conventional reversible designs. An important point to be noted here is that the conventional reversible designs are the ones obtained by direct one-to-one mapping of logic operations to reversible gates. The two important performance metrics known as gate count and quantum cost are analysed in this research to show the improvement in the proposed reversible designs.

Table 5 Performance comparison of reversible subbytes/invsubbytes transformation modules

The proposed design of reversible SubBytes/InvSubBytes transformation module shows 36% reduction in gate count and 35% reduction in quantum cost compared to the conventional reversible designs. In addition, the proposed reversible gate design shows 35% reduction in gate count and 97% reduction in quantum cost compared to the existing reversible SubBytes and InvSubBytes transformation module [13] as shown in Table 6.

Table 6 Performance improvement in the proposed reversible subbytes/invsubbytes transformation modules

4.2 MixColumns and InvMixColumns Transformations

The MixColumns transformation operates on the state column-by-column, treating each column as a four-term polynomial (FIPS 2001). In this transformation, each byte of a column is replaced by a function of all the bytes in the same column. Here, each transformation is based on the polynomial c(x) = {03}x3 + {01}x2 + {01}x + {02}. This polynomial is co-prime to x4 + 1 and therefore invertible with d(x) = {0B}x3 + {0D}x2 + {09}x + {0E} as the inverse. So, the InvMixColumns transformation is constructed with the polynomial d(x) where d(x) = c−1(x).

4.2.1 Proposed Reversible MixColumns Transformation

The structure of MixColumns transformation is shown in Fig. 19. The whole design of the architecture is divided into two parts namely block-1 and block-2. The combined block is the integration of block-1 and block-2. The reversible block-1 implements the following operations such as Xbcd, Xacd, Xabd, Xabc and Xab, Xbc, Xcd, Xda in the MixColumns transformation of AES crypto core in a bit-by-bit basis using reversible gates. Here, the notation Xbcd indicates b ⊕ c ⊕ d and Xab indicates a ⊕ b. It takes one bit from all the four elements of the column in the state array and gives eight valid outputs as shown in Fig. 20. The reversible gate design of block-1 is made with 12 CNOT gates and it takes 4 ancilla inputs and zero garbage outputs. Also, it has a quantum cost of 12 with 8 units delay as given in Table 7.

Fig. 19
figure 19

Structure of MixColumns transformation

Fig. 20
figure 20

Reversible gate design of block-1

Table 7 Performance analysis of reversible mixcolumns transformation block

The block-2 performs XTime operation and gives the final output of MixColumns transformation. It takes sixteen valid inputs which come out from different block-1 and performs the GF(28) multiplication and gives eight valid outputs which are nothing but the final output of the MixColumns transformation of AES crypto core. The reversible gate design of block-2 is carried out with 11 CNOT gates and it takes zero ancilla input and 8 garbage outputs as shown in Fig. 21. In addition, it has a quantum cost of 11 with 11 units of delay as given in Table 7.

Fig. 21
figure 21

Reversible gate design of block 2

The previously mentioned sub-blocks block-1 and block-2 are instantiated 8 times and 4 times respectively and combined together to get the final design of the MixColumns transformation in AES crypto core as shown in Fig. 22. The reversible gate design of the combined block is made of 140 CNOT gates and it takes 32 ancilla inputs and 32 garbage outputs. In addition, it has a quantum cost of 140 with 19 units of delay as given in Table 7.

Fig. 22
figure 22

Proposed reversible MixColumns transformation block

4.2.2 Design of Reversible InvMixColumns Transformation

The reversible gate design of InvMixColumns transformation is made by decomposing it into five blocks such as reversible MixColumns transformation block, reversible inverse block-1, reversible inverse block-2, reversible inverse block-3 and reversible X4Time block as shown in Fig. 23. The reversible inverse blocks 1, 2 and 3 consist of similar gate level structure. These blocks take two sets of 8-bit inputs and perform XOR operations among them by using CNOT gates. The blocks produce 8-bit XORed outputs and the remaining 8-bit outputs are directly taken from one of the 8-bit inputs.

Fig. 23
figure 23

Structure of InvMixColumns transformation

The MixColumns transformation in AES algorithm is synthesized by using reversible gates in Sect. 4.2.1. The same reversible design can also be used in reversible InvMixColumns transformation block in order to reduce the hardware cost. The reversible MixColumns transformation block requires 140 CNOT gates with a quantum cost of 140. In addition, it takes 32 ancilla inputs, 32 garbage outputs and has a delay of 19 units as shown in Fig. 22. The reversible inverse blocks 1, 2 and 3 perform bitwise XOR operations on their two sets of inputs using CNOT gates and produces XORed outputs as shown in Fig. 24.

Fig. 24
figure 24

Reversible gate design of inverse blocks 1, 2 and 3

Each reversible inverse block requires 8 CNOT gates with a quantum cost of 8 and has unit delay. The number of ancilla input and garbage output will be different for each reversible inverse blocks. Reversible inverse block-1 does not require any ancilla inputs and garbage outputs whereas reversible inverse block-2 requires 8 ancilla inputs and it has zero garbage output. Reversible inverse block-3 has 8 garbage outputs with zero ancilla input. The reversible X4Time block which computes the multiplication with constant term 04 can be designed by cascading two XTime blocks [20]. Instead, the X4Time block which saves one XOR operation can be designed directly as shown in Fig. 25.

Fig. 25
figure 25

Structure of X4Time block in GF(28)

The corresponding reversible gate design of X4Time block is shown in Fig. 26. It requires 5 CNOT gates with quantum cost of 5 and has a delay of 5 units. This reversible block has zero ancilla input and garbage output. The previously mentioned sub-blocks such as reversible MixColumns transformation block, reversible inverse blocks 1, 2 and 3, and reversible X4Time block are instantiated once, 4 times, 2 times, 2 times and 2 times respectively, and combined together to get the final design of the reversible InvMixColumns transformation block in AES crypto core as shown in Fig. 27. The reversible gate design of the combined block is made with 214 CNOT gates and it takes 48 ancilla inputs and 48 garbage outputs. In addition, it has a quantum cost of 214 with 28 units of delay as given in Table 8.

Fig. 26
figure 26

Reversible gate design of X4Time block

Fig. 27
figure 27

Proposed reversible InvMixColumns transformation block

Table 8 Performance analysis of reversible invmixcolumns transformation module

4.2.3 Performance Analysis

This section describes the performance analysis of the proposed reversible gate design of MixColumns and InvMixColumns transformations. Table 9 compares the performance metrics of different reversible MixColumns transformation designs. The proposed reversible InvMixColumns design gives 7% reduction in gate count and quantum cost compared to existing design [22]. The simulation output of both reversible MixColumns and InvMixColumns transformations are shown in Figs. 28 and 29 respectively for three random sets of inputs.

Table 9 Performance comparison of reversible mixcolumns transformation modules
Fig. 28
figure 28

Functional verification of the proposed reversible MixColumns transformation block

Fig. 29
figure 29

Functional verification of the proposed reversible InvMixColumns transformation

Since gate count and quantum cost are the important metrics for reversible gate design, they are considered here for comparison with existing designs. The proposed reversible MixColumns design gives 10% reduction in gate count and quantum cost compared to existing design [20]. Table 10 compares the performance metrics of different reversible InvMixColumns transformation designs. The proposed reversible MixColumns transformation module shows 81% reduction in gate count and 93% reduction in quantum cost compared to the existing design [13] as given in Table 11.

Table 10 Performance comparison of reversible invmixcolumns transformation modules
Table 11 Performance comparison of reversible mixcolumns transformation module

4.3 Shiftrows and InvShiftRows Transformations

4.3.1 InvShiftRows Transformation

The ShiftRows transformation provides a simple permutation of the data, whereas the other transformations involve substitutions. Further, since the state is treated as a block of columns, it is this step which provides the diffusion of values between columns. The ShiftRows step operates on the rows of the state. It cyclically shifts the bytes in each row. The first row is left unchanged. Each byte of the second row is shifted one to the left. Similarly, the third and fourth rows are left shifted by two and three respectively.

4.3.2 InvShiftRows Transformation

InvShiftRows is the inverse of the ShiftRows transformation. The bytes in the last three rows of the state are cyclically shifted over different numbers of bytes. The first row is not shifted. Each byte of the second row is shifted one to the right. Similarly, the third and fourth rows are left shifted by two and three respectively. Both ShiftRows and InvShiftRows transformations involve cyclic shifting of bytes of data either to the right or to the left direction. Hence, these transformations do not require any specific reversible gates to perform the operations.

4.4 AddRoundKey Transformation

In the AddRoundKey Transformation, roundkey is added by combining each byte of the state with the corresponding byte of the roundkey by bitwise XOR operation. For each round, the roundkey is derived from the main key using Rijndael’s key schedule; each roundkey is the same size as the state. This is the only step which makes use of the key and obtains the result. So, this step is used at the end of each round. The transformation can be efficiently implemented in reversible logic using a CNOT gate for every bit. A total of 128 CNOT gates are required for realizing this transformation. Table 12 shows the results in terms of number of gates, quantum cost, delay, number of garbage outputs and number of constant inputs for AddRoundKey transformation.

Table 12 Performance analysis of AddRoundKey transformation module

4.5 Proposed Reversible Key Scheduler

The Key Scheduler is responsible for generating the round keys which are used in the AddRoundKey transformation. The Key Scheduler module has three essential steps:

  1. 1.

    The RotWord() step which cyclically shifts each byte in a word one byte to the left can be implemented by wiring alone.

  2. 2.

    The SubWord() step which applies SubBytes transformation to each of the four bytes in a word requires the same S-boxes as used in the main encryption flow.

  3. 3.

    The Rcon is a constant word array and only the leftmost byte in each word is nonzero. As only one byte of Rcon is nonzero, and thus, only 8 CNOT gates are required to XOR the Rcon value with the output which is obtained from the previous step.

  4. 4.

    An 128-bit XOR operation is done by using 128 CNOT gates for the final step.

Reversible logic gate design of key scheduler for 128-bit AES algorithm is shown in Fig. 30 in which i varies from 1 to Nr. For each round, the round keys are generated by varying i from 1 to 10. Table 13 gives the performance metrics in terms of number of reversible gates, quantum cost, delay, number of garbage outputs and number of ancilla inputs required for designing RotWord operation, SubBytes transformation, xoring with round constant (rcon), xor operation and complete Key Scheduler using reversible logic gates. The RotWord operation is permutation of bits, so it does not require reversible gates.

Fig. 30
figure 30

Proposed reversible key scheduler

Table 13 Performance analysis of proposed reversible key scheduler

The functional verification of the reversible key scheduler is shown in Fig. 31. The performance improvement of the proposed reversible key scheduler is given in Table 14. The proposed reversible key scheduler shows 33% reduction in gate count and 97% reduction in quantum cost compared to the existing design [13].

Fig. 31
figure 31

Functional verification of the proposed reversible key scheduler

Table 14 Performance comparison of reversible key schedulers

5 Performance Analysis of Reversible AES Design

5.1 Proposed Reversible AES Encryption Module

The 128-bit AES encryption algorithm takes 128-bit plaintext as input alongwith an 128-bit key and produces 128-bit ciphertext as output. The number of rounds (Nr) is 10 for 128-bit keys. The reversible logic gate design of 128-bit AES encryption requires 11 AddRoundKey transformations, 10 SubBytes transformations, 9 MixColumns transformations and 10 ShiftRows transformations. The Fanout module is designed using 128 CNOT gates with one unit delay and has a quantum cost of 128, garbage outputs of 128 and zero ancilla inputs. The reversible gate design of AES encryption algorithm is shown in Fig. 32.

Fig. 32
figure 32

Proposed reversible AES encryption module

From Fig. 32, it can be observed that the proposed reversible gate design of AES encryption module requires reversible AddRoundKey block, reversible SubBytes transformations, reversible MixColumns transformations and reversible Key schedulers. As ShiftRows involve cyclic shifting of state elements, they do not require any reversible logic gates. The remaining transformations are properly synthesized using reversible logic gates and the optimized designs are used to build the proposed reversible AES encryption module. Table 15 gives the performance metrics of proposed reversible gate design of 128-bit AES encryption module. The functional verification of the proposed reversible 128-bit AES module is shown in Fig. 33.

Table 15 Performance analysis of proposed reversible AES encryption module
Fig. 33
figure 33

Functional verification of the proposed reversible AES encryption module

5.2 Proposed Reversible AES Decryption Module

The top-level structure of 128-bit AES decryption algorithm takes 128-bit ciphertext as input along with an 128-bit key and produces an 128-bit plaintext as output. There are four distinct operations: InvSubBytes, InvShiftRows, InvMixColumns and AddRoundKey. After an initial AddRoundKey transformation, Nr-1 rounds are performed which consists of a sequence of four operations. The number of rounds is 10 for 128-bit key length. The final round excludes InvMixColumns transformation. The reversible gate design of 128-bit AES decryption module using Inverse Cipher is shown in Fig. 34. It requires 11 AddRoundKey transformations, 10 InvSubBytes transformations, 9 InvMixColumns transformations and 10 InvShiftRows transformations.

Fig. 34
figure 34

Proposed reversible AES decryption module using inverse cipher method

The reversible gate design of 128-bit AES decryption module using Equivalent Inverse Cipher is shown in Fig. 35. It requires 11 AddRoundKey transformations, 10 InvSubBytes transformations, 18 InvMixColumns transformations and 10 InvShiftRows transformations. The performance comparison of 128-bit AES decryption using inverse cipher and equivalent inverse cipher is given in Table 16. In the equivalent inverse cipher, the mixroundkeys are used. The mixroundkeys are the modified roundkeys obtained by applying InvMixColumns to the roundkeys. Thus, the quantum cost and number of reversible gates required to design the reversible equivalent inverse cipher are more compared to the reversible inverse cipher. Hence, in the proposed reversible gate design of AES decryption module, inverse cipher method is used to decrypt the ciphertext. Table 17 gives the performance metrics of proposed reversible gate design of 128-bit AES decryption module. The functional verification of the proposed reversible 128-bit AES module is shown in Fig. 36.

Fig. 35
figure 35

Proposed reversible AES decryption module using equivalent inverse cipher method

Table 16 Performance comparison of reversible AES decryption modules
Table 17 Performance analysis of proposed reversible AES decryption module
Fig. 36
figure 36

Functional verification of proposed reversible AES decryption module using inverse cipher method

5.3 Performance Improvement

The proposed reversible SubBytes/InvSubBytes transformation module of AES crypto core shows 36% reduction in gate count and 35% reduction in quantum cost compared to the conventional reversible designs. In addition, the proposed design shows 35% reduction in gate count and 97% reduction in quantum cost compared to the existing design of reversible SubBytes and InvSubBytes transformation module.

The proposed reversible gate design of MixColumns and InvMixColumns transformation of AES crypto core based on CNOT gates show 81 and 93% reduction in gate count and quantum cost respectively compared to existing designs. The proposed reversible key scheduler of AES crypto core shows 33% reduction in gate count and 97% reduction in quantum cost compared to the existing design. The performance improvement in all the proposed designs is mainly due to the reuse of already available reversible gates to the maximum extent. The complete AES encryption process takes 16,192 ancilla inputs, 26,752 garbage outputs with a delay of 2132 and a total of 70,320 reversible logic gates with a quantum cost of 115,120. The complete AES decryption process takes 16,768 ancilla inputs, 27,328 garbage outputs with a delay of 1412 and a total of 72,984 reversible logic gates with a quantum cost of 117,784. In summary, our proposed AES reversible design gives 36% reduction in gate count and 97% reduction in quantum cost when compared to the existing design [13] as shown in Table 18. This considerable reduction in the performance metrics is mainly due to the reuse of the already existing gates in the structures.

Table 18 Performance improvements in proposed reversible AES design

6 Conclusion

A novel reversible gate design of complete 128-bit AES algorithm is presented. Since the reversible logic gates ideally consume zero power and their quantum computing based implementation is less sensitive to power analysis attacks, they are exploited to construct the AES algorithm in this work. The Toffoli family of reversible gates are used in the proposed designs and the reversible logic gates are reused as much as possible in order to optimize the performance metrics in the proposed structures. The proposed reversible gate design of AES algorithm gives 36% reduction in gate count and 97% reduction in quantum cost when compared to the existing design. Hence, the proposed design can be effectively used to protect confidential data in low power and secure applications such as wireless sensor networks.