1 Introduction

Network security yields a safe and secure communication. It consists of the provisions and policies adopted by a network administrator to prevent and monitor unauthorized access, misuse, modification, or denial of a computer network and network accessible resources. Network security covers a variety of computer networks, both public and private, that are used in everyday jobs involving transactions and communications among businesses, government agencies and individuals.

Cryptography provides way for securing electronic information. Many cryptographic algorithms were invented in the past few years. In that only few algorithms are withstanding against the third party. There was a question of long term security of the electronic information. In 2001, the National Institute of Standards and Technology (NIST) [1] tested few algorithms and announced that the Advanced Encryption Standard (AES) algorithm is the best cryptographic algorithm. Hence, in this paper the AES algorithm has been chosen and the architecture is modified in order to obtain high throughput.

Also in this paper, several area optimisation techniques like composite field arithmetic (CFA), order changing and on the fly key expansion have been implemented to reduce the area of the AES algorithm. Because of high throughput and less area, the proposed architecture is used for the real time image processing application, i.e., image encryption/decryption.

1.1 Motivation

It is mandatory to protect the confidential data over the internet. The Software implementation has the advantage of upgradability, easy usage, portability and flexibility. Since the hardware implementation provides a high physical security [2] than the software implementation, the proposed AES architecture is implemented in FPGA. The FPGA combines the flexibility and ease of upgrade that fulfils the advantage of software implementation with much improved physical security. The motivation of this paper is to present the hardware architecture for AES algorithm with high throughput and less area, so that it can be applied for the low end embedded applications.

This paper is organized as follows. Section 2 gives the relevant works of various authors reported in the literature. Section 3 reviews the basic AES encryption/decryption algorithm. Section 4 gives the contribution of this paper. Section 5 briefly describes the proposed methodology for increasing the throughput and for reducing the area. The implementation results and the performance comparison with the existing literature are discussed in the Sect. 6. The conclusion and the future work are stated in the last section.

2 AES Algorithm

Information security i.e. a secure data transmission can be obtained by employing effective and reliable encryption algorithms. There are several encryption algorithms which serves this purpose. AES [3] is one of the strongest cryptographic algorithms. It is approved by the NIST. AES is a symmetric block cipher which can encrypt and decrypt the information. AES takes input in blocks of 128, 192 or 256 bits and uses keys of lengths 128, 192 and 256 bits. The 128 bits are divided into 16 bytes and are put in a 4 × 4 array. A Byte is the basic unit for processing in AES algorithm. A Byte can be represented as a polynomial. The encryption and decryption processes of AES algorithm is shown Fig. 1.

Fig. 1
figure 1

Encryption and decryption process of AES

2.1 Steps of AES Algorithm

Encryption and decryption processes [47] consist of several steps. They are AddRoundKey, ShiftRows/InverseShifRows, ByteSubstitution step/Inverse ByteSubstitution, MixColumns/Inverse MixColumns step. These steps constitute a round. The number of rounds depends upon the size of the key. For the key sizes 128, 192, 256 bits, the corresponding number of rounds is 10, 12 and 14. Next sub-section provides brief explanation about each step.

2.1.1 AddRoundKey

In this step, the input data and the initial key are Xored. This is the first step in encryption process. The Key Expansion unit expands the initial key so that it can be used in the further AddRoundKey steps.

2.1.2 ByteSubstitution/Inverfse ByteSubstituion

The ByteSubstitution in encryption and the Inverse ByteSubstitution in decryption are the only non-linear transformations in the rounds of AES algorithm. Here each Byte of the state array, which is considered as an element of GF (28), is transformed to another byte by referring a Look-Up-Table (LUT).

2.1.3 ShiftRows/Inverse ShiftRows

In the ShiftRows and Inverse ShiftRows step, the rows of the state array are shifted cyclically to the left and right respectively. The first row remains unchanged. The second, third and fourth rows are shifted by one, two and three offsets to the left for the ShiftRows transformation. For the Inverse ShiftRows step, the second, third and fourth rows are shifted to the right by one, two and three offsets.

2.1.4 MixColumn/Inverse MixColumn

In these steps, each column of the state array is multiplied with a polynomial. The MixColumn transformation can be expressed in the matrix form as follows,

$$ \left[ {\begin{array}{*{20}c} {{\text{S}}_{{ 0 , {\text{C}}}}^{{\prime }} } \\ {{\text{S}}_{{ 1 , {\text{C}}}}^{{\prime }} } \\ {{\text{S}}_{{ 2 , {\text{C}}}}^{{\prime }} } \\ {{\text{S}}_{{ 3 , {\text{C}}}}^{{\prime }} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {02} & {03} & {01} & {01} \\ {01} & {02} & {03} & {01} \\ {01} & {01} & {02} & {03} \\ {03} & {01} & {01} & {02} \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {{\text{S}}_{{ 0 , {\text{C}}}} } \\ {{\text{S}}_{{ 1 , {\text{C}}}} } \\ {{\text{S}}_{{ 2 , {\text{C}}}} } \\ {{\text{S}}_{{ 3 , {\text{C}}}} } \\ \end{array} } \right] $$

The Inverse MixColumn step is expressed as follows,

$$ \left[ {\begin{array}{*{20}c} {{\text{S}}_{{0,{\text{C}}}}^{\prime } } \\ {{\text{S}}_{{1,{\text{C}}}}^{\prime } } \\ {{\text{S}}_{{2,{\text{C}}}}^{\prime } } \\ {{\text{S}}_{{3,{\text{C}}}}^{\prime } } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} { 0 {\text{e}}} & { 0 {\text{b}}} & { 0 {\text{d}}} & { 0 9} \\ { 0 9} & { 0 {\text{e}}} & { 0 {\text{b}}} & { 0 {\text{d}}} \\ { 0 {\text{d}}} & { 0 9} & { 0 {\text{e}}} & { 0 {\text{b}}} \\ { 0 {\text{b}}} & { 0 {\text{d}}} & { 0 9} & { 0 {\text{e}}} \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {{\text{S}}_{{ 0 , {\text{C}}}} } \\ {{\text{S}}_{{ 1 , {\text{C}}}} } \\ {{\text{S}}_{{ 2 , {\text{C}}}} } \\ {{\text{S}}_{{ 3 , {\text{C}}}} } \\ \end{array} } \right] $$

3 Related Research

The implementation of the cryptographic algorithm in hardware has been carried out since past few years. Several architectures have been proposed to implement the AES algorithm by different authors.

Samiee et al. [8] designed a fully sub-pipelined encryptor with six substages in each round unit and achieved a throughput of 43.71 Gbps on a Xilinx XC2VP207fg676 device. The author also presented a efficient key expansion architecture suitable for six sub-pipelined architecture.

Reddy et al. [9] proposed two architectures—Pipeline and Sub-Pipeline. The architectures were coded using Verilog hardware description language (HDL) and simulated using Cadence ncsim. In this paper, pipelined AES is implemented using LUT. This design is implemented on XC5VLX110T-1 device. The area of 4611 slices and the throughput of 13.238 Gbps is achieved in this method. The other design uses Sub-Pipeline architecture and it occupies an area of 8896 slices and the throughput of 25.89 Gbps.

Zhang et al. [10] presented novel high speed architectures for the hardware implementation of the AES algorithm. CFA is employed to reduce the area and different implementations for the inversion in subfield GF (24) are compared in this paper. A fully Sub Pipelined encryptor with seven sub stages in each round is implemented and achieved a throughput of 21.56 Gbps on Xilinx XCV1000 device.

Kamal et al. [11] implemented area optimization for the AES algorithm and it required 2732 slices of a Xilinx Virtex-II XC2V1000bg575 device, runs at a maximum clock speed of 98.95 MHz and achieves a throughput of 29.32 Mbps.

Hammad et al. [12] designed a new efficient architecture for high speed AES encryptor using CFA in Byte Substitution Round. The architecture is presented with a multistage sub pipelined architecture that allows a high throughput. This design achieves a throughput of 39.053 Gbps for nine pipelining stages with an operational frequency of 305.1 MHz. This design also achieves a throughput of 27.94 Gbps on Xilinx Virtex 2 device with an operational frequency of 218.3 MHz.

Fan and Hwang [13] proposed a sequential and fully pipelined AES realization using Xilinx ISE 7.1 synthesizer and uses an efficient low cost AddRoundKey architecture for real time key generations. The sequential AES design achieves a throughput of 0.876 Gbps with an operational frequency of 75.3 MHz. The pipelined AES design achieves a throughput of 28.4 Gbps with an operational frequency of 222 MHz.

Swankoski et al. [14] proposed a parallel architecture in which internal hardware functionality is not duplicated but reused. This created a reasonably compact single block, which is ideal for duplication. This allowed multiple users to share the same hardware, as spatial isolation is achieved by the physical separation of individual encryption blocks. This algorithm achieved a throughput of 18.80 Gbps and an area of 23,979 slices in Virtex 2 pro FPGA device with ten parallel blocks.

Yoo et al. [15] presented a high speed parallel pipelined architecture is proposed in order to get high throughput. The AES block cipher is implemented with Virtex II Pro FPGA using 0.13 µm and 90 nm technology. By using an efficient inter-round and intra-round pipeline design, it achieves a high throughput of 29.77 Gbps in encryption.

Jose et al. [16] implemented an AES-128 algorithm using parallelism, pipelining, partial and dynamic reconfiguration. Using partial and dynamic reconfiguration, the subkeys which are contained in LUT’s are modified in this paper. Three hardware languages Handel-C, VHDL and JBits are combined with partial and dynamic reconfiguration, pipelining and sub-pipelining. It reaches a throughput of 24.922 Gbps on Xilinx XC2V6000-6.

Jyrwa et al. [17] presented a hardware implementation of AES algorithm. In this paper, the authors have worked with an iterative architecture and modifications such as merging of SubBytes and ShiftRows, LUTs for decryption have been successfully implemented. The encryption and decryption process of Rijndael algorithm was done in VHDL language and the corresponding FPGA implementation in the device XC2VP30 yields an area of 6211 slices and a throughput of 1.458 Gbps is obtained.

4 Contribution of This Paper

From the related work, it is observed that there are number of architecture like parallel, pipeline, sub-pipeline, parallel pipeline, sub-pipeline CFA are used to increase the throughput of AES. Here in this paper, parallel sub-line architecture is used to increase the throughput, CFA based S-box, on the fly key expansion and order change techniques are combined to reduce the area which is not reported in the earlier work.

The proposed AES architecture was synthesized in a Xilinx ISE12.2i tool and simulated using ISIM. This prototyped AES in FPGA achieves a high throughput of 52.29 Gbps and occupies an area of 1094 slices.

5 Proposed Methodology

5.1 Parallel Sub-Pipeline Architecture for High Throughput

This architecture combines the parallel and sub-pipeline architecture. The Sub-Pipeline architecture has the advantage of processing more number of inputs. This is done by storing the intermediate results in the register. The area is increased due to increased use of registers, but the throughput is increased to a greater extent. The parallel architecture has the advantage of processing the data in high speed by dividing the total module into small modules and processing it parallely. The parallel architecture [18] also provides high throughput but occupies more area.

The advantages of both the architectures have been merged by combining the Parallel and Sub-Pipeline architecture which is shown in the following Fig. 2.

Fig. 2
figure 2

Parallel Sub-Pipeline architecture

In the proposed technique, the input 128 bits is divided into four 32 bits and is given to separate hardware components, thus achieving the parallel architecture. Registers are inserted in intra round and inter round. The inner round consists of AddRoundkey, SubBytes, ShiftRows and MixColumn. These comprise one round. In AES-128, there are ten rounds. The intra round consists of these ten rounds i.e., from round 1 to round 10. This achieves the Sub-Pipelined architecture [19].

Initially the key is given to the Key Scheduling unit, where the key is expanded by rotating and exor-ing with the Round Constant values. The Key generated from the first round is sent to the next round which is shown in the Fig. 3.

Fig. 3
figure 3

Key generation unit

First of all, there will be a exor operation between the input and key and resulting result will be further processed through the rounds.

When the Clock is given, all the four 32 bits input will enter into the round 1. For the next clock, the result from round 1 is stored in the register and is sent to the round 2. At the same time, the next new set of inputs will enter into the hardware. Thus a greater throughput can be achieved from this architecture. This type of architecture increases the area. Inorder to decrease the area, some of the area optimisation techniques are explored and implemented. The different area optimisation techniques alter some of the internal architecture of the AES algorithm. The area optimisation techniques [20] are using CFA in the Sub-Bytes round instead of using LUT for S-Box, by generating the keys on the fly and by changing the order of the AES algorithm is discussed in the next sub-section.

5.2 Techniques to Reduce the Chip Area

5.2.1 Composite Field Arithmetic Structure in S-Box

In LUT based approaches, each Byte in the state array is transformed into another byte by looking at the LUT. This LUT [21] is referred nearly 200 times throughout the AES algorithm. Hence this occupies a large area. In order to reduce the chip area, non-LUT based approaches are preferred.

The SubBytes and the Inverse SubBytes transformations are the most resource consuming operations in the steps of AES algorithm. An area optimized AES algorithm is a high boon to many resource limited applications like RF Identification (RFID) tags, embedded systems applications like Mobile phones and tiny sensor networks. Hence CFA is employed in the SubBytes and Inverse SubBytes steps to reduce the area.

Subbytes Using CFA: Non-LUT based approaches for the SubBytes and its Inverse is implemented using combinational circuits. This circuit calculates the values of the transformed byte on the fly without having the values pre-stored using tables. The implementation of a circuit to find the multiplicative inverse in the GF (28) is very difficult and costly. Therefore, CFA [2226] is employed. By using CFA, an element in the higher order field GF (28) is mapped to an element in the lower order field GF (24) and can be further mapped to an elementary field GF [(22)2]. Now computations can be made in the lower order field. The Multiplicative Inverse can be found in the lower order field using the following polynomial [23],

$$ \left( {{\text{bx}} + {\text{c}}} \right)^{ - 1} = {\text{b}}\left( {{\text{b}}^{2}\uplambda + {\text{c}}\left( {{\text{b}} + {\text{c}}} \right)} \right)^{ - 1} {\text{x}} + \left( {{\text{c}} + {\text{b}}} \right)\left( {{\text{b}}^{2}\uplambda + {\text{c}}\left( {{\text{b}} + {\text{c}}} \right)} \right)^{ - 1} $$
(1)

The combinational circuit for the Multiplicative Inverse is derived using the Eq. (1) and is shown in Fig. 4 [23]. The detailed circuit of the multiplicative inverse is given in [23].

Fig. 4
figure 4

Byte substitution using CFA

Inverse Subbytes Using CFA: The Inverse S-Box can also be built using CFA. The steps involved are

  1. 1.

    Inverse Affine Transformation and Isomorphic mapping

  2. 2.

    Multiplicative Inverse

  3. 3.

    Inverse Isomorphic Mapping

Inverse Affine transformation in matrix form is shown in the Eq. (2)

$$ {\text{A}}^{ - 1} = \left[ {\begin{array}{*{20}c} 0 & {0 } & 1 & 0 & 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 \\ \end{array} } \right] \times \left[ {\begin{array}{*{20}c} {{\text{b}}_{ 0} } \\ {{\text{b}}_{ 1} } \\ {{\text{b}}_{ 2} } \\ {{\text{b}}_{ 3} } \\ {{\text{b}}_{ 4} } \\ {{\text{b}}_{ 5} } \\ {{\text{b}}_{ 6} } \\ {{\text{b}}_{ 7} } \\ \end{array} } \right] + \left[ {\begin{array}{*{20}c} 1 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \end{array} } \right] $$
(2)

The Isomorphic Mapping, Multiplicative Inverse, Inverse Isomorphic Mapping steps are the same as the SubBytes step using CFA.

5.2.2 On the Fly Key Expansion Technique

As the name itself suggests, the keys are generated on the fly [27] using this technique. In the LUT based key expansion technique, the pre calculated keys are stored in a particular location. This consumes some amount of area. In order to reduce the area, the keys are computed on the fly and the circuit to generated the on the fly key is shown in the Fig. 5.

Fig. 5
figure 5

On the fly key expansion module

The 128 bit input key is divided into 4 bytes (words) as Key #0, Key #1, Key #2, Key #3. The Rcon register consists of the round constant values to be used for each round. Initially, the Key #3 is rotated and the bytes are substituted from the S-Box. This value is ex-ored with the Round Constant values and is ex-ored with the key #0.

The resulting word and the next key #1 is ex-ored and the resulting word will be the new generated key #2 which will be sent as the input for the next round input. Similarly the operation follows for the consecutive words. Again, the new generated key is fed as input to the Key Generation unit for the next set of key for the next round. Thus the process of generating key continues until it reaches the round 10 for AES-128 bit algorithm.

In this paper, a process of generating keys on the fly reduces area but consumes time. This can be overcome by combining this concept with the Parallel Sub-Pipeline architecture, as it saves time by doing the operation in high speed.

5.2.3 Order Changing

In AES design flow, the SubBytes and ShiftRows transformations are commute; that is, a SubBytes transformation immediately followed by a ShiftRows transformation is equivalent to a ShiftRows transformation immediately followed by a SubBytes transformation [28]. The order of the AddRoundKey and MixColumns transformations can also be reversed, provided that the columns (words) of the decryption key schedule are modified using the MixColumns transformation.

Thus the encryption structure in Fig. 6a can be modified to the equivalent structure in Fig. 6b. In this figure, note the change in the sequence of SubByte and ShiftRow and the sequence of AddRoundkey and MixColumn.

Fig. 6
figure 6

AES encryption a ordinary structure, b equivalent structure

Figure 7 shows the straightforward decryption structure of the AES algorithm. One should note that,

Fig. 7
figure 7

AES decryption a ordinary structure, b equivalent structure

  • InvShiftRows transformation immediately followed by InvSubBytes transformation is equivalent to InvSubBytes transformation immediately followed by InvShiftRows transformation [29].

  • InvMixColumns transformation is linear and hence

    $$ {\text{InvMixCol}}({\text{state}} \oplus {\text{roundkey}}) = {\text{InvMixCol}}\left( {\text{state}} \right) \oplus {\text{InvMixCol}}\left( {\text{roundkey}} \right) $$

Thus the decryption structure in Fig. 7a can be modified to the equivalent structure in Fig. 7b. In this Figure, note the change in the sequence of InvSubByte and InvShiftrow also the change of Mix round column and Add round key.

This order changing approach optimizes the architecture and hence the area is reduced in this technique.

6 Performance Comparison

6.1 Evaluation Metrices

The throughput, area consumption and efficiency are the most important parameters in evaluating the performance of the implementation. The proposed PSP CO2 architecture is prototyped in Xilinx Virtex XC6VLX75T device. The architecture is simulated on ISIM tool after post place and route. The entire design is synchronized with clock signals. The simulation is done based on the NIST standard inputs and checked for encrypted outputs. The different input parameters given for our algorithm are given in Fig. 8. Mode is set to ‘1’ for encryption and ‘0’ for decryption. Also, the PSP CO2 architecture is also tested for the image encryption/decryption by linking the MATLAB tool with Xilinx tool.

Fig. 8
figure 8

RTL schematic of the proposed PSP CO2 AES architecture

6.2 RTL Schematic

The RTL schematic of the proposed PSP CO2 architecture is shown in the Fig. 8. The entity name is specified at the top as encdec32. In this paper, Parallel Sub-Pipeline architecture is proposed for high throughput. In this architecture the 128 bit is divided into four bytes as shown in the figure as input 1, input 2, input 3, input 4. The encrypted and decrypted bits are shown as output from the block.

The inner RTL Schematic of the PSP CO2 architecture is shown in the Fig. 9. The Fig. 9 shows the different stages of AES algorithm and the inputs and outputs of the consecutive stages.

Fig. 9
figure 9

Detailed internal RTL schematic of the proposed architecture with LUT in the Sub-Bytes round

6.3 Throughput & Area Analysis in FPGA

The design is coded in Very High Speed Integrated Circuit Hardware Description Language (VHDL). Synthesis is done by Xilinx ISE12.2i and the timing simulation is performed by ISIM simulator to verify the functionality of the circuit. The device Virtex 6 XC6VLX75T is the targeted device for FPGA implementation.

Throughput [29] is the speed at which the data is encrypted/decrypted. The throughput is very important in a communication process and this determines the performance of the algorithm.

The throughput is determined by the Eq. (3)

$$ {\text{Throughput}} = \frac{{128 \times {\text{Number}}\;{\text{of}}\;{\text{blocks/cycle}}}}{{{\text{Clock}}\;{\text{Period}}}} $$
(3)

128—indicate the block size of the input data.

The efficiency [30] is calculated by using Eq. (4)

$$ {\text{Efficiency}} = \frac{\text{Throughput}}{\text{Area}} $$
(4)

The performance comparison of the proposed PSP CO2 architecture with the existing architecture reported in literature is shown in the Table 1. Efficiency is the throughput per area which is denoted as Mbps/slices. The throughput and efficiency are calculated for all the architectures in order to estimate and compare the speed of the algorithm.

Table 1 Performance comparison of FPGA Results of the PSP CO2 with the existing works

Granado et al. [16] proposed a methodology to implement the AES algorithm using pipelined and parallel architecture using partial and dynamic reconfiguration. This achieves a throughput of 24.922 Gbps and an area of 3576 slices. The proposed combined architecture achieves a high throughput of 27.68 Gbps in the same device and the area is also higher than the [11], because more number of registers is used in the PSP CO2 architecture.

Thangkhome et al. [31] proposed two architectures Loop Unrolled and Pipelined architectures and showed that the Pipelined architecture achieved a higher throughput than the Loop Unrolled architecture. The Table 1 shows that the proposed architecture achieves higher throughput than the Pipelined architecture since the higher speed will be achieved through the parallel computing and Pipelined techniques. The proposed architecture achieves a throughput of 28.98 Gbps and an area of 4230 slices. Higher efficiency of 6.85 Mbps is achieved in the proposed architecture.

Zhang and Wang [32] proposed a pipelined architecture for AES algorithm and a throughput of 34.7 Gbps and an area of 2389 is occupied. The PSP CO2 gives a throughput of 35.38 Gbps and area of 4110 slices. The increase in area is due to the insertion of registers in inter round and intra round.

Iyer et al. [33] proposes an iterative architecture which achieved a throughput of 0.497 Gbps and an area of 1725 slices in Xilinx Virtex 4 device. The PSP CO2 architecture achieved throughput of 37.23 Gbps and an area of 5050 slices in the same device.

Yoo et al. [15] proposes a parallel pipelined architecture of 28.44 Gbps and an area of 6541 slices in the Xilinx Virtex 2P device. The proposed architecture achieves a throughput of 29.23 Gbps and an area of 6925 slices in the same device.

The throughput of the proposed PSP CO2 architecture and the throughput of various architectures as proposed by several authors is compared, keeping the device same and is shown in Fig. 10. For example the throughput calculated for the device X2V6000-6 in [16] is compared with the throughput obtained for the same device by applying proposed architecture which is found to be better.

Fig. 10
figure 10

Throughput comparison of proposed work with literature

From the Fig. 10, it is observed that the proposed architecture achieves high throughput than the existing works with the same device.

Also, HDL code is written for the existing architectures like Iterative, Pipeline, Sub-Pipeline, Parallel and Parallel Pipeline and synthesized targeting XC6VLX75T device. The throughput and the efficiency of the proposed PSP CO2 architecture and the other existing architecture are shown in Figs. 11 and 12.

Fig. 11
figure 11

Throughput comparison of proposed architecture with existing architecture

Fig. 12
figure 12

Efficiency comparison of proposed architecture with existing architecture

From Fig. 12, it is noted that the proposed PSP plus CFA plus on the fly key expansion plus order change architecture provides higher efficiency than the other architecture.

Hence this paper presents different architectures for different throughput and reduced area, so that the user can chose the architecture based on requirement and application.

6.4 Cryptanalysis of AES Algorithm

There are various possible attacks are reported in [29]. According to [29], potential cryptanalysis performed on AES was not fruitful in retrieving the key or the plain text. The authors in [30] prove that differential cryptanalysis and linear cryptanalysis performed on AES will not able to break and proves to be more secure.

In algebraic attacks which works such that the system is expressed as a multivariate polynomial equations which can be solved to find the key [31]. The eXtended Linearization (XL) algorithm [32] and the eXtended Sparse Linearization (XSL) algorithm [33] were aimed at solving the systems of equations obtained through crypt analysis. But the number of equations with thousands of unknown variables makes these less feasible for computing the key.

In side channel attacks the variations in observable parameters are noted and cryptanalysis is done on these parameters. Only timing attacks are non invasive, all other side channel attacks like power analysis attack, fault injection attack, electromagnetic radiation are invasive. The probability of these attacks are normally less because of the requirement of precise measuring equipments and the requirement of encrypting device itself. Also there are many countermeasures available to overcome these side channel attacks, like increasing latency, masking of data, shuffling of data after every access etc. Though related key attack [33] was able to crack few rounds of AES 192 and AES 256, a complete attack to break the AES 128 and retrieve its key has not been possible as of now.

These results don’t provide a promising cryptanalysis to break the AES-128 algorithm hence we believe that AES-128 algorithm is quiet promising indeed.

6.5 Real Time Image Encryption/Decryption

The proposed PSP CO2 architecture is also tested for different set of input images. In this research, the gray scale images of different sizes are taken and are resized to 128 × 128 size. The gray scale images are converted to binary images using the command dec2bin in MATLAB. Now, the pixel values of the binary images are 1’s and 0’s which are then sent to the proposed architecture of AES encoder. In the AES encoder, the plain text/input bits are converted to the cipher/encrypted data. The encrypted data is again sent to the MATLAB using the FILE operation. The encrypted image’s pixel values are then sent to the AES decoder. In the AES decoder, the encrypted data is converted to the decrypted data/plain text which is again linked to the MATLAB through the FILE operation in VHDL. The decrypted image obtained will be the binary image which is again converted to gray scale image by using the command bin2dec. The resulting will be the reconstructed image which will be same as the Original image. Images of Lena, Flow and Frnd are encrypted and decrypted using proposed architecture and are shown in the Fig. 13.

Fig. 13
figure 13

Image encryption/decryption using the proposed PSP CO2 architecture of AES. a Original image, b encrypted image, c decrypted binary image, d reconstructed image

7 Conclusion

This paper presents the FPGA implementation of high throughput and reduced area architecture for AES Encryption and Decryption algorithm for wireless communication. In this research, the Parallel Sub-pipelined architecture yielded a high throughput and a better efficiency for AES Algorithm and the area of the algorithm is optimised using area optimisation techniques such as using CFA in the place of LUT in SubBytes round, On the fly key Expansion technique and by changing the order of the steps in the AES algorithm.

The FPGA implementation of the proposed architecture gives high throughput of 52.29 Gbps and an area of 1094 slices. The image is also tested for its encryption/decryption using the proposed architecture. Thus, the proposed architecture achieves a high throughput and reduced area than all the existing architectures and can be used in real time applications for high security.