1 Introduction

The advanced encryption standard (AES)-128 and Blowfish algorithms both belong to the family of symmetric-key block ciphers. AES is used by most Institute of Electrical and Electronics Engineers (IEEE) standards to secure wireless communication among mobile devices. AES-128 processes 128-bit data blocks with a 128-bit cipher key. Its plain text is organised as a state of 4 columns (Nb) and passes through 10 rounds (Nr) with a cipher key of four 32-bit words (Nk). Each encryption round performs several operations: byte substitution (SubBytes), row shifting (ShiftRows), column mixing (MixColumns) and addition of a round key (AddRoundKey). The decryption round performs the inverse transformations: InvSubBytes, InvShiftRows, InvMixColumns and AddRoundKey. However, this scheme requires a high-performance computing platform and a large design size because of its complex architecture. A few attacks against AES, such as related-key and side-channel attacks, have also been reported [1,2,3,4], which can raise doubts about its security level among users and providers. Performance and security must therefore be considered together because mobile devices have limited battery power and are designed to be portable with many features [5, 6]. Accordingly, most current research is concerned with simple and high-speed security architectures [5, 6].

To address these issues, this paper introduces and develops an alternative security scheme with improved power consumption and throughput and low hardware utilisation. On the basis of performance analyses of the AES and Blowfish schemes in previous works [7,8,9,10,11,12,13,14,15,16], Blowfish is considered as the alternative to the existing AES because it offers better performance, a simpler architecture and high security. According to [17], Blowfish has a 64-bit block size and a variable key length from 32 bits up to 448 bits. The Blowfish scheme consists of two units, namely, the key expansion and data encryption units. The 64-bit input data are divided into two 32-bit halves, and the P-Array (P1-P18), which comprises 18 32-bit subkeys produced by the key expansion unit, is used. The scheme has 16 rounds, each of which implements the Feistel (F) function. In the F function block, four 32-bit S-boxes have 256 entries each. After the 16th round, the two 32-bit halves are recombined to obtain the cipher text in encryption mode. The decryption flow is the inverse of the encryption flow, with the subkeys applied in reverse order. All operations involve only XORs and additions (ADDs) of 32-bit data. According to [2, 16, 18], Blowfish is a highly secure and strong encryption scheme. Furthermore, in the proposed research, Blowfish is designed and executed for the full 16 rounds to enable high-security encryption [4]. Given that Blowfish is also unpatented and freely available, it has the potential to replace the existing AES in achieving a high-performance end product with a better security level.

This work proposes enhanced AES-128 and Blowfish designs written in Verilog that combine three design techniques, namely, parallel, pipelined and memory (P2M), as a complete solution instead of applying them separately; this combination is the main contribution of this study. The proposed P2M AES-128 and P2M Blowfish are implemented as prototypes on the Zynq-7000 XC7Z020 field programmable gate array (FPGA) platform with Artix-7 technology to verify their functionality and complexity in real-time environments. As another contribution, the performance of the proposed AES-128 and Blowfish is analysed in terms of design throughput, logic resources and power consumption. The performance results can guide researchers in assessing whether the proposed P2M Blowfish, rather than the AES-128, should be designed through application-specific integrated circuit (ASIC) methodology to produce chipsets for secure wireless communication in mobile devices. This work supports the current research trend of developing simple and high-speed security architectures, here via the P2M Blowfish, because mobile devices have limited core storage and power sources.

This paper is organised as follows. Section 2 discusses the related research on the AES-128 and Blowfish designs through FPGA platforms. Section 3 describes the design methodology of the proposed AES-128 and Blowfish architectures by using parallel, pipelined and memory techniques. Section 4 explains their results and discussion in terms of FPGA hardware utilisation, throughput and power consumption. Section 5 concludes this research.

2 Related Research

Studies that analyse the performance of AES-128 and Blowfish designs on FPGA platforms are limited. Table 1 shows that previous researchers applied different design techniques to either the data processing unit or the key processing unit of their AES-128 architectures. The sequential technique in [19,20,21] is known as the conventional method because it involves only consecutive data signal flows through the design circuit, which can increase the number of clock cycles. The memory technique was used by Toubal et al. [22] to store the keys generated during the encryption process. However, the related works in [23,24,25,26,27,28,29,30,31] used registers instead of a memory block to store the large amount of S-box and inverse S-box data for encryption and decryption, respectively. Table 1 shows that the proposed P2M AES-128 achieved the highest throughput (1.085 Gbps) and the lowest power consumption (0.88 W) among the compared works. Nuray et al. [28] used the fewest logic resources, occupying 1% of the slices, through pipelined and parallel techniques in their AES-128 architecture. These findings also show that AES-128 designs with the memory, pipelined or parallel technique performed better in terms of slices used, throughput and power consumption than those with the sequential technique. In practice, however, these results demonstrate the difficulty of obtaining the best performance in all parameters at once. Thus, appropriate design techniques must be chosen for the AES architecture by considering issues such as time constraints, core density and power, so that the performance comes as close to the best as possible.

Table 1 Performance analysis on AES-128 designs from previous research and proposed work

The performance of 64-bit Blowfish designs from previous research is summarised in Table 2. Previous works used either a single design technique or a combination of two techniques chosen from the sequential, parallel, pipelined and memory techniques in their Blowfish architectures. The sequential, parallel and pipelined techniques were mostly used to design the sub-blocks of the data processing and key processing units, whereas the memory technique was used to store the large amount of data of the four S-boxes and P-Arrays. The performance analysis indicates that the Blowfish designed by Sudarshan et al. [32] using pipelined and memory-based techniques has the smallest core size, with only 214 slices. Nalawade and Gawali [39] achieved the highest throughput of 1.632 Gbps by using the memory technique in their Blowfish architecture. However, the Blowfish designed by Karthigaikumar and Baskaran [33], which used pipelined and parallel techniques, is the best design for power saving in mobile devices because it consumed only 77 mW.

Table 2 Performance analysis on 64-bit Blowfish designs from previous research

Although the AES-128 and Blowfish designs in previous works were developed on different FPGA families, this is not an issue because the FPGA is used only as a medium to verify the functionality and complexity of the cryptography architectures. The implementation of AES and Blowfish designs on different FPGA platforms is also treated as a benchmark for the performance analysis among reference works. Tables 1 and 2 indicate that combining at least two of the pipelined, parallel and memory techniques in a cryptography architecture helps achieve the lowest hardware utilisation, the lowest power consumption or the highest throughput. The performance analysis also shows that the weakest AES-128 and Blowfish designs used only one of the pipelined, parallel or sequential techniques. However, none of the reference AES-128 and Blowfish designs achieved the best hardware utilisation, throughput and power consumption all at once, despite employing different techniques. These outcomes motivate designing the AES-128 and Blowfish architectures with a combination of three techniques, namely, parallel, pipelined and memory, to improve their performance further as a contribution of this study.

3 Design Methodology

The proposed P2M AES-128 and P2M Blowfish were designed in the Verilog hardware description language by using Xilinx Vivado version 2015.2. Both cryptography designs have a 128-bit block size and a 128-bit key length to allow a fair performance comparison. The design methodologies of the proposed AES-128 and Blowfish architectures are explained in Sects. 3.1 and 3.2, respectively. The Verilog code of each architecture is then implemented on the Zynq-7000 FPGA platform for hardware verification, as shown in Fig. 1. This process begins by generating, via the Xilinx software, a bit stream file that contains the binary sequence of each proposed design. These files are downloaded to the FPGA platform, where the input data and clock frequency are controlled with a logic analyser and a signal generator, respectively. The output data are also monitored through the logic analyser. The three parameters of the performance analysis, namely, hardware utilisation, throughput and power consumption, are determined by using the Xilinx FPGA software.

Fig. 1 Implementation setup for the proposed P2M AES-128 and P2M Blowfish

3.1 P2M AES-128

The proposed AES-128 design comprises two important units, namely, data processing and key expansion. The data processing unit for encryption consists of four main transformations, namely, SubBytes, ShiftRows, MixColumns and AddRoundKey. These transformations can be inverted and applied in reverse order to produce the decryption function. The inverse transformations are InvSubBytes, InvShiftRows, InvMixColumns and AddRoundKey. The proposed AES-128 is designed by using the parallel, pipelined and memory techniques to improve power consumption and throughput with reduced hardware utilisation. Designing the AES-128 architecture with these techniques is part of the contribution of this study. The difference between the conventional AES-128 architecture, which uses the sequential technique, and the proposed P2M AES-128 architecture is illustrated in Fig. 2. As shown in Fig. 2a, the sequential technique implemented by Subhashini and Jagadeeswari [21] processes the data signal only in serial order and transforms the data consecutively in every round. Registers, instead of a memory block, store the large amount of S-box and inverse S-box data. This approach can increase the number of flip-flops (FFs) and slow down the AES-128 because each register has its own timing delay. The power consumed by the conventional design also increases.
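
The ordering of the four transformations and their inverses can be illustrated with a short Python sketch. It is only a schedule of transformation names following the standard AES-128 flow (an initial AddRoundKey, nine full rounds and a final round without MixColumns); the function names are hypothetical and the sketch does not represent the Verilog design of this work.

```python
NR = 10  # number of rounds for AES-128


def encryption_schedule(nr=NR):
    """Return the transformation names applied, in order, during encryption."""
    steps = ["AddRoundKey"]                      # initial round-key addition
    for _ in range(1, nr):                       # rounds 1 .. nr-1 are full rounds
        steps += ["SubBytes", "ShiftRows", "MixColumns", "AddRoundKey"]
    steps += ["SubBytes", "ShiftRows", "AddRoundKey"]  # final round omits MixColumns
    return steps


def decryption_schedule(nr=NR):
    """Return the inverse transformations applied, in order, during decryption."""
    steps = ["AddRoundKey"]                      # starts with the last round key
    for _ in range(1, nr):
        steps += ["InvShiftRows", "InvSubBytes", "AddRoundKey", "InvMixColumns"]
    steps += ["InvShiftRows", "InvSubBytes", "AddRoundKey"]
    return steps


if __name__ == "__main__":
    print(len(encryption_schedule()), "encryption steps")
```

The schedule makes the cost structure visible: eleven round-key additions but only nine MixColumns (or InvMixColumns) operations per 128-bit block.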

Fig. 2 Differences in AES-128 architectures: a conventional methodology; b proposed P2M methodology

In contrast, in the proposed P2M AES-128 shown in Fig. 2b, the parallel technique is used to process the 128-bit input data and the 128-bit input key at every round, fetching values from the S-box and inverse S-box memory for the SubBytes and SubWord processes in the data processing and key expansion units, respectively. A total of 256 8-bit entries of the S-box or inverse S-box are stored in a lookup table (LUT)-based random access memory (RAM) block. Figure 3 gives additional details on the data path of the SubBytes and SubWord transformations with the parallel and memory techniques. These transformations are repeated 10 times for every 128-bit data frame. The S-box is used when the mode is ‘1’ for the encryption process, and the inverse S-box is used when the mode is ‘0’ for the decryption process. Through this technique, the execution time of the SubBytes and SubWord transformations is shortened to obtain a high design throughput. The hardware requirement for these transformations is also reduced because the same memory is shared, thereby resulting in low power consumption.
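
The mode-selected table lookup described above can be modelled in software as follows. This is a minimal sketch, not the hardware design: the tables are generated from the standard AES S-box definition (multiplicative inverse in GF(2^8) followed by an affine transform) instead of being read from a RAM block, the helper names are hypothetical, and the sixteen byte lookups that the hardware performs in parallel are written as one list comprehension.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8), with 0 mapped to 0 (a^254 = a^-1)."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_tables():
    """Build the 256-entry S-box and inverse S-box."""
    sbox = []
    for x in range(256):
        b = gf_inv(x)
        sbox.append(b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63)
    inv_sbox = [0] * 256
    for x, s in enumerate(sbox):
        inv_sbox[s] = x
    return sbox, inv_sbox


SBOX, INV_SBOX = build_tables()


def sub_bytes(state, mode):
    """Substitute all 16 state bytes; mode 1 = encryption, mode 0 = decryption."""
    table = SBOX if mode == 1 else INV_SBOX
    return bytes(table[b] for b in state)
```

Applying `sub_bytes` with mode 1 and then with mode 0 returns the original 16 bytes, which mirrors the way one shared memory serves both encryption and decryption.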

Fig. 3 Parallel and memory-based techniques in the proposed P2M AES-128 architecture

In this research, the pipelined technique is implemented to achieve the highest possible throughput by dividing the AES-128 design into partitions separated by registers. The registers comprise FFs with a reset function. Figure 2b shows the pipelining data path in the P2M AES-128 architecture. Every output port of MixColumns and Rcon is also defined as a register at every round to shorten the critical paths and synchronise the data. In the final round, the 128-bit cipher text is obtained after encryption, or the 128-bit original text is recovered after decryption.
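
The effect of placing registers between partitions can be illustrated with a small cycle-by-cycle model. The sketch below is a generic software model of a register-separated pipeline under simplifying assumptions (one new input per clock, one function per stage); the function `run_pipeline` and the toy stages are hypothetical and are not taken from the proposed design.

```python
def run_pipeline(stages, inputs):
    """Clock a pipeline and return (cycle, output) pairs.

    stages: list of functions, one per pipeline stage.
    inputs: items fed into the first stage, one per clock cycle.
    """
    regs = [None] * len(stages)   # pipeline registers, empty after reset
    feed = list(inputs)
    outputs = []
    cycle = 0
    while feed or any(r is not None for r in regs):
        cycle += 1
        new_in = feed.pop(0) if feed else None
        # On each clock edge, register i captures stage i applied to the
        # value held by the previous register (or the new input for stage 0).
        prev = [new_in] + regs[:-1]
        regs = [
            stages[i](prev[i]) if prev[i] is not None else None
            for i in range(len(stages))
        ]
        if regs[-1] is not None:
            outputs.append((cycle, regs[-1]))
    return outputs


if __name__ == "__main__":
    three_stages = [lambda x: x + 1] * 3
    for cycle, value in run_pipeline(three_stages, [0, 1, 2]):
        print("cycle", cycle, "output", value)
```

With three stages, the first result appears after three clock cycles and one further result follows on every subsequent cycle, which is why pipelining raises throughput even though the latency of a single block does not shrink.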

3.2 P2M Blowfish

The P2M Blowfish design comprises two important units: the data processing and key expansion units. According to [17], the data processing unit employs the 18 P-Array subkeys and four S-boxes in the F function to perform encryption or decryption within 16 rounds, as shown in Algorithm 1. The P-Array consists of 32-bit subkeys, which are generated in the key expansion unit, as depicted in Algorithm 2 based on [17]. Both units involve only the XOR and ADD operations. As part of the contribution of this study, a combination of parallel, pipelined and memory techniques is used to increase the throughput of the P2M Blowfish design and reduce its hardware utilisation and power consumption. Figure 4 illustrates the difference between the conventional 64-bit Blowfish architecture, which uses the sequential technique, and the proposed P2M Blowfish architecture. As shown in Fig. 4a, Kurniawan et al. [34] used the sequential technique in their Blowfish design, where the data signal is processed consecutively in every round. Registers store the large amount of data of the four S-boxes. This approach can increase the logic resources and power consumption and decrease the speed of their Blowfish design.

Fig. 4 Differences in Blowfish architectures: a conventional methodology; b proposed P2M methodology

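Algorithm 1 Blowfish data processing (encryption or decryption) within 16 rounds

Algorithm 2 Blowfish key expansion (generation of the P-Array subkeys)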
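
The 16-round data path of Algorithm 1 can be sketched in Python as follows. The round structure and the F function follow the standard Blowfish description; however, the subkeys and S-boxes below are placeholders drawn from a seeded pseudo-random generator purely for illustration, whereas real Blowfish initialises them from the digits of π and then runs the key expansion of Algorithm 2. All names in the sketch are hypothetical.

```python
import random

MASK32 = 0xFFFFFFFF

# ASSUMPTION: placeholder tables, not the real pi-derived Blowfish constants.
_rng = random.Random(0)
P = [_rng.getrandbits(32) for _ in range(18)]                       # 18 subkeys
S = [[_rng.getrandbits(32) for _ in range(256)] for _ in range(4)]  # 4 x 256 entries


def f(x):
    """F function: four S-box lookups combined with ADD, XOR, ADD (mod 2^32)."""
    a, b, c, d = (x >> 24) & 0xFF, (x >> 16) & 0xFF, (x >> 8) & 0xFF, x & 0xFF
    return ((((S[0][a] + S[1][b]) & MASK32) ^ S[2][c]) + S[3][d]) & MASK32


def _crypt(block, subkeys):
    """Run the 16 Feistel rounds on a 64-bit block with the given subkey order."""
    left, right = block >> 32, block & MASK32
    for i in range(16):
        left ^= subkeys[i]
        right ^= f(left)
        left, right = right, left
    left, right = right, left      # undo the swap of the last round
    right ^= subkeys[16]
    left ^= subkeys[17]
    return (left << 32) | right


def encrypt(block):
    return _crypt(block, P)


def decrypt(block):
    return _crypt(block, P[::-1])  # same data path, subkeys in reverse order
```

Because decryption reuses the encryption data path with the subkey order reversed, a single hardware core with a mode input can serve both directions.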

In the P2M Blowfish architecture, the parallel technique is used to combine two 64-bit Blowfish cores into a Blowfish with a 128-bit block size for a fair performance comparison with the P2M AES-128. As shown in Fig. 4b, each of the two cores performs the standard Blowfish operation. Both cores execute concurrently as a dual-core design and share the same memory block, which stores the data of the 18 P-Array subkeys and four S-boxes in hexadecimal form. The pipelined technique is employed in every round to increase the design throughput and ensure accurate timing for real-time communication. In the first round of both parallel 64-bit Blowfish cores, the pipelining path begins when the 64-bit output data after the F function are stored in registers for the Blowfish operation in the next round. The F function comprises four 32-bit S-boxes whose outputs are combined by using the XOR and ADD operations for the encryption or decryption process within 16 rounds. In the last two rounds, the 64-bit data of each Blowfish core are only swapped and XORed before being concatenated to obtain the 128-bit final output data.
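
The split-and-concatenate step of the dual-core arrangement can be sketched as below. The 64-bit core is passed in as a function; `toy_core` is a hypothetical stand-in (a fixed XOR) used only so that the example runs, and the two calls that the hardware executes concurrently are made one after the other here.

```python
MASK64 = (1 << 64) - 1


def dual_core(block128, core64):
    """Process a 128-bit block as two 64-bit halves and concatenate the results."""
    upper = (block128 >> 64) & MASK64
    lower = block128 & MASK64
    # In hardware both cores run at the same time; this model calls them in turn.
    out_upper = core64(upper) & MASK64
    out_lower = core64(lower) & MASK64
    return (out_upper << 64) | out_lower


def toy_core(block64):
    """ASSUMPTION: placeholder 64-bit core, NOT a cipher; it is its own inverse."""
    return block64 ^ 0x0123456789ABCDEF


if __name__ == "__main__":
    data = 0x00112233445566778899AABBCCDDEEFF
    out = dual_core(data, toy_core)
    print(hex(out), hex(dual_core(out, toy_core)))
```

Each half is processed independently, so a full 128-bit block is produced in the time that one 64-bit block would otherwise take.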

Specifically, a 32-bit BRAM with 1024 entries of π data, as shown in Fig. 5, is deployed for the F function in the proposed Blowfish architecture. Through the memory technique, the usage of registers is decreased, which helps speed up the Blowfish encryption or decryption process. The mode input selects the encryption process at logic ‘1’ or the decryption process at logic ‘0’. The 128-bit input data are then divided into two 64-bit data words, which are processed simultaneously by the Blowfish algorithm with the F function in each round. The F functions in both Blowfish cores share the same memory block and operate in parallel. The output data generated in each round are stored in registers for pipelining.
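
One possible way of addressing a single 1024-entry memory that holds all four 256-entry S-boxes is sketched below. The layout (S-box index in the two upper address bits, input byte in the eight lower bits) is an assumption made for illustration; the actual address mapping of the proposed design is not specified in the text, and the function names are hypothetical.

```python
def bram_address(sbox_index, byte_value):
    """Form a 10-bit address: 2 bits select the S-box, 8 bits select the entry."""
    if not (0 <= sbox_index < 4 and 0 <= byte_value < 256):
        raise ValueError("sbox_index must be 0..3 and byte_value 0..255")
    return (sbox_index << 8) | byte_value


def read_sboxes(bram, x):
    """Fetch the four S-box words addressed by the four bytes of a 32-bit input."""
    if len(bram) != 1024:
        raise ValueError("expected a 1024-entry memory")
    input_bytes = [(x >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return [bram[bram_address(i, b)] for i, b in enumerate(input_bytes)]


if __name__ == "__main__":
    demo_memory = list(range(1024))  # placeholder contents, not the pi data
    print(read_sboxes(demo_memory, 0x01020304))
```

The four fetched words are then combined by the ADD and XOR operations of the F function.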

Fig. 5 Parallel, pipelined and memory-based techniques in the proposed 128-bit Blowfish architecture

4 Results and Discussion

As another contribution of this study, the proposed P2M AES-128 and P2M Blowfish were synthesised and implemented on the Xilinx Zynq-7000 XC7Z020 FPGA core with Artix-7 technology to analyse their performance in terms of three parameters, namely, hardware utilisation, throughput and power consumption. The maximum clock frequencies of the P2M AES-128 and P2M Blowfish are 250 and 324 MHz, respectively. These frequencies are the highest data rates reached before the simulation waveform began to show timing errors. Table 3, generated from the Xilinx software, lists the hold time (thd) and setup time (tsu) at selected clock frequencies for both the proposed AES-128 and Blowfish. The setup time limits the maximum clock frequency, that is, the shortest period of the data signal, and the hold time must also be met for proper operation [40]. With the proposed design techniques, the maximum clock frequency of the proposed AES-128 and Blowfish could be increased while still meeting the hold time, leading to a higher throughput. The timing analysis in Table 3 can also be used as a guideline to identify the maximum clock frequency of the enhanced P2M AES-128 and P2M Blowfish.

Table 3 The value of thd and tsu at different clock frequency

The performance results of these cryptography designs are analysed by using the Xilinx software and discussed as follows.

4.1 Hardware Utilisation

The FPGA hardware utilised by the proposed P2M AES-128 and P2M Blowfish is depicted in Fig. 6. The implementation report generated by the Xilinx software shows that the proposed Blowfish core is smaller, using 45.3% less of the configurable logic blocks (CLBs) and slices than the proposed AES-128 core. According to [41], a CLB is formed by two slices, which comprise the LUTs and FFs. All the logic functions of both the proposed AES-128 and Blowfish are performed there. The findings also show that the proposed Blowfish design used 31.5% fewer LUTs and 5.3% fewer FFs than the proposed AES-128. The proposed AES-128 needs a larger memory for S-box data storage, with a difference of 5.5% compared with the proposed Blowfish. Only 9% of the input–output (IO) blocks are used by the proposed Blowfish for real-time implementation on the Zynq-7000 platform. The use of the parallel, pipelined and memory techniques in both proposed cryptography architectures has a greater impact on the FPGA resources of the proposed P2M Blowfish core. This result also confirms that the proposed P2M Blowfish is less complex and has a smaller core size than the proposed P2M AES-128. These characteristics indicate that the proposed P2M Blowfish core is more suitable for implementation in compact, low-cost wireless mobile devices for secure communication.

Fig. 6 Hardware utilisation between the proposed P2M AES-128 and P2M Blowfish

4.2 Throughput

In this work, throughput is used to evaluate the proposed cryptography architectures and their performance on the Zynq-7000 FPGA. Throughput is calculated by using Eq. (1) based on [23, 30], which relates the design data size in bits, the maximum clock frequency and the encryption or decryption latency. Latency is the time interval between the start of the encryption or decryption of a data block and the start of the output data; for the proposed AES-128 and Blowfish, it includes the execution time of the data and key expansion operations. Latency is measured in clock cycles [23, 30].

$$\text{Throughput (Gbps)} = \frac{\text{Data size (bits)} \times \text{Maximum clock frequency (MHz)}}{\text{Latency (clock cycles)}} \tag{1}$$
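
As a worked illustration of Eq. (1), the short Python sketch below evaluates the throughput for a 128-bit block at 324 MHz with a latency of 19 clock cycles, assuming that the 19-cycle latency reported in this section applies to the P2M Blowfish. Because bits multiplied by MHz and divided by cycles gives Mbit/s, the sketch divides by 1000 to express the result in Gbps; the function name is hypothetical.

```python
def throughput_gbps(data_bits, fmax_mhz, latency_cycles):
    """Evaluate Eq. (1) and return the throughput in Gbps."""
    if latency_cycles <= 0:
        raise ValueError("latency must be a positive number of clock cycles")
    mbps = data_bits * fmax_mhz / latency_cycles  # bits x MHz / cycles = Mbit/s
    return mbps / 1000.0                          # convert Mbit/s to Gbit/s


if __name__ == "__main__":
    # 128-bit block, 324 MHz, 19 cycles: roughly 2.18 Gbps
    print(round(throughput_gbps(128, 324, 19), 3))
```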

Throughput per slice can be compared by using data transmission speed and design size. This procedure is the most objective method of comparing different security architectures on an FPGA device [42]. The equation for throughput per slice is shown below.

$$\text{Throughput/slice} = \frac{\text{Throughput (Gbps)}}{\text{No. of slices used}} \tag{2}$$
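
Equation (2) can be evaluated in the same way. The sketch below uses the figures quoted in this paper for the proposed P2M AES-128 (1.085 Gbps and 6231 slices) as an example, on the assumption that both values refer to the same implementation; the function name is hypothetical.

```python
def throughput_per_slice(throughput_gbps, slices_used):
    """Evaluate Eq. (2): throughput in Gbps divided by the number of slices."""
    if slices_used <= 0:
        raise ValueError("number of slices must be positive")
    return throughput_gbps / slices_used


if __name__ == "__main__":
    aes = throughput_per_slice(1.085, 6231)
    # Expressed in kbps per slice for readability (about 174 kbps/slice)
    print(round(aes * 1e6, 1), "kbps per slice")
```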

Figure 7 shows that the throughput of the proposed Blowfish is higher than that of the proposed AES-128 with a 50% gap at a latency of 19 clock cycles. Meanwhile, the throughput per slice for the proposed Blowfish is 98% higher than that of the proposed AES-128. This result shows that with a small design core, the proposed Blowfish can encrypt and decrypt data faster by using the parallel, pipelined and memory techniques.

Fig. 7 Performance data of the proposed P2M Blowfish and P2M AES-128 at maximum clock frequency

4.3 Power Consumption

The Vivado Power tool from the Xilinx software was used to analyse the power consumption through its power report. In this research, only the dynamic power was analysed, and the analysis was performed at the implementation stage to provide the most accurate power estimation of the user design [43]. This stage was chosen because it takes into account the netlist optimisations that affect the final logic resource utilisation, such as register replication and retiming. By default, the implementation tools aim to achieve the design performance objective and minimise device utilisation. This implies that the use of less FPGA hardware corresponds to lower dynamic power consumption. Dynamic power is associated with user design activity and switching events in the core or IO of the device [43]. It depends on the voltage level and on the logic and routing resources used by the user design, and it is determined by Eq. (3) [43].

$$\text{Dynamic power (W)} = \text{Clock} + \text{Logic} + \text{Signals} + \text{BRAM} + \text{IO} \tag{3}$$

As shown in Fig. 8, the proposed P2M Blowfish has a total power consumption of 53 mW, which is 94% lower than that of the proposed AES-128. This analysis is conducted at a clock frequency of 100 MHz for both proposed cryptography designs as the benchmark for their power comparison. Because the proposed P2M Blowfish has a simpler function than the P2M AES-128, it achieves the lowest power consumption through the parallel, pipelined and memory techniques in its architecture. This characteristic can help prolong the battery lifetime of mobile devices, which can reduce their operating cost while the security function is executed.
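
The arithmetic behind Eq. (3) and the reported power gap can be reproduced with the following sketch. The per-component values passed to `dynamic_power` are hypothetical placeholders, because the breakdown is given only in Fig. 8; the percentage is computed from the 53 mW reported for the P2M Blowfish and the 0.88 W reported for the P2M AES-128, assuming that both values were obtained under the same conditions.

```python
def dynamic_power(clock, logic, signals, bram, io):
    """Evaluate Eq. (3): the sum of the dynamic power components, in watts."""
    return clock + logic + signals + bram + io


def percent_lower(value, reference):
    """Return how much lower `value` is than `reference`, as a percentage."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return (1.0 - value / reference) * 100.0


if __name__ == "__main__":
    # ASSUMPTION: placeholder component values in watts, not measured data.
    print(dynamic_power(0.001, 0.002, 0.003, 0.004, 0.005))
    # 53 mW against 0.88 W: roughly 94% lower
    print(round(percent_lower(0.053, 0.88)))
```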

Fig. 8 Power consumption of the proposed P2M AES-128 and P2M Blowfish at 100 MHz clock frequency

4.4 Performance Comparison with Reference Works

The performance of the proposed P2M AES-128 and P2M Blowfish is compared with that of reference works based on the FPGA platform in terms of throughput, power consumption and hardware utilisation. These comparisons can be considered a guideline for researchers to evaluate the performance improvement that was achieved by the proposed AES-128 and Blowfish through parallel, pipelined and memory techniques [42]. As depicted in Fig. 9a, at 250 MHz clock frequency, the throughput of the proposed P2M AES-128 is the highest among the previous AES-128 designs with at least an 18% gap. The proposed P2M Blowfish achieved the highest throughput among others with a maximum difference of 50% at 324 MHz clock frequency.

Fig. 9 Performance comparison between the proposed P2M AES-128 and P2M Blowfish with the reference works: a throughput vs. clock frequency; b power consumption vs. slices

Figure 9b shows that the proposed P2M Blowfish requires 8% fewer slices than the design in [32] and consumes at least 31% less power than the other designs, making it the best-enhanced cryptography design. The proposed P2M AES-128, which uses 6231 slices, has a power consumption at least 58% lower than that of previous AES-128 designs. These results indicate that the use of the parallel, pipelined and memory techniques can shorten the latency of the proposed cryptography designs and speed up their execution. The combination of these techniques can also reduce the number of slices, which represents hardware utilisation and contributes to low power consumption [42]. Overall, the characteristics of the P2M Blowfish can lead to a longer battery lifetime, a small core area in mobile devices and cost-effective operation at a high data speed while performing the security function.

5 Conclusions

In this work, the AES-128 and Blowfish algorithms were enhanced by using three design techniques, namely, parallel, pipelined and memory techniques. The performance of the proposed P2M AES-128 and P2M Blowfish was analysed in terms of design throughput, hardware utilisation and power consumption. The findings show that the P2M Blowfish performed the best, with at least an 8% difference compared with the P2M AES-128 and other reference works. These results also indicate that the proposed P2M Blowfish could replace the AES-128, the existing cryptography algorithm that is still employed in mobile devices according to the IEEE standards. With its small design core, high throughput and low power consumption, the P2M Blowfish is suitable for use in mobile devices as a low-cost security feature to support wireless communication. In future work, the proposed AES-128 and Blowfish designs will be implemented in 0.18 μm complementary metal oxide semiconductor technology via ASIC methodology for further performance analysis.