1 Introduction

Elliptic curve cryptography (ECC) is now widely used where high security standards apply, protecting information and transactions in applications such as personal digital assistants (PDAs), cellular phones, smart cards, web servers, SKM, digital signatures, finance, and data authentication. ECC is a powerful public-key cryptography (PKC) technique that is employed to secure information in wireless devices (Chiou et al. 2017; Hosspain and Kong 2015; Liu et al. 2017; Zhu et al. 2013). ECC provides stronger security than the widely used Rivest-Shamir-Adleman (RSA) scheme with a significantly shorter key length (SKL). FPGA technology is used for the hardware implementation of modular multiplication, offering a short design time, low cost, and high system flexibility (Lai and Huang 2011; Liu et al. 2017). Wireless security systems employ two types of cryptography: public-key and symmetric-key cryptography (PKC and SKC). The SKC design architecture is compact and small. The PKC design provides the essential technology for key agreement, digital signatures, and encryption and decryption (Lee et al. 2014; Guitouni et al. 2011; Azarderakhsh and Mozaffari-Kermani 2015; Azarderakhsh et al. 2015).

Reliable communication systems offer an integrated platform of security services such as reliability, data confidentiality, and message authentication. These services are not possible without a secure group key management protocol. Group key agreement (GKA) techniques follow different kinds of group key management organization: centralized, distributed, and contributory. The main drawback of this technique, however, is its computational complexity (Esmaeildoust et al. 2013; Debiao et al. 2016). ECC is based on elliptic curves defined over finite fields, namely prime fields and binary fields (Yeh et al. 2013). The RSA algorithm is widely used for secure data transmission, but it is increasingly vulnerable to fast factoring attacks in cryptanalysis. ECC provides higher security than the RSA algorithm (Debiao and Zeadally 2015; Kimmo and Mozaffari-Kermani 2014; Kuang et al. 2016; Meher and Lou 2017; Shukla et al. 2020).

In an elliptic curve system, most of the run time is spent on scalar (point) multiplication (Sonali and Shekhar 2016). Scalar multiplication is built from two atomic blocks: ECC point addition (ECCPA) and ECC point doubling (ECCPR). However, a GF(p) design supports one particular elliptic curve and cannot support several elliptic curves (Sonali et al. 2016). ECC systems have been implemented using digit-serial Gaussian normal basis (GNB) multipliers. The GNB multiplier is used where efficient results are needed, but this method provides less robustness (Sonali et al. 2016; Sree et al. 2017; Vijeyakumar et al. 2016; Perianin et al. 2020).
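The decomposition of scalar multiplication into point addition and point doubling can be illustrated with a minimal double-and-add sketch in Python. The toy curve y^2 = x^3 + 2x + 2 over GF(17), the base point (5, 1), and the function names `point_add` and `scalar_mult` are assumptions chosen for illustration; they are not taken from the proposed design.

```python
# Toy curve y^2 = x^3 + A*x + B over GF(P); illustrative parameters only.
P, A, B = 17, 2, 2
INF = None  # point at infinity (group identity)


def point_add(p1, p2):
    """Add two affine points; covers both addition and doubling."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return INF  # p2 is the negative of p1
    if p1 == p2:
        # Point doubling: slope of the tangent line.
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P, -1, P) % P
    else:
        # Point addition: slope of the chord through p1 and p2.
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, pt):
    """Compute k*pt by scanning the bits of k from LSB to MSB."""
    result, addend = INF, pt
    while k:
        if k & 1:
            result = point_add(result, addend)  # point addition
        addend = point_add(addend, addend)      # point doubling
        k >>= 1
    return result
```

Every bit of the scalar costs one doubling, and every set bit costs one extra addition, which is why the speed of the underlying field multiplier dominates the overall run time.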

To overcome these problems, the proposed ECC processor structure targets improvements in two aspects: power and system performance. In this paper, dual-field Montgomery multipliers (DMM) and dual-field Vedic multipliers (DVM) are used for the cryptography system design. The multipliers are built from different kinds of adders: the optimized carry look-ahead adder (OCLA), the optimized carry bypass adder (OCBA), and the look-up-table carry select adder (LCSLA). First, the DVM-LCSLA method reduces the number of cycles for multiplicative inversion over a finite field. A dual-field multiplier is also adopted for additional processing speed. Second, the DVM-LCSLA scheduler controls the data path in both serial and parallel power modes (Sai et al. 2019). Hence, an energy-adaptive system with an active power-performance trade-off is introduced. These methods provide better FPGA and ASIC performance than conventional approaches. The FPGA implementation results indicate reduced power consumption, lower delay, and smaller hardware area overhead. The remainder of this paper is organized as follows:

  • Section 2 discusses the operation of dual field multipliers with ECC architecture

  • Section 3 discusses the function of proposed DVM-LCSLA method

  • Section 4 discusses simulation result and performance evaluation

  • Section 5 discusses the conclusion and future scope of the work

2 Dual field multipliers: ECC architecture

The ECC processor supports practical security applications such as ECDSA and general data encryption and decryption systems. It covers all basic elliptic curve (EC) computations and the common finite-field operations: point doubling, point addition, coordinate conversion, scalar multiplication, Montgomery pre-processing, Montgomery post-processing, inversion, and finite-field multiplication. The elliptic curve and the finite field can be user-defined for flexibility. Figure 1 shows the DVM-LCSLA structure with four arithmetic units (AUs) that combine the DVM and the CSLA. The system consists of the main controller, the EC scheduler, and the Montgomery scheduler (MS). The main controller decodes the instructions that operate the EC unit and the clock control unit (CCU). Each EC operation consists of a sequence of modular multiplications and additions, so the EC scheduler issues instructions to the data-path elements iteratively. The EC unit contains a register bank (RC) and four EC data selectors that connect to the Montgomery unit (MU) and the dual-field adders under the parallel (four AUs) and serial (one AU) power modes (Fig. 1).

Fig. 1 The block diagram of the dual-field Montgomery multiplier-carry save adder architecture

The EC data selector decodes the control instructions from the EC scheduler and the main controller and routes operands to the MU for modular multiplication, while the dual-field adder is used for modular addition. The read data (RD) path then stores the intermediate result in the register bank. Two multipliers are used in this architecture: the Montgomery multiplier and the dual-field Vedic multiplier. These multipliers are optimized with different adders (OCLA, OCBA, and LCSLA), which are explained in the following sections.
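As background for the modular multiplication performed by the MU, the following Python sketch shows textbook Montgomery multiplication over a prime field. It is a software illustration under stated assumptions, not the hardware data path of the proposed design: the function names, the choice R = 2^k, and the small modulus in the usage note are hypothetical, and the binary-field half of a dual-field unit is not modeled.

```python
def montgomery_setup(n, k):
    """Constants for an odd modulus n with R = 2**k > n (assumed)."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # satisfies n * n_prime = -1 (mod R)
    return r, n_prime


def mont_mul(a_bar, b_bar, n, k, n_prime):
    """Return a_bar * b_bar * R^-1 mod n using only shifts and masks by R."""
    mask = (1 << k) - 1
    t = a_bar * b_bar
    m = ((t & mask) * n_prime) & mask  # makes t + m*n divisible by R
    u = (t + m * n) >> k
    return u - n if u >= n else u      # single conditional subtraction


def mod_mul(a, b, n, k):
    """Ordinary modular product computed through the Montgomery domain."""
    r, n_prime = montgomery_setup(n, k)
    a_bar = (a * r) % n                # pre-processing: enter the domain
    b_bar = (b * r) % n
    c_bar = mont_mul(a_bar, b_bar, n, k, n_prime)
    return mont_mul(c_bar, 1, n, k, n_prime)  # post-processing: leave it
```

For instance, `mod_mul(7, 11, 17, 5)` computes 7 · 11 mod 17 with R = 32. The pre-processing and post-processing conversions correspond to the Montgomery pre- and post-processing steps listed for the processor; in a long chain of multiplications they are paid only once.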

2.1 DMM-CSA structure

The carry-save adder (CSA) is a digital adder used in computer microarchitecture to compute the sum of three or more N-bit binary numbers. It differs from other digital adders in that it produces two outputs of the same width as the inputs: a sequence of partial-sum bits and a sequence of carry bits. In this work, the DMM architecture for the ECC system is designed using a CSA circuit.
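The two-output behavior of a carry-save stage can be sketched in a few lines of Python. The function names are hypothetical, and the model is purely behavioral: it says nothing about how the CSA is wired inside the DMM.

```python
def carry_save_add(a, b, c):
    """Reduce three operands to a partial-sum word and a carry word."""
    partial_sum = a ^ b ^ c                       # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1    # majority bits, shifted left
    return partial_sum, carry


def csa_sum(operands):
    """Sum many operands; carries propagate only in the final addition."""
    s, c = 0, 0
    for x in operands:
        s, c = carry_save_add(s, c, x)
    return s + c  # one carry-propagating addition at the end
```

No carry ripples between bit positions inside `carry_save_add`; the only carry-propagating addition is the last line of `csa_sum`.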

In the CSA-based design, long carry propagation is one of the main problems, because the adder requires several full-adder circuits and therefore occupies more area. The existing design (Fig. 2) uses more hardware and has poor ASIC performance. To improve ASIC and FPGA performance, we propose four methods, DMM-OCLA, DMM-OCBA, DMM-LCSLA, and DVM-LCSLA, which are implemented and analyzed for output performance.

Fig. 2 Block diagram of the DMM-CSA

2.2 DMM-OCLA structure

In recent years, most research has focused on minimizing the delay of the addition operation. High-performance, high-speed systems need high-speed adders, since addition is the fundamental operation of most circuits. Figure 3 shows the architecture of the existing carry look-ahead adder (CLA).

Fig. 3 Existing CLA architecture

The existing CLA design requires several full-adder (FA) circuits for the given input bits (A0, A1, A2, B0, B1, B2). For example, an 8-bit CLA design requires eight FA circuits, which increases the area. The 8-bit CLA built from the FA design is shown in Fig. 4.

Fig. 4 Structure of optimized DMM-CLA

The optimized CLA uses a single FA design with a register instead of two or more FA circuits. The output data are stored in the register on each input clock cycle. The OCLA reduces the carry propagation delay, and the circuit occupies less area than the DMM-CSA method.
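For reference, the generate/propagate logic of a conventional carry look-ahead adder can be modeled as follows. This Python sketch describes the standard CLA only, not the register-based OCLA optimization; the function name and the default 8-bit width are assumptions.

```python
def cla_add(a, b, cin=0, width=8):
    """Behavioral model of a carry look-ahead adder.

    g[i] = a[i] AND b[i]   (bit i generates a carry)
    p[i] = a[i] XOR b[i]   (bit i propagates an incoming carry)
    c[i+1] = g[i] OR (p[i] AND c[i])
    In hardware the recurrence is flattened so that every carry depends
    only on g, p, and cin; here it is simply evaluated in a loop.
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    carries = [cin]
    for i in range(width):
        carries.append(g[i] | (p[i] & carries[i]))
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i  # sum bit = p XOR carry-in
    return total, carries[width]
```

The function returns the `width`-bit sum together with the carry-out, so that blocks can be chained.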

2.3 DMM-OCBA structure

Figure 5 shows the circuit diagram of the carry bypass adder (CBA). In the CBA, the inputs are assumed to be loaded in parallel, and the skip signals of all blocks are set up at the same time. The first skipped block must have the same size as the un-skipped block before it, so that all data inputs of the main multiplexers arrive at the same time.

Fig. 5 A circuit diagram of existing CBA structure

The conventional CBA design requires a large number of logic gates, so the circuit occupies more area in the multiplier design. Therefore, a look-up table (LUT) circuit is used for the CBA in the DMM design. A LUT replaces run-time computation with a simpler array-indexing operation. The saving in processing time can be substantial, since retrieving a value from memory is faster than performing the computation. The optimized LUT-CBA architecture is shown in Fig. 6; its processing time and area are lower than those of the existing CBA design.
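The combination of a bypass path with table look-up can be sketched behaviorally in Python. The 4-bit block size, the table layout `BLOCK_LUT`, and the function name are assumptions made for illustration; the source does not specify how the LUT of the OCBA is partitioned.

```python
BLOCK = 4
MASK = (1 << BLOCK) - 1

# Hypothetical LUT: (a_block, b_block, carry_in) -> (sum_block, carry_out).
BLOCK_LUT = {
    (x, y, c): ((x + y + c) & MASK, (x + y + c) >> BLOCK)
    for x in range(1 << BLOCK)
    for y in range(1 << BLOCK)
    for c in (0, 1)
}


def carry_bypass_add(a, b, cin=0, width=8):
    """Carry bypass (carry skip) adder; width must be a multiple of BLOCK."""
    result, carry = 0, cin
    for lo in range(0, width, BLOCK):
        a_blk = (a >> lo) & MASK
        b_blk = (b >> lo) & MASK
        sum_blk, block_cout = BLOCK_LUT[(a_blk, b_blk, carry)]  # table look-up
        # If every bit position propagates, the incoming carry skips the block.
        bypass = (a_blk ^ b_blk) == MASK
        carry = carry if bypass else block_cout
        result |= sum_blk << lo
    return result, carry
```

When `bypass` is true the block's carry-out equals its carry-in, so the multiplexer can forward the carry without waiting for the block to finish.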

Fig. 6 Block diagram of the DMM-OCBA structure

2.4 DMM-LCSLA structure

The basic idea of this work is to use a LUT as a substitute for the ripple carry adder (RCA) with Cin = 1. The architecture of the carry select adder (CSLA) with its Binary to Excess-1 Converter (BEC) is shown in Fig. 7.

Fig. 7 Block diagram of the existing CSLA

The block diagram of the optimized LCSLA is depicted in Fig. 8. The optimized LCSLA architecture is used in fast arithmetic applications. It achieves lower power consumption with reduced hardware area overhead and is suited to high-speed applications. The LCSLA is used in many complex structures to reduce the carry propagation delay.

Fig. 8 Block diagram of the optimized LUT-CSLA

The basic idea is to use a LUT in place of the RCA in the LCSLA in order to reduce area and power consumption. The main advantage of the LCSLA is that the time taken by the RCA is reduced; it consists of one full adder and one half adder.

The input arrival time is smaller than the arrival time of the multiplexer selection input. Based on the select-line input Cin, the adder delivers either the LUT output or the multiplexer output. Therefore, the DMM-LCSLA method improves the computation time of the ECC system. However, this adder is not well suited to the Montgomery multiplier design; the LCSLA design is therefore applied to the DVM design of the ECC system.
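The select mechanism described in this section can be sketched in Python as follows. Each block is added once with Cin = 0, the Cin = 1 result is read from a table instead of being computed by a second RCA, and the incoming carry acts as the multiplexer select. The 4-bit block size, the table `PLUS_ONE_LUT`, and the function name are assumptions; the sketch is behavioral and does not reproduce the gate-level LCSLA.

```python
BLOCK = 4
MASK = (1 << BLOCK) - 1

# Hypothetical LUT indexed by the 5-bit Cin = 0 result {carry, sum};
# each entry holds the corresponding Cin = 1 result (value plus one).
PLUS_ONE_LUT = [(v + 1) & 0x1F for v in range(1 << (BLOCK + 1))]


def lcsla_add(a, b, cin=0, width=8):
    """LUT-based carry select adder; width must be a multiple of BLOCK."""
    result, carry = 0, cin
    for lo in range(0, width, BLOCK):
        r0 = ((a >> lo) & MASK) + ((b >> lo) & MASK)  # block sum for Cin = 0
        r1 = PLUS_ONE_LUT[r0]                         # block sum for Cin = 1
        chosen = r1 if carry else r0                  # multiplexer on carry-in
        result |= (chosen & MASK) << lo
        carry = chosen >> BLOCK
    return result, carry
```

Both candidate results of a block are available before its carry-in arrives, so the block delay reduces to that of the multiplexer once the select input is valid.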

3 Proposed DVM-LCSLA method

Figure 9 shows 2 × 2 multiplication using a Vedic multiplier. The ECC configuration can be realized using an 8 × 8 dual-field Vedic multiplier.

Fig. 9 2×2 Vedic multiplication

The Vedic multiplier is implemented using the LCSLA (Sect. 2.4). In this approach, two kinds of fields, the binary field and the prime field, are used for the cryptography system. Several basic principles and sub-principles of Vedic mathematics are applied to carry out the complete multiplication. The Vedic multiplier design is fast compared with the Montgomery multiplier and can be applied to all kinds of advanced designs. Consider the Urdhva Tiryagbhyam multiplication of two 2-bit binary numbers, the multiplicand (a1, a0) and the multiplier (b1, b0). The multiplication of these numbers gives a 4-bit output.

In general, the Vedic multiplier follows the steps below.

Step 1:

The vertical multiplication of the least significant bits (LSBs) produces the least significant bit of the final result.

Step 2:

The crosswise multiplication of the LSB of the multiplicand with the most significant bit (MSB) of the multiplier, and of the MSB of the multiplicand with the LSB of the multiplier, produces two partial products. Their sum gives the second bit of the final result.

Step 3:

The MSBs of the multiplicand and the multiplier are multiplied. The product is added to the carry from the addition in Step 2. The resulting sum and carry are the third and fourth bits of the final result. Figure 9 outlines this 2 × 2 multiplication.
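The three steps above can be written out bit by bit. The following Python sketch is a behavioral model of a 2 × 2 Urdhva Tiryagbhyam multiplier built from AND gates and two half adders; the function and variable names are hypothetical.

```python
def vedic_2x2(a, b):
    """Multiply two 2-bit numbers and return the 4-bit product."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1

    # Step 1: vertical product of the LSBs gives result bit 0.
    s0 = a0 & b0

    # Step 2: crosswise products, added by a half adder, give bit 1.
    cross1, cross2 = a1 & b0, a0 & b1
    s1 = cross1 ^ cross2
    c1 = cross1 & cross2

    # Step 3: vertical product of the MSBs plus the Step 2 carry
    # gives bit 2 (sum) and bit 3 (carry).
    msb = a1 & b1
    s2 = msb ^ c1
    s3 = msb & c1

    return s0 | (s1 << 1) | (s2 << 2) | (s3 << 3)
```

Calling `vedic_2x2(3, 3)` exercises the worst case, in which both half adders produce a carry.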

The block diagram of the 4×4 DVM is shown in Fig. 10. In Sect. 2.1.5, the computation of the dual-field Vedic multiplier is presented. Verilog code was written according to this diagram to verify the results. The block contains four 2×2 multiplier blocks and three adder blocks. In this diagram, a0 to a3 and b0 to b3 denote the four-bit input values.

Fig. 10 Working procedure of 4×4 DVM

The block diagram of the 4×4 dual-field Vedic multiplier is shown in Fig. 11. In the first stage, the least significant bits of the two operands (a0, a1 and b0, b1) are applied to the inputs of a 2×2 multiplier block to perform the multiplication. In the second stage a2, a3 and b0, b1, in the third stage a0, a1 and b2, b3, and in the final stage a2, a3 and b2, b3 are multiplied by 2×2 multiplier blocks. The outputs of the last two multiplier stages are fed to one adder, and the outputs of the first two stages are fed to another adder. The results of these two adders form the inputs of the final adder. Finally, the 8-bit result appears at the output of the DVM structure. The performance metrics of the proposed FPGA implementation based on the LCSLA technique are better than those of the existing methods (Karthikeyan and Jagadeeswari 2020).
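The composition of four 2 × 2 blocks and three adders can be checked with a short behavioral model in Python. The shift amounts that align the partial products and the names `adder_first`, `adder_last`, and `vedic_4x4` are assumptions inferred from the description; ordinary integer addition stands in for the LCSLA adders, and only integer (prime-field style) multiplication is modeled.

```python
def vedic_2x2(a, b):
    """2x2 Vedic block: AND gates plus two half adders."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    cross1, cross2 = a1 & b0, a0 & b1
    c1 = cross1 & cross2
    msb = a1 & b1
    return (a0 & b0) | ((cross1 ^ cross2) << 1) | ((msb ^ c1) << 2) | ((msb & c1) << 3)


def vedic_4x4(a, b):
    """4x4 multiplier from four 2x2 blocks and three adders."""
    a_lo, a_hi = a & 0b11, (a >> 2) & 0b11   # (a1, a0) and (a3, a2)
    b_lo, b_hi = b & 0b11, (b >> 2) & 0b11   # (b1, b0) and (b3, b2)

    q0 = vedic_2x2(a_lo, b_lo)   # stage 1
    q1 = vedic_2x2(a_hi, b_lo)   # stage 2
    q2 = vedic_2x2(a_lo, b_hi)   # stage 3
    q3 = vedic_2x2(a_hi, b_hi)   # stage 4

    adder_first = q0 + (q1 << 2)         # adder for the first two stages
    adder_last = (q2 << 2) + (q3 << 4)   # adder for the last two stages
    return adder_first + adder_last      # final adder, 8-bit product
```

The same recursive pattern extends to the 8 × 8 case by using four 4 × 4 blocks and shifting the partial products by four and eight bit positions.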

Fig. 11 4×4 dual field Vedic multiplier block diagram

3.1 Cache memory for elliptic curve cryptography with error correction scheme

In this work, first, the whole multiplication algorithm is unrolled so that no additional cycles are wasted on loop operations. Second, the working registers are reused as a memory cache to reduce the number of necessary load operations. A wide range of hash functions can be used to map addresses to different locations in the cache ways. It has previously been shown that XOR mapping achieves lower miss rates than the set-associative cache structure. Since error-correcting code (ECC) data is stored with every data field in the cache, values are encoded before they are written into the storage so as to generate the ECC bits.

Error-correcting codes used for this purpose are themselves hash functions that generate a bit vector from another data vector. Since this hashing is performed for fault detection, we propose to use the ECC encoding circuit as the hash function of the skewed cache, to remove the stored ECC bits from the cache structure altogether, and to use the ECC bits for indexing. Figure 12 shows the proposed architecture, in which the computed ECC bits for the tags are no longer stored inside the cache ways; instead, the ECC serves as the hash function, and the computed ECC bits are used as the indexes into the cache ways.

Fig. 12 Fault tolerant cache architecture

Figure 13 shows the overall workflow of a side-channel attack against the cache. When a tag is read, the read value is checked for possible soft errors before the stored tag is compared with the tag portion of the memory address to determine a potential cache hit. This new technique reduces the effective area of the cache and makes it less prone to soft errors.
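The idea of indexing a cache way with the check bits of the tag, and of using the same bits to detect soft errors on a read, can be sketched in Python. The source does not state which error-correcting code is used; the Hamming-style check bits, the 8-bit tag width, and the names `hamming_check_bits` and `EccIndexedWay` are assumptions for illustration only.

```python
def hamming_check_bits(data, data_width=8):
    """Hamming-style check bits: XOR of the codeword positions of set data bits.

    Data bits occupy the non-power-of-two positions 3, 5, 6, 7, 9, ...
    """
    check, pos = 0, 1
    for i in range(data_width):
        pos += 1
        while pos & (pos - 1) == 0:  # skip positions reserved for check bits
            pos += 1
        if (data >> i) & 1:
            check ^= pos
    return check


class EccIndexedWay:
    """One cache way indexed by the check bits of the tag (hypothetical)."""

    def __init__(self, n_check_bits=4):
        self.lines = [None] * (1 << n_check_bits)

    def insert(self, tag, value):
        # The check bits are not stored; they select the line.
        self.lines[hamming_check_bits(tag)] = (tag, value)

    def lookup(self, tag):
        index = hamming_check_bits(tag)
        entry = self.lines[index]
        if entry is None:
            return "miss", None
        stored_tag, value = entry
        # Soft-error check: the stored tag must still hash to its own index.
        if hamming_check_bits(stored_tag) != index:
            return "error", None
        if stored_tag == tag:
            return "hit", value
        return "miss", None
```

Because a single flipped tag bit always changes the check bits, a corrupted tag no longer matches the index of the line that holds it, and the lookup reports an error instead of a false hit or a silent miss.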

Fig. 13 Overall work flow of side channel attack against cache

4 Results and discussions

The proposed ECC multiplier design was described in the Verilog hardware description language (HDL) and implemented in Xilinx ISE Design Suite 14.1, targeting a Virtex-6 FPGA device.

The simulation output of the proposed Vedic multiplier is shown in Fig. 14. This multiplier is used in the ECC architecture. The proposed dual-field Vedic multiplier with look-up-table carry select adder achieves a higher throughput and a much smaller area-time product (ATP) than previous strategies.

Fig. 14 Simulation result

The simulation response of the hit-miss logic is shown in Fig. 15. During simulation, the cache enters the tag-compare state, where it examines the tags and checks the valid bit to decide whether there is a cache hit or miss.

Fig. 15 Simulation results of hit-miss logic

The simulation response of the RAM cache is shown in Fig. 16. With the proposed dual-field Vedic multiplier with look-up-table carry select adder, the performance of the cache is improved.

Fig. 16 Simulation results of RAM cache

The simulation response of the tag valid array is shown in Fig. 17. The tag valid array is used for accessing the data in the data array and for maintaining the valid bits.

Fig. 17 Simulation results of tag valid array

The point multiplication delay of the proposed DVM-LCSLA multiplier is shown in Fig. 18.

Fig. 18 Point multiplication delay obtained with DVM-LCSLA

The RTL schematic of the proposed DVM-LCSLA multiplier is shown in Fig. 19. Compared with other multipliers, the proposed DVM-LCSLA multiplier has a low area because the underlying algorithm has fewer adder levels.

Fig. 19 RTL schematic of DVM-LCSLA

The floor plan view of the proposed cryptography design based on the dual-field Vedic multiplier with look-up-table carry select adder is shown in Fig. 20.

Fig. 20 DVM-LCSLA floor plan view

Table 1 and Fig. 21 present the performance analysis of power consumption. The comparison shows that the proposed memory design based on the dual-field Vedic multiplier with look-up-table carry select adder achieves the lowest power consumption among the compared methods; for example, the total power consumption of the proposed system is 18.63 µW.

Table 1 Performance analysis of power consumption
Fig. 21 Performance analysis of power consumption

Figure 22 presents the performance analysis of time complexity. The comparison shows that the proposed memory design based on the dual-field Vedic multiplier with look-up-table carry select adder achieves the best time complexity among the compared methods; for example, the overall time complexity of the proposed system is 6 s.

Fig. 22 Performance comparison of time analysis

Figure 23 presents the performance analysis of area overhead. The comparison shows that the proposed memory design based on the dual-field Vedic multiplier with look-up-table carry select adder achieves the lowest area overhead among the compared methods.

Fig. 23 Performance comparison of area overhead

5 Conclusion

This work introduces a dual-field Vedic multiplier with look-up-table carry select adder (DVM-LCSLA) architecture for cryptography-based systems. The proposed DVM-LCSLA was developed in Verilog using Xilinx software. In this method, the multiplier performs the multiplication operation, and the LCSLA is used in place of the conventional adder; parameters such as power and delay were then evaluated. Compared with existing methods, the DVM-LCSLA method gives better FPGA and ASIC performance. In the FPGA implementation, factors such as the required number of LUTs, the number of flip-flops, and the frequency are improved by DVM-LCSLA. Relative to conventional methods, the proposed DVM-LCSLA reduces hardware area overhead by 86.01%, power by 74.63%, and time delay by 29.61% in 180 nm technology, and reduces hardware area overhead by 46.77%, power by 78.42%, and time delay by 21.23% in 45 nm technology. In future work, the ECC architecture and its internal blocks will be optimized to improve the ASIC and FPGA performance further.