1 Introduction

In VLSI design, there is a trade-off between low power consumption on the one hand and high operating speed, or minimum time delay, on the other. Most of these systems must meet a low-power budget while also achieving the equally challenging goals of high-speed operation and short delay. In high-performance digital systems, such as microprocessors and digital signal processors (DSPs), low-power design is becoming a critical challenge. As a result, low-power digital integrated circuit design has become a very active and continually evolving subject in VLSI design.

According to Landauer's principle, every bit of information lost in an irreversible logic computation must generate at least kT ln 2 joules of heat (that is, heat of the order of kT), where k is Boltzmann's constant and T is the absolute temperature at which the computation is performed. At room temperature this amount of heat is small, but it is not negligible [1]. Multipliers are the prime components of digital signal processors, microprocessors, microcontrollers, etc., so their architecture should be designed with power dissipation in mind. Incorporating reversible logic into the architecture of the multiplier can reduce power losses significantly. Reversible logic may become the new normal in power-optimized circuits, nanotechnology, fast computing, and digital signal processing [2].
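To give a sense of scale, the bound can be evaluated numerically. The minimal Python sketch below computes kT ln 2, the precise form of Landauer's bound, at an assumed room temperature of 300 K. The function name `landauer_limit` is illustrative only.

```python
import math

BOLTZMANN_K = 1.380649e-23  # J/K, exact SI value


def landauer_limit(temperature_kelvin: float, bits_erased: int = 1) -> float:
    """Minimum heat (in joules) dissipated when `bits_erased` bits are irreversibly erased."""
    return bits_erased * BOLTZMANN_K * temperature_kelvin * math.log(2)


if __name__ == "__main__":
    # Assumed operating temperature of 300 K ("room temperature").
    print(f"kT ln 2 at 300 K: {landauer_limit(300.0):.3e} J per bit")
    print(f"Erasing 1e9 bits: {landauer_limit(300.0, 10**9):.3e} J")
```

At 300 K this is roughly 2.9 × 10^-21 J per erased bit. The amount is tiny for a single operation, but it accumulates over the very large number of bit erasures in a conventional datapath.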

2 Vedic multiplication

The Vedic methods are based on natural principles that are followed by the human mind. The sutras used for multiplication are listed in Table 1, along with the names of their corollaries.

Table 1 Vedic mathematics sutras and their corollaries

For manual calculation, these principles make multiplication easier and faster than conventional methods [3]. Large numbers are usually difficult to hold in memory. With the Vedic approach, however, the calculator pictures a line diagram and only adds consecutive product terms, so only a few numbers need to be remembered at any time. As a result, Vedic multiplication is faster and more convenient for manual calculations [4,5,6,7,8,9,10]. “Urdhva Tiryakbhayam” is the best-known Vedic multiplication method, where Urdhva means “vertically” and Tiryakbhayam means “crosswise” (diagonally). Besides the commonly used Urdhva Tiryakbhayam sutra, the Nikhilam sutra is also used for higher-radix multiplication [11, 12].
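The vertical-and-crosswise pattern can be written as a column-by-column procedure. Column k of the product collects every digit product x_i·y_j with i + j = k, keeps one digit, and carries the rest to the next column. The Python sketch below is an illustrative reading of the sutra, not code from this paper. `to_digits` and `urdhva_multiply` are hypothetical names.

```python
def to_digits(value: int, base: int) -> list[int]:
    """Return the digits of a non-negative integer, least significant first."""
    digits = []
    while True:
        digits.append(value % base)
        value //= base
        if value == 0:
            return digits


def urdhva_multiply(x: int, y: int, base: int = 10) -> int:
    """Multiply two non-negative integers column by column ("vertically and crosswise")."""
    xd, yd = to_digits(x, base), to_digits(y, base)
    n = max(len(xd), len(yd))
    xd += [0] * (n - len(xd))  # pad both operands to the same length
    yd += [0] * (n - len(yd))
    result, carry, place = 0, 0, 1
    for k in range(2 * n - 1):
        # Column k: all vertical/crosswise products whose digit positions sum to k.
        column = carry + sum(
            xd[i] * yd[k - i] for i in range(max(0, k - n + 1), min(k, n - 1) + 1)
        )
        result += (column % base) * place  # keep one digit in this column
        carry = column // base             # pass the rest to the next column
        place *= base
    return result + carry * place          # the final carry forms the leading digits


print(urdhva_multiply(12, 13))            # 156
print(urdhva_multiply(0b11, 0b10, base=2))  # 6
```

With `base=2`, each column sum is a set of AND-ed partial products to be added, which is the form that the hardware multipliers in Sect. 5 implement.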

3 Reversible logic

Reversible logic becomes relevant when designs must be low-power, area-efficient, and time-efficient. Power dissipation, chip area, and time delay are significant metrics in digital design [13, 14].

In a combinational circuit built from reversible logic, every input pattern maps to a unique output pattern, so the truth table defines a one-to-one correspondence [13]. Consequently, a reversible logic gate has the same number of inputs and outputs. This discussion is restricted to two-valued logic functions describing switching logic.

An efficient reversible logic circuit is characterized by the following:

  1. Fewer reversible gates (lower hardware complexity).
  2. Fewer constant inputs.
  3. Fewer garbage outputs.
  4. Optimised quantum cost.
  5. Optimised time delay.

A reversible logic gate implements a one-to-one mapping between its input and output vectors, so the input vector can always be recovered from the output vector. Because no information is erased, reversible logic is consistent with Landauer's principle and, in principle, avoids the associated minimum heat dissipation. Reversible logic is therefore a promising alternative to conventional digital logic for arithmetic operations, with optimized characteristics in terms of power, heat dissipation, and delay.
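The one-to-one requirement can be checked mechanically: enumerate every input vector and confirm that no two inputs produce the same output. The sketch below models gates as Python functions on bit tuples. `is_reversible` and the two example gates are hypothetical helpers for illustration.

```python
from itertools import product
from typing import Callable, Tuple

Bits = Tuple[int, ...]


def is_reversible(gate: Callable[[Bits], Bits], width: int) -> bool:
    """True if `gate` maps the 2**width input vectors one-to-one onto output vectors of the same width."""
    outputs = set()
    for bits in product((0, 1), repeat=width):
        out = gate(bits)
        if len(out) != width:  # a reversible gate has as many outputs as inputs
            return False
        outputs.add(out)
    # All outputs distinct means every input vector can be recovered.
    return len(outputs) == 2 ** width


def cnot(bits: Bits) -> Bits:
    a, b = bits
    return (a, a ^ b)  # controlled-NOT: the target flips when the control is 1


def and_with_copy(bits: Bits) -> Bits:
    a, b = bits
    return (a, a & b)  # keeps A, but B cannot be recovered when A = 0


print(is_reversible(cnot, 2))           # True
print(is_reversible(and_with_copy, 2))  # False
```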

Constant input: the number of inputs that are tied to a fixed value of either 0 or 1 in order to synthesize the given logical function.

Garbage output: the number of outputs that are not used in the synthesis of a function. Garbage outputs are nevertheless essential: they carry the information needed to keep the input-to-output mapping one-to-one, and without them reversibility cannot be achieved.

Quantum cost: The quantum cost of a reversible circuit is calculated by counting the number of 2 × 2 logic gates, such as controlled-NOT, controlled-V, and controlled-V+ gates, required to implement the design [15].

Delay: the maximum number of gates in any path from an input to an output, assuming that each gate performs its computation in one unit of time and that all inputs to the circuit are available before the computation begins [16].

Hardware complexity: Optimizing the hardware is one of the prime objectives of reversible logic. Hardware complexity is defined as the number of EXOR, AND, and OR operations involved in a design [17,18,19].

3.1 Toffoli gate

The Toffoli gate is a 3 × 3 reversible gate. Its first two outputs are simple buffers (P = A, Q = B), and the third output is a function of all three inputs (R = AB ⊕ C). The mapping of inputs A, B, and C to outputs P, Q, and R is shown in Fig. 1. The Toffoli gate is also considered a universal reversible gate. Its quantum cost is 5 [18].
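Because Fig. 1 is not reproduced in this text, the sketch below models the standard Toffoli mapping (P = A, Q = B, R = AB ⊕ C) in Python. It checks that the gate is its own inverse and illustrates why it is universal: tying C to a constant turns R into AND or NAND. The function name `toffoli` is illustrative.

```python
from itertools import product


def toffoli(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Toffoli gate: A and B pass through; C is flipped only when A = B = 1."""
    return a, b, (a & b) ^ c


for a, b, c in product((0, 1), repeat=3):
    p, q, r = toffoli(a, b, c)
    assert toffoli(p, q, r) == (a, b, c)  # applying the gate twice restores the input
    print(a, b, c, "->", p, q, r)

# With C tied to a constant, R realizes AND (C = 0) or NAND (C = 1).
print([toffoli(a, b, 0)[2] for a, b in product((0, 1), repeat=2)])  # [0, 0, 0, 1]
print([toffoli(a, b, 1)[2] for a, b in product((0, 1), repeat=2)])  # [1, 1, 1, 0]
```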

Fig. 1
figure 1

Block diagram representation of the Toffoli gate

3.2 Peres gate

The Peres gate is a 3 × 3 reversible gate whose output “P” is mapped directly to input “A”. The remaining two outputs are combinational functions of the inputs, Q = A ⊕ B and R = AB ⊕ C. The block diagram of the Peres gate is shown in Fig. 2. The quantum cost of the Peres gate is 4 [16].
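The following minimal Python model of the standard Peres mapping (P = A, Q = A ⊕ B, R = AB ⊕ C) confirms the one-to-one mapping. It also shows a familiar use: with C tied to 0, the gate behaves as a half adder. The function `peres` is a hypothetical helper, not part of any tool.

```python
from itertools import product


def peres(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Peres gate: P = A, Q = A XOR B, R = (A AND B) XOR C."""
    return a, a ^ b, (a & b) ^ c


outputs = {peres(*bits) for bits in product((0, 1), repeat=3)}
print(len(outputs) == 8)  # True: all eight input vectors give distinct outputs

# Tying C to 0 turns the Peres gate into a half adder: Q = sum, R = carry.
for a, b in product((0, 1), repeat=2):
    _, s, carry = peres(a, b, 0)
    assert 2 * carry + s == a + b
```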

Fig. 2
figure 2

Block diagram representation of the Peres gate

3.3 BVPPG gate

The BVPPG gate has five input variables (A, B, C, D, E) and the output vector (P, Q, R, S, T). Figure 3 shows the block diagram of the BVPPG gate. It has a quantum cost of 5 [20,21,22].

Fig. 3
figure 3

Block diagram representation of the BVPPG gate

4 QCA technology

Quantum-dot Cellular Automata (QCA) is an alternative to conventional MOS-based digital circuit design and a more power-efficient technology. The concept was introduced by Tougaw and Lent of the University of Notre Dame in 1993. The quantum-dot cellular automaton is one proposed physical implementation of the broader quantum cellular automata concept and describes it at a lower level of abstraction. Quantum dots are charge containers with discrete electrical energy states [23]. A QCA cell, the basic computing element of QCA nanotechnology, is made up of four quantum dots. Owing to Coulomb repulsion, the two electrons in a cell occupy diagonally opposite dots, which gives two configurations that encode binary logic "0" and "1". The electrons move between the dots of a cell by quantum-mechanical tunnelling. The polarization levels ‘+1’ and ‘-1’ represent the logic levels ‘1’ and ‘0’, respectively. QCA devices are typically described using symmetric square cells, with which computational logic gates and memory structures can be implemented correctly. Because combinations of majority and NOT gates can realize any logic function, a desired logic function can be implemented by arranging QCA cells in a specific geometric pattern [24, 25].

Figure 4 depicts the electron configurations within the cell for both polarization levels. Figures 5 and 6 show a basic inverter and a 3-input majority gate, respectively. A majority gate has three inputs and one output, and its middle cell is the decision-making cell. AND and OR logic are realized with a three-input majority gate by fixing the third input to ‘-1’ and ‘+1’, respectively.
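At the logic level, ignoring cell geometry and timing, the majority gate and the constant-input trick for AND and OR can be sketched as below, with polarization -1 taken as logic 0 and +1 as logic 1. All helper names are hypothetical.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Three-input majority vote: 1 when at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)


def inverter(a: int) -> int:
    return 1 - a


def and_gate(a: int, b: int) -> int:
    return majority(a, b, 0)  # third input fixed at polarization -1 (logic 0)


def or_gate(a: int, b: int) -> int:
    return majority(a, b, 1)  # third input fixed at polarization +1 (logic 1)


for a, b in product((0, 1), repeat=2):
    assert and_gate(a, b) == (a & b)
    assert or_gate(a, b) == (a | b)
    print(a, b, "AND:", and_gate(a, b), "OR:", or_gate(a, b), "NOT a:", inverter(a))
```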

Fig. 4
figure 4

Polarized QCA cell defining logic levels 0 and 1

Fig. 5
figure 5

QCA inverter

Fig. 6
figure 6

QCA 3-input majority gate

Figures 7 and 8 show two structures of the XOR gate, with cell counts of 14 and 13, respectively, which are used to realize the proposed designs. The latter structure is used to optimise the designs, as discussed in the results section.
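For reference, one common logic-level decomposition of XOR into the primitives of Figs. 5 and 6 uses three majority gates and two inverters: A ⊕ B = M(M(A, B′, 0), M(A′, B, 0), 1). This textbook construction is assumed here for illustration only. The compact 14- and 13-cell structures of Figs. 7 and 8 need not follow it.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    return (a & b) | (b & c) | (a & c)


def inv(a: int) -> int:
    return 1 - a


def xor_from_majority(a: int, b: int) -> int:
    """A XOR B = OR(AND(A, NOT B), AND(NOT A, B)), built from three majority gates."""
    left = majority(a, inv(b), 0)     # AND(A, B')
    right = majority(inv(a), b, 0)    # AND(A', B)
    return majority(left, right, 1)   # OR of the two product terms


for a, b in product((0, 1), repeat=2):
    assert xor_from_majority(a, b) == a ^ b
print("majority-based XOR matches a ^ b on all inputs")
```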

Fig. 7
figure 7

QCA structure I of XOR gate

Fig. 8
figure 8

QCA structure II of XOR gate

In QCA Designer, wire crossovers and interconnections in complex circuits can be realized using either a coplanar or a multilayer approach. In the coplanar approach, the cells of the two crossing wires are oriented orthogonally to each other, so the crossing cells do not affect each other: the cells in one wire are oriented at 90° and those in the other wire at 45°, as depicted in Fig. 9. A further way to cross wires requires no cell rotation. It relies on clock zones, with the clocking phases advancing from switch to relax and back to switch, as discussed in Sect. 4.1.

Fig. 9
figure 9

Wire crossing: (a) coplanar and (b) multilayer

4.1 Clocking in QCA Designer

Clocking in QCA differs from clocking in conventional digital design. One of the main differences is that a QCA cell does not retain information on its own: information is passed through each cell, and each cell erases its own state in every clock cycle. Meta-stability is overcome by latching cell arrays to controlled clocking zones, which also facilitates a pipelined computing architecture. Clock 0 (switch), Clock 1 (hold), Clock 2 (release), and Clock 3 (relax) are the four clock zones applied systematically to the cells of a QCA circuit, with each zone lagging the previous one by 90°. The successive latching and unlatching of cells attached to different clock zones pumps information through the circuit. If a wire is clocked from left to right with ascending clock zones, information flows in the same direction, as shown in Fig. 10. In QCA Designer, any single cell can be connected independently to any of the clocks, subject to the functionality of the circuit [26,27,28].

Fig. 10
figure 10

QCA clocking with 4 phases

In this paper, wire crossovers in the proposed designs are based on clock zones. Cells latched to the switch phase can cross cells latched to the release phase, and cells in the hold phase can cross cells in the relax phase, without a polarization effect on the neighbouring cells [29]. The energy of the system is determined by the cell polarizations and hence by the interaction between cells. In the ground state, neighbouring cells are aligned. In an excited state, they are polarized oppositely against the cell-to-cell repulsion, and a kink occurs [30, 31].
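The four-phase scheme of Sect. 4.1 and the clock-zone crossover rule above can be captured in a small timing model. The sketch assumes that zone z lags zone 0 by z quarter periods and that each zone boundary adds a quarter clock cycle of delay. It is a bookkeeping aid, not a substitute for QCA Designer's simulation engines, and all names are hypothetical.

```python
PHASES = ("switch", "hold", "release", "relax")  # clock 0..3, each lagging the previous by 90 degrees


def phase(zone: int, step: int) -> str:
    """Phase of clock zone `zone` at quarter-period `step` (zone z lags zone 0 by z quarters)."""
    return PHASES[(step - zone) % 4]


def flows_forward(zones: list[int]) -> bool:
    """Data moves forward if each cell stays in its predecessor's zone or the next one."""
    return all((nxt - cur) % 4 in (0, 1) for cur, nxt in zip(zones, zones[1:]))


def quarter_cycle_delay(zones: list[int]) -> int:
    """Number of zone boundaries crossed along a wire; each adds a quarter clock cycle."""
    return sum((nxt - cur) % 4 for cur, nxt in zip(zones, zones[1:]))


def can_cross(zone_a: int, zone_b: int) -> bool:
    """Clock-zone crossover: wires 180 degrees apart (switch/release or hold/relax)."""
    return (zone_a - zone_b) % 4 == 2


# While zone 0 holds its value, zone 1 is switching and latches it.
print(phase(0, 1), phase(1, 1))  # hold switch
wire = [0, 0, 1, 1, 2, 2, 3, 3, 0]
print(flows_forward(wire), quarter_cycle_delay(wire) / 4, "clock cycle(s)")  # True 1.0 clock cycle(s)
print(can_cross(0, 2), can_cross(1, 3), can_cross(0, 1))  # True True False
```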

5 Proposed designs

The maximum benefits of power optimization are obtained when it is applied at the algorithmic and architectural levels. The proposed reversible gate is a three-input, three-output primary reversible gate with attributes comparable to those of the Peres gate. The main aim of the proposed design is to improve functionality while optimising power and delay performance.

The proposed gate is henceforth called the SS (Siddhesh Soyane) gate. Let A, B, and C be its inputs and P, Q, and R its outputs. The truth table of the SS gate is given in Table 2.

Table 2 Truth table of the proposed reversible gate

The above truth table essentially translates into the following logic equations:

$$P \, = \, (A \cdot B^{\prime } ) \, + \, (B \cdot C)$$
(1)
$$Q \, = \, (B^{\prime } \cdot C) \, + \, (B \cdot C^{\prime } )$$
(2)
$$R \, = \, C^{\prime }$$
(3)

The proposed SS gate generates three outputs, as shown in Fig. 11. The first output, P, is a function of all three input variables. The second output, Q, is the EXOR of inputs B and C, and the third output, R, is the complement of the third input variable, C.
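Since Table 2 is not reproduced here, the Python sketch below evaluates Eqs. (1) to (3) over all eight input combinations so that the truth table can be regenerated and cross-checked. It also makes one property of Eq. (1) visible: P equals A when B = 0 and equals C when B = 1. The function name `ss_gate` is illustrative.

```python
from itertools import product


def ss_gate(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Logic model of the SS gate following Eqs. (1)-(3)."""
    p = (a & (1 - b)) | (b & c)        # Eq. (1): P = A.B' + B.C
    q = ((1 - b) & c) | (b & (1 - c))  # Eq. (2): Q = B'.C + B.C' = B XOR C
    r = 1 - c                          # Eq. (3): R = C'
    return p, q, r


print(" A B C | P Q R")
for a, b, c in product((0, 1), repeat=3):
    print(" {} {} {} | {} {} {}".format(a, b, c, *ss_gate(a, b, c)))
```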

Fig. 11
figure 11

Block diagram representation of the SS gate

The quantum implementation of the SS gate uses one controlled-NOT gate and two controlled-V gates. The dotted box in Fig. 12 contains a controlled-NOT (2 × 2 reversible) gate with a QC of 1 and a NOT (1 × 1 reversible) gate with a QC of 0, so the QC of the dotted box is 1. Adding the two controlled-V gates, each with a QC of 1, gives a quantum cost of 3 for the SS gate. A comparison of the proposed SS gate with basic reversible gates is given in Table 3.

Fig. 12
figure 12

Quantum implementation of SS gate

Table 3 A comparison table of basic reversible gates

Figures 13 and 14 show the layouts of the proposed reversible gate in the QCA design tool using XOR structures I and II, respectively, and the simulated waveform is shown in Fig. 15. The layout has three inputs, a, b, and c, and three outputs, p, q, and r. The logic is implemented using three-input majority gates, an inverter, and an XOR gate.

Fig. 13
figure 13

Proposed QCA layout I of SS gate

Fig. 14
figure 14

Proposed QCA layout II of SS gate

Fig. 15
figure 15

Simulated waveform of SS gate

5.1 SS Gate as half adder-subtractor

With two SS gates, the half adder and the half subtractor can be realized at the same time, as shown in Fig. 16.

Fig. 16
figure 16

Block diagram of half adder and half subtractor using SS gate

For the SS gate to function as a half adder-subtractor, the variables B and C are taken as its two inputs, so the circuit performs the C + B and C − B operations. Input A is a constant input held at ‘0’. This forces the first term of the Carry output, (A·B’), to ‘0’, so only the (B·C) term produces the required carry for the addition. Similarly, in the Borrow output, (A·B’) gives ‘0’ and (B·C’) remains as the borrow value. The Sum or Difference, (B ⊕ C), is the third valid output. G1 and G2 are garbage outputs. Since each SS gate has a quantum cost of 3, the quantum cost of the proposed half adder-subtractor is 6.
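The Boolean sketch below checks the arithmetic described above. It rests on several hypothetical assumptions: both SS gates have A = 0, the first gate receives (B, C), and the second receives B together with C′ taken from the first gate's R output. In this sketch B is simply reused for both gates, so it does not attempt to reproduce the exact reversible wiring of Fig. 16. The helper names are illustrative.

```python
from itertools import product


def ss_gate(a: int, b: int, c: int) -> tuple[int, int, int]:
    p = (a & (1 - b)) | (b & c)  # Eq. (1): P = A.B' + B.C
    q = b ^ c                    # Eq. (2): Q = B XOR C
    r = 1 - c                    # Eq. (3): R = C'
    return p, q, r


def half_adder_subtractor(b: int, c: int) -> dict:
    """Two SS gates with A = 0 (assumed wiring, see lead-in)."""
    carry, total, c_bar = ss_gate(0, b, c)  # first gate: carry = B.C, sum = B XOR C
    borrow, g1, g2 = ss_gate(0, b, c_bar)   # second gate fed with C': borrow = B.C'
    return {"sum": total, "carry": carry, "difference": total,
            "borrow": borrow, "garbage": (g1, g2)}


for b, c in product((0, 1), repeat=2):
    out = half_adder_subtractor(b, c)
    assert 2 * out["carry"] + out["sum"] == c + b          # C + B
    assert out["difference"] - 2 * out["borrow"] == c - b  # C - B
    print(b, c, out)
```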

The layouts of the SS gate used as a half adder, implemented on the QCA tool with both XOR structures, are shown in Figs. 17 and 18, respectively, and the simulated waveform is shown in Fig. 19.

Fig. 17
figure 17

QCA layout I of SS gate as half adder

Fig. 18
figure 18

QCA layout II of SS gate as half adder

Fig. 19
figure 19

Simulated waveform of half-adder

The half adder-subtractor, which requires two SS gates, is also implemented in QCA. Its layout and simulated waveform are shown in Figs. 20 and 21, respectively.

Fig. 20
figure 20

Layout of half adder-subtractor

Fig. 21
figure 21

Simulated waveform of half adder-subtractor

5.2 Proposed 2-bit multiplier

This section proposes two circuits that implement a 2-bit multiplier. The multiplication algorithm is the same in both, while the gates used, and hence the parameters associated with the design, are optimised in the latter. The block diagram of design I of the proposed 2-bit multiplier is shown in Fig. 22. It uses two SS gates and two BVPPG gates. The desired functionality is achieved with four input variables (a0, a1, b0, b1), six constant inputs (x1 to x6) held at ‘0’, and eight garbage outputs (g1 to g8).
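The arithmetic behind the 2-bit design can be sketched with the standard 2 × 2 Urdhva Tiryakbhayam structure. Four partial products are formed (the role assumed here for the two BVPPG gates), and two half adders combine them (the role assumed here for the two SS gates with A = 0). The Python model below is a logic-level check under these assumptions, not a description of the gate-level netlist of Fig. 22.

```python
from itertools import product


def half_adder(x: int, y: int) -> tuple[int, int]:
    """Returns (sum, carry); assumed to be the role of an SS gate with A = 0."""
    return x ^ y, x & y


def vedic_2x2(a1: int, a0: int, b1: int, b0: int) -> tuple[int, int, int, int]:
    """2-bit Urdhva Tiryakbhayam multiplier; returns product bits (q3, q2, q1, q0)."""
    # Partial products (the AND terms assumed to come from the partial-product gates).
    p00, p10, p01, p11 = a0 & b0, a1 & b0, a0 & b1, a1 & b1
    q0 = p00                       # vertical: a0.b0
    q1, c1 = half_adder(p10, p01)  # crosswise: a1.b0 + a0.b1
    q2, q3 = half_adder(p11, c1)   # vertical: a1.b1 plus the carry
    return q3, q2, q1, q0


for a, b in product(range(4), repeat=2):
    q3, q2, q1, q0 = vedic_2x2(a >> 1, a & 1, b >> 1, b & 1)
    assert 8 * q3 + 4 * q2 + 2 * q1 + q0 == a * b
print("all 16 products match")
```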

Fig. 22
figure 22

Block diagram of 2-bit multiplier

The theoretical quantum cost of the design is 16. Its hardware complexity in terms of the logical operations defined in Sect. 3 is 6α + 8β + 2γ, i.e. 16 operations in total, where α is a two-input XOR operation, β is a two-input AND operation, and γ is an OR operation.
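The quoted figures are consistent with a simple tally. The tally assumes that the quantum cost of a cascade is the sum of the costs of its gates, taking 3 per SS gate (Sect. 5) and 5 per BVPPG gate (Sect. 3.3). The snippet below reproduces the arithmetic. Its variable names are illustrative.

```python
# Primitive quantum costs: CNOT and controlled-V count 1 each; a NOT counts 0.
SS_GATE_QC = 1 + 2 * 1 + 0  # 1 CNOT + 2 controlled-V + 1 NOT, as described for Fig. 12
BVPPG_QC = 5                # value quoted in Sect. 3.3

half_adder_subtractor_qc = 2 * SS_GATE_QC
multiplier_qc = 2 * SS_GATE_QC + 2 * BVPPG_QC  # design I: 2 SS gates + 2 BVPPG gates

# Hardware complexity 6 alpha + 8 beta + 2 gamma: 6 XOR, 8 AND and 2 OR operations.
ops = {"XOR (alpha)": 6, "AND (beta)": 8, "OR (gamma)": 2}

print("SS gate QC:", SS_GATE_QC)                              # 3
print("Half adder-subtractor QC:", half_adder_subtractor_qc)  # 6
print("2-bit multiplier QC:", multiplier_qc)                  # 16
print("Total logic operations:", sum(ops.values()))           # 16
```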

Figure 23 shows the QCA layout of design I of the 2-bit multiplier. It is a coplanar design that uses clock zones for wire crossovers, which reduces complexity and eases implementation.

Fig. 23
figure 23

QCA layout I of 2-bit multiplier

The layout of design II of the 2-bit multiplier is shown in Fig. 24. It employs two SS gates and four majority gates, uses XOR structure II, and is also a coplanar design. As in design I, clock zones are used for wire crossovers and for directing data and energy flow.

Fig. 24
figure 24

QCA layout II of the 2-bit multiplier

The simulated waveform is shown in Fig. 25. Compared with design I, design II reduces the total cell count, energy dissipation, area, and latency, and therefore the quantum cost.

Fig. 25
figure 25

Simulated waveform of 2-bit multiplier

5.3 General architecture for 2^n × 2^n multiplier

This section presents a general architecture for a 2^n × 2^n multiplier (where n = 1, 2, 3, …). The implementation of the 2-bit multiplier can be extended to 4 bits, 8 bits, and beyond, and the idea can be formulated as an algorithm and generalised as shown in Fig. 26. The general block diagram consists of four multiplier blocks and three adder blocks. The multiplier blocks generate the partial products, and a two-stage adder adds the partial products in the required fashion.

Fig. 26
figure 26

General block diagram for 2^n × 2^n multiplier

An n-bit multiplier needs four (n/2)-bit multiplier blocks. The input bits fed to each of the multiplier blocks are shown in Fig. 26. The first adder stage consists of one n-bit adder and one (n + n/2)-bit adder, whose outputs are fed to an (n + n/2)-bit adder in the second stage. The second-stage adder gives the product bits q[(2n − 1):(n/2)], and the first multiplier block directly yields the product bits q[(n/2) − 1:0].
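The decomposition behind Fig. 26 can be checked in software. If each operand is split into a high half and a low half of n/2 bits, the product is p_ll + (p_hl + p_lh)·2^(n/2) + p_hh·2^n. The recursive Python sketch below follows that split. Fig. 26 is not reproduced here, so the operands assigned to the stage-one (n + n/2)-bit adder are an assumption. `block_multiply` is a hypothetical name.

```python
import random


def low_bits(value: int, width: int) -> int:
    return value & ((1 << width) - 1)


def block_multiply(a: int, b: int, n: int) -> int:
    """Multiply two n-bit unsigned integers (n a power of two) using four (n/2)-bit blocks."""
    if n == 1:
        return a & b  # a 1-bit multiplier is a single AND
    h = n // 2
    a_lo, a_hi = low_bits(a, h), a >> h
    b_lo, b_hi = low_bits(b, h), b >> h
    # Four (n/2)-bit multiplier blocks, each producing an n-bit partial product.
    p_ll = block_multiply(a_lo, b_lo, h)
    p_hl = block_multiply(a_hi, b_lo, h)
    p_lh = block_multiply(a_lo, b_hi, h)
    p_hh = block_multiply(a_hi, b_hi, h)
    # Stage one (assumed operands): an n-bit adder for the cross terms, and an
    # (n + n/2)-bit adder joining p_hh with the upper half of p_ll (no overlap here).
    cross = p_hl + p_lh
    upper = (p_hh << h) + (p_ll >> h)
    # Stage two: an (n + n/2)-bit adder yields product bits q[2n-1 : n/2].
    high = cross + upper
    # The first block supplies q[n/2 - 1 : 0] directly.
    return (high << h) | low_bits(p_ll, h)


for n in (2, 4):
    for a in range(1 << n):
        for b in range(1 << n):
            assert block_multiply(a, b, n) == a * b
rng = random.Random(0)
for n in (8, 16, 32):
    for _ in range(300):
        a, b = rng.randrange(1 << n), rng.randrange(1 << n)
        assert block_multiply(a, b, n) == a * b
print("block multiplier matches a * b")
```

In this sketch the recursion ends in 1-bit products (single AND gates). The architecture described above instead builds on the 2-bit multiplier of Sect. 5.2.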

6 Results and discussions

The functional correctness of the SS gate, the half adder-subtractor, and the 2-bit multipliers can be verified manually and with a simulation tool. QCA Designer was used to verify the theoretical and practical outcomes of the proposed designs, with its default parameters [23]. The designs are implemented in a single layer, using coplanar wire crossings only.

QCA Designer-E, an extension of QCA Designer, was used for timing and energy-dissipation analysis. QCA circuits are simulated with the bistable approximation and coherence vector simulation engines. Energy dissipation is calculated in QCA Designer-E with the coherence vector energy simulation engine, using the parameters [27] shown in Fig. 27. The simulation was performed in coherence vector energy mode with the energy display option turned on for each cell. The energy results comprise the total energy dissipation, Sb, with its error, SbE, and the average energy dissipation per cycle, Ab, with its error, AbE. The design analysis (Sb, Ab, cell count, and area) of the proposed circuits is presented in Table 4. The designs used for comparison were taken from the papers cited in the reference section and simulated again in QCA Designer, and the results were tabulated in the comparison tables.

Fig. 27
figure 27

QCA Designer engine setup for energy analysis

Table 4 Design analysis using QCA designer-E

In Table 5, the proposed reversible gate used as a half adder is compared with the conventional logic implementation of the half adder [32, 33]. Although the conventional implementation requires fewer cells and less area, the design based on reversible logic dissipates less energy. In Table 6, the half adder-subtractor proposed in this paper is compared with the reversible-logic half adder-subtractor presented in [27]. For this comparison, quantum cost is calculated as the product of area and latency (clock delay) [34]. The proposed design has a lower cell count and lower energy dissipation. As discussed in Sect. 5.2, the two layouts of the 2-bit multiplier are implemented using two different XOR structures. Their analysed features are compared with the design presented in [33] in Table 7, which shows that design II has the better optimized parameters.

Table 5 Half adder comparison between [32] and proposed design in QCA
Table 6 Half adder-subtractor comparison between [27] and proposed design in QCA
Table 7 2-bit multiplier comparison between [33] and proposed designs in QCA

Design I of the 2-bit multiplier was also implemented in Xilinx ISE, and its analysis and comparison with the designs in [20] are tabulated in Table 8.

Table 8 2-bit multiplier comparison between designs [20] and the proposed design

6.1 Implementing the design on Kintex-7 (KC705)

The general n-bit multiplier architecture presented in Sect. 5.3 is implemented on Kintex-7, and its device utilization and delay are compared with those of the multiplier-less squaring architectures in [13] and [14]. Delay and the number of LUTs used are compared in Tables 9 and 10, respectively.

Table 9 Delay comparison between squaring architecture designs in [13, 14] and the proposed design implemented on Kintex-7 (KC705)
Table 10 Comparison of device utilization between squaring architectures designs in [13, 14] and the proposed design implemented on Kintex-7 (KC705)

The proposed architecture shows more delay than the compared designs at lower bit widths. As the bit width increases, however, its delay compares increasingly favourably and its area is also optimized.

7 Conclusion

The novel 3 × 3 reversible gate presented in this paper has been compared with existing basic reversible gates and configured to work as a half adder and as a half adder-subtractor. The proposed designs are implemented in QCA with two different XOR structures, and the designs realised with the latter structure are more optimised. As shown in the results and discussion section, the presented designs are comparable to, and in some respects better than, existing designs. Design II of the 2 × 2 multiplier was optimised over design I in cell count and area, and hence in energy dissipation. This study indicates that clock zone-based crossovers can reduce the complexity and improve the performance of a design. Further, a general 2^n-bit multiplier architecture with a simple two-step optimised design is presented. When it was compared with squaring architectures, its performance metrics improved as the bit width n increased.