1 Introduction

A reversible logic has application in quantum computing. Reversible circuits are required to have an equal number of inputs and outputs. They are designed without any feedback and fanout. There are a few parameters or resource constraints used to measure the performance of reversible circuits namely quantum cost (QC), garbage outputs (GO), ancilla inputs (AI), gate count (GC), and delay \((\triangle )\). The quantum cost of a reversible circuit is the number of \(1\times 1\) and \(2\times 2\) quantum gates that are used to construct the circuit. Garbage outputs are the ones that are neither the primary outputs nor the ones required for further computation. An ancilla or constant inputs are required to derive a certain function and to retain one-to-one mapping. A delay corresponds to the number of primitive quantum gates in the critical path of the circuit.

Multipliers are the major computational units that are used frequently in digital signal processing computations. Optimization is a major objective in designing a multiplier with design constraints. In a reversible circuit design, it is necessary to minimize the count of garbage outputs and ancilla inputs in order to reduce the total number of qubits; therefore, we present the design of a reversible multiplier which produces zero garbage outputs and minimizes the number of ancilla inputs compared to the existing multiplier designs in the literature.

In this work, we present a modified version of the add and shift method of multiplication. The basic components used in our design are add or NOP block and rotate right (ROR) block. To meet our design requirement, we also present a modified circuit of reversible ALU design presented in [1]. In addition, we present a generalized circuit design methodology that is supported by a generalized behavioral model to design a constant depth rotate right reversible circuit. This design is motivated by the design presented in [2]. The reversible multiplier design presented in this work outperforms existing multiplier designs in terms of its garbage outputs and ancilla inputs. We also give an estimate of the gate count, quantum cost, ancilla inputs, and delay for \(N\times N\) qubit multiplier. In the optimization of performance parameters, there is always a trade-off; such as in optimizing one parameter, the other parameters get affected. Here, our objective is to optimize the ancilla inputs and garbage outputs, due to this the remaining parameters get affected. To indicate the trade-off, the estimation of all the performance parameters is given for each block used in designing a \(N\times N\) reversible multiplier. The paper is organized into several sections namely: Sect. 1 provides an introduction to reversible logic gates; Sect. 2 elaborates on the background of reversible logic gates; Sects. 35 cover existing designs, behavioral model of the proposed design, and the proposed circuit design methodology, respectively; Sects. 68 cover performance parameters calculation, results comparison, and conclusion, respectively.

2 Background on reversible logic gates

This section covers the basics of reversible logic gates. Any N variable reversible system is built with \(N\times N\) reversible circuits. A few \(1\times 1\) and \(2\times 2\) primitive quantum gates are used to construct large-sized reversible gates and circuits. The quantum cost of reversible gates used in this work can be found in [3]. The reversible gates used in this work are Fredkin, CNOT, Toffoli, and Swap gates [46].

2.1 CNOT gate

A CNOT gate is also known as a Feynman gate (FG). It is a \(2\times 2\) reversible gate. The inputs and outputs are denoted as (AB) and ( PQ ), respectively. Here, A is treated as a control qubit, while B is treated as a target qubit. The mapping of input to outputs are denoted as \(P \leftrightarrow A \), \(Q \leftrightarrow A \oplus B \). The block diagram and symbol of CNOT gate are shown in Fig. 1. QC of FG is 1.

Fig. 1
figure 1

CNOT gate and its symbol

2.2 Toffoli gate (TG)

This gate is also known as a \(C^{2}\) NOT gate. The TG used in our work is a \(3\times 3\) gate with inputs ( ABC ) and outputs (PQR), respectively. Here, A and B are the control qubits, while C is the target qubit. The mapping between inputs and outputs is given with the relation \(P \leftrightarrow A\), \(Q \leftrightarrow B\), \(R \leftrightarrow (A \cdot B) \oplus C\). The block diagram and symbol of TG are presented in Fig. 2. QC of TG is 5.

Fig. 2
figure 2

Toffoli gate and its symbol

2.3 Fredkin gate (FRG)

A Fredkin gate is commonly used as a controlled Swap gate. In this paper, we use a \(3\times 3\) Fredkin gate. AB, and C are the inputs and PQ, and R are the output qubits. The mapping of the input and outputs are given based on the value of A, which is the control qubit. When A is high, \( Q \leftrightarrow C \) and \(R \leftrightarrow B \). When A is low, \( Q \leftrightarrow B \) and \(R \leftrightarrow C \). Irrespective of A value, \( P \leftrightarrow A \). The block diagram and symbol of FRG gate are shown in Fig. 3. QC of FRG is 5.

Fig. 3
figure 3

Fredkin gate and its symbol

2.4 Swap gate (SG)

The Swap gate is a \(2\times 2\) gate. It swaps the input and output qubits unconditionally. The mapping is \( A \leftrightarrow Q \) and \( B \leftrightarrow P \). The block diagram and symbol are shown in Fig. 4. QC of SG is 3.

Fig. 4
figure 4

Swap gate and its symbol

3 Existing work

The research on reversible logic is being explored in the domains of design, synthesis, and testing. Although there are many synthesis techniques available to realize reversible circuits, having dedicated designs of a reversible circuit component gives flexibility in choosing the designs based on the application requirement. Multiple ways of designing arithmetic circuits have been explored in conventional, reversible and quantum computing [721]. Several interesting contributions have been explored in the existing synthesis of reversible logic circuits [3, 2229]. In this section, we discuss the existing reversible multiplier designs that are important arithmetic circuits in processing digital signals.

There are several multiplier designs proposed by many authors in view of optimizing different performance parameters namely quantum cost, ancilla inputs, garbage outputs, logic depth, and a combination of these parameters. The multipliers proposed in [3034] follow two phases in computing product terms. In the first phase, the partial products are computed. In the second phase, the summation of partial products are computed to get the final product terms. In all the designs mentioned above, the partial product generation and summation stages are improved either in terms of constant inputs, quantum cost, or garbage outputs. We found that the design in [34] gives better results when compared to other existing work in terms of ancilla inputs and gate count. Apart from the regular parallel or array multiplier designs, other techniques of multiplications like Booth, Wallace, and Vedic are proposed by researchers in [3537]. All these designs are illustrated for smaller operand width. It is necessary for any designer to choose designs based on their performance over a wide range of operand width. We found that the recent publication on multiplier in [38] discussed NxN reversible multiplier design. We considered the designs proposed in [38, 39] and compared it with our work, as the efforts in both of these papers were to reduce the number of constant inputs and garbage outputs.

4 Proposed reversible multiplier behavioral model

In this section, we present an algorithm for multiplying two n qubit numbers A and B. The result is stored in 2n qubit product register P. There are two conventional techniques of add and shift method of multiplying two numbers. Shift the multiplicand left and add it to the product register contents iteratively or add the multiplicand and shift the product register contents right iteratively. In the first technique, after the computation is completed, the multiplicand will not be in its original form and since one cannot recover the multiplicand, it will be considered as garbage qubits. We choose the second technique. The final result will be the contents of the product register and the multiplicand contents are unaltered, so that garbage outputs are not generated.

4.1 Behavioral model of \(N\times N\) multiplier

figure a

The multiplier algorithm is best explained using an example of \(4\times 4\) multiplication with a dot diagram as shown in Fig. 5. Initially, the P register is loaded with ancilla 0 qubits. The multiplicand B is added to the P register contents. For the P register, it considers a least significant position (LSP) from \( n-1 \) and move up to \( 2n-1 \) position. For the B register, LSP starts from 0 and moves up to \( n-1 \). A multiplicand is added to the P register only if the corresponding multiplier qubit is high; otherwise, only rotate operation is performed. The rotate right operation is performed irrespective of the value of a multiplier qubit. While adding a multiplicand to the P register, the \(( n-1)\)th position of the P register is aligned to the 0th position of the B register. Due to this alignment, one rotate operation is eliminated at the end of computation. The diagram shown in Fig. 5 is self-explanatory of the algorithm.

Fig. 5
figure 5

\(4\times 4\) qubit multiplication dot diagram

4.2 Behavioral model of rotate right operation

In this section, we present a rotate right operation for 2n qubit data width. In the multiplication technique presented in the Algorithm 1, there is a need to rotate the P register contents to the right; the size of the P register is 2n qubit width. The rotate right operation is performed by swapping the qubits in two stages. The circuit is designed to obtain constant logic depth. To give an illustrative example, we present a rotate right operation in Fig. 6 for data width of 8 qubits. The numbers 0–7 represent the qubit positions. The initial representation of qubits is shown in the left most part of Fig. 6. The rotate right operation is performed in two stages. In the first stage, qubits in the position pairs (0, 7), (1, 6), (5, 2),  and (4, 3) are swapped [(0, 7) indicates 0 is swapped with 7]. In the second stage, (0, 6), (1, 5), and (2, 4) are swapped. It is visible from the diagram that the swapping of qubits in each stage is parallel which reduces the logic depth compared to the sequential shifting of qubits. The reversible circuit design of rotate right operation will be discussed in the latter section of this paper. We present a generalized pseudo-code for the method discussed in Fig. 6 to swap the qubits in two stages.

Fig. 6
figure 6

Rotate right with two sets of disjoint transpositions

The pseudo-code presented in the Algorithm 2 performs rotate right operation by 1 qubit position. This code works for both even and odd data width. The Algorithm 2 will be useful in any application when data input width is variable (i.e., even or odd). For the multiplication technique proposed in this paper, the width of product register (P) is always even; hence, the second half of the pseudo-code is redundant for the proposed work.

figure b

5 Proposed garbageless reversible multiplier circuit design

From the behavior models Algorithms 1 and 2 presented in the Sect. 4, it is clear that to compute the product of two n qubit numbers, we need to design the following reversible circuits: (1) n qubit addition or no operation (ADD/NOP) circuit; (2) uncontrolled rotate right operation circuit. This section elaborates on the reversible circuit design methodology of ADD/NOP and rotate right (ROR) block.

5.1 ADD or NOP circuit design

Fig. 7
figure 7

Reversible ADD/NOP circuit

ADD or NOP block has evolved from the ALU design proposed in [1]. We have modified the original work to adapt to our garbageless multiplier design. The reversible circuit design of ADD/NOP block is shown in Fig. 7. Inputs to the ADD/NOP block are:

(a) n qubit product register \(| P_{[2n-1:n-1]}\rangle \); (b) n qubit input operand \(| B_{[n-1:0]}\rangle \); (c) 1 qubit input Zcin initialized with ancilla 0; (d) 1 qubit input operand \( A_{[m]} \), where m is the qubit position varying from 0 to \( n-1 \). Here, \( A_{[m]} \) acts as the control qubit; if it is high, the P and B register contents are added. At the output, we have the B register contents unaltered, where the P register contents will have the sum of P and B contents. If \( A_{[m]} \) is low, then the B and P register contents are regenerated at the output without modification. The role of Zcin is to propagate the carry generated from the previous qubit position. If the control qubit \( A_{[m]} \) is high, \( P_{[2n-1]} \) will have the final carry out generated; otherwise, it will retain its value.

The computation of ADD/NOP block is summarized below. Here, the index of B varies from 0 to \( n-1 \) and P varies from \( n-1 \) to \( 2n-1 \), according to the requirement of the multiplier design. The index of A is chosen to be m which ranges from 0 to \( n-1 \), where n is the size of operands (multiplier and multiplicand). Here, j is used to indicate the qubit position of the product register P.

  1. 1.

    Computation Phase 1 Initialize Zcin with ancilla 0 qubit. In further stages, the same line will propagate the carry generated from the previous stage.

  2. 2.

    Step 1 Apply \(3\times 3\) Toffoli gate at locations \( A_{[m]} \), \( B_{[0]} \), and \( P_{[n-1]} \). After the computation, \( A_{[m]} \) and \( B_{[0]} \) will retain their value. \( P_{[n-1]} \) will get transformed according to the equation given below.

    $$\begin{aligned} | P_{[n-1]} \rangle =( | A_{[m]} \rangle \cdot | B_{[0]} \rangle ) \oplus | P_{[n-1]} \rangle \end{aligned}$$
    (1)
  3. 3.

    Step 2a For 0 \( \le \) i \( \le \) \( n-1 \) and \( n-1 \) \( \le \) j \( \le \) \( 2n-2 \), apply \(3\times 3\) Fredkin gate at locations \( P_{[j]} \), \( B_{[i]} \), and Zcin. Here, \( P_{[j]} \) acts as a control line to FRG gate and it will not change after the computation. The remaining lines \( B_{[i]} \) and Zcin will get modified according to the equations shown below.

    $$\begin{aligned} \left| Zcin \right\rangle= & {} {\left\{ \begin{array}{ll} \left| B_{[i]}\right\rangle &{} \text {if } \left| P_{[j]} \right\rangle = 1 \\ \left| Zcin \right\rangle &{} \text {if } \left| P_{[j]} \right\rangle = 0 \end{array}\right. } \end{aligned}$$
    (2)
    $$\begin{aligned} \left| B_{[i]} \right\rangle= & {} {\left\{ \begin{array}{ll} \left| Zcin \right\rangle &{} \text {if } \left| P_{[j]} \right\rangle = 1 \\ \left| B_{[i]} \right\rangle &{} \text {if } \left| P_{[j]} \right\rangle = 0 \end{array}\right. } \end{aligned}$$
    (3)
  4. 4.

    Step 2b For 1 \( \le \) i \( \le \) \( n-1 \) and n \( \le \) j\( \le \) \( 2n-2 \), apply a \(3\times 3\) Toffoli gate at locations \( A_{[m]} \), \( B_{[i]} \), and \( P_{[j]} \). After the computation, \( A_{[m]} \) and \( B_{[i]} \) will retain their value. \( P_{[j]} \) will get transformed according to the equations given below.

    $$\begin{aligned} \left| P_{[j]}\right\rangle = {\left\{ \begin{array}{ll} \left| P_{[j]}\right\rangle \oplus \left| B_{[i]} \right\rangle &{} \text {if } \left| A_{[m]} \right\rangle = 1 \\ \left| P_{[j]} \right\rangle &{} \text {if } \left| A_{[m]} \right\rangle = 0 \end{array}\right. } \end{aligned}$$
    (4)

    Steps 2a and 2b execute concurrently.

  5. 5.

    Step 3 Apply a \(3\times 3\) Toffoli gate at locations \( A_{[m]} \), \( B_{[n-1]} \), and \( P_{[2n-1]} \) . This step is required in order to store the final carry out after n qubit addition. After the computation, \( P_{[2n-1]} \) stores final carry out.

    $$\begin{aligned} \left| P_{[2n-1]}\right\rangle = {\left\{ \begin{array}{ll} \left| B_{[n-1]} \right\rangle &{} \text {if } \left| A_{[m]} \right\rangle = 1 \\ \left| P_{[2n-1]} \right\rangle &{} \text {if } \left| A_{[m]} \right\rangle = 0 \end{array}\right. } \end{aligned}$$
    (5)

    In other words, \( P_{[2n-1]} \) stores the final carry out only if the control line that is corresponding to the multiplier qubit \( A_{[m]} \) is high; otherwise, it will restore the previous value present in it, that means, it will retain the previous carry value stored in \( P_{[2n-1]} \) from the previous computation.

  6. 6.

    Computation Phase 2 The steps in this phase contribute to the generation of the final product value P and the regeneration of the contents of multiplicand B .

  7. 7.

    Step 4a For \( 2n-2 \ge j \ge n-1 \) and \( n-1 \ge i \ge 0 \), apply a \(3\times 3\) Fredkin gate at locations \( P_{[j]} \) , \( B_{[i]} \), and Zcin. After the computation, the value at \( P_{[j]} \) will be retained as it is whereas Zcin and \( B_{[i]} \) will get modified according to the equations shown in Computation Phase 1, Step 2a.

  8. 8.

    Step 4b For \( 2n-2 \ge j \ge n-1 \) and \( n-1 \ge i \ge 0 \), apply a \(3\times 3\) Toffoli gate at locations \( A_{[m]} \), Zcin, and \( P_{[j]} \). At the end of computation, \( A_{[m]} \) and Zcin will retain its value; whereas, \( P_{[j]} \) will get modified according to the equation mentioned below.

    $$\begin{aligned} \left| P_{[j]} \right\rangle = {\left\{ \begin{array}{ll} \left| Zcin\right\rangle \oplus \left| P_{[j]}\right\rangle &{} \text {if } \left| A_{[m]} \right\rangle = 1 \\ \left| P_{[j]}\right\rangle &{} \text {if } \left| A_{[m]} \right\rangle = 0 \end{array}\right. } \end{aligned}$$
    (6)

Steps 4a and 4b execute sequentially.

5.2 Rotate right reversible circuit design

The reversible circuit for the rotate right operation is shown in Fig. 8. The circuit takes no ancilla and rotates the data to the right from MSB qubits to LSB qubits by 1 position (ROR). The reversible rotate circuit is designed using Swap gates and performs a rotate operation with constant delay. For clarity of understanding, we have shown eight qubit ROR circuit design. The product register qubits \( P_{[0]} \) to \( P_{[7]} \) are given as input. After one rotate operation, \( P_{[0]} \) occupies \( {P_{[7]}}\)th qubit position and qubits from \( P_{[7]}\) to \( P_{[1]} \) shift to the right by one position. The quantum cost of Swap gate is 3. The delay in performing the rotation operation involves two Swap gates in series; therefore, the constant delay of 6 is obtained by considering each cycle individually and decomposing it into two sets of disjoint swaps. The gates shown in the dotted boxes are executed in parallel. The rotator design is motivated by the property proven in [2]. According to the authors, any permutation is the composition of two set of disjoint transpositions. This is illustrated in Fig. 6, which also shows that the cycle is a composition of two reflections. The authors have proven that any permutation of n qubits can be performed in 4 layers (levels or logic depth) of CNOT gates with n ancilla input qubits, or in six layers with no ancilla input qubits (delay of two Swap gates). If we had opted for first technique of multiplication in which the multiplicand (n bit operand) is shifted, it leaves the multiplicand altered, which in turn will yield garbage or garbage output.

Fig. 8
figure 8

Reversible rotate right circuit (ROR)

The rotate circuit proposed performs an unconditional rotate operation in the sense that irrespective of multiplier qubit value, a rotation is performed. Another option that we explored was to use a conventional multiplication technique that needs conditional shift or rotate circuit. A controlled Swap gate or Fredkin gate instead of Swap gate can be used to rotate the qubits. The rotate operation is controlled by \( A_{[m]}\) qubit. The same control qubit is used by all the Fredkin gates in the rotate circuit as shown in Fig. 9. Here, the computation becomes sequential and the delay will increase with the size of the rotate circuit unlike our proposed design. Another reason for ignoring the design shown in Fig. 9 is that the quantum cost of Fredkin gate is more than the Swap gate. To optimize the delay and quantum cost, we omitted this option. If the delay has to be maintained constant, then one has to store the control lines for these Fredkin gates and use them in parallel. This will again increase the number of ancilla lines, thus violating our objective of minimizing the ancilla lines.

Fig. 9
figure 9

Alternative reversible rotate circuit

5.3 Reversible multiplier circuit design methodology

Fig. 10
figure 10

Reversible multiplier circuit

In this section, we illustrate the design steps of a reversible multiplier show in Fig. 10.

  1. 1.

    For \( m=0 \) to \( n-2 \) repeat Step-1 and Step-2.

  2. 2.

    Step 1: ADD or NOP Apply the data qubits \( A_{[m]} \), Zcin, product register \( P_{[2n-1:n-1]}\), and the multiplicand register contents \( B_{[n-1:0]} \) to ADD/NOP block. After the computation, the contents of \( A_{[m]} \), Zcin, and B are restored; whereas, the P register contents will get modified according to the computation equations mentioned in the Section V-A. ADD/NOP circuit will perform the addition on P and B register contents if \( A_{[m]} = \)high; otherwise, the contents of those registers are retained.

  3. 3.

    Step 2: rotate right (ROR) Apply the P register contents to ROR (ROR-1 block indicates rotate right by 1 position) block, which performs a rotate right operation with a constant delay of 6. The computation is carried out as follows:

    $$\begin{aligned} \left| P_{[2n-1:0]}\right\rangle \leftarrow \left| P_{[2n-1:0]}\right\rangle \circlearrowright 1. \end{aligned}$$
  4. 4.

    Step 3: update \( m=n-1 \), repeat Step-1.

6 Performance parameters calculation

In this section, we discuss the performance parameters and the calculation for each circuit used in the reversible multiplier design. As a final part of the calculation, we show the overall calculation of the reversible multiplier.

6.1 Performance parameters of ADD/NOP block

The equations shown below are with respect to the design mentioned in Fig. 7. The calculation of quantum cost (QC) is shown below.

$$\begin{aligned} \text {QC(ADD/NOP)}&=5 \times \text {No. of TG} +5\times \text {No. of FRG}\nonumber \\&=5\times (2n+1)+5\times (2n)\nonumber \\&=20n+5 \end{aligned}$$
(7)

The ancilla inputs include the product register qubits \((n +1)\) which are initially set to ancilla 0 and Zcin used for carry propagation initialized to ancilla 0. So the ancilla for n qubit ADD/NOP block is given below.

$$\begin{aligned} \text {AI(ADD/NOP)} = n+2 \end{aligned}$$
(8)

The delay of ADD/NOP block includes the critical path delay. To find the critical path, the design has been divided into stages where each stage is sequential in execution. This is illustrated in Fig. 11 in vertical lines. The computation stages are divided into Phase 1 and Phase 2. The Phase 1 consists of the computation of half sum and final carry out. In Phase 2, the full sum is computed and the content of the B register is regenerated. There are \( 3n+2 \) stages. We have computed the number of stages involving Toffoli gates and Fredkin gates. Since the delay is proportional to the quantum gates present in each reversible gate, the total delay of ADD/NOP circuit can be found by using the equation shown below.

$$\begin{aligned} \bigtriangleup \text {(ADD/NOP)}&=\text {Delay of single stage} \times \text {No. of stages} \nonumber \\&=5\times (3n+2)\nonumber \\&=15n+10 \end{aligned}$$
(9)

6.2 Performance parameter of ROR block

The performance parameters are estimated for the rotate right reversible circuit (ROR). It is discussed in the previous sections that to perform one-time rotation, two phases of swap operations are carried out. The swapping of qubits in each phase are computed parallely. The rotate circuit presented has no garbage outputs and no ancilla input qubits. Qcost (QC) and delay \((\bigtriangleup )\) are calculated as follows:

$$\begin{aligned} \text {QC(ROR)}&=3\times \text {No. of SG}\nonumber \\&=3\times (n-1)\nonumber \\&=3n-3 \end{aligned}$$
(10)

The above equation is for a generalized rotate circuit design. For our work, we feed the 2n qubits of the product register contents to the rotate block. The modified equation is shown below.

$$\begin{aligned} \text {QC(ROR)}&=3\times \text {No. of SG}\nonumber \\&=3\times (2n-1)\nonumber \\&=6n-3 \end{aligned}$$
(11)
$$\begin{aligned} \bigtriangleup \text {(ROR)}&=6 \end{aligned}$$
(12)
Fig. 11
figure 11

Critical path computation

6.3 Performance parameters of \(N\times N\) reversible multiplier block

The performance parameters calculation for \(N\times N\) reversible multiplier block is generated by summing up the calculations of ADD/NOP and rotate right block (ROR) components.

$$\begin{aligned} \text {QC(Mul)}&=n\times \text {QC(ADD/NOP)}+(n-1)\times \text {QC(ROR)}\nonumber \\&=n\times (20n+5)+(n-1)\times (6n-3)\nonumber \\&=26n^{2}-4n+3 \end{aligned}$$
(13)

Although the ancilla inputs of the rotate right circuit is nil, the input to the ADD/NOP block and ROR circuit is the P register contents, and all the 2n qubit locations of P register which are initialized with ancilla 0 qubits.

$$\begin{aligned} \text {AI(Mul)}&=\text {AI(ADD/NOP)}+ \text {AI(ROR)} \nonumber \\&=2n+1 \end{aligned}$$
(14)

The delay of the multiplier is the summation of delay of ADD/NOP block and rotate right circuit.

$$\begin{aligned} {\bigtriangleup }\text {(Mul)}&=n \times \bigtriangleup \text {(ADD/NOP)}+(n-1)\times {\bigtriangleup }\text {(ROR)}\nonumber \\&=n \times (15n+10)+(n-1)\times 6\nonumber \\&=15n^{2}+16n-6 \end{aligned}$$
(15)

7 Comparison results

In the literature, there are more designs presented for a \(4\times 4\) reversible multiplier. It is necessary to design a circuit which is scalable to any size. Hence, we compared our design with other \(N\times N\) reversible designs that are available in the literature. The designs proposed in [38, 39] show the calculation for \(N\times N\) reversible multiplier. In both of these papers, the authors have shown only the ancilla inputs and garbage outputs calculations; the comparisons shown in Tables 1 and 2 list only ancilla inputs and garbage outputs for these papers. It is clear from the result shown in Table 1 that as the operands size increases, the percentage of improvement also increases. Our proposed design showed a better improvement in terms of the ancilla inputs resulting in saving the chip area since the number of lines are reduced.

Table 1 Ancilla inputs comparison of \(N\times N\) reversible multiplier
Table 2 Garbage outputs comparison for \(N\times N\) reversible multiplier

We have listed the garbage outputs of the designs proposed in [38, 39]. Our design outperforms the existing designs because it is 100 % better in terms of garbage outputs since our design produces no garbage. We compared our design with another garbageless reversible multiplier design using the recursive scheme in [40]. Here, we compare our work with Karatsuba multiplier design presented in [40]. The comparison for gate count, ancilla inputs, and delay is shown in Table 3. The design of Karatsuba (1) shown in Table 3 follows Bennet’s first scheme. An extra register is used to store the result and the circuit is run backward. The parallel recursive calls are made to reduce the time complexity. The design of Karatsuba (2) also uses parallel recursive call, but the design does not follow Bennet’s first scheme, instead it follows a recursive garbage disposal scheme. The multiplication is computed parallel to garbage disposal. But the trade-off is that the number of gates increases due to the different design blocks adapted in the garbage disposal design process. The design of Karatsuba (3) follows Bennet’s first scheme for garbage disposal. The only difference with respect to the design of Karatsuba (1) is that the recursive calls are sequential rather than parallel. Karatsuba (4) is designed using the recursive scheme similar to the one adapted in Karatsuba (2), but recursive calls are sequential. For the designs presented in [40], we considered the minimum bound on ancilla inputs calculation. It is observed from Table 3 that with the slight increase in the delay and gate count, the proposed design has improved the ancilla inputs compared to all the Karatsuba designs.

Table 3 Comparison with Karatsuba recursive multiplier

8 Conclusion

In this work, we have proposed ADD and rotate based on an \(N\times N\) reversible multiplier design. We presented the general behavioral model of the design. The proposed multiplier is compared with the relevant existing reversible multiplier designs in the literature. We presented the generalized equations for the performance parameters of the proposed reversible multiplier. It is observed from comparison results that our work outperforms the other designs in terms of the ancilla inputs and zero garbage outputs. The proposed design can be integrated in a larger data path subsystem designs where the garbage outputs and ancilla inputs reductions are the major concerns.