1 Introduction

Many studies have been carried out to find an alternative to complementary metal–oxide–semiconductor (CMOS) technology, which is no longer cost-effective due to the need for even more miniaturized nanoelectronic devices in various applications, such as signal processing [1]. The conducted studies have suggested quantum-dot cellular automata (QCA) as a promising alternative to CMOS technology. Low power consumption, very high speed, low delay, and compact size are some of the advantages of QCA circuits. Additionally, adders, as fundamental digital logic circuits and the most widely used computational elements, have received much attention from numerous researchers [2, 3]. Adders have also been used as one of the main blocks in several very large-scale integration (VLSI) circuits such as various processors and microprocessors.

QCA technology provides a practical approach to designing low-energy circuits. In 1961, Rolf Landauer identified another fundamental source of energy loss besides the non-ideal behavior of transistors [4]. Landauer showed that in conventional circuits, each bit of information that is lost dissipates at least kT ln 2 joules, with k being the Boltzmann constant and T the absolute temperature. This small energy loss adds up to a significant amount in modern circuits that perform millions of operations per second, especially when small-scale transistors are packed into a smaller area. Adders are widely used in VLSI circuits, so optimized designs based on new technologies can significantly improve the performance of these circuits.
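To give a sense of scale for the Landauer bound, the following minimal sketch evaluates kT ln 2 numerically. The temperature of 300 K and the names `landauer_limit_joules` and `ROOM_TEMPERATURE_K` are assumptions made for illustration; they are not taken from the cited work.

```python
import math

# Boltzmann constant in J/K (exact value in the 2019 SI definition).
BOLTZMANN_K = 1.380649e-23


def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Return k*T*ln(2), the minimum energy dissipated per lost bit."""
    return BOLTZMANN_K * temperature_kelvin * math.log(2)


# Assumed operating temperature, chosen only for illustration.
ROOM_TEMPERATURE_K = 300.0

energy_per_bit = landauer_limit_joules(ROOM_TEMPERATURE_K)
print(f"{energy_per_bit:.3e} J per bit")
```

At the assumed temperature the bound is on the order of 10^-21 J per bit, which becomes significant only when multiplied by a very large number of bit operations.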

This circuit also requires 16 clock cycles, and the quantum cost is 0.35. A QCA FA consisting of 64 cells and an area of 0.07 μm² is presented in [5]. A multilayer QCA FA is presented in [6], which consists of 93 cells covering an area of 0.087 μm². This design requires 4 clock cycles at a cost of 0.087. The QCA FA design in [7] is implemented with 73 cells, which cover 0.09 μm². The delay of the circuit is 3, and the cost is 0.03. The design in [8] requires 28 QCA cells that cover a 0.01 μm² area. The delay of this FA is 3, and its quantum cost is 0.007. For designing a 0.02 μm² FA, only 18 QCA cells are used in [9]. The delay of this design is 2, and it has a 0.01 quantum cost. A 128-cell QCA FA is presented in [10], which covers an area of 0.15 μm² with a delay of 3. The FA design in [11] requires 82 QCA cells, which cover an area of 0.06 μm². The quantum cost for this design is 0.045.

The design of full adders as an extensively used block in complex circuits is also of great importance. Full adder designs with low energy consumption and simple structure can help achieve simpler digital circuits. Another important logic design is the hybrid full adder/full subtractor circuit, performing both addition and subtraction.

In the present paper, a new full adder is presented based on the AG (Amiri Gate) design. Section 2 briefly introduces the QCA technology. Section 3 reviews the adder circuits that have been designed based on QCA technology, with a special focus on recent circuit designs. Section 4 proposes an AG-based full adder block and a carry-save adder. In Sect. 5, the proposed designs are compared with recent designs in terms of total quantum cost, delay, and gate count. Finally, concluding remarks and future work are given in Sect. 6.

2 QCA technology

2.1 Basic ideas of QCA

In QCA, each cell represents a logic bit at the nanoscale. A QCA cell includes four dots and two electrons trapped inside the cell. The Coulomb repulsive force between the electrons generates the logic values "0" and "1". QCA cells can be represented by squares (Fig. 1), and the two logic values are determined based on the potential barriers and clock phases.

Fig. 1 Structure of QCA cells

Each electron moves inside the cell and tunnels between the dots. The motion of an electron inside the cell is nonlinear since the Coulomb force results not only from the interactions between the electrons inside the cell but also from the electrons in any adjacent cell. The logic state of a QCA cell is affected directly by the logic state of the adjacent cell, so the logic state can be transmitted to the next cells sequentially [12].

2.2 QCA wires

Placing QCA cells next to each other forms a QCA wire in which a binary signal (logic value "0" or "1") is transmitted from input to output because of the electrostatic interaction between adjacent cells. In general, there are two wire types in QCA, i.e., 45-degree and 90-degree wires [13, 14]. Figures 2 and 3 show 90- and 45-degree wires, respectively.

Fig. 2 90-degree QCA wire

Fig. 3 45-degree QCA wire

2.3 QCA wire crossing

Forming the intersection of two QCA wires that cross each other is an important issue. There are two techniques for wire crossing without interference. In the first technique, the wires intersect in the same layer (coplanar crossover), while the second technique uses multiple layers to form the intersection of wires [14]. In the multilayer technique, only 90-degree cells with non-adjacent clock phases are utilized in the wire design, and the wires are implemented on different layers to prevent interference [12, 15]. In coplanar crossing, both 45-degree and 90-degree wires are used [12, 13, 14, 15].

2.4 QCA timing and clocking

In QCA technology, clocking is performed by applying the clock signal in four periodic phases. The QCA clock controls the barriers inside the cells, in addition to synchronizing information flow. Electrons can move when the barriers are low, while they are trapped in the dots when the barriers are high. Therefore, QCA clock phases can create two polarization states. The phases of the QCA clock are shown in Fig. 4. In the first phase (switch), the barriers are gradually raised, and the QCA cell attains its input value. At the end of the switch phase, the barriers are sufficiently high that electron tunneling is prevented and the cell is locked. In the second phase (hold), the barriers remain high, and the cell can transmit its data to the adjacent cells since it is completely stable. In the third phase (release), the gradual lowering of the barriers destabilizes the cell. The cell then loses its polarization, since its data are no longer needed. In the fourth phase (relax), the barriers are kept low, and the cell is not used. When the relax phase ends, the switch phase starts again, and the whole procedure is repeated [16].

Fig. 4 Four QCA clock phases
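The cyclic order of the four clock phases can be sketched as a small state machine. This is only an illustrative model of the sequence described above; the names `ClockPhase`, `next_phase`, `phase_at`, and `can_drive_neighbours` are hypothetical and do not correspond to any simulator API.

```python
from enum import Enum


class ClockPhase(Enum):
    SWITCH = 0   # barriers rising; the cell takes on its input value
    HOLD = 1     # barriers high; the cell is stable and drives its neighbours
    RELEASE = 2  # barriers lowering; the cell is destabilized
    RELAX = 3    # barriers low; the cell is not used


def next_phase(phase: ClockPhase) -> ClockPhase:
    """Advance to the following phase; RELAX wraps around to SWITCH."""
    return ClockPhase((phase.value + 1) % 4)


def phase_at(step: int, start: ClockPhase = ClockPhase.SWITCH) -> ClockPhase:
    """Phase reached after `step` quarter-cycles, beginning at `start`."""
    return ClockPhase((start.value + step) % 4)


def can_drive_neighbours(phase: ClockPhase) -> bool:
    """In this simplified model, only a cell in HOLD passes data onward."""
    return phase is ClockPhase.HOLD


# One full clock cycle followed by the first phase of the next cycle.
sequence = [phase_at(step).name for step in range(5)]
print(sequence)
```

Under this model, one full clock cycle corresponds to four phase steps, so a delay quoted in fractions of a cycle can be read as a count of phases.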

3 Research background

In the present research, a carry-save full adder is designed. Before introducing the proposed adder and the evaluation criteria, we review the QCA-based full adder designs in the literature.

3.1 Background of full adders

Binary addition is the most basic mathematical operation. Full adders and half adders are extensively employed in building arithmetic circuits. In the proposed single-bit adder, A, B, and Ci are the inputs, while C (Carry) and S (Sum) are the outputs. The first majority gate-based QCA full adder with three inverters and five majority gates was proposed by Lent et al. in 1994 [12]. Equations (1) and (2) give the carry and sum formulas of the proposed full adder circuit.

$${\text{Carry}} = M\left( {A, \, B, \, C_{{{\text{in}}}} } \right)$$
(1)
$${\text{Sum}} = M\left( {M\left( {A,\overline{B},C_{{{\text{in}}}} } \right),\; \, M\left( {A,B,\overline{C}_{{{\text{in}}}} } \right),\;M\left( {\overline{A},B,\;C_{{{\text{in}}}} } \right)} \right)$$
(2)

where M denotes a majority gate, either three-input (M3) or five-input (M5). A single majority gate yields Carry, whereas Sum has to be optimized since it is produced by several inverters and majority gates. After the introduction of the five-input majority gate (M5), a few QCA-based full adders using it have been introduced [17]. The Sum relation is presented in Eq. (3). Three-input majority gates have an essential role in full adder circuits.

$$S = M_{5} \left( {A,\;B,\;C_{{{\text{in}}}} ,\; \overline{C}_{{{\text{out}}}} ,\; \overline{C}_{{{\text{out}}}} } \right)$$
(3)

The Boolean logic function of the proposed full adder receives Ai, Bi, and Ci as inputs and presents Si and Ci+1 as outputs. The outputs Si and Ci+1 are functions of the inputs, as given in Eqs. (4) and (5). The outputs Si and Ci+1 and the inputs Ai, Bi, and Ci are Boolean variables. Ai and Bi can be considered the ith bits of the integers A and B, respectively, and Ci is the carry bit related to the ith position. The proposed full adder calculates Ci+1 (Carry out) and Si (Sum) as follows [14]:

$$S_{i} = A_{i} \oplus B_{i} \oplus C_{i}$$
(4)
$$C_{i + 1} = A_{i} B_{i} + A_{i} C_{i} + B_{i} C_{i}$$
(5)
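The equivalence between the majority-gate form of Eqs. (1) and (2) and the Boolean form of Eqs. (4) and (5) can be checked exhaustively over the eight input combinations. The sketch below is only an illustration; the helper names `maj3`, `inv`, `full_adder_boolean`, and `full_adder_majority` are hypothetical.

```python
from itertools import product


def maj3(a: int, b: int, c: int) -> int:
    """Three-input majority: 1 when at least two inputs are 1."""
    return int(a + b + c >= 2)


def inv(x: int) -> int:
    """Logical inverter on a single bit."""
    return 1 - x


def full_adder_boolean(a: int, b: int, cin: int) -> tuple[int, int]:
    """Sum and carry-out written as in Eqs. (4) and (5)."""
    total = a ^ b ^ cin
    carry = (a & b) | (a & cin) | (b & cin)
    return total, carry


def full_adder_majority(a: int, b: int, cin: int) -> tuple[int, int]:
    """Sum and carry-out written as in Eqs. (2) and (1)."""
    carry = maj3(a, b, cin)
    total = maj3(
        maj3(a, inv(b), cin),
        maj3(a, b, inv(cin)),
        maj3(inv(a), b, cin),
    )
    return total, carry


# Compare both formulations on every input combination.
all_match = all(
    full_adder_boolean(a, b, cin) == full_adder_majority(a, b, cin)
    for a, b, cin in product((0, 1), repeat=3)
)
print(all_match)
```

The majority form uses three inverters and five majority gates in total, which matches the gate count quoted for the first majority-based design.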

3.1.1 Background of carry-save adders

In a carry-save adder (CSA), the addition of three numbers is reduced to the addition of two numbers. The total propagation delay equals the sum of three gate delays, regardless of the number of bits.

The carry-save block contains n full adders, each of which computes a sum bit and a carry bit based only on the corresponding bits of the three input values. An important drawback of CSAs is the difficulty of detecting signs. For example, for a carry-save pair (C, S) that represents a number with an actual value of C + S, the exact sign of C + S is not known. The sign cannot be determined without a full-length addition. Under this condition, a carry look-ahead adder (CLA) can deal with the situation more effectively. A comparison with other adders will be made later.
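A minimal sketch of the carry-save step on non-negative integers is given below, where each bit position acts as an independent full adder. The function name `carry_save_add` and the operand values are chosen only for illustration.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Compress three operands into a (partial sum, carry) pair.

    Every bit position is processed independently, so no carry ripples
    between positions inside this step.
    """
    partial_sum = x ^ y ^ z
    # The carry produced at position i has weight 2**(i + 1), hence the shift.
    carry = ((x & y) | (x & z) | (y & z)) << 1
    return partial_sum, carry


# Illustrative operands (assumed values).
partial_sum, carry = carry_save_add(5, 6, 7)
print(partial_sum, carry, partial_sum + carry)
```

The true value is available only after the pair is added with a carry-propagating adder, which is also why the sign of the result cannot be read from either word alone.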

4 Simulation setup and proposed design

The proposed 1-bit QCA full adder and 8-bit QCA CSA are simulated using the QCADesigner tool, version 2.0.3. This software allows easy design and simulation of QCA circuits using effective design tools that are otherwise available only in sophisticated circuit design applications. QCADesigner uses two simulation engines based on the coherence vector and bistable approximation. We use the bistable approximation mode in this paper because it runs faster than the coherence vector mode. Table 1 presents the parameters used for bistable approximation simulation for all structures [18]. The gates of the proposed structures are simulated separately, and their performance is evaluated (Fig. 5).

Table 1 Simulation parameters
Fig. 5 Block diagram of AG full adder

4.1 Proposed (AG) full adder

The full adder is a good example of a system where the majority circuit uses fewer logic gates than the best sum-of-products decomposition. The layout for a single full adder with a carry-save full adder is shown in Fig. 6a.

Fig. 6 Proposed full adder (AG); a logical design, b cellular layout

A novel QCA-based area-efficient coplanar full adder is presented in this subsection based on Eqs. (4) and (5).

The logical function of three-input majority gate is defined by Eq. (6).

$$M3\left( {A, \, B, \, C_{{{\text{in}}}} } \right) = {\text{AB}} + {\text{BC}}_{{{\text{in}}}} + {\text{AC}}_{{{\text{in}}}}$$
(6)

The outputs of the QCA full adder can be computed as follows:

$${\text{Sum}} = A \oplus B \oplus C_{{{\text{in}}}} = M\left( {C_{{{\text{in}}}} ,\; \overline{C}_{{{\text{out}}}} ,\;M\left( {A,B, \overline{C}_{{{\text{in}}}} } \right)} \right)$$
(7)
$$C_{{{\text{out}}}} = {\text{AB}} + {\text{AC}}_{{{\text{in}}}} + {\text{BC}}_{{{\text{in}}}}$$
(8)

In addition, the output of the full adder can be computed as follows:

$${\text{Sum}} = M\left( {\overline{{M\left( {A,B,C_{{{\text{in}}}} } \right)}} ,\;M\left( {\overline{{M\left( {A,B,C_{{{\text{in}}}} } \right)}} ,\;B,\;C_{{{\text{in}}}} } \right),\;A} \right) = M\left( {\overline{C}_{{{\text{out}}}} ,\;M\left( {\overline{C}_{{{\text{out}}}} ,\;B,\;C_{{{\text{in}}}} } \right),\;A} \right)$$
(9)

Figure 6 shows the QCA block diagram for the implementation of this full adder circuit. Moreover, the sum output can be computed as follows:

$${\text{Sum}} = M5\left( {\overline{C}_{{{\text{out}}}} ,\;\overline{C}_{{{\text{out}}}} ,\;B,\;C_{{{\text{in}}}} ,\;A} \right)$$
(10)
$${\text{Carry}} - {\text{out}} = M\left( {A,\;B,\;C_{{{\text{in}}}} } \right)$$
(11)
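The three Sum expressions in Eqs. (7), (9), and (10) can be compared against the XOR definition for all eight input combinations. The following sketch is only a check of the Boolean identities, not a model of the QCA layout; the helper names `majority`, `inv`, `sum_eq7`, `sum_eq9`, and `sum_eq10` are hypothetical.

```python
from itertools import product


def majority(*bits: int) -> int:
    """Majority vote over an odd number of bits (three or five here)."""
    return int(sum(bits) > len(bits) // 2)


def inv(x: int) -> int:
    """Logical inverter on a single bit."""
    return 1 - x


def sum_eq7(a: int, b: int, cin: int) -> int:
    """Sum as in Eq. (7): M(Cin, not Cout, M(A, B, not Cin))."""
    cout = majority(a, b, cin)
    return majority(cin, inv(cout), majority(a, b, inv(cin)))


def sum_eq9(a: int, b: int, cin: int) -> int:
    """Sum as in Eq. (9): M(not Cout, M(not Cout, B, Cin), A)."""
    not_cout = inv(majority(a, b, cin))
    return majority(not_cout, majority(not_cout, b, cin), a)


def sum_eq10(a: int, b: int, cin: int) -> int:
    """Sum as in Eq. (10): five-input majority with not Cout used twice."""
    not_cout = inv(majority(a, b, cin))
    return majority(not_cout, not_cout, b, cin, a)


# Each expression must equal A xor B xor Cin on every input combination.
all_agree = all(
    sum_eq7(a, b, cin) == sum_eq9(a, b, cin) == sum_eq10(a, b, cin) == (a ^ b ^ cin)
    for a, b, cin in product((0, 1), repeat=3)
)
print(all_agree)
```

In each form the inverted carry-out is reused to produce Sum, so the carry majority gate of Eq. (11) is shared between both outputs.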

The designed one-bit QCA full adder circuit is shown in Fig. 5. In this circuit, A and B are two one-bit inputs and Cin is the carry input. Carry and Sum denote the carry and sum outputs, respectively. The proposed design includes 62 cells, as shown in Fig. 6b. Additionally, three clock pulses are required to generate the Sum and Carry outputs. Conventional QCA cells are used in the proposed QCA-based full adder. The designs that use compact XOR gates [19] are much simpler than those with inverters and three-input majority gates. In these designs, the use of extended vertical wires can lead to unreliable signals (Table 2).

Table 2 Truth table of AG full adder

Figure 7 shows the simulation results obtained using the bistable approximation engine with default settings. The simulation results illustrate that the designed FA performs correctly. The delay is 0.25 clock cycles. Based on our simulation results, which are shown in Table 3, the proposed QCA FA circuit has the lowest cell count, area, delay, and cost in comparison with the previous designs in [16, 17, 19, 20, 21]. Hence, the proposed QCA FA is suitable for designing larger QCA circuits such as CSA circuits.

Table 3 Comparison of QCA full adder designs
Fig. 7 Simulation results of the proposed FA gate

4.2 Proposed QCA-based carry-save full adder

A carry-save full adder reduces the addition of three numbers to the addition of two numbers. The total propagation delay equals three gate delays for any number of bits. Each CSA unit contains n full adder blocks. Each block calculates a sum bit and a carry bit based solely on the corresponding bits of the three inputs. The final sum can then be calculated by left-shifting the carry sequence by one position, adding a zero bit in front of the partial sum sequence, and adding the two sequences with a carry-propagating adder, which generates the (n + 1)-bit result. This process can continue indefinitely. In this process, an input is added for each full adder stage without requiring internal carry propagation, and the stages can be arranged as a binary tree structure. In this circuit, the number of bits for each input is fixed.
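The procedure described above, repeated carry-save compression followed by one carry-propagating addition, can be sketched as follows. This is a simplified software model on non-negative integers that reduces the operands sequentially rather than as a tree; the function names `carry_save_step`, `ripple_carry_add`, and `add_many`, and the operand values, are assumptions for illustration.

```python
def carry_save_step(x: int, y: int, z: int) -> tuple[int, int]:
    """Compress three operands into a partial sum and a left-shifted carry."""
    partial_sum = x ^ y ^ z
    carry = ((x & y) | (x & z) | (y & z)) << 1
    return partial_sum, carry


def ripple_carry_add(x: int, y: int) -> int:
    """Add two operands bit by bit, propagating the carry between positions."""
    width = max(x.bit_length(), y.bit_length())
    result, carry = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        result |= (a ^ b ^ carry) << i
        carry = (a & b) | (a & carry) | (b & carry)
    # The final carry becomes the extra most significant bit of the result.
    return result | (carry << width)


def add_many(operands: list[int]) -> int:
    """Sum any number of operands with a single carry-propagating addition."""
    pending = list(operands)
    while len(pending) < 2:
        pending.append(0)  # pad so the final two-operand addition is defined
    while len(pending) > 2:
        x, y, z = pending.pop(), pending.pop(), pending.pop()
        pending.extend(carry_save_step(x, y, z))
    return ripple_carry_add(pending[0], pending[1])


# Illustrative operands (assumed values).
total = add_many([5, 6, 7, 8])
print(total)
```

Only the last step propagates carries across bit positions; every earlier step works on each bit position independently, which is the property that makes the carry-save approach attractive for multi-operand addition.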

The CSA algorithm is mainly implemented in multipliers, which are used for a wide range of applications in high-speed digital signal processing. Using CSA leads to faster carry propagation in multiplier circuits (Garg, 2004). The main advantage of CSA is fewer outputs. The proposed CSA has 12 inputs and six outputs. Moreover, the proposed CSA needs four full adders and a ripple carry adder (RCA), while it requires fewer QCA cells. In this subsection, the advantages of the proposed circuit, which consists of regular QCA cells, are introduced. Additionally, the input and output cells are located on opposite sides, so the proposed structure can be more easily realized by integrating different QCA designs, including logic and arithmetic unit architectures. Figure 8 shows the structure of the adder circuit with the proposed CSA.

Fig. 8 Adder circuit structure with CSA

The bits of the two numbers have to pass through an FA. To obtain the final result, the intermediate result has to be given to a CSA. The same logic as in Eqs. (1) and (4) is used. Moreover, the proposed low-complexity gate is used in the proposed CSA circuit. The CSA circuit includes 521 cells and occupies an area of 0.64 μm², as shown in Fig. 9. To obtain the desired output, 1.75 clock cycles are required. The proposed CSA circuit is efficient in terms of gate count, which results in a lower propagation delay compared to conventional adders. We present the circuit layout of the proposed full adder by using only a CSA.

Fig. 9 Implementation of QCA in the proposed CSA circuit

5 Results and discussion

In the present paper, we comprehensively analyzed the energy consumption of our proposed method and compared it with the results of previous designs. Prior to comparing the proposed design with the previous designs, a comprehensive analysis is presented based on circuit bound models. To choose the most optimized scheme for our designs, we analyze and compare the structure and energy consumption of the recent QCA-based designs to be used in the proposed CSA. The proposed circuits are simulated to evaluate the effect of different QCA parameters, and the analysis results are presented in Fig. 7. The proposed FA operates well in low-energy circuits and is optimized in QCA technology. Cell minimization techniques are employed to achieve the smallest number of cells for the proposed FA. These techniques use inverters based on cell rotation.

The lower complexity of the proposed design compared to the circuits presented in Refs. [16, 17, 19, 20, 21] is indicated in Table 3. Additionally, the designs in Table 3 use a single layer that can simultaneously access the input and output cells. The design presented in Ref. [20] uses two clock pulses, 66 cells with a total area of 0.06 μm², and one XOR gate, so it is superior to the previous designs. Our proposed FA circuit is compared with the designs presented in Refs. [16, 17, 19, 20, 21].

In the previous subsections, it was described how n-bit calculations are performed by the proposed single-bit adder without requiring a stack or combination of single-bit adders. However, the stack of single-bit adders is necessary in conventional adders. Additionally, this is advantageous in terms of area occupation because the QCA cell count is fixed. The numbers of cells in the proposed AG-based FA and the proposed CSA circuit are equal to 62 and 521, respectively. The simulation results of the proposed QCA-based CSA circuit are shown in Fig. 10.

Fig. 10 Simulation results of the proposed CSA

According to Table 4, the area occupied by the proposed CSA circuit is smaller than that of previous designs [22, 24] since this circuit uses an area-efficient adder to increase the operating speed and reduce the occupied area. However, multilayer circuits consume more energy. Moreover, our design shows better performance in other aspects, such as delay. Consequently, compared to previous designs, the total costs of the proposed complex adders are considerably reduced [23, 24].

Table 4 Comparison of 8-bit QCA CSA circuits

The present study aimed to exploit the compactness of QCA-based adders and to design energy-efficient circuits. An area-efficient approach was also used with polarity switching of the proposed FA in the first stage of this study. The advantage and efficiency of the proposed approach compared to the existing methods were shown using logic gates that consume low energy. Moreover, full adders are the basic units in digital logic and arithmetic circuits. A one-layer eight-bit QCA-based CSA circuit is presented in this article. The proposed design offers good performance regarding the delay, area, and cell count compared to existing designs. The proposed eight-bit CSA circuit based on QCA depends on a new dedicated QCA full adder circuit.

6 Conclusion

The present study aimed to utilize the low energy consumption of QCA adders and to design QCA-based low-energy circuits. Therefore, a major part of this research was dedicated to implementing a high-performance full adder circuit. Moreover, the energy dissipation of the proposed circuit was analyzed to present a more detailed insight into the QCA operation principles. This study presented a robust and compact CSA design. The proposed circuit was compared with recent designs, and the superiority of our proposed circuit was confirmed. The proposed scheme provides many logic operations with only a few additional elements and takes into account carry propagation. Furthermore, it was shown that the proposed eight-bit CSA design occupies less area, needs a smaller number of cells, and has a lower delay compared to conventional CSA schemes. The suggested design can be used for designing an n-bit QCA-based CSA. Also, the proposed idea can serve as a basis for designing other arithmetic circuits such as full subtractors and ripple carry adders.