1 Introduction

Quantum-dot Cellular Automata (QCA) is an emerging technology that promises performance improvements over conventional Complementary Metal–Oxide Semiconductor (CMOS) circuits [1,2,3,4]. It is regarded as a developing technology for energy-efficient logic circuits [5,6,7]. The basic element of QCA is a square four-dot cell that contains two free electrons [8,9,10]. Because of the Coulombic interaction, these electrons occupy diagonally opposite dots [2, 11, 12]. QCA encodes binary data in the position of these charges rather than in a current. Fast operation, low energy consumption, and small dimensions are the main advantages of QCA circuits [13, 14].

A shift register is a form of memory in which cells are connected in a line. Each cell stores one bit of information, and its contents are shifted to the next cell during each clock cycle [15]. Registers are essential components of any digital device for storing digital information. Although QCA offers numerous advantages, it also poses challenges. Fabrication defects in QCA technology create defective cells in the substrate [16]. Identifying the types of errors is necessary to assess and improve the robustness of QCA-based circuits, especially shift registers. However, determining the sources of errors in QCA-based shift register operation and their relative probabilities of occurrence is very challenging. Despite the importance of the shift register in QCA-based designs, fault tolerance in molecular shift registers has not been addressed properly in the literature.

Therefore, an efficient and reliable 2-bit Universal Shift Register (USR) based on QCA technology is proposed in this paper. A Rotated Majority Gate (RMG) is used as the main building block of the proposed design because it functions correctly in the face of misalignment and displacement faults [17]. The functionality of the RMG is based on the Coulombic interaction among four neighboring QCA cells, which depends on the accuracy and geometry of its implementation [18]. The functionality of the RMG is also evaluated under deposition defects to investigate the correctness of the proposed circuit. Using this gate provides defect tolerance at the logic gate level, which is necessary for achieving an acceptable manufacturing yield [19].

The rest of this paper is organized as follows. Section 2 surveys previous QCA-based shift register designs. Section 3 explains the proposed design and its QCA-based modules. Section 4 discusses the simulation results obtained with the QCADesigner simulator. Finally, Section 5 provides conclusions.

2 Related Work

Several QCA-based shift register designs have been presented in previous work. A QCA-based shift-register architecture that maintains data in a stable conformation has been presented in [20]. The memory architecture is based on a current dual-phase synchronized, line-based one-bit memory cell block. Its efficient clocking scheme provides size, density, and latency improvements over some one-bit memory cells. It also maintains data indefinitely by applying opposite values to the two inputs. A row enable signal is defined for each row to permit read/write operations. An XNOR gate is used to apply opposite values to the control inputs of the dual-phase, line-based memory cell for the read/write operation. However, the shift-register memory architecture needs extra circuitry to preserve the stored value after a read operation. Furthermore, the fault-tolerance features of this design are not discussed.

Moreover, a new architecture for asynchronous registers in Null Convention Logic (NCL) has been proposed in [21] as a solution to the “layout \(=\) timing” problem in QCA circuits. The TH22 and TH12 gates are the main elements of the NCL register. The TH22 gate can be implemented with a majority gate whose output is fed back to one of its inputs. The TH12 gate is equivalent to an OR gate and can be implemented using a majority gate with one of its inputs fixed at ‘1’. This NCL register greatly reduces the number of QCA cells required to implement an asynchronous NCL register. The new NCL register is used to design a serial adder to demonstrate that it works in a sequential circuit. The design is therefore an efficient way to ease the “layout \(=\) timing” problem in QCA circuits. Although a sequential circuit designed with the new NCL register needs a careful layout to obtain correct timing, it can be interfaced with other circuitry without concern that the layout affects timing. Therefore, the “layout \(=\) timing” problem is not solved locally. Also, fault tolerance is not considered in this design.
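The two gate constructions described for [21] can be illustrated at the logic level with a short Python sketch. It is a behavioral illustration only, not the QCA layout of [21]; the function names `maj`, `th12`, and `th22` are chosen here for illustration.

```python
def maj(a: int, b: int, c: int) -> int:
    """3-input majority vote on single bits."""
    return 1 if a + b + c >= 2 else 0


def th12(a: int, b: int) -> int:
    """TH12: majority gate with one input fixed at 1, which behaves as OR."""
    return maj(a, b, 1)


def th22(a: int, b: int, prev_out: int) -> int:
    """TH22: majority gate whose third input is its own previous output."""
    return maj(a, b, prev_out)


# The feedback gives TH22 hysteresis: the output rises only when both
# inputs are 1 and falls only when both inputs are 0.
out = 0
trace = []
for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
    out = th22(a, b, out)
    trace.append(out)
print(trace)  # prints [0, 0, 1, 1, 0]
```

The trace shows the output holding its value while the two inputs disagree, which is the state-holding behavior an asynchronous register relies on.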

A new QCA-based 8-bit USR design using 4 \(\times \) 1 multiplexers and D flip-flops (D-FFs) has been proposed in [16]. The 8-bit USR consists of eight 4 \(\times \) 1 multiplexers and eight D-FFs. If the outputs of the shift register are available, the serial input data can be output in parallel from the flip-flops. The multiplexers and D-FFs in [16] have lower complexity, area, and delay than some previous designs. They may be used in processor design for speed enhancement or in digital communication. However, missing cells and other possible defects in the 8-bit USR are not considered.

Furthermore, Purkayastha, De [22] have proposed a modified design of 4-bit and 8-bit USRs. First, QCA layouts of a D flip-flop with clear input and a 4-to-1 multiplexer are used to design the 4-bit and 8-bit USRs. The QCA-based USR takes serial data as input and performs both left and right shift operations on it; the operation is selected by the 4-to-1 multiplexer circuits. There is one 4-to-1 multiplexer and one D flip-flop for each bit of the shift register. The USR layout achieves a substantial reduction in complexity, area, and clock cycle delay in comparison with the previous design. The 8-bit and 4-bit USRs may find application in high-speed processors, and the design may be extended to an n-bit USR using QCA.

Finally, Das and De [23] have proposed an optimized QCA-based shift register using a new QCA layout of the D flip-flop. The QCA structure of a 3-bit Serial-In-Serial-Out (SISO) shift register is realized by cascading three QCA-based D flip-flops, with the output of each flip-flop used as the input to the next. All the flip-flops are driven by a shared clock, and all are set or reset simultaneously. The shift register outperforms existing designs by reducing cell count and increasing density. A power consumption analysis is performed to show the low energy consumption of the circuit. Defects may occur in this shift register under three conditions: a single missing cell, a single additional cell, and cell misalignment. They result in defective cells in the substrate.

Table 1 summarizes and compares the most important advantages and disadvantages of the discussed QCA-based shift registers.

Table 1 Summary of the discussed QCA-based registers and their advantages and disadvantages

3 Proposed Design

The USR, one of the most widely used and important electronic circuits, is implemented here in QCA technology. The 4×1 multiplexer and the D-FF are the essential modules in designing the USR. The shift register performs left and right shifts, and the multiplexer selects one operation at a time. Recently, great progress has been made in the molecular manufacturing of QCA, in which each QCA cell is a molecule. Defects can occur in both the chemical synthesis phase and the deposition phase of manufacturing. They occur more often in the deposition phase than in the chemical synthesis phase, which creates defective cells in the substrate. In the USR, faults can occur in two places: the 2 \(\times \) 1 multiplexer and the D-FF.

The RMG is obtained by a symmetrical rotation of the inputs and output by up to 45° around the central cell, which does not affect the function of the majority gate and keeps it robust. This gives an important degree of freedom for synthesizing QCA-based designs, as the RMG can be used in place of the Original Majority Gate (OMG) block. Few studies have investigated the fault properties of QCA-based designs; a comprehensive comparison between the OMG and the RMG is given in [18, 25]. The comparison shows that the RMG has a higher fault tolerance capability. The reason is that, when all input and output cells are rotated around the center cell, the electrostatic repulsion among the electrons of the cells lets the inner cells drive the output to a stable polarization. The OMG, in contrast, depends more on the middle input (B) than on the other inputs with respect to both displacement and misalignment, whereas in the RMG this dependency changes with the degree of rotation. A design that uses the RMG instead of the OMG is therefore proposed to make the shift register more efficient and fault tolerant. The RMG and the OMG have the same logic-level behavior. The two 3-input QCA majority gates, the OMG and the RMG, are depicted in Fig. 1a and b, respectively.

Fig. 1 Two types of 3-input majority gate: (a) OMG, (b) RMG [17]
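Since the OMG and the RMG share the same logic-level behavior, a single Boolean model covers both. The following Python sketch, with function names chosen here for illustration, enumerates the truth table of the 3-input majority function M(A, B, C) = AB + BC + AC and shows the standard result that fixing one input yields a 2-input AND or OR gate. It says nothing about the physical fault behavior that distinguishes the two gates.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """M(A, B, C) = AB + BC + AC on single bits."""
    return (a & b) | (b & c) | (a & c)


def and_gate(a: int, b: int) -> int:
    """Majority gate with one input fixed at 0."""
    return majority(a, b, 0)


def or_gate(a: int, b: int) -> int:
    """Majority gate with one input fixed at 1."""
    return majority(a, b, 1)


# Full truth table of the 3-input majority function.
truth_table = {bits: majority(*bits) for bits in product((0, 1), repeat=3)}
for (a, b, c), out in truth_table.items():
    print(a, b, c, "->", out)
```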

Timing in QCA is achieved by clocking in four distinct, cascaded phases that synchronize the QCA cells. This strategy facilitates adiabatic switching and thus a reliable circuit [26]. Clocking not only controls the information flow but also supplies the real power in QCA [27]. Each clock signal is phase shifted by \(\pi /\)2 with respect to the previous one. The four clock phases are switch, hold, release, and relax, as shown in Fig. 2. During the switch phase, the barriers are raised and the cell becomes polarized. During the hold phase, the barriers remain at their highest level, which fixes the polarization state of the QCA cell so that it can drive its neighbors. During the release phase, the barriers are lowered and the cell polarization decreases. Finally, in the relax phase, the barriers remain at their lowest level and the cell is unpolarized [28,29,30].

Fig. 2 Four phases of the QCA clock [31]
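The four-phase scheme can be sketched as a small Python model. It assumes four clock zones numbered 0 to 3, each lagging the previous one by a quarter period (the \(\pi /\)2 shift), and discrete time steps of one phase each; the names `zone_phase` and `phases_to_cycles` are illustrative and not taken from any tool.

```python
PHASES = ("switch", "hold", "release", "relax")


def zone_phase(zone: int, t: int) -> str:
    """Phase of clock zone `zone` at time step `t` (one step = one phase).

    Assumption: zone k lags zone k-1 by a quarter period.
    """
    return PHASES[(t - zone) % 4]


def phases_to_cycles(n_phases: int) -> float:
    """Convert a delay counted in clock phases to clock cycles (4 phases each)."""
    return n_phases / 4


# While zone 0 is in hold, zone 1 is switching and latches zone 0's value.
for t in range(4):
    print(t, [zone_phase(z, t) for z in range(4)])
```

In this model a signal advances one zone per phase, which is why circuit delays are naturally counted in quarter cycles.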

In a QCA memory architecture, the stored data must be kept in motion, so the stored value moves through different cells. A D-FF is used in the proposed design because of its simplicity. The basic D-FF of this architecture is shown in Fig. 3. The data bit is held in a loop while the CLK control signal is low. When CLK goes high, the input bit is written into the loop. The right AND gate is called the enable gate and works independently of the rest of the circuit. The truth table of the D-FF is shown in Table 2.

Fig. 3 Basic D-FF
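The loop-based storage described above can be captured in a minimal behavioral sketch. It assumes that CLK = 1 writes the input into the loop, CLK = 0 keeps the stored bit, and the enable gate is a plain AND of the stored bit with EN; clock-zone delays are ignored, and the class name `DFlipFlopModel` is hypothetical.

```python
class DFlipFlopModel:
    """Behavioral stand-in for the loop-based D-FF (not the QCA layout)."""

    def __init__(self) -> None:
        self.stored = 0  # bit circulating in the memory loop

    def step(self, d: int, clk: int, en: int) -> int:
        # Assumption: CLK = 1 writes D into the loop, CLK = 0 keeps the old bit.
        if clk:
            self.stored = d
        # Enable gate: AND between the stored bit and EN.
        return self.stored & en


ff = DFlipFlopModel()
outputs = [
    ff.step(d, clk, en)
    for d, clk, en in [
        (1, 1, 1),  # write 1, output enabled
        (0, 0, 1),  # hold: D is ignored while CLK is low
        (0, 0, 0),  # output disabled by EN
        (0, 1, 1),  # write 0
    ]
]
print(outputs)  # prints [1, 1, 0, 0]
```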

Table 2 D-FF operation

The D-FF is implemented in this article using 62 cells in an area of 0.10 µm². Figure 4 shows the layout of the D-FF according to its operation in Table 2. It has a delay of 1.75 clock cycles (7 phases).

Fig. 4 Proposed D-FF implemented in QCA

A multiplexer lets a system choose one of several input signals and forward it to the output, which is why it is used as a switch. The selection lines determine which signal is forwarded to the output of the multiplexer. The 4×1 multiplexer is implemented with three 2×1 multiplexers and used as a module. In1 to In4 are the four input signals, and the selection lines S0 and S1 select one of them. Figure 5 shows the modular block implementation of the 4×1 multiplexer in QCA, built from 2×1 multiplexer modules. The logical functionality is as follows: if the S0, S1 rails are 00, 10, 01, and 11, the output is set to In1, In2, In3, and In4, respectively.

Fig. 5 4×1 multiplexer modular block implemented in QCA
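The modular construction can be sketched in Python as three 2×1 multiplexers arranged in two stages. Which select line drives which stage is an assumption made here to reproduce the stated mapping (S1S0 = 00, 01, 10, 11 selects In1, In2, In3, In4); the function names are illustrative.

```python
def mux2(a: int, b: int, sel: int) -> int:
    """2x1 multiplexer: returns a when sel = 0 and b when sel = 1."""
    return b if sel else a


def mux4(in1: int, in2: int, in3: int, in4: int, s1: int, s0: int) -> int:
    """4x1 multiplexer built from three 2x1 multiplexers."""
    low = mux2(in1, in2, s0)    # first stage, lower pair
    high = mux2(in3, in4, s0)   # first stage, upper pair
    return mux2(low, high, s1)  # second stage picks the pair


# One-hot inputs make the selected line visible in the output.
inputs = {"In1": (1, 0, 0, 0), "In2": (0, 1, 0, 0),
          "In3": (0, 0, 1, 0), "In4": (0, 0, 0, 1)}
for s1, s0 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    chosen = [name for name, bits in inputs.items() if mux4(*bits, s1, s0)]
    print(f"S1S0={s1}{s0} ->", chosen[0])
```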

Figure 6 shows the implementation of the QCA-based 4×1 multiplexer, which uses 161 cells in an area of 0.26 µm². The delay of the 4×1 multiplexer is 2.75 clock cycles (11 phases). As labeled in the figure, In1 to In4 are the multiplexer inputs, and S0 and S1 are the selector lines of the 4×1 multiplexer. One of the four inputs is selected and transmitted to the output based on the selection lines.

Fig. 6 Proposed 4 to 1 multiplexer implemented in QCA

The rest of this section explains a fault-tolerant USR in QCA technology. The S1 and S0 lines control the different operations of the register. The operations and their corresponding select combinations are listed in Table 3. If the outputs of the D-FFs are available, the serial input data can be output in parallel from the D-FF outputs by shifting. A register that can be shifted in both directions and loaded in parallel is referred to as a USR. Only one shift operation is active at a time, that is, either a right shift or a left shift; the 4×1 multiplexer circuits decide which. The block diagram of the 2-bit USR, which consists of two 4×1 multiplexers and two D-FFs, is shown in Fig. 7.

Table 3 Different modes of operation of the 2-bit USR

Fig. 7 Block diagram of 2-bit USR

The two 4×1 multiplexers share two selectors (S0 and S1). When S1S0 = ‘00’, the present value of the register is applied to the D-FF inputs. This creates a path from each D-FF output to its own input, so the previously stored value of each D-FF is retained. When S1S0 = ‘01’, input In2 of the multiplexers has a path to the D-FF inputs. This causes a right shift operation, with the serial input transferred into In2. When S1S0 = ‘10’, a left shift operation occurs and the other serial input is transmitted to In3. Finally, when S1S0 = ‘11’, the binary data on the parallel input lines are transferred into the register simultaneously. Figure 8 shows the layout of the presented QCA-based 2-bit USR, which uses 769 cells in an area of 1.45 µm². At most 6.25 clock cycles (25 phases) are needed to get the first output, irrespective of the S1S0 combination. Coplanar crossover is used for wire crossing in the 2-bit USR. The coplanar design uses both regular and rotated cells. If they are aligned properly, these two kinds of cells do not affect each other’s signals, so a large and simple layout can be created in a single layer.

Fig. 8 Proposed 2-bit USR implemented in QCA
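The four operating modes can be summarized in a register-level Python sketch. It assumes that Out2 is the left stage and Out1 the right stage, as suggested by the order in which the outputs are observed in the simulations, and it ignores the clock-zone latency of the QCA layout; the function `usr_step` and its parameter names are hypothetical.

```python
from typing import Tuple

State = Tuple[int, int]  # (Out2, Out1), assuming Out2 is the left stage


def usr_step(state: State, s1: int, s0: int,
             sr: int = 0, sl: int = 0,
             parallel: State = (0, 0)) -> State:
    """One register update of a 2-bit USR for the select code S1S0."""
    out2, out1 = state
    if (s1, s0) == (0, 0):   # hold: each D-FF reloads its own output
        return (out2, out1)
    if (s1, s0) == (0, 1):   # right shift: serial input SR enters at Out2
        return (sr, out2)
    if (s1, s0) == (1, 0):   # left shift: serial input SL enters at Out1
        return (out1, sl)
    return (parallel[0], parallel[1])  # S1S0 = 11: parallel load


state: State = (0, 0)
history = []
for bit in (1, 1, 0):        # shift the bits 1, 1, 0 in from the left
    state = usr_step(state, 0, 1, sr=bit)
    history.append(state)
print(history)  # prints [(1, 0), (1, 1), (0, 1)]
```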

4 Simulation Results and Discussion

This section presents the simulation results of the proposed USR circuit. The simulation tools, simulation parameters, accuracy analysis and comparisons with existing layouts are discussed in this section.

4.1 Simulation Tool

QCADesigner is an accurate simulation and layout tool for QCA. It can simulate complex QCA circuits on most standard platforms [32]. Currently, QCADesigner has three distinct simulation engines, each with its own set of advantages and disadvantages. The first is a digital logic simulator in which cells are considered to be either null, logical one, or logical zero. The second is a nonlinear approximation engine that uses the nonlinear cell-to-cell response function to determine the stable state of the cells within a design. The third uses a two-state Hamiltonian to approximate the full quantum mechanical model. Therefore, it is used to simulate the proposed design.

4.2 Simulation Parameters

The layouts proposed in the previous section were designed in QCADesigner Ver. 2.0.3 and simulated using the bistable approximation simulation engine with default parameters. A short explanation of each parameter used by the simulation engine is presented in Table 4.

Table 4 Parameters model in the QCADesigner simulator

4.3 Accuracy Analysis

Figure 9 depicts the simulation of the D-FF in QCADesigner. The correct output is obtained after 1.5 clock cycles. When EN is “1”, the output is enabled, and when EN is “0”, the output is “0”. When CLK is “1”, the write state is enabled and the data value is saved in the memory loop; when CLK is “0”, the read state is enabled and the saved bit is placed on the output. The result agrees with the truth table in Table 2, which confirms the correct operation of the circuit.

Fig. 9 Simulation results of D-FF

Figure 10 shows the simulation of the 4×1 multiplexer. According to the simulation results, the delay is 2.75 clock cycles. Four waveforms with different frequencies are applied to inputs In1, In2, In3, and In4. The multiplexer outputs the signal at In1 when the select bus S1S0 is 00, the signal at In2 when S1S0 is 01, the signal at In3 when S1S0 is 10, and the signal at In4 when S1S0 is 11.

Fig. 10 Simulation results of the 4 to 1 multiplexer

To confirm the operation of the 2-bit USR, it is simulated with a bit string for all combinations of the EN and CLK inputs of the D-FFs and the select bus (S1S0) of the 4-to-1 multiplexers. The simulation result of the 2-bit USR is shown in Fig. 11. When S1S0 = ‘11’, the 2-bit USR performs the parallel load operation with a delay of 6.25 clock cycles; the parallel load inputs are 2, 0, 2, 3, and 1, and the simulation shows that the output is the same as the input. When S1S0 = ‘00’, the previously stored data of the register are fed back to the D-FF inputs through the feedback path and latched. The maximum delay is 6.25 clock cycles.

Fig. 11 The simulation of the 2-bit USR, when S1S0 = “11 & 00”

Figure 12 shows the simulation result of the right shift operation when S1S0 = ‘01’. The serial input ‘110101’ is applied to input line 1 (SR) of the leftmost multiplexer, and the simulated output is observed first at Out2 and then at Out1. Out2 has a delay of 6.25 clock cycles and Out1 a delay of 11.25 clock cycles. The left shift operation is performed with the serial input string ‘110101’ and the select line combination S1S0 = ‘10’. The input is applied to serial input line 1 (SL) of the right multiplexer. The first simulated output is collected from Out1 with a delay of 6.25 clock cycles and then from Out2 with a delay of 12.25 clock cycles. Figure 13 shows the left shift operation of the 2-bit USR.

Fig. 12 The simulation of the 2-bit USR, when S1S0 = “01”

Fig. 13 The simulation of the 2-bit USR, when S1S0 = “10”

4.4 Comparisons

The RMG functions correctly in the face of cell misalignment and cell displacement faults. In a cell misalignment, the defective cell is not properly aligned with its neighboring cells; in other words, it lies nearer to one or more cells and farther from the others. A cell displacement fault occurs when a cell is not placed at its original position. To evaluate the proposed design, a program has been written that randomly imposes cell displacement defects on a QCA circuit. The program receives the layout file of the QCA circuit generated by QCADesigner and the number of defects as inputs. It then displaces that many cells at random and writes an output file containing the desired number of cell displacement defects. For each number of defects, the program has been run 50 times, and the fault tolerance is estimated. The amount of cell displacement is assumed to be 7 nm. Only an output that is logically correct and reaches the required maximum signal level is counted as a correct output. In the same way, cell misalignment defects are imposed on the QCA circuit at random to assess the circuit in the face of cell misalignment defects. Table 5 presents a comparative study of the USR proposed in this paper and those proposed in [16, 22] regarding fault tolerance properties.
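The evaluation procedure can be outlined as a small Monte Carlo sketch in Python. It is not the authors' program: the layout is represented here as a plain list of (x, y) cell coordinates in nanometres, the shift direction is chosen at random along one axis (an assumption), and `is_correct` is a hypothetical callback standing in for a QCADesigner simulation of the defective layout.

```python
import random
from typing import Callable, List, Tuple

Cell = Tuple[float, float]   # cell centre in nm (hypothetical layout format)
DISPLACEMENT_NM = 7.0        # displacement amount stated in the text
TRIALS = 50                  # runs per number of defects stated in the text
DIRECTIONS = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]


def inject_displacements(cells: List[Cell], n_defects: int,
                         rng: random.Random) -> List[Cell]:
    """Return a copy of the layout with n_defects random cells shifted by 7 nm."""
    faulty = list(cells)
    for idx in rng.sample(range(len(faulty)), n_defects):
        x, y = faulty[idx]
        dx, dy = rng.choice(DIRECTIONS)  # assumption: shift along one axis
        faulty[idx] = (x + dx * DISPLACEMENT_NM, y + dy * DISPLACEMENT_NM)
    return faulty


def estimate_fault_tolerance(cells: List[Cell], n_defects: int,
                             is_correct: Callable[[List[Cell]], bool],
                             seed: int = 0) -> float:
    """Fraction of TRIALS random defect patterns that still work correctly."""
    rng = random.Random(seed)
    passed = sum(
        1 for _ in range(TRIALS)
        if is_correct(inject_displacements(cells, n_defects, rng))
    )
    return passed / TRIALS


# Toy usage: a 10-cell wire on a hypothetical 20 nm pitch and a dummy checker.
wire = [(20.0 * i, 0.0) for i in range(10)]
print(estimate_fault_tolerance(wire, 2, lambda layout: True))
```

In a real evaluation the callback would run the simulator on each defective layout and compare the output waveform and signal level against the fault-free circuit.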

Table 5 A comparative study of the USRs in terms of fault tolerance properties

As shown in Table 5, the proposed RMG-based design is more tolerant of misalignment and displacement faults than the previously proposed designs. The proposed shift register remains vulnerable to missing cell defects because it is designed in a single layer and does not employ cell redundancy. The proposed design is also compared with the previously proposed designs: Table 6 compares the USR presented in this paper with those presented in [16, 22] regarding area, complexity, and delay.

Table 6 Comparison between 2-bit USR presented in this paper with previous designs

Among the compared structures, a multilayer design is used in [22], and the Signal Distribution Network (SDN) method is used in [16] for wiring. There are three techniques for 90°-to-90° wire crossing: multilayer crossover, SDN, and wire crossing based on the difference of clock phases. The multilayer crossover technique uses a crossover bridge: additional layers are added, and the QCA signal passes through the upper layer. With the multilayer crossover, the number of cells in the crossover area increases, but the delay remains unchanged. An SDN block can accept an arbitrary number of inputs and yield an equally arbitrary number of distributed signals. With the SDN method, the implementation of the design is divided into two parts: the signal distribution network and the combinational logic gates. This method requires a variety of clocking regions that range in size from a single cell up to dozens of cells; hence, it increases both the number of cells and the delay. The third technique is a coplanar wire crossing that uses 90° cells and is based on a phase difference of 180 degrees between two non-adjacent clock regions. With it, additional tasks such as rotating and translating QCA cells and considering the number of sub-layers are not necessary. The number of cells remains unchanged, but the delay increases. The number of clock phases in the proposed structure increases because the clocking has to be controlled when this method is used.

5 Conclusion and Future Work

Fault-tolerant design of QCA logic circuits is necessary if QCA is to replace CMOS technology. This paper presents a new, extendable, and efficient design of a 2-bit USR with fault tolerance in QCA technology, using an efficient 4×1 multiplexer and D-FF implementation to perform four actions: remaining in an unchanged state, shifting right, shifting left, and parallel load. The 4×1 multiplexer and the D-FF are used as the basic modules of the USR. This study explores a fault tolerance method for designing the USR that uses the RMG instead of the OMG. The RMG provides a robust cell configuration because of its tolerance to misalignment and displacement defects. The new structure is an improved design compared with previous works: the comparisons and evaluations show a significant improvement with respect to the number of defects imposed on the circuit. However, the proposed design cannot tolerate missing cell defects. Finally, the area, complexity, and delay of the proposed design are compared with those of other QCA-based 2-bit USR structures.

In the future, the high robustness of this USR can be exploited in designing the most commonly used fault-tolerant arithmetic circuits. These circuits are the building blocks of future nanoprocessors. Moreover, the design may be extended to an n-bit USR.