1 Introduction

The demand for low-power circuits is rising in modern applications such as wireless sensor networks (WSNs), the internet of things (IoT), implantable biomedical devices, and other battery-operated portable devices, because these systems have limited access to energy resources [1,2,3]. G. Moore’s prediction that the transistor count per chip would quadruple every three years set the pace for the industry and drove the miniaturization of CMOS devices into the nanometer regime [4]. For decades, CMOS has dominated the electronics market, but continued dimensional scaling has changed the landscape of the semiconductor industry. CMOS, once the key player in the market, suffers from severe degradation as device sizes shrink. As the dimensions are reduced, the ability of the gate to control the current is weakened by several adverse effects known as short-channel effects (SCEs) [5]. These effects cause threshold voltage variations, which in turn lead to other issues such as increased leakage currents, including leakage caused by drain-induced barrier lowering (DIBL) and subthreshold leakage. Short-channel effects arise when the gate’s control over the channel region is weakened by the electric field from the drain and source nodes [6].

The SCEs present a barrier to increasing SRAM density and decreasing power dissipation. Static random-access memories (SRAMs) are among the key circuits in handheld and computing devices. They occupy a large proportion of the total area of systems-on-chip (SoCs) due to their repetitive structures and excellent logic performance, so lowering the SRAMs’ power consumption lowers the SoC’s overall power consumption [7]. Low power can be achieved by lowering the power supply voltage (VDD), as static and dynamic power consumption reduce linearly and quadratically, respectively, with VDD reduction [8]. However, as VDD drops, the static noise margin (SNM) degrades. This is due to a narrowing of the gap between VDD and the threshold voltage (Vth), as well as a direct relationship between Vth and SNM [9, 10].
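The scaling argument above can be illustrated with a short sketch. It assumes only the proportionalities stated in the text (static power proportional to VDD, dynamic power proportional to VDD squared); the function names and the 1.0 V reference supply are hypothetical, and this is not a device-level power model.

```python
def static_power_ratio(vdd_new, vdd_ref):
    """Relative static power, assuming it scales linearly with VDD."""
    return vdd_new / vdd_ref


def dynamic_power_ratio(vdd_new, vdd_ref):
    """Relative dynamic power, assuming it scales with VDD squared."""
    return (vdd_new / vdd_ref) ** 2


# Example: scaling a hypothetical 1.0 V reference supply down to 0.5 V.
print(static_power_ratio(0.5, 1.0))   # 0.5
print(dynamic_power_ratio(0.5, 1.0))  # 0.25
```

Under these assumptions, halving VDD halves the static power and cuts the dynamic power to one quarter, which is why supply scaling is attractive despite the SNM penalty discussed above.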

These limitations degrade the performance of CMOS SRAM, making it unsuitable for modern low-power applications. As a result, conventional CMOS has been replaced with the fin-shaped field-effect transistor (FinFET) due to its superior features, such as improved gate control and subthreshold slope [5, 11]. However, using FinFET devices to design a traditional 6-transistor (6T) SRAM cell has not solved the issues with the SRAM cell. The 6T SRAM suffers severely from the read-disturbance issue induced by voltage division between the pull-down transistor and the access transistor during the read operation, resulting in poor read SNM (RSNM) [12]. The writability of the 6T SRAM cell in terms of write SNM (WSNM) is degraded at very low VDD, as it cannot maintain the cell ratio [13]. Furthermore, the bit-interleaving (BI) architecture, which reduces multi-bit upsets and increases soft-error immunity, cannot be employed for the 6T SRAM cell because the cell experiences half-select-disturbance issues [14]. These issues have forced designers to employ new techniques in SRAM design to overcome the challenges related to the 6T SRAM cell.

To improve RSNM, the storage nodes can be fully isolated from the bitlines during the read operation, as done in [2, 5, 12, 14,15,16,17,18], at the cost of higher area and additional control signals or bitlines. Furthermore, a separate read path introduces additional leakage, which intensifies as technology shrinks. The conventional 8T SRAM suffers from this issue. The SRAM cells proposed in [17, 19, 20] use modified isolated read paths that are independent of the stored data. In [21], a 9T SRAM cell was proposed, which improves the RSNM and minimizes static power but consumes higher dynamic power, attributed to its dual-ended bitlines. An efficient way to reduce dynamic power, as well as total power, is to use a single-bitline structure in SRAM design [2, 14]. It reduces the bitline activity factor to less than half, as well as leakage and area. This technique, on the other hand, degrades the reading/writing speed and the ability to write ‘1’ [2, 14]. Therefore, a write-assist technique is necessary to avoid write failure. The SRAMs suggested in [16, 18, 22, 23] utilize an NMOS or PMOS transistor, located inside the latch core, to cut off the feedback path during the write operation, resulting in WSNM enhancement. However, this reduces the writing speed because the latch is turned into two cascaded inverters, one following the other. Using transistors with different sizes can increase the cell’s WSNM, but sizing is not an effective solution in FinFET technology due to width quantization [5, 23]. A 12T SRAM using FinFET devices at 14 nm is proposed for subthreshold operation in [24]. It uses a separate path for writing and one embedded bitline for reading. This improves the static noise margins, and the cell can operate in the subthreshold region without error. The authors in [25] proposed a 13T SRAM cell with improved power and speed, which is free from half-select issues. The cell does not require a write bitline, and thus improves the overall performance. A low-power robust 9T SRAM cell is discussed in [26], which uses a single bitline for read and write operations and is a variation-tolerant cell designed in 16-nm technology for subthreshold applications. To demonstrate the robustness of the circuit, it has been compared with several SRAM cells.

In this paper, we aim to design a novel SRAM cell that addresses the issues of stability, power, and robustness for modern applications like WSNs. The proposed single-bitline 9T (SB9T) SRAM cell designed with FinFET devices offers the following: (1) suitability for low-power near-threshold operation, (2) improved RSNM by employing a read path isolated from the latch core, (3) enhanced WSNM by cutting off the feedback of the cross-coupled inverter pair during the write operation, (4) a reduced bitline activity factor by using only one bitline for both the read and write operations, which reduces dynamic power and improves overall density, (5) further reduced dynamic read power through non-precharge operation in the read mode, (6) minimized static power dissipation through a stacked transistor in the left inverter of the cell core, the single-bitline structure, a higher count of p-type devices, and a virtual ground (VGND) signal maintained at VDD, (7) elimination of both the read and write half-select-disturbance issues to support the BI architecture, (8) good stability with minimum-size FinFET devices, (9) improvement in most of the performance metrics with only nine transistors (less area overhead), and (10) support for the bit-interleaving architecture, enabled by the elimination of half-select issues, to reduce multiple-bit upsets and enhance soft-error immunity.

The rest of the paper is organized as follows. Section 2 reviews previous SRAM cell designs. The proposed SB9T SRAM cell is introduced in Sect. 3. Section 4 presents the simulation results and their analysis. Finally, Sect. 5 concludes this study.

2 An Overview of Existing SRAM Cells Design

This section reviews previously published SRAM cells with their pros and cons, which have been considered for comparison.

2.1 Write–Read-Enhanced 8T (WRE8T) SRAM Cell

The write-read-enhanced 8T (WRE8T) SRAM cell utilizes one bitline for the read operation and another bitline for the write operation to reduce dynamic power, as shown in Fig. 1a [9]. The cell uses a power-gating write-assist technique for the left inverter (storage node Q is an input) to decouple the storage node QB from the power rails, VDD and GND, during the write operation to enhance WSNM. The noise of the read bitline affects the storage node Q during the read operation due to the lack of read-decoupling technique, therefore, resulting in RSNM degradation. To mitigate the half-select-disturbance issues, an efficient write-back technique has been proposed in which data of the half-selected cells is first read by their read bitline, and then, it is put on the write bitline of those half-selected cells by three n-type transistors and an inverter to restore the original data. This technique, representing a read operation before the execution of a write operation, increases power consumption.

Fig. 1 Schematic diagram of the previous SRAM cells under investigation. a WRE8T [9], b TGRD9T [22], c ST9T [27], d DIRP10T [17], e PPN10T [28], and f FC11T [19]

2.2 Transmission Gate Read Decoupled 9T (TGRD9T) SRAM Cell

Figure 1b shows the schematic of the transmission gate read decoupled 9T (TGRD9T) SRAM cell [22]. This cell employs separate bitlines for performing the read and write operations. This results in reduced bitline activity factor during the read or write operation, and therefore, dynamic power consumption decreases. An n-type transistor has been placed inside the cell core to cut the feedback path off during the write operation to improve WSNM. Moreover, a decoupled read path has been employed to increase RSNM. Authors have eliminated the half-select-disturbance issues by adjusting transistors’ size in which pull-down transistors have the widest width. This increases static power dissipation and reduces cell density.

2.3 One-Sided Schmitt-Trigger 9T (ST9T) SRAM Cell

Authors in [27] have used a strong cross-coupled structure composed of a conventional inverter with stacked transistors in both pull-up and pull-down networks and a Schmitt-trigger inverter. The designed one-sided Schmitt-trigger 9T (ST9T) SRAM cell (see Fig. 1c) still suffers from read-disturbance issues, resulting in RSNM reduction. The power-gating write-assist technique employed in this design cuts the power rails, VDD and GND, from the storage node Q, and therefore, increases WSNM. To support BI architecture, half-select disturbance issues have been mitigated by adjusting the width of control signals WWLA, WWLB, and WL.

2.4 PMOS-PMOS-NMOS-Based Cell Core 10T (PPN10T) SRAM Cell

The PMOS-PMOS-NMOS-based cell core 10T (PPN10T) SRAM cell, shown in Fig. 1d, uses a single-ended reading structure and differential writing structure [28]. The decoupled read path improves RSNM at the expense of leakage introduction. The stacked transistors presented in the cell core increase hold SNM (HSNM) as well as RSNM. These transistors, on the other hand, are responsible for static power dissipation reduction and write delay increment.

2.5 Data-Independent Read Port 10T (DIRP10T) SRAM Cell

Figure 1e illustrates the schematic of the data-independent read port 10T (DIRP10T) SRAM cell with a single-ended reading structure and fully differential writing scheme [17]. This cell performs its write operation like the conventional 6T SRAM cell. The isolated read path utilized in this design increases RSNM and reduces leakage. This is because the read path is independent of the data. However, this path is formed by three stacked n-type transistors, which increases the read delay. The half-select-disturbance issues have not been eliminated in this design.

2.6 Feedback-Cutting 11T (FC11T) SRAM Cell

The single-ended feedback-cutting 11T (FC11T) SRAM cell (see Fig. 1f) employs only one bitline for both the read and write operations [19]. The RSNM and WSNM are improved with the aid of the read-decoupling technique and the feedback-cutting write-assist scheme, respectively. The decoupled read path, formed by three stacked transistors, reduces the read current. The bitline must be discharged for both the write ‘0’ and write ‘1’ operations, which increases dynamic write power.

3 Proposed SB9T SRAM Bitcell Structure

Figure 2 shows the schematic diagram of the proposed SB9T SRAM cell, which is suited to near-threshold operation. The latch core of the proposed design is composed of two cross-coupled conventional inverters (M1 to M5). Transistors M1 to M3 form the left inverter, gated by the storage node QB, and transistors M4 and M5 form the right inverter, gated by the node Q2. The true storage nodes Q and QB store the data and its complement. A low-Vth (SLVT model) p-type transistor (M6) is placed inside the cross-coupled structure, between the input of the right inverter and the output of the left inverter. This transistor, gated by the feedback-cutting line (FCL) signal, controls the feedback path. The proposed SB9T SRAM cell employs only one bitline (BL) to perform both the read and write operations. The read operation is controlled by the read-wordline (RWL) signal. To perform a read operation, the RWL is kept at the high logic level (VDD). The virtual ground (VGND) signal, connected to the source of the M8 transistor, is grounded only during a read operation so that a ‘0’ can be written onto the BL. At all other times, it is set to VDD to suppress unnecessary leakage current in half-selected cells. The write-wordline (WWL) signal, on the other hand, controls the write operation, and it must be set to the high logic level (VDD) to execute a write operation. The FCL is maintained at VDD during a write operation to cut off the feedback path. This facilitates the write operation in the proposed single-ended SRAM cell. During the hold mode, the RWL, WWL, and FCL are all pulled down and the VGND is pulled up. This removes both the read and write paths and establishes the feedback path, so the data are maintained by the latch core. The status of the various control signals used in the proposed design in the different operational modes is given in Table 1.

Fig. 2 Schematic diagram of the proposed SB9T SRAM cell

Table 1 Control signals of the proposed SB9T SRAM cell
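
The control-signal states described in the text of this section can be summarized in a small lookup structure. This is a minimal sketch rather than a reproduction of Table 1: the dictionary name and helper function are hypothetical, and the WWL and FCL levels during a read are inferred from the statement that the feedback path is cut only during a write.

```python
# Logic levels per operating mode as described in the text (1 = VDD, 0 = GND).
# Assumption: WWL = 0 and FCL = 0 during a read, since the text only states
# that RWL is high and VGND is grounded in that mode.
CONTROL_SIGNALS = {
    "hold":  {"RWL": 0, "WWL": 0, "FCL": 0, "VGND": 1},
    "read":  {"RWL": 1, "WWL": 0, "FCL": 0, "VGND": 0},
    "write": {"RWL": 0, "WWL": 1, "FCL": 1, "VGND": 1},
}


def feedback_established(mode):
    """M6 is a p-type device, so the feedback path conducts when FCL is low."""
    return CONTROL_SIGNALS[mode]["FCL"] == 0


for mode in ("hold", "read", "write"):
    print(mode, CONTROL_SIGNALS[mode], feedback_established(mode))
```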

In the proposed design, the two access transistors M7 and M9 of each bitcell are connected to the same bitline BL. This increases the overall bitline capacitance. However, the major contributor to the bitline capacitance is the bitline wire, and the parasitic capacitance of a FinFET is not as significant as that of a MOSFET. Therefore, this increase in the overall bitline capacitance is less than 10 percent for every 210 cells [23, 29]. In the suggested cell, an isolated read path and a feedback-cutting write-assist technique are utilized to improve the cell’s read stability and writability, respectively. Therefore, minimum-size transistors can be used to reduce the area occupied by the SRAM.

4 Simulations Results and Comparisons

4.1 Simulation Setup

In this section, the performance of the proposed SRAM design is evaluated using the HSPICE software and a 7-nm tri-gate FinFET technology [30]. The tri-gate FinFET is a thin-film, narrow silicon island with a gate on three of its sides. It provides a symmetric device architecture where the channel is controlled by the gate from three sides of the Si film. Since the gate control is increased, the scaling of the Si film thickness in tri-gate FinFET is better implemented. Moreover, in the tri-gate FinFET, the gates are electrically connected and a metal gate is used in place of the polysilicon gate. The use of a metal gate eliminates the poly-depletion problem of polysilicon gates. It increases carrier mobility by reducing the transverse electrical field at a given gate overdrive. In the 7-nm tri-gate FinFET technology, fins are 32 nm in height and 6.5 nm thick, on a 27-nm pitch. The fins are drawn at 7-nm width, with 20-nm spacing, although the actual fin physical dimension is 6.5 nm. Gates are drawn with a 20-nm gate length to stay on a 1-nm grid, while the actual length is 21 nm [31]. A replacement high-K metal gate process follows the trend through 14-nm processes. Gates are uniformly spaced on a grid with a contacted poly-pitch (CPP) of 54 nm. To accommodate the CPP scaling, the spacer thickness is assumed to decrease by 1 nm at each node from 14 nm to 7 nm. Spacer formation follows poly-gate deposition, allowing the use of low-k material in one spacer layer. Cutting gate polysilicon with the gate cut mask in a manner that keeps the spacers intact, with a dielectric deposition following, ensures that fin cuts are buried under gates or the gate cut fill dielectric, so source/drain growth is on full fins [31]. Table 2 lists some parameters of the FinFET technology used for simulations.
To assess the proposed SRAM’s performance relative to prior work, its design metrics, such as the RSNM, WSNM, read/write delay, dynamic read/write power, static power, and area, are compared with those of other published SRAM designs. These include the conventional 6T, write-read-enhanced 8T (WRE8T) [9], transmission gate read decoupled 9T (TGRD9T) [22], one-sided Schmitt-trigger 9T (ST9T) [27], data-independent read port 10T (DIRP10T) [17], PMOS-PMOS-NMOS-based cell core 10T (PPN10T) [28], and single-ended feedback-cutting 11T (FC11T) [19]. Some important features of these designs are listed in Table 3. To have fair and meaningful comparisons, all the aforementioned designs, as well as the proposed design, are examined in a 4 kb array (64 × 64-bit words) with an interconnect capacitance of 0.16 fF/µm [5, 7, 27]. Note that all the studied SRAMs are redesigned and re-simulated in the 7-nm FinFET technology at the near-threshold supply voltage of VDD = 0.5 V and a room temperature of 25 °C. Table 4 summarizes the best simulation results for the SRAMs under investigation.

Table 2 Some important parameters of the utilized 7-nm FinFET technology in typical corners
Table 3 Cell features comparison
Table 4 Comparison of the studied SRAMs based on different design metrics at VDD = 0.5 V

4.2 Read Performance Analysis

4.2.1 Read Operation

The proposed SB9T SRAM cell does not need a precharge operation, as it can write both the ‘0’ and ‘1’ logic values onto the BL, which can be distinguished by an amplifier. To perform a read operation, the RWL is first asserted and the VGND is grounded, and then the data are written onto the BL through the path M7-M8-VGND or M3-M7. The pseudo-node Q (PQ) is the drain of both the p-type transistor M3 and the n-type transistor M8, which pass strong ‘1’ and ‘0’ logic values, respectively, and therefore it can charge or discharge a large BL capacitance. Assume that the Q and QB nodes store ‘1’ and ‘0’ logic values, respectively. As QB = ‘0,’ the pull-up network of the left inverter (M2 and M3) is turned on and the BL is charged through the path M3-M7. Due to the presence of the n-type transistor M7 on the read path, which passes a weak ‘1’ logic value, the BL is charged only to “VDD – Vth-M7.” Nevertheless, this value can be distinguished by an amplifier, and a keeper circuit such as the positive feedback sensing keeper proposed in [32] can be utilized to enhance the read performance. As a result, the proposed SRAM design accomplishes a read ‘1’ operation. Now, consider the case in which ‘0’/‘1’ is stored at the internal storage node Q/QB. As Q = ‘0,’ the transistor M8 is enabled and a ‘0’ is written onto the BL through the path M7-M8-VGND, as the VGND signal is kept at GND in this mode. Figure 3 shows the read ‘0’ and ‘1’ operations of the proposed SB9T SRAM cell.

Fig. 3 a Read ‘1’ and b Read ‘0’ operation of the proposed SB9T SRAM cell

4.2.2 Read Stability and Its Variability

The proposed SRAM design eliminates the read-disturbance issue, and as a result, the RSNM is as large as the HSNM. This is because the data storage node Q is fully decoupled from the BL by M7 and M2 (M2 is inserted between the true storage node Q and node PQ) during the read ‘0’ operation, and therefore the read current (the current required to discharge the large capacitance of the BL) never flows through the storage node Q, but through the bypassing M8. This is the main reason why the proposed SRAM design is read-disturbance-free, which enhances the RSNM. Figure 4 shows the read butterfly curve of the proposed SB9T SRAM as well as those of the other SRAMs considered in this study for comparison. To extract the butterfly curve for the proposed SRAM, a DC voltage source applied to the input of the right inverter is swept from 0 to VDD while its output is monitored. The voltage transfer characteristic (VTC) of this inverter is plotted. The same process is repeated for the left inverter, and its VTC is plotted on the same axes. For RSNM comparison, the worst-case RSNM has been measured, which is the side length of the largest square inscribed inside the smaller wing of the read butterfly curves [33]. The conventional 6T and WRE8T SRAMs do not employ isolated read paths to fully decouple the internal storage nodes Q and QB from the bitlines, and therefore the read current flows through a path that includes the internal storage nodes. Consequently, these SRAMs suffer severely from the read-disturbance issue, which is why they show the lowest RSNM among all the SRAMs. The ST9T SRAM experiences the read-disturbance issue during the read ‘0’ operation, as the current that discharges its bitline capacitance flows through the storage node Q. However, it offers a higher RSNM than the above SRAMs. This can be explained by the strong cross-coupled structure of a normal inverter and a Schmitt-trigger inverter. The Schmitt-trigger inverter provides a sharper VTC than the conventional inverter, so an increase in the Q node’s voltage never reaches the trip voltage of the Schmitt-trigger inverter and cannot flip the cell’s content, which mitigates the read-disturbance issue [27]. In the other SRAMs, both storage nodes Q and QB are completely isolated from the bitlines during the read operation, resulting in RSNM improvement. In these SRAMs, the RSNM is as large as the HSNM. However, the best RSNM belongs to the PPN10T SRAM owing to the presence of stacked transistors in its latch core [28], which improve the inverter’s VTC and thereby increase the noise margin. Overall, the proposed SB9T SRAM offers 1.77× higher RSNM than the WRE8T SRAM and 1.03× lower RSNM than the PPN10T SRAM. Recent studies have shown that an SRAM with an RSNM of at least 25% of VDD is highly stable [14, 15]. In this respect, the proposed SRAM exhibits high stability, as it has an RSNM equal to 42.40% of VDD.
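
The largest-inscribed-square extraction described above can be sketched numerically. The example below is a minimal illustration under stated assumptions: the inverter VTC is a hypothetical tanh model (not the paper's FinFET devices or HSPICE data), and the function names are invented for this sketch. For each point on one curve it finds the square whose opposite corner touches the other curve, then takes the smaller of the two lobes as the worst-case SNM.

```python
import math

VDD = 0.5  # supply voltage used in the paper's simulations (V)


def inverter_vtc(vin, vdd=VDD, gain=20.0):
    """Hypothetical smooth inverter VTC (tanh model, not a FinFET model)."""
    return 0.5 * vdd * (1.0 - math.tanh(gain * (vin - 0.5 * vdd) / vdd))


def lobe_side(f_top, f_left, vdd=VDD, steps=400):
    """Largest square in the lobe bounded above by y = f_top(x) and on the
    left by the mirrored curve x = f_left(y)."""
    best = 0.0
    for i in range(steps + 1):
        y = vdd * i / steps
        x = f_left(y)              # bottom-left corner sits on the mirrored curve
        if f_top(x) - y <= 0.0:    # this point is not inside the lobe
            continue
        lo, hi = 0.0, vdd
        for _ in range(50):        # bisection: top-right corner on y = f_top(x)
            mid = 0.5 * (lo + hi)
            if f_top(x + mid) - (y + mid) > 0.0:
                lo = mid
            else:
                hi = mid
        best = max(best, lo)
    return best


def static_noise_margin(f1, f2, vdd=VDD):
    """Worst-case SNM: side of the largest square in the smaller lobe."""
    return min(lobe_side(f1, f2, vdd), lobe_side(f2, f1, vdd))


snm = static_noise_margin(inverter_vtc, inverter_vtc)
print(f"SNM = {snm * 1e3:.1f} mV ({100.0 * snm / VDD:.1f}% of VDD)")
```

For reference, the reported RSNM of 42.40% of VDD corresponds to 0.424 × 0.5 V = 212 mV, compared with the 25% stability guideline of 125 mV at this supply.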

Fig. 4 Read butterfly curves of the studied SRAM designs at VDD = 0.5 V

In nanoscale devices, the impact of process variations on SRAM performance becomes more significant. The SNM is the most important design metric of an SRAM and can be degraded substantially by severe process variations [23]. To study the performance of the SRAMs under investigation when subjected to harsh process variations, their SNM distributions during the read operation are plotted in Fig. 5. To extract these plots, Monte Carlo (MC) simulations with 5,000 iterations have been conducted to analyze the effects of process variations. Variations in manufacturing process parameters can be split into two categories: global and local. Global variations in channel length, fin width, and fin height are modeled as Gaussian with 3σ = 10% of their nominal values, and the global variation in gate oxide thickness as Gaussian with 3σ = 5% of its nominal value. Furthermore, local variations in channel length and fin width are modeled as Gaussian with 3σ = 5% of their nominal values [5]. It is evident from Fig. 5 that the 6T/WRE8T SRAM shows the highest variability (variability is defined as the ratio of the standard deviation to the mean of a given parameter [16]) due to the read-disturbance issue. The read-decoupling technique employed in the TGRD9T/DIRP10T/FC11T/SB9T SRAMs eliminates the read-disturbance issue, and therefore these SRAMs offer 1.26× lower RSNM variability than the 6T/WRE8T SRAM. The best and second-best RSNM variability belong to the PPN10T and ST9T, respectively, due to the isolated read path and stacked transistors in the cell core of the former and the strong latch core composed of normal and Schmitt-trigger inverters in the latter.
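
The sampling scheme and the variability metric defined above can be sketched as follows. This is a minimal illustration, not the authors' HSPICE Monte Carlo setup: it samples only one process parameter (the 32-nm fin height quoted in Sect. 4.1) with the stated global 3σ = 10% spread, and the function names are hypothetical. Obtaining the RSNM distribution itself would require a circuit simulation for each sample.

```python
import random
import statistics


def sample_parameter(nominal, three_sigma_fraction, rng):
    """Draw one Gaussian sample whose 3-sigma spread is a fraction of nominal."""
    sigma = three_sigma_fraction * nominal / 3.0
    return rng.gauss(nominal, sigma)


def variability(samples):
    """Variability as defined in the text: standard deviation over mean."""
    return statistics.stdev(samples) / statistics.mean(samples)


rng = random.Random(0)   # fixed seed so the sketch is repeatable
FIN_HEIGHT_NM = 32.0     # nominal fin height from Sect. 4.1
N_ITERATIONS = 5000      # same iteration count as the MC runs in the text

fin_heights = [sample_parameter(FIN_HEIGHT_NM, 0.10, rng)
               for _ in range(N_ITERATIONS)]
print(f"variability = {variability(fin_heights):.4f}")
```

With 3σ = 10%, the variability of the sampled parameter should be close to 0.10/3, which gives a quick sanity check on the sampling code before it is coupled to a simulator.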

Fig. 5 RSNM distribution plots of the studied SRAMs at VDD = 0.5 V. a 6T/WRE8T, b ST9T, c TGRD9T/DIRP10T/FC11T/SB9T, and d PPN10T

4.2.3 Read Delay

An SRAM’s access time is directly related to how quickly the bitline is discharged. Figure 6 compares the worst-case read delay of the various SRAM designs. For differential-read SRAMs, the read delay is measured as the time required for a 50-mV difference to develop between the bitlines (BL and BLB) after the wordline (WL) is asserted [14]. For single-ended-read SRAMs, it is measured as the time required for the bitline to discharge to half of VDD [34] (or to charge to “0.2 × VDD” [23]). The ST9T, DIRP10T, and FC11T SRAMs show the same read delay, which is the highest among all the SRAMs considered for comparison. This is because the read path in these designs is formed by three series-connected n-type transistors, which reduces the read current. The single-ended read operation in these SRAMs further increases the read delay. With a single-ended read operation and two stacked transistors in their read paths, the WRE8T, TGRD9T, and PPN10T SRAMs exhibit the same read delay, which is lower than that of the aforementioned SRAMs. The conventional 6T SRAM has the best read delay, attributed to its simple differential structure with only a single n-type access transistor. The proposed SB9T SRAM sometimes has to charge its bitline BL through the path containing the n-type transistor M7. This is why the suggested SRAM offers a higher read delay (1.28×) than the TGRD9T SRAM. However, its read delay is 1.17× lower than that of the ST9T SRAM at VDD = 0.5 V.
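
The two read-delay definitions above both reduce to finding when a waveform first crosses a threshold. The sketch below illustrates this on made-up waveforms; the sample values, the 100-ps linear discharge, and the function name are hypothetical, and time zero is assumed to be the instant the wordline is asserted.

```python
def crossing_time(times, values, threshold):
    """First time the waveform crosses the threshold (linear interpolation)."""
    for i in range(1, len(times)):
        v0, v1 = values[i - 1], values[i]
        if (v0 - threshold) * (v1 - threshold) <= 0.0 and v0 != v1:
            frac = (threshold - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None  # threshold never reached


VDD = 0.5
times_ps = [0.0, 25.0, 50.0, 75.0, 100.0]

# Single-ended read: hypothetical bitline discharging from VDD to 0 V.
bl = [0.5, 0.375, 0.25, 0.125, 0.0]
single_ended_delay = crossing_time(times_ps, bl, 0.5 * VDD)

# Differential read: BLB stays at VDD, delay is when BLB - BL reaches 50 mV.
blb = [0.5, 0.5, 0.5, 0.5, 0.5]
difference = [b - a for a, b in zip(bl, blb)]
differential_delay = crossing_time(times_ps, difference, 0.050)

print(single_ended_delay, differential_delay)
```

On the same discharge waveform, the 50-mV differential criterion is met much earlier than the half-VDD single-ended criterion, which is consistent with the faster read reported for the differential 6T cell.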

Fig. 6 Read delay comparison at VDD = 0.5 V

4.3 Write Performance Analysis

4.3.1 Write Operation

The write operation of the proposed SB9T SRAM cell is shown in Fig. 7. The data to be written to the cell are applied to the BL, and then the WWL is pulled up. At the same time, the RWL is grounded and both the FCL and VGND are kept at the high logic level (VDD). This removes the read and feedback paths. As FCL = ‘1,’ the cross-coupled inverter pair is turned into two cascaded inverters in which the right inverter (M4 and M5) is followed by the left inverter (M1 to M3). During the write operation, the data ‘1’ or ‘0’ on the BL are transferred to node Q2 through the write-access transistor M9. This switches the right inverter, and the storage node QB is updated to ‘0’ or ‘1.’ Finally, the left inverter drives a ‘1’ or ‘0’ onto the storage node Q, and the write operation is complete.
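
The write sequence above can be traced with a purely logic-level sketch. It assumes ideal logic levels (it ignores the weak ‘1’ passed by the n-type transistor M9 and all timing), and the function name is hypothetical.

```python
def write_bit(bl):
    """Logic-level trace of a write with FCL = 1 (feedback path cut)."""
    q2 = bl       # write-access transistor M9 passes the BL value to node Q2
    qb = 1 - q2   # right inverter (M4, M5) updates storage node QB
    q = 1 - qb    # left inverter (M1 to M3) updates storage node Q
    return {"Q2": q2, "QB": qb, "Q": q}


print(write_bit(1))  # {'Q2': 1, 'QB': 0, 'Q': 1}
print(write_bit(0))  # {'Q2': 0, 'QB': 1, 'Q': 0}
```

The data pass through two inverter stages in series before reaching Q, which is the source of the write-delay penalty discussed in Sect. 4.3.3.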

Fig. 7 a Write ‘1’ and b Write ‘0’ operation of the proposed SB9T SRAM cell

Fig. 8 a WSNM and b WM of the studied SRAM designs at VDD = 0.5 V

4.3.2 Write-Ability and Its Variability

The proposed SB9T SRAM design eliminates the write ‘1’ issue of single-ended SRAMs, as it utilizes the feedback-cutting write-assist mechanism. The write operation is thus facilitated, and consequently the WSNM is improved. To demonstrate this, Fig. 8a shows the WSNM of the studied SRAMs for writing ‘1,’ as it is the worst case in the proposed SB9T SRAM design. The WSNM is graphically estimated by using the read VTC obtained in the previous section in combination with the write VTC. The write VTC for writing ‘1’ to the storage node Q is plotted by sweeping the DC voltage source applied to the input of the right inverter from 0 to VDD with BL, VGND, and WWL high and RWL low, while monitoring its output. The side length of the smallest square that can be embedded between the lower halves of these curves gives the WSNM [16, 35]. As shown in Fig. 8a, the ST9T SRAM offers the highest WSNM among all the SRAMs due to the power-gating write-assist technique as well as the strong cross-coupled structure of conventional and Schmitt-trigger inverters. The feedback-cutting write-assist mechanism used in the WRE8T/FC11T/SB9T SRAM improves the WSNM by 2.35×, 1.30×, and 17.06× compared to the 6T, PPN10T, and DIRP10T SRAMs, respectively. As shown in Fig. 8a, the read and write VTCs converge to a single stable point, which indicates that the cross-coupled inverters of the SRAM bitcell function as a monostable circuit, signifying a successful write operation.

Another metric for estimating the writability of an SRAM is the write margin (WM), which recent studies consider more appropriate than the WSNM [16, 24, 25, 34]. To measure the WM, the desired data are applied to the bitline BL and the wordline WL is swept from 0 to VDD; the WM is the difference between VDD and the WL voltage at which the nodes Q and QB cross each other [16]. In the differential SRAMs, the WM for writing ‘1’ and ‘0’ is the same, while in the single-ended writing SRAMs, the WM for writing ‘1’ is higher than that for writing ‘0.’ In this study, the worst-case WM of the studied SRAMs has been considered, as shown in Fig. 8b. The proposed SB9T SRAM offers 1.17×/1.07×/1.15× higher WM compared to the DIRP10T/ST9T/PPN10T SRAM. This improvement is due to the feedback-cutting mechanism and the presence of only one transistor in its write path. The conventional 6T, DIRP10T, and PPN10T SRAMs show almost equal WM values, which are the lowest, owing to the lack of any write-assist technique. Furthermore, we have taken into account the impact of process variations on the WM by conducting MC simulations. Figure 9 shows the WM distribution plots for the various SRAMs, from which it is clear that the proposed SRAM has the best mean WM and the second-best WM variability, offering 1.79×/1.17× lower variability in WM compared to the DIRP10T/PPN10T SRAM. However, it shows a 1.18× higher spread in WM in comparison with the WRE8T/ST9T SRAM.
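
The WM measurement described above can be sketched as a search for the wordline voltage at which the Q and QB curves cross. The sweep data below are made up for illustration (they are not simulation results from the paper), and the function name is hypothetical.

```python
def write_margin(wl, q, qb, vdd):
    """WM = VDD minus the WL voltage at which nodes Q and QB cross."""
    for i in range(1, len(wl)):
        d0 = q[i - 1] - qb[i - 1]
        d1 = q[i] - qb[i]
        if d0 * d1 <= 0.0 and d0 != d1:
            frac = d0 / (d0 - d1)   # linear interpolation between sweep points
            wl_cross = wl[i - 1] + frac * (wl[i] - wl[i - 1])
            return vdd - wl_cross
    return None  # Q and QB never cross: the write fails over the whole sweep


VDD = 0.5
# Hypothetical DC sweep while writing '1' into a cell that initially holds '0'.
wl_sweep = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
q_node = [0.0, 0.0, 0.05, 0.45, 0.5, 0.5]
qb_node = [0.5, 0.5, 0.45, 0.05, 0.0, 0.0]

print(write_margin(wl_sweep, q_node, qb_node, VDD))
```

A cell that flips at a lower WL voltage leaves a larger gap to VDD and therefore a larger WM, which is the sense in which a higher WM indicates better writability.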

Fig. 9 WM distribution plots of the studied SRAMs at VDD = 0.5 V. a 6T/DIRP10T, b WRE8T/ST9T, c PPN10T, and d TGRD9T/SB9T/FC11T

4.3.3 Write Delay

As mentioned earlier, the proposed SB9T SRAM uses the feedback-cutting write-assist mechanism to facilitate the write ‘1’ operation and improve the WSNM/WM. This, in turn, increases the write delay of the suggested SRAM. Figure 10 compares all the SRAMs under investigation in terms of worst-case write delay. The write delay is measured as the time needed for the storage node Q (QB) to reach 90% (10%) of VDD after the wordline WL is asserted [14, 34]. Due to their simple differential writing structure combined with only one access transistor, the conventional 6T and DIRP10T SRAMs show the same write delay, which is the lowest among all the SRAMs. The write delay of the PPN10T SRAM is 1.09× higher than that of the conventional 6T SRAM because the stacked p-type transistors in the cell core increase the time required to charge the opposite storage node. The WRE8T and ST9T SRAMs have the same write path and the same power-gating write-assist technique, and therefore both offer a 1.73× higher write delay than the conventional 6T SRAM. This degradation arises because these SRAMs use a single-ended scheme. Owing to the feedback-cutting write-assist technique used in the single-ended TGRD9T/SB9T/FC11T SRAM, the write delay is 2.09×/2.27×/2.36× and 1.21×/1.32×/1.37× higher than those of the DIRP10T and ST9T SRAMs, respectively. This is due to the formation of two cascaded inverters, one following the other, which reduces the writing speed.

Fig. 10 Write delay comparison at VDD = 0.5 V

4.4 Mitigation of Half-Select-Disturbance Issues

Figure 11 shows the memory architecture built with the proposed SB9T SRAM bitcell. In this architecture, the RWL and WWL are row-based signals, whereas the BL, VGND, and FCL are column-based signals. Four SRAM bitcells, representing four different situations during a write operation, are considered. The selected and unselected SRAM bitcells perform normal write and hold operations, respectively, as discussed in previous sections. Here, we discuss and show that the data stored in the row and column half-selected cells are maintained.

Fig. 11 Simplified 2 × 2 architecture of SB9T cell during a write operation in the selected cell
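The four situations in the 2 × 2 architecture follow directly from which signals are shared along a row (RWL, WWL) and which along a column (BL, VGND, FCL). The short sketch below labels each cell of an array for a write to one selected cell. The function name `classify_cells` and the (row, column) indexing are hypothetical and serve only as an illustration of the terminology.

```python
def classify_cells(n_rows, n_cols, sel_row, sel_col):
    """Label every cell of an array for a write to (sel_row, sel_col)."""
    labels = {}
    for r in range(n_rows):
        for c in range(n_cols):
            if r == sel_row and c == sel_col:
                labels[(r, c)] = "selected"
            elif r == sel_row:
                # Shares the row-based RWL/WWL with the selected cell.
                labels[(r, c)] = "row half-selected"
            elif c == sel_col:
                # Shares the column-based BL/VGND/FCL with the selected cell.
                labels[(r, c)] = "column half-selected"
            else:
                labels[(r, c)] = "unselected"
    return labels


labels = classify_cells(2, 2, sel_row=0, sel_col=0)
for cell, label in sorted(labels.items()):
    print(cell, label)
```

In a 2 × 2 array each of the four situations occurs exactly once, which is why that size is sufficient to study half-select disturbance.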

When a normal operation (hold/read/write) is performed in a single SRAM cell, the FCL signal can be replaced with the WWL, because pulling the WWL up to perform a write operation (down for the other operations) cuts (establishes) the feedback path. However, the status of the FCL signal in the half-selected cells differs from that of the WWL, as shown in Table 5. Assume that the selected cell is writing ‘1’ to a storage node Q that holds ‘0.’ As the row-based signal WWL is asserted, the BL is connected to the node Q2 of the row half-selected cell. However, this cannot flip the state of that cell, because its FCL signal is kept at GND, enabling the transistor M6 to establish the feedback path. This defeats any single-ended attempt to write ‘1’ to the cell, since the n-type transistor M9 passes only a weak ‘1’ and cannot overpower the pull-down transistor. With the feedback path intact, the transistor M1 prevents a ‘1’ from being written to node Q. A successful write therefore requires that the feedback path first be cut, that the node QB then be updated to ‘0,’ and that a ‘1’ finally appear at the node Q. Figure 12a shows the simulated node voltages of the row half-selected cell during a write ‘1’ operation in the selected cell, over a time much longer than the write delay. The node Q2 voltage never reaches the switching threshold of the latch core’s inverters. Therefore, the data in this cell are preserved.

Table 5 Status of various control signals in a row and column half-selected cells during a write operation in the selected cell
Fig. 12 Simulated node voltages of a row half-selected cell and b column half-selected SB9T cell while writing ‘1’ to node Q at VDD = 0.5 V

In the column half-selected cell, the column-based bitline BL is set to ‘1’ or ‘0’ depending on the data to be written to the selected cell. Moreover, the RWL, WWL, and FCL are all grounded, and the VGND is pulled up. As a result, the BL is fully decoupled from the cell and the feedback path is removed. As the node voltages in Fig. 12b show, the column half-selected cell maintains its data.

Similarly, during a read operation in the selected cell, misreading in both the row and column half-selected cells is prevented by applying the row-based RWL signal and the column-based VGND signal.

4.5 Dynamic Power Comparison

Dynamic power is the power an SRAM dissipates during its read and write operations and therefore comprises dynamic read power and dynamic write power. It arises mainly from charging/discharging the large capacitance of the bitlines and control signals and can be expressed as Eq. (1) [5].

$$P_{{{\text{dynamic}}}} = \alpha \times C_{{{\text{effective}}}} \times V_{{{\text{DD}}}}^{2} \times f_{{\text{read/write}}}$$
(1)

where \(\alpha\) is the activity factor of the bitline, \(C_{{{\text{effective}}}}\) is the effective capacitance, \(V_{{{\text{DD}}}}^{2}\) is the square of the power supply voltage, and \(f_{{\text{read/write}}}\) is the reading/writing frequency. It can be inferred from Eq. (1) that the dynamic read power consumed by an SRAM is lower than its write power, because the bitline capacitance is discharged only by a small amount (at most 50% of VDD) during the read operation, whereas it must be fully discharged to zero potential during the write operation. SRAMs with a differential structure consume more dynamic power than SRAMs with a single-ended structure because \(\alpha\) is equal to one for the former. Furthermore, an SRAM with a slow reading/writing operation has a lower \(f_{{\text{read/write}}}\), which reduces \(P_{{{\text{dynamic}}}}\).
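As a quick numerical illustration of Eq. (1), the sketch below evaluates the dynamic power for a differential structure (activity factor of one) and for a single-ended structure. The capacitance, frequency, and the single-ended activity factor of 0.5 are hypothetical placeholder values chosen only to expose the trends; they are not measured results from this work.

```python
def dynamic_power(alpha, c_eff, vdd, freq):
    """Eq. (1): P_dynamic = alpha * C_effective * VDD^2 * f_read/write."""
    return alpha * c_eff * vdd ** 2 * freq


C_EFF = 10e-15   # F, hypothetical effective capacitance
FREQ = 100e6     # Hz, hypothetical read/write frequency

p_differential = dynamic_power(1.0, C_EFF, 0.5, FREQ)   # alpha = 1
p_single_ended = dynamic_power(0.5, C_EFF, 0.5, FREQ)   # alpha = 0.5 (assumed)
p_nominal_vdd = dynamic_power(1.0, C_EFF, 1.0, FREQ)    # VDD doubled to 1.0 V

print(f"differential, 0.5 V: {p_differential * 1e6:.3f} uW")
print(f"single-ended, 0.5 V: {p_single_ended * 1e6:.3f} uW")
print(f"differential, 1.0 V: {p_nominal_vdd * 1e6:.3f} uW")
```

With all other factors fixed, halving the activity factor halves the power, and halving VDD reduces the power by a factor of four, which reflects the quadratic dependence on the supply voltage.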

The dynamic read power results are plotted in Fig. 13a. All the SRAMs, except the conventional 6T SRAM, employ a single-ended reading operation, resulting in a reduced \({\upalpha }\) (that is, \({\upalpha }\) is less than one half). This reduces the dynamic read power consumed by these SRAMs according to Eq. (1). As shown in Fig. 13a, the proposed SB9T SRAM offers the lowest read power. The key reasons for this reduction are the decoupled read bitline and the non-precharge read operation. In addition, owing to the single-ended operation, the voltage swing takes place at single nodes only. Although the ST9T SRAM has a single-ended structure, it shows a comparatively high dynamic read power among its counterparts. This is due to the high wordline capacitance, and hence dynamic power, caused by its tall layout (see Table 6).

Fig. 13 Dynamic power comparison at VDD = 0.5 V. a Read power and b Write power

Table 6 Layout dimension of the studied SRAM bitcells

Figure 13b shows the dynamic write power results of the studied SRAMs. The higher write power consumption observed in the 6T, PPN10T, and DIRP10T SRAMs results from their differential writing structure. The FC11T SRAM also consumes high write power, despite its single-bitline structure, because its bitline must be discharged to ground during both write ‘0’ and write ‘1’ operations. The WRE8T SRAM shows the lowest write power, attributed to its single-ended writing scheme, fewer asserted signals, and power-gating technique. The proposed SB9T SRAM offers the third-best write power for four main reasons: (1) the single-ended write operation reduces the voltage swing at the operating nodes, (2) the feedback-cutting operation allows a direct write on the complementary node, (3) the writing speed is lower, and (4) fewer control signals are enabled.

4.5.1 Static Power Comparison

Static power is another important metric, as most of the SRAM bitcells in an SRAM array remain idle most of the time [34]. In advanced technology nodes, the major components of leakage current are gate leakage (IG), junction leakage (IJN), and subthreshold leakage (ISUB) through different transistors. Compared to CMOS devices, FinFET devices exhibit less body leakage and a much weaker body effect because of the narrow, tall structure of the fins. For this reason, the junction current, which is related to the body effect, has been omitted from the overall leakage current measurement [25]. The various leakage current components of the proposed SB9T SRAM cell are shown in Fig. 14a and can be expressed as:

$$\begin{aligned} I_{{\text{G - SB9T}}} & = I_{{{\text{DG}}_{{{\text{M1}}}} }} + I_{{{\text{SG}}_{{{\text{M2}}}} }} + I_{{{\text{DG}}_{{{\text{M2}}}} }} + I_{{{\text{SG}}_{{{\text{M3}}}} }} + I_{{{\text{DG}}_{{{\text{M3}}}} }} + I_{{{\text{GD}}_{{{\text{M4}}}} }} \\ & + I_{{{\text{GS}}_{{{\text{M4}}}} }} + I_{{{\text{GD}}_{{{\text{M5}}}} }} + I_{{{\text{SG}}_{{{\text{M5}}}} }} + I_{{{\text{DG}}_{{{\text{M6}}}} }} + I_{{{\text{SG}}_{{{\text{M6}}}} }} + I_{{{\text{SG}}_{{{\text{M7}}}} }} \\ & + I_{{{\text{DG}}_{{{\text{M7}}}} }} + I_{{{\text{SG}}_{{{\text{M8}}}} }} + I_{{{\text{DG}}_{{{\text{M8}}}} }} + I_{{{\text{SG}}_{{{\text{M9}}}} }} + I_{{{\text{DG}}_{{{\text{M9}}}} }} \\ \end{aligned}$$
(2)
$$I_{{\text{SUB - SB9T}}} = I_{{{\text{SUB}}_{{{\text{M1}}}} }} + I_{{{\text{SUB}}_{{{\text{M5}}}} }} + I_{{{\text{SUB}}_{{{\text{M9}}}} }}$$
(3)
$$I_{{\text{Leakage - SB9T}}} = I_{{\text{G - SB9T}}} + I_{{\text{SUB - SB9T}}}$$
(4)
Fig. 14 a Proposed SB9T SRAM cell with its various leakage components, b Conventional 6T SRAM cell with its various leakage components, and c Static power comparison at VDD = 0.5 V

Figure 14b shows the various leakage components of the conventional 6T SRAM cell. From this figure, we can write:

$$\begin{aligned} I_{{\text{G - 6T}}} = &I_{{{\text{DG}}_{{{\text{MN1}}}} }} + I_{{{\text{GD}}_{{{\text{MN2}}}} }} + I_{{{\text{GS}}_{{{\text{MN2}}}} }} + I_{{{\text{GD}}_{{{\text{MN3}}}} }} + I_{{{\text{SG}}_{{{\text{MN4}}}} }} \\ & + I_{{{\text{DG}}_{{{\text{MN4}}}} }} + I_{{{\text{DG}}_{{{\text{MP1}}}} }} + I_{{{\text{SG}}_{{{\text{MP1}}}} }} + I_{{{\text{GD}}_{{{\text{MP2}}}} }} \\ \end{aligned}$$
(5)
$$I_{{{\text{SUB}} - 6{\text{T}}}} = I_{{{\text{SUB}}_{{{\text{MN}}1}} }} + I_{{{\text{SUB}}_{{{\text{MN}}3}} }} + I_{{{\text{SUB}}_{{{\text{MP}}2}} }}$$
(6)
$$I_{{\text{Leakage - 6T}}} = I_{{\text{G - 6T}}} + I_{{\text{SUB - 6T}}}$$
(7)
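The bookkeeping in Eqs. (2) to (7) can be sketched as a sum over named leakage components. The term names below are transcribed from the subscripts in Eqs. (2), (3), (5), and (6); the helper `total_leakage` and the uniform placeholder of one arbitrary unit per component are assumptions made only to count terms, not measured currents.

```python
# Gate-leakage and subthreshold terms of the SB9T cell, from Eqs. (2) and (3).
GATE_TERMS_SB9T = [
    "DG_M1", "SG_M2", "DG_M2", "SG_M3", "DG_M3", "GD_M4", "GS_M4",
    "GD_M5", "SG_M5", "DG_M6", "SG_M6", "SG_M7", "DG_M7", "SG_M8",
    "DG_M8", "SG_M9", "DG_M9",
]
SUB_TERMS_SB9T = ["SUB_M1", "SUB_M5", "SUB_M9"]

# Corresponding terms of the conventional 6T cell, from Eqs. (5) and (6).
GATE_TERMS_6T = [
    "DG_MN1", "GD_MN2", "GS_MN2", "GD_MN3", "SG_MN4",
    "DG_MN4", "DG_MP1", "SG_MP1", "GD_MP2",
]
SUB_TERMS_6T = ["SUB_MN1", "SUB_MN3", "SUB_MP2"]


def total_leakage(currents, gate_terms, sub_terms):
    """Eqs. (4)/(7): total leakage = gate leakage + subthreshold leakage."""
    i_gate = sum(currents[name] for name in gate_terms)
    i_sub = sum(currents[name] for name in sub_terms)
    return i_gate + i_sub


# Placeholder: one arbitrary unit per component, so totals equal term counts.
unit_sb9t = {name: 1.0 for name in GATE_TERMS_SB9T + SUB_TERMS_SB9T}
unit_6t = {name: 1.0 for name in GATE_TERMS_6T + SUB_TERMS_6T}

n_sb9t = total_leakage(unit_sb9t, GATE_TERMS_SB9T, SUB_TERMS_SB9T)
n_6t = total_leakage(unit_6t, GATE_TERMS_6T, SUB_TERMS_6T)
print(f"SB9T leakage components: {n_sb9t:.0f}, 6T leakage components: {n_6t:.0f}")
```

Counting terms in this way only shows that the SB9T expression contains more leakage paths than the 6T expression; the actual comparison depends on the magnitude of each component, which must come from simulation.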

From Eqs. (2) to (7), it may appear that the static power dissipation of the proposed SB9T SRAM should be higher than that of the conventional 6T SRAM. However, as shown in Fig. 14c, the proposed SB9T SRAM offers the second-best leakage power dissipation, calculated as the sum of the leakage power measured while holding ‘1’ and ‘0’ at the storage node Q. This can be explained as follows. Owing to the nonzero positive voltage at node Q2, the gate-to-source voltage (VGS) of M9 and M5 becomes negative and the drain-to-source voltage (VDS) of M9 is lowered. Thus, the ISUB through these transistors is reduced according to Eq. (8), in which Vth0 is the initial threshold voltage, \(\lambda_{BS} > 0\) and \(\lambda_{DS} > 0\) are the body-bias coefficient and the drain-induced barrier lowering (DIBL) coefficient, respectively, and VDS is the drain-to-source voltage. I0 is the subthreshold current when \(V_{GS} = V_{th}\), \(\eta\) is the subthreshold swing factor, and \(V_{T} = kT/q\) is the thermal voltage.

The transistors M2 and M3 are connected in series and thus form a stack. When Q = ‘0,’ one terminal of the stack is connected to GND and the other to VDD; therefore, the intermediate node PQ rises to a nonzero positive value between GND and VDD. This nonzero positive voltage at the PQ node reduces the leakage current and, in turn, the leakage power.

$$\begin{aligned} I_{SUB} & = I_{0} \exp \left[ {\frac{{V_{GS} - V_{th0} + \lambda_{BS} V_{BS} + \lambda_{DS} V_{DS} }}{{\eta V_{T} }}} \right]\\ &\quad\times\left[ {1 - \exp \left( {\frac{{ - V_{DS} }}{{V_{T} }}} \right)} \right] \\ V_{th} & = V_{th0} - \lambda_{BS} V_{BS} - \lambda_{DS} V_{DS} \\ \end{aligned}$$
(8)
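The effect of a raised internal node on the subthreshold current can be illustrated numerically with Eq. (8). The sketch below assumes that the body-bias and DIBL terms enter the exponent once, through the effective threshold voltage \(V_{th}\), so that the exponent is \((V_{GS} - V_{th})/(\eta V_{T})\). All device parameters (`i0`, `v_th0`, `lam_bs`, `lam_ds`, `eta`) and the 50 mV node voltage are hypothetical values chosen for illustration and do not describe the 7-nm FinFET devices used in this work.

```python
import math

K_OVER_Q = 8.617333262e-5  # V/K, Boltzmann constant divided by electron charge


def subthreshold_current(v_gs, v_ds, v_bs=0.0, *, i0, v_th0, lam_bs, lam_ds,
                         eta, temp_k=300.0):
    """Subthreshold current following the form of Eq. (8)."""
    v_t = K_OVER_Q * temp_k                          # thermal voltage kT/q
    v_th = v_th0 - lam_bs * v_bs - lam_ds * v_ds     # effective threshold
    return (i0 * math.exp((v_gs - v_th) / (eta * v_t))
            * (1.0 - math.exp(-v_ds / v_t)))


# Hypothetical device parameters (assumptions, not extracted from a model card).
PARAMS = dict(i0=1e-7, v_th0=0.25, lam_bs=0.1, lam_ds=0.05, eta=1.2)

# Off transistor with its source at GND: VGS = 0 V, VDS = 0.5 V.
i_base = subthreshold_current(0.0, 0.5, **PARAMS)

# Same transistor with its source node raised by 50 mV:
# VGS becomes -0.05 V and VDS drops to 0.45 V.
i_raised = subthreshold_current(-0.05, 0.45, **PARAMS)

print(f"leakage ratio (raised node / grounded node) = {i_raised / i_base:.3f}")
```

Because VGS appears in the exponent, even a small positive voltage at the source node lowers the subthreshold current substantially, and the reduced VDS lowers it further through the DIBL term.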

The effective channel length of the transistors in the cross-coupled inverters of the proposed SRAM bitcell (left portion) increases due to the presence of stacked transistors. Since an increase in effective channel length raises the transistor’s threshold voltage, a further reduction in leakage power is obtained. Moreover, the greater number of p-type devices in the proposed design, compared to most of the comparison SRAMs, slightly reduces static power. This can be understood by considering the hot-carrier injection mechanism in short-channel devices. Therefore, the proposed cell offers the second-best static power. In the proposed design, the read path is independent of the data, which further reduces the static power dissipation. The conventional 6T and DIRP10T SRAMs have comparatively higher static power dissipation because they use more bitlines. Although the PPN10T SRAM employs three bitlines, it consumes lower static power than the above SRAMs owing to the stacked transistors in its cell core. The use of only one bitline and the stacked transistors in the cell core’s inverters give the ST9T SRAM low static power.

4.6 SRAM Bitcells’ Area Comparison

This section compares all the tested SRAM bitcells in terms of layout area. The layouts of these SRAM bitcells have been drawn on the fin grid of 7-nm FinFET technology according to the FinFET layout design rules reported in [26] and are shown in Fig. 15. Table 6 gives the dimensions of these layouts in units of \(\lambda\), where \(\lambda\) is the minimum feature size, assumed to be \(1/2\) of the gate length. Table 6 shows that the conventional 6T SRAM bitcell has the smallest layout area, whereas the FC11T SRAM bitcell has the largest. These results are due to the simple and compact structure of the 6T SRAM bitcell, with its minimum number of transistors, and the higher transistor count of the FC11T SRAM bitcell, respectively. The proposed SB9T SRAM bitcell, which employs nine transistors and is single-ended in nature, shows a 1.04×/1.06×/1.02×/1.31× lower layout area than the ST9T/DIRP10T/PPN10T/FC11T SRAM bitcells and a 1.74×/1.06×/1.10× higher layout area than the 6T/WRE8T/TGRD9T SRAM bitcells.

Fig. 15 Layout of the studied SRAM bitcells. a 6T, b WRE8T, c TGRD9T, d ST9T, e DIRP10T, f PPN10T, g FC11T, and h proposed SB9T

5 Conclusion

Improving the performance of SRAMs in terms of power and stability in advanced CMOS nodes is very challenging because the various SCEs become crucial concerns. To achieve a trade-off between delay and power, it is important to design an SRAM that operates well in the near-threshold region. However, in further scaled CMOS technology and at low VDD, the impact of PVT variations is significant. The FinFET technology, as a potential alternative to CMOS, can reduce the SCEs and mitigate PVT variations while offering lower power and high stability. This paper presented a novel half-select-disturb-free single-bitline 9T SRAM (namely SB9T) with high stability for low-power near-threshold operation in 7-nm FinFET technology. The read stability was improved in the proposed design by using a read-decoupling technique. In addition, it showed high writability by employing a feedback-cutting mechanism. The dynamic and static power consumptions have been reduced in the proposed SB9T SRAM with the aid of a single-bitline structure, non-precharge read operation, stacking effects, and a higher count of p-type transistors. The main outcomes for the proposed SB9T SRAM were improvements in RSNM and WSNM by 1.77×/1.36× and 2.35×/13.13×/1.30× compared to WRE8T/ST9T and 6T/DIRP10T/PPN10T, respectively, and a reduction in dynamic read power by at least 1.50×. Moreover, the proposed SRAM achieved the third (second)-best dynamic write power (static power) when compared with seven contemporary SRAMs at VDD = 0.5 V.

In the proposed SB9T SRAM array (assuming BI architecture), only one SRAM bitcell is involved in a reading or writing operation, and the majority of SRAM bitcells remain in idle mode to maintain the stored data. This increases the static power as well as the overall power consumption. The static power can be minimized by assigning a full-swing VDD to the involved SRAM bitcell and a scaled VDD to the unselected SRAM bitcells. The VDD of the unselected SRAM bitcells can be scaled down as long as their content never flips. This technique can be implemented using row/column decoders and address bits and is left as future work.