1 Introduction

The advent of wireless sensor networks (WSNs) has transformed conventional approaches to security in many establishments. Systematically arranged clusters of sensors, actuators and microcontrollers are increasingly employed for detecting intruders, identifying suspicious objects and remote surveillance. Two types of architectures are used for object detection and tracking: one based on proximity sensors and the other based on processing logic with a memory system. The latter is implemented in CMOS technology. The large amount of memory that tracking systems need to store information demands high-capacity static random access memory (SRAM) cells for cache memory (Singh and Kumar Vishvakarma 2017), operating in the frequency range of 1–100 MHz (Bol et al. 2013; Myers et al. 2016; Pal et al. 2019a). Physically implementing such a system involves transmitting large volumes of data from local devices to central cloud systems, which dissipates a considerable amount of energy (Sun et al. 2017). In addition, SRAM cells occupy more than 90% of the on-chip area in most systems on chip (SoCs) (Pal and Islam 2016a). Hence, the aim of any SRAM-based memory design is to achieve both energy and area efficiency while maintaining adequate performance (Mohammadi et al. 2018).

Supply voltage scaling is a popular approach to achieving low-power memory operation, because standby power depends linearly on the supply voltage (\(\textit{V}_{\mathrm{DD}}\)) (Gupta et al. 2018), whereas dynamic power depends quadratically on \(\textit{V}_{\mathrm{DD}}\) (Nabavi and Sachdev 2018).
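As an illustration of these two dependencies, the sketch below evaluates the standard first-order expressions \(P_{\mathrm{dyn}} = \alpha C_L V_{\mathrm{DD}}^2 f\) and \(P_{\mathrm{stat}} = V_{\mathrm{DD}} I_{\mathrm{leak}}\). The activity factor, load capacitance, frequency and leakage current are hypothetical values, and \(I_{\mathrm{leak}}\) is held constant, which ignores its own dependence on \(\textit{V}_{\mathrm{DD}}\) through drain-induced barrier lowering.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """First-order switching power alpha * C * VDD^2 * f (W)."""
    return alpha * c_load * vdd ** 2 * freq


def standby_power(vdd, i_leak):
    """First-order standby power VDD * I_leak (W), with I_leak held fixed."""
    return vdd * i_leak


# Hypothetical operating point: 10% activity, 1 fF load, 24 MHz, 1 nA leakage.
ALPHA, C_LOAD, FREQ, I_LEAK = 0.1, 1e-15, 24e6, 1e-9

for vdd in (0.6, 0.3):
    print(f"VDD = {vdd:.1f} V: "
          f"P_dyn = {dynamic_power(ALPHA, C_LOAD, vdd, FREQ):.3e} W, "
          f"P_stat = {standby_power(vdd, I_LEAK):.3e} W")
```

In this model, halving \(\textit{V}_{\mathrm{DD}}\) from 0.6 V to 0.3 V cuts the dynamic term by 4× but the standby term only by 2×.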

However, as \(\textit{V}_{\mathrm{DD}}\) is reduced, circuit parameters become increasingly sensitive to process variations such as line edge roughness (LER) and random dopant fluctuation (RDF), which may cause threshold voltage (\(\textit{V}_{\mathrm{t}}\)) mismatch between adjacent transistors in a memory cell (Kulkarni et al. 2007). Furthermore, increased process, voltage and temperature (PVT) variations at low supply voltages degrade the ability to maintain the pull-up and pull-down ratios of the conventional 6T SRAM cell. This deteriorates the write ability and read stability of the cell beyond tolerable limits and increases the probability of write and read failures (Pal et al. 2020). In addition, supply voltage scaling increases the operational delay exponentially, which further exacerbates the total energy consumption (Anh-Tuan et al. 2011). With technology scaling, the probability of soft errors in the form of single-cell upsets (SCUs) and multiple-cell upsets (MCUs) also increases, further adding to the instability of the cell (Pal and Islam 2016b; Pal et al. 2020).
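The exponential delay penalty can be illustrated with a textbook subthreshold current model, \(I = I_0 \exp\big((V_{GS} - V_t)/(n U_T)\big)\), and a gate delay of \(C V_{\mathrm{DD}}/I_{\mathrm{on}}\). The sketch below is a first-order illustration under assumed device parameters (threshold, slope factor, \(I_0\), load and number of idle leaking devices are all hypothetical); it also shows why a longer delay raises the leakage share of the energy per operation, since idle devices leak for longer.

```python
import math

U_T = 0.0259  # thermal voltage kT/q near 300 K (V)

# Hypothetical device parameters for a first-order subthreshold model.
VT0, N_SLOPE, I0 = 0.35, 1.5, 1e-7


def i_sub(vgs):
    """Subthreshold drain current (A), exponential in VGS (assumes VDS >> U_T)."""
    return I0 * math.exp((vgs - VT0) / (N_SLOPE * U_T))


def gate_delay(vdd, c_load=1e-15):
    """Time for the on-current at VGS = VDD to swing c_load by VDD (s)."""
    return c_load * vdd / i_sub(vdd)


def energy_per_op(vdd, c_load=1e-15, n_idle=1000):
    """Switching energy and leakage energy of n_idle OFF devices over one delay (J)."""
    e_dyn = c_load * vdd ** 2
    e_leak = n_idle * vdd * i_sub(0.0) * gate_delay(vdd, c_load)
    return e_dyn, e_leak


for vdd in (0.4, 0.3):
    e_dyn, e_leak = energy_per_op(vdd)
    print(f"VDD = {vdd} V: delay = {gate_delay(vdd):.2e} s, "
          f"leakage share = {e_leak / (e_dyn + e_leak):.2f}")
```

With these assumed parameters, lowering \(\textit{V}_{\mathrm{DD}}\) from 0.4 V to 0.3 V lengthens the delay by close to 10×, and the leakage fraction of the energy grows accordingly.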

Therefore, the objective of low-power cache memory design is a robust SRAM cell that operates reliably under severe PVT variations. Various designs have been proposed to overcome the limitations of the conventional 6T cell in subthreshold operation. The column-decoupling technique employed in the fully differential 8T (FD8T) cell (Anh-Tuan et al. 2011) (see Fig. 1a) considerably reduces half-select disturbance while also curtailing leakage power dissipation. However, the FD8T cell yields very poor read margins owing to its structural similarity to the conventional 6T cell. A further reduction in hold power (\(\textit{H}_{\mathrm{PWR}}\)) is obtained by connecting an extra tail transistor in the pull-down path, as in the bit-interleaving architecture supporting 11T (BI11T) cell (Pal and Islam 2016b) (see Fig. 1b) and in the cells proposed in (Pal and Islam 2016a; Islam and Hasan 2012). Single-ended cells (Tu et al. 2010, 2012; Tawfik and Kursun 2008; Pal et al. 2019b, c, d; Keda et al. 2006; Aly and Bayoumi 2007) exhibit much lower power consumption owing to their single bitline. The single-ended disturb-free 9T (SEDF9T) cell (see Fig. 2a) proposed by Tu et al. (2012) employs transistor stacking in the read path to reduce leakage current. The dual-\(\textit{V}_{t}\) SRAM cell (Tawfik and Kursun 2008) has low static power dissipation owing to its high-threshold devices. However, the asymmetric write operation of these cells severely degrades the write ‘1’ margin. The cells in Pal et al. (2019b, c, d) and Aly and Bayoumi (2007) employ feedback-cutting techniques to improve write ability but suffer from very high write delay. On the other hand, the dual-bitline cells suggested in Pal and Islam (2016b), Islam and Hasan (2012), Chang et al. (2010) and Chiu et al. (2014), as well as the SEDF9T cell (Tu et al. 2012), show significantly enhanced read stability because their storage nodes are isolated from the bitlines. The WWL12T cell (Yadav et al. 2017) (see Fig. 2b) has transmission gates (TGs) in its access path, and the PMOS devices of the TGs are controlled by the data-storing nodes of the internal core cell. As a result, one of the TG PMOS devices is always ON, and the cell consumes considerably higher \(\textit{H}_{\mathrm{PWR}}\). The differential 12T (D12T) cell (Ataei et al. 2016) (see Fig. 3a) employs a scheme in which the bitline leakage is independent of the stored data, reducing static power dissipation at the cost of a longer operational delay.

As is evident, these designs achieve considerable improvements in read stability and power consumption; however, they yield little improvement in write ability. Hence, the proposed fully differential write assist 10T (FDWA10T) cell (see Fig. 3b) seeks to improve write ability while maintaining robust read stability.

The rest of the paper is organized as follows: Section 2 describes the proposed FDWA10T cell in detail. Section 3 discusses the half-select issue. Section 4 analyzes the design metrics of the proposed cell in comparison with other cells under PVT variations. Section 5 concludes the paper.

Fig. 1

a The FD8T cell (Anh-Tuan et al. 2011) and b the BI11T cell (Pal and Islam 2016b)

Fig. 2

a The SEDF9T cell (Tu et al. 2012) and b the WWL12T cell (Yadav et al. 2017)

Fig. 3

a The D12T cell (Ataei et al. 2016) and b the proposed FDWA10T cell and its charging/discharging path during write operation

Fig. 4

The simplified array-based architecture of FDWA10T cell

2 The proposed FDWA10T cell and its operation

The proposed 10T cell is shown in Fig. 3b. An additional tail transistor, MN9, which is shared by all cells in a row, is connected to the core cross-coupled inverters. A simplified array architecture of the proposed cell is shown in Fig. 4. Access transistors MN6 and MN5 are controlled by the row-based wordline (WL), while the write access transistors MN4 and MN3 are activated by the column-based signal CSL. Read-decoupling transistor MN8/MN7 is connected between node Y2/Y1 and the row-based \(\textit{V}_{\mathrm{SSL}}\)/\(\textit{V}_{\mathrm{SSR}}\) line. The row-based signal PD activates MN9.

2.1 Enhanced write operation

During the write operation, the tail transistor is switched ON by setting PD to \(\textit{V}_{\mathrm{DD}}\). Both CSL and WL are set to \(\textit{V}_{\mathrm{DD}}\) to initiate the write operation, turning ON access transistors MN4/MN3 and MN6/MN5, respectively. The data to be written to node ‘L’ (‘H’) is applied to BLB (BL). Depending on the data to be written, \(\textit{V}_{\mathrm{SSL}}\)/\(\textit{V}_{\mathrm{SSR}}\) is held at either \(\textit{V}_{\mathrm{DD}}\)/GND or GND/\(\textit{V}_{\mathrm{DD}}\). Consider writing ‘1’ when node ‘L’ initially stores ‘0’ and node ‘H’ stores ‘1’ (see Fig. 3b). To write ‘1’ at ‘L’, the write driver (not shown) charges BLB to \(\textit{V}_{\mathrm{DD}}\) and discharges BL to GND, and \(\textit{V}_{\mathrm{SSL}}\)/\(\textit{V}_{\mathrm{SSR}}\) is held at \(\textit{V}_{\mathrm{DD}}\)/GND accordingly. Since node ‘L’ stores ‘0’, MN8 is OFF for a brief initial period. However, as the voltage at ‘L’ rises, MN8 turns ON, giving two distinct paths for charging ‘L’ towards \(\textit{V}_{\mathrm{DD}}\): from \(\textit{V}_{\mathrm{SSL}}\) through MN8 and MN4, and from BLB through MN6 and MN4. Meanwhile, since node ‘H’ stores ‘1’, MN7 is initially ON, giving two separate paths for discharging ‘H’ to GND: through MN3 and MN5 to BL, and through MN3 and MN7 to \(\textit{V}_{\mathrm{SSR}}\). Thus, BLB and \(\textit{V}_{\mathrm{SSL}}\) rapidly charge node ‘L’ while BL and \(\textit{V}_{\mathrm{SSR}}\) discharge node ‘H’, and the desired data is written successfully. An identical process takes place when writing ‘0’ to node ‘L’, with BLB (BL) set low (high) and \(\textit{V}_{\mathrm{SSL}}\)/\(\textit{V}_{\mathrm{SSR}}\) held at GND/\(\textit{V}_{\mathrm{DD}}\).
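A rough way to see why the extra write path helps is to treat every conducting transistor as an equal linear on-resistance and compare the network that charges node ‘L’ before and after MN8 turns ON. This sketch assumes, following the paths listed above, that both charging paths share MN4; the normalized resistance and the equal-resistance assumption are illustrative only, since subthreshold devices are strongly nonlinear.

```python
def series(*resistances):
    """Equivalent resistance of devices connected in series."""
    return sum(resistances)


def parallel(*resistances):
    """Equivalent resistance of devices connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


R_ON = 1.0  # normalized on-resistance, assumed identical for every device

# Before MN8 turns ON: BLB -> MN6 -> MN4 -> 'L'.
r_single_path = series(R_ON, R_ON)
# After MN8 turns ON: (MN6 from BLB || MN8 from VSSL) -> MN4 -> 'L'.
r_dual_path = series(parallel(R_ON, R_ON), R_ON)

# A first-order RC charging time scales with the equivalent resistance.
print(f"single path: {r_single_path:.2f} R, dual path: {r_dual_path:.2f} R")
```

Under these assumptions the charging resistance drops from 2R to 1.5R; the same structure, with MN5 and MN7 in parallel feeding MN3, applies to discharging node ‘H’.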

2.2 Differential disturb free read operation

During the read operation, the column-based CSL is set to GND to turn MN4 and MN3 OFF, which isolates the data-storing nodes from the bitlines. WL is set to \(\textit{V}_{\mathrm{DD}}\) to turn ON MN6 and MN5, and \(\textit{V}_{\mathrm{SSL}}\) and \(\textit{V}_{\mathrm{SSR}}\) are held at GND. BLB and BL are precharged HIGH during the preceding hold operation. MN9 is kept ON by setting PD to \(\textit{V}_{\mathrm{DD}}\), which helps retain the stored data. Consider the read ‘1’ operation, where node ‘L’ stores ‘1’ and node ‘H’ stores ‘0’. The ‘1’-storing node ‘L’ turns ON MN8, so BLB discharges through MN6 and MN8 to the grounded \(\textit{V}_{\mathrm{SSL}}\). In contrast, since ‘H’ stores ‘0’, MN7 is OFF and BL remains precharged. The bitlines drive a sense amplifier (not shown), which resolves the stored data once a 50 mV differential develops between the bitlines, completing the read operation. Conversely, if ‘L’ stored ‘0’ and ‘H’ stored ‘1’, BL would discharge through MN5 and MN7 to the grounded \(\textit{V}_{\mathrm{SSR}}\).
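The differential read described above can be summarized as a behavioral model: the stored value selects which precharged bitline discharges, and an idealized sense amplifier resolves the data once the 50 mV differential is reached. This is a logic-level sketch only; the discharge amount `droop` and the function names are hypothetical, and no transistor-level timing is modeled.

```python
SENSE_MARGIN = 0.05  # 50 mV bitline differential, as stated in the text


def bitlines_after_read(l_stores_one, vdd, droop):
    """Bitline levels partway through a read; droop (V) is a hypothetical discharge."""
    if l_stores_one:
        # 'L' = 1 turns MN8 ON: BLB discharges via MN6 and MN8 into grounded VSSL.
        return {"bl": vdd, "blb": vdd - droop}
    # 'H' = 1 turns MN7 ON: BL discharges via MN5 and MN7 into grounded VSSR.
    return {"bl": vdd - droop, "blb": vdd}


def sense(bl, blb):
    """Idealized sense amplifier: returns the bit stored at node 'L', or None."""
    diff = bl - blb
    if abs(diff) < SENSE_MARGIN:
        return None  # differential still too small to resolve
    return 1 if diff > 0 else 0


for bit in (1, 0):
    for droop in (0.03, 0.06):
        levels = bitlines_after_read(bit == 1, vdd=0.3, droop=droop)
        print(f"L = {bit}, droop = {droop * 1000:.0f} mV -> sensed {sense(**levels)}")
```

Because only one bitline moves and the other stays at its precharge level, the sign of the differential alone identifies the stored bit.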

2.3 Hold operation

During the hold operation, the tail transistor is kept ON by setting PD to \(\textit{V}_{\mathrm{DD}}\), which helps retain the data stored in the cross-coupled inverters. WL and CSL are set to GND to turn OFF access transistors MN6/MN5 and MN4/MN3, respectively. \(\textit{V}_{\mathrm{SSL}}\) and \(\textit{V}_{\mathrm{SSR}}\) are set to \(\textit{V}_{\mathrm{DD}}\) to minimize the bitline leakage current, and both BLB and BL are held at \(\textit{V}_{\mathrm{DD}}\) to reduce the wake-up time.
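The control conditions of the three operations can be collected into a single lookup, which makes it easy to check, for example, that PD stays at \(\textit{V}_{\mathrm{DD}}\) in every mode. The sketch below only restates the signal levels given in this section; the function name and the string encoding of levels are hypothetical.

```python
VDD, GND = "VDD", "GND"


def control_signals(mode, data_for_l=None):
    """FDWA10T signal levels per operation, as described in the text.

    data_for_l is the bit to be written to node 'L' (write mode only).
    """
    if mode == "write":
        if data_for_l not in (0, 1):
            raise ValueError("write mode needs data_for_l = 0 or 1")
        one = data_for_l == 1
        return {"PD": VDD, "WL": VDD, "CSL": VDD,
                "BLB": VDD if one else GND, "BL": GND if one else VDD,
                "VSSL": VDD if one else GND, "VSSR": GND if one else VDD}
    if mode == "read":
        return {"PD": VDD, "WL": VDD, "CSL": GND,
                "BLB": "precharged VDD", "BL": "precharged VDD",
                "VSSL": GND, "VSSR": GND}
    if mode == "hold":
        return {"PD": VDD, "WL": GND, "CSL": GND,
                "BLB": VDD, "BL": VDD, "VSSL": VDD, "VSSR": VDD}
    raise ValueError(f"unknown mode: {mode}")


for mode, bit in (("write", 1), ("write", 0), ("read", None), ("hold", None)):
    print(mode, bit, control_signals(mode, bit))
```

Encoding the table this way highlights the key write-assist choice: \(\textit{V}_{\mathrm{SSL}}\)/\(\textit{V}_{\mathrm{SSR}}\) follow the data during write, are grounded during read and sit at \(\textit{V}_{\mathrm{DD}}\) during hold.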

Fig. 5

\(2\times 2\) memory array during write ‘1’ operation

3 Elimination of half-select disturbance

The proposed cell is free of half-select disturbance. Fig. 5 shows a simplified \(2\times 2\) memory array of FDWA10T cells during the write ‘1’ operation. To write ‘1’ to node ‘L’ of the upper left cell, the column-based CSL_0 and the row-based WL_0 are activated while BLB_0/BL_0 is held at \(\textit{V}_{\mathrm{DD}}\)/GND. The row-based VSSL_0/VSSR_0 is held at \(\textit{V}_{\mathrm{DD}}\)/GND. PD_0/PD_1 is held at \(\textit{V}_{\mathrm{DD}}\) irrespective of the SRAM operation. Therefore, the upper left cell is fully selected for the write operation. The upper right cell is half-selected because its MN6 and MN5 are activated by the row-based WL_0. However, its storage nodes are unaffected because CSL is column-based: CSL_1 is held at GND, which isolates the storage nodes from the bitlines and thus prevents miswriting. The lower right cell is unselected because both WL_1 and CSL_1 are deactivated. The lower left cell sees a HIGH CSL_0, so its access transistors MN4 and MN3 are turned ON. However, VSSL_1 and VSSR_1 are both held at \(\textit{V}_{\mathrm{DD}}\) because that row is in hold. Consequently, if either node (‘L’/‘H’) stores ‘1’, the corresponding read discharge transistor (MN8/MN7) is ON and establishes a direct path between that node and \(\textit{V}_{\mathrm{DD}}\) (VSSL_1/VSSR_1), which reinforces the stored data. For the node storing ‘0’, the corresponding read discharge transistor is OFF, which prevents a miswrite. Similarly, half-select is avoided during the read operation because CSL is column-based while WL is row-based, so the storage nodes of cells in standby remain isolated from the bitlines. Hence, the FDWA10T cell is half-select free and can support a bit-interleaving architecture (Pal et al. 2019c).
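The selection logic of Fig. 5 reduces to one rule: a bitline can reach a storage node only through a series pair of access devices (MN6 with MN4, or MN5 with MN3), so both the row-based WL and the column-based CSL must be HIGH. The sketch below applies that rule to the \(2\times 2\) array; it classifies cells only and does not model the analog disturbance, and the function names are hypothetical.

```python
def storage_node_exposed(wl_on, csl_on):
    """A bitline reaches 'L'/'H' only through the series pair MN6-MN4 (MN5-MN3)."""
    return wl_on and csl_on


def classify(row, col, sel_row, sel_col):
    """Selection state of cell (row, col) when cell (sel_row, sel_col) is written."""
    wl_on = row == sel_row   # WL is row-based
    csl_on = col == sel_col  # CSL is column-based
    if storage_node_exposed(wl_on, csl_on):
        return "selected"
    if wl_on:
        return "row half-selected: CSL LOW isolates storage nodes"
    if csl_on:
        return "column half-selected: WL LOW, VSSL/VSSR held at VDD"
    return "unselected"


# 2x2 array with the upper left cell (row 0, column 0) being written.
for r in range(2):
    for c in range(2):
        print((r, c), classify(r, c, sel_row=0, sel_col=0))
```

Because exactly one cell satisfies both conditions, no other cell in the array has a storage node connected to a driven bitline, which is the property that permits bit-interleaving.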

4 Simulation setup and results

The simulations are performed in HSPICE with the 16-nm CMOS Predictive Technology Model (NIMO 2008). To determine their relative strengths, various performance parameters of the proposed FDWA10T cell are compared with those of the previously proposed FD8T (Anh-Tuan et al. 2011) (see Fig. 1a), BI11T (Pal and Islam 2016b) (see Fig. 1b), SEDF9T (Tu et al. 2012) (see Fig. 2a), WWL12T (Yadav et al. 2017) (see Fig. 2b) and D12T (Ataei et al. 2016) (see Fig. 3a) cells. Since SRAM cells designed for WSN applications operate in the frequency range of 1–100 MHz (Bol et al. 2013; Myers et al. 2016), the proposed cell is operated at \(\sim\)24 MHz. Transistor sizing is a very important aspect of SRAM design; therefore, the transistors are sized in accordance with (Pal and Islam 2016b; Pal et al. 2019a). Process variations significantly affect operation in deep-submicron technology (Islam and Hasan 2012). Therefore, variations in process parameters, supply voltage and temperature are considered in accordance with (Pal and Islam 2016b).

Fig. 6

a Read delay of different cells at various supply voltages and b \(\textit{T}_{\mathrm{RA}}\) distribution plots of BI11T and FDWA10T cells @ \(\textit{V}_{\mathrm{DD}}\) = 300 mV

4.1 Read access time (\(\textit{T}_{\mathrm{RA}}\))

For cells with differential read schemes, \(\textit{T}_{\mathrm{RA}}\) (the read delay) is estimated as described in Pal et al. (2019e, f), while the method of Pal et al. (2019c) is used for the single-ended SEDF9T cell. Figure 6a compares the read delay of the cells at different \(\textit{V}_{\mathrm{DD}}\). SEDF9T exhibits the longest read delay, which can be attributed to the three stacked transistors in its read path as well as its single-ended bitline structure (Shin et al. 2017). The D12T cell reads faster than SEDF9T owing to its differential read operation. However, its read path contains three stacked transistors, compared to two in the proposed cell, which results in a longer delay than that of FDWA10T. In the BI11T cell, both the read discharge transistors (MN5 and MN6) and the access transistors (MN3 and MN4) are directly connected to the bitlines BL and BLB. This increases the overall bitline capacitance and slows the discharge compared to FDWA10T. Since WWL12T and FDWA10T have similar read paths, each consisting of two equally sized transistors connected to the bitline (see Figs. 2b, 3b), their read delays are also the same. The FD8T cell exhibits the shortest read delay because of its 6T-like structure and higher \(\beta\)-ratio. Consequently, Table 1, which lists the \(\textit{T}_{\mathrm{RA}}\) of the cells @ \(\textit{V}_{\mathrm{DD}}\) = 0.3 V, shows that the proposed cell has \(2.09\times\), \(1.73\times\) and \(1.47\times\) shorter \(\textit{T}_{\mathrm{RA}}\) than SEDF9T, BI11T and D12T, respectively, and a \(1.5\times\) longer read delay than FD8T.
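The bitline-capacitance argument can be made concrete with the first-order relation \(t = C_{BL}\,\Delta V / I_{\mathrm{read}}\), where \(\Delta V\) is the 50 mV sensing differential. In this sketch each cell hangs a number of drain junctions on the bitline: one for FDWA10T, where (as we read the description) only MN6/MN5 touch BLB/BL, and two for BI11T, whose access and read discharge transistors both connect to the bitlines. The row count, junction and wire capacitances and the read current are hypothetical, and the equal read current is an assumption; the numbers are not meant to reproduce the simulated ratios.

```python
def read_delay(c_bitline, i_read, dv_sense=0.05):
    """Time for a constant read current to develop dv_sense (V) on a bitline (s)."""
    return c_bitline * dv_sense / i_read


def bitline_capacitance(rows, devices_per_cell, c_junction, c_wire):
    """Wire capacitance plus the drain junctions that each cell hangs on the bitline."""
    return c_wire + rows * devices_per_cell * c_junction


# Hypothetical numbers: 128 rows, 0.05 fF per junction, 5 fF of wire, 20 nA read current.
ROWS, C_J, C_WIRE, I_READ = 128, 0.05e-15, 5e-15, 20e-9

c_one = bitline_capacitance(ROWS, 1, C_J, C_WIRE)  # one device per cell on the bitline
c_two = bitline_capacitance(ROWS, 2, C_J, C_WIRE)  # access + read device per cell
print(f"1 device/cell: {read_delay(c_one, I_READ):.2e} s, "
      f"2 devices/cell: {read_delay(c_two, I_READ):.2e} s")
```

At equal read current the delay scales linearly with bitline capacitance, so every extra device per cell on the bitline adds delay in proportion to the number of rows.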

Process variation is a key factor in submicron designs (Pal and Islam 2016b), so the cell must operate robustly under such conditions. Fig. 6b shows the \(\textit{T}_{\mathrm{RA}}\) distributions of the FDWA10T and BI11T cells at a supply voltage of 300 mV. The FDWA10T cell shows a \(1.05\times\) narrower spread in \(\textit{T}_{\mathrm{RA}}\) than the BI11T cell, which indicates that it is more robust.

Fig. 7

Butterfly curve of a FDWA10T cell and b FD8T cell (Anh-Tuan et al. 2011) at different supply voltages

Fig. 8

RSNM distribution plot of FD8T and FDWA10T cells @ \(\textit{V}_{\mathrm{DD}}\) = 300 mV

4.2 Read stability

Read stability is quantified by the read static noise margin (RSNM), which is estimated from the butterfly curve as described in Pal et al. (2019c) and shown in Fig. 7a. Since all cells except FD8T employ read-decoupling techniques, their storage nodes are isolated from the bitlines during read, and they are essentially free of read disturbance. Consequently, they exhibit identical RSNM values (see Table 1), which are much higher (\(3.94\times\) at \(\textit{V}_{\mathrm{DD}}\) = 300 mV) than that of FD8T, which is essentially a conventional 6T cell (Pal and Islam 2016b) with its storage nodes directly connected to the bitlines during the read operation (see Fig. 7b for the RSNM of FD8T).
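Extracting a noise margin from a butterfly curve amounts to finding the largest square that fits inside each lobe. The sketch below uses the well-known 45° coordinate-rotation approach (Seevinck-style): along lines of slope +1 the gap between the two curves equals the square's diagonal, so the side is that gap divided by \(\sqrt{2}\), and the SNM is the smaller of the two lobes' maxima. The tanh-shaped inverter transfer curve and its gain are hypothetical stand-ins for simulated VTCs; using the same curve for both inverters corresponds to a symmetric, read-disturb-free cell.

```python
import bisect
import math


def inverter_vtc(vin, vdd=0.3, gain=20.0):
    """Hypothetical smooth inverter transfer curve (tanh model), not a device model."""
    return 0.5 * vdd * (1.0 - math.tanh(gain * (vin - 0.5 * vdd)))


def interp(x, xs, ys):
    """Linear interpolation on strictly increasing xs, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


def butterfly_snm(f_a, f_b, vdd, n=2001):
    """Largest square in each butterfly lobe via 45-degree rotated coordinates.

    Curve A is y = f_a(x); curve B is the mirrored VTC x = f_b(y).
    With u = (x + y)/sqrt(2) and v = (y - x)/sqrt(2), a square's diagonal
    lies along u at fixed v, so its side is (gap in u) / sqrt(2).
    """
    r2 = math.sqrt(2.0)
    s = [vdd * k / (n - 1) for k in range(n)]
    # Curve A: v falls as x rises, so reverse to get increasing v.
    a = [((f_a(x) - x) / r2, (x + f_a(x)) / r2) for x in s][::-1]
    # Curve B: points (f_b(y), y); v rises with y.
    b = [((y - f_b(y)) / r2, (f_b(y) + y) / r2) for y in s]
    va, ua = [p[0] for p in a], [p[1] for p in a]
    vb, ub = [p[0] for p in b], [p[1] for p in b]
    lo, hi = max(va[0], vb[0]), min(va[-1], vb[-1])
    grid = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    du = [interp(v, va, ua) - interp(v, vb, ub) for v in grid]
    upper, lower = max(du) / r2, max(-d for d in du) / r2
    return min(upper, lower), upper, lower


vtc = lambda v: inverter_vtc(v, vdd=0.3, gain=20.0)
snm, upper, lower = butterfly_snm(vtc, vtc, vdd=0.3)
print(f"SNM = {snm * 1000:.1f} mV (lobes {upper * 1000:.1f} / {lower * 1000:.1f} mV)")
```

A steeper (higher-gain) transfer curve pushes the result towards the ideal limit of \(\textit{V}_{\mathrm{DD}}\)/2, while any asymmetry between the two curves shows up as unequal lobes, with the smaller one setting the SNM.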

Given the impact of PVT variations on SRAM cells, their influence on RSNM has also been examined. Fig. 8 shows the RSNM distributions of FD8T and FDWA10T at \(\textit{V}_{\mathrm{DD}}\) = 0.3 V. The RSNM of FD8T/FDWA10T has a mean (\(\mu\)) of 18.09 mV/71.91 mV and a standard deviation (\(\sigma\)) of 7.58 mV/8.80 mV, giving a variability (\(\sigma\)/\(\mu\)) of 0.42/0.12. As a result, the FDWA10T cell shows a \(3.5\times\) narrower spread in RSNM than the FD8T cell, which establishes its relative robustness.
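The spread comparison is the ratio of the two variabilities \(\sigma/\mu\). The short calculation below reproduces it from the reported statistics; carried out with unrounded values it gives about 3.42×, while the ratio of the rounded variabilities 0.42/0.12 gives the 3.5× quoted above.

```python
def variability(mu, sigma):
    """Relative spread sigma / mu of a Monte Carlo distribution."""
    return sigma / mu


# Reported RSNM statistics at VDD = 0.3 V (mV).
rsnm_fd8t = variability(mu=18.09, sigma=7.58)
rsnm_fdwa10t = variability(mu=71.91, sigma=8.80)
rsnm_ratio = rsnm_fd8t / rsnm_fdwa10t

print(f"FD8T sigma/mu = {rsnm_fd8t:.3f}, FDWA10T sigma/mu = {rsnm_fdwa10t:.3f}")
print(f"spread ratio = {rsnm_ratio:.2f}")
```

Normalizing by the mean matters here: FDWA10T actually has a slightly larger absolute \(\sigma\) than FD8T, but its much larger mean RSNM makes its relative spread far smaller.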

Fig. 9

a Node voltages of the FDWA10T cell during the write ‘1’ operation and b write delay of different cells at various supply voltages

Fig. 10

a WSNM values of different cells at 300 mV supply voltage and b WSNM distribution plot of FD8T and FDWA10T at \(\textit{V}_{\mathrm{DD}}\) = 300 mV

Fig. 11

a Static power dissipation values of different cells at various supply voltages and b HSNM of FDWA10T and BI11T cells at 300 mV supply voltage

4.3 Write access time (\(\textit{T}_{\mathrm{WA}}\))

The write delay (\(\textit{T}_{\mathrm{WA}}\)) for writing ‘1’ at a node initially storing ‘0’ is measured as described in Islam and Hasan (2012). Fig. 9a shows the node voltages during the write operation of the FDWA10T cell, and Fig. 9b shows the write delay of the cells at various supply voltages. The BI11T cell (Pal and Islam 2016b) exhibits the longest write delay because it uses an NMOS device, MN9, to drive LWL from WL, which reduces the LWL voltage swing and weakens the driving strength of its access transistors. SEDF9T (Tu et al. 2012) shows a considerably large \(\textit{T}_{\mathrm{WA}}\) because its single-ended asymmetric structure particularly degrades the write ‘1’ ability. The differential WWL12T cell shows a slightly shorter write delay than SEDF9T. However, the write ‘1’ operation is particularly difficult in the WWL12T cell, because MN8/MN7 is ON for a brief initial period and provides an unfavorable discharging path to the high BLB/BL, which slows the charging of the ‘0’-storing node. The D12T cell, which has an additional discharging path, therefore shows a shorter write delay than WWL12T. In the proposed cell, \(\textit{V}_{\mathrm{SSL}}\)/\(\textit{V}_{\mathrm{SSR}}\) is held at either \(\textit{V}_{\mathrm{DD}}\)/GND or GND/\(\textit{V}_{\mathrm{DD}}\) during the write operation, which speeds up writing by providing multiple write paths. To write ‘1’ to node ‘L’ (initially storing ‘0’) of FDWA10T (see Fig. 3b), BLB is held at \(\textit{V}_{\mathrm{DD}}\), which charges ‘L’ through MN6 and MN4. Initially, MN8 is OFF because ‘L’ stores ‘0’. However, as the voltage at ‘L’ rises, MN8 turns ON and provides an additional path for charging node ‘L’ through MN8 and MN4. Similarly, since node ‘H’ initially stores ‘1’, MN7 is ON for a brief period and provides an additional discharging path through MN3 and MN7 to \(\textit{V}_{\mathrm{SSR}}\) (held at GND). Consequently, FDWA10T exhibits a shorter \(\textit{T}_{\mathrm{WA}}\) than D12T, owing to its additional charging path and to having two transistors in its charging/discharging paths compared to three in D12T. The FD8T cell, with a single access transistor in its access path, provides the shortest write delay. Therefore, Table 1, which lists the \(\textit{T}_{\mathrm{WA}}\) of the cells @ \(\textit{V}_{\mathrm{DD}}\) = 0.3 V, shows that the \(\textit{T}_{\mathrm{WA}}\) of the FDWA10T cell is \(4.33\times\), \(1.86\times\), \(1.59\times\) and \(1.20\times\) shorter than that of BI11T, SEDF9T, WWL12T and D12T, respectively, while being \(2.42\times\) longer than that of FD8T.

Table 1 Comparison among different SRAM cells @ \(\textit{V}_{\mathrm{DD}}\) = 0.3 V

4.4 Write ability

Write ability reflects the ease with which the desired data can be written into an SRAM cell (Pal and Islam 2016b) and is estimated in terms of the write static noise margin (WSNM) (Pal et al. 2019d). Fig. 10a shows the WSNM of the compared cells. The read and write VTCs meet at a single point, which indicates a successful write operation (Pal et al. 2014). SEDF9T, WWL12T and D12T show poor write ability owing to their multiple access transistors in series (see Fig. 10a). The FDWA10T cell exhibits a \(4.1\times\) larger WSNM than these cells owing to its multiple write paths, since \(\textit{V}_{\mathrm{SSL}}\) and \(\textit{V}_{\mathrm{SSR}}\) are held at \(\textit{V}_{\mathrm{DD}}\) or GND depending on the data to be written. However, FD8T/BI11T, which have no series access transistors, show \(1.32\times /1.02\times\) higher write ability than the proposed cell. Moreover, the inverter in FD8T (see Fig. 1a), made up of MP3 and MN5, provides a full LWL swing and, unlike that of BI11T, does not reduce the driving strength of the cell; as a result, FD8T records a higher WSNM than BI11T.

Given the severe performance degradation caused by PVT variations in submicron SRAM designs, the performance of a memory cell under such variations must be taken into account. Fig. 10b shows the WSNM distributions of FD8T and FDWA10T at a supply voltage of 0.3 V. The WSNM of FD8T/FDWA10T has a mean (\(\mu\)) of 98 mV/74 mV and a standard deviation (\(\sigma\)) of 19.6 mV/10.4 mV, giving a variability (\(\sigma\)/\(\mu\)) of 0.20/0.14. As a result, the proposed cell shows a \(1.43\times\) narrower spread in WSNM than FD8T, which indicates the robustness of the proposed design.

4.5 Leakage power dissipation

SRAM cells mostly operate in standby mode. Therefore, in submicron technologies, standby or hold power \((\textit{H}_{\mathrm{PWR}})\) dissipation, which is caused by increasing leakage current, is the major component of the overall power consumption (Islam and Hasan 2012). Fig. 11a shows the \(\textit{H}_{\mathrm{PWR}}\) of the cells at different \(\textit{V}_{\mathrm{DD}}\). SEDF9T and D12T consume less power than WWL12T because their read paths consist of stacked transistors, which reduce the bitline leakage. WWL12T consumes the highest hold power among the compared cells because one of the PMOS devices in its access TGs is always ON, which increases the leakage current (see Fig. 2b). In FDWA10T, however, the additional stacked tail transistor MN9 in the core cell significantly reduces the leakage current. In addition, both \(\textit{V}_{\mathrm{SSL}}\) and \(\textit{V}_{\mathrm{SSR}}\) are held at \(\textit{V}_{\mathrm{DD}}\) in standby mode, which curtails leakage in the read path. As a result, Table 1, which lists the \(\textit{H}_{\mathrm{PWR}}\) of the cells @ \(\textit{V}_{\mathrm{DD}}\) = 0.3 V, shows that the proposed cell consumes \(1.03\times\), \(1.03\times\), \(1.06\times\) and \(2.09\times\) lower static power than D12T, FD8T, SEDF9T and WWL12T, respectively. BI11T (see Fig. 1b) consumes very low power (\(6.4\times\) lower than FDWA10T; see Fig. 11a) because its tail transistor MN7 is kept OFF in hold mode, which disconnects the core inverters from ground so that leakage current cannot flow to ground. However, this advantage comes at the cost of a severely degraded hold static noise margin (HSNM), \(3.74\times\) smaller than that of the FDWA10T cell (see Fig. 11b).
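The stack effect invoked for the SEDF9T and D12T read paths can be illustrated with a textbook first-order model of two series OFF NMOS devices: the intermediate node floats up to a voltage \(v_x\) at which both devices carry the same current, giving the upper device a negative \(V_{GS}\) and a smaller \(V_{DS}\). The sketch below solves for \(v_x\) by bisection. All device parameters (threshold, slope factor, DIBL coefficient, \(I_0\)) are hypothetical and the body effect is ignored; it illustrates the mechanism, not any of the simulated cells.

```python
import math

U_T = 0.0259  # thermal voltage kT/q near 300 K (V)


def i_sub(vgs, vds, vt=0.35, n=1.5, eta=0.1, i0=1e-7):
    """First-order subthreshold current (A) with DIBL coefficient eta.

    All parameters are illustrative; the body effect is ignored.
    """
    return (i0 * math.exp((vgs - vt + eta * vds) / (n * U_T))
            * (1.0 - math.exp(-vds / U_T)))


def stacked_leakage(vdd, iters=100):
    """Leakage of two series OFF NMOS devices with both gates at 0 V.

    Bisection finds the intermediate-node voltage vx at which the upper
    device (VGS = -vx, VDS = vdd - vx) and the lower device (VGS = 0,
    VDS = vx) carry the same current.
    """
    lo, hi = 0.0, vdd
    for _ in range(iters):
        vx = 0.5 * (lo + hi)
        if i_sub(0.0, vx) < i_sub(-vx, vdd - vx):
            lo = vx  # net current flows into the node, so vx must rise
        else:
            hi = vx
    vx = 0.5 * (lo + hi)
    return i_sub(0.0, vx), vx


VDD = 0.3
single = i_sub(0.0, VDD)
stacked, vx = stacked_leakage(VDD)
print(f"single: {single:.3e} A, stacked: {stacked:.3e} A (vx = {vx * 1000:.1f} mV)")
print(f"reduction: {single / stacked:.1f}x")
# With VDS = 0 (both ends of a path at the same potential), the model gives no current.
print(i_sub(0.0, 0.0))
```

The last line reflects the second leakage measure in the text: with \(\textit{V}_{\mathrm{SSL}}\)/\(\textit{V}_{\mathrm{SSR}}\) and the bitlines all at \(\textit{V}_{\mathrm{DD}}\) during hold, the read path has no drain-source voltage to drive leakage in this first-order model.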

Fig. 12

The layout design of FDWA10T cell

Fig. 13

Read delay distribution plot of BI11T and FDWA10T cells @ \(\textit{V}_{\mathrm{DD}}\) = 300 mV at 45-nm technology node

Fig. 14

RSNM distribution plot of FD8T and FDWA10T cells @ \(\textit{V}_{\mathrm{DD}}\) = 300 mV at 45-nm technology node

Fig. 15

WSNM distribution plot of FD8T and FDWA10T cells @ \(\textit{V}_{\mathrm{DD}}\) = 300 mV at 45-nm technology node

4.6 Layout area and validation

The layouts of the cells compared in this paper have been designed using 45-nm technology rules. Table 1 lists the layout area of each cell normalized to that of the FDWA10T cell (see Fig. 12). The FD8T and SEDF9T cells occupy \(0.91\times\) and \(0.87\times\) the area of the proposed cell because they have fewer transistors. In contrast, the BI11T/WWL12T/D12T cells occupy more area (\(1.14\times /1.26\times /1.23\times\)) than FDWA10T because they have more transistors than the proposed cell. Taking the area of the tail transistor (MN9) into account increases the area consumption. To include it in the layout estimate, the tail transistor is drawn within the row pitch at the leftmost side of the row (not shown in the layout). When the tail-transistor area is shared by the cells of a row (assuming a word size of 16 bits), the area per cell increases by 7%. Because the layouts in this work are designed in the 45-nm technology node, the major design metrics have also been simulated using the 45-nm PTM model. To gauge the impact of process variation on cell performance, variations in the same parameters as described earlier in Section 4 have been applied to the 45-nm PTM model in Monte Carlo simulations with a sample size of 5000. With this model at a supply voltage of 300 mV, the proposed FDWA10T cell exhibits \(4.21\times /3.52\times /2.95\times\) shorter read delay than SEDF9T/BI11T/D12T and \(4.55\times\) higher RSNM than FD8T. Moreover, it shows \(6.74\times /3.96\times /2.33\times /1.87\times\) shorter write delay than BI11T/SEDF9T/WWL12T/D12T and \(5.6\times\) higher WSNM than SEDF9T/D12T/WWL12T. Furthermore, FDWA10T consumes \(3.82\times /2.65\times /2.11\times /2.11\times\) lower \(\textit{H}_{\mathrm{PWR}}\) than WWL12T/SEDF9T/FD8T/D12T.
Since process variation is severe in the subthreshold region, its impact on read delay, RSNM and WSNM has been analyzed. Fig. 13 shows that the proposed cell exhibits a \(1.12\times\) narrower spread in \(\textit{T}_{\mathrm{RA}}\) than BI11T. It also exhibits \(4.22\times\) and \(1.45\times\) narrower spreads in RSNM (see Fig. 14) and WSNM (see Fig. 15), respectively, than FD8T. Since these results follow the same trend as those obtained with the 16-nm PTM model, they support the credibility of the simulation results in this paper and the scalability of the proposed design to 16-nm technology and beyond.
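The 7% per-cell overhead of the shared tail transistor follows from amortizing one tail-transistor layout over the cells of a row. The sketch below assumes that the whole 7% comes from that amortization and that it is split evenly among 16 cells; under that assumption the tail-transistor layout corresponds to about 1.12× a cell (a derived figure, not a reported layout value), and the overhead halves whenever the number of cells per row doubles.

```python
def per_cell_area(a_cell, a_tail, cells_per_row):
    """Area per cell when one tail-transistor layout is shared by a whole row."""
    return a_cell + a_tail / cells_per_row


A_CELL = 1.0                 # FDWA10T cell area, normalized as in Table 1
A_TAIL = 0.07 * 16 * A_CELL  # derived from the 7% overhead at 16 cells per row

for cells in (16, 32, 64):
    print(f"{cells} cells/row: {per_cell_area(A_CELL, A_TAIL, cells):.4f}x")
```

This amortization is also why the tail transistor is placed per row rather than per cell: its cost shrinks as the row gets wider.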

5 Conclusion

A fully differential write assist 10T (FDWA10T) SRAM cell has been proposed in this work. It employs a read-decoupling technique to improve read stability and, by holding \(\textit{V}_{\mathrm{SSL}}\) and \(\textit{V}_{\mathrm{SSR}}\) at \(\textit{V}_{\mathrm{DD}}\) or GND during the write operation, provides multiple charging and discharging paths that improve the write ability of the cell. Leakage power dissipation is minimized because both \(\textit{V}_{\mathrm{SSL}}\) and \(\textit{V}_{\mathrm{SSR}}\) are held at \(\textit{V}_{\mathrm{DD}}\) in standby mode, and stacking transistor MN9 with the cross-coupled inverters further reduces leakage power. The design metrics of the cell have been analyzed under severe process variations, and improvements have been observed in most cases. In addition, the cell occupies less area than the WWL12T and D12T cells. Hence, the proposed FDWA10T cell is a reliable choice for fast, low-power and high-capacity cache memory in wireless sensor networks.