2.1 Introduction

With the advent of technologies such as wireless sensors, biomedical implants and the Internet of Things (IoT), ultra-low-power operation and a “normally-off, instant power-on” mode have become a necessity [1,2,3]. These systems wake up only sporadically, so leakage power is a dominant component of their power consumption. To minimize leakage power, power gating has been proposed, in which a lower voltage (the hold voltage) is applied to the volatile memory to retain data while all logic circuits are turned off [4]. However, even maintaining this hold voltage during power-down mode in high-performance processing units leads to substantial power dissipation due to leakage current, which is \(\approx \)40% of the dynamic energy [5]. Even worse, during abrupt power failures the data in volatile memory is lost and computation tasks have to be restarted. This happens because the CMOS memory cells used in conventional CPUs, such as SRAM-based caches and flip-flop (FF) based register files, are volatile. To mitigate these issues, different circuits have been designed to back up data from on-chip memory (SRAM), FFs and registers to off-chip nonvolatile memory (NVM), thus preserving the system state in case of power failure. This is known as the two-macro scheme, i.e. SRAM (for faster access) in conjunction with NVM (for nonvolatility). However, the main drawback of this approach is the long store/restore time caused by serial SRAM read/write and long NVM write/read procedures, which results in a long power-on/off time. Thus, the two-macro scheme is vulnerable to data loss in case of sudden power failure [6, 7]. To address these limitations, NVM elements are integrated directly into the SRAM or FF units, forming a direct bit-to-bit connection in a vertical arrangement that achieves faster parallel data transfer and turn-on/off speed. This gives rise to NV-SRAM/NV-FF units.

NVM technologies such as floating-gate based memories, PCM (Phase Change Memory), FRAM (Ferroelectric RAM), OxRAM (Oxide-based RAM), CBRAM (Conductive Bridge RAM) and STT-MRAM (Spin Transfer Torque based Magnetoresistive RAM) have emerged as promising solutions for realizing embedded nonvolatile logic. However, due to large access/programming times, high operating voltages and limited endurance, floating-gate (FLASH) memories are less favored. PCM devices, on the other hand, require a large current to heat the GST material for resistive switching between the crystalline and amorphous states. FRAM poses a number of challenges owing to data-signal degradation as devices are scaled. STT-MRAM also needs large programming currents to exert a spin torque on the magnetic moment of the free layer with respect to the fixed layer, which leads to higher power dissipation during the programming phase. As a result, OxRAM devices have emerged as a strong choice for hybrid CMOS-NVM nonvolatile circuits owing to their low cost, high density, low operating voltages, negligible leakage, access times about 1000\(\times \) faster than floating-gate memories, full CMOS compatibility, and the possibility of 3D integration and integration in vias [8,9,10,11]. In this chapter, we present the most important CMOS-OxRAM counterparts of conventional volatile memory systems, i.e. (i) NV-SRAM and (ii) NV-FF. These hybrid nonvolatile circuits offer advantages such as (i) nearly zero leakage, (ii) efficient backup/restore operation and (iii) high performance and low energy. We present a 4T-2R NV-SRAM bitcell that offers “real-time nonvolatility” and, using this bitcell, propose a novel NV-FF design. This chapter is organized as follows: Sect. 2.2 summarizes the NV-SRAM/NV-FF implementations proposed in the literature so far. Section 2.3 discusses our 4T-2R NV-SRAM bitcell and explains its programming schemes. Section 2.4 shows our novel real-time NV-FF implementation using the 4T-2R NV-SRAM bitcell and presents its operating modes. We also present a modified NV-FF design that offers better system performance than the aforementioned NV-FF design. The chapter concludes with Sect. 2.5.

2.2 Prior Art: NV-SRAM and NV-FF

This section gives an overview of developments in NV-SRAM/NV-FF circuit design using emerging NVM technologies.

2.2.1 Nonvolatile Static Random Access Memory

Memory architectures use a hierarchy of caches (L1, L2, last-level cache (LLC), etc.), and the optimization target for each cache level is different. L1 is accessed very frequently and therefore needs higher speed and write endurance, whereas the LLC is intended to minimize off-chip accesses and therefore needs a large capacity. Hence, an SRAM-based L1 cache is recommended for better performance [12], whereas emerging NVMs can be used in L2 or LLC (owing to their latency, density and write endurance values). To realize a nonvolatile cache using NVM, researchers have proposed various bitcell-level optimization schemes. One proposal is to use NV-SRAM (comprising a volatile and a nonvolatile circuit) for nonvolatile cache implementation. Under normal operation (when external power is supplied), the volatile circuit provides fast data access. When controlled power-down/sleep mode is enabled or there is a sudden power failure, the nonvolatile circuit provides data backup, thereby retaining the data previously stored in the volatile circuit. In the literature, several hybrid (CMOS-OxRAM/CMOS-MTJ (magnetic tunnel junction)/CMOS-PCM) NV-SRAM designs such as 9T-2R [13], 8T-2R [7, 14, 15], 8T-2MTJ [16], 8T-1R [17], 7T-2R [18, 19], 7T-1R [20, 21], 6T-2R [22], 6T-2MTJ [23], 4T-2R [24, 25] and 4T-2MTJ [26, 27] have been proposed. Figure 2.1 shows the circuit schematics of different NV-SRAM implementations. These implementations differ in their approach to storing data during power-down mode. Xue et al. [13] proposed a 9T-2R NV-SRAM bitcell in which an equalization transistor connected between the storage nodes is used in the data restore mode. However, the area of the 9T-2R bitcell is \(\approx \)230F\(^2\) compared to \(\approx \)140F\(^2\) for a conventional 6T SRAM. Furthermore, separate wordlines (WLs) are required for the storage nodes, which increases the number of control signals and leads to routing congestion. Chiu et al. [7, 14] proposed an 8T-2R bitcell for better density than 9T-2R. This bitcell uses a BL-CL (bitline-control line) sharing scheme to reduce area overhead and also enables a write-assist function. However, the drawback of this implementation is that it requires extra control lines for off-loading the data in power-down mode. To minimize leakage currents, Tasson et al. [17] proposed an 8T-1R NV-SRAM bitcell. The restore time of 8T-1R is \(\approx \)2.6\(\times \) that of 8T-2R [7] because of the multiple steps involved in the operation, and its read latency is higher than that of 9T-2R [13], 6T-2R [22] and conventional 6T SRAM. Using a 1T-2R off-loading storage element with a conventional 6T SRAM cell, Sheu et al. [18] proposed a 7T-2R bitcell. With this bitcell, the write margin improved by 1.03\(\times \) and 1.37\(\times \) compared to the 6T SRAM and 6T-2R [22] bitcells, respectively. However, compared to 8T-2R [7], the write margin and read stability are degraded. Furthermore, the area of the 7T-2R bitcell is 1.07\(\times \) more than the 6T-2R bitcell.

The NVM elements in [7, 13,14,15,16,17,18,19,20,21,22,23] store the NV-SRAM state only during controlled power-down or sleep mode, providing only ‘last-bit nonvolatility’, whereas the designs in [24,25,26,27] offer ‘real-time nonvolatility’ because the NVM devices participate actively in bitcell programming. In this chapter, we discuss our 4T-2R NV-SRAM work [24, 25], which offers ‘real-time nonvolatility’, and summarize its programming schemes.

Fig. 2.1 Circuit schematics of different NV-SRAM bitcells proposed in literature: (a) 9T-2R: WLL and WLR are separate WLs to control 1T-1R cells, (b) 8T-2R: SWL indicates NVM switch line, (c) 8T-1R, (d) 7T-2R, (e) 7T-1R, and (f) 6T-2R (redrawn from [13, 14, 17, 18, 20, 22]). Variable resistance here indicates NVM element

Fig. 2.2 Different NV-FF schematics proposed in literature: (a) OxRAM-based NV-FF [5], (b) STT-MTJ-based NV-FF [34], (c) SHE-MTJ-based NV-FF [35], (d) ferroelectric capacitors-based NV-FF [46]. The figures have been redrawn from the referenced papers

2.2.2 Nonvolatile Flip-Flops

Several NV-FFs have been proposed over time using emerging NVM devices such as OxRAM [5, 27,28,29,30,31], MTJ [32,33,34,35,36,37,38,39,40,41,42], ferroelectric capacitors [43,44,45] and transistors [6, 46]. These flip-flops provide on-demand, controlled data backup and restore whenever the appropriate backup signal is triggered. However, the additional circuit used as an off-loading data block leads to area and power overheads. Therefore, the major challenge in designing an NV-FF lies in achieving an area-efficient circuit together with high performance in terms of speed, power and energy. Considerable work has been done on the design and optimization of NV-FFs. Figure 2.2 shows some of the NV-FF designs proposed in the literature. Iyenger et al. [3] proposed an MTJ-based NV-FF with enhanced scan capability in two variants: Enhanced Scan Enabled NV-FF (ES NV-FF) and High Performance ES NV-FF (HPES NV-FF). In ES NV-FF, two parallel latches allowed enhanced scan and store-restore operations. The output of the master latch was connected to the slave latch as well as to the NV latch. The two MTJ devices were written serially during the negative pulse of the clock cycle, which limited the operating frequency of the FF. In HPES NV-FF, the MTJ devices were written in parallel, so the frequency of the FF was not compromised. The authors also reported that the cell area of ES NV-FF was \(\approx \)1.8\(\times \) that of a standard master–slave FF (MSFF), with a maximum frequency of 2 GHz. HPES NV-FF had an area overhead of \(\approx \)2.5\(\times \) that of MSFF, with a 2 GHz operating range. In [5], a bipolar OxRAM-based NV-FF was proposed. The off-loading NVM circuit, connected to the slave part of the FF, comprised two OxRAM devices whose operational modes were controlled by groups of transistors called NVM-L and NVM-R. Each NVM block was a 3T-1R structure that contributed to controlling the circuit and providing current compliance. The authors claimed that the circuit has zero standby leakage power and nonvolatility, at an area overhead of only 25% compared to the Balloon FF solution [47] and a 10% increase in CLK-Q delay compared to a normal FF. In [33] and [48], two MTJ devices were used for off-loading the data from the MSFF, and the MTJ devices retained the off-loaded state only during sleep mode. In these designs, the MTJ states were updated on every clock cycle, which increased the power consumption and reduced the FF speed and the endurance of the MTJs. Furthermore, Jung et al. [48] aimed to minimize short-circuit current by using a low-skewed NAND (LS-NAND), which was used to efficiently interface the two supply voltage levels of 1.1 and 1.8 V. In [32, 49, 50], the NV-FF was implemented as part of the write driver circuit. As a result, the transistor sizes in these designs were quite large, leading to higher parasitic capacitance. This affected the operational speed of the FF as well as its data integrity. The magnetic FF proposed by Sakimura et al. [32] gave a maximum operating frequency of 500 MHz with 1 ns data backup time. Endoh et al. [50] proposed a PFET-based 1T-1MTJ NV-FF with an operating frequency of 600 MHz. Kazi et al. [51] proposed two OxRAM-based NV-FFs exploiting sub-V\(_T\) operation, enabling zero-leakage sleep states. The FF operated at 2 V and had a current compliance of 10 \({\upmu }\)A. The write energy was OxRAM dependent, while the sub-V\(_T\) operation reduced the read energy by 5.4%. The restore operation was done at 0.4 V. In recent work by Kang et al. [52], a voltage-controlled magnetic anisotropy (VCMA) NV-FF was proposed, which exploited magnetic anisotropy assistance for faster switching of the magnetic devices used in the circuit. The authors reported that, owing to the VCMA phenomenon, the current density and pulse duration for MTJ switching can be greatly reduced. An improvement of 98.4% was observed in data backup energy for the VCMA STT-MRAM-based NV-FF, and an 89.5% improvement was observed in data backup delay, compared to a conventional STT-MRAM-based NV-FF. While this methodology was beneficial for the STT-MRAM-based NV-FF, the margin of improvement in the SHE-based NV-FF was smaller (74.6% in data backup energy and 19% in data backup delay). Bishnoi et al. [53] proposed a 2-MTJ-based NV-FF that reduced static power consumption by 5\(\times \) compared to CMOS-based FFs. However, the proposed design was bulky, as it required 32 transistors and 2 MTJ cells compared to the 26 transistors used in conventional CMOS-based FFs. A ferroelectric-based nonvolatile FF for wearable healthcare systems was proposed by Izumi et al. [54]. The FF stored complementary data in coupled ferroelectric capacitors, which enabled an 88% reduction in capacitor size. The FF had a read voltage margin of 240 mV at 1.5 V, which resulted in a low access energy of 2.4 pJ with 10-year (at 85 \(^{\circ }\)C) data-retention capability. Ali et al. [55] also proposed an MTJ-based NV-FF aimed at power-gating applications. The proposed design achieved 80% less area than a traditional STT-MRAM-based NV-FF, with a backup energy of 111 fJ and a restore energy of 6.9 fJ. The backup and restore times were 3 ns and 0.16 ns, respectively.

All the above designs off-load data when a controlled power-down signal is applied. They do not account for the fact that a power outage may also be caused by glitches, which leads to loss of data because the data is not backed up during the normal phase. For such cases, some designs use a battery backup: a sudden power loss switches the FF to a battery mode with enough charge to back up the states to the NVM block. This battery backup block requires extra area and therefore increases the overhead. Moreover, designs without a battery backup optimistically assume that power glitches will not corrupt the data. It is known that the circuit concepts used in developing NV-SRAM can be extended to designing NV-FFs [44]. We therefore take the above points into consideration and propose an NV-FF with real-time data backup, based on the 4T-2R NV-SRAM proposed in [24].

Fig. 2.3 (a) Circuit schematic of 4T-2R NV-SRAM bitcell (redrawn from [24]), (b) DC IV characteristics of 3 nm thick HfO\(_x\) based OxRAM device used in this study (modelled in [56])

2.3 NV-SRAM: Principle, Programming Schemes and Stability Analysis

The 4T-2R NV-SRAM bitcell discussed in this study is shown in Fig. 2.3a [24]. Figure 2.3b shows the IV characteristics of the 3 nm thick HfO\(_x\) based OxRAM devices, obtained using the compact model described in [56]. To realize nonvolatility in the 4T-2R NV-SRAM bitcell, the pull-up transistors of the SRAM bitcell are replaced by OxRAM devices. The OxRAM devices participate actively in NV-SRAM programming and help retain the logic state during power-down mode. The NV-SRAM bitcell has two modes of operation: Write mode and Read mode. The OxRAM devices are programmed only during Write mode. True nonvolatility of the NV-SRAM bitcell is achieved, as data can be retrieved from the OxRAM devices not only after a controlled power-down but also after an abrupt power failure.

2.3.1 Programming Schemes

To encode data in the OxRAM devices of the NV-SRAM, we have proposed different programming schemes [24, 25]. The schemes are classified by how they program the OxRAM devices: (i) sequential programming, in which the two OxRAM devices are programmed in two cycles, and (ii) parallel programming, in which both OxRAM devices are programmed in a single cycle. The working principle, advantages and trade-offs of these programming schemes are summarized below:

Fig. 2.4 During Write ‘1’ operation: switching in (a) Ox1 and (b) Ox2 devices for LRS-HRS and HRS1-HRS2 programming schemes [24]

2.3.1.1 Two-Cycle Programming Scheme

In the two-cycle programming scheme, the OxRAM devices in the bitcell are programmed serially. A two-cycle programming pulse with peak amplitude \(=\) 1.6 V is applied on PL. The PL pulse is 2 \(\upmu \)s long, with 1 \(\upmu \)s for RESET (PL \(=\) 1) and 1 \(\upmu \)s for SET (PL \(=\) 0) programming. During the first cycle (PL \(=\) 1), the OxRAM device connected to the internal node storing logic state ‘0’ undergoes RESET switching (as V\(_{TB}\) is negative), whereas during the second cycle (PL \(=\) 0), the OxRAM device connected to the internal node storing logic state ‘1’ undergoes SET switching (as V\(_{TB}\) is positive). Note: \(V_{TB}(Ox1) = V_{BL} - V_{PL}\), \(V_{TB}(Ox2) = V_{BLB} - V_{PL}\). Initially, both OxRAM devices are in a strong SET state and the current through them is of the order of a few mA (hence the higher power dissipation during the first programming cycle). Figure 2.4 shows the switching activity in both OxRAM devices while writing logic states ‘1’ and ‘0’ to the 4T-2R NV-SRAM bitcell. It is to be noted that the time required to program an OxRAM device to the RESET state (\(\approx \)470 ns) differs from the time required to program it to the SET state (\(\approx \)500 ns). During RESET programming (first cycle), \(\approx \)390 nA flows through the OxRAM device and the post-programming resistance is nearly 2 M\(\Omega \). During SET programming (second cycle), \(\approx \)2.8\( \upmu \)A flows through the OxRAM device and the post-programming resistance is nearly 268 k\(\Omega \). This programming methodology is called the two-cycle LRS-HRS scheme, and the resistance window achieved with it is \(\approx \)7.6\(\times \). In this scheme, the OxRAM device in LRS determines the limiting parameters for NV-SRAM performance, as the maximum current flows through it during both Read and Write operations. As the resistance of the OxRAM device decreases, larger pull-down transistors are required to handle the current flowing in the circuit. This undermines the inherent advantage of using fewer transistors in the 4T-2R NV-SRAM design. The other disadvantages of the LRS-HRS scheme are higher power dissipation, sneak paths and lower SNM (Static Noise Margin). To mitigate some of these issues, the more efficient HRS1-HRS2 programming scheme can be used instead of LRS-HRS. In the HRS1-HRS2 scheme, one of the OxRAM devices is programmed using a weak SET while the other is programmed to the RESET state. This lowers the switching energy per bit and the pull-down transistor area. In this scheme, the peak amplitude of PL is kept at 1.2 V (as 90 nm CMOS uses similar voltage ranges for its operation). To write logic ‘1’, the data is loaded onto BL and its complement onto BLB. While programming, the effective positive V\(_{TB}\) across the OxRAM device storing ‘1’ and the negative V\(_{TB}\) across the OxRAM device storing logic ‘0’ are smaller in magnitude than the corresponding values obtained when the OxRAM was programmed using PL \(=\) 1.6 V. This results in different SET and RESET resistance states (0.68 M\(\Omega \) and 2.04 M\(\Omega \), respectively) in the OxRAM devices using HRS1-HRS2 (see Fig. 2.4). Using HRS1-HRS2, V\(_{TB}\) for SET switching is 313 mV (compared to 750 mV using LRS-HRS) and for RESET switching is −780 mV (compared to −797 mV using LRS-HRS). The NMOS transistor width and write energy are reduced to 240 nm (640 nm in LRS-HRS) and 0.414 pJ (1.8 pJ in LRS-HRS) using the energy-efficient HRS1-HRS2 scheme. A detailed timing diagram for the two-cycle programming scheme is shown in Fig. 2.5a.
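The resistance figures quoted for the two-cycle scheme can be cross-checked with Ohm's law. The sketch below is only an arithmetic consistency check, not a device model: it assumes that the \(V_{TB}\) magnitudes quoted for LRS-HRS switching (797 mV and 750 mV) pair with the quoted RESET and SET currents (390 nA and 2.8 \(\upmu \)A), and all variable names are illustrative.

```python
# Arithmetic cross-check of the two-cycle numbers quoted in the text.
# ASSUMPTION: the |V_TB| values quoted for LRS-HRS switching pair with the
# quoted RESET/SET currents; this is a static R = V / I estimate only.

def resistance_ohm(v_tb_volt, current_amp):
    """Static resistance implied by a voltage/current pair."""
    return abs(v_tb_volt) / current_amp

# LRS-HRS scheme (PL peak = 1.6 V)
r_hrs = resistance_ohm(0.797, 390e-9)   # after RESET, text quotes ~2 MOhm
r_lrs = resistance_ohm(0.750, 2.8e-6)   # after SET, text quotes ~268 kOhm
window_lrs_hrs = r_hrs / r_lrs          # text quotes ~7.6x

# HRS1-HRS2 scheme (PL peak = 1.2 V): resistances are quoted directly
window_hrs1_hrs2 = 2.04e6 / 0.68e6

# Pull-down NMOS width saving of HRS1-HRS2 relative to LRS-HRS (nm / nm)
width_ratio = 640 / 240

print(f"R_HRS = {r_hrs / 1e6:.2f} MOhm, R_LRS = {r_lrs / 1e3:.0f} kOhm")
print(f"window LRS-HRS = {window_lrs_hrs:.2f}x, HRS1-HRS2 = {window_hrs1_hrs2:.2f}x")
print(f"pull-down width ratio = {width_ratio:.2f}x")
```

Under this pairing, the implied resistances reproduce the quoted values, and the HRS1-HRS2 window is visibly narrower than the LRS-HRS window, which is the price paid for the lower current and smaller pull-down transistors.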

Fig. 2.5 Operational modes for Read and Write operations in (a) two-cycle programming scheme, and (b) single-cycle programming scheme using pulse engineered signals at Programming line (PL) and Bitline (BL) [25]

Fig. 2.6 Applied PL, BL and BLB signals during Write logic ‘1’ operation for (a) Node Q, (b) Node QB. (c) Ox1 switching during RESET and SET regions (inset: switching activity during the first cycle), and (d) Ox2 switching during RESET region [25]

2.3.1.2 Single-Cycle Programming Scheme

In this scheme, the PL and BL signals are modified such that the OxRAM devices are programmed simultaneously in a single cycle. A triangular pulse with equal rise and fall times is applied to the PL line, providing the amplitude and polarity of \(V_{TB}\) required to switch both OxRAM devices in the NV-SRAM at once. Figure 2.5b shows the timing diagram for the single-cycle programming scheme. For a data write of ‘1’, the BL line is slowly ramped to 1.2 V while the BLB line is kept at 0 V. When the access transistors are turned on, the internal nodes Q and QB follow the data on BL and BLB. This action is reinforced by the cross-coupled connection between the NMOS pull-down transistors (M1 and M2). Figure 2.6a shows the triangular pulse applied to the PL line. Depending on the potential difference across the device (\(V_{TB}\)), set by the voltages at PL and BL/BLB, the OxRAM devices are either SET or RESET. For node QB (as shown in Fig. 2.6b), the polarity of \(V_{TB}\) stays negative (with peak amplitude −1.6 V) throughout the triangular single-cycle pulse applied at PL (as BLB \(=\) 0 V). The Ox2 device switches from LRS \(\rightarrow \) HRS, resulting in negligible current through it. As a result, QB stabilizes at 0 V (logic ‘0’) and transistor M1 is turned off. Figure 2.6c, d shows the resistive switching of Ox1 and Ox2. Owing to the modulation of \(V_{TB}\) across Ox1, the device switches twice in the first write cycle because it starts from an initial LRS state. Note that this double switching is a one-time phenomenon: it is visible only during the first write cycle unless the devices are re-initialized. Meanwhile, the potential drop across Ox2 is negative for the entire write cycle. A similar phenomenon is evident when writing data ‘0’ to the bitcell. Table 2.1 compares the resistive switching parameters for data writes of ‘0’ and ‘1’. For the proposed methodology, the device RESETs in 357 ns and SETs in 168 ns (logic ‘1’ write).
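The polarity argument for a single-cycle write of logic ‘1’ can be illustrated with idealized line voltages. The following is a minimal sketch, not the simulated waveform of Fig. 2.6: it uses \(V_{TB}(Ox1) = V_{BL} - V_{PL}\) and \(V_{TB}(Ox2) = V_{BLB} - V_{PL}\) with the line voltages themselves (ignoring access-transistor drops and internal node dynamics), and it assumes that BL ramps linearly over the whole 1 \(\upmu \)s cycle, which the text does not specify.

```python
# Idealised single-cycle write of logic '1' (line voltages only; illustrative).
# Time is normalised: x = t / T_CYCLE, with 0 <= x <= 1.

T_CYCLE_S = 1.0e-6   # write pulse width (from the text)
V_PL_PEAK = 1.6      # peak of the triangular PL pulse (from the text)
V_BL_PEAK = 1.2      # final BL level for data '1' (from the text)

def v_pl(x):
    """Symmetric triangular PL pulse with equal rise and fall times."""
    if x <= 0.5:
        return V_PL_PEAK * x / 0.5
    return V_PL_PEAK * (1.0 - x) / 0.5

def v_bl(x):
    """ASSUMPTION: BL ramps linearly from 0 V to its peak over the cycle."""
    return V_BL_PEAK * x

def v_tb_ox1(x):
    return v_bl(x) - v_pl(x)     # V_TB(Ox1) = V_BL - V_PL

def v_tb_ox2(x):
    return 0.0 - v_pl(x)         # V_TB(Ox2) = V_BLB - V_PL, with BLB = 0 V

N = 1000
xs = [i / N for i in range(N + 1)]
ox1 = [v_tb_ox1(x) for x in xs]
ox2 = [v_tb_ox2(x) for x in xs]

# First sample at which V_TB(Ox1) turns positive (start of SET polarity).
x_cross = next(x for x, v in zip(xs, ox1) if v > 0)

print(f"V_TB(Ox2): min = {min(ox2):.2f} V, max = {max(ox2):.2f} V")
print(f"V_TB(Ox1): min = {min(ox1):.2f} V, max = {max(ox1):.2f} V")
print(f"V_TB(Ox1) turns positive at t = {x_cross * T_CYCLE_S * 1e9:.0f} ns")
```

In this idealization \(V_{TB}(Ox2)\) never becomes positive and peaks at −1.6 V, while \(V_{TB}(Ox1)\) is first negative and then positive, which matches the RESET-then-SET sequence described for Ox1. The crossing instant depends entirely on the assumed BL ramp and should not be compared with the simulated switching times.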

Impact of PL and BL Signals on Single-Cycle Programming: In single-cycle operation, the amplitude, the rise and fall times and the pulse width of the control and data signals (PL and BL/BLB) are the key parameters that determine the operability of the NV-SRAM bitcell. For the OxRAM device used in the design, the pulse width of the write cycle is taken as 1 \(\upmu \)s. The programmed resistance state stored in the proposed 4T-2R NV-SRAM depends on the magnitude and polarity of \(V_{TB}\). The impact of the peak amplitude of BL (keeping PL fixed at 1.6 V) on OxRAM device switching is shown in Fig. 2.7a, b. The potential drop across the OxRAM devices (\(V_{TB}\)) is affected as the slope of the data signal BL is varied. As the maximum amplitude (\(V_{data_{max}}\)) is increased, \(V_{TB}(Ox1)\) decreases, which means that a weaker programming condition is applied to the device. This results in the OxRAM devices Ox1 and Ox2 being programmed to different SET and RESET resistance states.

Table 2.1 Absolute programming times for programming the OxRAM devices used in 4T-2R NV-SRAM bitcell [25]

The programming of the OxRAM devices also depends on the peak voltage of PL, as shown in Fig. 2.7c, d. Since the previous programming state of an OxRAM device governs its subsequent programming conditions (specifically, a RESET state followed by a SET state), the OxRAM devices switch at varying values of \(V_{TB}\) (Fig. 2.7c). It can be observed that the RESET switching times remain constant. This results in the same initial condition for Ox2 (Fig. 2.7d). From this figure it can be observed that the SET switching time of the OxRAM increases with the amplitude of PL, because more time is needed to build up the desired \(V_{TB}\) across the OxRAM terminals. Figure 2.7 shows that, by modulating the slopes of the PL and BL signals, the latency of the 4T-2R NV-SRAM bitcell can be tuned in the single-cycle programming approach.

Furthermore, by varying the rise and fall times of the applied PL signal (i.e. using an asymmetric triangular pulse), the programming time of the NV-SRAM bitcell can be tuned. This is because the rise and fall times determine the rate at which the potential drop across the device develops to program the OxRAM devices to the SET/RESET states. Figure 2.6a illustrates the modulation of the SET and RESET regions of the OxRAM device for the applied PL and BL/BLB signals. For switching of the device from HRS \(\rightarrow \) LRS, the state of the OxRAM is modulated by varying the rise and fall times of the PL signal (Fig. 2.8a, b). Note that the RESET operation of the device occurs during the rise time of PL (Fig. 2.6a). As the rise time is reduced, the slope of PL increases, so the \(V_{TB}\) required for the LRS \(\rightarrow \) HRS transition is reached sooner, reducing the switching time of the device. Correspondingly, the device achieves a SET state faster as the fall time of PL is increased. Figure 2.8c, d shows the variation in the resistance values and switching times of Ox1/Ox2 with the rise time of the asymmetric PL signal.
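The dependence of the RESET instant on the PL rise time follows from the geometry of a linear ramp. The sketch below is a simple illustration under stated assumptions: the bitline is held at 0 V (as for BLB when writing ‘1’), so \(|V_{TB}|\) equals the PL voltage, and the device is taken to switch when \(|V_{TB}|\) reaches a fixed level. The 0.78 V level is borrowed from the two-cycle HRS1-HRS2 figure purely as a placeholder; real switching also depends on device kinetics, which this ignores.

```python
# Time for a linear PL ramp to reach a given |V_TB| level (illustrative only).

V_PL_PEAK = 1.6        # V, PL peak amplitude (from the text)
V_RESET_LEVEL = 0.78   # V, ASSUMED switching level (placeholder, see lead-in)

def time_to_reach(level_v, rise_time, peak_v=V_PL_PEAK):
    """Instant on a linear rising edge at which the ramp equals level_v.

    The result is in the same unit as rise_time.
    """
    if not 0 < level_v <= peak_v:
        raise ValueError("level must lie within the ramp amplitude")
    return rise_time * level_v / peak_v

rise_times_ns = [200, 400, 600, 800]          # hypothetical rise times
reach_times_ns = [time_to_reach(V_RESET_LEVEL, t) for t in rise_times_ns]

for t_rise, t_reach in zip(rise_times_ns, reach_times_ns):
    print(f"rise time {t_rise} ns -> level reached after {t_reach:.1f} ns")
```

The reach time scales linearly with the rise time, which is the mechanism behind the statement that a shorter rise time (steeper PL slope) brings the LRS \(\rightarrow \) HRS transition forward.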

Fig. 2.7 Effect on \(V_{TB}\) required for switching, due to change in peak amplitude of (a, b) BL (1–1.3 V) keeping peak amplitude of PL \(=\) 1.6 V, and (c, d) peak amplitude of PL (1.5–1.8 V) keeping peak amplitude of BL \(= 1\) V [25]

An advantage of the single-cycle programming scheme over two-cycle programming is the lower energy required during the write operation (\(\approx \)80 fJ for HRS1-HRS2 compared to 1.8 pJ for the LRS-HRS scheme [25]). The low energy arises because the OxRAM devices stay in the RESET region for 60% of the total programming time, during which only a small current (\(\sim \)nA) flows through the device. Furthermore, the programming time of the single-cycle scheme is half that of the two-cycle scheme, making single-cycle programming an energy- and latency-efficient approach.
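The energy and latency figures quoted so far can be put side by side with a few lines of arithmetic. This sketch only tabulates ratios of numbers stated in the text; the 2 \(\upmu \)s duration for the two-cycle HRS1-HRS2 case is an assumption (the text gives the 2 \(\upmu \)s pulse for the two-cycle scheme in general), and the dictionary keys are illustrative labels.

```python
# Write energy (fJ) and write duration (us) per scheme, as quoted in the text.
# ASSUMPTION: the two-cycle HRS1-HRS2 write also uses the 2 us PL pulse.
schemes = {
    "two-cycle LRS-HRS":      {"energy_fj": 1800.0, "time_us": 2.0},
    "two-cycle HRS1-HRS2":    {"energy_fj": 414.0,  "time_us": 2.0},
    "single-cycle HRS1-HRS2": {"energy_fj": 80.0,   "time_us": 1.0},
}

reference = schemes["single-cycle HRS1-HRS2"]

# Ratio of each scheme's cost to the single-cycle scheme's cost.
relative_cost = {
    name: {
        "energy_x": s["energy_fj"] / reference["energy_fj"],
        "time_x": s["time_us"] / reference["time_us"],
    }
    for name, s in schemes.items()
}

for name, r in relative_cost.items():
    print(f"{name:24s} energy {r['energy_x']:5.2f}x  time {r['time_x']:.1f}x")
```

On these figures the single-cycle scheme needs roughly a twentieth of the LRS-HRS write energy and about a fifth of the two-cycle HRS1-HRS2 energy, at half the write time.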

2.3.1.3 Read Operation for Two-Cycle and Single-Cycle Programming Schemes

The approach to reading the programmed bitcell is the same for both two-cycle and single-cycle programming. To read the cell, the bitlines are precharged to \(V_{dd}/2\), which corresponds to state ‘a’ in Fig. 2.5. WL is then asserted and a read voltage is applied to PL (state ‘b’). Current flows through Ox1 and Ox2 according to the resistance state to which each is programmed. An OxRAM device programmed to a higher resistance value allows less current to flow than one programmed to a low resistance state. The current through the device charges or discharges the internal node and, in turn, pulls the BL/BLB lines up or down. This approach is similar to the read in a conventional SRAM cell. In this case, the sense amplifier used to differentiate the data written in the bitcell can be a voltage control sense amplifier (VCLA). Another approach to reading the bitcell is to use the read voltage to capture the current through the device, using a current controlled sense amplifier (CCSA). In this scheme, a read voltage (\(V_{read}\)) is applied to PL and a current corresponding to the resistive state flows through the device. Since WL is asserted, the current flows through the BL/BLB lines and is converted to voltage levels by the sense amplifier, enabling the data to be read from the bitcell. The advantage of this read scheme is that no precharge circuit is needed for the bitlines, which reduces the area overhead of the overall NV-SRAM array.
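The current-based read can be sketched as a comparison of two branch currents. This is a deliberately simplified illustration: the read voltage of 0.2 V is a hypothetical value (the text does not give \(V_{read}\)), the series resistance of the access and pull-down transistors is ignored, and the `sense` function is an illustrative stand-in for the sense amplifier, not its circuit. The resistances are the HRS1-HRS2 values quoted earlier, with the device on the node storing ‘1’ in the lower-resistance state.

```python
# Simplified current-mode read of the 4T-2R bitcell (illustrative only).

V_READ = 0.2          # V, HYPOTHETICAL read voltage (not given in the text)
R_WEAK_SET = 0.68e6   # ohm, HRS1 (weak SET) resistance quoted in the text
R_RESET = 2.04e6      # ohm, HRS2 (RESET) resistance quoted in the text

def read_current(v_read, r_ohm):
    """Branch current for an ideal resistor; transistor drops are ignored."""
    return v_read / r_ohm

def sense(i_ox1, i_ox2):
    """Return the stored bit: Ox1 in the lower-resistance state encodes '1'."""
    return 1 if i_ox1 > i_ox2 else 0

i_low_r = read_current(V_READ, R_WEAK_SET)
i_high_r = read_current(V_READ, R_RESET)

print(f"I(weak SET) = {i_low_r * 1e9:.0f} nA, I(RESET) = {i_high_r * 1e9:.0f} nA")
print("cell storing '1' reads as", sense(i_low_r, i_high_r))
print("cell storing '0' reads as", sense(i_high_r, i_low_r))
```

In this idealization the ratio of the two branch currents equals the resistance window, so a wider window gives the current-mode sense amplifier a larger difference to resolve.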

Fig. 2.8 Change in \(V_{TB}\) at switching instant, for (a) Ox1 and (b) Ox2 with change in the duration of rising edge of PL voltage pulse. Change in the (c) switching time and (d) resistance of OxRAM devices (after successful Write operation) by modulating the duration of rising edge of the pulse applied at PL, keeping peak amplitude of BL signal at 1 V [25]

2.3.2 Stability Analysis

The stability of a memory cell is an important aspect because it quantifies the amount of noise the cell can tolerate without flipping the logic state stored in it. If the noise crosses the threshold value, the stability of the cell is compromised by unwanted fluctuations at the output node. This degradation further leads to read disturbs and write failures. The key aspects of cell stability are captured by two approaches: butterfly curves for read, write and hold [58], and the N-curve [59]. The metrics obtained from these approaches enable designers to make a more robust and stable cell [59]. Using the voltage and current information from these stability approaches, a designer can understand the implications of the stability metrics for the intrinsic and extrinsic properties of the bitcell.

2.3.2.1 Static Noise Margin (SNM)

Conventionally, the stability of an SRAM bitcell is defined using SNM [58]. SNM is the maximum value of DC noise voltage \(V_{n}\) that can be tolerated by the memory bitcell without changing its logic state. For the 4T-2R NV-SRAM bitcell (at the 90 nm technology node), the hold, read and write SNM are 0.3 V, 0.13 V and 0.42 V, respectively. For an SRAM bitcell with cell ratio (CR) 2 and pull-up ratio (PR) 1, the hold, read and write SNM values are 0.5 V, 0.15 V and 0.5 V, respectively [57]. Figure 2.9a–c shows the effect of \(V_{dd}\) scaling on the hold, read and write SNM of the 4T-2R NV-SRAM. Figure 2.9d–f shows the hold, read and write SNM curves of the 4T-2R NV-SRAM for pull-down transistor (M1 and M2) widths in the range 200 nm–2 \(\upmu \)m. The width of M3/M4 is kept constant at 180 nm. It is observed that read SNM is a strong function of CR. For lower CR values the Read operation fails; hence, for reliable Read operation, CR needs to be equal to or greater than 2.2. Furthermore, although a pull-down transistor (M1 and M2) width of 200 nm (CR \(\approx \) 1.11) is desirable for a successful Write operation, the bitcell needs to be designed with CR \(\ge \) 2.2 because the Read operation is otherwise destructive [57].
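The sizing constraint can be made concrete with a short calculation. The sketch assumes that CR is the ratio of pull-down width (M1/M2) to access-transistor width (M3/M4) at equal channel length, which is consistent with the quoted pair of 200 nm and CR \(\approx \) 1.11; the function names are illustrative.

```python
# Cell ratio (CR) sizing check for the 4T-2R bitcell.
# ASSUMPTION: CR = W_pull_down / W_access with equal channel lengths.

W_ACCESS_NM = 180.0   # width of M3/M4 (from the text)
CR_MIN_READ = 2.2     # minimum CR for reliable Read (from the text)

def cell_ratio(w_pull_down_nm, w_access_nm=W_ACCESS_NM):
    """Cell ratio for a given pull-down transistor width."""
    return w_pull_down_nm / w_access_nm

def min_pull_down_width_nm(cr_min=CR_MIN_READ, w_access_nm=W_ACCESS_NM):
    """Smallest pull-down width that satisfies the read-stability CR."""
    return cr_min * w_access_nm

cr_write_friendly = cell_ratio(200.0)     # width preferred for Write
w_min_read = min_pull_down_width_nm()     # width needed for reliable Read

print(f"CR at 200 nm pull-down width: {cr_write_friendly:.2f}")
print(f"minimum pull-down width for CR >= {CR_MIN_READ}: {w_min_read:.0f} nm")
```

Under this assumption, meeting CR \(\ge \) 2.2 requires a pull-down width of roughly 396 nm, about twice the 200 nm width that would be preferred for writing.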

Fig. 2.9 Simulated 4T-2R NV-SRAM bitcell: (a, d) Hold SNM, (b, e) Read SNM and (c, f) Write SNM, for different \(V_{dd}\) values and different pull-down transistor widths respectively. N-curves for 6T SRAM and 4T-2R NV-SRAM are shown in (g) and impact of (h) different \(V_{dd}\) and (i) pull-down transistor widths on N-curves of 4T-2R NV-SRAM bitcell [57]

2.3.2.2 N-curve

SNM considers only the voltage metrics of the SRAM/NV-SRAM cell when analyzing bitcell stability. The N-curve method [59], which considers both voltage and current metrics, gives the following stability metrics: SVNM (static voltage noise margin), SINM (static current noise margin), WTV (write-trip voltage) and WTI (write-trip current). The read stability criterion is defined using SVNM and SINM. A small SVNM combined with a large SINM (or vice versa) results in a stable cell because the \(V_{n}\) required to disturb the cell is large [59]. Table 2.2 summarizes the N-curve parameters calculated for 6T SRAM and 4T-2R NV-SRAM [57]. The N-curve characteristics are plotted (Fig. 2.9g–i) by modulating the pull-down transistor width (i.e. by changing CR) of the NV-SRAM cell and the \(V_{dd}\) amplitude. It is observed that with increasing \(V_{dd}\) and pull-down transistor size, SINM, WTI and WTV improve, while SVNM remains almost constant.

Table 2.2 N-curve parameters for 6T SRAM and 4T-2R NV-SRAM bitcell [57]

The 4T-2R NV-SRAM bitcell offers several advantages over other NV-SRAM designs proposed in literature: (i) real-time nonvolatility, (ii) unconventional transistor sizing, (iii) low area footprint and (iv) low-power operation. For a quantitative comparison, Table 2.3 compares the 4T-2R NV-SRAM bitcell with other NV-SRAM implementations proposed in literature so far.

Table 2.3 Comparison of different 4T-2R NV-SRAM bitcells and state-of-the-art 6T SRAM bitcell [25]

2.4 Real-Time NV-FF Based on 4T-2R NV-SRAM Circuit

Figure 2.10 shows the schematic of the proposed real-time NV-FF. The circuit uses four OxRAM devices to store the data in real-time when it is transferred from D to Q. The major advantages of this NV-FF are:

  • The circuit occupies a smaller area than both the conventional CMOS based FF and the off-loading-based NV-FF.

  • The circuit offers zero leakage current during off-state of the NV-FF.

  • The circuit protects the data against power glitches during the active/normal operating mode, which could otherwise corrupt it.

  • The circuit is easy to design and replaces the PMOS transistors in the conventional CMOS based FF and NV-FF. It is therefore cost effective.

The proposed NV-FF design consists of two modules: (1) a master block and (2) a slave block. Unlike a traditional NV-FF, which has three operating modes (active, store and restore), the proposed NV-FF has only two: active/normal mode (which also stores the data to the nonvolatile device) and restore mode.

Fig. 2.10 Schematic of the real-time NV-FF. It is similar to the conventional CMOS based FF in terms of the modules constituting it: Master Block and Slave Block

2.4.1 Operating Modes of the Real-Time NV-FF

  1. Active/Normal Mode: In this mode, when CLK \(=\) 0 is asserted and data D \(=\) 0 is latched at the input, the NMOS transistor M1 in the master block turns on. This discharges the load capacitor C\(_{L1}\) of the master block to ‘0’. To store the data in the OxRAM device, PL is applied as a two-cycle signal that transitions from ‘1’ to ‘0’ after a pulse of duration equal to the switching time of the OxRAM device. As the internal capacitor (C\(_{L1}\)) discharges, a low potential appears at the bottom electrode of the OxRAM device connected to the M1 transistor. When PL \(=\) 1, this OxRAM is programmed to the RESET state (HRS) since V\(_{TB}<0\). PL \(=\) 1 is followed by PL \(=\) 0, which does not disturb the state of the OxRAM since V\(_{TB}=0\). A ‘0’ at the output node of the 1T-1R in the feed-forward path of the master block switches off the NMOS transistor M2 in the master block’s feedback path. The OxRAM device connected to M2 slowly charges the internal node capacitor C\(_{L2}\) to logic ‘1’ and holds the state. The programming of this OxRAM device in the feedback path is similar to that in the feed-forward path. At PL \(=\) 1, the OxRAM in the feedback path is not programmed as V\(_{TB}\approx 0\). When PL goes to ‘0’, the charging of the internal load capacitor C\(_{L2}\) makes V\(_{TB}>0\), which programs the OxRAM to the SET state. Note that the two OxRAM devices in the block are always programmed to opposite states whenever data is applied. The master block is followed by an inverter whose output is fed as the input to the slave block. Therefore, the internal node Q\(_m\) holds the inverted value of the input D (Q\(_m = 1\) when D \(=\) 0). When CLK \(=\) 1, D is isolated from the master block and the output of the 1T-1R in the master block’s feedback path is connected to the gate of the transistor M1. Q\(_m = 1\) is applied to the gate of the transistor M3, turning it on. This discharges the load capacitor C\(_{L1}\) of the slave block to ‘0’, giving an output of ‘0’ at Q. Q \(=\) 0 turns off the transistor M4 and therefore the OxRAM connected to it slowly charges the internal node capacitor C\(_{L2}\) to ‘1’. The programming of the OxRAM devices in the slave block is the same as that in the master block.

    A point to note here is that there is no external control signal which monitors/triggers the data off-loading. This reduces the number of external connections to the NV-FF thus easing routing and pin/terminal congestion.

  2. Power-down Mode: During the power-down mode, all the signals are pulled down to zero and the FF goes into standby. Since the nonvolatile devices store the data as their resistive state, it is not lost and can be restored when the NV-FF is powered up again.

  3. Restore Mode: CLK \(=\) 0 and PL \(=\) 1 are asserted when the NV-FF block is turned on. This allows a current to flow through the OxRAM devices (connected to M3 and M4) depending on the resistive state to which they are programmed. The OxRAM device connected to M4 (programmed to the SET state) charges the gate of transistor M3 to logic ‘1’, turning it on. This discharges the internal node capacitor C\(_{L1}\) in the slave block to logic ‘0’, restoring the data at the output Q of the NV-FF.
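The programming sequence described for the D \(=\) 0 case can be summarised with a small behavioural sketch. It is an idealised model, not the authors' simulation setup: switching is decided purely by the sign of V\(_{TB}\) (real devices have finite SET/RESET thresholds and switching times), and the bias values are normalised signs rather than real voltages.

```python
HRS, LRS = "HRS", "LRS"


def program_oxram(state, v_tb):
    """Idealised bipolar switching: only the sign of V_TB matters."""
    if v_tb > 0:
        return LRS      # SET
    if v_tb < 0:
        return HRS      # RESET
    return state        # V_TB = 0 leaves the stored state untouched


def apply_pl_pulse(initial_state, v_tb_sequence):
    """Apply the V_TB seen during PL = 1 and then during PL = 0."""
    state = initial_state
    for v_tb in v_tb_sequence:
        state = program_oxram(state, v_tb)
    return state


# V_TB signs for D = 0 as described in the text (normalised, hypothetical
# magnitudes): first entry is the PL = 1 phase, second is the PL = 0 phase.
FEED_FORWARD_D0 = (-1.0, 0.0)   # RESET during PL = 1, undisturbed at PL = 0
FEEDBACK_D0 = (0.0, +1.0)       # idle during PL = 1, SET when PL goes to 0

if __name__ == "__main__":
    for start in (HRS, LRS):
        r1 = apply_pl_pulse(start, FEED_FORWARD_D0)
        r2 = apply_pl_pulse(start, FEEDBACK_D0)
        print(f"start={start}: feed-forward -> {r1}, feedback -> {r2}")
```

Whatever the initial states, the two devices end up in opposite states (feed-forward in HRS, feedback in LRS), which is the complementary behaviour noted in the description of the active mode.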

We can observe here that the data off-loading occurs in the normal mode, but only the OxRAM devices in the slave block participate in restoring the data. In addition, this NV-FF is slower than the conventional NV-FF since the total time to transfer the data to the output (T\(_{D-Q}\)) is \(\approx \)2\(\times \) the programming time of the OxRAM device. The total time needed to transfer and store the data in real-time is:

$$\begin{aligned} T_{D-Q} = T_{master}+T_{slave} \end{aligned}$$
(2.1)
$$\begin{aligned} T_{D-Q} = {}&\max (T_{feed-forward_{OxRAM}},T_{feedback_{OxRAM}}) \\ &+\max (T_{feed-forward_{OxRAM}},T_{feedback_{OxRAM}}) \end{aligned}$$
(2.2, 2.3)

Say T\(_{feed-forward_{OxRAM}}\) = T\(_{feedback_{OxRAM}} =\) T, then

$$\begin{aligned} T_{D-Q} = 2T \end{aligned}$$
(2.4)

Therefore, the performance of the NV-FF in this case heavily depends on the programming time of the OxRAM device. As the technology improves, faster OxRAM devices are being proposed. Therefore, such a design proves to be beneficial in terms of area and performance.
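The relation between T\(_{D-Q}\) and the OxRAM programming times can be sketched as below. The split into separate master and slave arguments is an assumption made for illustration (the equations use the same symbols for both blocks), and the example times are hypothetical placeholders rather than values from the simulation.

```python
def t_dq(master_feed_forward, master_feedback, slave_feed_forward, slave_feedback):
    """Data-to-output delay: each block waits for its slower OxRAM device."""
    t_master = max(master_feed_forward, master_feedback)
    t_slave = max(slave_feed_forward, slave_feedback)
    return t_master + t_slave


if __name__ == "__main__":
    # Equal programming times T in every path give T_DQ = 2T.
    T = 100.0  # ns, hypothetical
    print(t_dq(T, T, T, T))                       # 200.0

    # Unequal SET/RESET times: the slower operation dominates each block.
    t_set, t_reset = 100.0, 300.0                 # ns, hypothetical
    print(t_dq(t_reset, t_set, t_set, t_reset))   # 600.0
```

The second call shows why the slower of the SET and RESET operations sets the pace: each block contributes its worst-case programming time to the total.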

Figure 2.11 shows the timing diagram of the 4T-2R-based NV-FF for real-time data storage. The transistors used in this simulation are from the 90 nm technology node and the OxRAM model is the same as described in [56]. The FF operates at 1.6 V. For the device model used in the simulation, T\(_{RESET}\) \(\ge \) T\(_{SET}\) [25]; thus, T\(_{store} = 714\) ns and T\(_{restore} = 2\) ns. Since the write current of the OxRAM is small (2.3 \(\upmu \)A for programming the OxRAM to LRS and 364 nA for programming it to HRS), the transistors used in the latch can be kept at minimum standard sizing without any additional parasitics.

Fig. 2.11 Timing diagram of the 4T-2R-based NV-FF. When CLK \(=\) 0, the OxRAM in the master block gets programmed and when CLK \(=\) 1, the OxRAM in the slave block gets programmed. R1 (feed-forward) and R2 (feedback) are the master block OxRAM devices; R3 (feed-forward) and R4 (feedback) are the slave block OxRAM devices

Fig. 2.12 Schematic of the proposed modified NV-FF depicting the 3 major operating blocks

2.4.2 Modified NV-FF Design for Improved System Performance

Due to the limitations of the NV-FF design proposed in the previous section, a modified NV-FF design is presented here. The proposed NV-FF, as shown in Fig. 2.12, has three different modes of operation: (1) active or normal mode, (2) off-loading or store mode and (3) restore mode. The NV-FF consists of a volatile master stage and a single OxRAM device in the slave stage. This device stores the off-loaded data just before the power-down mode is activated. The proposed NV-FF requires only a small area overhead (6 extra transistors in addition to the 22 transistors needed for conventional CMOS based FFs). Only the slave latch is employed to write/read the OxRAM device during the off-loading/restore mode, without the need for any sensing or dedicated write driver block. The STR (store) and RSTR (restore) signals are asserted such that only one of them is active during a store/restore operation. Figure 2.13 shows the schematic of the off-loading block of the proposed NV-FF. The block is essentially made up of three separate modules.

Fig. 2.13 The off-loading block showing the three sub-blocks. The data is off-loaded to a single OxRAM device during controlled power-down. The control block is a simple OR gate driven by the two control signals STR and RSTR. The data generation block is used to program the device during data off-loading and to provide the supply during the restore operation

  1. Nonvolatile Block: This block stores the data that is to be off-loaded from the output node Q. When Q \(=\) 0, the OxRAM in this block is programmed to HRS and when Q \(=\) 1, the OxRAM in this block is programmed to LRS.

  2. Control Block: This block controls the operation being performed by the off-loading section. It consists of a simple OR gate with two inputs: STR and RSTR. Table 2.4 shows the operation performed by the off-loading block according to the input combinations of the STR and RSTR signals. It is to be noted that STR and RSTR can never be ‘1’ at the same time.

  3. Data Generation Block: This block, along with the control block, off-loads the data or restores the data to the output node Q when a STR or RSTR signal is applied. The block is mainly responsible for the following two tasks:

    a. Provide the write data voltage (V\(_{WR}\)) when data is being off-loaded.

    b. Provide the read data voltage (V\(_{RD}\)) when the data is being read to restore the data.

    For the proposed circuit V\(_{WR}\) is taken as 1.6 V and V\(_{RD}\) is taken as 0.4 V. The read voltage has to be chosen such that the internal state of the OxRAM is not disturbed during data read.
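The control logic described above can be written out as a short sketch. The mode mapping is inferred from the prose descriptions of the operating modes (both signals low for normal operation, STR high for store, RSTR high for restore); the function names are illustrative, and selecting the supply with RSTR follows the statement that the multiplexers pick their first input when RSTR \(=\) 0.

```python
V_WR = 1.6  # write voltage in volts (from the text)
V_RD = 0.4  # read voltage in volts (from the text)


def control_block(str_sig, rstr_sig):
    """Return (enable, mode) of the off-loading block for 0/1 inputs."""
    if str_sig and rstr_sig:
        raise ValueError("STR and RSTR must never be '1' at the same time")
    enable = str_sig | rstr_sig   # the OR gate output
    if str_sig:
        return enable, "store"
    if rstr_sig:
        return enable, "restore"
    return enable, "normal"


def generated_voltage(rstr_sig):
    """Voltage chosen by the data generation block, selected by RSTR."""
    return V_RD if rstr_sig else V_WR


if __name__ == "__main__":
    for str_sig, rstr_sig in ((0, 0), (1, 0), (0, 1)):
        enable, mode = control_block(str_sig, rstr_sig)
        print(f"STR={str_sig} RSTR={rstr_sig} -> enable={enable}, mode={mode}, "
              f"generated voltage={generated_voltage(rstr_sig)} V")
```

In normal mode the OR gate output is ‘0’, so the generated voltage is not passed on to the OxRAM device; the value printed for that row is only the multiplexer default.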

Table 2.4 Operations performed by the control block during off-loading of data to the nonvolatile block

Fig. 2.14 Store operation in the proposed modified NV-FF. Green data shows the polarities and data values at the off-loading block circuit nodes when Q \(=\) 0 is to be stored in the OxRAM. Blue data shows the polarities and the values at the off-loading block circuit nodes when Q \(=\) 1 is to be stored. The red data shows the control block signals

2.4.3 Operating Modes of the Proposed NV-FF

  1. Active/Normal Mode: During active or normal mode, both the STR and RSTR signals are held at logic ‘0’ and the terminals of the OxRAM device are grounded (V\(_{TB}=0\)). Therefore the OxRAM device does not participate in the normal FF operation. When CLK \(=\) 0, the master stage latches the input from data line D. When CLK \(=\) 1, the feedback path in the master stage holds the last sampled value and the data is transferred to the slave stage. The operation continues till a power-down/sleep-mode is activated.

  2. Store Mode: In this mode the control signals STR \(=\) 1 and RSTR \(=\) 0 are asserted to off-load the data to the OxRAM device. This leads to a logic ‘1’ at the output of the control block, thereby switching on the two transistors in the nonvolatile block and the data generation block respectively (see Fig. 2.14). Since RSTR \(=\) 0, both multiplexers in the data generation block select their first input. Therefore, one MUX selects V\(_{WR}\), which provides the data write voltage of 1.6 V to the power supply of the inverter, while the other MUX provides the data to be off-loaded at the input of the inverter. The output of the inverter is opposite to the data value being stored. This makes sure that the polarity of the voltage applied to the OxRAM is properly maintained. When Q \(=\) 0, TE is at a lower potential and BE is at a higher potential (V\(_{TB}<0\)), therefore the OxRAM is programmed to HRS. Similarly, when Q \(=\) 1, TE is at a higher potential than BE (V\(_{TB}>0\)) and therefore the OxRAM is programmed to LRS. After the data is written to the OxRAM block, the power-down signal is asserted to switch off the NV-FF. Note that the FF has to wait until the OxRAM is programmed so as to avoid any kind of data corruption during off-loading.

  3. Power-Down/Sleep-Mode: In sleep-mode, all the data and the control signals are pulled down. Since the OxRAM device stores the data as its resistive state, the data off-loaded to it remains stored. As the system is fully switched off, the leakage current of this block is negligible.

  4. Restore Mode: During the restore mode, CLK \(=\) 0 and RSTR \(=\) 1 are asserted, thereby switching ON the transistors in the nonvolatile and the data generation blocks. The data generation block provides a read voltage (V\(_{RD} = 0.4\) V) to the OxRAM, which restores the data in the slave latch. When the OxRAM is in the SET state, the internal node Q\(_m\) charges to a logic ‘1’. The charging of node Q\(_m\) is supported by the inverters in the feedback circuit. Due to the presence of an inverter between the data store/restore node Q\(_m\) and the output node Q, the original data is restored at the output of the FF. Similar steps are followed when logic ‘1’ is restored from the OxRAM. Figure 2.15 shows the restore operation in the proposed FF circuit.
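The four modes above amount to a store, power-down and restore round trip, which the following behavioural sketch walks through. It is a logic-level abstraction under stated assumptions: the class and method names are hypothetical, the OxRAM is modelled only by its resistive state (Q \(=\) 0 maps to HRS and Q \(=\) 1 to LRS, as stated for the nonvolatile block), and analogue details such as Q\(_m\), V\(_{WR}\), V\(_{RD}\) and the programming time are not modelled.

```python
class ModifiedNVFF:
    """Logic-level sketch of the modified NV-FF (hypothetical model)."""

    def __init__(self):
        self.q = 0            # volatile output of the slave latch
        self.oxram = "HRS"    # arbitrary initial resistive state (assumption)
        self.powered = True

    def clock_in(self, d):
        """Normal mode: one full clock cycle moves D to Q; OxRAM is idle."""
        if not self.powered:
            raise RuntimeError("flip-flop is powered down")
        self.q = d

    def store(self):
        """Store mode (STR = 1, RSTR = 0): program the OxRAM from Q."""
        v_tb_sign = 1 if self.q == 1 else -1   # sign of V_TB, as in the text
        self.oxram = "LRS" if v_tb_sign > 0 else "HRS"

    def power_down(self):
        """Sleep mode: volatile state is lost, the resistive state is kept."""
        self.q = None
        self.powered = False

    def restore(self):
        """Restore mode (RSTR = 1): rebuild Q from the resistive state."""
        self.powered = True
        self.q = 1 if self.oxram == "LRS" else 0


if __name__ == "__main__":
    for data in (0, 1):
        ff = ModifiedNVFF()
        ff.clock_in(data)
        ff.store()
        ff.power_down()
        ff.restore()
        print(f"stored {data}, OxRAM in {ff.oxram}, restored Q = {ff.q}")
```

The sketch makes the ordering constraint visible: `store()` has to complete before `power_down()`, mirroring the requirement that the FF waits until the OxRAM is programmed.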

Fig. 2.15 Restore operation in the proposed modified NV-FF. Here V\(_{RD}\) provides the read voltage that is needed to read the state stored in the OxRAM. The I\(_{read}\) current obtained from the OxRAM charges the load capacitance at the output to a logic value ‘1’ or ‘0’ depending on the state of the OxRAM (LRS or HRS)

Fig. 2.16 Timing diagram showing different operating modes of the modified NV-FF. The off-loading is controlled by the control signals STR and RSTR

Table 2.5 Comparison of various NV-FF designs proposed in literature and the proposed NV-FF designs

The timing diagram of the simulated NV-FF block is shown in Fig. 2.16 with its different operating modes. The simulations were based on the OxRAM model given in [56] and the 90 nm technology node. The flip-flop operates at 1.6 V. Data is stored in a minimum T\(_{store} = 15\) ns and restored in a minimum T\(_{restore} = 2\) ns. The NV-FF has a backup energy of 3.08 pJ and a restore energy of 0.4 pJ. Since the write current of the OxRAM is small (2.3 \(\upmu \)A for programming the OxRAM to LRS and 364 nA for programming it to HRS), the transistors used in the latch can be kept at minimum standard sizing without any additional parasitics.
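A quick back-of-the-envelope check puts the reported figures in perspective. The sketch below assumes that the whole backup (or restore) energy is spent within the corresponding minimum store (or restore) time, so the resulting powers are window averages and not simulated values; the function name is illustrative.

```python
def average_power_mw(energy_pj, time_ns):
    """Average power over a window: pJ / ns is numerically equal to mW."""
    return energy_pj / time_ns


if __name__ == "__main__":
    # Figures reported in the text for the modified NV-FF.
    backup_energy_pj, t_store_ns = 3.08, 15.0
    restore_energy_pj, t_restore_ns = 0.4, 2.0

    p_store = average_power_mw(backup_energy_pj, t_store_ns)
    p_restore = average_power_mw(restore_energy_pj, t_restore_ns)
    print(f"average store power:   {p_store:.3f} mW")
    print(f"average restore power: {p_restore:.3f} mW")

    # Store time of the real-time NV-FF (714 ns) versus the modified design.
    print(f"store-time ratio: {714.0 / t_store_ns:.1f}x")
```

Under this assumption both operations draw about 0.2 mW on average, and the modified design stores its data roughly 48 times faster than the 714 ns reported for the real-time NV-FF.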

Table 2.5 compares the proposed NV-FFs with other NV-FF designs present in literature. The NV-FFs considered in this table range from designs that off-load data to OxRAM and ferroelectric capacitors to designs based on MTJ devices. While conventional NV-FFs rely on serial [36, 40, 41, 55, 63] or two-phase writing [38, 39, 42] during data off-loading, the proposed NV-FF uses a single NV device with parallel data writing. This drastically reduces the access time of the NV-FF and the overall energy of the circuit.

2.5 Conclusion

In this chapter, we have presented a real-time 4T-2R NV-SRAM bitcell using HfO\(_x\) based OxRAM devices. We have explained its different operational modes (i.e. Write mode and Read mode) along with multiple programming approaches (Two-cycle and Single-cycle programming schemes). Since the stability of the NV-SRAM bitcell has been a concern, we presented a detailed analysis summarizing the impact of \(V_{dd}\) scaling and transistor down-scaling on the stability metrics (SNM and N-curve). It is observed that the 4T-2R NV-SRAM allows transistor down-scaling, and its lower switching current enables low-power circuit design. We further extended the scope of the 4T-2R NV-SRAM bitcell by proposing a real-time NV-FF based on it. We also discussed the shortcomings of having the OxRAM device actively participate in the normal operation of the NV-FF and proposed a modified NV-FF design to mitigate these issues. Although the major challenge in the design of NV-SRAM and NV-FF is to handle abrupt power glitches, active participation of the OxRAM device slows down the overall circuit. We believe that, with advances in materials science and engineering, this challenge will be addressed. Some developmental works [10, 64,65,66,67,68,69] indicate that progress in this area has started picking up. Thus, in days to come, OxRAM based real-time designs will not only be area and power efficient but will also show better performance in terms of latency and energy.