1 Introduction

To realize high-speed portable Internet-of-Things (IoT) systems such as laptops, cell phones, and tablets, all system-on-chip (SoC) blocks, especially the memory unit, must be designed with high density [1]. Researchers have pursued this goal in several ways: (a) changing the basic logic used in transistor-level design, (b) adopting new technologies, and (c) applying new circuit-level methods. In terms of basic logic, multi-valued logic (MVL) is a promising solution because it processes and transmits more information per signal than conventional binary logic [2, 3]. Accordingly, various memory cells based on ternary logic, regarded as the most efficient form of MVL, have been presented as basic blocks for high-speed SoCs [2, 3]. A ternary static random-access memory (T-SRAM) structure can reduce the complexity of internal connections by about 33% compared with the conventional six-transistor binary SRAM cell (B-SRAM) [4]. Ternary memory architectures are usually implemented with back-to-back standard ternary inverter (STI) gates. Such architectures face several challenges, including significant static power loss due to two DC current paths, and increased layout complexity and manufacturing cost because an additional power supply is needed to produce logic level ‘1’ [5]. Recently, T-SRAM structures based on standard ternary buffer (STB) gates, which require no data switching (as in D-latch circuits [6]), have been proposed to solve the problems of STI-based T-SRAMs. However, many STB-based ternary memory designs still suffer from two DC paths [7, 8] and from the need for an additional power supply to generate logic level ‘1’ [8]. In terms of technology, SRAM cells based on carbon nanotube field-effect transistors (CNFETs) have been introduced [9, 10].
Finally, in terms of design methods, new circuit-level techniques have been presented, such as the gate-diffusion input (GDI) method for binary logic (the B-GDI method), which aims to keep layout-level complexity low while maximizing energy savings [10,11,12].

In this paper, a single-bit-line (BL) T-SRAM cell is proposed. The cell is built on an STB gate that uses a modified ternary GDI (T-GDI) cell in ballistic-wrap-gate (BWG) CNFET technology, referred to as BWG CN-TGDI. The main features of the proposed memory architecture are as follows:

(a) A single-BL architecture that reduces the complexity of the internal interconnections and lowers dynamic power by reducing the switching-activity factor of the BL.

(b) Stacked-transistor negative and positive ternary inverters in the proposed STB gate, which control the static current paths and thereby significantly reduce static power.

(c) A pseudo-resistor (PR) network in the STB gate that limits the current in the DC path while logic level ‘1’ is generated, significantly improving the power-delay product (PDP).

(d) Threshold-voltage calibration through the flat-band voltage parameter, which widens the static noise margin (SNM) of logic level ‘1’.

(e) Only two threshold-voltage levels in the STB structure, combined with a transmission-gate (TG) topology, to improve (1) the design capability, (2) full-swing transmission of ternary data, and (3) resilience to soft errors through support for the bit-interleaved architecture.

In this study, the proposed memory structure is compared with ternary SRAM designs based on BWG CNFET technology from recent literature [13, 14]. The simulations were performed in H-SPICE at room temperature (25 °C) over a range of supply voltages (Vdd) around the nominal value of 0.5 V. Monte Carlo (MC) simulation results show that, compared with the best-performing reference design, the proposed cell reduces static power by 52.8%, improves PDP by 3.79 times, and improves SNM by 15.7%. The memory designs are also evaluated with the Tri-EQM and Tri-ECPOT metrics, for which the proposed T-SRAM architecture achieves the best results. In addition, the dynamic noise margin (DNM) of the ternary memory designs is evaluated in terms of average noise threshold energy (ANTE) using the proposed ternary noise injection circuit (T-NIC). The results confirm the superior noise immunity of the proposed memory structure compared with the other designs.

Finally, at the application level, the proposed memory architecture is used to store the image pixels of colored quick-response (QR) codes (with black, white, and grayscale modules [15]). Storage in the memory array relies on the proposed hardware algorithm, which maps the image pixel data to equivalent three-level voltage values, and on a link between the H-SPICE and MATLAB software. The simulated peak signal-to-noise ratio (PSNR) and mean structural similarity index (MSSIM) are 42 dB and 0.99, respectively, indicating that the proposed array stores image data well. The other memory arrays are also evaluated with the figure of merit (FoM), PSNR, MSSIM, and hardware efficiency parameters, and the proposed memory structure outperforms the other memory cells. Therefore, the memory array built from the proposed T-SRAM cell, with its favorable trade-off between hardware parameters and output quality, can be considered a strong candidate for MVL-based SoC blocks in next-generation IoT platforms.
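The pixel-to-voltage mapping mentioned above can be illustrated with a minimal Python sketch. The paper's hardware algorithm is not reproduced here: the color-to-digit mapping (black → ‘0’, gray → ‘1’, white → ‘2’), the gray pixel value of 128, and the read-back thresholds at Vdd/4 and 3Vdd/4 are assumptions chosen for illustration. Only the PSNR definition, 10·log10(255²/MSE) for 8-bit images, is standard.

```python
import math

VDD = 0.5  # nominal supply voltage used in the simulations (V)

# Hypothetical mapping (not taken from the paper): QR module colors to ternary digits
PIXEL_OF_DIGIT = {0: 0, 1: 128, 2: 255}  # '0' = black, '1' = gray, '2' = white (8-bit)


def pixel_to_digit(pixel):
    """Quantize an 8-bit pixel to the nearest of the three module levels."""
    return min(PIXEL_OF_DIGIT, key=lambda d: abs(PIXEL_OF_DIGIT[d] - pixel))


def digit_to_voltage(digit, vdd=VDD):
    """Unbalanced ternary encoding: '0' -> GND, '1' -> Vdd/2, '2' -> Vdd."""
    return digit * vdd / 2


def voltage_to_digit(v, vdd=VDD):
    """Assumed read-back decision thresholds at Vdd/4 and 3*Vdd/4."""
    if v < vdd / 4:
        return 0
    if v < 3 * vdd / 4:
        return 1
    return 2


def psnr(original, restored, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


pixels = [0, 128, 255, 255, 0, 128]
stored = [digit_to_voltage(pixel_to_digit(p)) for p in pixels]
# Disturb every stored voltage by 50 mV (less than Vdd/4) before reading it back
restored = [PIXEL_OF_DIGIT[voltage_to_digit(v + 0.05)] for v in stored]
print(psnr(pixels, restored))  # inf: every module is recovered exactly
```

Because every disturbance here stays below Vdd/4, all modules are recovered and the MSE is zero; a finite PSNR, such as the reported 42 dB, arises only when some pixels are restored with errors.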

The rest of the paper is organized as follows. Section 2 briefly presents background on BWG CNFET technology and ternary logic. Sect. 3 describes the BWG CN-TGDI cell structure and the proposed STB gate, then presents the T-SRAM cell based on the proposed STB gate together with its operating mechanism, and ends with the proposed memory array design supporting the bit-interleaved architecture. Sect. 4 is devoted to the simulation results and to comparisons between memory architectures in terms of hardware performance parameters, Tri-FoMs, and noise immunity. Sect. 5 presents an image-processing application of the proposed memory structure through the hardware algorithm, together with quality evaluation results using the proposed FoM. Finally, the paper ends with a conclusion.

2 Preliminary Information

The following subsections briefly review the background concepts needed to describe the technology and the logic of the proposed memory design.

2.1 Overview of the Ballistic-Wrap-Gate (BWG) CNFET Structure and Characteristics

Nowadays, CNFET devices, with their remarkable physical and electrical properties, are regarded as suitable candidates for VLSI circuits with very small layout dimensions. Figure 1a shows the three-dimensional (3-D) structure of a BWG CNFET with three CNTs forming the channel, through which carriers travel ballistically between drain and source [10]. In these devices, the CNTs in the drain/source extension regions are heavily doped (n-type or p-type, according to the device type), while the CNTs in the channel region are undoped. In the gate region, the CNTs are wrapped by a high-k dielectric of thickness Tox, which isolates them from the silicon substrate and suppresses the body effect; the bulk terminal can therefore be neglected in this type of device. According to Fig. 1a, the physical gate width (Wgate) can be approximated by Eq. (1) [10,11,12]:

$$\begin{aligned} W_{\text{gate}}\,(\text{including overhangs}) &= S \times \left(N_{\text{CNT}} - 1\right) + d_{\text{CNT}} + W_{\text{Litho}} \\ &\approx \max\left(S \times N_{\text{CNT}},\; W_{\text{Litho}}\right) \end{aligned}$$
(1)
Fig. 1

a 3-D schematic of a BWG CNFET with a CNT array [10], b Id–Vgs/Vds curves of an n-type BWG CNFET under Vfb variations from 0 V to 0.3 V and dCNT variations from 1 to 1.2 nm

Here, dCNT is the CNT diameter, S is the center-to-center distance between two adjacent CNTs (the pitch), and WLitho is the minimum gate width allowed by the photolithography process (i.e., 2λ). Further details on the parameters of the BWG CNFET device can be found in [10, 12]. The threshold voltage (Vth,CNT), i.e., the minimum gate voltage required to turn on the transistor, can be expressed by Eq. (2) [10, 12]:

$$V_{\text{th,CNT}}\,(\text{V}) \approx \frac{a V_{\pi}}{\sqrt{3}\, q\, d_{\text{CNT}}\,(\text{nm})} + \Delta V_{\text{th0}} + V_{\text{fb}} - \delta \times V_{\text{dsi}}$$
(2)

Here, a is the lattice constant of graphene (a = \(\sqrt 3\) a0 ≈ 2.46 Å), where a0 ≈ 0.142 nm is the carbon–carbon bond length in the CNT. Vπ ≈ 3.033 eV is the π–π carbon bond energy in the Hückel tight-binding model, and q is the elementary charge. The CNT diameter, one of the basic physical parameters, is \(d_{\text{CNT}} = \frac{a\sqrt{n^{2} + nm + m^{2}}}{\pi} \approx 0.0783\sqrt{n^{2} + nm + m^{2}}\) nm, where (n, m) is the chirality vector; for the semiconducting behavior required in transistors, n − m must not be a multiple of 3 (n − m ≠ 3r, r ∈ ℤ). For a zigzag CNT (m = 0) with semiconducting behavior, this reduces to dCNT ≈ 0.0783n nm. ΔVth0 is the Vth roll-off term, i.e., the Vth,CNT shift excluding the drain-induced barrier lowering (DIBL) effect; Vfb is the CNT flat-band voltage; δ is the DIBL coefficient of the CNT; and Vdsi is the internal drain-to-source voltage [10].

According to Eq. (2), Vth,CNT depends inversely and nonlinearly on dCNT. Setting a specific dCNT for each transistor is very costly and cannot be done accurately; moreover, dCNT is highly susceptible to process variations, so it is kept constant during fabrication [9, 16, 17]. In contrast, Vth,CNT depends linearly on Vfb, which can be adjusted through gate engineering or CNT doping to obtain the desired Vth,CNT [16, 17]. Figure 1b shows the Id–Vgs/Vds curves of an n-type BWG CNFET for different dCNT and Vfb values. The slopes of the curves remain almost the same under both Vfb and dCNT variations, whereas Vfb variation leads to a higher saturation level in the Id–Vds curves. Therefore, according to Fig. 1b and the discussion above, calibrating Vth,CNT through Vfb is an efficient approach for designing MVL circuits in BWG CNFET technology [9].
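Equation (2) and the diameter relation can be evaluated numerically to show the two trends discussed above: Vth,CNT falls as dCNT grows, and shifts one-for-one with Vfb. The Python sketch below uses a0 = 0.142 nm and Vπ = 3.033 eV from the text; ΔVth0, δ, and Vdsi default to zero as placeholders rather than device-model values, so only the intrinsic term and the Vfb shift are meaningful here.

```python
import math

A0_NM = 0.142                 # carbon-carbon bond length a0 (nm)
A_NM = math.sqrt(3) * A0_NM   # graphene lattice constant a = sqrt(3) * a0 (nm)
V_PI = 3.033                  # pi-pi bond energy (eV); V_pi / q is in volts


def is_semiconducting(n, m):
    """An (n, m) tube is semiconducting when n - m is not a multiple of 3."""
    return (n - m) % 3 != 0


def d_cnt_nm(n, m=0):
    """CNT diameter in nm from the chirality vector (n, m)."""
    return A_NM * math.sqrt(n * n + n * m + m * m) / math.pi


def vth_cnt(d_nm, dvth0=0.0, vfb=0.0, delta=0.0, vdsi=0.0):
    """Eq. (2): intrinsic term a*V_pi/(sqrt(3)*q*d) plus roll-off, flat-band and DIBL terms.

    dvth0, delta and vdsi default to zero as placeholders, not device data.
    """
    return A_NM * V_PI / (math.sqrt(3) * d_nm) + dvth0 + vfb - delta * vdsi


for n in (13, 16, 19):  # zigzag (n, 0) tubes
    d = d_cnt_nm(n)
    print(n, is_semiconducting(n, 0), round(d, 3), round(vth_cnt(d), 3),
          round(vth_cnt(d, vfb=0.26), 3))  # last column: same tube with Vfb = +0.26 V
```

The last column differs from the previous one by exactly the applied Vfb, which is the linear calibration knob used in the proposed design, whereas moving between diameters changes Vth,CNT in coarse, nonlinear steps.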

2.2 Ternary Logic, Terminology, and its Unique Features

In digital electronics, the binary logic system represents values with two levels, ‘0’ and ‘1’, corresponding to the voltages GND and Vdd. The ternary (three-level) logic system is a special and optimal variant of MVL with three distinct logic values. Ternary numbers can be represented in balanced or unbalanced form [16,17,18]. In the balanced form, the logic symbols {‘− 1’, ‘0’, ‘1’} correspond to the voltages − Vdd, 0 (GND), and Vdd, respectively. Balanced ternary is rarely used in circuit design because of the many problems involved in realizing negative voltages [18]. In the unbalanced form, the logic symbols {‘0’, ‘1’, ‘2’} correspond to the voltage levels GND, Vdd/2, and Vdd. After the STI gate, the STB gate is the most basic gate and is widely used in T-SRAM architectures; for each logic input ‘0’, ‘1’, or ‘2’, it produces an output at the same level as the input [7, 8]. STB gates are generally built from a negative ternary inverter (NTI) and a positive ternary inverter (PTI), both with binary outputs [7, 8, 19]. Equation (3) gives the functions of the STB, NTI, and PTI gates at both the logic and voltage levels, and Table 1 gives their truth table (X ∈ {‘0’, ‘1’, ‘2’}).

$$\begin{aligned} \text{PTI}(X) &= \begin{cases} \text{`2'} - X, & X \in \{\text{`0'}, \text{`2'}\} \\ \text{`2'}, & X = \text{`1'} \end{cases} \;(\text{in logic value}) \quad \text{or} \quad \begin{cases} V_{\text{dd}} - X, & X \in \{\text{GND}, V_{\text{dd}}\} \\ V_{\text{dd}}, & X = V_{\text{dd}}/2 \end{cases} \;(\text{in voltage level}) \\ \text{NTI}(X) &= \begin{cases} \text{`2'} - X, & X \in \{\text{`0'}, \text{`2'}\} \\ \text{`0'}, & X = \text{`1'} \end{cases} \;(\text{in logic value}) \quad \text{or} \quad \begin{cases} V_{\text{dd}} - X, & X \in \{\text{GND}, V_{\text{dd}}\} \\ 0, & X = V_{\text{dd}}/2 \end{cases} \;(\text{in voltage level}) \\ \text{STB}(X) &= X - \text{`0'} = X \;(\text{in logic value}) \quad \text{or} \quad \text{STB}(X) = X - \text{GND} = X \;(\text{in voltage level}) \end{aligned}$$
(3)
Table 1 Truth table for NTI, STB, and PTI gates
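The logic-value form of Eq. (3) can be checked with a short Python sketch that tabulates the three gates for X ∈ {‘0’, ‘1’, ‘2’}. It models digits only, not voltages or transistor behavior.

```python
LEVELS = (0, 1, 2)  # unbalanced ternary digits '0', '1', '2'


def pti(x):
    """PTI of Eq. (3): '2' - X for X in {'0', '2'}, and '2' for X = '1'."""
    return 2 - x if x in (0, 2) else 2


def nti(x):
    """NTI of Eq. (3): '2' - X for X in {'0', '2'}, and '0' for X = '1'."""
    return 2 - x if x in (0, 2) else 0


def stb(x):
    """STB of Eq. (3): the output digit equals the input digit."""
    return x


print("X  NTI  STB  PTI")
for x in LEVELS:
    print(f"{x}  {nti(x)}    {stb(x)}    {pti(x)}")
```

Both NTI and PTI only ever produce ‘0’ or ‘2’, which is why they can serve as the binary-output decoder stage in front of the STB encoder.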

3 Case Study

3.1 The Proposed BWG CN-TGDI Cell Structure

The gate-diffusion input (GDI) technique, with its simple transistor-level structure, has been proposed as a circuit-level method for designing energy-efficient sequential and combinational circuits while keeping layout complexity low [20, 21]. Figure 2a shows the binary GDI cell based on BWG CNFET technology [10], named BWG CN-BGDI, which consists of two transistors forming the pull-up (PU) and pull-down (PD) networks. Table 2 lists the connections of the P, N, and G terminals of the BWG CN-BGDI cell that realize different Boolean functions. As the table shows, the inverter (NOT) is a special case of the BWG CN-BGDI cell, obtained by connecting Vdd and GND to the diffusion terminals P and N, respectively. In Ref. [19], the authors first introduced a GDI cell for ternary logic (the t-GDI cell) based on top-gate CNFET technology for T-SRAM design. In that t-GDI cell, dual chirality vectors (i.e., two CNT diameters) are used to set the Vth,CNT of the PU and PD transistors. Evaluations of the t-GDI cell show that it suffers from important challenges in MVL circuit design, such as insufficient NM and high static power (especially at the middle logic level ‘1’). Figure 2b shows the modified t-GDI cell in BWG CNFET technology, named the BWG CN-TGDI cell, and Fig. 2c shows its 3-D layout. According to Fig. 2b, the proposed cell consists of two parts. The first part contains NTI and PTI gates with a common gate input (terminal G) and acts as a ternary-to-binary decoder. The PTI and NTI gates are realized by adding p-/n-type BWG CNFETs as inner stacked transistors in the PU and PD networks of the NOT gate, respectively (i.e., PTI and NTI gates with a cascode arrangement [22]).
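How a single two-transistor GDI cell realizes different functions by changing its P, N, and G connections can be sketched at switch level in Python. The sketch assumes ideal switches (the p-type PU passes P when G is low, the n-type PD passes N when G is high) and ignores threshold drops; the terminal assignments are those commonly listed for the original GDI cell and are assumed here rather than copied from Table 2.

```python
from itertools import product


def gdi(g, p, n):
    """Ideal switch-level GDI cell: the PU (p-type) passes P when G = 0,
    the PD (n-type) passes N when G = 1. Threshold drops are ignored."""
    return p if g == 0 else n


# Assumed terminal assignments (G, P, N) for inputs A and B
CONFIGS = {
    "NOT": lambda a, b: gdi(a, 1, 0),        # P = Vdd, N = GND
    "F1 (A'B)": lambda a, b: gdi(a, b, 0),   # P = B, N = GND
    "F2 (A'+B)": lambda a, b: gdi(a, 1, b),  # P = Vdd, N = B
    "AND": lambda a, b: gdi(a, 0, b),        # P = GND, N = B
    "OR": lambda a, b: gdi(a, b, 1),         # P = B, N = Vdd
}

for name, fn in CONFIGS.items():
    print(name, [fn(a, b) for a, b in product((0, 1), repeat=2)])
```

The NOT entry matches the text: tying P to Vdd and N to GND turns the cell into an inverter. In a real cell, the threshold drop that this ideal model ignores is what full-swing GDI variants are designed to remove.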

Fig. 2

a Cell structure of the BWG CN-BGDI [10], b proposed BWG CN-TGDI cell, and c its 3-D layout view

The stacked structure increases the resistance of the path between Vdd and GND and thereby reduces the OFF current of the cell. The second part, a binary-to-ternary encoder, consists of a p-type BWG CNFET connected to diffusion input P and an n-type BWG CNFET connected to diffusion input N, acting as the PU and PD networks; a complementary pseudo-resistor (PR) network, made of p- and n-type BWG CNFETs with reverse gate-voltage bias [23], is placed between the Out node and the PU and PD networks. In summary, the modifications relative to the t-GDI cell in [19] are the stacked transistors in the ternary-to-binary decoder and the PR network in the binary-to-ternary encoder.

3.2 Proposed STB Gate Based on BWG CN-TGDI cell

Figure 3a shows the proposed single-supply STB gate based on the BWG CN-TGDI cell, with diffusion terminals P and N connected to Vdd and GND, respectively. As the figure shows, the STB output voltage VSTB(x) can be expressed in terms of the node voltages Vx and Vy and the resistances RT8, RT9, and RT10 (T9 and T10 forming the PR network), as given by Eqs. (4) and (5):

$$V_{{{\text{STB}}}} \left( x \right) = \frac{{\left( {{\text{RT}}_{8} + {\text{RT}}_{10} } \right)}}{{{\text{RT}}_{9} + ({\text{RT}}_{8} + {\text{RT}}_{10} )}}{ }V_{x}$$
(4)
$$V_{{{\text{STB}}}} \left( x \right) = \frac{{\left( {{\text{RT}}_{8} + {\text{RT}}_{10} } \right)}}{{{\text{RT}}_{8} }} V_{y}$$
(5)
Fig. 3

a Transistor-level schematic of the proposed STB gate (the Vfb of each BWG CNFET is given in volts next to it), b VTC divided into four regions, with static current, c VTC curves and static currents under Vfbp variations in the NTI and Vfbn variations in the PTI

According to the figure, when the input voltage is below Vdd/4, RT9 is much larger than RT10, so the denominator of Eq. (4) greatly exceeds its numerator and the coefficient of Vx is small.

At the same time, because RT10 is very small, the fraction in Eq. (5) is approximately 1, and the output voltage VSTB(x) follows Vy with unity gain (forming the noise margin (NM) of logic ‘0’ in the VTC). The opposite occurs when the input voltage rises to within Vdd/4 of the nominal supply Vdd: RT9 becomes much smaller than RT10, so the coefficient of Eq. (4) is almost 1 and VSTB(x) is close to Vx (forming the NM of logic ‘2’). Finally, when the input voltage lies in the Vdd/4 band around the middle of the VTC (where logic level ‘1’ is formed), RT9 and RT10 are equal, and RT8 is larger than the resistance of the PR network. In this case, the coefficients of Eqs. (4) and (5) are both approximately 1, and VSTB(x) equals the average of Vx and Vy with unity gain. Figure 3b shows the VTC of the proposed STB gate, with NM values of Vdd/4 and Vdd/2 for levels ‘0’/‘2’ and ‘1’, respectively. As the figure shows, the threshold voltages are calibrated with only two types of Vfb: a weaker n-type BWG CNFET (Vfb = + 0.26 V) is used in the PTI gate, and a weaker p-type BWG CNFET (Vfb = − 0.26 V) is used in the NTI gate. To obtain symmetric, optimal NMs around the stable point (Vdd/2) and a logic level ‘1’ of suitable width, the transistors of the PR network also use a Vfb of magnitude 0.26 V. Figure 3b also shows the static current drawn from the supply at the different logic levels: about 0.056 nA near 0 mV (logic ‘0’) and about 0.049 nA near 500 mV (logic ‘2’).
The figure further shows that the proposed buffer draws a low static current of about 105 nA at approximately 250 mV, owing to the precise control of the Vdd-to-GND current by the PR network while logic level ‘1’ is generated.
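The divider relations behind Eqs. (4) and (5) can be checked numerically. The Python sketch below assumes that T7, T9, T10, and T8 form a single series chain Vdd–x–out–y–GND with every device modeled as a fixed resistance; real CNFET resistances depend on bias, so the values are illustrative only. Under this assumption, Eq. (4) follows directly, and the output is related to Vy by the factor (RT8 + RT10)/RT8, which approaches 1 when RT10 is small.

```python
def chain_voltages(vdd, r7, r9, r10, r8):
    """Node voltages of the assumed series chain
    Vdd -[T7]- x -[T9]- out -[T10]- y -[T8]- GND (devices modeled as resistors)."""
    i = vdd / (r7 + r9 + r10 + r8)  # one current flows through the whole chain
    vx = vdd - i * r7
    vout = vx - i * r9
    vy = vout - i * r10
    return vx, vout, vy


def vout_from_vx(vx, r8, r9, r10):
    """Eq. (4): divider from node x to the output node."""
    return (r8 + r10) / (r9 + r8 + r10) * vx


def vout_from_vy(vy, r8, r10):
    """Output recovered from node y, which sits on the R10/R8 divider below it."""
    return (r8 + r10) / r8 * vy


# Middle-level case: equal PR resistances (T9 = T10) and equal T7/T8 resistances (ohms)
vx, vout, vy = chain_voltages(0.5, r7=1e6, r9=1e5, r10=1e5, r8=1e6)
print(round(vx, 4), round(vout, 4), round(vy, 4))
```

With the symmetric choice, the output sits exactly at Vdd/2 between Vx and Vy, which is the logic ‘1’ case described in the text.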

Figure 3c shows the effect of Vfb variations on the VTC and static current of the proposed STB gate. When the Vfbp of T1 is decreased, the VTC of the NTI gate shifts to the left and the NM of logic ‘0’ decreases. Conversely, increasing the Vfbn of T4 shifts the VTC of the PTI gate to the right and decreases the NM of logic ‘2’. Therefore, with an optimal choice of Vfb for the NTI and PTI gates, NMs close to Vdd/4 (the ideal NM in ternary inverter design [24]) can be achieved for logic levels ‘0’ and ‘1’. Based on the VTC in Fig. 3b, the proposed STB gate operates as follows:

1. When the input voltage is at GND, the binary outputs of both the NTI and PTI gates in the decoder part are at logic ‘2’. Therefore, T8 in the encoder part is ON, and the GND level at node y is transferred through T10 (in the PR network) to the output node STB(X), forming an NM close to Vdd/4 on the left side of the VTC.

2. When the input voltage rises to Vdd/2, the binary outputs of the NTI and PTI gates are at logic ‘0’ and ‘2’, respectively, and both T7 and T8 turn on. Therefore, VSTB(X) corresponds to the average of the voltages at nodes x and y, forming logic level ‘1’ with an NM close to Vdd/4 in the center of the VTC. In this state, the static current is limited by the PR network, which improves the PDP of the STB gate.

3. Finally, when the input voltage rises to Vdd, the binary outputs of both the NTI and PTI gates fall to logic ‘0’. Therefore, T7 in the encoder part turns on, and the Vdd level at node x is transferred through T9 (in the PR network) to the output node, forming an NM close to Vdd/4 on the right side of the VTC.
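The three operating regions listed above can be summarized in an idealized Python model of the STB gate. It assumes binary NTI and PTI outputs with switching points at Vdd/4 and 3Vdd/4 (consistent with NMs close to Vdd/4), treats T7 and T8 as ideal switches, and models the conducting devices as fixed resistances; it is a behavioral sketch, not the simulated circuit.

```python
VDD = 0.5  # nominal supply voltage (V)


def nti_out(vin, vdd=VDD):
    """Binary NTI output; switching point assumed at Vdd/4."""
    return vdd if vin < vdd / 4 else 0.0


def pti_out(vin, vdd=VDD):
    """Binary PTI output; switching point assumed at 3*Vdd/4."""
    return vdd if vin < 3 * vdd / 4 else 0.0


def stb_out(vin, vdd=VDD, r7=1.0, r8=1.0, r9=1.0, r10=1.0):
    """Idealized STB output voltage.

    T7 (p-type) conducts when the NTI output is low and links Vdd to node x;
    T8 (n-type) conducts when the PTI output is high and links node y to GND.
    The PR devices T9/T10 join x and y to the output node.
    """
    t7_on = nti_out(vin, vdd) < vdd / 2
    t8_on = pti_out(vin, vdd) > vdd / 2
    if t7_on and t8_on:
        # Both paths conduct: resistive divider between Vdd and GND (logic '1')
        return vdd * (r8 + r10) / (r7 + r9 + r10 + r8)
    return vdd if t7_on else 0.0  # one path conducts: full-swing '2' or '0'


for vin in (0.0, 0.1, 0.2, 0.25, 0.3, 0.4, 0.5):
    print(f"Vin = {vin:.2f} V -> VSTB = {stb_out(vin):.3f} V")
```

In this model the middle level spans inputs from Vdd/4 to 3Vdd/4, a band of width Vdd/2, while the ‘0’ and ‘2’ regions each span Vdd/4.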

3.3 Tri-valued SRAM Structure Based on the Suggested STB Gate and its Functional Mechanism

Figure 4 shows the transistor-level schematic of the proposed 14-transistor T-SRAM cell in a single-BL topology. The core of the cell is a latch formed by the proposed STB gate with a positive-feedback path, accessed through a pair of TGs controlled by the row word line (RWL), the column word line (CWL), and their complements. According to Fig. 4b, the operating cycles of the cell for storing the three logic levels are as follows:

Fig. 4

a Conceptual model and b transistor-level architecture of the proposed T-SRAM cell based on STB gate

Write cycle: Before this cycle begins, BL is precharged to GND. The write cycle starts when RWL and CWL are driven high and their complements low, which turns on the TG pair of the access network. The data on BL then pass quickly and with full swing through the TGs to node q, updating its value.

Maintenance (hold) cycle: This cycle begins when RWL and CWL go low and their complements go high, which disables the access network and disconnects the latch from BL. The data at node q are then held at the correct value by the STB gate and its positive-feedback path with a suitable NM.

Read cycle: Before this cycle begins, BL is again precharged to GND. The read cycle starts when RWL and CWL are again driven high and their complements low. The data at node q are transferred at high speed, at the correct voltage level and without voltage drop, through the access network to the outside of the cell. The cycle ends with the data placed on BL.
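The write, hold, and read cycles can be captured by a digit-level Python model of the cell. It tracks only the stored digit and the BL value, assumes ideal full-swing TGs, and represents an active word line as 1 (RWL and CWL high, complements low); the class and method names are hypothetical.

```python
class TSramCell:
    """Digit-level behavioral model of the single-BL T-SRAM cell (not a circuit model)."""

    def __init__(self, q=0):
        self.q = q  # ternary digit held at node q by the STB latch

    @staticmethod
    def _access_on(rwl, cwl):
        # Two TGs in series: BL reaches q only when RWL and CWL are both active
        return rwl == 1 and cwl == 1

    def write(self, digit, rwl=1, cwl=1):
        if digit not in (0, 1, 2):
            raise ValueError("digit must be 0, 1 or 2")
        bl = digit  # BL, precharged to GND, is driven to the input digit
        if self._access_on(rwl, cwl):
            self.q = bl  # full-swing transfer through the TG pair updates q
        return self.q

    def hold(self):
        # Access network off: STB positive feedback keeps q unchanged
        return self.q

    def read(self, rwl=1, cwl=1):
        bl = 0  # BL precharged to GND before the read cycle
        if self._access_on(rwl, cwl):
            bl = self.q  # q drives BL through the TG pair without a voltage drop
        return bl


cell = TSramCell()
for digit in (0, 1, 2):
    cell.write(digit)
    print(digit, cell.hold(), cell.read())
```

A read with the access network off returns the GND precharge level, and a write with only one word line active leaves q untouched; the second property is what the bit-interleaved array relies on.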

3.4 Single-Row/-Column Memory Array Architecture and its Peripheral Blocks

Figure 5 shows the proposed memory array built from the T-SRAM cell, with low internal interconnect complexity. According to the figure, the write, control, and read blocks are based on the binary functions of the BWG CN-BGDI method listed in Table 2. To achieve full voltage swing, the write and read blocks use a modified F1 cell based on the full-swing B-GDI method [25].

Fig. 5

Proposed T-SRAM cell-based single-row/column memory array with peripheral blocks and control signals

Table 2 Basic logic functions based on BWG CN-BGDI cell [10]

According to Fig. 5, data transfer into the memory structure through the input TG network (transistors Pp and Np) is controlled only by the chip-enable (CE) signal. The control block has the highest transistor density among the peripheral blocks; however, because it is built from the NOT and F2 functions, it is immune to the problem of misaligned/misplaced CNTs [10]. Before the write cycle starts, BL is precharged to GND by activating the precharge (Pre) signal and deactivating the write-enable (WE) signal, through the PD transistors of the F1 cell in the input buffer/write block (Nw2 and Nw3). Then, when WE is switched to a low voltage level and RWL and CWL to a high voltage level, the logic data ‘0’, ‘1’, or ‘2’ at the input/output port are placed on BL through Pp, Np, Pw1, and Nw1, and are subsequently stored at node q through the access network. Meanwhile, the read block (the F1 cell in the data read path) is floating because the RE and Pre signals are inactive. The hold cycle starts when RWL and CWL are set to GND and their complements to Vdd, which disconnects the latch (the STB gate with its positive-feedback path) from BL. As in the write cycle, BL is precharged to GND before the read cycle by reactivating Pre, and the input/output port is precharged to GND by Ns2 and Ns3. When RWL and CWL are reactivated, the stored data are applied to BL. The data are then shifted to the output block by setting Pre and RE to high and low voltage levels, respectively, which establishes the conditions for transmission to the bidirectional port. The read cycle ends with the data transferred to the input/output port through transistors Ns1 and Ps1 of the F1 cell in the read block.
Because the write block leaves BL floating during the read (its TG network, Nw1 and Pw1, and its PD network, Nw2 and Nw3, are off), the read operation is not disturbed and the data reach the bidirectional port with an appropriate NM. Figure 6 shows the simulation results of the proposed memory array, which correctly receives, stores, and transmits the logic data ‘0’, ‘1’, and ‘2’ with full voltage swing. The results also show that during the first 10 ns, while CE is at a low voltage level, the array does not accept the logic data ‘2’, consistent with the input TG network being controlled by CE.

Fig. 6

Simulation results of the proposed memory array for logical data sets ‘0’, ‘1’, and ‘2’

3.5 Proposed Bit-Interleaved Architecture

Modern memory arrays are commonly based on the bit-interleaved (BI) architecture, in which each bit of a data word is written separately and the bits of different words are interleaved, so that errors can be handled with a simple bit error-correction approach [12]. Figure 7 shows a 2 × 2 BI memory array with four T-SRAM cells in the four possible positions: Select, Row Half-Select (RHS), Column Half-Select (CHS), and Unselect. Table 3 lists the status of the control signals when the digit ‘1’ is written into the upper-left T-SRAM cell. As Fig. 7 and Table 3 show, setting RWL_0 and CWL_0 to high voltage levels turns on both TGs of the upper-left (selected) cell, and the write cycle takes place in that cell. Although the RHS and CHS cells each receive one active select signal (row or column) from the decoder circuits, one of their two TGs remains off, so their q nodes cannot be accessed or changed. In the unselected (lower-right) cell, both TGs of the access network are off because RWL_1 and CWL_1 are inactive, so there is no path for data to enter the cell. Therefore, none of the three non-selected cells has a path that could change the content of its q node, and their data are preserved. Consequently, because its access network consists of a pair of TGs gated separately by the row and column word lines, the proposed SRAM structure is fully compatible with the BI architecture and solves the half-select (HS) problem.
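The half-select argument above can be sketched in Python for the 2 × 2 array of Fig. 7. The model assumes that a cell's q node can change only when both of its TGs conduct, i.e., when its RWL and CWL are both active; the function names and the initial array contents are hypothetical.

```python
def write_2x2(cells, row, col, digit):
    """Write `digit` into cell (row, col) of a 2x2 array of stored ternary digits.

    Each cell's access network is a TG pair, one TG gated by its row word line
    (RWL) and one by its column word line (CWL); q changes only if both conduct.
    """
    rwl = [int(r == row) for r in range(2)]  # row decoder output
    cwl = [int(c == col) for c in range(2)]  # column decoder output
    for r in range(2):
        for c in range(2):
            if rwl[r] and cwl[c]:
                cells[r][c] = digit
    return cells


def position(r, c, row, col):
    """Classify a cell relative to the addressed cell (row, col)."""
    if (r, c) == (row, col):
        return "Select"
    if r == row:
        return "RHS"  # same row, other column: only RWL is active
    if c == col:
        return "CHS"  # same column, other row: only CWL is active
    return "Unselect"


array = [[0, 2], [2, 0]]
write_2x2(array, 0, 0, 1)  # write digit '1' into the upper-left cell
for r in range(2):
    print([position(r, c, 0, 0) for c in range(2)], array[r])
```

Only the selected cell changes; the RHS, CHS, and unselected cells keep their digits, mirroring the signal states listed in Table 3.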

Fig. 7

Simplified configuration of 2 × 2 memory array in the operation of writing data ‘1’ inside the upper left bit-cell (red and green colors indicate low and high signal levels, respectively)

Table 3 Status of control signals in data write cycle ‘1’ in 2 × 2 memory array based on BI architecture

4 Simulation Results and Circuit-Level Evaluation

This section first presents the transistor model and the settings of the simulation environment. Then, the performance of the T-SRAM cells is evaluated in terms of hardware metrics. Finally, the noise-immunity curves of the ternary memory designs are evaluated.

4.1 Simulation Environment Setup

The simulations use the Verilog-A virtual-source (VS) CNFET model developed for BWG CNFET devices, run in the Synopsys H-SPICE environment [10]. The MC simulations are performed with a Gaussian distribution with ± 10% deviation at the ± 6σ level and 3000 iterations, in 16 nm technology (Lg = 16 nm). To obtain an optimal mean value of the emission error, the CNT pitch and the NCNT of each transistor are set to 4 nm and four tubes, respectively, following [10, 26].
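A sketch of how Gaussian parameter samples for such MC runs might be generated is given below in Python. It assumes that "± 10% deviation at the ± 6σ level" means that 6σ equals 10% of the nominal value, and it clips samples to that bound; the seed, the function name, and the use of Vfb = 0.26 V as the sampled parameter are illustrative assumptions, and the actual H-SPICE MC setup may differ.

```python
import random
import statistics


def sample_parameter(nominal, rel_dev=0.10, sigma_level=6.0, n=3000, seed=2024):
    """Gaussian samples of a device parameter for MC runs.

    Assumption: a +/-10 % deviation is read as the +/-6 sigma bound, so
    sigma = |nominal| * rel_dev / sigma_level; samples are clipped to that bound.
    """
    sigma = abs(nominal) * rel_dev / sigma_level
    lo, hi = nominal - sigma_level * sigma, nominal + sigma_level * sigma
    rng = random.Random(seed)  # fixed seed for a reproducible run
    return [min(max(rng.gauss(nominal, sigma), lo), hi) for _ in range(n)]


vfb = sample_parameter(0.26)  # e.g. Vfb of the PR-network transistors (V)
print(len(vfb), round(statistics.mean(vfb), 4), round(statistics.stdev(vfb), 5))
```

Under this reading σ is only about 4.3 mV for a 0.26 V nominal value, so the clipping at ± 6σ almost never acts; it is kept as a guard rather than as a modeling choice.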

4.2 Estimation of the Layout Area

Figure 8a, b shows 2-D and 3-D views of the transistor-level layout of a single-row/single-column memory array based on the proposed T-SRAM, drawn with the Electric VLSI computer-aided design (CAD) tool. The minimum feature size of the layout, i.e., the minimum physical gate length Lgmin = 2λ (λ being half of it), is assumed to be 16 nm. After the layout was drawn, design rule checks (DRC), layout-versus-schematic (LVS) checks, and electrical rule checks (ERC) were performed. Figure 8 shows that the proposed memory array, including its peripheral circuits, occupies an area of about 22 µm², which makes it suitable for compact SoC designs targeting IoT applications.

Fig. 8

Layout views: a 2-D and b 3-D structure of the proposed triple SRAM cell-based memory array

4.3 Evaluation of the SNM

The SNM quantifies how reliably a ternary memory cell retains its stored ternary values. It is defined as the maximum DC noise voltage (Vn) that the cell can withstand before the stored digit flips, and it can be determined graphically from the butterfly curve, formed by the VTC and its inverse (VTC−1) [7, 8]. Figure 9 shows the butterfly curves of the proposed T-SRAM under Vfb variations in an MC run with 30 iterations. As the figure shows, the SNM of each lobe (eye) is the side of the largest square that fits inside it; the SNMs of the NTI and PTI gates and of the memory cell (Cell SNM) are the smallest such values over the two and four lobes between the VTCs, respectively, as expressed by Eqs. (6) and (7):

$${\text{SNM}}_{{{\text{NTI}}}} \& {\text{ SNM}}_{{{\text{PTI}}}} = {\text{Min}}\left( {{\text{SNM}}_{0} ,{\text{SNM}}_{2} } \right)$$
(6)
$${\text{Cell}}\,{\text{ SNM }} = {\text{Min}}\left( {{\text{SNM}}_{0} ,{\text{SNM}}_{1 + } ,{\text{SNM}}_{1 - } ,{\text{SNM}}_{2} } \right)$$
(7)
Fig. 9

Butterfly diagram for the NTI and PTI gates and the proposed T-SRAM cell under process variations in the Vfb parameter

The simulation results show that the Cell SNM of the proposed ternary memory cell is limited by SNM1+ and is about 121 mV.
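The text does not state how the lobe SNMs in Eqs. (6) and (7) are extracted from the butterfly curve. The sketch below swaps in Seevinck's 45° rotation method, a standard way to find the largest square embedded in each lobe, and applies it to a hypothetical smooth ternary-inverter VTC (`sti_vtc` and `GAIN` are invented for illustration and do not model the proposed STB cell). The Cell SNM is then the minimum over the four lobes, as in Eq. (7).

```python
import bisect
import math


def _interp(q, xs, ys):
    """Piecewise-linear interpolation of ys(xs) at q; xs must be strictly increasing."""
    i = bisect.bisect_left(xs, q)
    if i <= 0:
        return ys[0]
    if i >= len(xs):
        return ys[-1]
    t = (q - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


def lobe_snms(vtc, vdd, n=4001, tol=1e-3):
    """SNM of every butterfly lobe for two identical cross-coupled stages.

    vtc is a monotonically decreasing V_out = vtc(V_in). After a 45-degree rotation,
    the diagonal distance between the VTC and its mirror is tracked along u; each
    lobe's SNM is its largest diagonal / sqrt(2), i.e. the largest embedded square's side.
    """
    r = math.sqrt(2.0)
    xs = [vdd * k / (n - 1) for k in range(n)]
    ua = [(x - vtc(x)) / r for x in xs]          # rotated coordinates of curve A
    va = [(x + vtc(x)) / r for x in xs]
    ub = [-u for u in reversed(ua)]              # mirroring about y = x flips u, keeps v
    vb = list(reversed(va))
    lo, hi = max(ua[0], ub[0]), min(ua[-1], ub[-1])
    lobes, run_sign, run_max = [], 0, 0.0
    for k in range(n):
        u = lo + (hi - lo) * k / (n - 1)
        d = _interp(u, ua, va) - _interp(u, ub, vb)
        sign = (d > 0) - (d < 0)
        if sign != run_sign:                     # curves crossed: close the current lobe
            if run_sign != 0 and run_max > tol:  # tol drops slivers near stable points
                lobes.append(run_max / r)
            run_sign, run_max = sign, 0.0
        run_max = max(run_max, abs(d))
    if run_sign != 0 and run_max > tol:
        lobes.append(run_max / r)
    return lobes


VDD = 0.5    # nominal supply used in the image-storage step (V)
GAIN = 40.0  # hypothetical transition steepness (1/V), not a fitted device value


def sti_vtc(v):
    """Hypothetical smooth ternary-inverter VTC stepping Vdd -> Vdd/2 -> 0."""
    def step(center):
        return 0.5 * (1.0 + math.tanh(GAIN * (v - center)))
    return VDD - (VDD / 2) * (step(VDD / 4) + step(3 * VDD / 4))


lobes = lobe_snms(sti_vtc, VDD)
cell_snm = min(lobes)  # Eq. (7): the smallest of the four lobe SNMs
print(f"{len(lobes)} lobes, Cell SNM = {cell_snm * 1e3:.1f} mV")
```

With measured VTC data, `vtc` would be replaced by an interpolant of the simulated transfer curve; the lobe search itself does not change.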

4.4 Investigation of the Hardware Performance Parameters

Table 4 compares the proposed and previously reported T-SRAM architectures in terms of transistor-level structure and hardware performance parameters. To place the ternary designs in context, they are also compared with the single-BL B-SRAM cell based on the BWG CN-BGDI method [12], whose core consists of two inverters with an asymmetric transistor structure based on the BWG CN-BGDI cell. As the schematics in the table show, the proposed memory architecture has a regular structure with lower complexity than the other ternary memory cells. In terms of layout, the proposed ternary cell is only slightly larger than the binary cell (about 0.139 µm²). Moreover, its layout area is 1.79 times smaller than that of the ternary cell in Ref. [14], a saving of about 44% (1 − 1/1.79 ≈ 0.44). The table also lists the SNM at each logic level for all designs. Among the ternary memory cells, the proposed cell has the largest SNM at logic level '1', which is the critical noise-margin level in ternary circuit design. Its NMs at logic levels '0', '1', and '2' differ by at most about 16.7 mV, indicating that the noise margin is distributed almost uniformly among the four lobes of the butterfly curve and is close to the ideal value of Vdd/4 (see Fig. 9).

Table 4 Evaluation of the studied BWG CNFET ternary SRAM structures based on hardware performance parameters

Table 4 also reports the delay and power consumption for the logic transitions, together with the power-delay product (PDP, the average energy) and the energy-delay product (EDP), which captures the trade-off between worst-case delay and power. According to the results, the proposed ternary cell has lower average power consumption than the B-SRAM cell when storing logic levels '0' and '2', whereas the power consumption of the memory structure in Ref. [14] at these levels is close to that of the B-SRAM cell. Moreover, the proposed architecture, which has a single DC path, yields better PDP and EDP than the best-performing ternary counterpart: its PDP and EDP are about 61% and 69%, respectively, of those of that cell (i.e., reductions of about 39% and 31%).
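The relative figures above follow from simple ratios. The sketch below assumes PDP = average power × average delay and EDP = PDP × worst-case delay (the text names worst-case delay for EDP but gives no explicit formula), and it reads the quoted 61%/69% as ratios of the proposed cell's values to those of the best counterpart, which matches the 38.8%/31% improvements stated in the Conclusion. The helper names are hypothetical.

```python
def pdp(avg_power_w, avg_delay_s):
    """Power-delay product: average energy per operation (J)."""
    return avg_power_w * avg_delay_s


def edp(avg_power_w, avg_delay_s, worst_delay_s):
    """Energy-delay product: PDP multiplied by the worst-case delay (J*s)."""
    return pdp(avg_power_w, avg_delay_s) * worst_delay_s


def reduction(new, ref):
    """Fractional reduction of `new` relative to `ref` (0.39 means 39% lower)."""
    return 1.0 - new / ref


# Ratios quoted in the text (proposed cell / best ternary counterpart)
print(f"PDP reduction: {reduction(0.61, 1.0):.0%}")  # 39%
print(f"EDP reduction: {reduction(0.69, 1.0):.0%}")  # 31%
# Layout: 1.79x smaller than the cell of Ref. [14]
print(f"Area saving: {reduction(1 / 1.79, 1.0):.1%}")  # 44.1%
```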

4.5 Evaluation of Other T-SRAM Cells Based on Triple FoMs

In this section, the ternary memory structures are compared comprehensively using FoMs obtained by extending and modifying the FoMs proposed in Refs. [27] and [28] for evaluating binary SRAM designs. The FoM equations are formulated to incorporate other factors that are important for the hardware efficiency of ternary designs. Because adequate performance at logic '1' is critical for ternary memories, the FoM of Ref. [27] is extended into a ternary FoM, the Tri-EQM (tri-variate electrical quality metric), which incorporates further factors affecting the generation of logic '1' and is expressed by Eq. (8):

$${\text{Tri-EQM}} = \frac{{\text{SNM}}_{\text{`1'}}}{\left( {\text{Write `1' energy}} \times {\text{Read `1' energy}} \right) \times {\text{Leakage power}} \times {\text{Normalized area}}}$$
(8)

The numerator contains the SNM at logic '1', and the denominator contains circuit-level parameters that are affected by variations in the cell voltage. Similarly, building on the FoM of Ref. [28] (Eq. (1)) and on the particular importance of the number of CNTs for the hardware efficiency of CNFET designs, a ternary FoM, the tri-variate energy consumption per number of nanotubes (Tri-ECPOT), is proposed for a comprehensive evaluation of ternary designs, as expressed by Eq. (9):

$${\text{Tri-ECPOT}} = \frac{{\text{Avg. propagation delay}} \times {\text{Avg. power consumption (at logic levels `0', `1', `2')}}}{{\text{Number of tubes}} \times {\text{Number of transistors per cell}}}$$
(9)
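Eqs. (8) and (9) can be evaluated directly once the cell-level quantities are known. The sketch below implements both as written; the input values are hypothetical placeholders (not Table 4 data), and whether "number of tubes" counts CNTs per transistor or per cell is not specified in the text, so `n_tubes` is simply passed through.

```python
from statistics import fmean


def tri_eqm(snm1_v, write1_j, read1_j, leakage_w, norm_area):
    """Eq. (8): SNM at logic '1' divided by
    (write-'1' energy x read-'1' energy) x leakage power x normalized area."""
    return snm1_v / ((write1_j * read1_j) * leakage_w * norm_area)


def tri_ecpot(avg_delay_s, powers_w_by_level, n_tubes, n_transistors):
    """Eq. (9): average delay x average power over logic '0', '1', '2',
    divided by (number of tubes x number of transistors in the cell)."""
    avg_power = fmean(powers_w_by_level.values())
    return (avg_delay_s * avg_power) / (n_tubes * n_transistors)


# Hypothetical inputs for illustration only
eqm = tri_eqm(snm1_v=0.121, write1_j=2e-17, read1_j=1e-17, leakage_w=5e-9, norm_area=1.0)
ecpot = tri_ecpot(12e-12, {"0": 1e-7, "1": 3e-7, "2": 2e-7}, n_tubes=4, n_transistors=10)
print(f"Tri-EQM = {eqm:.3e}, Tri-ECPOT = {ecpot:.3e}")
```

For a plot like Fig. 10, each design's value would then be divided by a common reference value; the reference used for normalization is not stated in the text.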

Figure 10 shows the normalized Tri-EQM and Tri-ECPOT results for the different designs as Vdd is varied from 0.35 V to 0.65 V at ambient temperature. Both metrics vary significantly with Vdd for all designs. The proposed memory cell achieves better Tri-EQM and Tri-ECPOT values than the other structures across the tested range; at the nominal supply voltage and in the same design technology, it improves Tri-EQM and Tri-ECPOT by about 2.14× and 52%, respectively, over the other ternary memory structures.

Fig. 10

Evaluation results of a Normalized Tri-EQM, and b Normalized Tri-ECPOT

4.6 Investigation and Evaluation of the Noise-Immunity Curve

Because the dynamic noise margin (DNM) problem is critical, especially for MVL-based designs, it must be studied for ternary memory structures. Building on the noise injection circuit (NIC) of Ref. [29] for evaluating noise in ternary logic structures, an improved NIC based on BWG CN-BGDI cell functions is proposed. Figure 11 shows the transistor-level circuit of the proposed ternary NIC (T-NIC). The input stage uses an adjustable delay block that shifts one of the two inputs of an XNOR gate, so the noise duration (tn), i.e., the noise pulse width (Wn), can be set by the control signal Vc. After the XNOR gate (middle stage) forms the noise pulse, the STI gate of Ref. [19] (last stage) sets the pulse amplitude (Vn) to one of the ternary levels. To generate incremental noise pulses from logic level '0' to '1', the Vdd,n terminals are set to Vdd/2, whereas for incremental noise pulses from logic level '1' to '2', they are set to the nominal supply voltage (Vdd). Figure 12a shows a 3-D view of the noise-immunity curves obtained by applying noise pulses from the proposed T-NIC to the ternary memory cells. When a noise pulse lies above the curve of a memory design (i.e., it falls in the vulnerable region), it can corrupt or flip the data stored in that design. The proposed memory architecture has the highest curve among the designs, indicating its greater robustness against noise disturbances. For a comprehensive evaluation of noise immunity based on the T-NIC results, the average noise threshold energy (ANTE) can be used, as given by Eq. (10) [10]:

$${\text{ANTE}} = E\left( V_{\text{noise}}^{2} \times T_{\text{noise}} \right) = \frac{\sum_{i = t_{n}}^{t_{n,\text{crit}}} {\text{DNM}}_{i}}{N}$$
(10)
Fig. 11

Proposed T-NIC: a Noise pulse increment from logic level ‘0’ to ‘1’, and b Noise pulse increment from logic level ‘1’ to ‘2’
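The pulse-forming idea behind the T-NIC can be illustrated at the behavioral level: XNOR-ing a signal with a delayed copy of itself yields a pulse of width equal to the delay at every edge. The sketch below is an idealized model, not the transistor circuit; the sample-based delay stands in for the Vc-controlled delay, and mapping the pulse to `base`/`peak` levels (GND to Vdd/2 for '0' to '1' noise, Vdd/2 to Vdd for '1' to '2' noise) is an assumed reading of the Vdd,n settings.

```python
def xnor_noise_pulses(signal, delay, base, peak):
    """Idealized behavioral view of T-NIC pulse formation (not the transistor circuit).

    signal: list of 0/1 samples; delay: shift in samples, playing the role of the
    Vc-controlled delay that sets the pulse width Wn. XNOR(signal, delayed signal)
    is 0 for `delay` samples after every edge; those samples are driven to `peak`
    (the noise amplitude Vn), and all others stay at `base`.
    """
    delayed = [signal[0]] * delay + signal[:len(signal) - delay]
    xnor = [1 if a == b else 0 for a, b in zip(signal, delayed)]
    return [peak if x == 0 else base for x in xnor]


VDD = 0.5
data = [0] * 5 + [1] * 10 + [0] * 5
# '0' -> '1' noise: baseline GND, pulse amplitude Vdd/2
wave = xnor_noise_pulses(data, delay=3, base=0.0, peak=VDD / 2)
print([i for i, v in enumerate(wave) if v == VDD / 2])  # [5, 6, 7, 15, 16, 17]
```

Sweeping `delay` (Wn) and `peak` (Vn) until the stored value flips traces one point per setting on a noise-immunity curve like those in Fig. 12a.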

Fig. 12

a Noise-immunity curves (each point on a curve consists of the two components Vn and Wn), and b NANTE results for the different T-SRAM cells

Here, E(·) denotes the expectation operator, tn and tn,crit are the first and last time points considered in the DNM evaluation, and N is the number of test points. A higher ANTE indicates that a circuit structure is more robust against DNM.

In addition, combining the concepts of ANTE and PDP yields a more comprehensive criterion for comparing circuit designs independently of transistor sizing, the normalized ANTE (NANTE), given by Eq. (11):

$${\text{NANTE}} = {\text{ANTE}}\left( {V^{2} .s} \right)/{\text{PDP}}_{{{\text{crit}}}} \left( {{\text{nW}}.{\text{ps}}} \right)$$
(11)

PDPcrit is the average power-delay product obtained for the sensitive node under the influence of noise. Figure 12b shows the NANTE results for the different memory schemes with noise applied to node q by the proposed T-NIC. The proposed design has the highest NANTE among the T-SRAM designs, indicating the best behavior from the DNM point of view, whereas the memory design in Ref. [14] has the poorest noise immunity.
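Eqs. (10) and (11) reduce to an average over the threshold points of a noise-immunity curve followed by a division. The sketch below assumes each test point contributes DNM_i = Vn,i² × Wn,i (consistent with E(V²noise × Tnoise)); the curve points and PDPcrit value are hypothetical.

```python
def ante(curve):
    """Eq. (10): mean of Vn^2 x Wn over the N test points of a noise-immunity curve.

    curve: list of (Vn in volts, Wn in seconds) failure-threshold points.
    """
    return sum(vn ** 2 * wn for vn, wn in curve) / len(curve)


def nante(ante_v2s, pdp_crit_nw_ps):
    """Eq. (11): ANTE (V^2*s) divided by PDP_crit expressed in nW*ps."""
    return ante_v2s / pdp_crit_nw_ps


# Hypothetical threshold points: wider pulses need less amplitude to flip the cell
curve = [(0.30, 20e-12), (0.25, 40e-12), (0.22, 80e-12)]
a = ante(curve)
print(f"ANTE = {a:.3e} V^2*s, NANTE = {nante(a, pdp_crit_nw_ps=50.0):.3e}")
```

Dividing by PDPcrit rewards designs that reach a given noise threshold energy at lower energy cost at the sensitive node.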

5 Application Level: Image Processing

Binary image processing, which quantizes gray image pixels into black and white pixels (i.e., binary data '0' and '1'), is widely used to simplify hardware design and save energy owing to its speed [17, 30, 31]. However, image processing based on binary logic reduces accuracy and loses important data. Figure 13 compares binary and ternary versions of the MATLAB grayscale image "Penguins" to illustrate this data loss: more details of the original gray image are lost in the binary image than in the ternary image. Preserving an intermediate gray level through ternary logic therefore retains more information, which can benefit image-processing applications such as pattern recognition and machine vision [16].
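The information retained by a third level can be quantified with a simple experiment. The sketch below quantizes a synthetic 0 to 255 gray ramp to two and to three levels and compares the mean squared error; the thresholds (128 for binary, 85 and 170 for ternary) are assumptions, and the ramp stands in for an image row rather than the "Penguins" image.

```python
def quantize_binary(p):
    """Black/white: threshold assumed at mid-scale (128)."""
    return 0 if p < 128 else 255


def quantize_ternary(p):
    """Black/gray/white (0/128/255); thresholds assumed at 85 and 170."""
    if p < 85:
        return 0
    if p < 170:
        return 128
    return 255


def mse(quantizer, pixels):
    """Mean squared error between pixels and their quantized values."""
    return sum((p - quantizer(p)) ** 2 for p in pixels) / len(pixels)


ramp = list(range(256))  # synthetic gray ramp standing in for an image row
print(f"binary MSE = {mse(quantize_binary, ramp):.1f}")    # 5397.5
print(f"ternary MSE = {mse(quantize_ternary, ramp):.1f}")  # 1799.6
```

The ternary levels 0, 128, and 255 match the module values used for the grayscale QR codes in Section 5.1.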

Fig. 13

Qualitative comparison between penguin images to show data loss in binary image processing: a original grayscale image, c binary level image, and d ternary level image

5.1 Colored QR Code Image Storage Using the Proposed Algorithm

Grayscale QR code images, which have a larger storage capacity than conventional binary QR codes [15], are suitable candidates for data storage based on ternary logic. Such images consist of three module types: black with the value 0 (logic '0'), gray with the value 128 (logic '1'), and white with the value 255 (logic '2'). Figure 14 shows the steps for storing a gray QR code image in the ternary memory array using the proposed hardware algorithm. The process relies on a co-simulation link between Synopsys H-SPICE and MATLAB: MATLAB performs the pre-processing and post-processing steps, while the circuit simulation runs in H-SPICE. The steps of the storage process, shown in Fig. 14 and summarized by the proposed algorithm in Table 5, are as follows:

  • Step 1 First, a colored QR code image is generated by merging binary and grayscale QR code images, following the process shown in Fig. 15. The colored QR image is then imported into MATLAB and resized to n (length) × m (width) = 255 × 255 pixels (65,025 pixels in total) using MATLAB functions.

  • Step 2 This step generates a three-level pulse waveform from the pixel data of the colored QR image, with one pulse per image pixel, to drive the memory array in H-SPICE. Using the mapping PIXEL × Vdd/255 with Vdd = 0.5 V, each pixel of the colored QR code image matrix is multiplied by about 0.00196 (0.5/255), producing a voltage matrix whose components lie around three voltage levels (GND, Vdd/2, and Vdd). The resulting matrix is then converted into a pulse waveform by the piecewise-linear (PWL) function in H-SPICE and applied to the transistor-level circuit.

  • Step 3 The values stored at the transistor level are read back from H-SPICE into MATLAB and, through the inverse mapping, converted into a matrix whose components are pixel values between 0 and 255. Finally, MATLAB image-processing commands display the stored image from this matrix for quality evaluation. Because the pixel data are applied as voltages directly to the transistor-level memory circuit, the proposed algorithm yields results that reflect realistic hardware behavior.
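Steps 2 and 3 amount to a linear pixel-to-voltage mapping, a PWL waveform, and an inverse mapping. The sketch below writes that pipeline in Python instead of MATLAB; the source name `VIN`, node `bl`, per-pixel time slot, and edge time are hypothetical, and snapping read-back voltages to the nearest of GND, Vdd/2, and Vdd is an assumed form of the "return function".

```python
VDD = 0.5        # supply voltage from Step 2 (V)
T_PIXEL = 1e-9   # hypothetical time slot per pixel (s); not given in the text
T_EDGE = 10e-12  # hypothetical transition time (s)


def pixel_to_voltage(pixel):
    """Step 2 mapping: V = PIXEL x Vdd / 255 (0 -> 0 V, 128 -> ~Vdd/2, 255 -> Vdd)."""
    return pixel * VDD / 255


def pwl_source(name, node, pixels):
    """Build an HSPICE-style PWL source that holds each pixel voltage for T_PIXEL."""
    points = []
    for k, p in enumerate(pixels):
        v = pixel_to_voltage(p)
        t0 = k * T_PIXEL
        points.append(f"{t0 + T_EDGE:.9e} {v:.6f}")   # settle at the new level
        points.append(f"{t0 + T_PIXEL:.9e} {v:.6f}")  # hold until the slot ends
    return f"{name} {node} 0 PWL({' '.join(points)})"


def voltage_to_pixel(v):
    """Step 3 inverse mapping: snap a read-back voltage to the nearest ternary level."""
    levels = {0: 0.0, 128: VDD / 2, 255: VDD}
    return min(levels, key=lambda px: abs(levels[px] - v))


print(pwl_source("VIN", "bl", [0, 128, 255]))
print([voltage_to_pixel(pixel_to_voltage(p)) for p in (0, 128, 255)])  # [0, 128, 255]
```

For a full 255 × 255 image the point list is long, so in a real netlist it would be split across SPICE "+" continuation lines.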

Fig. 14

The storage mechanism for a colored QR code image sample based on the proposed algorithm, with real-world verification using a smartphone

Table 5 Overview of the steps of the proposed algorithm based on the link between MATLAB & HSPICE
Fig. 15

Merging binary QR code images at two brightness levels to create a colored QR code image

5.2 Performance Evaluation and Quality Estimation

To assess the quality of data storage in the proposed ternary memory array, two test images based on colored QR codes with different text content are considered. Table 6 lists the quality of the stored images in terms of PSNR and MSSIM at the NTI, PTI, and q nodes (the sensitive memory nodes) of the buffer. Each value in the table is the average of three consecutive runs. The PSNR and MSSIM values obtained are about 42 dB and 0.99, respectively, indicating that the proposed design is suitable for Internet-of-things-based platforms.
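PSNR compares the read-back image with the original through the mean squared error, PSNR = 10·log10(255²/MSE) for 8-bit images. A minimal pure-Python sketch follows; the 2 × 2 patch is hypothetical. MSSIM needs windowed statistics and is usually taken from a library, for example `structural_similarity` in scikit-image's `skimage.metrics` module.

```python
import math


def psnr(original, stored, max_val=255):
    """PSNR in dB between two equally sized 8-bit images given as nested lists."""
    a = [p for row in original for p in row]
    b = [p for row in stored for p in row]
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return math.inf if mse == 0 else 10 * math.log10(max_val ** 2 / mse)


qr = [[0, 128], [255, 128]]        # tiny ternary QR patch (hypothetical)
readback = [[0, 128], [255, 126]]  # one gray module read back 2 levels low
print(f"{psnr(qr, readback):.2f} dB")  # MSE = 1 -> 48.13 dB
```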

Table 6 The results of storing color QR code images in the proposed memory structure with PSNR and MSSIM values

Table 7 lists hardware performance parameters (delay, power, and average energy consumption) and quality (the PSNR of the colored QR code image at the sensitive node q) for the different memory schemes. The proposed memory design performs well in terms of both hardware efficiency (an average 1.5× improvement in PDP) and quality (an average 5.4% improvement in PSNR) compared with the memory designs in Refs. [13, 14]. Following Refs. [32, 33], an energy-efficiency FoM that combines hardware-efficiency and quality metrics is proposed for a comprehensive evaluation of T-SRAM architectures, as expressed in Eq. (12):

$${\text{FoM}} = \frac{{\overbrace {{\left( {{\text{PSNR}} \times {\text{Mean}} \,{\text{SSIM}}} \right)^{2} }}^{{{\text{Quality}}\,{\text{ metrics}}}} \times \overbrace {{{\text{Transistor}}\, {\text{saving}} }}^{{{\text{Cell}} \,{\text{density}}}}}}{{\underbrace {{{\text{EDP}}\, {\text{saving}}}}_{{{\text{Speed}} \& {\text{Power}}}} }}$$
(12)
Table 7 Hardware-efficiency and quality evaluation results for the different designs

A higher value of this FoM indicates that a memory structure is better suited to image-processing applications based on ternary logic. EDPsaving and TransistorSaving quantify the improvement in energy-delay product and in cell density (transistor count), respectively, relative to the cell structure in Ref. [14], which has the worst performance. Figure 16 shows the proposed FoM for the ternary memory architectures. The proposed memory array outperforms both counterparts, whereas the memory structure in Ref. [14] has the lowest FoM for both images owing to its higher PDP and poorer image storage quality.
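Eq. (12) is straightforward to evaluate once the two savings terms are fixed. The text does not give explicit formulas for them, so the sketch assumes TransistorSaving = N_ref / N and EDPsaving = EDP / EDP_ref relative to the Ref. [14] cell, which makes fewer transistors and a lower EDP both raise the FoM. All numeric inputs are hypothetical.

```python
def image_fom(psnr_db, mssim, transistor_saving, edp_saving):
    """Eq. (12): (PSNR x MSSIM)^2 x TransistorSaving / EDPsaving."""
    return (psnr_db * mssim) ** 2 * transistor_saving / edp_saving


def savings_vs_ref(n_tr, n_tr_ref, edp, edp_ref):
    """Assumed definitions relative to the Ref. [14] cell:
    TransistorSaving = N_ref / N and EDPsaving = EDP / EDP_ref."""
    return n_tr_ref / n_tr, edp / edp_ref


# Hypothetical design: 10 vs 12 transistors, EDP at 60% of the reference
tr_saving, edp_saving = savings_vs_ref(10, 12, 0.6e-30, 1.0e-30)
print(f"FoM = {image_fom(42.0, 0.99, tr_saving, edp_saving):.1f}")
```

With these definitions, the reference cell itself scores (PSNR × MSSIM)², which gives a natural baseline for comparison.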

Fig. 16

Evaluation of triple memory architecture based on the proposed FoM

6 Conclusion

In this paper, a new single-BL T-SRAM cell with minimal layout complexity, based on the BWG CN-TGDI method, is proposed for IoT-based platforms. The memory structure combines an STB gate with a single positive-feedback loop (the latch part) and a pair of TGs (the access network). In the STB cell, Vth,CNT is calibrated through the Vfb parameter to extend the SNM at logic level '1'. In addition, PR and stacked transistor structures are used in the STB cell to control the static current paths, especially when generating logic level '1'. Based on the proposed ternary SRAM cell, a memory array with peripheral circuits that uses the functions of the BWG CN-BGDI cell and supports the BI architecture is developed in a layout area of about 22 µm².

The hardware performance evaluation shows that the proposed memory cell improves PDP and EDP by about 38.8% and 31%, respectively, compared with the best-performing counterpart. At the nominal supply voltage, the Tri-EQM and Tri-ECPOT metrics improve by 2.14× and 52%, respectively, relative to the counterpart memory cells. Noise immunity, evaluated with the proposed T-NIC circuit, shows a 1.93× improvement in NANTE for the proposed scheme over the best-performing structure.

Finally, the memory architecture is used in a real application to store colored QR code images with ternary data content using the proposed hardware algorithm. The simulation results confirm the good performance of the proposed memory array in terms of the PSNR and MSSIM quality metrics, which are about 42 dB and 0.99, respectively. To evaluate the memory architectures comprehensively, a FoM based on hardware performance and quality metrics is proposed. The results show that the proposed architecture outperforms the counterpart memory structure by about 54.5%. Therefore, the proposed ternary memory array, with minimal complexity and a small layout area, is a suitable candidate for SoC chips targeting IoT platforms.