1 Introduction

Conventional memory technologies, i.e., SRAM, DRAM, and Flash, have achieved remarkable success in the modern computer industry. With technology scaling, however, the shrinking feature size and the increasing process variations impose serious power and reliability concerns on these technologies.

In recent years, many nonvolatile memory technologies have emerged. As one promising candidate, spin-transfer torque random access memory (STT-RAM) has demonstrated great potential in embedded memory and on-chip cache designs [16] by combining the non-volatility of Flash, a cell density comparable to DRAM, and a nanosecond programming time like SRAM.

In STT-RAM, data is represented as the resistance state of a magnetic tunneling junction (MTJ) device. The MTJ resistance state can be programmed by applying a switching current with different polarizations. Compared to the charge-based storage mechanism of conventional memories, the magnetic storage mechanism of STT-RAM shows less dependency on the device volume and hence better scalability. Despite these advantages, the unreliable write operation and the high write energy remain the major issues in STT-RAM designs. These design metrics are significantly impacted by the prominent statistical factors of STT-RAM, including CMOS/MTJ device process variations under scaled technology and the probabilistic MTJ switching behaviors [7, 8]. In particular, the randomness of the MTJ switching process incurred by thermal fluctuations may generate intermittent write failures of STT-RAM cells.

Many studies have evaluated the impacts of process variations and thermal fluctuations on STT-RAM reliability [9–11]. The general evaluation method is as follows. First, Monte-Carlo SPICE simulations are run extensively to characterize the distribution of the MTJ switching current I during STT-RAM write operations, considering the device variations of both the MTJ and the MOS transistor. Then, the I samples are sent into the macro-magnetic model to obtain the MTJ switching time (τ th ) distributions under thermal fluctuations. Finally, the τ th distributions of all I samples are merged to generate the overall MTJ switching performance distribution. A write failure happens when the applied write pulse width is smaller than the needed τ th . However, the costly Monte-Carlo runs and the dependency on the macro-magnetic and SPICE simulations incur huge computation complexity [12–15], limiting the application of such a simulation method at the early stage of STT-RAM design and optimization. Meanwhile, the modeling of write energy in STT-RAM has also been studied extensively [16]. However, many such works assume that the write energy of STT-RAM is deterministic and therefore cannot take into account its statistical characteristics induced by process variations and thermal fluctuations.

In this chapter, we propose “PS3-RAM”, a fast, portable and scalable statistical STT-RAM reliability/energy analysis method. PS3-RAM includes three integrated steps: (1) characterizing the MTJ switching current distribution under both MTJ and CMOS device variations; (2) recovering MTJ switching current samples from the characterized distributions for the MTJ switching performance evaluation; and (3) simulating the thermal-induced MTJ switching variations based on the recovered MTJ switching current samples. By introducing the sensitivity analysis technique to capture the statistical characteristics of the MTJ switching, and a dual-exponential model to efficiently and accurately recover the MTJ switching current samples for statistical STT-RAM thermal analysis, PS3-RAM can reduce the run time cost by multiple orders of magnitude (\( >{10}^5 \)) with marginal accuracy degradation under any variation configuration when compared to SPICE-based Monte-Carlo simulations. Finally, we release PS3-RAM from its dependence on SPICE and macro-magnetic modeling and simulations, and extend its application to array-level reliability analysis and the design space exploration of STT-RAM.

The rest of this chapter is organized as follows: Section 2 gives the preliminaries of STT-RAM; Section 3 presents the details of the PS3-RAM method; Section 4 presents the application of PS3-RAM to cell- and array-level reliability analysis and design space exploration; Section 5 shows the deterministic/statistical write energy analysis based on PS3-RAM; Section 6 discusses the computation complexity; the Appendix gives the detailed theoretical model deduction and its numerical validation for the sensitivity analysis.

2 Preliminary

2.1 STT-RAM Basics

Figure 1c shows the popular “one-transistor-one-MTJ (1T1J)” STT-RAM cell structure, which includes an MTJ and an NMOS transistor connected in series. In the MTJ, an oxide barrier layer (e.g., MgO) is sandwiched between two ferromagnetic layers. ‘0’ and ‘1’ are stored as different resistances of the MTJ. When the magnetization directions of the two ferromagnetic layers are parallel (anti-parallel), the MTJ is in its low (high) resistance state. Figure 1a, b show the low and the high MTJ resistance states, which are denoted by RL and RH, respectively. The MTJ switches from ‘0’ to ‘1’ when the switching current flows from the reference layer to the free layer, or from ‘1’ to ‘0’ when the switching current flows in the opposite direction.

Fig. 1 STT-RAM basics. (a) Parallel (low resistance). (b) Anti-parallel (high resistance). (c) 1T1J cell structure

2.2 Process Variations and Programming Uncertainty of STT-RAM

2.2.1 Process Variations-Persistent Errors

The current through the MTJ is affected by the process variations of both the transistor and the MTJ. For example, the driving ability of the NMOS transistor is subject to the variations of the transistor channel length (L), width (W), and threshold voltage (V th). The MTJ resistance variation also affects the NMOS transistor driving ability by changing its bias condition. A degraded MTJ switching current leads to a longer MTJ switching time and consequently results in an incomplete MTJ switching before the write pulse ends. Errors of this kind are referred to as “persistent” errors, which are incurred mainly by device parametric variations. Persistent errors can be measured and repeated after the chip is fabricated.

2.2.2 Thermal Fluctuation-Non-persistent Errors

The other kind of errors is called “non-persistent” errors, which happen intermittently and may not be repeatable. The non-persistent errors of STT-RAM are mostly caused by the intrinsic thermal fluctuations during MTJ switching [17]. In general, the impact of thermal fluctuations can be modeled by adding the thermally induced random field \( {\overrightarrow{h}}_{fluc} \) to the stochastic Landau-Lifshitz-Gilbert (LLG) equation [17]:

$$ \frac{d\overrightarrow{m}}{dt}=-\overrightarrow{m}\times \left({\overrightarrow{h}}_{eff}+{\overrightarrow{h}}_{fluc}\right)+\alpha \overrightarrow{m}\times \left(\overrightarrow{m}\times \left({\overrightarrow{h}}_{eff}+{\overrightarrow{h}}_{fluc}\right)\right)+\frac{{\overrightarrow{T}}_{norm}}{M_s} $$
(1)

where \( \overrightarrow{m} \) is the normalized magnetization vector. Time t is normalized by γM s ; γ is the gyro-magnetic ratio and M s is the saturation magnetization. \( {\overrightarrow{h}}_{eff}=\frac{{\overrightarrow{H}}_{eff}}{M_s} \) is the normalized effective magnetic field. \( {\overrightarrow{h}}_{fluc} \) is the normalized thermal agitation fluctuating field at finite temperature, which represents the thermal fluctuation. α is the LLG damping parameter. \( {\overrightarrow{T}}_{norm}=\frac{\overrightarrow{T}}{M_sV} \) is the spin torque term with units of magnetic field. The net spin torque \( \overrightarrow{T} \) can be obtained through a microscopic quantum electronic spin transport model. Due to thermal fluctuations, the MTJ switching time is not a constant value but rather a distribution, even under a constant switching current.

3 PS3-RAM Method

Figure 2 depicts the overview of our proposed PS3-RAM method, which mainly includes the sensitivity analysis for MTJ switching current (I) characterization, the I sample recovery, and the statistical thermal analysis of STT-RAM. The first step is to configure the variation-aware cell library by inputting both the nominal design parameters and their corresponding variations, such as the channel length/width/threshold voltage of the NMOS transistor and the thickness/area of the MTJ device. Then a multi-dimensional sensitivity analysis is conducted to characterize the statistical properties of I, followed by a window-based smooth filter to improve its accuracy. After that, the write current samples can be recovered based on the characterized statistics and the current distribution model. The write pulse distribution is generated by mapping the switching current samples to write pulse samples while considering the thermal fluctuations. Finally, once the write pulse is determined, the statistical write energy and the STT-RAM cell write error rate can be evaluated based on the write current samples. Array-level analysis and design optimizations can also be conducted by using PS3-RAM.

Fig. 2 Overview of PS3-RAM

3.1 Sensitivity Analysis on MTJ Switching

In this section, we present the sensitivity model used to characterize the MTJ switching current distribution. We then analyze in detail the contributions of different variation sources to the distribution of the MTJ switching current. The definitions of the variables used in our analysis are summarized in Table 1.

Table 1 Simulation parameters and environment setting

3.1.1 Sensitivity Analysis on Variations

  1. Threshold voltage variations: The variations of channel length, width and threshold voltage are the three major factors causing the variations of transistor driving ability. The V th variation mainly comes from random dopant fluctuation (RDF) and line-edge roughness (LER), the latter of which is also the source of some geometry variations (i.e., L and W) [18, 19]. The V th variation is also correlated with L and W, and its variance decreases when the transistor size increases. The deviation of V th from its nominal value following the change of L (ΔL) can be modeled by [15]:

    $$ \Delta {V}_{th}=\Delta {V}_{th0}+{V}_{ds}\exp \left(-\frac{L}{l^{\prime }}\right)\cdot \frac{\Delta L}{l^{\prime }} $$
    (2)

    Then the variance of V th can be calculated as:

    $$ {\sigma}_{V_{th}}^2=\frac{C_1}{WL}+\frac{C_2}{ \exp \left(L/{l}^{\prime}\right)}\cdot \frac{W_c}{W}\cdot {\sigma}_L^2 $$
    (3)

    Here W c is the correlation length of non-rectangular gate (NRG) effect, which is caused by the randomness in sub-wavelength lithography. C 1, C 2 and l′ are technology dependent coefficients. The first term in (3) describes the RDF’s contribution to \( {\sigma}_{V_{th}} \). The second term in (3) represents the contribution from NRG, which is heavily dependent on L and W. Following technology scaling, the contribution of this term becomes prominent due to the reduction of L and W.

  2. Sensitivity analysis on variations: Although the contributions of MTJ and MOS transistor parametric variabilities to the MTJ switching current distribution cannot be explicitly expressed, it is still possible to conduct a sensitivity analysis to obtain the critical characteristics of the distribution. Without loss of generality, the MTJ switching current I can be modeled as a function of W, L, V th , A and T thick . A and T thick are the MTJ surface area and MgO layer thickness, respectively. The 1st-order Taylor expansion of I around the mean values of every parameter is:

    $$ \begin{array}{l}\kern0.35em I\left(W,L,{V}_{th},A,{T}_{thick}\right)\approx I\left(\overline{W},\overline{L},{\overline{V}}_{th},\overline{A},\overline{T_{thick}}\right)+\frac{\partial I}{\partial W}\left(W-\overline{W}\right)+\frac{\partial I}{\partial L}\left(L-\overline{L}\right)\\ {}\kern9.7em +\frac{\partial I}{\partial {V}_{th}}\left({V}_{th}-{\overline{V}}_{th}\right)+\frac{\partial I}{\partial A}\left(A-\overline{A}\right)+\frac{\partial I}{\partial {T}_{thick}}\left({T}_{thick}-\overline{T_{thick}}\right)\end{array} $$
    (4)

    Here W, L and T thick generally follow Gaussian distributions [9], A is the product of two independent Gaussian random variables, and V th is correlated with W and L, as shown in (2) and (3).

    Because the MTJ resistance \( R\propto \frac{e^{T_{thick}}}{A} \) [9], we have:

    $$ \frac{\partial I}{\partial A}\Delta A+\frac{\partial I}{\partial {T}_{thick}}\Delta {T}_{thick}=\frac{\partial I}{\partial R}\left(\frac{\partial R}{\partial A}\Delta A+\frac{\partial R}{\partial {T}_{thick}}\Delta {T}_{thick}\right)=\frac{\partial I}{\partial R}\Delta R $$
    (5)

    Equation (5) indicates that the combined contribution of A and T thick is the same as the impact of MTJ resistance. The difference between the actual I and its mathematical expectation μ I can be calculated by:

    $$ I\left(W,L,{V}_{th},R\right)-E\left(I\left(\overline{W},\overline{L},{\overline{V}}_{th},\overline{R}\right)\right)\approx \frac{\partial I}{\partial W}\Delta W+\frac{\partial I}{\partial L}\Delta L+\frac{\partial I}{\partial {V}_{th}}\Delta {V}_{th}+\frac{\partial I}{\partial R}\Delta R $$
    (6)

    Here we assume \( {\mu}_I\approx E\left(I\left(\overline{W},\overline{L},{\overline{V}}_{th},\overline{R}\right)\right)=I\left(\overline{W},\overline{L},{\overline{V}}_{th},\overline{R}\right) \) and the mean of the MTJ resistance \( \overline{R}\approx R\left(\overline{A},\overline{T_{thick}}\right) \). Combining (2), (3), and (6), the variance of I (\( {\sigma}_I^2 \)) can be calculated as:

    $$ \begin{array}{l}{\sigma}_I^2={\left(\frac{\partial I}{\partial W}\right)}^2{\sigma}_W^2+{\left(\frac{\partial I}{\partial L}\right)}^2{\sigma}_L^2+{\left(\frac{\partial I}{\partial R}\right)}^2{\sigma}_R^2\\ {}\kern2.3em +\kern0.35em {\left(\frac{\partial I}{\partial {V}_{th}}\right)}^2\left(\frac{C_1}{WL}+\frac{C_2}{ \exp \left(L/{l}^{\prime}\right)}\cdot \frac{W_c}{W}\cdot {\sigma}_L^2\right)\kern0.35em +\kern0.35em 2\frac{\partial I}{\partial L}\frac{\partial I}{\partial {V}_{th}}{\rho}_1\sqrt{\frac{C_1}{WL}}{\sigma}_L\\ {}\kern-2.5em +\kern0.35em 2\frac{\partial I}{\partial W}\frac{\partial I}{\partial {V}_{th}}{\rho}_2\sqrt{\frac{C_1}{WL}}{\sigma}_W+2\frac{\partial I}{\partial L}\frac{\partial I}{\partial {V}_{th}}{V}_{ds} \exp \left(-\frac{L}{l^{\prime }}\right)\frac{\sigma_L^2}{l^{\prime }}\end{array} $$
    (7)

    Here \( {\rho}_1=\frac{\operatorname{cov}\left({V}_{th0},L\right)}{\sqrt{\sigma_{V_{th0}}^2{\sigma}_L^2}} \) and \( {\rho}_2=\frac{\operatorname{cov}\left({V}_{th0},W\right)}{\sqrt{\sigma_{{}_{V_{th0}}}^2{\sigma}_W^2}} \) are the correlation coefficients between V th0 and L or W, respectively [19]. \( {\sigma}_{V_{th0}}^2=\frac{C_1}{WL} \). Our further analysis shows that the last three terms at the right side of (7) are significantly smaller than other terms and can be safely ignored in the simulations of STT-RAM normal operations.

    The accuracy of the coefficients in front of the variances of every parameter at the right side of (7) can be improved by applying window-based smooth filtering. Taking W as an example, we have:

    $$ {\left(\frac{\partial I}{\partial W}\right)}_i=\frac{I\left(\overline{W}+i\Delta W,L,{V}_{th},R\right)-I\left(\overline{W}-i\Delta W,L,{V}_{th},R\right)}{2i\Delta W} $$
    (8)

    where \( i=1,2,\dots, K \). A different estimate of \( \frac{\partial I}{\partial W} \) is obtained at each step i. The K estimates can then be combined by a window-based smooth filter, which balances the accuracy and the computation complexity:

    $$ \overline{\frac{\partial I}{\partial W}}={\displaystyle \sum_{i=1}^K{\omega}_i{\left(\frac{\partial I}{\partial W}\right)}_i} $$
    (9)

    Here ω i is the weight of sample i, which is determined by the window type, e.g., Hamming window or rectangular window [20].

  3. Variation contribution analysis: The variations’ contributions to I are mainly represented by the first four terms at the right side of (7) as:

    $$ \begin{array}{l}{S}_1={\left(\frac{\partial I}{\partial W}\right)}^2{\sigma}_W^2,{S}_2={\left(\frac{\partial I}{\partial L}\right)}^2{\sigma}_L^2,{S}_3={\left(\frac{\partial I}{\partial R}\right)}^2{\sigma}_R^2\\ {}{S}_4={\left(\frac{\partial I}{\partial {V}_{th}}\right)}^2\left(\frac{C_1}{WL}+\frac{C_2}{ \exp \left(L/{l}^{\prime}\right)}\cdot \frac{W_c}{W}\cdot {\sigma}_L^2\right)\end{array} $$
    (10)

    As pointed out by many prior works [21–24], an asymmetry exists in STT-RAM write operations: the switching time of ‘0’ → ‘1’ is longer than that of ‘1’ → ‘0’ and suffers from a larger variance. Also, the switching time variance of ‘0’ → ‘1’ is more sensitive to transistor size changes than that of ‘1’ → ‘0’. As we shall show later, this phenomenon can be well explained by our sensitivity analysis. To the best of our knowledge, this is the first time the asymmetric variations of STT-RAM write performance and their dependencies on the transistor size are explained and quantitatively analyzed.

    As shown in Fig. 1, when writing ‘0’, the word-line (WL) and bit-line (BL) are connected to V dd while the source-line (SL) is connected to ground, so \( {V}_{gs}={V}_{dd} \) and \( {V}_{ds}={V}_{dd}-IR \). The NMOS transistor mainly works in the triode region. Based on the short-channel BSIM model, the MTJ switching current supplied by an NMOS transistor can be calculated by:

    $$ I=\frac{\beta \left[\left({V}_{dd}-{V}_{th}\right)\left({V}_{dd}-IR\right)-\frac{a}{2}{\left({V}_{dd}-IR\right)}^2\right]}{1+\frac{1}{v_{sat}L}\left({V}_{dd}-IR\right)} $$
    (11)

    Here \( \beta =\frac{\mu_0{C}_{ox}}{1+{U}_0\left({V}_{dd}-{V}_{th}\right)}\frac{W}{L} \). U 0 is the vertical field mobility reduction coefficient, μ 0 is the electron mobility, C ox is the gate oxide capacitance per unit area, a is the body-effect coefficient and v sat is the carrier saturation velocity. Based on the short-channel PTM model [25] and the BSIM model [26, 27], we derive \( {\left(\frac{\partial I}{\partial W}\right)}^2 \), \( {\left(\frac{\partial I}{\partial L}\right)}^2 \), \( {\left(\frac{\partial I}{\partial R}\right)}^2 \) and \( {\left(\frac{\partial I}{\partial {V}_{th}}\right)}^2 \) as:

    $$ \begin{array}{l}{\left(\frac{\partial I}{\partial W}\right)}_0^2\approx \frac{1}{{\left({A}_1W+{B}_1\right)}^4},{\left(\frac{\partial I}{\partial L}\right)}_0^2\approx \frac{1}{{\left(\frac{A_2}{W}+{B}_2W+C\right)}^4}\\ {}{\left(\frac{\partial I}{\partial R}\right)}_0^2\approx \frac{1}{{\left(\frac{A_3}{W}+{B}_3\right)}^4},{\left(\frac{\partial I}{\partial {V}_{th}}\right)}_0^2\approx \frac{1}{{\left(\frac{A_4}{\sqrt{W}}+{B}_4\sqrt{W}\right)}^4}\end{array} $$
    (12)

    Our analytical deduction shows that the coefficients \( {A}_{1-4} \), \( {B}_{1-4} \) and C are solely determined by W, L, V th and R. Their detailed expressions can be found in the appendix. Here R is the high resistance state of the MTJ, or R H . For an NMOS transistor at ‘0’ → ‘1’ switching, the MTJ switching current is:

    $$ I=\frac{\beta }{2a}{\left[\left({V}_{dd}-{V}_{th}-IR\right)-\frac{I}{W{C}_{ox}{v}_{sat}^2}\right]}^2 $$
    (13)

    Here R is the low resistance state of the MTJ, or R L . We have:

    $$ \begin{array}{l}{\left(\frac{\partial I}{\partial W}\right)}_1^2\approx \frac{1}{{\left({A}_5W+{B}_5\right)}^4},{\left(\frac{\partial I}{\partial L}\right)}_1^2\approx \frac{1}{{\left(\frac{A_6}{W}+{B}_6\right)}^2}\\ {}{\left(\frac{\partial I}{\partial R}\right)}_1^2\approx \frac{1}{{\left(\frac{A_7}{W}+{B}_7\right)}^4},{\left(\frac{\partial I}{\partial {V}_{th}}\right)}_1^2\approx \frac{1}{{\left(\frac{A_8}{W}+{B}_8\right)}^2}\end{array} $$
    (14)

    Again, \( {A}_{5-8} \) and \( {B}_{5-8} \) can be expressed as the function of W, L, V th and R and the detailed expressions of those parameters can be found in the appendix in this chapter.

    In general, a large S i corresponds to a large contribution to the I variation. When W approaches infinity, only S 3 is nonzero at ‘1’ → ‘0’ switching, while both \( {S}_2 \) and S 3 are nonzero at ‘0’ → ‘1’ switching. This indicates that the residual values of S 1–S 4 at ‘0’ → ‘1’ switching are larger than those at ‘1’ → ‘0’ switching when \( W\to \infty \). In other words, ‘0’ → ‘1’ switching suffers from a larger MTJ switching current variation than ‘1’ → ‘0’ switching when the NMOS transistor size is large.

  4. Simulation results of sensitivity analysis: Sensitivity analysis [28] can be used to obtain the statistical parameters of the MTJ switching current, i.e., the mean and the standard deviation, without running the costly SPICE and Monte-Carlo simulations. It can also be used to analyze in detail the contributions of different variation sources to the I variation. The normalized contributions (P i ) of the variation sources, i.e., W, L, V th , and R, are defined as:

    $$ {P}_i=\frac{S_i}{{\displaystyle \sum_{j=1}^4{S}_j}},\ i=1,2,3,4 $$
    (15)

    Figures 3 and 4 show the normalized contributions of every variation source at ‘1’ → ‘0’ and ‘0’ → ‘1’ switching, respectively, at different transistor sizes. We can see that L and V th are the two major contributors to the I variation in both switching directions when W is small. At ‘1’ → ‘0’ switching, the contribution of L rises to its maximum value as W increases, and then quickly decreases when W increases further. At ‘0’ → ‘1’ switching, however, the contribution of L monotonically decreases but remains the dominant factor over the simulated W range. In both switching directions, the contribution of R ramps up when W increases. At ‘1’ → ‘0’ switching, the normalized contribution of R becomes almost 100 % when W is very large.

    Fig. 3 The normalized contributions under different W at ‘1’ → ‘0’ switching

    Fig. 4 The normalized contributions under different W at ‘0’ → ‘1’ switching
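The sensitivity flow of Section 3.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_current` is a hypothetical stand-in for one circuit evaluation of I as a function of W (it is not the BSIM-based model of (11) or (13)), the window weights are normalized to sum to one, and the function names are ours. The first helper combines the K central differences of (8) with the weights of (9); the second forms the terms S i of (10) and the normalized contributions P i of (15), where the caller supplies the standard deviation of each parameter (for V th , the value from (3)).

```python
def smoothed_derivative(f, x0, step, weights):
    """Weighted average of K central differences around x0, as in (8) and (9).

    weights[i-1] is the window weight of the difference taken at distance
    i*step; the weights are normalized here so that they sum to one.
    """
    total = sum(weights)
    estimate = 0.0
    for i, w in enumerate(weights, start=1):
        central = (f(x0 + i * step) - f(x0 - i * step)) / (2 * i * step)
        estimate += (w / total) * central
    return estimate


def normalized_contributions(derivs, sigmas):
    """Return (S, P) with S_i = (dI/dx_i)^2 * sigma_i^2 and P_i = S_i / sum(S)."""
    s = [d * d * sg * sg for d, sg in zip(derivs, sigmas)]
    total = sum(s)
    return s, [v / total for v in s]


# Hypothetical current model (microamps vs. W in nm), for illustration only.
def toy_current(w_nm):
    return 100.0 * w_nm / (w_nm + 300.0)


# Rectangular window with K = 3 and a 5 nm step around W = 540 nm.
dI_dW = smoothed_derivative(toy_current, 540.0, 5.0, [1.0, 1.0, 1.0])
```

In a real flow, each call to `f` would be one deterministic circuit simulation, so the cost is 2K evaluations per parameter instead of a full Monte-Carlo run.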

3.2 Write Current Distribution Recovery

After the I distribution is characterized by the sensitivity analysis, the next question is how to recover the distribution of I from the characterized information for the statistical analysis of STT-RAM reliability. We investigated the typical distributions of I in various STT-RAM cell designs and found that a dual-exponential function provides excellent accuracy in modeling and recovering these distributions. The dual-exponential function we used to recover the I distributions is:

$$ f(I)=\left\{\begin{array}{l}{a}_1{e}^{b_1\left(I-\mu \right)}\kern1.25em I\le \mu \kern1.25em \\ {}{a}_2{e}^{b_2\left(\mu -I\right)}\kern1em I>\mu \end{array}\right. $$
(16)

Here a 1, b 1, a 2, b 2 and μ are the fitting parameters, which can be calculated by normalizing the dual-exponential function and matching its first- and second-order moments to those of the actual I distribution:

$$ \begin{array}{l}{\displaystyle \int f(I)dI=1,}\\ {}{\displaystyle \int If(I)dI=E(I),}\\ {}{\displaystyle \int {I}^2f(I)dI=E{(I)}^2+{\sigma}_I^2}\end{array} $$
(17)

Here E(I) and \( {\sigma}_I^2 \) are obtained from the sensitivity analysis.

The recovered I distribution can be used to generate the MTJ switching current samples, as shown in Fig. 5. At the beginning of the sample generation flow, the confidence interval for the STT-RAM design is determined, e.g., \( \left[\kern0.15em {\mu}_I-6{\sigma}_I,{\mu}_I+6{\sigma}_I\right] \) for a six-sigma confidence interval. Assuming we need to generate N samples within the confidence interval, a switching current sequence of [N Pr i ] samples must be generated at each point \( I={I}_i \). Here \( \Pr {}_i\approx f\left({I}_i\right)\Delta \), where \( \Delta =\frac{12{\sigma}_I}{N} \) is the sampling step and f(I i ) is the dual-exponential function. Note that N determines both the analysis granularity and the level of the estimated error rate.

Fig. 5 Basic flow for MTJ switching current recovery
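The recovery flow can be illustrated with a short Python sketch. The three conditions in (17) do not by themselves fix all five parameters of (16), so this sketch assumes the symmetric special case a 1 = a 2 = a and b 1 = b 2 = b, for which the conditions give \( b=\sqrt{2}/{\sigma}_I \), \( a=b/2 \) and \( \mu =E(I) \). That symmetry is our simplification for illustration, not the fitting procedure of the chapter, and the function names are hypothetical.

```python
import math


def fit_symmetric_dual_exponential(mean_i, sigma_i):
    """Solve (17) for the symmetric case a1 = a2 = a, b1 = b2 = b (an assumption)."""
    b = math.sqrt(2.0) / sigma_i   # variance of a*exp(-b|I - mu|) is 2 / b^2
    a = b / 2.0                    # normalization: 2a / b = 1
    return a, b, mean_i


def dual_exponential_pdf(i, a, b, mu):
    """Symmetric form of (16)."""
    return a * math.exp(-b * abs(i - mu))


def recover_samples(mean_i, sigma_i, n, n_sigma=6.0):
    """Return (I_k, count_k) pairs on a grid over [mu - n_sigma*sigma, mu + n_sigma*sigma].

    count_k = round(N * Pr_k) with Pr_k ~ f(I_k) * delta, following Section 3.2.
    """
    a, b, mu = fit_symmetric_dual_exponential(mean_i, sigma_i)
    delta = 2.0 * n_sigma * sigma_i / n          # sampling step
    grid = []
    for k in range(n):
        i_k = mu - n_sigma * sigma_i + (k + 0.5) * delta
        count = round(n * dual_exponential_pdf(i_k, a, b, mu) * delta)
        grid.append((i_k, count))
    return grid


# Example with hypothetical statistics: mean 100 uA, sigma 10 uA, N = 10000.
samples = recover_samples(100.0, 10.0, 10000)
```

Grid points whose expected count rounds to zero are simply not represented, which is one way to see why N bounds the smallest error rate the analysis can resolve.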

Figure 6 shows the relative errors of the mean and the standard deviation of the recovered I distribution w.r.t. the results directly from the sensitivity analysis (see (6) and (7)). The maximum relative error is below \( {10}^{-2} \), which demonstrates the accuracy of our dual-exponential model.

Fig. 6 Relative errors of the recovered I w.r.t. the results from sensitivity analysis

Figures 7 and 8 compare the probability distribution functions (PDF’s) of I from the SPICE Monte-Carlo simulations and from the recovery process based on our sensitivity analysis at two switching directions. Our method achieves good accuracy at both representative transistor channel widths (\( W=720 \) nm or \( W=720 \) nm).

Fig. 7 Recovered I vs. Monte-Carlo result at ‘1’ → ‘0’

Fig. 8 Recovered I vs. Monte-Carlo result at ‘0’ → ‘1’

3.3 Statistical Thermal Analysis

The variation of the MTJ switching time (τ th ) incurred by the thermal fluctuations follows a Gaussian distribution when τ th is below \( 10\sim 20 \) ns [21]. In this range, the distribution of τ th can be easily constructed once I is determined. The distribution of the MTJ switching performance can then be obtained by combining the τ th distributions of all I samples.

4 Application 1: Write Reliability Analysis

In this section, we conduct the statistical analysis on the write reliability of STT-RAM cells by leveraging our PS3-RAM method. Both device variations and thermal fluctuations are considered in the analysis. We also extend our method into array-level evaluation and demonstrate its effectiveness in STT-RAM design optimizations.

4.1 Reliability Analysis of STT-RAM Cells

The write failure rate P WF of an STT-RAM cell can be defined as the probability that the actual MTJ switching time τ th is longer than the write pulse width T w , or \( {P}_{WF}=P\left({\tau}_{th}>{T}_w\right) \). τ th is affected by the MTJ switching current magnitude, the MTJ and MOS device variations, the MTJ switching direction, and the thermal fluctuations. The conventional simulation of P WF requires costly Monte-Carlo runs with hybrid SPICE and macro-magnetic modeling steps. Instead, we can use PS3-RAM to analyze the statistical STT-RAM write performance. The corresponding simulation environment is also summarized in Table 1.
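As a rough illustration of how the recovered current samples and the Gaussian thermal model of Section 3.3 combine into P WF , the Python sketch below averages the Gaussian tail probability \( P\left({\tau}_{th}>{T}_w\right) \) over weighted current samples. The mappings `toy_mean_tau` and `toy_sigma_tau` from current to the mean and spread of τ th are hypothetical placeholders with made-up constants; in practice they would come from the thermal analysis, which this sketch does not reproduce.

```python
import math


def gaussian_tail(t_w, mean, sigma):
    """P(tau > t_w) for tau ~ N(mean, sigma^2)."""
    return 0.5 * math.erfc((t_w - mean) / (sigma * math.sqrt(2.0)))


def write_failure_rate(current_samples, t_w, mean_tau, sigma_tau):
    """Mixture of per-current Gaussian tails.

    current_samples: list of (I, weight) pairs, e.g. recovered sample counts.
    mean_tau, sigma_tau: callables mapping I to the mean and standard
    deviation of the thermally induced switching time at that current.
    """
    total = sum(w for _, w in current_samples)
    p_fail = 0.0
    for i, w in current_samples:
        p_fail += (w / total) * gaussian_tail(t_w, mean_tau(i), sigma_tau(i))
    return p_fail


# Hypothetical placeholder mappings (I in microamps, tau in ns).
def toy_mean_tau(i_ua):
    return 600.0 / (i_ua - 40.0)


def toy_sigma_tau(i_ua):
    return 0.1 * toy_mean_tau(i_ua)


samples = [(90.0, 1), (100.0, 2), (110.0, 1)]
p_wf = write_failure_rate(samples, 12.0, toy_mean_tau, toy_sigma_tau)
```

With these toy numbers the weakest current sample dominates the failure rate, which mirrors how the low-current tail of the I distribution drives P WF .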

Figures 9 and 10 depict the P WF ’s simulated by PS3-RAM for both switching directions at 300 K. For comparison, the Monte-Carlo simulation results are also presented. Different T w ’s are selected for the two switching directions due to the asymmetric MTJ switching performance [21], i.e., \( {T}_w=10,15,20 \) ns at ‘0’ → ‘1’ and \( {T}_w=6,8,10,12 \) ns at ‘1’ → ‘0’. Our PS3-RAM results are in excellent agreement with the ones from Monte-Carlo simulations.

Fig. 9 Write failure rate at ‘0’ → ‘1’ when T = 300 K

Fig. 10 Write failure rate at ‘1’ → ‘0’ when T = 300 K

Since ‘0’ → ‘1’ is the limiting switching direction for STT-RAM reliability, we also compare the P WF ’s of different STT-RAM cell designs under different temperatures at this switching direction in Fig. 11. The results show that PS3-RAM provides very close but slightly pessimistic results compared to those of the conventional simulations. PS3-RAM is also capable of precisely capturing the small error rate change incurred by a moderate temperature shift (from T = 300 to 325 K).

Fig. 11 P WF under different temperatures at ‘0’ → ‘1’

It is known that prolonging the write pulse width and increasing the MTJ switching current (by sizing up the NMOS transistor) can reduce P WF . In Fig. 12, we demonstrate an example of using PS3-RAM to explore the STT-RAM design space: the tradeoff curves between P WF and T W are simulated at different W’s. For a given P WF , the corresponding tradeoff between W and T W can be easily identified in Fig. 12.

Fig. 12 STT-RAM design space exploration at ‘0’ → ‘1’

4.2 Array Level Analysis and Design Optimization

We use a 45 nm 256 Mb STT-RAM design [29] as the example to demonstrate how to extend PS3-RAM to array-level analysis and design optimizations. The number of bits per memory block is \( {N}_{bit}=256 \) and the number of memory blocks is \( {N}_{word}=1\mathrm{M} \). To repair the operation errors of memory cells, a circuit-level technique, error correction code (ECC), is usually applied [30]. Two types of ECC’s with different implementation costs are considered, i.e., single-bit-correcting Hamming code and a set of multi-bit-correcting BCH codes. We use (n, k, t) to denote an ECC with a codeword length of n bits, k user bits being protected (256 bits here) and t bits being corrected. The ECC’s corresponding to the error correction capability t from 1 to 5 are Hamming code (265, 256, 1) and four BCH codes: BCH1 (274, 256, 2), BCH2 (283, 256, 3), BCH3 (292, 256, 4) and BCH4 (301, 256, 5), respectively. The write yield of the memory array Y wr can be defined as:

$$ {Y}_{wr}=P\left({n}_e\le t\right)={\displaystyle \sum_{i=0}^t{C}_n^i{P}_{WF}^i{\left(1-{P}_{WF}\right)}^{n-i}} $$
(18)

Here, n e denotes the total number of error bits in a write access. Y wr thus denotes the probability that the number of error bits in a write access does not exceed the error correction capability.
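Equation (18) is a binomial cumulative probability and can be evaluated directly. The Python sketch below does so for the five (n, t) pairs listed above; the per-bit failure rate used in the example is a hypothetical value chosen only for illustration, and independent bit failures are assumed, as in (18).

```python
import math


def write_yield(p_wf, n, t):
    """Y_wr of (18): probability of at most t failing bits among n written bits."""
    return sum(
        math.comb(n, i) * p_wf**i * (1.0 - p_wf) ** (n - i)
        for i in range(t + 1)
    )


# (codeword length n, correctable bits t) for the codes listed in the text.
ECC_SCHEMES = {
    "Hamming": (265, 1),
    "BCH1": (274, 2),
    "BCH2": (283, 3),
    "BCH3": (292, 4),
    "BCH4": (301, 5),
}

P_WF_EXAMPLE = 1e-3  # hypothetical per-bit write failure rate

yields = {name: write_yield(P_WF_EXAMPLE, n, t) for name, (n, t) in ECC_SCHEMES.items()}
```

Sweeping `p_wf` over the values produced for different W and T w gives the kind of yield curves the array-level analysis relies on.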

Figure 13 depicts the Y wr ’s under different combinations of ECC scheme and W when \( {T}_W=15\kern0.5em \mathrm{ns} \) at ‘0’ → ‘1’ switching. The ECC schemes required to satisfy \( \sim 100\% \) Y wr for different W are: (1) Hamming code for \( W=630\kern0.5em \mathrm{nm} \); (2) BCH2 for \( W=540\kern0.5em \mathrm{nm} \); and (3) BCH4 for \( W=480\kern0.5em \mathrm{nm} \). The total memory array area can be estimated by using the STT-RAM cell size equation \( {\mathrm{Area}}_{\mathrm{cell}}=3\left(W/L+1\right)\left({F}^2\right) \) [31]. Calculation shows that combination (3) offers the smallest STT-RAM array area, which is only 88 % and 95 % of those of (1) and (2), respectively. We note that PS3-RAM can be seamlessly embedded into the existing deterministic memory macro models [31] to extend their capability to statistical reliability analysis and multi-dimensional design optimization of area, yield, performance and energy.

Fig. 13 Write yield with ECC’s at ‘0’ → ‘1’, Tw = 15 ns
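The area comparison can be checked with a few lines of Python. The sketch assumes a channel length of L = 45 nm (taken from the 45 nm technology node, not stated explicitly for this calculation) and that the array area scales with the codeword length n times the cell area, ignoring peripheral and ECC logic. Under these assumptions the cell size equation gives the ratios quoted in the text.

```python
def cell_area_f2(w_nm, l_nm):
    """Cell area in units of F^2 from Area_cell = 3 * (W / L + 1)."""
    return 3.0 * (w_nm / l_nm + 1.0)


def relative_array_area(w_nm, n_codeword, l_nm=45.0):
    """Array area per memory block, up to a common factor (cells only)."""
    return n_codeword * cell_area_f2(w_nm, l_nm)


area_hamming = relative_array_area(630.0, 265)  # combination (1)
area_bch2 = relative_array_area(540.0, 283)     # combination (2)
area_bch4 = relative_array_area(480.0, 301)     # combination (3)

ratio_3_to_1 = area_bch4 / area_hamming
ratio_3_to_2 = area_bch4 / area_bch2
```

The smaller transistor more than compensates for the longer BCH4 codeword, which is why the strongest ECC yields the smallest array in this comparison.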

Figure 14 illustrates the STT-RAM design space in terms of the combinations of Y wr , W, T w and ECC scheme. After the pair of (Y wr , T w ) is determined, the tradeoff between W and ECC can be found in the corresponding region on the figure. The result shows that PS3-RAM provides a fast and efficient method to perform the device/circuit/architecture co-optimization for STT-RAM designs.

Fig. 14 Design space exploration at ‘0’ → ‘1’

5 Application 2: Write Energy Analysis

In addition to write reliability analysis, our PS3-RAM method can also precisely capture the write energy distributions influenced by the variations of the device and the working environment. In this section, we first show that, without considering any variations, there is a sweet spot of write pulse width that minimizes the write energy. Then we introduce the concept of statistical write energy of STT-RAM cells considering both process variations and thermal fluctuations, and perform the statistical analysis of write energy using our PS3-RAM method.

5.1 Write Energy Without Variations

The write energy of an STT-RAM cell during each programming cycle, without considering process and thermal variations, is deterministic and can be modeled as:

$$ {E}_{av}={I}^2R{\tau}_{th} $$
(19)

Here I denotes the switching current at either ‘0’ → ‘1’ or ‘1’ → ‘0’ switching, τ th is the corresponding MTJ switching time and R is the MTJ resistance value, i.e., R L (R H ) for ‘0’ → ‘1’ (‘1’ → ‘0’) switching. As discussed in prior work [21], the switching process of an STT-RAM cell can be divided into three working regions:

$$ I=\left\{\begin{array}{l}{I}_{C_0}\left(1-\frac{ \ln \left({\tau}_{th}/{\tau}_0\right)}{\Delta}\right),\kern1.5em {\tau}_{th}>10\kern0.5em \mathrm{ns}\\ {}{I}_{C_0}+C\kern.4em \ln \left(\frac{\pi }{2\theta}\right)/{\tau}_{th},\kern1.5em {\tau}_{th}<3\kern0.5em \mathrm{ns}\\ {}\frac{P}{\tau_{th}}+Q,\kern4.75em 3\le {\tau}_{th}\le 10\kern0.5em \mathrm{ns}\end{array}\right. $$
(20)

Here \( {I}_{C_0} \) is the critical switching current, Δ is the thermal stability, \( {\tau}_0=1\kern0.5em \mathrm{ns} \) is the relaxation time, θ is the initial angle between the magnetization vector and the easy axis, and C, P, and Q are fitting parameters.

For a relatively long switching time range (\( {\tau}_{th}\approx 10\sim 300\kern0.5em \mathrm{ns} \)), the undistorted write energy \( {E}_{av} \) can be calculated as:

$$ {E}_{av}={I}_{C_0}^2{\left(1-\frac{ \ln \left({\tau}_{th}\right)}{\Delta}\right)}^2R{\tau}_{th}=\frac{I_{C_0}^2R}{\Delta^2}{\left(\Delta - \ln \left({\tau}_{th}\right)\right)}^2{\tau}_{th} $$
(21)

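In the long switching time range, with \( {\tau}_{th} \) expressed in units of \( {\tau}_0=1\kern0.5em \mathrm{ns} \), \( \ln \left({\tau}_{th}\right) \) stays between about 2.3 and 5.7. Differentiating (21) gives \( \frac{d{E}_{av}}{d{\tau}_{th}}=\frac{I_{C_0}^2R}{\Delta^2}\left(\Delta - \ln \left({\tau}_{th}\right)\right)\left(\Delta - \ln \left({\tau}_{th}\right)-2\right) \), which is positive whenever \( \Delta > \ln \left({\tau}_{th}\right)+2 \) (about 7.7 at \( {\tau}_{th}=300\kern0.5em \mathrm{ns} \)). Thus, \( {E}_{av} \) monotonically increases as the write pulse \( {\tau}_{th} \) increases, and the minimum write energy in this range occurs at \( {\tau}_{th}=10\kern0.5em \mathrm{ns} \).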

In the ultra-short switching time range (\( {\tau}_{th}<3\kern0.5em \mathrm{ns} \)), E av can be obtained as:

$$ \begin{array}{l}{E}_{av}={\left({I}_{C_0}+C\kern0.15em \ln \left(\frac{\pi }{2\theta}\right)/{\tau}_{th}\right)}^2R{\tau}_{th}\\ {}\kern1em =2{I}_{C_0}RC\kern0.15em \ln \left(\frac{\pi }{2\theta}\right)+{I}_{C_0}^2R{\tau}_{th}+\frac{C^2{ \ln}^2\left(\pi /2\theta \right)R}{\tau_{th}}\\ {}\kern1em \ge 2{I}_{C_0}RC\kern0.15em \ln \left(\frac{\pi }{2\theta}\right)+2\sqrt{I_{C_0}^2{R}^2{C}^2{ \ln}^2\left(\pi /2\theta \right)}\\ {}\kern1em =4{I}_{C_0}RC\kern0.15em \ln \left(\frac{\pi }{2\theta}\right)\end{array} $$
(22)

As (22) shows, the minimum of \( {E}_{av} \) is reached when \( {\tau}_{th}=C\kern0.15em \ln \left(\frac{\pi }{2\theta}\right)/{I}_{C_0} \), where the two \( {\tau}_{th} \)-dependent terms are equal. However, this point usually lies beyond the ultra-short switching time range (\( C\kern0.15em \ln \left(\frac{\pi }{2\theta}\right)/{I}_{C_0}>3\kern0.15em \mathrm{ns} \)), so within this range \( {E}_{av} \) monotonically decreases as \( {\tau}_{th} \) increases.
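The bound in (22) can be checked numerically. The following is a minimal sketch, assuming purely illustrative parameter values (they are not taken from the chapter's device characterization); the name `A` stands for the product \( C\ln(\pi/2\theta) \), and all quantities are in SI units.

```python
import math

# Hypothetical values, for illustration only (not from the chapter).
I_C0 = 100e-6   # critical switching current, A
A = 4e-13       # C * ln(pi / (2 * theta)), in A*s (i.e. 400 uA*ns)
R = 1000.0      # MTJ resistance, ohm

def write_energy_short(tau):
    """E_av = (I_C0 + A / tau)^2 * R * tau for the ultra-short range, in J."""
    current = I_C0 + A / tau
    return current ** 2 * R * tau

# Stationary point and lower bound given by (22).
tau_star = A / I_C0             # 4 ns for the values above
e_bound = 4.0 * I_C0 * R * A    # minimum energy predicted by (22)

# Sample the ultra-short range 0.5 ns ... 2.9 ns.
taus = [k * 0.1e-9 for k in range(5, 30)]
energies = [write_energy_short(t) for t in taus]
decreasing = all(e1 > e2 for e1, e2 in zip(energies, energies[1:]))

print(f"tau* = {tau_star * 1e9:.2f} ns, bound = {e_bound:.3e} J")
print("E_av decreasing below 3 ns:", decreasing)
```

With these assumed values the stationary point falls at 4 ns, outside the ultra-short range, so the sampled energy only decreases below 3 ns.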

Similarly, in the middle switching time range (\( 3\le {\tau}_{th}\le 10\kern0.5em \mathrm{ns} \)), E av can be expressed as:

$$ {E}_{av}={\left(\frac{P}{\tau_{th}}+Q\right)}^2R{\tau}_{th}={\left(\frac{P}{\sqrt{\tau_{th}}}+Q\sqrt{\tau_{th}}\right)}^2R\ge 4PQR $$
(23)

Again, the minimum of \( {E}_{av} \) occurs at \( {\tau}_{th}=\frac{P}{Q} \). Based on our device parameter characterization [21], \( \frac{P}{Q}\ge 10\kern0.5em \mathrm{ns} \), i.e., this point lies at or beyond the upper end of the range. Thus, the write energy \( {E}_{av} \) in this range monotonically decreases as \( {\tau}_{th} \) grows.

According to the monotonicity of \( {E}_{av} \) in the three regions, the most energy-efficient switching point should be at \( {\tau}_{th}=10\kern0.5em \mathrm{ns} \). To validate the above theoretical deduction of the sweet point of \( {E}_{av} \), SPICE simulations are also conducted. The STT-RAM device model, which excludes process and thermal variations, is also adopted from [21].
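The three-region argument can be sketched end to end by evaluating (19) and (20) on a grid of pulse widths. This is only an illustration under stated assumptions: the parameter values are hypothetical, and P and Q are fixed here by requiring the current to be continuous at 3 ns and 10 ns, whereas the chapter treats them as fitting parameters. Currents are in A, resistance in Ω, and time in ns, so energies come out in nJ.

```python
import math

# Hypothetical parameters, for illustration only (not the chapter's Table 1).
I_C0 = 100e-6      # critical switching current, A
DELTA = 40.0       # thermal stability
TAU_0 = 1.0        # relaxation time, ns
A_SHORT = 400e-6   # C * ln(pi / (2 * theta)), A*ns
R = 1000.0         # MTJ resistance, ohm

def current_long(tau):
    """Long-range branch of (20), tau in ns."""
    return I_C0 * (1.0 - math.log(tau / TAU_0) / DELTA)

def current_short(tau):
    """Ultra-short branch of (20), tau in ns."""
    return I_C0 + A_SHORT / tau

# Assumption: choose P and Q so that I is continuous at 3 ns and 10 ns.
P = (current_short(3.0) - current_long(10.0)) / (1.0 / 3.0 - 1.0 / 10.0)
Q = current_long(10.0) - P / 10.0

def switching_current(tau):
    """Piecewise switching current of (20), tau in ns."""
    if tau > 10.0:
        return current_long(tau)
    if tau < 3.0:
        return current_short(tau)
    return P / tau + Q

def write_energy(tau):
    """E_av = I^2 * R * tau_th from (19), in nJ."""
    return switching_current(tau) ** 2 * R * tau

# Sweep 0.5 ns ... 300 ns in 0.1 ns steps and pick the lowest-energy point.
taus = [k / 10.0 for k in range(5, 3001)]
best_tau = min(taus, key=write_energy)

print(f"P/Q = {P / Q:.1f} ns, lowest-energy pulse width = {best_tau} ns")
```

For these assumed values both stationary points of (22) and (23) fall outside their own ranges, which is the condition the text relies on for the minimum to sit at the 10 ns boundary.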

Figure 15 shows the simulated write energy \( {E}_{av} \) over different write pulse widths at ‘0’ → ‘1’ switching. As Fig. 15 shows, \( {E}_{av} \) monotonically decreases in the ultra-short switching range and continues decreasing in the middle range, but becomes monotonically increasing after entering the long switching time range. The sweet point of \( {E}_{av} \) occurs around \( {\tau}_{th}=10\kern0.5em \mathrm{ns} \), which validates our theoretical analysis of the write energy without considering any variations.

Fig. 15

Average Write Energy under different write pulse width when T = 300 K

We also present the simulated \( {E}_{av}-{\tau}_{th} \) curves under different temperatures in Fig. 16. The trend and sweet point of the \( {E}_{av}-{\tau}_{th} \) curves remain almost the same when the temperature increases from T = 300 to 400 K. In fact, the write energy \( {E}_{av} \) decreases slightly as the temperature increases. The reason is that the loss of NMOS transistor driving ability (I) dominates \( {E}_{av} \), even though the MTJ switching time (\( {\tau}_{th} \)) increases when the working temperature rises.

Fig. 16

Average Write Energy vs. write pulse width under different temperature

5.2 PS3-RAM for Statistical Write Energy

As discussed in the previous section, the write energy of an STT-RAM cell can be deterministically optimized when all the variations are ignored. However, since the switching current I, the resistance R, and the switching time \( {\tau}_{th} \) in (19) may be distorted by CMOS/MTJ process variations and thermal fluctuations, a single deterministic value can no longer represent the statistical nature of the write energy of an STT-RAM cell. Accordingly, the optimized write energy at the sweet point (\( {\tau}_{th}=10\kern0.5em \mathrm{ns} \)) shown in Fig. 15 should be expanded into a distribution.
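To illustrate how a single energy value spreads into a distribution, the sketch below draws random samples of I, R, and the switching time and evaluates (19) for each. It is a plain Monte-Carlo toy, not the PS3-RAM flow (which is designed to avoid such sampling), and it assumes independent, made-up spreads for the three quantities; in a real cell they are correlated through the device parameters.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical nominal values and spreads (not the chapter's Table 1).
I_MEAN, I_SIGMA = 94e-6, 5e-6     # switching current, A (normal)
R_MEAN, R_SIGMA = 1000.0, 50.0    # MTJ resistance, ohm (normal)
TAU_MEDIAN = 10e-9                # switching time median, s
TAU_SIGMA_LOG = 0.1               # log-domain spread of the switching time

def sample_write_energy():
    """Draw one (I, R, tau_th) triple and return I^2 * R * tau_th in J."""
    i = random.gauss(I_MEAN, I_SIGMA)
    r = random.gauss(R_MEAN, R_SIGMA)
    tau = random.lognormvariate(math.log(TAU_MEDIAN), TAU_SIGMA_LOG)
    return i * i * r * tau

# Deterministic value from (19) at the nominal point, for comparison.
e_nominal = I_MEAN ** 2 * R_MEAN * TAU_MEDIAN

samples = [sample_write_energy() for _ in range(10_000)]
e_mean = statistics.mean(samples)
e_std = statistics.stdev(samples)

print(f"nominal E = {e_nominal:.3e} J")
print(f"sampled E = {e_mean:.3e} J +/- {e_std:.3e} J")
```

Even with these modest assumed spreads, the sampled energies cover a visible range around the nominal value, which is why the sweet-point energy is treated as a distribution rather than a number.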

Similar to the write failure analysis, we conduct the statistical write energy analysis using our PS3-RAM method. We set the mean NMOS transistor width to \( W=540\kern0.5em \mathrm{nm} \). The remaining device parameters and variation configurations are the same as in Table 1.

Figures 17 and 18 show the statistical write energy simulated by PS3-RAM for both switching directions at 300 K. For comparison, the SPICE simulation results are also presented. As shown in the figures, the write energy distributions captured by our PS3-RAM method are in excellent agreement with the results from SPICE simulations for both ‘1’ → ‘0’ and ‘0’ → ‘1’ switching.

Fig. 17

Statistical Write Energy vs. write pulse width at ‘1’ → ‘0’

Fig. 18

Statistical Write Energy vs. write pulse width at ‘0’ → ‘1’

6 Computation Complexity Evaluation

We compare the computation complexity of our proposed PS3-RAM method with that of the conventional simulation method. Suppose the number of variation sources is M. For a statistical analysis of an STT-RAM cell design, the numbers of SPICE simulations required by the conventional flow and by PS3-RAM are \( {N}_{std}={N}_s^M \) and \( {N}_{\mathrm{PS}3\hbox{-} \mathrm{RAM}}=2KM+1 \), respectively. Here K denotes the number of samples for the window-based smoothing filter in the sensitivity analysis, and \( {N}_s \) is the average number of samples per variation source in the Monte-Carlo simulations of the conventional method, with \( K\ll {N}_s \). Note that our switching current sample recovery flow does not require any extra Monte-Carlo simulations. The speedup \( {X}_{\mathrm{speedup}}\approx \frac{N_s^M}{2KM} \) can reach multiple orders of magnitude: for example, with \( {N}_s=100 \), \( M=4 \) (note: \( {V}_{th} \) is not an independent variable), and \( K=50 \), the speedup is around \( 2.5\times {10}^5 \).
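The arithmetic of this example can be reproduced in a few lines. This is a minimal sketch that only evaluates the two run-count formulas above; the function names are chosen here for illustration and are not part of PS3-RAM.

```python
def spice_runs_conventional(n_s, m):
    """Number of SPICE runs for the conventional flow: N_s ** M."""
    return n_s ** m

def spice_runs_ps3ram(k, m):
    """Number of SPICE runs for PS3-RAM: 2 * K * M + 1."""
    return 2 * k * m + 1

N_S, M, K = 100, 4, 50  # values used in the example above

n_std = spice_runs_conventional(N_S, M)
n_ps3 = spice_runs_ps3ram(K, M)

approx_speedup = n_std / (2 * K * M)  # the approximation N_s^M / (2KM)
exact_ratio = n_std / n_ps3           # keeps the "+1" in the denominator

print(f"conventional runs: {n_std}, PS3-RAM runs: {n_ps3}")
print(f"approximate speedup: {approx_speedup:.3g}, exact ratio: {exact_ratio:.3g}")
```

Because the conventional count grows exponentially in M while the PS3-RAM count grows linearly, the gap widens quickly as more variation sources are added.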

7 Conclusion

A fast and scalable statistical STT-RAM reliability/energy analysis method called PS3-RAM was developed in this chapter. PS3-RAM can simulate the impact of process variations and thermal fluctuations on the statistical STT-RAM write performance and write energy distributions, without running costly Monte-Carlo simulations on SPICE and macro-magnetic models. Simulation results show that PS3-RAM achieves very high accuracy compared to the conventional simulation method, with a speedup of multiple orders of magnitude. The great potential of PS3-RAM for device/circuit/architecture co-optimization of STT-RAM designs is also demonstrated.