Embedded STT-MRAM: Device and Design

Kang, Seung H.; Jung, Seong-Ook

doi:10.1007/978-1-4939-2163-8_4

Abstract

Spin-transfer-torque magnetoresistive random access memory (STT-MRAM) is made of a combination of semiconductor integrated circuits (IC) and a dense array of nanometer-scale magnetic tunnel junctions (MTJ). This emerging memory is of growing technological interest due to its potential to bring disruptive device innovation to the world of electronics. STT-MRAM is capable of providing high speed, unlimited endurance, and nonvolatility simultaneously, which is often recognized as a unique advantage over conventional and other emerging memories. While the technology is at an early stage and evolving in multiple platforms, STT-MRAM is particularly compelling as an embedded memory for system-on-chip (SOC). STT-MRAM can be integrated into SOC without altering baseline logic platforms both in process and in design. This chapter overviews key device and circuit subjects from the perspective of co-designing logic and MTJ.

4.1 Introduction

Generic device scaling no longer secures the evolution of IC, causing the silicon-based technology to face unprecedented challenges in materials, devices, and processes. These challenges translate to compromises in power dissipation, performance, and cost for a wide range of IC products. While the end of physical scaling is not imminent, its value is being heavily eroded by the growing technological and economic concerns at the nanoscale. Some of the promising innovations that can mitigate or overcome such problems may be found in spintronic IC. In the past few years, the spintronics community has achieved significant discoveries and breakthroughs [1]. Most recognized is the emergence of STT-MRAM [2–6]. Key discoveries and advances have triggered industry-wide R&D efforts in pursuit of an alternative to conventional memories, which are not only facing acute tradeoffs in performance and power, but also nearing fundamental scaling limits. In parallel, various forms of MTJ-based novel logic devices and circuits have been demonstrated [7–9], opening a possible path for spintronic IC to expand beyond memory applications. Furthermore, a novel computing architecture concept, known as the normally-off computer, was proposed as a way to reduce the energy consumption of modern microprocessors [10–12]. Still at an early stage, the global spintronics community continues to propel a plethora of innovations in materials, devices, circuits, and architectures.

STT-MRAM is particularly compelling as an embedded memory for SOC. In contrast to standalone commodity memories, each type of SOC requires a different combination of memory attributes such as speed, energy consumption, and reliability including cyclic endurance and data retention. STT-MRAM can be offered in a variety of macros whose designs are customized for application-specific SOC. In general, density requirements are found over a wide range (a few kbits to 256 Mbits). Yet, even in small densities, it can realize significant values in system performance, energy consumption, security, and cost, when device and circuit attributes are tailored at a system-architecture level. Furthermore, the memory element MTJ can be integrated in a fully logic-compatible way without altering or adversely impacting baseline logic platforms by adding two or three mask layers into a back-end-of-line (BEOL) flow [3].

Driving STT-MRAM beyond discrete devices and arrays toward SOC necessitates extensive learning cycles in device, circuit, yield, and reliability engineering. In order to produce variability- and fault-tolerant STT-MRAM, a systematic design methodology is required to assure robust functionality of STT-MRAM over a wide range of process-voltage-temperature (PVT) windows. This chapter overviews key device and circuit subjects from the perspective of co-designing logic and MTJ to enable STT-MRAM as a scalable custom embedded memory to serve advanced SOC.

4.2 Device Physics

4.2.1 Magnetic Tunnel Junction (MTJ)

An MTJ is the basic storage element of STT-MRAM. It consists of metallic ferromagnetic films separated by an oxide tunnel barrier, typically an ultra-thin magnesium oxide (MgO). The conductance of a ferromagnetic metal-insulator-ferromagnetic metal (FM1-I-FM2) structure is governed by tunnel magnetoresistance, a quantum mechanical phenomenon that results from spin-dependent tunneling [13]. When conduction electrons are emitted from one ferromagnetic metal electrode FM2, schematically illustrated in Fig. 4.1, they are spin-polarized to the magnetization direction of FM2 and tunnel through the thin tunnel barrier with their spin states conserved. The electron density of states in the opposite ferromagnetic metal electrode FM1 that these tunneling electrons encounter is dependent on the magnetization direction of FM1. Consequently, the electrical resistance (R) of the FM1-I-FM2 structure is determined by the relative orientation of the two magnetizations, which is described by [14]

$$ R=\frac{R_{\perp }}{1+\frac{TMR}{2} \cos \theta } $$

(4.1)

where θ is the angle between the magnetizations of the two ferromagnetic layers and R _⊥ is the resistance measured in the perpendicular magnetic configuration (θ = π/2). R becomes minimum (R _p) for the parallel magnetization configuration (θ = 0) and maximum (R _ap) for the anti-parallel configuration (θ = π). Accordingly, an MTJ serves as a variable resistor that can be configured to have binary states (0 and 1) defined by two discrete resistance values (R _p and R _ap, respectively). The tunnel magnetoresistance ratio (TMR) is then defined as:

$$ TMR=\frac{R_{ap}-{R}_p}{R_p}\times 100\,\% $$

(4.2)

TMR is one of the critical device parameters for designing STT-MRAM with error-free, high-speed read operations, since the signal margin for sensing an array of MTJ is governed by TMR.
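As a rough illustration of Eqs. (4.1) and (4.2), the following Python sketch evaluates the angle-dependent resistance and the TMR ratio. The function names and all numerical values are hypothetical and chosen only for illustration. The coefficient passed to Eq. (4.1) is treated here as a dimensionless fraction, and the sketch does not assume that it is numerically identical to the percentage returned by Eq. (4.2).

```python
import math


def mtj_resistance(theta, r_perp, tmr_coeff):
    """Eq. (4.1): R = R_perp / (1 + (TMR/2) * cos(theta)).

    theta is in radians; tmr_coeff is the TMR coefficient as a fraction.
    """
    return r_perp / (1.0 + 0.5 * tmr_coeff * math.cos(theta))


def tmr_percent(r_p, r_ap):
    """Eq. (4.2): TMR = (R_ap - R_p) / R_p * 100 %."""
    return (r_ap - r_p) / r_p * 100.0


# Hypothetical device values, for illustration only.
r_perp = 3000.0   # ohms, resistance at theta = pi/2
tmr_coeff = 0.8   # dimensionless coefficient used in Eq. (4.1)

r_p = mtj_resistance(0.0, r_perp, tmr_coeff)       # parallel state (bit 0)
r_ap = mtj_resistance(math.pi, r_perp, tmr_coeff)  # anti-parallel state (bit 1)

# TMR ratio from two hypothetical measured resistances.
example_tmr = tmr_percent(2000.0, 5000.0)
```

The parallel state always returns the lowest resistance and the anti-parallel state the highest, which is what allows the two states to be distinguished during read.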

Figure 4.2 illustrates typical MTJ film stacks that essentially consist of metallic films separated by a tunnel barrier, most commonly MgO on the order of 1 nm in thickness. Depending on the orientation of the magnetization with respect to the film plane, two representative cases are shown here: (a) in-plane MTJ (i-MTJ); and (b) perpendicular MTJ (p-MTJ). The free layer is a soft ferromagnetic metal (e.g. CoFeB) whose magnetization can be switched by STT. The reference layer is commonly a synthetic structure to provide a reference magnetization fixed in one direction (e.g. top pinned layer in Fig. 4.2a) relative to the free layer magnetization. For i-MTJ, the magnetization of the bottom pinned layer is fixed by an antiferromagnet (AFM) pinning layer (e.g. PtMn) via the exchange bias effect. The top pinned layer is then antiferromagnetically coupled to the bottom pinned layer via interlayer exchange coupling with a non-magnetic spacer (e.g. Ru). This type of reference layer is called a synthetic antiferromagnet (SAF). In comparison, for p-MTJ, the out-of-plane magnetization of the bottom pinned layer can be developed inherently during film formation, hence the AFM pinning layer is not necessary (Fig. 4.2b). Ordinarily, the reference layer stack of p-MTJ is still structured in a SAF configuration. To promote high TMR, however, p-MTJ often necessitates an additional pinned layer (commonly CoFeB) underneath the tunnel barrier which is exchange-coupled with the SAF.

The metallic films of MTJ are deposited by physical vapor deposition (PVD). The MgO barrier can be grown by PVD or a combination of PVD and oxidation. MTJ device properties are tailored through a selection of desired materials and a precise control of microstructure, film thickness, and cross-sectional feature size. Key MTJ parameters, critical to optimizing performance, energy consumption and reliability, include: TMR, resistance–area product (RA), energy barrier (E _B) and switching current density (J _c).

Experimental TMR reached ~600 % at room temperature in a CoFeB/MgO/CoFeB junction [15]. From a microstructure perspective, the most critical factor in achieving high TMR is promoting strong MgO (001) texture. For practical device applications, high TMR needs to be achieved in conjunction with relatively low RA, preferably <10 Ω μm². TMR of 253 % at RA = 5.9 Ω μm² has been demonstrated by inserting CoFe as a crystallization template to induce preferred grain growth in MgO and then to promote crystallization of CoFeB through annealing [16]. In-situ annealing of the MgO barrier has been known to promote the (001) texture further, resulting in high TMR (>170 %) even for MTJ films with an ultralow RA (~1 Ω μm²) [17].

In static mode, an MTJ maintains its resistance state without power (i.e. it is nonvolatile) as long as the magnetic anisotropy energy of its free layer is greater than the thermal excitation energy described by k _B T, where k _B is the Boltzmann constant and T is temperature. For i-MTJ, which is typically patterned into an elliptically shaped cell, the free layer magnetic moment can have only two energetically favorable states aligned with the long axis (called the easy-axis) of the MTJ, thereby allowing either R _p or R _ap. For p-MTJ, the two states are determined by out-of-plane moments. Hence p-MTJ does not require a particular shape and is typically patterned in a circular shape.

Under the simplified assumption of a single-domain free layer, the energy barrier (E _B) between the two energetically favorable states is often given by

$$ {E}_B=\frac{M_s{H}_kV}{2}{\left(1-\frac{H_{ext}}{H_k}\right)}^2 $$

(4.3)

where M _s is the saturation magnetization of the free layer, V is the free layer volume, H _k is the effective uniaxial anisotropy field, and H _ext is the external field present along the easy-axis (which vanishes in the absence of any stray field). For MTJ to be non-volatile, E _B must be larger than the thermal excitation energy over a range of operating and storage temperatures. For example, for a single MTJ to retain its state for 10 years, E _B must be 40 k _B T (1 eV) or greater. A recent report demonstrated that E _B can be 100 k _B T or greater, which is remarkable for p-MTJ on the order of 30 nm in diameter [6].
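The 40 k _B T guideline can be checked with a short calculation. The sketch below converts a barrier expressed in units of k _B T to electron-volts and estimates the barrier needed for a target retention time. It assumes a simple Néel-Arrhenius relaxation law, τ = τ₀ exp(E _B / k _B T), with an attempt time τ₀ of 1 ns; both the law and the attempt time are assumptions added here for illustration and are not taken from the chapter. The function names are hypothetical.

```python
import math

K_B = 1.380649e-23         # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C (1 eV = E_CHARGE joules)
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def barrier_in_ev(delta, temperature_k=300.0):
    """Convert a barrier of delta * k_B * T into electron-volts."""
    return delta * K_B * temperature_k / E_CHARGE


def field_factor(h_ext_over_hk):
    """Field-dependent factor (1 - H_ext/H_k)^2 of Eq. (4.3)."""
    return (1.0 - h_ext_over_hk) ** 2


def required_delta(retention_s, attempt_time_s=1e-9):
    """Barrier (in k_B T) for a mean retention time, assuming
    tau = tau0 * exp(delta) with an assumed attempt time tau0."""
    return math.log(retention_s / attempt_time_s)


delta_10_years = required_delta(10 * SECONDS_PER_YEAR)
e_b_ev = barrier_in_ev(40.0)
```

Under these assumptions, the barrier required for a ten-year mean retention time of a single MTJ comes out slightly above 40 k _B T, and 40 k _B T at 300 K is close to 1 eV, in line with the figures quoted above. Any stray field along the easy-axis lowers the barrier through the factor returned by `field_factor`.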

4.2.2 Spin-Transfer-Torque (STT) Switching

A traditional way of programming MTJ is to apply a magnetic field to switch the free layer magnetization. A drawback of this method is the requirement of large current to induce sufficient magnetic field. It is also well understood that this method does not provide good scalability because decreasing the MTJ size entails larger switching fields, hence, even more current.

A breakthrough in physics of MTJ switching was accomplished in 1996 by the theoretical formulation that the free layer magnetization could be modulated by the direct transfer of spin angular momentum from spin-polarized electrons [18, 19]. This phenomenon, called spin-transfer-torque (STT) magnetization reversal, delivered a new means to control the free layer magnetization by directly applying electric current through MTJ without a need of magnetic field. The magnitude of STT scales with the current density (J). This is particularly beneficial for device scalability since the critical switching current (I _c) should scale proportionally to the size of MTJ. A breakthrough demonstration of STT-MRAM at an array level was first reported in 2005, including TMR of 160 % and switching speed as fast as 1 ns [2].

For i-MTJ, the intrinsic critical switching current (I _c0) is given by

$$ {I}_{c0}=\frac{2e}{\hslash}\;\frac{\alpha }{\eta }{M}_SV\left({H}_{k\left|\right|}+\frac{H_d-{H}_{k\perp }}{2}\right) $$

(4.4)

where α is the damping constant, η is the spin polarization constant, H _k|| is the uniaxial anisotropy field in the film plane, H _d is the effective perpendicular demagnetization field that corresponds to the field required to saturate the free layer moment perpendicular to the film plane, and H _k⊥ is the anisotropy field perpendicular to the plane. The H _d term, given by 4πM _s, represents an additional energy term that needs to be overcome during STT switching because the shape anisotropy induces an oscillatory motion of magnetization confined in the direction perpendicular to the film plane, resulting in an elliptical precession. Undesirably, H _d only greatly increases I _c0 without contributing to E _B. A technological challenge in building STT-MRAM is to reduce I _c0 while maintaining sufficient E _B. Hence, an effective way of reducing I _c0 without degrading E _B is to introduce perpendicular anisotropy H _k⊥ to cancel a substantial portion of H _d.

Considered as an essential figure of merit, the STT switching efficiency is described by the ratio of E _B and I _c0. For i-MTJ, it is typically on the order of 0.5–1 k _B T/μA. This allows good scalability for i-MTJ as small as approximately 40 nm (short axis). However, the success of STT-MRAM is largely dependent on whether MTJ can be scaled to deep nanoscale nodes (30 nm and below) in conjunction with low switching energy and high stability. Unless the STT efficiency is raised significantly, i-MTJ may not provide sufficient E _B for nonvolatility, which limits physical scaling of i-MTJ for future nodes.

This scalability challenge can be overcome by adopting p-MTJ which provides much greater anisotropy even in small features. E _B of p-MTJ is determined by crystalline or interface perpendicular magnetic anisotropy (PMA), not by the shape anisotropy of i-MTJ. Various PMA materials have been investigated, which include L1₀-ordered FePt or FePd alloys, Co-based superlattices such as Co/Pt and Co/Ni laminates, rare-earth/transition metal alloys, etc. To build useful MTJ devices, however, these materials must be engineered for an optimal combination of materials properties like M _s and H _k and device properties like TMR and J _c. A prior report addressed that the anisotropy resulting from the CoFeB-MgO interface can induce large H _k⊥ [4]. When CoFeB is sufficiently thin (typically 1.5 nm or thinner), such interface PMA can overcome the demagnetization field, i.e., H _k⊥ > H _d. The film can then become magnetized fully perpendicular to the plane. With further tuning of the stack, such interface PMA can be achieved for even thicker CoFeB. Recently, p-MTJ devices utilizing interfacial PMA of CoFeB have successfully been engineered for fully functional 8 Mbit STT-MRAM [6].

Referring to Eq. (4.4), I _c0 and E _B pertaining to p-MTJ with interfacial PMA are described by

$$ {I}_{c0}=\frac{e}{\hbar}\;\frac{\alpha }{\eta}\;{M}_SV{H}_{k\perp}^{\mathit{eff}} $$

(4.5)

$$ {E}_B=\frac{M_SV{H}_{k\perp}^{\mathit{eff}}}{2} $$

(4.6)

where $ {H}_{k\perp}^{\mathit{eff}} $ is the effective perpendicular anisotropy field. In contrast to i-MTJ described by Eqs. (4.3) and (4.4), note that I _c0 is directly proportional to E _B. The absence of the H _d term means that STT switching is far more efficient. This leads to substantially higher STT efficiency (E _B/I _c0). Recently, $ {E}_B/{I_c}_0\sim 5\kern0.24em {k}_BT/\mu A $ has been reported from an array of ~30 nm p-MTJ [6], suggesting that the STT efficiency of p-MTJ could be an order of magnitude greater than that of in-plane MTJ. This is a significant breakthrough demonstrating the scalability of p-MTJ based on interfacial PMA of CoFeB, which is a preferable material to achieve high TMR as well.
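Dividing Eq. (4.6) by Eq. (4.5) cancels the common factor M _S V H _k⊥^eff and leaves E _B / I _c0 = ħη / (2eα), so within this model the STT efficiency of p-MTJ depends only on the damping constant and the spin polarization. The Python sketch below evaluates this ratio in units of k _B T/μA. The values α = 0.01 and η = 0.6 are hypothetical inputs chosen for illustration, not measured data from the cited work.

```python
HBAR = 1.054571817e-34      # reduced Planck constant, J*s
E_CHARGE = 1.602176634e-19  # elementary charge, C
K_B = 1.380649e-23          # Boltzmann constant, J/K


def stt_efficiency_kbt_per_ua(alpha, eta, temperature_k=300.0):
    """E_B / I_c0 from Eqs. (4.5) and (4.6), in k_B*T per microampere.

    E_B / I_c0 = hbar * eta / (2 * e * alpha)   [joules per ampere]
    """
    joule_per_amp = HBAR * eta / (2.0 * E_CHARGE * alpha)
    joule_per_microamp = joule_per_amp * 1e-6
    return joule_per_microamp / (K_B * temperature_k)


# Hypothetical material parameters, for illustration only.
efficiency = stt_efficiency_kbt_per_ua(alpha=0.01, eta=0.6)
```

With these assumed parameters the ratio comes out at a few k _B T/μA, the same order as the reported value. The expression also shows why low damping and high spin polarization are the main material levers for improving efficiency in this model.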

4.3 Device Engineering

4.3.1 Bitcell and Array

STT-MRAM is a hybrid IC built on a combination of semiconductor logic and MTJ. Its bitcell which represents 1 bit is commonly architected in 1 transistor plus 1 MTJ (1T-1J). As shown in Fig. 4.3, a MTJ is connected in series to an n-type metal oxide semiconductor transistor (NMOS). This transistor is called an access transistor since it controls read and write access to the connected MTJ as a digital switch.

Figure 4.4 is a schematic representation of a typical STT-MRAM array that consists of 1T-1J bitcells. To read the information stored in a cell, the word line (WL) of the selected cell is turned on and a small read current is applied by a sensing circuit (Sect. 4.4) to either the selected bit line (BL) or the source line (SL) with the other end of the cell grounded (GND). A sense amplifier determines the cell state by sensing the difference between the cell resistance and the reference resistance predefined from a reference MTJ array. In comparison, the write operation requires bidirectional currents because the direction of write current determines which resistance state (R _p or R _ap) is programmed to MTJ. With the bitcell architecture shown in Fig. 4.4, for R _ap → R _p, a write voltage is applied to BL (V _BL = V _DD) with WL turned on (V _WL = V _DD) and SL grounded (V _SL = 0 V, GND), and vice versa for R _p → R _ap. For successful write operation, the write current (I _w) supplied to the MTJ in each bitcell must be larger than the MTJ critical switching current (I _c).
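The write bias conditions described above can be summarized in a small lookup. This is only an illustrative sketch: the function names and the state labels "P" and "AP" are hypothetical, and the R _p → R _ap entry simply mirrors the "vice versa" in the text (SL driven to V _DD and BL grounded).

```python
def write_bias(target_state, v_dd):
    """Return (V_BL, V_WL, V_SL) for writing a 1T-1J cell.

    target_state: "P"  for R_ap -> R_p (BL driven, SL grounded)
                  "AP" for R_p -> R_ap (SL driven, BL grounded)
    """
    if target_state == "P":
        return (v_dd, v_dd, 0.0)
    if target_state == "AP":
        return (0.0, v_dd, v_dd)
    raise ValueError("target_state must be 'P' or 'AP'")


def write_succeeds(i_w, i_c):
    """A write succeeds only if the supplied current exceeds I_c."""
    return i_w > i_c


bias_to_p = write_bias("P", 1.1)    # 1.1 V is a hypothetical V_DD
bias_to_ap = write_bias("AP", 1.1)
```

In both cases the word line is at V _DD; only the roles of BL and SL are exchanged, which reverses the direction of the current through the MTJ.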

Figure 4.4b shows an example layout of 1T-1J bitcell with an array architecture illustrated in Fig. 4.4a. Provided that the minimum metal half-pitch is λ, two metal lines BL and SL running in parallel limit the minimum bitcell width to 4 λ. Then the metal plate connected to the source and the drain of the access transistor may limit the bitcell height to 3 λ or 1.5 times of the gate pitch. Assuming the metal pitch is larger than the gate pitch, the bitcell size can be as small as 12 λ².

The array architecture shown in Fig. 4.4 is simple to design and operate. One shortcoming of this structure is that every BL is coupled with its own SL, thereby causing a larger array footprint. A more compact array can be realized by placing SL orthogonal to BL, as shown in Fig. 4.5. SL is then parallel to WL and shared between two neighboring rows of WL. With this architecture, the bitcell size can be as small as 6 λ² (Fig. 4.5b), a half of the size in Fig. 4.4. However, this architecture results in more complex write operation for R _p → R _ap. When SL is raised to a write voltage, the selected BL is grounded. Simultaneously, all the unselected BL associated with the selected WL must be raised to the same level of the write voltage to avoid unintentional current flows to the unselected MTJ. Consequently, this architecture consumes more power during write operation. Furthermore, this may even necessitate two separate write pulses to complete a full write cycle, since in chip-level operation each full cycle carries multiple bits (typically, 32, 64, or 128 bits) of R _p and R _ap concurrently. Accordingly, this architecture is not desirable for low power and high speed applications.

Table 4.1 describes an example of the attributes of a bitcell embedded for a 45-nm low-power logic platform [3]. The bitcell size is ~50 F², where F is 45 nm (the minimum feature size of this node). Expressed in terms of the metal half-pitch λ, the size is ~20 λ² since λ is 70 nm. This is significantly larger than the ideal layout of the same array architecture in Fig. 4.4b, which is attributed to the constraints of the logic design rules of this particular technology.
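The bitcell-size figures quoted in this section follow from simple arithmetic, sketched below in Python. The helper name is hypothetical; the numbers (4 λ by 3 λ for the ideal 1T-1J cell, 50 F² with F = 45 nm and λ = 70 nm for the 45-nm example) are those given in the text.

```python
def area_f2_to_lambda2(area_f2, f_nm, lambda_nm):
    """Convert a bitcell area from units of F^2 to units of lambda^2."""
    return area_f2 * f_nm ** 2 / lambda_nm ** 2


# Ideal 1T-1J cell with BL parallel to SL: width 4*lambda, height 3*lambda.
ideal_1t1j_lambda2 = 4 * 3

# Orthogonal-SL array: half the size of the parallel-SL cell.
ideal_orthogonal_lambda2 = ideal_1t1j_lambda2 / 2

# 45-nm example: ~50 F^2 with F = 45 nm and lambda = 70 nm.
example_lambda2 = area_f2_to_lambda2(50, 45, 70)
example_um2 = 50 * (45e-3) ** 2   # same area in square micrometers
```

The conversion gives roughly 20.7 λ² (about 0.1 μm²), compared with 12 λ² for the ideal layout of the same array architecture.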

Table 4.1 Key attributes of an embedded STT-MRAM bitcell demonstrated for a 45-nm low-power logic platform [3]

4.3.2 Writability

MTJ switching is a current-induced phenomenon, and the switching operation requires a bidirectional control of current. For 1T-1J, the currents supplied to MTJ are not symmetrical with respect to the polarity of current, owing to the phenomenon known as the source degeneration effect. This occurs when a resistive load is placed at the source side of a transistor. As a consequence, despite the same operating voltage (V _DD) applied to BL or SL, the transistor output currents are asymmetrical. This is illustrated in Fig. 4.6, where such asymmetry is simulated at a full circuit level. The asymmetry is a significant disadvantage because it reduces the write margin of 1T-1J. Furthermore, the STT effect on a typical MTJ is also asymmetrical, which is described by I _c asymmetry (β), defined as $ \left|\frac{I_c^{P\to AP}}{I_c^{AP\to P}}\right| $. Typical MTJ devices exhibit β of 1.5 or larger, presumably due to a smaller STT effect for R _p → R _ap (electrons flowing from the free layer to the reference layer). When these two effects are coupled in a conventional 1T-1J bitcell, it is much more difficult to switch the cell from R _p to R _ap, which often results in an increase in transistor size or operating voltage. Several approaches have been suggested to mitigate these problems: (1) I _c asymmetry reduction using dual spin polarizers [20]; (2) a “top-pinned” MTJ film stack [21]; and (3) a modified 1T-1J with a reversely connected MTJ [3].

In most cases of STT-MRAM targeted for fast switching, a primary challenge is to design for the capability of supplying sufficiently large driving current for MTJ switching. A simple alternative to 1T-1J is 2T-1J, for which one MTJ is coupled with two access transistors in parallel. The drive current can become significantly larger. Despite the fact that an additional transistor makes the effective transistor size twice as large as that of 1T-1J, the bitcell size increases only by ~33 % to 16 λ², as shown in Fig. 4.7. This is realized through an optimized 2T-1J layout by sharing the source line between neighboring bitcells and therefore eliminating the spacing between the active regions of neighboring bitcells. Compared with the 1T-1J bitcell (Fig. 4.4b) whose bitcell height is often 1.5 times of the gate pitch, the height of the 2T-1J bitcell is increased to 2 times of the gate pitch, thereby increasing the bitcell size by ~33 %.

In STT-MRAM, I _c has a strong dependence on write pulse width, as illustrated in Fig. 4.8. Fast MTJ switching, often referred to as precessional switching (10 ns or below), requires substantially larger I _c than relatively slow switching. This leads to challenges in designing high-performance bitcells. Unless the MTJ size is substantially small, it is often difficult to realize sub-10 ns switching without enlarging the bitcell size. This is a primary reason why continuing innovations in MTJ materials engineering are still desired to reduce J _c. Recent advances have realized reliable switching in 1T-1J below 4 ns with write error rate lower than 10⁻⁶ [6].

In practice, it is necessary to tailor MTJ and bitcell attributes to the write speed requirements of different STT-MRAM product applications. For example, for embedded Level 2 or Level 3 CPU cache memory, the MTJ switching speed is preferred to be on the order of a few nanoseconds, although this could often be relaxed significantly through various design optimization techniques. In contrast, for traditional embedded nonvolatile memory applications, a switching speed on the order of a microsecond is still compelling (a few orders of magnitude faster than embedded Flash). An advantage of STT-MRAM is that MTJ can be tuned for custom bitcells that serve widely varying product applications.

4.4 Circuit Design

4.4.1 Write Circuit

The write operation of STT-MRAM is to switch the state of MTJ by supplying current higher than I _c. The polarity of the current determines the switched state, either 0 (R _p) or 1 (R _ap). As shown in Fig. 4.9 [22], a write driver is connected to BL and SL, respectively, which acts as a current source or a sink depending on the current polarity. Each write driver is realized by a tri-state inverter. The magnitude of the write supply current (I _w) is determined by the size of the write driver. To write 0, the current flows from the free layer to the pinned layer of MTJ, so that the write driver of BL operates as a current source and that of SL as a current sink. Accordingly, the D value of the write driver (Fig. 4.9c) is high for BL and low for SL. As shown in Fig. 4.8, I _c is a function of write pulse width. I _c becomes higher as the pulse width is shorter. Thus a write enable signal (WET, WEB) should be controlled precisely to prevent write failure (occurring when I _w < I _c). On the other hand, a wear-out reliability risk may arise when I _w is too high. Therefore, designing a write driver must consider two factors: precise control of write pulse width and optimal sizing of the driver.

Both I _w and I _c are dependent on process variation and can be modeled by Gaussian distributions. For a single cell, the write access pass yield (WAPY), expressed in sigma (standard deviation), is obtained by combining the distributions of I _w and I _c

$$ WAP{Y}_{Cell}=\frac{\mu_{I_w}-{\mu}_{I_c}}{\sqrt{\sigma_{I_w}^2+{\sigma}_{I_c}^2}} $$

(4.7)

where $ {\mu}_{I_w} $ and $ {\mu}_{I_c} $ are the means of I _w and I _c, respectively, and $ {\sigma}_{I_w} $ and $ {\sigma}_{I_c} $ are the standard deviations of I _w and I _c, respectively.
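A minimal Python sketch of this yield metric is given below. It treats I _w and I _c as independent Gaussian variables, so the margin I _w − I _c has a variance equal to the sum of the two variances. The conversion from a sigma margin to a per-cell failure probability through the Gaussian tail is an added illustration, and the function names and current values are hypothetical.

```python
import math


def wapy_cell(mu_iw, sigma_iw, mu_ic, sigma_ic):
    """Write access pass yield of one cell, in sigma (Eq. 4.7).

    Assumes I_w and I_c are independent and Gaussian.
    """
    return (mu_iw - mu_ic) / math.sqrt(sigma_iw ** 2 + sigma_ic ** 2)


def fail_probability(sigma_margin):
    """One-sided Gaussian tail probability P(Z > sigma_margin)."""
    return 0.5 * math.erfc(sigma_margin / math.sqrt(2.0))


# Hypothetical currents in microamperes, for illustration only.
wapy = wapy_cell(mu_iw=100.0, sigma_iw=5.0, mu_ic=60.0, sigma_ic=5.0)
p_write_fail = fail_probability(wapy)
```

Raising the mean write current or tightening either distribution increases the margin, but the text above notes that an excessively high I _w carries a wear-out reliability risk, so the driver size cannot be increased without limit.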

4.4.2 Read Circuit

4.4.2.1 Conventional Sensing Circuit

The read operation of STT-MRAM determines the resistance state of each cell with respect to the predefined state of a reference MTJ array. The operation relies on a sensing circuit and a sense amplifier which converts an output voltage of the sensing circuit to a digital signal. Figure 4.10 shows a conventional sensing circuit designed for MRAM [23]. The circuit is comprised of a data branch and two reference branches. Each branch includes a clamp NMOS (NC _D or NC _R) and a load PMOS (PL _D or PL _R). The sensing current (I _s) is controlled by the gate voltage of clamp NMOS (V _{G_clamp}). The clamp NMOS generates different currents according to the MTJ state 0 or 1. The source voltage of clamp NMOS is fixed in a saturation region. The saturation current of clamp NMOS is high at 0 and low at 1. The two clamp NMOS of the reference branches (NC _R) are designed to generate a saturation current at a medium level between 0 and 1 of the data branch, as shown in Fig. 4.11. The saturation current of NC _R is conveyed to PL _D through a current mirror circuit. Thus, the saturation current of NC _D is larger than that of PL _D for 0 and accordingly V _data0 is low, whereas the saturation current of NC _D is smaller than that of PL _D for 1 and V _data1 is high.

The sense amplifier determines 0 or 1 of the data MTJ by comparing output voltages (V _data0, V _data1, V _ref) of the sensing circuit. Read is successful when the difference between V _data and V _ref (ΔV ₀ = V _ref − V _data0, ΔV ₁ = V _data1 − V _ref) is larger than the offset voltage of the sense amplifier (V _{SA_OS}). Note that V _data is susceptible to PVT variations. Thus, it is important to design V _ref in a way to trace PVT variations of V _data.
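The read success criterion can be written out directly. The sketch below computes the two sensing margins and compares them with the sense amplifier offset. The function names and the voltages are hypothetical and serve only to illustrate the condition stated above.

```python
def sensing_margins(v_data0, v_data1, v_ref):
    """Return (dV0, dV1) with dV0 = V_ref - V_data0 and dV1 = V_data1 - V_ref."""
    return (v_ref - v_data0, v_data1 - v_ref)


def read_is_successful(v_data0, v_data1, v_ref, v_sa_os):
    """Both margins must exceed the sense amplifier offset voltage."""
    dv0, dv1 = sensing_margins(v_data0, v_data1, v_ref)
    return dv0 > v_sa_os and dv1 > v_sa_os


# Hypothetical voltages in volts.
margins = sensing_margins(0.30, 0.50, 0.40)
ok = read_is_successful(0.30, 0.50, 0.40, v_sa_os=0.03)
```

Because V _data0 and V _data1 shift with PVT variations, a V _ref that stays near their midpoint keeps both margins positive over the operating window.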

4.4.2.2 Read Yield

A circuit designer needs to prevent two types of functional failure during read operation. Sensing failure occurs when ΔV ₀ or ΔV ₁ is smaller than V _{SA_OS}. Read disturbance failure is possible when I _s exceeds I _c (i.e. unintentional switching during sensing). Considering these two, a statistical read yield model can be built in the following way.

The statistical distributions of ΔV ₀, ΔV ₁, and V _{SA_OS} can be modeled by Gaussian distributions. For a single cell, the read access pass yield for 0 or 1 (RAPY_Cell0 or RAPY_Cell1) is given by [24]

$$ RAP{Y}_{Cell0,1}=\frac{\mu_{\Delta {V}_{0,1}}-{\mu}_{SA\_ OS}}{\sqrt{\sigma_{\Delta {V}_{0,1}}^2+{\sigma}_{SA\_ OS}^2}} $$

(4.8)

where $ {\mu}_{\Delta {V}_{0,1}} $ and μ _{SA _ OS} are the mean of ΔV _0,1 and V _{SA_OS}, respectively, and $ {\sigma}_{\Delta {V}_{0,1}} $ and σ _{SA _ OS} are the standard deviation of ΔV _0,1 and V _{SA_OS}, respectively. RAPY_Cell is then defined as the smaller of RAPY_Cell0 and RAPY_Cell1.

$$ RAP{Y}_{Cell}= \min \left( RAP{Y}_{Cell0},\ RAP{Y}_{Cell1}\right) $$

(4.9)

The criterion for read disturbance failure is I _s ≥ I _c. Thus, the read disturbance pass yield (RDPY) is given by:

$$ RDP{Y}_{Cell}=\frac{\mu_{I_c}-{\mu}_{I_s}}{\sqrt{\sigma_{I_c}^2+{\sigma}_{I_s}^2}} $$

(4.10)

I _s has large effects both on RAPY and on RDPY. As illustrated in Fig. 4.12, RDPY_Cell decreases as I _s increases. In addition, RDPY_Cell is lower when I _c is lower. On the other hand, there is an optimum I _s which maximizes RAPY_Cell because it is difficult to achieve small $ {\sigma}_{\Delta {V}_{0,1}} $ and large $ {\mu}_{\Delta {V}_{0,1}} $ when I _s is too low and too high, respectively. Therefore, depending on I _c, different design strategies are applicable to maximize read yield. For high I _c, RDPY_Cell is also high, so that I _s is tuned to maximize RAPY_Cell (Point A). For low I _c, it is desired to find I _s to make RDPY_Cell and RAPY_Cell equal (Point B). In general, I _c continually scales down as the feature size shrinks, which means that controlling read disturb yield is becoming of great significance.
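The read yield model can be sketched in Python as follows. Each quantity is passed as a (mean, standard deviation) pair and treated as an independent Gaussian, so the variances of the two compared quantities add. The selection rule in `pick_sensing_current`, which maximizes the smaller of RAPY_Cell and RDPY_Cell, is one possible way to express the design strategy described above and is an assumption of this sketch, as are all names and numbers.

```python
import math


def sigma_margin(mu_a, sigma_a, mu_b, sigma_b):
    """Margin (mu_a - mu_b) in sigma for independent Gaussian a and b."""
    return (mu_a - mu_b) / math.sqrt(sigma_a ** 2 + sigma_b ** 2)


def rapy_cell(dv0, dv1, sa_os):
    """Eqs. (4.8)-(4.9); dv0, dv1, sa_os are (mean, std) pairs."""
    rapy0 = sigma_margin(*dv0, *sa_os)
    rapy1 = sigma_margin(*dv1, *sa_os)
    return min(rapy0, rapy1)


def rdpy_cell(i_c, i_s):
    """Eq. (4.10); i_c and i_s are (mean, std) pairs."""
    return sigma_margin(*i_c, *i_s)


def pick_sensing_current(candidates):
    """candidates: list of (label, rapy, rdpy) tuples.

    Returns the candidate with the largest min(rapy, rdpy).
    """
    return max(candidates, key=lambda c: min(c[1], c[2]))


# Hypothetical numbers: volts for the margins, microamperes for currents.
rapy = rapy_cell(dv0=(0.10, 0.015), dv1=(0.12, 0.02), sa_os=(0.0, 0.01))
rdpy = rdpy_cell(i_c=(50.0, 4.0), i_s=(20.0, 3.0))
best = pick_sensing_current([("low", 4.0, 7.0), ("mid", 5.5, 5.5), ("high", 6.0, 3.0)])
```

When RDPY_Cell is high for every candidate, this rule reduces to maximizing RAPY_Cell (Point A). When RDPY_Cell falls quickly with I _s, the selected candidate sits near the point where the two yields are equal (Point B).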

4.4.3 Advanced Sensing Circuits

Assuring adequate sensing margin (ΔV ₀ and ΔV ₁) for STT-MRAM at deeply scaled nodes necessitates extensive design efforts owing to the decrease in supply voltage and the increase in process variation. Further, STT-MRAM must be designed to avoid potential read disturbance, which calls for a low-current sensing method. To address these challenges, various types of advanced sensing circuits have been developed.

4.4.3.1 Source Degeneration PMOS

The load PMOS and the clamp NMOS shown in Fig. 4.10 can become a significant source of process variation for the sensing circuit. The clamp NMOS has a large source resistance, which operates as a source degeneration resistance to keep the current through the clamp NMOS as constant as possible. On the other hand, the source of the load PMOS is directly connected to a voltage supply (V _DD), leading to large current variation. To reduce the variation effect of the load PMOS, a source degeneration scheme can be adopted by inserting a degeneration PMOS between the source of the load PMOS and the voltage supply, as shown in Fig. 4.13 [24]. This is a relatively simple method to increase read yield by minimizing the variation of ΔV ₀ and ΔV ₁ caused by the process variation of the load PMOS.

4.4.3.2 Self Body Biasing

Assuring sensing margin (ΔV ₀ and ΔV ₁) is more challenging at lower V _DD and higher V _TH (threshold voltage), i.e., when the voltage headroom (V _DD–V _TH) is smaller. Note that high V _TH transistors are widely adopted for low standby power applications. Figure 4.14 describes a sensing circuit that can mitigate this challenge by utilizing self body biasing [25]. The body bias can decrease V _TH of the load PMOS when the sensing circuit is active. This helps secure the sensing margin during read operation while not causing high leakage current at the standby mode. In addition, it internally generates a body voltage without utilizing a body voltage generator, so that the area overhead is minimal compared with the conventional sensing circuit (Fig. 4.10). This scheme can be coupled with the degeneration PMOS described in Sect. 4.4.3.1.

4.4.3.3 Split-Path Sensing

A sensing circuit can adopt a variable V _ref to increase ΔV ₀ and ΔV ₁. V _ref is modulated according to the MTJ state. At 0, V _ref increases, and so does ΔV ₀. At 1, V _ref decreases, hence ΔV ₁ increases. Figure 4.15a shows a conventional sensing circuit with variable V _ref using a symmetric cross-coupled current mirror. A drawback of this circuit is that the mismatch that occurs during such current mirroring increases the standard deviation of ΔV ₀ and ΔV ₁. An alternative scheme has been proposed by adopting a split-path sensing circuit, shown in Fig. 4.15b [26]. The split path enables variable V _ref while minimizing the number of current mirrors. The variable V _ref enhances $ {\mu}_{\Delta {V}_{0,1}} $ by doubling it. The minimized number of current mirrors reduces $ {\sigma}_{\Delta {V}_{0,1}} $.

4.4.3.4 Offset-Canceling Triple-Stage Sensing

One of the emerging challenges for STT-MRAM sensing circuits is to reduce I_s while maintaining sensing margins. This is attributed to the rapid reduction in I_c with p-MTJ (owing to the high STT efficiency addressed in Sect. 4.2) and also to the demand for smaller MTJ (e.g., diameter <20 nm). Reduced I_s results in larger $\sigma_{\Delta V_{0,1}}$, which directly reduces sensing yields. The impact of $\sigma_{\Delta V_{0,1}}$ on yield is often greater than that of $\mu_{\Delta V_{0,1}}$. Figure 4.16 illustrates an offset-canceling triple-stage (OCTS) sensing circuit [27]. It is designed to reduce $\sigma_{\Delta V_{0,1}}$ by canceling the offsets of the sensing circuit caused by process variations. The principle of OCTS is to sense three cells in sequence, namely the data cell and the two reference cells storing 0 and 1, through a single sensing circuit. The output voltages are then added or subtracted to cancel out the offsets.
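
The offset-canceling idea can be sketched numerically. The model below is a deliberately simplified arithmetic illustration, not the circuit of [27]: it assumes that every reading taken through the same sensing circuit carries the same additive offset, and that the data reading is counted twice against the two reference readings. The voltages, the offset sigma, and the function names are hypothetical.

```python
import random
import statistics

def conventional_margin(v_data, v_ref, sigma_off, rng):
    # Data and reference branches use different devices, so each has its own offset.
    return (v_data + rng.gauss(0.0, sigma_off)) - (v_ref + rng.gauss(0.0, sigma_off))

def octs_like_margin(v_data, v_ref0, v_ref1, sigma_off, rng):
    # One circuit senses all three cells, so the same offset appears in every reading.
    off = rng.gauss(0.0, sigma_off)
    s_data, s0, s1 = v_data + off, v_ref0 + off, v_ref1 + off
    # Adding the data reading twice and subtracting both references removes the offset.
    return 0.5 * (2.0 * s_data - s0 - s1)

rng = random.Random(0)             # fixed seed for repeatability
V0, V1, SIGMA = 0.40, 0.60, 0.02   # hypothetical voltages and offset sigma, volts

conv = [conventional_margin(V0, 0.5 * (V0 + V1), SIGMA, rng) for _ in range(5000)]
octs = [octs_like_margin(V0, V0, V1, SIGMA, rng) for _ in range(5000)]

sigma_conv = statistics.pstdev(conv)
sigma_octs = statistics.pstdev(octs)
print(f"sigma, two-branch sensing:   {sigma_conv:.4f} V")
print(f"sigma, single-circuit model: {sigma_octs:.2e} V")
```

In this idealized model the sensing-circuit offset drops out completely; in a real circuit the cancellation is limited by effects the sketch ignores, such as charge loss on the storage capacitors.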

A drawback of OCTS is that a single sensing circuit must read the data cell and the reference cells sequentially. Hence it is difficult to avoid a read speed penalty, though this may be tolerable for most applications. In addition, gate capacitors are required to store the sensed value at each stage, which increases the array size.

4.4.3.5 Self-Reference Sensing

OCTS can effectively cancel out the offsets of the sensing circuit, but cannot improve the sensing yield related to MTJ process variation. Self-reference circuits, shown in Fig. 4.17, generate V₀, V₁, and V_ref with only one MTJ. Such sensing circuits minimize the offsets caused not only by the sensing circuit but also by MTJ process variation. Figure 4.17a is a relatively simple self-reference scheme. At the first stage, it reads the MTJ cell and stores the result (V₀ or V₁) on a capacitor. At the second stage, the MTJ cell is written to 0 by a write current larger than the current used in the first stage, and the sensing circuit reads this cell to generate V_ref. Because the stored information of the MTJ is overwritten during the read operation, this method is destructive, and the readout value must be written back to the MTJ at the last step. This degrades the read speed and consumes more energy. In contrast, an alternative scheme, which is nondestructive, is shown in Fig. 4.17b [28]. This scheme maintains the MTJ state after the first sensing. When the MTJ is at 0, the resistance of the cell is nearly constant regardless of the current flowing through the MTJ cell; when the MTJ is at 1, the resistance changes with the current, and this change is what the circuit detects. Accordingly, this nondestructive self-reference scheme generates V_ref without write and write-back processes, overcoming the drawbacks of the circuit in Fig. 4.17a. However, it is difficult to resolve the resistance difference when the current difference is small, which may become a challenge as operating currents continue to decrease with scaling.
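
The sequence of the destructive scheme (read and store, overwrite with 0, read again for the reference, compare, write back) can be sketched as a small behavioral model. The resistances, the read current, and the `Cell` class are hypothetical. The sketch also assumes that the second read uses a moderately larger current than the first, so that the reference lands between V₀ and V₁; this is one common way to build such a reference and is an assumption here, not a detail confirmed by the text.

```python
from dataclasses import dataclass

R0, R1 = 2000.0, 4000.0   # hypothetical low/high MTJ resistances, ohms

@dataclass
class Cell:
    state: int  # 0 = low resistance, 1 = high resistance

    def read(self, current):
        """Voltage across the cell for a given read current."""
        return current * (R1 if self.state else R0)

def destructive_self_reference_read(cell, i_read=20e-6, ratio=1.5):
    """Behavioral model of a destructive self-reference read.

    ratio scales the second read current; it must satisfy 1 < ratio < R1/R0
    so that the reference falls between V0 and V1 (assumption)."""
    v_data = cell.read(i_read)          # stage 1: sample V0 or V1
    cell.state = 0                      # stage 2: overwrite with 0 (data destroyed)
    v_ref = cell.read(ratio * i_read)   # reference generated from the same MTJ
    bit = 1 if v_data > v_ref else 0    # compare stored value with reference
    cell.state = bit                    # last step: write the value back
    return bit

for stored in (0, 1):
    cell = Cell(stored)
    print(stored, destructive_self_reference_read(cell), cell.state)
```

Because the data and the reference come from the same MTJ, cell-to-cell resistance variation affects both in the same way, which is the point of self-referencing; the extra write and write-back steps are the cost.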

4.4.4 Array Architecture

In general, memory array architecture is an essential design parameter that influences performance, power consumption, yield, and chip size. The optimal array architecture therefore depends on bitcell specification, chip specification, target yield, and even reliability. The array efficiency, the ratio of bitcell array area to total memory area including peripheral circuits, becomes higher as the array size increases. However, a larger array degrades performance because parasitic resistances and capacitances increase with the number of cells connected to each BL. As shown above, read circuit design is more challenging for STT-MRAM, so any increase in array size must be weighed carefully against its impact on read performance.
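
The trend in array efficiency can be illustrated with a rough area model. The sketch below assumes that the peripheral area grows with the number of rows and columns (drivers and sensing circuits along the edges) while the bitcell area grows with their product; all area values are hypothetical placeholders.

```python
def array_efficiency(rows, cols, cell_area, periph_per_row, periph_per_col):
    """Ratio of bitcell array area to total area (array plus periphery).

    Areas are in arbitrary but consistent units (e.g., um^2)."""
    array_area = rows * cols * cell_area
    periph_area = rows * periph_per_row + cols * periph_per_col
    return array_area / (array_area + periph_area)

# Hypothetical areas: bitcell, per-row WL driver, per-column read/write circuitry.
CELL, PER_ROW, PER_COL = 0.05, 5.0, 20.0

small = array_efficiency(128, 128, CELL, PER_ROW, PER_COL)
large = array_efficiency(512, 512, CELL, PER_ROW, PER_COL)
print(f"128 x 128: {small:.2f}")
print(f"512 x 512: {large:.2f}")
```

The model captures only the area side of the tradeoff; the parasitic resistance and capacitance penalty of longer bit lines would have to be evaluated separately.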

4.4.4.1 Multiplexer (MUX) Architecture

STT-MRAM employs a different MUX structure compared with conventional memories. For the read operation of STT-MRAM, the selected BL is connected to the sensing circuit and SL to the ground (GND). For the write operation of 0, the write driver connected to the selected BL drives V_DD and the driver connected to SL drives GND, and vice versa for 1. As shown in Fig. 4.18, these read and write operations can be enabled by different types of MUX structures. Figure 4.18a is a common architecture that achieves a small footprint since only one write driver is required for all BL and SL. However, the current from the write driver must pass through the MUX, which reduces I_w. Figure 4.18b shows a merged write-driver structure in which the selection control uses an independent write driver for each BL and SL. This results in a larger area, but provides higher I_w owing to the absence of MUX along the write path. Figure 4.18c illustrates selectors that drive V_DD and GND from both terminals of BL and SL by separating the write driver. The area becomes even larger; however, this MUX can realize higher yield by controlling parasitic mismatches, because the lengths of BL and SL are the same for all the cells.
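
The line biasing described at the start of this subsection can be summarized as a small lookup table. The function name and the string labels are illustrative only; the BL and SL assignments follow the text.

```python
def line_bias(op):
    """Return the connection of the selected BL and SL for a given operation.

    op is one of "read", "write0", "write1"."""
    table = {
        "read":   {"BL": "sensing circuit", "SL": "GND"},
        "write0": {"BL": "VDD", "SL": "GND"},
        "write1": {"BL": "GND", "SL": "VDD"},
    }
    return table[op]

for op in ("read", "write0", "write1"):
    print(op, line_bias(op))
```

The bidirectional write (BL and SL swap roles between the two write polarities) is what forces the MUX to handle both lines, unlike memories that write through the bit line alone.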

4.4.4.2 Reference Cell Architecture

STT-MRAM reference cell architectures are generally categorized according to the position of the reference cells, as shown in Fig. 4.19. An array structure that places all reference cells along one WL (Fig. 4.19a) has the advantage of a small footprint. However, this architecture is not preferable for high-capacity STT-MRAM owing to parasitic mismatches between reference cells and data cells. In comparison, the reference architecture in Fig. 4.19b, which places all reference cells along one BL, has the advantage of higher read yield. There is an area penalty associated with this, though it is tolerable. While each data block ordinarily has its own reference cell array, it is also possible to design a shared reference scheme in which two data blocks share two adjacent reference cells (Fig. 4.19c).

4.5 Co-design of MTJ and Logic

Designing an IC at an advanced technology node with embedded STT-MRAM requires a proven statistical circuit model that addresses systematic and random variations of MTJ and logic [29]. It is important to understand the challenges imposed by deep scaling of the logic technology. Recent work presented a first-of-its-kind statistical circuit model and applied it to the design of an STT-MRAM building block and its array [30]. Figure 4.20 illustrates key components of this co-design methodology. The model can be seamlessly integrated into a common CMOS circuit simulation environment. The statistical variability-aware model fits Si data by covering PVT variations. The model is also combined with a micromagnetic physical model, allowing co-optimization of MTJ physical parameters and cell circuit parameters. The model can thus be applied to tune the device parameter specifications required to meet target chip performance, yield, and reliability. Correlations among critical design parameters are systematically examined by the model. An example is shown in Fig. 4.21. By performing statistical Monte Carlo simulations, the model is capable of predicting an array functional yield. The model correlates various functional failure modes to physical cell defects or circuit-design errors. The accuracy of the model has been validated by chip-level functionality and yield data [31].
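
The general idea of Monte Carlo yield prediction can be sketched in a few lines. This toy example is not the model of [29, 30]: it assumes that the sensing margin of each bit is an independent Gaussian sample, that a bit is read correctly when its margin exceeds a minimum resolvable voltage, and that an array is functional only if every bit passes. All numbers and function names are hypothetical.

```python
import random

def bit_pass_rate(mu, sigma, v_min, n_samples, rng):
    """Monte Carlo estimate of the probability that a bit's sensing margin,
    drawn from a normal distribution N(mu, sigma), exceeds v_min."""
    passed = sum(1 for _ in range(n_samples) if rng.gauss(mu, sigma) > v_min)
    return passed / n_samples

def array_yield(p_bit, n_bits):
    """Functional yield of an array whose bits pass independently (assumption)."""
    return p_bit ** n_bits

rng = random.Random(1)  # fixed seed for repeatability

# Same mean margin (100 mV) and minimum margin (20 mV), two different sigmas.
tight = bit_pass_rate(0.10, 0.02, 0.02, 20000, rng)
loose = bit_pass_rate(0.10, 0.05, 0.02, 20000, rng)

print(f"bit pass rate, sigma = 20 mV: {tight:.4f}")
print(f"bit pass rate, sigma = 50 mV: {loose:.4f}")
print(f"1 kb array yield, sigma = 50 mV: {array_yield(loose, 1024):.3e}")
```

Even a modest per-bit failure rate collapses the yield of a large array, which is why the spread of the margin matters at least as much as its mean. A production flow would use far more samples or tail-sampling techniques, together with redundancy and error correction, none of which are modeled here.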

4.6 Perspective

Modern SOC memory subsystems are so diverse and complicated that they are difficult to serve with one prevalent memory. It is desirable to optimize each SOC platform with a different combination of memory attributes such as speed, power consumption, reliability, and cost. In this respect, STT-MRAM is attractively positioned, since its building block, the MTJ, can be tuned for a broad range of memory attributes that can serve largely different types of SOC applications. For example, low-power STT-MRAM can become an ideal embedded nonvolatile memory for battery-powered wireless connectivity networks pertaining to Internet-of-Things and wearable electronics, not only by storing nonvolatile codes, but also by storing and executing fast data [32, 33]. This type of STT-MRAM simplifies the conventional memory subsystem and also extends battery life. In addition, its logic-friendly design and process compatibility can realize such benefits at advanced logic nodes, for which it is difficult to employ conventional embedded nonvolatile memory technology. On the other hand, high-performance (<5 ns) STT-MRAM can serve as an alternative to SRAM. Although STT-MRAM is slower than SRAM at a discrete circuit level, the memory subsystem can be architected so that the performance is comparable or even better at a system level. Furthermore, there is a significant range of custom SRAM for which leakage power and cost (chip area) are critical drawbacks. One emerging case is Level-3 cache for mobile CPU. Moreover, even higher-performance STT-MRAM, potentially realized in custom-designed bitcells and circuits, may move up to a higher level of the memory hierarchy (Level-2 cache). In addition, high-throughput embedded STT-MRAM can become an attractive memory for GPU by providing higher on-chip memory density at lower energy consumption. Finally, the MTJ applied for STT-MRAM can be utilized for security and anti-tampering applications. Examples include one-time programmable memory, random number generator, and physically unclonable function.

References

[1] Kang SH, Lee K. Emerging materials and devices in spintronic integrated circuits for energy-smart mobile computing and connectivity. Acta Mater. 2013;61:952–73.
[2] Hosomi M, Yamagishi H, Yamamoto T, et al. A novel nonvolatile memory with spin torque transfer magnetization switching: spin-RAM. IEDM Tech Dig. 2005;2005:459–62.
[3] Lin CJ, Kang SH, Wang YJ, et al. 45 nm low power CMOS logic compatible embedded STT MRAM utilizing a reverse-connection 1T/1MTJ cell. IEDM Tech Dig. 2009;2009:279–82.
[4] Ikeda S, Miura K, Yamamoto H, et al. A perpendicular-anisotropy CoFeB-MgO magnetic tunnel junction. Nat Mater. 2010;9:721–4.
[5] Rizzo ND, Houssameddine D, Janesky J, et al. A fully functional 64 Mb DDR3 ST-MRAM built on 90 nm CMOS technology. IEEE Trans Magn. 2013;49(7):4441–6.
[6] Thomas L, Jan G, Zhu J, et al. Perpendicular STT-MRAM with high spin-torque efficiency and thermal stability for embedded memory applications. J Appl Phys. 2014;115(17):172615.
[7] Sekikawa M, Kiyoyama K, Hasegawa H, et al. A novel SPRAM-based reconfigurable logic block for 3D-stacked reconfigurable spin processor. IEDM Tech Dig. 2008;2008:1–3.
[8] Ohno H. A hybrid CMOS/magnetic tunnel junction approach for nonvolatile integrated circuits. In: VLSI Technology Symposium, 2009, p. 122–23.
[9] Ohno H, Endoh T, Hanyu T, et al. Magnetic tunnel junction for nonvolatile CMOS logic. IEDM Tech Dig. 2010;2010:9.4.1–4.
[10] Ando K. Nonvolatile magnetic memory. J Fed. 2001;12:89–95.
[11] Ando K, Ikegawa S, Abe K, et al. Roles of non-volatile devices in future computer systems: normally-off computers. In: Energy-aware systems and networking for sustainable initiatives. Hershey: IGI Global; 2012. p. 83–107.
[12] Kawahara T. Scalable spin-transfer torque RAM technology for normally-off computing. IEEE Des Test Comput. 2011;28(1):52–63.
[13] Jullière M. Tunneling between ferromagnetic films. Phys Lett A. 1975;54(3):225–6.
[14] Jaffrès H, Lacour D, Nguyen Van Dau F, et al. Angular dependence of the tunnel magnetoresistance in transition-metal-based junctions. Phys Rev B. 2001;64:064427.
[15] Ikeda S, Hayakawa J, Ashizawa Y, et al. Tunnel magnetoresistance of 604% at 300 K by suppression of Ta diffusion in CoFeB/MgO/CoFeB pseudo-spin-valves annealed at high temperature. Appl Phys Lett. 2008;93:082508.
[16] Choi YS, Tsunematsu H, Yamagata S, et al. Novel stack structure of magnetic tunnel junction with MgO tunnel barrier prepared by oxidation methods: preferred grain growth promotion seed layers and bi-layered pinned layer. Jpn J Appl Phys. 2009;48:120214.
[17] Maehara H, Nishimura K, Nagamine Y, et al. Tunnel magnetoresistance above 170% and resistance–area product of 1 Ω-μm² attained by in-situ annealing of ultra-thin MgO tunnel barrier. Appl Phys Express. 2011;4:033002.
[18] Slonczewski JC. Current-driven excitation of magnetic multilayers. J Magn Magn Mater. 1996;159:L1–7.
[19] Berger L. Emission of spin waves by a magnetic multilayer traversed by a current. Phys Rev B. 1996;54:9353–8.
[20] Diao Z, Panchula A, Ding Y, et al. Spin transfer switching in dual MgO magnetic tunnel junctions. Appl Phys Lett. 2007;90:132508.
[21] Lee YM, Yoshida C, Tsunoda K, et al. Highly scalable STT-MRAM with MTJs of top-pinned structure in 1T/1MTJ cell. In: VLSI Technology Symposium, 2010, p. 49–50.
[22] Kawahara T. 2 Mb SPRAM with bit-by-bit bi-directional current write and parallelizing-direction current read. IEEE J Solid-State Circuits. 2008;43(1):109.
[23] Maffitt TM. Design considerations for MRAM. IBM J Res Dev. 2006;50(1):25.
[24] Kim J, Ryu K, Kang SH, et al. A novel sensing circuit for deep submicron spin transfer torque MRAM. IEEE Trans Very Large Scale Integr Syst. 2012;20(1):181–6.
[25] Kim J, Ryu K, Kim JP, et al. An STT-MRAM sensing circuit with self-body biasing in deep submicron technologies. IEEE Trans Very Large Scale Integr Syst. 2014;22(7):1630–4. doi:10.1109/TVLSI.2013.2272587.
[26] Kim J, Na T, Kim JP. A split-path sensing circuit for spin-torque transfer MRAM. IEEE Trans Circuits Syst. 2014. doi:10.1109/TCSII.2013.2296136.
[27] Na T, Kim J, Kim JP, et al. An offset-canceling triple-stage sensing circuit for deep submicrometer STT-RAM. IEEE Trans Very Large Scale Integr Syst. 2014;22(7):1620–4. doi:10.1109/TVLSI.2013.2294095.
[28] Chen Y. A nondestructive self-reference scheme for spin-transfer torque random access memory. In: Design, Automation and Test in Europe Conference and Exhibition (DATE), 8–12 March 2010, p. 148–53.
[29] Zhu X, Kang SH. Spin-transfer-torque MRAM: device architecture and modeling. In: Wang X, editor. Metallic spintronics devices. CRC; 2014. p. 21–70.
[30] Zhu X, Kang SH. Variation-aware device modeling and design for embedded STT-MRAM array. In: 55th MMM Conference HC-13, 2010.
[31] Kim JP, Kim T, Hao W, et al. A 45nm 1Mb embedded STT-MRAM with design techniques to minimize read-disturbance. In: VLSI Circuits Symposium, 2011, p. 296–97.
[32] Kang SH. Embedded STT-MRAM for energy-efficient and cost-effective mobile systems. In: VLSI Technology Symposium, 2014, p. 36–7.
[33] Lee K, Kan JJ, Kang SH. Unified embedded non-volatile memory for emerging mobile markets. In: ISLPED, 2014, p. 131–6.

Author information

Authors and Affiliations

Qualcomm Technologies Inc., 5775 Morehouse Dr., San Diego, CA, 92121, USA
Seung H. Kang
Yonsei University, Seoul, South Korea
Seong-Ook Jung

Authors

Seung H. Kang
Seong-Ook Jung

Corresponding author

Correspondence to Seung H. Kang.

Editor information

Editors and Affiliations

IBM, Hopewell Junction, New York, USA
Rasit O. Topaloglu

About this chapter

Cite this chapter

Kang, S.H., Jung, SO. (2015). Embedded STT-MRAM: Device and Design. In: Topaloglu, R. (eds) More than Moore Technologies for Next Generation Computer Design. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2163-8_4

DOI: https://doi.org/10.1007/978-1-4939-2163-8_4
Published: 21 January 2015
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-2162-1
Online ISBN: 978-1-4939-2163-8
eBook Packages: Engineering, Engineering (R0)
