Introduction

Spin transistors are a new class of electronic device that unites an ordinary transistor with the useful functions of a spin (magnetoresistive) device [1]. Following the first proposal of the spin-transistor concept by Datta and Das [2] and Johnson [3], a wide variety of spin transistors based on various operating principles have been proposed [1]. One of the most attractive applications for spin transistors is integrated electronics such as complementary metal-oxide-semiconductor (CMOS) circuits. Since scalability and integration ability are still indispensable for integrated circuits using spin transistors in the same manner as present CMOS integrated circuits, spin transistors analogous to field-effect transistors (FETs) are attractive. In particular, spin transistors based on CMOS devices, called spin-functional MOSFETs, are expected to be a building block for novel low-power integrated circuits. The spin-functional MOSFETs are classified into two categories, i.e., spin-MOSFET and pseudo-spin-MOSFET. The spin-MOSFET is a field-effect spin transistor using the ferromagnetic source and drain, and the pseudo-spin-MOSFET is a circuit for reproducing the functions of spin transistors using an ordinary MOSFET and a magnetic tunnel junction (MTJ). Their interesting features of nonvolatile information storage and variable transconductance (or current drivability) are useful and suitable for new low-standby power architecture, such as nonvolatile power gating (NVPG). Furthermore, these devices can exhibit excellent performance as an integrated transistor and would have excellent compatibility with present integrated circuit technology. These features also make the spin-functional MOSFETs attractive for integrated circuit applications. In general, transistor performance of the spin transistors, such as propagation delay, switching energy (power-delay product), and standby power, is not superior to ordinary MOSFETs on device level. 
However, circuit performance could be dramatically improved by introducing new circuit architectures based on the functionalities of spin transistors. NVPG is a standby (static) power reduction technique that uses nonvolatile storage/latch circuits and would be highly beneficial for low-power CMOS logic circuits/systems. Nonvolatile SRAM (NV-SRAM) and nonvolatile flip-flop (NV-FF) are the key memory circuits for realizing NVPG, and they can be simply configured with spin-functional MOSFETs. The hierarchical memory system for the NVPG architecture can be organized with nonvolatilized registers, register files, and caches using the NV-SRAM and NV-FF circuits. Power reduction for this memory system can be achieved only when the power is managed according to the break-even time (BET), which is an index of energy performance for NVPG. These NV-SRAM and NV-FF circuits can minimize the BET, leading to dramatic power reduction in CMOS logic circuits/systems.

Classification of Field-Effect Spin Transistors: Spin-FET and Spin-MOSFET

As noted above, scalability and integration ability of spin transistors are essential for integrated circuit applications. Therefore, spin transistors analogous to FETs are more attractive than vertical-type devices. Furthermore, FET-type devices can keep the power-delay product relatively small in comparison with vertical-type devices and also exhibit high current drivability and low off-current. These features enable FET-type devices to be applied to high-speed and low-power integrated circuits [1]. FET-type spin transistors can be classified into two types: the spin-FET [2] (and its modified versions [4–6]) and the spin-MOSFET [7] (with several variations [8–10]). These spin transistors consist of a modulation-doped FET (MODFET) structure or a MOSFET structure in combination with a ferromagnetic source and drain that act as a spin injector and a spin detector, respectively, as shown in Fig. 1. Note that the original spin-FET needs a quasi-one-dimensional channel structure to avoid dephasing of the spin-polarized electrons, whereas the spin-MOSFET does not require such a structure. The operating principles of these devices are quite different. In the spin-FET and its related devices, the switching operation is achieved by spin precession or dephasing of spin-polarized carriers injected into the channel. In the spin-MOSFET, on the other hand, the relative magnetization configuration of the source and drain is used to modify the output current.

Fig. 1

Device structures of (a) spin-FET and (b) spin-MOSFET that are comprised of a MODFET structure or a MOSFET structure in combination with a ferromagnetic source and drain. FM and HMF represent a ferromagnet and a half-metallic ferromagnet, respectively

The spin-FET employs the Rashba spin-orbit interaction [2, 11–13] to induce spin precession of spin-polarized electrons in the channel while the magnetization configuration of the source/drain is fixed. The spin moment of the electrons rotates while traveling owing to the effective magnetic field induced by the Rashba spin-orbit interaction. Since the Rashba spin-orbit interaction can be controlled by a gate bias, the spin orientation of the conducting electrons reaching the ferromagnetic drain can be set parallel or antiparallel to the drain magnetization. The output currents depend on these spin configurations, i.e., high and low currents are generated in the parallel and antiparallel configurations, respectively. Unique output characteristics can therefore be obtained, including output currents that oscillate with respect to the gate bias and the resulting negative transconductance. Particular materials with a strong spin-orbit interaction, such as InGaAs, InAs, and InSb, are required for the channel to induce a sufficient Rashba spin-orbit interaction. The scalability of the spin-FET depends on the channel material, and a material with a strong spin-orbit interaction could shorten the channel length. However, a strong spin-orbit interaction would result in a shorter spin relaxation length due to the D’yakonov-Perel’ mechanism. As a result, the channel length cannot be scaled below the submicron range [14]. A MODFET structure is used as the basic structure of the spin-FET in the original proposal. A MOSFET structure might also become applicable, depending on the future development of recently emerging III–V MOSFET technology [15]. The spin-FET has been considered to perform on/off switching by the spin precession of the conducting channel electrons. In this case, the antiparallel configuration (low current state) is used for the off-state, which leads to an imperfect off-state (nonzero off-current) and causes the problem of a low on/off ratio [16]. The off-state of the spin-FET should instead be achieved by controlling the gate bias for a sufficient cutoff condition in the same fashion as ordinary FETs. Then, the two different on-states (high and low currents) should be employed for circuit operations, which would enable attractive circuit applications.

The channel of the spin-MOSFET is composed of silicon, which has a very weak spin-orbit interaction. Thus, spin precession of the conducting electrons can be ruled out as long as an external magnetic field is not applied intentionally. Furthermore, ESR experimental results indicate that the spin relaxation time is sufficiently long (several ns) in Si channels with a relatively low doping density, even at room temperature [13]. However, recent spin injection experiments have shown a much shorter spin relaxation time [17]. The discrepancy between these results remains to be resolved.

The output current of the spin-MOSFET can be modulated by the magnetization configuration of the ferromagnetic source and drain, i.e., the high and low current drivabilities can be achieved in the parallel and antiparallel magnetization configurations, respectively. Moreover, in contrast to the spin-FET, the cutoff state of the spin-MOSFET is simply achieved by a gate bias condition in the same manner as ordinary MOSFETs. Since the spin-MOSFET requires no spin precession of spin-polarized electrons in the channel, it has the excellent scalability compatible with ordinary MOSFETs. In principle, the spin-MOSFET would have the same degree of performance as metal source/drain MOSFETs owing to the similarity in the device structure, although it is necessary to establish ferromagnetic source/drain technology adaptable to the present CMOS platform.

In order to evaluate the magnetization-configuration-dependent output characteristics of the spin-MOSFET, the magnetocurrent ratio is used as a performance index. The magnetocurrent ratio, \( \gamma_{\mathrm{MC}} \), is defined by

$$ {\gamma}_{\mathrm{MC}}=\frac{I_{\mathrm{D}}^{\mathrm{P}}-{I}_{\mathrm{D}}^{\mathrm{AP}}}{I_{\mathrm{D}}^{\mathrm{AP}}}, $$
(1)

in which \( I_{\mathrm{D}}^{\mathrm{P}} \) and \( I_{\mathrm{D}}^{\mathrm{AP}} \) represent the drain currents in the parallel and antiparallel magnetization configurations, respectively. A suitable magnitude of \( \gamma_{\mathrm{MC}} \) depends on the application. A high \( \gamma_{\mathrm{MC}} \) value is preferable for nonvolatile memory applications. On the other hand, a moderate \( \gamma_{\mathrm{MC}} \) value would be required for logic circuit applications, because of a trade-off between \( \gamma_{\mathrm{MC}} \) and propagation delay. The spin-MOSFET uses two on-currents with the high and low drivabilities controlled by its magnetization configuration. Since propagation delay in typical integrated logic circuits is determined by charging and discharging speed of load capacitance (in other words, by the current drivability of driver transistors), the low \( I_{\mathrm{D}}^{\mathrm{AP}} \) in the antiparallel magnetization configuration restricts the propagation delay. Nevertheless, this situation depends on circuit configuration, and a circuit in which the effect of \( I_{\mathrm{D}}^{\mathrm{AP}} \) on the circuit performance plays a minor role can be configured, as shown later. In any case, \( \gamma_{\mathrm{MC}} \) should be designed to ensure correct logic operations with sufficiently low error rate. Note that there is a very different way for realizing spin transistors. The pseudo-spin-MOSFET [1, 18] is a simple circuit for reproducing the functions of a spin-MOSFET using an ordinary MOSFET and an MTJ.
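As a quick numerical illustration of Eq. (1), the following minimal sketch evaluates the magnetocurrent ratio from a pair of drain currents. The function name and the current values are hypothetical and chosen only for illustration; they are not taken from any measured or simulated device.

```python
def magnetocurrent_ratio(i_parallel: float, i_antiparallel: float) -> float:
    """Eq. (1): (I_D^P - I_D^AP) / I_D^AP, returned as a dimensionless fraction."""
    if i_antiparallel <= 0.0:
        raise ValueError("antiparallel drain current must be positive")
    return (i_parallel - i_antiparallel) / i_antiparallel


# Hypothetical drain currents in amperes (illustrative values only).
gamma_mc = magnetocurrent_ratio(i_parallel=2.0e-4, i_antiparallel=1.0e-4)
print(f"gamma_MC = {gamma_mc:.0%}")
```

Because the ratio is normalized by the antiparallel current, a value of 100 % corresponds to a parallel-state current twice as large as the antiparallel-state current.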

Device Structures and Characteristics of Spin-MOSFETs

The basic structure of spin-MOSFETs [7–10] consists of an MOS capacitor and a ferromagnetic source and drain (S/D). Spin-MOSFETs can be classified by the structure of the ferromagnetic S/D. The ferromagnetic S/D needs to satisfy an appropriate contact resistance condition to exclude the conductivity mismatch problem and to generate spin-polarized currents in the channel. The S/D should also act not only as an electrical contact for the channel in the on-state but also as a blocking contact for leakage currents in the off-state. Note that when a body barrier using a highly resistive thin body or silicon-on-insulator (SOI) channel is employed, the blocking contact is not required.

In a manner analogous to ordinary MOSFETs, ferromagnetic pn junctions using a ferromagnetic semiconductor might be considered to be promising for the S/D of a spin-MOSFET. However, ferromagnetic pn junctions are not practicable at present. This is because n-type ferromagnetic semiconductors have not been realized yet. In addition, although p-type ferromagnetic semiconductors have been intensively investigated for many years, the Curie temperatures of ferromagnetic semiconductors are still lower than room temperature.

Ferromagnetic Schottky junctions using a ferromagnetic metal can be employed for the S/D of a spin-MOSFET (Fig. 2a), which is analogous to the recently developed metal S/D (or Schottky barrier S/D) MOSFETs [19–21]. The on/off operation of the spin-MOSFET is based on the gate-bias-induced modification of the Schottky barrier width at the source/channel junction. Control of the Schottky barrier height, that is, tuning of the contact resistance, is required to eliminate the conductivity mismatch problem. In practice, however, reducing the Schottky barrier height is the more important issue, since Fermi-level pinning at the junction interface causes a very high contact resistance. The reduction of the barrier height is necessary for high current drivability and high spin injection efficiency [8].

Fig. 2

Band diagrams of spin-MOSFETs using (a) FM contacts, (b) HMF Schottky contacts, and (c) HMF tunnel contacts for the source/drain

Half-metallic ferromagnets (HMFs) are also useful for the ferromagnetic S/D. The band structure of HMFs consists of metallic and insulating (semiconducting) spin bands, and thus HMFs theoretically have a spin polarization of 100 % at the Fermi energy. In practice, the half-metallicity of HMFs would be degraded due to their imperfect quality and the influence of the HMF/Si interface [22, 23]. However, spin polarizations higher than 90 % were experimentally achieved (see section “Half-Metallic Ferromagnet Technology for Source/Drain”), which is sufficiently applicable to nonvolatile logic circuits, as discussed later (section “Energy-Efficient Integrated Circuit Applications”). Note that these high spin polarizations of HMFs are inaccessible to ordinary ferromagnets. The metallic spin band of HMF contacts forms a Schottky junction with the Si channel, and the insulating spin band forms an energy barrier whose barrier height is related to the bandgap of the insulating spin band. Thus, the spin-dependent barrier structure appears at the source/drain junctions (Fig. 2b), which acts as a highly efficient spin filter. Since highly efficient spin injection and detection are important to realize spin-MOSFETs, HMFs are promising for the ferromagnetic S/D of spin-MOSFETs. The control of the Schottky barrier height would also be required even for HMFs, as discussed below.

Tunnel contacts using an ultrathin insulating barrier are useful for controlling the Schottky barrier height (Fig. 2c). HMF/Si and FM/Si junctions would exhibit Fermi-level pinning, in which the Fermi energy of the HMF or FM electrode is pinned at a deep level in the bandgap of Si, regardless of the work function of the electrode. For highly scaled-down spin-MOSFETs with a nanoscale channel length, the channel resistance becomes fairly low, so that a sufficiently low contact resistance is required to resolve the conductivity mismatch problem. In addition, lower effective Schottky barrier heights (ϕ in Fig. 2c) are favorable for high current drivability. Therefore, depinning and reduction of the Schottky barrier height are indispensable. A tunnel contact to an n+-Si region using a ferromagnetic electrode, that is, the common approach to forming an electrical contact, is the easiest way to reduce the effective Schottky barrier height. However, the highly doped n+-Si region would affect spin-flip scattering. A depinning tunnel contact with a low work-function electrode is also promising for eliminating the effective Schottky barrier [24–26]. Note that insulating ferromagnets [27] with an ordinary metal electrode are also applicable to a tunnel contact type of ferromagnetic S/D.

Figure 3a shows the calculated output characteristics of a spin-MOSFET with an HMF S/D [7]. The simulation was performed under the assumptions of ballistic transport and complete spin polarization (100 %) without any spin-flip scattering. The spin-MOSFET shows excellent transistor behavior in parallel magnetization (solid curves), and the drain current is strongly suppressed in antiparallel magnetization (broken curves). In this case, \( \gamma_{\mathrm{MC}} \) is huge. However, it deteriorates when the spin polarization is degraded. Figure 3b shows the calculated output characteristics of a spin-MOSFET using the ferromagnetic S/D with a spin polarization of 70 % [8]. The magnetization-configuration-dependent output characteristics are retained even in this case, although \( \gamma_{\mathrm{MC}} \) decreases with increasing drain bias \( V_{\mathrm{DS}} \). This behavior is suited to particular logic applications (nonvolatile bistable circuits), discussed later, in which \( \gamma_{\mathrm{MC}} \) under low \( V_{\mathrm{DS}} \) conditions plays an essential role. A moderate \( \gamma_{\mathrm{MC}} \) value of ~100 % is sufficient for the nonvolatile bistable circuit applications. In the case of ballistic conduction, this value would be achieved for a ferromagnetic S/D with a spin polarization of 70–80 % [8]. As noted previously, HMFs are important for realizing such high spin polarizations. \( \gamma_{\mathrm{MC}} \) also depends on the Schottky barrier height of the ferromagnetic S/D [8]. Lower Schottky barrier heights are required for achieving higher \( \gamma_{\mathrm{MC}} \) and also higher current drivability. \( \gamma_{\mathrm{MC}} \) may decrease when spin flips of the conducting spin-polarized electrons occur frequently in the channel. However, the remarkably long spin relaxation time in Si (with a moderate doping density) at room temperature [13] would likely allow sufficient \( \gamma_{\mathrm{MC}} \) values for logic applications. The subthreshold leakage current of spin-MOSFETs depends on the magnetization configuration of the ferromagnetic S/D. In particular, when the leakage current is caused by direct tunneling from the source to the drain [28], it would be effectively suppressed by the antiparallel magnetization configuration of the HMF S/D [29].

Fig. 3

Calculated output characteristics of the spin-MOSFET (a) with the HMF source/drain (Reproduced with permission from [7]) and (b) with the FM source/drain (Reproduced with permission from [8])

In spin-MOSFETs, current-induced magnetization switching (CIMS) [30–32] can be employed for changing the magnetization configuration of the ferromagnetic S/D. For instance, CIMS can be performed by building a current-perpendicular-to-plane spin-valve (CPP-SV) structure onto the ferromagnetic drain (or source), in which one of the electrodes in the CPP-SV structure is the ferromagnetic drain (or source) and the other electrode is the pinned layer (i.e., the ferromagnetic drain (or source) is the free layer). The CPP-SV structure is used for CIMS rather than for the magnetoresistive effect. Because the spin valve is a fully metallic layered structure, the parasitic resistance induced by this additional structure would be small and would thus hardly deteriorate the transistor characteristics. Recently, CIMS was observed in a CPP-SV device with half-metallic electrodes, and the possibility of CIMS with a low current density was demonstrated [33].

Half-Metallic Ferromagnet Technology for Source/Drain

The HMF S/D technology is an important stepping stone for developing spin-MOSFETs. Co-based full-Heusler alloys, such as Co₂FeSi (CFS), Co₂FeSi₁₋ₓAlₓ (CFSA), and Co₂MnSi (CMS), have been theoretically predicted and experimentally confirmed to exhibit very high spin polarizations even above room temperature. Recently, remarkably high tunnel magnetoresistance (TMR) ratios were observed in MTJs with full-Heusler alloy electrodes, as shown in Table 1 [34–41]. Although these investigations imply the half-metallicity of full-Heusler alloys, it should be noted that the results depend on the interface structure between the full-Heusler alloy electrode and the tunnel barrier and are affected by the spin filter effect of the tunnel barrier. The high TMR ratios shown in Table 1 suggest that these full-Heusler alloys are feasible for the S/D of spin-MOSFETs. As described previously, if the magnetoresistive effect in the ferromagnetic S/D were used to achieve the off-state of spin-MOSFETs, these spin polarizations would not be enough [16]. However, the off-state of spin-MOSFETs is achieved by the gate bias and not by the magnetization configuration. The magnetoresistive effect is employed to generate high and low on-currents controlled by the magnetization configuration. Therefore, full-Heusler alloys are a feasible candidate for the HMF S/D of spin-MOSFETs.

Table 1 Magnetic tunnel junction (MTJ) structures using full-Heusler alloy electrodes and their tunneling magnetoresistance (TMR) and spin polarization (Ps)

The full-Heusler alloy electrodes of these MTJs are usually formed by sputtering at room temperature followed by thermal annealing. In order to establish an HMF S/D technology for spin-MOSFETs adapted to the Si CMOS platform, HMFs should preferably be formed by silicidation induced by rapid thermal annealing (RTA), which is a widely used technique in the present CMOS fabrication process. Full-Heusler alloys containing Si, such as CFS, CFSA, and CMS, can be considered a kind of silicide, and thus they could possibly be formed by RTA-induced silicidation. Takamura et al. proposed an RTA technique for full-Heusler alloy thin films using a silicon-on-insulator (SOI) substrate [42, 43], in which CFS thin films were formed by RTA-induced silicidation of an Fe/Co/SOI multilayer, as shown in Fig. 4a. Since diffusion of the transition metal atoms is blocked by the buried oxide (BOX) layer of the SOI substrate, the stoichiometric composition can be achieved by adjusting the film thicknesses of the transition metal layers and the SOI layer. Very high-quality (110)-oriented textured CFS films with the L2₁-ordered crystal structure (discussed later) were formed by this method. Using the RTA-induced silicidation of an Fe/Co/amorphous Si (a-Si) multilayer, a CFS film (whose quality is comparable to that of the CFS films formed by the RTA-induced silicidation using an SOI substrate) can be directly formed on an amorphous insulator film without an SOI substrate, as shown in Fig. 4b [44, 45]. Furthermore, the formation of L2₁-phase CFSA films was investigated by the RTA-induced silicidation of a Co-Fe-Al alloy/SOI stack structure [46].
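The thickness adjustment mentioned above can be illustrated with a rough estimate. The sketch below computes the Co and Fe layer thicknesses that would supply a 2:1:1 Co:Fe:Si atomic ratio for a given Si layer thickness. It rests on assumptions that are not from the cited work: the layers are taken to have bulk room-temperature densities, the whole Si layer is assumed to be consumed, and the 40 nm Si thickness is a hypothetical value. Real as-deposited films can deviate from bulk density, so this is only a starting point for calibration.

```python
# Bulk densities (g/cm^3) and molar masses (g/mol).
# Assumption: thin films are treated as having bulk density.
ELEMENTS = {
    "Co": {"density": 8.90, "molar_mass": 58.933},
    "Fe": {"density": 7.874, "molar_mass": 55.845},
    "Si": {"density": 2.329, "molar_mass": 28.086},
}


def layer_thicknesses(si_thickness_nm, stoichiometry):
    """Return layer thicknesses (nm) that give the requested atomic ratios
    for a fixed Si layer thickness, assuming full intermixing of all layers."""

    def atoms_per_nm(element):
        # Relative areal atom count contributed by 1 nm of film.
        data = ELEMENTS[element]
        return data["density"] / data["molar_mass"]

    si_atoms = si_thickness_nm * atoms_per_nm("Si")
    thicknesses = {}
    for element, count in stoichiometry.items():
        target_atoms = si_atoms * count / stoichiometry["Si"]
        thicknesses[element] = target_atoms / atoms_per_nm(element)
    return thicknesses


# Hypothetical 40 nm Si layer and the Co2FeSi composition (Co:Fe:Si = 2:1:1).
for element, thickness in layer_thicknesses(40.0, {"Co": 2, "Fe": 1, "Si": 1}).items():
    print(f"{element}: {thickness:.1f} nm")
```

Under these assumptions the Co layer comes out slightly thicker than the Si layer and the Fe layer roughly 0.6 times as thick.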

Fig. 4

Formation process of Co₂FeSi (CFS) using RTA-induced silicidation of (a) Fe/Co/Si-on-insulator (SOI) and (b) Fe/Co/amorphous Si multilayers

The half-metallicity of full-Heusler alloys depends on the degree of order of the atomic arrangement in their crystallographic structures. The most ordered structure is L2₁, and it is well recognized that disordered structures such as the A2, B2, and D0₃ structures also exist. These possible crystal structures are shown in Fig. 5. The hierarchy of order is L2₁-B2-A2, and the D0₃ disorder is considered to be a particular case of the A2 disorder. The L2₁ structure of full-Heusler X₂YZ alloys consists of eight stacked body-centered cubic (bcc) lattices, in which the corner sites of each bcc lattice are occupied by X atoms and the body-centered sites are occupied regularly by Y and Z atoms. In the B2 structure, the YZ sublattice is disordered, i.e., the exchange between Y and Z atoms in the L2₁ lattice results in the B2 structure. The A2 structure has no ordered sublattice, i.e., both the X and YZ sublattices are disordered by exchanges among X, Y, and Z atoms. The D0₃ structure is induced by the exchange between X and Y atoms in the L2₁ structure, e.g., in the case of CFS, the exchange between Co and Fe causes the D0₃ structure.

Fig. 5

L2₁, B2, A2, and D0₃ structures of full-Heusler alloys

In general, the degree of L2₁ order is highly important for the half-metallicity of full-Heusler alloys. The disordered structures induce gap states in the minority gap, resulting in degradation of the half-metallicity. To evaluate the degree of atomic arrangement order in full-Heusler alloys, the indices \( S_{\mathrm{L2_1}} \) and \( S_{\mathrm{B2}} \) are commonly used [43, 47], which represent the degrees of order of the L2₁ and B2 structures, respectively, and can be determined from X-ray diffraction analysis. It is worth noting that since the Fermi energy of CFSA can be adjusted by the Al composition so as to lie near the center of the minority gap [48–50], CFSA can exhibit a half-metallic band structure even in the B2 structure. This electronic property and structural robustness make CFSA attractive for device applications. The A2 and its related D0₃ disorder structures significantly degrade the half-metallicity of full-Heusler alloys [48, 51]. Furthermore, it is difficult to distinguish the D0₃ disorder structure from the L2₁ structure by the commonly used X-ray diffraction analysis technique [52]. Recently, a new X-ray diffraction analysis technique that can distinguish these structures was developed [53].

\( S_{\mathrm{L2_1}} \) and \( S_{\mathrm{B2}} \) for the CFS films formed by the RTA-induced silicidation increased with increasing RTA temperature, as shown in Fig. 6a, and the films exhibited very high \( S_{\mathrm{L2_1}} \) and \( S_{\mathrm{B2}} \) values [43] that are comparable to those of a CFS bulk alloy annealed at 1000 °C for 20 days [54]. The CFS films were also found to contain little D0₃ disorder, as shown in Fig. 6b [53], i.e., the RTA-formed CFS films have a nearly perfect L2₁ structure.

Fig. 6

(a) Degree of L2₁ order for RTA-formed CFS thin films as a function of degree of B2 order. (b) Site occupancies as a function of RTA temperature for the RTA-formed CFS thin films

Prototype spin-MOSFETs with an SOI channel were fabricated using RTA-formed CFS contacts for the S/D [55] and using ferromagnetic Fe-silicide contacts for the S/D [56]. These devices employed CFS/Si or Fe-silicide/Si Schottky junctions for the S/D. The on-currents of these devices were severely degraded, which would be caused by the Fermi-level pinning phenomenon. As a general phenomenon, ferromagnetic silicides including full-Heusler alloys could induce dense gap states at the junction interface with Si, resulting in Fermi-level pinning at a deep level in the bandgap of Si. Therefore, barrier height control is required when ferromagnetic Schottky junctions are used for the S/D of spin-MOSFETs, as described previously. Note that a moderate contact resistance (which depends on the channel resistance) is required to exclude the conductivity mismatch problem [57–59]; however, a huge contact resistance cancels out the magnetoresistance effect in spin-MOSFETs [59].

The dopant segregation technique [60] was applied to reduce the Schottky barrier height of the CFS/Si junctions, i.e., ion implantation of As atoms with a relatively low implantation energy was followed by RTA-induced silicidation of CFS. The As atoms were expected to segregate at the CFS/Si interface, resulting in a reduction of the barrier height. The current drivability of the As-segregated device was confirmed to be much higher than that of a reference device without As implantation [55]. The dopant segregation technique was also effective in achieving a high on/off ratio and in reducing the subthreshold swing [55]. The most important issue for this technology would be the formation of an abrupt interface. Reducing the SOI thickness could possibly be effective in forming such abrupt junctions.

The RTA-induced silicidation technique for full-Heusler alloy thin films can also be applied to the formation of a tunnel contact with an L2₁-ordered full-Heusler alloy electrode [44, 45]. A high-quality CFS/\( \mathrm{SiO}_x\mathrm{N}_y \)/n-Si(100) tunnel junction was fabricated by the RTA-induced silicidation technique using the \( \mathrm{SiO}_x\mathrm{N}_y \) barrier layer formed by radical oxynitridation of an epitaxial intrinsic Si layer grown on the n-Si(100) substrate, in which the CFS electrode was formed by the silicidation of an Fe/Co/amorphous Si multilayer deposited on the barrier layer. Since diffusion of Co and Fe atoms into the Si substrate during the RTA was blocked by the \( \mathrm{SiO}_x\mathrm{N}_y \) layer, the interface between the CFS and \( \mathrm{SiO}_x\mathrm{N}_y \) layers was atomically flat without crystallinity degradation of the CFS film, as shown in Fig. 7, and the crystallographic features of the CFS layer were comparable to those of a high-quality CFS film formed by the RTA-induced silicidation of an Fe/Co/SOI multilayer.

Fig. 7

Cross-sectional high-resolution TEM image of an RTA-formed CFS/\( \mathrm{SiO}_x\mathrm{N}_y \)/Si junction (With kind permission from Springer Science+Business Media [45])

Full-Heusler-alloy/insulator/Si junctions are promising for the S/D tunnel contacts of spin-MOSFETs. Since a depinning effect would be expected in such tunnel junctions owing to the insulator film deposited on the Si surface [24, 25], the barrier height can be controlled by the work function of the full-Heusler alloy electrode. Recently, it was shown that the work function of CFSA films can be tuned, by adjusting the Al composition, to a value low enough for spin-MOSFETs [61]. Other depinning contact structures using low work-function interlayer materials, such as CoFe/Dy/AlOₓ/n-Si and CoFe/Mg/AlOₓ/i-Si, were also proposed [26, 27].

The RTA-induced formation technique is also useful for the epitaxial germanidation of Ge-containing full-Heusler alloy (e.g., Co₂FeGe) thin films [62], which would be applicable to Ge-channel spin-MOSFETs. Other types of ferromagnetic source/drain (spin injector/spin detector) structures using ordinary ferromagnetic materials were also investigated [63–66].

Spin Transport in Silicon Channel: Current Topics

Understanding and controlling spin injection/transport/detection for Si MOS inversion channels are critical challenges for realizing spin-MOSFETs. Various types of techniques have been applied to investigate these phenomena for Si channels [67–76], as shown in Table 2. In particular, the Hanle effect induced by the spin precession of traveling spin-polarized electrons is considered to be the most powerful tool for investigations of the spin transport phenomena in semiconductor channels [67, 68, 77]. The spin dynamics (e.g., spin lifetime) can quantitatively be analyzed from the magnetic field interval (which is referred to as \( B_{\pi} \)) of the oscillation peaks of Hanle-effect signals [67, 68].

Table 2 Various techniques for investigating spin injection phenomena for Si channels

Huang et al. [68] successfully observed oscillatory signals of coherently traveling spin-polarized electrons in bulk Si, using a spin device that they developed based on a spin-valve transistor. The bias dependence of the Hanle-effect signals that they observed, in which the magnetic field intervals between the oscillatory peaks increased with increasing bias voltage, was overwhelming evidence of spin transport. However, this technique is difficult to adapt to FET-type devices, since a specific vertical-type device using hot electron transport is required for these measurements.

The spin accumulation technique using three-terminal planar devices [72, 74] has widely been employed to investigate the Hanle effect. However, with this technique it can be difficult to distinguish the observed signals from spurious signals, such as spin signals from electrons trapped at the ferromagnet/semiconductor interface [78]. Furthermore, spin transport phenomena in the semiconductor channel cannot be evaluated with this technique. The nonlocal technique using four-terminal planar devices [69–71] can evaluate the Hanle effect for pure spin currents in the semiconductor channel. However, its signal intensities are weak, and multiple oscillations are difficult to obtain owing to the diffusive spin transport, which causes a strong dephasing effect.

A new Hanle-effect device based on a MOSFET type of spin transistor was proposed to reveal the dynamics of spin-polarized electron transport in the Si MOS inversion channel [79, 80]. The proposed device can detect spin signals with high sensitivity and can distinguish spin transport signals from other spurious signals. Spin transport behaviors induced by the Hanle effect in the device were theoretically analyzed and were found to be well correlated with the universality of electron mobility in the MOS inversion channel. The Hanle-effect spin transistor can be expected to elucidate the true nature of spin transport in the Si MOS inversion channel.

Figure 8 shows the proposed Hanle-effect spin transistor, which consists of a MOS structure, two ferromagnetic tunnel contacts (FM1 and FM2), and two nonmagnetic ohmic contacts (NM1 and NM2). Spin-polarized electrons are injected from FM1 into the source region by applying a bias \( V_{\mathrm{inj}} \). Then, a part of the spin-polarized electrons is extracted into the channel region by a bias \( V_{\mathrm{acc}} \). During transport in the channel region, Larmor precession of the traveling spin-polarized electrons is induced by a magnetic field B applied perpendicular to the channel. When the traveling spin-polarized electrons reach the drain-side channel edge, their spin direction has rotated by an angle determined by the velocity v of the electrons, the channel length L, and B. By applying a bias \( V_{\mathrm{det}} \), a part of the spin-polarized electrons is selectively extracted to FM2 owing to the spin-dependent tunneling effect, and the resulting current \( I_{\mathrm{det}} \) depends sensitively on the spin polarization and direction of the transported electrons. Therefore, \( I_{\mathrm{det}} \) oscillates, reflecting the spin precession of the traveling electrons. This measurement technique is a means of observing spin signals with high efficiency even when ordinary ferromagnets (whose spin polarization is not so high) are used for FM1 and FM2. Note that with ideal half-metallic ferromagnets having a high spin polarization, simple spin-MOSFET structures [1, 7–10] could be directly applied to the detection and manipulation of spin-polarized currents. Also note that the carrier density in the MOS inversion channel can be controlled not only by the gate bias but also by the body (substrate) bias (not shown in the figure). In the following analysis, MOSFET operation in the linear region is assumed for the Hanle-effect spin transistor. The effective spin polarization \( P_{\mathrm{eff}} \) due to the Hanle effect at the drain-side channel edge under the dc condition is given by

Fig. 8

Device structure of Hanle-effect spin transistor (Reproduced with permission from [80])

$$ {P}_{\mathrm{eff}}(B)={\displaystyle {\int}_0^{\infty}\frac{L}{2t\sqrt{\pi Dt}}{e}^{-\frac{{\left(L-vt\right)}^2}{4Dt}}{e}^{-\frac{t}{\tau_{\mathrm{sf}}}} \cos \left(\frac{g{\mu}_BB}{\hslash }t\right)\mathrm{d}t}, $$
(2)

where g is the g-factor (= 2) of Si, μ B the Bohr magneton, ħ the reduced Planck constant, τ sf the spin lifetime, and D the diffusion constant.
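As a rough illustration of how Eq. 2 behaves, the integral can be evaluated numerically. The sketch below uses a plain trapezoidal rule and only the Python standard library. The function names, the integration cutoff of 20 τ sf, and all device parameters in the example (channel length, mobility, and a diffusion constant taken from the Einstein relation at room temperature) are assumptions made for illustration and are not taken from [80]. At B = 0 and v > 0 the integrand without the relaxation factor is the first-passage-time density of drift-diffusion, so the integral has a closed form that serves as a sanity check.

```python
import math

MU_B = 9.2740100783e-24   # Bohr magneton (J/T)
HBAR = 1.054571817e-34    # reduced Planck constant (J s)
G_SI = 2.0                # g-factor of Si, as stated in the text


def p_eff(b_field, length, velocity, diffusion, tau_sf, n_steps=40000):
    """Trapezoidal estimate of Eq. 2 for one magnetic field value (T)."""
    omega = G_SI * MU_B * b_field / HBAR      # Larmor angular frequency (rad/s)
    t_max = 20.0 * tau_sf                     # assumed cutoff: exp(-20) tail is negligible
    dt = t_max / n_steps
    total = 0.0
    for i in range(1, n_steps + 1):           # the integrand tends to 0 as t -> 0
        t = i * dt
        transit = length / (2.0 * t * math.sqrt(math.pi * diffusion * t))
        transit *= math.exp(-(length - velocity * t) ** 2 / (4.0 * diffusion * t))
        value = transit * math.exp(-t / tau_sf) * math.cos(omega * t)
        total += 0.5 * value if i == n_steps else value
    return total * dt


def p_eff_zero_field(length, velocity, diffusion, tau_sf):
    """Closed form of Eq. 2 at B = 0 for velocity > 0 (sanity check only)."""
    root = math.sqrt(1.0 + 4.0 * diffusion / (velocity ** 2 * tau_sf))
    return math.exp(length * velocity / (2.0 * diffusion) * (1.0 - root))


if __name__ == "__main__":
    # Hypothetical device values, chosen only for illustration.
    L_CH = 10e-6                      # channel length (m)
    MU_EFF = 0.05                     # effective mobility (m^2/(V s))
    V_ACC = 3.0                       # acceleration bias (V), as used in the text
    TAU_SF = 8e-9                     # spin lifetime (s), as used in the text
    velocity = MU_EFF * V_ACC / L_CH  # uniform-field drift velocity (assumption)
    diffusion = MU_EFF * 0.02585      # Einstein relation at 300 K (assumption)
    for b_mt in range(0, 61, 10):
        print(b_mt, "mT", p_eff(b_mt * 1e-3, L_CH, velocity, diffusion, TAU_SF))
```

Scanning B in finer steps and locating the peaks of the printed values gives the oscillation interval discussed in the following paragraphs.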

A magnetic field B π required to rotate the spin direction of the transported electrons by π is given by the interval between the oscillating peaks of P eff. However, its accuracy depends on whether V acc is sufficient. Thin and solid curves in Fig. 9 show P eff as a function of B for V acc = 0 and 3 V, respectively. For the nonlocal condition (thin curve), the peak intensity at B = 0 is strongly weakened and the multiple oscillations are unclear owing to the strong dephasing effect. Although the second peaks are severely reduced, they could be detected by highly sensitive measurement techniques [69–71]. However, these peaks cannot represent B π owing to the effect of diffusive transport with considerable spin relaxation. In contrast, when a sufficient V acc is applied, P eff dramatically increases and the multiple oscillations appear clearly, as shown by the solid curve in Fig. 9. In this situation, the intervals between the peaks approximately represent B π. This is because the dephasing effect is weakened by applying V acc.

Fig. 9

Effective spin polarization as a function of magnetic field for V acc = 0 V (thin curve) and 2 V (solid curve). The inset shows the magnified curve for V acc = 0 V

B π can be estimated from the first magnetic field interval B π osc of the oscillatory peaks of P eff. B π osc can be expressed by \( {B}_{\pi}^{\mathrm{osc}}={\beta}_{\pi}\frac{V_{\mathrm{acc}}}{L^2}+\varDelta B \), where β π is given by πħμ eff /gμ B (in which μ eff represents the effective electron mobility in the linear operation mode) and ΔB is the correction factor [80]. Here, β π osc is defined by the following equation: β π osc = B π osc L²/V acc. Note that when ΔB can be neglected, β π osc corresponds to β π.

β π (= πħμ eff /gμ B) is proportional to μ eff, and μ eff in MOS inversion channels is governed only by the vertical effective electric field E eff induced by a gate bias V G, which is known as the universality of μ eff [81]. Therefore, β π can also show the same universality as μ eff. Figure 10a shows β π osc as a function of V acc for τ sf = 0.1, 0.5, 1, 5, and 10 ns. The top horizontal axis shows t e (which is the particle-picture representation of the transit time). When t e is sufficiently shorter than τ sf, β π osc corresponds to β π. In contrast, when t e is longer than τ sf, β π osc deviates from β π. To satisfy β π osc = β π, a higher V acc is required for a shorter τ sf. When a sufficiently high V acc is applied so that the relation t e < τ sf is satisfied, β π osc is identical to β π regardless of τ sf.
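The relation between β π, B π, and the transit time can be checked with a few lines of arithmetic. The sketch below assumes a uniform drift field V acc/L in the linear region, so that the particle-picture transit time is t e = L²/(μ eff V acc); this expression, the function names, and the example values of μ eff and L are assumptions for illustration. With ΔB neglected, the Larmor phase accumulated during one transit at B = B π is exactly π.

```python
import math

MU_B = 9.2740100783e-24   # Bohr magneton (J/T)
HBAR = 1.054571817e-34    # reduced Planck constant (J s)
G_SI = 2.0                # g-factor of Si, as stated in the text


def beta_pi(mu_eff):
    """beta_pi = pi*hbar*mu_eff/(g*mu_B), in T m^2/V for mu_eff in m^2/(V s)."""
    return math.pi * HBAR * mu_eff / (G_SI * MU_B)


def b_pi(mu_eff, v_acc, length, delta_b=0.0):
    """Field (T) for a pi rotation per transit: beta_pi * V_acc / L^2 + delta_B."""
    return beta_pi(mu_eff) * v_acc / length ** 2 + delta_b


def transit_time(mu_eff, v_acc, length):
    """Particle-picture transit time (s), assuming a uniform drift field V_acc/L."""
    return length ** 2 / (mu_eff * v_acc)


if __name__ == "__main__":
    mu_eff, v_acc, length = 0.05, 3.0, 10e-6   # hypothetical values
    print("B_pi (T):", b_pi(mu_eff, v_acc, length))
    print("t_e (s):", transit_time(mu_eff, v_acc, length))
```

Comparing the returned transit time with an assumed τ sf indicates whether a given V acc lies in the regime where β π osc is expected to follow β π.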

Fig. 10

(a) β π osc as a function of V acc for τ sf = 0.1, 0.5, 1, 5, 10 ns. (b) β π osc as a function of E eff for V acc = 3 V and τ sf = 8 ns. The notations are described in the text (Reproduced with permission from [80])

The solid curve in Fig. 10b shows β π osc as a function of E eff. β π is also plotted as a thin curve in the figure. In this figure, a constant τ sf = 8 ns [13] is assumed for the entire E eff region, and V acc is set to 3 V so that t e (= 0.56 ns) becomes much shorter than τ sf. In this condition, β π osc corresponds to β π, and it obeys the universal curve of μ eff [81]. This behavior would provide crucial evidence of spin transport in the MOS inversion channel. In contrast, when V acc is reduced, β π osc starts to deviate from the universal curve of β π. This feature is useful for experimentally determining τ sf. Since the dominant scattering mechanisms (such as impurity, phonon, and roughness scattering) for μ eff can be changed by E eff [81], the relation between the scattering mechanisms and τ sf can be investigated using the universality curve of β π osc.

Pseudo-Spin-MOSFET Technology

The pseudo-spin-MOSFET (PS-MOSFET) is a circuit for reproducing the functions of spin transistors using an ordinary MOSFET and an MTJ [1, 17, 82, 83]. Figure 11a shows the circuit configuration of the PS-MOSFET. The MTJ connected to the source of the MOSFET feeds back its voltage drop to the gate, and the degree of negative feedback depends on the resistance state of the MTJ. Therefore, the effective input bias V GS0 and also the substrate (body-source) bias V BS0 can be varied by the magnetization configuration of the MTJ even under a constant gate bias (V G) condition. As a result, the PS-MOSFET can possess high and low current drivabilities that are controlled by the magnetization configuration of the MTJ, as shown in Fig. 11b. Although the series resistance change of the MTJ also affects the output current of the PS-MOSFET, the negative feedback effect contributes more strongly to the current modulation. In addition, magnetic-field-free CIMS for the MTJ can be achieved in a high V G condition, as shown in Fig. 11c. (Note that V G required for CIMS can be designed by the resistance of the MTJ and the size of the MOSFET.) Thus, the PS-MOSFET can reproduce the spin-transistor behavior and would be the most promising spin transistor based on the present magnetoresistive random access memory (MRAM) technology. A desired γ MC can be designed by adjusting the TMR and resistance of the MTJ and the size of the MOSFET [83]. However, it should be noted that although γ MC increases with increasing MTJ resistance, the higher resistance degrades the current drivability of the PS-MOSFET. Also note that when the MTJ is connected to the drain of the MOSFET, the difference in the drain currents between the parallel and antiparallel magnetization configurations decreases with V DS, as shown in Fig. 11d. This is due to the absence of the negative feedback effect. On the other hand, this configuration has another feature, i.e., the drain currents are higher than those in the PS-MOSFET configuration. Although both configurations can act as spin transistors, the preferable configuration would depend on the application.
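The negative feedback in the source-connected configuration can be illustrated with a toy calculation. The sketch below is not the circuit model of [85]: it assumes a long-channel square-law MOSFET without body effect, a bias-independent MTJ resistance, and a magnetocurrent ratio defined as (I P − I AP)/I AP. The transconductance parameter, threshold voltage, and MTJ values are hypothetical. The operating point is found by bisection on I = f(V G − I R, V D − I R), where I R is the voltage drop fed back by the MTJ.

```python
def mosfet_current(v_gs, v_ds, k=1e-3, v_th=0.4):
    """Long-channel square-law drain current (A); body effect is ignored."""
    v_ov = v_gs - v_th
    if v_ov <= 0.0 or v_ds <= 0.0:
        return 0.0
    if v_ds < v_ov:                               # linear region
        return k * (v_ov * v_ds - 0.5 * v_ds ** 2)
    return 0.5 * k * v_ov ** 2                    # saturation region


def ps_mosfet_current(v_g, v_d, r_mtj, iterations=60):
    """Drain current (A) with an MTJ of resistance r_mtj > 0 in series with the source."""
    low, high = 0.0, v_d / r_mtj                  # the current cannot exceed v_d / r_mtj
    for _ in range(iterations):                   # bisection on I = f(V_G - I R, V_D - I R)
        mid = 0.5 * (low + high)
        drop = mid * r_mtj                        # voltage fed back by the MTJ
        if mosfet_current(v_g - drop, v_d - drop) > mid:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def magnetocurrent_ratio(v_g, v_d, r_p, tmr):
    """(I_P - I_AP) / I_AP for parallel resistance r_p and TMR ratio tmr (assumed definition)."""
    i_p = ps_mosfet_current(v_g, v_d, r_p)
    i_ap = ps_mosfet_current(v_g, v_d, r_p * (1.0 + tmr))
    return (i_p - i_ap) / i_ap


if __name__ == "__main__":
    # Hypothetical operating point: V_G = V_D = 1.2 V, R_P = 2 kOhm, TMR = 100 %.
    print(magnetocurrent_ratio(1.2, 1.2, 2000.0, 1.0))
```

Sweeping r_p in this toy model shows the trade-off described in the text: a larger MTJ resistance raises the ratio but lowers both currents.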

Fig. 11

(a) Circuit configuration of pseudo-spin-MOSFET (PS-MOSFET). (b) Calculated output characteristics and (c) CIMS behavior of the PS-MOSFET, in which the free layer of the MTJ is connected to the source terminal of the MOSFET. (d) Calculated output characteristics of another type of PS-MOSFET in which the MTJ is connected to the drain side of the MOSFET. Dashed curves in (c) and (d) show the output currents in the case that the CIMS does not occur

A prototype PS-MOSFET was fabricated using an MTJ with a full-Heusler alloy (Co2FeAl; CFA) electrode and an MgO tunnel barrier [82]. The PS-MOSFET showed high and low current drivabilities that were controlled by the magnetization configurations of the MTJ, i.e., the spin-transistor behavior of the fabricated device was confirmed. γ MC of the PS-MOSFET increased with decreasing V D and also increased with increasing V G. The maximum γ MC value was as high as 45 %. The CIMS behavior in a PS-MOSFET and monolithic integration of PS-MOSFETs using a vendor-made CMOS chip were also investigated [83, 84].

As described above, spin transistors can be virtually realized by the PS-MOSFET architecture based on the present MRAM technology. Design of PS-MOSFETs can be easily achieved using a general circuit simulator such as SPICE with an appropriate circuit model of MTJs [85]. Therefore, spin transistors can be employed in logic circuits sooner rather than later, when the MRAM technology is added on the CMOS logic platform.

Energy-Efficient Integrated Circuit Applications

Nonvolatile Bistable Circuits

The most attractive applications of spin-functional MOSFETs (a collective term for spin-MOSFETs and PS-MOSFETs) would be in highly functional low-power logic circuits called “nonvolatile logic,” although they are also applicable to nonvolatile memory. Nonvolatile logic is a generic name for logic gates, circuits, and systems that maintain their state despite a power shutdown/failure by using nonvolatile memory or its elements. Various types of nonvolatile logic have been proposed [1]. Here, the nonvolatile power-gating (NVPG) architecture [1] using spin-functional MOSFETs is reviewed, which would serve as a basis for spintronic nonvolatile logic circuits/systems.

Power dissipation has been one of the most important concerns for highly integrated CMOS logic systems, such as microprocessors and system-on-chip devices (SoCs) [86, 87], since it constrains the performance and the degree of device integration. In general, power dissipation in CMOS logic circuits can be divided into two components, i.e., dynamic and static power. The former is caused by on-currents passing through the CMOS logic gates due to logic operations and the latter by leakage currents in the CMOS gates even during standby mode, in which no logic operations are executed. The magnitude of the leakage current for each individual transistor is exponentially small in comparison with the on-current. However, the static-power dissipation gives rise to severe problems for CMOS logic circuits owing to their very large-scale integration and device size scaling. Recently proposed power-gating (PG) architectures based on multithreshold voltage CMOS (MTCMOS) technology [86–91] are very effective at reducing the static-power dissipation in CMOS logic circuits. In this type of architecture, logic circuits/systems on a chip are partitioned into several circuitry domains, called power domains, that are electrically separated from power-supply lines and/or ground lines by sleep transistors (power switches). These domains can be shut down during standby mode, and the static power is thereby considerably reduced. A key technology for realizing PG systems is the backup of logic information in the power domains. Flip-flop (FF) and static random access memory (SRAM) used in various registers and caches act as bases for the hierarchical memory system of microprocessors and SoCs. However, they cannot be shut down without losing their logic data. Therefore, several architectures were developed for realizing PG systems, e.g., a regulated power supply for SRAM, data transfers from FFs to backup devices through purpose-built interconnects or a bus line, and balloon/retention FF technologies with dual-power rails. Although PG systems have already been developed with these techniques, disadvantages exist depending on the architecture used to achieve PG, e.g., limited static-power reduction, addition of purpose-built interconnects for data transfer, extra time and power for data transfer using a bus line, excess area occupation for backup devices, and greater layout/control complexity due to the dual-power rails. These disadvantages would be solved by introducing nonvolatile SRAM (NV-SRAM) and nonvolatile FF (NV-FF) into PG systems [18, 92–98].

NV-SRAM and NV-FF cells can be configured by connecting resistive or capacitive nonvolatile memory elements to the bistable circuit (inverter loop) of standard SRAM and FF cells, respectively [1]. When MTJs are used as the nonvolatile memory elements for the NV-SRAM/NV-FF cells, attractive features including low-voltage operation and high write/erase cycle endurance can be expected. Nevertheless, MTJs connected directly to the NV-SRAM/NV-FF cell will deteriorate its circuit performance, e.g., by degrading the operating speed, variability tolerance, and static noise margin (SNM) and also by increasing the power dissipation during the normal SRAM/FF operation mode. In order to overcome these problems, a new approach using spin-functional MOSFETs has been proposed [18, 92–98]. Since the spin-functional MOSFETs can electrically separate the bistable circuit from the nonvolatile memory elements, they have little or no deleterious effect on the bistable circuit operation, as shown below.

Figure 12a shows the circuit configuration of an NV-SRAM cell, in which two PS-MOSFETs are connected to the storage nodes of a standard SRAM cell [18, 92–94]. Figure 13 shows simulated waveforms for the cell operations. Before the cell is powered off, the store operation is executed, i.e., data on the storage nodes are stored into the MTJs in the PS-MOSFETs by CIMS. This can be performed by applying a pulse signal to the CTRL line after the PS-MOSFETs are turned on. Then, the cell can be shut down without losing its logic information. In the case of a restart (wake-up from the shutdown state), the stored data in the MTJs can be restored to the storage nodes by the restore operation, in which only the power supply of the inverter loop is pulled up after the PS-MOSFETs are turned on. Since the PS-MOSFETs are activated only during the store and restore operation modes, currents passing through the MTJs (which are harmful for the normal SRAM operation) can be shut off during the normal SRAM operation mode.
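The store, shutdown, and restore sequence can be summarized as a purely logic-level sketch. The class below is hypothetical and ignores all analog behavior (CTRL pulse shape, CIMS thresholds, supply ramp). In particular, the polarity chosen here, in which the MTJ on the high node becomes antiparallel and the restore operation latches the node with the higher-resistance MTJ to the high level, is an assumption for illustration; the actual polarity depends on the MTJ orientation and the power-gating architecture.

```python
class NVSRAMCell:
    """Behavioral (logic-level) sketch of the store/shutdown/restore sequence."""

    def __init__(self):
        self.powered = True
        self.q = 0                 # storage node Q; QB is its complement
        self.mtj_q = "P"           # MTJ on the Q side: "P" or "AP"
        self.mtj_qb = "AP"         # MTJ on the QB side

    def write(self, bit):
        """Normal SRAM write; the PS-MOSFETs are off, so the MTJs are untouched."""
        if not self.powered:
            raise RuntimeError("cell is shut down")
        self.q = bit

    def store(self):
        """PS-MOSFETs on plus a CTRL pulse: CIMS writes complementary MTJ states."""
        if not self.powered:
            raise RuntimeError("cell is shut down")
        # Assumed polarity: the MTJ on the high node ends up antiparallel.
        self.mtj_q, self.mtj_qb = ("AP", "P") if self.q else ("P", "AP")

    def shutdown(self):
        """Power off: the volatile node data are lost, the MTJ states remain."""
        self.powered = False
        self.q = None

    def restore(self):
        """PS-MOSFETs on, then supply pulled up: the inverter loop latches the
        node whose MTJ has the higher resistance to the high level (assumed)."""
        self.powered = True
        self.q = 1 if self.mtj_q == "AP" else 0


if __name__ == "__main__":
    cell = NVSRAMCell()
    cell.write(1)
    cell.store()
    cell.shutdown()
    cell.restore()
    print(cell.q)
```

Calling shutdown() without a preceding store() models a shutdown in which the MTJs keep whatever was stored last.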

Fig. 12

Circuit configurations of (a) NV-SRAM and (b) NV-FF cells using PS-MOSFETs. The direction of the MTJs depends on the power-gating architecture, i.e., the pinned layer of the MTJs is connected to the source terminal for the virtual power-supply (VDD) architecture, and the free layer of the MTJs is connected to the source terminal for the virtual ground architecture

Fig. 13

Simulated waveforms of NV-SRAM cell during store, shutdown, and restore modes

Nonvolatile latch (NV-LAT) and NV-FF circuits can also be configured using PS-MOSFETs in the same manner as the NV-SRAM cell [95–97]. Figure 12b shows a positive edge-triggered master-slave nonvolatile delay-FF (NV-DFF) that consists of a conventional LAT and an NV-LAT, as indicated in the figure. These NV-SRAM and NV-DFF cells have the following features: (i) These cells can be completely powered off without losing their logic information. (ii) The performance of the normal SRAM/DFF operations is hardly degraded, since the PS-MOSFETs can be electrically separated from the bistable circuit. (iii) The store/restore operations are quite simple. (iv) A moderate TMR ratio of ~100 % and a moderate Vhalf (the bias voltage at which the TMR ratio is reduced to half its original value) of ~100 mV are sufficient for the restore operation, i.e., a high TMR ratio and a high Vhalf value for the MTJs are not required.

Supplemental remarks concerning features (ii) and (iv) are given below. Figure 14 shows the SNM for the read operation of various NV-SRAM cells. Note that the ordinary SRAM part of all the NV-SRAM cells was designed by reference to an optimized 6T-SRAM cell using 65 nm MOSFETs [93, 94]. For all the cells, the read operation gives the worst-case SNM. The SNM of the NV-SRAM cell using PS-MOSFETs is independent of the connection of the MTJs and is identical to that of the corresponding volatile 6T-SRAM cell. This is due to the effect of the PS-MOSFETs, i.e., the electrical separation of the MTJs from the bistable circuit part during the read operation. On the other hand, the SNMs of the other NV-SRAM cells are severely deteriorated by the effect of the MTJs connected directly to their inverter loops. For the NV-DFF using PS-MOSFETs, the degradation of the operation speed is only around 5 % compared to an ordinary DFF [83]. This degradation is caused by the parasitic capacitance induced by the connection of the PS-MOSFETs, and its magnitude is comparable to that of retention FFs such as balloon FFs. Thus, the PS-MOSFET architecture has little or no deleterious effect on the bistable circuit operations.

Fig. 14

(a)–(f) Circuit configurations of 6T-SRAM and various NV-SRAM cells. (g) Static noise margins (readout operation) of these SRAM/NV-SRAM cells. Note that the ordinary SRAM part of all the NV-SRAM cells was designed by reference to an optimized 65-nm-technology 6T-SRAM cell

Figure 15a shows the effect of Vhalf on the restore operation of the NV-SRAM cell using PS-MOSFETs. Lower Vhalf values necessitate slightly longer times to generate a sufficient voltage difference between VQ and VQB. However, this does not lead to crucial degradation of the restore operation. The inverter loop of the cell acts as a kind of sense amplifier after the initial bistable condition is established, and the resulting bistable condition is determined only at the initial stage of the restore operation. Since VQ and VQB are very small during the initial stage of the restore operation, the reduction of the TMR ratio caused by the effect of Vhalf is small. Therefore, Vhalf plays a minor role in the restore operation. Figure 15b shows the effect of the TMR ratio on the restore operation of the NV-SRAM cell. Moderate TMR ratios less than 100 % are acceptable for the restore operation. This is because only a very small current difference between the two MTJs is required in principle at the initial stage of the restore operation. It is worth noting that a TMR ratio of 100 % for the MTJs is equivalent to γ MC = ~80 % for the PS-MOSFETs in the case shown in Fig. 15. Also note that in practice the TMR ratio required for NV-SRAM/NV-DFF cells should be determined by taking into account the variability of the constituent transistors and MTJs in the cells.
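The minor role of Vhalf at small node voltages can be made concrete with a simple bias-dependence model. The Lorentzian form used below is an assumption chosen only because it reproduces the definition of Vhalf (the TMR ratio halves at that bias); the text does not state which MTJ model was used in the simulations.

```python
def tmr_at_bias(v_mtj, tmr_zero, v_half):
    """TMR ratio under bias v_mtj (V); equals tmr_zero / 2 when |v_mtj| == v_half.

    Assumed Lorentzian bias dependence, for illustration only.
    """
    return tmr_zero / (1.0 + (v_mtj / v_half) ** 2)


if __name__ == "__main__":
    # Hypothetical values: zero-bias TMR of 100 % and Vhalf of 100 mV.
    for v_mtj in (0.005, 0.01, 0.02, 0.05, 0.1):
        print(v_mtj, tmr_at_bias(v_mtj, 1.0, 0.1))
```

In this model a bias of one tenth of Vhalf lowers the TMR ratio by about 1 %, which is consistent with the qualitative argument that the initial stage of the restore operation is only weakly affected.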

Fig. 15

Time evolution of the node voltages and TMR ratio of the NV-SRAM cell using PS-MOSFETs during the restore operation, in which (a) Vhalf and (b) maximum TMR (at a zero bias condition) are varied

Nonvolatile Power-Gating Application

PG is the most attractive architecture to reduce static-power dissipation in advanced CMOS logic systems, as described previously. PG execution requires data transfer for state and data retention, which incurs a performance overhead and restricts the energy performance of PG. Thus, nonvolatile memory circuits, such as nonvolatile registers, register files, and caches, have a great impact on highly efficient PG of logic systems. Recently, NVPG has been proposed, in which NV-SRAM and NV-DFF cells based on the PS-MOSFET architecture were employed for these nonvolatile memory circuits [1, 85, 97–100]. The NV-SRAM- and NV-DFF-based circuits are highly suitable for NVPG owing to their important feature of “electrical separation of the normal SRAM/FF operation and the nonvolatile memory operation.” The normal/nonvolatile operation separation is essential to avoid performance degradation caused by introducing nonvolatile functionality into SRAM/DFF circuits. Furthermore, this separation results in effective reduction of their break-even time (BET) [101], which is one of the important performance indices of NVPG. In this part, the PG ability of the NV-SRAM/NV-DFF cells shown in Fig. 12 is discussed using the BET.

Figure 16 schematically shows the time evolution of static leakage power and power required for the NVPG operations in the NV-SRAM or NV-DFF cell. The static power of the equivalent volatile SRAM/DFF cell is also shown in the figure. The operation sequence includes normal SRAM/DFF operation, store, shutdown, and restore modes (also see the text for Fig. 12a). The BET is defined as the shutdown period for which the extra energy required for the NVPG operations (which includes the increase in the static leakage energy of the cell) is equal to the static energy saved (not wasted) during the shutdown period [92–97, 100, 101]. When the shutdown period is longer than the BET, the static energy is effectively saved by the NVPG operations. A shorter BET makes it possible to execute temporally and spatially fine-grained (energy-efficient) NVPG. The BET is given by comparing the extra energy to the static energy of the equivalent volatile SRAM/DFF cell:

Fig. 16

Time evolution of static power and power required for NVPG of the NV-SRAM/NV-DFF cell (Reproduced with permission from [97])

$$ BET=\frac{E_{\mathrm{store}}+{E}_{\mathrm{restore}}}{\left({I}_L^V-{I}_L^{SD}\right){V}_{DD}}+{\eta}_L{\tau}_{\mathrm{exe}}= BE{T}_{SR}+ BE{T}_L, $$
(3)

where E store and E restore represent the energies required for the store and restore operations, respectively, and η L is given by \( {\eta}_L=\left({I}_L^{NV}-{I}_L^V\right)/\left({I}_L^V-{I}_L^{SD}\right) \). I L NV and I L SD represent the leakage currents of the NV-SRAM/NV-DFF cell during the normal SRAM/DFF operation and shutdown modes, respectively. I L V represents the leakage current of the equivalent volatile SRAM/DFF cell during the normal operation mode. This expression can easily be expanded to a general nonvolatile logic circuit/domain including ordinary CMOS logic gates and nonvolatile memory circuits [96, 97].
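Eq. 3 can be read as a simple energy balance. Writing τ SD for the shutdown period (a symbol introduced here only for illustration) and assuming constant leakage currents in each mode, the extra energy spent on NVPG equals the static energy saved when

$$ {E}_{\mathrm{store}}+{E}_{\mathrm{restore}}+\left({I}_L^{NV}-{I}_L^V\right){V}_{DD}{\tau}_{\mathrm{exe}}=\left({I}_L^V-{I}_L^{SD}\right){V}_{DD}{\tau}_{SD}. $$

Solving for τ SD and identifying it with the BET reproduces the two terms of Eq. 3, with the leakage ratio on the left-hand side giving η L.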

BET can be divided into two components: BET SR and BET L. BET SR (the 1st term of Eq. 3) is governed by the energy dissipation owing to the store and restore operations, and BET L (the 2nd term of Eq. 3) is dominated by the energy dissipation owing to the static leakage current during the normal SRAM/DFF operation mode (and thus BET L is proportional to the duration τ exe of the normal SRAM/DFF operation mode).
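The two components can be evaluated directly from Eq. 3. The sketch below is a plain transcription of that equation; the function names and every numerical value in the example are hypothetical and are not taken from [92–97]. The second function returns the net energy saved for a given shutdown period, which changes sign at the BET, and setting e_store to zero mimics a shutdown that skips the write to the MTJs.

```python
def break_even_time(e_store, e_restore, i_nv, i_v, i_sd, v_dd, tau_exe):
    """Return (BET, BET_SR, BET_L) in seconds, following Eq. 3."""
    saved_power = (i_v - i_sd) * v_dd             # static power saved during shutdown (W)
    bet_sr = (e_store + e_restore) / saved_power  # store/restore component
    eta_l = (i_nv - i_v) / (i_v - i_sd)           # leakage overhead ratio
    bet_l = eta_l * tau_exe                       # leakage component
    return bet_sr + bet_l, bet_sr, bet_l


def net_energy_saved(tau_sd, e_store, e_restore, i_nv, i_v, i_sd, v_dd, tau_exe):
    """Energy saved by NVPG (J) for a shutdown period tau_sd; zero at tau_sd == BET."""
    extra = e_store + e_restore + (i_nv - i_v) * v_dd * tau_exe
    return (i_v - i_sd) * v_dd * tau_sd - extra


if __name__ == "__main__":
    # Hypothetical cell parameters, for illustration only.
    params = dict(e_store=200e-15, e_restore=20e-15,   # J
                  i_nv=1.1e-9, i_v=1.0e-9, i_sd=0.01e-9,  # A
                  v_dd=1.0, tau_exe=1e-3)              # V, s
    bet, bet_sr, bet_l = break_even_time(**params)
    print("BET, BET_SR, BET_L (s):", bet, bet_sr, bet_l)
    print("net energy at 2 x BET (J):", net_energy_saved(2.0 * bet, **params))
```

A power-management policy could compare a predicted idle period with the returned BET and shut a domain down only when the idle period is longer.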

The store (write) operation for the MTJs in the NV-SRAM and NV-DFF cells requires high energy that prolongs BET SR. Therefore, the NV-SRAM and NV-DFF cells should not employ nonvolatile data retention during their normal operation mode, and the normal/nonvolatile operation separation is essential to minimize BET SR. BET SR can be successfully reduced by write bias control (for the SR and CTRL lines shown in Fig. 12) and cell design optimization (channel width/length (W PSM/L PSM) of the PS-MOSFETs) for the NV-SRAM and NV-DFF cells, respectively [92–97]. BET SR can also be reduced by shortening the write pulse width (τ wpw) for the MTJs [92, 97]. Nevertheless, the reduction of τ wpw causes an increase in the write current margin [92].

The leakage current (I L NV) of the NV-SRAM/NV-DFF cell during the normal SRAM/DFF operation mode is higher than that of the equivalent volatile SRAM/DFF cell owing to the addition of the PS-MOSFETs. However, I L NV can be sufficiently reduced by bias control (for the CTRL line) and cell design (W PSM/L PSM) for the NV-SRAM and NV-DFF cells, respectively [92–97]. Thus BET L can also be minimized by these leakage reduction techniques.

Figure 17a shows BET as a function of τ exe for the NV-SRAM cell. The write bias control and the leakage control can effectively reduce BET SR and BET L, respectively. When the NV-SRAM cell is applied to caches, a shutdown operation without the write operation to the MTJs, i.e., store-free shutdown, is available [92–94]. This is because the data already stored in the MTJs of the NV-SRAM cells before shutdown are frequently the data required after the next wake-up. In this case, it is not necessary to rewrite data to the MTJs in the shutdown operation. The energy dissipation of this shutdown operation is greatly reduced, resulting in a very short BET.

Fig. 17

(a) BET as a function of normal SRAM operation duration τ exe for the NV-SRAM cell, in which various BET reduction techniques (bias controls) are employed. (b) BET as a function of τ exe for the NV-DFF cell, in which the area occupation ratio R AO of the NV-DFFs to power domain and the channel width/length ratio (W PSM/L PSM) of the PS-MOSFETs are varied (Reproduced with permission from [97])

Figure 17b shows BET as a function of τ exe for the NV-DFF cell, in which the area occupation ratio R AO of the NV-DFFs to the power domain and the W PSM/L PSM ratio of the PS-MOSFETs are varied. BET effectively decreases with decreasing W PSM/L PSM, since the write currents and the leakage currents of the NV-DFFs are diminished by the W PSM/L PSM optimization. Moreover, BET strongly depends on R AO and is reduced by decreasing R AO. Note that R AO of FFs in typical logic circuits is known to be several tens of percent.

The controlled BET described above for the NV-SRAM and NV-DFF cells is acceptable for fine- and coarse-grained NVPG of logic systems such as microprocessors and SoCs. It should be noted that the power domains are partitioned into sufficiently small sizes so that not only coarse- but also fine-grained NVPG can be achieved. A shutdown period restricted by BET L is adaptable to core-level NVPG of these logic systems, and a shutdown period restricted by BET SR is applicable to more spatially and temporally fine-grained NVPG for the logic domains of the logic systems. The combination of these short- and long-period shutdown operations would lead to highly energy-efficient NVPG [96–100].

Nonvolatile Hierarchical Memory System

One of the key factors for the remarkable development of microprocessors and also SoCs is the hierarchical memory architecture, which enables high-speed operation in spite of its large total capacity. The hierarchical memory architecture is still essential even in nonvolatile logic systems. A nonvolatile hierarchical memory system for microprocessors and SoCs can be constructed by applying the NV-FF and the NV-SRAM cells to its constituent memory circuits such as registers, register files, and caches, and the NVPG architecture can be established using the nonvolatile hierarchical memory system. This architecture allows temporally and spatially optimized granularity of power gating with high energy efficiency that cannot be achieved by the present CMOS technology alone.

Figure 18 schematically shows a proposed NVPG multicore processor (NVPG-MP) [97, 99, 100]. (In this example, first-level and second-level caches (L1$s and L2$) are employed. Of course, other memory organizations are also applicable.) The processor cores are assumed to employ the reduced instruction set computer (RISC) and pipeline architectures. The register files (RFs), program counters (PCs), configuration registers (CRs), and L1$s in the processor cores are nonvolatilized using the NV-SRAM or NV-DFF cells. These nonvolatilized memory circuits are required for core-level NVPG (whose domains are indicated by the dotted lines in the figure). However, the pipeline registers (PRs) in the cores should be composed of ordinary (volatile) FFs to retain the high operation speed (clock frequency) of the critical path. The L2$ and the other CRs (that are used in the high-speed memory interfaces, external I/O interfaces, functional modules (FMs), and reconfigurable processing modules (RPMs) in the entire chip) are also nonvolatilized for module- and chip-level NVPG (whose domains are indicated by the dotted and dashed lines in the figure, respectively). Note that fully nonvolatilized lower-level caches would not be required, as described below.

Fig. 18

Example organization of a multicore processor using the NVPG architecture (Reproduced with permission from [97])

Figures 19a–c schematically show possible nonvolatile hierarchical memory systems. The L1$s should be fully nonvolatilized for quick system restart as shown in Figs. 19a and b, since most of the data in the L1$ are expected to be used again after shutdown. On the other hand, only some portions of the lower-level caches (such as L2$ and a third-level cache (L3$; not shown in the figure)) are required to be nonvolatilized as shown in Fig. 19b, since only a part of the data locally stored in the lower-level caches is required for quick restart of the system. Therefore, partially nonvolatilized lower-level caches are preferable [97, 99, 100]. A nonvolatile main memory using a large-capacity high-speed nonvolatile memory such as ReRAM, spin-RAM, or PRAM has an advantage in instant-on rebooting. In general, however, the active power of these nonvolatile memories is higher than that of conventional DRAM. In addition, only a part of the data in the main memory is required for instant-on rebooting, similar to the lower-level caches. Therefore, a hybrid design of DRAM and one of these nonvolatile memories, which includes an architecture with a nonvolatile memory cache for a DRAM main memory, would be promising, as shown in Fig. 19b. Consequently, the configuration shown in Fig. 19b is the most promising for the nonvolatile hierarchical memory system of NVPG-MPs (and also NVPG-SoCs), which is also discussed below from a performance point of view.

Fig. 19

Configurations of nonvolatile hierarchical memory systems in which (a) all the memories are fully nonvolatilized; (b) volatile PRs, nonvolatile registers, register files, and L1$, and partially or fully nonvolatile L2$ and main memory are used; and (c) volatile higher-level memories and nonvolatile lower-level memories (L2$ and main memory) are used. (d) Schematic cumulative processing volume of microprocessors employing NVPG with these memory systems (Reproduced with permission from [97])

Thick and thin curves in Fig. 19d schematically show the time evolution of the cumulative processing volume for an NVPG-MP using ordinary PRs and nonvolatilized PRs (NV-PRs), respectively, in which RF, L1$, PC, and the other registers and caches in the NVPG-MP are nonvolatilized. The processing speed of the system using the ordinary PRs during the normal operation mode is faster than that of the system using the NV-PRs, since the NV-DFFs that make up the NV-PRs are slightly slower than ordinary DFFs owing to the connection of the PS-MOSFETs, as discussed previously. The system using the ordinary PRs requires a relatively longer shutdown-preparation period than the system using the NV-PRs, since already-issued instructions need to be completely executed before the shutdown. However, this effect would be minor since in many cases the shutdown period is much longer than the waiting period for the already-issued instructions. Therefore, the ordinary PRs are preferable for the NVPG system.

When the registers and L1$ are volatile and the others are nonvolatile as shown in Fig. 19c, the system requires an extra-long period for data backup to the lower-level memories for the shutdown operation. This causes overhead for the shutdown, as shown by the dotted curve in Fig. 19d. Furthermore, this system requires data transfer from the lower-level memories for system wake-up and would frequently cause cache misses at the initial stage of the wake-up, resulting in performance degradation, as shown in Fig. 19d. Therefore, the cores using the volatile/nonvolatile hybrid memory system shown in Fig. 19b are preferable.

Note that in this core system of the NVPG-MP, the NVPG control process would be easily realized: When the NVPG instruction (which is newly added to the instruction set of the system) is detected at the decoder (DEC), the no-operation (NOP) instruction is iteratively issued until already-issued instructions are completed. Once the PRs are empty, the processor core is ready for NVPG. In addition, the ordinary interrupt sequence with exception handling can also be used for the NVPG control, in which NVPG executions can be treated as one of the normal operations. Therefore, the NVPG architecture has excellent compatibility with the present microprocessor/SoC technologies.
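The drain step of this control process can be sketched with a toy pipeline model. The function below is hypothetical: it represents the pipeline registers after the decoder as a fixed-length queue, assumes one instruction advances per cycle with no stalls, and counts how many NOPs the decoder issues before no real instruction remains. The instruction names and the pipeline depth in the example are placeholders.

```python
from collections import deque

NOP = "NOP"


def drain_pipeline(in_flight, depth):
    """Issue NOPs after an NVPG instruction is decoded until the pipeline
    registers hold no real instruction; return the number of NOPs issued.

    in_flight lists the already-issued instructions, youngest first
    (index 0 is the stage right after the decoder).
    """
    in_flight = list(in_flight)
    if len(in_flight) > depth:
        raise ValueError("more in-flight instructions than pipeline stages")
    # Pad the remaining (older) stages with NOPs; the right end is the last stage.
    stages = deque(in_flight + [NOP] * (depth - len(in_flight)), maxlen=depth)
    nops_issued = 0
    while any(instr != NOP for instr in stages):
        stages.appendleft(NOP)     # DEC issues a NOP; the last stage retires
        nops_issued += 1
    return nops_issued


if __name__ == "__main__":
    # Hypothetical 4-stage back end with three instructions still in flight.
    print(drain_pipeline(["add", "lw", "sub"], depth=4))
```

In this model the number of NOPs is bounded by the pipeline depth, which reflects why the waiting period is short compared with a typical shutdown period.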

Recently, a so-called “normally-off computing” architecture using nonvolatile memories has received renewed interest [102, 103]. It was proposed as a kind of instant-on architecture with the abilities of frequent shutdown and fast rebooting. Initially, this architecture was assumed to use the hierarchical memory system shown in Fig. 19a [103], and then it was modified to use the hierarchical memory system shown in Fig. 19c [104]. Normally-off computing itself cannot reduce static energy during system runtime, and thus this architecture needs to be employed in combination with PG [104]. The NVPG is a superordinate architecture that inevitably includes the normally-off computing architecture, although the performance depends on its memory architecture, as described previously. In this sense, the normally-off computing architecture is no more than a limited example/implementation of a part of the NVPG architecture. The NVPG architecture employing the nonvolatile hierarchical memory system shown in Fig. 19b is the embodiment of idealized PG and enables highly efficient reduction of static energy during system runtime.