Main

The history of electronics can broadly be divided into three eras, each defined by a major electron device: the vacuum tube1, the bipolar junction transistor (BJT)2 and the metal–oxide–semiconductor field-effect transistor (MOSFET), as illustrated in Fig. 1. The first two devices played important roles in advancing modern computing; however, it is the rise of the MOSFET, particularly complementary metal–oxide–semiconductor (CMOS) technology, over the past six decades3,4,5 that truly ignited the explosive development of information technology, which has served as a powerful engine of contemporary human civilization. The most appealing merit of the MOSFET for very-large-scale-integration (VLSI) applications is that continuously scaling down its physical size improves every important metric, including cost, performance and energy consumption. Moreover, the ‘field effect’ nature of the MOSFET and the complementary circuit topology of CMOS (containing both n-type and p-type MOSFETs) enable ultralow leakage power, which makes CMOS extremely favourable for low-power applications. Therefore, once the silicon (Si)/silicon dioxide (SiO2) interface-trap problem, the main bottleneck in the early days4, was solved, CMOS technology quickly overtook bipolar transistors in the ever-growing digital market and paved the way to the VLSI-based information technology era. CMOS technology has now served the digital VLSI industry for over half a century and has been scaled down to sub-10-nm technology nodes6. However, the road to scaling has not been smooth. Many challenges, primarily short-channel effects (SCEs), have been plaguing CMOS technology since device dimensions entered the 1-μm regime.
In historical CMOS scaling before the twenty-first century, efforts focused on reducing the physical gate oxide thickness and engineering the source, drain and channel doping profiles7, whereas in modern scaling over the past two decades, novel materials and device architectures8,9,10,11,12,13,14, such as strained channels, high-dielectric-constant (k) metal gates (HKMG), silicon-on-insulator (SOI) and fin field-effect transistors (FinFETs), have been introduced to suppress SCEs and other adverse effects. According to the latest International Roadmap for Devices and Systems (IRDS)15, scaling at sub-5-nm technology nodes will stall at physical gate lengths of 14 nm and 12 nm for low-power (LP) and high-performance (HP) applications, respectively, which would be an undesirable situation. The primary objective of this Perspective is therefore to identify the most promising sub-5-nm logic devices and technologies, together with the research efforts they require, so that the widely distributed research activities in this field can become more focused and more efficient.

Fig. 1: The history of transistor technology.

All major transitions from vacuum tubes to BJTs, and eventually to MOSFETs, have been primarily driven by the need to reduce power consumption. Four major non-traditional FET scaling technologies, that is, SOI, strained channel, HKMG and FinFET are shown according to their commercialization time, corresponding to the 0.18-µm, 90-nm, 45-nm and 22-nm technology nodes, respectively. It is noted that beginning from the 22-nm node, the technology node becomes increasingly smaller than the FET physical dimension. Ec, conduction band minima; Ev, valence band maxima; IBM, International Business Machines; UCSB, University of California, Santa Barbara. Transistor count data are from https://en.wikipedia.org/wiki/Transistor_count.

FET fundamentals

The functionality of a typical MOSFET, depicted in the transmission electron microscope image in Fig. 2a, is analogous to that of a generic tap. In a tap, we control the flow of water using a mechanical knob (or gate), whereas in a MOSFET we control the flow of charge carriers (electrons or holes)—from source to drain through a channel, using an electric gate through ‘field effect’ or capacitive coupling. From a semiconductor energy-band perspective, gate bias is used to modulate the energy bands of the channel underneath the gate, as illustrated in Fig. 2b for an n-type MOSFET, thereby controlling its mobile charge-carrier population (nmob). Efficient modulation of the channel potential (φch) through the gate voltage (Vg) is essentially an electrostatic problem, and can be understood by analysing the relevant capacitive couplings from all terminals to the channel where the centroid of mobile charges is located. These capacitances include the effective gate capacitance Cgox’ and quantum capacitance16 CQ, as well as parasitic capacitances from source/drain (Cs/d), substrate depletion (Cdep) and interface traps (Cit), as illustrated in Fig. 2c (see detailed explanation of these capacitances in the caption). The gate efficiency in modulating φch can be derived as

$$\frac{\Delta {\varphi }_{{\rm{ch}}}}{\Delta {V}_{{\rm{g}}}}=\frac{{C}_{{{\rm{gox}}}^{{\prime} }}}{{C}_{{{\rm{gox}}}^{{\prime} }}+{C}_{{\rm{s}}}+{C}_{{\rm{d}}}+{C}_{{\rm{dep}}}+{C}_{{\rm{it}}}+{C}_{{\rm{Q}}}}$$
(1)
Fig. 2: Fundamentals of FET physics and operation.

a, Transmission electron microscope image of a 45-nm-node planar n-MOSFET with HKMG. Credit: Intel Corp. b, Energy band diagram of an n-type FET. Ec, Ev and f(E) are the conduction band minima (above which mobile charge carriers are electrons), the valence band maxima (below which mobile charge carriers are holes) and the Fermi–Dirac distribution, respectively. λ is the natural length, which determines the distance over which the potential changes from the source or drain to the channel. c, Schematic illustration of the capacitor network that determines the electrostatics in an FET with a grounded source. Cs and Cd are the capacitances at the source/channel and drain/channel junctions, respectively. In general, Cd is smaller than Cs owing to stronger depletion at the drain side. Cgox’ is the modified gate capacitance that accounts for the shift of the charge centroid away from the channel surface owing to wavefunction spreading; this shift can be described as an additional capacitance, Ccent, connected in series with the physical gate-oxide capacitance Cgox, which is equivalent to increasing the gate oxide thickness. Cdep is the capacitance induced by carrier depletion at the substrate surface. Cit is a conceptual capacitance induced by interface trap states (= dQit/dφch), where Qit is the interface trap charge density. CQ is the quantum capacitance (= dQmob/dφch), where Qmob (= q × nmob) is the mobile charge density and q is the elementary charge. Vsub is the substrate bias, if any. d, Transfer characteristics, that is, the Id–Vg curve, and the definition of SS. The operation ranges of HP and LP applications are illustrated. e, Output characteristics, that is, the Id–Vd curve, and the definition of the output resistance Ro.

CQ, which is roughly proportional to nmob (ref. 16), itself an exponential function of φch following the Fermi–Dirac distribution (see equations in Supplementary Section 1 and the sketch in Fig. 2b), plays a unique role in determining the switching behaviour of FETs. When the gate bias is below the threshold voltage (Vth), nmob is minimal, so CQ is negligible with respect to Cgox’. Moreover, the parasitic capacitances do not change rapidly with Vg, so Vg tunes φch linearly according to equation (1). Therefore, the drain current (Id, proportional to nmob) changes exponentially with Vg (appearing linear on a semi-log axis) in the subthreshold regime (Vg < Vth), as shown in Fig. 2d. The steepness of the Id–Vg curve in this regime is usually quantified by a metric called the subthreshold swing (SS), defined in equation (2). SS is determined by the gate electrostatic efficiency of the MOSFET (the first term) and by the carrier transport mechanism (the second term); for the thermionic emission that governs MOSFETs, the second term sets a constant minimum of about 60 mV per decade of drain current at room temperature (see detailed derivation in Supplementary Section 2).

$${\rm{SS}}={\left(\frac{{\rm{d}}[\log {I}_{{\rm{d}}}]}{{\rm{d}}{V}_{{\rm{g}}}}\right)}^{-1}=\frac{\Delta {V}_{{\rm{g}}}}{\Delta {\varphi }_{{\rm{ch}}}}{\left(\frac{{\rm{d}}[\log {I}_{{\rm{d}}}]}{{\rm{d}}{\varphi }_{{\rm{ch}}}}\right)}^{-1}$$
(2)
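As a rough numerical illustration of equations (1) and (2), the sketch below computes the gate efficiency from the capacitor network of Fig. 2c and the resulting SS, assuming the ideal thermionic second term ln(10)·kT/q (about 59.5 mV/dec at 300 K). The capacitance values are hypothetical and normalized to Cgox’ = 1; the parameter `c_gox` stands for Cgox’.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q = 1.602176634e-19     # elementary charge, C


def thermionic_limit_mv_per_dec(temperature_k=300.0):
    """Second term of equation (2) for thermionic emission: ln(10) * kT/q, in mV/dec."""
    return math.log(10) * K_B * temperature_k / Q * 1e3


def gate_efficiency(c_gox, c_s=0.0, c_d=0.0, c_dep=0.0, c_it=0.0, c_q=0.0):
    """Equation (1): d(phi_ch)/d(V_g); c_gox stands for C_gox'."""
    return c_gox / (c_gox + c_s + c_d + c_dep + c_it + c_q)


def subthreshold_swing(c_gox, temperature_k=300.0, **parasitics):
    """Equation (2): SS = (dVg/dphi_ch) * ln(10) kT/q, in mV/dec."""
    eta = gate_efficiency(c_gox, **parasitics)
    return thermionic_limit_mv_per_dec(temperature_k) / eta


if __name__ == "__main__":
    # Hypothetical capacitances, normalized to C_gox' = 1.
    print(f"ideal:             {subthreshold_swing(1.0):.1f} mV/dec")
    print(f"C_dep = 0.2 C_gox: {subthreshold_swing(1.0, c_dep=0.2):.1f} mV/dec")
    print(f"short channel:     {subthreshold_swing(1.0, c_s=0.15, c_d=0.1):.1f} mV/dec")
```

Every parasitic term simply adds to the denominator of equation (1), so any capacitance competing with Cgox’ pushes SS above the thermionic limit.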

It is noted that nmob and CQ increase exponentially with Vg. When CQ becomes comparable to Cgox’, the linear modulation of φch slows down. Eventually, CQ becomes much larger than Cgox’ and all other capacitance components, causing the gate efficiency to approach zero, so any additional ΔVg drops almost entirely across the gate oxide instead of inside the channel. Therefore, the increase of nmob, and hence of Id, begins to follow Cgox’ΔVg, that is, it becomes linear (appearing saturated on a semi-log axis), as illustrated in Fig. 2d. This transition of Id from exponential to linear increase represents the physical essence of Vth.
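The exponential-to-linear transition can be reproduced with a toy model. The sketch below is a minimal illustration, not the model used in this Perspective. It assumes a single 2D subband with Si-like parameters (gs = gv = 2, isotropic in-plane mass 0.19m0), a 1-nm EOT, no parasitic capacitances, and φch measured from the point where the subband edge aligns with the source Fermi level. It then solves the charge balance Cgox’(Vg − φch) = q·nmob(φch) by bisection. Below threshold, nmob rises roughly tenfold per 60 mV of Vg. Well above threshold it rises almost linearly, with a slope somewhat below Cgox’/q because CQ acts in series with Cgox’.

```python
import math

Q = 1.602176634e-19          # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K
HBAR = 1.054571817e-34       # reduced Planck constant, J s
M0 = 9.1093837015e-31        # free-electron mass, kg
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m

T = 300.0                                          # temperature, K
VT = K_B * T / Q                                   # thermal voltage kT/q, V
# Assumed Si-like subband: g_s = 2, g_v = 2, isotropic in-plane mass 0.19 m0 (equation (7)).
DOS = 2 * 2 * 0.19 * M0 / (2 * math.pi * HBAR**2)  # J^-1 m^-2
C_GOX = 3.9 * EPS0 / 1e-9                          # C_gox' for an assumed 1-nm EOT, F/m^2


def softplus(x):
    """Numerically safe ln(1 + e^x)."""
    return x if x > 30 else math.log1p(math.exp(x))


def n_mob(phi):
    """Electron density (m^-2) in one 2D subband; phi = 0 aligns its edge with the Fermi level."""
    return DOS * K_B * T * softplus(phi / VT)


def solve_phi(vg, iterations=200):
    """Bisection on the charge balance C_gox' (Vg - phi) = q n_mob(phi); parasitics ignored."""
    lo, hi = min(vg, 0.0) - 1.0, vg
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if Q * n_mob(mid) > C_GOX * (vg - mid):
            hi = mid        # too much channel charge: the root lies below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for vg in (-0.3, -0.2, -0.1, 0.0, 0.2, 0.4, 0.6, 0.8):
        phi = solve_phi(vg)
        print(f"Vg = {vg:+.1f} V  phi_ch = {phi:+.3f} V  n = {n_mob(phi):.3e} m^-2")
```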

In FET operation, the gate terminal controls device switching, whereas the drain terminal is normally used to bias the FET in different operation modes once it is turned ON. When the drain voltage (Vd) is small with respect to the overdrive voltage Vod = Vg − Vth, the abundant mobile charges in the channel make the FET behave like a resistor, which results in a linear Id–Vd relation. When Vd increases above Vod, mobile charges become depleted at the drain side, leading to pinch-off of the channel and thereby to current saturation17, as illustrated in Fig. 2e. In submicrometre-scale devices, carrier velocity saturation induced by the large lateral electric field can trigger current saturation before channel pinch-off occurs17.
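The output behaviour described above can be sketched with the classical long-channel square-law model. This is a textbook approximation that this Perspective does not itself use. `vth` and the transconductance parameter `k` are hypothetical values, and velocity saturation, DIBL and subthreshold conduction are deliberately ignored.

```python
def drain_current(vg, vd, vth=0.3, k=1e-3):
    """Long-channel square-law Id (A); k = mu * Cox * W / L in A/V^2 (hypothetical).

    Velocity saturation, DIBL and channel-length modulation are ignored,
    so Id is perfectly flat once Vd exceeds Vod.
    """
    vod = vg - vth                      # overdrive voltage Vod = Vg - Vth
    if vod <= 0:
        return 0.0                      # subthreshold current not modelled here
    if vd < vod:                        # linear (triode) region: resistor-like for small Vd
        return k * (vod * vd - 0.5 * vd**2)
    return 0.5 * k * vod**2             # saturation after pinch-off at Vd = Vod


if __name__ == "__main__":
    for vd in (0.05, 0.2, 0.4, 0.5, 0.8):
        print(f"Vg = 0.8 V, Vd = {vd:.2f} V: Id = {drain_current(0.8, vd) * 1e6:.1f} uA")
```

In this idealized model the output resistance Ro is infinite beyond pinch-off; the finite Ro of real short-channel devices arises from the effects the model leaves out.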

The most important FET metrics for digital-circuit performance are power consumption (P) and speed (or, equivalently, delay τ):

$$P\propto {C}_{{\rm{total}}}{V}_{{\rm{dd}}}^{2}\,f+{I}_{{\rm{off}}}{V}_{{\rm{dd}}}$$
(3)
$$\tau \propto {C}_{{\rm{total}}}{V}_{{\rm{dd}}}/{I}_{{\rm{on}}}$$
(4)

where Ctotal is the total load capacitance, Vdd is the supply voltage, f is the operating frequency, Ioff is the OFF current, which is normally specified for a given application, and Ion is the ON current, which is extracted at fixed Ioff and Vdd, as illustrated in Fig. 2d.
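A minimal sketch of equations (3) and (4) makes the trade-off explicit. All proportionality constants (for example, the switching activity factor) are set to 1, and the device values are purely hypothetical. Lowering Vdd reduces the dynamic power quadratically. Because Ion usually drops together with Vdd, however, the delay tends to grow.

```python
def switching_power(c_total, vdd, f, i_off):
    """Equation (3): returns (dynamic, static) power in W, proportionality constants set to 1."""
    dynamic = c_total * vdd**2 * f      # charging and discharging the load capacitance
    static = i_off * vdd                # leakage drawn from the supply
    return dynamic, static


def gate_delay(c_total, vdd, i_on):
    """Equation (4): delay in s, proportionality constant set to 1."""
    return c_total * vdd / i_on


if __name__ == "__main__":
    c_load, freq, i_off = 1e-15, 3e9, 1e-9           # hypothetical values
    i_on_at = {0.8: 1.0e-4, 0.7: 0.8e-4, 0.6: 0.55e-4}  # hypothetical Ion at each Vdd
    for vdd, i_on in i_on_at.items():
        dyn, stat = switching_power(c_load, vdd, freq, i_off)
        print(f"Vdd = {vdd} V: P_dyn = {dyn:.2e} W, P_stat = {stat:.2e} W, "
              f"tau = {gate_delay(c_load, vdd, i_on):.2e} s")
```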

FET scaling challenges

Although MOSFET scaling comes with great benefits7, it raises many technological challenges.

Gate efficiency degradation

As the FET gate length gets shorter, Cs/d can become comparable to Cgox, meaning that the source and drain begin to share control over the channel potential with the gate; the resulting SS degradation is the primary manifestation of SCEs. Drain-induced barrier lowering (DIBL) is another metric that quantifies SCEs. As the name implies, increasing the drain bias lowers the channel potential barrier (with respect to the source), thereby lowering Vth, which leads to non-ideal current saturation (quantified by the output resistance Ro) in the output characteristics (Fig. 2e).
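DIBL is commonly extracted as the Vth shift between a low (linear-regime) and a high (saturation-regime) drain bias, normalized by the drain-bias difference. The helper below sketches that convention. The bias points and Vth values in the usage lines are hypothetical.

```python
def dibl_mv_per_v(vth_low_vd, vth_high_vd, vd_low, vd_high):
    """DIBL (mV/V): Vth drop per volt of drain bias, from two extraction points.

    vth_low_vd / vth_high_vd: Vth (V) extracted at drain biases vd_low / vd_high (V).
    """
    return (vth_low_vd - vth_high_vd) / (vd_high - vd_low) * 1e3


if __name__ == "__main__":
    # Hypothetical extraction: Vth = 0.35 V at Vd = 0.05 V and 0.32 V at Vd = 0.7 V.
    print(f"DIBL = {dibl_mv_per_v(0.35, 0.32, 0.05, 0.7):.1f} mV/V")
```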

Parasitic resistance and capacitance

FET scaling requires the entire device length, which includes the gate, contact and spacer lengths and is termed the contacted gate pitch (CGP; Fig. 3a), to scale down. As a result, parasitic resistances (such as the contact resistance Rcontact, the source/drain sheet resistance Rsheet and the current-crowding-induced resistance Rcrowd in the source/drain region) and parasitic capacitances (primarily the fringing and overlap capacitances, Cfrin/Cov, between the gate and the source/drain and their contacts; Fig. 3a) keep increasing, and hence begin to undermine the benefits of scaling.

Fig. 3: FET scaling.

a, A typical FET structure, in which various issues during device scaling are illustrated. Rev-diode, reverse-biased diode leakage. b, Mechanisms of four major leakage paths in scaled MOSFETs. EFG is the Fermi level of the gate electrode. c, A hierarchical illustration of FET scaling scenarios using a tap, whose operation mechanism serves as a good analogy to that of FETs, that is, the water/charge-carrier flow from source to drain through a channel is controlled by the knob/gate. Taps of different forms are employed to illustrate the general idea of different scaling scenarios. λ is the natural length (see equation (8)) and ν is the carrier velocity (see equations (5) and (6)).

Leakage currents

As scaling continues, the low-power merit of MOSFETs begins to be undermined by leakage currents, primarily from four sources (mechanisms are described in Fig. 3b): gate leakage, composed of direct tunnelling (DT), Fowler–Nordheim (FN) tunnelling and hot-carrier (HC) injection; channel leakage, comprising subthreshold (sub-Vth) leakage, which is enhanced by DIBL, and direct source-to-drain tunnelling (SDT) leakage; and junction leakage in the forms of reverse-biased diode leakage and gate-induced drain leakage (GIDL), in both of which band-to-band tunnelling (BTBT) plays a key role. It is noted that as the channel thickness decreases and Vdd becomes smaller, the drain depletion underneath the gate, and hence the component of GIDL perpendicular to the channel surface (shown in the sketch for GIDL), is suppressed, leaving only the lateral component along the channel direction.

Variability issues

Large-scale CMOS manufacturing, which is inherently imperfect, together with continued scaling, unavoidably introduces variations in channel and dielectric thicknesses, channel length, dopant density18, gate material granularity19 and so on. These variations are reflected in the device electrical characteristics, such as Ion, Ioff and Vth. As the device size (CGP) keeps decreasing and the manufacturing process (such as lithography) becomes increasingly complex, controlling and accounting for process variability has become a critical factor for the viability of any technology20.
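Because subthreshold current depends exponentially on Vg − Vth, even a modest Vth variation translates into a large Ioff spread. The sketch below assumes Id ∝ 10^((Vg − Vth)/SS) in the subthreshold regime and a Gaussian Vth distribution with a hypothetical standard deviation. It is an illustration only, not a variability model.

```python
import random


def ioff_multiplier(delta_vth, ss_mv_per_dec):
    """Factor by which Ioff changes for a Vth shift delta_vth (V) at fixed Vg.

    Assumes Id ~ 10**((Vg - Vth) / SS) in the subthreshold regime.
    """
    return 10 ** (-delta_vth * 1e3 / ss_mv_per_dec)


def ioff_spread(sigma_vth, ss_mv_per_dec, n=10_000, seed=0):
    """Monte Carlo over a Gaussian Vth distribution (hypothetical sigma, in V).

    Returns the approximate 1st-percentile, median and 99th-percentile Ioff multipliers.
    """
    rng = random.Random(seed)
    factors = sorted(
        ioff_multiplier(rng.gauss(0.0, sigma_vth), ss_mv_per_dec) for _ in range(n)
    )
    return factors[n // 100], factors[n // 2], factors[n - n // 100 - 1]


if __name__ == "__main__":
    # Hypothetical: sigma_Vth = 20 mV, SS = 70 mV/dec.
    low, median, high = ioff_spread(0.020, 70.0)
    print(f"Ioff multipliers: 1st pct {low:.2f}, median {median:.2f}, 99th pct {high:.2f}")
```

A Vth shift of one SS (in volts) changes Ioff by a full decade, which is why Vth control becomes more demanding as Ioff budgets tighten.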

Reliability issues

In ultrascaled MOSFETs, both the vertical and the lateral electric fields become stronger, which inevitably stresses the devices and degrades their reliability21. Moreover, replacing the SiO2/poly-Si gate stack with HKMG brings further reliability issues: the degradation mechanisms of the gate dielectric and the channel/dielectric interface22, such as time-dependent dielectric breakdown, bias temperature instability and HC injection, need re-examination.

Modern CMOS scaling

Traditional MOSFET scaling7, which relied on reducing the oxide thickness along with engineering the source, drain and substrate doping profiles23, was eventually limited by large gate leakage through the gate oxide. Modern MOSFET scaling has successfully addressed this issue by employing HKMG, followed by many other novel technologies. In this section, the history of modern CMOS scaling is reviewed within a hierarchical scaling framework (Fig. 3c).

Carrier transport engineering

Higher carrier velocity is desired to achieve better device performance at the same device size and supply voltage. Alternatively, when targeting the same device performance, higher carrier velocity relaxes the device-size scaling constraint and/or enables lower operating voltages, and hence lower energy consumption.

Carrier velocity, whether in the drift–diffusion limit (vDD, with various types of scattering involved) or in the ballistic limit (vballistic, which determines the performance upper bound), is inversely proportional to the transport effective mass \({m}_{{\rm{tr}}}^{* }\):

$${v}_{{\rm{DD}}}={\xi }_{{\rm{tr}}}{\mu }_{{\rm{DD}}}={\xi }_{{\rm{tr}}}\frac{{q}_{0}{\tau }_{{\rm{MFT}}}}{{m}_{{\rm{tr}}}^{* }}$$
(5)
$${v}_{{\rm{ballistic}}}=\frac{1}{\hbar }\frac{{\rm{d}}E}{{\rm{d}}k}=\frac{\hbar k}{{m}_{{\rm{tr}}}^{* }}$$
(6)

where ξtr is the electric field along the transport direction, µDD is carrier mobility, q0 is electron charge, τMFT is the mean free time (MFT) between two successive scattering events, ħ is the reduced Planck’s constant, E is energy and k is the carrier wavevector. Therefore, low \({m}_{{\rm{tr}}}^{* }\) is desired to achieve high carrier velocity. There are mainly two pathways to achieve low \({m}_{{\rm{tr}}}^{* }\).
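Equations (5) and (6) can be evaluated directly to see the 1/\({m}_{{\rm{tr}}}^{* }\) dependence. In the sketch below, the mean free time of 0.1 ps is a hypothetical value, 0.19m0 is used as an approximate Si electron transport mass, and a parabolic band is assumed for equation (6).

```python
Q = 1.602176634e-19      # elementary charge (q0 in equation (5)), C
M0 = 9.1093837015e-31    # free-electron mass, kg
HBAR = 1.054571817e-34   # reduced Planck constant, J s


def drift_mobility(tau_mft, m_tr):
    """Equation (5): mu_DD = q0 * tau_MFT / m_tr* in m^2 V^-1 s^-1."""
    return Q * tau_mft / m_tr


def drift_velocity(field, tau_mft, m_tr):
    """Equation (5): v_DD = xi_tr * mu_DD in m/s (low field, below velocity saturation)."""
    return field * drift_mobility(tau_mft, m_tr)


def ballistic_velocity(k, m_tr):
    """Equation (6) for a parabolic band: v = hbar * k / m_tr* in m/s."""
    return HBAR * k / m_tr


if __name__ == "__main__":
    tau = 1e-13                      # hypothetical mean free time of 0.1 ps
    for ratio in (0.19, 0.095):      # transport mass in units of m0 (illustrative)
        m = ratio * M0
        print(f"m_tr* = {ratio} m0: mu = {drift_mobility(tau, m) * 1e4:.0f} cm^2/Vs, "
              f"v_DD(1e5 V/m) = {drift_velocity(1e5, tau, m):.2e} m/s, "
              f"v_ballistic(k = 1e9 1/m) = {ballistic_velocity(1e9, m):.2e} m/s")
```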

One pathway, already in production, is strain engineering: introducing strain into the Si lattice modifies the Si band structure. Early efforts introduced biaxial global strain by epitaxially growing a thin Si channel on top of a relaxed silicon–germanium (SiGe) virtual substrate24, but confronted two key challenges for process integration: (1) this approach could not provide the best strain configuration for both n-type and p-type transistors, and (2) the SiGe layer induced a large number of defects in the strained Si. Subsequently, more integration-friendly local strain techniques were developed, including the gate cap stressor (tensile strain)25 and SiGe embedded in the recessed source/drain (compressive strain)10, for enhancing electron and hole mobilities, respectively. The industry has adopted these two techniques in a complementary manner to achieve high-performance CMOS logic10.

The other pathway, still under research and development, is to replace the Si channel with high-mobility materials that have either a low \({m}_{{\rm{tr}}}^{* }\) or a long τMFT (equation (5)). However, given the maturity of Si manufacturing technology, it is not practical to use any other material as the primary substrate. Therefore, the first challenge for any potential new channel material is process integration with the Si substrate. Ge and III–V materials (specifically indium gallium arsenide (InGaAs)) are well known for their high hole and electron mobilities, respectively, and, more importantly, their lattice constants do not deviate much from that of Si (about 5.43 Å), which alleviates the process integration challenge to some extent. These two materials are therefore considered promising post-Si channel materials26,27,28,29,30. The other challenge for any channel material is the lack of a high-quality gate oxide/insulator that forms as good an interface with it as SiO2 does with Si. In this regard, Ge has been employed in the form of a SiGe alloy with low Ge content for the channel26, so that high-quality thermal SiO2 can still be used. III–V materials cannot benefit from the well-tuned Si/SiO2 interface, and a large density of interface states therefore plagues the performance and reliability of III–V channel MOSFETs27. The small bandgap of III–V materials requires a small Vdd to avoid GIDL leakage currents, which limits Vod and hence the device performance. The larger dielectric constant of III–V materials with respect to Si is another unfavourable factor for scaling30. Moreover, the ultralow electron effective mass of III–V materials, while benefiting mobility, introduces a density of states (DOS) bottleneck30,31 and large SDT leakage32 once the gate length is scaled to about 20 nm. The DOS for each quantized level (arising from confinement perpendicular to the channel) in any FET with a planar channel is

$${\rm{DOS}}=\frac{{g}_{{\rm{s}}}{g}_{{\rm{v}}}\sqrt{{m}_{x}{m}_{z}}}{2{\rm{\pi }}{\hbar }^{2}}$$
(7)

where gs/v is the spin/valley degeneracy and mx/z is the effective mass along the channel length/width direction. In summary, the high performance of III–V FETs relies heavily on the ultralow effective mass of III–V materials, which, however, also hinders their scaling.
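Equation (7) makes the DOS bottleneck easy to quantify. The sketch below uses approximate literature parameters as assumptions: (100) Si with two valleys and an in-plane mass of about 0.19m0, and In0.53Ga0.47As with one Γ valley and a mass of about 0.041m0. It converts each DOS into the degenerate-limit quantum capacitance CQ ≈ q²·DOS. For InGaAs this CQ (about 2.7 μF/cm²) falls below the roughly 3.45 μF/cm² of a 1-nm-EOT gate, so in the series combination it, rather than the gate oxide, limits the charge that the gate can induce.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
M0 = 9.1093837015e-31    # free-electron mass, kg
Q = 1.602176634e-19      # elementary charge, C
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def dos_2d(g_s, g_v, m_x, m_z):
    """Equation (7): DOS of one quantized level in a planar channel, J^-1 m^-2."""
    return g_s * g_v * math.sqrt(m_x * m_z) / (2 * math.pi * HBAR**2)


# Approximate literature parameters, used here as assumptions:
# (100) Si lowest subband: two valleys, in-plane mass ~0.19 m0 in both directions.
DOS_SI = dos_2d(2, 2, 0.19 * M0, 0.19 * M0)
# In0.53Ga0.47As Gamma valley: one valley, isotropic mass ~0.041 m0.
DOS_INGAAS = dos_2d(2, 1, 0.041 * M0, 0.041 * M0)

if __name__ == "__main__":
    c_ox = 3.9 * EPS0 / 1e-9                      # gate capacitance for a 1-nm EOT, F/m^2
    print(f"C_gox (EOT = 1 nm): {c_ox * 1e2:.2f} uF/cm^2")
    for name, dos in (("Si", DOS_SI), ("InGaAs", DOS_INGAAS)):
        c_q = Q**2 * dos                          # degenerate-limit C_Q, F/m^2
        print(f"{name:7s} DOS = {dos:.2e} J^-1 m^-2, C_Q ~ {c_q * 1e2:.1f} uF/cm^2")
```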

It is worth mentioning that recently, III-nitride semiconductors, in particular, gallium nitride (GaN), have started gaining interest for CMOS application33, because of their intrinsic material property merits such as higher (with respect to Si) electron mobility, bandgap and breakdown voltage. However, numerous technical challenges still remain, including large contact resistance, severely imbalanced n-type and p-type device performances, and process integration on Si wafer. Simply put, substantial efforts are needed to exploit the advantages of GaN for CMOS.

Improving device electrostatics

The scalability of FETs34 can be quantified by a characteristic length, called the natural length λ, which captures how steeply the potential varies from the source or drain to the channel. Its first-order approximation (refer to ref. 35 for a more rigorous but more complex form) can be written as

$$\lambda =\sqrt{\alpha {T}_{{\rm{gox}}}{T}_{{\rm{ch}}}\frac{{\varepsilon }_{{\rm{ch}}}}{{\varepsilon }_{{\rm{gox}}}}}$$
(8)

where εgox/ch and Tgox/ch are the permittivity (= k × ε0, where ε0 is permittivity of free space) and thickness of gate dielectric/channel, respectively. α is a geometric factor that captures the gate topology, and decreases with an increasing number of gates in an FET. For good electrostatics, channel length Lch > (3–5)λ is required. According to equation (8), scalability can be improved along three main directions—reducing the equivalent oxide thickness (EOT; that is, reducing the Tgox/εgox ratio), employing multi-gate device topology (that is, reducing α) and reducing the channel thickness (Tch).
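To put numbers on equation (8), the sketch below evaluates λ and the resulting Lch > (3–5)λ window for a hypothetical stack of 1 nm of SiO2 (k = 3.9) on a 5-nm Si channel (k = 11.7). The α values (1, 1/2 and 1/4 for single-gate, double-gate and gate-all-around devices) are assumed first-order values of the kind used in common textbook treatments, not values given in this Perspective. Only the permittivity ratio enters equation (8), so relative permittivities are used.

```python
import math

# Assumed first-order geometric factors alpha for equation (8).
ALPHA = {"single gate": 1.0, "double gate": 0.5, "gate all around": 0.25}


def natural_length(t_gox, t_ch, eps_gox, eps_ch, alpha):
    """Equation (8): lambda = sqrt(alpha * T_gox * T_ch * eps_ch / eps_gox).

    Thicknesses share one unit (nm here); permittivities may be relative values.
    """
    return math.sqrt(alpha * t_gox * t_ch * eps_ch / eps_gox)


if __name__ == "__main__":
    # Hypothetical stack: 1 nm of SiO2 (k = 3.9) on a 5-nm Si channel (k = 11.7).
    for topology, alpha in ALPHA.items():
        lam = natural_length(1.0, 5.0, 3.9, 11.7, alpha)
        print(f"{topology:15s} lambda = {lam:.2f} nm, Lch > {3 * lam:.1f}-{5 * lam:.1f} nm")
```

Halving α, Tgox/εgox or Tch each shortens λ by a factor of √2, which is why the three directions described above are complementary rather than alternatives.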

The first direction, EOT reduction, is in production. To avoid large gate leakage while continuing to scale, a high-k dielectric was employed11 to replace SiO2. A physically thicker, and hence less leaky, high-k layer can deliver the same Cgox per unit area that SiO2 could provide only at a very small thickness; this equivalent SiO2 thickness is called the EOT:

$${\rm{EOT}}={T}_{{\rm{HK}}}\frac{{\varepsilon }_{{{\rm{SiO}}}_{2}}}{{\varepsilon }_{{\rm{HK}}}}$$
(9)
$${C}_{{\rm{gox}},{\rm{HK}}}=\frac{{\varepsilon }_{{\rm{HK}}}}{{T}_{{\rm{HK}}}}=\frac{{\varepsilon }_{{{\rm{SiO}}}_{2}}}{{\rm{EOT}}}$$
(10)

where THK is the physical thickness of the high-k dielectric, and εHK and \({\varepsilon }_{{{\rm{SiO}}}_{2}}\) are the dielectric permittivities of the high-k dielectric and SiO2, respectively.
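Equations (9) and (10) translate directly into code. The sketch below also adds an optional SiO2 interfacial layer in series, because the HKMG stacks discussed next include one. The 2-nm HfO2 layer (k ≈ 20) and the 0.5-nm interfacial layer are hypothetical thicknesses.

```python
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
K_SIO2 = 3.9              # relative permittivity of SiO2


def eot(t_hk, k_hk, t_il=0.0):
    """Equation (9), plus an optional SiO2 interfacial layer in series (units of the inputs)."""
    return t_il + t_hk * K_SIO2 / k_hk


def c_gox(eot_m):
    """Equation (10): gate capacitance per unit area (F/m^2) for an EOT given in metres."""
    return K_SIO2 * EPS0 / eot_m


if __name__ == "__main__":
    # Hypothetical stack: 2 nm of HfO2 (k ~ 20) without and with a 0.5-nm SiO2 interfacial layer.
    for t_il in (0.0, 0.5):
        e = eot(2.0, 20.0, t_il=t_il)                       # about 0.39 nm and 0.89 nm
        print(f"IL = {t_il} nm: EOT = {e:.2f} nm, C_gox = {c_gox(e * 1e-9) * 1e2:.2f} uF/cm^2")
```

The interfacial layer adds its full thickness to the EOT, which is why reducing it is listed among the further improvement directions.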

Although a high-k dielectric is appealing in terms of device scalability and performance, there are many challenges11,13,14,17 to address, including the difficulty of obtaining a high k and a large bandgap simultaneously, Fermi-level pinning (and hence a large Vth) induced by the thermal instability of the high-k dielectric/poly-Si interface, and mobility degradation induced by surface optical phonon scattering at the channel. Through a decade of research, the semiconductor industry converged on hafnium oxide (HfO2), which provides a reasonably high k of about 20 and a large bandgap of about 5.7 eV. A metal gate was employed to replace the poly-Si gate; it not only avoids the long-standing gate depletion issue and the Fermi-level pinning effect of the poly-Si gate, but also screens out surface optical phonon scattering and thereby significantly improves carrier mobility. The quality of the high-k film and of the high-k dielectric/Si channel interface has been greatly improved by employing the self-limiting atomic-layer-deposition technique and a SiO2 interfacial layer between the high-k dielectric and the Si channel. Aided by these efforts, as well as by the aggressive gate-last process (as opposed to the traditional gate-first process), Intel delivered the revolutionary first generation of HKMG-based chips in 200711. Further improvement directions36,37 include introducing new high-k dielectrics with a higher k (such as lanthanum oxide) and/or reducing the thickness of the interfacial layer.

The second direction, novel device topology, is both in production and under development. The introduction of SOI technology, in which a buried SiO2 layer decouples the channel from the substrate, not only eliminates substrate leakage8 and latch-up effects but also brings immunity to irradiation-induced failure from high-energy particles, which is crucial for outer-space and high-altitude electronics. However, SOI also raises serious concerns, such as the high cost of SOI wafers and the self-heating effect9 caused by the much lower thermal conductivity of SiO2 with respect to Si, which have severely limited the SOI market.

The introduction of non-planar or three-dimensional (3D) transistors38,39, particularly FinFETs40,41 (Fig. 4a), revolutionized FETs and accelerated the shrinkage of the SOI market. Tri-gate FinFETs are manufacturing-friendly and can improve the current drive capability through a taller fin and/or a smaller fin pitch. A side effect of the FinFET structure for circuit design is the quantized fin height, which narrows the design space in terms of allowed device widths42. It is worth mentioning that FinFETs can also be made on SOI substrates, which minimizes substrate leakage such as subfin leakage43 at the penalty of increased cost and the self-heating effect. Thus far, commercial FinFETs have evolved to the 5-nm technology node and beyond6,29, and are expected to survive at least one more generation. The challenges for FinFET technology to continue scaling beyond the 5-nm technology node include the large parasitic capacitance induced by the 3D structure, the mechanical instability of high-aspect-ratio fins, the small fin pitch that makes HKMG formation and raised source/drain epitaxy44 between the fins very difficult, and the minimum fin width (thickness) of about 4 nm (ref. 45), below which device performance degrades rapidly, as shown in Fig. 4b. In other words, FinFET technology might find it hard to survive when the physical gate length becomes smaller than 10 nm.

Fig. 4: Transistors go beyond FinFET.

a, Schematics of a FinFET, a VSNW FET, an LPNW FET, a VSNS FET, an LPNS FET, a VS2D FET and an LPSWNT FET. S/G/D represents source/gate/drain, and Lcontact/Lspacer/Lgap is the length of the contact/spacer/source-to-drain gap. n is the number of channels in the lateral direction. b, FinFET performance (operation frequency) versus fin width (thickness), suggesting that 4 nm is the optimal fin width (Wc). The performance percentages shown are relative to a production baseline 14-nm FinFET technology with a fin width of 7 nm. NFET/PFET represents n-/p-type FET. c, The advantages of VSNS with respect to FinFET in terms of effective device width Weff owing to the increased total perimeter of the channel surface for charge-carrier conduction (white)50. The relative performance values shown are with respect to that of the single data point (Fin) for a FinFET. DNS is the nanosheet width, and Tsus is the inter-sheet distance. d,e, DOS (d) and carrier density (e) of a SWNT array (teal) and a 2DS (purple). Ef is the Fermi level. Panel b reproduced with permission from ref. 45, IEEE. Panel c adapted with permission from ref. 50, The Japan Society of Applied Physics.

Scaling theory34 (equation (8)) indicates that the gate-all-around nanowire (NW) structure provides the best electrostatics from the gate-count point of view. This has been confirmed by both experimental and theoretical studies46,47. Moreover, vertically stacked NW (VSNW) FETs48 and laterally packed NW (LPNW) FETs49 (Fig. 4a) have been demonstrated, which effectively increase the current drive capability with respect to a single NW FET. VSNWs can be made based on FinFET technology; that is, this structure can benefit from the maturity of the FinFET process and hence save tremendous process development effort. In contrast, the fabrication processes (bottom up or top down) of LPNW FETs require much more investment. The vertical channel of LPNW FETs relaxes the channel-length scaling constraint and enables area and cost reduction without a leakage penalty. However, as the technology node shrinks, this scaling advantage could diminish because the effective NW thickness (TNW) and the total width of the laterally placed source, drain and gate contacts (Fig. 4a) become comparable to the gate length. Recently, vertically stacked nanosheet (VSNS) FETs50 and laterally packed nanosheet (LPNS) FETs51 (Fig. 4a), derivatives of VSNW and LPNW FETs, respectively, have been developed. The wider nanosheet (NS), while sacrificing some electrostatic control relative to the NW (thus, higher DIBL and SS) owing to its deviation from the ideal cylindrical gate-all-around structure34, provides more surface area for current conduction and hence higher drive current and performance (Fig. 4c). The Interuniversity Microelectronics Centre (IMEC) introduced a dielectric wall in a single NS stack to separate n-type and p-type devices, forming a ‘forksheet’ structure52 that allows tighter n-to-p spacing, and hence more area reduction, at the cost of degraded electrostatics owing to the missing gate in the wall region.
It is noted that these NW and NS devices have to inherit the raised and silicided source/drain and low-k spacer techniques6 from FinFETs to minimize parasitic resistances and capacitances, respectively.
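The effective-width advantage of nanosheets in Fig. 4c follows from simple perimeter geometry. The sketch below assumes that current flows along the two sidewalls and top of a tri-gate fin, and along the full perimeter of each gate-all-around sheet, with corner rounding ignored. The sheet thickness `t_ns` and all dimensions in the usage lines are hypothetical.

```python
def weff_fin(h_fin, w_fin):
    """Tri-gate fin: current flows along both sidewalls and the top surface."""
    return 2 * h_fin + w_fin


def weff_nanosheet_stack(n_sheets, d_ns, t_ns):
    """Gate-all-around sheets: the full perimeter of every sheet conducts.

    d_ns is the sheet width (D_NS in Fig. 4c); t_ns is an assumed sheet thickness.
    The inter-sheet distance T_sus sets the stack height but does not enter W_eff.
    """
    return n_sheets * 2 * (d_ns + t_ns)


if __name__ == "__main__":
    # Hypothetical dimensions in nm; corner rounding is ignored.
    print("single fin:", weff_fin(h_fin=50, w_fin=6), "nm")
    print("3-sheet stack:", weff_nanosheet_stack(n_sheets=3, d_ns=30, t_ns=5), "nm")
```

Widening DNS raises Weff within a single footprint, which is the drive-current benefit traded against the slightly weaker electrostatics noted above.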

In terms of process maturity and device performance, NW and NS FETs are promising candidates for replacing FinFETs at the 5-nm node and even beyond, but they also face challenges, such as the self-heating effect53 caused by their one-dimensional heat transport, interface issues induced by multiple surface orientations, and large variability54. Moreover, as with FinFETs, the channel thickness of NW and NS FETs cannot be scaled below 3 nm, owing to quantum confinement (leading to reduced DOS), mobility degradation, large variability and fabrication difficulties. Therefore, it remains over-optimistic to expect that NW and NS FETs can extend scaling to sub-1-nm nodes, where physical gate lengths are expected to be smaller than 10 nm.

The third direction, atomic-scale channel thickness, is under research. Single-wall carbon nanotubes (SWNTs)55 and the emerging 2D layered semiconductors (2DS), in particular transition-metal dichalcogenides56,57, have the capability of scaling FETs to the 1-nm node and beyond because of their intrinsic atomic-scale thicknesses (≤1 nm). SWNTs and 2DS also have the advantages (with respect to bulk materials) of a dangling-bond-free surface and atomically uniform thickness, which effectively suppress interface trap generation and variability, respectively, in FET applications. It is worth noting that future FETs are unlikely to go back to a planar structure. Even though SWNTs and 2DS have promising material properties, they have to be integrated into modern 3D structures (see the suggested vertically stacked 2D (VS2D) FET and the laterally packed SWNT (LPSWNT) FET in Fig. 4a) to be relevant to VLSI technology. Figure 4d,e shows the calculated DOS and carrier density, respectively, for a SWNT array (based on a tight-binding model) and a 2DS (based on an effective-mass model). As shown, as long as the tube density of the SWNT array and the effective mass of the 2DS are reasonably large, SWNT and 2DS channels can provide a sufficiently large DOS and an inversion carrier density comparable to that of a Si channel.

The first semiconducting SWNT FET was demonstrated in 1998 with a back-gate structure58. Subsequently, an atomic-layer-deposition-based process was developed to deposit zirconium dioxide (ZrO2) on the pristine surface of a carbon lattice that lacked nucleation centres, allowing the demonstration of a high-k top-gated SWNT FET with a hole mobility reaching 3,000 cm² V⁻¹ s⁻¹ (ref. 59). The mechanism behind this success was later attributed to the wetting of the SWNT surface by several specific metals (scandium, yttrium and zirconium)60. It was also found that palladium (Pd) is suitable for forming p-type ohmic contacts with SWNTs and can deliver a ballistic-transport current of about 25 µA per tube61. Excellent scalability of the SWNT FET was experimentally demonstrated by showing negligible SS degradation when the channel length was scaled from 15 µm to 15 nm (ref. 62). Recently, a top-gated SWNT FET with a 5-nm channel/gate length and graphene contacts was demonstrated63. The measured SS was as low as 73 mV dec⁻¹, indicating negligible SDT leakage, probably because the carrier effective mass of 1-nm SWNTs can be as high as 0.68m0 for certain chiralities64. In 2015, an IBM group65 developed an end-contact approach for SWNTs by forming a molybdenum carbide (Mo2C) alloy in the contact area, which shows a negligible Schottky barrier and negligible contact-length dependence of the contact resistance. With this technology, the IBM group successfully scaled the footprint of SWNT FETs to 40 nm (ref. 66). It is noted that the ON–OFF current ratio of most demonstrated SWNT FETs is relatively low59,60,61,62,66, owing to the small bandgap (about 0.7 eV) of 1-nm SWNTs64. In other words, SWNT FETs should be targeted at HP rather than LP applications.

No matter how good a single SWNT FET is, SWNT FET technology can be competitive only when a high-density, high-purity (semiconducting), well-aligned wafer-scale SWNT array can be achieved with a CMOS-compatible process. Metallic SWNTs can be removed by several approaches, including thermocapillary flows67, electrical breakdown68 and density-gradient ultracentrifugation69. A surface-electrochemistry-assisted self-assembly technique70,71 has been developed for SWNT placement and alignment. Recently, Liu et al.72 used a solution process to simultaneously realize a high density (120 tubes per µm) and an average alignment degree of 9°. These achievements are exciting, but significant efforts are still needed to improve n-type device performance (currently not on par with that of p-type devices), control variability63,66,73 and develop CMOS-compatible processes.
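Why assembly density matters can be seen from a back-of-envelope ceiling on the ON current per unit width: tube density multiplied by the current per tube. The helper below is hypothetical; it assumes every contributing tube carries the same current and neglects inter-tube screening, contact variation and residual metallic tubes, all of which lower real values. It is not the projection method used for the SWNT points in Fig. 5c.

```python
def array_on_current_ua_per_um(tubes_per_um, current_per_tube_ua, active_fraction=1.0):
    """Idealized ON current per unit width (uA/um) of an aligned SWNT array.

    Assumes identical, non-interacting tubes. active_fraction is a hypothetical
    knob that discounts tubes that are misaligned, removed or otherwise not
    contributing to the drain current.
    """
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return tubes_per_um * current_per_tube_ua * active_fraction
```

Combining the 120 tubes per µm of Liu et al. with the ~25 µA per-tube ballistic current quoted earlier, `array_on_current_ua_per_um(120, 25)` returns 3000.0 µA µm−1. This is an idealized upper bound rather than an expected device current, but it shows that the array ceiling scales linearly with density.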

2DS, such as molybdenum disulfide (MoS2) and black phosphorus (BP), emerged during the surge of research on graphene (a 2D semimetal)56,57,74. Their planar structure and relatively large bandgaps offer great advantages over SWNTs in developing CMOS-compatible processes and LP FETs57. Consequently, research on 2D FETs has grown explosively75,76, and 2D FETs are considered more feasible for high-volume production. It is noted that although graphene nanoribbons can be considered a type of 2DS, their potential for FET applications is not on par with that of other 2DS, owing to the difficulty of fabricating large arrays of uniform graphene nanoribbons with the width of ≤1 nm needed to achieve a bandgap >0.5 eV. Interested readers are referred to a recent review77 for more details.

In 2011, the first top-gated monolayer (1L) MoS2 FET was demonstrated, realizing an ON–OFF current ratio of 10⁸ and an SS of 74 mV dec−1 (ref. 78). This success greatly stimulated the device community's interest in 2D FET research. A key limitation was that the measured electron mobility in 1L-MoS2 (refs. 78,79) was as low as 13 cm2 V−1 s−1, far below the predicted phonon-scattering-limited mobility80 of about 410 cm2 V−1 s−1. Such a large mobility loss is attributed to scattering by high-density (mostly charged) defects in MoS2 and by dielectric surface roughness (multilayer 2D channels are less affected)81. In 2013, inspired by the idea82 of employing a high-k dielectric to suppress Coulomb scattering, Liu et al.83 and Fang et al.84 obtained electron and hole mobilities of nearly 200 cm2 V−1 s−1 in 1L tungsten diselenide (WSe2).
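One way to read the gap between 13 and about 410 cm2 V−1 s−1 is Matthiessen's rule, which, under the approximation that scattering mechanisms act independently, adds inverse mobilities. The sketch below (helper names are our own) backs out the mobility that the non-phonon mechanisms alone would allow.

```python
def matthiessen(*mobilities):
    """Combine independent scattering-limited mobilities: 1/mu = sum(1/mu_i)."""
    return 1.0 / sum(1.0 / mu for mu in mobilities)


def residual_mobility(mu_measured, mu_phonon):
    """Mobility of the remaining (non-phonon) mechanisms implied by a measured value."""
    inv = 1.0 / mu_measured - 1.0 / mu_phonon
    if inv <= 0:
        raise ValueError("measured mobility must be below the phonon limit")
    return 1.0 / inv
```

`residual_mobility(13, 410)` is about 13.4 cm2 V−1 s−1, so the extrinsic channels (charged defects, surface roughness) are almost entirely responsible for the measured value. This is why suppressing Coulomb scattering with a high-k environment pays off so strongly.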

2D FETs have been plagued by large contact resistance (Rc)74,78,79,81,83,84,85. A theoretical study identified the van der Waals gap, the lack of effective doping and the Fermi-level pinning effect as the major contributors, and found that contact metals with d orbitals are preferred86. English et al.87 found that an ultrahigh vacuum was essential for achieving low Rc on MoS2. Recently, inspired by the idea of in-plane seamless graphene edge contacts88, Yeh et al.89 demonstrated edge-contacted graphene–2DS–graphene FETs, in which the Schottky barrier at the graphene–2DS contacts can be dynamically modulated by the gate bias, and achieved a low Rc of 0.67 kΩ µm for 1L-WS2.
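Rc values such as those above are commonly extracted with the transfer-length method (TLM): the width-normalized total resistance of devices with different channel lengths L is fitted to R_total·W = 2Rc + Rsh·L, so the intercept gives 2Rc and the slope gives the sheet resistance Rsh. The sketch below is a plain least-squares version; it assumes identical contacts and the same carrier density (gate overdrive) in every device, and the test values are synthetic.

```python
def tlm_fit(lengths_um, r_total_kohm_um):
    """Least-squares fit of R_total*W = 2*Rc + Rsh*L (transfer-length method).

    lengths_um: channel lengths in um; r_total_kohm_um: width-normalized total
    resistance in kOhm um. Returns (Rc in kOhm um, Rsh in kOhm per square).
    """
    n = len(lengths_um)
    if n < 2 or n != len(r_total_kohm_um):
        raise ValueError("need at least two (L, R*W) pairs of equal length")
    mean_l = sum(lengths_um) / n
    mean_r = sum(r_total_kohm_um) / n
    sxx = sum((l - mean_l) ** 2 for l in lengths_um)
    if sxx == 0:
        raise ValueError("channel lengths must not all be equal")
    sxy = sum((l - mean_l) * (r - mean_r) for l, r in zip(lengths_um, r_total_kohm_um))
    slope = sxy / sxx                     # sheet resistance, kOhm per square
    intercept = mean_r - slope * mean_l   # equals 2*Rc at L = 0
    return intercept / 2.0, slope
```

For synthetic data generated with Rc = 0.5 kΩ µm and Rsh = 10 kΩ per square, the fit recovers both values. Because Rc in 2D FETs depends strongly on carrier density, fits at different gate biases give different Rc, so reported values are only comparable at similar densities.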

Converting the semiconducting 2H-MoS2 into metallic 1T-MoS2 in the contact area90 has been found to lower Rc to 0.24 kΩ µm. The semimetal bismuth (Bi) has been found to form an ultralow n-type Rc (0.123 kΩ µm)91 with 1L-MoS2, owing to its low DOS near the conduction band edge of MoS2, which suppresses Fermi-level pinning. Although the values achieved by these two methods are encouraging, the metastability of 1T-phase MoS2 and the low melting point (271 °C) of Bi present great challenges for practical process integration. Intel92 and Taiwan Semiconductor Manufacturing Company (TSMC)93 found that another semimetal, antimony (Sb), which has a much higher melting point (631 °C) than Bi, could deliver an equally low n-type Rc of 0.145 kΩ µm (ref. 92) to 1L-MoS2, albeit with a relatively low Ion. Recently, Li et al.94 reported an Sb–MoS2 Rc of 0.042 kΩ µm, which is close to the quantum limit (~0.0296 kΩ µm at a carrier density of 3 × 10¹³ cm−2). To make such Sb–MoS2 contacts feasible for industrial use, future efforts need to ensure their stability at high temperature (>400 °C) and to develop a VLSI-compatible doping technique to replace the strong back-gate biasing, which induces large parasitic capacitance.
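The quantum limit quoted above follows from the number of conducting modes per unit width in a degenerate 2D channel. Assuming the commonly used single-valley, spin-degenerate expression Rc,min = (h/2e²)/√(2n/π), the sketch below gives about 0.0295 kΩ µm at n = 3 × 10¹³ cm−2, consistent with the ~0.0296 kΩ µm figure to within rounding; including valley degeneracy would change the prefactor.

```python
import math

H = 6.62607015e-34       # Planck constant, J s
Q = 1.602176634e-19      # elementary charge, C


def rc_quantum_limit_kohm_um(n_cm2):
    """Quantum-limited contact resistance (kOhm um) for a sheet density n (cm^-2).

    Uses Rc,min = (h / 2e^2) / sqrt(2n / pi): the inverse of the number of
    spin-degenerate modes per unit width in a single-valley 2D channel.
    """
    n_m2 = n_cm2 * 1e4                          # cm^-2 -> m^-2
    modes_per_m = math.sqrt(2 * n_m2 / math.pi)  # conducting modes per metre of width
    r_ohm_m = (H / (2 * Q**2)) / modes_per_m     # Ohm m
    return r_ohm_m * 1e6 / 1e3                   # Ohm m -> Ohm um -> kOhm um
```

Dividing a measured Rc by this value gives a quick figure of merit; for the 0.042 kΩ µm Sb–MoS2 result the ratio is about 1.4. Because the limit falls as n^(−1/2), comparisons should be made at the same carrier density.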

The ambipolar conduction observed in most 2D FETs arises from the Schottky barrier nature of their source/drain contacts. Although such ambipolarity provides an opportunity for designing certain devices, such as reconfigurable FETs for low-cost electronics95, it introduces large leakage currents in CMOS logic gates, which degrades energy efficiency and/or causes logic failures. To suppress the ambipolar current and enable energy-efficient 2D complementary (n-type and p-type) FETs, developing transparent (ohmic) contacts via effective doping is essential. Fang et al.84,96 introduced surface adsorbates, such as nitrogen dioxide and potassium, that transfer charge to 2DS, and achieved degenerate doping levels. However, because the surface of 2DS is pristine, surface adsorbates are usually unstable. In contrast, intercalation doping97, in which dopants are inserted between the 2D layers, is more reliable but is limited by the relatively long dopant diffusion time. Therefore, new, non-traditional approaches to doping 2DS need to be developed.

There have been a few attempts to fabricate ultrashort-channel (10-nm-scale gate length (Lg)) 2D FETs98,99,100,101,102. Cao et al.101 employed an ultrathin metallic NW as both the top gate and a self-aligned mask, instead of relying on electron beam lithography, to define the channel length, thereby demonstrating a top-gated 10-nm-scale 1L-MoS2 FET. Desai et al.102 fabricated a similar device using an SWNT instead of an NW as the back gate. This approach has two problems. One is the loss of the self-alignment function of the gate for source/drain formation, which makes the source-to-drain distance unacceptably large (about 1 μm), and the CGP (Fig. 3a) still relies on electron beam lithography. The other is the excess gate voltage consumed in the low-DOS SWNT gate (Fig. 4d) to reach the required drive current level, which fundamentally limits voltage scalability and energy efficiency. These two problems could be even more severe in the recently reported graphene edge-gated FET103.

The pristine surface of 2DS makes the direct growth of a high-quality, thin gate dielectric on it challenging. An effective seed layer78 has to be identified to wet the surface of the 2DS. Ideally, 2D layered high-k dielectrics would be developed. Emerging perovskites have been found to provide high capacitance in a capacitor structure104. However, it remains unclear whether thin perovskites in the FET environment can provide the same capacitance without introducing hysteresis (a memory effect of perovskites) into the current–voltage curves. Recently, Chamlagain et al.105 reported that 2D insulating tantalum pentoxide (Ta2O5), thermally oxidized from tantalum sulfide (TaS2), has a k of about 15.5 and could serve as an effective gate dielectric on MoS2, a positive step towards a 2D high-k dielectric.
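The benefit of k ≈ 15.5 can be quantified through the equivalent oxide thickness (EOT), the SiO2 thickness that gives the same areal capacitance. The sketch below uses the parallel-plate approximation and ignores quantum capacitance and interfacial layers; the 5-nm film thickness in the usage note is a hypothetical value, not one reported in ref. 105.

```python
K_SIO2 = 3.9                 # relative permittivity of SiO2
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m


def eot_nm(physical_thickness_nm, k):
    """Equivalent oxide thickness: SiO2 thickness giving the same areal capacitance."""
    return physical_thickness_nm * K_SIO2 / k


def areal_capacitance_uf_cm2(physical_thickness_nm, k):
    """Parallel-plate capacitance per unit area, in uF/cm^2."""
    c_f_m2 = k * EPS0 / (physical_thickness_nm * 1e-9)
    return c_f_m2 * 1e2  # 1 F/m^2 = 1e6 uF / 1e4 cm^2 = 100 uF/cm^2
```

A hypothetical 5-nm Ta2O5 film would have `eot_nm(5, 15.5)` ≈ 1.26 nm, that is, about 2.7 µF cm−2. The same capacitance from SiO2 would require a film roughly four times thinner, which is the practical argument for a 2D high-k dielectric.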

On the 2DS synthesis front, Kang et al.106 successfully demonstrated a wafer-scale full-coverage synthesis of 1L-MoS2 by using metal–organic chemical vapour deposition. Yeh et al.89 reported an area-selective growth scheme, in which the 2DS is grown at predefined channel regions (smaller than 2DS grain size) in a graphene canvas. This scheme strategically avoids the necessity of wafer-scale, single-crystal 2DS synthesis, and thus provides an alternative route for 2DS towards industrialization.

Emerging technology benchmarking with CMOS

This section provides a comprehensive benchmarking of the promising transistor technologies discussed above in terms of contact resistance, mobility, ON-current level and the upper limit of short-channel performance. Although it is unfair to compare emerging technologies (such as 2DS) with nearly mature (III–V/Ge, and NW and NS) or even commercialized (Si FinFET) technologies, such a comparison provides practical guidelines on how much effort is needed for these technologies to approach or exceed state-of-the-art performance. Parasitic capacitance data are not available for emerging transistors (SWNT and 2D FETs), so parasitic capacitance is not benchmarked here.

Figure 5a shows the Rc data versus channel thickness for various FETs based on different materials (InGaAs28, Si (in FinFET)44, SWNT65 and 2D81,83,84,85,89,91,92,94,107,108,109,110,111,112,113,114), as well as the IRDS requirements. For SWNTs, the Mo-based end-contact approach has reduced the single-tube Rc to 15 kΩ. The mature Si/Ge and III–V devices unsurprisingly have the lowest Rc, and can comfortably meet the IRDS requirements. Recent progress on 2DS contacts has closed the n-type Rc gap of 2D FETs with respect to those of other competing materials (Fig. 5a). However, simultaneously achieving low Rc for both n-type and p-type 2DS FETs remains challenging.

Fig. 5: Comparative analysis of state-of-the-art and emerging MOSFETs.

a, Contact resistance data collected from the literature (in the FET ON-state region, that is, at a carrier density of about 10¹³ cm−2): IRDS15, InGaAs28, Si FinFET44, SWNTs65 and 2D81,83,84,85,89,91,92,94,107,108,109,110,111,112,113,114. The dashed curve indicates the general trend of increasing contact resistance with decreasing channel thickness. BV is benzyl viologen. b, Mobility data collected from the literature: SWNTs59, graphene (Gr)74,115, MoS2 (refs. 79,116,117,118), WS2 (refs. 106,119,120,121), SnS2 (refs. 119,122), ReS2 (ref. 123), MoSe2 (refs. 119,124), WSe2 (refs. 83,84,119,125,126), SnSe2 (refs. 119,127), ReSe2 (ref. 128), InSe (ref. 129), BP130,131,132, GaSe (ref. 133) and tellurene107; ML represents multilayer. All data except those labelled ‘from models’ are experimental results. c, Highest obtainable ON current collected from various materials and devices15,27,45,66,83,94,99,101,134,135,136,137,138. The dimensions in the legend represent Lg. The IRDS 2022 data correspond to an HP ON current (for Vd = 0.7 V, Lg = 16 nm and Ioff = 10 nA µm−1). For the SWNT Ion data, only the bottom-most data point is experimental; the remaining points are projections based on improving SWNT assembly density. d, Projected Ion versus effective mass along (mx) and perpendicular to (mz) the transport direction, in units of the free electron mass (m0), for HP (left) and LP (right) at Lg = 5.9 nm. Si, Ge, III–V, SWNT and some typical 2DS107,140,141,142 are used for benchmarking, purely from an effective mass perspective. Low-bandgap materials (particularly III–V and SWNT) suffer from high leakage current, which degrades SS and the attainable ON current for a fixed Ioff and Vdd. The dashed blue line represents the isotropic case, that is, mx = mz. For Lg longer (shorter) than 5.9 nm, the optimal regions shift to the left (right) accordingly.


Figure 5b benchmarks mobility versus bandgap for SWNTs59, graphene74,115, MoS2 (refs. 79,116,117,118), WS2 (refs. 106,119,120,121), SnS2 (refs. 119,122), ReS2 (ref. 123), MoSe2 (refs. 119,124), WSe2 (refs. 83,84,119,125,126), SnSe2 (refs. 119,127), ReSe2 (ref. 128), InSe (ref. 129), BP130,131,132, GaSe (ref. 133) and tellurene107. A general trend is that large-bandgap materials have low mobilities. Among all semiconductors, SWNTs and III–V materials exhibit the highest mobilities, owing to their relatively small bandgaps. Mobility in Si NWs decreases rapidly with reduced thickness. Among the experimentally measured 2DS, only WS2, WSe2, tellurene and BP offer high mobilities. Large gaps still exist between measured and theoretically predicted mobilities for 2DS, indicating plenty of room for further improvement in material and device quality.

Contact resistance and mobility are two widely employed metrics for quantifying device performance. However, they are not infrequently measured or calculated incorrectly, especially when the device structure and/or material is prepared in a non-traditional manner78,79. The most meaningful metric of device performance is the highest obtainable current level, collected from various devices15,27,45,66,83,94,99,101,134,135,136,137,138 and plotted in Fig. 5c versus Vd/Lg, which is essentially the average lateral electric field along the channel. The ideal position is the top-left corner, that is, the highest possible current at the lowest lateral electric field. Compared with SWNTs and other, even more mature materials, most 2DS devices, except the Sb-contacted MoS2 FET, show lower ON-current levels, especially those based on chemical vapour deposition (CVD) samples, because of their high contact resistance and defect-rich materials or imperfect device fabrication.

Rigorous quantum transport simulations139 have been performed to explore the full potential of the broad 2DS family107,119,140,141,142 for sub-10-nm FETs. Performance maps (Fig. 5d; see ref. 139 for colour maps) for HP and LP at Lg = 5.9 nm were generated using the effective masses (mx, along the transport direction; mz, along the device width direction) as variables. Note that mx determines the carrier velocity and SDT, whereas mz serves as a DOS modulator. Materials with an effective mass of around 0.3m0, such as WSe2, tellurene and WS2, are all desirable for LP. Compared with LP, HP allows a much higher Ioff (Fig. 2d) and is hence less sensitive to SDT. Therefore, anisotropic materials with a small mx but a large mz, such as SnSe2, BP and ReS2, are preferred for HP. Although these two maps were made for 2D FETs, they also serve as a useful reference for other material systems; therefore, Si, Ge, III–V and SWNTs are also placed at the relevant positions in Fig. 5d. A unique advantage of 2DS is that they cover almost the entire map (see ref. 139 for colour maps), that is, 2DS can serve a wide range of applications at different technology nodes.
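The roles of mx and mz can be made concrete with three textbook scalings: the non-degenerate thermal injection velocity falls as mx^(−1/2), WKB tunnelling through a barrier falls exponentially with √mx, and the 2D DOS mass of an anisotropic band is √(mx·mz). The sketch below uses a crude rectangular barrier whose height and length are hypothetical inputs; it illustrates the trade-off only and is not the quantum transport model of ref. 139.

```python
import math

M0 = 9.1093837015e-31   # free-electron mass, kg
HBAR = 1.054571817e-34  # reduced Planck constant, J s
KB = 1.380649e-23       # Boltzmann constant, J/K
Q = 1.602176634e-19     # elementary charge, C


def injection_velocity_cm_s(mx, temp=300.0):
    """Non-degenerate thermal injection velocity sqrt(2kT / (pi * mx * m0)), in cm/s."""
    return math.sqrt(2 * KB * temp / (math.pi * mx * M0)) * 1e2


def wkb_transmission(mx, barrier_ev, length_nm):
    """WKB tunnelling probability through a rectangular barrier (crude SDT proxy)."""
    kappa = math.sqrt(2 * mx * M0 * barrier_ev * Q) / HBAR  # decay constant, 1/m
    return math.exp(-2 * kappa * length_nm * 1e-9)


def dos_mass(mx, mz):
    """2D density-of-states effective mass of an anisotropic parabolic band."""
    return math.sqrt(mx * mz)
```

For example, a material with mx = 0.1m0 and mz = 0.9m0 has the same DOS mass (0.3m0) as an isotropic 0.3m0 material, but about 1.7 times its injection velocity and much stronger SDT. That combination favours HP, where Ioff is relaxed, over LP.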

On the basis of the review, analysis and benchmarking above, Table 1 summarizes the prospects (quantified by the number of diamonds) of various emerging and future MOSFET technologies in terms of several critical metrics: electrostatics, SDT, manufacturability, intrinsic ON current (Ion,channel, which excludes the effect of contacts), Ioff, parasitic resistance (parasitic R), parasitic capacitance (parasitic C), circuit design flexibility in choosing device width (W flexibility), self-heating effect, reliability, variability, and the performance balance between n-type and p-type devices (CMOS design). As the table clearly shows, low-dimensional SWNTs and 2DS offer a remarkable scalability advantage over bulk semiconductors. At this stage, the semiconductor industry seems more interested in 2DS76,92,143, probably because 2DS are much more manufacturable than SWNTs.

Table 1 Benchmarking emerging/future MOSFETs

The future ‘trans-resistors’ beyond MOSFETs

MOSFETs are fundamentally limited by their operating mechanism; as a result, power consumption and energy efficiency fail to scale at the same pace as device size. The CMOS community has therefore been searching for ‘beyond MOSFET’ transistors to break this energy-efficiency bottleneck. In this context, it is instructive to revisit the two fundamental aspects of a transistor: ‘trans’ (from ‘transfer’, or modulation) and ‘resistor’ (from the resistance of a channel). ‘Trans’ captures how the information state or carrier is manipulated, whereas the ‘resistor’ captures how that carrier is transported through the channel. For commercialized MOSFETs and BJTs, ‘trans’ is realized through the electric-field effect of a static gate capacitor and through p–n junction barrier modulation, respectively, whereas the ‘resistor’ is implemented as thermionic emission over a barrier in both devices. There have been many efforts to introduce innovations into one (or both) of these two aspects.
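For a MOSFET the two aspects factor neatly in the subthreshold swing, SS = m·(kT/q)·ln 10: the body factor m = dVg/dψs (assumed here to be ≥1 for an ordinary positive-capacitance gate stack) describes the ‘trans’, and the thermionic term (kT/q)·ln 10 describes the ‘resistor’. The sketch below simply evaluates this standard expression.

```python
import math

KB = 1.380649e-23     # Boltzmann constant, J/K
Q = 1.602176634e-19   # elementary charge, C


def subthreshold_swing_mv_dec(temp=300.0, body_factor=1.0):
    """SS = m * (kT/q) * ln(10), in mV/decade, for a thermionic-emission 'resistor'.

    body_factor m = dVg/dpsi_s is the 'trans' term; m >= 1 for a conventional gate.
    """
    return body_factor * (KB * temp / Q) * math.log(10) * 1e3
```

At 300 K with m = 1, `subthreshold_swing_mv_dec()` gives about 59.5 mV dec−1, the familiar ‘60 mV dec−1’ limit. Beyond-MOSFET concepts attack it either by lowering m (‘trans’) or by replacing thermionic emission (‘resistor’).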

Various ‘trans’ approaches, including negative-capacitance (NC)144, suspended-gate (SG)145 and Mott-phase-change-material-gated (Mott-G)146 FETs (Fig. 6), have been proposed and experimentally explored. NC FETs use the negative-capacitance state of ferroelectric materials to construct a negative Cgox, aiming to overcome the unity upper limit of gate efficiency. SG and Mott-G FETs introduce a nanoelectromechanical (NEM) switch and an insulator-to-metal phase-change memory cell, respectively, into the gate stack, and use their non-equilibrium switching transients to realize an abrupt (with respect to the gate voltage) increase in Cgox, which translates into an abrupt increase in charge density and drain current, that is, an ultrasmall SS. It is worth mentioning that the memory nature of the NEM switch and the phase-change memory cell inevitably results in a memory effect, that is, hysteresis, in the transfer characteristics, which limits SG and Mott-G FETs to memory applications.

Fig. 6: Transistors go beyond MOSFETs.

Revisiting the two aspects of a transistor, ‘trans’ and ‘resistor’, to explore the design space of future FETs. An NC FET is a negative-capacitance FET144; an SG FET is a suspended-gate FET145; a Mott-G FET is a phase-change-material-gated FET146; an SB FET is a Schottky barrier FET; a vac. FET is a vacuum-channel FET147; a Mott FET employs a phase-change material as the channel or connected in series with the source148; a Supl. FET is a superlattice FET149; a DS FET is a Dirac source FET150; a TFET is an interband or BTBT FET151,152; a BE-TFET is a bipolar-enhanced TFET153; an NC-TFET is an NC-gated TFET154; an i-MOS is an impact-ionization FET155; a Mott-i-MOS is a Mott-gated i-MOS FET; an FB FET is a charge feedback FET156; a BiS FET is a bilayer pseudospin FET159; an NEM relay is a nanoelectromechanical relay-type FET160. Dark grey indicates device concepts (Supl., BiS and Spin157,158 FETs, BE-TFETs, etc.) that have not been experimentally proven. Among all the experimentally demonstrated (light blue) beyond-MOSFET transistors, TFETs and NC FETs are the most promising low-SS devices and have been studied most extensively. A more detailed version of this figure (Supplementary Fig. 1), with band diagrams and charge carrier illustrations, is provided in Supplementary Section 3 for interested readers. Dark blue is used for the MOSFET to highlight that it is the benchmark device. Light grey indicates ‘dark space’, that is, unexplored device concepts. The listed ‘trans’ and ‘resistor’ options are not exhaustive; ‘…’ is used along the two axes to indicate opportunities for further innovation.

In contrast, the ‘resistor’ has been implemented in many forms and mechanisms beyond thermionic emission. In Schottky barrier and vacuum147 FETs, charge carriers tunnel through the Schottky barrier between the metallic source and the semiconducting or vacuum channel, respectively. Mott FETs148 employ a phase-change memory cell connected in series with the source to achieve an abrupt change in the resistance of the ‘resistor’, and hence in the drain current, during memory-state switching, which also introduces hysteresis into the IV curve. Superlattice FETs149 employ a multi-quantum well in the source region to form an artificial resonant-tunnelling band, which is narrow enough to filter out the thermionic emission of high-energy (with respect to the Fermi level) carriers. Dirac source FETs150 use the decreasing DOS towards the Dirac point of a graphene source to reduce high-energy carrier injection. The minimum achievable SS of this device remains larger than 30 mV dec−1, simply because graphene is gapless: the DOS near the Dirac point, although low, still allows high-energy carrier leakage. Tunnelling FETs (TFETs)151,152 use the bandgap of the source to filter out the thermionic emission of high-energy carriers. However, the low band-to-band tunnelling probability limits their ON current. Bipolar-enhanced TFETs (BE-TFETs)153 have been proposed to alleviate this issue by amplifying the drain current with the large current gain of a BJT; an experimental demonstration is still needed to prove this device concept. A previous study154 applied a ferroelectric gate (expected to function as an NC) to a TFET and found that the TFET performance was enhanced. Whether this enhancement is due to the NC effect or simply to the high k of the ferroelectric material needs to be examined.
Essentially, superlattice FETs, Dirac source FETs and TFETs have a similar form of ‘resistor’, as they are all based on DOS engineering in the source to achieve carrier injection localized near the Fermi level. Impact-ionization MOSFETs (i-MOS)155 exploit the superexponential generation (versus gate voltage) of electron–hole pairs during impact ionization to realize a superexponential increase in drain current. However, a large Vd is required to activate the impact-ionization process, so the i-MOS cannot help reduce the supply voltage. In feedback FETs156, electron and hole potential barriers are intentionally introduced near the n-type source and p-type drain sides, respectively, to form a positive feedback loop between electron–hole redistribution and barrier modulation, which dynamically accelerates the electron–hole injection rate. This device also requires a large Vd to activate the feedback loop.

Compared with the abovementioned charge-carrier-based FETs, spin FETs157,158 employ spin as the information carrier and implement the ‘resistor’ in the form of magnetoresistance. Spin FETs here refer to the Rashba type158, in which the spin direction is manipulated by a small gate electric field during transport from the source to the drain. Unfortunately, controllable manipulation of the spin direction is experimentally difficult, which has thus far prevented an experimental proof of spin FETs. In bilayer pseudospin (BiS) FETs159, it was predicted that the condensation of electron–hole pairs (bosons) in the insulator separating the electron–hole bilayer could markedly reduce the interlayer resistance, thereby realizing an abrupt increase in conductance. However, experimental demonstrations are required to validate the BiS FET concept. NEM relays160, in contrast to SG FETs, employ a NEM switch at the channel/drain junction to physically connect and separate the channel and drain. Such mechanical switches suffer heavily from reliability issues arising from stiction, particularly in ultrascaled devices161. As reflected in Fig. 6, a large dark space remains for LP device designers to explore. To implement these futuristic transistors, judicious selection of material platforms could play a critical role. For example, the pristine surface of 2D materials can be exploited to develop ultrasteep-slope TFETs162, which are difficult to realize with traditional bulk materials.

Among all the experimentally demonstrated novel transistors, TFETs151 and NC FETs144 have been studied most extensively (see Supplementary Section 4). The TFET structure with BTBT was introduced in 1978151. In 2004, sub-60 mV dec−1 SS was observed in the BTBT leakage current branch of an SWNT FET structure, demonstrating that sub-60 SS is experimentally achievable152. TFETs are essentially gated p–i–n diodes, as schematically illustrated in the top half of Supplementary Fig. 2a. Ultralow-SS TFETs have been achieved in many different material systems and device structures, as shown by the experimental data163,164,165,166,167,168 collected in Supplementary Fig. 2b. At present, the main challenge for TFETs is the low BTBT probability, which limits Ion. Innovative structural designs, such as a 2D heterojunction162 or a gated Esaki diode structure169, together with experimental demonstrations, are required to overcome this issue.
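The Ion problem can be seen from the textbook WKB estimate for tunnelling through a triangular barrier, T ≈ exp(−4√(2m*)·Eg^(3/2)/(3qħF)): the probability falls exponentially with Eg^(3/2) and √m*, and rises with the junction field F. The sketch below assumes a direct, phonon-free transition with a single effective mass and a uniform field, so it indicates trends rather than device currents.

```python
import math

M0 = 9.1093837015e-31   # free-electron mass, kg
HBAR = 1.054571817e-34  # reduced Planck constant, J s
Q = 1.602176634e-19     # elementary charge, C


def btbt_probability(m_rel, eg_ev, field_mv_cm):
    """WKB band-to-band tunnelling probability through a triangular barrier.

    T ~ exp(-4 * sqrt(2 m*) * Eg^(3/2) / (3 q hbar F)); ignores phonon assistance,
    band non-parabolicity and the finite energy window of real junctions.
    """
    m = m_rel * M0                 # effective mass, kg
    eg = eg_ev * Q                 # bandgap, J
    field = field_mv_cm * 1e8      # MV/cm -> V/m
    exponent = 4 * math.sqrt(2 * m) * eg**1.5 / (3 * Q * HBAR * field)
    return math.exp(-exponent)
```

Lowering Eg or m* raises T sharply, but a smaller gap also raises OFF-state leakage, which is the central TFET design trade-off that heterojunction and Esaki-diode structures aim to break.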

After the concept of NC FETs was proposed144 in 2007–2008, it quickly rose to prominence, owing to its fabrication-friendly structure (bottom half of Supplementary Fig. 2a), which is basically a MOSFET with an additional NC layer (typically a ferroelectric (FE) material) inserted between the oxide and the gate. In contrast to FE memory applications170,171, in which the bistable states of the FE are used to store binary information, NC FETs are believed to exploit the metastable state of the FE layer, which provides a negative polarization response to an external electric field (that is, NC), to improve on the unity gate efficiency of MOSFETs (that is, to reach a value <1 in equation (1)). Thus far, there have been many claimed ‘NC FETs’ showing low SS with relatively small hysteresis in measured d.c. IV curves172,173,174,175,176,177, as shown in Supplementary Fig. 2c. However, according to the fundamental device physics of NC FETs, the design space for hysteresis-free sub-60 SS has recently been found178 to be very small because of the generally large quantum capacitance and the excellent electrostatics of modern FETs. In other words, these measured small SS values cannot be explained by the static NC interpretation144. In fact, a growing number of studies179,180,181 indicate that they can be attributed to transient effects during the measurement and/or FE polarization dynamics. Recently, a study182 found an ultrahigh trap density (of the order of 10¹⁴ cm−2) between the FE layer and the gate oxide layer. Thus, uncontrollable trapping and de-trapping dynamics may offer an alternative explanation for the steep slopes in the reported ‘NC FETs’. Moreover, a high trap density in an electron device is generally accompanied by reliability issues. Considerable research effort is therefore needed to address these issues and thoroughly understand the interplay between the FE and the oxide before NC is applied in CMOS products.
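The narrow design space can be illustrated with the common small-signal series-capacitor picture of an NC FET, assumed here and evaluated at a single bias point. With per-area ferroelectric, oxide and semiconductor capacitances Cfe (< 0), Cox and Cs, the body factor is m = 1 + Cs/Cox + Cs/Cfe. Reaching m < 1 requires |Cfe| < Cox, whereas a single-valued, hysteresis-free Vg(Q) requires 1/Cfe + 1/Cox + 1/Cs > 0, that is, |Cfe| > Cmos = CoxCs/(Cox + Cs). The helper names below are our own.

```python
def body_factor(c_s, c_ox, c_fe=None):
    """Small-signal m = dVg/dpsi_s = 1 + Cs/Cox + Cs/Cfe (all per unit area, same units).

    c_fe=None models a plain MOSFET without a ferroelectric layer.
    """
    m = 1.0 + c_s / c_ox
    if c_fe is not None:
        m += c_s / c_fe
    return m


def is_hysteresis_free(c_s, c_ox, c_fe):
    """Stability at this bias point: total series elastance 1/Cfe + 1/Cox + 1/Cs > 0."""
    return 1.0 / c_fe + 1.0 / c_ox + 1.0 / c_s > 0


def nc_window(c_s, c_ox):
    """Bounds (lower, upper) on |Cfe| for m < 1 without hysteresis at this bias point."""
    c_mos = c_ox * c_s / (c_ox + c_s)
    return c_mos, c_ox
```

With Cox = 1 (arbitrary units), Cs = 0.1 leaves a wide window (|Cfe| between about 0.09 and 1), whereas Cs = 10 shrinks it to about 0.91–1, so a large semiconductor (including quantum) capacitance squeezes the window. Because Cs changes by orders of magnitude with bias, a real device would have to satisfy the condition over the whole gate sweep.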

Beyond-Moore integration pathways

In the history of CMOS scaling, most efforts have been invested in shrinking the MOSFET feature size and optimizing the utilization of the in-plane area of the logic chip, including ‘2.5D’ chiplets183 (a packaging technique), following Moore’s law. It is conceivable that, someday, MOSFET size shrinking and the in-plane increase in device integration density will stop owing to fabrication difficulty and cost, as well as power-density constraints. In this regard, the vertical space of the logic chip has not been well exploited. Although FinFETs and NW and NS FETs can be considered one type of such effort, they only improve device current and electrostatics and do not help increase device integration density beyond a certain point. Three-dimensional integration, which stacks either devices (usually in a complementary manner)184 or dies in the vertical space, is considered a practical approach to increasing device density. Besides density scaling, 3D integrated circuits can significantly reduce interconnect delay and power dissipation185. Three-dimensional integration can be realized via different pathways, including wire-bonding- and/or flip-chip-based 3D packaging, through-Si-via (TSV)-based 3D die/wafer stacking, and monolithic 3D integration (m-3D). Among the three, m-3D is the most desirable approach, owing to its high layer density and the design possibilities enabled by high local connectivity. Although m-3D faces thermal budget and heat dissipation issues, progress has been made in overcoming these challenges, for example by employing solid-phase epitaxial regrowth to activate dopants below 600 °C (ref. 186), cooling 3D integrated circuits with power delivery networks187 and employing a high-thermal-conductivity hexagonal boron nitride interlayer dielectric188. Three-dimensional integration is an inclusive technology.
Eventually, it can evolve into heterogeneous 3D integration169,185,188,189,190,191, in which very dissimilar systems of different materials (such as 2DS and Si188), devices and functionalities are integrated along both vertical and lateral directions, thereby constructing ultrapowerful and energy-efficient systems-on-chip and heterogeneous systems-of-chips or chiplets (Fig. 7). Nonetheless, device and circuit architectures that not only enable computing paradigms such as ‘in-memory’ computing but are also specifically designed to provide higher lateral and vertical stacking densities simultaneously, such as the recently demonstrated 0.5T0.5R hybrid resistive random-access memory192, should remain a central priority for revolutionary advances in 3D integrated-circuit design.

Fig. 7: A transistor is the powerful engine that is driving all aspects of the information technology industry.

Transistors have been igniting and driving forward the research and development of information technologies in terms of integration schemes (including heterogeneous 3D stacking by means of either TSV or m-3D integration; 2.5D (or 3D198) chiplets that allow ultraclose packaging of dies; and device-level 3D stacking, with an n-type device on top of a p-type device or vice versa), as well as computing schemes, device physics and materials. The centre of the figure shows various transistor topologies and materials that have already been adopted by the industry. The schematic on the right of the ‘Computing schemes’ box shows a spin qubit constructed directly on a FET with a split gate (bottom), which allows a quantum computer to benefit from CMOS scaling. The conceivable interactions open up a multitude of exciting possibilities for studying novel physics and designing novel structures and functionalities. For example, TMDs from ‘Materials’ can be employed to build NS FETs for sub-1-nm CMOS technologies; graphene and TMDs can be useful for studying ‘Device physics’ such as spintronics and valleytronics, or qubits can be designed with TMDs; BTBT devices and circuits can be used for neuromorphic computing; and 1D and 2D materials can be used for monolithic heterogeneous 3D integration and 3D chiplet architectures with advanced packaging and connectivity, power delivery, thermal management and so on. HK, high-k; hBN, hexagonal boron nitride. V1–10 and f1–3 are the input and output signals of a neural network, respectively. |φ⟩ refers to the superposed qubit state, consisting of the qubit basis states |0⟩ and |1⟩ with probability amplitudes α and β, respectively. MG, TMD and CNT represent metal gate, transition-metal dichalcogenide and carbon nanotube, respectively.

Beyond von Neumann computing

Information technology has never stopped evolving. Besides classical computers designed for general-purpose high-performance computation, neuromorphic computing193 and quantum computing194 are being actively developed and are expected to usher in unprecedented advantages in some domains, such as chemical reaction simulations, artificial intelligence and machine learning, which are too computationally intensive or power-hungry for a conventional von Neumann computer. A common requirement of all of them is a transistor of ultrahigh compactness and ultralow leakage, as well as high current drivability, robustness and energy efficiency, serving as a core, supporting or interface element193,194,195,196,197, as indicated in Fig. 7. Given these requirements, it is reasonable to argue that transistor scaling and evolution will never stall, and that Moore’s law will stay with us for the foreseeable future.