1 Introduction

An Internet of Things (IoT) network is a crucial component of different revolutionary concepts such as Industry 4.0 [1] and smart homes/smart cities [2]. The IoT devices within the networks gather vast amounts of data for dedicated processors/AI models, which boost the precision of analyses. An essential criterion for the IoT device is low power consumption. Ultra-low-power (ULP) radio, intermittently turned on for a short amount of time for data transmission to reduce the average power of the IoT device, is popular for the IoT device as it reduces the power consumption of power-hungry blocks such as the transceiver (TRX) and extends the lifetime of the device [3]. The system will place the device into sleep mode for a specific period, with only critical blocks such as memory and wakeup timers powered on for timing purposes.

On the other hand, there is a trend to power the IoT device with energy harvesters to realize perpetual operation. As the battery has a finite lifetime, the IoT device may miss critical data if it runs out of battery. Also, replacing batteries will be a tremendous task considering that there will be trillions of IoT devices. Further, the battery may pose environmental issues and create safety risks if not handled properly. By replacing the batteries with energy harvesters (EH), the lifetime of the device increases, and we can obviate the substantial labor of replacing the batteries. EH, such as solar cells (typical available indoor power: 10–100 μW/cm²) and thermoelectric generators (typical available power: 10–1000 μW/cm²), are promising in this respect [4,5,6]. Yet, they usually output voltages of only ~0.3–0.4 V, which vary with environmental factors (temperature and light intensity) [4]. We can use a boost converter to stabilize and step up the voltage to the standard I/O voltage, but this increases the footprint (cost) and power consumption of the IoT device. These constraints open a prospective research direction for ultra-low-voltage (ULV) circuits that are powered directly by these energy harvesters and avert the penalties of the interim converters.

Clock references are indispensable parts of the TRX. Wide-ranging purposes such as the low-power wakeup timer, the phase-locked loop, the data converters, etc. require different clock references. Hence, this chapter elaborates on the design and measurement results of two ultra-low-voltage clock references in deep-submicron silicon processes. Section 2 introduces the regulation-free sub-0.5 V 16/24 MHz crystal oscillator for energy-harvesting Bluetooth Low Energy (BLE) radios implemented in 65 nm CMOS [7], whereas Sect. 3 demonstrates a fully integrated 0.35-V 2.1 MHz temperature-resilient relaxation oscillator using an asymmetric swing-boosted RC network implemented in 28 nm CMOS [8].

2 Regulation-Free Sub-0.5 V 16/24 MHz Crystal Oscillator for Energy-Harvesting BLE

2.1 Motivation

The crystal oscillator (XO) is an essential circuit module for modern TRXs. It provides a stable clock reference for different parts such as data converters, phase-locked loops, sensors, etc. Despite its excellent frequency stability, the XO can take a few milliseconds to settle into the steady state [9,10,11] without any fast startup technique [12], due to the high quality factor of the crystal (~10⁵). This startup time (ts) dominates the “on” latency of the radio, and its startup energy (ES) may significantly degrade the effectiveness of duty-cycling of an ultra-low-power radio. If the active energy (ETRX) of a TRX is 1280 nJ (on-time of 128 μs [13] and active power of 10 mW [14]), the percentage of energy spent for starting the XO in every working cycle is ~42% for ES of 1000 nJ for a conventional XO and a duty cycle of 0.1%. Such a percentage will go up further as recent circuit techniques manage to suppress the active power of the TRX (PTRX) [15,16,17]. Hence, reducing ES for ULP radios is of paramount importance to reduce their average power consumption. Recent efforts in both academia and industry succeeded in shortening the ts and ES of the XO [13, 14, 18,19,20,21,22,23].

This section reports a regulation-free sub-0.5 V XO according to the system aspect of the EH BLE radios described in [24,25,26,27]. Unlike the existing fast startup XOs based on standard or I/O voltages to power up their inverter-like or active-load amplifiers [13, 18,19,20,21], the proposed XO is ULV-enabled by using single-/multi-stage resistive-load amplifiers [28]. This architecture circumvents the ineluctable voltage headroom limit, rendering it compatible with the ULV application. Specifically, we propose a dual-mode gm scheme and a Scalable Self-reference Chirp Injection (SSCI) technique for the XO to surmount the operating challenges in both startup and steady state (Fig. 1). The reported XO includes load capacitors of 6 pF and suits common commercially available crystals. Yet, we can also apply the technique to crystals with different load capacitances.

Fig. 1

Overview of the proposed XO and illustration of tS improvement by two techniques: SSCI and inductive three-stage gm. The LM, CM, and RM are the modeled inductance, capacitance, and resistance of the crystal, respectively, whereas CS is the crystal’s stray capacitance

2.2 Fast Startup XO Using Dual-Mode gm Scheme and SSCI

For a crystal’s resonant frequency (fm) at tens of MHz, its ts (milliseconds) dominates the “on” latency of a duty-cycled radio, raising the average power consumption. In addition, for energy-limited EH sources, the ES of the XO is crucial as it may demand a large instant current from the EH source or reservoir. Recent XOs [13, 18,19,20,21,22] succeeded in reducing both ts and ES. Herein, we propose two techniques, the dual-mode gm and the SSCI, for balancing the XO performances in both startup (i.e., ts and ES) and steady state [i.e., power consumption and phase noise (PN)]. The envelope of the XO during startup at the time t is

$$ {A}_{\textrm{env}}(t)={A}_i\bullet {e}^{\frac{R_{\textrm{N}}-{R}_{\textrm{M}}}{2{L}_{\textrm{M}}}t}, $$
(1)

where Ai is the initial amplitude and RN is the negative resistance of the overall impedance viewed from the crystal core. The LM and RM are the motional inductance and resistance of the crystal, respectively. The aim of the SSCI is to increase Ai instantly after enabling the XO, while the dual-mode gm allows a boosted RN afterward. They together bring down tS without momentarily raising the startup power, culminating in a lower ES and a relaxed power-source design.
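To make the roles of Ai and RN in Eq. (1) concrete, the following minimal Python sketch solves Eq. (1) for the time needed to reach a target amplitude. The crystal values LM and RM are those measured in Sect. 2.4; the target amplitude, the initial amplitudes, and the function names are hypothetical and chosen only for illustration. The sketch keeps RN constant and therefore ignores the large-signal degradation of RN discussed later.

```python
import math

def growth_time_constant(r_n, r_m, l_m):
    """Time constant 2*L_M / (R_N - R_M) of the envelope in Eq. (1), in seconds."""
    if r_n <= r_m:
        raise ValueError("R_N must exceed R_M for the oscillation to grow")
    return 2.0 * l_m / (r_n - r_m)

def time_to_reach(a_target, a_init, r_n, r_m, l_m):
    """Solve Eq. (1) for t such that A_env(t) = a_target (constant R_N assumed)."""
    return growth_time_constant(r_n, r_m, l_m) * math.log(a_target / a_init)

L_M = 11.1e-3    # H, motional inductance (Sect. 2.4)
R_M = 19.0       # ohm, motional resistance (Sect. 2.4)
A_TARGET = 0.1   # V, hypothetical target amplitude

# (label, R_N in ohm, hypothetical initial amplitude in V)
cases = [
    ("low R_N, small A_i", 230.0, 1e-6),
    ("boosted R_N, small A_i", 2200.0, 1e-6),
    ("boosted R_N, larger A_i", 2200.0, 1e-3),
]
for label, r_n, a_i in cases:
    t = time_to_reach(A_TARGET, a_i, r_n, R_M, L_M)
    print(f"{label}: {t * 1e6:.0f} us")
```

Raising RN shortens the time constant, whereas raising Ai removes a fixed number of time constants from the startup; the two levers are therefore complementary.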

2.2.1 Scalable Self-Reference Chirp Injection (SSCI)

Signal injection to the XO can bring down ts if the injection frequency is close to fm of the crystal [19]. Instead of waiting for the XO to build up its oscillation amplitude, we can use an auxiliary oscillator (AO) to excite the crystal. Yet, due to the high-Q nature of the crystal, such signal injection is only effective if its frequency error from fm is <0.5% [13]. Several signal-injection techniques for kick-starting the XO have been reported. We can categorize them into three groups: constant frequency injection (CFI) [18, 21, 22], dithering injection [13], and chirp injection (CI) [19].

CFI injects a clock signal into the crystal with a constant frequency precisely matching fm. Although this scheme is very efficient and simple in concept, the AO requires calibration as well as a delicate design, which is challenging at a sub-0.5 V supply. As an example, the XO in [21] achieves ts values of 58/10/2 μs from 1.84/10/50 MHz crystals. Yet, it has a supply voltage of 1 V. Also, the ring oscillator entails frequency calibration after fabrication.

Dithering injection toggles the AO frequencies to compensate for the frequency deviation caused by temperature and voltage variations. As such, the injection signal can cover a wider frequency range than that of CFI. Still, trimming is necessary to compensate for the process variation. When compared with CFI, its effect on shortening ts is lower since the signal power spreads to a wider spectrum. For instance, the XO in [13] exhibits a slashed tS of <400 μs by using dithered-signal injection (dithered step size: 2%).

Here, we consider CI to be more robust and lower in cost, as it relies on a frequency-rich signal to excite the crystal and avoids frequency calibration. The principle is similar to dithering but covers a wider frequency range: the oscillating frequency is swept gradually, decreasing or increasing progressively. As such, the chirping sequence generates a spectrum spanning from the lowest frequency fL to the highest frequency fH, as evinced by its Fourier transform [29]. If fL < fm < fH regardless of PVT variations, the crystal always receives power. Despite its weaker effectiveness on tS reduction, since the power spreads over a wider band, CI has the benefit of requiring no trimming of the AO. It is especially suitable for low-cost and ULV radios, where the frequency variation of the AO against voltage and temperature may be exacerbated. In [19], an RN-boosting technique is applied together with CI, showing a tS of 158 μs without trimming or calibration of the AO. Still, the related RC sweeping unit for modulating the frequency of the AO is area hungry (estimated ~90% of the chip area) due to its large time constant (on the order of 10 μs) for generating the chirping sequence. Table 1 summarizes the key features of the three signal injection techniques.

Table 1 Overview of different signal injection techniques to kick-start the XO

Herein, we introduce the SSCI (Fig. 2) that only entails an untrimmed oscillator with relaxed precision. Its frequency range can easily cover fm variation against PVT. Unlike the RC-based chirping [19], we incorporate a five-stage RO with a finite state machine (FSM) to control the oscillating frequency of the RO via a cap-bank. Subsequently, the circuit can generate the chirping sequence by referencing its own signal and requiring no area-hungry RC units to modulate the oscillating frequency. The FSM counts the number of pulses and sequentially raises COSC by sending the control signal fctrl to the RO. Additionally, compared to the analog sweeping technique in [19], the FSM can digitally scale the total injection time (tCI), decided by the number of exciting cycles at each cap-bank value COSC:

$$ {t}_{\textrm{CI}}=N\times \sum \limits_i{t}_i, $$
(2)

where N is the number of cycles to repeat at each COSC and ti is the period of a single cycle at i-th COSC. The average amplitude of oscillation on the crystal after the chirping sequence is proportional to \( \sqrt{t_{\textrm{CI}}} \) [19, 29]. Thus, N can be programmed to adjust tCI, rendering the XO easily compatible with different crystal parameters (i.e., an optimum tCI depends on LM, RM and RN (CS) [19]). This digital-intensive architecture is more area-efficient. The oscillation signal at the RO has a varying duty cycle with VT variation. To maximize the injection energy (i.e., 50% duty cycle), the chirp-modulated signal is a div-by-2 output of the RO. This output serves as both the exciting signal for the crystal via the output driver and the trigger signal for the FSM. After the injection, the FSM automatically powers down the RO.
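As a rough illustration of Eq. (2), the Python sketch below sums the cycle periods over all cap-bank codes and scales the result by N. It uses the fH = 36 MHz, fL = 12 MHz, and 6-bit cap-bank quoted in Sect. 2.3. The assumptions are ours and are not taken from the design: the output period is taken to grow linearly with the cap-bank code, the cycles are counted at the div-by-2 output, and the function names are hypothetical.

```python
def chirp_periods(f_high, f_low, n_codes):
    """Output period at each cap-bank code.

    Assumption: the period grows linearly with the code, from 1/f_high
    (C_OSC = 0) to 1/f_low (C_OSC at full scale).
    """
    t_min, t_max = 1.0 / f_high, 1.0 / f_low
    step = (t_max - t_min) / (n_codes - 1)
    return [t_min + i * step for i in range(n_codes)]

def injection_time(periods, n_repeat):
    """Eq. (2): t_CI = N * sum_i t_i."""
    return n_repeat * sum(periods)

periods = chirp_periods(36e6, 12e6, 64)   # 6-bit cap-bank: 64 codes
for n in (1, 2, 4, 8):
    print(f"N = {n}: t_CI = {injection_time(periods, n) * 1e6:.2f} us")
```

Under these assumptions, N = 1 gives about 3.6 μs, the same order as the minimum tCI of 4 μs quoted in Sect. 2.3, and tCI scales linearly with N.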

Fig. 2

Proposed SSCI. It generates a chirping signal to kick-start the XO using an untrimmed RO with relaxed precision. The FSM (finite state machine) provides feasibility to scale tCI, accommodating different crystal packages (i.e., LM and CS)

2.2.2 Dual-Mode gm Scheme

The XO using a one-stage gm (AXO-1), especially for the Pierce oscillator, is popular as it can optimize the steady-state PN [13, 19,20,21]. The gm offers a negative resistance compensating for the equivalent resistance of the crystal. Its value also determines the growth of the oscillation amplitude before the XO reaches the steady state.

From Fig. 3a, by omitting the resistive loss induced by AXO-1, the impedance between the I/O (Zamp-1) becomes

$$ {Z}_{\textrm{amp}-1}=-\frac{g_{\textrm{m}}}{4{\omega_0}^2{C_{\textrm{L}}}^2}+\frac{1}{j{\omega}_0{C}_{\textrm{L}}}, $$
(3)

where CL is the designated crystal’s load capacitance and ω0 is the angular oscillating frequency 2πf0. With Zamp shunted by the crystal’s stray capacitance (CS), it affects the negative resistance (RN) of the overall impedance looking from the crystal core (ZC):

$$ {R}_{\textrm{N}}\equiv -\operatorname{Re}\left({Z}_{\textrm{c}}\right)=\frac{-\operatorname{Re}\left({Z}_{\textrm{amp}}\right)}{{\left[{\omega}_0{C}_{\textrm{s}}\operatorname{Re}\left({Z}_{\textrm{amp}}\right)\right]}^2+{\left[1-{\omega}_0{C}_{\textrm{s}}\operatorname{Im}\left({Z}_{\textrm{amp}}\right)\right]}^2} $$
(4)
Fig. 3

XO using (a) a one-stage gm (AXO-1) for the steady state and (b) a three-stage gm (AXO-3) for the startup

If ω0CS|Zamp| ≪ 1, we have RN ≈ −Re(Zamp), which matches the expression in [13] for AXO-1. A large RN favors more tS reduction according to Eq. (1). Yet, when |Zamp| becomes comparable with 1/ω0CS [i.e., for a higher gm and thus a higher |Re(Zamp)| to speed up the startup], we have to consider the effect of CS. Then, we can deduce the specific RN of AXO-1 (i.e., RN,1) from Eq. (4) as

$$ {R}_{\textrm{N},1}=\frac{4{g}_{\textrm{m}}{C_{\textrm{L}}}^2}{{\left({g}_{\textrm{m}}{C}_{\textrm{s}}\right)}^2+16{C_{\textrm{L}}}^2{\omega_0}^2{\left({C}_{\textrm{L}}+{C}_{\textrm{S}}\right)}^2}, $$
(5)

Taking the derivative of Eq. (5), we can obtain the maximum value of RN,1 with respect to gm at a fixed CL:

$$ {R}_{\textrm{N},1,\max }=\frac{C_{\textrm{L}}}{2{\omega}_0{C}_{\textrm{s}}\left({C}_{\textrm{L}}+{C}_{\textrm{s}}\right)}, $$
(6)

where we apply gm = 4ω0CL(1 + CL/CS). Im(Zamp-1) can only be negative (capacitive) for AXO-1, and RN,1 has an upper limit if gm is the only sizing parameter [19, 20]. For instance, RN,1 is limited to 1.2 kΩ with CS = 2 pF, f0 = 24 MHz and CL = 6 pF, even if we apply an oversized gm = 14.5 mS. There were efforts to raise RN,1 by increasing gm or tuning CL temporarily during the startup [20, 30, 31]. Yet, increasing gm incurs larger power consumption and is unfavorable toward the reduction of ES. Further, Eq. (6) bounds RN,1 to a maximum of 1/(2ω0CS) (i.e., 1.66 kΩ in the above example), approached when CL ≫ CS and gm ≈ 4ω0CL²/CS.
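The numbers in this example can be reproduced with a short Python sketch that evaluates Eqs. (3)–(6) directly. The function and constant names are ours; the component values are those quoted in the text.

```python
import math

def z_amp_1(g_m, c_l, omega0):
    """Eq. (3): impedance of the one-stage amplifier between its I/O."""
    return complex(-g_m / (4 * omega0**2 * c_l**2), -1.0 / (omega0 * c_l))

def r_n_from_zamp(z_amp, c_s, omega0):
    """Eq. (4): negative resistance seen by the crystal core with C_S in shunt."""
    re, im = z_amp.real, z_amp.imag
    denom = (omega0 * c_s * re) ** 2 + (1.0 - omega0 * c_s * im) ** 2
    return -re / denom

def r_n1(g_m, c_l, c_s, omega0):
    """Eq. (5): closed-form R_N of the one-stage amplifier."""
    num = 4 * g_m * c_l**2
    den = (g_m * c_s) ** 2 + 16 * c_l**2 * omega0**2 * (c_l + c_s) ** 2
    return num / den

def r_n1_max(c_l, c_s, omega0):
    """Eq. (6) together with the g_m at which the maximum is reached."""
    g_opt = 4 * omega0 * c_l * (1 + c_l / c_s)
    return c_l / (2 * omega0 * c_s * (c_l + c_s)), g_opt

OMEGA0 = 2 * math.pi * 24e6   # rad/s
C_L, C_S = 6e-12, 2e-12       # F

r_max, g_opt = r_n1_max(C_L, C_S, OMEGA0)
print(f"g_m at maximum: {g_opt * 1e3:.1f} mS")
print(f"R_N,1,max:      {r_max:.0f} ohm")
print(f"1/(2*w0*C_S):   {1 / (2 * OMEGA0 * C_S):.0f} ohm")
# Eq. (4) applied to Eq. (3) should agree with the closed form of Eq. (5).
print(r_n_from_zamp(z_amp_1(g_opt, C_L, OMEGA0), C_S, OMEGA0), r_n1(g_opt, C_L, C_S, OMEGA0))
```

The sketch returns roughly 14.5 mS, 1.2 kΩ, and 1.66 kΩ, in agreement with the values stated above, and confirms that Eq. (5) follows from substituting Eq. (3) into Eq. (4).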

Inspecting Eq. (4), if a positive Im(Zamp) is possible to counteract the effect of CS, we can boost RN to surmount the aforesaid RN limit. The idea is to mimic a μH-range inductor on-chip for this purpose. Interestingly, a three-stage gm (AXO-3) with designated capacitive loads (Zo1–2) can effectively mimic an inductive effect during the startup (Fig. 3b). Although [32] applied a multistage gm to save the XO’s steady-state power, here we first explore its inductive feature for tS reduction. For AXO-3, we define its Zamp as Zamp-3. We can maneuver both Re(Zamp-3) and Im(Zamp-3) between positive and negative values by adjusting the inter-stage impedances, as demonstrated in [7]. For instance, if we set gm1,2 = 0.4 mS, gm3 = 1.5 mS, ro1,2 = 7 kΩ, CL = 6 pF, ω0 = 2π × 24 MHz, and Co1 = Co2 = 0.5 pF, we obtain Zamp-3 = (−1.6 + j1.2) kΩ. The positive Im(Zamp-3) manifests that Zamp-3 is inductive, which we utilize to mitigate CS and break the limit of Eq. (6). As a result, we can have Re(ZC-3) = −2.4 kΩ due to the inductive AXO-3. Then, we can achieve a higher RN even with similar power consumption when compared with the AXO-1, enabling an energy-efficient startup. Due to the intricate expression of RN,3, we optimize it numerically before proceeding to the transistor-level implementation. Besides, the technique is also applicable to different f0.

For the same power budget, however, AXO-3 is inferior to AXO-1 in terms of the steady-state PN, as each stage shares a smaller bias current and the noise accumulates. Also, Im(ZC-3), which determines the XO’s oscillating frequency, deviates from the designated value due to the presence of Co1 and Co2. This affects the accuracy of f0.

Consequently, it is desirable to implement a dual-mode gm scheme that can balance the startup and steady-state performances. During the startup, where the PN and accuracy of f0 are irrelevant, we enable AXO-3 and connect it to the crystal to attain a larger RN for fast startup. When the crystal gains sufficient energy for oscillation, AXO-3 is turned off and disconnected from the crystal while AXO-1 takes over to sustain the oscillation. As a result, the XO can benefit from both AXO-3 (fast startup) and AXO-1 (low PN and accurate f0).
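The benefit of an inductive amplifier impedance can be checked with plain complex arithmetic. The Python sketch below places the quoted Zamp-3 = (−1.6 + j1.2) kΩ in parallel with CS and compares it with a capacitive impedance of the same real part. We assume CS = 2 pF, as in the AXO-1 example; the capacitive comparison case and the function names are our own illustration and do not correspond to a circuit in this work.

```python
import math

def parallel_with_cs(z_amp, c_s, omega0):
    """Impedance seen by the crystal core: Z_amp in parallel with 1/(j*w0*C_S)."""
    z_cs = 1.0 / (1j * omega0 * c_s)
    return z_amp * z_cs / (z_amp + z_cs)

def negative_resistance(z_amp, c_s, omega0):
    """R_N = -Re(Z_C), equivalent to Eq. (4)."""
    return -parallel_with_cs(z_amp, c_s, omega0).real

OMEGA0 = 2 * math.pi * 24e6   # rad/s
C_S = 2e-12                   # F, assumed stray capacitance
C_L = 6e-12                   # F

z_inductive = complex(-1600.0, 1200.0)                   # Z_amp-3 quoted in the text
z_capacitive = complex(-1600.0, -1.0 / (OMEGA0 * C_L))   # same Re, capacitive Im (hypothetical)

rn_ind = negative_resistance(z_inductive, C_S, OMEGA0)
rn_cap = negative_resistance(z_capacitive, C_S, OMEGA0)
bound = 1.0 / (2 * OMEGA0 * C_S)                         # one-stage limit from Eq. (6)

print(f"inductive Z_amp:  R_N = {rn_ind:.0f} ohm")
print(f"capacitive Z_amp: R_N = {rn_cap:.0f} ohm")
print(f"one-stage bound:  {bound:.0f} ohm")
```

With these rounded inputs the inductive case yields about 2.5 kΩ, close to the 2.4 kΩ quoted above and above the 1.66 kΩ bound of the one-stage amplifier, whereas the capacitive case with the same real part stays well below the bound.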

2.3 Transistor-Level Implementation

We design the core elements of the XO (e.g., AXO-1, AXO-3, and RO) to operate below a 0.5 V VDD. Only the static and DC circuits (digital logic and constant-gm bias circuit) operate at 0.7 V to facilitate the design. These circuits, mostly powered off during the steady state, consume <5 μA. Thus, an on-chip switched-capacitor charge pump can easily generate the 0.7 V supply and share it with other blocks at the system level as described in [26].

Subthreshold common-source (CS) amplifiers with resistive loads (Fig. 4a, b) constitute the basis of both AXO-1 and AXO-3. Unlike other solutions that use current-source loads [13, 20, 21], the resistive load aids in preserving a moderate gm even with VDD < 0.35 V, for a small bias current (simulated at Idc = 100 μA). For instance, the simulated gm of AXO-1 is 1.3 mS at VDD = 0.3 V and −40 °C, being four times higher than that of the current-source load (assuming an identical gm with VDD = 0.35 V at 20 °C). Further, at high temperature, the intrinsic output resistance of the transistor decreases rapidly. This affects the stability of RN and causes variation on ts, especially for AXO-3. The AXO-1 with resistive load has a trade-off of lower immunity to the power supply noise (noise power from VDD modulated to the output of XO with resistive load that is 3 dB larger than its current-source-load counterpart at 1 kHz offset). Also, it has a large f0 variation with the gm of the AXO-1 not fixed. Still, this is manageable for the BLE standard (< ±50 ppm [33]), as well as other IoT protocols (e.g., ZigBee: ±40 ppm). A small nominal Idc of 100 μA is adequate for the expected PN.

Fig. 4

Circuit implementation of (a) AXO-1 and (b) AXO-3

A feedback resistor RF self-biases AXO-1, whereas AXO-3 is an AC-coupled three-stage CS amplifier aided by a constant-gm bias circuit. As the gm of AXO-3 has a considerable impact on RN,3, the constant-gm bias circuit ensures that AXO-3 remains inductive and that RN,3 stays stable, for a robust and fast startup against PVT. We choose the channel lengths of the transistors such that their output resistances are ~10× larger than the resistors R1–3. This reduces the temperature dependency of RN,3, as R1–3 then dominate the stage output resistances ro1–3. We design AXO-3 to have a current consumption (~100 μA) similar to that of AXO-1. As such, the power consumption does not change abruptly, easing the design and layout of the power supply. Each current branch includes CMOS switches with which we can isolate AXO-1 or AXO-3 from the crystal, while lowering their leakage power (simulated <14 nW at 0.35 V and 20 °C) when disabled. The switches are sized such that their on-resistances are negligible compared with R1–3.

Both the parasitic capacitances of the transistors and the finite I/O resistance of AXO-3 affect RN,3. Thus, we should further optimize RN,3 via simulation. The total gm budget is 2.3 mS (total bias current: 100 μA, assuming a gm/ID = 23 V⁻¹), with ro1–3 set according to the gm of each gain stage. Figure 5a shows the locus plots of Zamp-1 and Zamp-3 implemented with practical transistors and integrated passives. Zamp-1 is capacitive over all frequencies, while Zamp-3 is inductive over the 13–46 MHz range, which is compatible with different f0. Optimized at the most popular XO frequency of 24 MHz, the optimum RN,3 is 2.4 kΩ after paralleling it with a CS of 2 pF. This result is ~9× higher than RN,1 under the same gm budget and surpasses RN,1,max (Fig. 5b). The boosting effect is insensitive to frequency between 15 and 34 MHz, where RN,3/RN,1 > 6.

Fig. 5

(a) Locus plot of the Zamp-1,-3 against frequency. (b) Simulated RN,1 and RN,3 with a fixed total gm budget of 2.3 mS and the boosting ratio against frequency

Ideally, we should enable AXO-3 during the entire startup phase. Yet, the gm’s of M1–3 deviate from their small-signal values as the oscillation amplitude grows, which degrades RN,3. Consequently, the optimum active time of AXO-3, tsw, is the time at which RN,3 ≈ RN,1, beyond which AXO-3 no longer helps reduce ts. We can find the optimal tsw via simulations with measured crystal parameters to avoid any extra detection and control mechanism.

To realize the SSCI, we implement a five-stage RO constituted by CS amplifiers with source degeneration. Compared to the RO with inverters or relaxation oscillator, a RO with CS amplifiers balances the frequency stability and compatibility with the sub-0.5 V VDD. The source resistor (RS in Fig. 2) also reduces the variation of the oscillating frequency against VDD. From simulation, the frequency variation of RO reduces by ~20% over a 0.3–0.5 V VDD. We set RD as 36 kΩ. The current consumption of the RO is 20 μA. We implemented the div-by-2 unit and FSM with standard logic.

We designed the fH and fL of the SSCI module as 36 and 12 MHz, respectively, chosen to satisfy fL < fm < fH even with PVT variation (Fig. 6). The total size of COSC, simulated to be 135 fF, yields an fL of 12 MHz (after div-by-2). Next, we determine the resolution of the cap-bank from the minimum duration of tCI: since a complete chirping sequence must sweep all of the states at least once, the minimum tCI (i.e., N = 1) in Eq. (2) sets the resolution (number of pulses). The optimum tCI, according to [19] and the measured crystal parameters, is 4.6 μs. Thus, we set COSC as a binary-coded 6-bit cap-bank (unit cap: 2.14 fF), corresponding to a minimum tCI of 4 μs with the designated fH and fL. Even though there is a discrepancy between the applied and optimum tCI, it hardly affects ts, as tCI is short compared with ts. As the amplitude of oscillation after the CI is proportional to \( \sqrt{t_{\textrm{CI}}} \), even though the applied tCI is 13% shorter than the optimum, the amplitude is only 7% smaller. Owing to the fast growth of the oscillation amplitude with AXO-3 (time constant in Eq. (1): 9.33 μs), AXO-3 quickly compensates for the discrepancy between the applied and optimum tCI; for example, the growth of the oscillation amplitude over 0.6 μs (~1.07×) countervails the discrepancy. No significant difference in ts will emerge, even with PVT variation on tCI (Fig. 7).
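The arithmetic behind these statements is easy to verify. The Python sketch below recomputes the cap-bank full scale, the amplitude penalty of the shorter tCI, and the amplitude growth that AXO-3 provides over the missing 0.6 μs, using only values quoted in the paragraph above. The function names are ours.

```python
import math

def cap_bank_full_scale(bits, unit_cap):
    """Largest capacitance of a binary-coded cap-bank: (2^bits - 1) unit caps."""
    return (2**bits - 1) * unit_cap

def amplitude_ratio(t_applied, t_opt):
    """Amplitude after chirp injection scales with sqrt(t_CI)."""
    return math.sqrt(t_applied / t_opt)

def growth_factor(dt, tau):
    """Envelope growth exp(dt / tau) from Eq. (1) over an interval dt."""
    return math.exp(dt / tau)

T_APPLIED, T_OPT = 4.0e-6, 4.6e-6   # s, applied and optimum t_CI
TAU = 9.33e-6                       # s, time constant with AXO-3

print(f"C_OSC full scale: {cap_bank_full_scale(6, 2.14e-15) * 1e15:.1f} fF")
print(f"t_CI shortfall:   {(1 - T_APPLIED / T_OPT) * 100:.0f} %")
print(f"amplitude loss:   {(1 - amplitude_ratio(T_APPLIED, T_OPT)) * 100:.1f} %")
print(f"growth in 0.6 us: {growth_factor(T_OPT - T_APPLIED, TAU):.3f} x")
```

The sketch gives about 134.8 fF, 13%, 6.7%, and 1.066×, matching the rounded figures in the text; the product of the amplitude ratio and the growth factor is close to unity, which is why the shorter injection is recovered almost immediately.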

Fig. 6

(a) Monte Carlo-simulated fL with VDD = 0.4 V and T = 90 °C; (b) Monte Carlo-simulated fH with VDD = 0.3 V and T = −40 °C. N = 30 for both cases

Fig. 7

Increase in ts caused by the deviation of tCI from the optimum duration

The RO initially oscillates at 2fH with COSC = 0 fF, where the parasitic capacitances govern the oscillating frequency. The FSM then increases COSC bit by bit, holding each code for N cycles, until COSC = 135 fF, where the RO oscillates at 2fL. In this work, N is digitally configurable among 1, 2, 4, and 8.

2.4 Experimental Results and Comparison with State of the Art

The XO, fabricated in 65 nm CMOS with a fixed on-chip CL of 6 pF, occupies an active area of 0.023 mm² (Fig. 8a), of which 36% corresponds to the CL (Fig. 8b). The target f0 can be flexible between 16 and 24 MHz. We first verify the SSCI functionality. Figure 9a exhibits the measurement of the oscillating frequency of the RO (after div-by-2) against COSC, which is consistent with the post-layout simulation. The average fL and fH across five dies at room temperature are 10.93 MHz (σ: 0.32 MHz) and 35.96 MHz (σ: 1.21 MHz), respectively. Figure 9b confirms the chirping sequence with N = 1, and Fig. 9c plots the duration of tCI against N.

Fig. 8

(a) Chip micrograph. (b) Area breakdown of the XO

Fig. 9

(a) Measured and simulated oscillating frequencies of the RO versus COSC at different conditions, robust to cover f0 of the crystal even with VDD and temperature variations. (b) Measured chirping sequence (N = 1). (c) Injection duration tCI against N. For the latter two figures, VDD = 0.35 V, T = 20 °C

Then, we tested the XO with a 24 MHz crystal (package: 3.2 × 2.5 mm²) without any startup aid at room temperature (20 °C) and VDD = 0.35 V. The measured crystal parameters LM, RM, CM, and CS are 11.1 mH, 19 Ω, 3.95 fF, and 1.3 pF, respectively. Under these conditions, we have ts = 1.3 ms (Fig. 10a). The ts decreases to 530 μs with AXO-3 enabled during the startup.

Fig. 10

Measured startup waveform (a) without startup aid and (b) with SSCI and AXO-3 enabled

We estimate RN,1 and RN,3 from the growth of the oscillation amplitude according to Eq. (1), which we can write as

$$ \ln \left(\frac{A_{\textrm{env}}\left({t}_0+\Delta t\right)}{A_{\textrm{env}}\left({t}_0\right)}\right)=\frac{R_{\textrm{N}}-{R}_{\textrm{M}}}{2{L}_{\textrm{M}}}\bullet \Delta t. $$
(7)

By measuring the growth of the oscillation amplitude within a specific time interval, we can estimate the RN of the XO. For AXO-1, the growth of oscillation is 1.01×/μs, and thereby we calculate RN,1 as 230 Ω (Fig. 11), which is close to the prediction (as described in Sect. 2.3). Similarly, we find RN,3 ≈ 2.2 kΩ. For two reasons, the reduction of ts is not commensurate with the RN-boosting ratio between AXO-3 and AXO-1. Firstly, as described in Sect. 2.3, M1–3 will deviate from their nominal operating points and deteriorate RN,3. We can reveal this by measuring ts against tsw (Fig. 12). When tsw is short (<60 μs), where M1–3 are in the subthreshold region, the small-signal model is still valid to estimate ts against tsw (i.e., the slope of the curve (~ −10) closely matches −RN,3/RN,1 + 1). As tsw further increases, the oscillation drives M1–3 away from their original operating points and worsens RN,3. Hence, the slope of the curve declines and eventually reaches zero, where AXO-3 no longer aids ts reduction. Secondly, the XO entails an overhead time to enter the steady state after switching to AXO-1: after the switch, the XO still takes ~380 μs to enter the steady state. Here, the nonideality of the ULV AXO-3 limits the improvement in ts. In fact, for amplifiers with a standard I/O voltage and higher output swing, the reduction of ts should be more profound and better matched with the RN-boosting ratio.
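The extraction of RN from a measured envelope follows directly from Eq. (7). The Python sketch below solves Eq. (7) for RN and, conversely, computes the log-amplitude slope expected for a given RN, using the measured LM and RM. The function names are ours, and the growth figure of 1.01×/μs is taken as quoted, so its rounding propagates to the result.

```python
import math

def r_n_from_growth(ratio, dt, r_m, l_m):
    """Eq. (7) solved for R_N: amplitude grows by `ratio` within `dt` seconds."""
    return r_m + 2.0 * l_m * math.log(ratio) / dt

def slope_from_r_n(r_n, r_m, l_m):
    """Slope of ln(A_env) versus time, (R_N - R_M) / (2 L_M), in 1/s."""
    return (r_n - r_m) / (2.0 * l_m)

L_M = 11.1e-3   # H, measured motional inductance
R_M = 19.0      # ohm, measured motional resistance

rn1 = r_n_from_growth(1.01, 1e-6, R_M, L_M)
slope3 = slope_from_r_n(2200.0, R_M, L_M)

print(f"R_N,1 from 1.01x/us growth: {rn1:.0f} ohm")
print(f"ln-slope for R_N,3 = 2.2 kohm: {slope3 * 1e-6:.3f} per us")
```

A growth of 1.01× per μs corresponds to roughly 240 Ω, in line with the ~230 Ω stated above given the rounding of the growth figure, and RN,3 ≈ 2.2 kΩ corresponds to a log-amplitude slope of about 0.098 per μs.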

Fig. 11
Plot of the logarithmic amplitude ratio versus Δt; the fitted slope for AXO-3 is 0.099 per μs, corresponding to RN ≈ 2.2 kΩ.

Estimated RN from the exponential growth of XOUT’s amplitude (before the transistors enter the triode/cutoff region)

Fig. 12

Total ts versus tsw, the enabling time of AXO-3 (without SSCI)

With both AXO-3 and SSCI enabled, we further decrease tS to 400 μs (3.3× reduction) and the corresponding ES is 14.2 nJ (2.8× reduction) (Fig. 10b). When switching from AXO-3 to AXO-1, which have different output impedances and, consequently, operating frequencies, there is an instantaneous change in the output swing, since the magnitude of the current passing through the crystal does not change abruptly. The percentages of energy consumed in the startup phase by the SSCI, AXO-3, and AXO-1 are 7%, 39%, and 53%, respectively. We verified that tsw can tolerate ±50% uncertainty for <10% tS variation, implying that we can obtain an adequate ts even with a nonoptimal tsw (e.g., due to variations in PVT and the crystal’s parameters). This also justifies that the existing RO is good enough to control tsw, avoiding any external detection and control mechanism.

For the transient frequency of the XO, it takes ~300 μs to settle for a ±20 ppm f0 accuracy (i.e., 50 kHz drifting from the center frequency of 2.44 GHz in a packet, as defined in [33]). This result is 3.5× faster than the case without startup aid (Fig. 13). The steady-state power is 31.8 μW at 0.35 V, and the PN is −134 dBc/Hz at 1 kHz offset, being adequate for most IoT applications and comparable to other state-of-the-art XOs with a standard voltage (e.g., PN of −136 dBc/Hz at 1 kHz and f0 = 26 MHz in [10]).

Fig. 13

Transient f0 profiles of the XO (VDD = 0.35 V, T = 20 °C)

The XO can uphold a steady-state output swing >80% of VDD for VDD = 0.3–0.5 V. The ts varies <25% from its mean (400 μs) for VDD = 0.3–0.5 V (Fig. 14a). Only the RO of the SSCI fails to start if VDD drops down to 0.25 V, but AXO-3 is still in place to aid tS reduction. Over −40–90 °C, tS variation is <7.5% (Fig. 14b). We obtained similar results for a 16 MHz crystal (i.e., Δf0/f0 = 13.4 ppm over 0.3–0.5 V, Δf0/f0 = 21.9 ppm over −40–90 °C, and tS variation, 9.8%).

Fig. 14

Measured XO (f0 = 24 MHz) performances. (a) Startup time against VDD. (b) Startup time against temperature

Table 2 benchmarks the performance of the XO with the prior art. In terms of Es, this work is >2.6× better than [20] and slightly higher than [21]. Furthermore, we can consider this circuit in the vanguard, since it proves the feasibility of regulation-free operation under a wide range of sub-0.5 V VDD, while conforming to the frequency-stability specification of the BLE (Bluetooth Low Energy) standard.

Table 2 Performance summary and comparison with recent art

3 A 0.35 V 5200 μm² 2.1 MHz Temperature-Resilient Relaxation Oscillator with 667 fJ/cycle Energy Efficiency Using an Asymmetric Swing-Boosted RC Network and a Dual-Path Comparator

3.1 Motivation

For the crystal-less IoT node [34] and wakeup receiver [35], low-power and fully integrated kHz-to-MHz clock sources with moderate frequency inaccuracy are pivotal to their operations. For instance, [35] requires a frequency reference with ~2.5% frequency accuracy to calibrate the digitally controlled oscillator of the wakeup receiver. Although the crystal oscillator offers better frequency stability, a typical MHz-range crystal oscillator can consume tens of μW, which is impermissible for the always-on module of an IoT node. In fact, we expect a μW-range power budget in the standby mode [23]. Also, the presence of an off-chip crystal can restrict the volume miniaturization of the IoT nodes.

The ring oscillator is a viable solution among the fully integrated oscillators due to its outstanding power efficiency, tuning range, and compactness [36]. Yet, the oscillating frequency of the ring oscillator is prone to PVT variations that require extra circuitry for compensation. The LC oscillator strikes a proper balance between the integration level and frequency stability [37, 38], yet its LC tank is too bulky for MHz-range applications.

Recent relaxation oscillators (RxOs) [39,40,41,42,43,44,45,46,47] proved their potential by attaining fast settling time, moderate intrinsic frequency stability, tiny footprint, and high energy efficiency. A typical RxO consists of a period-defining network, amplifiers, and logic gates. The period-defining network periodically (dis)charges the capacitors therein, and the amplifiers compare the voltages on the capacitors with a reference voltage. The logic gates read the output from the amplifiers and generate the required output correspondingly.

For IoT nodes powered by sub-0.5 V energy-harvesting sources such as the thermoelectric generator and solar cell, ULV operation adds to the RxO design constraints. Existing RxO architectures [39,40,41,42,43,44] do not favor sub-0.5 V operation, which severely confines the voltage headroom. Hence the linearity and accuracy of the current and voltage references are inferior, and their degraded precisions can affect the RxO’s stability. Also, at high temperature, the transistor’s leakage current (ILeak) limits the performance of the current/voltage reference.

Recently, a swing-boosted differential RxO proposed in [45] featured a symmetric swing-boosted RC network to define the period of the RxO, enabling no current or voltage reference while delivering a swing-boosted output to improve the noise performance. As this architecture does not entail current or voltage reference, it allows scaling down of the VDD without affecting the RC network precision. Nevertheless, it has the common-mode voltage (VCM) of the RC network restricted to mid VDD, which implies VCM < 0.25 V for sub-0.5 V operation, thereby hindering the operation of its subsequent comparator.

This section proposes an RxO that surmounts the challenges of sub-0.5 V operation and achieves high area and energy efficiencies. The key techniques are (1) an asymmetric RC network to free the VCM restriction while preserving a swing-boosted output and (2) a dual-path comparator with delay compensation to allow temperature resilience. Prototyped in 28 nm CMOS, the RxO occupied a tiny area (5200 μm2) and attained superior energy efficiency (667 fJ/cycle) and figure of merit (FoM1 = 181 dB) with respect to the prior art.

3.2 Asymmetric Swing-Boosted RC Network

Figure 15a depicts the schematic of the swing-boosted RC network. As demonstrated in [45], the RxO utilizing this RC network exhibits low jitter (σjit) attributed to its swing-boosted output voltages (Vx,y) from the symmetric RC network (k = 1).

Fig. 15
A circuit of swing-boosted differential R x O with 2 resistors, 2 capacitors, and an integrator to produce output along with the timing diagram for symmetric and asymmetric R C network.

(a) Simplified schematic of the swing-boosted differential RxO. (b) Timing diagram of the output of the RC network with k = 1, with VCM fixed to 0.5 VDD. (c) Timing diagram of the output of the RC network with k > 1 such that VCM,U and VCM,D suit the design of the subsequent ULV comparator (this work)

Considering Ø1 (Fig. 15b), Vx is initially at the ground and Vtop connects to VDD, whereas Vy is initially at VDD and Vbot connects to the ground. Vx charges toward VDD and Vy discharges toward the ground with time constant (τ) RC. When they cross at VCM such that Vy < Vx, the comparator inverts its outputs. Consequently, the chopper alternates the connections, where Vtop now connects to the ground and Vbot connects to VDD. As the charges across the capacitors are conserved, Vx and Vy change to VCM + VDD and VCM − VDD after the transition. The process in Ø2 is complementary, and the operation repeats Ø1 after another transition. Hence, the differential signal Vx,y has a swing of 2 × VDD. Since the σjit of the RxO is inversely proportional to the slope of Vx,y at the threshold (Sxy), raising the swing of Vx,y increases Sxy and improves the σjit.

The RC network symmetry restricts VCM to mid VDD regardless of the oscillation phases (Ø1,2). As VDD decreases to <0.5 V, the VCM shrinks to <0.25 V, which is insufficient to properly bias a differential pair with a tail current source. To break this limit, we propose an asymmetric RC network (k > 1), in which one RC branch has a larger τ. As Fig. 15c shows, this lets Vx and Vy (dis)charge with different τ. The leaps on Vx and Vy after the chopping are still ±VDD, whereas the VCM of Vx and Vy alternates between VCM,U and VCM,D in Ø1 and Ø2, respectively. As such, we can choose a k that yields proper VCM,U (VCM,D) and thereby favors the operation of the subsequent ULV comparator.

Analyzing the waveform in Fig. 15c, we can derive four equations governing the (dis-)charge of the asymmetric RC network:

$$ \left({V}_{\textrm{CM},\textrm{D}}+{V}_{\textrm{DD}}\right){e}^{-\frac{T_1}{k\textrm{RC}}}={V}_{\textrm{CM},\textrm{U}}, $$
(8)
$$ \left({V}_{\textrm{CM},\textrm{D}}-2{V}_{\textrm{DD}}\right){e}^{-\frac{T_1}{\textrm{RC}}}+{V}_{\textrm{DD}}={V}_{\textrm{CM},\textrm{U}}, $$
(9)
$$ \left({V}_{\textrm{CM},\textrm{U}}+{V}_{\textrm{DD}}\right){e}^{-\frac{T_2}{\textrm{RC}}}={V}_{\textrm{CM},\textrm{D}}, $$
(10)
$$ \left({V}_{\textrm{CM},\textrm{U}}-2{V}_{\textrm{DD}}\right){e}^{-\frac{T_2}{k\textrm{RC}}}+{V}_{\textrm{DD}}={V}_{\textrm{CM},\textrm{D}}. $$
(11)
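To illustrate how Eqs. (8)–(11) behave as a phase-to-phase recurrence, the following sketch iterates them numerically with an ideal, zero-delay comparator. It is a minimal model under our own assumptions: voltages are normalized to VDD, time to RC, and the function names `next_crossing` and `settle` are hypothetical helpers, not part of the original design flow.

```python
import math


def next_crossing(v_cm, tau_decay, tau_charge, vdd=1.0):
    """Return (crossing voltage, elapsed time) for one oscillation phase.

    One node starts at v_cm + vdd and decays toward ground with tau_decay;
    the other starts at v_cm - vdd and charges toward vdd with tau_charge.
    """
    def diff(t):
        falling = (v_cm + vdd) * math.exp(-t / tau_decay)
        rising = (v_cm - 2.0 * vdd) * math.exp(-t / tau_charge) + vdd
        return falling - rising

    # diff(0) = 2*vdd > 0 and diff(t) tends to -vdd, so bisection brackets the root.
    lo, hi = 0.0, 50.0 * max(tau_decay, tau_charge)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if diff(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    t_cross = 0.5 * (lo + hi)
    return (v_cm + vdd) * math.exp(-t_cross / tau_decay), t_cross


def settle(k, cycles=30, v_start=0.5):
    """Iterate both phases until the common-mode levels stop changing."""
    v_cm_d = v_start
    for _ in range(cycles):
        # Phase with duration T1, Eqs. (8) and (9): the slow branch (k*RC) decays.
        v_cm_u, t1 = next_crossing(v_cm_d, tau_decay=k, tau_charge=1.0)
        # Phase with duration T2, Eqs. (10) and (11): the fast branch (RC) decays.
        v_cm_d, t2 = next_crossing(v_cm_u, tau_decay=1.0, tau_charge=k)
    return v_cm_u, v_cm_d, t1, t2


vcm_u, vcm_d, t1, t2 = settle(2.4)
print(f"VCM,U = {vcm_u:.3f} VDD, VCM,D = {vcm_d:.3f} VDD")
print(f"T1 = {t1:.3f} RC, T2 = {t2:.3f} RC")
```

In this idealized model the two phase durations come out equal and the two common-mode levels are mirror images about VDD/2, which is the symmetry exploited when the equations are solved in closed form.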

Assuming that T1 = T2, solving Eqs. (8)–(11) leads to

$$ {\left(\frac{V_{\textrm{DD}}-{V}_{\textrm{CM},\textrm{D}}}{V_{\textrm{DD}}+{V}_{\textrm{CM},\textrm{D}}}\right)}^k=\frac{V_{\textrm{CM},\textrm{D}}}{2{V}_{\textrm{DD}}-{V}_{\textrm{CM},\textrm{D}}}, $$
(12)
$$ {\left(\frac{V_{\textrm{CM},\textrm{U}}}{2{V}_{\textrm{DD}}-{V}_{\textrm{CM},\textrm{U}}}\right)}^k=\frac{V_{\textrm{DD}}-{V}_{\textrm{CM},\textrm{U}}}{V_{\textrm{DD}}+{V}_{\textrm{CM},\textrm{U}}}, $$
(13)
$$ k=\frac{T}{2\textrm{RC}}/\ln \left(\frac{1+3{e}^{-T/2\textrm{RC}}}{1-{e}^{-T/2\textrm{RC}}}\right), $$
(14)

where T1 = T2 = T/2. Therefore, we can calculate the required k to achieve a sufficient separation of VCM,U (VCM,D) by numerically solving Eqs. (12) and (13), as well as the corresponding T by Eq. (14). Figure 16a illustrates the VCM,U, VCM,D, and T versus k.
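The numerical solution mentioned above can be sketched as follows. This is an illustrative bisection solver, not the authors' tool: `solve_vcm_d`, `half_period`, and `k_from_eq14` are names we introduce, voltages are normalized to VDD, and time is normalized to RC. The half period is obtained from Eq. (10) using VCM,U = VDD − VCM,D, a symmetry that follows from comparing Eqs. (12) and (13).

```python
import math


def solve_vcm_d(k):
    """Root x = VCM,D / VDD of Eq. (12), found by bisection on (0, 1)."""
    def residual(x):
        # Logarithm of the left side minus logarithm of the right side of Eq. (12).
        return k * math.log((1.0 - x) / (1.0 + x)) - math.log(x / (2.0 - x))

    lo, hi = 1e-9, 1.0 - 1e-9  # residual is positive at lo and negative at hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def half_period(x):
    """T / (2 RC) from Eq. (10) with VCM,U = VDD - VCM,D."""
    return -math.log(x / (2.0 - x))


def k_from_eq14(a):
    """Right side of Eq. (14) for a = T / (2 RC)."""
    e = math.exp(-a)
    return a / math.log((1.0 + 3.0 * e) / (1.0 - e))


for k in (1.0, 2.4):
    x = solve_vcm_d(k)
    a = half_period(x)
    print(f"k = {k}: VCM,D = {x:.3f} VDD, VCM,U = {1 - x:.3f} VDD, "
          f"T = {2 * a:.3f} RC, Eq. (14) returns k = {k_from_eq14(a):.3f}")
```

For k = 1 the solver returns VCM,D = VCM,U = 0.5 × VDD, the symmetric case; a larger k pushes the two levels apart and lengthens the period for the same RC.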

Fig. 16
2 graph of V subscript C M normalized to V subscript D D, oscillating frequency normalized to case k = 1 and S subscript X Y versus K. It plots 2 decreasing trends and 2 decreasing slopes.

(a) The simulated VCM,U, VCM,D, and the oscillating frequency versus k. Choosing a k > 1 enables a lower (higher) VCM,D (VCM,U), facilitating the ULV operation. (b) The SXY from mathematical modeling and simulated 1/σjit from an ideal RxO with asymmetric RC network versus k. Overdesigning k decreases the SXY and thus aggravates σjit

The Sxy around the threshold crossing determines the σjit with the following equation [48]:

$$ {\sigma}_{jit}=\alpha \frac{V_{n, xy}}{S_{xy}}, $$
(15)

where 𝛼 is a constant of proportionality and Vn,xy is the equivalent noise from the RC network and the subsequent comparator appearing at its output. We can determine Sxy by solving for the difference between the derivative of VX and VY when t = T/2 (the time when crossing occurs),

$$ {S}_{\textrm{x}\textrm{y}}=\frac{d{V}_{\textrm{x},\textrm{y}}}{dt}\left(t=\frac{T}{2}\right). $$
(16)

For instance, in Ø2, VX and VY become

$$ {V}_X(t)=\left({V}_{\textrm{CM},\textrm{U}}+{V}_{\textrm{DD}}\right){e}^{-\frac{t}{\textrm{RC}}}, $$
(17)
$$ {V}_Y(t)=\left({V}_{\textrm{CM},\textrm{U}}-2{V}_{\textrm{DD}}\right){e}^{-\frac{t}{k\textrm{RC}}}+{V}_{\textrm{DD}}, $$
(18)

where we set t = 0 as the beginning of Ø2. Taking the derivative of VX with respect to t and substituting t = T/2, we can get

$$ \frac{d{V}_X}{dt}\left(t=\frac{T}{2}\right)=-\frac{1}{\textrm{RC}}\left({V}_{\textrm{CM},\textrm{U}}+{V}_{\textrm{DD}}\right){e}^{-\frac{T}{2\textrm{RC}}}, $$
(19)

and substituting Eq. (10) into Eq. (19):

$$ \frac{d{V}_X}{dt}\left(t=\frac{T}{2}\right)=-\frac{1}{\textrm{RC}}{V}_{\textrm{CM},\textrm{D}}. $$
(20)

Similarly, we can obtain the slope of VY at t = T/2:

$$ \frac{d{V}_Y}{dt}\left(t=\frac{T}{2}\right)=-\frac{1}{k\textrm{RC}}\left({V}_{\textrm{CM},\textrm{D}}-{V}_{\textrm{DD}}\right). $$
(21)

Then, Sxy in Ø2 is

$$ {S}_{xy}=-\frac{1}{\textrm{RC}}\left({V}_{\textrm{CM},\textrm{D}}-\frac{V_{\textrm{CM},\textrm{D}}}{k}+\frac{V_{\textrm{DD}}}{k}\right), $$
(22)

where we can find the relationship between VCM,D and k from Eq. (12). Note in Eq. (22) that when k = 1 (symmetric RC network as in [45]), Sxy = −VDD/RC, showing that a higher VDD improves Sxy and thus σjit. Figure 16b shows the Sxy as a function of k. Under the identical RC and VDD, increasing k results in decreasing Sxy. We can calculate Sxy similarly in Ø1; provided that T1 = T2, Sxy in Ø1 should be equivalent (in negative) to Sxy in Ø2.

Based on Fig. 16a, b, we can have the following takeaway: a large k allows VCM,U (VCM,D) to approach VDD (ground), easing the use of an NMOS (n-channel metal-oxide semiconductor) (PMOS [p-channel metal-oxide semiconductor])-input amplifier for comparisons. Yet, upsizing k penalizes σjit since σjit ∝ 1/Sxy. Besides, pushing VCM,U (VCM,D) close to VDD (ground) saturates the input pairs of the subsequent amplifiers. Hence, there is a trade-off between the minimum VDD and σjit for the RxO utilizing the asymmetric RC network. The minimum gate voltage at the NMOS-input amplifier is ~0.2 V (i.e., 0.1 V for the tail current source + 0.1 V for the gate-source voltages of the differential pair), and the minimum VDD of the comparator is ~0.35 V (explained in Sect. 3.3). To yield a minimum VCM,U of 0.2 V to drive the NMOS-input amplifier with 15% margin, we choose k = 2.4 such that VCM,U is 0.23 V (0.66 × VDD). During the fabrication, the mismatch between the resistors diverts VCM,U (VCM,D) from their desired values. Nevertheless, since k is the ratio between the resistors, we can minimize its variation through a delicate layout and a common centroid technique. This means that a 15% margin is adequate to safeguard the operation of the RxO. Correspondingly, we positioned VCM,D at 0.33 × VDD to favor the PMOS-input amplifier.

With k = 2.4 in Fig. 16b, Sxy reduces by 39%. To verify the degradation of σjit, we built an ideal RxO utilizing the asymmetric RC network with a noise source and simulated the σjit with different values of k. We juxtapose the simulated 1/σjit of such an RxO in Fig. 16b. As k increases, 1/σjit decreases (hence σjit increases) at a rate similar to that of Sxy. The 1/σjit at k = 2.4 decreases by 36%, thus verifying our analysis.
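The slope penalty can be reproduced from Eqs. (12) and (22) alone. The sketch below is a hedged illustration: `vcm_d_ratio` and `sxy_magnitude` are hypothetical helper names, and the slope is expressed as |Sxy| normalized to VDD/RC so that the symmetric case (k = 1) equals 1.

```python
import math


def vcm_d_ratio(k):
    """Solve Eq. (12) for x = VCM,D / VDD by bisection."""
    def residual(x):
        return k * math.log((1.0 - x) / (1.0 + x)) - math.log(x / (2.0 - x))

    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sxy_magnitude(k):
    """|Sxy| from Eq. (22), normalized to VDD / RC."""
    x = vcm_d_ratio(k)
    return x - x / k + 1.0 / k


reduction = 1.0 - sxy_magnitude(2.4) / sxy_magnitude(1.0)
print(f"Sxy reduction at k = 2.4: {100 * reduction:.1f} %")

# Sweep k to see the monotonic loss of slope.
for k in (1.0, 1.5, 2.0, 2.4, 3.0):
    print(f"k = {k:.1f}: |Sxy| = {sxy_magnitude(k):.3f} VDD/RC")
```

The computed reduction at k = 2.4 is about 39%, in line with the value read from Fig. 16b.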

3.3 Circuit Implementation

3.3.1 ULV Comparator with Dual-Path Amplifiers

In [45], the RxO utilizes an inverter-based amplifier for voltage comparison. Although this amplifier has excellent noise performance, it is not suitable for ULV operation as it requires a minimum voltage headroom of 2(VGS + VDS). We proposed the asymmetric RC network in Sect. 3.2 for ULV operations, where we can adjust the VCM,U (VCM,D) according to k. To cope with different VCM at two phases of oscillations under a ULV headroom, we utilize a comparator with dual-path amplifiers to handle the voltage comparisons across Vx,y. The comparator consists of an NMOS-input amplifier, a PMOS-input amplifier, and logic gates to generate the CLK signal. The NMOS-input amplifier, enabled in Ø1, is capable of handling a higher input VCM, where VX and VY cross at VCM,U, with the PMOS-input amplifier disabled. The complementary operation happens in Ø2. As such, both amplifiers can perform comparisons under the ULV headroom. When compared with the case using k = 1 and only a PMOS-input amplifier, the variation of the RxO’s oscillating period (TOSC) reduces by ~40%.

Figure 17a, b presents the proposed ULV RxO, with each amplifier built by cascading three gain stages, each formed by a fully differential common-source (CS) amplifier (Fig. 18a), to boost the overall voltage gain. The simulated gains of the cascaded amplifiers are >27 dB. Following the amplifiers, the logic gates generate the CLK signals and operate the chopper of the RC network after boosting to CLKH (explained below).

Fig. 17
2 circuits of U L V swing boosted R x O of asymmetric R C network and a dual path comparator with a chopper, P M O S input amplifier, and S R latch with logic gates with timing diagram.

(a) Proposed ULV swing-boosted RxO featuring an asymmetric RC network and a dual-path comparator. We track the delays of the amplifiers to tackle the frequency fluctuation against temperature and voltage variations. (b) Schematic of the logic gates. The SR latch, together with the delay unit, guarantees that the RxO only generates desired oscillating signal without glitch

Fig. 18
A schematic of differential C S amplifier with 6 M O S F E T, and C M F B circuit for N M O S C S amplifier with 8 M O S F E T along with V subscript out and ref.

(a) Schematic of the differential CS amplifier (NMOS). (b) CMFB circuit for the NMOS CS amplifier

Since we can adjust the VCM,U (VCM,D) of the RC network between VDD and ground by choosing an appropriate k, the main limitation for the minimum VDD of the RxO derives from two factors: the dual-path amplifier and the logic gates. Assuming all transistors biased in the subthreshold region with the gate voltages bounded between VDD and ground, the minimum VDD of the differential CS amplifier is VSD,1 + VDS,3 + VDS,5 (in Fig. 18a) if we assume the VDS-drop on M6, the transistor for power-gating, is negligible. To maintain operation in the subthreshold region, the |VDS| of a transistor should be >3 × VT, where VT is the thermal voltage. The VT reaches 34 mV at 120 °C. Hence, the minimum VDD of the differential CS amplifier is 306 mV in theory. We allow ~10% margin for the design and choose a VDD of 0.35 V. On the other hand, the necessary VDD for the logic gates to operate under the desired oscillating frequency also limits the minimum VDD. In the selected CMOS 28 nm process, the delay of the logic gates with VDD of 0.35 V varies <1% of TOSC from −20 to 120 °C, evincing that a VDD of 0.35 V is sufficient to power the logic gates.
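The headroom estimate above is a short calculation, sketched here with physical constants from the SI definition. The helper `thermal_voltage` and the variable names are ours; the 3 × VT criterion and the three-transistor stack are taken from the text.

```python
BOLTZMANN = 1.380649e-23             # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_c: float) -> float:
    """Thermal voltage kT/q in volts at a temperature given in degrees Celsius."""
    return BOLTZMANN * (temp_c + 273.15) / ELEMENTARY_CHARGE


vt_hot = thermal_voltage(120.0)

# Three stacked devices (VSD,1 + VDS,3 + VDS,5), each needing |VDS| > 3 * VT.
stacked_devices = 3
min_vdd = stacked_devices * 3.0 * vt_hot

print(f"VT at 120 C  = {vt_hot * 1e3:.1f} mV")
print(f"minimum VDD  = {min_vdd * 1e3:.0f} mV")
print(f"margin at 0.35 V = {100 * (0.35 / min_vdd - 1):.0f} %")
```

With VT rounded up to 34 mV the same product gives the 306 mV quoted in the text; the unrounded value is about 1 mV lower.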

The comparator’s delay (tdelay) affects the TOSC stability. As described later, a delay generator compensates for tdelay under different operating conditions. Here, we target a maximum Δtdelay ~ 25% of TOSC across −20 to 120 °C such that the resultant TOSC variation after compensation is <2.5%, reserving a 10% mismatch margin between tdelay and the delay generator. The simulated tdelay (N + P channel) ranges from 17 ns at 120 °C to 146 ns at −20 °C under a power consumption of 500 nW (at 27 °C), with a variation ~10% above the target.
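The delay budget can be checked with a few lines of arithmetic. This sketch assumes the 2.1 MHz oscillation frequency stated for this design to obtain TOSC; the variable names are illustrative.

```python
f_osc_hz = 2.1e6            # assumed nominal oscillation frequency of the RxO
t_osc = 1.0 / f_osc_hz      # oscillation period, about 476 ns

t_delay_hot = 17e-9         # simulated tdelay at 120 C
t_delay_cold = 146e-9       # simulated tdelay at -20 C
delta_t_delay = t_delay_cold - t_delay_hot

target_fraction = 0.25      # targeted maximum delta-tdelay as a fraction of TOSC
mismatch_margin = 0.10      # reserved mismatch between tdelay and the delay generator

actual_fraction = delta_t_delay / t_osc
residual_variation = target_fraction * mismatch_margin

print(f"delta tdelay = {delta_t_delay * 1e9:.0f} ns "
      f"= {100 * actual_fraction:.1f} % of TOSC")
print(f"excess over target = {100 * (actual_fraction / target_fraction - 1):.0f} %")
print(f"residual TOSC variation = {100 * residual_variation:.1f} %")
```

Under these assumptions the simulated spread is roughly 27% of TOSC, somewhat above the 25% target, and a 10% tracking mismatch on the targeted 25% leaves the 2.5% residual quoted above.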

The gate voltages of M3 and M4 determine the operating region of M5 (Fig. 18a). To guarantee M5 operates in the subthreshold region, VDS,5 needs to be higher than 3 × VT. We can either increase Vin,P (Vin,N), which is the RC network output for the first amplifier, by upsizing k or decrease the VGS of M3 and M4. As explained in Sect. 3.2, upsizing k deteriorates the σjit. On the other hand, under the same bias current and channel length, decreasing VGS incurs a wider M3(M4), thus exacerbating the tdelay and the RxO’s frequency stability. From the simulation, the amplifier’s delay rises by 26% with the VGS of M3(M4) reduced by 10 mV (with the width of M3(M4) enlarged). We aim for a VGS of 0.1 V for M3(M4) to achieve a proper trade-off between the tdelay and σjit.

Since each amplifier is only responsible for comparing Vx and Vy in one phase, we can have them power-gated based on the CLK state to reduce the power consumption. For instance, in Ø1 where CLK is high and the common-mode voltage of Vx and Vy is at VCM,U, we enable the NMOS-input amplifier for comparison, while powering down the PMOS-input amplifier. The operation reverses in Ø2. This duty-cycling scheme saves 26% of the total RxO power budget.

To ensure that M1 and M2 operate in the subthreshold region, a common-mode feedback (CMFB) circuit generates their gate voltages (Fig. 18b). The CMFB circuit compares the common-mode output voltage of the amplifier to Vref and corrects VFB. We scaled the transistors’ sizes of the CMFB circuit from the main amplifier such that the PVT variations have the same effect on the amplifier and CMFB circuit to enhance its robustness.

We utilized an SR latch to read the results from the amplifiers and yield the desired state of CLK. Also, we used a delayed CLK (\( \overline{\textrm{CLK}} \)) signal, CLKD (\( \overline{\textrm{CLK}_{\textrm{D}}} \)), to mask out the glitches from the amplifiers during the switching and avert undesired transitions of CLK. For instance, as illustrated in Fig. 17b, before the end of Ø1 (CLK and CLKD are high), both S and R of the SR latch are high and maintain the state of CLK. Therein, with the NMOS-input amplifier enabled, we disable the PMOS-input amplifier. Once VX > VY, R becomes low and S is still high (since \( \overline{\textrm{CLK}_{\textrm{D}}} \) is low), which forces CLK low. Then, the circuit enables the PMOS-input amplifier, while disabling the NMOS-input amplifier. During the switching of the amplifiers, we may have an undesired transition on Vout,N/Vout,P. The CLKD signal and the NAND gates guarantee that these undesired glitches do not affect the state of CLK. After a delay of τd, CLKD goes low. Both S and R are high again, and the SR latch maintains the state of CLK until Vout,P goes high (VX < VY). The operation repeats itself after another transition of CLK. A simple RC circuit and inverters with τd of ~80 ns implement the delay unit. We selected τd short enough to leave sufficient margin before the zero-crossing point of VXY without affecting the comparison, yet long enough to filter out the glitches from the amplifiers during the switching amid PVT variation.

A constant-gm bias circuit aids the amplifiers in withstanding voltage and temperature variations [49]. A switched-capacitor voltage doubler (Fig. 19a) powers the bias circuit, which extends the voltage headroom (2 × VDD ≈ 0.7 V). As we can reuse the CLK signal from the RxO itself to operate the voltage doubler, the power overhead (11%) is low. During the start-up, there is no CLK signal yet to drive the voltage doubler, and hence there would be no output from the bias circuit without any auxiliary signal. Thus, a start-up pulse (duration ~1 μs, generated on-chip after VDD rises) enables an auxiliary ring oscillator (RO) to operate the voltage doubler in this start-up phase (Fig. 19b, c). With the V2X boosted up to ~2 × VDD, the bias circuit functions properly within this period. Then, we disable the start-up pulse and the auxiliary RO, with the RxO starting to operate. In this way, the RO neither interferes with the RxO nor affects the accuracy of the RxO’s frequency. The RO’s frequency ranges from 15.2 to 35.1 MHz across −20–120 °C.

Fig. 19
A schematic of switched ultra-low voltage capacitor voltage doubler with 4 M O S F E T and a V subscript D D, auxiliary R O with voltage doubler, and their corresponding timing diagram.

(a) Schematic of the switched capacitor voltage doubler. (b) The auxiliary RO that drives the voltage doubler during the startup. (c) Timing diagram of the auxiliary RO and the voltage doubler

3.3.2 Delay Generators

The temperature dependency of tdelay affects RxO’s TOSC. Ideally, TOSC is only dependent on the RC network. However, the tdelay after the zero-crossings of Vx,y prolongs the duration of each phase. As tdelay is temperature-dependent, it deteriorates the RxO’s frequency stability. Raising the amplifiers’ power budget can diminish the ratio tdelay/TOSC, but it penalizes the RxO energy efficiency. In [42], a period controller compensates tdelay by doubling the current injected into the period-defining capacitors, in which the current injection duration tracks tdelay. As such, it can correct TOSC to minimize its temperature sensitivity. Yet, the period controller entails an extra comparator for copying tdelay, penalizing the power budget.

Since the delay of an amplifier relates to its bias current, we introduce a delay generator to create a pulse, with its width inversely proportional to the bias current. As demonstrated in Fig. 20a, two delay generators (for NMOS- and PMOS-input amplifiers) with scaled currents from the main amplifiers generate the pulses after the edges of CLKH. From the simulation, the width of the pulses ØF closely tracks tdelay (error <7.6% of tdelay or <2.3% of TOSC). To compensate tdelay, we halve the τ of the RC branches when ØFH = 1 by closing switches S1 and S2 in Fig. 17a. The open-loop compensation scheme alleviates the long settling time of the oscillator. Furthermore, this compensation method can even offset the temperature dependency of the resistors in the RC network, avoiding area-hungry composite resistors to obtain a zero temperature coefficient (TC) [42, 46].

Fig. 20
A circuit of delay generator with p and n channel along with its timing diagram. And 4 graphs of time versus temperature, amplitude versus time, t subscript D P, and D N versus code.

(a) Proposed delay generator to track the tdelay at different operating conditions and its timing diagram. (b) Matching between tdelay and tDN + tDP against temperature variation (under nominal case). (c) Principle of the delay compensation: when ØFH is high, τ of the RC branches halved thus Vx,y (dis)charge at a double rate to compensate tdelay. (d, e) The Monte Carlo-simulated tDP and tDN (100 runs) at 27 °C with different input codes for the capacitor banks

We implemented the delay-controlling capacitors CN and CP as four-bit capacitor banks, with their values programmed to balance the process variation once after fabrication. The design of the tuning ranges of the capacitances can cover the variations of tdelay amid process variations. The tdelay of NMOS-input and PMOS-input amplifiers vary from 15 to 45 ns and 36 to 60 ns, respectively, from the Monte Carlo simulation (100 runs, at 27 °C). Consequently, we design the delay generator and the capacitor banks capable of generating pulses of width in this range by adjusting their codes correspondingly (Fig. 20d, e). With the proposed compensation scheme, the simulated variation of TOSC decreases from 25% to 2.1% over −20–120 °C. For the constant-gm biasing, the current decreases with temperature. Hence, both IBN and IBP, the biasing currents of the NMOS-input and PMOS-input amplifiers, are minimum at −20 °C. Consequently, the tDN and tDP are largest at −20 °C and decrease to their minimum toward 120 °C. Therefore, the resolutions of tDN and tDP are coarsest at low temperature (7 ns and 13 ns). Still, these resolutions are sufficient to uphold the 2.5% frequency error requirement. In case a finer resolution is necessary, the number of bits of the capacitor banks can increase.

3.3.3 CLK Boosters

The non-idealities of the switches influence the performance of the RxO. For example, the nonzero on-resistances (RON) of the transistors that constitute switches S1–6 (in Fig. 17a) affect the τ of the RC network. Under sub-0.5 V, the transistors work in the subthreshold region. The problem arises because RON increases exponentially with –(VGS − VTH), where the worst case of |VGS| is 0.5 × VDD without any boosting technique. Further, as RON is prone to temperature variations (RON increases with a decreasing temperature), it inevitably affects the frequency stability of the RxO. To alleviate the impact, we should minimize RON in comparison with R in the RC network. One possibility is reducing RON by upscaling the widths of the transistors that compose the switches. Yet, this act leads to another problem: in the deep submicron CMOS process, the ILeak in the off-state, especially at high temperature, restricts the RxO’s performance and operation range. Considering the switches S1–2 in Fig. 17a again, at high temperature, the transistors with high ILeak equivalently reduce τ. Altogether, there is a trade-off between their RON at low temperature and ILeak at high temperature.

To tackle this challenge, we employ clock boosters [50] to triple the swing of the digital signals (CLKH, \( \overline{{\textrm{CLK}}_{\textrm{H}}} \), and ØFH). The clock booster, powered from VDD, increases the swing of the periodic signal (high, 2 × VDD; low, –VDD) without additional power supply. With a boosted swing, the worst |VGS| for the transistors now becomes 1.5 × VDD. Besides, benefitting from the negative voltage (–VDD) at the logic low level, it effectively suppresses ILeak, even at 120 °C. For example, this scheme not only tightens the variations of the RON of an NMOS switch across −20–120 °C by 8600× (VD = VS = 0.5 × VDD, Fig. 21a) but also shrinks ILeak in the off-state at 120 °C from 307 to 0.8 nA (Fig. 21b), rendering the RxO robust in an extreme environment.
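The gate-drive figures follow directly from the boosted logic levels. The sketch below works them out for an NMOS switch whose source sits at 0.5 × VDD, the bias point used in Fig. 21a; the variable names are ours.

```python
VDD = 0.35  # supply voltage in volts

# Logic levels of the gate signal without and with the clock booster.
plain_high, plain_low = VDD, 0.0
boost_high, boost_low = 2.0 * VDD, -VDD

v_source = 0.5 * VDD  # NMOS switch terminals biased at mid-supply

vgs_on_plain = plain_high - v_source    # on-state drive without boosting
vgs_on_boosted = boost_high - v_source  # on-state drive with boosting
swing_ratio = (boost_high - boost_low) / (plain_high - plain_low)

print(f"unboosted on-state VGS = {vgs_on_plain:.3f} V ({vgs_on_plain / VDD:.1f} x VDD)")
print(f"boosted on-state VGS   = {vgs_on_boosted:.3f} V ({vgs_on_boosted / VDD:.1f} x VDD)")
print(f"swing ratio            = {swing_ratio:.1f}")
```

The on-state overdrive grows from 0.5 × VDD to 1.5 × VDD, while the off-state gate sits a full VDD below ground, which is what suppresses the leakage.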

Fig. 21
2 graphs of R subscript O N and drain current versus temperature. Each plot has an increasing and decreasing trend for V subscript G as 0.35 and 0.7 Volt.

(a) RON of an NMOS from −20 to 120 °C with different VG. For both cases, VD = VS = 0.175 V. The increased swing on VG reduces the variations of RON by 8600×. (b) ILeak of the same NMOS in (a) in the off-state. With a negative VG, the ILeak reduces by 389× at 120 °C. For both cases, VD = 0.35 V and VS = 0 V

3.4 Measurement Results

We fabricated a prototype of the RxO in 28 nm CMOS 1P10M technology. It occupied a core area of 5200 μm2, dominated by the comparator (28%) and RC network (26%) (Fig. 22a, b). The RxO consumed 1.4 μW at 22 °C on average (N = 7) (Fig. 23a, b), where the comparator (49%, from simulation) dominates (Fig. 22c). After the fabrication, we applied a three-point trim to the capacitor banks of the delay generator based on the measured frequency of the RxO.

Fig. 22
A photograph of the chip of length and breadth of 85 and 61 micrometers with R C network, delay generator, C L K booster, and comparator along with the pie chart plots the area and power breakdown.

(a) Chip micrograph of the fabricated RxO in 28 nm CMOS. (b) Area breakdown of the RxO. (c) Power breakdown of the RxO (from simulation)

Fig. 23
4 graph of power versus temperature and voltage, frequency deviation versus temperature and voltage. Each plot has 6 fluctuating trends.

Measured performance of the RxO from seven chip samples. (a) Power consumption versus temperature. (b) Power consumption versus VDD. (c) Frequency stability versus temperature. (d) Frequency stability versus VDD

Peripheral equipment such as the oscilloscope (for observing the waveform in real time) and the frequency counter (for measuring the frequency f) have high input capacitances. The digital buffers with a VDD of 0.35 V and reasonable sizing are not capable of driving this equipment. Thus, we utilize on-chip level shifters to raise the output signals to a swing of 0.9 V. Afterward, we feed such signals to digital buffers with a VDD of 0.9 V (supplied independently of the RxO’s VDD) to drive the peripheral equipment.

The mean oscillating frequency of the RxO is 2.1 MHz. It has an energy efficiency of 667 fJ/cycle, rendering it the most energy-efficient RxO reported in the MHz-range. After calibrations, the deviations of the RxOs’ frequencies are <2.5% from −20 to 120 °C (Fig. 23c). The resulting TC is 158 ppm/°C on average. The mean variation of the RxO’s frequencies from 0.35 to 0.38 V (~9% of VDD) is 2.5% (Fig. 23d). The line sensitivity, where we also take the supply voltage into account \( \left[\left(\frac{\Delta f}{f}\right)/\left(\frac{\Delta V}{V}\right)\right] \), is 26.8%. The large sensitivity of the RxO to voltage variation is attributable to the subthreshold operation and low VDS across the transistors of the amplifiers. From the simulation, the bias current of the NMOS-input amplifier increases by 25% from 0.35 to 0.38 V, hence affecting the tdelay and the RxO’s frequency. Still, the 0.35–0.38 V range is sufficient for IoT devices powered by solar cells and installed in the typical indoor environment (e.g., home and office), as the open-circuit voltage of a solar cell varies 30 mV amid a change in light intensity of ~3× [51, 52]. If we relax the requirement on frequency stability, or if recalibration of the frequency at different VDD is feasible, the working range of the RxO can extend up to 0.5 V, where the breakdown voltage of the CMOS process (1 V) becomes the limit because of the voltage doubler and clock booster.
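The energy-per-cycle figure is simply the average power divided by the oscillation frequency. A minimal check, with illustrative variable names:

```python
power_w = 1.4e-6   # measured average power at 22 C
f_osc_hz = 2.1e6   # mean oscillating frequency

energy_per_cycle_j = power_w / f_osc_hz
print(f"{energy_per_cycle_j * 1e15:.0f} fJ/cycle")  # 667 fJ/cycle
```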

The RMS period jitter of the RxO is 800 ps (0.15% of TOSC) (Fig. 24a). The accumulated jitter increases at a rate of √N up to ~60 cycles, in which the thermal noise is the dominant noise source (Fig. 24b). When compared with [45], the high period jitter is attributable to the low supply voltage, low power, and different amplifiers handling the comparison in Ø1 and Ø2. Still, the RxO is appropriate for the devices in which ULV and ultra-low power are the priorities (e.g., wakeup receiver [35]). The long-term stability is 210 ppm (gating time >0.1 s). To characterize the supply noise rejection of the RxO, we superimpose a sinusoidal signal on VDD and measure the corresponding period jitter. In the presence of a 20 mVpp sinusoidal signal (1 kHz) at the supply, the period jitter of the RxO exhibits a value of 2 ns.

Fig. 24
2 graph of occurrences versus period error plots a conic shape wave, and accumulated jitter versus the number of cycles plots 2 increasing trends.

(a) Measured period jitter of the RxO (52,000 hits on the oscilloscope). (b) Accumulated jitter of the RxO

We also characterize the startup time of the RxO, which is crucial if the RxO is power-gated to further suppress the power consumption of the IoT node. As the asymmetric RC network requires a finite number of clock cycles to produce a consistent output signal, the RxO’s frequency settles after the third clock pulse (Fig. 25a, b). Over the entire temperature range, the RxO enters the steady state within 3.6 μs after enabling VDD (Fig. 25c).

Fig. 25
3 graphs of voltage versus time plots the chaotic fluctuating trends, frequency versus time plots an increasing trend, and start-up time versus temperature plots a decreasing trend.

(a) Startup waveform of the RxO, with VDD switched on at t = 0 s. (b) Transient frequency during startup. The RxO reaches steady state within three clock cycles or 3.6 μs after enabling VDD. (c) The startup time of the RxO at different temperatures

Herein we benchmark the RxO using two FoM. First, we evaluated the RxO using the FoM proposed in [44]

$$ \textrm{Fo}{\textrm{M}}_1=10\log \left(\frac{f\bullet {T}_{\textrm{range}}}{\textrm{Power}\bullet \textrm{TC}}\right), $$
(23)

with the temperature range Trange. This FoM takes into account the trade-off among f, power, Trange, and TC. The FoM1 of the RxO is 181 dB, which is comparable to the state of the art in spite of the ULV VDD of 0.35 V. Then, we evaluated the RxO using the conventional FoM:

$$ \mathrm{FoM}_2 = \mathrm{PN} - 20\log\left(\frac{f}{f_{\mathrm{offset}}}\right) + 10\log\left(\frac{\mathrm{Power}}{1\ \mathrm{mW}}\right), $$
(24)

where PN is the phase noise at the offset frequency foffset from the carrier. The PN of the RxO at a 10 kHz offset is −68.4 dBc/Hz, resulting in an FoM2 of −143.4 dBc/Hz.
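As a quick numerical check of Eq. (24), the sketch below evaluates FoM2 from the figures quoted in this chapter. It assumes that the power can be taken as the energy per cycle (667 fJ/cycle) multiplied by the oscillation frequency (2.1 MHz), i.e., about 1.4 μW; the function names are illustrative only.

```python
import math

def power_from_energy_per_cycle(energy_j: float, freq_hz: float) -> float:
    """Average power in watts, assuming power = energy per cycle x frequency."""
    return energy_j * freq_hz

def fom2_dbc_hz(pn_dbc_hz: float, freq_hz: float,
                offset_hz: float, power_w: float) -> float:
    """Phase-noise FoM of Eq. (24): PN - 20log(f/f_offset) + 10log(P / 1 mW)."""
    return (pn_dbc_hz
            - 20.0 * math.log10(freq_hz / offset_hz)
            + 10.0 * math.log10(power_w / 1e-3))

# Values quoted in the text.
FREQ_HZ = 2.1e6             # oscillation frequency
OFFSET_HZ = 10e3            # offset frequency for the phase-noise reading
PN_DBC_HZ = -68.4           # phase noise at the 10 kHz offset
ENERGY_PER_CYCLE_J = 667e-15

power_w = power_from_energy_per_cycle(ENERGY_PER_CYCLE_J, FREQ_HZ)
fom2 = fom2_dbc_hz(PN_DBC_HZ, FREQ_HZ, OFFSET_HZ, power_w)

print(f"Power ~ {power_w * 1e6:.2f} uW")   # ~1.40 uW
print(f"FoM2  ~ {fom2:.1f} dBc/Hz")        # ~-143.4 dBc/Hz
```

With these inputs, the frequency term contributes about −46.4 dB and the power term about −28.5 dB, which together with the measured PN reproduces the reported −143.4 dBc/Hz.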

Table 3 summarizes the performance of the RxO and compares it with the recent art. This work is the first sub-0.5 V temperature-resilient (<2.5%) RxO achieving a high power efficiency of 667 fJ/cycle (Fig. 26). When compared with the RxO with a symmetric swing-boosted RC network [45], this RxO operates at a 4× lower VDD while achieving a comparable TC after compensation.

Table 3 Performance summary and comparison with the state-of-the-art RxOs

Fig. 26
Comparison with state-of-the-art fully integrated oscillators (energy efficiency versus supply voltage). Red circle, relaxation oscillator; blue circle, frequency-locked-loop type oscillator. A larger circle implies a relatively higher oscillating frequency. The figure only shows selected oscillators with frequencies between 0.1 and 10 MHz

4 Conclusions

This chapter detailed the analysis and design of two ULV MHz-range clock references for different purposes. Both were implemented and taped out in deep-submicron CMOS and validated by measurement. The first is a regulation-free sub-0.5 V XO for energy-harvesting BLE radios. We introduced two circuit techniques, dual-mode gm and SSCI, to reduce the startup time ts and energy Es. The dual-mode gm exploits the inductive feature of the three-stage gm (AXO-3) to counteract the crystal’s CS during startup and the low-noise feature of the one-stage gm (AXO-1) to preserve the PN in the steady state. The XO prototyped in 65 nm CMOS has a compact area (0.023 mm²) that is >3.1× smaller than the prior art. The measured ts and Es of the XO, with a 24 MHz crystal, are 400 μs and 14.2 nJ, respectively. The frequency stability is 17.9 ppm against voltage (0.3–0.5 V) and 14.1 ppm against temperature (−40 to 90 °C); both conform to the BLE standard.

The second clock reference is a 2.1 MHz temperature-resilient RxO with a 0.35 V supply voltage for ultra-low-power IoT nodes. We jointly designed an asymmetric swing-boosted RC network and a dual-path comparator to tackle the challenges of ULV (<0.5 V) operation. The open-loop delay generator compensates for the temperature-sensitive delay of the comparator. Fabricated in 28 nm CMOS, the RxO has an active area of only 5200 μm² and achieves the best energy efficiency, 667 fJ/cycle, among the previously reported MHz-range RxOs. Further, it has a high figure of merit of 181 dB despite the ULV headroom and settles within 3.6 μs after enabling the supply voltage.