ACSP · Analog Circuits And Signal Processing

# Rui Paulo da Silva Martins Pui-In Mak *Editors*

# Analog and Mixed-Signal Circuits in Nanoscale CMOS



# **Analog Circuits and Signal Processing**

#### Series Editors

Mohammed Ismail, Khalifa University, Dublin, OH, USA

Mohamad Sawan, Westlake University, Hangzhou, Zhejiang, China

The Analog Circuits and Signal Processing book series, formerly known as the Kluwer International Series in Engineering and Computer Science, is a high level academic and professional series publishing research on the design and applications of analog integrated circuits and signal processing circuits and systems. Typically per year we publish between 5-15 research monographs, professional books, handbooks, and edited volumes with worldwide distribution to engineers, researchers, educators, and libraries. The book series promotes and expedites the dissemination of new research results and tutorial views in the analog field. There is an exciting and large volume of research activity in the field worldwide. Researchers are striving to bridge the gap between classical analog work and recent advances in very large scale integration (VLSI) technologies with improved analog capabilities. Analog VLSI has been recognized as a major technology for future information processing. Analog work is showing signs of dramatic changes with emphasis on interdisciplinary research efforts combining device/circuit/technology issues. Consequently, new design concepts, strategies and design tools are being unveiled.
Topics of interest include: Analog Interface Circuits and Systems; Data converters; Active-RC, switched-capacitor and continuous-time integrated filters; Mixed analog/digital VLSI; Simulation and modeling, mixed-mode simulation; Analog nonlinear and computational circuits and signal processing; Analog Artificial Neural Networks/Artificial Intelligence; Current-mode Signal Processing; Computer-Aided Design (CAD) tools; Analog Design in emerging technologies (Scalable CMOS, BiCMOS, GaAs, heterojunction and floating gate technologies, etc.); Analog Design for Test; Integrated sensors and actuators; Analog Design Automation/Knowledge-based Systems; Analog VLSI cell libraries; Analog product development; RF Front ends, Wireless communications and Microwave Circuits; Analog behavioral modeling, Analog HDL.




*Editors* Rui Paulo da Silva Martins University of Macau Macao, China

Pui-In Mak University of Macau Macao, China

ISSN 1872-082X ISSN 2197-1854 (electronic)

Analog Circuits and Signal Processing

ISBN 978-3-031-22230-6 ISBN 978-3-031-22231-3 (eBook)

https://doi.org/10.1007/978-3-031-22231-3

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG

The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To all our Families.

### Foreword

The world is changing rapidly, and the speed of change depends on modern electronic technologies and their fast advancements. Electronics, or microelectronics products, now govern all modern devices and apparatuses. After the microprocessor's invention, the continuous reduction in size of transistors until almost the physical limit mainly favored digital electronics. Subordinately, there was a development of disciplines that refer to analog electronics. Analog developments have had to adapt to limitations imposed by technological advancements but have also exploited the advantages offered by new technologies.

If we consider the engineering design aspect, we see that there is a substantial difference between digital and analog design. While digital design, which allows a high level of automation, requires innovation at the architectural level, analog design requires mental processing and inventiveness at a much lower complexity, reaching the transistor and layout levels. Indeed, different applications require the use of specific design techniques and strategies. Moreover, since analog functions of a very different nature must operate on the same chip, as happens for the so-called system on chip (SoC), the designers must have a broad spectrum of knowledge while possessing a specific familiarity with their sector of activity.

Market demands have brought requests for various analog engineering skills. Analog engineers must know how to convert analog to digital, make circuits that reach microwave frequencies, obtain amplification with extremely low noise, and handle power at very high or ultra-low levels.

For the above, research laboratories with increasing capability and specific competencies have been born. These laboratories of excellence produce state-of-the-art results, also presented at primary conferences, and published in prestigious scientific journals. More important, their activity indicates the path of innovation in industries. The State Key Lab of Analog and Mixed Signal VLSI of Macau is an example of a scientific reality born and grown to a top level in a few years.

Transferring knowledge from the research and development laboratories to the productive world is a fundamental step of innovation. Unfortunately, advanced knowledge is not available on the shelf and is often tacit (i.e., owned by the researcher and not codified), hence the vital necessity for methods that facilitate the process. In the digital field, the need is less felt for two reasons. First, a few industrial players dominate the market and have robust internal R&D activity. Second, the architectural studies for realizing digital processing functions with billions of transistors can be codified and studied using a hierarchical approach, while design automation supports the project phases at a lower level.

Conversely, the transfer of analog knowledge is problematic. Textbooks provide essential knowledge, and scientific articles summarize advanced knowledge. The status suggests that the activity of an analog designer is like that of a craftsman (or an artist) who uses their own experience (or inventiveness). There is a gap between basic and advanced knowledge, and overcoming it isn't easy.

The knowledge gained from studying textbooks does not go into detail and does not teach the "tricks" that are fundamental to the success of an advanced design. Scientific articles often miss relevant features, and the used wording is for an already experienced audience. Also, they use procedures and research methodologies that only the specialized community often understands. Furthermore, the mathematical foundations are frequently believed to be known and referred to other publications, indicated as bibliographic references.

All this constitutes a barrier to an effective transfer of analog knowledge. How to fill the gap? The procedure is complex because it requires harmonizing the role and needs of two interacting realities, to increase the visibility of research centers, and to identify stimulating research topics. It also needs to collaborate on medium- and long-term issues and educate future engineers and researchers capable of conveying the tacit knowledge accumulated during the study and research periods.

The first essential element for the complex process indicated above is the present book which, as well illustrated in the introduction, describes in suitable detail the knowledge developed in various fields of microelectronics by many scientists of a top research center. The book, other than the desire to "fill the gap," favors the cooperation between research and production.

Pavia, Italy Macao, SAR, China Franco Maloberti

## Acknowledgment

All the research works have been funded by the University of Macau and partially, under different projects, by the Science and Technology Development Fund (FDCT), Macao SAR, China.

We would like to thank Kass Chow for her precious work in the formatting and editing of the text.

## Introduction

General purpose integrated circuits (ICs) emerged in the early 1960s, and application-specific integrated circuits (ASICs) gained traction in the IC market during the 1980s and have strongly influenced the world of ICs ever since. ASICs are ICs customized for a particular application or end use; they are responsible for expanding the semiconductor industry, changing its business model, and significantly increasing the number of IC designs and the job opportunities, especially for analog design engineers. ASICs also influenced the whole ecosystem of semiconductor system design, fabrication, manufacturing, testing, and packaging, as well as the CAD tools. They are completely different from standard ICs such as microprocessors or memories, which are designed for a wide range of applications. Analog and mixed-signal ASICs, in turn, contain both analog and digital circuits, as well as a key building block, the data converter, on the same chip. Their design gives engineers great potential to reduce the number of IC chips in complex systems, minimize costs, protect intellectual property, improve reliability and performance, increase miniaturization, and bring down the power consumption. Real-life applications include smart mobile phones, sensor systems with on-chip standard digital interfaces, voice-related signal processing, charge controllers for lithium-ion batteries, unmanned aerial vehicles (or drones), automotive and other electric vehicles, aerospace electronics, and the fast-developing Internet of Everything (IoE). All this operates in a global network that involves communications among the users and the whole universe of around 50 billion electronic gadgets worldwide. Essential in such an electronics infrastructure are the brain (CPU), the memory, and the senses (analog/digital interface with audio, vision, and sensors), because a brain does not work without a sensing system.
This ubiquitous network operates with data acquired from analog sources, thus connecting two different realities, the analog (physical/real) and the digital (metaverse/virtual) worlds. Since the interface between the two realms plays with analog signals, the most critical building blocks are high-performance radios, power-efficient RF and mm-wave circuits, ultra-low-voltage clock references, low-power and high-performance data converters, integrated energy harvesting interfaces, fully integrated power converters, and low-dropout regulators.

All these circuits need to exhibit high-quality performance with low power consumption, high energy efficiency, and high speed, thus enabling a reliable and consistent development of the IoE while enlarging its frontiers. Since the total market value of the A/D interface is in excess of 20 billion USD per year, this imposes a huge pressure on the design area, with a high demand for analog design engineers. Although this opens a vast field of opportunities for those engineers, it also constitutes a huge challenge for them because it implies a continuous knowledge update to keep up with the fast pace of IC technology development down to the nanometer scale in CMOS. Mixed-signal ASIC design offers engineers the possibility of putting their creativity into practice to come up with innovative solutions. The main objective of this book is therefore to present state-of-the-art designs, all based on material from the two top electronics publishing outlets, the IEEE International Solid-State Circuits Conference (ISSCC) and the IEEE Journal of Solid-State Circuits (JSSC), that suit the applications referred to above and are appropriate to stimulate and equip the minds of future skillful analog design engineers. The author list of this book also reveals a high quality of international collaboration.

This book includes eight chapters organized as follows. Chapter "High-Performance SAW-Less TDD/FDD RF Front-Ends" presents high-performance radio frequency (RF) transceiver (TXR) front-ends, describing first a SAW-less TXR for multiband TDD communications and next a fully integrated multiband FDD SAW-less transmitter (TX) for 5G New Radio (5G-NR). Chapter "Power-Efficient RF and mm-Wave VCOs/PLL" addresses power-efficient RF and mm-wave VCOs, presenting different techniques that enhance the performance of the oscillator, as well as introducing reference-spur-reduction techniques for the subsampling PLL.
Chapter "Ultra-Low-Voltage Clock References" puts forward designs and measurement results of two ultra-low-voltage clock references in deep-submicron silicon processes, introducing a regulation-free sub-0.5 V crystal oscillator for energy-harvesting Bluetooth Low Energy (BLE), and also demonstrating a fully integrated 0.35-V temperature-resilient relaxation oscillator using an asymmetric swing-boosted RC network. Chapters "Low-Power Nyquist ADCs" and "High-Performance Oversampling ADCs" present two types of analog-to-digital converters, namely low-power Nyquist and high-performance oversampling, respectively. Chapter "Low-Power Nyquist ADCs", in particular, starts by describing two high-performance pipelined ADCs, a 12-bit SAR-assisted three-stage pipelined ADC with an open-loop Gm-R-based residue amplifier running at 1 GS/s, and a 3.3-GS/s 6-bit fully dynamic pipelined ADC using a linearized dynamic amplifier. It also presents two time-domain ADCs that differ from conventional voltage-domain ADCs: the first is a 13-bit hybrid ADC that combines a SAR ADC with a time-to-digital converter (TDC), and the second is an 8-bit 10-GS/s time-domain ADC that aggregates four time-interleaved channels. Chapter "High-Performance Oversampling ADCs" initially describes a sturdy multi-stage noise-shaping (MASH) continuous-time (CT) delta-sigma modulator (DSM), followed by the analysis of the preliminary sample and quantization technique, and finishes with two different noise-shaping pipeline-SAR ADCs. Chapter "Integrated Energy Harvesting Interfaces" introduces different switched-capacitor (SC) power converters for several AC-type and DC-type energy-harvesting interfaces that allow full integration and target high system efficiency and small size for the highly miniaturized Internet of Things (IoT).
Chapter "Fully Integrated Switched-Capacitor Power Converters" presents fully integrated switched-capacitor power converters, discussing the topology, analyzing the power conversion losses, introducing techniques to reduce gate-drive switching and parasitic losses, and comparing centralized and distributive clock generation methods for multiphase SC converters. It also presents practical design examples of an SC converter ring and a multi-output SC converter. Finally, chapter "Hybrid Architectures and Controllers for Low-Dropout Regulators" covers low-dropout regulators (LDOs), introducing classic LDO control methods and power stage selection; it also details examples of analog-assisted and hybrid-control digital LDOs, as well as an ampere-level switching LDO for high-performance multi-core processors.

Rui Paulo da Silva Martins On leave from Instituto Superior Técnico Universidade de Lisboa Lisbon, Portugal

State-Key Laboratory of Analog and Mixed-Signal VLSI / Institute of Microelectronics (IME) Faculty of Science and Technology – Department of Electrical and Computer Engineering University of Macau Macao, SAR, China

Pui-In Mak State-Key Laboratory of Analog and Mixed-Signal VLSI / Institute of Microelectronics (IME) Faculty of Science and Technology – Department of Electrical and Computer Engineering University of Macau Macao, SAR, China

# Contents

| Chapter | Authors |
|---|---|
| **Part I Radio Front-Ends and Clock References** | |
| High-Performance SAW-Less TDD/FDD RF Front-Ends | Gengzhen Qi, Pui-In Mak, and Rui P. Martins |
| Power-Efficient RF and mm-Wave VCOs/PLL | |
| Ultra-Low-Voltage Clock References | |
| **Part II Data Converters** | |
| Low-Power Nyquist ADCs | |
| High-Performance Oversampling ADCs | Chi-Hang Chan, Yan Zhu, Liang Qi, Sai Weng Sin, Maurits Ortmanns, and Rui P. Martins |
| **Part III Energy Harvesters and Power Converters** | |
| Integrated Energy Harvesting Interfaces | |
| Fully Integrated Switched-Capacitor Power Converters | Junmin Jiang, Yan Lu, Wing-Hung Ki, and Rui P. Martins |
| Hybrid Architectures and Controllers for Low-Dropout Regulators | Xiangyu Mao, Mo Huang, Yan Lu, and Rui P. Martins |
| Index | |

# Part I Radio Front-Ends and Clock References

# High-Performance SAW-Less TDD/FDD RF Front-Ends



Gengzhen Qi, Pui-In Mak, and Rui P. Martins

G. Qi (🖂)

School of Microelectronics Science and Technology, Sun Yat-sen University, Guangdong, China

State-Key Laboratory of Analog and Mixed-Signal VLSI/IME and FST-ECE, University of Macau, Macao, SAR, China e-mail: qigzh@mail.sysu.edu.cn

P.-I. Mak

State-Key Laboratory of Analog and Mixed-Signal VLSI/IME and FST-ECE, University of Macau, Macao, SAR, China e-mail: pimak@um.edu.mo

R. P. Martins

State-Key Laboratory of Analog and Mixed-Signal VLSI/IME and FST-ECE, University of Macau, Macao, SAR, China

On leave from Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal e-mail: rmartins@um.edu.mo

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Paulo da Silva Martins, P.-I. Mak (eds.), *Analog and Mixed-Signal Circuits in Nanoscale CMOS*, Analog Circuits and Signal Processing, https://doi.org/10.1007/978-3-031-22231-3\_1

#### 1 Introduction

In this chapter, we introduce two high-performance radio frequency (RF) transceiver (TXR) front-ends. The first is a SAW-less TXR for multiband TDD communications that employs a novel N-path switched capacitor (SC) gain loop; the second is a fully integrated multiband FDD SAW-less transmitter (TX) for 5G new radio (5G-NR).

To achieve an area-efficient SAW-less wireless TXR for multiband TDD communications, we propose an N-path SC gain loop. Unlike the direct-conversion transmitter (TX: BB filter $\rightarrow$ I/Q modulation $\rightarrow$ PA driver) and receiver (RX: LNA $\rightarrow$ I/Q demodulation $\rightarrow$ BB filter), which arrange these functions in an open-loop style, here we unify the signal amplification, bandpass filtering, and I/Q (de)modulation in a closed-loop formation that is reconfigurable as a TX or RX with an LO-defined center frequency. The key advantages are the multiband operation capability in the TX mode and high resilience to OB blockers in the RX mode. Fabricated in 65-nm CMOS, the TXR prototype consumes up to 38.4 mW (20 mW) in the TX (RX) mode at the 1.88-GHz LTE band2. The LO-defined center frequency covers >80% of the TDD-LTE bands with neither on-chip inductors nor external input matching components. By properly injecting (extracting) the signals into (from) the N-path SC gain loop, the TX mode reaches a -1-dBm output power, a -40-dBc ACLR<sub>EUTRA1</sub>, and a 2.0% EVM at 1.88 GHz, while showing a -154.5-dBc/Hz OB noise at 80-MHz offset. In the RX mode, we measured a 3.2-dB NF and a +8-dBm OB-IIP<sub>3</sub>. The active area (0.038 mm<sup>2</sup>) of the TXR is 24 times smaller than the state-of-the-art LTE solutions.

Besides that, we also report a fully integrated multiband FDD SAW-less TX for 5G-NR. It features the following: (1) a bandwidth-extended N-path filter modulator (BW-Ext FIL-MOD) to enable high-Q bandpass filtering at a flexible RF, with its bandpass characteristic enhanced through the synthesis of a complex pole pair via merging positive and negative feedback networks (PFN/NFN), thus surmounting the trade-off between the passband flatness and out-of-band (OB) rejection; (2) an isolated baseband (BB) input network to avoid the mutual loading effect between the BW-Ext FIL-MOD and itself; and (3) a transimpedance amplifier (TIA)-based power amplifier driver (PAD) to absorb both the bias and signal currents of the BW-Ext FIL-MOD for better linearity and power efficiency. Fabricated in 28-nm CMOS, the TX manifests a 20-MHz passband BW and a consistently low OB noise ($\leq -157.5 \text{ dBc/Hz}$) for different 5G-NR bands between 1.4 and 2.7 GHz. The circuit exhibits sufficient output power (3 dBm) and high TX efficiency (2.8–3.6%) concurrently with high linearity (ACLR<sub>1</sub> < -44 dBc and EVM < 2%). The active area is 0.31 mm<sup>2</sup>.
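As a quick sanity check on the figures quoted above, the following minimal Python sketch converts the 3-dBm output power to milliwatts and derives the DC power implied by the 2.8–3.6% efficiency range. It assumes that TX efficiency means output power divided by total DC power, which the text does not define; the function names are illustrative only.

```python
def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def implied_dc_power_mw(p_out_dbm: float, efficiency: float) -> float:
    """DC power implied by an output power and an efficiency.

    Assumption (not stated in the text): efficiency = P_out / P_DC.
    """
    return dbm_to_mw(p_out_dbm) / efficiency


# Figures quoted for the 28-nm 5G-NR TX: 3 dBm output, 2.8-3.6% efficiency.
p_out_mw = dbm_to_mw(3.0)
p_dc_at_low_eff = implied_dc_power_mw(3.0, 0.028)
p_dc_at_high_eff = implied_dc_power_mw(3.0, 0.036)

print(f"Output power: {p_out_mw:.2f} mW")
print(f"Implied DC power: {p_dc_at_high_eff:.1f} to {p_dc_at_low_eff:.1f} mW")
```

Under that assumption the implied DC power falls in the tens of milliwatts; the measured power consumption reported with the measurement results remains the authoritative figure.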

The organization of this chapter is the following:

Section 2 introduces an N-path SC gain loop as a SAW-less TXR for multiband TDD communications. Section 2.1 introduces the properties of the SC gain loop that make it promising as a reconfigurable TXR. Section 2.2 describes how it is possible to embed the gain-boosted N-path technique into the SC gain loop to build the SAW-less TX mode. In Sect. 2.3, we report the SAW-less RX mode utilizing a switched BB extraction technique to further improve the NF. Section 2.4 describes the LO (local oscillator) generator (LOGEN) with the TX-/RX-mode control logic, followed by the measurement results in Sect. 2.5. Finally, Sect. 2.6 draws the conclusions.

Section 3 introduces the fully integrated multiband FDD SAW-less TX for 5G-NR. Section 3.1 introduces the BW-Ext FIL-MOD and compares it with the original FIL-MOD. Section 3.2 details the TX design and analyzes it using both the functional view approach and an LTI-based transfer function; it also details the isolated BB input network, the wideband TIA-based PAD, and the low-power LOGEN. Section 3.3 summarizes the measurement results and Sect. 3.4 presents the conclusions.

#### 2 SAW-Less Multiband Transceiver Using an N-Path SC Gain Loop

In order to develop multiband cellular radios at low cost, on-chip N-path switched capacitor (SC) filters [1] have re-emerged as a promising replacement for the off-chip SAW filters. The higher speed and lower parasitics of ultra-scaled CMOS technologies enable the N-path SC filters to provide tunable high-Q filtering over a wide range of frequencies [2, 3]. Beyond filtering, N-path mixing also helps wideband receivers (RXs) achieve input matching and harmonic rejection [4-6]. The mixer-first wideband RX [4] shows a high out-of-band (OB)-IIP<sub>3</sub> (+25 dBm) while covering a wide RF range (0.1–2.4 GHz), but the NF (5 dB) becomes a hard trade-off with the power consumption (70 mW) due to the absence of RF gain. The noise-canceling RX [5] balances better between the OB-IIP<sub>3</sub> (+13.5 dBm) and NF (1.9 dB), but its dual-path topology involves extra mixing and baseband (BB) circuitry, consuming more area (1.2 mm<sup>2</sup>) and power (78 mW). A recently reported gain-boosted N-path technique led to a single-mixing blocker-tolerant RX [6] with competitive OB-IIP<sub>3</sub> (+13 dBm) and NF (1.5–2.9 dB) at smaller area (0.028 mm<sup>2</sup>) and power (11 mW). Regrettably, it demands a high gain-boosting factor (200 mS), which strongly restricts the signal bandwidth (BW = 2.6 MHz) and RF coverage (<1.5 GHz); both are inadequate for modern cellular standards such as LTE.

SAW-less transmitters (TXs) confront a different challenge, as the effort is on lowering the OB noise and spectral leakage, especially at the nearby RX bands (<100-MHz offset). The SAW-less TX in [7] exploits direct quadrature voltage modulation to lower the OB noise (-159 dBc/Hz at 40-MHz offset) and raise the power efficiency (5.7%). Its power amplifier driver (PAD) is the only gain stage, so the OB noise is dominated by the thermal noise of the passive mixers and the phase noise of the LO generator. Nevertheless, its PAD relies on a passive LC load to deliver the required output power ($P_{out}$) and suppress the OB harmonics, which makes it inflexible for multiband communications. In fact, recent SAW-less multiband TXs still rely on dedicated baluns to extend the RF coverage. An example is the current-mode SAW-less TX in [8], which exhibits a -158-dBc/Hz OB noise at the RX band (30-MHz offset) but demands a large die area (1.06 mm<sup>2</sup>) and power (96 mW), because its mixer and voltage-to-current converter have to be linear and low noise.

We present an N-path SC gain loop as a SAW-less TXR for multiband TDD communications, entailing no on-chip inductors or external passives for input matching. Unlike the typical TXRs with the building blocks cascaded in an open-loop style to build up the RF-to-BB (or BB-to-RF) signal-processing chain, here the N-path SC gain loop operates in a closed-loop style to unify the TX and RX functions, allowing a very compact multiband TXR (0.038 mm<sup>2</sup>). Measured at the LTE band2 (1.88 GHz) and band5 (0.836 GHz), the TX mode exhibited an ACLR<sub>EUTRA1</sub> <-40 dBc, an EVM $\leq 2.1\%$, and a low output noise of <-154 dBc/Hz at 80-MHz offset. The RX mode drew 20-mW power and exhibited a 3.2-dB NF and a +8-dBm OB-IIP<sub>3</sub> at 1.88 GHz.

#### 2.1 Principle of the SC Gain Loop as a TXR

A SAW-less RX should be able to amplify a weak in-band (IB) signal in the presence of large OB blockers, whereas a SAW-less TX ought to deliver a large IB signal with low OB noise and spectral leakage. This discrepancy inspires the exploration of an RX-TX-compatible N-path technique to implement a reconfigurable TXR suitable for TDD-LTE (and even FDD-LTE, by duplicating the TXR as separate TX and RX modes). To build this concept, Fig. 1a presents an SC gain loop. We create the primary functions of the TX and RX with a gain stage surrounded by an SC network that consists of a capacitor $C_{\rm F}$ and two switches. Empirically, we can recognize "upmix + gain" as a TX function (Fig. 1b), with the BB signal injected between the left-hand switch and $C_{\rm F}$ and the RF signal extracted from the gain stage's output. Based on the Miller effect, the extra downmix path reuses the gain stage to boost the effective $C_{\rm F}$ and reduce the effective on-resistance of the two switches. Besides, with the right-hand switch, the gain stage's output sees a large RF impedance, avoiding any unwanted gain drop. By transforming the SC network into an N-path SC network, we can embed the gain-boosted N-path technique [9] into such a gain loop, unifying the key TX functions in a closed-loop style: (1) signal amplification, (2) high-Q bandpass filtering, and (3) I/Q modulation.

**Fig. 1** (a) An SC gain loop performs gain, downmix, and upmix in a self-feeding manner. It can operate as a (b) TX under BB injection and RF extraction or (c) RX under RF injection and BB extraction. The extra downmix and upmix paths allow gain-boosted N-path filtering. For RX, there are two BB extraction solutions, BBL (without right-hand switch) and BBR

Figure 1c shows that the SC gain loop is reconfigurable as a RX by using the "gain + downmix" function. Specifically, with the RF signal injected at the gain stage's input, we can extract the BB signal around  $C_{\rm F}$ . Similar to the TX mode, the extra upmix path allows it to be compatible with the gain-boosted N-path technique. The resultant RX essentially offers the key functions: (1) signal amplification, (2) high-Q bandpass filtering, (3) I/Q demodulation, and (4) input impedance matching.

#### 2.2 N-Path SC Gain Loop as a TX

#### **TX-Mode Architecture**

From Fig. 2, with N = 4, the N-path SC gain loop becomes a practical TX by adding four passive RC filters $R_{BT}C_{BT}$ to receive the general four-phase BB signals $V_{BB, TX1-4}$ (i.e., differential and I/Q). Switches SW<sub>L</sub> and SW<sub>R</sub> perform the upmix and downmix functions, respectively, around the gain stage ($G_{mRF}$). $G_{mRF}$ is an inverting amplifier that ensures the gain loop is under negative feedback. Outside the gain loop, a wideband PAD boosts the gain, provides isolation, and drives the off-chip 50-$\Omega$ load.
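To make the four-phase signalling concrete, the sketch below builds four non-overlapping LO phases and the four BB levels, then forms the staircase that an ideal switch bank would pass to a common node. It is a behavioral illustration only: it ignores $C_F$, $G_{mRF}$, and the RC filters, and it assumes a 25% duty cycle per phase and the phase ordering (+I, +Q, -I, -Q). All names and the time resolution are hypothetical.

```python
import numpy as np

N_PATHS = 4     # N = 4 paths, as in the TX-mode architecture
SAMPLES = 64    # hypothetical time resolution of one LO period


def lo_phases(n_paths=N_PATHS, samples=SAMPLES):
    """Return an (n_paths, samples) 0/1 array of non-overlapping LO phases.

    Row k is high only during the k-th 1/n_paths slice of the LO period
    (25% duty cycle for four paths; an assumption of this sketch).
    """
    t = np.arange(samples)
    slot = (t * n_paths) // samples
    return (slot[None, :] == np.arange(n_paths)[:, None]).astype(int)


def four_phase_bb(i_val, q_val):
    """Differential I/Q levels, ordered (+I, +Q, -I, -Q) by assumption."""
    return np.array([i_val, q_val, -i_val, -q_val], dtype=float)


def upmixed_waveform(i_val, q_val):
    """Staircase at the common node: the active switch passes its BB level."""
    return four_phase_bb(i_val, q_val) @ lo_phases()


def fundamental(v):
    """Complex Fourier coefficient of v at the LO frequency."""
    return np.fft.fft(v)[1] / len(v)


lo = lo_phases()
v_rf = upmixed_waveform(0.3, -0.1)
print("switches on per instant (min, max):", lo.sum(axis=0).min(), lo.sum(axis=0).max())
print("duty cycle per phase:", lo.mean(axis=1))
print("fundamental / (I - jQ):", fundamental(v_rf) / (0.3 + 0.1j))
```

With this ordering, the component at the LO frequency is proportional to $(I - jQ)$, so the ratio printed on the last line is the same complex constant for any I/Q pair; this is the sense in which the switch bank performs I/Q modulation.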

With SW<sub>L</sub> and SW<sub>R</sub> activated periodically by a four-phase non-overlapping LO, the capacitor $C_F$ is charged with an in-phase BB voltage on one side and an amplified out-of-phase BB voltage on the other. As such, the loop gain due to the Miller effect boosts the effective capacitance of $C_F$ at the input. This mechanism reduces not only the chip area for $C_F$ but also its parasitic effects, allowing the TX mode to operate at a higher RF. Another key aspect is that we embed high-Q bandpass filtering at both $V_{i, TX}$ and $V_{o, TX}$ by sharing one N-path SC network. Further, the IB RF voltage sums in phase over a switching period, while the OB RF voltage cancels out at both $V_{i, TX}$ and $V_{o, TX}$. Unlike the typical passive N-path filter, whose OB rejection is limited by the on-resistance of the switches, here the loop gain offered by $G_{mRF}$ alleviates that limit because the on-resistance is divided by the open-loop gain (i.e., high OB rejection without consuming large LO power).
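The two gain-boosting effects described above can be illustrated with the textbook Miller approximation for an element connected across an inverting amplifier of gain magnitude $A$: the input sees the capacitance multiplied by $(1 + A)$ and the series resistance divided by $(1 + A)$. The sketch below is only that approximation, not the chapter's own model, and the component values are hypothetical.

```python
def miller_input_capacitance(c_f: float, gain: float) -> float:
    """Input-referred capacitance of C_F across an inverting gain of -gain.

    Textbook Miller approximation: C_in = C_F * (1 + gain).
    """
    return c_f * (1.0 + gain)


def miller_input_resistance(r_sw: float, gain: float) -> float:
    """Input-referred resistance of a feedback-path switch resistance.

    Same approximation: R_in = R_sw / (1 + gain).
    """
    return r_sw / (1.0 + gain)


# Hypothetical values, chosen only for illustration.
C_F = 5e-12      # feedback capacitor, 5 pF
R_SW = 20.0      # switch on-resistance, 20 ohm
A_V = 10.0       # voltage gain magnitude of the gain stage (20 dB)

c_in = miller_input_capacitance(C_F, A_V)
r_in = miller_input_resistance(R_SW, A_V)
print(f"Effective input capacitance: {c_in * 1e12:.1f} pF")
print(f"Effective input on-resistance: {r_in:.2f} ohm")
```

The same physical capacitor therefore behaves like a much larger one at the input, which is why a small $C_F$ and modest switches can still give strong OB rejection.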

#### Functional View of the TX Mode

Figure 3a presents, for intuitive understanding, a functional view of the TX mode. It is not an equivalent circuit, since the "I/Q modulation" and "high-Q bandpass filtering" appear unmerged as two cascaded functions to illustratively make a comparison with [7]. The I/Q modulation is alike a typical TX, synthesizing an RF signal at  $V_{\rm i, TX}$  from a four-phase BB signal at  $V_{\rm BB, TX1 - 4}$ .  $V_{\rm i, TX}$  virtually passes through a high-Q bandpass filter that can reject the OB noise first at  $V_{\rm i, TX}$  and second



**Fig. 2** Four-path SC gain loop as a TX. Injection of the four-phase BB ( $V_{BB, TX1 - 4}$ ) via  $R_{BT}$ . The PAD extracts  $V_{o, TX}$  and drives the 50  $\Omega$ . Gain-boosting the N-path filter by  $G_{mRF}$  realizes high-Q bandpass responses at  $V_{i, TX}$  and  $V_{o, TX}$ 

at $V_{o, TX}$. We can model the N-path SC network as a linear time-invariant (LTI) RLC resonator around the passband [2], where the tunable inductor represents a tunable center frequency. Interestingly, when we omit the extra downmix path in Fig. 2, the closed-loop TX returns to an open-loop style similar to [7] that aims at low OB noise emission by direct quadrature voltage modulation (Fig. 3b). The narrowband PAD in [7] exploits a passive LC resonator for unwanted harmonic attenuation (<-40 dBc), and hence the output response has a low Q and fixed center frequency. Unlike [7], here we reuse the gain created by $G_{mRF}$ to boost the Q of the bandpass responses at $V_{i, TX}$ and $V_{o, TX}$, resulting in much stronger OB noise suppression. To exemplify it, Fig. 4 plots the simulated gain responses at $V_{i, TX}$ and $V_{o, TX}$ at 2 GHz. Without the extra downmix path, $V_{i, TX}$ offers only 11-dB OB rejection with no further improvement added at $V_{o, TX}$. For the proposed TX, the rejection at $V_{i, TX}$ improves to 22.5 dB, with an extra rejection of 7.8 dB added at $V_{o, TX}$. There is a 1.6-dB gain drop at $V_{o, TX}$ in the TX, due to the finite frequency-translated impedance of the extra downmix path. Also, the gain response in Fig. 4 refers the four-phase BB to the single-phase RF. If considering the single-phase BB to the single-phase RF, the gain value is 9 dB higher. This fact applies to all plotted gain responses of the TX mode (presented later).



**Fig. 3** (a) Functional view of the TX, which we can compare with (b) the TX in [7] that uses a passive LC resonator for its narrowband PAD. This work features a gain-boosted N-path filter and a wideband PAD to allow high-Q filtering and an LO-defined center frequency, being more flexible for multiband operation

#### **TX-Mode Open-Loop Equivalent Model**

To simplify the quantitative study, we develop an open-loop equivalent model of the TX mode illustrated in Fig. 5. Inspired by the principle of Miller decomposition, the N-path SC network is subdivided into two parts referred to the gain stage's output and input. The former resembles a typical passive N-path filter [9] hung on $V_{o,TX}$, while the latter becomes four separated SC networks (i.e., single path, single phase) placed between the BB passive filter $R_{BT}C_{BT}$ and the voltage mixer $SW_L$. We can model the input-referred on-resistance of SW<sub>R</sub> ($R_{SWR, i}$) and the input-referred Miller capacitance of $C_F$ ($C_{F, i}$) at BB as



Fig. 5 Proposed open-loop TX model, with the gain-boosted N-path filter decomposed into two N-path filters using the principle of Miller decomposition [9].  $R_F$  (9.3 k $\Omega$ ) is large enough and omitted

$$\begin{cases}
R_{SWR,i} = \frac{R_{SWR} / / R_{F} + R_{L}}{1 + G_{mRF}R_{L}} \\
R_{SWR,o} = \frac{R_{SWR} / / R_{F} + R_{SWR}}{1 + G_{mRF}R_{SWR}} \\
C_{F,i} = |C_{F} \cdot \frac{(1 - G_{mRF}R_{F})R_{L}}{R_{F} + R_{L}}| \\
C_{F,o} = C_{F}
\end{cases}$$
(1)



**Fig. 6** Comparison of modeled and simulated gain responses: (a) open-loop  $V_{i, TX}$  in Fig. 5 versus the closed-loop  $V_{i, TX}$  in Fig. 2 and (b) open-loop  $V_{o, TX}$  in Fig. 5 versus the closed-loop  $V_{o, TX}$  in Fig. 2

where $R_{\text{SWR, o}}$ is the output-referred on-resistance of the switch, $C_{\text{F, o}}$ is the Miller capacitance at $V_{\text{o, TX}}$, and $R_{\text{L}}$ is the load impedance of $G_{\text{mRF}}$. The modeled $C_{\text{F, i}}$ in Eq. (1) is equal to $C_{\text{F}}$ multiplied by the open-loop gain of $G_{\text{mRF}}$. The enlarged $C_{\text{F, i}}$ implies that less physical capacitance is needed to realize a specific BW. For instance, with $G_{\text{mRF}} = 130 \text{ mS}$, $R_{\text{F}} = 9.3 \text{ k}\Omega$, $R_{\text{L}} = 38.4 \Omega$, and $C_{\text{F}} = 8 \text{ pF}$, the computed $C_{\text{F, i}}$ is 39.7 pF, which is ~5× more area efficient than the general passive N-path filter [2]. Also, the circuit suppresses the effective resistances $R_{\text{SWR, i}}$ (10 $\Omega$) and $R_{\text{SWR, o}}$ (11.3 $\Omega$) through $G_{\text{mRF}}$, improving the OB rejection.
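The numbers quoted above can be reproduced directly from Eq. (1). The following is a minimal sketch, not part of the original analysis. The switch on-resistance `R_SWR` is not stated in the text; the value of 21.6 Ω used here is an assumption, picked only because it brings both referred resistances close to the quoted 10 Ω and 11.3 Ω.

```python
# Numerical evaluation of Eq. (1) with the component values quoted in the text.

G_MRF = 130e-3   # transconductance of the gain stage, S (from the text)
R_F = 9.3e3      # self-bias feedback resistor, ohm (from the text)
R_L = 38.4       # load impedance of G_mRF, ohm (from the text)
C_F = 8e-12      # feedback capacitor per path, F (from the text)
R_SWR = 21.6     # ASSUMPTION: switch on-resistance, ohm; not given in the text


def parallel(a, b):
    """Parallel combination of two resistances."""
    return a * b / (a + b)


def eq1(g_m=G_MRF, r_f=R_F, r_l=R_L, c_f=C_F, r_swr=R_SWR):
    """Return (R_SWR_i, R_SWR_o, C_F_i, C_F_o) as defined in Eq. (1)."""
    r_swr_i = (parallel(r_swr, r_f) + r_l) / (1.0 + g_m * r_l)
    r_swr_o = (parallel(r_swr, r_f) + r_swr) / (1.0 + g_m * r_swr)
    c_f_i = abs(c_f * (1.0 - g_m * r_f) * r_l / (r_f + r_l))
    c_f_o = c_f
    return r_swr_i, r_swr_o, c_f_i, c_f_o


r_swr_i, r_swr_o, c_f_i, c_f_o = eq1()
print(f"R_SWR,i = {r_swr_i:.1f} ohm")
print(f"R_SWR,o = {r_swr_o:.1f} ohm")
print(f"C_F,i   = {c_f_i * 1e12:.1f} pF  (boost of {c_f_i / C_F:.2f}x)")
```

The capacitance result depends only on values given in the text and evaluates to about 39.7 pF, i.e., a boost of roughly 5× over the physical 8 pF. The two resistances depend on the assumed `R_SWR` and should be read as a consistency check rather than as design data.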

To verify the accuracy of the model, Fig. 6a plots the gain responses at  $V_{i, TX}$  simulated with  $R_{BT} = 500 \Omega$ , where the modeled open-loop response fits well with the closed-loop, except that there is a 1.2-dB gain drop due to the input parasitic capacitance of  $G_{mRF}$ . The modeled gain response of the open-loop  $V_{o, TX}$  is accurate (Fig. 6b), which has a better OB rejection (~2 dB) as the far-out blockers see a smaller impedance at the open-loop  $V_{o, TX}$ .

#### Gain Response

Based on the open-loop equivalent model above, we can study the gain response from BB to $V_{\text{RF, TX}}$ in two steps. Recalling Fig. 5, the gain stage $G_{\text{mRF}}$ essentially isolates its preceding and following stages, allowing first the computation of the transfer function from BB to $V_{i, TX}$, followed by that from $V_{i, TX}$ to $V_{\text{RF, TX}}$. For the former, we employ a simplified equivalent circuit (Fig. 7a). Since the input impedance seen from $G_{\text{mRF}}$ is mainly capacitive ($C_1$), we can denote the load impedance as $Z_{L1}(\omega) = 1/j\omega C_1$. Here, the angular frequency $\omega$ is close to $\omega_{\text{LO}}$. In view of the BB, the input voltage $V_{\text{BB, TX1} - 4}$ is in series with the BB impedance, which becomes



**Fig. 7** Simplified circuit for calculating the responses: (a) from BB to  $V_{i, TX}$  of the open-loop TX and (b) from current source  $I_{i, TX}$  to  $V_{RF, TX}$ .  $I_{i, TX}$  is equal to the transconductance  $G_{mRF}$  multiplied by  $V_{i, TX}$ 

$$Z_{\rm BB}(\omega - \omega_{\rm LO}) = R_{\rm BT} \,//\, \frac{1}{j(\omega - \omega_{\rm LO})C_{\rm BT}} \,//\, Z_{\rm F,i}(\omega - \omega_{\rm LO})$$
(2)

where $Z_{\rm F, i}(\omega - \omega_{\rm LO})$ (Fig. 5) is the impedance of a single path decomposed from the extra downmix path. Since the SC circuit provides a BB-to-BB gain response, we can ignore the high-order frequency components and represent $Z_{\rm F, i}(\omega - \omega_{\rm LO})$ as

$$Z_{\rm F,i}(\omega - \omega_{\rm LO}) \approx 4R_{\rm SWR,i} + \frac{1}{j(\omega - \omega_{\rm LO})C_{\rm F,i}}$$
(3)

which enhances the BB rejection due to the boosted capacitance $C_{\rm F, i}$. Putting Eq. (3) into Eq. (2), $Z_{\rm BB}(\omega - \omega_{\rm LO})$ expands to

$$Z_{\rm BB}(\omega - \omega_{\rm LO}) = \frac{1}{1/R_{\rm BT} + j(\omega - \omega_{\rm LO})C_{\rm BT} + \frac{j(\omega - \omega_{\rm LO})C_{\rm F,i}}{1 + 4R_{\rm SWR,i}j(\omega - \omega_{\rm LO})C_{\rm F,i}}}$$
(4)

At center frequency with  $\omega = \omega_{LO}$ , the impedance is  $R_{BT}$  in Eq. (4), and when  $\omega$  moves away from  $\omega_{LO}$ , the impedance starts to roll off. The -3-dB BW approaches  $1/R_{BT}(C_{BT} + C_{F,i})$  if  $R_{SWR,i}$  is close to zero. In this case,  $C_{F,i}$  dominates the BB BW and the ultimate rejection is infinite. If enlarging  $R_{SWR,i}$ , the -3-dB BW widens and finally converges to  $1/R_{BT}C_{BT}$ .
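The two bandwidth limits described above can be checked numerically from Eq. (4). This is a rough sketch under stated assumptions: $R_{BT}$ and $C_{F,i}$ take the values quoted in the text, whereas `C_BT = 10 pF` is a hypothetical value chosen only for illustration (the text does not give it). The helper `bw_3db` is likewise illustrative and finds the offset at which $|Z_{BB}|$ falls to $R_{BT}/\sqrt{2}$ by bisection.

```python
import math

R_BT = 500.0       # BB resistor, ohm (from the text)
C_F_I = 39.7e-12   # input-referred Miller capacitance, F (from the text)
C_BT = 10e-12      # ASSUMPTION: hypothetical BB capacitor, F; not given in the text


def z_bb(delta_omega, r_swr_i, r_bt=R_BT, c_bt=C_BT, c_f_i=C_F_I):
    """Eq. (4): BB impedance at the angular offset delta_omega = omega - omega_LO."""
    s = 1j * delta_omega
    admittance = (1.0 / r_bt
                  + s * c_bt
                  + s * c_f_i / (1.0 + 4.0 * r_swr_i * s * c_f_i))
    return 1.0 / admittance


def bw_3db(r_swr_i, lo=1e3, hi=1e12):
    """Offset (rad/s) where |Z_BB| drops to R_BT/sqrt(2), found by bisection."""
    target = R_BT / math.sqrt(2.0)
    for _ in range(200):
        mid = math.sqrt(lo * hi)          # bisect on a logarithmic axis
        if abs(z_bb(mid, r_swr_i)) > target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


for r in (0.0, 10.0, 1e9):
    print(f"R_SWR,i = {r:g} ohm -> -3-dB offset = {bw_3db(r) / (2 * math.pi) / 1e6:.2f} MHz")
```

With `r_swr_i = 0` the result equals $1/R_{BT}(C_{BT} + C_{F,i})$, and for a very large `r_swr_i` it converges to $1/R_{BT}C_{BT}$, matching the two limits stated in the text. Intermediate values depend on the assumed `C_BT`.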

According to the LTI expression of the up-converted RF voltage in [10], we can obtain the RF voltage $V_{i, TX}$ transferred from $V_{BB, TX1 - 4}$ in Fig. 7a as



**Fig. 8** Comparison of gain responses: (a) simulated open-loop  $V_{i, TX}$  versus the approximation of Eq. (5) and (b) simulated open-loop  $V_{o, TX}$  versus the approximation of Eq. (8)

$$V_{i,TX}(\omega) = \frac{\sqrt{2}}{\pi} \frac{Z_{L1}(\omega) \cdot Z_{BB}(\omega - \omega_{LO}) / R_{BT}}{Z_{L1}(\omega) + R_{SWL}} \times \frac{e^{j\pi/4} \cdot V_{BB,TX1}(\omega - \omega_{LO}) + e^{-j\pi/4} \cdot V_{BB,TX2}(\omega - \omega_{LO})}{1 + \frac{2}{\pi^2} Z_{BB}(\omega - \omega_{LO}) \sum_{m=-\infty}^{+\infty} \frac{1}{(4m+1)^2 (Z_{L1}(4m\omega_{LO} + \omega) + R_{SWL})}}$$
(5)

where m is an integer. Figure 8a plots Eq. (5) for 2 GHz, which matches well with the simulated curve spanning from 1.5 to 2.5 GHz. The infinite summation in Eq. (5) comprises the fundamental and higher odd-order harmonics. Since the fundamental term is dominant, we can simplify Eq. (5) as

$$V_{i,TX}(\omega) \approx \frac{\sqrt{2}}{\pi} \frac{Z_{L1}(\omega) \cdot Z_{BB}(\omega - \omega_{LO})/R_{BT}}{Z_{L1}(\omega) + R_{SWL} + \frac{2}{\pi^2} Z_{BB}(\omega - \omega_{LO})} \times \left(e^{j\pi/4} \cdot V_{BB,TX1}(\omega - \omega_{LO}) + e^{-j\pi/4} \cdot V_{BB,TX2}(\omega - \omega_{LO})\right)$$
(6)

where $V_{i, TX}(\omega)$ provides an intuitive view of the low-pass response of $Z_{BB}(\omega - \omega_{LO})$ up-converted to $V_{i, TX}$ as a high-Q bandpass response, with a -3-dB BW around twice that of Eq. (4).

Similarly, to analyze the response from  $V_{i, TX}$  to  $V_{RF, TX}$  of the open-loop TX model, we developed a simplified circuit (Fig. 7b), with  $G_{mRF}$  modeled as a transconductor converting the input  $V_{i, TX}$  to an output current  $I_{i, TX}$ , expressed as  $I_{i, TX} = G_{mRF}V_{i, TX}$ .  $C_2$  is the input parasitic capacitance of the PAD.  $I_{i, TX}$  draws current from  $R_L$ ,  $C_2$ , and the output-referred N-path filter to create the RF voltage  $V_{o, TX}$ . As the PAD involves no frequency translation and its passband gain should be flat, our focus is on the response from  $V_{i, TX}$  to  $V_{o, TX}$ . Referring to Eq. (13), the impedance  $Z_{o, TX}(\omega)$  seen by  $V_{o, TX}$  is as follows:

$$Z_{o,TX}(\omega) = \frac{R_{SWR,o} \cdot Z_{L2}(\omega)}{R_{SWR,o} + Z_{L2}(\omega)} + \frac{\left(\frac{Z_{L2}(\omega)}{R_{SWR,o} + Z_{L2}(\omega)}\right)^2 \cdot \frac{2}{\pi^2} Z_{F,o}(\omega - \omega_{LO})}{1 + \frac{2}{\pi^2} Z_{F,o}(\omega - \omega_{LO}) \sum_{m=-\infty}^{+\infty} \frac{1}{(4m+1)^2 (Z_{L2}(4m\omega_{LO} + \omega) + R_{SWR,o})}}$$
(7)

where $Z_{L2}(\omega) = R_L \,//\, \frac{1}{j\omega C_2}$ and $Z_{F,o}(\omega - \omega_{LO}) = \frac{1}{j(\omega - \omega_{LO})C_{F,o}}$. Thus, $V_{o,TX}$ will be

$$V_{o,TX}(\omega) = G_{mRF} Z_{o,TX}(\omega) \cdot V_{i,TX}(\omega)$$
(8)

From Fig. 8b, the prediction of Eq. (8) fits well with the simulations covering a 1-GHz span at LO = 2 GHz. By ignoring higher harmonics, we can simplify Eq. (7) just to a high-Q bandpass impedance as below:

$$Z_{o,TX}(\omega) = \frac{Z_{L2}(\omega)}{Z_{L2}(\omega) + R_{SWR,o}} \cdot \left( R_{SWR,o} + \frac{Z_{L2}(\omega)}{1 + j\frac{\pi^2}{2} C_{F,o}(R_L + R_{SWR,o})(\omega - \omega_{LO})} \right).$$
(9)

At the center frequency, Eq. (9) is equal to $Z_{L2}(\omega)$, and then, it is interesting that the output-referred N-path load does not bring any gain drop if the harmonics are out of concern [11]. However, the parasitic capacitance $C_2$ induces a $1/(1 + j\omega R_L C_2)$ gain drop (e.g., -0.6 dB at 2 GHz). Also, the -3-dB BW of the high-Q bandpass filtering is equal to $4/[\pi^2 C_{F,o}(R_L + R_{SWR,o})]$ when $Z_{L2}(\omega)$ is resistive.
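The properties of Eq. (9) stated above are easy to verify numerically. The sketch below uses $R_L$, $R_{SWR,o}$, and $C_{F,o}$ from the text; `C_2 = 0.8 pF` is an assumption (the text does not give it), picked because it yields a gain drop near the quoted -0.6 dB at 2 GHz. The bandwidth expression is interpreted here as an angular frequency and converted to Hz.

```python
import math

R_L = 38.4        # load impedance of G_mRF, ohm (from the text)
R_SWR_O = 11.3    # output-referred switch resistance, ohm (from the text)
C_F_O = 8e-12     # output-referred capacitance, F (C_F,o = C_F in Eq. (1))
C_2 = 0.8e-12     # ASSUMPTION: PAD input parasitic capacitance, F; not given in the text


def z_l2(omega, r_l=R_L, c_2=C_2):
    """Z_L2 = R_L in parallel with 1/(j*omega*C_2)."""
    return r_l / (1.0 + 1j * omega * r_l * c_2)


def z_o_tx(omega, omega_lo):
    """Eq. (9): simplified high-Q bandpass impedance seen at V_o,TX."""
    zl = z_l2(omega)
    tau = (math.pi ** 2 / 2.0) * C_F_O * (R_L + R_SWR_O)
    bandpass = zl / (1.0 + 1j * tau * (omega - omega_lo))
    return zl / (zl + R_SWR_O) * (R_SWR_O + bandpass)


def gain_drop_db(omega):
    """Magnitude of 1/(1 + j*omega*R_L*C_2) in dB."""
    return 20.0 * math.log10(abs(1.0 / (1.0 + 1j * omega * R_L * C_2)))


def bandwidth_3db_hz():
    """-3-dB BW 4/(pi^2 * C_F,o * (R_L + R_SWR,o)), converted from rad/s to Hz."""
    return 4.0 / (math.pi ** 2 * C_F_O * (R_L + R_SWR_O)) / (2.0 * math.pi)


omega_lo = 2.0 * math.pi * 2e9
print("Z_o,TX at LO :", z_o_tx(omega_lo, omega_lo))
print("Z_L2 at LO   :", z_l2(omega_lo))
print(f"gain drop    : {gain_drop_db(omega_lo):.2f} dB")
print(f"-3-dB BW     : {bandwidth_3db_hz() / 1e6:.0f} MHz")
```

At $\omega = \omega_{LO}$ the two printed impedances coincide, which is the "no gain drop" statement in the text. The gain drop depends on the assumed `C_2`, so it illustrates the formula rather than confirming a measured parasitic.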

#### Noise Analysis

Most OB noise of the TX mode results from the thermal noise of $R_{BT}$, the gain stage $G_{mRF}$, and the on-resistance $R_{SWL}$. As modeled in Fig. 9a, the thermal noise voltage $V_{n, RBT}$ is in series with the BB impedance and experiences the same transfer function as the BB signals. As such, the power spectral density (PSD) at $V_{n, oTX}$ due to $R_{BT}$ is

$$\overline{V_{n,R_{BT},oTX}^2} = |H_{i,TX}(\omega)|^2 \cdot |H_{o,TX}(\omega)|^2 \cdot \overline{V_{n,R_{BT}}^2}$$
(10)

where we introduce $|H_{i, TX}(\omega)|$ and $|H_{o, TX}(\omega)|$ as the transfer functions from BB to $V_{i, TX}$ and $V_{i, TX}$ to $V_{o, TX}$, respectively, to represent Eqs. (5) and (8). The thermal noise power of $R_{BT}$ is $\overline{V_{n,R_{BT}}^2} = 4kTR_{BT}$. The high-Q bandpass filtering in the N-path SC gain loop greatly suppresses the OB noise contribution from $R_{BT}$. Similarly, the output noise PSD due to $R_{SWL}$ is



Fig. 9 Simplified equivalent circuit for noise analysis of (a)  $R_{\rm BT}$  and  $SW_{\rm L}$  and (b)  $G_{\rm mRF}$ 

$$\overline{V_{n,R_{SWL},oTX}^2} = \left(\frac{R_{BT}}{Z_{BB}(\omega - \omega_{LO})}\right)^2 \cdot |H_{i,TX}(\omega)|^2 \cdot |H_{o,TX}(\omega)|^2 \cdot \overline{V_{n,R_{SWL}}^2}$$
(11)

where $\overline{V_{n,R_{SWL}}^2} = 4kTR_{SWL}$. In Fig. 9b, we modeled the thermal voltage source of the $G_{mRF}$ stage as an input-referred $V_{n, GmRF}$, and the corresponding noise power is $\overline{V_{n,GmRF}^2} = 4kT/G_{mRF}$. $V_{n, GmRF}$ experiences the same transfer function as $V_{n, iTX}$; thus the output noise PSD due to $G_{mRF}$ is

$$\overline{V_{n,GmRF,oTX}^2} = |H_{o,TX}(\omega)|^2 \cdot \overline{V_{n,GmRF}^2}.$$
(12)
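To give a feel for the magnitudes entering Eqs. (10) and (12), the source PSDs $4kTR_{BT}$ and $4kT/G_{mRF}$ can be evaluated before any filtering. This is a small illustrative sketch. The temperature of 300 K is an assumption, and $R_{SWL}$ is left out because its value is not given in the text.

```python
import math

BOLTZMANN = 1.380649e-23   # J/K
TEMPERATURE = 300.0        # ASSUMPTION: K; the text does not state a temperature

R_BT = 500.0               # BB resistor, ohm (from the text)
G_MRF = 130e-3             # gain-stage transconductance, S (from the text)


def resistor_noise_psd(r, t=TEMPERATURE):
    """Thermal noise voltage PSD of a resistor, 4kTR, in V^2/Hz."""
    return 4.0 * BOLTZMANN * t * r


def gm_noise_psd(g_m, t=TEMPERATURE):
    """Input-referred noise PSD used in the text for the G_mRF stage, 4kT/G_m, in V^2/Hz."""
    return 4.0 * BOLTZMANN * t / g_m


v2_rbt = resistor_noise_psd(R_BT)
v2_gm = gm_noise_psd(G_MRF)
ratio_db = 10.0 * math.log10(v2_rbt / v2_gm)

print(f"4kT*R_BT   = {v2_rbt * 1e18:.2f} aV^2/Hz")
print(f"4kT/G_mRF  = {v2_gm * 1e18:.3f} aV^2/Hz")
print(f"ratio      = {ratio_db:.1f} dB")
```

The ratio of the two source PSDs is simply $R_{BT}G_{mRF}$, independent of the assumed temperature. Since the $R_{BT}$ source is the larger of the two, any offset range in which $G_{mRF}$ dominates the output noise has to come from the additional $|H_{i,TX}(\omega)|^2$ filtering that only the $R_{BT}$ noise experiences.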

Figure 10a, b exhibit the simulated output noises at $V_{o, TX}$ due to $R_{BT}$ and $G_{mRF}$, as well as due to SW<sub>L</sub> and SW<sub>R</sub>, respectively. When the offset frequency is beyond 70 MHz, $G_{mRF}$ generates more noise than $R_{BT}$ as it experiences less OB rejection within the N-path SC gain loop. Compared with SW<sub>L</sub>, SW<sub>R</sub> contributes less noise due to the lack of amplification in the extra downmix path (Fig. 2). Furthermore, the output noises shown in Fig. 10 are almost flat at offset frequencies beyond 80 MHz. Upsizing the switches SW<sub>L, R</sub> can lead to a lower output noise floor, at the expense of more LO power.

Figure 11 displays the simulated harmonic folding effect at $V_{o, TX}$. For N = 4, the nearest and strongest component that folds back to the desired band (around LO) is 3×LO, followed by 5×LO [10]. Even though the TX is single-ended, the even harmonic folding is insignificant (simulated < -70 dB). Due to the high-Q bandpass filtering of the gain-boosted N-path technique around 3×LO and 5×LO, the far-out noise induced by the folding terms is much smaller than the IB noise that experiences no frequency translation. Thus, the harmonic folding effect is generally less important in the TX design.



Fig. 10 Simulated output noise power at  $V_{o, TX}$  with contribution from (a)  $R_{BT}$  and  $G_{mRF}$  and (b) SW<sub>L</sub> and SW<sub>R</sub>



#### **OB** Noise, Passband Roll-off, and Harmonic Emission

The BB resistor  $R_{\rm BT}$  plays a key role in balancing the performances in terms of signal BW, voltage gain, OB linearity, and OB noise. Intuitively, any resistors coupled with the N-path SC gain loop will degrade the Q of the passband responses at  $V_{i, TX}$  and  $V_{o, TX}$ . As plotted in Fig. 12a, a large  $R_{\rm BT}$  improves the OB rejection but at the expense of a gain drop in the passband due to the finite input impedance at  $V_{i, TX}$ . Simulated at 2 GHz, with  $R_{\rm BT}$  ranging from 100 to 800  $\Omega$ , the OB rejection increases but induces a 4-dB gain drop, whereas the -1-dB BW decreases from 24 to 8 MHz. The output noise at  $V_{o, TX}$  drops from 1.04 to 0.33 aV<sup>2</sup>/Hz at 80-MHz offset owing to the increased rejection from 7.3 to 19.3 dB (Fig. 12b). In fact, as  $R_{\rm BT}$  generates noise



**Fig. 12** (a)  $R_{\rm BT}$  should balance the signal BW, OB rejection, and noise. (b) The output OB noise reduces with increased OB rejection but it will saturate as a large  $R_{\rm BT}$  induces noise itself. (c) When RF frequency increases from 0.836 to 1.88 GHz, passband frequency shifting increases from 0.9 to 1.9 MHz while the gain droop decreases from 1.1 to 0.8 dB

itself, the rejection of OB noise will saturate eventually if raising only  $R_{\rm BT}$ . Here, the TX mode chooses a 500- $\Omega$   $R_{\rm BT}$  to reach 17.1-dB OB rejection and 0.42-aV<sup>2</sup>/Hz output noise at  $V_{\rm o, TX}$ . From SpectreRF simulations (pss + pnoise), the OB noise at  $V_{\rm RF, TX}$  is -157.7 dBm/Hz at 80-MHz offset, where the main contributors are  $R_{\rm BT}$  (24%),  $G_{\rm mRF}$  (20%), SW<sub>L, R</sub> + LO divide-by-4 circuitry (20%), and PAD (10%). The rest arises from the off-chip 50- $\Omega$  load and switches SW<sub>TX - RX</sub> for the TX-RX-mode control.

Mainly due to the input capacitor of $G_{mRF}$, the simulated passband frequency shifting is within 0.9–1.9 MHz when the RF frequency covers the range between 0.836 GHz (band5) and 1.88 GHz (band2) (Fig. 12c), with 1.1-dB passband roll-off at band5 and 0.8 dB at band2 for LTE10. The impact of such roll-off characteristics on the EVM performance should be insignificant, since the EVM of an LTE signal is the RMS value of each resource block (RB)'s EVM and the BW (180 kHz) of each RB is much smaller than the signal BW (9 MHz for LTE10) [12]. To address this roll-off issue, we can apply a preemphasis digital equalizer to compensate for the passband roll-off in the TX mode, similar to the post-emphasis equalizer used in the RX path [13]. In the digital baseband, the compensation of different passband roll-offs (0.3 dB range) according to the RF frequency is feasible at the cost of small area and power. In addition, the preemphasis digital equalizer can be an option to compensate for the sharp roll-off region of the analog filter in [8], although without passband frequency shifting.

For the spectral purity, $V_{o, TX}$ contains typical LO harmonic emission with a third-order harmonic rejection ratio (HRR<sub>3</sub>) of 9.5 dB for N = 4. Nevertheless, with the limited BW of the PAD and output bonding wire, the HRR<sub>3</sub> at the TX output $V_{RF, TX}$ improves (Fig. 13), going up with frequency (e.g., 23 dB at 2 GHz). In addition, the single-ended PAD operating in class AB mode dominates the second-order harmonic distortion (HD<sub>2</sub>) at $V_{RF, TX}$. In fact, by properly matching the PAD's push-pull transistors, HD<sub>2</sub> can be <-37 dBc at a 0-dBm $P_{out}$ from simulations (Fig. 13). Harmonic rejection N-path filtering [14] can be an option to further improve the harmonic emission. In practice, an off-chip PA loads the output of the PAD. For LTE applications, the commercial high-power PA (e.g., [15]) is narrowband and will



suppress all OB harmonics from its TX. Thus, the PA harmonic distortion still dominates the spectrum clearance at the final output. Finally, we use a single-pole multi-throw switch for the multiband TX to interface with different PAs.
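The HRR<sub>3</sub> of 9.5 dB quoted above for N = 4 agrees with a common idealization that is not spelled out in the text: if the up-sampled output is treated as a zero-order-hold staircase with N steps per LO period, the harmonic amplitudes follow a $\sin(x)/x$ envelope with $x = \pi n/N$. The sketch below evaluates that envelope; it is an assumption-based illustration, not the authors' derivation, and it ignores the PAD and bondwire filtering that improve the rejection at $V_{RF, TX}$.

```python
import math


def zoh_harmonic_amplitude(n, n_paths=4):
    """Relative amplitude of the n-th LO harmonic of an ideal N-step staircase (sinc envelope)."""
    x = math.pi * n / n_paths
    return abs(math.sin(x) / x)


def hrr_db(n, n_paths=4):
    """Harmonic rejection ratio of harmonic n relative to the fundamental, in dB."""
    return 20.0 * math.log10(zoh_harmonic_amplitude(1, n_paths)
                             / zoh_harmonic_amplitude(n, n_paths))


for harmonic in (3, 5):
    print(f"HRR{harmonic} (ideal, N = 4) = {hrr_db(harmonic):.2f} dB")
```

For N = 4 the envelope reduces to a 1/n amplitude law at the odd harmonics, so the third harmonic sits 20·log10(3) ≈ 9.5 dB below the fundamental under this idealization.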

#### **Other Implementation Details**

To enhance the power efficiency and avoid any internal gain nodes, we choose $G_{mRF}$ as an inverter-based amplifier self-biased by a feedback resistor $R_{\rm F}$. The nonoverlap $LO_1$-$LO_4$ (25% duty cycle) prevents the I/Q cross talk from degrading the linearity. The capacitor $C_{\rm BT}$ essentially operates as a charge buffer at the BB side to relieve the gain drop at the RF side, due to the input capacitance of $G_{mRF}$. By switching SW<sub>L, R</sub>, the circuit will up-sample the filtered BB voltages to $V_{i, TX}$ in sequence, thus seeing a high input impedance of $G_{mRF}$ and allowing good linearity. In order to decouple the signal-handling ability of $G_{mRF}$ from the overall TX output power, we further amplify the RF signal at $V_{o, TX}$ by the wideband single-ended PAD before outputting $V_{RF, TX}$. The PAD, based on a push-pull cascode structure $(M_{1-2})$, contains the cascode transistors $(M_{3-4})$ self-biased by a feedback resistor (Fig. 2). From the simulations, the class AB PAD achieves a -1-dB output BW of ~2.1 GHz, which is adequate to cover >80% of the LTE-TDD/FDD bands from 0.7 to 2 GHz. With the PAD single-ended, a 2.5-V supply allows better HD<sub>3</sub> (-43.4 dBc) and voltage gain (9.3 dB) while showing a 12.2% drain efficiency at a 0-dBm single-tone $P_{out}$. Both HD<sub>3</sub> (-37 dBc) and voltage gain (5.7 dB) will degrade with a 1.2-V supply applied and the cascode transistors removed.

As a reconfigurable TXR, the TX-RX-mode switches $SW_{TX - RX}$ (Fig. 2) are critical and require careful sizing to minimize the parasitic effects, especially for the PAD that has an input capacitance of ~500 fF. With the PAD powered-down, we set

 $V_{B1} = \text{GND}$  and  $V_{B2} = V_{DD25}$  (2.5 V) for  $M_1$  and  $M_2$ , respectively, to share the voltage stress between  $M_1$  and  $M_4$  and to ensure device reliability [16].

Unlike the gain-boosted mixer-first RX in [6] that benefits from a large $G_{mRF}$ (200 mS) to improve the NF and OB linearity, here we downsize $G_{mRF}$ to 130 mS out of concern for spectral regrowth and EVM in the TX mode. In addition, [6] employs a 0.7-V supply for power savings, but this entails large transistors to generate a 200-mS $G_{mRF}$, which strongly restricts the RF coverage due to a large parasitic capacitance. In contrast, we design the parasitic capacitance to be 2.2× smaller than in [6]. The loop gain offered by $G_{mRF}$ reduces the physical size of $C_F$ to 8 pF for the targeted signal BW of ~12 MHz. As the PAD isolates the N-path SC gain loop from the external parts (e.g., bondwire and pad), the center frequency of the passband is defined by the LO.

#### 2.3 N-Path SC Gain Loop as a RX

As depicted in Fig. 14a, we can reconfigure the four-path SC gain loop as a multiband RX, which is compatible with the TX mode without tuning of components. With SW<sub>L</sub> and $C_F$ already embedded in the downmix process, we can disable the original switch SW<sub>R</sub> of the SC gain loop (Fig. 2) in the RX mode [6]. We insert the RF signal $V_{\text{RF, TX}}$ at the source port at the input of $G_{\text{mRF}}$ ($V_{i, \text{RX}}$), and then, after amplification, the RF signal is down-converted to four-phase BB signals extracted by series switches SW<sub>B</sub> driven by a set of out-phased LO. Finally, an inverter-based transconductance amplifier ($G_{\text{mBB}}$) amplifies the four-path BB signals ($V_{\text{BB, RX1-4}}$), with the channel length of its transistors set at 0.18 µm to reduce the flicker noise. We size $G_{\text{mBB}}$ as 11 mS. The theory of the gain-boosted N-path RX appeared in [9, 17].

Due to the bidirectional transparency property of N-path passive mixers, the RX can attain input matching over a wide range of RF without any off-chip matching components. In Fig. 14a, we create a LO-defined bandpass input impedance at  $V_{i, RX}$ , by frequency-translating the BB low-pass response to RF as bandpass, via the passive mixer SW<sub>L</sub>. Besides, the mixer SW<sub>L</sub>, at the extra upmix path, also creates an N-path filter around  $G_{mRF}$  with the feedback capacitor  $C_F$  together, resulting in high-Q bandpass filtering at both  $V_{i, RX}$  and  $V_{o, RX}$  (Fig. 14b). Due to the loop gain created by  $G_{mRF}$ , the BB circuitry sees a higher impedance back to the source port [18], allowing the use of large  $R_{BR}$  (21 k $\Omega$ ) and small  $C_{BR}$  (1 pF) to save the die area and improve the NF.

Because $G_{mRF}$ offers phase inversion between its input and output, we can alleviate the frequency shifting of the S<sub>11</sub>-BW from the LO due to the input capacitance (imaginary part) to <1 MHz [6]. In contrast, the simulated frequency shifting rises to ~5 MHz for the typical passive mixer-first RX [4]. The series inductance (bondwire) can be as large as 1.9 nH when targeting an $S_{11} < -10$ dB, which is within the practical range and can help to lower the NF by 0.4 dB by virtue of enhancing the passband gain, while suppressing harmonic folding terms [19]. The



**Fig. 14** (a) Four-path SC gain loop as a RX. The four-phase BB, extracted via SW<sub>B</sub> driven by a 25% LO out-phased with the N-path filter, avoids the BB noise from leaking to the source port (i.e., better NF), with SW<sub>L</sub> +  $C_F$  already embedded in the downmix function [6]. (b) Simulated gain responses at  $V_{i, RX}$  and  $V_{o, RX}$ . (c) Passband frequency shifting increases from 0.3 to 0.8 MHz when the RF band ranges from band5 to band2 and the passband gain droop increases from 1.6 to 1.9 dB

simulated passband frequency shifting ranges from 0.3 to 0.8 MHz when the RF frequency varies from 0.836 GHz (band5) to 1.88 GHz (band2) (Fig. 14c), with the passband roll-off ranging from 1.6 to 1.9 dB. Compared to the TX mode, here the passband ripple is  $\sim$ 1 dB higher due to the impact of BB impedance, which we can address with the post-emphasis digital equalizer [13].

With the input impedance matching provided by the BB impedance frequency-translated to the RF port [6], a large $R_F$ (9.3 k$\Omega$) can be used to concurrently improve the gain, NF, and OB rejection. $R_{BR}$ and $C_{BR}$ mainly define the BW of the BB low-pass response.

For the BB extraction, unlike the RX in [6] that exploits the series resistors  $R_1$ , here we employ switches SW<sub>B</sub> in series with the BB amplifiers (Fig. 14a). SW<sub>B</sub> driven by the same 25%-duty-cycle LO are out-phased with those in the N-path filter, to prevent the BB noise from leaking directly to the antenna port (only leaking to the output of  $G_{mRF}$ ), resulting in a better NF. Figure 15 plots the simulated OB



**Fig. 15** Simulated OB rejection at  $V_{i, RX}$  and  $V_{o, RX}$  and NF versus (**a**) the BB series resistors  $R_1$  in [6] and (**b**) the on-resistance  $R_{SWB}$  of the out-phased switches SW<sub>B</sub> at 2 GHz

rejection and NF for two different BB extraction techniques at 2 GHz. Figure 15a exhibits the NF optimized to 3.12 dB with the BB series resistor  $R_1$  set at 100  $\Omega$ . Smaller  $R_1$  will induce a higher gain drop whereas a large  $R_1$  will generate additional noise from itself. Here (Fig. 15b), the on-resistance  $R_{SWB}$  of the out-phased switch SW<sub>B</sub> sized at the same value of 100  $\Omega$  improves the NF to 1.84 dB. Besides, both BB extraction techniques provide similar OB rejection at both  $V_{i, RX}$  and  $V_{o, RX}$ .

#### 2.4 Four-Phase LO Generator and TX-/RX-Mode Logics

Figure 16 presents a div-by-4 ring counter that generates the four-phase LO. To achieve high LO phase precision at low power, transmission gates form the dynamic D flip-flops, with a phase corrector added after the first input buffer which receives the off-chip differential master clock ($4LO_P$ and $4LO_N$) running at 4×LO. Due to the reconfigurability of the TXR, we add a TX-/RX-mode logic block to activate or disable the switches SW<sub>L</sub>, SW<sub>R</sub>, and SW<sub>B</sub>. For the TX mode, the four-phase LO drives $LO_{1-4, SWL}$ and $LO_{1-4, SWR}$, while the switch SW<sub>B</sub> is off. For the RX mode, switch SW<sub>R</sub> is on, while $LO_{1-4, SWL}$ and $LO_{1-4, SWB}$ share the same four-phase LO but are out of phase with each other. By properly sizing and determining the transistors' ratios in the output buffers and TX-/RX-mode logic block, the nonoverlap LO robustly withstands PVT variations. The simulated LO phase noise at 2 GHz is -159.8 and -160.7 dBc/Hz at 80 MHz offset for TX and RX modes, respectively. The LO generator at 2 GHz draws 10.3 mA (7 mA) in the TX (RX) mode.
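The divide-by-4 operation described above can be illustrated with a purely behavioral model. The sketch below is an idealized one-hot ring counter clocked at 4×LO; it is not the transistor-level circuit of Fig. 16 and ignores the phase corrector, the buffers, finite edge rates, and the TX-/RX-mode gating. The function name `ring_counter_phases` is hypothetical.

```python
def ring_counter_phases(n_clock_cycles, n_phases=4):
    """Behavioral one-hot ring counter clocked at n_phases x LO.

    Returns n_phases lists of 0/1 samples, one sample per master-clock cycle.
    """
    state = [1] + [0] * (n_phases - 1)       # one-hot initial state
    waves = [[] for _ in range(n_phases)]
    for _ in range(n_clock_cycles):
        for k in range(n_phases):
            waves[k].append(state[k])
        state = [state[-1]] + state[:-1]     # shift the single '1' to the next phase
    return waves


phases = ring_counter_phases(16)
for k, wave in enumerate(phases, start=1):
    duty = sum(wave) / len(wave)
    print(f"LO{k}: {''.join(str(b) for b in wave)}  duty = {duty:.0%}")
```

In this ideal model each phase is high for exactly one master-clock cycle out of four (25% duty cycle) and no two phases are high at the same time. In silicon, the output buffers additionally have to guarantee the nonoverlap margin across PVT, which this model does not capture.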



Fig. 16 Twenty-five percent duty-cycle four-phase LO generator with TX-/RX-mode switch logics

#### 2.5 Measurement Results

Figure 17 exhibits the TXR prototype fabricated in 65-nm CMOS without inductors or baluns; it occupies a very small die area (0.038 mm<sup>2</sup>) dominated by the four-path feedback capacitors (32 pF) and the PAD. The TX and RX modes controlled by switches  $SW_{TX - RX}$  operate separately.

#### TX Mode

The power supply of the $G_{mRF}$ and the LO generator is 1.1 V while that of the PAD is 2.5 V. Figure 18 depicts the measured output spectrum of a 10-MHz-BW 64-QAM OFDM signal at the LTE band2 (1.88 GHz). The ACLR<sub>EUTRA1</sub> and ACLR<sub>EUTRA2</sub> are -40 and -51.9 dBc, respectively, at a -1-dBm output power after de-embedding the loss of the cable and PCB. The EVM is 2.0%. By adjusting the gain-phase balancing of $V_{BB, TX1 - 4}$, the spurs due to I/Q mismatch and LO feedthrough are suppressible to <-40 dBc. By simply sweeping the LO frequency, we measure high-Q bandpass responses at different RF that are consistent with the simulations (Fig. 19a). The noise floor is -154.5 dBc/Hz at 80-MHz offset for band2 (Fig. 19b), which includes the thermal noise of the TX path and the phase noise of the LO generator. We obtained similar measured results at LTE band5 (0.836 GHz at 45-MHz offset) and band21 (1.455 GHz at 48-MHz offset). At a -1-dBm output power, CIM$_3$ and CIM$_5$ are <-49 and <-63 dBc, respectively (Fig. 19b).

Fig. 17 Chip photo of the TXR, reconfigurable as a TX or RX by simple mode switching

Fig. 18 TX mode: output spectrum of a 10-MHz BW 64-QAM OFDM signal at LTE band2

The PAD (16.3 mW), the $G_{mRF}$ (10.8 mW), and the LO generator (11.3 mW) dominate the power consumption at LTE band2. In Fig. 20a, the power consumption rises from 31.3 (band5) to 38.4 mW (band2) due to the dynamic type of the LO generator that has an average power efficiency of ~6.6 mW/GHz. The TX-mode



Fig. 19 TX mode: (a) LO-defined bandpass responses at different LTE bands (matching the simulations), with the output response (0 dB) referred to a 0-dBm  $P_{out}$  and (b) CIM<sub>3, 5</sub> at 5-MHz BB frequency and OB noise



Fig. 20 TX mode: (a) power breakdown at different RF. (b) Power consumption versus output power for the LTE band2 and band5

power consumption downscales accordingly when the output power backs off (Fig. 20b) for band2 and band5. For example, at a 3-dB power back-off, the power consumption at band2 (band5) drops by 5.1 mW (4.8 mW), mainly associated with the PAD. The power saving is <15% with the 3-dB power back-off only based on the gate biases ( $V_{\rm B1, 2}$ ) of the PAD. We should explore a more power-efficient variable gain PAD in order that it can operate out of the N-path SC gain loop.
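The power figures reported for the TX mode can be cross-checked with a few lines of arithmetic. The sketch below sums the band2 breakdown, converts the output power from dBm to mW, and derives the overall efficiency as output power over total consumption. All numbers are taken from this section (output powers of -1 dBm at band2 and -1.2 dBm at band5); defining the system efficiency as this simple ratio is an assumption made here.

```python
import math


def dbm_to_mw(p_dbm):
    """Convert a power level from dBm to mW."""
    return 10.0 ** (p_dbm / 10.0)


def efficiency_percent(p_out_dbm, p_dc_mw):
    """Overall efficiency: RF output power divided by total DC power, in percent."""
    return 100.0 * dbm_to_mw(p_out_dbm) / p_dc_mw


# Band2 power breakdown in mW (from the text)
BAND2_BREAKDOWN = {"PAD": 16.3, "GmRF": 10.8, "LO generator": 11.3}
total_band2 = sum(BAND2_BREAKDOWN.values())
TOTAL_BAND5 = 31.3   # mW, total consumption at band5 (from the text)

eff_band2 = efficiency_percent(-1.0, total_band2)
eff_band5 = efficiency_percent(-1.2, TOTAL_BAND5)
backoff_saving_band2 = 100.0 * 5.1 / total_band2   # 5.1-mW drop at 3-dB back-off

print(f"band2 total      = {total_band2:.1f} mW")
print(f"band2 efficiency = {eff_band2:.1f} %")
print(f"band5 efficiency = {eff_band5:.1f} %")
print(f"band2 saving at 3-dB back-off = {backoff_saving_band2:.1f} %")
```

The three band2 contributions add up to the 38.4 mW total, and the resulting efficiencies round to about 2.1% (band2) and 2.4% (band5).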

Table 1 presents the performance summary as well as the benchmark with state-of-the-art current-/voltage-mode LTE TXs [8, 20]. Due to the effective gain-boosted N-path filtering, this circuit succeeds in improving the multiband flexibility and area efficiency that is 24 times better than the state of the art, while preserving a comparable power efficiency (2.1% for band2 and 2.4% for band5). However, the
|  | This Work (TX mode) |  | JSSC'14 [1.8] <sup>a</sup> |  | ISSCC'15 [1.21] |
|---|---|---|---|---|---|
| TX Techniques | SC Gain Loop +<br>Gain-Boosted N-Path Filter +<br>Wideband Push-Pull PAD |  | Current-Mode +<br>Class-A/B Power Mixer +<br>Passive Baluns |  | Voltage-Mode Mixer +<br>33%-Duty-Cycle LO +<br>Passive Baluns |
| On-chip Balun/Inductor | Zero |  | Four |  | Two |
| Multi-Band Flexibility | Defined by LO |  | Count on Baluns |  | Count on Baluns & Paths |
| External Matching Parts | Zero (compatible w/ RX) |  | Zero (only TX) |  | Zero (only TX) |
|  | Measured performance at different LTE bands (LTE10, 10-MHz signal BW) |  |  |  |  |
|  | Band2<br>(1.88 GHz) | Band5<br>(0.836 GHz) | Band2<br>(1.88 GHz) | Band5<br>(0.836 GHz) | Band13<br>(0.782 GHz) |
| Output Power, P<sub>out</sub> (dBm) | -1 | -1.2 | 3.1 | 2.8 | 2 |
| Output Noise (dBc/Hz)<br>@ Freq. Offset (MHz) | -154.5 @ 80 | -156 @ 45 | -158 <sup>b</sup> @ 80 | -159 <sup>b</sup> @ 45 | -157.9 @ 31<br>(P<sub>out</sub> = -1 dBm) |
| ACLR<sub>EUTRA1</sub> (dBc) | -40.3 | -41.6 | -43 | -43.4 | -54 |
| ACLR<sub>EUTRA2</sub> (dBc) | -51.9 | -50.3 | -54.5 | -54.9 | N/A |
| EVM (%) | 2.0 | 2.1 | 1.4 | 1.4 | 0.8 |
| Power (mW) | 38.4 | 31.3 | 69.6 <sup>c</sup> | 73.6 <sup>c</sup> | 216 |
| TX Efficiency (%) | 2.1 | 2.4 | 2.9 | 2.6 | 0.7 |
| Active Area (mm<sup>2</sup>) | 0.038 |  | 1.06 <sup>c</sup> |  | 0.93 |
| Supply Voltage (V) | 1.1, 2.5 |  | 1.8 |  | 1.8 |
| Technology | 65 nm CMOS |  | 55 nm LP CMOS |  | 40 nm LP CMOS |

Table 1 Comparison with State-of-the-Art LTE TXs

<sup>a</sup> BBs are generated by on-chip DAC; <sup>b</sup> Measured with 50/20 Resource Block; <sup>c</sup> Without DAC, Biquad and 2 baluns

output noise is inferior when compared with [8] that concentrates the power budget on the final power mixer stage and [20]. We also acknowledge that [8] and [20] have a higher output power and better spectral purity under a higher power budget.
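The efficiency and area figures quoted from Table 1 can be cross-checked with a few lines of arithmetic. The sketch below assumes that "TX efficiency" means RF output power divided by DC power, which is not stated explicitly in the text but reproduces the tabulated values after rounding; the names `ROWS`, `dbm_to_mw`, and `tx_efficiency_percent` are illustrative only.

```python
# Values copied from Table 1: (label, output power in dBm, DC power in mW).
ROWS = [
    ("This work, band2", -1.0, 38.4),
    ("This work, band5", -1.2, 31.3),
    ("JSSC'14, band2", 3.1, 69.6),
    ("JSSC'14, band5", 2.8, 73.6),
    ("ISSCC'15, band13", 2.0, 216.0),
]


def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power level from dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def tx_efficiency_percent(p_out_dbm: float, p_dc_mw: float) -> float:
    """Assumed definition: RF output power over DC power, in percent."""
    return 100.0 * dbm_to_mw(p_out_dbm) / p_dc_mw


for label, p_out_dbm, p_dc_mw in ROWS:
    eff = tx_efficiency_percent(p_out_dbm, p_dc_mw)
    print(f"{label:18s} efficiency = {eff:.1f} %")

# Active-area ratio against the smaller of the two prior designs (ISSCC'15).
AREA_THIS_WORK_MM2 = 0.038
AREA_ISSCC15_MM2 = 0.93
AREA_RATIO = AREA_ISSCC15_MM2 / AREA_THIS_WORK_MM2
print(f"area ratio vs ISSCC'15 = {AREA_RATIO:.1f}x")
```

Under this assumed definition, the area ratio against the ISSCC'15 design is consistent with the "24 times" figure quoted above.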

#### **RX Mode**

We use a single 1-V supply for the RX mode. In Fig. 21a, the circuit exhibits narrowband input impedance matching with $S_{11} < -12$ dB, with the position simply defined by the LO. The unmatched OB parts are of low impedance, favoring blocker rejection. The NF starts at 2.2 dB and rises to 3.2 dB at 1.88 GHz (Fig. 21b) due to the BW limit of $G_{mRF}$. The total power consumption rises from 16.3 to 20 mW with frequency, mainly due to the LO generator (Fig. 21b). $G_{mRF}$ and the BB circuits consume 9.2 and 3.8 mW, respectively. The IIP<sub>2</sub>, IIP<sub>3</sub>, and P<sub>-1dB</sub> measurements assess the linearity. From Fig. 22a, the IB-IIP<sub>2</sub>/IIP<sub>3</sub> is +30/-12 dBm, whereas the OB-IIP<sub>2</sub>/IIP<sub>3</sub> is +48/+8 dBm at 80-MHz offset. We measured the IIP<sub>2</sub> profile by applying two-tone tests with frequencies at $f_{LO} + \Delta f$ and $f_{LO} + \Delta f + 1$ MHz, whereas we obtained the IIP<sub>3</sub> profile with tones at $f_{LO} + \Delta f$ and $f_{LO} + 2\Delta f - 1$ MHz. At 80-MHz offset, the measured OB-P<sub>-1dB</sub> is -5 dBm (Fig. 22b).
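The two-tone frequencies above are chosen so that the intermodulation product of interest lands 1 MHz away from the LO, that is, inside the BB passband after downconversion. A minimal sketch of that arithmetic follows; the function name and the example values (1.88-GHz LO, 80-MHz offset) are illustrative choices, with frequencies kept as integers in Hz.

```python
def two_tone_products(f_lo_hz: int, delta_f_hz: int, spacing_hz: int = 1_000_000):
    """Return the BB frequencies of the IM2 and IM3 products for the
    two-tone plans described in the text (all values in Hz)."""
    # IIP2 plan: tones at f_LO + df and f_LO + df + 1 MHz.
    f1 = f_lo_hz + delta_f_hz
    f2 = f_lo_hz + delta_f_hz + spacing_hz
    im2_bb = f2 - f1  # second-order difference product, already at BB

    # IIP3 plan: tones at f_LO + df and f_LO + 2*df - 1 MHz.
    g1 = f_lo_hz + delta_f_hz
    g2 = f_lo_hz + 2 * delta_f_hz - spacing_hz
    im3_rf = 2 * g1 - g2       # third-order product 2*g1 - g2
    im3_bb = im3_rf - f_lo_hz  # offset from the LO after downconversion
    return im2_bb, im3_bb


IM2_BB, IM3_BB = two_tone_products(f_lo_hz=1_880_000_000, delta_f_hz=80_000_000)
print(f"IM2 falls at {IM2_BB / 1e6:.0f} MHz, IM3 at {IM3_BB / 1e6:.0f} MHz in BB")
```

Both products fall at the tone-spacing frequency regardless of the offset $\Delta f$, which is what allows the IIP<sub>2</sub> and IIP<sub>3</sub> profiles to be swept over offset.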

Figure 23a plots the NF measured at BB that is  $\leq 3.2$  dB at 5 MHz under LO = 0.836 GHz. The blocker NF is 16 dB with a 0-dBm CW blocker injected at



Fig. 21 RX mode: (a) LO-defined narrowband  $S_{11}$ , and (b) RF-to-IF gain, power consumption, and NF



**Fig. 22** RX mode: (a) IIP<sub>2</sub> and IIP<sub>3</sub> profiles and (b) RF-to-IF gain versus blocker power at 80-MHz offset



Fig. 23 RX mode: (a) NF versus BB frequency and (b) blocker NF at 80-MHz offset

|                             | This Work – RX-Mode                                                         | ISSCC'15 [1.6]                                                               | JSSC'14 [1.22]                                     | JSSC'14 [1.23]                                           |  |
|-----------------------------|-----------------------------------------------------------------------------|------------------------------------------------------------------------------|----------------------------------------------------|----------------------------------------------------------|--|
| RX Techniques               | Gain-Boosted-Mixer-<br>First + N-Path Filtering +<br>Switched-BB Extraction | Gain-Boosted-Mixer-<br>First + N-Path Filtering +<br>Resistive-BB Extraction | Current-Reuse +<br>Active/Passive<br>N-Path Mixers | RF LNA +<br>Passive Mixer +<br>G <sub>m</sub> -C + OpAmp |  |
| RF Input Style              | Single-Ended                                                                | Single-Ended                                                                 | Single-Ended                                       | Differential                                             |  |
| External Matching Parts     | Zero                                                                        | Zero                                                                         | Zero                                               | Transformer                                              |  |
| Supply (V)                  | 1                                                                           | 0.7, 1.2                                                                     | 1.2, 2.5                                           | 0.9                                                      |  |
| Power (mW) @ RF (GHz)       | 20@1.88                                                                     | 11@1.5                                                                       | 16.2@0.85                                          | 40@3                                                     |  |
| DSB NF <sup>a</sup> (dB) | 3.2 @ 1.88 | 2.9 @ 1.5 | 5.5 @ 0.85 | 3.1 @ 3 |  |
| BB BW (MHz)                 | ~9                                                                          | 2                                                                            | 9                                                  | 0.5 to 50                                                |  |
| Die Size (mm <sup>2</sup> ) | 0.038 (include TX)                                                          | 0.028                                                                        | 0.55                                               | ~0.6                                                     |  |
| 0-dBm<br>Blocker NF (dB)    | 16 @ 80 MHz                                                                 | 13.5 @ 80 MHz                                                                | N/A                                                | 14 @ 80 MHz                                              |  |
| OB-P <sub>1dB</sub> (dBm)   | -5 @ 80 MHz                                                                 | -6 @ 80 MHz                                                                  | -2.5 @ 50 MHz                                      | -12.5 @ 80 MHz                                           |  |
| OB-IIP3 (dBm)               | +8                                                                          | +13                                                                          | +17.4                                              | +3                                                       |  |
| OB-IIP2 (dBm)               | +48                                                                         | +50                                                                          | +61                                                | +80 (calibrated)                                         |  |
| BB Filtering | 1 Real Pole |  | 2 Complex Poles<br>+ 2 Zeros | 1 Real Pole +<br>1 Biquad |  |
| Voltage Gain (dB)           | 36                                                                          | 38                                                                           | 51±1                                               | 70                                                       |  |
| Technology                  | 65 nm CMOS                                                                  | 65 nm CMOS                                                                   | 65 nm CMOS                                         | 28 nm CMOS                                               |  |

Table 2 Comparison with State-of-the-Art LTE RXs

<sup>a</sup> Measured flat NF at BB.

80-MHz offset (Fig. 23b). As estimated from simulations, 4 dB of this blocker NF is due to the saturation of $G_{mRF}$ and the increase of the on-resistance of SW<sub>L</sub>; both lead to lower OB rejection. Raising the supply voltage and power budget of $G_{mRF}$ would improve it. Another 5 dB is due to the gain compression at BB caused by the use of a large $R_{BR}$. The reciprocal mixing of the LO phase noise contributes an additional 4 dB, with the remainder mainly caused by the phase noise of the signal generator (Agilent E4438C) that provides the CW blocker signal. Moreover, the simulated blocker NF (<10 dB) from Cadence (qpss + qpnoise) is typically better than the measured one, as the latter includes the collective effect of the equipment's noise limit and uncertainty [21].

Table 2 summarizes the performance and compares it with state-of-the-art wideband RXs [6, 21, 22]. We achieve an NF and die size similar to those of [6] while requiring only a single supply. Although this work consumes more power than [6], it operates at a 1.25 times higher RF and has a 4.5 times larger BB BW. The die area is at least 14 times smaller than those of [21] and [22] at a comparable power consumption, NF, and OB-IIP<sub>3</sub>.

## 2.6 Conclusions

We developed an area-efficient SAW-less multiband TXR using an N-path SC gain loop, which is reconfigurable as a TX or RX by properly injecting and extracting the RF and BB signals. Specifically, with a four-path SC network as the feedback path of a gain stage, we embody all essential TX and RX functions, that is, signal amplification, high-Q bandpass filtering, and I/Q (de)modulation. The LO-defined bandpass filtering effectively suppresses the OB noise in the TX mode and OB blockers in the RX mode. The circuit covers a wide range of RF without any on-chip inductors or external input-matching components. The circuit analysis builds on an open-loop TX model using the Miller effect, which simplifies the calculation of the signal transfer function and the noise contributions of $R_{\rm BT}$, $G_{\rm mRF}$, and the on-resistances of all switches. The RX mode features a switched BB extraction technique to improve the RF. The very small area of the TXR makes it an attractive candidate for cost reduction of SAW-less multiband cellular radios, although a higher TX output power (>0 dBm to account for the PCB loss) and a power-efficient variable gain still need to be developed for practical cell phone applications.

# **3** 1.4–2.7-GHz FDD SAW-Less Transmitter for 5G-NR Using an N-Path Filter Modulator

For the sub-6-GHz 5G new radio (5G-NR), it is increasingly challenging to reduce the number of surface acoustic wave (SAW) filters between the transmitter (TX) and the power amplifier (PA), due to the increased channel bandwidth (BW). As listed in Table 3, the 3GPP standard [23] specifies ten frequency division duplex (FDD) bands between 1.4 and 2.7 GHz. A wide channel BW accompanied by a small TX-to-receiver (RX) frequency spacing ($\Delta f$) results in a small $\Delta f/BW$ ratio; for instance, it is 2.4 for the NR-n74 band. A SAW-less TX has to emit negligible out-of-band (OB) noise at the nearby RX band to avoid desensitization. Recent efforts [8, 24] rely on high-order baseband (BB) filters and substantial bias currents to suppress the OB noise (e.g., -158 dBc/Hz at 30-MHz offset), yet at the cost of large power consumption (>90 mW) and area (1.06 mm<sup>2</sup>). Although the charge-domain direct-launch digital TX [25] is more flexible and consumes less power (40.3 mW) and area (0.22 mm<sup>2</sup>), it has a limited output power (-3.5 dBm) and entails off-chip baluns to underpin multiband coverage. The direct quadrature voltage modulator [7] demonstrates adequate output power (4 dBm), low output noise (-159 dBc/Hz), and high TX efficiency (10%), but it shows no multiband flexibility due to its LC-based PA driver (PAD) and needs additional BB filtering to suppress the OB noise from the digital-to-analog converter (DAC).

| NR-FDD<br>Bands | TX Freq.<br>(MHz) | RX Freq.<br>(MHz) | Δf<br>(MHz) | Max. BW<br>(MHz) |
|-----------------|-------------------|-------------------|------------|------------------|
| n1              | 1920 – 1980       | 2110 - 2170       | 190        | 20               |
| n2              | 1850 – 1910       | 1930 – 1990       | 80         | 20               |
| n3              | 1710 – 1785       | 1805 - 1880       | 95         | 30               |
| n7              | 2500 - 2570       | 2620 - 2690       | 120        | 20               |
| n25             | 1850 - 1915       | 1930 - 1995       | 80         | 20               |
| n30             | 2305 – 2315       | 2350 - 2360       | 45         | 10               |
| n65             | 1920 - 2010       | 2110 - 2200       | 190        | 20               |
| n66             | 1710 – 1780       | 2110 - 2200       | 400        | 40               |
| n70             | 1695 – 1710       | 1995 - 2020       | 300        | 25               |
| n74             | 1427 – 1470       | 1475 – 1518       | 48         | 20               |

Table 3 The 5G-NR FDD bands in the 1.4–2.7-GHz range
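To see which bands are most demanding for a SAW-less TX, the $\Delta f/BW$ ratio can be tabulated directly from Table 3. The short sketch below uses the $\Delta f$ and maximum BW columns as given; the dictionary name `BANDS` and the ranking step are illustrative additions.

```python
# (TX-RX spacing in MHz, maximum channel BW in MHz), copied from Table 3.
BANDS = {
    "n1": (190, 20),
    "n2": (80, 20),
    "n3": (95, 30),
    "n7": (120, 20),
    "n25": (80, 20),
    "n30": (45, 10),
    "n65": (190, 20),
    "n66": (400, 40),
    "n70": (300, 25),
    "n74": (48, 20),
}

# Ratio of duplex spacing to channel bandwidth: smaller means harder filtering.
RATIOS = {band: delta_f / bw for band, (delta_f, bw) in BANDS.items()}
TIGHTEST = min(RATIOS, key=RATIOS.get)

for band, ratio in sorted(RATIOS.items(), key=lambda item: item[1]):
    print(f"{band:>4s}: delta_f/BW = {ratio:.2f}")
print(f"smallest ratio: {TIGHTEST}")
```

With these table values the smallest ratio belongs to n74, matching the 2.4 quoted in the text.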

Merging the gain-boosted N-path filter with the I/Q modulator in [26, 27] enables flexible RF bandpass filtering on-chip. Unlike the typical TXs in [7, 24, 25] that are open-loop designs, such a filter modulator (FIL-MOD) operates in a closed loop to perform I/Q modulation and high-Q bandpass filtering concurrently. The local oscillator (LO) frequency simply tunes the center frequency. Yet, the filtering effectiveness becomes moderate as the signal BW extends, due to the tight trade-off between the passband BW and OB rejection. Although the FIL-MOD [26, 27] can cover the 0.7–2-GHz LTE bands and supports up to a 10-MHz BW, its OB noise is limited to -154.5 dBc/Hz.

Here, we introduce a SAW-less TX supporting a 20-MHz BW, which covers the ten 5G-NR FDD bands within the 1.4–2.7-GHz range. We establish a number of circuit techniques to balance the performance metrics, namely, the channel BW, the output noise at small $\Delta f$, the multiband flexibility, and the integration level. The key innovations are threefold:

- 1. A BW-extended N-path filter modulator (BW-Ext FIL-MOD): by embedding a high-order gain-boosted N-path filter into the I/Q modulator, we obtain BW extension and steeper OB filtering.
- 2. An isolated BB input network: it avoids the mutual loading effect between the BW-Ext FIL-MOD and the input network, thus upholding a high-Q filtering profile at RF, while reducing the nonlinearity and cross talk at BB.
- 3. A transimpedance amplifier (TIA)-based PAD: it absorbs the bias and signal currents of the BW-Ext FIL-MOD for better linearity and power efficiency. It also features an inner parallel $G_{\rm m}$ linearizer to suppress its third-order cross-intermodulation product (CIM<sub>3</sub>).

#### 3.1 Existing FIL-MOD and Proposed BW-Ext FIL-MOD

In [27], the gain-boosted N-path filter was merged with the I/Q modulator. This relaxed the order of the BB filters, which otherwise should be high enough to suppress the OB noise from the DAC. Yet, the gain-boosted N-path filter suffered from a tight trade-off between the passband BW and OB rejection, as explained next. Afterward, we introduce the BW-Ext FIL-MOD that alleviates such a trade-off.

#### Existing FIL-MOD

Figure 24a displays the simplified schematic of the FIL-MOD from [27]. The switch SW<sub>L</sub> upmixes the BB signals to RF, while the gain stage $-G_{m1}$ offers signal amplification. A switched-capacitor (SC) N-path negative-feedback network (NFN) made by switch SW<sub>R</sub> and capacitor $C_F$ surrounds $-G_{m1}$. The input network, simplified as $Z_S$, can be a simple passive RC filter [7, 27]. According to [9, 17], the open-loop gain provided by $-G_{m1}$ offers three benefits due to the created Miller effect with $C_F$: (1) it boosts the effective capacitance of $C_F$ at the input of $-G_{m1}$, which reduces the chip area for $C_F$; (2) it divides the effective ON-resistance of SW<sub>L, R</sub>, enabling higher OB rejection or relaxing the LO power budget; and (3) it creates high-Q bandpass filtering at both the input and the output of $-G_{m1}$, since the in-band RF voltage adds in-phase over a switching period, while the OB RF voltages cancel out. Yet, there is a hard trade-off between the passband BW (10 MHz in [27]) and OB rejection, rendering the FIL-MOD unsuitable for wideband applications.

Figure 24a also shows an analytical model of the FIL-MOD for analyzing its RF bandpass response: we can frequency-translate the BB low-pass response to RF as a bandpass response, owing to the bidirectional transparency of the passive mixers [4, 28]. By omitting SW<sub>L, R</sub>, the BB equivalent is $-G_{m1}$ surrounded by the NFN made by $C_F$. Based on this model, we can obtain the first-order BB transfer function from $V_{BB, I}(s)$ to $V_{BB, O}(s)$ (Fig. 24b):



**Fig. 24** (a) The existing N-path FIL-MOD and its analytical model. It features a gain-boosted N-path filter to realize high-Q bandpass filtering at flexible RF. (b) The frequency response up-translated from BB to RF

$$\begin{aligned} H(s)_{\text{MOD}} = \left(\frac{V_{\text{BB,O}}(s)}{V_{\text{BB,I}}(s)}\right)_{\text{MOD}} &= r_{\text{o}}C_{\text{F}}\omega_{3\text{dB}} \cdot \frac{s - G_{\text{m1}}/C_{\text{F}}}{s + \omega_{3\text{dB}}}, \\ \omega_{3\text{dB}} &= \frac{1}{C_{\text{F}} \cdot (G_{\text{m1}}r_{\text{o}}Z_{\text{S}} + r_{\text{o}} + Z_{\text{S}})}, \end{aligned}$$
(13)

where $r_{\rm o}$ is the output resistance of $-G_{\rm m1}$ and $\omega_{\rm 3dB}$ is the passband BW at BB. When $G_{\rm m1}r_{\rm o} \gg 1$ and $G_{\rm m1}Z_{\rm S} \gg 1$, the term $G_{\rm m1}r_{\rm o}Z_{\rm S}$ dominates the denominator and we obtain $\omega_{\rm 3dB} \approx 1/(C_{\rm F}G_{\rm m1}r_{\rm o}Z_{\rm S})$, in which the passband BW is inversely proportional to $C_{\rm F}$. Referring to the gain loop inside the analytical model, the input conductance is $(Z_1)^{-1} \approx s(G_{\rm m1}r_{\rm o})C_{\rm F}$, which implies a first-order low-pass response. With a frequency translation to RF, we create a second-order RF bandpass response, with a passband BW of $2 \cdot \omega_{\rm 3dB}$. The LO frequency tunes the center frequency $\omega_{\rm LO}$.
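As a rough numerical illustration of Eq. (13), the sketch below evaluates the exact and approximate corner frequencies and the DC gain. It borrows $G_{m1} = 100$ mS, $r_o = 20\ \Omega$, and $C_F = 5$ pF from the Fig. 26 parameter set and, as a simplifying assumption that is not made in the text, treats $Z_S$ as a pure 400-$\Omega$ resistance (the text describes $Z_S$ as an RC filter). The function names are illustrative, and the numbers are BB-model estimates rather than simulated LTV results.

```python
import math


def omega_3db(c_f: float, g_m1: float, r_o: float, z_s: float) -> float:
    """Exact BB corner frequency of Eq. (13), in rad/s."""
    return 1.0 / (c_f * (g_m1 * r_o * z_s + r_o + z_s))


def omega_3db_high_gain(c_f: float, g_m1: float, r_o: float, z_s: float) -> float:
    """Approximation valid when Gm1*ro >> 1 and Gm1*Zs >> 1."""
    return 1.0 / (c_f * g_m1 * r_o * z_s)


def h_mod(s: complex, c_f: float, g_m1: float, r_o: float, z_s: float) -> complex:
    """First-order BB transfer function of Eq. (13)."""
    w = omega_3db(c_f, g_m1, r_o, z_s)
    return r_o * c_f * w * (s - g_m1 / c_f) / (s + w)


# Assumed values: Gm1, ro, CF from the Fig. 26 set; Zs taken as resistive.
G_M1, R_O, C_F, Z_S = 100e-3, 20.0, 5e-12, 400.0

W_EXACT = omega_3db(C_F, G_M1, R_O, Z_S)
W_APPROX = omega_3db_high_gain(C_F, G_M1, R_O, Z_S)
DC_GAIN = h_mod(0j, C_F, G_M1, R_O, Z_S)

print(f"exact  f_3dB = {W_EXACT / (2 * math.pi) / 1e6:.1f} MHz at BB")
print(f"approx f_3dB = {W_APPROX / (2 * math.pi) / 1e6:.1f} MHz at BB")
print(f"DC gain = {DC_GAIN.real:.2f} (expected -Gm1*ro)")
```

Because $G_{m1}r_o$ is only 2 for these values, the high-gain approximation noticeably overestimates the corner frequency here; it becomes accurate only as the open-loop gain grows.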

#### The Proposed BW-Ext FIL-MOD

To alleviate the trade-off between the passband BW and OB rejection, we introduce a higher-order gain-boosted N-path filter, implemented by adding an N-path SC positive-feedback network (PFN) composed of switch SW<sub>Z</sub> and capacitor $C_Z$ (Fig. 25a). LO waveforms in anti-phase with those of SW<sub>L, R</sub> drive SW<sub>Z</sub> to ensure a positive feedback. The parallelized PFN and NFN co-synthesize a complex pole pair, as explained by the analytical model in Fig. 25a. The BB equivalent becomes $-G_{m1}$ surrounded by both the PFN and NFN. We model the PFN as a gain stage $-A_Z$ in series with $C_Z$, whereas the NFN is only $C_F$. $A_Z$ is the conversion gain of the passive mixer driven by the 25%-duty-cycle LO waveforms, with $A_Z \approx 0.9$. According to [29], we can derive the BB transfer function from $V_{BB, I}(s)$ to $V_{BB, O}(s)$ in the analytical model (Fig. 25b), given as Eq. (14).



**Fig. 25** (a) The proposed BW-Ext N-path FIL-MOD and its analytical model. It features an NFN and a PFN to form a complex pole pair with $-G_{m1}$, widening the passband BW. (b) The frequency response up-translated from BB to RF



where $Q_S$ is the Q factor for the BB resonance and $\omega_{BB}$ is the natural frequency. Co-synthesizing the NFN and PFN yields a low-pass response with a dual-pole roll-off profile. We can size $Q_S$ and $\omega_{BB}$ via $C_{F, Z}$. When $G_{m1}r_o \gg 1$ and $G_{m1}Z_S \gg 1$, $Q_S$ is proportional to a capacitor ratio $C_Z/C_F$. By choosing a moderate $Q_S$, we can obtain a wider passband BW $\Delta \omega$ (Fig. 25b) by extending $\omega_{BB}$. In Eq. (14), an unwanted zero appears due to $C_F$ and is located at $G_{m1}/C_F$; we can neglect it since it is far away from the poles. The filter order derives from the input conductance of the gain loop (Fig. 25a), which is $(Z_1)^{-1} + (Z_2)^{-1} \approx s^2 r_o C_F C_Z + s((1 - G_{m1}r_o A_Z)C_Z + G_{m1}r_o C_F)$ [29]. As a result, the filter order of the BB input conductance increases due to the PFN. With such a response frequency-translated to RF, we can create a higher-order bandpass response. In Fig. 25b, $\omega_{LO} \pm \omega_{BB}$ defines the lower and upper center frequencies $\omega_{C1, 2}$, respectively. The RF passband BW is $2 \cdot \Delta \omega$, and the Q factor is given by $Q_{FILMOD} = \omega_{LO}/(2 \cdot \Delta \omega)$.

Figure 26 compares the RF frequency responses of the FIL-MOD and the BW-Ext FIL-MOD. We set $G_{m1} = 100 \text{ mS}$, $r_o = 20\ \Omega$, $C_F = 5 \text{ pF}$, and $C_Z = 10 \text{ pF}$, and suppose $Z_{\rm S}$ is a simple passive RC low-pass filter with resistor $R_{\rm B} = 400 \ \Omega$ and capacitor $C_{\rm B} = 2 \,\mathrm{pF}$. Due to the created complex pole pair, the BW-Ext FIL-MOD widens the flat passband BW (-1-dB point) to 46.8 MHz, which is 1.5 times larger than that of the FIL-MOD, along with a steeper roll-off profile. At the 80-MHz offset, the BW-Ext FIL-MOD improves the OB rejection by 15 dB at $V_{\rm RF}$ (Fig. 25a) at a 2-GHz RF.

We can alternatively recognize the SC N-path PFN and NFN as notch filters; thus the OB noise feedforwards via these two paths bypassing  $-G_{m1}$  [3, 30]. Due to their inverse phase responses, the OB noise partially cancels at the output of  $-G_{m1}$ , leading to better OB rejection.

#### 3.2 TX Design and Analysis

#### Architecture

Figure 27 details the TX schematic that exhibits the four-phase BB signals (i.e., I/Q and differential) received from the isolated BB input network (SW<sub>B</sub> and  $C_B$ ). The



Fig. 27 Schematic of the fully integrated multiband SAW-less TX with low RX band noise. The LO defines the RF operating frequency. The key blocks are the BW-Ext FIL-MOD, the isolated BB input, and the wideband TIA-based PAD. The PAD absorbs the bias and signal currents of  $-G_{m1}$  for better linearity and TX efficiency

BW-Ext FIL-MOD performs I/Q modulation by upmixing the BB signals through the switch SW<sub>L</sub>. Subsequently, the gain stage $(-G_{m1})$ amplifies the upmixed signal. A simple NMOS device realizes $-G_{m1}$, such that the following TIA-based PAD can absorb its bias and signal currents $(I_{O, P} \text{ and } I_{O, N})$. This co-design benefits both the linearity and TX efficiency, together with an inner parallel $G_m$ linearizer $(-G_{m2})$ to suppress the CIM<sub>3</sub> at the TX output. Then, an on-chip center-tapped transformer combines the differential signal currents before driving the off-chip 50-$\Omega$ load for measurements. The differential implementation not only benefits the output power but also allows using the cross-feedback capacitors $C_1$ to cancel the parasitic effects associated with $-G_{m1}$, trimming the passband shape. $C_Z$ offers the design freedom to balance the OB rejection with the passband flatness at both $V_{I, P}$ and $V_{O, P}$.

The BW-Ext FIL-MOD embeds an N-path SC NFN (SW<sub>R</sub> and $C_F$) and an N-path SC PFN (SW<sub>Z</sub> and $C_Z$) around SW<sub>L</sub> and $-G_{m1}$. SW<sub>R</sub> performs the downmix function, while SW<sub>Z</sub> performs the anti-phased downmix function to create the PFN. For example, when the four-phase 25%-duty-cycle LO waveforms (i.e., LO<sub>1</sub> to LO<sub>4</sub>) drive the switches SW<sub>L, R</sub>, the anti-phased LO waveforms (i.e., LO<sub>3</sub> to LO<sub>2</sub>) drive SW<sub>Z</sub>. In addition, the isolated BB input network utilizes the adjacent phase of each nonoverlapping LO waveform to time-interleave the operations of the BB injection and the BW-Ext FIL-MOD. Thus, the adjacent LO waveforms (i.e., LO<sub>2</sub> to LO<sub>1</sub>) drive SW<sub>B</sub>.
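The LO phase assignment described above amounts to a cyclic rotation of the four-phase clock sequence. The sketch below assumes that "LO<sub>3</sub> to LO<sub>2</sub>" denotes the cyclic order LO3, LO4, LO1, LO2 and that "LO<sub>2</sub> to LO<sub>1</sub>" denotes LO2, LO3, LO4, LO1; the helper `rotated_phases` is a hypothetical name used only for illustration.

```python
N_PHASES = 4  # four-phase, 25%-duty-cycle, nonoverlapping LO


def rotated_phases(shift: int, n: int = N_PHASES) -> list:
    """LO phase driving each of the n paths, rotated cyclically by `shift`."""
    return [f"LO{(k + shift) % n + 1}" for k in range(n)]


# Path k (k = 0..3) is driven as follows:
SW_LR = rotated_phases(0)             # upmix / NFN switches SW_L, SW_R
SW_Z = rotated_phases(N_PHASES // 2)  # PFN switches: anti-phase (half-period shift)
SW_B = rotated_phases(1)              # BB input switches: adjacent phase

for path, (lr, z, b) in enumerate(zip(SW_LR, SW_Z, SW_B), start=1):
    print(f"path {path}: SW_L,R <- {lr}, SW_Z <- {z}, SW_B <- {b}")

# In every path, the BB injection and the FIL-MOD switches use different
# phases, so with nonoverlapping clocks they are never on at the same time.
TIME_INTERLEAVED = all(lr != b for lr, b in zip(SW_LR, SW_B))
```

Under this reading, each path's SW<sub>B</sub> closes in the slot right after its SW<sub>L, R</sub>, which is what time-interleaves the BB injection with the FIL-MOD operation.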

#### Functional View of the TX

To be more intuitive, Fig. 28a presents a functional view (which is not the equivalent circuit) of the proposed multiband BW-extended TX, with the I/Q modulation and BW-extended bandpass filtering decomposed as two cascaded functions to simplify the comparison. The I/Q modulation is in the forward path like a typical TX, synthesizing an RF signal at  $V_{I, P}$  from the four-phase BB signals injected by the isolated BB input network. The RF signal virtually experiences BW-extended bandpass filtering at both  $V_{I, P}$  and  $V_{O, P}$  to reject the OB noise. In Fig. 28b, we draw the model of the N-path SC NFN as a paralleled RLC resonator around  $-G_{m1}$ , forming a gain-boosted N-path filter where the tunable inductor represents the tunable RF center frequency [3], whereas we can model the N-path SC PFN as series RLC resonators, which shunt to ground in Fig. 28a, thus creating a third-order Chebyshev notch filter around  $-G_{m1}$  when combining it with the paralleled RLC resonator. The NFN and PFN together result in a high-order gain-boosted N-path filter. The tunable inductor in the series RLC resonators also defines the RF center frequency, which is the same as the paralleled RLC resonator.

When compared with the conventional passive RC-BB input network in Fig. 28b, the isolated BB input network lowers the BB mutual loading effect and decouples the *Q*-factor degradation of the created RF bandpass response in the BW-Ext FIL-MOD (detailed in Sect. 3.2.4). The circuit in [27] exploited a wideband single-ended voltage-input PAD in the conventional FIL-MOD (Fig. 28b); however, it can only deliver a -1-dBm output power at a -40-dBc ACLR<sub>1</sub>. Here, we



**Fig. 28** (a) Functional view of this work that features the isolated BB input, gain-boosted N-path filter with BW extension, and a wideband TIA as the PAD. (b) Functional view of the TX in [27] based on the passive RC-BB input, the gain-boosted N-path filter, and the voltage-input amplifier as the PAD

introduce a wideband differential TIA-based PAD. In this case, the PAD (reported later) can deliver a larger output power along with better linearity.

#### LTI Model of the BW-Ext FIL-MOD

For the RX, the bidirectional transparency property of the passive mixer allows frequency translation and impedance transformation [4, 28], which permits a linear time-invariant (LTI) model to stand in for the original linear time-variant (LTV) circuitry. The LTI model represents the time-varying effects of the N-path switches by applying an impedance transform factor $\alpha$ to the BB impedance and adding a virtual shunt resistance $R_{\rm sh}$. When N = 4 and 25%-duty-cycle LO waveforms drive the passive switches, the scaling factor is $\alpha = 2/\pi^2$. $R_{\rm sh}$ represents the loss due to harmonic up-conversion and dissipation or the harmonic folding effect due to the passive switches.

The LTI model also applies to the passive-mixer-based TX thanks to the bidirectional transparency property. Similar to [4, 28], the scaling factor $\alpha$ acts on the BB impedance in the TX. Yet, we can neglect the virtual shunt $R_{sh}$ for the TX, since the harmonic folding effect contributes little to the in-band amplified signal and to the OB noise [27]. Figure 29 displays the equivalent LTI-based model of the BW-Ext FIL-MOD. To simplify the analysis, we employ a single-ended architecture. Since the isolated BB input network operates as a BB-to-BB gain response in the signal path, the resistor $R_{\rm B} = 4 \cdot R_{\rm SWB}$ represents the switch





**Fig. 30** The voltage gain, predicted by the LTI-based analysis against the simulation result over a 200-MHz span at 2-GHz RF. $G_{m1} = 100$ mS, $r_o = 20 \ \Omega$, $C_F = 5$ pF, $C_Z = 10$ pF, $R_B = 400 \ \Omega$, and $C_B = 2$ pF

SW<sub>B</sub>. Here, we only focus on the LTI model for the BW-Ext FIL-MOD, leaving the analysis of the isolated BB input network for Sect. 3.2.4. In Fig. 29, the scaling factor $\alpha$ acts on the components operating at BB frequency ($\omega - \omega_{LO}$), that is, $C_Z$, $C_F$, $C_B$, and $R_B$. The voltage source $V_{th}$ is the approximated LTI-based Thévenin equivalent voltage, which is $\sqrt{2}/\pi \left(e^{j\pi/4}V_{BB,I}(s) + e^{-j\pi/4}V_{BB,Q}(s)\right)$ at BB frequency ($\omega - \omega_{LO}$) [10]. When neglecting the input parasitic effect of $-G_{m1}$ and the *ON*-resistance $R_{SWL}$, Eq. (15) gives the signal at $V_{O, P}$, which is a third-order bandpass response with a complex pole pair. Eq. (15) contains two left-half s-plane zeros located at $\alpha/(C_Z R_{SWZ})$ and $\alpha/(C_F R_{SWR})$ when $G_{m1}R_{SWR} \ll 1$, which we can move to higher frequencies by using a smaller $C_Z$ (or $R_{SWZ}$) and $C_F$ (or $R_{SWR}$). Figure 30 plots Eq. (15) at 2 GHz, which matches well with the simulated

curve spanning from 1.9 to 2.1 GHz, except that there is a 1.3-dB gain increment since we neglect the input parasitic capacitance of  $-G_{m1}$ :

$$\begin{aligned} V_{O,P} \approx{} & \frac{(C_Z R_{SWZ} s + \alpha)((C_F G_{m1} R_{SWR} - C_F) s + G_{m1} \alpha) r_o}{-C_F C_Z R_B C_B R_{SWZ} (R_{SWR} + r_o)} \\ & \cdot \frac{\sqrt{2} / \pi \left(e^{j\pi/4} V_{BB,I}(s) + e^{-j\pi/4} V_{BB,Q}(s)\right)}{s \cdot (s^2 + (\omega_o/Q) s + \omega_o^2)} \end{aligned}$$
(15)

$$\omega_{o}^{2} = \frac{\alpha(G_{m1}r_{o}R_{B}\alpha(A_{Z}C_{Z}-C_{F})-R_{B}\alpha(C_{B}+C_{F}+C_{Z})-C_{F}(R_{SWR}+r_{o})-C_{Z}R_{SWZ})}{-C_{F}C_{Z}R_{B}C_{B}R_{SWZ}(R_{SWR}+r_{o})} \qquad (16)$$

$$\begin{aligned} Q &= \sqrt{C_F C_Z R_B C_B R_{SWZ}} \times \frac{\sqrt{(R_{SWR} + r_o)(\alpha(G_{m1}r_o R_B\alpha(C_F - A_Z C_Z) + R_B\alpha(C_B + C_F + C_Z) + C_F(R_{SWR} + r_o) + C_Z R_{SWZ}))}}{\Delta} \\ \Delta &= C_F R_B \alpha(G_{m1}r_o C_Z (A_Z R_{SWR} - R_{SWZ}) - C_B (R_{SWR} + r_o)) \\ &\quad - C_F C_Z R_B \alpha(R_{SWR} + R_{SWZ} + r_o(1 + A_Z)) - C_F C_Z R_{SWZ} (R_{SWR} + r_o) \\ &\quad - C_B R_B C_Z R_{SWZ} \alpha \end{aligned} \qquad (17)$$

Since the PFN may introduce a stability concern, we must design the BW-Ext FIL-MOD carefully. As explained in Eq. (14), the capacitive ratio  $C_Z/C_F$  dominates the Q factor of the bandpass response and also the stability. When ignoring the input capacitance of  $-G_{m1}$  and the *ON*-resistances  $R_{SWL, R, Z}$ , based on the equivalent LTI-based model of Fig. 29, we can derive the loop gain by breaking the closed loop  $(-G_{m1} + C_{F, Z})$  and injecting a test voltage. With the closed loop broken at the input of  $-G_{m1}$ , we obtain the open-loop voltage gain as

$$\frac{V_{\text{test},O}(s)}{V_{\text{test},I}(s)} = \frac{s \cdot G_{\text{m1}}R_{B}\alpha r_{\text{o}}(C_{Z}A_{Z} - C_{\text{F}})}{s^{2} \cdot \left[(C_{Z}(1 + A_{Z}) + C_{B})C_{\text{F}}R_{B}r_{\text{o}}\right] + s \cdot \left[R_{B}\alpha(C_{\text{F}} + C_{Z} + C_{B}) + C_{\text{F}}r_{\text{o}}\right] + \alpha}$$
(18)

which is a second-order high-pass response. With  $G_{m1} = 100$  mS,  $r_o = 20 \Omega$ ,  $C_F = 5$  pF,  $C_B = 2$  pF,  $R_B = 400 \Omega$ , and  $A_Z = 0.9$ , we can plot Eq. (18), as in Fig. 31, for different  $C_Z$  over the 1–100-MHz range. The loop gain rises as  $C_Z$  increases and stays <0 dB when  $C_Z$  is  $\leq 20$  pF. Thus, we can secure the closed-loop stability when  $C_Z/C_F \leq 4$ .
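The trend described for Eq. (18) can be checked numerically. The sketch below evaluates Eq. (18) with the component values quoted above and reports the largest loop-gain magnitude between 1 and 100 MHz for a few values of  $C_Z$ . The value of the scaling factor  $\alpha$  is not given in this section; the default `alpha = 2/pi**2` is an assumption (a value commonly used for four-path LTI models), and the helper names `loop_gain` and `peak_loop_gain_db` are illustrative only. The printed numbers therefore indicate the trend rather than reproduce Fig. 31.

```python
import math


def loop_gain(f_hz, c_z, alpha=2 / math.pi**2, g_m1=100e-3, r_o=20.0,
              c_f=5e-12, c_b=2e-12, r_b=400.0, a_z=0.9):
    """Evaluate Eq. (18) at s = j*2*pi*f. NOTE: alpha is an assumed value."""
    s = 1j * 2 * math.pi * f_hz
    num = s * g_m1 * r_b * alpha * r_o * (c_z * a_z - c_f)
    den = (s**2 * (c_z * (1 + a_z) + c_b) * c_f * r_b * r_o
           + s * (r_b * alpha * (c_f + c_z + c_b) + c_f * r_o)
           + alpha)
    return num / den


def peak_loop_gain_db(c_z, f_start=1e6, f_stop=100e6, points=400):
    """Largest |loop gain| in dB on a log-spaced grid from f_start to f_stop."""
    best = -math.inf
    for i in range(points):
        f = f_start * (f_stop / f_start) ** (i / (points - 1))
        mag = abs(loop_gain(f, c_z))
        if mag > 0.0:  # guard against log10(0) when C_Z*A_Z equals C_F
            best = max(best, 20 * math.log10(mag))
    return best


for c_z_pf in (10, 20, 30):
    peak = peak_loop_gain_db(c_z_pf * 1e-12)
    print(f"C_Z = {c_z_pf} pF: peak loop gain = {peak:.2f} dB")
```

Under the assumed  $\alpha$ , the peak loop gain grows with  $C_Z$  and crosses 0 dB between 20 and 30 pF, which agrees with the  $C_Z/C_F \leq 4$  guideline stated above.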

Besides that,  $C_Z$  offers the freedom to balance the OB rejection with the passband flatness at both  $V_{I, P}$  and  $V_{O, P}$ . In Fig. 32a, as  $C_Z$  increases, the Q factor of the RF bandpass response increases, as reflected in the passband ripple. When  $C_Z = 30$  pF, the BW-Ext FIL-MOD tends to be unstable, and its passband ripple is ~3.2 dB. When compared with  $C_Z = 5$  pF, the 10-pF  $C_Z$  extends the flat passband BW (-1 dB point) by 17.8 MHz, along with an improved OB rejection by 3.2 dB at an 80-MHz offset. Also, the *ON*-resistance  $R_{SWZ}$  offers freedom to balance the OB


Fig. 31 The loop gain is <0 dB when  $C_Z \le 20$  pF, indicating that the BW-Ext FIL-MOD can operate with stability



Fig. 32 (a) The simulated tunable BW and OB rejection for different  $C_Z$ . (b) The on-resistance  $R_{SWZ}$  offers another degree of freedom for tuning the BW and OB rejection

rejection and passband BW (Fig. 32b). A smaller  $R_{SWZ}$  improves the performance but at the cost of LO power. When  $R_{SWZ}$  is 20  $\Omega$ , the flat passband BW and OB rejection at the 80-MHz offset improve by 12.4 MHz and 3.1 dB, respectively, when compared with an 80- $\Omega$   $R_{SWZ}$ .

#### **Isolated BB Input Network**

The isolated BB input helps alleviate the mutual loading effect between the BW-Ext FIL-MOD and the BB input network, while hindering the BB cross talk. For the 25%-duty-cycle *LO* waveforms, a short overlap among them may occur during the transitions. For example, when the falling edge of  $LO_1$  overlaps with the rising edge of  $LO_2$ , a BB current loop occurs in the conventional passive RC-BB input network [27] (Fig. 33), with both the N-path SC PFN and NFN omitted and the finite input impedance of  $-G_{m1}$  expressed by the parasitic capacitor  $C_P$ . The induced BB current loop affects the voltage across  $C_P$ , thus degrading the linearity [7]. Here, the isolated BB input network hinders this effect by utilizing the adjacent phase of the *LO* waveforms to time-interleave the operation of the BB injection. Although the switches driven by LO<sub>1</sub> and LO<sub>2</sub> turn on simultaneously during the short overlap period, the switch driven by LO<sub>3</sub> turns off, and then the BB current goes to ground through  $C_B$ , thus preventing a BB current loop.

In the signal path, the isolated BB input network is similar to the passive RC-BB input: it acts as a BB low-pass filter with a -3-dB corner frequency of  $1/(4R_{SWB}C_B)$ . After that, with the BB signals injected into the BW-Ext FIL-MOD, the BB injection circuits become a load. Essentially, the BW-Ext FIL-MOD sees a passive RC-BB input network as a parallel RC load, whereas each LO sequence in the isolated BB input network only sees a shunt capacitor  $C_{\rm B}$ , thus preserving the created RF bandpass response. In Fig. 34a, the isolated BB input improves the OB rejection by 7 dB at an 80-MHz offset for  $C_{\rm B} = 2$  pF,  $R_{\rm B} = 400 \ \Omega$ , and  $R_{SWB} = 100 \Omega$ . Unlike the BB low-pass filter [7, 27], where the OB rejection is proportional to  $C_{\rm B}$ , a smaller  $C_{\rm B}$  enhances the OB rejection for the isolated BB input network. In Fig. 34b, the OB rejection clearly improves at a small  $C_{\rm B}$  (<2 pF) when compared with the passive RC-BB input, since the BW-Ext FIL-MOD dominates the OB rejection, and a larger  $C_{\rm B}$  degrades the Q factor of the created RF bandpass response. The lower limit of  $C_{\rm B}$  is set by the passband BW; thus we should choose a proper  $C_{\rm B}$  to balance the passband BW and OB rejection. Also, the ON-resistance  $R_{SWB}$  offers the freedom to balance the passband BW of the BW-Ext FIL-MOD with the OB noise rejection (Fig. 34c). Increasing  $R_{SWB}$  enhances the OB rejection along with a lower output noise.
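As a quick numerical check of the BB low-pass corner, the sketch below evaluates  $1/(4R_{SWB}C_B)$  for the values quoted above. It assumes the expression is an angular frequency (rad/s), so it divides by  $2\pi$  to obtain hertz; the function name `bb_corner_hz` is illustrative only.

```python
import math


def bb_corner_hz(r_swb_ohm, c_b_farad):
    """-3-dB corner in Hz, ASSUMING 1/(4*R_SWB*C_B) is an angular frequency."""
    return 1.0 / (2 * math.pi * 4 * r_swb_ohm * c_b_farad)


# Values quoted in the text: R_SWB = 100 ohm (R_B = 4*R_SWB = 400 ohm), C_B = 2 pF
corner = bb_corner_hz(100.0, 2e-12)
print(f"BB corner = {corner / 1e6:.1f} MHz")

# Halving C_B doubles the corner, which illustrates the C_B versus passband-BW trade-off
print(f"With C_B = 1 pF: {bb_corner_hz(100.0, 1e-12) / 1e6:.1f} MHz")
```

Under this reading, the corner sits near 200 MHz for  $C_{\rm B} = 2$  pF, well above the 20-MHz signal BW.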



Fig. 33 The LO overlap effect between the falling edge of  $LO_1$  and the rising edge of  $LO_2$ . It induces a BB current loop in the typical passive RC-BB input network, which the isolated BB input network hinders



Fig. 34 (a) Isolated BB input improves OB rejection when compared with the passive RC-BB input. (b) Simulated OB rejection and passband BW at different  $C_{\rm B}$ . (c)  $R_{\rm SWB}$  allows balancing the OB rejection and OB noise at  $V_{\rm O, P}$ 

#### Wideband TIA-Based PAD

Since a single-ended voltage-input amplifier used as a PAD can suffer from low linearity and output power [27], herein we present a differential wideband TIA-based PAD to absorb the bias and signal currents of  $-G_{m1}$ . An on-chip 1.4:1 transformer combines the differential RF outputs, shunted by a 5-bit tunable  $C_T$  (0.1 pF/step) to expand the RF coverage. With the cross-coupling capacitor  $C_3$  (2 pF), a moderately sized thick-oxide transistor ( $M_3$ , 300 µm/150 nm) is adequate to enhance the reverse isolation, reliability, and voltage gain (by 4.6 dB from simulation). The differential implementation not only aids the output power but also allows the utilization of a cross-feedback capacitor  $C_1$  to cancel the parasitic effects associated with  $-G_{m1}$ , trimming the passband shape.  $C_1$  is a 4-bit tunable capacitor bank with 50 fF/step. We use transistor  $M_2$  to isolate the two cross-coupling capacitors ( $C_1$  and  $C_3$ ) that serve different purposes.

The CIM<sub>3</sub> term is always a key challenge for the spurious emission in TXs [20, 31, 32], due to the third-order intermodulation product of signals around (1 × LO) and (3 × LO). With a BB tone ( $f_{BB}$ ) fed to the TX (Fig. 35a), there is, after the mixer, the desired tone at  $f_{LO} + f_{BB}$  and the unwanted tone at  $3 \times f_{LO} - f_{BB}$  due to the use of 25%-duty-cycle *LOs*. After passing the TIA-based PAD, the unwanted CIM<sub>3</sub> term appears at  $f_{LO} - 3 \times f_{BB}$ . Similar to the full-duplex spaced (FDS) jammer test in the RX with the equally duplexed TX leakage and OB jammer injected [30], the OB output-referred third-order intercept point (OB-OIP<sub>3</sub>) can reflect into the CIM<sub>3</sub> term in TX, leading to

$$\text{OB-IIP}_3 = P_{\text{out}} + \frac{P_{\text{in},3LO} + G_{\text{PAD}} - P_{\text{CIM3}}}{2}, \qquad (19)$$

where  $P_{\text{out}}$  is the TX output power,  $P_{\text{in, 3LO}}$  is the PAD input power around the third LO harmonic, and  $G_{\text{PAD}}$  is the gain of the PAD. The PAD input power around the third LO harmonic is 9.5 dB smaller than that around the first LO harmonic for 25%-duty-cycle LO waveforms [7]. In order to reach a <-50 dBc CIM<sub>3</sub>, the OB-IIP<sub>3</sub> should be >24.75 dBm when delivering a 3-dBm output power with a 12-dB PAD gain.
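The arithmetic behind the 24.75-dBm requirement can be retraced with a short sketch of Eq. (19). Two readings are assumed here because the text does not spell them out: the first-LO PAD input power is taken as  $P_{\text{out}} - G_{\text{PAD}}$ , and  $P_{\text{CIM3}}$  is entered as the relative level (-50 dBc), which is the reading that reproduces the quoted number. The function names are illustrative only.

```python
def cim3_frequency(f_lo, f_bb):
    """Third-order product 2*(f_lo + f_bb) - (3*f_lo - f_bb), folded to a positive frequency."""
    return abs(2 * (f_lo + f_bb) - (3 * f_lo - f_bb))


def required_ob_iip3_dbm(p_out_dbm, g_pad_db, cim3_dbc, third_lo_drop_db=9.5):
    """Eq. (19) under the assumptions stated in the lead-in."""
    p_in_1lo_dbm = p_out_dbm - g_pad_db                # assumed first-LO PAD input power
    p_in_3lo_dbm = p_in_1lo_dbm - third_lo_drop_db     # 9.5 dB lower for 25%-duty-cycle LOs
    return p_out_dbm + (p_in_3lo_dbm + g_pad_db - cim3_dbc) / 2


# Frequencies in MHz: a 5-MHz BB tone on a 2000-MHz LO puts CIM3 at f_LO - 3*f_BB
print(cim3_frequency(2000, 5))

# 3-dBm output power, 12-dB PAD gain, -50-dBc CIM3 target
print(required_ob_iip3_dbm(3.0, 12.0, -50.0))
```

With these inputs,  $P_{\text{in, 3LO}} = 3 - 12 - 9.5 = -18.5$  dBm, and Eq. (19) gives  $3 + (-18.5 + 12 + 50)/2 = 24.75$  dBm.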



**Fig. 35** (a) Signals around 1xLO and 3xLO (at the output of the FIL-MOD) inject to  $-G_{m1}$  and generate CIM<sub>3</sub> at  $V_{O, P}$ . (b) CIM<sub>3</sub> partially cancelled by the paralleled linearizer  $-G_{m2}$ 

Unlike [32], which employs an area-hungry passive LC filter between the MOD and PAD, and [20], which uses power-hungry multiple LO phases to suppress the third-order harmonic, we utilize an inner parallel  $G_{\rm m}$  linearizer  $(-G_{\rm m2})$  in Fig. 35b to suppress the CIM<sub>3</sub> term at the TX output [33, 34]. We bias  $-G_{\rm m2}$  in the triode region so that it partially cancels the third-order coefficient of  $-G_{\rm m1}$ , which is biased in the saturation region. For a better power efficiency, we bias  $-G_{\rm m1}$  in the class-A/B mode.  $-G_{\rm m1}$  ( $-G_{\rm m2}$ ) is a simple NMOS transistor, sized at 120 µm/30 nm (60 µm/30 nm). With the  $G_{\rm m}$  linearizer, the OB-IIP<sub>3</sub> of the TIA-based PAD improves by 9.2 dB (i.e., to 26.7 dBm) when compared with the circuit without the linearizer. From 500-run PVT Monte-Carlo simulations, only 0.4% of the samples have a CIM<sub>3</sub> < 35 dBc. Yet, we can obtain CIM<sub>3</sub> < 40 dBc for all 500 runs after simple bias calibration on  $-G_{m2}$ . Also, a careful layout is essential to ensure the robustness of the TIA-based PAD.

#### Four-Phase 25%-Duty-Cycle LOGEN

Figure 36 presents a low-power 25%-duty-cycle LO generator (LOGEN). The differential self-biased input buffers first amplify the input clock signals. After adjustment by a phase corrector, a divide-by-2 circuit receives the input signals. To generate the 25%-duty-cycle LO waveforms ( $LO_{1-4}$ ), it is necessary to apply "AND" logic on the frequency-divided 50%-duty-cycle signals  $Q_{1,2}$ ,  $\overline{Q_{1,2}}$  and the signals 2LO<sub>P, N</sub>. Since the rising and falling edges of LO<sub>1-4</sub> derive from the 2LO<sub>P, N</sub>, the divide-by-2 circuit does not contribute noise to  $LO_{1-4}$  [7]. Unlike [7, 27], which employ a single-ended inverter-based input buffer, a differential self-biased input buffer amplifier here improves the phase noise without sacrificing the power [35]. By introducing self-biased circuitry, we obtain a differential amplifier consisting of transistors  $M_1$  to  $M_4$ . The embedded inverter ( $M_3$  and  $M_4$ ) acts as a push-pull amplifier, adaptively biased by a negative feedback scheme using the current-sourcing transistor  $M_1$  and the current-sinking transistor  $M_2$ . We embed another push-pull amplifier (inverter) with  $M_5$  and  $M_6$  that inversely amplifies the input clock signal LO<sub>IN, N</sub> to  $V_{out}$ . Simulated at 2 GHz, the LOGEN exhibits a phase noise of -158.4 dBc/Hz at a 40-MHz offset frequency, improving by ~6 dB when compared with the single-ended input buffer. The LOGEN consumes 9.8 mW at 2 GHz, of which 0.8 mW results from the differential self-biased input buffer.

Fig. 36 Low-power four-phase 25%-duty-cycle LOGEN with differential self-biased input buffers
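The "AND" operation that turns the 50%-duty-cycle divider outputs into four non-overlapping 25%-duty-cycle phases can be illustrated with an idealized, zero-delay logic model. The sketch below is not the gate-level netlist of Fig. 36: the pairing of the divider outputs (`q1`, `q2`) with the 2LO phases is an assumption chosen for illustration, and each LO period is simply divided into four time slots.

```python
def and_gate(a, b):
    """Element-wise logical AND of two 0/1 waveforms."""
    return [x & y for x, y in zip(a, b)]


def invert(a):
    """Element-wise logical NOT of a 0/1 waveform."""
    return [1 - x for x in a]


def logen_phases(lo_periods=2):
    """Ideal four-phase 25%-duty-cycle generation (ASSUMED signal pairing)."""
    slots = 4 * lo_periods                        # four time slots per LO period
    two_lo_p = [1 if k % 2 == 0 else 0 for k in range(slots)]   # clock at 2*f_LO
    two_lo_n = invert(two_lo_p)
    q1 = [1 if k % 4 in (0, 1) else 0 for k in range(slots)]    # divide-by-2, 50% duty
    q2 = [1 if k % 4 in (1, 2) else 0 for k in range(slots)]    # quadrature copy of q1
    return {
        "LO1": and_gate(q1, two_lo_p),
        "LO2": and_gate(q2, two_lo_n),
        "LO3": and_gate(invert(q1), two_lo_p),
        "LO4": and_gate(invert(q2), two_lo_n),
    }


phases = logen_phases()
for name, wave in phases.items():
    print(name, wave, "duty =", sum(wave) / len(wave))
```

In this ideal model every output pulse is exactly as wide as a 2LO pulse, which mirrors the statement that the output edges derive from 2LO<sub>P, N</sub> rather than from the divider.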

#### **Other Implementation Details**

Resulting from the Miller effect created by the loop gain of  $-G_{m1}$ , small physical  $C_F$  (5 pF) and SW<sub>L, R</sub> (20 µm/30 nm) allow the reduction of the parasitic effects and *LO* power [9]. The simulated  $G_{m1}$  is 110.2 mS and the inverting gain from  $V_{I, P}$  to  $V_{O, P}$  is -1.6 V/V (Fig. 27). From simulations, the main contributions for the output noise at the 80-MHz offset are from  $-G_{m1}$  (27.8%), SW<sub>L, R</sub> (14.4%), SW<sub>Z</sub> (7.2%), and TIA-based PAD (18.2%), whereas SW<sub>B</sub> and LOGEN only contribute 1.2% and 3.8%, respectively. The remainder comes from the 50- $\Omega$  load and bias circuit. SW<sub>B</sub> is hence small (6 µm/30 nm) to save the *LO* power. LOGEN contributes to the output noise in two aspects: the *LO*-feedthrough noise from the switches' parasitic capacitor and the *LO*-modulated phase noise with BB signals. The differential TX implementation cancels the *LO*-feedthrough noise; the created high-Q bandpass filtering at both  $V_{I, P}$  and  $V_{O, P}$  effectively suppresses the *LO*-modulated phase noise, alleviating the *LO* power.

#### 3.3 Measurement Results

Figure 37 shows the fully integrated SAW-less TX, fabricated in 28-nm CMOS, occupying a 0.31-mm<sup>2</sup> active area. The power supply of the LOGEN is 1 V, with the TIA-based PAD powered at 1.8 V. Figure 38 depicts the measured output spectrum, with a 3-dBm output power at 2.535 GHz (NR-n7 band) when applying a 64-QAM SC-FDMA signal with a 20-MHz BW. The ACLR<sub>1</sub> and ACLR<sub>2</sub> are -44.4 and -58.7 dBc, respectively. The EVM is 1.9%. The I/Q mismatch image and LO feedthrough are suppressible to <-40 dBc by manual calibration of the gain-phase balance of the four-phase BB inputs (I/Q and differential). The BB calibration itself is beyond the focus of this work. By simply sweeping the LO frequency, we can consistently measure a high-Q bandpass response at different NR bands covering 1.4–2.7 GHz, which is also consistent with the simulation (Fig. 39). The flat passband BW is >20 MHz. At a 3-dBm output power, the CIM<sub>3</sub> and CIM<sub>5</sub> are -54 and -64.2 dBc, respectively (Fig. 40a). The output noise floor is -158 dBc/Hz at a 120-MHz offset for the NR-n7 band, which includes the thermal noise of the TX path and the modulated phase noise of the LOGEN. Figure 40 summarizes similar measured results at the NR-n2 band (1.88 GHz at 80-MHz offset) and the NR-n74 band (1.4485 GHz at 48-MHz offset), including the CIM<sub>3, 5</sub>, the output noise floor, and the  $ACLR_{1,2}$  (Fig. 40b). When the output power is backed off, the OB noise degradation is  $\leq$ 1.5 dB (Fig. 41a), and the ACLR<sub>1, 2</sub> varies <2 dB regardless of the signal BW of 10 or 20 MHz (Fig. 41b).







Fig. 38 Measured output spectrum for NR-n7 (2.535 GHz) with a 20-MHz BW

The TX can support a wider signal BW (>20 MHz), but at the expense of the OB rejection (refer to Fig. 34b). To improve both the OB noise and signal BW, we can incorporate a better BB DAC and more BB filtering. When compared with the NR-n74 band ( $\Delta f/BW$  ratio = 2.4), although the NR-n3/NR-n66 band has a wider signal BW (30/40 MHz), the larger  $\Delta f/BW$  ratio (3.17/10) alleviates the design challenge. We did not realize a variable-gain control in this work, although it is applicable to the TIA-based PAD. Like [7], we can split the PAD into high-gain and low-gain modes, and in each mode, we can slice the PAD into small equivalent unit cells.

At different  $C_Z$ , we measured the OB rejection and -3-dB passband BW for 5G-NR bands at different offset frequencies (Fig. 42a) according to the 3GPP standard. The OB rejection improves by 3.1 dB (7.2 dB) at a 48 MHz (120 MHz)



Fig. 39 Measured and simulated LO-defined bandpass responses at different 5G-NR bands



Fig. 40 Measured CIM<sub>3, 5</sub>, output noise, and ACLR<sub>1, 2</sub> for different 5G-NR bands



Fig. 41 Output noise and ACLR<sub>1, 2</sub> for 10- and 20-MHz signal BW under power back-off



Fig. 42 Measured and simulated OB rejection and -3-dB passband BW under different  $C_Z$  and SW<sub>Z</sub>



Fig. 43 Power consumption at different 5G-NR bands and under power back-off

offset when compared with the 5-pF  $C_Z$ , at the cost of 2.1 MHz (3.5 MHz) of -3-dB passband BW. However, the -3-dB passband BW is still >43 MHz for  $C_Z = 20$  pF. We can also increase the OB rejection by enlarging SW<sub>Z</sub> (Fig. 42b). When  $R_{SWZ} = 20 \Omega$ , the simulated OB rejection improves by ~6 dB at a 120-MHz offset when compared with a 40- $\Omega$   $R_{SWZ}$ , while yielding an acceptable BW decrement (<3 MHz).

At the NR-n7 band, the  $-G_{m1}$  and PAD (58.3 mW) plus the LOGEN (12.2 mW) dominate the power consumption. From Fig. 43a, the power consumption rises from 55.1 mW at the NR-n74 band to 70.5 mW at the NR-n7 band due to the parasitic effect of  $-G_{m1}$  and PAD and the dynamic nature of the LOGEN, which has an average power efficiency of ~4.8 mW/GHz. Under power back-off, the power consumption scales down accordingly (Fig. 43b) for the NR-n74, the NR-n2, and the NR-n7 bands. For example, at a 3-dB power back-off, the power consumption at the NR-n7 band (NR-n74 band) drops by 12.5 mW (11.2 mW), mainly associated with  $-G_{m1}$  and PAD. The power saving is  $\leq 20\%$ , only based on the gate biases of the PAD.

Table 4 summarizes the chip performance and compares it with the state-of-the-art TXs [25, 27] and [32]. Due to the effective BW-Ext FIL-MOD, isolated BB input

| Parameter | This Work (NR-n74) | This Work (NR-n2) | This Work (NR-n7) | ISSCC'18 [32] | ISSCC'16 [25] | JSSC'17 [27] |
|---|---|---|---|---|---|---|
| TX Techniques | Isolated-BB input + BW-Extended N-Path Filter-Modulator + Wideband TIA-based PAD | Isolated-BB input + BW-Extended N-Path Filter-Modulator + Wideband TIA-based PAD | Isolated-BB input + BW-Extended N-Path Filter-Modulator + Wideband TIA-based PAD | Tracking-Notch-Filter Mixer + PAD | Resistive QDAC + Passive Mixer | N-Path SC Gain Loop + Wideband PAD |
| Fully Integrated | Yes | Yes | Yes | Yes | No (off-chip baluns) | Yes |
| RF Range (GHz) | 1.4 to 2.7 | 1.4 to 2.7 | 1.4 to 2.7 | 1.4 to 2.7 | 0.9, 2.4 | 0.7 to 2 |
| Frequency Band | NR-n74 (1.4485 GHz) | NR-n2 (1.88 GHz) | NR-n7 (2.535 GHz) | HPUE-B41 (2.535 GHz) | 2.4 GHz | LTE-B2 (1.88 GHz) |
| RF BW (MHz) | 20 | 20 | 20 | 20 | 20 | 10 |
| Output Power (dBm) | 3.0 | 3.1 | 3.0 | 3.1 | -3.5 | -1 |
| Power Cons. (mW) | 55.1 | 60.2 | 70.5 | 113.2 | 24.8 | 38.4 |
| TX Efficiency (%) | 3.6 | 3.4 | 2.8 | 1.8 | 1.8 | 2.1 |
| Output Noise (dBc/Hz) @ Δf (MHz) | -157.8 @ 48 | -157.5 @ 80 | -158 @ 120 | -157.8 @ 80 | -158.9 @ 45 | -154.5 @ 80 |
| CIM<sub>3</sub> (dBc) | -52.3 | -52.5 | -54 | -59.6 | <-50 | -52 |
| ACLR<sub>1</sub> (dBc) | -45.4 | -45.6 | -44.4 | -44.7 | -47 | -40.3 |
| EVM (%) | 1.9 | 1.8 | 1.9 | No data | <1.6 | 2.0 |
| Active Area (mm<sup>2</sup>) | 0.31 | 0.31 | 0.31 | 1.04 | 0.22 & | 0.038 # |
| Supply Voltage (V) | 1, 1.8 | 1, 1.8 | 1, 1.8 | No data | 0.9, 1.1 | 1.1, 2.5 |
| CMOS Tech. (nm) | 28 | 28 | 28 | 14 | 28 | 65 |

Table 4 Performance Summary and Benchmark with the State of the Art

& Output baluns are off-chip # Single-ended implementation is compact but leads to a limited output power

network and wideband TIA-based PAD, this work shows a number of performance advantages in linearity, OB noise, and TX efficiency. Although [25] reported a smaller die area than here, it requires off-chip baluns to combine the differential outputs, and its output power is 6.5 dB lower. Compared with [27], this work supports a two times wider signal BW and achieves a 3-dB lower OB noise and a 4-dB higher output power. With similar output power and OB noise as [32], this work shows higher area efficiency  $(3.3\times)$  and TX efficiency (56%).
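The TX efficiency row of Table 4 follows from the output power and power consumption rows. The sketch below assumes the usual definition, RF output power divided by total DC power, and recomputes each entry from the tabulated values; the function name and the dictionary labels are illustrative only.

```python
def tx_efficiency_percent(p_out_dbm, p_dc_mw):
    """TX efficiency in percent, ASSUMING efficiency = P_out / P_DC."""
    p_out_mw = 10 ** (p_out_dbm / 10)   # dBm to mW
    return 100 * p_out_mw / p_dc_mw


# (output power in dBm, power consumption in mW), copied from Table 4
table4 = {
    "This work, NR-n74": (3.0, 55.1),
    "This work, NR-n2": (3.1, 60.2),
    "This work, NR-n7": (3.0, 70.5),
    "ISSCC'18 [32]": (3.1, 113.2),
    "ISSCC'16 [25]": (-3.5, 24.8),
    "JSSC'17 [27]": (-1.0, 38.4),
}

for label, (p_out_dbm, p_dc_mw) in table4.items():
    print(f"{label}: {tx_efficiency_percent(p_out_dbm, p_dc_mw):.1f} %")
```

Taking the tabulated NR-n7 entry (2.8%) against that of [32] (1.8%) gives a ratio of about 1.56, which corresponds to the 56% improvement quoted above.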

### 3.4 Conclusions

This chapter described a 1.4–2.7-GHz FDD SAW-less TX, composed of a BW-Ext FIL-MOD, an isolated BB input network, and a TIA-based PAD to deliver wide passband BW, low output noise, sufficient output power, and high linearity. The BW-Ext FIL-MOD featured a higher-order bandpass response co-synthesized by the PFN and NFN, alleviating the trade-off between the passband BW and OB rejection of the original FIL-MOD. The isolated BB input network upheld the BW-Ext FIL-MOD's high-Q bandpass response during the BB signal injection and avoided the BB cross talk. The TIA-based PAD together with the inner parallel  $G_m$  linearizer allowed sufficient output power, better linearity, and power efficiency. The fully integrated TX prototyped in 28-nm CMOS occupied an active area of 0.31 mm<sup>2</sup>. The circuit obtained high linearity (ACLR<sub>1</sub> = -44.4 dBc) and low OB noise (-158 dBc/Hz) when delivering a 20-MHz signal at a 3-dBm output power. The overall performance rendered this SAW-less TX an attractive candidate for multiband FDD radios.

#### References

1. Franks, L. E., & Sandberg, I. W. (1960). An alternative approach to the realization of network transfer functions: The N-path filters. *The Bell System Technical Journal*, 39, 1321–1350.
2. Ghaffari, A., Klumperink, E. A. M., Soer, M. C. M., & Nauta, B. (2011). Tunable high-Q N-path band-pass filters: Modeling and verification. *IEEE Journal of Solid-State Circuits*, 46(5), 998–1010.
3. Ghaffari, A., Klumperink, E. A. M., & Nauta, B. (2013). Tunable N-path notch filters for blocker suppression: Modeling and verification. *IEEE Journal of Solid-State Circuits*, 48(6), 1370–1382.
4. Andrews, C., & Molnar, A. C. (2010). A passive mixer-first receiver with digitally controlled and widely tunable RF interface. *IEEE Journal of Solid-State Circuits*, 45(12), 2696–2708.
5. Murphy, D., Darabi, H., Abidi, A., Hafez, A., Mirzaei, A., Mikhemar, M., & Chang, M. (2012). A blocker-tolerant, noise-cancelling receiver suitable for wideband wireless applications. *IEEE Journal of Solid-State Circuits*, 47(12), 2943–2963.
6. Lin, Z., Mak, P.-I., & Martins, R. P. (2015, February). A 0.028mm<sup>2</sup> 11mW single-mixing blocker-tolerant receiver with double-RF N-path filtering, S<sub>11</sub> centering, +13dBm OB-IIP3 and 1.5-to-2.9dB NF. In *IEEE ISSCC digest of technical papers* (pp. 36–37).
7. He, X., & van Sinderen, J. (2009). A low-power, SAW-less WCDMA transmitter using direct quadrature voltage modulation. *IEEE Journal of Solid-State Circuits*, 44(12), 3448–3458.
8. Codega, N., Rossi, P., Pirola, A., Liscidini, A., & Castello, R. (2014). A current-mode, low out-of-band noise LTE transmitter with a class-A/B power mixer. *IEEE Journal of Solid-State Circuits*, 49(7), 1627–1638.
9. Lin, Z., Mak, P.-I., & Martins, R. P. (2014). Analysis and modeling of a gain-boosted N-path switched-capacitor bandpass filter. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 61(9), 2560–2568.
10. Mirzaei, A., Murphy, D., & Darabi, H. (2011). Analysis of direct-conversion IQ transmitters with 25% duty-cycle passive mixers. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 58(10), 2318–2331.
11. Mirzaei, A., & Darabi, H. (2011). Analysis of imperfections on performance of 4-phase passive-mixer-based high-Q bandpass filters in SAW-less receivers. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 58(5), 879–892.
12. Kaelus Inc., Report of "EVM degradation in LTE systems by RF filtering". http://www.kaelus.com/en
13. Tohidian, M., Madadi, I., & Staszewski, R. B. (2014). Analysis and design of a high-order discrete-time passive IIR low-pass filter. *IEEE Journal of Solid-State Circuits*, 49(11), 2575–2587.
14. Xu, Y., Zhu, J., & Kinget, P. R. (2014, June). A blocker-tolerant RF front end with harmonic-rejection filtering. In *IEEE Proceedings of radio frequency integrated circuits symposium digest of technical papers* (pp. 39–42).
15. Anadigics LTE Power Amplifier (AWT6652). http://www.anadigics.com/products/view/awt6652
16. Mak, P.-I., & Martins, R. P. (2010). High-/mixed-voltage RF and analog CMOS circuits come of age. *IEEE Circuits and Systems Magazine*, (4), 27–39.
17. Lin, Z., Mak, P.-I., & Martins, R. P. (2014). A sub-GHz multi-ISM-band ZigBee receiver using function-reuse and gain-boosted N-path techniques for IoT applications. *IEEE Journal of Solid-State Circuits*, 49(12), 2990–3004.
18. Park, J. W., & Razavi, B. (2014). Channel selection at RF using Miller bandpass filters. *IEEE Journal of Solid-State Circuits*, 49(12), 3063–3078.
19. Duipmans, L., Struiksma, R. E., Klumperink, E. A. M., Nauta, B., & Vliet, F. E. (2015). Analysis of the signal transfer and folding in N-path filters with a series inductance. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 62(1), 263–272.
20. Chen, Y.-H., Fong, N., Xu, B., & Wang, C. (2015, February). An LTE SAW-less transmitter using 33% duty-cycle LO signals for harmonic suppression. In *IEEE ISSCC digest of technical papers* (pp. 172–173).
21. Lin, F., Mak, P.-I., & Martins, R. P. (2014). An RF-to-BB-current-reuse wideband receiver with parallel N-path active/passive mixers and a single-MOS pole-zero LPF. *IEEE Journal of Solid-State Circuits*, 49(11), 2547–2559.
22. van Liempd, B., Borremans, J., Martens, E., Cha, S., Suys, H., Verbruggen, B., & Craninckx, J. (2014). A 0.9 V 0.4-6 GHz harmonic recombination SDR receiver in 28 nm CMOS with HR3/HR5 and IIP2 calibration. *IEEE Journal of Solid-State Circuits*, 49(8), 1815–1826.
23. 3GPP TR 21.916 V16.0.0., Release 16.
24. Giannini, V., Ingels, M., Sano, T., Debaillie, B., Borremans, J., & Craninckx, J. (2011, February). A multiband LTE SAW-less modulator with -160dBc/Hz RX-band noise in 40nm LP CMOS. In *IEEE ISSCC digest of technical papers* (pp. 374–376).
25. Filho, P., Ingels, M., Wambacq, P., & Craninckx, J. (2016, February). A 0.22mm<sup>2</sup> CMOS resistive charge-based direct-launch digital transmitter with -159dBc/Hz out-of-band noise. In *IEEE ISSCC digest of technical papers* (pp. 250–252).
26. Qi, G., Mak, P.-I., & Martins, R. P. (2016, February). A 0.038mm<sup>2</sup> SAW-less multiband transceiver using an N-path SC gain loop. In *IEEE ISSCC digest of technical papers* (pp. 452–453).
27. Qi, G., Mak, P.-I., & Martins, R. P. (2017). A 0.038mm<sup>2</sup> SAW-less multiband transceiver using an N-path SC gain loop. *IEEE Journal of Solid-State Circuits*, 52, 2055–2070.
28. Andrews, C., & Molnar, A. C. (2010). Implications of passive mixer transparency for impedance matching and noise figure in passive mixer-first receivers. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 57(12), 3092–3103.
29. Lien, Y.-C., Klumperink, E. A. M., Tenbroek, B., Strange, J., & Nauta, B. (2018). Enhanced-selectivity high-linearity low-noise mixer-first receiver with complex pole pair due to capacitive positive feedback. *IEEE Journal of Solid-State Circuits*, 53(5), 1348–1360.
30. Qi, G., van Liempd, B., Mak, P.-I., Martins, R. P., & Craninckx, J. (2018). A SAW-less tunable RF front-end for FDD and IBFD combining an electrical-balance duplexer and a switched-LC N-path LNA. *IEEE Journal of Solid-State Circuits*, 53(5), 1431–1442.
31. Ingels, M., et al. (2013, February). A multiband 40nm CMOS LTE SAW-less modulator with -60dBc C-IM3. In *IEEE ISSCC digest of technical papers* (pp. 338–339).
32. Liu, Q., Kwon, D., & Bui, Q. (2018, February). A 1.4-to-2.7GHz high-efficiency RF transmitter with an automatic 3FLO-suppression tracking-notch-filter mixer supporting HPUE in 14nm FinFET CMOS. In *IEEE ISSCC digest of technical papers* (pp. 172–174).
33. Tanaka, S., Behbahani, F., & Abidi, A. A. (1997, June). A linearization technique for CMOS RF power amplifiers. In *IEEE proceedings of VLSI symposium* (pp. 93–94).
34. Chen, Y., et al. (2018, June). A wideband transmitter for LTE-A HPUE using CIM3 cancellation. In *IEEE proceedings of radio frequency integrated circuits symposium* (pp. 312–315).
35. Yang, X., Zhu, Y., Chan, C.-H., Seng-Pan, U., & Martins, R. P. (2018). Analysis of common-mode interference and jitter of clock receiver circuits with improved topology. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 65(6), 1819–1829.

# Power-Efficient RF and mm-Wave VCOs/PLL



Hao Guo, Zunsong Yang, Chee Cheow Lim, Harikrishnan Ramiah, Yatao Peng, Yong Chen, Jun Yin, Pui-In Mak, and Rui P. Martins

# 1 Introduction

The phase-locked loop (PLL), which provides a pure local oscillator (LO) or clock signal, is one of the most critical building blocks in wireless and wireline transceivers. Because it operates at the highest frequency and directly determines the out-of-band phase noise of the PLL, the oscillator plays a key role in the PLL subsystem. This chapter presents several techniques to improve the oscillator performance from radio frequency (RF) to millimeter-wave (mm-wave) frequencies. We will also introduce reference spur reduction techniques for the subsampling PLL.

H. Guo · Z. Yang · Y. Peng · Y. Chen · J. Yin (🖂) · P.-I. Mak
State-Key Laboratory of Analog and Mixed-Signal VLSI/IME and FST-ECE, University of Macau, Macao SAR, China
e-mail: yb77441@um.edu.mo; yb77431@um.edu.mo; martinpeng@um.edu.mo; ychen@um.edu.mo; junyin@um.edu.mo; pimak@um.edu.mo

C. C. Lim
State-Key Laboratory of Analog and Mixed-Signal VLSI/IME and FST-ECE, University of Macau, Macao SAR, China
School of Engineering, Asia Pacific University of Technology and Innovation, Kuala Lumpur, Malaysia
e-mail: lim.cheecheow@apu.edu.my

H. Ramiah
Department of Electrical Engineering, University of Malaya, Kuala Lumpur, Malaysia
e-mail: hrkhari@um.edu.my

R. P. Martins
State-Key Laboratory of Analog and Mixed-Signal VLSI/IME and FST-ECE, University of Macau, Macao SAR, China
On leave from Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
e-mail: rmartins@um.edu.mo

<sup>©</sup> The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Paulo da Silva Martins, P.-I. Mak (eds.), *Analog and Mixed-Signal Circuits in Nanoscale CMOS*, Analog Circuits and Signal Processing, https://doi.org/10.1007/978-3-031-22231-3\_2

# 2 Inverse Class-F (Class-F<sup>-1</sup>) VCO

#### 2.1 Toward the Power-Efficient Low-Phase-Noise Oscillators

The trend toward denser modulation schemes in emerging RF communication systems has continuously driven the development of high-purity LC oscillators with low power consumption. For an LC oscillator operating at  $\omega_0$ , we can generally express its phase noise at an offset frequency of  $\Delta \omega$  as [1]

$$\mathcal{L}(\Delta\omega) = 10 \cdot \log\left[F \cdot \frac{kT}{V_{\rm p}^2} \cdot \frac{L}{Q_1} \cdot \frac{\omega_0^3}{(\Delta\omega)^2}\right] \tag{1}$$

where k is Boltzmann's constant, T is the absolute temperature,  $V_{\rm p}$  is the differential oscillation amplitude, L is the tank inductance,  $Q_1$  is the tank quality factor (Q-factor) at  $\omega_0$ , and F is the device excess noise factor defined as the total effective output noise normalized to the noise of the tank parallel resistance [2]. Notably, the inductor Q-factor ( $Q_L$ ) dominates  $Q_1$  in oscillators below 10 GHz. Equation (1) suggests three methods to reduce the phase noise: (1) choosing an inductor with small inductance and high  $Q_{\rm L}$  to minimize the  $L/Q_{\rm L}$  ratio, (2) maximizing the oscillation amplitude  $V_{\rm P}$ , and (3) cutting down the noise factor F by reducing the noise contribution from the transistors. In the first method, scaling down the inductor dimensions too aggressively degrades  $Q_{\rm L}$ , limiting the minimum  $L/Q_{\rm L}$  [3]. In the second method,  $V_{\rm P}$  is limited by the supply voltage, which unfortunately diminishes as CMOS technology scales down. These two methods mainly rely on the CMOS technology, which is not easy to manipulate at the circuit design stage. Regarding the third method, the noise contribution from the transistors directly relates to the specific oscillator circuit topology, which offers plenty of room for the circuit designer to explore novel circuit techniques or oscillator topologies.
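As a rough numerical illustration of Eq. (1), the sketch below evaluates the phase noise for a set of hypothetical design values (the noise factor, amplitude, inductance, and Q-factor are assumptions chosen only for illustration, not values from this chapter) and shows how each of the three methods moves the result.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def phase_noise_dbc_hz(noise_factor, v_p, l_tank, q_tank, f0, offset, temp=300.0):
    """Evaluate Eq. (1): phase noise in dBc/Hz at `offset` Hz from the carrier f0."""
    w0 = 2.0 * math.pi * f0
    dw = 2.0 * math.pi * offset
    ratio = (noise_factor
             * K_BOLTZMANN * temp / v_p ** 2
             * l_tank / q_tank
             * w0 ** 3 / dw ** 2)
    return 10.0 * math.log10(ratio)


# Hypothetical baseline: F = 2, V_P = 1 V, L = 1 nH, Q = 15, 4 GHz carrier, 1 MHz offset.
base = dict(noise_factor=2.0, v_p=1.0, l_tank=1e-9, q_tank=15.0, f0=4e9, offset=1e6)
pn_base = phase_noise_dbc_hz(**base)
pn_half_l_over_q = phase_noise_dbc_hz(**{**base, "l_tank": 0.5e-9})   # method (1)
pn_double_vp = phase_noise_dbc_hz(**{**base, "v_p": 2.0})             # method (2)
pn_half_f = phase_noise_dbc_hz(**{**base, "noise_factor": 1.0})       # method (3)

print(f"baseline        : {pn_base:.1f} dBc/Hz")
print(f"half L/Q        : {pn_half_l_over_q - pn_base:+.2f} dB")
print(f"double amplitude: {pn_double_vp - pn_base:+.2f} dB")
print(f"half F          : {pn_half_f - pn_base:+.2f} dB")
```

Under these assumptions, halving $L/Q_{\rm L}$ or F each buys about 3 dB, while doubling the amplitude buys about 6 dB, which is why the supply-limited amplitude is such a valuable resource.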

Considering  $V_P$  already maximized for the low-phase-noise design, in a classic class-B oscillator with a tail current source (Fig. 1a), the tail current source provides a high impedance at the tail node, which isolates  $V_S$  from ground and prevents the tank Q degradation even when  $M_{1,2}$  are in the deep triode region under a large  $V_P$ . Thus, such an arrangement suppresses the noise contribution from  $M_{1,2}$  well. On the other hand, the noise of the tail transistor ( $M_T$ ) at low frequency and around  $2\omega_0$  converts to phase noise by mixing with the output signal at  $\omega_0$  through  $M_{1,2}$  [4]. Typically,  $M_T$  can be a dominant noise source. If we remove  $M_T$  and directly connect  $V_S$  to ground, the tank Q degradation will occur when  $M_{1,2}$  are in the deep triode region. Subsequently, the noise contribution from  $M_{1,2}$  will significantly increase.

Figure 1b illustrates that we can suppress the noise contribution from  $M_{1,2}$  under a large  $V_p$ , by utilizing a tail tank resonating at the second harmonic frequency  $(2\omega_0)$  [4]. Here, a high impedance  $Z_{TAIL}$  at  $2\omega_0$  helps to alleviate the tank loading effect. From the viewpoint of the time-variant model [5], a high  $Z_{TAIL}$  at  $2\omega_0$  generates a



Fig. 1 Schematics of (a) the class-B oscillator with a tail current source, (b) the class-B oscillator with a tail tank, (c) the class-B oscillator with implicit CM resonance, and (d) the class- $F_{2,3}$  oscillator

second harmonic voltage at  $V_S$ . Thus, the circuit reshapes the drain-to-source voltage  $V_{DS}$  waveform to have a flat region close to zero when the transistor is in the triode region (Fig. 1b). Since the output phase is insensitive to the transistor channel noise in the flat area of  $V_{DS}$ , there is ideally no noise upconversion, reducing the noise contribution from  $M_{1,2}$  to the same level as with an ideal tail current source. With the aid of the tail tank, the 1.2 GHz oscillator in [4] exhibited an excellent figure of merit (FoM) of 195 dBc/Hz at 10 MHz offset. However, accommodating the extra tail inductor increases the die area.

To realize a compact design, it is necessary to implement the high impedance at  $2\omega_0$  by utilizing the implicit common-mode (CM) resonance of the main LC tank [6]. In a differential spiral inductor with a mutual coupling coefficient k (Fig. 1c), the tank inductance and capacitance are (1 + k)L and  $C_{DM} + C_{CM}$  when the tank is excited by a differential-mode (DM) input, resulting in an oscillation frequency of  $\omega_0 = 1/\sqrt{(1+k)L(C_{\rm DM}+C_{\rm CM})}$ . Under a CM input, the tank inductance and capacitance change to (1 - k)L/2 and  $2C_{CM}$ , corresponding to a CM resonance frequency of  $\omega_{\rm CM} = 1/\sqrt{(1-k)LC_{\rm CM}}$ . By properly selecting k and the  $C_{\rm DM}/C_{\rm CM}$  ratio, we can obtain  $\omega_{CM}/\omega_0 = 2$  such that the tank provides a high impedance  $Z_{CM}$  at  $2\omega_0$ . The 2.85–3.75 GHz oscillator prototype in 28 nm CMOS achieved an FoM between 191.8 and 192.5 dBc/Hz at 5 MHz offset and a 1/f<sup>3</sup> corner between 120 and 240 kHz. We can also apply the CM resonance to a class-F oscillator [7]. From Fig. 1d, the transformer tank employing  $C_{\rm CM}$  in the primary winding and  $C_{\rm DM}$  in the secondary winding exhibits two DM and one CM resonant frequencies. Under CM excitation, the input signal cannot see the secondary winding since k becomes weak due to the magnetic flux cancellation, resulting in  $\omega_{\rm CM} = 1/\sqrt{L_{\rm P,CM}C_{\rm CM}}$ . By properly selecting the ratio between  $L_P C_{CM}$ ,  $L_S C_{DM}$ , and  $L_{P, CM} C_{CM}$ , we can guarantee  $\omega_{\rm CM}/\omega_0 = 2$ . Besides, this work also placed the other DM resonant frequency at  $3\omega_0$ , which eventually generates an  $F_{2,3}$  tank. Moreover, the passive voltage gain  $(A_v)$  from the drain to gate nodes of the transistors provided by the transformer tank helps to reduce the transconductance required to sustain oscillation, further reducing the noise contribution from the transistors. The 5.4–7 GHz class- $F_{2,3}$  oscillator in 40 nm CMOS attained an FoM between 190.5 and 191.4 dBc/Hz at a 10 MHz offset. In addition, references [6, 7] also revealed that the second harmonic resonance reshapes the transistor's drain voltage waveform to achieve symmetric rise and fall times, which suppresses the flicker noise upconversion and improves the close-in phase noise.
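Dividing the two resonance expressions for the differential inductor gives $(\omega_{\rm CM}/\omega_0)^2 = (1+k)(1 + C_{\rm DM}/C_{\rm CM})/(1-k)$, so the capacitor ratio needed for $\omega_{\rm CM}/\omega_0 = 2$ follows directly from k. The sketch below is only an illustration of that algebra; the function names are hypothetical and the example value k = 0.18 is an assumption.

```python
import math


def cdm_over_ccm(k, harmonic=2.0):
    """C_DM/C_CM that places the CM resonance at `harmonic` times omega_0."""
    ratio = harmonic ** 2 * (1.0 - k) / (1.0 + k) - 1.0
    if ratio <= 0.0:
        raise ValueError("coupling too strong: no positive C_DM/C_CM exists")
    return ratio


def resonance_ratio(k, cdm_ccm):
    """omega_CM / omega_0 from the two tank expressions (L and C_CM cancel)."""
    return math.sqrt((1.0 + k) * (1.0 + cdm_ccm) / (1.0 - k))


k_example = 0.18  # assumed coupling coefficient
needed = cdm_over_ccm(k_example)
print(f"k = {k_example}: C_DM/C_CM = {needed:.2f}, "
      f"omega_CM/omega_0 = {resonance_ratio(k_example, needed):.3f}")
```

The expression also shows that a weaker coupling demands a larger $C_{\rm DM}/C_{\rm CM}$ ratio, and that no positive ratio exists once k reaches 0.6.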

The Q-factor ( $Q_2$ ) of the second harmonic resonance can affect the transistor noise suppression capability. Intuitively, a large tank impedance at the second harmonic ( $R_{P2}$ ) can enlarge the second harmonic voltage in  $V_{DS}$ , extending the flat area in the  $V_{DS}$  waveform where the output phase is insensitive to the transistor channel noise. Unfortunately, a high  $Q_2$  is typically unavailable in the CM resonance tank for two reasons. First, the differential spiral inductor employed in [6] suffers from a relatively low Q-factor ( $Q_{L,CM}$ ) at  $\omega_{CM}$  due to unavoidable magnetic flux cancellation. From Fig. 2a, the CM currents in adjacent windings flow in opposite directions, raising the ac resistance and reducing the inductance at CM oscillation (Fig. 2b). The electromagnetic (EM) simulations maintain the inner diameter and sweep the spacing between the two turns to change k and  $Q_{L,CM}$ , while keeping a relatively constant L. From Fig. 2c, a lower k yields a higher  $Q_{L,CM}$  at the cost of chip area. A lower k also calls for a larger  $C_{DM}$ -to- $C_{CM}$  ratio to keep  $\omega_{CM}/\omega_0 = 2$ . Hence, we can hardly realize the required large  $C_{DM}$ -to- $C_{CM}$  ratio at high frequency when all the capacitor banks are off. Second, the CM technique requires the implementation of CM capacitors ( $C_{CM}$ ) with single-ended switched capacitors (SC). When compared



**Fig. 2** (a) CM current excitation. (b) Circuit model of CM current excitation. (c) EM-simulated  $Q_{L, CM}$ , and  $Q_{L, DM}$  versus coupling coefficient k

with the differential SC, the *Q*-factor of the single-ended SC is inherently lower for achieving the same tuning range [8], which further degrades the tank *Q*-factor ( $Q_2$ ) at the CM resonant frequency.

The inverse class-F (class- $F^{-1}$ ) oscillator [9, 10] (discussed in the following subsections) aims to break the abovementioned limitations of the CM resonance technique and further improve the phase noise and power efficiency.

#### 2.2 Principle of the Class- $F^{-1}$ Oscillator

Figure 3 shows an NMOS (N-channel metal-oxide semiconductor)-only class-F<sup>-1</sup> oscillator with a 1:*n* transformer (n > 1), where n is the transformer turn ratio defined by  $n = k_{\rm m} \sqrt{L_{\rm S}/L_{\rm P}}$ . Because the transformer-based tank provides two resonant peaks, it is possible to map them to  $f_0$  and  $2f_0$ , respectively. Furthermore, as Fig. 4 illustrates, when the drain current  $I_{\rm D}$  containing the first three harmonics is multiplied by the tank input impedance  $Z_{\rm IN} = V_{\rm D}/I_{\rm D}$ , it generates large first ( $V_{\rm D1}$ ) and second ( $V_{\rm D2}$ ) harmonic voltages, justifying its class-F<sup>-1</sup> operation [11] (i.e., square-like  $I_{\rm D}$  and half-sinusoidal  $V_{\rm D}$ ). Using the T-model of the transformer (Fig. 5), we can derive the low ( $f_{\rm L}$ ) and high ( $f_{\rm H}$ ) resonant frequencies of the transformer tank as

$$f_{L(H)}^{2} = \frac{1 + \xi \mp \sqrt{(1 + \xi)^{2} - 4\xi \left(1 - k_{m}^{2}\right)}}{2L_{S}C_{S}\left(1 - k_{m}^{2}\right)} \tag{2}$$

where  $\xi$  is the ratio  $\xi = L_S C_S / L_P C_P$ . Figure 6a plots the ratio  $f_H / f_L$  against  $\xi$  for different  $k_m$ . To map  $f_L$  to  $f_0$  and  $f_H$  to  $2f_0$  (i.e.,  $f_H / f_L = 2$ ), we arrive at





Fig. 4 Current and voltage waveforms under the class- $F^{-1}$  operation



Fig. 5 Equivalent circuit of the transformer-based tank

$$16\xi^2 + (100k_{\rm m}^2 - 68)\xi + 16 = 0 \tag{3}$$

where  $k_{\rm m} \leq 0.6$  is the necessary condition for satisfying class-F<sup>-1</sup> operation. If we only consider the situation  $\xi > 1$  that maximizes the high-band impedance  $R_{\rm P2}$ , the variation of  $k_{\rm m}$  against  $\xi$  suggests that a high  $\xi$  requires a low  $k_{\rm m}$  to ensure  $f_{\rm H}/f_{\rm L} = 2$  (Fig. 6b).
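Equations (2) and (3) can be checked numerically. The sketch below solves Eq. (3) for $k_{\rm m}$ at a given $\xi$ and then confirms through Eq. (2) that the two resonances are an octave apart. It assumes that the left-hand side of Eq. (2) is an angular frequency squared (consistent with $f_{\rm S} = 1/(2\pi\sqrt{L_{\rm S}C_{\rm S}})$ in the caption of Fig. 7); the component values are hypothetical.

```python
import math


def km_for_octave(xi):
    """Solve Eq. (3) for k_m: 16*xi^2 + (100*k_m^2 - 68)*xi + 16 = 0."""
    km_sq = (68.0 - 16.0 * (xi + 1.0 / xi)) / 100.0
    if km_sq < 0.0:
        raise ValueError("no real k_m gives f_H/f_L = 2 for this xi")
    return math.sqrt(km_sq)


def resonances(l_s, c_s, xi, km):
    """Eq. (2): return (f_L, f_H) in Hz, treating the result as omega^2."""
    disc = math.sqrt((1.0 + xi) ** 2 - 4.0 * xi * (1.0 - km ** 2))
    denom = 2.0 * l_s * c_s * (1.0 - km ** 2)
    w_low = math.sqrt((1.0 + xi - disc) / denom)
    w_high = math.sqrt((1.0 + xi + disc) / denom)
    return w_low / (2.0 * math.pi), w_high / (2.0 * math.pi)


for xi in (1.0, 2.0, 3.0):
    km = km_for_octave(xi)
    # Hypothetical secondary tank: L_S = 4 nH, C_S = 500 fF.
    f_l, f_h = resonances(4e-9, 500e-15, xi, km)
    print(f"xi = {xi:.0f}: k_m = {km:.3f}, f_H/f_L = {f_h / f_l:.3f}")
```

The largest admissible coupling, $k_{\rm m} = 0.6$, occurs at $\xi = 1$, and $\xi = 3$ gives $k_{\rm m} \approx 0.38$.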

With phase noise closely related to the tank impedance, we derive the input impedance  $Z_{IN}$  at  $f_0$  and  $2f_0$  as



**Fig. 6** (a)  $f_{\rm H}/f_{\rm L}$  and (b)  $k_{\rm m}$  as a function of  $\xi$ 



**Fig. 7** (a)  $R_{P1}$  and  $R_{P2}$  and (b) transformer voltage gain at first harmonic and second harmonic as a function of  $\xi (Q_P = Q_S = 13.5 \text{ and } f_S = 1/(2\pi\sqrt{L_SC_S}) = 3.2 \text{ GHz})$ 

$$R_{\rm P1} = \frac{8}{25} \cdot \frac{Q_{\rm P}L_{\rm P}}{\sqrt{L_{\rm S}C_{\rm S}}} \cdot \frac{\sqrt{5\xi(1+\xi)^3 \cdot (4-\xi)}}{4\xi^2 \left(\frac{Q_{\rm P}}{Q_{\rm S}}\right) - \xi \left(1+\frac{Q_{\rm P}}{Q_{\rm S}}\right) + 4} \tag{4}$$

$$R_{\rm P2} = \frac{1}{25} \cdot \frac{Q_{\rm P}L_{\rm P}}{\sqrt{L_{\rm S}C_{\rm S}}} \cdot \frac{\sqrt{5\xi(1+\xi)^3 \cdot (1-4\xi)}}{\xi^2 \left(\frac{Q_{\rm P}}{Q_{\rm S}}\right) - 4\xi \left(1+\frac{Q_{\rm P}}{Q_{\rm S}}\right) + 1} \tag{5}$$

where  $Q_{\rm P}$  and  $Q_{\rm S}$  are the intrinsic *Q*-factors of the primary and secondary coils, respectively. Figure 7a plots  $R_{\rm P1}$  and  $R_{\rm P2}$  for different  $L_{\rm P}$ , revealing that a high  $\xi$  reduces  $R_{\rm P1}$  and raises  $R_{\rm P2}$ , favoring the phase-noise performance.

If we define the drain-to-gate voltage gain of the transformer as  $A_V = V_G/V_D$ , by using Eqs. (2) and (3), the  $A_V$  of the first harmonic  $(A_{V1})$  and the second harmonic  $(A_{V2})$  voltages are functions of  $\xi$  and *n* expressed, respectively, as

$$A_{\rm V1}(\xi) = \frac{n\sqrt{-4\xi^2 + 17\xi - 4}}{\sqrt{\xi} \cdot (4 - \xi)} \tag{6}$$

$$A_{\rm V2}(\xi) = \frac{n\sqrt{-4\xi^2 + 17\xi - 4}}{\sqrt{\xi} \cdot (4\xi - 1)} \tag{7}$$

Figure 7b plots Eqs. (6) and (7) in dB against  $\xi$ , suggesting that  $A_{V1}$  moves up with  $\xi$ .  $A_{V2}$  is smaller than 1 for  $\xi > 1.4$  and drops quickly as  $\xi$  increases. With the first harmonic drain voltage  $V_{D1}$  amplified at the gate while the limited transformer's resonant bandwidth attenuates the second harmonic voltage  $V_{D2}$ , the gate voltage would predominantly contain the first harmonic component only. Then, we only need to consider the first harmonic voltage for reliability. Yet, negative  $g_m$  transistors are vulnerable to breakdown if  $A_{V1}$  is excessively large. In another design trade-off, we can fix  $f_0$  with large  $\xi$  and n, requiring a tiny  $C_P$  which is hard to implement in practice.
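To get a feel for Eqs. (6) and (7), the following sketch evaluates both gains over a few values of $\xi$. It assumes a normalized turn ratio n = 1 purely for illustration, so the absolute numbers will differ from Fig. 7b; both expressions are real only for $0.25 < \xi < 4$, where the radicand is positive.

```python
import math


def _radicand(xi):
    """Common numerator term of Eqs. (6) and (7)."""
    value = -4.0 * xi ** 2 + 17.0 * xi - 4.0
    if value <= 0.0 or xi >= 4.0:
        raise ValueError("xi must lie strictly between 0.25 and 4")
    return math.sqrt(value)


def gain_h1(xi, n=1.0):
    """Eq. (6): drain-to-gate voltage gain at the first harmonic."""
    return n * _radicand(xi) / (math.sqrt(xi) * (4.0 - xi))


def gain_h2(xi, n=1.0):
    """Eq. (7): drain-to-gate voltage gain at the second harmonic."""
    return n * _radicand(xi) / (math.sqrt(xi) * (4.0 * xi - 1.0))


for xi in (1.0, 2.0, 3.0):
    a1, a2 = gain_h1(xi), gain_h2(xi)
    print(f"xi = {xi:.0f}: A_V1 = {20 * math.log10(a1):+.1f} dB, "
          f"A_V2 = {20 * math.log10(a2):+.1f} dB")
```

With n fixed, the first-harmonic gain grows with $\xi$ while the second-harmonic gain falls, which is the trend the text relies on when it treats the gate voltage as nearly sinusoidal.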

Although  $|Z_{in}|$  has two peaks, that is,  $R_{P1}$  and  $R_{P2}$ , at  $f_0$  and  $2f_0$ , respectively, with  $R_{P2} > R_{P1}$ , the class- $F^{-1}$  oscillator can only oscillate at  $f_0$ . The transformer is configured as a two-port resonator exhibiting inverted magnetic coupling from drain to gate nodes, so the loop gain fulfills the phase condition for positive feedback only at  $f_0$  [12, 13]. In summary, a small  $\xi$  favors low-power applications, but if minimum phase noise is necessary, we should maximize  $\xi$ , provided that the negative  $g_m$  transistors still operate within the reliability limit.

To generate the differential outputs, Fig. 8 exemplifies how we can stack the PMOS (P-channel metal-oxide semiconductor)- and NMOS-based class- $F^{-1}$  oscillators and merge their respective transformer-based tanks together. We short the



Fig. 8 Class-F<sup>-1</sup> oscillator with a PMOS-NMOS-complementary topology to generate differential outputs

Fig. 9 Simulated voltage waveforms

center taps of the two coils to provide self-biasing at  $V_{DD}/2$ , with M<sub>P</sub> and M<sub>N</sub> sized such that  $g_{m,p} = g_{m,n}$ . This self-biased scheme eliminates the extra bias circuit and its noise contribution. The inherent single-ended topology avoids the in-phase relationship between the second harmonic components  $(V_{\text{DP},\text{H2}}, V_{\text{DN},\text{H2}})$  in  $V_{\text{DP}}$  and  $V_{\text{DN}}$ . Instead,  $V_{\text{DP},\text{H2}}$  and  $V_{\text{DN},\text{H2}}$  are also differential (Fig. 8). Consequently, we avoid the Q-factor degradation of the inductor or transformer. Since both first and second harmonic components are differential, we can tune the first and second harmonic frequencies with either CM or DM capacitors. Then, we can simply implement  $C_{\rm S}$ and  $C_{\rm P}$  with differential switched capacitors, which improves the Q-factor of the tank and simplifies the frequency tuning scheme. Improving the capacitor *Q*-factor is especially helpful at millimeter-wave (mm-wave) frequencies in which the capacitor Q-factor dominates the tank Q-factor.

#### 2.3 A 3.5–4.5 GHz Low-Phase-Noise Class- $F^{-1}$ VCO

The main target of the class- $F^{-1}$  VCO prototype is to achieve low phase noise by adopting a large  $\xi$ . Thus, we choose  $\xi = 3$ , which corresponds to  $k_{\rm m} = 0.38$ . We utilized a 2-to-4-turn tapped transformer with  $L_{\rm P} = 2.28$  nH,  $L_{\rm S} = 4.28$  nH, and  $k_{\rm m} = 0.38$ . EM simulation shows that the intrinsic Q-factors of the primary ( $L_{\rm P}$ ) and secondary ( $L_{\rm S}$ ) coils at 4 GHz are  $Q_{\rm P} = 19$  and  $Q_{\rm S} = 17$ , respectively. We can control the resonant frequencies of the transformer tank in the class- $F^{-1}$  oscillator, that is,  $f_{\rm L}$  and  $f_{\rm H}$ , almost independently by tuning  $C_{\rm S}$  and  $C_{\rm P}$ , respectively. Then, we can obtain the oscillation frequency  $f_{\rm L} = f_0$  by tuning  $C_{\rm S}$  while adjusting  $C_{\rm P}$  to guarantee  $f_{\rm H} = 2f_0$ . To set  $C_{\rm P}$  and  $C_{\rm S}$ , we utilize 6-bit and 5-bit binary-sized switched capacitors with an LSB (least significant bit) of 11 fF and 20 fF, respectively. For continuous frequency tuning, we include varactors in both  $C_{\rm P}$  and  $C_{\rm S}$ . The sizes of the NMOS and PMOS transistors are 37.5 µm/60 nm and 75 µm/60 nm, respectively. Figure 9 plots the simulated voltage waveforms. Although the drain voltages  $V_{\rm DN}$  and  $V_{\rm DP}$  have a significant amount of the second harmonic component (i.e.,  $V_{\rm D2}/V_{\rm D1} = 0.4$ ), the gate voltage predominantly contains only the first harmonic component, as expected. Therefore, we should use  $V_{\rm GP}$  and  $V_{\rm GN}$  as the oscillator's outputs. The *Q*-factors of the switched capacitors mainly depend on the tuning range. By choosing the *Q*-factors of  $C_{\rm P}$  and  $C_{\rm S}$  as 47 and 65, and absorbing the loss of the capacitors into the transformer coils for simplicity, we can reasonably assume  $Q_{\rm P} = Q_{\rm S} = 13.5$ , which results in  $Q_1 = 14$  at  $f_0$  and  $Q_2 = 9.3$  at  $2f_0$ , respectively.
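The assumption $Q_{\rm P} = Q_{\rm S} = 13.5$ can be reproduced with a quick calculation. The sketch below assumes the usual first-order rule that the losses of an inductor and its resonating capacitor add as $1/Q = 1/Q_{\rm L} + 1/Q_{\rm C}$; the chapter does not state the exact method used, so this is only a plausibility check.

```python
def combined_q(q_inductor, q_capacitor):
    """Effective Q when the capacitor loss is absorbed into the coil (1/Q values add)."""
    return 1.0 / (1.0 / q_inductor + 1.0 / q_capacitor)


# Coil Q-factors at 4 GHz and switched-capacitor Q-factors quoted in the text.
q_primary = combined_q(19.0, 47.0)
q_secondary = combined_q(17.0, 65.0)
print(f"Effective Q_P = {q_primary:.1f}, effective Q_S = {q_secondary:.1f}")
```

Both windings land very close to 13.5 under this rule, which supports using a single value for the two coils.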

According to the time-variant model [7], the noise factor F in the phase noise expression of Eq. (1) is

$$F = \sum_{i} \frac{R_{\rm P}}{4kT} \cdot \frac{1}{2\pi} \int_0^{2\pi} \Gamma_{\rm i}^2(\theta) \cdot i_{\rm n,i}^2(\theta) \,\mathrm{d}\theta \tag{8}$$

where  $i_{n,i}^2(\theta)$  is the white current noise power spectral density (PSD) of the *i*th device and  $\Gamma_i(\theta)$  is the corresponding impulse sensitivity function (ISF). As illustrated in Fig. 10,  $R_P$  models the transformer tank loss, while  $G_{ds,n(p)}$  and  $g_{m,n(p)}$  represent the channel conductance and transconductance of the NMOS and PMOS transistors, respectively. Further, even if the channel is the only physical noise source of the transistor, due to the large signal swing at the gate and drain nodes, the transistors will transit between the saturation and triode regions when turned on. For this reason, we artificially split the transistor current noise PSD into  $i_{n,GM}^2 = 4kT\gamma g_m(\theta)$  and  $i_{n,GDS}^2 = 4kTG_{ds}(\theta)$ , representing the noise PSD due to  $g_m$  and  $G_{ds}$ , respectively. As such, we can evaluate *F*, consisting of the effective noise factor of the tank ( $F_{TANK}$ ),  $g_m$  ( $F_{GM}$ ), and  $G_{ds}$  ( $F_{GDS}$ ), respectively, by substituting  $i_{n,TANK}^2 = 4kT/R_P$ ,

Fig. 10 Noise sources of the class- $F^{-1}$  oscillator



| Topology | $F_{\rm TANK}$ | $F_{\rm GM}$ | $F_{\rm GDS}$ | $\Sigma F$ | $\eta$ | FoM<sup>a</sup> (Cal.) | FoM (Sim.) |
|---|---|---|---|---|---|---|---|
| Class-B [16]<sup>b,c</sup> | 1 | $\gamma_n$ | 0 | 2.29 | 0.64 | 194.2 | N/A |
| Dynamic-biased class-C [5]<sup>c</sup> | 1 | $\gamma_n$ | 0 | 2.29 | 0.77 | 195 | N/A |
| Class-F [6]<sup>c</sup> | 0.7 | $0.7\gamma_n$ | 0.27 | 1.87 | 0.5 | 194 | N/A |
| Implicit CM [11]<sup>d</sup> ( $Q_2 = 6.3$ ) | 1.22 | $1.4\gamma_n$ | 0.4 | 3.42 | 0.74 | 193.1 | 192.8 |
| Implicit CM [11]<sup>d</sup> ( $Q_2 = 9.3$ ) | 1.17 | $\gamma_n$ | 0.22 | 2.68 | 0.82 | 194.6 | 194.3 |
| Class- $F^{-1}$ <sup>d</sup> ( $Q_2 = 9.3$ ) | 1.2 | $0.1\gamma_n + 0.14\gamma_p$ | 0.08 | 1.6 | 0.88 | 197.2 | 196.7 |

Table 1 Performance comparison of different oscillator topologies

<sup>a</sup>Assume  $Q_1 = 14$ ,  $\gamma_n = 1.29$ , and  $\gamma_p = 1.35$  in 65-nm CMOS, and use Eq. (9)

<sup>b</sup>Assume the current source is ideal and does not contribute with noise

<sup>c</sup>Data extracted from its corresponding work

<sup>d</sup>Data obtained from simulation,  $f_0 = 4$  GHz

 $i_{n,MOS(G_M)}^2$ , and  $i_{n,MOS(G_{DS})}^2$  into Eq. (8). Here, we can obtain  $g_m(\theta)$  and  $G_{ds}(\theta)$  of the NMOS and PMOS transistors as well as the ISFs through the transient simulation [10]. Therefore, as summarized in Table 1, we can compare the phase noise upconversion mechanism between the class-F<sup>-1</sup> and other oscillator topologies. Typically we utilize a figure of merit (FoM) that normalizes the phase noise to power consumption, offset, and carrier frequencies to compare oscillator performances. By defining the power efficiency  $\eta = P_{RF}/P_{DC}$  and  $P_{RF} = V_P^2/2R_P$ , we can acquire the FoM expression based on Eq. (1):

$$FoM = 10 \log_{10} \left( \frac{Q_1^2 \eta}{F} \cdot \frac{2}{10^3 kT} \right) \tag{9}$$
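As a cross-check of Eq. (9), the sketch below recomputes the calculated FoM column of Table 1 from the tabulated noise factors and efficiencies. It assumes T = 300 K, which the chapter does not state explicitly; $Q_1$, $\gamma_n$, and $\gamma_p$ follow footnote a of the table.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K
TEMPERATURE = 300.0         # K, assumed room temperature
Q1 = 14.0
GAMMA_N, GAMMA_P = 1.29, 1.35


def fom_dbc_hz(q1, eta, noise_factor, temp=TEMPERATURE):
    """Eq. (9): FoM from tank Q, power efficiency, and total noise factor."""
    return 10.0 * math.log10(q1 ** 2 * eta / noise_factor
                             * 2.0 / (1e3 * K_BOLTZMANN * temp))


# (F_TANK, F_GM, F_GDS, eta) per topology, taken from Table 1.
topologies = {
    "Class-B": (1.0, GAMMA_N, 0.0, 0.64),
    "Dynamic-biased class-C": (1.0, GAMMA_N, 0.0, 0.77),
    "Class-F": (0.7, 0.7 * GAMMA_N, 0.27, 0.5),
    "Implicit CM (Q2 = 6.3)": (1.22, 1.4 * GAMMA_N, 0.4, 0.74),
    "Implicit CM (Q2 = 9.3)": (1.17, GAMMA_N, 0.22, 0.82),
    "Class-F^-1 (Q2 = 9.3)": (1.2, 0.1 * GAMMA_N + 0.14 * GAMMA_P, 0.08, 0.88),
}

results = {}
for name, (f_tank, f_gm, f_gds, eta) in topologies.items():
    total_f = f_tank + f_gm + f_gds
    results[name] = (total_f, fom_dbc_hz(Q1, eta, total_f))
    print(f"{name:24s} sum F = {total_f:.2f}  FoM = {results[name][1]:.1f} dBc/Hz")
```

Under the 300 K assumption, the recomputed values agree with the calculated column to within about 0.1 dB, which makes the relative weight of F and $\eta$ in each topology easy to inspect.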

We maintain the same values of  $Q_1$  (= 14),  $R_{P1}$  (= 250  $\Omega$ ), and  $V_{DD}$  (= 0.6 V) for both the class-F<sup>-1</sup> and the implicit CM resonance topologies. Under these constraints, F and  $\eta$  are the only parameters determining the FoM. For completeness, we also evaluate the FoM of the implicit CM resonance having a high  $Q_2$  with an impractical value of k = 0.18. Class F<sup>-1</sup> has a higher  $F_{TANK}$  than class B, class C, and class F due to the large impedance at  $2f_0$ , which shapes the drain voltage and the ISF waveform into a non-sinusoidal one. Nonetheless, we suppress the device noise in class-F<sup>-1</sup> to a minimum. Specifically, class F and implicit CM resonance improve the  $F_{GDS}$  by reshaping the drain voltage with the creation of a tank impedance at  $3f_0$  for the former and  $2f_0$  for the latter. By further improving  $Q_2$  and the impedance at  $2f_0$ , the  $F_{GDS}$  of the class-F<sup>-1</sup> comes close to that of the class-B oscillator using an ideal current source and of the class-C oscillator, in which a small oscillation amplitude prevents the negative  $g_m$  transistors from entering the deep triode region. Resulting from a small  $G_{DS}$  and a large voltage gain in the transformer, class F<sup>-1</sup> also achieves a much lower  $F_{GM}$  than the other four topologies. On the other hand, we can also implement the voltage gain in class-C topologies through a transformer coupling from source to gate terminals [2]. Assuming a voltage gain of 2.5 in the dynamic-biased class-C topology of Table 1, its  $F_{\rm GM}$  reduces to  $\gamma_{\rm n}/2.5$ , which further improves the FoM to 196.7 dBc/Hz, provided the negative  $g_{\rm m}$  transistors are still in the saturation region. However, such a voltage gain would reduce the maximum drain voltage swing that keeps the transistors in the saturation region.

For the class- $F^{-1}$  oscillator, since the phase of the tank impedance at  $2f_0$  is zero, the second harmonic current flows through a purely resistive path, which minimizes the flicker noise upconversion caused by Groszkowski's effect [7]. Again, using the time-variant model, the  $1/f^3$  noise corner  $(\omega_{1/f^3})$  of an oscillator relates to the transistor's flicker noise corner  $(\omega_{1/f})$  through

$$\omega_{1/f^3} \approx \frac{1}{2} \cdot \omega_{1/f} \cdot \left(\frac{\Gamma_{\text{EFF,DC}}}{\Gamma_{\text{EFF,H1}}}\right)^2 \tag{10}$$

where  $\Gamma_{\rm EFF}(\omega_0 t) = \Gamma_{\rm MOS}(\omega_0 t) \cdot \alpha(\omega_0 t)$  and  $\alpha(\omega_0 t) = g_m(\omega_0 t)/g_{m, \max}$  is the noise modulating function that reflects the cyclostationary process of the flicker noise source.  $\Gamma_{\rm EFF, DC}$  and  $\Gamma_{\rm EFF, H1}$  are the dc and first harmonics of the  $\Gamma_{\rm EFF}(\omega_0 t)$ , respectively. The simulated  $\Gamma_{\rm EFF}$  of the class-F<sup>-1</sup> oscillator (Fig. 11) shows that its waveform is symmetric with  $\Gamma_{\rm EFF, DC} = 0.0011$  and  $\Gamma_{\rm EFF, H1} = 0.035$ , resulting in a small value of  $(\Gamma_{\rm EFF, DC}/\Gamma_{\rm EFF, H1})^2 = 9.88 \times 10^{-4}$ . Therefore,  $\omega_{1/f^3}$  is lower than  $\omega_{1/f}$  by more than three orders of magnitude.
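The arithmetic behind this conclusion is short enough to verify directly. The sketch below plugs the simulated ISF coefficients quoted above into Eq. (10); the function name is illustrative only.

```python
def corner_ratio(gamma_dc, gamma_h1):
    """Eq. (10): ratio of the 1/f^3 phase-noise corner to the device 1/f corner."""
    return 0.5 * (gamma_dc / gamma_h1) ** 2


isf_term = (0.0011 / 0.035) ** 2
ratio = corner_ratio(0.0011, 0.035)
print(f"(Gamma_DC / Gamma_H1)^2 = {isf_term:.3e}")
print(f"corner ratio = {ratio:.3e}, i.e. about {1.0 / ratio:.0f} times lower")
```

The squared ISF ratio evaluates to about $9.88 \times 10^{-4}$, and the extra factor of 1/2 pushes the $1/f^3$ corner roughly 2000 times below the device flicker corner.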

Frequency pushing due to the supply variation is another critical metric for the oscillator. The oscillator with a small frequency pushing can relax the power supply rejection requirement of the voltage regulator. Furthermore, small frequency pushing helps to reduce the noise contribution from the voltage regulator itself. Groszkowski's effect can also induce the frequency pushing when the supply



voltage varies, considering the relationship between the frequency shift and harmonic current given by [14]

$$\left|\frac{\Delta\omega}{\omega_{0}}\right| = \frac{1}{Q^{2}} \sum_{n=2}^{\infty} \frac{n^{2}}{n^{2} - 1} \cdot \left|\frac{I_{\rm Hn}}{I_{\rm H1}}\right|^{2} \tag{11}$$

where  $I_{\text{Hn}}$  and  $I_{\text{H1}}$  are the *n*th harmonic and the fundamental components of the tank current, respectively. Usually, the second harmonic current is dominant over the other high-order harmonic currents. In the conventional LC oscillator without the second harmonic resonance, the second harmonic current flows through the tank capacitor. A supply voltage variation changes the  $I_{\text{H2}}/I_{\text{H1}}$  ratio, which in turn shifts the oscillation frequency to restore the balance between the capacitive and inductive energies of the tank.
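Equation (11) is easy to evaluate for a given harmonic content. The sketch below does so for hypothetical harmonic current ratios (the 10% and 12% second-harmonic levels are assumptions for illustration) to show how a supply-induced change in $I_{\rm H2}/I_{\rm H1}$ translates into a frequency shift in a conventional tank.

```python
def groszkowski_shift(q_tank, harmonic_ratios):
    """Eq. (11): |delta_omega / omega_0| for harmonic ratios {n: |I_Hn / I_H1|}."""
    total = 0.0
    for n, ratio in harmonic_ratios.items():
        if n < 2:
            raise ValueError("harmonic index must be 2 or higher")
        total += n ** 2 / (n ** 2 - 1) * ratio ** 2
    return total / q_tank ** 2


F0_HZ = 4e9  # carrier used only to express the shift in Hz
shift_low = groszkowski_shift(14.0, {2: 0.10})
shift_high = groszkowski_shift(14.0, {2: 0.12})
print(f"I_H2/I_H1 = 0.10: relative shift {shift_low:.2e} ({shift_low * F0_HZ / 1e3:.0f} kHz)")
print(f"I_H2/I_H1 = 0.12: relative shift {shift_high:.2e} ({shift_high * F0_HZ / 1e3:.0f} kHz)")
```

Because the shift scales with the square of the harmonic ratio, even a modest change in the second harmonic content with supply voltage moves the carrier noticeably unless that current sees a resistive path.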

The class-F<sup>-1</sup> topology significantly suppresses the frequency pushing due to Groszkowski's effect since  $I_{\rm H2}$  flows through a purely resistive path when  $f_0$  and  $2f_0$  are perfectly aligned with the tank resonances, that is,  $f_{\rm H} = 2f_0$  (Fig. 12a). The residual small frequency pushing results from the high-order harmonics and the change of the parasitic capacitances ( $C_{\rm GS}$  and  $C_{\rm GD}$ ) of the negative  $g_m$  transistors. It is interesting to observe that the frequency pushing can be either positive or negative depending on the relationship between  $f_{\rm H}$  and  $2f_0$  (Fig. 12a). When  $f_{\rm H} < 2f_0$ , the phase of the tank impedance  $Z_{\rm in}$  is negative at  $2f_0$  (Fig. 12b) and  $I_{\rm H2}$  flows through a capacitive path. A higher  $V_{\rm DD}$  raises  $I_{\rm H2}/I_{\rm H1}$ , requiring a lower oscillation frequency to rebalance the capacitive and inductive energy of the tank. On the other hand, the phase of the tank impedance  $Z_{\rm in}$  becomes positive (Fig. 12b) at  $2f_0$  when  $f_{\rm H} > 2f_0$  and  $I_{\rm H2}$  flows through an inductive path instead. Subsequently, the frequency shift caused by  $I_{\rm H2}$  becomes positive as  $V_{\rm DD}$  rises. When  $f_{\rm H}$  is



**Fig. 12** (a) Simulated frequency pushing defined as  $(f_0@V_{DDH} - f_0@V_{DDL})/(V_{DDH} - V_{DDL})$  against  $C_P$  by varying  $V_{DD}$  from 0.55 V ( $V_{DDL}$ ) to 0.65 V ( $V_{DDH}$ ) and (b) tank impedance  $Z_{IN}$  when  $f_H > 2f_0$  (upper) and  $f_H < 2f_0$  (lower)

slightly higher than  $2f_0$ , it is possible to achieve zero frequency pushing by letting the positive frequency shift (caused by  $I_{H2}$ ) and the negative frequency shift (caused by the high-order harmonic currents and parasitic capacitance) cancel each other (Fig. 12a).

The imbalance between the differential outputs of the class- $F^{-1}$  oscillator, induced by the asymmetrical parasitic capacitances of the NMOS and PMOS transistors, remains small when operating at 4 GHz. The simulation results of Fig. 13 indicate that the amplitude ratio and phase error of the drain voltages are 1.11 and  $-1.95^{\circ}$ , respectively. For the gate voltages, the voltage gain provided by the transformer suppresses them further to 1.006 and  $0.12^{\circ}$ .

Figure 14 presents the class- $F^{-1}$  oscillator prototyped in 65 nm CMOS occupying a core area of 0.14 mm<sup>2</sup>. The class- $F^{-1}$  oscillator is tunable from 3.49 to 4.51 GHz, and we evaluate its phase noise performance using a Keysight E5052B Signal



Fig. 13 (a) Amplitude ratio and (b) phase imbalance between the differential outputs at the drain and gate nodes

Fig. 14 Chip micrograph





Fig. 15 (a) Measured phase noise and (b) FoM versus offset frequency at  $V_{DD} = 0.6$  V



Fig. 16 Measured (a) phase noise and (b) FoM versus tuning frequency

Source Analyzer. Figure 15a shows the phase noise profiles from  $f_{min}$  to  $f_{max}$ , where the power consumption ranges between 1.14 and 1.2 mW at  $V_{DD} = 0.6$  V. As the corresponding FoM plot in Fig. 15b shows, the maximum FoM ranges between 195.6 and 196.2 dBc/Hz in the  $1/f^2$  region. Figure 16a, b illustrates the measured phase noise and the FoM, respectively, across the tuning frequency at 0.1/1/10 MHz offsets. The circuit consistently maintains the FoM at 10 MHz offset, which agrees well with the simulation results described earlier, whereas at 100 kHz offset, the FoM varies between 190.9 and 192.5 dBc/Hz, translating to a  $1/f^3$  corner ranging from 100 to 300 kHz (Fig. 17a). The measured frequency pushing is +4.5 MHz/V and -15 MHz/V



Fig. 17 Measured (a) 1/f<sup>3</sup> phase noise corner versus frequency and (b) frequency pushing

at  $f_{\min}$  and  $f_{\max}$ , respectively (Fig. 17b). The opposite trend of the frequency pushing results from the different control voltages of the varactor at  $f_{\min}$  and  $f_{\max}$ .
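One way to see how the FoM drop at 100 kHz offset maps to a $1/f^3$ corner is a simple asymptotic model in which the phase noise follows $(1/\Delta f^2)(1 + f_{\rm c}/\Delta f)$, so that the FoM falls below its $1/f^2$-region value by $10\log_{10}(1 + f_{\rm c}/\Delta f)$. This model and the pairing of the measured FoM values below are assumptions made for illustration; the corner values reported above come from the measured phase noise plots, not from this estimate.

```python
def corner_from_fom_drop(fom_flat_db, fom_close_in_db, offset_hz):
    """Estimate the 1/f^3 corner from the FoM drop at a close-in offset.

    Assumes phase noise proportional to (1/df^2) * (1 + f_c/df).
    """
    drop_db = fom_flat_db - fom_close_in_db
    if drop_db < 0.0:
        raise ValueError("close-in FoM exceeds the 1/f^2-region FoM")
    return offset_hz * (10.0 ** (drop_db / 10.0) - 1.0)


# Assumed pairings of the measured FoM range limits.
best_case = corner_from_fom_drop(195.6, 192.5, 100e3)
worst_case = corner_from_fom_drop(196.2, 190.9, 100e3)
print(f"estimated corner: {best_case / 1e3:.0f} kHz to {worst_case / 1e3:.0f} kHz")
```

Under these assumptions the estimate falls inside the reported 100 to 300 kHz range, which is consistent with the measured corner in Fig. 17a.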

# 3 Wideband Mode-Switching MM-Wave VCO

#### 3.1 Capacitive and Resonant Mode-Switching Techniques

Designing a wideband low-phase-noise oscillator is a challenging task. As frequency increases, the *Q*-factor of the capacitive devices in the resonant tank, for example, the varactor or the switched capacitor, eventually becomes lower than the *Q*-factor of the inductive devices, causing a severe trade-off between phase noise and frequency tuning range. To maintain low phase noise, the tuning range of reported 60 GHz oscillators is typically limited to ~10% [15].

Although initially developed for sub-10 GHz oscillators, the mode-switching techniques [16, 17] promise to break the trade-off between phase noise and tuning range in mm-wave oscillators. Basically, by changing the coupling polarity of two identical oscillators, the capacitive and resonant mode-switching techniques can achieve two modes with different effective tank capacitances or inductances for coarse frequency tuning (Fig. 18). Since the voltages on the two sides of the on-switches in both even and odd modes are in phase, their turn-on resistances have a negligible effect on the phase noise. Thus, we can keep the switches small as long as they can adequately synchronize the two oscillators despite the frequency difference induced by process, voltage, and temperature (PVT) variations. Within each mode, the varactor or switched capacitor is still necessary to obtain continuous frequency tuning. Ideally, we can double the tuning range without impairing the phase noise. However, both the capacitive and resonant mode-switching techniques have their limitations. For the first technique,  $C_8$  increases



Fig. 18 Concepts of (a) capacitive mode switching and (b) resonant mode switching



Fig. 19 Concepts of the inductive mode-switching technique

the parasitic capacitance in the odd mode, shrinking the tuning range. The tuning range degradation worsens at higher frequencies since the parasitic capacitance occupies a larger portion of the tank capacitance. As for the latter, the energy stored in the inductor decreases in the odd mode, but the trace length and hence the resistive loss of the inductor remain unchanged, degrading the inductor Q-factor.

### 3.2 Inductive Mode-Switching Technique

To overcome the limitation of the capacitive and resonant mode-switching techniques, we present in Fig. 19 an inductive mode-switching technique [18] that utilizes the common-mode inductor  $L_{\rm CM}$  to vary the effective tank inductance in the even and odd modes. In the odd mode, since there is no current at the fundamental frequency flowing through the  $L_{\rm CM}$ , it excludes the resistive loss of  $L_{\rm CM}$  from



Fig. 20 Inductive mode-switching dual-core oscillator: (a) implementation using two separate inductors and (b) implementation using a single center-tap inductor

the tank inductance, preventing the inductor Q degradation. We can simply realize the  $L_{\rm CM}$  by the metal trace connecting the center tap of a differential inductor and the supply node.

Figure 20a shows one straightforward way to realize the inductive mode switching, which uses two separate inductors. Here, we employed two identical single-turn inductors to secure a small inductance value at mm-wave frequencies. A pair of large decoupling capacitors  $C_{dcp}$  connected between  $V_{DD}$  and ground provides a short path for the oscillation signal and its harmonics. In the odd mode, when switches S<sub>3</sub> and S<sub>4</sub> are on while S<sub>1</sub> and S<sub>2</sub> are off, the voltages at P<sub>1</sub> and P<sub>2</sub> are differential. Node B<sub>1</sub> (B<sub>2</sub>) is the virtual ground, and the inductance of trace O<sub>1</sub>B<sub>1</sub> (O<sub>2</sub>B<sub>2</sub>) is the CM inductance ( $L_{CM}$ ) that does not count toward the total tank inductance. Then, the VCO will operate at a high frequency:

$$\omega_{\rm O} = \frac{1}{\sqrt{\rm LC}} \tag{12}$$

where *L* represents the inductance of the trace  $P_1B_1P_2$ . On the other hand, in the even mode, when  $S_1$  and  $S_2$  are on and  $S_3$  and  $S_4$  are off, the voltages at  $P_1$  and  $P_2$  are in phase. Consequently, the CM inductance of trace  $O_1B_1$  ( $O_2B_2$ ) contributes to the overall tank inductance, thereby reducing the oscillation frequency:

$$\omega_{\rm E} = \frac{1}{\sqrt{(L+4L_{\rm CM})C}} \tag{13}$$

If we consider the magnetic coupling between traces  $P_1B_1$  and  $P_2B_1$  and denote the coupling coefficient as  $k_m$ , we can obtain more accurate expressions of the oscillation frequencies in the odd and even modes as [18]:

$$\omega_{\rm O} = \frac{1}{\sqrt{(1-k_m)LC}} \tag{14}$$


$$\omega_{\rm E} = \frac{1}{\sqrt{[(1+k_m)L + 4L_{\rm CM}]C}} \tag{15}$$

Since  $k_{\rm m}$  is weak in a single-turn inductor as long as the inductor's dimension is not over-shrunk,  $M = k_{\rm m}L_0$  is usually quite small. According to the EM simulation, an octagonal inductor with a radius of 52 µm gives  $L_0 \approx 80$  pH and  $k_{\rm m} \approx 0.07$  at 50 GHz, resulting in  $M \approx 5.6$  pH. Then, the frequency difference mainly depends on  $L_{\rm CM}$ , which we can precisely control by the length of the metal trace O<sub>1</sub>B<sub>1</sub> (O<sub>2</sub>B<sub>2</sub>).
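As a rough numerical illustration of Eqs. (14) and (15), the sketch below evaluates the odd- and even-mode frequencies. It takes $L_0 \approx 80$ pH and $k_{\rm m} \approx 0.07$ from the text and uses $L_0$ in place of $L$; the values of `L_cm` and `C` are hypothetical placeholders chosen only to make the example run, not design values from this work.

```python
import math

def mode_frequencies(L, L_cm, C, k_m=0.0):
    """Return (f_odd, f_even) in Hz following Eqs. (14) and (15)."""
    w_odd = 1.0 / math.sqrt((1.0 - k_m) * L * C)
    w_even = 1.0 / math.sqrt(((1.0 + k_m) * L + 4.0 * L_cm) * C)
    return w_odd / (2.0 * math.pi), w_even / (2.0 * math.pi)

# Values quoted in the text (EM simulation at 50 GHz)
L0 = 80e-12          # inductance, H
k_m = 0.07           # coupling coefficient
M = k_m * L0         # mutual inductance, H

# Hypothetical values, assumed only for illustration
L_cm = 5e-12         # common-mode trace inductance, H
C = 100e-15          # tank capacitance, F

f_odd, f_even = mode_frequencies(L0, L_cm, C, k_m)
ratio = f_odd / f_even

print(f"M = {M * 1e12:.1f} pH")
print(f"f_odd = {f_odd / 1e9:.2f} GHz, f_even = {f_even / 1e9:.2f} GHz")
print(f"f_odd / f_even = {ratio:.3f}")
```

Because $C$ cancels in the ratio, the gap between the two modes depends only on $k_{\rm m}$, $L$, and $L_{\rm CM}$; with a small $k_{\rm m}$, the length of the center-tap trace is the main design knob.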

In the layout (Fig. 20a), there is no ideal ground plane that can perfectly short the two ground nodes, which are physically far apart. As a result, the parasitic inductance of the ground plane would significantly increase  $L_{CM}$  and make it poorly controlled. To overcome this issue, we presented a novel inductor layout that flips both center taps  $O_1B_1$  and  $O_2B_2$  into the center, allowing  $O_1$  and  $O_2$  to merge into a single node O (Fig. 20b). The lengths of the metal traces  $OB_1$  and  $OB_2$  then define  $L_{CM}$  well. The positions of ports  $P_{1-4}$  remain unchanged to facilitate the connection of negative  $g_m$  cells and switches.

# 3.3 A 42.9–50.6 GHz Quad-Core-Coupled VCO Using Inductive Mode-Switching Technique

To further reduce the phase noise, Fig. 21 illustrates how to apply the inductive mode-switching technique to a quad-core-coupled VCO. The switch  $(S_{i,j})$  connects the corresponding two ports  $(P_i, P_j)$ . The detailed arrangement of the 12 switches is  $S_{1,2}$   $(P_1, P_2)$ ,  $S_{2,3}$   $(P_2, P_3)$ ,  $S_{3,4}$   $(P_3, P_4)$ ,  $S_{4,5}$   $(P_4, P_5)$ ,  $S_{5,6}$   $(P_5, P_6)$ ,  $S_{6,7}$   $(P_6, P_7)$ ,  $S_{7,8}$   $(P_7, P_8)$ ,  $S_{1,8}$   $(P_1, P_8)$ ,  $S_{1,5}$   $(P_1, P_5)$ ,  $S_{2,6}$   $(P_2, P_6)$ ,  $S_{3,7}$   $(P_3, P_7)$ , and  $S_{4,8}$   $(P_4, P_8)$ .

In the odd mode (Fig. 21a),  $S_{1,8}$ ,  $S_{2,3}$ ,  $S_{4,5}$ ,  $S_{6,7}$ ,  $S_{1,5}$ ,  $S_{2,6}$ ,  $S_{3,7}$ , and  $S_{4,8}$  are on;  $S_{1,2}$ ,  $S_{3,4}$ ,  $S_{5,6}$ , and  $S_{7,8}$  are off. Then, the switches  $S_{1,8}$ ,  $S_{4,5}$ ,  $S_{1,5}$ , and  $S_{4,8}$  ( $S_{2,3}$ ,  $S_{6,7}$ ,  $S_{2,6}$ ,  $S_{3,7}$ ) connect together the ports  $P_1$ ,  $P_4$ ,  $P_5$ , and  $P_8$  ( $P_2$ ,  $P_3$ ,  $P_6$ ,  $P_7$ ). Similar to the



Fig. 21 Port excitations of the quad-core-coupled VCO in the (a) odd and (b) even modes. (c) EM simulated Q-factors of the single center-tap inductor and the conventional single-turn octagonal inductor

case for the dual-core-coupled VCO, nodes  $B_{1-4}$  become the virtual ground nodes, and the inductances of metal traces OB<sub>1</sub>, OB<sub>2</sub>, OB<sub>3</sub>, and OB<sub>4</sub> do not count toward the equivalent tank inductance  $L_{eq,odd}$ . To consider the magnetic coupling effect, we group OP<sub>1</sub>P<sub>2</sub> and OP<sub>5</sub>P<sub>6</sub> (OP<sub>3</sub>P<sub>4</sub> and OP<sub>7</sub>P<sub>8</sub>) as inductor 1 (inductor 2), and we can obtain their equivalent inductance  $L_A$  as  $L_A = (1 - k_1)L/2$ , where L is the intrinsic inductance of trace P<sub>1</sub>B<sub>1</sub>P<sub>2</sub> and  $k_1$  is the magnetic coupling coefficient between traces B<sub>1</sub>P<sub>1</sub> and B<sub>1</sub>P<sub>2</sub>. If the magnetic coupling coefficient between inductors 1 and 2 is  $k_2$ , the equivalent tank inductance becomes

$$L_{\rm eq,odd} = \frac{(1-k_1)(1+k_2)}{4}L \tag{16}$$

With the adjacent ports between inductors 1 and 2 all excited in phase, the magnetic coupling  $k_2$  enhances the equivalent tank inductance.

In the even mode (Fig. 21b),  $S_{1,2}$ ,  $S_{3,4}$ ,  $S_{5,6}$ ,  $S_{7,8}$ ,  $S_{1,5}$ ,  $S_{2,6}$ ,  $S_{3,7}$ , and  $S_{4,8}$  are on;  $S_{2,3}$ ,  $S_{4,5}$ ,  $S_{6,7}$ , and  $S_{1,8}$  are off. Then, the switches  $S_{1,2}$ ,  $S_{5,6}$ ,  $S_{2,6}$ , and  $S_{1,5}$  ( $S_{3,4}$ ,  $S_{7,8}$ ,  $S_{3,7}$ ,  $S_{4,8}$ ) connect together the ports  $P_1$ ,  $P_2$ ,  $P_5$ , and  $P_6$  ( $P_3$ ,  $P_4$ ,  $P_7$ ,  $P_8$ ). Again, we group  $OP_1P_2$  and  $OP_5P_6$  ( $OP_3P_4$  and  $OP_7P_8$ ) as inductor 1 (inductor 2), with their equivalent inductance expressed as  $L_B = (1 + k_1)L/4 + L_{CM}/2$ . Now with all the adjacent ports between inductors 1 and 2 excited differentially, the equivalent tank inductance will be

$$L_{\rm eq,even} = \frac{(1+k_1)(1-k_2)}{4}L + (1-k_2)L_{\rm CM} \tag{17}$$

Interestingly, the magnetic coupling  $k_2$  helps to nullify the effect of  $k_1$  on both  $L_{eq,odd}$  and  $L_{eq,even}$  according to Eqs. (16) and (17). The EM simulation reveals that  $k_1$  and  $k_2$  are 0.13 and 0.15, respectively, when the inductor in Fig. 21 has a radius of 55 µm, resulting in  $(1 - k_1)(1 + k_2) \approx 1$  and  $(1 + k_1)(1 - k_2) \approx 0.96$ . Thus, the difference between  $L_{eq,even}$  and  $L_{eq,odd}$  becomes  $0.85L_{CM} - 0.01L$ , which mainly depends on  $L_{CM}$  and is therefore well controlled, allowing a small frequency gap between the two modes.
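The coefficients quoted above follow directly from Eqs. (16) and (17). The short sketch below reproduces them from $k_1 = 0.13$ and $k_2 = 0.15$; the values of `L` and `L_cm` in the final consistency check are hypothetical and serve only to confirm that the closed-form difference matches the two expressions.

```python
def l_eq_odd(L, k1, k2):
    """Equivalent odd-mode tank inductance, Eq. (16)."""
    return (1 - k1) * (1 + k2) * L / 4

def l_eq_even(L, L_cm, k1, k2):
    """Equivalent even-mode tank inductance, Eq. (17)."""
    return (1 + k1) * (1 - k2) * L / 4 + (1 - k2) * L_cm

k1, k2 = 0.13, 0.15                   # EM-simulated coupling coefficients from the text

odd_factor = (1 - k1) * (1 + k2)      # close to 1
even_factor = (1 + k1) * (1 - k2)     # close to 0.96

# L_eq,even - L_eq,odd = coef_lcm * L_cm + coef_l * L
coef_lcm = 1 - k2
coef_l = (even_factor - odd_factor) / 4

# Hypothetical inductances, assumed only to check the algebra
L, L_cm = 100e-12, 10e-12
diff = l_eq_even(L, L_cm, k1, k2) - l_eq_odd(L, k1, k2)

print(f"(1-k1)(1+k2) = {odd_factor:.4f}, (1+k1)(1-k2) = {even_factor:.4f}")
print(f"difference = {coef_lcm:.2f} * L_cm + ({coef_l:.2f}) * L")
```

Since the two products differ by only about 4%, the $L$-dependent term nearly cancels and the inductance difference is set almost entirely by $L_{CM}$.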

We can compare the *Q*-factor of the inductor in Fig. 21 with that of a classic octagonal inductor having a similar inductance by using EM simulation, with the modes controlled by ideal switches. In Fig. 21c, the *Q*-factor of the inductor only degrades by 7% from 22.6 to 21 at 50 GHz. Assuming a varactor *Q*-factor of ~10, the tank *Q*-factor would negligibly reduce from 6.93 to 6.77 (2.3%). Also, the inductor *Q*-factors in both odd and even modes are almost identical when the frequency is less than 60 GHz.
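The tank *Q*-factors quoted above are consistent with the usual parallel combination of the inductor and capacitor quality factors, $1/Q_{tank} = 1/Q_L + 1/Q_C$. The sketch below assumes that relation (the text does not state it explicitly) and a varactor *Q*-factor of exactly 10.

```python
def tank_q(q_inductor, q_capacitor):
    """Combine inductor and capacitor Q, assuming 1/Q_tank = 1/Q_L + 1/Q_C."""
    return 1.0 / (1.0 / q_inductor + 1.0 / q_capacitor)

q_var = 10.0                          # assumed varactor Q (the text says ~10)
q_classic, q_center_tap = 22.6, 21.0  # EM-simulated inductor Q at 50 GHz

q_ref = tank_q(q_classic, q_var)      # tank with the classic octagonal inductor
q_new = tank_q(q_center_tap, q_var)   # tank with the center-tap inductor

inductor_drop = (q_classic - q_center_tap) / q_classic
tank_drop = (q_ref - q_new) / q_ref

print(f"tank Q: {q_ref:.2f} -> {q_new:.2f}")
print(f"inductor Q drop: {100 * inductor_drop:.1f}%, tank Q drop: {100 * tank_drop:.1f}%")
```

Because the varactor dominates the tank loss at mm-wave frequencies, a 7% drop in inductor *Q* translates into only about a 2.3% drop in tank *Q*.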

Figure 22 shows the schematic of the quad-core-coupled VCO. Four cross-coupled negative  $g_m$  cells, which compensate for the tank loss, and the tank capacitor  $C_T$  connect between the two ports that are always excited differentially in both the odd and even modes. We use the dual modes created by inductive mode switching for coarse frequency tuning, with the fine frequency tuning within each mode obtained



Fig. 22 Schematic of the quad-core-coupled VCO

by four-bit binary-switched varactors. We employ a small accumulation-mode MOS varactor for continuous frequency tuning.

If we choose  $L_0 = 85$  pH,  $L_{CM} = 17$  pH,  $k_1 = 0.13$ , and  $k_2 = 0.15$ , the calculated frequency ratio is  $f_H/f_L = 1.16$ . We use the EM-simulated scattering parameters of the inductor to construct a resonant LC tank and observe, as predicted, two resonant frequencies,  $f_H$  and  $f_L$ , whose ratio of 1.15 is quite close to the calculated result.

The switch size should be large enough to (1) avoid the bimodal oscillation, (2) guarantee frequency synchronization, and (3) prevent phase noise degradation in the presence of frequency mismatch among the four cores. As analyzed in [18], the third aspect for preventing phase noise degradation sets the most stringent requirement on the switch size. As a result, we choose the turn-on resistance of the switch to be 260  $\Omega$  in this design, which introduces around 1.5 dB PN penalty at a 20 MHz offset frequency when accounting for a 7% frequency mismatch.

The mode-switching quad-core-coupled VCO prototyped in a 65 nm LP CMOS process occupied a die area of 0.039 mm<sup>2</sup> (Fig. 23). We realize the switches for mode selection with thin-oxide PMOS transistors having a  $W/L = 7 \mu m/60$  nm to secure a turn-on resistance of 260  $\Omega$ . When turned off, the bias of the switch's gate is at 0.9 V, which can guarantee that the turn-off resistance is much higher than the







Fig. 24 Measured phase noise versus offset frequencies at carrier frequencies of (a) 43.43GHz (even mode) and (b) 46.03GHz (odd mode)

turn-on resistance for most of the time within one cycle at an output swing between 0.4 and 1.36 V. We employed an open-drain buffer to extract the oscillating signal from  $P_1$  for testing purposes, and we loaded the other seven ports with the same buffer to balance the loading impedance.

The VCO prototype consumes around 21 mA at 0.9 V. The measured frequency tuning ranges are from 42.9 to 46.8 GHz in the even mode and from 46.03 to 50.6 GHz in the odd mode. Figure 24 displays the measured phase noise curves at two typical frequencies in both modes, obtained by averaging five measurements. The measured PNs at 43.43 and 46.03 GHz are -112.5 and -113.1 dBc/Hz at 3 MHz offset frequency, respectively. The estimated  $1/f^3$  corner frequency is 680 kHz and 710 kHz at 43.43 and 46.03 GHz, respectively. Figure 25 plots the



Fig. 25 Measured (a) phase noise and (b) FoM at 3 MHz offset versus carrier frequencies

measured phase noises and FoMs at 3 MHz offset frequency across the entire frequency range. The measured phase noise lies between -111.2 and -115.6dBc/Hz, corresponding to an excellent FoM between 181.1 and 186.6 dBc/Hz.

#### 4 Multi-Resonant-RLCM-Tank VCO

VCO designs evolved from single-resonant LC-tank VCOs to the recent multiresonant RLCM (resistor-inductor-capacitor-mutual-inductance) tank VCOs that allow reshaping of the VCO phase noise (PN) impulse sensitivity function (ISF). Two recent RF VCOs [9, 19] exploited the second harmonic resonance to impede the flicker noise upconversion, achieving an FoM<sub>@1MHz</sub> of up to 195.4 dBc/Hz (Fig. 26). Specifically, the RLCM tank in [19] tailors a multi-turn inductor with a positive mutual coupling factor (k > 0) to generate an implicit common-mode (CM) resonance at 2× the oscillation frequency (F<sub>OSC</sub>). Yet, the CM resonance has a quality factor (Q) that is ~50% of its differential-mode (DM) counterpart, due to partial magnetic flux cancellation within the multi-turn inductor. In [9], the transformer-based RLCM tank offers two intrinsically high-Q resonances at F<sub>OSC</sub> and 2F<sub>OSC</sub> that can effectively suppress the PN in the 1/f<sup>2</sup>-to-1/f<sup>3</sup> PN regions. Still,



Fig. 26 Simplified schematics and key features of the VCOs in [9, 19] and this work. All apply RLCM tanks to generate the multi-resonant tank impedance

when migrating to mm-waves, both [9, 19] face their respective challenges. A large fixed DM capacitor ( $C_{DM}$ ) for fulfilling the DM-to-CM capacitance ratio of [19] largely limits the tuning range (TR) at mm-wave. For [9], the high-ratio multi-turn transformer cannot maintain its Q at mm-waves.

Here, we present an mm-wave VCO using a multi-resonant RLCM tank (Fig. 26) that enables a compact and high-FoM implementation. Specifically, a single-turn multi-tap inductor with k < 0 and CM-only tunable capacitors, free of the constraint on the DM-to-CM capacitance ratio, enable high-Q, high-impedance resonances at  $F_{OSC}$ ,  $2F_{OSC}$ , and  $3F_{OSC}$ .

We can properly combine the first, second, and third harmonic voltages to nullify the DC value of the flicker noise effective ISF ( $\Gamma_{1/f,eff,dc}$ ). In Fig. 27 (left),  $A_V$  and  $\theta_V$  denote the gain and phase shift, respectively, of the first harmonic voltage traveling from the drain ( $V_D$ ) to the gate ( $V_G$ ) of the MOS transconductor ( $G_M$ ). For a one-port RLCM tank, [ $A_V = 1$ ,  $\theta_V = 0$ ] can ideally yield  $\Gamma_{1/f,eff,dc} = 0$ . For a two-port RLCM tank [20–22], we have [ $A_V > 1$ ,  $\theta_V < 0$ ] that implies more amplitude and phase controllability. Specifically, we can obtain the normalized ISF at  $V_D$ :  $\Gamma(t) = V'_D(t)/\max(|V'_D(t)|)$  and the noise modulation function:  $m(t) = G_M(t)/\max[G_M(t)]$ . In one oscillation cycle, we can define  $G_M$  in the cutoff, linear, and saturation regions as 0,  $\beta V_D$ , and  $\beta (V_G - V_{th})$ , respectively.  $\Gamma(t)$  is only a function of the amplitude ratios



**Fig. 27** *Left:* One-port and two-port RLCM tanks. *Right:* flicker noise ( $\Gamma_{1/f,eff,dc}$ ) reduction versus  $\theta_V$  and the selected  $A_{1,2}$  and  $\theta_{1,2}$  combination from this work for  $2\Gamma^2_{rms} = 0.45$ 

and phase differences of the second to first harmonics  $[A_1, \theta_1]$  and third to first harmonics  $[A_2, \theta_2]$ . Such variables allow plotting  $\Gamma_{1/f,eff,dc}$  against  $\theta_V$  (Fig. 27 (right)). For the class-F<sub>2</sub> VCO [20] with  $[A_1 = 0.3, \theta_1 = 0]$  or class-F<sub>3</sub> VCO [21] with  $[A_2 = 0.33, \theta_2 = 180^\circ]$ ,  $\Gamma_{1/f,eff,dc}$  is zero only at  $\theta_V = 0$ , being a constraint of the two-port RLCM tank. Although [22] uses both  $2F_{OSC}$  and  $3F_{OSC}$  resonances, they are of low impedance and low Q, while the optimization only concerns  $A_1$  and  $A_2$ . Here, we reveal that proper combinations of  $A_{1,2}$  and  $\theta_{1,2}$  can yield  $\Gamma_{1/f,eff,dc} = 0$ , regardless of  $\theta_V$ . To realize  $[A_V = 2.95, \theta_V = -10^\circ]$  for the VCO, we choose  $[A_1 = 0.5, \theta_1 = 90^\circ]$  and  $[A_2 = 0.3, \theta_2 = 90^\circ]$  to nullify  $\Gamma_{1/f,eff,dc}$  and obtain  $2\Gamma_{rms}^2 = 0.45$  that is better than those in [19-21, 23].

Figure 28 (left) depicts the VCO and its sizing parameters, excluding the varactors and switched capacitor (SC) cells. An EM simulation at 28 GHz helps extract the equivalent inductive model (e.g.,  $L_1$  ( $r_1$ ),  $L_2$  ( $r_2$ ), and k). With k < 0, the CM-to-DM inductance ratio can be >1 (i.e.,  $L_{CM}/L_{DM} = (1 - k)/(1 + k)$ ), allowing the CM resonance to sit around  $2F_{OSC}$ , between the two DM resonances at  $F_{OSC}$  and  $3F_{OSC}$ . We can find the optimal DM-to-CM capacitance ratio by sweeping the factor X (Fig. 28, right) similar to [19]. Interestingly, the VCO with only CM tunable capacitors (i.e., X = 0) helps correct the frequency ratio between



**Fig. 28** *Left:* detailed schematic of the VCO. *Right:* changing the  $C_{CM}$ -to- $C_{DM}$  ratio via the factor X. We achieved an improved FoM by using only  $C_{CM}$  that boosts  $Z_{2nd}$ , while correcting the second to third resonance frequency ratio from the ideal 0.67 to 0.72

the second and third resonances from the ideal 0.67 to 0.72; the latter allows retrieving the impedance and phase information at  $2F_{OSC}$ . Thus, Leeson's noise factor (F) reaches its theoretical minimum of 1.34 for a MOS channel noise coefficient  $\gamma \approx 1$ , and the VCO power efficiency is 0.48. At this optimum point, the circuit also nulls the flicker noise contribution of  $M_{1,2}$ , ideally yielding  $FoM_{@10kHz} \approx FoM_{@10MHz}$ .

The single-turn inductor (Fig. 29, left) features two inner taps to merge L<sub>1</sub> into L<sub>2</sub>, partially reducing the magnetic flux cancellation between them as presented in the EM simulations. At 28 GHz, the EM-simulated Q<sub>1</sub> is 28.5 for L<sub>1</sub> = 126 pH and Q<sub>2</sub> is 27 for L<sub>2</sub> = 250 pH, all realized in a compact area (320 × 250 µm<sup>2</sup>). The varactor (C<sub>v</sub>) and tunable C<sub>CM</sub> (3 bits) at V<sub>GP,N</sub> target a TR of 16%, and their simulated Q is >25. Tunable C<sub>2</sub> (3 bits, LSB: 9.5 fF) at V<sub>DP,N</sub> is for the resonance alignment: C<sub>CM</sub> sets the first resonance, and C<sub>2</sub> then tunes the second and third resonances to align them with the first. Furthermore, we can estimate  $\Gamma_{1/f,eff}(t)$  by simulating the impedances ( $\Gamma_{H,i}$ ) and phases ( $\phi_{\Gamma}^{(i)}$ ) at F<sub>OSC</sub>, 2F<sub>OSC</sub>, and 3F<sub>OSC</sub>. Multiplying them with the higher harmonics of m(t), we can nullify  $\Gamma_{1/f,eff,dc}$  within one oscillation cycle in the symmetric  $\Gamma_{1/f,eff}(t)$  (Fig. 29 (right)). The impedance variations of



**Fig. 29** Left: Proposed compact single-turn multi-tap inductor with high  $Q_1$  and  $Q_2$ . Right: Simulated  $\Gamma$ , m and  $\Gamma_{1/f,eff}$  in one cycle and harmonic impedances over TR

each resonance over the TR lead to imperfect zeroing of  $\Gamma_{1/f,eff,dc}$ , accounting for the FoM and  $1/f^3$  PN corner variations.

The VCO prototyped in 65 nm CMOS includes an on-chip divider-by-2 to ease the measurement similar to [20, 23]. To ensure the reliability with thin-oxide transistors, we selected  $V_{DD} = 0.48$  V. Figure 30 illustrates the measured PN and FoM profiles. It reaches a FoM<sub>@1MHz</sub> of up to 191.6 dBc/Hz and FoM<sub>@10MHz</sub> of up to 190.3 dBc/Hz over the de-embedded TR from  $F_{min}$  of 25.5 GHz to  $F_{max}$  of 29.9 GHz. With a 16% TR, the corresponding  $FoM_{T@1MHz}$  peaks at 195.7 dBc/Hz. Limited by the increased noise floor at the 10 MHz offset due to the small-size divider-by-2 and the test buffer, the FoM<sub>@10MHz</sub> is inferior to FoM<sub>@1MHz</sub>. The 1/f<sup>3</sup> PN corner rises from 130 kHz at  $F_{min}$  to 230 kHz at  $F_{max}$  due to AM-PM conversion dominated by the parasitics of the varactors and the SC cells. The varactors cover a tuning range of 491 MHz at low frequency and 766 MHz at high frequency, for a control voltage from 0 to 1.2 V. The frequency pushing is +114 MHz/V at  $F_{min}$ , dominated by the varactors, whereas the -535 MHz/V at  $F_{max}$  originates mainly from the SC cells. Multi-chip measurements (five samples) confirm that there is no CM oscillation risk, that the average FoM<sub>@1MHz</sub> is 190.65 dBc/Hz (±0.95 dB), and that the average  $1/f^3$  PN corner is 135 kHz (±15 kHz). With the tunable capacitance range and resolution, the PN and FoM after manual resonance alignment are 5 dB



Fig. 30 Upper: Measured PN at 12.74GHz. Lower: Performance metrics versus TR

(8 dB) better at a 1 MHz (100 kHz) offset. When operating in a PLL, we can tune first  $C_{CM}$  for frequency locking and then  $C_2$  to optimize the PN performance.

Benchmarking with the recent mm-wave VCOs in CMOS [23, 24] and BiCMOS [25], this work succeeds in improving both FoM and FoM<sub>T</sub> over a wide range of frequency offsets as Table 2 tabulates. The  $1/f^3$  PN corner is at least 2.4× smaller than in [23] and is comparable with [24] that, however, entails an extra mm-wave buffer to enlarge the third harmonic output, penalizing its FoM. For the die area, the VCO is  $1.9\times$  smaller than [24] and  $12.5\times$  smaller than [25] that relies on quad-core coupling. Measuring a TR of 16% from 25.5 to 29.9 GHz, the VCO reaches a FoM<sub>@1MHz</sub> up to 191.6 dBc/Hz that is at least 2 dB better than prior art mm-wave VCOs in both CMOS and BiCMOS (Table 2).

Figure 31 (left) depicts the VCO die micrograph with the active and capacitive elements laid out inside the inductor footprint for area savings. Figure 31 (right) plots the achieved  $1/f^3$  PN corner and the FoM relative to other mm-wave VCOs.

| Parameters                                                      |            | This Work                                                                                   |                    | JSSC'18 [24]                                                                |                      | SSCL'18 [23]                                         | ISSCC'18 [25]                      |
|-----------------------------------------------------------------|------------|---------------------------------------------------------------------------------------------|--------------------|-----------------------------------------------------------------------------|----------------------|------------------------------------------------------|------------------------------------|
| Key Techniques                                                  |            | Multi-Resonant RLCM tank:<br>Single-Turn Multi-Tap Inductor<br>+ CM-only Tunable Capacitors |                    | Implicit Class-F <sub>23</sub><br>+ 3 <sup>rd</sup> -Harmonic<br>Extraction |                      | Class-F <sub>234</sub> +<br>DM Tunable<br>Capacitors | Class-C +<br>Quad-Core<br>Coupling |
| Supply Voltage (V)                                              |            | 0.48                                                                                        |                    | 1                                                                           |                      | 0.55                                                 | 3                                  |
| Tuning Range (TR)<br>(F <sub>min</sub> to F <sub>max</sub> GHz) |            | 16%<br>(25.48 to 29.92)                                                                     |                    | 14%<br>(27.3 to 31.2)                                                       |                      | 15.7%<br>(25.2 to 29.5)                              | 16%<br>(11.8 to 15.6)              |
| Output Frequency (GHz)                                          |            | 25.48<br>(12.74 *)                                                                          | 29.92<br>(14.96 *) | 27.3                                                                        | 31.2                 | 28.466<br>(14.233 *)                                 | 15                                 |
| PN<br>(dBc/Hz)                                                  | @ 1MHz ∆f  | -115.27                                                                                     | -112.31            | -106                                                                        | -104                 | -114.7                                               | -124                               |
|                                                                 | @ 10MHz ∆f | -134.00                                                                                     | -130.77            | -126                                                                        | -125                 | -131.9                                               | -142.6                             |
| Power Consumption<br>P <sub>DC</sub> (mW)                       |            | 3.8                                                                                         | 4                  | 22 <sup>&amp;</sup>                                                         | 23 <sup>&amp;</sup>  | 6.6                                                  | 72                                 |
| FOM<br>(dBc/Hz)                                                 | @ 1MHz ∆f  | 191.6                                                                                       | 189.8              | 181                                                                         | 180                  | 189.59                                               | 189                                |
|                                                                 | @ 10MHz ∆f | 190.3                                                                                       | 188.2              | 181 <sup>&amp;</sup>                                                        | 181 <sup>&amp;</sup> | 186.79                                               | 187.6                              |
| FOM <sub>T</sub><br>(dBc/Hz)                                    | @ 1MHz ∆f  | 195.7                                                                                       | 193.9              | 184                                                                         | 183                  | 193.51                                               | 193                                |
|                                                                 | @ 10MHz ∆f | 194.4                                                                                       | 192.3              | 184 <sup>&amp;</sup>                                                        | 184 <sup>&amp;</sup> | 190.71                                               | 191.6                              |
| 1/f <sup>3</sup> PN Corner (kHz)                                |            | 130                                                                                         | 230                | 120                                                                         | 210                  | 550                                                  | <50                                |
| Die Area (mm <sup>2</sup> )                                     |            | 0.08                                                                                        |                    | 0.15                                                                        |                      | 0.083                                                | 1                                  |
| Technology                                                      |            | 65nm CMOS                                                                                   |                    | 28nm CMOS                                                                   |                      | 65nm CMOS                                            | 130nm BiCMOS                       |

Table 2 Chip summary and benchmark with [23, 24] in CMOS and [25] in BiCMOS

**FOM** = |PN| +  $20\log_{10}(f_0/\Delta f) - 10\log_{10}(P_{DC}/1\,\mathrm{mW})$ ; **FOM**<sub>T</sub> = FOM +  $20\log_{10}(TR/10)$ 

\* Measured after on-chip divider-by-2. <sup>&amp;</sup> Included the power of the first-stage O/P buffer for harmonic extraction.
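The FoM and FoM<sub>T</sub> definitions beneath Table 2 can be checked against the "This Work" column at $F_{min}$. The sketch below assumes that $f_0$ is the carrier measured after the divider-by-2 (12.74 GHz), that TR is expressed in percent, and that TR is referred to the center of the tuning range; the last point is a common convention rather than something the table states.

```python
import math

def fom(pn_dbc_hz, f0_hz, offset_hz, p_dc_mw):
    """FoM = |PN| + 20log10(f0/df) - 10log10(P_DC / 1 mW), in dBc/Hz."""
    return abs(pn_dbc_hz) + 20 * math.log10(f0_hz / offset_hz) - 10 * math.log10(p_dc_mw)

def fom_t(fom_db, tr_percent):
    """FoM_T = FoM + 20log10(TR / 10), with TR in percent."""
    return fom_db + 20 * math.log10(tr_percent / 10)

# Tuning range from Table 2, assuming it is normalized to the center frequency
f_min, f_max = 25.48e9, 29.92e9
tr_percent = 100 * (f_max - f_min) / ((f_max + f_min) / 2)

# "This Work" at F_min: PN measured at 12.74 GHz after the divider-by-2
p_dc_mw = 3.8
fom_1m = fom(-115.27, 12.74e9, 1e6, p_dc_mw)
fom_10m = fom(-134.00, 12.74e9, 10e6, p_dc_mw)
fom_t_1m = fom_t(fom_1m, 16.0)        # Table 2 lists TR = 16%

print(f"TR = {tr_percent:.1f}%")
print(f"FoM@1MHz = {fom_1m:.1f}, FoM@10MHz = {fom_10m:.1f}, FoM_T@1MHz = {fom_t_1m:.1f} dBc/Hz")
```

Using the divided carrier does not change the FoM for an ideal divider, because dividing by 2 lowers both the phase noise and the $20\log_{10}(f_0/\Delta f)$ term by about 6 dB.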

#### 5 Isolated Subsampling PLL

Recent mm-wave PLLs explored different architectures to enhance their jitter performance at low power. Without noisy loop components, the injection-locked PLL in [26] using a GHz reference (REF = 2.25 GHz) can effectively suppress the integrated jitter (86 fs<sub>rms</sub>), resulting in a better jitter-power FoM (-247.2 dB). Yet, high-frequency REF injection leads to a large spur (-32 dBc) and entails continuous frequency tracking to withstand the PVT variations. Also, at the system level, the GHz REF must be generated on-chip (i.e., cascaded PLLs). The power overhead, for example, an additional 20 mW in [27], and unwanted coupling between the two VCOs become inevitable. To this end, direct-synthesis mm-wave PLLs using a MHz REF are of higher interest, despite the challenge of a large division ratio (N). An example is the type-II mm-wave PLL reported in [28] that achieved 115 fs<sub>rms</sub> integrated jitter, but the involved divider, charge pump (CP), and VCO together draw 31 mW to suppress the in-band and out-of-band phase noise (PN).



Fig. 31 *Left:* VCO die micrograph. *Right:*  $1/f^3$  PN corner and FoM relative to other mm-wave VCOs

The subsampling PLL in [29] using a subsampling phase detector (SSPD) is also promising for improving the jitter-power FoM, as it eliminates the divider noise and removes the  $N^2$  noise amplification of the SSPD, CP, and loop filter (LF). Still, with the direct subsampling and pulse-controlled gain reduction techniques, both [29] at RF and [30] at mm-wave suffer from a tight trade-off between low spurs and low in-band PN.

Here, we report an isolated subsampling PLL (iSS-PLL) dissipating only 10.2 mW at 26.4 GHz. When locked to a 103 MHz REF, it attains low integrated jitter (71fs<sub>rms</sub>) without compromising the spur (-63 dBc). The jitter-power FoM (-252.9 dB) is at least 5.7 dB better than the recent art [26–28, 30]. These results are possible due to (1) the minimization of the disturbances at the sensing and control sides of the VCO by using a master-slave isolated SSPD (iSSPD) plus a V/I converter and (2) the optimization of the interstage sampling capacitor and amplitude of the iSSPD to suppress the in-band PN.
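The quoted jitter-power FoM can be reproduced from the integrated jitter and power. The sketch below assumes the widely used definition $\mathrm{FoM} = 10\log_{10}[(\sigma_t/1\,\mathrm{s})^2 \cdot (P/1\,\mathrm{mW})]$, which the text does not spell out.

```python
import math

def jitter_power_fom(jitter_rms_s, power_mw):
    """Jitter-power FoM in dB, assuming 10log10[(sigma_t / 1 s)^2 * (P / 1 mW)]."""
    return 10 * math.log10((jitter_rms_s ** 2) * power_mw)

# iSS-PLL figures quoted in the text: 71 fs rms integrated jitter at 10.2 mW
fom_iss = jitter_power_fom(71e-15, 10.2)

# Margin over the injection-locked PLL in [26], whose FoM the text gives as -247.2 dB
margin_vs_26 = -247.2 - fom_iss

print(f"iSS-PLL FoM = {fom_iss:.1f} dB, margin over [26] = {margin_vs_26:.1f} dB")
```

A more negative value is better, so halving the jitter at constant power improves the FoM by about 6 dB, whereas halving the power at constant jitter improves it by about 3 dB.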

For a conventional SSPD (Fig. 32, upper), the PMOS-based track-and-hold (T&H) has to interface directly with the VCO tank, causing a spur due to: (1) periodic tank capacitance variation induced by the sampling capacitance ( $C_s$ ), akin to binary frequency shift keying (BFSK) modulation, (2) charge injection to the VCO tank from the switch at the sampling edge (SE), and (3) charge sharing between the VCO tank and  $C_s$  at the tracking edge (TE). The BFSK effect leads to an amplitude-/frequency-modulated spur proportional to  $20\log(C_s)$ , as opposed to the in-band PN (L<sub>in-band,SSPD</sub>), which is inversely proportional to C<sub>s</sub>.

**Fig. 32** *Upper*: Conventional SSPD suffers from large spur due to (1) BFSK modulation, (2) charge injection, and (3) charge sharing. *Lower*: Proposed iSSPD lowers the kickback charges and indirectly samples the VCO to decouple the spur from the in-band PN

The proposed T&H (Fig. 32, lower) manages to prevent the BFSK modulation and suppress the charges' kickback to the VCO tank. Specifically, the NMOS (M<sub>x</sub>: 2/0.1 µm) mainly allows the VCO tank to see its gate capacitance (C<sub>g</sub>) as a constant load, eliminating the BFSK effect during each T&H operation. The T&H only has a modulation effect at the drain node of M<sub>x</sub>, resulting in ~180 mV voltage variation together with the charges due to injection and sharing in the current branch. There is a 34 dB attenuation (from the simulation) of such voltage variation at the VCO tank, since the coupling can only happen through the small parasitic capacitance (C<sub>gd</sub> = 1.6fF) of M<sub>x</sub>. Thus, the maximum voltage variation ( $\Delta V_2$  in Fig. 32) is <3.5 mV at F<sub>REF</sub>, much smaller than the 190 mV  $\Delta V_1$  in [30]. As a result, the VCO tank sees little disturbance from the sensing side.

The control side of the VCO (Fig. 33, upper) can also induce the REF spur. In a feedforward master-slave configuration for subsampling and holding [29, 30], the pulser-embedded CP with ripples on  $I_{CP}$  induces the spur. The pulser supports gain reduction by tuning the duty ratio (DR<sub>pul</sub> < 1). Ideally, we can eliminate the amplitude mismatch between the up/down current sources ( $I_{UP}/I_{DN}$ ) at the locked point if the on-time is constant. In practice, we can hardly achieve timing alignment due to the variable on-time. Together with the possible mismatch between the PMOS and NMOS switches, it is hard to use a wide PLL bandwidth (BW) to suppress the in-band PN without causing large ripples on  $I_{CP}$ .

Fig. 33 Upper: Conventional SSPD with a pulser-embedded CP induces spur due to the duty ratio and switch mismatches. Lower: Proposed iSSPD with inherent gain reduction offers a highly constant  $V_s$  for steady current conduction

The iSSPD cascades two isolated T&Hs to address the above trade-off (Fig. 33, lower). Controlled by two nonoverlapping clocks ( $\varphi_{1,2}$ ), the two T&Hs operate in a master-slave manner and finally output a highly constant voltage (V<sub>S</sub>). The V/I converter transforms two differential constant voltages from the two iSSPDs into a constant current (I<sub>VI</sub>). During  $\varphi_2$ , the charge injection and sharing result in a current disturbance to I<sub>UP</sub>/I<sub>DN</sub>, but they can mostly cancel each other in a push-pull fashion. By narrowing the duty ratio of  $\varphi_{1,2}$ , we can also weaken such disturbance, meanwhile reducing the static current consumption. By properly sizing the iSSPD, the entire gain (K<sub>iSSPD</sub>) of the iSSPD, implemented as <1, avoids the use of a pulser in the CP and reduces the capacitor area in the LF. The master T&H attenuates the sampled amplitude (V<sub>M</sub>) operating at the VCO frequency (F<sub>VCO</sub>) with the gain K<sub>M</sub>. The slave T&H centered at the REF frequency (F<sub>REF</sub>) is for DC-to-DC transformation with the gain K<sub>S</sub>, suppressing the ripples on I<sub>VI</sub>.

**Fig. 34** Block diagram of the proposed iSS-PLL. The iSSPD and V/I converter reduce the disturbance from the control side of the VCO, alleviating the trade-off between spur and PN. The VCO buffer enhances the sensing-side isolation between iSSPD and VCO

Figure 34 depicts the block diagram of the PLL, composed of a frequency locking loop (FLL) for coarse tuning and a type-II iSS-PLL for fine tuning. Since the master T&H operates up to 30 GHz with a rise time of  $\sim$ 10 ps for REF, we choose  $C_{s1} = 10$  fF for an output swing ( $V_{Buffer} = 0.56$  V) of the VCO buffer, which uses a compact ( $10 \times 10 \ \mu m^2$ ) stacked inductor to balance the tracking BW and the in-band PN of the master T&H. Additionally, we set the output swing at  $V_M$  to 0.12 V to reduce the disturbance kickback to the VCO tank. Without penalizing the spur, we can safely upsize  $C_{s2}$  (70 fF) to compensate for the shrinkage of the output swing at  $V_M$ , resulting in a lower in-band noise contribution of the slave T&H. By setting  $K_S \approx 1.6$ , we can realize  $K_{iSSPD} \approx 0.35$  for gain reduction. A third-order LF and a VCO buffer further reduce the disturbances at the control and sensing sides of the VCO, respectively.
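As a quick consistency check of these design values, the gains can be related to the quoted swings. This is only a sketch: it assumes (the text does not state this explicitly) that the master gain  $K_M$  is the ratio of the swing at  $V_M$  to the buffer swing, and that the overall iSSPD gain is the product of the two cascaded stage gains:

$$K_{\rm M} \approx \frac{V_{\rm M}}{V_{\rm Buffer}} = \frac{0.12\ \text{V}}{0.56\ \text{V}} \approx 0.21, \qquad K_{\rm iSSPD} \approx K_{\rm M} K_{\rm S} \approx 0.21 \times 1.6 \approx 0.34$$

Under these assumptions the result is in line with the quoted  $K_{iSSPD} \approx 0.35$ .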

The basis of the mm-wave LC VCO is a robust class-F topology using second-to-fourth harmonic resonance to reduce the impulse-sensitivity-function rms value [23]. It measures an FoM<sub>@1MHz</sub> of 189 dBc/Hz and a tuning range of 14.9% (from 25.4 to 29.5 GHz). Its output swing is adequate to directly drive the two cascaded divider-by-2 circuits with no extra buffers. For compactness and robustness, the divider-by-2 circuit, based on load-modulated dynamic latches, exhibits a >100% locking range at ~25 GHz.

The iSS-PLL prototyped in 65 nm CMOS operates at 1 V and 0.55 V (for the VCO). An off-chip crystal oscillator provides the 103 MHz REF. Figure 35 shows the measured PN after an on-chip divider-by-4 with the PLL running at 26.368GHz.



**Fig. 35** Measured PN after an on-chip divider-by-4 (i.e. 26.368GHz PLL frequency). The rms jitter is insensitive to the loop BW and its variation is <10fs<sub>rms</sub> among 6 chips

The restored in-band PN is -112.8 dBc/Hz at a 1 MHz offset. The integrated jitter is 71.16fs<sub>rms</sub>, which is insensitive to the loop BW, and its variation is <10fs<sub>rms</sub> among the six chips measured. The iSSPD + V/I converter dominates the in-band PN. We regulate the loop BW at ~4 MHz for an optimal jitter performance, that is, roughly an equal noise contribution by the iSSPD and VCO. The REF spur (-63 dBc) is insensitive to the PLL loop BW, and its variation is <3 dB among the six chips measured (Fig. 36).
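The "restored" values refer the divide-by-4 measurement back to the PLL output. An ideal divide-by-M lowers phase-noise and phase-modulation spur levels by 20·log10(M), so the restoration adds that amount back. The sketch below assumes an ideal divider and takes the -75 dBc divided-output spur from the caption of Fig. 36; the function and variable names are illustrative only.

```python
import math

def restore_level_db(level_after_divider_db: float, divide_ratio: int) -> float:
    """Refer a spur (dBc) or PN (dBc/Hz) level measured after an ideal
    divide-by-M back to the undivided output by adding 20*log10(M)."""
    return level_after_divider_db + 20.0 * math.log10(divide_ratio)

f_pll_hz = 26.368e9   # PLL output frequency (from the text)
f_ref_hz = 103e6      # REF frequency (from the text)
divide_ratio = 4      # on-chip divider used for the measurement

f_measured_hz = f_pll_hz / divide_ratio   # carrier seen by the instrument
n_mult = f_pll_hz / f_ref_hz              # PLL multiplication factor
spur_restored = restore_level_db(-75.0, divide_ratio)  # -75 dBc per Fig. 36

print(f"measured carrier: {f_measured_hz / 1e9:.3f} GHz")
print(f"multiplication factor: {n_mult:.0f}")
print(f"restored REF spur: {spur_restored:.1f} dBc")
```

The 20·log10(4) term is about 12 dB, which is the correction quoted in the Fig. 36 caption.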

Benchmarking with the recent mm-wave PLLs (Table 3 and Fig. 37), the iSS-PLL exhibited improvements of 5.7 dB in the jitter-power FoM and of 11.8 dB in the FoM<sub>r</sub> normalized to the 103 MHz REF. The iSS-PLL occupied an active area of 0.24 mm<sup>2</sup> (Fig. 37).



**Fig. 36** Measured spur after an on-chip divider-by-4 (i.e., 75 dB - 12 dB = 63 dB). The REF spur is insensitive to the loop BW and its variation is <3 dB among six chips

|                                             | This Work                        | [28]                            | [27]                           | [26]                           | [30]                           |
|---------------------------------------------|----------------------------------|---------------------------------|--------------------------------|--------------------------------|--------------------------------|
| Publication                                 |                                  | JSSC'18                         | ISSCC'18                       | ISSCC'17                       | ISSCC'14                       |
| CMOS Technology (nm)                        | 65                               | 28                              | 65                             | 65                             | 40                             |
| Key PLL Techniques                          | iSSPD PLL                        | Analog Type-II PLL              | Cascaded + ILCM PLL            | Injection-Locked PLL           | SSPD PLL                       |
| Supply Voltage (V)                          | 1 (0.55 for VCO)                 | 1.2                             | N/A                            | N/A                            | 0.9/1                          |
| Freq. Range (GHz)                           | 25.4 to 29.5 (14.9%)             | 23.3 to 30.2 (25.8%)            | 25 to 30 (18.2%)               | 27.4 to 30.8 (11.7%)           | 53.8 to 63.3 (16.2%)           |
| Ref Freq. (MHz)                             | 103                              | 491.5                           | 120                            | 2250                           | 40                             |
| Output Integrated Jitter (fs<sub>rms</sub>) | 71 * @26.368GHz (1k-100MHz)      | 114 @25.95GHz (10k-40MHz)       | 206 @29.22GHz (1k-100MHz)      | 86 @29.25GHz (1k-100MHz)       | 214 @62.64GHz (1k-100MHz)      |
| PN @ 1MHz (dBc/Hz)                          | -112.8 *                         | -104                            | -101.9                         | -115.6                         | -90                            |
| Total Power (mW)                            | 10.2                             | 31                              | 36.4                           | 24.3                           | 42                             |
| FOM (dB)                                    | -252.9                           | -244                            | -235.1                         | -247.2                         | -237                           |
| FOM<sub>r</sub> (dB)                        | -252.9                           | -237.2                          | -232.1                         | -233.8                         | -241.1                         |
| Power Eff. (mW/GHz)                         | 0.39                             | 1.19                            | 0.70                           | 0.83                           | 0.67                           |
| Ref. Spur (dBc)                             | -63 *                            | -65                             | -83 **                         | -32.6                          | -40.2                          |
| Active Area (mm<sup>2</sup>)                | 0.24                             | 0.11                            | 0.95                           | 0.11                           | 0.16                           |

Table 3 Chip summary and comparison with the state of the art

 $FOM = 10\log\left[\left(\frac{\sigma_{rms}}{1\ \text{s}}\right)^2 \cdot \frac{Power}{1\ \text{mW}}\right], \qquad FOM_r = 10\log\left[\left(\frac{\sigma_{rms}}{1\ \text{s}}\right)^2 \cdot \frac{Power}{1\ \text{mW}} \cdot \frac{f_{REF}}{103\ \text{MHz}}\right]$ , normalized to  $f_{REF} = 103$  MHz

\* Restored from the measured output at 6.592 GHz with on-chip divider-by-4

\*\* No reported injection spur
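The two figures of merit can be evaluated directly from the jitter, power, and REF-frequency rows of Table 3. The following minimal sketch does so for three columns of the table; the function names are illustrative, and the inputs are the rounded table values, so small deviations (a few tenths of a dB) from the tabulated FOMs are expected.

```python
import math

def jitter_power_fom_db(jitter_rms_s: float, power_mw: float) -> float:
    """FOM = 10*log10[(sigma_rms / 1 s)^2 * (Power / 1 mW)]."""
    return 10.0 * math.log10(jitter_rms_s ** 2 * power_mw)

def ref_normalized_fom_db(jitter_rms_s: float, power_mw: float,
                          f_ref_hz: float, f_norm_hz: float = 103e6) -> float:
    """FOM_r: FOM with the extra f_REF / 103 MHz factor inside the log."""
    return (jitter_power_fom_db(jitter_rms_s, power_mw)
            + 10.0 * math.log10(f_ref_hz / f_norm_hz))

# (integrated jitter in fs_rms, total power in mW, f_REF in MHz) from Table 3
entries = {
    "This work": (71.0, 10.2, 103.0),
    "[28]": (114.0, 31.0, 491.5),
    "[30]": (214.0, 42.0, 40.0),
}

results = {}
for name, (jitter_fs, power_mw, f_ref_mhz) in entries.items():
    fom = jitter_power_fom_db(jitter_fs * 1e-15, power_mw)
    fom_r = ref_normalized_fom_db(jitter_fs * 1e-15, power_mw, f_ref_mhz * 1e6)
    results[name] = (fom, fom_r)
    print(f"{name:10s} FOM = {fom:7.1f} dB   FOM_r = {fom_r:7.1f} dB")
```

For this work the two metrics coincide because the REF frequency equals the 103 MHz normalization frequency.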



Fig. 37 PLL die photo and benchmark with recent PLLs operating at 25+ GHz

### References

- 1. Andreani, P., Wang, X., Vandi, L., & Fard, A. (2005). A study of phase noise in Colpitts and LC-tank CMOS oscillators. *IEEE Journal of Solid-State Circuits*, 40(5), 1107–1118.
- 2. Mazzanti, A., & Andreani, P. (2008). Class-C harmonic CMOS VCOs, with a general result on phase noise. *IEEE Journal of Solid-State Circuits*, 43(12), 2716–2729.
- 3. Ahmadi-Mehr, S., Tohidian, M., & Staszewski, R. B. (2016). Analysis and design of a multicore oscillator for ultra-low phase noise. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 63(4), 529–539.
- 4. Hegazi, E., Sjoland, H., & Abidi, A. A. (2001). A filtering technique to lower LC oscillator phase noise. *IEEE Journal of Solid-State Circuits*, 36(12), 1921–1930.
- 5. Hajimiri, A., & Lee, T. H. (1998). A general theory of phase noise in electrical oscillators. *IEEE Journal of Solid-State Circuits*, 33(2), 179–194.
- 6. Murphy, D., Darabi, H., & Wu, H. (2015, February). A VCO with implicit common-mode resonance. In *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp. 442–443.
- 7. Shahmohammadi, M., Babaie, M., & Staszewski, R. B. (2015, February). A 1/f noise upconversion reduction technique applied to class-D and class-F oscillators. In *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp. 444–445.
- 8. Sjoland, H. (2002). Improved switched tuning of differential CMOS VCOs. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 49(5), 352–355.
- 9. Lim, C.-C., Yin, J., Mak, P.-I., Ramiah, H., & Martins, R. P. (2018, February). An inverse-class-F CMOS VCO with intrinsic-high-Q 1<sup>st</sup>- and 2<sup>nd</sup>-harmonic resonances for 1/f<sup>2</sup>-to-1/f<sup>3</sup> phase noise suppression achieving 196.2dBc/Hz FoM. In *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp. 374–375.
- 10. Lim, C.-C., Ramiah, H., Yin, J., Mak, P.-I., & Martins, R. (2018). An inverse-class-F CMOS oscillator with intrinsic-high-Q 1st-harmonic and 2nd-harmonic resonances. *IEEE Journal of Solid-State Circuits*, 53(12), 3528–3593.
- 11. Mortazavi, S. Y., & Koh, K.-J. (2016). Integrated inverse class-F silicon power amplifiers for high power efficiency at microwave and mm-wave. *IEEE Journal of Solid-State Circuits*, 51(10), 2420–2434.
- 12. Bevilacqua, A., Pavan, F. P., Sandner, C., Gerosa, A., & Neviani, A. (2006, February). A 3.4-7 GHz transformer-based dual-mode wideband VCO. In *Proceedings of the 32nd European Solid-State Circuits Conference*, pp. 440–443.
- 13. Rong, S., & Luong, H. C. (2012). Analysis and design of transformer-based dual-band VCO for software-defined radios. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 59(3), 449–462.
- 14. Bevilacqua, A., & Andreani, P. (2012). An analysis of 1/f noise to phase noise conversion in CMOS harmonic oscillators. *IEEE Transactions on Circuits and Systems I*, 59(5), 938–945.
- 15. Cao, C. (2006). Millimeter-wave voltage-controlled oscillators in 0.13-µm CMOS technology. *IEEE Journal of Solid-State Circuits*, 41(6), 1297–1304.
- 16. Li, G., & Afshari, E. (2011). A distributed dual-band LC oscillator based on mode switching. *IEEE Transactions on Microwave Theory and Techniques*, 59(1), 99–107.
- 17. Li, G., Liu, L., Tang, Y., & Afshari, E. (2012). A low-phase-noise wide-tuning-range oscillator based on resonant mode switching. *IEEE Journal of Solid-State Circuits*, 47(6), 1295–1308.
- 18. Peng, Y., Yin, J., Mak, P.-I., & Martins, R. P. (2018). Low-phase-noise wideband mode-switching quad-core-coupled mm-wave VCO using a single-center-tapped switched inductor. *IEEE Journal of Solid-State Circuits*, 53(11), 3232–3242.
- 19. Murphy, D., et al. (2017). Implicit common-mode resonance in LC oscillators. *IEEE Journal of Solid-State Circuits*, 52(3), 812–821.
- 20. Babaie, M., & Staszewski, R. (2015). An ultra-low phase noise class-F<sub>2</sub> CMOS oscillator with 191 dBc/Hz FoM and long-term reliability. *IEEE Journal of Solid-State Circuits*, 50(3), 679–692.
- 21. Babaie, M., & Staszewski, R. (2013). A class-F CMOS oscillator. *IEEE Journal of Solid-State Circuits*, 48(12), 3120–3133.
- 22. Shahmohammadi, M., et al. (2016). A 1/f noise upconversion reduction technique for voltage-biased RF CMOS oscillators. *IEEE Journal of Solid-State Circuits*, 51(11), 2610–2624.
- 23. Guo, H., et al. (2018). A 0.083-mm<sup>2</sup> 25.2-to-29.5 GHz multi-LC-tank class-F<sub>234</sub> VCO with a 189.6-dBc/Hz FoM. *IEEE Solid-State Circuits Letters*, 1(4), 86–89.
- 24. Hu, Y., et al. (2018). A low-flicker-noise 30-GHz class-F<sub>23</sub> oscillator in 28-nm CMOS using implicit resonance and explicit common-mode return path. *IEEE Journal of Solid-State Circuits*, 53(7), 1977–1987.
- 25. Padovan, F., et al. (2018, February). A quad-core 15GHz BiCMOS VCO with -124dBc/Hz phase noise at 1MHz offset, -189dBc/Hz FoM, and robust to multimode concurrent oscillations. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 376–377.
- 26. Yoo, S., et al. (2017, February). A PVT-robust -39dBc 1kHz-to-100MHz integrated-phase-noise 29GHz injection-locked frequency multiplier with a 600μW frequency-tracking loop using the averages of phase deviations for mm-band 5G transceivers. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 324–325.
- 27. Yoon, H., et al. (2018, February). A -31dBc integrated-phase-noise 29GHz fractional-N frequency synthesizer supporting multiple frequency bands for backward compatible 5G using a frequency doubler and injection-locked frequency multipliers. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 366–367.
- 28. Ek, S., et al. (2018). A 28-nm FD-SOI 115-fs jitter PLL-based LO system for 24-30 GHz sliding-IF 5G transceivers. *IEEE Journal of Solid-State Circuits*, 53(7), 1988–2000.
- 29. Gao, X., et al. (2009, February). A 2.2GHz 7.6mW sub-sampling PLL with -126dBc/Hz in-band phase noise and 0.15ps<sub>rms</sub> jitter in 0.18µm CMOS. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 392–393.
- 30. Szortyka, V., et al. (2017, February). A 42mW 230fs-jitter sub-sampling 60GHz PLL in 40nm CMOS. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 366–367.

# **Ultra-Low-Voltage Clock References**



Ka-Meng Lei, Pui-In Mak, and Rui P. Martins

K.-M. Lei (🖂) · P.-I. Mak

State-Key Laboratory of Analog and Mixed-Signal VLSI/IME and FST-ECE, University of Macau, Macao SAR, China

e-mail: kamenglei@um.edu.mo; pimak@um.edu.mo

R. P. Martins

State-Key Laboratory of Analog and Mixed-Signal VLSI/IME and FST-ECE, University of Macau, Macao SAR, China

Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal

e-mail: rmartins@um.edu.mo

<sup>©</sup> The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Paulo da Silva Martins, P.-I. Mak (eds.), *Analog and Mixed-Signal Circuits in Nanoscale CMOS*, Analog Circuits and Signal Processing, https://doi.org/10.1007/978-3-031-22231-3\_3

## 1 Introduction

An Internet of Things (IoT) network is a crucial component of different revolutionary concepts such as Industry 4.0 [1] and smart homes/smart cities [2]. The IoT devices within the networks gather vast amounts of data for dedicated processors/AI models, which boost the precision of analyses. An essential criterion for the IoT device is low power consumption. Ultra-low-power (ULP) radio, intermittently turned on for a short amount of time for data transmission to reduce the average power of the IoT device, is popular for the IoT device as it reduces the power consumption of power-hungry blocks such as the transceiver (TRX) and extends the lifetime of the device [3]. The system will place the device into sleep mode for a specific period, with only critical blocks such as memory and wakeup timers powered on for timing purposes.

On the other hand, there is a trend to power the IoT device with energy harvesters to realize perpetual operation. As the battery has a finite lifetime, the IoT device may miss critical data if it runs out of battery. Also, replacing batteries will be a tremendous task considering that there will be trillions of IoT devices. Further, the battery may pose environmental issues and create safety risks if not handled properly. By replacing the batteries with energy harvesters (EH), the lifetime of the device increases, and we can obviate the labor to replace the batteries, which otherwise requires a substantial effort. EH, such as solar cells (typical available power indoors:  $10-100 \mu$W/cm<sup>2</sup>) and thermoelectric generators (typical available power:  $10-1000 \mu$W/cm<sup>2</sup>), are promising in this respect [4–6]. Yet, they usually output only ~0.3 V–0.4 V, and this voltage varies with environmental factors (temperature and light intensity) [4]. We can use a boost converter to stabilize and step up the voltage to the standard I/O voltage, but this increases the footprint (cost) and power consumption of the IoT device. These constraints open a prospective research direction for ultra-low-voltage (ULV) circuits, which are powered directly by these energy harvesters and avert the penalties of the interim converters.

Clock references are indispensable parts of the TRX. Wide-ranging purposes such as the low-power wakeup timer, the phase-locked loop, the data converters, etc. require different clock references. Hence, this chapter elaborates on the design and measurement results of two ultra-low-voltage clock references in deep-submicron silicon processes. Section 2 introduces the regulation-free sub-0.5 V 16/24 MHz crystal oscillator for energy-harvesting Bluetooth Low Energy (BLE) radios implemented in 65 nm CMOS [7], whereas Sect. 3 demonstrates a fully integrated 0.35-V 2.1 MHz temperature-resilient relaxation oscillator using an asymmetric swing-boosted RC network implemented in 28 nm CMOS [8].

# 2 Regulation-Free Sub-0.5 V 16/24 MHz Crystal Oscillator for Energy-Harvesting BLE

#### 2.1 Motivation

The crystal oscillator (XO) is an essential circuit module for modern TRXs. It provides a stable clock reference for different parts such as data converters, phase-locked loops, sensors, etc. Despite its excellent frequency stability, the XO can take a few milliseconds to settle into the steady state [9–11] without a fast startup technique [12], owing to the high quality factor of the crystal (~10<sup>5</sup>). This startup time ( $t_s$ ) dominates the "on" latency of the radio, and its startup energy ( $E_s$ ) may significantly degrade the effectiveness of duty-cycling an ultra-low-power radio. If the active energy ( $E_{TRX}$ ) of a TRX is 1280 nJ (on-time of 128 µs [13] and active power of 10 mW [14]), the percentage of energy spent for starting the XO in every working cycle is ~42% for an  $E_s$  of 1000 nJ (conventional XO) and a duty cycle of 0.1%. This percentage rises further as recent circuit techniques suppress the active power of the TRX ( $P_{TRX}$ ) [15–17]. Reducing  $E_s$  is therefore of paramount importance for lowering the average power consumption of ULP radios. Recent efforts in both academia and industry succeeded in shortening the  $t_s$  and  $E_s$  of the XO [13, 14, 18–23].



**Fig. 1** Overview of the proposed XO and illustration of  $t_{\rm S}$  improvement by two techniques: SSCI and inductive three-stage  $g_{\rm m}$ . The  $L_{\rm M}$ ,  $C_{\rm M}$ , and  $R_{\rm M}$  are the modeled inductance, capacitance, and resistance of the crystal, respectively, whereas  $C_{\rm S}$  is the crystal's stray capacitance

This section reports a regulation-free sub-0.5 V XO according to the system aspect of the EH BLE radios described in [24–27]. Unlike the existing fast startup XOs based on standard or I/O voltages to power up their inverter-like or active-load amplifiers [13, 18–21], the proposed XO is ULV-enabled by using single-/multi-stage resistive-load amplifiers [28]. This architecture circumvents the ineluctable voltage headroom limit, rendering it compatible with the ULV application. Specifically, we propose a *dual-mode*  $g_m$  scheme and a *Scalable Self-reference Chirp Injection (SSCI)* technique for the XO to surmount the operating challenges in both startup and steady state (Fig. 1). The reported XO includes load capacitors of 6 pF and suits common commercially available crystals. Yet, we can also apply the technique to crystals with different load capacitances.

# 2.2 Fast Startup XO Using Dual-Mode g<sub>m</sub> Scheme and SSCI

For a crystal's resonant frequency  $(f_m)$  at tens of MHz, its  $t_s$  (milliseconds) dominates the "on" latency of a duty-cycled radio, raising the average power consumption. In addition, for energy-limited EH sources, the  $E_S$  of the XO is crucial as it may demand a large instant current from the EH source or reservoir. Recent XOs [13, 18–22] succeeded in reducing both  $t_s$  and  $E_S$ . Herein, we propose two techniques, the dual-mode  $g_m$  and the SSCI, for balancing the XO performances in both startup (i.e.,  $t_s$  and  $E_S$ ) and steady state [i.e., power consumption and phase noise (PN)]. The envelope of the XO during startup at the time t is

$$A_{\rm env}(t) = A_i \cdot e^{\frac{R_{\rm N} - R_{\rm M}}{2L_{\rm M}}t},\tag{1}$$

where  $A_i$  is the initial amplitude and  $R_N$  is the negative resistance of the overall impedance viewed from the crystal core. The  $L_M$  and  $R_M$  are the motional inductance and resistance of the crystal, respectively. The aim of the SSCI is to increase  $A_i$  instantly after enabling the XO, while the dual-mode  $g_m$  allows a boosted  $R_N$  afterward. They together bring down  $t_S$  without momentarily raising the startup power, culminating in a lower  $E_S$  and a relaxed power-source design.
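Solving Eq. (1) for the time at which the envelope reaches a target amplitude gives  $t = \frac{2L_M}{R_N - R_M}\ln(A_{target}/A_i)$ , which makes the two levers explicit: a larger  $A_i$  shrinks the logarithm and a larger  $R_N$  shrinks the time constant. The sketch below evaluates this expression; the crystal parameters and amplitudes are hypothetical values chosen only for illustration, and the exponential model ignores the amplitude limiting that sets in near the steady state.

```python
import math

def startup_time_s(a_init: float, a_target: float,
                   r_n: float, r_m: float, l_m: float) -> float:
    """Time for the Eq. (1) envelope to grow from a_init to a_target:
    t = 2*L_M / (R_N - R_M) * ln(a_target / a_init)."""
    if r_n <= r_m:
        raise ValueError("the envelope only grows when R_N > R_M")
    return 2.0 * l_m / (r_n - r_m) * math.log(a_target / a_init)

# Hypothetical values (not taken from the chapter).
L_M = 10e-3     # motional inductance, H
R_M = 30.0      # motional resistance, ohm
A_TARGET = 0.2  # target envelope amplitude, arbitrary units

# No injection, modest negative resistance.
baseline = startup_time_s(1e-6, A_TARGET, r_n=330.0, r_m=R_M, l_m=L_M)
# Injection raises the initial amplitude (the role of the SSCI).
with_injection = startup_time_s(1e-2, A_TARGET, r_n=330.0, r_m=R_M, l_m=L_M)
# A boosted R_N afterward (the role of the dual-mode g_m).
with_both = startup_time_s(1e-2, A_TARGET, r_n=1230.0, r_m=R_M, l_m=L_M)

for label, t in [("baseline", baseline),
                 ("larger A_i", with_injection),
                 ("larger A_i and R_N", with_both)]:
    print(f"{label:20s} {t * 1e6:8.1f} us")
```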

#### Scalable Self-Reference Chirp Injection (SSCI)

Signal injection to the XO can bring down  $t_s$  if the injection frequency is close to  $f_m$  of the crystal [19]. Instead of waiting for the XO to build up its oscillation amplitude, we can use an auxiliary oscillator (AO) to excite the crystal. Yet, due to the high-Q nature of the crystal, such signal injection is only effective if its frequency error from  $f_m$  is <0.5% [13]. Several signal-injection techniques for kick-starting the XO have been reported. We can categorize them into three groups: constant frequency injection (CFI) [18, 21, 22], dithering injection [13], and chirp injection (CI) [19].

CFI injects a clock signal into the crystal with a constant frequency precisely matching  $f_{\rm m}$ . Although this scheme is very efficient and simple in concept, the AO requires calibration as well as a delicate design, which is challenging at sub-0.5 V. As an example, the XO in [21] achieves  $t_{\rm s}$  values of 58/10/2 µs from 1.84/10/50 MHz crystals. Yet, it has a supply voltage of 1 V. Also, the ring oscillator entails frequency calibration after fabrication.

Dithering injection toggles the AO frequencies to compensate for the frequency deviation caused by temperature and voltage variations. As such, the injection signal can cover a wider frequency range than that of CFI. Still, trimming is necessary to compensate for the process variation. When compared with CFI, its effect on shortening  $t_s$  is lower since the signal power spreads to a wider spectrum. For instance, the XO in [13] exhibits a slashed  $t_s$  of <400 µs by using dithered-signal injection (dithered step size: 2%).

| Characteristics of the injecting signal | Constant frequency | Dithering | Chirping           |
|-----------------------------------------|--------------------|-----------|--------------------|
| $t_{\rm S}$ and $E_{\rm S}$ reduction   | ✓✓✓                | ✓✓        | ✓                  |
| Excitation bandwidth                    | Narrow             | Moderate  | Wide               |
| Trimming on AO                          | Required           | Required  | Not required       |
| Precision of AO                         | Very critical      | Critical  | Relaxed            |
| Literature                              | [20, 21]           | [13]      | [19] and this work |

Table 1 Overview of different signal injection techniques to kick-start the XO

**Fig. 2** Proposed SSCI. It generates a chirping signal to kick-start the XO using an untrimmed RO with *relaxed* precision. The FSM (finite state machine) provides feasibility to scale  $t_{CI}$, accommodating different crystal packages (i.e.,  $L_{M}$ and  $C_{S}$)

Here, we consider CI to be more robust and low cost, as it relies on a frequency-rich signal to excite the crystal and avoids frequency calibration. The principle is similar to dithering but covers a wider frequency range. It gradually sweeps the oscillating frequency, progressively decreasing or increasing it. As such, this chirping sequence can generate a spectrum between the highest frequency  $f_{\rm H}$  and the lowest frequency  $f_{\rm L}$ , as evinced by its Fourier transform [29]. If  $f_{\rm L} < f_{\rm m} < f_{\rm H}$  regardless of PVT variations, the crystal will persistently receive the power. Despite its weaker effectiveness on  $t_{\rm S}$  reduction since the power spreads to a wider band, CI has the benefit of no trimming on the AO. It is especially suitable for low-cost and ULV radios, where there is the possibility of exacerbating the frequency variation of the AO against voltage and temperature. In [19], an  $R_{\rm N}$ -boosting technique is applied together with CI, showing a  $t_{\rm S}$  of 158 µs without trimming or calibration on the AO. Still, the related RC sweeping unit for modulating the frequency of the AO is area hungry (estimated ~90% of the chip area) due to its large time constant (on the order of 10  $\mu$s) for generating the chirping sequence. Table 1 summarizes the key features of the three signal injection techniques.
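The trade-off among the injection schemes can be illustrated numerically by comparing how much spectral content each injected waveform places at  $f_m$ . The sketch below is a simplified model, not the circuit described here: it uses an ideal sinusoidal linear chirp rather than a stepped square wave, and the sample rate, injection duration, sweep limits, and 2% tone error are all assumed values.

```python
import cmath
import math

FS = 400e6              # simulation sample rate, Hz (assumed)
T_INJ = 10e-6           # injection duration, s (assumed)
F_M = 24e6              # crystal resonance, Hz
F_H, F_L = 28e6, 20e6   # chirp sweep limits, Hz (assumed)

def spectrum_mag_at(phase_fn, f_probe: float) -> float:
    """|integral of cos(phase_fn(t)) * exp(-j*2*pi*f_probe*t) dt| over the
    injection window, evaluated as a Riemann sum."""
    n = int(round(FS * T_INJ))
    acc = 0j
    for k in range(n):
        t = k / FS
        acc += math.cos(phase_fn(t)) * cmath.exp(-2j * math.pi * f_probe * t)
    return abs(acc) / FS

def tone_phase(f_hz: float):
    """Phase of a constant-frequency tone."""
    return lambda t: 2.0 * math.pi * f_hz * t

def chirp_phase(t: float) -> float:
    """Phase of a linear sweep from F_H down to F_L over T_INJ."""
    rate = (F_H - F_L) / T_INJ
    return 2.0 * math.pi * (F_H * t - 0.5 * rate * t * t)

exact_tone = spectrum_mag_at(tone_phase(F_M), F_M)       # perfectly tuned CFI
off_tone = spectrum_mag_at(tone_phase(1.02 * F_M), F_M)  # CFI with 2% error
chirp = spectrum_mag_at(chirp_phase, F_M)                # chirp covering f_m

print(f"exact tone: {exact_tone:.3e}")
print(f"2% off tone: {off_tone:.3e}")
print(f"chirp: {chirp:.3e}")
```

With these assumed numbers, the perfectly tuned tone delivers the most content at  $f_m$ , the chirp delivers less, and the mistuned tone delivers the least, which mirrors the qualitative ranking of efficiency versus robustness in Table 1.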

Herein, we introduce the SSCI (Fig. 2) that only entails an untrimmed oscillator with relaxed precision. Its frequency range can easily cover  $f_m$  variation against PVT. Unlike the RC-based chirping [19], we incorporate a five-stage RO with a finite state machine (FSM) to control the oscillating frequency of the RO via a cap-bank. Subsequently, the circuit can generate the chirping sequence by referencing its own signal and requiring no area-hungry RC units to modulate the oscillating frequency. The FSM counts the number of pulses and sequentially raises  $C_{\text{OSC}}$  by sending the control signal  $f_{\text{ctrl}}$  to the RO. Additionally, compared to the analog sweeping technique in [19], the FSM can digitally scale the total injection time ( $t_{\text{CI}}$ ), decided by the number of exciting cycles at each cap-bank value  $C_{\text{OSC}}$ :

$$t_{\rm CI} = N \times \sum_{i} t_i, \tag{2}$$

where *N* is the number of cycles to repeat at each  $C_{OSC}$  and  $t_i$  is the period of a single cycle at *i*-th  $C_{OSC}$ . The average amplitude of oscillation on the crystal after the chirping sequence is proportional to  $\sqrt{t_{CI}}$  [19, 29]. Thus, *N* can be programmed to adjust  $t_{CI}$ , rendering the XO easily compatible with different crystal parameters (i.e., an optimum  $t_{CI}$  depends on  $L_M$ ,  $R_M$  and  $R_N$  ( $C_S$ ) [19]). This digital-intensive architecture is more area-efficient. The oscillation signal at the RO has a varying duty cycle with VT variation. To maximize the injection energy (i.e., 50% duty cycle), the chirp-modulated signal is a div-by-2 output of the RO. This output serves as both the exciting signal for the crystal via the output driver and the trigger signal for the FSM. After the injection, the FSM automatically powers down the RO.
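Equation (2) and the square-root amplitude relation can be sketched in a few lines. The frequency steps below are hypothetical (16 cap-bank codes sweeping the injected signal from 28 MHz down to 20 MHz); only the structure of the calculation follows the text.

```python
import math

def injection_time_s(cycles_per_step: int, step_freqs_hz) -> float:
    """Eq. (2): t_CI = N * sum_i t_i, with t_i = 1 / f_i the period of one
    cycle at the i-th cap-bank setting."""
    return cycles_per_step * sum(1.0 / f for f in step_freqs_hz)

# Hypothetical chirp steps (assumed, not from the chapter).
NUM_STEPS = 16
F_START, F_STOP = 28e6, 20e6
freqs = [F_START - i * (F_START - F_STOP) / (NUM_STEPS - 1)
         for i in range(NUM_STEPS)]

t_ci_n8 = injection_time_s(8, freqs)
t_ci_n16 = injection_time_s(16, freqs)

# The text states that the average amplitude after the chirp is
# proportional to sqrt(t_CI), so doubling N scales it by sqrt(2).
amp_gain = math.sqrt(t_ci_n16 / t_ci_n8)

print(f"t_CI (N = 8):  {t_ci_n8 * 1e6:.2f} us")
print(f"t_CI (N = 16): {t_ci_n16 * 1e6:.2f} us")
print(f"relative amplitude gain: {amp_gain:.3f}")
```

Because  $t_{CI}$  is linear in *N*, the FSM can trade injection time against injected amplitude with a single digital setting.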

#### Dual-Mode g<sub>m</sub> Scheme

The XO using a one-stage  $g_{\rm m}$  ( $A_{\rm XO-1}$ ), especially for the Pierce oscillator, is popular as it can optimize the steady-state PN [13, 19–21]. The  $g_{\rm m}$  offers a negative resistance compensating for the equivalent resistance of the crystal. Its value also determines the growth of the oscillation amplitude before the XO reaches the steady state.

From Fig. 3a, by omitting the resistive loss induced by  $A_{XO-1}$ , the impedance between the I/O ( $Z_{amp-1}$ ) becomes

$$Z_{\rm amp-1} = -\frac{g_{\rm m}}{4\omega_0^2 C_{\rm L}^2} + \frac{1}{j\omega_0 C_{\rm L}},\tag{3}$$



**Fig. 3** XO using (**a**) a one-stage  $g_m$  ( $A_{XO-1}$ ) for the steady state and (**b**) a three-stage  $g_m$  ( $A_{XO-3}$ ) for the startup

where  $C_{\rm L}$  is the designated crystal's load capacitance and  $\omega_0$  is the angular oscillating frequency  $2\pi f_0$ . With  $Z_{\rm amp}$  shunted by the crystal's stray capacitance ( $C_{\rm S}$ ), it affects the negative resistance ( $R_{\rm N}$ ) of the overall impedance looking from the crystal core ( $Z_{\rm C}$ ):

$$R_{\rm N} \equiv -\operatorname{Re}\left(Z_{\rm c}\right) = \frac{-\operatorname{Re}\left(Z_{\rm amp}\right)}{\left[\omega_0 C_{\rm s} \operatorname{Re}\left(Z_{\rm amp}\right)\right]^2 + \left[1 - \omega_0 C_{\rm s} \operatorname{Im}\left(Z_{\rm amp}\right)\right]^2}.\tag{4}$$

If  $\omega_0 C_S |Z_{amp}| \ll 1$ , the denominator of Eq. (4) approaches unity and we have  $R_N \approx -\text{Re}(Z_{amp})$ , which matches the expression in [13] for  $A_{\text{XO-1}}$ . A large  $R_N$  favors more  $t_S$  reduction according to Eq. (1). Yet, once  $|Z_{amp}|$  becomes comparable with  $1/\omega_0 C_S$  [i.e., with a higher  $g_m$  and thus  $|\text{Re}(Z_{amp})|$  to speed up the startup], we have to consider the effect of  $C_S$ . Then, we can deduce the specific  $R_N$  of  $A_{\text{XO-1}}$  (i.e.,  $R_{N,1}$ ) from Eq. (4) as

$$R_{\rm N,1} = \frac{4g_{\rm m}C_{\rm L}^2}{\left(g_{\rm m}C_{\rm s}\right)^2 + 16C_{\rm L}^2\omega_0^2\left(C_{\rm L} + C_{\rm S}\right)^2},\tag{5}$$

Taking the derivative of Eq. (5), we can obtain the maximum value of  $R_{N,1}$  with respect to  $g_m$  at a fixed  $C_L$ :

$$R_{\rm N,1,\,max} = \frac{C_{\rm L}}{2\omega_0 C_{\rm s} (C_{\rm L} + C_{\rm s})},\tag{6}$$

where we apply $g_m = 4\omega_0 C_L(1 + C_L/C_s)$. Obviously, $\text{Im}(Z_{\text{amp-1}})$ can only be negative (capacitive) for $A_{\text{XO-1}}$, and $R_{\text{N,1}}$ has an upper limit if $g_m$ is the only sizing parameter [19, 20]. For instance, $R_{\text{N,1}}$ is limited to 1.2 k$\Omega$ with $C_S = 2 \text{ pF}$, $f_0 = 24 \text{ MHz}$, and $C_L = 6 \text{ pF}$, even if we apply an oversized $g_m = 14.5 \text{ mS}$. There were efforts to raise $R_{\text{N,1}}$ by increasing $g_m$ or tuning $C_L$ temporarily during the startup [20, 30, 31]. Yet, increasing $g_m$ incurs larger power consumption and is unfavorable toward the reduction of $E_S$. Further, Eq. (6) bounds $R_{\text{N,1}}$ by $1/(2\omega_0 C_S)$ (i.e., 1.66 k$\Omega$ in the above example), a limit approached only when $C_L \gg C_S$ and $g_m \approx 4\omega_0 C_L^{2}/C_s$.
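The figures quoted above can be cross-checked numerically. The following is a minimal sketch that evaluates Eqs. (5) and (6) for the example values in the text; the function names (`rn1`, `gm_opt`, `rn1_max`) are illustrative choices and not part of the original work.

```python
import math

def rn1(gm, c_l, c_s, f0):
    """Negative resistance of the one-stage amplifier, Eq. (5)."""
    w0 = 2.0 * math.pi * f0
    num = 4.0 * gm * c_l ** 2
    den = (gm * c_s) ** 2 + 16.0 * c_l ** 2 * w0 ** 2 * (c_l + c_s) ** 2
    return num / den

def gm_opt(c_l, c_s, f0):
    """Transconductance that maximizes Eq. (5) at a fixed C_L."""
    w0 = 2.0 * math.pi * f0
    return 4.0 * w0 * c_l * (1.0 + c_l / c_s)

def rn1_max(c_l, c_s, f0):
    """Maximum negative resistance, Eq. (6)."""
    w0 = 2.0 * math.pi * f0
    return c_l / (2.0 * w0 * c_s * (c_l + c_s))

# Example values from the text: C_L = 6 pF, C_S = 2 pF, f0 = 24 MHz
C_L, C_S, F0 = 6e-12, 2e-12, 24e6
g = gm_opt(C_L, C_S, F0)
print(f"gm_opt          = {g * 1e3:.1f} mS")
print(f"R_N1 at gm_opt  = {rn1(g, C_L, C_S, F0):.0f} ohm")
print(f"R_N1,max (Eq.6) = {rn1_max(C_L, C_S, F0):.0f} ohm")
print(f"1/(2*w0*C_S)    = {1.0 / (2.0 * 2.0 * math.pi * F0 * C_S):.0f} ohm")
```

Evaluating Eq. (5) at the optimum $g_m$ reproduces Eq. (6), and both stay below the $1/(2\omega_0 C_S)$ bound, in line with the 14.5 mS, 1.2 k$\Omega$, and 1.66 k$\Omega$ values stated above.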

Inspecting Eq. (4), if a positive $\text{Im}(Z_{\text{amp}})$ is possible to counteract the effect of $C_{\text{S}}$, we can boost $R_{\text{N}}$ beyond the aforesaid limit. The idea is to mimic a µH-range inductor on-chip for this purpose. Interestingly, a three-stage $g_{\text{m}}$ ($A_{\text{XO-3}}$) with designated capacitive loads ($Z_{\text{o}1-2}$) can effectively mimic an inductive effect during the startup (Fig. 3b). Although [32] applied a multistage $g_{\text{m}}$ to save the XO's steady-state power, here, we first explore its inductive feature for $t_{\text{S}}$ reduction. For $A_{\text{XO-3}}$, we define its $Z_{\text{amp}}$ as $Z_{\text{amp-3}}$. We can steer both $\text{Re}(Z_{\text{amp-3}})$ and $\text{Im}(Z_{\text{amp-3}})$ between positive and negative values by adjusting the inter-stage impedances, as demonstrated in [7]. For instance, if we set $g_{\text{m}1,2} = 0.4$ mS, $g_{\text{m}3} = 1.5$ mS, $r_{\text{o}1,2} = 7$ k$\Omega$, $C_{\text{L}} = 6$ pF, $\omega_0 = 2\pi \times 24$ MHz, and $C_{\text{o}1} = C_{\text{o}2} = 0.5$ pF, we obtain $Z_{\text{amp-3}} = (-1.6 + j1.2)$ k$\Omega$. The positive $\text{Im}(Z_{\text{amp-3}})$, which shows that $Z_{\text{amp-3}}$ is inductive, mitigates $C_{\text{s}}$ and breaks the limit of Eq. (6). Accordingly, we have $\text{Re}(Z_{\text{C-3}}) = -2.4$ k$\Omega$ due to the inductive $A_{\text{XO-3}}$. Then, we can achieve a higher $R_{\rm N}$ even with similar power consumption when compared with $A_{\rm XO-1}$, enabling an energy-efficient startup. Due to the intricate expression of $R_{\rm N,3}$, we optimize it numerically before proceeding to the transistor-level implementation. Besides, the technique is also applicable to different $f_0$.

Evidently, for the same power budget, $A_{\rm XO-3}$ is inferior to $A_{\rm XO-1}$ in terms of the steady-state PN, as each stage shares a smaller bias current and the noise accumulates. Also, $\text{Im}(Z_{\rm C-3})$, which determines the XO's oscillating frequency, deviates from the designated value due to the presence of $C_{\rm o1}$ and $C_{\rm o2}$. This affects the accuracy of $f_0$. Consequently, it is desirable to implement a dual-mode $g_{\rm m}$ scheme that can balance the startup and steady-state performances. During the startup, where the PN and accuracy of $f_0$ are irrelevant, we enable $A_{\rm XO-3}$ and connect it to the crystal to attain a larger $R_{\rm N}$ for fast startup. When the crystal gains sufficient energy for oscillation, $A_{\rm XO-3}$ is turned off and disconnected from the crystal while $A_{\rm XO-1}$ takes over. The scheme thus combines the merits of $A_{\rm XO-3}$ (fast startup) and $A_{\rm XO-1}$ (low PN and accurate $f_0$).
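To make the role of an inductive $\text{Im}(Z_{\text{amp}})$ in Eq. (4) concrete, the sketch below evaluates $R_{\rm N}$ both directly, as the real part of $Z_{\text{amp}}$ in parallel with $C_{\rm S}$, and through the closed form of Eq. (4). It uses the rounded $Z_{\text{amp-3}}$ quoted in the text with $C_{\rm S} = 2$ pF; the capacitive counterpart `z_cap` is a hypothetical value with the same magnitude, added only for contrast, and the function names are illustrative.

```python
import math

def rn_direct(z_amp, c_s, f0):
    """R_N = -Re(Z_c), where Z_c is Z_amp in parallel with 1/(j*w0*C_S)."""
    w0 = 2.0 * math.pi * f0
    z_c = z_amp / (1.0 + 1j * w0 * c_s * z_amp)
    return -z_c.real

def rn_eq4(z_amp, c_s, f0):
    """The same quantity written out term by term as in Eq. (4)."""
    w0 = 2.0 * math.pi * f0
    re, im = z_amp.real, z_amp.imag
    return -re / ((w0 * c_s * re) ** 2 + (1.0 - w0 * c_s * im) ** 2)

F0, C_S = 24e6, 2e-12
z_amp3 = complex(-1.6e3, 1.2e3)   # rounded Z_amp-3 from the text (inductive)
z_cap = complex(-1.6e3, -1.2e3)   # hypothetical capacitive case, same |Z_amp|

for name, z in (("inductive", z_amp3), ("capacitive", z_cap)):
    print(f"{name:10s}: direct = {rn_direct(z, C_S, F0):7.0f} ohm, "
          f"Eq. (4) = {rn_eq4(z, C_S, F0):7.0f} ohm")
```

With these rounded inputs, the inductive case lands at roughly 2.5 k$\Omega$, close to the 2.4 k$\Omega$ quoted above and larger than $|\text{Re}(Z_{\text{amp-3}})|$ itself, whereas the capacitive case with the same real part falls well below it.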

#### 2.3 Transistor-Level Implementation

We design the core elements of the XO (e.g.,  $A_{XO-1}$ ,  $A_{XO-3}$ , and RO) to operate below a 0.5 V  $V_{DD}$ . Only the static and DC circuits (digital logics and constant- $g_m$  bias circuit) operate at 0.7 V to facilitate the design. These circuits, mostly powered off during the steady state, consume <5  $\mu$ A. Thus, an on-chip switched capacitor charge pump can easily generate the 0.7 V supply and share it with other blocks at the system level as described in [26].

Subthreshold common-source (CS) amplifiers with *resistive loads* (Fig. 4a, b) constitute the basis of both $A_{XO-1}$ and $A_{XO-3}$. Unlike other solutions that use current-source loads [13, 20, 21], the resistive load aids in preserving a moderate $g_m$ even with $V_{DD} < 0.35$ V, for a small bias current (simulated at $I_{dc} = 100\ \mu$A). For instance, the simulated $g_m$ of $A_{XO-1}$ is 1.3 mS at $V_{DD} = 0.3$ V and -40 °C, being four times higher than that of the current-source load (assuming an identical $g_m$ with $V_{\rm DD} = 0.35$ V at 20 °C). Further, at high temperature, the intrinsic output resistance of the transistor decreases rapidly. This affects the stability of $R_{\rm N}$ and causes variation of $t_{\rm s}$, especially for $A_{\rm XO-3}$. The $A_{\rm XO-1}$ with resistive load trades off a lower immunity to the power-supply noise (the noise power from $V_{\rm DD}$ modulated to the output of the XO with resistive load is 3 dB larger than that of its current-source-load counterpart at 1 kHz offset). Also, it has a larger $f_0$ variation, as the $g_{\rm m}$ of $A_{\rm XO-1}$ is not fixed. Still, this is manageable for the BLE standard (< ±50 ppm [33]), as well as other IoT protocols (e.g., ZigBee: ±40 ppm). A small nominal $I_{\rm dc}$ of 100 µA is adequate for the expected PN.

Fig. 4 Circuit implementation of (a) $A_{XO-1}$ and (b) $A_{XO-3}$

A feedback resistor $R_{\rm F}$ self-biases $A_{\rm XO-1}$, whereas $A_{\rm XO-3}$ is an AC-coupled three-stage CS amplifier aided by a constant-$g_{\rm m}$ bias circuit. As the $g_{\rm m}$ of $A_{\rm XO-3}$ has a considerable impact on $R_{\rm N,3}$, the constant-$g_{\rm m}$ bias circuit keeps $A_{\rm XO-3}$ inductive and $R_{\rm N,3}$ stable for a robust and fast startup against PVT. We choose the channel lengths of the transistors such that their output resistances are ~10× larger than the resistors $R_{1-3}$. This lessens the temperature dependency of $R_{\rm N,3}$, as $R_{1-3}$ then dominate over $r_{o1-3}$. We design $A_{\rm XO-3}$ to have a similar current consumption (~100 µA) as $A_{\rm XO-1}$. As such, the power consumption does not change abruptly, easing the design and layout of the power supply. Each current branch includes CMOS switches with which we can isolate $A_{\rm XO-1}$ or $A_{\rm XO-3}$ from the crystal, while lowering their leakage power (simulated <14 nW at 0.35 V and 20 °C) when disabled. The switches are sized so that their on-resistances are negligible compared with $R_{1-3}$.

Both the parasitic capacitances of the transistors and the finite I/O resistance of  $A_{XO-3}$  affect the  $R_{N,3}$ . Thus, we should further optimize  $R_{N,3}$  via simulation. The total  $g_m$  budget is 2.3 mS (total bias current: 100 µA, assuming a  $g_m/I_D = 23 \text{ V}^{-1}$ ), with  $r_{o1-3}$  set according to the  $g_m$  of each gain stage. Figure 5a shows the locus plots of  $Z_{\text{amp-1}}$  and  $Z_{\text{amp-3}}$  implemented with practical transistors and integrated passives.



**Fig. 5** (a) Locus plot of the  $Z_{\text{amp-1,-3}}$  against frequency. (b) Simulated  $R_{\text{N,1}}$  and  $R_{\text{N,3}}$  with a fixed total  $g_{\text{m}}$  budget of 2.3 mS and the boosting ratio against frequency

$Z_{\text{amp-1}}$ is capacitive over all frequencies, while $Z_{\text{amp-3}}$ is inductive over the 13–46 MHz range, which is compatible with different $f_0$. Optimized at the most popular XO frequency of 24 MHz, the optimum $R_{\text{N},3}$ is 2.4 k$\Omega$ after paralleling $Z_{\text{amp-3}}$ with a $C_{\text{S}}$ of 2 pF. This result is ~9× higher than $R_{\text{N},1}$ under the same $g_{\text{m}}$ budget and surpasses $R_{\text{N},1,\text{max}}$ (Fig. 5b). The boosting effect is insensitive to the frequency between 15 and 34 MHz, where $R_{\text{N},3}/R_{\text{N},1} > 6$ holds.

Ideally, we should enable $A_{XO-3}$ during the entire startup phase. Yet, the $g_m$'s of $M_{1-3}$ deviate from their small-signal values as the oscillation amplitude grows. This degrades $R_{N,3}$. As a consequence, the optimum active time of $A_{XO-3}$ ($t_{sw}$) is the time when $R_{N,3} \approx R_{N,1}$, beyond which $A_{XO-3}$ no longer helps the $t_s$ reduction. We can find the optimal $t_{sw}$ via simulations with measured crystal parameters to avoid any extra detection and control mechanism.

To realize the SSCI, we implement a five-stage RO constituted by CS amplifiers with source degeneration. Compared to an RO with inverters or a relaxation oscillator, an RO with CS amplifiers balances the frequency stability and compatibility with the sub-0.5 V $V_{\rm DD}$. The source resistor ($R_{\rm S}$ in Fig. 2) also reduces the variation of the oscillating frequency against $V_{\rm DD}$. From simulation, the frequency variation of the RO reduces by ~20% over a 0.3–0.5 V $V_{\rm DD}$. We set $R_{\rm D}$ as 36 k$\Omega$. The current consumption of the RO is 20 $\mu$A. We implemented the div-by-2 unit and FSM with standard logic.

We designed the $f_{\rm H}$ and $f_{\rm L}$ of the SSCI module as 36 and 12 MHz, respectively, chosen to satisfy $f_{\rm L} < f_{\rm m} < f_{\rm H}$ even with PVT variation (Fig. 6). The total size of $C_{\rm OSC}$, simulated to be 135 fF, yields an $f_{\rm L}$ of 12 MHz (after div-by-2). Then, we determine the resolution of the cap-bank, which is decided by the minimum duration of $t_{\rm CI}$: since a complete chirping sequence must sweep all of the states at least once, we set the minimum $t_{\rm CI}$ (i.e., N = 1) as the resolution (number of pulses), defined in Eq. (2). The optimum $t_{\rm CI}$, according to [19] and the measured crystal parameters, is 4.6 µs. Thus, we set $C_{\rm OSC}$ as a binary-coded 6-bit cap-bank (unit cap: 2.14 fF), corresponding to a minimum $t_{\rm CI}$ of 4 µs with the designated $f_{\rm H}$ and $f_{\rm L}$. Even though there is a discrepancy between the applied and optimum $t_{\rm CI}$, it hardly affects $t_{\rm s}$, as $t_{\rm CI}$ is short when compared with $t_{\rm s}$. As the amplitude of oscillation after the CI is proportional to $\sqrt{t_{\rm CI}}$, even though the applied $t_{\rm CI}$ is 13% shorter than the optimum, the amplitude is only 7% smaller. Owing to the fast growth of the oscillation amplitude under $A_{\rm XO-3}$ (time constant in Eq. (1): 9.33 µs), $A_{\rm XO-3}$ quickly compensates for the discrepancy between the applied and optimum $t_{\rm CI}$; for example, the amplitude grows by ~1.07× over the 0.6 µs discrepancy, which countervails the shortfall. No significant difference in $t_{\rm s}$ will emerge, even with PVT variation on $t_{\rm CI}$ (Fig. 7).

**Fig. 6** (a) Monte Carlo-simulated $f_L$ with $V_{DD} = 0.4$ V and T = 90 °C; (b) Monte Carlo-simulated $f_H$ with $V_{DD} = 0.3$ V and T = -40 °C. N = 30 for both cases
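The arithmetic behind the 13%, 7%, and ~1.07× figures is short enough to check by hand, and the sketch below does so in Python. It assumes only what the text states: the injected amplitude scales with $\sqrt{t_{\rm CI}}$ and the envelope under $A_{\rm XO-3}$ grows exponentially with the quoted 9.33 µs time constant. The function and constant names are illustrative.

```python
import math

T_CI_OPT = 4.6e-6      # optimum injection time from the text (s)
T_CI_APPLIED = 4.0e-6  # minimum t_CI of the 6-bit cap-bank at N = 1 (s)
TAU_XO3 = 9.33e-6      # envelope time constant of A_XO-3 quoted in the text (s)
UNIT_CAP = 2.14e-15    # unit capacitor of the binary-coded cap-bank (F)
N_BITS = 6

def amplitude_ratio(t_applied, t_opt):
    """Amplitude after injection relative to the optimum (amplitude ~ sqrt(t_CI))."""
    return math.sqrt(t_applied / t_opt)

def growth_factor(dt, tau):
    """Exponential envelope growth over an interval dt for time constant tau."""
    return math.exp(dt / tau)

shortfall = 1.0 - T_CI_APPLIED / T_CI_OPT
ratio = amplitude_ratio(T_CI_APPLIED, T_CI_OPT)
growth = growth_factor(T_CI_OPT - T_CI_APPLIED, TAU_XO3)
c_osc_max = (2 ** N_BITS - 1) * UNIT_CAP  # all six bits switched in

print(f"t_CI shortfall       : {shortfall * 100:.1f} %")
print(f"amplitude reduction  : {(1.0 - ratio) * 100:.1f} %")
print(f"growth over the gap  : {growth:.3f} x")
print(f"net amplitude factor : {ratio * growth:.3f}")
print(f"full-scale C_OSC     : {c_osc_max * 1e15:.1f} fF")
```

The product of the amplitude ratio and the growth factor is close to unity, which is the sense in which the fast growth under $A_{\rm XO-3}$ offsets the shorter injection.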

The RO generates an oscillating signal at $2f_{\rm H}$ with $C_{\rm OSC} = 0$ fF (where the parasitic capacitances govern the oscillating frequency). The FSM then increases $C_{\rm OSC}$ step by step, dwelling for N cycles at each setting, up to $C_{\rm OSC} = 135$ fF, at which the RO oscillates at $2f_{\rm L}$. In this work, N is digitally configurable among 1, 2, 4, and 8.

#### 2.4 Experimental Results and Comparison with State of the Art

The XO, fabricated in 65 nm CMOS with fixed on-chip  $C_{\rm L}$  of 6 pF, occupied an active area of 0.023 mm<sup>2</sup> (Fig. 8a), of which 36% corresponds to the  $C_{\rm L}$  (Fig. 8b). The target  $f_0$  can be flexible between 16 and 24 MHz. We first verify the SSCI functionality. Figure 9a exhibits the measurement of the oscillating frequency of the RO (after div-by-2) against  $C_{\rm OSC}$ , which is consistent with the post-layout simulation. The average  $f_{\rm L}$  and  $f_{\rm H}$  across five dies at room temperature are 10.93 MHz ( $\sigma$ : 0.32 MHz) and 35.96 MHz ( $\sigma$ : 1.21 MHz), respectively. Figure 9b confirms the chirping sequence with N = 1, and Fig. 9c plots the duration of  $t_{\rm CI}$  against N.

Then, we tested the XO with a 24 MHz crystal (package: $3.2 \times 2.5 \text{ mm}^2$) without any startup aid at room temperature (20 °C) and $V_{DD} = 0.35 \text{ V}$. The measured crystal parameters $L_M$, $R_M$, $C_M$, and $C_S$ are 11.1 mH, 19 $\Omega$, 3.95 fF, and 1.3 pF, respectively. Under these conditions, we have $t_s = 1.3$ ms (Fig. 10a). The $t_s$ decreases to 530 µs with $A_{XO-3}$ enabled during the startup.

Fig. 8 (a) Chip micrograph. (b) Area breakdown of the XO

Fig. 9 (a) Measured and simulated oscillating frequencies of the RO versus $C_{\text{OSC}}$ at different conditions, robust to cover $f_0$ of the crystal even with $V_{\text{DD}}$ and temperature variations. (b) Measured chirping sequence (N = 1). (c) Injection duration $t_{\text{CI}}$ against N. For the latter two figures, $V_{\text{DD}} = 0.35 \text{ V}$, T = 20 °C

Fig. 10 Measured startup waveform (a) without startup aid and (b) with SSCI and $A_{XO-3}$ enabled

We estimate  $R_{N,1}$  and  $R_{N,3}$  from the growth of the oscillation amplitude according to Eq. (1), which we can write as

$$\ln\left(\frac{A_{\rm env}(t_0 + \Delta t)}{A_{\rm env}(t_0)}\right) = \frac{R_{\rm N} - R_{\rm M}}{2L_{\rm M}} \cdot \Delta t.\tag{7}$$

By measuring the growth of the oscillation amplitude within a specific time interval, we can estimate the $R_{\rm N}$ of the XO. For $A_{\rm XO-1}$, the growth of oscillation is $1.01 \times /\mu$s, and thereby we calculate $R_{\rm N,1}$ as 230 $\Omega$ (Fig. 11), which is close to the prediction (as described in Sect. 2.3). Similarly, we find $R_{\rm N,3} \approx 2.2\ \text{k}\Omega$. For two reasons, the reduction of $t_{\rm s}$ is not commensurate with the $R_{\rm N}$-boosting ratio between $A_{XO-3}$ and $A_{XO-1}$. Firstly, as described in Sect. 2.3, $M_{1-3}$ deviate from their nominal operating points as the amplitude grows, which deteriorates $R_{N,3}$. We can reveal this by measuring $t_s$ against $t_{sw}$ (Fig. 12). When $t_{sw}$ is short (<60 µs), $M_{1-3}$ remain in the subthreshold region and the small-signal model is still valid to estimate $t_s$ against $t_{sw}$ (i.e., the slope of the curve (~ -10) closely matches $-R_{N,3}/R_{N,1} + 1$). As $t_{sw}$ further increases, the oscillation drives $M_{1-3}$ away from their original operating points and worsens $R_{N,3}$. Hence, the slope of the curve declines and eventually reaches zero, at which point $A_{XO-3}$ no longer aids the $t_s$ reduction. Secondly, the XO entails an overhead time to enter the steady state after switching to $A_{XO-1}$: after the switching, the XO still takes ~380 µs to enter the steady state. Here, the nonideality of the ULV $A_{XO-3}$ limits the improvement on $t_s$. In fact, for amplifiers with a standard I/O voltage and a higher output swing, the reduction of $t_s$ should be more pronounced and better matched with the $R_N$-boosting ratio.
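As an illustration of how Eq. (7) turns a measured envelope into an $R_{\rm N}$ estimate, the sketch below inverts the equation using the measured $L_{\rm M}$ and $R_{\rm M}$ quoted above. The helper names are hypothetical, and the inputs are the rounded values from the text, so the outputs should be read as approximate.

```python
import math

L_M = 11.1e-3  # measured motional inductance (H)
R_M = 19.0     # measured motional resistance (ohm)

def rn_from_growth(ratio, dt):
    """Invert Eq. (7): R_N = R_M + 2*L_M*ln(ratio)/dt."""
    return R_M + 2.0 * L_M * math.log(ratio) / dt

def envelope_time_constant(r_n):
    """Time constant of the exponential envelope growth, 2*L_M/(R_N - R_M)."""
    return 2.0 * L_M / (r_n - R_M)

# A_XO-1: envelope grows by about 1.01x per microsecond (rounded figure)
rn1_est = rn_from_growth(1.01, 1e-6)
print(f"R_N1 estimate : {rn1_est:.0f} ohm")

# A_XO-3: time constant implied by the simulated R_N3 of 2.4 kohm
tau3 = envelope_time_constant(2.4e3)
print(f"tau for 2.4 k : {tau3 * 1e6:.2f} us")
```

The rounded growth factor of 1.01×/µs gives an estimate of roughly 240 $\Omega$, in the same range as the 230 $\Omega$ reported; the small gap plausibly stems from rounding of the growth factor. The time constant for $R_{\rm N,3} = 2.4$ k$\Omega$ agrees with the 9.33 µs quoted in Sect. 2.3.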

With both $A_{XO-3}$ and SSCI enabled, we further decrease $t_S$ to 400 µs (3.3× reduction) and the corresponding $E_S$ is 14.2 nJ (2.8× reduction) (Fig. 10b). When switching from $A_{XO-3}$ to $A_{XO-1}$, which have different output impedances and, consequently, operating frequencies, there is an instantaneous change in the output swing, since the magnitude of the current passing through the crystal does not change abruptly. The percentages of energy consumed in the startup phase by the SSCI, $A_{XO-3}$, and $A_{XO-1}$ are 7%, 39%, and 53%, respectively. We verified that $t_{sw}$ can tolerate ±50% uncertainty for <10% $t_S$ variation, implying that we can obtain an adequate $t_s$ even with a nonoptimal $t_{sw}$ (e.g., due to variation of PVT and the crystal's parameters). This also justifies that the existing RO is good enough to control $t_{sw}$, avoiding any external detection and control mechanism.

For the transient frequency of the XO, it takes ~300 $\mu$s to settle for a $\pm 20$ ppm $f_0$ accuracy (i.e., 50 kHz drifting from the center frequency of 2.44 GHz in a packet, as defined in [33]). This result is 3.5× faster than the case without startup aid (Fig. 13). The steady-state power is 31.8 $\mu$W at 0.35 V, and the PN is -134 dBc/Hz at 1 kHz offset, being adequate for most IoT applications and comparable to other state-of-the-art XOs with a standard voltage (e.g., PN of -136 dBc/Hz at 1 kHz and $f_0 = 26$ MHz in [10]).

Fig. 14 Measured XO ($f_0 = 24$ MHz) performances. (a) Startup time against $V_{DD}$. (b) Startup time against temperature

The XO can uphold a steady-state output swing >80% of $V_{DD}$ for $V_{DD} = 0.3$–0.5 V. The $t_s$ varies <25% from its mean (400 µs) for $V_{DD} = 0.3$–0.5 V (Fig. 14a). Only the RO of the SSCI fails to start if $V_{DD}$ drops down to 0.25 V, but $A_{XO-3}$ is still in place to aid $t_S$ reduction. Over -40 to 90 °C, the $t_S$ variation is <7.5% (Fig. 14b). We obtained similar results for a 16 MHz crystal (i.e., $\Delta f_0/f_0 = 13.4$ ppm over 0.3–0.5 V, $\Delta f_0/f_0 = 21.9$ ppm over -40 to 90 °C, and a $t_S$ variation of 9.8%).

Table 2 benchmarks the performance of the XO with the prior art. In terms of  $E_s$ , this work is >2.6× better than [20] and slightly higher than [21]. Furthermore, we can consider this circuit in the vanguard, since it proves the feasibility of regulation-free operation under a wide range of sub-0.5 V  $V_{DD}$ , while conforming to the frequency-stability specification of the BLE (Bluetooth Low Energy) standard.

| | This work | This work | JSSC'16 [3.19] | ISSCC'16 [3.13] | ISSCC'16 [3.13] | ISSCC'17 [3.20] | JSSC'18 [3.26]<sup>e</sup> | JSSC'18 [3.26]<sup>e</sup> |
|---|---|---|---|---|---|---|---|---|
| Applications | BLE | BLE | Bluetooth | BLE | BLE | BLE | N/A | N/A |
| Fast startup techniques | ULV inductive three-stage $g_m$ + SSCI | ULV inductive three-stage $g_m$ + SSCI | Chirp injection + $g_m$-boosting | Dithered injection | Dithered injection | Dynamic load + $g_m$-boosting | Precisely-timed CFI | Precisely-timed CFI |
| Steady-state techniques | ULV one-stage $g_m$ + resistive load | ULV one-stage $g_m$ + resistive load | One-stage inverter | One-stage $g_m$ + current-source load | One-stage $g_m$ + current-source load | One-stage $g_m$ + current-source load | One-stage $g_m$ + current-source load | One-stage $g_m$ + current-source load |
| CMOS process (nm) | 65 | 65 | 180 | 65 | 65 | 90 | 65 | 65 |
| Active area (mm<sup>2</sup>) | 0.023 | 0.023 | 0.12 | 0.08 | 0.08 | 0.072 | 0.09 (per XO) | 0.09 (per XO) |
| Supply voltage, $V_{DD}$ (V) | 0.35<sup>a</sup> | 0.35<sup>a</sup> | 1.5 | 1.68 | 1.68 | 1.0 | 1.0 | 1.0 |
| Temperature, $T_{\text{Range}}$ (°C) | -40 to 90 | -40 to 90 | -30 to 125 | -40 to 90 | -40 to 90 | -40 to 90 | -40 to 85 | -40 to 85 |
| $C_{\rm L}$ (pF) | 6 | 6 | 8 (off-chip) | 6 | 9 | 10 | 9 | 8 |
| Frequency, $f_0$ (MHz) | 16 | 24 | 39.25 | 24 | 24 | 24 | 50 | 10 |
| Startup energy, $E_{\rm S}$ (nJ) | 15.8 | 14.2 | 349 | - | - | 36.7 | 13.3 | 12 |
| Startup time, $t_{\rm S}$ (µs) | 460 | 400 | 158 | 64 | 435 | 200<sup>d</sup> | 2.2 | 10 |
| $\Delta t_{\rm S}/t_{\rm S}$ over $T_{\text{Range}}$ | 9.8% | 7.5% | 7% | ±35% | ±20% | 26.6% | 7% | 3% |
| $\Delta f_0/f_0$ (ppm) versus $T_{\text{Range}}$ | 21.9<sup>b</sup> | 14.1<sup>b</sup> | ±5.5 | N/A | N/A | N/A | N/A | N/A |
| $\Delta f_0/f_0$ (ppm) versus $V_{DD}$ | 13.4<sup>c</sup> | 17.9<sup>c</sup> | ±0.6 (1.2–1.8 V) | N/A | N/A | N/A | N/A | N/A |
| Steady-state power (µW) | 31.6 | 31.8 | 181 | 393 | 693 | 95 | 195 | 45.5 |

Table 2 Performance summary and comparison with recent art

<sup>a</sup>Digital and constant-$g_m$ bias circuits are at 0.7 V (current budget: 5 $\mu$A), generated by an on-chip charge pump as in [29]

<sup>b</sup>@ 0.35 V

<sup>c</sup>Across 0.3–0.5 V @ 20 °C

<sup>d</sup>Amplitude >90% and  $\Delta f_0/f_0 < \pm 20$  ppm

<sup>e</sup>Only results from similar crystal packages compared

## 3 A 0.35 V 5200 μm<sup>2</sup> 2.1 MHz Temperature-Resilient Relaxation Oscillator with 667 fJ/cycle Energy Efficiency Using an Asymmetric Swing-Boosted RC Network and a Dual-Path Comparator

## 3.1 Motivation

For the crystal-less IoT node [34] and wakeup receiver [35], low-power and fully integrated kHz-to-MHz clock sources with moderate frequency accuracy are pivotal to their operations. For instance, [35] requires a frequency reference with ~2.5% frequency accuracy to calibrate the digitally controlled oscillator of the wakeup receiver. Although the crystal oscillator offers better frequency stability, a typical MHz-range crystal oscillator can consume tens of $\mu$W, which is impermissible for the always-on module of an IoT node. In fact, we expect a $\mu$W-range power budget in the standby mode [23]. Also, the presence of an off-chip crystal can restrict the volume miniaturization of the IoT nodes.

The ring oscillator is a viable solution among the fully integrated oscillators due to its outstanding power efficiency, tuning range, and compactness [36]. Yet, the oscillating frequency of the ring oscillator is prone to PVT variations that require extra circuitry for compensation. The LC oscillator, in turn, strikes a proper balance between the integration level and frequency stability [37, 38]. Yet, the LC tank is too bulky for MHz-range applications.

Recent relaxation oscillators (RxOs) [39–47] proved their potential by attaining fast settling time, moderate intrinsic frequency stability, tiny footprint, and high energy efficiency. A typical RxO consists of a period-defining network, amplifiers, and logic gates. The period-defining network periodically (dis)charges the capacitors therein, and the amplifiers compare the voltages on the capacitors with a reference voltage. The logic gates read the output from the amplifiers and generate the required output correspondingly.

For IoT nodes powered by sub-0.5 V energy-harvesting sources such as the thermoelectric generator and solar cell, ULV operation adds to the RxO design constraints. Existing RxO architectures [39–44] do not favor sub-0.5 V operation, which severely confines the voltage headroom. Hence the linearity and accuracy of the current and voltage references are inferior, and their degraded precisions can affect the RxO's stability. Also, at high temperature, the transistor's leakage current ( $I_{\text{Leak}}$ ) limits the performance of the current/voltage reference.

Recently, the swing-boosted differential RxO proposed in [45] featured a symmetric swing-boosted RC network to define the period of the RxO, eliminating the need for any current or voltage reference while delivering a swing-boosted output to improve the noise performance. As this architecture entails no current or voltage reference, it allows scaling down $V_{\rm DD}$ without affecting the precision of the RC network. Nevertheless, the common-mode voltage ($V_{\rm CM}$) of the RC network is restricted to mid-$V_{\rm DD}$, which implies $V_{\rm CM} < 0.25$ V for sub-0.5 V operation, thereby hindering the operation of the subsequent comparator.

This section proposes an RxO that surmounts the challenges of sub-0.5 V operation and achieves high area and energy efficiencies. The key techniques are (1) an asymmetric RC network to free the $V_{\rm CM}$ restriction while preserving a swing-boosted output and (2) a dual-path comparator with delay compensation to allow temperature resilience. Prototyped in 28 nm CMOS, the RxO occupied a tiny area (5200 µm<sup>2</sup>) and attained superior energy efficiency (667 fJ/cycle) and figure of merit (FoM<sub>1</sub> = 181 dB) with respect to the prior art.

#### 3.2 Asymmetric Swing-Boosted RC Network

Figure 15a depicts the schematic of the swing-boosted RC network. As demonstrated in [45], the RxO utilizing this RC network exhibits low jitter ( $\sigma_{jit}$ ) attributed to its swing-boosted output voltages ( $V_{x,y}$ ) from the symmetric RC network (k = 1).

Considering $\phi_1$ (Fig. 15b), $V_x$ is initially at the ground and $V_{top}$ connects to $V_{DD}$, whereas $V_y$ is initially at $V_{DD}$ and $V_{bot}$ connects to the ground. $V_x$ charges toward $V_{DD}$ and $V_y$ discharges toward the ground with time constant ($\tau$) RC. When they cross at $V_{CM}$ such that $V_y < V_x$, the comparator inverts its outputs. Consequently, the chopper alternates the connections, where $V_{top}$ now connects to the ground and $V_{bot}$ connects to $V_{DD}$. As the charges across the capacitors are conserved, $V_x$ and $V_y$ change to $V_{CM} + V_{DD}$ and $V_{CM} - V_{DD}$ after the transition. The process in $\phi_2$ is complementary, and the operation repeats $\phi_1$ after another transition. Hence, the differential signal $V_{x,y}$ has a swing of $2 \times V_{DD}$. Since the $\sigma_{jit}$ of the RxO is inversely proportional to the slope of $V_{x,y}$ at the threshold $(S_{xy})$, raising the swing of $V_{x,y}$ increases $S_{xy}$ and improves the $\sigma_{jit}$.

Fig. 15 (a) Simplified schematic of the swing-boosted differential RxO. (b) Timing diagram of the output of the RC network with k = 1, with $V_{CM}$ fixed to 0.5 $V_{DD}$. (c) Timing diagram of the output of the RC network with k > 1 such that $V_{CM,U}$ and $V_{CM,D}$ suit the design of the subsequent ULV comparator (this work)

The RC network symmetry restricts $V_{\rm CM}$ to mid-$V_{\rm DD}$ regardless of the oscillation phases ($\phi_{1,2}$). As $V_{\rm DD}$ decreases to <0.5 V, $V_{\rm CM}$ shrinks to <0.25 V, which is insufficient to properly bias a differential pair with a tail current source. To break this limit, we propose an asymmetric RC network (k > 1), in which one RC branch has a larger $\tau$. From Fig. 15c, this lets $V_x$ and $V_y$ (dis)charge with different $\tau$. The leaps on $V_x$ and $V_y$ after the chopping are still $\pm V_{\rm DD}$, whereas the $V_{\rm CM}$ of $V_x$ and $V_y$ alternates between $V_{\rm CM,U}$ and $V_{\rm CM,D}$ in $\phi_1$ and $\phi_2$, respectively. As such, we can choose a k that yields a proper $V_{\rm CM,U}$ ($V_{\rm CM,D}$) and thereby favors the operation of the subsequent ULV comparator.

Analyzing the waveform in Fig. 15c, we can derive four equations governing the (dis-)charge of the asymmetric RC network:

$$(V_{\rm CM,D} + V_{\rm DD})e^{-\frac{T_1}{k{\rm RC}}} = V_{\rm CM,U},\tag{8}$$

$$(V_{\rm CM,D} - 2V_{\rm DD})e^{-\frac{T_1}{\rm RC}} + V_{\rm DD} = V_{\rm CM,U},\tag{9}$$

$$(V_{\rm CM,U} + V_{\rm DD})e^{-\frac{T_2}{\rm RC}} = V_{\rm CM,D},\tag{10}$$

$$(V_{\rm CM,U} - 2V_{\rm DD})e^{-\frac{T_2}{k{\rm RC}}} + V_{\rm DD} = V_{\rm CM,D}.\tag{11}$$

Assuming that  $T_1 = T_2$ , solving Eqs. (8)–(11) leads to

$$\left(\frac{V_{\rm DD} - V_{\rm CM,D}}{V_{\rm DD} + V_{\rm CM,D}}\right)^k = \frac{V_{\rm CM,D}}{2V_{\rm DD} - V_{\rm CM,D}},\tag{12}$$

$$\left(\frac{V_{\rm CM,U}}{2V_{\rm DD} - V_{\rm CM,U}}\right)^{k} = \frac{V_{\rm DD} - V_{\rm CM,U}}{V_{\rm DD} + V_{\rm CM,U}},\tag{13}$$

$$k = \frac{T}{2\text{RC}} / \ln\left(\frac{1 + 3e^{-T/2\text{RC}}}{1 - e^{-T/2\text{RC}}}\right),\tag{14}$$

where  $T_1 = T_2 = T/2$ . Therefore, we can calculate the required k to achieve a sufficient separation of  $V_{\text{CM},\text{U}}$  ( $V_{\text{CM},\text{D}}$ ) by numerically solving Eqs. (12) and (13), as well as the corresponding T by Eq. (14). Figure 16a illustrates the  $V_{\text{CM},\text{U}}$ ,  $V_{\text{CM},\text{D}}$ , and T versus k.
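As a minimal numerical sketch of this procedure, the Python snippet below solves Eq. (12) for the ratio $V_{\rm CM,D}/V_{\rm DD}$ by bisection and then evaluates the period from Eq. (10). It relies on the relation $V_{\rm CM,U} = V_{\rm DD} - V_{\rm CM,D}$, which follows from eliminating the exponentials in Eqs. (8)–(11) when $T_1 = T_2$. The function names and the choice of a bisection solver are illustrative assumptions, not part of the original design flow.

```python
import math


def vcm_d_ratio(k, tol=1e-12):
    """Solve Eq. (12) for x = V_CM,D / V_DD by bisection on (0, 1)."""
    def residual(x):
        # Left-hand side minus right-hand side of Eq. (12), with V_DD = 1.
        return ((1.0 - x) / (1.0 + x)) ** k - x / (2.0 - x)

    lo, hi = 1e-9, 1.0 - 1e-9  # residual(lo) > 0 > residual(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def period_over_rc(k):
    """Return T/RC using Eq. (10) with T_2 = T/2 and V_CM,U = V_DD - V_CM,D."""
    x = vcm_d_ratio(k)
    a = x / (2.0 - x)  # a = exp(-T / (2 RC))
    return -2.0 * math.log(a)


def k_from_eq14(t_over_rc):
    """Evaluate the right-hand side of Eq. (14) for a given T/RC."""
    a = math.exp(-t_over_rc / 2.0)
    return (t_over_rc / 2.0) / math.log((1.0 + 3.0 * a) / (1.0 - a))


if __name__ == "__main__":
    for k in (1.0, 1.5, 2.0, 2.4, 3.0):
        x = vcm_d_ratio(k)
        print(f"k={k:.1f}  V_CM,D/V_DD={x:.3f}  V_CM,U/V_DD={1 - x:.3f}  "
              f"T/RC={period_over_rc(k):.3f}")
```

For k = 1 the sketch recovers the symmetric case ($V_{\rm CM,D} = 0.5 \times V_{\rm DD}$, $T = 2\ln 3 \cdot {\rm RC}$), and feeding the resulting T/RC back into Eq. (14) returns the same k, which serves as a consistency check on Eqs. (12)–(14).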

The  $S_{xy}$  around the threshold crossing determines the  $\sigma_{jit}$  with the following equation [48]:

$$\sigma_{jit} = \alpha \frac{V_{n,xy}}{S_{xy}},\tag{15}$$

where  $\alpha$  is a constant of proportionality and  $V_{n,xy}$  is the equivalent noise from the RC network and the subsequent comparator appearing at its output. We can determine



**Fig. 16** (a) The simulated $V_{CM,U}$, $V_{CM,D}$, and the oscillating frequency versus k. Choosing a k > 1 enables a lower (higher) $V_{CM,D}$ ($V_{CM,U}$), facilitating the ULV operation. (b) The $S_{xy}$ from mathematical modeling and the simulated $1/\sigma_{jit}$ from an ideal RxO with the asymmetric RC network versus k. Overdesigning k decreases $S_{xy}$ and thus aggravates $\sigma_{jit}$

 $S_{xy}$  by solving for the difference between the derivative of  $V_X$  and  $V_Y$  when t = T/2 (the time when crossing occurs),

$$S_{\rm xy} = \frac{dV_{\rm x,y}}{dt} \left( t = \frac{T}{2} \right). \tag{16}$$

For instance, in  $\emptyset_2$ ,  $V_X$  and  $V_Y$  become

$$V_X(t) = (V_{CM,U} + V_{DD})e^{-\frac{t}{RC}},$$
 (17)

$$V_Y(t) = (V_{\rm CM,U} - 2V_{\rm DD})e^{-\frac{t}{kRC}} + V_{\rm DD},$$
(18)

where we set t = 0 as the beginning of  $\emptyset_2$ . Taking the derivative of  $V_X$  with respect to t and substituting t = T/2, we can get

$$\frac{dV_X}{dt}\left(t=\frac{T}{2}\right) = -\frac{1}{\mathrm{RC}}\left(V_{\mathrm{CM},\mathrm{U}}+V_{\mathrm{DD}}\right)e^{-\frac{T}{2\mathrm{RC}}},\tag{19}$$

and substituting Eq. (10) into Eq. (19):

$$\frac{dV_X}{dt}\left(t = \frac{T}{2}\right) = -\frac{1}{\mathrm{RC}}V_{\mathrm{CM,D}}.$$
(20)

Similarly, we can obtain the slope of  $V_Y$  at t = T/2:

$$\frac{dV_Y}{dt}\left(t = \frac{T}{2}\right) = -\frac{1}{kRC}(V_{CM,D} - V_{DD}).$$
(21)

Then,  $S_{xy}$  in  $Ø_2$  is

$$S_{xy} = -\frac{1}{\mathrm{RC}} \left( V_{\mathrm{CM},\mathrm{D}} - \frac{V_{\mathrm{CM},\mathrm{D}}}{k} + \frac{V_{\mathrm{DD}}}{k} \right), \tag{22}$$

where the relationship between $V_{\text{CM,D}}$ and k follows from Eq. (12). Note in Eq. (22) that when k = 1 (symmetric RC network as in [45]), $S_{xy} = -V_{\text{DD}}/\text{RC}$, showing that a higher $V_{\text{DD}}$ improves $S_{xy}$ and thus $\sigma_{jit}$. Figure 16b shows $S_{xy}$ as a function of k. Under identical RC and $V_{\text{DD}}$, increasing k decreases $S_{xy}$. We can calculate $S_{xy}$ in $\emptyset_1$ similarly; provided that $T_1 = T_2$, $S_{xy}$ in $\emptyset_1$ has the same magnitude as $S_{xy}$ in $\emptyset_2$ but the opposite sign.

Based on Fig. 16a, b, we can draw the following takeaway: a large k allows $V_{\rm CM,U}$ ($V_{\rm CM,D}$) to approach $V_{\rm DD}$ (ground), easing the use of an NMOS (PMOS)-input amplifier for comparisons, where NMOS and PMOS denote n-channel and p-channel metal-oxide-semiconductor transistors, respectively. Yet, upsizing k penalizes $\sigma_{jit}$ since $\sigma_{jit} \propto 1/S_{xy}$. Besides, pushing $V_{\rm CM,U}$ ($V_{\rm CM,D}$) close to $V_{\rm DD}$ (ground) saturates the input pairs of the subsequent amplifiers. Hence, there is a trade-off between the minimum $V_{DD}$ and $\sigma_{jit}$ for the RxO utilizing the asymmetric RC network. The minimum gate voltage at the NMOS-input amplifier is ~0.2 V (i.e., 0.1 V for the tail current source + 0.1 V for the gate-source voltage of the differential pair), and the minimum $V_{DD}$ of the comparator is ~0.35 V (explained in Sect. 3.3). To meet the minimum $V_{\rm CM,U}$ of 0.2 V for driving the NMOS-input amplifier with a 15% margin, we choose k = 2.4 such that $V_{\rm CM,U}$ is 0.23 V (0.66 × $V_{DD}$). During fabrication, the mismatch between the resistors shifts $V_{\rm CM,U}$ ($V_{\rm CM,D}$) from the desired value. Nevertheless, since k is a ratio between resistors, we can minimize its variation through a careful layout with a common-centroid technique. This means that a 15% margin is adequate to safeguard the operation of the RxO. Correspondingly, we positioned $V_{\rm CM,D}$ at 0.33 × $V_{\rm DD}$ to favor the PMOS-input amplifier.

With k = 2.4 in Fig. 16b, $S_{xy}$ reduces by 39% relative to the symmetric case (k = 1). To verify the degradation of $\sigma_{jit}$, we built an ideal RxO utilizing the asymmetric RC network with a noise source and simulated $\sigma_{jit}$ for different values of k. Figure 16b juxtaposes the simulated $1/\sigma_{jit}$ of this RxO with $S_{xy}$. As k increases, $1/\sigma_{jit}$ decreases (hence $\sigma_{jit}$ increases) at a rate similar to that of $S_{xy}$. At k = 2.4, $1/\sigma_{jit}$ decreases by 36%, thus verifying our analysis.
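The slope penalty can be reproduced from Eqs. (12) and (22) alone. The sketch below, a hedged illustration rather than the modeling code behind Fig. 16b, normalizes $S_{xy}$ to its k = 1 value ($-V_{\rm DD}/{\rm RC}$) so that RC and $V_{\rm DD}$ drop out. The helper names are hypothetical.

```python
def vcm_d_ratio(k, tol=1e-12):
    """Solve Eq. (12) for x = V_CM,D / V_DD by bisection on (0, 1)."""
    def residual(x):
        return ((1.0 - x) / (1.0 + x)) ** k - x / (2.0 - x)

    lo, hi = 1e-9, 1.0 - 1e-9  # residual(lo) > 0 > residual(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def slope_ratio(k):
    """Return S_xy(k) / S_xy(k = 1) from Eq. (22), i.e. S_xy in units of -V_DD/RC."""
    x = vcm_d_ratio(k)
    return x - x / k + 1.0 / k


if __name__ == "__main__":
    for k in (1.0, 1.5, 2.0, 2.4, 3.0):
        drop = 100.0 * (1.0 - slope_ratio(k))
        print(f"k={k:.1f}  S_xy/S_xy(k=1)={slope_ratio(k):.3f}  reduction={drop:.1f}%")
```

At k = 2.4 the normalized slope evaluates to roughly 0.61, in line with the 39% reduction read from Fig. 16b.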

#### 3.3 Circuit Implementation

#### **ULV Comparator with Dual-Path Amplifiers**

In [45], the RxO utilizes an inverter-based amplifier for voltage comparison. Although this amplifier has excellent noise performance, it is not suitable for ULV operation as it requires a minimum voltage headroom of $2(V_{GS} + V_{DS})$. We proposed the asymmetric RC network in Sect. 3.2 for ULV operation, where we can adjust $V_{CM,U}$ ($V_{CM,D}$) through k. To cope with the different $V_{CM}$ in the two oscillation phases under a ULV headroom, we utilize a comparator with dual-path amplifiers to handle the voltage comparisons across $V_{x,y}$. The comparator consists of an NMOS-input amplifier, a PMOS-input amplifier, and logic gates to generate the CLK signal. The NMOS-input amplifier, enabled in $\emptyset_1$, is capable of handling a higher input $V_{\text{CM}}$, where $V_X$ and $V_Y$ cross at $V_{\text{CM},U}$, with the PMOS-input amplifier disabled. The complementary operation happens in $\emptyset_2$. As such, both amplifiers can perform comparisons under the ULV headroom. When compared with the case using k = 1 and only a PMOS-input amplifier, the variation of the RxO's oscillating period ($T_{\text{OSC}}$) reduces by ~40%.

Figure 17a, b presents the proposed ULV RxO, with each amplifier built by cascading three gain stages, each formed by a fully differential common-source (CS) amplifier (Fig. 18a), to boost the overall voltage gain. The simulated gains of the cascaded amplifiers are >27 dB. Following the amplifiers, the logic gates generate the CLK signals and operate the chopper of the RC network after boosting to CLK<sub>H</sub> (explained below).

Since we can adjust the  $V_{\rm CM,U}$  ( $V_{\rm CM,D}$ ) of the RC network between  $V_{\rm DD}$  and ground by choosing an appropriate k, the main limitation for the minimum  $V_{\rm DD}$  of the RxO derives from two factors: the dual-path amplifier and the logic gates. Assuming all transistors biased in the subthreshold region with the gate voltages bounded between  $V_{\rm DD}$  and ground, the minimum  $V_{\rm DD}$  of the differential CS amplifier is  $V_{\rm SD,1} + V_{\rm DS,3} + V_{\rm DS,5}$  (in Fig. 18a) if we assume the  $V_{\rm DS}$ -drop on M<sub>6</sub>, the transistor for power-gating, is negligible. To maintain operation in the subthreshold region, the  $|V_{\rm DS}|$  of a transistor should be  $>3 \times V_{\rm T}$ , where  $V_{\rm T}$  is the thermal voltage. The  $V_{\rm T}$  reaches 34 mV at 120 °C. Hence, the minimum  $V_{\rm DD}$  of the differential CS amplifier is 306 mV in theory. We allow ~10% margin for the design and choose a  $V_{\rm DD}$  of 0.35 V. On the other hand, the necessary  $V_{\rm DD}$  for the logic gates to operate under the desired oscillating frequency also limits the minimum  $V_{\rm DD}$ . In the selected CMOS 28 nm process, the delay of the logic gates with  $V_{\rm DD}$  of 0.35 V varies <1% of  $T_{\rm OSC}$  from -20 to 120 °C, evincing that a  $V_{\rm DD}$  of 0.35 V is sufficient to power the logic gates.
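The headroom estimate above is simple arithmetic and can be checked as follows. This is a sketch under the stated assumptions only: three stacked transistors ($M_1$, $M_3$, $M_5$), each needing $|V_{\rm DS}| > 3 \times V_{\rm T}$, with the drop on $M_6$ neglected. The function names are illustrative.

```python
BOLTZMANN = 1.380649e-23           # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_c):
    """Thermal voltage V_T = kT/q in volts for a temperature in deg C."""
    return BOLTZMANN * (temp_c + 273.15) / ELEMENTARY_CHARGE


def min_vdd_cs_amp(temp_c, stacked=3, vds_factor=3.0):
    """Minimum V_DD when `stacked` devices each need vds_factor * V_T."""
    return stacked * vds_factor * thermal_voltage(temp_c)


if __name__ == "__main__":
    for t in (-20.0, 27.0, 120.0):
        print(f"T={t:6.1f} C  V_T={1e3 * thermal_voltage(t):.1f} mV  "
              f"V_DD,min={1e3 * min_vdd_cs_amp(t):.0f} mV")
```

At 120 °C the unrounded thermal voltage is about 33.9 mV, giving a minimum $V_{\rm DD}$ of about 305 mV; rounding $V_{\rm T}$ to 34 mV gives the 306 mV quoted in the text.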

The comparator's delay ($t_{delay}$) affects the stability of $T_{OSC}$. As described later, a delay generator compensates for $t_{delay}$ under different operating conditions. Here, we target a maximum $\Delta t_{delay}$ of ~25% of $T_{OSC}$ across -20 to 120 °C such that the resultant $T_{OSC}$ variation after compensation is <2.5%, reserving a 10% mismatch margin between $t_{delay}$ and the delay generator. The simulated $t_{delay}$ (N + P channel) ranges from 17 ns at 120 °C to 146 ns at -20 °C under a power consumption of 500 nW (at 27 °C), a variation ~10% above the target.
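The delay budget above can be read as a first-order product: the uncompensated spread $\Delta t_{delay}/T_{OSC}$ multiplied by the fractional mismatch between $t_{delay}$ and the delay generator. The sketch below applies that reading to the simulated numbers. It assumes $T_{OSC} = 1/(2.1\ \text{MHz})$, taken from the measured mean frequency reported in Sect. 3.4, and takes $\Delta t_{delay}$ as the difference between the two simulated extremes. Both are assumptions of this example.

```python
def residual_period_error(delta_t_delay, t_osc, mismatch):
    """First-order residual T_OSC error after open-loop delay compensation."""
    return (delta_t_delay / t_osc) * mismatch


T_OSC = 1.0 / 2.1e6              # s, assumed from the 2.1 MHz mean frequency
DELTA_T_DELAY = 146e-9 - 17e-9   # s, simulated spread from -20 to 120 deg C
MISMATCH = 0.10                  # 10% mismatch margin quoted in the text

uncompensated = DELTA_T_DELAY / T_OSC
residual = residual_period_error(DELTA_T_DELAY, T_OSC, MISMATCH)
target_residual = residual_period_error(0.25 * T_OSC, T_OSC, MISMATCH)

if __name__ == "__main__":
    print(f"uncompensated spread : {100 * uncompensated:.1f}% of T_OSC")
    print(f"residual (10% mism.) : {100 * residual:.2f}% of T_OSC")
    print(f"target residual      : {100 * target_residual:.2f}% of T_OSC")
```

With these inputs the spread is about 27% of $T_{OSC}$, slightly above the 25% target, and the first-order residual is correspondingly slightly above 2.5%.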

The gate voltages of $M_3$ and $M_4$ determine the operating region of $M_5$ (Fig. 18a). To guarantee that $M_5$ operates in the subthreshold region, $V_{DS,5}$ needs to be higher than $3 \times V_T$. We can either increase $V_{in,P}$ ($V_{in,N}$), which is the RC network output for the first amplifier, by upsizing k, or decrease the $V_{GS}$ of $M_3$ and $M_4$. As explained in Sect. 3.2, upsizing k deteriorates $\sigma_{jit}$. On the other hand, under the same bias current and channel length, decreasing $V_{GS}$ requires a wider $M_3(M_4)$, thus exacerbating $t_{delay}$ and degrading the RxO's frequency stability. From the simulation, the amplifier's delay rises by 26% when the $V_{GS}$ of $M_3(M_4)$ reduces by 10 mV (with the width of $M_3(M_4)$ enlarged accordingly). We aim for a $V_{GS}$ of 0.1 V for $M_3(M_4)$ to achieve a proper trade-off between $t_{delay}$ and $\sigma_{jit}$.



**Fig. 17** (a) Proposed ULV swing-boosted RxO featuring an asymmetric RC network and a dual-path comparator. We track the delays of the amplifiers to tackle the frequency fluctuation against temperature and voltage variations. (b) Schematic of the logic gates. The SR latch, together with the delay unit, guarantees that the RxO only generates the desired oscillating signal without glitches

Since each amplifier is only responsible for comparing  $V_x$  and  $V_y$  in one phase, we can have them power-gated based on the CLK state to reduce the power consumption. For instance, in  $\emptyset_1$  where CLK is high and the common-mode voltage of  $V_x$  and  $V_y$  is at  $V_{CM,U}$ , we enable the NMOS-input amplifier for comparison, while powering down the PMOS-input amplifier. The operation reverses in  $\emptyset_2$ . This duty-cycling scheme saves 26% of the total RxO power budget.

To ensure that  $M_1$  and  $M_2$  operate in the subthreshold region, a common-mode feedback (CMFB) circuit generates their gate voltages (Fig. 18b). The CMFB circuit compares the common-mode output voltage of the amplifier to  $V_{ref}$  and corrects  $V_{FB}$ . We scaled the transistors' sizes of the CMFB circuit from the main amplifier such



Fig. 18 (a) Schematic of the differential CS amplifier (NMOS). (b) CMFB circuit for the NMOS CS amplifier

that the PVT variations have the same effect on the amplifier and CMFB circuit to enhance its robustness.

We utilized an SR latch to read the results from the amplifiers and yield the desired state of CLK. Also, we used a delayed CLK ($\overline{\text{CLK}}$) signal, $\text{CLK}_{\text{D}}$ ($\overline{\text{CLK}_{\text{D}}}$), to mask out the glitches from the amplifiers during the switching and avert undesired transitions of CLK. For instance, as illustrated in Fig. 17b, before the end of $\emptyset_1$ (CLK and CLK<sub>D</sub> are high), both S and R of the SR latch are high and maintain the state of CLK. Therein, the NMOS-input amplifier is enabled and the PMOS-input amplifier is disabled. Once $V_X > V_Y$, R becomes low while S remains high (since $\overline{\text{CLK}_{D}}$ is low), which forces CLK low. Then, the circuit enables the PMOS-input amplifier while disabling the NMOS-input amplifier. During the switching of the amplifiers, an undesired transition may appear on $V_{\text{out,N}}/V_{\text{out,P}}$. The CLK<sub>D</sub> signal and the NAND gates guarantee that these glitches do not affect the state of CLK. After a delay of $\tau_d$, CLK<sub>D</sub> goes low. Both S and R are high again, and the SR latch maintains the state of CLK until $V_{out,P}$ goes high ($V_X < V_Y$). The operation repeats itself after another transition of CLK. A simple RC circuit and inverters implement the delay unit with a $\tau_d$ of ~80 ns. We selected $\tau_d$ short enough to leave sufficient margin before the zero-crossing point of $V_{XY}$ without affecting the comparison, yet long enough to filter out the glitches from the amplifiers during the switching amid PVT variation.

A constant-$g_m$ bias circuit aids the amplifiers in withstanding voltage and temperature variations [49]. A switched-capacitor voltage doubler (Fig. 19a) powers the bias circuit, extending its voltage headroom $(2 \times V_{DD} \approx 0.7 \text{ V})$. As we can reuse the CLK signal from the RxO itself to operate the voltage doubler, the power overhead (11%) is low. During the start-up, there is no CLK signal yet to drive the voltage doubler, and hence the bias circuit would produce no output without an auxiliary signal. Thus, a start-up pulse (duration ~1 µs, generated on-chip after $V_{DD}$ rises) enables an auxiliary ring oscillator (RO) to operate the voltage doubler in this start-up phase (Fig. 19b, c). With $V_{2X}$ boosted up to ~2 × $V_{DD}$, the bias circuit



Fig. 19 (a) Schematic of the switched capacitor voltage doubler. (b) The auxiliary RO that drives the voltage doubler during the startup. (c) Timing diagram of the auxiliary RO and the voltage doubler

functions properly within this period. Then, we disable the start-up pulse and the auxiliary RO, and the RxO starts to operate. In this way, the RO neither interferes with the RxO nor affects the accuracy of the RxO's frequency. The RO's frequency ranges from 15.2 to 35.1 MHz across -20 to 120 °C.

#### **Delay Generators**

The temperature dependency of  $t_{delay}$  affects RxO's  $T_{OSC}$ . Ideally,  $T_{OSC}$  is only dependent on the RC network. However, the  $t_{delay}$  after the zero-crossings of  $V_{x,y}$  prolongs the duration of each phase. As  $t_{delay}$  is temperature-dependent, it deteriorates the RxO's frequency stability. Raising the amplifiers' power budget can diminish the ratio  $t_{delay}/T_{OSC}$ , but it penalizes the RxO energy efficiency. In [42], a period controller compensates  $t_{delay}$  by doubling the current injected into the period-



**Fig. 20** (a) Proposed delay generator to track the $t_{delay}$ at different operating conditions and its timing diagram. (b) Matching between $t_{delay}$ and $t_{DN} + t_{DP}$ against temperature variation (under nominal case). (c) Principle of the delay compensation: when $\emptyset_{FH}$ is high, the $\tau$ of the RC branches is halved; thus, $V_{x,y}$ (dis)charge at double the rate to compensate $t_{delay}$. (d, e) The Monte Carlo-simulated $t_{DP}$ and $t_{DN}$ (100 runs) at 27 °C with different input codes for the capacitor banks

defining capacitors, in which the current injection duration tracks  $t_{delay}$ . As such, it can correct  $T_{OSC}$  to minimize its temperature sensitivity. Yet, the period controller entails an extra comparator for copying  $t_{delay}$ , penalizing the power budget.

Since the delay of an amplifier relates to its bias current, we introduce a delay generator to create a pulse with a width inversely proportional to the bias current. As demonstrated in Fig. 20a, two delay generators (for the NMOS- and PMOS-input amplifiers) with scaled currents from the main amplifiers generate the pulses after the edges of CLK<sub>H</sub>. From the simulation, the width of the pulses $Ø_F$ closely tracks $t_{delay}$ (error <7.6% of $t_{delay}$ or <2.3% of $T_{OSC}$). To compensate $t_{delay}$, we halve the $\tau$ of the RC branches when $Ø_{FH} = 1$ by closing switches S<sub>1</sub> and S<sub>2</sub> in Fig. 17a. The open-loop compensation scheme alleviates the long settling time of the oscillator. Furthermore, this compensation method can also offset the temperature dependency of the resistors in the RC network, avoiding the area-hungry composite resistors otherwise needed to obtain a zero temperature coefficient (TC) [42, 46].

We implemented the delay-controlling capacitors $C_{\rm N}$ and $C_{\rm P}$ as four-bit capacitor banks, with their values programmed once after fabrication to balance the process variation. The tuning ranges of the capacitances cover the variations of $t_{delay}$ amid process variations. The $t_{delay}$ of the NMOS-input and PMOS-input amplifiers varies from 15 to 45 ns and from 36 to 60 ns, respectively, in the Monte Carlo simulation (100 runs, at 27 °C). Consequently, we design the delay generators and the capacitor banks to generate pulses with widths in these ranges by adjusting their codes correspondingly (Fig. 20d, e). With the proposed compensation scheme, the simulated variation of $T_{\rm OSC}$ decreases from 25% to 2.1% over -20 to 120 °C. For the constant-$g_m$ biasing, the current increases with temperature. Hence, both $I_{BN}$ and $I_{BP}$, the biasing currents of the NMOS-input and PMOS-input amplifiers, are minimum at -20 °C. Consequently, $t_{\rm DN}$ and $t_{\rm DP}$ are largest at -20 °C and decrease to their minimum toward 120 °C. Therefore, the low-temperature condition bounds the overall resolutions of the two generated delays (7 ns and 13 ns). Still, these resolutions are sufficient to uphold the 2.5% frequency error requirement. In case a finer resolution is necessary, the number of bits of the capacitor banks can increase.

#### **CLK Boosters**

The non-idealities of the switches influence the performance of the RxO. For example, the nonzero on-resistances ($R_{ON}$) of the transistors that constitute switches $S_{1-6}$ (in Fig. 17a) affect the $\tau$ of the RC network. Under a sub-0.5 V supply, the transistors work in the subthreshold region. The problem is aggravated there because $R_{ON}$ increases exponentially with $-(V_{GS} - V_{TH})$, where the worst case of $|V_{GS}|$ is $0.5 \times V_{DD}$ without any boosting technique. Further, as $R_{ON}$ is prone to temperature variations ($R_{ON}$ increases with a decreasing temperature), it inevitably affects the frequency stability of the RxO. To alleviate the impact, we should minimize $R_{ON}$ in comparison with R in the RC network. One possibility is reducing $R_{ON}$ by upscaling the widths of the transistors that compose the switches. Yet, this leads to another problem: in a deep-submicron CMOS process, the $I_{Leak}$ in the off-state, especially at high temperature, restricts the RxO's performance and operation range. Considering the switches $S_{1-2}$ in Fig. 17a again, at high temperature, the transistors with high $I_{Leak}$ equivalently reduce $\tau$. Altogether, there is a trade-off between their $R_{ON}$ at low temperature and $I_{Leak}$ at high temperature.

To tackle this challenge, we employ clock boosters [50] to triple the swing of the digital signals (CLK<sub>H</sub>,  $\overline{\text{CLK}_{\text{H}}}$ , and  $\emptyset_{\text{FH}}$ ). The clock booster, powered from  $V_{\text{DD}}$ ,



**Fig. 21** (a)  $R_{\rm ON}$  of an NMOS from -20 to  $120 \, ^{\circ}{\rm C}$  with different  $V_{\rm G}$ . For both cases,  $V_{\rm D} = V_{\rm S} = 0.175 \, {\rm V}$ . The increased swing on  $V_{\rm G}$  reduces the variations of  $R_{\rm ON}$  by 8600×. (b)  $I_{\rm Leak}$  of the same NMOS in (a) in the off-state. With a negative  $V_{\rm G}$ , the  $I_{\rm Leak}$  reduces by 389× at 120 °C. For both cases,  $V_{\rm D} = 0.35 \, {\rm V}$  and  $V_{\rm S} = 0 \, {\rm V}$ 

increases the swing of the periodic signal (high, $2 \times V_{\text{DD}}$; low, $-V_{\text{DD}}$) without an additional power supply. With the boosted swing, the worst-case $|V_{\text{GS}}|$ of the transistors becomes $1.5 \times V_{\text{DD}}$. Besides, the negative voltage $(-V_{\text{DD}})$ at the logic-low level effectively suppresses $I_{\text{Leak}}$, even at 120 °C. For example, this scheme not only tightens the variation of the $R_{\text{ON}}$ of an NMOS switch across -20 to 120 °C by $8600 \times$ ($V_{\text{D}} = V_{\text{S}} = 0.5 \times V_{\text{DD}}$, Fig. 21a) but also shrinks $I_{\text{Leak}}$ in the off-state at 120 °C from 307 to 0.8 nA (Fig. 21b), rendering the RxO robust in an extreme environment.
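To make the gate-drive numbers concrete, the sketch below tabulates the control levels with and without boosting for $V_{\text{DD}} = 0.35$ V and an NMOS switch whose source sits at $0.5 \times V_{\text{DD}}$, as in Fig. 21a. It only restates the arithmetic of this paragraph; the function names are illustrative, and the leakage ratio uses the rounded currents quoted in the text.

```python
def gate_levels(vdd, boosted):
    """Return the (high, low) levels of a switch control signal in volts."""
    return (2.0 * vdd, -vdd) if boosted else (vdd, 0.0)


def worst_on_vgs(vdd, boosted, source_ratio=0.5):
    """On-state V_GS of an NMOS switch whose source sits at source_ratio * V_DD."""
    high, _ = gate_levels(vdd, boosted)
    return high - source_ratio * vdd


VDD = 0.35  # V
swing_boosted = gate_levels(VDD, True)[0] - gate_levels(VDD, True)[1]
leak_reduction = 307.0 / 0.8  # nA / nA, rounded values from the text

if __name__ == "__main__":
    print(f"on-state V_GS, no boost : {worst_on_vgs(VDD, False):.3f} V")
    print(f"on-state V_GS, boosted  : {worst_on_vgs(VDD, True):.3f} V")
    print(f"boosted swing           : {swing_boosted:.2f} V (= 3 x V_DD)")
    print(f"off-state leakage ratio : about {leak_reduction:.0f}x")
```

The rounded currents give a reduction of about 384×, close to the 389× quoted in Fig. 21b, which presumably rests on unrounded simulation data.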

#### 3.4 Measurement Results

We fabricated a prototype of the RxO in a 28 nm CMOS 1P10M technology. It occupies a core area of 5200 $\mu$m<sup>2</sup>, dominated by the comparator (28%) and the RC network (26%) (Fig. 22a, b). The RxO consumes 1.4 $\mu$W at 22 °C on average (N = 7) (Fig. 23a, b), where the comparator (49%, from simulation) dominates (Fig. 22c). After fabrication, we apply a three-point trim to the capacitor banks of the delay generators based on the measured frequency of the RxO.

Peripheral equipment such as the oscilloscope (for observing the waveform in real time) and the frequency counter (for measuring the frequency f) has a high input capacitance. The digital buffers with a $V_{DD}$ of 0.35 V and reasonable sizing are not capable of driving such equipment. Thus, we utilize on-chip level shifters to raise the swing of the output signals to 0.9 V. Afterward, we feed these signals to digital



**Fig. 22** (a) Chip micrograph of the fabricated RxO in 28 nm CMOS. (b) Area breakdown of the RxO. (c) Power breakdown of the RxO (from simulation)

buffers with a  $V_{DD}$  of 0.9 V (supplied independent of the RxO's  $V_{DD}$ ) to drive the peripheral equipment.

The mean oscillating frequency of the RxO is 2.1 MHz. It has an energy efficiency of 667 fJ/cycle, rendering it the most energy-efficient RxO reported in the MHz-range. After calibrations, the deviations of the RxOs' frequencies are <2.5% from -20 to 120 °C (Fig. 23c). The resulting TC is 158 ppm/°C on average. The mean variation of the RxO's frequencies from 0.35 to 0.38 V (~9% of  $V_{DD}$ ) is 2.5% (Fig. 23d). The line sensitivity, where we also take the supply voltage into account  $\left[\left(\frac{\Delta f}{f}\right)/\left(\frac{\Delta V}{V}\right)\right]$ , is 26.8%. The large sensitivity of the RxO to voltage variation is attributable to the subthreshold operation and low  $V_{DS}$  across the transistors of the amplifiers. From the simulation, the bias current of the NMOS-input amplifier increases by 25% from 0.35 to 0.38 V, hence affecting the  $t_{delay}$  and the RxO's frequency. Still, the 0.35–0.38 V range is sufficient for IoT devices powered by solar cells and installed in the typical indoor environment (e.g., home and office), as the open-circuit voltage of a solar cell varies 30 mV amid a change in light intensity of



Fig. 23 Measured performance of the RxO from seven chip samples. (a) Power consumption versus temperature. (b) Power consumption versus  $V_{DD}$ . (c) Frequency stability versus temperature. (d) Frequency stability versus  $V_{DD}$ 



Fig. 24 (a) Measured period jitter of the RxO (52,000 hits on the oscilloscope). (b) Accumulated jitter of the RxO

~3× [51, 52]. If we relax the requirement on frequency stability, or if recalibrating the frequency at different $V_{DD}$ is feasible, the working range of the RxO can extend to 0.5 V, where the breakdown voltage of the CMOS process (1 V) becomes the limit due to the voltage doubler and clock booster.

The RMS period jitter of the RxO is 800 ps (0.15% of  $T_{OSC}$ ) (Fig. 24a). The accumulated jitter increases at a rate of  $\sqrt{N}$  up to ~60 cycles, in which the thermal noise is the dominant noise source (Fig. 24b). When compared with [45], the high period jitter is attributable to the low supply voltage, low power, and different amplifiers handling the comparison in  $\emptyset_1$  and  $\emptyset_2$ . Still, the RxO is appropriate for the devices in which ULV and ultra-low power are the priorities (e.g., wakeup receiver [35]). The long-term stability is 210 ppm (gating time >0.1 s). To



**Fig. 25** (a) Startup waveform of the RxO, with  $V_{DD}$  switched on at t = 0 s. (b) Transient frequency during startup. The RxO reaches steady state within three clock cycles or 3.6 µs after enabling  $V_{DD}$ . (c) The startup time of the RxO at different temperatures

characterize the supply noise rejection of the RxO, we superimpose a sinusoidal signal on  $V_{\rm DD}$  and measure the corresponding period jitter. In the presence of a 20 mV<sub>pp</sub> sinusoidal signal (1 kHz) at the supply, the period jitter of the RxO exhibits a value of 2 ns.

We also characterize the startup time of the RxO, which is crucial if the RxO is power-gated to further suppress the power consumption of the IoT node. As the asymmetric RC network requires a finite number of clock cycles to produce a consistent output signal, the RxO's frequency settles after the third clock pulse (Fig. 25a, b). Over the entire temperature range, the RxO enters the steady state within 3.6 $\mu$s after enabling $V_{\text{DD}}$ (Fig. 25c).

Herein, we benchmark the RxO using two FoMs. First, we evaluate the RxO using the FoM proposed in [44]

$$FoM_{1} = 10 \log \left( \frac{f \cdot T_{range}}{Power \cdot TC} \right),$$
(23)

with the temperature range  $T_{\text{range}}$ . This FoM takes into account the trade-off among *f*, power,  $T_{\text{range}}$ , and TC. The FoM<sub>1</sub> of the RxO is 181 dB, which is comparable to the state of the art in spite of the ULV  $V_{\text{DD}}$  of 0.35 V. Then, we evaluated the RxO using the conventional FoM:

|                                                                                   | Koo, JSSC'17 [43]  | Mikulić, ESSCIRC'17 [40] | Liu, JSSC'19 [44] | Savanth, JSSC'19 [41] | Lee, JSSC'20 [45] | This work           |
|-----------------------------------------------------------------------------------|--------------------|--------------------------|-------------------|-----------------------|-------------------|---------------------|
| Process (nm)                                                                      | 180                | 350                      | 65                | 65                    | 180               | 28                  |
| Frequency (MHz)                                                                   | 0.44               | 1                        | 1.05              | 1.2                   | 10.5              | 2.1                 |
| $V_{\rm DD}$ (V)                                                                  | 1.4–3.3            | 3–4.5                    | 0.98–1.02         | 0.9–1.8               | 1.4–2.0           | 0.35–0.38           |
| Power (µW)                                                                        | 21.3               | 210                      | 69                | 0.82                  | 219.8             | 1.4                 |
| Energy efficiency (pJ/cycle)                                                      | 48.4               | 210                      | 65.7              | 0.68                  | 20.9              | 0.67                |
| $T_{\rm range}$ (°C)                                                              | -20–100            | -40–125                  | -15–55            | -20–125               | -40–125           | -20–120             |
| TC (ppm/°C)                                                                       | 169                | 24.3                     | 4.3               | 100                   | 137               | 158                 |
| Variation across $V_{DD}$                                                         | 0.04%              | 0.42%                    | 0.17%             | $\pm 0.54\%$          | 2.64%             | 2.3%                |
| Line sensitivity $\left(\frac{\Delta f}{f}\right)/\left(\frac{\Delta V}{V}\right)$ | 0.03%              | 0.84%                    | 4.25%             | $\pm 0.54\%$          | 6.16%             | 26.8%               |
| Area (µm<sup>2</sup>)                                                             | 58,000             | 40,000                   | 51,000            | 5000                  | 15,000            | 5200                |
| Period jitter (ps)                                                                | 1060               | –                        | 160               | –                     | 9.86              | 800                 |
| Startup time (µs)                                                                 | –                  | 1<sup>a</sup>            | 8                 | 10                    | –                 | 3.6                 |
| No. of samples                                                                    | 100                | 5                        | –                 | 170<sup>b</sup>       | 15                | 7                   |
| FoM<sub>1</sub> (dB)                                                              | 162                | 165                      | 174               | 183                   | 168               | 181                 |
| FoM<sub>2</sub> (dBc/Hz)                                                          | -152.7 (@ 10 kHz)  | –                        | –                 | –                     | -157.7 (@ 1 kHz)  | -143.4 (@ 10 kHz)   |

Table 3 Performance summary and comparison with the state-of-the-art RxOs

<sup>a</sup>Deduced from the numbers of cycles to start, which may underestimate the true startup time

<sup>b</sup>For temperature stability measurement

$$FoM_2 = PN - 20 \log\left(\frac{f}{f_{offset}}\right) + 10 \log\left(\frac{Power}{1 \text{ mW}}\right),$$
(24)

where PN is the phase noise at the offset frequency from the carrier  $f_{\text{offset}}$ . The PN of the RxO at 10 kHz offset is -68.4 dBc/Hz, resulting in an FoM<sub>2</sub> of -143.4 dBc/Hz.
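Both figures of merit, together with the energy per cycle, can be reproduced from the measured numbers. In the sketch below, the unit conventions are an assumption inferred from the reported values rather than stated in the text: f in Hz, power in W, $T_{\text{range}}$ in °C, and TC as a fraction per °C (158 ppm/°C = 158 × 10⁻⁶ /°C). The function names are illustrative.

```python
import math


def energy_per_cycle(power_w, freq_hz):
    """Energy consumed per oscillation cycle in joules."""
    return power_w / freq_hz


def fom1_db(freq_hz, t_range_c, power_w, tc_per_c):
    """Eq. (23): trade-off among frequency, temperature range, power, and TC."""
    return 10.0 * math.log10(freq_hz * t_range_c / (power_w * tc_per_c))


def fom2_dbc_hz(pn_dbc_hz, freq_hz, f_offset_hz, power_w):
    """Eq. (24): phase noise normalized to frequency offset and power."""
    return (pn_dbc_hz
            - 20.0 * math.log10(freq_hz / f_offset_hz)
            + 10.0 * math.log10(power_w / 1e-3))


# Measured values of this work as reported in the text.
F_OSC = 2.1e6             # Hz
POWER = 1.4e-6            # W
T_RANGE = 120.0 - (-20.0)  # deg C
TC = 158e-6               # 1/deg C (158 ppm/deg C)
PN_10K = -68.4            # dBc/Hz at 10 kHz offset

if __name__ == "__main__":
    print(f"energy/cycle = {1e15 * energy_per_cycle(POWER, F_OSC):.0f} fJ")
    print(f"FoM1 = {fom1_db(F_OSC, T_RANGE, POWER, TC):.1f} dB")
    print(f"FoM2 = {fom2_dbc_hz(PN_10K, F_OSC, 10e3, POWER):.1f} dBc/Hz")
```

Under these conventions the sketch returns about 667 fJ/cycle, 181 dB, and -143.4 dBc/Hz, matching the reported values. Applying the same `fom1_db` to the other columns of Table 3 reproduces their FoM<sub>1</sub> entries to within rounding.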

Table 3 summarizes the performance of the RxO and compares it with recent art. This work is the first sub-0.5 V temperature-resilient (<2.5%) RxO achieving a high power efficiency of 667 fJ/cycle (Fig. 26). When compared with the RxO with a



Fig. 26 Comparison with state-of-the-art fully integrated oscillators. Red circle, relaxation oscillator; blue circle, frequency-locked-loop type oscillator. A larger circle implies a relatively higher oscillating frequency. The figure only shows selected oscillators with frequencies between 0.1 and 10 MHz

symmetric swing-boosted RC network [45], this RxO operates at a $4 \times$ lower $V_{DD}$ while achieving a comparable TC after compensation.

### 4 Conclusions

This chapter detailed the analysis and design of two ULV MHz-range clock references for different purposes, with both clock references implemented and taped out in deep-submicron CMOS, exhibiting well-founded and pioneering measurement results. The first is a regulation-free sub-0.5 V XO for energy-harvesting BLE radios. We introduced two circuit techniques, *dual-mode*  $g_m$  and *SSCI*, to reduce the startup time  $t_s$  and energy  $E_s$ . The dual-mode  $g_m$  exploits the inductive feature of three-stage  $g_m$  ( $A_{XO-3}$ ) to counteract the crystal's  $C_s$  during the startup and the low-noise feature of one-stage  $g_m$  ( $A_{XO-1}$ ) to preserve the PN in the steady state. The XO prototyped in 65 nm CMOS has a compact area (0.023 mm<sup>2</sup>) that is >3.1× smaller than the prior art. The measured  $t_s$  and  $E_s$  of the XO, with a 24 MHz crystal, are 400 µs and 14.2 nJ, respectively. The frequency stability against voltage (0.3–0.5 V) is 17.9 ppm and temperature (-40–90 °C) is 14.1 ppm; both conform to the BLE standard.

The second clock reference is a 2.1 MHz temperature-resilient RxO with a 0.35 V supply voltage for ultra-low-power IoT nodes. We jointly design an asymmetric swing-boosted RC network and a dual-path comparator to tackle the challenges of ULV (<0.5 V) operation. The open-loop delay generator compensates for the temperature-sensitive delay of the comparator. Fabricated in 28 nm CMOS, it has an active area of only 5200  $\mu$ m<sup>2</sup> and achieves the best energy efficiency of 667 fJ/cycle among the previously reported MHz-range RxOs. Further, it also has a high figure of merit of 181 dB in spite of the ULV headroom and can settle within 3.6  $\mu$ s after enabling the supply voltage.

### References

- 1. Wollschlaeger, M., Sauter, T., & Jasperneite, J. (2017, March). The future of industrial communication. *IEEE Industrial Electronics Magazine*, 11, 17–27.
- 2. Ahmed, E., Yaqoob, I., Gani, A., Imran, M., & Guizani, M. (2016, November). Internet-of-things-based smart environments: State of the art, taxonomy, and open research challenges. *IEEE Wireless Communications*, 23(5), 10–16.
- 3. Bahai, A. (2016, September). Ultra-low energy systems: Analog to information. In *Proceedings of the European Solid-State Circuits Conference (ESSCIRC)* (pp. 3–6).
- 4. Bandyopadhyay, S., & Chandrakasan, A. P. (2012, September). Platform architecture for solar, thermal, and vibration energy combining with MPPT and single inductor. *IEEE Journal of Solid-State Circuits*, 47(9), 2199–2215.
- 5. Weng, P. S., Tang, H. Y., Ku, P. C., & Lu, L. H. (2013, April). 50 mV-input batteryless boost converter for thermal energy harvesting. *IEEE Journal of Solid-State Circuits*, 48(4), 1031–1041.
- 6. Bito, J., Bahr, R., Hester, J. G., Nauroze, S. A., Georgiadis, A., & Tentzeris, M. M. (2017, May). A novel solar and electromagnetic energy harvesting system with a 3-D printed package for energy efficient Internet-of-Things wireless sensors. *IEEE Transactions on Microwave Theory and Techniques*, 65(5), 1831–1842.
- 7. Lei, K.-M., Mak, P.-I., Law, M.-K., & Martins, R. P. (2018, September). A regulation-free sub-0.5-V 16-/24-MHz crystal oscillator with 14.2-nJ startup energy and 31.8-µW steady-state power. *IEEE Journal of Solid-State Circuits*, 53(9), 2624–2635.
- 8. Lei, K.-M., Mak, P.-I., & Martins, R. (2021, September). A 0.35-V 5,200-μm<sup>2</sup> 2.1-MHz temperature-resilient relaxation oscillator with 667 fJ/cycle energy efficiency using an asymmetric swing-boosted RC network and a dual-path comparator. *IEEE Journal of Solid-State Circuits*, 56(9), 2701–2710.
- 9. Tsai, M.-D., Yeh, C.-W., Cho, Y.-H., Ke, L.-W., Chen, P.-W., & Dehng, G.-K. (2008, June). A temperature-compensated low-noise digitally-controlled crystal oscillator for multi-standard applications. In *Proceedings of the IEEE Radio Frequency Integrated Circuits Symposium (RFIC)* (pp. 533–536).
- 10. Chang, Y., Leete, J., Zhou, Z., Vadipour, M., Chang, Y.-T., & Darabi, H. (2012, February). A differential digitally controlled crystal oscillator with a 14-bit tuning resolution and sine wave outputs for cellular applications. *IEEE Journal of Solid-State Circuits*, 47(2), 421–434.
- 11. Iguchi, S., Sakurai, T., & Takamiya, M. (2017, November). A low-power CMOS crystal oscillator using a stacked-amplifier architecture. *IEEE Journal of Solid-State Circuits*, 52(11), 3006–3017.
- 12. Lei, K.-M., Mak, P.-I., & Martins, R. P. (2021, January). Startup time and energy reduction techniques for crystal oscillators in the IoT era. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 68(1), 30–35.

- 13. Griffith, D., Murdock, J., & Røine, P. T. (2016, February). A 24MHz crystal oscillator with robust fast start-up using dithered injection. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers* (pp. 104–105).
- 14. Nordic Semiconductor. (2018). nRF52840 Data Sheet [Online]. Available: http://infocenter.nordicsemi.com/pdf/nRF52840\_PS\_v1.0.pdf
- 15. Liu, Y.-H., Bachmann, C., Wang, X., Zhang, Y., Ba, A., Busze, B., et al. (2015, February). A 3.7 mW-RX 4.4 mW-TX fully integrated Bluetooth Low-Energy/IEEE802.15.4/proprietary SoC with an ADPLL-based fast frequency offset compensation in 40nm CMOS. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers* (pp. 236–237).
- 16. Kuo, F. W., Ferreira, S. B., Chen, H. N. R., Cho, L. C., Jou, C. P., Hsueh, F. L., et al. (2017, April). A bluetooth low-energy transceiver with 3.7-mW all-digital transmitter, 2.75-mW high-IF discrete-time receiver, and TX/RX switchable on-chip matching network. *IEEE Journal of Solid-State Circuits*, 52(4), 1144–1162.
- 17. Liu, H., Sun, Z., Tang, D., Huang, H., Kaneko, T., Deng, W., et al. (2018, February). An ADPLL-centric bluetooth low-energy transceiver with 2.3mW interference-tolerant hybrid-loop receiver and 2.9mW single-point polar transmitter in 65nm CMOS. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers* (pp. 444–445).
- 18. Blanchard, S. A. (2003, June/July). Quick start crystal oscillator circuit. In *Proceedings of the IEEE University/Government/Industry Microelectronics Symposium* (pp. 78–81).
- 19. Iguchi, S., Fuketa, H., Sakurai, T., & Takamiya, M. (2016, February). Variation-tolerant quick-start-up CMOS crystal oscillator with chirp injection and negative resistance booster. *IEEE Journal of Solid-State Circuits*, 51(2), 496–508.
- 20. Ding, M., Liu, Y.-H., Zhang, Y., Lu, C., Zhang, P., Busze, B., et al. (2017, February). A 95µW 24MHz digitally controlled crystal oscillator for IoT applications with 36nJ start-up energy and >13× start-up time reduction using a fully-autonomous dynamically-adjusted load. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers* (pp. 90–91).
- 21. Esmaeelzadeh, H., & Pamarti, S. (2018, March). A quick startup technique for high-Q oscillators using precisely timed energy injection. *IEEE Journal of Solid-State Circuits*, 53(3), 692–702.
- 22. Kwon, Y.-I., Park, S.-G., Park, T.-J., Cho, K.-S., & Lee, H.-Y. (2012, February). An ultra low-power CMOS transceiver using various low-power techniques for LR-WPAN applications. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 59(2), 324–336.
- 23. Texas Instruments. (2013). CC2541 Data Sheet [Online]. Available: http://www.ti.com/lit/ds/symlink/cc2541.pdf
- 24. Zhang, F., Miyahara, Y., & Otis, B. P. (2013, December). Design of a 300-mV 2.4-GHz receiver using transformer-coupled techniques. *IEEE Journal of Solid-State Circuits*, 48(12), 3190–3205.
- 25. Babaie, M., Kuo, F. W., Chen, H. N. R., Cho, L. C., Jou, C. P., Hsueh, F. L., et al. (2016, July). A fully integrated Bluetooth Low-Energy transmitter in 28 nm CMOS with 36% system efficiency at 3 dBm. *IEEE Journal of Solid-State Circuits*, 51(7), 1547–1565.
- 26. Yu, W.-H., Yi, H., Mak, P.-I., Yin, J., & Martins, R. P. (2017, February). A 0.18 V 382µW bluetooth low-energy (BLE) receiver with 1.33 nW sleep power for energy-harvesting applications in 28nm CMOS. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers* (pp. 414–415).
- 27. Yin, J., Yang, S., Yi, H., Yu, W.-H., Mak, P.-I., & Martins, R. P. (2018, February). A 0.2V energy-harvesting BLE transmitter with a micropower manager achieving 25% system efficiency at 0dBm output and 5.2nW sleep power in 28nm CMOS. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers* (pp. 450–451).
- 28. Lei, K.-M., Mak, P.-I., Law, M.-K., & Martins, R. (2018, February). A regulation-free sub-0.5V 16/24MHz crystal oscillator for energy-harvesting BLE radios with 14.2nJ startup energy and 31.8µW steady-state power. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers* (pp. 52–53).

- 29. Klauder, J. R., Price, A. C., Darlington, S., & Albersheim, W. J. (1960, July). The theory and design of chirp radars. *The Bell System Technical Journal*, 39(4), 745–808.
- 30. Vittoz, E. A., Degrauwe, M. G., & Bitz, S. (1988, March). High-performance crystal oscillator circuits: Theory and application. *IEEE Journal of Solid-State Circuits*, 23(3), 774–783.
- 31. Lei, K.-M., Mak, P.-I., & Martins, R. P. (2017, May). A 0.4 V 4.8 μW 16MHz CMOS crystal oscillator achieving 74-fold startup-time reduction using momentary detuning. In *IEEE International Symposium on Circuits and Systems (ISCAS)* (pp. 2791–2794).
- 32. Iguchi, S., Saito, A., Zheng, Y., Watanabe, K., Sakurai, T., & Takamiya, M. (2013, June). 93% power reduction by automatic self power gating (ASPG) and multistage inverter for negative resistance (MINR) in 0.7 V, 9.2 μW, 39MHz crystal oscillator. *IEEE Proceedings of the Symposium on VLSI Circuits*, C142–C143.
- 33. Bluetooth Core Specification v5.0 [Online]. Available: https://www.bluetooth.com/specifications/bluetooth-core-specification
- 34. Khan, O., et al. (2016, May). Frequency reference for crystal free radio. In *IEEE International Frequency Control Symposium* (pp. 1–2).
- 35. Pletcher, N. M., Gambini, S., & Rabaey, J. (2009, January). A 52 μW wakeup receiver with −72 dBm sensitivity using an uncertain-IF architecture. *IEEE Journal of Solid-State Circuits*, 44(1), 269–280.
- 36. Sundaresan, K., Allen, P., & Ayazi, F. (2006, February). Process and temperature compensation in a 7-MHz CMOS clock oscillator. *IEEE Journal of Solid-State Circuits*, 41(2), 433–442.
- 37. Zhang, L., Kuo, N.-C., & Niknejad, A. (2019, October). A 37.5–45 GHz superharmonic-coupled QVCO with tunable phase accuracy in 28 nm CMOS. *IEEE Journal of Solid-State Circuits*, 54(10), 2754–2764.
- 38. Ding, X., Wu, J., & Chen, C. (2019, February). A low-power 0.6-V quadrature VCO with a coupling current reuse technique. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 66(2), 202–206.
- 39. Meng, X., Li, X., Cheng, L., Tsui, C.-Y., & Ki, W.-H. (2019, December). A low power relaxation oscillator with switched-capacitor frequency-locked loop for wireless sensor node applications. *IEEE Solid-State Circuits Letters*, 2(12), 281–284.
- 40. Mikulić, J., Schatzberger, G., & Barić, A. (2017, September). A 1-MHz on-chip relaxation oscillator with comparator delay cancelation. In *Proceedings of the European Conference on Solid-State Circuits (ESSCIRC)* (pp. 95–98).
- 41. Savanth, A., Weddell, A., Myers, J., Flynn, D., & Al-Hashimi, B. (2019, November). A sub-nW/kHz relaxation oscillator with ratioed reference and sub-clock power gated comparator. *IEEE Journal of Solid-State Circuits*, 54(11), 3097–3106.
- 42. Tokairin, T., et al. (2012, June). A 280nW, 100kHz, 1-cycle start-up time, on-chip CMOS relaxation oscillator employing a feedforward period control scheme. *IEEE Proceedings of the Symposium on VLSI Circuits*, 16–17.
- 43. Koo, J., Moon, K.-S., Kim, B., Park, H.-J., & Sim, J.-Y. (2017, February). A quadrature relaxation oscillator with a process-induced frequency-error compensation loop. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers* (pp. 94–95).
- 44. Liu, N., et al. (2019, July). A 2.5 ppm/°C 1.05-MHz relaxation oscillator with dynamic frequency-error compensation and fast start-up time. *IEEE Journal of Solid-State Circuits*, 54(7), 1952–1959.
- 45. Lee, J., George, A. K., & Je, M. (2020, September). An ultra-low-noise swing-boosted differential relaxation oscillator in 0.18-μm CMOS. *IEEE Journal of Solid-State Circuits*, 55(9), 2489–2497.
- 46. Lu, S.-Y., & Liao, Y.-T. (2019, February). A low-power, differential relaxation oscillator with the self-threshold-tracking and swing-boosting techniques in 0.18-μm CMOS. *IEEE Journal of Solid-State Circuits*, 54(2), 392–402.

- 47. Zhou, W., Goh, W. L., & Gao, Y. (2020, October). A 3-MHz 17.3-μW 0.015% period jitter relaxation oscillator with energy efficient swing boosting. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 67(10), 1745–1749.
- 48. Abidi, A. A., & Meyer, R. G. (1983, December). Noise in relaxation oscillators. *IEEE Journal of Solid-State Circuits*, 18(6), 794–802.
- 49. Razavi, B. (2001). *Design of analog CMOS integrated circuits*. McGraw-Hill.
- 50. Ho, Y., Yang, Y.-S., Chang, C., & Su, C. (2013, November). A near-threshold 480 MHz 78 μW all-digital PLL with a bootstrapped DCO. *IEEE Journal of Solid-State Circuits*, 48(11), 2805–2814.
- 51. Lee, H., Li, Z., Durrant, J. R., & Tsoi, W. C. (2016, June). Is organic photovoltaics promising for indoor applications? *Applied Physics Letters*, 108(25), 1–5.
- 52. Liao, W., et al. (2016, November). Lead-free inverted planar formamidinium tin triiodide perovskite solar cells achieving power conversion efficiencies up to 6.22%. *Advanced Materials*, 28(42), 9333–9340.

## Part II Data Converters

## Low-Power Nyquist ADCs



Minglei Zhang, Chi-Hang Chan, Yan Zhu, and Rui P. Martins

## 1 Introduction

ADC-based implementations are becoming more favorable in communication and sensing systems because they give system architects greater flexibility. Both wireless and wireline receivers impose stringent speed and bandwidth requirements on the ADC units. Device sizing is the most straightforward way to improve bandwidth, but the self-loading effect of larger devices noticeably degrades the conversion efficiency at higher conversion speeds, implying that the design becomes speed-limited rather than noise-limited. Relieving such deterioration with parallelism, such as time interleaving [1], seems attractive; however, it faces the challenges of increased area, inter-channel crosstalk, high calibration complexity, etc. Increasing the sampling frequency of a single-channel ADC relaxes these challenges of time-interleaved ADCs by reducing the number of aggregated channels and lowering the overall input capacitance, which in turn pushes the high-speed ADC performance boundary further. The multichannel configuration also makes the pursuit of energy efficiency unavoidable, which is likewise a basic expectation of low-power IoT systems.

M. Zhang · C.-H. Chan (🖂) · Y. Zhu

State-Key Laboratory of Analog and Mixed-Signal VLSI/IME and FST-ECE, University of Macau, Macao SAR, China e-mail: mlzhang@um.edu.mo; ivorchan@um.edu.mo; yanzhu@um.edu.mo

R. P. Martins

State-Key Laboratory of Analog and Mixed-Signal VLSI/IME and FST-ECE, University of Macau, Macao SAR, China

Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal e-mail: rmartins@um.edu.mo

<sup>©</sup> The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Paulo da Silva Martins, P.-I. Mak (eds.), *Analog and Mixed-Signal Circuits in Nanoscale CMOS*, Analog Circuits and Signal Processing, https://doi.org/10.1007/978-3-031-22231-3\_4

Pipelined ADCs [2, 3] are well established among high-performance ADCs; conventionally, however, the residue amplifiers often become the power and linearity bottleneck, leading to poor energy efficiency and limited performance. This creates a general need for highly efficient and highly linear residue amplification techniques in pipelined architectures. Moreover, benefiting from the outstanding energy efficiency of the successive approximation register (SAR) quantizer, the hybridization of pipelined and SAR ADCs has become popular [4–6], as it breaks the speed bottleneck of SAR ADCs through pipelined operation while keeping their good energy efficiency.

Conventional voltage-domain ADCs can provide accurate quantization steps with the assistance of capacitor digital-to-analog converters (DACs) or resistor DACs; however, they suffer from noise and voltage-headroom challenges as supply voltages shrink in advanced fabrication processes. The quantization step of a voltage-domain ADC scales linearly with the supply voltage, whereas this relationship is reversed in a time-domain ADC. This feature makes time-domain quantization suitable for low-supply scenarios, breaking the noise limitation of voltage-domain architectures under a shrinking power supply. The time quantization step of time-domain ADCs also scales with the fabrication technology node, which makes time-domain quantization attractive for ultra-high-speed ADCs.

This chapter elaborates on two high-performance pipelined ADCs. The first is a 12-bit SAR-assisted three-stage pipelined ADC with an open-loop Gm-R-based residue amplifier running at 1 GS/s [7]. It achieves an SNDR of 60.0 dB and an SFDR of 74.6 dB with a Nyquist input, leading to a Walden FoM of 9.3 fJ/conversion step and a Schreier FoM of 168.2 dB while consuming 7.6 mW. The second is a 3.3 GS/s six-bit fully dynamic pipelined ADC using a linearized dynamic amplifier [8], attaining 34 dB SNDR with a Nyquist input, consuming 5.5 mW, and yielding a Walden FoM of 40.0 fJ/conversion step. This chapter also describes two time-domain ADCs, in contrast to conventional voltage-domain ADCs. The first is a 13-bit hybrid ADC that combines a SAR ADC with a subsequent time-to-digital converter (TDC) [9]; it reaches a 20 MS/s conversion speed under PVT (process-voltage-temperature) variations with a 0.6 V power supply and exhibits a Schreier FoM as high as 181.9 dB. The second is an eight-bit 10 GS/s time-domain ADC that aggregates only four time-interleaved channels [10]. It delivers an effective resolution bandwidth (ERBW) larger than 18 GHz, benefiting from the time-domain integrated front-end.
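The figures of merit quoted for the 12-bit 1 GS/s ADC can be reproduced with the commonly used Walden and Schreier definitions. This chapter does not restate those definitions, so the formulas in the sketch below (ENOB = (SNDR - 1.76)/6.02, Walden FoM = P/(2^ENOB · f_s), and Schreier FoM = SNDR + 10·log10((f_s/2)/P)) are assumptions, as are the function names.

```python
import math

def enob(sndr_db):
    """Effective number of bits from SNDR (standard sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02

def walden_fom_fj(power_w, fs_hz, sndr_db):
    """Walden FoM in fJ per conversion step (assumed definition)."""
    return power_w / (2.0 ** enob(sndr_db) * fs_hz) * 1e15

def schreier_fom_db(power_w, fs_hz, sndr_db):
    """Schreier FoM in dB, using the Nyquist bandwidth fs/2 (assumed definition)."""
    return sndr_db + 10.0 * math.log10((fs_hz / 2.0) / power_w)

# Reported numbers for the 12-bit, 1 GS/s pipelined-SAR ADC.
walden = walden_fom_fj(power_w=7.6e-3, fs_hz=1e9, sndr_db=60.0)
schreier = schreier_fom_db(power_w=7.6e-3, fs_hz=1e9, sndr_db=60.0)
print(f"Walden FoM   = {walden:.1f} fJ/conversion step")
print(f"Schreier FoM = {schreier:.1f} dB")
```

With these definitions, the computed values match the quoted 9.3 fJ/conversion step and 168.2 dB to within rounding.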

This chapter is organized as follows. Section 2 introduces the SAR-assisted three-stage pipelined ADC and the proposed Gm-R-based residue amplifier. Section 3 presents the implementation of a two-step TDC-assisted SAR ADC and details its PVT tracking strategy. Section 4 shows a post-amplification residue generation scheme and a linearized dynamic amplifier in a fully on-chip calibrated pipelined ADC. Section 5 elaborates on the application of time-domain conversion in time-interleaved ADCs, as well as the timing-extended residue generation for a lower metastability error rate.

## 2 12b 1 GS/s Three-Stage Pipeline-SAR ADC

Conventional SAR ADCs [11] are not suitable for high sampling rates combined with high resolution, because their inherently serial conversion process limits the speed. A two-step SAR-assisted pipelined ADC [4–6] breaks this limitation through pipelined operation while achieving good energy efficiency. As the sampling rate increases, the two-step high-resolution ADC also encounters a speed bottleneck due to the large number of successive bit decisions required in each stage. A three-stage SAR-assisted pipelined ADC [12] further raises the conversion rate by distributing the bit decisions over more substages, demonstrating a single-channel 12-bit ADC with a conversion rate above 500 MS/s while maintaining the attractive energy efficiency of the two-step architecture.

Figure 1 illustrates the typical time allocation of the three-stage SAR-assisted pipeline ADC. The first and second stages need to accomplish the sampling  $(t_{samp})$ , SAR conversion  $(t_{conv1/2})$ , and residue amplification operation  $(t_{amp1/2})$  within one period, while the third stage only needs to complete the sampling  $(t_{amp2})$  and SAR conversion  $(t_{conv3})$ . Unlike the two-stage SAR-assisted pipelined architecture where the first stage is often the speed bottleneck of the ADC, both the first and second stages can be critical in the three-stage architecture. Therefore, the slowest among those three stages limits the maximum achievable speed as

$$t_{3\text{st,pipeSAR}} = \operatorname{Max} \left\{ \begin{array}{l} t_{\text{samp}} + t_{\text{conv1}} + t_{\text{amp1}} \\ t_{\text{amp1}} + t_{\text{conv2}} + t_{\text{amp2}} \\ t_{\text{amp2}} + t_{\text{conv3}} \end{array} \right\}. \tag{1}$$
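The timing constraint of Eq. (1) can be sketched in a few lines. The first-stage values (250 ps sampling, 500 ps conversion, about 200 ps amplification) are the budget quoted later in this section for the 1 GS/s target; the second- and third-stage values are hypothetical placeholders chosen only for illustration.

```python
def min_period_ps(t_samp, t_conv, t_amp):
    """Eq. (1): the slowest of the three stages sets the minimum clock period.

    t_conv = (t_conv1, t_conv2, t_conv3), t_amp = (t_amp1, t_amp2), all in ps.
    """
    stage_times = (
        t_samp + t_conv[0] + t_amp[0],     # stage 1: sample, convert, amplify
        t_amp[0] + t_conv[1] + t_amp[1],   # stage 2: samples while RA1 amplifies
        t_amp[1] + t_conv[2],              # stage 3: samples while RA2 amplifies
    )
    return max(stage_times), stage_times

# t_conv2, t_conv3 and t_amp2 below are hypothetical values, not from the chapter.
period, stages = min_period_ps(t_samp=250,
                               t_conv=(500, 450, 600),
                               t_amp=(200, 250))
bottleneck = stages.index(period) + 1
print(f"stage times (ps): {stages}, period = {period} ps, "
      f"bottleneck = stage {bottleneck}")
```

With these numbers the first stage needs 950 ps and sets the period, which fits within the 1 ns available at 1 GS/s.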

For a high-resolution and high-speed target, the noise and sampling accuracy requirements fundamentally limit the sampling time of the first stage. In order to further push the pipeline speed, one degree of freedom allows the allocation of fewer



Fig. 1 Block diagram and operation timing of three-stage SAR-assisted pipelined ADC

resolving bits in the first and second stages. Nevertheless, this deteriorates the linearity of the first-stage amplifier and simultaneously asks for additional quantization in the last stage, thus turning the third stage into the speed bottleneck of the ADC. Another important option is shrinking the amplification time, which can raise the speed of both the first and the second stages. Besides the amplification time, the CDAC size has a direct influence on the SAR conversion loop speed, so minimizing the CDAC in each stage becomes a crucial target in a high-speed three-stage pipelined ADC.

## 2.1 Residue Amplifier (RA) Discussion

The residue amplifier, which transfers the residue to the successive pipeline stage, is the critical block of the SAR-assisted pipelined ADC. A high-speed and high-resolution pipelined ADC calls for an RA with a high gain within a short amplification period. RAs are generally classified into closed-loop and open-loop topologies according to their working mechanism. A conventional closed-loop RA [5] ensures its gain accuracy through a high DC open-loop gain and a passive feedback configuration with complete settling, but at the cost of a long settling time and high power consumption. It also suffers from low intrinsic gain under technology scaling. Recently, the ring amplifier [16], which cascades three inverter stages, has provided a power-efficient and fast-settling closed-loop RA solution with slew-based charging (Fig. 2a). Many improvements based on the ring amplifier have achieved good resilience to PVT variation [17] and fast settling with high stability [3], although its dead-zone consideration induces design challenges under different scenarios, especially for conventional analog designers. Recently reported open-loop architectures overcome the weaknesses of the closed-loop architecture, exhibiting high power efficiency in high-speed scenarios. In this section, we first review and discuss state-of-the-art designs of two open-loop RA topologies: incomplete-settled and complete-settled amplification. Then, we introduce the adopted RA.

#### **Incomplete-Settled RA**

Figure 2b, c show the dynamic amplifier [18] and the integrator-type RA [12], respectively, both working in an incomplete-settled scheme. They offer excellent power efficiency and a low-noise integrating behavior when the integration time is long. However, with the short amplification time of high-speed designs, their noise benefit weakens due to the increased jitter sensitivity [19].

With the amplified output not completely settled, the gain accuracy of the incomplete-settled RA is also sensitive to the clock jitter. The gain of the incomplete-settled RA is proportional to the integrating time as indicated in Eq. (2) [14]:



Fig. 2 Residue amplifiers with their conceptual transient responses. (a) Ring amplifier with closed loop. (b) Dynamic amplifier with open loop. (c) Integrator-type amplifier with open loop. (d) Conventional complete-settled amplifier with open loop

$$A_{\rm V} = \begin{cases} G_{\rm m} \cdot R_{\rm o} \cdot \left(1 - e^{-t_{\rm amp}/\tau}\right), & \\ G_{\rm m} \cdot t_{\rm amp}/C_{\rm S}, & t_{\rm amp} \ll \tau \end{cases} \tag{2}$$

where  $G_{\rm m}$  and  $R_{\rm o}$  are the transconductance and output impedance of the RA, respectively, and  $\tau$  is the time constant, which is the product of  $R_{\rm o}$  and the sampling capacitance ( $C_{\rm S}$ ) of the backend stage. For a given  $A_{\rm V}$ , a shorter  $t_{\rm amp}$  requires a steeper slope of integration, which tends to suffer from a larger jitter-induced error, thus degrading the signal-to-noise ratio (SNR). Consequently, the jitter-induced noise becomes the primary restriction when incomplete-settled RAs target higher SNR and speed specifications [14].
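The jitter sensitivity described above follows directly from Eq. (2): in the linear regime the gain is proportional to  $t_{\rm amp}$ , so a timing error  $\Delta t$  produces a relative gain error of roughly  $\Delta t / t_{\rm amp}$ . The sketch below illustrates this with hypothetical component values (not taken from the chapter) chosen so that  $t_{\rm amp} \ll \tau$  and the nominal gain is close to 8.

```python
import math

def gain_exact(gm, ro, cs, t_amp):
    """First case of Eq. (2): exponential settling toward Gm * Ro."""
    tau = ro * cs
    return gm * ro * (1.0 - math.exp(-t_amp / tau))

def gain_linear(gm, cs, t_amp):
    """Second case of Eq. (2): pure integration, valid for t_amp << tau."""
    return gm * t_amp / cs

def relative_gain_error(t_amp, jitter):
    """In the linear regime A_V is proportional to t_amp, so dA/A = dt/t_amp."""
    return jitter / t_amp

# Hypothetical values: tau = Ro * Cs = 2 ns, so t_amp = 160 ps << tau.
gm, ro, cs = 2e-3, 50e3, 40e-15
t_amp = 160e-12
jitter = 1e-12   # assumed timing error of 1 ps

a_lin = gain_linear(gm, cs, t_amp)
a_exact = gain_exact(gm, ro, cs, t_amp)
err_full = relative_gain_error(t_amp, jitter)
err_half = relative_gain_error(t_amp / 2, jitter)   # same gain needs 2x the slope
print(f"linear gain = {a_lin:.2f}, exact gain = {a_exact:.2f}")
print(f"gain error: {err_full:.3%} at t_amp, {err_half:.3%} at t_amp/2")
```

Halving the amplification time while keeping the same gain doubles the relative gain error caused by the same timing error, which is the trade-off the paragraph above points to.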

Figure 2c illustrates the amplification timing of the dynamic incomplete-settled RAs. Since it is necessary to integrate with the same initial output voltage (e.g., 0 V), it requires a reset phase before charging the load capacitance. The phase of the "amp." comprises the startup time ( $t_{\text{settling,RA}}$ ) of the RAs and the charging time ( $t_{\text{settling,Cs}}$ ) of the load capacitance. Therefore, the total amplification time ( $t_{\text{amp,tot}}$ ) of the dynamic incomplete-settled RAs becomes

$$t_{\text{amp,tot}} = t_{\text{settling,RA}} + t_{\text{settling,Cs}} + t_{\text{rst}}.$$
(3)

 $t_{\text{settling,RA}}$  relates to the load capacitance and the charging current of the RAs. For low noise and high-speed targets, the limitation of  $t_{\text{settling,Cs}}$  is the jitter-induced noise requirement as interpreted above.  $t_{\text{rst}}$  depends on the overdrive of the reset switch and the load. The inevitable reset time of the incomplete-settled RA increases the critical timing path of the ADC, thus inducing a speed constraint in the pipelined ADC.

#### **Complete-Settled RA**

Figure 2d shows a conventional open-loop RA with complete-settled amplification [20], which is immune to the abovementioned clock jitter issue. Unlike the incomplete-settled RA, its gain is independent of time once the amplification has settled, and we can express it as

$$A_{\rm V} = G_{\rm m} \cdot R_{\rm o,eq},\tag{4}$$

where  $G_{\rm m}$  and  $R_{\rm o,eq}$  are the transconductance and equivalent output impedance of the amplifier, respectively. Furthermore, due to its complete-settled characteristic, its voltage gain only suffers from transconductance  $G_{\rm m}$  variations over voltage and temperature, allowing the adoption of a simple  $G_{\rm m}$  bias circuit to perform a pure voltage-domain compensation. However, such a solution consumes static power and has poor V-I conversion linearity, which requires either higher-than-first-order background gain calibration [14], with large power and area overhead, or an analog linearization technique [21], whose capability is limited for a high-resolution ADC. We therefore introduce a high-linearity, purely dynamic-power Gm-R-based RA in this design.

#### **Proposed Gm-R-based RA**

Figure 3a illustrates the circuit schematic of the proposed residue amplifier, which is an open-loop architecture consisting of a differential Gm-cell with a clock-controlled, resistive load ( $R_0$ ). Unlike a conventional static open-loop amplifier [20, 21], this RA operates dynamically through switches  $S_{r1}$  and  $S_{r2}$ . The Gm-cell, based on a differential flipped voltage follower (DFVF) [22], with the shunt-shunt feedback branches ( $M_2$ ,  $M_5$ ,  $M_7/M_4$ ,  $M_6$ ,  $M_8$ ) keeps the gate-source voltages of  $M_2$  and  $M_4$  constant when the inputs vary. As a result, the DFVF guarantees high linearity under a large input swing with good power efficiency through its class-AB behavior. In contrast with the integrator-type RA [12, 13], which experiences a large time constant and an incomplete-settled mechanism, this RA



Fig. 3 (a) The Gm-R-based residue amplifier. (b) Timing diagram



settles completely with a small time constant through  $R_0$  (200  $\Omega$ ). We can describe the gain of the amplifier at the steady state by [22]

$$A_{\rm V} = 2\sqrt{\mu_{\rm n} \cdot C_{\rm ox} \cdot (W/L)_{M_{1\&3}}} \cdot \mu_{\rm p} \cdot C_{\rm ox} \cdot (W/L)_{M_{7\&8}} \cdot R_{\rm out} \cdot V_{\rm OV, M_{7\&8}}, \qquad (5)$$

where  $\mu_n$  and  $\mu_p$  are the mobilities of the NMOS and the PMOS devices, respectively, W/L is the transistor size,  $V_{OV}$  is the overdrive voltage, and  $R_{out}$  is the equivalent output impedance. Thanks to the complete-settled characteristic, we can easily compensate the gain variation of the presented RA over temperature in the voltage domain. Figure 3b shows the signal behavior of the proposed RA with an input residue voltage and a load of 40 fF. During the amplification phase (both  $\Phi_{S2}$  and  $\Phi_{RA}$  are high), we amplify the residue voltage to the next stage in a pipelined manner. With an ADC full swing ( $V_{FS}$ ) of 1.2 V<sub>pp</sub>, the residue voltage is within 75 mV<sub>pp</sub>. Figure 4 plots the open-loop DFVF RA with an 8× gain that ensures a near 9b linearity under the worst-case working scenario. Besides, the proposed complete-settled RA eliminates the reset time, and its gain is independent of time once the amplification has settled. Then, we can express its amplification time as

$$t_{\text{amp,tot,gm}R} = t_{\text{settling,RA}} + t_{\text{settling},C_{\text{S}}}, \tag{6a}$$

$$t_{\text{settling},C_{\text{S}}} = \ln 2 \cdot (N_{\text{bit}} + 1) \cdot \tau_{\text{RA}},\tag{6b}$$

where  $N_{\text{bit}}$  is the target bit resolution and  $\tau_{\text{RA}}$  is the time constant of the amplifier which is equal to  $R_{\text{out}} \times C_{\text{load}}$ . When compared with the incomplete-settled RA, we eliminate the reset time  $t_{\text{rst}}$  in Eq. (3) to speed up the amplification. In the first stage, the allocation of 250 ps for the sampling time  $t_{\text{samp}}$  ensures the sampling accuracy under the Nyquist input. The first 4b decisions finish within 500 ps with a small DAC-assisted SAR conversion. Then, we have only ~200 ps left for the RA1 to amplify the residue signal under a 1 GS/s goal. Therefore, a small load capacitance of RA1, which is CDAC2, is a key for a short  $t_{amp1}$ . Likewise, CDAC2 also affects the conversion time of the second-stage SAR  $t_{conv2}$  and in consequence potentially leads the second stage to be the speed bottleneck of the overall ADC.
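Equation (6b) can be evaluated directly for a rough feel of the settling budget. The sketch below assumes that  $R_{\text{out}}$  is approximately the 200 Ω load resistor, that the load is the 40 fF quoted for Fig. 3b, and that  $N_{\text{bit}} = 9$  (matching the near-9b linearity mentioned above); these pairings are assumptions made for illustration, not values the chapter assigns to Eq. (6b).

```python
import math

def settling_time(n_bit, r_out, c_load):
    """Eq. (6b): time for a single-pole response to settle to (N_bit + 1)-bit accuracy."""
    tau_ra = r_out * c_load            # amplifier time constant
    return math.log(2.0) * (n_bit + 1) * tau_ra

# Assumed values: R_out ~ R_0 = 200 ohm, C_load = 40 fF, N_bit = 9.
t_settle = settling_time(n_bit=9, r_out=200.0, c_load=40e-15)
print(f"tau = {200.0 * 40e-15 * 1e12:.1f} ps, "
      f"t_settling,Cs = {t_settle * 1e12:.1f} ps")
```

Under these assumptions the load settles in roughly 55 ps, which would leave most of the ~200 ps amplification window for the startup term  $t_{\text{settling,RA}}$  in Eq. (6a).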

The presented dynamic Gm-R-based amplifier provides high-speed and energy-efficient residue amplification with a dynamic feature. First, when compared with the incomplete-settled RA, the proposed RA not only removes the reset time but is also free from the jitter-induced error, both of which limit the amplification speed of the incomplete-settled RA at high resolution. Second, the two-stage RA2 structure reduces the load capacitance (CDAC2) of RA1, which further shortens the amplification time. Consequently, the presented Gm-R-based amplifier achieves an  $8\times$  gain with ~200 ps amplification time, which is one of the fastest to date. Furthermore, the proposed Gm-R-based RA suffers from fewer temperature-related variables thanks to its complete-settled characteristic, so we can adopt a low-power voltage-domain temperature compensation instead of complex time-domain solutions [9, 15].

#### **Two-Stage RA Consideration**

With RA2 enabled for amplification, its input capacitance  $C_{in2}$  consists of two parts. One derives from the constant gate capacitance  $C_{gg2}$ , and the other is the varying gate capacitance  $\Delta C_{gg}$  originating from the dynamic operation of RA2. If we reuse the RA1 design for RA2, a large parasitic capacitance  $C_{gg2}$  originating from the  $C_{gs}$  and  $C_{gd}$  of the input pair causes a significant signal attenuation on  $V_{res2}$ , equivalently leading to an effective gain loss on RA2 and also bringing serious nonlinearity issues. One solution is to enlarge CDAC2 so that it tolerates the parasitic-induced error of RA2, but a large CDAC2 tends to require a longer amplification time of RA1, slowing down the first stage. Another solution is to scale down the size of RA2, reducing the parasitic-induced gain loss from RA2. However, this scales down the gain of RA2 at the same time, equivalently leading to an effective gain reduction. The consequent longer amplification time of RA2 and conversion time of the third stage slow down the loop speed of the second or third stage, and either scenario will prevent the design from reaching the GS/s target.

In order to keep a large gain of RA2 with a small input parasitic and avoid a large CDAC in the last two pipelined SAR stages, we split RA2 into two stages, where the first (second) stage has small (large) values of parasitic and gain. In this design, as shown in Fig. 5, RA2 obtains its gain of $8\times$ through the cascade of two blocks, one with a gain of $2\times$ and the other with a gain of $4\times$. Due to the complete-settled feature, both stages in RA2 can amplify the signal simultaneously. Such a configuration is not feasible in integrator-type amplifiers, since the integrating process is incomplete-settled in the intermediate time and the large, variable input parasitic capacitance of the second stage will affect the first-stage amplifier's linearity. Those two constraints make the amplifier more sensitive to PVT variations and more



Fig. 5 The second stage and split RA2 (single side for simplification)

nonlinear, so it cannot meet RA2's design requirements and inevitably demands a two-step operation at a lower speed.

The proposed complete-settled amplifier together with a two-stage setup ensures high-speed operation, not only in the three-stage conversion but also in RA2's amplification. In this design, we place the pole of the first stage at more than twice the frequency of the second stage's pole. To maintain the same bandwidth as RA1, we need to budget an additional 25% of power in RA2. The additional noise contribution from the two-stage configuration is not critical for RA2, as the gain of RA1 relaxes it. Another consideration is the linearity, since the first stage of RA2 amplifies the input of its second stage by 2×. In this design, we configured both stages of RA2 with the same FVF input pair as RA1 to meet the 7b linearity requirement.

### 2.2 ADC Implementation

Figure 6 presents the overall architecture and timing diagram of the prototype ADC, composed of three sub-SAR ADCs, two RAs, a clock generator, digital logic, and a calibration block. To guarantee a >70 dB sampling linearity, we assign 250 ps for the sampling ($\Phi_S = 1$). The ADC design uses a fully asynchronous loop timing and 14 conversion cycles, with 1 redundant bit allocated between adjacent stages for error correction. Both the first and second stages resolve only four bits, ensuring adequate sampling and amplification time for high linearity. The first stage requires a large CDAC (L-DAC = 540 fF) to alleviate the thermal noise and the mismatch, which usually becomes the speed bottleneck of the SAR-assisted pipelined ADC. In this work, we utilize a small CDAC (S-DAC = 60 fF) for the high-speed first-stage



Fig. 6 Block diagram and timing diagram of the proposed three-stage ADC

SAR conversion, with its decisions transferred to the L-DAC in a bit-by-bit manner to generate the residue voltage ($V_{res1}$) on the L-DAC. Both the L-DAC and S-DAC sample the input signal (1.2 $V_{pp-diff}$) together during $\Phi_S$; a symmetric clock tree and a careful layout mitigate the sampling mismatch, and the bit overlapping in the second stage corrects the remainder. RA1, which has a gain of eight, amplifies $V_{res1}$ to the second stage, which resolves another four bits; RA2, also with a gain of eight, delivers its residue voltage ($V_{res2}$) to the third stage. Due to the abovementioned residue voltage attenuation during the amplification, the full-scale ranges of the second and third stages are ~500 mV<sub>pp-diff</sub> and ~400 mV<sub>pp-diff</sub>, respectively. Finally, the last-stage SAR ADC determines the remaining six bits. The splitting monotonic switching scheme [23], used in each stage, maintains a constant common-mode voltage for both RAs and comparators. To reduce the inductive effect from the bonding wire, we embedded an input buffer [2] on chip.
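The bit and timing budget described above can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are hypothetical, and the "conversion window" assumes that nothing in the first stage overlaps the 250 ps sampling phase within one 1 GS/s period.

```python
def pipelined_sar_budget(stage_bits, redundant_bits_per_boundary=1):
    """Return (total SAR cycles, effective resolution) for a pipelined SAR ADC."""
    cycles = sum(stage_bits)
    redundancy = (len(stage_bits) - 1) * redundant_bits_per_boundary
    return cycles, cycles - redundancy


def conversion_window_ps(sample_rate_hz, sampling_time_ps):
    """Time left in one sampling period after the sampling phase, in ps."""
    return 1e12 / sample_rate_hz - sampling_time_ps


cycles, resolution = pipelined_sar_budget([4, 4, 6])
window_ps = conversion_window_ps(1e9, 250.0)
print(f"{cycles} cycles, {resolution}-bit output, {window_ps:.0f} ps after sampling")
```

With the 4b/4b/6b partition and one redundant bit at each of the two stage boundaries, the 14 cycles yield the 12-bit output, and the first stage is left with roughly 750 ps per period for its SAR conversion and residue amplification.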

### 2.3 Measurement Results

Figure 7 presents the die photograph of the prototype ADC fabricated in 28 nm CMOS, occupying a core area of $0.0091 \text{ mm}^2$. The ADC, powered by a 1 V supply, exhibits a 1.2 V<sub>pp-diff</sub> full-scale range. Because of a non-optimized layout, the L-DAC suffers from some mismatch, so we adopted a bit-weight calibration in the measurement. We corrected the first 4b codes with integer bit weights once and kept the bit-weight array fixed across different samples. Besides the one-time calibration of the



Fig. 8 Measured DNL and INL

comparators' offsets, we obtained the RAs' gains in the foreground. As illustrated in Fig. 8, the measured DNL and INL are +0.47/-0.39 LSB and +1.87/-2.21 LSB, respectively.

Figure 9 plots the measured output spectrum after 225-fold decimation at a low input frequency and near the Nyquist input frequency. For a low input frequency of 140.63 MHz, the measured SNDR and SFDR are 61.4 dB and 74.6 dB, respectively, and the input buffer limits the noise performance (SNR). For the near-Nyquist input frequency of 495.19 MHz, the measured SNDR remains at 60 dB and the SFDR at 74.6 dB. Figure 10 displays the measured dynamic performance at 1 GS/s versus the input frequency. The ADC maintains an ENOB above 9b even with the input frequency raised to 1.2 GHz; the estimated clock jitter (~200 fs) degrades the dynamic performance when the input frequency exceeds 1 GHz.
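The jitter remark can be sanity-checked with the standard jitter-limited SNR bound, $\mathrm{SNR} = -20\log_{10}(2\pi f_{in}\sigma_j)$, together with the usual conversion $\mathrm{ENOB} = (\mathrm{SNDR} - 1.76)/6.02$. The sketch below considers jitter only, so it gives an upper bound rather than a prediction of the measured SNDR; the function names are hypothetical.

```python
import math


def jitter_limited_snr_db(f_in_hz, jitter_rms_s):
    """Upper bound on SNR set by rms sampling-clock jitter for a sine input."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * jitter_rms_s)


def enob_bits(sndr_db):
    """Effective number of bits for a given SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


JITTER_RMS_S = 200e-15  # estimated clock jitter quoted in the text
for f_in_hz in (495.19e6, 1.0e9, 1.2e9):
    snr_db = jitter_limited_snr_db(f_in_hz, JITTER_RMS_S)
    print(f"f_in = {f_in_hz / 1e6:7.2f} MHz: SNR bound = {snr_db:5.1f} dB, "
          f"ENOB bound = {enob_bits(snr_db):4.2f} b")
```

With 200 fs of jitter, the bound falls to about 58 dB at 1 GHz and about 56 dB at 1.2 GHz, which is consistent with an ENOB that stays just above 9b at the highest input frequency.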

The ADC consumes 7.6 mW (excluding the input buffer) when running at 1 GS/s with a 1 V supply, resulting in a 9.28 fJ/conv.-step Walden FoM and a 168.2 dB Schreier FoM. Figure 11 presents the power breakdown, where the RAs consume almost half of the power due to their short amplification time and low noise requirements. Table 1 summarizes the ADC performance and compares this work against state-of-the-art single-channel designs with similar speed and SNDR.
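Both figures of merit follow directly from the power, sampling rate, and Nyquist SNDR. The sketch below uses the common definitions $\mathrm{FoM_W} = P/(2^{\mathrm{ENOB}} f_s)$ and $\mathrm{FoM_S} = \mathrm{SNDR} + 10\log_{10}\big((f_s/2)/P\big)$; the function names are hypothetical, and the inputs are the values reported in Table 1.

```python
import math


def walden_fom_fj(power_w, sample_rate_hz, sndr_db):
    """Walden FoM in fJ per conversion step."""
    enob = (sndr_db - 1.76) / 6.02
    return power_w / (2.0 ** enob * sample_rate_hz) * 1e15


def schreier_fom_db(power_w, sample_rate_hz, sndr_db):
    """Schreier FoM in dB, using the Nyquist bandwidth f_s / 2."""
    return sndr_db + 10.0 * math.log10((sample_rate_hz / 2.0) / power_w)


# This work: 7.6 mW, 1 GS/s, 60.02 dB SNDR at Nyquist (Table 1).
fom_w = walden_fom_fj(7.6e-3, 1e9, 60.02)
fom_s = schreier_fom_db(7.6e-3, 1e9, 60.02)
print(f"Walden FoM = {fom_w:.2f} fJ/conv.-step, Schreier FoM = {fom_s:.1f} dB")
```

These definitions reproduce the 9.28 fJ/conv.-step and 168.2 dB values quoted above.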



Fig. 9 Measured output spectrum at 1 GS/s with 140.63 MHz input and 495.19 MHz input (16,384 points and ADC output decimated by 225×)



Fig. 10 Measured SFDR/SNDR versus input frequency



|                                               |                   | [24]             | [13]                    | [25]             | [15]                      | [17]              | [3]               |
|-----------------------------------------------|-------------------|------------------|-------------------------|------------------|---------------------------|-------------------|-------------------|
|                                               | This work         | VLSI 2017        | JSSC 2018               | ISSCC 2017       | ISSCC 2017                | ISSCC 2019        | JSSC 2019         |
|                                               |                   | K.-J. Moon       | R. Sehgal               | L. Kull          | H. Huang                  | B. Hershberg      | J. Lagos          |
| Architecture                                  | Pipelined<br>SAR  | Pipelined<br>SAR | Pipeline                | Pipelined<br>SAR | Pipelined<br>SAR          | Pipeline          | Pipeline          |
| Residue amplifier                             | Open-loop<br>Gm-R | gm-cell          | Open-loop<br>integrator | CML<br>amplifier | Dynamic<br>amplifier      | Ring<br>amplifier | Ring<br>amplifier |
| Technology                                    | 28 nm             | 28 nm            | 28 nm                   | 14 nm            | 65 nm                     | 16 nm             | 28 nm             |
| Resolution [bits]                             | 12                | 10               | 12                      | 10               | 12                        | 11                | 12                |
| Sample rate [MS/s]                            | 1000              | 500              | 280                     | 1500             | 330                       | 600               | 1000              |
| Supply voltage [V]                            | 1                 | 1                | 1                       | 0.95             | 1.3                       | 0.85              | 0.9               |
| SFDR @Nyq. [dB]                               | 74.56             | 69.2             | 77                      | 58.39            | 75.8                      | 78.3              | 73.1              |
| SNDR @Nyq. [dB]                               | 60.02             | 56.6             | 64                      | 50.1             | 63.5                      | 60.2              | 56.6              |
| Power [mW]                                    | 7.6               | 6                | 13                      | 6.92             | 6.2                       | 6.0               | 24.8              |
| FoM <sub>Walden</sub> @Nyq.<br>[fJ/conv-step] | 9.28              | 21.7             | 35.8                    | 17.7             | 15.4                      | 12                | 45                |
| FoM <sub>Schreier</sub> @Nyq. [dB]            | 168.2             | 162.8            | 164.3                   | 160.5            | 167.8                     | 167.2             | 159.6             |
| Area [mm <sup>2</sup> ]                       | 0.0091            | 0.015            | 0.22                    | 0.0016           | 0.08                      | 0.037             | 0.54              |

Table 1 ADC performance summary and comparison with state of the art

<sup>a</sup>Including the reference buffer

## 3 0.6 V PVT-Robust 13b 20 MS/s SAR-TDC ADC

With a low power supply, the voltage and time-domain hybrid architecture is a promising solution [26] that resolves the voltage-domain headroom issue of a conventional two-stage ADC [4] through partial time-domain quantization. Because the time LSB step increases as $V_{DD}$ decreases, time-domain quantization suits high-resolution, low-supply scenarios. The TDC-assisted SAR ADC [27, 28] combines a SAR ADC and a TDC through a voltage-to-time converter (VTC). It merges the advantages of voltage-domain quantization for a large-swing input signal and time-domain quantization for a small-swing residue signal, showing outstanding energy efficiency in high-resolution designs.

Although the TDC-assisted SAR ADC architecture offers the abovementioned advantages under a low power supply, it still encounters several design challenges. First, it is sensitive to PVT variations [27, 28]. Unlike the voltage-domain quantization with an LSB step defined by a reference voltage, the LSB step of the time-domain converter depends on the delay of the delay cell, which is highly sensitive to PVT variations, especially under a low power supply. Figure 12a shows a delay cell with two cascaded inverters and its discharging- and charging-based model; we can express the time delay as

$$T_{\rm U} = T_{\rm U,N} + T_{\rm U,P} = \frac{V_{\rm DD} \cdot C_{\rm P}}{2} \cdot \left(\frac{1}{I_{\rm N}} + \frac{1}{I_{\rm P}}\right) \tag{7}$$

where  $T_{U,N}$  and  $T_{U,P}$  are the delays related to the NMOS current  $I_N$  and PMOS current  $I_P$ , respectively. In Eq. (7),  $I_N$  and  $I_P$  vary along with PVT, and the variation



Fig. 12 Delay cell with (a) discharging- and charging-based model and (b) ambient temperature and supply voltage variation response

becomes large under a low power supply due to the correspondingly smaller overdrive voltages. From Fig. 12b, with the adopted 65 nm CMOS process, the simulated time delay variations under a 0.6 V power supply are close to $3\times$ and $4\times$ those of the 1.2 V condition across the temperature and supply voltage variations, respectively. The second challenge lies in the slow conversion speed of the low-supply TDC-assisted SAR ADC, caused not only by the long VTC latency with a small residue voltage but also by the large TDC LSB time step under the shrinking power supply [25]. We can recognize this from the results in Fig. 12b, where the delay of the delay cell with a 0.6 V power supply is close to $4\times$ that of the 1.2 V condition. This section presents a PVT-robust TDC-assisted SAR ADC, which achieves a sub-1-dB SNDR drop across both a -50 °C to 90 °C temperature range and a $\pm 5\%$ power supply variation at 20 MS/s.

### 3.1 Voltage-Time Hybrid ADC Architecture

Figure 13a presents the overall architecture of the 13-bit two-step TDC-assisted SAR ADC, while Fig. 13b shows its detailed timing. A discharging-based VTC connects a seven-bit voltage-domain subranging SAR ADC with a seven-bit time-domain two-step TDC to obtain 13-bit outputs (with 1-bit redundancy). In the voltage domain, we adopted a subranging architecture with a detect-and-skip (DAS) logic to obtain a capacitor digital-to-analog converter (CDAC) array that is both fast and highly linear [6]. The four-bit coarse and seven-bit fine SAR ADCs sample the input signal simultaneously during $\Phi_s$, using a bottom-plate sampling scheme and bootstrapped switches with a sampling time of 10 ns to strengthen the sampling linearity. Afterward, the circuit acquires the first four most significant bits (MSBs) through the coarse stage and then transfers them to the fine stage with the DAS scheme to relax the reconstruction time of the first three MSB capacitors $(64C_F, 32C_F, 16C_F)$ in the fine stage [6]. We insert one redundant bit (corresponding to capacitor $8C_{\rm F}$) between the coarse and fine stages to cover their gain and offset mismatches. Unlike the floating-based switching scheme [6], a solid connection to $V_{\rm CM}$ replaces the floating operation to shield the kickback noise and the common-mode voltage drift induced by the floating capacitors, while retaining the energy-efficient and highly linear features. The $V_{\rm CM}$ accuracy requirement is greatly relaxed because it appears as a common-mode signal at the comparator input. The unit capacitor in the coarse stage is 7.5 fF for high speed



Fig. 13 (a) Block and (b) timing diagram of the 13-bit two-step TDC-assisted SAR ADC with speed enhancement (single-ended SAR ADC for simplicity)

and 30 fF in the fine stage for high linearity. The coarse comparator is half the size of the fine comparator for better energy efficiency.

After the conversion of the subranging SAR ADC, a fully dynamic VTC incorporating discharging branches and threshold-cross detectors (TCDs) converts the residue voltage on the SAR ADC CDAC to a time difference (between $T_P$ and $T_N$) with a PVT tracking ability. Following this, a single-ended time generator produces the inputs (STA and STO) for the single-ended TDC and simultaneously provides one sign bit. A 4-bit flash TDC and a 3.5-bit Vernier TDC (including a 1.5-bit calibration range) together resolve 6 bits in the time domain. In this design, we generate all the timings on chip in an asynchronous manner (Fig. 13b).

### 3.2 Stage Bit Number Arrangement

Firstly, the voltage-domain quantization relies on the matching of the unit capacitors in the CDAC, which is more accurate than the matching of the unit delay elements in the time-domain quantization. Therefore, resolving more bits in the second-stage time domain tightens the matching requirement of the delay cells in the TDC, which induces extra calibration effort. Secondly, resolving more bits in the first-stage voltage domain relaxes the PVT tracking accuracy requirement between the VTC and its back-end TDC. For a 13-bit ADC with a target of >71.0 dB SNDR under PVT variations, the behavioral simulation suggests a PVT tracking accuracy of $<\pm4.0\%$ with a seven-bit first stage but $<\pm2.0\%$ with a six-bit first stage. Thirdly, resolving more bits in the first stage also significantly decreases both the energy efficiency and the conversion rate of the low-supply SAR ADC because of a more stringent comparator noise requirement and more SAR cycles. Therefore, in this work, we allocate seven bits to each of the voltage and time domains after considering the above trade-offs.

In the seven-bit subranging SAR ADC, the bit number selection of the coarse stage poses a trade-off between the conversion speed and the error tolerance range. More bits in the coarse stage can speed up the voltage-domain conversion through the DAS logic [6] but reduce the error tolerance range for the mismatch between the coarse and fine stages. Here, we assign four bits to the coarse stage to support an error tolerance range of 6.25%. Moreover, the four-bit coarse-stage SAR ADC helps the seven-bit subranging SAR ADC complete the conversion in 10 ns under a 0.6 V power supply, according to the post-layout simulation at the TT corner and room temperature.
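The bit allocation above reduces to simple arithmetic. The sketch below assumes that the error tolerance range equals one coarse-stage LSB expressed as a fraction of full scale, which reproduces the 6.25% quoted for a four-bit coarse stage; the function names are hypothetical.

```python
def hybrid_resolution(voltage_bits, time_bits, redundant_bits=1):
    """Output resolution of a two-step converter with inter-stage redundancy."""
    return voltage_bits + time_bits - redundant_bits


def coarse_error_tolerance_percent(coarse_bits):
    """Assumed tolerance range: one coarse LSB as a percentage of full scale."""
    return 100.0 / 2 ** coarse_bits


print(hybrid_resolution(7, 7), "bits")
for coarse_bits in (3, 4, 5):
    tolerance = coarse_error_tolerance_percent(coarse_bits)
    print(f"{coarse_bits}-bit coarse stage -> {tolerance:.3f}% tolerance range")
```

Each extra coarse bit halves the tolerance range under this assumption, which illustrates the speed versus error-tolerance trade-off described in the text.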

In the six-bit two-step TDC, additional bits (but fewer than four) in the Vernier TDC reduce the number of time arbiters in the two-step TDC, which improves the energy efficiency but deteriorates the two-step TDC linearity due to more mismatch errors from the two signal paths in the Vernier TDC. Here, the 3.5-bit Vernier TDC including a 1.5-bit calibration range helps reduce the total number of arbiters from 64 in a conventional 6-bit flash TDC to 29 (17 and 12 in the flash and Vernier TDCs, respectively). Moreover, the behavioral simulation shows that a <10% one-sigma mismatch among the delay cells in the two-step TDC is necessary

for >72.0 dB SNDR. Furthermore, the above 1.5-bit calibration range can tolerate a time residue generator (TRG) offset spread of  $\pm 4$  Vernier TDC LSB step ( $\pm 128$  ps).

### 3.3 PVT Inner Tracking Technique

The PVT-stabilized techniques in [6, 15] for the voltage-domain dynamic amplifier involve auxiliary circuits and complicated timing control logic, yet offer only a limited compensation capability under a low power supply. In order to cover a large LSB variation under PVT and low power supply, we present a PVT inner tracking between the VTC and its back-end TDC.

#### **Delay Cell**

Figure 14 shows the delay cell adopted in the four-bit flash TDC, which defines the LSB step of the flash TDC. Therefore, the time delay  $T_d$  of the delay cell across PVT is the term of the TDC that requires tracking by the VTC. According to Eq. (7), we can express its delay as

$$T_{\rm d} = \left(1 + \frac{1}{\beta}\right) \cdot \frac{V_{\rm DD} \cdot C_{\rm C}}{I_{\rm U}} \tag{8}$$

where $I_{\rm U}$ is the NMOS-related discharging current of the loading capacitance $C_{\rm C}$, and $\beta$ is the scaling factor between the PMOS- and NMOS-related currents. From Eq. (8), $T_{\rm d}$ depends on the supply voltage $V_{\rm DD}$ and the current source $I_{\rm U}$, which is PVT sensitive, resulting in a PVT-sensitive time LSB of the TDC.



#### **Discharging-Based VTC**

The VTC forms the interface between the voltage-domain and time-domain conversions. The current-starved [29] and dynamic-amplifier-based [6] VTCs are popular due to their low power consumption. However, both have difficulty tracking the delay cell variation over PVT, because their delay-controlling voltage is the common-mode voltage while that of the delay cell is the power supply voltage in Fig. 14. Moreover, they also suffer from low linearity owing to the voltage-controlled delay feature. The current discharging-based VTC [29] provides a separate current branch, making the PVT tracking of the delay cell possible. However, the previous art [29] has a poor tracking ability due to the nonuniform discharging currents between the VTC and the delay cell. Here, we design a discharging-based VTC whose discharging current is uniform with that of the delay cell, improving the tracking ability to support a robust high-resolution ADC while achieving better noise and linearity performance than [29] through TCDs.

Figure 15 presents the circuit and timing diagram of the designed discharging-based VTC. During the SAR ADC sampling and conversion ($CK_T = 0$), we reset both the discharging currents $I_{\rm D}$ and the TCDs for energy efficiency. After the SAR ADC cycle, the residue voltages $V_{\rm P}$ and $V_{\rm N}$ are held on the CDAC of the seven-bit fine SAR ADC, with a common-mode voltage of $V_{DD}/2$. Then, we generate the rising





edge of  $CK_T$  asynchronously, and  $V_P$  and  $V_N$  start to discharge at the same rate through  $I_D$ . The VTC generates the time outputs  $T_P$  and  $T_N$  once  $V_P$  and  $V_N$  cross the threshold voltage  $V_{REF}$  of the TCDs, respectively. The pseudo-differential structure of the VTC in Fig. 15a reduces the noise requirement of  $V_{REF}$  through a cancellation effect. The TCDs and the current sources  $I_D$  power down with the time difference  $T_{RES}$ , generated through the power down logic  $CK_D$  (falling edge of  $CK_D$  in Fig. 15) for good speed-scaling energy efficiency.

From Fig. 15, we can write the output time difference  $T_{\text{RES}}$  of the VTC as

$$T_{\rm RES} = (V_{\rm P} - V_{\rm N}) \cdot \frac{C_{\rm Fine}}{I_{\rm D}} \tag{9}$$

where $C_{\text{Fine}}$ is the capacitance of the single-ended CDAC of the fine SAR ADC, which includes 136 unit capacitors ($C_{\text{F}}$). From Eq. (9), the VTC shows attractively high linearity owing to the linear discharging operation. Considering the full-scale (FS) input voltage of $V_{\text{P}}$ and $V_{\text{N}}$, that is, $(V_{\text{P}} - V_{\text{N}})_{\text{FS}} = V_{\text{DD}}/136$, the FS $T_{\text{RES}}$ of the VTC becomes

$$T_{\text{RES,FS}} = V_{\text{DD}} \cdot \frac{C_{\text{F}}}{I_{\text{D}}}. \tag{10}$$

From Eq. (10), the VTC FS $T_{\text{RES}}$ also depends on the power supply $V_{\text{DD}}$ and the current source $I_{\text{D}}$, which is similar to Eq. (8).

#### **PVT Tracking Implementation**

As Eq. (8) defines the flash TDC LSB step and Eq. (10) is the TDC FS input time difference, the FS output code of the four-bit flash TDC is

$$\frac{T_{\text{RES,FS}}}{T_{\text{d}}} = \frac{\beta}{1+\beta} \cdot \frac{C_{\text{F}}}{C_{\text{C}}} \cdot \frac{I_{\text{U}}}{I_{\text{D}}}. \tag{11}$$

We intentionally designed  $\beta > 4$  in Eq. (11) to alleviate the influence of the PMOS-related delay and allow the domination of the NMOS-related delay. The variation of  $\beta$  ( $\leq\pm3\%$ ) and  $C_{\rm F}/C_{\rm C}$  ( $\leq\pm0.5\%$ ) over the temperature and supply voltage variation is negligible; hence, if the VTC discharging current  $I_{\rm D}$  (Fig. 15) and the delay cell discharging current  $I_{\rm U}$  (Fig. 14) share the same PVT response, we can obtain the inner tracking between the VTC and its back-end TDC.
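The tracking argument can be illustrated numerically by evaluating Eqs. (8), (10), and (11) at a few operating points. In the sketch below, $C_{\rm F} = 30$ fF is the fine-stage unit capacitor from the text, while $\beta$, $C_{\rm C}$, the nominal current, and the corner scaling factors are assumptions made only for this example. It shows that the FS code stays fixed when $I_{\rm U}$ and $I_{\rm D}$ scale together, even though $T_{\rm d}$ itself changes considerably.

```python
def delay_cell_td(vdd, c_c, i_u, beta):
    """Delay-cell delay T_d from Eq. (8)."""
    return (1.0 + 1.0 / beta) * vdd * c_c / i_u


def vtc_tres_fs(vdd, c_f, i_d):
    """Full-scale VTC output time T_RES,FS from Eq. (10)."""
    return vdd * c_f / i_d


def flash_fs_code(vdd, c_f, c_c, i_u, i_d, beta):
    """Full-scale flash TDC code, i.e. the ratio in Eq. (11)."""
    return vtc_tres_fs(vdd, c_f, i_d) / delay_cell_td(vdd, c_c, i_u, beta)


# C_F is from the text; BETA, C_C and I_NOM are assumed for illustration.
C_F, C_C, BETA = 30e-15, 1.5e-15, 5.0
I_NOM = 17.6e-6  # A, assumed nominal value for both I_U and I_D

# (label, V_DD, common current scaling); the scaling factors are made up.
corners = [("slow", 0.57, 0.7), ("nominal", 0.60, 1.0), ("fast", 0.63, 1.4)]
for label, vdd, k in corners:
    t_d = delay_cell_td(vdd, C_C, k * I_NOM, BETA)
    code = flash_fs_code(vdd, C_F, C_C, k * I_NOM, k * I_NOM, BETA)
    print(f"{label:8s} T_d = {t_d * 1e12:6.1f} ps, FS code = {code:.3f}")

# Without tracking (only the delay-cell current changes), the code follows I_U / I_D.
untracked = flash_fs_code(0.60, C_F, C_C, 1.4 * I_NOM, I_NOM, BETA)
print(f"untracked FS code = {untracked:.3f}")
```

In this idealized model, $V_{\rm DD}$ and the common current factor cancel exactly in Eq. (11); the residual code variation reported for the real circuit comes from the small drifts of $\beta$ and $C_{\rm F}/C_{\rm C}$ and from imperfect current matching.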

In order to match the responses of $I_D$ and $I_U$, we configure the current source $I_D$ of the VTC as illustrated in Fig. 16. Here, $I_D$ consists of 1 branch of 10 series-connected unit NMOS transistors and 6 configurable branches (with one of these branches used for VTC offset calibration) of 20 series-connected unit NMOS transistors. Because the unit NMOS of the current source $I_U$ (Fig. 14) has the same size as the unit NMOS of the current source $I_D$ (Fig. 16) and both are biased by $V_{DD}$ (in Figs. 14 and 16, $V_T = V_B = V_{OS} = V_{DD}$), $I_D$ and $I_U$ share the same PVT characteristic. Moreover, we designed the current source $I_D$ to be configurable



Fig. 17 Simulated (a) VTC  $T_{\text{RES,FS}}$ , delay cell  $T_{\text{d}}$ , and (b) flash TDC FS code versus temperature and power supply variation

by selecting a different number of branches to compensate for the process variation induced by the partial PMOS-related delay in Fig. 14 and the other non-idealities. One branch of 10 series unit NMOS and 3 branches of 20 series unit NMOS are active at the TT process corner to achieve a VTC FS output of 1024 ps according to Eq. (10). The above configuration supports a coverage range of 100% $T_{\text{RES,FS}}$ with a coarse step of 20% $T_{\text{RES,FS}}$. We obtain a fine-tuning step by setting $V_{\text{B}}$ in Fig. 16 to deviate slightly from $V_{\text{DD}}$ ($\Delta < 20$ mV). Figure 17 displays the simulated VTC output time $T_{\text{RES,FS}}$, the delay cell $T_{\text{d}}$, and the flash TDC FS code versus temperature and power supply variations. Three conditions demonstrate the presented PVT tracking capability: -50 °C at 0.57 V supply (slowest), 27 °C at 0.60 V supply (normal), and 90 °C at 0.63 V supply (fastest). The PVT tracking technique tracks an LSB variation as large as 52.2% while keeping the TDC FS code variation to only 1.6%.
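One way to read the 20% step and 100% range is in terms of the relative discharging current. The sketch below rests on a first-order assumption that a stack of n identical series NMOS devices conducts a current proportional to 1/n, so the always-on 10-series branch counts as two units of a 20-series branch. It also assumes that five of the six configurable branches are available for tuning, since one is reserved for offset calibration. The function name is hypothetical.

```python
def current_units(active_20_series_branches):
    """Relative I_D in units of one 20-series branch (first-order 1/n assumption)."""
    always_on_units = 20 / 10  # the 10-series branch carries twice the unit current
    return always_on_units + active_20_series_branches


NOMINAL = current_units(3)  # TT-corner setting from the text
TUNING_BRANCHES = 5         # assumed: 6 configurable branches minus 1 for offset calibration

for n in range(TUNING_BRANCHES + 1):
    rel = current_units(n) / NOMINAL
    print(f"{n} active 20-series branches: I_D = {rel:.0%} of nominal")

step = (current_units(4) - current_units(3)) / NOMINAL
span = (current_units(TUNING_BRANCHES) - current_units(0)) / NOMINAL
print(f"coarse step = {step:.0%}, coverage = {span:.0%}")
```

Under these assumptions, each 20-series branch changes the current by one fifth of its nominal value, and the five tuning branches span a range equal to the nominal current, matching the step and range stated above.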

### 3.4 Measurement Results

Figure 18 shows the die photograph of the prototype two-step TDC-assisted SAR ADC, fabricated in a 1P9M 65 nm CMOS process, occupying an active area of 0.053 mm<sup>2</sup>. The ADC power supply and SAR ADC reference both have a voltage of 0.6 V. We added a large bypass capacitance on chip to stabilize the reference voltage, with the output data of the ADC decimated by 5 to mitigate the ripple that couples through the PAD ring. The unit capacitor in the SAR ADC is a custom-designed encapsulated metal-oxide-metal capacitor, and the adopted highly linear switching scheme [6] avoids the need for CDAC calibration. The ADC uses one-time foreground calibration to remove the offset of the VTC and align its FS output time to the back-end TDC through the configurable branches in Fig. 16. Moreover, we apply background bit shifting to the TRG offset off chip. Beyond that, the accuracy of the time-domain quantization relies on its intrinsic matching without calibration.

Figure 19 plots the measured DNL and INL at a conversion rate of 20 MS/s. The measured DNL and INL errors are within -0.74/+0.72 LSB and -1.03/+1.31 LSB, respectively, benefiting from the highly linear CDAC in the SAR ADC. Figure 20 exhibits the measured 65,536-point FFT spectra with both the LF and Nyquist input signals at 20 MS/s. The ADC achieves 71.5 dB SNDR and 91.9 dB spurious-free dynamic range (SFDR) with an input frequency of 0.49 MHz and 71.0 dB SNDR and 89.5 dB SFDR with an input frequency of 9.98 MHz. Figure 21



Fig. 18 Die microphotograph

Fig. 19 Measured DNL and INL



Fig. 20 Measured FFT spectrums with (a) LF and (b) Nyquist input signal at 20 MS/s



presents the measured SNDR and SFDR versus various ADC input frequencies with a sampling rate of 20 MS/s.

To verify the robustness of the ADC, Fig. 21 shows the measured SNDR variation across a temperature range of -50 °C–90 °C under different power supplies of 0.57 V, 0.60 V, and 0.63 V without bias optimization ( $V_T = V_{DD}$ ). In Fig. 21, benefiting from a PVT inner tracking technique, the maximum SNDR drop compared to the SNDR at 25 °C with a 0.6 V supply is 0.89 dB. Moreover, part of the SNDR drop under the high temperature results from the rise of the thermal noise.

The ADC consumes 82 $\mu$W at 20 MS/s with a power supply of 0.6 V, where the VTC consumes 39% for low-noise purposes and the TDC takes only 17% thanks to the two-step architecture without a time amplifier (TA). The remaining power consumption comprises 36% for the voltage-domain SAR ADC and 8% for the bootstrapped clock generator.

Table 2 summarizes the performance of the presented ADC and compares it with other state-of-the-art ADCs. This work achieves a Walden FoM of 1.4 fJ/conversion

|                                           | This<br>work | [27]<br>JSSC<br>2016 | [28]<br>CICC<br>2017 | [30]<br>JSSC<br>2015 | [31]<br>VLSI<br>2016 | [32]<br>JSSC<br>2016 | [33]<br>JSSC<br>2017 |
|-------------------------------------------|--------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|
| Architecture                              | SAR-<br>TDC  | SAR-<br>TDC          | SAR-<br>TDC          | Pipe-<br>SAR         | SAR-<br>VCO          | SAR-<br>Slope        | SAR                  |
| Technology                                | 65 nm        | 90 nm                | 180 nm               | 65 nm                | 40 nm                | 28 nm                | 40 nm                |
| Resolution [bits]                         | 13           | 10                   | 13                   | 13                   | -                    | 12                   | 13                   |
| Active area [mm <sup>2</sup> ]            | 0.053        | 0.041                | 0.215                | 0.054                | 0.030                | 0.005                | 0.068                |
| Power supply [V]                          | 0.6          | 0.6                  | 1.5/1.0              | 1.2                  | 1.1                  | 0.9                  | 1.0                  |
| \|DNL\| [LSB]                             | 0.74         | 0.47                 | 1.20                 | 0.58                 | -                    | 0.53                 | 1.08                 |
| \|INL\| [LSB]                             | 1.31         | 0.53                 | 1.70                 | 0.96                 | -                    | 0.82                 | 3.79                 |
| Sample rate or 2 × BW<br>[MS/s]           | 20           | 2                    | 20                   | 50                   | 6                    | 100                  | 6.4                  |
| SNDR @ LF [dB]                            | 71.5         | 55.0                 | 73.1                 | 71.5                 | 71.4                 | 65.7                 | -                    |
| SNDR @ Nyq. [dB]                          | 71.0         | 54.5                 | -                    | 70.9                 | -                    | 64.4                 | 64.1                 |
| Total power [µW]                          | 82           | 4.6                  | 1280                 | 1000                 | 350                  | 350                  | 46                   |
| FoM <sub>W</sub> @ LF [fJ/conv.<br>step]  | 1.3          | 5.0                  | 17.3                 | 6.5                  | 19.2                 | 2.2                  | -                    |
| FoM <sub>W</sub> @ Nyq.<br>[fJ/conv.step] | 1.4          | 5.3                  | -                    | 7.0                  | -                    | 2.6                  | 5.5                  |
| FoM <sub>S</sub> @ LF [dB]                | 182.4        | 168.4                | 172.0                | 175.5                | 170.7                | 177.2                | -                    |
| FoM <sub>S</sub> @ Nyq. [dB]              | 181.9        | 167.9                | -                    | 174.9                | -                    | 175.9                | 172.5                |
| PVT robustness                            | Yes          | No                   | No                   | Yes                  | Yes                  | Yes                  | Yes                  |

Table 2 ADC performance summary and comparison with state of the art

step and a Schreier FoM of 181.9 dB with a Nyquist input. When compared with the low-supply TDC-assisted SAR ADC in [27], this work greatly increases the conversion rate and accuracy. On the other hand, compared to the TDC-assisted SAR ADC in [28], this ADC achieves a similar conversion rate and accuracy but with a low power supply, while avoiding complex off-chip calibrations of the SAR CDAC and TDC delay cells.

## 4 6b 3.3 GS/s Pipeline ADC

The pipelined architecture adopted for ADCs typically exhibits a 9- to 14-bit resolution and a moderate sampling rate of around 2 GS/s per channel [34, 35]. Each stage (except the last) of the pipeline accomplishes three major operations: sampling, quantization, and residue amplification. Previous art [36] demonstrated that we can avoid this operation sequence by generating the residue with reference-embedded dynamic preamplifiers. However, with the reference realized by unbalancing the loading capacitance of the preamplifier's outputs, its accuracy relies on its tuning step, limiting the comparator's regeneration speed and the overall sampling rate of the 6b ADC to only 550 MHz. Another concern is the linearity of the residue

amplifiers. Reference [37] avoids this issue by activating different dynamic pre-amplifiers for each ADC threshold and calibrating the threshold to the desired value. This, however, restricts the implementation to a tree structure with additional hardware and calibration overhead. Conventional closed-loop amplifiers call for high gain and suffer from stability issues, and they are also power-hungry. Dynamic amplifiers (DAs) can improve the power efficiency as well as the speed, but the linearity is relatively poor due to the input-dependent common current, especially under a large input swing. While calibration [38] requires a high-order post-distortion extraction procedure that is hardware hungry, the time-domain linearization technique [39] limits the amplification speed significantly with the common-mode (CM) detector. This section presents a single-channel 3.3 GS/s six-bit pipelined ADC which features a post-amplification residue generation (PARG) scheme, a linearized dynamic amplifier, and on-chip calibration to achieve a high-speed, low-power, and compact prototype.

### 4.1 Post-amplification Residue Generation

Figure 22a presents the conventional pipelined ADC structure and its timing diagram. It can reach a high speed owing to the pipelining operation and the fast flash sub-quantizer, with extended resolution obtained by cascading multiple low-resolution stages. Each stage (except the last) generally consists of a sampler, a sub-quantizer, and a multiplying digital-to-analog converter (MDAC). We can describe the operation of this conventional pipelined ADC with an open-loop amplifier as follows. First, we sample the input signal ( $V_{in}$ ) during  $\Phi_S$  and quantize it afterward with the sub-quantizer during  $\Phi_Q$ . Then, the DAC generates the residue



Fig. 22 (a) Conventional pipelined ADC architecture and timing diagram and (b) pipelined ADC with the PARG scheme

voltage according to the quantization results, further amplified during  $\Phi_A$  and passed to the subsequent stages for additional quantization. Under a low-to-moderate resolution target, the time required for the sampling, quantization, and amplification is similar, which leaves out an idle time slot. This classical arrangement accomplishes serially three major operations, including sampling, quantization, and residue amplification. The amplification ( $\Phi_A$ ) thereby must wait for the completion of the quantization as well as the DAC feedback ( $\Phi_Q$ ), leading to an inefficient time allocation with idle time in each pipelined stage.

In contrast with the conventional substage operation, Fig. 22b illustrates a PARG pipeline architecture [8]. While each pipelined stage shares the same hardware as a conventional pipeline ADC, we rearrange its operation to avoid idle time. First, we sample the input signal ( $V_{in}$ ) during  $\Phi_S$  and quantize it with the sub-quantizer during  $\Phi_A$ . Simultaneously, the RA also amplifies the sampled input within  $\Phi_A$  and passes the result to the subsequent stages for residue generation and further quantization. Instead of amplifying the residue, the RA now amplifies the full sampled input. The residue generation thus happens after the amplification, allowing the comparator and the RA to work in parallel. The parallelized operation accelerates the overall speed by allowing each stage to accommodate only two basic operations, sampling and conversion/amplification, effectively eliminating the idle time.

Under the same target resolution, we can unify the required time for sampling  $(T_{\text{SAM}})$ , amplification  $(T_{\text{AMP}})$ , and comparison  $(T_{\text{comp}})$  when comparing the conventional pipeline and the PARG scheme. The smallest possible clock period of the conventional pipelined ADC is

$$T_{\rm CLK} = T_{\rm SAM} + T_{\rm setup} + T_{\rm pre} + T_{\rm comp} + T_{\rm AMP} + T_{\rm DAC} \tag{12}$$

where  $T_{\text{setup}}$ ,  $T_{\text{pre}}$ , and  $T_{\text{comp}}$  are the setup time of the sampled input/residue voltage, the pre-discharge time, and the regeneration time of the comparator, respectively.  $T_{\text{DAC}}$  is the DAC logic delay. In the conventional setup, each stage (except the last) must conclude sampling, comparison, and amplification, where certain timing overheads, such as the setup time and pre-discharge time of the comparator, are inevitable due to the serial operation in each stage. With the PARG scheme, in contrast, the shortest clock period decreases to

$$T_{\rm CLK} = T_{\rm SAM} + T_{\rm setup} + T_{\rm pre} + T_{\rm comp} \tag{13}$$

Noteworthy is the fact that we saved  $T_{AMP}$  and can merge  $T_{DAC}$  and  $T_{setup}$  as the amplification and comparison now happen simultaneously. Here, we assume  $T_{pre} + T_{comp} > T_{AMP}$ , since the regeneration time of the comparator is often more critical than the amplification time in high-speed scenarios. The PARG can provide a 1.5-fold improvement in conversion speed compared to the conventional pipelined ADC for a low-to-moderate resolution target, or alternatively, it contributes with an additional one-third of regeneration time for the comparator, thus improving the

metastability error rate of the ADC. It is also worth highlighting that the optimum timing of Fig. 22a requires a clock pulse-width modification, shortening the sampling time from a half period, which is sensitive to PVT variations. While the half-period setup in the PARG is more robust, the actual saving or speed enhancement is higher than the discussed value above.
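As a rough numerical illustration of Eqs. (12) and (13), the sketch below evaluates both clock periods for one timing budget. All values are hypothetical placeholders, chosen so that sampling, comparison, and amplification each take a similar share of the period and so that the assumption $T_{\rm pre} + T_{\rm comp} > T_{\rm AMP}$ holds; they are not taken from the prototype.

```python
def t_clk_conventional(t_sam, t_setup, t_pre, t_comp, t_amp, t_dac):
    """Eq. (12): all operations of a stage happen serially."""
    return t_sam + t_setup + t_pre + t_comp + t_amp + t_dac


def t_clk_parg(t_sam, t_setup, t_pre, t_comp):
    """Eq. (13): the amplification runs in parallel with the comparison."""
    return t_sam + t_setup + t_pre + t_comp


# Hypothetical timing budget in picoseconds (illustrative only).
T_SAM, T_SETUP, T_PRE, T_COMP = 100.0, 20.0, 30.0, 50.0
T_AMP, T_DAC = 70.0, 30.0

# Eq. (13) is only valid when the comparator path is the slower one.
assert T_PRE + T_COMP > T_AMP

conventional = t_clk_conventional(T_SAM, T_SETUP, T_PRE, T_COMP, T_AMP, T_DAC)
parg = t_clk_parg(T_SAM, T_SETUP, T_PRE, T_COMP)
speedup = conventional / parg
print(f"conventional: {conventional} ps, PARG: {parg} ps, speed-up: {speedup:.2f}x")
```

With three equal time slots in the conventional stage and two in the PARG stage, the ratio comes out at 1.5, in line with the improvement quoted in the text.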

### 4.2 Linearized Dynamic Amplifier

An open-loop-type MDAC provides a convenient and high-speed way to facilitate the PARG, with the amplified voltage held on the succeeding stage's capacitor. However, it places greater pressure on the linearity requirement of the residue amplifiers, as they interface with larger signal swings, especially at the initial stages. In this design, we adopt the dynamic residue amplifier due to its outstanding energy efficiency and fully dynamic power property. With the speed-limiting CM detection removed and its gain accuracy ensured by the calibration, a linearization technique is necessary to reach a reasonable SFDR performance even for a low-to-moderate resolution target.

Figure 23 depicts the schematic of a conventional DA [18] and its transient waveforms. Its basic working principle is discharging the supply-precharged load capacitors through an input-controlled current source. The voltage gain  $A_V$  of the DA is

$$A_{\rm V} = \frac{G_{\rm m}}{C_{\rm L}} \cdot T_{\rm amp} \approx \frac{g_{\rm m1} + g_{\rm m2}}{2} \cdot \frac{1}{C_{\rm L}} \cdot T_{\rm amp} \tag{14}$$

where  $g_{\rm m}$ ,  $C_{\rm L}$ , and  $T_{\rm amp}$  are the transconductance of the input transistors, the load capacitance, and the amplification time, respectively. Regarding the integrating nature of the discharging process, it is desirable to have the output differential



Fig. 23 (a) Schematic and (b) signal behavior of the conventional differential dynamic amplifier

current  $(I_{D1} - I_{D2})$  linearly related to the input voltage and consistent throughout time. However, the drain-source current of the MOSFET is second-order dependent on its overdrive voltage according to the square-law model, and thereby it fails to linearly follow the input, giving rise to nonlinearity in the amplification. We can obtain the nonlinearity induced by the differential input pair by exploring the relationship between the differential drain current of an input pair and its differential input voltage based on the square-law equation [40]:

$$I_{\rm D1} - I_{\rm D2} = \frac{1}{2}k(V_{\rm I+} - V_{\rm I-})\sqrt{\frac{4I_{\rm CM}}{k} - (V_{\rm I+} - V_{\rm I-})^2} \tag{15}$$

where  $I_{D1}$  and  $I_{D2}$  are the drain currents generated by  $M_1$  and  $M_2$  in the amplification phase for the inputs  $V_{I+}$  and  $V_{I-}$ , respectively.  $I_{CM}$  represents the common-mode current and  $k = \mu C_{ox} \frac{W}{L}$  the geometry and process parameters of the MOSFETs. The suppression of the nonlinearity originated by the  $I_{CM}$  term is possible through a careful sizing of  $M_3$ , which reduces the channel length modulation effect. Consequently, the gate-source voltage of  $M_3$  mainly controls  $I_{CM}$ , which stays relatively constant within the amplification. On the other hand, the input pair originates a second-order input-dependent term  $(V_{I+} - V_{I-})^2$  inside the square root part in Eq. (15), so the differential output current fails to follow the input linearly. This leads to a compressing type of nonlinearity, as the square root term decreases when the input difference grows. This type of nonlinearity is the major bottleneck and becomes severe with the large input swing in the initial pipelined stages.

As described in Eq. (15), there is a relationship between the input pair-induced nonlinearity and the second-order dependency of the MOSFET drain current on the input  $((V_{I+} - V_{I-})^2 \text{ term})$ . The presented idea alleviates its impact by forcing the term



under the square root part in Eq. (15),  $\sqrt{\frac{4I_{\rm CM}}{k} - (V_{\rm I+} - V_{\rm I-})^2}$ , to approach a constant value. Figure 24 displays the DA with an auxiliary path for linearization. On top of the conventional DA structure, we add an auxiliary pseudo-differential input pair M<sub>4</sub>-M<sub>5</sub>, clock-controlled through M<sub>6</sub>-M<sub>7</sub>. They share the same main clock signal  $\Phi_{\rm Amp}$  with the DA and provide compensation currents  $I_{\rm D1Aux}$  and  $I_{\rm D2Aux}$ . The sum of the compensation currents is

$$I_{\text{sum,Aux}} = I_{\text{D1Aux}} + I_{\text{D2Aux}} = c + \frac{1}{2}k_1 \left(V_{\rm I+}^2 + V_{\rm I-}^2\right) \tag{16}$$

where  $k_1 = \mu C_{\text{ox}} \frac{W}{L}$  is the geometry and process parameters of M<sub>4</sub>/M<sub>5</sub> and the variable c is

$$c = 2k_1 \left( V_{\text{th}-\text{Aux}}^2 - 2V_{\text{th}-\text{Aux}} V_{\text{CM}} \right) \tag{17}$$

where  $V_{\text{th} - Aux}$  is the threshold voltage of the auxiliary pair. As  $I_{\text{sum, Aux}}$  in Eq. (16) is in parallel with  $I_{CM}$ , the second-order input-dependent term  $(V_{I+}^2 + V_{I-}^2)$  in Eq. (16) compensates for the nonlinearity from  $(V_{I+} - V_{I-})^2$  in Eq. (15). After including the compensation, Eq. (15) becomes

$$I_{\rm D1} - I_{\rm D2} = \frac{1}{2}k(V_{\rm I+} - V_{\rm I-})\sqrt{\beta} \tag{18}$$

where

$$\beta = \frac{4I_{\rm CM}}{k} + \frac{4k_1}{k} (V_{\rm CM} - V_{\rm th,AUX})^2 + \left(\frac{k_1}{k} - 1\right) (V_{\rm I+} - V_{\rm I-})^2 \tag{19}$$

From Eq. (19), perfect compensation happens when  $k_1/k = 1$ , in which case the differential current in Eq. (15) becomes

$$I_{\rm D1} - I_{\rm D2} = \frac{1}{2}k(V_{\rm I+} - V_{\rm I-})\sqrt{\frac{4I_{\rm CM}}{k} + 4(V_{\rm CM} - V_{\rm th,AUX})^2} \tag{20}$$

Comparing Eq. (15) with Eq. (20), there is the cancelation of second-order terms,  $V_{I+}^2$  and  $V_{I-}^2$ , with the introduction of auxiliary paths, while  $\beta$  in Eq. (19) depends only on constant values:  $V_{\text{CM}}$ ,  $V_{\text{th} - \text{Aux}}$ ,  $\mu$ ,  $C_{\text{ox}}$ , and transistor geometries, as well as  $I_{\text{CM}}$ .
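The effect of the auxiliary pair can be checked numerically with the idealized square-law model used above. The sketch below evaluates the differential current with and without the auxiliary pair and compares the large-signal gain with the small-signal gain. The device values are hypothetical, the function names are illustrative, and both pairs are assumed to stay in saturation over the whole input range.

```python
import math


def diff_current(v_diff, v_cm, i_cm, k, k1=0.0, v_th_aux=0.0):
    """Differential drain current of the DA input pair (square-law model).

    With k1 = 0 this is Eq. (15). With k1 > 0 the auxiliary pair adds its
    current in parallel with I_CM, which reproduces Eqs. (18) and (19).
    """
    v_p = v_cm + v_diff / 2.0
    v_n = v_cm - v_diff / 2.0
    # Square-law currents of the auxiliary pair M4/M5 (both assumed on).
    i_aux = 0.5 * k1 * ((v_p - v_th_aux) ** 2 + (v_n - v_th_aux) ** 2)
    beta = 4.0 * (i_cm + i_aux) / k - v_diff ** 2
    return 0.5 * k * v_diff * math.sqrt(beta)


def gain_compression(v_diff, **kwargs):
    """Ratio of the large-signal to the small-signal transconductance."""
    small = diff_current(1e-3, **kwargs) / 1e-3
    large = diff_current(v_diff, **kwargs) / v_diff
    return large / small


# Hypothetical operating point: k in A/V^2, currents in A, voltages in V.
params = dict(v_cm=0.55, i_cm=100e-6, k=1e-3)
conventional = gain_compression(0.4, **params)
linearized = gain_compression(0.4, k1=1e-3, v_th_aux=0.3, **params)
print(f"conventional: {conventional:.3f}, with auxiliary pair (k1 = k): {linearized:.3f}")
```

In this ideal model the conventional pair compresses noticeably at a 0.4 V differential input, while the case $k_1 = k$ shows no compression, as Eq. (20) predicts. Real devices deviate from the square law, which is one reason the best ratio in practice can differ from unity.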

The basis of the DAs is the differential pair configured as a common source, thus the source-degeneration technique is also effective for its transconductance  $(g_m)$  linearization. By adding a degeneration resistor  $R_D$ , we induce negative feedback, which suppresses the gain variation of the DA. With the major portion of  $g_m$  nonlinearity suppressed by the previously discussed auxiliary path, the degeneration mainly alleviates the input dependency of  $I_{CM}$  for better overall linearity. The nonlinearity from  $I_{CM}$  is also significant since the drain-source voltage of  $M_3$  ( $V_x$ ) is also dependent on the input, which eventually affects  $\beta$  in Eq. (19) even when  $k_1/k = 1$ . However, the degeneration also turns the effective  $G_m$  of the input pair into  $\frac{g_m}{1+g_mR_D}$ , thus reducing the overall gain of the DA. The proposed linearization technique partially compensates such loss as its gain is superior to the conventional design. Simulation results show that degeneration resistors provide an extra 6 dB linearity improvement on top of the proposed linearization technique under typical



Fig. 25 Dynamic amplifier with proposed linearization technique: (a) THD improvement versus k1/k with auxiliary pair only and (b) THD improvement versus k1/k with degeneration

conditions; the overall THD improvement with both linearization techniques is no less than 16 dB across different corners and temperatures.

Figure 25a shows that the auxiliary pair contributes with close to 10 dB of linearity improvement with k1/k = 1. However, due to the PVT variations, the best compensation moves away from k1/k = 1. While the overall improvement stays above 16 dB for k1/k within 0.25–1.25 (Fig. 25b), a proper choice of k1/k and the degeneration resistance allows the proposed DA to tolerate a certain range of temperature variation. Here, we choose k1/k = 0.7 to ensure a sufficient THD performance over PVT.

### 4.3 On-Chip Calibration

Similar to the conventional architecture, the proposed pipelined ADC is sensitive to interstage offset and gain error impairments. In addition, due to the stringent linearity requirement, the gain mismatch between the signal paths of the DA is also critical. Originating from mismatch and process variations, the offset of the comparators and dynamic amplifiers as well as the interstage gain error from the residue amplifier significantly limit the overall performance of the ADC. Redundancy among stages, often introduced in the conventional approach to tolerate the offset error, complicates the quantizer design as it requires multiple reference voltages and comparators in each stage. Besides, it cannot correct the nonlinearity caused by the gain mismatch between the signal paths of the DA. Instead, in this design, we suppress both offset and gain impairments through a hardware-sharable and low-cost foreground calibration. The calibrations run sequentially, starting with the



Fig. 26 Detailed block diagram of the offset and gain calibrations

comparator offset, the amplifier signal path-gain mismatch, then followed by the amplifier gain. Figure 26 depicts their block diagram.

The offset calibration of the comparators starts from the first stage and each stage accomplishes it in sequence. During the calibration, the ADC works as normal, but the bottom plate of the DAC keeps resetting and disconnecting from the comparator control. We short together the differential inputs of each stage after amplification, which generates the corresponding common-mode voltage for the offset calibration. The decision passes through an eight-time majority voting logic and controls the counter.

We calibrate together the offset and the signal path-gain mismatch of the DA. Since both signal paths experience the same nonlinearity, their gain therefore undergoes the same characteristic and does not worsen the linearity performance. Nevertheless, due to the mismatch between the signal paths, their transfer characteristics can shift and scale, worsening the differential gain nonlinearity. The adjustment of the  $I_{D0}$  can compensate the offset. However, such compensation only aligns the center point of the deviated gain curves rather than the overall gain characteristic. Instead of trimming  $I_{D0}$  with an extra pair, we manage to trim the loading capacitors ( $C_L$ ) of the differential outputs which compensate both the offset and the differential gain error simultaneously.

The calibration of the interstage gain error proceeds from the last stage to the first. To detect the gain error of stage N (with nulled offsets in the comparator and amplifier), the DAC of stage N generates a half-LSB voltage through  $\frac{C_u}{2}$  while the others remain in reset. Then, the current and/or subsequent stages amplify and quantize this voltage. The quantization result D[N:5] is ideally  $2^{(6-N)} - 1$ , which is the full scale of stage N to the fifth stage. The calibration starts with a minimum gain configuration and increases the gain until D[N:5] approaches its ideal value.
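The gain search described above can be sketched as a simple upward sweep. This is a minimal illustration, not the on-chip logic: `measure_code` is a hypothetical callback standing in for one calibration measurement of D[N:5], the stopping rule (first setting that reaches the ideal code) is an assumption, and `fake_measure` is a toy model used only to exercise the loop.

```python
def ideal_code(stage, n_stages=6):
    """Ideal D[N:5] for the half-LSB test input: 2^(6-N) - 1."""
    return 2 ** (n_stages - stage) - 1


def calibrate_stage_gain(measure_code, stage, n_settings):
    """Sweep the gain setting upward from the minimum configuration.

    Returns the first setting whose measured code reaches the ideal value,
    or the largest setting if the target is never reached.
    """
    target = ideal_code(stage)
    for setting in range(n_settings):
        if measure_code(setting) >= target:
            return setting
    return n_settings - 1


def fake_measure(setting):
    """Toy stage model: the output code rises by one every two gain steps."""
    return min(31, 25 + setting // 2)


chosen = calibrate_stage_gain(fake_measure, stage=1, n_settings=32)
print(f"ideal code: {ideal_code(1)}, chosen gain setting: {chosen}")
```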

### 4.4 Overall ADC Implementation

Figure 27 draws the overall ADC architecture that consists of six 1 b stages which aggregate a six-bit resolution. We do not insert any redundancy among stages to obtain the best efficiency and avoid the need of multiple reference voltages, while



Fig. 27 Overall architecture

calibrating the gain and offset error in the foreground. The adoption of the PARG scheme allows the quantization and amplification to run in parallel for high speed. The interstage gain is close to  $1.5\times$ , keeping a balance between the first few stages' linearity and later stages' noise/accuracy requirement. The stringent linearity requirement, which is a drawback of the PARG scheme, imposes a low  $1.5\times$  gain. However, with the 6b target in this design, such a small gain does not lead to a large trade-off between noise and power, and therefore, the PARG scheme is quite appropriate for high-speed and low-resolution designs. Only the DAs in the first two stages utilize the proposed linearization technique, and the remaining stages maintain the conventional DA to ensure a proper common-mode range through the pipeline. The auxiliary path in the proposed DA brings a faster dropping  $V_{\rm CM}$  when compared with a conventional architecture, which degrades its maximum achievable amplification time and noise performance. However, this design is not noise limited, and the bottleneck is linearity. Such noise performance degradation has no impact on the overall ADC energy efficiency. The sampling capacitances of the first, second, and third stages are 35 fF, 15 fF, and 15 fF, respectively, and 10 fF for the remaining stages. Such small capacitance ensures a high-speed and low-power operation. Furthermore, we adopt split monotonic switching to generate the residue in each stage, which avoids an additional CM voltage. We aligned the outputs of each stage with D-flip-flops.

### 4.5 Measurement Results

Figure 28 presents the ADC, fabricated in 28 nm CMOS, with ~40 fF input capacitance (excluding ESD), occupying an active area of 0.0166 mm<sup>2</sup> (132  $\mu$ m × 126  $\mu$ m), including the on-chip calibration circuits. The input swing of



Fig. 28 Die photo



Fig. 29 Measured ADC (a) spectrum at near Nyquist input and (b) DNL/INL before and after calibration

the prototype is 400 mV<sub>pp-diff</sub> to adopt the PARG scheme. During measurements, the circuit performs on-chip calibration in the foreground with the calibration counter values frozen throughout all conditions. Figure 29a illustrates the measured output spectrum (decimated by 225) at 3.3 GS/s for an input near Nyquist (1.649 GHz), with and without calibration. Before the calibration, the second and third harmonics dominate the SFDR and greatly limit the achievable SNDR. The mismatches between differential circuits cause mainly the second harmonic, while the offset and gain error result in the third harmonic. After the conclusion of the calibration, these harmonics decline and the SFDR improves by 5 dB. Moreover, as depicted in Fig. 29b, the measured DNL and INL before calibration are +1.48/-1 LSB and



Fig. 30 ADC performance sweeps (a) versus Fin, (b) versus Fs, (c) versus supply voltage, and (d) versus randomly selected samples

|                          | This work              | [39] Verbruggen JSSC'10 | [36] Chen VLSI'13 | [37] Shu VLSI'12 | [38] Oh JSSC'19 |
|--------------------------|------------------------|-------------------------|-------------------|------------------|-----------------|
| Architecture             | Fully dynamic pipeline | Fully dynamic pipeline  | Flash             | Flash            | Flash           |
| Technology               | 28 nm                  | 40 nm                   | 32 nm SOI         | 40 nm            | 65 nm           |
| Supply (V)               | 0.9                    | 1.1                     | 0.85              | 1.1              | 0.85            |
| Power (mW)               | 5.5                    | 2.6                     | 8.5               | 11               | 7.5             |
| ERBW (GHz)               | >6                     | 2                       | 2.43              | 1.5              | 3.1             |
| Resolution (bit)         | 6                      | 6                       | 6                 | 6                | 6               |
| $f_{\rm s}$ (GS/s)       | 3.3                    | 2.2                     | 5                 | 3                | 2.5             |
| SFDR@Nyq. (dB)           | 45.45                  | 41.5                    | 37.48             | 38               | 45.07           |
| SNDR@Nyq. (dB)           | 34.16                  | 31.1                    | 30.9              | 33.1             | 33.8            |
| FoM@Nyq. (fJ/conv.-step) | 40.02                  | 40.3                    | 59.4              | 99.3             | 74.7            |
| Active area (mm<sup>2</sup>) | 0.0166             | 0.03                    | 0.02              | 0.021            | 0.12            |
| Calibration              | On-chip                | Off-chip                | Off-chip          | Off-chip         | On-chip         |

Table 3 ADC performance summary and comparison with state of the art

+1.08/-1.68 LSB and after calibration are +1.08/-0.85 LSB and +1.11/-1.044 LSB, respectively.

Figure 30a plots the measured SFDR/SNDR across input frequencies from DC to 6 GHz. The circuit maintains the SNDR and the SFDR at ~33 dB and 45 dB due to the small input capacitance and the bootstrapped sampling front-end. Figure 30b exhibits the SNDR and SFDR as well as the power consumption versus sampling frequencies from 1 GS/s to 4 GS/s with a fixed input at  $\sim$ 1.6 GHz. The performance drops significantly beyond 3.4 GS/s because of the insufficient conversion time. At sampling rates below 300 MS/s, the performance degrades due to leakage on the residue holding capacitors. We sized their switches for the lowest  $R_{on}$  due to the high-speed target. Besides, it clearly shows that the power consumption scales linearly with the sampling frequency with a slope of 1.5  $\mu$ W/MHz, which confirms the fully dynamic characteristic of this prototype. With a fixed calibration set obtained at a 0.9 V supply and no recalibration, the SNDR degrades less than 3 dB for a  $\pm 5\%$  supply change (Fig. 30c). Moreover, three randomly selected samples demonstrate a similar performance, which further proves the effectiveness of the calibrations (Fig. 30d). Table 3 summarizes the major ADC specifications and compares them with state-of-the-art designs. The proposed prototype achieves a competitive energy efficiency and SNDR in a compact area even including on-chip calibration circuitry.
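The FoM row of Table 3 can be cross-checked against the power, sampling rate, and Nyquist SNDR listed in the same table. The sketch below assumes the Walden definition, $P/(2^{\rm ENOB} \cdot f_{\rm s})$ with ${\rm ENOB} = ({\rm SNDR} - 1.76)/6.02$. That the table uses exactly this definition is an inference from its per-conversion-step unit, and the function name is illustrative.

```python
def walden_fom_fj(power_w, f_s_hz, sndr_db):
    """Walden figure of merit in fJ per conversion step."""
    enob = (sndr_db - 1.76) / 6.02
    return power_w / (2.0 ** enob * f_s_hz) * 1e15


# (power in W, sampling rate in S/s, SNDR at Nyquist in dB) from Table 3.
designs = {
    "This work": (5.5e-3, 3.3e9, 34.16),
    "[39]": (2.6e-3, 2.2e9, 31.1),
    "[36]": (8.5e-3, 5.0e9, 30.9),
}

for name, (power, f_s, sndr) in designs.items():
    print(f"{name}: {walden_fom_fj(power, f_s, sndr):.1f} fJ/conv.-step")
```

The recomputed values land close to the tabulated 40.02, 40.3, and 59.4 fJ/conv.-step; small differences are expected from rounding of the listed inputs.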

## 5 8b 10 GS/s Time-Domain ADC

Recently, direct time-domain ADCs [41, 42] that consist of a VTC and a TDC exhibited promising speed advantages through a time quantization step reduction. Such structures are also friendly to time-interleaved architectures, not only for their small interleaving factor but also for their small sub-channel input capacitance. These features make it easy for time-interleaved time-domain ADCs to attain a high input bandwidth while offering an attractive area efficiency.

The time-domain ADCs show advantages in a high-speed scenario, but they usually present a limited resolution of six bits when running over GS/s [41, 42] due to the mismatches between the time quantization steps. Higher resolution requires more time quantization steps, and it needs finer time steps with sub-gate delay for a certain conversion speed. Both the Vernier [9] and pulse shrinking [42] TDCs can achieve a sub-gate time resolution; however, a calibration for their highly nonlinear quantization steps is essential for the sub-2-ps condition, which often requires a known input condition with a large lookup table, thus inducing additional design complexity. Such time step variations over the PVT also impose a gain calibration when applied to multi-stage architectures.

This work reduces the interleaving factor of an eight-bit 10 GS/s ADC to four through a time-domain sub-channel ADC running at 2.5 GS/s. A  $16\times$  time interpolation-based TDC resolves in two steps while allowing both the interstage gain and the quantization step to avoid calibration over PVT variations.

### 5.1 Time-Interleaved Architecture Considerations

Prior arts [43, 44] demonstrate that an eight-bit ADC running at 10 GS/s requires 8–16 time-interleaved SAR ADC channels in FinFET and SOI technologies while



Fig. 31 (a) Time-interleaved SAR ADC with hierarchical sampling and (b) time-interleaved time-domain ADC

even >16 channels in a conventional planar CMOS process [45]. Without a hierarchical interleaving front-end, time-interleaved SAR ADCs allow a limited bandwidth due to their large input capacitance. The hierarchical sampler [46, 47] provides a solution to strengthen the ADC bandwidth (Fig. 31a), with the 16 channels divided into four groups, and each group is driven by a dedicated voltage buffer. The added buffers isolate the sampling capacitors  $C_S$  and the capacitor arrays  $C_{S,SAR}$  of the sub-SAR ADCs, thus improving the bandwidth but with additional noise and power penalty.

Utilizing the high-speed feature of the TDCs, a single TDC can achieve a conversion rate as high as fourfold that of SAR ADCs, thereby reducing the number of interleaving channels from 16 to only 4 (Fig. 31b). Besides, the VTC not only provides a voltage-to-time conversion but also acts as a sub-channel wideband buffer to isolate the sampling network and the quantizer, ensuring that the time-domain quantization has no impact on the sampling function. The VTCs also consume dynamic power and their inverter-based output stages only need to drive several time comparators in the TDCs with small loading, while the static voltage buffers in Fig. 31a face a heavy load from the capacitor array of the SAR ADCs and routings. Such difference guarantees the high-energy efficiency of the VTC-based sub-channel buffer at a high-speed scenario, while still maintaining the high bandwidth feature of the hierarchical sampling architecture.

### 5.2 Sub-time-Domain ADC Architecture

Even though the time-interleaved time-domain ADC has a number of benefits, designing an energy-efficient eight-bit 2.5 GS/s sub-channel ADC in the time domain is not a trivial task due to its complex conversion mechanism. While we can find in the literature that the discharging-based VTC is sufficient to meet the requirement, the TDC often is the bottleneck in terms of conversion speed and linearity. For an N-bit differential time-domain ADC with a 50% duty cycle clock, its maximum achievable conversion speed is

$$f_{\rm S,MAX} < \frac{1}{2^N} \cdot \frac{1}{T_{\rm LSB}} \tag{21}$$

where  $T_{\text{LSB}}$  is the least significant bit (LSB) time resolution. From Eq. (21), an eight-bit 2.5 GS/s TDC calls for a  $T_{\text{LSB}}$  of about 1.5 ps, which is a challenging number considering a >10 ps minimum gate delay in 65 nm CMOS.
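The bound in Eq. (21) is easy to evaluate in both directions. The short sketch below computes the largest time step allowed for a given resolution and sampling rate, and the highest sampling rate allowed for a given time step; the function names are illustrative.

```python
def max_t_lsb(n_bits, f_s_hz):
    """Upper bound on T_LSB from Eq. (21), in seconds."""
    return 1.0 / (2 ** n_bits * f_s_hz)


def max_sampling_rate(n_bits, t_lsb_s):
    """Upper bound on the conversion rate from Eq. (21), in samples/s."""
    return 1.0 / (2 ** n_bits * t_lsb_s)


bound_ps = max_t_lsb(8, 2.5e9) * 1e12
rate_gsps = max_sampling_rate(8, 1.375e-12) / 1e9
print(f"T_LSB bound at 8 bit, 2.5 GS/s: {bound_ps:.4f} ps")
print(f"rate bound at 8 bit, 1.375 ps step: {rate_gsps:.2f} GS/s")
```

For eight bits at 2.5 GS/s the bound evaluates to 1.5625 ps, so a 1.375 ps step (a 22 ps interval divided by 16) satisfies Eq. (21) with some margin.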

Two-stage TDCs [48] demonstrated an energy-efficient sub-gate time resolution with a reduced number of delay cells and time comparators; however, they require a time amplifier. The extra latency of the time amplifier compresses the TDC conversion period, thus preventing its application in a high-speed scenario. The PVT-sensitive gain of the time amplifier also necessitates extra calibration effort, bringing complexity. A two-stage TDC architecture by cascading a flash TDC and a



Fig. 32 (a) Block and (b) timing diagram of the interpolation-based eight-bit 2.5 GS/s time-domain ADC

Vernier TDC, introduced in [9], omitted the time amplifier but suffered from heavy nonlinear time steps with sub-2-ps  $T_{\text{LSB}}$ .

In this work, we present a time interpolation-based two-stage TDC with an uncalibrated  $T_{\rm LSB}$  of 1.375 ps to solve the abovementioned challenges. Figure 32a shows the block diagram of the eight-bit 2.5 GS/s time-domain ADC, which consists of a sample and hold block, a VTC-based sub-channel buffer, and an eight-bit two-stage TDC. The eight-bit two-stage TDC comprises a four-bit differential flash TDC with one-bit folding as the coarse stage and a five-bit single-ended  $16 \times$ interpolation-based TDC with one-bit redundancy as the fine stage. A time residue transfer logic connects two stages and converts the differential time residue to a single-ended time difference between  $R_{\rm F}$  and  $R_{\rm S}$  ( $R_{\rm F}$  and  $R_{\rm S}$  are the fast and the slow time residues, respectively) based on the quantization results in the coarse stage. We define the interstage gain between two stages by the  $16 \times$  interpolation factor in the fine stage, inherently without using a time amplifier, thus enabling a fast conversion rate. The 16× interstage gain shows PVT robustness, as the unit delay cells (22 ps) in both stages share the same topology. The interpolation operation also allows time quantization steps without calibration. The thermometer-to-binary (T2B) encoders in Fig. 32a adopt a multiplexer-based architecture [49], while the five-bit T2B encoder for the fine stage has a segmented topology based on three-bit T2B encoders to shorten the layout routing for its input data.

The sample and hold block in Fig. 32a adopts bootstrapped switches with cross-coupled compensation for high linearity. The single-ended sampling capacitance is only 45 fF to support a high input frequency, and the sampling time is 100 ps under 2.5 GS/s per channel (Fig. 32b). After sampling, the fully dynamic VTC [9] converts the sampled voltage into a time difference ( $S_P\langle 0\rangle$  and  $S_N\langle 0\rangle$ , also the inputs of the time residue transfer) through a pair of current sources and crossing detectors. Then, we quantize the generated time difference with the presented eight-bit two-stage TDC in the time domain.

### 5.3 16× Time Interpolation-Based TDC

The phase interpolation technique [50] is useful for a finer sub-gate time step. However, the strict requirement for the phase interpolator output slew rate to reduce the phase interpolation error brings high power consumption. Moreover, when cascading the  $2\times$  phase interpolator to achieve a large interpolation factor, the mixed use of interpolations from both the rising and falling edges in [50] renders extra interpolation errors and then limits the maximum achievable interpolation factor, especially for the sub-2-ps output time intervals.

Figure 33a presents the  $16\times$  time interpolator in this work with a four-layer architecture and balanced input and output loading, while we share the phase interpolators at the edges with the adjacent  $16\times$  time interpolators. All the phase interpolators in Fig. 33a share the same unit cell topology presented in Fig. 33b, which guarantees a good matching performance. The  $2\times$  phase interpolator in Fig. 33b consists of two parallel inverters (inputs are  $P_{11}$  and  $P_{12}$ , respectively) with the same dimension for interpolation purposes and a followed driver for inversion. Consequently, all the interpolations in Fig. 33a proceed at the rising edges, which provides better consistency, thus allowing a larger number of cascaded layers. The  $16\times$  time interpolation divides a 22 ps time interval (between  $R_i$  and



Fig. 33 Schematic of (a)  $16\times$  time interpolator with four-layer architecture and (b) unit phase interpolator cell. (c) Timing diagram of the  $16\times$  time interpolator



Fig. 34 The five-bit interpolation-based TDC

 $R_{i+1}$ , and a 10% to 90% rising time of ~20 ps) into 16 output time intervals with a resolution of 1.375 ps and an output rising time (10–90%) of ~10 ps (Fig. 33c). The circuit accomplished a 16× gain from the input to the output through the interpolation factor inherently, which is also the origin of the interstage gain between the coarse and fine stages in Fig. 32. The delay of the four cascaded phase interpolators causes latency (Fig. 33c).

The vertical cascading in Fig. 33a provides more time intervals with finer time resolution, while a horizontal cascading generates more time intervals and keeps the current time resolution (Fig. 34). The five-bit TDC in Fig. 34 cascades two 16× time interpolators with 32 output time intervals for the five-bit quantization, with a quantization range defined by the 22 ps delay cells. We added dummy delay cells and interpolators both before and after the quantization cells to shield the terminal effect and inserted only partial dummy interpolators for energy saving. We also inserted an extra delay to the  $R_S$  signal path in Fig. 34 for a latency matching purpose in Fig. 33c and time range shifting with its accuracy quite relaxed due to the one-bit redundancy in the five-bit interpolation TDC.

When compared with the "minus" operation used to obtain a sub-gate time resolution in the Vernier TDC [9], the "division" operation of the interpolation TDC shows better accuracy and robustness under fabrication impairments. For example, using a 23.375 ps slow delay and a 22 ps fast delay for a 1.375 ps time resolution in a Vernier TDC, 2% variations ($\sigma$) applied to both the slow and fast delays generate a 46.7% variation ($\sigma$) in the 1.375 ps time step. On the other hand, a 2% variation ($\sigma$) of the 22 ps delay cells in Fig. 34 is divided into 16 parts, with the interpolators adding extra errors (<15% in total in this work). Figure 35 illustrates behavioral simulations of the SNDR of an eight-bit two-stage TDC with either a five-bit Vernier TDC or the presented 16× interpolation-based TDC as the fine stage versus their unit delay variations. The x-axis in Fig. 35 represents the variations ($\sigma$) of the slow and fast time steps in the Vernier TDC and of the time steps in the interpolators of the interpolation TDC. For the interpolation TDC, we investigated two behavioral models: the first has a unified time step variation to reflect the impact of the random mismatch between the unit phase interpolators; the second has time step variations that are proportional to the input time intervals of different layers to explore the impact of interpolation errors from the cascaded architecture. The modeled eight-bit TDC has a four-bit flash TDC as the coarse stage with a 1% one-sigma quantization step variation (which agrees with the transistor-level simulation results). From Fig. 35, the presented $16\times$ interpolation-based TDC relaxes the requirement of fine-stage delay units significantly when compared with the Vernier TDC. With a 50% one-sigma fine-stage variation (relative to 1.375 ps) and a 1% one-sigma coarse-stage variation (relative to 22 ps), the presented eight-bit two-stage TDC achieved mean SNDRs of around 43 dB and 40 dB with the unified and input-related variation models, respectively, in the 100-run Monte Carlo simulation. Therefore, the $16\times$ time interpolation technique saves this design not only from the interstage calibration but also from the complex time quantization step calibration.
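The 46.7% figure quoted for the Vernier TDC follows from simple error propagation, sketched below. The sketch assumes the slow and fast delay errors are independent and Gaussian, so their standard deviations add in quadrature; the helper names are hypothetical. For the interpolation case it models only the ideal division of a delay cell and leaves out the extra interpolator errors mentioned in the text.

```python
import math


def vernier_step_rel_sigma(slow_ps, fast_ps, rel_sigma):
    """Relative sigma of the Vernier step (slow - fast).

    Assumption: the two delay errors are independent, so their absolute
    sigmas add in quadrature while the nominal step is only the difference.
    """
    sigma_slow = rel_sigma * slow_ps
    sigma_fast = rel_sigma * fast_ps
    sigma_step = math.hypot(sigma_slow, sigma_fast)
    return sigma_step / (slow_ps - fast_ps)


def divided_step_rel_sigma(rel_sigma):
    """Relative sigma of an ideally divided step.

    Dividing a delay by a fixed factor scales the error and the nominal
    value together, so the relative error is unchanged (interpolator
    errors are not modeled here).
    """
    return rel_sigma


vernier = vernier_step_rel_sigma(slow_ps=23.375, fast_ps=22.0, rel_sigma=0.02)
divided = divided_step_rel_sigma(rel_sigma=0.02)
print(f"Vernier step sigma: {100 * vernier:.1f}%")   # about 46.7%
print(f"Divided step sigma: {100 * divided:.1f}%")   # 2.0% before interpolator errors
```

The subtraction keeps the full absolute error of two ~22 ps delays but shrinks the nominal value to 1.375 ps, whereas the division scales both together.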

### 5.4 Low Metastability Time Residue Logic

The metastability property of the two-stage time-domain ADC shares the same error mechanisms as the pipelined ADCs [51]. The coarse stage dominates the metastability error performance with a larger magnitude, while a well-designed fine stage produces an error of only 1 LSB due to the flash conversion feature.

Figure 36 presents a time residue transfer scheme with an extended timing logic, in order to increase the maximum available time for the coarse-stage time comparators. The residue transfer unit consists of a pair of time residue folding and subtraction logics for the differential time residue generation and a pair of dynamic OR gates for converting the above differential time residue to single-ended to fit the following single-ended interpolation-based TDC in the fine stage. During this process, we shifted the selector output by one coarse-stage time quantization step


Fig. 36 (a) Block diagram of the time residue folding and subtraction. (b) Extended timing for the comparator decision. (c) Time residue output versus time input

$T_{D,Coarse}$ and embedded extra delay units $T_{D,Meta}$ (~65 ps) in the time subtraction signal path (Fig. 36b). With these timing-extended units, the maximum available time for the coarse-stage time comparators increases from less than 22 ps to over 100 ps, thereby lowering the metastability error rate from the coarse stage. Moreover, we also introduced pipelined timing at 2.5 GS/s to allocate a longer decision period for the coarse and fine stages.



Fig. 38 Measured DNL and INL of the aggregate channel

### 5.5 Measurement Results

Figure 37 shows the chip die micrograph of the prototype eight-bit 10 GS/s time-domain ADC, fabricated in a 1P9M 65 nm CMOS process and occupying an active area of 0.095 mm<sup>2</sup>. We bonded the chip die on a printed circuit board with <0.5 mm critical bonding wires. We adopted a high-speed ADC measurement strategy with input amplitude and phase monitoring, and applied a one-time foreground calibration to remove the offset of the VTCs for a better dynamic range. The gain, the remaining offset (together with the residue transfer offset), and the time skew between different channels are removed in the background, with the calibration enabled occasionally to track the slow drift of the ambient temperature and supply voltage. We do not apply any calibration related to the time quantization steps and interstage gain, benefiting from the 16× interpolation-based two-stage architecture.



Fig. 39 Measured ADC output spectra at 10 GS/s with Nyquist (5 GHz) and over Nyquist (18.1 GHz) input frequencies (decimated by  $375\times$ )



Figure 38 displays the measured DNL and INL performance of the aggregate channel; the eight-bit ADC shows DNL and INL errors within -0.69/+0.58 LSB and -1.02/+1.25 LSB, respectively. Figure 39 presents 8192-point output spectra of the ADC with both Nyquist and over Nyquist input frequencies. The 10 GS/s ADC exhibits 40.1 dB SNDR and 52.8 dB spurious-free dynamic range (SFDR) with a Nyquist input, while keeping 37.6 dB SNDR and 46.7 dB SFDR with an 18.1 GHz input, benefiting from the time-domain time-interleaved front-end architecture. Figure 40 plots the measured SNDR and SFDR of the ADCs with various input



Fig. 41 Measured SNDR and SFDR versus ambient temperature and power supply variation



frequencies sampling at 10 GS/s; the SNDR drop at 18.1 GHz input is less than 3 dB when compared with the 40.5 dB SNDR at 0.13 GHz input.

Figure 41 illustrates the measured PVT robustness; two verified chips both show <0.5 dB and <0.6 dB SNDR variation across a −55 °C to 125 °C temperature range and $\pm5\%$ power supply variations, respectively, when set to a common output swing. The better

Fig. 42 Measured ADC metastability error rate

|                                       | This work         | [41] JSSC 16     | [42] CICC 19     | [52] ISSCC 14    | [47] JSSC 14     | [44] VLSI 13 | [43] JSSC 17 | [53] JSSC 14 |
|---------------------------------------|-------------------|------------------|------------------|------------------|------------------|--------------|--------------|--------------|
| ADC architecture                      | TI time domain    | TI time domain   | TI time domain   | TI SAR           | TI SAR           | TI SAR       | TI SAR       | TI flash     |
| Technology                            | 65 nm CMOS        | 65 nm CMOS       | 65 nm CMOS       | 28 nm SOI        | 65 nm CMOS       | 32 nm SOI    | 16 nm FinFET | 32 nm SOI    |
| Resolution [bits]                     | 8                 | 6                | 6                | 6                | 6                | 8            | 8            | 6            |
| Sampling speed [GS/s]                 | 10                | 10               | 10               | 10               | 12.8             | 8.8          | 28           | 20           |
| Number of channels                    | 4×                | 4×               | 2×               | 8×               | 32×              | 8×           | 32×          | 8×           |
| Supply voltage [V]                    | 1.0               | 1.3              | 1.0              | 1.0              | 1.2/1.1          | 1.0          | 0.9          | 0.9          |
| Power [mW]                            | 50.8              | 98.0             | 29.7             | 32.0             | 162.0            | 35.0         | 280.0        | 69.5         |
| Active area [mm<sup>2</sup>]          | 0.095             | 0.073            | 0.015            | 0.009            | 0.232            | 0.025        | N/A          | 0.250        |
| SNDR @ Nyq. [dB]                      | 40.1              | 27.2             | 32.5             | 33.8             | 29.5**           | 37.0         | 31.5         | 30.7         |
| SFDR @ Nyq. [dB]                      | 52.8              | 42.1             | 40.7             | 41.1             | 32.4**           | 48.8         | 39.1         | 38.1         |
| SNDR @ >Nyq. [dB]                     | 37.6 @ 18.1 GHz   | 24.6* @ 7.0 GHz  | 26.0* @ 8.0 GHz  | 29.7 @ 19.8 GHz  | 26.4 @ 25.0 GHz  | N/A          | N/A          | N/A          |
| SFDR @ >Nyq. [dB]                     | 46.7 @ 18.1 GHz   | 32.3* @ 7.0 GHz  | 28.5* @ 8.0 GHz  | 46.1 @ 19.8 GHz  | 29.6 @ 25.0 GHz  | N/A          | N/A          | N/A          |
| FOM<sub>Walden</sub> [fJ/conv.-step]  | 61.5              | 523.7            | 86.2             | 80.4             | 519.0**          | 68.9         | 325.7        | 124.1        |
| FOM<sub>Schreier</sub> [dB]           | 150.0             | 134.3            | 144.8            | 145.7            | 135.5**          | 148.0        | 138.5        | 142.3        |

Table 4 Performance summary and comparison with state-of-the-art time-interleaved ADCs

\*Estimated from the input frequency sweep results.

\*\*Performance with 3.12-GHz input.
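The two figures of merit in Table 4 can be cross-checked from the power, sampling rate, and SNDR rows. The sketch below uses the commonly quoted Walden and Schreier definitions and assumes they are evaluated with the Nyquist-input SNDR and a bandwidth of half the sampling rate; the chapter does not spell out its exact definitions, so this is an assumption, although it reproduces the "This work" column to within rounding. The function names are hypothetical.

```python
import math


def enob(sndr_db):
    """Effective number of bits from SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


def fom_walden_fj(power_w, fs_hz, sndr_db):
    """Walden FoM in fJ/conversion-step: P / (2^ENOB * fs)."""
    return power_w / (2 ** enob(sndr_db) * fs_hz) * 1e15


def fom_schreier_db(power_w, fs_hz, sndr_db):
    """Schreier FoM in dB: SNDR + 10*log10(BW / P), with BW = fs / 2 (assumption)."""
    return sndr_db + 10 * math.log10((fs_hz / 2) / power_w)


# "This work" column of Table 4: 50.8 mW, 10 GS/s, 40.1 dB SNDR at Nyquist.
walden = fom_walden_fj(power_w=50.8e-3, fs_hz=10e9, sndr_db=40.1)
schreier = fom_schreier_db(power_w=50.8e-3, fs_hz=10e9, sndr_db=40.1)
print(f"FoM_Walden   = {walden:.1f} fJ/conv.-step")   # table lists 61.5
print(f"FoM_Schreier = {schreier:.1f} dB")            # table lists 150.0
```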

performance at a higher temperature and lower power supply is due to larger time quantization steps at these conditions.

Figure 42 shows the measured metastability error rate of a one-channel time-domain ADC. The measured curve matches well with a behaviorally modeled erfc curve with 42 dB SNDR that considers thermal and quantization noise and DNL errors. The measured curve shows an error rate corner of $<10^{-8}$, benefiting from the metastability-reduced residue transfer logic.
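To give a feel for what an erfc-shaped error-rate curve looks like, the sketch below computes the probability that a zero-mean Gaussian error exceeds a given magnitude threshold. It is only an illustration under stated assumptions: all noise and DNL contributions are lumped into a single Gaussian whose RMS value follows from a full-scale sine at 42 dB SNDR in an eight-bit converter, and the x-axis is taken to be an error threshold in LSB. The authors' behavioral model is not described in detail, so the function names and this mapping are hypothetical.

```python
import math


def noise_sigma_lsb(n_bits, sndr_db):
    """RMS error in LSB for a full-scale sine at the given SNDR (assumption)."""
    signal_rms = (2 ** n_bits / 2) / math.sqrt(2)  # sine amplitude = half the range
    return signal_rms / 10 ** (sndr_db / 20)


def error_rate(threshold_lsb, sigma_lsb):
    """Probability that a zero-mean Gaussian error exceeds the threshold in magnitude."""
    return math.erfc(threshold_lsb / (sigma_lsb * math.sqrt(2)))


sigma = noise_sigma_lsb(n_bits=8, sndr_db=42.0)
for threshold in (1, 2, 3, 4, 5):
    print(f"|error| > {threshold} LSB: {error_rate(threshold, sigma):.2e}")
```

In such a plot, a measured curve that stops following the Gaussian prediction and flattens at large thresholds indicates errors that are not noise-like, such as metastability events.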

Table 4 summarizes the performance of the presented time-domain ADC and compares it with other state-of-the-art time-interleaved ADCs. The presented ADC increases the resolution of the time-domain ADC to eight bits with <1.5 ps uncalibrated quantization steps. When compared with the time-interleaved SAR ADCs, this design uses fewer channels for a target speed and consumes considerably less power than the designs with a hierarchical sampler. This time-domain ADC attains a sub-channel conversion speed comparable with that of the flash ADC, but with higher resolution and smaller input capacitance.

# References

- 1. Louwsma, S., van Tuijl, E., & Nauta, B. (2011). *Time-interleaved analog-to-digital converters*. Springer.
- 2. Ali, A. M. A., et al. (2014, December). A 14 bit 1 GS/s RF sampling pipelined ADC with background calibration. *IEEE Journal of Solid-State Circuits*, 49(12), 2857–2867.
- 3. Lagos, J., Hershberg, B. P., Martens, E., Wambacq, P., & Craninckx, J. (2019, March). A 1-GS/s, 12-b, single-channel pipelined ADC with dead-zone-degenerated ring amplifiers. *IEEE Journal of Solid-State Circuits*, 54(3), 646–658.
- 4. Lee, C. C., & Flynn, M. P. (2010, June). A 12b 50MS/s 3.5mW SAR assisted 2-stage pipeline ADC. In *Proceedings of the IEEE Symposium on VLSI Circuits (VLSIC)* (pp. 239–240).
- 5. Zhu, Y., Chan, C., Sin, S., Seng-Pan, U., & Martins, R. P. (2012, June). A 34fJ 10b 500 MS/s partial-interleaving pipelined SAR ADC. In *Proceedings of the IEEE Symposium on VLSI Circuits (VLSIC)* (pp. 90–91).
- 6. Zhang, M., Noh, K., Fan, X., & Sánchez-Sinencio, E. (2017, November). A 0.8–1.2 V 10–50 MS/s 13-bit subranging pipelined-SAR ADC using a temperature-insensitive time-based amplifier. *IEEE Journal of Solid-State Circuits*, 52(11), 2991–3005.
- 7. Jiang, W., Zhu, Y., Zhang, M., Chan, C.-H., & Martins, R. P. (2020, February). A temperature-stabilized single-channel 1-GS/s 60-dB SNDR SAR-assisted pipelined ADC with dynamic Gm-R-based amplifier. *IEEE Journal of Solid-State Circuits*, 55(2), 322–332.
- 8. Zheng, Z., et al. (2022, June). A 3.3-GS/s 6-b fully dynamic pipelined ADC with linearized dynamic amplifier. *IEEE Journal of Solid-State Circuits*, 57(6), 1673–1683.
- 9. Zhang, M., Chan, C.-H., Zhu, Y., & Martins, R. P. (2019, December). A 0.6-V 13-bit 20-MS/s two-step TDC-assisted SAR ADC with PVT tracking and speed-enhanced techniques. *IEEE Journal of Solid-State Circuits*, 54(12), 3396–3409.
- 10. Zhang, M., Zhu, Y., Chan, C.-H., & Martins, R. P. (2020, December). An 8-bit 10-GS/s 16× interpolation-based time-domain ADC with <1.5-ps uncalibrated quantization steps. *IEEE Journal of Solid-State Circuits*, 55(12), 3225–3235.
- 11. Harpe, P., Cantatore, E., & van Roermund, A. (2013, December). A 10b/12b 40 kS/s SAR ADC with data-driven noise reduction achieving up to 10.1b ENOB at 2.2 fJ/conversion-step. *IEEE Journal of Solid-State Circuits*, 48(12), 3011–3018.
- 12. Vaz, B., et al. (2017, February). 16.1 A 13b 4GS/s digitally assisted dynamic 3-stage asynchronous pipelined-SAR ADC. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers* (pp. 276–277).
- 13. Sehgal, R., Van Der Goes, F., & Bult, K. (2018, July). A 13-mW 64-dB SNDR 280-MS/s pipelined ADC using linearized integrating amplifiers. *IEEE Journal of Solid-State Circuits*, 53(7), 1878–1888.
- 14. Iroaga, E., & Murmann, B. (2007, April). A 12-bit 75-MS/s pipelined ADC using incomplete settling. *IEEE Journal of Solid-State Circuits*, 42(4), 748–756.
- 15. Huang, H., Sarkar, S., Elies, B., & Chiu, Y. (2017, February). A 12b 330MS/s pipelined-SAR ADC with PVT-stabilized dynamic amplifier achieving <1dB SNDR variation. In *IEEE International Solid-State Circuits Conference – (ISSCC) Digest of Technical Papers* (pp. 472–473).
- 16. Hershberg, B., Weaver, S., Sobue, K., Takeuchi, S., Hamashita, K., & Moon, U. (2012, December). Ring amplifiers for switched capacitor circuits. *IEEE Journal of Solid-State Circuits*, 47(12), 2928–2942.
- 17. Hershberg, B., et al. (2019, February). A 6-to-600MS/s fully dynamic ringamp pipelined ADC with asynchronous event-driven clocking in 16nm. In *IEEE International Solid-State Circuits Conference – (ISSCC) Digest of Technical Papers* (pp. 68–70).
- 18. Verbruggen, B., Craninckx, J., Kuijk, M., Wambacq, P., & Van der Plas, G. (2010, February). A 2.6mW 6b 2.2GS/s 4-times interleaved fully dynamic pipelined ADC in 40nm digital CMOS. In *IEEE International Solid-State Circuits Conference – (ISSCC) Digest of Technical Papers* (pp. 296–297).
- 19. Sepke, T., Holloway, P., Sodini, C. G., & Lee, H. (2009, March). Noise analysis for comparator-based circuits. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 56(3), 541–553.
- 20. Hashemi, S., & Razavi, B. (2014, August). A 7.1 mW 1 GS/s ADC with 48 dB SNDR at Nyquist rate. *IEEE Journal of Solid-State Circuits*, 49(8), 1739–1750.
- 21. Yu, L., Miyahara, M., & Matsuzawa, A. (2016, October). A 9-bit 1.8 GS/s 44 mW pipelined ADC using linearized open-loop amplifiers. *IEEE Journal of Solid-State Circuits*, 51(10), 2210–2221.
- 22. Demosthenous, A., & Panovic, M. (2005, September). Low-voltage MOS linear transconductor/squarer and four-quadrant multiplier for analog VLSI. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 52(9), 1721–1731.
- 23. Liu, C., Chang, S., Huang, G., Lin, Y., & Huang, C. (2010, June). A 1V 11fJ/conversion-step 10bit 10MS/s asynchronous SAR ADC in 0.18µm CMOS. In *Proceedings of the IEEE Symposium on VLSI Circuits (VLSIC)* (pp. 241–242).
- 24. Moon, K., et al. (2017, June). A 9.1 ENOB 21.7fJ/conversion-step 10b 500MS/s single-channel pipelined SAR ADC with a current-mode fine ADC in 28nm CMOS. In *Proceedings of the IEEE Symposium on VLSI Circuits (VLSIC)* (pp. C94–C95).
- 25. Kull, L., et al. (2017, February). 28.5 A 10b 1.5GS/s pipelined-SAR ADC with background second-stage common-mode regulation and offset calibration in 14nm CMOS FinFET. In *IEEE International Solid-State Circuits Conference – (ISSCC) Digest of Technical Papers* (pp. 474–475).
- 26. Oh, T., Venkatram, H., & Moon, U.-K. (2014, April). A time-based pipelined ADC using both voltage and time domain information. *IEEE Journal of Solid-State Circuits*, 49(4), 961–971.
- 27. Chen, Y.-J., Chang, K.-H., & Hsieh, C.-C. (2016, February). A 2.02–5.16 fJ/conversion step 10 bit hybrid coarse-fine SAR ADC with time-domain quantizer in 90 nm CMOS. *IEEE Journal of Solid-State Circuits*, 51(2), 357–364.
- 28. Muhlestein, J., Leuenberger, S., Sun, H., Xu, Y., & Moon, U.-K. (2017, April). A 73dB SNDR 20MS/s 1.28mW SAR-TDC using hybrid two-step quantization. In *Proceedings of the IEEE CICC* (pp. 2152–3630).
- 29. Zhu, S., Wu, B., Cai, Y., & Chiu, Y. (2018, April). A 2-GS/s 8-bit non-interleaved time-domain flash ADC based on remainder number system in 65-nm CMOS. *IEEE Journal of Solid-State Circuits*, 53(4), 1172–1183.
- 30. Lim, Y., & Flynn, M. P. (2015, December). A 1 mW 71.5 dB SNDR 50 MS/s 13 bit fully differential ring amplifier based SAR-assisted pipeline ADC. *IEEE Journal of Solid-State Circuits*, 50(12), 2901–2911.
- 31. Sanyal, A., & Sun, N. (2016, June). A 18.5-fJ/step VCO-based 0–1 MASH ΣΔ ADC with digital background calibration. In *Proceedings of the IEEE Symposium on VLSI Circuits (VLSIC)* (pp. 26–27).
- 32. Liu, C.-C., Huang, M.-C., & Tu, Y.-H. (2016, December). A 12 bit 100 MS/s SAR-assisted digital-slope ADC. *IEEE Journal of Solid-State Circuits*, 51(12), 2941–2950.
- 33. Ding, M., Harpe, P., Liu, Y.-H., Busze, B., Philips, K., & de Groot, H. (2017, February). A 46 μW 13 b 6.4 MS/s SAR ADC with background mismatch and offset calibration. *IEEE Journal of Solid-State Circuits*, 52(2), 423–432.
- 34. Devarajan, S., et al. (2017, December). A 12-b 10-GS/s interleaved pipeline ADC in 28-nm CMOS technology. *IEEE Journal of Solid-State Circuits*, 52(12), 3204–3218.
- 35. Ali, A. M. A., et al. (2020, December). A 12-b 18-GS/s RF sampling ADC with an integrated wideband track-and-hold amplifier and background calibration. *IEEE Journal of Solid-State Circuits*, 55(12), 3210–3224.
- 36. Chen, V. H., & Pileggi, L. (2013, June). An 8.5mW 5GS/s 6b flash ADC with dynamic offset calibration in 32nm CMOS SOI. In *Proceedings of IEEE Symposium on VLSI Circuits (VLSIC)* (pp. C264–C265).
- 37. Shu, Y. (2012, June). A 6b 3GS/s 11mW fully dynamic flash ADC in 40nm CMOS with reduced number of comparators. In *Proceedings of IEEE Symposium on VLSI Circuits (VLSIC)* (pp. 26–27).
- 38. Oh, D., Kim, J., Jo, D., Kim, W., Chang, D., & Ryu, S. (2019, January). A 65-nm CMOS 6-bit 2.5-GS/s 7.5-mW 8× time-domain interpolating flash ADC with sequential slope-matching offset calibration. *IEEE Journal of Solid-State Circuits*, 54(1), 288–297.
- 39. Verbruggen, B., Craninckx, J., Kuijk, M., Wambacq, P., & Van der Plas, G. (2010, October). A 2.6 mW 6 bit 2.2 GS/s fully dynamic pipeline ADC in 40 nm digital CMOS. *IEEE Journal of Solid-State Circuits*, 45(10), 2080–2090.
- 40. Akter, M. S., Sehgal, R., van der Goes, F., Makinwa, K. A. A., & Bult, K. (2018, October). A 66-dB SNDR pipelined split-ADC in 40-nm CMOS using a class-AB residue amplifier. *IEEE Journal of Solid-State Circuits*, 53(10), 2939–2950.
- 41. Zhu, S., Wu, B., Wu, B., Soppimath, K., & Chiu, Y. (2016, August). A skew-free 10 GS/s 6 bit CMOS ADC with compact time-domain signal folding and inherent DEM. *IEEE Journal of Solid-State Circuits*, 51(8), 1785–1796.
- 42. Hassanpourghadi, M., & Chen, M. S.-W. (2019, April). A 2-way 7.3-bit 10 GS/s time-based folding ADC with passive pulse-shrinking cells. In *Proceedings of the IEEE Custom Integrated Circuits Conference (CICC)* (pp. 1–4).
- 43. Frans, Y., et al. (2017, April). A 56-Gb/s PAM4 wireline transceiver using a 32-way time-interleaved SAR ADC in 16-nm FinFET. *IEEE Journal of Solid-State Circuits*, 52(4), 1101–1110.
- 44. Kull, L., et al. (2013, June). A 35 mW 8 b 8.8 GS/s SAR ADC with low-power capacitive reference buffers in 32 nm digital SOI CMOS. In *Proceedings of the IEEE Symposium on VLSI Circuits (VLSIC)* (pp. 260–261).
- 45. Cao, J., et al. (2017, February). A transmitter and receiver for 100Gb/s coherent networks with integrated 4×64GS/s 8b ADCs and DACs in 20nm CMOS. In *IEEE International Solid-State Circuits Conference – (ISSCC) Digest of Technical Papers* (pp. 484–485).
- 46. Greshishchev, Y. M., et al. (2010, February). A 40GS/s 6b ADC in 65nm CMOS. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers* (pp. 390–391).
- 47. Duan, Y., & Alon, E. (2014, August). A 12.8 GS/s time-interleaved ADC with 25 GHz effective resolution bandwidth and 4.6 ENOB. *IEEE Journal of Solid-State Circuits*, 49(8), 1725–1738.
- 48. Lee, M., & Abidi, A. A. (2008, April). A 9 b, 1.25 ps resolution coarse–fine time-to-digital converter in 90 nm CMOS that amplifies a time residue. *IEEE Journal of Solid-State Circuits*, 43(4), 769–777.
- 49. Sall, E., & Vesterbacka, M. (2004, November). A multiplexer based decoder for flash analog-to-digital converters. In *Proceedings of TENCON* (pp. 250–253).
- 50. Miyashita, D., Kobayashi, H., Deguchi, J., Kousai, S., & Hamada, M. (2011, June). A -104dBc/Hz in-band phase noise 3GHz all digital PLL with phase interpolation based hierarchical time to digital convertor. In *Proceedings of the IEEE Symposium on VLSI Circuits (VLSIC)* (pp. 112–113).
- 51. Hashemi, S., & Razavi, B. (2014, May). Analysis of metastability in pipelined ADCs. *IEEE Journal of Solid-State Circuits*, 49(5), 1198–1209.
- 52. Le Tual, S., Singh, P. N., Curis, C., & Dautriche, P. (2014, February). A 20GHz-BW 6b 10GS/s 32mW time-interleaved SAR ADC with master T&H in 28nm UTBB FDSOI technology. In *IEEE International Solid-State Circuits Conference – (ISSCC) Digest of Technical Papers* (pp. 382–383).
- 53. Chen, V. H.-C., & Pileggi, L. (2014, December). A 69.5 mW 20 GS/s 6b time-interleaved ADC with embedded time-to-digital calibration in 32 nm CMOS SOI. *IEEE Journal of Solid-State Circuits*, 49(12), 2891–2901.

# **High-Performance Oversampling ADCs**



Chi-Hang Chan, Yan Zhu, Liang Qi, Sai Weng Sin, Maurits Ortmanns, and Rui P. Martins

# 1 Introduction

Driven by the rapid development of IoE, wireless communication SoCs demand high power efficiency while simultaneously supporting wideband and high-resolution input/output signals for massive information throughput. At the receiver end, the performance bottleneck always lies in the analog-to-digital conversion process, where the analog-to-digital converters (ADCs) need to have a high dynamic range, low noise, and high energy efficiency. Nevertheless, such a mixed-signal building block does not enjoy all the benefits inherited from technology scaling, and designing it to meet all the above aspects becomes challenging.

C.-H. Chan · Y. Zhu · S. W. Sin (⊠)

State-Key Laboratory of Analog and Mixed-Signal VLSI/IME and FST-ECE, University of Macau, Macao SAR, China

e-mail: ivorchan@um.edu.mo; yanzhu@um.edu.mo; terryssw@um.edu.mo

L. Qi

Shanghai Jiao Tong University, Shanghai, China e-mail: qi.liang@sjtu.edu.cn

M. Ortmanns Ulm University, Ulm, Germany e-mail: maurits.ortmanns@uni-ulm.de

R. P. Martins

State-Key Laboratory of Analog and Mixed-Signal VLSI/IME and FST-ECE, University of Macau, Macao SAR, China

On leave from Instituto Superior Técnico, Lisbon, Portugal e-mail: rmartins@um.edu.mo

<sup>©</sup> The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Paulo da Silva Martins, P.-I. Mak (eds.), *Analog and Mixed-Signal Circuits in Nanoscale CMOS*, Analog Circuits and Signal Processing, https://doi.org/10.1007/978-3-031-22231-3\_5

Modern Wi-Fi and mobile communication standards call for an ADC with a bandwidth of a few tens to a hundred megahertz and a dynamic range close to 80 dB. The continuous-time (CT) delta-sigma converter inherently embeds anti-aliasing and is easier for the preceding block in the system to drive owing to its high-impedance input interface. A discrete-time (DT) delta-sigma modulator (DSM) conventionally shows no obvious advantage, but when deeply hybridized with a successive approximation register (SAR) ADC, a combination usually designated as a noise-shaping (NS) SAR, its energy efficiency surpasses that of CT DSMs at relatively low bandwidth targets. The work described in [1] introduced a multi-loop architecture to reduce the performance gap between CT and DT DSMs, increasing the noise-shaping order for a higher SQNR (signal-to-quantization-noise ratio) at a low oversampling ratio (OSR). A sturdy multistage noise-shaping (MASH) DSM, also designated as SMASH, together with an additional error feedback (EF) loop, can relax the matching requirement of the critical DAC. The preliminary sample and quantization technique in [2] provides extra quantization during the idle time in the excess loop delay (ELD) compensation period, pushing the single-loop CT DSM toward low power at wide bandwidth. On the other hand, we can also extend an NS SAR ADC into a multistage configuration as in [3] and [4]. Besides, with partial interleaving techniques, a bandwidth of tens of megahertz is possible. Furthermore, [3] also utilizes the residue amplifier in the pipeline architecture for EF-type NS. The N-0 MASH structure in [4] further relaxes the critical gain accuracy requirement in the NS pipeline SAR architecture.

The organization of this chapter is as follows: Section 2 introduces a SMASH CT DSM. Section 3 details the preliminary sample and quantization technique with the CT DSM. Then, Sects. 4 and 5 introduce two different NS pipeline SAR ADCs.

# 2 Sturdy Multistage Noise-Shaping (MASH) Continuous-Time (CT)-Delta-Sigma Modulator (DSM)

A large signal bandwidth required by wireless telecommunication applications restrains the OSR used for the DSMs. To obtain the desired resolution while keeping a better power efficiency, we need to explore the following two dimensions together in the design of the DSM.

On the one hand, the DSM should aim to secure an aggressive NS. However, a higher NS order suffers from instability in a single-loop topology. Alternatively, we can use the multistage noise-shaping (MASH) [1] architecture to overcome the instability issue. Nevertheless, noise leakage inherently exists in MASH DSMs, resulting from the mismatch between the analog and digital filters. A DT solution is more robust for the MASH than its CT counterpart, whereas the speed remains the bottleneck. Besides, it is necessary to improve the opamp efficiency in a DT MASH. Due to the inherent switching activity, we can apply an opamp sharing technique to reduce the number of required opamps in a DT DSM [5], reducing the power and area consumption. However, we can further improve the opamp sharing efficiency in a DT MASH [6]. The CT solution allows a larger signal bandwidth and a better power efficiency, whereas the noise leakage is detrimental in a CT MASH. The CT sturdy MASH (SMASH) [7] has a high potential to replace the CT MASH, as it exhibits a relaxed matching requirement to reduce the noise leakage.
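As a rough illustration of why a higher noise-shaping order is attractive when the OSR is restrained, the sketch below evaluates the textbook SQNR estimate for an ideal modulator with a pure differentiator noise transfer function $(1 - z^{-1})^L$ and an $N$-bit quantizer. This is a generic approximation that ignores stability limits, noise leakage, and thermal noise, and it becomes less accurate at very low OSR; the chosen OSR and quantizer resolution are hypothetical and are not taken from the designs discussed in this chapter.

```python
import math


def ideal_sqnr_db(order, osr, n_bits):
    """Textbook peak SQNR estimate for an L-th order, N-bit delta-sigma modulator.

    SQNR = 6.02*N + 1.76 + (20*L + 10)*log10(OSR) - 10*log10(pi^(2L) / (2L + 1))
    Assumes an NTF of (1 - z^-1)^L and white quantization noise.
    """
    return (6.02 * n_bits + 1.76
            + (20 * order + 10) * math.log10(osr)
            - 10 * math.log10(math.pi ** (2 * order) / (2 * order + 1)))


# Hypothetical example: 4-bit quantizer at a low OSR of 16.
for order in (1, 2, 3, 4):
    print(f"order {order}: {ideal_sqnr_db(order, osr=16, n_bits=4):.1f} dB")
```

Under these assumptions, each doubling of the OSR adds roughly $6L + 3$ dB, so at a fixed low OSR the remaining levers are the order $L$ and the quantizer resolution $N$, which correspond to the two dimensions discussed in this section.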

On the other hand, we can employ a multibit quantizer to improve the resolution further and mitigate the requirements of the loop filter. Yet, the multibit feedback DAC is nonlinear owing to the element mismatch, dictating a DAC linearization technique. We can perform the DAC linearization by using either a dynamic element matching (DEM) technique or calibrations. DT architectures favorably use the DEM when the sampling frequency is relatively low. But, with a multi-GHz sampling frequency in CT solutions, the DEM is neither feasible enough nor power friendly. Instead, it is common to find DAC calibrations in wideband CT DSMs to address the DAC nonlinearity issue. Nevertheless, the on-chip DAC calibration [8] requires additional power and area consumptions. Moreover, the off-chip DAC calibration [9] is not able to track the current source mismatch error over temperature variations, which is not desirable in a high-performance CT DSM.

#### 2.1 Related Prior Arts

To implement the large signal bandwidth required by the mainstream LTE-A cellular standard, the CT DSM is undoubtedly preferable. Besides, with the low supply voltage required in an advanced process, a SAR ADC as a quantizer is a potentially attractive alternative to a traditional flash ADC. However, a SAR ADC usually demands multiple clock cycles for quantization, thus requiring more timing headroom than a flash architecture. Therefore, most of the reported DSMs employing a SAR as the quantizer only attained a small signal bandwidth [10, 11].

The MASH DSM circumvents the stability problem associated with high-order noise shaping, but the CT MASH suffers from the mismatch between the analog and the digital filter. By feeding the second stage into the first loop, the SMASH provides a solution to this leakage issue. However, the quantization error extraction and cancellation remain challenging due to the delay and phase shift in a CT SMASH. The work [7] presents a CT SMASH for the first time, addressing the quantization error extraction and cancellation. Figure 1 presents the block diagram of a CT SMASH. As expected, the first quantizer introduces one propagation delay. Then, with a CT input and a delayed DT output, the correct extraction of the quantization noise $E_{q1}$ from the first quantizer becomes problematic. In Fig. 1, it is therefore necessary to generate the same delay for the CT input of the first quantizer in order to extract $E_{q1}$ correctly. Theoretically, we can employ a sample and hold circuit to obtain such a delay. However, with a multi-GHz sampling frequency, the sample and hold topology is not power friendly at all. Instead, the CT SMASH proposed in [7] uses a passive RC low-pass filter (LPF) to generate such a delay, which is feasible owing to the oversampling property. Moreover, to eliminate the quantization noise $E_{q1}$ effectively, we can select a first-order feedforward (FF) topology for the second loop to implement a unity-gain signal transfer function (STF) within the band of interest.

Nonetheless, the delay generated by the passive LPF is very sensitive to temperature and process variations. Any delay mismatch results in leakage of the signal and therefore may overload the second loop. Besides, the LPF attenuates the high-frequency components, thus introducing a second peaking in the STF, which is undesirable.
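The behavior of a passive RC delay can be sketched with the first-order low-pass response $H(j\omega) = 1/(1 + j\omega RC)$. The component values, frequencies, and the +20% spread below are purely hypothetical and are not taken from [7]; the sketch only illustrates that the in-band delay is approximately $RC$ (and so tracks any RC variation directly), while high-frequency components are attenuated.

```python
import math


def rc_lpf(freq_hz, r_ohm, c_farad):
    """Magnitude and phase delay of a first-order RC low-pass filter.

    H(jw) = 1 / (1 + j*w*R*C); the phase delay is atan(w*R*C) / w.
    """
    w = 2 * math.pi * freq_hz
    w_tau = w * r_ohm * c_farad
    magnitude = 1 / math.sqrt(1 + w_tau ** 2)
    phase_delay_s = math.atan(w_tau) / w
    return magnitude, phase_delay_s


R_NOM, C_NOM = 1e3, 100e-15          # hypothetical: 1 kOhm, 100 fF -> RC = 100 ps
for label, r in (("nominal", R_NOM), ("+20% R", 1.2 * R_NOM)):
    mag_in, delay_in = rc_lpf(50e6, r, C_NOM)    # in-band tone (hypothetical)
    mag_hf, delay_hf = rc_lpf(2e9, r, C_NOM)     # high-frequency tone (hypothetical)
    print(f"{label}: in-band delay {delay_in * 1e12:.1f} ps, gain {mag_in:.4f}; "
          f"2 GHz delay {delay_hf * 1e12:.1f} ps, gain {mag_hf:.3f}")
```

A sampled-data delay would not depend on R and C in this way, which is consistent with the motivation for a more robust extraction method.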

#### 2.2 Proposed CT Sturdy MASH with DAC Nonlinearity Tolerance

To reach a 50 MHz signal bandwidth for LTE-A applications, it is almost impossible to continue using a DT MASH architecture. Instead, a CT solution allows implementing such a large signal bandwidth. A CT sturdy MASH has a higher potential than a CT MASH owing to its relaxed matching requirements [7]. Nevertheless, as mentioned before, the quantization error extraction and cancellation are still challenging in the CT domain. Besides, DAC nonlinearity becomes problematic with a multi-GHz sampling frequency, since DEM is less effective at low OSR, while DAC calibrations require large power and area. Therefore, we present a CT SMASH DSM that is more robust in terms of quantization error extraction and cancellation. Moreover, it employs multibit quantization to gain good stability, relaxed dynamic requirements of the loop filter, and a large maximum stable amplitude (MSA) and out-of-band gain (OBG), while achieving high linearity without any linearization technique in the multibit DACs to reduce power and area costs.

To achieve the above design goals, we introduce a dual-stage noise-coupling-assisted CT SMASH DSM using 1.5bit and 4bit quantizers in the first and second stages, respectively [12, 13]. By effectively eliminating the 1.5bit quantization noise from the first stage, the SMASH DSM enjoys all the benefits provided by the multibit operation of a DSM. The noise-coupling (NC) technique [14] applied in the SMASH not only improves the noise-shaping order by one but also works as dithering for the highly tonal 1.5bit quantization noise and further reduces its in-band tone power. Meanwhile, an FIR filter [15] incorporated in the outermost feedback path reduces the out-of-band (OOB) noise power at the nonlinear DAC input to alleviate quantization noise folding. Both features significantly relax the requirement for a highly linear multibit DAC, thus circumventing any DAC linearization while delivering high linearity. Moreover, we employed a SAR ADC as the 1.5bit quantizer, which allows extracting the quantization error with a switched-capacitor (SC) circuit that is robust over process and temperature variations. Meanwhile, the selection of a zeroth-order architecture for the second loop eliminates the quantization error more accurately.

In practice, to speed up the whole feedback loop, we do not implement the subtraction in the digital domain with a digital adder. Instead, as Fig. 2 shows, we perform the subtraction in the analog domain through  $DAC_1$  and  $DAC_2$  in parallel in the front-end. The analog subtraction allows the ideal removal of  $E_{q1}$  inside the first loop filter. As a result, the number of bits of the quantizer used in the second loop alone determines the MSA and the OBG of the SMASH DSM. Since the OSR is low for wideband applications, we can advantageously use multibit quantization in the second quantizer to realize large MSA and OBG. Ideally, the resolution of the first-stage quantizer does not affect the performance of the SMASH. Still, a previous CT SMASH implementation [7] also applied multibit quantization in the first quantizer. In the MASH topology, using single-bit quantization in the first stage and multibit quantization in the later stages is advantageous for DAC linearity, owing to the intrinsic linearity of the single-bit DAC and the suppression of multibit DAC nonlinearity from the later stages by the digital cancellation logic. Still, the noise leakage from the single-bit first stage is much worse than multibit quantization noise leakage. This leakage argument is still valid for SMASH DSMs. More unfortunately, in contrast to the MASH, with DAC<sub>2</sub> fed back to the most sensitive input node of the overall DSM, using a multibit  $DAC_2$  would in any case require DAC linearization. This occurs because the single-bit quantization noise  $E_{q1}$  is highly tonal, rather than approximately white like multibit quantization noise. In



the SMASH, such highly tonal signal  $E_{q1}$  would go through the nonlinear DAC<sub>2</sub> located in the sensitive input front-end, thus introducing harmonic distortions. In addition, the large OOB quantization noise of the second stage mixed by the nonlinearity of DAC<sub>2</sub> would thereby increase the in-band noise floor. Consequently, the linearization of DAC<sub>2</sub> is compulsory in multibit quantization in the second stage of the SMASH. Thus, previous research work about the SMASH architecture did not find any advantage in combining an intrinsically linear single-bit quantization in the first stage with a multibit quantization in the second stage. Yet, this issue subsequently emerged in state-of-the-art SMASH DSMs, and we will address it in the proposed CT SMASH.

Figure 3 presents the overall CT SMASH DSM, which employs an underlying dual-loop architecture using 1.5bit/4bit quantizers in the first and second loops, respectively. As a result, the circuit processes an effective 4bit quantization noise inside the loop filter of the first stage, thus obtaining decent stability, large MSA, and OBG. Moreover, it applies a first-order noise coupling in the first loop. To account for the applied NC in the first loop and effectively eliminate the first-order noise-shaped  $E_{q1}$ , we must implement the corresponding filter  $(1 - z^{-1})$  after the extraction of  $E_{q1}$ . As Fig. 3 shows, we move this filter from the input (analog domain) of the second stage to its output (digital domain), which allows higher accuracy and increases the noise-shaping order of  $E_{q2}$  by one. In the backend, the combination of the two digital outputs  $V_1$  and  $V_2$  generates the final output  $V_{\rm O}$ , given by

$$V_{\rm O} = {\rm STF}_1 X + (1 - {\rm STF}_2)\, {\rm NTF}_1 (1 - z^{-1}) E_{q1} - {\rm NTF}_1 (1 - z^{-1}) E_{q2} \tag{1}$$
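As a rough illustration of why the accuracy of STF<sub>2</sub> matters in Eq. (1), the following minimal sketch evaluates how strongly the  $E_{q1}$  term is scaled for a given gain error. It assumes a frequency-independent error, i.e. STF<sub>2</sub> = 1 + `stf2_gain_error`, which is a simplification introduced here and not a model taken from the design; the function name is hypothetical.

```python
import math


def eq1_leakage_db(stf2_gain_error: float) -> float:
    """Scaling (in dB) of the E_q1 term in Eq. (1).

    Assumption: STF_2 = 1 + stf2_gain_error with no frequency dependence,
    so the coefficient (1 - STF_2) has magnitude |stf2_gain_error|.
    """
    if stf2_gain_error == 0:
        return float("-inf")  # perfect unity gain: E_q1 is fully cancelled
    return 20.0 * math.log10(abs(stf2_gain_error))


for err in (0.10, 0.01, 0.001):
    print(f"STF2 error {err:6.3f} -> E_q1 scaled by {eq1_leakage_db(err):6.1f} dB")
```

Under this simplified model, each tenfold improvement in the unity-gain accuracy of STF<sub>2</sub> lowers the leaked  $E_{q1}$  by 20 dB.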

With a 1.5bit quantizer in the first loop, any imperfect cancellation leaks 1.5bit quantization noise  $E_{q1}$  to the final output, thus deteriorating the final performance. To alleviate the 1.5bit quantization noise (QN) leakage, we must design a unity-gain STF<sub>2</sub> as accurately as possible. Therefore, instead of using a first-order feedforward topology as in [3] (Fig. 3), we select a zeroth-order topology for the second loop. Since no opamps are involved in the implementation of STF<sub>2</sub>, their finite GBWs no longer affect the accuracy of the unity-gain STF<sub>2</sub>. On the other hand, we can relax the GBWs of the opamps employed

**Fig. 3** The overall architecture of the proposed CT SMASH DSM

in integrators to be  $1 \times F_s$  ( $F_s$  is the sampling frequency of the DSM) with an SQNR loss of 3 dB in the proposed SMASH. However, in the general SMASH 3–1, we must keep such GBWs at least close to  $3.5 \times F_s$  to reduce the 1.5bit noise leakage and keep the system stable.

As a result, the 3–0 topology effectively realizes a SMASH architecture where the first stage provides the noise-shaping order while the second stage determines the effective quantization bit width. Therefore, overall it allows multibit loop filter scaling in the first stage yielding larger MSA and OBG, reduced dynamic loop filter requirements, etc.

The SMASH DSM with a 1.5bit/4bit combination in both quantizers allows intrinsic linearity in the 1.5bit DAC<sub>1</sub> while the 4bit DAC<sub>2</sub> is unavoidably nonlinear. It is noteworthy that using NC not only helps to increase the noise-shaping order but also works as dithering [10] for the highly tonal 1.5bit QN  $E_{q1}$ , thus significantly reducing its idle tones and harmonic spurs. Finally, the combination of the NC technique in the 1.5bit quantizer in the first stage and an FIR LPF allows the SMASH DSM to circumvent any linearization technique for the outermost multibit DAC<sub>2</sub>, with large in-band tones and large OOB QN highly suppressed before they are nonlinearly processed by the DAC<sub>2</sub>.

Figure 4 illustrates the overall schematic of the NC-assisted dual-loop 3–0 SMASH DSM, employing a 1.5bit successive approximation register (SAR) ADC and a 4bit flash ADC as the first and second quantizers, respectively. The input resistance is 250  $\Omega$  to satisfy the thermal noise requirement. All DACs use a nonreturn-to-zero (NRZ) trilevel topology, thus requiring fewer unit DAC cells as well as fewer preceding drivers. The first loop employs a third-order mixed feedforward (FF)/feedback (FB) topology, with an OBG of 2.3. The FF/FB combination separates the high-gain and high-speed requirements into the first and third integrators [11], respectively, which allows a better opamp power efficiency. Besides, a local resonator path generated by the first and second integrators introduces one zero in the NTF (noise transfer function) to further suppress the in-band noise. The two FF paths can effectively decrease the swings of the first and second integrators. To compensate the



Fig. 4 The overall schematic of the proposed CT SMASH DSM

introduced outermost two-tap FIR filter, we incorporate a simple FIR compensation filter  $F_{\rm C}(z)$  in the inner FB branches that restores the original NTF [12]. To compensate for process variation, the integration and NC capacitors are digitally programmable with a 4bit trimming accuracy, which can cover  $\pm 40\%$  RC variations.

By utilizing a 1.5bit SAR ADC for the first quantizer, the circuit naturally produces quantization noise  $E_{q1}$  on the summing node by the end of the charge redistribution [5]. Thereby, the capacitor ratio determines the extraction of the quantization noise  $E_{q1}$ , which is robust over process and temperature variations. After the SC extraction, the injection of the residue into the last integrator through an SC buffer generates the first-order NC branch [5]. Meanwhile, the SC buffer also directly drives the second stage. In order to close the SMASH loop, this architecture uses  $0.75T_S$  as an overall ELD. To compensate the ELD, we integrate a unity-gain zeroth-order path with one cycle delay [13] inside the 1.5bit SAR ADC.

This SMASH DSM renders a fourth-order 1.5bit quantization first loop combined with zeroth-order 4bit quantization second loop into an equivalently operating overall fourth-order DSM architecture with 4bit quantization. With an OSR of 12, it obtains an ideal SQNR of 90 dB.
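For orientation, the textbook peak-SQNR estimate for an  $L$ th-order modulator with a pure differentiator NTF,  $(1 - z^{-1})^L$ , and an  $N$ -bit quantizer can be evaluated as below. This is only a ballpark sketch: the formula ignores the OBG limit of 2.3 and the optimized NTF zero of the actual design, so it is not expected to reproduce the quoted 90 dB exactly. The function name is hypothetical.

```python
import math


def textbook_sqnr_db(order: int, bits: int, osr: float) -> float:
    """Peak SQNR estimate for NTF = (1 - z^-1)^order and a `bits`-bit quantizer.

    SQNR = 6.02*N + 1.76 + 10*log10((2L+1)/pi^(2L)) + (2L+1)*10*log10(OSR)
    """
    quantizer_term = 6.02 * bits + 1.76
    shaping_term = 10.0 * math.log10((2 * order + 1) / math.pi ** (2 * order))
    oversampling_term = (2 * order + 1) * 10.0 * math.log10(osr)
    return quantizer_term + shaping_term + oversampling_term


# Fourth-order shaping, 4bit quantization, OSR of 12, as stated in the text.
print(f"textbook estimate: {textbook_sqnr_db(4, 4, 12):.1f} dB")
```

The estimate lands within a few dB of the ideal SQNR quoted above, which is plausible given the differences between the idealized NTF and the implemented one.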

#### 2.3 Experimental Results

Figure 5 shows the chip micrograph of the CT SMASH DSM prototype fabricated in 28 nm CMOS with an active core area of 0.085 mm<sup>2</sup>.

With the modulator running at a sampling frequency of 1.2 GHz, Fig. 6 plots the measured 65k-point FFT output spectrum with a -1.6 dBFS input at 6 MHz. It exhibits the expected overall fourth-order noise-shaping slope. The measured SNDR (signal-to-noise and distortion ratio)/SNR (signal-to-noise ratio)/SFDR (spurious-free dynamic range) are 76.6 dB/77.5 dB/87.9 dB, respectively, over a 50 MHz signal bandwidth.

**Fig. 5** Chip micrograph of the prototype SMASH

Fig. 6 Measured FFT spectrum of the prototype DSM output

|                         | This work                        | [8]                             | [17]                            | [9]                            | [7]                            | [18]                            |
|-------------------------|----------------------------------|---------------------------------|---------------------------------|--------------------------------|--------------------------------|---------------------------------|
| Architecture            | SMASH, fourth-order, 1.5bit/4bit | Single-loop, fourth-order, 4bit | Single-loop, fourth-order, 7bit | Single-loop, sixth-order, 4bit | SMASH, fourth-order, 4bit/4bit | Single-loop, fourth-order, 1bit |
| Process (nm)            | 28                               | 28                              | 16                              | 65                             | 28                             | 40                              |
| Supply (V)              | 1.2/1.5                          | 1.16/1.5                        | 1/1.35/1.5                      | 1.2/1.8                        | 1.2/1.5                        | N/A                             |
| BW (MHz)                | 50                               | 50                              | 125                             | 45                             | 50                             | 40                              |
| $F_{\rm S}~({\rm GHz})$ | 1.2                              | 2                               | 2.15                            | 0.9                            | 1.8                            | 2.4                             |
| SNDR (dB)               | 76.6                             | 79.8                            | 71.9                            | 75.3                           | 74.6                           | 66.9                            |
| SFDR (dB)               | 87.9                             | 95.2                            | N/A                             | 83                             | 89.3                           | N/A                             |
| THD (dBc)               | -83.9                            | -94.1                           | -80                             | -78.1*                         | -79.9*                         | N/A                             |
| Power (mW)              | 29.2                             | 64.3                            | 54                              | 24.7                           | 78                             | 5.25                            |
| Area (mm<sup>2</sup>)   | 0.085                            | 0.25                            | 0.217                           | 0.16                           | 0.34                           | 0.02                            |
| DAC Cal.                | Without on-chip                  | With on-chip                    | With on-chip                    | With off-chip                  | With on-chip                   | Without on-chip                 |
| FoM<sub>S</sub> (dB)    | 168.9                            | 168.7                           | 165.5                           | 167.9                          | 162.7                          | 165.7                           |

Table 1 Performance summary and comparison with state-of-the-art CT DSMs

Table 1 summarizes the performance of the prototype and compares it with other state-of-the-art CT wideband DSMs. It obtains a competitive Schreier FoM of 168.9 dB. Using the proposed techniques, it exhibits high linearity without employing any DAC linearization technique. In contrast, other works [3–5, 14] using multibit quantization employ either on-chip or off-chip DAC calibrations. As Table 1 shows, when compared with such works, the power

and especially the area consumed by this prototype are much smaller. As for the single-bit DSM in [15], even though it also avoids DAC calibration, its resolution and MSA are much lower. Besides, it not only requires significantly smaller input-referred noise for the same SNR, but would also impose stricter noise requirements on the preceding circuitry of the transceiver.
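The Schreier figure of merit row of Table 1 can be cross-checked from the SNDR, bandwidth, and power rows using the usual definition  ${\rm FoM_S} = {\rm SNDR} + 10\log_{10}({\rm BW}/P)$ . The sketch below simply re-evaluates that formula on the tabulated values; the dictionary layout and names are chosen here for illustration.

```python
import math


def schreier_fom_db(sndr_db: float, bw_hz: float, power_w: float) -> float:
    """Schreier FoM: SNDR + 10*log10(BW / P), with BW in Hz and P in W."""
    return sndr_db + 10.0 * math.log10(bw_hz / power_w)


# (SNDR in dB, bandwidth in Hz, power in W), copied from Table 1.
TABLE1 = {
    "This work": (76.6, 50e6, 29.2e-3),
    "[8]": (79.8, 50e6, 64.3e-3),
    "[17]": (71.9, 125e6, 54e-3),
    "[9]": (75.3, 45e6, 24.7e-3),
    "[7]": (74.6, 50e6, 78e-3),
    "[18]": (66.9, 40e6, 5.25e-3),
}

for name, (sndr, bw, power) in TABLE1.items():
    print(f"{name:10s} FoM_S = {schreier_fom_db(sndr, bw, power):.1f} dB")
```

The recomputed values agree with the FoM<sub>S</sub> row of Table 1 to within rounding.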

# 3 A 100 MHz Bandwidth Continuous-Time Sigma-Delta Modulator with Preliminary Sampling and Quantization

Figure 7 presents a fourth-order CT  $\Delta\Sigma$  modulator architecture, where the modulator runs at 2 GHz with a bandwidth of 100 MHz, yielding an OSR of 10. We choose a cascade of integrators in feedforward (CIFF), as it requires no extra DACs in the modulator for feedback and ELD compensation and provides the best noise suppression in the succeeding integrators [1]. It also reduces the output swing of the first integrator, which can relax the linearity requirement of the first opamp in the LF. However, when compared with the cascade of integrators in feedback (CIFB) architecture, the CIFF requires an extra low-pass filter to alleviate the high STF peaking. The modulator realizes a fourth-order LF with three opamps. One of the opamps implements a single amplifier biquad (SAB) integrator to obtain a second-order transfer function, which reduces the power and the phase delay of the LF [19]. The SAB integrator also introduces a notch in the NTF to improve the SQNR, which is effective in low-OSR CT DSM designs.

The CT DSM employs preliminary sampling and quantization (PSQ) to obtain additional quantization from the QTZ (quantizer) backend, which runs at 2 GHz with six-bit resolution and utilizes almost 90% of the clock period. The QTZ



Fig. 7 The block diagram of the fourth-order CT  $\Delta\Sigma$  modulator with the coarse-fine QTZ

of the modulator consists of a three-bit two-step coarse QTZ and a four-bit SAR fine QTZ with one-bit error correction range. In order to avoid the extra power and latency introduced by the ELD DAC, we adopt a feedforward scheme. Figure 7 highlights the ELD compensation path, which includes the first and last integrators. The LF realizes the constant term in the ELD compensation path, equivalently acting as an active adder in the ELD compensation scheme [2]. Such ELD realization requires sufficiently high impulse response speed in the modulator, while inadequate speed leads to out-of-band peaking in the frequency domain and even instability [20]. Here, we design the unity-gain bandwidth (UGBW) of the first and last opamps to be  $4F_{\rm S}$ , higher than that of the second, with the ELD coefficient also over-designed, which ensures stability with high impulse response speed.

The three-bit digital tuning capacitors compensate the process variation of the RC integrator, covering  $\pm 25\%$  time constant variation. We adopted the nonreturn-to-zero (NRZ) current-steering DAC and a segmented structure [21] to reduce the clock jitter sensitivity [16] and the power as well as the area of the feedback DAC, respectively. The calibration of the DAC mismatch between the segment and the unit element occurs in the digital domain [22]. It involves three steps [19]: first, the evaluation of the DAC unit cell mismatch error in an offline procedure; second, the freezing and digital storage of the evaluated DAC error in the lookup table (LUT); and finally, a summation in the digital domain that corrects the output with the evaluated DAC error stored in the LUT. Based on the SNDR and SFDR targets, 13b final output codes are necessary to fulfill the accuracy. The estimated total power and area of the calibration, including memory, are  $\sim 1.4$  mW and  $\sim 0.008$  mm<sup>2</sup> in the adopted technology node, respectively. Under temperature variation with a long-channel device, both the threshold and current factor mismatches have only a weak dependency on the temperature [22], so the temperature-originated mismatch variation results from the  $g_m/I_d$ , where  $g_m$  and  $I_d$  are the transconductance and drain current of the MOSFET in the unit current cell, respectively. Simulation results based on the setup and sizing of this design show that such variation leads to a one-sigma mismatch of ~0.05% from -20 to 80 °C after calibration at 27 °C, still within the target requirement.
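The three-step LUT calibration described above can be sketched in a few lines. This is a minimal behavioral model under loud assumptions: a plain thermometer-coded DAC with 15 unit cells (not the segmented NRZ structure of the design), Gaussian unit mismatch with a hypothetical sigma, and an offline evaluation that measures each unit error exactly. All function names are illustrative.

```python
import random


def make_unit_errors(n_units: int, sigma: float, seed: int = 0) -> list[float]:
    """Hypothetical relative mismatch of each unit cell (nominal weight 1.0)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n_units)]


def dac_analog_output(code: int, unit_errors: list[float]) -> float:
    """Thermometer-coded DAC: the first `code` unit cells are switched on."""
    return sum(1.0 + err for err in unit_errors[:code])


def build_lut(unit_errors: list[float]) -> list[float]:
    """Steps 1 and 2: freeze the evaluated unit errors into a per-code table."""
    lut = [0.0]
    for err in unit_errors:
        lut.append(lut[-1] + err)
    return lut


def corrected_code(code: int, lut: list[float]) -> float:
    """Step 3: digital summation of the raw code and its stored DAC error."""
    return code + lut[code]


unit_errors = make_unit_errors(n_units=15, sigma=0.002)
lut = build_lut(unit_errors)
worst = max(
    abs(corrected_code(c, lut) - dac_analog_output(c, unit_errors))
    for c in range(len(lut))
)
print(f"worst residual after LUT correction: {worst:.2e} unit cells")
```

The point of the sketch is that the correction acts on the digital output rather than on the analog DAC: the corrected code tracks what the mismatched DAC actually fed back, so the mismatch no longer appears as an error between the two.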

The noise requirement determines the value of the input resistor (R1), which simultaneously decides the consumed current of the main DAC and the capacitance load of the first integrator, thus implying that the value of R1 induces a trade-off between the noise and power of the DAC as well as the opamp in the first integrator. Here, the target SNR is ~77 dB where the SQNR has close to a 10 dB margin. Based on such goal, the R and C values are 220  $\Omega$  and 2.5 pF, respectively, for the first integrator. Thus, the C<sub>DAC</sub> dissipated ~2.3 mA.

# 3.1 Preliminary Sampling and Quantization (PSQ)

A moderate OSR together with the number of quantization levels in the backend QTZ limits the swing variation of the QTZ input signal. Therefore, it is possible to resolve several more coarse bits during such a period while we can still cover the error



in the fine quantization. Under this circumstance, we can extend the conversion time of the QTZ while simultaneously keeping a reasonable ELD coefficient for the energy-efficient target. Figure 8 plots the QTZ input and the PSQ coarse-fine sample timing. The coarse QTZ samples and quantizes at the time between the fine QTZ sampling and the DAC feedback instant to obtain extra quantization bits. There is a time difference  $\Delta t_{FC}$  between the coarse and the fine QTZ sampling instants, which leads to a sampling error ( $\varepsilon_{SAM}$ ). In order to alleviate it, we should place the coarse sampling instant as close as possible to the fine one, which leaves only a short time available for the coarse QTZ. Therefore, there is a trade-off between the amount of  $\varepsilon_{SAM}$  and the extra quantization obtained in the coarse QTZ. Apart from  $\Delta t_{FC}$ , the modulator OBG, the LF frequency response, the input variation, and the resolution of the QTZ all affect the  $\varepsilon_{SAM}$ . We discuss its correction and other design considerations next.

Considerations about the fine sampling instant in the PSQ technique are similar to those in conventional techniques. As the CIFF architecture realizes the ELD compensation here, the trade-off among the fine QTZ conversion time, the stability, and the power consumption of the LF bounds the fine sampling instant. Figure 9 illustrates the relation between the SQNR of the modulator and the opamp bandwidth in the LF for different choices of the ELD coefficient, and it indicates the stability condition. Furthermore, since the fine QTZ has to cover the sampling error, we also need to consider its correction range. For instance, when the ELD coefficient is  $0.4T_s$ , the LF requires opamps with  $2F_s$  UGBW in order to keep the modulator stable. With one SAR cycle taking ~80 ps and an  $F_s$  of 2 GHz in the current design, the  $0.4T_s$  ELD only allows 2b conversion in the fine SAR QTZ, implying that we must resolve the remaining four bits during the coarse QTZ. Under this condition, the fine QTZ can only provide a small correction range for the sampling error, which eventually limits the coarse sampling instant location and reduces the robustness of the PSQ. On the other hand, a  $0.8T_s$  ELD coefficient requires a power-hungry wide-bandwidth opamp, which is not a good choice for an energy-efficient target. In the last case with  $0.65T_s$  ELD, the modulator allows a four-bit fine QTZ with 1b error correction, covering a 175 mV error range.
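The bit budgets quoted above follow from dividing the ELD window by the SAR cycle time. A minimal sketch of that arithmetic is shown below; it assumes that the whole ELD window is available for SAR cycles of exactly 80 ps (the text only gives ~80 ps) and ignores sampling and logic overhead, so it is a first-order estimate only. The function name and the picosecond arguments are chosen here for illustration.

```python
import math


def fine_sar_bits(eld_coeff: float, ts_ps: float = 500.0, sar_cycle_ps: float = 80.0) -> int:
    """Number of whole SAR cycles (one bit each) that fit in the ELD window.

    eld_coeff    -- ELD as a fraction of the sampling period T_s
    ts_ps        -- sampling period in ps (500 ps for F_s = 2 GHz)
    sar_cycle_ps -- assumed duration of one SAR cycle in ps
    """
    return math.floor(eld_coeff * ts_ps / sar_cycle_ps)


for eld in (0.4, 0.65, 0.8):
    window_ps = eld * 500.0
    print(f"ELD = {eld:.2f} T_s -> {window_ps:.0f} ps window, {fine_sar_bits(eld)} fine bit(s)")
```

With these assumptions, the  $0.4T_s$  case yields two fine bits and the  $0.65T_s$  case yields four, in line with the numbers in the text.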

Figure 10 displays the relation between the sampling error and the  $\Delta t_{FC}$ . We can observe that a shorter  $\Delta t_{FC}$  leads to a smaller sampling error but with less available



time for the coarse quantization. When  $\Delta t_{FC} = 0.125T_{\rm S}$ , it has a small sampling error but only allows one SAR cycle conversion (~80 ps) in the coarse ADC. With only one cycle available but 3b quantization, the only possible architecture is the flash, which requires seven comparators with offset calibrations and a ladder with static current. Consequently, the QTZ will occupy a large area, limiting the modulator speed. On the other hand, with a two-cycle available time, we can adopt a subranging architecture to save the power and calibration overhead of the pure flash architecture. In the three-cycle case, not only is the timing over  $1T_s$ , but the sampling error also exceeds the possible correction range. According to all the abovementioned considerations, we picked here a  $0.65T_s$  ELD with a  $\Delta t_{FC}$  of  $0.25T_s$ . In wireless communication systems, both the conventional and the PSQ QTZ can saturate with a large out-of-band (OB) blocker under the same STF. However, the PSQ induces one more concern from the sampling error. The sampling error (rms value) exceeds the correction range of the fine stage with a >300 MHz, 0 dBFS blocker signal. Yet, with a simple first-order low-pass (LP) filter, the QTZ remains stable at all frequencies (Fig. 11). The LP filter limits the blocker signal's amplitude at high frequency, ensuring that the sampling error is within the dedicated



Fig. 12 The output of the LF with ideal and real integrator in the zero crossing and half and peak of the sine wave, respectively

correction range of the fine quantizer. Therefore, to tolerate the OB blocker, the overhead is an LP filter that is often available from the ADC driver.

In the CIFF DSM of Fig. 7, the feedforward path in the LF compensates the ELD. During the DAC feedback, the LF experiences a step-response-like input. Restricted by the finite opamp bandwidth in the LF, the output deviates from its ideal value but eventually converges when the response becomes moderate during input tracking. From Fig. 12, when compared with the ideal case, the response of the LF in the CT DSM consists of two parts. The first is the BW-limited region, where the output of the LF is mainly dependent on the step response ability of the LF, thus leading to a difference  $\varepsilon_{\text{SAM}}$  between the ideal and the real responses. The second is the input-tracking region, where the output is mainly dependent on the transfer function of the LF. In the BW-limited region, the sampling error  $\varepsilon_{\text{SAM}}$  of the LF with a  $\Delta t_{FC}$  of  $0.25T_s$  becomes

$$\varepsilon_{\text{SAM}} \propto D_{\text{out}} (1 - z^{-1}) \left( e^{-t/\tau} - e^{-(t + 0.25T_s)/\tau} \right), \tag{2}$$

where  $D_{out}(1 - z^{-1})$  represents the difference between two consecutive output codes. In the CIFF topology, the  $D_{out}(1 - z^{-1})$  through the ELD compensation path directly affects the output of the LF, which is similar to the SC integrator. Therefore, the second part of Eq. (2) is the difference between two instants under the SC response, where  $\tau$  is the time constant of the LF that is inversely proportional to the bandwidth of the opamp in the LF. Finally, Eq. (2) indicates the total difference between two instants of the LF output, which is the sampling error in the proposed PSQ technique.
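To get a feel for the bracketed settling term of Eq. (2), the sketch below evaluates it for several opamp bandwidths. Two assumptions are introduced here and are not taken from the text: a single-pole relation  $\tau = 1/(2\pi \cdot {\rm UGBW})$ , and a hypothetical coarse sampling instant  $t = 0.1T_s$  after the DAC feedback edge. The function names are illustrative.

```python
import math

FS_HZ = 2e9            # sampling frequency of the modulator
TS_S = 1.0 / FS_HZ     # sampling period
DT_FC_S = 0.25 * TS_S  # coarse-to-fine sampling offset used in Eq. (2)


def tau_from_ugbw(ugbw_hz: float) -> float:
    """Assumed single-pole relation tau = 1 / (2*pi*UGBW); not from the text."""
    return 1.0 / (2.0 * math.pi * ugbw_hz)


def lf_settling_term(t_s: float, tau_s: float, dt_fc_s: float = DT_FC_S) -> float:
    """Bracketed term of Eq. (2): exp(-t/tau) - exp(-(t + dt_fc)/tau)."""
    return math.exp(-t_s / tau_s) - math.exp(-(t_s + dt_fc_s) / tau_s)


T_COARSE_S = 0.1 * TS_S  # hypothetical coarse sampling instant
for multiple in (2, 4, 8):
    tau = tau_from_ugbw(multiple * FS_HZ)
    print(f"UGBW = {multiple} F_s -> settling term {lf_settling_term(T_COARSE_S, tau):.4f}")
```

For this choice of  $t$ , the term shrinks quickly as the bandwidth grows, which matches the statement that a faster LF reduces the BW-limited part of the sampling error.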

Furthermore, the slope and the polarity of the input signal also affect the sampling error. Next, we use a sinusoidal input as an example to show their influence. The response of the LF leads to different  $\varepsilon_{SAM}$  when the input is at the peak and zero crossing. As Fig. 12 shows, the response polarity reverses at zero-crossing between the BW limited and the input-tracking regions. Then, the  $\varepsilon_{SAM}$  caused by the input variation and the LF finite response counteract with each other as indicated by the equation below:

$$\varepsilon_{\text{SAM}@cross} \propto |\varepsilon_{\text{input}}| - |D_{\text{out}}(1 - z^{-1})\left(e^{-t/\tau} - e^{-(t+0.25T_s)/\tau}\right)|, \qquad (3)$$

where we subtract the error originated by the input variation ( $\varepsilon_{input}$ ) from the LF response. When compared with the ideal integrator, the real LF experiences a smaller  $\varepsilon_{SAM}$  under this condition. We can also confirm this trend through the behavioral simulation results in Fig. 13. As the opamp bandwidth is inversely proportional to  $\tau$ , we plot the sampling error versus the bandwidth, which generalizes the required opamp bandwidth consideration. From there, the  $\varepsilon_{SAM@cross}$  increases with the opamp bandwidth and becomes closer to the ideal integrator condition. The  $\varepsilon_{SAM@cross}$  almost saturates when the UGBW of the opamp is close to  $15F_s$ , but the minimum  $\varepsilon_{SAM@cross}$  appears when that UGBW is ~3–4 $F_s$ . Figure 13 also shows the sampling error of the intermediate cases when the input of the QTZ is close to the one-fourth or

Fig. 13 The maximum sample error versus the bandwidth of the opamp in the zero crossing and half and peak of the sine wave, respectively



three-fourths location of the sine wave ( $\varepsilon_{SAM@half}$ ). The zero-crossing and peak conditions bound the resulting sampling error. Indeed, the signal behavior of the half-value case is similar to the zero-crossing case (Fig. 12), but with a different amount of error induced from the input-dependent part ( $\varepsilon_{input}$ ).

On the other hand, still in Fig. 12, the polarity of the response is the same between the BW limited and the input-tracking region at the peak. Then, the  $\varepsilon_{SAM}$  caused by the input variation and the LF finite response accumulate, which we can express by

$$\varepsilon_{\text{SAM@peak}} \propto |\varepsilon_{\text{input}}| + |D_{\text{out}}(1 - z^{-1}) \left( e^{-t/\tau} - e^{-(t + 0.25T_s)/\tau} \right)|, \tag{4}$$

where the  $\varepsilon_{input}$  adds to the LF response error. When compared with the ideal integrator, the real LF experiences a larger  $\varepsilon_{SAM@peak}$  under this condition. Similar to the zero-crossing condition, as the bandwidth of the opamp increases, the  $\varepsilon_{SAM@peak}$  also approaches the ideal integrator's response, as illustrated in Fig. 13. The  $\varepsilon_{SAM@peak}$  is at its minimum value when the UGBW of the opamp is  $>6F_s$ . Based on the above analysis, since  $\varepsilon_{SAM@cross}$  and  $\varepsilon_{SAM@peak}$  have different characteristics versus the integrator bandwidth, we need to consider both errors. In the current design, we choose a  $4F_s$  UGBW to balance the  $\varepsilon_{SAM}$  and opamp power with a margin for stability.
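The opposite behavior of Eqs. (3) and (4) can be illustrated with a toy sweep. The numbers below are hypothetical values in arbitrary units, chosen only to show the trend; they are not simulation results of the design, and the proportionality constants of the equations are set to one.

```python
def eps_sam_cross(eps_input: float, eps_lf: float) -> float:
    """Eq. (3): input-variation and LF-response errors counteract at the zero crossing."""
    return abs(eps_input) - abs(eps_lf)


def eps_sam_peak(eps_input: float, eps_lf: float) -> float:
    """Eq. (4): input-variation and LF-response errors accumulate at the peak."""
    return abs(eps_input) + abs(eps_lf)


EPS_INPUT = 0.30  # hypothetical input-variation error, arbitrary units

# The LF-response error shrinks as the opamp bandwidth grows (hypothetical values).
for eps_lf in (0.60, 0.45, 0.30, 0.15, 0.00):
    cross = eps_sam_cross(EPS_INPUT, eps_lf)
    peak = eps_sam_peak(EPS_INPUT, eps_lf)
    print(f"eps_lf = {eps_lf:4.2f}  |cross| = {abs(cross):4.2f}  peak = {peak:4.2f}")
```

In this sweep, the zero-crossing error magnitude passes through a minimum where the two contributions cancel and then rises toward the input-only value, while the peak error decreases monotonically toward the same value. This is qualitatively the behavior the text attributes to Fig. 13, and it suggests why both conditions need to be checked when picking the opamp bandwidth.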

#### 3.2 Measurement Results

Figure 14 illustrates the die photo of the CT DSM, fabricated in 28 nm CMOS and occupying an active area of  $0.19 \text{ mm}^2$ . The power supplies of the QTZ and the NRZ DAC are 1.1 V and 1.5 V, respectively, for low-noise considerations. The other parts work under a 1 V supply. The sampling frequency of the modulator is 2 GHz with an OSR (oversampling ratio) of 10. We implemented the  $0.65T_s$  ELD and the  $0.25T_s$   $\Delta t_{FC}$  through the inverters' delay, which varies under PVT. Here, for best speed performance, we only tune the fine sampling instant. The bandwidth is 100 MHz. Figure 15 shows the output spectrum of the modulator with a -2 dBFS,  $1.4V_{pp}$  single-tone signal at ~18 MHz input frequency. The SNDR, SNR, and spurious-free dynamic range (SFDR) are 72.6 dB, 73.2 dB, and 83.6 dB, respectively, after the DAC mismatch calibration [23]. The 80 dB/decade spectral slope validates the fourth-order noise shaping realized by the SAB and two conventional integrators. The total power consumption is 16.3 mW, comprising 4.4 mW and 14.3 mW from the analog and digital circuits, respectively. The analog part comprises the opamps, DAC, and QTZ, and the digital part includes the clock generator, the logic buffer, and the control circuits. The first opamp consumes the largest power due to its stringent thermal noise requirement with a heavy load. While the second opamp should maintain enough bandwidth for the notch of the NTF, which influences the SQNR of the low-OSR design, it and the last opamp consume relatively less power, benefiting from their smaller load. The power



Fig. 14 Die photo



Fig. 15 Single-tone output spectrum

consumption of the 7b 2GS/s coarse-fine QTZ is 1.4 mW, only 8.6% of the total, benefiting from the PSQ technique-based two-step QTZ. The SAR directly uses the supply and ground as references; therefore, we did not adopt any reference buffer, and we include its power in the breakdown of the QTZ power. Table 2 summarizes the measured performance. The modulator achieves a peak SNDR of 72.6 dB and a DR of 76.3 dB, resulting in an excellent Schreier FoM of 170.5 dB (SNDR) or 174.2 dB (DR), and a Walden FoM of 23.4 fJ/conversion step.
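The figures of merit quoted for this modulator can be reproduced from the measured power, bandwidth, SNDR, and DR with the standard definitions. The sketch below uses the values listed in Table 2 (including the DR of 76.3 dB); the helper names are illustrative.

```python
import math

POWER_W = 16.3e-3  # total power
BW_HZ = 100e6      # signal bandwidth
SNDR_DB = 72.6     # peak SNDR
DR_DB = 76.3       # dynamic range as listed in Table 2


def enob(sndr_db: float) -> float:
    """Effective number of bits: (SNDR - 1.76) / 6.02."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_fj(power_w: float, bw_hz: float, sndr_db: float) -> float:
    """Walden FoM in fJ/conversion step: P / (2 * BW * 2^ENOB)."""
    return power_w / (2.0 * bw_hz * 2.0 ** enob(sndr_db)) * 1e15


def schreier_fom_db(metric_db: float, bw_hz: float, power_w: float) -> float:
    """Schreier FoM: metric (SNDR or DR, in dB) + 10*log10(BW / P)."""
    return metric_db + 10.0 * math.log10(bw_hz / power_w)


print(f"ENOB            = {enob(SNDR_DB):.2f} bit")
print(f"Walden FoM      = {walden_fom_fj(POWER_W, BW_HZ, SNDR_DB):.1f} fJ/conv.step")
print(f"Schreier (SNDR) = {schreier_fom_db(SNDR_DB, BW_HZ, POWER_W):.1f} dB")
print(f"Schreier (DR)   = {schreier_fom_db(DR_DB, BW_HZ, POWER_W):.1f} dB")
print(f"QTZ power share = {1.4 / 16.3 * 100:.1f} %")
```

The recomputed values agree with the reported 23.4 fJ/conversion step, 170.5 dB, 174.2 dB, and 8.6% to within rounding.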

|                      | This work   |
|----------------------|-------------|
| Area (mm²)           | 0.019       |
| Technology (nm)      | 28          |
| OSR                  | 10          |
| Fs (GHz)             | 2           |
| Bandwidth (MHz)      | 100         |
| Power (mW)           | 16.3        |
| Peak SNDR (dB)       | 72.6        |
| DR (dB)              | 76.3        |
| FOMSch/SNDR (dB)     | 170.5       |
| FOMSch/DR (dB)       | 174.2       |
| FoMWa (fJ/conv.step) | 23.4        |
| STF peak             | Yes(11.7dB) |

Table 2 Key performance summary

# 4 A 40 MHz Bandwidth Noise-Shaping Pipeline SAR ADC with 0-N MASH Structure

Figures 16a, b present the architecture of the proposed energy-efficient SAR-assisted NS pipeline ADC and its corresponding signal flow diagram, respectively. The main ADC comprises the first-stage SAR ADC (6b), the residue amplifier, and the second-stage SAR ADC (5b). We inserted one-bit redundancy between the two stages to tolerate the conversion error from the first stage. The ADC is partially interleaved in the first stage, where a coarse SAR ADC performs the conversion and the two fine DACs (Ch-1/Ch-2 DAC) generate the residue voltage alternately. Subsequently, the circuit transfers that residue voltage to the residue amplifier for residue amplification. The residue amplifier adopts an open-loop dynamic amplifier architecture for low power considerations. Similar to [3], we extract the full-resolution residue voltage of the second-stage SAR ADC after the end of the conversion. Then, the circuit feeds back the residue voltage to the residue amplifier, adding it to the input of the second stage in the next sampling phase. Consequently, the EF NS is completed, where the zero of the NTF relates to the EF residue gain of  $\alpha$  provided by the dynamic amplifier. However, the dynamic amplifier is more sensitive to PVT variation than the closed-loop residue amplifier in [3], leading to the  $\alpha$  variation and a degraded NTF. Thus, we add an extra FF path in the second stage, which enhances the NTF and compensates for the NS effect deterioration due to the gain variation in



Fig. 16 (a) Proposed energy-efficient SAR-assisted NS pipeline ADC architecture and (b) corresponding signal flow diagram

the residue amplifier, with the pole of NTF related to the FF residue gain of  $\beta$ . Consequently, the transfer function of the ADC with the EF-FF NS structure becomes

$$D_{\rm o}(z) = V_{\rm in}(z) + \left(1 - \frac{G}{G_{\rm d}}\right) Q_1(z) + \frac{1 - \alpha H_{\rm E}(z)}{1 + \beta H_{\rm F}(z)} \cdot \frac{Q_2(z)}{G_{\rm d}}. \tag{5}$$

There are several design considerations about the NTF in Eq. (5), elaborated in the following section.

We introduce two calibrations in this case, including the gain calibration for the dynamic amplifier and the proposed interstage offset calibration. Moreover, the DWA (data-weighted averaging) technique [24] handles the DAC mismatch. The DWA and the interstage offset calibration operate in the background with their hardware completely integrated on-chip. The gain calibration includes the on-chip PRN generator and the off-chip gain calibration logic (least mean square algorithm), both detailed later.

# 4.1 SAR-Assisted NS Pipeline ADC

To save power, we adopted a dynamic amplifier [24] in the residue amplifier, replacing the conventional static counterpart (Fig. 17). An extra input pair added to the dynamic amplifier realizes the voltage summation of the EF and the first-stage residue with dynamic power only. The transistor sizing ratio between the input and the EF residue pairs, set to  $G{:}\alpha$  (unity gain implemented by  $\alpha = 1$ ), allows a first-order NS in the ADC with the filter implemented as the unit delay  $H_{\rm E}(z) = z^{-1}$ . The separated bias currents of the two paths are  $I_{\rm b1}$  and  $I'_{\rm b1}$ , with the ratio also set as  $G{:}\alpha$  for a better gain ratio accuracy. Figure 17 also illustrates the operating sequence of the dynamic amplifier.

Although summing the EF residue through the extra input pair of the dynamic amplifier is power efficient, the gain ratio is sensitive to nonidealities, including input common-mode, PVT, and mismatch variations. Here, the first-stage SAR ADC determines the input common mode of the signal pair of the dynamic amplifier, and the adopted Vcm-based switching method [25] secures a stable common mode. However, the input common mode of the EF pair, defined by the output common mode of the dynamic amplifier, is sensitive to PVT and mismatch variation. According to simulation, the output common-mode voltage of the dynamic amplifier has a maximum variation of 50 mV, which alters $\alpha$ by 3%.

Due to the open-loop structure, the absolute values of G and  $\alpha$  vary greatly over PVT, but their ratio is less sensitive with the same type of transistors both in the input and the EF pairs. In Fig. 18a, we plot a 3000-run Monte Carlo simulation with the process corner variation showing that the maximum variation of the gain ratio between G and  $\alpha$  is within  $\pm 0.5\%$  when they are 8 and 1, respectively, while we



Fig. 17 Dynamic amplifier-based residue amplifier realizing EF NS and its operating sequence



Fig. 18 Monte Carlo simulation (3000 runs) of the variation of G:  $\alpha$  with (a) process corner and (b) mismatch variation effect

ensure the accuracy of *G* by background calibration [26] through tuning the current source of  $I_{b2}$ , with the proportional adjustment of  $\alpha$  simultaneously, thus maintaining a stable ratio regarding *G*. Unlike the PVT variation, the mismatch affects the values of *G* and  $\alpha$  independently, altering their gain ratio. Figure 18b shows the gain ratio variation under mismatch with a 3000-run Monte Carlo simulation, where the *G*: $\alpha$  has a maximum variation of  $\pm 11.5\%$ .

To summarize, considering all the above nonidealities and the worst condition where G and $\alpha$ have opposite $3\sigma$ variations, the gain ratio between G and $\alpha$ experiences a maximum variation of $\pm 27\%$. Therefore, with G well calibrated, $\alpha$ can, in the worst case, deviate by up to $\pm 27\%$ from its ideal value. Figure 19 displays the simulated SQNR of the ADC versus $\alpha$, based on a ten-bit SAR-assisted pipeline structure with first-order NS and OSR = 7.5. The SQNR drops about 4 dB when $\alpha$ varies by $\pm 27\%$. To avoid an extra calibration for $\alpha$, we present an enhanced NS structure with a mild hardware cost that compensates for the SQNR drop due to the variation of $\alpha$.
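The sensitivity to $\alpha$ can be estimated with a small Python sketch. It is a simplified model and not the behavioral simulation behind Fig. 19: it assumes that only white second-stage quantization noise matters, that it is shaped by $1 - \alpha z^{-1}$, and that the in-band power is integrated up to $\pi/\text{OSR}$. The function names are hypothetical.

```python
import math


def inband_noise_power(alpha, osr):
    """In-band power of white noise shaped by (1 - alpha*z^-1).

    Uses |1 - alpha*e^{-jw}|^2 = 1 + alpha^2 - 2*alpha*cos(w) and integrates
    it in closed form from w = 0 to w = pi/osr (unit noise density assumed).
    """
    w_band = math.pi / osr
    return (1.0 + alpha ** 2) * w_band - 2.0 * alpha * math.sin(w_band)


def sqnr_drop_db(alpha, osr=7.5):
    """SQNR loss in dB relative to the ideal alpha = 1 case."""
    ideal = inband_noise_power(1.0, osr)
    return 10.0 * math.log10(inband_noise_power(alpha, osr) / ideal)


for alpha in (0.73, 1.0, 1.27):
    print(f"alpha = {alpha:.2f}: SQNR drop = {sqnr_drop_db(alpha):.2f} dB")
```

In this idealized model, a $\pm 27\%$ error in $\alpha$ costs roughly 3 dB to 4 dB, the same order as the drop reported above.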

We add an extra residue FF path in the second stage (see the signal flow diagram in Fig. 16b). In this configuration, we further filter the sampled $V_{\text{res2}}$ with $H_{\text{F}}(z)$ and sum it with the second-stage DAC's output voltage in the comparator through an additional input pair. The comparator provides the FF residue gain of $\beta$ through ratio-sized input transistors [27]. For simplicity, we implement $H_{\text{F}}(z)$ as a one-cycle delay, $H_{\text{F}}(z) = z^{-1}$. Therefore, according to Eq. (5), the noise transfer function of the ADC with the EF-FF NS structure becomes

$$NTF(z) = \frac{1 - \alpha z^{-1}}{1 + \beta z^{-1}}.$$
 (6)

Ideally, with  $\beta$  set to 1, the additional pole leads to an extra 6 dB noise attenuation at a low frequency [28], compensating for the SQNR drop due to the  $\alpha$  variation. However, the pole must be inside the unit circle for stability, thereby requiring  $\beta < 1$ .



Fig. 19 Simulated SQNR versus  $\alpha$  in a ten-bit ADC structure with first-order NS ( $\alpha = 1$ ) and OSR = 7.5, where G = 8 is ideal



Fig. 20 Bode diagram of the NTF in the current design and its comparison

After accounting for the $3\sigma$ variation of $\alpha$ ($\pm 27\%$) and the maximum 4 dB SQNR drop, we set $\beta$ as 0.75.
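The effect of $\beta$ on Eq. (6) can be checked numerically. The sketch below evaluates only the denominator $1 + \beta z^{-1}$ at a low normalized frequency, which is the extra attenuation relative to a plain first-order NTF, and reports the pole radius. The function names and the evaluation frequency are assumptions made for illustration.

```python
import cmath
import math


def ff_attenuation_db(beta, f_norm=1e-3):
    """Extra low-frequency attenuation in dB contributed by 1/(1 + beta*z^-1).

    f_norm is the frequency normalized to the sampling rate (f / fs).
    """
    z_inv = cmath.exp(-2j * math.pi * f_norm)
    return 20.0 * math.log10(abs(1.0 + beta * z_inv))


def pole_radius(beta):
    """The pole of Eq. (6) lies at z = -beta, so its radius is |beta|."""
    return abs(beta)


for beta in (1.0, 0.75, 0.9):
    print(f"beta = {beta:.2f}: extra attenuation = {ff_attenuation_db(beta):.2f} dB, "
          f"pole radius = {pole_radius(beta):.2f}")
```

With $\beta = 1$ the denominator approaches 2 at low frequency (about 6 dB), while $\beta = 0.75$ gives $20\log_{10}(1.75) \approx 4.9$ dB and keeps the pole inside the unit circle.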

Figure 20 plots the Bode diagram of the NTF of the EF-FF NS structure. With the pole at 0.75, it still obtains an additional 5 dB noise suppression at low frequency when compared with the standard first-order NTF. Meanwhile, the NTF has a low magnitude at high frequency, leading to a good NS effect with a small OSR. As a result, the enhanced NTF enables both high resolution and wide BW performance. On the other hand, according to the 3000-run Monte Carlo simulation results of the second-stage comparator, the $3\sigma$ variation of $\beta$ is $\pm 20\%$, which implies that the pole can move out to 0.9, close to the unit circle, so the system potentially becomes unstable. Such a large variation only happens in the extreme case. In addition, the charge sharing between the feedforward capacitor and the parasitic capacitor attenuates the residue voltage and moves the pole away from the unit circle, which also helps to stabilize the ADC. Thus, according to the five measured samples and the post-layout Monte Carlo simulation result, the ADC is stable within a $3\sigma$ case. Although such variation can alter the pole's location in the NTF, it only slightly weakens the FF NS effect. Based on a ten-bit SAR-assisted NS pipeline ADC model, Fig. 21 illustrates the SQNR distribution of a 50-run Monte Carlo simulation under different NS realizations



Fig. 21 Simulated SQNR distribution under different NS configurations with process corner and mismatch variations

considering both the variation of $\alpha$ and $\beta$, with the interstage gain *G* and offset calibrated. Due to the relatively accurate gain of the closed-loop residue amplifier, its EF NS structure experiences the smallest SQNR variation. In contrast, the ADC with an open-loop dynamic amplifier has a widely spread SQNR distribution under corner and mismatch variations. Fortunately, the proposed EF-FF NS fully compensates for the SQNR drop, saving an extra calibration for the gain of the EF path.

To overcome the speed limitation of the SAR-assisted NS pipeline ADC in [29], mainly confined by the single-channel first stage, we introduce a duplicated channel, Ch-2, in the first stage to obtain a partial interleaving operation [29] (Fig. 22a). When Ch-1 performs the conversion at the *n*-th cycle, Ch-2 samples the input simultaneously. After the conversion, the residue amplifier amplifies the residue voltage in Ch-1; meanwhile, Ch-2 can still track the input. In the (n + 1)-th cycle, Ch-1 and Ch-2 swap roles, and this alternation repeats for the following samples. In this way, we remove the sampling operation from the critical path of the entire ADC conversion, thereby significantly speeding up the ADC. Furthermore, since the sampling time can now be as long as the first-stage conversion plus the amplification time, the tracking time of the sampler widens, greatly relaxing the design of the sample-and-hold circuit.
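The ping-pong operation described above can be illustrated with a small Python sketch. The function names are hypothetical, and the timing values in the usage lines are arbitrary units chosen only to show how sampling leaves the critical path; they are not taken from the design.

```python
def channel_roles(n_cycles):
    """List (cycle, converting channel, sampling channel) for two ping-pong channels."""
    roles = []
    for n in range(n_cycles):
        if n % 2 == 0:
            roles.append((n, "Ch-1", "Ch-2"))
        else:
            roles.append((n, "Ch-2", "Ch-1"))
    return roles


def cycle_time(t_sample, t_convert, t_amplify, interleaved):
    """Per-sample cycle time of the first stage.

    Without interleaving, sampling, conversion, and amplification run back to
    back. With two alternating channels, one channel samples while the other
    converts and drives the amplifier, so sampling is off the critical path.
    """
    if interleaved:
        return t_convert + t_amplify
    return t_sample + t_convert + t_amplify


for cycle, converting, sampling in channel_roles(4):
    print(f"cycle {cycle}: {converting} converts, {sampling} samples")

# Arbitrary illustrative durations (not from the source).
print(cycle_time(4, 10, 3, interleaved=False), cycle_time(4, 10, 3, interleaved=True))
```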

We add an extra coarse SAR ADC to further improve the conversion speed, with the timing diagram illustrated in Fig. 22b. With the coarse SAR, we can simplify the two-channel SAR ADCs to two-channel DACs. The coarse SAR quantizes the six MSBs (most significant bits) with its low-resolution DAC, resulting in high conversion speed and low switching power. The circuit transfers the MSB codes alternately to one of the two DACs, which generate the full-resolution residue voltage. Figure 22b presents the adopted DWA logic to shape the DAC mismatch error in both channel DACs. After the coarse SAR resolves three MSBs, we decode their binary form into a thermometer code and transfer it to one of the interleaving fine DACs through the DWA logic. Simultaneously, we continuously resolve in the coarse SAR



Fig. 22 Timing diagram of the first-stage ADC with (a) two-channel interleaving and (b) additional coarse SAR-assisted conversion

the remaining three LSBs (least significant bits) of the first stage. Due to the coarse SAR ADC, the DWA operation calls for no extra time slot and thereby does not slow down the ADC.
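As a generic illustration of data-weighted averaging, the following Python sketch rotates a pointer through the unit elements so that each element is used equally often over time. It assumes seven unit elements for the three thermometer-decoded MSBs and a single rotating pointer. The on-chip logic may differ, and the function name is hypothetical.

```python
def dwa_select(codes, n_elements=7):
    """Return, for each input code, the indices of the unit elements selected.

    Rotating-pointer DWA: each code picks the next `code` elements starting at
    the pointer, and the pointer then advances by `code` (modulo n_elements).
    """
    pointer = 0
    selections = []
    for code in codes:
        if not 0 <= code <= n_elements:
            raise ValueError(f"code {code} outside 0..{n_elements}")
        chosen = [(pointer + k) % n_elements for k in range(code)]
        selections.append(chosen)
        pointer = (pointer + code) % n_elements
    return selections


# Three consecutive 3-bit MSB codes (illustrative values).
for code, chosen in zip([3, 2, 4], dwa_select([3, 2, 4])):
    print(f"code {code}: elements {chosen}")
```

Because consecutive selections start where the previous one ended, the mismatch error of each element averages out at low frequency and is first-order shaped.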

Figure 23 displays the major nonidealities in the ADC architecture. The $n_{\rm sh1}$, $n_{\rm ra}$, $n_{\rm cmp1}$, and $n_{\rm cmp2}$ are the thermal noise from the first-stage sampling circuit, residue amplifier, and first- and second-stage comparators, respectively. The $n_{\rm ef}$ and $n_{\rm ff}$ are the thermal noise from the EF and FF paths, respectively. The $e_{\rm mis1}$ and $e_{\rm mis2}$ are the DAC mismatch errors in the first- and second-stage SAR ADCs, respectively. We denote the input-referred offset voltages of the residue amplifier and first- and second-stage comparators as $v_{\rm os,ra}$, $v_{\rm os1}$, and $v_{\rm os2}$, respectively. First, we omit the effect of the offset voltages by setting $v_{\rm os,ra} = v_{\rm os1} = v_{\rm os2} = 0$. Therefore, when $G_{\rm d} = G$, the transfer function of the ADC with the noise and mismatch error sources is

$$D_{\rm o} = V_{\rm in} + n_{\rm sh1} + n_{\rm ra} + e_{\rm mis1} + \frac{1}{G} e_{\rm mis2} + \frac{1}{G} \times \left[ z^{-1} \cdot n_{\rm ef} + \frac{3}{4} \text{NTF} \cdot z^{-1} \cdot n_{\rm ff} + \text{NTF} \cdot \left( n_{\rm cmp2} + Q_2 \right) \right].$$
(7)

The $Q_1$ and $n_{cmp1}$ terms are fully canceled and therefore do not appear in Eq. (7). The interstage gain suppresses $n_{ef}$ and $n_{ff}$; in addition, the NTF further shapes $n_{ff}$ along with $Q_2$. Consequently, the additional noise from the EF and FF paths becomes negligible. On the other hand, the extra input pair in the comparator for the FF



Fig. 23 Major nonidealities in the ADC architecture

path in the second stage worsens the  $n_{cmp2}$  and induces an extra 1.7 dB SNR drop of the ADC under the same power budget, while the additional FF NS imposes a 5 dB in-band noise suppression, with  $n_{cmp2}$  shaped together with  $Q_2$ . As a result, the FF NS still brings net benefit. The sampler in the first stage and the residue amplifier dominate the overall noise performance of the ADC, while the DAC mismatch error in the first stage dominates the linearity performance due to its non-shaped characteristic. Furthermore, we fulfill the noise requirement from the sampler and residue amplifier by budgeting enough sampling capacitance and integration time, respectively. The DWA technique suppresses the DAC mismatch.

Next, we consider the interstage offset in the pipeline structure. When the offset voltage exists in the comparator, the circuit shifts the searching baseline of the SAR ADC with the same offset voltage but with reverse polarity. Therefore, the  $v_{os1}$  and  $v_{os2}$  from Fig. 23 have negative polarity. The total interstage offset voltage becomes

$$v_{\rm os,in} = v_{\rm os1} + v_{\rm os,ra} - \frac{1}{G} \cdot \frac{1 - z^{-1}}{1 + 3/4z^{-1}} v_{\rm os2}.$$
 (8)

Due to the unity EF structure, a delayed version of $v_{os2}$ feeds back to the input of the residue amplifier, thus ideally canceling itself out. However, a residual $v_{os2}$ still contributes to $v_{os,in}$ due to the non-unity gain of the EF residue (as discussed above), although the amount is small. Meanwhile, $v_{os1}$ and $v_{os,ra}$ are significant under the gain of *G* and can saturate the second-stage conversion, causing a large error. Hence, we propose a background interstage offset calibration with low timing and hardware overhead (detailed next).
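As a rough worked example of the residual $v_{\rm os2}$ term, assume (for illustration only) that the EF gain error enters Eq. (8) by replacing $1 - z^{-1}$ with $1 - \alpha z^{-1}$, as in Eq. (6). For a static offset ($z = 1$) this gives

$$\left.\frac{1}{G} \cdot \frac{1 - \alpha z^{-1}}{1 + \tfrac{3}{4}z^{-1}}\right|_{z=1} = \frac{1 - \alpha}{1.75\,G},$$

so with $G = 8$ and the worst-case $\alpha = 0.73$ quoted earlier, only about $0.27/14 \approx 1.9\%$ of $v_{\rm os2}$ would reach the interstage offset, and the term vanishes for $\alpha = 1$.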

## 4.2 Measurement Results

Figure 24 presents the chip micrograph of the ADC prototype fabricated in 28 nm CMOS, occupying an active area of $0.016 \text{ mm}^2$. The pseudo-random noise generator, DWA, and interstage offset calibration logic account for only 1%, 1.3%, and 1% of the total ADC area, respectively. The ADC operates at a sampling rate of 600 MHz and achieves a BW of up to 40 MHz at an OSR of 7.5. Figure 25 depicts the measured 32,768-point FFT spectrum with a 2 MHz and -04dBFS sinusoidal input signal under different calibration configurations. With all calibrations enabled, the prototype reaches a peak SNDR and spurious-free dynamic range (SFDR) of 75.2 dB and 87.1 dB, respectively. The residual DAC mismatches and the nonlinearity of the dynamic amplifier cause the remaining harmonics. The DWA effectively improves the SFDR, and the interstage offset calibration enhances both the SNR and SFDR of the ADC. The high-frequency noise floor looks flatter with the DWA enabled (Fig. 25). This is because the DWA shapes the harmonic tones to high frequency, where the shaped nonideal spurs superpose on the quantization noise floor. However, the NTF of the ADC remains unchanged. Under a supply voltage of 1 V, the ADC consumes 2.56 mW, leading to a good FoM of 177.1 dB. The digital part, including the control logic and DAC drivers, consumes most of the power. The power-efficient dynamic amplifier accounts for less than 7% of the total power consumption. All the calibrations and DAC mismatch correction, including the pseudo-random noise generator, the DWA logic, and the proposed interstage offset calibration, consume only 0.34 mW (13% of the total ADC power).

Table 3 summarizes the performance of the ADC. It exhibits both high resolution and BW with outstanding FoMs among all the converters listed, revealing the



Fig. 24 Die photo

Table 3 Key performance summary



Fig. 25 Measured output spectrum with different configurations at 600 MS/s

|                            | This work   |
|----------------------------|-------------|
| Technology [nm]            | 28          |
| Architecture               | NS pipe-SAR |
| Fs [MHz]                   | 600         |
| OSR                        | 7.5         |
| BW [MHz]                   | 40          |
| SNDR [dB]                  | 75.2        |
| SFDR [dB]                  | 87.1        |
| DR [dB]                    | 76.6        |
| Power [mW]                 | 2.56        |
| FoM <sub>w</sub> [fJ/step] | 6.8         |
| FoM <sub>s</sub> [dB]      | 177.1       |
| Area [mm <sup>2</sup> ]    | 0.016       |
| Off. Cal.                  | On-chip     |

effectiveness of the EF-FF NS structure and the partial interleaving architecture, including the on-chip calibration. The ADC is robust under PVT variation with the background calibrations and additional FF path. Besides, with only two DACs partially interleaved, we can maintain the channel mismatches within a reasonable level through a careful layout.

# 5 A 25 MHz Bandwidth Gain Error-Tolerant N-0 MASH Noise-Shaping Pipeline SAR ADC

To realize the noise shaping in the first stage, we can consider both EF and CIFF. The EF structure often calls for an amplifier with accurate gain to construct a sharp NTF [26], leading to extra noise and requiring calibration. In this work, we use a fully passive CIFF NS structure in the first stage to implement a stable NTF. Figure 26 illustrates the proposed MASH 2-0 NS-SAR-assisted pipelined ADC with a simplified schematic and timing diagram. We realize the second-order NS-SAR ADC in the first stage based on a passive CIFF filter [30], while the second stage is a pure SAR ADC. The NTF in Eq. (2) of the first stage is $(1-0.5z^{-1})^2$, as the two integration capacitors, $C_1$ and $C_2$, are equal to the main DAC capacitor. It operates as follows. Initially, the DAC capacitor samples the input voltage ($V_{\rm in}$) during $\Phi_{\rm S}$. Then, the NS-SAR ADC of the first stage converts 6b with the three-input comparator, where the ratio of the input pairs ($g_c$, $g_{c1}$, and $g_{c2}$) is 1:1:2. After the sampling and conversion phases, the circuit sums the first stage's residue ($V_{res1}$) and the voltages on the two integration capacitors ($V_{int1}$ and $V_{int2}$), and a three-input dynamic amplifier, whose input-pair ratio is the same as that of the three-input comparator, amplifies the sum. Eventually, considering the feedforward path summation, the amplifier and the comparator (equivalently at their inputs) see $-E_{q1}$ during the amplification phase $\Phi_{da}$. With $-E_{q1}$ handed over to the second stage, the ADC maintains the noise-shaping ability for the quantization error and comparator noise.



Fig. 26 The CIFF MASH 2–0 SAR-assisted pipeline: (a) a simplified schematic and (b) timing diagram

After the amplification, $V_{\text{res1}}$ on $C_{\text{DAC1}}$ charge-shares with the two integration capacitors ($C_1$ and $C_2$) sequentially during $\Phi_1$ and $\Phi_2$, which leads to a second-order passive integration. Simultaneously, the second-stage SAR ADC attains the remaining 6b resolution. Finally, the output of the second stage ($D_{\text{out2}}$) passes through the digital reconstruction filter (NTF/$G_d$) and then sums with the first-stage output ($D_{\text{out1}}$), thus removing the quantization noise of the first stage from the final output ($D_{\text{out}}$).
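The passive integration can be modeled with ideal charge sharing. The sketch below assumes that $C_{\text{DAC1}}$ first shares charge with $C_1$ during $\Phi_1$ and that the voltage then left on $C_{\text{DAC1}}$ shares charge with $C_2$ during $\Phi_2$. This is one reading of the description above, with ideal switches and no parasitics, and the function names are hypothetical.

```python
def charge_share(c_a, v_a, c_b, v_b):
    """Voltage after connecting two charged capacitors (charge conservation)."""
    return (c_a * v_a + c_b * v_b) / (c_a + c_b)


def passive_integrate(v_res, v_int1, v_int2, c_dac=1.0, c1=1.0, c2=1.0):
    """One cycle of the two-step passive integration.

    Phase 1: the residue on C_DAC shares charge with C1.
    Phase 2: the voltage left on C_DAC shares charge with C2.
    Returns the updated (v_int1, v_int2).
    """
    v_int1_new = charge_share(c_dac, v_res, c1, v_int1)
    v_int2_new = charge_share(c_dac, v_int1_new, c2, v_int2)
    return v_int1_new, v_int2_new


# A constant residue of 1.0 (arbitrary units) applied for a few cycles.
v1, v2 = 0.0, 0.0
for n in range(4):
    v1, v2 = passive_integrate(1.0, v1, v2)
    print(f"cycle {n}: V_int1 = {v1:.4f}, V_int2 = {v2:.4f}")
```

With equal capacitors each sharing step averages the two voltages, which is where the factor 0.5 in $(1-0.5z^{-1})^2$ comes from under this model.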

Figure 27 displays the major sources of nonideality in the proposed MASH 2–0 SAR-assisted pipeline ADC. The  $n_{\text{DAC}}$  is mainly from the kT/C noise while  $n_{0,\Phi_1}$ ,  $n_{1,\Phi_1}$ , and  $n_{2,\Phi_2}$  are the noises from the two passive integration phases  $\Phi_1$  and  $\Phi_2$ [30]. The  $n_{\text{AMP}}$  is the total input-referred noise of the amplifier.  $E_{q1}$ ,  $e_{\text{mis1}}$ , and  $n_{\text{CMP1}}$ are the quantization noise, mismatch error in the capacitance DAC array, and comparator noise of the first stage, respectively, while  $E_{q2}$ ,  $e_{\text{mis2}}$ , and  $n_{\text{CMP2}}$  are the corresponding impairments of the second stage (same as above). The overall transfer function, including these nonidealities, is



Fig. 27 Noise and mismatch analysis of the MASH 2-0 structure

$$\begin{aligned}
D_{\text{out}} = {} & V_{\text{in}} - e_{\text{mis1}} + n_{\text{DAC}} + n_{0,\Phi1} \left( 1 - \tfrac{1}{2}z^{-1} \right) + n_{1,\Phi1} + 2n_{2,\Phi2} \left( 1 - \tfrac{1}{2}z^{-1} \right) \\
& + \text{NTF} \cdot n_{\text{CMP1}} + \text{NTF} \cdot n_{\text{AMP}} + \frac{\text{NTF}}{G} \left( E_{q2} + e_{\text{mis2}} + n_{\text{CMP2}} \right)
\end{aligned}$$
(9)

From Eq. (9), $e_{\text{mis1}}$, $n_{DAC}$, and $n_{1,\Phi 1}$ are not shaped, while $n_{0,\Phi 1}$ and $n_{2,\Phi 2}$ are first-order shaped. With sufficiently large sampling and integration capacitors, we can well suppress $n_{DAC}$, $n_{0,\Phi 1}$, $n_{1,\Phi 1}$, and $n_{2,\Phi 2}$, with $e_{\text{mis1}}$ addressed by the 4-b DWA in this design (detailed later). The NTF shapes both $n_{CMP1}$ and $E_{q1}$, while the redundancy between stages can further cover $n_{CMP1}$ and the reconstruction filter (without gain error) cancels $E_{q1}$. Furthermore, the NTF shapes $E_{q2}$, $e_{\text{mis2}}$, and $n_{CMP2}$, while the interstage gain G suppresses them.

We split the amplification into three paths and list their noises individually. The $n_{AMP}$ divides into three input-referred noises $n_{res1}$, $n_{t1}$, and $n_{t2}$, associated with the signal paths $V_{res1}$, $V_{t1}$, and $V_{t2}$, respectively; $n_{AMP}$ is the lumped sum of these noises. From Eq. (9), the NTF shapes all of them to second order. Nevertheless, the multiple input pairs worsen the noise when compared with a single pair at the same power budget [30]. The total noise increases by $4\times$ because of the two additional paths. Fortunately, the NTF attenuates the noise by $\sim 4\times$ with OSR = 8, so the net in-band noise is almost the same as in the single-pair case. Finally, we ensure an overall small $n_{AMP}$ by budgeting a sufficiently long integration time.
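The in-band attenuation provided by $(1-0.5z^{-1})^2$ can be estimated as follows. This is a simplified check that assumes white input-referred noise and integrates $|\text{NTF}|^2$ in closed form up to $\pi/\text{OSR}$. The function name is hypothetical.

```python
import math


def inband_power_gain(osr):
    """Average of |(1 - 0.5*z^-1)^2|^2 over the band 0..pi/osr.

    |1 - 0.5*e^{-jw}|^2 = 1.25 - cos(w), so the integrand is
    (1.25 - cos(w))^2 = 1.5625 - 2.5*cos(w) + cos(w)^2, whose antiderivative is
    1.5625*w - 2.5*sin(w) + w/2 + sin(2w)/4.
    """
    w = math.pi / osr
    integral = 1.5625 * w - 2.5 * math.sin(w) + w / 2.0 + math.sin(2.0 * w) / 4.0
    return integral / w


power_gain = inband_power_gain(8)
print(f"in-band power gain  = {power_gain:.4f}")
print(f"rms attenuation     = {1.0 / math.sqrt(power_gain):.2f}x")
```

Under these assumptions the rms in-band noise is attenuated by roughly 3.6 times at OSR = 8, in line with the approximately fourfold attenuation quoted above.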

We carefully studied the noise leakage issue in the MASH architecture due to the nonideal first-stage NTF. The ratios of the capacitors (DAC array capacitor and two integration capacitors) and the ratio between the comparator and amplifier input pairs determine the NTF in this work. As we implement the DAC and integration capacitors with the same type of capacitors (MoM), we assume them as well matched in the following analysis with  $C_1 = C_2 = C_{DAC}$ . We set  $G_d = G_a = 1$  to simplify the analysis and focus on the discussion of the NTF mismatch. We can detail the noise transfer function of the first stage of the MASH SDM as

$$NTF_{1}(z) = \frac{(1 - 0.5z^{-1})^{2}}{1 + (0.5g_{c1} + 0.25g_{c2} - 1)z^{-1} + (0.25 - 0.25g_{c1})z^{-2}}$$
(10)

where $g_{c1}$ and $g_{c2}$ are the gain ratios of the input pairs of the comparator normalized to $g_c$. Besides, a three-input amplifier constructs the first-stage residue of this MASH SDM, which implies that we can model the output voltage of the amplifier $V_{\text{amplifier}}$ as

$$V_{\text{amplifier}} = -\text{NTF}_{1}(z)E_{q1}(z) \times \frac{1 + (0.5g_{a1} + 0.25g_{a2} - 1)z^{-1} + (0.25 - 0.25g_{a1})z^{-2}}{(1 - 0.5z^{-1})^{2}}$$
(11)

where  $g_{a1}$  and  $g_{a2}$  are the gain ratios of the input pairs of the amplifier normalized to  $g_{a}$ . In the ideal case, where  $g_{c1} = g_{a1} = 1$  and  $g_{c2} = g_{a2} = 2$ , the transfer function of the digital reconstruction filter is

$$NTF_{d}(z) = (1 - 0.5z^{-1})^{2}$$
(12)

Then, the complete output of this MASH SDM will be

$$\begin{aligned}
D_{\text{out}}(z) = {} & V_{\text{in}}(z) + \text{NTF}_{d}(z)E_{q2}(z) \\
& + E_{q1}(z)\text{NTF}_{1}(z) \left(1 - \frac{1 + (0.5g_{a1} + 0.25g_{a2} - 1)z^{-1} + (0.25 - 0.25g_{a1})z^{-2}}{(1 - 0.5z^{-1})^{2}}\text{NTF}_{d}(z)\right)
\end{aligned}$$
(13)

Here, $E_{q1}(z)$ cancels completely in the ideal case. Under PVT and mismatch variations, the zeros of NTF<sub>1</sub> are robust because the capacitor ratios set them, while the pole locations can drift due to the mismatch between $g_{c1}$ and $g_{c2}$, although they are not crucial in the cancellation process. In contrast, the mismatch between $g_{a1}$ and $g_{a2}$ affects the cancellation. Fortunately, the robust NTF<sub>1</sub> relaxes the requirement on their variations. It is noteworthy that the ratios among $g_{c1}$, $g_{c2}$, $g_{a1}$, and $g_{a2}$ are, to first order, set by the widths of the same type of transistors and are therefore relatively insensitive to PVT variations.
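The cancellation term in Eq. (13) can be evaluated numerically. With $\text{NTF}_d(z) = (1-0.5z^{-1})^2$, the bracket reduces to $1 - A(z)$, where $A(z)$ is the numerator polynomial in $g_{a1}$ and $g_{a2}$. The sketch below computes the magnitude of that factor. The function name and the example mismatch value are assumptions for illustration.

```python
import cmath
import math


def leakage_factor(g_a1, g_a2, f_norm):
    """Magnitude of the bracket in Eq. (13) when NTF_d = (1 - 0.5*z^-1)^2.

    A(z) = 1 + (0.5*g_a1 + 0.25*g_a2 - 1)*z^-1 + (0.25 - 0.25*g_a1)*z^-2,
    and the first-stage quantization noise leaks through NTF_1 * (1 - A(z)).
    f_norm is the frequency normalized to the sampling rate (f / fs).
    """
    z_inv = cmath.exp(-2j * math.pi * f_norm)
    a = 1.0 + (0.5 * g_a1 + 0.25 * g_a2 - 1.0) * z_inv + (0.25 - 0.25 * g_a1) * z_inv ** 2
    return abs(1.0 - a)


# Ideal ratios cancel E_q1 completely; a hypothetical +8.4% error on g_a1 does not.
print(leakage_factor(1.0, 2.0, 0.01))
print(leakage_factor(1.084, 2.0, 0.01))
```

The remaining leakage is still multiplied by $\text{NTF}_1(z)$, so a small mismatch in $g_{a1}$ or $g_{a2}$ lets through only a shaped fraction of $E_{q1}$.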

Only the absolute variation of the amplifier gain is a direct cause of the interstage gain error. The variations of $g_{a1}$ and $g_{a2}$ relative to $g_a$ mainly affect the transmission of $-E_{q1}(z)$ to the second stage and therefore potentially cause noise leakage. We implement the ratio among $g_a$, $g_{a1}$, and $g_{a2}$ with the same type of transistor in different sizes, so they experience similar variations over PVT. On the other hand, mismatch can influence the relative values of $g_a$, $g_{a1}$, and $g_{a2}$, altering the NTF and affecting the gain error tolerance ability. To demonstrate the sensitivity, we performed a behavioral simulation based on the ADC structure. Figure 28 shows the SQNR with $g_{c1}$, $g_{c2}$, $g_{a1}$, and $g_{a2}$ variations. The proposed ADC is more sensitive to the input pair mismatch of the amplifier than to that of the comparator, as the perfect cancellation relies on $g_{a1}$ and $g_{a2}$, which is consistent with Eq. (13). To obtain good matching, the input transistors of the amplifier are sufficiently large, with sizes of 16 $\mu$m/0.05 $\mu$m, 16 $\mu$m/0.05 $\mu$m, and 32 $\mu$m/0.05 $\mu$m, respectively. Figure 29 depicts their variations with a 100-run Monte Carlo simulation, and such large size ensures small



Fig. 28 Simulated SQNR varies with (a) extra comparator ratio and (b) extra amplifier ratio



Fig. 29 Monte Carlo simulation results of (a)  $g_{a1}$  and (b)  $g_{a2}$ 

enough standard deviations. The $3\sigma$ variations of $g_{a1}$ and $g_{a2}$ are $\pm 8.4\%$ and $\pm 12\%$, respectively, leading to a worst-case SQNR of 77 dB. Figure 30 presents a 100-run post-layout Monte Carlo simulation, which illustrates the SQNR variation due to the mismatch of the input pairs of the comparator and the amplifier. The mean and standard deviation of the SQNR are 80.26 dB and 1.25 dB, respectively, leaving enough margin for the overall 75 dB SNDR target.



Fig. 31 Die photo

## 5.1 Measurement Results

Figure 31 presents the fabricated device in 28 nm CMOS, with the ADC occupying an area of 0.027 mm<sup>2</sup>. Figure 32 plots the measured ADC's output spectrum with and without DWA. The input frequency is 2.04 MHz with more than nine harmonics included. Consequently, the interstage gain and nonlinearity error shaping ability lead to a peak SFDR and SNDR of 92.1 dB and 75 dB, respectively. To demonstrate the tolerant range, we introduce a wide range of gain errors by adjusting the



Fig. 32 Measured spectrum



Fig. 33 Measured SNDR variations versus power supply with five chips

reference voltage of the second-stage SAR ADC. We brought the reference voltage of the second-stage ADC off-chip for measurement purposes. We measured five samples, and Fig. 33 shows their SNDR variations across $\pm 10\%$ supply voltage. The largest SNDR drop is 0.65 dB, which agrees with the analysis. The



Fig. 34 Two-tone spectrum

two-tone spectrum with -8.5 dBFS input appears in Fig. 34, and the IMD3 is -81.5 dB. The prototype ADC runs at 400 MHz and consumes 1.26 mW. The digital circuits consume the major portion. Table 4 compares the proposed design with the state of the art having similar specifications. Unlike the gain error shaping (GES) [31] scheme, the proposed design can handle positive and negative gain errors with small hardware overhead while still maintaining a relatively high-speed operation. Reference [32] reaches a good energy efficiency without calibration. However, the SNDR of this work is ~4 dB and ~8 dB better with OSR = 8 and OSR = 20, respectively. The prototype avoids off-chip DAC calibration and tolerates gain errors from -16% to +12% with OSR = 8. The design exhibits a larger gain error tolerance range at a larger OSR, with the SNDR mainly limited by other noises and nonlinearities that the NTF cannot shape. We obtained a FoM<sub>W</sub> and a FoM<sub>S</sub> of 5.5 fJ/conv.-step and 178 dB, respectively, showing that this design can maintain a good power efficiency with an additional gain error tolerance ability.

|                                               | Gregoire<br>ISSCC2008 | Yoshioka<br>ISSCC2017 | Song JSSC2020                      | Hsu JSSC2020                       | Song<br>JSSC2021       | Hsu JSSC2020                       | Wang<br>JSSC2020 |                    |                 |      |
|-----------------------------------------------|-----------------------|-----------------------|------------------------------------|------------------------------------|------------------------|------------------------------------|------------------|--------------------|-----------------|------|
|                                               | [33]                  | [34]                  | [3]                                | [35]                               | [36]                   | [31]                               | [2]              | This w             | ork             |      |
| Process [nm]                                  | 180                   | 28                    | 65                                 | 40                                 | 28                     | 40                                 | 28               | 28                 |                 |      |
| Architecture                                  | Pipeline<br>ADC       | Pipeline SAR          | 0–1 MASH                           | Pipeline SAR                       | Pipeline NS<br>SAR     | Pipeline<br>SAR                    | Pipeline<br>SAR  | 2-0 M              | ASH             |      |
| Interstage gain error sup-<br>pression scheme | Closed-loop<br>SAR    | Digital<br>amplifier  | Closed-loop<br>opamp               | Second-order GES                   | Foreground calibration | Second-order<br>GES + DEF          | WACLS            | Inherer            | t architect     | ture |
| Interstage offset suppres-                    | N/A                   | Digital               | Foreground                         | N/A                                | Background             | N/A                                | Auto             | Code-c             | ounting-        |      |
| sion scheme                                   |                       | amplifier             | calibration                        |                                    | calibration            |                                    | zeroing          | based h<br>calibra | ackgroun<br>ion | þ    |
| DAC mismatch<br>calibration                   | N/A                   | N/A                   | Foreground off-chip<br>calibration | Foreground off-chip<br>calibration | 3b DWA                 | Foreground off-chip<br>calibration | RS               | 4b DW              | A               |      |
| Supply [V]                                    | 1.2                   | 0.7                   | 1.0                                | 1.0                                | 1.0                    | 1.0                                | 1.1              | 1.0                |                 |      |
| Fs [MHz]                                      | 20.2                  | 160                   | 200                                | 100                                | 600                    | 100                                | 100              | 400                |                 |      |
| Power [mW]                                    | 7.5                   | 1.9                   | 4.5                                | 1.54                               | 2.56                   | 1.38                               | 0.7              | 1.26               |                 |      |
| Area [mm <sup>2</sup> ]                       | 2.3                   | 0.097                 | 0.014                              | 0.061                              | 0.016                  | 0.054                              | 0.018            | 0.027              |                 |      |
| OSR                                           | 1                     | 1                     | 8                                  | 4                                  | 7.5                    | 8                                  | 1                | 8                  | 10 2(           | 0    |
| -3 dB SNDR gain                               | N/A                   | N/A                   | N/A                                | <-5 to                             | N/A                    | <-25 to N/A*                       | N/A              |                    |                 |      |
| [%]                                           |                       |                       |                                    | N/A*                               |                        |                                    |                  | 16~                | 17~ 19          | ~6   |
|                                               |                       |                       |                                    |                                    |                        |                                    |                  | +12                | +13 +           | 14   |
| BW [MHz]                                      | 10.1                  | 80                    | 12.5                               | 12.5                               | 40                     | 6.25                               | 50               | 25                 | 20 10           | 0    |
| SNDR [dB]                                     | 65                    | 61.1                  | 77.1                               | 75.8                               | 75.2                   | 77.1                               | 71.7             | 75                 | 76.2 79         | 9.5  |
| FoM <sub>S</sub> [dB]**                       | 156.3                 | 167.3                 | 171.5                              | 174.9                              | 177.1                  | 173.7                              | 180.2            | 178                | 178.2 17        | 78.5 |
| FoM <sub>w</sub><br>[fJ/convstep]***          | 255.5                 | 12.8                  | 30.8                               | 12.2                               | 6.8                    | 18.9                               | 2.2              | 5.5                | <u>%</u>        | .S   |

Table 4 Key performance summary and comparison

\*Does not provide the positive gain error data. \*\*FoM<sub>S</sub> = SNDR + 10 log<sub>10</sub> (BW/Power). \*\*\*FoM<sub>W</sub> = Power/ $(2 \times BW \times 2^{(SNDR-1.76)/6.02})$ 

### References

- 1. Qi, L., Sin, S.-W., Seng-Pan, U., Maloberti, F., & Martins, R. P. (2017). A 4.2-mW 77.1-dB SNDR 5-MHz BW DT 2-1 MASH SD modulator with multi-rate opamp sharing. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 64(10), 2641–2654.
- 2. Wang, W., Zhu, Y., Chan, C., & Martins, R. P. (2018). A 5.35-mW 10-MHz single-opamp third-order $CT\Delta\Sigma$ modulator with CTC opamp and adaptive latch DAC driver in 65-nm CMOS. *IEEE Journal of Solid-State Circuits*, 53(10), 2783–2794.
- 3. Song, Y., et al. (2020). A 12.5-MHz bandwidth 77-dB SNDR SAR-assisted noise shaping pipeline ADC. *IEEE Journal of Solid-State Circuits*, 55(2), 312–321.
- 4. Zhang, H., Zhu, Y., Chan, C.-H., & Martins, R. P. (2022). An inherent gain error tolerance noise-shaping SAR-assisted pipeline ADC with code-counter-based offset calibration. *IEEE Journal of Solid-State Circuits*, 57(5), 1480–1491.
- 5. Zanbaghi, R., Saxena, S., Temes, G., & Fiez, T. S. (2012). A 75-dB SNDR, 5-MHz bandwidth stage-shared 2-2 MASH ΔΣ modulator dissipating 16mW power. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 59(8), 1614–1625.
- 6. Yoon, D.-Y., et al. (2015). A continuous-time sturdy-MASH  $\Delta\Sigma$  modulator in 28nm CMOS. *IEEE Journal of Solid-State Circuits*, 50(12), 2880–2890.
- 7. He, T., Ashburn, M., Ho, S., Zhang, Y., & Temes, G. C. (2018, February). A 50MHz-BW continuous-time ΔΣ ADC with SAR ADC dynamic error correction achieving 79.8dB SNDR and 95.2dB SFDR. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers* (pp. 230–231).
- 8. Wu, B., Zhu, S., Xu, B., & Chiu, Y. (2016). A 24.7 mW 65 nm CMOS SAR-assisted CT modulator with second-order noise coupling achieving 45 MHz bandwidth and 75.3 dB SNDR. *IEEE Journal of Solid-State Circuits*, 51(12), 2893–2905.
- 9. Ho, C.-Y., et al. (2015, February). A 4.5 mW CT self-coupled delta-sigma modulator with 2.2 MHz BW and 90.4 dB SNDR using residual ELD compensation. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers* (pp. 274–275).
- 10. Lee, C., Alpman, E., Weaver, S., Lu, C., & Rizk, J. (2013, June). A 66 dB SNDR 15 MHz BW SAR assisted delta-sigma ADC in 22 nm tri-gate CMOS. In *Proceedings of IEEE Symposium on VLSI Circuits Digest* (pp. 1–2).
- 11. Qi, L., Jain, A., Jiang, D., Sin, S.-W., Martins, R. P., & Ortmanns, M. (2019, February). A 76.6dB-SNDR 50MHz-BW 29.2mW noise-coupling-assisted CT sturdy MASH ΔΣ modulator with 1.5b/4b Quantizers in 28nm CMOS. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers* (pp. 336–338).
- 12. Qi, L., Jain, A., Jiang, D., Sin, S.-W., Martins, R. P., & Ortmanns, M. (2020). A 76.6dB-SNDR 50MHz-BW 29.2mW multibit CT sturdy MASH with DAC non-linearity tolerance. *IEEE Journal of Solid-State Circuits*, 55(2), 344–355.
- 13. Lee, K., Miller, M. R., & Temes, G. C. (2009). An 8.1 mW, 82 dB delta-sigma ADC with 1.9 MHz BW and -98dB THD. *IEEE Journal of Solid-State Circuits*, 44(8), 2202–2211.
- 14. Zhang, Y., Chen, C.-H., He, T., & Temes, G. C. (2015). A continuous-time delta-sigma modulator for biomedical ultrasound beamformer using digital ELD compensation and FIR feedback. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 62(7), 1689–1698.
- 15. Sukumaran, A., & Pavan, S. (2014). Low power design techniques for single-bit audio continuous-time delta sigma ADCs using FIR feedback. *IEEE Journal of Solid-State Circuits*, 49(11), 2515–2525.
- 16. Huang, S.-J., Egan, N., Kesharwani, D., Opteynde, F., & Ashburn, M. (2017, February). A 125 MHz-BW 71.9 dB-SNDR VCO-based CT ADC with segmented phase-domain ELD compensation in 16 nm CMOS. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers* (pp. 470–471).
- 17. Loeda, S., Harrison, J., Pourchet, F., & Adams, A. (2016). A 10/20/30/40 MHz feedforward FIR DAC continuous-time  $\Delta\Sigma$  ADC with robust blocker performance for radio receivers. *IEEE Journal of Solid-State Circuits*, 51(4), 860–870.

- 18. Dong, Y., et al. (2016). A 72 dB-DR 465 MHz-BW continuous-time 1-2 MASH ADC in 28 nm CMOS. *IEEE Journal of Solid-State Circuits*, 51(12), 2917–2927.
- 19. Zanbaghi, R., et al. (2013). An 80-dB DR, 7.2-MHz bandwidth single opamp biquad based CT delta sigma modulator dissipating 13.7-mW. *IEEE Journal of Solid-State Circuits*, 48(2), 487–501.
- 20. Li, Z., & Fiez, T. S. (2007). A 14 bit continuous-time delta-sigma A/D modulator with 2.5 MHz signal bandwidth. *IEEE Journal of Solid-State Circuits*, 42(9), 1873–1883.
- 21. Wu, S., Kao, T., Lee, Z., Chen, P., & Tsai, J. (2016, February). A 160MHz-BW 72dB-DR 40mW continuous-time  $\Delta\Sigma$  modulator in 16nm CMOS with analog ISI-reduction technique. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers* (pp. 280–281).
- 22. Ortmanns, M., & Gerfers, F. (2006). Continuous-time sigma-delta A/D conversion (pp. 94–113). Springer.
- 23. De Bock, M., Xing, X., Weyten, L., Gielen, G., & Rombouts, P. (2013). Calibration of DAC mismatch errors in ΔΣADCs based on a sine-wave measurement. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 60(9), 567–571.
- 24. Liu, C.-C., & Huang, M.-C. (2017, February). A 0.46 mW 5 MHz-BW 79.7 dB-SNDR noise-shaping SAR ADC with dynamic-amplifier-based FIR-IIR filter. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers* (pp. 466–467).
- 25. Zhu, Y., et al. (2010). A 10-bit 100-MS/s reference-free SAR ADC in 90 nm CMOS. *IEEE Journal of Solid-State Circuits*, 45(6), 1111–1121.
- 26. Li, S., Qiao, B., Gandara, M., Pan, D. Z., & Sun, N. (2018). A 13-ENOB second-order noise-shaping SAR ADC realizing optimized NTF zeros using the error-feedback structure. *IEEE Journal of Solid-State Circuits*, 53(12), 3484–3496.
- 27. Song, Y., et al. (2018). Passive noise shaping in SAR ADC with improved efficiency. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 26(2), 416–420.
- 28. Fredenburg, J. A., & Flynn, M. P. (2012). A 90-MS/s 11-MHz-bandwidth 62-dB SNDR noise-shaping SAR ADC. *IEEE Journal of Solid-State Circuits*, 47(12), 2898–2904.
- 29. Zhu, Y., et al. (2012, June). A 34fJ 10b 500 MS/s partial-interleaving pipelined SAR ADC. In *Proceedings of IEEE Symposium on VLSI Circuits (VLSIC)* (pp. 90–91).
- 30. Liu, J., Li, S., Guo, W., Wen, G., & Sun, N. (2019). A 0.029-mm<sup>2</sup> 17-fJ/conversion-step third-order CT ΔΣ ADC with a single OTA and second-order noise-shaping SAR quantizer. *IEEE Journal of Solid-State Circuits*, 54(2), 428–440.
- 31. Hsu, C., Tang, X., Liu, J., Xu, R., Zhao, W., Mukherjee, A., et al. (2021). A 77.1-dB-SNDR 6.25-MHz-BW pipeline SAR ADC with enhanced interstage gain error shaping and quantization noise shaping. *IEEE Journal of Solid-State Circuits*, 56(3), 739–749.
- 32. Wang, J. C., Hung, T. C., & Kuo, T. H. (2020). A calibration-free 14-b 0.7-mW 100-MS/s pipelined-SAR ADC using a weighted-averaging correlated level shifting technique. *IEEE Journal of Solid-State Circuits*, 55(12), 3271–3280.
- 33. Gregoire, B. R., & Moon, U. (2008). An over-60 dB true rail-to-rail performance using correlated level shifting and an opamp with only 30 dB loop gain. *IEEE Journal of Solid-State Circuits*, 43(12), 2620–2630.
- 34. Yoshioka, K., Sugimoto, T., Waki, N., Kim, S., Kurose, D., Ishii, H., et al. (2017, February). 28.7 A 0.7V 12b 160MS/s 12.8fJ/conv-step pipelined-SAR ADC in 28nm CMOS with digital amplifier technique. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers* (pp. 478–479).
- 35. Hsu, C., Andeen, T. R., & Sun, N. (2020). A pipeline SAR ADC with second-order interstage gain error shaping. *IEEE Journal of Solid-State Circuits*, 55(4), 1032–1042.
- 36. Song, Y., Zhu, Y., Chan, C.-H., & Martins, R. P. (2021). A 40-MHz bandwidth 75-dB SNDR partial-interleaving SAR-assisted noise-shaping pipeline ADC. *IEEE Journal of Solid-State Circuits*, 56(6), 1772–1783.

# Part III Energy Harvesters and Power Converters

## **Integrated Energy Harvesting Interfaces**



Man-Kay Law, Yang Jiang, Pui-In Mak, and Rui P. Martins

## 1 Introduction

The advent of the Internet of Things (IoT) era has seen the deployment of billions of active portable devices in various applications, each performing application-specific sensing, monitoring, and processing tasks, as Fig. 1 illustrates. To fulfil the ultimate goal of smart everything, IoT device functionality is expected to keep increasing, which makes frequent battery replacement a major concern given the limited battery capacity [1]. With the continuous advancement of nanofabrication technologies, IoT devices continue to undergo drastic power and system volume miniaturization [2, 3]. Advanced applications, including insect-size microrobots [4] and implantable systems [5–7], can have an expected power consumption down to the sub-microwatt level, with special emphasis on small system volume, light weight, and long operation lifetime. As the energy availability is becoming increasingly limited in such miniaturized systems, energy harvesting technologies that scavenge energy from the environment are becoming viable alternatives for resolving the energy bottleneck.

M.-K. Law (🖂) · Y. Jiang · P.-I. Mak

State-Key Laboratory of Analog and Mixed-Signal VLSI/IME and FST-ECE, University of Macau, Macao SAR, China e-mail: mklaw@um.edu.mo; timjiang@um.edu.mo; pimak@um.edu.mo

R. P. Martins

State-Key Laboratory of Analog and Mixed-Signal VLSI/IME and FST-ECE, University of Macau, Macao SAR, China

Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal e-mail: rmartins@um.edu.mo

<sup>©</sup> The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Paulo da Silva Martins, P.-I. Mak (eds.), *Analog and Mixed-Signal Circuits in Nanoscale CMOS*, Analog Circuits and Signal Processing, https://doi.org/10.1007/978-3-031-22231-3\_6



Fig. 1 Illustrative diagram of the IoT era, with connected devices customized for different application specific tasks



**Fig. 2** A generic energy harvesting system with different energy sources powering the system load together with the storage. The maximum power point tracking (MPPT) module serves to maximize the extracted energy. The energy harvesting interface comprises DC-DC or AC-DC converters to maintain the energy flow between harvesters, storage, and load to achieve real-time system power balance

From Fig. 2, a generic energy harvesting system typically comprises the energy harvesters for different sources, the system load that performs the application-specific tasks, and the energy storage module. The energy harvesting interface manages the energy flow among the source, storage, and load. Unlike batteries, energy harvesters are typically nonideal energy sources and can have a high source impedance. Therefore, maximum power point tracking (MPPT) is necessary to ensure efficient energy extraction. One popular MPPT approach is the perturb-and-observe (P&O) method [8, 9], often referred to as the hill-climbing technique. The key idea is to apply a small perturbation to the system operating point and observe the resulting change in power, so that the system eventually converges to and operates near the MPP. P&O is flexible and applicable to a wide range of energy harvesting systems. However, the frequent voltage and current measurements with fast feedback control can be detrimental to energy-constrained IoT systems, not to mention the possible stability and response time issues due to the periodic system perturbations. In low-power systems, a more lightweight solution is the fractional open-circuit voltage ($V_{\rm OC}$) approach [10, 11]. The basic idea is to exploit the correlation between the MPP and a fraction of the harvester $V_{\rm OC}$. This MPPT approach can be especially energy efficient and easy to implement, as it requires only one sampling and comparison operation. However, the harvester must be disconnected for $V_{\rm OC}$ sampling, and the approach is only a suboptimal solution that demands a priori knowledge of the energy harvester characteristics.
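The two MPPT strategies can be contrasted with a minimal sketch. It assumes a toy harvester modeled as a Thevenin source (open-circuit voltage `v_oc` behind a series resistance `r_s`), for which the MPP sits exactly at half of `v_oc`; the function names, the source model, and the fraction `k = 0.5` are illustrative assumptions, not taken from [8–11].

```python
def harvested_power(v, v_oc=1.0, r_s=1e3):
    """Power delivered by the toy Thevenin source when its terminal is held at v."""
    return v * (v_oc - v) / r_s


def perturb_and_observe(power_fn, v_start, step, n_iter):
    """Hill climbing: keep stepping in the direction that last increased the power."""
    v = v_start
    p_prev = power_fn(v)
    direction = 1
    for _ in range(n_iter):
        v += direction * step
        p = power_fn(v)
        if p < p_prev:
            direction = -direction  # power dropped: reverse the perturbation
        p_prev = p
    return v


def fractional_voc(v_oc_sampled, k=0.5):
    """One-shot estimate: bias the harvester at a fixed fraction k of the sampled V_OC."""
    return k * v_oc_sampled


v_po = perturb_and_observe(harvested_power, v_start=0.1, step=0.01, n_iter=200)
v_foc = fractional_voc(1.0)
print(f"P&O settles near {v_po:.2f} V, fractional V_OC targets {v_foc:.2f} V")
```

In this sketch P&O needs a power evaluation on every iteration and keeps dithering around the MPP, whereas the fractional method needs a single sample but is only as good as the assumed fraction `k`.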

We can classify energy harvesters as either AC-type, such as vibration [12–14], or DC-type, such as solar [15, 16] or thermal [17, 18]. As the system load typically needs a DC supply, the energy harvesting interface should perform DC-DC or AC-DC conversion, with the harvested energy delivered to either the storage or the load. For peak power delivery, the extra energy is drawn from the storage. The MPPT circuit can sample either the source information, as in fractional $V_{OC}$, or the load information, as in P&O. It also controls the energy harvesting interface. Energy harvesting systems typically have two important parameters. The first is the energy extraction efficiency, which denotes the amount of power extracted from the harvester with respect to the maximum power available, and which we can express as,

$$\eta_{\rm ext} = \frac{P_{\rm in,conv}}{P_{\rm harvest,\,max}} \tag{1}$$

where  $P_{in,conv}$  is the input power of the energy harvesting interface and  $P_{harvest,max}$  is the maximum power extractable from the energy harvester. Similarly, we can define the power conversion efficiency (PCE), which is intrinsic to all power converters as,

$$\eta_{\rm conv} = \frac{P_{\rm out, conv}}{P_{\rm in, conv}} \tag{2}$$

where  $P_{\text{out,conv}}$  is the output power of the power converter. In theory, the energy harvesting interface should maximize both  $\eta_{\text{ext}}$  and  $\eta_{\text{conv}}$  to ensure a high end-to-end efficiency.
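Multiplying the two definitions shows why both must be high at the same time. The symbol $\eta_{\rm total}$ is introduced here only for illustration:

$$\eta_{\rm total} = \frac{P_{\rm out,conv}}{P_{\rm harvest,\,max}} = \frac{P_{\rm in,conv}}{P_{\rm harvest,\,max}} \cdot \frac{P_{\rm out,conv}}{P_{\rm in,conv}} = \eta_{\rm ext}\,\eta_{\rm conv}$$

As a purely illustrative example, an interface with $\eta_{\rm ext} = 0.8$ and $\eta_{\rm conv} = 0.7$ delivers only 56% of the maximum harvestable power to its output.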

This chapter introduces different energy harvesting interface designs using switched-capacitor (SC) power conversion techniques to achieve full integration, ultimately targeting both high system efficiency and small size for highly miniaturized IoT systems. Based on the type of energy harvesters, we will introduce different SC power converters for AC-type and DC-type energy harvesters together with the measurement results in Sects. 2 and 3, respectively, with the conclusions drawn in Sect. 4.

## 2 Flipping Capacitor Rectifier for Vibration Energy Harvesting

For vibration energy harvesting, a piezoelectric energy harvester (PEH) is a popular choice due to its high power density, high scalability, and high output voltage [19]. Mechanical vibrations applied to the PEH induce stress within the piezoelectric material, giving rise to an electromotive force that generates harvestable electrical charge. From Fig. 3a, we can model a PEH using a cantilever-beam structure with one degree of freedom, which represents a spring-mass-damper system [14]. We can divide the harvesting operation into two domains, the mechanical and the electrical, interfaced through a coupling stage. As depicted in Fig. 3b, $L_M$, $C_M$, and $R_M$ in the mechanical domain represent the mechanical mass, stiffness, and loss, respectively, while $C_P$ in the electrical domain denotes the intrinsic capacitance.



**Fig. 3** (a) The equivalent model of a piezoelectric energy harvester (PEH) using a cantilever-beam structure with one degree of freedom. (b) The equivalent circuit model consisting of the mechanical domain, the electrical domain, and the coupling stage, where under weak coupling we can model the PEH with an equivalent circuit having $I_P$, $C_P$, and $R_P$ in parallel

Typically, the piezoelectric harvester operates at mechanical resonance to increase the output power. With a small size harvester in miniaturized energy harvesting systems, we can assume it weakly coupled. Then, we can model the harvester simply with a dependent current source  $I_{\rm P}$ , in parallel with  $C_{\rm P}$  and the intrinsic loss  $R_{\rm P}$ .

With a miniaturized PEH device, the mechanical and electrical domains can be assumed weakly coupled (i.e., a small coupling coefficient, $\Gamma$), which justifies the parallel $I_{\rm P}$, $C_{\rm P}$, $R_{\rm P}$ equivalent circuit. Assuming a sufficiently large $R_{\rm P}$, the PEH operation reduces to the charging and discharging of $C_{\rm P}$ by the current source $I_{\rm P}$. Consequently, we can theoretically maximize the output power by operating the PEH at the maximum power point (MPP), that is, by biasing the PEH at approximately half of the open-circuit voltage ($V_{\rm OC}$).

### 2.1 Conventional PEH Interfaces

Figure 4 presents the conventional piezoelectric energy harvesting interface, typically implemented with a full-bridge rectifier (FBR) for AC-DC conversion. It is simple to implement and robust, allowing full interface integration. However, the PEH inherent capacitance $C_{\rm P}$ limits the extractable electrical power. Extracting energy from the PEH in this way is theoretically inefficient: whenever $I_{\rm P}$ changes direction, part of the harvested charge is spent reversing the polarity of $C_{\rm P}$ before any energy can be delivered to the load, as exemplified by $Q_{\rm loss}$ in Fig. 4b.
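A common first-order estimate makes this limitation concrete. It assumes ideal diodes, a sinusoidal $I_{\rm P}$ at an excitation frequency $f$, a large $R_{\rm P}$, and takes $V_{\rm OC}$ as the open-circuit voltage amplitude; the symbol $f$ and these idealizations are assumptions made here for illustration. In each half cycle $I_{\rm P}$ supplies a charge of $2C_{\rm P}V_{\rm OC}$, of which $Q_{\rm loss} = 2C_{\rm P}V_{\rm rect}$ is spent swinging $C_{\rm P}$ between $\pm V_{\rm rect}$, so that

$$P_{\rm FBR} = 2f\,V_{\rm rect}\left(2C_{\rm P}V_{\rm OC} - 2C_{\rm P}V_{\rm rect}\right) = 4fC_{\rm P}V_{\rm rect}\left(V_{\rm OC} - V_{\rm rect}\right)$$

Under these assumptions the power peaks at $V_{\rm rect} = V_{\rm OC}/2$ with $P_{\rm FBR,max} = fC_{\rm P}V_{\rm OC}^{2}$, in line with biasing the PEH at approximately half of $V_{\rm OC}$.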

To alleviate the effect of $C_{\rm P}$, we can employ an impedance matching network (Fig. 5a). As the harvester impedance is capacitive, we can use an inductive component (e.g., $L_{\rm EX}$) to eliminate the phase difference between $V_{\rm P}$ and $I_{\rm P}$ (Fig. 5b, c). Then, we can change the input impedance of the AC-DC rectifier by tuning $R_{\rm L}$ to extract the maximum power. However, as we should size $L_{\rm EX}$ to resonate with $C_{\rm P}$ at the excitation frequency, the required inductance is directly tied to the



**Fig. 4** The conventional PEH interface (**a**) using a full bridge rectifier (FBR) for rectifying the PEH AC voltage to a rectified DC output  $V_{\text{rect}}$  and (**b**) the corresponding timing diagram, with  $Q_{\text{loss}}$  representing the charge loss for  $I_{\text{P}}$  to discharge  $C_{\text{P}}$ 



Fig. 5 (a) PEH interface with impedance matching network, (b) matching with an external inductor  $L_{\text{EX}}$ , and (c) the corresponding timing diagram with  $I_{\text{P}}$  in phase with  $V_{\text{P}}$ 



Fig. 6 (a) PEH interface with switch only rectifier (SOR) and (b) the corresponding timing diagram

PEH resonance frequency, rendering this approach unattractive if the PEH resonance frequency is low.
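The frequency dependence follows from the LC resonance condition $L_{\rm EX} = 1/((2\pi f)^2 C_{\rm P})$. The short sketch below evaluates it; the 10 nF capacitance and the frequency list are hypothetical values chosen only to show the scaling, not data from this chapter.

```python
import math


def matching_inductance(c_p, f_res):
    """Inductance (henries) that resonates with c_p (farads) at f_res (hertz)."""
    return 1.0 / ((2.0 * math.pi * f_res) ** 2 * c_p)


# Hypothetical example values, chosen only to show the frequency scaling.
c_p = 10e-9  # assumed 10 nF piezoelectric capacitance
for f_res in (100.0, 1e3, 10e3):
    print(f"{f_res:8.0f} Hz -> L_EX = {matching_inductance(c_p, f_res):.3g} H")
```

Because the inductance scales with $1/f^2$, every decade reduction in resonance frequency demands a hundredfold larger inductor.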

A simple way to reduce the $Q_{\text{loss}}$ in Fig. 4b is the switch-only rectifier (SOR), which shorts the PEH at the zero crossing of $I_{\text{P}}$ using a simple switch (Fig. 6a). As a result, instead of driving $C_{\text{P}}$ from $V_{\text{rect}}$ to $-V_{\text{rect}}$, the harvester only has to drive $C_{\text{P}}$ from 0 to $-V_{\text{rect}}$ (Fig. 6b), and we can theoretically double the extracted power while preserving a simple solution. The rebuilt voltage $V_{\text{r}}$ is equal to zero due to the shorting operation. This leads to a longer conduction time and, equivalently, a smaller $Q_{\text{loss}}$, which increases the extracted power. Nevertheless, the shorting operation also wastes the energy stored on $C_{\text{P}}$, ultimately limiting the extractable power.

Theoretically, we should quickly reverse the PEH voltage when $I_P$ changes direction, so that $V_P$ and $I_P$ are in phase and the charge loss is reduced (Fig. 7). This operation should be as efficient as possible by recycling the energy on $C_P$. The harvesting efficiency has a direct correlation with the efficiency of



Fig. 7 (a) PEH interface with  $C_{\rm P}$  energy recycling and (b) the corresponding timing diagram



**Fig. 8** (a) The capacitive PEH interface supporting seven-phase reconfiguration in [22] and (b) the relationship between the achievable MOPIR and the corresponding timing diagram

the PEH voltage flipping operation  $\eta_{\rm F} = (V_{\rm r} + V_{\rm rect})/2V_{\rm rect}$ . The higher the  $\eta_{\rm F}$ , the lower the energy loss during the flipping operation, resulting in a higher extractable power.
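The link between $\eta_{\rm F}$ and the extractable power can be sketched with an idealized charge-balance model. It assumes ideal diodes, a sinusoidal $I_{\rm P}$ at frequency `f`, a large $R_{\rm P}$, and `v_oc` taken as the open-circuit voltage amplitude; these idealizations and all function names are assumptions for illustration. From the definition of $\eta_{\rm F}$, the rebuilt voltage is $V_{\rm r} = (2\eta_{\rm F} - 1)V_{\rm rect}$, so the charge needed to bring $C_{\rm P}$ back to $V_{\rm rect}$ is $2C_{\rm P}V_{\rm rect}(1 - \eta_{\rm F})$ per half cycle.

```python
def ideal_output_power(v_rect, v_oc, c_p, f, eta_f):
    """Idealized output power for a rectifier with flipping efficiency eta_f."""
    q_available = 2.0 * c_p * v_oc               # charge supplied by I_P per half cycle
    q_loss = 2.0 * c_p * v_rect * (1.0 - eta_f)  # charge spent rebuilding V_P up to V_rect
    q_out = max(q_available - q_loss, 0.0)       # charge actually delivered to the output
    return 2.0 * f * v_rect * q_out              # two half cycles per period


def ideal_mpp(v_oc, c_p, f, eta_f):
    """Rectified voltage and power at the maximum power point (requires eta_f < 1)."""
    v_mpp = v_oc / (2.0 * (1.0 - eta_f))
    return v_mpp, ideal_output_power(v_mpp, v_oc, c_p, f, eta_f)


# eta_f = 0 corresponds to the FBR (V_r = -V_rect), eta_f = 0.5 to the SOR (V_r = 0).
v_oc, c_p, f = 1.0, 80e-12, 110e3
_, p_fbr = ideal_mpp(v_oc, c_p, f, eta_f=0.0)
for name, eta_f in (("FBR", 0.0), ("SOR", 0.5), ("flip, eta_F = 0.85", 0.85)):
    v_mpp, p_max = ideal_mpp(v_oc, c_p, f, eta_f)
    print(f"{name}: V_mpp = {v_mpp:.2f} V, power ratio vs FBR = {p_max / p_fbr:.2f}")
```

In this ideal model the improvement over the FBR is $1/(1 - \eta_{\rm F})$; measured values differ because diode drops, switch losses, and a finite $R_{\rm P}$ are ignored.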

Conventionally, we can obtain such a voltage flipping operation using the parallel-synchronous switch harvesting on inductor (P-SSHI) approach [20], but at the expense of a bulky off-chip high-Q inductor. Alternatively, as demonstrated in [21, 22], we can also flip the voltage of $C_P$ efficiently using capacitors only, so that the interface circuit can be fully integrated on-chip. The basic idea is to first extract the energy on $C_P$ using a capacitor, and then reuse this energy to recharge $C_P$ with the opposite polarity. In this way, we can swiftly invert the voltage on $C_P$ while minimizing the energy loss. To realize this, a multiphase operation charges or discharges $C_P$ stepwise, reducing the conduction loss and improving $\eta_{\rm F}$.
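The basic capacitive flipping idea can be illustrated with a deliberately simplified toy model: a single flying capacitor and three phases (dump, short, recharge with reversed polarity), using ideal charge sharing. This is an assumption-laden sketch, not the multi-capacitor, multiphase schedule of [21, 22]; the function names and the normalization to $V_{\rm rect} = 1$ are hypothetical.

```python
def share(c_a, v_a, c_b, v_b):
    """Common voltage after two capacitors are connected in parallel (charge conserved)."""
    return (c_a * v_a + c_b * v_b) / (c_a + c_b)


def steady_state_flip(c_p, c_fly, n_cycles=200):
    """Return V_r / V_rect once the repeated flip sequence has settled."""
    v_fly = 0.0  # flying-capacitor voltage magnitude carried between flips
    v_r = 0.0
    for _ in range(n_cycles):
        v_p = 1.0                              # C_P starts each flip at V_rect (normalized)
        v_fly = share(c_p, v_p, c_fly, v_fly)  # phase 1: dump C_P charge into C_fly
        v_p = 0.0                              # phase 2: short C_P to clear residual charge
        v_r = share(c_p, v_p, c_fly, v_fly)    # phase 3: reconnect C_fly with reversed polarity
        v_fly = v_r                            # both capacitors now share the same magnitude
    return v_r


for ratio in (1.0, 18.0):
    v_r = steady_state_flip(c_p=1.0, c_fly=ratio)
    eta_f = (v_r + 1.0) / 2.0                  # flipping efficiency as defined in the text
    print(f"C_fly/C_P = {ratio:4.1f}: V_r/V_rect = {v_r:.3f}, eta_F = {eta_f:.3f}")
```

Even with a very large capacitor this single-capacitor scheme cannot rebuild more than half of $V_{\rm rect}$, which is one way to see why multiple capacitors and more phases are needed to push $\eta_{\rm F}$ higher.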

Figure 8a shows the capacitive PEH interface (known as flipping capacitor rectifier, FCR) exploiting a seven-phase operation with four flying capacitors for both the positive transition cycle (PTC) and negative transition cycle (NTC) in [22], with the on-chip capacitors  $C_{1-4}$  reconfigured over the seven  $C_P$  voltage flipping



Fig. 9 (a) Chip micrograph in [22], (b) the achieved  $P_{OUT}$  over  $V_{rect}$ , and (c) measured waveforms of  $V_P$  at different testing conditions

phases. A commonly used performance benchmark is the maximum output power improving rate (MOPIR), defined as the ratio between the extracted power and that using an FBR. From Fig. 8b, we observe that we can reduce the conduction loss by either increasing the total capacitance $C_{\text{total}}$ for a given $C_{\text{P}}$ or increasing the number of phases. As a result, the extracted output power increases.

The work in [22], fabricated in a 0.18 $\mu$m CMOS 1.8/3.3/6 V process, occupies an active area of 1.7 mm<sup>2</sup>. With the PEH $C_{\rm P}$ characterized as 80 pF, the total on-chip capacitance $C_{\rm total}$ is set to 1.44 nF to achieve a $C_{\rm total}$-to-$C_{\rm P}$ ratio of 18. With all the components implemented on a single chip, the capacitors cover ~85% of the total chip area (Fig. 9a).

Figure 9b plots the measured output power $P_{out}$ at different output $V_{rect}$ for both the FCR in [22] and the FBR at 110 kHz. The output power reaches up to 50.2 $\mu$W, and the achieved MOPIR is 4.83×. At low $V_{in}$, the low switch turn-on voltage leads to a reduced output power. Figure 9c presents the measured PEH voltage under different measurement settings to demonstrate the effect of phase offset, incomplete charge transfer, and reduced conduction time on the extracted output power. The maximum PEH voltage of 5.1 V occurs when the flip time is ~1 $\mu$s with no energy loss due to phase offset, and it drops to 3.8 V when a phase offset of ~500 ns is enforced.
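A quick back-of-the-envelope check ties the reported figures together. The inputs below are the values quoted in the text; the derived capacitor area, capacitance density, and implied FBR power are estimates computed here, not numbers reported in [22], and the FBR figure assumes both maxima were measured under the same excitation.

```python
c_p = 80e-12              # PEH capacitance (F), from the text
c_total = 1.44e-9         # total on-chip capacitance (F), from the text
chip_area_mm2 = 1.7       # active area (mm^2), from the text
cap_area_fraction = 0.85  # approximate capacitor share of the chip area, from the text
p_out_fcr_uw = 50.2       # maximum FCR output power (uW), from the text
mopir = 4.83              # reported MOPIR

cap_ratio = c_total / c_p                                  # C_total over C_P
cap_area_mm2 = cap_area_fraction * chip_area_mm2           # estimated capacitor area
cap_density_nf_mm2 = (c_total * 1e9) / cap_area_mm2        # estimated density (nF/mm^2)
p_fbr_implied_uw = p_out_fcr_uw / mopir                    # implied FBR maximum power

print(f"C_total/C_P = {cap_ratio:.1f}")
print(f"capacitor area ~ {cap_area_mm2:.2f} mm^2, density ~ {cap_density_nf_mm2:.2f} nF/mm^2")
print(f"implied FBR maximum power ~ {p_fbr_implied_uw:.1f} uW")
```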

|                                                 | FCR Design                                                              | JSSC'16 [8]                          | ISSCC'14 [23]                                    | JSSC'14 [24]                         | JSSC'10 [25]                        | TCAS-I'17 [26]                       | JSSC'16 [27]                        |
|-------------------------------------------------|-------------------------------------------------------------------------|--------------------------------------|--------------------------------------------------|--------------------------------------|-------------------------------------|--------------------------------------|-------------------------------------|
| Technology                                      | 0.18 µm                                                                 | 0.35 µm                              | 0.35 µm                                          | 0.35 µm                              | 0.35 µm                             | 0.25 µm Bi                           | 0.35 µm HV                          |
| Energy extraction technique                     | Flipping-capacitor<br>rectifier                                         | P-SSHI                               | Energy<br>pile-up                                | Energy<br>investment                 | P-SSHI                              | P-SSHI                               | P-SSHI                              |
| Piezoelectric<br>harvester                      | Piezo Systems Inc.<br>(P5A4E @ 5 mm <sup>3</sup> )                      | MIDE V21B<br>& V22B                  | Emulated<br>(transformer + RC)                   | MIDE V22B                            | MIDE V22B                           | MIDE V22B                            | MIDE V20W                           |
| Key component                                   | On-chip<br>MIM capacitor<br>(C <sub>total</sub> = 1.44 nF) <sup>a</sup> | External<br>inductor<br>(L = 3.3 mH) | External<br>inductor<br>(L = 10 mH) <sup>b</sup> | External<br>inductor<br>(L = 330 µH) | External<br>inductor<br>(L = 47 µH) | External<br>inductor<br>(L = 220 µH) | External<br>inductor<br>(L = 20 µH) |
| Max. output power<br>increasing rate<br>(MOPIR) | 4.83x<br>4.78x <sup>c</sup>                                             | 6.81x                                | 4.22x                                            | 3.6x                                 | 2.8x                                | 2.07x                                | 5x <sup>e</sup>                     |
| Max. voltage flipping eff. ( $\eta_{\rm F}$ )   | 0.85                                                                    | 0.94                                 | 0.77 <sup>b</sup>                                | NA                                   | 0.75 <sup>b</sup>                   | 0.75                                 | 0.67 <sup>b</sup>                   |
| Chip size                                       | 1.7 mm <sup>2</sup>                                                     | 0.72 mm <sup>2</sup>                 | 5.5 mm <sup>2</sup>                              | 2.34 mm <sup>2</sup>                 | 4.25 mm <sup>2</sup>                | 0.74 mm <sup>2</sup>                 | 0.6 mm <sup>2</sup>                 |
| Output power                                    | 50.2 μW                                                                 | 160.7 µW <sup>d</sup>                | 87 µW                                            | 52 µW                                | 32.5 µW                             | 136 µW                               | 75 μW                               |
| Operating freq.                                 | 110 kHz                                                                 | 225 Hz                               | 100 Hz                                           | 143 Hz                               | 225 Hz                              | 144 Hz                               | 82 Hz                               |

Table 1 Performance comparison of FCR with state-of-the-art PEH interfaces

<sup>a</sup>Total capacitance for  $C_{1-4}$ 

<sup>b</sup>Estimated from the corresponding literature

<sup>c</sup>Averaged over 4 measured samples

<sup>d</sup>Off-resonance with 3.35 g acceleration

<sup>e</sup>FBR output power limited by excessive diode voltage drop

Table 1 compares the proposed FCR technique with state-of-the-art PEH interfaces [8, 23–27]. As observed, the FCR technique achieves full integration using 1.44 nF of on-chip capacitance within a chip size of 1.7 mm<sup>2</sup>. The achieved MOPIR is up to $4.83\times$, which is comparable to inductive PEH interfaces that rely on bulky high-Q inductors with inductances up to the mH range.

Based on the FCR technique in [22], we can further improve the PEH interface performance using the split-phase technique in [28, 29], obtaining an even higher MOPIR. Figure 10a presents the system diagram of the corresponding split-phase FCR (SPFCR) design, which includes both maximum power point tracking (MPPT) and output voltage control. The idea is to reuse the capacitor array with 4 $C_{\rm fly}$ to generate a total of 21 PEH voltage flipping phases, while utilizing the same capacitors during the non-voltage flipping time to provide multiple voltage conversion ratios (VCRs). The design also provides an MPPT scheme that relaxes the device voltage tolerance requirement.

Figure 11 exhibits the 21-split-phase operation using 4  $C_{\rm fly}$ , with the harvester voltage biased in a step-wise manner. We design the voltage across each capacitor,



Fig. 10 (a) The SPFCR PEH interface using four flying capacitors with capacitor reuse as proposed in [29], (b) the concept of capacitor reuse between the DC-DC voltage conversion phase, and the SPFCR PEH voltage flipping phase



Fig. 11 The 21-phase SPFCR operation using four flying capacitors in [29]

defined by the multiphase operation, to be distinct in order to facilitate voltage level generation and DC-DC converter implementation.

Figure 12 demonstrates how we can reuse the 4  $C_{\rm fly}$  for DC-DC conversion. With the reconfiguration arrangement from Fig. 11, the capacitor voltages after the 21-phase SPFCR operation will be  $V_{\rm C1} = 0.27V_{\rm rect}$ ,  $V_{\rm C2} = 0.32V_{\rm rect}$ ,  $V_{\rm C3} = 0.19V_{\rm rect}$ , and  $V_{\rm C4} = 0.1V_{\rm rect}$ , respectively. By selecting appropriate interconnections among the capacitors in the charging ( $\Phi_{\rm C}$ ) and discharging ( $\Phi_{\rm D}$ ) phases, we can implement different buck/boost VCRs, with VCR update enforced after the MPPT operation for wide input adaptation.
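One way to see why distinct capacitor voltages help is to count how many different voltage levels simple series stacking can produce. The sketch below uses the four capacitor voltages quoted above, expressed in percent of $V_{\rm rect}$ to keep the arithmetic exact. Treating the available levels as plain series sums is an assumption made for illustration; the actual $\Phi_{\rm C}$ and $\Phi_{\rm D}$ configurations in [29] may differ, and the equal-voltage case is hypothetical.

```python
from itertools import combinations


def series_levels(cap_voltages_pct):
    """All voltages obtainable by stacking any non-empty subset of capacitors in series."""
    levels = set()
    for k in range(1, len(cap_voltages_pct) + 1):
        for subset in combinations(cap_voltages_pct, k):
            levels.add(sum(subset))
    return sorted(levels)


spfcr = [27, 32, 19, 10]  # V_C1..V_C4 in percent of V_rect, from the text
equal = [25, 25, 25, 25]  # hypothetical equal-voltage case for comparison

print("distinct voltages:", series_levels(spfcr))
print("equal voltages:   ", series_levels(equal))
```

With four distinct voltages every one of the 15 non-empty subsets yields a different level, whereas four equal voltages give only 4 levels.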

This work employs the fractional open-circuit voltage ($V_{OC}$) approach to achieve MPPT. However, as $V_{OC}$ is typically twice the MPP voltage, the $V_{OC}$ sampling ultimately dictates the device voltage tolerance requirement (Fig. 13a). By exploiting the PEH-dependent empirical correlation between $V_{MPP}$ and the $V_{OC}$ of the FBR in Fig. 13b, this work can achieve MPPT with a much lower voltage tolerance. Specifically, as observed in the control waveforms in Fig. 13c, the MPPT arbiter first resets $C_P$ and then samples $V_{OC,FBR}$ through the peak detector. The sampled



Fig. 12 Reusing of the 4  $C_{fly}$  to realize DC-DC conversion during the non-voltage flipping period in [29]



**Fig. 13** (a) Illustration of the advantage of using $V_{\text{OC,FBR}}$ instead of $V_{\text{OC,SPFCR}}$; (b) the empirical relationship between $V_{\text{MPP,SPFCR}}$ and $V_{\text{OC,FBR}}$; (c) the control waveforms of the MPPT arbiter; and (d) the simplified circuit implementation of the MPPT arbiter in [29]



Fig. 14 (a) Chip micrograph of the PEH interface using SPFCR in [29] and (b) the empirical relationship between  $V_{\text{MPP,SPFCR}}$  and 2  $V_{\text{OC,FBR}}$  over different  $P_{\text{in,FBR}}$ 



Fig. 15 (a) The measured waveform under MPPT and SPFCR operations and (b) the measured  $P_{\text{out}}$  versus  $V_{\text{rect}}$  under different external accelerations and the achieved MOPIR under different  $P_{\text{in}, \text{FBR}}$  in [29]

value is then divided and compared to generate the controls for the VCR update. Figure 13d shows the simplified circuit-level implementation of the MPPT arbiter.

Figure 14a presents the PEH interface exploiting the 21-phase SPFCR approach, implemented in 0.18  $\mu$ m CMOS; it occupies an area of ~0.21 mm<sup>2</sup>, with 4 off-chip  $C_{\rm fly}$  of 68 nF each. Figure 14b demonstrates the empirical ratio between  $V_{\rm MPP,SPFCR}$  and 2  $V_{\rm OC,FBR}$ , with a measured value of approximately 2.3, validating the possibility of using  $V_{\rm OC,FBR}$  for MPPT.

Figure 15a displays the MPPT as well as the 21-phase SPFCR operations. As observed, the MPPT arbiter can generate the required  $V_{\text{MPP,SPFCR}}$  and 2  $V_{\text{OC,FBR}}$  ratio for successful MPPT operation. Figure 15b shows the measured  $P_{\text{out}}$  versus  $V_{\text{rect}}$  and the achieved MOPIR under different  $P_{\text{in,FBR}}$ . We can demonstrate that this

|                                | SPFCR                                                                   | JSSC'17 [30]                          | JSSC'17 [22]                         | JSSC'19 [31]                          |
|--------------------------------|-------------------------------------------------------------------------|---------------------------------------|--------------------------------------|---------------------------------------|
| Technology                     | 0.18 µm                                                                 | 0.35 µm                               | 0.18 µm                              | 0.18 µm HV                            |
| Energy extraction technique    | Split-phase flipping<br>capacitor rectifier                             | SSHC                                  | FCR                                  | SE-SSHC                               |
| Piezoelectric<br>harvester     | MIDE PPA1021                                                            | MIDE V21BL                            | P5A4E @ 5 mm<sup>3</sup>             | Custom MEMS                           |
| Harvester size                 | 71 × 10.3 × 0.86 mm<sup>3</sup>                                         | 90 × 16.7 × 0.79 mm<sup>3</sup>       | 5 × 1 × 1 mm<sup>3</sup>             | 7 × 2 mm<sup>2</sup> (4 pieces)       |
| C<sub>p</sub>                  | 22 nF                                                                   | 45 nF                                 | 78.4 pF                              | 1.94 pF                               |
| Key component                  | 4 capacitors<sup>a</sup><br>21-phase                                    | 8 capacitors<sup>a</sup><br>17-phase  | 4 capacitors<sup>b</sup><br>7-phase  | 8 capacitors<sup>b</sup><br>17-phase  |
| C <sub>total</sub>             | 272 nF                                                                  | 360 nF                                | 1.44 nF                              | 4 nF                                  |
| MOPIR                          | 5.9 ~ 9.3× @V <sub>D</sub> = 0.12 V<br>3.7 ~ 6.2× @V <sub>D</sub> = 0 V | 9.7×                                  | 4.83×                                | 8.21×                                 |
| P <sub>in</sub> adaptation     | Capacitor-reuse<br>multi-VCR<br>SC DC-DC                                | no                                    | no                                   | no                                    |
| MPPT                           | Yes<br>( <i>V</i> <sub>OC,FBR</sub> -based)                             | no                                    | no                                   | no                                    |
| Voltage flipping<br>efficiency | 0.84                                                                    | 0.8                                   | 0.85                                 | 0.69                                  |
| Chip size                      | 0.2 mm <sup>2</sup>                                                     | 2.9 mm <sup>2</sup>                   | 1.7 mm <sup>2</sup>                  | 5.3 mm <sup>2</sup>                   |
| Output power                   | 0.5 ~ 64 μW                                                             | 161.8 µW                              | 50.2 μW                              | 186 µW                                |
| Operating freq.                | 200 Hz                                                                  | 92 Hz                                 | 110 kHz                              | 219 Hz                                |

Table 2 Performance comparison of SPFCR with state-of-the-art PEH interfaces

<sup>a</sup>Off-chip component

<sup>b</sup>On-chip component

work can obtain a high MOPIR of up to  $9.3 \times$  (with a diode drop of 0.12 V), while accomplishing a wide input power adaptation with  $V_{out}$  set to 2 V. The drop at the low input power end is due to the increased intrinsic loss.

Table 2 summarizes the performance comparison of the SPFCR with state-of-the-art PEH interfaces [22, 30, 31]. Using only four capacitors, the proposed SPFCR PEH interface can achieve a 21-phase PEH voltage flipping efficiency of up to 0.84. Using  $V_{\rm OC,FBR}$ -based MPPT, we can realize wide  $P_{\rm in}$  adaptation by reconfiguring the four capacitors for multi-VCR SC DC-DC conversions. It also features a high MOPIR of up to 9.3× at a diode voltage drop  $V_{\rm D} = 0.12$  V.

## **3** Reconfigurable SC DC-DC Boost Converter for Solar/Thermal Energy Harvesting

Switched-capacitor (SC) DC-DC converters offer conversion efficiency advantages for on-chip implementation when compared with inductive converters [32, 33]. SC converters require much smaller energy storage elements and can be realized in a bulk CMOS process, making them an attractive approach for energy-constrained miniaturized IoT devices that need efficient power conversion in a small system form factor [8, 9, 16, 34–42]. As depicted in Fig. 16a, ambient variations in the energy harvester operating conditions can lead to a widely distributed output voltage at the harvester node. Consequently, the DC-DC converter after the harvester module should feature multiple-ratio conversion over a wide input voltage range to improve the overall efficiency, as illustrated in Fig. 16b. For instance, the harvested voltage can be far lower than the supply voltage required by the system; the voltage provided by a solar cell is normally around 0.2–0.4 V. A power converter with a wide and variable conversion range is therefore a prerequisite for powering a load or charging a storage element. However, traditional single-VCR SC converters suffer from a limited voltage conversion range and overall efficiency degradation under these application scenarios. Accordingly, highly integrated SC DC-DC converters with multiple VCRs are superior solutions for wide-range voltage conversion, improved harvesting functionality and performance, and a higher system integration level [43–54]. In this section, we discuss the generic SC converter power stage intrinsic loss model together with advanced topology techniques for generating fine-grained VCRs and improving the interfacing capability with DC-type energy harvesters (i.e., solar and thermoelectric sources).



Fig. 16 (a) DC-DC converter with a wide input voltage range for DC-type energy harvesting powered systems. (b) Efficiency improvement over a wide input range through multiple ratio conversion

#### 3.1 SC Converter Power Stage Losses

In general, the power stage intrinsic performance of two-phase SC converters is highly dependent on the charge conduction loss and the flying capacitor ( $C_{\rm fly}$ ) parasitic loss, which are the critical factors that degrade the efficiency during the switching operations. Establishing an effective and easy-to-use model to evaluate the SC power stage losses is essential for alleviating the loss influence and optimizing the conversion efficiency in an EH interface. Figure 17a shows an SC converter equivalent model with a generic VCR of m: n [55]. The equivalent output impedance  $R_{OUT}$  demonstrates that the conduction losses are dependent on both the slow switching limit loss  $(R_{SSL})$  and fast switching limit loss  $(R_{FSL})$ . The converter switching frequency  $f_{\rm S}$  dominates the overall  $R_{\rm OUT}$  under a given on-chip total capacitance and switch conductance (active area) (Fig. 17b). For a single SC cell, the charge conduction from the input to the output through a  $C_{\rm fly}$  generates charge-sharing loss between the voltage source and capacitors, inevitably causing a conversion efficiency drop. As depicted in Fig. 17c, under a slow switching condition (i.e., at a low  $f_{\rm S}$ ), we can evaluate the corresponding periodic loss power with the SC power stage in two-phase operation as

$$P_{\rm ls,SSL} = \frac{Q_{\rm cond}^2}{C_{\rm fly}}, \tag{3}$$



Fig. 17 (a) Generalized SC converter power stage equivalent model and (b) the equivalent output impedance versus switching frequency under a given total capacitance and switch conductance; the mechanism illustration for (c) charge-sharing loss, (d) switch conduction loss, and (e) bottom plate parasitic loss

where  $Q_{\text{cond}}$  is the conducted charge amount. On the contrary, at a high  $f_{\text{S}}$ , the switch turn-on resistance  $R_{\text{ON}}$  will dominate the loss power, as highlighted in Fig. 17d. Then, we can calculate the loss power through

$$P_{\rm ls,FSL} = \frac{Q_{\rm cond}^2}{2C_{\rm fly}} \coth\left(\frac{T_{\rm S}}{2R_{\rm ON}C_{\rm fly}}\right) f_{\rm S} \tag{4}$$

With Eqs. (3) and (4), the equivalent  $R_{SSL}$  and  $R_{FSL}$  for a single SC cell become

$$R_{\rm SSL} = \frac{1}{f_{\rm S}C_{\rm fly}} \left(\frac{Q_{\rm cond}}{Q_{\rm OUT}}\right)^2 \tag{5}$$

$$R_{\rm FSL} = \frac{R_{\rm SSL}}{2} \cdot \coth\left(\frac{1}{4f_{\rm S}R_{\rm ON}C_{\rm fly}}\right) \tag{6}$$

where  $Q_{OUT}$  denotes the output charge delivered per period. Regarding Eq. (6), as  $f_S$  approaches infinity, the asymptotic limit of  $R_{FSL}$  is

$$R_{\rm FSL}|_{f_{\rm S}\to\infty} = 2R_{\rm ON} \left(\frac{Q_{\rm cond}}{Q_{\rm OUT}}\right)^2 \tag{7}$$

As observed, Eqs. (3)–(7) describe the loss of a single SC cell. Consequently, we need to add up the corresponding losses from all SC power cells in a converter power stage to obtain the overall  $R_{SSL}$  and  $R_{FSL}$ . From [55], we can estimate the overall  $R_{OUT}$  for an SC converter power stage by

$$R_{\rm OUT} \approx \sqrt{R_{\rm SSL}^2 + R_{\rm FSL}^2} \tag{8}$$
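As a minimal numerical sketch of Eqs. (5)–(8) for a single SC cell, the following Python snippet evaluates  $R_{SSL}$ ,  $R_{FSL}$ , and  $R_{OUT}$  over frequency. The component values (1 nF, 2 Ω) and the function names are illustrative assumptions rather than values from the designs discussed here, and the charge ratio  $Q_{cond}/Q_{OUT}$  is assumed to be 1.

```python
import math

def r_ssl(f_s, c_fly, charge_ratio=1.0):
    """Eq. (5): slow-switching-limit resistance of one SC cell."""
    return charge_ratio ** 2 / (f_s * c_fly)

def r_fsl(f_s, c_fly, r_on, charge_ratio=1.0):
    """Eq. (6): fast-switching-limit resistance of one SC cell."""
    x = 1.0 / (4.0 * f_s * r_on * c_fly)
    coth_x = 1.0 / math.tanh(x)
    return 0.5 * r_ssl(f_s, c_fly, charge_ratio) * coth_x

def r_out(f_s, c_fly, r_on, charge_ratio=1.0):
    """Eq. (8): root-sum-square estimate of the output impedance."""
    return math.hypot(r_ssl(f_s, c_fly, charge_ratio),
                      r_fsl(f_s, c_fly, r_on, charge_ratio))

if __name__ == "__main__":
    C_FLY = 1e-9   # assumed flying capacitance, 1 nF
    R_ON = 2.0     # assumed switch on-resistance, 2 ohm
    for f_s in (1e5, 1e6, 1e7, 1e8, 1e9):
        print(f"f_S = {f_s:8.0e} Hz  "
              f"R_SSL = {r_ssl(f_s, C_FLY):10.3f}  "
              f"R_FSL = {r_fsl(f_s, C_FLY, R_ON):10.3f}  "
              f"R_OUT = {r_out(f_s, C_FLY, R_ON):10.3f}")
```

At very high  $f_{\rm S}$  the computed  $R_{FSL}$  approaches  $2R_{ON}(Q_{cond}/Q_{OUT})^2$ , which is the limit stated in Eq. (7).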

In addition to the conduction loss,  $C_{\rm fly}$  parasitic loss can also affect the conversion efficiency. The parasitic loss is especially critical for fully integrated SC converters because the on-chip capacitor devices generally suffer from the considerable top/bottom plate parasitic effect. The parasitic capacitance is typically proportional to the main capacitor size with a factor of  $\beta$  (Fig. 17e). Depending on the capacitor type, the value of  $\beta$  can be up to 0.1. We can use the following equation to evaluate the parasitic loss to the first order for a single SC cell:

$$P_{\rm ls,par} = \beta C_{\rm fly} \cdot \Delta V_{\rm CP}^2 f_{\rm S} \tag{9}$$

where  $\Delta V_{CP}$  is the top/bottom plate voltage swing of the  $C_{fly}$ . The above-discussed conduction and parasitic losses can directly restrict the achievable conversion efficiency. The related discussion on the optimization of the SC converter power stage losses appeared in [34, 43, 55, 56].

## 3.2 Two-Dimensional Series-Parallel (SP)-Based Topology for Fractional VCR Generation

Several state-of-the-art topology techniques obtained reduced conduction and parasitic losses when generating fine-grained step-down VCRs [44, 48, 49]. An integer-cascading-fractional power stage architecture is a widely adopted topology solution to provide wide-range and fine-grained conversion ratio generation. Such an approach involves a converter stage that realizes a high integer VCR and a cascaded SC stage that features fractional ratio conversion to fine-tune the output levels [38, 39]. Still, this method may suffer from suboptimal conduction losses when realizing rational boost VCRs over a wide range, especially in on-chip conversion scenarios, due to the limited total capacitor area. We can express a general rational boost VCR as below:

$$VCR = \frac{V_{OUT}}{V_{IN}} = K + \frac{m}{n}, \tag{10}$$

where *K*, *m*, and *n* are all positive integers with  $m \le n$  and *m* and *n* relatively prime. In Eq. (10), we can implement the integer part of the ratio (i.e., *K*) using a well-developed classic integer topology, for instance, the Dickson SC topology, to realize optimal on-chip loss reduction [34, 57], while we can implement the fractional ratio m/n with the series-parallel (SP) topology, one of the most well-known solutions for fractional VCR generation, using relatively modular power cells. A designer can adopt an *m*-row-by-*n*-column power cell array to construct the SP power stage for implementing a flexible ratio of m/n [43, 58]. Figure 18a illustrates such a topology, defined as a two-dimensional SP (2DSP) structure. Cooperating with an integer conversion stage (e.g., a Dickson structure), the 2DSP-based SC converter features high step-up rational VCR generation. To obtain complete on-chip capacitor utilization, we can reallocate part of the  $C_{fly}$  to generate either the integer or fractional ratios [50, 59], enhancing the charge-sharing loss reduction (equivalently, lowering the  $R_{SSL}$ ).

Regarding the discussed  $R_{SSL}$  loss metric, as described in Eq. (5), its optimal level when generating a boost VCR expressed in Eq. (10) under a given total flying capacitance  $C_{TOT}$  is

$$R_{\text{SSL,opt}} = \frac{1}{C_{\text{TOT}} f_{\text{S}}} \left(\frac{Kn + m - 1}{n}\right)^2. \tag{11}$$

The above equation is valid for the existing SC boost topologies, and it involves the loss from both the integer and fractional stages. In the following discussion, we assume the generation of the integer ratio (K), based on the Dickson topology, for optimal losses in fully integrated implementation scenarios.

We discuss the issues of using 2DSP topology when generating a fractional ratio of m/n [52]. Figure 18 demonstrates the two-phase operation for implementing the



Fig. 18 (a) Generalized 2DSP SC power stage and (b) its two-phase operation states

VCR in Eq. (10) using the 2DSP topology, where the voltage  $(K-1)V_{IN}$  is from the integer ratio stage (Dickson SC). The  $m \times n$  SC cell array produces the ratio m/n. A flexible assignment of m and n can ensure high fractional VCR possibilities. Given that the integer part exhibits the optimal  $R_{SSL}$  property, the corresponding  $R_{SSL}$  for a general 2DSP topology under optimized  $C_{fly}$  charge flow assignment is

$$R_{\rm SSL,2DSP} = \frac{\left(\sum_{i=1}^{N_{\rm C\_2DSP}} |a_{\rm c,i}| + \sum_{k=1}^{N_{\rm C\_int}} |a_{\rm c,k}|\right)^2}{C_{\rm TOT}f_{\rm S}} = \frac{(m+K-1)^2}{C_{\rm TOT}f_{\rm S}}, \tag{12}$$

where  $N_{\rm C\_2DSP}$  and  $N_{\rm C\_int}$  are the total numbers of power cell units in the fractional and integer power stages, respectively. We can compare the  $R_{SSL,2DSP}$  in Eq. (12) with the  $R_{SSL,opt}$  in Eq. (11) through the following analysis:

$$\sum_{i=1}^{N} |a_{c,i}|_{(2DSP)} - \sum_{i=1}^{N} |a_{c,i}|_{(opt)} = m + K - 1 - \frac{Kn + m - 1}{n} = \frac{1}{n}(mn - n - m + 1). \tag{13}$$

From the above equation, note that  $mn - n - m + 1 = (m - 1)(n - 1)$ . Since m/n is defined as a proper fraction with  $n \neq 1$ , the only solution for  $R_{SSL,2DSP} = R_{SSL,opt}$  is m = 1. The 2DSP-based topology therefore yields a suboptimal  $R_{SSL}$  for most conversion ratios, except when using a  $1 \times n$  array (a special case) to implement a VCR of (K + 1/n). Such a special case can only realize a limited set of fractional VCR parts between 0 and 1/2, which degrades its application flexibility.
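The comparison in Eqs. (11)–(13) can be checked numerically. The sketch below evaluates both  $R_{SSL}$  expressions, normalized to  $1/(C_{TOT}f_{S})$ , with exact rational arithmetic; the function names and the chosen (K, m, n) examples are illustrative assumptions.

```python
from fractions import Fraction

def rssl_opt_norm(K, m, n):
    """Eq. (11), normalized to 1 / (C_TOT * f_S)."""
    return Fraction(K * n + m - 1, n) ** 2

def rssl_2dsp_norm(K, m, n):
    """Eq. (12), normalized to 1 / (C_TOT * f_S)."""
    return Fraction(m + K - 1) ** 2

def charge_multiplier_gap(K, m, n):
    """Left-hand side of Eq. (13): 2DSP sum minus optimal sum."""
    return Fraction(m + K - 1) - Fraction(K * n + m - 1, n)

if __name__ == "__main__":
    K = 2  # assumed integer part, for illustration only
    for m, n in [(1, 3), (2, 3), (3, 4), (4, 5)]:
        gap = charge_multiplier_gap(K, m, n)
        # Eq. (13) equals (m - 1)(n - 1) / n, so the gap vanishes only for m = 1.
        assert gap == Fraction((m - 1) * (n - 1), n)
        print(f"VCR = {K} + {m}/{n}: "
              f"R_SSL,2DSP = {rssl_2dsp_norm(K, m, n)}, "
              f"R_SSL,opt = {rssl_opt_norm(K, m, n)}, gap = {gap}")
```

For example, with K = 2 and m/n = 2/3 the normalized values are 9 for the 2DSP case and 49/9 for the optimum, whereas for m/n = 1/3 both equal 4.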

## 3.3 Algebraic Series-Parallel (ASP)-Based SC Topology Development

From the above discussion, even though the 2DSP-based topology can reach good VCR flexibility with an  $m \times n$  capacitor array, it suffers from suboptimal  $R_{SSL}$  in most VCR cases because it uses extra  $C_{fly}$ . Besides, the 2DSP topology also exhibits a relatively larger  $C_{\rm fly}$  bottom plate voltage swing ( $|\Delta V_{\rm CB}|$ ), resulting in increased  $C_{\rm fly}$  parasitic loss. Those limitations affect the achievable conversion efficiency of the 2DSP-based wide-range fine-grained converter, especially for on-chip implementations. To resolve the suboptimal  $R_{SSL}$  and increased parasitic loss limitations of the conventional 2DSP, we introduced an algebraic SP (ASP) topology technique, which can theoretically deliver optimal  $R_{SSL}$  and improved parasitic loss simultaneously while retaining the fractional VCR generation flexibility. The topology development exploits the basic power cell operations of the 2DSP and then elaborates the  $V_{OUT}$  expression algebraically using ASP-based topology operations. By systematically assigning the terminal voltages of the SC power cells in different switching phases according to a simple algorithm, a designer can realize arbitrary rational boost VCRs using the ASP-based method (Dickson + ASP) without using extra  $C_{\rm fly}$ . The  $|\Delta V_{\rm CB}|$  in each cell is also well bounded to reduce the parasitic loss.

According to Fig. 18, we can represent the integer part generation by the voltage  $(K-1)V_{IN}$  for simplicity. Then, we can express the realization of the VCR in Eq. (10) by the 2DSP-based topology as

$$V_{\text{OUT,2DSP}} = V_{\text{IN}} + \left(m \times \frac{V_{\text{IN}}}{n}\right) + (K-1)V_{\text{IN}},\tag{14}$$

with the second and third terms implemented using the 2DSP and the integer conversion stage, respectively. By stacking both parts over the converter input, we can obtain the final output. One limitation with the 2DSP is that the fractional part during the topology reconfiguration highly relies on *m* and *n*, resulting in an excessive number of  $C_{\rm fly}$  and leading to a suboptimal  $R_{\rm SSL}$ . Furthermore, the limited capacitor voltage of  $(1/n)V_{\rm IN}$  also results in a higher bottom plate switching voltage  $|\Delta V_{\rm CB}|$ , which contributes to significant parasitic loss.

Tackling the issues in the 2DSP structure, Fig. 19 represents the two-phase ASP-based implementation for realizing the VCR in Eq. (10), featuring a uniform charge flow amount through each capacitor. The corresponding algebraic  $V_{OUT}$  expression becomes



Fig. 19 Operation states for the ASP-based rational VCR boost topology with the integer-level generation by Dickson SC structure

$$V_{\text{OUT,ASP}} = KV_{\text{IN}} + (n - m - 1)(K - 1)V_{\text{IN}} + mKV_{\text{IN}} + (n - 1) \times (V_{\text{IN}} - V_{\text{OUT}}) \tag{15}$$

As observed above, the  $V_{OUT}$  expression is constructed as a summation of different terms. The included  $(V_{IN} - V_{OUT})$  term enables flexible m/n generation without dividing  $V_{IN}$  as in the 2DSP operation. Figure 19 shows the corresponding two-phase operations. During  $\Phi_2$ , all the (2n-2) SC cells connect between  $KV_{IN}$  and  $V_{OUT}$ , featuring SP-like operations. The increased  $C_{fly}$  voltages help to lower the bottom plate switching voltage and hence reduce the parasitic loss. Similar to the previously discussed 2DSP case, we can generate  $KV_{IN}$  and  $(K-1)V_{IN}$  using Dickson SC stages for optimal conduction and parasitic losses. As illustrated in Fig. 19, the Dickson SC stages operate in parallel. This corresponds to a total of  $N_{\rm C\_Dks} = (Kn - 2n + m + 1)$  capacitors in the Dickson stages with a uniform conducted charge amount of  $|(1/n)Q_{OUT}|$ . In the ASP-based topology, there are a total of (Kn + m - 1) power cells (including the Dickson cells), with each cell conducting a uniform charge flow of  $|(1/n)Q_{OUT}|$ . Hence, it can theoretically reach the optimal  $R_{SSL}$  according to Eq. (11).
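As a quick consistency check, the sketch below solves Eq. (15) for  $V_{OUT}/V_{IN}$  and confirms that it reproduces Eq. (10), and that (Kn + m - 1) cells each carrying  $(1/n)Q_{OUT}$  give the normalized  $R_{SSL}$  of Eq. (11). The function names are hypothetical, and the snippet only verifies the algebra; it does not model the circuit.

```python
from fractions import Fraction

def asp_vcr_from_eq15(K, m, n):
    """Solve Eq. (15) for V_OUT / V_IN.

    Eq. (15) has the form V_OUT = c * V_IN + (n - 1) * (V_IN - V_OUT),
    so n * V_OUT = (c + n - 1) * V_IN.
    """
    c = K + (n - m - 1) * (K - 1) + m * K
    return Fraction(c + n - 1, n)

def asp_rssl_norm(K, m, n):
    """R_SSL normalized to 1 / (C_TOT * f_S) for uniform charge flow.

    Each of the (K*n + m - 1) cells conducts (1/n) * Q_OUT.
    """
    total_cells = K * n + m - 1
    return (total_cells * Fraction(1, n)) ** 2

if __name__ == "__main__":
    for K, m, n in [(1, 1, 4), (1, 2, 3), (2, 1, 2), (2, 2, 5)]:
        vcr = asp_vcr_from_eq15(K, m, n)
        assert vcr == K + Fraction(m, n)                               # Eq. (10)
        assert asp_rssl_norm(K, m, n) == Fraction(K * n + m - 1, n) ** 2  # Eq. (11)
        print(f"K={K}, m/n={m}/{n}: VCR = {vcr}, "
              f"normalized R_SSL = {asp_rssl_norm(K, m, n)}")
```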

### 3.4 ASP Topology Generation and Analysis

Based on the above-discussed ASP operation concept, Fig. 20 shows a generalized two-phase model for an ASP-based topology framework. It requires a total of  $N_{\rm F} = 2n - 2$  cells to generate a fractional ratio of m/n for an overall VCR of n: (Kn + m). From Eq. (15), there are  $(n - 1)$   $C_{\rm fly}$  connected to the  $(V_{\rm IN} - V_{\rm OUT})$  level, and the other (n - m - 1) and m cells are charged by  $(K - 1)V_{\rm IN}$  and  $KV_{\rm IN}$ , respectively. Also in Fig. 20, the odd cells are associated with  $(K - p)V_{\rm IN}$  and the even cells with  $(V_{\rm IN} - V_{\rm OUT})$ . This cell arrangement ensures a pair-wise operation that can reduce  $|\Delta V_{\rm CB}|$ . We define a configuration factor (p) to determine whether a particular odd cell connects to  $(K - 1)V_{\rm IN}$  or  $KV_{\rm IN}$ . The value of p can be either 0 or 1 to generate  $(K - p_i)V_{\rm IN}$ , where i denotes the cell sequence index; p is not defined for the even cells. Referring to Eq. (15), the sum of all  $p_i$  must equal (n - m - 1) to obtain power stage voltage balance. For the framework in Fig. 20, the steady-state voltage balancing equation is

$$(K - p_1)V_{\rm IN} + (V_{\rm IN} - V_{\rm OUT}) + (K - p_3)V_{\rm IN} + (V_{\rm IN} - V_{\rm OUT}) + \dots + (K - p_{N_{\rm F}-1})V_{\rm IN} + (V_{\rm IN} - V_{\rm OUT}) = V_{\rm OUT} - KV_{\rm IN}. \tag{16}$$

Reorganizing Eq. (16), we have the VCR expression as

$$VCR_{ASP} = K + \frac{N_{\rm F} - 2\sum_{k=1}^{N_{\rm F}/2} p_{2k-1}}{N_{\rm F} + 2}. \tag{17}$$

By substituting  $\Sigma p_k = n - m - 1$  and  $N_F = 2n - 2$  into Eq. (17), we can have the same VCR expression as in Eq. (10).



Fig. 20 Operation states for a general ASP-based boost topology (the integer-level generation by Dickson SC structure)

As observed in Eq. (17), the choice of  $p_k$  is not unique: multiple sets of p deliver the same VCR with optimal  $R_{SSL}$ . However, some choices may introduce suboptimal parasitic loss. The algorithm presented below ensures a systematic p selection together with parasitic loss reduction:

$$p_{i}(i \text{ is odd}) = \begin{cases} 1, \quad \left(\frac{i+1}{2}\right)\left(1-\frac{m}{n}\right) > 1 + \sum_{k=0}^{(i-1)/2} p_{2k-1} \\ 0, \quad \left(\frac{i+1}{2}\right)\left(1-\frac{m}{n}\right) < 1 + \sum_{k=0}^{(i-1)/2} p_{2k-1} \end{cases} \tag{18}$$

From Eq. (18), we can find that  $p_i$  depends on m/n and on the specific configurations of the preceding cells. Figure 21 illustrates the procedure for  $p_i$  determination. By applying the above cell determination rules, the ASP power cell ordering (Fig. 20) ensures that  $|\Delta V_{\rm CB}|$  is bounded below  $V_{\rm IN}$  for reduced parasitic loss with a specific m/n.
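The selection rule of Eq. (18) can be sketched in a few lines of Python. The snippet assigns  $p_i$  to the odd cells in sequence, then checks the result against the balance condition  $\Sigma p_i = n - m - 1$  and against Eq. (17). It assumes that the term  $p_{-1}$  appearing at k = 0 in the sum is zero, so the sum reduces to the  $p$  values of the preceding odd cells; the function names are hypothetical.

```python
from fractions import Fraction

def asp_config_factors(m, n):
    """Return [p_1, p_3, ..., p_(N_F - 1)] following Eq. (18).

    j = (i + 1) / 2 runs over the n - 1 odd cells. The running sum holds
    the factors of the preceding odd cells (p_(-1) is assumed to be 0).
    """
    step = 1 - Fraction(m, n)
    p = []
    for j in range(1, n):
        p.append(1 if j * step > 1 + sum(p) else 0)
    return p

def asp_vcr(K, p):
    """Eq. (17) with N_F = 2 * (number of odd cells)."""
    n_f = 2 * len(p)
    return K + Fraction(n_f - 2 * sum(p), n_f + 2)

if __name__ == "__main__":
    K = 1  # assumed integer part, for illustration only
    for m, n in [(1, 5), (1, 4), (1, 3), (2, 5), (1, 2),
                 (3, 5), (2, 3), (3, 4), (4, 5)]:
        p = asp_config_factors(m, n)
        assert sum(p) == n - m - 1                  # voltage balance condition
        assert asp_vcr(K, p) == K + Fraction(m, n)  # recovers Eq. (10)
        print(f"m/n = {m}/{n}: p = {p}")
```

For m/n = 2/5 the rule gives p = [0, 1, 0, 1], that is, the second and fourth odd cells connect to  $(K-1)V_{\rm IN}$  and the others to  $KV_{\rm IN}$ . Because m and n are relatively prime, the two sides of the comparison in Eq. (18) never coincide for these cells, so the equality case does not need separate handling in this sketch.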

According to the aforementioned loss analysis for a generic SC power stage, Fig. 22a shows the  $R_{SSL}$  comparison between the ASP-based and the 2DSP-based topologies when generating rational VCRs between 1:1 and 1:6 under the constraint of the same total capacitance. In the comparison, we set the fractional part of the VCR as  $m/n = \{1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5\}$ . We used Dickson SC stages in both the ASP and 2DSP cases to realize the integer ratios and included the corresponding loss contributions for a fair comparison. Regarding the integer gain generation, the required number of "unit" power cells in the 2DSP-based topology is typically larger than that of the ASP-based topology. Hence, it leads to a higher  $R_{SSL}$  under the same  $C_{fly}$  area. From Fig. 22a, the ASP-based topology achieves evidently lower  $R_{SSL}$  over the whole range compared with the 2DSP-based topology, except for the cases of m = 1, which exhibit the same  $R_{SSL}$  for both implementations.





Fig. 22 Theoretical comparison between the ASP-based and the 2DSP-based topologies with the fractional VCRs from 1:1 to 1:6 on (a) the  $R_{\rm SSL}$  (normalized to  $C_{\rm TOT}f_{\rm S}$ ), (b) the parasitic loss (normalized to  $\beta C_{\rm TOT}f_{\rm S}V_{\rm IN}$ ) under fixed- $V_{\rm IN}$ , and (c) the parasitic loss (normalized to the same  $\beta C_{\rm TOT}f_{\rm S}V_{\rm OUT}$ ) under fixed- $V_{\rm OUT}$ 

Figure 22b, c show the parasitic loss comparison between the ASP-based and the 2DSP-based topologies. The power loss takes the integer and fractional parts into account in both cases. Furthermore, we can observe that the ASP-based technique effectively reduces the parasitic loss in all the modeled VCRs when compared with the 2DSP-based technique. In Fig. 22c, because of the parasitic loss at higher frequency, we did not scale any of the VCRs by decreasing  $V_{\rm IN}$ ; the loss difference between the two topologies becomes smaller when VCR increases.

#### 3.5 ASP-Based SC Boost Converter Implementation

This part introduces an ASP-based fully integrated SC boost converter with seven rational VCRs to support a wide input voltage range. Each SC power cell can be reconfigured to generate either the integer or fractional ratios. Thus, the whole design achieves full capacitance utilization for optimizing the  $R_{SSL}$ . Figure 23a displays the SC converter overview. The power stage consists of dual interleaved branches operating with a 180-degree phase difference. Each branch consists of four SC cells ( $C_{1 \sim 4}$ ) with the top and bottom plate terminals connected to  $V_{IN}$ ,  $V_{OUT}$ , or  $V_{SS}$  for implementing different ratio configurations. A four-phase nonoverlapping (NOV) clock generator, driven by an externally injected master clock, reduces the shoot-through loss caused by the short-circuit conduction state. This design adopts an adaptive bootstrapping (ABS) gate driving technique for robust power switch control over a wide voltage dynamic range. A three-bit binary code ( $D_{VIN}$ ) configures the seven VCRs and determines the specific topology according to practical conditions.

The designed ASP-based SC boost converter was implemented in a 65 nm bulk CMOS process. The total flying capacitance is about 3 nF. The on-chip filtering capacitance contains 1 nF for  $V_{OUT}$  and 0.3 nF for  $V_{IN}$ . All the on-chip SC cell implementations employ parallel-connected MIM and MOS capacitors, stacking longitudinally to save the on-chip area. This design features using low-voltage power switches to improve the switch on-resistance ( $R_{ON}$ ) and reduce the switching loss. The implemented converter can boost a  $V_{IN}$  between 0.25 V and 1 V to a  $V_{OUT}$  of 1 V. Figure 23b exhibits the converter die micrograph with building block annotations, with the power switches, drivers, NOV clock generators, and buffers placed between the dual branch power cells. The chip occupies an active area of 0.54 mm<sup>2</sup>.



Fig. 23 (a) System overview of the implemented ASP-based SC boost converter and (b) the corresponding converter die micrograph

The SC boost converter shown in Fig. 23 covers a wide VCR range from 1:1.25 to 1:5, specifically 4:5, 2:3, 3:5, 1:2, 2:5, 1:3, and 1:5. By the property of uniform  $C_{fly}$  charge flow, the corresponding required SC cell numbers are 4, 2, 4, 1, 4, 2, and 4, respectively, implying that all seven VCR cases can be realized using  $C_{1 \sim 4}$  with full capacitance utilization. Figure 24 gives a power cell operation and partitioning mode summary. For the implemented VCRs,  $C_{1 \sim 4}$  can be configured to serve as either the Dickson or the ASP power cell. The designed converter generates the rational VCRs of 4:5, 2:3, 3:5, and 2:5 based on the ASP topology. A typical voltage doubler implements the 1:2 ratio, and the conventional Dickson topology generates the VCRs of 1:3 and 1:5 for optimal parasitic loss. In the converter design,  $C_{1 \sim 4}$  can be identical as they have the same charge flow amount under all seven VCRs, which also include all the possible VCR cases using four  $C_{fly}$ .
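The listed cell numbers follow from the (Kn + m - 1) cell count of the uniform charge flow property. The sketch below decomposes each implemented ratio into K + m/n and evaluates that count; treating the integer ratios as m = 0, n = 1 is an assumption made here for illustration, since Eq. (10) itself is stated for positive m.

```python
from fractions import Fraction

def required_cells(v_in_parts, v_out_parts):
    """Cell count K*n + m - 1 for a boost ratio written as V_IN : V_OUT."""
    vcr = Fraction(v_out_parts, v_in_parts)
    n = vcr.denominator
    K = vcr.numerator // n          # integer part of the VCR
    m = vcr.numerator - K * n       # numerator of the fractional part
    return K * n + m - 1

if __name__ == "__main__":
    # The seven implemented ratios, in the order given in the text.
    ratios = [(4, 5), (2, 3), (3, 5), (1, 2), (2, 5), (1, 3), (1, 5)]
    counts = [required_cells(a, b) for a, b in ratios]
    print(counts)  # [4, 2, 4, 1, 4, 2, 4]
    # None of the ratios needs more than the four cells available per branch.
    assert max(counts) <= 4
```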

The converter load regulation control is through pulse-frequency modulation (PFM). We apply a three-bit digital control to achieve the VCR reconfiguration together with a resistive ladder-based input voltage detector. To resolve the startup issue at low input voltage, we can use on-chip charge-pump techniques.
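To illustrate why the VCR must follow the input voltage, the sketch below picks, for a given  $V_{IN}$ , the smallest implemented VCR whose ideal no-load output reaches the 1 V target, and reports the ideal efficiency bound  $V_{OUT}/(VCR \cdot V_{IN})$  of an SC converter regulated below its ideal output. The selection rule, function names, and the absence of any margin for  $R_{OUT}$  drop are assumptions made for illustration; the actual detector thresholds and the  $D_{VIN}$  code mapping are not given in the text.

```python
from fractions import Fraction

# The seven boost ratios V_OUT / V_IN of the implemented converter.
IMPLEMENTED_VCRS = sorted(
    Fraction(out, inp)
    for inp, out in [(4, 5), (2, 3), (3, 5), (1, 2), (2, 5), (1, 3), (1, 5)]
)

def select_vcr(v_in, v_out_target=1.0):
    """Hypothetical rule: smallest VCR with VCR * V_IN >= target output."""
    for vcr in IMPLEMENTED_VCRS:
        if float(vcr) * v_in >= v_out_target:
            return vcr
    raise ValueError("input voltage too low for the available VCRs")

def ideal_efficiency_bound(v_in, vcr, v_out_target=1.0):
    """Upper bound on efficiency when V_OUT is regulated below VCR * V_IN."""
    return v_out_target / (float(vcr) * v_in)

if __name__ == "__main__":
    for v_in in (0.25, 0.35, 0.45, 0.55, 0.7, 0.85, 1.0):
        vcr = select_vcr(v_in)
        bound = ideal_efficiency_bound(v_in, vcr)
        print(f"V_IN = {v_in:.2f} V -> VCR = {vcr}, ideal bound = {bound:.1%}")
```

A practical controller would add headroom for the load-dependent drop across  $R_{OUT}$  before choosing a ratio, so the real switching thresholds would sit above the ideal ones used here.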

Figure 25 plots the measured power conversion efficiency (PCE) using a variable resistive load  $R_{\rm L}$  over the targeted input voltage range and a fixed  $V_{\rm OUT}$  at 1 V. The resistive loading changes from 85  $\Omega$  to 800  $\Omega$ , except for VCR = 1:5 with  $R_{\rm L}$  limited to 200  $\Omega$  due to the higher  $R_{\rm SSL}$ . The measured peak PCE ( $\eta_{\rm peak}$ ) is ~80% with  $R_{\rm L}$  between 85  $\Omega$  and 100  $\Omega$  at VCR = 2:3. From Fig. 22c, the 2:3 ratio with the ASP-based topology shows a lower parasitic loss than that with the ratio 4:5, which is in turn lower than the 1:2 ratio case. Consequently, the measured PCE for 1:2, 4:5, and 2:3 increases progressively, as displayed in Fig. 25. Regarding the ratio 3:5, even though the parasitic loss is slightly less than that of the ratio 2:3, its property of relatively higher  $R_{\rm SSL}$  loss (Fig. 22a) eventually affects the achievable PCE, corresponding to the measured results exhibited in Fig. 25.



Fig. 24 Summary of the power stage operating modes under all the seven VCRs


Fig. 25 Measured PCE over the targeted  $V_{IN}$  range under different resistive loads when generating a  $V_{OUT}$  of 1 V

Figure 26 exhibits the measured PCE over an output power range for each implemented VCR under a fixed  $V_{OUT} = 1$  V. The output power ranges from 1.2 mW to 20.4 mW, with a maximum power delivered at VCR = 1:2. Moreover, due to the specific  $V_{IN}$  selection, the results in the plot do not include the overall peak PCE point for the converter.

Table 3 summarizes the measured performance and presents a comparison of the ASP-based converter with other state-of-the-art designs. In the table, the design presented in [38] based on SP topology adopts high-density MIM capacitors to implement the  $C_{\rm fly}$ , hence featuring low bottom plate parasitic capacitance. As the ASP-based converter exhibits lower  $R_{\rm SSL}$  and parasitic losses, it demonstrates a peak efficiency ( $\eta_{\rm peak}$ ) comparable to that of [38], but with an estimated 1300 times power density improvement by employing higher density capacitance (MIM+MOS) as the on-chip  $C_{\rm fly}$ . In contrast with the customized design using the fully depleted silicon-on-insulator (FD-SOI) process in [59], which realizes a higher  $\eta_{\rm peak}$ , the ASP-based converter design attains more than 4.6× power density enhancement in bulk CMOS with finer-grained VCRs. The discussed ASP-based converter also demonstrates higher peak efficiency and higher power density than



Fig. 26 Measured PCE over different output power range under fixed  $V_{OUT}$  of 1 V (DC) for all the seven VCRs

|                                                       | ASP design        | JSSC'16 [38]        | JSSC'15 [59]     | JSSC'15 [60]     | JSSC'17 [39]         | JSSC'18 [50]          | ISSCC'16 [47]             |
|-------------------------------------------------------|-------------------|---------------------|------------------|------------------|----------------------|-----------------------|---------------------------|
| Technology                                            | 65 nm CMOS        | 180 nm CMOS         | 28 nm FD-SOI     | 180 nm CMOS      | 180 nm CMOS          | 65 nm CMOS            | 0.35 μm<br>HVCMOS         |
| Conversion type                                       | Boost             | Boost               | Boost            | Boost            | Boost                | Buck-Boost            | Buck-boost                |
| Topology type                                         | ASP-based         | SP-based            | Customized       | Customized       | Moving-sum           | AVFI                  | Binary recursive          |
| VCR type                                              | Rational          | Rational            | Rational         | Integer          | Integer              | Rational              | Rational                  |
| Number of VCR                                         | 7                 | 14                  | 3                | 2                | <sup>a</sup> 22      | 13 (boost)            | 9 (boost)                 |
| VCR range                                             | $1.25 \sim 5$     | 1.33 ~ 8            | $1.5 \sim 2.5$   | 4~6              | $10 \sim 31$         | 1.1 ~ 7 (boost)       | 1.14 ~ 4 (boost)          |
| Integrated C <sub>fly</sub>                           | MOS + MIM         | HD-MIM              | MOS + MOM        | MOS + MIM        | N/R                  | MOS + MIM             | MIM                       |
| V <sub>IN</sub> Range [V]                             | 0.25~1            | 0.45 ~ 3            | 1                | 1                | $0.25\sim0.65$       | 0.26 ~ 1.3<br>(boost) | $2 \sim 6$ (boost)        |
| V <sub>OUT</sub> [V]                                  | 1                 | 3.3                 | $1.2 \sim 2.4$   | 3~6              | 4                    | 1.2                   | 5                         |
| IOUT_MAX [mA]                                         | 20.1              | 0.015               | 1                | 0.24             | <sup>a</sup> 0.001   | 21.7 (boost)          | 1.4 (boost)               |
| η <sub>peak</sub> [%]                                 | <sup>a</sup> 80   | 81                  | <sup>a</sup> 88  | 58               | 60                   | 83.2 (boost)          | 70.9 (boost)              |
| P-density @η <sub>peak</sub><br>[mW/mm <sup>2</sup> ] | <sup>a</sup> 22.7 | <sup>a</sup> 0.0174 | <sup>a</sup> 4.9 | <sup>a</sup> 2.4 | <sup>a</sup> ~0.0001 | 10.8 (boost)          | <sup>a</sup> 0.15 (boost) |
| Fully integrated                                      | Yes               | Yes                 | Yes              | Yes              | Yes                  | Yes                   | Yes                       |

Table 3 State-of-the-art SC converter performance summary and comparison

<sup>a</sup>Estimated from the corresponding literature <sup>b</sup>Regulation control executed externally

[47, 60] in boost conversion modes. Compared with the boost mode reported in [50], the ASP-based design achieves a  $2.1 \times$  higher power density by reducing the power stage control redundancy. Figure 27 benchmarks the state-of-the-art fully integrated SC boost converters, including the discussed ASP-based design, in both bulk CMOS and special processes. We can observe that the ASP-based converter achieves a higher power density while providing a large number of VCRs when compared with the existing designs in bulk CMOS.
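The power density ratios quoted in this comparison can be cross-checked against the Table 3 entries. The short sketch below only divides the tabulated peak-efficiency power densities; the dictionary labels and the function name `density_gain` are illustrative and do not come from the original designs.

```python
# Peak-efficiency power densities taken from Table 3, in mW/mm^2.
ASP_DENSITY = 22.7
PRIOR_DENSITY = {
    "[38] SP-based, 180 nm CMOS": 0.0174,
    "[59] customized, 28 nm FD-SOI": 4.9,
    "[50] AVFI, 65 nm CMOS (boost)": 10.8,
}


def density_gain(reference_density, new_density=ASP_DENSITY):
    """Return how many times higher new_density is than reference_density."""
    return new_density / reference_density


for label, density in PRIOR_DENSITY.items():
    print(f"{label}: {density_gain(density):.1f}x")
```

The three printed ratios round to roughly 1300×, 4.6×, and 2.1×, in line with the figures stated in the text. Note that several of the tabulated densities are themselves estimates (footnote a in Table 3).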



Fig. 27 Performance benchmarking with state-of-the-art fully integrated SC boost converters

# 4 Conclusions

In this chapter, we introduced different energy harvesting interface designs using switched-capacitor (SC) power converters suitable for miniaturized IoT systems. For vibration (AC-type) sources, we discussed both the FCR and SPFCR interfaces, which can significantly increase the PEH energy extraction efficiency without external bulky high-Q inductors, leading to a compact system implementation. We can further reuse the capacitors in the SPFCR to achieve multi-VCR DC-DC conversion for wide input-power range adaptation in a component-efficient manner. For solar/thermal (DC-type) sources, we studied the ASP topology, which can attain an optimal conduction loss and a reduced parasitic loss in the power stage. Employing MIM+MOS flying capacitors can significantly improve the power density over the prior art, while featuring a peak efficiency of up to 80% without any external components. The proposed techniques can be especially useful for next-generation low-cost miniaturized IoT systems with an extreme level of integration.

## References

- 1. Bandyopadhyay, S., & Chandrakasan, A. (2012, September). Platform architecture for solar, thermal, and vibration energy combining with MPPT and single inductor. *IEEE Journal of Solid-State Circuits*, 47(9), 2199–2215.
- 2. Law, M. K., Jiang, Y., Mak, P. I., & Martins, R. P. (2022, April). Miniaturized energy harvesting systems using switched-capacitor DC-DC converters. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 69(6), 2629–2634.

- 3. Rincón-Mora, G. A. (2014, September). Miniaturized energy-harvesting piezoelectric chargers. In *Proceedings of the IEEE CICC*.
- 4. Wang, Y., Dai, M., Wu, H., et al. (2021, December). Moisture induced electricity for self-powered microrobots. *Nano Energy*, 90, Part A. https://doi.org/10.1016/j.nanoen.2021.106499
- 5. Greatbatch, W., & Holmes, C. F. (1991). History of implantable devices. *IEEE Engineering in Medicine and Biology Magazine*, 10(3), 38–41.
- 6. Malasri, K., & Wang, L. (2009). Securing wireless implantable devices for healthcare: Ideas and challenges. *IEEE Communications Magazine*, 47(7), 74–80.
- 7. Yellen, B., Forbes, Z., Halverson, D., et al. (2005, May). Targeted drug delivery to magnetic implants for therapeutic applications. *Journal of Magnetism and Magnetic Materials*, 293(1), 647–654.
- 8. Ozaki, T., et al. (2016, October). Fully-integrated high-conversion-ratio dual-output voltage boost converter with MPPT for low-voltage energy harvesting. *IEEE Journal of Solid-State Circuits*, 51(10), 2398–2407.
- 9. Devaraj, A., Megahed, M., Liu, Y., Ramachandran, A., & Anand, T. (2019, December). A switched capacitor multiple input single output energy harvester (solar + piezo) achieving 74.6% efficiency with simultaneous MPPT. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 66(12), 4876–4887.
- 10. Yoon, S., Carreon-Bautista, S., & Sánchez-Sinencio, E. (2018, December). An area efficient thermal energy harvester with reconfigurable capacitor charge pump for IoT applications. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 65(12), 1974–1978.
- 11. Kadirvel, K., et al. (2012, February). A 330nA energy-harvesting charger with battery management for solar and thermoelectric energy harvesting. In *IEEE International Solid-State Circuits Conference – (ISSCC) Digest of Technical Papers* (pp. 106–108).
- 12. Ylli, K., Hoffmann, D., Willmann, A., Becker, P., Folkmer, B., & Manoli, Y. (2015). Energy harvesting from human motion: Exploiting swing and shock excitations. *Smart Materials and Structures*, *24*(2), 025029.
- 13. Mazzilli, F., Thoppay, P. E., Praplan, V., & Dehollain, C. (2012, May). Ultrasound energy harvesting system for deep implanted medical-devices (IMDs). In *Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS)* (pp. 2865–2868).
- 14. Roundy, S., & Wright, P. K. (2004, August). A piezoelectric vibration based generator for wireless electronics. *Smart Materials and Structures*, 13(5), 1131–1142.
- 15. Chen, Z., Law, M. K., Mak, P. I., & Martins, R. P. (2017, February). A single-chip solar energy harvesting IC using integrated photodiodes with a 67% charge pump maximum efficiency. *IEEE Transactions on Biomedical Circuits and Systems*, 11(1), 44–53.
- 16. Cheng, H. C., Chen, P. H., Su, Y. T., & Chen, P. H. (2021, October). A reconfigurable capacitive power converter with capacitance redistribution for indoor light-powered batteryless internet-of-things devices. *IEEE Journal of Solid-State Circuits*, 56(10), 2934–2942.
- 17. Noh, Y. S., Seo, J. I., Kin, H. S., & Lee, S. G. (2022, September). A reconfigurable DC-DC converter for maximum thermoelectric energy harvesting in a battery-powered duty-cycling wireless sensor node. *IEEE Journal of Solid-State Circuits*, 57(9), 2719–2730.
- 18. Kuai, Q., Leung, H. Y., Wan, Q., & Mok, P. K. T. (2022, June). A high-efficiency dual-polarity thermoelectric energy-harvesting Interface circuit with cold startup and fast-searching ZCD. *IEEE Journal of Solid-State Circuits*, 57(6), 1899–1912.
- 19. Erturk, A., & Inman, D. J. (2011). Piezoelectric energy harvesting. Wiley.
- 20. Guyomar, D., Badel, A., Lefeuvre, E., & Richard, C. (2005, April). Toward energy harvesting using active materials and conversion improvement by nonlinear processing. *IEEE Transactions on Ultrasonics and Ferroelectrics*, 52(4), 584–595.
- 21. Chen, Z., Law, M. K., Mak, P. I., Ki, W. H., & Martins, R. P. (2017, February). A 1.7mm<sup>2</sup> inductor-less fully-integrated flipping-capacitor rectifier (FCR) for piezoelectric energy harvesting with 483% power extraction enhancement. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers* (pp. 372–373).

- 22. Chen, Z., Law, M. K., Mak, P. I., Ki, W. H., & Martins, R. P. (2017, December). Fully-integrated inductor-less flipping-capacitor rectifier (FCR) for piezoelectric energy harvesting. *IEEE Journal of Solid-State Circuits*, 52(12), 3168–3180.
- 23. Yuk, Y. S., Jung, S., Gwong, H.-D., Choi, S., Sung, S. D., Kong, T.-H., Hong, S.-W., Choi, J.-H., Jeong, M.-Y., Im, J.-P., Ryu, S.-T., & Cho, G.-H. (2014, February). An energy pile-up resonance circuit extracting maximum 422% energy from piezoelectric material in a dual-source energy-harvesting interface. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers* (pp. 402–403).
- 24. Kwon, D., & Rincón-Mora, G. A. (2014, October). A single-inductor 0.35µm CMOS energy-investing piezoelectric harvester. *IEEE Journal of Solid-State Circuits*, 49(10), 2277–2291.
- 25. Ramadass, Y. K., & Chandrakasan, A. P. (2010, January). An efficient piezoelectric energy harvesting interface circuit using a bias-flip rectifier and shared inductor. *IEEE Journal of Solid-State Circuits*, 45(1), 189–204.
- 26. Wu, L., Do, X. D., Lee, S. G., & Ha, D. S. (2017, March). A self-powered and optimal SSHI circuit integrated with an active rectifier for piezoelectric energy harvesting. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 64(3), 537–549.
- 27. Du, S., Jia, Y., Do, C. D., & Seshia, A. A. (2016, November). An efficient SSHI Interface with increased input range for piezoelectric energy harvesting under variable conditions. *IEEE Journal of Solid-State Circuits*, 51(11), 2729–2742.
- 28. Chen, Z., Jiang, Y., Law, M. K., Mak, P. I., Zeng, X., & Martins, R. P. (2019, February). A piezoelectric energy-harvesting interface using split-phase flipping-capacitor rectifier (SPFCR) and capacitor reuse multiple-VCR SC DC-DC achieving 9.3x energy-extraction improvement. In *IEEE International Solid-State Circuits Conference – (ISSCC) Digest of Technical Papers* (pp. 424–425).
- 29. Chen, Z., Law, M. K., Mak, P. I., Zeng, X., & Martins, R. P. (2020, August). Piezoelectric energy harvesting interface using split-phase flipping-capacitor rectifier with capacitor-reuse for input power adaptation. *IEEE Journal of Solid-State Circuits*, 55(8), 2106–2117.
- 30. Du, S., & Seshia, A. A. (2017, October). An inductorless bias-flip rectifier for piezoelectric energy harvesting. *IEEE Journal of Solid-State Circuits*, 52(10), 2746–2757.
- 31. Du, S., Jia, Y., Zhao, C., Amaratunga, G. A. J., & Seshia, A. A. (2019, June). A fully integrated Split-electrode SSHC rectifier for piezoelectric energy harvesting. *IEEE Journal of Solid-State Circuits*, 54(6), 1733–1743.
- 32. Villar-Piqué, G., Bergveld, H. J., & Alarcon, E. (2013, September). Survey and benchmark of fully integrated switching power converters: Switched-capacitor versus inductive approach. *IEEE Transactions on Power Electronics*, 28(9), 4156–4167.
- 33. Sanders, S. R., et al. (2013, September). The road to fully integrated DC–DC conversion via the switched-capacitor approach. *IEEE Transactions on Power Electronics*, 28(9), 4146–4155.
- 34. Ki, W. H., Lu, Y., Su, F., & Tsui, C. Y. (2011, October). Design and analysis of on-chip charge pumps for micro-power energy harvesting applications. In *Proceedings of the 2011 IEEE/IFIP International Conference on VLSI System-on-Chip* (pp. 374–379).
- 35. Mondal, S., & Paily, R. (2016, March). An efficient on-chip switched-capacitor-based power converter for a microscale energy transducer. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 63(3), 254–258.
- 36. Li, J., Seo, J. S., Kymissis, I., & Seok, M. (2017, October). Triple-mode, hybrid-storage, energy harvesting power management unit: Achieving high efficiency against harvesting and load power variabilities. *IEEE Journal of Solid-State Circuits*, 52(10), 2550–2562.
- 37. Ballo, A., Grasso, A. D., & Palumbo, G. (2020, December). Charge pump improvement for energy harvesting applications by node pre-charging. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 67(12), 3312–3316.
- 38. Liu, X., Huang, L., Ravichandran, K., & Sánchez-Sinencio, E. (2016, May). A highly efficient reconfigurable charge pump energy harvester with wide harvesting range and two-dimensional MPPT for internet of things. *IEEE Journal of Solid-State Circuits*, 51(5), 1302–1312.

- 39. Wu, X., Shi, Y., Jeloka, S., Yang, K., Lee, I., Lee, Y., Sylvester, D., & Blaauw, D. (2017, April). A 20-pW discontinuous switched-capacitor energy harvester for smart sensor applications. *IEEE Journal of Solid-State Circuits*, 52(4), 972–984.
- 40. Gi, H., Park, J., Yoon, Y., Jung, S., Kim, S. J., & Lee, Y. (2020, October). A soft-charging-based SC DC–DC boost converter with conversion-ratio-insensitive high efficiency for energy harvesting in miniature sensor systems. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 67(10), 3601–3612.
- 41. Yoon, Y., et al. (2022, April). A continuously-scalable-conversion-ratio step-up/down SC energy-harvesting Interface with MPPT enabled by real-time power monitoring with frequency-mapped capacitor DAC. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 69(4), 1820–1831.
- 42. Kim, H., et al. (2021, September). A dual-mode continuously scalable-conversion-ratio SC energy harvesting Interface with SC-based PFM MPPT and flying capacitor sharing scheme. *IEEE Journal of Solid-State Circuits*, 56(9), 2724–2735.
- 43. Beck, Y., & Singer, S. (2011, January). Capacitive transposed series-parallel topology with fine tuning capabilities. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 58(1), 51–61.
- 44. Salem, L., & Mercier, P. (2014, December). A recursive switched capacitor DC-DC converter achieving ratios with high efficiency over a wide output voltage range. *IEEE Journal of Solid-State Circuits*, 49(12), 2773–2787.
- 45. Jung, W., et al. (2014, December). An ultra-low power fully integrated energy harvester based on self-oscillating switched-capacitor voltage doubler. *IEEE Journal of Solid-State Circuits*, 49(12), 2800–2811.
- 46. Bang, S., Blaauw, D., & Sylvester, D. (2016, February). A successive-approximation switched-capacitor DC–DC converter with resolution of V<sub>IN</sub>/2<sup>N</sup> for a wide range of input and output voltages. *IEEE Journal of Solid-State Circuits*, 51(2), 543–556.
- 47. Lutz, D., Renz, P., & Wicht, B. (2016, February). A 10mW fully integrated 2-to-13V-input buck-boost SC converter with 81.5% peak efficiency. In *IEEE International Solid-State Circuits Conference – (ISSCC) Digest of Technical Papers* (pp. 224–225).
- 48. Jung, W., Sylvester, D., & Blaauw, D. (2016, February). A rational-conversion-ratio switched-capacitor dc-dc converter using negative-output feedback. In *IEEE International Solid-State Circuits Conference – (ISSCC) Digest of Technical Papers* (pp. 218–219).
- 49. Jiang, Y., Law, M., Mak, P., & Martins, R. P. (2018, February). A 0.22-to-2.4V-input fine-grained fully integrated rational buck-boost SC DC-DC converter using algorithmic voltage-feed-in (AVFI) topology achieving 84.1% peak efficiency at 13.2mW/mm<sup>2</sup>. In *IEEE International Solid-State Circuits Conference – (ISSCC) Digest of Technical Papers* (pp. 422–424).
- 50. Jiang, Y., Law, M. K., Mak, P. I., & Martins, R. P. (2018, December). Algorithmic voltage-feed-in topology for fully integrated fine-grained rational buck-boost switched-capacitor DC–DC converters. *IEEE Journal of Solid-State Circuits*, 53(12), 3455–3469.
- 51. Butzen, N., & Steyaert, M. (2019, April). Design of single-topology continuously scalable-conversion-ratio switched-capacitor DC–DC converters. *IEEE Journal of Solid-State Circuits*, 54(4), 1039–1047.
- 52. Jiang, Y., Law, M. K., Chen, Z., Mak, P. I., & Martins, R. P. (2019, November). Algebraic series-parallel-based switched-capacitor DC–DC boost converter with wide input voltage range and enhanced power density. *IEEE Journal of Solid-State Circuits*, 54(11), 3118–3134.
- 53. Jiang, Y., Law, M. K., Mak, P. I., & Martins, R. P. (2021, November). An arithmetic progression switched-capacitor DC-DC converter with soft VCR transitions achieving 93.7% peak efficiency and 400 mA output current. In *Proceedings of the IEEE Asian Solid-State Circuits Conference.*
- 54. Jiang, Y., Law, M.-K., Mak, P.-I., & Martins, R. P. (2022, October). Arithmetic progression switched-capacitor DC-DC converter topology with soft VCR transitions and quasi-symmetric two-phase charge delivery. *IEEE Journal of Solid-State Circuits*, 57(10), 2919–2933.

- 55. Seeman, M. D. (2009, May). *A design methodology for switched-capacitor DC-DC converters*. Univ. California, Berkeley, Tech. Rep. UCB/EECS-2009-78.
- 56. Le, H. P., Sanders, S. R., & Alon, E. (2011, September). Design techniques for fully integrated switched-capacitor DC-DC converters. *IEEE Journal of Solid-State Circuits*, 46(9), 2120–2131.
- 57. Tanzawa, T. (2010, October). On two-phase switched-capacitor multipliers with minimum circuit area. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 57(10), 2602–2608.
- 58. Palumbo, G., & Pappalardo, D. (2010, March). Charge pump circuits: An overview on design strategies and topologies. *IEEE Circuits and Systems Magazine*, 10(1), 31–45.
- 59. Biswas, A., Sinangil, Y., & Chandrakasan, A. P. (2015, July). A 28 nm FDSOI integrated reconfigurable switched-capacitor based step-up dc-dc converter with 88% peak efficiency. *IEEE Journal of Solid-State Circuits*, 50(7), 1540–1549.
- 60. Tsai, J.-H., et al. (2015, November). A 1 V input, 3 V-to-6 V output, 58%-efficient integrated charge pump with a hybrid topology for area reduction and an improved efficiency by using parasitics. *IEEE Journal of Solid-State Circuits*, 50(11), 2533–2548.

# Fully Integrated Switched-Capacitor Power Converters




Junmin Jiang, Yan Lu, Wing-Hung Ki, and Rui P. Martins

# 1 Introduction

In recent years, monolithic and highly integrated DC-DC power converters have been in great demand for various low-power devices, such as implantable, wearable, and portable devices [1]. Integrating a DC-DC power converter fully on-chip is always favorable, as it potentially results in a simpler system design and a smaller PCB footprint, and it also lowers the cost by eliminating or integrating the most costly power converter component: the power inductor.

J. Jiang

Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China

State-Key Laboratory of Analog and Mixed-Signal VLSI/IME and FST-ECE, University of Macau, Macao SAR, China e-mail: jiangjm@sustech.edu.cn

Y. Lu (🖂)

State-Key Laboratory of Analog and Mixed-Signal VLSI/IME and FST-ECE, University of Macau, Macao SAR, China e-mail: yanlu@um.edu.mo

W.-H. Ki

Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China e-mail: eeki@ust.hk

R. P. Martins

State-Key Laboratory of Analog and Mixed-Signal VLSI/IME and FST-ECE, University of Macau, Macao SAR, China

On leave from Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal e-mail: rmartins@um.edu.mo

<sup>©</sup> The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Paulo da Silva Martins, P.-I. Mak (eds.), *Analog and Mixed-Signal Circuits in Nanoscale CMOS*, Analog Circuits and Signal Processing, https://doi.org/10.1007/978-3-031-22231-3\_7



Fig. 1 A typical SoC power delivery network

In typical system-on-chip (SoC) designs (Fig. 1), there are many different logic and functional blocks that need multiple individual voltage domains, enabled by multiple power converters and voltage regulators [2]. Power converters with zero external components can significantly reduce the number of I/O pins of the SoC. They can also deliver much better transient performance, because they can be placed closer to the point of load.

Among linear voltage regulators, switching-mode power converters, and switched-capacitor (SC) power converters, SC converters are well suited for full integration because they use only capacitors, which are easily built on-chip in a nanometer process [3]. Although the efficiency of an SC converter drops linearly when the output voltage deviates from its ideal output voltage, we can still obtain good efficiencies with multiple voltage conversion ratios (VCRs). Therefore, SC converters have attracted great interest from both industry and academia and are a promising alternative for next-generation SoC power delivery. Several practical products have emerged from the application of techniques presented in prior research works.

However, designing a high-performance on-chip SC power converter can be very challenging [3, 4]. First, the power efficiency of an SC converter with only a few VCRs is not high over wide input and output voltage ranges. Second, an SC converter has a finite output impedance, and its maximum power density is a function of the on-chip capacitance density and the switching frequency; thus, an increase in power density will always sacrifice power efficiency. Hence, in a standard CMOS process, in which the capacitance density is relatively low, there is a fundamental trade-off between power density and efficiency, and optimizing this trade-off can be challenging. Third, the output voltage ripple due to hard-charging currents affects the performance of noise-sensitive devices, and lowering the output voltage ripple requires a higher switching frequency and a larger capacitance. Therefore, minimizing the voltage ripple using minimum system resources and cost is also a demanding problem to solve.

To tackle the abovementioned design challenges, many circuit- and system-level techniques have been proposed. Researchers and circuit designers try to optimize the SC design with better trade-offs among power density, conversion efficiency, system cost, and design complexity. In this chapter, we will provide a systematic summary and design guidelines of recent SC converter design techniques. We will also review the advantages and drawbacks of these design techniques, in the aspects of topology generation, loss analysis and optimization, voltage ripple reduction, and closed-loop regulation.

The remainder of this chapter is organized as follows: Section 2 discusses the topology generation and selection, as well as the topology-level efficiency considerations. Section 3 analyzes the power conversion losses of SC converters and introduces techniques that reduce gate-drive switching loss and parasitic loss. Section 4 compares the centralized and distributive clock generation methods for multiphase SC converters. Then, we will describe two design examples: an SC converter-ring and a multi-output SC converter in Sects. 4 and 5, respectively. Finally, Sect. 6 draws the conclusions.

# 2 Topology Generation

## 2.1 Efficiency and Power Density Trade-Off

Topology generation or selection is the first step of consideration in most of the designs. With the input and output voltage ranges specified, we can determine the required VCRs first. For an SC converter, the theoretical efficiency is

$$\eta = \frac{V_{\rm OUT}}{M \times V_{\rm IN}},\tag{1}$$

where M is the ideal VCR of the selected topology. With only one VCR, the power conversion efficiency decreases monotonically when the output voltage drops from the ideally converted voltage ( $M \times V_{IN}$ ). For applications that require a wide input or output voltage range, it is important to reconfigure the power conversion cells for several VCRs to cater for a changing input voltage. The SC converter can then operate at a proper VCR that delivers the maximum efficiency.

Figure 2 shows the theoretical efficiency of an SC converter and a low-dropout regulator (LDO) with respect to the output voltage  $V_O$  [5]. For example, if  $V_O$  needs to be 1/2  $V_{IN}$ , the efficiency is  $\eta = 50\%$  when using an LDO. With the SC converter configured as M = 2/3, then we get  $\eta = 0.5/0.667 = 75\%$ ; on the other hand, with the SC converter reconfigured as M = 1/2, then the ideal efficiency can be 100%. Obviously, with more VCRs, the converter will have a higher averaged efficiency across the whole  $V_O$  and  $V_{IN}$  ranges. However, more VCRs need more flying capacitors and power switches; thus, combining multiple topologies in one power stage increases circuit complexity and the equivalent output resistance, reducing the output power capability and power density. Clearly, there is a trade-off between power efficiency and power density, and then, the target is to obtain an optimum high efficiency range with a reasonable design complexity.
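The numerical example above follows directly from Eq. (1). The following minimal sketch reproduces it and selects the most efficient ratio from a given VCR set; the function names and the use of exact fractions are illustrative assumptions, and all losses other than the linear drop of Eq. (1) are ignored.

```python
from fractions import Fraction


def ideal_efficiency(v_out, v_in, m):
    """Eq. (1): eta = V_OUT / (M * V_IN), valid only when V_OUT <= M * V_IN."""
    ideal_out = m * v_in
    if v_out > ideal_out:
        raise ValueError("an SC converter cannot regulate above M * V_IN")
    return v_out / ideal_out


def best_vcr(v_out, v_in, vcrs):
    """Pick the smallest usable VCR, which gives the highest ideal efficiency."""
    usable = [m for m in vcrs if m * v_in >= v_out]
    if not usable:
        raise ValueError("no VCR can reach the requested output voltage")
    return min(usable)


def ldo_efficiency(v_out, v_in):
    """Ideal LDO efficiency with the quiescent current ignored."""
    return v_out / v_in


# Example from the text: V_O = 1/2 V_IN (values normalized to V_IN = 1).
v_in, v_out = Fraction(1), Fraction(1, 2)
print(float(ldo_efficiency(v_out, v_in)))                    # 0.5
print(float(ideal_efficiency(v_out, v_in, Fraction(2, 3))))  # 0.75
print(float(ideal_efficiency(v_out, v_in, Fraction(1, 2))))  # 1.0
```

Choosing the smallest VCR that still satisfies  $M \times V_{IN} \geq V_O$  is what allows a multi-VCR converter to stay close to the ideal efficiency over a wide voltage range.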



Fig. 2 Theoretical efficiency comparison of an SC converter with four VCRs and six VCRs versus an ideal low-dropout regulator [5]

## 2.2 Two-Phase Limitation and Three-Phase Operation

Most switched-capacitor converters use only two operational clock phases, with the number of VCRs limited by the number of flying capacitors [6]. For example, with two flying capacitors, the realizable step-down VCRs are  $1\times$ ,  $2/3\times$ ,  $1/2\times$ , and  $1/3\times$  only. If more VCRs such as  $3/4\times$  and  $1/4\times$  are necessary, the converter requires one more flying capacitor.

An alternative method to realize more VCRs while keeping the number of flying capacitors unchanged is to use a multiple clock phase operation [7–10]. A three-phase operation [7] and a two- or three-phase operation [8] used in step-up SC converters boost the output voltage to  $6\times/7\times$  of the input voltage for LED/LCD driver applications. Similarly, when applied to a step-down SC converter in [9], it generates a very low output voltage ( $1/4\times$ ) for wireless biomedical implants. Experimental results show an efficiency of 70% for  $V_{\rm O} = 0.5$  V. Figure 3 shows the three-phase topologies ( $3/4\times$  and  $1/4\times$ ) that use only two flying capacitors and achieve up to 20% efficiency improvement together with a higher average efficiency over wide  $V_O$  and  $V_{\rm IN}$  ranges [10].

In summary, multiple-phase operation uses fewer capacitors and switches for a given set of VCRs; furthermore, it realizes a better trade-off between power efficiency and density, covering wider input and output voltage ranges.
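To illustrate why the two additional ratios help, the sketch below compares the ideal efficiency of the four two-phase VCRs with that of the extended set including  $3/4\times$  and  $1/4\times$ . It is a lossless estimate based on Eq. (1) only, so it is not expected to reproduce the measured improvement reported in [10]; the sweep range and the function names are assumptions chosen for illustration.

```python
from fractions import Fraction as F

# Step-down VCRs available with two flying capacitors.
TWO_PHASE = [F(1), F(2, 3), F(1, 2), F(1, 3)]
# Same capacitors, with the three-phase ratios added.
WITH_THREE_PHASE = TWO_PHASE + [F(3, 4), F(1, 4)]


def peak_ideal_efficiency(ratio, vcrs):
    """Best lossless efficiency for a target V_O / V_IN ratio, from Eq. (1)."""
    usable = [m for m in vcrs if m >= ratio]
    return ratio / min(usable)


def average_efficiency(vcrs, lo=F(1, 5), hi=F(1), steps=80):
    """Average of the best ideal efficiency over an evenly spaced ratio sweep."""
    points = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return sum(peak_ideal_efficiency(r, vcrs) for r in points) / len(points)


for name, vcrs in (("two-phase", TWO_PHASE), ("with three-phase", WITH_THREE_PHASE)):
    print(f"{name}: {float(average_efficiency(vcrs)):.3f}")
```

At  $V_O = V_{IN}/4$ , for instance, the two-phase set must fall back to  $1/3\times$  and is limited to an ideal efficiency of 75%, whereas the  $1/4\times$  mode ideally reaches 100%.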



Fig. 3 Operation states of two topologies that use three-phase configuration realizing (a)  $1/4 \times$  mode and (b)  $3/4 \times$  mode [10]

## 2.3 Review of Other Topologies

To cover a wider voltage range with high efficiency, some reconfigurable SC converters provide a large number of VCRs, for example, the successive-approximation (SAR)-based SC converter that has 117 VCRs [11]. By reconfiguring cascaded power cells that have M = 1/2, each power cell can be the top or the bottom voltage domain for the next stage, such that the output voltage has seven-bit resolution [12]. Recursive SC converters further improve this topology [13, 14]. In [15], a gear-train topology used five off-chip capacitors to construct four stacked power stages that realized 24 VCRs. Similar works can be found in [16], and algebraic series-parallel topologies have been proposed to generate more VCRs and cover wide voltage ranges [16, 17]. However, these converters share a common drawback: the output impedance is high because too many power switches are stacked in series, which limits the load current capability and power density. Nevertheless, they remain useful in low-power applications with stringent requirements on system integration.

As mentioned above, an SoC requires multiple voltage domains for individual functional blocks. Single-input multiple-output SC converters, in which capacitors and transistors can be shared to save silicon area and improve the overall power efficiency, serve this purpose well. Reference [18] proposed a dynamic power cell allocation scheme for multicore application processors. Allocating power cells dynamically according to load demands improved the efficiency by 4.8% compared with the case without it. The peak efficiency was 83.3% and the maximum load was 100 mA, while the cross regulation was minimized. Reference [19] presented a specific application that requires two outputs with different loads and used an on-demand strategy to compensate for the current shortage, thus saving on-chip capacitor area. In [20], the  $2\times$  and  $3\times$  VCRs shared one transistor, which reduced the silicon area and improved the efficiency.

# 3 Efficiency Optimization

## 3.1 Unified Models for Losses in the SC Converter

When designing fully integrated switched-capacitor (SC) converters, optimizing efficiency is one of the most important procedures to ensure the maximum power density under peak efficiency. However, the loss contribution of SC converters may arise from multiple factors and may vary with different topologies, leading to complexity in analysis and optimization. In this subsection, we present a methodology to predict the overall efficiency and find the optimized peak efficiency.

Switched-capacitor converters can ideally reach 100% efficiency under a close-to-no-load condition. The power efficiency starts to drop once charge transfer on the flying capacitors takes place, due to the well-known charge redistribution loss. In general, the output voltage drops proportionally with the loading current, which is equivalent to a resistor ( $R_{OUT}$ ) at the output node. Equation (1) above then expresses the theoretical efficiency for a certain VCR: the closer the real output voltage ( $V_{OUT}$ ) is to the ideal output voltage ( $MV_{IN}$ ), the higher the efficiency. The charge redistribution loss, also known as hard-charging loss, is the integrated conduction energy loss dissipated in the resistance of the switches. M. Seeman proposed a unified model to calculate  $R_{OUT}$  (Fig. 4) [21, 22].

We can model an SC converter as an ideal DC voltage source with an ideal transformer representing the voltage conversion, in series with a finite output resistance  $R_{\text{OUT}}$  [21, 22]. This resistance is composed of  $R_{\text{SSL}}$  (slow switching limit resistance related to the charge redistribution loss) and  $R_{\text{FSL}}$  (fast switching limit resistance due to the finite conductance of switches), expressed as

$$R_{\rm SSL} = K_C \frac{1}{C_F f_{\rm SW}} \tag{2}$$

$$R_{\rm FSL} = K_S R_{\rm ON} \tag{3}$$

where  $K_C$  and  $K_S$  are topological factors determined by the charging scenario,  $C_F$  is the capacitance of the flying capacitor,  $f_{SW}$  is the switching frequency, and  $R_{ON}$  is the on-resistance of the switches.

The overall output resistance becomes

$$R_O \approx \sqrt{R_{\rm SSL}^2 + R_{\rm FSL}^2}. \tag{4}$$

Fig. 4 Transformer-based SC converter model

**Table 1** Summary of $K_C$, $R_{\rm SSL}$, $K_S$, and $R_{\rm FSL}$ of three VCRs

| VCR  | $K_C$         | $R_{\rm SSL}$           | $K_S$          | $R_{\rm FSL}$            |
|------|---------------|-------------------------|----------------|--------------------------|
| 2×   | 1             | $\frac{1}{C_F f_{SW}}$  | 4              | $4R_{\rm ON}$            |
| 3/2× | $\frac{1}{2}$ | $\frac{1}{2C_F f_{SW}}$ | $\frac{7}{2}$  | $\frac{7}{2}R_{\rm ON}$  |
| 4/3× | $\frac{1}{3}$ | $\frac{1}{3C_F f_{SW}}$ | $\frac{20}{9}$ | $\frac{20}{9}R_{\rm ON}$ |

This model assumes that the output voltage is an ideal DC voltage with negligible ripple. Reference [23] pointed out that Eq. (4) may be inaccurate when the output ripple voltage is very large and presented an improved solution. Deviation from Eq. (4) may also occur if $R_{SSL}$ is close to $R_{FSL}$. Otherwise, this model is accurate enough for estimating $R_O$ and predicting the output voltage $V_O$; thus, it became a widely used practical model [24, 25].

Here, we present examples of calculating $R_O$ for three topologies that use a two-phase clock: the 2×, 3/2×, and 4/3× topologies. We design all $R_{ON}$ as equal, as they conduct the same amount of charge. Table 1 summarizes $K_C$ and $K_S$ for the three VCRs. The major loss is due to the equivalent IR drop across $R_{OUT}$. From Eqs. (2) and (3), reducing this loss requires a high $f_{SW}$ and a large transistor width $W_{SW}$ (to lower $R_{ON}$).
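As a minimal numerical sketch of Eqs. (2) to (4), the snippet below evaluates $R_{SSL}$, $R_{FSL}$, and $R_O$ for the three VCRs using the $K_C$ and $K_S$ values of Table 1. The operating point (`C_FLY`, `F_SW`, `R_ON`) and the function name `output_resistance` are illustrative assumptions, not values from the text.

```python
import math

# Topological factors (K_C, K_S) copied from Table 1.
TABLE_1 = {
    "2x":   (1.0,       4.0),
    "3/2x": (1.0 / 2.0, 7.0 / 2.0),
    "4/3x": (1.0 / 3.0, 20.0 / 9.0),
}


def output_resistance(vcr, c_fly, f_sw, r_on):
    """Return (R_SSL, R_FSL, R_O) in ohms for the given VCR."""
    k_c, k_s = TABLE_1[vcr]
    r_ssl = k_c / (c_fly * f_sw)    # Eq. (2): slow switching limit
    r_fsl = k_s * r_on              # Eq. (3): fast switching limit
    r_o = math.hypot(r_ssl, r_fsl)  # Eq. (4): sqrt(R_SSL^2 + R_FSL^2)
    return r_ssl, r_fsl, r_o


# Hypothetical operating point, chosen only for illustration.
C_FLY = 1e-9   # flying capacitance (F)
F_SW = 20e6    # switching frequency (Hz)
R_ON = 5.0     # switch on-resistance (ohm)

for vcr in TABLE_1:
    r_ssl, r_fsl, r_o = output_resistance(vcr, C_FLY, F_SW, R_ON)
    print(f"{vcr:>5}: R_SSL={r_ssl:6.2f}  R_FSL={r_fsl:6.2f}  R_O={r_o:6.2f} ohm")
```

Because Eq. (4) is an approximation, the result is least reliable when the two limits are comparable, as noted above.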

#### 3.2 Switching and Parasitic Losses

In addition to conduction losses, the gate-drive switching loss $P_{SW}$ and the parasitic loss $P_{PARA}$ are also significant, especially for fully integrated SC converters. They actually determine the peak efficiency of regulated SC converters, because the output resistance $R_O$ is adjusted to keep $V_O$ regulated under different loads; at a given $V_O$, the theoretical efficiency is identical. Le et al. [26] analyzed these two losses in addition to Seeman's model. We can calculate the gate-drive switching loss $P_{SW}$ from the switching frequency $f_{SW}$, the gate capacitance $C_{GATE}$, and the driving voltage $V_{SW}$. The parasitic loss, however, is more complex to calculate and may vary considerably among topologies [27, 28].

In 2020, Jiang et al. [29] presented a unified method to simplify the parasitic loss calculation by observing the voltage swing of individual parasitic capacitors. We use  $1/3 \times$  mode SC converters as examples to analyze the parasitic loss reduction. We assume that the additional charge introduced by the parasitic capacitors will not affect the flying capacitor voltages, as the parasitic capacitors are much smaller (usually below 5%) than the main capacitors. We also suppose a no-load condition such that the capacitor voltages would not change among the operational phases.

Fig. 5 Parasitic capacitors on the top and bottom plates of the $1/3 \times$ mode

Let us consider the $1/3 \times$ SC converter from Fig. 5, with the positive- and negative-plate parasitic capacitors $C_{1p+}$, $C_{1p-}$, $C_{2p+}$, and $C_{2p-}$, where we labeled their voltage swings in both phases. For the summation-mode converter, when $\Phi_1$ changes to $\Phi_2$, $C_{1p+}$ charges from $V_O$ to $V_{IN}$ with a charge $Q_{1P+}$. The energy sourced from $V_{IN}$ is

$$E_{1P+,CH} = V_{IN}Q_{1P+} = V_{IN}(V_{IN} - V_O)C_{1P+} = \frac{2}{3}V_{IN}^2C_{1P+} \tag{5}$$

When  $\Phi_2$  changes to  $\Phi_1$ ,  $C_{1p+}$  discharges from  $V_{IN}$  to  $V_O$ , and the energy returned to  $V_O$  becomes

$$E_{1P+,\text{DIS}} = V_O Q_{1P+} = V_O (V_{\text{IN}} - V_O) C_{1P+} = \frac{2}{9} V_{\text{IN}}^2 C_{1P+} \tag{6}$$

The energy loss due to  $C_{1P+}$  is the difference of Eqs. (5) and (6), and we can write it as

$$E_{1P+,\text{LOSS}} = E_{1P+,\text{CH}} - E_{1P+,\text{DIS}} = \frac{4}{9} V_{\text{IN}}^2 C_{1P+} \tag{7}$$

For $C_{1p-}$, it charges from 0 to $\frac{2}{3} V_{IN}$ in $\Phi_2$:

$$E_{1P-,CH} = (V_{IN} - V_O) \frac{2}{3} V_{IN} C_{1P-} = \frac{4}{9} V_{IN}^2 C_{1P-} \tag{8}$$

In  $\Phi_1$ , with all the charges dumped back to ground by  $C_{1p-}$ , the loss is

$$E_{1P-,\text{LOSS}} = E_{1P-,\text{CH}} = \frac{4}{9} V_{\text{IN}}^2 C_{1P-} \tag{9}$$

In general, considering the parasitic capacitor  $C_P$ , charged and discharged between two voltages  $V_L$  and  $V_H$ , in the charging phase, the energy sourced from the system is

$$E_{P,CH} = V_H (V_H - V_L) C_P \tag{10}$$

In the discharging phase, the energy returned to the system is

$$E_{P,\text{DIS}} = V_L (V_H - V_L) C_P. \tag{11}$$

Hence, the energy of the parasitic loss is the following:

$$E_{P,LOSS} = E_{P,CH} - E_{P,DIS} = (V_H - V_L)^2 C_P = \Delta V^2 C_P \tag{12}$$

The parasitic loss therefore depends only on the voltage swing $\Delta V = V_H - V_L$ across the parasitic capacitor $C_P$, not on the absolute levels of $V_L$ and $V_H$. Multiplying Eq. (12) by the switching frequency gives the parasitic loss of one parasitic capacitor $C_P$:

$$P_{\text{PARA},CP} = C_P (V_H - V_L)^2 f_{\text{SW}} = \Delta V^2 C_P f_{\text{SW}} \tag{13}$$

By using Eq. (13), we can calculate the parasitic loss  $P_{\text{PARA}}$  of all parasitic capacitors  $C_{ip+}$  and  $C_{ip-}$  (i = 1...N) by finding out the voltage swings of the positive and negative plates.
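The following sketch applies Eqs. (10) to (13) to the two plate swings of $C_1$ derived above for the $1/3\times$ mode ($V_O$ to $V_{IN}$ on the positive plate, 0 to $\frac{2}{3}V_{IN}$ on the negative plate). The numeric values of `V_IN`, `C_P`, and `F_SW` and the function names are assumptions made only for illustration.

```python
def parasitic_energy_loss(c_p, v_low, v_high):
    """Energy lost per cycle in one parasitic capacitor, Eqs. (10)-(12)."""
    e_charge = v_high * (v_high - v_low) * c_p  # Eq. (10): sourced while charging
    e_return = v_low * (v_high - v_low) * c_p   # Eq. (11): returned while discharging
    return e_charge - e_return                  # Eq. (12): equals (V_H - V_L)^2 * C_P


def parasitic_power(swings, f_sw):
    """Eq. (13) summed over (c_p, v_low, v_high) tuples."""
    return f_sw * sum(parasitic_energy_loss(c, vl, vh) for c, vl, vh in swings)


# Hypothetical values: V_IN = 3 V gives V_O = 1 V in the ideal 1/3x mode.
V_IN = 3.0
V_O = V_IN / 3.0
C_P = 1e-12   # 1 pF on each plate (assumed)
F_SW = 10e6   # 10 MHz (assumed)

swings_c1 = [
    (C_P, V_O, V_IN),              # C_1p+: swings between V_O and V_IN
    (C_P, 0.0, 2.0 * V_IN / 3.0),  # C_1p-: swings between 0 and 2/3 V_IN
]
print(parasitic_power(swings_c1, F_SW))  # power in watts
```

Both plates see the same swing of $\frac{2}{3}V_{IN}$, so each contributes $\frac{4}{9}V_{IN}^2 C_P$ per cycle, in agreement with Eqs. (7) and (9).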

#### 3.3 Gate Switching Loss and Parasitic Loss Reduction

The gate-drive switching loss can be reduced by using low-voltage (thin-oxide) transistors [25]. The method stacks several thin-oxide transistors in cascode so that together they withstand a higher voltage. Because the feature size of the thin-oxide transistor is smaller than that of the thick-oxide transistor, its gate capacitance is much lower.

Figure 6 shows the operating principle of the NMOS stacking transistors. The turn-on resistance  $R_{ON}$  of a MOS transistor is

$$R_{\rm ON} = \frac{L_{\rm MIN}}{KV_{\rm OD}W_{\rm SW}} \tag{14}$$

where K is a process-related parameter, $V_{OD}$ is the overdrive voltage of the transistor, and $L_{MIN}$ is the minimum channel length. We can implement a power switch using one thick-oxide high-voltage transistor or two stacked thin-oxide low-voltage transistors. If the two implementations have the same total $R_{ON}$, then the on-resistances of the individual low-voltage and high-voltage transistors satisfy

$$R_{\rm ON\_L} = \frac{1}{2} R_{\rm ON\_H} \tag{15}$$

Fig. 6 Operating principle of the NMOS (N-type metal-oxide semiconductor) stacking transistors

Considering Eqs. (14) and (15) together, the size ratio of the thick-oxide transistor to thin-oxide transistor is

$$\frac{W_{\text{SW}\_H}}{W_{\text{SW}\_L}} = \frac{L_H}{2L_L} \frac{K_L V_{\text{OD}\_L}}{K_H V_{\text{OD}\_H}}. \tag{16}$$

Now, the switching loss becomes

$$P_{\rm SW} = V_{\rm SW}^2 f_{\rm SW} C_{\rm GATE} W_{\rm SW}. \tag{17}$$

Then, the ratio of their switching losses is

$$\frac{P_{\mathrm{SW}\_H}}{P_{\mathrm{SW}\_L}} = \frac{V_{\mathrm{SW}\_H}^2 C_{\mathrm{GATE}\_H} W_{\mathrm{SW}\_H}}{2V_{\mathrm{SW}\_L}^2 C_{\mathrm{GATE}\_L} W_{\mathrm{SW}\_L}}. \tag{18}$$

In a typical 0.18 µm CMOS process, we have 1.8 V thin-oxide transistors and 5 V thick-oxide transistors. The minimum channel lengths, as listed in Table 2, are $L_H = 0.7$ µm for NMOS, $L_H = 0.5$ µm for PMOS (p-channel metal-oxide semiconductor), and $L_L = 0.18$ µm. The overdrive voltages are $V_{\text{OD}\_H} = 3$ V and $V_{\text{OD}\_L} = 1.2$ V. We extract the other parameters from the process design kit and list them in Table 2. The results show that by using low-voltage transistors, we can obtain a 2.615× and 1.778× switching loss reduction for the NMOS and PMOS switches, respectively. This helps the converter to achieve 82% peak efficiency in 0.18 µm CMOS. In [30], six thin-oxide transistors used in a cascode arrangement allow the SC converter in 65 nm CMOS to switch faster.

**Table 2** Switching loss calculations

| Type                              | NMOS, Low-V | NMOS, High-V | PMOS, Low-V | PMOS, High-V |
|-----------------------------------|-------------|--------------|-------------|--------------|
| $L_{\rm MIN}$ (µm)                | 0.18        | 0.7          | 0.18        | 0.5          |
| $K$ (1 m/ΩV)                      | 0.147       | 0.138        | 0.048       | 0.03         |
| $C_{\rm GATE}$ (fF/$L_{\rm MIN}$) | 0.52        | 0.82         | 0.62        | 0.62         |
| $W_H/W_L$                         | 0.829×      |              | 0.889×      |              |
| $P_{SW\_H}/P_{SW\_L}$             | 2.615×      |              | 1.778×      |              |
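The ratios in Table 2 can be checked with a short script that evaluates Eqs. (16) and (18) from the tabulated $L_{\rm MIN}$, $K$, and $C_{\rm GATE}$ values and the overdrive voltages given in the text. The driving-voltage ratio `V_SW_RATIO = 2` is an assumption: the text does not state $V_{SW\_H}$ and $V_{SW\_L}$, but this ratio reproduces the tabulated loss ratios. The dictionary layout and function names are likewise illustrative.

```python
# L_MIN, K, and C_GATE values copied from Table 2 (low-voltage / high-voltage).
DEVICES = {
    "NMOS": {"L_L": 0.18, "L_H": 0.7, "K_L": 0.147, "K_H": 0.138,
             "C_L": 0.52, "C_H": 0.82},
    "PMOS": {"L_L": 0.18, "L_H": 0.5, "K_L": 0.048, "K_H": 0.03,
             "C_L": 0.62, "C_H": 0.62},
}
V_OD_L = 1.2       # overdrive of the thin-oxide device (V), from the text
V_OD_H = 3.0       # overdrive of the thick-oxide device (V), from the text
V_SW_RATIO = 2.0   # ASSUMPTION: V_SW_H / V_SW_L, not stated in the text


def width_ratio(dev):
    """W_SW_H / W_SW_L from Eq. (16)."""
    return (dev["L_H"] / (2.0 * dev["L_L"])) * \
           (dev["K_L"] * V_OD_L) / (dev["K_H"] * V_OD_H)


def switching_loss_ratio(dev):
    """P_SW_H / P_SW_L from Eq. (18); the 2 accounts for two stacked devices."""
    return (V_SW_RATIO ** 2 * dev["C_H"] * width_ratio(dev)) / (2.0 * dev["C_L"])


for name, dev in DEVICES.items():
    print(f"{name}: W_H/W_L = {width_ratio(dev):.3f}, "
          f"P_SW_H/P_SW_L = {switching_loss_ratio(dev):.3f}")
```

With unrounded intermediate values, the NMOS loss ratio comes out slightly below the tabulated 2.615, which suggests the table was computed from the rounded width ratio 0.829.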

It is even more necessary to use cascoded devices in high-voltage applications, since the $Q_g R_{ON}$ product of the thin-oxide transistor is much smaller than that of high-voltage DMOS (double-diffused metal-oxide-semiconductor) transistors [31, 32]. In [32], two 3.3 V transistors cascoded in an 11/1× topology convert a high voltage (35–40 V) to 3.3 V with 94.7% peak efficiency. In [8], 3.3 V and 5 V transistors cascoded in a 6× step-up SC converter with a 15 V output voltage exhibit reduced gate switching loss.

Parasitic loss is also proportional to the switching frequency. It becomes significant in a fully integrated SC converter, especially when the flying capacitors are implemented with MOS capacitors. Multiple works [33–39] reported reduced parasitic losses by using low-parasitic ferroelectric capacitors [33], deep trench capacitors [34, 35], parasitic loss recycling techniques [36, 37], and dynamic voltage biasing techniques [38, 39]. All these methods reduce the loss caused by parasitic capacitance and thereby increase the efficiency.

#### 3.4 Efficiency Optimization

We can obtain the overall efficiency as

$$\eta(f_{\rm SW}, W_{\rm SW}) = \frac{P_O}{P_O + P_{\rm LOSS}} \tag{19}$$

$$P_{\rm LOSS} = P_C + P_R + P_{SW} + P_{\rm PARA} \tag{20}$$

The gate switching loss $P_{SW}$ and the parasitic loss $P_{PARA}$ are proportional to the switching frequency $f_{SW}$ and the transistor size $W_{SW}$, while the charge redistribution loss $P_C$ and the conduction loss $P_R$ are inversely proportional to $f_{SW}$ and $W_{SW}$. These opposing trends produce an efficiency maximum, and we can find the optimum efficiency point by sweeping $f_{SW}$ and $W_{SW}$.
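A sweep of this kind can be sketched as a brute-force grid search. The model below is a simplified stand-in rather than the MATLAB model used for the results in this chapter: it lumps $P_C$ and $P_R$ into $I_{LOAD}^2 R_O$ using Eqs. (2) to (4), uses Eq. (17) for $P_{SW}$ and Eq. (13) for $P_{PARA}$, and every numeric constant is a hypothetical placeholder.

```python
import math

# All numeric values are hypothetical placeholders, not from the chapter.
V_IN = 1.0           # input voltage (V)
M = 2.0              # ideal conversion ratio (2x topology)
K_C, K_S = 1.0, 4.0  # Table 1 factors for the 2x topology
I_LOAD = 600e-6      # load current (A)
C_FLY = 100e-12      # flying capacitance (F)
RON_UM = 2000.0      # on-resistance of a 1 um wide switch (ohm * um)
C_GATE_UM = 1e-15    # gate capacitance per um of width (F / um)
V_SW = 1.0           # gate-drive voltage (V)
N_SWITCH = 4         # number of power switches (assumed)
C_PAR = 5e-12        # lumped parasitic capacitance (F)
DV_PAR = 1.0         # voltage swing on the parasitic capacitance (V)


def efficiency(f_sw, w_sw):
    """Efficiency per Eqs. (19)-(20) for one (f_SW, W_SW) point."""
    r_ssl = K_C / (C_FLY * f_sw)            # Eq. (2)
    r_fsl = K_S * RON_UM / w_sw             # Eq. (3) with R_ON = RON_UM / W
    r_o = math.hypot(r_ssl, r_fsl)          # Eq. (4)
    v_o = M * V_IN - I_LOAD * r_o
    if v_o <= 0.0:
        return 0.0                          # converter cannot support this load
    p_o = v_o * I_LOAD
    p_cond = I_LOAD ** 2 * r_o              # P_C + P_R lumped through R_O
    p_sw = N_SWITCH * V_SW ** 2 * f_sw * C_GATE_UM * w_sw  # Eq. (17) per switch
    p_para = DV_PAR ** 2 * C_PAR * f_sw     # Eq. (13)
    return p_o / (p_o + p_cond + p_sw + p_para)


def sweep(f_values, w_values):
    """Return (best_efficiency, f_sw, w_sw) over the grid."""
    return max((efficiency(f, w), f, w) for f in f_values for w in w_values)


f_grid = [10e6 + 0.5e6 * i for i in range(41)]  # 10 MHz to 30 MHz
w_grid = [10.0 + 1.0 * i for i in range(31)]    # 10 um to 40 um
best_eta, best_f, best_w = sweep(f_grid, w_grid)
print(f"peak efficiency {best_eta:.3f} at {best_f / 1e6:.1f} MHz, {best_w:.0f} um")
```

A grid search is adequate here because the surface is smooth and only two-dimensional; the location of the optimum depends entirely on the assumed constants.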

Figure 7 illustrates an example of efficiency curves with the optimum point at the maximum load condition ($I_{LOAD} = 600$ µA) [25]. We conducted the efficiency calculation and simulation using MATLAB. To obtain the peak efficiency of each VCR, we swept $f_{SW}$ from 10 MHz to 30 MHz and $W_{SW}$ from 10 µm to 40 µm. Figure 7 shows the results as three-dimensional plots. For the $4/3 \times$ mode, the peak efficiency is 82.5% when $f_{SW}$ is 11.3 MHz and $W_{SW}$ is 27.5 µm. For the $3/2 \times$ mode, the peak efficiency is 80.5% when $f_{SW}$ is 15.7 MHz and $W_{SW}$ is 33.94 µm. The optimal $f_{SW}$ for the $3/2 \times$ mode is higher than that of the $4/3 \times$ mode because it uses fewer transistors: as the switching loss is lower, we can employ larger transistors. For the $2 \times$ mode, the peak efficiency is 80% when $f_{SW} = 19$ MHz and $W_{SW} = 40$ µm. This mode uses the smallest number of transistors; thus, its switching and parasitic losses are significantly lower than those of the other two modes, and both $f_{SW}$ and $W_{SW}$ can be larger. In conclusion, by using this model, we can obtain the optimized efficiency for a given topology.

**Fig. 7** Simulated efficiency with respect to the switching frequency $f_{SW}$ and the power transistor width $W_{SW}$

# 4 Clock Generation and Distribution: 123-Phase Converter Ring

# 4.1 General Concept of Multiphase Interleaving

We can consider output voltage ripple as a power loss, because a larger ripple means that we must reserve a larger supply voltage margin for the load. To reduce the ripple, we can apply a multiphase interleaving scheme in fully integrated SC power converters [26, 40-47]. Figure 8 presents the concept and system diagram, where we implement multiphase interleaving by partitioning the SC power stage into multiple small cells, with these power cells driven by different clocks ($ck_1$ to $ck_n$). Adjacent clocks have a $360^{\circ}/n$ phase shift, that is, a T/n delay, where T is the switching clock period. The output voltage therefore has a higher equivalent ripple frequency and a smaller ripple. An n-phase voltage-controlled oscillator (VCO) can easily generate the interleaved clock signals. To effectively regulate $V_{OUT}$, a frequency modulation scheme is favorable, as it also saves unnecessary switching losses. After the error amplifier senses $V_{OUT}$ and generates the control signal $V_C$, the switching frequency is adjusted according to the load condition. Besides reducing the output voltage ripple, interleaving also significantly reduces the input current ($I_{IN}$) ripple, as the discontinuous inrush input current of a single-phase converter is evenly distributed among the interleaved phases of a multiphase converter. Consequently, we can use smaller input and output capacitances. As such, more interleaving phases are beneficial and preferred in recent fully on-chip SC converter works [40–47]. However, distributing a large number of interleaving clock phases across a large converter chip area can be challenging.

Fig. 8 System diagram and waveforms of multiphase interleaving SC converter [27]
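The phase relationship between the interleaved clocks can be tabulated with a few lines of code. The sketch below lists the $360^{\circ}/n$ phase shift and $T/n$ delay of each clock; the function names are illustrative, and the equivalent ripple frequency of $n f_{SW}$ assumes an idealized converter in which each cell contributes one evenly spaced charge-transfer event per period.

```python
def interleaved_clocks(n_phases, f_sw):
    """Return (index, phase_deg, delay_s) for clocks ck_1 .. ck_n."""
    period = 1.0 / f_sw
    return [
        (k + 1, 360.0 * k / n_phases, k * period / n_phases)
        for k in range(n_phases)
    ]


def equivalent_ripple_frequency(n_phases, f_sw):
    """Idealized ripple frequency: one charge-transfer event every T/n (assumed)."""
    return n_phases * f_sw


# Example: 5 phases at 10 MHz (hypothetical numbers).
for index, phase_deg, delay_s in interleaved_clocks(5, 10e6):
    print(f"ck_{index}: phase = {phase_deg:6.1f} deg, delay = {delay_s * 1e9:5.1f} ns")
```

For the 123-phase design discussed later, the same relation gives a phase step of about 2.93° between adjacent cells.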

## 4.2 Clock Generation: Centralized Versus Distributive

Figure 9 presents two schemes of interleaving clock generation and distribution. Figure 9a shows the H-tree structure with centralized clock generation followed by distribution, commonly used in large digital circuits and systems. For a multiphase SC converter, each power cell needs one clock signal from the central VCO, so *N* phases need an *N*-bit clock bus running over the whole converter, complicating the design. Moreover, to obtain good phase matching, the power stage layout has to be symmetrical, thus restricting the layout shape of the power management unit to a rectangle. To distribute the interleaving clock phases to the power cells, we need to route them from the central VCO to each cell. Each clock wire then has a parasitic capacitance $C_{P\_1CELL} = \log_2 N \times L \times C_{PAR0}$, where *N* is the phase number, *L* is a routing length in µm, and $C_{PAR0}$ is the unit parasitic capacitance in fF/µm. The total parasitic capacitance of all the clock wires driven by the VCO is $C_{P\_TOTC} = N \times \log_2 N \times L \times C_{PAR0}$. Therefore, the power consumption for the clock distribution is large. Consequently, the number of clock phases in most of the works is under 50 [26, 41–43].

On the other hand, Fig. 9b presents the distributed scheme [44–47], where we design the power cells to be identical, and each power cell generates its clock phase with a fixed delay from the preceding cell. When connecting N such cells (N is an odd number) in a ring, we form a ring oscillator along with the power converter. Each power cell supplies power to the power rails that run through the whole converter. Compared with the H-tree scheme, the distributed clock paths are shorter, and the parasitic capacitance along the clock wires is only $C_{P\_TOTD} = N \times L \times C_{PAR0}$, which is smaller by a factor of $\log_2 N$. Subsequently, the power consumption of the VCO is also much lower. Meanwhile, it is not necessary to locate the power cells on the periphery of the chip; they can run through the loading blocks that require power, as long as the connected power cells form a closed-loop ring. One possible drawback of this scheme is that the total parasitic capacitance along the clock routing paths affects the switching frequency of the power ring. To tackle this issue, we should size the inverters in the ring oscillator accordingly.

Fig. 9 Comparison between clock generation and distribution, as well as parasitic capacitance on the routing wires of (a) centralized scheme and (b) distributive scheme
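The two capacitance expressions can be compared numerically. In the sketch below, the routing length and unit capacitance are hypothetical values chosen only to show the scaling; the ratio between the two schemes is $\log_2 N$ regardless of those values.

```python
import math


def clock_wire_capacitance(n_phases, length_um, c_par0_ff_per_um):
    """Return (C_P_TOTC, C_P_TOTD) in fF for the centralized and distributed schemes."""
    unit = length_um * c_par0_ff_per_um
    centralized = n_phases * math.log2(n_phases) * unit  # N * log2(N) * L * C_PAR0
    distributed = n_phases * unit                        # N * L * C_PAR0
    return centralized, distributed


# Hypothetical wire parameters (not from the text).
LENGTH_UM = 100.0   # routing length L (um)
C_PAR0 = 0.2        # unit parasitic capacitance (fF/um)

for n in (16, 64, 123):
    c_totc, c_totd = clock_wire_capacitance(n, LENGTH_UM, C_PAR0)
    print(f"N = {n:3d}: centralized = {c_totc:8.1f} fF, "
          f"distributed = {c_totd:7.1f} fF, ratio = {c_totc / c_totd:.2f}")
```

For N = 123 the ratio is about 6.9, so under this wire model the centralized scheme would drive nearly seven times the clock-wire capacitance.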

For fully on-chip SC converters dealing with fast load transients, the input and output decoupling capacitors, or at least part of them, also need to be integrated on-chip. They would occupy a large die area, and we can reduce their values only by decreasing both the input inrush current and the output voltage ripple. We can effectively diminish these ripples by using multiphase interleaving. Two recent works use large numbers of phases, 101 phases in [47] for driving LEDs and 123 phases in [45, 46] for microprocessors, thus achieving very low output voltage ripple without using external capacitors. To summarize, the distributed scheme has the advantages of layout flexibility and lower power consumption when compared with the centralized scheme. Special attention should be paid to the drive capability of the buffers in the distributed ring oscillator.

# 4.3 123-Phase SC Converter Ring

Figure 10 illustrates a ring-shaped SC converter surrounding the load, to take full advantage of the multiphase interleaving technique [45]. In addition, the converter ring achieved a unity-gain frequency (UGF) higher than its switching frequency by setting its dominant pole on the output node. The designed converter ring consists of many time-interleaved power cells and only one controller. For a Lego-like layout, the size of the controller layout is exactly the same as that of one power cell. We planned the input and GND pins of the converter ring on every corner of the chip, without affecting the pads of the load. Similar to a standard pad ring, the converter ring surrounds the load in the square, with minimum changes (if not zero change) necessary for the existing layout of the load. One of the advantages of the power cell approach is its simplicity: we only need to design one power cell and the complete power ring. The converter ring layout and bumping diagram are also compatible with flip chip packaging. One advantage of integrating a step-down DC-DC converter on chip is that the input current is much smaller than the load current, thus reducing the input bump/pad current stress.

The regulation of the SC converter can use an LDO-assisted loop [48], hysteresis control [49], pulse skipping modulation [50], or frequency modulation [51]. For a multiphase SC converter, frequency modulation is the most appropriate method, since neither the LDO-assisted loop nor hysteresis control is feasible.







Figure 11 exhibits the small-signal analysis of the multiphase SC converter. One key feature of this circuit is the fact that the UGF of the designed multiphase converter is a few times higher than its switching frequency. The following features allow that to happen: (1) to consider the time-interleaved multiphase SC converter as a pseudo-continuous-time power converter, (2) to set the dominant pole at the output node, (3) to employ a high-speed error amplifier (EA), and (4) to tune the oscillator frequency through its supply to change the switching frequency of all phases instantly and simultaneously.

A switched-capacitor circuit is basically equivalent to a discrete-time resistor. Therefore, it only provides a first-order filtering in the power stage. Meanwhile, multiphase operation empowers the SC converters with more attractive features, for example, smaller input and output ripples, and faster transient responses, that allow the converter to respond within a small fraction of the switching period, acting more like a continuous-time power converter. On the other hand, the LC filter of a buck converter operating in continuous conduction mode (CCM) is a second-order filter, which can provide better filtering but limits the loop bandwidth and slows down the transient response. Also, it is necessary to change the inductor current before the regulation of the output voltage during load/line transients.

For the control loop design, there are several benefits of designing the dominant pole at the output node, as discussed in [46]. If the output pole $p_0$ is a nondominant pole, the loop needs an internal dominant pole at a frequency a couple of decades lower than $p_0$, which limits the UGF. With $p_0$ set as the dominant pole, the converter can drive a large capacitive load without affecting the loop stability; a higher capacitive load only improves the stability.

Following a conventional design methodology, AC signals at frequencies higher than $f_{SW}/2$ cannot pass through a discrete-time power stage, as imposed by the Nyquist theorem. On the other hand, multiple time-interleaved switched-capacitor power cells (SCPCs) act as a pseudo-continuous-time stage [46], which means that AC signals higher than $f_{SW}$ can also pass through the multiphase discrete-time power stage. In the VCO-based pulse frequency modulation (PFM) of SC converters, after the VCO converts the voltage information $V_{DDC}$ to the frequency domain, the multiphase SC power stage converts it back to the voltage domain. Therefore, with a high-speed error amplifier (EA) design, we can obtain a UGF that is a few times higher than $f_{SW}$. Although the buck converter can also enjoy the bandwidth extension benefit of multiphase interleaving, the abovementioned pseudo-continuous-time condition does not apply to buck converters, even though they can also use PFM control, including hysteretic control and constant on-/off-time control. In fact, constant on-time control belongs to both the PWM and PFM categories, because an inductor-based converter always requires the duty ratio information for output voltage regulation. Besides, during the load transient period, the duty ratio should optimally be 100% for a light-to-heavy load transient and 0% for a heavy-to-light load transient. The PWM sampling effect still exists in the constant-on-time controller, limiting the bandwidth extension. Therefore, the fixed-duty-ratio PFM that we consider as pseudo-continuous-time operation applies only to SC converters [44, 45].

Figure 12 presents the chip micrograph of the first version of the converter ring design [44], implemented in 65 nm CMOS for microprocessor applications. It has 30 power cells and 1 controller on the top edge plus 31 power cells on each of the other 3 edges, for a total of 123 power cells forming a ring around the whole chip. The number of power cells can be arbitrarily large, depending on the layout and the power cell shape and size. However, the number of power cells also sets the number of inverters in the ring oscillator, which determines the maximum switching frequency and consequently the maximum output power.

Figure 13 displays the measured load transient response, reference tracking, and output voltage ripple waveforms of the first converter ring design. We place one load of 25 mA on each corner of the chip to emulate the load transient events. For the load transients between 10 mA and 110 mA, the output voltage variations are within 58 mV with $V_{IN} = 2$ V, $V_{OUT} = 1.1$ V, and a conversion ratio of 2/3, benefiting from the designed high UGF. To accommodate the dynamic voltage scaling (DVS) function, we demonstrated a reference tracking speed of 2.5 V/µs. The measured output ripples range from 2.2 mV to 30 mV over a variety of loads and $V_{OUT}/V_{IN}$ conditions. The phase mismatch on the chip corners and PVT variations dominate the nonideal output ripples. In summary, this SC converter ring exhibits low voltage ripple and fast transient response.

Fig. 13 Measured load transient response, reference tracking, and output voltage ripple waveforms of the converter ring [43]

## 5 Multi-Output Switched-Capacitor Converter

For multicore application processors in smartphones and smart watches, power-saving techniques such as dynamic voltage and frequency scaling (DVFS) that extend the time between battery charges are highly favorable. Yet, each core may need a different supply voltage [52, 53]. High-efficiency fully integrated SC power converters with no external component are promising candidates. Figure 14 shows the strategy of dynamic power cell allocation proposed in [18]. Typically, SC converters with different specifications have independent designs, leading to a large area overhead as each converter has to handle its peak output power. Recently, multi-output SC converters emerged to tackle this issue. Reference [19] uses the on-demand strategy to control the two outputs, each with a different loading range, with the outputs not interchangeable. Reference [20] fixes the two output voltages with voltage conversion ratios (VCRs) of $2 \times$ and $3 \times$ only. Reference [54] integrates the controller, but the three output voltages are still from three individual SC converters. Without reallocating the capacitors in the power stages, capacitor utilization is low, as it is necessary to reserve margins to cater for each peak output power. Finally, [55] proposed a dual-output SC converter with a flying capacitor crossing technique to improve the power efficiency.



Fig. 14 Strategy of dynamic power cell allocation and system architecture of the dual-output SC converter [18]

In this subsection, we introduce a fully integrated dual-output SC converter with dynamic power cell allocation for application processors. We can dynamically allocate the shared power cells according to load demands. A dual-path VCO that works independently of power cell allocation achieves a fast and stable regulation loop. The converter can deliver a maximum current of 100 mA: we can adjust one output to deliver 100 mA, while the other handles a very light load, or adjust both outputs to deliver 50 mA each with over 80% efficiency.

The converter consists of two channels (CH<sub>1</sub> and CH<sub>2</sub>) with output voltages  $V_{O1}$  and  $V_{O2}$ , respectively, with each output regulated through frequency modulation by dual VCOs. The switching frequencies of the two channels are  $f_1$  and  $f_2$ . The strategy of dynamic load allocations adjusts the switching frequencies to be equal in order that both channels have the same power density, and the whole converter obtains the best overall efficiency.

The SC converters that consist of multiple power cells can operate in a multiphase interleaving mode, with each power cell as the unit cell allocated between the two channels. From Fig. 14, we assume that the two channels start with the same number of power cells, but the load of CH<sub>1</sub> is larger than that of CH<sub>2</sub>. To regulate the outputs properly, we initially have $f_1 > f_2$, and more power cells are eventually assigned to CH<sub>1</sub>. This means that the physical boundary should move to the right, until $f_1$ and $f_2$ are approximately equal. By balancing the power densities of the two channels at an optimal switching frequency, we balance and thereby reduce both the switching and parasitic losses. By dynamically adjusting both the numbers of power cells and the optimal switching frequencies, we ensure that the channels provide sufficient power to the loads and maximize the utilization of capacitors.

The channel selection switches connect the power cells to either CH<sub>1</sub> or CH<sub>2</sub>. The outputs of the bidirectional shift register (SR), sel<sub>[1:m + n]</sub>, control the boundary of the two channels. The frequency comparator determines the direction of boundary shifting. After each comparison, the boundary shifts only to an adjacent power cell, as sel<sub>[1:m + n]</sub> shifts by only one bit. As such, we minimize the potential glitches due to reconnecting the power cell. There are a total of 82 power cells, and they work with interleaved phases to reduce the output ripple voltage. The ratio selector that senses $V_{\text{REF}}/V_{\text{IN}}$ determines the VCRs of the two outputs ($R_1$ and $R_2$).

Figure 15 presents a dual-path voltage-controlled oscillator (VCO) to enable the allocation while minimizing cross regulation. The VCO consists of 82 delay cells, generating the clock phases for each power cell. One delay cell in  $CH_1$  ( $DC_{1[n]}$ ) has a complementary delay cell in CH<sub>2</sub> (DC<sub>2[n]</sub>). We choose the phases  $\varphi_{1[n]}$  and  $\varphi_{2[n]}$ through the MUX (multiplexer), subsequently distributed to the power cell. If  $sel_{[n]} = 1$ , it enables DC<sub>1[n]</sub> of VCO (CH1). Simultaneously, the MUX will short  $DC_{2[n]}$  with the clock phase redirected to the next cell. In this way, the number of delay cells in each VCO is equal to the number of its power cells, and multiphase interleaving takes effect to reduce the output ripple voltage. The error amplifier controls the frequency of the VCO, with the two outputs regulated separately, regardless of the power cell arrangement. As the speed of the regulation loop is much faster than that of the power cell allocation, we ensure stability. Each power cell consists of two flying capacitors and eight power transistors with the VCR as  $2/3 \times$  or  $1/2 \times$ . We optimize the configuration of each power cell to minimize the parasitic loss. The channel selection switches, controlled by  $sel_{[n]}$ , connect the local output  $V_{OL}$  to  $V_{O1}$  or  $V_{O2}$ .

Figure 16 illustrates the control logic composed of the frequency comparator and the power cell shift register. First, the one-shot signals ($ck_{1os}$ and $ck_{2os}$) control $P_1$ and $P_2$ to charge $C_{C1}$ and $C_{C2}$ for one clock period only. The ready signals (ready1 and ready2) are activated after charging finishes, triggering the comparison between $V_{F1}$ and $V_{F2}$. After a short delay, $C_{C1}$ and $C_{C2}$ are reset. For the comparison, if $V_{F1} < V_{F2}$, it means that $f_1 > f_2$, setting the direction signal of the shift register as direct = 0, and the selection signals shift left by one bit. This frequency adjustment repeats until $f_1$ and $f_2$ are very close to each other. The frequency comparator then issues stop = 1, and the shift register stops shifting. To ensure accurate charging, the current sources and the capacitors ($C_{C1}$ and $C_{C2}$) must be well matched. For robust control, we added offsets to the comparators to form the hysteresis window. The clocks ck1 and ck2 drive the whole process, without an additional system clock.
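The allocation loop described above can be mimicked with a small behavioral model. This is a sketch under stated assumptions rather than the circuit of [18]: the switching frequency of each channel is assumed proportional to its load current divided by its number of power cells, the comparator voltages are modeled as $V_F \propto 1/f$ (one clock period of charging), and the 5% hysteresis window and all function names are hypothetical.

```python
TOTAL_CELLS = 82  # total number of power cells, from the text


def frequency_compare(f1, f2, hysteresis=0.05):
    """Return +1 if f1 > f2, -1 if f1 < f2, 0 inside the hysteresis window (stop)."""
    v_f1 = 1.0 / f1  # normalized voltage on C_C1 after one clock period
    v_f2 = 1.0 / f2  # normalized voltage on C_C2 after one clock period
    if abs(v_f1 - v_f2) <= hysteresis * max(v_f1, v_f2):
        return 0                      # stop = 1
    return 1 if v_f1 < v_f2 else -1   # V_F1 < V_F2 means f1 > f2


def allocate_cells(i_load1, i_load2, max_steps=200):
    """Shift the channel boundary one cell per comparison; loads must be positive."""
    n1 = TOTAL_CELLS // 2             # both channels start with equal cells
    for _ in range(max_steps):
        n2 = TOTAL_CELLS - n1
        f1 = i_load1 / n1             # ASSUMED model: frequency ~ per-cell load
        f2 = i_load2 / n2
        direction = frequency_compare(f1, f2)
        if direction == 0:
            break                     # frequencies are close enough
        if direction > 0 and n2 > 1:
            n1 += 1                   # give one more cell to CH1
        elif direction < 0 and n1 > 1:
            n1 -= 1                   # give one more cell to CH2
        else:
            break                     # one channel is already down to a single cell
    return n1, TOTAL_CELLS - n1


print(allocate_cells(50e-3, 50e-3))   # balanced loads
print(allocate_cells(80e-3, 20e-3))   # CH1 carries the heavier load
```

Moving only one cell per comparison mirrors the one-bit shift of the selection signals, and the hysteresis window prevents the boundary from toggling once the two frequencies are nearly equal.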

Figure 17 presents the chip micrograph of the symmetrical dual-output SC converter, fabricated in 28 nm CMOS, with an active area of $1.2 \times 0.5 \text{ mm}^2$.



Fig. 15 Circuit implementation of the dual-path VCO, including its delay cell and power stage [18]



Fig. 16 Circuit implementation of the frequency comparator, the bidirectional shift register, and the timing diagram of the frequency comparison [18]





Figure 18 plots the measured waveforms of the steady-state outputs, reference tracking, and load transient. The measured results verified the independent regulation of the two output voltages with the adjustment of the two switching frequencies to be very close. The measured reference up- and down-tracking speeds were 500 mV/ $\mu$ s and 334 mV/ $\mu$ s, respectively. We did not observe any obvious cross regulation at  $V_{O2}$  while  $V_{O1}$  was undergoing reference tracking. With the load at  $V_{O1}$  switched from 4 mA to 40 mA, the settling time was within 500 ns. The cross regulation at  $V_{O2}$  was less than 10 mV at the rising edge and negligible at the falling edge, confirming that the dual-path VCO control can realize minimized cross regulation.

Figure 19 displays the measured efficiencies versus the load currents $I_{O1}$ and $I_{O2}$ . The peak efficiency was 83.3%, with the load current split at 50 mA for each channel. Owing to dynamic power cell allocation, the converter reached over 80% efficiency, which stayed nearly constant when $I_{O1}$ and $I_{O2}$ were larger than 15 mA. Allocation improves the efficiency by 4.8% compared with the circuit without it. Table 3 presents the performance comparison. We can conclude that, by using dynamic power cell allocation, the proposed dual-output SC converter exhibits high efficiency over a broad load range for the two outputs with minimized cross regulation.

To conclude this subsection, we presented a fully integrated dual-output SC converter with dynamic power cell allocation for application processors. We dynamically allocate the power cells according to load demands, improving the efficiency by 4.8% compared with the structure without allocation. The circuit contains a dual-path voltage-controlled oscillator (VCO) that works independently of the power cell allocation to implement a fast and stable regulation loop. The converter achieved 83.3% peak efficiency and a maximum load current of 100 mA while maintaining minimized cross regulation.



Fig. 18 Measured waveforms of the steady-state output voltages, reference tracking, and loading transient response



Fig. 19 Measured efficiency versus loading currents with and without dynamic power allocation

|                        | [19]                                               | [20]                                                 | [54]                                                                             | This work                                                    |
|------------------------|----------------------------------------------------|------------------------------------------------------|----------------------------------------------------------------------------------|--------------------------------------------------------------|
| Work                   | ISSCC 16                                           | JSSC 15                                              | ISSCC 16                                                                         | ISSCC 17                                                     |
| Technology             | 65 nm                                              | 0.35 µm                                              | 180 nm                                                                           | 28 nm                                                        |
| Topology               | Step-up/down                                       | Step-up                                              | Step-down                                                                        | Step-down                                                    |
| Number of outputs      | 2                                                  | 2                                                    | 3                                                                                | 2                                                            |
| Passive type           | On-chip<br>Off-chip                                | Off-chip                                             | On-chip<br>(MIM + MOS)                                                           | On-chip<br>(MOM + MOS)                                       |
| V<sub>IN</sub>         | 0.85–3.6 V                                         | 1.1–1.8 V                                            | 0.9–4 V                                                                          | 1.3–1.6 V                                                    |
| V<sub>OUT</sub>        | 0.1–1.9 V                                          | 2 V and 3 V                                          | 0.6 V, 1.2 V, and 3.3 V                                                          | 0.4–0.9 V                                                    |
| I<sub>O, MAX</sub>     | 10 mA                                              | 24 mA                                                | 100 µA<sup>a</sup>                                                               | 100 mA                                                       |
| Total C<sub>FLY</sub>  | 1 µF                                               | 9.4 µF                                               | 3 nF                                                                             | 8.1 nF                                                       |
| $\eta_{\text{peak}}$   | 95.8%                                              | 89.5%                                                | 81%                                                                              | 83.3%                                                        |
| Power density          | N/A                                                | N/A                                                  | 250 µW/mm<sup>2</sup>                                                            | 150 mW/mm<sup>2</sup>                                        |
| Maximum load per output | V<sub>O1</sub>: 1 mA<br>V<sub>O2</sub>: 10 mA     | V<sub>O1</sub>: 12 mA<br>V<sub>O2</sub>: 12 mA       | V<sub>O1</sub>: 33 µA<br>V<sub>O2</sub>: 33 µA<br>V<sub>O3</sub>: 33 µA<sup>a</sup> | V<sub>O1</sub>: 0–100 mA<br>V<sub>O2</sub>: 100–0 mA      |
| Symmetrical outputs    | No                                                 | No                                                   | No                                                                               | Yes                                                          |

Table 3 Performance comparison with the state of the art

<sup>a</sup>Extracted from the measurement results

# 6 Conclusions

In this chapter, we discussed state-of-the-art circuit design techniques addressing the challenges of fully integrated switched-capacitor power converters, which are important ingredients of power management circuits in recent SoC designs. We discussed the design considerations including topology generation, loss analysis, ripple reduction, and closed-loop feedback control. We also presented two design examples in nanometer CMOS to demonstrate the SC converter performance. Finally, we offered practical design guidelines and suggestions for future work.

#### References

- 1. Sanders, S. R., et al. (2013). The road to fully integrated DC-DC conversion via the switched-capacitor approach. *IEEE Transactions on Power Electronics*, 28(9), 4146–4155.
- 2. Jiang, J., Liu, X., Ki, W.-H., Mok, P. K. T., & Lu, Y. (2021). Circuit techniques for high efficiency fully-integrated switched-capacitor converters. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 68(2), 556–561.
- 3. Piqué, G. V., Bergveld, H. J., & Alarcon, E. (2013). Survey and benchmark of fully integrated switching power converters: Switched-capacitor versus inductive approach. *IEEE Transactions on Power Electronics*, 28(9), 4156–4167.
- 4. Jiang, J. (2017). High-efficiency fully- and highly-integrated switched-capacitor DC-DC converters (PhD Thesis). ECE Department, Hong Kong University of Science and Technology, Hong Kong.
- 5. Jiang, J., Ki, W.-H., & Lu, Y. (2017). Digital 2-/3-phase switched-capacitor converter with ripple reduction and efficiency improvement. *IEEE Journal of Solid-State Circuits*, 52(7), 1836–1848.
- 6. Makowski, M. S., & Maksimovic, D. (1995, June). Performance limits of switched-capacitor DC-DC converters. In *Proceedings of 26th Annual IEEE Power Electronics Specialists Conference*, vol. 2, pp. 1215–1221.
- 7. Su, F., & Ki, W.-H. (2008). Component-efficient multi-phase switched capacitor DC-DC converter with configurable conversion ratios for LCD driver applications. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 55(8), 753–757.
- 8. Karadi, R., & Piqué, G. V. (2014, February). 3-phase 6/1 switched-capacitor DC-DC boost converter providing 16V at 7mA and 70.3% efficiency in 1.1mm<sup>3</sup>. In *IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers*, pp. 92–93.
- 9. Jiang, J., et al. (2015, February). A 2-/3-phase fully integrated switched-capacitor DC-DC converter in bulk CMOS for energy-efficient digital circuits with 14% efficiency improvement. In *IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers*, pp. 366–367.
- 10. Jiang, J., Lu, Y., & Ki, W.-H. (2016, September). A digitally-controlled 2-/3-phase 6-ratio switched-capacitor DC-DC converter with adaptive ripple reduction and efficiency improvements. In *Proceedings of 42nd European Solid-State Circuits Conference (ESSCIRC)*, pp. 441–444.
- 11. Zanwar, M., & Sen, S. (2017, January). Programmable output multi-phase switched capacitor step-up DC-DC converter with SAR-based regulation. In *Proceedings of 30th International Conference on VLSI Design and 16th International Conference on Embedded Systems*, pp. 193–198.
- 12. Bang, S., Blaauw, D., & Sylvester, D. (2016). A successive-approximation switched-capacitor DC-DC converter with resolution of V<sub>IN</sub>/2<sup>N</sup> for a wide range of input and output voltages. *IEEE Journal of Solid-State Circuits*, 51(2), 543–556.
- 13. Salem, L. G., & Mercier, P. P. (2014). A recursive switched-capacitor DC-DC converter achieving 2<sup>N-1</sup> ratios with high efficiency over a wide output voltage range. *IEEE Journal of Solid-State Circuits*, 49(12), 2773–2787.
- 14. Salem, L. G., & Mercier, P. P. (2015, June). A battery-connected 24-ratio switched capacitor PMIC achieving 95.5%-efficiency. In *Proceedings of IEEE Symposium on VLSI Circuits*, pp. 340–341.
- 15. Breussegem, T., & Steyaert, M. (2010). A fully integrated gearbox capacitive DC/DC-converter in 90nm CMOS: Optimization, control and measurements. In *Proceedings of the 22nd IEEE Workshop on Control and Modeling for Power Electronics*, vol. 12.
- 16. Jiang, Y., et al. (2019). Algebraic series-parallel-based switched-capacitor DC–DC boost converter with wide input voltage range and enhanced power density. *IEEE Journal of Solid-State Circuits*, 54(11), 3118–3134.

- 17. Jiang, Y., et al. (2018). Algorithmic voltage-feed-in topology for fully integrated fine-grained rational buck–boost switched-capacitor DC–DC converters. *IEEE Journal of Solid-State Circuits*, 53(12), 3455–3469.
- 18. Jiang, J., et al. (2017, February). A dual-symmetrical-output switched-capacitor converter with dynamic power cells and minimized cross regulation for application processors in 28nm CMOS. In *IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers*, pp. 344–345.
- 19. Teh, C. K., & Suzuki, A. (2016, February). A 2-output step-up/step-down switched-capacitor DC-DC converter with 95.8% peak efficiency and 0.85-to-3.6V input voltage range. In *IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers*, pp. 222–223.
- 20. Hua, Z., & Lee, H. (2015). A reconfigurable dual-output switched-capacitor DC-DC regulator with sub-harmonic adaptive-on-time control for low-power applications. *IEEE Journal of Solid-State Circuits*, 50(3), 724–736.
- 21. Seeman, M. D., & Sanders, S. R. (2008). Analysis and optimization of switched-capacitor DC-DC converters. *IEEE Transactions on Power Electronics*, 23(2), 841–851.
- 22. Seeman, M. D. (2009). A design methodology for switched-capacitor DC-DC converters (PhD Thesis). EECS Department, University of California, Berkeley.
- 23. Breussegem, T., & Steyaert, M. (2012). Accuracy improvement of the output impedance model for capacitive down-converters. *Analog Integrated Circuits and Signal Processing*, 72, 271–277.
- 24. Sarafianos, A., & Steyaert, M. (2015). Fully integrated wide input voltage range capacitive DC-DC converters: The folding Dickson converter. *IEEE Journal of Solid-State Circuits*, 50(7), 1560–1570.
- 25. Jiang, J., et al. (2020). A multiphase switched-capacitor converter for fully integrated AMLED microdisplay system. *IEEE Transactions on Power Electronics*, 35(6), 6001–6011.
- 26. Le, H.-P., Sanders, S. R., & Alon, E. (2011). Design techniques for fully integrated switched-capacitor DC-DC converters. *IEEE Journal of Solid-State Circuits*, 46(9), 2120–2131.
- 27. Jiang, J., Lu, Y., & Ki, W.-H. (2014). Analysis of two-phase on-chip step-down switched capacitor power converters. In *Proceedings of IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)*, pp. 575–578.
- 28. Ki, W. H., Lu, Y., Su, F., & Tsui, C. Y. (2012, June). Analysis and design strategy of on-chip charge pumps for micro-power energy harvesting applications. In *VLSI-SoC: Advanced Research for Systems on Chip*, pp. 158–186.
- 29. Jiang, J., et al. (2020). Subtraction-mode switched-capacitor converters with parasitic loss reduction. *IEEE Transactions on Power Electronics*, 35(2), 1200–1204.
- 30. Le, H.-P., et al. (2013). A sub-ns response fully integrated battery-connected switched-capacitor voltage regulator delivering 0.19W/mm<sup>2</sup> at 73% efficiency. In *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp. 372–373.
- 31. Ng, V., & Sanders, S. R. (2013). A high-efficiency wide-input-voltage range switched capacitor point-of-load DC–DC converter. *IEEE Transactions on Power Electronics*, 28(9), 4335–4341.
- 32. Meyvaert, H., et al. (2015). A light-load-efficient 11/1 switched-capacitor DC-DC converter with 94.7% efficiency while delivering 100 mW at 3.3 V. *IEEE Journal of Solid-State Circuits*, 50(12), 2849–2860.
- 33. El-Damak, D., et al. (2013, February). A 93% efficiency reconfigurable switched-capacitor DC-DC converter using on-chip ferroelectric capacitors. In *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp. 374–375.
- 34. Andersen, T. M., et al. (2014, February). A sub-ns response on-chip switched-capacitor DC-DC voltage regulator delivering 3.7W/mm<sup>2</sup> at 90% efficiency using deep-trench capacitors in 32nm SOI CMOS. In *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp. 90–91.

- 35. Andersen, T. M., et al. (2017). A 10 W on-chip switched capacitor voltage regulator with feedforward regulation capability for granular microprocessor power delivery. *IEEE Transactions on Power Electronics*, 32(1), 378–393.
- 36. Meyvaert, H., Breussegem, T. V., & Steyaert, M. (2013). A 1.65 W fully integrated 90 nm bulk CMOS capacitive DC-DC converter with intrinsic charge recycling. *IEEE Transactions on Power Electronics*, 28(9), 4327–4334.
- 37. Biswas, A., Kar, M., & Mandal, P. (2013). Techniques for reducing parasitic loss in switched-capacitor based DC-DC converter. In *Proceedings of IEEE 28th Annual Applied Power Electronics Conference and Exposition*, pp. 2023–2028.
- 38. Lin, Y., et al. (2018). A 180 mV 81.2%-efficient switched-capacitor voltage doubler for IoT using self-biasing deep N-Well in 16-nm CMOS FinFET. *IEEE Solid-State Circuits Letters*, 1(7), 158–161.
- 39. Butzen, N., & Steyaert, M. S. J. (2017). Design of soft-charging switched-capacitor DC–DC converters using stage outphasing and multiphase soft-charging. *IEEE Journal of Solid-State Circuits*, 52(12), 3132–3141.
- 40. Piqué, G. V. (2012, February). A 41-phase switched-capacitor power converter with 3.8mV output ripple and 81% efficiency in baseline 90nm CMOS. In *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp. 98–100.
- 41. Su, F., Ki, W.-H., & Tsui, C.-Y. (2009). Regulated switched-capacitor doubler with interleaving control for continuous output regulation. *IEEE Journal of Solid-State Circuits*, 44(4), 1112–1120.
- 42. Somasekhar, D., et al. (2010). Multiphase 1 GHz voltage doubler charge-pump in 32 nm logic process. *IEEE Journal of Solid-State Circuits*, 45(4), 751–758.
- 43. Breussegem, T., & Steyaert, M. (2009, June). A 82% efficiency 0.5% ripple 16-phase fully integrated capacitive voltage doubler. In *IEEE Symposium on VLSI Circuits*, pp. 198–199.
- 44. Lu, Y., et al. (2015, February). A 123-phase DC-DC converter-ring with fast-DVS for microprocessors. In *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp. 364–365.
- 45. Lu, Y., Jiang, J., & Ki, W. H. (2017). A multiphase switched-capacitor DC-DC converter ring with fast transient response and small ripple. *IEEE Journal of Solid-State Circuits*, 52(2), 579–591.
- 46. Lu, Y., Jiang, J., & Ki, W. H. (2018). Design considerations of distributed and centralized switched-capacitor converters for power supply on-chip. *IEEE Journal of Emerging and Selected Topics in Power Electronics*, 6(2), 515–525.
- 47. Jiang, J., et al. (2017, November). Fully-integrated AMLED micro display system with a hybrid voltage regulator. In *Proceedings of IEEE Asian Solid-State Circuits Conference (A-SSCC)*, pp. 277–280.
- 48. Lu, Y., Ki, W.-H., & Yue, C. (2016). An NMOS-LDO regulated switched-capacitor DC-DC converter with fast response adaptive phase digital control. *IEEE Transactions on Power Electronics*, 31(2), 1294–1303.
- 49. Breussegem, T., & Steyaert, M. (2011). Monolithic capacitive DC-DC converter with single boundary-multiphase control and voltage domain stacking in 90 nm CMOS. *IEEE Journal of Solid-State Circuits*, 46(7), 1715–1727.
- 50. Kwan, H.-K., Ng, D. C. W., & So, V. W. K. (2013). Design and analysis of dual-mode digital-control step-up switched-capacitor power converter with pulse-skipping and numerically controlled oscillator-based frequency modulation. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 21(11), 2132–2140.
- 51. Souvignet, T., Allard, B., & Trochut, S. (2016). A fully integrated switched-capacitor regulator with frequency modulation control in 28-nm FDSOI. *IEEE Transactions on Power Electronics*, 31(7), 4984–4994.

- 52. Wang, A. (2014, February). Heterogeneous multi-processing quad-core CPU and dual-GPU design for optimal performance, power, and thermal tradeoffs in a 28nm mobile application processor. In *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp. 180–181.
- 53. Singh, K., & de Gyvez, J. P. (2021). Twenty years of near/sub-threshold design trends and enablement. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 68(1), 5–11.
- 54. Jung, W., et al. (2016, February). A 60%-efficiency 20nW-500µW tri-output fully integrated power management unit with environmental adaptation and load-proportional biasing for IoT systems. In *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp. 154–155.
- 55. Hong, W., et al. (2019). A dual-output step-down switched-capacitor voltage regulator with a flying capacitor crossing technique for enhanced power efficiency. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, pp. 1–11.

# Hybrid Architectures and Controllers for Low-Dropout Regulators



Xiangyu Mao, Mo Huang, Yan Lu, and Rui P. Martins

# 1 Introduction

For higher system power efficiency in a system-on-a-chip (SoC) or multicore microprocessor, state-of-the-art computing systems adopt fine-grained supply voltage management with multiple adaptive voltage domains, which allows the optimization of each supply voltage domain dynamically and independently. In general, advanced nanoscale CMOS devices cannot directly withstand the high voltage levels provided by a lithium-ion battery or by a board-level power supply bus, mandating off-chip and/or on-chip integrated voltage regulator(s). While off-chip switching regulators can offer one-step conversion from the sources with ~90% efficiencies, they require bulky power inductors and a large number of filtering capacitors. In addition, when delivering power with a stepped-down low voltage from the board onto the chip, the high current stress demands many package bumps. Furthermore, since the package bump pitches scale at a much slower rate than the CMOS devices, the total number of voltage domains that off-chip regulators can provide is restricted [1]. To fulfill the fine-grained power supply management, a hierarchical power delivery network with two-step conversion/regulation is favorable (Fig. 1), with the battery

X. Mao  $\cdot$  M. Huang ( $\boxtimes$ )  $\cdot$  Y. Lu

State-Key Laboratory of Analog and Mixed-Signal VLSI/IME and FST-ECE, University of Macau, Macao SAR, China

e-mail: mohuang@um.edu.mo; yanlu@um.edu.mo

R. P. Martins

State-Key Laboratory of Analog and Mixed-Signal VLSI/IME and FST-ECE, University of Macau, Macao SAR, China

On leave from Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal e-mail: rmartins@um.edu.mo

<sup>©</sup> The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Paulo da Silva Martins, P.-I. Mak (eds.), *Analog and Mixed-Signal Circuits in Nanoscale CMOS*, Analog Circuits and Signal Processing, https://doi.org/10.1007/978-3-031-22231-3\_8


Fig. 1 Hierarchical power delivery network solution of a digital system



Fig. 2 Per-core DVFS with integrated LDOs and a shared power supply  $V_{IN}$ 

voltage converted into an intermediate voltage using a high-efficiency DC-DC converter and then multiple fully integrated low-dropout regulators (LDOs) employed to power the function units (FUs).

Taking the multicore processor application as an example, an LDO can provide a compact and cost-effective way to realize per-core fast dynamic voltage and frequency scaling (DVFS), as presented in Fig. 2.

To analyze how much power the DVFS can save, we can calculate the power consumption  $P_A$  of the digital system with a fixed  $V_{IN}$  as its supply:

$$P_A = C_{\rm dynA} \times V_{\rm IN}^2 \times F + I_{\rm LEAK\_VIN} \times V_{\rm IN} \tag{1}$$

Then, we calculate the system power consumption $P_{\rm B}$ with LDOs that make each core operate at its corresponding optimum supply voltage $V_{\rm OUT}$ :

$$P_B = \frac{C_{\rm dynA} \times V_{\rm OUT}^2 \times F + I_{\rm LEAK\_VOUT} \times V_{\rm OUT}}{\eta_{\rm LDO}} \tag{2}$$

where $\eta_{\rm LDO} \approx V_{\rm OUT}/V_{\rm IN}$ is the LDO power efficiency. The saved power consumption is

$$P_{\rm SAVE} = P_A - P_B = C_{\rm dynA} \times V_{\rm IN} \times (V_{\rm IN} - V_{\rm OUT}) \times F + V_{\rm IN} \times (I_{\rm LEAK\_VIN} - I_{\rm LEAK\_VOUT}) \tag{3}$$

According to Eq. (3), we can deduce that although the LDO power efficiency can be very low in a large dropout voltage condition (e.g.,  $V_{IN} = 1$  V,  $V_{OUT} = 0.5$  V), it can still save a lot of power from a system perspective.

On the other hand, an analog LDO (A-LDO) is suitable for noise-sensitive analog and RF circuits, as it has a fast transient response with low quiescent current and good power supply rejection [2–5]. However, it faces several challenges when powering digital circuits in advanced nanoscale CMOS. One major design challenge of an A-LDO is its relatively small load capability. To deliver a large load current with a low dropout voltage (<100 mV), the power transistor becomes quite large; thus, the associated gate pole, the load-dependent transconductance $g_m$ , and the output pole $p_{OUT}$ may cause instability. Most of the prior fully integrated A-LDOs can only deliver a load current of <250 mA, which is insufficient to supply a high-performance processor. In [6], a dual-function LDO/power-gating design with 4 A output capability requires a 4 µF in-package capacitor and a 50 nF on-die compensation capacitor, which increase the size and cost.

Another challenge is the performance degradation at a low input voltage. Process downscaling favors a low $V_{IN}$ to reduce the dynamic and leakage currents of the load circuits. The Internet of Things (IoT) and wearable devices have advanced significantly by benefiting from low-power circuit technologies such as near-threshold voltage computing [7, 8]. In an advanced process, microprocessors can work in the near-threshold voltage (NTV) and even the sub-threshold voltage regions to save power [9]. When the input supply voltage goes down to the NTV or sub-threshold level, LDOs are still necessary for fine-grained voltage domains and per-domain performance and power optimization. Nevertheless, there may not be sufficient voltage headroom for the analog error amplifier (EA) to drive the power transistor in an A-LDO. Thus, a large power transistor is necessary; besides, it is hard to obtain a high loop gain with a low supply voltage.

Recently, the digital LDO (D-LDO), the switching LDO (S-LDO), and hybrid architectures have received significant attention, as they are more suitable for such applications. This chapter is organized as follows: Section 2 introduces the classic LDO control methods and power stage selection. Section 3 details examples of analog-assisted and hybrid control digital LDOs. Section 4 presents the ampere-level switching LDO for high-performance multicore processors. Finally, Section 5 draws the conclusions.

# 2 Control Method and Power Stage Selection

## 2.1 Power Stage Comparison

According to the different regulation methods of the power stage, we can categorize the LDO into three types, as presented in Fig. 3.

An A-LDO regulates the output voltage  $V_{OUT}$  by controlling the gate voltage  $V_G$  of the power transistor, while a D-LDO regulates  $V_{OUT}$  by controlling the number n of on/off power switches. Besides, the S-LDO regulates its  $V_{OUT}$  by modulating the duty cycle *D* of the power transistor. We can easily get the expressions for the output current  $I_{OUT}$  and the regulating factors:  $V_G$ , *n*, and *D*.

For the A-LDO,

$$I_{\rm OUT} = V_G \times g_m \tag{4}$$

For the D-LDO,

$$I_{\rm OUT} \approx n \times I_{\rm UNIT} \tag{5}$$

For the S-LDO,

$$I_{\rm OUT} = D \times I_{\rm SW} \tag{6}$$

where $g_m$ is the transconductance of the analog power transistor, which depends on its load current, $I_{\text{UNIT}}$ is the unit current conducted through a single digital power switch cell in a D-LDO, and $I_{\text{SW}}$ is the current conducted through the whole switching power transistor in an S-LDO.

In terms of output regulation continuity, analog control and switching control are continuous, while the digital control is of course discrete. In order to ensure the charge balance in the output, the control code of the D-LDO usually varies between



Fig. 3 LDO power stages with analog, digital, and switching control schemes

two or more adjacent codes. This is the limit cycle oscillation (LCO) in the D-LDO [10]. The simplest way to reduce the LCO ripple is to use a lower-resolution quantizer or dead-zone control, at the cost of output accuracy. Unlike the digital control, the analog control and switching control can continuously regulate the output current, making it much easier to achieve high output accuracy. Besides, it is also easier to obtain a wide load range with the analog and switching control.
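The limit cycle and the dead-zone trade-off can be reproduced with a very small behavioral model. The Python sketch below is an illustration under stated assumptions, not a model of any specific design: the controller is a 1-bit comparator driving an up/down code, the load is a plain resistor, and the output is assumed to settle almost completely within one clock period. The function name and all parameter values are hypothetical.

```python
import math


def simulate_dldo(v_ref=0.52, i_unit=1e-3, r_load=50.0, n_max=32,
                  dead_zone=0.0, t_over_tau=5.0, cycles=100):
    """Return the control code n of a toy D-LDO, one entry per clock cycle."""
    n, v_out, codes = 0, 0.0, []
    decay = math.exp(-t_over_tau)          # residual of the RC settling per cycle
    for _ in range(cycles):
        err = v_ref - v_out                # sampled error at the clock edge
        if abs(err) > dead_zone:           # outside the dead zone: move one code
            n += 1 if err > 0 else -1
            n = min(max(n, 0), n_max)
        # Eq. (5): n unit currents flowing into the (assumed resistive) load
        v_target = n * i_unit * r_load
        v_out = v_target + (v_out - v_target) * decay
        codes.append(n)
    return codes


if __name__ == "__main__":
    print("no dead zone:", simulate_dldo()[-6:])
    print("30 mV dead zone:", simulate_dldo(dead_zone=0.03)[-6:])
```

Without a dead zone the code keeps toggling between the two codes that straddle the reference, so the output ripples by one LSB step. With the dead zone the code freezes, but the output then rests on the nearest code level inside the window rather than on the reference, which is the accuracy penalty mentioned above.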

As discussed before, in the low $V_{IN}$ condition, the gate voltage $V_G$ of the analog power stage needs to stay above a certain level (>100 mV) so that the output of the error amplifier remains in its normal operating range and provides sufficient loop gain. In contrast, the gate voltage of the digital power transistor or the switching power transistor can be 0 V. The switch-like power transistor can conduct more current than the analog power transistor, thus saving silicon area. Additionally, the digital and the switching power stages are friendly to process scaling.

The frequency compensation is the key part of an LDO design. It is necessary to ensure that the LDO can remain stable over a wide load range. We can derive the transfer function of the three power stages; in the case of the analog power stage, it is

$$A_{\rm VA} = \frac{g_m R_O}{1 + s R_O C_L} \tag{7}$$

where $C_L$ is the output capacitor. The output impedance $R_O$ is the parallel combination of the output impedance $R_P$ of the power transistors and the load resistance $R_L$ . Generally, $R_O$ and $R_L$ have a roughly linear relationship:

$$R_O = R_P \parallel R_L \approx K \times R_L \tag{8}$$

Assuming that the power transistors are in the saturation region, then

$$g_m = \sqrt{2\mu_p C_{\rm OX} \frac{W}{L} I_D} = \sqrt{2\mu_p C_{\rm OX} \frac{W}{L} \times \frac{|V_{\rm OUT}|}{R_L}} \tag{9}$$

where  $\mu_p$  is the mobility of the charge carriers and  $C_{OX}$  is the gate-oxide capacitance per unit area.  $|V_{OUT}|$  represents the DC value of the output voltage. W and L are the width and length of the power transistor, respectively. Combining Eqs. (7)–(9), we have

$$A_{\rm VA} = \frac{K \times \sqrt{2\mu_p C_{\rm OX} \frac{W}{L} \times |V_{\rm OUT}| \times R_L}}{1 + sR_OC_L} \tag{10}$$

assuming that the output pole is always within the bandwidth. When $R_O$ drops by two orders of magnitude, the output pole moves up in frequency by two orders of magnitude, but the gain of the output stage drops only ten times, resulting in a significant increase in the bandwidth of the analog LDO under a heavy load condition. Then, the parasitic gate pole $p_G$ of the power transistors may fall within the bandwidth, resulting

in a sharp deterioration of the phase margin. The variations of bandwidth and  $p_{OUT}$  greatly affect the loop stability.
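The scaling argument above can be checked numerically from Eqs. (7)–(9). The Python sketch below is a minimal illustration for the power stage alone; the lumped parameter `beta` (standing for $\mu_p C_{\rm OX} W/L$), the factor `k`, and all numeric values are assumptions chosen only to make the ratios visible.

```python
import math


def analog_stage(r_load, v_out=1.0, k=0.5, beta=2.0, c_load=1e-9):
    """Return (DC gain, output pole in Hz, gain-bandwidth product in Hz)."""
    g_m = math.sqrt(2 * beta * v_out / r_load)    # Eq. (9), beta = mu_p*C_OX*W/L
    r_o = k * r_load                              # Eq. (8)
    dc_gain = g_m * r_o                           # Eq. (7) evaluated at s = 0
    p_out_hz = 1 / (2 * math.pi * r_o * c_load)   # output pole of Eq. (7)
    gbw_hz = dc_gain * p_out_hz                   # equals g_m / (2*pi*C_L)
    return dc_gain, p_out_hz, gbw_hz


if __name__ == "__main__":
    light = analog_stage(r_load=1000.0)           # light load
    heavy = analog_stage(r_load=10.0)             # load resistance 100x smaller
    print("gain ratio (light/heavy):", light[0] / heavy[0])
    print("pole ratio (heavy/light):", heavy[1] / light[1])
    print("GBW ratio  (heavy/light):", heavy[2] / light[2])
```

With the load resistance reduced a hundredfold, the pole moves up a hundredfold while the gain falls only tenfold, so the gain-bandwidth product of the stage grows tenfold. That growth is what pushes the gate pole $p_G$ inside the loop bandwidth at heavy load.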

Therefore, the gate pole  $p_G$ , the load-dependent  $g_m$ , and the output pole  $p_{OUT}$  are the main factors leading to LDO compensation difficulties. Besides, it is necessary to consider the process, voltage, and temperature (PVT) variations, which complicate the compensation. Prior analog LDOs usually require a complicated compensation using pole-zero tracking to achieve good stability over the full load range [11, 12].

In the steady state, the input and output voltages are constant; thus, the digital power stage and the switching power stage can be equivalent to a constant current source. For the digital power stage, the transfer function is

$$A_{\rm VD} = \frac{I_{\rm UNIT} R_O}{1 + s R_O C_L} \tag{11}$$

For the switching power stage, the transfer function is

$$A_{\rm VS} = \frac{I_{\rm SW} R_O}{1 + s R_O C_L} \tag{12}$$

where $I_{\text{UNIT}}$ is the unit current conducted through a single power switch cell and $I_{\text{SW}}$ is the current conducted through the whole switching power transistor. Since both are fixed when $V_{\text{IN}}$ and $V_{\text{OUT}}$ are constant, the digital and switching output stages have a constant gain-bandwidth product. With the output pole placed within the loop bandwidth, we can easily get a constant bandwidth that does not change with the load current, which is very useful for improving the stability and simplifying the compensation. Therefore, the digital power stage and the switching power stage are more suitable for high-current applications.
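The load independence follows in one step from Eq. (11). The derivation below is a sketch for the power stage alone, ignoring the controller gain and any sampling delay; the symbol $\omega_{\rm GBW}$ is introduced here only for illustration.

$$\omega_{\rm GBW} = \underbrace{I_{\rm UNIT} R_O}_{\text{DC gain}} \times \underbrace{\frac{1}{R_O C_L}}_{\text{output pole}} = \frac{I_{\rm UNIT}}{C_L}$$

The load-dependent $R_O$ cancels, and the same steps applied to Eq. (12) give $I_{\rm SW}/C_L$. For the analog stage of Eq. (7), the corresponding product is $g_m/C_L$, which still varies with the load because $g_m$ depends on the drain current in Eq. (9).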

In addition, for high-current, large-area applications, we should pay attention to the integration scheme of the LDO, which has a great influence on the choice of the LDO's control method. When the LDO sits at the side of the load with a centralized power stage, the small LDO area leaves only a small contact surface between the power transistor and the digital load. The limited metal width then has difficulty carrying a large load current, which may result in electromigration (EM) issues and a large IR drop (Fig. 4).

The distributed power stage can increase the top metal resource and reduce the IR drop. For the long-distance signal transmission, digital signals have certain advantages over the analog signals, and we can add digital buffers on the signal path. Therefore, this is another important reason why we chose the digital/switching power stages for high current applications.

In addition to the advantages described above, the digital/switching power stages also have some disadvantages. The first is the output ripple, especially for the switching power stage. To reduce the output ripple, the switching LDO usually needs a large output capacitor and a high switching frequency. The large output capacitor restricts its on-die applications, and the high switching frequency leads to a large quiescent current.



Fig. 4 Centralized power stage and distributed power stage



Fig. 5 Reliability issue comparison for the analog, digital, and switching LDOs

For the digital power stage, we should pay attention to reliability issues. In the analog/switching power stages, the load current and heat spread across all the power transistors. In the digital power stage, however, the load current and heat concentrate at the "on" power transistors (Fig. 5). In a large dropout voltage condition, the unit current through each power transistor increases significantly, making the load current and heat even more concentrated, which may cause serious EM and self-heating problems. These reliability issues cannot be easily solved through layout alone. Reference [13] used a code roaming algorithm, and Refs. [14, 15] mitigated the EM and self-heating issues by limiting the current through the power transistor. According to the above analyses, Table 1 summarizes the specifications of the three power stage types.

We can choose the appropriate power stage according to the application requirements. It is also possible to combine two different power stages or control methods for better performance. For example, Ref. [16] adopts a digital/analog power stage for power supply rejection (PSR) improvement and LCO reduction; Ref. [17] obtains 5.6 mV/mA load regulation and a 20,000× dynamic load range by adding a sub-LSB switching power transistor to the original digital power stage; the power stage in Ref. [18] combines digital control and switching control, achieving a wide 1 mA–6.4 A load range and a significantly lower driving current than a pure switching control.

|                               | LDO                  |                   |                         |
|-------------------------------|----------------------|-------------------|-------------------------|
| Topology                      | Analog               | Digital           | Switching               |
| Regulation fineness           | Continuous (voltage) | Discrete (number) | Continuous (duty cycle) |
| Output accuracy               | $\checkmark$         | ×                 | -                       |
| Large load capability         | ×                    | $\checkmark$      | $\checkmark$            |
| Wide load range               | $\checkmark$         | ×                 | $\checkmark$            |
| Low input power stage         | ×                    | $\checkmark$      | $\checkmark$            |
| Distributed power transistors | ×                    | $\checkmark$      | $\checkmark$            |
| Self-heating and EM           | $\checkmark$         | ×                 | $\checkmark$            |
| Output ripple                 | $\checkmark$         | ×                 | ××                      |

Table 1 Power stage comparison



Fig. 6 Three controller types for LDOs

## 2.2 LDO Controller

Figure 6 shows the simple schematics of the three LDOs: A-LDO, D-LDO, and S-LDO. The A-LDO contains an error amplifier and RC compensation network, the D-LDO controller consists of a quantizer and control logic, while the switching LDO requires a high-speed comparator. Table 2 summarizes the characteristics of the three controller types. We will discuss the three controllers in terms of output voltage accuracy, transient response, and design complexity.

The output voltage accuracy is a very important indicator for the LDO and is usually affected by two factors: the load/line regulation and the manufacturing offset error. Owing to its high-gain error amplifier (EA), the A-LDO can usually obtain good load/line regulation, and the amplifier can easily achieve a small offset through common-centroid matched transistors. However, in a low $V_{IN}$ condition, the limited voltage headroom makes it harder to design a high-gain EA. To solve this problem, we can use a heterogeneous power supply, in which the LDO controller and the power stage use different supplies. For example, many SoCs have a 1.8 V supply commonly used for I/O blocks and analog circuits, such as the bandgap reference, temperature sensors, and oscillators. We can use the 1.8 V supply as the controller supply and the 1 V supply as the power stage supply. Alternatively, we could use a charge pump to generate a higher voltage for the controller.

|                             | LDO                                     |                                          |                               |
|-----------------------------|-----------------------------------------|------------------------------------------|-------------------------------|
| Topology                    | Analog                                  | Digital                                  | Switching                     |
| High output accuracy        | $\checkmark$                            | ×                                        | $\checkmark$                  |
| Ultra-low quiescent current | $\checkmark$                            | $\checkmark$                             | ×                             |
| Transient response time     | $\approx \frac{1}{BW} + T_{SR}$         | $\approx \frac{1}{BW} + T_S$             | $\approx \frac{1}{BW}$        |
| Design challenges           | Compensation and energy-efficient driver | Control logic, reliability consideration | Driver loss and output ripple |

Table 2 Controller comparison

The D-LDO quantizes the error between the output voltage $V_{OUT}$ and the reference voltage $V_{REF}$ and then transmits the digital error information to the control logic [19]. The output accuracy depends on the quantization error. A shift-register-based D-LDO [20] consists of a clocked comparator acting as a one-bit analog-to-digital converter (ADC) and a bidirectional shift register (SR) serving as an integrator, which can obtain high output accuracy but a slow transient response. A multibit ADC-based D-LDO can obtain a much faster transient response; however, the quantizer resolution limits the output accuracy. Both [14, 21] adopt a six-bit ADC with a quantization resolution of approximately 5–7 mV. Further increasing the ADC resolution will exponentially complicate the digital proportional-integral-derivative (PID) controller design, which may require higher power consumption and area. So far, no D-LDO has achieved a load regulation of <5 mV/A.

Figure 7 shows commonly used quantizers, which we can divide into voltage-domain [22, 23] and time-domain [24] types. The voltage-domain quantizer utilizes multiple comparators and voltage references to detect $V_{OUT}$ changes ([13, 22] have 6 comparators, [23] has 13 comparators). Still, the input offset voltage of the comparators may reduce the detection window or even cause sub-window overlapping. To obtain high output accuracy and maintain robust operation, the comparator offsets require calibration to guarantee monotonic detection. An event-driven D-LDO [18] obtained fine regulation by using an analog amplifier and only a two-bit current-mirror-based flash ADC, but it requires a 1 µF output capacitor to filter the output ripple to less than the minimum detection window of the ADC, which limits its fully integrated application.

The time-domain quantizers using time-to-digital converters (TDCs) and voltage-controlled oscillators (VCOs) are friendly to process scaling and work well at low $V_{IN}$ voltages, but they are sensitive to PVT variations. References [14, 24] utilized a pair of VCOs to resist PVT variations, which adds one cycle of latency to the loop and still leaves local mismatches between the two VCOs. For high output accuracy, a piecewise multipoint calibration is usually necessary. Reference [21] uses only one six-bit TDC, but it requires a complex active calibration for the target code. Calibration is necessary for digital LDOs to obtain high output accuracy and robust operation, but it increases the cost and design complexity. In contrast, by using an error amplifier, analog LDOs can easily achieve high output accuracy without any calibration, owing to their continuous regulation and high gain.

Fig. 7 Voltage domain quantizer and time-domain quantizer in D-LDOs

For the switching LDO, with the duty cycle continuously regulated, it can obtain high output accuracy by using a high-speed and high-accuracy comparator or combining it with the analog error amplifier. The S-LDO in Ref. [25] achieved 1.5 mV/A load regulation, and Ref. [26] also obtained an excellent load regulation of 1 mV/A.

The transient response is another important indicator of the LDO. We evaluate the transient speed of an LDO by the response time $T_R$, defined as in [27]:

$$T_R = C_{\rm OUT} \times \frac{\Delta V_{\rm OUT}}{I_L} \times \frac{I_Q}{I_L},\tag{13}$$

where $\Delta V_{\text{OUT}}$ is the resultant output voltage spike, $I_{\text{L}}$ is the maximum load current, and $I_Q$ is the quiescent current. We can approximate $T_{\text{R}}$ as

$$T_R \approx \frac{1}{BW} + T_{SR} + T_S, \tag{14}$$

where BW is the loop bandwidth of the LDO,  $T_{SR}$  is the delay from a limited slew rate, and  $T_S$  is the delay from the voltage error sampling.
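As a quick numerical illustration of Eq. (13), the following minimal sketch evaluates $T_R$ for one set of operating conditions. The function name and the example values (1 nF, 50 mV, 100 mA, 100 µA) are hypothetical and chosen only to show the arithmetic; they do not come from any cited design.

```python
def response_time(c_out: float, delta_v_out: float, i_load_max: float, i_q: float) -> float:
    """Evaluate Eq. (13): T_R = C_OUT * (dV_OUT / I_L) * (I_Q / I_L), in seconds."""
    return c_out * (delta_v_out / i_load_max) * (i_q / i_load_max)


# Hypothetical example values (assumptions, not measured data).
t_r = response_time(c_out=1e-9, delta_v_out=50e-3, i_load_max=100e-3, i_q=100e-6)
print(f"T_R = {t_r * 1e12:.2f} ps")
```

A smaller output capacitor, a smaller voltage spike, or a lower quiescent current for the same load current all reduce the value returned by this expression.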

Frequency compensation is also a key part of an analog LDO design, especially for high-current, wide-bandwidth applications. The A-LDO can detect the $V_{OUT}$ variation in real time, so $T_S \approx 0$. The loop bandwidth and the gate-drive slew rate limit the transient response of an A-LDO. The flipped voltage follower (FVF)-based LDO is therefore the most common choice in fast transient applications. For example, Ref. [4] obtained a sub-ns transient response owing to a >400 MHz loop bandwidth and an enhanced super source follower. In general, for a fast transient response, a high slew rate requires a large current, so designing an energy-efficient driver is another challenge. The common methods include adaptive biasing [28], a class-AB driver [29], and a super source follower [4].

To implement feedback control, D-LDOs have adopted a range of control schemes, including integral feedback [20], dead-zone control [30], linear PID control [21], feedforward control [31], and nonlinear control [13, 32]. Integral (I) control can achieve relatively high output accuracy but with a slow transient response. Dead-zone control sets a dead zone around $V_{\text{REF}}$ and can remove the output voltage ripple, but the size of the dead zone requires a cautious setting to avoid window overlapping. PID control is a comprehensive feedback control scheme. The P-control improves the transient response by providing an output current proportional to the $V_{OUT}$ variation, and the D-control helps to reduce sharp $V_{OUT}$ spikes. Similar to D-control, the feedforward scheme measures the $V_{OUT}$ slope at the beginning of a droop event and then estimates the necessary amount of charge, which can further improve the load transient performance. Nonlinear control suddenly turns on part of the power transistors when $V_{OUT}$ drops to a certain threshold. It is an alternative way to minimize the voltage droop during large load transients, yet this abrupt action may easily produce overshoot spikes on $V_{OUT}$. In addition, according to the analysis in Sect. 2.1, for wide input/output voltage range applications, D-LDOs need additional control methods to solve the reliability issues of the power transistors in large dropout conditions.

The D-LDO has no slew rate limitation, but it requires several clock cycles to sample the output error and process the error information. For the shift register-based D-LDO in [20], the response time of the linear search control is *N*. The successive approximation D-LDO [17] achieves a faster response time of $N/2^N$. By using a flash ADC [13] or an inverter-chain TDC [21], we can reduce the response time to 1–2 cycles. A higher operating frequency improves the transient performance but increases the power consumption, and we must also consider its impact on stability.
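To make the difference between the two search strategies concrete, the sketch below counts clock cycles for a one-unit-per-clock shift register and for a bit-by-bit successive approximation search. It assumes an idealized array of `2**n_bits` unit transistors and an ideal comparator; the function names are hypothetical, and the way these cycle counts map onto the *N* used in the cited works is an assumption.

```python
from typing import Tuple


def linear_search_cycles(start: int, target: int) -> int:
    """Bidirectional shift register: the on-count moves by one unit per clock."""
    return abs(target - start)


def binary_search_cycles(n_bits: int, target: int) -> Tuple[int, int]:
    """Successive approximation: one bit of the code is decided per clock."""
    code = 0
    cycles = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        # Idealized comparator decision: keep the bit if the trial code
        # does not exceed the code needed by the load.
        if trial <= target:
            code = trial
        cycles += 1
    return code, cycles


print(linear_search_cycles(0, 200))   # cycles for the shift register
print(binary_search_cycles(8, 200))   # (final code, cycles) for the SAR search
```

Under these assumptions the shift register needs as many cycles as the code distance, while the successive approximation search always finishes in `n_bits` cycles.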

A simple switching LDO operates in a hysteretic mode. It uses a high-speed comparator to amplify the error between $V_{\text{REF}}$ and $V_{\text{OUT}}$ into binary levels, with the comparator output signal applied to the switching power transistor. The propagation delay of the comparator and gate driver determines the transient response time, which is usually sub-nanosecond or even tens of picoseconds. In principle, since an error of a few mV is sufficient to drive the comparator output to a binary level, the DC load regulation error can be very small, although it depends on the loop delay. The main drawback of an S-LDO is its inherent output ripple. It usually needs a large load capacitor and a high switching frequency to reduce the output ripple. Since all the power transistors are in a switching state, there will be a large driving current. Therefore, we can only find the S-LDO in high-current application scenarios.

# 3 Analog-Digital Hybrid LDO

Recently, hybrid LDOs (H-LDOs) have gained much research and development interest for combining the advantages of both analog and digital architectures [33]. According to the hybridization method, we can divide the analog/digital hybrid LDOs into three categories.

The first is the D-LDO with an analog-assisted loop in the digital feedback control [34, 35]. In Fig. 8, $R_{\rm C}$ and $C_{\rm C}$ form a high-pass filter to improve the load transient response. A favorable feature of this structure is that the baseline digital loop can work normally with a slow clock frequency, even if there is no analog loop. Also, since the analog and digital loops have very different bandwidths, they essentially do not affect each other, which keeps the design complexity low.

The second is the H-LDO with individual digital and analog loops in Fig. 9. The power stage consists of a digital power stage in parallel with an analog power stage [16], or it can have a power transistor with two different operation states [36]. This structure can support the two loops working simultaneously or utilize a finite-state machine (FSM) to control the operation of the two loops to obtain the best steady-state or dynamic performance. An obvious feature of this structure is that either loop can work independently [37, 38].

Fig. 11 Overall architecture of the analog-assisted tri-loop DLDO proposed in [34]

The third refers to the hybrid signal processing in a single loop, where the analog control and the digital control belong to the same feedback loop. Figure 10 presents a hybrid control architecture [39]. This structure combines an analog error amplifier, a digital voltage sensor (TDC), and a digital power stage, mainly to meet the high-current application requirements with high output accuracy. Next, we will introduce hybrid LDO design examples for each of the three categories.

## 3.1 Analog-Assisted Digital LDOs

Figure 11 presents an analog-assisted (AA) tri-loop D-LDO [34]. Different from a conventional D-LDO, the $V_{\text{SSB}}$ node of the gate driver of the power transistors does not connect to GND but is DC-biased to GND through a relatively large resistor $R_{\rm C}$ and AC-coupled to $V_{\rm OUT}$ through a coupling capacitor $C_{\rm C}$.

Fig. 12 Equivalent circuits of the (a) baseline D-LDO, (b) AA-loop D-LDO, and (c) simulated unit current comparison [34]

When a load transient occurs, the $V_{OUT}$ droop couples to the gates of the "on" power transistors, generating a larger instantaneous $V_{GS}$ change and a larger unit current $I_{UNIT}$. A factor K investigated in [34] evaluates the maximum unit current variations at the transient instant in the AA and the baseline schemes. Figure 12 shows the equivalent power stage circuits of the baseline and the AA schemes. When $V_{OUT}$ changes from $V_{OUT\_NORM}$ to $V_{OUT\_TEMP}$, only the $V_{DS}$ of the power transistors changes in the baseline, while both the $V_{GS}$ and $V_{DS}$ change in the AA loop. Figure 12c displays the simulated results, which demonstrate the effectiveness of the AA scheme, whereas the conventional structure obtains only $1.4 \times I_{UNIT}$. A larger instantaneous unit current can significantly reduce the $V_{OUT}$ droop. A similar effect occurs during the load step-down transient.

Figure 13 exhibits the working principle of the tri-loop controlled D-LDO. Once a load transient occurs and $V_{OUT}$ exceeds the dead zone, coarse tuning activates, with a "C\_EN" signal generated. When C\_EN = 1, the power transistors shift by L counts in each cycle, rapidly increasing the output current and decreasing the recovery time. When $V_{OUT}$ is within the dead zone, the coarse control terminates, and the fine-tuning shifts by one count per cycle. At this moment, C\_EN = 0 and F\_EN = 1. After several cycles of fine-tuning, the LDO enters a freeze mode and stops all the SRs to save steady-state quiescent current, which also eliminates the LCO.

Fig. 13 Working principle of the tri-loop LDO
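The coarse/fine shifting described for the tri-loop control can be sketched as a per-clock update of the number of "on" power transistors. This is a simplified behavioral model under stated assumptions: a PMOS array in which more "on" units raise $V_{OUT}$, an ideal comparison against a symmetric dead zone, and no freeze mode. The function name and all parameter names are hypothetical.

```python
def next_on_count(count: int, v_out: float, v_ref: float,
                  dead_zone: float, coarse_step: int, n_max: int) -> int:
    """One clock of a coarse/fine shift-register update (behavioral sketch)."""
    error = v_ref - v_out
    if abs(error) > dead_zone:
        step = coarse_step   # outside the dead zone: coarse tuning (C_EN = 1)
    else:
        step = 1             # inside the dead zone: fine tuning (F_EN = 1)
    if error > 0:
        count += step        # V_OUT too low: turn on more units
    elif error < 0:
        count -= step        # V_OUT too high: turn off units
    # The shift register cannot go below zero or above the array size.
    return max(0, min(n_max, count))


# Hypothetical droop event: V_OUT falls 50 mV below a 0.55 V reference.
print(next_on_count(count=10, v_out=0.50, v_ref=0.55,
                    dead_zone=0.02, coarse_step=8, n_max=64))
```

A freeze mode could be layered on top by stopping the updates after a fixed number of consecutive fine-tuning cycles; that extension is not modeled here.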

For the above PMOS power stage, only a small number of power transistors are turned on at light load; thus, the AA-loop acts on only these few power transistors, which is insufficient to compensate for a large load transient. Reference [35] utilizes an NMOS power stage with an AA scheme to improve the load transient performance. Figure 14 shows the NMOS power stage with a NAND-based AA loop (NAP). When $V_{OUT}$ drops, the NMOS source follower naturally provides more current than the PMOS power stage. $V_{CP}$ is one of the input signals of the NAND, DC-biased to $2 \times V_{DD}$ by a resistor $R_1$ and AC-coupled to the output voltage. When $V_{OUT}$ has an undershoot, the relatively large PMOS $M_1$ amplifies the coupled AC signal to the gate of the NMOS power transistor. With a 20 mA load step with 3 ns edge time, the undershoot of the PMOS AA D-LDO is close to 426 mV, while the NMOS AA D-LDO has a smaller undershoot of 244 mV due to the NMOS intrinsic response. The NMOS D-LDO with a NAP loop obtains a superior transient response of only 96 mV undershoot.

## 3.2 An Analog-Proportional Digital Integral Multiloop Digital LDO

The AA-loop is a passive scheme that improves the transient response by increasing the instantaneous current. Another scheme places a fast analog proportional loop directly in parallel with the digital integral loop. Figure 15 reveals a digital LDO with analog-proportional (AP) and digital integral (DI) control [16].

Fig. 14 NMOS power stage with NAND-based AA loop [35]

Fig. 15 The D-LDO with analog-proportional and digital integral control [16]

The traditional SR-based D-LDO is essentially an integral control, which can offer a high DC accuracy with low power consumption but also with slow response. The proportional control can respond fast but has a large DC error in the steady state. By combining these two controls, we can simultaneously obtain a fast transient response and high DC accuracy. We can implement the proportional control in an analog way.

The FVF-based LDO is a good choice for energy-efficient proportional control [2]. The analog power transistor $M_{PA}$ and the common-gate PMOS $M_2$ compose the fast Loop-1, which handles the fast transient. The FVF circuit can still operate normally at a low $V_{IN}$ voltage. However, the AP part may take over all the current at a very light load condition; thus, we add Loop-2 for the load current-sharing regulation. Loop-2 consists of $M_{PA}$, $M_2$, and a two-stage error amplifier. We set the gate voltage of $M_2$ based on the difference between $V_{OUT}$ and $V_{REF}$. Although the gain of Loop-2 may not be high, it can help to improve the PSR and output accuracy under light load conditions.

Fig. 16 The timing diagram of the AP-DI LDO proposed in [16]

The digital integral part consists of three shift register-controlled power transistor arrays. Loop-4 is a coarse-tuning loop, composed of M and H subsections. When $V_{OUT}$ exceeds the preset boundaries ($V_{REF-}$ to $V_{REF+}$), the outputs of CMP<sub>2</sub> and CMP<sub>3</sub> trigger a fast regulation, in which the active number of power switches changes by 16 units every cycle. When $V_{OUT}$ is within the ($V_{REF-}$ to $V_{REF+}$) boundary, Loop-5 starts working; it is a fine-tuning loop with a high DC gain, in which the active number changes according to the output of CMP<sub>1</sub>. Figure 16 presents the timing diagram of the AP-DI LDO proposed in [16].

Figure 17 shows the simulated load transient waveforms for load steps of 0–10 mA within a 5 ns edge time, where  $V_{\rm IN} = 0.6$  V,  $V_{\rm REF} = 0.55$  V, and CLK = 5 MHz. With an AP-only LDO, the undershoot is 70 mV but with a large DC error. The DI-only LDO obtains good DC accuracy but a large undershoot of 550 mV, as well as a large LCO. The proposed AP-DI LDO not only delivers a fast transient response and good output accuracy but also eliminates the LCO in light load.

Figure 18 presents the PSR improvement of the AP-DI LDO in [16], where $V_{\rm IN} = 0.75$ V, $V_{\rm OUT} = 0.7$ V, and $I_{\rm LOAD} = 10$ mA. The AP loop clearly improves the PSR, and Loop-2 is very effective.



Fig. 17 Simulated load transient waveforms with AP-DI, DI-only, and AP-only conditions



## 3.3 A 1.2 A Calibration-Free Hybrid LDO with In-Loop Quantization

The hybrid LDOs in Sects. 3.1 and 3.2 are all for low-current applications. According to the discussions in Sect. 2.1, the digital power stage is appropriate for high-current and wide-bandwidth applications. However, for high output accuracy and robust operation, a D-LDO needs calibration, which increases the cost and design complexity. In contrast, analog LDOs utilize an analog amplifier that can easily achieve high output accuracy without any calibration, owing to its continuous regulation and high gain. Thus, we can try to combine an analog error amplifier and digital power stages to achieve large load capability and high output accuracy.

Figure 19 presents the overall architecture of the hybrid LDO with in-loop quantization proposed in [26]. It comprises an analog EA with RC compensation, a five-bit TDC, a digital power stage, and an auxiliary constant current (ACC) circuit. Unlike the conventional D-LDO, which directly quantizes the output voltage, the proposed LDO utilizes an analog EA to pre-amplify the error between $V_{\text{OUT}}$ and $V_{\text{REF}}$. Then, a five-bit TDC quantizes the buffered EA signal $V_{\text{EAB}}$ and outputs a thermometer code that directly controls the digital power transistors.

Fig. 19 Overall architecture of the in-loop quantization hybrid LDO proposed in [26]

Fig. 20 In-loop quantization

Figure 20 illustrates the LDO structure comparison. Compared with the traditional analog LDO, this hybrid LDO replaces the analog driver with a digital TDC and replaces the power stage with a digital power stage. The inverter-chain-based TDC sits in the middle of the control loop; although it is sensitive to PVT variations, it does not affect the output accuracy, because under closed-loop control the error amplifier output automatically tracks the PVT variations.

Since the current $I_{\text{UNIT}}$ through a unit power transistor varies a lot in a large dropout condition, it may cause reliability and stability issues [13, 14]. We implement an auxiliary constant current (ACC) circuit to keep $I_{\text{UNIT}}$ constant. The ACC circuit consists of two loops, and its output voltage $V_L$, which has sink capability, serves as the adaptive "GND" of the pre-driver. The control signals of the power transistors are therefore in the $[V_{\text{IN}} - V_L]$ domain. The $V_L$ voltage tracks PVT variations to ensure that the unit current through the power transistor is equal to the defined value.

Fig. 21 Small-signal analysis of the LDO proposed in [26]

The traditional D-LDOs generally utilize the PID controller for loop stability. In the proposed hybrid LDO, we used an RC compensation to replace the digital PID controller and simplify the design by eliminating the analog-to-digital converter. Figure 21 displays the small-signal model of the hybrid LDO proposed in [26]. The RC compensation consists of the resistors  $R_1$  and  $R_2$  and capacitors  $C_1$  and  $C_2$ . Since the TDC's frequency far exceeds the loop bandwidth, we can simplify the five-bit TDC to a continuous voltage-to-digital model. Then, the transfer function of the loop is

$$H(s) = \frac{\frac{N \times I_{\text{UNIT}}}{V_{\text{RANG}}} A_0 R_O (1 + sR_2 C_2) (1 + sR_1 C_1) \times \frac{1 - e^{-sT}}{sT}}{[1 + s(A_0 + 1)R_1 C_2] \left(1 + s\frac{R_2 C_1}{1 + A_0}\right) \left(1 + s\frac{C_{\text{TDC}}}{g_{\text{max}}}\right) (1 + sR_O C_L)} \tag{15}$$

There are three effective poles and two zeros in the whole loop.

# 4 Multiphase Switching LDO

## 4.1 Ripple Analysis

The switching LDO can drive power transistors fast and accurately. However, it usually needs a high switching frequency and a large capacitor to mitigate its output ripple, which restricts its application in low-power and low-cost scenarios. A traditional switching LDO with hysteretic control utilizes a high-speed comparator to amplify the error between $V_{\text{REF}}$ and $V_{\text{OUT}}$ into binary levels. The comparator output turns the power transistor on and off, regulating the output current. Figure 22 shows the charge-discharge model of a hysteretic switching LDO.

Fig. 22 The charge-discharge model of a traditional hysteretic switching LDO

 $I_{SW}$  is the current through the power transistor when turned on, and  $I_L$  is the load current. In steady state, according to the charge-balance principle,

$$I_{\rm SW} \times T_{\rm ON} = I_L \times T \tag{16}$$

The duty cycle D is

$$D = \frac{T_{\rm ON}}{T} = \frac{I_L}{I_{\rm SW}}. \tag{17}$$

The output ripple consists of two parts: the capacitor charging-discharging component  $\Delta V_{CR}$  and the contribution of its ESR that is  $\Delta V_{ESR}$ :

$$\Delta V = \Delta V_{\rm CR} + \Delta V_{\rm ESR} = \frac{(1 - D)D \times I_{\rm SW}}{C_L \times F} + I_{\rm SW} \times R_{\rm ESR} \tag{18}$$

where F = 1/T is the switching frequency. According to Eq. (18), the amplitude of the output ripple is related to the transistor current strength $I_{SW}$, switching frequency F, load capacitor $C_L$, and load current $I_L$. Assuming that D = 50%, $I_{SW} = 1$ A, F = 1 GHz, and $R_{\rm ESR} = 5\ \mathrm{m}\Omega$, the output capacitor $C_L$ needs to be larger than 25 nF for a 15 mV output ripple. Considering the PVT variations of $I_{\rm SW}$, the output capacitor should be even larger. A higher switching frequency can reduce the output ripple, but it increases the driver loss.

|                                    | [40]  | [25]  | [41]  |
|------------------------------------|-------|-------|-------|
| Output capacitor $C_L$ (nF)        | 750   | 481   | 2.7   |
| Load capability $I_L$ (A)          | 11.9  | 12    | 0.17  |
| $K = C_{\rm L}/I_{\rm MAX}$ (nF/A) | 63.02 | 40.08 | 15.88 |

Table 3 The K ratio comparison

Fig. 23 The PWM control switching LDO with a triangle input signal
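The 25 nF figure quoted after Eq. (18) can be checked with a short calculation. The sketch below simply rearranges Eq. (18) for $C_L$; the function names are hypothetical, and the inputs are the values stated in the text (D = 50%, $I_{SW}$ = 1 A, F = 1 GHz, $R_{ESR}$ = 5 mΩ, 15 mV ripple).

```python
def output_ripple(d: float, i_sw: float, c_load: float, f_sw: float, r_esr: float) -> float:
    """Eq. (18): capacitor charging/discharging ripple plus ESR ripple, in volts."""
    dv_cr = (1.0 - d) * d * i_sw / (c_load * f_sw)
    dv_esr = i_sw * r_esr
    return dv_cr + dv_esr


def min_capacitance(d: float, i_sw: float, f_sw: float, r_esr: float, dv_target: float) -> float:
    """Rearrange Eq. (18) for the smallest C_L that meets a ripple target."""
    dv_cr_budget = dv_target - i_sw * r_esr  # what is left after the ESR term
    if dv_cr_budget <= 0:
        raise ValueError("ESR ripple alone already exceeds the target")
    return (1.0 - d) * d * i_sw / (f_sw * dv_cr_budget)


c_min = min_capacitance(d=0.5, i_sw=1.0, f_sw=1e9, r_esr=5e-3, dv_target=15e-3)
print(f"C_L >= {c_min * 1e9:.1f} nF")
```

With these inputs the ESR term takes 5 mV of the 15 mV budget, leaving 10 mV for the capacitive term.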

Table 3 presents the load capability and the output capacitor comparison of the prior hysteretic switching LDOs. References [25, 40] have large load capability and correspondingly need large output capacitors (481 nF in [25] and 750 nF in [40]), which require a special SOI process or a deep-trench process. Reference [41] fabricated in 16 nm CMOS with a 2.7 nF load capacitor can only drive a load current of 170 mA. We consider the ratio of output capacitance over the maximum load current  $K = C_{\rm L}/I_{\rm MAX}$  as the key performance index of switching LDOs. The *K* values in Table 3 are 63.02 nF/A, 40.08 nF/A, and 15.88 nF/A, respectively. Such large *K* values restrict the application of hysteretic switching LDOs.
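The K values can be reproduced directly from the capacitor and load-current figures quoted for the three designs. The following minimal sketch uses only the numbers listed in Table 3; the dictionary layout and function name are hypothetical.

```python
# (output capacitor in nF, maximum load current in A), as listed in Table 3
designs = {
    "[40]": (750.0, 11.9),
    "[25]": (481.0, 12.0),
    "[41]": (2.7, 0.17),
}


def k_ratio(c_load_nf: float, i_max_a: float) -> float:
    """K = C_L / I_MAX in nF/A."""
    return c_load_nf / i_max_a


k_values = {ref: k_ratio(c, i) for ref, (c, i) in designs.items()}
for ref, k in k_values.items():
    print(f"{ref}: K = {k:.2f} nF/A")
```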

## 4.2 RAMP-Based PWM Control

A hysteretic switching LDO does not have a fixed switching frequency; the frequency is determined only by the loop propagation delay. To fix the switching frequency, we use a triangle wave to replace the DC reference voltage, as presented in Fig. 23.

When  $V_{\text{RAMP}} > V_{\text{OUT}}$ , the comparator output will turn on the power switch and  $V_{\text{OUT}}$  rises. When  $V_{\text{RAMP}} < V_{\text{OUT}}$ , the comparator output will turn off the power switch and  $V_{\text{OUT}}$  drops. The switching frequency is equal to the triangle wave frequency. The  $V_{\text{RAMP}}$  amplitude is usually much larger than the output ripple. Ignoring the impact of the output ripple, we can express the duty cycle *D* as

$$D = \left(\frac{\text{RAMP}}{2} + V_{\text{REF}} - V_{\text{OUT}}\right) / \text{RAMP} = \frac{1}{2} + \frac{V_{\text{REF}} - V_{\text{OUT}}}{\text{RAMP}} = \frac{I_L}{I_{\text{SW}}} \tag{19}$$

Equation (19) reveals the linear relationships between  $I_L$ ,  $V_{OUT}$ , D, and RAMP.
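To illustrate the linear relationship, Eq. (19) can be rearranged as $V_{\text{OUT}} = V_{\text{REF}} + \text{RAMP} \times (1/2 - I_L/I_{\text{SW}})$, so $V_{\text{OUT}}$ moves linearly with the load current. The sketch below evaluates both forms; the function names and the example values (1 V reference, 200 mV ramp amplitude, 1 A switch current) are hypothetical.

```python
def duty_cycle(v_ref: float, v_out: float, ramp: float) -> float:
    """Eq. (19): duty cycle set by the ramp amplitude and the output error."""
    return 0.5 + (v_ref - v_out) / ramp


def v_out_for_load(v_ref: float, ramp: float, i_load: float, i_sw: float) -> float:
    """Eq. (19) solved for V_OUT using D = I_L / I_SW."""
    return v_ref + ramp * (0.5 - i_load / i_sw)


# Hypothetical example: sweep the load from 25% to 75% of I_SW.
for i_load in (0.25, 0.50, 0.75):
    v = v_out_for_load(v_ref=1.0, ramp=0.2, i_load=i_load, i_sw=1.0)
    print(f"I_L = {i_load:.2f} A -> V_OUT = {v:.3f} V, D = {duty_cycle(1.0, v, 0.2):.2f}")
```

In this idealized model the output equals $V_{\text{REF}}$ only at D = 50%, and the load-dependent shift scales with the ramp amplitude.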

## 4.3 Four-Phase PWM Control

With the frequency of the PWM (pulse width modulation) control fixed, we can utilize a four-phase triangle wave and split the total current $I_{SW}$ into four smaller currents (Fig. 24), with charging interleaved. Compared with the single-phase PWM control, the four-phase PWM control can reduce the maximum output ripple by a factor of 16.
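The factor of 16 can be checked numerically for the capacitor charging component of the ripple. The sketch below is an idealized model, not the cited implementation: it assumes rectangular phase currents of $I_{SW}/N$ each, evenly interleaved, a constant load set by charge balance, no ESR, and normalized units ($I_{SW}$ = 1, $C_L$ = 1, T = 1). All function and parameter names are hypothetical.

```python
def peak_to_peak_ripple(n_phases: int, duty_sixteenths: int, steps: int = 4000,
                        i_sw: float = 1.0, c_load: float = 1.0, period: float = 1.0) -> float:
    """Integrate (I_total - I_L) / C_L over one period and return the ripple."""
    on_steps = duty_sixteenths * steps // 16      # on-time of each phase, in steps
    i_load = i_sw * duty_sixteenths / 16.0        # charge balance: I_L = D * I_SW
    dt = period / steps
    v = v_min = v_max = 0.0
    for step in range(steps):
        i_total = 0.0
        for phase in range(n_phases):
            offset = phase * steps // n_phases    # interleaved phase shift
            if (step - offset) % steps < on_steps:
                i_total += i_sw / n_phases
        v += (i_total - i_load) * dt / c_load
        v_min = min(v_min, v)
        v_max = max(v_max, v)
    return v_max - v_min


def worst_case_ripple(n_phases: int) -> float:
    """Largest ripple over duty cycles 1/16 ... 15/16."""
    return max(peak_to_peak_ripple(n_phases, k) for k in range(1, 16))


ratio = worst_case_ripple(1) / worst_case_ripple(4)
print(f"single-phase / four-phase worst-case ripple = {ratio:.2f}")
```

In this model each phase carries a quarter of the current and the effective ripple frequency is four times higher, which together account for the reduction. The ESR component is not modeled here.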

## 4.4 Current Balancing

Current sharing can be a serious issue in multiphase control because it determines the ripple cancellation effect. For the four-phase switching LDO,

$$I_L = \left(\frac{I_{SW}}{4} \times D_0\right) + \left(\frac{I_{SW}}{4} \times D_1\right) + \left(\frac{I_{SW}}{4} \times D_2\right) + \left(\frac{I_{SW}}{4} \times D_3\right) \tag{20}$$

The unbalanced current is

$$\Delta I = \frac{I_{\rm SW}}{4} \times \Delta D \tag{21}$$

The input offset voltage of the comparator causes a duty cycle error. Since a small error in D causes only a small unbalanced current, we recommend calibrating the comparators for good load sharing.
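As a rough worked example of Eq. (21), differentiating Eq. (19) suggests that a comparator input offset $V_{\text{OS}}$ shifts the duty cycle by about $V_{\text{OS}}/\text{RAMP}$. The symbol $V_{\text{OS}}$ and the numbers used here ($I_{\text{SW}} = 1$ A, RAMP = 200 mV, $V_{\text{OS}} = 5$ mV) are illustrative assumptions, not values from the cited design:

$$\Delta D \approx \frac{V_{\text{OS}}}{\text{RAMP}} = \frac{5\ \text{mV}}{200\ \text{mV}} = 0.025, \qquad \Delta I = \frac{I_{\text{SW}}}{4} \times \Delta D = 250\ \text{mA} \times 0.025 = 6.25\ \text{mA}$$

Under these assumptions the imbalance is 2.5% of the 250 mA phase current, and it shrinks in proportion to the residual offset after calibration.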



Fig. 24 The four-phase PWM control charging/discharging mode

## 4.5 Dual-Loop Four-Phase PWM Control Switching LDO

Since the reference input of the comparator becomes a triangle wave, the $V_{OUT}$ voltage cannot obtain high DC accuracy. We add a high-gain error amplifier before the PWM controller to improve the output accuracy. Figure 25 shows the overall architecture of the dual-loop four-phase PWM switching LDO [26]. The resistor $R_1$ and capacitor $C_1$ constitute the loop compensation circuit, and we used $R_F$ to realize the active voltage positioning (AVP) function. We can adjust $R_F$ to obtain different AVP effects. In addition to the four-phase PWM control, we introduced two other ripple reduction techniques: (1) current-limited power cells acting as constant current sources to resist PVT variations and (2) hybrid fast-slow power transistors, with a ratio of 4:1.

Unlike conventional LDO designs that consider the controller and the power transistor as a whole, we can design such a switching LDO like a Lego set. Each power cell has a load capability of 220 mA. Once the controller is designed, the switching LDO can scale to different load applications by increasing or decreasing the number of power cells, without redesigning the main circuits and layouts, which is very flexible and convenient.



Fig. 25 Overall architecture of the dual-loop four-phase switching LDO in [26]

# 5 Conclusions

This chapter discussed the characteristics and design considerations of each of the three LDO types (analog, digital, switching) in terms of the power stage and the control methods, for integration in nanoscale processes. The conventional analog, digital, and switching LDOs all have some inherent shortcomings or limitations. Many recent research works obtained better performance by using a hybrid architecture that combines the advantages of different control schemes. Design example-1 adopts a high-pass analog-assisted loop to improve the transient response of a digital LDO. Design example-2 utilizes analog-proportional and digital integral control to enhance the PSR and improve the load transient response. In addition, by combining the analog error amplifier with the distributed digital power stage (example-3) or the switching power stage (example-4), the two LDOs obtained ampere-level load current capability, as well as high output accuracy and fast transient response. In brief, there is no perfect architecture for all applications, only the most suitable architecture for a specific application. We need to choose the LDO structure based on the application requirements, without being limited to specific control loop and power stage types.

## References

- 1. Kim, S. T., et al. (2016). Enabling wide autonomous DVFS in a 22 nm graphics execution core using a digitally controlled fully integrated voltage regulator. *IEEE Journal of Solid-State Circuits*, 51(1), 18–30.
- 2. Lu, Y., Wang, Y., Pan, Q., Ki, W.-H., & Yue, C. P. (2015). A fully-integrated low-dropout regulator with full-spectrum power supply rejection. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 62(3), 707–716.
- 3. Huang, M., Feng, H., & Lu, Y. (2019). A fully-integrated FVF-based low-dropout regulator with wide load capacitance and current ranges. *IEEE Transactions on Power Electronics*, 34(12), 11880–11888.
- 4. Cai, G., et al. (2021). A fully integrated FVF LDO with enhanced full-spectrum power supply rejection. *IEEE Transactions on Power Electronics*, 36(4), 4326–4337.
- 5. Guo, J., & Leung, K. N. (2010). A 6-µW chip-area-efficient output-capacitorless LDO in 90-nm CMOS technology. *IEEE Journal of Solid-State Circuits*, 45(9), 1896–1905.
- 6. Luria, K., Shor, J., Zelikson, M., & Lyakhov, A. (2015, February). Dual-use low-drop-out regulator / power gate with linear and on-off conduction modes for microprocessor on-die supply voltages in 14nm. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 156–157.
- 7. Kim, S., & Seok, M. (2015). Variation-tolerant, ultra-low-voltage microprocessor with a low-overhead, within-a-cycle in-situ timing-error detection and correction technique. *IEEE Journal of Solid-State Circuits*, 50(6), 1478–1490.
- 8. Al-Fuqaha, A., Guizani, M., Mohammadi, M., Aledhari, M., & Ayyash, M. (2015). Internet of things: A survey on enabling technologies, protocols, and applications. *IEEE Communications Surveys & Tutorials*, 17(4), 2347–2376.

- 9. Jain, S., et al. (2012, February). A 280mV-to-1.2V wide-operating-range IA-32 processor in 32nm CMOS. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 66–68.
- 10. Huang, M., Lu, Y., Seng-Pan, U., & Martins, R. P. (2016). Limit cycle oscillation reduction for digital low dropout regulators. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 63(9), 903–907.
- 11. Lin, Y. H., Zheng, K. L., & Chen, K. H. (2008). Smooth pole tracking technique by power MOSFET array in low-dropout regulators. *IEEE Transactions on Power Electronics*, 23(5), 2421–2427.
- 12. Kwok, K. C., & Mok, P. K. T. (2003, May). Pole-zero tracking frequency compensation for low dropout regulator. In *Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS)*, pp. 379–382.
- 13. Muthukaruppan, R., et al. (2017, September). A digitally controlled linear regulator for per-core wide-range DVFS of atom cores in 14nm tri-gate CMOS featuring non-linear control, adaptive gain and code roaming. In *Proceedings of 43rd IEEE European Solid State Circuits Conference (ESSCIRC)*, pp. 275–278.
- 14. Mahajan, T., et al. (2017, April). Digitally controlled voltage regulator using oscillator-based ADC with fast-transient-response and wide dropout range in 14nm CMOS. In *Proceedings of IEEE Custom Integrated Circuits Conference (CICC)*, pp. 1–4.
- 15. Meinerzhagen, P., et al. (2018, February). An energy-efficient graphics processor featuring fine-grain DVFS with integrated voltage regulators, execution-unit turbo, and retentive sleep in 14nm tri-gate CMOS. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 38–39.
- 16. Huang, M., Lu, Y., & Martins, R. P. (2020). An analog-proportional digital-integral multiloop digital LDO with PSR improvement and LCO reduction. *IEEE Journal of Solid-State Circuits*, 55(6), 1637–1650.
- 17. Salem, L. G., Warchall, J., & Mercier, P. P. (2017, February). A 100nA-to-2mA successive-approximation digital LDO with PD compensation and sub-LSB duty control achieving a 15.1ns response time at 0.5V. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 340–342.
- 18. Jung, D., et al. (2021, February). A distributed digital LDO with time-multiplexing calibration loop achieving 40A/mm<sup>2</sup> current density and 1mA-to-6.4A ultra-wide load range in 5nm FinFET CMOS. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 414–415.
- 19. Huang, M., Lu, Y., & Martins, R. P. (2020). A comparative study of digital low dropout regulators. *Journal of Semiconductors*, 41(11), 111405.
- 20. Okuma, Y., et al. (2010, September). 0.5-V input digital LDO with 98.7% current efficiency and 2.7-µA quiescent current in 65nm CMOS. In *Proceedings of IEEE Custom Integrated Circuits Conference (CICC)*, pp. 98–101.
- 21. Bang, S., et al. (2020, February). A fully synthesizable distributed and scalable all-digital LDO in 10nm CMOS. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 380–382.
- 22. Yuan, Z., Fan, S., et al. (2020). A 100 MHz, 0.8-to-1.1V, 170mA digital LDO with 8-cycles mean settling time and 9-bit regulating resolution in 180-nm CMOS. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 67(9), 1664–1668.
- 23. Sun, X., Boora, A., et al. (2019, February). A 0.6-to-1.1V computationally regulated digital LDO with 2.79-cycle mean settling time and autonomous runtime gain tracking in 65nm CMOS. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 230–231.
- 24. Kundu, S., Liu, M., Wen, S.-J., Wong, R., & Kim, C. H. (2019). A fully integrated digital LDO with built-in adaptive sampling and active voltage positioning using a beat-frequency quantizer. *IEEE Journal of Solid-State Circuits*, 54(1), 109–120.

- 25. Perez, M. E., et al. (2020). Distributed network of LDO microregulators providing submicrosecond DVFS and IR drop compensation for a 24-core microprocessor in 14-nm SOI CMOS. *IEEE Journal of Solid-State Circuits*, 55(3), 731–743.
- 26. Mao, X., Lu, Y., & Martins, R. P. (2022). A scalable high-current high-accuracy dual-loop four-phase switching LDO for microprocessors. *IEEE Journal of Solid-State Circuits*, 57(6), 1841–1853.
- 27. Hazucha, P., et al. (2005). Area-efficient linear regulator with ultra-fast load regulation. *IEEE Journal of Solid-State Circuits*, 40(4), 933–940.
- 28. Magod, R., Bakkaloglu, B., & Manandhar, S. (2018). A 1.24 μA quiescent current NMOS low dropout regulator with integrated low-power oscillator-driven charge-pump and switched capacitor pole tracking compensation. *IEEE Journal of Solid-State Circuits*, 53(8), 2356–2367.
- 29. Zhao, X., Zhang, Q. S., et al. (2022). A high-efficiency fast-transient LDO with low-impedance transient-current enhanced buffer. *IEEE Transactions on Power Electronics*, 37(8), 8976–8987.
- 30. Kim, S. J., Kim, D., Pu, Y., Shi, C., & Seok, M. (2019, June). A 0.5-1V input event-driven multiple digital low-dropout-regulator system for supporting a large digital load. In *IEEE Symposium on VLSI Circuits, Digest of Technical Papers*, pp. C128–C129.
- 31. Kim, D., Kim, S., Ham, H., Kim, J., & Seok, M. (2018, June). 0.5V-V<sub>IN</sub>, 165-mA/mm<sup>2</sup> fully-integrated digital LDO based on event-driven self-triggering control. In *IEEE Symposium on VLSI Circuits, Digest of Technical Papers*, pp. 109–110.
- 32. Oh, J., Park, J. E., et al. (2020, February). A 480mA output-capacitor-free synthesizable digital LDO using CMP-triggered oscillator and droop detector with 99.99% current efficiency, 1.3ns response time and 9.8A/mm<sup>2</sup> current density. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 382–384.
- 33. Huang, M., Lu, Y., & Martins, R. P. (2021). Review of analog-assisted-digital and digital-assisted-analog low dropout regulators. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 68(1), 24–29.
- 34. Huang, M., Lu, Y., Seng-Pan, U., & Martins, R. P. (2017, February). An output-capacitor free analog-assisted digital low-dropout regulator with tri-loop control. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 342–343.
- 35. Ma, X., Lu, Y., Li, Q., Ki, W.-H., & Martins, R. P. (2020). An NMOS digital LDO with NAND-based analog-assisted loop in 28-nm CMOS. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 67(11), 4041–4052.
- 36. Lu, Y., Yang, F., Chen, F., & Mok, P. K. T. (2018, February). A 500mA analog-assisted digital-LDO-based on-chip distributed power delivery grid with cooperative regulation and IR-drop reduction in 65nm CMOS. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 310–312.
- 37. Huang, M., Lu, Y., Seng-Pan, U., & Martins, R. P. (2018). An analog-assisted tri-loop digital low-dropout regulator. *IEEE Journal of Solid-State Circuits*, 53(1), 20–33.
- 38. Huang, M., Lu, Y., & Martins, R. P. (2020). Partial analogue-assisted digital low dropout regulator with transient body-drive and 2.5× FOM improvement. *Electronics Letters*, 54(5), 282–283.
- 39. Mao, X., Lu, Y., & Martins, R. P. (2022). A 1.2-A calibration-free hybrid LDO with in-loop quantization and auxiliary constant current control achieving high accuracy and fast DVS. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 69(11), 4443–4452.
- 40. Toprak-Deniz, Z., et al. (2014, February). Distributed system of digitally controlled microregulators enabling per-core DVFS for the POWER8 microprocessor. In *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, pp. 98–99.
- 41. Kudva, S. S., et al. (2018, April). A switching linear regulator based on a fast-self-clock comparator with very low probability of meta-stability and a parallel analog ripple control module. In *Proceedings of IEEE Custom Integrated Circuits Conference (CICC)*, pp. 1–4.

# Index

#### A

Analog-to-digital converter (ADC), 131–177, 182, 183, 185, 187, 188, 193, 194, 198–209, 211–216, 289, 291, 300

#### C

Calibration, 42, 43, 94, 119, 131, 136, 140, 141, 147, 148, 150, 152, 154, 155, 157, 161–164, 166–169, 172, 174, 183, 184, 189–191, 193, 196, 199, 201, 203, 205–208, 215, 216, 289, 290, 298

CMOS, 4, 5, 22, 43, 44, 48, 52–54, 61, 64, 71, 77–80, 83, 92, 99, 101, 107, 112, 117–120, 123, 124, 141, 145, 152, 163, 168, 174, 188, 196, 206, 213, 228, 232, 234, 244, 246, 247, 254, 262, 269, 272, 276, 281, 283, 302

Continuous-time DSM (CTDSM), 182–190

Crystal oscillator (XO), 83, 92–106, 123

#### D

DC-DC converter, 230, 234, 267, 269

Delta-sigma modulator (DSM), 182–190, 194, 196

Digital LDO, 283, 296, 297, 305

Dynamic voltage and frequency scaling (DVFS), 270, 282

Dynamic voltage scaling (DVS), 269

#### E

Energy harvesting, 92, 98, 107, 123, 221–248

Energy harvesting interface, 222, 223, 248

#### F

Fast startup, 92, 98

Fine-grained, 234, 237, 239, 281, 283

Front-ends, 3, 45, 132, 166, 168, 175, 185, 186

#### H

High efficiency, 255, 257, 270, 274, 282

#### I

Internet of Things (IoT), xi, 91, 92, 99, 104, 106, 107, 119, 121, 123, 131, 221, 223, 234, 248, 283

Inverse class-F, 52–66

#### J

Jitter, 79, 80, 84, 108, 120–122, 134, 136, 142, 191

#### L

Low-dropout regulator (LDO), 255, 267, 282–286, 288–300, 304, 305

Low supply voltage, 120, 183, 283

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Paulo da Silva Martins, P.-I. Mak (eds.), *Analog and Mixed-Signal Circuits in Nanoscale CMOS*, Analog Circuits and Signal Processing, https://doi.org/10.1007/978-3-031-22231-3

#### M

Microprocessor, 266, 269, 281, 283

Millimeter wave (mm-Wave), 74, 78–80, 83, 84

Miniaturized IoT systems, 223, 248

Mode-switching, 23, 66–73

Multiphase, 255, 264, 268

#### N

Noise-shaping (NS), 182, 185–189, 198–205, 207, 208, 216

N-path filter, 4, 7–11, 13, 19, 20, 28–31, 34, 35

#### O

Oversampling, 182, 183, 196, 198

#### P

Phase noise, 5, 21, 22, 27, 42, 43, 51, 52, 54–57, 59–61, 64–66, 69, 71–73, 79, 93, 122
Phase-locked loop (PLL), 51, 78–87, 92
Pipeline, 132–134, 154, 156–160, 162–166, 182, 198–205, 208, 209, 213, 216
Pipeline-SAR ADC, 133–144
Power converter, 223, 224, 234, 248, 253–276
Power management, 265, 276

#### R

Radio frequency (RF), 3–8, 12, 13, 17–20, 22, 24, 27–34, 36, 37, 39, 40, 51, 52, 73, 80, 99, 169, 283, 304

Receiver (RX), 3–7, 17, 19–23, 25–28, 33, 35, 40, 106, 120, 181

Reference spur, 51

Relaxation oscillator (RxO), 92, 100, 106–123

#### S

Subsampling, 51

Successive approximation register (SAR), 132–134, 138–141, 144–147, 149, 150, 152–154, 167, 168, 177, 182, 183, 185, 187, 188, 191–193, 197, 198, 200, 203–205, 208, 209, 214, 216, 257

Surface acoustic wave (SAW) filter, 5, 28

Switched-capacitor (SC), 3–9, 12, 14–16, 19, 20, 22, 24, 28, 30–34, 38, 54, 55, 66, 75, 77, 114, 185, 188, 195, 223, 233–248, 253–276

Switching LDO, 283, 286, 288, 290, 291, 300–304

#### T

Time-domain converter (TDC), 132, 144–148, 150–154, 167–172, 289, 291, 293, 298–300

Transceiver (TRX), 3, 24, 51, 91, 92, 190

Transmitter (TX), 3–10, 12–15, 17–25, 28, 29, 33–35, 38, 40, 41, 43, 44, 47, 48

#### U

Ultra-low-power, 91, 92, 123

Ultra-low-voltage (ULV), 92–94, 104, 107–115, 120, 121, 123, 124

#### V

Voltage-controlled oscillator (VCO), 59–84, 264–266, 268, 271–274