
1 Introduction

In recent years, monolithic and highly integrated DC-DC power converters have been in great demand for various low-power devices, such as implantable, wearable, and portable devices [1]. Integrating a DC-DC power converter fully on-chip is always favorable, as it potentially results in a simpler system design and a smaller PCB footprint, and it also lowers the cost by eliminating or integrating the most costly power converter component: the power inductor.

In typical system-on-chip (SoC) designs (Fig. 1), there are many different logic and functional blocks that need multiple individual voltage domains, enabled by multiple power converters and voltage regulators [2]. Power converters with zero external components can significantly reduce the number of I/O pins of the SoC. They can also deliver much better transient performance, because they can be placed closer to the point of load.

Fig. 1
An SoC power delivery network with a battery and an off-chip DC-DC converter on the input side, a capacitor, and inductive and capacitive elements with the drain voltage, along with its specifications.

A typical SoC power delivery network

Among linear voltage regulators, switching-mode power converters, and switched-capacitor (SC) power converters, SC converters are well suited to full integration because they use only capacitors, which are easily built on-chip in a nanometer process [3]. Although the efficiency of an SC converter drops linearly when the output voltage deviates from its ideal output voltage, we can still obtain good efficiencies with multiple voltage conversion ratios (VCRs). Therefore, SC converters have attracted great interest from both industry and academia and are a promising alternative for next-generation SoC power delivery. Several practical products have emerged from the techniques presented in prior research works.

However, designing a high-performance on-chip SC power converter can be very challenging [3, 4]. First, the power efficiency of an SC converter with only a few VCRs is not high over wide input and output voltage ranges. Second, an SC converter has a finite output impedance, and its maximum power density is a function of the on-chip capacitance density and the switching frequency; thus, an increase in power density will always sacrifice power efficiency. Hence, in a standard CMOS process whose capacitance density is relatively low, there is a fundamental trade-off between power density and efficiency, and optimizing this trade-off can be challenging. Third, the output voltage ripple caused by hard-charging currents affects the performance of noise-sensitive devices, and lowering the ripple requires a higher switching frequency and a larger capacitance. Therefore, minimizing the voltage ripple with minimum system resources and cost is also a demanding problem.

To tackle the abovementioned design challenges, many circuit- and system-level techniques have been proposed. Researchers and circuit designers have tried to optimize SC designs for better trade-offs among power density, conversion efficiency, system cost, and design complexity. In this chapter, we provide a systematic summary and design guidelines of recent SC converter design techniques. We also review the advantages and drawbacks of these techniques in terms of topology generation, loss analysis and optimization, voltage ripple reduction, and closed-loop regulation.

The remainder of this chapter is organized as follows. Section 2 discusses topology generation and selection, as well as topology-level efficiency considerations. Section 3 analyzes the power conversion losses of SC converters and introduces techniques that reduce the gate-drive switching loss and the parasitic loss. Section 4 compares the centralized and distributive clock generation methods for multiphase SC converters. Then, we describe two design examples: an SC converter ring and a multi-output SC converter in Sects. 4 and 5, respectively. Finally, Sect. 6 draws the conclusions.

2 Topology Generation

2.1 Efficiency and Power Density Trade-Off

Topology generation or selection is the first step in most designs. With the input and output voltage ranges specified, we can determine the required VCRs first. For an SC converter, the theoretical efficiency is

$$ \eta =\frac{V_{\textrm{OUT}}}{M\times {V}_{\textrm{IN}}}, $$
(1)

where M is the ideal VCR of the selected topology. With only one VCR, the power conversion efficiency decreases monotonically when the output voltage drops from the ideally converted voltage (M × VIN). For applications that require a wide input or output voltage range, it is important to reconfigure the power conversion cells for several VCRs to cater for a changing input voltage. The SC converter can then operate at a proper VCR that delivers the maximum efficiency.

Figure 2 shows the theoretical efficiency of an SC converter and a low-dropout regulator (LDO) with respect to the output voltage VO [5]. For example, if VO needs to be 1/2 VIN, the efficiency is η = 50% when using an LDO. With the SC converter configured as M = 2/3, we get η = 0.5/0.667 = 75%; with the SC converter reconfigured as M = 1/2, the ideal efficiency is 100%. With more VCRs, the converter has a higher average efficiency across the whole VO and VIN ranges. However, more VCRs need more flying capacitors and power switches; thus, combining multiple topologies in one power stage increases the circuit complexity and the equivalent output resistance, reducing the output power capability and power density. Clearly, there is a trade-off between power efficiency and power density, and the target is to obtain an optimum high-efficiency range with a reasonable design complexity.
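The worked numbers above follow directly from Eq. (1). The short Python sketch below reproduces them; the function names and the rule of picking the smallest VCR that still reaches the target are illustrative assumptions, not taken from the cited works.

```python
from fractions import Fraction


def ideal_efficiency(v_out_ratio, vcr):
    """Eq. (1): eta = VOUT / (M * VIN), with v_out_ratio = VOUT / VIN."""
    if v_out_ratio > vcr:
        raise ValueError("VOUT cannot exceed the ideal output M * VIN")
    return v_out_ratio / vcr


def best_vcr(v_out_ratio, vcrs):
    """Assumed selection rule: the smallest VCR that still reaches VOUT."""
    usable = [m for m in vcrs if m >= v_out_ratio]
    if not usable:
        raise ValueError("no available VCR reaches the requested output")
    return min(usable)


ratio = Fraction(1, 2)  # VO = 1/2 VIN, as in the example above
vcrs = [Fraction(1), Fraction(2, 3), Fraction(1, 2), Fraction(1, 3)]

print(float(ideal_efficiency(ratio, Fraction(1))))             # 0.5, same as an LDO
print(float(ideal_efficiency(ratio, Fraction(2, 3))))          # 0.75
print(float(ideal_efficiency(ratio, best_vcr(ratio, vcrs))))   # 1.0
```

Using exact fractions avoids rounding 2/3 to 0.667, so the 75% figure comes out exactly.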

Fig. 2
A graph of efficiency versus voltage conversion ratio. It plots an increasing trend for the SC converter with 4 and 6 VCRs and for the ideal low-dropout regulator, along with a sawtooth wave showing the efficiency improvement with more VCRs.

Theoretical efficiency comparison of an SC converter with four VCRs and six VCRs versus an ideal low-dropout regulator [5]

2.2 Two-Phase Limitation and Three-Phase Operation

Most switched-capacitor converters use only two operational clock phases, with the number of VCRs limited by the number of flying capacitors [6]. For example, with two flying capacitors, the realizable step-down VCRs are 1×, 2/3×, 1/2×, and 1/3× only. If more VCRs such as 3/4× and 1/4× are necessary, the converter requires one more flying capacitor.

An alternative method to realize more VCRs while keeping the number of flying capacitors unchanged is to use a multiple clock phase operation [7,8,9,10]. A three-phase operation [7] and a two- or three-phase operation [8], used in step-up SC converters, boost the output voltage to 6×/7× of the input voltage for LED/LCD driver applications. Similarly, when applied to a step-down SC converter in [9], multiphase operation generates a very low output voltage (1/4×) for wireless biomedical implants. Experimental results show an efficiency of 70% at VO = 0.5 V. Figure 3 shows the three-phase topologies (3/4× and 1/4×) that use only two flying capacitors and achieve up to 20% efficiency improvement together with a higher average efficiency over wide VO and VIN ranges [10].

Fig. 3
Operation states of two topologies with different modes and configurations: 1/4×, 1/1×, 1/2×, 3/4×, 1/3×, and 2/3×.

Operation states of two topologies that use three-phase configuration realizing (a) 1/4× mode and (b) 3/4× mode [10]

In summary, multiple-phase operation uses fewer capacitors and switches; furthermore, it realizes a better trade-off between power efficiency and density, covering wider input and output voltage ranges.

2.3 Review of Other Topologies

To cover a wider voltage range with high efficiencies, some reconfigurable SC converters have a large number of VCRs, for example, the successive approximation (SAR)-based SC converter that has 117 VCRs [11]. By reconfiguring cascaded power cells that have M = 1/2, each power cell can be the top or the bottom voltage domain for the next stage, such that the output voltage has seven-bit resolution [12]. Recursive SC converters further improve this topology [13, 14]. In [15], a gear-train topology used five off-chip capacitors to construct four stacked power stages that realize 24 VCRs. Similar works appear in [16], and algebraic series-parallel topologies generate more VCRs to cover wide voltage ranges [16, 17]. However, these converters share a common drawback: the output impedance is high because too many power switches are stacked in series, which limits the load current capability and the power density. Nevertheless, they remain useful in low-power applications with stringent requirements on system integration.

As mentioned above, an SoC requires multiple voltage domains for individual functional blocks. Single-input multiple-output SC converters, in which capacitors and transistors can be shared to save silicon area and improve the overall power efficiency, serve this purpose well. Reference [18] proposed a dynamic power cell allocation scheme for multicore application processors. Allocating power cells dynamically according to load demands improves the efficiency by 4.8% compared with the case without it. The peak efficiency was 83.3% and the maximum load was 100 mA, while the cross regulation was minimized. Reference [19] presented a specific application that requires two outputs with different loads and used an on-demand strategy to compensate for the current shortage, thus saving on-chip capacitor area. In [20], the 2× and 3× VCRs shared one transistor, which reduced the silicon area and improved the efficiency.

3 Efficiency Optimization

3.1 Unified Models for Losses in the SC Converter

When designing fully integrated switched-capacitor (SC) converters, optimizing efficiency is one of the most important procedures to ensure the maximum power density under peak efficiency. However, the loss contribution of SC converters may arise from multiple factors and may vary with different topologies, leading to complexity in analysis and optimization. In this subsection, we present a methodology to predict the overall efficiency and find the optimized peak efficiency.

Switched-capacitor converters can ideally reach 100% efficiency under a close-to-no-load condition. The power efficiency starts to drop once charge transfer on the flying capacitors takes place, due to the well-known charge redistribution loss. In general, the output voltage drops proportionally with the load current, which is equivalent to a resistor (ROUT) at the output node. Equation (1) above then expresses the theoretical efficiency for a given VCR: the closer the real output voltage (VOUT) is to the ideal output voltage (MVIN), the higher the efficiency. The charge redistribution loss, also known as hard-charging loss, is the integrated conduction energy loss from the resistive loss on the switches. M. Seeman proposed a unified model to calculate ROUT (Fig. 4) [21, 22].

Fig. 4
A circuit diagram of the transformer-based SC (switched-capacitor) converter model with an input voltage source, a transformer, two resistors, and one capacitor at the output side.

Transformer-based SC converter model

We can model an SC converter as an ideal DC voltage source with an ideal transformer representing the voltage conversion, in series with a finite output resistance ROUT [21, 22]. ROUT is composed of RSSL (the slow switching limit resistance, related to the charge redistribution loss) and RFSL (the fast switching limit resistance, due to the finite conductance of the switches), expressed by

$$ {R}_{\textrm{SSL}}={K}_C\frac{1}{C_F{f}_{\textrm{SW}}} $$
(2)
$$ {R}_{\textrm{FSL}}={K}_S{R}_{\textrm{ON}} $$
(3)

where KC and KS are topological factors determined by the charging scenario, CF is the capacitance of the flying capacitor, fSW is the switching frequency, and RON is the on-resistance of the switches.

The overall output resistance becomes

$$ {R}_O\approx \sqrt{{R_{\textrm{SSL}}}^2+{R_{\textrm{FSL}}}^2}. $$
(4)

This model assumes that the output voltage is an ideal DC voltage with negligible voltage ripple. Reference [23] pointed out that Eq. (4) may be inaccurate when the output ripple voltage is very large and presented an improved solution. Deviation from Eq. (4) may also occur if RSSL is close to RFSL. Otherwise, this model is accurate enough for estimating RO and predicting the output voltage VO; thus, it has become a widely used practical model [24, 25].
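Equations (2) to (4) translate directly into a few lines of code. The sketch below is a minimal illustration in Python; the topological factors, capacitance, frequency, and on-resistance in the usage lines are placeholder values chosen for illustration and are not the entries of Table 1. The `v_out` helper assumes the transformer model of Fig. 4, in which the load current drops across RO.

```python
import math


def r_ssl(k_c, c_fly, f_sw):
    """Eq. (2): slow switching limit resistance, K_C / (C_F * f_SW)."""
    return k_c / (c_fly * f_sw)


def r_fsl(k_s, r_on):
    """Eq. (3): fast switching limit resistance, K_S * R_ON."""
    return k_s * r_on


def r_out(k_c, k_s, c_fly, f_sw, r_on):
    """Eq. (4): R_O ~ sqrt(R_SSL^2 + R_FSL^2)."""
    return math.hypot(r_ssl(k_c, c_fly, f_sw), r_fsl(k_s, r_on))


def v_out(m, v_in, i_load, r_o):
    """Transformer model: ideal output M * VIN minus the drop across R_O."""
    return m * v_in - i_load * r_o


# Placeholder values (assumptions for illustration only).
ro = r_out(k_c=0.25, k_s=4.0, c_fly=100e-12, f_sw=20e6, r_on=20.0)
print(ro)                                            # output resistance in ohms
print(v_out(m=0.5, v_in=1.8, i_load=1e-3, r_o=ro))   # loaded output voltage in volts
```

Because Eq. (4) is a root-sum-square, the larger of RSSL and RFSL dominates RO; raising fSW beyond the point where RSSL falls below RFSL gives little further benefit.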

Here, we present examples to calculate RO for three topologies: the 2×, 3/2×, and 4/3× topologies that use a two-phase clock. We design all RON as equal, as the switches conduct the same amount of charge. Table 1 summarizes KC and KS for the three VCRs. The major loss is due to the equivalent IR drop across ROUT, and from Eqs. (2) and (3), reducing this loss requires a high fSW and a large transistor width WSW.

Table 1 Summary of the equivalent output impedances of three VCRs

3.2 Switching and Parasitic Losses

In addition to conduction losses, the gate-drive switching loss PSW and the parasitic loss PPARA are also significant, especially for fully integrated SC converters. They actually determine the peak efficiency of a regulated SC converter: the output resistance RO is adjusted to obtain a regulated VO under different loads, so at a given VO the theoretical efficiency is identical. Le et al. [26] analyzed these two losses in addition to Seeman's model. We can calculate the gate-drive switching loss PSW from the switching frequency fSW, the gate capacitance CGATE, and the driving voltage VSW. The parasitic loss is more complex and may vary considerably across topologies [27, 28].

In 2020, Jiang et al. [29] presented a unified method to simplify the parasitic loss calculation by observing the voltage swing of individual parasitic capacitors. We use 1/3× mode SC converters as examples to analyze the parasitic loss reduction. We assume that the additional charge introduced by the parasitic capacitors will not affect the flying capacitor voltages, as the parasitic capacitors are much smaller (usually below 5%) than the main capacitors. We also suppose a no-load condition such that the capacitor voltages would not change among the operational phases.

Let us consider the 1/3× SC converter from Fig. 5, with the positive- and negative-plate parasitic capacitors C1p+, C1p−, C2p+, and C2p−, where we labeled their voltage swings in both phases. For the summation-mode converter, when Ф1 changes to Ф2, C1p+ charges from VO to VIN with a charge Q1P+. The energy sourced from VIN is

$$ {E}_{1P+,\textrm{CH}}={V}_{\textrm{IN}}{Q}_{1P+}={V}_{\textrm{IN}}\left({V}_{\textrm{IN}}-{V}_O\right){C}_{1P+}=\frac{2}{3}{V}_{\textrm{IN}}^2{C}_{1P+} $$
(5)
Fig. 5
Four circuit diagrams with parasitic capacitors on the top and bottom plates in the 1/3× summation mode and subtraction mode.

Parasitic capacitors on the top and bottom plates of the 1/3× mode

When Ф2 changes to Ф1, C1p+ discharges from VIN to VO, and the energy returned to VO becomes

$$ {E}_{1P+,\textrm{DIS}}={V}_{\textrm{O}}{Q}_{1P+}={V}_{\textrm{O}}\left({V}_{\textrm{IN}}-{V}_O\right){C}_{1P+}=\frac{2}{9}{V}_{\textrm{IN}}^2{C}_{1P+} $$
(6)

The energy loss due to C1P+ is the difference of Eqs. (5) and (6), and we can write it as

$$ {E}_{1P+,\textrm{LOSS}}={E}_{1P+,\textrm{CH}}-{E}_{1P+,\textrm{DIS}}=\frac{4}{9}{V}_{\textrm{IN}}^2{C}_{1P+} $$
(7)

C1p− charges from 0 to 2/3 VIN in Ф2:

$$ {E}_{1P-,\textrm{CH}}=\left({V}_{\textrm{IN}}-{V}_O\right)\frac{2}{3}{V}_{\textrm{IN}}{C}_{1P-}=\frac{4}{9}{V}_{\textrm{IN}}^2{C}_{1P-} $$
(8)

In Ф1, C1p− dumps all of its charge back to ground, so the loss is

$$ {E}_{1P-,\textrm{LOSS}}={E}_{1P-,\textrm{CH}}=\frac{4}{9}{V}_{\textrm{IN}}^2{C}_{1P-} $$
(9)

In general, consider a parasitic capacitor CP that charges and discharges between two voltages VL and VH. In the charging phase, the energy sourced from the system is

$$ {E}_{P,\textrm{CH}}={V}_{\textrm{H}}\left({V}_H-{V}_L\right){C}_P $$
(10)

In the discharging phase, the energy returned to the system is

$$ {E}_{P,\textrm{DIS}}={V}_L\left({V}_H-{V}_L\right){C}_P. $$
(11)

Hence, the energy of the parasitic loss is the following:

$$ {E}_{P,\textrm{LOSS}}={E}_{\textrm{P},\textrm{CH}}-{E}_{P,\textrm{DIS}}={\left({V}_H-{V}_L\right)}^2{C}_P=\Delta {V}^2{C}_P $$
(12)

The dominant factor of the parasitic loss is the voltage swing ΔV that is (VH − VL), where the parasitic capacitor CP charges and discharges between these two voltages VL and VH. Then, we derive the parasitic loss of one parasitic capacitor CP as

$$ {P}_{\textrm{PARA}, CP}={C}_P{\left({V}_H-{V}_L\right)}^2{f}_{\textrm{SW}}=\Delta {V}^2{C}_P{f}_{\textrm{SW}} $$
(13)

By using Eq. (13), we can calculate the parasitic loss PPARA of all parasitic capacitors Cip+ and Cip− (i = 1…N) by finding the voltage swings of the positive and negative plates.
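As a quick check of Eqs. (10) to (13), the Python sketch below recomputes the two 1/3× results of Eqs. (7) and (9) from the voltage swings alone, with VIN and the capacitances normalized to 1. The function names and the list-of-swings interface are illustrative assumptions.

```python
from fractions import Fraction


def charge_energy(c_p, v_low, v_high):
    """Eq. (10): energy sourced while C_P charges from V_L to V_H."""
    return v_high * (v_high - v_low) * c_p


def discharge_energy(c_p, v_low, v_high):
    """Eq. (11): energy returned while C_P discharges from V_H to V_L."""
    return v_low * (v_high - v_low) * c_p


def energy_loss(c_p, v_low, v_high):
    """Eq. (12): loss per cycle, (V_H - V_L)^2 * C_P."""
    return charge_energy(c_p, v_low, v_high) - discharge_energy(c_p, v_low, v_high)


def parasitic_power(swings, f_sw):
    """Eq. (13) summed over (C_P, V_L, V_H) tuples for all parasitic capacitors."""
    return f_sw * sum(energy_loss(c, lo, hi) for c, lo, hi in swings)


v_in = Fraction(1)   # normalized input voltage
v_o = v_in / 3       # no-load output of the 1/3x mode

# C1p+ swings between VO and VIN; C1p- swings between 0 and 2/3 VIN.
print(energy_loss(1, v_o, v_in))                  # 4/9, matches Eq. (7)
print(energy_loss(1, 0, Fraction(2, 3) * v_in))   # 4/9, matches Eq. (9)
```

Both plates swing by 2/3 VIN, which is why the two losses are equal even though their charging and discharging paths differ.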

3.3 Gate Switching Loss and Parasitic Loss Reduction

The gate-drive switching loss can be reduced by using low-voltage (thin-oxide) transistors [25]. The method places several thin-oxide transistors in cascode so that together they withstand a higher voltage. Because the feature size of a thin-oxide transistor is smaller than that of a thick-oxide transistor, its gate parasitic capacitance is much lower.

Figure 6 shows the operating principle of the NMOS stacking transistors. The turn-on resistance RON of a MOS transistor is

$$ {R}_{\textrm{ON}}=\frac{L_{\textrm{MIN}}}{K{V}_{\textrm{OD}}{W}_{\textrm{SW}}} $$
(14)

where K is a process-related parameter, VOD is the overdrive voltage of the transistor, and LMIN is the minimum channel length. We can implement a power switch using one thick-oxide high-voltage transistor or two stacked thin-oxide low-voltage transistors. If the two implementations have the same total RON, then each thin-oxide transistor must have half the on-resistance of the thick-oxide transistor:

$$ {R}_{\textrm{ON}\_L}=\frac{1}{2}{R}_{\textrm{ON}\_H} $$
(15)
Fig. 6
Three circuit diagrams of NMOS stacking transistors, showing two MOSFETs with ck and X, Y inputs in the stacked NMOS, along with its on and off states.

Operating principle of the NMOS (N-type metal-oxide semiconductor) stacking transistors

Considering Eqs. (14) and (15) together, the size ratio of the thick-oxide transistor to thin-oxide transistor is

$$ \frac{W_{\textrm{SW}\_H}}{W_{\textrm{SW}\_L}}=\frac{L_H}{2{L}_L}\frac{K_L{V}_{\textrm{OD}\_L}}{K_H{V}_{\textrm{OD}\_H}}. $$
(16)

Now, the switching loss becomes

$$ {P}_{\textrm{SW}}={V}_{\textrm{SW}}^2{f}_{\textrm{SW}}{C}_{\textrm{GATE}}{W}_{\textrm{SW}}. $$
(17)

Then, the ratio of their switching losses is

$$ \frac{P_{\textrm{SW}\_H}}{P_{\textrm{SW}\_L}}=\frac{V_{\textrm{SW}\_H}^2{C}_{\textrm{GATE}\_H}{W}_{\textrm{SW}\_H}}{2{V}_{\textrm{SW}\_L}^2{C}_{\textrm{GATE}\_L}{W}_{\textrm{SW}\_L}}. $$
(18)

In a typical 0.18 μm CMOS process, we have 1.8 V thin-oxide transistors and 5 V thick-oxide transistors. Then, the lengths are LH = 0.5 μm for NMOS, LH = 0.7 μm for PMOS (p-channel metal-oxide semiconductor), and LL = 0.18 μm. The overdrive voltages are VOD_H = 3 V and VOD_L = 1.2 V. We extract other parameters from the process design kit and list them in Table 2. The results show that using low-voltage transistors, we can obtain a 2.615× and 1.778× switching loss reduction for the NMOS and PMOS switches, respectively. This helps the converter to achieve 82% peak efficiency in 0.18 μm CMOS. In [30], six thin-oxide transistors used in a cascode arrangement allow the SC converter in 65 nm CMOS to switch faster.

Table 2 Switching loss calculations
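Equations (16) and (18) can be evaluated with a short Python sketch. The channel lengths and overdrive voltages below are the NMOS values quoted in the text, and the 5 V and 1.8 V drive voltages are assumed to equal the transistor ratings. The process parameter ratio KL/KH and the gate capacitance ratio come from the process design kit and are not reproduced here, so both are set to 1.0 as explicit placeholders. The printed ratio is therefore illustrative and does not reproduce the 2.615× figure.

```python
def width_ratio(l_h, l_l, k_l_over_k_h, vod_l, vod_h):
    """Eq. (16): W_SW_H / W_SW_L for equal total on-resistance."""
    return (l_h / (2.0 * l_l)) * k_l_over_k_h * (vod_l / vod_h)


def switching_loss_ratio(v_sw_h, v_sw_l, cgate_h_over_l, w_h_over_l):
    """Eq. (18): P_SW_H / P_SW_L; the factor 2 counts the two stacked devices."""
    return (v_sw_h ** 2 * cgate_h_over_l * w_h_over_l) / (2.0 * v_sw_l ** 2)


# NMOS values from the text; the two 1.0 ratios are placeholders, not PDK data.
w_ratio = width_ratio(l_h=0.5, l_l=0.18, k_l_over_k_h=1.0, vod_l=1.2, vod_h=3.0)
p_ratio = switching_loss_ratio(v_sw_h=5.0, v_sw_l=1.8,
                               cgate_h_over_l=1.0, w_h_over_l=w_ratio)
print(w_ratio, p_ratio)
```

The sketch makes the main lever visible: the squared drive-voltage ratio (5/1.8)² dominates Eq. (18), so the thin-oxide stack wins even though it uses two gates per switch.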

It is even more necessary to use cascoded devices in high-voltage applications, since the QgRON product of the thin-oxide transistor is much smaller than that of high-voltage DMOS (double-diffused metal-oxide-semiconductor) transistors [31, 32]. In [32], two 3.3 V transistors cascoded in an 11/1× topology convert a high voltage (35–40 V) to 3.3 V with 94.7% peak efficiency. In [8], 3.3 V and 5 V transistors cascoded in a 6× step-up SC converter with a 15 V output voltage exhibit reduced gate switching loss.

Parasitic loss is also proportional to the switching frequency. It becomes significant in a fully integrated SC converter, especially when the flying capacitors are implemented with MOS capacitors. Multiple works [33,34,35,36,37,38,39] reported reduced parasitic losses by using low-parasitic ferroelectric capacitors [33], deep trench capacitors [34, 35], parasitic-loss recycling techniques [36, 37], and dynamic voltage biasing techniques [38, 39]. All these methods reduce the loss caused by parasitic capacitance and can increase the efficiency.

3.4 Efficiency Optimization

We can obtain the overall efficiency as

$$ \eta \left({f}_{\textrm{SW}},{W}_{\textrm{SW}}\right)=\frac{P_O}{P_O+{P}_{\textrm{LOSS}}} $$
(19)
$$ {P}_{\textrm{LOSS}}={P}_C+{P}_R+{P}_{SW}+{P}_{\textrm{PARA}} $$
(20)

The gate switching loss PSW and the parasitic loss PPARA increase with the switching frequency fSW and the transistor size WSW, while the charge redistribution loss PC decreases with fSW (Eq. (2)) and the conduction loss PR decreases with WSW (Eqs. (3) and (14)). Because these trends oppose each other, we can find the optimum efficiency point by sweeping fSW and WSW.
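The sweep can be sketched in Python as a plain grid search over Eqs. (19) and (20). This is a toy model under stated assumptions: the four loss terms are simply added for a fixed output power, RON is written as `r_unit / w_sw` (lumping LMIN/(K·VOD) from Eq. (14) into one constant), and every coefficient in `PARAMS` except the 600 μA load is a hypothetical placeholder rather than a value from [25].

```python
# Hypothetical coefficients (assumptions); only i_load follows the text.
PARAMS = {
    "i_load": 600e-6,   # load current in A
    "v_out": 1.0,       # regulated output voltage in V
    "k_c": 0.25,        # topological factor K_C
    "k_s": 4.0,         # topological factor K_S
    "c_fly": 100e-12,   # flying capacitance in F
    "r_unit": 500.0,    # on-resistance times width, in ohm*um
    "v_sw": 1.8,        # gate-drive voltage in V
    "c_gate": 2e-15,    # gate capacitance per unit width, in F/um
    "n_sw": 8,          # number of switches
    "dv": 1.0,          # parasitic voltage swing in V
    "c_par": 3e-12,     # total parasitic capacitance in F
}


def total_loss(f_sw, w_sw, p):
    """Eq. (20): P_LOSS = P_C + P_R + P_SW + P_PARA (toy additive model)."""
    i2 = p["i_load"] ** 2
    p_c = i2 * p["k_c"] / (p["c_fly"] * f_sw)                      # via Eq. (2)
    p_r = i2 * p["k_s"] * p["r_unit"] / w_sw                       # via Eqs. (3), (14)
    p_sw = p["v_sw"] ** 2 * f_sw * p["c_gate"] * w_sw * p["n_sw"]  # Eq. (17) per switch
    p_para = p["dv"] ** 2 * p["c_par"] * f_sw                      # Eq. (13)
    return p_c + p_r + p_sw + p_para


def efficiency(f_sw, w_sw, p):
    """Eq. (19): eta = P_O / (P_O + P_LOSS)."""
    p_o = p["v_out"] * p["i_load"]
    return p_o / (p_o + total_loss(f_sw, w_sw, p))


def sweep(p, f_values, w_values):
    """Return (eta, f_sw, w_sw) at the best grid point."""
    return max((efficiency(f, w, p), f, w) for f in f_values for w in w_values)


f_grid = [10e6 + 0.5e6 * i for i in range(41)]   # 10 MHz to 30 MHz
w_grid = [10.0 + 0.5 * j for j in range(61)]     # 10 um to 40 um
eta, f_opt, w_opt = sweep(PARAMS, f_grid, w_grid)
print(f"peak efficiency {eta:.3f} at {f_opt / 1e6:.1f} MHz, {w_opt:.1f} um")
```

A grid search is used here because it mirrors the sweep described in the text; with only two variables, it is cheap and makes the shape of the efficiency surface easy to plot.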

Figure 7 illustrates an example of efficiency curves with the optimum point at the maximum load condition (ILOAD = 600 μA) [25]. We conducted the efficiency calculation and simulation using MATLAB, and to obtain the peak efficiency of each VCR, we swept fSW from 10 MHz to 30 MHz and WSW from 10 μm to 40 μm. Figure 7 shows the results as three-dimensional surfaces. For the 4/3× mode, the peak efficiency is 82.5% when fSW is 11.3 MHz and WSW is 27.5 μm. For the 3/2× mode, the peak efficiency is 80.5% when fSW is 15.7 MHz and WSW is 33.94 μm. The optimal fSW for the 3/2× mode is higher than that of the 4/3× mode because the 3/2× mode uses fewer transistors; as the switching loss is lower, we can employ larger transistors. For the 2× mode, the peak efficiency is 80% when fSW = 19 MHz and WSW = 40 μm. This mode uses the smallest number of transistors; thus, its switching and parasitic losses are significantly lower than those of the other two modes, and both fSW and WSW can be larger. In conclusion, by using this model, we can obtain the optimized efficiency for a given topology.

Fig. 7
Three 3-D graphs of efficiency versus WSW and frequency. The graphs for VCRs of 2× and 3/2× plot a contour sheet ranging from 0.5500 to 0.8500, and the graph for 4/3× plots a contour sheet ranging from 0.7000 to 0.8500.

Simulated efficiency with respect to the switching frequency fSW and the power transistor width WSW

4 Clock Generation and Distribution: 123-Phase Converter Ring

4.1 General Concept of Multiphase Interleaving

We can consider output voltage ripple as power loss, because a larger ripple means that we should reserve a larger supply voltage for the load. To reduce the ripple, we can easily apply a multiphase interleaving scheme in fully integrated SC power converters [26, 40,41,42,43,44,45,46,47]. Figure 8 presents the concept and system diagram, where we implement multiphase interleaving by partitioning the SC power stage into multiple small cells, with these power cells driven by different clocks (ck1 to ckn). Adjacent clocks have a 360°/n phase shift and T/n delay where T is the switching clock period, such that the output voltage has a higher equivalent frequency; thus, we can reduce the output voltage ripple. An n-phase voltage-controlled oscillator (VCO) can easily generate multiphase interleaving clock signals. To effectively regulate VOUT, a frequency modulation scheme is favorable, as it saves unnecessary switching losses as well. After the error amplifier senses VOUT and generates the control signal VC, it will adjust the switching frequency according to the load condition. Besides reducing the output voltage ripple, we can also significantly reduce the input current (IIN) ripple as the discontinuous inrush input current of a single-phase converter would be evenly distributed among interleaving phases for a multiphase converter. Consequently, we can use smaller input and output capacitances. As such, more interleaving phases are beneficial and preferable in recent fully on-chip SC converter works [40,41,42,43,44,45,46,47]. However, distributing a large number of interleaving clock phases across a large converter chip area can be challenging.
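The clock relationships above can be sketched in a few lines of Python. The function names are illustrative, and the equivalent ripple frequency of n × fSW assumes ideal, evenly spaced phases with identical power cells.

```python
def interleaved_clocks(n_phases, f_sw):
    """Return (phase shift in degrees, delay in seconds) for ck1..ckn.

    Adjacent clocks differ by 360/n degrees, which is a delay of T/n.
    """
    period = 1.0 / f_sw
    return [(k * 360.0 / n_phases, k * period / n_phases)
            for k in range(n_phases)]


def equivalent_ripple_frequency(n_phases, f_sw):
    """Assumed ideal case: one charge-transfer event every T/n seconds."""
    return n_phases * f_sw


for phase_deg, delay_s in interleaved_clocks(n_phases=4, f_sw=1e6):
    print(f"{phase_deg:6.1f} deg  {delay_s * 1e9:7.1f} ns")
print(equivalent_ripple_frequency(4, 1e6))   # 4000000.0
```

With frequency modulation, fSW changes with the load, so the delays scale with the period while the phase shifts stay fixed at multiples of 360°/n.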

Fig. 8
A schematic diagram of a multiphase interleaving SC (switched-capacitor) converter with a comparator, voltage-controlled oscillator, and power stage cell along with its timing diagram.

System diagram and waveforms of multiphase interleaving SC converter [27]

4.2 Clock Generation: Centralized Versus Distributive

Figure 9 presents two schemes of interleaving clock generation and distribution. Figure 9a shows the H-tree structure with centralized clock generation followed by distribution, commonly used in large digital circuits and systems. For a multiphase SC converter, each power cell needs one clock signal from the central VCO, and N phases need an N-bit clock bus running over the whole converter, complicating the design. Moreover, to obtain good phase matching, the power stage layout has to be symmetrical, which restricts the layout shape of the power management unit to a rectangle. To distribute the interleaving clock phases to the power cells, we need to route them from the central VCO to each cell. Each cell then sees a parasitic capacitance CP_1_CELL = log2(N) × L × CPAR0, where N is the number of phases and CPAR0 is the unit parasitic capacitance in fF/μm. The total parasitic capacitance of all the clock wires driven by the VCO is CP_TOTC = N × log2(N) × L × CPAR0. Therefore, the power consumption of the clock distribution is large. Consequently, the number of clock phases in most works is under 50 [26, 41,42,43].

Fig. 9
A circuit diagram of unit parasitic capacitance with power cell and inverter along with its centralized and distributive scheme for 32-bit ring oscillator.

Comparison between clock generation and distribution, as well as parasitic capacitance on the routing wires of (a) centralized scheme and (b) distributive scheme

On the other hand, Fig. 9b presents the distributed scheme [44,45,46,47], where we design the power cells to be identical, and each power cell generates its clock phase with a fixed delay from the preceding cell. When connecting N such cells (N is an odd number) in a ring, we form a ring oscillator along with the power converter. Each power cell supplies power to the power rails that run through the whole converter. Compared with the H-tree scheme, the distributed clock paths are shorter, and the parasitic capacitance along the clock wire is only CP_TOTD = N × L × CPAR0, which is much smaller. Consequently, the power consumption of the VCO is also much lower. Meanwhile, it is not necessary to locate the power cells on the periphery of the chip; they can run through the loading blocks that require power, as long as the connected power cells form a closed ring. One possible drawback of this scheme is that the total parasitic capacitance along the clock routing paths affects the switching frequency of the power ring. To tackle this issue, we should size the inverters in the ring oscillator accordingly.
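The two capacitance expressions can be compared numerically with the Python sketch below. The routing length L and the unit capacitance CPAR0 used in the example are hypothetical values, since the text does not give them; the ratio between the two schemes depends only on N.

```python
import math


def cp_total_centralized(n_phases, length_um, c_par0):
    """H-tree scheme: CP_TOTC = N * log2(N) * L * CPAR0."""
    return n_phases * math.log2(n_phases) * length_um * c_par0


def cp_total_distributed(n_phases, length_um, c_par0):
    """Ring scheme: CP_TOTD = N * L * CPAR0."""
    return n_phases * length_um * c_par0


# Hypothetical routing length (um) and unit capacitance (fF/um).
LENGTH_UM = 100.0
C_PAR0 = 0.2

for n in (32, 123):
    c_c = cp_total_centralized(n, LENGTH_UM, C_PAR0)
    c_d = cp_total_distributed(n, LENGTH_UM, C_PAR0)
    print(f"N={n}: centralized {c_c:.0f} fF, distributed {c_d:.0f} fF, "
          f"ratio {c_c / c_d:.2f}")
```

The ratio CP_TOTC / CP_TOTD equals log2(N), so the saving of the distributed scheme grows, though slowly, as phases are added.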

For fully on-chip SC converters dealing with fast load transients, even the input and output decoupling capacitors, or at least part of them, need to be integrated on-chip. They would occupy a large die area, and we can reduce their values only by decreasing both the input inrush current and the output voltage ripple. Multiphase interleaving effectively diminishes both. Two recent works use a large number of phases, 101 phases in [47] for driving LEDs and 123 phases in [45, 46] for microprocessors, thus achieving very low output voltage ripple without using external capacitors.

To summarize, the distributed scheme has the advantages of layout flexibility and lower power consumption when compared with the centralized scheme. Special attention should be paid to the buffer capability of the distributed ring oscillator.

4.3 123-Phase SC Converter Ring

Figure 10 illustrates a ring-shaped SC converter surrounding the load, to take full advantage of the multiphase interleaving technique [45]. In addition, the converter ring achieved a unity-gain frequency (UGF) higher than its switching frequency by setting its dominant pole on the output node. The designed converter ring consists of many time-interleaved power cells and only one controller. For a Lego-like layout, the size of the controller layout is exactly the same as that of one power cell. We planned the input and GND pins of the converter ring on every corner of the chip, without affecting the pads of the load. Similar to a standard pad ring, the converter ring surrounds the load in the square, with minimum changes (if not zero change) necessary for the existing layout of the load. One of the advantages of the power cell approach is its simplicity: we only need to design one power cell and the complete power ring. The converter ring layout and bumping diagram are also compatible with flip chip packaging. One advantage of integrating a step-down DC-DC converter on chip is that the input current is much smaller than the load current, thus reducing the input bump/pad current stress.

Fig. 10
A ring-shaped multiphase SC (switched-capacitor) power converter with the controller, power cells, the VIN, ground, and VOUT grid, and the pads of the load.

A ring-shaped multiphase SC power converter [45]

An SC converter can be regulated with an LDO-assisted loop [48], hysteresis control [49], pulse skipping modulation [50], or frequency modulation [51]. For a multiphase SC converter, frequency modulation is the most appropriate method, since neither the LDO-assisted loop nor hysteresis control is feasible.

Figure 11 shows the small-signal analysis of the multiphase SC converter. One key feature of this circuit is that the UGF of the designed multiphase converter is a few times higher than its switching frequency. Four design choices make this possible: (1) treating the time-interleaved multiphase SC converter as a pseudo-continuous-time power converter, (2) setting the dominant pole at the output node, (3) employing a high-speed error amplifier (EA), and (4) tuning the oscillator frequency through its supply to change the switching frequency of all phases instantly and simultaneously.

Fig. 11
Small-signal analysis of a multiphase SC (switched-capacitor) converter with the reference voltage as input, V-to-F and F-to-V converters, A, the VDDC voltage, and the output voltage VOUT.

Small-signal analysis of the multiphase SC converter

A switched-capacitor circuit is basically equivalent to a discrete-time resistor. Therefore, it provides only first-order filtering in the power stage. Meanwhile, multiphase operation gives SC converters more attractive features, for example, smaller input and output ripples and faster transient responses, which allow the converter to respond within a small fraction of the switching period and act more like a continuous-time power converter. On the other hand, the LC filter of a buck converter operating in continuous conduction mode (CCM) is a second-order filter, which provides better filtering but limits the loop bandwidth and slows down the transient response. Also, during load/line transients, the inductor current must change before the output voltage can be regulated.
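The discrete-time resistor view can be illustrated with the standard idealized relation R_eq = 1/(C × fSW), which matches Eq. (2) with KC = 1. The Python sketch below also forms the first-order time constant with an output capacitor; it ignores the load resistance and the switch on-resistance, and the component values are hypothetical.

```python
def equivalent_resistance(c_fly, f_sw):
    """Idealized switched-capacitor resistor: R_eq = 1 / (C * f_SW)."""
    return 1.0 / (c_fly * f_sw)


def output_time_constant(c_fly, f_sw, c_out):
    """First-order time constant R_eq * C_OUT (unloaded, ideal switches)."""
    return equivalent_resistance(c_fly, f_sw) * c_out


# Hypothetical values: 1 nF flying capacitor, 10 nF output capacitor.
for f_sw in (1e6, 2e6, 4e6):
    r_eq = equivalent_resistance(1e-9, f_sw)
    tau = output_time_constant(1e-9, f_sw, 10e-9)
    print(f"f_SW = {f_sw / 1e6:.0f} MHz: R_eq = {r_eq:.0f} ohm, tau = {tau * 1e6:.1f} us")
```

Doubling fSW halves R_eq, which is the mechanism that frequency modulation uses to regulate the output.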

For the control loop design, placing the dominant pole at the output node has several benefits, as discussed in [46]. If the output pole pO is a nondominant pole, the loop needs an internal dominant pole at a frequency a couple of decades lower than pO, which limits the UGF. With pO set as the dominant pole, the converter can drive a large capacitive load without affecting the loop stability; in fact, a higher capacitive load always improves the loop stability.

In a conventional design methodology, AC signals at frequencies above fSW/2 cannot pass through a discrete-time power stage, as imposed by the Nyquist theorem. In contrast, multiple time-interleaved switched-capacitor power cells (SCPCs) act as a pseudo-continuous-time stage [46], which means that AC signals at frequencies above fSW can also pass through the multiphase discrete-time power stage. In the VCO-based pulse frequency modulation (PFM) of SC converters, the VCO converts the voltage information VDDC to the frequency domain, and the multiphase SC power stage then converts it back to the voltage domain. Therefore, with a high-speed error amplifier (EA) design, we can obtain a UGF that is a few times higher than fSW.
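The bandwidth argument can be made concrete with a small sketch. It rests on a simplifying assumption that is not stated in the chapter: N evenly interleaved phases update the output N times per switching period, so the stage is treated as if it sampled at N × fSW. The function names and the numbers are hypothetical.

```python
def nyquist_limit_hz(f_sw, n_phases=1):
    """Nyquist limit if the stage is modeled as sampling at n_phases * f_sw (assumption)."""
    return n_phases * f_sw / 2.0


def charge_transfer_times(f_sw, n_phases):
    """Time instants within one switching period at which each phase transfers charge."""
    return [k / (n_phases * f_sw) for k in range(n_phases)]


F_SW = 100e6  # hypothetical switching frequency, 100 MHz

single = nyquist_limit_hz(F_SW)               # one phase: limited to f_sw / 2
multi = nyquist_limit_hz(F_SW, n_phases=8)    # eight interleaved phases
times = charge_transfer_times(F_SW, 8)

print(f"1 phase : limit = {single / 1e6:.0f} MHz")
print(f"8 phases: limit = {multi / 1e6:.0f} MHz (above f_sw)")
```

With enough phases, the modeled limit moves above fSW, which is consistent with a loop UGF of a few times fSW.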

Although the buck converter can also enjoy the bandwidth-extension benefit of multiphase interleaving, the abovementioned pseudo-continuous-time condition does not apply to buck converters, even though they can also use PFM control, including hysteretic control and constant on-/off-time control. In fact, constant on-time control belongs to both the PWM and PFM categories, because an inductor-based converter always requires duty-ratio information for output voltage regulation. Moreover, during a load transient, the duty ratio should ideally be 100% for a light-to-heavy load step and 0% for a heavy-to-light load step. The PWM sampling effect therefore still exists in the constant-on-time controller and limits the bandwidth extension. Consequently, fixed-duty-ratio PFM, which can be regarded as pseudo-continuous-time operation, applies only to SC converters [44, 45].

Figure 12 presents the chip micrograph of the first version of the converter ring design [44], implemented in 65 nm CMOS for microprocessor applications. It has 30 power cells and 1 controller on the top edge plus 31 power cells on the other 3 edges, forming a ring around the whole chip. The number of power cells can be arbitrarily large, depending on the layout and on the shape and size of the power cells. However, the number of power cells also sets the number of inverters in the ring oscillator, which determines the maximum switching frequency and consequently the maximum output power.
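The link between the cell count and the maximum switching frequency can be sketched with the standard ring-oscillator relation f = 1/(2·N·t_d). The sketch assumes one inverting stage per power cell and a fixed minimum stage delay; both the stage delay and the cell counts below are hypothetical and are not values from the chip.

```python
def ring_oscillator_frequency(n_stages, stage_delay_s):
    """Oscillation frequency of an N-stage ring: f = 1 / (2 * N * t_d)."""
    if n_stages < 1:
        raise ValueError("a ring needs at least one stage")
    return 1.0 / (2.0 * n_stages * stage_delay_s)


MIN_STAGE_DELAY = 20e-12  # hypothetical minimum delay per stage, 20 ps

# Assumption: one delay stage per power cell, so more cells means a slower ring.
for n_cells in (31, 61, 123):
    f_max = ring_oscillator_frequency(n_cells, MIN_STAGE_DELAY)
    print(f"{n_cells:3d} cells -> f_max = {f_max / 1e6:7.1f} MHz")
```

In this simple model, doubling the number of cells halves the maximum switching frequency, which is the trade-off described above.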

Fig. 12
A chip micrograph of D C to D C converter ring with 4 loads, control circuits, ground V subscript IN, a d length of 1875, 1635 micrometer.

Chip micrograph of the DC-DC converter ring [43]

Figure 13 displays the measured load transient response, reference tracking, and output voltage ripple waveforms of the first converter ring design. We place one load of 25 mA on each corner of the chip to emulate the load transient events. For load transients between 10 mA and 110 mA, the output voltage variations are within 58 mV with VIN = 2 V, VOUT = 1.1 V, and a VCR of 2/3, benefiting from the designed high UGF. To accommodate the dynamic voltage scaling (DVS) function, we demonstrated a reference tracking speed of 2.5 V/μs. The measured output ripples range from 2.2 mV to 30 mV over a variety of loads and VOUT/VIN conditions. The phase mismatch at the chip corners and PVT variations dominate the nonideal output ripples. In summary, this SC converter ring exhibits low voltage ripple and fast transient response.

Fig. 13
A response waveform for load transient response and reference tracking with chaotic fluctuating trends, and V subscript out steady state for 4 and 2 on-chip loads.

Measured load transient response, reference tracking, and output voltage ripple waveforms of the converter ring [43]

5 Multi-Output Switched-Capacitor Converter

For multicore application processors in smartphones and smart watches, power-saving techniques such as dynamic voltage and frequency scaling (DVFS) that extend the battery charging cycle are highly favorable. Yet, each core may need a different supply voltage [52, 53]. High-efficiency fully integrated SC power converters with no external components are promising candidates. Figure 14 shows the strategy of dynamic power cell allocation proposed in [18]. Typically, SC converters with different specifications are designed independently, leading to a large area overhead because each converter has to handle its own peak output power. Recently, multi-output SC converters emerged to tackle this issue. Reference [19] uses an on-demand strategy to control two outputs, each with a different loading range, but the outputs are not interchangeable. Reference [20] fixes the two output voltages with voltage conversion ratios (VCRs) of 2× and 3× only. Reference [54] integrates the controller, but the three output voltages still come from three individual SC converters. Without reallocating the capacitors in the power stages, capacitor utilization is low, because margins must be reserved to cater for each peak output power. Finally, [55] proposed a dual-output SC converter with one flying capacitor crossing technique to improve the power efficiency.

Fig. 14
A circuit diagram of power cell allocation and architecture of dual output S C converter with dynamic power allocation of bi-directional shift register, channel selection switches, frequency comparator, dual-path V C O, and ratio selector.

Strategy of dynamic power cell allocation and system architecture of the dual-output SC converter [18]

In this section, we introduce a fully integrated dual-output SC converter with dynamic power cell allocation for application processors. The shared power cells are allocated dynamically according to load demands. A dual-path VCO that works independently of the power cell allocation achieves a fast and stable regulation loop. The converter can deliver a maximum current of 100 mA: one output can deliver 100 mA while the other handles a very light load, or both outputs can deliver 50 mA each with over 80% efficiency.

The converter consists of two channels (CH1 and CH2) with output voltages VO1 and VO2, respectively, and each output is regulated through frequency modulation by the dual VCOs. The switching frequencies of the two channels are f1 and f2. The dynamic power cell allocation strategy adjusts the allocation until the two switching frequencies are equal, so that both channels have the same power density and the whole converter obtains the best overall efficiency.

SC converters that consist of multiple power cells can operate in a multiphase interleaving mode, with each power cell serving as the unit that is allocated between the two channels. In Fig. 14, we assume that the two channels start with the same number of power cells, but the load of CH1 is larger than that of CH2. To regulate the outputs properly, we initially have f1 > f2, and more power cells are then assigned to CH1. This means that the physical boundary moves to the right until f1 and f2 are approximately equal. Balancing the power densities of the two channels at an optimal switching frequency balances the switching and parasitic losses and thereby reduces them. By dynamically adjusting both the numbers of power cells and the switching frequencies, we ensure that the channels provide sufficient power to the loads and maximize the utilization of the capacitors.
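The allocation strategy described above can be sketched as a simple loop. The model is an assumption made for illustration only: each cell delivers a fixed charge per cycle, so the switching frequency a channel needs is proportional to its load current divided by its number of cells. The names `charge_per_cycle` and `hysteresis`, and all numeric values except the 82 cells, are hypothetical.

```python
def required_frequency(i_load, n_cells, charge_per_cycle):
    """Switching frequency needed so that n_cells deliver i_load (simplified model)."""
    return i_load / (n_cells * charge_per_cycle)


def allocate_cells(i1, i2, total_cells=82, charge_per_cycle=1e-11, hysteresis=0.05):
    """Shift the channel boundary one cell at a time until f1 and f2 are close."""
    n1 = total_cells // 2  # both channels start with the same number of cells
    for _ in range(total_cells):
        n2 = total_cells - n1
        f1 = required_frequency(i1, n1, charge_per_cycle)
        f2 = required_frequency(i2, n2, charge_per_cycle)
        if f1 > f2 * (1 + hysteresis) and n2 > 1:
            n1 += 1  # CH1 is working harder: give it one more cell
        elif f2 > f1 * (1 + hysteresis) and n1 > 1:
            n1 -= 1  # CH2 is working harder: give it one more cell
        else:
            break    # frequencies are within the hysteresis window
    n2 = total_cells - n1
    f1 = required_frequency(i1, n1, charge_per_cycle)
    f2 = required_frequency(i2, n2, charge_per_cycle)
    return n1, n2, f1, f2


# Hypothetical loads: 75 mA on CH1 and 25 mA on CH2.
n1, n2, f1, f2 = allocate_cells(75e-3, 25e-3)
print(f"CH1: {n1} cells at {f1 / 1e6:.1f} MHz, CH2: {n2} cells at {f2 / 1e6:.1f} MHz")
```

Moving only one cell per comparison mirrors the single-bit shift of the boundary, and the hysteresis window keeps the boundary from toggling once the two frequencies are close.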

The channel selection switches connect the power cells to either CH1 or CH2. The outputs of the bidirectional shift register (SR), sel[1:m + n], control the boundary between the two channels. The frequency comparator determines the direction in which the boundary shifts. After each comparison, the boundary shifts only to an adjacent power cell, because sel[1:m + n] shifts by only one bit. This minimizes the potential glitches caused by reconnecting a power cell. There are 82 power cells in total, and they work with interleaved phases to reduce the output ripple voltage. The ratio selector, which senses VREF/VIN, determines the VCRs of the two outputs (R1 and R2).

Figure 15 presents a dual-path voltage-controlled oscillator (VCO) that enables the allocation while minimizing cross regulation. The VCO consists of 82 delay cells, which generate the clock phases for the power cells. Each delay cell in CH1 (DC1[n]) has a complementary delay cell in CH2 (DC2[n]). The MUX (multiplexer) selects between the phases φ1[n] and φ2[n], and the selected phase is distributed to the power cell. If sel[n] = 1, DC1[n] of the CH1 VCO is enabled. At the same time, the MUX shorts DC2[n], so that the clock phase is redirected to the next cell. In this way, the number of delay cells in each VCO equals the number of its power cells, and multiphase interleaving takes effect to reduce the output ripple voltage. The error amplifier controls the frequency of the VCO, and the two outputs are regulated separately, regardless of the power cell arrangement. Stability is ensured because the regulation loop is much faster than the power cell allocation. Each power cell consists of two flying capacitors and eight power transistors, with a VCR of 2/3× or 1/2×. We optimize the configuration of each power cell to minimize the parasitic loss. The channel selection switches, controlled by sel[n], connect the local output VOL to VO1 or VO2.
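The bypass behavior of the dual-path VCO can be sketched in a few lines. The sketch only models which delay cells stay in each ring for a given sel[] pattern; the even phase spacing and the function names are assumptions made for illustration, and the eight-cell example is far smaller than the 82-cell design.

```python
def split_delay_cells(sel):
    """Return the delay-cell indices active in the CH1 ring and in the CH2 ring.

    sel[n] == 1 enables DC1[n] and bypasses (shorts) DC2[n];
    sel[n] == 0 does the opposite.
    """
    ch1 = [n for n, bit in enumerate(sel) if bit == 1]
    ch2 = [n for n, bit in enumerate(sel) if bit == 0]
    return ch1, ch2


def interleaved_phases_deg(active_cells):
    """Assumed evenly spaced clock phases for the cells that remain in a ring."""
    count = len(active_cells)
    return {cell: 360.0 * k / count for k, cell in enumerate(active_cells)}


# Hypothetical 8-cell example: the first five cells belong to CH1.
sel = [1, 1, 1, 1, 1, 0, 0, 0]
ch1_cells, ch2_cells = split_delay_cells(sel)
ch2_phases = interleaved_phases_deg(ch2_cells)

print("CH1 ring:", ch1_cells)
print("CH2 ring:", ch2_cells, "phases:", ch2_phases)
```

Because every cell appears in exactly one ring, the number of delay cells in each path always matches the number of power cells assigned to that channel.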

Fig. 15
3 circuits diagram of dual path V C O with frequency comparator, delay cell of dual path V C O, and power stage of S C converter with channel selection.

Circuit implementation of the dual-path VCO, including its delay cell and power stage [18]

Figure 16 illustrates the control logic, which is composed of the frequency comparator and the power cell shift register. First, the one-shot signals (ck1os and ck2os) control P1 and P2 to charge CC1 and CC2 for one clock period only. The ready signals (ready1 and ready2) are activated after charging finishes and trigger the comparison between VF1 and VF2. After a short delay, CC1 and CC2 are reset. In the comparison, VF1 < VF2 means that f1 > f2; the direction signal of the shift register is then set to direct = 0, and the selection signals shift left by one bit. This frequency adjustment repeats until f1 and f2 are very close to each other. The frequency comparator then issues stop = 1, and the shift register stops shifting. To ensure accurate charging, the current sources and the capacitors (CC1 and CC2) must be well matched. For robust control, we added offsets to the comparators to form a hysteresis window. The clocks ck1 and ck2 drive the whole process, so no additional system clock is needed.
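The comparison principle can be sketched behaviorally: a matched current charges a matched capacitor for one clock period, so the resulting voltage V = I·T/C is inversely proportional to the clock frequency. The component values and the offset are hypothetical, and direct = 1 for the opposite case is an assumption, since the text only specifies direct = 0 for f1 > f2.

```python
def ramp_voltage(i_charge, c_cap, f_clk):
    """Voltage after charging c_cap with i_charge for one clock period: V = I / (C * f)."""
    return i_charge / (c_cap * f_clk)


def compare_frequencies(f1, f2, i_charge=10e-6, c_cap=100e-15, v_offset=0.02):
    """Return (stop, direct) for the shift register.

    v_offset models the comparator offsets that form the hysteresis window.
    """
    vf1 = ramp_voltage(i_charge, c_cap, f1)
    vf2 = ramp_voltage(i_charge, c_cap, f2)
    if vf1 < vf2 - v_offset:
        return 0, 0      # VF1 < VF2 means f1 > f2: keep shifting, direct = 0
    if vf2 < vf1 - v_offset:
        return 0, 1      # assumed opposite direction when f2 > f1
    return 1, None       # inside the hysteresis window: stop shifting


print(compare_frequencies(125e6, 100e6))  # f1 clearly higher than f2
print(compare_frequencies(100e6, 101e6))  # frequencies nearly equal
```

The sketch also shows why matching matters: any mismatch in `i_charge` or `c_cap` between the two paths would shift the voltages and bias the decision.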

Fig. 16
A circuit diagram of the frequency comparator with matched elements and the bidirectional shift register with reset, stop, and direct signals, along with its timing diagram.

Circuit implementation of the frequency comparator, the bidirectional shift register, and the timing diagram of the frequency comparison [18]

Figure 17 presents the chip micrograph of the symmetrical dual-output SC converter, fabricated in 28 nm CMOS, with an active area of 1.2 × 0.5 mm². Figure 18 plots the measured waveforms of the steady-state outputs, reference tracking, and load transient. The measured results verify that the two output voltages are regulated independently while the two switching frequencies are adjusted to be very close. The measured reference up- and down-tracking speeds were 500 mV/μs and 334 mV/μs, respectively. We did not observe any obvious cross regulation at VO2 while VO1 was undergoing reference tracking. With the load at VO1 switched from 4 mA to 40 mA, the settling time was within 500 ns. The cross regulation at VO2 was less than 10 mV at the rising edge and negligible at the falling edge, confirming that the dual-path VCO control minimizes cross regulation.

Fig. 17
A chip micrograph of dual output S C switched capacitor converter with the controller, decoupling capacitor, power stage, and length, breadth of 1, 1.5 millimeter.

Chip micrograph of the symmetrical dual-output SC converter

Fig. 18
4 waveforms of steady state with adjusted frequency, load transient response, reference tracking, and regulation restoration with power cell allocation.

Measured waveforms of the steady-state output voltages, reference tracking, and loading transient response

Figure 19 displays the measured efficiencies versus the load currents IO1 and IO2. The peak efficiency was 83.3%, measured with the load currents split at 50 mA for each channel. Owing to dynamic power cell allocation, the converter reached over 80% efficiency, and the efficiency remained nearly constant when IO1 and IO2 were larger than 15 mA. The efficiency with allocation is 4.8% higher than that of the circuit without allocation. Table 3 gives the performance comparison. We conclude that, by using dynamic power cell allocation, the proposed dual-output SC converter exhibits high efficiency over a broad load range for the two outputs with minimized cross regulation.

Fig. 19
A 3-D graph of efficiency, load current I subscript o1, versus load current I subscript o2 plots contour sheet, ranging from 70 to 85, and a graph efficiency versus load current of power sharing and allocation.

Measured efficiency versus loading currents with and without dynamic power allocation

Table 3 Performance comparison with the state of the art

To conclude this section, we presented a fully integrated dual-output SC converter with dynamic power cell allocation for application processors. The power cells are allocated dynamically according to load demands, which improves the efficiency by 4.8% compared with the structure without allocation. The circuit contains a dual-path voltage-controlled oscillator (VCO) that works independently of the power cell allocation to implement a fast and stable regulation loop. The converter achieved a peak efficiency of 83.3% and a maximum output current of 100 mA while maintaining minimized cross regulation.

6 Conclusions

In this chapter, we discussed state-of-the-art circuit design techniques that address the challenges of fully integrated switched-capacitor power converters, which are among the important ingredients of power management circuits in recent SoC designs. We discussed design considerations including topology generation, loss analysis, ripple reduction, and closed-loop feedback control. We also presented two design examples in nanometer CMOS to demonstrate the performance of SC converters. Last but not least, we presented practical design guidelines and suggestions for future work.