Abstract
This chapter starts by highlighting the requirements and existing techniques for producing digital delay, summarizing the state of the art. Subsequently, a topology assessment is presented based on specified performance metrics. The proposed delay cell, which was designed, fabricated, and characterized, is then described. Specifically, the proposed design is based on a topology with low supply-noise sensitivity and low jitter. Functionality is extended to support coarse/fine control of the output delay value without the need for additional hardware. This is made possible by taking advantage of the body-biasing capabilities available in FD-SOI technology. The proposed delay element presents unique performance characteristics in terms of the achieved delay resolution and delay dynamic range. The chapter concludes with the demonstration of a delay line prototype fabricated in ST 28 nm FD-SOI technology. After a general overview of delay techniques, the proposed topology is described, focusing on the major design aspects. Measurement results are then presented, followed by a short discussion of the characterization findings.
1 Delay Elements Review
Delay manipulation is a major concern for the reliable implementation of timing circuits. Clock phase generation, deskewing, data alignment, and hazard mitigation are just some of the signal processing functions that can entail digital delays. Since any signal processing function intrinsically produces delay, treating delay as a design target rather than as a side effect is a natural step. It is exactly this intrinsic non-ideality, namely propagation delay, that is usually manipulated to build digital delay circuits (Fig. 6.1).
Depending on the application, the range and resolution required when controlling a delay value differ, which justifies the numerous approaches to the problem. On the one hand, a simple task such as avoiding setup and hold violations in a datapath can be handled with some extra buffering. On the other hand, producing reliable clock signals requires elaborate systems with multiple control loops, because accurate timing operations demand high precision. Common examples of systems that employ controlled delay lines include delay-locked loops (DLL) [1], digitally controlled oscillators (DCO), phase-locked loops (PLL) [2], asynchronous pipelines [3], and time-to-digital converters (TDC) [4]. Delay circuits are also used to realize pulse-width control circuits with programmable output duty cycles, used in ADCs or DACs [5], and more recently continuous-time digital filters [6]. Besides, it is not uncommon to employ dedicated delay elements for relatively simpler signal processing operations, such as phase shifting, interpolation, or non-overlapping clock generation. Manipulating the mechanisms involved in the generation of delay has led to multiple ideas over the years. Apart from the straightforward cascading of inverters, or of any logic gate for that matter, specific research activity has targeted power-efficient, variable delay elements. Below, we present the prevalent techniques of recent years: capacitive shunting, current starving, and thyristor-based design. A short discussion then follows on the topology selected for further study.
1.1 Cascaded Inverters
One of the simplest ways to introduce a digital delay is to cascade inverters (Fig. 6.2). In this case, delay is created by the finite slopes with which the loaded inverter outputs charge and discharge. These slopes can be modeled with the time constants set by the effective resistance of the switching transistors and the output capacitance. A scheme of cascaded inverters interleaved with multiplexers is a typical example implementation. A recent realization was proposed in [7], where high-effort stages of cascaded inverters were used to produce delay values in the order of a nanosecond. Evidently, any variation in process, supply, or temperature readily translates into a variation in delay [8]. Moreover, the delay is inversely proportional to the switching slope, so larger delay values imply slower transitions, during which short-circuit current flows for longer, and therefore more power consumption.
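The time-constant view of the cascaded inverter can be made concrete with a first-order RC model, in which the 50 % propagation delay of one stage is ln(2)·R·C. The sketch below is only an illustration under that assumption; the resistance and capacitance values and the function names are hypothetical and are not taken from the cited works.

```python
import math


def inverter_delay(r_eff_ohm, c_load_farad):
    """50 % propagation delay of a first-order RC stage: ln(2) * R * C."""
    return math.log(2) * r_eff_ohm * c_load_farad


def chain_delay(n_stages, r_eff_ohm, c_load_farad):
    """Total delay of n identical cascaded stages (stage delays add)."""
    return n_stages * inverter_delay(r_eff_ohm, c_load_farad)


# Hypothetical values: 10 kOhm effective resistance, 2 fF load per stage.
R_EFF = 10e3
C_LOAD = 2e-15

stage = inverter_delay(R_EFF, C_LOAD)
print(f"per-stage delay: {stage * 1e12:.2f} ps")
print(f"8-stage chain:   {chain_delay(8, R_EFF, C_LOAD) * 1e12:.2f} ps")
```

In this simplified model the delay scales linearly with both R and C, so doubling the load capacitance doubles the stage delay.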
1.2 Capacitive Shunting
As seen above, for a cascade of inverters, or for an inverter driving a capacitive load such as the one in Fig. 6.3, the output signal slope varies with the time constant. This is the main mechanism for creating delay. Delay can therefore be varied by modifying the output capacitance directly, as in [9], or by modulating the charge flow to it through a control voltage \(V_{ctrl}\), as seen in Fig. 6.4 and similarly to the cell proposed in [10].
1.3 The Semi-static Approach
A semi-static approach that ensures minimum short-circuit current during transitions in a delay element is proposed in [11]. The topology, shown in Fig. 6.5, produces rising-edge delays. The input is connected to two separate arrangements; in each one, the complementary transistor is controlled by biasing a current mirror, which introduces a static consumption overhead. The two preliminary outputs are combined in a single stage that mitigates the short-circuit currents.
1.4 Current Starving
The current-starving technique is realized by adding extra MOS devices in series with the ones of the inverter. This effectively reduces (starves) the current associated with the switching events, which directly impacts the propagation delay.
As shown in Fig. 6.6, delay control can be established by directly modulating the voltage on the gates of the “starving” transistors. A digital alternative is to switch in transistors in parallel with the starving ones, as shown in the work of [12], but this leads to a non-monotonic increase of the delay with an ascending binary input pattern. This is due to the added parasitic capacitance, which counteracts the reduction of effective resistance. In order to achieve monotonic behavior, it is suggested to control only the current of the starving transistors. This can be done by mirroring the output of a current DAC, at the cost of extra static consumption. This approach is reported to increase tolerance to PVT variations because the digital code reflects only the control current. On another note, the current-starving technique produces a less sharp transition slope, which means more short-circuit current. A way to limit this side effect was proposed in [13]: series diodes limit the output swing and effectively reduce power consumption.
An interesting modification to the current-starved inverter is a topology resembling the Schmitt trigger. The idea was proposed in [14] in the context of an SRAM cell. The slow transition of the starved inverter is remedied with a cascaded inverter. The idea was extended in the work of [15] to the topology of Fig. 6.7, where the Schmitt trigger is presented with positive feedback from the output signal, which improves the output transitions. This renders the topology superior in terms of signal integrity and delay in comparison with current-starved, transmission-gate-load-based, and cascaded-inverter-based setups. The idea of positive feedback for transition slope modification also draws from the thyristor-based delay elements, which are presented in the next section.
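The current-controlled variant of current starving can be sketched with a constant-current approximation, in which the delay is the time the limited current needs to move the output by half the supply, C·(VDD/2)/I. The code below is a minimal illustration under that assumption, with an ideal current DAC and a fixed load capacitance; all names and values are hypothetical. Because the capacitance does not change with the code in this model, the delay is monotonic in the code, which is the property the current-only control scheme aims for.

```python
def starved_delay(c_load_farad, vdd_volt, i_ctrl_amp):
    """Time for a constant current to move the output node by VDD/2."""
    if i_ctrl_amp <= 0:
        raise ValueError("control current must be positive")
    return c_load_farad * (vdd_volt / 2) / i_ctrl_amp


def dac_current(code, i_lsb_amp):
    """Ideal current DAC: output current proportional to the input code."""
    return code * i_lsb_amp


# Hypothetical values: 5 fF load, 1 V supply, 1 uA per DAC step.
C_LOAD = 5e-15
VDD = 1.0
I_LSB = 1e-6

delays = [starved_delay(C_LOAD, VDD, dac_current(code, I_LSB))
          for code in range(1, 16)]

# More current means less delay, so the list must be strictly decreasing.
is_monotonic = all(a > b for a, b in zip(delays, delays[1:]))
print(f"code 1: {delays[0] * 1e9:.2f} ns, monotonic: {is_monotonic}")
```

The model deliberately ignores the code-dependent parasitic capacitance that makes the switched-transistor approach non-monotonic.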
1.5 Thyristor Delay Elements
The basic principle of a thyristor device is the activation of the device when a certain conduction threshold is crossed. In a thyristor-based delay element this operation is replicated by a positive feedback mechanism that completes a delay event after a threshold is crossed. The delay event can be, for example, the slow charging or discharging of a capacitor: once a threshold voltage across its terminals is reached, the charging/discharging is forcefully accelerated through positive feedback action.
The concept is illustrated in Fig. 6.8, using a complementary transistor pair. To describe its functionality, we start from a steady state where both transistors are off and the gate voltages (\(V_{Gp}\) and \(V_{Gn}\)) are \(V_{DD}\) for gate P and zero for gate N. As we lower the voltage on P to the vicinity of \(V_{DD} - V_{THp}\), the PMOS begins to turn on. As it does, the voltage on its gate keeps being driven towards ground, not only because of the triggering action but also because the NMOS is now beginning to turn on as well. Therefore, in addition to the initial triggering action on the PMOS gate, the NMOS introduces an additive force, which further accelerates the switch: a positive feedback loop. Obviously, to accommodate another switching cycle, the thyristor must be pre-charged again, as there is no way to return to the prior state. This can be realized by adding a pre-charge circuit. The complete topology of the cell, where the sequencing of the delaying and pre-charging events takes place, is shown in Fig. 6.9.
The cell works with two similar parts that support rising and falling edges, where the delayed rising output pre-charges the part that delays the falling edge. For example, before any rising edge of D, \(\bar {Q}\) has already reached the high state through the preceding high-to-low transition of D. So, the left-side thyristor is ready to delay the rising edge of D. The thyristor-based methodology exhibits attractive characteristics in terms of power consumption and supply sensitivity. Power is consumed during switching, with a small shunt current. Primarily, there is no static consumption apart from any control current generation mechanism. Also, supply sensitivity is generally low because the delay is composed of two components: the controlled-current part and the switching part. The switching part is the only part that depends on the supply value, because this is when a charge cycle is completed through positive feedback. If this duration is negligible with respect to the desired delay value, sensitivity to the supply voltage is minimal.
The thyristor-based delay element has attracted attention due to its special characteristics, and various works have been proposed based on it. For instance, [16] acknowledges that the thyristor-based topology can suffer from charge sharing: the parasitic capacitance tied to \(\bar {Q}\) is shared with the source capacitance of the input transistors. That work proposed the addition of switches to pre-charge the output nodes prior to switching the input.
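The supply-sensitivity argument given for the thyristor element (a current-controlled interval followed by a short, supply-dependent regenerative interval) can be checked with simple arithmetic. The sketch below assumes that the controlled interval is C·V_trip/I and is independent of the supply, and that only the regenerative interval changes with it; the component values, the 10 % perturbation, and the function names are hypothetical.

```python
def controlled_interval(c_farad, v_trip_volt, i_ctrl_amp):
    """Time for a constant control current to move the node by v_trip."""
    return c_farad * v_trip_volt / i_ctrl_amp


def relative_delay_change(t_controlled, t_regen, regen_rel_change):
    """Relative change of the total delay when only the regenerative
    interval changes by regen_rel_change (e.g. 0.10 for +10 %)."""
    nominal = t_controlled + t_regen
    perturbed = t_controlled + t_regen * (1.0 + regen_rel_change)
    return (perturbed - nominal) / nominal


# Hypothetical values: 10 fF node, 0.4 V trip point, 20 uA control current.
t_ctrl = controlled_interval(10e-15, 0.4, 20e-6)
t_regen = 10e-12  # assumed duration of the regenerative interval

change = relative_delay_change(t_ctrl, t_regen, 0.10)
print(f"total delay: {(t_ctrl + t_regen) * 1e12:.1f} ps")
print(f"delay change for a 10 % regeneration change: {change * 100:.2f} %")
```

With these numbers the regenerative interval is about 5 % of the total, so a 10 % change in it moves the total delay by roughly 0.5 %, which illustrates why a short switching part implies low supply sensitivity.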
In the work of [17], the thyristor-based topology was used to accommodate a delay cell that would support narrowly spaced bursts of asynchronous pulses. The delay element featured in this work produced delayed pulses of determined width. So, with respect to Fig. 6.9, only one side of the circuit was used. The circuit made use of an additional capacitor element, which can be slowly discharged through the current source. The same concept is followed for a similar circuit in the work of [6]. The design comprised two capacitors and two current sources with independent biases for the same delay cell, in order to support further configurability. The same general methodology is used for the thyristor delay topology in [18]. The work proposed two ideas on the basic scheme: (1) the addition of an extra current source to accommodate a sharper edge and reduce the transition shunt current and (2) a series diode connected with the output transistor pair in order to modify the thyristor activation point.
1.6 Choosing a Delay Element
Table 6.1 presents a qualitative comparison of the characteristics of the delay element types detailed above. Evidently, many ideas have been proposed to produce digital delay elements, but out of the main topology classes described above, the one that seems most attractive for advanced CMOS technology nodes is the thyristor-based one. This is because of its robustness against supply and temperature variations and the absence of static consumption. All the reviewed implementations of the thyristor-based delay element share the same control mechanism: modulating the charge/discharge current during transitions. As different technology nodes are targeted and delay ranges differ, it is difficult to quantify the efficiency of the topologies. To overcome this issue, simulations have been reported for comparisons. Even so, it is not clear how much optimization has gone into the compared designs, and no variant has demonstrated a remarkable improvement over the main topology.
2 Coarse/Fine-Tuning Delay Element and Line Using Gate and Body-Biasing in 28 nm FD-SOI
Selecting the proper delay cell depends mainly on its intended application. For the circuit implementation of this section, the target is the delay line specification described in [19]. The delay line is specified as part of the feedback branch of a high-speed decision feedback equalizer, aiming at digital processing in the vicinity of 1–2 Gbps. This determines a regime for the delay range of a few hundreds of picoseconds. The delay line is also specified with a wide range of configurability, in the order of nanoseconds. Among the aforementioned categories, the proposed granular delay element is of the thyristor type and is based on an improvement of the one in [20]. The modifications effectively limit short-circuit current consumption and extend functionality by providing delay control through transistor body-biasing. In the 28 nm UTBB FD-SOI (ultra-thin body and buried oxide fully depleted silicon on insulator) CMOS implementation, the transistor’s body terminal serves as a fine control knob, which complements the coarse gate control. This way, an unprecedented coarse/fine control scheme is realized without any extra hardware [21].
2.1 Delay Element Design
The proposed delay cell topology is displayed in Fig. 6.10, where three stages can be readily distinguished. The first stage is a current-starved inverter, whose output (\(V_C\)) is connected to a second, gated inverter stage. The second stage produces a feedback signal, \(V_F\), which controls transistors placed in parallel with the ones implementing current starving. Finally, a driving stage generates complementary output signals. The circuit follows the typical thyristor-type operation. For a rising input, operation can be briefly summarized as follows: as the input \(V_{IN}\) rises, \(V_C\) discharges slowly due to the current-starving scheme, up to the point where the second-stage inverter threshold is crossed. At that time, the parallel transistor is activated through \(V_F\), which effectively cancels the current-starving action. The discharge of \(V_C\) is then accelerated and the switch is completed.
The element’s consumption profile is dominated by the current-starved charging/discharging interval, which produces a short-circuit current up to the switching of the second stage. To minimize this consumption, the cell is designed with complementary inputs, ensuring that the \(V_{\overline {IN}}\) signal precedes its complement. This is realized by cascading delay cells. Arriving directly at the second-stage inverter, the input’s complement plays a preparatory role for the upcoming switch. As \(V_{IN}\) arrives in a high state, \(V_{\overline {IN}}\) has already been driven low, shutting the path to ground. This enables the slow second-stage output to be raised without short-circuit current loss.
For the delay element at hand, substrate biasing contributes to current starving. This is made possible thanks to FD-SOI technology. As seen in Fig. 6.11, the transistor channels are implemented with a buried oxide layer underneath, which prevents source-bulk junction leakage. This ensures a body-tie voltage range that is much wider than in bulk CMOS. Also, the body coefficient is much larger in FD-SOI (85 mV/V) than in the equivalent bulk node (40 mV/V).
Regarding control of the drain current, body-biasing is principally of the same nature as gate biasing. However, the electric field is applied through a thicker oxide and a more distant contact, so the effect on current modulation can be expected to be less pronounced. This is exactly the key observation for establishing an extra control knob to fine-tune the delay.
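As a rough feel for the body coefficients quoted for FD-SOI and bulk, the magnitude of the threshold-voltage shift can be estimated as the coefficient multiplied by the body-bias swing. The sketch below assumes this linear relation holds over the whole swing and reports magnitudes only, since the sign depends on the device type and bias polarity; the function name and the chosen swing are illustrative.

```python
# Body coefficients quoted in the text, in mV of threshold shift per V of body bias.
BODY_COEFF_MV_PER_V = {"FD-SOI": 85.0, "bulk": 40.0}


def threshold_shift_mv(technology, body_bias_swing_v):
    """Magnitude of the threshold shift for a given body-bias swing,
    assuming a linear body effect (an assumption of this sketch)."""
    return BODY_COEFF_MV_PER_V[technology] * body_bias_swing_v


swing_v = 1.0  # illustrative body-bias swing
for tech in ("FD-SOI", "bulk"):
    print(f"{tech}: {threshold_shift_mv(tech, swing_v):.1f} mV per {swing_v} V swing")
```

Under this assumption the same bias swing moves the threshold a little more than twice as far in FD-SOI as in bulk, while remaining a much weaker handle than the gate itself.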
The transistors of the circuit are sized to produce delays in the order of picoseconds. The top plot of Fig. 6.12 shows the simulated falling-edge delay variation, \(T_d\), against the control inputs. The NMOS starving transistor biases are fixed to \(V_{Gn}\) = 500 mV for the gate and \(V_{Bn}\) = 0 mV for the body. The red curve is obtained by keeping the PMOS body at \(V_{Bp}\) = 1 V and varying \(V_{Gp}\). It is a typical exponential variation, with a steep ascent as the current is starved below the conduction threshold. In blue, we plot a family of curves obtained for fixed values of \(V_{Gp}\) = 500 mV, 570 mV, etc. and varying \(V_{Bp}\). The results confirm the above observation and reveal another important effect: a careful combination of gate and body biasing can yield a segment of the delay range that lies on an almost linear part of the curve. To elaborate, we can assume a gate voltage in the vicinity of 500 mV, as seen in the bottom plot of Fig. 6.12. For this setting, the delay versus body-bias variation correlates highly with a linear function and provides a very fine tuning capability, in the order of fs/mV.
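The linearity claim for the delay versus body-bias curve can be quantified with an ordinary least-squares fit, whose slope gives the tuning sensitivity and whose R² gives the quality of the linear approximation. The sketch below shows one way to do this; the data points are synthetic placeholders, not simulation or measurement results from this chapter, and the function name is hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Synthetic placeholder points, NOT data from the chapter.
body_bias_mv = [0, 200, 400, 600, 800, 1000]
delay_ps = [30.0, 40.2, 49.9, 60.1, 69.8, 80.0]

slope_ps_per_mv, intercept_ps, r_squared = linear_fit(body_bias_mv, delay_ps)
sensitivity_fs_per_mv = slope_ps_per_mv * 1e3  # ps/mV -> fs/mV
print(f"sensitivity: {sensitivity_fs_per_mv:.1f} fs/mV, R^2 = {r_squared:.4f}")
```

An R² close to 1 over the chosen bias window is what "high correlation with a linear function" means in practice; a lower value would suggest narrowing the window or changing the gate bias.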
2.2 Delay Line Architecture
In an effort to fully investigate the topology and the control flexibility that body-biasing offers, a prototype delay line was designed and fabricated. The transistors of the first-stage current-starved inverter were laid out in twin- and triple-well arrangements in order to ensure isolation. The imposed well-spacing constraints led to an overall unit delay element area of 3 μm × 7 μm. The well arrangement is shown in Fig. 6.13.
The delay line design involves a cascade of granular cells organized in groups, as seen in Fig. 6.14. Group size was optimized based on the granular delay range in order to guarantee minimum overlap when programming a delay value. Each group is associated with a tap output to facilitate programmability. For this purpose, all tap output nodes are connected to a common bus. Control of the bus is carried out with an 11-bit thermometer-coded word that activates the selected output.
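The thermometer-coded tap selection can be illustrated with a few lines of code. The sketch below assumes that the number of set bits equals the number of active groups and that set bits come first; the bit ordering, the mapping from word to tap, and the function names are assumptions made for illustration, since the text only states that the control word is an 11-bit thermometer code.

```python
WORD_WIDTH = 11  # width of the control word stated in the text


def thermometer_encode(active_groups, width=WORD_WIDTH):
    """Return `active_groups` ones followed by zeros (assumed bit order)."""
    if not 0 <= active_groups <= width:
        raise ValueError("active_groups must be between 0 and the word width")
    return [1] * active_groups + [0] * (width - active_groups)


def is_thermometer(bits):
    """A valid thermometer word never has a 1 after a 0."""
    return all(a >= b for a, b in zip(bits, bits[1:]))


def selected_tap(bits):
    """Number of the last active group (0 means no group is active)."""
    if not is_thermometer(bits):
        raise ValueError("not a thermometer-coded word")
    return sum(bits)


word = thermometer_encode(3)
print(word, "-> tap", selected_tap(word))
```

A thermometer code changes only one bit between adjacent settings, which keeps the enabled portion of the line contiguous from the input up to the selected tap.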
In order to minimize power consumption, a local power-down scheme is established for the delay groups that are not active for a given delay value. This is realized through the use of lead cells (indicated as L-blocks in Fig. 6.14b), which are placed between groups. Their function is to enable the output of the last active group and propagate a steady state to the remaining ones. The logic is shown in Fig. 6.14c. The prototype chip was fabricated in 28 nm UTBB FD-SOI CMOS technology by STMicroelectronics. The chip photograph is shown in Fig. 6.15.
The chip comprises two delay lines: one with external analog control of body-biasing and one with digital control through on-chip 8-bit R-string DACs. A dummy output was used to de-embed the delays coming from pad drivers and external routing. The delay lines were designed with independent power supplies to permit direct power consumption measurements. The delay line control words and on-chip DAC inputs were provided by custom on-chip control registers. In order to perform the measurements, a test bench was built with the die wire-bonded on a PCB. The on-chip control registers, as well as the on-board DACs used for generating gate and body-bias signals, were programmed through the USB port of a PC, using a serial programming interface.
2.3 Delay Line Measurement Results
Correct functionality of the delay line was verified under different control scenarios. Various measurements of rising and falling edges were performed at different rates up to 2 GHz. Figure 6.16 presents the variation of delay for a fully active line with body-biasing applied over a range of 1.6 V, for both rising and falling edges, illustrating the coarse/fine character of the control.
Another scenario entailed activating the delay groups consecutively to characterize the complete line programmability. The rising-edge delay range is measured under fixed gate biasing: \(V_{Gn}\) = 400 mV, \(V_{Gp}\) = 600 mV. Keeping \(V_{Bp}\) = 1 V, this scenario involved activating stages incrementally and varying the \(V_{Bn}\) body bias between 0.2 and 1 V. These limits are set by inter-well junction diode inversion on one side and the nominal supply on the other. The delay versus the programming vector, up to activating all stages, is displayed in Fig. 6.17.
The performance results are summarized in the second column of Table 6.2. “Minimum coarse sensitivity” refers to the minimum variation of delay achievable when enabling only the first group, while “minimum fine sensitivity” refers to varying body-biasing at this setting. The “maximum delay” field refers to the general gate and body-biasing extremes and verifies the known wide-spanning capabilities of a thyristor-type delay. Moreover, granular delay characterization was pursued through multiple measurements of stage-by-stage activation to de-embed lead cells and output drivers. The results refer to a scenario of rising-edge delays under \(V_{Gn}\) = \(V_{Bn}\) = 500 mV, \(V_{Gp}\) = 0 V, and \(V_{Bp}\) = 1 V. The measured control sensitivity is 50 fs/mV and the energy efficiency is 12.5 fJ/event. Note that delay range trades off with sensitivity: depending on the application needs, a smaller sensitivity can be obtained at the expense of delay range.
Table 6.3 compares the design with the current state of the art. The advanced FD-SOI CMOS node and the unique body control scheme remove the need for extra hardware, such as current sinks, which impose a static consumption overhead. This way, the lowest reported power consumption is achieved.
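The incremental-activation scenario suggests a simple two-step programming procedure: choose the number of active groups (coarse), then set the body bias (fine). The sketch below illustrates the idea under strong assumptions: group delays add, each group's delay is linear in \(V_{Bn}\) between its two end points, and lead cells and output drivers are ignored. The per-group delays, the group count of 11 (matched to the 11-bit control word), and the function name are hypothetical.

```python
VB_MIN_V = 0.2  # lower V_Bn limit quoted in the text
VB_MAX_V = 1.0  # upper V_Bn limit quoted in the text


def program_delay(target_ps, n_groups, d_at_vb_min_ps, d_at_vb_max_ps):
    """Pick a group count (coarse) and a body bias (fine) for a target delay.

    Returns (active_groups, body_bias_v). Assumes additive group delays and
    a linear delay-versus-bias law between the supplied end points.
    """
    if d_at_vb_min_ps == d_at_vb_max_ps:
        raise ValueError("body bias has no effect with identical end points")
    lo = min(d_at_vb_min_ps, d_at_vb_max_ps)
    hi = max(d_at_vb_min_ps, d_at_vb_max_ps)
    for n in range(1, n_groups + 1):  # coarse step: fewest groups first
        if n * lo <= target_ps <= n * hi:
            per_group = target_ps / n
            frac = (per_group - d_at_vb_min_ps) / (d_at_vb_max_ps - d_at_vb_min_ps)
            return n, VB_MIN_V + frac * (VB_MAX_V - VB_MIN_V)
    raise ValueError("target delay is outside the programmable range")


# Hypothetical per-group delays: 120 ps at 0.2 V and 80 ps at 1.0 V.
groups, vbn = program_delay(300.0, 11, 120.0, 80.0)
print(groups, round(vbn, 3))
```

With these placeholder numbers, targets that fall between the ranges of consecutive group counts raise an error; sizing the groups so that consecutive ranges meet or overlap is what makes every delay value reachable.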
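The two figures quoted for the granular characterization lend themselves to quick back-of-the-envelope estimates: the delay change produced by a given body-bias step, and the average power at a given event rate. The sketch below assumes the 50 fs/mV sensitivity is constant over the step and that power is simply energy per event times event rate; the step size and the event rate are illustrative choices, not values from the chapter.

```python
SENSITIVITY_FS_PER_MV = 50.0  # measured control sensitivity from the text
ENERGY_FJ_PER_EVENT = 12.5    # measured energy efficiency from the text


def delay_change_ps(body_bias_step_mv):
    """Delay change for a body-bias step, assuming constant sensitivity."""
    return SENSITIVITY_FS_PER_MV * body_bias_step_mv / 1e3  # fs -> ps


def average_power_uw(event_rate_hz):
    """Average power as energy per event times event rate, in microwatts."""
    return ENERGY_FJ_PER_EVENT * 1e-15 * event_rate_hz * 1e6  # W -> uW


print(f"10 mV step -> {delay_change_ps(10.0):.2f} ps")
print(f"1e9 events/s -> {average_power_uw(1e9):.2f} uW")
```

These numbers are only as good as the linearity assumption; over a wide bias range the sensitivity itself varies, as the trade-off between delay range and sensitivity indicates.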
References
[1] M. Hossain, F. Aquil, P.S. Chau, B. Tsang, P. Le, J. Wei, T. Stone, et al., A fast-lock, jitter filtering all-digital DLL based burst-mode memory interface. IEEE J. Solid-State Circuits 49(4), 1048–62 (2014). https://doi.org/10.1109/JSSC.2013.2297403
[2] L. Xu, S. Lindfors, K. Stadius, J. Ryynanen, A 2.4-GHz low-power all-digital phase-locked loop. IEEE J. Solid-State Circuits 45(8), 1513–21 (2010). https://doi.org/10.1109/JSSC.2010.2047453
[3] I.J. Chang, S.P. Park, K. Roy, Exploring asynchronous design techniques for process-tolerant and energy-efficient subthreshold operation. IEEE J. Solid-State Circuits 45(2), 401–10 (2010). https://doi.org/10.1109/JSSC.2009.2036764
[4] J.-P. Jansson, A. Mäntyniemi, J. Kostamovaara, A CMOS time-to-digital converter with better than 10 ps single-shot precision. IEEE J. Solid-State Circuits 41(6), 1286–96 (2006). https://doi.org/10.1109/JSSC.2006.874281
[5] J.-R. Su, T.-W. Liao, C.-C. Hung, Delay-line based fast-locking all-digital pulsewidth-control circuit with programmable duty cycle, in 2012 IEEE Asian Solid State Circuits Conference (A-SSCC) (IEEE, Piscataway, 2012), pp. 305–8. https://doi.org/10.1109/IPEC.2012.6522686
[6] C. Vezyrtzis, W. Jiang, S.M. Nowick, Y. Tsividis, A flexible, clockless digital filter, in 2013 Proceedings of the ESSCIRC (ESSCIRC) (2013), pp. 65–68. https://doi.org/10.1109/ESSCIRC.2013.6649073
[7] Y.W. Li, K.L. Shepard, Y.P. Tsividis, A continuous-time programmable digital FIR filter. IEEE J. Solid-State Circuits 41(11), 2512–20 (2006). https://doi.org/10.1109/JSSC.2006.883314
[8] J.M. Rabaey, A.P. Chandrakasan, B. Nikolić, Digital Integrated Circuits: A Design Perspective, 2nd edn. Prentice Hall Electronics and VLSI Series (Prentice Hall, Upper Saddle River, 2003)
[9] M. Bazes, A novel precision MOS synchronous delay line. IEEE J. Solid-State Circuits 20(6), 1265–71 (1985). https://doi.org/10.1109/JSSC.1985.1052467
[10] M. Maymandi-Nejad, M. Sachdev, A digitally programmable delay element: design and analysis. IEEE Trans. Very Large Scale Integr. VLSI Syst. 11(5), 871–78 (2003). https://doi.org/10.1109/TVLSI.2003.810787
[11] L.H. Jung, N. Shany, A. Emperle, T. Lehmann, P. Byrnes-Preston, N.H. Lovell, G.J. Suaning, Design of safe two-wire interface-driven chip-scale neurostimulator for visual prosthesis. IEEE J. Solid-State Circuits 48(9), 2217–29 (2013). https://doi.org/10.1109/JSSC.2013.2264136
[12] M. Maymandi-Nejad, M. Sachdev, A monotonic digitally controlled delay element. IEEE J. Solid-State Circuits 40(11), 2212–19 (2005). https://doi.org/10.1109/JSSC.2005.857370
[13] J.-L. Yang, C.-W. Chao, S.-M. Lin, Tunable delay element for low power VLSI circuit design, in TENCON 2006 - 2006 IEEE Region 10 Conference (2006), pp. 1–4. https://doi.org/10.1109/TENCON.2006.344092
[14] A. Sekiyama, T. Seki, S. Nagai, A. Iwase, N. Suzuki, M. Hayasaka, A 1-V operating 256-kb full-CMOS SRAM. IEEE J. Solid-State Circuits 27(5), 776–82 (1992). https://doi.org/10.1109/4.133168
[15] N.R. Mahapatra, A. Tareen, S.V. Garimella, Comparison and analysis of delay elements, in The 2002 45th Midwest Symposium on Circuits and Systems, 2002. MWSCAS-2002, vol. 2 (2002), pp. II-473–II-476. https://doi.org/10.1109/MWSCAS.2002.1186901
[16] J. Zhang, S.R. Cooper, A.R. LaPietra, M.W. Mattern, R.M. Guidash, E.G. Friedman, A low power thyristor-based CMOS programmable delay element, in Proceedings of the 2004 International Symposium on Circuits and Systems ISCAS ’04, vol. 1 (2004), pp. I–769–72. https://doi.org/10.1109/ISCAS.2004.1328308
[17] B. Schell, Y. Tsividis, A low power tunable delay element suitable for asynchronous delays of burst information. IEEE J. Solid-State Circuits 43(5), 1227–34 (2008). https://doi.org/10.1109/JSSC.2008.920332
[18] B. Saft, E. Schafer, A. Jager, A. Rolapp, E. Hennig, An improved low-power CMOS thyristor-based micro-to-millisecond delay element, in ESSCIRC 2014 - 40th European Solid State Circuits Conference (ESSCIRC) (IEEE, Piscataway, 2014), pp. 123–126. https://doi.org/10.1109/ESSCIRC.2014.6942037
[19] I. Sourikopoulos, A. Frappé, A. Kaiser, L. Clavier, A decision feedback equalizer with channel-dependent power consumption for 60-GHz receivers, in International Symposium in Circuits and Systems (ISCAS), Melbourne (IEEE, Piscataway, 2014), pp. 1484–87. https://doi.org/10.1109/ISCAS.2014.6865427
[20] M. Kurchuk, C. Weltin-Wu, D. Morche, Y. Tsividis, Event-driven GHz-range continuous-time digital signal processor with activity-dependent power dissipation. IEEE J. Solid-State Circuits 47(9), 2164–73 (2012). https://doi.org/10.1109/JSSC.2012.2203459
[21] I. Sourikopoulos, A. Frappé, A. Cathelin, L. Clavier, A. Kaiser, A digital delay line with coarse/fine tuning through gate/body biasing in 28nm FDSOI, in ESSCIRC Conference 2016: 42nd European Solid-State Circuits Conference (2016), pp. 145–48. https://doi.org/10.1109/ESSCIRC.2016.7598263
© 2020 Springer Nature Switzerland AG
Cite this chapter
Sourikopoulos, I., Cathelin, A., Kaiser, A., Frappé, A. (2020). Coarse/Fine Delay Element Design in 28 nm FD-SOI. In: Clerc, S., Di Gilio, T., Cathelin, A. (eds) The Fourth Terminal. Integrated Circuits and Systems. Springer, Cham. https://doi.org/10.1007/978-3-030-39496-7_6
DOI: https://doi.org/10.1007/978-3-030-39496-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39495-0
Online ISBN: 978-3-030-39496-7
eBook Packages: Engineering, Engineering (R0)