Keywords

1 Delay Elements Review

Delay manipulation is a major concern for the reliable implementation of circuits whose purpose is timing. For example, clock phase generation, deskewing, data alignment, hazard mitigation are just some signal processing functions that could entail digital delays. However, as one acknowledges the fact that any sort of signal processing function intrinsically produces delays, then designing for delay as a performance metric is straightforward. It is exactly this intrinsic non-ideality, namely propagation delay, that is usually manipulated to build digital delay circuits (Fig. 6.1).

Fig. 6.1
figure 1

Definition of rising and falling digital delay

Depending on the application, the range and resolution when controlling a delay value differs, justifying numerous approaches in dealing with the issue. On the one hand, some sort of simple operation such as to avoid setup and hold violations in a datapath can be simply handled with some extra buffering. On the other hand, producing reliable clock signals comes along with designing elaborate systems with multiple control loops. The reason for the required complexity is high precision in implementing accurate timing operations. Common examples of systems that employ controlled delay lines include delay locked loops (DLL) [1], digitally controlled oscillators (DCO), phase locked loops (PLL) [2], asynchronous pipelines [3], and time-to-digital converters (TDC) [4]. Delay circuits are also used to realize pulse-width control circuits with output programmable duty cycles used in ADCs or DACs [5] and more recently continuous-time digital filters [6]. Besides, it is not uncommon to employ dedicated delay elements for relatively simpler signal processing operations as well, like phase shifting, interpolation, or non-overlapping clock generation. Manipulating the mechanisms involved in the generation of delay has led to the introduction of multiple ideas throughout the years. Apart from the straightforward cascading of inverters, or any logic gate for that matter, there has been specific research activity targeting power-efficient, variable delay elements. Below, we present the prevalent techniques over the recent years: capacitive shunting, current starving, and thyristor-based design. After the presentation, a short discussion follows on the topology selected for further study.

1.1 Cascaded Inverters

One of the simplest ways to introduce a digital delay has been cascading inverters (Fig. 6.2). In this case, delay is created due to the finite slopes of charging and discharging the loaded inverter outputs. These charging and discharging slopes could be modeled based on the time constants set by the effective resistance of the switching transistors and the output capacitance. A scheme of cascaded inverters interleaved with multiplexers is a typical example implementation. A recent realization was proposed in [7], where high effort stages of cascaded inverters were used to produce delay values in the order of a nanosecond. Evidently, any sort of variation in process, supply, or temperature readily translates in variation in delay [8]. Moreover, the delay is inversely proportional to the switching slope. So, for bigger delay values there is more power consumption involved.

Fig. 6.2
figure 2

Cascaded inverter delay

1.2 Capacitive Shunting

As seen above, for the general arrangement of cascades of inverters or in the case of inverters driving capacitive loads such as the one in Fig. 6.3, the output signal slope varies with the time constant. This represents the main mechanism for creating delay. Actually, delay can be varied by modifying output capacitance directly as in [9] or the charge flow to it, through control voltage V ctrl as seen in Fig. 6.4 similarly to the proposed cell in [10].

Fig. 6.3
figure 3

Changing the slope due to propagation delay

Fig. 6.4
figure 4

Capacitive shunting delay

1.3 The Semi-static Approach

A semi-static approach to ensure minimum short-circuit current during transitions in a delay element is proposed by [11]. The topology is shown in Fig. 6.5 for producing rising-edge delays. Input is connected to two separate arrangements, where in each one, the complementary transistor is controlled by biasing a current mirror. Thus, a static consumption overhead is introduced. The two preliminary outputs are combined in a single stage by mitigating the short-circuit currents.

Fig. 6.5
figure 5

A semi-static approach for a delay circuit

1.4 Current Starving

The current-starving technique is realized by adding extra MOS devices in series with the ones of the inverter. This effectively reduces (starves) the current associated with the switching events, which directly impacts the propagation delay.

As shown in Fig. 6.6, delay control can be established either by directly modulating the voltage on the gates of the “starving” transistors. A digital approach would be switching-in transistors parallel to the starving as shown in the work of [12] but this leads to a non-monotonic increase of the delay with the ascending binary input pattern. This is due to parasitic capacitance being added, which counteracts the reduction of effective resistance. In order to achieve a monotonic behavior, it is suggested to control only the current of the starving transistors. This could be done by mirroring the output of a current DAC at the cost of extra static consumption. This approach is proven to increase tolerance to PVT variations because the digital code reflects only the control current. On another note, the current-starving technique produces a less sharp transition slope, which actually means more short-circuit current. A way to limit this side-effect was proposed in [13] through the use of series diodes to limit the output swing, and effectively reduce power consumption.

Fig. 6.6
figure 6

Current-starving technique

An interesting modification to the current-starved inverter is a topology resembling the Schmitt trigger. The idea was proposed in [14] in the context of an SRAM cell. The slow transition of the starved inverter is remedied with a cascaded inverter. This idea was extended to Fig. 6.7 in the work of [15] where the Schmitt trigger is presented with positive feedback action from the output signal, therefore improving the output transitions. This renders this topology superior in terms of signal integrity and delay in comparison with current-starved, transmission-gate load based, cascaded inverter based set-ups. The idea of positive feedback for transition slope modification also draws from the thyristor-based delay elements, which are presented in the next section.

Fig. 6.7
figure 7

Schmitt trigger concept for delay generation

1.5 Thyristor Delay Elements

The basic principle of a thyristor device is the activation of the device when a certain conduction threshold is crossed. In a thyristor-based delay element this operation is replicated by a positive feedback mechanism that completes a delay event after crossing a threshold. The delay event could be a capacitor that is slowly charged or discharged and after a threshold voltage across its terminals is reached, the charging/discharging is forcefully accelerated through positive feedback action.

The concept is illustrated in Fig. 6.8, using a complementary transistor pair. In order to describe functionality, we will start from a steady state where both transistors are off and the gate voltages (V Gp and V Gn) are VDD for gate P and zero for gate N. As we lower the voltage on P, in the vicinity of VDD-VTHp, the PMOS begins to turn on. However, as the PMOS turns on, the voltage on its gate is kept driven towards the ground, not only because of the triggering action, but also because the NMOS is now beginning to turn on as well. Therefore, in addition to the initial triggering action on the PMOS gate, the NMOS introduces an additive force, which further accelerates the switch: a positive feedback loop. Obviously, to accommodate another switching cycle, the thyristor must be pre-charged again as there is no way to return to prior state. This can be realized by adding a pre-charge circuit. The complete topology of the cell, where sequencing of the delaying and pre-charging events take place is shown in Fig. 6.9.

Fig. 6.8
figure 8

Thyristor-like switching technique

Fig. 6.9
figure 9

Thyristor-based delay element

It works with two similar parts for supporting rising and falling edges, where the delayed rising output pre-charges the part that delays the falling edge. For example, before any rising edge of D, \(\bar {Q}\) has already reached high state through the preceding high to low transition of D. So, the left side thyristor is ready to delay the rising edge of D. The thyristor-based methodology exhibits attractive characteristics in terms of power consumption and supply sensitivity. Power is consumed during switching with a small shunt current. Primarily, there is no static consumption apart from any control current generation mechanism. Also, supply sensitivity is generally low because the delay is composed of two components: the controlled current part and the switching part. The switching part is the only part depending on the supply value, because this is when a charge cycle is completed through positive feedback. If this duration is negligible in terms of the desired delay value, sensitivity to the supply voltage is minimal. The thyristor-based delay element has attracted attention due to its special characteristics and various works have been proposed based on it. For instance, in [16], the fact that the thyristor-based topology can suffer from charge sharing is acknowledged. The parasitic capacitance tied to \(\bar {Q}\) is shared with the source capacitance of the input transistors. The work proposed the addition of switches to pre-charge the output nodes, prior to switching the input.

In the work of [17], the thyristor-based topology was used to accommodate a delay cell that would support narrowly spaced bursts of asynchronous pulses. The delay element featured in this work produced delayed pulses of determined width. So, with respect to Fig. 6.9, only one side of the circuit was used. The circuit made use of an additional capacitor element, which can be slowly discharged through the current source. The same concept is followed for a similar circuit in the work of [6]. The design comprised two capacitors and two current sources with independent biases for the same delay cell, in order to support further configurability. The same general methodology is used for the thyristor delay topology in [18]. The work proposed two ideas on the basic scheme: (1) the addition of an extra current source to accommodate a sharper edge and reduce the transition shunt current and (2) a series diode connected with the output transistor pair in order to modify the thyristor activation point.

1.6 Choosing a Delay Element

Table 6.1 presents a qualitative comparison of the characteristics of the delay element types that were detailed above. Evidently, there have been a lot of ideas to produce digital delay elements, but out of the main topology classes described above, the one that seems more attractive for advanced CMOS technology nodes is the thyristor-based one. This is because of its superiority against supply and temperature variations, while there is no static consumption. All the reviewed implementations of the thyristor-based delay element are based on a usual control mechanism during transitions by modulating the charge/discharge current. As different technology nodes are targeted and delay ranges differ, it is difficult to quantify the efficiency of the topologies. To overcome this issue, simulations have been reported for comparisons. But even so, it is not clear what amount of optimization has gone into the comparisons and yet, no variant has proposed a remarkable improvement over the main topology.

Table 6.1 Qualitative comparison of delay element types

2 Coarse/Fine-Tuning Delay Element and Line Using Gate and Body-Biasing in 28 nm FD-SOI

It is clear that selecting the proper delay cell depends mainly on its assignment. For the circuit implementation of this section the target is the delay line specification described in [19]. The delay line is specified as a part of the feedback branch of a high-speed decision feedback equalizer, aiming at digital processing in the vicinity of 1–2 Gbps. This determines a regime for the delay range of a few hundreds of picoseconds. The delay line is also specified with a wide range of configurability, in the order of nanoseconds. Based on the aforementioned categories the proposed granular delay element is a thyristor-type and is based on an improvement of the one in [20]. The modifications effectively limit short-circuit current consumption and extend functionality in providing delay control through transistor body-biasing. Implemented in 28 nm UTBB FD-SOI (ultra-thin body and buried oxide fully depleted silicon on insulator) CMOS technology, the transistor’s body terminal serves as a fine control knob, which complements the coarse gate control. This way an unprecedented coarse/fine control scheme is realized without any extra hardware [21].

2.1 Delay Element Design

The proposed delay cell topology is displayed in Fig. 6.10 and three stages can be readily distinguished. The first stage is a current-starved inverter, whose output (V C) is connected to a second gated inverter stage. The second stage produces a feedback signal, V F, which controls transistors placed in parallel with the ones implementing current-starving. Finally, a driving stage generates complementary output signals. The circuit follows the typical thyristor-type operation. For a rising input, operation can be briefly summarized as follows: As input V IN rises, V C discharges slowly due to the current-starving scheme and up to the point where the second stage inverter threshold is crossed. At that time, the parallel transistor is activated, through V F, which effectively cancels the current-starving action. So, the discharge of V C is then accelerated and the switch is completed. The element’s consumption profile is dominated by the current-starved charging/discharging interval, which produces a short-circuit current up to the switching of the second stage. In an attempt to minimize the consumption of this action, the cell is designed with complementary inputs ensuring that the \(V_{\overline {IN}}\) signal precedes its complement. This is realized by cascading delay cells. Arriving directly on the second stage inverter, the input’s complement plays a preparatory role for the upcoming switch. As V IN arrives in a high state, \(V_{\overline {IN}}\) has been already driven low, shutting the path to the ground. This action enables the slow second stage output to be raised without short-circuit current loss. For the delay element at hand, substrate biasing contributes to current-starving. This is made possible thanks to FD-SOI technology. As seen in Fig. 6.11, the transistor channels are implemented with a buried oxide layer underneath, which prevents source-bulk junction leakage. This ensures a body-tie voltage range which is much wider than in bulk CMOS. Also, the body coefficient is much more important in FD-SOI (85 mV/V) than in the equivalent bulk node (40 mV/V).

Fig. 6.10
figure 10

Coarse-fine delay element using transistor body-biasing

Fig. 6.11
figure 11

FD-SOI transistor cross-section (left) and threshold voltage (V TH) variation with body-biasing (V B) for the technology offered RVT-type transistors (right)

Regarding control of the drain current, body-biasing is principally of the same nature as gate biasing. However, the electric field is applied through a thicker oxide and a more distant contact. One could easily predict that the effect on current modulation would be less profound. This is exactly the key observation for establishing an extra control knob to fine-tune the delay.

The transistors of the circuit are sized to produce delays in the order of picoseconds. In Fig. 6.12, on the top, we simulate the falling-edge delay variation, T d, against the control inputs. The NMOS starving transistor biases are fixed to V Gn = 500 mV for the gate, and V Bn = 0 mV for the body. The red curve refers to keeping the PMOS body as V Bpn = 1V and varying V Gp. It is a typical exponential variation with a steep ascent as current is starved under conduction threshold. In blue, we plot a family of curves, which refer to fixed values of V Gp = 500 mV, 570 mV, etc. and varying V Bp. The results confirm the above observation and reveal another important effect: careful combination of gate/body-biasing can lead to obtaining a segment of the delay range that refers to an almost linear part of the curve. To elaborate this, we can assume a gate voltage in the vicinity of 500 mV as seen in the bottom plot of Fig. 6.12. For this setting, the delay versus the body-bias variation provides a high correlation with a linear function, as well as a very fine-tuning capability in the order of fs/mV.

Fig. 6.12
figure 12

Simulations of falling-edge delay time. Red curves: varying the gate while keeping body constant. Blue curves: varying the body while keeping the gate constant. Curves are presented on same x-axis to illustrate the evolution of delay, but are not coincident as they refer to different biasing conditions

2.2 Delay Line Architecture

In an effort to fully investigate the topology and the control flexibility that body-biasing offers, a prototype delay line was designed and fabricated. The transistors of the first stage current-starved inverter were laid out in twin and triple-well arrangements in order to ensure isolation. The imposed constraints of well spacing lead to an overall unit delay element area of 3 × 7 μm. The well arrangement is shown in Fig. 6.13.

Fig. 6.13
figure 13

Well arrangement accommodating the body-biasing stage

The delay line design involves a cascade of granular cells organized in groups, as seen in Fig. 6.14. Group size was optimized based on the granular delay range in order to guarantee minimum overlap when programming a delay value. Each group is associated with a tap output to facilitate programmability. For this purpose, all tap output nodes are connected to a common bus. Control of the bus is carried out with an 11-bit thermometer-coded word that activates the selected output.

Fig. 6.14
figure 14

(a) Delay line architecture. (b) Inter-group transition and delay line output are implemented with lead cells. (c) Lead cell logic

In order to minimize power consumption, a local power-down scheme is established for the delay groups that are not active for a given delay value. This is realized through the use of lead cells (indicated as L-blocks in Fig. 6.14b), which are placed between groups. Their function is to enable the output of the last active group and propagate a steady state to the remaining ones. The logic is shown in Fig. 6.14c. The prototype chip was fabricated in 28 nm UTBB FD-SOI CMOS technology by STMicroelectronics. The chip photograph is shown in Fig. 6.15.

Fig. 6.15
figure 15

Delay line chip detail

It comprises two delay lines: one with external analog control of body-biasing and one with digital control, through on-chip 8-bit R-string DACs. A dummy output was used to de-embed the delays coming from pad drivers and external routing. The delay lines were designed with independent power supplies to permit direct power consumption measurements. The delay line control words and on-chip DAC inputs were provided from custom on-chip control registers. In order to perform the measurements, a test-bench was built with the die being wire-bonded on a PCB. The on-chip control registers as well as on-board DACs used for generating gate and body-bias signals were programmed though the USB port of a PC, using a serial programming interface.

2.3 Delay Line Measurement Results

The good functionality of the delay line under different control scenarios was verified. Various measurements of rising and falling edges were performed in different rates up to 2 GHz. Figure 6.16 presents the variation of delay for a fully active line with body-biasing applied over a range of 1.6 V, both for rising and falling edges. The coarse/fine character is illustrated.

Fig. 6.16
figure 16

Falling-edge (top) and rising-edge (bottom) delay measurements. The fine control of body-biasing is verified

Another scenario entailed activating consecutively the delay groups to characterize the complete line programmability. Rising-edge delay range is measured under fixed gate biasing: V Gn = 400 mV, V Gp = 600 mV. Keeping V Bp = 1 V, this scenario involved activating stages incrementally and varying V Bn body-biasing between 0.2 and 1 V. These are the limits between inter-well junction diode inversion and nominal supply. The delay versus the programming vector, up to activating all stages, is displayed in Fig. 6.17.

Fig. 6.17
figure 17

Rising delay vs. control vector: 1–39 delay elements at minimum and maximum values of body-bias. The curve shape reflects group sizing

The performance results are summarized in the second column of Table 6.2. “Minimum coarse sensitivity” refers to the minimum variation of delay achievable with enabling only the first group, while “minimum fine sensitivity” refers to varying body-biasing at this setting. The “maximum delay” field refers to the general gate and body-biasing extremes and verifies the known wide-spanning capabilities of a thyristor-type delay. Moreover, granular delay characterization was pursued through multiple measurements of stage by stage activation to de-embed lead cells and output drivers. The results refer to a scenario of rising-edge delays under V Gn = V Bn = 500 mV, V Gp = 0 V and V Bp = 1 V. The measured control sensitivity is 50 fs/mV and energy efficiency is 12.5 fJ/event. It is to note that delay range trades off with sensitivity. Therefore, depending on the application needs, obtaining a smaller sensitivity is possible while impacting the delay range.

Table 6.2 Performance summary

Comparison with the current state of the art is done in Table 6.3. The advanced FD-SOI CMOS node and the unique body control scheme mitigate the use of extra hardware, such as current sinks, which impose a static consumption overhead. This way the lowest reported power consumption is achieved.

Table 6.3 Delay element state of the art