1 Introduction

With the continued scaling of CMOS technologies deeper into the nanometer regime, process variations have become an ever-increasing problem for the design of any VLSI circuit [1]. Within-die variations caused by random dopant fluctuations and by process imperfections related to sub-wavelength lithography lead to large variations in circuit parameters such as the transistor threshold voltage [2, 3]. According to the ITRS roadmap [4], the relative threshold voltage variations for minimum-size transistors will continue to increase with technology scaling. Therefore, accuracy control of analog and mixed-signal circuits implemented with small device sizes is going to continue to pose great challenges.

High-speed, low-to-medium resolution ADCs are required both in wire transmission, for optical receivers and serial links, and in Ultra-Wideband radio receivers [5]. The monolithic integration of the front-end and back-end on the same die requires the ADC to be implemented in nanometer CMOS processes, where the increasingly large process variations can cause significant overhead in terms of power and complexity.

Traditionally, accuracy in ADCs has been achieved by sizing in the analog domain together with calibration methods. In [6], redundancy was used to relax the accuracy requirements of the comparators. In [7], a combination of redundancy and calibration was used to lower the power dissipation by minimizing the power used by the redundant circuits.

The aim of this paper is to show that redundancy can be used to achieve an overall energy-efficient design at gigasample-per-second rates, with the sole focus on designing for speed and low power at the building block level. As the mismatch-induced comparator offsets will get worse with scaling, comparator redundancy is a means of taking full advantage of the decreased feature sizes in future CMOS processes, with their higher speeds and lower power dissipation.

This paper is organized as follows. In Sect. 2, the effect of comparator redundancy is explained, and the ADC architecture together with its sub-blocks is described in Sect. 3. Section 4 presents the measurement results of the ADC and a comparison with other designs. The paper is concluded in Sect. 5.

2 Comparator redundancy

In Flash ADCs without redundant circuitry, careful design choices must be made in the trade-offs between power, speed and accuracy [8, 9]. Comparator redundancy, however, can be used to break this link [7]. In the design of pipelined ADCs, redundancy is commonly used to reduce the accuracy requirements of the comparators in the MDAC stages. In Flash ADCs it has not been used to the same extent, although in [10] and [6] redundancy was used to increase the yield and provide the desired resolution. The presented ADC exploits the fact that, with redundant comparators, the accuracy requirement is vastly reduced. This means that the comparators can be sized with concern for speed and power alone to achieve a low-power ADC, and that large mismatch-induced comparator offsets can be tolerated.

As the individual comparators will have offsets on the order of several LSBs, the ADC should be implemented with some sort of calibration method in order to take advantage of the redundant comparators. Three calibration methods that should be considered are: (1) comparator trimming, (2) correction of the digital output data and (3) selection of the optimal set of comparators. Comparator trimming means that each individual comparator has its trip point adjusted, for example by setting a capacitive load imbalance on the output nodes [11], as was done in [12]. The cost of this calibration method depends on the trimming range, as the extra capacitance on the comparator settling node will increase the power dissipation or decrease the speed. Unlike the two methods below, however, trimming needs no comparator redundancy to achieve the same resolution. If the trimming range is large enough, the trip points can all be brought to their ideal locations, and the signal-to-noise ratio achieved with this calibration method would be close to the ideal.

Digital processing of the output data means that the trip-points of all comparators are characterized, and with this knowledge the digital code width is altered to match the trip-point distance between each neighboring comparator pair. The DNL definition in Eq. 1, as given in [13], relates the distance between two adjacent trip-points, W, to the ideal code width, Q. The width of each output digital code should then be matched to the distance between the neighboring trip-points, as in Eq. 2.

$$ {\text{DNL}}[k] = {\frac{W[k] - Q}{Q}} $$
(1)
$$ Q[k] = W[k] = T[k + 1] - T[k] $$
(2)

This will remove any errors caused by the digital interpretation of the data by minimizing the differential non-linearity (DNL) errors, and with infinite precision it sets them to zero. The only remaining source of error would then be the quantization noise. This noise would be larger than in the ideal case, since the quantization noise power is lowest when all quantization steps are of equal size. The cost of this method depends on the implementation, but a straightforward look-up table containing all the corrected code words would be large, and the increased word length would also result in increased power in the back-end.
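The relations in Eqs. 1 and 2 can be illustrated with a minimal Python sketch. The function names and the example trip-point values are hypothetical and not taken from the fabricated design, and the use of the bin midpoint as the corrected output value is an assumption made only for illustration.

```python
def code_widths(trip_points):
    """W[k] = T[k+1] - T[k] for the sorted trip-points (Eq. 2)."""
    t = sorted(trip_points)
    return [hi - lo for lo, hi in zip(t, t[1:])]


def dnl(trip_points, q):
    """DNL[k] = (W[k] - Q) / Q for each inner code bin (Eq. 1)."""
    return [(w - q) / q for w in code_widths(trip_points)]


def corrected_levels(trip_points):
    """Assumed correction: map each inner code to the midpoint of its bin."""
    t = sorted(trip_points)
    return [(lo + hi) / 2 for lo, hi in zip(t, t[1:])]


# Hypothetical trip-points, expressed in units of the ideal code width Q = 1.
trips = [1.0, 2.2, 2.9, 4.0]
print(code_widths(trips))       # measured widths of the three inner bins
print(dnl(trips, q=1.0))        # DNL before any digital correction
print(corrected_levels(trips))  # non-uniform output values after correction
```

The corrected levels are not integers, which is one way to see why the corrected code words need a longer word length than the raw output.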

The third calibration method, which has been utilized in this work, selects the subset of comparators whose trip-points best match the ideal locations for a reduced number of code bins, as in Eq. 3, where ε is the trip-point deviation from the ideal location and N_subset is the number of comparators in the subset. This corresponds to minimizing the integral non-linearity (INL) error. The cost of this calibration method is the overhead needed for the redundant comparators. However, the power of the disabled comparators can be gated in order to reduce the total power dissipation.

$$ \min \left( \varepsilon [k] \right)\,{\text{for}}\,k = 1 \ldots N_{\text{subset}} $$
(3)
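One possible way to carry out the selection in Eq. 3 is sketched below in Python. This is a simplified greedy search that assigns exactly one comparator to each ideal trip-point; the search strategy, the function name and the example values are assumptions, as the text does not prescribe a particular algorithm.

```python
def select_subset(measured, ideal):
    """Greedy sketch: for each ideal trip-point, enable the unused
    comparator whose measured trip-point has the smallest deviation."""
    available = list(range(len(measured)))
    chosen = []
    for target in ideal:
        best = min(available, key=lambda i: abs(measured[i] - target))
        available.remove(best)  # each comparator is used at most once
        chosen.append(best)
    return chosen


# Hypothetical measured trip-points of 7 comparators (normalized input range)
measured = [0.05, 0.31, 0.26, 0.52, 0.49, 0.90, 0.74]
# Ideal trip-points of a 2-bit converter on the same range
ideal = [0.25, 0.50, 0.75]

enabled = select_subset(measured, ideal)
print(enabled)  # indices of the comparators to keep enabled
print([measured[i] - t for i, t in zip(enabled, ideal)])  # residual deviations
```

The comparators whose indices are not returned would be the ones whose clock is gated.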

The expected performance of the different calibration methods is shown in Fig. 1 for a 6-bit Flash ADC, where the mean number of effective bits achieved in a 1,000-point Monte Carlo simulation is plotted as a function of the trip-point standard deviation. In each Monte Carlo iteration, the trip-point locations are normally distributed around their ideal locations with the corresponding standard deviation. Note that the distribution of trip-points is the only source of error in the Monte Carlo simulation; dynamic errors and further static errors in the fabricated ADC will reduce the achievable effective resolution further.

Fig. 1 Mean achievable effective resolution of a 6-bit Flash ADC using different calibration techniques
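The kind of Monte Carlo experiment behind Fig. 1 can be sketched in Python for the uncalibrated case only. The sketch below is not the authors' simulation: it assumes a ramp (uniformly distributed) input, a summing decoder, and an effective resolution defined from the RMS quantization error relative to a full-scale range of 1. All function and parameter names are hypothetical.

```python
import bisect
import math
import random


def effective_bits(trip_points, n_bits, n_samples=4096):
    """Effective bits from the RMS error of a ramp input on [0, 1).
    The output code is the number of tripped comparators (summing decoder)."""
    q = 1.0 / 2 ** n_bits
    trips = sorted(trip_points)
    err_sq = 0.0
    for i in range(n_samples):
        x = (i + 0.5) / n_samples
        code = bisect.bisect_right(trips, x)      # comparators with trip <= x
        err_sq += ((code + 0.5) * q - x) ** 2     # ideal reconstruction level
    rms = math.sqrt(err_sq / n_samples)
    # An ideal quantizer has rms = q / sqrt(12), which gives exactly n_bits.
    return math.log2(1.0 / (math.sqrt(12.0) * rms))


def mean_effective_bits(sigma_lsb, n_bits=6, trials=1000, seed=0):
    """Mean effective bits when every trip-point is normally distributed
    around its ideal location with standard deviation sigma_lsb (in LSBs)."""
    rng = random.Random(seed)
    q = 1.0 / 2 ** n_bits
    total = 0.0
    for _ in range(trials):
        trips = [k * q + rng.gauss(0.0, sigma_lsb * q)
                 for k in range(1, 2 ** n_bits)]
        total += effective_bits(trips, n_bits)
    return total / trials


# A small number of trials keeps the example fast; the text uses 1,000.
for sigma in (0.0, 0.5, 2.0):
    print(sigma, mean_effective_bits(sigma, trials=50))
```

A calibration method could be evaluated in the same framework by transforming the trip-points, or the reconstruction levels, before calling `effective_bits`.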

It can be seen that the digital trimming method achieves the nominal resolution with a trimming accuracy of LSB/2. The cost of the method would increase with increasing comparator offset as the trimming range would need to follow.

The performance achieved with digital correction and with selection of the optimal set follow each other closely, differing by 0.25 bits and flattening out towards increasing offsets. The redundancy required for achieving a certain resolution is about 2 bits, or a factor of 4 in the number of comparators. This ratio also holds for a higher number of comparators, as can be seen in Fig. 2, which plots the resolution for a 10-bit Flash ADC. There, the x-axis has the same absolute scale as in Fig. 1, i.e., the same offset in volts gives rise to a higher LSB error for the 10-bit case. All calibration methods achieve a significant increase in effective resolution compared to the case of no calibration. To put the x-axis scale into perspective, the comparators of the fabricated ADC have an offset standard deviation of 11 LSBs referred to a 6-bit ADC.

Fig. 2 Mean achievable effective resolutions for a 10-bit Flash ADC

3 ADC architecture

Figure 3 shows the block diagram of the implemented ADC architecture, which has a target resolution of 4 bits. The input signal is buffered through a source follower stage to reduce kick-back effects and to drive the comparator array, which consists of 63 comparators corresponding to a maximum resolution of 6 bits. As the accuracy of the individual comparators in the ADC is degraded by process variations, the output from the comparator array will not be a perfect thermometer code. Since the comparator offsets could easily be several LSBs, complicated bubble-suppression logic is avoided by using a summing decoder to efficiently convert the output vector into binary.

Fig. 3 Flash ADC architecture

3.1 Comparator

The ADC uses a differential-pair sense-amplifier based comparator as shown in Fig. 4. With 63 comparators in the array, the number of comparators loading the input buffer and reference ladder is about four times what is ideally needed to achieve 4-bit resolution. This increases the kick-back charge from the comparators to the input buffer and reference network. Compared to a typical sense-amplifier comparator that has a clocked tail NMOS transistor, the proposed ADC uses a comparator with the clocked transistors placed between the cross-coupled inverter pair and the input transistors. This prevents the source and drain of the input transistors from being pre-charged, since the simultaneous discharge of these nodes causes common-mode kick-back to the input. With the increased number of comparators needed for redundancy, this common-mode kick-back would cause too large transients on the reference and input voltages. By placing the clocked transistors above the input pair, the pre-charge of these nodes is prevented and the common-mode kick-back charge is reduced by 6×, reducing the power dissipation in the reference ladder and input buffer by the same amount [14].

Fig. 4 Differential pair sense-amplifier based comparator

As the accuracy of the comparators is handled by redundancy, the transistor sizing can be done without regard for mismatch with the focus only on speed and power. For this design, the transistor widths were smaller than 1 μm with the exception of the equalization PMOS which was 1.45 μm in order to remove the memory effects. The small transistor sizes allow ADCs of this architecture to take full advantage of the decreased feature sizes in future CMOS technologies.

The chosen calibration method for this design is to select an optimal set of comparators because of the low-power potential of this method. The comparators that do not contribute to an increase in the effective resolution are disabled using clock gating in order to minimize the power dissipation. The clock gating consists of a NAND gate acting as a local clock driver for each comparator, as can be seen in Fig. 5. The NAND gate is chosen to emphasize the rising edge of the clock at minimal clock load. When disabled, a comparator is kept in the latching state, so changes in the input signal will neither affect the output nor dissipate power due to switching. Since the latched output could be either high or low, the enable signal is also used to reset the output signal in the subsequent latch to prevent the disabled comparator from biasing the final output. The enable signals are controlled by a register, and a serial-to-parallel-interface (SPI) is used to program the register through an external port.

Fig. 5 External SPI control interface and clock-gating circuit

The references to the comparators are generated by a differential resistor ladder. The resistor ladder is implemented with capacitively decoupled internal nodes in order to further reduce the transient effect that both common-mode and differential kick-back cause.

3.2 Decoder

In a traditional ADC, the output of the comparator array is thermometer coded since the accuracy of the individual comparators is well controlled. When the accuracy requirements of the comparators can be ignored, due to the use of redundancy, the output will no longer be thermometer coded because of the large expected mismatch-induced comparator offsets. The output vector is therefore converted to binary by a summing decoder, which counts the number of asserted comparator outputs and thus disregards the internal comparator order. This is implemented as a 6-bit Wallace Tree decoder as shown in Fig. 6.

Fig. 6 63-to-6 bit Wallace Tree decoder

As the critical path of a 6-bit Wallace Tree decoder is 9 Full Adders [15], the decoder is pipelined to handle the throughput of a multi-GS/s ADC [10].
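The behavior of a summing decoder built from full adders can be sketched in Python as follows. This is a behavioral model only: it reduces the 63 comparator outputs column by column with full adders, but the adder ordering, and hence the critical path, is an assumption and is not claimed to match the structure in Fig. 6. The function names are hypothetical.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def ones_counter(bits):
    """Count the asserted bits with a tree of full adders.
    columns[w] holds the pending bits of weight 2**w.
    Returns (count, number_of_full_adders_used)."""
    columns = [list(bits)]
    n_adders = 0
    w = 0
    while w < len(columns):
        col = columns[w]
        while len(col) > 1:
            a = col.pop(0)
            b = col.pop(0)
            c = col.pop(0) if col else 0   # third input tied low if absent
            s, carry = full_adder(a, b, c)
            n_adders += 1
            col.append(s)                  # sum stays in the same column
            if w + 1 == len(columns):
                columns.append([])
            columns[w + 1].append(carry)   # carry moves one weight up
        w += 1
    value = sum(col[0] << k for k, col in enumerate(columns) if col)
    return value, n_adders


# A comparator output with "bubbles" (not a valid thermometer code):
vector = [1] * 20 + [0, 1, 0, 1, 1] + [0] * 38   # 63 comparator outputs
print(ones_counter(vector))  # the count ignores the order of the ones
```

Because only the number of ones matters, permuting the comparator outputs leaves the decoded value unchanged, which is the property that makes bubble suppression unnecessary.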

The full-adders used in the Wallace Tree decoder are the transmission gate based full adders shown in Fig. 7, which are chosen for the best resulting power-performance trade-off for the entire pipelined decoder based upon a target sampling frequency of 3 GS/s.

Fig. 7 A transmission gate full adder cell

3.3 Calibration

To calibrate the ADC for 4-bit, low-power operation, the trip-point of each comparator is characterized. This is controlled externally by a computer, which characterizes the comparators one by one through a calibration port. Once the trip-points are known, the comparators required for 4-bit performance are selected by minimizing the INL, that is, by making the distance between the trip-points of each neighboring comparator pair as constant as possible, as described in Sect. 2. The rest of the comparators are disabled to save power. The architecture allows any number of comparators to be enabled to provide the desired 4-bit functionality.

4 Measurement results

The ADC was designed and manufactured in a 7-metal, 90 nm CMOS process, and the active die area is 0.04 mm². For measurements, the die was bonded directly onto a PCB, shown in Fig. 8, to reduce the parasitics. The micrograph of the chip, which has a die size of 1 × 1 mm, is shown in Fig. 9.

Fig. 8 The PCB with the directly bonded die

Fig. 9 Chip micrograph

With the ADC operating at a sampling frequency of 2.5 GS/s, the maximum effective number of bits (ENOB) achieved after calibration was 3.94 bits when 27 comparators were enabled. This was measured for an input signal of 1 MHz. The differential (DNL) and integral (INL) non-linearities, expressed in LSBs of the 4-bit output, are given in Fig. 10. The maximum absolute DNL and INL were 0.48 and 0.54 LSBs, respectively.

Fig. 10 Differential non-linearity (DNL) and integral non-linearity (INL) of the ADC

For low input frequencies, the effective number of bits was above 3.87 for sampling frequencies up to 2.8 GS/s, as seen in Fig. 11. Above this, the decoder starts to experience timing problems, and spikes appear at the output.

Fig. 11 Effective number of bits and SNDR versus sampling frequency

Measuring the trip-points of the comparators gave an offset-voltage standard deviation of 50 mV. With a 300 mV input range, this corresponds to 10.7 LSBs referred to 6-bit resolution, given the 63 comparators. The effective resolution can be compared to the expected mean value predicted by the Monte Carlo simulations in Sect. 2, which for an offset of 10.7 LSBs was 4.2 effective bits.
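As a worked check of this conversion, and assuming that the 6-bit LSB is the 300 mV input range divided by 2^6 (the symbols σ_offset and V_LSB,6 are introduced here only for illustration):

$$ {\frac{\sigma_{\text{offset}}}{V_{\text{LSB,6}}}} = {\frac{50\,{\text{mV}}}{300\,{\text{mV}}/2^{6}}} = {\frac{50\,{\text{mV}}}{4.6875\,{\text{mV}}}} \approx 10.7\,{\text{LSBs}} $$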

For input frequencies between DC and Nyquist and a sampling frequency of 2.5 GS/s, the effective number of bits and the signal-to-noise-and-distortion ratio (SNDR) are depicted in Fig. 12. The effective resolution bandwidth (ERBW), at which the SNDR has decreased by 3 dB, was 300 MHz.

Fig. 12 Effective number of bits and SNDR versus input frequency

Note that at 2.8 GS/s the ADC achieves 3.98 effective bits and dissipates 34.4 mW, but the effective resolution bandwidth is degraded compared to that at sampling rates up to 2.5 GS/s.

The total power dissipation of the ADC, measured at 2.5 GS/s with a power supply voltage of 1.2 V, was 30.2 mW. Of this, 16.8 mW (55%) was used for clocking and clock generation, 9.4 mW (31%) for the input buffer, and 0.25 mW (1%) for the reference generation, while 1.7 mW (5.5%) was dissipated in the comparator array and 2.4 mW (7.5%) in the digital logic.

In order to get the output data off the chip with good signal integrity, the output data was downsampled by a factor of 32. For a sample rate of 2.5 GS/s and an input frequency of 1.3 MHz, the output signal and spectrum are shown in Figs. 13 and 14. The measured spurious-free dynamic range (SFDR) was 31.3 dBFS and was limited by the third harmonic. The SNDR from a 2,048-point FFT was 25.5 dB.

Fig. 13 Output waveform showing every 32nd sample with 256 sample points at 2.5 GS/s and an input frequency of 1.3 MHz

Fig. 14 Output spectrum showing the fundamental and harmonics marked with circles. The SFDR is 31.3 dBFS and the SNDR is 25.5 dB

The performance of the ADC is summarized in Table 1 together with other state-of-the-art ADCs with sampling rates above 2 GS/s. Here, two common figures of merit are used to compare ADC performance, FoM1 and FoM2 which are given in Eqs. 4 and 5.

$$ {\text{FoM1}} = {\frac{P}{{2^{\text{ENOB}} \cdot F_{s}}}} $$
(4)
$$ {\text{FoM2}} = {\frac{P}{{2^{\text{ENOB}} \cdot 2 \cdot {\text{ERBW}}}}} $$
(5)

where P is the power dissipation, ENOB the effective number of bits, F_s the sampling frequency, and ERBW the effective resolution bandwidth.

Table 1 Performance comparison

The authors of [6, 10, 17, 18, 20, 21] also report effective resolution bandwidths lower than the Nyquist frequency. The most common cause is the increased effect of sampling jitter at higher input frequencies. In our design, however, the problem is related to an error in the layout of the input buffer, which causes the two differential source follower circuits to behave differently. For the next generation of the chip, the bandwidth performance will be improved and is expected to come closer to that indicated by the simulations with extracted parasitics, which is larger than the Nyquist frequency. Although the bandwidth was reduced, the proposed ADC achieves a FoM1 of 0.79 pJ/Conversion-step and a FoM2 of 3.28 pJ/Conversion-step, showing comparator redundancy to be an efficient method for designing low-power flash ADCs.
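The two figures of merit can be reproduced from the measured values reported above (30.2 mW, 3.94 effective bits, 2.5 GS/s and a 300 MHz ERBW) with a short Python sketch of Eqs. 4 and 5. The function names are hypothetical.

```python
def fom1(power_w, enob, fs_hz):
    """Eq. 4: energy per conversion step based on the sampling frequency."""
    return power_w / (2 ** enob * fs_hz)


def fom2(power_w, enob, erbw_hz):
    """Eq. 5: energy per conversion step based on twice the ERBW."""
    return power_w / (2 ** enob * 2 * erbw_hz)


P = 30.2e-3    # W, total power at 2.5 GS/s
ENOB = 3.94    # effective bits after calibration
FS = 2.5e9     # samples per second
ERBW = 300e6   # Hz

print(f"FoM1 = {fom1(P, ENOB, FS) * 1e12:.2f} pJ/conversion-step")
print(f"FoM2 = {fom2(P, ENOB, ERBW) * 1e12:.2f} pJ/conversion-step")
```

FoM2 is larger than FoM1 here because twice the ERBW (600 MHz) is well below the 2.5 GS/s sampling frequency.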

5 Conclusion

Through the use of redundancy and the choice of calibration method, a low-power, high-speed flash ADC has been designed. The redundancy allowed the comparators to be designed with trade-offs only between speed and power. The use of ultra-low-power comparators resulted in a 2.5 GS/s, 4-bit ADC that dissipates 30.2 mW of power. The resulting FoM1 of 0.79 pJ/Conversion-step is, to the best of the authors' knowledge, the best reported for flash ADCs with sampling rates over 2 GS/s. Although the bandwidth was reduced, the FoM2 is still among the state of the art for high-speed flash ADCs.