1 Introduction

Impulse radio ultra-wideband (IR-UWB) is an attractive technology for short-range wireless communication, for example in wireless sensor networks (WSNs). Conventional WSN technologies, e.g., Zigbee and WLAN, can locate sensor nodes with an accuracy of about 2 m [1]. In contrast, an IR-UWB system can locate them with an accuracy of about 30 cm [1, 2], because a wider signal bandwidth gives a higher locating accuracy. In 2002, the Federal Communications Commission (FCC) authorized the use of UWB signals for communication in the band from 3.1 to 10.6 GHz with a minimum bandwidth, B, of 500 MHz [3]. To avoid the 5-GHz WLAN band, UWB systems operating in the 3–5 GHz and 6–10.6 GHz bands have received the most attention. In this design, the targeted UWB signal band spans 4.3 to 4.8 GHz.

The conventional receiver analog front end (AFE) of 3–5 GHz IR-UWB systems consumes almost 100 mW [3, 4], which is too much for WSNs because the sensor nodes are battery-powered. Reducing the power consumption of the AFE extends battery life, which enables high-capacity, large-scale WSNs. This cannot be achieved with conventional wireless receiver architectures such as the heterodyne receiver. One solution for reducing power consumption is the asynchronous receiver [2], which consumes less power than its synchronous counterpart. However, the asynchronous receiver handles multiple access poorly: it detects only the received signal power, so when several transmitters transmit at the same time, it cannot distinguish among them. Another low-power solution uses the super-regenerative principle [5]. This technique is based on an LC-VCO that is turned on briefly during each unit interval. If an input carrier pulse is present, the VCO starts up quickly; if no carrier pulse is present, the VCO takes longer to start up. By detecting the envelope of the VCO output and comparing the envelope peak with an appropriate reference, the bit sequence can be detected with a reasonable bit error rate. In this case, high-speed data conversion is replaced by envelope detection.

In this design, a direct sub-sampling receiver configuration is proposed to improve energy efficiency by eliminating the down-conversion stage. The analog-to-digital converter (ADC) is the key component of such an IR-UWB receiver, and its demanding speed, accuracy and power requirements make the front-end design challenging [6]. In most ADCs the input signal frequency is kept below half of the sampling frequency, as required by the Nyquist sampling theorem. The obvious way to capture an input Gaussian pulse whose band spans 4.3 to 4.8 GHz is therefore an ADC sampling faster than 9.6 GHz. However, such a converter consumes hundreds of milliwatts even at low resolution [7]. Therefore, a low-power sub-sampling ADC is presented in this paper.

Section 2 derives the ADC specifications, including sampling rate and resolution, from the noise limitations of the IR-UWB receiver. Section 3 briefly presents the parallel flash ADC architecture. Section 4 explains the averaging network with a termination based on an asymmetric spatial filter response. Section 5 focuses on circuit techniques: the self-biased track-and-hold amplifier (THA) used as the front-end sampling circuit for the parallel channels, and a new offset-compensated comparator. Section 6 discusses the post-simulation results, and Sect. 7 concludes the paper with a brief summary.

The present paper is an extended version of [8] with additional details about system level modeling and circuit design techniques.

2 ADC specifications for IR-UWB receiver

ADC specifications such as sampling rate and resolution are analyzed and optimized at the level of the overall receiver. Figure 1 shows the non-coherent [9] direct sub-sampling transceiver configuration. The low-noise amplifier (LNA) and variable-gain amplifier (VGA) amplify the power received from the antenna without degrading the noise performance. The gain-control feedback loop includes a power detector that sets the VGA gain. The LNA gain can be bypassed when the input signal approaches its maximum level, to ensure stability. The down-converter is replaced directly by the ADC, with no mixer or local oscillator, which simplifies the front-end design and reduces power consumption. The signal transmitted by the TXs at 0.5 V peak, corresponding to 4 dBm as specified by the WSN system requirements, is attenuated according to the wireless distance d. In the worst case, with 10 m between the TXs and RXs, the attenuation is around 65 dB for \(f_H = 4.8\) GHz and \(f_L = 4.3\) GHz:

$$ L = 20\lg \left( {\frac{c}{{4\pi d\sqrt {{f_H}{f_L}} }}} \right) = - 65.5\;{\text{dB}}, $$
(1)

which is the band-pass form of the Friis path-loss equation [10]. Here \(f_H\) and \(f_L\) are the upper and lower band edges of the bandpass signal, and c is the speed of light in air. If the RF front end provides a maximum power gain \(G_{max}\) of 50 dB, combined from the LNA and VGA, the minimum peak signal at the ADC input is about 80 mV. The ADC full-scale range is therefore designed as 400 mVpp, or −4 dBm, to leave sufficient margin. All power levels here refer to the same impedance, 50 Ω. The signal and noise levels along the direct sub-sampling receiver are shown in Fig. 2. Assuming that the noise figures of the receiver front end (\(NF_{FE}\)) and of the overall receiver including the ADC (\(NF_{tot}\)) are 6 and 10 dB respectively, and that the minimum receiver output signal-to-noise ratio (SNR) is 15 dB, the RX sensitivity is −62 dBm. Under the maximum power transfer condition, the input impedance equals the source impedance, \(R_S = R_{in}\), and the output noise power density is \({{v_{n,out}^2} \left/ {{R_{in}}} \right.} = kT\), where \(v_{n,out}^2\) is the output noise power density, k is the Boltzmann constant and T is the absolute temperature. At room temperature kT equals −174 dBm/Hz. The corresponding noise power is kTB, where B is the passband bandwidth; this level, −87 dBm, is shown in Fig. 2 as the starting noise level. Now consider the noise behavior of a single block. From the definition of the noise factor, the output noise power is found as

$$ F = \frac{{SN{R_{in}}}}{{SN{R_{out}}}} = \frac{{\frac{P_{in}}{P_{n,in}}}}{{\frac{P_{out}}{P_{n,out}}}}, {P_{n,out}} = {P_{n,in}} \times G \times F.$$
(2)

From Eq. 2, the input noise power is amplified by the power gain and further increased by the noise factor F. The −87 dBm starting level, raised by the 50 dB gain and by the front-end noise figure (\(NF_{FE}\), the logarithm of F), becomes −31 dBm at the output of the RX front end; this is \(P_{n,out,FE}\). The ADC operates as a cascaded stage after the RX front end. If gain error is neglected, the ADC power gain \(G_{ADC}\) is one. The noise power at the ADC output, \(P_{n,out,tot}\), is therefore the input noise raised by the gain and by \(NF_{tot}\) of 10 dB, giving an RX output noise power of −27 dBm. The difference between \(P_{n,out,tot}\) and \(P_{n,out,FE}\), taken in linear power units, is −29.2 dBm and comes from the non-idealities of the ADC, mainly quantization noise. Finally, the distance between the full-scale power level and this quantization noise level determines the SNR of the ADC: 25.2 dB, which requires a 4-bit quantizer. The same result is reported in [11], derived from a more complicated model that includes the digital baseband algorithm.
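The signal-level figures quoted in this section (4 dBm TX level, the Eq. 1 path loss, about 80 mV at the ADC input, −4 dBm full scale, −87 dBm noise floor and −62 dBm sensitivity) can be cross-checked with a short script. This is a minimal sketch, not the authors' tool; it assumes sinusoidal signals into the 50 Ω reference impedance and T = 290 K so that kT ≈ −174 dBm/Hz, and the helper names are hypothetical.

```python
import math

C = 299_792_458.0   # speed of light (m/s)
K_B = 1.380649e-23  # Boltzmann constant (J/K)
R_REF = 50.0        # reference impedance (ohm)

def w_to_dbm(p_w):
    return 10 * math.log10(p_w / 1e-3)

def dbm_to_w(p_dbm):
    return 1e-3 * 10 ** (p_dbm / 10)

def sine_peak_to_dbm(v_peak):
    """Average power of a sine with the given peak voltage into R_REF."""
    return w_to_dbm((v_peak / math.sqrt(2)) ** 2 / R_REF)

def bandpass_friis_db(d_m, f_l, f_h):
    """Eq. (1): free-space loss evaluated at the geometric-mean frequency."""
    return 20 * math.log10(C / (4 * math.pi * d_m * math.sqrt(f_h * f_l)))

f_l, f_h = 4.3e9, 4.8e9
bw = f_h - f_l

p_tx = sine_peak_to_dbm(0.5)                          # 0.5 V peak -> about +4 dBm
loss = bandpass_friis_db(10.0, f_l, f_h)              # about -65.6 dB at 10 m
p_adc = p_tx + loss + 50.0                            # after G_max = 50 dB
v_adc_peak = math.sqrt(2 * R_REF * dbm_to_w(p_adc))   # peak voltage at the ADC input
p_fs = sine_peak_to_dbm(0.2)                          # 400 mVpp -> about -4 dBm
p_noise = w_to_dbm(K_B * 290.0 * bw)                  # kTB -> about -87 dBm
sensitivity = p_noise + 10.0 + 15.0                   # + NF_tot + minimum SNR

print(f"TX {p_tx:.1f} dBm, loss {loss:.1f} dB, ADC input {v_adc_peak * 1e3:.0f} mV peak")
print(f"full scale {p_fs:.1f} dBm, noise {p_noise:.1f} dBm, sensitivity {sensitivity:.1f} dBm")
```

With these assumptions the path loss comes out at about −65.6 dB and the ADC input at a little over 80 mV peak, matching the rounded values in the text.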

Fig. 1
Architecture of the proposed IR-UWB transceiver

Fig. 2
Signal budget link transfer graph of the proposed IR-UWB receiver
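The noise bookkeeping of Fig. 2 and Eq. 2 can be reproduced in dB arithmetic. The sketch below (hypothetical names) applies Eq. 2 in logarithmic form, isolates the ADC's own noise by subtracting powers in linear units, and converts the resulting SNR to bits with the standard ideal-quantizer relation SNR = 6.02N + 1.76 dB for a full-scale sine, which the text does not state explicitly.

```python
import math

def dbm_to_mw(p_dbm):
    return 10 ** (p_dbm / 10)

def mw_to_dbm(p_mw):
    return 10 * math.log10(p_mw)

P_N_IN = -87.0    # kTB over 500 MHz (dBm)
GAIN_DB = 50.0    # G_max of LNA + VGA
NF_FE_DB = 6.0    # front-end noise figure
NF_TOT_DB = 10.0  # overall noise figure including the ADC (G_ADC = 1)
P_FS = -4.0       # ADC full-scale power (dBm)

# Eq. (2) in dB: P_n,out = P_n,in + G + NF
p_n_fe = P_N_IN + GAIN_DB + NF_FE_DB     # -31 dBm at the front-end output
p_n_tot = P_N_IN + GAIN_DB + NF_TOT_DB   # -27 dBm at the ADC output

# Noise contributed by the ADC: subtract in linear units, not in dB
p_n_adc = mw_to_dbm(dbm_to_mw(p_n_tot) - dbm_to_mw(p_n_fe))

snr_adc = P_FS - p_n_adc                 # required ADC SNR
bits = (snr_adc - 1.76) / 6.02           # ideal quantizer: SNR = 6.02 N + 1.76 dB
n_bits = math.ceil(bits)                 # round up to a whole number of bits

print(f"P_n,ADC = {p_n_adc:.1f} dBm, SNR = {snr_adc:.1f} dB, {bits:.2f} -> {n_bits} bits")
```

The fractional result (about 3.9 bits) is rounded up, which is how the 25.2 dB requirement maps to a 4-bit quantizer.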

Another important parameter is the sampling rate. As mentioned before, the ADC sub-samples the input bandpass signal spanning 4.3 to 4.8 GHz. The resulting spectral aliasing is shown in Fig. 3. To ensure that the k-th and (k+1)-th aliases of the negative-frequency image do not overlap the original bandpass signal, the following conditions must hold:

$$ \begin{aligned} & - f_L + kf_s \le f_L,\\ & - f_H + \left( {k + 1} \right)f_s \ge f_H,\\ \end{aligned} $$
(3)

where \(f_s\) is the sampling rate and \(f_L\), \(f_H\), as mentioned previously, are the minimum and maximum frequencies of the bandpass signal, respectively. Solving for \(f_s\) gives

$$ f_s \in \bigcup_{k = 0}^{\left\lfloor f_L/B \right\rfloor} \left[ \frac{2f_H}{k + 1},\; \frac{2f_L}{k} \right], \qquad B = f_H - f_L. $$
(4)
Fig. 3
Sampling rate based on bandpass Nyquist theory: spectrum alias of sampled bandpass signal

Here \(\lfloor \cdot \rfloor\) is the floor (round-down) function, and Eq. 4 describes a series of discrete regions available for bandpass sampling [12]. When k equals zero, the upper bound is unbounded and Eq. 4 reduces to the conventional case \(f_s \ge 2f_H\), which is straightforward but not efficient. On the other hand, a UWB impulse is so narrow in the time domain that it does not need to be sampled densely. Consider the second-order derivative Gaussian impulse with 500 MHz bandwidth in the 4 GHz band, for which [13, 14]

$$ \begin{aligned} 2\pi B &\approx \frac{4}{\tau},\\ {T_{win}} &= 2\tau,\\ \end{aligned} $$
(5)

where τ is the pulse time constant and \(T_{win}\) is the time window in which most of the pulse energy is concentrated; here the pulse remains significant within a 2.6 ns window. If it is sampled by a 2 GHz clock, five points are captured, so the pulse is acquired without information loss. A sampling rate of 2.112 GHz is therefore chosen, corresponding to k = 4 in Eq. 4. The selected sampling rate is located in Fig. 4; Fig. 4(b) shows the frequency sections in detail for k larger than 3. The available frequency range consists of a series of disjoint sections that depend on k and \(f_L/B\).
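Eq. 5 and the sample count can be checked numerically. In this sketch, the minimum number of uniformly spaced sampling instants that fall inside any window of length \(T_{win}\) is \(\lfloor T_{win} f_s \rfloor\). The computed window is about 2.55 ns, slightly shorter than the 2.6 ns quoted, and still holds five samples at both 2 GHz and 2.112 GHz.

```python
import math

B = 500e6                     # pulse bandwidth (Hz)
tau = 4 / (2 * math.pi * B)   # Eq. (5): 2*pi*B ≈ 4/tau
t_win = 2 * tau               # window holding most of the pulse energy

samples_in_window = {}
for fs in (2.0e9, 2.112e9):
    # Minimum number of uniformly spaced samples inside any window of length t_win
    samples_in_window[fs] = math.floor(t_win * fs)
    print(f"fs = {fs / 1e9:.3f} GHz: T_win = {t_win * 1e9:.2f} ns "
          f"-> at least {samples_in_window[fs]} samples")
```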

Fig. 4
Available sampling-rate sections for bandpass sampling without aliasing: (a) sections for k from 0 to 8; (b) detailed positions for k larger than 3
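The discrete regions of Eq. 4 and the position of the chosen rate can be enumerated directly. The sketch below (hypothetical function names) lists the non-empty intervals for the 4.3–4.8 GHz band, confirms that 2.112 GHz falls in the k = 4 region (1.92 to 2.15 GHz), and shows where the band edges land after sampling.

```python
import math

def bandpass_sampling_regions(f_l, f_h):
    """Eq. (4): alias-free intervals 2*f_H/(k+1) <= fs <= 2*f_L/k, k = 0..floor(f_L/B)."""
    bw = f_h - f_l
    regions = []
    for k in range(math.floor(f_l / bw) + 1):
        lo = 2 * f_h / (k + 1)
        hi = math.inf if k == 0 else 2 * f_l / k  # k = 0 is ordinary Nyquist sampling
        if lo <= hi:                              # keep only non-empty intervals
            regions.append((k, lo, hi))
    return regions

def fold(f, fs):
    """Frequency (0 .. fs/2) at which a tone at f appears after sampling at fs."""
    r = f % fs
    return min(r, fs - r)

F_L, F_H, FS = 4.3e9, 4.8e9, 2.112e9
regions = bandpass_sampling_regions(F_L, F_H)
for k, lo, hi in regions:
    print(f"k = {k}: {lo / 1e9:.4f} GHz <= fs <= {hi / 1e9:.4f} GHz")

k_sel = [k for k, lo, hi in regions if lo <= FS <= hi]
print("2.112 GHz falls in region k =", k_sel)
print(f"band edges fold to {fold(F_L, FS) / 1e6:.0f} and {fold(F_H, FS) / 1e6:.0f} MHz")
```

The band then occupies 76–576 MHz, inside the first Nyquist zone (0 to 1.056 GHz) and without spectral inversion, since both edges are shifted down by the same multiple of \(f_s\).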

3 ADC architecture

To achieve the required data throughput, two parallel channels are controlled by dual non-overlapping clocks [15]; this is also called a two-channel time-interleaved architecture. It relaxes the bandwidth requirements of the individual channels, except for the front-end track-and-hold blocks. Parallel channels reduce power consumption in two respects:

(1) Each channel sees a smaller parasitic capacitance. Parasitic capacitance degrades the bandwidth, especially at high sampling rates, as the following first-order estimate shows [16]:

$$ \begin{aligned} 2T_{s,sub\_ch} &= \frac{N \ln 2\,(C_L + C_p)\,V_{ov}}{2 I_{D,sub}},\\ T_{s,single\_ch} &= \frac{N \ln 2\,(C_L + 2C_p)\,V_{ov}}{2 I_{D,single}},\\ \frac{P_{paral}}{P_{sing}} &\propto \frac{C_L + C_p}{C_L + 2C_p}. \end{aligned} $$
(6)

Here a first-order track-and-hold model is assumed. \(T_s\) is the settling time, N is the settling accuracy in bits, \(C_L\) and \(C_p\) are the load and parasitic capacitances respectively, \(V_{ov}\) is the overdrive voltage of the differential pair, and \(I_D\) is the DC bias current of the input transistor. With the same \(V_{ov}\), N and \(C_L\), a single-channel design needs more than twice the bias current of one sub-channel, because it must settle in half the time while also driving a larger parasitic capacitance. \(C_p\) degrades the bandwidth seriously when it becomes comparable to \(C_L\), which often happens in high-speed flash converters. Figure 5 shows that more than 30 % of the THA power can be saved when \(C_p\) equals \(C_L\).

(2) Switch non-idealities, for example charge injection and clock feedthrough, degrade the sampling performance more seriously at higher clock rates, so running each channel at half rate mitigates them.
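Eq. 6 can be evaluated directly. Normalizing \(C_L\) to one, the sketch below (hypothetical helper names) computes the parallel-to-single power ratio and the single-channel-to-sub-channel bias-current ratio that follows from Eq. 6 for equal \(T_s\), \(V_{ov}\), N and \(C_L\). At \(C_p = C_L\) the saving is one third, consistent with the more than 30 % shown in Fig. 5.

```python
def power_ratio(cp_over_cl):
    """Eq. (6): P_paral / P_sing = (C_L + C_p) / (C_L + 2 C_p) with C_L = 1."""
    return (1 + cp_over_cl) / (1 + 2 * cp_over_cl)

def current_ratio(cp_over_cl):
    """I_single / I_sub from Eq. (6): the single channel settles in T_s instead of 2 T_s."""
    return 2 * (1 + 2 * cp_over_cl) / (1 + cp_over_cl)

for x in (0.0, 0.25, 0.5, 1.0):
    print(f"Cp/CL = {x:.2f}: saving {1 - power_ratio(x):.1%}, "
          f"I_single/I_sub = {current_ratio(x):.2f}")
```

The two ratios are consistent: two sub-channels draw \(2 I_{D,sub}\), so \(P_{paral}/P_{sing} = 2/(I_{D,single}/I_{D,sub})\).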

The flash ADC architecture is shown in Fig. 6. An open-loop THA front end is implemented to provide the desired dynamic range. The first THA removes the clock mismatch between the parallel channels [17]. A source follower (SF) after the second THA buffers the sampled signal and isolates it from the back-end blocks. Interpolation [18] generates the required reference levels while reducing the number of front-end amplifiers (9 amplifiers in each channel). Offset-calibrated comparators (15 in each channel) quantize the analog values. To correct bubble errors [19] caused by high-speed glitches, the decoder first converts the thermometer code to Gray code and then maps it to binary code. All analog signal paths are fully differential.
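The thermometer-to-Gray-to-binary mapping can be sketched in software. This is a generic behavioral model, not the paper's gate-level decoder, and the function names are hypothetical. Gray bit k is set when the thermometer code lies in one of the intervals \([(4j+1)2^k, (4j+3)2^k)\), so each Gray bit depends only on pairs of comparator outputs; the Gray word is then converted to binary by a cumulative XOR. In the example at the end, a single bubble below the transition still decodes to the correct value, whereas simply counting the ones would be off by one.

```python
N_BITS = 4
N_COMP = 2**N_BITS - 1   # 15 comparators per channel

def thermometer(n):
    """Ideal code: therm[i] is comparator i+1, set when the input exceeds level i+1."""
    return [1 if i < n else 0 for i in range(N_COMP)]

def thermometer_to_gray(therm):
    def t(i):  # T_i, treated as 0 outside 1..N_COMP
        return therm[i - 1] if 1 <= i <= N_COMP else 0

    gray = 0
    for k in range(N_BITS):
        step = 1 << k
        bit, j = 0, 0
        while (4 * j + 1) * step <= N_COMP:
            # Gray bit k is 1 while the code lies in [(4j+1)*2^k, (4j+3)*2^k)
            bit |= t((4 * j + 1) * step) & (1 - t((4 * j + 3) * step))
            j += 1
        gray |= bit << k
    return gray

def gray_to_binary(g):
    b = g
    g >>= 1
    while g:
        b ^= g     # b = g ^ (g >> 1) ^ (g >> 2) ^ ...
        g >>= 1
    return b

def decode(therm):
    return gray_to_binary(thermometer_to_gray(therm))

bubbled = thermometer(5)
bubbled[2] = 0   # single bubble: comparator 3 reads 0 although the input is above it
print(decode(thermometer(9)), decode(bubbled), sum(bubbled))
```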

Fig. 5
The power consumption in THA is reduced by the parasitic capacitor

Fig. 6
2.112-GS/s 4-bit ADC architecture

4 Improved averaging termination using asymmetric spatial filter response

Since the input signal is sampled at gigahertz rates and digitized to no more than 4 bits, the straightforward full-flash architecture is best suited to this high-speed converter. However, flash ADCs in CMOS technology suffer greatly from random comparator offsets, which can easily exceed the least significant bit (LSB). The comparator offset is therefore usually required to satisfy

$$ 3\sigma {V_{os,comp}} \le \frac{1}{2}LSB = \frac{{{V_{FS}}}}{{{2^{N + 1}}}}. $$
(7)

Here \(\sigma_{V_{os,comp}}\) is the standard deviation (root-mean-square value) of the comparator offset, \(V_{FS}\) is the full-scale range of the converter and N is the resolution. This is an accepted bound on the offset [20]. Although the dynamic offset generated by the clocked latch can be reduced by a preamplifier, the threshold-voltage spread of the preamplifier always limits the overall performance [21]. This spread scales inversely with the square root of the input transistor size [22], but larger transistors add parasitics that reduce the available bandwidth. The key to achieving good resolution in a high-speed ADC therefore lies in efficient methods of resolving this speed–resolution trade-off.
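Plugging the 400 mVpp full scale from Section 2 into Eq. 7 gives the offset budget per comparator. The sketch below (hypothetical helper names) also applies the square-root area scaling stated above: reducing the offset spread by a factor r costs roughly r² in input-device area.

```python
V_FS = 0.4  # full-scale range, 400 mVpp (Section 2)

def max_offset_sigma(v_fs, n_bits):
    """Eq. (7): 3*sigma <= LSB/2 = V_FS / 2^(N+1)."""
    half_lsb = v_fs / 2 ** (n_bits + 1)
    return half_lsb / 3

def area_scale_for_sigma(sigma_now, sigma_target):
    """sigma ∝ 1/sqrt(area): area must grow by (sigma_now / sigma_target)^2."""
    return (sigma_now / sigma_target) ** 2

for n in (4, 5, 6):
    print(f"N = {n}: LSB = {V_FS / 2**n * 1e3:.2f} mV, "
          f"sigma_max = {max_offset_sigma(V_FS, n) * 1e3:.2f} mV")
```

For N = 4 this allows about 4.2 mV of offset spread; each extra bit halves the budget and, without averaging, quadruples the required device area.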

Offset averaging is one such method that can be applied to the preamplifier array in flash ADCs [21]. A well-designed averaging network can improve the offset performance of the array by a factor of 3–4 without enlarging the preamplifiers [22]. However, any averaging network produces threshold errors at its boundaries because of its finite extent. The resulting loss of more than 1 bit of linearity at the edges has motivated various averaging termination schemes that smooth out the random mismatch across the preamplifier array. Previous works [15, 23, 24] either add enough dummy amplifiers to preserve the infinite-array character or use special termination circuits to compensate the edge offset. However, a large number of dummies wastes full-scale range and consumes too much power, while special terminations require delicate control of the amplifier transconductance, which is not very efficient.
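To illustrate why averaging helps and why the array edges are a problem, the toy model below (our simplification, not the network of Fig. 7) treats the averaging network as a spatial filter whose weights decay geometrically, \(w_j \propto \alpha^{|j|}\), applied to independent unit-variance preamplifier offsets. The decay factor α = 0.7 is an arbitrary illustrative value, and the function name is hypothetical. Near an untreated edge the kernel is truncated to one side, so less averaging takes place; real networks additionally suffer a systematic zero-crossing shift at the boundary, which this random-offset model does not capture.

```python
import math

def offset_reduction(alpha, left, right):
    """Factor by which sigma shrinks under a normalized kernel w_j ∝ alpha**|j|, j = -left..right."""
    w = [alpha ** abs(j) for j in range(-left, right + 1)]
    total = sum(w)
    # Inputs are i.i.d. with unit variance, so the output variance is sum((w_j/total)^2)
    sigma_out = math.sqrt(sum((x / total) ** 2 for x in w))
    return 1 / sigma_out

ALPHA = 0.7   # hypothetical spatial decay factor per amplifier
SPAN = 50     # neighbours on each side; large enough to act as an infinite array
centre = offset_reduction(ALPHA, SPAN, SPAN)
edge = offset_reduction(ALPHA, 0, SPAN)   # untreated boundary: one-sided kernel
print(f"centre: {centre:.2f}x, edge: {edge:.2f}x")
```

For this arbitrary α the centre reduction happens to fall in the 3–4× range quoted above, while the edge of the array gets noticeably less benefit.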

The proposed termination technique generates an intentionally asymmetric spatial filter response that matches the impulse response window width, \(W_{IR}\), to the active zero-crossing response window width, \(W_{ZX}\), at the boundary of the network [25]. The basic principle of the spatial filter response is given in [26]. The termination needs fewer active dummies plus some passive components, and it reduces the boundary error to about 1 %, as shown in Fig. 7. Here the ZX generator array is composed of a series of preamplifiers, each connected to its neighbors through lateral averaging resistors \(R_1\). Only two dummies are added at the network termination, together with two termination resistors \(R_T\). Finally, the two terminations are cross-connected so that the network behaves like an infinite one.

Fig. 7
Averaging network for the flash converter: (a) network without any termination circuit; (b) network with the proposed termination based on an intentionally asymmetric spatial filter response

5 Analog circuit implementation

5.1 THA

As the front-end block of the sub-sampling ADC, the THA not only tracks and holds the analog signal but also removes the clock mismatch between channels. Figure 8 shows the twice-sampled THA employed. For simplicity, only the left side of the THA is analyzed. NMOS transistors \(M_1\) and \(M_2\) are used as sampling switches. \(M_1\) is switched by the main sampling clock clks, which sets the sampling rate of the overall converter, 2.112 GHz. \(M_2\), the switch of sub-channel 1, is controlled by clock clk1, which is derived from clks by division by 2. Channel 2 is switched by clk2, which is in anti-phase with clk1. The timing of clks, clk1 and clk2 is given together with the comparator clocks in Sect. 5.2. Dummy transistors reduce the effect of charge injection; their size is half that of the sampling switches, because at gigahertz sampling rates the injected charge splits roughly equally (a ratio approaching 0.5) between the two sides of the switch when it turns off. Self-biased source followers [11] \(M_3\) and \(M_4\) are adopted to guarantee sufficient gain and output swing with a 1.2 V supply. Self-biasing means that the tracked signal at the top plate of sampling capacitor \(C_s\) drives both the source-follower input transistor on one side and the current source on the other side of the THA at the same time. Figure 9 compares simplified schematics of the traditional SF and the self-biased SF. In both cases, PMOS input transistors are loaded by PMOS current sources. The traditional SF's current source is biased by a constant voltage, whereas in the self-biased SF it is biased by the cross-coupled input signal. The DC gain of the traditional SF in Fig. 9(a) is

$$ A_v = \frac{{{g_{m1,2}}}}{{{g_{m1,2}} + {g_{mb1,2}} + \frac{1}{{{r_{out1,2}}}} + \frac{1}{{{r_{out3,4}}}}}} < 1. $$
(8)

From small-signal AC analysis, the output of the self-biased SF satisfies

$$ \begin{aligned} &\left(v_{outp} - v_{inp}\right) g_{m1,2} + \frac{v_{outp}}{r_{out}} = -v_{inn}\, g_{m3,4},\\ &\left(v_{outn} - v_{inn}\right) g_{m1,2} + \frac{v_{outn}}{r_{out}} = -v_{inp}\, g_{m3,4},\\ &A_v = \frac{v_{outp} - v_{outn}}{v_{inp} - v_{inn}} = \frac{\left(g_{m1,2} + g_{m3,4}\right) r_{out}}{1 + g_{m1,2}\, r_{out}}, \qquad r_{out} \approx \frac{1}{g_{m1,2}}. \end{aligned} $$
(9)
Fig. 8
Twice sampled THA with self-bias source follower buffer for parallel ADC

Fig. 9
Simplified comparison between two types of source followers (SFs): (a) traditional SF; (b) self-biased SF
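Eqs. 8 and 9 can be compared numerically. The sketch below uses hypothetical small-signal values (not taken from the paper) and hypothetical function names. With \(r_{out} \approx 1/g_{m1,2}\) as in Eq. 9, the self-biased gain reduces to \((1 + g_{m3,4}/g_{m1,2})/2\), so it exceeds one once \(g_{m3,4} > g_{m1,2}\), while the conventional follower of Eq. 8 always stays below one.

```python
def av_traditional(gm, gmb, r_out12, r_out34):
    """Eq. (8): DC gain of the conventional SF; always below one."""
    return gm / (gm + gmb + 1 / r_out12 + 1 / r_out34)

def av_self_biased(gm12, gm34, r_out):
    """Eq. (9): differential DC gain of the self-biased SF."""
    return (gm12 + gm34) * r_out / (1 + gm12 * r_out)

GM12 = 10e-3  # hypothetical input-device transconductance (S)
print(f"traditional: {av_traditional(GM12, 1e-3, 2e3, 2e3):.3f}")
for ratio in (0.8, 1.0, 1.2):
    # r_out ≈ 1/g_m1,2 as in Eq. (9), so A_v ≈ (1 + g_m3,4/g_m1,2) / 2
    print(f"g_m3,4/g_m1,2 = {ratio}: {av_self_biased(GM12, ratio * GM12, 1 / GM12):.3f}")
```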

Here \(g_{m1,2}\) is the transconductance of the input transistor, \(r_{out}\) is the output node impedance, and \(g_{m3,4}\) is the transconductance of the load current source. According to Eq. 9, the gain can be raised above one by tuning \(g_{m3,4}\), which can compensate the gain loss of the passive high-pass filter (HPF) in front of the THA. The HPF shifts the input common-mode level from 600 mV down to 200 mV so that the THA input transistors operate in strong inversion and the sampling switches have an adequate drive voltage. Another important property of the self-biased SF is that it draws more current from the current source during slewing, which improves the slewing speed and results in faster settling. First, the equivalent small-signal model of the conventional SF combined with the passive track-and-hold stage is shown in Fig. 10. Here the output resistances are ignored, \(R_{on}\) is the on-resistance of the sampling switch, M is the input transistor of the SF, and \(g_m\) and \(C_{gs}\) are its transconductance and gate-source parasitic capacitance, respectively. \(g_{mb}\) comes from the back-gate (body) effect, which is minimized by the replica SF described below. The high-frequency behavior follows from the nodal equations

$$ \begin{array}{ll} &\frac{{{v_{in}} - {v_x}}}{{{R_{on}}}} = {v_x}s{C_s} + \left( {{v_x} - {v_{out}}} \right)s{C_{gs}}\\ &\left( {{v_x} - {v_{out}}} \right)s{C_{gs}} + {g_m}\left( {{v_x} - {v_{out}}} \right) = {v_{out}}s{C_L}\\ \end{array} $$
(10)

So the transfer function is

$$ \begin{aligned} A_v(s) &= \frac{1 + s\,\frac{C_{gs}}{g_m}}{1 + a s + b s^2},\\ a &= \frac{C_L + C_s R_{on} g_m + C_{gs}}{g_m},\\ b &= \frac{R_{on}\left(C_{gs}C_s + C_L C_s + C_L C_{gs}\right)}{g_m}. \end{aligned} $$
(11)

Under a dominant-pole approximation (\(w_b \approx 1/a\)), the bandwidth can be estimated as

$$ {w_b} \approx \frac{{{g_m}}}{{{C_L} + {C_{gs}} + {C_s}{R_{on}}{g_m}}} = \frac{1}{{\frac{{{C_L} + {C_{gs}}}}{{{g_m}}} + {C_s}{R_{on}}}} $$
(12)

Since \(C_s R_{on}\) is generally smaller than \((C_L + C_{gs})/g_m\), the latter dominates the bandwidth. The only way to enhance the bandwidth is then to increase the transconductance of the input transistor. However, this is not power efficient, because \(C_{gs}\) increases along with \(g_m\). The practical settling behavior can be divided into two consecutive phases, slewing and linear settling. In the traditional SF, whose current source is biased by a constant voltage, only the constant bias current is available for slewing. In the self-biased SF, however, the PMOS current sources are also driven by the signal, so a large input step boosts the slewing current, as shown in Fig. 11. Unlike enlarging \(g_m\) to speed up linear settling, this requires no additional static current. The entire settling behavior is plotted in Fig. 12; both speed and gain are clearly improved. Fig. 13 shows that the DC gain across the input range increases to a value slightly larger than one and that the unity-gain bandwidth (GBW) increases to 4 GHz.
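The quality of the Eq. 12 estimate can be checked against the full transfer function of Eq. 11. The sketch below uses hypothetical component values chosen so that \(C_s R_{on} < (C_L + C_{gs})/g_m\), as the text assumes, and finds the numerical −3 dB point by bisection on a log frequency scale (valid because this response is overdamped and falls monotonically). Function names are hypothetical.

```python
import math

def tha_sf_response(w, gm, cgs, cl, cs, ron):
    """|A_v(jw)| from Eq. (11)."""
    a = (cl + cs * ron * gm + cgs) / gm
    b = ron * (cgs * cs + cl * cs + cl * cgs) / gm
    s = 1j * w
    return abs((1 + s * cgs / gm) / (1 + a * s + b * s * s))

def wb_estimate(gm, cgs, cl, cs, ron):
    """Eq. (12): dominant-pole estimate w_b ≈ 1/a."""
    return 1 / ((cl + cgs) / gm + cs * ron)

def w_3db(gm, cgs, cl, cs, ron, lo=1e6, hi=1e13):
    """Bisection for |A_v| = 1/sqrt(2); assumes the magnitude falls monotonically."""
    target = 1 / math.sqrt(2)
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if tha_sf_response(mid, gm, cgs, cl, cs, ron) > target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Hypothetical component values (not taken from the paper)
params = dict(gm=10e-3, cgs=50e-15, cl=200e-15, cs=200e-15, ron=50.0)
est = wb_estimate(**params)
num = w_3db(**params)
print(f"Eq. (12) estimate: {est / (2 * math.pi) / 1e9:.2f} GHz")
print(f"numerical -3 dB:   {num / (2 * math.pi) / 1e9:.2f} GHz")
```

For these values the two poles are close together, so the dominant-pole estimate is conservative by roughly 25 %; it still correctly identifies \((C_L + C_{gs})/g_m\) as the term that dominates.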

Fig. 10
Small signal model of the conventional SF combined with the passive track and hold stage

Fig. 11
Transient current of the SF with and without self-biasing during high-speed slewing

Fig. 12
Comparison of the entire settling behavior of the SF with and without self-biasing

Fig. 13
Comparison of DC gain and unity-gain bandwidth (GBW) with and without self-biasing

Replica source followers \(M_5\) and \(M_6\), smaller than the main follower, bias the back gate of \(M_4\) so that there is no body effect and no bulk capacitance loading, as shown in Fig. 8. The replica source followers are also self-biased to match the behavior of the main SF. MOS capacitors are inserted between the followers' inputs and outputs to absorb kickback noise [27] from the back-end circuits.

As shown in Fig. 14, the THA's signal-to-noise-and-distortion ratio (SNDR) is dominated by third-harmonic distortion. At a 1.056 GHz sampling rate, a 4.8 GHz input signal gives 52 dB SNDR, corresponding to 8-bit resolution. Note that the self-biased SF degrades front-end linearity, since its bias current is correlated with the input signal. This effect can be neglected, however, as long as the ADC resolution stays below 6 bits, which keeps a 2-bit margin.

5.2 Comparator

An offset self-calibrated comparator is used in this design, as shown in Fig. 15 [28]. \(M_1\) and \(M_2\) form a sense amplifier that responds to the input voltage difference, followed by a dynamic latch. The whole dynamic comparator consumes current only while it is operating. Since the offset voltage of the differential pair comes mainly from the threshold spread of the input transistors, the self-calibration is designed to reduce this effect. When clka is on, the input common-mode voltage \(V_{cmi}\) is applied to the gates of input transistors \(M_1\) and \(M_2\). At the same time switch \(S_2\) is on and a bias voltage \(V_b\) (200 mV, chosen by simulation to optimize the transient performance) is connected to the bottom plate of \(C_{os}\). During this period, the top-plate voltage of \(C_{os}\) keeps rising until \(M_1\) and \(M_2\) turn off. At this instant, the voltage stored on \(C_{os}\) is

$$ V_{Cos1} = V_{cmi} - V_{t1} - V_b ,\;\;V_{Cos2} = V_{cmi} - V_{t2} - V_b. $$
(13)

Here \(V_{Cos}\) is the voltage across \(C_{os}\) and \(V_t\) is the threshold voltage of the corresponding input transistor, so the threshold spread is now stored in these two voltages. In the next clock phase, clkb, the differential input voltages from the preamplifiers are applied to the gates of \(M_1\) and \(M_2\). Up to this point, nodes \(D_{ip}\) and \(D_{in}\) are connected to the supply voltage through \(S_6\), while \(S_4\), \(S_5\) and \(M_6\) remain off. Then the rising edge of clkL arrives, turning on \(S_1\) and turning off \(S_2\) and \(S_6\). The voltages at nodes \(D_{ip}\) and \(D_{in}\), set by the current difference between \(M_1\) and \(M_2\), reflect the input voltage. Through \(M_5\) and \(M_6\) they also drive the latch to produce the output level. Their common-mode part drops quickly, automatically turning off \(S_4\) and turning on \(S_6\). Finally, the effective overdrive voltages are

$$ \begin{aligned} V_{ov1} &= V_{cmi} + 0.5v_{in} - V_{Cos1} - V_{t1} = V_b + 0.5v_{in},\\ V_{ov2} &= V_{cmi} - 0.5v_{in} - V_{Cos2} - V_{t2} = V_b - 0.5v_{in}. \end{aligned} $$
(14)
Fig. 14
FFT of self-biased THA

Fig. 15
Circuit of the comparator

The offset therefore has no effect on the final comparison result. With the self-calibrated comparator, a high-frequency preamplifier before the comparator can be omitted to save power. clkr resets the top-plate voltage of \(C_{os}\). \(V_b\) is set to 200 mV for two reasons: (1) it equals the input common-mode voltage, so a similar reference generator can be used; (2) as Eq. 14 shows, a lower \(V_b\) means a lower overdrive voltage and therefore a higher intrinsic transistor gain, which helps attenuate the second-stage offset referred to the comparator inputs.
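The cancellation in Eqs. 13 and 14 can be checked with a few lines of algebra in code. This is an idealized sketch that ignores charge injection, finite gain and latch offset; the threshold voltages are hypothetical, \(V_{cmi}\) and \(V_b\) are the 200 mV values given above, and the function names are hypothetical. Whatever the mismatch between \(V_{t1}\) and \(V_{t2}\), the overdrive difference equals the input.

```python
V_CMI = 0.2   # input common-mode voltage (V)
V_B = 0.2     # calibration bias V_b (V)

def stored_cap_voltage(vt):
    """Eq. (13): voltage left on C_os at the end of the calibration phase."""
    return V_CMI - vt - V_B

def overdrives(v_in, vt1, vt2):
    """Eq. (14): effective overdrive of M1 and M2 during the comparison phase."""
    vov1 = V_CMI + 0.5 * v_in - stored_cap_voltage(vt1) - vt1
    vov2 = V_CMI - 0.5 * v_in - stored_cap_voltage(vt2) - vt2
    return vov1, vov2

# Hypothetical threshold voltages with a 20 mV mismatch
vt1, vt2 = 0.34, 0.36
vov1, vov2 = overdrives(0.005, vt1, vt2)
print(f"Vov1 - Vov2 = {(vov1 - vov2) * 1e3:.3f} mV for a 5 mV input")
```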

The ADC clock waveforms are shown in Fig. 16. As mentioned in Sect. 3, clks controls the first THA. A digital divider generates the anti-phase clocks clk1 and clk2 from clks. For channel 1, clk1a, clk1b, clk1L and clk1r are all derived from clk1.
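The effect of the divide-by-2 clocking on the data stream can be sketched behaviorally. The model below is an assumption-level illustration with hypothetical names: a toggle flip-flop changes state on every clks edge, clk1 and clk2 are its two anti-phase outputs, and each channel takes every other sample. The finer comparator phases of Fig. 16 (clk1a, clk1b, clk1L, clk1r) are not modeled.

```python
def divide_by_two(n_edges):
    """Toggle flip-flop clocked by clks: returns clk1 and clk2 levels after each edge."""
    state = 0
    clk1, clk2 = [], []
    for _ in range(n_edges):
        state ^= 1              # toggle on every clks rising edge
        clk1.append(state)
        clk2.append(1 - state)  # anti-phase copy for channel 2
    return clk1, clk2

FS = 2.112e9                    # clks rate (Hz)
samples = list(range(8))        # stand-in for consecutive first-THA outputs at FS
clk1, clk2 = divide_by_two(len(samples))
ch1 = [x for x, c in zip(samples, clk1) if c]
ch2 = [x for x, c in zip(samples, clk2) if c]
print("channel 1:", ch1, "channel 2:", ch2, f"per-channel rate {FS / 2 / 1e9:.3f} GHz")
```

Each channel therefore runs at 1.056 GHz, the rate at which the THA SNDR is quoted in Sect. 5.1.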

Fig. 16
Multi-phase clocks of ADC

5.3 The front-end amplifier

Figure 17 shows the fully differential front-end amplifier used in the averaging and interpolation network. The matching and speed requirements of this stage are strict: any bandwidth loss due to parasitics results in serious linear distortion [19]. Equation 15 expresses the gain-bandwidth product \(A w_b\), which must exceed a minimum value and depends on both bias and load:

$$ A{w_b} = {g_m}R{w_b} = \frac{{{I_{bias}}}}{{{V_{ov}}}}R\frac{1}{{R{C_L}}} = \frac{{{I_{bias}}}}{{{V_{ov}}{C_L}}}. $$
(15)

Here \(I_{bias}\) is the single-side bias current, \(V_{ov}\) is the overdrive voltage of the input transistor, and \(C_L\) is the load capacitance. MOS capacitors are inserted between the output and input on both sides to further absorb kickback charge from the back-end comparators.
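Eq. 15 can be used in both directions: to evaluate the GBW of a given operating point, or to size the bias current for a target GBW. The sketch below uses a hypothetical operating point (not from the paper) and hypothetical function names, and keeps the \(g_m \approx I_{bias}/V_{ov}\) approximation embedded in Eq. 15.

```python
import math

def gbw_rad_s(i_bias, v_ov, c_l):
    """Eq. (15): A*w_b = g_m R / (R C_L) = I_bias / (V_ov C_L), with g_m ≈ I_bias / V_ov."""
    return i_bias / (v_ov * c_l)

def bias_for_gbw(gbw_hz, v_ov, c_l):
    """Invert Eq. (15) for the single-side bias current that reaches a target GBW."""
    return 2 * math.pi * gbw_hz * v_ov * c_l

# Hypothetical operating point (not taken from the paper)
I_BIAS, V_OV, C_L = 0.5e-3, 0.15, 100e-15
print(f"GBW = {gbw_rad_s(I_BIAS, V_OV, C_L) / (2 * math.pi) / 1e9:.2f} GHz")
print(f"I_bias for 10 GHz GBW = {bias_for_gbw(10e9, V_OV, C_L) * 1e3:.3f} mA")
```

Because the load resistance R cancels in Eq. 15, the GBW is set only by bias current, overdrive and load capacitance; R trades gain against bandwidth at constant GBW.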

Fig. 17
The fully differential front-end amplifier used in the averaging and interpolation network

6 Post simulation results

6.1 Single channel flash ADC

The complete single-channel flash ADC for the sub-sampling IR-UWB receiver is shown in Fig. 18. A resistive ladder and two reference buffers generate the original reference voltages for the averaging and interpolation network. The interpolation topology halves the number of front-end amplifiers to save power. Thermometer codes \(B_0\) to \(B_{14}\) are passed to a decoder that translates them into binary codes. The proposed asymmetric spatial filter termination scheme is applied [25]. The experimental ADC is designed in SMIC 0.13 μm CMOS technology with a 1.2 V power supply. The high-speed self-biased THA samples the input signal, and its output passes through the averaging network to the comparator array. The averaging resistance between adjacent preamplifiers is 600 Ω, the preamplifier load resistance is 660 Ω, and the termination resistance \(R_T\) is 4.8 kΩ [25]. The P+ poly SAB resistors used have a 3-sigma mismatch of 0.5 %, which is sufficient for 4-bit resolution in this UWB application. If better matching is required, a mirrored shuffle layout pattern can be used for higher accuracy [29]. The layout of the ADC is shown in Fig. 19; it occupies an active chip area of 0.6 mm², pads included. The ADC with and without the asymmetric termination scheme is simulated with a full-swing sub-sampled input, and the output spectra are shown in Figs. 20 and 21, respectively. With the asymmetric termination scheme, the ADC achieves 25.8 dB SNDR and 35.9 dB SFDR, i.e., an effective number of bits (ENOB) of 4, whereas a linearity loss of 8.7 dB occurs without the asymmetric termination. The proposed termination technique is thus a power-efficient way to improve low-to-medium-resolution flash ADCs [25].

Fig. 18
The entire single channel flash ADC

Fig. 19
The layout of the single-channel experimental ADC, occupying 600 × 400 μm²

Fig. 20
The spectrum of the single channel flash ADC using asymmetric spatial filter scheme

Fig. 21
The spectrum of the single channel flash ADC using no special termination

6.2 Parallel ADC for IR-UWB receiver

The complete parallel flash ADC is simulated with a 4.8 GHz input tone and the twice-sampling THA. The master clock is generated from a sinusoidal waveform buffered by inverters. The SNDR and SFDR of the ADC are 25.8 and 35.9 dB, respectively. Figure 22 shows the output spectrum computed with a 4,096-point FFT, and the dynamic performance at a relatively low input frequency is shown in Fig. 23. There is almost no loss of dynamic performance in the parallel converter compared with the single-channel result, which reflects the robustness of the THA front end: the THA is designed with sufficient margin for parallel operation. To evaluate the simulated converter, the figure of merit FoM = \( Power/\left( {2^{ENOB}\times 2ERBW} \right) \) is used, where ERBW and ENOB are the effective resolution bandwidth and effective number of bits, respectively. The converter consumes 36 mW, giving a FoM of 0.24 pJ/step. The post-simulation results are summarized and compared with state-of-the-art designs in Table 1.
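The ENOB and FoM figures can be recomputed from the quoted SNDR and power. The ERBW value is not stated in the text; this sketch assumes ERBW = 4.8 GHz (the highest simulated input frequency), and with that assumption the formula gives about 0.235 pJ/step, consistent with the quoted 0.24 pJ/step. ENOB uses the standard full-scale-sine relation (SNDR − 1.76)/6.02, and the function names are hypothetical.

```python
def enob(sndr_db):
    """ENOB from SNDR for a full-scale sine: (SNDR - 1.76) / 6.02."""
    return (sndr_db - 1.76) / 6.02

def fom_j_per_step(power_w, enob_bits, erbw_hz):
    """FoM = Power / (2^ENOB * 2 * ERBW), as defined in the text."""
    return power_w / (2 ** enob_bits * 2 * erbw_hz)

e = enob(25.8)
fom = fom_j_per_step(36e-3, e, 4.8e9)   # assumes ERBW = 4.8 GHz
print(f"ENOB = {e:.2f} bits, FoM = {fom * 1e12:.3f} pJ/step")
print(f"THA alone: 52 dB SNDR -> {enob(52):.2f} bits")
```

The same relation also confirms the 8-bit equivalent resolution quoted for the THA in Sect. 5.1.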

Fig. 22
The FFT plot of the designed parallel ADC at f in  = 4.8 GHz

Fig. 23
The FFT plot of the designed parallel ADC at f in  = 0.5 GHz

Table 1 Summary of the post simulation results

7 Conclusion

A 2.112 GS/s 4-bit parallel flash ADC is designed in 0.13 μm CMOS technology for a sub-sampling IR-UWB receiver. An improved asymmetric averaging termination technique is used to tackle the boundary offset. To save power, a self-calibrated comparator is proposed that needs no preamplifier. A self-biased twice-sampling THA is adopted to ensure linearity and sufficient bandwidth for parallel operation. Simulations show that the ADC consumes 36 mW while achieving 25.8 dB SNDR and 35.9 dB SFDR with a 4.8 GHz input signal, giving a FoM of 0.24 pJ/step.