## A switched-capacitor based track-and-hold amplifier suitable for PAM4 signaling in 45-nm CMOS

Mohamad El Mokdad<sup>1</sup> · Elias Salameh<sup>1</sup> · Jad G. Atallah<sup>1</sup>

Received: 27 June 2022 / Revised: 10 February 2023 / Accepted: 19 June 2023 / Published online: 29 June 2023 © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023

#### Abstract


In this paper, we propose the design of a track and hold (T&H) integrated circuit that is able to sample an incoming high-speed PAM4 signal and feed it to an analog-to-digital converter (ADC). This building block acts as a front-end in optical transceivers carrying sensitive data, where power consumption and cost are primary concerns. The proposed architecture is discussed in detail, and a design approach based on the switched capacitor architecture is examined and simulated. Parasitic capacitance is utilized to hold the signal, and post-layout simulation shows a low power consumption of 42 mW while sampling a 1 GS/s PAM4 signal at 200 MSps (mega-samples per second). Our circuit consumes 10 times less power than a similar implementation in BiCMOS and has a power consumption comparable to similar CMOS implementations, but with a higher output voltage swing and a far smaller area. Our circuit is designed for PAM4 signals but can be used for NRZ as well, which is a first of its kind in sample and hold circuits targeting serial modulated signals such as PAM4.

Keywords PAM4 · Track and hold · Switched capacitor · 45 nm

## 1 Introduction

Most data communications over networks are transmitted in a serial manner, that is, data bits are transmitted one at a time through a medium of transmission such as a copper cable, an optical cable or a wireless path. An NRZ (non-return to zero) signal, or Pulse-Amplitude Modulation 2-Level (PAM2), is a type of coding scheme that has two voltage levels to represent logic 0 and logic 1. Another modulation technique, PAM4, uses four voltage levels to represent the four combinations of two-bit logic: 11, 10, 01, and 00.
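As a minimal sketch of the two coding schemes, the snippet below maps a bit string to NRZ and PAM4 level indices. The natural-binary assignment of bit pairs to levels (`PAM4_LEVELS`) and the function names are assumptions made for illustration; the text does not state which bit pair maps to which voltage level.

```python
# Hypothetical natural-binary mapping of bit pairs to PAM4 level indices.
# The paper does not specify the mapping; this one is for illustration only.
PAM4_LEVELS = {"00": 0, "01": 1, "10": 2, "11": 3}


def pam4_encode(bits):
    """Return one PAM4 level index per pair of bits."""
    if len(bits) % 2 != 0:
        raise ValueError("PAM4 needs an even number of bits")
    return [PAM4_LEVELS[bits[i:i + 2]] for i in range(0, len(bits), 2)]


def nrz_encode(bits):
    """Return one NRZ (PAM2) level index per bit."""
    return [int(b) for b in bits]


bits = "11100100"
print("NRZ symbols :", nrz_encode(bits))
print("PAM4 symbols:", pam4_encode(bits))
```

The same eight bits occupy eight NRZ symbols but only four PAM4 symbols.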

The speed of serial transmission is related to the bit time of the serial data. For a 1 GS/s PAM4 signal, the duration of transmission at each level for 2 bits is 500 ps based on Eq. 1 [1].

Mohamad El Mokdad: mgmokdad@ndu.edu.lb · Elias Salameh: efsalameh01@ndu.edu.lb · Jad G. Atallah: jatallah@ndu.edu.lb

<sup>1</sup> ECCE Department, Notre Dame University - Louaize, Zouk Mosbeh, Lebanon

$$\text{Bit Rate} = \frac{1}{\text{Bit Duration}} \tag{1}$$

Compared to an NRZ signal, a PAM4 signal has the advantage of having half the Nyquist frequency, or double the throughput, for the same baud rate, as seen in Eq. 2 [1]. Another way to look at it is that it achieves higher resolution using the same sampling rate, as shown in Fig. 1.

$$\text{Baud Rate} = \frac{\text{Bit Rate}}{\text{bits per symbol}} \tag{2}$$
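Eqs. 1 and 2 can be checked numerically with the 500 ps bit duration quoted above. This is a small sketch; the function names are ours, and the NRZ case is included only for comparison.

```python
def bit_rate(bit_duration_s):
    """Eq. 1: bit rate in bit/s from the bit duration in seconds."""
    return 1.0 / bit_duration_s


def baud_rate(bit_rate_bps, bits_per_symbol):
    """Eq. 2: symbol rate in baud from the bit rate."""
    return bit_rate_bps / bits_per_symbol


rate = bit_rate(500e-12)             # 500 ps per bit
pam4_baud = baud_rate(rate, 2)       # PAM4 carries 2 bits per symbol
nrz_baud = baud_rate(rate, 1)        # NRZ carries 1 bit per symbol

print(f"bit rate : {rate / 1e9:.2f} Gb/s")
print(f"PAM4 baud: {pam4_baud / 1e9:.2f} GBd")
print(f"NRZ baud : {nrz_baud / 1e9:.2f} GBd")
```

For the same bit rate, the PAM4 symbol rate comes out at half the NRZ symbol rate, which is the advantage described in the text.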

Track and hold (T&H) amplifiers are the basis of data converters [2]. Their applications include but are not limited to ATE (Automated Test Equipment), Digital Sampling Oscilloscopes, jitter measurement, Bit Error Rate (BER), Time-Domain Reflectometers (TDR), RF Demodulation Systems, High Speed Peak Detectors, Software Defined Radio and Gigabit Passive Optical Network Applications [3, 4].

There is always a need to design better T&H circuits that meet the required criteria for specific applications. These criteria include, but are not limited to: long hold period, low input offset, large input bandwidth, high sampling rate, good linearity, high accuracy, low droop rate, and high common mode rejection [5].

#### Fig. 1 PAM4 and NRZ coding

Section 2 lists the requirements and targeted specifications followed by an overview of the different T&H architectures in Sect. 3. The design process is discussed in Sect. 4 preceding the implementation details in Sect. 5 followed by the results in Sect. 6. A conclusion in Sect. 7 wraps up this work.

## 2 Requirements and specifications

Our targeted application is a portable Bit Error Rate (BER) tester for high-speed PAM4 signals operating up to 1 GS/s, where the aim is to generate the eye diagram of the signal and deduce several key parameters such as BER and jitter. Low power consumption is a key target in this design, so that the device can be easily integrated within the scope of the Internet of Things (IoT). From these measurements, we determine the quality of the transmission medium through which the signal is being sent. To do so, samples must be taken from the cable and analyzed in the digital domain, with an analog-to-digital converter (ADC) acting as a portal between the analog and the digital domains. Having a T&H in front of the ADC relaxes its specifications and provides more flexibility in terms of the mode of operation.

The PAM4 signal entering the chip is differential with 4 levels. The rise and fall time of each level determines the frequency of the signal. The bit period is chosen to be 500 ps.

Our aim is to be able to track our input signal, and hold it, at a sampling rate of 200 MSps. The output is then fed to an ADC that quantizes the discrete points and then feeds them to a digital processor that computes the eye diagram.

Table 1 shows six different marketing requirements we aim to achieve.

Table 2 shows the engineering requirements that we have set and their relation to Table 1. Justification is also presented for each engineering requirement, giving it a concrete base.

#### Table 1 Marketing requirements

| Marketing requirement                                                            | Target application                      |
|----------------------------------------------------------------------------------|-----------------------------------------|
| 1. Adequate sampling rate                                                        | Reconstruction of PAM4 signal           |
| 2. Large input bandwidth                                                         | High bit-rate signals for data-centers  |
| 3. Small die size                                                                | Minimal footprint on PCB                |
| 4. Circuit must withstand temperature variations                                 | Use in extreme conditions (datacenters) |
| 5. Supply voltage, input/output signal, clock and termination constraints        | According to the specifications set     |
| 6. Low power consumption                                                         | Important for IoT applications          |

#### Table 2 Engineering requirements

| Marketing requirement | Engineering requirement                                                    | Justification                                         |
|-----------------------|----------------------------------------------------------------------------|-------------------------------------------------------|
| (2)                   | Input rate of 1 GS/s PAM4                                                  | For high speed signal measurement                     |
| (2)                   | Pedestal error: 20 mV minimum, 50 mV maximum                               | Acceptable range for a typical ADC                    |
| (2)                   | Hold error: 100 uV minimum, 110 uV maximum                                 | Acceptable range for a typical ADC                    |
| (2)                   | Droop rate: 400 uV/ns minimum, 500 uV/ns maximum                           | Acceptable droop rate for a typical ADC               |
| (2)                   | Acquisition bandwidth of at least 1 GHz, maximum of 4 GHz                  | For high speed signal measurement                     |
| (1)                   | Sampling rate of at least 100 MSps, maximum of 200 MSps                    | Ability to under-sample the input signal              |
| (6)                   | Input signal: 1.5 V common mode + 3 Vppd                                   | Similar to existing implementations                   |
| (6)                   | Output signal: 800 mVppd minimum, 900 mVppd maximum                        | Input voltage range for existing ADCs                 |
| (6)                   | DC coupled input/output of 100 ohms differential                           | Typical input/output resistance termination           |
| (6)                   | Supply voltage: 1.7 V and 3.1 V                                            | According to existing power regulators                |
| (5)                   | Max power consumption of 100 mW                                            | Usage in low power applications                       |
| (3)                   | Maximum die area of 0.1 mm<sup>2</sup>                                     | Minimal footprint for integration in existing chips   |
| (4)                   | Operation temperature range: 50 °C nominal, ranging from 25 °C up to 85 °C | Crucial for operation in high temperature environment |

Based on the previous specifications, performance parameters for T&H circuits such as acquisition time, track bandwidth, hold error, droop rate and pedestal error are briefly explained.

The acquisition time is the measure of how much time it takes for the output signal to track the incoming signal. In our case, three different acquisition times were studied: the time to transition from the first level of the PAM4 signal to the second level of the PAM4 signal, the time to transition from the second level to the third level, and the time it takes to transition from the third level to the fourth level. The track bandwidth is simply equal to:

$$BW_{track} = \frac{1}{t_{acq}} \tag{3}$$

The acquisition time is directly proportional to the value of the hold capacitor, so to decrease the acquisition time and increase the track bandwidth, a smaller capacitor must be used; however, this comes at the expense of worsening the hold error and the droop rate.
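To make Eq. 3 concrete, the sketch below converts the 1 GHz and 4 GHz acquisition bandwidth limits from Table 2 into acquisition times. The function names are ours.

```python
def track_bandwidth_hz(t_acq_s):
    """Eq. 3: track bandwidth as the reciprocal of the acquisition time."""
    return 1.0 / t_acq_s


def acquisition_time_s(bw_track_hz):
    """Eq. 3 solved for the acquisition time."""
    return 1.0 / bw_track_hz


for bw_ghz in (1.0, 4.0):  # acquisition bandwidth limits from Table 2
    t_acq = acquisition_time_s(bw_ghz * 1e9)
    print(f"BW = {bw_ghz:.0f} GHz -> t_acq = {t_acq * 1e12:.0f} ps")
```

Under this definition, the bandwidth window of Table 2 corresponds to acquisition times between 250 ps and 1 ns.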

The hold error is a measure of the decrease in the voltage across the hold capacitor when in the hold mode. When holding, the switches are OFF, acting as a finite-valued resistance. This resistance pulls current from the hold capacitor and thus a voltage drop occurs.

As for the droop rate, it is calculated in the following manner:

$$\text{Droop Rate} = \frac{\text{Hold Error}}{\text{Hold Period}} \tag{4}$$

The droop rate is a measure of the rate at which the output voltage changes due to leakage from the hold capacitor. It is proportional to the hold error and inversely proportional to the hold period. In our design, the droop rate is calculated by dividing the hold error by the hold period, which is 2.5 ns.
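The calculation in Eq. 4 can be sketched as follows. The 50 % duty cycle is an assumption on our part; it is consistent with the 2.5 ns hold period at 200 MSps. The hold error used is the 27 °C pre-layout value from Table 4.

```python
def hold_period_ns(sampling_rate_sps, duty_cycle=0.5):
    """Hold time per sample, assuming the clock is low for (1 - duty_cycle)."""
    period_ns = 1e9 / sampling_rate_sps
    return period_ns * (1.0 - duty_cycle)


def droop_rate_uv_per_ns(hold_error_uv, hold_period):
    """Eq. 4: droop rate = hold error / hold period."""
    return hold_error_uv / hold_period


t_hold = hold_period_ns(200e6)              # 200 MSps sampling clock
rate = droop_rate_uv_per_ns(64.6, t_hold)   # pre-layout hold error at 27 C
print(f"hold period = {t_hold:.2f} ns, droop rate = {rate:.2f} uV/ns")
```

The result is about 25.8 uV/ns, close to the 25.82 uV/ns listed in Table 4 for the same condition.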

The pedestal error is the voltage difference in the output signal between when it exits the track mode and when it enters the hold mode. This error is caused by the transfer of charges from the switches to the hold capacitor. To decrease this error, a larger capacitor can be used, but that comes at the expense of a longer acquisition time. To circumvent this issue, we use the minimum gate length in order to reduce the charges released by the transistors. Additionally, drain-source-connected transistors are used to absorb some of the excess charges.

## 3 T&H architectures

Different T&H architectures have been developed with time. These architectures perform the same operations: tracking and then holding a signal. However, some architectures are able to do so more accurately or faster than other architectures. On the other hand, higher-performance architectures are typically more complicated and require additional circuitry.

The open loop T&H circuit is a good option for high-speed applications since it does not employ feedback, as shown in Fig. 2. However, it suffers from poor accuracy when compared to other architectures. The closed loop architecture is more accurate than the open loop architecture since it employs a feedback scheme, as shown in Fig. 3. However, this architecture is slower than the open loop architecture. Figure 4 shows the closed loop architecture with integrator output, where the addition of the capacitor connected to what is virtually ground permits the charge transfer encountered during the hold stage to be constant and, with the addition of switches Q1 and Q2, greatly improves the slew time.

#### Fig. 2 Open loop architecture

#### Fig. 3 Closed loop architecture

#### Fig. 4 Improved closed loop architecture

#### Fig. 5 Current-multiplexed architecture

Another architecture, the current-multiplexed architecture shown in Fig. 5 and proposed by Texas Instruments, provides a track bandwidth comparable to that of the open loop configuration, an accuracy comparable to that of the closed loop configuration, and charge injection cancellation [5].

The base collector diode architecture offers a bandwidth even wider than that of the diode bridge T&H architecture. It also offers good linearity, a large dynamic range and good stability [6].

Another architecture is the switched-emitter follower, shown in Fig. 6, which operates with a wide bandwidth and has good linearity and a large dynamic range. Its shortcoming, however, lies in its stability issues [6, 7].

Figure 7 shows the switched capacitor based architecture. NMOS based switches are used to control the transfer of charge in and out of the hold capacitor; these switches are controlled by two non-overlapping clocks that determine the sampling rate. This architecture is widely used because of its versatility and its ability to be modified to counteract various issues such as charge injection, clock feedthrough and speed requirements. Our choice to go with this architecture is based on its ability to fulfill our requirements, especially area and power consumption.

#### Table 3 Summary of T&H architectures

| Architecture                          | Key aspect                                                                              |
|---------------------------------------|-----------------------------------------------------------------------------------------|
| Open loop                             | Good for high speed applications<br>Poor accuracy                                       |
| Closed loop                           | Feedback system<br>Better accuracy                                                      |
| Closed loop with integrator<br>output | Better handling of charge injection                                                     |
| TI's architecture                     | Good accuracy<br>Good track bandwidth<br>Charge injection cancellation                  |
| Diode bridge                          | Good bandwidth<br>Poor linearity<br>Limited dynamic range                               |
| Base collector diode                  | Good bandwidth<br>Good linearity<br>Good dynamic range                                  |
| Switched emitter follower             | Good bandwidth<br>Good linearity<br>Good dynamic range<br>Suffers from stability issues |
| Switched capacitor                    | Compact design area<br>Moderate bandwidth<br>Suffers from charge injection              |

Table 3 represents a summary of previously discussed architectures and other prevalent architectures, listing key aspects of each architecture.

## 4 Design

Based on the requirements set before regarding low power consumption and a large input signal range, we chose to proceed with the open loop architecture discussed previously. The essence of this topology lies in the switches, implemented as NMOS transistors. The sample switch and the hold capacitor are the building blocks of the track and hold circuit. Assuming a signal-to-noise ratio (SNR) greater than 50 dB, the contribution of kT/C noise by the capacitor determines the capacitor value based on Eq. 5 [8]:

$$SNR = 10 \log \left( \frac{V_{in,rms}^2}{kT/C_{hold}} \right) \tag{5}$$

where $k = 1.38 \times 10^{-23}$ J/K is Boltzmann's constant and $T = 300$ K is the absolute temperature. Assuming $V_{IN} = 1.5$ V and a 60 dB SNR, the value of $C_{hold}$ is found to be 15 fF.
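The sizing step can be reproduced with a short sketch of Eq. 5 solved for $C_{hold}$. One assumption is ours and is not stated in the text: the 1.5 V input is treated as the peak-to-peak amplitude of a sinusoid, so that $V_{in,rms} = 1.5/(2\sqrt{2})$ V. Under that reading the result lands close to the quoted 15 fF.

```python
import math

K_BOLTZMANN = 1.38e-23  # J/K, as given in the text


def min_hold_capacitance(v_rms, target_snr_db, temp_k=300.0):
    """Eq. 5 solved for C_hold: smallest capacitor that meets the SNR target."""
    return K_BOLTZMANN * temp_k * 10 ** (target_snr_db / 10) / v_rms ** 2


def kt_c_snr_db(v_rms, c_hold, temp_k=300.0):
    """Eq. 5: SNR in dB when only kT/C noise is considered."""
    return 10 * math.log10(v_rms ** 2 / (K_BOLTZMANN * temp_k / c_hold))


# Assumption: 1.5 V is a peak-to-peak sinusoidal amplitude.
v_rms = 1.5 / (2 * math.sqrt(2))
c_hold = min_hold_capacitance(v_rms, 60.0)
print(f"C_hold for 60 dB: {c_hold * 1e15:.1f} fF")
print(f"SNR check       : {kt_c_snr_db(v_rms, c_hold):.1f} dB")
```

Treating 1.5 V directly as an rms value would give a capacitance roughly eight times smaller, so the interpretation of the input amplitude matters for this estimate.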

When the gate voltage changes while switching from track to hold mode, an instantaneous drop in the voltage across the hold capacitor occurs, which causes a pedestal error. In addition, a portion of the inversion charge $Q_{gate}$ that forms the conductive layer of the MOS switch flows back into the signal source and into the hold capacitor, as described in Eq. 6:

$$V_{pedestal} = \frac{C_{ped}\Delta V_{gate} + Q_{gate}/2}{C_{hold}} \tag{6}$$

$C_{ped}$ is the gate overlap capacitance and is proportional to the transistor width $W$. The charge injection also produces an amplification at the moment of sampling: a closer look at the signal-dependent part of the output voltage shows a dependency on the length and width of the switch, as given in Eq. 7:

$$V_{hold} = V_{in} + \frac{Q_{gate}(V_{in})}{2C_{hold}} + V_{DC} = V_{in} \left( 1 + \frac{WLC_{ox}}{2C_{hold}} \right) + V_{DC} \tag{7}$$

Based on the previous two equations, it is desirable to have the length of the switch at the smallest value allowed by the process and the width at a reasonable size; this reduces the pedestal step to the least possible while keeping the switch fast enough. In a transistor with a short gate length, the channel contains much less charge, and therefore this amplification is less of an issue.

Another consideration is the droop rate: in the hold phase, charge can leak back from the hold capacitor and the signal will show a droop, as given in Eq. 8. To keep this value as small as possible, the gate of the switch must be kept at its minimum size so that the leakage is at its minimum.

$$V_{droop} = -\frac{I_{leak}T_{hold}}{C_{hold}} \tag{8}$$
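Eq. 8 can be explored with the sketch below. The 1 nA leakage current is a purely hypothetical value chosen for illustration; the 2.5 ns hold period and the 15 fF hold capacitor are the values quoted in the text.

```python
def droop_voltage(i_leak_a, t_hold_s, c_hold_f):
    """Eq. 8: voltage change on the hold capacitor caused by leakage."""
    return -i_leak_a * t_hold_s / c_hold_f


def leakage_for_droop(v_droop_v, t_hold_s, c_hold_f):
    """Eq. 8 solved for the leakage current that causes a given droop."""
    return abs(v_droop_v) * c_hold_f / t_hold_s


# Hypothetical leakage current of 1 nA, for illustration only.
v_droop = droop_voltage(1e-9, 2.5e-9, 15e-15)
print(f"droop over one hold period: {v_droop * 1e6:.1f} uV")
```

Because $C_{hold}$ appears in the denominator, a smaller hold capacitor makes the same leakage current produce a larger droop, which is the trade-off against acquisition time noted earlier.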

To circumvent the issue of the pedestal step, it is beneficial to compensate for it by means of half-sized transistors whose source and drain are connected. These dummy switches are controlled by a clock that is inverted relative to that of the sampling switches, where the charge on the hold node is:

$$Q_s = V_{in}(t = T_s)(C_{hold} + C_{ox}WL_{dummy}) \tag{9}$$

The last term represents the gate capacitance of the dummy switch. After sampling, the gate of the dummy switch is pulled down, releasing most of the charge on $C_{hold}$.

The voltage on the hold capacitor must then be buffered before any operation can be performed, taking into account the DC coupled 100-ohm differential load off chip. A differential source follower is used as an output buffer with a supply voltage of 1.7 V. The output signal range is now limited to $V_{DD} - 2V_{T} - 2V_{drive}$.

Based on the input/output voltage swing, for a source follower the voltage gain is calculated as:

$$G = \frac{R_L}{R_L + \left(\frac{1}{g_m}\right)} \tag{10}$$

Based on our requirements, we desire an output voltage of 900 mV for a 1.5 V input, which translates into a gain of 0.6 V/V. Setting $R_L = 50\ \Omega$, we deduce the value of the transconductance required. Using Eq. 11 and taking into account the limited area available, we set $(\frac{W}{L}) = 40$. Knowing $k'_n$ from the process design kit, we deduce the drain current drawn from the 1.7 V supply (*AVDD\_1P7*) to be around 25 mA.

Moving on to the input and clock buffers, a common-gate amplifier is used because of its low input resistance, to which the 50-ohm matched input and clock signals are fed. We desire $R_{in} = 50\ \Omega$, where $R_{in} = \frac{1}{g_m}$. Based on Eq. 11, for a given $g_m$ the drain current is inversely proportional to the width, so increasing the width of the transistor decreases the current consumption. We found a ratio of 80 to be a suitable compromise between power consumption and area.

$$g_m = \sqrt{2k'_n \frac{W}{L} I_D} \tag{11}$$
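The buffer sizing steps built on Eqs. 10 and 11 can be sketched as below. The value of `KN_PRIME` is hypothetical: the actual process parameter comes from the design kit and is not given in the text, so the number here was picked only so that the sketch lands near the quoted 25 mA.

```python
def required_gm(gain, r_load_ohm):
    """Eq. 10 solved for g_m: G = R_L / (R_L + 1/g_m)."""
    return gain / (r_load_ohm * (1.0 - gain))


def drain_current(gm_s, kn_prime, w_over_l):
    """Eq. 11 solved for I_D: g_m = sqrt(2 * k'_n * (W/L) * I_D)."""
    return gm_s ** 2 / (2.0 * kn_prime * w_over_l)


KN_PRIME = 450e-6  # A/V^2, hypothetical value, NOT taken from the PDK

# Output buffer: gain of 0.6 V/V into a 50 ohm load, W/L = 40.
gm_out = required_gm(0.6, 50.0)
i_out = drain_current(gm_out, KN_PRIME, 40)
print(f"output buffer: g_m = {gm_out * 1e3:.0f} mS, I_D = {i_out * 1e3:.1f} mA")

# Input and clock buffers: R_in = 1/g_m = 50 ohm, W/L = 80.
gm_in = 1.0 / 50.0
i_in = drain_current(gm_in, KN_PRIME, 80)
print(f"input buffer : g_m = {gm_in * 1e3:.0f} mS, I_D = {i_in * 1e3:.1f} mA")
```

For a fixed $g_m$, doubling $W/L$ halves the drain current in this model, which is the trend the text relies on when choosing the wider input and clock buffer devices.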

## 5 Implementation

In this section, we break down the block design of our proposed integrated circuit into a set of sub-blocks that aim to familiarize the reader with the targeted application for this particular IC.

Figure 8 represents the T&H integrated circuit at its most fundamental level: PAM4 differential input signals  $V_{INP}$  and  $V_{INN}$ , differential square wave clock inputs  $V_{CLKP}$  and  $V_{CLKN}$ , and differential output signals  $V_{OUTP}$  and  $V_{OUTN}$ .

The open-loop switched capacitor architecture is chosen due to its high accuracy and compact design area. The top-level circuit design shown in Fig. 9 consists of the T&H core, an input buffer, an output buffer, and a clock buffer. The PAM4 input signal and clock are terminated with a 50 $\Omega$ resistor (single ended), and the output signal is matched to a 50 $\Omega$ resistor connected to ground, representing the load resistance. All signals in the design are differential for better noise immunity.

Two supply domains are needed for proper operation: *AVDD\_3P1*, which is a 3.1 V supply for the clock buffer, and *AVDD\_1P7*, which is a 1.7 V supply for the rest of the circuit. Additionally, there are two biasing voltages fed to the input and clock buffers in order to maintain their mode of operation. Each block is designed and simulated on its own according to the requirements, while taking into account the preceding and following connected blocks. This approach reduces the amount of work needed to do the layout and perform the post-layout parasitic extraction: the individual blocks' layouts are done separately, and any modifications needed to fix the results are made in the layout with regard to the expected parasitic resistances and capacitances that will show up due to routing.

#### Fig. 9 Top-level schematic block design

#### 5.1 Input buffer

Transistors M1, M2 and resistors R1, R2 make up the input buffer. Due to the relatively high voltage of the input, and the small input impedance of 50  $\Omega$ , a common gate amplifier is used. A common gate amplifier has a small input impedance that can be matched to our signal's impedance by adjusting the sizing of the transistors. An aspect ratio (W/L) = 80 with a multiplier of 30 is used for the transistors. Due to the large input swing, we are not able to set the transistor's input impedance at exactly 50  $\Omega$ , but rather to a midpoint between the first and fourth level resistance seen by the transistor.

Thick oxide transistors are used because of the large voltage drop across them. This voltage drop is controlled by the resistor connected between the 1.7 V supply and the drain of the transistor. This resistor has a value of 500 $\Omega$ and is chosen to provide adequate legroom and headroom for the output signal, while keeping the transistors in the saturation region.

#### 5.2 Clock buffer

The clock buffer, made up of transistors M9, M10 and resistors R3, R4, operates similarly to the input buffer. It is a common gate amplifier used to provide good matching with the 50 $\Omega$ resistance seen from the clock signal's point of view, as shown in Fig. 9. An aspect ratio (W/L) = 80 with a multiplier of 5 is used for the transistors. The output of this buffer is directly connected to the gates of the transistors in the T&H core. To keep the switches in the core ON, the output of the clock buffer must exceed the highest input signal reaching the source of the switching transistors by at least $V_{TH}$.

A separate 3.1 V power supply domain is used in the clock buffer in order to maintain the high output voltage needed for the core. A 2.5 k$\Omega$ resistor is connected between the supply voltage and the drain of the transistor to maintain the operation of the transistors in the saturation region. A relatively high current is expected to pass in the transistors so the size of the transistors is chosen accordingly.

#### 5.3 T&H core

The core of the chip consists of two differential low-threshold-voltage transistors M3, M4 with an aspect ratio (W/L) = 5 acting as switches that are driven by the positive clock cycle, as shown in Fig. 9. The voltage $V_{GS}$ across the transistors is controlled by the voltage coming from the clock and the voltage of the input signal. When the clock is high and the input signal is at its highest level, $V_{GS}$ becomes small and a low threshold device is needed for proper functioning of the switch [8].

The drains of the switches are then connected to two dummy transistors M5, M6 that are controlled by the inverse of the clock and have an aspect ratio half that of the switches. These transistors are there to compensate for the clock feedthrough. When the switches turn OFF, charges are released from both terminals in an equal manner if the switching speed is high. When $V_{CLKP}$ transitions from the high state to the low state, the drain-source connected transistors absorb the extra charges released by the switches. Any of these charges that reach the hold capacitors $C_{HOLD1}$ and $C_{HOLD2}$ alter the hold voltage on the capacitors. This phenomenon is known as the pedestal error, and it is due to switch turn-off non-idealities such as clock feedthrough and charge injection.

Our sampling frequency is relatively low, since we wish to undersample our signal at a rate of 200 MSps. In order for the switches to transition from the ON to the OFF state quickly, they need to be fast, so the minimum gate length (45 nm) is used for the transistors. By using the minimum length, the total gate charge is kept to a minimum; therefore, only a small current is needed to flush the extra charge carried from the junction during switching.

When the clock is high, the switches are ON, and the voltage of the input signal charges the capacitors connected at the output node of the core. This charging phenomenon represents the tracking of the input signal. When the switches are OFF, the capacitors hold the value attained before for a certain period of time determined by the sampling rate and duty cycle.
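The track and hold behavior described above can be illustrated with an idealized behavioral model. It is only a sketch: it ignores the pedestal error, droop, and finite track bandwidth discussed earlier, and the input samples and clock pattern are made-up values.

```python
def track_and_hold(samples, clock):
    """Ideal T&H: follow the input while the clock is high, freeze when low."""
    output = []
    held = 0.0  # value on the hold capacitor before the first track phase
    for vin, clk in zip(samples, clock):
        if clk:          # switch ON: the capacitor tracks the input
            held = vin
        output.append(held)  # switch OFF: the capacitor keeps the last value
    return output


# Made-up input samples (V) and clock pattern (1 = track, 0 = hold).
vin = [0.1, 0.4, 0.7, 1.0, 0.7, 0.4]
clk = [1, 1, 0, 0, 1, 0]
print(track_and_hold(vin, clk))  # [0.1, 0.4, 0.4, 0.4, 0.7, 0.7]
```

During the two hold samples the output stays at 0.4 V even though the input keeps moving, which is the value the downstream ADC would quantize.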

The value of the hold capacitor was initially set to 15 fF based on Eq. 5. After doing the layout and extracting the parasitics, it was found that the routing from the core to the output stage has a parasitic capacitance of 26 fF to ground. Therefore, the capacitors were removed, and the parasitics induced by the routing of Metal6 on top of Metal1 are used as the hold capacitance.



#### 5.4 Output buffer

In order to match the output signal to the 50 $\Omega$ resistors connected to ground, a common drain amplifier (M7, M8) is used. An aspect ratio (W/L) = 400 with a multiplier of 30 is used. A common drain amplifier has a low output resistance, which makes it good for matching with the 50 $\Omega$ load. Low threshold devices are used to increase the swing of the output voltage. When $V_{TH}$ is decreased, the $V_{GS}$ requirement is relaxed and the transistors will operate as intended while maintaining relatively good output voltage levels, as shown in Fig. 9.

#### 5.5 Layout

The layout of each block is done individually, and any affecting resistance or capacitance originating from the parasitics is taken into account. For sensitive nets such as the connections from the core to the output buffer, routes of minimum width are used in order to decrease the resulting capacitance. On the other hand, for nets that are affected by resistance such as those at the output stage, the routes are widened in order to decrease the resistance across them. The current across these routes is several mA, so an appropriate width must also be chosen so that it handles the amount of current going through.

Dummy transistors are used alongside the main transistors in all blocks in order to minimize the etch effects during fabrication, especially in differential circuits where symmetry is key. If they are not used, a transistor could exhibit different behavior in fingers that are on the edge compared to fingers in the middle. This behavior shows up as a mismatch between the threshold voltages of matching transistors. Symmetric connections are enforced at all levels of routing so that the connections' differential signaling is respected.

#### Table 4 Pre-layout simulation results

| Specification                   | 27 °C | 45 °C | 85 °C |
|---------------------------------|-------|-------|-------|
| Level 0–1 voltage (mV)          | 446.9 | 447.5 | 446.3 |
| Level 1–2 voltage (mV)          | 765.8 | 766.4 | 767.2 |
| Level 2–3 voltage (mV)          | 994.8 | 991.9 | 983.7 |
| Total average power (mW)        | 49.42 | 48.69 | 47.2  |
| Acquisition BW, level 0–1 (GHz) | 2.774 | 2.597 | 2.283 |
| Acquisition BW, level 1–2 (GHz) | 4.503 | 4.275 | 3.92  |
| Acquisition BW, level 2–3 (GHz) | 5.02  | 4.961 | 4.9   |
| Hold error (uV)                 | 64.6  | 68.3  | 86.04 |
| Droop rate (uV/ns)              | 25.82 | 27.32 | 34.42 |
| Pedestal error (mV)             | 47.6  | 42.7  | 31.5  |

#### Table 5 Post-layout results

| Specification                   | 27 °C | 45 °C | 85 °C |
|---------------------------------|-------|-------|-------|
| Level 0–1 voltage (mV)          | 384.5 | 384.5 | 383   |
| Level 1–2 voltage (mV)          | 650.8 | 651.5 | 652.8 |
| Level 2–3 voltage (mV)          | 849.2 | 849.9 | 845.3 |
| Total average power (mW)        | 42.74 | 42.15 | 40.95 |
| Acquisition BW, level 0–1 (GHz) | 2.6   | 2.47  | 2.2   |
| Acquisition BW, level 1–2 (GHz) | 3.79  | 3.66  | 3.43  |
| Acquisition BW, level 2–3 (GHz) | 4.05  | 3.97  | 3.83  |
| Hold error (uV)                 | 107   | 104   | 101   |
| Droop rate (uV/ns)              | 425.9 | 415.7 | 402   |
| Pedestal error (mV)             | 38.48 | 34.82 | 27.21 |

P-type substrate guard rings are used to insulate the transistors of each stage from each other for better immunity against noise and other fabrication anomalies. The P-type guard rings are then surrounded with N-type guard rings connected to the higher voltage potentials *AVDD\_1P7* and *AVDD\_3P1* and separated by the minimum distance required to satisfy the Design Rule Check (DRC) constraints.

#### Table 6 Comparison of our results with the latest relevant publications

The total area of the layout is 150 um by 150 um (0.0225 mm<sup>2</sup>). A portion of the final layout can be seen in Fig. 10.

## 6 Simulation results

Tables 4 and 5 list the results of the simulations carried out across three different temperatures at the typical process variation.

The pre-layout acquisition bandwidth is significantly greater than the simulated post-layout acquisition bandwidth. The hold error and droop rate are greater post-layout than they are pre-layout. The power consumption and pedestal error are both smaller post-layout. There is a noticeable decrease in the voltage levels of the output signal from pre-layout to post-layout results. These smaller levels explain why the post-layout design consumes less power. Across temperature variations, the specifications do not vary by a significant amount in either the pre-layout or the post-layout simulation results.
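The trends described above can be quantified with a small sketch that computes the relative change from pre-layout to post-layout for a few of the 27 °C entries of Tables 4 and 5. The helper name and the selection of rows are ours.

```python
def percent_change(pre, post):
    """Relative change from the pre-layout to the post-layout value, in %."""
    return 100.0 * (post - pre) / pre


# (pre-layout, post-layout) values at 27 C, copied from Tables 4 and 5.
results_27c = {
    "Total average power (mW)": (49.42, 42.74),
    "Acquisition BW, level 0-1 (GHz)": (2.774, 2.6),
    "Level 2-3 voltage (mV)": (994.8, 849.2),
    "Hold error (uV)": (64.6, 107.0),
    "Pedestal error (mV)": (47.6, 38.48),
}

for name, (pre, post) in results_27c.items():
    print(f"{name:32s} {percent_change(pre, post):+6.1f} %")
```

A negative sign marks quantities that shrink after layout (power, bandwidth, output level, pedestal error), while the hold error carries a positive sign.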

Figure 11 shows the output signal of the T&H IC pre-layout and post-layout, together with the clock and a PAM4 input signal. Figure 12 shows the output signal when sampling an NRZ input signal. Figure 13 is the generated eye diagram of a pseudorandom binary sequence (PRBS) PAM4 input signal.

| Specification                      | Our design         | [9]                 | [10]               | [11]               |
|------------------------------------|--------------------|---------------------|--------------------|--------------------|
|                                    | Reported           | Reported            | Reported           | Reported           |
| Architecture                       | Switched capacitor | SEF                 | Switched capacitor | Switched capacitor |
| Process                            | 45 nm CMOS         | 0.25 µm BiCMOS SiGe | 28 nm CMOS         | 28 nm CMOS         |
| Input rate (GS/s)                  | 1                  | Not reported        | Not reported       | Not reported       |
| Sampling rate (GSps)               | 0.2                | 1.5                 | 25                 | 10                 |
| Small signal bandwidth (GHz)       | 1.25               | 1                   | 70                 | 1.9                |
| Clock input voltage (single ended) | 3 V                | Not reported        | Not reported       | Not reported       |
| Clock rise/fall time (ps)          | 250                | Not reported        | Not reported       | Not reported       |
| Input voltage common mode          | 1.5 V              | Not reported        | Not reported       | Not reported       |
| Input voltage differential         | 1.5 V              | 1 Vpp               | 0.4 Vpp            | 0.8 Vpp            |
| Output voltage common mode         | 1 V                | Not reported        | 0                  | 0                  |
| Output voltage (peak to peak)      | 850 mV             | Not reported        | 0.12 mV            | 0.14 mV            |
| Power supply                       | 1.7 V & 3.1 V      | Not reported        | 1.4 V & 1.75 V     | 1.4 V & 1.8 V      |
| Die area (mm<sup>2</sup>)          | 0.0225             | 0.7475              | 0.53               | 0.89               |
| Hold error (µV)                    | 101 up to 107      | Not reported        | Not reported       | Not reported       |
| Droop rate (µV/ns)                 | 402 up to 425.9    | 161.62              | Not reported       | Not reported       |
| Pedestal error (mV)                | 27.21 up to 38.48  | Not reported        | Not reported       | Not reported       |
| Acquisition bandwidth (GHz)        | 2.2 up to 4.05     | Not reported        | Not reported       | Not reported       |
| Operating temperature (°C)         | 27 to 85           | Not reported        | Not reported       | Not reported       |
| Power consumption (mW)             | 41 up to 43        | 480                 | 73                 | 50                 |

Table 6 is a comparison of the results presented in this implementation with respect to similar publications.
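To put the hold-mode figures of Table 6 into perspective, the short sketch below estimates how much of the PAM4 decision margin the worst-case droop and pedestal error would consume. It is only an illustration under stated assumptions, not part of the reported simulations: the four PAM4 levels are assumed to be equally spaced over the 850 mV output swing, and the hold phase is assumed to last half of the 5 ns sampling period (`HOLD_FRACTION` is a hypothetical parameter, since the duty cycle is not stated in this section).

```python
# Hold-mode error estimate using the "Our design" column of Table 6.
SAMPLE_RATE_HZ = 200e6        # T&H sampling rate (200 MSps)
OUTPUT_SWING_V = 0.850        # output voltage, peak to peak
MAX_DROOP_UV_PER_NS = 425.9   # worst reported droop rate
MAX_PEDESTAL_MV = 38.48       # worst reported pedestal error
HOLD_FRACTION = 0.5           # ASSUMPTION: hold phase lasts half a clock period


def pam4_level_spacing(swing_v: float) -> float:
    """Spacing between adjacent levels, ASSUMING four equally spaced levels."""
    return swing_v / 3.0


def droop_during_hold(droop_uv_per_ns: float, sample_rate_hz: float,
                      hold_fraction: float) -> float:
    """Voltage lost to droop over one hold phase, in volts."""
    hold_time_ns = hold_fraction / sample_rate_hz * 1e9
    return droop_uv_per_ns * hold_time_ns * 1e-6


spacing_v = pam4_level_spacing(OUTPUT_SWING_V)
droop_v = droop_during_hold(MAX_DROOP_UV_PER_NS, SAMPLE_RATE_HZ, HOLD_FRACTION)
pedestal_v = MAX_PEDESTAL_MV * 1e-3
# Distance from a nominal level to the nearest decision threshold.
margin_v = spacing_v / 2.0

print(f"PAM4 level spacing : {spacing_v * 1e3:.1f} mV")
print(f"Droop over hold    : {droop_v * 1e3:.2f} mV")
print(f"Pedestal error     : {pedestal_v * 1e3:.2f} mV")
print(f"Error / margin     : {(droop_v + pedestal_v) / margin_v:.1%}")
```

Under these assumptions, the droop accumulated over one hold phase is about 1 mV, which is small compared with the roughly 283 mV spacing between adjacent levels. The pedestal error is the larger of the two contributions.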

## 7 Conclusion

In this work, we presented a T&H circuit that is able to sample a 1 GS/s PAM4 signal at 200 MSps while consuming less than 50 mW. Although this circuit is designed for PAM4 signaling, it also works with NRZ signaling. No publications on a T&H circuit targeting PAM4 signaling are available for comparison. However, publication [9], based on a switched-emitter-follower (SEF) architecture in BiCMOS, demonstrates a similar small signal bandwidth and input voltage, but at the expense of a larger area and about 10 times the power consumption. Publication [10], based on a switched capacitor architecture in CMOS, reports a higher sampling rate and bandwidth, but with a smaller input voltage, a much smaller output voltage, and a larger area. Publication [11], based on a switched capacitor architecture in CMOS, reports a similar bandwidth and power consumption, but with a much larger area and a narrower output voltage. Most of these publications do not report important specifications such as the clock input voltage, hold error, droop rate, pedestal error, acquisition bandwidth, and operating temperature range. It is safe to assume that none of these designs is meant to maintain its mode of operation up to high temperatures, as ours is. Our circuit can sample both PAM4 and NRZ signals, which is a first for a published T&H circuit based on a switched capacitor architecture.
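The power and area comparisons above can be checked directly against Table 6. The following sketch computes the ratios relative to our design. The variable and function names are illustrative, and the 42 mW figure is the midpoint of the reported 41 to 43 mW range, which is an assumption made here for a single-number comparison.

```python
# Power and die-area values taken from Table 6.
# ASSUMPTION: 42 mW is used as a representative value for the 41-43 mW range.
TABLE6 = {
    "ours": {"power_mw": 42.0, "area_mm2": 0.0225},
    "[9]": {"power_mw": 480.0, "area_mm2": 0.7475},
    "[10]": {"power_mw": 73.0, "area_mm2": 0.53},
    "[11]": {"power_mw": 50.0, "area_mm2": 0.89},
}


def ratios_vs_ours(table: dict) -> dict:
    """Return power and area of each reference divided by our values."""
    ours = table["ours"]
    return {
        name: {
            "power": spec["power_mw"] / ours["power_mw"],
            "area": spec["area_mm2"] / ours["area_mm2"],
        }
        for name, spec in table.items()
        if name != "ours"
    }


# Sanity check: a 150 um x 150 um layout expressed in mm^2.
layout_area_mm2 = (150e-3) ** 2

ratios = ratios_vs_ours(TABLE6)
for name, r in ratios.items():
    print(f"{name}: {r['power']:.1f}x power, {r['area']:.1f}x area")
```

With these inputs, [9] comes out at roughly 11 times the power and 33 times the area of our design, which is consistent with the "about 10 times" statement in the text.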

The low power consumption of this block enables it to integrate well into IoT applications such as portable always-on testers for 1 GS/s signals. Another application would be to integrate this circuit, together with an ADC, as a front-end in existing Small Form-factor Pluggable (SFP) transceiver modules in order to determine the quality of the received signal before it is processed and checked in software, saving the extra overhead in time.
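As a rough illustration of the signal-quality use case, sampling a 1 GS/s PAM4 stream at 200 MSps captures every fifth symbol, which is still enough to build statistics over all four levels. The sketch below is a behavioral model only and does not describe the circuit. It assumes ideal sampling at the symbol center, a random symbol stream, and a normalized level mapping (`PAM4_LEVELS`), all of which are hypothetical choices made for this example.

```python
import random
from collections import Counter

SYMBOL_RATE = 1_000_000_000   # 1 GS/s PAM4 input
SAMPLE_RATE = 200_000_000     # 200 MSps T&H clock
DECIMATION = SYMBOL_RATE // SAMPLE_RATE  # symbols per T&H sample

# ASSUMPTION: normalized, equally spaced levels; the bit-to-level mapping
# is illustrative and not taken from the paper.
PAM4_LEVELS = {0b00: -3, 0b01: -1, 0b10: 1, 0b11: 3}


def pam4_symbols(count: int, seed: int = 0) -> list:
    """Generate a pseudo-random PAM4 symbol stream (ideal levels, no noise)."""
    rng = random.Random(seed)
    return [PAM4_LEVELS[rng.getrandbits(2)] for _ in range(count)]


def track_and_hold(symbols: list, decimation: int) -> list:
    """Ideal behavioral T&H: keep one symbol out of every `decimation`."""
    return symbols[::decimation]


symbols = pam4_symbols(10_000)
held = track_and_hold(symbols, DECIMATION)
histogram = Counter(held)

for level in sorted(histogram):
    print(f"level {level:+d}: {histogram[level]} samples")
```

In a real monitor, the held values would come from the ADC following the T&H, and the spread of each histogram bin around its nominal level would serve as the quality metric.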

**Funding** No funding was received to assist with the preparation of this manuscript.

#### Declarations

**Conflict of interest** Author Jad G. Atallah is a member of the editorial board of the journal Analog Integrated Circuits and Signal Processing, to which this manuscript is being submitted.

## References

1. Frenzel, L. E. (2016). *Principles of electronic communication systems* (4th ed.). Columbus, OH: McGraw-Hill Education.
2. Carusone, T. C., Johns, D. A., & Martin, K. W. (2012). *Analog integrated circuit design: International student version* (2nd ed.). Nashville, TN: John Wiley & Sons.
3. MAXIM: Fast Sample-and-Hold Circuit. (2012). MAXIM. 19-4539; Rev 1; 2/12
4. Analog Devices: WIDEBAND 4 GS/s TRACK-AND-HOLD AMPLIFIER DC - 5 GHz. Analog Devices. v01.0514
5. National Semiconductor: Specifications and Architectures of Sample-and-Hold Amplifiers. (1992). National Semiconductor. AN011215
6. Daneshgar, S., Rodwell, M., & Griffith, Z. (2012). High-speed track-and-hold circuit design. https://web.ece.ucsb.edu/Faculty/rodwell/publications\_and\_presentations/publications/2012\_9\_sept\_Daneshgar\_CSICS2012\_slides.pdf
7. Maloberti, F. (2010). *Data converters*. New York, NY: Springer.
8. Pelgrom, M. (2018). *Analog-to-digital conversion*. Cham, Switzerland: Springer.
9. Cannone, F., Cascella, D., Avitabile, G., & Coviello, G. (2012). A high bandwidth 11-bit 1.5 GS/s track and hold amplifier in 0.25 µm SiGe BiCMOS. In *2012 International Conference on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design (SMACD)* (pp. 49–52).
10. Tretter, G., Fritsche, D., Khafaji, M. M., Carta, C., & Ellinger, F. (2016). A 55-GHz-bandwidth track-and-hold amplifier in 28-nm low-power CMOS. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 63(3), 229–233.
11. Tretter, G., Fritsche, D., Carta, C., & Ellinger, F. (2013). 10-GS/s track and hold circuit in 28 nm CMOS. In *2013 International Semiconductor Conference Dresden - Grenoble (ISCDG)*.




Mohamad El Mokdad received his bachelor's degree in computer and communication engineering in 2021 from Notre Dame University - Louaize, Lebanon. Prior to graduation, he joined a multinational high-speed testing and measurement equipment company where he worked on digital designs for 400G Ethernet subsystem applications. He then introduced others to the Cadence Virtuoso Suite as a certified lab instructor and went on to learn the fundamentals of IC design using a 0.13 µm BiCMOS technology. In 2021, he joined a fabless semiconductor startup that focuses on low-power WiFi SoCs where he is currently working as an analog and layout design engineer. He is a founding member of NDU eCAS lab, tackling new and interesting analog mixed signal topics with other members under the supervision of Dr. Jad G. Atallah.



Elias Salameh received his bachelor's degree in computer and communication engineering in 2021 from Notre Dame University - Louaize, Lebanon. During his studies, he was supervised by Dr. Jad G. Atallah in a number of integrated circuit design projects. After his graduation, Elias continued working on pending projects as a member of the NDU eCAS lab, led by Dr. Jad G. Atallah. Elias is currently an MSc student in Electrical Engineering and Information Technology at the Swiss Federal Institute of Technology (ETHZ).



Jad G. Atallah received the B.E. degree in computer and communications engineering in 2001 from the American University of Beirut, Lebanon, and the M.Sc. degree in electrical engineering with the specialization in system-on-chip design in 2003 as well as the Ph.D. degree in electronic and computer systems in 2008, both from the Royal Institute of Technology, Stockholm, Sweden. In 2009, Dr. Atallah joined the Faculty of Engineering at the Notre Dame University - Louaize, Lebanon. In 2012, he was a visiting researcher at the ElectroScience Laboratory, Center of Excellence, The Ohio State University, USA. Dr. Atallah teaches regular and specialized courses in the fields of advanced wireless communications radio design and low power analog and mixed-mode circuit and system design. His current research interests are in reliable low-power mm-wave/RF/analog/mixed signal designs as well as in undergraduate education in IC design. Having published in several journals, magazines, and conferences, Dr. Atallah also co-authored a book on frequency synthesizers for convergent wireless solutions, published by Springer, as well as EDA educational material within the Cadence Academic Network.