A 32 Gb/s Low Power Little Area Re-timer with PI Based CDR in 65 nm CMOS Technology

Pang, Zhengbin; Lv, Fangxu; Tang, Weiping; Lai, Mingche; Guo, Kaile; Wu, Yuxuan; Liu, Tao; Wu, Miaomiao; Lu, Dechao

doi:10.1007/978-981-15-8135-9_3

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1256))

Included in the following conference series:

Conference on Advanced Computer Architecture

Abstract

This paper presents a 32 Gb/s low-power, small-area re-timer with a Phase Interpolator (PI) based Clock and Data Recovery (CDR) circuit. To ensure signal integrity, both a Continuous Time Linear Equalizer (CTLE) and a Feed Forward Equalizer (FFE) are adopted. To reduce power dissipation, a quarter-rate 3-tap FFE is proposed. To reduce chip area, a Bang-Bang Phase Discriminator (BBPD) based PI CDR is employed. In addition, a second-order digital filter is adopted to improve the jitter performance of the CDR loop. The re-timer is implemented in 65 nm CMOS technology with a 1.1 V supply. Simulation results show that the proposed re-timer operates at 32 Gb/s and consumes 91 mW. It can equalize more than 12 dB of channel attenuation and tolerate a frequency difference of 200 ppm.

1 Introduction

The continuously increasing bandwidth demand for data communication in high performance computing (HPC) has pushed wire-line links towards data rates of 25 Gb/s and beyond [1]. At the same time, low-power and high-density data transceivers are key elements of modern HPC systems, because systems such as network switches and processor interfaces will employ optical communication [2, 3]. Figure 1 shows a next-generation switch system with optical communication. The black box in the center of the system, which outputs optical signals directly, usually consists of a switch chip, many re-timer chips and other optical chips. The bandwidth, power efficiency and area of the re-timer therefore limit the performance of the switch system. Although many reported CDRs meet the bandwidth requirement, they are power hungry because they are fabricated with III-V materials [4]. In addition, the large area of these CDRs hinders high-density integration.

To solve these problems, a high-speed, low-power and small-area re-timer in CMOS technology is proposed. To reduce power dissipation, a quarter-rate 3-tap FFE is proposed. To reduce chip area, a BBPD based PI CDR is employed. In addition, to improve the high-speed performance, a second-order digital filter is used.

This paper is organized as follows. Section 2 presents the architecture of the re-timer, followed by a description of its building blocks. Section 3 presents the experimental results, and Section 4 concludes the paper.

2 Architecture and Circuit Design

Figure 2 shows the re-timer architecture, which includes a phase tracking control loop and a data path. In the phase tracking control loop, the input data are first sampled by eight-phase quarter-rate clocks. The early/late information between the sampling clocks and the input data is then extracted by the phase detector (PD). After the voter and the filter, the control words generated by the code circuit rotate the phase of the external input clock until it matches the phase of the input data. In the data path, the input data are first equalized by the CTLE, then resampled by the recovered clock, and finally equalized and driven out by the 3-tap FFE with its driver.

The phase tracking control loop implements a quarter-rate BBPD based CDR, which consists of a quarter-rate sampler, an 8:32 DEMUX, a phase detector, a voter, a second-order digital filter, a code circuit, and a phase interpolator. The data path consists of the CTLE, a baud-rate sampler, a delay latch array, and a 4:1 MUX based 3-tap FFE.

2.1 BBPD Based PI-CDR with 2-Order Digital Filter

The clock recovery circuit is the most important module in a high-speed re-timer system. Its main task is to extract clock information from input data that carry amplitude noise and phase noise, and then to retime the data. In addition, the CDR can track low-frequency phase jitter present in the input data. A CDR, whose working principle is shown in Fig. 3, mainly consists of a clock recovery (CR) module and a data recovery (DR) module. The CR module detects the phase information of the data and generates a clock aligned to the input data. The DR module uses the generated clock to retime the data.

Figure 4 shows the model of the proposed CDR, which is a Bang-Bang phase discriminator (BBPD) based PI CDR with a second-order digital filter. It consists of a BBPD, a voter, a second-order digital filter, a phase interpolator, and a feedback path. The BBPD extracts the phase error between the input data and the clock generated by the PI. The voter condenses the BBPD decisions into an effective early/late result. The second-order digital filter smooths the voter output, and the filtered result controls the PI. The PI generates a clock of the desired phase from a fixed input clock (Fig. 5).
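The early/late decision and the voting step can be sketched in a few lines of Python. The sketch assumes the common Alexander-style bang-bang rule, in which an edge sample is taken between two consecutive data samples, and a simple majority voter. The text does not spell out either circuit, so the function names `bbpd_decisions` and `majority_vote` and the sign convention (+1 for early, -1 for late) are illustrative assumptions.

```python
def bbpd_decisions(data, edges):
    """Early/late decisions of an Alexander-style bang-bang phase detector.

    data[i] and data[i + 1] are consecutive data samples and edges[i] is the
    sample taken between them. Each bit boundary yields +1 (clock early),
    -1 (clock late) or 0 (no data transition, so no phase information).
    The sign convention is an assumption of this sketch.
    """
    decisions = []
    for prev_bit, edge, next_bit in zip(data, edges, data[1:]):
        if prev_bit == next_bit:
            decisions.append(0)
        elif edge == prev_bit:
            # The edge sample still sees the old bit: the clock is early.
            decisions.append(+1)
        else:
            # The edge sample already sees the new bit: the clock is late.
            decisions.append(-1)
    return decisions


def majority_vote(decisions):
    """Collapse a block of decisions into a single +1, 0 or -1."""
    total = sum(decisions)
    return (total > 0) - (total < 0)
```

Because boundaries without a transition contribute zero, the voter output is driven only by bit boundaries that actually carry phase information, which is the effect that the edge conversion density models in a linearized analysis.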

To analyze the performance of the CDR, a linearized model with its parameters is shown in Fig. 6. In the linearized model, $ K_{TD} $ is the edge conversion (transition) density of the input data. $ K_{PD} $ is the phase detector gain. $ K_{V} $ is the gain of the voter, which accounts for any decimation that takes place. $ K_{P} $ and $ K_{I} $ are the gains of the proportional and integral paths from the voter output to the PI. $ K_{PI} $ is the gain of the PI, which corresponds to the PI resolution in Unit Intervals (UI) per bit. $ z^{ - NEL} $ represents the total delay (analog and digital pipeline stages) around the loop. Thus, the open-loop transfer function of the linearized CDR can be expressed as

$$ G(z^{-1}) = \frac{\varphi_{out}}{\varphi_{err}} = K_{TD} K_{PD} K_{V} \left( K_{P} + K_{I} \frac{z^{-1}}{1 - z^{-1}} \right) \frac{z^{-1}}{1 - z^{-1}} K_{PI}\, z^{-NEL} \tag{1} $$
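As a sanity check on Eq. (1), the linearized loop can be stepped directly in discrete time. The sketch below lumps $ K_{TD} K_{PD} K_{V} K_{PI} $ into a single gain `k_loop`, sets NEL to 0, and uses hypothetical values for $ K_{P} $ and $ K_{I} $, since none of the loop constants are given in the text. It illustrates why the integral path matters: a constant frequency offset makes the input phase a ramp, and the second-order loop drives the resulting phase error to zero.

```python
def simulate_loop(phase_step_per_cycle, cycles, k_loop=1.0, k_p=0.1, k_i=0.001):
    """Step the linearized CDR loop of Eq. (1) with NEL = 0.

    phase_step_per_cycle is the input phase ramp (UI per filter cycle)
    caused by a frequency offset. All gain values are hypothetical.
    Returns the final phase error and the final integral-path state.
    """
    integ = 0.0    # integral path state: K_I * z^-1 / (1 - z^-1)
    phi_out = 0.0  # output phase accumulator: z^-1 / (1 - z^-1)
    err = 0.0
    for n in range(cycles):
        phi_in = phase_step_per_cycle * n
        err = k_loop * (phi_in - phi_out)
        control = k_p * err + integ
        integ += k_i * err
        phi_out += control
    return err, integ


if __name__ == "__main__":
    err, integ = simulate_loop(0.001, 3000)
    print(f"second-order loop: error={err:.3e}, integrator={integ:.6f}")
    err_p, _ = simulate_loop(0.001, 3000, k_i=0.0)
    print(f"proportional only: error={err_p:.3e}")
```

With the integral path disabled the loop settles to a constant error equal to the phase step divided by the proportional gain, whereas with it enabled the integrator absorbs the offset and the error vanishes.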

In the z-domain, $ z = e^{sT_{DLF}} $, where $ s = j2\pi f $ and $ T_{DLF} $ is the operating (cycle) period of the digital loop filter (DLF). Expanding $ e^{ - sT_{DLF}} = 1 + ( - sT_{DLF}) + \frac{( - sT_{DLF})^{2}}{2!} + \frac{( - sT_{DLF})^{3}}{3!} + \cdots $ and keeping only the first-order term for $ |sT_{DLF}| \ll 1 $ gives

$$ z^{-1} = e^{-sT_{DLF}} \approx 1 - sT_{DLF}, \quad (|sT_{DLF}| \ll 1) \tag{2} $$
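Applying Eq. (2) to the individual factors of Eq. (1) makes the substitution explicit. Since $ 1 - z^{-1} \approx sT_{DLF} $, each accumulator and the loop delay become

$$ \frac{z^{-1}}{1 - z^{-1}} \approx \frac{1 - sT_{DLF}}{sT_{DLF}}, \qquad z^{-NEL} = \left( z^{-1} \right)^{NEL} \approx (1 - sT_{DLF})^{NEL} $$

The integral path passes through two accumulators, which is why its term carries the square of the first factor.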

Therefore the open-loop transfer function can be given by

$$ \begin{aligned} G(s) &= K_{TD} K_{PD} K_{V} \left[ \frac{K_{P} (1 - sT_{DLF})}{sT_{DLF}} + \frac{K_{I} (1 - sT_{DLF})^{2}}{s^{2} T_{DLF}^{2}} \right] K_{PI} (1 - sT_{DLF})^{NEL} \\ &\approx K_{TD} K_{PD} K_{V} \left( \frac{K_{P}}{sT_{DLF}} + \frac{K_{I}}{s^{2} T_{DLF}^{2}} \right) K_{PI} (1 - sT_{DLF})^{NEL} \end{aligned} \tag{3} $$

The phase transfer function is given by the following well known equation:

$$ H(s) = \frac{\varphi_{out}}{\varphi_{in}} = \frac{G(s)}{1 + G(s)} \tag{4} $$

Figure 6 shows the calculated phase transfer function of the proposed CDR. It can be observed that the bandwidth is 1.46 MHz.
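The closed-loop response of Eq. (4) can be evaluated numerically from the approximate open-loop gain of Eq. (3). The sketch below is only an illustration: the lumped gain `k_loop` (standing for $ K_{TD} K_{PD} K_{V} K_{PI} $), `k_p`, `k_i`, `t_dlf` and `n_el` are hypothetical values, because the text does not list the loop parameters, so the result is not expected to reproduce the 1.46 MHz figure. The sweep is limited to frequencies where $ |sT_{DLF}| $ stays below 1, since Eq. (3) is not valid beyond that.

```python
import numpy as np


def open_loop(f, k_loop=1.0, k_p=0.01, k_i=1e-5, t_dlf=1e-9, n_el=4):
    """Approximate open-loop gain G(s) of Eq. (3); all defaults are hypothetical."""
    st = 1j * 2 * np.pi * np.asarray(f, dtype=float) * t_dlf
    return k_loop * (k_p / st + k_i / st**2) * (1 - st) ** n_el


def closed_loop(f, **params):
    """Phase transfer function H(s) = G(s) / (1 + G(s)) of Eq. (4)."""
    g = open_loop(f, **params)
    return g / (1 + g)


def bandwidth_3db(f_start=1e3, f_stop=1e8, points=2001, **params):
    """First swept frequency at which |H| falls below 1/sqrt(2), or None."""
    f = np.logspace(np.log10(f_start), np.log10(f_stop), points)
    below = np.abs(closed_loop(f, **params)) < 1 / np.sqrt(2)
    return float(f[np.argmax(below)]) if below.any() else None


if __name__ == "__main__":
    print(f"-3 dB bandwidth with the assumed gains: {bandwidth_3db() / 1e6:.2f} MHz")
```

In this model the bandwidth is set mainly by the proportional term, roughly `k_loop * k_p / (2 * pi * t_dlf)`, while the integral term adds a small amount of peaking near the corner.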

2.2 Phase Interpolator

The PI is the key module of the CDR. Under the control of the input control words, it generates a clock of the desired phase for sampling the input data. The working principle of the basic PI can be explained with a vector diagram and the corresponding mathematical model. In Fig. 7, the two basic vectors $ \overrightarrow {{V_{Q} }} $ and $ \overrightarrow {{V_{I} }} $, which are 90° apart in phase, combine into a new vector. From basic vector geometry, the phase of the composite vector, that is, the angle between the composite vector and the horizontal vector, can be controlled by changing the amplitudes of the two basic vectors. This vector sum is expressed by Eq. (5).

$$ \overrightarrow{V_{O}} = \overrightarrow{V_{Q}} + \overrightarrow{V_{I}} \tag{5} $$

$$ V_{out} = \alpha A \sin (\omega t) + (1 - \alpha )A \cos (\omega t), \quad (0 \le \alpha \le 1) \tag{6} $$

$$ V_{out} = A\sqrt{\alpha^{2} + (1 - \alpha )^{2}} \sin (\omega t + \varphi_{out}) \tag{7} $$

$$ \varphi_{out} = \arctan \left( \frac{1 - \alpha}{\alpha} \right) \tag{8} $$

In an actual circuit, the two basic vectors $ \overrightarrow {{V_{Q} }} $ and $ \overrightarrow {{V_{\text{I}} }} $ can be replaced by $ \alpha A\,\sin (\omega t) $ and $ (1 - \alpha )A\,{ \cos }(\omega t) $, so Eq. (5) can be expressed as Eq. (6), where $ \alpha $ is limited to [0, 1]. The phase difference between $ \sin (\omega t) $ and $ { \cos }(\omega t) $ is 90°, and $ \alpha A $ and $ (1 - \alpha )A $ are their respective amplitudes. When $ \alpha $ changes, the phase of Vout moves within 0° to 90°, which gives the desired output phase. To calculate the output phase precisely, Eq. (7) is derived from Eq. (6), and the phase of Vout is then given by Eq. (8). Figure 8 shows the Vout waveforms for different $ \alpha $ values.
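Equations (6) to (8) are easy to check numerically. The following sketch evaluates the amplitude and phase of the interpolated output for a given $ \alpha $ and also provides the direct weighted sum of Eq. (6) for comparison; the function names are illustrative, and `atan2` is used instead of a plain arctangent so that $ \alpha = 0 $ yields 90° without a division by zero.

```python
import math


def pi_output(alpha, amplitude=1.0):
    """Magnitude and phase (degrees) of the interpolated clock, Eqs. (7) and (8)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    magnitude = amplitude * math.hypot(alpha, 1.0 - alpha)
    phase_deg = math.degrees(math.atan2(1.0 - alpha, alpha))
    return magnitude, phase_deg


def v_out(alpha, wt, amplitude=1.0):
    """Direct weighted sum of the two basic clocks, Eq. (6)."""
    return alpha * amplitude * math.sin(wt) + (1.0 - alpha) * amplitude * math.cos(wt)


if __name__ == "__main__":
    for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
        mag, phase = pi_output(alpha)
        print(f"alpha={alpha:.2f}  magnitude={mag:.3f}  phase={phase:.1f} deg")
```

Two properties follow from Eq. (7): the phase runs from 90° at $ \alpha = 0 $ down to 0° at $ \alpha = 1 $, and the output amplitude is not constant but dips to about 0.707 A at $ \alpha = 0.5 $.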

Figure 9 shows part of the PI circuit. It includes two pull-up loads, two pairs of input transistors, and 16 equal tail current sources under each input pair. The relationship between the input thermometer code and the output clock phase is depicted in Fig. 10.

If the two basic input clocks are selected from the 0°, 90°, 180° and 270° phases, the composite clock can take any desired phase between 0° and 360°, as depicted in Fig. 11. Figure 12 shows the complete PI circuit, which consists of four pairs of input transistors, the control-word transistors and the tail current sources.
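The mapping from a control code to an output phase over the full circle can be sketched as follows. This is a simplified model under stated assumptions: the 16 tail current sources per input pair are taken from the text, but the quadrant numbering, the linear weight `n_on / 16`, and the function names are hypothetical choices made only for illustration.

```python
import math

STEPS = 16  # tail current sources per input pair, as stated in the text


def thermometer(n_on, width=STEPS):
    """Thermometer code with n_on ones followed by zeros."""
    return [1] * n_on + [0] * (width - n_on)


def pi_phase_deg(quadrant, n_on):
    """Output phase when n_on of the 16 sources steer to the next basic clock.

    quadrant 0..3 selects the pair (0, 90), (90, 180), (180, 270) or (270, 0)
    degrees. The linear current weighting is an assumption of this sketch.
    """
    if quadrant not in range(4) or not 0 <= n_on <= STEPS:
        raise ValueError("quadrant must be 0..3 and n_on must be 0..16")
    w = n_on / STEPS
    return (90.0 * quadrant + math.degrees(math.atan2(w, 1.0 - w))) % 360.0


if __name__ == "__main__":
    for n in (0, 1, 8, 9, 16):
        print(n, thermometer(n), f"{pi_phase_deg(0, n):.3f} deg")
```

Under this linear-weight assumption the phase steps are not uniform: the first step is about 3.8°, while the step around mid-code is about 7.1°, compared with the ideal 5.625° for 16 steps per quadrant.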

2.3 4:1 MUX Based 3-Tap FFE

It is well known that a dielectric channel usually presents a low-pass characteristic due to dielectric loss and the skin effect. Figure 13(a) shows the S12 curve of a typical backplane channel, which includes a 19 in. PCB trace, 2 via holes, 2 packages and 2 connectors. The attenuation at the baud rate frequency is −17.32 dB. When the data rate exceeds the channel bandwidth, a transmitted symbol cannot complete its transition within 1 unit interval (UI) and spreads into the adjacent symbol intervals, as shown in Fig. 13(b). This phenomenon is called inter-symbol interference (ISI), and it degrades the signal integrity of high-speed signals. Figure 14 presents the 32 Gb/s NRZ eye diagram before and after this channel; after the channel, the eye is closed due to ISI.

To mitigate this problem, a feed-forward equalizer (FFE) is usually introduced at the output of the re-timer to reduce the ISI. The basic structure of a 3-tap FFE, shown in Fig. 15, includes 3 delay units, 3 multiplying units with 3 coefficients and a summer, and forms a finite impulse response (FIR) filter. Its time-domain response is given by Eq. (9), and its z-domain transfer function by Eq. (10), where $ z = {\text{e}}^{j2\pi fT} $. Figure 16 shows the different frequency responses. The black curve is the channel response without the FFE. The blue curve is the high-pass response of the FIR filter with proper tap coefficients. The red curve is the channel response with the FFE, which preserves signal integrity. Figure 17(a) and (b) show the eye diagrams before and after the channel when the FFE uses proper coefficients.

$$ y(t) = c_{0}\, x(t) + c_{1}\, x(t - T) + c_{2}\, x(t - 2T) \tag{9} $$

$$ H(z) = c_{0} + c_{1} z^{-1} + c_{2} z^{-2}, \quad z = e^{j2\pi fT} \tag{10} $$
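The high-pass behaviour of the 3-tap FIR filter in Eq. (10) can be illustrated numerically. The sketch below evaluates the transfer function on the unit circle and applies the taps to a symbol sequence. The tap values are hypothetical, chosen only to show a de-emphasis style response; the text does not give the coefficients used in the design.

```python
import numpy as np


def ffe_response(coeffs, f, ui):
    """Evaluate H(z) = sum_k c_k z^-k at z = exp(j 2 pi f T), Eq. (10)."""
    z_inv = np.exp(-1j * 2 * np.pi * np.asarray(f, dtype=float) * ui)
    return sum(c * z_inv**k for k, c in enumerate(coeffs))


def ffe_apply(coeffs, symbols):
    """Filter a symbol sequence, one sample per UI, with the FIR taps."""
    return np.convolve(symbols, coeffs)[: len(symbols)]


if __name__ == "__main__":
    ui = 1 / 32e9                # unit interval at 32 Gb/s
    taps = [-0.1, 0.7, -0.2]     # hypothetical c0, c1, c2
    dc = abs(ffe_response(taps, 0.0, ui))
    nyquist = abs(ffe_response(taps, 16e9, ui))
    print(f"|H| at DC = {dc:.2f}, |H| at 16 GHz = {nyquist:.2f}")
    print(f"relative high-frequency boost = {20 * np.log10(nyquist / dc):.1f} dB")
```

With these example taps the gain is 0.4 at DC and 1.0 at the 16 GHz Nyquist frequency, a relative boost of about 8 dB that partly compensates the low-pass loss of the channel.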

Compared with the pre-emphasis based FFE, the de-emphasis based FFE is widely used due to its simple circuit structure. A de-emphasis based FFE equalizes the output signal by reducing the amplitude of the low-frequency components of the original signal while maintaining the amplitude of the high-frequency components, which still follows the principle of the FFE. However, when data rates exceed 20 Gb/s, the high-speed delay cells are power hungry and their timing becomes tight under PVT variation. To solve these problems, a 4:1 MUX based 3-tap FFE is introduced in this re-timer, as shown in Fig. 18. Compared with other FFE circuits, the delay cell in this FFE is built from three 4:1 MUX units, which saves power and relaxes the critical-path timing by using the quarter-rate clock and avoiding CML based circuits.

Figure 19 shows the 4:1 MUX together with its timing diagram. The MUX consists of shunt-peaked loads and four identical unit cells, which are activated sequentially by the 2UI-spaced pulses of the quadrature clocks (i.e., CK0, CK90, CK180, and CK270) to combine the four quarter-rate data streams into one serial sequence.
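A behavioural sketch can show how a 1 UI tap delay may be obtained at quarter rate without any full-rate delay cell. The assumption here, which is one plausible reading of the three-MUX structure rather than a description of the actual circuit, is that each tap has its own 4:1 MUX and that the tap delay comes from rotating the four quarter-rate lanes by one position, with the wrapped lane delayed by one quarter-rate cycle. All function names are hypothetical.

```python
def mux4(lanes):
    """Interleave four quarter-rate lanes into one full-rate serial stream."""
    serial = []
    for group in zip(*lanes):
        serial.extend(group)
    return serial


def delay_lanes_by_one_ui(lanes, fill=0):
    """Lane set whose serialized stream is the original delayed by 1 UI."""
    d0, d1, d2, d3 = lanes
    # The last lane wraps to the first slot, one quarter-rate cycle later.
    return [[fill] + d3[:-1], d0, d1, d2]


def mux_ffe(lanes, coeffs):
    """Sum of one MUX output per tap, each fed with a further rotated lane set."""
    taps = []
    current = lanes
    for _ in coeffs:
        taps.append(mux4(current))
        current = delay_lanes_by_one_ui(current)
    length = len(taps[0])
    return [sum(c * t[i] for c, t in zip(coeffs, taps)) for i in range(length)]


if __name__ == "__main__":
    lanes = [[1, -1], [1, 1], [-1, 1], [-1, -1]]
    print(mux4(lanes))
    print(mux_ffe(lanes, [-1, 7, -2]))  # hypothetical integer tap weights
```

In this model every operation except the final 4:1 selection runs at the quarter-rate clock, which is the property the text credits for the power saving and the relaxed critical-path timing.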

3 Experimental Results

The re-timer is designed in 65 nm CMOS technology. The layout of the re-timer is shown in Fig. 20. The core area of the chip is 0.11 mm².

Figure 21 shows the 32 Gb/s output eye diagrams of the re-timer without and with the FFE. When the output passes through a channel with −12.52 dB attenuation at 16 GHz without the FFE, the eye diagram is closed, as shown in Fig. 21(a). With the 3-tap FFE and proper coefficients, the vertical eye opening is 200 mVpp, as shown in Fig. 21(b). With a 200 ppm frequency offset between the input data and the reference clock, the eye diagram of the recovered quarter-rate clock is shown in Fig. 22, and its total jitter is 7.1 ps. The total power of the re-timer is 91 mW from a 1.1 V supply. Table 1 compares the performance of this work with prior similar works.
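Two derived figures follow from the numbers quoted above by simple arithmetic; they are not additional simulation results:

$$ \frac{91\ \text{mW}}{32\ \text{Gb/s}} \approx 2.84\ \text{pJ/bit}, \qquad 200\ \text{ppm} \times 32\ \text{Gb/s} = 6.4 \times 10^{6}\ \text{UI/s} $$

The second figure means that, at the maximum tolerated offset, the recovered clock slips one full UI relative to the reference clock every 5000 UI (about 156 ns), which the PI has to absorb by rotating its phase continuously.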

Table 1. Performance summary

4 Conclusion

To address the high power consumption and large area of high-speed re-timers in HPC data communication, a 32 Gb/s low-power, small-area re-timer with a PI based CDR is proposed. To ensure signal integrity, both a CTLE and a feed forward equalizer are adopted. To reduce power dissipation, a quarter-rate 3-tap FFE is proposed. To reduce chip area, a BBPD based PI CDR is employed. In addition, a second-order digital filter is adopted to improve the high-speed performance of the CDR loop. The re-timer is implemented in 65 nm CMOS technology with a 1.1 V supply. Simulation results show that the proposed re-timer operates at 32 Gb/s and consumes 91 mW. The 3-tap FFE in the re-timer can equalize more than 12 dB of channel attenuation, and the PI based CDR with the second-order digital filter can tolerate a frequency difference of 200 ppm.

References

1. Rupp, K.: 42 years of microprocessor trend data. https://www.karlrupp.net/2018/02/42-years-ofmicroprocessor-trend-data/
2. Moore, G.E.: Cramming more components onto integrated circuits. Electronics 38(8), 114–117 (1965)
3. Pham, D.: The design and implementation of a first-generation CELL processor-a multi-core SoC. In: 2005 International Conference on Integrated Circuit Design and Technology, Austin, TX, USA, pp. 49–52. IEEE (2005)
4. Nagashima, K.: 28-Gb/s × 24-channel CDR-integrated VCSEL-based transceiver module for high-density optical interconnects. In: 2016 Optical Fiber Communications Conference and Exhibition (OFC), Anaheim, CA, pp. 1–3. IEEE (2016)
5. Rahman, W.: A 22.5-to-32-Gb/s 3.2-pJ/b referenceless baud-rate digital CDR with DFE and CTLE in 28-nm CMOS. IEEE J. Solid-State Circ. 52(12), 3517–3531 (2017)
6. Chu, S.-H.: A 22 to 26.5 Gb/s optical receiver with all-digital clock and data recovery in a 65 nm CMOS process. IEEE J. Solid-State Circ. 50(11), 2603–2612 (2015)

Author information

Authors and Affiliations

National University of Defense Technology, Changsha, China
Zhengbin Pang, Fangxu Lv & Mingche Lai
Air Force Engineering University, Xi’an, China
Weiping Tang, Kaile Guo, Yuxuan Wu, Tao Liu, Miaomiao Wu & Dechao Lu

Corresponding author

Correspondence to Fangxu Lv.

Editor information

Editors and Affiliations

National University of Defense Technology, Changsha, China
Dezun Dong
Nankai University, Tianjin, China
Xiaoli Gong
National University of Defense Technology, Changsha, China
Cunlu Li
National University of Defense Technology, Changsha, China
Dongsheng Li
National University of Defense Technology, Changsha, China
Junjie Wu

Cite this paper

Pang, Z. et al. (2020). A 32 Gb/s Low Power Little Area Re-timer with PI Based CDR in 65 nm CMOS Technology. In: Dong, D., Gong, X., Li, C., Li, D., Wu, J. (eds) Advanced Computer Architecture. ACA 2020. Communications in Computer and Information Science, vol 1256. Springer, Singapore. https://doi.org/10.1007/978-981-15-8135-9_3

DOI: https://doi.org/10.1007/978-981-15-8135-9_3
Published: 05 September 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-8134-2
Online ISBN: 978-981-15-8135-9
eBook Packages: Computer Science, Computer Science (R0)
