We propose an area- and power-efficient four-level pulse amplitude modulation (PAM-4) encoder/decoder for an AC-coupled link system that guarantees DC balance and a limited run length. Configured as 10B6Q, the encoder maps 10 bits of input data to 5 quaternary symbols. Of four candidate codewords, which have inverted symbols at predefined positions, the one that best satisfies the requirements on DC balance and the number of transitions is selected. One quaternary symbol is appended to indicate which candidate codeword was selected. This coding scheme not only guarantees PAM-4 DC balance but also increases the transition density from 75% (PRBS) to 85.6% and limits the maximum run length to six, thus facilitating timing recovery at the receiver. Although an input data width of 10 bits is used in this implementation, the proposed scheme can be extended to wider input data with higher coding efficiency. The proposed 10B6Q encoder is fabricated in a 40-nm CMOS technology and occupies an active area of 0.0009 mm² with a synthesized gate count of 645. It consumes 0.23 mW at an operating clock frequency of 667 MHz.
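The candidate-selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the inversion masks in `MASKS`, the level mapping in `LEVELS`, the definition of inversion as `s -> 3 - s`, and the tie-breaking rule are all hypothetical, since the abstract does not give the predefined positions or the exact selection metric. The sketch therefore does not reproduce the reported transition density or run-length bound.

```python
LEVELS = (-3, -1, 1, 3)  # assumed PAM-4 levels for symbols 0..3

# Hypothetical inversion masks (1 = invert the symbol at that position).
# The real "predefined positions" are not given in the abstract.
MASKS = (
    (0, 0, 0, 0, 0),
    (1, 0, 1, 0, 1),
    (0, 1, 0, 1, 0),
    (1, 1, 1, 1, 1),
)


def bits_to_symbols(word):
    """Split a 10-bit integer into five 2-bit symbols, MSB first."""
    return [(word >> shift) & 0b11 for shift in (8, 6, 4, 2, 0)]


def apply_mask(symbols, mask):
    """Invert (s -> 3 - s) the symbols selected by the mask."""
    return [3 - s if m else s for s, m in zip(symbols, mask)]


def encode(word, running_disparity=0):
    """Return (six quaternary symbols, updated running disparity)."""
    data = bits_to_symbols(word)
    best = None
    for index, mask in enumerate(MASKS):
        # Five data symbols plus one indicator symbol naming the candidate.
        codeword = apply_mask(data, mask) + [index]
        disparity = running_disparity + sum(LEVELS[s] for s in codeword)
        transitions = sum(a != b for a, b in zip(codeword, codeword[1:]))
        # Assumed metric: smallest |disparity| first, then most transitions.
        key = (abs(disparity), -transitions)
        if best is None or key < best[0]:
            best = (key, codeword, disparity)
    return best[1], best[2]


def decode(codeword):
    """Undo the inversion named by the sixth symbol and repack 10 bits."""
    data = apply_mask(codeword[:5], MASKS[codeword[5]])
    word = 0
    for s in data:
        word = (word << 2) | s
    return word
```

Because the inversion is its own inverse, `decode(encode(w)[0]) == w` holds for every 10-bit `w` regardless of which masks are chosen. Six quaternary symbols carry 12 bits of raw capacity, so a 10B6Q code has a coding efficiency of 10/12, or about 83.3%.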
Although an injection-locked oscillator (ILO) can offer excellent jitter performance on average, its intense phase modification at a given injection rate inevitably degrades spur performance unless the injection timing is carefully controlled. This work investigates a behavioral model of the ILO with digital control of a bang-bang phase detector (BBPD) in the discrete-time domain and provides a quantitative analysis of the dynamics of the digital injection-locked clock multiplier (ILCM). Adjusting the frequency error between the free-running oscillator and the injection signal is crucial for better spur performance. However, the timing offset caused by device mismatches prevents the frequency error from being correctly compensated. Therefore, we investigate the effect of the timing offset (or mismatch) between the replica cells and the BBPD and then propose time-division dual calibration (TDDC) to reduce the discrepancies. In addition, three-stage replica cells are chosen to achieve robust phase generation. By removing the residual phase offset using multiple delay cells, the optimum locking point is guaranteed.
This brief presents an 8-GHz Octa-phase Error Corrector (OEC) employing a digital delay-locked loop (DLL) with a coprime phase comparison scheme. To alleviate the timing constraint during the phase comparison, clock phases whose spacing is coprime to 8 are utilized, enabling up to 64-Gb/s link operation. In particular, this brief applies 3T/8-spaced clocks rather than T/8-spaced clocks. In addition, by employing a clock-divided 5-bit selection scheme, a high-speed 8:2 multiplexer (MUX) operates seamlessly without glitches. To minimize mismatch and calibration-induced jitter, a single shared phase comparator and a finite-state machine (FSM) for tracking the minimum total delay are employed. The test chip has been fabricated in a 40-nm CMOS technology with an active area of 0.0814 mm². The core phase calibration loop consumes 10.8 mW at 8 GHz from a 0.9-V supply, achieving a maximum residual phase error of 0.95 ps.
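The coprime spacing works because stepping through eight phases in strides of 3 still reaches every phase, since gcd(3, 8) = 1, while each comparison sees a 3T/8 gap instead of T/8. The short Python sketch below checks only this number-theoretic property and the resulting spacing at 8 GHz. The function names are hypothetical, and the sketch says nothing about how the actual comparator or DLL is built.

```python
from math import gcd


def phase_visit_order(step, num_phases=8):
    """Phase indices reached by repeatedly advancing `step` slots of T/num_phases."""
    return [(k * step) % num_phases for k in range(num_phases)]


def covers_all_phases(step, num_phases=8):
    """True when the stride visits every phase exactly once."""
    return len(set(phase_visit_order(step, num_phases))) == num_phases


def spacing_ps(step, clock_hz, num_phases=8):
    """Time gap in picoseconds between two phases that are `step` slots apart."""
    return step * 1e12 / (clock_hz * num_phases)


print(phase_visit_order(3))    # [0, 3, 6, 1, 4, 7, 2, 5]
print(covers_all_phases(2))    # False, because gcd(2, 8) != 1
print(gcd(3, 8))               # 1
print(spacing_ps(1, 8e9), spacing_ps(3, 8e9))
```

At 8 GHz the clock period T is 125 ps, so adjacent phases are 15.625 ps apart, whereas 3T/8-spaced phases are 46.875 ps apart. This gives the comparison three times as much time between edges.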
This paper presents asymmetric simultaneous bidirectional (SBD) transceivers for the next-generation automotive camera link. To realize SBD operation with PAM-4 signaling, the proposed wide linear range (WLR) hybrid excludes the voltage-dependent nonlinear transconductance (g_m) of active elements. A two-step hybrid strategy that suppresses the PAM-4 forward channel (FC), including the feed-forward equalizer (FFE), is utilized for low power and design simplicity. A Σα hybrid removes only the four primary DC levels, and a 2nd-order g_m-capacitor (g_m-C) low-pass filter (LPF) filters out the residual from the hybrid and the echoes from the channel. An echo canceller (EC) technique is also employed to further reduce the reflections of the PAM-2 back channel (BC). The highly asymmetric SBD transceivers with a 12-Gb/s PAM-4 FC and a 125-Mb/s PAM-2 BC achieve a BER < 10⁻¹² over a 5-m cable (15.9-dB loss). Prototype chips fabricated in a 40-nm technology consume 78.4 mW, exhibiting an FoM of 0.41 pJ/b/dB.
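The reported figure of merit can be reproduced with simple arithmetic, assuming the FoM is defined as total power divided by the forward-channel data rate and then by the channel loss. The abstract does not state the definition, so the function below is a hypothetical reconstruction.

```python
def fom_pj_per_bit_per_db(power_mw, data_rate_gbps, loss_db):
    """Energy per bit normalized to channel loss (assumed FoM definition)."""
    energy_pj_per_bit = power_mw / data_rate_gbps  # mW / (Gb/s) equals pJ/b
    return energy_pj_per_bit / loss_db


# Values quoted in the abstract: 78.4 mW, 12 Gb/s, 15.9 dB.
print(round(fom_pj_per_bit_per_db(78.4, 12.0, 15.9), 2))  # 0.41
```

Adding the 125-Mb/s back channel to the data rate (12.125 Gb/s in total) also rounds to 0.41 pJ/b/dB, so the quoted number does not distinguish between the two definitions.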
In this brief, a clock distribution scheme insensitive to supply voltage drift is proposed; it minimizes the variation of the clock propagation delay caused by supply voltage changes. While the overall clock distribution is composed of a current-mode logic (CML) path and a CMOS path, most of the delay variation occurs in the CMOS path. In the proposed scheme, the delays in the CMOS path, such as those of the CML-to-CMOS converter (C2C) and the inverters, are adjusted to compensate for the supply voltage drift. The bias generator provides self-generated bias voltages in response to the supply voltage drift for delay adjustment in the C2C and inverters. The proposed clock distribution path is fabricated in a 40-nm CMOS process with an active area of 0.004 mm². Measured results show that the proposed scheme reduces the root-mean-square (RMS) jitter from 3.97 ps to 1.62 ps when the 1.1-V supply voltage is modulated by a 10-MHz sinusoid with a 100-mV peak-to-peak swing. The power consumption with a differential 6-GHz clock is 11.02 mW over a clock path distance of 0.4 mm.
Skews between data and strobe signals can occur in HBM transceivers due to process and voltage variations across the base die. Skew compensation is introduced into the deserializers of our quarter-rate single-ended receiver for next-generation unmatched source-synchronous HBM interfaces. Data and strobe signals are realigned energy-efficiently by using a 45° strobe phase, DQS_45. This phase, which is equidistant between the quadrature strobe phases DQS_0 and DQS_90, is generated by a digital phase interpolator in our receiver. The transceiver, including the proposed receiver, was designed and fabricated in a 65-nm CMOS process. The receiver corrects skews to within 7.8 ps at a data rate of 6.4 Gb/s, with an energy cost of 0.83 pJ/bit per pin.
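To put the reported numbers in perspective, the sketch below converts them to unit intervals (UI). It assumes that the quarter-rate strobe runs at one quarter of the data rate (1.6 GHz at 6.4 Gb/s). The abstract does not state the strobe frequency, so that value and the helper names are hypothetical.

```python
def unit_interval_ps(data_rate_gbps):
    """Duration of one bit in picoseconds."""
    return 1000.0 / data_rate_gbps


def phase_delay_ps(phase_deg, clock_ghz):
    """Delay in picoseconds of a clock phase given in degrees."""
    return (phase_deg / 360.0) * 1000.0 / clock_ghz


ui = unit_interval_ps(6.4)           # 156.25 ps per bit
residual_skew_ui = 7.8 / ui          # roughly 0.05 UI
strobe_ghz = 6.4 / 4                 # assumed quarter-rate strobe frequency
dqs45_ps = phase_delay_ps(45, strobe_ghz)
print(ui, residual_skew_ui, dqs45_ps)
```

Under this assumption, DQS_0 and DQS_90 are one UI apart, and DQS_45 falls half a UI after DQS_0, midway between them. The 7.8-ps residual skew is about 5% of the 156.25-ps UI.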
This brief presents a power- and area-efficient forwarded-clock (FC) receiver with a delay-locked loop (DLL)-based self-tracking loop for unmatched memory interfaces. In the proposed FC receiver, the self-tracking loop is composed of two-stage cascaded DLLs to support a burst mode. The proposed scheme compensates for delay drift neither by relying on data (DQ) transitions nor by re-training; instead, the write training of the memory controller fine-tunes the data strobe (DQS) path delay through the DLLs. The proposed FC receiver is fabricated in a 65-nm CMOS technology, and the active area including 4 DQ lanes is 0.0329 mm². After the write training is completed at a supply voltage of 1 V, the measured timing margin remains larger than 0.31 UI when the supply voltage drifts between 0.94 V and 1.06 V. At a data rate of 6.4 Gb/s, the proposed FC receiver achieves an energy efficiency of 0.45 pJ/bit.
As data transfer rates increase, the clock frequencies used for high-speed data paths also increase. Thus, multiphase clocks are typically utilized in DRAMs to relax the timing constraints imposed by the reduced timing budget. However, phase errors between the multiphase clocks, caused by device mismatch, degrade the valid data sampling window. To reduce the phase error, several multiphase correction schemes have been proposed [1]-[4]. The active poly-phase filter-based open-loop scheme exhibits a small RMS jitter contribution, but its residual phase error after correction is large and varies considerably over the operating frequency range [1]. A distributed delay-locked loop (DLL) [2] offers the smallest RMS jitter, but its residual phase error is also non-negligible because of the mismatch between the error detection circuits in each calibration loop. The phase error corrector with a relaxation oscillator-based phase detector is also susceptible to mismatch [3]. The digital DLL-based scheme adopts a shared digital feedback loop to eliminate the effect of mismatch [4]. However, it shows a larger RMS jitter contribution than the distributed DLL because of quantization noise and the increased clock path delay. Since the delay of the in-phase clock is always fixed at the mid-point, the overall set of codes of the digitally controlled delay lines (DCDLs) may not be optimal in terms of jitter. Because the jitter and total delay of the clock paths are increased more than necessary, the data eye is degraded. In this paper, an improved quadrature error corrector (QEC), whose calibration starts from the minimum delay code over all DCDLs, is proposed along with an asynchronous and seamless calibration on-off scheme that reduces the power consumption in the operating state after calibration.
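The argument about pinning the in-phase clock at the mid-point can be illustrated with a small numeric sketch. It assumes that only the differences between DCDL codes determine the phase correction, so the whole set of codes can be shifted down until the smallest one is zero. The correction values, the mid-point code of 16, and the function names are hypothetical. The actual QEC finds its codes with a feedback loop, which is not modeled here.

```python
def codes_with_fixed_reference(relative_codes, mid_code):
    """Pin the first (in-phase) DCDL at mid_code; the others follow."""
    offset = mid_code - relative_codes[0]
    return [c + offset for c in relative_codes]


def codes_from_minimum(relative_codes):
    """Shift all codes so the smallest is zero, keeping their differences."""
    lowest = min(relative_codes)
    return [c - lowest for c in relative_codes]


# Hypothetical corrections in LSBs for the I, Q, IB, and QB paths.
relative = [0, 3, -2, 1]
fixed = codes_with_fixed_reference(relative, mid_code=16)  # [16, 19, 14, 17]
minimal = codes_from_minimum(relative)                     # [2, 5, 0, 3]
print(sum(fixed), sum(minimal))                            # 66 10
```

Both code sets apply the same relative correction, but the minimum-based set inserts far less total delay into the clock paths, which is the source of the jitter benefit claimed above.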
To meet the demand for high memory bandwidth, high-bandwidth memory (HBM) uses silicon interposer technology to increase the number of I/O pins. Interfaces with a silicon interposer provide a higher throughput (Gb/s/µm) than other packaging technologies because of the high channel density. To increase the throughput further, either the per-pin data rate or the channel density should be increased. Since increasing the per-pin data rate requires complex and power-hungry circuitry, increasing the channel density is an effective way to achieve high throughput. However, the main problem with reducing the channel pitch is the crosstalk (XT) between adjacent lanes [1]. If the channels are stacked vertically for high channel density, the vertically adjacent channels become additional XT sources. There have been many reports on XT cancellation (XTC) between printed circuit board (PCB) traces, but only a few have addressed XTC between silicon interposer channels. An XTC scheme for an on-chip interconnect, which is similar to the silicon interposer channel, was proposed in [2], but it works only for a capacitively driven interconnect. A decision feedback-based XT canceller presented in [3] can cancel multiple XT sources, but it consumes considerable power because of the large number of feedback taps. This paper presents a high-throughput transceiver for HBM with 3D-staggered channels in the silicon interposer. The proposed XTC scheme, combined with a feed-forward equalizer (FFE), efficiently compensates for XT from the vertically and horizontally adjacent channels, allowing for high channel density. The transceiver achieves the throughput of (Gb/s/µm) by reducing the channel pitch to 0.5 µm.
A clock generator using an injection-locked oscillator (ILO) offers remarkable jitter performance with a low overhead of additional circuits such as injection switches. Because the injection clock cleans the edge of the oscillator in every injection period, jitter accumulation is avoided. However, the ILO alone causes a severe reference spur owing to the mismatch between the desired oscillation frequency set by the injected reference and the free-running frequency, which can change over process, supply voltage, and temperature (PVT) variations. For this reason, the free-running oscillation frequency, F_OSC, must be tuned continuously to nullify the frequency error, F_ERR. Here, F_ERR is the difference between F_OSC and the multiplication ratio, N, times the reference frequency, F_REF. To minimize such performance degradation, techniques such as pulse gating and replica-delay cells have been presented. While these techniques minimize F_ERR, the path delay mismatch between the injection and the phase detector remains unsolved, limiting the spur reduction capability. Thus, a precise calibration that equalizes the delay mismatch is required to achieve low spur performance. This paper proposes an injection-locked all-digital phase-locked loop (IL-ADPLL) with a time-division dual calibration (TDDC) scheme for reducing the reference spur with robust performance against PVT variations.
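The role of the path delay mismatch can be seen in a first-order behavioral sketch. It assumes that a bang-bang detector observes only the sign of the timing drift accumulated over one reference period, F_ERR / (F_REF · F_OSC), and that a fixed detector offset is subtracted from that observation. The model, the sign convention, the parameter values, and the function names are all assumptions made for illustration. This is not the TDDC scheme itself.

```python
def edge_drift_s(f_osc, f_ref, n):
    """Timing drift over one reference period (positive: oscillator too fast)."""
    f_err = f_osc - n * f_ref
    return f_err / (f_ref * f_osc)


def bang_bang_tune(f_osc, f_ref, n, step_hz, cycles, detector_offset_s=0.0):
    """Step F_OSC up or down once per reference cycle from the drift sign."""
    for _ in range(cycles):
        observed = edge_drift_s(f_osc, f_ref, n) - detector_offset_s
        if observed > 0:
            f_osc -= step_hz
        elif observed < 0:
            f_osc += step_hz
    return f_osc


F_REF, N = 100e6, 24  # hypothetical values: the target frequency is 2.4 GHz

ideal = bang_bang_tune(2.41e9, F_REF, N, step_hz=1e6, cycles=50)
skewed = bang_bang_tune(2.4e9, F_REF, N, step_hz=10e3, cycles=200,
                        detector_offset_s=1e-12)
print(ideal - N * F_REF, skewed - N * F_REF)
```

With an offset-free detector, the loop settles at N · F_REF. With a 1-ps detector offset, it instead dithers around a residual F_ERR of roughly offset × F_REF × F_OSC, about 240 kHz for these numbers. A residual error of this kind is what leaves a reference spur unless the mismatch is calibrated out.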