Timing is Everything: How to optimize clock distribution in PCIe applications

Puneet Sareen contributed to the updates and additions to this blog in January 2018.

PCI Express® (PCIe®) is an industry-leading standard input/output (I/O) technology. It is one of the most commonly used I/O interfaces in servers, personal computers, and other applications. Over the years, the standard has evolved to accommodate higher data rates (see Table 1). PCIe Generation 3 introduced a new encoding scheme (128b/130b instead of 8b/10b) that allows doubling the data throughput without doubling the data rate. The PCI-SIG announced recently that the fourth generation of PCIe has a bit rate of 16 gigatransfers per second (GT/s).

Table 1: Data throughput of the PCIe Generations
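
To illustrate how the encoding change affects throughput, here is a minimal sketch that derives the usable per-lane throughput from the transfer rate and the encoding efficiency. The transfer rates and encodings (8b/10b for Gen1/2, 128b/130b for Gen3/4) are the published PCIe values; the function and variable names are hypothetical, and protocol overhead above the line encoding is ignored.

```python
# Per-lane PCIe throughput from the transfer rate and the line-encoding efficiency.
# Values: (transfer rate in GT/s, payload bits, coded bits on the wire).
GENERATIONS = {
    "Gen1": (2.5, 8, 10),      # 8b/10b encoding
    "Gen2": (5.0, 8, 10),      # 8b/10b encoding
    "Gen3": (8.0, 128, 130),   # 128b/130b encoding
    "Gen4": (16.0, 128, 130),  # 128b/130b encoding
}


def lane_throughput_mbytes(rate_gt_s, payload_bits, coded_bits):
    """Usable one-direction throughput of a single lane in MB/s."""
    usable_bits_per_s = rate_gt_s * 1e9 * payload_bits / coded_bits
    return usable_bits_per_s / 8 / 1e6


for name, (rate, payload, coded) in GENERATIONS.items():
    mbytes = lane_throughput_mbytes(rate, payload, coded)
    print(f"{name}: {rate:4.1f} GT/s -> {mbytes:7.1f} MB/s per lane")
```

Gen2 yields 500 MB/s per lane and Gen3 roughly 985 MB/s per lane, so the throughput nearly doubles although the transfer rate only rises from 5 GT/s to 8 GT/s.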

As data rates increase, the requirements for the reference clock become correspondingly tighter. This article focuses on the reference clock needs.

The PCIe Reference Clock (RefClk) specifications are defined for three different architectures: Data Clocked, Separate RefClk, and Common RefClk. Each architecture has specific filter functions. The effective jitter seen at the Rx’s clock-data recovery inputs is a function of the difference in Rx and Tx PLL bandwidth and peaking convolved with the jitter spectrum of the RefClk. It is also dependent on the RefClk architecture.

In the Separate RefClk architecture, the transmitter (Tx) and the receiver (Rx) each receive a separate RefClk. This results in tight jitter requirements, and Spread Spectrum Clocking (SSC) cannot be applied.

In the Data Clocked architecture, a single RefClk is connected to the Tx, whereas the Rx uses the clock signal embedded in the data stream. A Clock Data Recovery (CDR) circuit extracts the clock from the data stream. This architecture has the most relaxed jitter specifications, and SSC may be applied. However, it is a relatively new standard and is therefore not supported by many devices.

The best alternative and most commonly used standard is the Common RefClk architecture. It provides the same RefClk to both Tx and Rx. It supports SSC, reduces electromagnetic interference (EMI), and is easy to implement. The drawback of this architecture is that the RefClk needs to fulfill the <12ns skew requirement. Figure 1 shows an application example of the Common RefClk architecture, a PCI Mezzanine card.

Figure 1: PCI Mezzanine Card

The Common RefClk jitter requirements after applying the standard-compliant filter functions are shown in Table 2.

Table 2: Common RefClk jitter specifications after applying the filter functions

A common PCIe application consists of several building blocks. The heart of the system is the root complex, which represents the root of the I/O system. The root complex connects the CPU with memory and can have multiple PCIe ports. In addition, the system contains switches and PCIe endpoints (e.g., graphics cards). All components of the I/O system need to fulfill the jitter requirements for Tx/Rx and RefClk. If all building blocks are PCIe GenX compatible, all of them need to fulfill the rms RefClk jitter requirement of that generation (Figure 2).

Figure 2: Solution 1 – PCIx card example with PCIe Gen3/4 Common RefClk jitter limits

The system shown in Figure 1 could be realized by using a clock generator. Such an implementation may end up requiring more than one clock generator based on the clock tree solution, since there are other system clocks that need to be generated as well. The system clock generator could generate reference clocks for Gigabit Ethernet devices, SATA controllers, DDR clocks, and others. In Figure 3, the RefClk generator is replaced by a clock buffer. This simplifies the clocking tree and provides a more cost- and space-optimized solution.

Figure 3: Solution 2 - PCIx card example using a RefClk buffer such as LMK00338

Table 3: Solution 1 vs Solution 2 - Space and cost comparison

When a clock buffer is used to distribute the RefClk, the additive jitter of the buffer needs to be considered. Additive jitter is the amount of jitter that the device itself adds to the input signal and can be calculated as:

$J_{\text{additive}} = \sqrt{J_{\text{out}}^2 - J_{\text{in}}^2}$

This assumes that the noise processes are random and that the noise added by the buffer is not correlated with the input noise. The output jitter of a buffer can accordingly be calculated with this formula:

$J_{\text{out}} = \sqrt{J_{\text{in}}^2 + J_{\text{additive}}^2}$

The LMK00338 is an ultra-low additive jitter PCIe clock buffer with typical additive jitter of 30fs rms for PCIe Gen3/4 applications. Table 4 shows the additive jitter performance with different PCIe filter functions applied.

Table 4: Additive jitter performance of LMK00338

A high-performance PCIe Gen3/4 clock generator like the CDCM6208 provides a RefClk with 160.66fs rms jitter (2MHz to 5MHz filter), and the CDCI6214 provides a RefClk with 255fs rms jitter. If the CDCM6208 clock is distributed, the LMK00338 will add about 25fs rms to the RefClk signal. Using the formula above, the output jitter works out to only 162.54fs rms (Table 5). Even in the worst case, the RefClk generator could have up to 499fs rms jitter and the PCIe Gen4 jitter limit would still not be exceeded with the LMK00338.
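
The arithmetic in this paragraph can be reproduced with a short sketch. It assumes the usual root-sum-square combination of uncorrelated random jitter, a rounded additive jitter of 25fs rms for the buffer, and a 500fs rms Gen4 limit; the function and constant names are hypothetical. With the rounded 25fs value the result is about 162.6fs rms, marginally above the 162.54fs quoted in the text, presumably because the article used an unrounded additive jitter figure.

```python
import math

GEN4_LIMIT_FS = 500.0      # assumed PCIe Gen4 Common RefClk limit (0.5 ps rms)
BUFFER_ADDITIVE_FS = 25.0  # rounded additive jitter of the buffer, fs rms


def output_jitter_fs(input_fs, additive_fs):
    """Root-sum-square of uncorrelated input jitter and buffer additive jitter."""
    return math.hypot(input_fs, additive_fs)


def max_input_jitter_fs(limit_fs, additive_fs):
    """Largest source jitter that still keeps the buffered clock within the limit."""
    if additive_fs >= limit_fs:
        raise ValueError("buffer additive jitter alone exceeds the limit")
    return math.sqrt(limit_fs**2 - additive_fs**2)


for name, source_fs in (("CDCM6208", 160.66), ("CDCI6214", 255.0)):
    total = output_jitter_fs(source_fs, BUFFER_ADDITIVE_FS)
    verdict = "within" if total < GEN4_LIMIT_FS else "exceeds"
    print(f"{name}: {source_fs:.2f} fs in -> {total:.2f} fs out ({verdict} Gen4 limit)")

budget = max_input_jitter_fs(GEN4_LIMIT_FS, BUFFER_ADDITIVE_FS)
print(f"Maximum source jitter for Gen4: {budget:.2f} fs rms")
```

Because the contributions add in quadrature, a 25fs buffer consumes less than 1fs of a 500fs budget, which is why the generator may have roughly 499fs rms of jitter.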

Table 5 shows the additive jitter performance of the LMK00338 without applying the PCIe filter functions. Due to the low additive jitter of 77fs rms (integration bandwidth: 12kHz to 20MHz), the buffer is suitable for most high-performance clock applications using HCSL signaling. A smaller 4-output version is available as well.

Table 5: Effect of the clock buffer driven by a low jitter RefClk source

Power supply noise is a common problem in PCIx cards. The noise can come from multiple sources, such as switching power supplies and digital circuits like CPUs, ASICs, or FPGAs. Power supply bypassing helps to filter out some of this noise, but the remainder affects device performance. When this remaining noise reaches the power supply of a clock distribution device, it can cause narrow-band phase modulation as well as amplitude modulation on the clock output.

The LMK00338 exhibits very good and well-behaved power supply ripple rejection (PSRR) characteristics of below -75dBc at 100MHz output frequency across a noise frequency range from 100kHz to 10MHz. This noise immunity will help to simplify the power supply bypassing and is another big advantage of this device.
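
To give a feel for what a spur level of -75dBc means in the time domain, the sketch below converts a single phase-modulation spur into peak-to-peak deterministic jitter. It relies on the small-angle approximation, in which a single-sideband spur at L dBc corresponds to a peak phase deviation of 2·10^(L/20) radians, and it assumes that the spur is purely phase modulation. The function name is hypothetical and the result is an estimate, not a datasheet value.

```python
import math


def spur_to_dj_ps_pkpk(spur_dbc, f_clk_hz):
    """Peak-to-peak deterministic jitter (ps) caused by one PM spur at spur_dbc."""
    # Small-angle approximation: the sideband amplitude is half the peak phase deviation.
    beta_peak_rad = 2.0 * 10.0 ** (spur_dbc / 20.0)
    # Peak-to-peak time error = 2 * beta / (2 * pi * f) = beta / (pi * f).
    return beta_peak_rad / (math.pi * f_clk_hz) * 1e12


dj = spur_to_dj_ps_pkpk(-75.0, 100e6)
print(f"-75 dBc spur at 100 MHz -> about {dj:.2f} ps pk-pk")
```

Under these assumptions, a -75dBc spur on a 100MHz output corresponds to roughly 1.1ps of peak-to-peak deterministic jitter.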

The analysis above shows that an ultra-low additive jitter clock buffer can be used in Common RefClk systems without any concerns as long as the RefClk generator fulfills the jitter requirements. 

Furthermore, the universal input stage of the LMK00338 accepts differential or single-ended signals and translates them to eight HCSL outputs. For PCIe Gen4, the maximum RefClk jitter is assumed to be less than 0.5ps rms. Therefore, the buffered Common RefClk architecture will also be suitable for the more stringent, newer PCIe standards.
