This is the author's version of an article that has been published in this conference. Changes were made to this version by the publisher prior to publication. The final version is available at http://dx.doi.org/10.1109/LAMC50424.2021.9601893

# Transmitter and Receiver Equalizers Optimization for PCI Express Gen6.0 based on PAM4

Roberto J. Ruiz-Urbina<sup>1,2</sup>, Francisco E. Rangel-Patiño<sup>1,2</sup>, José E. Rayas-Sánchez<sup>1</sup>, Edgar A. Vega-Ochoa<sup>2</sup>, and Omar H. Longoria-Gandara<sup>1</sup>

<sup>1</sup> Department of Electronics, Systems, and Informatics, ITESO – The Jesuit University of Guadalajara, Tlaquepaque, Jalisco, 45604 Mexico

<sup>2</sup> Intel Corp. Zapopan, Jalisco, 45019 Mexico (e-mail: francisco.rangel@intel.com)

Abstract —The continuously increasing bandwidth demand from new applications has led to the development of the new PCIe Gen6, reaching data rates of 64 GT/s and adopting the PAM4 modulation scheme. While PAM4 solves the bandwidth constraint in high-speed interconnects, it brings new challenges for the physical channel analysis. Equalization (EQ) plays an important role even with PAM4 signaling. The PCIe specification defines requirements to perform EQ at the transmitter (Tx) and at the receiver (Rx). During the EQ process, one combination of Tx/Rx EQ coefficients must be selected to meet the performance requirements of the system. Testing all possible coefficient combinations is prohibitive. Current industrial practice consists of finding a subset of combinations at post-silicon validation using maps of EQ coefficients. Finding this subset of coefficients is time-consuming, and PAM4 adds new challenges to the task. In this paper, we propose an optimization approach for PCIe Gen6 link EQ. Our proposal is based on a suitable objective function formulated over the channel operating margin (COM), which is a new figure of merit (FOM) adopted by communication standards for signaling speeds beyond 25 Gbps.

*Index Terms* — channel, COM, CTLE, equalization, equalization maps, eye-diagram, FIR, ISI, jitter, NRZ, optimization, PAM4, PCIe, post-silicon validation, receiver, transmitter.

#### I. INTRODUCTION

Nowadays, peripheral component interconnect express (PCIe) virtually operates in all modern computer systems as a motherboard-level interconnect, as a passive backplane interconnect, and as an expansion card interface for add-in boards. Recently, PCIe has been also adopted for automotive advanced driver assistance systems (ADAS) [1].

Being an open industry standard, PCIe has succeeded as a global input/output (I/O) interconnect supported by a robust compliance program to ensure a unified interoperability between devices from different companies [2].

Even though there is a continuous increase in the PCIe bandwidth, new applications require even higher data rates [2]. However, as the data rate increases beyond 32 giga-transfers per second (GT/s), the bandwidth becomes the bottleneck of high-speed wireline transceivers, which are severely affected by the channel and package loss when using the conventional non-return-to-zero (NRZ) signaling method [3]. In order to overcome this problem, the next generation of PCIe will adopt pulse amplitude modulation 4-level (PAM4) signaling.



Fig. 1. Comparison of NRZ vs PAM4 encoding.

The PCIe Gen6 specification defines an adaptive mechanism for equalization (EQ) to determine the optimum value of the transmitter (Tx) and receiver (Rx) EQ coefficients within a fixed time limit. A typical PCIe system may have hundreds of combinations of EQ coefficients, so testing every coefficient combination by exhaustive enumeration becomes prohibitive. In order to reduce the selection time, the current method for Gen3-Gen5 consists of finding a subset of coefficients during post-silicon validation, and then programming it into the system BIOS. The method uses maps of EQ coefficients, which are obtained by measuring the eye diagram characteristics as figure of merit (FOM), and selects the set of coefficients that qualify the FOM as near optimal.

However, data collection to generate the EQ maps is a time-consuming process in post-silicon validation. Given the new challenges imposed by PAM4 and the additional manufacturability considerations, testing every coefficient combination exhaustively to find the best one is impractical, and optimization algorithms are required for choosing the right coefficient values.

In this paper, we propose an efficient optimization methodology to determine the optimal subset of coefficients for the Tx and Rx in a PCIe Gen6 equalization process during post-silicon validation. Since silicon samples with PCIe Gen6 are not yet available, we validate the proposed method using the MATLAB SerDes Toolbox. The procedure implies defining an effective objective function based on a new FOM as required for PAM4, and then applying a direct numerical optimization method using Nelder-Mead.

Fig. 2. EQ map coefficients search space for optimization. From [7].

The rest of the paper is organized as follows. Section II presents an overview of the PCI Express evolution. Section III provides an overview of PAM4 functionality and its challenges. The on-chip EQ per the PCIe Gen6 specification is described in Section IV. The PCIe link equalization based on Tx EQ coefficient matrix maps is presented in Section V. The objective function formulation and the optimization procedure are presented in Section VI. Finally, the results are discussed in Section VII, and conclusions are given in Section VIII.

#### II. PCI EXPRESS EVOLUTION

PCIe has advanced over the years to meet the requirements across different computing markets. PCIe started in 2003 with a data rate of 2.5 GT/s, supporting link widths of  $\times 1$ ,  $\times 2$ ,  $\times 4$ ,  $\times 8$ , and  $\times 16$ . Four years later, PCIe Gen2 was released, doubling the data rate of Gen1 to 5 GT/s. In 2010, PCIe Gen3 emerged, reaching data rates of 8 GT/s. Developing PCIe Gen4 at 16 GT/s took longer because it depended on the availability of cost-effective materials; the channel loss specification for 16 GT/s was increased to 28 dB after the material loss improvements, and the specification was finally released in 2017.

After the release of PCIe Gen4, the needs of new applications increased dramatically, demanding faster data transfer. This led to the release in 2019 of the PCIe Gen5 specification, with data rates of 32 GT/s and a channel specification increased to 36 dB of attenuation at 16 GHz [2]. The continuous bandwidth demand from applications such as artificial intelligence, machine learning, gaming, visual computing, storage, graphics accelerators, high-end networking, coherent interconnects, internet of things (IoT), and memory expanders has led to the development of the new PCIe Gen6 specification to be released in 2021, reaching data rates of 64 GT/s. PCIe Gen6 will adopt the PAM4 modulation scheme.

#### III. PULSE AMPLITUDE MODULATION 4-LEVEL

Fig. 3. Evolution of the normalized coefficients and objective function values during optimization.

For PAM4 encoding, the signal has four voltage levels, each of which encodes two bits, as shown in Fig. 1. PAM4 uses Gray coding, which maps each pair of most significant bit (MSB) and least significant bit (LSB) in a data stream onto one of the four voltage levels. By encoding two bits into one symbol, PAM4 achieves the same data rate using half of the bandwidth as compared to NRZ signaling [4] (see Fig. 1). In this sense, this transmission scheme has a spectral efficiency of 2 bits/symbol/Hz.
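The bit-pair to level mapping described above can be sketched in a few lines of Python. This is only an illustration: the normalized level values (-1, -1/3, +1/3, +1), the MSB-first pairing, and the name `pam4_encode` are assumptions made for the example and are not taken from the specification.

```python
# Gray-coded PAM4 mapping: neighbouring levels differ in exactly one bit.
# The normalized level values below are an illustrative assumption.
GRAY_PAM4 = {
    (0, 0): -1.0,
    (0, 1): -1.0 / 3.0,
    (1, 1): 1.0 / 3.0,
    (1, 0): 1.0,
}


def pam4_encode(bits):
    """Map a flat bit sequence (MSB first in each pair) to PAM4 levels."""
    if len(bits) % 2 != 0:
        raise ValueError("PAM4 needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


bits = [0, 0, 0, 1, 1, 1, 1, 0]
symbols = pam4_encode(bits)  # eight bits become four symbols
print(symbols)
```

Eight bits become four symbols, which is where the halved symbol rate relative to NRZ comes from.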

While PAM4 solves the bandwidth issue in high-speed communication channels, it brings new challenges for the physical channel analysis. PAM4 has four levels and three eye diagrams, as opposed to the single eye diagram of NRZ. PAM4 is also more susceptible to errors from various noise sources because of its reduced voltage (and timing) ranges. This results in a higher bit error rate (BER), several orders of magnitude above the standard 10<sup>-12</sup> BER of the previous PCIe generations. It also introduces new challenges in slicers, transition jitter, and equalizers [5]. In effect, EQ plays a critical role even with PAM4 signaling.
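The reduced voltage range can be quantified with a rough estimate. Assuming the same peak-to-peak swing as NRZ and equally spaced levels (both assumptions made here for illustration), each of the three PAM4 eyes spans one third of the NRZ eye amplitude, which corresponds to an amplitude penalty of

$$20\log_{10}\frac{1}{3} \approx -9.5\ \text{dB}$$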

#### IV. PCI EXPRESS GEN6.0 EQUALIZATION

Tx and Rx equalization schemes, such as Tx de-emphasis and pre-emphasis, Rx continuous time linear equalization (CTLE) and decision feedback equalization (DFE), are widely used in high-speed serial links to open the eye diagram [5], and they continue to be used for PAM4. PCIe Gen6 specification defines the requirements to perform on-chip EQ at the Tx and at the Rx to mitigate undesired effects and minimize the BER.

The Tx equalization coefficients for 64 GT/s are based on a feed-forward equalizer (FFE) implemented as a 4-tap finite impulse response (FIR) filter ($C_{m2}$, $C_{m1}$, $C_0$, and $C_p$). The cursor ($C_0$), pre-cursor ($C_{m1}$, $C_{m2}$), and post-cursor ($C_p$) coefficients refer to whether the FFE filter taps work on an advanced or delayed signal with respect to time. The serial data output is obtained by the superposition of four consecutive received pulses ($v_{nm2}$, $v_{nm1}$, $v_n$, $v_{np}$) that are weighted with the four filter tap coefficients [6]. The filter response can then be adjusted by controlling the tap coefficient values. Therefore, the output signal ($v_{out}$) of the FIR filter is given by

$$v_{\text{out}} = v_{\text{nm2}}C_{\text{m2}} + v_{\text{nm1}}C_{\text{m1}} + v_{\text{n}}C_{0} + v_{\text{np}}C_{\text{p}} \tag{1}$$
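A minimal sketch of (1) applied to a symbol sequence is given next. The indexing convention (pre-cursor taps weight upcoming symbols, the post-cursor tap weights the previous one), the zero padding at the sequence edges, and the function name `tx_ffe` are assumptions made for illustration. The tap values in the usage lines are example numbers only.

```python
def tx_ffe(symbols, c_m2, c_m1, c_0, c_p):
    """Apply the 4-tap FFE of (1) to a list of symbol levels.

    Pre-cursor taps weight the next symbols and the post-cursor tap weights
    the previous one. Symbols outside the sequence are taken as 0
    (an assumption made for this sketch).
    """
    n_sym = len(symbols)

    def at(k):
        return symbols[k] if 0 <= k < n_sym else 0.0

    return [
        c_m2 * at(n + 2) + c_m1 * at(n + 1) + c_0 * at(n) + c_p * at(n - 1)
        for n in range(n_sym)
    ]


# Response to a single unit pulse: the taps appear around the cursor position.
pulse = [0, 0, 0, 1, 0, 0, 0]
out = tx_ffe(pulse, c_m2=1 / 24, c_m1=-1 / 12, c_0=0.75, c_p=-0.125)
print(out)
```

Feeding a single pulse through the filter returns the tap values themselves, which is a convenient check on the sign and position of each coefficient.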

The EQ topology at the Rx can be a combination of a CTLE and a DFE. The CTLE is a simple one-coefficient ($C_{\text{ctle}}$) continuous-time circuit with high-frequency gain boosting, whose transfer function can compensate (equalize) the channel response.

The PCIe specification defines a set of predefined values for the Tx coefficients, referred to as presets, which are then adaptively changed during the channel training. The Tx EQ coefficients are computed at the upstream port by the coefficient adaptation algorithm using the received signal. These coefficients are then communicated to the downstream port by using the PCIe protocol. The Tx at the downstream port applies the received coefficient settings to its Tx EQ circuitry. This process of computing the coefficients, communicating them to the Tx, and checking the signal quality can be repeated multiple times until the required BER is achieved [6], [7].

#### V. TRANSMITTER EQUALIZATION COEFFICIENT MATRIX

With the purpose of having unit-gain for the Tx equalizer, the values of the Tx coefficients are subjected to the following protocol constraints:

$$\begin{aligned} |C_{m2}| + |C_{m1}| + |C_0| + |C_p| = 1 \\ \text{subject to } C_{m2} \ge 0, \ C_{m1} \le 0, \ C_p \le 0 \end{aligned} \tag{2}$$

These constraints are implemented by determining only  $C_{m1}$ and  $C_p$  to fully define  $v_{out}$  from (1), being  $C_{m2} = 1/24$  [6] and  $C_0$  implied by (2). The coefficients must support all eleven values for the presets, and their respective tolerances, as defined by the Tx preset ratios table in the PCIe specification [6].
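As a small worked example of how $C_0$ follows from (2), the sketch below derives the full tap set from a chosen $C_{m1}$ and $C_p$. The helper name `tx_coefficients`, the use of exact fractions, and the requirement that $C_0$ be positive are assumptions made for illustration. The example tap values are not presets from the specification.

```python
from fractions import Fraction

C_M2 = Fraction(1, 24)  # fixed second pre-cursor value quoted in the text


def tx_coefficients(c_m1, c_p, c_m2=C_M2):
    """Return (C_m2, C_m1, C_0, C_p) satisfying the unit-gain constraint (2)."""
    c_m1, c_p = Fraction(c_m1), Fraction(c_p)
    if c_m2 < 0 or c_m1 > 0 or c_p > 0:
        raise ValueError("sign constraints of (2) violated")
    # C_0 is implied by the unit-gain condition; it is assumed positive here.
    c_0 = 1 - abs(c_m2) - abs(c_m1) - abs(c_p)
    if c_0 <= 0:
        raise ValueError("no positive cursor left for these taps")
    return c_m2, c_m1, c_0, c_p


coeffs = tx_coefficients(Fraction(-1, 12), Fraction(-1, 8))
print(coeffs)  # the cursor is 1 - 1/24 - 1/12 - 1/8 = 3/4
```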

When all the PCIe specification constraints are applied, the resulting coefficient space may be mapped onto a triangular matrix, as shown in Fig. 2, where several EQ maps, one per CTLE coefficient ($C_{ctle}$) value, are superimposed. The $C_{m1}$ and $C_p$ coefficients are mapped onto the y-axis and x-axis, respectively. Each matrix cell corresponds to a valid combination of Tx coefficients, and $u(x^*)$ corresponds to a combination of $C_{m1}$, $C_p$, and $C_{ctle}$ that results in a FOM qualified as optimum.

The current post-silicon method for Gen3-Gen5 to find the best subset of coefficients for both Tx and Rx consists of using these EQ maps. The maps are obtained by measuring the eye diagram characteristics as FOM (i.e., eye height, eye width, or eye diagram area) of the received signal for each of the $C_{m1}$, $C_p$, and $C_{ctle}$ combinations across each lane and device pairing, which requires multiple EQ maps. The method consists of finding the set of Tx and Rx coefficients that qualify the FOM as near optimal. At the same time, the responses around the best $C_{m1}$-$C_p$ matrix cell must be at least 80% of the value of that matrix cell, as illustrated in Fig. 2, to avoid selecting a combination with too high a sensitivity. Due to the large number of EQ maps, along with all the new challenges imposed by PAM4, finding the optimal subset of coefficients would be a very challenging task.
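The 80% neighbourhood rule can be sketched as a simple search over one EQ map. The dictionary layout, the function names, the positive FOM values, and the choice to reject cells with a missing neighbour are assumptions made for this example. The toy map holds invented numbers, not measurements.

```python
def robust_cells(fom_map, ratio=0.8):
    """Return cells whose four neighbours all reach `ratio` of the cell's FOM.

    `fom_map` is a dict {(i, j): fom} with positive FOM values; cells outside
    the triangular map are absent. Treating a missing neighbour as a failure
    is an assumption of this sketch.
    """
    good = []
    for (i, j), fom in fom_map.items():
        neighbours = [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
        if all(n in fom_map and fom_map[n] >= ratio * fom for n in neighbours):
            good.append((i, j))
    return good


def best_robust_cell(fom_map, ratio=0.8):
    """Pick the highest-FOM cell among those that pass the neighbourhood rule."""
    candidates = robust_cells(fom_map, ratio)
    return max(candidates, key=lambda c: fom_map[c]) if candidates else None


# Toy map: the raw maximum sits in a corner and is rejected as too sensitive.
toy_map = {
    (0, 0): 10, (0, 1): 50, (0, 2): 10,
    (1, 0): 50, (1, 1): 60, (1, 2): 52,
    (2, 0): 10, (2, 1): 49, (2, 2): 100,
}
print(best_robust_cell(toy_map))
```

In the toy map the cell with the largest FOM is discarded because it lacks well-behaved neighbours, and the centre cell is selected instead.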

#### VI. OBJECTIVE FUNCTION FORMULATION AND OPTIMIZATION

The channel operating margin (COM) is a signal-to-noise ratio, and it is a new FOM that takes into account passive and active channel components. COM has been adopted by several communication standards, and it is gaining attention as a valuable tool for analyzing high-speed digital channels, especially for signaling speeds beyond 25 Gbps. Above this data rate, eye diagram and BER measurements may not be applicable as FOMs because the eye diagram at the receiver can be closed [8]. The COM computation algorithm is a statistical simulation of victim and aggressor unit interval pulse responses, and it is available in MATLAB.

COM is the ratio of a calculated signal amplitude to a calculated noise amplitude [9], defined as

$$COM = 20\log\frac{A_{\text{signal}}}{A_{\text{noise}}} \tag{3}$$

where  $A_{\text{signal}} \in \Re$  is a signal with a data rate of PCIe Gen6 and  $A_{\text{noise}} \in \Re$  is the noise on the signal considering inter-symbol interference (ISI), random and dual-Dirac jitter noise, and crosstalk. We aim at finding the optimal set of coefficients values to maximize COM. Therefore, an initial objective function to be minimized is defined as

$$u(\mathbf{x}) = -20\log \frac{A_{\text{signal}}(\mathbf{x})}{A_{\text{noise}}} \tag{4}$$

$A_{\text{signal}}$ is a function of the coefficient values ($C_{\text{m1}}$, $C_{\text{p}}$, $C_{\text{ctle}}$) contained in vector $\mathbf{x}$. The signal amplitude, $A_{\text{signal}}$, comes from the middle eye, and the combined noise term, $A_{\text{noise}}$, is the vertical eye closure. The optimization problem for COM is then defined as

$$\mathbf{x}^* = \arg \min_{\mathbf{x}} u(\mathbf{x}) \tag{5}$$
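The quantities in (3) and (4) reduce to a decibel ratio, sketched below. A base-10 logarithm is assumed, as is usual for decibel quantities, and the function names `com_db` and `u_objective` are illustrative. In practice the two amplitudes would come from the statistical channel simulation rather than being passed in by hand.

```python
import math


def com_db(a_signal, a_noise):
    """Channel operating margin of (3) in dB, assuming a base-10 logarithm."""
    if a_signal <= 0 or a_noise <= 0:
        raise ValueError("amplitudes must be positive")
    return 20.0 * math.log10(a_signal / a_noise)


def u_objective(a_signal, a_noise):
    """Objective of (4): minimizing u is the same as maximizing COM."""
    return -com_db(a_signal, a_noise)


print(com_db(10.0, 1.0))       # a 10:1 amplitude ratio gives 20 dB
print(u_objective(10.0, 1.0))  # the corresponding objective value is -20
```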

We need to ensure the optimal system response is within a suitable area in the coefficients search space of the EQ map. Here we follow our work in [7] to define the corresponding objective function. The four responses around  $u(x^*)$  must be at least 80% of the value of  $u(x^*)$ , as shown in Fig. 2, where  $u_{i,j}$  are the objective function values per (4) for the *i*-th  $C_{m1}$  and *j*-th  $C_p$  values.

The new optimization problem can be defined through a constrained formulation, such that the optimal set of coefficients maximizes the system response without exceeding the limit of  $0.8u(x^*)$  in the vicinity,

$$\mathbf{x}^* = \arg\min_{\mathbf{x}} u(\mathbf{x}) \tag{6}$$

subject to  $l_{11}(\mathbf{x}) \le 0$ ,  $l_{12}(\mathbf{x}) \le 0$ ,  $l_{21}(\mathbf{x}) \le 0$ ,  $l_{22}(\mathbf{x}) \le 0$ with

$$l(\mathbf{x}) = \begin{bmatrix} u(C_{m1i^{*}+1}, C_{ctle}, C_{pj^{*}}) & u(C_{m1i^{*}-1}, C_{ctle}, C_{pj^{*}}) \\ u(C_{m1i^{*}}, C_{ctle}, C_{pj^{*}+1}) & u(C_{m1i^{*}}, C_{ctle}, C_{pj^{*}-1}) \end{bmatrix} - 0.8u(C_{m1i^{*}}, C_{ctle}, C_{pj^{*}}) \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \tag{7}$$



Fig. 4. Eye diagram results before the optimization process.

where $C_{m1i^{*}}$ and $C_{pj^{*}}$ are the coefficients that maximize the FOM for each of the $C_{ctle}$ values. A more convenient unconstrained formulation can be defined by adding a penalty term, as

$$U(\mathbf{x}) = -20 \log \frac{A_{\text{signal}}(\mathbf{x})}{A_{\text{noise}}} + \left| L(\mathbf{x}) \right|^2 \left[ \frac{|u(\mathbf{x}^{(0)})|}{\left| \max\left\{ l(\mathbf{x}^{(0)}) \right\} \right|^2} \right] \tag{8}$$

where  $L(\mathbf{x})$  is a corner limits penalty function, defined as

$$L(\mathbf{x}) = \max\left\{0, l(\mathbf{x})\right\} \tag{9}$$

and $x^{(0)}$ is the starting point. Then, our unconstrained optimization problem for the system response is

$$\mathbf{x}^* = \arg \min_{\mathbf{x}} U(\mathbf{x}) \tag{10}$$
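One possible reading of (7) to (9) is sketched below. It assumes that the squared magnitude of $L(\mathbf{x})$ is the sum of the squared entries and that the weight in (8) is computed once at the starting point. The function names and the numeric values are hypothetical and serve only to show the mechanics.

```python
def corner_limits(u_center, u_neighbours, ratio=0.8):
    """Entries of l(x) in (7): neighbour values minus ratio times the centre."""
    return [u_n - ratio * u_center for u_n in u_neighbours]


def penalty_weight(u0_center, u0_neighbours, ratio=0.8):
    """Weight of (8), evaluated once at the starting point x0."""
    worst = max(corner_limits(u0_center, u0_neighbours, ratio))
    if worst == 0:
        raise ValueError("weight undefined when max{l(x0)} is zero")
    return abs(u0_center) / worst ** 2


def penalized_objective(u_center, u_neighbours, weight, ratio=0.8):
    """U(x) of (8): u(x) plus the weighted squared size of L(x) = max{0, l(x)}.

    The squared size is taken as the sum of squared entries (an assumption).
    """
    l_vals = corner_limits(u_center, u_neighbours, ratio)
    penalty = sum(max(0.0, v) ** 2 for v in l_vals)
    return u_center + weight * penalty


# Hypothetical objective values (negative COM in dB) at the start and at a trial point.
w = penalty_weight(-5.0, [-3.0, -4.5, -4.2, -4.1])
u_ok = penalized_objective(-10.0, [-9.0, -8.5, -8.2, -9.5], w)
u_bad = penalized_objective(-10.0, [-9.0, -8.5, -7.0, -9.5], w)
print(w, u_ok, u_bad)
```

A trial point whose neighbours all satisfy the 80% rule keeps its original objective value, while a single violating neighbour raises it.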

We aim at finding the optimal set of coefficients values  $x^*$  by solving (10) using a gradient-free computationally inexpensive optimization technique, such as the Nelder-Mead method [10].
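For readers unfamiliar with the method, a compact textbook-style Nelder-Mead search is sketched below in plain Python. It is not the implementation used in this work: the step size, the stopping rule, and the quadratic `toy_objective` standing in for $U(\mathbf{x})$ are all assumptions, since a real evaluation requires a channel simulation.

```python
def nelder_mead(f, x0, step=0.1, max_evals=160, tol=1e-8):
    """Minimal Nelder-Mead search: reflection, expansion, contraction, shrink.

    The evaluation budget is checked once per iteration, so the final count
    can exceed `max_evals` by a few evaluations.
    """
    n = len(x0)
    evals = 0

    def fe(x):
        nonlocal evals
        evals += 1
        return f(x)

    # Initial simplex: the starting point plus one offset point per coordinate.
    points = [list(x0)]
    for i in range(n):
        p = list(x0)
        p[i] += step
        points.append(p)
    simplex = [(fe(p), p) for p in points]

    while evals < max_evals:
        simplex.sort(key=lambda t: t[0])
        f_best, f_worst = simplex[0][0], simplex[-1][0]
        if abs(f_worst - f_best) < tol:
            break
        centroid = [sum(p[k] for _, p in simplex[:-1]) / n for k in range(n)]
        worst = simplex[-1][1]

        def move(t):
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        xr = move(1.0)  # reflection
        fr = fe(xr)
        if fr < f_best:
            xe = move(2.0)  # expansion
            fx = fe(xe)
            simplex[-1] = (fx, xe) if fx < fr else (fr, xr)
        elif fr < simplex[-2][0]:
            simplex[-1] = (fr, xr)
        else:
            xc = move(0.5) if fr < f_worst else move(-0.5)  # contraction
            fc = fe(xc)
            if fc < min(fr, f_worst):
                simplex[-1] = (fc, xc)
            else:  # shrink every point toward the best one
                best = simplex[0][1]
                shrunk = [[b + 0.5 * (pk - b) for b, pk in zip(best, p)]
                          for _, p in simplex[1:]]
                simplex = [simplex[0]] + [(fe(q), q) for q in shrunk]

    simplex.sort(key=lambda t: t[0])
    return simplex[0][1], simplex[0][0], evals


def toy_objective(x):
    """Hypothetical smooth stand-in for U(x) with a known minimum."""
    target = (0.3, -0.2, 0.5)
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


x_opt, f_opt, n_evals = nelder_mead(toy_objective, [0.0, 0.0, 0.0], max_evals=400)
print(x_opt, f_opt, n_evals)
```

Because the method needs only objective values and no gradients, each iteration costs a small number of link evaluations, which is the property that makes it attractive when every evaluation is a simulation or a measurement. Library routines such as SciPy's `scipy.optimize.minimize` with `method="Nelder-Mead"` provide a maintained alternative to a hand-written loop.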

#### VII. RESULTS

Following the optimization process defined in Section VI, we found a set of Tx and Rx coefficients that minimize the objective function in just 160 evaluations, as shown in Fig. 3. The time consumed by the optimization was 33.4 minutes. Fig. 3 also shows the evolution of the Tx and Rx coefficients during the optimization process.

Figures 4 and 5 show the eye diagram results at the receiver, before and after optimization, respectively. The optimized equalization coefficients yield an average eye width and eye height improvement of 27% and 131%, respectively, and a COM improvement of 150%, while also reducing the eye asymmetries. The obtained eye diagram results confirm the effectiveness of the proposed optimization approach.

#### VIII. CONCLUSION

The continuous bandwidth demand from new applications has led to the development of the new PCIe Gen6, reaching data rates of 64 GT/s and adopting the PAM4 modulation scheme. While PAM4 solves the bandwidth constraint in high-speed interconnects, it brings new challenges for the physical channel analysis and performance. Equalization plays an important role even with PAM4 signaling.

Fig. 5. Eye diagram results after the optimization process.

In this paper, we proposed a direct optimization approach for PCIe link EQ based on a suitable unconstrained objective function formulated over COM, which is a new FOM adopted by communication standards for signaling speeds beyond 25 Gbps. The optimized EQ coefficients were tested by measuring the eye diagrams at the receiver, confirming a significant improvement in eye area, eye symmetry, and COM.

#### REFERENCES

- [1] R. Solomon, *Life in the Fast Lane: PCI Express*® *Technology in Automotive*. [Online]. Available: https://pcisig.com/
- [2] D. Das-Sharma, "What's the difference from PCIe 3.0 to PCIe 6.0?," *Electronic Design*, Jul. 2020. [Online]. Available: http://www.electronicdesign.com
- [3] Q. Liao et al., "The design techniques for high-speed PAM4 clock and data recovery," in *IEEE Int. Conf. Integrated Circuits, Technologies and Applications (ICTA 2018)*, Beijing, China, 2018, pp. 142-143.
- [4] Intel Application Note, AN835 PAM4 Signaling. [Online]. Available: https://www.intel.com
- [5] J. He, N. Dikhaminjia, M. Tsiklauri, J. Drewniak, A. Chada, and B. Mutnury, "Equalization enhancement approaches for PAM4 signaling for next generation speeds," in *IEEE 67th Electronic Components and Technology Conf. (ECTC 2017)*, Orlando, FL, 2017, pp. 1874-1879.
- [6] PCI SIG Org. (2020), PCI Express® Base Specification Revision 6.0 Version 0.7 [Online]. Available: https://pcisig.com/specifications.
- [7] F. E. Rangel-Patiño, J. E. Rayas-Sánchez, E. A. Vega-Ochoa, and N. Hakim, "Direct optimization of a PCI Express link equalization in industrial post-silicon validation," in *IEEE Latin American Test Symp. (LATS 2018)*, Sao Paulo, Brazil, Mar. 2018, pp. 1-6.
- [8] F. de Paulis, T. Wang-Lee, R. Mellitz, M. Resso, R. Rabinovich, and O. J. Danzy, "Backplane channel design exploration at 112 Gbps using channel operating margin (COM)," in *IEEE Int. Symp. Electromagnetic Compatibility & Signal/Power Integrity (EMCSI 2020)*, Reno, NV, USA, 2020, pp. 158-163.
- [9] "IEEE Standard for Ethernet Amendment 2: Physical Layer Specifications and Management Parameters for 100 Gb/s Operation Over Backplanes and Copper Cables," *IEEE Std 802.3bj-2014 (Amendment to IEEE Std 802.3-2012 as amended by IEEE Std 802.3bk-2013)*, pp. 1-368, Sept. 2014, doi: 10.1109/IEEESTD.2014.6891095.
- [10] J. C. Lagarias, J. A. Reeds, M. H. Wright, and P. E. Wright, "Convergence properties of the Nelder-Mead simplex method in low dimensions," *SIAM J. Optim.*, vol. 9, no. 1, pp. 112–147, 1998.