AM5708: PCIe x2 Gen2 link training failed

Part Number: AM5708
Other Parts Discussed in Thread: CDCM61002, TEST, AM5728, SYSCONFIG

Hi TI,

  I am using the AM5708 to communicate with a Xilinx A7 device over a PCIe x2, Gen2 link. In use, however, some individual AM5708 devices show PCIe link instability, and I would appreciate some troubleshooting suggestions.

  Our hardware design and the troubleshooting done so far are described below.

  (1) Basic hardware framework: the PCIe pins of the AM5708 are connected directly to the Xilinx A7 GTX pins through 100 nF series capacitors. A CDCM61002 clock chip provides a 100 MHz clock to both the AM5708 and the Xilinx A7 as the PCIe reference clock.

(2) The main PCIe failure symptoms are:
expected x2, Gen2; actual x1, Gen2
expected x2, Gen2; actual x2, Gen1
expected x2, Gen2; the AM5708 cannot complete link training at all

Of our 5 boards, 3 show these failures, with a failure probability of about 1/3 per test; the other two boards test normally.
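The first two symptoms can be told apart by decoding the PCIe Link Status register of the root port. Below is a minimal sketch, assuming the 16-bit Link Status value has already been read from the PCIe capability structure (how it is read on the AM5708 is not shown here). The field positions are the standard ones from the PCIe specification (Current Link Speed in bits 3:0, Negotiated Link Width in bits 9:4); the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Negotiated link parameters decoded from the PCIe Link Status register. */
typedef struct {
    unsigned gen;    /* Current Link Speed field: 1 = Gen1, 2 = Gen2 */
    unsigned lanes;  /* Negotiated Link Width field: 1 = x1, 2 = x2  */
} link_status_t;

/* Decode a raw 16-bit Link Status value (hypothetical helper). */
static link_status_t decode_link_status(uint16_t lnksta)
{
    link_status_t s;
    s.gen   = lnksta & 0xFu;          /* bits 3:0 */
    s.lanes = (lnksta >> 4) & 0x3Fu;  /* bits 9:4 */
    return s;
}

/* True when the negotiated link matches the expected width and speed. */
static bool link_matches(link_status_t s, unsigned want_lanes, unsigned want_gen)
{
    return s.lanes == want_lanes && s.gen == want_gen;
}
```

For example, a raw value of 0x0012 decodes to x1, Gen2 (the first symptom) and 0x0021 decodes to x2, Gen1 (the second symptom).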

(3) Troubleshooting done so far:

① With the target link set to x1, Gen1, all boards pass the test

② Changing the AM5708 PCIe clock source to ACSPCIE or ADPLL makes no difference; the failure probability stays the same

③ PCIe loop adds 100R terminal matching resistance, no improvement

④ Swapping the PCIe CLK capacitors between a normal board and an abnormal board: no improvement

⑤ The board power supply ripple is about 18.4 mV, below the 50 mV maximum Vpp given in the manual

⑥ With the PCIe link in hardware loopback, an eye diagram test using the Xilinx A7 shows a good eye, so the signal routing quality is not the problem

⑦ Swapping the AM5708 CPUs between a normal board and an abnormal board:
The CPU from the normal board, mounted on the abnormal board's PCB, passed 150 tests; the CPU from the abnormal board, mounted on the normal board's PCB, failed after 5 tests.
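To gauge how strongly the swap result in ⑦ points at the CPU, one can ask how likely 150 passes in a row would be if the failure rate had stayed with the PCB. This is only a rough sketch, assuming the roughly 1/3 per-test failure probability reported above and independent tests; the function name is hypothetical.

```c
/* Probability that n independent tests all pass when each one fails
 * with probability p_fail. Plain repeated multiplication, no libm needed. */
static double prob_all_pass(double p_fail, unsigned n)
{
    double p = 1.0;
    for (unsigned i = 0; i < n; i++) {
        p *= (1.0 - p_fail);
    }
    return p;
}
```

Under those assumptions, prob_all_pass(1.0 / 3.0, 150) is on the order of 1e-27, so 150 clean runs on the "abnormal" PCB are very unlikely to be chance.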

In summary, the PCIe link training problem appears to follow the individual AM5708 CPU, and the failure probability is relatively high.

We hope to get your troubleshooting suggestions. Thanks.

  • Hi, is there any update here?

  • Is there any expert who can answer this question? Thank you very much!

  • Hi Gary,

        Thanks for your reply. I have already tried resetting the PCIe link.
        Sadly, this doesn't solve our problem. Do you have any other suggestions?

  • Hi,

        We have run a few more tests recently; please see whether they suggest anything else.

        ① We have another board built as AM5728 + Xilinx A7; it achieves stable PCIe x2, Gen2 communication under the same test conditions

        ② The problem seems to be tied to the lane count. We compared several modes: x2 Gen2 (×), x2 Gen1 (×), x1 Gen2 (√), x1 Gen1 (√)

        ③ With PCIe split into two x1 links, lane 0 and lane 1 each link normally with the FPGA as a standalone x1 link, so we believe the PCB signal quality meets the requirements

        ④ We modified the code to retrain the link when PCIe link training fails. After retraining, the link is sometimes recognized normally and sometimes not.

        ⑤ Because we use the DSP to initialize PCIe as RC, we started the DSP program from U-Boot to rule out the influence of Linux settings. The link still fails to be recognized with some probability.
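The retrain-on-failure approach in ④ can be sketched as a bounded retry loop that also records which attempt succeeded, which helps when comparing boards. This is an illustrative sketch only: the `train_once_fn` hook, the result struct, and the scripted fake below are hypothetical stand-ins, not part of the TI PCIe driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Outcome of one training attempt, as reported by a hypothetical hook. */
typedef struct {
    bool     link_up;
    unsigned lanes;  /* negotiated width */
    unsigned gen;    /* negotiated speed */
} link_result_t;

/* Hypothetical hook: retrains the link once and reports the outcome. */
typedef link_result_t (*train_once_fn)(void *ctx);

/* Returns the 1-based attempt that reached the expected width and speed,
 * or 0 if none of the max_attempts attempts did. */
static unsigned train_with_retries(train_once_fn train_once, void *ctx,
                                   unsigned want_lanes, unsigned want_gen,
                                   unsigned max_attempts)
{
    for (unsigned attempt = 1; attempt <= max_attempts; attempt++) {
        link_result_t r = train_once(ctx);
        if (r.link_up && r.lanes == want_lanes && r.gen == want_gen) {
            return attempt;
        }
    }
    return 0;
}

/* Stand-in for hardware: replays a scripted list of outcomes. */
typedef struct {
    const link_result_t *script;
    size_t               len;
    size_t               next;
} fake_link_t;

static link_result_t fake_train_once(void *ctx)
{
    fake_link_t *f = (fake_link_t *)ctx;
    link_result_t down = { false, 0, 0 };
    return (f->next < f->len) ? f->script[f->next++] : down;
}
```

Logging the returned attempt number per power cycle gives a per-board distribution rather than a single pass/fail result.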

  • Hi, is there any update here?

  • Yu, thanks for the thorough tests and clear summary. It seems the instability issue happens when x2 mode is used. I will check with our hardware team to see if there are registers to improve the marginal channel condition. 

    Regards

    Jian

  • Can you please do a step-by-step review of the following PHY programming requirements located in the "PCIePHY Subsystem Low-Level Programming Model" section of the TRM version SPRUHZ7J:

    • PCIePHY Subsystem Low-Level Programming Sequence
    • Preferred PCIe_PHY_RX SCP Register Settings
    • Preferred USB3_PHY_RX SCP Register Settings

    Please provide register dumps of all registers mentioned in the above tables for review.

    Couple of notes:

    • Please ensure that the writes to CM_CLKMODE_APLL_PCIE[7].REFSEL and CM_CLKMODE_APLL_PCIE[1:0].MODE_SELECT are not being combined into a single write to the CM_CLKMODE_APLL_PCIE register. The write to REFSEL must occur prior to the write to MODE_SELECT, or else the APLL_PCIE will always use the DPLL_PCIE as its reference clock.
    • The "Preferred USB3_PHY_RX SCP Register Settings" are required to configure the RX of the second PCIe lane. The registers are named "USB3" due to the fact that both PCIe and USB3 are muxed onto the same lane, with USB3 being the default mux option.
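The ordering requirement in the first note can be sketched as two distinct read-modify-write accesses through a volatile pointer, so the compiler cannot merge them. This is a minimal sketch, not the TI driver code: the bit positions (REFSEL at bit 7, MODE_SELECT at bits 1:0) come from the note above, while the macro names, the function name, and the readback between the writes are assumptions added for illustration.

```c
#include <stdint.h>

/* Field layout as described in the note above; macro names are hypothetical. */
#define APLL_REFSEL_SHIFT       7u
#define APLL_REFSEL_MASK        (0x1u << APLL_REFSEL_SHIFT)
#define APLL_MODE_SELECT_SHIFT  0u
#define APLL_MODE_SELECT_MASK   (0x3u << APLL_MODE_SELECT_SHIFT)

/* Program REFSEL first, then MODE_SELECT, as two separate register writes.
 * 'clkmode' points at CM_CLKMODE_APLL_PCIE (or at a test variable). */
static void apll_pcie_set_ref_then_mode(volatile uint32_t *clkmode,
                                        uint32_t refsel, uint32_t mode)
{
    uint32_t v;

    /* Write 1: change REFSEL only; MODE_SELECT keeps its current value. */
    v = *clkmode;
    v = (v & ~APLL_REFSEL_MASK) |
        ((refsel << APLL_REFSEL_SHIFT) & APLL_REFSEL_MASK);
    *clkmode = v;

    /* Read the register back before the second write. This is a common
     * precaution and an assumption here, not a documented requirement. */
    v = *clkmode;

    /* Write 2: change MODE_SELECT only. */
    v = (v & ~APLL_MODE_SELECT_MASK) |
        ((mode << APLL_MODE_SELECT_SHIFT) & APLL_MODE_SELECT_MASK);
    *clkmode = v;
}
```

Starting from 0x102 with refsel = 1 and mode = 1, the register passes through 0x182 and ends at 0x181.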
  • Hi

      Thanks for your reply.

      We noticed the note in the AM570x errata and set the PHY RX values in our program to match the manual. Unfortunately, this brings no improvement. Could you help confirm whether the settings are wrong?

    PCIEPHYRX_ANA_PROGRAMMABILITY_REG1 0x4A09400C 0x08028000

    PCIEPHYRX_DIGITAL_MODES_REG1 0x4A094028 0x00E33000

    PCIEPHYRX_DLL_REG1 0x4A094024 0xC0A41915

    PCIEPHYRX_EQUALIZER_REG1 0x4A094038 0x0000F880

    USB3PHYRX_ANA_PROGRAMMABILITY_REG1 0x4A08440C 0x08028080

    USB3PHYRX_DIGITAL_MODES_REG1 0x4A084428 0x00E33000

    USB3PHYRX_TRIM_REG4 0x4A08441C 0x80000000

    USB3PHYRX_DLL_REG1 0x4A084424 0xC0A41915

    USB3PHYRX_EQUALIZER_REG1 0x4A084438 0x0000F880

      For the PCIe initialization process, we based our development on the PCIE_idkAM571x_wSoCFile_C66BiosExampleProject example. The program writes the REFSEL and MODE_SELECT fields of the CM_CLKMODE_APLL_PCIE register in separate writes, so the issue you mentioned does not seem to apply.

    /*Locking APLL to 2.5GHz with 100MHz input*/
    regVal = HW_RD_REG32(SOC_CKGEN_CM_CORE_BASE + CM_CLKMODE_APLL_PCIE);

    HW_SET_FIELD(regVal, CM_CLKMODE_APLL_PCIE_CLKDIV_BYPASS,
                 CM_CLKMODE_APLL_PCIE_CLKDIV_BYPASS_PCIEDIVBY2_BYPASS_1);

    HW_SET_FIELD(regVal, CM_CLKMODE_APLL_PCIE_REFSEL,
                 CM_CLKMODE_APLL_PCIE_REFSEL_CLKREF_ACSPCIE);

    /* First write: CLKDIV_BYPASS and REFSEL only */
    HW_WR_REG32(SOC_CKGEN_CM_CORE_BASE + CM_CLKMODE_APLL_PCIE, regVal);

    /* Second, separate write: MODE_SELECT */
    HW_WR_FIELD32(SOC_CKGEN_CM_CORE_BASE + CM_CLKMODE_APLL_PCIE,
                  CM_CLKMODE_APLL_PCIE_MODE_SELECT,
                  CM_CLKMODE_APLL_PCIE_MODE_SELECT_APLL_FORCE_LOCK_MODE);

    /*Wait for APLL lock*/
    /* Mask the status field, then shift right to extract ST_APLL_CLK */
    while (((HW_RD_REG32(SOC_CKGEN_CM_CORE_BASE + CM_IDLEST_APLL_PCIE) &
             CM_IDLEST_APLL_PCIE_ST_APLL_CLK_MASK) >>
            CM_IDLEST_APLL_PCIE_ST_APLL_CLK_SHIFT) !=
           CM_IDLEST_APLL_PCIE_ST_APLL_CLK_APLL_LOCKED);
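The lock wait above spins forever if the PLL never locks. For debugging, a bounded poll that reports a timeout can be easier to work with. The following is a minimal sketch under stated assumptions: the register is read through a caller-supplied callback (hypothetical `reg_read_fn`), the poll limit is arbitrary, and the fake reader only stands in for hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register read hook; on target this would wrap the
 * read of CM_IDLEST_APLL_PCIE. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/* Poll until all bits in 'mask' are set, or give up after max_polls reads.
 * Returns true on success, false on timeout. */
static bool poll_until_set(reg_read_fn read_reg, void *ctx,
                           uint32_t mask, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if ((read_reg(ctx) & mask) == mask) {
            return true;
        }
    }
    return false;
}

/* Stand-in for hardware: reports "locked" (bit 0 set) only after a
 * configurable number of reads. */
typedef struct {
    uint32_t reads_until_lock;
} fake_pll_t;

static uint32_t fake_idlest_read(void *ctx)
{
    fake_pll_t *pll = (fake_pll_t *)ctx;
    if (pll->reads_until_lock == 0u) {
        return 0x1u;
    }
    pll->reads_until_lock--;
    return 0x0u;
}
```

On a timeout the caller can log the register value and continue, instead of hanging in the wait loop.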

  • Hi,

      As a supplement, here are the values of the PCIe PHY configuration registers mentioned above. Nothing looks abnormal except CTRL_CORE_PHY_POWER_PCIESS2. Since we are currently using x2 mode, which should be controlled by PCIe SS1, we believe the value of the PCIESS2 power register has no effect.

    CM_PCIE_CLKSTCTRL 0x4A0093A0 0x00003F02
    CM_PCIE_PCIESS1_CLKCTRL 0x4A0093B0 0x00000702
    CM_PCIE_PCIESS2_CLKCTRL 0x4A0093B8 0x00040702
    CM_L3INIT_OCP2SCP3_CLKCTRL 0x4A0093E8 0x00000001
    OCP2SCP_SYSCONFIG 0x4A080010 0x00000011
    OCP2SCP_SYSSTATUS 0x4A080014 0x00000001
    OCP2SCP_TIMING 0x4A080018 0x0000008F
    CM_CLKSEL_DPLL_PCIE_REF 0x4A00820C 0x0602EE09
    CM_DIV_M2_DPLL_PCIE_REF 0x4A008210 0x0000040F
    CM_CLKMODE_DPLL_PCIE_REF 0x4A008200 0x00000007
    CM_IDLEST_DPLL_PCIE_REF 0x4A008204 0x0000001F
    CTRL_CORE_SMA_SW_6 0x4A003C14 0x00020000
    CM_CLKMODE_APLL_PCIE 0x4A00821C 0x00000181
    CM_IDLEST_APLL_PCIE 0x4A008220 0x00000001
    CTRL_CORE_PCIE_CONTROL 0x4A003C3C 0x00000005
    USB3PHYTX_TEST_CONFIG_REG 0x4A08482C 0x80000000
    USB3PHYRX_ANA_PROGRAMMABILITY_REG1 0x4A08440C 0x08028080
    CTRL_CORE_PHY_POWER_PCIESS1 0x4A003C40 0x0680C000
    CTRL_CORE_PHY_POWER_PCIESS2 0x4A003C44 0x00000000 (does not match the datasheet value)
    CTRL_CORE_PHY_POWER_USB 0x4A002370 0x0684C000
    CTRL_CORE_PCIE_PCS 0x4A003C34 0x00960000

  • Thanks for the details. It looks like you are following all the correct steps. As a sanity check, can you disable or remove the CDCM clock generator and confirm that the APLL_PCIE does not lock? It should get stuck in the /*Wait for APLL lock*/ while loop.

  • OK, I'll test and reply.

    By the way, does this test need to be verified over multiple power-ons, or is once enough?

  • Hi,

       I tried your suggestion and something strange happened.

    ① I added some log prints to make the sequence easier to follow

    Log_print0(Diags_INFO, "Waiting for APLL lock ...\n");
    /*Wait for APLL lock*/
    while (((HW_RD_REG32(SOC_CKGEN_CM_CORE_BASE + CM_IDLEST_APLL_PCIE) &
             CM_IDLEST_APLL_PCIE_ST_APLL_CLK_MASK) >>
            CM_IDLEST_APLL_PCIE_ST_APLL_CLK_SHIFT) !=
           CM_IDLEST_APLL_PCIE_ST_APLL_CLK_APLL_LOCKED);
    Log_print0(Diags_INFO, "APLL lock is complete!\n");

    ② Remove LJCB_CLKP and LJCB_CLKN pins input clock

    ③ Ran the test case; it actually got past the APLL lock wait

    [ 0.000] Watchdog.disableWatchdog set to true. Set to false in config file to enable Watchdog.
    [ 0.000] 23 Resource entries at 0x95000000
    [ 0.000] [t=0x00062ca1] xdc.runtime.Main: --> main:
    [ 0.000] [t=0x0009421b] xdc.runtime.Main: <-- main:
    [ 0.000] registering rpmsg-proto:rpmsg-proto service on 61 with HOST
    [ 0.000] [t=0x010d849a] xdc.runtime.Main: NameMap_sendMessage: HOST 53, port=61
    [ 0.000] Watchdog.disableWatchdog set to true. Set to false in config file to enable Watchdog.
    [ 0.000] [t=0x010f8444] xdc.runtime.Main: --> smain:
    [ 0.000] [t=0x0110ea90] Server: Server_create: server is ready
    [ 0.000] [t=0x0111681d] Server: <-- Server_create: 0
    [ 0.000] [t=0x0111c545] Server: --> Server_exec:
    [ 0.000] [t=0x01122044] xdc.runtime.Main: Pcie_open handle (0x95160dec)
    [ 0.000]
    [ 2.000] [t=0x56e4f629] xdc.runtime.Main: Waiting for APLL lock ...
    [ 2.000]
    [ 2.000] [t=0x56e5bf05] xdc.runtime.Main: APLL lock is complete!
    [ 2.000]
    [ 2.100] [t=0x5b308c0a] xdc.runtime.Main: Successfully configured Outbound Translation!

    ④ Checked some key registers

    CM_CLKMODE_APLL_PCIE: 0x4A00821C -- 0x00000181
    CTRL_CORE_SMA_SW_6: 0x4A003C14 -- 0x00020000
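The CM_CLKMODE_APLL_PCIE value can be broken into the two fields discussed in this thread with a small helper. This is a sketch only: the field positions (REFSEL at bit 7, MODE_SELECT at bits 1:0) are taken from the earlier reply, the function names are hypothetical, and the meaning of the remaining bits is not decoded here.

```c
#include <stdint.h>

/* Extract bits msb..lsb (inclusive) from a 32-bit register value. */
static uint32_t get_field(uint32_t reg, unsigned msb, unsigned lsb)
{
    unsigned width = msb - lsb + 1u;
    uint32_t mask = (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (reg >> lsb) & mask;
}

/* REFSEL is bit 7 of CM_CLKMODE_APLL_PCIE. */
static uint32_t apll_refsel(uint32_t clkmode)
{
    return get_field(clkmode, 7u, 7u);
}

/* MODE_SELECT is bits 1:0 of CM_CLKMODE_APLL_PCIE. */
static uint32_t apll_mode_select(uint32_t clkmode)
{
    return get_field(clkmode, 1u, 0u);
}
```

For the value 0x00000181 read above, this gives REFSEL = 1 and MODE_SELECT = 1.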

    I have checked all the information about ACSPCIE in the manual and don't see anything that needs to be configured besides these two registers. Can you help confirm this?

    Thanks.

  • Please reconfirm that the writes to CM_CLKMODE_APLL_PCIE[7].REFSEL and CM_CLKMODE_APLL_PCIE[1:0].MODE_SELECT are not being combined by the compiler into a single write to the CM_CLKMODE_APLL_PCIE register. The write to REFSEL must occur prior to the write to MODE_SELECT, or else the APLL_PCIE will always use the DPLL_PCIE as its reference clock.

  • I turned off the project's compiler optimization and moved the REFSEL and MODE_SELECT configuration into two separate functions. The register prints show that the two writes are separate, but the PLL lock wait still passes without blocking

    [ 0.000] [t=0x03136463] xdc.runtime.Main: Pcie_open handle (0x95160dec)
    [ 0.000]
    [ 2.000] [t=0x58e543a0] xdc.runtime.Main: log-1 REFSEL CM_CLKMODE_APLL_PCIE: 0x102
    [ 2.000]
    [ 2.000] [t=0x58e633fa] xdc.runtime.Main: log-2 REFSEL CM_CLKMODE_APLL_PCIE: 0x102
    [ 2.000]
    [ 2.000] [t=0x58e6fa48] xdc.runtime.Main: log-3 for REFSEL CM_CLKMODE_APLL_PCIE: 0x182
    [ 2.000]
    [ 3.000] [t=0x83d08d28] xdc.runtime.Main: log-4 CM_CLKMODE_APLL_PCIE: 0x182
    [ 3.000]
    [ 4.000] [t=0xaeb9ee07] xdc.runtime.Main: log-5 CM_CLKMODE_APLL_PCIE: 0x181
    [ 4.000]
    [ 4.000] [t=0xaebabc6b] xdc.runtime.Main: Waiting for APLL lock ...
    [ 4.000]
    [ 4.000] [t=0xaebb612c] xdc.runtime.Main: APLL lock is complete!
    [ 4.000]
    [ 4.000] [t=0xaebbffa8] xdc.runtime.Main: log-6 CM_CLKMODE_APLL_PCIE: 0x181
    [ 4.000]
    [ 4.100] [t=0xb3067943] xdc.runtime.Main: Successfully configured Outbound Translation!
    [ 4.100]
    [ 4.100] [t=0xb3076f7f] xdc.runtime.Main: Starting link training...
    [ 4.100]
    [ 4.104] [t=0xb330aa42] xdc.runtime.Main: Link is up.
    [ 4.104]
    [ 4.104] [t=0xb331831a] xdc.runtime.Main: Expect 2 lanes, found 2 lanes (PASS)
    [ 4.104]
    [ 4.104] [t=0xb3324aee] xdc.runtime.Main: Expect gen 2 speed, found gen 2 speed (PASS)
    [ 4.104]
    [ 4.104] [t=0xb333e804] Server: --> Server_exec: PCIe test Pass!
    [ 4.104] [t=0xb3347de0] Server: <-- Server_exec: 0
    [ 4.104] [t=0xb334ef1a] Server: --> Server_delete:
    [ 4.104] [t=0xb335e046] Server: <-- Server_delete: 0
    [ 4.104] [t=0xb33681a1] xdc.runtime.Main: <-- smain: 0

  • Thank you for confirming that the writes are not happening simultaneously. Can you please check if the APLL_PCIE lock bit is already set prior to setting the MODE_SELECT bit. I want to confirm that the APLL_PCIE has not been previously locked by some other code beforehand. If the lock bit is already set, please analyze the system to determine if the PLL is accidentally being configured twice.

  • I checked CM_IDLEST_APLL_PCIE[0] ST_APLL_CLK; this bit shows unlocked before REFSEL and MODE_SELECT are written

    [ 0.000] [t=0x00103dd0] xdc.runtime.Main: Pcie_open handle (0x95160dec)
    [ 0.000]
    [ 2.000] [t=0x55e21e8b] xdc.runtime.Main: log-1 CM_IDLEST_APLL_PCIE: 0x0
    [ 2.000]
    [ 2.000] [t=0x55e2f2f9] xdc.runtime.Main: log-1 REFSEL CM_CLKMODE_APLL_PCIE: 0x102
    [ 2.000]
    [ 2.000] [t=0x55e3c079] xdc.runtime.Main: log-2 CM_IDLEST_APLL_PCIE: 0x0
    [ 2.000]
    [ 2.000] [t=0x55e4756f] xdc.runtime.Main: log-2 REFSEL CM_CLKMODE_APLL_PCIE: 0x102
    [ 2.000]
    [ 2.000] [t=0x55e53a04] xdc.runtime.Main: log-3 CM_CLKMODE_APLL_PCIE: 0x182
    [ 2.000]
    [ 2.000] [t=0x55e5f392] xdc.runtime.Main: log-4 CM_CLKMODE_APLL_PCIE: 0x181
    [ 2.000]
    [ 2.000] [t=0x55e6aedf] xdc.runtime.Main: log-5 CM_IDLEST_APLL_PCIE: 0x1
    [ 2.000]
    [ 2.000] [t=0x55e76414] xdc.runtime.Main: log-5 for REFSEL CM_CLKMODE_APLL_PCIE: 0x181
    [ 2.000]
    [ 3.000] [t=0x80cd6619] xdc.runtime.Main: Waiting for APLL lock ...
    [ 3.000]
    [ 3.000] [t=0x80ce2556] xdc.runtime.Main: APLL lock is complete!
    [ 3.000]
    [ 3.000] [t=0x80cec75f] xdc.runtime.Main: log-6 CM_CLKMODE_APLL_PCIE: 0x181
    [ 3.000]
    [ 3.000] [t=0x80cf806c] xdc.runtime.Main: log-6 CM_IDLEST_APLL_PCIE: 0x1
    [ 3.000]
    [ 3.000] [t=0x80d03322] xdc.runtime.Main: log-6 for REFSEL CM_CLKMODE_APLL_PCIE: 0x181
    [ 3.000]
    [ 3.100] [t=0x85190077] xdc.runtime.Main: Successfully configured Outbound Translation!
    [ 3.100]
    [ 3.100] [t=0x8519f578] xdc.runtime.Main: Starting link training...
    [ 3.100]
    [ 3.104] [t=0x8543317c] xdc.runtime.Main: Link is up.
    [ 3.104]
    [ 3.104] [t=0x854409ac] xdc.runtime.Main: Expect 2 lanes, found 2 lanes (PASS)
    [ 3.104]
    [ 3.104] [t=0x8544d11d] xdc.runtime.Main: Expect gen 2 speed, found gen 2 speed (PASS)

  • Thank you for confirming that the APLL_PCIE was unlocked before executing the programming sequence.

    The APLL_PCIE should not successfully lock when REFSEL=ACSPCIE and the external clock is not present on the ljcb_clkp/n pins. Can you let me know how you "Remove LJCB_CLKP and LJCB_CLKN pins input clock", and how you confirmed that it was removed?

  • Hi,

      In our board design, the CDCM61002 clock chip is connected to the ARM and FPGA PCIe reference clock input pins through separate 0R resistors.

      We removed the 0R resistors on the ARM side.

  • Thanks for letting me know the disconnection mechanism. Can you provide the full schematic of the Refclk (from cdcm61002 to ljcb_clkp/n) for review?

    Can you also confirm the Output Type that is selected in the following cdcm61002 register?

  • The picture shows the PCIe clock design we are currently using; the disconnected resistors mentioned earlier are R405 and R406

  • Thank you for the schematic showing the disconnection resistors. Can you add a picture of the schematic between the disconnection resistors and the ljcb_clkp/n input pins? I want to review the termination and coupling.

  • Hi,
      The other end of each disconnected resistor is connected directly to the ljcb_clkp and ljcb_clkn pins of the CPU through a 100 nF capacitor

  • The CDCM61002 datasheet says that 100ohm parallel termination is required in LVDS mode:

    If this termination is missing in your system, the correct voltage levels will not be achieved.

  • Thank you for your reply. The existing circuit does not have this termination resistor. We will confirm this.

    For now, we would still like to understand the ACSPCIE external reference clock behavior: after disconnecting the 61002, the CPU still passes the PLL lock normally. Do you have any ideas or project images that could help us confirm this?

  • Thank you. Yes, your system will need this 100ohm resistance external to the ljcb pins in order to function correctly. In fact, installing this may solve your stability issue.

    Regarding disconnecting the 61002, the APLL_PCIE must be locking on the DPLL_PCIE, even though the mux is programmed to use the ACSPCIE clock as reference. I'm investigating whether this is expected behavior in the scenario where no clock is present on ACSPCIE.

  • Thank you for your reply. We will add the termination resistor and test.

    Please let me know if there is any progress on ACSPCIE ref clock.

  • Hi,

      Sorry, I missed something. We have already tried adding the termination resistor, and it did not improve the situation.

      It is recorded in the troubleshooting list at the top of the post.

       "③ PCIe loop adds 100R terminal matching resistance, no improvement"

  • Hi, any update here?