AM5728: PCIe 2-lane Gen2 link-up sometimes fails

Part Number: AM5728

Is there an environment where PCIe 2-lane mode can be tested? The IDK only routes out Lane 0. My customer has a link-up problem in 2-lane mode; after further debugging, using only Lane 1 in x1 mode also shows the problem, while using only Lane 0 works fine.

So I would like to ask whether there is an environment where this can be tested in advance.

The customer uses PCIe to communicate with an FPGA. Because the FPGA firmware is loaded and booted dynamically, the PCIe link must be re-established each time the FPGA firmware is loaded, and sometimes the link fails to come up at this point. The first link-up is always OK.

We ran experiments and compared the results as follows:

When link-up succeeds:

RC and EP LTSSM states: Detect->Polling->Configuration->L0->Recovery->L0

When link-up fails:

RC: Detect->Polling->Configuration->L0->Recovery->Detect_QUIET

EP: Detect->Polling->Configuration->L0->Recovery->Detect->Polling
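The success and failure traces above differ only in what follows Recovery. A minimal C sketch of that comparison, assuming the trace has already been captured as a list of coarse state names (the enum and function names are hypothetical, not part of any TI or kernel API):

```c
#include <stdbool.h>
#include <stddef.h>

/* Coarse LTSSM states as written in the traces above (hypothetical enum). */
enum ltssm_coarse {
	ST_DETECT,        /* Detect, including Detect_QUIET */
	ST_POLLING,
	ST_CONFIGURATION,
	ST_L0,
	ST_RECOVERY,
};

/* True if Recovery is immediately followed by Detect: the failing pattern. */
bool recovery_fell_back_to_detect(const enum ltssm_coarse *trace, size_t len)
{
	for (size_t i = 0; i + 1 < len; i++) {
		if (trace[i] == ST_RECOVERY && trace[i + 1] == ST_DETECT)
			return true;
	}
	return false;
}

/* True if the last recorded state is L0, i.e. the link ended up trained. */
bool trace_ends_in_l0(const enum ltssm_coarse *trace, size_t len)
{
	return len > 0 && trace[len - 1] == ST_L0;
}
```

Both the RC and the EP failure traces above make `recovery_fell_back_to_detect()` return true, while the success trace returns to L0.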

We suspect the problem occurs in Recovery.Speed. The link first trains to L0 at 2.5 GT/s; the LTSSM then enters Recovery to perform the speed change to 5 GT/s. If the change succeeds, the LTSSM returns to L0; if not, it falls back to Detect. To verify this, we forced PCIe_SS2 into Gen 1 mode so that it never enters the Recovery.Speed stage, and the test results showed no link-up failures.
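Forcing Gen 1 amounts to capping the target link speed at 2.5 GT/s. Per the PCIe specification, Target Link Speed is bits [3:0] of the Link Control 2 register (offset 0x30 in the PCIe capability), where 1 = 2.5 GT/s and 2 = 5.0 GT/s. A minimal sketch of only the bit manipulation on a register value; how the value is read and written back on AM5728 is left out, and the helper names are hypothetical. The mainline dra7xx driver can also take a Gen 1 limit from the standard `max-link-speed` device-tree property; whether the customer's kernel version supports this should be checked.

```c
#include <stdint.h>

/* Link Control 2 register fields, per the PCIe specification. */
#define LNKCTL2_TLS_MASK   0x000Fu /* Target Link Speed, bits [3:0] */
#define LNKCTL2_TLS_2_5GT  0x1u    /* Gen 1 */
#define LNKCTL2_TLS_5_0GT  0x2u    /* Gen 2 */

/* Return lnkctl2 with Target Link Speed replaced by tls; other bits kept. */
uint16_t lnkctl2_set_target_speed(uint16_t lnkctl2, uint16_t tls)
{
	return (uint16_t)((lnkctl2 & ~LNKCTL2_TLS_MASK) | (tls & LNKCTL2_TLS_MASK));
}

/* Extract the Target Link Speed field. */
uint16_t lnkctl2_target_speed(uint16_t lnkctl2)
{
	return (uint16_t)(lnkctl2 & LNKCTL2_TLS_MASK);
}
```

Only the low four bits change, so any other Link Control 2 settings are preserved.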

Summary:

#1. In PCIe_SS1 2-lane mode, link-up sometimes fails and is not stable; we suspect a PCIe12_PHY problem.

#2. With the same procedure, using only PCIe_SS1 + PCIe1_PHY is very stable. Using only PCIe_SS2 + PCIe2_PHY, the Recovery speed change sometimes fails.

In Linux, the customer rescans PCIe with the following sequence:

#1. Load FPGA or reset PCIe Link.

#2. echo 1 > /sys/bus/pci/reinit  (calls the kernel's dra7xx_pcie_host_init())

#3. echo 1 > /sys/bus/pci/devices/0000:00:00.0/remove (calls pci_stop_and_remove_bus_device())

#4. echo 1 > /sys/bus/pci/rescan 

Are these steps correct for re-establishing the PCIe link?
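Steps #2 to #4 are plain writes of "1" to sysfs files. A minimal C sketch, assuming the device path 0000:00:00.0 from the thread; `/sys/bus/pci/reinit` is not a standard kernel sysfs attribute (it is presumably a customer addition), so it is treated as optional here, and the function names are hypothetical:

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write the string "1" to a sysfs attribute; returns 0 or -errno. */
int sysfs_write_one(const char *path)
{
	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	ssize_t n = write(fd, "1", 1);
	/* Capture the error before close() can overwrite errno. */
	int err = (n == 1) ? 0 : (n < 0 ? -errno : -EIO);
	close(fd);
	return err;
}

/*
 * Steps #2-#4: optional custom reinit, remove the root port, wait, rescan.
 * reinit_path may be NULL to skip step #2. settle_s is an optional delay
 * in seconds between remove and rescan.
 */
int pcie_relink_sequence(const char *reinit_path, const char *remove_path,
			 const char *rescan_path, unsigned int settle_s)
{
	int err;

	if (reinit_path != NULL) {
		err = sysfs_write_one(reinit_path);   /* step #2 (custom node) */
		if (err)
			return err;
	}
	err = sysfs_write_one(remove_path);           /* step #3 */
	if (err)
		return err;
	if (settle_s > 0)
		sleep(settle_s);
	return sysfs_write_one(rescan_path);          /* step #4 */
}
```

Run as root, a call such as `pcie_relink_sequence("/sys/bus/pci/reinit", "/sys/bus/pci/devices/0000:00:00.0/remove", "/sys/bus/pci/rescan", 1)` mirrors the shell sequence; the 1 s settle matches the J7 sequence discussed in the replies.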

  • Tony, 

    Could you confirm step 1:

    >#1. Load FPGA or reset PCIe Link.

    Did you set EP in Link training state upon reset?

    Also, I did not see that you tried forcing 2-lane mode to Gen 1 speed only. If Gen 1 is stable, the issue seems to be related to Gen 2 speed in 2-lane mode. It may be worth checking the Link Capability registers and comparing them against Gen 2 speed.

    Also, can you confirm the clock setting for the FPGA? From your description, it seems the FPGA has its own refclk, is reset independently, and then waits for link training from the RC. That is similar to the board-to-board sequence.

    I am not familiar with what dra7xx_pcie_host_init() does, but I assume the FPGA is not affected even if it tries to drive out some refclk or reset (not connected to the FPGA)?

    I also heard the following sequence worked on J7:

    echo "1" > /sys/bus/pci/devices/0000\:00\:00.0//remove
    sleep 1
    echo "1" > /sys/bus/pci/rescan

    I am not sure whether AM57 implements the same remove function, but if it does, it may be worth a try.

    Jian

  • Hi Jian,

    "Did you set EP in Link training state upon reset?"
    After step 1, the EP's LTSSM is reset to the Detect state and waits for the RC to enable the PCIe link.

    "Also I did not see you tried to force 2Lane-mode in Gen 1 speed only. If Gen 1 is stable, then seems the issue is related to Gen 2 speed at 2Lane. If may be worth to check Link Capability Registers and compare with Gen 2 speed. "
    I tested Gen1 2-lane mode today and it is stable. The Link Capability register values are the same for Gen1 and Gen2.
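    The Link Capabilities register only reports what a port advertises (maximum speed and width); the Link Status register reports what was actually negotiated, so comparing the two may be more telling. A minimal C sketch decoding both, using the field layouts from the PCIe specification (the struct and function names are hypothetical). On a running Linux system the same fields are printed as `LnkCap` and `LnkSta` by `lspci -vv`.

```c
#include <stdint.h>

/* Field layouts from the PCIe specification (PCIe capability structure). */
#define LNKCAP_MAX_SPEED_MASK  0x0000000Fu /* Max Link Speed, bits [3:0] */
#define LNKCAP_MAX_WIDTH_MASK  0x000003F0u /* Max Link Width, bits [9:4] */
#define LNKSTA_CUR_SPEED_MASK  0x000Fu     /* Current Link Speed, bits [3:0] */
#define LNKSTA_NEG_WIDTH_MASK  0x03F0u     /* Negotiated Link Width, bits [9:4] */

struct link_info {
	unsigned int speed_code; /* on a Gen2-capable port: 1 = 2.5 GT/s, 2 = 5.0 GT/s */
	unsigned int width;      /* number of lanes, e.g. 1 or 2 */
};

/* What the port advertises. */
struct link_info decode_lnkcap(uint32_t lnkcap)
{
	struct link_info li = {
		.speed_code = lnkcap & LNKCAP_MAX_SPEED_MASK,
		.width = (lnkcap & LNKCAP_MAX_WIDTH_MASK) >> 4,
	};
	return li;
}

/* What the link actually trained to. */
struct link_info decode_lnksta(uint16_t lnksta)
{
	struct link_info li = {
		.speed_code = lnksta & LNKSTA_CUR_SPEED_MASK,
		.width = (lnksta & LNKSTA_NEG_WIDTH_MASK) >> 4,
	};
	return li;
}
```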

    "Also can you confirm what is the clock setting for the FPGA? From your description, it seems FPGA has its own refclk, and get independently reset, then wait for link training from RC. That is similar to the board-to-board sequence. "
    The FPGA reference clock is provided by a separate PLL and is not affected by the CPU reference clock.

    "I am not familiar what the dra7xx_pcie_host_init() does, but assume the fpga is not effected even it tries to drive out some refclk or reset (not connected to FPGA)?"
    dra7xx_pcie_host_init() mainly establishes the PCIe link and waits for it to come up. Sorry, I didn't understand the meaning of the second half of the sentence.

    "I also heard the following sequence worked on J7:
    echo "1" > /sys/bus/pci/devices/0000\:00\:00.0//remove
    sleep 1
    echo "1" > /sys/bus/pci/rescan
    so not sure if AM57 implemented the same remove function and may give it a try if so. "

    The kernel driver implements the same remove and rescan functions. I used them before in the following sequence:
    #1. Load FPGA or reset PCIe Link.
    #3. echo 1 > /sys/bus/pci/devices/0000:00:00.0/remove
    #4. echo 1 > /sys/bus/pci/rescan 

    When this sequence is executed for the first time, it works. But after executing it again (without resetting the CPU), an "Invalid Bridge Configuration" problem appeared. I solved this by adding dra7xx_pcie_host_init() (#2) to the sequence, so I now rescan PCIe with the following sequence:

    #1. Load FPGA or reset PCIe Link.
    #2. dra7xx_pcie_host_init()
    #3. echo 1 > /sys/bus/pci/devices/0000:00:00.0/remove
    #4. echo 1 > /sys/bus/pci/rescan 

    Thanks and Regards,

    Allen

  • Thanks Allen. I agree the issue seems to be related to PHY2. 

    Let me check with someone on the team.

    Jian

  • Allen/Tony:

    One more question. You mentioned:

    >>The First time link up is always OK.

    Do you mean the RC can consistently link with the EP if the RC boots up directly (while the EP is already in LTSSM)? So is the issue also related to the following sequence:

    #2. dra7xx_pcie_host_init()
    #3. echo 1 > /sys/bus/pci/devices/0000:00:00.0/remove
    #4. echo 1 > /sys/bus/pci/rescan 

    which is different from a cold boot of the RC?

    Jian

  • Hi Jian,

    Yes. If the FPGA has been loaded and is in LTSSM before the kernel loads, the link is always established successfully during kernel initialization. I also think the problem is related to this sequence.

    This sequence only re-establishes the link and rescans the bus. An RC cold boot performs Surrounding Modules Global Initialization and PCIe Controllers Initialization in the kernel, including link establishment and bus scanning. See AM572x PRM 24.9.5, PCIe Controller Low Level Programming Model.

    I have tried doing the PCIe Controller Global Initialization in this sequence, but the issue still exists.

    Thanks and Regards,

    Allen

  • Allen,

    Can you try stopping the link and then starting it again in dra7xx_pcie_host_init()?

    diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c b/drivers/pci/controller/dwc/pci-dra7xx.c
    index 6f30362ea23b..d2858ef25155 100644
    --- a/drivers/pci/controller/dwc/pci-dra7xx.c
    +++ b/drivers/pci/controller/dwc/pci-dra7xx.c
    @@ -207,6 +207,8 @@ static int dra7xx_pcie_host_init(struct pcie_port *pp)

    dw_pcie_setup_rc(pp);

    + dra7xx_pcie_stop_link(pci);
    + mdelay(1);
    dra7xx_pcie_establish_link(pci);
    dw_pcie_wait_for_link(pci);
    dw_pcie_msi_init(pp);

    Thanks

    Kishon

  • Hi Kishon,

    After loading the FPGA or resetting the PCIe link (#1), the link is already stopped: bit 0 (LTSSM_EN) of register PCIECTRL_TI_CONF_DEVICE_CMD is 0.

    I added the stop-link and start-link calls to dra7xx_pcie_host_init(); the test result is the same.
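    At the register level, the stop/start pair in the patch is a read-modify-write of LTSSM_EN (bit 0 of PCIECTRL_TI_CONF_DEVICE_CMD). A minimal sketch on a memory-mapped register pointer; the helper names are hypothetical, and the real driver uses its own register accessors. It also makes the observation above concrete: if LTSSM_EN is already 0 after step #1, the added stop is effectively a no-op, so an unchanged result is not surprising.

```c
#include <stddef.h>
#include <stdint.h>

#define DEVICE_CMD_LTSSM_EN (1u << 0) /* bit 0 of PCIECTRL_TI_CONF_DEVICE_CMD */

/* Clear LTSSM_EN: stop link training. Other bits are preserved. */
void ltssm_disable(volatile uint32_t *device_cmd)
{
	*device_cmd &= ~DEVICE_CMD_LTSSM_EN;
}

/* Set LTSSM_EN: (re)start link training. Other bits are preserved. */
void ltssm_enable(volatile uint32_t *device_cmd)
{
	*device_cmd |= DEVICE_CMD_LTSSM_EN;
}

/*
 * Stop, wait, then restart training, as in the patch (which uses mdelay(1)).
 * delay_fn is a caller-supplied wait and may be NULL.
 */
void ltssm_restart(volatile uint32_t *device_cmd, void (*delay_fn)(void))
{
	ltssm_disable(device_cmd);
	if (delay_fn != NULL)
		delay_fn();
	ltssm_enable(device_cmd);
}
```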

    Thanks,

    Allen

  • Allen, 

    Were any additional tests tried beyond Kishon's patch?

    regards

    jian

  • Jian,

    The customer has no idea how to move forward and needs hints from the BU. BTW, is 2-lane mode used by other customers, or can the BU verify 2-lane mode?

  • Tony, 

    Sorry for the delayed response. I will review the updates from the above experiments with Brad again, to see if there are other PHY1-related issues we should suspect.

    Also, can you confirm whether the LTSSM states he recorded in the first post were obtained by software polling or via an analyzer? If he has an analyzer handy, can he send comparison logs of a failing vs. a successful link-up?

    regards

    jian 

  • Hi Jian,

    The LTSSM state of the RC (AM5728) is obtained by reading the LTSSM_STATE field, bits [7:2], of register PCIECTRL_TI_CONF_DEVICE_CMD; the LTSSM state of the EP (FPGA) is captured through JTAG using the related tools.
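    Software polling only samples the register, so short-lived states (for example individual Recovery substates) can be missed; this is one reason an analyzer log is useful. A minimal C sketch decoding sampled PCIECTRL_TI_CONF_DEVICE_CMD values into a list of LTSSM transitions, using bits [7:2] as described above. The state-name table is an assumption based on the Synopsys DesignWare LTSSM encoding and should be checked against the AM572x TRM; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* LTSSM_STATE field of PCIECTRL_TI_CONF_DEVICE_CMD: bits [7:2]. */
#define DEVICE_CMD_LTSSM_STATE_SHIFT 2
#define DEVICE_CMD_LTSSM_STATE_MASK  0x3Fu

unsigned int ltssm_state(uint32_t device_cmd)
{
	return (device_cmd >> DEVICE_CMD_LTSSM_STATE_SHIFT) & DEVICE_CMD_LTSSM_STATE_MASK;
}

/* ASSUMPTION: DesignWare encoding, codes 0x00..0x11; verify against the TRM. */
static const char *const ltssm_names[] = {
	"DETECT_QUIET", "DETECT_ACT", "POLL_ACTIVE", "POLL_COMPLIANCE",
	"POLL_CONFIG", "PRE_DETECT_QUIET", "DETECT_WAIT", "CFG_LINKWD_START",
	"CFG_LINKWD_ACEPT", "CFG_LANENUM_WAIT", "CFG_LANENUM_ACEPT", "CFG_COMPLETE",
	"CFG_IDLE", "RCVRY_LOCK", "RCVRY_SPEED", "RCVRY_RCVRCFG",
	"RCVRY_IDLE", "L0",
};

const char *ltssm_state_name(unsigned int code)
{
	if (code < sizeof(ltssm_names) / sizeof(ltssm_names[0]))
		return ltssm_names[code];
	return "OTHER";
}

/*
 * Collapse consecutive duplicate samples into a transition list.
 * Returns the number of states written to out (at most out_cap).
 */
size_t ltssm_transitions(const uint32_t *samples, size_t n,
			 unsigned int *out, size_t out_cap)
{
	size_t k = 0;

	for (size_t i = 0; i < n && k < out_cap; i++) {
		unsigned int s = ltssm_state(samples[i]);
		if (k == 0 || out[k - 1] != s)
			out[k++] = s;
	}
	return k;
}
```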

    Thanks

    Allen