AM5728: PCIe re-enumeration fails

Part Number: AM5728

Hi,

This question concerns our own board running Linux based on the TI SDK 05.00.00.15.

We are using the PCIe bus as two separate links to two different FPGA devices where the AM5728 is the RC.

When I start the system, Linux detects and enumerates both FPGAs, and I can communicate with them without problems.

But I want to be able to reconfigure the FPGA at run time, without resetting the CPU.
What I did with another CPU, and what I have read from other users, is the following sequence:
- remove the FPGA EP device from the discovered devices with:
   echo 1 >/sys/bus/pci/devices/0001:01:00.0/remove      (0001:01:00.0 is the id of the FPGA EP)
- now reconfigure the FPGA
- when the FPGA is reconfigured, do a PCI rescan with:
   echo 1 >/sys/bus/pci/devices/0001:00:00.0/rescan       (0001:00:00.0 is the id of the bridge in the AM5728)
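
The sequence above can be sketched as a small shell helper. This is only a sketch under assumptions: the device addresses are the ones from this post, `SYSFS_ROOT` is a hypothetical override added so the functions can be dry-run against a scratch directory, and the FPGA reconfiguration step is left to whatever tool the board uses.

```shell
#!/bin/sh
# Sketch of the remove / reconfigure / rescan sequence.
# SYSFS_ROOT is a hypothetical override (not a kernel feature); it defaults
# to the real /sys and exists only to allow a dry run on a scratch tree.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Write 1 to a per-device sysfs attribute ("remove" or "rescan").
# $1 = PCI address, e.g. 0001:01:00.0   $2 = attribute name
pci_attr_write() {
    node="$SYSFS_ROOT/bus/pci/devices/$1/$2"
    if [ ! -e "$node" ]; then
        echo "pci_attr_write: $node not found" >&2
        return 1
    fi
    echo 1 > "$node"
}

# Detach a device from the kernel's list of discovered devices.
pci_remove() { pci_attr_write "$1" remove; }

# Rescan the bus behind the given bridge.
pci_rescan() { pci_attr_write "$1" rescan; }

# Succeeds if the device is currently listed by the kernel.
pci_present() { [ -d "$SYSFS_ROOT/bus/pci/devices/$1" ]; }
```

Intended use would be `pci_remove 0001:01:00.0`, then the FPGA reconfiguration, then `pci_rescan 0001:00:00.0`, and finally `pci_present 0001:01:00.0` to check whether the endpoint came back.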

Normally the FPGA should then be visible again when listing the devices with lspci, but this is not the case.

After some investigation I found out that the LTSSM of the PCIe controller was disabled, so I enabled it to see what would happen.
When I did that, I could re-enumerate the FPGA and it showed up again in the lspci output. But unfortunately this was not enough (as expected).
After some more investigation and reading the user manual, I could see that the PCIe controller is reset (PCIe link-down reset condition). Because of this reset, the PCIe bridge configuration returns to its defaults and is never reprogrammed by the PCIe driver.

I also tried SDK 06.00.00.07, but there was no change in the behavior.

My question is whether this functionality works and, if so, what I should do differently. For us it is crucial that the FPGA can be reprogrammed at run time, so the re-enumeration must work.

Regards,
Robert

  • Hi Robert,

    Robert Pot said:
    - remove the FPGA EP device from the discovered devices with:
       echo 1 >/sys/bus/pci/devices/0001:01:00.0/remove      (0001:01:00.0 is the id of the FPGA EP)

    Instead of removing the EP, please remove the RC in this step.

  • Hi,

    Thanks for the suggestion. My first thought was that this would indeed help, because removing the bridge should force everything to be re-initialized.

    But unfortunately this didn't help; I still get the same situation.

    root@am57xx-evm:~# echo 1 >/sys/bus/pci/devices/0001\:00\:00.0/remove
    echo 1 >/sys/bus/pci/rescan
    [  251.011159] pci 0001:00:00.0: ignoring class 0x000000 (doesn't match header type 01)
    [  251.019251] pci 0001:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
    [  251.031164] PCI: bus1: Fast back to back transfers enabled
    [  251.036835] pci 0001:00:00.0: not setting up bridge for bus 0001:01

    And the output from dmesg:
    [  251.011000] pci_bus 0000:00: scanning bus
    [  251.011041] pcieport 0000:00:00.0: scanning [bus 01-ff] behind bridge, pass 0
    [  251.011055] pci_bus 0000:01: scanning bus
    [  251.011065] pci_bus 0000:01: bus scan returning with max=01
    [  251.011078] pcieport 0000:00:00.0: scanning [bus 01-ff] behind bridge, pass 1
    [  251.011088] pci_bus 0000:00: bus scan returning with max=ff
    [  251.011107] pci_bus 0001:00: scanning bus
    [  251.011141] pci 0001:00:00.0: [104c:8888] type 01 class 0x000000
    [  251.011159] pci 0001:00:00.0: ignoring class 0x000000 (doesn't match header type 01)
    [  251.018953] pci 0001:00:00.0: calling pci_fixup_ide_bases+0x0/0x68
    [  251.018998] pci 0001:00:00.0: supports D1
    [  251.019007] pci 0001:00:00.0: PME# supported from D0 D1 D3hot
    [  251.019017] pci 0001:00:00.0: PME# disabled
    [  251.019241] pci 0001:00:00.0: scanning [bus 00-00] behind bridge, pass 0
    [  251.019251] pci 0001:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
    [  251.031019] pci 0001:00:00.0: scanning [bus 00-00] behind bridge, pass 1
    [  251.031140] pci_bus 0001:01: scanning bus
    [  251.031157] pci_bus 0001:01: fixups for bus
    [  251.031164] PCI: bus1: Fast back to back transfers enabled
    [  251.036799] pci_bus 0001:01: bus scan returning with max=01
    [  251.036811] pci_bus 0001:01: busn_res: [bus 01-ff] end is updated to 01
    [  251.036823] pci_bus 0001:00: bus scan returning with max=01
    [  251.036835] pci 0001:00:00.0: not setting up bridge for bus 0001:01

    A plain remove followed by a rescan does work, but that sequence doesn't reset the RC.
    When a new image is loaded into the FPGA, the PCIe link goes down, the RC bridge is reset, and you get the behavior shown above.

    Did your suggestion work for other customers?

  • Robert Pot said:
    Did your suggestion work for other customers?

    Yes, please check the link below.

    https://e2e.ti.com/support/processors/f/791/p/745415/2759411#2759411

  • Thanks for the link, but that is a different situation. There, a PCI bridge from PLX was used between the CPU and the endpoint, so the PHY link at the CPU never goes down. It is the loss of the PHY link that causes the reset of all the registers in the RC.

    Also, in my situation, when the FPGA is not yet configured while Linux boots on the CPU and is configured later, a rescan (using "echo 1 >/sys/bus/pci/rescan") gives a working situation.

    The problem arises when you have a working configuration and the FPGA is reprogrammed, which takes down the PHY link between CPU and FPGA until the FPGA has been reprogrammed.

    regards,
    Robert

  • Robert,

    PCIe basically doesn't support hot-plug. I am not sure what causes the problem with your FPGA setup. But if you plug this FPGA EP into another RC, for example a PC, does the scenario you described work?

  • Hi Bin,

    We have another product where we do the same with an NXP CPU, and this works fine. But I admit that there, too, we have a PCIe switch between CPU and FPGA.
    That was one of the reasons to use the AM5728: it has 2 PCIe buses, which eliminated the need for a PCIe switch.

    When I read the answers in https://e2e.ti.com/support/processors/f/791/p/585029/2153006 there is the remark "The Root Complex does not do rescan currently. It expects the End Point being up already.". If that is still the case, does that mean I can't make this work?

    I think the reason is the following, as described in the user manual:

    "PCIe link-down reset condition - When the PCIe link has gone down and goes up again, namely
    transitions from D3cold/L3 back to D0, the PCIe core auto-applies this internal fundamental reset and
    reports it on IRQ event (PCIECTRL_TI_CONF_IRQSTATUS_MAIN[11] LINK_REQ_RST) after link-up. "

    And the problem is that the Linux driver does not handle the above reset condition: many registers in the PCIe core keep their reset values after this event, because they are only programmed at startup.
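
One way to see which bridge registers fall back to their reset values would be to snapshot the root port's configuration space while the link is up and again after the link-down event, then compare the two. A minimal sketch, assuming `lspci` from pciutils is available on the target and the commands are run as root; the function names and file paths are hypothetical:

```shell
#!/bin/sh
# Dump the PCI configuration space of one device as hex.
# $1 = PCI address (e.g. 0001:00:00.0), $2 = output file
cfg_snapshot() {
    lspci -s "$1" -xxx > "$2"
}

# Print the lines that differ between two snapshots.
# Returns 0 if the snapshots are identical, non-zero if they differ.
cfg_diff() {
    diff "$1" "$2"
}
```

For example, `cfg_snapshot 0001:00:00.0 /tmp/rc_before.txt` before reprogramming the FPGA, `cfg_snapshot 0001:00:00.0 /tmp/rc_after.txt` once the link is back, then `cfg_diff /tmp/rc_before.txt /tmp/rc_after.txt`. This only covers the standard configuration space, not the TI-specific wrapper registers.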

    Regards,
    Robert

  • Hi,

    The current situation is that I changed the driver so that the PCIe bridge is reconfigured when the link is lost.

    This seems to work, although I'm not sure it is correct in all cases. E.g. the handling of MSI interrupts is most likely not done correctly, but that is an issue for later.

    For now this is done for me.