PCIe hotplug - linux does not detect/enumerate new PCIe device

Hello,

My customer wants to hotplug an FPGA to the Netra:

- Netra boots Linux

- Netra loads the FPGA via the SPI interface

- FPGA starts up and toggles the PCIe PRESENT line

- The PCIe endpoint does not show up

Does the current Linux kernel for Netra support hotplug?

Does it monitor the PRESENT signal?

Can I force re-enumeration from userspace?

 

Regards,

Lo

  • Lo,

    The h/w does not support hotplug.

    To manually force re-enumeration, use "echo 1 > /sys/bus/pci/rescan". This may not be sufficient if the link is not established; in that case you may first need to toggle link training by setting bit 0 of the register at 0x51000004, or even do a complete reconfiguration of the PCIe h/w if the register is not accessible (try the devmem2 utility to access the register).

       Hemant
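
The rescan step above can also be scripted. The following is a minimal Python sketch, not a tested procedure for this board: `request_rescan` writes "1" to the sysfs rescan file named in the reply, and `with_bit0` only computes the value that a read-modify-write of bit 0 would produce. Both function names are hypothetical, and the sketch deliberately does not access 0x51000004 itself.

```python
from pathlib import Path

RESCAN_PATH = "/sys/bus/pci/rescan"  # sysfs file named in the reply above


def request_rescan(path: str = RESCAN_PATH) -> None:
    """Ask the kernel to rescan the PCI buses (needs root on a real system)."""
    Path(path).write_text("1\n")


def with_bit0(value: int, on: bool) -> int:
    """Return a 32-bit value with bit 0 set or cleared, other bits untouched."""
    value &= 0xFFFFFFFF
    return (value | 0x1) if on else (value & ~0x1 & 0xFFFFFFFF)
```

Computing the new value from the value just read, rather than typing a constant, keeps every other bit of the register unchanged.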

  • Hello Hemant,

    The re-enumeration does not work correctly; please see the output below.

    Do you have any other ideas?

    Regards,

    Lo

    The Netra was booted, then the FPGA was loaded:

    [snip]

    root@dm816x-evm:~# lspci -v

     [/snip]

    => The FPGA isn't discovered.

    I do a rescan:

    [snip]

    root@dm816x-evm:~# echo 1 > /sys/bus/pci/rescan

    PCI: bus1: Fast back to back transfers enabled

    pci 0000:00:00.0: BAR 1: can't assign mem pref (size 0x80000000)

    pci 0000:00:00.0: BAR 0: assigned [mem 0x20000000-0x20000fff]

    pci 0000:00:00.0: BAR 0: set to [mem 0x20000000-0x20000fff] (PCI address [0x20000000-0x20000fff])

    pci 0000:00:00.0: PCI bridge to [bus 01-01]

    pci 0000:00:00.0:   bridge window [io  disabled]

    pci 0000:00:00.0:   bridge window [mem disabled]

    pci 0000:00:00.0:   bridge window [mem pref disabled]

    PCI: enabling device 0000:00:00.0 (0000 -> 0003)

    root@dm816x-evm:~# lspci -v

    00:00.0 PCI bridge: Texas Instruments Device 8888 (rev 01) (prog-if 00 [Normal decode])

            Flags: bus master, fast devsel, latency 0

            Memory at 20000000 (32-bit, non-prefetchable) [size=4K]

            Memory at <ignored> (32-bit, prefetchable)

            Bus: primary=00, secondary=01, subordinate=01, sec-latency=0

            Capabilities: [40] Power Management version 3

            Capabilities: [50] MSI: Mask- 64bit+ Count=1/1 Enable-

            Capabilities: [70] Express Root Port (Slot-), MSI 00

            Capabilities: [100] Advanced Error Reporting

    [/snip]

    => still no device.

    Devmem2: Just reading the contents of 0x51000004

    [snip]

    root@dm816x-evm:~# devmem2 0x51000004

    /dev/mem opened.

    Memory mapped at address 0x400cb000.

    Read at address  0x51000004 (0x400cb004): 0x00000A07

    [/snip]

    => the bit is already set?

    Devmem2: I try to force the bit to 0, then set it back to 1

    [snip]

    root@dm816x-evm:~# devmem2 0x51000004 w 0x00000A04

    /dev/mem opened.

    Memory mapped at address 0x40109000.

    Read at address  0x51000004 (0x40109004): 0x00000A07

    Write at address 0x51000004 (0x40109004): 0x00000A04, readback 0x00000A04

    root@dm816x-evm:~# devmem2 0x51000004 w 0x00000A07

    /dev/mem opened.

    Unhandled fault: Precise External Abort on non-linefetch (0x1018) at 0x4024b004

    Memory mapped at address 0x4024b000.

    Bus error

    [/snip]

    => At this point it doesn't matter whether I read or write; I always get the bus error.
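
A quick way to see what the first write above actually changed is to XOR the old and new values. This small Python sketch (the helper name `changed_bits` is hypothetical) shows that going from 0x00000A07 to 0x00000A04 clears bit 1 as well as bit 0. The thread does not establish whether that is related to the bus error that follows.

```python
from typing import List


def changed_bits(old: int, new: int) -> List[int]:
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = old ^ new
    return [bit for bit in range(32) if diff & (1 << bit)]


# Values taken from the devmem2 transcript above.
print(changed_bits(0x00000A07, 0x00000A04))  # prints [0, 1]
```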

  • Lo,

    After the first rescan (when lspci shows only one device), can you dump the following registers:

    0x51001728 --> The least significant 5 bits show the link status and should read 0x11

    0x51002000 --> If the above is true, this register should show the FPGA's PCI config register 0x00

       Hemant

  • rescan:

    [snip]

    root@dm816x-evm:~# echo 1 > /sys/bus/pci/rescan
    PCI: bus1: Fast back to back transfers enabled
    pci 0000:00:00.0: BAR 1: can't assign mem pref (size 0x80000000)
    pci 0000:00:00.0: BAR 0: assigned [mem 0x20000000-0x20000fff]
    pci 0000:00:00.0: BAR 0: set to [mem 0x20000000-0x20000fff] (PCI address [0x20000000-0x20000fff])
    pci 0000:00:00.0: PCI bridge to [bus 01-01]
    pci 0000:00:00.0:   bridge window [io  disabled]
    pci 0000:00:00.0:   bridge window [mem disabled]
    pci 0000:00:00.0:   bridge window [mem pref disabled]
    PCI: enabling device 0000:00:00.0 (0000 -> 0003)

    [/snip]

    dumping the registers:

    [snip]

    root@dm816x-evm:~# devmem2 0x51001728
    /dev/mem opened.
    Memory mapped at address 0x4015c000.
    Read at address  0x51001728 (0x4015c728): 0x03001011

    root@dm816x-evm:~# devmem2 0x51002000
    /dev/mem opened.
    Memory mapped at address 0x402cc000.
    Unhandled fault: Precise External Abort on non-linefetch (0x1018) at 0x402cc000
    Bus error

    [/snip]

    Theo
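
Hemant's rule for 0x51001728 (the least significant 5 bits should be 0x11) can be applied to the value read above with a simple mask. This is a minimal Python sketch; the constant and function names are hypothetical, and the meaning of the remaining bits is not covered here.

```python
LINK_STATE_MASK = 0x1F  # least significant 5 bits, per the reply above
LINK_UP_STATE = 0x11    # expected value when the link is up, per the reply above


def link_state(reg_value: int) -> int:
    """Extract the 5-bit link status field from the register value."""
    return reg_value & LINK_STATE_MASK


def link_is_up(reg_value: int) -> bool:
    """True when the link status field matches the expected 'up' value."""
    return link_state(reg_value) == LINK_UP_STATE


# Value read from 0x51001728 in the transcript above.
print(hex(link_state(0x03001011)), link_is_up(0x03001011))  # prints 0x11 True
```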

  • This is strange: the link is up, but for some reason the FPGA is not responding to the config reads.

    Can you check from the FPGA side, or using a PCIe analyzer, whether the configuration reads are reaching the FPGA?

    One reason I can think of is that the bus:device:function number combination is causing the trouble. Can you dump the register at 0x51000008? Of course, during enumeration the FPGA should respond to any combination at the first request, as long as the function number is one of the supported ones.

       Hemant

  • the register 0x51000008:

    [snip]

    root@dm816x-evm:~# devmem2 0x51000008
    /dev/mem opened.
    Memory mapped at address 0x40227000.
    Read at address  0x51000008 (0x40227008): 0x00010000

    [/snip]

    I'll try to get more info about the config reads.

    I looked through the documentation, but I can't find any explanation of registers 0x51001728 or 0x51002000. Where can I find this? Maybe it's undocumented?

    Theo

  • The 0x51000008 value looks correct: the FPGA is labelled as (Bus)1:(Device)0.

    The above register is called CFG_SETUP and is used to access the peer's configuration space; 0x51002000 marks the start of the remote peer's PCI configuration space. More details should be available in the PCIe section of the TRM.

       Hemant 
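
The reading of CFG_SETUP as bus 1, device 0 can be checked with a small decoder. This Python sketch assumes a field layout of bus in bits 23:16, device in bits 12:8 and function in bits 2:0; that layout is an assumption that is consistent with the interpretation given in this thread, and it should be verified against the TRM. The names `CfgSetup` and `decode_cfg_setup` are hypothetical.

```python
from typing import NamedTuple


class CfgSetup(NamedTuple):
    bus: int
    device: int
    function: int


def decode_cfg_setup(value: int) -> CfgSetup:
    """Split a CFG_SETUP value into bus/device/function (assumed layout)."""
    return CfgSetup(
        bus=(value >> 16) & 0xFF,    # assumed: bits 23:16
        device=(value >> 8) & 0x1F,  # assumed: bits 12:8
        function=value & 0x7,        # assumed: bits 2:0
    )


# Value read from 0x51000008 in the transcript above.
print(decode_cfg_setup(0x00010000))
```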

  • Hi,

    I checked with a PCIe analyser and got the following:

    * A config read request is sent.

    * An ACK follows.

    * But there is no completion.

    => So it seems that the FPGA isn't answering. We are investigating. :) Thanks for the support up to now.

  • Hi, I have a similar problem connecting an FPGA board (Xilinx ML505) to the EVM816.

    I've connected ChipScope to the Xilinx integrated endpoint and I can confirm that the board receives config requests and responds to them on boot and every time I try to read remote config registers (like 0x51002000). However, the OS still reports "Bus error":

    root@dm816x-evm:~# devmem2 0x51002000
    /dev/mem opened.
    Memory mapped at address 0x401de000.
    Unhandled fault: Precise External Abort on non-linefetch (0x1018) at 0x401de000
    Bus error

    Any clues why that might be?

  • Hi, I was trying to get some more information from the OS about the connection but hit a wall with some groups of registers. Mostly I would like to review the PCIe Capability Registers (section 17.4.8 in the SPRUGX8 document), but unfortunately there is no information about the base address of these registers. Does anyone know how to read them?

  • Wojciech Powiertowski said:

    I've connected ChipScope to the Xilinx integrated endpoint and I can confirm that the board receives config requests and responds to them on boot and every time I try to read remote config registers (like 0x51002000). However, the OS still reports "Bus error":

    
    
    Have you confirmed whether the FPGA's response to the config read request (when you see the abort) is a successful completion?
    Also, when you see the abort, is the local config space at 0x51001000 accessible? Can you also check whether the CFG_SETUP register shows the correct bus:device:function number? If there is only one EP (and no switch), the value should be bus=1, device=0 and function=0.
    
    
       Hemant

  • Hi Hemant,

    Unfortunately I have no way to check the validity of the FPGA response. I use an integrated Xilinx core, and since the core handles all config reads and writes itself, it doesn't pass any config/interrupt data to the TRN user interface (nor does it present Rx/Tx data on any externally visible port). This is as far as visibility in the FPGA goes: I can only see status bits telling me that a DLLP frame, a TLP, or a config frame was received or transmitted. I also don't have access to a PCIe analyzer, so I assume that the FPGA receives and transmits data correctly (especially since the EVM816 can't connect, as in negotiate a link, to a stock Realtek NIC, which works very well in a PC).

    As for the local config space, the address 0x51001000 is accessible and it reads: 0x8888104c

    CFG_SETUP (0x51000008) reads: 0x00010000 - I believe it stands for bus=1, device=0 and function=0

    Wojtek

    Edit: I would also like to repeat my question about how to get to the PCIe Capability Registers, since I can't find a base address for them in any documentation.
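
The value read from 0x51001000 above is register 0x00 of the local PCI configuration space. In the standard PCI configuration header that register holds the vendor ID in its low 16 bits and the device ID in its high 16 bits. A minimal Python sketch of the split (the function name `split_id_register` is hypothetical):

```python
from typing import Tuple


def split_id_register(value: int) -> Tuple[int, int]:
    """Split PCI config register 0x00 into (vendor_id, device_id)."""
    vendor_id = value & 0xFFFF          # low 16 bits
    device_id = (value >> 16) & 0xFFFF  # high 16 bits
    return vendor_id, device_id


vendor, device = split_id_register(0x8888104C)
print(f"vendor=0x{vendor:04x} device=0x{device:04x}")  # vendor=0x104c device=0x8888
```

The result matches the "Texas Instruments Device 8888" entry that lspci printed for 00:00.0 earlier in the thread, so the read returns the root port's own IDs.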

  • The CFG_SETUP register looks correct. I think we need to see whether the FPGA is returning an error to config requests, since the following things are OK:

    1) Link is up, LTSSM is in L0

    2) PCIe module is up, local config is accessible

    3) CFG_SETUP register value is set to be able to access FPGA.

    Can you confirm whether the Realtek card works well with the 8168 EVM?

    Regarding the "PCIe Capability Registers": they start at offset 0x70 in the local config space for DM8168; "lspci" should show this.

       Hemant
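
Combining the local config space address used in this thread (0x51001000) with the 0x70 offset gives absolute addresses that can be passed to devmem2. The sketch below is illustrative only: the register offsets inside the capability structure are taken from the PCI Express Capability layout in the PCIe specification, and the dictionary and function names are hypothetical.

```python
LOCAL_CONFIG_BASE = 0x51001000  # local config space address used in this thread
PCIE_CAP_OFFSET = 0x70          # capability offset given in the reply above

# Offsets within the PCI Express Capability structure (PCIe specification).
PCIE_CAP_REGS = {
    "device_capabilities": 0x04,
    "device_control": 0x08,
    "link_capabilities": 0x0C,
    "link_control": 0x10,  # 16-bit register
    "link_status": 0x12,   # 16-bit register
}


def pcie_cap_address(name: str) -> int:
    """Return the absolute address of a PCI Express Capability register."""
    return LOCAL_CONFIG_BASE + PCIE_CAP_OFFSET + PCIE_CAP_REGS[name]


for reg in PCIE_CAP_REGS:
    print(f"{reg}: 0x{pcie_cap_address(reg):08x}")
```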

  • Hi, yes, the registers I was able to read do look correct, but Linux still reports a bus error.

    Unfortunately the Realtek NIC can't even seem to establish a link with the EVM, and the CPU can't reach L0 with it connected (but the NIC works perfectly with an ordinary PC).

    Since I'm moving to a new project, one of my colleagues will probably take over debugging PCIe on the EVM.

  • OK. For the NIC case, check whether you need the patch to force x1 GEN1. Follow the link in the message below (it may require a manual merge):

    "please try with applying the patch I sent on http://e2e.ti.com/support/embedded/f/354/p/118939/432701.aspx#432701 (scroll to my post with comments "Looks like you are using x1")".

       Hemant