AM6422: j721e-pcie-host, link does not go up. Intermittent issue.

Part Number: AM6422

Hello there.

We're facing an issue where the NVMe SSD is sometimes not detected over PCIe. The failure is intermittent and rare, so it's not easy to debug.

Using Linux 6.1.69.

The first difference in dmesg is

        j721e-pcie-host f102000.pcie: Link up

which doesn't show up when the NVMe SSD is not detected. We've read the following suggestions on the Internet:

  • PCIe Power Management: Testing kernel parameters like pcie_aspm=off and nvme_core.default_ps_max_latency_us=0.
  • Making sure the NVMe driver is built into the kernel rather than built as a module (link in this forum). Same for CONFIG_PHY_CADENCE_TORRENT and CONFIG_PHY_J721E_WIZ.
  • Check if there can be a race condition when the modules get loaded.
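Before comparing boots, it may help to confirm that the suggested settings are really in effect on the running system. The following is a minimal sketch, not a definitive tool: the helper names `has_cmdline_param` and `is_builtin` are hypothetical, and the second helper assumes a plain-text copy of the kernel config is available (for example extracted from /proc/config.gz, which only exists when the kernel was built with CONFIG_IKCONFIG_PROC).

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of any TI or kernel tool.

# has_cmdline_param FILE PARAM
# Succeeds if PARAM appears as a complete word in the kernel command line stored in FILE.
has_cmdline_param() {
    tr ' ' '\n' < "$1" | grep -qxF -- "$2"
}

# is_builtin CONFIG_FILE OPTION
# Succeeds if OPTION is set to "y" (built in, not "m") in a plain-text kernel config.
is_builtin() {
    grep -qxF -- "$2=y" "$1"
}
```

On the target this could be used as `has_cmdline_param /proc/cmdline pcie_aspm=off && echo "ASPM disabled"`, and, after `zcat /proc/config.gz > /tmp/config`, as `is_builtin /tmp/config CONFIG_PHY_J721E_WIZ || echo "not built in"`.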
This is the dmesg output when the NVMe SSD is recognized:

[    7.221986] j721e-pcie-host f102000.pcie: host bridge /bus@f4000/pcie@f102000 ranges:
[    7.229987] j721e-pcie-host f102000.pcie:       IO 0x0068001000..0x0068010fff -> 0x0068001000
[    7.238577] j721e-pcie-host f102000.pcie:      MEM 0x0068011000..0x006fffffff -> 0x0068011000
[    7.247180] j721e-pcie-host f102000.pcie:   IB MEM 0x0000000000..0x0fffffffff -> 0x0000000000
[    7.354706] j721e-pcie-host f102000.pcie: Link up
[    7.359631] j721e-pcie-host f102000.pcie: PCI host bridge to bus 0000:00
[    7.367346] pci_bus 0000:00: root bus resource [bus 00-ff]
[    7.372856] pci_bus 0000:00: root bus resource [io  0x0000-0xffff] (bus address [0x68001000-0x68010fff])
[    7.382335] pci_bus 0000:00: root bus resource [mem 0x68011000-0x6fffffff]
[    7.389252] pci 0000:00:00.0: [104c:b010] type 01 class 0x060400
[    7.395272] pci 0000:00:00.0: reg 0x10: [mem 0x00000000-0xfffffffff 64bit pref]
[    7.402656] pci 0000:00:00.0: supports D1
[    7.406666] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
[    7.415412] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    7.423687] pci 0000:01:00.0: [1e95:100e] type 00 class 0x010802
[    7.429770] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
[    7.436640] pci 0000:01:00.0: reg 0x24: [mem 0x00000000-0x00001fff]
[    7.443246] pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[    8.012688] sdhci-am654 fa00000.mmc: Power on failed
[    8.048358] mmc1: SDHCI controller on fa00000.mmc [fa00000.mmc] using ADMA 64-bit
[    8.588760] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    8.595455] pci 0000:00:00.0: BAR 0: no space for [mem size 0x1000000000 64bit pref]
[    8.603196] pci 0000:00:00.0: BAR 0: failed to assign [mem size 0x1000000000 64bit pref]
[    8.611282] pci 0000:00:00.0: BAR 14: assigned [mem 0x68100000-0x681fffff]
[    8.618160] pci 0000:01:00.0: BAR 0: assigned [mem 0x68100000-0x68103fff 64bit]
[    8.625487] pci 0000:01:00.0: BAR 5: assigned [mem 0x68104000-0x68105fff]
[    8.632278] pci 0000:00:00.0: PCI bridge to [bus 01]
[    8.637243] pci 0000:00:00.0:   bridge window [mem 0x68100000-0x681fffff]
[    8.644343] pcieport 0000:00:00.0: enabling device (0000 -> 0002)
[    8.650808] pcieport 0000:00:00.0: PME: Signaling with IRQ 510
[    8.657135] pcieport 0000:00:00.0: AER: enabled with IRQ 510
[    8.663729] j721e-pcie-ep f102000.pcie-ep: can't request region for resource [mem 0x0f102000-0x0f102fff]
[    8.673286] j721e-pcie-ep: probe of f102000.pcie-ep failed with error -16
And this is the output when it's not recognized:
[    7.218049] j721e-pcie-host f102000.pcie: host bridge /bus@f4000/pcie@f102000 ranges:
[    7.226051] j721e-pcie-host f102000.pcie:       IO 0x0068001000..0x0068010fff -> 0x0068001000
[    7.234641] j721e-pcie-host f102000.pcie:      MEM 0x0068011000..0x006fffffff -> 0x0068011000
[    7.243207] j721e-pcie-host f102000.pcie:   IB MEM 0x0000000000..0x0fffffffff -> 0x0000000000
[    8.005540] sdhci-am654 fa00000.mmc: Power on failed
[    8.041146] mmc1: SDHCI controller on fa00000.mmc [fa00000.mmc] using ADMA 64-bit
[    8.255968] j721e-pcie-host f102000.pcie: PCI host bridge to bus 0000:00
[    8.262708] pci_bus 0000:00: root bus resource [bus 00-ff]
[    8.268198] pci_bus 0000:00: root bus resource [io  0x0000-0xffff] (bus address [0x68001000-0x68010fff])
[    8.277670] pci_bus 0000:00: root bus resource [mem 0x68011000-0x6fffffff]
[    8.284581] pci 0000:00:00.0: [104c:b010] type 01 class 0x060400
[    8.290601] pci 0000:00:00.0: reg 0x10: [mem 0x00000000-0xfffffffff 64bit pref]
[    8.297984] pci 0000:00:00.0: supports D1
[    8.301994] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
[    8.310775] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    8.319096] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    8.325794] pci 0000:00:00.0: BAR 0: no space for [mem size 0x1000000000 64bit pref]
[    8.333559] pci 0000:00:00.0: BAR 0: failed to assign [mem size 0x1000000000 64bit pref]
[    8.341649] pci 0000:00:00.0: PCI bridge to [bus 01]
[    8.347313] pcieport 0000:00:00.0: PME: Signaling with IRQ 510
[    8.353660] pcieport 0000:00:00.0: AER: enabled with IRQ 510
[    8.360173] j721e-pcie-ep f102000.pcie-ep: can't request region for resource [mem 0
Does this ring a bell?
Since the issue is not easy to reproduce, any hints to narrow the search space would be much appreciated.
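Because the failure is rare, one way to gather statistics is to record on every boot whether the "Link up" message appeared. This is a rough sketch under stated assumptions: the function name `link_came_up` is hypothetical, the log file is assumed to be a saved copy of dmesg, and the default device name f102000.pcie is taken from the logs above.

```shell
#!/bin/sh
# Hypothetical helper; matches the exact message format seen in the dmesg logs above.

# link_came_up LOGFILE [DEVICE]
# Succeeds if the j721e-pcie-host driver reported "Link up" for DEVICE in LOGFILE.
link_came_up() {
    grep -qF -- "j721e-pcie-host ${2:-f102000.pcie}: Link up" "$1"
}
```

A boot script could run `dmesg > /tmp/boot.log` and then append the result to a persistent file, for example `link_came_up /tmp/boot.log && echo up || echo down`, so that the failure rate can be compared across kernel parameter changes.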

Regards,
Nelson.-
  • Hi Nelson,

    I don't know yet how to debug the problem, but the boot log shouldn't contain any messages related to "j721e-pcie-ep", because the PCIe controller should be operating in RC mode, not EP mode. I am not sure how you got these boot messages, but you shouldn't need to change anything PCIe-related in the kernel device tree k3-am642-evm.dts, which already configures PCIe in RC mode.

  • Thanks, Bin Liu.

    We just found out that when the issue happens, the following command makes the SSD get detected:

        echo 1 > /sys/bus/pci/rescan

    We think this suggests the issue is related to power-saving latency at startup. When we check the power-saving settings of the SSD, we see it's at level 4, with the latency numbers below. Do you think this makes sense?

    ps      4 : mp:0.0050W non-operational enlat:54000 exlat:45000 rrt:4 rrl:4
                rwt:4 rwl:4 idle_power:- active_power:-
                active_power_workload:-
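To relate these numbers to the nvme_core.default_ps_max_latency_us parameter mentioned earlier, the entry and exit latencies can be added up: 54000 + 45000 = 99000 microseconds. As far as I know, the mainline NVMe driver compares this sum against that parameter when choosing autonomous power states, and the default limit is believed to be 100000 microseconds, so state 4 would sit just under it; treat both points as assumptions to verify against your kernel source. The helper name `ps_total_latency_us` below is hypothetical, and it only parses a line in the format shown above.

```shell
#!/bin/sh
# Hypothetical helper; parses one "ps N : ..." line in the format shown above.

# ps_total_latency_us LINE
# Prints enlat + exlat in microseconds, or fails if either field is missing.
ps_total_latency_us() {
    enlat=$(printf '%s\n' "$1" | sed -n 's/.*enlat:\([0-9][0-9]*\).*/\1/p')
    exlat=$(printf '%s\n' "$1" | sed -n 's/.*exlat:\([0-9][0-9]*\).*/\1/p')
    [ -n "$enlat" ] && [ -n "$exlat" ] || return 1
    echo $((enlat + exlat))
}
```

For the line above this prints 99000. Whether this power state plays any role in the link not coming up at boot is not established by this arithmetic alone.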

    We need to make further tests and that might take a few days.

  • Hi Nelson,

    I am not familiar with NVMe or its power-saving settings. But what is the intended PCIe bus clock topology in your design? With the default serdes setting in k3-am642-evm.dts, the AM64x serdes outputs a 100 MHz clock on its ext_refclk pins.

  • Bin Liu, we settled for

        echo 1 > /sys/bus/pci/rescan

    and this is working. We cannot debug further at the moment. If we have more data or information in the future, I'll open a related question.
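A rescan workaround like this can be made conditional, so that the bus is only rescanned on boots where no NVMe controller showed up. This is a minimal sketch rather than the setup actually used here: the function names are hypothetical, and it assumes NVMe controllers appear as nvme0, nvme1, ... under /sys/class/nvme.

```shell
#!/bin/sh
# Hypothetical helpers; paths are passed in so the logic can be tried without root.

# nvme_present CLASS_DIR
# Succeeds if at least one NVMe controller entry (nvme0, nvme1, ...) exists in CLASS_DIR.
nvme_present() {
    for entry in "$1"/nvme[0-9]*; do
        [ -e "$entry" ] && return 0
    done
    return 1
}

# rescan_if_missing CLASS_DIR RESCAN_FILE
# Writes 1 to RESCAN_FILE only when no NVMe controller is present.
rescan_if_missing() {
    if nvme_present "$1"; then
        return 0
    fi
    echo 1 > "$2"
}
```

On the target it would be called as root, for example from a late boot script, as `rescan_if_missing /sys/class/nvme /sys/bus/pci/rescan`.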

    Thanks for your time. 

  • Hi Nelson,

    Thanks for the update.