AM69: NVMe PCIe and kernel panic

Part Number: AM69
Other Parts Discussed in Thread: TDA4VM

Hello,
We are using SDK 10.00.08 with the ti-linux-6.6.y kernel. With this version, the kernel panics during boot whenever an NVMe PCIe device is connected.
With the older SDK 09.02.00.010 and the ti-linux-6.1.y kernel, the same setup worked correctly.

The same PCIe slot works correctly when we plug in a different board (a PCIe to USB 3.0 adapter).

I checked the kernel configuration and the device tree, and both seem to be correct.

The kernel panic is triggered in the nvme_pci_enable function, at this statement:

if (readl(dev->bar + NVME_REG_CSTS) == -1) {
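
For context, this statement reads the NVMe Controller Status register (CSTS, offset 0x1c) through BAR 0 and treats an all-ones value as "device not present". Since readl() returns an unsigned 32-bit value, the comparison with -1 is effectively a comparison with 0xffffffff. Below is a minimal user-space sketch of that check, assuming a plain byte buffer in place of the mapped BAR. The names fake_readl and controller_responds are hypothetical and are not kernel functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NVME_REG_CSTS 0x1c /* Controller Status register offset in BAR 0 */

/* Hypothetical stand-in for readl(): 32-bit load from a register block. */
static uint32_t fake_readl(const uint8_t *base, size_t offset)
{
    uint32_t value;

    memcpy(&value, base + offset, sizeof(value));
    return value;
}

/* Returns 1 if CSTS reads back as anything other than all ones. */
static int controller_responds(const uint8_t *bar)
{
    return fake_readl(bar, NVME_REG_CSTS) != UINT32_MAX;
}
```

On our board the read does not return all ones. The SError is reported with the program counter still inside nvme_pci_enable, so it looks like the access to the BAR itself is what fails.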

Here is an excerpt of the kernel panic:

[    5.998134] j721e-pcie 2900000.pcie: PCI host bridge to bus 0000:00
[    6.004436] pci_bus 0000:00: root bus resource [bus 00-ff]
[    6.009942] pci_bus 0000:00: root bus resource [io  0x0000-0xffff] (bus address [0x10001000-0x10010fff])
[    6.019437] pci_bus 0000:00: root bus resource [mem 0x10011000-0x17ffffff]
[    6.026359] pci 0000:00:00.0: [104c:b012] type 01 class 0x060400
[    6.032370] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x4 may corrupt adjacent RW1C bits
[    6.042151] pci 0000:00:00.0: supports D1
[    6.046156] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
[    6.051930] pci 0000:00:00.0: reg 0x224: [mem 0x00000000-0x003fffff 64bit]
[    6.058802] pci 0000:00:00.0: VF(n) BAR0 space: [mem 0x00000000-0x00ffffff 64bit] (contains BAR0 for 4 VFs)
[    6.070865] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    6.079034] pci 0000:01:00.0: [144d:a808] type 00 class 0x010802
[    6.085091] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
[    6.092385] pci 0000:01:00.0: 15.752 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x2 link at 0000:00:00.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[    6.123674] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    6.130306] pci 0000:00:00.0: BAR 7: assigned [mem 0x10400000-0x113fffff 64bit]
[    6.137635] pci 0000:00:00.0: BAR 14: assigned [mem 0x10100000-0x101fffff]
[    6.144514] pci 0000:01:00.0: BAR 0: assigned [mem 0x10100000-0x10103fff 64bit]
[    6.151851] pci 0000:00:00.0: PCI bridge to [bus 01]

[    6.162334] pci 0000:00:00.0:   bridge window [mem 0x10100000-0x101fffff]
[    6.169411] pcieport 0000:00:00.0: of_irq_parse_pci: failed with rc=-22
[    6.176042] pcieport 0000:00:00.0: enabling device (0000 -> 0002)
[    6.182469] pcieport 0000:00:00.0: PME: Signaling with IRQ 617
[    6.188592] pcieport 0000:00:00.0: AER: enabled with IRQ 617
[    6.194688] pcieport 0000:00:00.0: of_irq_parse_pci: failed with rc=-22
[    6.201879] nvme nvme0: pci function 0000:01:00.0
[    6.206701] nvme 0000:01:00.0: enabling device (0000 -> 0002)
[  OK  ] Created slice Slice /system/systemd
[    6.215812] SError Interrupt on CPU7, code 0x00000000bf000000 -- SError
[    6.215818] CPU: 7 PID: 64 Comm: kworker/u16:3 Not tainted 6.6.32-01373-gda8dd76693a4-dirty #35
[    6.215823] Hardware name: Toradex Aquila AM69 on Aquila Development Board (DT)
[    6.215826] Workqueue: events_unbound deferred_probe_work_func
[    6.215841] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    6.215845] pc : nvme_pci_enable+0x5c/0x524
[    6.215856] lr : nvme_pci_enable+0x50/0x524
[    6.215859] sp : ffffffc0824737e0
[    6.215861] x29: ffffffc0824737e0 x28: 0000000000000000 x27: ffffffc081106000
[    6.215866] x26: ffffff8800283100 x25: ffffff8800c5b800 x24: 000000000000ffff
[    6.215870] x23: ffffff88020f71f0 x22: ffffff880102a000 x21: ffffff880102a000
[    6.215875] x20: ffffff880102a0c0 x19: ffffff88020f7000 x18: ffffffffffffffff
[    6.215879] x17: 0000000000000000 x16: 0000000000000000 x15: 0720072007200720
[    6.215883] x14: 0720072007200720 x13: ffffffc08111ad70 x12: 0000000000000621
[    6.215887] x11: 000000000000020b x10: ffffffc081172d70 x9 : ffffffc08111ad70
[    6.215891] x8 : 00000000ffffefff x7 : ffffffc081172d70 x6 : 0000000000000000
[    6.215895] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[    6.215899] x2 : 0000000000000000 x1 : ffffff8800ff8000 x0 : 0000000000000000
[    6.215905] Kernel panic - not syncing: Asynchronous SError Interrupt
[    6.215908] CPU: 7 PID: 64 Comm: kworker/u16:3 Not tainted 6.6.32-01373-gda8dd76693a4-dirty #35
[    6.215911] Hardware name: Toradex Aquila AM69 on Aquila Development Board (DT)
[    6.215912] Workqueue: events_unbound deferred_probe_work_func
[    6.215916] Call trace:
[    6.215919]  dump_backtrace+0x94/0x114
[    6.215928]  show_stack+0x18/0x24
[    6.215932]  dump_stack_lvl+0x48/0x60
[    6.215937]  dump_stack+0x18/0x24
[    6.215939]  panic+0x314/0x364
[    6.215945]  nmi_panic+0x8c/0x90
[    6.215949]  arm64_serror_panic+0x6c/0x78
[    6.215951]  do_serror+0x3c/0x78
[    6.215953]  el1h_64_error_handler+0x30/0x48
[    6.215957]  el1h_64_error+0x64/0x68
[    6.215960]  nvme_pci_enable+0x5c/0x524
[    6.215963]  nvme_probe+0x280/0x6f8
[    6.215966]  pci_device_probe+0xa8/0x16c
[    6.215971]  really_probe+0x184/0x3c8
[    6.215975]  __driver_probe_device+0x7c/0x16c
[    6.215978]  driver_probe_device+0x3c/0x10c
[    6.215981]  __device_attach_driver+0xbc/0x158
[    6.215984]  bus_for_each_drv+0x80/0xdc
[    6.215990]  __device_attach+0xa8/0x1d4
[    6.215993]  device_attach+0x14/0x20
[    6.215996]  pci_bus_add_device+0x64/0xd4
[    6.216002]  pci_bus_add_devices+0x3c/0x88
[    6.216006]  pci_bus_add_devices+0x68/0x88
[    6.216010]  pci_host_probe+0x44/0xbc
[    6.216015]  cdns_pcie_host_setup+0x10c/0x1c8
[    6.216020]  j721e_pcie_probe+0x3cc/0x444
[    6.216024]  platform_probe+0x68/0xdc
[    6.216027]  really_probe+0x184/0x3c8
[    6.216030]  __driver_probe_device+0x7c/0x16c
[    6.216032]  driver_probe_device+0x3c/0x10c
[    6.216035]  __device_attach_driver+0xbc/0x158
[    6.216037]  bus_for_each_drv+0x80/0xdc
[    6.216042]  __device_attach+0xa8/0x1d4
[    6.216044]  device_initial_probe+0x14/0x20
[    6.216047]  bus_probe_device+0xac/0xb0
[    6.216052]  deferred_probe_work_func+0x9c/0xec
[    6.216054]  process_one_work+0x138/0x260
[    6.216061]  worker_thread+0x32c/0x438
[    6.216065]  kthread+0x118/0x11c
[    6.216070]  ret_from_fork+0x10/0x20
[    6.216074] SMP: stopping secondary CPUs
[    6.216084] Kernel Offset: disabled
[    6.216086] CPU features: 0x0,80000000,28020000,1000420b
[    6.216089] Memory Limit: none
[    6.539032] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
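
The code printed with the SError (0x00000000bf000000) is the exception syndrome value. A small sketch of how I decoded it, assuming the standard AArch64 ESR layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, with ISS bit 24 being the IDS flag for SError). The struct and function names are my own, not kernel APIs.

```c
#include <stdint.h>

/* Fields of an AArch64 exception syndrome value (hypothetical helper type). */
struct esr_fields {
    unsigned ec;   /* exception class, bits 31:26 */
    unsigned il;   /* instruction length bit, bit 25 */
    unsigned ids;  /* implementation defined syndrome flag, ISS bit 24 */
    uint32_t iss;  /* instruction specific syndrome, bits 24:0 */
};

static struct esr_fields decode_esr(uint64_t esr)
{
    struct esr_fields f;

    f.ec  = (unsigned)((esr >> 26) & 0x3f);
    f.il  = (unsigned)((esr >> 25) & 0x1);
    f.iss = (uint32_t)(esr & 0x1ffffff);
    f.ids = (unsigned)((f.iss >> 24) & 0x1);
    return f;
}
```

For 0xbf000000 this gives EC = 0x2f (SError interrupt), IL = 1 and IDS = 1, with all remaining ISS bits zero. So the syndrome is implementation defined and does not say more about the cause than "SError".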

I tried this patch, but the behavior did not change: lore.kernel.org/.../

What do you think about this? Are you experiencing the same behavior?

Emanuele