Linux/AM5728: PCIe access error

Part Number: AM5728

Tool/software: Linux

Hi,

A customer is seeing the following error with an FPGA connected via PCIe:

omap_l3_noc 44000000.ocp: L3 application error: target 5 mod:1 (unclearable)

Immediately after the FPGA is programmed, we load the driver, which reads the first 32-bit word of BAR 0. We call pci_iomap on the BAR (so the mapping is non-cached) and then read the word with ioread32. The relevant parts of /proc/iomem are below. We are reading the first word of the BAR at 24400000, which is mapped to kernel address d5f18000.

 

20013000-2fffffff : MEM
  20020000-2002ffff : 0000:00:00.0
  20100000-201fffff : 0000:00:00.0
  24000000-2fffffff : PCI Bus 0000:01
    24000000-243fffff : 0000:01:00.0
      24000000-243fffff : plda
    24400000-24403fff : 0000:01:00.0
      24400000-24403fff : plda
    28000000-2fffffff : 0000:01:00.0
      28000000-2fffffff : plda

80000000-feffffff : System RAM
  80008000-80dfffff : Kernel code
  81000000-81098bab : Kernel data
200000000-27fcfffff : System RAM
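
To make the nesting in the listing above easier to follow, here is a minimal sketch that parses an excerpt of it and reports which resources contain the address being read (0x24400000). The helper names `parse_iomem` and `owners` are hypothetical, and the sketch assumes two spaces of indentation per nesting level, as in the listing.

```python
import re

# Excerpt of the /proc/iomem listing quoted above (indentation encodes nesting).
IOMEM = """\
20013000-2fffffff : MEM
  20020000-2002ffff : 0000:00:00.0
  20100000-201fffff : 0000:00:00.0
  24000000-2fffffff : PCI Bus 0000:01
    24000000-243fffff : 0000:01:00.0
      24000000-243fffff : plda
    24400000-24403fff : 0000:01:00.0
      24400000-24403fff : plda
    28000000-2fffffff : 0000:01:00.0
      28000000-2fffffff : plda
80000000-feffffff : System RAM
"""

LINE = re.compile(r"^(\s*)([0-9a-f]+)-([0-9a-f]+) : (.*)$")


def parse_iomem(text):
    """Return a list of (depth, start, end, name) tuples (hypothetical helper)."""
    entries = []
    for raw in text.splitlines():
        m = LINE.match(raw)
        if m:
            indent, start, end, name = m.groups()
            # Assumption: two spaces of indentation per nesting level.
            entries.append((len(indent) // 2, int(start, 16), int(end, 16), name.strip()))
    return entries


def owners(entries, addr):
    """Names of every resource whose range contains addr, outermost first."""
    hits = [e for e in entries if e[1] <= addr <= e[2]]
    return [name for _, _, _, name in sorted(hits, key=lambda e: e[0])]


chain = owners(parse_iomem(IOMEM), 0x24400000)
print(" -> ".join(chain))
```

For this excerpt the chain runs from the host bridge MEM range, through the "PCI Bus 0000:01" window, down to the 16 KB region claimed by the plda driver.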

 

We have a logic analyzer connected to a board and do not see anything wrong with the PCI reads/writes.

I think this is different from the last time we saw omap_l3_noc errors. Those came from user-space accesses to the BARs through cached mappings, which were probably causing bursts on the PCIe bus. That error was solved by making sure the BARs are mapped uncached.

This time it is kernel space access.

We have been looking at the end of this e2e thread, where a similar error occurs. There are some hints, but finding the root cause of these errors is difficult.

Could you let us know if you have seen these omap_l3_noc errors in combination with PCIe accesses? How should we best track down this error?

Regards,

--Gunter

  • Hi,

    Here are kernel messages from startup.

    [    0.719721] OF: PCI: host bridge /ocp/axi@0/pcie@51000000 ranges:
    [    0.719732] OF: PCI:   No bus range found for /ocp/axi@0/pcie@51000000, using [bus 00-ff]
    [    0.719765] OF: PCI:    IO 0x20003000..0x20012fff -> 0x00000000
    [    0.719786] OF: PCI:   MEM 0x20013000..0x2fffffff -> 0x20013000
    [    0.820965] dra7-pcie 51000000.pcie: link up
    [    0.821123] dra7-pcie 51000000.pcie: PCI host bridge to bus 0000:00
    [    0.821136] pci_bus 0000:00: root bus resource [bus 00-ff]
    [    0.821147] pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
    [    0.821156] pci_bus 0000:00: root bus resource [mem 0x20013000-0x2fffffff]
    [    0.821189] pci 0000:00:00.0: [104c:8888] type 01 class 0x060400
    [    0.821215] pci 0000:00:00.0: reg 0x10: [mem 0x00000000-0x000fffff]
    [    0.821231] pci 0000:00:00.0: reg 0x14: [mem 0x00000000-0x0000ffff]
    [    0.821301] pci 0000:00:00.0: supports D1
    [    0.821310] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
    [    0.821540] PCI: bus0: Fast back to back transfers disabled
    [    0.821723] pci 0000:01:00.0: [1556:1100] type 00 class 0xff0000
    [    0.821822] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit pref]
    [    0.821887] pci 0000:01:00.0: reg 0x18: [mem 0x00000000-0x003fffff 64bit pref]
    [    0.821951] pci 0000:01:00.0: reg 0x20: [mem 0x00000000-0x07ffffff 64bit pref]
    [    0.847102] PCI: bus1: Fast back to back transfers disabled
    [    0.847270] pci 0000:00:00.0: BAR 9: assigned [mem 0x24000000-0x2fffffff pref]
    [    0.847283] pci 0000:00:00.0: BAR 0: assigned [mem 0x20100000-0x201fffff]
    [    0.847297] pci 0000:00:00.0: BAR 1: assigned [mem 0x20020000-0x2002ffff]
    [    0.847312] pci 0000:01:00.0: BAR 4: assigned [mem 0x28000000-0x2fffffff 64bit pref]
    [    0.847365] pci 0000:01:00.0: BAR 2: assigned [mem 0x24000000-0x243fffff 64bit pref]
    [    0.847417] pci 0000:01:00.0: BAR 0: assigned [mem 0x24400000-0x24403fff 64bit pref]
    [    0.847469] pci 0000:00:00.0: PCI bridge to [bus 01]
    [    0.847482] pci 0000:00:00.0:   bridge window [mem 0x24000000-0x2fffffff pref]
    [    0.847701] pcieport 0000:00:00.0: Signaling PME through PCIe PME interrupt
    [    0.847711] pci 0000:01:00.0: Signaling PME through PCIe PME interrupt
    [    0.847722] pcie_pme 0000:00:00.0:pcie001: service driver pcie_pme loaded
    [    0.847850] aer 0000:00:00.0:pcie002: service driver aer loaded

    We are wondering about the BAR assignment lines (marked in red in the original post). It looks like BAR 9 of the processor's controller overlaps the FPGA's BARs. Does that make sense? What is BAR 9 for?

    Regards,

    --Gunter
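
One way to look at the overlap question is to check whether the endpoint BARs sit entirely inside the range reported for BAR 9 of 0000:00:00.0. The sketch below does that for the assignment lines quoted in the log above. It assumes BAR 9 can be read as the bridge's prefetchable window, which is what the identical range on the "bridge window [mem 0x24000000-0x2fffffff pref]" line suggests. The helper name `parse_assignments` is hypothetical.

```python
import re

# "assigned" lines copied from the kernel log above (timestamps dropped).
DMESG = """\
pci 0000:00:00.0: BAR 9: assigned [mem 0x24000000-0x2fffffff pref]
pci 0000:01:00.0: BAR 4: assigned [mem 0x28000000-0x2fffffff 64bit pref]
pci 0000:01:00.0: BAR 2: assigned [mem 0x24000000-0x243fffff 64bit pref]
pci 0000:01:00.0: BAR 0: assigned [mem 0x24400000-0x24403fff 64bit pref]
"""

ASSIGNED = re.compile(
    r"pci (\S+): BAR (\d+): assigned \[mem 0x([0-9a-f]+)-0x([0-9a-f]+)"
)


def parse_assignments(text):
    """Return (device, bar, start, end) for each 'assigned' line (hypothetical helper)."""
    return [
        (dev, int(bar), int(start, 16), int(end, 16))
        for dev, bar, start, end in ASSIGNED.findall(text)
    ]


BRIDGE, ENDPOINT = "0000:00:00.0", "0000:01:00.0"
rows = parse_assignments(DMESG)

# Assumption: BAR 9 of the bridge is its prefetchable memory window.
window = next((s, e) for d, b, s, e in rows if d == BRIDGE and b == 9)
ep_bars = sorted((s, e, b) for d, b, s, e in rows if d == ENDPOINT)

# Every endpoint BAR should lie inside the window ...
inside = all(window[0] <= s and e <= window[1] for s, e, _ in ep_bars)
# ... and the endpoint BARs should not overlap each other.
disjoint = all(a[1] < b[0] for a, b in zip(ep_bars, ep_bars[1:]))
print(inside, disjoint)
```

Under that reading, the FPGA BARs being contained in the BAR 9 range would be the expected layout for a device behind the bridge rather than a conflict, and the three endpoint BARs themselves do not overlap.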

  • Hi, Gunter,

    Could you run "lspci -vv" and post the results? I am curious how it shows up on the customer's board. For our GP EVM, the boot log is shown below. There, BAR 8 is configured; it is the memory behind the bridge and is used by the EP. BAR 9 may be OK, but I am not sure whether it causes a problem or whether it should be BAR 8. We don't have an FPGA setup to compare against.

    [ 0.734180] OF: PCI: host bridge /ocp/axi@0/pcie@51000000 ranges:
    [ 0.734192] OF: PCI: No bus range found for /ocp/axi@0/pcie@51000000, using [bus 00-ff]
    [ 0.734224] OF: PCI: IO 0x20003000..0x20012fff -> 0x00000000
    [ 0.734246] OF: PCI: MEM 0x20013000..0x2fffffff -> 0x20013000
    [ 0.835449] dra7-pcie 51000000.pcie: link up
    [ 0.835618] dra7-pcie 51000000.pcie: PCI host bridge to bus 0000:00
    [ 0.835631] pci_bus 0000:00: root bus resource [bus 00-ff]
    [ 0.835642] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
    [ 0.835652] pci_bus 0000:00: root bus resource [mem 0x20013000-0x2fffffff]
    [ 0.836027] PCI: bus0: Fast back to back transfers disabled
    [ 0.860249] PCI: bus1: Fast back to back transfers disabled
    [ 0.860415] pci 0000:00:00.0: BAR 0: assigned [mem 0x20100000-0x201fffff]
    [ 0.860431] pci 0000:00:00.0: BAR 8: assigned [mem 0x20200000-0x202fffff]
    [ 0.860443] pci 0000:00:00.0: BAR 1: assigned [mem 0x20020000-0x2002ffff]
    [ 0.860459] pci 0000:01:00.0: BAR 0: assigned [mem 0x20200000-0x20201fff 64bit]
    [ 0.860514] pci 0000:00:00.0: PCI bridge to [bus 01]
    [ 0.860527] pci 0000:00:00.0: bridge window [mem 0x20200000-0x202fffff]
    [ 0.860755] pcieport 0000:00:00.0: Signaling PME through PCIe PME interrupt
    [ 0.860765] pci 0000:01:00.0: Signaling PME through PCIe PME interrupt
    [ 0.861887] backlight supply power not found, using dummy regulator

    The "lspci -vv" output is shown below. The memory addresses for Region 0 and Region 1 of the PCI bridge match the BAR 0 and BAR 1 assignments, and the memory behind the bridge matches BAR 8. On the EP side, Region 0 is at 0x20200000 (where BAR 8 starts).

    root@am57xx-evm:~# lspci -vv
    00:00.0 PCI bridge: Texas Instruments Multicore DSP+ARM KeyStone II SOC (rev 01) (prog-if 00 [Normal decode])
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 438
    Region 0: Memory at 20100000 (32-bit, non-prefetchable) [size=1M]
    Region 1: Memory at 20020000 (32-bit, non-prefetchable) [size=64K]
    Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
    Memory behind bridge: 20200000-202fffff

    01:00.0 Network controller: Intel Corporation Centrino Wireless-N 1000 [Condor Peak]
    Subsystem: Intel Corporation Centrino Wireless-N 1000 BGN
    Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 470
    Region 0: Memory at 20200000 (64-bit, non-prefetchable) [size=8K]

    Rex
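
The comparison described above (endpoint region versus memory behind the bridge) can also be done mechanically on "lspci -vv" output. The following is a rough sketch that works on the lines quoted from the GP EVM; the helper name `parse_lspci` is hypothetical, and only the "Region N: Memory at" and "Memory behind bridge" lines are interpreted.

```python
import re

# Relevant lines from the "lspci -vv" output quoted above.
LSPCI = """\
00:00.0 PCI bridge: Texas Instruments Multicore DSP+ARM KeyStone II SOC (rev 01) (prog-if 00 [Normal decode])
Region 0: Memory at 20100000 (32-bit, non-prefetchable) [size=1M]
Region 1: Memory at 20020000 (32-bit, non-prefetchable) [size=64K]
Memory behind bridge: 20200000-202fffff
01:00.0 Network controller: Intel Corporation Centrino Wireless-N 1000 [Condor Peak]
Region 0: Memory at 20200000 (64-bit, non-prefetchable) [size=8K]
"""

UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
DEVICE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ")
REGION = re.compile(r"Region (\d+): Memory at ([0-9a-f]+) .*\[size=(\d+)([KMG]?)\]")
WINDOW = re.compile(r"Memory behind bridge: ([0-9a-f]+)-([0-9a-f]+)")


def parse_lspci(text):
    """Collect memory regions and bridge windows per device (hypothetical helper)."""
    regions, windows = {}, {}
    dev = None
    for raw in text.splitlines():
        line = raw.strip()
        m = DEVICE.match(line)
        if m:
            dev = m.group(1)  # a new device section starts here
            continue
        m = REGION.search(line)
        if m and dev:
            start = int(m.group(2), 16)
            size = int(m.group(3)) * UNITS[m.group(4)]
            regions.setdefault(dev, []).append((int(m.group(1)), start, start + size - 1))
            continue
        m = WINDOW.search(line)
        if m and dev:
            windows[dev] = (int(m.group(1), 16), int(m.group(2), 16))
    return regions, windows


regions, windows = parse_lspci(LSPCI)
lo, hi = windows["00:00.0"]
for bar, start, end in regions["01:00.0"]:
    print(f"EP region {bar}: {start:#x}-{end:#x} inside bridge window: {lo <= start and end <= hi}")
```

Running the same check on the customer's "lspci -vv" output would show whether the FPGA regions fall inside the window reported for the bridge there.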
  • Hi, Gunter,

    I haven't heard back from you. Has the customer's issue been resolved? Was it a configuration issue?
    Thanks!

    Rex
  • Hi Rex,

    Sorry this took so long. The customer was tracking down a power supply issue on some of their boards, since the omap_l3_noc error did not occur on all boards. Fixing the power supply issue also made the omap_l3_noc error go away.

    So for now we can close this issue.

    Thanks for the note on lspci -vv and the BAR 8 explanation. We will keep this info in case these questions come up again.


    Thanks!
    --Gunter