TI8168 PCIe device memory BAR access results in unhandled fault

Hi,

We have a DM8168 Evaluation Module (EVM) and are using its PCIe interface to communicate with a configured Altera Arria II GX FPGA development board.

During Ubuntu boot on the EVM, with the FPGA card attached, the following PCIe information is printed:

ti816x_pcie: Invoking PCI BIOS...
ti816x_pcie: Setting up Host Controller...
ti816x_pcie: Register base mapped @0xd0820000
ti816x_pcie: Starting PCI scan...
PCI: bus0: Fast back to back transfers disabled
PCI: bus1: Fast back to back transfers disabled
pci 0000:00:00.0: BAR 9: assigned [mem 0x20000000-0x21ffffff pref]
pci 0000:00:00.0: BAR 8: assigned [mem 0x22000000-0x220fffff]
pci 0000:01:00.0: BAR 0: assigned [mem 0x20000000-0x200fffff]
pci 0000:01:00.0: BAR 0: set to [mem 0x20000000-0x200fffff] (PCI address [0x20000000-0x200fffff])
pci 0000:00:00.0: PCI bridge to [bus 01-01]
pci 0000:00:00.0:   bridge window [io  disabled]
pci 0000:00:00.0:   bridge window [mem 0x22000000-0x220fffff]
pci 0000:00:00.0:   bridge window [mem 0x20000000-0x21ffffff pref]
PCI: enabling device 0000:00:00.0 (0140 -> 0143)

Note: only one memory BAR is configured in the FPGA. BAR0 is configured as 32-bit non-prefetchable memory (1 MB).
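The 1 MB size is conveyed by which BAR address bits the endpoint hardwires to zero: during enumeration the host writes 0xFFFFFFFF to the BAR and reads it back. A sketch of how that read-back is interpreted follows; the function names are hypothetical, and 0xFFF00000 is the value a 1 MB 32-bit non-prefetchable memory BAR would be expected to return, not a value captured from this board.

```c
#include <stdbool.h>
#include <stdint.h>

/* Read-only attribute bits at the bottom of a BAR (PCI spec):
 * bit 0 = 1 for I/O space, bits 2:1 = memory type (00 = 32-bit),
 * bit 3 = prefetchable. Bits 31:4 are the address field. */
#define BAR_IO_SPACE      0x1u
#define BAR_MEM_TYPE_MASK 0x6u
#define BAR_PREFETCH      0x8u
#define BAR_MEM_ADDR_MASK 0xFFFFFFF0u

/* Size in bytes of a 32-bit memory BAR, given the value read back after
 * the host wrote 0xFFFFFFFF to it. Address bits below the window size are
 * hardwired to zero by the endpoint, so the lowest bit that reads back as 1
 * is the size. Returns 0 for an unimplemented BAR (read-back of 0). */
uint32_t bar32_mem_size(uint32_t readback)
{
    uint32_t mask = readback & BAR_MEM_ADDR_MASK;
    return ~mask + 1u;
}

/* True for a 32-bit, non-prefetchable memory BAR, like BAR0 here. */
bool bar_is_32bit_nonprefetch_mem(uint32_t bar)
{
    return (bar & (BAR_IO_SPACE | BAR_MEM_TYPE_MASK | BAR_PREFETCH)) == 0;
}
```

For example, bar32_mem_size(0xFFF00000u) gives 0x100000 (1048576), the same length the driver reports for BAR[0] further down. Only the upper address bits are then written by the host, which is why the base address comes from the RC.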

Our driver then registers, reads the device configuration, and scans and maps the BARs. It prints the following debug messages during this stage:

probe() ape = 0xcb754600
PCI: enabling device 0000:01:00.0 (0140 -> 0142)
Enabled MSI interrupting.
Using a 64-bit DMA mask.
IRQ pin #1 (0=none, 1=INTA#...4=INTD#).
IRQ line #48.
Succesfully requested IRQ #112 with dev_id 0xcb754600
BAR0 0x20000000-0x200fffff flags 0x00040200
BAR[0] mapped at 0xd5780000 with length 1048576 flags = 0x00040200.
fpga_tests()

When we access ANY part of BAR0, for example:

printk(KERN_INFO "BAR 0 %d = %0llx\n", i, ptrMem[i]); /* increment separately: using i and i++ in one call is unsequenced */
i++;

We get the following memory fault:

Unhandled fault: Precise External Abort on non-linefetch (0x1008) at 0xd5780000
Internal error: : 1008 [#1]
last sysfs file: /sys/module/pvrsrvkm/initstate
Modules linked in: altpciechdma(+) bufferclass_ti omaplfb pvrsrvkm TI81xx_hdmi ti81xxfb vpss syslink ipv6
CPU: 0    Not tainted  (2.6.37 #1)

I tried disabling MSI interrupt support in the kernel configuration and in the PCIe device driver, but that didn't help.

Following the solution provided, I have verified that the address the processor writes to the FPGA's configuration-space BAR0 register is the same as the address assigned for PCIe outbound memory access (i.e. 0x20000000). Here is the lspci output after loading the PCIe device driver and the failed access to the FPGA memory window. I am getting the same completion timeout error (CmpltTO+) in the RC error status register.

root@vp4000:~/pcie_fpga# ./lspci -xvvv
00:00.0 Class 0604: Device 104c:8888 (rev 01)
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Region 0: Memory at <ignored> (32-bit, non-prefetchable)
        Region 1: Memory at <ignored> (32-bit, prefetchable)
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 0000f000-00000fff     
        Memory behind bridge: 20000000-200fffff
        Prefetchable memory behind bridge: fff00000-000fffff
        Secondary status : 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity+ SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Root Port (Slot-), MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x2, ASPM L0s, Latency L0 <2us, L1 <64us
                        ClockPM- Surprise- LLActRep+ BwNot-
                LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ ARIFwd-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- ARIFwd-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 0e, GenCap+ CGenEn- ChkCap+ ChkEn-
00: 4c 10 88 88 47 01 10 00 01 00 04 06 10 00 01 00
10: 00 00 00 51 08 00 00 80 00 01 01 00 f0 00 00 00
20: 00 20 00 20 f0 ff 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 30 01 01 00

01:00.0 Class ff00: Device 1172:0004 (rev 01)
        Subsystem: Device 1172:0004
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 48
        Region 0: Memory at 20000000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: [50] MSI: Enable- Count=1/4 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [78] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [80] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #1, Speed 2.5GT/s, Width x1, ASPM L0s, Latency L0 unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Kernel driver in use: pcieFpga
00: 72 11 04 00 46 01 10 00 01 00 00 ff 10 00 00 00
10: 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 72 11 04 00
30: 00 00 00 00 50 00 00 00 00 00 00 00 30 01 00 00
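The hex part of the dump can be cross-checked against the decoded lines by hand. As a sketch (helper names are hypothetical; offsets and bit layout follow the standard PCI type-0/type-1 configuration header, and lspci -x prints the bytes in little-endian order), the root port's bytes at 0x20-0x27 give its memory windows, and the endpoint's command register (0x04) and BAR0 (0x10) give Mem+/BusMaster+ and 0x20000000:

```c
#include <stdbool.h>
#include <stdint.h>

/* Little-endian reads from a raw config-space dump (lspci -x byte order). */
uint16_t cfg_le16(const uint8_t *cfg, unsigned off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

uint32_t cfg_le32(const uint8_t *cfg, unsigned off)
{
    return (uint32_t)cfg_le16(cfg, off) | ((uint32_t)cfg_le16(cfg, off + 2) << 16);
}

/* Command register (offset 0x04) bits, named as lspci prints them. */
#define CMD_IO     0x0001u /* I/O+ */
#define CMD_MEMORY 0x0002u /* Mem+ */
#define CMD_MASTER 0x0004u /* BusMaster+ */

/* Type-1 header memory window: base register at base_off, limit at
 * base_off + 2 (0x20/0x22 for memory, 0x24/0x26 for prefetchable memory).
 * Bits 15:4 of each register hold address bits 31:20, so windows are 1 MB
 * granular. The upper-32-bit prefetchable registers (0x28/0x2C) are ignored,
 * which is fine for a 32-bit window. A window is open only if start <= end. */
bool bridge_mem_window(const uint8_t *cfg, unsigned base_off,
                       uint32_t *start, uint32_t *end)
{
    uint16_t base  = cfg_le16(cfg, base_off);
    uint16_t limit = cfg_le16(cfg, base_off + 2);
    *start = (uint32_t)(base & 0xFFF0u) << 16;
    *end   = ((uint32_t)(limit & 0xFFF0u) << 16) | 0xFFFFFu;
    return *start <= *end;
}
```

With the root port bytes 00 20 00 20 f0 ff 00 00, the memory window decodes to 0x20000000-0x200fffff and the prefetchable pair decodes to a closed window (base 0xfff00000 above limit 0x000fffff), matching what lspci prints. The endpoint's command bytes 46 01 give 0x0146 (I/O- Mem+ BusMaster+) and its BAR0 bytes 00 00 00 20 give 0x20000000.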

Please help; any suggestions for resolving this issue are welcome.

I appreciate your reply.

Note: I have seen a similar post in this forum, but I couldn't work out the solution from that reply chain.

Thanks,

Ansa Ahammed

  • Hello,

    Please clarify: is the lspci dump before or after the crash?

    Is it also possible to check on the FPGA side whether the memory read request actually reached it, and what the response was?

       Hemant 

  • Hi Hemant,

    Thanks for the quick response.

    The lspci dump was taken after loading the PCIe device driver and after the failed access to the FPGA memory window (i.e. after the crash).
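    Since the dump was taken after the crash, the CmpltTO+ on the root port's UESta line most likely records the failed BAR0 read. UESta is the AER Uncorrectable Error Status register at offset 0x04 of the AER capability (config offset 0x104 here, since lspci shows the capability at [100]), and CmpltTO is bit 14. A small decoder, as a sketch: the bit positions follow the PCIe AER capability layout (the same values as Linux's PCI_ERR_UNC_* constants), the labels follow lspci, and aer_unc_decode is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bits of the AER Uncorrectable Error Status register,
 * labelled with the abbreviations lspci prints on the UESta line. */
static const struct { uint32_t bit; const char *name; } aer_unc_bits[] = {
    { 1u << 4,  "DLP" },       /* Data Link Protocol error */
    { 1u << 5,  "SDES" },      /* Surprise Down */
    { 1u << 12, "TLP" },       /* Poisoned TLP */
    { 1u << 13, "FCP" },       /* Flow Control Protocol error */
    { 1u << 14, "CmpltTO" },   /* Completion Timeout */
    { 1u << 15, "CmpltAbrt" }, /* Completer Abort */
    { 1u << 16, "UnxCmplt" },  /* Unexpected Completion */
    { 1u << 17, "RxOF" },      /* Receiver Overflow */
    { 1u << 18, "MalfTLP" },   /* Malformed TLP */
    { 1u << 19, "ECRC" },      /* ECRC error */
    { 1u << 20, "UnsupReq" },  /* Unsupported Request */
    { 1u << 21, "ACSViol" },   /* ACS Violation */
};

/* Write the names of all recognised set bits in `status`, space-separated,
 * into `out` (truncating if it is too small). Returns how many were set. */
size_t aer_unc_decode(uint32_t status, char *out, size_t out_len)
{
    size_t count = 0;
    if (out_len == 0)
        return 0;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof aer_unc_bits / sizeof aer_unc_bits[0]; i++) {
        if (status & aer_unc_bits[i].bit) {
            if (count > 0)
                strncat(out, " ", out_len - strlen(out) - 1);
            strncat(out, aer_unc_bits[i].name, out_len - strlen(out) - 1);
            count++;
        }
    }
    return count;
}
```

    Feeding it the 32-bit value read from offset 0x104 of the root port (for example with setpci) should print CmpltTO for the state shown in the dump. lspci only shows the first 64 bytes in the hex part above, so the raw register value is not in this thread.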

    It seems it is not possible to check on the FPGA side whether the memory read request reached it, especially inside the PCIe Hard IP. Instead, the FPGA team tried to check whether the read request arrives at the PCIe transaction layer, using SignalTap on the PCIe packet SOP (start of packet) signal.

    But the packet SOP was not coming into the transaction layer. So we couldn't do any more on the FPGA side debugging.

    One more clarification on the BAR register:

    Should the BAR register values be filled in by the RC or by the EP itself? In our case, only the BAR size is assigned by the FPGA; the BAR base addresses are filled in by the RC.

    Here it is 0x20000000 (i.e. the outbound memory window base address).

    Also, the BAR0 resource flags are set to 0x00040200. Is this value correct?

    Please let me know if you need any more clarification. Looking forward to your support.

    --

    Thanks and Regards,

    Ansa Ahammed.



  • Ansa Ahammed said:

    But the packet SOP was not coming into the transaction layer. So we couldn't do any more on the FPGA side debugging.

    So the memory read request never reached the FPGA transaction layer? Have you checked the possibility that it was dropped at the data link layer itself (perhaps due to a link issue)? Do you have a PCIe analyzer to check what happens on the link when you try the memory read?

    Another option is to separate the BAR0 access from the FPGA driver and use lspci to dump the config space *before* and *after* the memory access. For example, try the following sequence:

    1) Load the FPGA driver. It should only do initialization and enable memory access for the FPGA.

    2) lspci

    3) Now use devmem2 to read 0x20000000 (or whatever address the FPGA's BAR0 is assigned).

    4) lspci
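    Step 3 uses devmem2. Where it is not installed, a word-sized read of the BAR0 window can be approximated with /dev/mem and mmap(). This is a minimal sketch assuming a root shell with /dev/mem access enabled, not devmem2's actual source; split_phys and read_phys32 are hypothetical names. If the root port never receives a completion, this read is likely to fault just as the driver access did, so the process is killed by a signal rather than getting an error return.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Split a physical address into the page-aligned base that mmap() needs
 * and the byte offset of the target word inside that page.
 * page_size must be a power of two. */
void split_phys(uint64_t phys, uint64_t page_size,
                uint64_t *base, uint64_t *offset)
{
    *base = phys & ~(page_size - 1);
    *offset = phys - *base;
}

/* Read one 32-bit word at physical address `phys` (4-byte aligned) via
 * /dev/mem; needs root. Returns 0 on success, -1 if /dev/mem cannot be
 * opened or mapped. A bus error on the load itself is not reported here:
 * the CPU faults and the kernel signals the process instead. */
int read_phys32(uint64_t phys, uint32_t *value)
{
    uint64_t page = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t base, offset;
    split_phys(phys, page, &base, &offset);

    /* O_SYNC requests an uncached mapping on architectures that honour it. */
    int fd = open("/dev/mem", O_RDONLY | O_SYNC);
    if (fd < 0) {
        perror("open /dev/mem");
        return -1;
    }
    void *map = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, (off_t)base);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (map == MAP_FAILED) {
        perror("mmap /dev/mem");
        return -1;
    }
    /* volatile forces exactly one 32-bit load from the mapped window */
    *value = *(volatile const uint32_t *)((const uint8_t *)map + offset);
    munmap(map, (size_t)page);
    return 0;
}
```

    A call such as read_phys32(0x20000000u, &v) corresponds to step 3; running lspci before and after it gives the comparison suggested above.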

    Ansa Ahammed said:

    One more clarification on the BAR register:

    Should the BAR register values be filled in by the RC or by the EP itself? In our case, only the BAR size is assigned by the FPGA; the BAR base addresses are filled in by the RC.

    Here it is 0x20000000 (i.e. the outbound memory window base address).

    Also, the BAR0 resource flags are set to 0x00040200. Is this value correct?

    The RC writes BAR0. The resource shown in lspci is 32-bit non-prefetchable memory, which matches the setting you mentioned for the FPGA.
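    For reference, 0x00040200 is IORESOURCE_MEM (0x200) plus IORESOURCE_SIZEALIGN (0x40000, meaning the resource is aligned to its size), with neither IORESOURCE_PREFETCH nor IORESOURCE_MEM_64 set, i.e. 32-bit non-prefetchable memory. A user-space sketch of that check, with the constant values copied from the kernel's include/linux/ioport.h (the function name is hypothetical):

```c
#include <stdbool.h>

/* Flag values as defined in the kernel's include/linux/ioport.h,
 * copied here so the check can run in user space. */
#define IORESOURCE_IO        0x00000100ul
#define IORESOURCE_MEM       0x00000200ul
#define IORESOURCE_PREFETCH  0x00002000ul
#define IORESOURCE_SIZEALIGN 0x00040000ul
#define IORESOURCE_MEM_64    0x00100000ul

/* True if resource flags describe 32-bit, non-prefetchable memory. */
bool is_32bit_nonprefetch_mem(unsigned long flags)
{
    return (flags & IORESOURCE_MEM) != 0 &&
           (flags & (IORESOURCE_IO | IORESOURCE_PREFETCH | IORESOURCE_MEM_64)) == 0;
}
```

    A prefetchable BAR would additionally carry 0x2000 and a 64-bit BAR 0x100000, so the value printed by the driver is consistent with the FPGA configuration.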

       Hemant