TDA4VM: PCIe network card is not stable on SDK 7.0

Part Number: TDA4VM

Hi, we are using an Intel 82599ES PCIe card on our board; the card is attached to the two lanes of SERDES2.

The 82599ES card is stable when our board runs SDK 6.2, but the A72 reports an abort when the board runs SDK 7.0.

On the EVM board, SERDES2 is used for the M.2 slot, so I am not sure whether the EVM's SERDES2 has this issue too. However, the 82599ES card works well on SDK 7.0 when it is plugged into the EVM's PCIe x2 slot.

The UART crash log:

ERROR:   Unhandled External Abort received on 0x80000001 from S-EL1                                             
ERROR:   exception reason=0 syndrome=0xbf000000                                                                                  
Unhandled Exception from EL1                                                                                                     
x0             = 0x0000000000000000                                                                                              
x1             = 0x0000000000010148                                                                                              
x2             = 0xffff000840073e05                                                                                              
x3             = 0x0000000000000000                                                                                              
x4             = 0x0000000000000027                                                                                              
x5             = 0x0000006562677869                                                                                              
x6             = 0xffff00084ed3276c                                                                                              
x7             = 0x0000000000000018                                                                                              
x8             = 0xfefefefefefefeff                                                                                              
x9             = 0xffff00084ed3276c                                                                                              
x10            = 0x00000000000009e0                                                                                              
x11            = 0x0000000000000000                                                                                              
x12            = 0x0000000000000001                                                                                              
x13            = 0x0000000000000000                                                                                              
x14            = 0x0000000000000000                                                                                              
x15            = 0x0000000000000000                                                                                              
x16            = 0x0000000000000000                                                                                              
x17            = 0x0000000000000000                                                                                              
x18            = 0x0000000000000000                                                                                              
x19            = 0x0000000000010148                                                                                              
x20            = 0x0000000000000000                                                                                              
x21            = 0xffff000846301980                                                                                              
x22            = 0xffff800015300000                                                                                              
x23            = 0xffff000840073e00                                                                                              
x24            = 0xffff0008463027b0                                                                                              
x25            = 0x0000000000000000                                                                                              
x26            = 0xffff80001a62fce8                                                                                              
x27            = 0xffff0008463007a8                                                                                              
x28            = 0xffffffffffffe098                                                                                              
x29            = 0xffff80001a6efc80                                                                                              
x30            = 0xffff800010626970                                                                                              
scr_el3        = 0x000000000000073d                                                                                              
sctlr_el3      = 0x0000000030cd183f                                                                                              
cptr_el3       = 0x0000000000000000                                                                                              
tcr_el3        = 0x0000000080803520                                                                                              
daif           = 0x00000000000002c0                                                                                              
mair_el3       = 0x00000000004404ff                                                                                              
spsr_el3       = 0x0000000060000005                                                                                              
elr_el3        = 0xffff800010624ed0                                                                                              
ttbr0_el3      = 0x0000000070010b00                                                                                              
esr_el3        = 0x00000000bf000000                                                                                              
far_el3        = 0x0000000000000000                                                                                              
spsr_el1       = 0x0000000040000005                                                                                              
elr_el1        = 0xffff800010086aa8                                                                                              
spsr_abt       = 0x0000000000000000                                                                                              
spsr_und       = 0x0000000000000000                                                                                              
spsr_irq       = 0x0000000000000000                                                                                              
spsr_fiq       = 0x0000000000000000                                                                                              
sctlr_el1      = 0x0000000034d4d91d                                                                                              
actlr_el1      = 0x0000000000000000                                                                                              
cpacr_el1      = 0x0000000000300000                                                                                              
csselr_el1     = 0x0000000000000000                                                                                              
sp_el1         = 0xffff80001a6efc80                                                                                              
esr_el1        = 0x0000000056000000                                                                                              
ttbr0_el1      = 0x00000008d51d1c00                                                                                              
ttbr1_el1      = 0x059e000080bc0000                                                                                              
mair_el1       = 0x0000bbff440c0400                                                                                              
amair_el1      = 0x0000000000000000                                                                                              
tcr_el1        = 0x00000034f5507510                                                                                              
tpidr_el1      = 0xffff80086ee50000                                                                                              
tpidr_el0      = 0x0000000000000000                                                                                              
tpidrro_el0    = 0x0000000000000000                                                                                              
par_el1        = 0x0000000000000000                                                                                              
mpidr_el1      = 0x0000000080000001                                                                                              
afsr0_el1      = 0x0000000000000000                                                                                              
afsr1_el1      = 0x0000000000000000                                                                                              
contextidr_el1 = 0x0000000000000000                                                                                              
vbar_el1       = 0xffff800010081800                                                                                              
cntp_ctl_el0   = 0x0000000000000005                                                                                              
cntp_cval_el0  = 0x000000f833b4db18                                                                                              
cntv_ctl_el0   = 0x0000000000000000                                                                                              
cntv_cval_el0  = 0x0000000000000000                                                                                              
cntkctl_el1    = 0x00000000000000e6                                                                                              
sp_el0         = 0x000000007000abd0                                                                                              
isr_el1        = 0x0000000000000040                                                                                              
dacr32_el2     = 0x0000000000000000                                                                                              
ifsr32_el2     = 0x0000000000000000                                                                                              
cpuectlr_el1   = 0x0000001b00000040                                                                                              
cpumerrsr_el1  = 0x0000000000000000                                                                                              
l2merrsr_el1   = 0x0000000000000000                                                                                              
[ 5349.102182] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:                                                             
[ 5349.108267] rcu:     1-...0: (195 ticks this GP) idle=116/1/0x4000000000000000 softirq=48546/48548 fqs=2625                   
[ 5349.117809]  (detected by 0, t=5252 jiffies, g=89765, q=883)                                                                  
[ 5349.123449] Task dump for CPU 1:                                                                                              
[ 5349.126663] kworker/u4:1    R  running task        0  2058      2 0x0000002a                                                  
[ 5349.133703] Workqueue: ixgbe ixgbe_service_task                                                                               
[ 5349.138218] Call trace:                                                                                                       
[ 5349.140655]  __switch_to+0x104/0x170                                                                                          
[ 5349.144216]  0xffff0008412fe000                                                                                               
[ 5412.122183] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:                                                             
[ 5412.128269] rcu:     1-...0: (195 ticks this GP) idle=116/1/0x4000000000000000 softirq=48546/48548 fqs=10497                  
[ 5412.137897]  (detected by 0, t=21007 jiffies, g=89765, q=1239)                                                                
[ 5412.143711] Task dump for CPU 1:                                                                                              
[ 5412.146924] kworker/u4:1    R  running task        0  2058      2 0x0000002a                                                  
[ 5412.153965] Workqueue: ixgbe ixgbe_service_task                                                                               
[ 5412.158480] Call trace:                                                                                                       
[ 5412.160918]  __switch_to+0x104/0x170                                                                                          
[ 5412.164479]  0xffff0008412fe000                                                                                               
[ 5475.142183] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:                                                             
[ 5475.148270] rcu:     1-...0: (195 ticks this GP) idle=116/1/0x4000000000000000 softirq=48546/48548 fqs=18369                  
[ 5475.157898]  (detected by 0, t=36762 jiffies, g=89765, q=2380)                                                                
[ 5475.163711] Task dump for CPU 1:                                                                                              
[ 5475.166924] kworker/u4:1    R  running task        0  2058      2 0x0000002a                                                  
[ 5475.173966] Workqueue: ixgbe ixgbe_service_task                                                                               
[ 5475.178480] Call trace:                                                                                                       
[ 5475.180918]  __switch_to+0x104/0x170                                                                                          
[ 5475.184480]  0xffff0008412fe000                                                                                               
[ 5504.050190] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-... } 5524 jiffies s: 85 root: 0x2/.          
[ 5504.060708] rcu: blocking rcu_node structures:                                                                                
[ 5504.065520] Task dump for CPU 1:                                                                                              
[ 5504.068800] kworker/u4:1    R  running task        0  2058      2 0x0000002a                                                  
[ 5504.075893] Workqueue: ixgbe ixgbe_service_task                                                                               
[ 5504.080503] Call trace:                                                                                                       
[ 5504.082995]  __switch_to+0x104/0x170                                                                                          
[ 5504.086615]  0xffff0008412fe000
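
The syndrome values in the dump above can be split into their fields by hand. The following is a minimal sketch, assuming the standard ARMv8-A ESR_ELx layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]). The names `decode_esr` and `EC_NAMES` are illustrative only and are not part of the SDK, and the table lists just a few exception classes.

```python
# Decode the main fields of an AArch64 ESR_ELx value.
# Assumption: standard ARMv8-A layout (EC [31:26], IL [25], ISS [24:0]).
# EC_NAMES is deliberately incomplete; unknown codes are reported as such.

EC_NAMES = {
    0x15: "SVC instruction execution in AArch64 state",
    0x20: "Instruction Abort from a lower Exception level",
    0x21: "Instruction Abort taken without a change in Exception level",
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
    0x2F: "SError interrupt",
}


def decode_esr(esr):
    """Return the EC, IL and ISS fields of a 32-bit syndrome value."""
    ec = (esr >> 26) & 0x3F
    il = (esr >> 25) & 0x1
    iss = esr & 0x1FFFFFF
    return {
        "ec": ec,
        "il": il,
        "iss": iss,
        "name": EC_NAMES.get(ec, "not in this table"),
    }


if __name__ == "__main__":
    # Values copied from the register dump in the crash log.
    for label, value in (("esr_el3", 0xBF000000), ("esr_el1", 0x56000000)):
        f = decode_esr(value)
        print(
            f"{label}=0x{value:08x}: EC=0x{f['ec']:02x} ({f['name']}), "
            f"IL={f['il']}, ISS=0x{f['iss']:07x}"
        )
```

With the values from the log, esr_el3 decodes to EC 0x2F (SError interrupt), which matches the "Unhandled External Abort" message and points to an asynchronous abort rather than a synchronous data abort. esr_el1 decodes to EC 0x15 (SVC), which is most likely left over from an earlier system call and unrelated to the crash.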

Could you please tell me how to debug this issue?

Also, are there any major PCIe changes between SDK 6.2 and SDK 7.0?

Thanks

  • Hi,

    Is it a multi-function card? Do you have the lspci output from SDK 6.2?

    Thanks

    Kishon

  • No, it is the single-port version, so it has only one function.

    The output of lspci with SDK 6.2:

    sudo lspci -vv
    0000:00:00.0 PCI bridge: Texas Instruments Device b00d (prog-if 00 [Normal decode])
            Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
            Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
            Interrupt: pin A routed to IRQ 255
            Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
            I/O behind bridge: None
            Memory behind bridge: None
            Prefetchable memory behind bridge: None
            Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
            BridgeCtl: Parity- SERR- NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
                    PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
            Capabilities: [80] Power Management version 3
                    Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
                    Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
            Capabilities: [90] MSI: Enable- Count=1/1 Maskable+ 64bit+
                    Address: 0000000000000000  Data: 0000
                    Masking: 00000000  Pending: 00000000
            Capabilities: [b0] MSI-X: Enable- Count=1 Masked-
                    Vector table: BAR=0 offset=00000000
                    PBA: BAR=0 offset=00000008
            Capabilities: [c0] Express (v2) Root Port (Slot+), MSI 00
                    DevCap: MaxPayload 256 bytes, PhantFunc 0
                            ExtTag- RBE+
                    DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                            MaxPayload 128 bytes, MaxReadReq 512 bytes
                    DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                    LnkCap: Port #0, Speed 8GT/s, Width x2, ASPM L1, Exit Latency L1 <8us
                            ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
                    LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                    LnkSta: Speed 2.5GT/s (downgraded), Width x2 (ok)
                            TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                    SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                            Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
                    SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                            Control: AttnInd Off, PwrInd Off, Power+ Interlock-
                    SltSta: Status: AttnBtn- PowerFlt- MRL+ CmdCplt- PresDet- Interlock-
                            Changed: MRL- PresDet- LinkState-
                    RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
                    RootCap: CRSVisible-
                    RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                    DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
                             AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
                    DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
                             AtomicOpsCtl: ReqEn- EgressBlck-
                    LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                             Compliance De-emphasis: -6dB
                    LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
            Capabilities: [100 v2] Advanced Error Reporting
                    UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                    UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                    UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                    CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                    CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                    AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                    HeaderLog: 00000000 00000000 00000000 00000000
                    RootCmd: CERptEn- NFERptEn- FERptEn-
                    RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
                             FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
                    ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
            Capabilities: [150 v1] Device Serial Number 00-00-00-00-00-00-00-00
            Capabilities: [300 v1] Secondary PCI Express <?>
            Capabilities: [4c0 v1] Virtual Channel
                    Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                    Arb:    Fixed- WRR32- WRR64- WRR128-
                    Ctrl:   ArbSelect=Fixed
                    Status: InProgress-
                    VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                            Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                            Status: NegoPending- InProgress-
                    VC1:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                            Ctrl:   Enable- ID=1 ArbSelect=Fixed TC/VC=00
                            Status: NegoPending- InProgress-
                    VC2:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                            Ctrl:   Enable- ID=2 ArbSelect=Fixed TC/VC=00
                            Status: NegoPending- InProgress-
                    VC3:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                            Ctrl:   Enable- ID=3 ArbSelect=Fixed TC/VC=00
                            Status: NegoPending- InProgress-
            Capabilities: [5c0 v1] Address Translation Service (ATS)
                    ATSCap: Invalidate Queue Depth: 01
                    ATSCtl: Enable-, Smallest Translation Unit: 00
            Capabilities: [640 v1] Page Request Interface (PRI)
                    PRICtl: Enable- Reset-
                    PRISta: RF- UPRGI- Stopped+
                    Page Request Capacity: 00000001, Page Request Allocation: 00000000
            Capabilities: [900 v1] L1 PM Substates
                    L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                              PortCommonModeRestoreTime=255us PortTPowerOnTime=26us
                    L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                               T_CommonMode=0us LTR1.2_Threshold=0ns
                    L1SubCtl2: T_PwrOn=10us
            Kernel modules: pci_endpoint_pipe
    
    0001:00:00.0 PCI bridge: Texas Instruments Device b00d (prog-if 00 [Normal decode])
            Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
            Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
            Latency: 0
            Interrupt: pin A routed to IRQ 255
            Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
            I/O behind bridge: 00001000-00001fff [size=4K]
            Memory behind bridge: 00100000-00afffff [size=10M]
            Prefetchable memory behind bridge: None
            Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
            BridgeCtl: Parity- SERR- NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
                    PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
            Capabilities: [80] Power Management version 3
                    Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
                    Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
            Capabilities: [90] MSI: Enable- Count=1/1 Maskable+ 64bit+
                    Address: 0000000000000000  Data: 0000
                    Masking: 00000000  Pending: 00000000
            Capabilities: [b0] MSI-X: Enable- Count=1 Masked-
                    Vector table: BAR=0 offset=00000000
                    PBA: BAR=0 offset=00000008
            Capabilities: [c0] Express (v2) Root Port (Slot+), MSI 00
                    DevCap: MaxPayload 256 bytes, PhantFunc 0
                            ExtTag- RBE+
                    DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                            MaxPayload 128 bytes, MaxReadReq 512 bytes
                    DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                    LnkCap: Port #0, Speed 8GT/s, Width x2, ASPM L1, Exit Latency L1 <8us
                            ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
                    LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                    LnkSta: Speed 5GT/s (downgraded), Width x2 (ok)
                            TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt+
                    SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                            Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
                    SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                            Control: AttnInd Off, PwrInd Off, Power+ Interlock-
                    SltSta: Status: AttnBtn- PowerFlt- MRL+ CmdCplt- PresDet- Interlock-
                            Changed: MRL- PresDet- LinkState-
                    RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
                    RootCap: CRSVisible-
                    RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                    DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
                             AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
                    DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd+
                             AtomicOpsCtl: ReqEn- EgressBlck-
                    LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                             Compliance De-emphasis: -6dB
                    LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
            Capabilities: [100 v2] Advanced Error Reporting
                    UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                    UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                    UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                    CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                    CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                    AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                    HeaderLog: 00000000 00000000 00000000 00000000
                    RootCmd: CERptEn- NFERptEn- FERptEn-
                    RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
                             FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
                    ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
            Capabilities: [150 v1] Device Serial Number 00-00-00-00-00-00-00-00
            Capabilities: [300 v1] Secondary PCI Express <?>
            Capabilities: [4c0 v1] Virtual Channel
                    Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                    Arb:    Fixed- WRR32- WRR64- WRR128-
                    Ctrl:   ArbSelect=Fixed
                    Status: InProgress-
                    VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                            Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                            Status: NegoPending- InProgress-
                    VC1:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                            Ctrl:   Enable- ID=1 ArbSelect=Fixed TC/VC=00
                            Status: NegoPending- InProgress-
                    VC2:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                            Ctrl:   Enable- ID=2 ArbSelect=Fixed TC/VC=00
                            Status: NegoPending- InProgress-
                    VC3:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                            Ctrl:   Enable- ID=3 ArbSelect=Fixed TC/VC=00
                            Status: NegoPending- InProgress-
            Capabilities: [5c0 v1] Address Translation Service (ATS)
                    ATSCap: Invalidate Queue Depth: 01
                    ATSCtl: Enable-, Smallest Translation Unit: 00
            Capabilities: [640 v1] Page Request Interface (PRI)
                    PRICtl: Enable- Reset-
                    PRISta: RF- UPRGI- Stopped+
                    Page Request Capacity: 00000001, Page Request Allocation: 00000000
            Capabilities: [900 v1] L1 PM Substates
                    L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                              PortCommonModeRestoreTime=255us PortTPowerOnTime=26us
                    L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                               T_CommonMode=0us LTR1.2_Threshold=0ns
                    L1SubCtl2: T_PwrOn=10us
            Kernel modules: pci_endpoint_pipe
    
    0001:01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
            Subsystem: Intel Corporation Ethernet Server Adapter X520-1
            Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
            Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
            Latency: 0, Cache Line Size: 64 bytes
            Interrupt: pin A routed to IRQ 0
            Region 0: Memory at 4400100000 (64-bit, non-prefetchable) [size=512K]
            Region 2: I/O ports at 10000 [disabled] [size=32]
            Region 4: Memory at 4400a00000 (64-bit, non-prefetchable) [size=16K]
            [virtual] Expansion ROM at 4400180000 [disabled] [size=512K]
            Capabilities: [40] Power Management version 3
                    Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
                    Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
            Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
                    Address: 0000000000000000  Data: 0000
                    Masking: 00000000  Pending: 00000000
            Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
                    Vector table: BAR=4 offset=00000000
                    PBA: BAR=4 offset=00002000
            Capabilities: [a0] Express (v2) Endpoint, MSI 00
                    DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
                    DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                            MaxPayload 128 bytes, MaxReadReq 512 bytes
                    DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                    LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited
                            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                    LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                    LnkSta: Speed 5GT/s (ok), Width x2 (downgraded)
                            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                    DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
                             AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                    DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                             AtomicOpsCtl: ReqEn-
                    LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                             Compliance De-emphasis: -6dB
                    LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
            Capabilities: [e0] Vital Product Data
    pcilib: sysfs_read_vpd: read failed: Input/output error
                    Not readable
            Capabilities: [100 v1] Advanced Error Reporting
                    UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                    UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                    UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                    CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                    CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                    AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                    HeaderLog: 00000000 00000000 00000000 00000000
            Capabilities: [140 v1] Device Serial Number 00-1b-21-ff-ff-c1-cf-fe
            Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
                    ARICap: MFVC- ACS-, Next Function: 0
                    ARICtl: MFVC- ACS-, Function Group: 0
            Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
                    IOVCap: Migration-, Interrupt Message Number: 000
                    IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
                    IOVSta: Migration-
                    Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00
                    VF offset: 128, stride: 2, Device ID: 10ed
                    Supported Page Size: 00000553, System Page Size: 00000010
                    Region 0: Memory at 0000000000200000 (64-bit, non-prefetchable)
                    Region 3: Memory at 0000000000600000 (64-bit, non-prefetchable)
                    VF Migration: offset: 00000000, BIR: 0
            Kernel driver in use: ixgbe

  • There are no major changes between 6.2 and 7.0. We've tested the M.2 slot with a Samsung NVMe card. The abort could be caused by an unstable link, or by an erratum that affects multi-function cards. Since this is not a multi-function card, an unstable link is the more likely cause.

    Do you use an adaptor to connect to the M.2 slot?

    Is the issue reproducible 100% of the time?
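
One rough way to look for evidence of an unstable link is to check the Advanced Error Reporting status lines in the `lspci -vv` output above. A marginal link often (though not always) leaves correctable-error flags such as `RxErr`, `BadTLP` or `BadDLLP` set in `CESta`. The sketch below is only an illustration: the helper names `set_flags` and `aer_summary` are made up here, and it assumes the text of a single device's `lspci -vv` block is passed in.

```python
def set_flags(lspci_line):
    """Return the names of the flags marked '+' on one lspci -vv status line."""
    # Drop the leading label (for example "CESta:") and keep the flag tokens.
    _, _, rest = lspci_line.partition(":")
    return [tok[:-1] for tok in rest.split() if tok.endswith("+")]


def aer_summary(lspci_text):
    """Collect the set flags from the UESta and CESta lines of one device."""
    summary = {}
    for line in lspci_text.splitlines():
        stripped = line.strip()
        for label in ("UESta", "CESta"):
            if stripped.startswith(label + ":"):
                summary[label] = set_flags(stripped)
    return summary


# The two status lines below are copied from the 82599ES dump in this thread.
sample = """\
        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
"""

print(aer_summary(sample))
```

For the dump in this thread both lists come back empty, meaning no error was latched at the endpoint when `lspci` ran. That alone does not prove the link is healthy.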

  • The single-function 82599ES card never aborts with SDK 6.2 under any conditions.

    But after SDK 7.0 is installed on the same TDA4 board, the TDA4 aborts at unpredictable times, after anywhere from 5 minutes to 10 hours, even when the TDA4 and the network card are idle. The abort is 100% reproducible, but we have not figured out the root cause.

    Then I installed SDK 6.2 on the same board again, and the abort never showed up; both the TDA4 and the network card were stable.

    I repeated the same test on more than 3 TDA4 boards, and all of them gave the same result: stable with SDK 6.2, abort with SDK 7.0.

    So if this issue were caused by an unstable link, SDK 6.2 should abort too.

    We are also trying to buy an M.2-to-PCIe adaptor so that we can connect the 82599ES to SERDES2 on the EVM board and check whether the issue reproduces there.

  • Okay, I thought the abort was seen during enumeration itself, but it looks like it happens later. Would you be able to find out what the Ethernet (ixgbe) driver was doing when the abort happened? Was the driver accessing the card, was the card trying to raise an interrupt, or was the card writing to a buffer?

    We haven't observed aborts during data transfer.
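
One possible starting point for finding out what the kernel was doing is to map the code addresses printed in the abort dump (for example `elr_el3` and the link register `x30`) to kernel symbols. The sketch below assumes a `System.map`-style listing of `<hex address> <type> <name>` lines is available for the running kernel. The two map entries are hypothetical placeholders, not real symbols from either SDK, and `load_symbols` and `resolve` are illustrative names.

```python
import bisect


def load_symbols(system_map_text):
    """Parse '<hex address> <type> <name>' lines into a sorted list of (address, name)."""
    symbols = []
    for line in system_map_text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return 'name+0xoffset' for the closest symbol at or below address, else None."""
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None
    base, name = symbols[index]
    return f"{name}+{hex(address - base)}"


# Hypothetical entries for illustration only; use the System.map of your own build.
example_map = """\
ffff800010624e00 T example_func_a
ffff800010626900 T example_func_b
"""

syms = load_symbols(example_map)
print(resolve(syms, 0xffff800010624ed0))  # elr_el3 value from the abort log in this thread
print(resolve(syms, 0xffff800010626970))  # x30 value from the same log
```

Two caveats apply. Symbols from loadable modules are not listed in `System.map`, so if ixgbe is built as a module `/proc/kallsyms` would be needed instead. Also, the abort is reported asynchronously, so the resolved address shows roughly where the CPU was when the error arrived, not necessarily the access that triggered it.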

  • Okay, I think we should both investigate on our own sides:

    I will try to narrow down what the ixgbe driver is doing when the abort occurs,

    and please double-check that there is no difference in the PCIe clock, SERDES configuration, and so on, because our board is stable and works well with SDK 6.2.

    Thanks.

  • Hi Kishon,

    As we discussed by email, I did a CCS memory save and have uploaded the PCIe and SERDES register dumps here. The corresponding UART abort log is below:

    ERROR:   Unhandled External Abort received on 0x80000000 from S-EL1
    ERROR:   exception reason=0 syndrome=0xbf000000
    Unhandled Exception from EL1
    x0             = 0x0000000000000000
    x1             = 0x0000000000010148
    x2             = 0xffff000840073e05
    x3             = 0x0000000000000000
    x4             = 0x0000000000000027
    x5             = 0x0000006562677869
    x6             = 0xffff000823b81f6c
    x7             = 0x0000000000000018
    x8             = 0xfefefefefefefeff
    x9             = 0xffff000823b81f6c
    x10            = 0x00000000000009e0
    x11            = 0x0000000000000000
    x12            = 0x0000000000000001
    x13            = 0x0000000000000000
    x14            = 0x0000000000000001
    x15            = 0x0000000000001000
    x16            = 0x0000000000000000
    x17            = 0x0000000000000000
    x18            = 0x0000000000000000
    x19            = 0x0000000000010148                                                                                              
    x20            = 0x0000000000000000                                                                                              
    x21            = 0xffff000841441980                                                                                              
    x22            = 0xffff800015a80000                                                                                              
    x23            = 0xffff000840073e00                                                                                              
    x24            = 0xffff0008414427b0                                                                                              
    x25            = 0x0000000000000000                                                                                              
    x26            = 0xffff000840060020                                                                                              
    x27            = 0xffff0008414407a8                                                                                              
    x28            = 0xffffffffffffe098                                                                                              
    x29            = 0xffff8000146efc80                                                                                              
    x30            = 0xffff800010626970                                                                                              
    scr_el3        = 0x000000000000073d                                                                                              
    sctlr_el3      = 0x0000000030cd183f                                                                                              
    cptr_el3       = 0x0000000000000000                                                                                              
    tcr_el3        = 0x0000000080803520                                                                                              
    daif           = 0x00000000000002c0                                                                                              
    mair_el3       = 0x00000000004404ff                                                                                              
    spsr_el3       = 0x0000000060000005                                                                                              
    elr_el3        = 0xffff800010624ed0                                                                                              
    ttbr0_el3      = 0x0000000070010b00                                                                                              
    esr_el3        = 0x00000000bf000000                                                                                              
    far_el3        = 0x0000000000000000                                                                                              
    spsr_el1       = 0x0000000040000005                                                                                              
    elr_el1        = 0xffff800010086aa8                                                                                              
    spsr_abt       = 0x0000000000000000                                                                                              
    spsr_und       = 0x0000000000000000                                                                                              
    spsr_irq       = 0x0000000000000000                                                                                              
    spsr_fiq       = 0x0000000000000000                                                                                              
    sctlr_el1      = 0x0000000034d4d91d                                                                                              
    actlr_el1      = 0x0000000000000000                                                                                              
    cpacr_el1      = 0x0000000000300000                                                                                              
    csselr_el1     = 0x0000000000000000                                                                                              
    sp_el1         = 0xffff8000146efc80                                                                                              
    esr_el1        = 0x0000000056000000                                                                                              
    ttbr0_el1      = 0x00000008d1020000                                                                                              
    ttbr1_el1      = 0x0e26000080bc0000                                                                                              
    mair_el1       = 0x0000bbff440c0400                                                                                              
    amair_el1      = 0x0000000000000000                                                                                              
    tcr_el1        = 0x00000034f5507510                                                                                              
    tpidr_el1      = 0xffff80086ee30000                                                                                              
    tpidr_el0      = 0x0000000000000000                                                                                              
    tpidrro_el0    = 0x0000000000000000                                                                                              
    par_el1        = 0x0000000000000000                                                                                              
    mpidr_el1      = 0x0000000080000000                                                                                              
    afsr0_el1      = 0x0000000000000000                                                                                              
    afsr1_el1      = 0x0000000000000000                                                                                              
    contextidr_el1 = 0x0000000000000000                                                                                              
    vbar_el1       = 0xffff800010081800                                                                                              
    cntp_ctl_el0   = 0x0000000000000005                                                                                              
    cntp_cval_el0  = 0x00000060654c625f                                                                                              
    cntv_ctl_el0   = 0x0000000000000000                                                                                              
    cntv_cval_el0  = 0x0000000000000000                                                                                              
    cntkctl_el1    = 0x00000000000000e6                                                                                              
    sp_el0         = 0x000000007000a3d0                                                                                              
    isr_el1        = 0x0000000000000040                                                                                              
    dacr32_el2     = 0x0000000000000000                                                                                              
    ifsr32_el2     = 0x0000000000000000                                                                                              
    cpuectlr_el1   = 0x0000001b00000040                                                                                              
    cpumerrsr_el1  = 0x0000000000000000                                                                                              
    l2merrsr_el1   = 0x0000000000000000
    

    serdes2_pcie2_reg_dump.tar.gz

  • Hi Felix,

    As discussed over e-mail, the dumps are not correct. We will wait for your response.

    Best Regards,
    Keerthy

  • Hi Keerthy,

    We have fixed this issue. It was caused by an uninitialised GPIO that is used as PCIe PERST in our design.

    We had simply merged our patches from SDK 6.2 into SDK 7.0, and GPIO initialisation differs slightly between the two, so the GPIO was left uninitialised.

    After we fixed the GPIO initialisation procedure, SDK 7.0 is as stable as SDK 6.2.

    Appreciate your time and help.
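
For readers hitting something similar, the sketch below models what initialising a PERST# line generally involves: configure the pin as an output driven low (reset asserted), hold it, then drive it high to release the card. It is not the fix applied in this thread, whose details were not posted. `FakeGpio` and `release_perst` are hypothetical names, and the 100 ms default is an assumption based on the minimum power-stable-to-PERST#-inactive time in the PCIe Card Electromechanical specification; check the value for your own board.

```python
import time


class FakeGpio:
    """Hypothetical stand-in for a GPIO line; it only records what is done to it."""

    def __init__(self):
        self.direction = "unconfigured"  # models the uninitialised pin
        self.level = None
        self.history = []

    def set_output(self, level):
        """Configure the pin as an output with an initial level."""
        self.direction = "out"
        self.level = level
        self.history.append(("out", level))

    def set_level(self, level):
        """Change the level; only valid once the pin is an output."""
        if self.direction != "out":
            raise RuntimeError("GPIO must be configured as an output first")
        self.level = level
        self.history.append(("set", level))


def release_perst(gpio, hold_s=0.1, sleep=time.sleep):
    """Assert PERST# (drive low), hold for hold_s seconds, then deassert (drive high)."""
    gpio.set_output(0)   # pin is now actively driven low instead of left floating
    sleep(hold_s)        # keep the endpoint in reset while power and clock settle
    gpio.set_level(1)    # release reset so link training can start


perst = FakeGpio()
release_perst(perst, sleep=lambda seconds: None)  # skip the real delay in this demo
print(perst.direction, perst.level, perst.history)
```

The point of the model is that a pin left in the "unconfigured" state is never actively driven, so the level the card sees on PERST# depends on pull resistors and board-level behaviour, not on software.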