TDA4VM: PCIE NVME SSD is not stable on SERDES0

Part Number: TDA4VM

Hi,


On our TDA4VM custom board, we designed the PCIe interface as an M.2 Key M slot, and the PCIe signals come from 2 lanes of SERDES0.

We tested several brands of NVMe SSDs: Advantech, Micron Crucial, and Kingston.

With the Kingston NVMe SSD, the system sometimes hangs at:
-----------------------------------------------------------------------------------------------------------
[ 9.915402] nvme 0000:01:00.0: enabling device (0000 -> 0002)


ERROR: Unhandled External Abort received on 0x80000000 from S-EL1
ERROR: exception reason=0 syndrome=0xbf000000
Unhandled Exception from EL1
x0 = 0x0000000000000000
x1 = 0xffffffffffffffff
x2 = 0x00000000fffffffe
x3 = 0x0000000000000001
x4 = 0x000000000000000b
x5 = 0xffff0008020b4800
x6 = 0x00000000be127933
x7 = 0xc0000000ffffefff
x8 = 0x0000000000017fe8
x9 = 0xffff8000111284c8
x10 = 0xffff800011128470
x11 = 0xffff8000111404b0
x12 = 0x00000000ffffffea
x13 = 0xffff0008000d5930
x14 = 0x00000000000001fb
x15 = 0xffff0008000d5930
x16 = 0x0000000000000000
x17 = 0x0000000000000000
x18 = 0x0000000000000010
x19 = 0xffff000804884000
x20 = 0xffff000804883f28
x21 = 0xffff0008048842d0
x22 = 0xffff000804884000
x23 = 0xffff000804884728
x24 = 0xffff0008022000b0
x25 = 0x0000000000000000
x26 = 0xffff0008022000b0
x27 = 0xffff000802200000
x28 = 0xffff000804884310
x29 = 0xffff8000112cbc80
x30 = 0xffff800008e928c4
scr_el3 = 0x000000000000073d
sctlr_el3 = 0x0000000030cd183f
cptr_el3 = 0x0000000000000000
tcr_el3 = 0x0000000080803520
daif = 0x00000000000002c0
mair_el3 = 0x00000000004404ff
spsr_el3 = 0x00000000a0000005
elr_el3 = 0xffff800008e928d0
ttbr0_el3 = 0x0000000070011cc0
esr_el3 = 0x00000000bf000000
far_el3 = 0x0000000000000000
spsr_el1 = 0x0000000040000005
elr_el1 = 0xffff8000100b5570
spsr_abt = 0x0000000000000000
spsr_und = 0x0000000000000000
spsr_irq = 0x0000000000000000
spsr_fiq = 0x0000000000000000
sctlr_el1 = 0x0000000034d4d91d
actlr_el1 = 0x0000000000000000
cpacr_el1 = 0x0000000000300000
csselr_el1 = 0x0000000000000000
sp_el1 = 0xffff8000112cbc80
esr_el1 = 0x0000000092000007
ttbr0_el1 = 0x0000000884a9f000
ttbr1_el1 = 0x00bc000082ee6000
mair_el1 = 0x000c0400bb44ffff
amair_el1 = 0x0000000000000000
tcr_el1 = 0x00000034b5d03590
tpidr_el1 = 0xffff80086e853000
tpidr_el0 = 0x0000000000000000
tpidrro_el0 = 0x0000000000000000
par_el1 = 0x0000000000000000
mpidr_el1 = 0x0000000080000000
afsr0_el1 = 0x0000000000000000
afsr1_el1 = 0x0000000000000000
contextidr_el1 = 0x0000000000000000
vbar_el1 = 0xffff800010013000
cntp_ctl_el0 = 0x0000000000000005
cntp_cval_el0 = 0x00000000be1cb6a5
cntv_ctl_el0 = 0x0000000000000000
cntv_cval_el0 = 0x0000000000000000
cntkctl_el1 = 0x00000000000000d6
sp_el0 = 0x000000007000b380
isr_el1 = 0x0000000000000040
dacr32_el2 = 0x0000000000000000
ifsr32_el2 = 0x0000000000000000
cpuectlr_el1 = 0x0000001b00000040
cpumerrsr_el1 = 0x0000000000000000
l2merrsr_el1 = 0x0000000000000000

--------------------------------------------------------------------------------------------------------------

With the other SSDs, we don't see this issue.

We found a similar case on TI E2E:
e2e.ti.com/.../tda4vm-pcie-network-card-is-not-stable-on-sdk-7-0

It was mentioned that this issue might be due to an unstable PCIe link.

Could you share a tip on how we can check whether the link is unstable (e.g. via register contents)?
If so, are there any registers in the PCIe or SERDES block that can be used to fine-tune the signal or improve signal quality?

Thanks~!!!

BR,


Richard

  • Hi Richard,

    Apologies for the delay. Could you do the following to check the status of the signal:

    1. Since the Linux kernel is down, connect CCS through JTAG.
    2. Check the contents of the PCIE_USER_LINKSTATUS register at 0x02907014. Bits 1 and 0 should both be 1 (binary 11). Also read the LTSSM field, which is bits 29-24.
    3. If the link status in step 2 is down, check the SERDES register PHY_PMA_CMN_CTRL at 0x0500E000, specifically bit 10 (the current value of the cmn_plllc_ready PMA output), which should be 1.

    Regards,

    Takuma
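
The register checks in steps 2 and 3 above can be sketched as a small decoding helper for the raw 32-bit values read out in the CCS memory browser. This is only an illustration: the bit positions (bits 1:0, bits 29:24, bit 10) are taken from the reply above, while the function names and dictionary keys are hypothetical and not part of any TI tool or SDK. The meaning of individual LTSSM state codes is not decoded here and should be looked up in the TRM.

```python
# Minimal sketch: decode raw register values read over JTAG.
# Bit positions come from the reply above; all names are hypothetical.

def decode_pcie_user_linkstatus(value):
    """Split a PCIE_USER_LINKSTATUS value (read at 0x02907014) into fields."""
    link_status = value & 0x3            # bits 1:0, expected to be 0b11
    ltssm_state = (value >> 24) & 0x3F   # bits 29:24, LTSSM field
    return {
        "link_status": link_status,
        "link_up": link_status == 0b11,
        "ltssm_state": ltssm_state,
    }


def cmn_plllc_ready(value):
    """Return True if bit 10 of a PHY_PMA_CMN_CTRL value (read at 0x0500E000) is set."""
    return bool((value >> 10) & 1)


if __name__ == "__main__":
    # Example values only; substitute the words actually read from the target.
    status = decode_pcie_user_linkstatus(0x10000003)
    print("link up:", status["link_up"], "LTSSM: 0x%02x" % status["ltssm_state"])
    print("cmn_plllc_ready:", cmn_plllc_ready(0x00000400))
```

Reading the two registers several times across a failing boot and comparing the decoded fields would show whether the link status or the PLL ready bit changes when the hang occurs.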