J721E-EVB PCIE device enumeration exception over NFS on TI SDK PROCESSOR-SDK-LINUX-J721E_08.00.00.08

1. Install one x1 PCIe Intel E1000E card in slot 1 or 2 of the J721E-EVB board.

2. Boot with the default U-Boot of TI SDK PROCESSOR-SDK-LINUX-J721E_08.00.00.08:

U-Boot SPL 2021.01-g53e79d0e89 (Aug 04 2021 - 23:32:00 +0000)
Model: Texas Instruments K3 J721E SoC
Board: J721EX-PM2-SOM rev E7
SYSFW ABI: 3.1 (firmware rev 0x0015 '21.5.0--v2021.05 (Terrific Llam')
Trying to boot from MMC2
Starting ATF on ARM64 core...

NOTICE:  BL31: v2.5(release):08.00.00.004-dirty
NOTICE:  BL31: Built : 22:30:09, Aug  4 2021

U-Boot SPL 2021.01-g53e79d0e89 (Aug 04 2021 - 22:33:28 +0000)
Model: Texas Instruments K3 J721E SoC
Board: J721EX-PM2-SOM rev E7
SYSFW ABI: 3.1 (firmware rev 0x0015 '21.5.0--v2021.05 (Terrific Llam')
Detected: J7X-BASE-CPB rev E3
Detected: J7X-VSC8514-ETH rev E2
Trying to boot from MMC2

U-Boot 2021.01-g53e79d0e89 (Aug 04 2021 - 22:33:28 +0000)

SoC:   J721E SR1.0
Model: Texas Instruments K3 J721E SoC
Board: J721EX-PM2-SOM rev E7
DRAM:  4 GiB
Flash: 0 Bytes
MMC:   sdhci@4f80000: 0, sdhci@4fb0000: 1
In:    serial@2800000
Out:   serial@2800000
Err:   serial@2800000
Detected: J7X-BASE-CPB rev E3
Detected: J7X-VSC8514-ETH rev E2
Net:   am65_cpsw_nuss_slave ethernet@46000000: K3 CPSW: nuss_ver: 0x6BA00101 cpsw_ver: 0x6BA80100 ale_ver: 0x00293904 Ports:1 mdio_freq:1000000
eth0: ethernet@46000000
Hit any key to stop autoboot:  0

3. Boot over NFS: download the default Linux kernel image and dtb files of TI SDK PROCESSOR-SDK-LINUX-J721E_08.00.00.08 and boot the kernel. The following exception occurs:

Load address: 0x88000000
Loading: #######
      2.1 MiB/s
done
Bytes transferred = 97586 (17d32 hex)
## Flattened Device Tree blob at 88000000
    Booting using the fdt blob at 0x88000000
    Loading Device Tree to 000000008ffe5000, end 000000008ffffd31 ... OK

Starting kernel ...

ERROR:   Unhandled External Abort received on 0x80000000 from S-EL1
ERROR:   exception reason=0 syndrome=0xbf000000
Unhandled Exception from EL1
x0             = 0xffff800014a20000
x1             = 0x0000000000000000
x2             = 0xffff800016400008
x3             = 0x0000000000000001
x4             = 0x000000000000000b
x5             = 0xffff00080236c000
x6             = 0xffff800011b6f864
x7             = 0x000000000000ea60
x8             = 0x0000000080b5111d
x9             = 0x00000000b00d104c
x10            = 0x7f7f7f7f7f7f7f7f
x11            = 0x0101010101010101
x12            = 0xffff0008000eb29a
x13            = 0xffff0008000eba1c
x14            = 0xffffffffffffffff
x15            = 0xffff000800351330
x16            = 0x0000000080396182
x17            = 0x0000000000000020
x18            = 0x0000000000000000
x19            = 0xffff800011b6f7a4
x20            = 0x0000000000000004
x21            = 0xffff00080236c800
x22            = 0x0000000000000087
x23            = 0xffff800011b6f864
x24            = 0x0000000000000000
x25            = 0xffff8000112d3a88
x26            = 0x0000000000000001
x27            = 0x0000000000000000
x28            = 0xffff00080236c000
x29            = 0xffff800011b6f720
x30            = 0xffff8000104ea30c
scr_el3        = 0x000000000000073d
sctlr_el3      = 0x0000000030cd183f
cptr_el3       = 0x0000000000000000
tcr_el3        = 0x0000000080803520
daif           = 0x00000000000002c0
mair_el3       = 0x00000000004404ff
spsr_el3       = 0x0000000020000085
elr_el3        = 0xffff8000104ea324
ttbr0_el3      = 0x0000000070010c00
esr_el3        = 0x00000000bf000000
far_el3        = 0x0000000000000000
spsr_el1       = 0x0000000040000005
elr_el1        = 0xffff800010a70a10
spsr_abt       = 0x0000000000000000
spsr_und       = 0x0000000000000000
spsr_irq       = 0x0000000000000000
spsr_fiq       = 0x0000000000000000
sctlr_el1      = 0x0000000034d4d91d
actlr_el1      = 0x0000000000000000
cpacr_el1      = 0x0000000000300000
csselr_el1     = 0x0000000000000000
sp_el1         = 0xffff800011b6f720
esr_el1        = 0x0000000000000000
ttbr0_el1      = 0x0000000083270000
ttbr1_el1      = 0x0000000082f10000
mair_el1       = 0x000c0400bb44ffff
amair_el1      = 0x0000000000000000
tcr_el1        = 0x00000034f5d07590
tpidr_el1      = 0xffff80086eab0000
tpidr_el0      = 0x0000000000000000
tpidrro_el0    = 0x0000000000000000
par_el1        = 0x0000000000000000
mpidr_el1      = 0x0000000080000000
afsr0_el1      = 0x0000000000000000
afsr1_el1      = 0x0000000000000000
contextidr_el1 = 0x0000000000000000
vbar_el1       = 0xffff800010010800
cntp_ctl_el0   = 0x0000000000000005
cntp_cval_el0  = 0x0000000b66cc8f06
cntv_ctl_el0   = 0x0000000000000000
cntv_cval_el0  = 0x0000000000000000
cntkctl_el1    = 0x00000000000000d6
sp_el0         = 0x000000007000a3d0
isr_el1        = 0x0000000000000040
dacr32_el2     = 0x0000000000000000
ifsr32_el2     = 0x0000000000000000
cpuectlr_el1   = 0x0000001b00000040
cpumerrsr_el1  = 0x0000000000000000
l2merrsr_el1   = 0x0000000000000000
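
The `esr_el3` value in the dump above can be decoded by hand. The following is a minimal sketch, assuming the standard Armv8-A ESR_ELx field layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0). The function name and the short lookup table are illustrative only, are not part of TF-A or the SDK, and cover just a few exception classes.

```python
# Illustrative decoder for an AArch64 ESR_ELx value (hypothetical helper,
# not taken from TF-A or the TI SDK). Only a few exception classes are named.
EXCEPTION_CLASSES = {
    0x20: "Instruction Abort from a lower Exception level",
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
    0x2F: "SError interrupt",
}


def decode_esr(esr: int) -> dict:
    """Split an ESR_ELx value into its EC, IL and ISS fields."""
    ec = (esr >> 26) & 0x3F   # Exception Class, bits 31:26
    il = (esr >> 25) & 0x1    # Instruction Length, bit 25
    iss = esr & 0x1FFFFFF     # Instruction Specific Syndrome, bits 24:0
    result = {
        "ec": ec,
        "ec_name": EXCEPTION_CLASSES.get(ec, "other"),
        "il": il,
        "iss": iss,
    }
    if ec == 0x2F:
        # For an SError, ISS bit 24 (IDS) marks an implementation-defined syndrome.
        result["ids"] = (iss >> 24) & 0x1
    return result


fields = decode_esr(0xBF000000)  # esr_el3 from the log above
print(hex(fields["ec"]), fields["ec_name"], hex(fields["iss"]))
```

For the value in this log the exception class comes out as 0x2f (SError interrupt) with the implementation-defined syndrome bit set. That is consistent with an asynchronous external abort rather than a synchronous fault at a known address, which would also fit `far_el3` being 0.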

4. If the PCIe device is unplugged from slot 1 or 2, the kernel boots normally.

root@j7-evm:~# uname -a
Linux j7-evm 5.10.41-g4c2eade9f7 #1 SMP PREEMPT Wed Aug 4 22:47:28 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux

5. If the kernel is booted from the SD card instead of over NFS, it boots fine even with the PCIe device plugged into slot 1 or 2, and the PCIe devices are enumerated correctly.

  • Xul, 

    Could you run printenv in U-Boot for both your NFS and SD card boots? From your NFS boot log, it seems the kernel runs into a memory violation at the very beginning of the boot. Since the PCIe driver is loaded a little later in the kernel, I would expect some kernel log output to be printed if the PCIe driver caused a memory violation.

    Also, on the successful boot from the SD card, can you run lsmod and see which kernel modules are loaded during boot? In previous SDK releases I had to remove sa2_ul and crypto-related kernel modules for NFS. That should be fixed by now.

    Another test that may help with debugging, in case you have not tried it, is to load the kernel over TFTP but leave the file system on the SD card, and then see whether leaving the PCIe card in the slot still causes the problem.

    regards

    Jian

  • Hi, 

    Thanks for your reply.

    I have attached the log information you requested (printenv, lsmod and lspci). It shows that the e1000e PCIe card is enumerated fine when booting from the SD card, but the boot hangs over NFS.

    I had already tried the test you mentioned, loading the kernel over TFTP and leaving the file system on the SD card. The result is the same: the boot hangs in the kernel, before it reaches the stage of mounting the file system.

    In addition, we use the standard development board (J721E-EVM) and see this problem on several boards. It should be easy to reproduce on your side.

    Thanks

    Xulin

    U-Boot SPL 2021.01-g53e79d0e89 (Aug 04 2021 - 23:32:00 +0000)
    Model: Texas Instruments K3 J721E SoC
    Board: J721EX-PM2-SOM rev E7
    SYSFW ABI: 3.1 (firmware rev 0x0015 '21.5.0--v2021.05 (Terrific Llam')
    Trying to boot from MMC2
    Starting ATF on ARM64 core...
    
    NOTICE:  BL31: v2.5(release):08.00.00.004-dirty
    NOTICE:  BL31: Built : 22:30:09, Aug  4 2021
    
    U-Boot SPL 2021.01-g53e79d0e89 (Aug 04 2021 - 22:33:28 +0000)
    Model: Texas Instruments K3 J721E SoC
    Board: J721EX-PM2-SOM rev E7
    SYSFW ABI: 3.1 (firmware rev 0x0015 '21.5.0--v2021.05 (Terrific Llam')
    Detected: J7X-BASE-CPB rev E3
    Detected: J7X-VSC8514-ETH rev E2
    Trying to boot from MMC2
    
    
    U-Boot 2021.01-g53e79d0e89 (Aug 04 2021 - 22:33:28 +0000)
    
    SoC:   J721E SR1.0
    Model: Texas Instruments K3 J721E SoC
    Board: J721EX-PM2-SOM rev E7
    DRAM:  4 GiB
    Flash: 0 Bytes
    MMC:   sdhci@4f80000: 0, sdhci@4fb0000: 1
    In:    serial@2800000
    Out:   serial@2800000
    Err:   serial@2800000
    Detected: J7X-BASE-CPB rev E3
    Detected: J7X-VSC8514-ETH rev E2
    Net:   am65_cpsw_nuss_slave ethernet@46000000: K3 CPSW: nuss_ver: 0x6BA00101 cpsw_ver: 0x6BA80100 ale_ver: 0x00293904 Ports:1 mdio_freq:1000000
    eth0: ethernet@46000000
    Hit any key to stop autoboot:  0 
    => 
    => 
    => 
    => 
    => setenv bootargs 'console=ttyS2,115200 earlycon printk.time=y root=/dev/nfs rw nfsroot=128.224.165.20:/export/pxeboot/vlm-boards/4040505/ti-j7evm-29073/rootfs,v3,tcp ip=dhcp;'
    => setenv ipaddr 128.224.163.75;setenv netmask 255.255.0.0;setenv gatewayip 128.224.163.1;setenv serverip 128.224.165.20;
    => setenv bootcmd 'tftp 0x82000000 vlm-boards/4040505/ti-j7evm-29073/kernel; tftpboot 0x88000000 vlm-boards/4040505/ti-j7evm-29073/dtb;booti 0x82000000 - 0x88000000';
    => printenv 
    addr_fit=0x90000000
    arch=arm
    args_all=setenv optargs earlycon=ns16550a,mmio32,0x02800000 ${mtdparts}
    args_mmc=run finduuid;setenv bootargs console=${console} ${optargs} root=PARTUUID=${uuid} rw rootfstype=${mmcrootfstype}
    args_ufs=setenv devtype scsi;setenv bootpart 1:1;run ufs_finduuid;setenv bootargs console = ${console} ${optargs}root=PARTUUID=${uuid} rw rootfstype=${scsirootfstype};setenv devtype scsi;setenv bootpart 1:1
    baudrate=115200
    board=j721e
    board_name=J721EX-PM2-SOM
    board_rev=E7
    board_serial=0313
    board_software_revision=01
    boot=mmc
    boot_fdt=try
    boot_fit=0
    boot_rprocs=if test ${dorprocboot} -eq 1 && test ${boot} = mmc; then rproc init;run boot_rprocs_mmc;fi;
    boot_rprocs_mmc=env set rproc_id;env set rproc_fw;for i in ${rproc_fw_binaries} ; do if test -z "${rproc_id}" ; then env set rproc_id $i;else env set rproc_fw $i;run rproc_load_and_boot_one;env set rproc_id;env set rproc_fw;fi;done
    bootargs=console=ttyS2,115200 earlycon printk.time=y root=/dev/nfs rw nfsroot=128.224.165.20:/export/pxeboot/vlm-boards/4040505/ti-j7evm-29073/rootfs,v3,tcp ip=dhcp;
    bootcmd=tftp 0x82000000 vlm-boards/4040505/ti-j7evm-29073/kernel; tftpboot 0x88000000 vlm-boards/4040505/ti-j7evm-29073/dtb;booti 0x82000000 - 0x88000000
    bootdelay=2
    bootdir=/boot
    bootenvfile=uEnv.txt
    bootm_size=0x10000000
    bootpart=1:2
    bootscript=echo Running bootscript from mmc${mmcdev} ...; source ${loadaddr}
    console=ttyS2,115200n8
    cpu=armv8
    default_device_tree=k3-j721e-common-proc-board.dtb
    dfu_alt_info_emmc=rawemmc raw 0 0x800000 mmcpart 1;rootfs part 0 1 mmcpart 0;tiboot3.bin.raw raw 0x0 0x400 mmcpart 1;tispl.bin.raw raw 0x400 0x1000 mmcpart 1;u-boot.img.raw raw 0x1400 0x2000 mmcpart 1;u-env.raw raw 0x3400 0x100 mmcpart 1;sysfw.itb.raw raw 0x3600 0x800 mmcpart 1
    dfu_alt_info_mmc=boot part 1 1;rootfs part 1 2;tiboot3.bin fat 1 1;tispl.bin fat 1 1;u-boot.img fat 1 1;uEnv.txt fat 1 1;sysfw.itb fat 1 1
    dfu_alt_info_ospi=tiboot3.bin raw 0x0 0x080000;tispl.bin raw 0x080000 0x200000;u-boot.img raw 0x280000 0x400000;u-boot-env raw 0x680000 0x020000;sysfw.itb raw 0x6c0000 0x100000;rootfs raw 0x800000 0x3800000
    dfu_alt_info_ram=tispl.bin ram 0x80080000 0x100000;u-boot.img ram 0x81000000 0x100000
    dorprocboot=0
    dtboaddr=0x89000000
    envboot=mmc dev ${mmcdev}; if mmc rescan; then echo SD/MMC found on device ${mmcdev};if run loadbootscript; then run bootscript;else if run loadbootenv; then echo Loaded env from ${bootenvfile};run importbootenv;fi;if test -n $uenvcmd; then echo Running uenvcmd ...;run uenvcmd;fi;fi;fi;
    eth1addr=70:ff:76:1d:98:3e
    eth2addr=70:ff:76:1d:98:3f
    eth3addr=70:ff:76:1d:98:40
    eth4addr=70:ff:76:1d:98:41
    ethaddr=50:51:a9:fc:ed:8c
    fdt_addr_r=0x88000000
    fdtaddr=0x88000000
    fdtcontroladdr=fdeb2d90
    findfdt=setenv name_fdt ${default_device_tree};if test $board_name = J721EX-PM1-SOM; then setenv name_fdt k3-j721e-proc-board-tps65917.dtb; fi;if test $board_name = j721e; then setenv name_fdt k3-j721e-common-proc-board.dtb; fi;if test $board_name = j721e-eaik; then setenv name_fdt k3-j721e-eaik.dtb; fi;setenv fdtfile ${name_fdt}
    finduuid=part uuid mmc ${bootpart} uuid
    gatewayip=128.224.163.1
    get_fdt_mmc=load mmc ${bootpart} ${fdtaddr} ${bootdir}/${name_fdt}
    get_fdt_ufs=load ${devtype} ${bootpart} ${fdtaddr} ${bootdir}/${fdtfile}
    get_fit_mmc=load mmc ${bootpart} ${addr_fit} ${bootdir}/${name_fit}
    get_kern_mmc=load mmc ${bootpart} ${loadaddr} ${bootdir}/${name_kern}
    get_kern_ufs=load ${devtype} ${bootpart} ${loadaddr} ${bootdir}/${name_kern}
    get_overlay_mmc=fdt address ${fdtaddr};fdt resize 0x100000;for overlay in $name_overlays;do;load mmc ${bootpart} ${dtboaddr} ${bootdir}/${overlay} && fdt apply ${dtboaddr};done;
    get_overlay_ufs=fdt address ${fdtaddr};fdt resize 0x100000;for overlay in $name_overlays;do;load scsi ${bootpart} ${dtboaddr} ${bootdir}/${overlay} && fdt apply ${dtboaddr};done;
    get_overlaystring=for overlay in $name_overlays;do;setenv overlaystring ${overlaystring}'#'${overlay};done;
    importbootenv=echo Importing environment from mmc${mmcdev} ...; env import -t ${loadaddr} ${filesize}
    init_mmc=run args_all args_mmc
    init_ufs=ufs init; scsi scan; run args_ufs
    ipaddr=128.224.163.75
    kernel_addr_r=0x82000000
    loadaddr=0x82000000
    loadbootenv=fatload mmc ${mmcdev} ${loadaddr} ${bootenvfile}
    loadbootscript=load mmc ${mmcdev} ${loadaddr} boot.scr
    loadfdt=load ${devtype} ${bootpart} ${fdtaddr} ${bootdir}/${fdtfile}
    loadimage=load ${devtype} ${bootpart} ${loadaddr} ${bootdir}/${bootfile}
    mmcboot=mmc dev ${mmcdev}; devnum=${mmcdev}; devtype=mmc; if mmc rescan; then echo SD/MMC found on device ${mmcdev};if run loadimage; then run args_mmc; if test ${boot_fit} -eq 1; then run run_fit; else run mmcloados;fi;fi;fi;
    mmcdev=1
    mmcloados=if test ${boot_fdt} = yes || test ${boot_fdt} = try; then if run loadfdt; then bootz ${loadaddr} - ${fdtaddr}; else if test ${boot_fdt} = try; then bootz; else echo WARN: Cannot load the DT; fi; fi; else bootz; fi;
    mmcrootfstype=ext4 rootwait
    mtdids=nor0=47040000.spi.0,nor0=47034000.hyperbus
    mtdparts=mtdparts=47040000.spi.0:512k(ospi.tiboot3),2m(ospi.tispl),4m(ospi.u-boot),256k(ospi.env),1m(ospi.sysfw),256k(ospi.env.backup),57344k@8m(ospi.rootfs),256k(ospi.phypattern);47034000.hyperbus:512k(hbmc.tiboot3),2m(hbmc.tispl),4m(hbmc.u-boot),256k(hbmc.env),1m(hbmc.sysfw),-@8m(hbmc.rootfs)
    name_fit=fitImage
    name_kern=Image
    netmask=255.255.0.0
    partitions=uuid_disk=${uuid_gpt_disk};name=rootfs,start=0,size=-,uuid=${uuid_gpt_rootfs}
    pxefile_addr_r=0x80100000
    ramdisk_addr_r=0x88080000
    rd_spec=-
    rdaddr=0x88080000
    rproc_fw_binaries=2 /lib/firmware/j7-main-r5f0_0-fw 3 /lib/firmware/j7-main-r5f0_1-fw 4 /lib/firmware/j7-main-r5f1_0-fw 5 /lib/firmware/j7-main-r5f1_1-fw 6 /lib/firmware/j7-c66_0-fw 7 /lib/firmware/j7-c66_1-fw 8 /lib/firmware/j7-c71_0-fw 
    rproc_load_and_boot_one=if load mmc ${bootpart} $loadaddr ${rproc_fw}; then if rproc load ${rproc_id} ${loadaddr} ${filesize}; then rproc start ${rproc_id};fi;fi
    run_fit=bootm ${addr_fit}#${fdtfile}${overlaystring}
    run_kern=booti ${loadaddr} ${rd_spec} ${fdtaddr}
    scriptaddr=0x80000000
    scsirootfstype=ext4 rootwait
    serial#=0000000000000313
    serverip=128.224.165.20
    soc=k3
    stderr=serial@2800000
    stdin=serial@2800000
    stdout=serial@2800000
    ufs_finduuid=part uuid scsi ${bootpart} uuid
    update_to_fit=setenv loadaddr ${addr_fit}; setenv bootfile ${name_fit}
    vendor=ti
    
    Environment size: 6282/131068 bytes
    => boot
    k3-navss-ringacc ringacc@2b800000: Ring Accelerator probed rings:286, gp-rings[96,20] sci-dev-id:235
    k3-navss-ringacc ringacc@2b800000: dma-ring-reset-quirk: disabled
    am65_cpsw_nuss_slave ethernet@46000000: K3 CPSW: rflow_id_base: 2
    link up on port 1, speed 1000, full duplex
    Using ethernet@46000000 device
    TFTP from server 128.224.165.20; our IP address is 128.224.163.75
    Filename 'vlm-boards/4040505/ti-j7evm-29073/kernel'.
    Load address: 0x82000000
    Loading: #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 #################################################################
    	 ###
    	 3 MiB/s
    done
    Bytes transferred = 25803264 (189ba00 hex)
    am65_cpsw_nuss_slave ethernet@46000000: K3 CPSW: rflow_id_base: 2
    link up on port 1, speed 1000, full duplex
    Using ethernet@46000000 device
    TFTP from server 128.224.165.20; our IP address is 128.224.163.75
    Filename 'vlm-boards/4040505/ti-j7evm-29073/dtb'.
    Load address: 0x88000000
    Loading: #######
    	 1.8 MiB/s
    done
    Bytes transferred = 98030 (17eee hex)
    ## Flattened Device Tree blob at 88000000
       Booting using the fdt blob at 0x88000000
       Loading Device Tree to 000000008ffe5000, end 000000008ffffeed ... OK
    
    Starting kernel ...
    
    ERROR:   Unhandled External Abort received on 0x80000001 from S-EL1
    ERROR:   exception reason=0 syndrome=0xbf000000
    Unhandled Exception from EL1
    x0             = 0xfffffe0014ab0000
    x1             = 0x0000000000000000
    x2             = 0xfffffe0016400008
    x3             = 0x0000000000000001
    x4             = 0x000000000000000b
    x5             = 0xfffffc0802f14800
    x6             = 0x0000000000000001
    x7             = 0x0000000000000000
    x8             = 0xfffffe001254f720
    x9             = 0xfffffe001074a630
    x10            = 0x7f7f7f7f7f7f7f7f
    x11            = 0x0101010101010101
    x12            = 0x0000000000000038
    x13            = 0xfffffc080249929a
    x14            = 0xffffffffffffffff
    x15            = 0xfffffc0802499a1c
    x16            = 0x0000000000000001
    x17            = 0x0000000000000001
    x18            = 0xffffffffffffffff
    x19            = 0x0000000000000004
    x20            = 0xfffffe001254f7c4
    x21            = 0x0000000000000087
    x22            = 0xfffffe001254f864
    x23            = 0x0000000000000000
    x24            = 0xfffffe0011a25490
    x25            = 0x0000000000000001
    x26            = 0xfffffc0802f158e8
    x27            = 0xfffffe00117eeeb8
    x28            = 0xfffffc0802f14800
    x29            = 0xfffffe001254f750
    x30            = 0xfffffe001071b094
    scr_el3        = 0x000000000000073d
    sctlr_el3      = 0x0000000030cd183f
    cptr_el3       = 0x0000000000000000
    tcr_el3        = 0x0000000080803520
    daif           = 0x00000000000002c0
    mair_el3       = 0x00000000004404ff
    spsr_el3       = 0x0000000020000085
    elr_el3        = 0xfffffe001071b0ac
    ttbr0_el3      = 0x0000000070010c00
    esr_el3        = 0x00000000bf000000
    far_el3        = 0x0000000000000000
    spsr_el1       = 0x0000000060000005
    elr_el1        = 0xfffffe0010d99158
    spsr_abt       = 0x0000000000000000
    spsr_und       = 0x0000000000000000
    spsr_irq       = 0x0000000000000000
    spsr_fiq       = 0x0000000000000000
    sctlr_el1      = 0x0000000034d4d91d
    actlr_el1      = 0x0000000000000000
    cpacr_el1      = 0x0000000000300000
    csselr_el1     = 0x0000000000000000
    sp_el1         = 0xfffffe001254f750
    esr_el1        = 0x0000000000000000
    ttbr0_el1      = 0x00000000831a0000
    ttbr1_el1      = 0x00000000831b0000
    mair_el1       = 0x000c0400bb44ffff
    amair_el1      = 0x0000000000000000
    tcr_el1        = 0x00000034f5d67596
    tpidr_el1      = 0xfffffe086e770000
    tpidr_el0      = 0x0000000000000000
    tpidrro_el0    = 0x0000000000000000
    par_el1        = 0x0000000000000000
    mpidr_el1      = 0x0000000080000001
    afsr0_el1      = 0x0000000000000000
    afsr1_el1      = 0x0000000000000000
    contextidr_el1 = 0x0000000000000000
    vbar_el1       = 0xfffffe0010010800
    cntp_ctl_el0   = 0x0000000000000005
    cntp_cval_el0  = 0x00000003aad8081e
    cntv_ctl_el0   = 0x0000000000000000
    cntv_cval_el0  = 0x0000000000000000
    cntkctl_el1    = 0x00000000000000d6
    sp_el0         = 0x000000007000abd0
    isr_el1        = 0x0000000000000040
    dacr32_el2     = 0x0000000000000000
    ifsr32_el2     = 0x0000000000000000
    cpuectlr_el1   = 0x0000001b00000040
    cpumerrsr_el1  = 0x0000000000000000
    l2merrsr_el1   = 0x0000000000000000
    
    
    boot-over-sd.log

  • Xulin, 

    I apologize for the extended delay in responding. Could you let me know if there has been any further progress on your end in the meantime? Otherwise, I will work with my colleague from kernel boot to debug the issue.

    Your boot log shows the core took an exception from EL1 at the beginning of kernel execution, so I am not sure that software caused the memory violation. If you happen to have tried a PCIe card other than the E1000, for example an SSD, and still see the issue, please let me know.

    regards

    Jian

  • No. The issue still exists. We use the standard development board (J721E-EVM) and see this problem on several boards. It should be easy to reproduce on your side.

  • Xulin, 

    Do you have an SSD handy to give it a quick try? I don't have a PCIe NIC at hand, so I may try an SSD first. If the boot also fails with an SSD inserted, then I will have the same baseline as you.

    Jian

  • Hi Jian,

    We have tested an NVMe SSD in PCIe slot 3 before, and it worked fine. Please note that the issue occurs only when booting via NFS. When the kernel is booted from the SD card, it boots fine even with a PCIe device plugged into slot 1 or 2, and the PCIe devices are enumerated correctly.

    Thanks

    Xulin

  • Wait, Xulin,

    Just to confirm: are you mounting the NFS file system over the network provided by the PCIe NIC, or using an Ethernet port from CPSWx?

    Jian 

  • As described at the very beginning of the thread, when the issue occurs the boot has not yet reached the stage of mounting the file system; it is still in the kernel initialization phase. We just connect one Ethernet cable to the PCIe NIC port on the board and do not use CPSWx.

  • Hi,

    Is the NFS boot working for you without the PCIe card? This happens very early during kernel boot.
    Does the system boot to the prompt when using the file system on the MMC-SD card?

    - Keerthy

  • Yes, NFS boot works fine without the PCIe card installed.

    If the kernel is booted from the SD card instead of via NFS, it boots fine even with the PCIe device plugged into slot 1 or 2, and the PCIe devices are enumerated correctly.

  • Hi Xulin,

    Do you try to access PCIe very early during the boot? It looks like some module that is not enabled is being accessed.
    What are your bootargs?

    - Keerthy

  • I used the original TI SDK image and U-Boot without any modification, so nothing should access PCIe very early during the boot.
    By the way, the issue still exists with the latest release, TI Linux SDK 08.01.00.07.

    The bootargs I used:


    setenv bootargs 'console=ttyS2,115200 earlycon printk.time=y root=/dev/nfs rw nfsroot=192.168.1.3:/tftp/rootfs,v3,tcp ip=dhcp;'

    setenv ipaddr 192.168.1.2;setenv netmask 255.255.0.0;setenv gatewayip 192.168.1.1;setenv serverip 192.168.1.3;

    setenv bootcmd 'tftp 0x82000000 Image; tftpboot 0x88000000 k3-j721e-common-proc-board.dtb;booti 0x82000000 - 0x88000000';run bootcmd

  • Hi,

    Apologies for the long silence. I am checking internally with our experts. I will get back to you as soon as I have feedback.

    Thanks,
    Keerthy

  • Hi,

    A couple of questions:

    • Does this happen with all PCIe cards connected?
    • Is it possible for you to probe the PERST and REFCLK lines when this happens?

    Best Regards,
    Keerthy

  • I do not have other PCIe cards to test with.

    I used the standard J721E-EVM target board for testing, so the issue should be easy to reproduce on your side.

    Thanks

    Xulin

  • Hi Xulin,

    Thanks. Which exact card causes the crash? Can you please let me know.

    - Keerthy

  • The PCIe card we used: ark.intel.com/.../intel-82574it-gigabit-ethernet-controller.html

    The target we used:

    SoC: J721E SR1.0
    Model: Texas Instruments K3 J721E SoC
    Board: J721EX-PM2-SOM rev E7
    DRAM: 4 GiB
    Flash: 0 Bytes
    MMC: sdhci@4f80000: 0, sdhci@4fb0000: 1
    In: serial@2800000
    Out: serial@2800000
    Err: serial@2800000
    Detected: J7X-BASE-CPB rev E3
    Detected: J7X-VSC8514-ETH rev E2

  • Hi Keerthy,

    I clicked "This resolved my issue" by mistake. The issue is actually not resolved and still exists.

    How can I re-open this question?

    Thanks

    Xulin

  • Hi Xulin,

    No issues. I am checking internally on the above part that you have shared. 

    Apologies for the delay. I am checking internally if we are testing NFS boot with the above part.

    Best Regards,
    Keerthy

  • Hi Xulin,

    Can you confirm that DIP switches 5 and 6 of SW3 are 'OFF'?

    Also, our expert is asking whether you can probe the PERST and REFCLK lines while this crash happens.

    - Keerthy

  • Yes, switches 5 and 6 of SW3 are both in the 'OFF' state.

    I don't think it's a hardware issue, since everything works fine when booting the Linux kernel from the SD card.

    Also, with the preempt-rt kernel of TI SDK 8.1, the issue happens even when booting the Linux kernel from the SD card.

    Thanks

    Xulin

  • Hi,

    An internal defect has been created and this is being debugged. We will keep you posted on updates.

    Best Regards,
    Keerthy

  • Hi,

    Can you try the below diff & let us know if this helps?

    diff --git a/drivers/pci/controller/cadence/pci-j721e.c b/drivers/pci/controller/cadence/pci-j721e.c
    index 40256815e8f1..1882afea3b07 100644
    --- a/drivers/pci/controller/cadence/pci-j721e.c
    +++ b/drivers/pci/controller/cadence/pci-j721e.c
    @@ -427,7 +427,7 @@ static struct pci_ops cdns_ti_pcie_host_ops = {
     
     static const struct j721e_pcie_data j721e_pcie_rc_data = {
     	.mode = PCI_MODE_RC,
    -	.quirk_retrain_flag = true,
    +	.quirk_retrain_flag = false,
     	.is_intc_v1 = true,
     	.byte_access_allowed = false,
     	.linkdown_irq_regfield = LINK_DOWN,

    Regards,
    Keerthy

  • Hi,

    The above change does help. As you know, "quirk_retrain_flag = true" retrains the link to work around the Gen2 training defect, and it is now set to "false".

    So the question is: how do we work around the Gen2 training defect?

    Thanks

    Xulin

  • Hi Xulin,

    Thanks for the quick feedback! The feedback from our PCIe experts is that quirk_retrain_flag = true is needed for GEN2 enumeration, but we currently apply it for all modes (GEN1, GEN2 or GEN3), and the retraining seems to be causing LINKDOWN.

    Is the above change fixing the issue consistently for you? Is it working stably?  How many boots have you tried?

    - Keerthy

  • Thanks for your reply. After trying a few more times, I still see this problem occasionally. It doesn't look like the issue has been fully resolved.

    Thanks

    Xulin

    
    
  • Xulin,

    By any chance, was this ever reproduced on the older 7.3 SDK, if you ever tried it?

    Also, is this reproducible on multiple boards on your side?

    I am basically gathering data points to check whether this was introduced in the newer SDK 8.0.

    - Keerthy

  • Hi Keerthy,


    This problem was never seen with the older 7.3 SDK. I believe it was introduced in SDK 8.0, and it also reproduces on SDK 8.1. If I switch back to SDK 7.3, the issue goes away, and I don't think it depends on the specific target board. One more data point: booting the target board with the SDK 8.1 kernel and the SDK 7.3 dtb also does not show the issue.

    Thanks

    Xulin
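
Since the SDK 8.1 kernel boots with the SDK 7.3 dtb, comparing the two device trees may help narrow down what changed. Below is a minimal sketch, assuming both blobs have already been decompiled to text with `dtc -I dtb -O dts`. The helper name and the default file labels are placeholders made up for illustration.

```python
import difflib


def diff_dts(old_text, new_text, old_name="sdk-7.3.dts", new_name="sdk-8.1.dts"):
    """Return unified-diff lines between two decompiled device trees.

    old_text / new_text are the full contents of the .dts files as strings;
    old_name / new_name are only labels used in the diff header.
    """
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
            n=3,  # lines of context around each change
        )
    )
```

Reading the two `.dts` files into strings and printing the returned lines gives a regular unified diff. The hunks that touch the PCIe controller nodes would be the first place to look.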

  • setenv bootargs 'console=ttyS2,115200 earlycon printk.time=y root=/dev/nfs rw nfsroot=192.168.1.3:/tftp/rootfs,v3,tcp ip=dhcp;'

    setenv ipaddr 192.168.1.2;setenv netmask 255.255.0.0;setenv gatewayip 192.168.1.1;setenv serverip 192.168.1.3;

    setenv bootcmd 'tftp 0x82000000 Image; tftpboot 0x88000000 k3-j721e-common-proc-board.dtb;booti 0x82000000 - 0x88000000';run bootcmd

    Can you also paste the bootargs for the working SD boot? We suspect that some binary is getting overwritten in NFS mode.

    - Keerthy
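
One way to sanity-check the overwrite theory is to compare the address ranges that U-Boot reports in the NFS boot log. The sketch below uses only the load addresses and transfer sizes printed in that log. It does not account for the kernel's run-time footprint or for firmware-reserved memory, neither of which is visible in the log, and the helper name is hypothetical.

```python
# Address ranges taken from the NFS boot log in this thread, as
# (name, start, end_exclusive). Run-time kernel size is NOT included.
REGIONS = [
    ("kernel Image (tftp)", 0x82000000, 0x82000000 + 25803264),
    ("dtb (tftp)", 0x88000000, 0x88000000 + 98030),
    ("relocated device tree", 0x8FFE5000, 0x8FFFFEED + 1),
]


def find_overlaps(regions):
    """Return pairs of region names whose address ranges intersect."""
    clashes = []
    for i, (name_a, start_a, end_a) in enumerate(regions):
        for name_b, start_b, end_b in regions[i + 1:]:
            # Two half-open ranges overlap if each starts before the other ends.
            if start_a < end_b and start_b < end_a:
                clashes.append((name_a, name_b))
    return clashes


print(find_overlaps(REGIONS))
```

For the values in the log this reports no overlap between the downloaded images, although that alone does not rule out other kinds of corruption.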

  • The working SD boot uses the default bootargs installed on the board, without any changes.

    It is not a case of a binary being overwritten; otherwise there would be a related message. I also tried changing the load address, and it did not help.

    And with ".quirk_retrain_flag = false", it works most of the time.

    Thanks

    Xulin