SK-TDA4VM: The EVM board often crashes when running a demo or under no load

Part Number: SK-TDA4VM

Hi,

The customer sees the following errors when using the SK-TDA4VM. With no program running, the board crashes from time to time after power-on. Is this a hardware problem? The two error logs are as follows:

root@tda4vm-sk:~# [ 5894.898990] Unable to handle kernel NULL pointer dereferen0
[ 5894.907770] Mem abort info:
[ 5894.907940] Unable to handle kernel paging request at virtual address ffb8000
[ 5894.910552] ESR = 0x96000046
[ 5894.918446] Adjusting arch_sys_counter more than 11% (651145911 vs 93113548)
[ 5894.921484] EC = 0x25: DABT (current EL), IL = 32 bits
[ 5894.928527] Unable to handle kernel paging request at virtual address ffd2004
[ 5894.933800] SET = 0, FnV = 0
[ 5894.941691] Mem abort info:
[ 5894.944731] EA = 0, S1PTW = 0
[ 5894.947509] ESR = 0x96000004
[ 5894.950634] Data abort info:
[ 5894.953673] EC = 0x25: DABT (current EL), IL = 32 bits
[ 5894.956537] ISV = 0, ISS = 0x00000046
[ 5894.961828] SET = 0, FnV = 0
[ 5894.965646] CM = 0, WnR = 1
[ 5894.968684] EA = 0, S1PTW = 0
[ 5894.968685] Data abort info:
[ 5894.971640] user pgtable: 64k pages, 48-bit VAs, pgdp=00000008a46f7c00
[ 5894.974763] =SV 9 0, ISSx= 0800040004
[ 5894.977629] [0000000000000010] pgd=00000008a7660003
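
The ESR values in the log above can be decoded by hand. The following is a minimal sketch, not kernel code. It assumes the Armv8-A ESR_EL1 layout for data aborts (EC in bits [31:26], IL in bit 25, WnR in bit 6, DFSC in bits [5:0]). The function name `decode_data_abort_esr` is made up for illustration.

```python
# Sketch: decode the fields of an AArch64 ESR_EL1 value for a data abort.
# Assumption: Armv8-A field layout; only the common DFSC fault classes are named.

FAULT_CLASSES = {
    0b0000: "address size fault",
    0b0001: "translation fault",
    0b0010: "access flag fault",
    0b0011: "permission fault",
}


def decode_data_abort_esr(esr: int) -> dict:
    """Split an ESR value into the fields the kernel prints in 'Mem abort info'."""
    ec = (esr >> 26) & 0x3F      # exception class
    il = (esr >> 25) & 0x1       # 1 = 32-bit instruction
    iss = esr & 0x1FFFFFF        # instruction specific syndrome
    wnr = (iss >> 6) & 0x1       # 1 = fault caused by a write
    dfsc = iss & 0x3F            # data fault status code
    fault_class = FAULT_CLASSES.get(dfsc >> 2)
    if fault_class is not None:
        fault = f"{fault_class}, level {dfsc & 0x3}"
    else:
        fault = f"other (DFSC=0x{dfsc:02x})"
    return {"ec": ec, "il": il, "iss": iss, "wnr": wnr, "fault": fault}


# ESR values that appear in the logs of this thread.
for value in (0x96000046, 0x96000004, 0x96000045, 0x96000047):
    print(hex(value), decode_data_abort_esr(value))
```

For 0x96000046 this gives EC 0x25, WnR = 1 and a level 2 translation fault, which agrees with the EC and WnR lines the kernel printed.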

root@tda4vm-sk:/opt/edge_ai_apps/docker# [ 439.450938] Unable to handle kernel0
[ 439.458894] Inyuffncie|t s0ackaspaoe tn ha.dleeexcoption!
[ 439.458895] ESx: 0x96000045 -- DABT (current EL)
[ 439.458896] FAR: 0xff00800011000010
[ 439.458898] Task stack: [0xffff8000131a0000..0xffff8000131b0000]
[ 439.458899] IRQ stack: [0xff00800011000000..0xff00800011010000]
[ 439.458900] Overflow stack: [0xffff00087fa802b0..0xffff00087fa812b0]
[ 439.458902] CPU: 1 PID: 868 Comm: snmpd Tainted: G O 5.10.1001
[ 439.458903] Hardware name: Texas Instruments J721E SK (DT)
[ 439.458904] pstate: 400003c5 (nZcv DAIF -PAN -UAO -TCO BTYPE=--)
[ 439.458905] pc : el1_sync+0x0/0x140
[ 439.458906] lr : el1_irq+0xcc/0x180
[ 439.458907] sp : ff00800011000010
[ 439.458908] x29: ffff8000131af6f0 x28: ffff000826dcb600
[ 439.458911] x27: ffff80001116a2f0 x26: ff00800011010000
[ 439.458914] x25: ff00800011000000 x24: ffff8000112a6c38
[ 439.458916] x23: 0000000040000005 x22: ffff8000100b2ec0
[ 439.458919] x21: ffff8000131af710 x20: 0000ffffffffffff
[ 439.458921] x19: ffff8000131af5c0 x18: 0000000000000010
[ 439.458923] x17: 0000000000000000 x16: 0000000000000000
[ 439.458925] x15: ffff000826dcbb30 x14: 206c656e72656b20
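
The stack ranges and `sp` in the log above can be cross-checked mechanically. The sketch below is only an illustration. The helper names are hypothetical, and the canonical-address check assumes 48-bit kernel virtual addresses (as reported in the first log) with no tagging of kernel pointers.

```python
import re

# Lines copied from the second crash log in this thread.
LOG = """\
[ 439.458898] Task stack: [0xffff8000131a0000..0xffff8000131b0000]
[ 439.458899] IRQ stack: [0xff00800011000000..0xff00800011010000]
[ 439.458900] Overflow stack: [0xffff00087fa802b0..0xffff00087fa812b0]
[ 439.458907] sp : ff00800011000010
"""

RANGE_RE = re.compile(r"(\w+) stack:\s*\[0x([0-9a-f]+)\.\.0x([0-9a-f]+)\]")
SP_RE = re.compile(r"\bsp : ([0-9a-f]+)")


def parse_stack_ranges(log: str) -> dict:
    """Return {stack name: (low, high)} for every '<name> stack: [..]' line."""
    return {m.group(1): (int(m.group(2), 16), int(m.group(3), 16))
            for m in RANGE_RE.finditer(log)}


def stacks_containing(addr: int, ranges: dict) -> list:
    """Names of the stacks whose range holds addr.

    The upper bound is inclusive because an empty descending stack
    has sp equal to the top of the range.
    """
    return [name for name, (low, high) in ranges.items() if low <= addr <= high]


def is_canonical_kernel_va(addr: int, va_bits: int = 48) -> bool:
    """True if every bit above va_bits is set (assumed kernel VA layout)."""
    return addr >> va_bits == (1 << (64 - va_bits)) - 1


ranges = parse_stack_ranges(LOG)
sp = int(SP_RE.search(LOG).group(1), 16)
print("sp lies in:", stacks_containing(sp, ranges))
for name, (low, _high) in ranges.items():
    print(f"{name} stack base canonical: {is_canonical_kernel_va(low)}")
```

Under these assumptions, `sp` falls inside the reported IRQ stack, but the IRQ stack base starts with 0xff00 rather than 0xffff, unlike the task and overflow stacks. That is an observation about the log, not a diagnosis.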

  • Hi Nancy,

    The first thing we can check is if the crash is power or heat related.

    Could you advise the customer to do the following:

    Regards,

    Takuma
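
To check whether the crashes are heat related, the temperature can be logged on the board while it idles. The sketch below reads the standard Linux thermal sysfs files (`/sys/class/thermal/thermal_zone*/temp`, in millidegrees Celsius). Whether the SK image exposes these zones, and under which names, is an assumption. The function names are hypothetical.

```python
import time
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal") -> dict:
    """Return {"<zone>:<type>": degrees Celsius} for each readable thermal zone."""
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            millidegrees = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone disappeared or returned an unreadable value
        readings[f"{zone.name}:{zone_type}"] = millidegrees / 1000.0
    return readings


def log_temperatures(interval_s: float = 5.0, count: int = 12,
                     base="/sys/class/thermal") -> None:
    """Print one timestamped line per sample so the last line before a hang survives."""
    for _ in range(count):
        stamp = time.strftime("%H:%M:%S")
        line = ", ".join(f"{k}={v:.1f}C" for k, v in read_thermal_zones(base).items())
        print(f"{stamp} {line}", flush=True)
        time.sleep(interval_s)
```

Calling `log_temperatures()` over the serial console and keeping the terminal capture gives the last temperature seen before a crash.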

  • Share the board revision number. There should be a sticker on the underside of the board saying something like PROC112A.

    PROC112A1

    root@tda4vm-sk:/opt/edge_ai_apps/apps_python#
    root@tda4vm-sk:/opt/edge_ai_apps/apps_python#
    root@tda4vm-sk:/opt/edge_ai_apps/apps_python#
    root@tda4vm-sk:/opt/edge_ai_apps/apps_python# [ 1033.118359] Insufficient stack!
    [ 1033.118361] ESR: 0x96000047 -- DABT (current EL)
    [ 1033.118362] FAR: 0xffff800011ffffe0
    [ 1033.118363] Task stack: [0xffff8000116a0000..0xffff8000116b0000]
    [ 1033.118364] IRQ stack: [0xffff800011ff0000..0xffff800012000000]
    [ 1033.118366] Overflow stack: [0xffff00087fa802b0..0xffff00087fa812b0]
    [ 1033.118367] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G O 5.10.11
    [ 1033.118369] Hardware name: Texas Instruments J721E SK (DT)
    [ 1033.118370] pstate: 40000085 (nZcv daIf -PAN -UAO -TCO BTYPE=--)
    [ 1033.118371] pc : gic_handle_irq+0x4/0x128
    [ 1033.118372] lr : el1_irq+0xcc/0x180
    [ 1033.118373] sp : ffff800012000000
    [ 1033.118374] x29: ffff8000116aff40 x28: ffff00082016a800
    [ 1033.118377] x27: 0000000000000000 x26: ffff800012000000
    [ 1033.118379] x25: ffff800011ff0000 x24: 00000000000000e0
    [ 1033.118382] x23: 0000000040000005 x22: ffff800010a84398
    [ 1033.118384] x21: ffff8000116aff60 x20: 0000ffffffffffff
    [ 1033.118387] x19: ffff8000116afe10 x18: 0000000000000000

    Summary of CPU load,
    ====================
    CPU: mpu1_0: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
    CPU: mcu2_0: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
    CPU: mcu2_1: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
    CPU: c6x_1: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
    CPU: c6x_2: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
    CPU: c7x_1: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
    HWA performance statistics,
    ===========================
    DDR performance statistics,
    ===========================
    DDR: READ BW: AVG = 514 MB/s, PEAK = 2326 MB/s
    DDR: WRITE BW: AVG = 9 MB/s, PEAK = 1103 MB/s

    Ensure the power supply being used has a power rating similar to that of the recommended power supply (20V/65W): https://www.digikey.com/en/products/detail/qualtek/QADC-65-20-08CB/9771104

    The power supply conforms to the rated 20V/65W, and its maximum output is 20V/3.25A.
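
The quoted rating can be cross-checked with P = V * I. This is a tiny sketch that uses only the two figures given in the reply (20 V and 3.25 A) and the recommended 65 W rating. The function name is hypothetical.

```python
def supply_power_w(volts: float, amps: float) -> float:
    """Return DC output power in watts (P = V * I)."""
    return volts * amps


RECOMMENDED_W = 65.0                # recommended supply rating from this thread
power = supply_power_w(20.0, 3.25)  # figures quoted for the customer's supply
print(f"{power:.2f} W, meets recommendation: {power >= RECOMMENDED_W}")
```

This only confirms that the nameplate figures are consistent with each other. It says nothing about how the supply behaves under load.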

  • The customer re-flashed the SD card image (ti-processor-sdk-linux-sk-tda4vm-etcher-image.zip, 1449343 K) to reproduce the bug, ran the single-input multi-inference demo, and recorded the temperature log. After a few minutes the display output appeared dead, but minicom still showed the demo running and the temperature monitoring log continued. He then quit the demo. Re-entering any demo failed with Segmentation fault (core dumped) errors (listed below), so no demo could be run. After a reset the demo can run, but the board crashes from time to time. The latest crash log and the corresponding no-load temperature are listed below. When the board crashes, the monitor still displays the wallpaper, but no operations can be performed, including control over UART and SSH.

    Segmentation fault (core dumped):

    lassification.yamlt/edge_ai_apps/apps_python# ./app_edgeai.py ../configs/image_
    2022-04-18 21:16:39,252 INFO Could not find libdlr.so in model artifact. Using o
    APP: Init ... !!!
    MEM: Init ... !!!
    MEM: Initialized DMA HEAP (fd=4) !!!
    MEM: Init ... Done !!!
    IPC: Init ... !!!
    IPC: Init ... Done !!!
    REMOTE_SERVICE: Init ... !!!
    REMOTE_SERVICE: Init ... Done !!!
    870.565568 s: GTC Frequency = 200 MHz
    APP: Init ... Done !!!
    870.565631 s: VX_ZONE_INIT:Enabled
    870.565640 s: VX_ZONE_ERROR:Enabled
    870.565646 s: VX_ZONE_WARNING:Enabled
    870.566199 s: VX_ZONE_INIT:[tivxInitLocal:130] Initialization Done !!!
    870.566450 s: VX_ZONE_INIT:[tivxHostInitLocal:86] Initialization Done for H!
    870.566753 s: VX_ZONE_ERROR:[ownContextCreateCmdObj:161] context object desd
    870.566766 s: VX_ZONE_ERROR:[vxCreateContext:946] context objection creatiod
    870.566776 s: VX_ZONE_ERROR:[vxGetStatus:713] Reference is NULL
    870.566783 s: VX_ZONE_ERROR:[tivxAddKernelTIDL:233] Unable to allocate userD

    Crashes from time to time after reset (this time it crashed at login after the reset):

    ***************************************************************
    ***************************************************************
    [ OK ] Started Print notice about GPLv3 packages.
    [ OK ] Started weston.service.
    Starting DEMO...
    Starting telnetd.service...
    [ OK ] Started DEMO.
    [ OK ] Started telnetd.service.
    [ 15.936058] PVR_K: 1047: RGX Firmware image 'rgx.fw.22.104.208.318' loaded
    [ 16.768461] am65-cpsw-nuss 46000000.ethernet eth0: Link is Up - 1Gbps/Full -f
    [ 16.777005] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
    _____ _____ _ _
    | _ |___ ___ ___ ___ | _ |___ ___ |_|___ ___| |_
    | | _| .'| . | . | | __| _| . | | | -_| _| _|
    |__|__|_| |__,|_ |___| |__| |_| |___|_| |___|___|_|
    |___| |___|
    Arago Project http://arago-project.org tda4vm-sk ttyS2
    Arago 2021.09 tda4vm-sk ttyS2

  • Hi Nancy,

    The board crashing after logging in and running no applications is not a known issue - we have not been able to reproduce this behavior on our boards so far.

    For the errors from the demo, we have seen some instabilities using file input on some of our demos which we are working to fix. 

    A couple of questions for the customer:

    • Are you using file input, or a live input from a camera when running the demo?
    • After power-on and running no application, if you boot 10 times, how many times is the crash observed when sitting idle at the login screen? And how long does it take?
    • If you have multiple boards, can this be observed on all boards?
    • If you have multiple SD cards, can this be observed on all SD cards?
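
The repeated-boot test in the second question can be tallied with something like the sketch below. The `BootTrial` record and `summarize` helper are hypothetical. The sample values are placeholders for illustration and are not measurements from this thread.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class BootTrial:
    crashed: bool
    seconds_to_crash: Optional[float] = None  # None if the board stayed up


def summarize(trials: List[BootTrial]) -> dict:
    """Count crashes and report the shortest and mean time to crash."""
    times = [t.seconds_to_crash for t in trials
             if t.crashed and t.seconds_to_crash is not None]
    crashes = sum(1 for t in trials if t.crashed)
    return {
        "boots": len(trials),
        "crashes": crashes,
        "crash_rate": crashes / len(trials) if trials else 0.0,
        "min_s": min(times) if times else None,
        "mean_s": mean(times) if times else None,
    }


# Hypothetical placeholder results, NOT data reported by the customer.
example = [BootTrial(True, 180.0), BootTrial(False),
           BootTrial(True, 420.0), BootTrial(False)]
print(summarize(example))
```

Recording the same fields for each board and each SD card makes the answers to the last two questions directly comparable.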

    Regards,

    Takuma

  • Hi Takuma,

    Got some updates from the customer side as follows:

    Are you using file input, or a live input from a camera when running the demo?

    The customer uses file input when running the demo, but the issue does not occur only when running the demo.

    After power-on and running no application, if you boot 10 times, how many times is the crash observed when sitting idle at the login screen? And how long does it take?

    As you said, after power-on with no application running, the crash occurs occasionally (at random). In the previous tests, the shortest time to a crash was about three minutes after power-on. I'm afraid they are not able to boot 10 times since they don't have much time.

    If you have multiple boards, can this be observed on all boards?

    They do have multiple boards. Using the same SD card image and the same peripherals as on the bad board, and only swapping in another board of the same model, they do not observe a similar issue. So the customer suspects that the board has a hardware problem.

    If you have multiple SD cards, can this be observed on all SD cards?

    They tried a different SD card, but the issue still occurs.

    Assuming the following procedure:

    A. After flashing the SD card image, insert the card into the development board and power on.
    B. Run the demo for about ten minutes.
    C. The display gets stuck, the demo is terminated via the keyboard, and re-running the program fails.
    D. After rebooting the board, the system crashes periodically.

    The point is that, in the same environment, the issue does not occur on other development boards of the same type. Could you please help determine whether there is a quality problem with this board?

    Thanks and regards,

    Cherry

  • Hi,

    May I know if there are any updates?

    Thanks and regards,

    Cherry

  • Hi Cherry,

    I am currently checking with the HW apps team about the process for determining and reporting a defective board.

    Can you confirm that the random kernel crashes are seen only on 1 "bad" board and switching boards fixes the issue? Or do multiple boards exhibit the same failure?

    Regards,

    Takuma

  • Cherry, 

    The Starter Kit looks to be defective. Have your customer go to: https://www.ti.com/productreturns/docs/createReturn.tsp

    and that should get the process started.

    Thanks,

    Alec