SK-TDA4VM: Board is freezing and losing SSH/serial connection

Part Number: SK-TDA4VM

Hello,

I am experiencing issues with my SK-TDA4VM board. Sometimes, while using it normally, the board seems to freeze completely. When this happens, I lose the SSH connection and, at the same time, the serial console also stops responding.

This issue happens repeatedly, usually when I am running something on the board. For example, I was transferring some files through the SSH connection, and the board froze several times during the process. I tested it multiple times and the same issue occurred.

More recently, I tried using ROS inside Docker with the TI Robotics SDK. When I run one of the commands from the documentation to start ROS, the board freezes again in the same way. Both SSH and serial communication are lost.

I already tried reinstalling the Linux SDK for Edge AI several times, thinking it could be a software issue. I even tested with an older release, but the same problem occurred, so I went back to the latest version (11.00.00.08).

I am not sure if this could be a hardware problem with my board or something else. Any suggestions on how I can debug or fix this?


Best regards,
Heverton

  • Hi Heverton,

    Can you please monitor and report the temperature of the SoC?

  • Hi Mark,

    I monitored the SoC temperature while running a workload and captured the values right at the moment the board froze (I was running a colcon build at that time). Here are the readings from /sys/class/thermal/thermal_zone*/temp (reported in millidegrees Celsius), together with the type of each thermal zone:

    • thermal_zone0 (wkup-thermal): 46194 (46.2 °C)

    • thermal_zone1 (mpu-thermal): 47386 (47.4 °C)

    • thermal_zone2 (c7x-thermal): 49046 (49.0 °C)

    • thermal_zone3 (gpu-thermal): 49046 (49.0 °C)

    • thermal_zone4 (r5f-thermal): 47624 (47.6 °C)

    Please let me know if these temperatures are within the expected range. Also, if there is any other recommended method to measure or monitor the temperature of the board, I would appreciate your guidance.
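
    For continuous monitoring, the same sysfs files can be sampled into a log file on the board, so the readings from the moments before a freeze can still be read after a power cycle. A minimal sketch, assuming Python 3 is available on the board; the function names, the thermal.log file name, and the 1 s interval are hypothetical choices, not part of the TI SDK:

```python
import glob
import os
import time

THERMAL_BASE = "/sys/class/thermal"  # standard Linux thermal sysfs location


def read_zones(base=THERMAL_BASE):
    """Return (zone, type, degrees_C) for every thermal_zone* under base.

    sysfs reports 'temp' in millidegrees Celsius, e.g. 46194 -> 46.194 C.
    """
    zones = []
    for path in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(path, "type")) as f:
                zone_type = f.read().strip()
            with open(os.path.join(path, "temp")) as f:
                millideg = int(f.read().strip())
        except (OSError, ValueError):
            continue  # zone without a readable type/temp: skip it
        zones.append((os.path.basename(path), zone_type, millideg / 1000.0))
    return zones


def log_samples(logfile="thermal.log", interval_s=1.0, count=None,
                base=THERMAL_BASE):
    """Append one line per sample (forever when count is None).

    Each line is flushed and fsync'ed so the last readings before a hard
    freeze have already been pushed to storage when the board stops responding.
    """
    taken = 0
    with open(logfile, "a") as out:
        while count is None or taken < count:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            fields = " ".join(
                f"{name}={deg:.1f}C" for _, name, deg in read_zones(base)
            )
            out.write(f"{stamp} {fields}\n")
            out.flush()
            os.fsync(out.fileno())
            taken += 1
            if count is None or taken < count:
                time.sleep(interval_s)
```

    For example, start log_samples() from a second shell before reproducing the freeze, then inspect the tail of thermal.log after rebooting. Calling os.fsync() after every line costs some extra SD card writes, but avoids losing the last samples in the page cache.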

    Best regards,
    Heverton

  • Hi Heverton,

    Thanks for checking. 47-49 °C is normal for operation at room temperature.

    Are you using an SD card or eMMC?

  • Additionally, can you please share or check the following:

  • Hi Mark,

    Here are the details you asked for:

    • I am currently using an SD card.

    • The board revision number reported in U-Boot is A1.

    • For the power supply, I am using a 20V/65W adapter (not exactly the same model as in the link you provided, but with the same specs).

    • Regarding the performance logs: I recorded a short video to better capture the behavior and extracted two frames from it:

      Frame 1, just before the freeze: the C6x and C7x cores show extremely high, abnormal load values.

      Frame 2, the exact moment of the freeze: the MPU shows 100% load, while the other cores look normal.


    Best regards,
    Heverton

  • Hi Mark,

    I also wanted to share another detail I found while testing a different SK-TDA4VM board with the same SD card. On this second board, the system froze again while running a colcon build. This time the SSH connection was lost, but the serial console remained active, and I was able to capture a memory-related error message right before the crash:

    Finished <<< ti_mmwave_rospkg [10.5s]
    Starting >>> ti_ros_gst_plugins
    [Processing: ti_objdet_range, ti_ros_gst_plugins]
    [  175.145420] node invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
    gins:build 40% - 30.2s]
    [  175.159842] CPU: 1 UID: 0 PID: 2116 Comm: node Tainted: G           O       6.12.17-ti-00773-gcdcaeac783e3-dirty #1
    [  175.159857] Tainted: [O]=OOT_MODULE
    [  175.159859] Hardware name: Texas Instruments J721E SK (DT)
    [  175.159863] Call trace:
    [  175.159865]  dump_backtrace+0x90/0xe8
    [  175.159880]  show_stack+0x18/0x24
    [  175.159884]  dump_stack_lvl+0x74/0x8c
    [1min 4.0s]
    [  175.159892]  dump_stack+0x18/0x24
    [11/17 complete] [2 ongoing] [ti_objdet_range:build 50% - 37.2s]
    [  175.159895]  dump_header+0x3c/0x1a0
    [  175.159903]  oom_kill_process+0x130/0x348
    [  175.159908]  out_of_memory+0xdc/0x344
    [  175.159913]  __alloc_pages_noprof+0xb08/0xce8
    [  175.159918]  get_free_pages_noprof+0x1c/0x54
    [  175.159922]  proc_pid_cmdline_read+0x1e8/0x3d4
    [  175.159928]  vfs_read+0xc4/0x320
    [  175.159934]  ksys_read+0x74/0x10c
    [  175.159937]  __arm64_sys_read+0x1c/0x28
    [  175.159941]  invoke_syscall+0x48/0x10c
    [  175.159947]  el0_svc_common.constprop.0+0xc0/0xe0
    [  175.159952]  do_el0_svc+0x1c/0x28
    [  175.159955]  el0_svc+0x28/0x98
    [  175.159960]  el0t_64_sync_handler+0x120/0x12c
    [  175.159965]  el0t_64_sync+0x190/0x194
    [  175.159969] Mem-Info:
    [  175.327392] active_anon:275 inactive_anon:506669 isolated_anon:0
    [  175.327392]  active_file:1484 inactive_file:1388 isolated_file:32
    [  175.327392]  unevictable:0 dirty:17 writeback:0
    [  175.327392]  slab_reclaimable:7717 slab_unreclaimable:8015
    [  175.327392]  mapped:2733 shmem:2532 pagetables:3817
    [  175.327392]  sec_pagetables:0 bounce:0
    [  175.327392]  kernel_misc_reclaimable:0
    [  175.327392]  free:14304 free_pcp:326 free_cma:1388
    [  175.366947] Node 0 active_anon:1100kB inactive_anon:2027516kB active_file:1252kB inactive_file:1560kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2196kB dirty:68kB writeback:0kB shmem:10128kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:1075200kB writeback_tmp:0kB kernel_stack:4664kB pagetables:15268kB sec_pagetables:0kB all_unreclaimable? no
    [  175.398743] DMA free:19824kB boost:0kB min:17000kB low:21248kB high:25496kB reserved_highatomic:0KB active_anon:0kB inactive_anon:805260kB active_file:248kB inactive_file:204kB unevictable:0kB writepending:4kB present:2097152kB managed:840380kB mlocked:0kB bounce:0kB free_pcp:640kB local_pcp:0kB free_cma:0kB
    [  175.426065] lowmem_reserve[]: 0 0 1349 0
    [  175.430143] Normal free:45456kB boost:0kB min:28052kB low:35064kB high:42076kB reserved_highatomic:0KB active_anon:1100kB inactive_anon:1221332kB active_file:1596kB inactive_file:1636kB unevictable:0kB writepending:232kB present:2097152kB managed:1382292kB mlocked:0kB bounce:0kB free_pcp:624kB local_pcp:208kB free_cma:7568kB
    [  175.459040] lowmem_reserve[]: 0 0 0 0
    [  175.462736] DMA: 573*4kB (UME) 318*8kB (UME) 180*16kB (UME) 97*32kB (UME) 41*64kB (UME) 13*128kB (UME) 5*256kB (UME) 3*512kB (UME) 0*1024kB 1*2048kB (M) 0*4096kB = 19972kB
    [  175.478075] Normal: 3131*4kB (UMEC) 1299*8kB (UMEC) 575*16kB (UMEC) 188*32kB (UMEC) 73*64kB (UMEC) 14*128kB (UME) 2*256kB (E) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 45108kB
    [  175.493318] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
    [  175.502031] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB
    [  175.510567] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
    [  175.519143] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB
    [  175.527430] 3492 total pagecache pages
    [  175.531173] 0 pages in swap cache
    [  175.534514] Free swap  = 0kB
    [  175.537396] Total swap = 0kB
    [  175.540302] 1048576 pages RAM
    [  175.543259] 0 pages HighMem/MovableOnly
    [  175.547126] 492908 pages reserved
    [  175.550467] 131072 pages cma reserved
    [  175.554156] 0 pages hwpoisoned
    [  175.557230] Tasks state (memory values in pages):
    [  175.561953] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
    [  175.573024] [    114]   998   114     1163       59       32       27         0    45056        0             0 rpcbind
    [  175.583829] [    115]     0   115    10016      644      288        4       352    86016        0          -250 systemd-journal
    [  175.595308] [    156]     0   156     7639      967      928       39         0    81920        0         -1000 systemd-udevd
    [  175.606645] [    163]   992   163     3827      215      192       23         0    69632        0             0 systemd-resolve
    [  175.618502] [    164]   991   164    22282      224      192       32         0    77824        0             0 systemd-timesyn
    [  175.630384] [    166]     0   166     3712      224      192       32         0    65536        0             0 systemd-userdbd
    [  175.642530] [    168]     0   168     3951      202      192       10         0    73728        0             0 systemd-userwor
    [  175.654256] [    169]     0   169     3952      235      192       43         0    73728        0             0 systemd-userwor
    [  175.675078] [    170]     0   170     3877      237      192       45         0    65536        0             0 systemd-userwor
    [  175.690752] [    311]   999   311     1714      177      122       55         0    49152        0          -900 dbus-broker-lau
    [  175.702360] [    332]   999   332      830      137       84       53         0    49152        0          -900 dbus-broker
    [  175.713634] [    337]     0   337      620       76       32       44         0    40960        0             0 atd
    [  175.724182] [    343]   995   343     1854      176       96       80         0    49152        0             0 avahi-daemon
    [  175.735431] [    345]     0   345      725       79       32       47         0    40960        0             0 crond
    [  175.746056] [    358]     0   358    19707      113       96       17         0    49152        0             0 irqbalance
    [  175.757110] [    361]     0   361      575       45        0       45         0    40960        0             0 mstpd
    [  175.767871] [    363]     0   363     2459      161      128       33         0    49152        0             0 ofonod
    [  175.778587] [    364]   990   364    21630      779      730       49         0    77824        0             0 pulseaudio
    [  175.789659] [    366]     0   366      699       65       32       33         0    40960        0             0 starter
    [  175.800462] [    369]     0   369     3806      236      192       44         0    69632        0             0 systemd-logind
    [  175.811868] [    376]   993   376     4152      262      224       38         0    65536        0             0 systemd-network
    [  175.823379] [    377]   995   377     1854      110       70       40         0    49152        0             0 avahi-daemon
    [  175.834611] [    382]     0   382      875       60        0       60         0    40960        0             0 telnetd
    [  175.845429] [    446]     0   446   266996      453      416       37         0   200704        0             0 charon
    [  175.856155] [    448]     0   448      962       82       64       18         0    45056        0             0 lldpd
    [  175.866781] [    449]     0   449      775       47        0       47         0    45056        0             0 netserver
    [  175.877753] [    456]     0   456     3589      830      800       30         0    65536        0             0 snmpd
    [  175.888391] [    458]     0   458   447661     4201     4101      100         0   270336        0          -999 containerd
    [  175.899463] [    467]     0   467      566       35        0       35         0    40960        0             0 agetty
    [  175.910176] [    470]     0   470     1656       86       64       22         0    57344        0             0 login
    [  175.920807] [    476]   997   476      819       81       33       48         0    45056        0             0 lldpd
    [  175.931438] [    487]     0   487      723       58        0       58         0    40960        0             0 tee-supplicant
    [  175.942839] [   1277]     0  1277     4362      407      384       23         0    73728        0           100 systemd
    [  175.953649] [   1279]     0  1279     4802      343      309       34         0    69632        0           100 (sd-pam)
    [  175.964539] [   1284]     0  1284     1451      549      544        5         0    49152        0             0 sh
    [  175.974906] [   1391]     0  1391      916       71       32       39         0    40960        0             0 docker_run.sh
    [  175.986260] [   1392]     0  1392   442417     2108     2053       55         0   225280        0             0 docker
    [  175.996978] [   1399]     0  1399   490137     7710     7674       36         0   352256        0          -500 dockerd
    [  176.007772] [   1573]     0  1573   512816     1379     1369       10         0   204800        0          -998 containerd-shim
    [  176.019264] [   1594]     0  1594     1135      156      128       28         0    49152        0             0 bash
    [  176.029805] [   1683]     0  1683      743       77       32       45         0    45056        0             0 dropbear
    [  176.040690] [   1684]     0  1684      982      133       96       37         0    40960        0             0 bash
    [  176.051237] [   1730]     0  1730      916       67       32       35         0    40960        0             0 sh
    [  176.061604] [   1740]     0  1740  2890957    18324    18097      227         0  1269760        0             0 node
    [  176.072142] [   1998]     0  1998      556       45        0       45         0    40960        0             0 sleep
    [  176.082765] [   2105]     0  2105  8233509    30232    29969      263         0  3174400        0             0 node
    [  176.093300] [   2116]     0  2116   254747     7804     7749       55         0   958464        0             0 node
    [  176.103843] [   2153]     0  2153     1130      114       96       18         0    40960        0             0 sh
    [  176.114216] [   2169]     0  2169   230396     6054     5797      257         0   815104        0             0 node
    [  176.124757] [   2210]     0  2210   187400     3119     3065       54         0   643072        0             0 node
    [  176.135301] [   2253]     0  2253   122766     1710     1678       32         0   802816        0             0 perf_stats
    [  176.146361] [   2404]     0  2404   159193     7537     7379      158         0   200704        0             0 colcon
    [  176.157076] [   4845]     0  4845    15858      508      480       28         0   114688        0             0 cmake
    [  176.167699] [   4847]     0  4847      750       98       64       34         0    45056        0             0 gmake
    [  176.178327] [   4856]     0  4856      750       97       64       33         0    40960        0             0 gmake
    [  176.188951] [   4873]     0  4873      750       98       64       34         0    49152        0             0 gmake
    [  176.199582] [   4889]     0  4889      894       32       32        0         0    40960        0             0 c++
    [  176.210037] [   4899]     0  4899   179622   164526   164339      187         0  1441792        0             0 cc1plus
    [  176.220831] [   4982]     0  4982    15859      521      497       24         0   114688        0             0 cmake
    [  176.231452] [   4984]     0  4984      750       97       64       33         0    45056        0             0 gmake
    [  176.242208] [   4988]     0  4988      750       64       64        0         0    45056        0             0 gmake
    [  176.252882] [   4992]     0  4992      885      194      192        2         0    45056        0             0 gmake
    [  176.263527] [   4998]     0  4998      894       32       32        0         0    40960        0             0 c++
    [  176.273984] [   4999]     0  4999   128562   119124   118855      269         0  1069056        0             0 cc1plus
    [  176.284786] [   5000]     0  5000      894       45       32       13         0    40960        0             0 c++
    [  176.295240] [   5001]     0  5001   128940   119791   119587      204         0  1069056        0             0 cc1plus
    [  176.306042] [   5184]     0  5184   122766     1710     1678       32         0   802816        0             0 perf_stats
    [  176.317101] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=system-dropbear.slice,mems_allowed=0,global_oom,task_memcg=/system.slice/docker-1f8884a2818c2d0ff55b2e6183c8ab40e9aea337790ef4c3fe0599aae1682008.scope,task=cc1plus,pid=4899,uid=0
    [  176.339208] Out of memory: Killed process 4899 (cc1plus) total-vm:718488kB, anon-rss:657356kB, file-rss:1004kB, shmem-rss:0kB, UID:0 pgtables:1408kB oom_score_adj:0
    --- stderr: ti_objdet_range
    ** WARNING ** io features related to pcap will be disabled
    c++: fatal error: Killed signal terminated program cc1plus
    compilation terminated.
    gmake[2]: *** [CMakeFiles/objdet_disparity_fusion.dir/build.make:76: CMakeFiles/objdet_disparity_fusion.dir/src/objdet_disparity_fusion.cpp.o] Error 1
    gmake[1]: *** [CMakeFiles/Makefile2:137: CMakeFiles/objdet_disparity_fusion.dir/all] Error 2
    gmake: *** [Makefile:146: all] Error 2
    ---
    Failed   <<< ti_objdet_range [41.8s, exited with code 2]
    Aborted  <<< ti_ros_gst_plugins [2min 0s]
    
    Summary: 11 packages finished [2min 34s]
      1 package failed: ti_objdet_range
      1 package aborted: ti_ros_gst_plugins
      4 packages had stderr output: serial ti_external ti_mmwave_rospkg ti_objdet_range
      4 packages not processed
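
    The Tasks state table above lists memory in pages. A minimal sketch for ranking processes by resident memory, assuming 4 kB pages (the log is consistent with this: PID 4899 shows total_vm 179622 pages and total-vm:718488kB, and 179622 × 4 = 718488); parse_tasks and rss_by_name are hypothetical helper names:

```python
import re
from collections import defaultdict

# Page size assumed from the log itself: PID 4899 has total_vm 179622 pages
# and total-vm:718488kB, i.e. 4 kB per page.
PAGE_KB = 4

# "[timestamp] [  pid]" prefix of a Tasks state row; the header row has
# "[  pid  ]" (letters, not digits), so it does not match.
ROW_PREFIX = re.compile(r"\s*\[\s*[\d.]+\]\s+\[\s*(\d+)\]\s+(.*)$")
NUMBER = re.compile(r"-?\d+")


def parse_tasks(log_text):
    """Extract pid, total_vm, rss (both in pages) and name from OOM task rows."""
    tasks = []
    for line in log_text.splitlines():
        m = ROW_PREFIX.match(line)
        if not m:
            continue
        # Columns after the pid: uid tgid total_vm rss rss_anon rss_file
        # rss_shmem pgtables_bytes swapents oom_score_adj name
        cols = m.group(2).split()
        if len(cols) < 11 or not all(NUMBER.fullmatch(c) for c in cols[:10]):
            continue
        tasks.append({
            "pid": int(m.group(1)),
            "total_vm": int(cols[2]),
            "rss": int(cols[3]),
            "name": " ".join(cols[10:]),
        })
    return tasks


def rss_by_name(tasks, page_kb=PAGE_KB):
    """Sum RSS per process name and return [(name, MiB)], largest first."""
    pages = defaultdict(int)
    for task in tasks:
        pages[task["name"]] += task["rss"]
    return sorted(
        ((name, n * page_kb / 1024) for name, n in pages.items()),
        key=lambda item: item[1],
        reverse=True,
    )
```

    Applied to the table above, cc1plus ranks first: the three compiler processes (PIDs 4899, 4999 and 5001) hold 403441 pages, about 1576 MiB. For comparison, Linux manages only about 2.1 GiB here: 1048576 pages of RAM minus 492908 reserved pages leaves 555668 pages (2222672 kB), which matches the managed totals of the DMA and Normal zones (840380 kB + 1382292 kB).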

    I also ran the same performance log test, and the behavior was very similar to that of the first board.

    This seems to reinforce that the issue is not isolated to a single board, since I was able to reproduce similar failures on two different SK-TDA4VM units. I will also try another SD card to rule that out.

    Best regards,
    Heverton

  • Hi Heverton,

    The Robotics SDK/ROS was descoped in the 11.0 SDK. You mentioned that an older release was already tested, but could you try to reproduce the issue using the 10.1 SDK? https://www.ti.com/tool/download/PROCESSOR-SDK-LINUX-SK-TDA4VM/10.01.00.04

    Regarding this part of your original post: "This issue happens repeatedly, usually when I am running something on the board. For example, I was transferring some files through the SSH connection, and the board froze several times during the process. I tested it multiple times and the same issue occurred."

    If possible, could you try reproducing the issue with this SSH file transfer, since it has fewer moving parts than ROS? When the issue happens, please share the command that was run, the file that was sent (or a description of it, including size, file type, whether it is a folder, etc.), and any logs you can gather. Also, please let us know how frequently or easily the issue reproduces, for example whether it takes 1 hour to reproduce or only happens 1 out of 20 attempts.

    Essentially, any information that you think may be useful in reproducing the issue at our benches would be appreciated.

    Regards,

    Takuma

  • Hi Takuma,

    Thanks for the follow-up. I did some tests with data transfer over SSH as you suggested:

    When I transfer files via command line (using scp), everything works fine.

    However, when I transfer the same files using VSCode remote support (drag-and-drop onto the board), the board eventually freezes. For example, I tried transferring a folder containing around 1000 JPG images (8.58 GB total). With scp, the transfer completed successfully, but with VSCode the board consistently froze after 250 images.

    That said, the bigger issue for me is still with ROS. I tried what you recommended and switched to the 10.01 SDK to test it. Unfortunately, the same freezing issue occurs. In this version, I no longer even see the memory-related error message I had before. The board simply freezes (both SSH and serial stop responding).

    For reference, I reproduced these issues on two different SK-TDA4VM boards, both revision A1, and also tested with two different SD cards, with the same results.

    Do you have any suggestions on what else I could try to debug this?
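
    Since the 10.01 freeze no longer prints an OOM message, the console alone cannot confirm or rule out memory exhaustion. One option is to record memory headroom to a file on the board and read it after a power cycle. A minimal sketch, assuming Python 3 is available on the board; the meminfo.log file name, the 1 s interval, and the selected /proc/meminfo fields are hypothetical choices:

```python
import os
import time

# Hypothetical selection of fields to track; all of them are reported in kB.
FIELDS = ("MemTotal", "MemAvailable", "CmaFree", "Committed_AS")


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])
    return info


def log_meminfo(logfile="meminfo.log", interval_s=1.0, count=None,
                source="/proc/meminfo"):
    """Append the selected fields once per interval (forever if count is None),
    fsync'ing each line so it is pushed to storage before a possible freeze."""
    taken = 0
    with open(logfile, "a") as out:
        while count is None or taken < count:
            with open(source) as f:
                info = parse_meminfo(f.read())
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            fields = " ".join(f"{k}={info[k]}kB" for k in FIELDS if k in info)
            out.write(f"{stamp} {fields}\n")
            out.flush()
            os.fsync(out.fileno())
            taken += 1
            if count is None or taken < count:
                time.sleep(interval_s)
```

    Started from a second shell before launching ROS or the VSCode transfer, the last lines of meminfo.log show whether MemAvailable was shrinking toward zero just before the board stopped responding.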

    Best regards,
    Heverton