Hi,
The customer sees the following errors when using the SK-TDA4VM. No program is running, and the board crashes from time to time after power-on. Is this a hardware problem? The two error logs are as follows:
root@tda4vm-sk:~# [ 5894.898990] Unable to handle kernel NULL pointer dereferen0 [ 5894.907770] Mem abort info: [ 5894.907940] Unable to handle kernel paging request at virtual address ffb8000 [ 5894.910552] ESR = 0x96000046 [ 5894.918446] Adjusting arch_sys_counter more than 11% (651145911 vs 93113548) [ 5894.921484] EC = 0x25: DABT (current EL), IL = 32 bits [ 5894.928527] Unable to handle kernel paging request at virtual address ffd2004 [ 5894.933800] SET = 0, FnV = 0 [ 5894.941691] Mem abort info: [ 5894.944731] EA = 0, S1PTW = 0 [ 5894.947509] ESR = 0x96000004 [ 5894.950634] Data abort info: [ 5894.953673] EC = 0x25: DABT (current EL), IL = 32 bits [ 5894.956537] ISV = 0, ISS = 0x00000046 [ 5894.961828] SET = 0, FnV = 0 [ 5894.965646] CM = 0, WnR = 1 [ 5894.968684] EA = 0, S1PTW = 0 [ 5894.968685] Data abort info: [ 5894.971640] user pgtable: 64k pages, 48-bit VAs, pgdp=00000008a46f7c00 [ 5894.974763] =SV 9 0, ISSx= 0800040004 [ 5894.977629] [0000000000000010] pgd=00000008a7660003 [ 5894.984132] CM = 0, WnR = 0 [ 5894.984134] [ffd2000824ffafb4] address between user and kernel address ranges [ 5894.984136] Unable to handle kernel NULL pointer dereference at virtual addr0 [ 5894.984138] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 5894.984138] Mem abort info: [ 5894.984140] ESR = 0x96000006 [ 5894.984141] Modules linked in: [ 5894.984142] EC = 0x25: DABT (current EL), IL = 32 bits [ 5894.984143] xt_conntrack [ 5894.984144] SET = 0, FnV = 0 [ 5894.984145] xt_MASQUERADE [ 5894.984146] EA = 0, S1PTW = 0 [ 5894.984147] xt_addrtype [ 5894.984148] Data abort info: [ 5894.984149] iptable_filter [ 5894.984150] ISV = 0, ISS = 0x00000006 [ 5894.984151] iptable_nat [ 5894.984152] CM = 0, WnR = 0 [ 5894.984152] nf_nat [ 5894.984154] user pgtable: 64k pages, 48-bit VAs, pgdp=00000008a46f7c00 [ 5894.984155] nf_conntrack [ 5894.984156] [0000000000000000] pgd=00000008a7660003 [ 5894.984156] nf_defrag_ipv6 [ 5894.984157] , p4d=00000008a7660003 [ 5894.984158] nf_defrag_ipv4 
[ 5894.984159] , pud=00000008a7660003 [ 5894.984160] libcrc32c [ 5894.984160] , pmd=0000000000000000 [ 5894.984161] ip_tables [ 5894.984162] [ 5894.984163] x_tables bridge stp llc overlay xfrm_user xfrm_algo md5 ecb aes6 [ 5894.984249] CPU: 1 PID: 162 Comm: systemd-journal Tainted: G O 1 [ 5894.984251] Hardware name: Texas Instruments J721E SK (DT) [ 5894.984254] pstate: 80000085 (Nzcv daIf -PAN -UAO -TCO BTYPE=--) [ 5894.984265] pc : _raw_write_lock_irqsave+0x168/0x318 [ 5894.984270] lr : try_to_wake_up+0x5c/0x4e0 [ 5894.984271] sp : ffff8000113afdd0 [ 5894.984273] x29: ffff8000113afdd0 x28: ffff8000100d8ad0 [ 5894.984277] x27: ffff00087fa88300 x26: 0000000000000006 [ 5894.984280] x25: 0000000000000001 x24: ffd2000824ffafb4 [ 5894.984283] x23: 0000055cb0c1537c x22: 0000000000000000 [ 5894.984287] x21: ffff000824c17e00 x20: 0000000000000003 [ 5894.984290] x19: 0000000000000080 x18: 0000000000000010 [ 5894.984293] x17: 0000000000000000 x16: 0000000000000000 [ 5894.984296] x15: 000000385caf6754 x14: 000000000000030c [ 5894.984300] x13: 000000000000030e x12: 0000000000000001 [ 5894.984303] x11: 0000000000000000 x10: 0000000000000001 [ 5894.984306] x9 : 000000000000030e x8 : 000000000102258e [ 5894.984309] x7 : ffff00087fa8b140 x6 : ffff000824484c78 [ 5894.984312] x5 : 0000000000000000 x4 : 0000000000000000 [ 5894.984316] x3 : ffd2000824ffafb4 x2 : 0000000000000001 [ 5894.984319] x1 : 0000000000000000 x0 : 0000000000010003 [ 5894.984323] Call trace: [ 5894.984327] _raw_write_lock_irqsave+0x168/0x318 [ 5894.984329] try_to_wake_up+0x5c/0x4e0 [ 5894.984332] wake_up_process+0x18/0x28 [ 5894.984337] hrtimer_wakeup+0x20/0x38 [ 5894.984340] __hrtimer_run_queues+0x114/0x1b8 [ 5894.984342] hrtimer_interrupt+0xe8/0x248 [ 5894.984347] arch_timer_handler_phys+0x34/0x48 [ 5894.984350] handle_percpu_devid_irq+0x84/0x148 [ 5894.984354] generic_handle_irq+0x30/0x48 [ 5894.984356] __handle_domain_irq+0x64/0xc0 [ 5894.984362] gic_handle_irq+0x58/0x128 [ 5894.984366] el1_irq+0xcc/0x180 [ 
5894.984368] console_unlock+0x268/0x428 [ 5894.984370] vprintk_emit+0x134/0x268 [ 5894.984373] vprintk_default+0x38/0x48 [ 5894.984376] vprintk_func+0xf4/0x2a8 [ 5894.984379] printk+0x60/0x84 [ 5894.984383] die_kernel_fault+0x40/0x78 [ 5894.984385] __do_kernel_fault+0x74/0x150 [ 5894.984387] do_bad_area+0x5c/0x68 [ 5894.984390] do_translation_fault+0x38/0x68 [ 5894.984392] do_mem_abort+0x40/0xa0 [ 5894.984394] el1_abort+0x48/0x70 [ 5894.984397] el1_sync_handler+0xac/0xc8 [ 5894.984399] el1_sync+0x88/0x10 [ 5894.984401] 08ff00800v10f826b8 [ 5894.984406] Code: 451806018d5334611 526b0a90 f8800871 f8850fc60) [ 5894.984413] ---[ end trace 2f5eabcaa9b203ad ]--- [ 5894.984416] Kernel panic - not syncing: Oops: Fatal exception in interrupt [ 5894.984419] SMP: stopping secondary CPUs [ 5896.056422] SMP: failed to stop secondary CPUs 0-1 [ 5896.056429] Kernel Offset: disabled [ 5896.056431] CPU features: 0x0040022,20006008 [ 5896.056433] Memory Limit: none [ 5896.481800] ---[ end Kernel panic - not syncing: Oops: Fatal exception in in-
root@tda4vm-sk:/opt/edge_ai_apps/docker# [ 439.450938] Unable to handle kernel0 [ 439.458894] Inyuffncie|t s0ackaspaoe tn ha.dleeexcoption! [ 439.458895] ESx: 0x96000045 -- DABT (current EL) [ 439.458896] FAR: 0xff00800011000010 [ 439.458898] Task stack: [0xffff8000131a0000..0xffff8000131b0000] [ 439.458899] IRQ stack: [0xff00800011000000..0xff00800011010000] [ 439.458900] Overflow stack: [0xffff00087fa802b0..0xffff00087fa812b0] [ 439.458902] CPU: 1 PID: 868 Comm: snmpd Tainted: G O 5.10.1001 [ 439.458903] Hardware name: Texas Instruments J721E SK (DT) [ 439.458904] pstate: 400003c5 (nZcv DAIF -PAN -UAO -TCO BTYPE=--) [ 439.458905] pc : el1_sync+0x0/0x140 [ 439.458906] lr : el1_irq+0xcc/0x180 [ 439.458907] sp : ff00800011000010 [ 439.458908] x29: ffff8000131af6f0 x28: ffff000826dcb600 [ 439.458911] x27: ffff80001116a2f0 x26: ff00800011010000 [ 439.458914] x25: ff00800011000000 x24: ffff8000112a6c38 [ 439.458916] x23: 0000000040000005 x22: ffff8000100b2ec0 [ 439.458919] x21: ffff8000131af710 x20: 0000ffffffffffff [ 439.458921] x19: ffff8000131af5c0 x18: 0000000000000010 [ 439.458923] x17: 0000000000000000 x16: 0000000000000000 [ 439.458925] x15: ffff000826dcbb30 x14: 206c656e72656b20 [ 439.458928] x13: 656c646e6168206f x12: 6120747365757165 [ 439.458930] x11: 7220676e69676170 x10: 736572646461206c [ 439.458932] x9 : 6175747269762074 x8 : 3038303035323830 [ 439.458934] x7 : 3030303030302073 x6 : ffff8000112a6de7 [ 439.458936] x5 : 0000000000000000 x4 : 0000000000000000 [ 439.458939] x3 : 00000000ffffffff x2 : ffff80086eaa0000 [ 439.458941] x1 : ffff8000104b7ea0 x0 : ffff8000131af5c0 [ 439.458943] Kernel panic - not syncing: kernel stack overflow [ 439.458951] Unable to handle kernel paging request at virtual address ffff800 [ 439.458953] Mem abort info: [ 439.458955] ESR = 0x96000007 [ 439.458957] EC = 0x25: DABT (current EL), IL = 32 bits [ 439.458959] SET = 0, FnV = 0 [ 439.458960] EA = 0, S1PTW = 0 [ 439.458961] Data abort info: [ 439.458963] ISV = 0, ISS = 
0x00000007 [ 439.458964] CM = 0, WnR = 0 [ 439.458967] swapper pgtable: 64k pages, 48-bit VAs, pgdp=0000000082f40000 [ 439.458969] [ffff800011ffb390] pgd=00000008ffff0003, p4d=00000008ffff0003, p0 [ 439.458977] Internal error: Oops: 96000007 [#1] PREEMPT SMP [ 439.458979] Modules linked in: xt_conntrack xt_MASQUERADE xt_addrtype iptabl6 [ 439.459075] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 5.10.11 [ 439.459076] Hardware name: Texas Instruments J721E SK (DT) [ 439.459079] pstate: 80000085 (Nzcv daIf -PAN -UAO -TCO BTYPE=--) [ 439.459089] pc : timekeeping_advance+0x64/0x560 [ 439.459092] lr : timekeeping_advance+0x44/0x560 [ 439.459093] sp : ffff80001138fe60 [ 439.459095] x29: ffff80001138fe60 x28: ffff80001115b980 [ 439.459099] x27: 00000000ffee651c x26: ffff800011390000 [ 439.459102] x25: ffff800011380000 x24: ffff8000112ca000 [ 439.459106] x23: ffff8000112cb000 x22: ffff8000112ca1b8 [ 439.459109] x21: ffff8000112ca1a8 x20: ffff8000112ca080 [ 439.459112] x19: 0000000000000001 x18: 0000000000000000 [ 439.459115] x17: 0000000000000000 x16: 0000000000000000 [ 439.459118] x15: 0000000000000000 x14: 00000000000001fd [ 439.459121] x13: 0000000000000000 x12: 0000000000000001 [ 439.459125] x11: ffff8000112a4000 x10: 00000000000009a0 [ 439.459128] x9 : ffff8000112a44f0 x8 : ffff80001115c380 [ 439.459131] x7 : ffff800011151000 x6 : 0000000100008910 [ 439.459134] x5 : 00ffffffffffffff x4 : 0000000000000000 [ 439.459137] x3 : ffff8000112ca1a8 x2 : 0000000000000001 [ 439.459141] x1 : 0000000000000000 x0 : ffff800011ffb390 [ 439.459145] Call trace: [ 439.459148] timekeeping_advance+0x64/0x560 [ 439.459152] update_wall_time+0x14/0x20 [ 439.459155] tick_do_update_jiffies64.part.0+0xa8/0x118 [ 439.459157] tick_irq_enter+0xc0/0xf0 [ 439.459160] irq_enter_rcu+0x60/0x68 [ 439.459162] irq_enter+0x14/0x20 [ 439.459165] __handle_domain_irq+0x40/0xc0 [ 439.459171] gic_handle_irq+0x58/0x128 [ 439.459173] el1_irq+0xcc/0x180 [ 439.459175] Unable to handle kernel paging request at virtual 
address 0018808 [ 439.459176] Mem abort info: [ 439.459177] ESR = 0x86000004 [ 439.459180] EC = 0x21: IABT (current EL), IL = 32 bits [ 439.459181] SET = 0, FnV = 0 [ 439.459182] EA = 0, S1PTW = 0 [ 439.459184] [0018800010ff4958] address between user and kernel address ranges [ 440.533958] SMP: stopping secondary CPUs [ 440.533960] SMP: failed to stop secondary CPUs 0-1 [ 440.533961] Kernel Offset: disabled [ 440.533961] CPU features: 0x0040022,20006008 [ 440.533963] Memory Limit: none
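For readers comparing the ESR values across these logs, the following is a minimal sketch of how they break down into fields. It assumes the standard AArch64 ESR layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], with WnR in ISS bit 6 and the fault status code in ISS bits [5:0] for aborts). The names `decode_esr` and `fault_status` are hypothetical helpers, and only the exception classes and fault codes that appear in this thread are named.

```python
# Sketch: decode the ESR values printed in the oops logs of this thread.
# Assumes the AArch64 ESR layout: EC = bits [31:26], IL = bit 25,
# ISS = bits [24:0]; for aborts, WnR = ISS bit 6 and the fault status
# code is ISS bits [5:0]. Only codes seen in these logs are named.

EC_NAMES = {
    0x21: "IABT (current EL)",
    0x25: "DABT (current EL)",
}


def fault_status(fsc: int) -> str:
    """Name the fault status codes that appear in these logs."""
    level = fsc & 0x3
    if fsc >> 2 == 0b0001:  # 0b0001LL encodes a translation fault at level LL
        return f"translation fault, level {level}"
    return f"other fault status 0x{fsc:02x}"


def decode_esr(esr: int) -> dict:
    """Split a 32-bit ESR value into the fields the kernel prints."""
    ec = (esr >> 26) & 0x3F
    iss = esr & 0x1FFFFFF
    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not decoded in this sketch"),
        "il_32bit": bool((esr >> 25) & 1),
        "iss": iss,
        "write": bool((iss >> 6) & 1),  # WnR, only meaningful for data aborts
        "fault": fault_status(iss & 0x3F),
    }


if __name__ == "__main__":
    # ESR values copied from the logs above
    for value in (0x96000046, 0x96000004, 0x96000006, 0x96000007, 0x86000004):
        d = decode_esr(value)
        print(f"0x{value:08x}: {d['ec_name']}, {d['fault']}, write={d['write']}")
```

For 0x96000046 this yields EC = 0x25 and ISS = 0x46 with the write bit set, which agrees with the "EC = 0x25: DABT (current EL)", "ISS = 0x00000046" and "WnR = 1" lines in the first log. The decode itself does not say whether the cause is hardware or software; it only shows that the faults are translation faults on addresses the kernel should not have been touching.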
Hi Nancy,
The first thing we can check is whether the crash is power- or heat-related.
Could you advise the customer to do the following:
Regards,
Takuma
Share the board revision number. There should be a sticker on the underside of the board saying something like PROC112A.
PROC112A1
Print out temperature when crash occurs: https://software-dl.ti.com/jacinto7/esd/processor-sdk-linux-sk-tda4vm/08_02_00/exports/docs/performance_visualizer.html#generating-performance-logs
root@tda4vm-sk:/opt/edge_ai_apps/apps_python# root@tda4vm-sk:/opt/edge_ai_apps/apps_python# root@tda4vm-sk:/opt/edge_ai_apps/apps_python# root@tda4vm-sk:/opt/edge_ai_apps/apps_python# [ 1033.118359] Insufficient stack! [ 1033.118361] ESR: 0x96000047 -- DABT (current EL) [ 1033.118362] FAR: 0xffff800011ffffe0 [ 1033.118363] Task stack: [0xffff8000116a0000..0xffff8000116b0000] [ 1033.118364] IRQ stack: [0xffff800011ff0000..0xffff800012000000] [ 1033.118366] Overflow stack: [0xffff00087fa802b0..0xffff00087fa812b0] [ 1033.118367] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G O 5.10.11 [ 1033.118369] Hardware name: Texas Instruments J721E SK (DT) [ 1033.118370] pstate: 40000085 (nZcv daIf -PAN -UAO -TCO BTYPE=--) [ 1033.118371] pc : gic_handle_irq+0x4/0x128 [ 1033.118372] lr : el1_irq+0xcc/0x180 [ 1033.118373] sp : ffff800012000000 [ 1033.118374] x29: ffff8000116aff40 x28: ffff00082016a800 [ 1033.118377] x27: 0000000000000000 x26: ffff800012000000 [ 1033.118379] x25: ffff800011ff0000 x24: 00000000000000e0 [ 1033.118382] x23: 0000000040000005 x22: ffff800010a84398 [ 1033.118384] x21: ffff8000116aff60 x20: 0000ffffffffffff [ 1033.118387] x19: ffff8000116afe10 x18: 0000000000000000 [ 1033.118389] x17: 0000000000000000 x16: 0000000000000000 [ 1033.118391] x15: 0000000000000000 x14: 0000000000000000 [ 1033.118393] x13: 0000000000000001 x12: 0000000000000000 [ 1033.118395] x11: 0000000000000000 x10: 00000000000009a0 [ 1033.118398] x9 : ffff8000116afec0 x8 : ffff00082016b200 [ 1033.118400] x7 : 0000000000000000 x6 : ffff00082016a800 [ 1033.118403] x5 : ffff00087fa8bd40 x4 : ffff00087fa8be40 [ 1033.118405] x3 : 0000000000000000 x2 : 0000000000077796 [ 1033.118407] x1 : ffff8000104b7ea0 x0 : ffff8000116afe10 [ 1033.118410] Kernel panic - not syncing: kernel stack overflow [ 1033.118411] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G O 5.10.11 [ 1033.118413] Hardware name: Texas Instruments J721E SK (DT) [ 1033.118414] Call trace: [ 1033.118415] dump_backtrace+0x0/0x1a0 [ 1033.118416] 
show_stack+0x18/0x68 [ 1033.118417] dump_stack+0xd0/0x12c [ 1033.118418] panic+0x16c/0x334 [ 1033.118419] nmi_panic+0x8c/0x90 [ 1033.118420] handle_bad_stack+0x11c/0x148 [ 1033.118420] __bad_stack+0x9c/0xa0 [ 1033.118422] gic_handle_irq+0x4/0x128 [ 1033.118423] arch_cpu_idle+0x18/0x28 [ 1033.118424] default_idle_call+0x20/0x68 [ 1033.118425] do_idle+0xc0/0x128 [ 1033.118425] cpu_startup_entry+0x24/0x60 [ 1033.118427] secondary_start_kernel+0x14c/0x178 [ 1033.118442] SMP: stopping secondary CPUs [ 1033.118443] Kernel Offset: disabled [ 1033.118444] CPU features: 0x0040022,20006008 [ 1033.118445] Memory Limit: none
Summary of CPU load,
====================
CPU: mpu1_0: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: mcu2_0: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: mcu2_1: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c6x_1: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c6x_2: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c7x_1: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
HWA performance statistics,
===========================
DDR performance statistics,
===========================
DDR: READ BW: AVG = 514 MB/s, PEAK = 2326 MB/s
DDR: WRITE BW: AVG = 9 MB/s, PEAK = 1103 MB/s
DDR: TOTAL BW: AVG = 523 MB/s, PEAK = 3429 MB/s
SoC temperature statistics
==========================
CPU: 80.46 degree Celsius
WKUP: 79.03 degree Celsius
C7X: 80.87 degree Celsius
GPU: 81.27 degree Celsius
R5F: 80.06 degree Celsius

Summary of CPU load,
====================
CPU: mpu1_0: TOTAL LOAD = 1.56 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: mcu2_0: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: mcu2_1: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c6x_1: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c6x_2: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c7x_1: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
HWA performance statistics,
===========================
DDR performance statistics,
===========================
DDR: READ BW: AVG = 513 MB/s, PEAK = 2284 MB/s
DDR: WRITE BW: AVG = 9 MB/s, PEAK = 1218 MB/s
DDR: TOTAL BW: AVG = 522 MB/s, PEAK = 3502 MB/s
SoC temperature statistics
==========================
CPU: 79.65 degree Celsius
WKUP: 79.24 degree Celsius
C7X: 80.87 degree Celsius
GPU: 81.88 degree Celsius
R5F: 80.26 degree Celsius

Summary of CPU load,
====================
CPU: mpu1_0: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: mcu2_0: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: mcu2_1: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c6x_1: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c6x_2: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c7x_1: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
HWA performance statistics,
===========================
DDR performance statistics,
===========================
DDR: READ BW: AVG = 514 MB/s, PEAK = 2032 MB/s
DDR: WRITE BW: AVG = 8 MB/s, PEAK = 984 MB/s
DDR: TOTAL BW: AVG = 522 MB/s, PEAK = 3016 MB/s
SoC temperature statistics
==========================
CPU: 80.46 degree Celsius
WKUP: 79.03 degree Celsius
C7X: 80.46 degree Celsius
GPU: 81.68 degree Celsius
R5F: 79.85 degree Celsius

Summary of CPU load,
====================
CPU: mpu1_0: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: mcu2_0: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: mcu2_1: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c6x_1: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c6x_2: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c7x_1: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
HWA performance statistics,
===========================
DDR performance statistics,
===========================
DDR: READ BW: AVG = 513 MB/s, PEAK = 2341 MB/s
DDR: WRITE BW: AVG = 8 MB/s, PEAK = 1226 MB/s
DDR: TOTAL BW: AVG = 521 MB/s, PEAK = 3567 MB/s
SoC temperature statistics
==========================
CPU: 80.26 degree Celsius
WKUP: 79.03 degree Celsius
C7X: 81.27 degree Celsius
GPU: 81.47 degree Celsius
R5F: 80.26 degree Celsius

Summary of CPU load,
====================
CPU: mpu1_0: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: mcu2_0: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: mcu2_1: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c6x_1: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c6x_2: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c7x_1: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
HWA performance statistics,
===========================
DDR performance statistics,
===========================
DDR: READ BW: AVG = 513 MB/s, PEAK = 1852 MB/s
DDR: WRITE BW: AVG = 8 MB/s, PEAK = 962 MB/s
DDR: TOTAL BW: AVG = 521 MB/s, PEAK = 2814 MB/s
SoC temperature statistics
==========================
CPU: 80.46 degree Celsius
WKUP: 79.65 degree Celsius
C7X: 80.67 degree Celsius
GPU: 81.68 degree Celsius
R5F: 80.26 degree Celsius

Summary of CPU load,
====================
CPU: mpu1_0: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: mcu2_0: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: mcu2_1: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c6x_1: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c6x_2: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c7x_1: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
HWA performance statistics,
===========================
DDR performance statistics,
===========================
DDR: READ BW: AVG = 515 MB/s, PEAK = 2193 MB/s
DDR: WRITE BW: AVG = 8 MB/s, PEAK = 1228 MB/s
DDR: TOTAL BW: AVG = 523 MB/s, PEAK = 3421 MB/s
SoC temperature statistics
==========================
CPU: 80.06 degree Celsius
WKUP: 79.24 degree Celsius
C7X: 81.27 degree Celsius
GPU: 81.88 degree Celsius
R5F: 80.87 degree Celsius

Network error: Software caused connection abort
────────────────────────────────────────────────────────────────────────────────
Session stopped
- Press <return> to exit tab
- Press R to restart session
- Press S to save terminal output to file
Ensure the power supply being used has a power rating similar to that of the recommended power supply (20V/65W): https://www.digikey.com/en/products/detail/qualtek/QADC-65-20-08CB/9771104
The power supply conforms to the rated power (20V/65W), and its maximum output is 20V/3.25A.
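To compare temperature readings across several captures like the one above, a small parser can help. This is a minimal sketch, assuming the log uses the "SENSOR: value degree Celsius" wording shown in this thread; `parse_temperatures` and `summarize` are hypothetical helper names, and no pass/fail threshold is applied because none is stated here.

```python
import re

# Matches entries such as "CPU: 80.46 degree Celsius" from the
# "SoC temperature statistics" section. CPU-load lines such as
# "CPU: mpu1_0: TOTAL LOAD = ..." do not match because no number
# follows the first colon.
TEMP_RE = re.compile(
    r"\b(CPU|WKUP|C7X|GPU|R5F):\s*(-?\d+(?:\.\d+)?)\s*degree Celsius"
)


def parse_temperatures(log_text: str) -> dict:
    """Return {sensor: [readings, ...]} in the order they appear."""
    readings = {}
    for sensor, value in TEMP_RE.findall(log_text):
        readings.setdefault(sensor, []).append(float(value))
    return readings


def summarize(readings: dict) -> dict:
    """Return {sensor: (min, max)} for each sensor seen in the log."""
    return {sensor: (min(vals), max(vals)) for sensor, vals in readings.items()}


if __name__ == "__main__":
    # Two temperature samples copied from the log above
    sample = (
        "CPU: mpu1_0: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % ) "
        "CPU: 80.46 degree Celsius WKUP: 79.03 degree Celsius "
        "C7X: 80.87 degree Celsius GPU: 81.27 degree Celsius "
        "R5F: 80.06 degree Celsius "
        "CPU: 79.65 degree Celsius WKUP: 79.24 degree Celsius "
        "C7X: 80.87 degree Celsius GPU: 81.88 degree Celsius "
        "R5F: 80.26 degree Celsius"
    )
    for sensor, (low, high) in summarize(parse_temperatures(sample)).items():
        print(f"{sensor}: min {low:.2f}, max {high:.2f}")
```

The regular expression works on both the flattened and the line-per-entry form of the log, so a saved terminal capture can be passed in as-is.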
The customer re-flashed the SD card image (ti-processor-sdk-linux-sk-tda4vm-etcher-image.zip, 1449343 K) to reproduce the bug, ran the single-input multi-inference demo, and recorded the temperature log for a few minutes. The display output then appeared to be dead, but minicom still showed the demo running and the temperature monitoring log continued. He then quit the demo. Re-entering any demo failed with a Segmentation fault (core dumped) error (listed below), and the demo could not be run. After a reset, the demo can run, but the board crashes from time to time (the latest crash log and the corresponding no-load temperature are listed below). When it crashes, the monitor still displays the wallpaper, but no operations can be performed, including UART and SSH control.
Segmentation fault (core dumped):
lassification.yamlt/edge_ai_apps/apps_python# ./app_edgeai.py ../configs/image_ 2022-04-18 21:16:39,252 INFO Could not find libdlr.so in model artifact. Using o APP: Init ... !!! MEM: Init ... !!! MEM: Initialized DMA HEAP (fd=4) !!! MEM: Init ... Done !!! IPC: Init ... !!! IPC: Init ... Done !!! REMOTE_SERVICE: Init ... !!! REMOTE_SERVICE: Init ... Done !!! 870.565568 s: GTC Frequency = 200 MHz APP: Init ... Done !!! 870.565631 s: VX_ZONE_INIT:Enabled 870.565640 s: VX_ZONE_ERROR:Enabled 870.565646 s: VX_ZONE_WARNING:Enabled 870.566199 s: VX_ZONE_INIT:[tivxInitLocal:130] Initialization Done !!! 870.566450 s: VX_ZONE_INIT:[tivxHostInitLocal:86] Initialization Done for H! 870.566753 s: VX_ZONE_ERROR:[ownContextCreateCmdObj:161] context object desd 870.566766 s: VX_ZONE_ERROR:[vxCreateContext:946] context objection creatiod 870.566776 s: VX_ZONE_ERROR:[vxGetStatus:713] Reference is NULL 870.566783 s: VX_ZONE_ERROR:[tivxAddKernelTIDL:233] Unable to allocate userD 870.566789 s: VX_ZONE_ERROR:[vxGetStatus:713] Reference is NULL 870.566794 s: VX_ZONE_ERROR:[vxGetStatus:713] Reference is NULL 870.566799 s: VX_ZONE_ERROR:[vxGetStatus:713] Reference is NULL 870.566805 s: VX_ZONE_ERROR:[vxGetStatus:713] Reference is NULL 870.566811 s: VX_ZONE_ERROR:[vxSetReferenceName:659] Invalid reference 870.566819 s: VX_ZONE_ERROR:[tivxMapTensorPatch:611] Invalid tensor referene Segmentation fault (core dumped)
Crashes from time to time after reset (this time it crashed right after logging in following a reset):
*************************************************************** *************************************************************** [ OK ] Started Print notice about GPLv3 packages. [ OK ] Started weston.service. Starting DEMO... Starting telnetd.service... [ OK ] Started DEMO. [ OK ] Started telnetd.service. [ 15.936058] PVR_K: 1047: RGX Firmware image 'rgx.fw.22.104.208.318' loaded [ 16.768461] am65-cpsw-nuss 46000000.ethernet eth0: Link is Up - 1Gbps/Full -f [ 16.777005] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready _____ _____ _ _ | _ |___ ___ ___ ___ | _ |___ ___ |_|___ ___| |_ | | _| .'| . | . | | __| _| . | | | -_| _| _| |__|__|_| |__,|_ |___| |__| |_| |___|_| |___|___|_| |___| |___| Arago Project http://arago-project.org tda4vm-sk ttyS2 Arago 2021.09 tda4vm-sk ttyS2 tda4vm-sk login: root root@tda4vm-sk:/opt/edge_ai_apps# cd [ 27.610545] Unable to handle kernel pag0 [ 27.618454] Mem abort info: [ 27.621237] ESR = 0x96000006 [ 27.624279] EC = 0x25: DABT (current EL), IL = 32 bits [ 27.629575] SET = 0, FnV = 0 [ 27.632616] EA = 0, S1PTW = 0 [ 27.635743] Data abort info: [ 27.638611] ISV = 0, ISS = 0x00000006 [ 27.642432] CM = 0, WnR = 0 [ 27.645387] swapper pgtable: 64k pages, 48-bit VAs, pgdp=0000000082f40000 [ 27.652156] [ffff03702418b920] pgd=00000008fffd0003, p4d=00000008fffd0003, p0 [ 27.662749] Internal error: Oops: 96000006 [#1] PREEMPT SMP [ 27.668303] Modules linked in: bluetooth ecdh_generic ecc rfkill xhci_plat_h6 [ 27.682983] Unable to handle kernel paging request at virtual address ff9f000 [ 27.722389] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 5.10.11 [ 27.730278] Mem abort info: [ 27.738777] Hardware name: Texas Instruments J721E SK (DT) [ 27.741556] ESR = 0x96000004 [ 27.747022] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--) [ 27.750060] EC = 0x25: DABT (current EL), IL = 32 bits [ 27.756054] pc : am65_cpsw_nuss_rx_poll+0xdc/0x370 [ 27.761336] SET = 0, FnV = 0 [ 27.766109] lr : am65_cpsw_nuss_rx_poll+0xc0/0x370 [ 27.769147] EA = 0, 
S1PTW = 0 [ 27.773919] sp : ffff80001138fda0 [ 27.777043] Data abort info: [ 27.780343] x29: ffff80001138fda0 [ 27.783207] ISV = 0, ISS = 0x00000004 [ 27.783209] x28: ffff000824001a40 [ 27.786594] CM = 0, WnR = 0 [ 27.790411] [ 27.793797] [ff9f000820470000] address between user and kernel address ranges [ 27.796748] x27: ffff000824001a30 x26: 0000000000000122 [ 27.810633] x25: 000000000000012c x24: 0000000000000000 [ 27.815928] x23: 0000000000000040 x22: ffff000824120080 [ 27.821224] x21: ffff000824121530 x20: 0000000000000000 [ 27.826519] x19: ffff000824001a00 x18: 0000000000000000 [ 27.831816] x17: 0000000000000000 x16: 0000000000000000 [ 27.837112] x15: 0000000000000000 x14: 0000000000000000 [ 27.842408] x13: 003d090000000000 x12: 00003d0900000000 [ 27.847703] x11: 0000000000000040 x10: ffff8000111da400 [ 27.852998] x9 : ffff8000111da3f8 x8 : ffff000840000270 [ 27.858294] x7 : 0000000000000000 x6 : ffff80001138fe30 [ 27.863590] x5 : 0000000000000008 x4 : ffff000824001a40 [ 27.868885] x3 : 0000000000000058 x2 : 0000000000000368 [ 27.874180] x1 : ffff00082418bc80 x0 : ffff03702418b918 [ 27.879476] Call trace: [ 27.881912] am65_cpsw_nuss_rx_poll+0xdc/0x370 [ 27.886344] net_rx_action+0x118/0x380 [ 27.890080] efi_header_end+0x120/0x268 [ 27.893901] irq_exit+0xc0/0xe0 [ 27.897030] __handle_domain_irq+0x68/0xc0 [ 27.901113] gic_handle_irq+0x58/0x128 [ 27.904847] el1_irq+0xcc/0x180 [ 27.907975] arch_cpu_idle+0x18/0x28 [ 27.911538] default_idle_call+0x20/0x68 [ 27.915446] do_idle+0xc0/0x128 [ 27.918573] cpu_startup_entry+0x28/0x60 [ 27.922481] rest_init+0xd4/0xe4 [ 27.925699] arch_call_rest_init+0x10/0x1c [ 27.929779] start_kernel+0x48c/0x4c4 [ 27.933430] Code: 51000400 b9006fe2 52806d02 9ba20400 (f9400419) [ 27.939509] ---[ end trace b5919ffb93f953aa ]--- [ 27.944110] Kernel panic - not syncing: Oops: Fatal exception in interrupt [ 27.950964] SMP: stopping secondary CPUs [ 29.023873] SMP: failed to stop secondary CPUs 0-1 [ 29.028652] Kernel Offset: disabled [ 
29.032126] CPU features: 0x0040022,20006008 [ 29.036380] Memory Limit: none [ 29.039424] ---[ end Kernel panic - not syncing: Oops: Fatal exception in in-
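Several of the faulting addresses in these logs are reported as "address between user and kernel address ranges". The sketch below shows one way to classify an address under the "48-bit VAs" configuration printed in the logs, assuming the usual arm64 split where user addresses have the upper 16 bits all zero and kernel addresses have them all one. `classify_va` is a hypothetical helper name, not a kernel function.

```python
VA_BITS = 48  # taken from the "48-bit VAs" lines in the logs


def classify_va(addr: int, va_bits: int = VA_BITS) -> str:
    """Classify a 64-bit virtual address by its bits above va_bits."""
    top = addr >> va_bits
    all_ones = (1 << (64 - va_bits)) - 1
    if top == 0:
        return "user range"
    if top == all_ones:
        return "kernel range"
    return "between user and kernel ranges"


if __name__ == "__main__":
    # Faulting addresses copied from the logs in this thread
    for addr in (
        0x0000000000000010,
        0xFFD2000824FFAFB4,
        0x0018800010FF4958,
        0xFF9F000820470000,
        0xFFFF03702418B920,
    ):
        print(f"0x{addr:016x}: {classify_va(addr)}")
```

Under these assumptions, 0xffd2000824ffafb4, 0x0018800010ff4958 and 0xff9f000820470000 fall in neither range, matching the kernel's own message, while 0xffff03702418b920 has a kernel-range prefix and is the one the kernel walked through the swapper page table. Pointers in neither range may point to corrupted values rather than ordinary driver bugs, although that interpretation is a guess from the logs alone.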
Hi Nancy,
The board crashing after logging in and running no applications is not a known issue - we have not been able to reproduce this behavior on our boards so far.
For the errors from the demo, we have seen some instability with file input in some of our demos, which we are working to fix.
A couple of questions for the customer:
Regards,
Takuma
Hi Takuma,
Got some updates from the customer side as follows:
Are you using file input, or a live input from a camera when running the demo?
The customer uses file input when running the demo, but the issue does not occur only when running the demo.
After power-on and running no application, if you boot 10 times, how many times is the crash observed when sitting idle at the login screen? And how long does it take?
As you said, after power-on and with no application running, the crash occurs occasionally (at random). In the previous tests, the shortest time to a crash was about three minutes after power-on. I'm afraid they are not able to boot 10 times, since they don't have much time.
If you have multiple boards, can this be observed on all boards?
They do have multiple boards. For now, using the same SD card image and the same peripherals as on the bad board, and swapping in only another board of the same model, they do not observe a similar issue. So the customer suspects that the board has a hardware problem.
If you have multiple SD cards, can this be observed on all SD cards?
They've tried another SD card, but the issue still happens.
Assume the following procedure:
A. After flashing the SD card image, insert the card into the development board and power on.
B. Run the demo for about ten minutes.
C. The display gets stuck, the demo is terminated via the keyboard, and re-running the program fails.
D. After rebooting the board, the system crashes from time to time.
The point is that, in the same environment, the issue does not occur on other development boards of the same type. Could you please help determine whether this board has a quality problem?
Thanks and regards,
Cherry
Hi Cherry,
I am currently checking with the HW apps team about the process for determining/reporting a defective board.
Can you confirm that the random kernel crashes are seen only on 1 "bad" board and switching boards fixes the issue? Or do multiple boards exhibit the same failure?
Regards,
Takuma
Cherry,
The Starter Kit looks to be defective. Have your customer go to: https://www.ti.com/productreturns/docs/createReturn.tsp
and that should get the process started.
Thanks,
Alec