Hi TI experts,
I have some questions about system exceptions.
Hardware platform: TDA4VH
Software Version : SDK 8.6
Our custom board has two TDA4VH SoCs, each with CPSW2G and CPSW9G enabled. The two SoCs are cascaded through CPSW9G. At the application level, we use NFS for communication between the two SoCs: the NFS service (v3) runs on the master SoC, and the slave SoC mounts the directories exported by the master. However, when the slave SoC performs the NFS mount, the following problem occasionally occurs on the slave SoC. We have collected the logs from several occurrences of the problem, as shown below:
log(1)
[22:47:29:788][1970-01-01 00:00:02][ 3.162][ 2.487517] SError Interrupt on CPU0, code 0xbf000002 -- SError
[22:47:29:788][ 2.487519] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 5.10.162-g76b3e88d56 #1
[22:47:29:788][ 2.487520] Hardware name: HIKAUTO AE-B50038-S (DT)
[22:47:29:789][ 2.487521] pstate: 00000085 (nzcv daIf -PAN -UAO -TCO BTYPE=--)
[22:47:29:789][ 2.487522] pc : _raw_spin_unlock+0x38/0x48
[22:47:29:789][ 2.487523] lr : handle_irq_event+0x40/0xe0
[22:47:29:790][ 2.487524] sp : ffff8000113afae0
[22:47:29:790][ 2.487526] x29: ffff8000113afae0 x28: ffff80001117be40
[22:47:29:791][ 2.487529] x27: ffff00083249a8b0 x26: ffff00083038dc10
[22:47:29:791][ 2.487531] x25: ffff0008324a8280 x24: ffff0008326cce60
[22:47:29:791][ 2.487534] x23: ffff80001117be40 x22: ffff0008326cce00
[22:47:29:792][ 2.487536] x21: ffff0008324248dc x20: ffff0008324248dc
[22:47:29:792][ 2.487538] x19: ffff000832424800 x18: 0000000000000000
[22:47:29:792][ 2.487540] x17: 0000000000000000 x16: 0000000000000000
[22:47:29:792][ 2.487542] x15: 00000544b6901d20 x14: 0000000000000228
[22:47:29:792][ 2.487545] x13: 0000000000000005 x12: ffff8000127e0000
[22:47:29:792][ 2.487547] x11: 0000000000000040 x10: ffff8000111faa48
[22:47:29:793][ 2.487549] x9 : ffff8000111faa40 x8 : ffff00082ee40270
[22:47:29:793][ 2.487551] x7 : 0000000000000000 x6 : 0000000000000000
[22:47:29:793][ 2.487553] x5 : ffff80086e9c0000 x4 : ffff8000113afb10
[22:47:29:793][ 2.487556] x3 : ffff80086e9c0000 x2 : ffff000832424800
[22:47:29:794][ 2.487558] x1 : ffff80001117be40 x0 : 0000000100010101
[22:47:30:426][ 2.487561] Kernel panic - not syncing: Asynchronous SError Interrupt
[22:47:30:426][ 2.487562] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 5.10.162-g76b3e88d56 #1
[22:47:30:426][ 2.487563] Hardware name: HIKAUTO AE-B50038-S (DT)
[22:47:30:427][ 2.487564] Call trace:
[22:47:30:427][ 2.487565] dump_backtrace+0x0/0x1a0
[22:47:30:427][ 2.487566] show_stack+0x18/0x28
[22:47:30:428][ 2.487567] dump_stack+0xd0/0x12c
[22:47:30:428][ 2.487568] panic+0x16c/0x334
[22:47:30:428][ 2.487569] nmi_panic+0x8c/0x90
[22:47:30:428][ 2.487570] arm64_serror_panic+0x78/0x84
[22:47:30:429][ 2.487571] do_serror+0x38/0x98
[22:47:30:429][ 2.487572] el1_error+0x90/0x110
[22:47:30:429][ 2.487573] _raw_spin_unlock+0x38/0x48
[22:47:30:429][ 2.487574] handle_level_irq+0xb8/0x140
[22:47:30:430][ 2.487575] generic_handle_irq+0x30/0x48
[22:47:30:430][ 2.487576] ti_sci_inta_irq_handler+0xc4/0x160
[22:47:30:431][ 2.487577] generic_handle_irq+0x30/0x48
[22:47:30:431][ 2.487578] __handle_domain_irq+0x64/0xc0
[22:47:30:432][ 2.487579] gic_handle_irq+0x58/0x124
[22:47:30:432][ 2.487580] el1_irq+0xcc/0x180
[22:47:30:432][ 2.487580] _raw_spin_unlock_irqrestore+0x3c/0x48
[22:47:30:433][ 2.487581] enable_irq+0x5c/0xa0
[22:47:30:433][ 2.487582] virt_cpsw_nuss_tx_poll+0x1a0/0x2c8
[22:47:30:433][ 2.487583] net_rx_action+0x114/0x380
[22:47:30:434][ 2.487584] efi_header_end+0x120/0x268
[22:47:30:434][ 2.487585] irq_exit+0xc0/0xe0
[22:47:30:435][ 2.487586] __handle_domain_irq+0x68/0xc0
[22:47:30:435][ 2.487587] gic_handle_irq+0x58/0x124
[22:47:30:435][ 2.487588] el1_irq+0xcc/0x180
[22:47:30:436][ 2.487589] arch_cpu_idle+0x18/0x28
[22:47:30:436][ 2.487590] default_idle_call+0x20/0x68
[22:47:30:436][ 2.487591] do_idle+0xc0/0x128
[22:47:30:437][ 2.487592] cpu_startup_entry+0x24/0x60
[22:47:30:437][ 2.487594] rest_init+0xd4/0xe4
[22:47:30:437][ 2.487595] arch_call_rest_init+0x10/0x1c
[22:47:30:438][ 2.487596] start_kernel+0x480/0x4b8
[22:47:30:438][ 2.487613] SMP: stopping secondary CPUs
[22:47:30:438][ 2.487614] Kernel Offset: disabled
[22:47:30:439][ 2.487615] CPU features: 0x28040022,20006008
[22:47:30:439][ 2.487616] Memory Limit: none

log(2)
[00:16:10:935][00[ 2.984265] SError Interrupt on CPU0, code 0xbf000002 -- SError
[00:16:10:935][ 2.984267] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 5.10.162-g76b3e88d56 #1
[00:16:10:937][ 2.984268] Hardware name: HIKAUTO AE-B50038-S (DT)
[00:16:10:937][ 2.984269] pstate: 00000085 (nzcv daIf -PAN -UAO -TCO BTYPE=--)
[00:16:10:937][ 2.984270] pc : _raw_spin_unlock+0x38/0x48
[00:16:10:937][ 2.984271] lr : handle_irq_event+0x40/0xe0
[00:16:10:938][ 2.984272] sp : ffff8000113afed0
[00:16:10:938][ 2.984274] x29: ffff8000113afed0 x28: ffff80001117be40
[00:16:10:938][ 2.984277] x27: 00000000ffeec3ac x26: ffff8000113b0000
[00:16:10:938][ 2.984279] x25: ffff000832347680 x24: ffff00082ee5ea60
[00:16:10:938][ 2.984282] x23: ffff80001117be40 x22: ffff00082ee5ea00
[00:16:10:938][ 2.984284] x21: ffff0008322d56dc x20: ffff0008322d56dc
[00:16:10:939][ 2.984286] x19: ffff0008322d5600 x18: 0000000000000000
[00:16:10:939][ 2.984289] x17: 0000000000000000 x16: 0000000000000000
[00:16:10:939][ 2.984291] x15: 0000fffc112de6d0 x14: 000000000000022d
[00:16:10:939][ 2.984293] x13: 0000000000000000 x12: 0000000000000001
[00:16:10:939][ 2.984296] x11: 0000000000000040 x10: ffff8000111faa48
[00:16:10:939][ 2.984298] x9 : ffff8000111faa40 x8 : ffff00082ee40270
[00:16:10:939][ 2.984300] x7 : 0000000000000000 x6 : 0000000000000000
[00:16:10:940][ 2.984303] x5 : ffff80086e9c0000 x4 : ffff8000113aff00
[00:16:11:580][ 2.984305] x3 : ffff80086e9c0000 x2 : ffff0008322d5600
[00:16:11:580][ 2.984308] x1 : ffff80001117be40 x0 : 0000000100010001
[00:16:11:580][ 2.984310] Kernel panic - not syncing: Asynchronous SError Interrupt
[00:16:11:580][ 2.984312] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 5.10.162-g76b3e88d56 #1
[00:16:11:580][ 2.984313] Hardware name: HIKAUTO AE-B50038-S (DT)
[00:16:11:580][ 2.984314] Call trace:
[00:16:11:580][ 2.984315] dump_backtrace+0x0/0x1a0
[00:16:11:581][ 2.984316] show_stack+0x18/0x28
[00:16:11:581][ 2.984317] dump_stack+0xd0/0x12c
[00:16:11:581][ 2.984318] panic+0x16c/0x334
[00:16:11:581][ 2.984319] nmi_panic+0x8c/0x90
[00:16:11:581][ 2.984320] arm64_serror_panic+0x78/0x84
[00:16:11:581][ 2.984321] do_serror+0x38/0x98
[00:16:11:581][ 2.984322] el1_error+0x90/0x110
[00:16:11:581][ 2.984323] _raw_spin_unlock+0x38/0x48
[00:16:11:581][ 2.984324] handle_level_irq+0xb8/0x140
[00:16:11:581][ 2.984325] generic_handle_irq+0x30/0x48
[00:16:11:581][ 2.984326] ti_sci_inta_irq_handler+0xc4/0x160
[00:16:11:581][ 2.984327] generic_handle_irq+0x30/0x48
[00:16:11:581][ 2.984328] __handle_domain_irq+0x64/0xc0
[00:16:11:581][ 2.984329] gic_handle_irq+0x58/0x124
[00:16:11:581][ 2.984330] el1_irq+0xcc/0x180
[00:16:11:582][ 2.984331] arch_cpu_idle+0x18/0x28
[00:16:11:582][ 2.984332] default_idle_call+0x20/0x68
[00:16:11:582][ 2.984333] do_idle+0xc0/0x128
[00:16:11:582][ 2.984334] cpu_startup_entry+0x28/0x60
[00:16:11:582][ 2.984335] rest_init+0xd4/0xe4
[00:16:11:582][ 2.984336] arch_call_rest_init+0x10/0x1c
[00:16:11:582][ 2.984337] start_kernel+0x480/0x4b8
[00:16:11:582][ 2.984354] SMP: stopping secondary CPUs
[00:16:11:582][ 2.984356] Kernel Offset: disabled
[00:16:11:582][ 2.984356] CPU features: 0x28040022,20006008
[00:16:11:582][ 2.984358] Memory Limit: none

log(3)
[05:33:49:315][1970-01-01 00:00:02][ 3.156]INF[mcu0-0 | 8][MCUCAN [ 2.686770] SError Interrupt on CPU0, code 0xbf000002 -- SError
[05:33:49:316][ 2.686773] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 5.10.162-g76b3e88d56 #1
[05:33:49:316][ 2.686775] Hardware name: HIKAUTO AE-B50038-S (DT)
[05:33:49:316][ 2.686776] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[05:33:49:317][ 2.686777] pc : _raw_spin_unlock_irqrestore+0x3c/0x48
[05:33:49:317][ 2.686778] lr : __irq_put_desc_unlock+0x20/0x50
[05:33:49:318][ 2.686779] sp : ffff8000113afd60
[05:33:49:319][ 2.686780] x29: ffff8000113afd60 x28: ffff000835f88900
[05:33:49:319][ 2.686783] x27: ffff00083038dc10 x26: ffff000835f88800
[05:33:49:320][ 2.686786] x25: ffff00082f8052e0 x24: 0000000000000040
[05:33:49:320][ 2.686788] x23: ffff00082f8052c0 x22: ffff00082efc4000
[05:33:49:320][ 2.686790] x21: 0000000000000036 x20: 0000000000000001
[05:33:49:320][ 2.686792] x19: ffff0008322d4a00 x18: 0000000000000000
[05:33:49:320][ 2.686795] x17: 0000000000000000 x16: 0000000000000000
[05:33:49:321][ 2.686797] x15: 0000000000000000 x14: 000000000000038c
[05:33:49:321][ 2.686800] x13: 0000000000000000 x12: 0000000000000000
[05:33:49:321][ 2.686802] x11: 0000000000000040 x10: ffff8000111faa48
[05:33:49:321][ 2.686804] x9 : ffff8000111faa40 x8 : ffff00082ee40270
[05:33:49:321][ 2.686807] x7 : 0000000000000000 x6 : ffffffffffffffe0
[05:33:49:321][ 2.686809] x5 : ffff000832347680 x4 : 0000000000000000
[05:33:49:322][ 2.686811] x3 : ffffffffffffffe0 x2 : 0000000000000000
[05:33:49:322][ 2.686813] x1 : ffff80001117be40 x0 : 0000000100000101
[05:33:49:322][ 2.686816] Kernel panic - not syncing: Asynchronous SError Interrupt
[05:33:49:636][ 2.686817] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 5.10.162-g76b3e88d56 #1
[05:33:49:637][ 2.686818] Hardware name: HIKAUTO AE-B50038-S (DT)
[05:33:49:637][ 2.686819] Call trace:
[05:33:49:638][ 2.686820] dump_backtrace+0x0/0x1a0
[05:33:49:638][ 2.686821] show_stack+0x18/0x28
[05:33:49:639][ 2.686822] dump_stack+0xd0/0x12c
[05:33:49:639][ 2.686823] panic+0x16c/0x334
[05:33:49:639][ 2.686824] nmi_panic+0x8c/0x90
[05:33:49:640][ 2.686825] arm64_serror_panic+0x78/0x84
[05:33:49:640][ 2.686826] do_serror+0x38/0x98
[05:33:49:640][ 2.686827] el1_error+0x90/0x110
[05:33:49:641][ 2.686828] _raw_spin_unlock_irqrestore+0x3c/0x48
[05:33:49:642][ 2.686829] enable_irq+0x5c/0xa0
[05:33:49:642][ 2.686830] virt_cpsw_nuss_rx_poll+0x23c/0x328
[05:33:49:642][ 2.686830] net_rx_action+0x114/0x380
[05:33:49:643][ 2.686831] efi_header_end+0x120/0x268
[05:33:49:643][ 2.686832] irq_exit+0xc0/0xe0
[05:33:49:644][ 2.686833] __handle_domain_irq+0x68/0xc0
[05:33:49:645][ 2.686834] gic_handle_irq+0x58/0x124
[05:33:49:645][ 2.686835] el1_irq+0xcc/0x180
[05:33:49:646][ 2.686836] arch_cpu_idle+0x18/0x28
[05:33:49:646][ 2.686837] default_idle_call+0x20/0x68
[05:33:49:647][ 2.686838] do_idle+0xc0/0x128
[05:33:49:647][ 2.686839] cpu_startup_entry+0x24/0x60
[05:33:49:648][ 2.686840] rest_init+0xd4/0xe4
[05:33:49:648][ 2.686841] arch_call_rest_init+0x10/0x1c
[05:33:49:649][ 2.686842] start_kernel+0x480/0x4b8
[05:33:49:649][ 2.686860] SMP: stopping secondary CPUs
[05:33:49:650][ 2.686861] Kernel Offset: disabled
[05:33:49:650][ 2.686862] CPU features: 0x28040022,20006008
[05:33:49:651][ 2.686863] Memory Limit: none
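When comparing several panic logs like these, it can help to extract the backtrace symbols and see which frames every instance shares. The following is a minimal sketch of that idea, not a tool from the SDK; the helper names `frames` and `common_frames` are hypothetical, and the two input strings are abbreviated excerpts of log(1) and log(3).

```python
import re

# Matches kernel backtrace frames such as "handle_level_irq+0xb8/0x140".
FRAME_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frames(log_text):
    """Return the symbol names of all backtrace frames, in order of appearance."""
    return FRAME_RE.findall(log_text)


def common_frames(*logs):
    """Return the symbols present in every log, ordered as in the first log."""
    shared = set(frames(logs[0]))
    for log in logs[1:]:
        shared &= set(frames(log))
    ordered = []
    for sym in frames(logs[0]):
        if sym in shared and sym not in ordered:
            ordered.append(sym)
    return ordered


# Abbreviated excerpts of log(1) and log(3); paste the full logs in practice.
log_1 = ("el1_error+0x90/0x110 _raw_spin_unlock+0x38/0x48 "
         "handle_level_irq+0xb8/0x140 ti_sci_inta_irq_handler+0xc4/0x160 "
         "enable_irq+0x5c/0xa0 virt_cpsw_nuss_tx_poll+0x1a0/0x2c8 "
         "net_rx_action+0x114/0x380")
log_3 = ("el1_error+0x90/0x110 _raw_spin_unlock_irqrestore+0x3c/0x48 "
         "enable_irq+0x5c/0xa0 virt_cpsw_nuss_rx_poll+0x23c/0x328 "
         "net_rx_action+0x114/0x380")

print(common_frames(log_1, log_3))
```

Because an SError is asynchronous, the reported pc only shows where the CPU was when the error was delivered, so shared frames are a hint about the activity in progress rather than proof of the faulting access.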
After consulting the ARMv8 manual, we found the explanation for error code 0xbf000002 shown in the following figure:
Bit 24 of the error code is set to 1, which indicates that the remaining bits carry an additional definition that we cannot find in the manual. Does TI know the specific meaning of this error code?
During testing we discovered by accident that if a firewall is configured for an address range and that range continues to be accessed, "SError Interrupt on CPU0, code 0xbf000002" appears. We have checked our code and confirmed that nothing other than the SDK itself enables a firewall.
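As a rough illustration of how the code splits into fields, here is a minimal sketch that decodes an ESR_ELx value using the architectural layout (EC in bits [31:26], IL in bit [25], ISS in bits [24:0], and for an SError the IDS flag in ISS bit [24]). The function name `decode_esr` and the dictionary keys are hypothetical. The meaning of the low 24 bits when IDS is 1 is implementation defined, so it would have to come from the CPU core's documentation rather than from this sketch.

```python
def decode_esr(esr):
    """Split an ESR_ELx value into its architectural fields."""
    ec = (esr >> 26) & 0x3F      # Exception Class, bits [31:26]
    il = (esr >> 25) & 0x1       # Instruction Length, bit [25]
    iss = esr & 0x1FFFFFF        # Instruction Specific Syndrome, bits [24:0]
    fields = {"ec": ec, "il": il, "iss": iss}
    if ec == 0x2F:               # EC 0x2F is the SError interrupt class
        # IDS = 1 means ISS[23:0] holds implementation-defined syndrome data.
        fields["ids"] = (iss >> 24) & 0x1
        fields["impl_defined"] = iss & 0xFFFFFF
    return fields


print(decode_esr(0xBF000002))
```

For 0xbf000002 this yields EC 0x2F (SError), IDS set, and an implementation-defined value of 0x2 in the low bits.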
So we have the following questions and hope that TI experts can help answer them:
1. What does Linux error code 0xbf000002 mean? Is it related to the firewall?
2. If it is related, why does the problem occur after NFS is mounted, and how can we modify the SDK's firewall configuration? If it is not related, where do the TI experts think the problem lies?
Thank you very much, looking forward to your reply!
Best regards
Alex
Hi TI experts,
We have recently performed the following fault injection and testing:
1. Measures to try to avoid the problem:
We tried various measures, such as changing the TX/RX UDMA channels used by CPSW9G and disabling the multimedia engine initialization on MCU2_0. However, the SError issue can still be reproduced.
2. Fault injection testing:
We configured the North Bridge (NB) firewall to allow only MCU1_0 to access the 1 GB of memory starting at address 0x80000000. With this firewall configuration, the Universal DMA (UDMA) inside NAVSS is unable to access that memory. Once the configuration takes effect, we can reproduce the SError by configuring the IP address on the Linux system. The error code in the ESR register is the same (0xbf000002).
Do the TI experts have any suggestions or troubleshooting ideas?
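To make the address arithmetic of this fault-injection setup concrete, here is a minimal sketch that computes the 1 GB window starting at 0x80000000 and checks whether a buffer overlaps it. The function name and the sample buffer addresses are hypothetical and are not taken from our system.

```python
GIB = 1 << 30
FW_START = 0x8000_0000          # start of the firewalled window (from the test)
FW_END = FW_START + GIB         # exclusive end: 0xC000_0000


def overlaps_firewalled_region(buf_addr, buf_len, start=FW_START, end=FW_END):
    """Return True if [buf_addr, buf_addr + buf_len) intersects [start, end)."""
    return buf_addr < end and buf_addr + buf_len > start


# Hypothetical DMA buffer addresses, for illustration only.
for addr in (0x9000_0000, 0xC000_0000, 0x7FFF_F000):
    print(hex(addr), overlaps_firewalled_region(addr, 0x2000))
```

Any DMA descriptor or packet buffer placed inside that window would be blocked for an initiator that the firewall does not permit.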
Best regards
Alex
Hi TI experts,
In our testing we found that an SError occasionally occurs after calling the vxVerifyGraph interface on Linux. In a separate test with M4 printing enabled, our process got stuck after calling vxVerifyGraph. In both cases execution stopped at vxVerifyGraph, so we suspect the two are related. Please check the M4 logs to see whether the cause of the problem can be found.
Best regards
Alex
Hi,
Is this issue seen when some sort of firewalled region is accessed over NFS?
Regards,
Tanmay
Hi Tanmay,
In my description above, NFS is part of our application workload. The firewall behavior is something we came across by chance during fault injection. We only observed that the two issues share the same error code; we do not know whether they are connected. Under normal circumstances, we do not modify the SDK's firewall configuration.
Best regards
Alex
Hi Li
Can you enable the TIFS traces and share them with us so that we can see whether there is any firewall exception?
Please follow the FAQ on how to enable the traces for the TIFS.
Regards
Diwakar
Hi Alex
I do not see any firewall exception in the M4 logs you shared.
Regards
Diwakar