AM4378: rcu_preempt self-detected CPU stall on custom AM437x-based target with Linux 4.19.94

Part Number: AM4378

Target SoC: AM4378 (custom board)

Linux version: 4.19.94

TI SDK version: 06.03.00.106

We are seeing CPU stall issues in a long-run setup with a custom RS485 driver-based application that periodically scans and sends requests to slaves on the RS485 bus.

The starvation reports point to a different running task each time: rcu_preempt, swapper, the rest_server thread, and tsk_main_application. With the same custom RS485 driver and main application thread, we do not see starvation or stall issues on an AM335x-based platform running the Linux 4.9.28 kernel.

Could you please give us some pointers on how to root-cause this issue further?

Log snippet:

[390027.914576] rcu: INFO: rcu_preempt self-detected stall on CPU

[390027.920472] rcu: 0-...!: (2100 ticks this GP) idle=2b2/0/0x3 softirq=20731028/20731028 fqs=0
[390027.929206] rcu: (t=2100 jiffies g=64349461 q=13)
[390027.934203] rcu: rcu_preempt kthread starved for 2100 jiffies! g64349461 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[390027.945029] rcu: RCU grace-period kthread stack dump:
[390027.950189] rcu_preempt I 0 10 2 0x00000000
[390027.955790] Backtrace:
[390027.958361] [<c09f2f50>] (__schedule) from [<c09f3720>] (schedule+0x58/0xc4)
[390027.965536] r10:c0f55e93 r9:c0f03108 r8:c0f16820 r7:c0f16840 r6:dc485ee4 r5:c0f16840
[390027.973485] r4:ffffe000
[390027.976128] [<c09f36c8>] (schedule) from [<c09f7484>] (schedule_timeout+0x184/0x294)
[390027.983993] r5:c0f16840 r4:0252a544
[390027.987688] [<c09f7300>] (schedule_timeout) from [<c0183984>] (rcu_gp_kthread+0x5e0/0xbf0)
[390027.996079] r8:00000001 r7:c0f16820 r6:dc484000 r5:c0f16154 r4:00000001
[390028.002914] [<c01833a4>] (rcu_gp_kthread) from [<c014bf5c>] (kthread+0x158/0x160)
[390028.010514] r7:dc484000
[390028.013155] [<c014be04>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
[390028.020495] Exception stack(0xdc485fb0 to 0xdc485ff8)
[390028.025658] 5fa0: 00000000 00000000 00000000 00000000
[390028.033962] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[390028.042264] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[390028.048998] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c014be04
[390028.056946] r4:dc4415c0
[390028.059585] Task dump for CPU 0:
[390028.062915] swapper R running task 0 0 0 0x00000002
[390028.070084] Backtrace:
[390028.072639] [<c010ce90>] (dump_backtrace) from [<c010d204>] (show_stack+0x18/0x1c)
[390028.080332] r7:c0f16154 r6:c0f03108 r5:00000000 r4:c0f06a38
[390028.086111] [<c010d1ec>] (show_stack) from [<c015697c>] (sched_show_task.part.2+0xec/0x110)
[390028.094591] [<c0156890>] (sched_show_task.part.2) from [<c0156a94>] (dump_cpu_task+0x38/0x3c)
[390028.103239] r5:00000000 r4:c0f16154
[390028.106925] [<c0156a5c>] (dump_cpu_task) from [<c0185488>] (rcu_dump_cpu_stacks+0x90/0xd0)
[390028.115318] [<c01853f8>] (rcu_dump_cpu_stacks) from [<c0184b64>] (rcu_check_callbacks+0x628/0x8ec)
[390028.124409] r10:c0f16820 r9:c0f030fc r8:00000000 r7:c0f16154 r6:c0f16574 r5:c0f165e0
[390028.132359] r4:c0f16154 r3:c0f03048
[390028.136044] [<c018453c>] (rcu_check_callbacks) from [<c018aa78>] (update_process_times+0x3c/0x6c)
[390028.145047] r10:20010193 r9:c019c774 r8:00000000 r7:000162ba r6:00000000 r5:c0f06a38
[390028.152995] r4:ffffe000
[390028.155639] [<c018aa3c>] (update_process_times) from [<c019c54c>] (tick_sched_handle+0x5c/0x60)
[390028.164464] r7:000162ba r6:730738c7 r5:c0f01d80 r4:c0f184c0
[390028.170240] [<c019c4f0>] (tick_sched_handle) from [<c019c7c4>] (tick_sched_timer+0x50/0xac)
[390028.178724] [<c019c774>] (tick_sched_timer) from [<c018b660>] (__hrtimer_run_queues.constprop.3+0x190/0x228)
[390028.188684] r7:ffffe000 r6:c0f17980 r5:c0f184c0 r4:c0f179c0
[390028.194460] [<c018b4d0>] (__hrtimer_run_queues.constprop.3) from [<c018be20>] (hrtimer_interrupt+0x120/0x30c)
[390028.204510] r10:9017ff67 r9:ffffffff r8:7fffffff r7:00000003 r6:20010193 r5:ffffe000
[390028.212458] r4:c0f17980
[390028.215098] [<c018bd00>] (hrtimer_interrupt) from [<c010f0a4>] (twd_handler+0x38/0x40)
[390028.223141] r10:9017ff67 r9:c0f00000 r8:dc408000 r7:00000012 r6:dc432740 r5:c0f034d4
[390028.231089] r4:00000001
[390028.233731] [<c010f06c>] (twd_handler) from [<c0176f84>] (handle_percpu_devid_irq+0x6c/0x104)
[390028.242380] r5:c0f034d4 r4:dc407300
[390028.246075] [<c0176f18>] (handle_percpu_devid_irq) from [<c0171920>] (generic_handle_irq+0x2c/0x3c)
[390028.255249] r7:c0f01e90 r6:00000001 r5:00000000 r4:c0f546bc
[390028.261028] [<c01718f4>] (generic_handle_irq) from [<c0172100>] (__handle_domain_irq+0x5c/0xb0)
[390028.269864] [<c01720a4>] (__handle_domain_irq) from [<c045c140>] (gic_handle_irq+0x44/0x70)
[390028.278342] r9:c0f00000 r8:fa241100 r7:c0f01d80 r6:fa240100 r5:fa24010c r4:c0f034d4
[390028.286213] [<c045c0fc>] (gic_handle_irq) from [<c0101a0c>] (__irq_svc+0x6c/0xa8)
[390028.293813] Exception stack(0xc0f01d80 to 0xc0f01dc8)
[390028.298979] 1d80: 00200102 00200102 c0f55c40 00000000 00000008 00000000 00000001 00000000
[390028.307284] 1da0: dc408000 c0f00000 9017ff67 c0f01e2c c0f01e30 c0f01dd0 c0131414 c0102220
[390028.315582] 1dc0: 60010113 ffffffff
[390028.319177] r9:c0f00000 r8:dc408000 r7:c0f01db4 r6:ffffffff r5:60010113 r4:c0102220
[390028.327057] [<c0102180>] (__do_softirq) from [<c0131414>] (irq_exit+0x108/0x114)
[390028.334577] r10:9017ff67 r9:c0f00000 r8:dc408000 r7:00000000 r6:00000001 r5:00000000
[390028.342525] r4:c0f546bc
[390028.345163] [<c013130c>] (irq_exit) from [<c0172104>] (__handle_domain_irq+0x60/0xb0)
[390028.353119] [<c01720a4>] (__handle_domain_irq) from [<c045c140>] (gic_handle_irq+0x44/0x70)
[390028.361599] r9:c0f00000 r8:fa241100 r7:c0f01e90 r6:fa240100 r5:fa24010c r4:c0f034d4
[390028.369468] [<c045c0fc>] (gic_handle_irq) from [<c0101a0c>] (__irq_svc+0x6c/0xa8)
[390028.377069] Exception stack(0xc0f01e90 to 0xc0f01ed8)
[390028.382231] 1e80: 00000000 000162b5 05355555 c0f10d10
[390028.390536] 1ea0: ffffe000 c0f030b8 000162b5 9019d698 dbe13400 00000000 9017ff67 c0f01f1c
[390028.398839] 1ec0: c0f01ee0 c0f01ee0 c07d5374 c07d5378 a0010013 ffffffff
[390028.405574] r9:c0f00000 r8:dbe13400 r7:c0f01ec4 r6:ffffffff r5:a0010013 r4:c07d5378
[390028.413455] [<c07d52e8>] (cpuidle_enter_state) from [<c07d5628>] (cpuidle_enter+0x1c/0x20)
[390028.421847] r10:c0c0c4b4 r9:c0f55e69 r8:c0f030cc r7:c0f49a88 r6:dbe13400 r5:c0f030b8
[390028.429795] r4:ffffe000
[390028.432434] [<c07d560c>] (cpuidle_enter) from [<c0157ac0>] (call_cpuidle+0x28/0x40)
[390028.440216] [<c0157a98>] (call_cpuidle) from [<c0157e60>] (do_idle+0x1e4/0x234)
[390028.447649] [<c0157c7c>] (do_idle) from [<c015819c>] (cpu_startup_entry+0x14/0x18)
[390028.455342] r10:dcfff7c0 r9:c0e40a30 r8:c0f574c0 r7:c0f03040 r6:ffffffff r5:00000002
[390028.463290] r4:c0f10098
[390028.465927] [<c0158188>] (cpu_startup_entry) from [<c09f2564>] (rest_init+0xb0/0xb4)
[390028.473802] [<c09f24b4>] (rest_init) from [<c0e00dcc>] (start_kernel+0x410/0x438)
[390028.481405] r5:c0f574c0 r4:c0f57518
[390028.485085] [<c0e009bc>] (start_kernel) from [<00000000>] ( (null))      

[ 1700.737639] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 1700.744470] rcu: 0-...!: (25282 ticks this GP) idle=54a/1/0x40000004 softirq=138360/138360 fqs=0
[ 1700.755391] rcu: (t=26000 jiffies g=399669 q=2)
[ 1700.760450] rcu: rcu_preempt kthread starved for 26000 jiffies! g399669 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 1700.771204] rcu: RCU grace-period kthread stack dump:
[ 1700.776550] rcu_preempt R running task 0 10 2 0x00000000
[ 1700.784001] Backtrace:
[ 1700.786819] [<c0bc8c88>] (__schedule) from [<c0bc98d0>] (schedule+0x54/0xbc)
[ 1700.794204] r10:c1288938 r9:ffffe000 r8:c1289640 r7:c1334194 r6:dc56fe7c r5:c1289640
[ 1700.802341] r4:ffffe000
[ 1700.805204] [<c0bc987c>] (schedule) from [<c0bcef50>] (schedule_timeout+0x23c/0x4b0)
[ 1700.813265] r5:c1289640 r4:0014f768
[ 1700.817180] [<c0bced14>] (schedule_timeout) from [<c01b7984>] (rcu_gp_kthread+0x9e0/0x1a10)
[ 1700.825853] r10:c12875b0 r9:ffffe000 r8:c1108150 r7:c12875b0 r6:c1287834 r5:c1287804
[ 1700.833996] r4:00000003
[ 1700.836851] [<c01b6fa4>] (rcu_gp_kthread) from [<c0159658>] (kthread+0x178/0x180)
[ 1700.844644] r7:dc56e000
[ 1700.847499] [<c01594e0>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
[ 1700.855036] Exception stack(0xdc56ffb0 to 0xdc56fff8)
[ 1700.860396] ffa0: 00000000 00000000 00000000 00000000
[ 1700.868909] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1700.877416] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 1700.884344] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c01594e0
[ 1700.892478] r4:dc51a9c0
[ 1700.895332] Task dump for CPU 0:
[ 1700.898848] tsk_main_appli R running task 0 1119 1061 0x00000002


[27044.459029] rcu: INFO: rcu_preempt self-detected stall on CPU
[27044.464817] rcu: 0-...!: (8353 ticks this GP) idle=2da/1/0x40000004 softirq=1370452/1370452 fqs=0
[27044.473899] rcu: (t=8403 jiffies g=4452281 q=10)
[27044.478720] rcu: rcu_preempt kthread starved for 8403 jiffies! g4452281 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[27044.489197] rcu: RCU grace-period kthread stack dump:
[27044.494267] rcu_preempt R running task 0 10 2 0x00000000
[27044.501350] Backtrace:
[27044.503808] Function entered at [<c09d0528>] from [<c09d0bec>]
[27044.509670] r10:00000005 r9:dc482000 r8:c0d16760 r7:c0d16780 r6:dc483ee4 r5:c0d16780
[27044.517533] r4:ffffe000
[27044.520073] Function entered at [<c09d0b94>] from [<c09d48bc>]
[27044.525930] r5:c0d16780 r4:0028ae3b
[27044.529518] Function entered at [<c09d4738>] from [<c017c748>]
[27044.535379] r8:c0d03108 r7:ffffe000 r6:c0d16760 r5:c0d16088 r4:00000001
[27044.542107] Function entered at [<c017c298>] from [<c014923c>]
[27044.547962] r7:dc482000
[27044.550503] Function entered at [<c01490e4>] from [<c01010e8>]
[27044.556359] Exception stack(0xdc483fb0 to 0xdc483ff8)
[27044.561432] 3fa0: 00000000 00000000 00000000 00000000
[27044.569649] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[27044.577865] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[27044.584512] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c01490e4
[27044.592372] r4:dc43cb80
[27044.594914] Task dump for CPU 0:
[27044.598153] rest_server_0 R running task 0 1040 1019 0x00000002
[27044.605233] Backtrace:
[27044.607689] Function entered at [<c010cd7c>] from [<c010d0e0>]
[27044.613549] r7:c0d16088 r6:c0d03108 r5:000003fb r4:db70de00
[27044.619230] Function entered at [<c010d0c8>] from [<c015367c>]
[27044.625086] Function entered at [<c0153590>] from [<c0153794>]
[27044.630943] r5:00000000 r4:c0d16088
[27044.634532] Function entered at [<c015375c>] from [<c017e0c8>]
[27044.640387] Function entered at [<c017e038>] from [<c017d7f0>]
[27044.646249] r10:c0d16760 r9:c0d030fc r8:00000000 r7:c0d16088 r6:c0d164a8 r5:c0d16514
[27044.654111] r4:c0d16088 r3:c0d03048
[27044.657698] Function entered at [<c017d1c8>] from [<c0182e00>]
[27044.663560] r10:20000193 r9:c019437c r8:00000000 r7:00001898 r6:00000000 r5:db70de00
[27044.671421] r4:ffffe000
[27044.673961] Function entered at [<c0182dc4>] from [<c0194154>]
[27044.679820] r7:00001898 r6:c6d5e69d r5:db789ce0 r4:c0d18400
[27044.685501] Function entered at [<c01940f8>] from [<c01943cc>]
[27044.691356] Function entered at [<c019437c>] from [<c0183994>]
[27044.697214] r7:ffffe000 r6:c0d178c0 r5:c0d18400 r4:c0d17900
[27044.702895] Function entered at [<c0183824>] from [<c0184084>]
[27044.708755] r10:00000000 r9:ffffffff r8:7fffffff r7:00000003 r6:20000193 r5:ffffe000
[27044.716616] r4:c0d178c0
[27044.719158] Function entered at [<c0183f64>] from [<c010ef34>]
[27044.725019] r10:00000000 r9:db788000 r8:dc408000 r7:00000012 r6:dc42f700 r5:c0d03498
[27044.732880] r4:00000001
[27044.735421] Function entered at [<c010eefc>] from [<c0170d98>]
[27044.741277] r5:c0d03498 r4:dc407300
[27044.744866] Function entered at [<c0170d2c>] from [<c016bbbc>]
[27044.750725] r7:db789df0 r6:00000001 r5:00000000 r4:c0d54b24
[27044.756405] Function entered at [<c016bb90>] from [<c016c39c>]
[27044.762260] Function entered at [<c016c340>] from [<c0461ae0>]
[27044.768121] r9:db788000 r8:fa241100 r7:db789ce0 r6:fa240100 r5:fa24010c r4:c0d03498
[27044.775896] Function entered at [<c0461a9c>] from [<c0101a0c>]
[27044.781751] Exception stack(0xdb789ce0 to 0xdb789d28)
[27044.786829] 9ce0: 00404040 00404040 c0d560c0 00000000 00000202 00000000 00000001 00000000
[27044.795047] 9d00: dc408000 db788000 00000000 db789d8c db789d90 db789d30 c012fdc4 c0102220
[27044.803259] 9d20: 60000113 ffffffff
[27044.806765] r9:db788000 r8:dc408000 r7:db789d14 r6:ffffffff r5:60000113 r4:c0102220
[27044.814539] Function entered at [<c0102180>] from [<c012fdc4>]
[27044.820400] r10:00000000 r9:db788000 r8:dc408000 r7:00000000 r6:00000001 r5:00000000
[27044.828261] r4:c0d54b24
[27044.830802] Function entered at [<c012fcbc>] from [<c016c3a0>]
[27044.836657] Function entered at [<c016c340>] from [<c0461ae0>]
[27044.842519] r9:db788000 r8:fa241100 r7:db789df0 r6:fa240100 r5:fa24010c r4:c0d03498
[27044.850293] Function entered at [<c0461a9c>] from [<c0101a0c>]
[27044.856149] Exception stack(0xdb789df0 to 0xdb789e38)
[27044.861223] 9de0: dc2e1c3c 00000000 00000000 00000002
[27044.869440] 9e00: db789ee8 dc2e1bb8 db789f60 db789f60 db789ed0 dc2e1c3c 00000000 db789ebc
[27044.877655] 9e20: db789ec0 db789e40 c02c50a8 c0162fb8 60000013 ffffffff
[27044.884301] r9:db788000 r8:db789ed0 r7:db789e24 r6:ffffffff r5:60000013 r4:c0162fb8
[27044.892075] Function entered at [<c02c5034>] from [<c023a638>]
[27044.897936] r10:00000004 r9:00000000 r8:00000000 r7:db789f60 r6:db789f60 r5:00000000
[27044.905797] r4:db8eec00
[27044.908337] Function entered at [<c023a510>] from [<c023a81c>]
[27044.914196] r7:db789f60 r6:b6b3d2ec r5:db8eec00 r4:00000008
[27044.919877] Function entered at [<c023a770>] from [<c023aaac>]
[27044.925738] r8:0073a3ad r7:00000008 r6:b6b3d2ec r5:db8eec03 r4:db8eec00
[27044.932466] Function entered at [<c023aa44>] from [<c023ab44>]
[27044.938326] r9:db788000 r8:c0101204 r7:00000004 r6:b6b3d2ec r5:00000008 r4:00000070
[27044.946099] Function entered at [<c023ab34>] from [<c0101000>]
[27044.951956] Exception stack(0xdb789fa8 to 0xdb789ff0)
[27044.957031] 9fa0: 00000070 00000008 00000001 b6b3d2ec 00000008 00000000
[27044.965246] 9fc0: 00000070 00000008 b6b3d2ec 00000004 00000008 00000008 00000020 b50f7d28
[27044.973459] 9fe0: 00000004 adcfacc8 b6652061 b65dec46
[27107.489028] rcu: INFO: rcu_preempt self-detected stall on CPU
[27107.494811] rcu: 0-...!: (14606 ticks this GP) idle=2da/1/0x40000004 softirq=1370452/1370452 fqs=0
[27107.503980] rcu: (t=14706 jiffies g=4452281 q=10)
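
The stall reports above share the same summary lines, so they can be compared side by side. The following is a minimal Python sketch, not part of the original post, that extracts the "kthread starved" line and the task named under "Task dump for CPU" from a saved console log. The function name `summarize_stalls` and the dictionary keys are hypothetical, and the regular expressions assume the exact message layout shown in this thread (4.19-era format).

```python
import re

# Summary line printed by the RCU stall detector for the grace-period kthread.
STARVED_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+rcu:\s+rcu_preempt kthread starved for "
    r"(?P<jiffies>\d+) jiffies! g(?P<gp>\d+) f0x[0-9a-f]+ "
    r"(?P<gp_state>\w+)\(\d+\) ->state=(?P<kstate>0x[0-9a-f]+) ->cpu=(?P<cpu>\d+)"
)
# First line after "Task dump for CPU N:", e.g. "swapper R running task ...".
TASK_RE = re.compile(r"\[\s*\d+\.\d+\]\s+(?P<comm>\S+)\s+R\s+running task")


def summarize_stalls(log_text):
    """Return one dict per 'kthread starved' report found in log_text."""
    lines = log_text.splitlines()
    reports = []
    for i, line in enumerate(lines):
        m = STARVED_RE.search(line)
        if m is None:
            continue
        report = {
            "timestamp": float(m.group("ts")),
            "starved_jiffies": int(m.group("jiffies")),
            "gp_state": m.group("gp_state"),
            "kthread_state": m.group("kstate"),
            "cpu": int(m.group("cpu")),
            "running_task": None,  # stays None if no task dump follows
        }
        # Look ahead for the task dump that belongs to this report.
        for j in range(i + 1, len(lines) - 1):
            if STARVED_RE.search(lines[j]):
                break  # reached the next report
            if "Task dump for CPU" in lines[j]:
                t = TASK_RE.search(lines[j + 1])
                if t:
                    report["running_task"] = t.group("comm")
                break
        reports.append(report)
    return reports
```

Run over the three reports in this post, a summary like this makes the pattern described earlier easy to see: the grace-period kthread is starved on CPU 0 each time, while the task that happens to be running differs (swapper, tsk_main_appli, rest_server_0).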

To debug the starvation issue, we enabled the following lockup-detector, lockdep, and lock-debugging configuration options:

CONFIG_LOCKUP_DETECTOR=y
CONFIG_SOFTLOCKUP_DETECTOR=y

CONFIG_SCHED_STACK_END_CHECK=y
CONFIG_DEBUG_TIMEKEEPING=y
CONFIG_DEBUG_PREEMPT=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCK_STAT=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_DEBUG_LOCKDEP=y

After booting a kernel built with these options, we see a soft lockup reporting CPU#0 stuck for 22 seconds:

[ 3453.079847] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [tsk_main_appli:1087]
[ 3453.087599] Modules linked in: gpio_interrupts_driver(O) rnet(O) mstp_cmnet(O) ccn_kmodule(O) pvrsrvkm(O) focaltech(O)
[ 3453.098441] irq event stamp: 4746189
[ 3453.102091] hardirqs last enabled at (4746188): [<c0102230>] __do_softirq+0xf0/0x568
[ 3453.110012] hardirqs last disabled at (4746189): [<c01019e0>] __irq_svc+0x60/0xb0
[ 3453.117583] softirqs last enabled at (4746136): [<c01024d8>] __do_softirq+0x398/0x568
[ 3453.125594] softirqs last disabled at (4746187): [<c0135684>] irq_exit+0x124/0x130
[ 3453.133255] CPU: 0 PID: 1087 Comm: tsk_main_appli Tainted: G O L 4.19.94-g5a23bc00e0 #1
[ 3453.142476] Hardware name: Generic AM43 (Flattened Device Tree)
[ 3453.148475] PC is at __do_softirq+0xf4/0x568
[ 3453.152824] LR is at __this_cpu_preempt_check+0x1c/0x20
[ 3453.158122] pc : [<c0102234>] lr : [<c050a480>] psr: 20010113
[ 3453.164466] sp : db719c48 ip : db719bf0 fp : db719cbc
[ 3453.169763] r10: ffffe000 r9 : 00000002 r8 : 00000000
[ 3453.175061] r7 : db719d28 r6 : c131d100 r5 : c0135684 r4 : db718000
[ 3453.181666] r3 : da1a6000 r2 : 00000000 r1 : c0f2c34c r0 : 00000000
[ 3453.188273] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 3453.195490] Control: 10c53c7d Table: 9b6a0059 DAC: 00000051
[ 3453.201315] CPU: 0 PID: 1087 Comm: tsk_main_appli Tainted: G O L 4.19.94-g5a23bc00e0 #1
[ 3453.210538] Hardware name: Generic AM43 (Flattened Device Tree)
[ 3453.216530] Backtrace:
[ 3453.219055] [<c010e5ac>] (dump_backtrace) from [<c010e93c>] (show_stack+0x18/0x1c)
[ 3453.226714] r7:00000000 r6:db718000 r5:c11081f0 r4:c1288918
[ 3453.232462] [<c010e924>] (show_stack) from [<c0b276d0>] (dump_stack+0x24/0x28)
[ 3453.239780] [<c0b276ac>] (dump_stack) from [<c0109d94>] (show_regs+0x14/0x18)
[ 3453.247012] [<c0109d80>] (show_regs) from [<c01f7b64>] (watchdog_timer_fn+0x2a8/0x2f0)
[ 3453.255031] [<c01f78bc>] (watchdog_timer_fn) from [<c01b43e8>] (__hrtimer_run_queues.constprop.3+0x224/0x59c)
[ 3453.265047] r10:c01f78bc r9:c131d40e r8:ffffe000 r7:c1283ac0 r6:c1283ac0 r5:c1283b28
[ 3453.272962] r4:c1288990
[ 3453.275565] [<c01b41c4>] (__hrtimer_run_queues.constprop.3) from [<c01b4ad0>] (hrtimer_interrupt+0x114/0x2b4)
[ 3453.285580] r10:c1283be0 r9:dc408800 r8:7fffffff r7:00000003 r6:c1283ac0 r5:20010193
[ 3453.293493] r4:c1283ac0
[ 3453.296097] [<c01b49bc>] (hrtimer_interrupt) from [<c01107a0>] (twd_handler+0x38/0x40)
[ 3453.304105] r10:ffffe000 r9:dc408800 r8:dc41fc00 r7:00000012 r6:c131e430 r5:dc485940
[ 3453.312018] r4:00000001
[ 3453.314623] [<c0110768>] (twd_handler) from [<c01956fc>] (handle_percpu_devid_irq+0x104/0x3bc)
[ 3453.323323] r5:dc485940 r4:c131d3e0
[ 3453.326980] [<c01955f8>] (handle_percpu_devid_irq) from [<c0190130>] (generic_handle_irq+0x2c/0x3c)
[ 3453.336123] r10:ffffe000 r9:dc408800 r8:db719d28 r7:db719bf8 r6:00000001 r5:00000000
[ 3453.344036] r4:c131713c
[ 3453.346639] [<c0190104>] (generic_handle_irq) from [<c0190800>] (__handle_domain_irq+0x78/0xe4)
[ 3453.355439] [<c0190788>] (__handle_domain_irq) from [<c05249ac>] (gic_handle_irq+0x44/0x70)
[ 3453.363884] r9:db718000 r8:fa241100 r7:db719bf8 r6:fa240100 r5:fa24010c r4:c11087ec
[ 3453.371719] [<c0524968>] (gic_handle_irq) from [<c01019f0>] (__irq_svc+0x70/0xb0)
[ 3453.379284] Exception stack(0xdb719bf8 to 0xdb719c40)
[ 3453.384409] 9be0: 00000000 c0f2c34c
[ 3453.392679] 9c00: 00000000 da1a6000 db718000 c0135684 c131d100 db719d28 00000000 00000002
[ 3453.400950] 9c20: ffffe000 db719cbc db719bf0 db719c48 c050a480 c0102234 20010113 ffffffff
[ 3453.409220] r9:db718000 r8:00000000 r7:db719c2c r6:ffffffff r5:20010113 r4:c0102234
[ 3453.417058] [<c0102140>] (__do_softirq) from [<c0135684>] (irq_exit+0x124/0x130)
[ 3453.424544] r10:00000001 r9:dc408800 r8:00000000 r7:db719d28 r6:00000001 r5:00000000
[ 3453.432457] r4:c131d100
[ 3453.435058] [<c0135560>] (irq_exit) from [<c0190804>] (__handle_domain_irq+0x7c/0xe4)
[ 3453.442974] r5:00000000 r4:c131713c
[ 3453.446621] [<c0190788>] (__handle_domain_irq) from [<c05249ac>] (gic_handle_irq+0x44/0x70)
[ 3453.455066] r9:db718000 r8:fa241100 r7:db719d28 r6:fa240100 r5:fa24010c r4:c11087ec
[ 3453.462900] [<c0524968>] (gic_handle_irq) from [<c01019f0>] (__irq_svc+0x70/0xb0)
[ 3453.470465] Exception stack(0xdb719d28 to 0xdb719d70)
[ 3453.475595] 9d20: 0004bcb9 c0f2c34c c1282680 00000000 db442800 db442c4c
[ 3453.483864] 9d40: db432600 db719dac 00000000 00000002 00000001 db719d9c db719d78 db719d78
[ 3453.492129] 9d60: c092ce6c c092cea0 20010013 ffffffff
[ 3453.497260] r9:db718000 r8:00000000 r7:db719d5c r6:ffffffff r5:20010013 r4:c092cea0
[ 3453.505106] [<c092cd78>] (tiadc_read_raw) from [<c0925a84>] (iio_read_channel_info+0xbc/0xc4)
[ 3453.513728] r9:db7bb200 r8:db442818 r7:c0c962dc r6:dad71000 r5:db719dac r4:c092cd78
[ 3453.521573] [<c09259c8>] (iio_read_channel_info) from [<c068be3c>] (dev_attr_show+0x24/0x50)
[ 3453.530100] r6:dad71000 r5:db475d00 r4:db6c6120
[ 3453.534804] [<c068be18>] (dev_attr_show) from [<c033e8cc>] (sysfs_kf_seq_show+0x8c/0xf0)
[ 3453.542983] r5:00001000 r4:db6c6120
[ 3453.546634] [<c033e840>] (sysfs_kf_seq_show) from [<c033cfb4>] (kernfs_seq_show+0x2c/0x30)
[ 3453.554993] r9:db6c6138 r8:db719f60 r7:db6fb4d0 r6:007000c0 r5:00000000 r4:db6c6120
[ 3453.562834] [<c033cf88>] (kernfs_seq_show) from [<c02e1a44>] (seq_read+0x15c/0x540)
[ 3453.570585] [<c02e18e8>] (seq_read) from [<c033dcdc>] (kernfs_fop_read+0x38/0x1f4)
[ 3453.578245] r10:00000003 r9:00000000 r8:db719f60 r7:db719f60 r6:a7dd1c84 r5:00000010
[ 3453.586158] r4:db7bb200
[ 3453.588765] [<c033dca4>] (kernfs_fop_read) from [<c02b53f4>] (__vfs_read+0x44/0x170)
[ 3453.596599] r10:00000003 r9:00000000 r8:db719f60 r7:db719f60 r6:a7dd1c84 r5:c033dca4
[ 3453.604512] r4:db6fb3c0
[ 3453.607114] [<c02b53b0>] (__vfs_read) from [<c02b55b0>] (vfs_read+0x90/0x110)
[ 3453.614335] r8:00000001 r7:db719f60 r6:a7dd1c84 r5:db6fb3c0 r4:00000010
[ 3453.621123] [<c02b5520>] (vfs_read) from [<c02b5b14>] (ksys_read+0x68/0xf0)
[ 3453.628171] r9:00000000 r8:00000000 r7:00000010 r6:a7dd1c84 r5:db6fb3c3 r4:db6fb3c0
[ 3453.636005] [<c02b5aac>] (ksys_read) from [<c02b5bac>] (sys_read+0x10/0x14)
[ 3453.643055] r9:db718000 r8:c01011c4 r7:00000003 r6:0000000b r5:a7dd1c84 r4:00000010
[ 3453.650890] [<c02b5b9c>] (sys_read) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
[ 3453.658455] Exception stack(0xdb719fa8 to 0xdb719ff0)
[ 3453.663584] 9fa0: 00000010 a7dd1c84 0000000b a7dd1c84 00000010 00000000
[ 3453.671854] 9fc0: 00000010 a7dd1c84 0000000b 00000003 a7dd2440 a7dd24b0 00000001 aadf8ac8
[ 3453.680120] 9fe0: 00000003 a7dd1c60 b672c135 b672e126
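
To see what the task was doing when the timer interrupt arrived, it helps to separate the interrupt-handling frames from the frames of the interrupted context. The sketch below is an illustration rather than part of the original post: it parses the symbolized ARM backtrace format used above and returns the functions listed after the last `__irq_svc` boundary. The names `parse_frames` and `interrupted_path` are hypothetical, and the approach does not work on the unsymbolized "Function entered at" format seen in one of the earlier dumps.

```python
import re

# One symbolized ARM backtrace frame:
#   [<addr>] (callee) from [<addr>] (caller+0xoff/0xsize)
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] \((?P<callee>[\w.]+)\) from "
    r"\[<[0-9a-f]+>\] \((?P<caller>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\)"
)


def parse_frames(log_text):
    """Return (callee, caller) pairs in the order they appear in the log."""
    return [(m.group("callee"), m.group("caller"))
            for m in FRAME_RE.finditer(log_text)]


def interrupted_path(frames):
    """Return the callee names that follow the last frame entered from __irq_svc.

    On 32-bit ARM, __irq_svc is the IRQ entry taken while in kernel (SVC) mode,
    so frames after the last such boundary belong to the interrupted code.
    """
    last_irq = -1
    for idx, (_callee, caller) in enumerate(frames):
        if caller == "__irq_svc":
            last_irq = idx
    return [callee for callee, _caller in frames[last_irq + 1:]]
```

Applied to the soft-lockup dump above, the interrupted path begins at `tiadc_read_raw` and runs through `iio_read_channel_info` and the sysfs read path down to `sys_read`. This shows which kernel path the task was in when the interrupt arrived; on its own it does not establish a root cause.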

  • Hello,

    I have assigned your thread, but the thread owner may be on vacation. Please expect a delayed response. Feel free to ping the thread if you do not get a reply within the first week of January.

    Regards,

    Nick

  • Hello Nick, 

    Any updates on this? 

  • Hello,

    Could you please help us find the root cause of this issue?

  • Hi Sathish,

    Sorry for my late response.

    I don't have much experience debugging such RCU stall cases, but have you checked the kernel documentation at https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt ? It explains the RCU stall detector in some detail and gives guidance for debugging stalls. The kernel's ftrace RCU event tracing might also be helpful in your debugging.
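
One way to act on the ftrace suggestion is to switch on the `rcu` event group through tracefs before reproducing the stall. The following is a minimal sketch under stated assumptions: tracefs is reachable at `/sys/kernel/debug/tracing` (debugfs mounted), the kernel was built with tracing support so that `events/rcu/enable` exists, and the script runs as root. The helper name `enable_rcu_events` is hypothetical. How many RCU events are available depends on the kernel configuration (for example `CONFIG_RCU_TRACE`).

```python
from pathlib import Path


def enable_rcu_events(tracefs_root="/sys/kernel/debug/tracing"):
    """Enable all events in the 'rcu' group and turn tracing on.

    Returns the path of the 'trace' file to read after the stall occurs.
    The default tracefs_root is an assumption; some systems mount tracefs
    at /sys/kernel/tracing instead.
    """
    root = Path(tracefs_root)
    enable = root / "events" / "rcu" / "enable"
    if not enable.exists():
        raise FileNotFoundError(
            f"{enable} not found: is tracefs mounted and tracing enabled?"
        )
    enable.write_text("1")                  # enable every event in the group
    (root / "tracing_on").write_text("1")   # make sure the ring buffer records
    return root / "trace"
```

After the next stall report, the contents of the returned `trace` file can be saved alongside the console log so that the RCU events leading up to the stall are available for analysis.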

  • All,

    Thank you for everyone's patience as we continue to work through this.

    If there are still specific questions about how a certain TI IP or certain TI code works, we can absolutely help here. But if this is in fact a generic Linux question, there are other places online we should also look.

    We would also like to circle back to the original problem statement of "CPU stall issues in long run setup with custom RS485 driver based application which periodically scans & requests slaves on the RS485 bus" and ask whether any additional testing was done to isolate the custom RS485 driver, the long-run setup, or the application itself. In other words, remove some of those dependencies and re-test to see whether the same behavior persists. If that exercise points to a base Linux or driver issue, that would be very helpful in finding the root cause.

    Regardless, our team has now escalated this to the top, and we expect some additional feedback shortly.

    Thank you,

    Chris

  • Hello 

    We had some more discussion on this internally.

    The error signature ("rcu: INFO: rcu_preempt self-detected stall on CPU") indicates a busy loop somewhere in the kernel, which could have many causes, so this is a Linux system-level debug task. We have pointed to the kernel documentation that explains this symptom and gives some guidelines for debugging it.

    Our ability to help on this is limited, given that it is an old kernel (released in 2019) and a custom application.

    Some additional suggestions that might help:

    - Have any patches been applied to the kernel? Please provide the kernel signature line saying what is the git tag.
    - Since the AM335x is not seeing this, are there any large memory copies going on? Is there a display?
    - Is it possible for you to try the RT kernel? It might allow a recovery from the stall.

  • Hello Mukul,

    Please find comments below.

    - Have any patches been applied to the kernel? Please provide the kernel signature line saying what is the git tag.

       @ Linux 4.19.94-g5a23bc00e0 #1 PREEMPT Wed Jan 3 16:31:39 UTC 2024 armv7l GNU/Linux

    - Since the AM335x is not seeing this, are there any large memory copies going on? Is there a display?

       @ We do have a display, and our UI application uses EGLFS.

    - Is it possible for you to try the RT kernel? It might allow a recovery from the stall.

       @ We are working on building the necessary images based on the TI RT kernel.

  • Hi,

    The kernel signature is different than what is posted in the release notes. Have there been any patches applied to the kernel?

    If changes have been made to the kernel what were they intended to do?

    TI is only able to support issues in the drivers written by TI. The RCU stalls at the moment do not look related to TI drivers.

    Regarding the tsk_main_application, at or about line 1087 was there a call into the kernel?

    Best Regards,

    Schuyler