Hello, I get spontaneous Linux kernel hangs on my device. We are using the mainline kernel 5.10.163, but we have also tested the TI kernel 5.10.158 with the same result:
[65753.330296][ C0] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:0:16255]
[65753.339334][ C0] CPU: 0 PID: 16255 Comm: kworker/0:0 Not tainted 5.10.158-ti-158 #1
[65753.347671][ C0] Hardware name: Generic AM33XX (Flattened Device Tree)
[65753.355235][ C0] Workqueue: events dbs_work_handler
[65753.361051][ C0] PC is at __do_softirq+0x9c/0x27c
[65753.366486][ C0] LR is at irq_exit+0xc0/0x11c
[65753.371501][ C0] pc : [<c0101224>] lr : [<c0124ef4>] psr: 20010113
[65753.378848][ C0] sp : ca14fd08 ip : c0a50a00 fp : c15af500
[65753.385142][ C0] r10: c15afd80 r9 : c0a66a10 r8 : c1009400
[65753.391462][ C0] r7 : ca14e000 r6 : 00000000 r5 : c0a607d4 r4 : 00000002
[65753.399091][ C0] r3 : 00000000 r2 : c0a61000 r1 : c0a61000 r0 : c103f000
[65753.406766][ C0] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[65753.415025][ C0] Control: 10c5387d Table: 82bfc019 DAC: 00000055
[65753.421916][ C0] CPU: 0 PID: 16255 Comm: kworker/0:0 Not tainted 5.10.158-ti-158 #1
[65753.430194][ C0] Hardware name: Generic AM33XX (Flattened Device Tree)
[65753.437507][ C0] Workqueue: events dbs_work_handler
[65753.443440][ C0] [<c010ab14>] (unwind_backtrace) from [<c0109784>] (show_stack+0x10/0x14)
[65753.452492][ C0] [<c0109784>] (show_stack) from [<c01934a0>] (watchdog_timer_fn+0x1ac/0x1d8)
[65753.461916][ C0] [<c01934a0>] (watchdog_timer_fn) from [<c0163fd0>] (__hrtimer_run_queues.constprop.0+0x150/0x1dc)
[65753.473229][ C0] [<c0163fd0>] (__hrtimer_run_queues.constprop.0) from [<c0164bcc>] (hrtimer_interrupt+0xd4/0x2c0)
[65753.484534][ C0] [<c0164bcc>] (hrtimer_interrupt) from [<c04b82ac>] (dmtimer_clockevent_interrupt+0x24/0x2c)
[65753.495340][ C0] [<c04b82ac>] (dmtimer_clockevent_interrupt) from [<c0155ab8>] (__handle_irq_event_percpu+0x50/0x13c)
[65753.506859][ C0] [<c0155ab8>] (__handle_irq_event_percpu) from [<c0155c74>] (handle_irq_event+0x48/0xb4)
[65753.517235][ C0] [<c0155c74>] (handle_irq_event) from [<c0159950>] (handle_level_irq+0xa0/0x170)
[65753.526907][ C0] [<c0159950>] (handle_level_irq) from [<c0155338>] (__handle_domain_irq+0x78/0xc8)
[65753.536721][ C0] [<c0155338>] (__handle_domain_irq) from [<c0100b78>] (__irq_svc+0x58/0x74)
[65753.545758][ C0] Exception stack(0xca14fcb8 to 0xca14fd00)
[65753.551932][ C0] fca0: c103f000 c0a61000
[65753.561354][ C0] fcc0: c0a61000 00000000 00000002 c0a607d4 00000000 ca14e000 c1009400 c0a66a10
[65753.570776][ C0] fce0: c15afd80 c15af500 c0a50a00 ca14fd08 c0124ef4 c0101224 20010113 ffffffff
[65753.580264][ C0] [<c0100b78>] (__irq_svc) from [<c0101224>] (__do_softirq+0x9c/0x27c)
[65753.588937][ C0] [<c0101224>] (__do_softirq) from [<c0124ef4>] (irq_exit+0xc0/0x11c)
[65753.597524][ C0] [<c0124ef4>] (irq_exit) from [<c015533c>] (__handle_domain_irq+0x7c/0xc8)
[65753.606616][ C0] [<c015533c>] (__handle_domain_irq) from [<c0100b78>] (__irq_svc+0x58/0x74)
[65753.615637][ C0] Exception stack(0xca14fd78 to 0xca14fdc0)
[65753.621801][ C0] fd60: 00000000 00000002
[65753.631222][ C0] fd80: 00000420 f9e00420 00000000 c1019e00 00000001 c1019280 00000000 00000005
[65753.640640][ C0] fda0: c15afd80 c15af500 00000000 ca14fdc8 c03979ac c0394f90 60010113 ffffffff
[65753.650172][ C0] [<c0100b78>] (__irq_svc) from [<c0394f90>] (clk_memmap_readl+0x38/0x98)
[65753.659211][ C0] [<c0394f90>] (clk_memmap_readl) from [<c03979ac>] (_omap3_noncore_dpll_lock+0x34/0xb8)
[65753.669559][ C0] [<c03979ac>] (_omap3_noncore_dpll_lock) from [<c0397b58>] (omap3_noncore_dpll_program+0x128/0x298)
[65753.681001][ C0] [<c0397b58>] (omap3_noncore_dpll_program) from [<c038edf0>] (clk_change_rate+0x16c/0x29c)
[65753.691582][ C0] [<c038edf0>] (clk_change_rate) from [<c038f098>] (clk_core_set_rate_nolock+0x178/0x244)
[65753.701994][ C0] [<c038f098>] (clk_core_set_rate_nolock) from [<c038fba0>] (clk_set_rate+0x30/0x154)
[65753.712144][ C0] [<c038fba0>] (clk_set_rate) from [<c0486820>] (dev_pm_opp_set_rate+0x304/0x674)
[65753.721872][ C0] [<c0486820>] (dev_pm_opp_set_rate) from [<c048c44c>] (__cpufreq_driver_target+0x1a8/0x5f8)
[65753.732519][ C0] [<c048c44c>] (__cpufreq_driver_target) from [<c048f3e0>] (od_dbs_update+0xbc/0x168)
[65753.742581][ C0] [<c048f3e0>] (od_dbs_update) from [<c04904d4>] (dbs_work_handler+0x2c/0x54)
[65753.751997][ C0] [<c04904d4>] (dbs_work_handler) from [<c01369e8>] (process_one_work+0x194/0x404)
[65753.761788][ C0] [<c01369e8>] (process_one_work) from [<c0136cd4>] (worker_thread+0x7c/0x570)
[65753.771218][ C0] [<c0136cd4>] (worker_thread) from [<c013c7d0>] (kthread+0x138/0x140)
[65753.779892][ C0] [<c013c7d0>] (kthread) from [<c0100138>] (ret_from_fork+0x14/0x3c)
[65753.788206][ C0] Exception stack(0xca14ffb0 to 0xca14fff8)
[65753.794404][ C0] ffa0: 00000000 00000000 00000000 00000000
[65753.803807][ C0] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[65753.813172][ C0] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
Our temporary workaround is to switch to the performance scaling governor; with it, we don't get any hangs.
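The governor switch described above can be scripted. Below is a minimal Python sketch that assumes the standard cpufreq sysfs layout (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`) and root access on the board. `set_governor` is a hypothetical helper name, and the setting does not survive a reboot.

```python
import glob
import os


def set_governor(governor, sysfs_root="/sys/devices/system/cpu"):
    """Write `governor` to every CPU's cpufreq scaling_governor file.

    Assumes the usual cpufreq sysfs layout under `sysfs_root`.
    Returns the list of scaling_governor paths that were written.
    """
    pattern = os.path.join(
        glob.escape(sysfs_root), "cpu[0-9]*", "cpufreq", "scaling_governor"
    )
    written = []
    for path in sorted(glob.glob(pattern)):
        # Refuse to write a governor the kernel does not offer for this policy.
        avail = os.path.join(os.path.dirname(path), "scaling_available_governors")
        if os.path.exists(avail):
            with open(avail) as f:
                if governor not in f.read().split():
                    raise ValueError(f"{governor!r} is not available for {path}")
        with open(path, "w") as f:
            f.write(governor)
        written.append(path)
    return written
```

On the target, calling `set_governor("performance")` as root has the same effect as `echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor` on a single-core AM335x. The `sysfs_root` parameter exists only so the function can be exercised against a fake directory tree.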