This thread has been locked.
If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.
Tool/software:
We are facing a problem which the cpu usage of kworker becomes high (about 70%)
and some software stop working.
At this time, musb_urb_dequeque() in usdb_dhrc and rx_submit() in usbnet have high cpu usage in perf tool.
When we repeats following steps with USB connected LTE modem, the kworker's problem occurs once every 20 to 200 times.
And the problem is recovered if musb driver is reloaded as bellow.
--------------------
$ rmmod musb_dsps
$ rmmod musb_hdrc
$ modprobe musb_dsps
--------------------
Please tell me the cause of this problem and how to solve or prevent it.
- Here are the steps:
1. Set Modem power on.
2. Wait for serial ports to appears.
(/dev/ttyACM[0-3] appears about 10 sec after power on.)
3. Wait for a ready-to-use message from MODEM.
(^SYSTART messages is transmitted from MODEM when MODEM becomes ready.)
4. Send following AT commands to /dev/ttyACM0 to activate network.
---------------------------------------------------------
[2024-07-26 13:44:41.806] ATE1
[2024-07-26 13:44:41.806] OK
[2024-07-26 13:44:41.951] AT+CMEE=2
[2024-07-26 13:44:41.951] OK
[2024-07-26 13:44:42.088] AT^SGPSC="NMEA/Output","on"
[2024-07-26 13:44:42.088] ^SGPSC: "Nmea/Output","on"
[2024-07-26 13:44:42.088] OK
[2024-07-26 13:44:42.230] AT^SGPSC="Engine","1"
[2024-07-26 13:44:42.230] ^SGPSC: "Engine","1"
[2024-07-26 13:44:42.230] OK
[2024-07-26 13:44:42.359] AT+CPIN?
[2024-07-26 13:44:42.359] +CPIN: READY
[2024-07-26 13:44:42.359] OK
...
[2024-07-26 13:44:43.272] # make_line_active
[2024-07-26 13:44:43.481] AT+COPS?
[2024-07-26 13:44:43.481] +COPS: 0,0,"NTT DOCOMO",7
[2024-07-26 13:44:43.498] OK
[2024-07-26 13:44:43.608] AT+CEREG?
[2024-07-26 13:44:43.637] +CIEV: prov,1,"Commercial-DCM"
[2024-07-26 13:44:43.637] +CEREG: 0,1
[2024-07-26 13:44:43.637] OK
[2024-07-26 13:44:43.766] AT^SWWAN?
[2024-07-26 13:44:43.766] OK
[2024-07-26 13:44:43.878] AT+CREG?
[2024-07-26 13:44:43.878] +CREG: 0,1
[2024-07-26 13:44:43.893] OK
[2024-07-26 13:44:44.006] AT^SWWAN=1,1
[2024-07-26 13:44:44.075] OK
---------------------------------------------------------
5. Receive GPS NMEA sentences on /det/ttyS1 for about 50 seconds.
6. Set Modem power off.
7. repeat from step 1.
- Board: Custom board
- LTE modem interface: USB1 in host mode.
LTE network interface name is eth0.
- device tree:
&am33xx_pinmux { ... usb1_pins_default: usb1_pins_default { pinctrl-single,pins = < AM33XX_IOPAD(0xa34, PIN_OUTPUT | MUX_MODE0) /* (F15) USB1_DRVVBUS.USB1_DRVVBUS */ >; }; usb1_pins_sleep: usb1_pins_sleep { pinctrl-single,pins = < AM33XX_IOPAD(0xa34, PIN_OUTPUT | MUX_MODE7) /* (F15) USB1_DRVVBUS.gpio3_13 */ >; }; } &usb { status = "okay"; }; &usb_ctrl_mod { status = "okay"; }; &usb1_phy { status = "okay"; }; &usb1 { pinctrl-names = "default", "sleep"; pinctrl-0 = <&usb1_pins_default>; pinctrl-1 = <&usb1_pins_sleep>; status = "okay"; dr_mode = "host"; };
- result of 'top' command when the problem occurs:
top - 23:22:36 up 13:10, 2 users, load average: 10.90, 10.05, 8.71 Tasks: 89 total, 2 running, 87 sleeping, 0 stopped, 0 zombie %Cpu(s): 14.7 us, 76.5 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 8.8 si, 0.0 st MiB Mem : 237.2 total, 111.5 free, 34.0 used, 91.7 buff/cache MiB Swap: 0.0 total, 0.0 free, 0.0 used. 192.8 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 5762 root 20 0 0 0 0 R 64.0 0.0 341:42.00 kworker/0:3+events 10 root 20 0 0 0 0 S 16.0 0.0 75:42.61 ksoftirqd/0 6798 root 20 0 3608 1784 1508 R 16.0 0.7 0:00.11 top 1 root 20 0 24728 5428 3864 S 0.0 2.2 1:58.37 systemd 2 root 20 0 0 0 0 S 0.0 0.0 0:00.08 kthreadd 3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp 4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp 8 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq 9 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_tasks_kthre 11 root 20 0 0 0 0 I 0.0 0.0 0:19.47 rcu_preempt 12 root 20 0 0 0 0 S 0.0 0.0 0:00.56 kdevtmpfs 13 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 netns
- result of 'perf' command when the problem occurs:
This shows musb_urb_dequeque() in usdb_dhrc and rx_submit() in usbnet have high cpu usage.
----------------------------------------------------------------- Samples: 12K of event 'cycles', Event count (approx.): 354629522 Children Self Command Shared Object Symbol + 76.97% 0.00% kworker/0:6+eve [kernel.kallsyms] [k] ret_from_fork + 76.97% 0.00% kworker/0:6+eve [kernel.kallsyms] [k] kthread + 76.97% 0.00% kworker/0:6+eve [kernel.kallsyms] [k] worker_thread + 76.97% 0.00% kworker/0:6+eve [kernel.kallsyms] [k] process_one_work + 76.69% 0.00% kworker/0:6+eve [usbnet] [k] usbnet_deferred_kevent + 76.69% 0.00% kworker/0:6+eve [usbnet] [k] __handle_link_change.pa + 76.63% 1.00% kworker/0:6+eve [usbnet] [k] unlink_urbs.constprop.0 + 72.65% 0.96% kworker/0:6+eve [kernel.kallsyms] [k] usb_hcd_unlink_urb + 71.03% 27.84% kworker/0:6+eve [musb_hdrc] [k] musb_urb_dequeue + 30.17% 0.25% kworker/0:6+eve [kernel.kallsyms] [k] __softirqentry_text_sta + 30.17% 0.00% kworker/0:6+eve [kernel.kallsyms] [k] __irq_svc + 30.17% 0.00% kworker/0:6+eve [kernel.kallsyms] [k] __handle_domain_irq + 30.17% 0.00% kworker/0:6+eve [kernel.kallsyms] [k] irq_exit + 28.49% 0.08% kworker/0:6+eve [kernel.kallsyms] [k] tasklet_action_common.c + 28.41% 0.13% kworker/0:6+eve [usbnet] [k] usbnet_bh + 28.28% 0.19% kworker/0:6+eve [usbnet] [k] rx_alloc_submit + 28.08% 27.81% kworker/0:6+eve [usbnet] [k] rx_submit + 13.04% 0.15% kworker/0:6+eve [kernel.kallsyms] [k] preempt_schedule_common + 12.94% 0.25% kworker/0:6+eve [kernel.kallsyms] [k] __sched_text_start + 12.69% 12.68% kworker/0:6+eve [kernel.kallsyms] [k] finish_task_switch + 5.13% 0.00% ksoftirqd/0 [kernel.kallsyms] [k] ret_from_fork + 5.13% 0.00% ksoftirqd/0 [kernel.kallsyms] [k] kthread + 5.13% 0.40% ksoftirqd/0 [kernel.kallsyms] [k] smpboot_thread_fn + 4.52% 0.27% ksoftirqd/0 [kernel.kallsyms] [k] schedule -----------------------------------------------------------------
- journal log around the problem:
Jul 25 16:24:42 am335x-evm kernel: usb 1-1: new high-speed USB device number 28 using musb-hdrc Jul 25 16:24:42 am335x-evm kernel: usb 1-1: New USB device found, idVendor=1e2d, idProduct=0069, bcdDevice= 0.00 Jul 25 16:24:42 am335x-evm kernel: usb 1-1: New USB device strings: Mfr=4, Product=3, SerialNumber=5 Jul 25 16:24:42 am335x-evm kernel: usb 1-1: Product: PLSx3 Jul 25 16:24:42 am335x-evm kernel: usb 1-1: Manufacturer: Cinterion Wireless Modules Jul 25 16:24:42 am335x-evm kernel: usb 1-1: SerialNumber: 4a722271 Jul 25 16:24:42 am335x-evm kernel: cdc_acm 1-1:1.0: ttyACM0: USB ACM device Jul 25 16:24:42 am335x-evm kernel: cdc_acm 1-1:1.2: ttyACM1: USB ACM device Jul 25 16:24:42 am335x-evm kernel: cdc_acm 1-1:1.4: ttyACM2: USB ACM device Jul 25 16:24:42 am335x-evm kernel: cdc_acm 1-1:1.6: ttyACM3: USB ACM device Jul 25 16:24:42 am335x-evm connmand[715]: eth0 {create} index 284 type 1 <ETHER> Jul 25 16:24:42 am335x-evm connmand[715]: eth0 {update} flags 4098 <DOWN> Jul 25 16:24:42 am335x-evm connmand[715]: eth0 {newlink} index 284 address 00:A0:C6:22:27:10 mtu 1500 Jul 25 16:24:42 am335x-evm systemd-networkd[718]: eth0: Link UP Jul 25 16:24:42 am335x-evm connmand[715]: eth0 {newlink} index 284 operstate 2 <DOWN> Jul 25 16:24:42 am335x-evm systemd-networkd[718]: eth0: Gained carrier Jul 25 16:24:42 am335x-evm connmand[715]: Adding interface eth0 [ ethernet ] Jul 25 16:24:42 am335x-evm kernel: cdc_ether 1-1:1.8 eth0: register 'cdc_ether' at usb-musb-hdrc.1-1, CDC Ethernet Device, 00:a0:c6:22:27:10 Jul 25 16:24:42 am335x-evm connmand[715]: eth0 {update} flags 102467 <UP,RUNNING,LOWER_UP> Jul 25 16:24:42 am335x-evm connmand[715]: eth0 {newlink} index 284 address 00:A0:C6:22:27:10 mtu 1500 Jul 25 16:24:42 am335x-evm connmand[715]: eth0 {newlink} index 284 operstate 0 <UNKNOWN> Jul 25 16:24:42 am335x-evm connmand[715]: eth0 {add} route fe80:: gw :: scope 0 <UNIVERSE> Jul 25 16:24:43 am335x-evm connmand[715]: eth0 {RX} 0 packets 0 bytes Jul 25 16:24:43 am335x-evm connmand[715]: eth0 {TX} 2 packets 700 bytes Jul 25 16:24:43 am335x-evm connmand[715]: eth0 {update} flags 36867 <UP> Jul 25 16:24:43 am335x-evm systemd-networkd[718]: eth0: Lost carrier Jul 25 16:24:43 am335x-evm connmand[715]: eth0 {newlink} index 284 address 00:A0:C6:22:27:10 mtu 1500 Jul 25 16:24:43 am335x-evm connmand[715]: eth0 {newlink} index 284 operstate 2 <DOWN> Jul 25 16:24:43 am335x-evm connmand[715]: eth0 {del} route fe80:: gw :: scope 0 <UNIVERSE> Jul 25 16:24:43 am335x-evm systemd-udevd[6709]: 1-1:1.1: Process '/sbin/modprobe usb:v1E2Dp0069d0000dcEFdsc02dp01ic0Aisc00ip00in01' failed with exit code 1. Jul 25 16:24:43 am335x-evm systemd-udevd[6711]: 1-1:1.3: Process '/sbin/modprobe usb:v1E2Dp0069d0000dcEFdsc02dp01ic0Aisc00ip00in03' failed with exit code 1. Jul 25 16:24:43 am335x-evm systemd-udevd[6716]: 1-1:1.7: Process '/sbin/modprobe usb:v1E2Dp0069d0000dcEFdsc02dp01ic0Aisc00ip00in07' failed with exit code 1. Jul 25 16:24:43 am335x-evm systemd-udevd[6713]: 1-1:1.5: Process '/sbin/modprobe usb:v1E2Dp0069d0000dcEFdsc02dp01ic0Aisc00ip00in05' failed with exit code 1. Jul 25 16:24:43 am335x-evm systemd-udevd[6718]: 1-1:1.9: Process '/sbin/modprobe usb:v1E2Dp0069d0000dcEFdsc02dp01ic0Aisc00ip00in09' failed with exit code 1. Jul 25 16:24:43 am335x-evm systemd-udevd[6717]: Using default interface naming scheme 'v243'. Jul 25 16:24:43 am335x-evm systemd-udevd[6717]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable. Jul 25 16:24:44 am335x-evm systemd-networkd[718]: eth0: IPv6 successfully enabled Jul 25 16:24:44 am335x-evm connmand[715]: eth0 {RX} 0 packets 0 bytes Jul 25 16:24:44 am335x-evm connmand[715]: eth0 {TX} 2 packets 700 bytes Jul 25 16:24:44 am335x-evm connmand[715]: eth0 {newlink} index 284 address 00:A0:C6:22:27:10 mtu 1500 Jul 25 16:24:44 am335x-evm connmand[715]: eth0 {newlink} index 284 operstate 2 <DOWN>
Thanks
Hi Bin san,
I forgot to describe important informations.
kernel version is 5.10.100+
URL: git://git.ti.com/ti-linux-kernel/ti-linux-kernel.git
tag: 08.02.00.006
branch: ti-linux-5.10.y
SDK version is 08_02_00_24
Best regards,
Katsuyama
Hi Katsuyama-san,
Let me look into how to debug the problem.
4554 # CONFIG_MUSB_PIO_ONLY is not set
4555 CONFIG_USB_TI_CPPI41_DMA=y
Your kernel has MUSB DMA enabled. Can you please disable it in kernel menuconfig to use MUSB PIO mode to see if the issue still happens?
Hi Bin san,
Than you for your advice.
I tried disabling MUSB DMA with CONFIG_UMSB_PIO_ONLY=y.
But, unfotunately, there was no effect.
We are in a very cornered situation.
Are there any other possible solutions?
Best regards,
Katsuyama
Hi Bin san,
The following log remains in journal.log both before and after disabling MUSB DMA:
am335x-evm systemd-udevd[117]: 47400000.dma-controller: Process '/sbin/modprobe of:Ndma-controllerT(null)Cti,am3359-cppi41' failed with exit code 1.
Is there any relationship?
Best regards,
Katsuyama
Hi Katsuyama-san,
Bin is out of office until the end of next week. Please expect a delayed response.
Hi Katsuyama-san,
Sorry for my late response.
The following log remains in journal.log both before and after disabling MUSB DMA:
I am not sure what causes this message, it could be something in the root filesystem. But I won't worry about it, since the MUSB DMA driver should be built in kernel when it is enabled (CONFIG_USB_TI_CPPI41_DMA=y).
Hi Bin san,
Thank you for your reply.
I got following trace log when the problem occurred.
Aug 11 06:52:24 am335x-evm kernel: ------------[ cut here ]------------ Aug 11 06:52:24 am335x-evm kernel: WARNING: CPU: 0 PID: 22742 at drivers/usb/musb/musb_host.c:113 musb_h_tx_flush_fifo+0x13c/0x140 [musb_hdrc] Aug 11 06:52:24 am335x-evm kernel: musb-hdrc musb-hdrc.1: Could not flush host TX2 fifo: csr: 2403 Aug 11 06:52:24 am335x-evm kernel: Modules linked in: musb_dsps musb_hdrc cdc_ether usbnet cdc_acm iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ip_tables x_tables 8021q g_serial wl18xx wlcore mac80211 libarc4 cfg80211 omap_aes_driver omap_sham omap_crypto crypto_engine pm33xx ti_emif_sram c_can_platform c_can can_dev wlcore_sdio phy_am335x phy_generic wkup_m3_ipc phy_am335x_control omap_wdt wkup_m3_rproc uio_module_drv(O) uio cryptodev(O) [last unloaded: musb_hdrc] Aug 11 06:52:24 am335x-evm kernel: CPU: 0 PID: 22742 Comm: modem_status.sh Tainted: G O 5.10.100+ #1 Aug 11 06:52:24 am335x-evm kernel: Hardware name: Generic AM33XX (Flattened Device Tree) Aug 11 06:52:24 am335x-evm kernel: [<c010d584>] (unwind_backtrace) from [<c010a12c>] (show_stack+0x10/0x14) Aug 11 06:52:24 am335x-evm kernel: [<c010a12c>] (show_stack) from [<c0a44b54>] (__warn+0xd4/0xec) Aug 11 06:52:24 am335x-evm kernel: [<c0a44b54>] (__warn) from [<c0a44c04>] (warn_slowpath_fmt+0x98/0xc8) Aug 11 06:52:24 am335x-evm kernel: [<c0a44c04>] (warn_slowpath_fmt) from [<bf24d254>] (musb_h_tx_flush_fifo+0x13c/0x140 [musb_hdrc]) Aug 11 06:52:24 am335x-evm kernel: [<bf24d254>] (musb_h_tx_flush_fifo [musb_hdrc]) from [<bf24e6dc>] (musb_cleanup_urb+0x110/0x1fc [musb_hdrc]) Aug 11 06:52:24 am335x-evm kernel: [<bf24e6dc>] (musb_cleanup_urb [musb_hdrc]) from [<bf24e8e4>] (musb_urb_dequeue+0x11c/0x150 [musb_hdrc]) Aug 11 06:52:24 am335x-evm kernel: [<bf24e8e4>] (musb_urb_dequeue [musb_hdrc]) from [<c07570d4>] (usb_hcd_unlink_urb+0xac/0xc4) Aug 11 06:52:24 am335x-evm kernel: [<c07570d4>] (usb_hcd_unlink_urb) from [<c0758360>] (usb_poison_urb+0x60/0xdc) Aug 11 06:52:24 am335x-evm kernel: [<c0758360>] (usb_poison_urb) from [<bf03ce94>] (acm_poison_urbs+0x24/0x5c [cdc_acm]) Aug 11 06:52:24 am335x-evm kernel: [<bf03ce94>] (acm_poison_urbs [cdc_acm]) from [<bf03d158>] (acm_port_shutdown+0x14/0xb8 [cdc_acm]) Aug 11 06:52:24 am335x-evm kernel: [<bf03d158>] (acm_port_shutdown [cdc_acm]) from [<c05997c8>] (tty_port_shutdown+0x74/0xa0) Aug 11 06:52:24 am335x-evm kernel: [<c05997c8>] (tty_port_shutdown) from [<c0599d9c>] (tty_port_close+0x3c/0x7c) Aug 11 06:52:24 am335x-evm kernel: [<c0599d9c>] (tty_port_close) from [<c05916a0>] (tty_release+0xf0/0x4b0) Aug 11 06:52:24 am335x-evm kernel: [<c05916a0>] (tty_release) from [<c024536c>] (__fput+0x78/0x270) Aug 11 06:52:24 am335x-evm kernel: [<c024536c>] (__fput) from [<c0142a88>] (task_work_run+0x98/0xcc) Aug 11 06:52:24 am335x-evm kernel: [<c0142a88>] (task_work_run) from [<c0109ed0>] (do_work_pending+0x57c/0x598) Aug 11 06:52:24 am335x-evm kernel: [<c0109ed0>] (do_work_pending) from [<c01000cc>] (slow_work_pending+0xc/0x20) Aug 11 06:52:24 am335x-evm kernel: Exception stack(0xc4b3ffb0 to 0xc4b3fff8) Aug 11 06:52:24 am335x-evm kernel: ffa0: 00000000 00000004 00000113 00000000 Aug 11 06:52:24 am335x-evm kernel: ffc0: 000fcff0 b6f1a230 00000000 00000006 00000009 00000011 00121038 000caaf4 Aug 11 06:52:24 am335x-evm kernel: ffe0: 00000006 bef55620 48dddcf3 48d68d16 20070030 00000011 Aug 11 06:52:24 am335x-evm kernel: ---[ end trace 8d94f53d281fd74f ]---
And I found your following patches for kernel 4.2.0 in 2016.
(1/2) marc.info/
(2/2) marc.info/
Will these solve our problem in kernel 5.10?
If so, does this mean I don't need to disable MUSB DMA?
Best regards,
Katsuyama
Hi Katsuyama-san,
I got following trace log when the problem occurred.
This kernel WARNING should be harmless, it can be ignored.
Have you tested with MUSB CPPI41DMA disabled to see if the issue still happens?
Hi Bin san,
Thank you for your reply.
This kernel WARNING should be harmless, it can be ignored.
I see. Thank you for your answer.
Have you tested with MUSB CPPI41DMA disabled to see if the issue still happens?
Yes, I could have a good result, but not perfect.
The combination with reloading musb_hdrc & musb_dsps before powering on the modem
seems perfect.
I checked the source code of the driver.
The '(1/2) patch' has already been applied in kernel 5.10.
So, I applied '(2/2) patch' and 'retries=10000' one and test with enabling MUSB DMA.
But, the result was NG.
Some people in my company and my customer don't want to disable MSUB DMA and
reload the musb driver. And they expected a good result with these patches.
So, can I get your comment on the following?
1. These patches are valid or invalid?
2. The mechanism which this problem occurs with MUSB DMA.
3. Cons of disabling MUSB DMA.
(I understand that transfer speeds will be slower.
Best regards,
Katsuyama
Hi Katsuyama-san,
1. These patches are valid or invalid?
As you said, the first patch (1/2) is already in the kernel, so it is valid. I will review the second patch and get back to you.
2. The mechanism which this problem occurs with MUSB DMA.
First of all, without further debug into the issue, I don't know if the issue is related to the MUSB DMA. That is why I asked you to test with the DMA disabled to see if the issue still happens or not.
3. Cons of disabling MUSB DMA.
(I understand that transfer speeds will be slower.
Did you compared the transfer speed with and without MUSB DMA in your use case? What is the difference?