AM3356: kworker becomes high cpu usage and some software stop with USB connected LTE modem

Part Number: AM3356

Tool/software:

We are facing a problem which the cpu usage of kworker becomes high (about 70%)
and some software stop working.
At this time, musb_urb_dequeque() in usdb_dhrc and rx_submit() in usbnet have high cpu usage in perf tool.

When we repeats following steps with USB connected LTE modem, the kworker's problem occurs once every 20 to 200 times.

And the problem is recovered if musb driver is reloaded as bellow.

--------------------
$ rmmod musb_dsps
$ rmmod musb_hdrc
$ modprobe musb_dsps
--------------------

Please tell me the cause of this problem and how to solve or prevent it.

- Here are the steps:
1. Set Modem power on.
2. Wait for serial ports to appears.
(/dev/ttyACM[0-3] appears about 10 sec after power on.)
3. Wait for a ready-to-use message from MODEM.
(^SYSTART messages is transmitted from MODEM when MODEM becomes ready.)
4. Send following AT commands to /dev/ttyACM0 to activate network.
---------------------------------------------------------
[2024-07-26 13:44:41.806] ATE1
[2024-07-26 13:44:41.806] OK
[2024-07-26 13:44:41.951] AT+CMEE=2
[2024-07-26 13:44:41.951] OK
[2024-07-26 13:44:42.088] AT^SGPSC="NMEA/Output","on"
[2024-07-26 13:44:42.088] ^SGPSC: "Nmea/Output","on"
[2024-07-26 13:44:42.088] OK
[2024-07-26 13:44:42.230] AT^SGPSC="Engine","1"
[2024-07-26 13:44:42.230] ^SGPSC: "Engine","1"
[2024-07-26 13:44:42.230] OK
[2024-07-26 13:44:42.359] AT+CPIN?
[2024-07-26 13:44:42.359] +CPIN: READY
[2024-07-26 13:44:42.359] OK
...
[2024-07-26 13:44:43.272] # make_line_active
[2024-07-26 13:44:43.481] AT+COPS?
[2024-07-26 13:44:43.481] +COPS: 0,0,"NTT DOCOMO",7
[2024-07-26 13:44:43.498] OK
[2024-07-26 13:44:43.608] AT+CEREG?
[2024-07-26 13:44:43.637] +CIEV: prov,1,"Commercial-DCM"
[2024-07-26 13:44:43.637] +CEREG: 0,1
[2024-07-26 13:44:43.637] OK

[2024-07-26 13:44:43.766] AT^SWWAN?
[2024-07-26 13:44:43.766] OK

[2024-07-26 13:44:43.878] AT+CREG?
[2024-07-26 13:44:43.878] +CREG: 0,1
[2024-07-26 13:44:43.893] OK

[2024-07-26 13:44:44.006] AT^SWWAN=1,1
[2024-07-26 13:44:44.075] OK
---------------------------------------------------------
5. Receive GPS NMEA sentences on /det/ttyS1 for about 50 seconds.
6. Set Modem power off.

7. repeat from step 1.

- Board: Custom board

- LTE modem interface: USB1 in host mode.
LTE network interface name is eth0.

- device tree:

  &am33xx_pinmux {
      ...
      usb1_pins_default: usb1_pins_default {
	      pinctrl-single,pins = <
		      AM33XX_IOPAD(0xa34, PIN_OUTPUT | MUX_MODE0) 	/* (F15) USB1_DRVVBUS.USB1_DRVVBUS */
	      >;
      };

      usb1_pins_sleep: usb1_pins_sleep {
	      pinctrl-single,pins = <
		      AM33XX_IOPAD(0xa34, PIN_OUTPUT | MUX_MODE7) 	/* (F15) USB1_DRVVBUS.gpio3_13 */
	      >;
      };
  }
  &usb {
	  status = "okay";
  };

  &usb_ctrl_mod {
	  status = "okay";
  };
  &usb1_phy {
	  status = "okay";
  };
  &usb1 {
	  pinctrl-names = "default", "sleep";
	  pinctrl-0 = <&usb1_pins_default>;
	  pinctrl-1 = <&usb1_pins_sleep>;
	  status = "okay";
	  dr_mode = "host";
  };

- result of 'top' command when the problem occurs:

  top - 23:22:36 up 13:10,  2 users,  load average: 10.90, 10.05, 8.71
  Tasks:  89 total,   2 running,  87 sleeping,   0 stopped,   0 zombie
  %Cpu(s): 14.7 us, 76.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  8.8 si,  0.0 st
  MiB Mem :    237.2 total,    111.5 free,     34.0 used,     91.7 buff/cache
  MiB Swap:      0.0 total,      0.0 free,      0.0 used.    192.8 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   5762 root      20   0       0      0      0 R  64.0   0.0 341:42.00 kworker/0:3+events
     10 root      20   0       0      0      0 S  16.0   0.0  75:42.61 ksoftirqd/0
   6798 root      20   0    3608   1784   1508 R  16.0   0.7   0:00.11 top
      1 root      20   0   24728   5428   3864 S   0.0   2.2   1:58.37 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.08 kthreadd
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp
      8 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq
      9 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_tasks_kthre
     11 root      20   0       0      0      0 I   0.0   0.0   0:19.47 rcu_preempt
     12 root      20   0       0      0      0 S   0.0   0.0   0:00.56 kdevtmpfs
     13 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 netns

- result of 'perf' command when the problem occurs:

This shows musb_urb_dequeque() in usdb_dhrc and rx_submit() in usbnet have high cpu usage.

  -----------------------------------------------------------------
    Samples: 12K of event 'cycles', Event count (approx.): 354629522
    Children      Self  Command          Shared Object                         Symbol
  +   76.97%     0.00%  kworker/0:6+eve  [kernel.kallsyms]                     [k] ret_from_fork
  +   76.97%     0.00%  kworker/0:6+eve  [kernel.kallsyms]                     [k] kthread
  +   76.97%     0.00%  kworker/0:6+eve  [kernel.kallsyms]                     [k] worker_thread
  +   76.97%     0.00%  kworker/0:6+eve  [kernel.kallsyms]                     [k] process_one_work
  +   76.69%     0.00%  kworker/0:6+eve  [usbnet]                              [k] usbnet_deferred_kevent
  +   76.69%     0.00%  kworker/0:6+eve  [usbnet]                              [k] __handle_link_change.pa
  +   76.63%     1.00%  kworker/0:6+eve  [usbnet]                              [k] unlink_urbs.constprop.0
  +   72.65%     0.96%  kworker/0:6+eve  [kernel.kallsyms]                     [k] usb_hcd_unlink_urb
  +   71.03%    27.84%  kworker/0:6+eve  [musb_hdrc]                           [k] musb_urb_dequeue
  +   30.17%     0.25%  kworker/0:6+eve  [kernel.kallsyms]                     [k] __softirqentry_text_sta
  +   30.17%     0.00%  kworker/0:6+eve  [kernel.kallsyms]                     [k] __irq_svc
  +   30.17%     0.00%  kworker/0:6+eve  [kernel.kallsyms]                     [k] __handle_domain_irq
  +   30.17%     0.00%  kworker/0:6+eve  [kernel.kallsyms]                     [k] irq_exit
  +   28.49%     0.08%  kworker/0:6+eve  [kernel.kallsyms]                     [k] tasklet_action_common.c
  +   28.41%     0.13%  kworker/0:6+eve  [usbnet]                              [k] usbnet_bh
  +   28.28%     0.19%  kworker/0:6+eve  [usbnet]                              [k] rx_alloc_submit
  +   28.08%    27.81%  kworker/0:6+eve  [usbnet]                              [k] rx_submit
  +   13.04%     0.15%  kworker/0:6+eve  [kernel.kallsyms]                     [k] preempt_schedule_common
  +   12.94%     0.25%  kworker/0:6+eve  [kernel.kallsyms]                     [k] __sched_text_start
  +   12.69%    12.68%  kworker/0:6+eve  [kernel.kallsyms]                     [k] finish_task_switch
  +    5.13%     0.00%  ksoftirqd/0      [kernel.kallsyms]                     [k] ret_from_fork
  +    5.13%     0.00%  ksoftirqd/0      [kernel.kallsyms]                     [k] kthread
  +    5.13%     0.40%  ksoftirqd/0      [kernel.kallsyms]                     [k] smpboot_thread_fn
  +    4.52%     0.27%  ksoftirqd/0      [kernel.kallsyms]                     [k] schedule
  -----------------------------------------------------------------

- journal log around the problem:

  Jul 25 16:24:42 am335x-evm kernel: usb 1-1: new high-speed USB device number 28 using musb-hdrc
  Jul 25 16:24:42 am335x-evm kernel: usb 1-1: New USB device found, idVendor=1e2d, idProduct=0069, bcdDevice= 0.00
  Jul 25 16:24:42 am335x-evm kernel: usb 1-1: New USB device strings: Mfr=4, Product=3, SerialNumber=5
  Jul 25 16:24:42 am335x-evm kernel: usb 1-1: Product: PLSx3
  Jul 25 16:24:42 am335x-evm kernel: usb 1-1: Manufacturer: Cinterion Wireless Modules
  Jul 25 16:24:42 am335x-evm kernel: usb 1-1: SerialNumber: 4a722271
  Jul 25 16:24:42 am335x-evm kernel: cdc_acm 1-1:1.0: ttyACM0: USB ACM device
  Jul 25 16:24:42 am335x-evm kernel: cdc_acm 1-1:1.2: ttyACM1: USB ACM device
  Jul 25 16:24:42 am335x-evm kernel: cdc_acm 1-1:1.4: ttyACM2: USB ACM device
  Jul 25 16:24:42 am335x-evm kernel: cdc_acm 1-1:1.6: ttyACM3: USB ACM device
  Jul 25 16:24:42 am335x-evm connmand[715]: eth0 {create} index 284 type 1 <ETHER>
  Jul 25 16:24:42 am335x-evm connmand[715]: eth0 {update} flags 4098 <DOWN>
  Jul 25 16:24:42 am335x-evm connmand[715]: eth0 {newlink} index 284 address 00:A0:C6:22:27:10 mtu 1500
  Jul 25 16:24:42 am335x-evm systemd-networkd[718]: eth0: Link UP
  Jul 25 16:24:42 am335x-evm connmand[715]: eth0 {newlink} index 284 operstate 2 <DOWN>
  Jul 25 16:24:42 am335x-evm systemd-networkd[718]: eth0: Gained carrier
  Jul 25 16:24:42 am335x-evm connmand[715]: Adding interface eth0 [ ethernet ]
  Jul 25 16:24:42 am335x-evm kernel: cdc_ether 1-1:1.8 eth0: register 'cdc_ether' at usb-musb-hdrc.1-1, CDC Ethernet Device, 00:a0:c6:22:27:10
  Jul 25 16:24:42 am335x-evm connmand[715]: eth0 {update} flags 102467 <UP,RUNNING,LOWER_UP>
  Jul 25 16:24:42 am335x-evm connmand[715]: eth0 {newlink} index 284 address 00:A0:C6:22:27:10 mtu 1500
  Jul 25 16:24:42 am335x-evm connmand[715]: eth0 {newlink} index 284 operstate 0 <UNKNOWN>
  Jul 25 16:24:42 am335x-evm connmand[715]: eth0 {add} route fe80:: gw :: scope 0 <UNIVERSE>
  Jul 25 16:24:43 am335x-evm connmand[715]: eth0 {RX} 0 packets 0 bytes
  Jul 25 16:24:43 am335x-evm connmand[715]: eth0 {TX} 2 packets 700 bytes
  Jul 25 16:24:43 am335x-evm connmand[715]: eth0 {update} flags 36867 <UP>
  Jul 25 16:24:43 am335x-evm systemd-networkd[718]: eth0: Lost carrier
  Jul 25 16:24:43 am335x-evm connmand[715]: eth0 {newlink} index 284 address 00:A0:C6:22:27:10 mtu 1500
  Jul 25 16:24:43 am335x-evm connmand[715]: eth0 {newlink} index 284 operstate 2 <DOWN>
  Jul 25 16:24:43 am335x-evm connmand[715]: eth0 {del} route fe80:: gw :: scope 0 <UNIVERSE>
  Jul 25 16:24:43 am335x-evm systemd-udevd[6709]: 1-1:1.1: Process '/sbin/modprobe usb:v1E2Dp0069d0000dcEFdsc02dp01ic0Aisc00ip00in01' failed with exit code 1.
  Jul 25 16:24:43 am335x-evm systemd-udevd[6711]: 1-1:1.3: Process '/sbin/modprobe usb:v1E2Dp0069d0000dcEFdsc02dp01ic0Aisc00ip00in03' failed with exit code 1.
  Jul 25 16:24:43 am335x-evm systemd-udevd[6716]: 1-1:1.7: Process '/sbin/modprobe usb:v1E2Dp0069d0000dcEFdsc02dp01ic0Aisc00ip00in07' failed with exit code 1.
  Jul 25 16:24:43 am335x-evm systemd-udevd[6713]: 1-1:1.5: Process '/sbin/modprobe usb:v1E2Dp0069d0000dcEFdsc02dp01ic0Aisc00ip00in05' failed with exit code 1.
  Jul 25 16:24:43 am335x-evm systemd-udevd[6718]: 1-1:1.9: Process '/sbin/modprobe usb:v1E2Dp0069d0000dcEFdsc02dp01ic0Aisc00ip00in09' failed with exit code 1.
  Jul 25 16:24:43 am335x-evm systemd-udevd[6717]: Using default interface naming scheme 'v243'.
  Jul 25 16:24:43 am335x-evm systemd-udevd[6717]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
  Jul 25 16:24:44 am335x-evm systemd-networkd[718]: eth0: IPv6 successfully enabled
  Jul 25 16:24:44 am335x-evm connmand[715]: eth0 {RX} 0 packets 0 bytes
  Jul 25 16:24:44 am335x-evm connmand[715]: eth0 {TX} 2 packets 700 bytes
  Jul 25 16:24:44 am335x-evm connmand[715]: eth0 {newlink} index 284 address 00:A0:C6:22:27:10 mtu 1500
  Jul 25 16:24:44 am335x-evm connmand[715]: eth0 {newlink} index 284 operstate 2 <DOWN>

Thanks

  • Hi,

    Which version of the kernel do you use on AM3356?

  • Hi Bin san,

    I forgot to describe important informations.

    kernel version is 5.10.100+
    URL: git://git.ti.com/ti-linux-kernel/ti-linux-kernel.git
    tag: 08.02.00.006
    branch: ti-linux-5.10.y

    SDK version is 08_02_00_24

    Best regards,
    Katsuyama

  • Hi Bin-san,

    I attach kernel config.linux-config.zip

  • Hi Katsuyama-san,

    Let me look into how to debug the problem.

      4554 # CONFIG_MUSB_PIO_ONLY is not set
      4555 CONFIG_USB_TI_CPPI41_DMA=y

    Your kernel has MUSB DMA enabled. Can you please disable it in kernel menuconfig to use MUSB PIO mode to see if the issue still happens?

  • Hi Bin san,

    Than you for your advice.
    I tried disabling MUSB DMA with CONFIG_UMSB_PIO_ONLY=y.
    But, unfotunately, there was no effect.

    We are in a very cornered situation.
    Are there any other possible solutions?

    Best regards,
    Katsuyama

  • Hi Bin san,

    The following log remains in journal.log both before and after disabling MUSB DMA:

    am335x-evm systemd-udevd[117]: 47400000.dma-controller: Process '/sbin/modprobe of:Ndma-controllerT(null)Cti,am3359-cppi41' failed with exit code 1.
    

    Is there any relationship?


    Best regards,
    Katsuyama

  • Hi Katsuyama-san,

    Bin is out of office until the end of next week. Please expect a delayed response.

  • Hi Katsuyama-san,

    Sorry for my late response.

    The following log remains in journal.log both before and after disabling MUSB DMA:

    I am not sure what causes this message, it could be something in the root filesystem. But I won't worry about it, since the MUSB DMA driver should be built in kernel when it is enabled (CONFIG_USB_TI_CPPI41_DMA=y).

  • Hi Bin san,

    Thank you for your reply.

    I got following trace log when the problem occurred.

    Aug 11 06:52:24 am335x-evm kernel: ------------[ cut here ]------------
    Aug 11 06:52:24 am335x-evm kernel: WARNING: CPU: 0 PID: 22742 at drivers/usb/musb/musb_host.c:113 musb_h_tx_flush_fifo+0x13c/0x140 [musb_hdrc]
    Aug 11 06:52:24 am335x-evm kernel: musb-hdrc musb-hdrc.1: Could not flush host TX2 fifo: csr: 2403
    Aug 11 06:52:24 am335x-evm kernel: Modules linked in: musb_dsps musb_hdrc cdc_ether usbnet cdc_acm iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ip_tables x_tables 8021q g_serial wl18xx wlcore mac80211 libarc4 cfg80211 omap_aes_driver omap_sham omap_crypto crypto_engine pm33xx ti_emif_sram c_can_platform c_can can_dev wlcore_sdio phy_am335x phy_generic wkup_m3_ipc phy_am335x_control omap_wdt wkup_m3_rproc uio_module_drv(O) uio cryptodev(O) [last unloaded: musb_hdrc]
    Aug 11 06:52:24 am335x-evm kernel: CPU: 0 PID: 22742 Comm: modem_status.sh Tainted: G           O      5.10.100+ #1
    Aug 11 06:52:24 am335x-evm kernel: Hardware name: Generic AM33XX (Flattened Device Tree)
    Aug 11 06:52:24 am335x-evm kernel: [<c010d584>] (unwind_backtrace) from [<c010a12c>] (show_stack+0x10/0x14)
    Aug 11 06:52:24 am335x-evm kernel: [<c010a12c>] (show_stack) from [<c0a44b54>] (__warn+0xd4/0xec)
    Aug 11 06:52:24 am335x-evm kernel: [<c0a44b54>] (__warn) from [<c0a44c04>] (warn_slowpath_fmt+0x98/0xc8)
    Aug 11 06:52:24 am335x-evm kernel: [<c0a44c04>] (warn_slowpath_fmt) from [<bf24d254>] (musb_h_tx_flush_fifo+0x13c/0x140 [musb_hdrc])
    Aug 11 06:52:24 am335x-evm kernel: [<bf24d254>] (musb_h_tx_flush_fifo [musb_hdrc]) from [<bf24e6dc>] (musb_cleanup_urb+0x110/0x1fc [musb_hdrc])
    Aug 11 06:52:24 am335x-evm kernel: [<bf24e6dc>] (musb_cleanup_urb [musb_hdrc]) from [<bf24e8e4>] (musb_urb_dequeue+0x11c/0x150 [musb_hdrc])
    Aug 11 06:52:24 am335x-evm kernel: [<bf24e8e4>] (musb_urb_dequeue [musb_hdrc]) from [<c07570d4>] (usb_hcd_unlink_urb+0xac/0xc4)
    Aug 11 06:52:24 am335x-evm kernel: [<c07570d4>] (usb_hcd_unlink_urb) from [<c0758360>] (usb_poison_urb+0x60/0xdc)
    Aug 11 06:52:24 am335x-evm kernel: [<c0758360>] (usb_poison_urb) from [<bf03ce94>] (acm_poison_urbs+0x24/0x5c [cdc_acm])
    Aug 11 06:52:24 am335x-evm kernel: [<bf03ce94>] (acm_poison_urbs [cdc_acm]) from [<bf03d158>] (acm_port_shutdown+0x14/0xb8 [cdc_acm])
    Aug 11 06:52:24 am335x-evm kernel: [<bf03d158>] (acm_port_shutdown [cdc_acm]) from [<c05997c8>] (tty_port_shutdown+0x74/0xa0)
    Aug 11 06:52:24 am335x-evm kernel: [<c05997c8>] (tty_port_shutdown) from [<c0599d9c>] (tty_port_close+0x3c/0x7c)
    Aug 11 06:52:24 am335x-evm kernel: [<c0599d9c>] (tty_port_close) from [<c05916a0>] (tty_release+0xf0/0x4b0)
    Aug 11 06:52:24 am335x-evm kernel: [<c05916a0>] (tty_release) from [<c024536c>] (__fput+0x78/0x270)
    Aug 11 06:52:24 am335x-evm kernel: [<c024536c>] (__fput) from [<c0142a88>] (task_work_run+0x98/0xcc)
    Aug 11 06:52:24 am335x-evm kernel: [<c0142a88>] (task_work_run) from [<c0109ed0>] (do_work_pending+0x57c/0x598)
    Aug 11 06:52:24 am335x-evm kernel: [<c0109ed0>] (do_work_pending) from [<c01000cc>] (slow_work_pending+0xc/0x20)
    Aug 11 06:52:24 am335x-evm kernel: Exception stack(0xc4b3ffb0 to 0xc4b3fff8)
    Aug 11 06:52:24 am335x-evm kernel: ffa0:                                     00000000 00000004 00000113 00000000
    Aug 11 06:52:24 am335x-evm kernel: ffc0: 000fcff0 b6f1a230 00000000 00000006 00000009 00000011 00121038 000caaf4
    Aug 11 06:52:24 am335x-evm kernel: ffe0: 00000006 bef55620 48dddcf3 48d68d16 20070030 00000011
    Aug 11 06:52:24 am335x-evm kernel: ---[ end trace 8d94f53d281fd74f ]---
    

    And I found your following patches for kernel 4.2.0 in 2016.

    patchwork.kernel.org/.../

    (1/2) marc.info/
    (2/2) marc.info/

    Will these solve our problem in kernel 5.10?
    If so, does this mean I don't need to disable MUSB DMA?

    Best regards,
    Katsuyama

  • Hi Katsuyama-san,

    I got following trace log when the problem occurred.

    This kernel WARNING should be harmless, it can be ignored.

    Have you tested with MUSB CPPI41DMA disabled to see if the issue still happens?

  • Hi Bin san,

    Thank you for your reply.

    This kernel WARNING should be harmless, it can be ignored.

    I see. Thank you for your answer.

    Have you tested with MUSB CPPI41DMA disabled to see if the issue still happens?

    Yes, I could have a good result, but not perfect.
    The combination with reloading musb_hdrc & musb_dsps before powering on the modem
    seems perfect.

    I checked the source code of the driver.
    The '(1/2) patch' has already been applied in kernel 5.10.
    So, I applied '(2/2) patch' and 'retries=10000' one and test with enabling MUSB DMA.
    But, the result was NG.

    Some people in my company and my customer don't want to disable MSUB DMA and
    reload the musb driver. And they expected a good result with these patches.

    So, can I get your comment on the following?
    1. These patches are valid or invalid?
    2. The mechanism which this problem occurs with MUSB DMA.
    3. Cons of disabling MUSB DMA.
    (I understand that transfer speeds will be slower.

    Best regards,
    Katsuyama

  • Hi Katsuyama-san,

    1. These patches are valid or invalid?

    As you said, the first patch (1/2) is already in the kernel, so it is valid. I will review the second patch and get back to you.

    2. The mechanism which this problem occurs with MUSB DMA.

    First of all, without further debug into the issue, I don't know if the issue is related to the MUSB DMA. That is why I asked you to test with the DMA disabled to see if the issue still happens or not.

    3. Cons of disabling MUSB DMA.
    (I understand that transfer speeds will be slower.

    Did you compared the transfer speed with and without MUSB DMA in your use case? What is the difference?