PROCESSOR-SDK-AM335X: Linux panic due to USB interrupt while suspended

Part Number: PROCESSOR-SDK-AM335X

Hello,

I am supporting a product based upon the AM335x processor running Linux built from the AM335x processor SDK version 4.02.00.09 (kernel 4.9.59) and have observed an occasional kernel panic as follows:

[1283423.745953] Unhandled fault: external abort on non-linefetch (0x1008) at 0xe00f9030
[1283423.754222] pgd = c0004000
[1283423.757255] [e00f9030] *pgd=9d824811, *pte=47401653, *ppte=47401453
[1283423.764067] Internal error: : 1008 [#1] PREEMPT ARM
[1283423.769384] Modules linked in: omap_serial_tpi(O) rnet(O) power_freq(O) actnet(O) ionet(O) arcnet_sohard(O) ti_am335x_adc(O) ti_am335x_tscadc(O) kfifo_buf industrialio bridge stp llc iptable_filter ip_tables cryptodev(O) [last unloaded: power_freq]
[1283423.792834] CPU: 0 PID: 24111 Comm: kworker/0:1 Tainted: G           O    4.9.59-alc #1
[1283423.801441] Hardware name: Generic AM33XX (Flattened Device Tree)
[1283423.808058] Workqueue: pm pm_runtime_work
[1283423.812463] task: ddeda000 task.stack: d1fd8000
[1283423.817421] PC is at musb_default_readl+0x10/0x18
[1283423.822562] LR is at dsps_interrupt+0x50/0x30c
[1283423.827425] pc : [<c0524660>]    lr : [<c0531668>]    psr: 80030193
[1283423.827425] sp : d1fd9c48  ip : d1fd9c58  fp : d1fd9c54
[1283423.839871] r10: dd9d4710  r9 : c0c981e4  r8 : e00f9000
[1283423.845551] r7 : d1fd9cd4  r6 : ddcc6010  r5 : c085060c  r4 : ddaf3740
[1283423.852601] r3 : 00010002  r2 : c0524650  r1 : e00f9030  r0 : e00f9000
[1283423.859657] Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
[1283423.867441] Control: 10c5387d  Table: 9c0c0019  DAC: 00000051
[1283423.873667] Process kworker/0:1 (pid: 24111, stack limit = 0xd1fd8210)

When this panic occurs, the PC is always at musb_default_readl+0x10, as shown above. Tracing the dump shows that the fault occurs in __raw_readl(), which is reached through the musb_readl() call in dsps_interrupt() in musb_dsps.c. Debugging with ftrace indicates that the panic occurs when the ISR runs while the USB peripheral is suspended (clock disabled) and attempts to read the USB status register via musb_readl().
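As a cross-check on that reading of the dump, here is a minimal Python sketch that pulls the fault address, the PC/LR symbols, and the register values out of the oops text and computes how far the faulting address lies from the value in r0. It assumes r0 still holds the mapped register base passed to musb_default_readl() at the time of the fault; the function name `summarize_oops` is made up for this illustration, and the sketch does not claim which register sits at the resulting offset.

```python
import re

# Excerpt of the oops quoted above (timestamps removed for brevity).
OOPS = """\
Unhandled fault: external abort on non-linefetch (0x1008) at 0xe00f9030
PC is at musb_default_readl+0x10/0x18
LR is at dsps_interrupt+0x50/0x30c
r3 : 00010002  r2 : c0524650  r1 : e00f9030  r0 : e00f9000
"""

def summarize_oops(text):
    """Return the fault address, PC/LR symbols, and the fault offset from r0."""
    fault = int(re.search(r"Unhandled fault: .* at (0x[0-9a-f]+)", text).group(1), 16)
    pc_sym = re.search(r"PC is at (\S+)", text).group(1)
    lr_sym = re.search(r"LR is at (\S+)", text).group(1)
    # Collect "rN : xxxxxxxx" pairs from the register dump lines.
    regs = {name: int(val, 16)
            for name, val in re.findall(r"\b(r\d+)\s*: ([0-9a-f]{8})", text)}
    return {
        "fault_addr": fault,
        "pc": pc_sym,
        "lr": lr_sym,
        # Assumption: r0 is the ioremapped base of the USB wrapper registers.
        "offset_from_r0": fault - regs["r0"],
    }

summary = summarize_oops(OOPS)
print(summary["pc"], "called from", summary["lr"])
print("fault offset from r0:", hex(summary["offset_from_r0"]))
```

Under that assumption the faulting access is 0x30 bytes past the base, which is consistent with a register read inside the mapped USB block rather than a stray pointer.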

Disabling kernel runtime power management for the USB peripheral via /sys/bus/platform/drivers/musb-hdrc/musb-hdrc.X/power/control prevents the panic from occurring.  However, if possible I would like to understand and address the root cause of the interrupt/ISR being triggered while the device is in the suspended state rather than relying on disabling power management for the device.
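For reference, the workaround can be scripted. The following is a minimal Python sketch, not a vetted tool: it writes `on` (keep the device powered) or `auto` (allow runtime suspend) to the power/control file of every musb-hdrc instance. The helper name `set_runtime_pm` and the `sysfs_root` parameter are assumptions made for this example; the parameter exists only so the function can be tried against a scratch directory before running it as root on a target.

```python
import glob
import os

# Driver directory as given in the post, relative to the sysfs root.
MUSB_DRIVER_DIR = "bus/platform/drivers/musb-hdrc"

def set_runtime_pm(mode, sysfs_root="/sys"):
    """Write 'on' or 'auto' to power/control of each musb-hdrc.X device.

    Returns the list of control files that were written.
    """
    if mode not in ("on", "auto"):
        raise ValueError("power/control accepts 'on' or 'auto'")
    pattern = os.path.join(sysfs_root, MUSB_DRIVER_DIR,
                           "musb-hdrc.*", "power", "control")
    changed = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as f:
            f.write(mode)
        changed.append(path)
    return changed
```

Calling `set_runtime_pm("on")` at boot would apply the workaround to both controllers, and `set_runtime_pm("auto")` would restore runtime power management.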

Any help or suggestions are appreciated.

  • To add some additional detail, the panic above occurs in devices with nothing connected to either USB phy.  Affected devices can run for hours/days/weeks between occurrences, though panics can be induced more frequently by decreasing the default kernel runtime power management autosuspend delay via /sys/bus/platform/drivers/musb-hdrc/musb-hdrc.X/power/autosuspend_delay_ms.
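For anyone trying to reproduce this more quickly, a minimal Python sketch of shortening the autosuspend delay follows. The helper name `set_autosuspend_delay` and the `sysfs_root` parameter are assumptions made for this example, and the sketch presumes that autosuspend is enabled on the devices so that the attribute can be read back. It returns the previous values so they can be restored after the test run.

```python
import glob
import os

def set_autosuspend_delay(delay_ms, sysfs_root="/sys"):
    """Set autosuspend_delay_ms on each musb-hdrc.X device.

    Returns a dict mapping each attribute path to its previous value.
    """
    pattern = os.path.join(sysfs_root, "bus/platform/drivers/musb-hdrc",
                           "musb-hdrc.*", "power", "autosuspend_delay_ms")
    previous = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            previous[path] = f.read().strip()  # remember the old delay
        with open(path, "w") as f:
            f.write(str(int(delay_ms)))
    return previous
```

The delay value to use is left to the caller; the thread does not state which values were tried.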

  • Hi Jeffrey,

    The kernel MUSB drivers changed substantially across the kernel v4.4, v4.9, and v4.14 time frames. Can you please test kernel 5.10 from the latest SDK 8.2 to see whether the issue has been solved?

  • Hi,

    Thanks for the reply. Unfortunately, getting a later kernel built and booting on our hardware would be a multi-month effort, which is not feasible at this point. And even if that were to address the problem, deploying a new kernel and filesystem built with one of the newer SDKs to the field would require at least a year of development and testing.

    That said, if there is a particular commit that you believe will address the issue, I may be able to try it out. I've already reviewed the majority of the post-4.9 commits in root/drivers/usb/musb/ and have applied a few that appear to address similar issues (this one and this one in particular), but the issue persists.

  • Hi Jeffrey,

    Bin and I are here in Dallas today and spoke about this together. Unfortunately, there isn't an easy way to incorporate the changes without moving up to the newer kernel release, largely because so many changes were made by both TI and Linux.org. So Bin and I strongly feel that the best path forward is to test kernel 5.10 in the latest (or at least a later) SDK, 8.2, to verify whether the issue is resolved. We know this would take some effort on your team's part, but again we feel this is the best approach.

    Can you please comment? Bin has also offered to talk this through on a call, which we can set up if that helps.

    Thank you,

    Chris & Bin