BEAGL-BONE-BLACK: Regression: Linux 6.1.46 Remains Unresponsive After Disabling Interrupt Associated to GPIO Pin With High-Frequency Signal

Part Number: BEAGL-BONE-BLACK


Description:

When a 140kHz signal on a GPIO pin is monitored for interrupts, Linux kernel version 6.1.46 experiences an interrupt storm and remains unresponsive even after the kernel disables the associated IRQ. This contrasts with the expected behavior observed in kernel version 5.10.168, where the system becomes responsive again once the IRQ is disabled.

Test Setup:

  • Utilized SDK 9.0.1 and a BeagleBone Black with Kernel 6.1.46 from Linux-ti-staging and Kernel 5.10.168 from linux-bb.org.
  • Signal generator producing a >100kHz square wave signal, specifically tested with 140kHz.
  • GPIO input pins tested: P8-12/14/16/18/17.

Steps to Reproduce:

  1. Apply a 140kHz signal to any GPIO input pin (e.g., P8-12/14/16/18/17).
  2. Use gpiomon to monitor the signal for interrupts.
  3. Linux will disable the associated IRQ source.
  4. On Linux 6.1.46, the system becomes unresponsive, and responsiveness is restored upon signal removal. In contrast, on Linux 5.10.168, the system becomes responsive after the IRQ source is disabled, allowing gpiomon to be forcefully terminated.
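
The "Disabling IRQ" in step 3 comes from the kernel's spurious-interrupt detector. The sketch below is a simplified user-space model of that accounting, not kernel code. The window of 100,000 interrupts and the limit of 99,900 unhandled ones are assumed from note_interrupt() in mainline kernel/irq/spurious.c; the names irq_stats and note_interrupt_model are hypothetical. The model leaves out the time-based reset of the unhandled counter and the extra bookkeeping for threaded handlers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed thresholds, modelled on note_interrupt() in kernel/irq/spurious.c */
#define IRQ_SAMPLE_WINDOW   100000u  /* interrupts per evaluation window */
#define IRQ_UNHANDLED_LIMIT  99900u  /* unhandled count that trips the detector */

/* Hypothetical per-IRQ bookkeeping, loosely mirroring struct irq_desc */
struct irq_stats {
    unsigned int irq_count;       /* interrupts seen in the current window */
    unsigned int irqs_unhandled;  /* of those, how many no handler claimed */
    bool disabled;                /* set once the detector gives up on the line */
};

/*
 * Record one interrupt. Returns true only on the call that disables the
 * line, which is the point where the real kernel prints "nobody cared"
 * and "Disabling IRQ #N".
 */
static bool note_interrupt_model(struct irq_stats *s, bool handled)
{
    assert(s != NULL);

    if (s->disabled)
        return false;

    if (!handled)
        s->irqs_unhandled++;

    if (++s->irq_count < IRQ_SAMPLE_WINDOW)
        return false;

    /* End of window: decide, then start a fresh window. */
    s->irq_count = 0;
    if (s->irqs_unhandled > IRQ_UNHANDLED_LIMIT) {
        s->disabled = true;
        return true;
    }
    s->irqs_unhandled = 0;
    return false;
}
```

Under this model a line is only disabled when almost every interrupt in a window goes unhandled, which is consistent with the detector tripping only once the signal is fast enough to outrun the handler.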

Urgency:
The issue presents an urgent concern as applying the signal to any GPIO input results in a complete system lockup, deviating from expected behaviour where Linux should recover by disabling the interrupt source, as observed in the 5.10 kernel.

  • Hi Aaron,

    On Linux 6.1.46, the system becomes unresponsive, and responsiveness is restored upon signal removal. In contrast, on Linux 5.10.168, the system becomes responsive after the IRQ source is disabled, allowing gpiomon to be forcefully terminated.

    Do you mean that applying a 140KHz signal to a GPIO pin makes the Linux system unresponsive, and that the unresponsiveness disappears upon signal removal in kernel 5.10.168 but remains in kernel 6.1.46 even after the signal is removed?

  • On Linux 6.1.46, the system remains unresponsive while the signal is applied and this unresponsiveness continues even when Linux indicates that the interrupt source has been disabled.  On Linux 5.10.168, the kernel disables the interrupt and the system is responsive while the signal is applied.

  • Hi Aaron,

    Thanks for the clarification. Let me try to replicate the issue on the EVM and look into it.

  • Hi Aaron,

    GPIO input pins tested: P8-12/14/16/18/17.

    When you tested with these GPIO pins, how was the pin used in the kernel to request its IRQ?

    Can you please share the kernel DTS change for using the GPIO pin?

  • I used gpiomon to listen to edge-based interrupt events on the GPIOs.  Here's an example for monitoring GPIO P8-12 on the BBB:

    gpiomon gpiochip0 12

    You can also use gpioinfo to list the available GPIO controllers and their lines.

  • Hi Aaron,

    Can you please share the kernel dts change for these gpio pins?

  • There are no DTS changes required to listen to these GPIO lines listed above.  I'm using libgpiod tools (ie gpiomon, gpioget, gpioset...) to manipulate these IOs and these pins aren't allocated to any other drivers.

  • Hi Aaron,

    If the pinmux is not specified in the kernel DTS, is the pin properly configured? What is the purpose of the test?

  • I've tested with the following DTS pin-mux configuration and it still behaves the same.

    &am33xx_pinmux {
    ...
    bbbgpio_pins_default: bbbgpio-default-pins {
    pinctrl-single,pins = <
    AM33XX_IOPAD(0x830, PIN_INPUT | MUX_MODE7) /* (T12) gpmc_ad12.gpio1[12] */
    AM33XX_IOPAD(0x834, PIN_INPUT | MUX_MODE7) /* (R12) gpmc_ad13.gpio1[13] */
    AM33XX_IOPAD(0x838, PIN_INPUT | MUX_MODE7) /* (V13) gpmc_ad14.gpio1[14] */
    AM33XX_IOPAD(0x83c, PIN_INPUT | MUX_MODE7) /* (U13) gpmc_ad15.gpio1[15] */
    AM33XX_IOPAD(0x828, PIN_INPUT | MUX_MODE7) /* (T11) gpmc_ad10.gpio0[26] */
    AM33XX_IOPAD(0x82c, PIN_INPUT | MUX_MODE7) /* (U12) gpmc_ad11.gpio0[27] */
    >;
    };
    };

  • Hi Aaron,

    I ran the same test on Beaglebone Black pin P8-12 and can see that the IRQ got disabled, but Linux is still responsive after the 140KHz signal is removed.

    Here are the details of my test:

    - Flash SDK9.1.0.1 Linux image tisdk-default-image-am335x-evm.wic.xz to a SD card and boot it on Beaglebone Black;

    - Run command "gpiomon 0 12" on BBB;

    - Connect 140KHz pwm signal to BBB pin P8-12, and got the following message on BBB linux console:

    am335x-evm login: root
    root@am335x-evm:~# gpiomon 0 12
    [   61.113390] irq 61: nobody cared (try booting with the "irqpoll" option)
    [   61.120229] CPU: 0 PID: 0 Comm: swapper Tainted: G           O       6.1.46-g1d4b5da681 #1
    [   61.128577] Hardware name: Generic AM33XX (Flattened Device Tree)
    [   61.134737]  unwind_backtrace from show_stack+0x10/0x14
    [   61.140075]  show_stack from dump_stack_lvl+0x24/0x2c 
    [   61.145215]  dump_stack_lvl from __report_bad_irq+0x38/0xe0
    [   61.150877]  __report_bad_irq from note_interrupt+0x2a4/0x2f0
    [   61.156712]  note_interrupt from handle_irq_event+0x88/0xa0  
    [   61.162357]  handle_irq_event from handle_simple_irq+0x98/0x100
    [   61.168351]  handle_simple_irq from generic_handle_domain_irq+0x28/0x38
    [   61.175044]  generic_handle_domain_irq from omap_gpio_irq_handler+0xc4/0x20
    [   61.182180]  omap_gpio_irq_handler from __handle_irq_event_percpu+0x7c/0x1f 
    [   61.189308]  __handle_irq_event_percpu from handle_irq_event+0x44/0xa0      
    [   61.195907]  handle_irq_event from handle_level_irq+0xc0/0x1c8              
    [   61.201812]  handle_level_irq from generic_handle_domain_irq+0x28/0x38      
    [   61.208416]  generic_handle_domain_irq from generic_handle_arch_irq+0x50/0x6c
    [   61.215636]  generic_handle_arch_irq from __irq_svc+0x84/0xc4                
    [   61.221462] Exception stack(0xc1101df0 to 0xc1101e38)                        
    [   61.226569] 1de0:                                     00000000 00000000 c1260a00 00000000
    [   61.234815] 1e00: c1260a00 00000002 c111c6fc c1101eec 00000001 00000002 00000000 c1108340
    [   61.243059] 1e20: c1210de8 c1101e40 c010135c c0101370 200d0113 ffffffff                  
    [   61.249725]  __irq_svc from __do_softirq+0xb8/0x3bc                                      
    [   61.254669]  __do_softirq from irq_exit+0xc8/0x134                                       
    [   61.259532]  irq_exit from __irq_svc+0x84/0xc4                                           
    [   61.264034] Exception stack(0xc1101eb8 to 0xc1101f00)                                    
    [   61.269131] 1ea0:                                                       00000000 0000000e
    [   61.277377] 1ec0: 00000000 c1120140 00000001 c123a600 c1cef400 196a822d 00000001 0000000e
    [   61.285626] 1ee0: 00000000 0000000e 01c2ca60 c1101f08 c093a720 c093a744 200d0013 ffffffff
    [   61.293859]  __irq_svc from cpuidle_enter_state+0x1a4/0x4a4                              
    [   61.299523]  cpuidle_enter_state from cpuidle_enter+0x30/0x40                            
    [   61.305358]  cpuidle_enter from do_idle+0x1ac/0x25c                                      
    [   61.310331]  do_idle from cpu_startup_entry+0xc/0x10                                     
    [   61.315380]  cpu_startup_entry from rest_init+0xc0/0xc4                                  
    [   61.320692]  rest_init from arch_post_acpi_subsys_init+0x0/0x8                           
    [   61.326655] handlers:                                                                    
    [   61.328950] [<4a12dfae>] lineevent_irq_handler threaded [<e46c157b>] lineevent_irq_thread
    [   61.337280] Disabling IRQ #61                                                            
    event: FALLING EDGE offset: 12 timestamp: [      61.083280464]                              
    
    ^Croot@am335x-evm:~#
    root@am335x-evm:~# uname -a
    Linux am335x-evm 6.1.46-g1d4b5da681 #1 PREEMPT Thu Oct 19 10:19:08 UTC 2023 armv7l armv7l armv7l GNU/Linux
    root@am335x-evm:~# ls /
    bin  boot  dev  etc  home  lib  linuxrc  lost+found  media  mnt  opt  proc  run  sbin  srv  sys  tmp  usr  var  www
    root@am335x-evm:~#

    - Remove the pwm signal from BBB pin P8-12, and press "Ctrl-C" to terminate the gpiomon command. The BBB is still responsive. Please see the 'uname -a' and 'ls' command output in the console log above.

  • Hi Bin,

    Can you confirm whether the console is responsive right after the "Disabling IRQ #" message is displayed and before you remove the signal?

  • Hi Aaron,

    Yes, now I can see that Linux is not responsive while the signal is still applied.

    Before this test I changed the cpufreq governor to "performance" (the default is "ondemand"); with that setting Linux stays responsive while the signal is still applied. Note that I now have to raise the signal to 200KHz to generate enough of an IRQ storm for the kernel to disable the IRQ.

    am335x-evm login: root
    root@am335x-evm:~#
    root@am335x-evm:~# echo performance > /sys/devices/system/cpu/cpu0/cpufreq/
    affected_cpus                  cpuinfo_transition_latency     scaling_cur_freq               scaling_min_freq
    cpuinfo_cur_freq               related_cpus                   scaling_driver                 scaling_setspeed
    cpuinfo_max_freq               scaling_available_frequencies  scaling_governor               stats/
    cpuinfo_min_freq               scaling_available_governors    scaling_max_freq
    root@am335x-evm:~# echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    root@am335x-evm:~# gpiomon 0 12 &
    [1] 582
    root@am335x-evm:~# [   89.915770] irq 61: nobody cared (try booting with the "irqpoll" option)
    [   89.922548] CPU: 0 PID: 583 Comm: irq/61-gpiomon Tainted: G           O       6.1.46-g1d4b5da681 #1
    [   89.931641] Hardware name: Generic AM33XX (Flattened Device Tree)
    [   89.937772]  unwind_backtrace from show_stack+0x10/0x14
    [   89.943052]  show_stack from dump_stack_lvl+0x24/0x2c
    [   89.948143]  dump_stack_lvl from __report_bad_irq+0x38/0xe0
    [   89.953755]  __report_bad_irq from note_interrupt+0x2a4/0x2f0
    [   89.959541]  note_interrupt from handle_irq_event+0x88/0xa0
    [   89.965146]  handle_irq_event from handle_simple_irq+0x98/0x100
    [   89.971099]  handle_simple_irq from generic_handle_domain_irq+0x28/0x38
    [   89.977749]  generic_handle_domain_irq from omap_gpio_irq_handler+0xc4/0x200
    [   89.984842]  omap_gpio_irq_handler from __handle_irq_event_percpu+0x7c/0x1f0
    [   89.991928]  __handle_irq_event_percpu from handle_irq_event+0x44/0xa0
    [   89.998489]  handle_irq_event from handle_level_irq+0xc0/0x1c8
    [   90.004353]  handle_level_irq from generic_handle_domain_irq+0x28/0x38
    [   90.010915]  generic_handle_domain_irq from generic_handle_arch_irq+0x50/0x6c
    [   90.018089]  generic_handle_arch_irq from call_with_stack+0x18/0x20
    [   90.024410]  call_with_stack from __irq_svc+0x94/0xc4
    [   90.029494] Exception stack(0xe0691ed0 to 0xe0691f18)
    [   90.034570] 1ec0:                                     c1d04440 c613fc00 c057162c c1d04800
    [   90.042785] 1ee0: c380f600 c613fc00 c3f95680 c380f60c c0171714 c380f670 c1260c85 c0dfa724
    [   90.051000] 1f00: c2f80608 e0691f20 c05776d4 c05776d4 a00e0013 ffffffff
    [   90.057641]  __irq_svc from omap_gpio_irq_bus_lock+0xc/0x1c
    [   90.063243]  omap_gpio_irq_bus_lock from irq_finalize_oneshot.part.0+0xf8/0x130
    [   90.070592]  irq_finalize_oneshot.part.0 from irq_thread_fn+0x70/0x78
    [   90.077067]  irq_thread_fn from irq_thread+0x118/0x22c
    [   90.082233]  irq_thread from kthread+0xd8/0x108
    [   90.086798]  kthread from ret_from_fork+0x14/0x2c
    [   90.091528] Exception stack(0xe0691fb0 to 0xe0691ff8)
    [   90.096601] 1fa0:                                     00000000 00000000 00000000 00000000
    [   90.104816] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    [   90.113029] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
    [   90.119671] handlers:
    [   90.121949] [<4b2351e0>] lineevent_irq_handler threaded [<69445df8>] lineevent_irq_thread
    [   90.130202] Disabling IRQ #61
    event:  RISING EDGE offset: 12 timestamp: [      89.687559342]
    event:  RISING EDGE offset: 12 timestamp: [      89.812620634]
    event:  RISING EDGE offset: 12 timestamp: [      89.885656509]
    
    root@am335x-evm:~#
    root@am335x-evm:~#
    root@am335x-evm:~#
    root@am335x-evm:~#
    root@am335x-evm:~# fg
    gpiomon 0 12
    ^Croot@am335x-evm:~#
    root@am335x-evm:~#
    root@am335x-evm:~#

  • Thanks, Bin.  Hopefully, you have what you need to put a fix together. Let me know once you have a patch to try and I'd be happy to test it.

  • Hi Aaron,

    Is it possible for you to use this cpufreq workaround in the short term until we find the root cause?

    This sounds like a kernel system problem, and I am not sure how long it would take to find the root cause.

  • Please ignore the cpufreq workaround. Though the console is responsive, the system is sluggish. We will continue debugging the problem.

  • Hi Bin, The suggested workaround using the CPUfreq governor doesn't work in the end application/device I'm developing, as it's limited to either 300 MHz or 600 MHz max frequency.

    Also, this workaround only mitigates the issue and isn't suitable for real-world applications as the kernel utilization will vary and most likely increase when running user applications or servicing other device interrupts (i.e., Wi-Fi, BT, flash).

    A fix for this regression is considered a priority for the end device I'm working on.

  • Hi Aaron,

    A fix for this regression is considered a priority for the end device I'm working on.

    Agreed. I am working on it. Will keep you posted.

  • Hi Aaron,

    On Linux 5.10.168, the kernel disables the interrupt and the system is responsive while the signal is applied.

    I don't observe this behavior.

    In my test with the prebuilt default SD card image in Processor SDK 8.2.0.24, the behavior is the same as with SDK9.1: the console is still locked up while the signal is applied, even after the IRQ is disabled.

    How did you test with kernel 5.10? Did you use the prebuilt SD card image in SDK8.2?

  • Hi Aaron,

    It seems to be related to kernel config.

    The SDK8.2 prebuilt kernel (5.10.100) has the lockup. I rebuilt the kernel in SDK8.2 using my own kernel config, and then it doesn't have the lockup. I will check what the kernel config differences are between the prebuilt kernel and my build.

  • BTW, I also rebuilt kernel 5.10.168, and it doesn't lock up either.

  • Hi Aaron,

    Please apply the following kernel patch on kernel v6.1, and let me know if this solves the issue.

    diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
    index 80ddc43fd875..54dbf44c04b2 100644
    --- a/drivers/gpio/gpio-omap.c
    +++ b/drivers/gpio/gpio-omap.c
    @@ -708,6 +708,27 @@ static void omap_gpio_unmask_irq(struct irq_data *d)
     	raw_spin_unlock_irqrestore(&bank->lock, flags);
     }
     
    +static void omap_gpio_disable_irq(struct irq_data *d)
    +{
    +	struct gpio_bank *bank = omap_irq_data_get_bank(d);
    +	unsigned offset = d->hwirq;
    +	unsigned long flags;
    +
    +	raw_spin_lock_irqsave(&bank->lock, flags);
    +	omap_set_gpio_irqenable(bank, offset, 0);
    +	raw_spin_unlock_irqrestore(&bank->lock, flags);
    +}
    +static void omap_gpio_enable_irq(struct irq_data *d)
    +{
    +	struct gpio_bank *bank = omap_irq_data_get_bank(d);
    +	unsigned offset = d->hwirq;
    +	unsigned long flags;
    +
    +	raw_spin_lock_irqsave(&bank->lock, flags);
    +	omap_set_gpio_irqenable(bank, offset, 1);
    +	raw_spin_unlock_irqrestore(&bank->lock, flags);
    +}
    +
     /*---------------------------------------------------------------------*/
     
     static int omap_mpuio_suspend_noirq(struct device *dev)
    @@ -1398,6 +1419,8 @@ static int omap_gpio_probe(struct platform_device *pdev)
     	irqc->irq_ack = dummy_irq_chip.irq_ack,
     	irqc->irq_mask = omap_gpio_mask_irq,
     	irqc->irq_unmask = omap_gpio_unmask_irq,
    +	irqc->irq_disable = omap_gpio_disable_irq,
    +	irqc->irq_enable = omap_gpio_enable_irq,
     	irqc->irq_set_type = omap_gpio_irq_type,
     	irqc->irq_set_wake = omap_gpio_wake_enable,
     	irqc->irq_bus_lock = omap_gpio_irq_bus_lock,
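
One possible reading of why these two callbacks matter, which is not confirmed in this thread: when an irq_chip provides no irq_disable callback, the generic IRQ core disables lazily, marking the interrupt disabled in software and leaving the hardware enable bit alone. The backtraces above go through handle_simple_irq, which, as far as I understand, does not mask a disabled line itself, so the edges would keep interrupting the CPU. The user-space model below only illustrates that dispatch; every type and function name in it is hypothetical and none of it is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the per-line interrupt enable bit in the GPIO bank. */
struct hw_line {
    bool irq_enabled;
};

/* Hypothetical, reduced irq_chip: only the optional disable callback. */
struct irq_chip_model {
    void (*irq_disable)(struct hw_line *hw);
};

/* Hypothetical, reduced irq_desc. */
struct irq_desc_model {
    const struct irq_chip_model *chip;
    struct hw_line hw;
    bool sw_disabled;  /* what the core believes */
};

/* Plays the role of omap_gpio_disable_irq() from the patch. */
static void hw_disable(struct hw_line *hw)
{
    assert(hw != NULL);
    hw->irq_enabled = false;
}

/* Model of the core's disable path: eager with a callback, lazy without. */
static void irq_disable_model(struct irq_desc_model *d)
{
    assert(d != NULL && d->chip != NULL);

    d->sw_disabled = true;
    if (d->chip->irq_disable)
        d->chip->irq_disable(&d->hw);
    /* No callback: lazy disable, the hardware bit is left untouched. */
}

/* Does a new edge on the pin still interrupt the CPU? */
static bool edge_interrupts_cpu(const struct irq_desc_model *d)
{
    return d->hw.irq_enabled;
}
```

In this model a chip without the callback ends up disabled in software while the pin keeps raising interrupts, and a chip with the callback goes quiet at the hardware level, which matches the behavior reported before and after the patch.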
    

  • Hi Bin,

    Thank you for providing a patch so quickly.  I've confirmed that the patch fixes the interrupt storm issue.  The system is responsive when interrupts are disabled and re-enabling interrupts works too.

    Will this patch make it into the next SDK release?

  • Hi Aaron,

    Glad to hear the patch fixes the issue. Thanks for the update.

    I have filed the issue with our software development team and provided the patch as a reference solution. I don't currently have an exact schedule for the next SDK for AM335x, but it will likely be some time later this year, so this patch will very likely make it into the next SDK.