
Unhandled edge IRQs on UART1 (ttyS1) (AM1705/DA830)

Other Parts Discussed in Thread: AM1705, OMAP-L137

All,

I've seen various chatter on threads talking about the following type of error message:

 

irq 53: nobody cared (try booting with the "irqpoll" option)

... some long drawn out stack backtrace...

handlers:

[<some prog addy>] (serial8250_interrupt+0x0/0x120)

Disabling IRQ #53

 

I've tried booting with "irqpoll" and still have the issue. I have yet to try booting with "noirqdebug" because I feel this might be a little excessive for what is actually happening. Basically, what I've found is that IRQ 53 is getting edge-triggered interrupts every time a transmission is made. My suspicion is also that the default GPIO init for the AM1705 (DA830) mach is doing a "set_irq_type" for both rising and falling edges. I would assume these should be masked off once the 8250 driver takes over, but that doesn't seem to be the case. I can clearly see the unhandled interrupt count rise when "handle_edge_irq" passes off to "note_interrupt". Eventually the irq_desc reaches the magic unhandled count (100000) and the whole thing gets disabled. This obviously affects the performance of anything previously relying on IRQ 53.
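
For anyone trying to picture the accounting behind that threshold, here is a rough user-space model in C. It is only a sketch: the names irq_model, model_note_interrupt and model_run are made up for illustration, the two limits (a window of 100000 interrupts, disable when more than 99900 of them went unhandled) are my recollection of kernel/irq/spurious.c in 2.6.38 and should be checked against your own tree, and the model ignores any timing logic the real code applies.

```c
#include <stdbool.h>

/* Simplified user-space model of the kernel's spurious-IRQ accounting.
 * ASSUMPTION: both limits are recalled from kernel/irq/spurious.c (2.6.38);
 * struct irq_model and the model_* functions are hypothetical names. */
#define IRQ_WINDOW      100000u  /* interrupts examined per window */
#define UNHANDLED_LIMIT  99900u  /* unhandled count that trips the disable */

struct irq_model {
    unsigned int irq_count;      /* interrupts seen in the current window */
    unsigned int irqs_unhandled; /* of those, how many nobody claimed */
    bool disabled;               /* set once "nobody cared" would fire */
};

/* Account for one interrupt; 'handled' is what the handler reported. */
void model_note_interrupt(struct irq_model *d, bool handled)
{
    if (d->disabled)
        return;
    if (!handled)
        d->irqs_unhandled++;
    if (++d->irq_count < IRQ_WINDOW)
        return;
    /* End of window: disable the line if nearly everything was unhandled. */
    if (d->irqs_unhandled > UNHANDLED_LIMIT)
        d->disabled = true;
    d->irq_count = 0;
    d->irqs_unhandled = 0;
}

/* Feed 'n' interrupts, with every 'handled_every'-th one handled
 * (0 means none are handled). Returns true if the line ended up disabled. */
bool model_run(unsigned int n, unsigned int handled_every)
{
    struct irq_model d = { 0, 0, false };
    unsigned int i;

    for (i = 1; i <= n; i++) {
        bool handled = handled_every != 0 && (i % handled_every) == 0;
        model_note_interrupt(&d, handled);
    }
    return d.disabled;
}
```

In this model, model_run(100000, 0) ends with the line disabled, while an occasional handled interrupt (say one in every 500) keeps the unhandled count under the limit and the line stays enabled.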

 

Has anyone run into something similar to this?

What are some possible workarounds?

Does passing "noirqdebug" have any other adverse effects on a running kernel?

 

For reference, I'm using Linux 2.6.38.6 with Buildroot 2011.02.

 

Thanks,

Andrew

 

 

  • Could a TI employee please comment on any other possible workarounds? It seems to me there is an underlying issue with how TI's 16550A UART peripheral works with the 8250 driver in the Linux kernel.

     

    UPDATE:

    At this point, this thread may be nothing more than me thinking out loud. Should you find yourself in a similar predicament, please read on. Others using the 8250 driver on this target may never run into this issue. If your implementation has consistent bidirectional traffic, it is likely that you won't have any problems. However, if your implementation does not always receive characters, you may eventually reach the unhandled threshold in the spurious irq handler.

    Anyway, here we go.

    I've found that the GPIO init is not the underlying cause of the unhandled edge IRQ. After turning on some 8250 debug, I've discovered the following warnings when opening an FD to /dev/ttyS1 and configuring it as a termios "raw" interface.
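
    For reference, opening the port in "raw" mode can be sketched roughly as below. This is not the exact code from my application: the function names tty_make_raw and tty_open_raw are hypothetical, the baud rate passed in is whatever your setup needs, and cfmakeraw() is a glibc/BSD extension rather than strict POSIX.

```c
#define _DEFAULT_SOURCE  /* exposes cfmakeraw() on glibc under strict -std */
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Put an already-open tty into raw mode at the given speed.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int tty_make_raw(int fd, speed_t speed)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) < 0)
        return -1;                   /* not a tty, or bad descriptor */
    cfmakeraw(&tio);                 /* no echo, no line editing, 8-bit chars */
    tio.c_cflag |= CLOCAL | CREAD;   /* ignore modem lines, enable receiver */
    tio.c_cc[VMIN] = 1;              /* read() returns after at least 1 byte */
    tio.c_cc[VTIME] = 0;             /* no inter-byte timeout */
    if (cfsetispeed(&tio, speed) < 0 || cfsetospeed(&tio, speed) < 0)
        return -1;
    return tcsetattr(fd, TCSANOW, &tio);
}

/* Open a serial device and configure it raw; returns the fd or -1. */
int tty_open_raw(const char *path, speed_t speed)
{
    int fd = open(path, O_RDWR | O_NOCTTY);

    if (fd < 0)
        return -1;
    if (tty_make_raw(fd, speed) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

    A call such as tty_open_raw("/dev/ttyS1", B115200) would then be followed by ordinary write() calls on the returned descriptor; the 115200 rate is only an example value.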

     

    "serial8250_startup" debug output:

    First message:

    ttyS1 - using backup timer

    Second message:

    ttyS1 - enabling tx status workarounds

    Both of these messages rely on the state of the THRI interrupt and are somewhat superficial. While I don't think they should be happening, I can't directly see a performance hit due to their enabled logic.

     

    After my implementation opens an FD to /dev/ttyS1, it continues to write 1 byte at a time. Again, not a performance-centric implementation. However, each write triggers an unhandled interrupt. Looking at "serial8250_start_tx", I see that it enables the THRI interrupt and then calls "transmit_chars". "transmit_chars" then takes care of writing each char to the UART_TX register. Once done, it calls "__stop_tx", which clears the THRI enable bit. This all happens before "serial8250_interrupt" is called.

    If I understand the SPRUFW3A – August 2010 document correctly, set/clear operations on the IER register will set/clear their IIR counterpart. Obviously the set operation still depends on the state of the THRE, but the documentation does not define what happens if the THRI (ETBEI) bit in the IER register is cleared when the THRE interrupt is pending.

    From what I can tell, when "__stop_tx" clears the THRI enable bit, the IIR register does in fact get cleared as well. This means that when the "serial8250_interrupt" gets called, the IIR register will show IPEND = 1 (no interrupts pending). This to me seems like a race condition. Ultimately, the serial8250_interrupt returns "handled" = 0 to "handle_IRQ_event" which returns "IRQ_NONE" to "handle_edge_irq" which finally passes the return to "note_interrupt". "note_interrupt" eventually spills over into "__report_bad_irq" which disables the interrupt. 
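
    To make the suspected sequence concrete, here is a toy user-space model of it in C. Everything prefixed toy_ is hypothetical, and the model bakes in the very behaviour I am guessing at (the IIR only reports THRE while THRI is enabled, and the interrupt controller latches the edge regardless), so treat it as an illustration of the theory rather than proof of what the hardware does. The three bit values are the ones I know from linux/serial_reg.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in <linux/serial_reg.h>. */
#define UART_IER_THRI   0x02  /* enable the THR-empty interrupt */
#define UART_IIR_NO_INT 0x01  /* IPEND = 1: nothing pending */
#define UART_IIR_THRI   0x02  /* THR-empty interrupt pending */

/* Toy UART; an ASSUMED model, not taken from the TI documentation. */
struct toy_uart {
    uint8_t ier;       /* interrupt enable register */
    bool thre;         /* transmit holding register empty */
    bool irq_latched;  /* edge already latched by the interrupt controller */
};

/* ASSUMPTION under test: IIR reports THRE only while THRI is enabled. */
uint8_t toy_read_iir(const struct toy_uart *u)
{
    if ((u->ier & UART_IER_THRI) && u->thre)
        return UART_IIR_THRI;
    return UART_IIR_NO_INT;
}

/* Writing IER may raise the UART's interrupt output; the resulting edge
 * stays latched upstream even if the source is masked again afterwards. */
void toy_write_ier(struct toy_uart *u, uint8_t ier)
{
    bool was_pending = toy_read_iir(u) != UART_IIR_NO_INT;

    u->ier = ier;
    if (!was_pending && toy_read_iir(u) != UART_IIR_NO_INT)
        u->irq_latched = true;
}

/* Mimics start_tx -> transmit_chars -> __stop_tx for a single byte. */
void toy_send_byte(struct toy_uart *u)
{
    toy_write_ier(u, (uint8_t)(u->ier | UART_IER_THRI));  /* start_tx */
    /* transmit_chars: the byte is written; in this model THRE stays set */
    toy_write_ier(u, (uint8_t)(u->ier & ~UART_IER_THRI)); /* __stop_tx */
}

/* Mimics the interrupt handler: returns true if it found work to do. */
bool toy_interrupt(struct toy_uart *u)
{
    u->irq_latched = false;
    return !(toy_read_iir(u) & UART_IIR_NO_INT);
}
```

    Starting from an idle toy UART (ier = 0, thre = true), one toy_send_byte() leaves irq_latched set, yet the following toy_interrupt() finds IPEND = 1 and reports the interrupt as not handled, which is the pattern that feeds the unhandled count.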

    Once this whole debacle plays out, the UART1 interrupt gets disabled. At that point, it looks like the only thing left to service the UART is the backup timer set up in "serial8250_startup".

    I may be full of hot air but I don't think any person using this target should be able to force this whole chain of events without hacking the kernel or the 8250 driver. This also implies that you can't really fix the issue without mucking with the kernel or the 8250 driver. At this point, I feel like the only option is to pass "noirqdebug" to the kernel and disable "note_interrupt" from being called. 

    Am I missing something?

    Thanks,

    Andrew

     

     

  • I took a look at the DA830 EVM (aka AM1705/L137) code at the Arago GIT. As far as I can tell, the code does not mux in the second and third serial ports. I think the ASP is using those pins. I was using an L137 EVM and it only has one serial port. The AM17xx EVM looks the same. Are you using a custom board and the mainline kernel?

  • Norman,

    I am using a custom board with the mainline kernel and Buildroot. The versions are as follows:

    Linux 2.6.38.6

    Buildroot 2011.02

    As for the pin-muxing, here is my configuration at run-time after "da830_evm_init" has run. To be upfront, I'm piggy-backing on the da830 mach-type and have modified the existing EVM code to be "compatible" with my custom board. I have also added

    davinci_cfg_reg_list(da830_uart1_pins);

    davinci_cfg_reg_list(da830_uart2_pins);

    to the initialization routine.

     

    PINMUX SETTINGS:

    PINMUX0: 0x01c14120: 0x11111111

    PINMUX1: 0x01c14124: 0x11111111

    PINMUX2: 0x01c14128: 0x11111111

    PINMUX3: 0x01c1412c: 0x11111111

    PINMUX4: 0x01c14130: 0x11111111

    PINMUX5: 0x01c14134: 0x11111111

    PINMUX6: 0x01c14138: 0x11111111

    PINMUX7: 0x01c1413c: 0x12111111

    PINMUX8: 0x01c14140: 0x28811022 (UART2_RXD:2h BIT 31-28)

    PINMUX9: 0x01c14144: 0x88288012 (UART2_TXD:2h BIT 3-0)

    PINMUX10: 0x01c14148: 0x22222221

    PINMUX11: 0x01c1414c: 0x11181122 (UART1_TXD:1h BIT 15-12/UART1_RXD:1h BIT 11-8)

    PINMUX12: 0x01c14150: 0x81118111

    PINMUX13: 0x01c14154: 0x11188111

    PINMUX14: 0x01c14158: 0x11111111

    PINMUX15: 0x01c1415c: 0x11111111

    PINMUX16: 0x01c14160: 0x88888811

    PINMUX17: 0x01c14164: 0x11888888

    PINMUX18: 0x01c14168: 0x11111811

    PINMUX19: 0x01c1416c: 0x00000001

     

    Thanks,

     

    Andrew

  • I did a quick diff on a few files. There are differences between TI's kernel and the mainline 2.6.38. See:

     /arch/arm/mach-davinci/devices-da8xx.c
     /arch/arm/mach-davinci/serial.c
     /drivers/serial/8250.c aka /drivers/tty/serial/8250.c in 2.6.38

    Most changes are cosmetic but some are flags. Maybe you should try to port over some of TI's changes. In 8250.c, there is a hack to ignore the MSR for the DA850. Makes me wonder if the hack should be applied to the DA830 as well. I don't see anything for ignoring the flow control on the second and third ports either.

    If you don't have it, TI's GIT is here

    http://arago-project.org/git/projects/?p=linux-davinci.git;a=tree

    You could try modifying the DVSDK 3.? for the DA830, but I think that version is based on 2.6.32 or thereabouts.

     

  • Norman,

    Thanks for pointing this out. I will spin a diff and see if the changes apply to my implementation. 

    Andrew

  • Norman,

    I had a chance to look at the differences between the files you mentioned. For the most part, it looks like all of the changes are workarounds for the PORT_AR7 type. From what I can tell, the AM1705 uses the 16550(a) UART with a 16-byte FIFO, not the AR7. In the Arago-git/devices-da8xx.c file, .type gets set explicitly to AR7 in the plat_serial8250_port array. The 8250 driver's "autoconfig" routine actually works for my platform and correctly configures a 16550a port type. I don't think I should explicitly set the port type to AR7.

    I also looked over the MSR hack that you were talking about. After my serial device is registered, the IER does not indicate that the MSR interrupt is enabled. I think that means that I'm safe from this bug. After following some of the initialization, the MSR hack looks like it is a result of the AR7 port logic trying to assert control signals and so on. The 16550a doesn't seem to do this. 

    I may take a broader look at the linux-davinci.git to rule out any other obvious difference but I don't think my Mainline Linux is the issue at this point. I'm somewhat sure my issues are a result of the IER being modified by "__stop_tx" before the serial8250_interrupt gets a chance to handle it. 

    Thank you so much for taking the time to respond to my post.

    Andrew

  • Hi Andrew,

    I am encountering the same "irq 53: nobody cared" error. Did you find a workaround? My application uses an ARM (Linux) - DSPLink - DSP (BIOS5) architecture. I've had to suppress spurious EDMA interrupts in the Linux EDMA driver. I am wondering if I have to do the same in the 8250 driver. It is a bit worrisome that suppressing spurious interrupts may be covering up worse problems.

    NoRm

  • Hi Norman,

    My usage of the serial interface was to facilitate RS-485 communications. In my application, the AM1705 was operating as the master. I sent polling messages at a fairly constant rate and when no slave devices were on the bus, the number of "sent" interrupts added up to hitting the spurious limit. When a slave device was connected that responded to polls, the spurious number would be cleared on successful reception of characters from a slave. This prevented the application from racking up the spurious limit. Disabling this wasn't an issue on my setup. I could clearly define the underlying cause of the spurious IRQ and knew the consequences of disabling it.

    What are you using the serial interface for on your platform?

    Do you feel that it is risky to suppress spurious IRQs on this interrupt?

    What I would suggest is passing noirqdebug to the kernel on "Release" builds but leaving them enabled during debug. This will allow you to accurately determine if you have issues elsewhere but still maintain operation of the serial interface when code is released. I know this is not ideal but if you follow my previous post, you will find that the error is in how the 16550 driver has been implemented by TI. There seems to be a race condition between the interrupt handler and "stop_tx".

    Hope this helps,

    Andrew

  • Hi Andrew,

    I am using UART1 for the console and debug. My situation is sort of similar to yours, where the target is mainly transmitting or printing debug messages. Once my app is started, I don't type into the terminal or receive any data. The spurious count does not get cleared. My console input goes dead after the panic. Console output slows down. It appears to revert to polling.

    In my case, OMAP-L137, the problem is much worse with a DSP app running. It usually panics after 20 seconds. I wrote a simple ARM app that just printed a number in an endless loop. It did not exhibit the problem after 15 minutes. In your case, AM1705, you don't have a DSP. So I am hoping I don't have to debug this on the DSP side. Interaction with the DSP must be changing the ARM timing in such a way as to get into the problem state more frequently.

    Using "noirqdebug" will ignore all spurious interrupts? Not sure if I am ready for that. I am still fighting it out with DSPBIOS. My workaround for EDMA is to return IRQ_HANDLED instead of IRQ_NONE to suppress the spurious interrupt at the lowest level. I'm never quite sure if doing such a thing will break the driver.

    Thanks for the response. Much appreciated.

    Norman

     

  • Hi Norman,

    One of my previous posts documents the call stack pretty thoroughly. You could possibly remove the call to "note_interrupt" from the serial driver if you wanted to achieve the same effect as noirqdebug, with a little more precision.

    Another alternative is to add telnet support to your Linux kernel and just use that for terminal debug once your app has started.

    I'm sorry that you ran into this issue as well. Maybe someone at TI could comment on an expected date for a fix. It seems like it would be a pretty common occurrence on other users' platforms.

     

    Andrew