Spurious irq 95: 0xffffffdf, please flush posted write for irq 37 and Linux kernel thread "softirq" is consuming very high percentage of CPU - 95%

Resmi

Dear TI/Forum members

Hardware: OMAP 3503

Software : Linux Kernel v2.6.29 (from TI PSP linux-02.01.03.11)

Ethernet : LAN 9220 SMSC

Problem : Occasionally, when we restart the eth0 interface after changing the IPv4 configuration, we run into two issues:

Issue 1: The kernel logs "Spurious irq 95: 0xffffffdf, please flush posted write for irq 37" (seen via the dmesg command). This message appears 2 to 5 times and no more.

Issue 2: The Linux kernel thread "softirq" consumes a very high percentage of CPU (95%), and any operation on the system becomes very slow. The only way to recover from this issue is to hard reboot the system.

Has any of you faced these two issues? Any pointers to resolve them would be great.

thanks

Pads

over 14 years ago

0 Sanjeev Premi over 14 years ago

TI__Expert 4590 points

There have been at least 3 major PSP releases since 02.01.03.11.

Do you have any specific reason to stay at 2.6.29 kernel?

0 Resmi over 14 years ago in reply to Sanjeev Premi

Intellectual 280 points

Thank you so much for the fast response, Sanjeev.

>> Do you have any specific reason to stay at 2.6.29 kernel?

We are on v2.6.29 (PSP 02.01.03.11) because we have about 40+ kernel patches for our custom hardware based on OMAP 3503, and it will take quite some time before we can migrate to the latest release from TI.

Can you please let me know if this problem is resolved in a newer kernel release? If yes, which release?

Is this a bug in the softirq kernel thread that is fixed in a newer release?

thanks

Pads

0 Sanjeev Premi over 14 years ago in reply to Resmi

TI__Expert 4590 points

Resmi said:
Can you please let me know if this problem is resolved in a newer kernel release? If yes, which release?

Can't point to a specific release, but here is the patch that should be fixing it.
https://patchwork.kernel.org/patch/18244/

Resmi said:
Is this a bug in the softirq kernel thread that is fixed in a newer release?

None that I am aware of; but you may want to check the changelog to find any.

Resmi said:
... because we have about 40+ kernel patches for our custom hardware based on OMAP 3503,

Personal opinion: you should weigh the effort of porting against the time spent hitting problems, debugging them, finding that they have already been fixed, and back-porting the fixes.
Remember this kernel version is really old.

0 Resmi over 14 years ago in reply to Sanjeev Premi

Intellectual 280 points

Dear Sanjeev

Thank you very much once again for the prompt response.

>>Can't point to a specific release, but here is the patch that should be fixing it

>>https://patchwork.kernel.org/patch/18244/

Even with this patch, I still see the problem: the spurious interrupt message is displayed, and the ksoftirqd kernel thread's CPU usage spikes and stays at 95%.

Below are additional details.

------------- dmesg output after issue has occurred ------------------------

.....

Beginning of smsc911x_open method
Beginning of smsc911x_soft_reset method
End of smsc911x_soft_reset method
net eth0: SMSC911x/921x identified at 0xa080c000, IRQ: 172
End of smsc911x_open method
ADDRCONF(NETDEV_UP): eth0: link is not ready
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
eth0: duplicate address detected!
Beginning of smsc911x_stop method
Spurious irq 95: 0xffffffdf, please flush posted write for irq 37
Spurious irq 95: 0xffffffdf, please flush posted write for irq 37 <-------------- This message is printed while smsc911x_stop() in the SMSC911x device driver is executing. Execution is stuck in napi_disable(), which smsc911x_stop() calls when eth0 is brought down to restart the interface; the issue almost always occurs there.

.......

---------------------------- end -------------------------------------------------------
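For anyone decoding the log line above, here is a minimal sketch. It assumes the hex value printed after "Spurious irq 95:" is the raw OMAP3 interrupt controller SIR_IRQ register, with the active IRQ number in bits 6:0 and the spurious flag in bits 31:7. That layout is my assumption and is not stated anywhere in this thread; the helper name decode_spurious_line is hypothetical.

```python
import re

# Assumed layout of the OMAP3 INTC SIR_IRQ register (not confirmed in this thread):
#   bits 6:0  -> active IRQ number
#   bits 31:7 -> spurious flag (non-zero means the IRQ number is not valid)
ACTIVE_IRQ_MASK = 0x7F
SPURIOUS_SHIFT = 7

LOG_PATTERN = re.compile(
    r"Spurious irq (\d+): (0x[0-9a-fA-F]+), please flush posted write for irq (\d+)"
)


def decode_spurious_line(line):
    """Hypothetical helper: split one 'Spurious irq' log line into its fields."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    sir = int(match.group(2), 16)
    return {
        "reported_irq": int(match.group(1)),      # number printed by the kernel
        "sir": sir,                               # raw register value from the log
        "active_irq_field": sir & ACTIVE_IRQ_MASK,
        "spurious_flag": sir >> SPURIOUS_SHIFT,
        "previous_irq": int(match.group(3)),      # irq the message asks to flush
    }


decoded = decode_spurious_line(
    "Spurious irq 95: 0xffffffdf, please flush posted write for irq 37"
)
print(decoded)
```

Under that assumption, the low seven bits of 0xffffffdf are 0x5f (95), which matches the irq number in the message, and every bit of the spurious flag field is set. The irq 37 that the message asks to flush is listed as the gp timer in the /proc/interrupts output in this post.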

Below is the output of /proc/interrupts before and after the issue occurred.

------------------------ Before issue : /proc/interrupts ----------------------------------------

           CPU0
11:          0        INTC prcm
12:       4958        INTC DMA
24:          0        INTC Omap 3 Camera ISP
25:          0        INTC OMAP DSS
37:      49895        INTC gp timer                     <------------------------this is low here
56:        586        INTC i2c_omap
57:        175        INTC i2c_omap
61:         17        INTC i2c_omap
65:          0        INTC omap_mcspi_isr
72:          3        INTC serial idle
73:        448        INTC serial idle
74:       1649        INTC serial idle, serial
83:      15607        INTC mmc0
92:        170        INTC musb_hdrc
93:         63        INTC musb_hdrc
172:        882        GPIO eth0
174:          0        GPIO maintenance_reset
176:          0        GPIO cpld-power
369:          0     twl4030 twl4030_keypad
378:          0     twl4030 twl4030_usb
384:          0     twl4030 mmc0
Err:          0
-----------------------------------------end --------------------------------------

and

----------------- After issue: /proc/interrupts -------------------------------

           CPU0
11:          0        INTC prcm
12:       8574        INTC DMA
24:          0        INTC Omap 3 Camera ISP
25:          0        INTC OMAP DSS
37:     183364        INTC gp timer                    <------------------------this is very high here
56:       1084        INTC i2c_omap
57:        178        INTC i2c_omap
61:         17        INTC i2c_omap
65:          0        INTC omap_mcspi_isr
72:         11        INTC serial idle
73:        448        INTC serial idle
74:       3607        INTC serial idle, serial
83:      22277        INTC mmc0
92:        635        INTC musb_hdrc
93:        528        INTC musb_hdrc
172:       2305        GPIO eth0
174:          0        GPIO maintenance_reset
176:          0        GPIO cpld-power
369:          0     twl4030 twl4030_keypad
378:          0     twl4030 twl4030_usb
384:          0     twl4030 mmc0
Err:          0
----------------------------------------------------------------------------
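The two snapshots above can be compared mechanically rather than by eye. The following is a rough sketch, assuming a single-CPU /proc/interrupts layout like the one shown (irq, count, controller, name). The function names are hypothetical, and only a few of the lines from the dumps are reproduced as sample input.

```python
BEFORE = """\
           CPU0
12:       4958        INTC DMA
37:      49895        INTC gp timer
61:         17        INTC i2c_omap
83:      15607        INTC mmc0
172:        882        GPIO eth0
"""

AFTER = """\
           CPU0
12:       8574        INTC DMA
37:     183364        INTC gp timer
61:         17        INTC i2c_omap
83:      22277        INTC mmc0
172:       2305        GPIO eth0
"""


def parse_interrupts(text):
    """Map irq label -> (count, name) for a single-CPU /proc/interrupts dump."""
    counts = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip the CPU0 header and any line without a numeric count.
        if not sep or not fields or not fields[0].isdigit():
            continue
        # fields[1] is the controller (INTC, GPIO, ...); the rest is the name.
        name = " ".join(fields[2:])
        counts[head.strip()] = (int(fields[0]), name)
    return counts


def interrupt_deltas(before, after):
    """Return (delta, irq, name) for every line whose count changed, largest first."""
    deltas = []
    for irq, (after_count, name) in after.items():
        before_count = before.get(irq, (0, name))[0]
        if after_count != before_count:
            deltas.append((after_count - before_count, irq, name))
    return sorted(deltas, reverse=True)


deltas = interrupt_deltas(parse_interrupts(BEFORE), parse_interrupts(AFTER))
for delta, irq, name in deltas:
    print(f"irq {irq:>3} {name:<10} +{delta}")
```

Since the time elapsed between the two snapshots was not recorded, the deltas only show which lines grew the most; they are not interrupt rates.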

My observation is that whenever the eth0 interface is restarted (i.e. stopped and started), the problem appears during the stop sequence. In the Ethernet device driver (drivers/net/smsc911x.c), smsc911x_stop() first calls netif_stop_queue(dev) successfully and then calls napi_disable(&pdata->napi). This is when the spurious irq message is printed and the mmcqd and ksoftirqd kernel threads suddenly start consuming a high percentage of CPU; in top I see 98.9% usage by the softirq thread.

Any pointers or suggestions to resolve this are greatly appreciated.

>>Personal opinion: you should consider the effort in porting against time spent in hitting the problems and debugging and finding that it has already been fixed; and back-porting it. Remember this kernel version is really old.

I completely agree with you. As of last evening, in parallel with finding a fix for the above issue, I have started porting our patches to the latest release.

By the way, do you recommend moving to the latest release or the one before it? I am concerned about stability and need your expert input to decide on this.

Thank you,

Pads

0 Resmi over 14 years ago in reply to Resmi

Intellectual 280 points

Any updates?

0 Sanjeev Premi over 14 years ago in reply to Resmi

TI__Expert 4590 points

I did mention earlier that this is a really old version. You can check the latest PSP updates at this URL:

http://arago-project.org/git/projects/?p=linux-omap3.git;a=summary

You may want to backport patches selectively. I have already pointed to a specific patch, but there could be other associated (though not directly related) patches that helped in fixing the issue(s).
