[DM3730]: Interrupt missing issue with TI Arago Linux 2.6.32...

Honey Sukesan


Other Parts Discussed in Thread: DM3730

Hi all,

We are using the vendor-supplied TI Arago Linux 2.6.32 kernel on our TI DM3730-based board.

Our processor acts as the SPI master and an AFE board acts as the SPI slave.

The SPI slave is configured to generate a periodic interrupt every 125 microseconds (8 kHz).

On each interrupt, the master has to read some input data. This is done in the master's tasklet.

A DM3730 GPIO is configured to generate an interrupt when the interrupt line from the SPI slave is asserted.

Using a scope, we have verified that the SPI slave (AFE board) generates its interrupts correctly.

However, we see interrupts being missed on the master side every 94 milliseconds.

We observed this by toggling another GPIO inside the interrupt handler.
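One way to turn such a toggle capture into a number is to compare how many handler entries were recorded with how many 125 microsecond periods the capture spans. The sketch below assumes the handler-entry timestamps (in microseconds) have been exported from the scope into an array; the function name count_missed is hypothetical. Counting against the whole span rather than gap by gap means a single late handler (for example, one delayed by 70 microseconds) is not reported as a miss, as long as the first and last samples were serviced normally.

```c
#include <stddef.h>

/*
 * Hypothetical helper: estimate how many interrupts are missing from a
 * capture of handler-entry timestamps (microseconds), given the nominal
 * interrupt period. Assumes the first and last samples had normal latency.
 */
static long count_missed(const double *t_us, size_t n, double period_us)
{
    if (n < 2 || period_us <= 0.0)
        return 0;

    /* Periods spanned by the capture, rounded to the nearest whole period
     * so that ordinary latency jitter at the endpoints is tolerated. */
    double span_us = t_us[n - 1] - t_us[0];
    long expected = (long)(span_us / period_us + 0.5) + 1;

    /* Every slave interrupt should have produced one handler entry. */
    long missed = expected - (long)n;
    return missed > 0 ? missed : 0;
}
```

For example, entries at 4, 129, 600, 629 and 754 microseconds with a 125 microsecond period give 2: the edges at 250 and 375 were never serviced on their own.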

Another observation: when the CPU is loaded, interrupts are missed much more frequently.

Because of this, our data acquisition is corrupted and our output results are incorrect.

We suspect that some higher-priority interrupt is being serviced by the kernel at this 94 ms interval.

We have checked the output of cat /proc/interrupts. The other frequent interrupts occurring at the same time are gp_timer and eth0. We have also tried to increase the priority of our interrupt by writing some registers in our IRQ controller.

Could anyone please shed some light on this issue? What may be the reason for this interrupt miss / interrupt latency?
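To see how often each source actually fires, /proc/interrupts can be sampled twice and the counts differenced. This is a minimal sketch, assuming the single-CPU line layout "IRQ: count chip name"; the helper names parse_irq_line and irq_rate are hypothetical. With a 125 microsecond period the GPIO interrupt should advance by about 8000 per second, so a rate measurably below that would confirm lost interrupts independently of the scope.

```c
#include <stdio.h>

/* Hypothetical helper: parse the IRQ number and the first (CPU0) count from
 * one /proc/interrupts line. Returns 1 on success, 0 for lines such as
 * "Err:" that do not start with a numeric IRQ. */
static int parse_irq_line(const char *line, int *irq, unsigned long *count)
{
    return sscanf(line, "%d: %lu", irq, count) == 2;
}

/* Interrupts per second between two samples taken dt_s seconds apart. */
static double irq_rate(unsigned long before, unsigned long after, double dt_s)
{
    return dt_s > 0.0 ? (double)(after - before) / dt_s : 0.0;
}
```

In use, read /proc/interrupts, sleep for a second, read it again, and feed the two counts for the GPIO line into irq_rate.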

Please suggest any workarounds for this issue.

The processor is running at 1 GHz.

Thanks,

Honey S

over 12 years ago

Kim Rowe52176 over 12 years ago

Hi,

There is not enough information here for a detailed analysis; however, here are some ideas and things to check.

A known property of Linux is that the difference between its minimum and maximum interrupt response times is large. This could be the source of your problem if the interrupt controller does not preserve (latch) further interrupts that arrive while one is still pending.
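To make the latching point concrete, here is a toy model, not a description of how the OMAP INTC or GPIO module is implemented: one pending bit per line, edges every period, and a window during which the CPU cannot take the interrupt. The function name lost_edges is hypothetical. With a 125 microsecond period, any blocked window longer than one period loses edges, and a window of 500 to 600 microseconds loses three or four depending on where it falls relative to the edges.

```c
/*
 * Toy model of a single-latch (pending bit) interrupt line. Edges arrive
 * every period_us starting at t = 0. While the CPU cannot take the
 * interrupt, i.e. during [block_start, block_start + block_len), further
 * edges only re-set the already-set pending bit, so all but one are lost.
 * Outside the window each edge is assumed to be serviced immediately.
 */
static unsigned lost_edges(unsigned period_us, unsigned n_edges,
                           unsigned block_start, unsigned block_len)
{
    unsigned in_window = 0;

    for (unsigned k = 0; k < n_edges; k++) {
        unsigned t = k * period_us;
        if (t >= block_start && t - block_start < block_len)
            in_window++;
    }

    /* One latched edge is serviced when the window ends; the rest are lost. */
    return in_window > 0 ? in_window - 1 : 0;
}
```

The same model also shows why latency below one period (such as 70 microseconds) causes jitter but no loss.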

Are you sure interrupts are nested and prioritized?

Have you used any kind of NMI which could lock out interrupts at a critical time?

Is the interrupt properly cleared?

Does the period of the timer relate in any way to your 94 msec missing interrupt? What is the period? How many 125 usec intervals are there in its period? How often will they align? Could it be related?
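The alignment question can be answered with a greatest-common-divisor calculation: two periodic events line up again after the least common multiple of their periods. This is a minimal sketch in integer nanoseconds; the function names are hypothetical. With the 1/128 s tick (7,812,500 ns) reported later in the thread, a tick holds 62.5 interrupt periods, and the tick and the 125 microsecond interrupt line up every 15.625 ms.

```c
#include <stdint.h>

/* Euclid's algorithm. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* Hypothetical helper: smallest interval (ns) after which two periodic
 * events, with the given periods, coincide again (their LCM). Dividing
 * before multiplying keeps the intermediate value small. */
static uint64_t realign_ns(uint64_t period_a_ns, uint64_t period_b_ns)
{
    return period_a_ns / gcd_u64(period_a_ns, period_b_ns) * period_b_ns;
}
```

For instance, realign_ns(125000, 7812500) gives 15625000, i.e. every second tick.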

Ethernet traffic can be bursty, which could lead to insufficient resources to process interrupts. Do all the packet transfers use DMA with minimal interrupt-disable windows? If not, this could add extra load that leaves insufficient processing power to service your interrupts. It would seem, though, that at 1 GHz this processor should be plenty fast enough to process at this rate.
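A rough budget supports the last point: at 1 GHz there are 125,000 CPU cycles between interrupts arriving every 125 microseconds, and the 4 microsecond average latency measured later in the thread is about 3% of that. This is simple arithmetic assuming a fixed 1 GHz clock (the helper names are hypothetical); it says nothing about worst-case latency, which is what the missed interrupts point to.

```c
#include <stdint.h>

#define NS_PER_S 1000000000ULL

/* Hypothetical helper: CPU cycles available between consecutive interrupts.
 * cpu_hz * period_ns stays well inside 64 bits for GHz clocks and
 * microsecond-scale periods. */
static uint64_t cycles_per_period(uint64_t cpu_hz, uint64_t period_ns)
{
    return cpu_hz * period_ns / NS_PER_S;
}

/* Interrupts per second for a given period. */
static uint64_t irqs_per_second(uint64_t period_ns)
{
    return NS_PER_S / period_ns;
}
```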

This illustrates one of the biggest weaknesses of Linux: because it is large and monolithic, it becomes difficult to track down problems like this. If all else fails, get an RTOS that is really designed for your application. With full disclosure that I have a vested interest in POSIX RTOS offerings, I might suggest you port your application to the Unison POSIX RTOS. You'll be stunned by the difference in both throughput and latency. To get this, you do give up dynamic loading, but the port is simple.

My $0.02, thoughts from others?

Kim

Honey Sukesan over 12 years ago in reply to Kim Rowe52176

Hi,

Thank you for your response.

Regarding your questions, please see my answers below:

Are you sure interrupts are nested and prioritized?

[Honey] >> We registered our interrupt handler in Linux as an IRQ using the request_irq() call. We read the interrupt controller's IRQ priority status register within our interrupt handler and observed that the priority is zero (highest priority).

We found that another interrupt, GP_TIMER, also with priority zero, is active during the same period.

We are not sure whether our interrupt is nested!
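For reference, on the OMAP3/DM37x interrupt controller each source's priority is held in its own INTCPS_ILRm register. The sketch below assumes the layout given in the TRM (PRIORITY in bits 7:2 with 0 as the highest priority, FIQNIR in bit 0 routing the source to FIQ when set); please verify against your device's TRM before writing the register. The helpers are hypothetical and only pack or unpack values; they do not touch hardware. Note that if GP_TIMER and the GPIO bank both read back as priority 0, the priority field gives neither of them preference.

```c
#include <stdint.h>

/* Assumed INTCPS_ILRm layout: bits [7:2] PRIORITY, bit [0] FIQNIR. */
struct ilr_fields {
    unsigned priority; /* 0 = highest, 63 = lowest */
    int is_fiq;        /* 1 = routed to FIQ, 0 = routed to IRQ */
};

/* Hypothetical helper: unpack a raw ILR value. */
static struct ilr_fields decode_ilr(uint32_t ilr)
{
    struct ilr_fields f;
    f.priority = (ilr >> 2) & 0x3Fu;
    f.is_fiq = (int)(ilr & 1u);
    return f;
}

/* Hypothetical helper: build a raw ILR value from its fields. */
static uint32_t encode_ilr(unsigned priority, int is_fiq)
{
    return ((uint32_t)(priority & 0x3Fu) << 2) | (is_fiq ? 1u : 0u);
}
```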

Is the interrupt properly cleared?

[Honey] >> static struct irq_chip omap_irq_chip = {
    .name   = "INTC",
    .ack    = omap_mask_ack_irq,
    .mask   = omap_mask_irq,
    .unmask = omap_unmask_irq,
};

We assume that the above functions take care of clearing the interrupt.

Does the period of the timer relate in any way to your 94 msec missing interrupt?

[Honey] >> We didn't find any relation between these two. Our timer tick is 7.8125 ms (CONFIG_HZ = 128).
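As a quick cross-check of these numbers: with CONFIG_HZ = 128 one tick is exactly 7.8125 ms, and 12 ticks come to 93.75 ms, close to the observed 94 ms. Whether that is more than a coincidence is not established by anything in this thread; the sketch below (hypothetical helper names, integer nanoseconds) only makes the arithmetic easy to repeat for another CONFIG_HZ value or a more precisely measured period.

```c
#include <stdint.h>

#define NS_PER_S 1000000000ULL

/* Length of one scheduler tick (jiffy) in nanoseconds for a given HZ. */
static uint64_t tick_ns(uint64_t hz)
{
    return NS_PER_S / hz;
}

/* Hypothetical helper: nearest whole number of ticks to an observed
 * interval, rounding half up. */
static uint64_t nearest_tick_count(uint64_t observed_ns, uint64_t hz)
{
    return (observed_ns * hz + NS_PER_S / 2) / NS_PER_S;
}
```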

Have you used any kind of NMI which could lock out interrupts at a critical time?

[Honey] >> We haven't used any NMI.

Ethernet traffic can be bursty, which could lead to insufficient resources to process interrupts. Do all the packet transfers use DMA with minimal interrupt-disable windows?

[Honey] >> We tried disabling Ethernet in our Linux kernel and tested the driver on that kernel image, but this had no effect on the interrupt miss/latency issue. We are not using DMA for our data transfer.

Please send your valuable suggestions on this issue.

Thanks,

Honey S

Honey Sukesan over 12 years ago in reply to Honey Sukesan

Hi all,

Here are some more observations on our interrupt latency/miss issue:

- We tried increasing the SPI clock to 12 MHz. Even then, severe interrupt latency causes 3 or 4 missed interrupts every 94 ms.
- We suspect that some higher-priority interrupt is being serviced by the kernel at this 94 ms interval. We have checked the output of cat /proc/interrupts; the other frequent interrupts occurring at the same time are gp_timer and eth0. We have also tried to increase the priority of our interrupt by writing some registers in our IRQ controller.
- We also tried disabling Ethernet support in our kernel to eliminate the eth0 interrupt; there is still no change.
- We also tried masking the gp_timer interrupt in our SPI driver's init; the whole system then hangs.
- We also tried lowering the priority of the gp_timer interrupt from our SPI driver's init; still no change.
- We commented out the tasklet_schedule() call in our interrupt handler (it schedules the tasklet that receives the SPI data) and enabled GPIO toggling on every interrupt. Our interrupt handler is registered in Linux as an IRQ using request_irq(). We observed that:
  - The average interrupt latency is about 4 microseconds.
  - In some cases, however, it rises to 70 microseconds. In the attached diagram, point 'a' shows the GPIO being set high on an interrupt. The slave generates the next interrupt 125 microseconds later, but because of the latency it is reflected only after 200 microseconds.
  - Every 94 milliseconds there is an interrupt latency of more than 125 microseconds, which may cause our SPI data reads to be incorrect. The 94 ms period is fixed and periodic. Diagram attached.
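Summarising the scope captures numerically makes this pattern explicit. This is a minimal sketch, assuming paired timestamps (slave interrupt edge, handler GPIO toggle) exported from the scope in microseconds; the struct and function names are hypothetical. Any sample whose latency reaches the 125 microsecond period is one where the next edge arrives before the handler has run, which, with a single pending bit per line, is where interrupts start being lost.

```c
#include <stddef.h>

struct latency_stats {
    double avg_us;
    double max_us;
    size_t over_limit; /* samples with latency >= limit_us */
};

/* Hypothetical helper: edge-to-handler latency statistics from paired
 * timestamps, counting samples at or above limit_us. */
static struct latency_stats summarize_latency(const double *irq_edge_us,
                                              const double *gpio_toggle_us,
                                              size_t n, double limit_us)
{
    struct latency_stats s = {0.0, 0.0, 0};
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        double lat = gpio_toggle_us[i] - irq_edge_us[i];
        sum += lat;
        if (lat > s.max_us)
            s.max_us = lat;
        if (lat >= limit_us)
            s.over_limit++;
    }
    if (n > 0)
        s.avg_us = sum / (double)n;
    return s;
}
```

Running it with limit_us set to the 125 microsecond period over a capture longer than 94 ms should flag one cluster per 94 ms window if the behaviour described above holds.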
  
Please give us your valuable suggestions on this issue. We are still stuck here.

Can Linux handle a 125-microsecond periodic interrupt on a 1 GHz DM3730 processor?

Thanks,
Honey S
