
LAUNCHXL-CC2640R2: Question on zero latency interrupts

Part Number: LAUNCHXL-CC2640R2
Other Parts Discussed in Thread: CC2640

Hi,

I was under the assumption that an interrupt service routine (ISR) established through IntRegister() was an RTOS managed ISR and, because of that, I've been happily calling Semaphore_post() inside the ISR with no ill side-effects.

On closer inspection, IntRegister() does the same thing (ignoring a possible vector table copy from Flash to SRAM) as Hwi_plug() - it inserts a pointer to a function into the interrupt vector table.

There are several comments in various documents that zero latency ISRs are "severely restricted" as to what RTOS services they can invoke, but I can't find a definitive list of those services. I did find one forum post stating that you can call Hwi_post() from within a zero latency ISR, but is my use case of calling Semaphore_post() acceptable?

In my case, I'm not using the semaphore for multi-threaded synchronisation. Semaphore_post()/Semaphore_pend() is being used within a single task to synchronise the task code with the ISR - briefly, the task kicks off the process of writing a block of data out through the SSI peripheral and then pends on a semaphore. The ISR responds to the SSI interrupt and loads more data until done, at which point it posts to the semaphore for the task to resume work. This has been working just fine but I'm wondering if that was just luck and whether I should change the mechanism.

Any comments would be appreciated.

Regards,

AC

  • Hi Andrew,

    While this potentially would work, the problem is that interrupts plugged in this way (as with Hwi_plug()) do not go through the normal Hwi dispatcher. This opens the door to data access issues, as TI-RTOS can't guarantee the atomicity of most of its operations.

    Is there any special reason that your interrupt needs to be plugged directly, or could you simply use the normal Hwi dispatcher, since you seem to be running TI-RTOS anyway?

  • Hi M-W,

    Thanks for your response.

    Here is the situation:

    The SSI peripheral is transmitting bursts of 360 x 16-bit words every 15ms (worst case). The clock rate for the SSI is ~3.4MHz so the transmission time for 360 x 16-bit words is a shade under 1.7ms with each 16-bit word taking 4.7us. Of that 1.7ms, I'm looking to reduce the amount of time spent servicing the hardware such that I can reduce or even eliminate the possibility of the output stream being interrupted.

    Looking at the benchmark data for the M3, the managed interrupt overhead is as follows:
    - Latency: 122 clocks
    - Prolog: 112 clocks
    - Epilog: 204 clocks
    - Total: 438 clocks = 9.125us

    The interrupt routine (currently in C) with no optimisation takes 113 clocks = 2.5us (maybe more, because this is calculated from ARM docs and does not take any wait states into account). With maximum optimisation it takes 78 clocks = 1.65us. The SSI generates an interrupt every 4 x 16-bit words = 18.8us, and in that window the inclusive total of code executing is around 11.65us (not optimised) or 10.75us (optimised). This is between 57% and 62% of the total time available - and it could be worse once Flash, SRAM and peripheral wait states are accounted for.

    My thinking is that if I use a zero latency interrupt and I code up the ISR in assembler, I can significantly reduce the amount of time that code execution consumes. According to ARM documentation, the best case interrupt response is 12 clocks (0.25us) going in and 10 clocks (0.2us) coming out. If I can get the ISR optimised to 1us or better, the total time spent in code will be less than 1.5us every 18.8us or 8%.

    That is my thinking but I am open to any ideas or suggestions.

    Regards,
    ac
  • Hi Andrew,

    I assume you have looked into using the DMA to lower the ISR overhead? If the number of bytes in each burst is known, you should be able to utilize the DMA feature quite smoothly. When using DMA, you only get the interrupt on completion of the full transaction (and not every frame). In that case you should even be able to do this using the normal SPI driver - is this something you have tried and found to be failing?
  • Hi M-W,

    DMA was the approach I assumed at the outset but after looking into the details I did not prototype it.

    The main reason is that in the CC2640, the MCU has priority over DMA and, unlike some priority schemes, DMA is not guaranteed M cycles for every N MCU cycles. This being the case, DMA could be held off indefinitely in theory and considerably in practice. If the DMA controller is delayed more than ~18us after an SSI interrupt, the SSI buffer will underrun and cause a data drop-out.

    The uDMA controller maintains its control blocks in main memory, and a single transfer requires 12 bus cycles, not including any wait state overhead. Any competing access to SRAM will have an impact on DMA. How likely is it that the DMA controller gets held off for more than 18us? There is no way of knowing - it is non-deterministic. Maybe I could get it working in the lab today but, tomorrow, a firmware upgrade or just a change in BT traffic patterns could make it start to fail. Since the standard SPI driver installs a DMA handler under the hood, the SPI driver has the same non-deterministic behaviour.

    As I see it, the only way I can be sure that the SSI timing is met is to use a zero latency interrupt and get in and out as quickly as possible. I think this issue touches on a general challenge with any MCU combined with a radio - how do you respond to hardware in a deterministic manner without fouling the radio?

    Regards,
    ac
  • Hi Andrew,

    I would say it is unlikely that the DMA is held off for > 800 cycles in a row. Also, adding the 8-slot-deep SPI FIFO into the mix, the DMA will try to top it off as soon as a slot is free (and not only when the 4-slots-free mark is reached). This gives you even more potential headroom on the bus access.

    The DMA activity would typically not impact the radio activity as long as it is performed from RAM (transfers from flash can prevent HF clock switching). As the radio driver heavily depends on interrupts, having frequent zero-latency interrupts could also impact this in a negative way.

    Back to the original question regarding your semaphore post. While this might work well for you today, it could also break with a firmware upgrade if the way semaphores work were to change (under the hood, not at the API level). As semaphore operations involve queue and flag manipulation that is not atomic, calling them from inside a zero-latency interrupt is not supported. That it is not supported does not necessarily mean that it will not work, just that the behavior of the semaphores cannot be guaranteed.
  • Hi M-W,

    Thank you for your message. After your last comment, I took a step back and revisited the DMA approach and you are correct, it is the better option.

    My concern about DMA getting held off is unnecessary. For the MCU to hold off DMA for an extended period of time, the MCU would have to be doing a sizeable back-to-back transfer to/from RAM, which isn't going to happen. Other MCU activity will occur on different busses from those the DMA uses.

    I'll mark this thread as resolved and, again, thank you for your input.

    Regards,
    ac