IPC Notify events stopping critical HWI from running (OMAP L138 DSP, SysBios)

On our OMAP L138, my code is using IPC for the cores to talk to each other (via Syslink on ARM).  The cores communicate with shared data, where the ARM sends a Notify event to tell the DSP that it has changed this data, the Notify callback in the DSP posts a SWI to process this data, and the DSP SWI sends a Notify event back to the ARM when it has handled the change.  The ARM is handling the UI, and the DSP is running a real-time processing loop with the configuration passed from the ARM.

On the DSP, the critical processing runs in an ISR (plugged HWI), where the HWIs are configured via MaskOption_BITMASK settings so that this critical processing has top priority.  All other HWIs will be pre-empted by it. But I've just found that scheduling of this ISR is being delayed by around 1us, every time a Notify event exchange happens.  This is causing the processing to miss its deadline.  The ISR does not get pre-empted whilst it is running - it just gets extra latency in being scheduled.
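
As a rough illustration of how bitmask-style masking keeps one HWI on top, here is a small host-side model in C. It is a sketch, not SYS/BIOS code: the helper names are invented for the example, and it assumes that the mask given to each HWI is the set of interrupt-enable bits cleared while that HWI's ISR runs.

```c
#include <assert.h>
#include <stdint.h>

/* Bit for interrupt number n in a 16-bit interrupt enable register (IER). */
#define INT_BIT(n) ((uint16_t)(1u << (n)))

/* Model of bitmask-style masking: while an ISR runs, every bit in its
 * disable mask is cleared from the IER. (Assumed behaviour for this sketch.) */
static uint16_t ier_during_isr(uint16_t ier, uint16_t disable_mask)
{
    return (uint16_t)(ier & (uint16_t)~disable_mask);
}

/* Returns 1 if interrupt n is still enabled in the given IER value. */
static int is_enabled(uint16_t ier, int n)
{
    assert(n >= 0 && n < 16);
    return (ier & INT_BIT(n)) != 0;
}

/* Mask for the critical ISR: disable everything, so nothing pre-empts it. */
static uint16_t mask_for_critical(uint16_t ier)
{
    return ier;
}

/* Mask for any other ISR: disable everything except the critical interrupt,
 * so the critical ISR can still pre-empt it. */
static uint16_t mask_for_other(uint16_t ier, int critical_int)
{
    assert(critical_int >= 0 && critical_int < 16);
    return (uint16_t)(ier & (uint16_t)~INT_BIT(critical_int));
}
```

With hypothetical interrupts 4 (critical), 5 and 14 enabled, `ier_during_isr(ier, mask_for_other(ier, 4))` leaves bit 4 set, so the critical HWI can still pre-empt, while `ier_during_isr(ier, mask_for_critical(ier))` clears every bit. Note that this only models per-interrupt enable bits; a global interrupt disable bypasses it entirely.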

I created some extra Notify events to test this.  Just sending a Notify event from the ARM to the DSP did not cause the delay, nor did the Notify callback posting a SWI.  When I investigated the IPC code, I found its HWI was set to MaskOption_SELF, so this is as expected.

But sending a Notify event back to the ARM causes the critical processing HWI to be held up.  When I looked at the IPC code for sending a Notify event (the function NotifyDriverShm_sendEvent() in file NotifyDriverShm.c), I found numerous places which globally disabled/enabled interrupts.  This would absolutely explain what I'm seeing, by creating a priority inversion where IPC prevents any higher-priority code from pre-empting it.
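
To make the effect concrete, the following is a minimal host-side model in C of how a globally disabled window delays an interrupt. It is not the IPC code: the type and function names are made up, time is in arbitrary units, and it assumes an interrupt asserted inside a disabled window simply stays pending until interrupts are re-enabled.

```c
#include <assert.h>
#include <stdint.h>

/* A window during which interrupts are globally disabled, in arbitrary
 * time units. The window is half-open: [start, end). */
typedef struct {
    uint32_t start;
    uint32_t end;
} disabled_window_t;

/* Time at which an interrupt asserted at t_assert is actually dispatched.
 * Windows must be sorted by start time and must not overlap. */
static uint32_t dispatch_time(uint32_t t_assert,
                              const disabled_window_t *windows, int count)
{
    uint32_t t = t_assert;
    for (int i = 0; i < count; i++) {
        assert(windows[i].start <= windows[i].end);
        if (t >= windows[i].start && t < windows[i].end) {
            /* Held pending until interrupts are re-enabled. */
            t = windows[i].end;
        }
    }
    return t;
}

/* Extra scheduling latency caused by the disabled windows. */
static uint32_t added_latency(uint32_t t_assert,
                              const disabled_window_t *windows, int count)
{
    return dispatch_time(t_assert, windows, count) - t_assert;
}
```

For example, with a disabled window covering [100, 103), an interrupt asserted at t=100 is delayed by 3 units, one asserted at t=102 by 1 unit, and one asserted at t=103 not at all. The worst-case added latency is the length of the longest window, regardless of how the individual HWI masks are configured.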

I'm afraid this looks like a serious error.  Clearly there is an assumption from the IPC coders that IPC comms will be the highest priority operation and should be allowed to do this, but this simply isn't valid.  It might not be a big problem with soft-real-time systems, and it might actually be correct in cases where the DSP is being used as a co-processor for the ARM and Notify *is* the highest priority.  But in a hard-real-time system where the DSP is running a control loop and the ARM is running the UI, IPC should be a very *low* priority.  Having it messing with interrupts means we are unable to guarantee processing deadlines.

Do you have any suggestions for workarounds or alternatives within IPC?  Or will I have to write my own notification mechanism to replace IPC, if it is not fit for this purpose?

  • FYI for anyone else with the same problem...

    Notify and other IPC events use the first two inter-processor (CHIPSIG) interrupts in each direction, leaving two spare. (Assuming you don't have "both" selected in your IPC/Syslink settings; the default is for them to only use one.) Since Notify is clearly broken, and TI do not seem to have any answers for how to fix it, I decided to write my own equivalent.

    For the ARM side, I wrote a small Linux driver with two operations: one which fires off a CHIPSIG interrupt from the ARM to the DSP, and one which blocks waiting for a CHIPSIG interrupt from the DSP to the ARM. The former is easy. The latter would typically be done by creating an ISR and having it post a semaphore, so that other processing can carry on in the background. In my case, though, I know there will only ever be a very short wait for the DSP to come back, and latency is important, so I don't want the overhead of context-switching between ISRs and semaphores. I've just done a simple busy-wait that polls the IRQ flag. For the DSP side, I simply replaced the Notify callback/send with a HWI ISR and HWI trigger.

    The result completely fixes the problem. The high-priority ISR is now completely deterministic, and the extra latency has gone away.

    Processing the ARM-DSP comms is also significantly better. With Notify events, sending a Notify from the ARM to the DSP, the DSP reading a value and sending a Notify back to the ARM took 292-424us (measured by the old-school method of setting a GPIO pin high immediately before and low immediately after, and watching the GPIO pin with a scope). With my own code, the exact same operation takes 218-249us. The minimum time is not hugely different (mainly due to removing a semaphore context-switch on the ARM side, I think), but the maximum time is radically improved so that there is very little variation in how long this takes.
  • Glad to see this was resolved.
  • It was NOT resolved!!!!

    I have coded my own notification method in the absence of working software from TI.  My own application now works correctly, because I am no longer using that faulty TI code.

    It has NOT resolved the fact that the IPC Notify library code for OMAP-L138 has a serious priority inversion bug which prevents it being used in ANY application requiring a deterministic response to hardware interrupts with single-digit microsecond accuracy.

  • What I failed to say at the time: I'm building with Code Composer Studio v5.5.0.00077. (We haven't yet had a compelling business case to upgrade to CCS v6.) If you can confirm that this issue has already been identified and fixed in later versions of TI-RTOS (SYSBIOS) then that's sorted. If it hasn't, then it seems likely the bug is still in there.
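
For readers who want to try the replacement mechanism described in the first reply (a spare CHIPSIG interrupt in each direction, with a busy-wait on the ARM side), here is a minimal sketch in C. It is not the poster's code. Everything in it is an assumption made for illustration: the bit positions are placeholders, the signal register is simulated with an ordinary struct so the sketch runs on a host, and the poll limit is added so the wait cannot hang forever. On real hardware the register access would go through the device's memory-mapped registers, as documented in the OMAP-L138 TRM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions for the two spare signals (hypothetical). */
#define SIG_TO_ARM (1u << 1)
#define SIG_TO_DSP (1u << 3)

/* Host-side stand-in for the inter-processor signal register. */
typedef struct {
    volatile uint32_t pending;
} sig_regs_t;

/* Raise a signal towards the other core. */
static void sig_raise(sig_regs_t *r, uint32_t bit)
{
    r->pending |= bit;
}

/* Acknowledge (clear) a signal. */
static void sig_clear(sig_regs_t *r, uint32_t bit)
{
    r->pending &= ~bit;
}

/* ARM side: busy-wait until the signal is pending, then acknowledge it.
 * Returns false if it is not seen within max_polls polls. */
static bool sig_wait(sig_regs_t *r, uint32_t bit, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (r->pending & bit) {
            sig_clear(r, bit);
            return true;
        }
    }
    return false;
}

/* DSP side: body of the ISR that replaces the Notify callback. It
 * acknowledges the ARM's signal, reads the shared value, and signals back. */
static void dsp_isr(sig_regs_t *r, const volatile uint32_t *shared,
                    uint32_t *out)
{
    sig_clear(r, SIG_TO_DSP);
    *out = *shared;
    sig_raise(r, SIG_TO_ARM);
}
```

In this model the ARM calls `sig_raise(&regs, SIG_TO_DSP)` after updating the shared data and then `sig_wait(&regs, SIG_TO_ARM, limit)`. Nothing on the DSP side disables interrupts globally, which is the property the original Notify path lacked.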