On our OMAP-L138, my code uses IPC for the cores to talk to each other (via SysLink on the ARM). The cores communicate through shared data: the ARM sends a Notify event to tell the DSP that it has changed the data, the Notify callback on the DSP posts a SWI to process it, and the DSP SWI sends a Notify event back to the ARM once it has handled the change. The ARM handles the UI, and the DSP runs a real-time processing loop using the configuration passed from the ARM.
On the DSP, the critical processing runs in an ISR (plugged HWI). The HWIs are configured via MaskOption_BITMASK settings so that this critical processing has top priority and pre-empts all other HWIs. But I've just found that dispatch of this ISR is delayed by around 1 µs every time a Notify event exchange happens, which causes the processing to miss its deadline. The ISR is not pre-empted whilst it is running - it just sees extra latency before it starts.
I created some extra Notify events to test this. Just sending a Notify event from the ARM to the DSP did not cause the delay, nor did the Notify callback posting a SWI. When I investigated the IPC code, I found that its HWI is set to MaskOption_SELF, which masks only that interrupt itself and leaves my critical HWI free to pre-empt it, so this is as expected.
But sending a Notify event back to the ARM does hold up the critical processing HWI. When I looked at the IPC code for sending a Notify event (the function NotifyDriverShm_sendEvent() in file NotifyDriverShm.c), I found numerous places that globally disable and re-enable interrupts. This would explain what I'm seeing: it creates a priority inversion in which IPC prevents any higher-priority code from pre-empting it.
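To make the reasoning concrete, here is a minimal model of why a global disable defeats the HWI mask settings. It is only a sketch under my own assumptions: the names `MaskedWindow`, `extra_latency_ns` and `worst_case_extra_latency_ns` are hypothetical and nothing here is TI code. It treats one disable/restore pair as a window on a timeline and computes how long an interrupt asserted at a given instant has to wait before it can be dispatched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not TI code. All times are nanoseconds on one timeline.
 * A window is the span between a global interrupt disable and its restore. */
typedef struct {
    uint32_t start_ns;  /* instant interrupts are globally disabled */
    uint32_t end_ns;    /* instant interrupts are globally restored */
} MaskedWindow;

/* Extra dispatch latency for an interrupt asserted at irq_ns.
 * Per-HWI mask settings play no part: while the window is open,
 * no maskable interrupt is taken, whatever its priority. */
uint32_t extra_latency_ns(MaskedWindow w, uint32_t irq_ns)
{
    assert(w.end_ns >= w.start_ns);
    if (irq_ns >= w.start_ns && irq_ns < w.end_ns) {
        return w.end_ns - irq_ns;   /* held until the restore */
    }
    return 0u;                      /* arrived outside the window */
}

/* Worst case over all arrival times: the interrupt is asserted just as
 * the window opens, so the added latency equals the window length. */
uint32_t worst_case_extra_latency_ns(MaskedWindow w)
{
    assert(w.end_ns >= w.start_ns);
    return w.end_ns - w.start_ns;
}
```

In this model the latency budget for the critical HWI has to include the longest disabled window anywhere in the system, and the delay of around 1 µs that I measured would correspond to the longest such window on the send path.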
I'm afraid this looks like a serious error. The IPC code seems to assume that IPC comms will be the highest-priority operation and may therefore disable interrupts globally, but this simply isn't valid in general. It might not be a big problem in soft-real-time systems, and it might actually be correct in cases where the DSP is being used as a co-processor for the ARM and Notify *is* the highest priority. But in a hard-real-time system where the DSP is running a control loop and the ARM is running the UI, IPC should be a very *low* priority. Having it disable interrupts globally means we are unable to guarantee processing deadlines.
Do you have any suggestions for workarounds or alternatives within IPC? Or will I have to write my own notification mechanism to replace IPC, if it is not fit for this purpose?
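For reference, this is roughly the kind of replacement I have in mind for the DSP-to-ARM acknowledgment if nothing within IPC works. It is a sketch only and rests on several assumptions: the names `AckMailbox`, `ack_publish` and `ack_poll` are hypothetical, the ARM polls a word in shared memory instead of receiving an interrupt, and C11 atomics stand in for whatever the target toolchains actually offer. On the real hardware I would expect to need an aligned 32-bit store plus explicit cache write-back on the DSP and invalidate on the ARM, which this host-testable model does not show.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-way "config handled" mailbox in shared memory.
 * Only the DSP writes handled_seq and only the ARM reads it, so no
 * lock and no interrupt disable is needed on either side. */
typedef struct {
    _Atomic uint32_t handled_seq;  /* number of the last config the DSP processed */
} AckMailbox;

/* DSP side (e.g. from the SWI): publish that config number seq is done. */
void ack_publish(AckMailbox *mb, uint32_t seq)
{
    assert(mb != NULL);
    atomic_store_explicit(&mb->handled_seq, seq, memory_order_release);
}

/* ARM side: poll the mailbox. Returns true and updates *last_seen when the
 * DSP has acknowledged something newer than the caller has already seen. */
bool ack_poll(AckMailbox *mb, uint32_t *last_seen)
{
    assert(mb != NULL && last_seen != NULL);
    uint32_t now = atomic_load_explicit(&mb->handled_seq, memory_order_acquire);
    if (now == *last_seen) {
        return false;              /* nothing new since the last poll */
    }
    *last_seen = now;
    return true;
}
```

The ARM would call `ack_poll()` from its UI loop or a timer, which costs some latency on the UI side but keeps the DSP send path to a single store. Intermediate acknowledgments can be skipped if the DSP publishes twice between polls, which I think is acceptable when only the latest configuration matters.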