C6x CSL interrupt hook problem

I checked a couple of Chip Support Libraries for the C6x family, and they share the same problem, although the exact code varies: __IRQ_hookFetchPacket: STW B0,*B15-- and a corresponding LDW *++B15,B0 a few instructions later. The problem is that this violates the requirement (documented in the TMS320C6000 Optimizing C Compiler V7.3 User's Guide) that the run-time stack pointer *always* be 8-byte (doubleword) aligned. Presumably the compiler can generate code for some data types that depends on that alignment.

My guess is that the coder of the cited code either was oblivious to the requirement, or figured that it didn't matter, since interrupts were masked (GIE bit clear in the CSR) and the SP was restored before the GIE bit could be re-enabled. The flaw with that logic is that an NMI (non-maskable interrupt) could occur during the very short window while the SP is misaligned, so whatever C code the NMI handler executes could run afoul of the presumption that the SP is always doubleword-aligned. The fix is to change the store to STW B0,*B15--[2], with a matching LDW *++B15[2],B0 for the restore.

I have seen that construct used somewhere (I forget where), and have been wondering why, until I read the requirement in the compiler guide.

Admittedly, it is hard to trip over this flaw, but not impossible, and anyone using NMI in a critical application should be aware of the possibility.
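
To make the window concrete, here is a minimal host-side sketch in C of the stack-pointer arithmetic. It is only a model, not CSL code: the helper names is_dword_aligned and sp_after_push are made up for illustration, and it assumes that B15 is doubleword-aligned on entry and that the [n] suffix scales the post-decrement by n 32-bit words.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of 8, i.e. doubleword-aligned. */
static bool is_dword_aligned(uint32_t addr) {
    return (addr & 7u) == 0u;
}

/*
 * Model of "STW B0,*B15--[words]": the store uses the current SP, then
 * SP is post-decremented by `words` 32-bit words. The return value is
 * the SP that an NMI handler would see if it fired before the matching
 * LDW. (Hypothetical helper, for illustration only.)
 */
static uint32_t sp_after_push(uint32_t sp, uint32_t words) {
    assert(is_dword_aligned(sp)); /* assumption: SP is aligned on entry */
    return sp - 4u * words;
}
```

With an entry SP of 0x80001000, sp_after_push(sp, 1) models the existing code and gives 0x80000FFC, which is only word-aligned, while sp_after_push(sp, 2) models the proposed fix and gives 0x80000FF8, which is doubleword-aligned.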

  • Douglas,

    This will probably be best answered in the E2E Compiler Forum, even though it seems like a device-specific question.

    But I suspect that NMI is considered to be a non-returning event that does not need the stack protection, and guarantees it by itself. That is just a layman's guess, so the right answer from the Compiler team will be what we really want.

    I have requested this thread be moved there from this C67x Single Core DSP Forum.

    Regards,
    RandyP

  • The requirement that the stack remain 8-byte aligned applies only if some part of the system that relies on the C calling convention could still execute.  That is, if interrupts are truly disabled, the function is free to violate the C calling convention, as long as it restores it before re-enabling interrupts.  These exceptions are so complicated, and of such limited use, that it's better just to use the rule of thumb that SP should be 8-byte aligned at all times.

    If the NMI handler might be written in C, or otherwise C callable, then all interrupt handlers must obey the C calling convention.  The same is true if there is any other interrupt that could occur before interrupts are disabled.

    If an interrupt handler calls a C-callable function, it must obey the C calling convention.

    If the NMI handler is truly a non-returning event, and neither it nor __IRQ_hookFetchPacket is a C function, and no other interrupt which might be handled by a C-callable handler could occur, then the C environment has come to an end.  In this case, it would seem __IRQ_hookFetchPacket might be able to get away with not aligning SP to 8 bytes.  Is it worth proving this in each system just to save 4 bytes of a stack that is about to disappear?

  • Archaeologist,

    It sounds like you are supporting Douglas' position on these stack saves being poor choices. If that is the case, and I agree, should we try to find a way to get the word out within the software groups to change these interrupt-vector templates to use the extra 4 bytes and not have to worry about proving that the mis-alignment is okay?

    If I can help with that, please let me know who to start with.

    Regards,
    RandyP

  • In my opinion, it is easier for everyone if those stack saves are changed to keep SP 8-byte aligned, and there is almost no downside.

    Yes, I agree that the CSL ought to be changed.

    I'm sorry, I don't know who to talk to about CSL issues.

  • Douglas,

    Was there a particular device for which this CSL issue causes you concern, or is it a general and academic issue?

    It seems to also be in some SYS/BIOS files, so there are a few places where this needs to be repaired. I will try to push this to the right people and see what can be done.

    Thank you for bringing this to our attention. And I am glad we got to this forum so my layman's guess was corrected.

    Please keep in mind that there may be different opinions about this being a requirement, so there is no guarantee that there will be back-fill to change this in any existing code. It makes sense to me that we upgrade future code to avoid this, since as Archaeologist says, "there is almost no downside".

    But it is a fact that there are places in any highly pipelined software in which there is no way to make a valid return from an NMI. If there are in-flight register writes that end up being flushed early because of the NMI, then the calculations or even branch targets could change and completely corrupt the system. So, no promises.

    Regards,
    RandyP

  • Well, it's not an academic issue, in that there is no particular reason that an NMI handler wouldn't use a lot of C code, which may depend on proper SP alignment.  I don't have an immediate problem myself, but then I am not using the CSL for this anyway.  I'm more concerned (just as a member of the community) about other users whose NMI handlers may fail sporadically, especially when they have critical reliability requirements (which is usually the case when you're using NMI at all).

    I suspect that most NMI processing is likely to involve a certain amount of notification/logging actions, resetting peripherals etc. (presumably involving some amount of C code), then restarting the whole program.  (You usually don't want to just longjmp to a recovery point, because without a way to block NMI you can't be sure that some data structures aren't left in an inconsistent state.)

    I don't think there is much difference between a valid return from NMI (via B NRP) and a valid return from a maskable interrupt (via B IRP).  NMI is not the same as RESET!  For the C6713 at least, execute packets are annulled in the pipeline the same for NMI as for maskable interrupts.  If the program uses single-assignment programming, which is the current compiler default, there should be no problem with the interrupted computation.  The only place I might worry about is NMI in the middle of a maskable interrupt vector where there is typically "B B0; LDW *++B15[2],B0" but since the NMI will not be serviced within the delay slots of a branch, the LDW is safely completed to restore the clobbered B0 and the branch is taken as usual before the NMI is serviced.

    I'm impressed how the computer architect(s) actually managed to get this right.  I could tell you stories...

  • I think Archaeologist mentioned this, but it is worth noting that fixing this doesn't change the size of the stack needed for the application; the borrowed extra 4 bytes are given back concurrently with executing the user-provided ISR function.  The "STW B0,*B15--[2] ... LDW *++B15[2],B0" sequence doesn't actually access any extra space; it uses the same location as the old code but leaves the pointer 4 bytes lower in between.  The only way an effect could be seen would be if there is an interrupt (necessarily NMI) during this window, and in that case the NMI handler's code will see the correctly adjusted SP (B15) instead of a misaligned one.  That could push the stack usage one word lower than in the case of the old code, nonrecursively, so the difference is (barely) visible under this set of circumstances (only).  However, the difference is better than benign; it is actually beneficial, since the NMI handler's C-generated code may well need the SP to be doubleword-aligned.

    Thanks to everybody for the friendly discussion.

  • I agree that there's a small window where an NMI could preempt these ISR entry points and run with a 4-byte aligned stack instead of the 8-byte aligned stack required by the compiler.  This could cause a problem if the NMI service routine used a double-word-sized local variable or a local structure with a double-word element.  In most cases, the NMI routines are terminal on the C6x and do not return to the application because of the open pipeline.

    SDOCM000093861 -- will be fixed in DSP/BIOS 5.42.00 in Aug/Sept timeframe.

    SDOCM000093859 -- will be fixed in SYS/BIOS 6.34.00 in Aug/Sept timeframe.
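
Two points from the replies above can be checked with a small host-side model in C: the fixed sequence stores to and loads from the same stack slot as the old one and leaves the same final SP, and a double-word local in an NMI handler lands on a misaligned address only with the old sequence. This is a sketch under stated assumptions, not TI code: the names push_pop, model_push_pop and local_addr are hypothetical, and the frame layout (a frame size that is a multiple of 8, with the local at a fixed offset above the new SP) is assumed for illustration rather than taken from the compiler documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses involved in one STW/LDW pair, as modelled here. */
struct push_pop {
    uint32_t store_addr; /* address written by STW */
    uint32_t window_sp;  /* SP seen by an NMI taken between STW and LDW */
    uint32_t load_addr;  /* address read by LDW */
    uint32_t final_sp;   /* SP after LDW */
};

/*
 * Model of "STW B0,*B15--[words]" followed by "LDW *++B15[words],B0".
 * words == 1 is the old code, words == 2 is the proposed fix.
 */
static struct push_pop model_push_pop(uint32_t sp, uint32_t words) {
    struct push_pop r;
    assert((sp & 7u) == 0u);               /* assumption: aligned on entry */
    r.store_addr = sp;                     /* STW: store first ...          */
    r.window_sp = sp - 4u * words;         /* ... then post-decrement       */
    r.final_sp = r.window_sp + 4u * words; /* LDW: pre-increment first ...  */
    r.load_addr = r.final_sp;              /* ... then load                 */
    return r;
}

/*
 * Address of a local placed `offset` bytes above SP after a handler
 * allocates `frame_bytes` of stack (assumed frame layout).
 */
static uint32_t local_addr(uint32_t sp_on_entry, uint32_t frame_bytes,
                           uint32_t offset) {
    return sp_on_entry - frame_bytes + offset;
}
```

For an entry SP of 0x80001000, both variants store and load at 0x80001000 and finish with SP back at 0x80001000; only the SP inside the window differs, by one word. With a 16-byte frame and a local at offset 8, the old window SP puts the local at 0x80000FF4 (not doubleword-aligned), while the fixed window SP puts it at 0x80000FF0.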