TMS570LC4357: Correct Order for FPU Register Save/Restore in Interrupt Context Switch

Part Number: TMS570LC4357

I am working on a project that requires floating-point operations inside an interrupt context, specifically a fast interrupt request (FIQ) handler. To preserve the FPU state of the interrupted code, the FPU registers must be saved at the beginning of the ISR and restored at the end. Per the ARM Architecture Reference Manual (ARMv7-A and ARMv7-R edition), Section B4.1.57, the registers I need to save are:

  • D0-D15
  • D16-D31 (if implemented)
  • FPSCR
  • FPEXC
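
For reference, the state listed above can be pictured as one C structure. This is only a sketch: `fpu_context_t` and its field names are hypothetical, and it assumes that only D0-D15 need saving. As far as I know, the Cortex-R5F FPU in this device is a VFPv3-D16 implementation, so D16-D31 should not exist, but that is worth confirming in the device documentation. The offsets are the ones a save routine would use if it stored the registers to such a block instead of pushing them onto the stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the FPU state an FIQ handler must preserve.
 * Assumes a VFPv3-D16 FPU, i.e. only D0-D15 exist (no D16-D31). */
typedef struct {
    uint64_t d[16];   /* D0-D15, 8 bytes each: offsets 0..127 */
    uint32_t fpscr;   /* FP status and control register: offset 128 */
    uint32_t fpexc;   /* FP exception control register: offset 132 */
} fpu_context_t;

/* Compile-time checks that the layout matches the offsets above. */
_Static_assert(offsetof(fpu_context_t, fpscr) == 128, "FPSCR offset");
_Static_assert(offsetof(fpu_context_t, fpexc) == 132, "FPEXC offset");
_Static_assert(sizeof(fpu_context_t) == 136, "total context size");
```

The total is 136 bytes (16 x 8 for D0-D15, plus 4 each for FPSCR and FPEXC). This is also how far SP moves when all three are pushed onto the stack.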

My original implementation saved the registers in the order listed above and restored them in reverse order:

    uint32 fpscr_reg;
    uint32 fpexc_reg;

    __asm__(" vpush {d0-d15}");
    __asm__(" vmrs %0, fpscr" : "=r"(fpscr_reg));
    __asm__(" vmrs %0, fpexc" : "=r"(fpexc_reg));

    /* code to handle interrupt... */
    
    __asm__(" vmsr fpexc, %0" ::"r"(fpexc_reg));
    __asm__(" vmsr fpscr, %0" ::"r"(fpscr_reg));
    __asm__(" vpop {d0-d15}");

That appeared to work well. Recently, however, I added a feature that performs a couple of long floating-point computations in a single expression, and I started getting incorrect results. The errors went away when I broke the computations up and stored partial results in intermediate variables.

After a lot of digging, it appears that the FPU registers were being saved in the wrong order, which garbled the state saved on entry to the interrupt. With a long computation in a single expression, the intermediate values were never written to RAM; they lived only in the FPU registers. When the interrupt fired, it corrupted that register state, which in turn corrupted the computed result.

Evidently, the correct order is to save FPSCR and FPEXC first and then push the general-purpose FP registers:

    uint32 fpscr_reg;
    uint32 fpexc_reg;

    __asm__(" vmrs %0, fpscr" : "=r"(fpscr_reg));
    __asm__(" vmrs %0, fpexc" : "=r"(fpexc_reg));
    __asm__(" vpush {d0-d15}");
    
    /* code to handle interrupt... */
    
    __asm__(" vpop {d0-d15}");
    __asm__(" vmsr fpexc, %0" ::"r"(fpexc_reg));
    __asm__(" vmsr fpscr, %0" ::"r"(fpscr_reg));
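
You can check why the second ordering helps with a small host-side model. The model assumes, as the follow-up below explains, that the compiler addresses `fpscr_reg` and `fpexc_reg` at fixed offsets from SP, while the inline `vpush`/`vpop` move SP without the compiler knowing. This is plain C, not target code. The stack size, slot offsets, register values, and the names `sim_stack_t`, `sim_push`, `sim_pop` and `sim_isr` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Host-side model of the ISR's stack, NOT target code. The stack is
 * full-descending: SP is the index of the lowest word in use. */
#define STACK_WORDS 64
#define D_WORDS     32   /* D0-D15: 16 x 64 bits = 32 words = 128 bytes */

typedef struct {
    uint32_t mem[STACK_WORDS];
    int sp;
} sim_stack_t;

/* Like vpush: lower SP, then store the words at increasing addresses. */
static void sim_push(sim_stack_t *s, const uint32_t *src, int n)
{
    assert(s->sp - n >= 0);
    s->sp -= n;
    memcpy(&s->mem[s->sp], src, (size_t)n * sizeof *src);
}

/* Like vpop: load the words starting at SP, then raise SP. */
static void sim_pop(sim_stack_t *s, uint32_t *dst, int n)
{
    assert(s->sp + n <= STACK_WORDS);
    memcpy(dst, &s->mem[s->sp], (size_t)n * sizeof *dst);
    s->sp += n;
}

/* Returns true if D0-D15, FPSCR and FPEXC all survive the ISR.
 * push_first == true  : original order (vpush, then vmrs into locals)
 * push_first == false : reordered version (vmrs into locals, then vpush)
 * The locals fpscr_reg/fpexc_reg are always accessed at [SP + 0] and
 * [SP + 1] using the *current* SP, because the compiler does not know
 * that the inline vpush/vpop moved it. */
static bool sim_isr(bool push_first)
{
    sim_stack_t s = { .sp = STACK_WORDS };
    uint32_t d[D_WORDS], saved_d[D_WORDS];
    const uint32_t fpscr = 0x03000000u;  /* arbitrary sample value */
    const uint32_t fpexc = 0x40000000u;  /* sample value: EN bit set */
    uint32_t fpscr_out, fpexc_out;

    for (int i = 0; i < D_WORDS; i++)
        d[i] = saved_d[i] = 0x1000u + (uint32_t)i; /* interrupted code's data */

    s.sp -= 2;                        /* prologue: reserve the two locals */

    if (push_first)
        sim_push(&s, d, D_WORDS);     /* vpush {d0-d15} */
    s.mem[s.sp + 0] = fpscr;          /* fpscr_reg = FPSCR */
    s.mem[s.sp + 1] = fpexc;          /* fpexc_reg = FPEXC */
    if (!push_first)
        sim_push(&s, d, D_WORDS);     /* vpush {d0-d15} */

    memset(d, 0, sizeof d);           /* ISR body clobbers D0-D15 */

    if (!push_first)
        sim_pop(&s, d, D_WORDS);      /* vpop {d0-d15} */
    fpexc_out = s.mem[s.sp + 1];      /* vmsr fpexc, fpexc_reg */
    fpscr_out = s.mem[s.sp + 0];      /* vmsr fpscr, fpscr_reg */
    if (push_first)
        sim_pop(&s, d, D_WORDS);      /* vpop {d0-d15} */

    return fpscr_out == fpscr && fpexc_out == fpexc &&
           memcmp(d, saved_d, sizeof d) == 0;
}
```

In this model, `sim_isr(true)` returns false. In the original order, the two stores land on the words where D0 was just pushed, so D0 comes back holding the FPSCR and FPEXC values. `sim_isr(false)` returns true.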

That appears to fix the problem with the long single-expression computation. That said, there doesn't seem to be much information available about this save/restore sequence. I would like confirmation that the new ordering is correct and that I'm not missing any other key steps or TMS570-specific idiosyncrasies.

Thanks!

  • I eventually found the root cause. The local variables fpscr_reg and fpexc_reg are compiled as offsets relative to the stack pointer (SP). The vpush instruction writes directly to the stack and moves SP without the compiler knowing. That is what created the dependence on order. If d0-d15 were pushed after the locals were allocated but before FPSCR and FPEXC were stored into them, SP had already moved. As a result, the status registers were written into the stack locations holding the saved general-purpose FP registers. To resolve this, I chose to save and restore the status registers with push/pop instructions as well, e.g.:

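    __asm__(" vpush {d0-d15}");   /* save D0-D15 (SP moves down 128 bytes) */
    __asm__(" vmrs r1, fpscr");
    __asm__(" push {r1}");        /* save FPSCR */
    __asm__(" vmrs r1, fpexc");
    __asm__(" push {r1}");        /* save FPEXC (pushed last) */
    
    /* code to handle interrupt... */
    
    __asm__(" pop {r2}");         /* FPEXC comes off first */
    __asm__(" vmsr fpexc, r2");
    __asm__(" pop {r2}");         /* then FPSCR */
    __asm__(" vmsr fpscr, r2");
    __asm__(" vpop {d0-d15}");    /* finally D0-D15; SP is back where it started */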


    This puts all of the registers through the same stack-based save/restore mechanism, which removed the dependence on the order of operations.
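
When every register goes through the stack, the one ordering rule left is last-in, first-out: the pops must mirror the pushes. The host-side model below follows the push/pop sequence above. It also checks that SP moves by 136 bytes and returns to its entry value. It is plain C, not target code; `stack_model_t`, `push_n`, `pop_n`, `model_unified` and the sample values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Host-side model of a full-descending stack, NOT target code. */
#define STACK_WORDS 64
#define D_WORDS     32   /* D0-D15 = 128 bytes */

typedef struct {
    uint32_t mem[STACK_WORDS];
    int sp;
} stack_model_t;

static void push_n(stack_model_t *s, const uint32_t *src, int n)
{
    assert(s->sp - n >= 0);
    s->sp -= n;
    memcpy(&s->mem[s->sp], src, (size_t)n * sizeof *src);
}

static void pop_n(stack_model_t *s, uint32_t *dst, int n)
{
    assert(s->sp + n <= STACK_WORDS);
    memcpy(dst, &s->mem[s->sp], (size_t)n * sizeof *dst);
    s->sp += n;
}

/* Follows the push/pop sequence from the example. If mirrored is true,
 * FPEXC is popped first (it was pushed last); if false, FPSCR is popped
 * first, which is wrong. Returns true only if D0-D15, FPSCR, FPEXC and
 * SP all come back unchanged. */
static bool model_unified(bool mirrored)
{
    stack_model_t s = { .sp = STACK_WORDS };
    const int sp_at_entry = s.sp;
    const uint32_t orig_fpscr = 0x03000000u;   /* arbitrary sample values */
    const uint32_t orig_fpexc = 0x40000000u;
    uint32_t fpscr = orig_fpscr, fpexc = orig_fpexc;
    uint32_t d[D_WORDS], saved_d[D_WORDS];

    for (int i = 0; i < D_WORDS; i++)
        d[i] = saved_d[i] = 0x2000u + (uint32_t)i;

    push_n(&s, d, D_WORDS);      /* vpush {d0-d15}             */
    push_n(&s, &fpscr, 1);       /* vmrs r1, fpscr ; push {r1} */
    push_n(&s, &fpexc, 1);       /* vmrs r1, fpexc ; push {r1} */
    assert((sp_at_entry - s.sp) * 4 == 136);  /* SP moved 136 bytes */

    memset(d, 0, sizeof d);      /* ISR body clobbers the FPU state */
    fpscr = fpexc = 0;

    if (mirrored) {
        pop_n(&s, &fpexc, 1);    /* pop {r2} ; vmsr fpexc, r2  */
        pop_n(&s, &fpscr, 1);    /* pop {r2} ; vmsr fpscr, r2  */
    } else {
        pop_n(&s, &fpscr, 1);    /* wrong: receives the FPEXC word */
        pop_n(&s, &fpexc, 1);
    }
    pop_n(&s, d, D_WORDS);       /* vpop {d0-d15}              */

    return s.sp == sp_at_entry && fpscr == orig_fpscr &&
           fpexc == orig_fpexc && memcmp(d, saved_d, sizeof d) == 0;
}
```

`model_unified(true)` returns true. `model_unified(false)` returns false, because popping FPSCR first hands it the FPEXC word. The model also suggests one thing to check on the target. Between the pushes and the pops, SP sits 136 bytes below where the compiler expects it. Any SP-relative access the compiler emits for locals inside the handler body would therefore be shifted, just as in the original bug. Looking at the generated assembly for the ISR is probably the quickest way to confirm that there are none.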