I am working on a project that requires floating-point operations inside an interrupt context, specifically in a fast interrupt request (FIQ). To preserve the FPU register state of the interrupted code, the FPU registers must be saved at the beginning of the ISR and restored at the end. Per the ARM Architecture Reference Manual (ARMv7-A and ARMv7-R edition), Section B4.1.57, the registers I need to save are:
- D0-D15
- D16-D31 (if implemented)
- FPSCR
- FPEXC
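To give a sense of how much state the list above represents, here is a minimal sketch of a save area in C. The names `fpu_save_area_t` and `fpu_save_area_bytes` are hypothetical and do not come from any TI or ARM header. The only facts assumed are that each D register is 64 bits wide and that FPSCR and FPEXC are 32 bits each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the FPU state listed above (not a TI/ARM type). */
typedef struct {
    uint32_t fpscr;   /* floating-point status and control register */
    uint32_t fpexc;   /* floating-point exception control register  */
    uint64_t d[16];   /* D0-D15, 64 bits each                       */
} fpu_save_area_t;

/* Two 32-bit words followed by sixteen 64-bit registers, with no padding. */
static_assert(sizeof(fpu_save_area_t) == 136, "unexpected padding in save area");

/* Bytes of FPU state to preserve; pass nonzero if D16-D31 are implemented. */
static size_t fpu_save_area_bytes(int has_d16_d31)
{
    size_t d_regs = has_d16_d31 ? 32u : 16u;
    return d_regs * sizeof(uint64_t) + 2u * sizeof(uint32_t);
}
```

With only D0-D15 implemented this comes to 136 bytes per interrupt entry, and 264 bytes if D16-D31 are present as well.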
The original implementation of this save/restore logic pushed the registers in the order listed above and popped them off in the reverse order, like so:
uint32 fpscr_reg;
uint32 fpexc_reg;
__asm__(" vpush {d0-d15}");
__asm__(" vmrs %0, fpscr" : "=r"(fpscr_reg));
__asm__(" vmrs %0, fpexc" : "=r"(fpexc_reg));
/* code to handle interrupt... */
__asm__(" vmsr fpexc, %0" ::"r"(fpexc_reg));
__asm__(" vmsr fpscr, %0" ::"r"(fpscr_reg));
__asm__(" vpop {d0-d15}");
That appeared to work well, but recently, when adding a new feature that happened to do a couple of long floating point operations in a single line, I noticed that I was getting incorrect results. These incorrect results went away when floating point operations were broken up with their values stored in intermediate variables.
After a lot of digging, it appears that the floating point registers were being stored in the wrong order, which garbled the saved state on entry to the interrupt context. With a long computation in a single line, the intermediate state of the computation was never saved to RAM and instead lived only in the FPU registers. When the interrupt triggered, the ISR corrupted that register state, which in turn corrupted the computed result.
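The failure mode described above can be illustrated with a small host-side model. This is a toy simulation only, not TMS570 code: the array `fpu_regs` stands in for D0-D15, and `isr_body`, `simulated_fiq`, and `interrupted_multiply_add` are hypothetical names invented for the illustration. It shows why an intermediate value held only in a register is lost if the ISR does not hand back the register bank exactly as it found it.

```c
#include <assert.h>
#include <string.h>

#define NUM_D_REGS 16

/* Toy model of the FPU register bank; on real hardware these are D0-D15. */
static double fpu_regs[NUM_D_REGS];

/* Hypothetical ISR body that uses floating point and overwrites every register. */
static void isr_body(void)
{
    for (int i = 0; i < NUM_D_REGS; i++) {
        fpu_regs[i] = -1.0;
    }
}

/* ISR wrapper: optionally saves and restores the whole bank around the body. */
static void simulated_fiq(int save_and_restore)
{
    double saved[NUM_D_REGS] = {0};

    if (save_and_restore) {
        memcpy(saved, fpu_regs, sizeof saved);
    }
    isr_body();
    if (save_and_restore) {
        memcpy(fpu_regs, saved, sizeof saved);
        /* Sanity check: the interrupted code sees its registers unchanged. */
        assert(memcmp(saved, fpu_regs, sizeof saved) == 0);
    }
}

/* Computes (a * b) + c with the product held only in a "register" while an
 * interrupt arrives between the multiply and the add. */
static double interrupted_multiply_add(double a, double b, double c,
                                       int isr_preserves_state)
{
    fpu_regs[0] = a * b;                 /* intermediate lives in a register */
    simulated_fiq(isr_preserves_state);  /* interrupt fires mid-expression   */
    return fpu_regs[0] + c;
}
```

In this model, `interrupted_multiply_add(2.0, 3.0, 4.0, 1)` returns 10.0, while passing 0 for the last argument returns 3.0 because the product was overwritten by the ISR. Breaking the expression into separate statements with intermediate variables would correspond to storing the product to RAM before the interrupt, which is consistent with the symptom going away.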
Evidently, the correct ordering is to store the FPSCR and FPEXC first, and then store the general purpose FP registers. This looks like:
uint32 fpscr_reg;
uint32 fpexc_reg;
__asm__(" vmrs %0, fpscr" : "=r"(fpscr_reg));
__asm__(" vmrs %0, fpexc" : "=r"(fpexc_reg));
__asm__(" vpush {d0-d15}");
/* code to handle interrupt... */
__asm__(" vpop {d0-d15}");
__asm__(" vmsr fpexc, %0" ::"r"(fpexc_reg));
__asm__(" vmsr fpscr, %0" ::"r"(fpscr_reg));
That appeared to fix the issues I was seeing with the long floating point computation in a single line. That said, there doesn't seem to be much information about these steps of saving/restoring out there. I would like to get some confirmation that this new ordering is correct, and that I'm not missing any other key steps in this process or idiosyncrasies of the TMS570.
Thanks!