Other Parts Discussed in Thread: TMS320F28377D
Tool/software: TI C/C++ Compiler
Dear readers,
We have done some optimization and noticed different behaviour between the CLA and the CPU.
Original code:
x = (unsigned int) AdcaResultRegs.ADCRESULT0 + (unsigned int) AdcaResultRegs.ADCRESULT1 + (unsigned int) AdcaResultRegs.ADCRESULT2 + (unsigned int) AdcaResultRegs.ADCRESULT3 + (unsigned int) AdcaResultRegs.ADCRESULT4;
MMOVZ16 MR0,@_AdcbResultRegs+6 ; [CPU_FPU] |209|
MLSL32 MR0,#16 ; [CPU_FPU] |209|
MLSR32 MR0,#16 ; [CPU_FPU] |209|
MMOVZ16 MR1,@_AdcbResultRegs+8 ; [CPU_FPU] |209|
MLSL32 MR1,#16 ; [CPU_FPU] |209|
MLSR32 MR1,#16 ; [CPU_FPU] |209|
MADD32 MR1,MR1,MR0 ; [CPU_FPU] |209|
MMOVZ16 MR0,@_AdcbResultRegs+9 ; [CPU_FPU] |209|
MLSL32 MR0,#16 ; [CPU_FPU] |209|
MLSR32 MR0,#16 ; [CPU_FPU] |209|
MADD32 MR1,MR0,MR1 ; [CPU_FPU] |209|
MMOVZ16 MR0,@_AdcbResultRegs+10 ; [CPU_FPU] |209|
MLSL32 MR0,#16 ; [CPU_FPU] |209|
MLSR32 MR0,#16 ; [CPU_FPU] |209|
MADD32 MR1,MR0,MR1 ; [CPU_FPU] |209|
MMOVZ16 MR0,@_AdcbResultRegs+11 ; [CPU_FPU] |209|
MLSL32 MR0,#16 ; [CPU_FPU] |209|
MLSR32 MR0,#16 ; [CPU_FPU] |209|
MADD32 MR1,MR0,MR1 ; [CPU_FPU] |209|
MMOVZ16 MR0,@_AdcbResultRegs+13 ; [CPU_FPU] |209|
MLSL32 MR0,#16 ; [CPU_FPU] |209|
MLSR32 MR0,#16 ; [CPU_FPU] |209|
MADD32 MR1,MR0,MR1 ; [CPU_FPU] |209|
MMOVZ16 MR0,@_AdcbResultRegs+14 ; [CPU_FPU] |209|
MLSL32 MR0,#16 ; [CPU_FPU] |209|
MLSR32 MR0,#16 ; [CPU_FPU] |209|
MADD32 MR1,MR0,MR1 ; [CPU_FPU] |209|
MMOVZ16 MR0,@_AdcbResultRegs+15 ; [CPU_FPU] |209|
MLSL32 MR0,#16 ; [CPU_FPU] |209|
MLSR32 MR0,#16 ; [CPU_FPU] |209|
MADD32 MR0,MR0,MR1 ; [CPU_FPU] |209|
MUI32TOF32 MR0,MR0 ; [CPU_FPU] |209|
Optimized code, which uses single-cycle MUI16TOF32 instructions (without MNOPs):
x = (float) AdcaResultRegs.ADCRESULT0 + (float) AdcaResultRegs.ADCRESULT1 + (float) AdcaResultRegs.ADCRESULT2 + (float) AdcaResultRegs.ADCRESULT3 + (float) AdcaResultRegs.ADCRESULT4;
MUI16TOF32 MR0,@_AdcbResultRegs ; [CPU_FPU] |199|
MUI16TOF32 MR1,@_AdcbResultRegs+1 ; [CPU_FPU] |199|
MADDF32 MR0,MR1,MR0 ; [CPU_FPU] |199|
MUI16TOF32 MR1,@_AdcbResultRegs+2 ; [CPU_FPU] |199|
MADDF32 MR0,MR1,MR0 ; [CPU_FPU] |199|
MMOVD32 MR3,@_CLimErr_p ; [CPU_FPU] |195|
MUI16TOF32 MR1,@_AdcbResultRegs+3 ; [CPU_FPU] |199|
MADDF32 MR0,MR1,MR0 ; [CPU_FPU] |199|
MUI16TOF32 MR1,@_AdcbResultRegs+4 ; [CPU_FPU] |199|
MADDF32 MR0,MR1,MR0 ; [CPU_FPU] |199|
MMOV32 @_x,MR0 ; [CPU_FPU] |199|
In our real-time control system, we notice that the measured value is "different" on fast-changing signals when using the (float) casts, which result in MUI16TOF32 instructions.
We tried a newer compiler and noticed that Version 12.12.5LTS includes a workaround, "CPU-to-FPU register write (fpu1 errata)", that adds five NOPs for UI16TOF32 instructions.
Should this also be the case for CLA-to-FPU instruction MUI16TOF32? This could explain why the optimized float accumulation result (averaging) is different.
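For reference, a minimal host-side sketch (plain C, not CLA code) of why we do not expect the two variants to differ arithmetically: five 16-bit samples sum to at most 5 x 65535 = 327675, which is below 2^24, so every intermediate float sum is exactly representable in single precision. The function names and the use of an array in place of the ADC result registers are our own assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for five 16-bit ADC result registers. */
#define N_SAMPLES 5

/* Variant 1: accumulate as unsigned integers, convert the total once. */
static float sum_int_then_float(const uint16_t r[N_SAMPLES])
{
    uint32_t acc = 0;
    for (int i = 0; i < N_SAMPLES; ++i)
        acc += r[i];
    return (float)acc;
}

/* Variant 2: convert every sample to float, accumulate in float. */
static float sum_float_each(const uint16_t r[N_SAMPLES])
{
    float acc = 0.0f;
    for (int i = 0; i < N_SAMPLES; ++i)
        acc += (float)r[i];
    return acc;
}

/* Returns 1 when both variants give the same result for these samples. */
static int variants_agree(const uint16_t r[N_SAMPLES])
{
    return sum_int_then_float(r) == sum_float_each(r);
}

/* Worst case: all samples at full scale still sum exactly in float. */
static void self_check(void)
{
    const uint16_t full_scale[N_SAMPLES] = {65535, 65535, 65535, 65535, 65535};
    assert(variants_agree(full_scale));
}
```

If this reasoning holds on the target as well, the difference we see would have to come from how or when the values are read and converted, not from the float arithmetic itself.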
Best regards,