Compiler/TMS320F280049: CLA MUI16TOF32 vs CPU UI16TOF32

Tjarco Boerkoel

Part Number: TMS320F280049
Other Parts Discussed in Thread: TMS320F28377D

Tool/software: TI C/C++ Compiler

Dear readers,

We have done some optimization and noticed different behaviour between the CLA and the CPU.

Original code:

x = (unsigned int) AdcaResultRegs.ADCRESULT0 + (unsigned int) AdcaResultRegs.ADCRESULT1 + (unsigned int) AdcaResultRegs.ADCRESULT2 + (unsigned int) AdcaResultRegs.ADCRESULT3 + (unsigned int) AdcaResultRegs.ADCRESULT4;

        MMOVZ16   MR0,@_AdcbResultRegs+6 ; [CPU_FPU] |209| 
        MLSL32    MR0,#16               ; [CPU_FPU] |209| 
        MLSR32    MR0,#16               ; [CPU_FPU] |209| 
        MMOVZ16   MR1,@_AdcbResultRegs+8 ; [CPU_FPU] |209| 
        MLSL32    MR1,#16               ; [CPU_FPU] |209| 
        MLSR32    MR1,#16               ; [CPU_FPU] |209| 
        MADD32    MR1,MR1,MR0           ; [CPU_FPU] |209| 
        MMOVZ16   MR0,@_AdcbResultRegs+9 ; [CPU_FPU] |209| 
        MLSL32    MR0,#16               ; [CPU_FPU] |209| 
        MLSR32    MR0,#16               ; [CPU_FPU] |209| 
        MADD32    MR1,MR0,MR1           ; [CPU_FPU] |209| 
        MMOVZ16   MR0,@_AdcbResultRegs+10 ; [CPU_FPU] |209| 
        MLSL32    MR0,#16               ; [CPU_FPU] |209| 
        MLSR32    MR0,#16               ; [CPU_FPU] |209| 
        MADD32    MR1,MR0,MR1           ; [CPU_FPU] |209| 
        MMOVZ16   MR0,@_AdcbResultRegs+11 ; [CPU_FPU] |209| 
        MLSL32    MR0,#16               ; [CPU_FPU] |209| 
        MLSR32    MR0,#16               ; [CPU_FPU] |209| 
        MADD32    MR1,MR0,MR1           ; [CPU_FPU] |209| 
        MMOVZ16   MR0,@_AdcbResultRegs+13 ; [CPU_FPU] |209| 
        MLSL32    MR0,#16               ; [CPU_FPU] |209| 
        MLSR32    MR0,#16               ; [CPU_FPU] |209| 
        MADD32    MR1,MR0,MR1           ; [CPU_FPU] |209| 
        MMOVZ16   MR0,@_AdcbResultRegs+14 ; [CPU_FPU] |209| 
        MLSL32    MR0,#16               ; [CPU_FPU] |209| 
        MLSR32    MR0,#16               ; [CPU_FPU] |209| 
        MADD32    MR1,MR0,MR1           ; [CPU_FPU] |209| 
        MMOVZ16   MR0,@_AdcbResultRegs+15 ; [CPU_FPU] |209| 
        MLSL32    MR0,#16               ; [CPU_FPU] |209| 
        MLSR32    MR0,#16               ; [CPU_FPU] |209| 
        MADD32    MR0,MR0,MR1           ; [CPU_FPU] |209| 
        MUI32TOF32 MR0,MR0              ; [CPU_FPU] |209|

Optimized code, which uses single-cycle MUI16TOF32 instructions (without MNOPs):

x = (float) AdcaResultRegs.ADCRESULT0 + (float) AdcaResultRegs.ADCRESULT1 + (float) AdcaResultRegs.ADCRESULT2 + (float) AdcaResultRegs.ADCRESULT3 + (float) AdcaResultRegs.ADCRESULT4;

        MUI16TOF32 MR0,@_AdcbResultRegs ; [CPU_FPU] |199| 
        MUI16TOF32 MR1,@_AdcbResultRegs+1 ; [CPU_FPU] |199| 
        MADDF32   MR0,MR1,MR0           ; [CPU_FPU] |199| 
        MUI16TOF32 MR1,@_AdcbResultRegs+2 ; [CPU_FPU] |199| 
        MADDF32   MR0,MR1,MR0           ; [CPU_FPU] |199| 
        MMOVD32   MR3,@_CLimErr_p       ; [CPU_FPU] |195| 
        MUI16TOF32 MR1,@_AdcbResultRegs+3 ; [CPU_FPU] |199| 
        MADDF32   MR0,MR1,MR0           ; [CPU_FPU] |199| 
        MUI16TOF32 MR1,@_AdcbResultRegs+4 ; [CPU_FPU] |199| 
        MADDF32   MR0,MR1,MR0           ; [CPU_FPU] |199| 
        MMOV32    @_x,MR0               ; [CPU_FPU] |199|

In our real-time control system we notice that the measured value is "different" on fast-changing signals when we use the (float) casts, which result in MUI16TOF32 instructions.

We tried a newer compiler and noticed that in version 12.12.5LTS there is a workaround, "CPU-to-FPU register write (fpu1 errata)", that adds 5 NOPs for UI16TOF32 instructions.

Should this also be the case for the CLA instruction MUI16TOF32? That could explain why the result of the optimized float accumulation (averaging) is different.
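For reference, the two C expressions above can be modelled on a host PC to check whether the summation itself can differ between the two forms. This is a minimal host-side sketch only, assuming 16-bit unsigned ADC result values; the function names are hypothetical, and the model does not cover CLA pipeline timing or the moment at which each result register is read.

```c
#include <stdint.h>

/* Hypothetical host-side model, not CLA code.
 * results[] stands in for the ADC result registers. */

/* Original form: integer accumulation, one conversion at the end
 * (the MADD32 chain followed by MUI32TOF32). */
float sum_int_then_convert(const uint16_t *results, int n)
{
    uint32_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += results[i];
    }
    return (float)acc;
}

/* Optimized form: convert each term, then add in float
 * (MUI16TOF32 followed by MADDF32). */
float sum_convert_then_add(const uint16_t *results, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++) {
        acc += (float)results[i];
    }
    return acc;
}
```

Every partial sum of a few 12-bit or 16-bit results is an integer below 2^24, so each float addition is exact and both functions return identical values in either rounding mode.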

Best regards,

over 5 years ago

George Mock over 5 years ago

Tjarco Boerkoel said:
We tried a new compiler and notice that in Version 12.12.5LTS is a workaround "CPU-to-FPU register write (fpu1 errata)" that adds 5x NOPS for UI16TOF32 instructions.

I presume you mean compiler version 18.12.5.LTS. I also presume you are referring to an entry in our issue-tracking system, but I cannot find it. What is the ID of this issue? It will look similar to CODEGEN-6020.

Thanks and regards,

-George

Tjarco Boerkoel over 5 years ago in reply to George Mock

Hi George Mock,

Thank you for your reply.

Yes indeed, sorry, it's compiler 18.12.5.LTS. It's an Advanced Option in my C2000 compiler configuration under 'Runtime Model Options'.

We are using the TMS320F28377D microcontroller.

Best regards

George Mock over 5 years ago

I presume you got this information ...

Tjarco Boerkoel said:
in Version 12.12.5LTS is a workaround "CPU-to-FPU register write (fpu1 errata)" that adds 5x NOPS for UI16TOF32 instructions.

... from a document published by TI. I cannot find this document. Please give me the link to this document, or attach it to your next post, or something similar.

Thanks and regards,

-George

Tjarco Boerkoel over 5 years ago in reply to George Mock

Hi George Mock,

There is no related document.

Best regards,

[edit]

There is really something going on here. When optimizing this CLA1 code

float x = (long) ADCA.result1 + (long) ADCA.result2 + (long) ADCA.result3 + (long) ADCA.result4;

into

float x = (float) ADCA.result1 + (float) ADCA.result2 + (float) ADCA.result3 + (float) ADCA.result4;

I get different behavior in my controller's response. Is there a pipeline issue for the CLA when using UI16TOF32, just as described for CPU->FPU1?

George Mock over 5 years ago in reply to Tjarco Boerkoel

I apologize for the delay.

The compiler switch --silicon_errata_fpu1_workaround is related to interactions between the CPU and the FPU, and not the CLA.

Tjarco Boerkoel said:
Is there a pipeline issue for CLA when using UI16TOF32 just as described for cpu->fpu1?

I don't know. I'll notify the C28x experts about this thread.

Thanks and regards,

-George

Lori Heustess over 5 years ago in reply to George Mock

Hello Tjarco,

Make sure the floating-point rounding mode for the CLA is set to 1.

This is the RNDF32 bit in the CLA status register (MSTF), which is set with the MSETFLG instruction: RNDF32=1.

You can set this bit in your C code by using the __msetflg CLA intrinsic documented in the Compiler User's Guide (http://www.ti.com/lit/spru514).
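To illustrate what the rounding mode changes, here is a minimal host-side sketch that emulates one single-precision multiply under both settings: RNDF32=0 rounds toward zero (truncates) and RNDF32=1 rounds to nearest (even). The function names are hypothetical; this is not the CLA intrinsic API, and it assumes an IEEE-754 host running in its default round-to-nearest mode.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side emulation, not TI code.
 * The product of two floats has at most 48 significant bits, so it is
 * exact in double (53 bits); converting it to float rounds only once. */

/* RNDF32 = 1: round to nearest (even), the host's default mode. */
float mul_round_nearest(float a, float b)
{
    double exact = (double)a * (double)b;
    return (float)exact;
}

/* RNDF32 = 0: round toward zero (truncate).
 * Assumes finite inputs and a finite result. */
float mul_truncate(float a, float b)
{
    double exact = (double)a * (double)b;
    float r = (float)exact;  /* nearest neighbour of the exact product */

    /* If rounding moved the magnitude away from zero, step back one ulp.
     * For a nonzero finite float, decrementing the bit pattern reduces
     * the magnitude by one ulp regardless of sign. */
    if ((exact > 0.0 && r > exact) || (exact < 0.0 && r < exact)) {
        uint32_t bits;
        memcpy(&bits, &r, sizeof bits);
        bits -= 1u;
        memcpy(&r, &bits, sizeof r);
    }
    return r;
}
```

For example, mul_truncate(3.0f, 0.2f) and mul_round_nearest(3.0f, 0.2f) differ by one ulp, while an exact product such as 4.0f * 0.25f is the same in both modes. This suggests the setting matters mostly for scaling or averaging steps, not for sums of integer ADC values.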

Let me know if this resolves the issue.

Best Regards,

Lori
