Compiler/TMS320F280049: CLA MUI16TOF32 vs CPU UI16TOF32

Part Number: TMS320F280049
Other Parts Discussed in Thread: TMS320F28377D

Tool/software: TI C/C++ Compiler

Dear readers,

We have done some optimization and noticed different behaviour between the CLA and the CPU.

Original code:

x = (unsigned int) AdcaResultRegs.ADCRESULT0 + (unsigned int) AdcaResultRegs.ADCRESULT1 + (unsigned int) AdcaResultRegs.ADCRESULT2 + (unsigned int) AdcaResultRegs.ADCRESULT3 + (unsigned int) AdcaResultRegs.ADCRESULT4;

        MMOVZ16   MR0,@_AdcbResultRegs+6 ; [CPU_FPU] |209| 
        MLSL32    MR0,#16               ; [CPU_FPU] |209| 
        MLSR32    MR0,#16               ; [CPU_FPU] |209| 
        MMOVZ16   MR1,@_AdcbResultRegs+8 ; [CPU_FPU] |209| 
        MLSL32    MR1,#16               ; [CPU_FPU] |209| 
        MLSR32    MR1,#16               ; [CPU_FPU] |209| 
        MADD32    MR1,MR1,MR0           ; [CPU_FPU] |209| 
        MMOVZ16   MR0,@_AdcbResultRegs+9 ; [CPU_FPU] |209| 
        MLSL32    MR0,#16               ; [CPU_FPU] |209| 
        MLSR32    MR0,#16               ; [CPU_FPU] |209| 
        MADD32    MR1,MR0,MR1           ; [CPU_FPU] |209| 
        MMOVZ16   MR0,@_AdcbResultRegs+10 ; [CPU_FPU] |209| 
        MLSL32    MR0,#16               ; [CPU_FPU] |209| 
        MLSR32    MR0,#16               ; [CPU_FPU] |209| 
        MADD32    MR1,MR0,MR1           ; [CPU_FPU] |209| 
        MMOVZ16   MR0,@_AdcbResultRegs+11 ; [CPU_FPU] |209| 
        MLSL32    MR0,#16               ; [CPU_FPU] |209| 
        MLSR32    MR0,#16               ; [CPU_FPU] |209| 
        MADD32    MR1,MR0,MR1           ; [CPU_FPU] |209| 
        MMOVZ16   MR0,@_AdcbResultRegs+13 ; [CPU_FPU] |209| 
        MLSL32    MR0,#16               ; [CPU_FPU] |209| 
        MLSR32    MR0,#16               ; [CPU_FPU] |209| 
        MADD32    MR1,MR0,MR1           ; [CPU_FPU] |209| 
        MMOVZ16   MR0,@_AdcbResultRegs+14 ; [CPU_FPU] |209| 
        MLSL32    MR0,#16               ; [CPU_FPU] |209| 
        MLSR32    MR0,#16               ; [CPU_FPU] |209| 
        MADD32    MR1,MR0,MR1           ; [CPU_FPU] |209| 
        MMOVZ16   MR0,@_AdcbResultRegs+15 ; [CPU_FPU] |209| 
        MLSL32    MR0,#16               ; [CPU_FPU] |209| 
        MLSR32    MR0,#16               ; [CPU_FPU] |209| 
        MADD32    MR0,MR0,MR1           ; [CPU_FPU] |209| 
        MUI32TOF32 MR0,MR0              ; [CPU_FPU] |209|

Optimized code, which uses single-cycle MUI16TOF32 instructions (without MNOPs):

x = (float) AdcaResultRegs.ADCRESULT0 + (float) AdcaResultRegs.ADCRESULT1 + (float) AdcaResultRegs.ADCRESULT2 + (float) AdcaResultRegs.ADCRESULT3 + (float) AdcaResultRegs.ADCRESULT4;

        MUI16TOF32 MR0,@_AdcbResultRegs ; [CPU_FPU] |199| 
        MUI16TOF32 MR1,@_AdcbResultRegs+1 ; [CPU_FPU] |199| 
        MADDF32   MR0,MR1,MR0           ; [CPU_FPU] |199| 
        MUI16TOF32 MR1,@_AdcbResultRegs+2 ; [CPU_FPU] |199| 
        MADDF32   MR0,MR1,MR0           ; [CPU_FPU] |199| 
        MMOVD32   MR3,@_CLimErr_p       ; [CPU_FPU] |195| 
        MUI16TOF32 MR1,@_AdcbResultRegs+3 ; [CPU_FPU] |199| 
        MADDF32   MR0,MR1,MR0           ; [CPU_FPU] |199| 
        MUI16TOF32 MR1,@_AdcbResultRegs+4 ; [CPU_FPU] |199| 
        MADDF32   MR0,MR1,MR0           ; [CPU_FPU] |199| 
        MMOV32    @_x,MR0               ; [CPU_FPU] |199| 

In our real-time control system, we notice that the measured value is "different" on fast-changing signals when using the (float) casts, which result in MUI16TOF32 instructions.

We tried a newer compiler and noticed that Version 12.12.5LTS has a workaround, "CPU-to-FPU register write (fpu1 errata)", that adds 5x NOPs for UI16TOF32 instructions.

Should this also be the case for CLA-to-FPU instruction MUI16TOF32? This could explain why the optimized float accumulation result (averaging) is different.
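
As a side check on the arithmetic alone, here is a minimal host-side sketch in C comparing the two accumulation orders. It assumes the readings are plain 16-bit unsigned values and that only a few of them are summed, so every intermediate sum stays below 2^24 and is exactly representable in single precision. The helper names are hypothetical and are not taken from the project above. The sketch says nothing about CLA pipeline timing or about when each result register is sampled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen each 16-bit reading to 32 bits, add as
   integers, and convert to float once at the end. */
static float sum_int_then_convert(const uint16_t *r, int n)
{
    uint32_t acc = 0;
    for (int i = 0; i < n; ++i)
        acc += (uint32_t)r[i];
    return (float)acc;
}

/* Hypothetical helper: convert each reading to float first, then add
   in single precision. */
static float sum_convert_then_add(const uint16_t *r, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc += (float)r[i];
    return acc;
}

/* Self-check: both orders give the same value for five 16-bit inputs,
   including the largest possible sum (5 * 65535 = 327675). */
static void check_paths_agree(void)
{
    const uint16_t worst[5] = {65535u, 65535u, 65535u, 65535u, 65535u};
    const uint16_t mixed[5] = {0u, 1u, 4095u, 32768u, 65535u};

    assert(sum_int_then_convert(worst, 5) == 327675.0f);
    assert(sum_convert_then_add(worst, 5) == 327675.0f);
    assert(sum_int_then_convert(mixed, 5) == sum_convert_then_add(mixed, 5));
}
```

If the two orders agree like this on paper, a difference seen on the target would more likely come from something other than the additions themselves, for example the conversion step or the moment at which each result register is read. That is an assumption to verify, not a conclusion.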

Best regards,

  • Tjarco Boerkoel said:
    We tried a new compiler and notice that in Version 12.12.5LTS is a workaround "CPU-to-FPU register write (fpu1 errata)" that adds 5x NOPS for UI16TOF32 instructions.

    I presume you mean compiler version 18.12.5.LTS.  I presume you are referring to an entry that is filed in our system for tracking issues.  But I cannot find it.  What is the ID of this issue?  It will look similar to CODEGEN-6020.

    Thanks and regards,

    -George

    Hi,

    Thank you for your reply.

    Yes indeed, sorry, it's compiler 18.12.5LTS. It's an Advanced Option in my C2000 compiler configuration under 'Runtime Model Options'.

    We are using the TMS320F28377D microcontroller.

    Best regards

  • I presume you get this information ...

    Tjarco Boerkoel said:
    in Version 12.12.5LTS is a workaround "CPU-to-FPU register write (fpu1 errata)" that adds 5x NOPS for UI16TOF32 instructions.

    ... from a document published by TI.  I cannot find this document.  Please give me the link to this document, or attach it to your next post, or something similar.

    Thanks and regards,

    -George

    Hi,

    No document related.

    Best regards,

    [edit]

    There really is something going on here when optimizing for CLA1:

    float x = (long) ADCA.result1 + (long) ADCA.result2 +(long) ADCA.result3 +(long) ADCA.result4;

    into

    float x = (float) ADCA.result1 + (float) ADCA.result2 +(float) ADCA.result3 +(float) ADCA.result4;

    I get different behavior in my controller's response. Is there a pipeline issue for the CLA when using UI16TOF32, just as described for cpu->fpu1?

  • I apologize for the delay. 

    The compiler switch --silicon_errata_fpu1_workaround is related to interactions between the CPU and the FPU, and not the CLA.

    Tjarco Boerkoel said:
    Is there a pipeline issue for CLA when using UI16TOF32 just as described for cpu->fpu1?

    I don't know.  I'll notify the C28x experts about this thread.

    Thanks and regards,

    -George

  • Hello Tjarco,

    Make sure the floating-point rounding mode for the CLA is set to 1.

    This is the RNDF32 bit in the CLA status register (MSTF), which is set with the MSETFLG instruction: RNDF32=1

    You can set this bit in your C Code by using the __msetflg CLA intrinsic documented in the Compiler User's Guide (http://www.ti.com/lit/spru514)

    Let me know if this resolves the issue. 

    Best Regards,

    Lori