TMS320F280025C: "if (A <= B)" consumes 19 system clocks

Part Number: TMS320F280025C
Other Parts Discussed in Thread: C2000WARE

Please use the attached test case, which runs directly on the 280025 demo board. CCS 11.1 with compiler version 21.6 is used.

It is found that:

if (VAC_rms_fil_DS3 <= Pr_VL1_DS3)    consumes 13 system clocks

++VAC_Pr_VL1_cnt_DS3;    consumes 12 system clocks

if (VAC_rms_fil_DS3 >= Pr_VH1_DS3)    consumes 19 system clocks

I tried changing the variable type from float32_t to float, and also tried replacing the variable with a constant (for example, if (VAC_rms_fil_DS3 <= 574.6)), but the execution time stays the same.

Please advise why a simple comparison takes 13 or 19 system clocks to execute. Thanks!

  • The testcase is attached above

  • Thank you for submitting a test case.  To explore ...

    if (VAC_rms_fil_DS3 >= Pr_VH1_DS3)   consumes 19 system clocks

    I added the build option --src_interlist.  This causes the compiler to keep the auto-generated assembly file, instead of deleting it. It has the same name as the source file, with the file extension changed to .asm.  Comments are added to make it easier to understand.  Inspecting that file shows ...

    ;----------------------------------------------------------------------
    ; 129 | if (VAC_rms_fil_DS3 >= Pr_VH1_DS3)                                     
    ;----------------------------------------------------------------------
            MOV32     R0H,@||Pr_VH1_DS3||   ; [CPU_FPU] |129| 
            MOV32     R1H,@||VAC_rms_fil_DS3|| ; [CPU_FPU] |129| 
            CMPF32    R1H,R0H               ; [CPU_FPU] |129| 
            MOVST0    ZF, NF                ; [CPU_FPU] |129| 
            B         ||$C$L20||,LT         ; [CPU_ALU] |129| 

    I cannot explain why the MOVST0 instruction is needed.  But I'm sure it is necessary.  As for the rest of the instructions, there simply is no other way for the compiler to evaluate this expression.

    The extra cycles are likely due to memory wait states, or how a conditional branch affects the pipeline, etc.  I am not an expert on those details of the C28x CPU.  Therefore, I will notify those experts about this thread.

    Thanks and regards,

    -George

  • Hi George,

    Thanks a lot for your reply. Please see below:

    1. Could you elaborate on how to add the build option --src_interlist to keep the .asm file?

    2. What can I do to shorten the execution time?

    Quentin

  • Quentin,

    MOV32, CMPF32, and MOVST0 are single-cycle instructions (spruhs1c).

    B is a 7-cycle instruction if the branch is taken and a 4-cycle instruction if it is not taken (spru430).

    So, given the assembly snippet George pasted, the sequence takes 4 + 7 = 11 cycles if the branch is taken and 4 + 4 = 8 cycles if it is not.

    There is no way to shorten this. We have 2 loads, a compare, and a branch.
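
    As a rough sketch, the arithmetic above can be written as a small C helper. The function name estimate_cycles and the enum names are hypothetical; the counts are the ones quoted above from spruhs1c and spru430, and the estimate ignores memory wait states and any measurement overhead.

    ```c
    #include <stdint.h>

    /* Ideal cycle counts as quoted in this thread (spruhs1c, spru430). */
    enum {
        CYCLES_SINGLE      = 1, /* MOV32, CMPF32, MOVST0 */
        CYCLES_B_TAKEN     = 7, /* B when the branch is taken */
        CYCLES_B_NOT_TAKEN = 4  /* B when the branch is not taken */
    };

    /* Hypothetical helper: ideal cycle estimate for n_single single-cycle
     * instructions followed by one conditional branch (B).
     * Wait states and pipeline effects are not modeled. */
    uint32_t estimate_cycles(uint32_t n_single, int branch_taken)
    {
        uint32_t branch = branch_taken ? CYCLES_B_TAKEN : CYCLES_B_NOT_TAKEN;
        return n_single * CYCLES_SINGLE + branch;
    }
    ```

    For the snippet George posted, estimate_cycles(4, 1) gives 11 and estimate_cycles(4, 0) gives 8, so the instruction counts by themselves do not account for the 19 clocks reported.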

  • Hi Sira,

    Thanks for your reply. 

    1. Could you elaborate on how to add the build option --src_interlist to keep the .asm file? George mentioned this in his post.
    2. I would like to change the compiler optimization level to see whether it shortens the execution time. What is the difference between "3 - Interprocedure Optimizations" and "4 - Whole Program Optimizations"?

    Thanks,

    Quentin

  • Hi Sira,

    I am following up on the same issue in this thread. I am now debugging this case with Quentin, and we are seeing a strange phenomenon. We added some code to a C2000Ware sample project (similar to Quentin's test case project). The first time we tested it, everything was fine: "if A <= B" consumed 12 system clocks. However, after we clicked "reset device" and then "restart" in CCS, the same "if A <= B" took 69 cycles, far more than in the first test.

    Quentin will send a screen capture of the test later. Could you help analyze this? Thanks!

    Regards,

    Julia

  • elaborate on "how to add the build option --src_interlist "  to keep .asm file?

    Please see the article Finding Compiler Options in CCS.  It discusses how to find compiler options in the CCS build dialog.  The running example uses --src_interlist.

    I would like to change compiler optimization level to see if it can shorten the execution time.

    Under most circumstances, adding --opt_level=3 or --opt_level=4 improves performance.  However, for this specific compare expression, those options do not affect the generated code.

    Thanks and regards,

    -George

  • Hi George,

    Thanks for pointing me to the article.

    Quentin

  • Hi George and Sira,

    As Julia described, CCS shows that a simple "if (VAC_rms_fil_DS3 <= Pr_VL1_DS3)" sometimes consumes 12 system clocks but at other times consumes 69 system clocks. Please see the attached screen capture and advise.

    Thanks,

    Quentin

  • Quentin, Julia,

    The number should be consistent across measurements if the same code is being profiled. The disparity probably means there is some variance in the CCS clock after a device reset in CCS. The CCS team would have to comment on this.
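
    One possible way to cross-check the CCS clock is to time the code on the target itself, for example by reading a free-running CPU timer before and after the statement. The following is a minimal sketch of the arithmetic only, not a driver. It assumes a 32-bit down-counting timer (the C28x CPU timers count down) that runs at the system clock with no prescaling and uses the full 32-bit period. The names elapsed_ticks and net_cycles are hypothetical, and reading the actual timer register is left to the application.

    ```c
    #include <stdint.h>

    /* Ticks elapsed between two reads of a free-running 32-bit
     * down-counter. Because the counter counts down, the elapsed time is
     * start - end; unsigned subtraction also covers one wrap through zero
     * (assumes the timer period is the full 32-bit range). */
    uint32_t elapsed_ticks(uint32_t start, uint32_t end)
    {
        return (uint32_t)(start - end);
    }

    /* Subtract a fixed measurement overhead (for example, the count
     * obtained from two back-to-back reads with no code in between).
     * The result is clamped at zero. */
    uint32_t net_cycles(uint32_t start, uint32_t end, uint32_t overhead)
    {
        uint32_t raw = elapsed_ticks(start, end);
        return (raw > overhead) ? (raw - overhead) : 0u;
    }
    ```

    If the two reads bracket the if statement and overhead is the count measured with nothing between the reads, net_cycles approximates the cost of the statement alone.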

    Side note: with optimization enabled, you generally cannot place breakpoints at arbitrary points in your C code.

    Thanks,

    Sira