Compiler/TMS320F28069: C execution speed for a=b*0.01 (divide by 100)

Terje Bohler1

Part Number: TMS320F28069

Tool/software: TI C/C++ Compiler

Hi.

We have a time-critical switched-mode power supply application and see that multiplying a variable (b) by 0.01 causes some kind of "CPU problem" for that application (an ISR that runs every 10 µs sometimes fails, I guess due to interrupt latency). How does the C compiler handle this multiplication by 0.01? Is it identical to a=b/100 (which I thought was a slower implementation)?

best regards

Terje Bøhler (I don't actually have access to the source code at the moment...)
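Whether `(uint16_t)(b * 0.01)` always equals `b / 100` for a 16-bit unsigned `b` can be checked exhaustively on a PC. The sketch below assumes the FPU32 multiply behaves like an IEEE-754 single-precision multiply (the disassembly later in the thread shows `0.01` compiled as a 32-bit float constant) and that the float-to-integer conversion truncates, as C requires. Two rounding behaviors are emulated in software because the result depends on them: `0.01f` is slightly smaller than 0.01, so a product such as 100 × 0.01f is exactly just below 1.0 and only becomes 1.0 if the multiply rounds to nearest. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Single-precision product rounded to nearest. The double product of two
   floats is exact (24 + 24 significand bits fit in a double's 53), so the
   conversion back to float is the only rounding step. */
static float mul_nearest(float x, float y)
{
    return (float)((double)x * (double)y);
}

/* Single-precision product rounded toward zero (non-negative inputs only). */
static float mul_toward_zero(float x, float y)
{
    assert(x >= 0.0f && y >= 0.0f);
    double exact = (double)x * (double)y;
    float r = (float)exact;                 /* rounded to nearest */
    if ((double)r > exact) {                /* it rounded up: step one ulp down */
        uint32_t bits;
        memcpy(&bits, &r, sizeof bits);
        bits -= 1u;                         /* valid because r > 0 here */
        memcpy(&r, &bits, sizeof r);
    }
    return r;
}

/* Hypothetical helper: count inputs b in 0..65535 for which
   (uint16_t)(b * 0.01f), computed with the given multiply, differs from b / 100. */
static unsigned count_mismatches(float (*mul)(float, float))
{
    unsigned mismatches = 0;
    for (uint32_t b = 0; b <= UINT16_MAX; ++b) {
        float product = mul((float)b, 0.01f);
        uint16_t via_float = (uint16_t)product;      /* C conversion truncates */
        uint16_t via_divide = (uint16_t)(b / 100u);
        if (via_float != via_divide) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Under these assumptions, round-to-nearest gives the same result as `b / 100` for all 65536 inputs, while round-toward-zero differs for every nonzero multiple of 100 (655 inputs, each one too small by 1). Which case applies on the target depends on the FPU rounding mode actually in effect, so checking a value such as b = 100 on the hardware would settle it.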

over 7 years ago

George Mock over 7 years ago


I presume you build with the compiler switch --float_support=fpu32. In that case, multiply by 0.01 is much faster than divide by 100.

If you do build with --float_support=fpu32, then I suspect some hardware configuration mistake causes your problem. To pursue that, you need help from the experts in the C2000 forum. You can either start a new thread there, or I can move this thread into that forum.

Thanks and regards,

-George

Archaeologist over 7 years ago


If variables a and b are integers, you probably aren't doing yourself any favors by introducing a float expression. What are the types of a and b?
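One integer-only option, sketched here under the assumption that `a` is a 16-bit unsigned value, is to replace the divide by a shift and a multiply by a precomputed reciprocal. Because floor(floor(a/4)/25) = floor(a/100), the code divides by 4 with a shift and then multiplies the 14-bit result by 20972 (about 2^19/25), which keeps the product within 32 bits. The function names and constants are worked out for this illustration only; whether this beats the FPU32 sequence on the F28069 would need to be measured.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical integer-only replacement for a / 100 on a 16-bit unsigned value.
   a / 100 == (a / 4) / 25, and for y < 2^14, y / 25 == (y * 20972) >> 19,
   because 20972 = ceil(2^19 / 25) and 20972 * 25 - 2^19 = 12 <= 2^(19 - 14).
   The largest product, 16383 * 20972, still fits in 32 bits. */
static uint16_t div100_u16(uint16_t a)
{
    uint32_t quarter = (uint32_t)a >> 2;            /* a / 4, at most 16383 */
    return (uint16_t)((quarter * 20972u) >> 19);    /* (a / 4) / 25 */
}

/* Exhaustive host-side check against the compiler's own integer division. */
static void check_div100_u16(void)
{
    for (uint32_t a = 0; a <= UINT16_MAX; ++a) {
        assert(div100_u16((uint16_t)a) == a / 100u);
    }
}
```

The exact-match argument is in the comment: the error term y * 12 / 2^19 stays below 1 for every y < 2^14, so the floor never rounds up past the true quotient.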

Terje Bohler1 over 7 years ago in reply to George Mock


Thank you very much, George (I'm not very experienced with the C2000, by the way).

We use the TMS320F28069, and here are some of the compiler options we use:

Processor Options:

Configuration: Debug [Active]

Summary flags set:

-v28 -ml -mt --cla_support=cla0 --float_support=fpu32 --vcu_support=vcu0 --

. . .

include_path="C:/SW_PS/SW_C68/DcDcCtrl1/DcDcCtrl/F2806x_BSP/IQmath/Include" --include_path="C:/SW_PS/SW_C68/DcDcCtrl1/DcDcCtrl/F2806x_BSP/PowerLib/asm" -g --define="_DEBUG" --define="__TARGET_CCS" --define="FLASH" --define="LARGE_MODEL" --diag_warning=225

Optimization:

Optimization level: off

Speed vs size trade-offs: 2

Allow reassociation of FP arithmetic: on

Floating Point mode (--fp_mode): strict

What I find very strange is that a simple calculation like the one above (a = 0.01 * a, where a is a Uint16) can sometimes have such damaging effects, apparently caused by a delay in entering the ISR, which runs every 5 µs (not every 10 µs as I stated earlier).

Q-1: How can a simple multiplication such as "a=a*0.1" affect the latency of the ("background PWM") ISR, which is executed every 5 µs (not 10 µs)?

Q-2: Is any "interrupt disable" executed as part of the "a=a*0.1" statement that could cause this ISR latency?

Q-3: Could there be other kinds of waits or delays associated with the "a=a*0.1" execution?
E.g. "atomic" DSP/MCU operations that block interrupts while waiting for float/fixed-point multiplication results?

Q-4: You say: "then I suspect some hardware configuration mistake causes your problem". What could that be? Is it possible to configure the hardware incorrectly in a way that causes this?

Q-5: What would the typical DSP execution time be for such an "a=a*0.1" operation (TCY is 60 MHz)?

I would really appreciate further comments on this (to me) very strange behavior.

Best Regards

Terje Bøhler

Chester Gillon over 7 years ago in reply to Terje Bohler1


Terje Bohler1 said:
Q-2: Is any "interrupt disable" executed as part of the "a=a*0.1" statement that could cause this ISR latency?
Q-3: Could there be other kinds of waits or delays associated with the "a=a*0.1" execution?
E.g. "atomic" DSP/MCU operations that block interrupts while waiting for float/fixed-point multiplication results?

Q-4: You say: "then I suspect some hardware configuration mistake causes your problem". What could that be? Is it possible to configure the hardware incorrectly in a way that causes this?

Q-5: What would the typical DSP execution time be for such an "a=a*0.1" operation (TCY is 60 MHz)?

You should be able to answer those questions by looking at the generated assembly and using the CCS profile clock.

E.g. I created a test project for a TMS320F28069, compiling with the following options:

"C:/ti_ccs7_0/ccsv7/tools/compiler/ti-cgt-c2000_16.9.1.LTS/bin/cl2000" -v28 -ml -mt --cla_support=cla0 --float_support=fpu32 --vcu_support=vcu0 --include_path="C:/ti_ccs7_0/ccsv7/tools/compiler/ti-cgt-c2000_16.9.1.LTS/include" -g --diag_warning=225 --diag_wrap=off --display_error_number -k --preproc_with_compile --preproc_dependency="main.d"  "../main.c"

One function performed a divide by 100 using a floating point multiply:

uint16_t double_scale (uint16_t a)
{
    a = a * 0.01;

    return a;
}

This compiled to the following instructions:

        double_scale():
008019:   FE02        ADDB         SP, #2
00801a:   9641        MOV          *-SP[1], AL
17          a = a * 0.01;
00801b:   E801E118    MOVIZ        R0, #0x3c23
00801d:   E2C40141    UI16TOF32    R1H, *-SP[1]
00801f:   E80EB850    MOVXI        R0H, #0xd70a
008021:   E7000040    MPYF32       R0H, R0H, R1H
008023:   7700        NOP          
008024:   E68E0000    F32TOUI16    R0H, R0H
008026:   7700        NOP          
008027:   7700        NOP          
008028:   BFA90F12    MOV32        @ACC, R0H
00802a:   9641        MOV          *-SP[1], AL
20      }
00802b:   FE82        SUBB         SP, #2
00802c:   0006        LRETR
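The two immediates in this listing are the halves of one single-precision constant: as I read the FPU32 instruction set, `MOVIZ` loads the upper 16 bits of R0H (0x3C23) and clears the lower half, and `MOVXI` then fills in the lower 16 bits (0xD70A). Joined, they give 0x3C23D70A, the IEEE-754 encoding of 0.01f. A host-side sketch of that decoding (the helper name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: rebuild the 32-bit pattern that MOVIZ (upper half,
   lower half cleared) and MOVXI (lower half) leave in R0H, and reinterpret it
   as an IEEE-754 single-precision float. */
static float float_from_halves(uint16_t high, uint16_t low)
{
    assert(sizeof(float) == sizeof(uint32_t));
    uint32_t bits = ((uint32_t)high << 16) | low;
    float value;
    memcpy(&value, &bits, sizeof value);   /* well-defined type pun */
    return value;
}
```

For example, `float_from_halves(0x3C23, 0xD70A)` compares equal to `0.01f`, which is consistent with the `MPYF32` that follows being a plain 32-bit float multiply.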

Another function performed an integer divide by 100:

uint16_t uint_scale (uint16_t a)
{
    a = a / 100;

    return a;
}

This compiled to the following instructions:

        uint_scale():
008041:   FE02        ADDB         SP, #2
008042:   9641        MOV          *-SP[1], AL
31          a = a / 100;
008043:   BE64        MOVB         XAR6, #0x64
008044:   0E41        MOVU         ACC, *-SP[1]
008045:   F60F        RPT          #15
008046:   1FA6     || SUBCU        ACC, @AR6
008047:   9641        MOV          *-SP[1], AL
34      }
008048:   FE82        SUBB         SP, #2
008049:   0006        LRETR
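The `RPT #15 || SUBCU ACC, @AR6` pair is a 16-step restoring division: `RPT #15` executes `SUBCU` 16 times, and each step shifts ACC left by one and, if the upper half is at least the divisor, subtracts it there and sets a quotient bit, leaving the quotient in AL and the remainder in AH. The C model below reflects my reading of the C28x instruction set description and should be checked against TI's documentation; the function names are hypothetical. It illustrates why the divide always takes the same 16 steps, whatever the operand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one SUBCU ACC, divisor step: shift ACC left by one;
   if the shifted upper half is at least the divisor, subtract the divisor
   from the upper half and set bit 0 (a quotient bit of 1). */
static uint32_t subcu_step(uint32_t acc, uint16_t divisor)
{
    int64_t temp = ((int64_t)acc << 1) - ((int64_t)divisor << 16);
    if (temp >= 0) {
        return (uint32_t)temp + 1u;
    }
    return acc << 1;
}

/* Model of MOVU ACC, dividend followed by RPT #15 || SUBCU ACC, divisor. */
static void subcu_divide(uint16_t dividend, uint16_t divisor,
                         uint16_t *quotient, uint16_t *remainder)
{
    assert(divisor != 0);
    uint32_t acc = dividend;              /* MOVU: AH = 0, AL = dividend */
    for (int i = 0; i < 16; ++i) {        /* RPT #15 runs the next instruction 16 times */
        acc = subcu_step(acc, divisor);
    }
    *quotient = (uint16_t)(acc & 0xFFFFu);   /* AL */
    *remainder = (uint16_t)(acc >> 16);      /* AH */
}
```

The listing only stores AL afterwards (`MOV *-SP[1], AL`), which matches the quotient ending up in the low half.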

I can't see any instructions which disable interrupts.

To measure the relative performance of the two functions, I used the profile clock in CCS to time the number of cycles taken to call each function 100 times.

The double_scale() function, which performs a fixed-to-float conversion, a floating-point multiply and then a float-to-fixed conversion, took 41 cycles per call on average.

The uint_scale() function, which performs an integer divide, took 51 cycles per call on average.

This shows that the floating point multiply is faster than the integer divide. You can convert the number of cycles to time based upon your clock frequency to get an estimate of the execution time.

I say estimate since:

a) The number of clock cycles I measured includes the overhead of a function call and loop.

b) In the double_scale() function, which only performs a = a * 0.01, the compiler has inserted three NOPs, presumably to allow for the processor pipeline. If the calculation were performed inside another function, the compiler might be able to schedule other useful instructions in place of the NOPs.

c) I didn't configure the number of flash wait states required for your TCY of 60 MHz. Executing code from SRAM rather than flash could be faster.
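Converting the measured cycle counts to time, as suggested above, is simple arithmetic. As a sketch at the 60 MHz clock from the question (and subject to the caveats just listed), 41 cycles is about 683 ns and 51 cycles is 850 ns, i.e. roughly 14 % and 17 % of the 300 cycles available in a 5 µs interrupt period. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a cycle count to nanoseconds at a given clock. */
static double cycles_to_ns(uint32_t cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return (double)cycles * 1e9 / clock_hz;
}

/* Fraction of one interrupt period consumed by the given number of cycles. */
static double fraction_of_period(uint32_t cycles, double clock_hz, double period_s)
{
    assert(period_s > 0.0);
    return cycles_to_ns(cycles, clock_hz) / (period_s * 1e9);
}
```

For example, `fraction_of_period(51, 60e6, 5e-6)` gives 0.17 (to within rounding).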

Terje Bohler1 over 7 years ago in reply to Chester Gillon


Thanks Chester.

Do you have any suggestions regarding George's comment "...suspect some hardware configuration mistake..."?

Is there any way a hardware misconfiguration could cause my symptoms?

And, to me, the generated code seems "fully interruptible" (by PWM or other interrupt sources...).

best regards

Terje Bøhler

Chester Gillon over 7 years ago in reply to Terje Bohler1


Terje Bohler1 said:
Do you have any suggestions regarding George's comment "...suspect some hardware configuration mistake..."?

I don't have any specific information on a hardware configuration mistake.

You mention that an interrupt is required to be serviced every 5 microseconds, while the CPU frequency is 60 MHz. This means that there is one interrupt to be serviced every 300 clocks. Do you have any measurements of how long the interrupt servicing takes before you add the problematic "a=b*0.01" calculation?

Maybe the addition of the "a=b*0.01" somehow stalls the servicing of the ISR long enough to cause a problem.

Also, how is the ISR serviced? E.g. are you using SYS/BIOS in the program?

There are some SYS/BIOS benchmarks for the ti.platforms.ezdsp28335 which report "Hwi dispatcher prolog" as 251 clocks and "Hwi dispatcher epilog" as 181 clocks. That is 432 clocks of dispatcher overhead alone, which means a SYS/BIOS Hwi wouldn't be able to service an interrupt every 300 clocks.
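The budget arithmetic can be written out as a quick check: 5 µs at 60 MHz leaves 300 cycles between interrupts, while the quoted dispatcher figures add up to 251 + 181 = 432 cycles before any ISR body runs. Those benchmark numbers are for a different device (ezdsp28335) and are used here only as example inputs; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles available between two interrupts: period [ns] * clock [MHz] / 1000. */
static uint32_t cycles_per_period(uint32_t period_ns, uint32_t clock_mhz)
{
    assert(clock_mhz != 0);
    return (uint32_t)(((uint64_t)period_ns * clock_mhz) / 1000u);
}

/* Hypothetical check: do dispatcher prolog + ISR body + epilog fit in one
   period? Whatever is left over is all the time the background code
   (e.g. a = b * 0.01) gets. */
static int isr_fits(uint32_t prolog, uint32_t body, uint32_t epilog, uint32_t budget)
{
    return prolog + body + epilog <= budget;
}
```

With these inputs, `isr_fits(251, 0, 181, cycles_per_period(5000, 60))` is false even for an empty ISR body.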
