Hi Expert
The customer is developing an OBC DC-DC converter with our F280039. The MCU controls both the PFC and the CLLLC stages of the OBC.
The PFC control frequency is 30 kHz, which gives a 33.3 us control period. The customer found that when the control ISR runs on the CPU, the PWM CMP values cannot be updated within 33.3 us (the ISR actually takes ~36.5 us). The CLA finishes the same ISR in ~27 us.
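The sketch below is a quick check of these numbers. It converts the 30 kHz control rate into a time budget and a cycle budget. It assumes the F280039 runs at its 120 MHz maximum SYSCLK, so adjust `SYSCLK_HZ` if the customer's clock setup differs. The function names are hypothetical.

```c
#include <stdbool.h>

/* Assumptions: F280039 at its 120 MHz maximum SYSCLK; PFC ISR at 30 kHz. */
#define SYSCLK_HZ  120000000.0f
#define PFC_ISR_HZ 30000.0f

/* Time between two PFC interrupts, in microseconds (about 33.3 us). */
static float isr_period_us(void)
{
    return 1.0e6f / PFC_ISR_HZ;
}

/* The same budget expressed in CPU clock cycles. */
static float isr_budget_cycles(void)
{
    return SYSCLK_HZ / PFC_ISR_HZ;
}

/* True if an ISR of the given duration finishes before the next interrupt. */
static bool isr_fits(float isr_time_us)
{
    return isr_time_us < isr_period_us();
}
```

Under these assumptions the budget is 4000 cycles. The measured 36.5 us CPU ISR works out to about 4380 cycles, roughly 380 cycles over the budget.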
We checked the following:
Processor Options: FPU and TMU support are enabled.
Optimization settings: the floating-point mode is relaxed (--fp_mode=relaxed), which allows the compiler to replace the math.h sin/cos calls with the TMU __sin/__cos intrinsics.
We also compared the generated assembly for the CPU and for the CLA.
The control ISR contains only if-else branches, "+", "-", "/", sin and cos.
All floating-point constants are written with the f suffix; for example, 2.0 is written as 2.0f.
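The f suffix matters because an unsuffixed constant such as 2.0 has type double. An expression like x * 2.0 is therefore evaluated in double precision. On C28x builds using EABI, double is 64 bits wide, and a 32-bit-only FPU does not accelerate it, so such expressions can fall back to slower software routines. The same rule applies to math.h: sin() works on double, while sinf() is the float version. The host-portable sketch below shows the promotion. The function names are hypothetical.

```c
/* 2.0 is a double constant: x is promoted, the multiply is done in
 * double precision, and the result is converted back to float. */
static float scale_double_literal(float x)
{
    return (float)(x * 2.0);
}

/* Every operand is float, so the multiply stays in single precision. */
static float scale_float_literal(float x)
{
    return x * 2.0f;
}
```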
Are there any other actions I can take to optimize the code?
Thanks
Joe
Hi,
Our expert on this topic is out of office for two weeks. Is it ok to wait for his return?
Thanks,
Ben Collier
Hi Ben
I think the expert might be Sira? That is OK. I have some workarounds, such as removing some if-else branches, but I still need expert help to find the root cause.
I will follow up later.
Thanks
Joe
Hi Ben
It seems that Sira is back in the office. Could you please reassign the E2E thread to him?
Thanks
Joe
Hi Joe,
What optimization level is specified for the project?
Have you checked whether the divisions on the C28x side are using the TMU?
The C28x should be faster than (or as fast as) the CLA for the operations you have described.
Can they share the code?
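On the divisions: with relaxed floating-point mode the compiler can use the TMU for division. When the divisor is a constant, though, multiplying by a precomputed reciprocal is often cheaper still. The compiler may already do this under relaxed mode, but writing it explicitly makes the result independent of that option. The sketch below shows the pattern. The names and the 400.0f scale factor are hypothetical. The result can differ from true division in the last bit, which is usually acceptable in a control loop but is worth checking.

```c
/* Hypothetical example: normalize a bus-voltage reading by a constant. */
#define VBUS_FULL_SCALE     400.0f
#define INV_VBUS_FULL_SCALE (1.0f / VBUS_FULL_SCALE) /* constant expression */

/* Performs a division on every call. */
static float normalize_div(float vbus)
{
    return vbus / VBUS_FULL_SCALE;
}

/* Multiplies by the precomputed reciprocal instead. */
static float normalize_mul(float vbus)
{
    return vbus * INV_VBUS_FULL_SCALE;
}
```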
Thanks,
Sira
Hi Sira
Sorry for the delay. I do not have an F280039 LaunchPad at the moment, so I ran the following tests on an F280049 LaunchPad.
The base projects are timer_ex1_cputimers and cla_sin.
Optimization settings: --opt_level=off, --fp_mode=relaxed.
Test 1: In timer_ex1_cputimers, I added the loop below to compute sinf() 512 times and used CPU Timer 0 to measure the elapsed time. Uncommenting the outer loop gives 5120 calls.
// for (i = 0; i < 10; i++)
// {
    for (j = 0; j < 512; j++)
    {
        sinf(test_input[j]);
    }
// }
Test result 1:
CPU: 17984 cycles for 512 sinf() calls.
CPU: 179456 cycles for 5120 sinf() calls.
Test 2: In cla_sin, I added the same loop to CLAtask1. CPUTimer_startTimer() is placed before Cla1ForceTask1andWait(), and CPUTimer_stopTimer() is placed after WAITSTEP.
Test result 2:
CLA: 8011 cycles for 512 sinf() calls.
CLA: 77416 cycles for 5120 sinf() calls.
Test 3: In cla_sin, I used the loop below in CLAtask1. CPUTimer_startTimer() is placed before CLA_forceTasks(), and CPUTimer_stopTimer() is the first line of cla1Isr1().
// for (i = 0; i < 10; i++)
// {
    for (j = 0; j < 512; j++)
    {
        x = i + j;
        //CLAsin(test_input[j]);
    }
// }
Test result 3:
CLA: 10319 cycles for 512 additions.
CLA: 102791 cycles for 5120 additions (the CPU takes 77010 cycles for the same loop).
These tests show that sinf() runs faster on the CLA than on the CPU.
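Normalizing these results to cycles per call makes the comparison easier to read. In Tests 1 and 2, the CPU needs about 179456 / 77416 ≈ 2.3 times as many cycles per sinf() call as the CLA. For the plain addition loop in Test 3, the CPU is the faster of the two. This suggests the gap comes from how sinf() is compiled for the CPU, not from raw CPU speed.

```latex
\begin{aligned}
\text{Test 1, CPU, sinf()}: &\quad 17984/512 \approx 35.1, \quad 179456/5120 = 35.05 \ \text{cycles per call}\\
\text{Test 2, CLA, sinf()}: &\quad 8011/512 \approx 15.6, \quad 77416/5120 \approx 15.1 \ \text{cycles per call}\\
\text{Test 3, CLA, addition}: &\quad 10319/512 \approx 20.2, \quad 102791/5120 \approx 20.1 \ \text{cycles per iteration}\\
\text{Test 3, CPU, addition}: &\quad 77010/5120 \approx 15.0 \ \text{cycles per iteration}
\end{aligned}
```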
So my question narrows down to this: how can I make sinf() run more efficiently on the CPU?
Thanks
Joe
Joe,
Looking at your optimization settings, fp_mode = relaxed, which is good. However, the optimization level (--opt_level) is off. This prevents TMU instructions from being generated. Please change it to at least 2.
Thanks,
Sira
Hi Sira
Thanks for your guidance. I changed the optimization level (--opt_level) to 2, which reduced the CPU time dramatically.
Test result: 97485 Timer 0 cycles for 5120 sinf() calls with --opt_level=2, versus 179456 with --opt_level=off.
However, the compiler removed my test code at first. The results only became reasonable after I declared the variables volatile. So I am concerned that if the customer switches a mature project directly to --opt_level=2, its behavior may change unexpectedly.
Support needs:
Could you please suggest what users need to check when changing --opt_level from off to 2?
Alternatively, can users leave --opt_level off for the project as a whole and enable it only for the ISR?
Thanks
Joe
Hi Joe,
Adding volatile is going to make the compiler take unnecessary extra cycles because it will always need to read/write the variables from memory, and cannot perform register optimizations.
We strongly recommend against using volatile unless it is absolutely needed, e.g. for peripheral registers, or for variables that are modified in more than one context (multiple ISRs, or an ISR and the background loop).
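Here is the case where volatile is needed. A flag written in an ISR and polled in the background loop must be volatile; otherwise an optimizing compiler may keep the flag in a register, and the loop never sees the update. Variables used in only one context should stay non-volatile. In this sketch the names are hypothetical, and the interrupt declaration and vector registration are omitted so it builds on a host. pfc_isr() stands in for the real handler.

```c
#include <stdbool.h>

/* Shared between the ISR and the background loop, so they are volatile. */
static volatile bool  g_new_sample = false;
static volatile float g_vbus_filtered = 0.0f;

/* Hypothetical ISR body. The local 'acc' is not volatile, so the
 * compiler may keep it in a register. */
static void pfc_isr(float vbus_sample)
{
    float acc = 0.9f * g_vbus_filtered + 0.1f * vbus_sample;
    g_vbus_filtered = acc;
    g_new_sample = true;
}

/* Background-loop step: consumes the flag set by the ISR, if any. */
static bool background_poll(float *out)
{
    if (!g_new_sample) {
        return false;
    }
    g_new_sample = false;
    *out = g_vbus_filtered;
    return true;
}
```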
We will always recommend that optimization be enabled, at least at level 2, and for the entire project.
For your specific issue, instead of using volatile, you should make sure the results of the operations are actually "used". One way to do that is to accumulate each sinf() result into a global variable, and then print that variable outside the benchmarking section.
Thanks,
Sira
Hi Sira
Thanks for your reply. After changing the benchmark code as you suggested, 5120 sinf() calls take 71782 Timer 0 counts on the CPU and 77416 Timer 0 counts on the CLA. That is consistent with your statement:
The C28x should be faster than (or as fast as) the CLA for the operations you have described.
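For reference, here are the 5120-call results normalized per call. Moving the CPU from --opt_level=off to --opt_level=2 and switching to the global accumulator gives an overall gain of about 2.5x.

```latex
\begin{aligned}
\text{CPU, opt level 2, volatile variables}: &\quad 97485/5120 \approx 19.0 \ \text{cycles per call}\\
\text{CPU, opt level 2, global accumulator}: &\quad 71782/5120 \approx 14.0 \ \text{cycles per call}\\
\text{CLA}: &\quad 77416/5120 \approx 15.1 \ \text{cycles per call}\\
\text{CPU speed-up, opt off} \to \text{opt 2}: &\quad 179456/71782 \approx 2.5
\end{aligned}
```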
Best regards
Joe
Hi Sira
One more question:
"the optimization level (--opt_level) is off. This prevents TMU instructions from being generated. Please change it to at least 2."
Is there any documentation for this? I cannot find a statement in SPRU514Z sections 3.1 or 3.16 saying that --opt_level=off prevents TMU instructions from being generated.
Thanks
Joe