Compiler/TMS320F28075: Execution Time Difference on F28075 with Different Compilers

Ricky Zhang

Part Number: TMS320F28075

Tool/software: TI C/C++ Compiler

Hi, Champs,

My customer sees different execution times on the F28075 for the same code when it is built with different compiler versions.

They previously used the F28234 with CGT v6.1.0 and have now moved to the F28075 with the latest CGT v16.9.1.LTS. With the same code running from SARAM at the same 120 MHz clock, they found a large difference in execution time. They then tried v6.1.0 on the F28075 as well and found that the difference follows the compiler version:

v6.1.0 3.49us
10869h - 10719h = 150h = 336

v16.9.1.LTS 4.83us
10d20h - 10b8ch = 194h = 404
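
As a cross-check of the figures above, the following sketch (Python, not part of the original post) recomputes the two address differences and compares how much they grow with how much the measured time grows. It assumes the hex values are the start and end addresses of the function in the disassembly, so that each difference is a code span in 16-bit words; the helper name span_words is hypothetical.

```python
def span_words(start_hex: str, end_hex: str) -> int:
    """Return end - start for two addresses written like '10719h'."""
    def to_int(text: str) -> int:
        return int(text.rstrip("hH"), 16)
    return to_int(end_hex) - to_int(start_hex)


# Address pairs and times exactly as quoted in the post.
old_span = span_words("10719h", "10869h")   # v6.1.0
new_span = span_words("10b8ch", "10d20h")   # v16.9.1.LTS

size_growth = new_span / old_span           # growth of the address span
time_growth = 4.83 / 3.49                   # growth of the measured time

print(old_span, new_span)
print(f"span grows x{size_growth:.2f}, time grows x{time_growth:.2f}")
```

Under that assumption the span grows by roughly 20% while the measured time grows by roughly 38%, so the two figures do not scale together.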

I have attached a snapshot of the source code, the linker command file, the array defined in a C++ class, the address assigned to the array in memory (identical in both cases), and, for both cases, the compiler console output, the disassembly copied from the view, and the map file. Could you please take a look and advise why this happens? (We can see that the disassembly differs between the two builds.)

F28075 Compiler.7z

Best Regards,

Ricky Zhang

over 7 years ago

George Mock over 7 years ago

All of the code is built with --opt_level=off. (Your build uses the equivalent -Ooff). The compiler development team is not concerned about the performance of code built with --opt_level=off. Among other things, there is no tracking of performance differences between versions under --opt_level=off. I am not surprised there is a difference, or even a worse difference.

If performance is important, then build with at least --opt_level=2. If you have a reason to not build with optimization, what is it?

Thanks and regards,

-George

Ricky Zhang over 7 years ago in reply to George Mock

Oh... then I'm sorry to say that the original setting in the customer's system is -O4, with the same result.

Using --opt_level=off here is just to show that you get the same "bad" performance, so it has nothing to do with optimization, and we don't need to spend extra effort on "no optimization".

By the way, with optimization off, shouldn't you get identical disassembly no matter which compiler version you use?

Can you test the code on your side with a simplified project, or simply analyze the disassembly, or do you need a test case from the customer?

George Mock over 7 years ago in reply to Ricky Zhang

Ricky Zhang said:
Oh... then I'm sorry to say that the original setting in the customer's system is -O4, with the same result.

We very much want to figure that out. More on that below.

Ricky Zhang said:
Using --opt_level=off here is just to show that you get the same "bad" performance, so it has nothing to do with optimization, and we don't need to spend extra effort on "no optimization".

I understand. It is just that we don't analyze performance problems in that fashion.

Ricky Zhang said:
By the way, with optimization off, shouldn't you get identical disassembly no matter which compiler version you use?

They will often be similar. But not identical. Differences like the one experienced here are not common, but they are not surprising either.

Ricky Zhang said:
Can you test the code on your side with a simplified project, or simply analyze the disassembly, or do you need a test case from the customer?

We need a test case. I presume the performance difference is seen in one function. Please preprocess the source file which contains the function, and attach that to the next post. Indicate the name of that function. Show all the build options exactly as the compiler sees them, and indicate the version of the compiler.

Thanks and regards,

-George

Ricky Zhang over 7 years ago in reply to George Mock

The function name is Dat_Int_InvCurrPQCalc().

It is defined in the Inverter.cpp source file under folder 28075_TestTime\Source\App.

It is called by the EPWM_INT_ISR() function in the CtrlISR.cpp source file under folder 28075_TestTime\Source\Scheduler, which is included in the MainProcedure.cpp source file under folder 28075_TestTime\Source via #include "Scheduler\CtrlISR.cpp".

It was built with both the v6.1.0 and v16.9.0.LTS compilers; the console output for both cases is attached.

GPIO43 is used to measure the execution time in hardware.

Preprocess.7z

Ricky Zhang over 7 years ago in reply to George Mock

In case it is difficult to reproduce this issue, or you need the entire project for further investigation, I have attached the project and .map files for reference.

All builds use -Ooff; you can change it to -O4 to test as well.

Project Build.7z

George Mock over 7 years ago in reply to Ricky Zhang

My investigation of this problem is limited to inspecting the assembly code generated by the compiler. I look at how many instructions are generated, and the sorts of operations those instructions perform. These factors influence how many CPU cycles are needed to execute the function. Based on that, I see no reason to expect a large difference in the number of CPU cycles needed for this function, as generated by the 6.1.0 and 16.9.0.LTS compilers.

I use the term CPU cycles deliberately. That is all the compiler can influence. The compiler cannot do anything about cycles lost to system effects like memory wait states, or stalls of some kind. But perhaps something like that is the reason for the difference. Unfortunately, I am not an expert on system effects.
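
The distinction between CPU cycles and system effects can be made concrete with a small model. The sketch below (Python) is an illustration under stated assumptions, not the analysis performed in this thread: it treats a measured duration as (instruction cycles + stall cycles) / SYSCLK and reports how many cycles are left over once an instruction-level estimate is subtracted. The function names and the estimate of 400 cycles are hypothetical; only the 120 MHz clock and the 3.49 us and 4.83 us measurements come from the thread.

```python
def measured_cycles(duration_us: float, sysclk_mhz: float = 120.0) -> float:
    """Total SYSCLK cycles covered by a measured duration."""
    return duration_us * sysclk_mhz


def unexplained_cycles(duration_us: float, estimated_cpu_cycles: float,
                       sysclk_mhz: float = 120.0) -> float:
    """Cycles not covered by the instruction-level estimate, for example
    memory wait states, pipeline stalls, or measurement overhead."""
    return measured_cycles(duration_us, sysclk_mhz) - estimated_cpu_cycles


# Gap between the two builds, in cycles, from the measurements in the thread.
gap = measured_cycles(4.83) - measured_cycles(3.49)

# HYPOTHETICAL estimate: suppose counting instructions gave about 400 cycles
# for both builds. This figure is NOT from the thread.
leftover_old = unexplained_cycles(3.49, 400)
leftover_new = unexplained_cycles(4.83, 400)

print(f"gap between builds: {gap:.1f} cycles")
print(f"unexplained: {leftover_old:.1f} vs {leftover_new:.1f} cycles")
```

If the instruction-level estimates for the two builds really are close, a gap of about 160 cycles would have to come from something other than instruction count, which is the kind of system effect described above.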

Thanks and regards,

-George

Ricky Zhang over 7 years ago in reply to George Mock

George Mock said:

My investigation of this problem is limited to inspecting the assembly code generated by the compiler. I look at how many instructions are generated, and the sorts of operations those instructions perform. These factors influence how many CPU cycles are needed to execute the function. Based on that, I see no reason to expect a large difference in the number of CPU cycles needed for this function, as generated by the 6.1.0 and 16.9.0.LTS compilers.

I think I already provided this information in my original post, which includes the assembly code and the differences when the F28075 runs at 120 MHz:

v6.1.0 3.49us
10869h - 10719h = 150h = 336

v16.9.1.LTS 4.83us
10d20h - 10b8ch = 194h = 404

So can you let me know exactly how many CPU cycles you see in both cases?

And why does the assembly code generated by the two compilers differ?

As you can see, no build options differ; the customer changed only the compiler.

George Mock said:

I use the term CPU cycles deliberately. That is all the compiler can influence. The compiler cannot do anything about cycles lost to system effects like memory wait states, or stalls of some kind. But perhaps something like that is the reason for the difference. Unfortunately, I am not an expert on system effects.

Who do you think are the experts on system effects? Someone from the compiler tools team, or the C2000 team?

Please advise. Thanks.

Vivek Singh over 7 years ago in reply to Ricky Zhang

Ricky,

Ricky Zhang said:
They previously used the F28234 with CGT v6.1.0 and have now moved to the F28075 with the latest CGT v16.9.1.LTS. With the same code running from SARAM at the same 120 MHz clock, they found a large difference in execution time. They then tried v6.1.0 on the F28075 as well and found that the difference follows the compiler version:

What exactly do you mean by SARAM here? Is it internal RAM or an external RAM device (via XINTF/EMIF)? If internal RAM, then which RAM (LSx, GSx)?

Regards,

Vivek Singh

Ricky Zhang over 7 years ago in reply to Vivek Singh

Vivek,

Sorry for the confusion. I'm referring to internal RAM such as D0/1 and GSx RAM.

You can find the entire project in my Mar. 16th post for full details.

By the way, the customer has not enabled DCSM or CSM at this point.

The function name is Dat_Int_InvCurrPQCalc().

It is defined in the Inverter.cpp source file under folder 28075_TestTime\Source\App and is copied to GS RAM RAMGS4567 (origin = 0x00F000, length = 0x004000) to run.

It is called by the EPWM_INT_ISR() function, which is copied to D0 RAM RAMD01 (origin = 0x00B000, length = 0x000050) to run. EPWM_INT_ISR() is in the CtrlISR.cpp source file under folder 28075_TestTime\Source\Scheduler, which is included in the MainProcedure.cpp source file under folder 28075_TestTime\Source via #include "Scheduler\CtrlISR.cpp".

UNION: RUN = RAMD01
{
    .CriticalIntFuncsSecured : LOAD = FLASHCTON,
                               LOAD_START(_CriticalIntFuncsSecuredLoadStart),
                               LOAD_END(_CriticalIntFuncsSecuredLoadEnd),
                               RUN_START(_CriticalIntFuncsSecuredRunStart),
                               PAGE = 0

    Flash28_API:               LOAD = FLASHAB,
                               LOAD_START(_Flash28_API_LoadStart),
                               LOAD_END(_Flash28_API_LoadEnd),
                               RUN_START(_Flash28_API_RunStart),
                               PAGE = 0
}

.CriticalIntFuncsNOTSecured : LOAD = FLASHCTON,
                              RUN = RAMGS4567,
                              LOAD_START(_CriticalIntFuncsNOTSecuredLoadStart),
                              LOAD_END(_CriticalIntFuncsNOTSecuredLoadEnd),
                              RUN_START(_CriticalIntFuncsNOTSecuredRunStart),
                              PAGE = 0

Best Regards,

Ricky Zhang

Ricky Zhang over 7 years ago in reply to Vivek Singh

Vivek,

Could we get a response from your side? The customer is pushing a bit, as this issue has been pending for a long time.

Sorry for that and thanks for your attention.

Best Regards,

Ricky Zhang

Vivek Singh over 7 years ago in reply to Ricky Zhang

Hi Ricky,

I have yet to go through this in full detail, but I wanted to do a quick check. If you are using the ETPWM here, then there will be a difference in timing, since the ETPWM is not running at full speed (60 MHz vs. 120 MHz). Even though the code size is the same (and the instructions are the same), the access time will be different. Or are you seeing the difference in the CPU algorithm execution itself (no dependency on peripheral access)?

Regards,

Vivek Singh

Ricky Zhang over 7 years ago in reply to Vivek Singh

Vivek,

No, the test case we provided does not use the ETPWM, and you can simply compare the disassembly generated by the different compiler versions. There are big differences for identical C code, and this does change the number of CPU cycles needed to execute it.

Best Regards,

Ricky Zhang

Vivek Singh over 7 years ago in reply to Ricky Zhang

Ricky,

Is the issue a difference in execution cycles between two different devices (F28234 vs. F28075), or between code generated by two different compiler versions for the same device (F28075)?

Vivek Singh

Ricky Zhang over 7 years ago in reply to Vivek Singh

Vivek,

The latter. The issue first appeared when migrating code from the F28234 to the F28075, but we have now narrowed it down to different compilers on the same F28075 device.

Best Regards,

Ricky Zhang

Vivek Singh over 7 years ago in reply to Ricky Zhang

Ricky,

In that case we have to ask the compiler team to look into this again.

Vivek Singh

George Mock over 7 years ago in reply to Ricky Zhang

Ricky Zhang said:
compare the disassembly generated by the different compiler versions. There are big differences for identical C code

I need to reproduce this result to understand how it happened. Please submit a test case by following these steps.

Preprocess the source file related to the disassembly compared
Attach that to your next post
Indicate the versions of the compiler used
Indicate the names of the functions compared
Show the build options exactly as the compiler sees them

Thanks and regards,

-George

Ricky Zhang over 7 years ago in reply to George Mock

George Mock said:

Ricky Zhang said:
compare the disassembly generated by the different compiler versions. There are big differences for identical C code

I need to reproduce this result to understand how it happened. Please submit a test case by following these steps.

Preprocess the source file related to the disassembly compared
Attach that to your next post
Indicate the versions of the compiler used
Indicate the names of the functions compared
Show the build options exactly as the compiler sees them

Thanks and regards,

-George

I think I already did this in my Mar. 16th post. Do you really need me to repeat it?

Best Regards,

Ricky Zhang

Ricky Zhang over 7 years ago in reply to George Mock

Can we expect an update by this week please?

Vivek Singh over 6 years ago in reply to Ricky Zhang

This is getting debugged/discussed offline.
