
Compiler/TMS320F28075: Execution Time Difference on F28075 with Different Compilers

Part Number: TMS320F28075

Tool/software: TI C/C++ Compiler

Hi, Champs,

My customer has a problem with the execution time of specific code on the F28075 when it is built with different compilers.

They previously used the F28234 with CGT v6.1.0 and have now moved to the F28075 with the latest CGT v16.9.1.LTS. Running the same code from SARAM at the same 120 MHz clock, they found a large difference in execution time. They then built the code with v6.1.0 for the F28075 as well, and the difference between the two compilers remains:

v6.1.0: 3.49 us
10869h - 10719h = 150h = 336

v16.9.1.LTS: 4.83 us
10d20h - 10b8ch = 194h = 404

I have attached the source code snapshot, the linker command file, the array defined in a C++ class, the array's address in memory (identical in both cases), the compiler console output, the disassembly copied from the Disassembly view, and the map files for both cases. Could you please take a look and advise why this happens? We can see that the disassembly differs between the two builds.

F28075 Compiler.7z

Best Regards,

Ricky Zhang


  • All of the code is built with --opt_level=off.  (Your build uses the equivalent -Ooff.)  The compiler development team is not concerned about the performance of code built with --opt_level=off.  Among other things, performance differences between versions are not tracked under --opt_level=off.  I am not surprised there is a difference; I would not even be surprised by a larger one.

    If performance is important, then build with at least --opt_level=2.  If you have a reason to not build with optimization, what is it?

    Thanks and regards,

    -George


    TI C/C++ Compiler Forum Moderator
    Please click Verify Answer on the best reply to your question
    The CCS YouTube Channel has short how-to videos
    The Compiler Wiki answers most common questions
    Track an issue with SDOWP. Enter your bug id in the Search box.

  • In reply to George Mock:

    Oh... unfortunately, the original setting in the customer's system is -O4, and the result is the same.

    We used --opt_level=off here only to show that you get the same "bad" performance either way, so the problem has nothing to do with optimization, and no extra effort is needed to investigate the unoptimized case.

    By the way, with optimization off, shouldn't you get identical disassembly no matter which compiler version you use?

    Can you test the code on your side with a simplified project, or simply analyze the disassembly, or do you need a test case from the customer?

  • In reply to Ricky Zhang:

    Ricky Zhang
    Oh... unfortunately, the original setting in the customer's system is -O4, and the result is the same.

    We very much want to figure that out.  More on that below.

    Ricky Zhang
    We used --opt_level=off here only to show that you get the same "bad" performance either way, so the problem has nothing to do with optimization, and no extra effort is needed to investigate the unoptimized case.

    I understand.  It is just that we don't analyze performance problems in that fashion.

    Ricky Zhang
    By the way, with optimization off, shouldn't you get identical disassembly no matter which compiler version you use?

    They will often be similar, but not identical.  Differences like the one experienced here are not common, but they are not surprising either.

    Ricky Zhang
    Can you test the code on your side with a simplified project, or simply analyze the disassembly, or do you need a test case from the customer?

    We need a test case.  I presume the performance difference is seen in one function.  Please preprocess the source file that contains the function, and attach it to your next post.  Indicate the name of that function.  Show all the build options exactly as the compiler sees them.  And indicate the version of the compiler.

    Thanks and regards,

    -George



  • In reply to George Mock:

    The function name is Dat_Int_InvCurrPQCalc().

    It is defined in the Inverter.cpp source file under the folder 28075_TestTime\Source\App.

    It is called by the EPWM_INT_ISR() function in the CtrlISR.cpp source file (folder 28075_TestTime\Source\Scheduler), which is included in the MainProcedure.cpp source file (folder 28075_TestTime\Source) via #include "Scheduler\CtrlISR.cpp".

    The code was built with both the v6.1.0 and v16.9.0.LTS compilers; the console output for both cases is attached.

    GPIO43 is used to measure the code execution time in hardware.

    Preprocess.7z

  • In reply to George Mock:

    In case it is difficult to reproduce this issue, or you need the entire project for further investigation, the project and .map files are attached for reference.

    All builds were done with -Ooff; you can change this to -O4 for testing as well.

    Project Build.7z

  • In reply to Ricky Zhang:

    My investigation of this problem is limited to inspecting the assembly code generated by the compiler.  I look at how many instructions are generated, and the sorts of operations those instructions perform.  These factors influence how many CPU cycles are needed to execute the function.  Based on that, I see no reason to expect a large difference in the number of CPU cycles needed for this function, as generated by the 6.1.0 and 16.9.0.LTS compilers.  

    I use the term CPU cycles deliberately.  That is all the compiler can influence.  The compiler cannot do anything about cycles lost to system effects like memory wait states, or stalls of some kind.  But perhaps something like that is the reason for the difference.  Unfortunately, I am not an expert on system effects.

    Thanks and regards,

    -George



  • In reply to George Mock:

    George Mock

    My investigation of this problem is limited to inspecting the assembly code generated by the compiler.  I look at how many instructions are generated, and the sorts of operations those instructions perform.  These factors influence how many CPU cycles are needed to execute the function.  Based on that, I see no reason to expect a large difference in the number of CPU cycles needed for this function, as generated by the 6.1.0 and 16.9.0.LTS compilers.  

    I think I already provided this information in my original post, which includes the assembly code and the differences when the F28075 runs at 120 MHz:

    v6.1.0: 3.49 us
    10869h - 10719h = 150h = 336

    v16.9.1.LTS: 4.83 us
    10d20h - 10b8ch = 194h = 404

    So can you tell me exactly how many CPU cycles you counted in each case?

    And why does the assembly code generated by the two compilers differ?

    As you can see, no other build options differ; the customer only changed the compiler.

    George Mock

    I use the term CPU cycles deliberately.  That is all the compiler can influence.  The compiler cannot do anything about cycles lost to system effects like memory wait states, or stalls of some kind.  But perhaps something like that is the reason for the difference.  Unfortunately, I am not an expert on system effects.

    Who do you think are the experts on system effects? Someone from the compiler tools team, or from the C2000 team?

    Please advise. Thanks.