Linker problem using Whole Program Optimization option.

CCS 5.5, TMS570LC43

Overall performance is poor with every optimization option, even with the cache enabled. (I can see a difference when the cache is enabled, so it really does seem to be active.) That is why I have tried all the options. I have to admit this was not always the case; something in my code or project settings may have changed and be causing this, but I don't know what.

For instance, I have now put a simple empty for loop somewhere (about 1000 iterations) and it raises the CPU load enormously. This happens with or without optimization, at every optimization level I could build.

Specifically, optimization level 4 doesn't work properly for me: the link almost never finishes (I mean it takes more than 5 minutes!), and the program crashes or doesn't seem to work properly. These symptoms do not appear with the other levels.

Is it necessary for all the code to use the same optimization level? Overall, I have tested the same optimization level for all the code except a couple of files that cause compile errors at specific levels. For instance, if I use optimization level 3 or 4 on some files, the compiler hangs forever, so I have to use level 1 or 2.

  • Simon lapointe said:
    Specifically, optimization level 4 doesn't work properly for me

    I'd appreciate if we could get a test case which allows us to reproduce this behavior, and explain it.  Is your code organized as a CCS project?  If so, I'd appreciate if you would submit the project.

    Simon lapointe said:
    Is it necessary for all the code to use the same optimization level?

    No.

    Simon lapointe said:
    if I use optimization level 3 or 4 on some files, the compiler hangs forever

    Again, I'd appreciate if we could get a test case.  For problems with a single file, please preprocess the file and submit that.  Show the compiler version, and all the build options exactly as the compiler sees them.

    Thanks and regards,

    -George

  • I've sent you the project.
  • Simon lapointe said:
    Specifically, optimization level 4 doesn't work properly for me: the link almost never finishes (I mean it takes more than 5 minutes!)

    I cannot reproduce the part where it takes a long time to build.  The build does take a while, but that is because there are many files.  No one file takes all that long to build.  And the link does not take a long time either.

    But I do see several messages similar to ...

    error: symbol "McuCustomerExtension4Codec::Encode(char *, unsigned int *)"
       redeclared with incompatible type:
       "void(struct McuCustomerExtension4Codec * const, unsigned char * const,
       uint32_t * const)"
       in
       "test_case/V34_A_SEND/Mo4/CanFile/include/codecs/extension4/McuCustomerExtension4Codec.h"
       at line 77
       and:
       "void(struct McuCustomerExtension4Codec * const, unsigned char *, uint32_t
       *)"
       in "../CanFile/src/codecs/extension4/McuCustomerExtension4Codec.cpp" at line
       7)
    

    Notice how the .h file declares the 2nd and 3rd function arguments with const pointers.  But the .cpp file defines the function with those same pointers non-const.  This difference is not exposed until link time optimization (with --opt_level=4) is used.  There are 11 such errors.  These function declarations and definitions must be changed so they agree.

    Thanks and regards,

    -George
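
The fix George describes can be sketched as follows. This is only a minimal illustration, not the project's actual code: the class name and the parameter types come from the error message above, while the parameter names and the placeholder body are hypothetical. In standard C++ a top-level const on a parameter does not change the function's type, so an ordinary compile accepts the mismatch; according to George's explanation it is only reported when link time optimization (--opt_level=4) compares the declaration and the definition.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for the class named in the error message.
class McuCustomerExtension4Codec {
public:
    // Declaration, as it would appear in the .h file: both pointers are
    // declared const (the pointers themselves, not the data they point to).
    void Encode(unsigned char* const buffer, std::uint32_t* const length);
};

// Definition, as it would appear in the .cpp file. The parameter list repeats
// the declaration exactly, including the const on both pointers, so the two
// agree when the link-time optimizer compares them.
void McuCustomerExtension4Codec::Encode(unsigned char* const buffer,
                                        std::uint32_t* const length)
{
    // Placeholder body (assumption): write one byte and report its length.
    buffer[0] = 0x04;
    *length = 1;
}
```

Either spelling works as long as it is the same in both places: dropping const from the header is just as valid as adding it to the .cpp file.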

  • Thanks for this information.

    Well, I've just fixed the errors by making the argument declarations and definitions consistent, and it now builds successfully. The problem with --opt_level=4 is that it takes 9 minutes just to link (I have measured it). Maybe that is not forever, but it feels like forever to me. My laptop is fairly recent, but this could be a configuration problem somewhere; I don't know.

    Do you use CCS 5.5 and the same compiler version?

    I will try this on the hardware and give you feedback soon. If it still runs slowly, I could also remove the bootloader and hardware-dependent code so that you can run this code on a development platform. Is that possible?

    - Simon

  • Simon lapointe said:
    The problem with --opt_level=4 is that it takes 9 minutes just to link (I have measured it).

    That's a long time, but within the range of times we have seen with --opt_level=4.  I'm sure building with --opt_level=3 is much faster.  Compare the difference in performance and code size between those two.  You may decide that --opt_level=3 is good enough.  Or perhaps you can delay using --opt_level=4 until near the end of your project.

    Simon lapointe said:
    Do you use CCS 5.5 and the same compiler version?

    Yes.  But I encounter those errors, which stop the build before it completes.

    Simon lapointe said:
    I will try this on the hardware and give you feedback soon. If it still runs slowly, I could also remove the bootloader and hardware-dependent code so that you can run this code on a development platform. Is that possible?

    This is outside my expertise.

    Thanks and regards,

    -George

  • I've experimented with level 4, and since it doesn't give any more performance, I'll follow your advice and stay with level 3.

    Another thing: I've realized that the difference between the two for loops I mentioned is normal, because one is called more often. Sorry about that, it was my mistake.

    Nevertheless, performance seems to be poor for both loops and I still cannot execute the code I'm able to run on a TMS570LS3 DSP.
    One of my colleagues suggests that some branch instructions don't benefit from the cache. Are there other kinds of code that behave like this, which would explain why the performance is not there? I'm looking for ideas to investigate.

    Regards.
  • Simon lapointe said:
    performance seems to be poor for both loops and I still cannot execute the code I'm able to run on a TMS570LS3 DSP.

    We compiler experts have little knowledge of system level performance issues such as this.  I recommend you start a new thread in the Hercules device forum.

    Thanks and regards,

    -George