Compiler: TDA2xx DSP - compiling a file takes a very long time

Tool/software: TI C/C++ Compiler

Hello,

I'm using the DSP on the TDA2xx with TI compiler version v8.1.0.

One of the source files in our project takes a VERY long time to compile (a few minutes).

The file contains only a few functions, some of which are marked inline. Some of the for loops also use #pragma MUST_ITERATE.

The DSP optimization level is set to 3 (interprocedure optimizations), and optimize-for-speed is set to its maximum (level 5).

The long compilation time happens only with this specific file, and I can't see anything special about it.

I would appreciate your help resolving this.

Thanks

Guy

  • Guy Mardiks said:
    This long compilation time only happen in this specific file

    Please submit a test case based on this file as described in the article How to Submit a Compiler Test Case.

    Thanks and regards,

    -George

  • Hi,

    I cannot send our files, but what I have is a three-level nested for loop inside an inline function, and this function is called from other functions with different inputs:

    inline void test_func(....)
    {
        for()
        {
            for()
            {
                for()
                {
                    ....
                }
            }
        }
        ...
    }

    Is there a known issue with inlining? If so, do you have any suggestions for improving compile time without affecting performance?

    Guy

  • The two main reasons that compilation takes a long time are size and complexity.  Big things take longer, sometimes out of proportion to their size.  Complex things take longer, where "complex" is not always apparent from the source -- it could be a mess of cross-dependences or a series of assignments that can be combined into one giant expression or some weird combination that happens to be pathological.

    How much inlining is happening?  If these functions are inlined in many places, you could be creating something large.  I think that the --gen_opt_info=2 option will indicate which call sites are inlined.  The --keep_asm option will keep the .asm file from the compilation; at the end is a section that lists the functions that are inlined, kind of like footnotes.  If too much inlining is the problem, there are several approaches; which to use depends on the version of the compiler.

    Do the loops have known constant trip counts?  If the counts are small enough, the loops might be unrolled completely.  The idea is to allow for more software pipelining, but the compiler sometimes slows itself down in the pursuit of small speedups.  Try adding "#pragma UNROLL(1)" before each of the for-loops -- that will inhibit unrolling.  If that helps, then over-unrolling is the issue.  You may need to adjust the unroll amount to achieve the speed you want.

    If neither of these ideas helps, then we're just going to have to have a test case.  Your code might be triggering something complex, which we might be able to avoid by either modifying the code or changing the compiler, but there's no way to tell from what we've seen so far.

  • Hi, thanks.
    When I added #pragma UNROLL(1) to the inner loop in one of the inline functions, compilation did indeed become much faster.
    That loop (currently) has a constant trip count of 4. Any unroll factor below 4 (3, 2, or 1) speeds up compilation, while 4 makes it very slow again.
    Why would this have such an impact?
  • Guy Mardiks said:
    why would this have such an impact?

    In the previous post, pf states:

    the compiler sometimes slows itself down in the pursuit of small speedups

    In your case, this effect likely becomes more pronounced the more you unroll the loop.

    Thanks and regards,

    -George

  • If the loop has recurrences, the unrolled loop can combine those into longer expressions, which starts to enter the "complexity" zone.  The loop itself (the next-outer loop, with the unrolled innermost loop spliced in) is larger, which affects both size and complexity.