
CGT Reordering guarantees

Hi


I am wondering under what circumstances the TI compiler reorders instructions, or rather, under what circumstances it does not. Are there any guarantees at all? What documentation is there (I cannot find any)?

In particular:

- Will the TI compiler move code around volatile accesses?
- Will it move code around calls to Hwi_disable()/_restore()? To Semaphore_... calls? Gate... calls?
- Is there any mechanism similar to gcc's asm("" : : : "memory")?
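
For reference, here is a minimal sketch of the GCC construct I mean; the empty asm statement with a "memory" clobber keeps GCC from moving memory accesses across it (the data and flag names are purely illustrative):

    int buffer;                        /* ordinary, non-volatile data */
    volatile int ready;                /* flag polled by the consumer */

    void publish(int value)
    {
        buffer = value;
        asm("" : : : "memory");        /* compiler barrier: the store to
                                          buffer cannot be moved below
                                          this statement               */
        ready = 1;
    }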

Regards

Markus

  • Markus Moll said:
    Will the TI compiler move code around volatile accesses?

    Please see this wiki article for a discussion of the volatile keyword.

    Markus Moll said:
    Will it move code around calls to Hwi_disable()/_restore()? To Semaphore_... calls? Gate... calls?

    I'm not specifically familiar with these.  I presume they appear to the compiler as ordinary function calls, and are treated as such.

    Markus Moll said:
    Is there any mechanism similar to gcc's asm("" : : : "memory")?

    No.

    Thanks and regards,

    -George

  • Hi

    I had seen the wiki article; it doesn't seem to say anything about instruction reordering (which, I guess, means that there aren't any guarantees).

    I am asking because the problem might arise in any typical producer/consumer pattern. Even if I use appropriate locks, unless prior memory accesses are guaranteed to be completed at the point where I release the lock, the consumer might see invalid/incomplete data. The problem becomes more pronounced when using whole-program optimization or any other kind of inter-procedural optimization, as the compiler will not have to treat function calls conservatively.
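
    To make that concrete, here is a minimal sketch of the pattern I have in mind (the lock API and all names are invented for illustration):

        extern void lock_acquire(void);   /* e.g. wraps Hwi_disable()  */
        extern void lock_release(void);   /* e.g. wraps Hwi_restore()  */
        extern int  compute(void);

        int shared_data;                  /* deliberately NOT volatile */

        void producer(void)
        {
            lock_acquire();
            shared_data = compute();      /* non-volatile store        */
            lock_release();               /* if the compiler may sink
                                             the store below this call
                                             (e.g. after inlining), the
                                             consumer can observe stale
                                             data                      */
        }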

    There are a number of solutions: one is to make every single bit of shared data volatile, which is much too strong a restriction. Another is to use assembly to retain control over execution order. A third is to disable optimizations for some functions, which then prescribe execution order by calling appropriate sub-procedures. None of these are particularly appealing to me (although I have a clear preference for the third solution).
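
    To illustrate the third option (file layout and names are hypothetical), the ordering-sensitive sequence would live in a file compiled without optimization, e.g. --opt_level=off, while the callees are defined in other translation units so they cannot be inlined:

        /* publish.c: compiled without optimization, so the two calls
           below keep their source order                              */
        extern void write_shared(int value);   /* defined elsewhere   */
        extern void release_lock(void);        /* defined elsewhere   */

        void publish_data(int value)
        {
            write_shared(value);
            release_lock();
        }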

    Of course, most of the time straightforward implementations (seemingly) work very well, but I just do not like to maintain code that might break with any compiler upgrade or even with slight modifications of unrelated parts of the code.


    Regards

    Markus

  • The volatile wiki article doesn't talk about instructions because the compiler handles volatile strictly according to the C semantics, which don't deal with the concept of instructions.  However, there's pretty much a one-to-one correspondence between a volatile access and the instruction generated, so it's not unreasonable to view it as saying that instructions with volatile accesses won't be reordered with respect to other instructions with volatile accesses.

    It's true that the TI compiler only guarantees that volatile accesses won't be reordered with respect to other volatile accesses.  Non-volatile accesses are fair game; they can be moved or, more importantly, optimized away.  In theory this includes any statement, including function calls.  Yes, this means that the only solution in TI C to creating a critical section is to use volatile for all objects accessed in the critical section.  Resorting to assembly means the compiler can't see the function and must assume it might be performing volatile accesses.  Dispatching to subroutines works as long as the optimization level isn't high enough that the compiler can see and perhaps inline those subroutines.
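
    A minimal sketch of that volatile-based approach, with illustrative names only:

        volatile int shared_data;     /* everything touched in the      */
        volatile int data_ready;      /* critical section is volatile   */

        void producer(int value)
        {
            shared_data = value;      /* volatile store                 */
            data_ready  = 1;          /* volatile store: guaranteed to
                                         stay after the store above     */
        }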

  • Archaeologist said:

    Resorting to assembly means the compiler can't see the function and must assume it might be performing volatile accesses.

    Would that statement also apply to inline assembly? If so, can the compiler still reorder non-volatile accesses around inline assembly (but not volatile ones)?
  • The TI compiler guarantees very little about inline assembly; you should only use inline assembly with extreme caution.

    The TI compiler does not read the contents of the inline assembly statement; it does not even know if that statement performs a memory access.

    Yes, the TI compiler might reorder non-volatile memory accesses around an inline assembly statement. At a certain point in compilation the statement does become a barrier instruction, which disallows instruction reordering across the inline assembly, but that is not enough to prevent all reordering.
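
    For example (a sketch using TI's asm() statement; the DSB instruction is ARM-specific and purely illustrative):

        int a, b;                     /* non-volatile                   */

        void example(void)
        {
            a = 1;                    /* may legally be moved below...  */
            asm(" DSB");              /* ...this inline assembly: the
                                         compiler does not read the
                                         string and does not know it
                                         touches memory                 */
            b = 2;                    /* ...or above it                 */
        }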
  • Archaeologist said:
    Yes, the TI compiler might reorder non-volatile memory accesses around an inline assembly statement.

    Thanks for the clarification. Is that also true for intrinsics, or do they have any additional compiler barrier behaviour? I'm particularly thinking of __ldrex and __strex here.

  • Intrinsics are understood by the compiler; they are handled just like built-in expressions. If you use them on volatile data, they should not be reordered. However, see e2e.ti.com/.../1673307
  • Archaeologist said:
    Intrinsics are understood by the compiler; they are handled just like built-in expressions. If you use them on volatile data, they should not be reordered.

    Sorry to labour the point, but is the guarantee that if they're used on volatile data they won't be reordered relative to other volatile accesses, or that they won't be reordered relative to anything? (ignoring the current __ldrex/__strex issues)

  • It depends on the intrinsic; some are barriers, some are not. A saturated add intrinsic is just like an add operator, and can be moved around as needed by the compiler. However, a disable interrupts intrinsic is a barrier.
  • Archaeologist said:
    It depends on the intrinsic; some are barriers, some are not. A saturated add intrinsic is just like an add operator, and can be moved around as needed by the compiler. However, a disable interrupts intrinsic is a barrier.

    Ah, that makes sense.

    For TI's ARM compiler, this also creates an opportunity to provide a compiler barrier with __dmb(), __dsb() and __isb() intrinsics. Currently those instructions can only be inserted with inline assembly, so they don't create a full compiler barrier. If these intrinsics did exist and acted as a compiler barrier, it would remove the need to qualify every variable accessed in a critical section as volatile.

    EDIT: Having made the above comment about avoiding excess use of volatile, I'm struggling to think of a situation where that would be applicable. In a producer/consumer scenario both the lock variable and the shared data it protects need to be volatile. If the shared data isn't volatile, then not only can the compiler reorder access to it around the lock operations, but it may also optimise it out entirely.
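
    A tiny sketch of that failure mode (names are illustrative):

        volatile int lock;            /* lock word is volatile          */
        int shared;                   /* shared payload is NOT volatile */

        int consumer(void)
        {
            while (lock)              /* volatile read: repeated on     */
                ;                     /* every iteration                */
            return shared;            /* non-volatile read: may be
                                         hoisted above the loop or
                                         satisfied from a stale register
                                         copy                           */
        }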

  • Hello,

    I'd like to ask another question, namely: how does the TI compiler treat the reordering of function calls when the function code is in a static library?

    Does it assume the function performs volatile accesses, and thus not reorder these calls with other volatile accesses (in particular with calls to other static library functions)?

    I'm considering at least two use cases: a library compiled without optimizations, and one compiled with --opt_level=4.

    Thank you

  • I have reached out to some experts who have more depth of knowledge on this question than me.  However, because of the holiday in the US, I don't expect to hear from them until next week.  In the meantime, here is my (possibly wrong) understanding.

    Assume the function calls are not inlined.  With regard to two function calls, they are never reordered.  With regard to a function call and a volatile access, they are never reordered.

    If a function is inlined, that changes everything.  After the inlining occurs, the code in the function becomes part of the function that calls it.  All sorts of reordering can occur at this point.  The key thing to understand about inlining is this: for any function inlining to occur, the source code of the function must somehow be visible at compile time.  This often occurs when a function definition is supplied in a header file.  But it is also possible to inline a call to a function defined in the same file.
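
    As a small illustration (the file and function names are invented), a lock-release helper defined in a header can be inlined into its caller, after which the caller's non-volatile accesses are free to move:

        /* lock.h: the definition is visible to every caller, so it
           can be inlined                                              */
        static inline void lock_release(volatile int *lock)
        {
            *lock = 0;                /* volatile store                 */
        }

        /* producer.c */
        int shared_data;              /* not volatile                   */
        volatile int lock_word;

        void producer(int value)
        {
            shared_data = value;      /* after inlining, this
                                         non-volatile store may be
                                         moved below the volatile store
                                         performed by lock_release      */
            lock_release(&lock_word);
        }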

    Thanks and regards,

    -George