C2000-CGT: Tail call optimisation

Part Number: C2000-CGT
Other Parts Discussed in Thread: C2000WARE

I am wondering if current versions of C2000-CGT are capable of doing tail-call optimisation. The compiler manual makes no mention of this, but it seems like a common enough optimisation to be found in C compilers.

This post I found indicates that at least very old versions of C2000-CGT were capable of optimising tail recursive calls: "Using modf( float, float* ) in cmath results in infinite loop" (Code Composer Studio forum, TI E2E support forums)

However, my question is not aimed at tail recursive functions (we do not have any relevant occurrence of those in our code base), but at general tail call optimisation, i.e. emitting LB instead of LCR; LRETR.

For a simple example, take the following C function:

__attribute__((ramfunc)) static void tx0(void) {
    set_low();
    DEVICE_DELAY_US(10);
    set_high();
    DEVICE_DELAY_US(5);
}

Here DEVICE_DELAY_US expands to a call to SysCtl_delay from C2000Ware (F2838x). The generated assembly (C2000-CGT 22.6.2.LTS, --silicon_version=28 --abi=eabi --unified_memory --cla_support=cla2 --float_support=fpu64 --idiv_support=idiv0 --tmu_support=tmu0 --vcu_support=vcrc --opt_level=4 --opt_for_speed=3 --fp_mode=relaxed --symdebug:dwarf --c11 --relaxed_ansi --keep_asm --preproc_with_compile) is as follows:

        LCR       set_low
        MOV       ACC, #398
        LCR       SysCtl_delay
        LCR       set_high
        MOVB      ACC, #198
        LCR       SysCtl_delay
        LRETR

My naive expectation is that replacing the last two lines with LB SysCtl_delay would result in functionally equivalent code while saving 4 CPU cycles upon return. This is of course irrelevant in this specific example given the long intentional delays, but it hopefully gets the point across. Other examples could be shared, but would likely require some manual editing to avoid publishing any proprietary IP.

Some of these occurrences in the generated assembly for our code get eliminated during linking with -O4 because the called function is inlined, but non-inlined call sites remain. These also include indirect calls (i.e. LCR *XARn; LRETR), which I assume to be similarly replaceable with LB *XAR7.
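To make the two call shapes concrete, here is a reduced sketch with both the direct and the indirect tail call. It is only an illustration under assumptions: the stub_* functions, tx0_direct, tx0_indirect and tail_call_shapes_agree are hypothetical names invented for this sketch, not C2000Ware APIs, and the delay counts 398 and 198 are simply the ACC values from the listing above.

```c
#include <stdint.h>

/* Hypothetical host-side stand-ins; none of these are C2000Ware APIs. */
static uint32_t delay_total;  /* sum of all requested delay counts */
static int pin_level;         /* last level written by the stubs */

static void stub_set_low(void)  { pin_level = 0; }
static void stub_set_high(void) { pin_level = 1; }
static void stub_delay(uint32_t count) { delay_total += count; }

/* Direct tail call: the last statement is a call to a known function. */
static void tx0_direct(void) {
    stub_set_low();
    stub_delay(398u);
    stub_set_high();
    stub_delay(198u);  /* tail position: candidate for LB instead of LCR; LRETR */
}

typedef void (*delay_fn)(uint32_t);

/* Indirect tail call: same shape, but the callee is a function pointer. */
static void tx0_indirect(delay_fn delay) {
    stub_set_low();
    delay(398u);
    stub_set_high();
    delay(198u);  /* tail position: candidate for LB *XAR7 instead of LCR *XARn; LRETR */
}

/* Returns 1 if both variants have the same observable effect on the stubs. */
static int tail_call_shapes_agree(void) {
    delay_total = 0u;
    pin_level = -1;
    tx0_direct();
    uint32_t direct_total = delay_total;
    int direct_level = pin_level;

    delay_total = 0u;
    pin_level = -1;
    tx0_indirect(stub_delay);

    return direct_total == delay_total && direct_level == pin_level
        && direct_total == 596u && direct_level == 1;
}
```

In both functions nothing happens after the final call and no result is used, so the caller's return address could be handed straight to the callee. On the target, the stubs would presumably have to live in a separate translation unit (or otherwise be kept from inlining) for the call sites to survive, and the output of --keep_asm would show whether the final call is emitted as a branch.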

This leaves me wondering: Are there any specific compile flags needed to enable tail call optimisation? Conversely, are there flags known to inhibit it? Or is the C2000 compiler not capable of tail call optimisation in general?

  • Hello!

    Or is the C2000 compiler not capable of tail call optimisation in general?

    Support for tail call optimization and tail recursion in the C2000 compiler is more limited than in TI compilers for other processors. Unfortunately, that doesn't help you out very much. You mention you could provide some examples with some manual editing. A simple test case demonstrating your case interacting with the code in question would help us dig into it a bit more. Please follow the directions in the article How to Submit a Compiler Test Case.

    Thanks,

    -Alan