Other Parts Discussed in Thread: C2000WARE
I am wondering whether current versions of C2000-CGT are capable of tail-call optimisation. The compiler manual makes no mention of it, although it seems a common enough optimisation for C compilers.
This post indicates that at least very old versions of C2000-CGT could optimise tail-recursive calls: "Using modf( float, float* ) in cmath results in infinite loop" (Code Composer Studio forum, TI E2E support forums).
However, my question is not about tail-recursive functions (our code base contains no relevant instances of those) but about general tail-call optimisation, i.e. emitting LB instead of LCR; LRETR.
For a simple example, take the following C function:
```c
__attribute__((ramfunc)) static void tx0(void)
{
    set_low();
    DEVICE_DELAY_US(10);
    set_high();
    DEVICE_DELAY_US(5);
}
```
Here, DEVICE_DELAY_US expands to a call to SysCtl_delay from C2000Ware (F2838x). The generated assembly (C2000-CGT 22.6.2.LTS, --silicon_version=28 --abi=eabi --unified_memory --cla_support=cla2 --float_support=fpu64 --idiv_support=idiv0 --tmu_support=tmu0 --vcu_support=vcrc --opt_level=4 --opt_for_speed=3 --fp_mode=relaxed --symdebug:dwarf --c11 --relaxed_ansi --keep_asm --preproc_with_compile) is as follows:
```
        LCR       set_low
        MOV       ACC, #398
        LCR       SysCtl_delay
        LCR       set_high
        MOVB      ACC, #198
        LCR       SysCtl_delay
        LRETR
```
My naive expectation is that replacing the last two lines with a single LB SysCtl_delay would yield functionally equivalent code while saving 4 CPU cycles on return. In this specific example that is of course irrelevant given the long intentional delays, but it hopefully gets the point across. I could share other examples, but they would likely need some manual editing to avoid publishing proprietary IP.
In our code, some of these sequences disappear at link time with -O4 because the callee gets inlined, but the remaining non-inlined call sites keep the LCR; LRETR pattern. This also applies to indirect calls (i.e. LCR *XARn; LRETR), which I assume could similarly be replaced with LB *XAR7, provided the target address ends up in XAR7, since that is the only register LB accepts as an indirect branch target.
This leaves me wondering: are there any specific compiler flags needed to enable tail-call optimisation? Conversely, are there flags known to inhibit it? Or is the C2000 compiler generally not capable of tail-call optimisation?