C2000-CGT: Tail call optimisation

Part Number: C2000-CGT
Other Parts Discussed in Thread: C2000WARE

I am wondering if current versions of C2000-CGT are capable of doing tail-call optimisation. The compiler manual makes no mention of this, but it seems like a common enough optimisation to be found in C compilers.

This post I found indicates that at least very old versions of C2000-CGT were capable of optimising tail recursive calls: "Using modf( float, float* ) in cmath results in infinite loop" (Code Composer Studio forum, TI E2E support forums)

However, my question is not aimed at tail recursive functions (we do not have any relevant occurrence of those in our code base), but about general tail call optimisation -- i.e. emitting LB instead of LCR; LRETR.

For a simple example, take the following C function:

__attribute__((ramfunc)) static void tx0(void) {
    set_low();
    DEVICE_DELAY_US(10);
    set_high();
    DEVICE_DELAY_US(5);
}

Here DEVICE_DELAY_US expands to a call to SysCtl_delay from C2000Ware (F2838x). The generated assembly (C2000-CGT 22.6.2.LTS, --silicon_version=28 --abi=eabi --unified_memory --cla_support=cla2 --float_support=fpu64 --idiv_support=idiv0 --tmu_support=tmu0 --vcu_support=vcrc --opt_level=4 --opt_for_speed=3 --fp_mode=relaxed --symdebug:dwarf --c11 --relaxed_ansi --keep_asm --preproc_with_compile) is as follows:

        LCR       set_low
        MOV       ACC, #398
        LCR       SysCtl_delay
        LCR       set_high
        MOVB      ACC, #198
        LCR       SysCtl_delay
        LRETR

My naive expectation is that replacing the last two lines with LB SysCtl_delay would result in functionally equivalent code while saving 4 CPU cycles on return. This is of course irrelevant in this specific example given the long intentional delays, but it hopefully gets the point across. Other examples could be shared, but would likely require some manual editing to avoid publishing any proprietary IP.
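
For readers without the C2000Ware headers at hand, the shape of the example can be reproduced on any host compiler. The sketch below is only an illustration: set_low, set_high and delay_stub are hypothetical stubs that record a trace, not the real GPIO routines or SysCtl_delay, and the arguments 398 and 198 are copied from the assembly above purely as markers. Its only purpose is to show which call sits in tail position.

```c
#include <stddef.h>

/* Hypothetical trace buffer standing in for real hardware side effects. */
static int trace[8];
static size_t trace_len = 0;

static void record(int event) {
    if (trace_len < sizeof trace / sizeof trace[0]) {
        trace[trace_len++] = event;
    }
}

/* Stubs (assumptions): the real code toggles a pin and calls SysCtl_delay. */
void set_low(void)         { record(-1); }
void set_high(void)        { record(-2); }
void delay_stub(int count) { record(count); }

void tx0(void) {
    set_low();         /* not a tail call: more work follows */
    delay_stub(398);   /* not a tail call */
    set_high();        /* not a tail call */
    delay_stub(198);   /* tail position: last action, nothing left to do after it */
}
```

Only the final call is a candidate for being turned into a branch; the three calls before it must return to tx0.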

Some of these occurrences in the generated assembly for our code are eliminated during linking with -O4 because the called function gets inlined, but non-inlined call sites remain. This also includes indirect calls (i.e. LCR *XARn; LRETR), which I assume could similarly be replaced with LB *XAR7.
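
The indirect case has the same shape at the C level: a call through a function pointer as the final action of a function. A minimal host-side sketch follows; the names handler_t, dispatch and on_event are hypothetical and do not come from our code base.

```c
typedef void (*handler_t)(void);

static int events_handled = 0;

/* Hypothetical handler, present only to make the sketch observable. */
static void on_event(void) { events_handled++; }

/* The call through h is the last action of dispatch, so it is in tail
   position. This is the source-level pattern behind LCR *XARn; LRETR. */
void dispatch(handler_t h) {
    if (!h) {
        return;   /* nothing registered */
    }
    h();          /* indirect call in tail position */
}
```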

This leaves me wondering: Are there any specific compile flags needed to enable tail call optimisation? Conversely, are there flags known to inhibit it? Or is the C2000 compiler not capable of tail call optimisation in general?

  • Hello!

    Or is the C2000 compiler not capable of tail call optimisation in general?

    Support for tail call optimization and tail recursion in the C2000 compiler is more limited than in TI compilers for other processors.  Unfortunately, that doesn't help you out very much.  You mention you could provide some examples with some manual editing. A simple test case that demonstrates your case would help us dig into it a bit more.  Please follow the directions in the article How to Submit a Compiler Test Case.

    Thanks,

    -Alan

  • Turns out even the simplest single-file testcase fails to tailcall:

    extern void foo(void);
    
    void bar(void) {
        foo();
    }

    Compiled with:

    cl2000.exe --silicon_version=28     --abi=eabi     --unified_memory     --cla_support=cla2     --float_support=fpu64     --idiv_support=idiv0     --tmu_support=tmu0     --vcu_support=vcrc     --opt_level=4 --opt_for_speed=3 --fp_mode=relaxed --symdebug:dwarf --c11 --relaxed_ansi --keep_asm --preproc_with_compile foo.c

    Now if in the actual code this is the only call to foo(), it might get inlined during link-time optimisation. For the non-inlined case however, I have yet to observe a tail call get inserted during LTO. What thus remains is the originally-emitted assembly:

    ||bar||:
    	.dwcfi	cfa_offset, -2
    	.dwcfi	save_reg_to_mem, 26, 0
    	.dwpsn	file "foo.c",line 4,column 3,is_stmt,isa 0
    $C$DW$3	.dwtag  DW_TAG_TI_branch
    	.dwattr $C$DW$3, DW_AT_low_pc(0x00)
    	.dwattr $C$DW$3, DW_AT_name("foo")
    	.dwattr $C$DW$3, DW_AT_TI_call
    
            LCR       #||foo||              ; [CPU_ALU] |4| 
            ; call occurs [#||foo||] ; [] |4| 
    $C$DW$4	.dwtag  DW_TAG_TI_branch
    	.dwattr $C$DW$4, DW_AT_low_pc(0x00)
    	.dwattr $C$DW$4, DW_AT_TI_return
    
            LRETR     ; [CPU_ALU] 
            ; return occurs ; [] 
    	.dwattr $C$DW$2, DW_AT_TI_end_file("foo.c")
    	.dwattr $C$DW$2, DW_AT_TI_end_line(0x05)
    	.dwattr $C$DW$2, DW_AT_TI_end_column(0x01)
    	.dwendentry
    	.dwendtag $C$DW$2
    

  • my question is not aimed at tail recursive functions (we do not have any relevant occurrence of those in our code base), but about general tail call optimisation -- i.e. emitting LB instead of LCR; LRETR.

    The C2000 compiler optimizes tail recursive functions, but not general tail calls.

    Thanks and regards,

    -George
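
The distinction drawn in the last reply can be illustrated in plain C. The functions below are hypothetical examples, not taken from the thread, and the comments describe the general technique rather than the exact code the C2000 compiler emits.

```c
/* Tail recursion: the function's last action is a call to itself. */
unsigned long sum_to(unsigned n, unsigned long acc) {
    if (n == 0) {
        return acc;
    }
    return sum_to(n - 1, acc + n);   /* self-call in tail position */
}

/* What optimising the tail recursion amounts to: the self-call becomes
   a jump back to the top of the function, i.e. a loop. */
unsigned long sum_to_loop(unsigned n, unsigned long acc) {
    while (n != 0) {
        acc += n;
        n -= 1;
    }
    return acc;
}

/* General tail call: the last action is a call to a different function. */
unsigned long scale(unsigned long x) { return 2 * x; }

unsigned long sum_then_scale(unsigned n) {
    /* sum_to is an ordinary call; the call to scale is in tail position. */
    return scale(sum_to(n, 0));
}
```

According to the reply above, the first form is the kind the compiler optimizes, while a call like the one to scale in sum_then_scale is still emitted as a call followed by a return.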