TMS320F28388D: Lack of Intrinsic Substitution for Standard Math Functions in TI’s libc with TMU1 Enabled

Part Number: TMS320F28388D


TL;DR: Why doesn't the TI compiler (e.g., cl2000 v22.6.1.LTS) automatically map standard math functions such as atan2f, sinf, and cosf, included via <cmath> or <math.h>, to the corresponding TMU instructions (e.g., ATANPUF32, SINPUF32, COSPUF32) when TMU support is explicitly enabled (--tmu_support=tmu1) and aggressive optimization is requested (-O4, --opt_for_speed=5, etc.)?

In the following illustrative case:

#include <cmath>

float test(float x, float y) {
    float a = atan2(x, y);
    float b = sin(a);
    float c = cos(a);
    return a + b + c;
}

The compiled output clearly shows that the standard atan2f, sinf, and cosf functions are called as external symbols, despite the presence of TMU and FPU64 support:

        LCR       #||atan2f||           
        ...
        LCR       #||sinf||             
        ...
        LCR       #||cosf||  

Only when manually overriding the standard functions with inline wrappers that explicitly call the intrinsics (__atan2, __sin, __cos) is efficient TMU-based code generation produced:

inline float atan2(float x, float y) { return __atan2(y, x); }
inline float sin(float x) { return __sin(x); }
inline float cos(float x) { return __cos(x); }

This results in the expected use of hardware instructions:

        ATANPUF32 R0H,R0H
        ...
        SINPUF32  R1H,R1H
        COSPUF32  R2H,R2H
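For anyone who wants to keep such wrappers without breaking host or non-TMU builds, the following is a minimal sketch, not a TI-provided header. It assumes the compiler predefines `__TMS320C28XX_TMU__` when --tmu_support is given (worth confirming against the predefined-macro table in the compiler guide), and the `fastmath` namespace and `angle_sum` function are hypothetical names chosen for illustration. Argument order follows the standard atan2(y, x) convention.

```cpp
#include <cmath>

// Hypothetical namespace; not part of any TI header.
namespace fastmath {

#if defined(__TMS320C28XX_TMU__)
// Assumed: macro is defined by cl2000 when --tmu_support is used.
// The intrinsics map directly onto TMU instruction sequences.
inline float atan2(float y, float x) { return __atan2(y, x); }
inline float sin(float x) { return __sin(x); }
inline float cos(float x) { return __cos(x); }
#else
// Host or non-TMU build: fall back to the standard library.
inline float atan2(float y, float x) { return std::atan2(y, x); }
inline float sin(float x) { return std::sin(x); }
inline float cos(float x) { return std::cos(x); }
#endif

}  // namespace fastmath

// Same shape as the test function above, using the guarded wrappers.
float angle_sum(float y, float x) {
    float a = fastmath::atan2(y, x);
    return a + fastmath::sin(a) + fastmath::cos(a);
}
```

Putting the wrappers in their own namespace avoids redefining names that <cmath> already declares, at the cost of having to qualify each call.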

Given that TMU support is enabled, one would expect the compiler and standard library headers to detect the target architecture and transparently map these high-level math functions to their optimized, hardware-accelerated counterparts.

The build command is the following:

/opt/ti/ti-cgt-c2000_22.6.1.LTS/bin/cl2000 -c \
-D=USE_20MHZ_XTAL -D=_FLASH -D=CPU1 -D=__TMS320C28XX__ \
--issue_remarks --abi=eabi --tmu_support=tmu1 \
--float_support=fpu64 --gen_opt_info=2 \
-v28 -ml -O4 -op=3 --c_src_interlist --auto_inline \
--verbose_diagnostics --advice:performance=all \
--opt_for_speed=5 --preproc_with_compile --keep_asm \
-I/opt/ti/ti-cgt-c2000_22.6.1.LTS/include -z main.obj -o exec.out

Shouldn’t the TI standard C library (math.h) and C++ wrapper (<cmath>) integrate target-specific logic (e.g., via conditional macros or inline definitions) to redirect standard math functions to hardware intrinsics when available? Why is this redirection left entirely to the user? Is there a compiler flag or a TI-provided configuration that enables automatic mapping to intrinsics under TMU-supported builds?

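  • Sorry for the delayed reply. The compiler option required for this is --fp_mode=relaxed. With it set and TMU support enabled, calls such as sinf, cosf, and atan2f are replaced with TMU instructions instead of library calls.

    See section 2.3.3 of https://www.ti.com/lit/spru514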

  • Wonderful, this is exactly what I was looking for. 

    #include <cmath>
    
    float test(float x, float y) {
        float a = atan2(x, y);
        float b = sinf(a);
        float c = cosf(a);
        float d = a * 2 * M_PI;
        return a + b + c + d;
    }
    

    ||_Z4testff||:
    ;* R0    assigned to $O$C1
    ;* R0    assigned to x
    ;* R1    assigned to y
            QUADF32   R1H,R0H,R0H,R1H       ; [CPU_FPU] |9|
            NOP       ; [CPU_ALU]
            NOP       ; [CPU_ALU]
            NOP       ; [CPU_ALU]
            NOP       ; [CPU_ALU]
            ATANPUF32 R0H,R0H               ; [CPU_FPU] |9|
            NOP       ; [CPU_ALU]
            NOP       ; [CPU_ALU]
            NOP       ; [CPU_ALU]
            ADDF32    R0H,R0H,R1H           ; [CPU_FPU] |9|
            NOP       ; [CPU_ALU]
            NOP       ; [CPU_ALU]
            MPY2PIF32 R0H,R0H               ; [CPU_FPU] |9|
            NOP       ; [CPU_ALU]
            MOVIZ     R1H,#16457            ; [CPU_FPU] |9|
            DIV2PIF32 R2H,R0H               ; [CPU_FPU] |9|
            DIV2PIF32 R3H,R0H               ; [CPU_FPU] |9|
            SINPUF32  R2H,R2H               ; [CPU_FPU] |9|
            ADDF32    R4H,R0H,R0H           ; [CPU_FPU] |9|
    ||      MOV32     *SP++,R4H             ; [CPU_FPU]
            COSPUF32  R3H,R3H               ; [CPU_FPU] |9|
            MOVXI     R1H,#4059             ; [CPU_FPU] |9|
            ADDF32    R0H,R0H,R2H           ; [CPU_FPU] |9|
            MPYF32    R1H,R1H,R4H           ; [CPU_FPU] |9|
            ADDF32    R0H,R0H,R3H           ; [CPU_FPU] |9|
            NOP       ; [CPU_ALU]
            ADDF32    R0H,R0H,R1H           ; [CPU_FPU] |9|
    ||      MOV32     R4H,*--SP             ; [CPU_FPU]
            LRETR     ; [CPU_ALU]
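    For readers following the listing: the TMU trigonometric instructions work on per-unit angles (fractions of a full turn), which is why each SINPUF32/COSPUF32 is preceded by a DIV2PIF32. The sketch below is a host-side numerical model of that pairing only, not the hardware algorithm and not TI code. All function names are hypothetical, and the atan2 sequence (QUADF32, ATANPUF32, ADDF32, MPY2PIF32) is not modeled.

```cpp
#include <cmath>

// Host-side models of the TMU operations; names are hypothetical.
constexpr float kTwoPi = 6.28318530717958647692f;

// DIV2PIF32: convert radians to a per-unit angle (1.0 == one full turn).
inline float div2pi(float rad) { return rad / kTwoPi; }

// MPY2PIF32: convert a per-unit angle back to radians.
inline float mpy2pi(float pu) { return pu * kTwoPi; }

// SINPUF32 / COSPUF32: sine and cosine of a per-unit angle.
inline float sinpu(float pu) { return std::sin(kTwoPi * pu); }
inline float cospu(float pu) { return std::cos(kTwoPi * pu); }

// What the compiler emits for sinf(a) and cosf(a) in the listing above:
// a DIV2PIF32 followed by the per-unit instruction.
inline float model_sinf(float rad) { return sinpu(div2pi(rad)); }
inline float model_cosf(float rad) { return cospu(div2pi(rad)); }
```

    The MOVIZ/MOVXI pair in the listing loads 0x4049 and 0x0FDB into R1H, which together form 0x40490FDB, the single-precision value of pi. Combined with the ADDF32 R4H,R0H,R0H that doubles a, this implements the a * 2 * M_PI term.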