TMS320F28388D: Lack of Intrinsic Substitution for Standard Math Functions in TI’s libc with TMU1 Enabled

Yves Chevallier

Part Number: TMS320F28388D


TL;DR: Why doesn't the TI compiler (e.g., cl2000 v22.6.1.LTS) automatically map standard mathematical functions such as atan2f, sinf, and cosf, included via <cmath> or <math.h>, to their corresponding TMU-accelerated intrinsics (e.g., ATANPUF32, SINPUF32, COSPUF32) when TMU support is explicitly enabled (via --tmu_support=tmu1) and optimization flags are appropriately set (-O4, --opt_for_speed=5, etc.)?

In the following illustrative case:

#include <cmath>

float test(float x, float y) {
    float a = atan2(x, y);
    float b = sin(a);
    float c = cos(a);
    return a + b + c;
}

The compiled output clearly shows that the standard atan2f, sinf, and cosf functions are called as external symbols, despite the presence of TMU and FPU64 support:

        LCR       #||atan2f||           
        ...
        LCR       #||sinf||             
        ...
        LCR       #||cosf||

Only when manually overriding the standard functions with inline wrappers that explicitly call the intrinsics (__atan2, __sin, __cos) is efficient TMU-based code generation produced:

inline float atan2(float y, float x) { return __atan2(y, x); }
inline float sin(float x) { return __sin(x); }
inline float cos(float x) { return __cos(x); }

This results in the expected use of hardware instructions:

        ATANPUF32 R0H,R0H
        ...
        SINPUF32  R1H,R1H
        COSPUF32  R2H,R2H

Given that TMU support is enabled, one would expect the compiler and standard library headers to detect the target architecture and transparently map these high-level math functions to their optimized, hardware-accelerated counterparts.

The build command is the following:

/opt/ti/ti-cgt-c2000_22.6.1.LTS/bin/cl2000 -c \
-D=USE_20MHZ_XTAL -D=_FLASH -D=CPU1 -D=__TMS320C28XX__ \
--issue_remarks --abi=eabi --tmu_support=tmu1 \
--float_support=fpu64 --gen_opt_info=2 \
-v28 -ml -O4 -op=3 --c_src_interlist --auto_inline \
--verbose_diagnostics --advice:performance=all \
--opt_for_speed=5 --preproc_with_compile --keep_asm \
-I/opt/ti/ti-cgt-c2000_22.6.1.LTS/include -z main.obj -o exec.out

Shouldn’t the TI standard C library (math.h) and C++ wrapper (<cmath>) integrate target-specific logic (e.g., via conditional macros or inline definitions) to redirect standard math functions to hardware intrinsics when available? Why is this redirection left entirely to the user? Is there a compiler flag or a TI-provided configuration that enables automatic mapping to intrinsics under TMU-supported builds?
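As a rough illustration of the kind of conditional redirection asked about above, here is a minimal sketch. The wrapper names (fast_sin, fast_cos, fast_atan2, angle_sum) are hypothetical. It assumes that cl2000 predefines __TMS320C28XX_TMU__ when --tmu_support is given, and that __atan2 takes its arguments in the same (y, x) order as std::atan2; neither point is confirmed in this thread. On any other compiler the wrappers simply fall back to <cmath>.

```cpp
#include <cmath>

// Hypothetical wrappers; these names are not part of TI's headers.
// Assumption: __TMS320C28XX_TMU__ is predefined by cl2000 when
// --tmu_support is passed. If it is not, the #else branch is used.
#if defined(__TMS320C28XX_TMU__)
// TMU build: call the intrinsics named in the post directly.
// Assumption: __atan2 uses the (y, x) argument order of std::atan2.
inline float fast_sin(float x) { return __sin(x); }
inline float fast_cos(float x) { return __cos(x); }
inline float fast_atan2(float y, float x) { return __atan2(y, x); }
#else
// Any other target (or a host build): plain standard library calls.
inline float fast_sin(float x) { return std::sin(x); }
inline float fast_cos(float x) { return std::cos(x); }
inline float fast_atan2(float y, float x) { return std::atan2(y, x); }
#endif

// Same shape as the test function in the post, written against the wrappers.
inline float angle_sum(float y, float x) {
    float a = fast_atan2(y, x);
    return a + fast_sin(a) + fast_cos(a);
}
```

Using distinct names avoids redefining sin, cos and atan2 next to the overloads that <cmath> already declares, at the cost of having to change the call sites.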

6 months ago

Sira Rao80 6 months ago

TI__Mastermind 26290 points

Sorry for the delayed reply. The option required for this is --fp_mode=relaxed.

See section 2.3.3 of https://www.ti.com/lit/spru514

Yves Chevallier 6 months ago in reply to Sira Rao80

Prodigy 70 points

Wonderful, this is exactly what I was looking for. With --fp_mode=relaxed added to the build, the following code now compiles to TMU instructions:

#include <cmath>

float test(float x, float y) {
    float a = atan2(x, y);
    float b = sinf(a);
    float c = cosf(a);
    float d = a * 2 * M_PI;
    return a + b + c + d;
}

||_Z4testff||:
;* R0    assigned to $O$C1
;* R0    assigned to x
;* R1    assigned to y
        QUADF32   R1H,R0H,R0H,R1H       ; [CPU_FPU] |9|
        NOP       ; [CPU_ALU]
        NOP       ; [CPU_ALU]
        NOP       ; [CPU_ALU]
        NOP       ; [CPU_ALU]
        ATANPUF32 R0H,R0H               ; [CPU_FPU] |9|
        NOP       ; [CPU_ALU]
        NOP       ; [CPU_ALU]
        NOP       ; [CPU_ALU]
        ADDF32    R0H,R0H,R1H           ; [CPU_FPU] |9|
        NOP       ; [CPU_ALU]
        NOP       ; [CPU_ALU]
        MPY2PIF32 R0H,R0H               ; [CPU_FPU] |9|
        NOP       ; [CPU_ALU]
        MOVIZ     R1H,#16457            ; [CPU_FPU] |9|
        DIV2PIF32 R2H,R0H               ; [CPU_FPU] |9|
        DIV2PIF32 R3H,R0H               ; [CPU_FPU] |9|
        SINPUF32  R2H,R2H               ; [CPU_FPU] |9|
        ADDF32    R4H,R0H,R0H           ; [CPU_FPU] |9|
||      MOV32     *SP++,R4H             ; [CPU_FPU]
        COSPUF32  R3H,R3H               ; [CPU_FPU] |9|
        MOVXI     R1H,#4059             ; [CPU_FPU] |9|
        ADDF32    R0H,R0H,R2H           ; [CPU_FPU] |9|
        MPYF32    R1H,R1H,R4H           ; [CPU_FPU] |9|
        ADDF32    R0H,R0H,R3H           ; [CPU_FPU] |9|
        NOP       ; [CPU_ALU]
        ADDF32    R0H,R0H,R1H           ; [CPU_FPU] |9|
||      MOV32     R4H,*--SP             ; [CPU_FPU]
        LRETR     ; [CPU_ALU]
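
For readers following the listing: the TMU trigonometric instructions work on per-unit angles (a fraction of a full turn) rather than radians, which is why each SINPUF32/COSPUF32 is preceded by DIV2PIF32 and the ATANPUF32 result is followed by MPY2PIF32. The sketch below emulates that scaling on a host with <cmath>. The function names are hypothetical stand-ins, not TI intrinsics, and the emulation says nothing about the accuracy of the real hardware; QUADF32 is left out.

```cpp
#include <cmath>

// Host-side stand-ins for the TMU instructions seen in the listing.
// All names here are hypothetical; they only model the angle scaling.
constexpr float kTwoPi = 6.28318530717958647692f;

// DIV2PIF32: radians -> per-unit (fraction of a full turn).
inline float div2pi(float rad) { return rad / kTwoPi; }

// MPY2PIF32: per-unit -> radians.
inline float mpy2pi(float pu) { return pu * kTwoPi; }

// SINPUF32 / COSPUF32: sine and cosine of a per-unit angle.
inline float sinpu(float pu) { return std::sin(pu * kTwoPi); }
inline float cospu(float pu) { return std::cos(pu * kTwoPi); }

// ATANPUF32: arctangent returned as a per-unit angle.
inline float atanpu(float ratio) { return std::atan(ratio) / kTwoPi; }

// sinf(a) and cosf(a) as the listing computes them: scale, then evaluate.
inline float sin_via_tmu(float rad) { return sinpu(div2pi(rad)); }
inline float cos_via_tmu(float rad) { return cospu(div2pi(rad)); }
```

The extra scaling steps introduce their own rounding and the hardware results are not bit-identical to the library routines, which is consistent with the substitution being tied to --fp_mode=relaxed rather than applied by default.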
