TMS320F28379D: fmin, fmax running slowly (calling 64-bit functions)

Wil P.

Part Number: TMS320F28379D

This is a post about a development tool or software, but I seem to have created it in the wrong spot. Please move it as appropriate.

Hello,

Here is my setup:

Windows 10
CCS 10.2
F28379D
TI v20.12.0.STS
Processor options: fpu32, tmu0, vcu2, relaxed (fp_mode)
EABI
Libraries included: c28x_fpu_dsp_library_eabi, rts2800_fpu32_eabi

I am trying to use the FPU instructions MINF32 and MAXF32. As described in SPRUEO2 (Float Instructions), MINF32 is a single-cycle instruction.

Consider the following code:

target=fmaxf(orig,min);

Instead of emitting the MAXF32 instruction, the compiler calls the following code from the ti_fmax.c file in the compiler directory:

#if FLT_MANT_DIG != DBL_MANT_DIG
float fmaxf(float x, float y)
{
    if (isnan(x)) return y;
    if (isnan(y)) return x;

    /* -0 and +0 compare equal, but we must return +0, so check the
        sign bit */

    if (__FLOAT_SIGN_BIT_ZERO(x) != __FLOAT_SIGN_BIT_ZERO(y))
        if (__FLOAT_SIGN_BIT_ZERO(x)) return x;
        else                          return y;

    return (x > y ? x : y);
}
#else
float fmaxf(float x, float y) __attribute__((__alias__("fmax")));
#endif

double fmax(double x, double y)
{
    if (isnan(x)) return y;
    if (isnan(y)) return x;

    /* -0 and +0 compare equal, but we must return +0, so check the
        sign bit */

    if (__DOUBLE_SIGN_BIT_ZERO(x) != __DOUBLE_SIGN_BIT_ZERO(y))
        if (__DOUBLE_SIGN_BIT_ZERO(x)) return x;
        else                           return y;

    return (x > y ? x : y);
}

The pre-processor directive fails because in float.h, FLT_MANT_DIG is 24 and DBL_MANT_DIG is 53, so the code defined as a double is called. This code uses 64-bit types, which I am aware the processor does not support natively. As you can see in the processor options above, I have set the floating point support setting to fpu32. The code runs about an order of magnitude slower than a plain conditional.
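A small check, kept separate from ti_fmax.c, can confirm which branch the preprocessor takes for a given float.h. With FLT_MANT_DIG at 24 and DBL_MANT_DIG at 53 the `!=` test evaluates true, which selects the first branch, so it is worth verifying this on the target rather than assuming the double version is the one being called. The macro and function names below are hypothetical and only for illustration.

```c
#include <float.h>

/* Mirrors the condition used around fmaxf in ti_fmax.c.
 * FMAXF_HAS_OWN_FLOAT_BODY is a hypothetical name, not a TI macro. */
#if FLT_MANT_DIG != DBL_MANT_DIG
#define FMAXF_HAS_OWN_FLOAT_BODY 1   /* separate float implementation */
#else
#define FMAXF_HAS_OWN_FLOAT_BODY 0   /* fmaxf would be an alias of fmax */
#endif

/* Returns 1 when float and double have different mantissa widths,
 * i.e. when the first branch of the #if is the one that gets compiled. */
int fmaxf_has_own_float_body(void)
{
    return FMAXF_HAS_OWN_FLOAT_BODY;
}
```

Calling `fmaxf_has_own_float_body()` once at start-up and inspecting the result in the debugger is enough to see which branch applies to the build.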

In the end, I would like the min and max functions to be 32-bit and run in a single cycle. Ultimately I need to bound a variable so I want something like this:

target= fminf(fmaxf(orig,min),max);

I expect that to compile into several 32-bit register moves and a MINF32 and MAXF32, say 10 cycles at most.

Summary of questions that I would like some answers to:

Is the call to the ti_fmax.c fmax function a bug? How do I force it to run the FPU instruction? I tried __MAXF32, but it did not work. Ideally, I would like the compiler to do this.
What is the fastest way to bound a float variable on this system? I have to do this many times in my application. Currently I am doing something very similar to a nested version of "return (x > y ? x : y);".

Thanks in advance for any help or insight.

over 3 years ago

0 Chester Gillon over 3 years ago

Guru 92251 points

I'm not sure if the compiler optimiser is supposed to automatically inline run time library calls into instructions.

I can repeat this with TI v20.12.0.STS and optimisation level 1 - Local Optimisations.

The following in the C source:

#include <math.h>

float math_funcs (const float orig, const float min, const float max)
{
    const float target = fminf(fmaxf(orig,min),max);

    return target;
}

Generates the following assembler which has calls to the run time library functions:

||math_funcs||:
        MOV32     *SP++,R4H             ; [CPU_FPU] 
        MOV32     R4H,R2H               ; [CPU_FPU] |5| 
;----------------------------------------------------------------------
;   6 | const float target = fminf(fmaxf(orig,min),max);                       
;   8 | return target;                                                         
;----------------------------------------------------------------------
        LCR       #||fmaxf||            ; [CPU_ALU] |6| 
        ; call occurs [#||fmaxf||] ; [] |6| 
        MOV32     R1H,R4H               ; [CPU_FPU] |6| 
        LCR       #||fminf||            ; [CPU_ALU] |6| 
        ; call occurs [#||fminf||] ; [] |6| 
        MOV32     R4H,*--SP             ; [CPU_FPU] 
        LRETR     ; [CPU_ALU]

Whereas if I use the __fmax and __fmin intrinsics specified in section 7.6.1, Floating Point Unit (FPU) Intrinsics, of the TMS320C28x Optimizing C/C++ Compiler v20.12.0.STS User's Guide (Rev. V):

float intrinsic_funcs (const float orig, const float min, const float max)
{
    const float target = __fmin(__fmax(orig,min),max);

    return target;
}

The generated assembler uses the MAXF32 and MINF32 instructions:

||intrinsic_funcs||:
;----------------------------------------------------------------------
;  15 | return target;                                                         
;----------------------------------------------------------------------
        MAXF32    R0H,R1H               ; [CPU_FPU] |15| 
        MINF32    R0H,R2H               ; [CPU_FPU] |15| 
        LRETR     ; [CPU_ALU]

I have attached the CCS test project used. TMS320F28379D_fmin_fmax.zip

0 George Mock over 3 years ago

TI Guru 239845 points

Thank you for informing us of this problem. And thank you to Chester Gillon for further characterizing the problem. I can reproduce a similar result. However, I am not certain that I have captured all the subtle nuances of what is happening. So that I can, please follow the directions in the article How to Submit a Compiler Test Case for the source file that calls fmaxf and fminf.

Thanks and regards,

-George

0 Wil P. over 3 years ago in reply to Chester Gillon

Expert 1640 points

Chester,

Thank you for an insightful and thorough post. Were you able to get any performance data on the intrinsic vs. the function call? Also, I appreciate you providing the reference for the intrinsic functions; this will help me at least in the short term.

Please forgive my ignorance as you appear to be much better at this than I am, but what do you mean by:

Chester Gillon said:
I'm not sure if the compiler optimiser is supposed to automatically inline run time library calls into instructions.

I must be missing something, but this is what I thought the compiler is supposed to do. For example, with the proper build configuration, "cos(x)" will compile into TMU instructions (DIV2PIF32 and COSPUF32); can you help me understand how this example is different from changing "fmax" to FPU intrinsics?

George,

I am not sure what else you'd need. Chester attached a complete MWE project to his post.

0 George Mock over 3 years ago in reply to Wil P.

TI Guru 239845 points

William Perdikakis said:

George,

I am not sure what else you'd need.

I'd like to see the compiler options you use, exactly as the compiler sees them. I agree they are likely to be similar to the compiler options Chester used in the project he generously attached to his post. But because the smallest details can matter, I asked for a complete test case.

In addition, there may be a good reason the compiler is unable to automatically map a call to fmaxf to the instruction MAXF32. I'm exploring that possibility and will get back to you.

Thanks and regards,

-George

0 George Mock over 3 years ago in reply to George Mock

TI Guru 239845 points

I filed the entry EXT_EP-10258 to request that the compiler be changed. When a call to fmaxf or fminf is seen, the code generated should not call a function in the RTS library, but use the instructions MAXF32 or MINF32. You are welcome to follow this entry with that link.

Thanks and regards,

-George
