TMS320F28075: FPU/TMU: Fast IQn fraction to float conversion

Part Number: TMS320F28075
Other Parts Discussed in Thread: CONTROLSUITE

Hi,

I am looking for a fast way to convert the fraction of an IQ style long integer number to a float for a subsequent __cospuf32() TMU intrinsic call.

I am not sure how much IQmath support the F28075 offers (no ROM tables, right?). I tried the following:

#include <stdint.h>   /* int32_t */
#include <math.h>     /* ldexp */
#include "IQmathLib.h"


typedef float   float32_t;

float32_t   x, y, z;
int32_t     lq16val = 0x0003c000; /* 3.75 */
int32_t     t0, t1, t2, t3;

float32_t   (*lq2f)(int32_t) = _IQ16toF; /* use a function pointer that points to the right IQxxtoF function if Q changes */

#define LQ16_MASK_FRACTION ((1L << 16) - 1)

void
test_tmu()
{
    t0 = CpuTimer0Regs.TIM.all;
    x = ldexp((float32_t)(lq16val & LQ16_MASK_FRACTION), -16);
    t1 = CpuTimer0Regs.TIM.all;
    y = _IQ16toF(lq16val & LQ16_MASK_FRACTION);
    t2 = CpuTimer0Regs.TIM.all;
    z = (*lq2f)(lq16val & LQ16_MASK_FRACTION);
    t3 = CpuTimer0Regs.TIM.all;

    t0 = t0 - t1; // 70 cycles
    t1 = t1 - t2; // 32 cycles
    t2 = t2 - t3; // 37 cycles

    x = __sinpuf32(z); // etc.
}

I linked against C:\ti\controlSUITE\libs\math\IQmath\v160\lib\IQmath_fpu32.lib and received these warnings on build:

warning #16002-D: build attribute vendor section TI missing in "J:\workspace\firmware\c2000\libc2000\lib\IQmath_fpu32.lib<IQ16toF.obj>": compatibility cannot be determined
warning #10247-D: creating output section "IQmath" without a SECTIONS specification

The computed results are correct (x == y == z == 0.75).
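As a cross-check of that result, here is a minimal host-side sketch of the arithmetic: 0x0003c000 in Q16 splits into the integer part 3 and the fraction bits 0xC000, and 0xC000 * 2^-16 = 0.75. The helper names `q16_integer_part` and `q16_fraction` are made up for this illustration; only `ldexpf` from the standard C library is used, not the IQmath library.

```c
#include <stdint.h>
#include <math.h>

#define LQ16_MASK_FRACTION ((1L << 16) - 1)

/* Integer part of a non-negative Q16 value (hypothetical helper). */
static long q16_integer_part(int32_t v)
{
    return (long)(v >> 16);
}

/* Fraction of a Q16 value as a float: fraction bits scaled by 2^-16
 * (hypothetical helper, same arithmetic as the ldexp() line above). */
static float q16_fraction(int32_t v)
{
    return ldexpf((float)(v & LQ16_MASK_FRACTION), -16);
}
```

For `lq16val = 0x0003c000` the two helpers give 3 and 0.75f. Both values are exactly representable in a 32-bit float, so comparing with `==` is safe in this particular case.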

My questions are:

What is the status of IQmath on the F28075?

Is there a better (faster!) way to do the job? I searched Tables 7-6 to 7-8 (TMS320C28x C/C++ Compiler Intrinsics) in SPRU514L, the TMS320C28x Optimizing C/C++ Compiler Guide, but did not find anything appropriate.

Thanks,

Frank


  

  • Hi Frank,

    fmdhr said:
    What is the status of IQmath on the F28075?

    The IQmath tables were taken out of ROM on this device, so you will have to load them into flash. Other than that, IQmath can be used on this device.

    fmdhr said:
    Is there a better (faster!) way to do the job? I searched Tables 7-6 to 7-8 (TMS320C28x C/C++ Compiler Intrinsics) in SPRU514L, the TMS320C28x Optimizing C/C++ Compiler Guide, but did not find anything appropriate.

    Well, you could try this:

    ((float)(lq16val & LQ16_MASK_FRACTION))*(1.0/(1UL<<16))

    It should mask off the integer portion, leaving only the fractional bits, then convert to float and multiply by 2^-16. The compiler should precompute the constant 2^-16 at compile time, so you don't waste runtime cycles on it.

    Alternatively, you could try this:

    ((float)(((uint32_t)lq16val << 16) >> 16)) * (1.0/(1UL<<16))
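The two suggestions above can be generalized to other Q formats. The following is only a sketch under stated assumptions: the function names `frac_by_mask` and `frac_by_shift` are hypothetical, `q` must be in the range 1..31, and the scale uses a `float` constant so that no double arithmetic is involved. In the shift variant the shift count is 32 - Q, which happens to equal Q only for Q16.

```c
#include <stdint.h>

/* Mask variant: keep the low q bits, then scale by 2^-q.
 * Hypothetical helper, valid for q = 1..31. */
static inline float frac_by_mask(int32_t v, unsigned q)
{
    uint32_t mask = ((uint32_t)1 << q) - 1u;
    float scale = 1.0f / (float)((uint32_t)1 << q);
    return (float)((uint32_t)v & mask) * scale;
}

/* Shift variant: shift left by (32 - q) to push the integer bits out,
 * then shift back right (logical shift on an unsigned value).
 * Hypothetical helper, valid for q = 1..31. */
static inline float frac_by_shift(int32_t v, unsigned q)
{
    unsigned s = 32u - q;
    float scale = 1.0f / (float)((uint32_t)1 << q);
    return (float)(((uint32_t)v << s) >> s) * scale;
}
```

For example, 3.75 is 0x0003c000 in Q16 and 0x00003c00 in Q12, and both variants should return 0.75f for either encoding when called with the matching `q`.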

  • Hi Vishal,

    thank you for the fast reply. Sometimes it's difficult to see the easy way. Below are my measurements, extended with your suggestions, for both fixed and variable Q values. I put the code into separate functions to keep the heavy compiler optimisations (-O2) from interleaving the operations of the individual statements.

    Regards,
    Frank

    #define LQ16_MASK_FRACTION ((1L << 16) - 1)
    #define LQ12_MASK_FRACTION ((1L << 12) - 1)

    // volatile so the compiler cannot fold the variable-Q operands into constants
    volatile int32_t mask16 = LQ16_MASK_FRACTION, mask12 = LQ12_MASK_FRACTION;
    // shift count is 32 - Q: 16 for Q16, 20 for Q12
    volatile int16_t shft16 = 16, shft12 = 20;
    volatile float32_t scale16 = 1.0 / (1UL << 16);
    volatile float32_t scale12 = 1.0 / (1UL << 12);

    // lq12val is the Q12 counterpart of lq16val (declaration not shown)
    // trailing comments: CPU cycles per call, measured with the CPU timer
    #if FIX_Q
    void f1() { x = ldexp((float32_t)(lq16val & LQ16_MASK_FRACTION), -16); } // 80
    void f2() { y = _IQ16toF(lq16val & LQ16_MASK_FRACTION); } // 43
    void f3() { z = (*lq2f)(lq16val & LQ16_MASK_FRACTION); } // 44
    void f4() { x = ((float)(lq16val & LQ16_MASK_FRACTION)) * (1.0/(1UL<<16)); } // 27
    void f5() { y = (float)(((uint32_t)lq16val << 16)>>16) * (1.0/(1UL<<16)); } // 26
    void f6() { x = ((float)(lq12val & LQ12_MASK_FRACTION)) * (1.0/(1UL<<12)); } // 28
    void f7() { y = (float)(((uint32_t)lq12val << 20)>>20) * (1.0/(1UL<<12)); } // 26
    #else
    void f1() { x = ldexp((float32_t)(lq16val & mask16), -16); } // 81
    void f2() { y = _IQ16toF(lq16val & mask16); } // 42
    void f3() { z = (*lq2f)(lq16val & mask16); } // 44
    void f4() { x = ((float)(lq16val & mask16)) * scale16; } // 26
    void f5() { y = (float)(((uint32_t)lq16val << shft16) >> shft16) * scale16; } // 29
    void f6() { x = ((float)(lq12val & mask12)) * scale12; } // 26
    void f7() { y = (float)(((uint32_t)lq12val << shft12) >> shft12) * scale12; } // 29
    #endif
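The cycle counts above come from differences of CPU timer reads, and the C28x CPU timers count down. A minimal sketch of that bookkeeping follows. It assumes a free-running timer whose period register is 0xFFFFFFFF, so that unsigned subtraction also stays correct across a reload. The names `elapsed_down` and `net_cycles` and the `overhead` parameter are hypothetical, not part of any TI header.

```c
#include <stdint.h>

/* Elapsed ticks between two reads of a down-counting 32-bit timer.
 * Assumes the period is 0xFFFFFFFF, so the result is valid modulo 2^32. */
static inline uint32_t elapsed_down(uint32_t start, uint32_t end)
{
    return start - end;
}

/* Elapsed ticks minus a fixed measurement overhead, for example the
 * cost of the call/return pair or of the timer reads themselves.
 * Clamps at zero if the overhead exceeds the raw measurement. */
static inline uint32_t net_cycles(uint32_t start, uint32_t end,
                                  uint32_t overhead)
{
    uint32_t raw = elapsed_down(start, end);
    return (raw > overhead) ? (raw - overhead) : 0u;
}
```

With timer reads of 1000 and 957, `elapsed_down` gives 43 ticks, and `net_cycles(1000, 957, 8)` gives 35 once an 8-cycle overhead is subtracted.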
  • I think you can shave off 8 cycles (call LCR and return LRETR) from each if you make those functions inline.
  • Hi Vishal,

    of course the code will be inline (or inlined) in the final release. I put the code into separate functions only to get reliable and comparable measurements for the different versions. Otherwise the optimizer produced strange results, apparently because it used, for example, the delay slots in the FPU pipeline.

    Thanks again,

    Frank