TMS320F28075: FPU/TMU: Fast IQn fraction to float conversion

fmdhr

Part Number: TMS320F28075
Other Parts Discussed in Thread: CONTROLSUITE

Hi,

I am looking for a fast way to convert the fraction of an IQ style long integer number to a float for a subsequent __cospuf32() TMU intrinsic call.

I am unsure how much IQmath support the F28075 has (no ROM tables, right?). I tried the following:

#include "IQmathLib.h"
#include <math.h>
#include <stdint.h> /* int32_t */

typedef float float32_t;

float32_t   x, y, z;
int32_t     lq16val = 0x0003c000; /* 3.75 */
int32_t     t0, t1, t2, t3;

float32_t   (*lq2f)(int32_t) = _IQ16toF; /* use a function pointer that points to the right IQxxtoF function if Q changes */

#define LQ16_MASK_FRACTION ((1L << 16) - 1)

void
test_tmu()
{
    t0 = CpuTimer0Regs.TIM.all;
    x = ldexp((float32_t)(lq16val & LQ16_MASK_FRACTION), -16);
    t1 = CpuTimer0Regs.TIM.all;
    y = _IQ16toF(lq16val & LQ16_MASK_FRACTION);
    t2 = CpuTimer0Regs.TIM.all;
    z = (*lq2f)(lq16val & LQ16_MASK_FRACTION);
    t3 = CpuTimer0Regs.TIM.all;

    t0 = t0 - t1; // 70 cycles
    t1 = t1 - t2; // 32 cycles
    t2 = t2 - t3; // 37 cycles

    x = __sinpuf32(z); // etc.
}

I linked against C:\ti\controlSUITE\libs\math\IQmath\v160\lib\IQmath_fpu32.lib and received these warnings on build:

warning #16002-D: build attribute vendor section TI missing in "J:\workspace\firmware\c2000\libc2000\lib\IQmath_fpu32.lib<IQ16toF.obj>": compatibility cannot be determined
warning #10247-D: creating output section "IQmath" without a SECTIONS specification

The computed results are correct (x == y == z == 0.75).

My questions are:

What is the status of IQmath on the F28075?

Is there a better (faster!) way to do the job? I searched Tables 7-6 to 7-8 (TMS320C28x C/C++ Compiler Intrinsics) in SPRU514L, the TMS320C28x Optimizing C/C++ Compiler Guide, but did not find anything appropriate.

Thanks,

Frank

over 8 years ago

0 Vishal_Coelho over 8 years ago

TI__Mastermind 20850 points

Hi Frank,

fmdhr said:
What is the status of IQmath on the F28075?

The IQmath tables were taken out of ROM, so you will have to load them to flash on this device. Other than that, IQmath can be used with this device.
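Warning #10247-D in the original post points the same way: the linker found an "IQmath" section with no placement rule. A linker command file fragment along the following lines is one way to place the library in flash. This is only a sketch: FLASHB and FLASHC are placeholder memory-range names that have to match the MEMORY map of the actual project, and the IQmathTables entry matters only for IQmath functions that use lookup tables.

```
/* Fragment of a linker command file (.cmd); memory-range names are placeholders. */
SECTIONS
{
    IQmath       : > FLASHB, PAGE = 0   /* IQmath library code */
    IQmathTables : > FLASHC, PAGE = 0   /* IQmath lookup tables (no longer in ROM) */
}
```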

fmdhr said:
Is there a better (faster!) way to do the job? I searched Tables 7-6 to 7-8 (TMS320C28x C/C++ Compiler Intrinsics) in SPRU514L, the TMS320C28x Optimizing C/C++ Compiler Guide, but did not find anything appropriate.

Well, you could try this:

((float)(lq16val & LQ16_MASK_FRACTION))*(1.0/(1UL<<16))

It masks out everything but the fractional bits, converts them to float, and then multiplies by 2^-16. The compiler should precompute the constant 2^-16, so you don't waste runtime cycles on it.

Alternatively, you could try this:

((float)(((uint32_t)lq16val << 16) >> 16)) * (1.0/(1UL<<16))
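Both expressions generalize to other Q formats, as sketched below with hypothetical function names. One detail is easy to miss: the shift variant has to shift by 32 - Q, not by Q, to isolate the Q fraction bits of a 32-bit word. For Q16 the two counts happen to coincide.

```c
#include <stdint.h>

/* Mask variant: keep the low q bits, then scale by 2^-q. Valid for 1 <= q <= 31. */
static float iq_frac_mask(int32_t v, unsigned q)
{
    uint32_t mask = (UINT32_C(1) << q) - 1u;
    return (float)((uint32_t)v & mask) * (1.0f / (float)(UINT32_C(1) << q));
}

/* Shift variant: push the integer bits out of the top of the word, then shift back. */
static float iq_frac_shift(int32_t v, unsigned q)
{
    unsigned s = 32u - q;   /* number of integer bits to discard */
    return (float)(((uint32_t)v << s) >> s) * (1.0f / (float)(UINT32_C(1) << q));
}
```

For negative inputs both variants return the fraction above the floor (for example, -0.25 yields 0.75), which suits a periodic per-unit angle.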

0 fmdhr over 8 years ago in reply to Vishal_Coelho

Expert 1530 points

Hi Vishal,

Thank you for the fast reply. Sometimes it's difficult to see the easy way. Below are my measurements, supplemented with your suggestions, for both fixed and variable Q values. I have put the code into separate functions to keep the heavy compiler optimisation (-O2) from interleaving the operations of the individual statements.

Regards,
Frank

#define LQ16_MASK_FRACTION ((1L << 16) - 1)
#define LQ12_MASK_FRACTION ((1L << 12) - 1)

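int32_t lq12val = 0x00003c00; /* 3.75 in Q12; declaration not shown in the post, value assumed */

volatile int32_t mask16 = LQ16_MASK_FRACTION, mask12 = LQ12_MASK_FRACTION;
/* shift counts are 32 - Q so that only the Q fraction bits survive the shift pair */
volatile int16_t shft16 = 32 - 16, shft12 = 32 - 12;
volatile float32_t scale16 = 1.0 / (1UL << 16);
volatile float32_t scale12 = 1.0 / (1UL << 12);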

// trailing comments give the measured CPU cycles (CPU timer counts)
#if FIX_Q
void f1() { x = ldexp((float32_t)(lq16val & LQ16_MASK_FRACTION), -16); } // 80
void f2() { y = _IQ16toF(lq16val & LQ16_MASK_FRACTION); } // 43
void f3() { z = (*lq2f)(lq16val & LQ16_MASK_FRACTION); } // 44
void f4() { x = ((float)(lq16val & LQ16_MASK_FRACTION)) * (1.0/(1UL<<16)); } // 27
void f5() { y = (float)(((uint32_t)lq16val << 16)>>16) * (1.0/(1UL<<16)); } // 26
void f6() { x = ((float)(lq12val & LQ12_MASK_FRACTION)) * (1.0/(1UL<<12)); } // 28
void f7() { y = (float)(((uint32_t)lq12val << 20) >> 20) * (1.0/(1UL<<12)); } // 26 (measured with a shift count of 12; 32 - 12 = 20 is needed to isolate the 12 fraction bits)
#else
void f1() { x = ldexp((float32_t)(lq16val & mask16), -16); } // 81
void f2() { y = _IQ16toF(lq16val & mask16); } // 42
void f3() { z = (*lq2f)(lq16val & mask16); } // 44
void f4() { x = ((float)(lq16val & mask16)) * scale16; } // 26
void f5() { y = (float)(((uint32_t)lq16val << shft16) >> shft16) * scale16; } // 29
void f6() { x = ((float)(lq12val & mask12)) * scale12; } // 26
void f7() { y = (float)(((uint32_t)lq12val << shft12) >> shft12) * scale12; } // 29
#endif
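The cycle counts in the comments are differences of CPU timer readings. The timer counts down, so the earlier reading is the larger one. The sketch below makes that explicit; the helper names are hypothetical, and the wrap handling assumes the timer period register is set to 0xFFFFFFFF.

```c
#include <stdint.h>

/* Cycles elapsed between two readings of a down-counting 32-bit timer.
   Assumes a full-range period (0xFFFFFFFF), so a single wrap between the
   readings cancels out in modulo-2^32 arithmetic. */
static uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return start - end;
}

/* Subtract a separately measured overhead, e.g. the cost of an empty call. */
static uint32_t net_cycles(uint32_t start, uint32_t end, uint32_t overhead)
{
    uint32_t raw = elapsed_cycles(start, end);
    return (raw > overhead) ? raw - overhead : 0u;
}
```

On the target, the two readings would be taken from CpuTimer0Regs.TIM.all immediately before and after the call under test, as in the first post.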

0 Vishal_Coelho over 8 years ago in reply to fmdhr

TI__Mastermind 20850 points

I think you can shave off 8 cycles (the LCR call and the LRETR return) from each if you make those functions inline.
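A minimal sketch of what an inlined, variable-Q version could look like, assuming C99 `static inline` is available in the build. The names `iq_frac_cfg_t`, `iq_frac_cfg_init` and `iq_frac_to_float` are hypothetical. The mask and scale are computed once whenever Q changes, which takes over the role of the function pointer from the first post.

```c
#include <stdint.h>

/* Precomputed constants for one Q format (hypothetical helper type). */
typedef struct {
    uint32_t mask;   /* mask for the low q bits */
    float    scale;  /* 2^-q */
} iq_frac_cfg_t;

/* Recompute the constants whenever Q changes; valid for 1 <= q <= 31. */
static inline iq_frac_cfg_t iq_frac_cfg_init(unsigned q)
{
    iq_frac_cfg_t cfg;
    cfg.mask  = (UINT32_C(1) << q) - 1u;
    cfg.scale = 1.0f / (float)(UINT32_C(1) << q);
    return cfg;
}

/* Hot path: one AND, one integer-to-float conversion, one multiply. */
static inline float iq_frac_to_float(const iq_frac_cfg_t *cfg, int32_t v)
{
    return (float)((uint32_t)v & cfg->mask) * cfg->scale;
}
```

Whether the compiler actually inlines the hot path depends on the optimisation settings, so the generated assembly is worth checking.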

0 fmdhr over 8 years ago in reply to Vishal_Coelho

Expert 1530 points

Hi Vishal,

Of course the code will be inlined in the final release. I put it into separate functions only to get reliable and comparable measurements for the different versions. Otherwise the optimizer produced strange results, apparently because it used, for example, the delay slots in the FPU pipeline.

Thanks again,

Frank
