Hi,
triggered by a recent discussion (e2e.ti.com/.../713771), I wonder why there is no F32TOI32R instruction (and an intrinsic long __f32toi32r(double src))?
This would remove the nonlinearity around zero that the F32TOI32 instruction has: both positive and negative floats are rounded (truncated) towards zero.
Something like float f; long l; l = (int32_t)(f >= 0.0 ? f + 0.5 : f - 0.5) does it, and there is the F32TOI16R instruction that does it, but for 16 bit values only.
An idea for a next release?
See SPRU514P, Table 7-7. C/C++ Compiler Intrinsics for FPU.
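The difference between the truncating cast and the rounding expression above can be sketched in portable C. This is only an illustration for a host compiler, not the C28x instruction or intrinsic; the names trunc_i32 and round_half_away_i32 are made up for this example.

```c
#include <stdint.h>

/* A plain cast truncates toward zero, like F32TOI32:
   everything in (-1.0, 1.0) ends up at 0. */
static int32_t trunc_i32(float f)
{
    return (int32_t)f;
}

/* Add 0.5 with the sign of f before truncating.
   Halfway cases (x.5) move away from zero. */
static int32_t round_half_away_i32(float f)
{
    return (int32_t)(f >= 0.0f ? f + 0.5f : f - 0.5f);
}
```

With this, -0.75 truncates to 0 but rounds to -1, so the double-width step around zero disappears.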
Regards,
Frank
Hi,
Instead of (f >= 0.0 ? f + 0.5 : f - 0.5) you should use the branchless ((f + 0x0.Cp24f) - 0x0.Cp24f). It adds a constant on the order of 2^23 (0x0.Cp24f is 1.5 * 2^23), then subtracts it again. The result follows the current FP rounding mode. Since that is round to nearest even, it rounds like your expression, except that all halfway values go to even integers: +-0.5 to 0.0, +-1.5 to +-2.0, +-2.5 to +-2.0, +-3.5 to +-4.0 and so on.
Sorry, the +2^23 -2^23 trick is only valid for positive numbers. For negative numbers it is necessary to subtract 2^23 first, then add, so a branch is still required.
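Why the choice of constant matters can be sketched on a host compiler. Assuming IEEE-754 single precision and the default round-to-nearest-even mode, floats in [2^23, 2^24) are spaced exactly 1.0 apart, so any sum landing in that range loses its fraction bits. With 1.5 * 2^23 the sum stays in that range for |f| < 2^22. With plain 2^23 a negative f pushes the sum below 2^23, where the spacing is 0.5. The function names are hypothetical, and the volatile is only there to force the intermediate sum to single precision.

```c
#include <stdint.h>

/* Add and then subtract a large constant. The volatile store forces the
   intermediate sum to be rounded to a 32-bit float (assumption: IEEE-754,
   round to nearest even, no fast-math reassociation). */
static float add_sub_magic(float f, float magic)
{
    volatile float sum = f + magic;
    return sum - magic;
}

/* Round to nearest, halfway cases to even, using 1.5 * 2^23.
   Only meaningful for |f| < 2^22. */
static int32_t round_even_i32(float f)
{
    return (int32_t)add_sub_magic(f, 12582912.0f); /* 1.5 * 2^23 */
}
```

For example, add_sub_magic(-1.5f, 8388608.0f) returns -1.5 unchanged (the 2^23 case), while round_even_i32(-1.5f) gives -2.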
Regards,
Edward
Edward,
I received a mail with your 1st reply, but no mail with the correction added later. Maybe the forum admins want to check that.
I tested your suggestion and got the results below. In my application, it is not necessary to hold the "round x.5 towards even" rule, so I think it is a good solution for my problem.
I did not know the "0x0.Cp24f" syntax, and I did not find it documented. Is it general C or a TI-specific syntax? Where can I read more about it?
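For what it is worth, this notation is a hexadecimal floating constant as defined by ISO C99: hexadecimal mantissa digits, then p and a decimal power-of-two exponent, then an optional f suffix. Whether a particular TI compiler version and language mode accepts it is something to check; the sketch below only assumes a C99 host compiler, and the variable names are made up.

```c
/* 0x0.C is 12/16 = 0.75, and p24 scales by 2^24:
   0.75 * 16777216 = 12582912. */
static const float magic_hex = 0x0.Cp24f;

/* The same value written with a normalized mantissa: 1.5 * 2^23. */
static const float magic_alt = 0x1.8p23f;

/* The same value as an ordinary decimal constant. */
static const float magic_dec = 12582912.0f;
```

All three constants compare equal, so the decimal form can be used wherever the hexadecimal form is not accepted.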
Thanks & regards,
Frank
static inline
int32_t f32toi32r(float32_t f)
{
    return (int32_t)((f + 0x0.Cp24f) - 0x0.Cp24f);
}
floats fr and f32toi32r results ir:
fr
-4.0 -3.875 -3.75 -3.625 -3.5 -3.375 -3.25 -3.125
-3.0 -2.875 -2.75 -2.625 -2.5 -2.375 -2.25 -2.125
-2.0 -1.875 -1.75 -1.625 -1.5 -1.375 -1.25 -1.125
-1.0 -0.875 -0.75 -0.625 -0.5 -0.375 -0.25 -0.125
0.0 0.125 0.25 0.375 0.5 0.625 0.75 0.875
1.0 1.125 1.25 1.375 1.5 1.625 1.75 1.875
2.0 2.125 2.25 2.375 2.5 2.625 2.75 2.875
3.0 3.125 3.25 3.375 3.5 3.625 3.75 3.875
ir
-4 -4 -4 -4 -4 -3 -3 -3
-3 -3 -3 -3 -2 -2 -2 -2
-2 -2 -2 -2 -2 -1 -1 -1
-1 -1 -1 -1 0 0 0 0
0 0 0 0 0 1 1 1
1 1 1 1 2 2 2 2
2 2 2 2 2 3 3 3
3 3 3 3 4 4 4 4
fr
-3.9375 -3.8125 -3.6875 -3.5625 -3.4375 -3.3125 -3.1875 -3.0625
-2.9375 -2.8125 -2.6875 -2.5625 -2.4375 -2.3125 -2.1875 -2.0625
-1.9375 -1.8125 -1.6875 -1.5625 -1.4375 -1.3125 -1.1875 -1.0625
-0.9375 -0.8125 -0.6875 -0.5625 -0.4375 -0.3125 -0.1875 -0.0625
0.0625 0.1875 0.3125 0.4375 0.5625 0.6875 0.8125 0.9375
1.0625 1.1875 1.3125 1.4375 1.5625 1.6875 1.8125 1.9375
2.0625 2.1875 2.3125 2.4375 2.5625 2.6875 2.8125 2.9375
3.0625 3.1875 3.3125 3.4375 3.5625 3.6875 3.8125 3.9375
ir
-4 -4 -4 -4 -3 -3 -3 -3
-3 -3 -3 -3 -2 -2 -2 -2
-2 -2 -2 -2 -1 -1 -1 -1
-1 -1 -1 -1 0 0 0 0
0 0 0 0 1 1 1 1
1 1 1 1 2 2 2 2
2 2 2 2 3 3 3 3
3 3 3 3 4 4 4 4
nonlinearity with ir = (int32_t)fr;
ir
-3 -3 -3 -3 -3 -3 -3 -3
-2 -2 -2 -2 -2 -2 -2 -2
-1 -1 -1 -1 -1 -1 -1 -1
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3
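The width of the step at zero in the tables above can also be counted. The sketch below is for a host compiler and regenerates the second set of inputs (-3.9375 to 3.9375 in steps of 0.125), then counts how many of them map to 0 with a plain cast and with the signed add of 0.5. The function names are hypothetical.

```c
#include <stdint.h>

#define N_SAMPLES 64

/* Sample i of the grid -3.9375, -3.8125, ..., 3.9375 (all exact in float). */
static float sample(int i)
{
    return -3.9375f + 0.125f * (float)i;
}

/* Number of samples that a truncating cast maps to 0. */
static int count_zero_trunc(void)
{
    int i, n = 0;
    for (i = 0; i < N_SAMPLES; ++i) {
        if ((int32_t)sample(i) == 0) ++n;
    }
    return n;
}

/* Number of samples that map to 0 when 0.5 is added with the sign of f. */
static int count_zero_round(void)
{
    int i, n = 0;
    for (i = 0; i < N_SAMPLES; ++i) {
        float f = sample(i);
        if ((int32_t)(f >= 0.0f ? f + 0.5f : f - 0.5f) == 0) ++n;
    }
    return n;
}
```

The cast puts 16 samples on the zero step, twice as many as on any other step, while the rounded version puts 8 there, the same as everywhere else.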
Edward,
again, thank you for the information.
I did not find the __u32_bits_as_f32 and __f32_bits_as_u32 intrinsics documented, neither in SPRUEO2B (FPU), nor in SPRU514P (C28x Compiler), nor in SPRU430F (C28x CPU in), nor in SPRU513P (C28 Assembly Tools). Can anyone from TI please help here? Or is this "Herrschaftswissen" (German for knowledge kept within an inner circle)?
Unfortunately, these intrinsics do not work with the CLA compiler.
Below is a code snippet that tests several versions. It includes an assembler-coded function that uses the NEGF32 instruction as suggested. It seems to do what we want, but I'm not sure whether all conditions (pipeline etc.) are met. It also seems to be the fastest way: 4 + 4 cycles overhead for call/return (although the call may hinder other optimizations). The & 0x80000000 approach looks good with values in memory, but bad in expressions due to the 4-cycle pipeline delay for ACC <> RnH transfers.
Thanks & regards
Frank
static inline
float32_t fsign(float32_t f)
{
    return __u32_bits_as_f32((__f32_bits_as_u32(f) & 0x80000000) | 0x3F800000);
}

volatile float32_t f1, f2, f3;
extern float32_t fsignf(float32_t f);

/* r0h = r0h < 0.0 ? -1.0 : 1.0 */
asm("        .global _fsignf ");
asm("_fsignf: movizf32 r1h, #1.0 ");
asm("        cmpf32 r0h, #0.0 ");
asm("        negf32 r0h, r1h, lt ");
asm("        lretr ");

void
main(void)
{
    f1 = 2.0;
    f2 = fsign(f1);
    f1 = -2.0;
    f3 = f1 * fsign(f1 + f2);
    f3 = fsignf(f1);
    f3 = f1 >= 0.0 ? 1.0 : -1.0;
    f3 = fsignf(-f1);
}
Using
"C:/ti/ccsv7/tools/compiler/ti-cgt-c2000_16.9.1.LTS/bin/cl2000" -v28 -ml -mt --vcu_support=vcu2 --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 -O2 -g --define=CPU1
compiles to:
015155: 761F0487 MOVW DP, #0x487
88 f1 = 2.0;
015157: E8020000 MOVIZ R0, #0x4000
015159: E2030016 MOV32 @0x16, R0H
61 {
01515b: 0616 MOVL ACC, @0x16
90 f1 = -2.0;
01515c: E8060000 MOVIZ R0, #0xc000
89 f2 = fsign(f1);
01515e: 18A88000 AND @AH, #0x8000
015160: 9000 ANDB AL, #0x0
015161: 1AA83F80 OR @AH, #0x3f80
015163: 1E14 MOVL @0x14, ACC
90 f1 = -2.0;
015164: E2030016 MOV32 @0x16, R0H
61 {
015166: E2AF0016 MOV32 R0H, @0x16, UNCF
015168: E2AF0114 MOV32 R1H, @0x14, UNCF
01516a: E7100040 ADDF32 R0H, R0H, R1H
01516c: 7700 NOP
01516d: 7700 NOP
01516e: 7700 NOP
91 f3 = f1 * fsign(f1 + f2);
01516f: BFA90F12 MOV32 @ACC, R0H
015171: 18A88000 AND @AH, #0x8000
015173: 9000 ANDB AL, #0x0
015174: 1AA83F80 OR @AH, #0x3f80
015176: BDA90F12 MOV32 R0H, @ACC
015178: 7700 NOP
015179: 7700 NOP
01517a: 7700 NOP
01517b: E2AF0116 MOV32 R1H, @0x16, UNCF
01517d: E7000008 MPYF32 R0H, R1H, R0H
01517f: 7700 NOP
015180: E2030018 MOV32 @0x18, R0H
92 f3 = fsignf(f1);
015182: E2AF0016 MOV32 R0H, @0x16, UNCF
015184: 7641514B LCR fsignf
015186: 761F0487 MOVW DP, #0x487
015188: E2030018 MOV32 @0x18, R0H
93 f3 = f1 >= 0.0 ? 1.0 : -1.0;
01518a: E2AF0016 MOV32 R0H, @0x16, UNCF
01518c: E5A0 CMPF32 R0H, #0.0
01518d: AD14 MOVST0 NF,ZF
01518e: 6304 SB C$L1, GEQ
01518f: E805FC00 MOVIZ R0, #0xbf80
015191: 6F03 SB C$L2, UNC
015192: E801FC00 MOVIZ R0, #0x3f80
015194: E2030018 MOV32 @0x18, R0H
94 f3 = fsignf(-f1);
015196: E2AF0016 MOV32 R0H, @0x16, UNCF
015198: E6AF0000 NEGF32 R0H, R0H, UNCF
01519a: 7641514B LCR fsignf
01519c: 761F0487 MOVW DP, #0x487
01519e: E2030018 MOV32 @0x18, R0H
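For readers without the TI toolchain, the bit manipulation in fsign() above can be sketched in portable C with memcpy in place of the __f32_bits_as_u32 / __u32_bits_as_f32 intrinsics. This assumes IEEE-754 single precision and a 32-bit uint32_t; the name fsign_bits is made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Keep only the sign bit of f and OR in the bit pattern of 1.0f
   (0x3F800000). The result is +1.0 or -1.0 without a branch. */
static float fsign_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);              /* float bits -> integer */
    u = (u & 0x80000000u) | 0x3F800000u;   /* sign(f) | 1.0         */
    memcpy(&f, &u, sizeof f);              /* integer -> float bits */
    return f;
}
```

One difference from the comparison f >= 0.0 ? 1.0 : -1.0 is negative zero: the bit version returns -1.0 for -0.0, the comparison returns 1.0.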
Richard,
I think I can summarize as follows: a F32TOI32R should do what F32TOI16R does for 16 bit integers, but for 32 bit integers.
I ran my test program again and included the __f32toi16r intrinsic. As discussed with Edward, the treatment of the x.5 case is different: F32TOI16R and Edward's add/sub of 1.5*2^23 round halfway values to the nearest even integer; the branch ((int32_t)(fr[i] < 0.0 ? fr[i] - 0.5 : fr[i] + 0.5);) and my f + halffsign(f) approach do not.
What may be essential for mathematicians and in other applications does not matter in my control application. Since there is always some noise, it is not important which way x.5 is rounded. I just want a "straight-lined staircase" with evenly spaced steps (without the missing step at zero that F32TOI32 has).
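The f + halffsign(f) variant mentioned above can be sketched in portable C as follows. This is a host-side illustration assuming IEEE-754 single precision, not the CLA code itself, and the names half_with_sign_of and round_halfsign_i32 are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Return 0.5 with the sign of f: sign bit of f OR'ed with the bit
   pattern of 0.5f (0x3F000000). */
static float half_with_sign_of(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    u = (u & 0x80000000u) | 0x3F000000u;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Branchless rounding: push f away from zero by 0.5, then truncate.
   Halfway cases go away from zero, not to the nearest even integer. */
static int32_t round_halfsign_i32(float f)
{
    return (int32_t)(f + half_with_sign_of(f));
}
```

Here 2.5 rounds to 3 and -2.5 to -3, whereas the add/subtract of 1.5 * 2^23 gives 2 and -2 for the same inputs.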
Thanks & regards,
Frank
Here is the file; I think you can build a complete test case from it.
/******************************************************************************/
/* testfsign.h - sign / round functions */
/******************************************************************************/

typedef long int32_t;
typedef float float32_t;

/******************************************************************************/
/* enums, defines */
/******************************************************************************/

/* define macros to hide the strange syntax of signf() , halfsignf() usage */
#define fsign(x) (signf(&(x)).f)
#define halffsign(x) (halfsignf(&(x)).f) /* 0.5 * fsign(x) */

/******************************************************************************/
/* function definitions */
/******************************************************************************/

/* A fast version of sign(f) = f >= 0.0 ? 1.0 : -1.0; that does not need the
   slow branches in the cla.
   Unfortunately, the argument must be an addressable variable.
*/
union fi {
    float32_t f;
    int32_t i;
};

static inline
union fi signf(float32_t *f)
{
    register union fi r;

    /* result = 1.0 | sign_bit(f) */
    r.i = 0x3F800000 | ((*(int32_t *)f) & 0x80000000);
    return r;
}

#if 0
static inline
float32_t fsign(float32_t f)
{
    return __u32_bits_as_f32((__f32_bits_as_u32(f) & 0x80000000) | 0x3F800000);
}
#endif

/* halfsignf(x) = 0.5 * fsign(x)
   Good for rounding of floats to integers to avoid the gap around 0.0,
   see below.
*/
static inline
union fi halfsignf(float32_t *f)
{
    register union fi r;

    /* result = 0.5 | sign_bit(f) */
    r.i = 0x3F000000 | ((*(int32_t *)f) & 0x80000000);
    return r;
}

static inline
int32_t f32toi32r_(float32_t f)
{
    return (int32_t)(f + halffsign(f));
}

/* https://e2e.ti.com/support/microcontrollers/c2000/f/171/t/719504 */
static inline
int32_t f32toi32r(float32_t f)
{
    // return (int32_t)((f + 0x0.Cp24f) - 0x0.Cp24f);
    /* round adding/subtracting 1.5 * 2^23 */
    return (int32_t)((f + 12582912.0) - 12582912.0);
}

#if TEST_ROUND || 1 && !defined(__TMS320C28XX_CLA__)

#if 0 /* not better */
float32_t onedotfive2e23 = 12582912.0;

static inline
int32_t f32toi32r(float32_t f)
{
    // return (int32_t)((f + 0x0.Cp24f) - 0x0.Cp24f);
    /* round adding/subtracting 1.5 * 2^23 */
    return (int32_t)((f + onedotfive2e23) - onedotfive2e23);
}
#endif

/* demonstrate several methods of rounding floats to integers */
#define NROUND 64   /* # of values to round */
#define RROUND 4.0  /* round -range ... range */
#define EROUND 0.0  /* epsilon added to values, e.g. 0.0625 */

float32_t fr[NROUND];
int32_t ir[NROUND];

void testround()
{
    int16_t i;

    for ( i = 0; i < NROUND; ++i ) {
        fr[i] = EROUND - RROUND + (float32_t)(2 * i) * RROUND / (float32_t)NROUND;
    }
    /* fr = -2.0 -1.75 -1.5 -1.25 -1.0 -0.75 -0.5 -0.25 0.0 0.25 0.5 0.75 1.0 1.25 1.5 1.75 */

    for ( i = 0; i < NROUND; ++i ) {
        /* round with gap around 0 */
        ir[i] = (int32_t)fr[i];
    }
    /* ir = -2 -1 -1 -1 -1 0 0 0 0 0 0 0 1 1 1 1 */

    for ( i = 0; i < NROUND; ++i ) {
        /* add of 0.5 only shifts the gap */
        ir[i] = (int32_t)(fr[i] + 0.5);
    }
    /* ir = -1 -1 -1 0 0 0 0 0 0 0 1 1 1 1 2 2 */

    for ( i = 0; i < NROUND; ++i ) {
        /* close the gap adding 0.5 * fsign(x) */
        //ir[i] = (int32_t)(fr[i] + halffsign(fr[i]));
        ir[i] = (int32_t)(fr[i] < 0.0 ? fr[i] - 0.5 : fr[i] + 0.5);
    }
    /* ir = -2 -2 -2 -1 -1 -1 -1 0 0 0 1 1 1 1 2 2 */

    for ( i = 0; i < NROUND; ++i ) {
        /* close the gap adding 0.5 * fsign(x) */
        ir[i] = (int32_t)(fr[i] + halffsign(fr[i]));
        //ir[i] = f32toi32r_(fr[i]);
    }
    /* ir = -2 -2 -2 -1 -1 -1 -1 0 0 0 1 1 1 1 2 2 */

    for ( i = 0; i < NROUND; ++i ) {
        ir[i] = (int32_t)__f32toi16r(fr[i]);
    }
    /* ir = -2 -2 -2 -1 -1 -1 0 0 0 0 0 1 1 1 2 2 */

    for ( i = 0; i < NROUND; ++i ) {
        ir[i] = f32toi32r(fr[i]);
    }
    /* ir = -2 -2 -2 -1 -1 -1 0 0 0 0 0 1 1 1 2 2 */
}

#endif /* TEST_ROUND */