Tool/software: TI C/C++ Compiler
The C28x+FPU supports conditional moves of floating-point registers via the following instructions:
MOV32 RaH, RbH {, CNDF}
MOV32 RaH, mem32 {, CNDF}
MOV32 mem32, RbH {, CNDF}
The compiler (v18.12.1.LTS) will not emit these instructions.
For example, the code:
void fp_conditional_move(float *a, float b)
{
    *a = (b == .5f) ? -.5f : b;
}
generates the following assembly at -O2, -O3, and -O4 (trimmed for brevity):
||fp_conditional_move||:
ADDB SP,#2 ; [CPU_ARAU]
CMPF32 R0H,#16128 ; [CPU_FPU] |3|
MOVST0 ZF, NF ; [CPU_FPU] |3|
B ||$C$L1||,NEQ ; [CPU_ALU] |3|
MOVIZ R0H,#48896 ; [CPU_FPU] |3|
||$C$L1||:
SUBB SP,#2 ; [CPU_ARAU]
MOV32 *+XAR4[0],R0H ; [CPU_FPU] |3|
LRETR ; [CPU_ALU]
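To make the immediates in this listing easier to read: #16128 is 0x3F00 and #48896 is 0xBF00, which are the upper 16 bits of 0.5f and -0.5f. Below is a minimal host-side sketch that checks this. It assumes the host uses IEEE-754 binary32 for float and that the instruction places the immediate in the upper half with the lower 16 bits zeroed. The helper names are hypothetical and not part of any TI library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a float whose upper 16 bits are imm16 and
   whose lower 16 bits are zero (assumed behaviour of the 16-bit
   floating-point immediates used in the listing). */
static float f32_from_high16(uint16_t imm16)
{
    uint32_t bits = (uint32_t)imm16 << 16;
    float value;
    memcpy(&value, &bits, sizeof value);  /* reinterpret bits, no conversion */
    return value;
}

/* Self-check for the two immediates that appear in the listing. */
static void check_listing_immediates(void)
{
    assert(f32_from_high16(16128u) == 0.5f);   /* 0x3F00 -> 0x3F000000 */
    assert(f32_from_high16(48896u) == -0.5f);  /* 0xBF00 -> 0xBF000000 */
}
```

So the CMPF32 compares R0H against 0.5f, and the MOVIZ loads -0.5f.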
Including the seemingly superfluous stack pointer operations, this typically takes 12 cycles by my count. That assumes the branch does not cause a prefetch miss, which would add Flash wait-state cycles.
This looks like inefficient code to me. I expected a conditional overwrite of the R0H register with the value -0.5f, i.e. (my comments):
||fp_conditional_move||:
; *XAR4 = (R0H == .5f) ? -.5f : R0H
MOVIZ R1H,#48896 ; R1H = -.5f (R1H is caller-saved, i.e. trash)
CMPF32 R0H,#16128 ; R0H == .5f (sets ZF if equal)
MOV32 R0H, R1H, EQ ; if (R0H == .5f) R0H = R1H
MOV32 *+XAR4[0],R0H ; *XAR4 = R0H
LRETR ; [CPU_ALU]
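As a rough check of the logic only, the hand-written sequence can be modelled in plain C and compared against the original function. This is a sketch under assumptions: local variables stand in for R0H and R1H, `zf` stands in for the FPU zero flag, and the function names `fp_conditional_move_model` and `models_agree` are hypothetical. It says nothing about pipeline timing or flag latency on the real device.

```c
#include <assert.h>

/* The C source from the post. */
static void fp_conditional_move(float *a, float b)
{
    *a = (b == .5f) ? -.5f : b;
}

/* Hypothetical C model of the hand-written sequence, one line per
   instruction. Variables stand in for registers and flags. */
static void fp_conditional_move_model(float *a, float b)
{
    float r0h = b;             /* argument arrives in R0H           */
    float r1h = -.5f;          /* MOVIZ  R1H,#48896                 */
    int zf = (r0h == .5f);     /* CMPF32 R0H,#16128                 */
    if (zf) { r0h = r1h; }     /* MOV32  R0H,R1H,EQ (modelled only) */
    *a = r0h;                  /* MOV32  *+XAR4[0],R0H              */
}

/* Returns 1 when both versions store the same value for input b.
   NaN inputs are out of scope, since NaN never compares equal. */
static int models_agree(float b)
{
    float expected, actual;
    fp_conditional_move(&expected, b);
    fp_conditional_move_model(&actual, b);
    return expected == actual;
}
```

Calling `models_agree` with 0.5f, -0.5f, 0.0f, and an arbitrary value such as 1.25f exercises both the taken and not-taken cases of the condition.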
This code is branchless and takes 4 cycles by my count. Going from 12 cycles to 4 makes it 3x as fast (a 200% speed-up) as the compiler-generated branching code.
Is my conditional move code correct?
Why does the compiler not emit conditional moves?
I have previously worked on C6000 cores where the compiler is very happy to emit conditional instructions.