Compiler/TMS320F28377D: Compiler does not emit conditional moves of floating-point registers

Part Number: TMS320F28377D

Tool/software: TI C/C++ Compiler

The C28x+FPU supports conditional moves of floating-point registers via the following instructions:

MOV32 RaH, RbH {, CNDF}

MOV32 RaH, mem32 {, CNDF}

MOV32 mem32, RbH {, CNDF}

The compiler (v18.12.1.LTS) will not emit these instructions.

For example, the code:

void fp_conditional_move(float *a, float b)
{
    *a = ( b == .5f ) ? -.5f : b;
}

generates the following assembly at -O2, -O3, and -O4 (trimmed for brevity):

||fp_conditional_move||:

        ADDB      SP,#2                 ; [CPU_ARAU]
        CMPF32    R0H,#16128            ; [CPU_FPU] |3|
        MOVST0    ZF, NF                ; [CPU_FPU] |3|
        B         ||$C$L1||,NEQ         ; [CPU_ALU] |3|
        MOVIZ     R0H,#48896            ; [CPU_FPU] |3|
||$C$L1||:   
        SUBB      SP,#2                 ; [CPU_ARAU]
        MOV32     *+XAR4[0],R0H         ; [CPU_FPU] |3|
        LRETR     ; [CPU_ALU]

Including the superfluous (?) stack pointer operations, this will typically take 12 cycles, assuming the branch doesn't cause a prefetch miss, which would add Flash wait-state cycles.
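For reference, the immediates #16128 and #48896 in the listing are the upper 16 bits of the IEEE-754 single-precision bit patterns of 0.5f and -0.5f (0x3F00 and 0xBF00). A minimal host-side sketch in C that reproduces those numbers, assuming float is a 32-bit IEEE-754 type; the helper names float_high16 and fits_high16 are hypothetical and not part of any TI header:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: upper 16 bits of a float's IEEE-754 bit pattern,
 * i.e. the form of the immediates seen in the MOVIZ and CMPF32 lines. */
static uint16_t float_high16(float value)
{
    uint32_t bits;

    /* Assumption: float is a 32-bit IEEE-754 single. */
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &value, sizeof bits);   /* well-defined type pun */
    return (uint16_t)(bits >> 16);
}

/* Hypothetical helper: nonzero if the lower 16 bits of the pattern are
 * zero, so the value is fully described by one 16-bit high immediate. */
static int fits_high16(float value)
{
    uint32_t bits;

    memcpy(&bits, &value, sizeof bits);
    return (bits & 0xFFFFu) == 0u;
}
```

float_high16(0.5f) gives 16128 and float_high16(-0.5f) gives 48896, matching the listing. A constant such as 0.1f does not pass fits_high16, so it could not be expressed with a single immediate of this form.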

This looks like inefficient code to me. I expected a conditional overwrite of the R0H register with the value -0.5f, i.e. (my comments):

||fp_conditional_move||:
        ; *XAR4 = (R0H == .5f) ? -.5f : R0H
        MOVIZ     R1H,#48896            ; R1H = -.5f (R1H is caller-saved, i.e. trash)
        CMPF32    R0H,#16128            ; R0H == .5f (sets ZF if equal)
        MOV32     R0H, R1H, EQ          ; if (R0H == .5f) R0H = R1H
        MOV32     *+XAR4[0],R0H         ; *XAR4 = R0H
        LRETR     ; [CPU_ALU]

This code is branchless and takes 4 cycles, i.e. three times as fast as (a 200% speedup over) the compiler-generated branching code.
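As a sanity check of the logic only (not of cycle counts, pipeline behaviour, or flag timing on the FPU), the hand-written sequence can be modelled in plain C and compared against the original ternary on a host. This is a sketch: the function names are hypothetical, and the `if` merely models the conditional move, so a host compiler may well implement it with a branch.

```c
#include <string.h>

/* The expression from fp_conditional_move, returned instead of stored. */
static float ternary_form(float b)
{
    return (b == 0.5f) ? -0.5f : b;
}

/* Step-by-step model of the hand-written assembly sequence. */
static float cmov_model(float r0h)
{
    const float r1h = -0.5f;          /* MOVIZ  R1H,#48896   */
    const int equal = (r0h == 0.5f);  /* CMPF32 R0H,#16128   */

    if (equal) {                      /* MOV32  R0H, R1H, EQ */
        r0h = r1h;
    }
    return r0h;                       /* value stored to *XAR4 */
}

/* Bitwise comparison, so NaN inputs can be compared as well. */
static int forms_agree(float b)
{
    const float x = ternary_form(b);
    const float y = cmov_model(b);

    return memcmp(&x, &y, sizeof x) == 0;
}
```

Calling forms_agree over a set of inputs of interest (0.5f, -0.5f, 0.0f, and so on) checks that the two forms compute the same value. It says nothing about whether the sequence is valid on the device itself.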

Is my conditional move code correct?

Why does the compiler not emit conditional moves?

I have previously worked on C6000 cores where the compiler is very happy to emit conditional instructions.

  • This does appear to be a performance problem in the compiler. I filed the entry CODEGEN-6788 in the SDOWP system to have this investigated. It is not filed as a bug, because the code executes correctly. Instead, it asks for the compiler to be changed to emit faster code for this input. Sometimes the requested performance improvement is impossible or impractical. If this turns out to be the case, an explanation will be given.

    You are welcome to follow it with the SDOWP link below in my signature.

    Thanks and regards,

    -George

  • Hi,

    Where are you seeing the conditional store instruction listed? The floating-point architecture reference guide we have lists only conditional floating-point loads and register-to-register moves, with no conditional floating-point store instructions.

    Thanks,

    Anna Youssefi

    Compiler Support Team

  • Hello Anna,

    Looks like I imagined the "MOV32 mem32, RaH {, CNDF}" instruction. I cannot find it in my documentation and I have confirmed my assembler (18.12.1.LTS) emits an error on that instruction. I probably assumed it existed for symmetry reasons.

    I have confirmed the assembler accepts the "MOV32 RaH, mem32 {, CNDF}" and "MOV32 RaH, RbH {, CNDF}" instructions.

    Thanks,

    Iain