Tool/software: TI C/C++ Compiler
I want to use the I16TOF32 instruction from C. An int-to-float cast instead compiles to a load into ACC (which sign-extends the 16-bit value to 32 bits, since SXM is set), a move to a floating-point register, and then a conversion to float with I32TOF32. Below is a snippet of the assembly generated by the C2000 16.9.2.LTS compiler at full optimisation (it has unrolled the loop by a factor of 2).
MOVB XAR6,#24
MOVL XAR5,#_b
MOVL XAR4,#_a
SETC SXM
RPTB $C$L2,AR6
; repeat block starts
MOV ACC,*XAR4++
MOV32 R0H,ACC
MOV ACC,*XAR4++
MOV32 R3H,ACC
NOP
NOP
NOP
I32TOF32 R1H,R0H
I32TOF32 R0H,R3H
MOV32 *XAR5++,R1H
MOV32 *XAR5++,R0H
; repeat block ends
$C$L2:
This compiler-generated code is too slow for my use case: 11 cycles per 2 outputs, versus 2 cycles per 2 outputs for hand-pipelined assembly. Is there a way to make the compiler emit I16TOF32 without using inline assembly (which disables some or most optimisations) and without rewriting my algorithms in assembly?