Other Parts Discussed in Thread: AM6442
Dear TI team,
I'm currently looking into performance issues in some parts of our code running on an R5F core of an AM6442.
I've noticed that our code spends a lot of time executing the __floatdidf function to convert 64-bit integers to double precision floats.
The library contains a floatdidf.S.obj that implements the conversion using integer operations. "Upstream" llvm/clang doesn't seem to contain this floatdidf.S implementation for ARM, and should thus fall back on the C implementation in floatdidf.c, which has separate code paths for hard-float and soft-float. The hard-float path splits the 64-bit integer into two 32-bit halves and combines them with floating-point operations that the VFPv3 hardware supports, so the conversion executes on the FPU.
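For reference, the upstream hard-float technique looks roughly like the sketch below (paraphrased from my reading of compiler-rt's floatdidf.c; the function is renamed here so it doesn't collide with the actual `__floatdidf` builtin, and details may differ from the exact upstream source):

```c
#include <stdint.h>

/* Sketch of the hard-float 64-bit-int-to-double conversion, in the
 * style of upstream compiler-rt's floatdidf.c (paraphrased, renamed).
 * Idea: each 32-bit half of the input is exactly representable as a
 * double, so the FPU can assemble the result with a single rounding. */
double float_s64_to_f64(int64_t a) {
  static const double twop52 = 4503599627370496.0; /* 0x1.0p52 */
  static const double twop32 = 4294967296.0;       /* 0x1.0p32 */

  union { int64_t x; double d; } low = { .d = twop52 };

  /* High half: a 32-bit signed int times 2^32 is exact in a double. */
  const double high = (int32_t)(a >> 32) * twop32;

  /* Low half: OR the low 32 bits into the mantissa of 2^52, which
   * yields exactly twop52 + (uint32_t)a without any integer-to-float
   * instruction. */
  low.x |= a & INT64_C(0x00000000ffffffff);

  /* The only rounding happens in this final addition. */
  return (high - twop52) + low.d;
}
```

On a VFPv3 core this compiles to a handful of FPU instructions, whereas the integer-only soft-float version has to normalize and round the 64-bit value manually.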
- Is there a reason why __floatdidf is implemented using integer arithmetic in the armv7r-ti-none-eabihf version of libclang_rt.builtins.a?
- Did TI base arm-cgt-clang on a specific upstream version that I could look at for reference? The libclang* files sit in a folder named "12.0.1", but that seems unlikely to be the base version, since LLVM 12.0.1 was released AFTER arm-cgt-clang 1.3.0.
I've tried overriding the library-provided version with a copy of the upstream llvm/clang code, and that roughly halves the cycle count of my specific code sequence: ~45,000 cycles instead of ~90,000.
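For completeness, my override is nothing more than compiling the upstream floatdidf.c myself and placing the resulting object ahead of the runtime library on the link line, so the linker resolves `__floatdidf` from my object before it searches libclang_rt.builtins.a. Roughly (flags and file names are from my setup and are illustrative, not authoritative):

```shell
# Compile the upstream compiler-rt floatdidf.c for the hard-float ABI
# of the Cortex-R5F (VFPv3-D16). Flags are from my setup; adjust to taste.
tiarmclang -mcpu=cortex-r5 -mfloat-abi=hard -mfpu=vfpv3-d16 -O2 \
    -c floatdidf.c -o floatdidf.o

# Link the object before the runtime library so __floatdidf is taken
# from floatdidf.o rather than from libclang_rt.builtins.a.
tiarmclang app.o floatdidf.o -o app.out
```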
I could of course just keep my workaround, but I'd like to understand where the code in libclang_rt.builtins.a comes from, and why the eabihf version doesn't use FPU instructions to speed up this conversion.