TI-CGT: ARM-CGT-CLANG 1.3.0 libclang_rt.builtins.a eabihf implementation of __floatdidf

Part Number: TI-CGT
Other Parts Discussed in Thread: AM6442

Dear TI team,

I'm currently looking into performance issues in some parts of our code running on an R5F core on an AM6442.

I've noticed that our code spends a lot of time executing the __floatdidf function to convert 64-bit integers to double precision floats.

The library contains a floatdidf.S.obj that implements the conversion using integer operations. "Upstream" llvm/clang doesn't seem to contain this floatdidf.S implementation for ARM, and should thus rely on the C implementation in floatdidf.c, which has separate implementations for hard-float and soft-float. The hard-float version splits the 64-bit integer into two 32-bit halves and combines them using floating-point operations that the VFPv3 hardware supports, so the conversion executes on the FPU.
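For readers unfamiliar with the hard-float approach mentioned above, here is a minimal sketch of the idea. It is a paraphrase of the technique, not a verbatim copy of the upstream file and not the code shipped in libclang_rt.builtins.a. The function name `floatdidf_hardfp_sketch` is hypothetical; the real runtime entry point is `__floatdidf`. The sketch assumes IEEE 754 doubles, round-to-nearest, and an arithmetic right shift for negative signed values.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name for illustration; the runtime symbol is __floatdidf. */
double floatdidf_hardfp_sketch(int64_t a)
{
    static const double twop52 = 4503599627370496.0; /* 2^52 */
    static const double twop32 = 4294967296.0;       /* 2^32 */

    /* Place the low 32 bits of `a` into the mantissa of 2^52.
       Reinterpreted as a double, the result is exactly 2^52 + low32. */
    uint64_t low_bits;
    double low;
    memcpy(&low_bits, &twop52, sizeof low_bits);
    low_bits |= (uint64_t)a & UINT64_C(0x00000000ffffffff);
    memcpy(&low, &low_bits, sizeof low);

    /* Convert the signed high word with a 32-bit int-to-double conversion
       and scale it by 2^32. Both steps are exact.
       Assumption: >> on a negative int64_t is an arithmetic shift. */
    const double high = (double)(int32_t)(a >> 32) * twop32;

    /* (high - 2^52) is exact, so the final addition rounds only once. */
    return (high - twop52) + low;
}
```

On a host machine, the result can be compared against the compiler's own `(double)a` cast for a few values as a sanity check. Every step here maps onto 32-bit integer-to-double conversions and double-precision arithmetic, which is why this variant can run on the FPU.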

  • Is there a reason why __floatdidf is implemented using integer arithmetic in the armv7r-ti-none-eabihf version of libclang_rt.builtins.a?
  • Did TI base the arm-cgt-clang on a specific upstream version that I could look at for reference? The libclang* stuff is in a folder "12.0.1", but that seems unlikely, since LLVM 12.0.1 was released AFTER arm-cgt-clang 1.3.0.

I've tried overriding the library-provided version with a copy of the upstream llvm/clang code, and that seems to improve the performance of my specific code sequence by about 50%, i.e. ~45,000 cycles instead of ~90,000 cycles.

I could of course just use my workaround, but I'd like to understand where the code in libclang_rt.builtins.a comes from, and why the eabihf version of that code doesn't use the FPU functions to optimize this conversion.

Regards,

Dominic

  • Dominic,

    Our compiler expert is out today but will be back tomorrow (Tuesday) and will reply then.

    Regards,

    John

  • Thank you for notifying us of this performance problem.  We have not completed our analysis, but your description of the problem appears to be correct.  We intend to fix this problem in the next major release.

    Thanks and regards,

    -George

  • The issue EXT_EP-10493 has been filed to have this issue investigated.  You are welcome to follow it at that link.

    Thanks and regards,

    -George

  • Hello George,

    thanks for filing this issue.

    It would be great if you could share some details of your analysis. For this particular algorithm, I believe __floatdidf was the only issue; at least, I couldn't spot anything obvious afterwards. However, I'm wondering whether other parts of the compiler support libraries are also using less-than-optimal code sequences for a VFP-enabled device.

    "We intend to fix this problem in the next major release."

    Any idea when that next major release is planned?

    Regards,

    Dominic

  • Hi Dominic,

    Your initial analysis revealed that the tiarmclang compiler is using a "generic" assembly implementation of the floating-point runtime function with the incorrect assumption that it will provide smaller code size and/or better performance than a C implementation. We anticipate that this may be the case with some other floating-point runtime functions as well as __floatdidf, so in addition to correcting the issue with __floatdidf, the issue report that George filed will also incorporate checking and fixing other floating-point runtime routines as needed.

    We are still in the planning stages of the next major release project, so I am not able to communicate a firm date for when that release will happen. However, we expect that a pre-release containing some bug fixes and some of the features planned for the next major release will be made available in the fall of this year (late Oct/early Nov).

    Regards,

    Todd Snider

    TI Arm Clang Compiler Tools Team