Looking for native assembly language sqrt() equivalent to accelerate performance

Hi there,

I’m using IAR EWARM 7.1 with the TI TMS570 MCU, ARM Cortex-R4F core with floating point unit (FPU).

In my application I need to compute the square root of a floating-point number very frequently, so I've included the standard math.h header and call sqrt() in my code.

  1. Under “General Options”, I selected Library Configuration=Normal
  2. Under “Linker” I set “Automatic runtime library selection”

With the MCU clocked at 120 MHz, a call to sqrt() averages about 0.5 µs (measured), which is too slow for our application.

Is there an assembly-language sqrt() available that uses the native instructions of the Cortex-R4F for this computation?

Thanks!
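
For context, the measured time can be converted into a CPU cycle count, which makes it easier to compare against instruction timings. A minimal sketch in C, using the 120 MHz clock and 0.5 µs figure from the post; the function name `ns_to_cycles` is made up for illustration:

```c
#include <stdint.h>

/* Convert a duration in nanoseconds to CPU cycles at a given clock in MHz.
   ns_to_cycles is a hypothetical helper name, not part of any library.
   cycles = ns * MHz / 1000, because 1 MHz is one cycle per 1000 ns. */
static uint32_t ns_to_cycles(uint32_t duration_ns, uint32_t clock_mhz)
{
    return (uint32_t)(((uint64_t)duration_ns * clock_mhz) / 1000u);
}
```

`ns_to_cycles(500, 120)` gives 60, so each sqrt() call costs roughly 60 cycles at 120 MHz.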

  • Chuck,

    TI's ARM compiler will generate VSQRT instructions instead of function calls to sqrt, sqrtf, and sqrtl if you enable --fp_mode=relaxed (errno is not set if you pass a negative input, but it's very fast).

    See page 30 of SPNU151i if you want to reference this for IAR.

    I skimmed the IAR compiler manuals for V 7.10 and I didn't see anything that is equivalent, but we're not experts on the IAR compiler.  I suggest contacting IAR to see if they can perform the same sort of optimization  (you can reference the TI compiler as an example).

     

  • Hi Anthony,

    Thanks for the quick reply.

    According to your post, VSQRT seems to be a native instruction of the Cortex-R4F. Would it be possible for you to wrap it in an assembly function that accepts a floating-point parameter and returns the result in a register, all in a .s assembly file?

    I know that I'm asking too much since I'm not familiar with the assembly language of this ARM core. :)

    Many thanks!

  • Hi Chuck,
    This might be what IAR winds up recommending that you do.
    But I think you want to ask if it can be inlined so that you avoid the overhead of a function call.

    Best Regards,

    Anthony

  • Yes, when the FPU is set to VFPv3, I get the following code. I think this is about the best one can do for performance. Do you agree?

    As you can see, there is still some code that needs to be executed for parameter passing and result returning...

    BTW, I miscalculated the time taken by sqrt(); it should be approximately 0.5 µs with the MCU clocked at 120 MHz.

  • Hi Chuck,

    OK, that's good. If you are happy with the performance, you can obviously stop there.

    If not you could try inlining the sqrtf function.  

    The optimization I referenced in the TI compiler would be much faster. It wouldn't have the function call, and it wouldn't check for a negative input value like the code above does (I believe that's what the BMI is: branch if minus).

    Anyway, that's not the default behaviour for the TI compiler, but if you can guarantee that your inputs are non-negative then you could use it. In your case you could probably take the sqrt of the abs of the amplitude, which would guarantee you never pass a negative number to sqrt. It might be faster that way than having the check for negative: VABS is 1 cycle plus 1 cycle of result latency, so I think it would be faster.