TMS570LS1227, fast library with sinf, cosf, atan2

Other Parts Discussed in Thread: TMS570LS1227, TMS320F28335, TMS570LS3137

Hello,

I'm running some performance tests on the TMS570LS1227 demo board, and I want to measure the execution time of an existing code snippet. The goal is to find out whether the TMS570LS1227 performs worse than, the same as, or better than the Delfino 28335 for my application.

My test code includes some FOC calculations such as Park, Clarke, and inverse Park transforms and PI controllers. It is quite math-heavy, because this is part of the target application.

I compare exactly the same code on the TMS320F28335 and the TMS570LS1227.
Some #ifdefs help me call the correct library functions (such as sinf on the 570LS1227 and sin_f32 on the 28335).

The 28335 test application is built with -o3 and runs in RAM at 150 MHz. It is linked with the rts2800_fpu32_fast_supplement.lib.

The TMS570LS1227 test application is built with -o3 and runs from FLASH / RAM at 180 MHz. No special library is included so far.

I tried running it from both FLASH and RAM. There is no big difference in execution time (<1 us). The sinf, cosf, and sqrtf functions are still in FLASH.

I'm speaking about execution times on the TMS570LS1227 of about 16 us, measured with GPIO and PMU. The same code on the 28335 needs about 9.5 us.

So I'm not looking for ns.

I'm sure there is more somewhere in the TMS570LS1227...


Long introduction for a short question:

Is there something similar to the rts2800_fpu32_fast_supplement.lib for the TMS570LS1227?

I only need sinf, cosf, atan2f, and sqrtf; the rest is simple math such as multiplication, division, addition, and subtraction.

Thank you very much!

Best Regards,

Roger

  • Update:

    OK, I found the CMSIS DSP library and included these functions:

    arm_sin_cos_f32()

    arm_sin_f32()

    The arm_sqrt_f32 function points to the sqrtf function, and an atan2 function does not exist in CMSIS.

    I put all the code and the sin/cos tables into RAM, and with this I got execution speeds nearly as fast as on the 28335:

    TMS320F28335: 9.5 us

    TMS570LS1227: 10.3 us

    Measurements with the PMU showed that atan2f needs about 1.6 us (15%!).

    Any idea what I can do to get even more performance?

    Thank you for every hint!

    Roger

  • Roger,

    You can check out the processor optimization hints at:

    http://processors.wiki.ti.com/index.php/ARM_compiler_optimizations

    There is also a link for specific FPU optimization options near the bottom of this page.

    Let me know if this helps.

    Regards, Sunil

  • Roger,


    Also, the Cortex-R4F FPU includes machine instructions to perform floating-point SQRT and DIV, so you don't necessarily need a special optimized function. You might want to check what these functions are doing. If they're putting exception handling code around the machine instruction, you might be able to decide you don't need it by restricting the inputs.

    These instructions also take 2 cycles to issue with a 16-cycle latency. So it's cheap to issue them, but you might find you are waiting on the result. If this is the case, there might be a way to rearrange the order of computations to hide the latency (i.e., do some other independent operations while waiting for the result).

     

  • Hello Sunil & Anthony.

    Thank you very much for your hints!

    I checked Sunil's tip about optimization settings. Setting --fp_mode=relaxed resulted in much better performance!

    I have not checked Anthony's tip yet, but it looks interesting. I already saw the __sqrt() intrinsic (because of fp_mode=relaxed), but I will double-check what kind of DIV is generated if I write code like this: a = b / c;

    Additionally I will look for an optimized atan2 function. I'll report what I find.

    Roger

  • Just to inform:

    My final tests showed the following calculation times for my test code:

    TMS320F28335 (150MHz): 1277 cycles -> 8.51us

    TMS570LS1227 (180 MHz): 1050 cycles -> 5.83us

    For this I replaced the atan2 function (on the LS1227) with a lookup function (I ported it from the 28335 fast supplement lib).

    I compared the fastest version possible on each target. I used for both targets:

    -o3 --fp_mode=relaxed --fp_reassoc=on --single_inline --optimize_with_debug=on, all code running in RAM

    • for 28335 best results were with --opt_for_speed=4
    • for LS1227 best results were with --opt_for_speed=2
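For reference, the cycle counts above convert to time as cycles divided by the clock frequency in MHz. A small helper, with the hypothetical name `cycles_to_us`, makes the arithmetic explicit:

```c
/* Convert a cycle count to microseconds for a given core clock in MHz.
   cycles_to_us is a hypothetical helper name used for illustration. */
double cycles_to_us(unsigned long cycles, double clock_mhz)
{
    return (double)cycles / clock_mhz;
}
```

For the numbers in this post: 1277 cycles at 150 MHz is about 8.51 us, and 1050 cycles at 180 MHz is about 5.83 us.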

    Doesn't look too bad now :)

    Roger

  • FYI: Linear interpolation between table values can be optimized to:

    • cos_f32(x): 40-45 cycles
    • sin_f32(x): 48-53 cycles
    • with a table of 4096+1 entries, the maximum error is 4e-7
    • measured on TMS570LS3137

    Jiri

  • Thank you Jiri,

    For my tests I used the CMSIS DSP library, which offers lookups for sin / cos.

    For my calculations I see that the TMS570LS1227 has more performance than the TMS320F28335 but less performance than the TMS320F346 (@300MHz).

    It seems that it should work for my application, so I will continue with testing... I'm sure I will post some more questions here :)

    Best regards and thank you!

    Roger