TMS320C6674: Cycles for Sine and cosine functions

Part Number: TMS320C6674
Other Parts Discussed in Thread: MATHLIB

Hi Team,

Our customer is looking at our core benchmark data shown on https://www.ti.com/processors/digital-signal-processors/core-benchmarks/core-benchmarks.html

They are particularly interested in the C66x core. Their main concern right now is the number of cycles required by the single-precision sine and cosine functions. Do we have this data characterized and on hand?

Thanks in advance!


Kind Regards,

Jejomar


  • Hi,

    Those benchmarks are summarized in https://www.ti.com/lit/an/sprac13/sprac13.pdf. Unfortunately, I don't have single-precision sin and cos numbers on hand.

    That should be doable using MATHLIB on C66x. Let me see whether I can reproduce the existing results for arctan, log10 and square root, and then move on to the sin and cos functions.

    Regards, Eric 

  • Hi,

    I tried this with mathlib_c66x_3_1_2_4. All the test example CCS projects are under \packages\ti\mathlib\src.

    For square root I reproduced the same number. For arctan2 and log10 I got slightly better performance than the reported numbers, 2 cycles fewer in both cases. So my setup and methodology look sound.

    • Then for sin, the project is \sinsp\c66\sinsp_66_LE/BE_ELF:

    --------------------------------------------------------------------------------
    Cycle Profile: sinSP
    --------------------------------------------------------------------------------
    RTS: 164 cycles
    ASM: 95 cycles
    C: 95 cycles
    Inline: 74 cycles
    Vector: 10 cycles
    --------------------------------------------------------------------------------

    • For cos, the project is cossp\c66\cossp_66_LE/BE_ELF

    --------------------------------------------------------------------------------
    Cycle Profile: cosSP
    --------------------------------------------------------------------------------
    RTS: 175 cycles
    ASM: 101 cycles
    C: 106 cycles
    Inline: 97 cycles
    Vector: 10 cycles
    --------------------------------------------------------------------------------

    The figure reported at https://www.ti.com/processors/digital-signal-processors/core-benchmarks/core-benchmarks.html is the cycle count of the vector implementation. So please tell the customer that single-precision sin() and cos() each take 10 cycles per value when the vector version is used.

    If the customer wants numbers for additional math operations, they can import, build and run the CCS projects under \packages\ti\mathlib\src.
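
    To put the per-value figures in perspective, here is a small sketch that scales the sinSP numbers from the profile above to a buffer of n values. It is plain arithmetic on the quoted cycle counts; the helper names are made up for illustration, and the one-time loop entry and exit overhead of the vector version is ignored.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Per-value cycle counts copied from the sinSP profile above. */
    enum { SINSP_RTS = 164, SINSP_C = 95, SINSP_INLINE = 74, SINSP_VECTOR = 10 };

    /* Hypothetical helper: estimated cycles for n values at a fixed
       per-value cost. Loop setup and teardown are not modeled. */
    static unsigned long estimate_cycles(unsigned long n,
                                         unsigned long cycles_per_value)
    {
        assert(cycles_per_value > 0);
        return n * cycles_per_value;
    }

    /* Print the estimate for each implementation for a buffer of n values. */
    static void print_sinsp_estimates(unsigned long n)
    {
        printf("RTS:    %lu cycles\n", estimate_cycles(n, SINSP_RTS));
        printf("C:      %lu cycles\n", estimate_cycles(n, SINSP_C));
        printf("Inline: %lu cycles\n", estimate_cycles(n, SINSP_INLINE));
        printf("Vector: %lu cycles\n", estimate_cycles(n, SINSP_VECTOR));
    }
    ```

    For example, print_sinsp_estimates(1024) reports 167936 cycles for RTS against 10240 cycles for the vector version.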

    Regards, Eric

  • Thank you all very much for your help.

    I happened to just find this webpage, where all these execution times are given: software-dl.ti.com/.../MATHLIB_c66x_TestReport.html

    Now the only remaining questions I have, regarding this webpage, are:

    -What do "RTS", "C", "Inline", "Vector" mean? What is the difference?

    -Can I assume that the number of floating-point operations performed is equal to the number of cycles?

    Best regards.

  • Hello!

    RTS stands for run-time support: a library implemented in the most traditional way, much like in any other compiler. However, DSPs are special beasts, and they can deliver far better performance when used properly. MATHLIB is an example of how to do that right. If you navigate to mathlib_c66x_whatever_version\packages\ti\mathlib\src\sinsp\, there is a demo file, sinsp_d.c, which lets you obtain the numbers being discussed.
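
    For readers who want to time a routine of their own, the following is a generic sketch of a cycle-measurement loop. It is not the code from sinsp_d.c. It assumes the TI C6000 compiler (which defines _TMS320C6X and declares the TSCL time stamp register in c6x.h) and falls back to clock() on a host PC, where the result is in clock ticks rather than CPU cycles. The names read_counter, average_ticks and square are hypothetical, and the measured average includes the loop overhead.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #ifdef _TMS320C6X
    #include <c6x.h>    /* TI compiler header that declares TSCL */
    #else
    #include <time.h>   /* host fallback so the sketch also builds on a PC */
    #endif

    /* Hypothetical wrapper around a free-running counter. */
    static uint32_t read_counter(void)
    {
    #ifdef _TMS320C6X
        return TSCL;                /* low 32 bits of the time stamp counter */
    #else
        return (uint32_t)clock();   /* clock ticks, not CPU cycles */
    #endif
    }

    /* Placeholder for the function under test. */
    static float square(float x)
    {
        return x * x;
    }

    /* Average counter ticks per call of fn, loop overhead included. */
    static uint32_t average_ticks(float (*fn)(float), float arg, uint32_t calls)
    {
        volatile float sink = 0.0f; /* keeps the calls from being optimized away */
        uint32_t i, start, stop;

        assert(fn != NULL && calls > 0);
    #ifdef _TMS320C6X
        TSCL = 0;                   /* a write starts the counter */
    #endif
        start = read_counter();
        for (i = 0; i < calls; ++i) {
            sink = fn(arg);
        }
        stop = read_counter();
        (void)sink;
        return (stop - start) / calls;  /* unsigned math tolerates one wrap */
    }
    ```

    A call such as average_ticks(square, 0.5f, 1000u) returns the average cost per call; on the DSP, square would be replaced by the routine being profiled.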

    The other names refer to other implementations of the algorithm under discussion. C is the natural C implementation; you can find the actual code in c66/sinsp_i.h. It implements a more efficient algorithm and provides a C-callable function. Note that the implementation itself is provided as an inlineable function, so one can save the function call overhead at the expense of code size. Finally, Vector is an implementation that calculates a sequence of results from a sequence of arguments. Because of loop pipelining and other optimization techniques, computing N values in a row can take far fewer cycles than one would estimate by simply multiplying the single-value cycle count by N. The cycle counts you've seen demonstrate this. Of course, there is some overhead for entering and exiting the loop, but on average each value is much cheaper.

    With this in mind, one may wish to arrange the algorithm so that an array of arguments is prepared first, and then the vectorized library function is used to calculate the whole vector of sines. Similar logic is behind many functions in DSPLIB.
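
    The "prepare the arguments first, then make one vector call" arrangement can be sketched as follows. This is only an illustration in portable C: sine_vector_standin is a hypothetical placeholder for the library's vector routine, and it uses a short Taylor polynomial (accurate only for small arguments) purely so the example has no library dependency. It says nothing about how MATHLIB computes sine internally.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical stand-in for a vectorized library sine. The 7th-order
       Taylor polynomial is only accurate for small |x|. */
    static void sine_vector_standin(const float *restrict in,
                                    float *restrict out, size_t n)
    {
        size_t i;
        for (i = 0; i < n; ++i) {
            float x = in[i];
            float x2 = x * x;
            /* x - x^3/6 + x^5/120 - x^7/5040 in nested form */
            out[i] = x * (1.0f - x2 / 6.0f
                          * (1.0f - x2 / 20.0f * (1.0f - x2 / 42.0f)));
        }
    }

    /* Fill out[i] with the sine of (phase0 + i * step) for i = 0..n-1. */
    static void generate_sine_block(float phase0, float step,
                                    float *restrict args,
                                    float *restrict out, size_t n)
    {
        size_t i;
        assert(args != NULL && out != NULL);

        /* Step 1: prepare the whole array of arguments. */
        for (i = 0; i < n; ++i) {
            args[i] = phase0 + step * (float)i;
        }

        /* Step 2: one vector call instead of n scalar calls. */
        sine_vector_standin(args, out, n);
    }
    ```

    The point of the split is that the loop inside the vector routine is a simple, call-free loop that the compiler can software-pipeline, which is where the lower per-value cost comes from.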

  • Hi,

    Thanks for pointing out the test reports! 

    RTS: the C66x run-time library

    ASM, C: typically the same, with C66x intrinsics

    INLINE: the inline function

    VECTOR: the vector implementation

    Yes, the number of floating-point operations performed is equal to the number of cycles.

    Regards, Eric