Flops of C66x

Other Parts Discussed in Thread: TMS320C6678

Hi all

I am reading http://www.ti.com/lit/ds/sprs691b/sprs691b.pdf

This document says the 1.25 GHz C66x DSP reaches 20 GFLOPS, which is amazing,

but I don't understand how the FLOPS figure is calculated. On page 12 it says:

In addition,
the C66x core integrates floating point capability and the per core raw computational performance is an
industry-leading 32 MACS/cycle and 16 flops/cycle.

 

What does this 16 flops/cycle mean?

Does the core have 16 pipelines, or does it mean 16 ALUs or something similar?

Thanks and regards

  • Hi all

    I found something in http://www.ti.com/lit/ug/sprugh7/sprugh7.pdf on pages 30-31.

    As Table 1-1 shows, does the 16 come from a SIMD instruction doing a 4x4 data multiply?

     

     

  • Xiongwei,

    The figure assumes one SIMD instruction on each of the two .M units (4 floating-point operations per cycle each), one SIMD instruction on each of the two .L units (2 floating-point operations per cycle each), and one SIMD instruction on each of the two .S units (2 floating-point operations per cycle each), giving a total of 16 FLOPs per cycle.
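    As a rough check of that arithmetic, the short Python sketch below adds up the per-unit figures and scales the result by the 1.25 GHz clock mentioned in the question. The variable names are illustrative only, and the result is a theoretical peak that assumes every one of these units issues a SIMD floating-point instruction on every cycle.

```python
# Peak single-precision FLOPs per cycle for one C66x core, using the
# per-unit figures given in this thread. All names here are illustrative.
FLOPS_PER_UNIT = {".M": 4, ".L": 2, ".S": 2}  # FLOPs per cycle for one unit
UNITS_PER_TYPE = 2                             # one on the A side, one on the B side

flops_per_cycle = sum(f * UNITS_PER_TYPE for f in FLOPS_PER_UNIT.values())

clock_hz = 1.25e9  # 1.25 GHz, the clock rate quoted in the question
peak_gflops_per_core = flops_per_cycle * clock_hz / 1e9

print(flops_per_cycle)       # 16
print(peak_gflops_per_core)  # 20.0
```

    This matches the 20 GFLOPS quoted in the question: 16 FLOPs/cycle x 1.25 GHz.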

    Best Regards,

    Chad

  • xiongwei huang said:
    What does this 16Flops/cycle mean ?

    To be honest, you will not get a lot of benefit from trying to dissect the details of the bold device capability statements. Even the speed of any single instruction does not explain how your application will perform on the TMS320C6678. Your application will benefit from the Multicore Navigator, the high-speed DDR3 interconnect, the EDMA3 modules, the shared MSM RAM, and the instruction set features.

    But it is natural for an inquisitive engineer to see a specification and to then question it. You may learn some interesting details, and then you will still have to get your application running as fast as possible.

    16 FLOPs/cycle means 16 floating-point operations per cycle. In this case, there are a variety of instructions that could be running in parallel to achieve this number. The comments below Table 1-1 "Raw Performance Comparison Between the C674x and the C66x" on page 30 of sprugh7 state that the figure of 16 comes from:

    SPRUGH7 Table 1-1 comment said:
    2-way SIMD on .L and .S units (e.g. 8 SP operations for A and B) and 4 SP multiply on one .M unit (e.g 8 SP operations for A and B)

    That may not be the clearest statement or easiest to understand, but this is what it means:

    The 16 FLOPs/cycle number comes from

     2 FLOPs - 2-way SIMD on .L1 (A side) such as DADDSP or DSUBSP
     2 FLOPs - 2-way SIMD on .L2 (B side) such as DADDSP or DSUBSP
     2 FLOPs - 2-way SIMD on .S1 (A side) such as DADDSP or DSUBSP
     2 FLOPs - 2-way SIMD on .S2 (B side) such as DADDSP or DSUBSP
     4 FLOPs - 4-way SIMD on .M1 (A side) such as QMPYSP (or CMPYSP, which may not be strictly 4-way SIMD)
     4 FLOPs - 4-way SIMD on .M2 (B side) such as QMPYSP (or CMPYSP, which may not be strictly 4-way SIMD)
    ========
    16 FLOPs total per cycle per C66x CorePac
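    To make the lane counts concrete, here is a minimal Python model of one fully loaded cycle. The functions `dadd_sp` and `qmpy_sp` are hypothetical stand-ins that only mimic the lane-wise behavior of DADDSP (two adds) and QMPYSP (four multiplies). They are not TI intrinsics, and Python floats are double precision, so this illustrates only the operation count, not the numerics or timing.

```python
def dadd_sp(a, b):
    """Hypothetical model of a 2-way SIMD add: 2 lanes in, 2 sums out."""
    assert len(a) == len(b) == 2
    return tuple(x + y for x, y in zip(a, b))

def qmpy_sp(a, b):
    """Hypothetical model of a 4-way SIMD multiply: 4 lanes in, 4 products out."""
    assert len(a) == len(b) == 4
    return tuple(x * y for x, y in zip(a, b))

# One instruction per functional unit in a single cycle (operand values are arbitrary).
issue = [
    (".L1", dadd_sp, (1.0, 2.0), (3.0, 4.0)),
    (".L2", dadd_sp, (1.0, 2.0), (3.0, 4.0)),
    (".S1", dadd_sp, (1.0, 2.0), (3.0, 4.0)),
    (".S2", dadd_sp, (1.0, 2.0), (3.0, 4.0)),
    (".M1", qmpy_sp, (1.0, 2.0, 3.0, 4.0), (2.0, 2.0, 2.0, 2.0)),
    (".M2", qmpy_sp, (1.0, 2.0, 3.0, 4.0), (2.0, 2.0, 2.0, 2.0)),
]

results = {unit: op(a, b) for unit, op, a, b in issue}

# Each output lane corresponds to one floating-point operation.
flops = {unit: len(lanes) for unit, lanes in results.items()}
total_flops = sum(flops.values())

print(flops)
print(total_flops)
```

    Counting one operation per output lane reproduces the per-unit breakdown in the list above.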

    Regards,
    RandyP

     

    If you need more help, please reply back. If this answers the question, please click Verify Answer below.

  • Hello Chad and RandyP

    Thank you very much for your answers.

    Regards

    Xiongwei