TMS320F280049: Calculation cycles comparison between float32 and uint32

Part Number: TMS320F280049


For ordinary multiplication (uint32*uint32, uint16*uint16) and division (uint32/uint16), will a core with an FPU take less time than a plain fixed-point DSP core? (Compare the F280049 with the F28027, with all instructions enabled.)

For example,

uint32 a = 2, b = 3;

float32 c = 4, d = 5;

Will c * d be faster than a * b? What if the variables are uint16 and float32?

Will a / b be faster than c / d? What if the variables are uint16 and float32?

Thanks

Sheldon

  • Sheldon,

    I am tempted to give a quick answer here, but on second thought, I think there are some nuances (including the fact that the 28004x is a 100MHz device, whereas the 28027 is a 60MHz device), so let me analyze this carefully before getting back to you. Please give me a day or so.

    Also, could you elaborate on what you mean by "What if variable is in uint16 and float32?" Do you mean a mixed operation involving fixed-pt and floating-pt numbers? In those cases, you will end up casting. Or did you mean to differentiate between uint16 and uint32? Please clarify.
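
    To illustrate the casting in a mixed operation, here is a minimal sketch in portable C. The `float32` typedef stands in for the TI type so the code builds off-target, and `scale_mixed` and `scale_to_u16` are hypothetical names, not part of any TI library.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for TI's float32 typedef so the sketch builds off-target. */
typedef float float32;

/* Mixed multiply: the uint16 operand is converted to float32 first,
   so the product is computed in floating point. */
static float32 scale_mixed(uint16_t count, float32 gain)
{
    return (float32)count * gain;
}

/* Converting the result back to uint16 truncates toward zero. */
static uint16_t scale_to_u16(uint16_t count, float32 gain)
{
    float32 scaled = scale_mixed(count, gain);
    assert(scaled >= 0.0f && scaled < 65536.0f); /* must fit in uint16 */
    return (uint16_t)scaled;
}
```

    Each conversion is extra work on top of the multiply itself, so a mixed expression is not directly comparable to a pure uint16 or pure float32 one.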

    Thanks,

    Sira

  • Hi Sira,

    We may assume they are running at the same main clock speed; what I care about here is the difference between the instruction sets (for example, whether an FPU is available or not).

    I mean the same test, but with uint16 instead of uint32, as below. Thanks!

    uint16 a = 2, b = 3;

    float32 c = 4, d = 5;

    Will c * d be faster than a * b?

    Will a / b be faster than c / d?

    Thanks

    Sheldon

  • Sheldon,

    Since there are single-cycle instructions for both fixed-point multiply (MPY) and floating-point multiply (MPYF32), I think there should be no difference on the multiply side.

    On the division side, there are quite a few possibilities.

    If "/" just calls the standard RTS library, it would consume a different number of cycles for fixed-pt and floating-pt. I presume floating-pt will take longer, but I haven't measured it and don't know where it is documented.

    If you use the FPUFastRTS library, it supports single-precision floating-pt division (and double-precision division), and benchmarking data is available in the user guide (25 cycles for single-precision). I presume this is fewer cycles than the RTS library's single-precision floating-pt division, because the purpose of the library is to trade off accuracy for speed.

    On devices that have FASTINTDIV (2838x, but not 28004x), integer division can be performed fast (benchmarking data available in the corresponding user guide - 16 cycles for uint16/uint16).

    If you use the IQMath fixed-pt library for fixed-pt division, benchmarking data is available in the user guide (63 cycles for IQNdiv).
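
    Cycle counts only turn into execution time once a clock frequency is fixed. As a rough sketch, the figures quoted above can be converted to nanoseconds as follows; `cycles_to_ns` is a hypothetical helper, and the 100 MHz and 60 MHz values are the device clocks mentioned earlier in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Execution time in nanoseconds for a cycle count at a given CPU clock.
   One cycle at f MHz lasts 1000/f ns; multiplying first limits rounding.
   Assumes cycles * 1000 fits in 32 bits (cycles below about 4.29 million). */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t cpu_mhz)
{
    assert(cpu_mhz != 0u);
    return (cycles * 1000u) / cpu_mhz;
}
```

    For example, `cycles_to_ns(25, 100)` gives 250 ns for the 25-cycle FPUFastRTS division at 100 MHz, while `cycles_to_ns(63, 60)` gives 1050 ns for a 63-cycle IQNdiv call at 60 MHz. This covers only the quoted cycle counts, not call overhead or memory wait states.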

    Hope this helps,

    Sira

  • Hi Sira,

    Thanks for your detailed answer. Could you check whether my understanding is correct?

    For multiplication between two fixed-point numbers and between two floating-point numbers, there should be no difference:

    | uint32*uint32, uint16*uint16 | float32*float32 | Comments |
    |---|---|---|
    | C28x | C28x + FPU | No difference: the former has the MPY instruction and the latter has MPYF32, and both are single cycle. |

    For division between two fixed-point numbers and between two floating-point numbers, please see the details below:

    | uint32/uint32, uint16/uint16 | float32/float32 | Comments |
    |---|---|---|
    | C28x + FASTINTDIV | C28x + FPU + TMU | The former takes around 14 to 16 cycles with hardware support. The latter uses DIVF32, which takes 5 cycles (other instructions can execute in the remaining four cycles thanks to CPU pipelining). |
    | C28x + IQmath | C28x + FPU + FPUfastRTS | The former takes 63 cycles by calling IQNdiv. The latter takes 24 cycles. |
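
    To make the fixed-point side of the division comparison concrete, here is a minimal portable C sketch of a Q24 divide. The `q24_t` type and the `q24_*` helpers are hypothetical names for illustration; this does not reproduce the IQmath library's algorithm or its cycle count, it only shows what such a division computes.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t q24_t;                 /* hypothetical Q8.24 fixed-point type */
#define Q24_ONE ((q24_t)1 << 24)       /* 1.0 in Q24 */

/* Integer to Q24; the Q8.24 range only holds values from -127 to 127 here. */
static q24_t q24_from_int(int32_t v)
{
    assert(v > -128 && v < 128);
    return (q24_t)(v * Q24_ONE);
}

/* Q24 to float, for inspection only. */
static float q24_to_float(q24_t v)
{
    return (float)v / (float)Q24_ONE;
}

/* Fixed-point divide: widen to 64 bits, scale the numerator by 2^24,
   then divide. The quotient truncates toward zero. */
static q24_t q24_div(q24_t num, q24_t den)
{
    assert(den != 0);
    int64_t wide = (int64_t)num * ((int64_t)1 << 24);
    int64_t quot = wide / den;
    assert(quot >= INT32_MIN && quot <= INT32_MAX); /* result must fit */
    return (q24_t)quot;
}
```

    With c = 4 and d = 5 from the question, `q24_div(q24_from_int(4), q24_from_int(5))` yields a value within one Q24 step of the float32 result 0.8. A plain integer divide such as uint16 2 / 3 returns 0, so it does not compute the same thing as the float32 division.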

    Sheldon

  • Sheldon,

    Thanks for including info on the DIVF32 supported by the FPU.

    The tables look good. It would probably be good to explicitly say "IQMath Library" and "FPUFastRTS Library" to distinguish them from HW modules like the FPU, TMU, and FASTINTDIV.

    Thanks,

    Sira