C66x MAC performance

Hi,

Quoting SPRS691A: "Each C66x .M unit can perform one of the following fixed-point operations each clock cycle: four 32x32 bit multiplies, sixteen 16x16 bit multiplies, ..."

The four 32x32 multiplications per clock cycle can be computed with the "QMPY32" SIMD instruction. However, there doesn't seem to be a SIMD instruction that performs sixteen 16x16 multiplications. Such an instruction would need packed source operands that each hold sixteen 16-bit integers (= 256 bits), whereas the source operands of the .M unit are at most 128 bits wide.

Is the claimed 40 GMAC/Core even theoretically possible? How is this figure calculated?
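To make the arithmetic behind the question explicit, here is a rough sketch in Python. It assumes a 1.25 GHz core clock and two .M units per core; neither figure appears in the quote above, so treat both as my assumptions. The function names are made up for illustration. Under those assumptions the 40 GMAC/Core figure appears to require sixteen 16x16 multiply-accumulates per .M unit per cycle, which is exactly the rate I cannot map to an instruction with 128-bit operands.

```python
# Back-of-the-envelope check of the datasheet figures.
# Assumptions (not from the quoted text): 1.25 GHz clock, two .M units per core.
CLOCK_HZ = 1.25e9
M_UNITS_PER_CORE = 2
MULTS_16X16_PER_M_UNIT = 16   # from the SPRS691A quote
MAX_OPERAND_BITS = 128        # widest .M unit source operand, as stated above


def peak_gmac_per_core(clock_hz, m_units, macs_per_unit):
    """Peak multiply-accumulates per second for one core, in units of 1e9."""
    return clock_hz * m_units * macs_per_unit / 1e9


def operand_bits_for_independent_products(n_products, element_bits):
    """Bits one packed operand needs if every product uses a distinct element."""
    return n_products * element_bits


if __name__ == "__main__":
    gmac = peak_gmac_per_core(CLOCK_HZ, M_UNITS_PER_CORE, MULTS_16X16_PER_M_UNIT)
    print("peak GMAC/core:", gmac)

    for n, bits in ((4, 32), (16, 16)):
        needed = operand_bits_for_independent_products(n, bits)
        fits = needed <= MAX_OPERAND_BITS
        print(f"{n} x {bits}-bit products need {needed}-bit operands, fits: {fits}")
```

The four 32x32 case fits within 128 bits, matching QMPY32. The sixteen 16x16 case only exceeds 128 bits if every product uses its own pair of input elements, which is the assumption built into the second function.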

Kind Regards,

Rene