Hi,
Quoting SPRS691A: "Each C66x .M unit can perform one of the following fixed-point operations each clock cycle: four 32x32 bit multiplies, sixteen 16x16 bit multiplies, ..."
The four 32x32 multiplications per clock cycle can be computed with the "QMPY32" SIMD instruction. However, there doesn't seem to be a SIMD instruction that performs sixteen 16x16 multiplications. Such an instruction would need each packed source operand to hold sixteen 16-bit integers (256 bits), whereas .M unit source operands are at most 128 bits wide.
Is the claimed 40 GMAC/core even theoretically achievable? How is this figure calculated?
Kind regards,
Rene