Apologies for the dumb question...
The C6678 datasheet states the performance is 40GMACs per core (SPRS691 Section 1). I presume a MAC is defined in the normal manner as a combined multiplication and accumulation operation.
Looking at the instruction set, I can see that the DDOTP4H instruction does 8 MACs per .M unit per cycle. With two .M units per core at 1.25 GHz, that makes 20 GMACs per core, not 40.
I can't see anything else that will use the other 8 multipliers in each .M unit. How would I get the full 40 GMACs/core performance?
Regards,
Steve D