Hello,
I have some questions regarding the MAC performance on the C66x.
I have to perform 32 32x32 MACs as fast as possible on the C66x. The datasheet (SPRS691B) and the document "Optimizing Loops on the C66x DSP" state that the C66x can perform 8 32x32 MACs per cycle. At that rate, the full set of 32 32x32 MACs should take 4 cycles.
This is not what I see in my performance measurements. For example, when I run the DSP_dotprod example from the DSPLib, the C66x needs 63 cycles for 32 16x16 MACs!
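To make that expectation concrete, here is a minimal sketch in plain, portable C. It is not the DSPLib code and uses no C66x intrinsics; the names `ideal_cycles` and `dotprod32` are made up for illustration. The cycle estimate only divides the MAC count by the peak rate, so it assumes no loop prolog/epilog, no call overhead and no memory stalls.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: best-case cycle count at peak throughput,
 * i.e. ceil(n_macs / macs_per_cycle). Overheads are NOT modelled. */
unsigned ideal_cycles(unsigned n_macs, unsigned macs_per_cycle)
{
    assert(macs_per_cycle > 0);
    return (n_macs + macs_per_cycle - 1u) / macs_per_cycle;
}

/* Hypothetical reference kernel: n 32x32 multiply-accumulates
 * into a 64-bit accumulator so the products cannot overflow. */
int64_t dotprod32(const int32_t *a, const int32_t *b, int n)
{
    int64_t acc = 0;
    int i;

    for (i = 0; i < n; i++) {
        acc += (int64_t)a[i] * (int64_t)b[i];
    }
    return acc;
}
```

With the datasheet figure, `ideal_cycles(32, 8)` gives the 4 cycles mentioned above. Whether the compiler can actually reach that rate for such a short loop is exactly what I am unsure about.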
Can somebody comment on this and give some advice?
BR,
Andreas