Part Number: TMS320C6657
Hello,
The C66x has SIMD capability, so it can carry out up to 8 single-precision float MAC operations per cycle.
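However, it has only two 64-bit data buses from L1 data memory, so at most four 32-bit floats can be loaded per cycle.
Therefore, when computing the dot product of two arrays, where every MAC needs two newly loaded operands, only two MAC operations per cycle can be sustained because data cannot be loaded into registers fast enough. That is the same performance as the C674x.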
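To make my arithmetic explicit, here is a minimal sketch in C of how I am estimating the sustained rate. The function name and its parameters are my own illustration, not a TI API, and the model assumes every operand comes from L1D with no register reuse.

```c
/*
 * Hypothetical estimate (not a TI API): MACs per cycle that the load
 * path can feed, capped by the peak MAC rate of the core.
 * Assumes every MAC operand is freshly loaded from L1D (no reuse).
 */
int sustained_macs_per_cycle(int load_ports,       /* L1D load paths per cycle   */
                             int port_bits,        /* width of each load path    */
                             int operand_bits,     /* size of one operand        */
                             int operands_per_mac, /* fresh operands per MAC     */
                             int peak_macs)        /* peak MACs per cycle        */
{
    /* Total operands that can arrive from L1D in one cycle. */
    int operands_per_cycle = (load_ports * port_bits) / operand_bits;

    /* MACs per cycle that this load rate can keep supplied. */
    int load_limited = operands_per_cycle / operands_per_mac;

    /* The core cannot exceed its peak even if loads were free. */
    return load_limited < peak_macs ? load_limited : peak_macs;
}
```

Calling `sustained_macs_per_cycle(2, 64, 32, 2, 8)` with my numbers (two 64-bit buses, 32-bit floats, two operands per MAC, peak of 8) gives 2.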
If so, I think the 8-MAC capability of the C66x SIMD is not very useful for many arithmetic operations that are limited by load bandwidth in this way.
Am I right?