Hi,
I don't understand whether the core's MAC throughput is 8 MAC/cycle or 32 MAC/cycle.
I found this in the TMS320C6678 reference guide:
"In addition, the C66x core integrates floating point capability and the per core raw computational performance is an
industry-leading 32 MACS/cycle and 16 flops/cycle. It can execute 8 single precision floating point MAC operations
per cycle."
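
To show how I am reading the numbers, here is a small C sketch of the arithmetic. The 8 and 16 come from the quote. Two things are my own assumptions, not stated in the quoted text: that one MAC counts as 2 flops (a multiply plus an add), and that the 32 MACS/cycle figure refers to a different data type (e.g. fixed point) than the 8 single-precision MACs. The macro and function names are only illustrative.

```c
/* Figures taken from the quoted text. */
#define SP_FLOAT_MACS_PER_CYCLE 8   /* single precision floating point MACs */
#define QUOTED_FLOPS_PER_CYCLE  16  /* "16 flops/cycle" */
#define QUOTED_PEAK_MACS        32  /* "32 MACS/cycle"; data type assumed, not stated */

/* Assumption: one MAC = one multiply + one add = 2 floating point operations. */
#define FLOPS_PER_MAC 2

/* Convert a MAC rate into a flop rate under the assumption above. */
int flops_per_cycle(int macs_per_cycle)
{
    return macs_per_cycle * FLOPS_PER_MAC;
}

/* Ratio between the quoted peak MAC rate and the single precision MAC rate. */
int peak_to_sp_ratio(void)
{
    return QUOTED_PEAK_MACS / SP_FLOAT_MACS_PER_CYCLE;
}
```

Under these assumptions, flops_per_cycle(SP_FLOAT_MACS_PER_CYCLE) matches the quoted 16 flops/cycle, so the 8 MAC/cycle figure would be the floating point rate and 32 MACS/cycle would be the peak rate for some other data type. Is that the right interpretation?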