Hello,
I plan to use the TMS320C6678 to run an algorithm. According to the technical documentation, the TMS320C6678 can perform 256 16x16-bit fixed-point multiplies or 64 floating-point multiplies per clock cycle. My question is: how is this throughput achieved in practice? Is it by using particular instructions such as MPY, or by configuring the pipeline properly?
Thanks a lot.