Hi,
The document "TMS320C6678 Multicore Fixed and Floating-Point Digital Signal Processor (Rev. B)" says in sec. 2.2:
"Each C66x .M unit can also perform one the following floating-point operations each clock cycle: one, two, or four
single-precision multiplies or a complex single-precision multiply."
Doing 4 SP multiplications requires 2*32*4 = 256 bits of input data. However, to my understanding, the bus
to the register file is 64 bits wide, so 4 SP multiplications per cycle seems to be more than it can handle.
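To spell out the arithmetic, here is a rough sketch. It only counts source operands, and the 64-bit path width is my own understanding rather than a figure taken from the quoted section; the names are just for illustration.

```python
# Back-of-the-envelope operand count for N single-precision (SP) multiplies.
SP_BITS = 32            # width of one single-precision float
OPERANDS_PER_MUL = 2    # each multiply reads two source operands
ASSUMED_PORT_BITS = 64  # assumption: my understanding of the register-file path width


def operand_bits(num_multiplies: int) -> int:
    """Input bits needed to feed `num_multiplies` SP multiplies."""
    return OPERANDS_PER_MUL * SP_BITS * num_multiplies


for n in (1, 2, 4):
    bits = operand_bits(n)
    print(f"{n} SP multiplies: {bits} input bits, "
          f"{bits // ASSUMED_PORT_BITS}x a {ASSUMED_PORT_BITS}-bit path")
```

By this count even a single SP multiply already uses 64 bits of operands, and four of them need four times that.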
Am I missing something?
Thank you.