On a TMS320C66x, the core can reach up to 8 32-bit MACs per cycle, but it cannot load more than 128 bits per cycle. It would be more efficient if it could load 256 bits per cycle. Is this normal? Is it the case in all VLIW processors that the load bandwidth is lower than the computation power?
A practical example is a floating-point complex dot product over two arrays. Two complex numbers (64 bits each) are loaded in one cycle, and .M1 computes the complex product of those two numbers. In that same cycle, .M2 has no newly loaded data to work on.
I just want to know whether most existing VLIW processors have the same issue, and to understand better what prevents designers from making the data bus wider. Is it the L1D memory?
Is there no one who can share their knowledge?
I can see that the Analog Devices TigerSHARC DSP has a wider, 256-bit data bus, while its computation power is lower than that of the TI C66x.
Is that not important?