A question about C66x architecture

Hello,

On a TMS320 C66x core we can reach up to 8 32-bit MACs per cycle, but the core cannot load more than 128 bits per cycle. It would be more efficient if we could load 256 bits per cycle. Is this normal, and is it the case in all VLIW processors, that the load bandwidth is lower than what the computation units can consume?

Thanks

  • A practical example is a floating-point complex dot product over 2 arrays: 2 complex numbers (64 bits each) are loaded in 1 cycle and .M1 computes the complex product of those 2 numbers, so .M2 has no newly loaded data to work on in that same cycle.

    I just want to know whether most existing VLIW processors have the same issue, and to understand what prevents designers from making the data bus wider (is it the L1D memory?).

    Thanks
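
    To make the arithmetic concrete, here is a rough sketch in C of the bandwidth budget described above. The figures (128 bits loaded per cycle, 64-bit complex floats, one complex product per .M unit per cycle) are the ones I stated in this post and are assumptions, not values taken from the datasheet; the function names are only illustrative.

    ```c
    #include <stdio.h>

    /* Figures as stated in this post; treat them as assumptions. */
    enum {
        LOAD_BITS_PER_CYCLE  = 128, /* stated load limit of the core       */
        COMPLEX_FLOAT_BITS   = 64,  /* 32-bit real + 32-bit imaginary part */
        OPERANDS_PER_PRODUCT = 2,   /* a[i] and b[i]                       */
        M_UNITS              = 2    /* .M1 and .M2                         */
    };

    /* Bits that must be loaded each cycle to feed `units` complex products. */
    static int bits_needed(int units)
    {
        return units * OPERANDS_PER_PRODUCT * COMPLEX_FLOAT_BITS;
    }

    /* Complex products per cycle that the load path alone can sustain. */
    static int products_sustained(void)
    {
        return LOAD_BITS_PER_CYCLE / (OPERANDS_PER_PRODUCT * COMPLEX_FLOAT_BITS);
    }

    int main(void)
    {
        printf("needed for %d units: %d bits/cycle\n", M_UNITS, bits_needed(M_UNITS));
        printf("available:          %d bits/cycle\n", LOAD_BITS_PER_CYCLE);
        printf("products sustained: %d per cycle\n", products_sustained());
        return 0;
    }
    ```

    Under these assumptions, feeding both .M units would need twice the bits per cycle that the load path delivers, which is why the second unit sits idle in this kernel.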

  • Can anyone share their knowledge?

    I can see that the TigerSHARC DSP from Analog Devices has a wider, 256-bit data bus, while its computational power is lower than that of the TI C66x.

    Is that not important?