TDA4VH-Q1: Performance gain when using C7x instead of C66x DSP for non-TI-optimized code

Part Number: TDA4VH-Q1
Other Parts Discussed in Thread: TDA2

Dear TI experts,

we are currently using TDA2 (C66x DSP @ 1 GHz) to run C code that is not optimized w.r.t. TI intrinsics etc. The code has complex data structures and performs many operations in nested loops. We basically rely fully on the compiler optimizations (O3 level) to make the best of the DSP's capacity.

We are now considering switching to TDA4VH-Q1 (C7x DSP @ 1 GHz). In that context I have the following questions:

1) I know that C7x has a wider SIMD data path compared to C6x (512 bit vs 64 bit). How much would we benefit from that given the same, i.e. non-TI-optimized, code? Meaning, will the compiler generally be able to make use of that in our case?

2) How could we take more advantage of the C7x (compared to C66x)? Would introducing the TI intrinsics etc. improve the performance a lot? What else would be helpful here?

  • Hi,

    1) Parallelization becomes more difficult as the SIMD width increases, which is the case for C7x. Non-optimized code may not show any improvement on C7x, and in some cases performance may even degrade.

    2) The first requirement for DSP optimization is that most of the compute happens in loops that can be software pipelined. Using intrinsics on top of that helps optimize it further. In the case of nested loops, only the innermost loop is software pipelined. There are plenty of documents available online on C66x software pipelining, and most of the concepts and requirements of C66x optimization also apply to C7x.

    Regards

    Deepak Poddar
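
To illustrate the point about loops that can be software pipelined, here is a minimal sketch in portable C99. It is not taken from the thread: the `Node` type and both function names are hypothetical, and how well any particular compiler schedules or vectorizes them is an assumption to verify in the compiler's own loop reports. The first function walks a linked structure, where each iteration has to wait for the previous pointer load. The second computes the same sum over flat arrays with a counted innermost loop, `restrict`-qualified pointers, and no calls or branches in the loop body, which is generally the shape a compiler needs before it can pipeline or use wide SIMD. TI compilers also accept loop hints such as `#pragma MUST_ITERATE`; they are left out here to keep the sketch portable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type standing in for a "complex data structure". */
typedef struct Node {
    int32_t value;
    int32_t gain;
    struct Node *next;
} Node;

/* Pointer-chasing loop: the address of the next element is only known
 * after the current element has been loaded, so iterations cannot be
 * overlapped freely and the trip count is unknown up front. */
int32_t sum_scaled_list(const Node *head)
{
    int32_t acc = 0;
    for (const Node *n = head; n != NULL; n = n->next) {
        acc += n->value * n->gain;
    }
    return acc;
}

/* Same computation over flat arrays: a counted innermost loop with
 * contiguous loads. restrict promises the compiler that the two arrays
 * do not alias, and the body contains no calls or branches. */
int32_t sum_scaled_arrays(const int32_t *restrict value,
                          const int32_t *restrict gain,
                          size_t count)
{
    int32_t acc = 0;
    for (size_t i = 0; i < count; ++i) {
        acc += value[i] * gain[i];
    }
    return acc;
}
```

Both functions return the same result for the same data. The difference lies only in how much freedom the compiler has to overlap iterations, so restructuring hot inner loops toward the second form is a reasonable first step before reaching for intrinsics.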