This thread has been locked.


Compiler/TDA2PXEVM: Performance difference between VCOP kernel implementations

Part Number: TDA2PXEVM


Tool/software: TI C/C++ Compiler

Hello,

I am trying to write an EVE VCOP kernel for an algorithm that involves many arithmetic operations. I plan to implement it by splitting the operations across multiple for loops within a single kernel function.

My question is: will there be any performance difference between the two approaches below?
Assume the algorithm fits in a single for loop while satisfying all VCOP constraints.

Approach 1: Algorithm implemented in the VCOP kernel using a single for loop.
Approach 2: Algorithm implemented in the VCOP kernel using multiple for loops by splitting the operations.

Given that approach 2 introduces intermediate buffers, which increases memory usage, how does this affect performance?
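To make the two approaches concrete, here is a plain-C sketch (not actual VCOP Kernel-C; the element count and the arithmetic steps are invented for illustration) of the same computation written as one fused loop versus two split loops that stage results in an intermediate buffer:

```c
#include <stdint.h>
#include <stddef.h>

#define N 64  /* arbitrary element count for illustration */

/* Approach 1: one loop performs both arithmetic steps per element. */
void kernel_fused(const int16_t *in, int16_t *out)
{
    for (size_t i = 0; i < N; i++) {
        int16_t t = (int16_t)(in[i] * 3);  /* step 1 (hypothetical op) */
        out[i] = (int16_t)(t + 7);         /* step 2 (hypothetical op) */
    }
}

/* Approach 2: the same steps split into two loops, so the result of
 * step 1 must be staged in an intermediate buffer. */
void kernel_split(const int16_t *in, int16_t *out)
{
    int16_t tmp[N];                        /* extra intermediate memory */
    for (size_t i = 0; i < N; i++)
        tmp[i] = (int16_t)(in[i] * 3);     /* step 1 */
    for (size_t i = 0; i < N; i++)
        out[i] = (int16_t)(tmp[i] + 7);    /* step 2 */
}
```

Both versions produce identical results; the difference on VCOP is that the split version pays the per-loop startup overhead twice and needs the `tmp` buffer.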

  • I have forwarded your query to the EVE experts; they will post their response here.

    Thanks,

    Praveen 

  • Hi,
     Approach 1 will be best, as each loop has certain overheads associated with it. For each vector command that specifies a loop, several overhead components are present. In general, developers are encouraged to place as much processing as possible into each loop and to run as many iterations as possible, so that the percentage of time spent on overhead is reduced. Hope this clarifies your doubt.

    Regards,

    Anshu
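As a back-of-envelope illustration of the per-loop overhead described above, consider a toy cost model in C. The cycle counts below are invented placeholders, not measured VCOP figures:

```c
/* Toy cost model: each vector loop pays a fixed startup overhead plus
 * one cycle per element. Both constants are assumed, not measured. */
#define LOOP_OVERHEAD_CYCLES 20.0
#define CYCLES_PER_ELEMENT    1.0

double total_cycles(int num_loops, int elems)
{
    return num_loops * LOOP_OVERHEAD_CYCLES + elems * CYCLES_PER_ELEMENT;
}
```

With these placeholder numbers, processing 256 elements in one fused loop costs 276 cycles, while splitting the same work across two loops costs 296: the extra 20 cycles is the second loop's fixed overhead, and it buys no additional work.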


  • Hi Anshu,

    Thanks for the response.

    Approach 1 will be the best if the algorithm (number of operations) fits into a single for loop.

    If the algorithm is large, I will need to implement it using two loops to satisfy the VCOP constraints. In that case, overheads will be introduced for each loop, which will affect performance. Is my understanding correct?

  • Hi Santhiya,

       There are loop overheads that can be hidden if the loops run back to back (which should be the case for you), but yes, there is overhead for each loop. Hence each loop should process as much data as possible to mitigate the effect of the loop overheads.


    Regards,

    Anshu
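The amortization point above can be quantified with the same kind of placeholder model: the fixed per-loop overhead shrinks as a fraction of total time when the loop covers more elements (again, the numbers are hypothetical, not measured VCOP figures):

```c
/* Fraction of a loop's run time spent on fixed startup overhead,
 * assuming OVERHEAD_CYCLES of setup plus one cycle per element.
 * The constant is a hypothetical placeholder. */
#define OVERHEAD_CYCLES 20.0

double overhead_fraction(int elems)
{
    return OVERHEAD_CYCLES / (OVERHEAD_CYCLES + (double)elems);
}
```

With this model, a 16-element loop spends roughly 56% of its time on overhead, while a 1024-element loop spends under 2%, which is why packing more work and more iterations into each loop pays off.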