Compiler/TDA2PXEVM: Performance difference between VCOP kernel implementations

Santhiya R1

Part Number: TDA2PXEVM

Tool/software: TI C/C++ Compiler

Hello,

I am trying to write an EVE VCOP kernel for an algorithm that involves many arithmetic operations. This would be implemented by splitting the operations across multiple for loops in a single kernel function.

My question: is there any performance difference between the two approaches below?
Assume the algorithm fits in a single for loop while satisfying all VCOP constraints.

Approach 1: Algorithm implemented in VCOP kernel using single for-loop
Approach 2: Algorithm implemented in VCOP kernel using multiple for loops, with the operations split between them

I know that Approach 2 introduces intermediate buffers, which increases memory usage. How does this affect performance?
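To make the two approaches concrete, here is a minimal sketch in plain C. It is not VCOP Kernel C and says nothing about how the VCOP compiler schedules either version. The computation `(a + b) * c` and the names `kernel_fused`, `kernel_split`, and `tmp` are hypothetical, chosen only to show where the intermediate buffer appears.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Approach 1 (hypothetical example): one loop.
 * The intermediate sum lives in a local variable, so no extra buffer
 * is written to or read back from memory. */
void kernel_fused(const uint8_t *a, const uint8_t *b, const uint8_t *c,
                  int32_t *out, size_t n)
{
    assert(a && b && c && out);
    for (size_t i = 0; i < n; i++) {
        int32_t sum = (int32_t)a[i] + (int32_t)b[i];
        out[i] = sum * (int32_t)c[i];
    }
}

/* Approach 2 (hypothetical example): the same arithmetic split across
 * two loops. The first loop stores every intermediate sum into tmp and
 * the second loop loads it again, which costs n extra stores, n extra
 * loads, and n elements of additional memory. */
void kernel_split(const uint8_t *a, const uint8_t *b, const uint8_t *c,
                  int32_t *tmp, int32_t *out, size_t n)
{
    assert(a && b && c && tmp && out);
    for (size_t i = 0; i < n; i++) {
        tmp[i] = (int32_t)a[i] + (int32_t)b[i];
    }
    for (size_t i = 0; i < n; i++) {
        out[i] = tmp[i] * (int32_t)c[i];
    }
}
```

Both functions produce identical results; the difference is only the extra buffer traffic and the second loop in `kernel_split`.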

over 5 years ago

0 Praveen Eppa1 over 5 years ago

TI__Genius 17580 points

I have forwarded your query to the EVE experts; they will post their response here.

Thanks,

Praveen

0 Anshu Jain over 5 years ago

TI__Guru 56820 points

Hi,
Approach 1 will be best, as each loop has certain overheads associated with it. For each vector command specifying a loop, several overhead components are present. In general, developers are encouraged to place as much processing as possible into each loop and to run as many iterations as possible, to reduce the percentage of time spent on overhead. Hope this clarifies your doubt.
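The amortization argument can be illustrated with a rough cost model in plain C. This is only a sketch under stated assumptions: every loop pays the same fixed overhead, the total per-iteration work stays constant however it is divided between loops, and the extra loads and stores of intermediate buffers are ignored. The function names and any numbers passed in are hypothetical, not measured EVE figures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: total cycles for a kernel built from `loops` loops.
 * Each loop pays `overhead_per_loop` once; the combined work of all loops
 * costs `work_per_iteration` cycles for each of the `iterations`. */
uint64_t model_cycles(uint32_t loops, uint64_t overhead_per_loop,
                      uint64_t iterations, uint64_t work_per_iteration)
{
    return (uint64_t)loops * overhead_per_loop
         + iterations * work_per_iteration;
}

/* Share of the modeled total spent on loop overhead, in percent. */
double overhead_percent(uint32_t loops, uint64_t overhead_per_loop,
                        uint64_t iterations, uint64_t work_per_iteration)
{
    uint64_t total = model_cycles(loops, overhead_per_loop,
                                  iterations, work_per_iteration);
    assert(total > 0); /* avoid dividing by zero for an empty kernel */
    return 100.0 * (double)((uint64_t)loops * overhead_per_loop)
                 / (double)total;
}
```

Under this model, adding a loop adds one more fixed overhead term, while raising the iteration count shrinks the overhead percentage, which matches the advice to put as much work and as many iterations as possible into each loop.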

Regards,

Anshu

0 Santhiya R1 over 5 years ago in reply to Anshu Jain

Prodigy 160 points

Hi Anshu,

Thanks for the response.

Approach 1 will be the best if the algorithm (number of operations) fits into a single for loop.

If the algorithm is too large for one loop, I need to implement it using two loops to satisfy the VCOP constraints. In that case, each loop introduces its own overhead, which will affect performance. Is my understanding right?

0 Anshu Jain over 5 years ago in reply to Santhiya R1

TI__Guru 56820 points

Hi Santhiya,

Some loop overheads can be hidden if the loops run back to back (which should be the case for you), but yes, each loop still carries its own overhead. Hence, each loop should process as much data as possible to mitigate the effect of that overhead.

Regards,

Anshu
