TMS320C6745: dsplib: further optimization

Part Number: TMS320C6745

Hi, I am using the TMS320C6745.

I see that most functions in the official C674x DSPLIB are quite "general purpose", in the sense that they are built under only a few assumptions about their inputs. In particular, the batch size N is often only required to be a multiple of two or four, so I expect such a function to contain something like a FOR loop whose condition is evaluated more and more times as N increases, once every two or four samples.

In my application I always handle data in larger batches (N = 16). The question is: if I rewrite some of the DSPLIB functions (especially blk_move, w_vec, dotprod, vecmul) specifically for N = 16, will my function always execute the same way, avoiding all the FOR-condition checks? Can I expect a significant performance gain from rewriting DSPLIB specifically for N = 16?

  • DSPLIB functions are provided in full source under BSD licensing, so that users can use them as is or with modifications; you can certainly attempt to optimize the code. The reason the batch size N must often be a multiple of two or four is that the functions use the dual or quad data load operations supported by the C674x ISA.

    If you want to specialize this for batches of 16, you can certainly do so, but we are not sure how much additional benefit you would get from rewriting the functions. If you plan to unroll the loops by hand, first check how well the compiler already does this in the assembly output before spending more time on further optimization.
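To make that comparison concrete, here is a minimal C sketch of the two variants being discussed. It is not taken from the DSPLIB sources: the names `dotprod_generic` and `dotprod16` are hypothetical, and the real DSPLIB kernels are more heavily tuned. The fixed-size version only tells the compiler that the trip count is exactly 16, using the TI `MUST_ITERATE` pragma (guarded so that the code also builds with other compilers). Whether the loop then gets fully unrolled is up to the compiler.

```c
/* Generic variant: the trip count nx is only known at run time,
   so the compiler must keep a loop with a counter and a branch. */
float dotprod_generic(const float *x, const float *y, int nx)
{
    float sum = 0.0f;
    int i;

    for (i = 0; i < nx; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Fixed-size variant (hypothetical): the trip count is the constant 16.
   On the TI compiler, MUST_ITERATE(min, max, multiple) additionally
   promises that the loop runs exactly 16 times, which gives the
   optimizer the freedom to unroll it and drop the loop condition. */
float dotprod16(const float *x, const float *y)
{
    float sum = 0.0f;
    int i;

#ifdef __TI_COMPILER_VERSION__
#pragma MUST_ITERATE(16, 16, 16)
#endif
    for (i = 0; i < 16; i++)
        sum += x[i] * y[i];
    return sum;
}
```

Both functions accumulate in the same order, so they should return the same value for the same 16-element inputs. To see what the compiler actually generated for each, build with optimization enabled and keep the assembly file (with the TI tools this should be the `--keep_asm` option, but please verify it against your compiler version), then compare the loop bodies and the software pipelining comments in the two listings.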



  • Thank you, I will check the assembly output before proceeding!