Improving the performance via Intrinsics!

Other Parts Discussed in Thread: TMS320C6678

Hi all,

I'm currently focused on improving the performance of my code. By performance I mean speeding it up without sacrificing output quality. I'm working with the TMS320C6678 in CCSv5.1.
There are a lot of mathematical calculations and loops in my code. I have applied most of the compiler optimisations in CCS, following the Optimizing Compiler guide, and that really worked. Now I have tried optimising with intrinsics. I replaced the mathematical calculations (multiplications, divisions, etc.) with the corresponding intrinsics. With the intrinsics the code took more time to produce its output, though the quality was maintained. I haven't tried any SIMD instructions yet. Could anyone help with how to use these SIMD intrinsics?

Regards,

Sohal

  • This tutorial shows you one way to get SIMD instructions without using intrinsics.
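
    A rough sketch of that general idea, which may or may not match what the tutorial does: keep the loop in plain C and give the compiler enough information to pack iterations on its own. The function name vec_mul16 is made up for illustration, and the hints encode assumptions about the caller's data (arrays that do not overlap, 8-byte alignment, a trip count that is a nonzero multiple of 4). The TI-specific hints are compiled out on other compilers.

```c
#include <stddef.h>
#include <stdint.h>

/* Element-wise multiply of two 16-bit arrays, written as an ordinary loop.
 * vec_mul16 is a hypothetical name used only for this sketch. */
void vec_mul16(const int16_t *restrict a, const int16_t *restrict b,
               int32_t *restrict out, size_t n)
{
    size_t i;

#ifdef _TMS320C6X
    /* Assumed preconditions: all three arrays are 8-byte aligned. */
    _nassert(((int)a & 7) == 0);
    _nassert(((int)b & 7) == 0);
    _nassert(((int)out & 7) == 0);
    /* Assumed precondition: n >= 4 and n is a multiple of 4. */
    #pragma MUST_ITERATE(4, , 4)
#endif
    for (i = 0; i < n; i++) {
        /* Widen before multiplying so the product keeps all 32 bits. */
        out[i] = (int32_t)a[i] * b[i];
    }
}
```

    Whether the compiler actually emits packed instructions here depends on the optimization level and options, so it is worth checking the generated assembly rather than assuming it.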

    If you want to learn more about applying intrinsics generally, I recommend you get the DSPLIB package.  It may already have an implementation of the routine you need.  Even if it doesn't, there is likely to be something close.  The value is in looking at the source code.  For each routine, there are two implementations.  One is in natural C, i.e. the most straightforward implementation of the routine.  The other implementation is the one actually supplied in the library.  All sorts of optimization tricks are present, including use of intrinsics.  Comparing the two routines is very helpful.  It shows you how the intrinsics are intended to be used, and how to restructure your code to use them.

    For example, the natural C implementation of matrix multiply is the file install_root/packages/ti/dsplib/src/DSP_mat_mul/c66/DSP_mat_mul_cn.c .  The optimized implementation is install_root/packages/ti/dsplib/src/DSP_mat_mul/c66/DSP_mat_mul.c .  You'll see it uses several intrinsics.
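
    To give a feel for that kind of side-by-side comparison, here is a minimal sketch of a 16-bit dot product written both ways. It is not taken from DSPLIB. The names dotprod_cn, dotprod_simd, LOAD_PAIR and DOT_PAIR are made up for illustration. The intrinsic version assumes n is even, both arrays are 4-byte aligned, and the running sum fits in 32 bits. On the TI compiler the macros map to the _amem4_const and _dotp2 intrinsics; the host stand-ins exist only so the sketch can be built and checked on a PC.

```c
#include <stdint.h>
#include <string.h>

#ifdef _TMS320C6X
/* TI compiler: aligned 32-bit load (two int16 values) and packed dot product. */
#define LOAD_PAIR(p)   _amem4_const(p)
#define DOT_PAIR(x, y) _dotp2((int)(x), (int)(y))
#else
/* Host stand-ins (hypothetical helpers, not part of any TI header). */
static uint32_t load_pair(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* read two adjacent int16 values as one word */
    return v;
}
static int32_t dot_pair(uint32_t x, uint32_t y)
{
    int16_t xl = (int16_t)(x & 0xFFFFu), xh = (int16_t)(x >> 16);
    int16_t yl = (int16_t)(y & 0xFFFFu), yh = (int16_t)(y >> 16);
    return (int32_t)xl * yl + (int32_t)xh * yh;   /* low*low + high*high */
}
#define LOAD_PAIR(p)   load_pair(p)
#define DOT_PAIR(x, y) dot_pair((x), (y))
#endif

/* Natural C: one multiply-accumulate per iteration. */
int32_t dotprod_cn(const int16_t *a, const int16_t *b, int n)
{
    int32_t sum = 0;
    int i;
    for (i = 0; i < n; i++)
        sum += (int32_t)a[i] * b[i];
    return sum;
}

/* Intrinsic style: two elements per iteration.
 * Assumes n is even and a, b are 4-byte aligned. */
int32_t dotprod_simd(const int16_t *a, const int16_t *b, int n)
{
    int32_t sum = 0;
    int i;
    for (i = 0; i < n; i += 2)
        sum += DOT_PAIR(LOAD_PAIR(&a[i]), LOAD_PAIR(&b[i]));
    return sum;
}
```

    The restructuring is the important part: the loop steps by two, each load brings in a pair of elements, and one instruction does both multiplies and the add. Replacing a single scalar multiply with an intrinsic, without packing data this way, usually gains nothing.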

    Thanks and regards,

    -George

  • Dear georgem,

                   Thanks for the reply.
    I have been going through the DSPLIB source, comparing the natural C versions with the intrinsic versions and trying to learn from them. If you could provide me with a small example of how to implement a for loop that multiplies the values in two arrays using intrinsics, that would be nice.

    Regards,

    Sohal