Suggestions on optimization required on 6482

Other Parts Discussed in Thread: TMS320C6455

Hi,

I need some assistance in optimizing the below operations

1) An array contains complex values a+ib, where a and b are both signed short integers. I need an output array c of unsigned integers, where c[j] = (a[j]*a[j]) + (b[j]*b[j]). What is the best way to implement this equation in an optimized manner? (The array length is a multiple of 1024.)

2) Given two arrays A and B, I need to find C such that C[j] = A[j] + B[j]. A, B and C are all of signed short integer type. What is the best way to implement this equation in an optimized manner? (The array length is a multiple of 1024.)

Is there any library available to implement these equations in an optimized manner?
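
For reference, the two operations can be written in plain C as below. This is only a baseline sketch, not a tuned implementation: the function names `power_ref` and `add_ref` are hypothetical, and the complex input is assumed to be stored interleaved (a[0], b[0], a[1], b[1], ...), which the post does not state. The `restrict` qualifiers tell the compiler the arrays do not overlap, which it generally needs to know before it can software-pipeline such loops.

```c
#include <stdint.h>

/* Assumed layout: in[2*j] is the real part a[j], in[2*j + 1] is the imaginary part b[j]. */
void power_ref(const int16_t *restrict in, uint32_t *restrict out, int n)
{
    int j;
    for (j = 0; j < n; j++) {
        int32_t a = in[2 * j];
        int32_t b = in[2 * j + 1];
        /* Sum in unsigned arithmetic: a = b = -32768 gives 2^31, which does not fit in int32_t. */
        out[j] = (uint32_t)(a * a) + (uint32_t)(b * b);
    }
}

/* Element-wise 16-bit add; the result is truncated to 16 bits, not saturated. */
void add_ref(const int16_t *restrict a, const int16_t *restrict b,
             int16_t *restrict c, int n)
{
    int j;
    for (j = 0; j < n; j++)
        c[j] = (int16_t)(a[j] + b[j]);
}
```

A version like this is useful mainly as a correctness check against any optimized variant.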

  • Hi,

    Thanks for your post.

    Do you mean TCI6482?

    If so, TCI6482 is not supported through the forums but through your established marketing channels. If you need help with those contacts, please go through your TI salesperson.

    Thanks & regards,

    Sivaraj K


  • Hi Sivaraj,

    Yes, I was referring to the TCI6482. However, my question is more generic: how can the same be done on any other C64x DSP, say the TMS320C6455?

    Regards,

    Vencatesh

  • Hi,

    Thanks for your update.

    TI provides C64x core optimization workshops, online materials, and compiler optimization techniques for improving performance and reducing code size. Please check the TI wiki resources below:

    http://processors.wiki.ti.com/index.php/TMS320C6000_DSP_Optimization_Workshop

    https://processors.wiki.ti.com/images/e/e8/C64plus_cgt_overview.pdf

    http://processors.wiki.ti.com/index.php/Optimization_Techniques_for_the_TI_C6000_Compiler

    Thanks & regards,

    Sivaraj K


  • Hi,

    I'm aware of the documents you listed above. I compile my code at the -o3 optimization level, and my for loops use the MUST_ITERATE pragma. I have tried the multiplies with _mpy and _mpyh, and the addition with _sadd. I have also tried using _mem4 to load each complex value and _dotp2 to compute a*a+b*b. But none of these seems to reduce my cycle count.

    That is why I posted this question on the forum, to see whether there are any built-in libraries that do the same. I would highly appreciate a solution to the problem described in my initial post.

    Regards,

    Vencatesh
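
The intrinsic-based approach described in this post might look roughly like the sketch below. It is not the poster's actual code, which was never shown. The helper names `LD_PAIR`, `ST_PAIR`, `DOT_PAIR` and `ADD_PAIR` are hypothetical: under the TI compiler they map onto the `_amem4_const`, `_amem4`, `_dotp2` and `_add2` intrinsics, and elsewhere onto portable C so the logic can be checked on a host. The sketch assumes 4-byte-aligned arrays, interleaved complex input, and lengths that are multiples of 1024. Note that `_add2` wraps each 16-bit half rather than saturating.

```c
#include <stdint.h>
#include <string.h>

#ifdef _TMS320C6X
/* TI C6000 compiler: map the hypothetical helper names onto real intrinsics. */
#define LD_PAIR(p)      _amem4_const(p)        /* aligned 32-bit load  */
#define ST_PAIR(p, v)   (_amem4(p) = (v))      /* aligned 32-bit store */
#define DOT_PAIR(x, y)  ((uint32_t)_dotp2((int)(x), (int)(y)))
#define ADD_PAIR(x, y)  ((uint32_t)_add2((int)(x), (int)(y)))
#else
/* Host fallbacks that emulate the packed 16-bit behaviour, for testing only. */
static uint32_t LD_PAIR(const void *p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }
static void ST_PAIR(void *p, uint32_t v) { memcpy(p, &v, sizeof v); }
static int32_t half(uint32_t x, int hi) { return (int16_t)(hi ? x >> 16 : x & 0xFFFFu); }
static uint32_t DOT_PAIR(uint32_t x, uint32_t y)
{
    return (uint32_t)(half(x, 0) * half(y, 0)) + (uint32_t)(half(x, 1) * half(y, 1));
}
static uint32_t ADD_PAIR(uint32_t x, uint32_t y)
{
    uint32_t lo = (x + y) & 0xFFFFu;                  /* low halves, carry discarded  */
    uint32_t hi = ((x >> 16) + (y >> 16)) & 0xFFFFu;  /* high halves, carry discarded */
    return (hi << 16) | lo;
}
#endif

/* c[j] = a[j]^2 + b[j]^2: one 32-bit load brings in both halves of a sample. */
void power_simd(const int16_t *restrict in, uint32_t *restrict out, int n)
{
    int j;
#ifdef _TMS320C6X
    #pragma MUST_ITERATE(1024, , 1024)
#endif
    for (j = 0; j < n; j++) {
        uint32_t ab = LD_PAIR(&in[2 * j]);
        out[j] = DOT_PAIR(ab, ab);
    }
}

/* C[j] = A[j] + B[j]: two 16-bit sums per packed add. n counts shorts. */
void add_simd(const int16_t *restrict a, const int16_t *restrict b,
              int16_t *restrict c, int n)
{
    int j;
#ifdef _TMS320C6X
    #pragma MUST_ITERATE(512, , 512)
#endif
    for (j = 0; j < n; j += 2)
        ST_PAIR(&c[j], ADD_PAIR(LD_PAIR(&a[j]), LD_PAIR(&b[j])));
}
```

Whether this beats what the compiler already generates from plain C at -o3 depends on the generated schedule and on where the arrays live in memory, so it is worth comparing cycle counts rather than assuming a gain.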

  • Vencatesh,

    You should have learned more out of the optimization workshop material than what you listed above, but it is a good start. You will want to look at all of the SIMD instructions to see what could be used. And look at the other optimization techniques described and discussed in the resources listed above.

    1. How fast does it need to be for your application to work? Why do you need that speed for this operation? What is the nature of your application?

    2. How fast is it running now? How much faster do you think is theoretically possible?

    3. Which memory endpoints are you using for the source and destination for your arrays? Try pre-loading the cache or using EDMA3 to copy data to/from slow memory and internal fast memory, especially L1SRAM, if cache is not helping enough.

    4. Show the compiler switches you are using.

    5. Show some of the C source and assembly output from these loops to demonstrate what is happening in the compiler operation and in the code execution.

    6. Have you tried writing it in assembly to improve the performance? That is a tough task to take on, so make it the last method you try.

    Regards,
    RandyP
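
To answer question 2 with numbers, one option is to time each loop with the C64x+ time stamp counter. The sketch below is a minimal example under stated assumptions: `cycles_start`, `cycles_now`, `best_elapsed` and `count_call` are hypothetical names; on the TI compiler the counter is read through the `TSCL` register declared in `c6x.h`, and on a host the code falls back to `clock()` purely so that it compiles and runs. Taking the best of several runs is a design choice that reduces the effect of cold-cache misses on the first pass.

```c
#include <stdint.h>

#ifdef _TMS320C6X
#include <c6x.h>                       /* declares the TSCL control register */
static void     cycles_start(void) { TSCL = 0; }   /* any write starts the counter */
static uint32_t cycles_now(void)   { return TSCL; }
#else
#include <time.h>                      /* host stand-in, not a cycle counter */
static void     cycles_start(void) { }
static uint32_t cycles_now(void)   { return (uint32_t)clock(); }
#endif

/* Run fn(arg) `runs` times and return the smallest elapsed count observed.
 * Returns UINT32_MAX if runs is zero. */
uint32_t best_elapsed(void (*fn)(void *), void *arg, int runs)
{
    uint32_t best = UINT32_MAX;
    uint32_t t0, dt;
    int r;

    cycles_start();
    for (r = 0; r < runs; r++) {
        t0 = cycles_now();
        fn(arg);
        dt = cycles_now() - t0;        /* unsigned subtraction tolerates wraparound */
        if (dt < best)
            best = dt;
    }
    return best;
}

/* Placeholder workload: counts how many times it was called. */
void count_call(void *arg)
{
    ++*(int *)arg;
}
```

In practice `fn` would wrap the loop under test, and the result divided by the array length gives cycles per output element to compare against the schedule reported in the compiler's assembly output.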