
Even more optimization!

Hi,

I have C code that I need to optimize. I have already taken the following steps:

1. Used intrinsics wherever I could.

2. Replaced some functions that were taking a lot of cycles with linear assembly.

I still have not reached my target.

I have also observed that replacing C functions that use intrinsics with linear assembly does not make much difference; in fact, it is hard to beat the compiler's optimization of the C code in most cases.

What should I do next?

Thanks in advance.

  • Saad, I will be glad to help.
    Have you looked at the compiler feedback included in the generated ASM file? Note that you can ask the compiler to keep the .asm file by using the -k option. Is the feedback consistent with what you expect?
    Have you used 'restrict' on the data pointers to indicate the correct memory dependencies to the compiler?
    What is your target, and how far off are you from it?
    What kind of function are you trying to implement? What is your theoretical best estimate for this kernel?

    Regards,
    Gagan
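
To illustrate the 'restrict' suggestion above, here is a minimal sketch in standard C99. The function name weighted_sum and its arithmetic are hypothetical and stand in for whatever kernel is actually being optimized; only the placement of the restrict qualifiers is the point. Without them, the compiler generally has to assume that a store to out[i] might change a later a[] or b[] value, which can limit how aggressively it schedules the loop.

```c
#include <assert.h>

/* Hypothetical kernel (illustration only): out[i] = (3*a[i] + b[i]) / 4.
 * The restrict qualifiers are a promise from the programmer that, within
 * this function, out, a, and b never refer to overlapping memory. If that
 * promise is broken by the caller, the behavior is undefined. */
void weighted_sum(short *restrict out,
                  const short *restrict a,
                  const short *restrict b,
                  int n)
{
    int i;

    assert(n >= 0);

    for (i = 0; i < n; i++) {
        /* Inputs are assumed non-negative here, so the right shift
         * is a plain division by 4 rounded down. */
        out[i] = (short)((3 * a[i] + b[i]) >> 2);
    }
}
```

After adding the qualifiers, rebuilding with -k and comparing the feedback in the kept .asm file against the earlier build should show whether the compiler was being held back by assumed memory dependencies. Only use restrict when the buffers really do not overlap.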