Hi,
I have some C code that I need to optimize. I have already taken the following steps:
1. Used intrinsics wherever I could.
2. Replaced some functions that were taking a lot of cycles with linear assembly.
Still, I have not reached my target.
I have also observed that replacing C functions that use intrinsics with linear assembly does not make much difference; in fact, it is hard to beat the compiler's optimization of the C code in most cases.
What should I do next?
Thanks in advance.