loop optimization & profile

I'm using CCS 3.3.38.2, DSP/BIOS 5.31.02, and Code Generation Tools v6.0.8 on a C6416T.

I have an FIR filter (fir.asm). In the first version I did not use any of the hints that pass information to the compiler (restrict, _nassert, MUST_ITERATE), and I get the following two loops in fir.asm:
 
loop1:  ii=5 with 4 iterations in parallel
loop2 (alternate):  ii=4 with 4 iterations in parallel
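The post does not include the C source, so for reference here is a hypothetical FIR of the kind described, written with no compiler hints at all. The name `fir_plain`, the Q15 scaling and the argument layout are assumptions for illustration only. Without `restrict` the compiler has to assume the stores to `y` may alias the loads from `x` and `h`, and without `MUST_ITERATE` it knows nothing about the trip count of the inner loop.

```c
#include <assert.h>

/* Hypothetical plain FIR: y[j] = sum_i x[j + i] * h[i], Q15 output.
 * No restrict, _nassert or MUST_ITERATE: the compiler has to assume the
 * stores to y may alias x or h and that nh may be any value, even 0.
 * x must hold at least nh + ny - 1 samples. */
void fir_plain(const short *x, const short *h, short *y, int nh, int ny)
{
    int i, j;

    assert(nh >= 0 && ny >= 0);        /* runtime sanity check only */

    for (j = 0; j < ny; j++) {
        int sum = 0;                   /* 32-bit accumulator */
        for (i = 0; i < nh; i++)
            sum += x[j + i] * h[i];    /* 16x16 -> 32-bit multiply-accumulate */
        y[j] = (short)(sum >> 15);     /* scale the Q30 sum back to Q15 */
    }
}
```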
===================================================================================================
I then took the same filter and optimized it (fir_optimized.asm), this time giving the compiler all of the information (restrict, _nassert, MUST_ITERATE). I get only one loop, with this performance:
 
 ii=2 with 6 iterations in parallel.
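A hypothetical annotated counterpart, `fir_hinted`, shows where each hint would go. The values in `MUST_ITERATE(8, 256, 4)` (8 to 256 taps, always a multiple of 4) and the 4-byte alignment of `h` are assumptions, not numbers from the post. These hints are promises to the compiler: if the real data breaks them, the generated code can give wrong results rather than just running slower. `_nassert` is a TI C6000 intrinsic, so the sketch defines a no-op fallback for other compilers, and a plain `assert` checks the trip-count promise in debug builds.

```c
#include <assert.h>

/* _nassert() is a TI C6000 compiler intrinsic. Outside the TI compiler,
 * turn it into a no-op so this sketch still compiles on a host. */
#ifndef __TI_COMPILER_VERSION__
#define _nassert(expr) ((void)0)
#endif

/* Hypothetical annotated FIR: y[j] = sum_i x[j + i] * h[i], Q15 output.
 * x must hold at least nh + ny - 1 samples. */
void fir_hinted(const short *restrict x, const short *restrict h,
                short *restrict y, int nh, int ny)
{
    int i, j;

    /* Debug-build check of the (assumed) trip-count promise made below. */
    assert(nh >= 8 && nh <= 256 && nh % 4 == 0);

    /* Promise to the TI compiler that h is 4-byte aligned (assumption). */
    _nassert(((int)h & 3) == 0);

    for (j = 0; j < ny; j++) {
        int sum = 0;
        /* Promise: 8..256 iterations, always a multiple of 4 (assumption). */
        #pragma MUST_ITERATE(8, 256, 4)
        for (i = 0; i < nh; i++)
            sum += x[j + i] * h[i];
        y[j] = (short)(sum >> 15);
    }
}
```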
 
Comparing the two, fir_optimized looks like a much better implementation; however, when I run both in the Profiler, the optimized version shows a 30% increase in cycle count.
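A rough model helps sanity-check that comparison. For a software-pipelined loop with iteration interval ii and S iterations in parallel, N iterations take about (N + S - 1) * ii cycles, prolog and epilog included. `pipelined_cycles` is a hypothetical helper encoding that model; it assumes both loops run the same number of iterations N, which may not hold if the compiler unrolled one of them. With the numbers above, loop1 gives 5N + 15, loop2 gives 4N + 12 and the optimized loop gives 2N + 10 cycles, so by this model the optimized kernel should win for every N >= 1. That suggests the extra cycles come from something the model leaves out, for example outer-loop or call overhead, memory stalls, or what the Profiler range actually covers.

```c
#include <assert.h>

/* Rough cycle estimate for one software-pipelined (modulo-scheduled) loop:
 * a new iteration starts every ii cycles and each iteration spans `stages`
 * stages of ii cycles, so n_iter iterations need about
 * (n_iter + stages - 1) * ii cycles, prolog and epilog included.
 * Ignores call/outer-loop overhead, memory stalls and cache misses. */
long pipelined_cycles(long n_iter, int ii, int stages)
{
    assert(ii > 0 && stages > 0);
    if (n_iter <= 0)
        return 0;                 /* loop body never executes */
    return (n_iter + stages - 1) * (long)ii;
}
```

For example, with 16 iterations the model gives 95, 76 and 42 cycles for loop1, loop2 and the optimized loop respectively.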
 
Do you have any hints as to what is really happening here?

Thanks,

Andrew