Hi Wang,
Thanks very much for the clarification. I can now match the PMU cycle counts against the RTI cycle counts, and they are identical. But I am still wondering why the cycle counts are so high. The cycles we measure on the TMS570 are worse than the Cortex-M4 cycles, and I expected the R4 to be more powerful than the Cortex-M4.
I have created an adder example, and I see no difference in cycles whether optimization is enabled or not; we get the same cycle count either way. I also have an unrolled version of the adder function, and here too there is not much saving. Could we be missing some project options?
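For reference, here is a minimal sketch of the kind of adder test I mean. The function names, the element type, and the unroll factor of 4 are assumptions for illustration only, not the exact code from our project.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plain loop version: sums n 32-bit values. */
uint32_t adder(const uint32_t *data, size_t n)
{
    uint32_t sum = 0u;
    size_t i;

    for (i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}

/* Hypothetical manually unrolled version (factor 4, assumed). */
uint32_t adder_unrolled(const uint32_t *data, size_t n)
{
    uint32_t sum = 0u;
    size_t i = 0;

    /* Main body: four elements per iteration. */
    for (; i + 4u <= n; i += 4u) {
        sum += data[i] + data[i + 1u] + data[i + 2u] + data[i + 3u];
    }
    /* Tail: remaining 0 to 3 elements. */
    for (; i < n; i++) {
        sum += data[i];
    }
    return sum;
}
```

When timing these, I assume the returned sum has to be stored in a volatile variable (or otherwise used), so that the optimizer cannot drop the call entirely at higher optimization levels.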
Can you please run this example with optimization ON and OFF and let me know what you observe? I see that the non-unrolled code takes 1485 cycles and the unrolled code takes 1385 cycles. If I enable optimizations (-O3 and --opt_for_speed=5), there is almost no change in the cycle count. Can you please let me know what is happening here? This is crucial for our application; we need to optimize this code.
Best regards,
Kranti.