Hi, everybody:
I have run into a problem recently. I want to write my algorithm in assembler.
My original algorithm, written in C, does not perform well. I tried raising the CCS optimization level to improve performance, but the result was still not good enough, so I decided to write it in assembler.
When I finished the assembler version, I used the CCS profiler to compare the CPU cycle counts of the C version and the assembler version. The result is that my assembler
version takes many more CPU cycles than the C version built with O2. I hope somebody can help me with the following questions:
First question: I thought the execute packet below would take just one cycle, but when I measure it, the LDW instructions seem to take a lot of CPU cycles. Why?
        ADD   .L1   A1,A10,A11
||      ADD   .L2   B1,B10,B11
||      LDW   .D1   *A2++,A20
||      LDW   .D2   *B2++,B20
Second question: when I build with O2, I can't debug the algorithm.
Third question: if I write the algorithm in assembler, how can I make it faster than the O2 build?