Hi
I have written this simple linear assembly code to calculate a dot product:
        .global _dotp
_dotp:  .cproc  pm, pn
        .reg    m, n, prod, sum
        ZERO    sum
        MVK     .S1   100, A1
loop:   LDH     .D1   *pm++, A2
||      LDH     .D2   *pn++, B2
        SUB     .S1   A1, 1, A1
 [A1]   B       .S2   loop
        NOP     2
        MPY     .M1X  A2, B2, A6
        NOP
        ADD     .L1   2, sum, sum
        .return sum
        .endproc
Because of the delay slots of the branch instruction, I expect the MPY and ADD instructions to fall inside the loop, so each of them should execute 100 times.
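To make the expected behaviour concrete, here is a rough C sketch of what I want the loop to do. The name dotp_ref and the count parameter are made up for illustration (the assembly hard-codes 100). It assumes signed 16-bit inputs, since LDH loads halfwords, and that the product is what gets accumulated on every iteration:

```c
#include <stdint.h>

/* Hypothetical reference version: one multiply and one add per iteration,
 * so both operations run `count` times (100 in the assembly above). */
int32_t dotp_ref(const int16_t *m, const int16_t *n, int count)
{
    int32_t sum = 0;
    for (int i = 0; i < count; i++) {
        /* Widen before multiplying so the 16x16 product is kept in 32 bits. */
        sum += (int32_t)m[i] * n[i];
    }
    return sum;
}
```

Calling dotp_ref(pm, pn, 100) should give the value I expect the assembly routine to return.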
But when I compile this linear assembly code with CCSv5, the result shows that the ADD and MPY instructions occur only once.
I compiled this code at optimization level O3.
Are any other optimization settings required?