Hi,
I have written this simple linear assembly code to calculate a dot product:
        .global _dotp
_dotp:  .cproc  pm, pn
        .reg    m, n, prod, sum
        ZERO    sum
        MVK     .S1     100, A1
loop:
        LDH     .D1     *pm++, A2
||      LDH     .D2     *pn++, B2
        SUB     .S1     A1, 1, A1
 [A1]   B       .S2     loop
        NOP     2
        MPY     .M1X    A2, B2, A6
        NOP
        ADD     .L1     2, sum, sum
        .return sum
Because of the delay slots for the branch instruction, I expect the MPY and ADD instructions to fall inside the loop, so they should execute 100 times. But when I compile this linear assembly code with CCSv5, the result shows that the MPY and ADD instructions execute only once. I compiled this code with -O3. Are there any other optimization settings required?
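For reference, the result I expect from the loop is roughly what this C sketch computes. It assumes that pm and pn point to arrays of 16-bit values (as the LDH loads suggest) and that the ADD is meant to accumulate each product into sum. The name dotp_ref and the count parameter are only for illustration and are not part of the linear assembly above.

```c
#include <stdint.h>

/* Reference for the intended result (hypothetical helper, not the
   linear assembly routine): multiply the 16-bit elements pairwise
   and accumulate the products in a 32-bit sum. */
int32_t dotp_ref(const int16_t *pm, const int16_t *pn, int count)
{
    int32_t sum = 0;
    for (int i = 0; i < count; i++) {
        /* Widen before multiplying so the product is kept in 32 bits. */
        sum += (int32_t)pm[i] * (int32_t)pn[i];
    }
    return sum;
}
```

Calling dotp_ref(pm, pn, 100) would correspond to the 100 iterations set up by the MVK in the loop above, with one multiply and one add per iteration.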