6678 linear assembly delay slots

Hi,

I have written this simple linear assembly code to calculate a dot product:

.global _dotp
_dotp: .cproc pm, pn
    .reg m, n, prod, sum
    ZERO sum
    MVK .S1 100, A1
    
loop:
    LDH .D1 *pm++, A2
       ||LDH .D2 *pn++, B2
    SUB .S1 A1, 1, A1
 [A1] B .S2 loop
    NOP 2  
    MPY .M1X A2, B2, A6
    NOP
    ADD .L1 2, sum, sum
    .return sum
Because of the delay slots of the branch instruction, the ADD and MPY instructions should execute inside the loop, so they should execute 100 times. But when I compile this linear assembly code with CCSv5, the result shows that the ADD and MPY instructions execute only once. I compiled this code with -O3. Are any other optimization settings required?
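A host-side C model of what the loop above is meant to compute can help check the routine's result on the target. This is a sketch, not TI code: the name dotp_ref is hypothetical, the fixed count of 100 is taken from the MVK 100 in the question, and it assumes LDH sign-extends each halfword, MPY forms a signed 16 x 16 to 32-bit product, and the 32-bit ADD wraps on overflow.

```c
#include <stdint.h>

#define N 100  /* trip count, taken from MVK 100 in the question */

/* Hypothetical reference model of the intended dot product.
 * Assumed hardware semantics:
 *   LDH  - loads a 16-bit halfword and sign-extends it
 *   MPY  - signed 16 x 16 -> 32-bit product (always fits in int32_t)
 *   ADD  - 32-bit add that wraps on overflow, modeled with uint32_t */
int32_t dotp_ref(const int16_t *pm, const int16_t *pn)
{
    uint32_t sum = 0;
    for (int i = 0; i < N; i++) {
        int32_t prod = (int32_t)pm[i] * (int32_t)pn[i];
        sum += (uint32_t)prod;  /* wraps modulo 2^32 like the register */
    }
    return (int32_t)sum;
}
```

For example, with pm[i] = i + 1 and pn[i] = 2 the model returns 2 × 5050 = 10100, which is what the linear assembly routine should also return for the same inputs.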
  • When you write linear assembly, you are required to leave all the scheduling decisions to the compiler. Do not add any NOP instructions, place the branch with regard to the delay slots, etc. You are allowed to use machine register names and specify the functional unit for an instruction, but it is best to leave those out as well.

    To get you started, I rewrote your code.  While this is close, I can guarantee it is not 100% correct. 

    	.global	_dotp
    _dotp:	.cproc	pm, pn
    	.reg	m, n, prod, sum, cnt
    	ZERO	sum
    	MVK	100, cnt
    
    loop:
    	LDH	*pm++, m
    	LDH	*pn++, n
    	MPY	m, n, prod
    	ADD	sum, prod, sum
     [cnt]	SUB	cnt, 1, cnt
     [cnt]	B	loop
    
    	.return sum
    	.endproc
    

    I don't have the ADD 2, sum, sum instruction that you have in your original code. I am not sure how to combine that with the result of the MPY instruction. Now that you don't have to worry about all the difficult scheduling details, you can easily change the code as needed.

    To have the compiler schedule this loop with a software pipeline, you need to use --opt_level=2 or higher.

    Thanks and regards,

    -George

  • Hi George, and thanks.

    That ADD 2, sum, sum line was just a mistake.

    Do you mean I'm not allowed to use NOP in linear assembly? (I have seen some linear assembly code on the forum that uses NOPs.)

    So how can I be sure that, in the final assembly file, the compiler places the ADD and MPY instructions in the delay slots of the branch instruction? (And how can I force the compiler to do that?)

    Thanks and regards,

  • dido said:
    Do you mean I'm not allowed to use NOP in linear assembly?

    That's correct.

    dido said:
    So how can I be sure that, in the final assembly file, the compiler places the ADD and MPY instructions in the delay slots of the branch instruction? (And how can I force the compiler to do that?)

    The compiler handles those details for linear assembly just as it does for C code.

    Thanks and regards,

    -George
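For comparison, here is a sketch of the same 100-element dot product written in C, which the compiler schedules the same way it schedules linear assembly, filling the branch, LDH, and MPY delay slots itself. The function name dotp_c is hypothetical. MUST_ITERATE is a TI compiler pragma that tells the compiler the loop's trip count; it is guarded by __TI_COMPILER_VERSION__ so other compilers skip it. To see where the ADD and MPY ended up, you can keep the generated .asm file (for example with the TI compiler's --keep_asm option) and inspect the scheduled loop.

```c
/* Hypothetical C equivalent of the linear assembly dot product.
 * restrict promises the two inputs do not overlap, which helps the
 * compiler software-pipeline the loop. Assumes the sum fits in int. */
int dotp_c(const short *restrict pm, const short *restrict pn)
{
    int sum = 0;
    int i;
#ifdef __TI_COMPILER_VERSION__
    /* TI-specific hint: this loop always runs exactly 100 times. */
#pragma MUST_ITERATE(100, 100)
#endif
    for (i = 0; i < 100; i++) {
        sum += pm[i] * pn[i];  /* shorts promote to int before multiplying */
    }
    return sum;
}
```

As with the linear assembly version, the loop is only eligible for software pipelining at --opt_level=2 or higher.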