Compiler/TMS320C6748: Multiple branch instructions usage in divu.asm function from TI RTS lib

Part Number: TMS320C6748

Tool/software: TI C/C++ Compiler

Hi, 

I'm trying to figure out why the branch instruction is used this way in the following code snippet. A branch is placed at the end of each execute packet, so it is supposed to execute in parallel with the other instructions in the packet, and it is not conditionally executed. I suppose this has something to do with fully utilizing the pipeline: all the branch instructions are executed, but only one is really used. Could you please help me understand this tricky use of the branch instruction?

	CMPGTU	B4,	A4,	B2	; gt = den > num
||	SUB	A0,	B0,	A0	; quotient_shift = 32 - i
||	SHL	A2,	A6,	A2	; first_div <<= i
||	B	LOOP			;
 
   [B1]	ZERO	B2			; num32 && gt 
|| [B2]	MV	B2,	B1		; !(num32 && !gt)
|| [B2]	SHRU	A2,	1,	A2	; first_div >>= 1
||	B	LOOP			;
 
   [B2]	SHRU	B4,	1,	B4	; if (num32 && gt) den >> 1
||[!B1] SUB	A4,	B4,	A4	; if (num32 && !gt) num -= den
||	B	LOOP			;
 
  [!B1]	SHRU	B4,	1,	B4	; if (num32 && !gt) den >> 1
|| [B2]	SUB	A4,	B4,	A4	; if (num32 && gt) num -= den
||	CMPLT	B0,	7,	B2	; check for negative loop counter
||	SUB	B0,	7,	B1	; generate loop counter
||	B	LOOP			;
 
   [B2]	ZERO	B1			; zero negative loop counter
|| [B0]	SUBC	A4,	B4,	A4	; num = subc(num, den)
|| [B0]	SUB	B0,	1,	B0	; i--
||	B	LOOP			;
 
LOOP: 
   [B0]	SUBC	A4,	B4,	A4	; num = subc(num, den)
|| [B0]	SUB	B0,	1,	B0	; i--
|| [B1]	SUB	B1,	1,	B1	; i--
|| [B1]	B	LOOP			; for

(Code is taken from divu.asm file from TI RTS library for C6000, compiler v. 7.4.12). 

Thanks!

Karel

  • This is a software pipelined loop. It is indeed tricky to understand. Explaining how it works in full is beyond the scope of the forum, but I'll give you a quick intuitive picture. Look only at the branches in the first 5 execute packets. Note that they all branch to LOOP. The execute packet at LOOP is the software pipelined kernel. Branch instructions have 5 delay slots, so each of the first few branches is still in its delay slots when the next branch is issued, and you end up with several branch instructions "in flight" at the same time. What will happen is that the execute packet at LOOP will be executed once before the first branch takes effect. Then the first branch takes effect and the PC is right back at LOOP, so that execute packet is executed again. Then the second branch takes effect, and again the PC is right back at LOOP, and so on for each of the remaining branches. So having multiple "in flight" branches like this is a cute little trick to execute one execute packet over and over, as if it were a zero-overhead loop.
  • Thanks for helping me understand this better. Now it's a little bit clearer.
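
For anyone who wants to see the "in flight" branches from the first reply play out cycle by cycle, here is a minimal Python sketch. It is not TI code and not a model of the real pipeline: it assumes one execute packet issues per cycle with no stalls or multicycle NOPs, it ignores all register contents except a single counter, and the names `simulate`, `kernel_counter`, `LOOP` and `EXIT` are hypothetical. The counter only stands in for the role B1 plays in the kernel's `[B1] B LOOP`.

```python
BRANCH_DELAY_SLOTS = 5   # a C6000 branch takes effect after 5 delay slots
PROLOGUE_PACKETS = 5     # the five execute packets that each contain "B LOOP"
LOOP = PROLOGUE_PACKETS  # index of the kernel execute packet
EXIT = LOOP + 1          # index of whatever follows the kernel (assumed)


def simulate(kernel_counter):
    """Return the index of the execute packet issued on each cycle.

    kernel_counter is a hypothetical stand-in for B1: while it is
    nonzero, the kernel packet issues its own branch back to LOOP.
    """
    pending = {}  # cycle at which a branch lands -> target packet index
    trace = []
    pc = 0
    cycle = 0
    b1 = kernel_counter
    while pc != EXIT:
        trace.append(pc)
        if pc < LOOP:
            take_branch = True     # prologue branches are unconditional
        else:
            take_branch = b1 != 0  # kernel branch is predicated on the counter
            if b1:
                b1 -= 1            # mirrors "[B1] SUB B1, 1, B1"
        if take_branch:
            # The target is fetched once the 5 delay slots have passed.
            pending[cycle + BRANCH_DELAY_SLOTS + 1] = LOOP
        cycle += 1
        # A branch landing on this cycle redirects the PC; otherwise fall through.
        pc = pending.pop(cycle, pc + 1)
    return trace


if __name__ == "__main__":
    for counter in (0, 3):
        trace = simulate(counter)
        print(counter, trace, "kernel executions:", trace.count(LOOP))
```

In this simplified model the kernel packet runs six times even when the counter starts at zero: once by falling through from the fifth packet, and once more for each of the five prologue branches as they land on consecutive cycles. Every nonzero counter value adds one further kernel execution.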