Compiler/TMS320C6748: Multiple branch instructions usage in divu.asm function from TI RTS lib

Karel Szurman

Part Number: TMS320C6748

Tool/software: TI C/C++ Compiler

Hi,

I'm trying to figure out why the branch instruction in the following code snippet is used this way. A branch is placed at the end of each execute packet, so it executes in parallel with the other instructions in that packet, and it is not conditional. I suppose this has something to do with full utilization of the pipeline: all of the branch instructions are executed, but only one is really used. Could you please help me understand this tricky use of the branch instruction?

	CMPGTU	B4,	A4,	B2	; gt = den > num
||	SUB	A0,	B0,	A0	; quotient_shift = 32 - i
||	SHL	A2,	A6,	A2	; first_div <<= i
||	B	LOOP			;
 
   [B1]	ZERO	B2			; num32 && gt 
|| [B2]	MV	B2,	B1		; !(num32 && !gt)
|| [B2]	SHRU	A2,	1,	A2	; first_div >>= 1
||	B	LOOP			;
 
   [B2]	SHRU	B4,	1,	B4	; if (num32 && gt) den >> 1
||[!B1] SUB	A4,	B4,	A4	; if (num32 && !gt) num -= den
||	B	LOOP			;
 
  [!B1]	SHRU	B4,	1,	B4	; if (num32 && !gt) den >> 1
|| [B2]	SUB	A4,	B4,	A4	; if (num32 && gt) num -= den
||	CMPLT	B0,	7,	B2	; check for negative loop counter
||	SUB	B0,	7,	B1	; generate loop counter
||	B	LOOP			;
 
   [B2]	ZERO	B1			; zero negative loop counter
|| [B0]	SUBC	A4,	B4,	A4	; num = subc(num, den)
|| [B0]	SUB	B0,	1,	B0	; i--
||	B	LOOP			;
 
LOOP: 
   [B0]	SUBC	A4,	B4,	A4	; num = subc(num, den)
|| [B0]	SUB	B0,	1,	B0	; i--
|| [B1]	SUB	B1,	1,	B1	; i--
|| [B1]	B	LOOP			; for

(Code is taken from divu.asm file from TI RTS library for C6000, compiler v. 7.4.12).

Thanks!

Karel

over 8 years ago

0 Archaeologist over 8 years ago

TI__Guru* 84285 points

This is a software-pipelined loop, and it is indeed tricky to understand. A full explanation is beyond the scope of the forum, but here is a quick intuitive picture.

Look only at the branches in the first 5 execute packets. They all branch to LOOP, and the execute packet at LOOP is the software-pipelined kernel. A branch instruction has 5 delay slots, so each of those early branches is still in the pipeline when the next one is issued, and you end up with several branch instructions "in flight" at the same time.

The delay slots of the first branch cover the four packets that follow it plus the packet at LOOP, so the execute packet at LOOP is executed once by fall-through before the first branch takes effect. Then the first branch takes effect and the PC is right back at LOOP, so that execute packet is executed again. One cycle later the second branch takes effect, and again the PC is right back at LOOP, and so on for each branch in flight. Meanwhile the conditional [B1] B LOOP inside the kernel keeps issuing further branches until B1 reaches zero. So having multiple "in flight" branches like this is a cute little trick to execute one execute packet over and over, as if it were a zero-overhead loop.
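To make the timing concrete, here is a minimal Python sketch of the branch behavior described above. It is not TI code and not a C6000 simulator: it assumes one execute packet per cycle with no stalls or interrupts, models only the branches (none of the arithmetic), and the names `simulate`, `DELAY_SLOTS`, and `PROLOG_PACKETS` are made up for this illustration. Packets 0 to 4 stand for the five packets before LOOP, and packet 5 stands for the kernel at LOOP.

```python
DELAY_SLOTS = 5      # a C6000 branch takes effect after 5 delay slots
PROLOG_PACKETS = 5   # execute packets before LOOP, each with an unconditional B LOOP


def simulate(b1):
    """Return the packet index executed on each cycle.

    b1 is the initial value of the counter tested by the kernel's [B1] B LOOP.
    Index PROLOG_PACKETS is the kernel packet at LOOP; the trace stops when
    execution falls through past it.
    """
    loop = PROLOG_PACKETS
    landing = {}          # cycle -> branch target that takes effect on that cycle
    executed = []
    pc, cycle = 0, 0
    while pc <= loop:
        executed.append(pc)
        # Prolog branches are unconditional; the kernel branch is guarded by B1.
        taken = pc < loop or b1 > 0
        if pc == loop and b1 > 0:
            b1 -= 1       # [B1] SUB B1,1,B1 sits in the same packet as the branch
        if taken:
            # The target packet runs once the 5 delay slots have elapsed.
            landing[cycle + DELAY_SLOTS + 1] = loop
        cycle += 1
        # A branch landing on this cycle redirects the PC; otherwise fall through.
        pc = landing.pop(cycle, pc + 1)
    return executed


if __name__ == "__main__":
    print(simulate(0))
    print(simulate(3).count(PROLOG_PACKETS))
```

Under these assumptions, `simulate(0)` runs packets 0 to 4 once each and then the kernel six times: once by fall-through and once for each of the five prolog branches. Every kernel branch taken while B1 is non-zero adds one more pass, so `simulate(3)` runs the kernel nine times. That may be why the listing derives the counter with SUB B0, 7, B1: six kernel passes come from the prolog alone, and the fifth packet already performs one SUBC step.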

0 Karel Szurman over 8 years ago in reply to Archaeologist

Prodigy 140 points

Thanks for helping me understand this better. It's a little bit clearer now.
