SPKERNEL (fstg, fcyc)

Because of my bad English, I can't understand what fstg and fcyc mean in the instruction SPKERNEL (fstg, fcyc). Could anybody explain them for the following code:

sploop 13
ldw *a4, a10
|| ldw *b4++, b10
nop
nop
nop
nop
nop
mpy32 b10, b8, b13:b12
nop
nop
nop
nop
add.d1 b13, a10, a10
stw a10, *a4++
spkernel

  • Hi Stanislav,

    As per my understanding, the SPLOOP instruction is used when the number of loop iterations is known in advance.

    SPLOOP ii
    The ii parameter is the iteration interval that specifies the interval (in instruction cycles) between successive iterations of the loop.

    Please refer to the PDF below:
    www.ti.com/.../sprugh7.pdf

    As per the above document

    SPKERNEL (fstg, fcyc)
    The (optional) fstg and fcyc parameters specify the delay interval between the SPKERNEL instruction and the start of the post epilog code. The
    fstg specifies the number of complete stages and the fcyc specifies the number of cycles in the last stage in the delay.

    The SPKERNEL instruction arguments (fstg, fcyc) instruct the SPLOOP hardware to begin executing the post-SPLOOP instructions after a delay (expressed in stages and cycles) measured from the start of the epilog.

    For more information, please refer to Table 4-11, Field Allocation in stg/cyc Field.
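
The description above (fstg counts complete stages, fcyc counts cycles in the last stage) can be read as a single cycle count. The following is a minimal Python sketch of that reading, not TI's definition: the function name `spkernel_delay_cycles` is hypothetical, and it assumes that one stage lasts ii cycles and that fcyc is therefore always smaller than ii.

```python
def spkernel_delay_cycles(ii: int, fstg: int, fcyc: int) -> int:
    """Total delay in cycles encoded by SPKERNEL fstg,fcyc for a given ii."""
    if ii <= 0:
        raise ValueError("ii must be positive")
    # Assumption: fcyc counts cycles inside one stage, so it must be below ii.
    if fstg < 0 or not 0 <= fcyc < ii:
        raise ValueError("expected fstg >= 0 and 0 <= fcyc < ii")
    # Each complete stage is ii cycles long; fcyc adds the partial stage.
    return fstg * ii + fcyc


# A plain SPKERNEL (equivalent to 0,0) adds no extra delay.
print(spkernel_delay_cycles(ii=13, fstg=0, fcyc=0))
```

Under this reading, the plain `spkernel` in the question asks for no delay at all between the end of the kernel and the code that follows the loop.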
  • Stanislav,

    I have been writing C6000 assembly for several years and would say I am very good at it. But I have never been able to fully understand the SPKERNEL instruction's fstg and fcyc fields. It is not your 'bad English' that is keeping you from understanding it.

    Arvind's explanation is like what is in the CPU & Instruction Set Ref Guide, and I understand the words but do not understand it well enough to figure out what to use for those two parameters.

    As I tell everyone in this forum, do not write in assembly. Write your algorithm in C, let the compiler optimize it, and study the compiler's output if you want to figure out ways to optimize that. For loops like this, you might try writing in linear assembly to get to the SPLOOP/SPKERNEL pair more quickly.

    You will learn a lot from using the C Compiler for this specific loop, and you might learn a lot by trying linear assembly. In either case, you should go to TI.com and search for the C6000 Optimization Workshop. I am pretty sure there are archived copies of that on our Wiki. It has all the information you need for optimizing C code, for writing optimized loops in C code, and writing linear assembly. It may even have a section on SPLOOP that tries to explain SPKERNEL.

    Regards,
    RandyP
  • fstg, fcyc is very simple. Imagine for a moment that SPKERNEL did not support the extra arguments (which would be equivalent to passing 0,0 at all times). You have to recognize that the execute packet following the one with SPKERNEL is scheduled for execution at the ii*n-th cycle, where ii is the argument to the SPLOOP instruction and n is the initial ILC value. But what if it is not yet time to execute that instruction/packet at that point? In that case you would have to add some NOPs to wait for the right moment, right? fstg,fcyc is nothing but a way to express the corresponding delay. But what are their values? Even simpler. If you have to postpone the first instruction/packet following the loop by j cycles, then fstg is j/ii (integer division) and fcyc is j%ii. That is all there is to it.

    As for the suggested example: it is not representative, because the loop body is exactly as long as ii, so there is no reason to postpone the instruction/packet following SPKERNEL (which is absent anyway). For this reason it cannot illustrate what fstg,fcyc are useful for. However, it should be noted that the operation in question can be performed with SPLOOP 2, which is [asymptotically] 6.5[!] times faster than the suggested example (13/2 = 6.5). In that case you would likely have to use fstg,fcyc. Indeed, imagine that the missing instruction was a return from subroutine. Would you want it to start executing at the 2*n-th cycle? No, because at that point you would still have 11 cycles to go until the last instruction of the last iteration of the loop body. But a return from subroutine gives you only 5 extra cycles (its delay slots), after which control is passed to the caller, and passing control that early is likely to have a catastrophic effect. So you would want this transfer of execution to coincide with the last instruction in the loop, and therefore you would postpone the issue of the return instruction by 5 cycles (11, minus 5 delay slots, minus 1 for the instruction itself). With ii = 2 that gives fstg = 5/2 = 2 and fcyc = 5%2 = 1, i.e. SPKERNEL 2,1. [As a side note, one also has to ensure that an execution port is available for the return instruction at that particular cycle. If there is none, you would have to postpone execution even further.]

    It is important to note that "can be performed with SPLOOP 2" does not mean that you can simply replace 13 with 2 in the presented example. That would produce a dead wrong result. The loop would have to be rearranged for the lower iteration interval.

    As for "SPLOOP 2" per se: for completeness' sake, there is one condition that would legitimize SPLOOP 13 and render the SPLOOP 2 suggestion inappropriate, though I find it hard to imagine that it would have to be met. The condition is that the second ldw instruction in the loop has to load the result written by the stw instruction. Only then would SPLOOP 13 be a must. Is that actually the case here?
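
The rule in this reply (fstg is j/ii, fcyc is j%ii) and the return-from-subroutine example can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `spkernel_args` and `return_postpone_cycles` are hypothetical, and the 11 remaining cycles and 5 branch delay slots are the figures quoted in the reply for a rearranged ii = 2 loop, not values derived from real code.

```python
def spkernel_args(delay_cycles: int, ii: int) -> tuple[int, int]:
    """Split a postponement of delay_cycles into (fstg, fcyc) for a given ii."""
    if ii <= 0 or delay_cycles < 0:
        raise ValueError("ii must be positive and delay_cycles non-negative")
    # fstg = j / ii (whole stages), fcyc = j % ii (leftover cycles)
    return divmod(delay_cycles, ii)


def return_postpone_cycles(cycles_left: int, branch_delay_slots: int = 5) -> int:
    """Cycles to postpone a return so that control transfers as the loop finishes.

    cycles_left: cycles remaining until the last instruction of the last
    iteration, counted from the nominal ii*n-th cycle (assumed known).
    """
    # Subtract the delay slots and one cycle for the return instruction itself.
    return cycles_left - branch_delay_slots - 1


j = return_postpone_cycles(cycles_left=11)   # 11 - 5 - 1
fstg, fcyc = spkernel_args(j, ii=2)
print(f"postpone by {j} cycles -> SPKERNEL {fstg},{fcyc}")
```

For the original SPLOOP 13 example nothing has to be postponed, so the same helper yields 0,0, which matches the plain `spkernel` in the question.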

  • Andy Polyakov said:
    You have to recognize that the execute packet following the one with SPKERNEL is scheduled for execution at the ii*n-th cycle, where ii is the argument to the SPLOOP instruction and n is the initial ILC value.

    Also for completeness' sake, one should probably add that there is a condition under which "scheduled for the ii*n-th cycle" does not hold. This is the case when n is so low that the prologue phase is over before the SPKERNEL instruction is reached, or in other words when ii*n is smaller than the loop body length. In that case the execute packet following the one with SPKERNEL is delayed by fcyc [as far as I understand].
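
The condition described in this last reply can be expressed as a simple comparison. The sketch below only restates the reply's arithmetic and does not model the SPLOOP hardware: the function names are hypothetical, and the body length of 13 cycles used for the ii = 2 case is an assumed value chosen for illustration.

```python
def nominal_issue_cycle(ii: int, n: int) -> int:
    """Cycle at which the packet after SPKERNEL is nominally scheduled (ii * n)."""
    return ii * n


def short_trip_count(ii: int, n: int, body_len: int) -> bool:
    """True when ii*n is smaller than the loop body length.

    In that case, per the reply above, the "ii*n-th cycle" rule is said not
    to hold for the packet following SPKERNEL.
    """
    return nominal_issue_cycle(ii, n) < body_len


# Original example: ii = 13 and a 13-cycle body, so any n >= 1 is fine.
print(short_trip_count(ii=13, n=1, body_len=13))
# Hypothetical ii = 2 rearrangement with an assumed 13-cycle body and n = 3.
print(short_trip_count(ii=2, n=3, body_len=13))
```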