software pipelining and dynamic loop count

sam12757

I have a piece of code that compute loop bounds dynamically at runtime. For example:

for(i=0; i<100; i++) {
for(j=0; j< function1(i); j++) A();
for(j=0; j< function2(i); j++) B();
}

So the loop counts of A() and B() are unknown at compile time, but computed at runtime. Does this automatically disqualify the loop from software pipelining?

How about this code, where the loop bounds are pre-computed and stored in a 2D array M:

for(i=0; i<3; i++) {
for(j=0; j<M[i][0]; j++) A();
for(j=0; j<M[i][1]; j++) B();
}

Can TI compiler software pipeline this code?

over 16 years ago

0 sam12757 over 16 years ago

Prodigy 30 points

I just realized any function call inside a loop disqualifies the loop from software pipelining. So I can change the functions in my first question to macros (inlined functions) to avoid this issue. My point is to ask if software pipelining is feasible if loop bounds are computed dynamically.

0 Bernie Thompson TI over 16 years ago in reply to sam12757

TI__Mastermind 41665 points

sam said:
My point is to ask if software pipelining is feasible if loop bounds are computed dynamically.

Yes, though this is through the MUST_ITERATE pragma which you use to define the minimum trip count (number of loops) that will happen so that the compiler can effectively implement software pipelining, if you do not have this pragma than the compiler must assume you can have a loop count of 0 and thus you do not get proper software pipelining. The MUST_ITERATE pragma is discussed further in section 5.8.6 of SPNU151d assuming you are developing for the ARM9 on one of the DM3xx processors, though this applies similarly to our DSP devices.

0 sam12757 over 16 years ago in reply to Bernie Thompson TI

Prodigy 30 points

What about my second approach, where the loop bounds are pre-computed and stored in a 2D array? The loop code does a runtime look-up in the array. Does this make any difference at all compared to dynamically computed loop bounds?

Does the compiler do software pipelining based on the lower bound of MUST_ITERATE pragma? If the lower bound is 1, then the compiler treats the loop as iterating once only regardless of the upper bound, and only software-pipelines the 1 iteration, leaving all the other possible iterations as epilogue? If this is true, then software pipelining is ineffective for dynamic loop bounds with a wide range at runtime.

0 Bernie Thompson TI over 16 years ago in reply to sam12757

TI__Mastermind 41665 points

sam said:
What about my second approach, where the loop bounds are pre-computed and stored in a 2D array? The loop code does a runtime look-up in the array. Does this make any difference at all compared to dynamically computed loop bounds?

This is a good question, I believe any loop that is not explicitly given a single number of loops would require the MUST_ITERATE pragma to properly pipeline, in other words I suspect if the loop count is looked up in an array that is the equivalent of calculating it to the compiler, as this would require the compiler to look through the array and essentially calculate the same values that would be in the MUST_ITERATE pragma anyway, and since the MUST_ITERATE pragma exists it would seem redundant to put this burdensome optimization code into the compiler. Of course this is just an educated guess, as I am not intimately familiar with the internals of the optimizer (perhaps one of our compiler experts can comment on this), however you could try using a 2d array of loop values to see if it has an impact as a test, this may still be faster depending on how complex the loop count calculation was going to be anyway.

sam said:
Does the compiler do software pipelining based on the lower bound of MUST_ITERATE pragma? If the lower bound is 1, then the compiler treats the loop as iterating once only regardless of the upper bound, and only software-pipelines the 1 iteration, leaving all the other possible iterations as epilogue? If this is true, then software pipelining is ineffective for dynamic loop bounds with a wide range at runtime.

That sounds about right, if your bounds are so wide than it would be impractical to put the loop into a software pipeline, though having a lower bound of 1 is still better than a lower bound of 0. The more restricted you can be with the MUST_ITERATE pragma the more efficient the compiler will be in implementing software pipelining. The alternative would be to make more than one loop function, where you can guarantee calls into one of them are more restricted and can generate more pipelined code whereas the other could be more lose for cases like the loop of one, of course this is all fairly application specific so it may or may not be practical in your situation.

Processors

Processors forum

software pipelining and dynamic loop count