How to get huge loop to be pipelined?

Clemens Eisserer

Hello,

I am currently optimizing a very time-consuming code fragment that has 3 nested loops. The outer two iterate over y and x, while the innermost loop executes its body exactly 4 times.

While the optimizer does a good job on the innermost loop, the loop prolog/epilog is quite long and, because the loop runs only 4 times, contributes significantly to execution time. Furthermore, the "preparation" code, which is executed for every pixel, isn't pipelined at all.

I tried manually unrolling the innermost loop, but then I get the following error: [E0800] Specified label is too far away; max range is [-2048,2047]
Furthermore the loop isn't pipelined at all.

So, my question has 3 parts:

- Is there any switch to instruct the optimizer to try to software-pipeline larger loops?

- Should I use a different branch instruction?

- Is this advisable at all? As far as I can see, the code should still fit into L1P.

Thank you in advance, Clemens

for (y = 0; y < height; y++) {
    for (x = 0; x < width; x++) {
        // Some preparation code
        for (i = 0; i < 4; i++) {
            // Innermost loop code (~100 instructions executed in 16 cycles)
        }
    }
}

over 13 years ago

George Mock over 13 years ago

TI__Guru**** 249900 points

Clemens Eisserer said:
I get the following error: [E0800] Specified label is too far away; max range is [-2048,2047]

This error is from the assembler. Are you programming in C and you got this error?

Thanks and regards,

-George

Clemens Eisserer over 13 years ago in reply to George Mock

Expert 2430 points

Hi George,

Thanks for your reply. No, the loop is written in linear assembly - so I guess BDEC is just the wrong branch instruction for larger offsets?

Any idea how I could get the assembly optimizer to pipeline loops containing a lot of instructions (~500)?

Thank you in advance, Clemens

Alberto Chessa over 13 years ago in reply to Clemens Eisserer

Mastermind 6670 points

Hi,

The software pipeline is based on a pipeline buffer that has a limited capacity (14 execution packets), so 500 instructions are too many to be pipelined.
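To put rough numbers on that limit, here is a minimal C sketch of a resource-only estimate. It assumes the 14-packet capacity quoted above and the 8 functional units of a C6000 core; the function names are hypothetical and not part of any TI tool. The bound is only a necessary condition, since data dependencies and register pressure can push the real cycle count higher.

```c
#include <assert.h>

/* A C6000 core has 8 functional units, so at most 8 instructions
 * can issue per cycle (one execute packet). */
#define FUNCTIONAL_UNITS 8

/* Buffer capacity in execute packets, taken from the reply above. */
#define LOOP_BUFFER_PACKETS 14

/* Resource-only lower bound on cycles per iteration: ceil(n / units).
 * Hypothetical helper for illustration. */
int min_cycles_per_iteration(int instructions, int units)
{
    assert(instructions >= 0 && units > 0);
    return (instructions + units - 1) / units;
}

/* Returns 1 if the resource bound alone does not rule out the buffer,
 * 0 if the loop body is certainly too large. */
int may_fit_loop_buffer(int instructions)
{
    return min_cycles_per_iteration(instructions, FUNCTIONAL_UNITS)
           <= LOOP_BUFFER_PACKETS;
}
```

For the 432-instruction unrolled body mentioned later in this thread, the bound is 54 cycles, far above 14; for a single ~100-instruction pass it is 13 cycles, which is close to the limit even before dependencies are considered.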

Clemens Eisserer over 13 years ago in reply to Alberto Chessa

Expert 2430 points

And what about the "manual" pipelining the compiler does when no loop buffer is available (like on the C64)?

However, it seems the optimizer can't cope with that many instructions anyway: just unrolling the loop once caused the optimizer to bail out ("did not find schedule") :/

Archaeologist over 13 years ago in reply to Clemens Eisserer

TI__Guru* 84285 points

Try unrolling the innermost 4-iteration loop completely. Unrolling a loop partially will probably not help the compiler software pipeline it.
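As a rough C illustration of complete unrolling (the thread's real code is linear assembly and is not shown), the sketch below uses hypothetical `prepare` and `step` functions as stand-ins for the preparation code and one pass of the 4-iteration loop. After unrolling, the x loop is the innermost loop, so it is the one the compiler would try to software pipeline.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-pixel preparation code. */
uint32_t prepare(uint32_t pixel) { return pixel * 2654435761u; }

/* Hypothetical stand-in for one pass of the 4-iteration loop. */
uint32_t step(uint32_t state, int i) { return (state >> (8 * i)) & 0xFFu; }

/* Reference version: the 4-iteration loop is the innermost loop. */
uint32_t process_rolled(const uint32_t *img, int width, int height)
{
    uint32_t sum = 0;
    assert(width >= 0 && height >= 0);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            uint32_t s = prepare(img[y * width + x]);
            for (int i = 0; i < 4; i++)
                sum += step(s, i);
        }
    }
    return sum;
}

/* Completely unrolled: no 4-iteration loop remains, so its prolog and
 * epilog disappear and the x loop becomes the innermost loop. */
uint32_t process_unrolled(const uint32_t *img, int width, int height)
{
    uint32_t sum = 0;
    assert(width >= 0 && height >= 0);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            uint32_t s = prepare(img[y * width + x]);
            sum += step(s, 0);
            sum += step(s, 1);
            sum += step(s, 2);
            sum += step(s, 3);
        }
    }
    return sum;
}
```

The trade-off is body size: four copies of a ~100-instruction body give a very large loop, which is exactly what runs into the instruction limit discussed further down.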

Can you interchange the loops so that the 4-iteration loop is no longer the innermost loop?
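One way the interchange could look in C is sketched below. It assumes the four passes for a pixel are independent of each other and that the preparation result can be kept in a per-row buffer; `prepare`, `step`, and `MAX_WIDTH` are hypothetical names, not taken from the original code.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_WIDTH 1024  /* hypothetical upper bound on the row length */

/* Hypothetical stand-ins for the preparation code and one loop pass. */
uint32_t prepare(uint32_t pixel) { return pixel ^ (pixel >> 3); }
uint32_t step(uint32_t state, int i) { return (state + (uint32_t)i) * 3u; }

/* Original nest: the 4-iteration loop is innermost. */
uint32_t sum_inner4(const uint32_t *img, int width, int height)
{
    uint32_t sum = 0;
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) {
            uint32_t s = prepare(img[y * width + x]);
            for (int i = 0; i < 4; i++)
                sum += step(s, i);
        }
    return sum;
}

/* Interchanged nest: both inner loops now run over x, so each has a
 * long trip count and a small body. */
uint32_t sum_interchanged(const uint32_t *img, int width, int height)
{
    uint32_t prepared[MAX_WIDTH];
    uint32_t sum = 0;
    assert(width >= 0 && width <= MAX_WIDTH);
    for (int y = 0; y < height; y++) {
        const uint32_t *row = img + y * width;
        for (int x = 0; x < width; x++)      /* preparation pass */
            prepared[x] = prepare(row[x]);
        for (int i = 0; i < 4; i++)          /* 4-iteration loop moved out */
            for (int x = 0; x < width; x++)
                sum += step(prepared[x], i);
    }
    return sum;
}
```

With this arrangement the pipeline prolog and epilog are paid once per row and pass instead of once per pixel, at the cost of the extra row buffer and an additional sweep over the row.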

Try using the --debug_software_pipeline (-mw) option to get more details about the compiler's attempts to software pipeline.

Clemens Eisserer over 13 years ago in reply to Archaeologist

Expert 2430 points

Hi Archaeologist,

Thanks for your feedback. With the innermost loop unrolled, the optimizer bails out with:

;* SOFTWARE PIPELINE INFORMATION
;* Disqualified loop: Too many instructions (limit = 250)

The completely unrolled loop has 432 instructions. Is there any way I can set the limit higher?

Thanks a lot, Clemens

Clemens Eisserer over 13 years ago in reply to Clemens Eisserer

Expert 2430 points

I found the -mpn switch, so I compiled with "-mpn500", which (in combination with -O3 or -O2) resulted in:

Renamed pair with base %s above window: DCMPGTU4 .S2X VRB2700:VRB2617,VRA2410:VRA2409,VRB2564 ; |554|
>> ../Census16_C66Turbo.sa, line 111: INTERNAL ERROR: Corrupted IR detected
during check_mve/spilling

This may be a serious problem. Please contact customer support with a
description of this problem and a sample of the source files that caused this
INTERNAL ERROR message to appear.

Cannot continue compilation - ABORTING!

Archaeologist over 13 years ago in reply to Clemens Eisserer

TI__Guru* 84285 points

Well, that's a bug. Could you send me the test case through private conversation? I can't guarantee that fixing the bug will make the compiler able to software pipeline this loop, but we should at least try to fix the bug.

Archaeologist over 13 years ago in reply to Archaeologist

TI__Guru* 84285 points

This is now SDSCM00044316. This bug appears to have been introduced in release 7.2.x.
