Hi,
I have two nested loops, and software-pipelining the inner loop fails at ii = 15 because of high register pressure (I am using linear assembly):
;* Searching for software pipeline schedule at ...
;* ii = 15 Register pressure too high: 38
;* ii = 15 Did not find schedule
;* ii = 16 Schedule found with 3 iterations in parallel
A few "cold" registers are only accessed in the outer loop, so I thought about keeping those values in temporary memory (e.g. on the stack) to free up more registers for scheduling the inner loop. This is what I do now:
OUTERLOOP:
LDW *regStack[0], coldReg1
LDW *regStack[1], coldReg2
LDW *regStack[2], coldReg3
LDW *regStack[3], coldReg4
; Do a bit with the values in the cold registers
INNERLOOP:
; Some very hot code
BDEC INNERLOOP, iLoopCnt
BDEC OUTERLOOP, oLoopCnt
MV imgWidth, xLoopCnt
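For anyone who doesn't work in C6000 linear assembly, the loop nest above corresponds roughly to the C sketch below. It is only an illustration under assumptions: the function name `run`, the `img`/`rows` parameters and the arithmetic in the "cold" and "hot" parts are hypothetical placeholders. Only the structure mirrors my code: four values are reloaded from temporary memory at the top of each outer iteration and are not referenced inside the inner loop.

```c
#include <assert.h>

/* Hypothetical C rendering of the linear-assembly loop nest above.
 * regStack stands in for the temporary memory holding the four "cold"
 * values. The arithmetic is a placeholder; only the structure matters. */
int run(const int regStack[4], const int *img, int imgWidth, int rows)
{
    assert(imgWidth >= 0 && rows >= 0);
    int acc = 0;

    for (int y = 0; y < rows; y++) {           /* OUTERLOOP */
        /* LDW *regStack[0..3]: reload the cold values every iteration */
        int coldReg1 = regStack[0];
        int coldReg2 = regStack[1];
        int coldReg3 = regStack[2];
        int coldReg4 = regStack[3];

        /* "Do a bit with the values in the cold registers";
         * after this line the cold values are no longer needed */
        int rowBias = coldReg1 + coldReg2 * coldReg3 - coldReg4;

        /* INNERLOOP: the hot code never touches the cold values */
        int rowSum = 0;
        for (int x = 0; x < imgWidth; x++)
            rowSum += img[y * imgWidth + x];

        acc += rowSum + rowBias;
    }
    return acc;
}
```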
However, that doesn't seem to help: the "Regs Live Always" count doesn't change, and the generated code is identical. It looks as if the optimizer doesn't recognize that those registers can be reused inside the inner loop, even though their old values are reloaded at the start of the next outer iteration anyway. I also tried putting those LDWs right after the inner loop's branch, but that didn't help either.
Is there any way to make more registers available by moving cold values to temporary storage?
Thx