Hello,
I've written a hand-optimized linear assembly function containing a loop, and the assembly optimizer does a fabulous job optimizing it, generating a highly optimized loop kernel.
However, the software pipelining feedback reports several problems that seem to be caused by register pressure and by registers that stay live too long:
;* Searching for software pipeline schedule at ...
;* ii = 10 Register is live too long
;* ii = 10 Cannot allocate machine registers
;* Regs Live Always : 3/3 (A/B-side)
;* Max Regs Live : 27/27
;* Max Cond Regs Live : 0/0
;* ii = 10 Schedule found with 3 iterations in parallel
What I don't understand is why/how the optimizer can find a pipelined schedule at ii = 10, although it reports those problems for that same ii = 10. Was the optimizer able to optimize those problems "away"?
Thank you in advance, Clemens