I was looking through the assembly generated by the CLA C compiler to get some insight into optimizing my C code, and noticed that it never uses the MAR1 register. Only MAR0 is used.
Since a MAR register cannot be used until the 4th instruction following its load, it seems like it would be beneficial to use both MAR registers to avoid a lot of MNOP stalls. The compiler doesn't seem to want to do this, however, and generates code like the listing below, with lots of MNOP stalls while it waits for MAR0 to load.
Is there a compiler optimization flag I'm missing? Am I misunderstanding the usefulness of MAR1 or the pipeline operation? Do I need to just write my code directly in assembly if the compiler is this dumb?
    MMOV16    MAR0,@task1+4      ; load MAR0 with new address
    MI32TOF32 MR0,@iCmd          ; non-conflicting instruction
    MADDF32   MR0,MR0,MR1        ; non-conflicting instruction
    MF32TOI32 MR0,MR0            ; non-conflicting instruction
    MMOV16    *MAR0+[#10],MR0    ; Use MAR0 on 4th instruction after load
    MMOV16    MAR0,@task1+5      ; Load MAR0 with new address
    MNOP                         ; stall
    MNOP                         ; stall
    MNOP                         ; stall
    MMOV16    *MAR0+[#10],MR0    ; MAR0 available for use on 4th instruction
    MNOP
    MNOP
    MNOP
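For comparison, this is roughly what I would expect if both MAR registers were used. It is a hand-written, untested sketch, not compiler output. It reuses the operands and addressing syntax from the listing above, and it assumes that the two pointer loads can safely be hoisted to the top (that is, the first store does not modify the word at task1+5) and that MAR1 has the same load-to-use delay as MAR0.

```
    MMOV16    MAR0,@task1+4      ; load MAR0 with first address
    MMOV16    MAR1,@task1+5      ; load MAR1 with second address (assumed safe to hoist)
    MI32TOF32 MR0,@iCmd          ; non-conflicting instruction
    MADDF32   MR0,MR0,MR1        ; non-conflicting instruction
    MF32TOI32 MR0,MR0            ; non-conflicting instruction
    MMOV16    *MAR0+[#10],MR0    ; MAR0 used on 5th instruction after its load
    MMOV16    *MAR1+[#10],MR0    ; MAR1 used on 5th instruction after its load
```

If those assumptions hold, the three conversion/add instructions cover the load delay for both registers, and the three MNOP stalls between the two stores go away.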