I have this 28335 assembly code generated from some C++ code:
        MOVW DP,#0x0394
        MOVL XAR4,@58
    L0  MOVL ACC,*+XAR4[0]
    L1  NOP
    L2  NOP
    L3  NOP
    L4  NOP
    L5  NOP
    L6  NOP
    L7  NOP
    L8  NOP
    L9  MOVL XAR6,*+XAR4[0]
        MOV AH,@AR6
        SUB AL,@AH
What it's doing is setting up XAR4 to point to the 32-bit TIMER0 counter, which is set to free-run at the system clock rate (150 MHz).
It samples TIMER0 once, storing its value in ACC, executes 8 NOP instructions, and then samples TIMER0 again, storing its
value in XAR6, and subtracts the two.
When this code executes from RAM, the result I get is 10 rather than the expected 9, which I don't understand. NOP takes 1 cycle, and MOVL
takes 1 cycle, so the time between lines L0 and L9 above should be 9 cycles.
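
To make my expectation explicit, here is a minimal sketch of the arithmetic I'm assuming. It is only a hypothetical model, not a description of the real C28x pipeline: the names `Instr`, `sequence_l0_to_l8` and `expected_delta` are made up for illustration, and it assumes every instruction costs exactly the listed number of cycles with no stalls of any kind.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: each instruction has a fixed cost and nothing stalls.
struct Instr {
    const char*   mnemonic;
    std::uint32_t cycles;  // assumed cost: 1 for MOVL, 1 for NOP
};

// Instructions from the first timer read (L0) up to, but not including,
// the second timer read (L9).
std::vector<Instr> sequence_l0_to_l8() {
    std::vector<Instr> seq;
    seq.push_back({"MOVL ACC,*+XAR4[0]", 1});   // L0
    for (int n = 0; n < 8; ++n) {
        seq.push_back({"NOP", 1});              // L1..L8
    }
    assert(seq.size() == 9);  // one MOVL plus eight NOPs
    return seq;
}

// Expected difference between the two timer samples under this model:
// the sum of the assumed costs of L0 through L8.
std::uint32_t expected_delta(const std::vector<Instr>& seq) {
    std::uint32_t total = 0;
    for (const Instr& i : seq) {
        total += i.cycles;
    }
    return total;
}
```

Under these assumptions `expected_delta(sequence_l0_to_l8())` evaluates to 9, one less than the 10 I actually measure.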
Where does the extra CPU cycle go?