Accounting for CpuTimer0Regs measurement: phantom instruction?

I have this 28335 assembly code generated from some C++ code:

   MOVW       DP,#0x0394
   MOVL       XAR4,@58
L0 MOVL       ACC,*+XAR4[0]
L1 NOP
L2 NOP
L3 NOP
L4 NOP
L5 NOP
L6 NOP
L7 NOP
L8 NOP
L9 MOVL       XAR6,*+XAR4[0]
   MOV        AH,@AR6
   SUB        AL,@AH

What it's doing is setting up XAR4 to point to the 32-bit TIMER0 counter, which is set to free-run at the system clock rate (150 MHz).
It samples TIMER0 once, storing its value in ACC, executes 8 NOP instructions, then samples TIMER0 again, storing its
value in XAR6, and subtracts the low 16 bits of the two samples (AL minus the low word of XAR6).
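
For reference, the subtraction at the end of the listing can be sketched in C++ roughly as follows. This is only an illustration, assuming that TIMER0 counts down (so the first sample is the larger one) and that fewer than 65536 ticks elapse between the two samples. The function names are hypothetical and do not come from the original C++ source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: elapsed ticks between two samples of a
// DOWN-counting timer, using only the low 16 bits of each sample.
// This mirrors "SUB AL,@AH" in the listing (first sample minus second).
// Unsigned wraparound keeps the result correct even if the low word
// rolls over between the two reads.
constexpr std::uint16_t elapsed_ticks16(std::uint32_t first, std::uint32_t second)
{
    return static_cast<std::uint16_t>(
        static_cast<std::uint16_t>(first) - static_cast<std::uint16_t>(second));
}

// Hypothetical helper: the same calculation on the full 32-bit counter value.
constexpr std::uint32_t elapsed_ticks32(std::uint32_t first, std::uint32_t second)
{
    return first - second;  // modulo 2^32, so a counter reload/wrap is tolerated
}

// Example values only (not measured data): the counter drops by 10 ticks.
inline void check_elapsed_examples()
{
    assert(elapsed_ticks16(100u, 90u) == 10u);
    // Low word wraps from 0x0005 down through 0x0000 to 0xFFFB.
    assert(elapsed_ticks16(0x00010005u, 0x0000FFFBu) == 10u);
    assert(elapsed_ticks32(0x00010005u, 0x0000FFFBu) == 10u);
}
```

With this arithmetic, a result of 10 means the counter dropped by 10 ticks between the two reads, whatever the absolute counter values were.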

When this code executes from RAM, the result I get is 10 rather than the expected 9, which I don't understand. NOP takes 1 cycle, and MOVL
takes 1 cycle, so the time between lines L0 and L9 above should be 9 cycles.

Where does the extra CPU cycle go?