Cycle timing for jump on CPUXV2 chips

Peter Bigot

Other Parts Discussed in Thread: MSP430G2231, MSP430F2619, CC430F5137, MSP430F5438A, MSP430F5529

I'm validating instruction timings prior to implementing __delay_cycles in mspgcc. I understand the timings for a given instruction may be different among the base MSP430 CPU, CPUX (2xx/4xx), and CPUXV2 (5xx/6xx). The anomaly I'm seeing doesn't seem to be documented.

My test program maps SMCLK and another pin to PCB test pins. I run the following loop (adjusted for MCU-specific port/pin) with a logic analyzer attached.

    8072:       f2 e0 40 00     xor.b   #64,    &0x0203 ;#0x0040
    8076:       03 02
    8078:       f2 e0 40 00     xor.b   #64,    &0x0203 ;#0x0040
    807c:       03 02
    807e:       f9 3f           jmp     $-12            ;abs 0x8072

The documentation for all of these chips says that the XOR instruction should take 5 cycles and that all jumps take 2 cycles, whether taken or not (the differences between CPU, CPUX, and CPUXV2 timing do not apply to these instructions).

So, comparing the output pin with SMCLK, I should see one pin level last 5 cycles and the other last 7 (5 plus the jump).

What I see is:

msp430g2231 [CPU]: 5 + 7
msp430f2619 [CPUX]: 5 + 7
cc430f5137 [CPUXV2]: 5 + 8
msp430f5438a [CPUXV2]: 5 + 8

This implies that a jump on CPUXV2 really takes three cycles. I confirmed this by inserting "jmp $2" between the xor instructions, and measured 7+7 for CPU and CPUX, and 8+8 for CPUXV2.
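The arithmetic behind this inference can be written as a small C sketch. The constants are the nominal figures quoted above; the function names are hypothetical and only restate the subtraction used in the post.

```c
#include <assert.h>

/* Nominal cycle counts from the user's guides, as quoted in the post. */
enum { XOR_ABS_CYCLES = 5, JMP_CYCLES = 2 };

/* The loop toggles the pin twice per pass. One level lasts only for the
   second XOR; the other lasts for the JMP plus the first XOR. */
int expected_short_level(void) { return XOR_ABS_CYCLES; }
int expected_long_level(void)  { return XOR_ABS_CYCLES + JMP_CYCLES; }

/* Effective JMP cost implied by a measured level that contains exactly
   one XOR and one JMP (in the "jmp $2" variant, both levels do). */
int implied_jmp_cycles(int measured_level)
{
    assert(measured_level > XOR_ABS_CYCLES);
    return measured_level - XOR_ABS_CYCLES;
}
```

A measured 7 gives 2 cycles for the jump (CPU, CPUX), while a measured 8 gives 3 (CPUXV2).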

The 2xx guide has separate instruction timing sections for 430 and 430X, both of which state jumps take 2 cycles, which matches my measurements. The 4xx and 5xx/6xx guides also have separate timing sections for 430 and 430X, but only the 430 one mentions jumps (5xx/6xx section 5.5.1.5) and it also says jumps take two cycles.

Am I doing something wrong, did I miss something in the documentation, or is the documentation in the 5xx/6xx users guide incomplete/incorrect?

Peter

over 14 years ago

Jeff Tenney over 14 years ago

Peter,

I ran some experiments to try to figure out where you went wrong.

But all I did was prove you right. No matter what I do, the JMP instruction seems to take 3 cycles on my MSP430F5529. I have several more experiments I want to run, and I will definitely post again if I learn anything interesting. I will try other arrangements of instructions, conditional jumps, and other CPUXV2 MCUs. I will also look at the implementation of IAR's __delay_cycles( ) for these MCUs.

Very nice find, sir.

Jeff

Jeff Tenney over 14 years ago in reply to Jeff Tenney

Peter,

The following two loops have the same number of cycles, even though one loop has a NOP. (That's the only difference.)

TEMP    XOR.B   #BIT6, &P1OUT   ; 5 cycles
        JMP     TEMP            ; 3 cycles !!!

and

TEMP    XOR.B   #BIT6, &P1OUT   ; 5 cycles
        NOP                     ; 1 cycle
        JMP     TEMP            ; 2 cycles

The following code, however, acts differently with and without the NOP:

TEMP    MOV.B   #BIT6, &P1OUT   ; 4 cycles
        MOV.B   #~BIT6, &P1OUT  ; 4 cycles
        JMP     TEMP            ; 2 cycles

and

TEMP    MOV.B   #BIT6, &P1OUT   ; 4 cycles
        MOV.B   #~BIT6, &P1OUT  ; 4 cycles
        NOP                     ; 1 cycle
        JMP     TEMP            ; 2 cycles

So it appears to be related to the RMW instructions (and perhaps other factors like the addressing mode). The JMP instruction seems to stall waiting for RMWs to quit using the address bus. However, the NOP instruction doesn't wait. Neither does the MOV instruction in this example:

TEMP:   BIC.B   #BIT6, &P1OUT   ; 5 cycles (P1.6 stays high for 8 cycles, not 7 (2+5)!)
        MOV.B   #BIT6, &P1OUT   ; 4 cycles (P1.6 stays low for only 3 cycles, not 4!)
        BIC.B   #BIT6, &P1OUT   ; 5 cycles (P1.6 stays high for 6 cycles, not 5!)
        MOV.B   #BIT6, &P1OUT   ; 4 cycles (P1.6 stays low for only 3 cycles, not 4!)
        JMP     TEMP            ; 2 cycles (P1.6 is high for these 2 cycles)

In fact this example gives some insight into CPUXV2 pipelining as noted in the comments inside parentheses. P1.6 doesn't match up exactly with the cycle counts allotted to each instruction. This is because the CPUXV2 appears to let RMW instructions bleed over into the first cycle of execution of the next instruction whenever possible. Maybe it's not possible during a JMP since the CPU has to perform an instruction fetch during that cycle.
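The parenthesized pin durations can be reproduced with a simple model, sketched below in C. It rests on an assumption drawn from the paragraph above, not on TI documentation: a plain MOV write becomes visible on the pin at the end of its instruction, while an RMW write (BIC, XOR) becomes visible one cycle later. The `struct step` type and `phase_lengths()` are hypothetical names, and the model ignores the jump stall, which does not arise in this loop because the JMP follows a MOV.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one instruction in the loop body. */
struct step {
    int  cycles; /* cycles allotted to the instruction */
    bool writes; /* does it change P1.6? */
    bool rmw;    /* true for BIC/XOR-style read-modify-write */
};

/* out[i] receives the length, in cycles, of the pin level that begins at
   the i-th write. Assumed model: a plain write shows up at the end of its
   instruction, an RMW write one cycle later. The body is taken to repeat
   with a period equal to the sum of its cycle counts (no jump stall). */
size_t phase_lengths(const struct step *s, size_t n, int out[], size_t max_out)
{
    int edge[16];
    size_t k = 0;
    int t = 0;

    for (size_t i = 0; i < n; i++) {
        t += s[i].cycles;
        if (s[i].writes) {
            assert(k < 16 && k < max_out);
            edge[k++] = t + (s[i].rmw ? 1 : 0);
        }
    }
    for (size_t i = 0; i < k; i++) {
        /* The last level lasts until the first write of the next pass. */
        int next = (i + 1 < k) ? edge[i + 1] : edge[0] + t;
        out[i] = next - edge[i];
    }
    return k;
}
```

For the BIC/MOV/BIC/MOV/JMP loop this yields levels of 3, 6, 3, and 8 cycles, matching the comments and summing to the 20 cycles allotted to the loop.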

Anyway it appears that JMPs take 3 cycles when they follow an RMW instruction that modifies anything but a register.
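That rule can be turned into a loop-period estimator. This is a sketch of the empirical observation in this thread, not a documented timing model: `struct insn`, `loop_cycles()`, and the instruction constants are hypothetical, and the cycle counts are the ones assigned in the comments above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one instruction in a loop body. */
struct insn {
    int  cycles;     /* nominal cycle count */
    bool rmw_memory; /* read-modify-write with a non-register destination */
    bool is_jump;    /* JMP or conditional jump */
};

/* Cycles per pass through a loop body. When cpuxv2 is true, a jump that
   directly follows a memory-destination RMW is charged one extra cycle,
   per the observation in this thread. The body is circular: the
   instruction before body[0] is body[n - 1]. */
int loop_cycles(const struct insn *body, size_t n, bool cpuxv2)
{
    int total = 0;

    assert(body != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        const struct insn *prev = &body[(i + n - 1) % n];
        total += body[i].cycles;
        if (cpuxv2 && body[i].is_jump && prev->rmw_memory)
            total += 1; /* observed stall */
    }
    return total;
}

/* Instructions from the loops above, with the cycle counts used there. */
static const struct insn XOR_ABS  = { 5, true,  false }; /* XOR.B #BIT6,&P1OUT */
static const struct insn MOV_ABS  = { 4, false, false }; /* MOV.B #x,&P1OUT */
static const struct insn NOP_INSN = { 1, false, false };
static const struct insn JMP_INSN = { 2, false, true  };
```

With these values, XOR+JMP and XOR+NOP+JMP both come out at 8 cycles on CPUXV2, the two MOV loops at 10 and 11, and the original XOR/XOR/JMP loop at 13 (5 + 8) versus 12 (5 + 7) on CPU and CPUX.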

Jeff

Peter Bigot over 14 years ago in reply to Jeff Tenney

Thanks for confirming and clarifying. Trying to infer the pipeline architecture from probing like this is a great exercise for students, but I'd rather spend my time on something more practical. (Seems odd that a jump stalls for a read-modify-write but not for a plain write.)

So the answer to the ultimate question seems to be "documentation is incomplete": jumps take 2 cycles, but can be stalled under certain circumstances. In this case, I don't need to know what "certain circumstances" means: I've verified that the specific __delay_cycles implementation I'm using (nop* mov (dec jnz)+) has the same total cycle count among CPU, CPUX, CPUXV2, removing a special case that I didn't like anyway.
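As an illustration of how a `nop* mov (dec jnz)+` sequence can hit an exact count, here is a hypothetical planner in C. The actual mspgcc code is not shown in this thread. The cycle costs are assumptions to check against each user's guide: NOP 1, DEC Rn 1, JNZ 2 (taken or not), and MOV #N,Rn 2, or 1 when N is a constant-generator value (#1, #2, #4, #8). Because DEC operates on a register, the stall rule above should not add a cycle to the JNZ.

```c
#include <assert.h>

/* Hypothetical plan for a delay of the form
       nop            (repeated nops times)
       mov #iterations, Rn
   1:  dec Rn
       jnz 1b
   All cycle costs below are assumptions, not taken from mspgcc. */
struct delay_plan {
    unsigned nops;
    unsigned iterations; /* 0 means: no loop, NOPs only */
};

/* Assumed: #1, #2, #4, #8 come from the constant generator (1 cycle);
   any other immediate needs an extension word (2 cycles). */
static unsigned mov_imm_cycles(unsigned n)
{
    return (n == 1 || n == 2 || n == 4 || n == 8) ? 1u : 2u;
}

struct delay_plan plan_delay(unsigned long cycles)
{
    struct delay_plan p = { 0, 0 };

    if (cycles < 5) {          /* shorter than MOV plus one DEC/JNZ pass */
        p.nops = (unsigned)cycles;
        return p;
    }
    unsigned long n = (cycles - 2) / 3;
    assert(n <= 0xFFFFu);      /* counter must fit in a 16-bit register */
    p.iterations = (unsigned)n;
    p.nops = (unsigned)(cycles - mov_imm_cycles(p.iterations) - 3 * n);
    return p;
}

/* Total cycles a plan consumes under the same assumed costs. */
unsigned long plan_cycles(struct delay_plan p)
{
    if (p.iterations == 0)
        return p.nops;
    return p.nops + mov_imm_cycles(p.iterations) + 3ul * p.iterations;
}
```

Choosing iterations = (cycles - 2) / 3 leaves at most three cycles for leading NOPs, and a 16-bit counter limits a single loop of this form to roughly 196,000 cycles.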

I appreciate the help.

Peter

Jeff Tenney over 14 years ago in reply to Peter Bigot

Peter Bigot said:

Trying to infer the pipeline architecture from probing like this is a great exercise for students, but I'd rather spend my time on something more practical.

I came to the exact same conclusion about 10 minutes into this little experiment. TI should explain this or a graduate student should figure it out.

Jeff

Jens-Michael Gross over 14 years ago in reply to Jeff Tenney

I think the observed delay has nothing to do with the code itself but with the flash controller. Remember the flash controller problem on the 54xxA (and other 5x) devices. The explanation included the discovery that the flash controller fetches 4 bytes into a read latch (which was faulty).

So if you have a plain write, the instruction fetch can be served from the latch while the write is executed. If it is a read-modify-write, the read may invalidate the latch (even if the data is not read from flash at all). So the difference with the NOP noted above, and the difference between a plain write and an RMW instruction, may depend on where the instruction sits relative to a 32-bit boundary: in the first or the second word.

Just a guess (as usual) :)
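Testing this guess would start with knowing where each instruction falls relative to the 4-byte latch. The helpers below are a hypothetical sketch that assumes a 4-byte-aligned, two-word latch as described above; they say nothing about how the latch actually behaves.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the flash read latch holds one 4-byte-aligned block. */

/* 0 if the word at addr is the first word of its latch block, 1 if second. */
unsigned latch_word(uint32_t addr)
{
    assert((addr & 1u) == 0);  /* MSP430 instructions are word aligned */
    return (addr >> 1) & 1u;
}

/* Number of distinct latch blocks touched by an instruction of len bytes. */
unsigned latch_blocks(uint32_t addr, unsigned len)
{
    assert(len >= 2 && len % 2 == 0);
    return (unsigned)(((addr + len - 1) >> 2) - (addr >> 2) + 1);
}
```

For the listing in the first post, the XOR at 0x8072 starts in the second word of its block, the XOR at 0x8078 starts in the first word, and the JMP at 0x807e sits in a second word; each 6-byte XOR spans two latch blocks.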
