This thread has been locked.

Fastest 8-bit x 8-bit multiply ever made for the Value Line MSP430 (probably)

Load R12 and R13 with the numbers you want to multiply; each can be any value from 0 to 255.
The 16-bit product is returned in R13 (R12 is left holding the shifted multiplicand).
Every instruction is single-word, and the routine takes 27 to 35 cycles (plus 3 for the ret).
I don't have to clear C before each rrc: if the jnc was taken, C was of course already clear,
and if the add.w was executed, it always clears C too, because the running sum never exceeds 16 bits.
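The 27-to-35 figure can be reproduced with a small cycle model. This is a sketch in C, assuming the per-instruction costs the post implies: 1 cycle for each register-only swpb, clrc, rrc.w and add.w, and 2 cycles for jnc whether or not the jump is taken. The function names are hypothetical.

```c
#include <stdint.h>

/* Count the 1 bits in an 8-bit value (each one triggers an add.w). */
static unsigned popcount8(uint8_t v)
{
    unsigned n = 0;
    while (v != 0) {
        v &= (uint8_t)(v - 1u);  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Cycles for "multiply" excluding ret, assuming 1 cycle per register
   swpb/clrc/rrc.w/add.w and 2 cycles per jnc (taken or not). */
static unsigned multiply_cycles(uint8_t multiplier)
{
    unsigned setup   = 3;                      /* swpb + clrc + rrc.w r12 */
    unsigned per_bit = 1 + 2;                  /* rrc.w r13 + jnc */
    unsigned adds    = popcount8(multiplier);  /* add.w only when the bit is 1 */
    return setup + 8 * per_bit + adds;
}
```

Under these assumptions the cost is 27 plus the number of 1 bits in the multiplier: 27 cycles for 0 and 35 for 255.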

I'd like to see you C programmers turn this into a function call and post it here.

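Example     mov.b  #240,r12           ; multiplicand (mov.b clears bits 8-15)
            mov.b  #221,r13           ; multiplier
            call   #multiply          ; R13 = 240*221 = 53040
            bis.w  #CPUOFF+GIE,SR     ; enter LPM0 with interrupts enabled
;--------------------------------
multiply    swpb   R12                ; R12 = multiplicand << 8
            clrc
            rrc.w  r12                ; R12 = multiplicand << 7, C = 0
_bit0       rrc.w  r13                ; next multiplier bit -> C
            jnc    _bit1              ; bit clear: skip the add
            add.w  r12,r13            ; bit set: add (never carries, so C = 0)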
_bit1       rrc.w  r13
            jnc    _bit2
            add.w  r12,r13
_bit2       rrc.w  r13
            jnc    _bit3
            add.w  r12,r13
_bit3       rrc.w  r13
            jnc    _bit4
            add.w  r12,r13
_bit4       rrc.w  r13
            jnc    _bit5
            add.w  r12,r13
_bit5       rrc.w  r13
            jnc    _bit6
            add.w  r12,r13
_bit6       rrc.w  r13
            jnc    _bit7
            add.w  r12,r13
_bit7       rrc.w  r13
            jnc    _bitend
            add.w  r12,r13
_bitend     ret
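A C wrapper around the assembly would depend on the compiler's calling convention, so this is instead a portable C sketch of the same shift-and-add, written to mirror the registers step by step. `multiply_8x8` is a hypothetical name, and `carry` stands in for the C flag.

```c
#include <stdint.h>

/* Portable sketch of the shift-and-add in "multiply".
   r12/r13 mirror the CPU registers, carry mirrors the C flag. */
static uint16_t multiply_8x8(uint8_t a, uint8_t b)
{
    uint16_t r12 = (uint16_t)((uint16_t)a << 7);  /* swpb + clrc + rrc.w: a << 7 */
    uint16_t r13 = b;                             /* multiplier, high byte clear */
    unsigned carry = 0;                           /* C is clear after rrc.w r12 */

    for (int bit = 0; bit < 8; bit++) {
        /* rrc.w r13: old C enters bit 15, bit 0 leaves into C */
        unsigned out = r13 & 1u;
        r13 = (uint16_t)((r13 >> 1) | (carry << 15));
        carry = out;
        if (carry) {                              /* jnc not taken */
            uint32_t sum = (uint32_t)r13 + r12;   /* add.w r12,r13 */
            r13 = (uint16_t)sum;
            carry = (unsigned)(sum >> 16);        /* 0 for all 8-bit inputs */
        }
    }
    return r13;  /* 16-bit product, as in R13 after the routine */
}
```

`multiply_8x8(240, 221)` gives 53040, the same as the example call. Because there are only 65536 input pairs, an exhaustive comparison against `a * b` is a cheap way to confirm that the add never carries out of 16 bits.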

  • I was able to replace the eight jnc with addc, saving 8 cycles;
    subtract one for the added inv, and it now takes only 20 to 28 cycles!!
    It also masks both incoming values with 0xFF automatically, at no extra cycle cost.

    Adding #1 to the program counter may cause a reset on some MSP430 families(?)

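    multiply2   inv.b  R13     ; invert multiplier, clear bits 8-F
                add.b  #0,R12  ; clear C, clear bits 8-F
                swpb   R12
                rrc.w  r12     ; R12 = multiplicand << 7
    _bit0       rrc.w  r13     ; next (inverted) multiplier bit -> C
                addc.w #1,PC   ; C=1: PC+2 skips the add; C=0: PC+1, bit 0 dropped, add runs
                add.w  r12,r13
                               ; _bit1 to _bit7 repeat the rrc/addc/add pattern, then ret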
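The inv.b is needed because addc.w #1,PC skips the add when C=1, the opposite of jnc, so the multiplier bits are inverted first. Below is a C sketch of multiply2, assuming the rrc/addc/add pattern is repeated for all eight bits and that PC+2 never wraps past 0xFFFF; `multiply2_8x8` is a hypothetical name. Passing inputs with junk in bits 8-15 shows the automatic masking.

```c
#include <stdint.h>

/* Sketch of multiply2: inverted multiplier, add runs when the inverted bit is 0. */
static uint16_t multiply2_8x8(uint16_t r12_in, uint16_t r13_in)
{
    uint16_t r13 = (uint16_t)(~r13_in & 0x00FFu);  /* inv.b R13: invert, clear bits 8-F */
    uint16_t r12 = (uint16_t)(r12_in & 0x00FFu);   /* add.b #0,R12: clear bits 8-F, C = 0 */
    r12 = (uint16_t)(r12 << 7);                    /* swpb + rrc.w: multiplicand << 7 */
    unsigned carry = 0;

    for (int bit = 0; bit < 8; bit++) {
        unsigned out = r13 & 1u;                   /* rrc.w r13 */
        r13 = (uint16_t)((r13 >> 1) | (carry << 15));
        carry = out;
        int skip_add = (carry == 1);               /* addc.w #1,PC: PC+2 when C=1 */
        carry = 0;                                 /* no carry out of PC (no wrap assumed) */
        if (!skip_add) {
            uint32_t sum = (uint32_t)r13 + r12;    /* add.w r12,r13 */
            r13 = (uint16_t)sum;
            carry = (unsigned)(sum >> 16);         /* stays 0 for 8-bit operands */
        }
    }
    return r13;
}
```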

  • Looks great. Where's the 'like' button? :)

    Are you sure addc.w is faster than jnc? When PC is the destination, the instruction needs an extra cycle, because the fetch of the next instruction has to wait until the operation completes, whereas for all other registers the register update and the next fetch can happen in parallel.
    Nice idea, though: adding #1 is ignored because it only changes the LSB, while #1 plus C adds 2 and advances PC past the next instruction.
    But I dimly remember that the value of PC after an instruction fetch differs between MSPs (was it an MSP430/MSP430X core difference?). I'm relying on faint memories of an errata sheet entry here.
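The PC trick can be modelled in a few lines. This is a hypothetical C sketch, assuming (as discussed in this thread) that PC holds the address of the following add.w while addc.w #1,PC executes, and that the core forces PC bit 0 to zero; `addc1_pc` is an illustrative name, not a real API.

```c
#include <stdint.h>

/* New PC after "addc.w #1,PC", given the address of the next instruction
   and the carry flag, on a core whose PC bit 0 is hard-wired to 0. */
static uint16_t addc1_pc(uint16_t pc_next, unsigned carry)
{
    uint16_t sum = (uint16_t)(pc_next + 1u + carry);
    return (uint16_t)(sum & 0xFFFEu);  /* bit 0 of PC is forced to 0 */
}
```

With C=0 the PC stays on the add.w (it executes); with C=1 it moves past that one-word instruction. Without the mask, the C=0 case would leave PC odd (0xC011 for an add at 0xC010), which is exactly what the later replies worry about.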

  • As JMG already noted, according to the MSP430x5xx family datasheet the emulated instruction br Rx (mov.w Rx,PC) executes in 3 cycles. I measured it (on an MSP430x5xx, not an MSP430x2xx): add.w #1,PC also executes in 3 cycles.

  • Yes, it was IAR Workbench that simulated it wrong.

    IAR stops at PC+1 because it does not emulate bit 0 of PC being fixed at zero.
    So I ran the multiplication with R13 = 0, since after inv.b every addc then does +2.
    IAR also calculated the cycles for PC+2 wrong, as it treats it like any other register + 2.

    Instead, I let the timer run from a slow ACLK while the code computed 65536 multiplications, once with JNC and once with ADDC.W.
    When the breakpoint hit on a real G2553, TA0R showed the same elapsed time in both trials.

    So the routine in the top post is the one to use, and it is still very fast.

    Is the fetched instruction stored somewhere writable? Then subtracting C from it could change R12 into R11 and emulate a NOP.

  • Tony Philipsson said:
    IAR stops on PC+1 as it does not emulate the bit0 in PC is always fixed to zero.
    So I did a multiplication run on 0, as then every addc will the do +2.

     Hi Tony, I see it now, having been redirected from your post: adding 1 to PC (or to SP) on a word-aligned machine results in an odd address, which is a fault.

     The processor is then stopped by a reset: illegal instructions and illegal addresses are both mapped to the reset vector.

  • >result in odd address and is a fault

    Most MSP430s (later models) have a 0 hard-coded into bit 0 of PC.
    You cannot make it an odd number even if you try.

    But IAR does not emulate that correctly.

    IAR also incorrectly calculated that addc #1,PC takes 1 cycle,
    but it takes two, just like JNC, so there was no cycle improvement.
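Feeding the reported figures into a cycle model shows why there was no gain. This C sketch parameterizes the cost of addc.w #1,PC and takes as inputs the numbers quoted in this thread (1 cycle as IAR simulated it, 2 cycles as measured on a G2553, 3 cycles as reported for add.w #1,PC on a 5xx part) rather than datasheet values. The other instructions are assumed to cost 1 cycle each; function names are hypothetical.

```c
#include <stdint.h>

/* Count the 1 bits in an 8-bit value (each original 1 bit runs the add.w). */
static unsigned popcount8(uint8_t v)
{
    unsigned n = 0;
    while (v != 0) {
        v &= (uint8_t)(v - 1u);  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Cycles for multiply2 excluding ret, for a given cost of "addc.w #1,PC". */
static unsigned multiply2_cycles(uint8_t multiplier, unsigned addc_pc_cycles)
{
    unsigned setup   = 4;                      /* inv.b + add.b + swpb + rrc.w */
    unsigned per_bit = 1 + addc_pc_cycles;     /* rrc.w r13 + addc.w #1,PC */
    unsigned adds    = popcount8(multiplier);  /* add.w runs for each original 1 bit */
    return setup + 8 * per_bit + adds;
}
```

With a 1-cycle addc the model gives the hoped-for 20 to 28 cycles; with 2 cycles it gives 28 to 36, one cycle more than the jnc version because of the extra inv.b, consistent with the conclusion to stick with the first routine.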
  • Tony Philipsson said:
    IAR did also incorrectly calculate that addc #1,PC takes 1 cycle
    but it's two just like JNC so there was no cycle improvements.

     This is true, but a simulator (and not only a simulator) just maps the ALU result to a register, and PC and SP are no different from the other general-purpose registers. So if the result has bit 0 set, the register gets bit 0 set too, and that can be a problem on some silicon revisions, or on an IP core as well; see here:

    http://opencores.org/project,openmsp430

     There, PC and SP are 16 bits wide and are neither truncated nor have bit 0 cleared.

     Anyway, I think this macro-style code may well be the fastest 8x8 multiply compared with the usual looping approach. Couldn't a loop with the test rearranged be the shortest, though?

  • Tony Philipsson said:
    I was able to change the eight jnc to addc = 8 cycles saved

    Due to the prefetch pipeline of the MSP430 CPU (or CPUX), it is not possible to execute any jump or branch (conditional or otherwise) in 1 MCLK cycle.
