multiplier in msp430

Other Parts Discussed in Thread: MSP430F2013

Dear Experts:

     I measured the multiply time on the MSP430F2013:

long x long --> unsigned long long takes 17 us with the processor running at 1 MHz. That seems pretty good without a hardware multiplier, because at full speed (16 MHz) it would take only 1.06 us.

Now, if I were to buy one of the parts with a 16x16 hardware multiplier, how long would it take to multiply long x long for a long long result?

I was also looking at the 32x32 hardware multiplier, which seems to have a 16x16 engine. I would therefore expect it to need 4 cycles to multiply 32x32. Is this correct?

Thanks,

John

  • Did you mean 17 microseconds? That is too good to be true. Please check that again.
  • 17 us would be around 8-12 instructions at 1 MHz, so that cannot be a real multiply unless the value never changed and the compiler was able to precalculate it.
    Or it was *254 and it only had to do a swap and a sub.

  • I remeasured the multiply times with the msp430f2013 using large random numbers
    (proc running at 1Mhz):

    M1 = (signed long)1314159314 ;
    M2 = (signed long)2314314159 ;
    P1OUT |=0x20;
    ans = M1*M2 ; //test
    ans = M1*M2 ; //test
    P1OUT ^=0x20;
    ans = ans +1;

    The time for one multiply came out to be 1590 us.
    Before, I was multiplying 8192 x 65000, which took 17 us.

    I would like even more to know what this number would be on a 430 that has a 16x16 hardware multiplier.

    Thanks,

    John
  • By the way, the result, ans, is a long long.

    John
  • Show us the complete code including declarations. Is ans declared volatile?
    The 17 us must be the compiler noticing that it can calculate the answer for you at compile time,
    as the math is not "random" at runtime.

  • John Moore39 said:
    I remeasured the multiply times with the msp430f2013 using large random numbers
    (proc running at 1Mhz):

    M1 = (signed long)1314159314 ;
    M2 = (signed long)2314314159 ;
    P1OUT |=0x20;
    ans = M1*M2 ; //test
    ans = M1*M2 ; //test
    P1OUT ^=0x20;
    ans = ans +1;

    Any decent compiler would 1) drop the first assignment to "ans" since it's dead, and 2) constant fold the multiplication. Hence, whatever you measured is most likely not a real multiplication.

    I would recommend you to inspect the generated assembly instructions in the list file.

    One way to measure the true speed of a multiplication is to hide the constants from the compiler. You can, for example, do this by reading the values from (non-constant) variables in memory or by passing the constants to a function that simply returns them (but beware of inlining).

    Also note that there would be a big difference in the generated code for the following:

        long M1 = ...;
        long M2 = ...;
        long long ans = ((long long)M1) * ((long long)M2);

    Compared to:

        long long M1 = ...;
        long long M2 = ...;
        long long ans = M1 * M2;
    

    In the former case, the compiler can utilize the 32*32 => 64 bit mode in the hardware multiplier. In the latter, the compiler must perform a full 64*64=>64 bit multiplication.

        -- Anders Lindgren, Author of the IAR compiler for MSP430, IAR Systems.
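
The advice above about hiding the operands from the compiler can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `m1`, `m2`, `ans`, and `timed_multiply` are hypothetical, the operand values are arbitrary ones that fit in a signed 32-bit range, and `int32_t`/`int64_t` stand in for `long`/`long long` on the MSP430 so that the sketch also builds on a desktop compiler.

```c
#include <stdint.h>

/* volatile forces a real load at run time, so the product cannot be
   constant-folded at compile time. Values are illustrative only. */
volatile int32_t m1 = 1314159314;
volatile int32_t m2 = 2014314159;

/* A volatile result means the store cannot be dropped as a dead assignment. */
volatile int64_t ans;

void timed_multiply(void)
{
    int32_t a = m1;   /* read the operands before the timed region */
    int32_t b = m2;

    /* On the target, set the timing pin here (for example P1OUT |= 0x20). */
    ans = (int64_t)a * (int64_t)b;   /* 32 x 32 -> 64 bit multiplication */
    /* On the target, clear or toggle the timing pin here. */
}
```

It is still worth checking the list file to confirm that the multiplication really is performed at run time.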

  • Thanks, Anders:
    I did as you recommended in the first version you listed. I should have realized that the multiplicands don't need to be any larger than
    32 bits for a 64 bit product. The code gives the correct answer and takes only 700us with proc running at 1Mhz.
    The code does call _Mul32s32sTo64i.

    You seem to be saying that by doing it this way, it actually uses the hardware multiplier even for MSP430F2013!!!

    John
  • But, Anders, If I don't cast the multiplicands to (long long), the answer is incorrect and takes longer.

    John
  • The answer is incorrect according to you. But it is correct according to c.
  • John Moore39 said:
    Thanks, Anders:
    I did as you recommended in the first version you listed. I should have realized that the multiplicands don't need to be any larger than
    32 bits for a 64 bit product. The code gives the correct answer and takes only 700us with proc running at 1Mhz.
    The code does call _Mul32s32sTo64i.

    You seem to be saying that by doing it this way, it actually uses the hardware multiplier even for MSP430F2013!!!

    No, not at all. The routine _Mul32s32sTo64i is a plain subroutine that performs the multiplication using normal integer operations like shift and add.
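
For readers unfamiliar with software multiplication, a shift-and-add multiply can be sketched as below. This is not the actual implementation of `_Mul32s32sTo64i`, which the thread does not show; `mul32_shift_add` is a hypothetical name, and the sketch handles unsigned operands only.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned 32 x 32 -> 64 bit multiply using only
   shifts and adds, one iteration per remaining bit of b. */
uint64_t mul32_shift_add(uint32_t a, uint32_t b)
{
    uint64_t product = 0;
    uint64_t addend = a;   /* widened so left shifts keep the high bits */

    while (b != 0) {
        if (b & 1u) {
            product += addend;   /* add the shifted multiplicand for each set bit */
        }
        addend <<= 1;
        b >>= 1;
    }
    return product;
}
```

Because the loop ends as soon as the remaining bits of `b` are zero, the run time of a routine like this depends on the operand values.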

  • John Moore39 said:
    But, Anders, If I don't cast the multiplicands to (long long), the answer is incorrect and takes longer.

    I think you missed a detail in my earlier answer. In the first case, M1 and M2 were defined as "long" and cast to "long long" in the multiplication. In the second, they were defined as "long long". In both cases, the multiplication was performed in 64-bit precision. However, in the first example the compiler can utilize the fact that both operands are sign-extended from 32-bit values.

    Another thing: Will M1 and M2 ever be negative? If not, then I suggest that you use unsigned numbers, as it's easier to multiply those. (By "easier" I mean it is faster and requires less code).

    By the way, a 32*32=>64 bit multiplication, when the 32-bit multiplier is used, requires eight instructions: four to write the two arguments and four to read the result back out, plus a number of instructions to disable and restore interrupts. For the 16-bit multiplier, a few more are required.
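
To show why a 16x16 multiplier needs more work for a 32x32 product, here is a rough sketch of the arithmetic: four 16x16 partial products, shifted and summed. It is plain portable C, not register-level code for the MSP430 hardware multiplier; `mul16x16` is a hypothetical stand-in for one hardware multiply, and the sketch is unsigned only.

```c
#include <stdint.h>

/* Hypothetical stand-in for a single 16 x 16 -> 32 bit hardware multiply. */
static uint32_t mul16x16(uint16_t x, uint16_t y)
{
    return (uint32_t)x * (uint32_t)y;
}

/* Unsigned 32 x 32 -> 64 bit product built from four 16 x 16 partial products. */
uint64_t mul32x32_from_16(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)(a & 0xFFFFu);
    uint16_t a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)(b & 0xFFFFu);
    uint16_t b_hi = (uint16_t)(b >> 16);

    uint64_t lo_lo = mul16x16(a_lo, b_lo);   /* weight 2^0  */
    uint64_t lo_hi = mul16x16(a_lo, b_hi);   /* weight 2^16 */
    uint64_t hi_lo = mul16x16(a_hi, b_lo);   /* weight 2^16 */
    uint64_t hi_hi = mul16x16(a_hi, b_hi);   /* weight 2^32 */

    return lo_lo + ((lo_hi + hi_lo) << 16) + (hi_hi << 32);
}
```

Besides the four multiplies, the partial products have to be accumulated with carries, so the total cost is more than four multiply operations.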

  • Thanks to everyone. I needed this discussion. The multiply times are critical for low power or just
    general filtering.

    John
  • John,

    BTW, in C, if you multiply two 32-bit unsigned variables, you only get the low-order 32 bits of the result. If you store that into a 64-bit variable, the high-order 32 bits are filled with 0's. Thus, according to you and me, the result is incorrect. But that is how C does it, according to the C standard.

    When you do not have a 32x32 multiplier, the high-order 32 bits are not even calculated. When you have a hardware 32x32 multiplier, the hardware will generate 64 bits, but the high-order 32 bits will be discarded. The result is the same as in software multiplication.

    If you declare or cast the multiplier and multiplicand as 64-bit, C will do a 32x32 multiplication three times, each time not calculating or discarding the top 32 bits of the 32x32 result. It will then combine the three 32-bit results into a 64-bit number.

    When in Rome, speak Roman. Don't say that they are incorrect ;)

    --OCY
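
The difference between multiplying in 32 bits and widening first can be seen in a small sketch like the one below. The function names are hypothetical, and it assumes `int` is no wider than 32 bits (true for the MSP430 and for common desktop compilers), so that the `uint32_t` operands are not promoted to a wider type.

```c
#include <stdint.h>

/* The multiplication is carried out in 32 bits, and only the truncated
   result is widened to 64 bits on return. */
uint64_t product_truncated(uint32_t a, uint32_t b)
{
    return a * b;
}

/* Casting one operand first makes the multiplication itself 64 bits wide,
   so the high-order half of the product is kept. */
uint64_t product_full(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

For example, with both operands equal to 65536, the first function returns 0 and the second returns 4294967296.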
