MSP430F5659: Hardware Multiplier speed

Part Number: MSP430F5659

Hello,

I tried to multiply two 16-bit integers using the MPY32 hardware multiplier on the MSP430F5659 controller.

The code is simple and is based on the code example for this controller. Here it is:

int32_t MpyByHardware (int16_t operand1, int16_t operand2)
{
  int32_t  Result;
  uint16_t ui16IntState;
  uint16_t ui16MPYState;
  
  // Save current interrupt state and disable interrupts
  ui16IntState = __get_interrupt_state();
  __disable_interrupt();
  
  // Save current MPY peripheral state
  ui16MPYState = MPY32CTL0;
  
  // Set MPY mode to signed multiply
  MPY32CTL0 |= MPYM_1;
  
  // Perform multiplication and save result
  MPYS = operand1;
  OP2  = operand2;
  __delay_cycles(3);
  Result = RESHI;
  Result = Result << 16;
  Result += RESLO;
  
  // Restore MPY mode and interrupts
  MPY32CTL0 = ui16MPYState;
  __set_interrupt_state(ui16IntState);
  
  return Result;
}

The problem is that it takes around 3us (with a 24MHz system clock). I measured this with a scope, setting an I/O pin before the function call and resetting it right after.

This is even worse than just calling operand1*operand2 (which takes 2.8us).

I'm using IAR compiler, checked the "enable hardware multiplier" box in project options.

What am I doing wrong?

Thanks,

Dragana

  • Hello Dragana,

    One of the engineers on our team is looking into your issue and will provide a response before the end of the week.

    Regards,
    Ryan
  • Dragana,
    I'm looking into this for you. Have you tried timing this in free run mode? The debugger can slow things down a bit.

    Also, how long do lines 18-23 take?
  • Dragana,
    Can you check that the option to build with the hardware multiplier is enabled?

    I should also mention that assembly will be faster than C in this case.

    See the following thread for more information: e2e.ti.com/.../159764
  • Hi!

    The IAR tools, when "hardware multiplier" is enabled, generate inline code to perform the multiplication using the hardware multiplier. You will gain little by manually accessing the hardware multiplier registers.

    The only situation where this is needed is if you want to use a non-standard mode like fractional or saturation. In this case it is a good idea to disable the hardware multiplier option, to ensure that the code generated by the compiler doesn't interfere with the code that manually accesses the registers.

    For example, this is a standard C function that performs a 16*16->32 bit signed multiplication:

    long mul_sign_extended(short x, short y)
    {
      return ((long)x) * ((long)y);
    }
    

    The IAR tools generate the following:

    mul_sign_extended:

    PUSH.W SR
    DINT
    NOP
    MOV.W R12, &__iar_HWMUL + 2  // MPYS
    MOV.W R13, &__iar_HWMUL + 8  // OP2
    MOV.W &__iar_HWMUL + 10, R12 // RESLO
    MOV.W &__iar_HWMUL + 12, R13 // RESHI
    NOP
    POP.W SR
    RETA

    Of course, the corresponding code will be generated for the multiplication in any context, so you don't have to place the multiplication in a solitary function.

    If we take a look at the actual multiplication code you posted:

    - There is no need to set the multiplication mode manually. The mode is selected by the register you write the first operand to. If you write the first operand to MPY, the operands are zero extended. If you write it to MPYS, the operands are sign extended.

    - You don't have to save and restore MPY32CTL0 (unless you change one of the fundamental settings like enabling saturated or fractional modes).

    - You don't have to do __delay_cycles(3) in your code. When doing a simple 16*16 -> 32 bit operation, you can read out the result immediately. (If you plan to read out the high part of the result first, or perform a 64 bit multiplication, a delay might be needed.)

        -- Anders Lindgren, IAR Systems, Author of the IAR compiler for MSP430
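As an illustration of the three points above, here is a minimal sketch of a trimmed-down signed multiply, not the code from the TI example or from the compiler. It writes the first operand to MPYS (so no mode setup is needed), does not touch MPY32CTL0, and reads RESLO before RESHI without a delay. The function name `mpy_s16_hw` is made up for this sketch, and the `#else` branch is only a stand-in so that the code also builds off-target; it assumes the IAR compiler defines `__ICC430__`.

```c
#include <stdint.h>

#if defined(__ICC430__)   /* IAR for MSP430: use the real MPY32 registers */
#include <msp430.h>
#include <intrinsics.h>

/* Hypothetical helper: signed 16*16 -> 32 bit multiply via MPY32. */
int32_t mpy_s16_hw(int16_t a, int16_t b)
{
    uint16_t lo, hi;

    /* Keep interrupts out while the multiplier registers are in use. */
    uint16_t state = __get_interrupt_state();
    __disable_interrupt();

    MPYS = a;      /* writing the first operand to MPYS selects signed mode */
    OP2  = b;      /* writing the second operand starts the multiplication */
    lo = RESLO;    /* low word first, read without a delay */
    hi = RESHI;

    __set_interrupt_state(state);

    /* Combine as unsigned to avoid shifting into the sign bit. */
    return (int32_t)(((uint32_t)hi << 16) | lo);
}

#else   /* stand-in for building and testing on a host machine */

int32_t mpy_s16_hw(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

#endif
```

If registers are accessed by hand like this, the advice above about disabling the compiler's hardware multiplier option still applies; otherwise the plain C multiplication is usually the simpler choice.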

  • Hi,

    Thank you both for such a thorough approach to my question. I'm glad I can say you helped a lot.

    I removed the delay, the MPY32CTL0 mode setting, and the register save and restore, and also the read of the low result word (not needed for my application), and now the function takes a little over 0.5us!


    I do have one more question:

    I need to divide two 16-bit signed integers, neither of them a constant value (both values are taken from the ADC with some more math done on them, which is why they are signed). The result needs to be a 16-bit signed integer too.
    I use this statement:

    result = ((int32_t)numer << 16) / denom;

    It takes 15.6us. Is it possible to achieve faster division some other way?


    Thanks again for your support!

    Dragana
  • General division (when none of the values are constant) is hard to make fast. In your case, you might be able to copy and rename the division routines provided by the compiler (?DivMod32u and ?DivMod32s) and use the facts that 1) the lower 16 bits of the dividend are zero, 2) the lower 16 bits of the result aren't used, and 3) the code managing the modulo can be removed (unless you plan to use it as well).
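To give an idea of what such a specialized routine could look like, here is a rough sketch in C. It is not derived from the compiler's ?DivMod32s routine; it is a generic shift-and-subtract division, and the names `udiv_q16` and `div_q16` are invented for this example. It relies on the dividend's lower 16 bits being zero and computes no modulo. It additionally assumes |numer| < |denom| (so denom is never zero and the quotient magnitude fits in 16 bits), which is an assumption of this sketch and not something stated in the thread. Whether it actually beats the library routine on the target would have to be measured.

```c
#include <stdint.h>

/* floor((n << 16) / d) for unsigned n < d, using 16 shift-subtract
 * steps instead of the 32 a general 32-bit division needs. */
static uint16_t udiv_q16(uint16_t n, uint16_t d)
{
    uint32_t rem = n;   /* running remainder, always < d after each step */
    uint16_t q = 0;
    uint8_t i;

    for (i = 0; i < 16; i++) {
        rem <<= 1;      /* shift in one of the 16 zero bits of the dividend */
        q <<= 1;
        if (rem >= d) {
            rem -= d;
            q |= 1u;
        }
    }
    return q;           /* the remainder is simply discarded */
}

/* Signed wrapper. Precondition (assumed): |numer| < |denom|.
 * Truncates toward zero, like the C '/' operator. */
int32_t div_q16(int16_t numer, int16_t denom)
{
    uint16_t n = (uint16_t)(numer < 0 ? -(int32_t)numer : (int32_t)numer);
    uint16_t d = (uint16_t)(denom < 0 ? -(int32_t)denom : (int32_t)denom);
    int32_t q = (int32_t)udiv_q16(n, d);

    return ((numer < 0) != (denom < 0)) ? -q : q;
}
```

The result is returned as a 32-bit value because the quotient magnitude can reach 65535; narrowing it to a 16-bit signed integer is only safe when the application guarantees the value is in range.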

    You could also investigate alternative approaches. If the range of the values is small, or if some values seem to be used often, it might be efficient to use a lookup table or handle special cases. Also, unsigned division is faster than signed, as there would be no need to manage the sign.
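A lookup-table variant could be sketched as follows, under loudly stated assumptions: the denominator is often a small positive value (here 1 to 64, an arbitrary limit chosen for illustration), and a small error is acceptable, because multiplying by a rounded-down reciprocal is not exactly the same as dividing. All names (`RECIP_TABLE_MAX`, `recip_table_init`, `div_q16_lut`) are hypothetical. Denominators outside the table fall back to the ordinary division, with the same preconditions as the original expression (denom must not be zero).

```c
#include <stdint.h>

#define RECIP_TABLE_MAX 64   /* hypothetical: largest denominator in the table */

/* recip_q16[d] = floor(65536 / d). On a real target this would more
 * likely be a const table placed in flash. */
static uint32_t recip_q16[RECIP_TABLE_MAX + 1];

void recip_table_init(void)
{
    uint16_t d;

    recip_q16[0] = 0;        /* index 0 is never used */
    for (d = 1; d <= RECIP_TABLE_MAX; d++) {
        recip_q16[d] = 65536UL / d;
    }
}

/* Approximates ((int32_t)numer << 16) / denom. */
int32_t div_q16_lut(int16_t numer, int16_t denom)
{
    if (denom >= 1 && denom <= RECIP_TABLE_MAX) {
        /* One multiplication instead of a division. Exact when denom
         * is a power of two, slightly too small in magnitude otherwise. */
        return (int32_t)numer * (int32_t)recip_q16[denom];
    }
    /* General case: ordinary (slow) division; denom must not be zero. */
    return ((int32_t)numer * 65536L) / denom;
}
```

For example, with denom = 3 and numer = 3 the table path gives 65535 where the exact result is 65536, so this only makes sense if that kind of error is tolerable.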

        -- Anders Lindgren, IAR Systems
