This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

2^n division optimization for signed division

Other Parts Discussed in Thread: MSP430F5418A

I found that dividing using /32 for unsigned operands is optimized into a shift by 5 (which is a good thing). Now using signed integers the optimization won't do this.

Following minimal example shows the issue:

#include "msp430f5418A.h"

void main(void)
{
WDTCTL = WDTPW + WDTHOLD; // Stop WDT

volatile int u = -64;

volatile int a = u / 32;
volatile int b = u >> 5;

__bis_SR_register(LPM3_bits + GIE); // Enter LPM3, enable interrupts
__no_operation(); // For debugger
}

The relevant generated assembly is:

MOV.W #0xffc0,0x0000(SP)
MOV.W @SP,R12
MOV.W #0x0020,R13
CALLA #__divi
MOV.W R12,0x0002(SP)
MOV.W @SP,R15
RPT #5 RRAX.W R15
MOV.W R15,0x0004(SP)

The optimizations are turned to maximum (o4) and it's optimized for speed (opt_for_speed=5), building release code (no debug options selected).

Now I see there is a reason behind calling divi in this case - if something smaller than -32 is divided by 32 the result is 0 - which will not happen with a simple shift operation (result will be -1).

I was wondering however if a comparison of the numerator and divisor and after that either a return of 0 or returning the shifted value wouldn't be faster and smaller.

Something like this pseudocode:

if (abs(numerator) <  divisor)

return 0; 

else

return numerator >> log2(divisor);

I'm not used to writing algorithms in assembler - so my trying to do so will probably generate not a very nice result - that's why my statement is a bit vague and I ask the question here.

  • It's not quite that simple.  Consider -33 / 32 == -1, but -33 >> 5 == -2.

    There may be things that could be done for MSP.  This would require a bit of investigation.  You may be interested to know that C6000 (which has 32-bit int) does this operation with the expression (signed)(((unsigned)(x>>4)>>27)+x)>>5, which would not be feasible on an MSP that doesn't have RPT RRAX, but perhaps it would on MSP430F5418A.  On the other hand, it would slightly increase code size.

  • Oh well, seems like today half of my brain isn't working like it should - I'm missing things everywhere.

    Thanks for the explanation (and for pointing out my lack of thinking). I haven't looked at the MSP instruction sets in detail - from your answer I take it that they differ depending on the families, another interesting point.

    I should stay away from trying to optimize things when I'm not at my best... 

  • There's no harm in looking at or posting about such things, even if it's not a perfect solution.  A similar sort of observation by a forum user led to implementing faster hardware multiplication routines in the RTS.