MSP430: Fast Integer Sqrt() Routine

Hi,

 

I have a 64-bit integer and I need its square root as a 32-bit integer only; the fractional part is not required. The library provided by TI for energy metering takes 25000 cycles, and I don't understand what its benefit is. If I use double-precision floating point and the standard library instead, it takes only 6000 cycles. This operation should take at most 1000 cycles. Is there an application note or ready-made code for this?

 

10x.

 

  • This does not answer your question directly, but have a look at document SLAA024: chapter 5.1.8.1 has an example of a "fast square root", but only for 32-bit integers.

    It takes 720 cycles to calculate the square root, so I don't think it is possible to do this calculation on a 64-bit number in 1000 cycles.

  • If you have access to the 32bit hardware multiplier, you can do it really fast by making some 'obvious' assumptions.

    First, the square root can only be 32 bits wide, as the operand is only 64 bits wide.
    Now start with 0x80000000 (only bit 31 set) and take the square. If the result is higher than the original value, clear bit 31 again; otherwise leave it set. Now set bit 30 and repeat. At the end you'll have the square root after 32 multiplications and comparisons. 31 cycles per step should be possible.
    You can reduce the time further by skipping one multiplication/comparison step for each two clear MSBs in the operand. But then the time required depends on the operand, and the maximum time increases by a few cycles.
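
The bit-by-bit method described above can be sketched in portable C as follows. This is a minimal sketch, not code from SLAA024 or from TI: the function names are made up for illustration, and the plain 64-bit multiplication stands in for the 32-bit hardware multiplier, so it says nothing about the cycle count on an MSP430.

```c
#include <stdint.h>

/* Bit-by-bit integer square root: returns floor(sqrt(n)).
 * Hypothetical name; the 64-bit product stands in for the hardware multiplier. */
uint32_t isqrt64_bitwise(uint64_t n)
{
    uint32_t root = 0;

    for (uint32_t bit = 0x80000000u; bit != 0; bit >>= 1) {
        uint32_t trial = root | bit;           /* tentatively set this bit */
        if ((uint64_t)trial * trial <= n) {    /* square not too large: keep it */
            root = trial;
        }
    }
    return root;
}

/* Same method, but one step is skipped for each pair of clear MSBs
 * in the operand, so the run time depends on the operand. */
uint32_t isqrt64_bitwise_skip(uint64_t n)
{
    uint32_t bit = 0x80000000u;
    uint64_t top = 0xC000000000000000ULL;      /* the two highest operand bits */
    uint32_t root = 0;

    /* If the top two bits of n are clear, the matching root bit must be 0. */
    while (bit != 0 && (n & top) == 0) {
        bit >>= 1;
        top >>= 2;
    }
    for (; bit != 0; bit >>= 1) {
        uint32_t trial = root | bit;
        if ((uint64_t)trial * trial <= n) {
            root = trial;
        }
    }
    return root;
}
```

The first function always performs 32 steps; the second trades a short scan of the operand for fewer multiplications on small inputs.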

  • Hi,
     
    I wrote something like a binary search using the hardware multiplier; it takes around 1800 cycles. The problem is that it kills my SD16_A interrupt, and I don't understand why. The SD16_A interrupt also uses the hardware multiplier, in different modes, so I save MPY32CTL0 and the RES registers on entry and restore them before returning. The isqrt64() routine runs in main, and it works very well there, but why is my SD16_A interrupt affected by isqrt64() running at the same time?
     
     
    uint32_t isqrt64(register uint64_t h)
    {
        // Keep the full saved state; a bool may not preserve the GIE bit.
        __istate_t s = __get_interrupt_state();

        uint32_t hi = 0xFFFFFFFF;
        uint32_t lo = 0;

        uint32_t mid = lo + ((hi - lo) >> 1);        // 0x7FFFFFFF
        *(uint64_t*)&RES0 = 0x3FFFFFFF00000001ULL;   // preload RES with mid*mid

        MPY32CTL0 = 0;

        while( (lo < (hi-1)) && (*(uint64_t*)&RES0 != h))
        {
            if( *(uint64_t*)&RES0 < h ) lo = mid;
            else hi = mid;

            // hi + lo can overflow 32 bits, so take the midpoint from the difference
            mid = lo + ((hi - lo) >> 1);

            //*(uint64_t*)&RES0 = (int64_t)mid*mid;
            __disable_interrupt();
                MPY32L = mid;
                MPY32H = mid >> 16;
                OP2L = mid;
                OP2H = mid >> 16;
            __set_interrupt_state(s);
        }

        return(mid);
    }
     
    #pragma vector =  SD16A_VECTOR /* 0xFFEE ADC */
    __interrupt void SD16A_ISR(void)
    {
        uint16_t tmp_MPY32CTL0 = MPY32CTL0;
        uint64_t tmp_RES0 = *(uint64_t*)&RES0;
     
        ...
        ...
       
        MPY32CTL0 = tmp_MPY32CTL0;
        *(uint64_t*)&RES0 = tmp_RES0;
    }
     
    10x
  • Hello,

    In this TI document, Minus is not defined / initialized. It seems that one line is missing.

    Does anyone have the full function please?

    Best regards

    Mich

  • What do you mean by "kills my SD16 interrupt"? What do you observe?

    I don't see a reason why the two should interfere.
    Well, one could refine the algorithm a bit, e.g. replace the while with a do/while, which removes the need to initialize RES, and extend the time with interrupts disabled to cover reading the result, which removes the need to back up RES in the ISR, etc. But it should work fine as it is.
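
The refinement suggested above might look like the sketch below: a do/while binary search in which the multiplication and the result read happen in one critical section, so nothing needs to be preloaded into RES and an ISR would not need to back it up. This is a host-buildable sketch under stated assumptions, not the poster's code: `irq_save_and_disable()`, `irq_restore()`, `square_u32()` and `isqrt64_search()` are hypothetical names, and the plain C multiplication stands in for the hardware multiplier access.

```c
#include <stdint.h>

/* Hypothetical stand-ins so the sketch builds on a host. On the target they
 * would map to __get_interrupt_state() plus __disable_interrupt(), and to
 * __set_interrupt_state(). */
typedef unsigned int irq_state_t;
static irq_state_t irq_save_and_disable(void) { return 0; }
static void irq_restore(irq_state_t s) { (void)s; }

/* Hypothetical helper: square a 32-bit value inside one critical section.
 * On the target, the mode setup, the operand writes and the read of all
 * result words would all go between the two calls. */
static uint64_t square_u32(uint32_t x)
{
    irq_state_t s = irq_save_and_disable();
    uint64_t sq = (uint64_t)x * x;   /* stand-in for the multiplier access */
    irq_restore(s);
    return sq;
}

/* Binary search for floor(sqrt(h)); interrupts are blocked only per step. */
uint32_t isqrt64_search(uint64_t h)
{
    uint32_t lo = 0;             /* lo * lo <= h always holds */
    uint32_t hi = 0xFFFFFFFFu;   /* the root is never above hi */

    do {
        /* Midpoint rounded up, computed without overflowing 32 bits. */
        uint32_t mid = hi - ((hi - lo) >> 1);

        if (square_u32(mid) <= h) {
            lo = mid;            /* mid is a valid lower bound */
        } else {
            hi = mid - 1;        /* mid is too large */
        }
    } while (lo < hi);

    return lo;
}
```

The search takes 32 steps for any input, and interrupts are only held off for the duration of one multiplication at a time.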

  • I don't remember what the problem was.

    But below code works fine:

    __interrupt void SD16A_Interrupt(void)
    {
    #ifndef	_WIN32
    	asm("PUSH 0x015C");		//	Save multiplier mode, etc. MPY32CTL0
    	asm("PUSH 0x015A");		//  Save result 3
    	asm("PUSH 0x0158");		//  Save result 2
    	asm("PUSH 0x0156");		//  Save result 1
    	asm("PUSH 0x0154");		//  Save result 0
    	asm("PUSH &MPY32H");	//  Save operand 1, high word
    	asm("PUSH &MPY32L");	//  Save operand 1, low word
    	asm("PUSH &OP2H");		//  Save operand 2, high word
    	asm("PUSH &OP2L");		//  Save operand 2, low word
    #endif
    
    ..
    ..
    
    #ifndef	_WIN32
    	asm("POP &OP2L"); 		// Restore operand 2, low word
    	asm("POP &OP2H"); 		// Restore operand 2, high word
    							// Starts dummy multiplication but
    							// result is overwritten by
    							// following restore operations:
    	asm("POP &MPY32L"); 	// Restore operand 1, low word
    	asm("POP &MPY32H"); 	// Restore operand 1, high word
    	asm("POP 0x0154"); 		// Restore result 0
    	asm("POP 0x0156"); 		// Restore result 1
    	asm("POP 0x0158"); 		// Restore result 2
    	asm("POP 0x015A"); 		// Restore result 3
    	asm("POP 0x015C"); 		// Restore multiplier mode, etc, MPY32CTL0
    #endif
    }

  • Jens-Michael Gross said:

    What do you mean by "kills my SD16 interrupt"? What do you observe?

    I don’t see a reason why the two should interfere.

    I see a potential problem with the code. If the interrupt routine, directly or indirectly, uses the hardware multiplier, it will overwrite the values stored in the hardware multiplier registers, as they are written to in parts of the code where interrupts are enabled.

    There are a number of ways around this:

    1) Make sure the interrupt routines don't use the hardware multiplier. (Remember that the compiler and the runtime library can use it for you, even if the interrupt routines contain only plain code.)

    2) Make the square root routine interrupt safe. One way to do this is to disable interrupts over the entire routine; unfortunately, the processor will not service interrupts during this time. Another is to limit the use of the hardware multiplier to the inner loop, writing the input values and reading out the result while interrupts are disabled.

    3) Save the state of the hardware multiplier, as suggested by BasePointer. Unfortunately, this makes the interrupt slower, it is hard to get 100% correct (even the code sequences suggested by TI contain problems), and it is hard to do in C: using asm("push ...") causes the C compiler to assume wrong stack offsets, so the __xxx_on_stack() intrinsics won't work. In addition, backtrace info will no longer be correct, so you will not be able to see the call stack in the debugger. Personally, I would discourage you from using this method.

        -- Anders Lindgren, IAR Systems, Author of the IAR compiler for MSP430

     

  • Hi Anders,

    I don't know what the __xxx_on_stack() function is for.

    How can I tell whether the compiler uses a __xxx_on_stack() function inside the interrupt routine?

    The code above is used in one of our projects, and the product has been in mass production for 2 years.

    We haven't noticed any strange behavior so far.

    The code below works too, but it is slower. Do you think we should replace the code above with the version below?

    We use IAR 6.10 for MSP430 in our projects.

    #ifndef _WIN32
    #pragma vector =  SD16A_VECTOR /* 0xFFEE ADC */
    #pragma optimize=speed
    #endif
    static __interrupt void SD16A_Interrupt(void)
    {
    	// push all multiplier module registers
    	u16 _MPY32CTL0 = MPY32CTL0;	//	Save multiplier mode, etc.
    	u16 _RES3 = RES3;				//  Save result 3
    	u16 _RES2 = RES2;				//  Save result 2
    	u16 _RES1 = RES1;				//  Save result 1
    	u16 _RES0 = RES0;				//  Save result 0
    	u16 _MPY32H = MPY32H;			//  Save operand 1, high word
    	u16 _MPY32L = MPY32L;			//  Save operand 1, low word
    	u16 _OP2H = OP2H;				//  Save operand 2, high word
    	u16 _OP2L = OP2L;				//  Save operand 2, low word
    
    	...
    
    	// 
    	
    	// pop all multiplier module registers
    	OP2L = _OP2L; 			// Restore operand 2, low word
    	OP2H = _OP2H; 			// Restore operand 2, high word
    	// Starts dummy multiplication but
    	// result is overwritten by
    	// following restore operations:
    	MPY32L = _MPY32L;		// Restore operand 1, low word
    	MPY32H = _MPY32H;		// Restore operand 1, high word
    	RES0 = _RES0;			// Restore result 0
    	RES1 = _RES1;			// Restore result 1
    	RES2 = _RES2;			// Restore result 2
    	RES3 = _RES3;			// Restore result 3
    	MPY32CTL0 = _MPY32CTL0;	// Restore multiplier mode, etc.
    
    }
    

    Thanks.

  • First, I would like to make a correction: I meant __xxx_on_exit(), as in __bic_SR_register_on_exit(). They are only used if you use them explicitly in your interrupt routine.

    If your code works, then I suggest that you don't change it. However, I would have preferred a solution where the square-root routine is interrupt-safe. In the end, it boils down to how fast normal operation should be and how responsive the system should be with respect to interrupts. Saving and restoring the hwmul registers is expensive if the interrupt occurs often. On the other hand, disabling interrupts over the entire square-root routine might cause the system to stop responding to interrupts for too long. Rewriting it so that it doesn't rely on the hardware multiplier outside the inner loop (where interrupts are disabled) might make it too slow...

        -- Anders Lindgren, IAR Systems

  • The code implies that the MPY was operated in 32-bit unsigned multiply mode. By writing the backed-up values to MPY32L/H, you implicitly put the multiplier into unsigned 32-bit multiplication mode, even if it was operating in signed mode, MAC mode or 16-bit mode before.
    There is no workaround: your ISR cannot know which of the MPY or MAC registers the application had written to when it was interrupted. It may even have written only one word, so that you interrupted a 32-bit write.

  • Anders, in the code I was referring to, the only two registers which are written outside the critical section (interrupts disabled) are saved and restored in the ISR, and that does no harm. All critical stuff is done while interrupts are disabled.
    So the code, as far as posted, does not seem to have a problem.

    However, there might be other (not posted) ISRs which do not save and restore them properly.
    Regarding interrupt safety, I agree. The function can easily be rearranged so that all critical parts are inside the critical block and an ISR doesn't have to take care of anything. Since these are all simple, straightforward and independent operations, there is no need to keep interrupts disabled for more than one inner loop iteration at a time. Sure, this can still be a problem for ISRs with very tight latency requirements, but usually it isn't. Also, saving the MPY registers would add way more 'latency' to the ISR :)
    Of course I wouldn't block interrupts for the whole SQRT calculation. But that's not needed anyway.
