Fixed point math taking too long

Greetings,

Currently I am experimenting with an MSP430F2619 to emulate a PI AGC controller in a radio. My problem is that the MSP430F2619 is not fast enough, so I moved to Q19 fixed point math. (I am aware that the MSP430F2619 does not have a 32 bit hardware multiplier.)

By stepping through the assembly code in CCS5 I counted approximately 120 cycles for a multiply and sum operation on two long Q19 fixed point words. The MSP430 is running with a 16 MHz clock. Assuming the processor completes one step every clock cycle, the total calculation time should be 120 / 16 MHz = 7.5 µs. However, by toggling an I/O pin with an LED connected, I measured a total of 37.1 µs, which is roughly 600 cycles and much more than the 120 cycles I counted. (I did this measurement in the Timer A interrupt routine.)

For multiplying two floating point values and summing, the total elapsed time is 82 µs.

I extracted a small snippet from the code. Am I doing something wrong here, or is this simply the limit of the MSP430F2619?

// Q format

signed long long Multiply_result_q;

static signed long Error_Voltage_q;

const signed long Ts_q= 194;

static signed long Ie_q;

P4OUT &= ~0x04;// LED OFF

Multiply_result_q = (long)(((long long)(long)Error_Voltage_q * (long long)(long)Ts_q)>>q_divide_pow2__19); // Error_Voltage = (Ref_Voltage - RSSI_Voltage)*Ts;

Ie_q = Ie_q + (long)Multiply_result_q; // Integrate Error

P4OUT ^= 0x04; // LED ON

 

//Floating point

static float Ie;

static float Error_Voltage;

static const float Ts = 1.0/2300.0;

P4OUT &= ~0x04;// LED OFF

Ie = Ie + Error_Voltage*Ts;

P4OUT ^= 0x04;// LED ON

Happy new year! :)

  • The 2619 only has a 16 bit hardware multiplier. Since you are working on 64 bit long long values, the hardware multiplier is of no use. (Note that the C language requires the compiler to extend the values to 64 bit first and then do a 64x64 operation, discarding the upper 64 bits of the result, instead of doing a 32x32 bit multiplication with a 64 bit result.)

    So the multiplication is done completely in software, and it is not really fast.

    On MSPs with 32bit HWM, the situation is slightly better, but I wrote my own inline code for multiplications where the 32x32->64 or 16x16->32 functionality of the hardware multiplier is used, speeding up calculations significantly.

    Unfortunately, multiplication is one of the weak spots of the C language, as soon as you work with data that is larger than the 'natural size' of the processor. Well, on most processors you won't see a difference. The MSP is one of very few where the result of a multiplication has more (valid) bits than its operands had. On most, the C approach of pre-expansion of the datasize matches the requirements of the processor.
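The idea in the reply above, building the wide product from 16x16->32 multiplies instead of a generic 64x64 routine, can be sketched in portable C. This is only an illustration and not the inline code the poster mentions: the names `mul16x16`, `umul32x32`, `q19_mul`, and the macro `Q19_SHIFT` are made up for this example, and whether a given compiler maps `mul16x16` onto the hardware multiplier is an assumption that should be checked in the disassembly.

```c
#include <stdint.h>

#define Q19_SHIFT 19  /* assumed Q19 format: 19 fractional bits */

/* 16x16 -> 32 unsigned product. The hope (an assumption, not a guarantee)
   is that the compiler implements this with the 16 bit hardware multiplier. */
static uint32_t mul16x16(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* 32x32 -> 64 unsigned product assembled from four 16x16 partial products. */
static uint64_t umul32x32(uint32_t a, uint32_t b)
{
    uint16_t ah = (uint16_t)(a >> 16), al = (uint16_t)a;
    uint16_t bh = (uint16_t)(b >> 16), bl = (uint16_t)b;

    uint64_t lo  = mul16x16(al, bl);
    uint64_t mid = (uint64_t)mul16x16(ah, bl) + mul16x16(al, bh);
    uint64_t hi  = mul16x16(ah, bh);

    return lo + (mid << 16) + (hi << 32);
}

/* Signed Q19 multiply: works on magnitudes, shifts, then restores the sign.
   Assumes the shifted result fits in 32 bits. */
static int32_t q19_mul(int32_t a, int32_t b)
{
    int negative = (a < 0) != (b < 0);
    uint32_t ua = (a < 0) ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = (b < 0) ? 0u - (uint32_t)b : (uint32_t)b;
    uint32_t r  = (uint32_t)(umul32x32(ua, ub) >> Q19_SHIFT);

    return negative ? -(int32_t)r : (int32_t)r;
}
```

With this helper the integrator step from the question would read `Ie_q += q19_mul(Error_Voltage_q, Ts_q);`. Note that this version truncates toward zero, whereas `>>` on a negative `long long` is usually an arithmetic shift that rounds toward minus infinity, so negative results that are not exact can differ by one LSB.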

  • Thanks for your reply. I found the information very useful.

    Could you please explain why, when I step through the disassembly, I count 120 steps, which is about 120 * 1/fosc = 7.5 µs, but I measure 37 µs for a multiply and accumulate of two 32 bit words? Are there hidden steps in the actual assembly code, or does the processor take 5 clock cycles to execute 1 step?

  • The MSP430x2xx Family MCUs take 1 to 6 MCLK cycles per instruction.

    Tables 3-14 through 3-16 in the Family Users Guide (Pages 60 and 61) specify how many clock cycles are required for each particular combination of instruction, source and destination.
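As a rough cross-check of the numbers in this thread, the measured time can be converted back to MCLK cycles and divided by the number of single-stepped instructions. The helper names below are hypothetical, and the inputs (120 steps, 16 MHz, 37.1 µs) are the figures from the question. This is plain arithmetic, not a cycle-accurate model.

```c
/* Convert a measured duration in microseconds to MCLK cycles.
   mclk_mhz is the CPU clock in MHz, so us * MHz gives cycles directly. */
static double us_to_cycles(double t_us, double mclk_mhz)
{
    return t_us * mclk_mhz;
}

/* Average MCLK cycles per single-stepped instruction. */
static double avg_cycles_per_step(double t_us, double mclk_mhz, unsigned steps)
{
    return us_to_cycles(t_us, mclk_mhz) / (double)steps;
}
```

With the figures from the question, 37.1 µs at 16 MHz is about 594 cycles, or roughly 4.9 cycles per counted step, which falls inside the 1 to 6 cycle range quoted above. An average that high may also mean that some single steps, such as a call into a run-time multiply routine, hide more than one instruction.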

  • One last quick question: if I move the multiplication to RAM, do you think I would see a small performance increase?

  • On an FR5xx device running at more than 8 MHz, it would indeed make a difference, since the CPU gets wait states when reading FRAM faster than 8 MHz. However, this only affects instruction reads from FRAM, not accesses to the hardware registers, and much of it is smoothed out by a hardware cache.

    On F devices with flash it doesn't make a (speed) difference whether code runs from flash or RAM. Running from RAM takes a little less power and can continue while the flash controller is busy writing or erasing flash, but code execution speed is the same.

  • Thanks again for the answer! In the code the multiplication function executes 3 times every 400 µs, so it sounds like a good idea to move the multiplication function to RAM, since the radio runs off a battery.
