This thread has been locked.

TMS320F280049C: Calculation Time of Arithmetic in the CLA

Part Number: TMS320F280049C

Hi experts,

We would like to reduce the calculation time of arithmetic in the CLA, and from the data sheet we found that the floating-point data type is faster than integer.

But we found something interesting, shown below (we used an oscilloscope to measure the elapsed time):

It is reasonable that scenario 2 is faster than scenario 1, and that scenario 4 is faster than scenario 3. But why does scenario 3 take so much more time than scenario 1?

If this were caused by using different variables, scenarios 4 and 5 should also take much more time than scenario 2.

Kindly help clarify our doubts.

 

Scenario 1: The data types of a and b are unsigned integers; it takes 1.07 µs.

uint16_t a, b; // a is a random variable

GpioDataRegs.GPBSET.bit.GPIO34 = 1; // pull high
b = a * (a + 1) - (a * a - 1);
b += a * (a + 2) - (a * a - 2);
b += a * (a + 3) - (a * a - 3);
b += a * (a + 4) - (a * a - 4);
b += a * (a + 5) - (a * a - 5);
b += a * (a + 6) - (a * a - 6);

GpioDataRegs.GPBCLEAR.bit.GPIO34 = 1; // pull low

Scenario 2: The data types of a and b are floats; it takes 0.42 µs.

float a, b; // a is a random variable

GpioDataRegs.GPBSET.bit.GPIO34 = 1; // pull high
b = a * (a + (float)1) - (a * a - (float)1);
b += a * (a + (float)2) - (a * a - (float)2);
b += a * (a + (float)3) - (a * a - (float)3);
b += a * (a + (float)4) - (a * a - (float)4);
b += a * (a + (float)5) - (a * a - (float)5);
b += a * (a + (float)6) - (a * a - (float)6);

GpioDataRegs.GPBCLEAR.bit.GPIO34 = 1; // pull low

Scenario 3: The data types of (a, c, d, e, f, g) and b are unsigned integers; it takes 34 µs.

uint16_t a, b, c, d, e, f, g; // a, c, d, e, f, g are random variables

GpioDataRegs.GPBSET.bit.GPIO34 = 1; // pull high
b = a * (g + 1) - (e * f - 1);
b += g * (a + 2) - (f * e - 2);
b += c * (d + 3) - (g * d - 3);
b += d * (e + 4) - (c * c - 4);
b += e * (f + 5) - (a * g - 5);
b += f * (a + 6) - (d * a - 6);

GpioDataRegs.GPBCLEAR.bit.GPIO34 = 1; // pull low

Scenario 4: The data types of (a, c, d, e, f, g) and b are floats; it takes 0.6 µs.

float a, b, c, d, e, f, g; // a, c, d, e, f, g are random variables

GpioDataRegs.GPBSET.bit.GPIO34 = 1; // pull high
b = a * (g + (float)1) - (e * f - (float)1);
b += g * (a + (float)2) - (f * e - (float)2);
b += c * (d + (float)3) - (g * d - (float)3);
b += d * (e + (float)4) - (c * c - (float)4);
b += e * (f + (float)5) - (a * g - (float)5);
b += f * (a + (float)6) - (d * a - (float)6);

GpioDataRegs.GPBCLEAR.bit.GPIO34 = 1; // pull low

Scenario 5: The data types of (a, c, d, e, f, g) are unsigned integers and b is a float; it takes 0.86 µs.

uint16_t a, c, d, e, f, g; // a, c, d, e, f, g are random variables

float b;

GpioDataRegs.GPBSET.bit.GPIO34 = 1; // pull high
b = (float)a * ((float)g + (float)1) - ((float)e * (float)f - (float)1);
b += (float)g * ((float)a + (float)2) - ((float)f * (float)e - (float)2);
b += (float)c * ((float)d + (float)3) - ((float)g * (float)d - (float)3);
b += (float)d * ((float)e + (float)4) - ((float)c * (float)c - (float)4);
b += (float)e * ((float)f + (float)5) - ((float)a * (float)g - (float)5);
b += (float)f * ((float)a + (float)6) - ((float)d * (float)a - (float)6);

GpioDataRegs.GPBCLEAR.bit.GPIO34 = 1; // pull low

Best Regards,

C.C, Liu

  • Hi,

    Have you looked at the generated assembly code? You can view that by enabling the --keep_asm compiler flag in the CCS project.

    Regards,

    Veena

  • Hi,

    Yes, we have checked the assembly code, but we are not familiar with the mechanism of this compiler, so we want to ask the experts directly.

    For example, why does scenario 3 use a lot of MNOP instructions while the others do not (a screenshot of a fragment is below)?

    Hope TI experts can clarify our concerns.

    By the way, is there documentation for each assembly instruction?

    BR,

    C.C. Liu

  • Hi,

    Yes, the CLA chapter in the device Technical Reference Manual has the details of all the CLA instructions.

    I will forward your query to the compiler experts.

    Regards,

    Veena

  • why scenario 3 uses a lot of MNOP

    The CPU pipeline of the CLA is not protected. When an instruction is issued, its result is sometimes not available for several cycles. The compiler attempts to fill those cycles with other, independent instructions. When no other instructions are available, those cycles get filled with MNOP instructions. For further details, please see this forum thread.
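    As a rough illustration (hand-written pseudo-assembly, not actual compiler output, and the delay-slot count is made up):

    ```
    ; the result of a multiply is not ready for several cycles
    MMPYF32  MR2, MR0, MR1   ; MR2 = MR0 * MR1
    MNOP                     ; no independent instruction available,
    MNOP                     ; so the compiler pads the delay slots
    MADDF32  MR3, MR2, MR1   ; first instruction that may read MR2
    ```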

    Thanks and regards,

    -George

  • Hi George,

    Thanks for your reply.

    I have an extended question. The floating-point scenarios (4 and 5) are almost 50 times faster than the unsigned-integer scenario 3. Is this because the CLA is a fully programmable, independent 32-bit floating-point hardware accelerator (and is that why there are not a lot of MNOP instructions)?

    BR,

    C.C. Liu

  • Hi Chen,

    Thanks for your question. I will route this thread back to Veena so that they can help you with this CLA question.

    Regards,

    Peter

  • Hi Chen,

    That is correct; the CLA instruction set is optimized for 32-bit floating point.

    Without knowing the generated assembly, one example I can think of that can make a difference is adding a 16-bit constant. For floating-point addition there is MADDF32 MRa, #16FHi, MRb, which takes the immediate operand as part of the instruction. For unsigned int, the only instruction supported is MADD32 MRa, MRb, MRc; there is no form that takes an immediate operand. This implies that for the code snippet you have, with integer types the constants need to be loaded into a register first, which is not needed for floating point.
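    A hand-written sketch of that difference (illustrative only, not generated output; the register moves are shown schematically):

    ```
    ; float: the constant is encoded directly in the instruction
    MADDF32  MR1, #2.0, MR0    ; MR1 = 2.0 + MR0, one instruction

    ; unsigned int: the constant must be loaded into a register first
    MMOVIZ   MR2, #0           ; clear upper 16 bits of MR2
    MMOVXI   MR2, #2           ; load immediate 2 into lower 16 bits
    MADD32   MR1, MR0, MR2     ; MR1 = MR0 + MR2
    ```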

    Thanks,

    Ashwini