Bug in the c64x+ CCS 5.1 simulator?

fabian.buettner

Hi,

I have implemented a function that computes the difference of two multi-precision integers using C64x+ intrinsics. Unfortunately, I get different results on my device (BeagleBoard) than in the CCS 5.1 simulator.

This code works correctly in the simulator:

uint32_t a[7], b[7], c[7], i;
int8_t borrow;
uint40_t result;

a[0] = 0xc8266031; a[1] = 0xc603009c; a[2] = 0x0d7eb877; a[3] = 0x0caed702; a[4] = 0xcf9d18d3; a[5] = 0x14b1ba1a; a[6] = 0x1;
b[0] = 0xffffffff; b[1] = 0xffffffff; b[2] = 0xfffffffe; b[3] = 0xffffffff; b[4] = 0xffffffff; b[5] = 0xffffffff; b[6] = 0x0;

borrow = 0;
for(i = 0; i < 6; i++)
{
    result = (uint40_t)a[i] - (uint40_t)b[i] - borrow;
    c[i] = _loll(result);
    borrow = _hill(result);
}

This code works correctly on the device (note the changed sign in front of the borrow variable in the line that computes the value of "result"):

uint32_t a[7], b[7], c[7], i;
int8_t borrow;
uint40_t result;

a[0] = 0xc8266031; a[1] = 0xc603009c; a[2] = 0x0d7eb877; a[3] = 0x0caed702; a[4] = 0xcf9d18d3; a[5] = 0x14b1ba1a; a[6] = 0x1;
b[0] = 0xffffffff; b[1] = 0xffffffff; b[2] = 0xfffffffe; b[3] = 0xffffffff; b[4] = 0xffffffff; b[5] = 0xffffffff; b[6] = 0x0;

borrow = 0;
for(i = 0; i < 6; i++)
{
    result = (uint40_t)a[i] - (uint40_t)b[i] + borrow;
    c[i] = _loll(result);
    borrow = _hill(result);
}

regards,

fabian

over 14 years ago

RandyP over 14 years ago

Fabian,

You may get more assistance here if you supply a little more information, such as:

Which full CCS version are you running, 5.1.xx.yyyyy?
Which version of the simulator are you using?
Which PC OS are you running this on, Windows (which?) or Linux (which?)?
What results did you expect?
What results did you get?

Why do you want to use uint40_t? This is not an efficient data type, but is maintained for compatibility with older C6000 architectures. You may get more efficient results with uint64_t instead.
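For comparison, here is a minimal portable C sketch of the same limb-by-limb subtraction using uint64_t, as suggested above. The function name mp_sub_64 is hypothetical, and the sketch has not been benchmarked on C64x+. It also shows that the sign of the borrow depends on how it is extracted: taking bit 63 of the 64-bit difference yields 0 or 1, which must then be subtracted from the next limb.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: c = a - b over n 32-bit limbs, least significant first.
 * The borrow is kept as 0 or 1, so it is SUBTRACTED from the next limb.
 * Returns the borrow out of the most significant limb. */
static uint32_t mp_sub_64(uint32_t *c, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        /* Wraps modulo 2^64 when a[i] < b[i] + borrow. */
        uint64_t result = (uint64_t)a[i] - (uint64_t)b[i] - borrow;
        c[i] = (uint32_t)result;              /* low word, comparable to _loll */
        borrow = (uint32_t)(result >> 63);    /* top bit is set only if the limb wrapped */
    }
    return borrow;
}
```

Called with the two seven-element arrays from the original post and n = 7, it returns 0 and leaves c[0] = 0xc8266032, c[2] = 0x0d7eb878 and c[6] = 0.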

Regards,
RandyP

fabian.buettner over 14 years ago in reply to RandyP

Hey Randy,

I am sorry for the lack of information. I am using CCS version 5.1.0.09000.

I am using the C64x+ CPU Cycle Accurate Simulator, Little Endian.

My OS is x86_64 Gentoo Linux.

I expected the version with "+ borrow" to be the correct one, since the borrow can be -1.

In the simulator, I had to use "- borrow" to get my expected result of the multi-precision subtraction.
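The following is a minimal portable sketch of the "+ borrow" version, assuming the C6000 uint40_t can be emulated by a uint64_t masked to 40 bits (MASK40 and mp_sub_40 are hypothetical names, not TI APIs). When a limb difference wraps, bits 32..39 of the 40-bit result are all ones (0xFF), which an int8_t holds as -1 on a two's-complement compiler, so the borrow has to be added to the next limb rather than subtracted.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumption: emulate the 40-bit unsigned type by masking a 64-bit value. */
#define MASK40 ((UINT64_C(1) << 40) - 1)

/* Hypothetical helper: c = a - b over n 32-bit limbs, least significant first.
 * The borrow follows the post's convention, 0 or -1, so it is ADDED.
 * Returns the borrow out of the most significant limb. */
static int8_t mp_sub_40(uint32_t *c, const uint32_t *a, const uint32_t *b, size_t n)
{
    int8_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        /* (uint40_t)a[i] - (uint40_t)b[i] + borrow, wrapping modulo 2^40 */
        uint64_t result = ((uint64_t)a[i] - (uint64_t)b[i]
                           + (uint64_t)(int64_t)borrow) & MASK40;
        c[i] = (uint32_t)result;                   /* low 32 bits, like _loll(result) */
        uint32_t high = (uint32_t)(result >> 32);  /* bits 32..39: 0x00 or 0xFF */
        borrow = (high == 0xFFu) ? -1 : 0;         /* explicit, avoids a narrowing conversion */
    }
    return borrow;
}
```

With the arrays from the original post and all seven limbs, this yields c = {0xc8266032, 0xc603009c, 0x0d7eb878, 0x0caed702, 0xcf9d18d3, 0x14b1ba1a, 0x0} with a final borrow of 0. With n = 6, matching the posted loops (i < 6), it returns -1: the borrow out of limb 5 is never applied to a[6], and c[6] is not written.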

However, I may have made a mistake in my test case.

So I am going to check this behavior again, since I have to rewrite this part of my code anyway: I have learned a lot of new things about writing efficient C64x+ code.

What do you mean by "This is not an efficient data type"? How can it be less efficient? Is it slower?

I read about uint40_t on this website: http://processors.wiki.ti.com/index.php/C6000_CGT_Optimization_Lab_-_1

I quote from Step 2, "Change type of sum from long long to int40_t":

"When the type of sum is long long (64-bits), the compiler must allocate a sequential register pair (like A5:A4) to contain it. This increases the number of registers the compiler must manage overall. In this case, it causes the compiler to make some poor register allocation decisions that lead to a loop carried dependency."

Since I don't need full 64-bit precision in my algorithm, I found it to be a great way to get the carry or borrow of a 32-bit addition or subtraction. Or is there a more efficient way to get the carry/borrow?
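One portable way to obtain the borrow without any wider intermediate type is to derive it from unsigned comparisons, sketched below in plain C (mp_sub_32 is a hypothetical name). This is a general C idiom, not a C64x+-specific recommendation; whether it is faster than the 40-bit version on this target would have to be checked in the generated assembly and by benchmarking.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: c = a - b over n 32-bit limbs using only 32-bit
 * arithmetic. The borrow (0 or 1) comes from unsigned comparisons.
 * Returns the borrow out of the most significant limb. */
static uint32_t mp_sub_32(uint32_t *c, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t d = a[i] - b[i];            /* wraps modulo 2^32 */
        uint32_t borrow_ab = (a[i] < b[i]);  /* borrow out of a[i] - b[i] */
        uint32_t borrow_in = (d < borrow);   /* only when d == 0 and borrow == 1 */
        c[i] = d - borrow;
        borrow = borrow_ab | borrow_in;      /* the two can never both be 1 */
    }
    return borrow;
}
```

The two partial borrows cannot both be 1, because borrow_in requires d == 0, that is a[i] == b[i], in which case borrow_ab is 0.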

regards and thanks for your time,

fabian

RandyP over 14 years ago in reply to fabian.buettner

Fabian,

It sounds like your question is on hold until you do more investigation. Please let us know about your progress.

Fabian Büttner said:

I expected the version with "+ borrow" to be the correct one, since the borrow can be -1.

In the simulator, I had to use "- borrow" to get my expected result of the multi-precision subtraction.

If you do need to come back with revised questions, please keep in mind that these two statements do not explain the expected results. I do not know exactly what debugging your code would involve, but knowing the specific expected result is likely a prerequisite for debugging something like this.

Fabian Büttner said:
Change type of sum from long long to int40_t

This is very interesting to me. I will consider this in the future; until now, I have always dismissed 40-bit operations as legacy, with no advantage over 64-bit operations. In fact, I expected 40-bit operations to carry some overhead, just as using a 16-bit data type can carry overhead compared to 32-bit data types in situations such as loop counters and indexes.

But I do not understand the explanation that you quoted, about the register pairs (like A5:A4). Both 40-bit and 64-bit values must be stored in register pairs, so this does not seem to be a clear advantage. The advantage that I see is that there are several more instructions that support the 40-bit type so the compiler has more choices when optimizing the code. My concern would be if the compiler is obligated to mask off the upper bits of the high register. The way to know is to build it and benchmark it and look at the assembly output. I will add that to my list, but will leave an official answer to someone more knowledgeable than I am.

Regards,
RandyP

Code Composer Studio forum
