TMS320C6748: C64x DSP Library is useless!

Pavel Mychko

Part Number: TMS320C6748

I need to use a lot of math functions on my DSP in fixed point, so I decided to use the C64x DSP Library to get high computational performance. And what did I get? A set of useless functions!

For example, let's have a look at the DSP_dotprod function. It just computes the sum of products of two 16-bit vectors (something like z = x[0]*y[0] + x[1]*y[1] + ... etc.). But it has only a 32-bit accumulator! It's easy to calculate that a 32-bit accumulator might overflow for input vectors with only 5 elements! It seems ridiculous to use a 32-bit sum register in that case! It must be at least 40-bit (long long). Real vectors usually have a length of 160...320 elements.
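A minimal sketch of that arithmetic in portable C, assuming only that the largest possible product of two 16-bit values is (-32768) * (-32768) = 2^30. The helper name `acc_bits_needed` is made up for illustration and is not part of DSPLIB.

```c
#include <stdint.h>

/* Hypothetical helper, not a DSPLIB function.
   Returns the smallest signed accumulator width (in bits) that can hold
   the worst-case sum of n products of signed 16-bit values, taking one
   product to be at most 2^30 in magnitude. Intended for n >= 1. */
int acc_bits_needed(int n)
{
    int64_t worst = (int64_t)n << 30;   /* n * 2^30 */
    int bits = 1;                       /* start with the sign bit */

    while (worst > 0) {                 /* count the magnitude bits */
        bits++;
        worst >>= 1;
    }
    return bits;
}
```

Under this worst-case assumption, `acc_bits_needed(2)` is already 33, and lengths of 160 and 320 need 39 and 40 bits respectively, which agrees with the 40-bit figure above.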

Could anybody comment on this situation?

over 5 years ago

0 Cvetolin Shulev-XID over 5 years ago

TI__Guru 65405 points

The team is notified. They will post their feedback directly here.

BR
Tsvetolin Shulev

0 Rahul Prabhu over 5 years ago

TI__Guru 114410 points

Pavel,

Can you indicate the version of the library that you are using? The version that I have installed, 3.4.0.0, also seems to have the issue you highlighted. The C64x+ DSP functions were developed more than 10 years ago, so we can't comment on what the developer's thought process may have been when implementing the kernel. My best guess is that the objective of that kernel was to demonstrate how the DSP can perform multiple loads in a single cycle using mem8 and then utilize _dotp2 to perform the multiply and sum in a single cycle. Since dotp2 doesn't return a long, they kept the accumulator as an int. ldotp was not in the initial instruction sets:
www.ti.com/.../spru732j.pdf

The intent of DSPLIB is to provide a free library of kernels written in a way that provides high performance by utilizing the DSP architecture. The beauty of the library is that it is provided in complete source, so users who find an implementation unsuitable for their application can tweak the source and still use it. Can you check the CPU instruction set and see whether _ldotp2, which was added later by the compiler, can be used for your use case?

www.ti.com/.../spru198k.pdf

Regards,
Rahul

0 Pavel Mychko over 5 years ago in reply to Rahul Prabhu

Intellectual 360 points

My library's version is the same as yours (3.4.0.0).

Of course, this operation and others like it shouldn't be sized for the result of a single multiplication. Since these functions accumulate the results of many multiplications, the storage register (accumulator) can be wider than a single operation's result. If the processor supports wide registers, they should be used. So I don't see a problem with converting these functions to long long results.
Secondly, my DSP is not a native C64x, it's a C674x. As far as I know, it fully inherits all integer instructions from the C64x family. I haven't checked whether it has any new integer instructions compared to the C64x family. If it does, it would be a good idea to make an integer DSP library specifically for it. If not, I hope you will kindly issue a new version of the DSP library. I'm a C developer and, to be honest, I hope to avoid writing code in assembler.

0 Rahul Prabhu over 5 years ago in reply to Pavel Mychko

TI__Guru 114410 points

DSPLIB for these devices is currently in maintenance mode, so we don't anticipate any new releases for a while. Having said that, C6000 compiler optimization has improved quite a bit over the years, and you should be able to get good optimization out of the box with simple loops like a dot product, as long as you don't add control code that disqualifies software pipelining.

If you are not satisfied with the performance that you are getting then please let us know and we can help provide further guidance or provide you an optimized C function.
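As a rough illustration of the kind of simple loop meant here, the following is a plain C sketch of a dot product with a 64-bit accumulator. It is not DSPLIB source, the name `dotprod64` is hypothetical, and no claim is made about the cycle count the compiler will actually achieve.

```c
#include <stdint.h>

/* Hypothetical plain-C kernel, not part of DSPLIB.
   Dot product of two signed 16-bit vectors with a 64-bit accumulator.
   The loop body has no branches or function calls, which is the kind
   of shape a software-pipelining compiler generally needs. */
int64_t dotprod64(const int16_t *x, const int16_t *y, int n)
{
    int64_t sum = 0;
    int i;

    for (i = 0; i < n; i++) {
        /* A single 16x16 product always fits in 32 bits. */
        sum += (int32_t)x[i] * (int32_t)y[i];
    }
    return sum;
}
```

With three full-scale inputs of -32768 in both vectors, this returns 3221225472, a value that no longer fits in a signed 32-bit int.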

Regards,
Rahul

0 RandyP over 5 years ago

TI__Guru 84110 points

Pavel,

I assume you are using the C64x+ DSPLIB library and not the older C64x DSPLIB. When we moved from C641x to C642x and above, the processor core was enhanced to the C64x+, which makes important architectural and instruction set improvements. The core name is sometimes labelled as C64xp or C64xplus, but the official name is C64x+. You can use the older library, but you will not get any of the improvements, since its functions do not use the new features.

The C64x+ DSPLIB is definitely 'useless' as an integer math library, for the reasons you listed. But it is awesome as a fixed point math library. You need to understand the difference between integer math and fixed point math in order to use the C64x+ DSPLIB effectively. We used to have training courses archived under "c6000 embedded design workshop" or "c6000 dsp integration workshop", but I do not know what is still available there. Your searches are as effective as anything I could refer you to, and I assume you do not plan to do that from the tenor of your post.

For the C developer who is not interested in DSP architecture but is interested in superior math performance, TI moved beyond the C64x+ fixed-point processors and integrated that superior fixed point engine with the superior C67x+ floating point instruction set from our C672x devices, adding the two together to form the C674x core in your C6748. With this combined core, you get the greater ease of programming with floating point in which you can use integers or fractions or any similar notation natively. In general, floating point operations take twice as many cycles as fixed point, but of course with the convenience of using floating point data.

Good luck,
Regards,
RandyP

0 Pavel Mychko over 5 years ago in reply to RandyP

Intellectual 360 points

Did you say the C64x+ DSPLIB is awesome as a fixed-point math library? I'm not sure it is suitable for fixed-point arithmetic at all.

Let's have a look at my sample code:

	short a = 0.7 * 32768;	// 22937 is 0.7 in Q1.15
	short b = 0.3 * 32768;	// 9830 is 0.3 in Q1.15
	short res;

	res = ((long)a * (long)b) / 32768;	// 6880, which means 0.2099 in Q1.15 - very well
	
	float fa = 0.7;
	float fb = 0.3;
	float fres;

	fres = fa * fb;			// 0.21 - very well

	short x[4] = {22937, 22937, 22937, 22937};
	short y[4] = {9830, 9830, 9830, 9830};
	int z, libz;

	z = 22937*9830 + 22937*9830 + 22937*9830 + 22937*9830;	// 901882840 in integer arithmetic, should be 27520 in Q1.15

	libz = DSP_dotprod(x, y, 4);	// 901882840 - it's not fixed-point arithmetic

The library that I used is C64Px ver. 3.4.0.0. So you can see that the math is exactly the same as in the C64x library.

Functions in a DSP library that work in fixed-point arithmetic should divide by the scaling factor, which doesn't happen in this library.

So I don't understand why you say it's suitable for fixed-point arithmetic.

0 RandyP over 5 years ago in reply to Pavel Mychko

TI__Guru 84110 points

Pavel,

The ease of use of floating point allows you to implement your company's product much more quickly with close performance levels, compared to fixed point. Many applications can be implemented with either, so it is up to you to choose the development path that works for you.

There are likely many products you come across in daily life and technical life that use code running DSPLIB. I am very happy and confident to say that many successful companies have used TI DSPLIB to create leadership products with fixed point math.

As you point out, shifting is critical in fixed point math. Shifting is what creates the difference between integer and fixed-point numbers. Your two examples generating z and libz generate the exact same result, which is what you would want. Both generate the result of 0.839944 when read as Q2.30, which is the natural result from multiplying two Q1.15 numbers. In 16 bits, you also have 27523 in integer format, or 0.839935 in Q1.15 format.
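The two readings of the same 32-bit value can be sketched in portable C as follows. The helper names `q30_to_double` and `q30_to_q15` are hypothetical and are not DSPLIB functions; the sketch assumes an arithmetic right shift for negative values, which is what common compilers provide but is implementation-defined in C.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of DSPLIB. */

/* Interpret a 32-bit sum of Q1.15 x Q1.15 products as a Q2.30 number. */
double q30_to_double(int32_t acc)
{
    return (double)acc / 1073741824.0;      /* divide by 2^30 */
}

/* Rescale Q2.30 to Q1.15 by dropping 15 fractional bits (truncation),
   saturating if the value does not fit in 16 bits.
   Assumes >> on a negative int32_t is an arithmetic shift. */
int16_t q30_to_q15(int32_t acc)
{
    int32_t v = acc >> 15;

    if (v > INT16_MAX) v = INT16_MAX;
    if (v < INT16_MIN) v = INT16_MIN;
    return (int16_t)v;
}
```

Applied to 901882840, `q30_to_q15` gives 27523 and `q30_to_double` gives roughly 0.839944. Truncating each product to Q1.15 before summing, as in the `res` line of the sample code above, would instead give 4 * 6880 = 27520.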

Some functions like fir do the scaling for you. Please refer to the DSPLIB documentation to be sure.

I will sign off from this thread and let others take over who are better at explaining things.

I do wish you success.

Regards,
RandyP

0 Pavel Mychko over 5 years ago in reply to RandyP

Intellectual 360 points

Thank you for explaining.
Regarding floating point: as you said, performance drops by half in that case. I have a project that loads the processor up to 99% with fixed point, so there is no way to use floating point instead.
