DSPLIB bug for C6415T

transmitterdan

I am trying to debug an error in one of my programs. The error happens when I feed an input value greater than 0x3fff into the DSPLIB routine DSP_fir_sym(). I looked at the online examples found here:

http://focus.ti.com/general/docs/litabsmultiplefilelist.tsp?literatureNumber=spra884a

I installed this and looked at the example for DSP_fir_sym(). I found it interesting that the example uses a sine wave that only goes up to 15000, which is just less than 1/2 of full scale. Is there an inherent reason that the input data cannot go all the way to full scale with this function? If so, that would seem to be a serious limitation. The documentation for this function claims that it uses a 40-bit accumulator, so it seems unlikely that it could overflow.

Dan

over 16 years ago

RandyP over 16 years ago

TI__Guru* 84110 points

Depending on the length and range of the coefficients, overflow can easily occur even with a 40-bit accumulator. Quite likely, the input range is held down to make the example work with as many platforms as possible and maybe to work with the other DSPLIB routines. But these are just guesses. The function documentation does include the vague warning:

/*      Note that samples are added together before multiplication, and     */
/*      so overflow *may* result for large-scale values, despite the        */
/*      40-bit accumulation.                                                */

There will have to be a tradeoff between the coefficients and the sample values to avoid this, but it definitely generates errors when the samples are larger.

The advantage of the symmetric FIR is that the speed can be improved 2x by adding the corresponding samples and doing a single multiply. The disadvantage is that when that add occurs, it may be two samples near the top of the range which then doubles the range. Since the most efficient multiply will be 16x16, the result of the add will have to stay within the original range, so the samples will have to be scaled down by 2.
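To make the range doubling concrete, here is a minimal C sketch of a pre-add that keeps only 16 bits. The helper `add16_wrap` is a hypothetical name used for illustration and is not part of DSPLIB; the sketch only assumes that the pre-add wraps rather than saturates.

```c
#include <stdint.h>

/* Hypothetical helper (not a DSPLIB function): add two 16-bit samples and
   keep only the low 16 bits of the sum, modelling a pre-add that wraps
   instead of saturating. */
static int16_t add16_wrap(int16_t a, int16_t b)
{
    /* Add as unsigned values and reduce modulo 2^16. */
    uint16_t sum = (uint16_t)((uint16_t)a + (uint16_t)b);

    /* Reinterpret the 16-bit pattern as a signed two's-complement value. */
    int s = sum;
    if (s >= 0x8000)
        s -= 0x10000;
    return (int16_t)s;
}
```

In this model, 0x3fff + 0x3fff gives 0x7ffe and still fits, while 0x4000 + 0x4000 wraps to -32768. That matches the symptom of errors appearing once the inputs go above 0x3fff.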

If the performance gain is not worth the scaling sacrifice, then you will need to stay with the standard FIR implementation.
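The tradeoff can be sketched in plain C by computing one output of a symmetric filter twice, once with an exact pre-add and once with a pre-add wrapped to 16 bits. This is only an illustration under stated assumptions: `fir_sym_point` and `wrap16` are hypothetical helpers and not the DSPLIB API, the filter is assumed to have 2*nh+1 symmetric taps with only the first nh+1 stored in `h`, and a 64-bit accumulator stands in for the 40-bit one.

```c
#include <stdint.h>

/* Hypothetical helper: reduce a value to a signed 16-bit range by wrapping.
   Assumes int is at least 32 bits wide. */
static int wrap16(int v)
{
    unsigned u = (unsigned)v & 0xFFFFu;
    return (u >= 0x8000u) ? (int)u - 0x10000 : (int)u;
}

/* Hypothetical reference model, not the DSPLIB routine.
   Computes one output of a symmetric FIR with 2*nh+1 taps.
   x    : 2*nh+1 input samples
   h    : first nh+1 coefficients (h[nh] is the centre tap)
   wrap : nonzero to wrap each pre-add to 16 bits, zero to keep it exact */
static int64_t fir_sym_point(const int16_t *x, const int16_t *h,
                             int nh, int wrap)
{
    /* The centre tap has no mirror sample, so it is multiplied directly. */
    int64_t acc = (int64_t)h[nh] * x[nh];

    for (int k = 0; k < nh; k++) {
        /* Add the two samples that share coefficient h[k]. */
        int pre = x[k] + x[2 * nh - k];
        if (wrap)
            pre = wrap16(pre);
        acc += (int64_t)h[k] * pre;
    }
    return acc;
}
```

With nh = 1, h = {1000, 2000} and all three samples equal to 20000, the exact result is 80000000 but the wrapped version gives 14464000. Halving the samples to 10000 gives 40000000 in both cases, which is the scale-down-by-2 remedy described above.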

Randy Yates over 15 years ago in reply to RandyP

Expert 1940 points

So to clarify, a 16-bit accumulator is used for the add operation on the 64x (6418)? I believe the 54x and 55x had 17-bit multipliers for just this reason.

--Randy

RandyP over 15 years ago in reply to Randy Yates

TI__Guru* 84110 points

I am glad to see that the 2009 and earlier threads are showing up and getting some use. We must have done a fairly good job in the transition to the new E2E forum in December 2009 to retain everything and keep it searchable.

Randy Yates said:
a 16-bit accumulator is used for the add operation on the 64x (6418)

This is an overstatement of the situation. The C64x instruction set includes many different add and multiply instructions, with tradeoff choices you can make based on data size, internal result, external (output) result, and cycles per operation. You can get 8-bit, 16-bit, 32-bit, and 64-bit results with different instructions. There may be some 40-bit holdouts from the original C62xx instruction set.

In the case of the highly optimized DSPLIB DSP_fir_sym function, symmetric 16-bit input samples are added together, and then that result is multiplied by a 16-bit coefficient to generate a 16-bit output result.

The sum of two 16-bit input samples effectively makes a 17-bit input sample for the multiplication. The main key to optimizing this function is the use of the DOTP2 instruction, which multiplies pairs of 16-bit values, two of which are the summed input samples and two of which are coefficients. Since the 17-bit summed input sample is going into a 16-bit DOTP2 field, there is no need to retain the extra bit - the tradeoff has already been decided by using the powerful DOTP2 instruction and getting the performance gain of multiplying 2 input samples times 1 coefficient.

So since the 17th bit is going to be lost going into the DOTP2, the ADD2 instruction can be used to add 2 pairs of 16-bit input samples per instruction. And yes, these use a 16-bit accumulator.
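For readers without the hardware at hand, the behaviour described here can be approximated in portable C. The functions below are simplified models written for this illustration (`pack2`, `lane16`, `add2_model` and `dotp2_model` are hypothetical names, not TI intrinsics). They assume that the packed add works on two independent 16-bit lanes without saturation and that the dot product treats each lane as a signed 16-bit value. The dot product is returned as a 64-bit integer to sidestep C overflow rules, so the hardware result width is not modelled.

```c
#include <stdint.h>

/* Pack two signed 16-bit values into one 32-bit word (hi lane, lo lane). */
static uint32_t pack2(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Extract one lane of a packed word as a signed 16-bit value. */
static int lane16(uint32_t w, int hi)
{
    int s = (int)((hi ? (w >> 16) : w) & 0xFFFFu);
    return (s >= 0x8000) ? s - 0x10000 : s;
}

/* Model of a packed 16-bit add: each lane is added modulo 2^16, with no
   carry from the low lane into the high lane and no saturation. */
static uint32_t add2_model(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0x0000FFFFu;
    uint32_t hi = ((a >> 16) + (b >> 16)) << 16;  /* carry out is dropped */
    return hi | lo;
}

/* Model of a signed 16x16 dot product over the two lanes. */
static int64_t dotp2_model(uint32_t a, uint32_t b)
{
    return (int64_t)lane16(a, 1) * lane16(b, 1)
         + (int64_t)lane16(a, 0) * lane16(b, 0);
}
```

Adding pack2(0x4000, 15000) to itself leaves 30000 in the low lane but wraps the high lane to -32768. Feeding that sum into the dot product with coefficients (2, 3) gives 24464 instead of the exact 155536, so the lost 17th bit shows up as a large error rather than a small one.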

As I explained poorly in the earlier post, this is a tradeoff for performance over dynamic range. If you need the dynamic range, then you can use the standard FIR function and give up the performance benefit of the symmetric version. The performance gain looks like about 30% faster than the other DSP_fir_xyz functions.

But no, the blanket statement you made above is not true. You could say "a 16-bit accumulator is used for the ADD2 instruction on the C64x" or "a 16-bit accumulator is available on the C64x when higher performance is desired".
