How does DSPLIB FFT function prevent overflow?

Cheng Wang_sh

Prodigy 20 points

Dear Sir/Madam,

I am studying the fixed point FFT algorithm and have installed the package "c64plus-dsplib_2_02_00_00".

There is a function named DSP_fft16x16_cn.c, which is writen in natural C and locates at .\src\DSP_fft16x16.

My question is How does DSP_fft16x16_cn.c prevent overflow?

I browsed the doc on C2x, C3x, C5x and knew that the overflow in FFT can be averted by scaling 2 at each stage.

In TI's "sprueb8b.pdf", it is clearly stated that, for most FFT function, "the input data must be scaled by 2(log2(nx)) because no scaling
is done by the functions".

In summary, my questions are:

1) "How does DSP_fft16x16_cn.c prevent overflow? ", I could not find the scaling operation in this natural C program, but the program works very well and no overflow was observed (I use large numbers as the input), and the precision is very good (compared with MATLAB FFT float point function). I don't know how this program achieve good precision and prevent overflow at the same time.

2) If I call intrinsic function "DSP_fft16x16.c", how can I improve the precision? In order to evaluate the precision, I wrote a fixed point FFT algorithm in natural C , and found that precision is affected significantly by scaling by two the stage output data (compared with MATLAB FFT float point function). I think, if the input is scaled by 2(log2(nx)) before the first stage according to "sprueb8b.pdf", , the precision will be further reduced.

3) In TI's "sprueb8b.pdf", it is clearly stated that "All stages are radix-4 except the last one, which can be radix-2 or radix-4,depending on the size of the FFT. All stages except the last one scale by two the stage output data."(page 4-6) . I wonder if it is enough to "scale by two the stage output data" when radix-4 is adopted. Some literature say, the maximum bit-growth of each stage for radix-4 is 3.

With my warmest regards,

Cheng Wang

over 14 years ago

0 Rahul Prabhu over 14 years ago

TI__Guru** 116170 points

Cheng Wang,

It may be a good excercise for you to take a look at the serial assembly code for the function as it has more information regarding the implementation details and does compare the serial assembly code to it natural C version. To answer some of your questions about scaling take a look at the following lines of code in the function

x2[l1 ] = (-co20 * xt0_0 - si20 * yt0_0 + 0x8000) >> 16;

x2[l1+1] = (-co20 * yt0_0 + si20 * xt0_0 + 0x8000) >> 16;

x2[l1+2] = (-co21 * xt0_1 - si21 * yt0_1 + 0x8000) >> 16;

x2[l1+3] = (-co21 * yt0_1 + si21 * xt0_1 + 0x8000) >> 16;

x2[h2 ] = (co10 * xt1_0 + si10 * yt1_0 + 0x8000) >> 16;

x2[h2+1] = (co10 * yt1_0 - si10 * xt1_0 + 0x8000) >> 16;

x2[h2+2] = (co11 * xt1_1 + si11 * yt1_1 + 0x8000) >> 16;

x2[h2+3] = (co11 * yt1_1 - si11 * xt1_1 + 0x8000) >> 16;

x2[l2 ] = (co30 * xt2_0 + si30 * yt2_0 + 0x8000) >> 16;

x2[l2+1] = (co30 * yt2_0 - si30 * xt2_0 + 0x8000) >> 16;

x2[l2+2] = (co31 * xt2_1 + si31 * yt2_1 + 0x8000) >> 16;

x2[l2+3] = (co31 * yt2_1 - si31 * xt2_1 + 0x8000) >> 16;

Notice in this code the two middle samples are swapped and stored to produces results in digit reversed order. Also in addition to the rounding, the shift is performed by 16, as opposed to 15, to give scaling by 2 that is required in that stage.

Regards,

Rahul

0 Cheng Wang_sh over 14 years ago in reply to Rahul Prabhu

Prodigy 20 points

Dear Rahul,

Thank you for your helpful information.

I want to confirm whether it is enough for most cases to "scale by two the stage output data" when radix-4 is adopted, as some literature say, the maximum bit-growth of each stage for radix-4 is 3.

According to TI's "sprueb8b" (page 4-6), All stages except the last one scale by two the stage output data( I guess in both "DSP_fft16x16.c" and "DSP_fft16x16_cn.c"). If this is the case, for a given N points, the precision of radix-4 FFT is better than that of radix-2 FFT.

For example, when mixed radix FFT is adopted, for 2048 FFT (N=2048=2^11=2*4^5), all stages except the last one is radix-4. That is, there are totally 5 scaling operations and the scaling factors are 2. The last stage is radix-2 operation.

If pure radix-2 is adopted, N=2048=2^11, in this case, there are totally 10 scaling operations and the scaling factors are also 2.

Thus, the radix-2 has a worse precision than radix-4. Is my conclusion right?

When pure radix-2 is adopted, the overflow can be observed if 5 scaling operation is done altenately.

Thank you for you patient.

Best wishes,

Cheng

Processors

Processors forum

How does DSPLIB FFT function prevent overflow?