Efficient FFT Computation of Real Input

Tariq Siddiqui

Other Parts Discussed in Thread: SPRC121

Hi !

I am trying to implement FFT algorithm for Real Input disucssed on wiki page 'http://processors.wiki.ti.com/index.php/Efficient_FFT_Computation_of_Real_Input'. I have downloaded the example C code and run it for DSK C6713 but I am getting the following errors.

"DSPF_sp_fftSPxSP.asm", ERROR! at line 217: [E0800] Instruction invalid for C6700.
Build Complete, 291 Errors, 0 Warnings, 19 Remarks.

Does it mean that I can't use this code with C6713 ? What necessary changes should I made to make this code run ?

I will appreciate your help.

BR,

Tariq

over 14 years ago

0 Tariq Siddiqui over 14 years ago

Expert 2270 points

Will some one reply to this post ?

0 DanRinkes over 14 years ago in reply to Tariq Siddiqui

TI__Expert 8055 points

Tariq;

It's not quite clear by the topic on the wiki (other than it references the 674x DSPLib, but I believe this content is specific to the 674x processors. The 6713 is not instruction level compatible with the 674x. Looking at the assembly code, the instruction in question is STDW. Which I assume writes a double word corresponding to 2 registers back to the stack in a single cycle. you can break these up into back to back STW words. (It will take an extra cycle for each. This is just one of the optimizations on the 674x architecture.

           STW     .D2T1   A10,*SP--(8)      ; |419|
           STDW    .D2T2   B13:B12,*SP--     ; |419|
           STDW    .D2T2   B11:B10,*SP--     ; |419|
           STDW    .D2T1   A15:A14,*SP--     ; |419|
           STDW    .D2T1   A13:A12,*SP--     ; |419|
           STW     .D2T2   B3,*SP--(48)      ; |419|

For example, I think you can replace the first STDW instruction with

STW .D2T1 B13, *SP--

STW .D2T1 B12, *SP--

I may be incorrect about the ordering of these.

However, I suspect that there will be more instructions that fail when you fix these. You just have to replace them with whatever functionally works on the 6713. This may already be done for you in the 6713 DSPLIB. You should check that out.

Be aware, it is important to understand what the assembly code is doing before making modifications.

Regards,

Dan

0 Tariq Siddiqui over 14 years ago in reply to DanRinkes

Expert 2270 points

Dan,

Do you mean using the same example code with 6713 DSPLIB ?

I don't see any DSPF_sp_fftSPxSP.asm file in dsp67x.lib ? Should I still try without .asm file ?

Waiting for your reply.

-Tariq

0 Joern over 14 years ago in reply to Tariq Siddiqui

Genius 3055 points

Tariq,

maybe you should describe more detailed which development tools you are using, especially which DSPLIB version (from where resp. for which target platform?) your problems concern to.

However, the DSPF_sp_fftSPxSP.asm should be an element of the sources of the DSPLIB, at least for c674x-dsplib_1_03_00_00 it is (but that was the wrong version for your platform).

As that is not clear from your question: of course an .asm file is no part of a .lib file, but it may be an element of the source files which can be built into a .lib binary. But hopefully I assume that this is clear.

In each case, feel free to ask, as concrete as possible,
best regards,
Joern.

0 Tariq Siddiqui over 14 years ago in reply to Joern

Expert 2270 points

Dear Joern,

Thanks for your reply.

I used the example code given at wiki side and .asm file was already included in the project. Being a naive to processor, I assumed that .asm file is a part of the project. When I donwloaded the TMS320C67x DSP library using the link given below, I didn't find .asm file and got confused.

http://focus.ti.com/docs/toolsw/folders/print/sprc121.html

Now I understood that libraries are pre-built object files. Using the library for 67x (which I donwloaded from the above link), I am able to build and run the project succesfully but I still have few confusion about the sizes of different variables defined in the demo project.

My input buffer has 1024 real samples. According to wiki page given below, input samples should be formed as N/2-point complex valued sequence and then compute the N/2-point complex FFT on the complex valued sequence.

http://processors.wiki.ti.com/index.php/Efficient_FFT_Computation_of_Real_Input

What should be the size of following variables if size of input buffer is 1024 ?

FFT Length = N ?

float pRFFT_In ?
float pRFFT_Out ?
twidle (w) ?

-Tariq

0 Joern over 14 years ago in reply to Tariq Siddiqui

Genius 3055 points

Dear Tariq,

in common maybe a look into C674x DSPLIB helps understanding the parameters of the lib's functions like DSPF_sp_fftSPxSP() and DSPF_sp_ifftSPxSP().

In case of the example sprc121 the file main.c at least seems to deliver all answers. N within that sample is a pre-processor defined value and is defined in this sample to be 512 (see near the top) - if you just wanted to calculate for 1024 complex values as input just you had to change this N within the sample to 1024.

If you follow the use of this N within the sample's main.c you get an answer concerning the needed array sizes. At first glance it's not clear for me why they always add 4 to the length of the different fields, maybe that is only for safety purposes, maybe they know about the algorithm accessing always one element more than needed, however, I would copy that kind of use or analyse step by step the actual needs of the source code. By the way, if your DSPLIB came with source code, usually they include a C implementation besides each optimized ASM implementation, this sometimes helps a lot in understanding what actually is done by the functions.

If changing the code for your own purposes always think of the alignment needs (see the #pragma DATA_ALIGN defines resp. look into the C674x DSPLIB documentation).

But in general this sample of course should not be used directly for production environment, besides some coding style elements especially the dynamic calculation of the "twiddle factors" (the FFT's trigonometric constant coefficients) - as they also write in their sample code - should be done only once and not in each calculation oncemore...

But now, last not least to the root of your question, how to handle real samples as input instead of complex values. In principle (in my own, small opinion) you might calculate this by simply copying these real values into a field of complex values, where the complex part of the value pair is set to zero. But of course, as we want to optimize, that was counterproductively. So the characteristics of FFT are used, probably you might use your M-size input array of real values directly as M/2-size input array of complex values, afterwards you make the FFT for length N=M/2 (that means in your case, as you have 1024 real values, you can leave the N of the sample at 512). For finishing you will have to calculate now the FFT result for your real values like described in Computing a Length N/2 Complex FFT from a Length N Real Input Sequence.

Feel free to ask again, maybe I've overlooked what information you actually was missing.

Regards,
Joern.

0 BAS over 14 years ago in reply to Joern

Expert 2370 points

Dear Joern,

Thanks for your explanation.

Can you please tell me how can I save w(twiddles), A, B Tables as .const so that they don't calculate every time ?

Regards,

Tariq.

0 Joern over 14 years ago in reply to BAS

Genius 3055 points

You can place variables within explicit sections by use of

#pragma DATA_SECTION(anyVariable, ".anysect")

This way you are enabled to place values within specific memory areas, like local memory. Additionally possibly you will have to assign those sections to specific memory areas, this is done by the linker command file or by use of the RTSC framework. It may be of an interest concerning memory access speed.

But in your case I assume it will be helpful as a 1st step to simply precalculate the data once, maybe within an own initFFT() function, and to hold these values within fields being statically available from everywhere within the module and later access these prepared fields from your FFT functions. In C I would prefer to declare those variables globally within the module(.c file), but as static - so they are accessible from the whole module, but may not lead to conflicts elsewhere.

Maybe it would be helpful to start looking around at these points:

http://processors.wiki.ti.com/index.php/Compiler
C6000 Compiler Guide, look there for "data_section"
look here within the forum for "#pragma data_section" to find some further starting points (for sure better than mines)

Hope that helps somewhat (good luck, it's a huge field),
best regards,
Joern.

0 Tariq Siddiqui over 14 years ago in reply to Joern

Expert 2270 points

Hi,

I build and run the project using dsp67x.lib successfully. But I am getting an extra peak in the middle of 2 peaks after calculating FFT magnitude which can bee seen in the graph shown below.

Here attached my code. TI guys, please have a look on my code and correct me if I am wrong. Can you also tell how to save w(twiddles), A, B Tables as .const ?

#define N 1024// FFT Length

float pInput[2048]; // input buffer

float pRFFT_In[2 * N]; // input to FFT function
float pRFFT_Out[2 * N]; // output of FFT function

float pTemp[N];

float w[N];
float A[N];
float B[N];

#pragma DATA_ALIGN (brev , 8);
unsigned char brev[64] = {
    0x0, 0x20, 0x10, 0x30, 0x8, 0x28, 0x18, 0x38,
    0x4, 0x24, 0x14, 0x34, 0xc, 0x2c, 0x1c, 0x3c,
    0x2, 0x22, 0x12, 0x32, 0xa, 0x2a, 0x1a, 0x3a,
    0x6, 0x26, 0x16, 0x36, 0xe, 0x2e, 0x1e, 0x3e,
    0x1, 0x21, 0x11, 0x31, 0x9, 0x29, 0x19, 0x39,
    0x5, 0x25, 0x15, 0x35, 0xd, 0x2d, 0x1d, 0x3d,
    0x3, 0x23, 0x13, 0x33, 0xb, 0x2b, 0x1b, 0x3b,
    0x7, 0x27, 0x17, 0x37, 0xf, 0x2f, 0x1f, 0x3f
};

void main ()
{
    int i;

    // Generate real input sequence of length 2048
    for (i = 0; i < 2048; i++)
    {
        pInput[i] = sin (2 * 3.1415 * 500 * (i) / (double) N);
    }

   Real_FFT ();

}

void Real_FFT ()
{
    int i, rad, nTemp = N / 2;
    float *twiddle;
    int start, stop, overhead;

    if (nTemp == 16 || nTemp == 64 || nTemp == 256 || nTemp == 1024 || nTemp == 4096 || nTemp == 16384)
        rad = 4;
    else if (nTemp == 8 || nTemp == 32 || nTemp == 128 || nTemp == 512 || nTemp == 2048 || nTemp == 8192)
        rad = 2;
    else
    {
        printf ("%d Value of N is not supported \n", N);
        exit (0);
    }

    // Real FFT of length N/2
    for (i = 0; i < N; i++)
    {
        pRFFT_In[2 * i] = pInput[2 * i];          //arrange real input sequence to
        pRFFT_In[2 * i + 1] = pInput[2 * i + 1]; //N/2 complex sequence..
    }

    tw_gen (w, N / 2);
    split_gen (A, B, N / 2);
    twiddle = (float *) w;

    // Forward FFT Calculation using N/2 complex FFT..
    DSPF_sp_fftSPxSP (N / 2, pRFFT_In, twiddle, pTemp, brev, rad, 0, N / 2);
    // FFT Split call to get complex FFT out of length N..
    FFT_Split (N / 2, pTemp, A, B, pRFFT_Out);

}

// FFT magnitude calculation..

    for(j=0; j < N; j++)

        {
            fft_mag[j] = magcalculation((double)pRFFT_Out[2*j], (double)pRFFT_Out[2*j+1]);
        }

Waiting for your help.

Regards.

0 Tariq Siddiqui over 14 years ago in reply to Tariq Siddiqui

Expert 2270 points

Dear Joern,

Thanks for your valuable help. I am new to this field so it will take time to unterstand things.

I really appreciate your help.

-Tariq

0 Joern over 14 years ago in reply to Tariq Siddiqui

Genius 3055 points

You work with 2048 floating point values, the real input sequence. Then you fill these 2048 real values into a 1024 complex values containing field. So far so good - but actually don't know why you do this: because in this case it is only a question of interpretation, or isn't the content of both fields equal?

Tariq Siddiqui said:
    for (i = 0; i < N; i++)
    {
        pRFFT_In[2 * i] = pInput[2 * i];          //arrange real input sequence to
        pRFFT_In[2 * i + 1] = pInput[2 * i + 1]; //N/2 complex sequence..
    }

I mean this "arrangement" might be left out, you might use pInput (of 2048 reals) content as FFT input (of 1024 complex values)...

In each case you have a complex input field of length N=1024. So I do not understand why you work with N/2 at many places, where in the original sample stood N? You make the FFT for N/2, but you have N=1024 complex input values...

Afterwards, what you do with "magcalculation()" seems wrong to me. I assume, that what you wanted was what is explained here, where first the N FFT output values for the complex values field are used to calculate the first half of the FFT values which we would have gotten with a FFT over real values (each time the complex component set to zero, in your case that would be a field of 2048 complex resp. 4096 floating point values), afterwards the second half of these values is calculated by use of the known symmetry.

Again, the whole game is: having M real values, needing real-input-FFT for M values from that, making complex-input-FFT for M/2 values instead of the same field as if that was a field of complex values. Afterwards from this result calculate the real-input-FFT-result from that complex-input-FFT-result, being hopefully that these finishing steps will need less effort than calculating the real-input-FFT directly by transferring the M real inputs x1, x2, x3, ... into a complex field of M complex values x1, 0, x2, 0, x3, 0, ... and call the complex input FFT for a M length instead of M/2.

Regards,
Joern.

0 Tariq Siddiqui over 14 years ago in reply to Joern

Expert 2270 points

Joern,

I transform them into 1024 complex values as shown in the example code. But you are right, they both have same contents.

I don't understand this statement of yours (So I don't not understand why you work with N/2 at many places, where in the original sample stood N? You make the FFT for N/2) that what line of code you are referring to ?

If you mean N/2 in the following two equations given below then they are defined in the example code and also says on wiki page to compute .

DSPF_sp_fftSPxSP (N / 2, pRFFT_In, twiddle, pTemp, brev, rad, 0, N / 2);

FFT_Split (N / 2, pTemp, A, B, pRFFT_Out);

What wrong magcalculation ? Here attached magcalculation function. FFT output has 1024 sample values so I calculated magnitude for N points.

float magcalculation(double x, double y)
{
    double a1, a2, b1, b2;
    a1 = pow(x, 2);
    a2 = pow(y, 2);
    b1 = a1+a2;
    b2 = (sqrt(b1)/1024.0f)*2.0f;
    return (float)b2;
}

0 Tariq Siddiqui over 14 years ago in reply to Tariq Siddiqui

Expert 2270 points

I am sorry to say but I am lost and totally confused..

Will you tell me the exact size for different variables in the code I posted in my previous post..

0 Joern over 14 years ago in reply to Tariq Siddiqui

Genius 3055 points

Sorry for confusing you, partially I myself have been confused, because I didn't read the sample as exactly as I should have done. I did not realize that there already the whole way is implemented.

They work in that within main() in each case - as you want - with a real field of length N, and then they do two FFT calculations:

1st the less efficient version:

Within Complex_FFT() they first transform the real values into directly corresponding complex values, x1, x2, x3 into x1, 0, x2, 0, x3, 0,... so from a field of N real values they generate a field with N complex values, as input for DSPF_sp_fftSPxSP(), which in each case assumes the input values being complex values. As result we get directly N complex FFT values. Finally the call the inverse, DSPF_sp_ifftSPxSP(), to compare the result with the original values.

2nd the more efficient version:

within RealFFT() they try to spare time by - in the end - just interpreting the field of N real values as a field of N/2 complex values. As a result we get N/2 complex FFT values. Afterwards that result has to be somwhat enhanced to a FFT result like that which we got within 1st version by Complex_FFT(). That job is done by FFT_Split(), implemented within fft_split.c. It deliveres the N complex FFT result values. Finally they do again the same vice versa to check for right results.

0 Joern over 14 years ago in reply to Tariq Siddiqui

Genius 3055 points

Tariq Siddiqui said:

What should be the size of following variables if size of input buffer is 1024 ?

FFT Length = N ?

float pRFFT_In ?
float pRFFT_Out ?
twidle (w) ?

If size of your input buffer is 1024, and if you speak about 1024 real values, then N in the sample code has to be N=1024. Instead of 512. In principle that's all. All other values within the sample will be set via that predefined N.

What I didn't realize is, that the sample alredy is all the wiki article writes about. Sorry for that.

So what you have to change in your example is:

If you want to work with 1024 real values, so generate a real input sequence of length 1024 and not 2048
in your shortened Real_FFT() below the comment " // Real FFT of length N/2" do " for (i = 0; i < N / 2; i++)" instead of " for (i = 0; i < N; i++)"

Regards,
Joern.

0 Tariq Siddiqui over 14 years ago in reply to Joern

Expert 2270 points

Dear Joern,

Thanks for your help. I was using the same size but got confused with some previous posts on the forum. I appreciate your help.

But I am still getting one extra peak in between the real and immaginary peak as shown below. What's wrong here ?

If you see the amplitude of FFT peak, it is around 0.801V but input signal has an amplitude of 1.25V. How can I scale my FFT output to be the same as input ? I am calculating magnitude as;

Magnitude [FFT(A)]/N

Regards.

0 Joern over 14 years ago in reply to Tariq Siddiqui

Genius 3055 points

Dear Tariq,

concerning magnitude I've no idea at the moment, would have to read your code first (which is already provided, I know) and would have to be more sure about the theory.

But concerning that peak at least I see an inconsistency within the concerning wiki page and within the sample provided there as well.

Within the wiki there is written with regard to the symmetry of a real-sequence FFT, that the missed :

G_r(N/2) = X_r(0) – X_i(0)
G_i(N/2) = 0

(Besides, this above might lead to such a peak at 512, but I don't understand at all this calculation.)
And further they write:

G_r(N–k) = G_r(k), for k = 1, 2, ..., N/2–1
G_i(N–k) = –G_i(k)

That for N=1024 would calculate the FFT values from 1024-1=1023 (so far, so good) until 1024-(512-1)=513. Is that symmetry? Why the hell not until k=N/2, that means for N=1024 from 1023 until element 512??? Please be aware, that obviously I don't understand the theory behind the special handling of FFT element nr. 512.

Now, in contrast, in the sample within the function FFT_Split() they write this (please note, that here n already is N/2, and that not complex values are addressed here but their components within fields of floats):

for (i = 0; i < n; i++)
{
    Tr = [...]
    Ti = [...]

    pOut[2 * i] = Tr;
    pOut[2 * i + 1] = Ti;
    // Use complex conjugate symmetry properties to get the rest..
    pOut[4 * n - 2 * i] = Tr;
    pOut[4 * n - 2 * i + 1] = -Ti;
}

This loop now for N=1024 resp. n=512 fills the FFT elements 0..511 and at the same time the "mirrored" elements from 1024(??why that??)..513. Why not from 1023..512?? That might be to have calculation pairs again, concerning optimization. But, again, as within the wiki, the FFT values nr. 512 is calculated in a way being a mystery for me.

Couldn't that mystery part left out and instead the for-loop look like the following?:

for (i = 0; i < n; i++)
{
    Tr = [...]
    Ti = [...]

    pOut[2 * i] = Tr;
    pOut[2 * i + 1] = Ti;
    // Use complex conjugate symmetry properties to get the rest..
    pOut[4 * n - 2 * i - 2] = Tr;
    pOut[4 * n - 2 * i - 1 ] = -Ti;
}

The main question seems to me: what says the theory about that mysterious calculation of the FFT element one right from the middle? Maybe you have an idea about that...

Regards,
Joern.

0 Tariq Siddiqui over 14 years ago in reply to Joern

Expert 2270 points

Dear Joern,

I really appreciate your effort but I am confused myself. Accoring to theory of FFT, the output of FFT will give values corresponding to frequencies from 0 to Fs which will be symmetric about N/2.

I am really surprised why not some one from TI reply to this post.

C'mon guys..

0 Tariq Siddiqui over 14 years ago in reply to Tariq Siddiqui

Expert 2270 points

Will some one tell me what is wrong with my FFT code ? Why there are 3 peaks ? How to calculate FFT magnitude properly ?

I am stuck with my project and waiting for some one to put some light on this issue..

Processors

Processors forum

Efficient FFT Computation of Real Input