Real Time Problems with high FFTs using EDMA

Hello everyone,

I have a problem (see the topic title).

My program works for FFT lengths up to 1024. When I increase the size to 2048, 4096, 8192, ... I get a real-time problem; see the picture for N = 32768.

Real-time FFT, N = 32K, left and right channel

I am using two channels, left and right (stereo).

When I use one channel, I get no problems up to N = 32K.

I am using "Welch, Wright, & Morrow, Real-time Digital Signal Processing, 2005".

"Main program"

#include "..\..\..\Common_Code\DSK_Config.h"
#include "frames.h"

int main()
{  

 // initialize all buffers to 0
 ZeroBuffers();
 
 // initialize EDMA controller
 EDMA_Init();
 
 // initialize DSK for selected codec using EDMA
   DSK_Init_EDMA(CodecType, TimerDivider);

 // main loop here, process buffer when ready
   while(1) {
        if(IsBufferReady()==1 && IsOverRun()==0){ // process buffers in background
           ProcessBuffer();    
  }
   }  
}

"ISR"

// Welch, Wright, & Morrow,
// Real-time Digital Signal Processing, 2005

// modified by B.Fuchs

///////////////////////////////////////////////////////////////////////
// Filename: ISRs.c
//
// Synopsis: Interrupt service routines for EDMA service
//
///////////////////////////////////////////////////////////////////////

#include "..\..\..\Common_Code\DSK_Config.h"
#include "math.h"

#include "frames.h" 
 
// frame buffer declarations   
#define BUFFER_COUNT  32768  // buffer length in McBSP samples (L+R) // MAX 32768. Why? Because of the data types
#define BUFFER_LENGTH    (BUFFER_COUNT*2) // two shorts read from McBSP each time 
#define NUM_BUFFERS      3     // don't change this!

#pragma DATA_SECTION (buffer, "CE0"); // allocate buffers in SDRAM
short int buffer[NUM_BUFFERS][BUFFER_LENGTH];
// there are 3 buffers in use at all times, one being filled from the McBSP,
// one being operated on, and one being emptied to the McBSP
// ready_index --> buffer ready for processing
short buffer_ready = 0, over_run = 0, ready_index = 0;

// fft defines
#include "fft.h"
#define N BUFFER_COUNT

// new FFT assembler routine
#define RADIX 2         //radix or base 
#define Tw (N/RADIX)

#pragma DATA_SECTION (xR, "CE0");
#pragma DATA_SECTION (xL, "CE0");
#pragma DATA_SECTION (WxFFT,  "CE0");
#pragma DATA_SECTION (WxIFFT, "CE0");


COMPLEX xR[N], xL[N];  // input data
COMPLEX WxFFT[Tw],WxIFFT[Tw];  //twiddle factor

#pragma DATA_SECTION (iData, "CE0");
#pragma DATA_SECTION (iTwid, "CE0");
short iData[N];      //index for bitrev X
short iTwid[N/2];     //index for twiddle constants W


 

void ProcessBuffer()
///////////////////////////////////////////////////////////////////////
// Purpose:   Processes the data in buffer[ready_index] and produces
//      the Fourier transform of the data in 
//            Data is packed into the buffer, alternating right/left
//
// Input:     None
//
// Returns:   Nothing
//
// Calls:     Nothing
//
// Notes:     None
///////////////////////////////////////////////////////////////////////
{  
 short int *pBuf = buffer[ready_index];

 int i;
      

 *(volatile int *)IO_PORT = 0; // set STS0/1 low - for time measurement
 

 // *** load data into xR and xL ***
 // right first, then left; always keep this order
  for(i=0; i < BUFFER_COUNT; i++){
   xR[i].real = *pBuf++;
   xR[i].imag = 0.0;
   xL[i].real = *pBuf++;
   xL[i].imag = 0.0;
  // *pBuf++;  
  }
 
  // TI_FFTrd2-CODE_ASSAMBLER
 
   
 cfftr2_dit(xR, WxFFT, N ) ;    //TI floating-pt complex FFT
  bitrev(xR, iData, N);        //freq scrambled->bit-reverse x
   
  cfftr2_dit(xR, WxIFFT, N ) ;    //TI floating-pt complex FFT
  bitrev(xR, iData, N);                //freq scrambled->bit-reverse x

  
  cfftr2_dit(xL, WxFFT, N ) ;    //TI floating-pt complex FFT
  bitrev(xL, iData, N);        //freq scrambled->bit-reverse x
  
  cfftr2_dit(xL, WxIFFT, N ) ;    //TI floating-pt complex FFT
  bitrev(xL, iData, N);     
 
     

  pBuf = buffer[ready_index];
 for(i = 0;i < BUFFER_COUNT;i++) { // pack into buffer
  xR[i].real = xR[i].real / N;
  xL[i].real = 0;//xL[i].real / N;
    
  *pBuf++ = xR[i].real;
  *pBuf++ = xL[i].real;
 }
 

 *(volatile int *)IO_PORT = -1; // set STS0/1 high - for time measurement
    buffer_ready = 0; // signal we are done
    over_run = 0;


}

void EDMA_Init()
////////////////////////////////////////////////////////////////////////
// Purpose:   Configure EDMA controller to perform all McBSP servicing.
//            EDMA is setup so buffer[2] is outbound to McBSP, buffer[0] is
//            available for processing, and buffer[1] is being loaded.
//            Conditional statement ensure that the correct EDMA events are
//            used based on the McBSP that is being used.
//            Both the EDMA transmit and receive events are set to automatically
//            reload upon completion, cycling through the 3 buffers.
//            The EDMA completion interrupt occurs when a buffer has been filled
//            by the EDMA from the McBSP.
//            The EDMA interrupt service routine updates the ready buffer index,
//            and sets the buffer ready flag which is being polled by the main
//            program loop
//
// Input:     None
//
// Returns:   Nothing
//
// Calls:     Nothing
//
// Notes:     None
///////////////////////////////////////////////////////////////////////
{
 EDMA_params* param;

 // McBSP tx event params
 param = (EDMA_params*)(EVENTE_PARAMS);
 param->options = 0x211E0002;
 param->source = (unsigned int)(&buffer[2][0]);
 param->count = (0 << 16) + (BUFFER_COUNT);
 param->dest = 0x34000000;
 param->reload_link = (unsigned int)(BUFFER_COUNT << 16) + (EVENTN_PARAMS & 0xFFFF);
 
 // set up first tx link param
 param = (EDMA_params*)EVENTN_PARAMS;
 param->options = 0x211E0002;
 param->source = (unsigned int)(&buffer[0][0]);
 param->count = (0 << 16) + (BUFFER_COUNT);
 param->dest = 0x34000000;
 param->reload_link = (unsigned int)(BUFFER_COUNT << 16) + (EVENTO_PARAMS & 0xFFFF);
 
 // set up second tx link param
 param = (EDMA_params*)EVENTO_PARAMS;
 param->options = 0x211E0002;
 param->source = (unsigned int)(&buffer[1][0]);
 param->count = (0 << 16) + (BUFFER_COUNT);
 param->dest = 0x34000000;
 param->reload_link = (unsigned int)(BUFFER_COUNT << 16) + (EVENTP_PARAMS & 0xFFFF);
 
 // set up third tx link param
 param = (EDMA_params*)EVENTP_PARAMS;
 param->options = 0x211E0002;
 param->source = (unsigned int)(&buffer[2][0]);
 param->count = (0 << 16) + (BUFFER_COUNT);
 param->dest = 0x34000000;
 param->reload_link = (unsigned int)(BUFFER_COUNT << 16) + (EVENTN_PARAMS & 0xFFFF);
 
 
 // McBSP rx event params
 param = (EDMA_params*)(EVENTF_PARAMS);
 param->options = 0x203F0002;
 param->source = 0x34000000;
 param->count = (0 << 16) + (BUFFER_COUNT);
 param->dest = (unsigned int)(&buffer[1][0]);
 param->reload_link = (unsigned int)(BUFFER_COUNT << 16) + (EVENTQ_PARAMS & 0xFFFF);
 
 // set up first rx link param
 param = (EDMA_params*)EVENTQ_PARAMS;
 param->options = 0x203F0002;
 param->source = 0x34000000;
 param->count = (0 << 16) + (BUFFER_COUNT);
 param->dest = (unsigned int)(&buffer[2][0]);
 param->reload_link = (unsigned int)(BUFFER_COUNT << 16) + (EVENTR_PARAMS & 0xFFFF);
 
 // set up second rx link param
 param = (EDMA_params*)EVENTR_PARAMS;
 param->options = 0x203F0002;
 param->source = 0x34000000;
 param->count = (0 << 16) + (BUFFER_COUNT);
 param->dest = (unsigned int)(&buffer[0][0]);
 param->reload_link = (unsigned int)(BUFFER_COUNT << 16) + (EVENTS_PARAMS & 0xFFFF);
 
 // set up third rx link param
 param = (EDMA_params*)EVENTS_PARAMS;
 param->options = 0x203F0002;
 param->source = 0x34000000;
 param->count = (0 << 16) + (BUFFER_COUNT);
 param->dest = (unsigned int)(&buffer[1][0]);
 param->reload_link = (unsigned int)(BUFFER_COUNT << 16) + (EVENTQ_PARAMS & 0xFFFF);
 
 *(unsigned volatile int *)ECR = 0xf000; // clear all McBSP events
 *(unsigned volatile int *)EER = 0xC000;
 *(unsigned volatile int *)CIER = 0x8000; // interrupt on rx reload only
}

void ZeroBuffers()
////////////////////////////////////////////////////////////////////////
// Purpose:   Sets all buffer locations to 0
//
// Input:     None
//
// Returns:   Nothing
//
// Calls:     Nothing
//
// Notes:     None
///////////////////////////////////////////////////////////////////////
{
    int ip0 = BUFFER_COUNT * NUM_BUFFERS;

    int *p0  = (int *)buffer;


   while(ip0--)
        *p0++ = 0;

// initialize twiddle factors
  
    twiddle_factor_rd2(N, WxFFT);
    twiddle_factor_ird2(N, WxIFFT);
  
 digitrev_index(iTwid, N/RADIX, RADIX); //produces index for bitrev() W
  bitrev(WxFFT, iTwid, N/RADIX);          //bit reverse W
  bitrev(WxIFFT, iTwid, N/RADIX);          //bit reverse W
  
  
  // for the FFT/IFFT in ProcessBuffer()
  
  digitrev_index(iData, N, RADIX);   //produces index for bitrev() X
}


///////////////////////////////////////////////////////////////////////
// Purpose:   Access function for buffer ready flag
//
// Input:     None
//
// Returns:   Non-zero when a buffer is ready for processing
//
// Calls:     Nothing
//
// Notes:     None
///////////////////////////////////////////////////////////////////////
int IsBufferReady()
{
    return buffer_ready;
}

///////////////////////////////////////////////////////////////////////
// Purpose:   Access function for buffer overrun flag
//
// Input:     None
//
// Returns:   Non-zero if a buffer overrun has occurred
//
// Calls:     Nothing
//
// Notes:     None
///////////////////////////////////////////////////////////////////////
int IsOverRun()
{
    return over_run;
}
 
interrupt void EDMA_ISR()
///////////////////////////////////////////////////////////////////////
// Purpose:   EDMA interrupt service routine.  Invoked on every buffer
//            completion
//
// Input:     None
//
// Returns:   Nothing
//
// Calls:     Nothing
//
// Notes:     None
///////////////////////////////////////////////////////////////////////
{

 *(unsigned volatile int *)CIPR = 0xf000; // clear all McBSP events
    if(++ready_index >= NUM_BUFFERS) // update buffer index
        ready_index = 0;
    if(buffer_ready == 1) // set a flag if buffer isn't processed in time
        over_run = 1;  
    buffer_ready = 1; // mark buffer as ready for processing

}

 

Maybe someone can help me.

Thanks to all.

 

 

 

 

  • Benjamin,

    I'm not sure I understand your graph; it doesn't look like the typical output of an FFT algorithm to me.  Does this show the input data, or are you also calling an IFFT to arrive back in the time domain?  I'm also not sure I understand what the different colors mean in the graph.  Does the red color show where you detect a problem with the data?  If so, it seems odd that there is no apparent periodicity between good and bad results.  Is there some irregular activity in your system (ex. serial communication with another device) that could be causing intermittent work load issues?

    Regardless, there are a few general tips that you can follow to improve performance.  One is to make sure cache is properly enabled for your program code and data buffers.  This includes making sure that the MAR bits are set correctly for the memory region(s) that you are using.  You may also want to consider moving your program code (or at least the FFT algorithm code) into IRAM.

    Finally, I want to point out that TI provides an optimized DSP library called DSPLIB that includes many commonly used algorithms, including FFTs and IFFTs.  One of the FFT algorithms in the C67 DSPLIB (DSPF_sp_cfftr2_dit) supports FFT lengths of 32K.  If you have a newer C674 DSP device, you can also use the C674 DSPLIB, which extends the maximum supported FFT length to 64K.  I encourage you to take a look at these releases if you haven't already.

    Hope this helps.

  • Benjamin Fuchs,

    Welcome to the E2E forums.

    How fast is your DSP running?

    How fast are your McBSP samples?

    When you change to 1 channel, what changes do you make? The code above shows xL being zeroed out at the end of ProcessBuffer. That alone should not be enough to make this big a difference.

    How fast does "Welch, Wright, & Morrow, Real-time Digital Signal Processing, 2005" say that this should run?

    Do you have the compiler optimization switches enabled?

    Are cfftr2_dit and bitrev simple shells around the TI library functions?

    Have you followed Joe Coombs' recommendations in his second paragraph, above?

    Do you or "Welch, Wright, & Morrow, Real-time Digital Signal Processing, 2005" expect the execution time for the FFTs & IFFTs to increase linearly with the size of the FFT? The DSPLib Reference Guide says that the cycle count increases faster than that, so there should be a limit to how many samples can be processed with a given DSP speed and sample rate.
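
    As a rough illustration of that growth (this is not the cycle formula from the DSPLIB Reference Guide, which is not quoted in this thread): a radix-2 FFT of length N performs (N/2)*log2(N) butterflies, so the work per input sample rises with log2(N), while the time to collect a frame rises only with N. A minimal sketch in C; the helper name `butterflies` is made up for this example:

    ```c
    #include <stdio.h>

    /* Number of radix-2 butterflies in one N-point FFT: (N/2) * log2(N).
       n must be a power of two. */
    static unsigned long butterflies(unsigned long n)
    {
        unsigned long stages = 0, m = n;
        while (m > 1) {          /* count stages = log2(n) */
            m >>= 1;
            stages++;
        }
        return (n / 2) * stages;
    }

    int main(void)
    {
        unsigned long n;
        for (n = 1024; n <= 32768; n *= 2) {
            /* work per input sample grows with log2(N) instead of staying constant */
            printf("N=%5lu  butterflies=%7lu  per sample=%.1f\n",
                   n, butterflies(n), (double)butterflies(n) / (double)n);
        }
        return 0;
    }
    ```

    Note that ProcessBuffer() as posted runs four such transforms per frame (FFT and IFFT on each channel), and that memory access time is not part of this count.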

    For the 1024, 2048 and 4096 versions, how much time do you have between the IO_PORT toggles?

    Once you figure out the processing time and whether or not that is the limitation, you can paste more of the IO_PORT toggles through ProcessBuffer to make sure where the most time is being spent. Then you will know what needs some work or at least where the most benefit can be realized.

    Regards,
    RandyP

  • Hello Joe,

    Thank you for the answer.

    First, about the graph: red is the right channel and green is the left channel. I run the FFT first and then the IFFT.

    The library is the same as the one I am using at the moment, but I implemented it as well. Nothing changed.

    Then I placed the "input channels" in the IRAM:

    #pragma DATA_SECTION (xR, "L2"); 
    #pragma DATA_SECTION (xL, "L2"); 

    "linking file"

    MEMORY
    {
        vecs:       o = 00000000h   l = 00000200h
        IRAM:       o = 00000200h   l = 0000FE00h   /* 8k */ 
        L2:   o = 00010000h l = 00080000h    /* 64k*/
        SDRAM:      o = 80000000h l = 08000000h /* 16MByte  */
      FLASH:  o = 90000000h l = 00200000h /* 256kByte */
                                                                  
    }

    SECTIONS
    {
        "vectors"   >       vecs
        .cinit      >       IRAM  
        .text       >       IRAM
        .stack      >       IRAM
        .bss        >       IRAM
        .const      >       IRAM
        .data       >       IRAM
        .far        >       IRAM 
        .switch     >       IRAM
        .sysmem     >       IRAM
        .tables     >       IRAM
        .cio        >       IRAM
        "CE0"  >  SDRAM
        "CE1"  >  FLASH
     "L2"  >  L2
    }  

     

    Now it works up to 8k. When I use 16k or 32k, nothing happens. Why?

     

  • Hello RandyP,

    Thanks for your answer too.

    1. 225 MHz

    2. If you mean the sample rate, it is 44100 Hz.

    3. I commented out the left-channel FFT/IFFT:

      //DSPF_sp_cfftr2_dit(xL, WxFFT, N ) ;    //TI floating-pt complex FFT 
      //DSPF_sp_bitrev_cplx(xL, iData, N);        //freq scrambled->bit-reverse x
      //DSPF_sp_cfftr2_dit(xL, WxIFFT, N ) ;    //TI floating-pt complex FFT
      //DSPF_sp_bitrev_cplx(xL, iData, N);

    4. They don't say anything about it.

    5. I don't know what you mean by this.

    6. The function cfftr2_dit() does the same as DSPF_sp_cfftr2_dit().

    Have you followed Joe Coombs' recommendations in his second paragraph, above? Yes, I followed them, and now it works up to 8k. When I use 16k or 32k, nothing happens.

    Maybe someone knows why.

  • Benjamin,

    If it gets worse when you move to IRAM, then that may mean your program (or data buffers) won't fit there.  I'm a little surprised it even compiled if that's the case.  Regardless, you may need to leave everything in external memory, and that makes it more important to make sure your cache is setup correctly.
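
    One quick way to check whether the data could fit is to add up the size of the arrays that were moved into internal memory. The sketch below does this for xR and xL only; it assumes COMPLEX is a pair of 32-bit floats (fft.h is not shown in this thread) and ignores the twiddle tables, index tables, code, and stack. The function name `xr_xl_bytes` is hypothetical.

    ```c
    #include <stdio.h>

    /* Assumption: COMPLEX holds two 32-bit floats (fft.h is not shown here). */
    typedef struct { float real, imag; } COMPLEX;

    /* Bytes needed for the two input arrays xR[n] and xL[n]. */
    static unsigned long xr_xl_bytes(unsigned long n)
    {
        return 2UL * n * (unsigned long)sizeof(COMPLEX);
    }

    int main(void)
    {
        unsigned long n;
        for (n = 8192; n <= 32768; n *= 2)
            printf("N=%5lu: xR + xL need %lu KB\n", n, xr_xl_bytes(n) / 1024UL);
        return 0;
    }
    ```

    The totals can then be compared with the on-chip memory size given in the device data sheet.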

    Also, I would encourage you to try out the DSPLIB version of the same function.  Even if there's no functional difference, there could be a difference in performance.

  • Benjamin Fuchs,

    [ When you are editing a reply, if you select text in the "previous post" window above your reply window, then click the red Quote link, the editor will insert that text in a box with the name of the person being quoted. You can click Preview to see how it will look later, then click Compose to get back to the edit window. ]

    RandyP said:
    Do you have the compiler optimization switches enabled?

    If you are building with the default Debug Configuration, which puts your executable .out file into the Debug folder, then you do not have optimization switches enabled.

    If you change to the Release Configuration and build the project, you will have some optimization enabled.

    RandyP said:
    For the 1024, 2048 and 4096 versions, how much time do you have between the IO_PORT toggles?

    You did not address this question. In your ProcessBuffer function, there is a write to IO_PORT at the beginning and at the end of the function. It would appear that this is intended to toggle a GPIO pin so that you can monitor how much time is being spent in the ProcessBuffer function. If you can watch this signal with an oscilloscope and tell us what the waveform looks like for the 2048, 4096, 8192, and 16384 cases, it will help us all to understand what is going on with your system.

    If your cfftr2_dit() function is written in C, you will get significantly better performance using either the Release Configuration or the DSPF_sp_cfftr2_dit() function.

    Release, DSPF_sp_cfftr2_dit(), better placement in IRAM and L2, proper use of cache. These are the things that will improve your performance.

    What do you want the project to do? Is 8K good enough for what you need to do?

    Regards,
    RandyP

     

    If you need more help, please reply back. If this answers the question, please click  Verify Answer  , below.

  • Hi Joe,

    Yesterday I implemented the DSPLIB.

    When I use the external memory, I get the same problems as before, so the external memory is too slow. I think 8k is the maximum that I can process.

    But thanks for your help.

  • Hi RandyP,

    At the moment I am using Debug mode. Release mode doesn't change anything.

    The processing times (approximate values):

    Size   Input time   Process time (SDRAM)   Process time (L2)
    0.5k   11 ms        8 ms                   ---
    1k     23 ms        40 ms                  20 ms
    2k     46 ms        70 ms                  30 ms
    4k     92 ms        160 ms                 75 ms
    8k     185 ms       320 ms                 150 ms

    cfftr2_dit() is written in assembler by TI. I think it is the same as DSPF_sp_cfftr2_dit(), but I implemented the "new" library anyway.

    For the project "Real time Convolution with binaural room impulse response (BRIR)" I need longer lengths: 32k, 64k, maybe 128k...

    At the moment 8k works, but when I use 16k nothing happens. I don't know why.

  • Benjamin Fuchs,

    [ For my knowledge, how do you attach an "alternative translation" to your text? When my cursor hovers over your reply, an offer for an alternative translation is presented. I noticed this hover-text with a Chinese reply this past week, but yours was the first that I figured out what it meant. I assume this is a new feature of the E2E editor, so I am embarrassed to ask you, but I would be interested to learn. ]

    Your easiest solution is to move to a faster DSP. The C6747 and C6748 would work very well for this, and they can run at up to 456 MHz. You can buy development boards for US$395 and US$495, respectively.

    From your table above, you should have real-time problems with the 1k size when using SDRAM for the buffer. With Process Time > Input Time, you cannot keep up with real-time even at the 1k buffer size. I am surprised that your original post said that 1k worked. Has something changed between the first post and the one above?
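
    The comparison can be written out as a small check, using the 44100 Hz sample rate stated earlier in the thread and the approximate measurements from the table. This is only a sketch of the arithmetic; the function names `frame_period_ms` and `keeps_up` are made up for this example.

    ```c
    #include <stdio.h>

    #define SAMPLE_RATE_HZ 44100.0   /* sample rate stated earlier in the thread */

    /* Time to collect one frame of n stereo sample pairs, in milliseconds. */
    static double frame_period_ms(unsigned long n)
    {
        return 1000.0 * (double)n / SAMPLE_RATE_HZ;
    }

    /* Real time holds only if processing finishes within one frame period. */
    static int keeps_up(unsigned long n, double process_ms)
    {
        return process_ms < frame_period_ms(n);
    }

    int main(void)
    {
        /* approximate measurements copied from the table in the previous post */
        static const unsigned long size[] = { 1024, 2048, 4096, 8192 };
        static const double sdram_ms[]    = { 40.0, 70.0, 160.0, 320.0 };
        static const double l2_ms[]       = { 20.0, 30.0, 75.0, 150.0 };
        int i;

        for (i = 0; i < 4; i++)
            printf("N=%4lu  frame=%6.1f ms  SDRAM: %s  L2: %s\n",
                   size[i], frame_period_ms(size[i]),
                   keeps_up(size[i], sdram_ms[i]) ? "ok" : "late",
                   keeps_up(size[i], l2_ms[i]) ? "ok" : "late");
        return 0;
    }
    ```

    With these numbers, every SDRAM measurement from 1k upward is later than its frame period, while the L2 measurements stay inside it.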

    If your program does not crash with the larger buffer sizes, please take Process Time measurements for the L2 case for the larger buffer sizes. The Input Time measurements naturally increase linearly with size, and the L2 Process Time appears to increase approximately linearly with size. This is not what I would have expected for the Process Time increase, but if it is increasing linearly, then Process Time L2 should never exceed Input Time and this would not be the cause of your problems.

    One thing I want you to look at carefully is your linker command file, in particular the MEMORY part. The comments do not affect the operation of the link, but the comments are incorrect about the size of the sections. At least one of the actual length fields is incorrect, not just the comment, although I am not sure when this would cause a problem in your program.
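
    To compare the comments with the actual length fields, the hexadecimal lengths from the posted MEMORY block can be converted to kilobytes. The sketch below only does that arithmetic; which lengths match the real device has to be checked against the data sheet. The helper name `kbytes` is hypothetical.

    ```c
    #include <stdio.h>

    /* Convert a linker length field (in bytes) to kilobytes. */
    static double kbytes(unsigned long len)
    {
        return (double)len / 1024.0;
    }

    int main(void)
    {
        /* l = values copied from the MEMORY block posted earlier in the thread */
        printf("IRAM   l = 0000FE00h -> %10.1f KB (comment says 8k)\n",
               kbytes(0x0000FE00UL));
        printf("L2     l = 00080000h -> %10.1f KB (comment says 64k)\n",
               kbytes(0x00080000UL));
        printf("SDRAM  l = 08000000h -> %10.1f KB (comment says 16MByte)\n",
               kbytes(0x08000000UL));
        printf("FLASH  l = 00200000h -> %10.1f KB (comment says 256kByte)\n",
               kbytes(0x00200000UL));
        return 0;
    }
    ```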

    To optimize your code, you need to understand where the time is being used. Then you can make tradeoffs by changing system resources, such as moving some buffers to L2, changing the use of cache, and using QDMA for data movement. To understand where the time is being used, use the same measurement methods (IO_PORT ?) to get smaller resolution of the ProcessBuffer execution time by measuring the time to do the three major portions: 1) copy from buffer[] to xR/xL, 2) primary processing, and 3) copy from xR/xL to buffer[].
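
    The three-way split described above could be structured roughly as follows. This is a host-side sketch, not the target code: it uses clock() from the C standard library where the DSP version would write to IO_PORT and be watched on an oscilloscope, the FFT/IFFT step is an empty placeholder, and the COMPLEX layout, the frame size, and the helper names (`unpack`, `process`, `pack`, `time_ms`) are assumptions made for this example.

    ```c
    #include <stdio.h>
    #include <time.h>

    #define N 4096   /* frame size chosen for this example only */

    typedef struct { float real, imag; } COMPLEX;   /* assumed layout */

    static short   buffer[2 * N];   /* interleaved right/left samples */
    static COMPLEX xR[N], xL[N];

    /* Phase 1: unpack interleaved shorts into the complex arrays. */
    static void unpack(void)
    {
        int i;
        const short *p = buffer;
        for (i = 0; i < N; i++) {
            xR[i].real = *p++;  xR[i].imag = 0.0f;
            xL[i].real = *p++;  xL[i].imag = 0.0f;
        }
    }

    /* Phase 2: placeholder for the FFT/IFFT calls (not reproduced here). */
    static void process(void)
    {
    }

    /* Phase 3: pack the real parts back into the interleaved buffer. */
    static void pack(void)
    {
        int i;
        short *p = buffer;
        for (i = 0; i < N; i++) {
            *p++ = (short)xR[i].real;
            *p++ = (short)xL[i].real;
        }
    }

    /* Run one phase and return its CPU time in milliseconds. */
    static double time_ms(void (*phase)(void))
    {
        clock_t t0 = clock();
        phase();
        return 1000.0 * (double)(clock() - t0) / CLOCKS_PER_SEC;
    }

    int main(void)
    {
        printf("unpack : %.3f ms\n", time_ms(unpack));
        printf("process: %.3f ms\n", time_ms(process));
        printf("pack   : %.3f ms\n", time_ms(pack));
        return 0;
    }
    ```

    On the DSP, each phase boundary would get its own IO_PORT write, in the same style as the two writes already present in ProcessBuffer(), so that the three intervals show up separately on the scope.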

    Regards,
    RandyP