28069 FFT Library Benchmarks

Hi,

I am running the FFT library on the 28069 FPU. I have linked "rts2800_fpu32_fast_supplement.lib", which improves the performance of the FFT magnitude calculation. However, when I measure how long my 28069 Experimenter's Kit takes to perform the following functions from the FPU library, it is approximately 3x longer than expected. The CPU is running at 90 MHz.

My RFFT_SIZE is 256. I load all data into RAM (not flash).

As you can see, RFFT_adc_f32u takes roughly 240 usec to complete, which is approximately 240e-6 * 90e6 = 21,600 cycles.

Likewise, RFFT_f32_mag takes 200 usec, or approximately 18,000 cycles.
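
The conversion used above can be written as a small helper. This is only a sketch of the arithmetic; the function name `usec_to_cycles` is made up for illustration and is not part of the FPU library.

```c
#include <stdint.h>

/* Convert a measured execution time to CPU cycles.
 * time_us    : elapsed time in microseconds
 * sysclk_mhz : CPU clock in MHz, i.e. cycles per microsecond
 * (hypothetical helper, not a library function) */
static uint32_t usec_to_cycles(uint32_t time_us, uint32_t sysclk_mhz)
{
    return time_us * sysclk_mhz;
}
```

With the measurements quoted here, `usec_to_cycles(240, 90)` gives 21600 and `usec_to_cycles(200, 90)` gives 18000.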

From the FPU-SW-LIB-UG-V1.40.00.00 I should expect the following benchmarks: 

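For RFFT_adc_f32u: [benchmark table from the user's guide; image not preserved]

And for the magnitude function: [benchmark table from the user's guide; image not preserved]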

So it takes approximately 3x more cycles than expected to process these functions. Any ideas on what I'm doing wrong? BTW, I previously ran these functions on the 28377D and got much faster results.

Thanks,
Rick 

  • Hi Rick,

    For the FFT, do you have the twiddle factors in FLASH or RAM? Also, do you have interrupts turned on that could affect the speed?

    Also, FYI, v1.50.00.00 of the library has been released. There should be an example of the RFFT_ADC that actually uses the physical ADC hooked up to an EPWM. It's for the 28377D but should be easily adapted to the 28069.

    For the magnitude function, go to Project Properties -> C2000 Linker -> File Search Path and make sure the fast RTS library is first in the list, or at least higher up than the regular RTS library. This will make sure you are pulling the sqrt from the fast RTS and not the regular RTS.
  • Hi Vishal,

    I placed my twiddle factor buffer in RAML6 (origin = 0x00E000, length = 0x002000; on-chip RAM block L6, PAGE = 1).

    And yes, using the fast RTS supplement library really improves the speed of the magnitude calculation (from 350 usec to 200 usec). I did hit a weird error when setting the link order in CCS (Build -> Link Order): my PLL wouldn't lock when I put "rts2800_fpu32_fast_supplement.lib" before "rts2800_fpu32.lib". I got rid of the error by removing "rts2800_fpu32.lib" from the Link Order list, so now only "rts2800_fpu32_fast_supplement.lib" is listed there and everything seems to work OK.

    However, I am still 3x too slow. I'm just speculating here, but if I am 3x too slow, could RAM wait states be causing the problem? Can you try to duplicate the benchmark for these two functions on a 28069 Experimenter's Kit?

    Thanks,
    Rick 

  • Rick,

    Is the PLL locking? Can you confirm you are running at 90 MHz? The way I benchmark the routines is through CCS's clock capability. In the Debug perspective, select Run -> Clock -> Enable from the menu bar; you should see a small clock in the bottom right-hand corner of CCS. Run to the point where the function is called, double-click the clock to reset it, press F6 to step over the FFT function, and then check how many cycles it took. This gives the number of SYSCLK cycles irrespective of the device it is running on; multiply by the device's T_sysclk to get the execution time.

    I typically start the clock in the disassembly window, at the call instruction (LCR), but you can do it in C; you will see a few more cycles than what is in the user's guide. Anyway, I tried the code on an 069 controlCARD on an Experimenter's Kit and it matches up. The project is attached (F28069_FPU_RFFT_150702.zip).
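
For anyone who wants the same cycle count taken in code rather than through the CCS clock, the arithmetic might look like the sketch below. It assumes a free-running 32-bit down-counting timer clocked at SYSCLK with a full-range period; reading the counter is device-specific and left to the caller. The helper names are made up for illustration.

```c
#include <stdint.h>

/* Elapsed ticks between two reads of a free-running 32-bit DOWN counter.
 * Assumes the timer period is the full 32-bit range, so unsigned
 * subtraction also covers a single wrap through zero.
 * (hypothetical helper) */
static uint32_t elapsed_cycles(uint32_t start, uint32_t stop)
{
    return start - stop;
}

/* Execution time in microseconds: cycles * T_sysclk.
 * sysclk_hz is the CPU clock in Hz, e.g. 90e6 on this board.
 * (hypothetical helper) */
static double cycles_to_usec(uint32_t cycles, double sysclk_hz)
{
    return (double)cycles * 1.0e6 / sysclk_hz;
}
```

Read the counter immediately before and after the call, pass both values to `elapsed_cycles`, and expect a few cycles of overhead from the reads themselves.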

  • Hi Vishal,
    I confirmed 90 MHz operation by enabling the XCLKOUT pin and measuring it on an oscilloscope.

    I am having problems running the clock measurement (although this is really cool). I keep getting this error: "C28xx: Breakpoint Manager: Error enabling this function: This task cannot be accomplished with the existing AET resources." I am running SYSBIOS with an XDS100v1 USB debug probe, and I'm not sure why I keep getting the error. But I wanted to get some measurements back to you, so I single-stepped through the assembly code of the RFFT_adc_f32u() function with the clock enabled. So far it has taken 7,828 cycles (and my finger hurts!), which is above the benchmark numbers, so clearly this function is taking too long.

    Do you have any ideas on why this would take longer on my application than yours? It seems related to the FPU library?

    I will look at your example project more closely next and please let me know if you have any ideas.

    FYI, I tried running the FFT library functions from the VCU and it takes almost exactly the same amount of time as the FPU: 220 usec.
    Rick
  • Oh, I just noticed that your coefficients, input, and output buffers are allocated to the same RAM. Can you try allocating them to different RAMs and see if that changes anything? I have a hunch that you are seeing read/write stalls.
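
A minimal sketch of what splitting the buffers could look like with the C28x compiler's DATA_SECTION pragma. The buffer names, section names, buffer sizes, and RAM block assignments are all assumptions for illustration; use the names from your own project and linker command file, and check the library user's guide for any buffer alignment requirement.

```c
#define RFFT_SIZE 256

/* Place each buffer in its own named section (C28x compiler only).
 * Buffer and section names are hypothetical. */
#ifdef __TMS320C28XX__
#pragma DATA_SECTION(fft_in,      "fft_in_sect")
#pragma DATA_SECTION(fft_out,     "fft_out_sect")
#pragma DATA_SECTION(fft_mag,     "fft_mag_sect")
#pragma DATA_SECTION(fft_twiddle, "fft_twiddle_sect")
#endif

float fft_in[RFFT_SIZE];           /* FFT input samples        */
float fft_out[RFFT_SIZE];          /* FFT output               */
float fft_mag[RFFT_SIZE / 2 + 1];  /* magnitude result         */
float fft_twiddle[RFFT_SIZE];      /* twiddle factor table     */

/* In the linker command file (not C), each section would then be
 * mapped to a different RAM block, for example:
 *
 *   fft_in_sect      : > RAML4, PAGE = 1
 *   fft_out_sect     : > RAML5, PAGE = 1
 *   fft_twiddle_sect : > RAML6, PAGE = 1
 *
 * The block names above are assumptions and depend on the MEMORY
 * definitions in your own .cmd file. */
```

The point of the split is that operands read and written in the same loop do not all come from one physical RAM block.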
  • Hi Vishal,

    I figured out the problem. 

    1.  I had placed .text in FLASH instead of RAM, so the functions were running out of flash with 3 read wait states. Placing .text in RAM cuts the time in half.

    2.  I had another SYSBIOS thread running on the main CPU during this function call. I disabled the thread and now the performance is in line with the benchmarks.

    The new project you sent helped me diagnose the problem. I ran it and compared it with mine, which is how I figured out that I was running out of FLASH.

    Thanks for the excellent support,

    Rick