
DSP Data in External Memory

Other Parts Discussed in Thread: OMAP3530

Hi,

I'm using the DSP on the OMAP3530 for FFT calculations (C64x+ DSPLIB). Due to the large FFT size (32K points), it is not possible to perform this calculation in DSP internal memory. Thus, the input, output, and twiddle factor buffers are located in external RAM allocated with CMEM. Timing analysis shows ~60 ms to compute this FFT on the DSP with data in external memory. For comparison, an FFT from FFmpeg that is optimized for ARM/NEON computes the same FFT in 6.9 ms (32-bit float on NEON, 16-bit int on the DSP). My initial guess for the relatively poor DSP performance is the need to access external memory, but given that NEON is also operating on data in external memory, why is there such a large difference in performance?

Thanks,

Rick


  • In general, when an algorithm runs much slower than expected, I suspect cache configuration as the main culprit. If the data is stored in memory that is non-cacheable to the DSP, performance will suffer greatly.

    I recommend starting by reading this article. However, this is a bit more complex given that you have an ARM+DSP architecture. Are you using Codec Engine? If you're using DVSDK 4.0, it integrates a new product called "C6Accel", which builds our DSPLIB into the codec server that we ship. The cache is certainly configured properly within DVSDK 4.0, so that might be a good starting point.