Moving buffers from Shared RAM to L2 gives different results?



Hi,

 

I've implemented an FFT-based convolution.

The code is loaded into SHRAM (0x80000000). When my buffers are also placed in that segment I get some sort of aliasing effect on the output. When I move the buffers to L2 Shared Internal RAM then the output is fine. Why is that?

I move my buffers like so:

#pragma DATA_SECTION(audio_buffer,".data_sh2ram");

#pragma DATA_ALIGN(audio_buffer,8);

float   audio_buffer[2*(NUM_FFT+2)];

The section is defined as:

  .data_sh2ram   > SHDSPL2RAM

Why are the results different based on the DATA_SECTION pragma?

 

Thanks in advance.

 

Kind regards,

 

Remco Poelstra

  • To make it even stranger: if I move the buffers to DDR2 memory, everything goes silent. The twiddle factors are initialised correctly, but the actual audio buffers remain silent. No data ever seems to get in there.

    I tried a separate application which just writes a huge buffer into DDR memory and reads the results back again, and that works, so the DDR memory seems to be working.
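A write/readback check of the kind described above might look like the following minimal sketch. The function name `memtest_words` and the test pattern are hypothetical and not taken from the original application; on the target, `base` would point at a buffer placed in DDR. Note also that with caches enabled, the readback may be served from cache rather than from DDR itself, so a pass here does not by itself prove the external memory path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: writes an index-dependent pattern to num_words
 * 32-bit words starting at base, reads every word back, and returns
 * the number of words that did not match.
 */
size_t memtest_words(volatile uint32_t *base, size_t num_words)
{
    size_t errors = 0;
    size_t i;

    assert(base != NULL);

    /* Write pass: each word gets a value derived from its index. */
    for (i = 0; i < num_words; ++i) {
        base[i] = (uint32_t)i * 2654435761u;
    }

    /* Read pass: count the words that differ from what was written. */
    for (i = 0; i < num_words; ++i) {
        if (base[i] != (uint32_t)i * 2654435761u) {
            ++errors;
        }
    }
    return errors;
}
```

A return value of 0 means every word read back as written.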

     

    Regards,

     

    Remco

  • Hi Remco,

    I suppose this is an OMAP-L13x, C674x, or variant device. These problems may be caused by latency: essentially, the CPU cannot finish processing the data in real time given the audio input rate, operating frequency, etc. Try the following one step at a time; all of them are good tips to follow in general:

    1. Start with the highest possible CPU and memory frequency during development - you can reduce this later based on CPU loading and data bandwidth required.
    2. Make sure L1P/D caches are enabled (see sprufk5a.pdf); if data/program is in DDR2, enable L2 as cache suitably - see below for cache config recommendation.
    3. Use EDMA for data transfers to/from peripheral and memory - whether onchip or external RAM - use double buffering (or ping-pong buffering).
    4. As the system becomes more complicated, with multiple data transfers taking place simultaneously, it may be necessary to experiment with different master priority settings.
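The ping-pong buffering mentioned in step 3 can be sketched in plain C as follows. This is only an illustration under stated assumptions: `BLOCK_LEN`, `fill_block`, `process_block`, and `run_pingpong` are hypothetical names, the `memcpy` stands in for an EDMA transfer, and the gain multiply stands in for the real processing (such as the FFT convolution). On the device the fill would run concurrently with the processing; here the two steps simply alternate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_LEN 4 /* hypothetical block size in samples */

/* Two buffers: while one is being filled, the other is processed. */
static float pingpong[2][BLOCK_LEN];

/* Stand-in for an EDMA transfer: copies one input block into dst. */
static void fill_block(float *dst, const float *src)
{
    memcpy(dst, src, BLOCK_LEN * sizeof(float));
}

/* Stand-in for the real processing: scales each sample by gain. */
static void process_block(const float *in, float *out, float gain)
{
    size_t i;
    for (i = 0; i < BLOCK_LEN; ++i) {
        out[i] = in[i] * gain;
    }
}

/* Streams num_blocks blocks from input to output via the two buffers. */
void run_pingpong(const float *input, float *output,
                  size_t num_blocks, float gain)
{
    int fill = 0;
    size_t b;

    assert(input != NULL && output != NULL);
    if (num_blocks == 0) {
        return;
    }

    /* Prime the first buffer before the loop starts. */
    fill_block(pingpong[fill], input);

    for (b = 0; b < num_blocks; ++b) {
        int ready = fill; /* buffer that now holds a complete block */
        fill ^= 1;        /* switch the fill target to the other buffer */

        /* Start filling the other buffer with the next block, if any. */
        if (b + 1 < num_blocks) {
            fill_block(pingpong[fill], input + (b + 1) * BLOCK_LEN);
        }

        /* Process the completed block while the other one fills. */
        process_block(pingpong[ready], output + b * BLOCK_LEN, gain);
    }
}
```

The scheme only holds up if processing one block takes less time than filling the next one; otherwise the transfer overwrites a buffer that is still being read.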

    If EMIFB and the DDR2 memory are initialized correctly in your application, there should be no reason why data cannot be transferred to DDR2 by either the CPU or EDMA. Since your other application can successfully read from and write to DDR2, there has to be a configuration problem in this one.

    Hope this helps.

    Regards,

    Sunil Kamath

  • And the promised cache configuration recommendations:

    Code (.text) – DDR (or SDRAM)

    L1D/P – leave as 32K cache (and turn on cache)

    L2 – use 64K cache, the rest SRAM (for starters)

    Stack – L2 SRAM

    Heap (fast) – L2 SRAM (only if needed)

    Heap (slow) – DDR (or SDRAM)

  • Hi,

     

    Thanks for your answer. It indeed had something to do with latency issues.

    I was already using the techniques you mention in 1-3, so those did not help any further. However, I found that if I compile my application with optimization level 2, it also works from DDR memory. Maybe the unoptimized code was too inefficient and required more memory accesses than necessary.

    I do still wonder why I get no output at all if the DSP can't keep up with the data. I would expect to hear something, even if incorrect, since some buffers are still getting filled (I use ping-pong buffering).

     

    Regards,

     

    Remco Poelstra