IPNC DM8127: DSP L2 SRAM efficiency

Hi everyone,

I'm using IPNC DM8127 with IPNC_RDK ver 3.5.

I've already implemented my own algorithm in algLink, but now I've run into a performance problem.

In my algorithm, the data are stored in the DSP_DATA section, which is in DDR with the DSP cache enabled. But the performance of my algorithm still doesn't meet my needs.
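
For illustration, the placement looks roughly like this in my code (the buffer name and size here are simplified, and DSP_DATA is mapped to DDR in the linker command file):

    /* Illustrative only: a working buffer placed in the DSP_DATA section,
     * which the linker command file maps to DDR.                          */
    #pragma DATA_SECTION(workBuf, "DSP_DATA")
    static unsigned char workBuf[1024 * 1024];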

As far as I know, the L2 memory (256K) on the DSP is configured as 128K cache and 128K SRAM by default. Here are my questions:

1. If I configure the L2 memory as 256K cache, will it greatly improve my algorithm's performance?

2. What's the difference between using the L2 cache directly and manually copying my data into L2 SRAM before processing it? If I use EDMA to copy my data into L2 SRAM and then process it there, will that be more efficient than letting the cache do the job?

3. If my data are much larger than 256K, will it be efficient to copy the data into and out of L2 SRAM in pieces? What if I configure the L2 memory as 256K SRAM and 0K cache? Will something bad happen since the L2 cache is then disabled?

Thank you for your patience.

regards,

Zhao

 

  • I will notify our IPNC RDK team for help.

    Regards,
    Pavel

  • While I'm not familiar with the IPNC RDK, I can make some general statements here:

    If you can predict in advance which data you're going to need next (which is probably often the case when processing data in a streaming fashion), then using L2 as SRAM should be more efficient: you can build a "pipelining" construction in which EDMA transfers the next block of data while the DSP processes the previous block in parallel, completely hiding the cost of the data transfer.

    The cache, on the other hand, cannot see into the future, nor does it support any kind of background preloading, so if the algorithm has poor data locality (also often the case when processing data in a streaming fashion) the DSP will spend much time stalled waiting on cache linefills.

    The downside of such a pipelining construction is that it complicates the structure of the software, and it only applies when the data-access pattern is regular enough to allow starting the EDMA transfers in advance; waiting synchronously for EDMA to complete would probably result in worse performance than relying on the cache.
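
    To make that concrete, here is a minimal double-buffering sketch. The EDMA helpers (dma_submit_copy, dma_wait), the section name and the block size are placeholders for whatever your EDMA3 driver and linker command file actually provide; the point is only to show the transfer overlapping the computation:

        #include <stdint.h>

        #define BLOCK_SIZE  (16 * 1024)   /* bytes per block, illustrative */

        /* Ping-pong buffers in L2 SRAM; ".l2ram_data" must be mapped to
         * the L2RAM range in the linker command file.                     */
        #pragma DATA_SECTION(l2_buf, ".l2ram_data")
        static uint8_t l2_buf[2][BLOCK_SIZE];

        /* Placeholders for your EDMA3 driver: start an asynchronous copy,
         * wait for the previously started copy to complete.               */
        extern void dma_submit_copy(void *dst, const void *src, uint32_t bytes);
        extern void dma_wait(void);

        extern void process_block(uint8_t *blk, uint32_t bytes);  /* your algorithm */

        /* Assumes total_bytes is a multiple of BLOCK_SIZE for brevity. */
        void process_stream(const uint8_t *ddr_src, uint32_t total_bytes)
        {
            uint32_t nblocks = total_bytes / BLOCK_SIZE;
            uint32_t i, cur = 0;

            /* Prime the pipeline: fetch block 0 into buffer 0. */
            dma_submit_copy(l2_buf[cur], ddr_src, BLOCK_SIZE);

            for (i = 0; i < nblocks; i++)
            {
                dma_wait();                       /* block i has arrived in L2 */

                /* Start fetching block i+1 into the other buffer before we
                 * compute, so the transfer overlaps the processing.         */
                if (i + 1 < nblocks)
                    dma_submit_copy(l2_buf[cur ^ 1],
                                    ddr_src + (i + 1) * BLOCK_SIZE,
                                    BLOCK_SIZE);

                process_block(l2_buf[cur], BLOCK_SIZE);   /* work on block i */
                cur ^= 1;                                 /* swap ping/pong  */
            }
        }

    The same structure works in the other direction for writing results back to DDR: start the EDMA write-back of block i while the DSP computes block i+1.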

    Zhao Shui said:
    What if I configure the L2 memory as 256K SRAM and 0K cache? Will something bad happen since the L2 cache is then disabled?

    I have DSP code running with both L2 and L1D disabled without any problems, but it is a relatively simple case since all of its "hot" data fits in L1D and all of its code in L1P.  The latter is a point to take into consideration: L2 cache is used for both code and data.  If the hot parts of your code do not fit into L1P then you're going to have costly stalls on instruction fetching if you disable L2 (or I should say, even costlier, since an L1 fill from L2 is already quite costly compared to an L1 hit).
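
    If you do shrink or disable the L2 cache, the hot code and data therefore have to be placed in on-chip RAM explicitly, or they will be fetched straight from DDR. A rough sketch using the TI compiler's section pragmas (the section names, the filter function and its sizes are purely illustrative; the sections must be mapped onto L2RAM, or L1 SRAM, in the linker command file, and the cache split itself is configured separately, e.g. through the Cache module of your SYS/BIOS/RDK build):

        #include <stdint.h>

        /* Illustrative section names; map them onto on-chip RAM in the
         * linker command file, e.g.  .fastcode > L2RAM
         *                            .fastdata > L2RAM                   */
        #pragma CODE_SECTION(filter_kernel, ".fastcode")
        #pragma DATA_SECTION(coeffs, ".fastdata")

        static int16_t coeffs[64];

        /* Hot inner loop; caller must provide n + 63 input samples. */
        void filter_kernel(const int16_t *in, int16_t *out, uint32_t n)
        {
            uint32_t i, k;
            for (i = 0; i < n; i++)
            {
                int32_t acc = 0;
                for (k = 0; k < 64; k++)
                    acc += (int32_t)coeffs[k] * in[i + k];
                out[i] = (int16_t)(acc >> 15);
            }
        }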

  • Hi Matthijs,

    Thank you for taking the time to write such a clear explanation; your answer is exactly what I wanted. ☺

    regards,

    Zhao