[OMAP3530] Performance degradation when data is placed in internal memory

Hi,

In my DSP code on the OMAP3530, I am seeing that when I place certain data (some buffers) and program code (some functions) in on-chip memory, performance is worse than with the same data and program in external memory with caching enabled. Can somebody tell me what the reason for this could be?

Is it because the data buffers are placed in the same banks of internal memory, so that the CPU stalls when accessing the same memory bank?

Please help me resolve this.

 

Thanks,

Manoj

  • This is not a typical situation; generally, code and data in internal memory will be faster than if they were placed externally. I could see this happening to some extent if your code/data were in L2 and its alignment on the L1 cache lines differed from the external-memory placement to the point that you end up with L1 cache thrashing, but this also assumes that your code/data is small enough to fit entirely within the L1 cache space. Another possible explanation would be that your code/data is in L2 and you disabled the L1 caches altogether. This would cause a significant performance loss, as the L2 memory runs slower than the L1 memory (1/2 clock speed, I believe), so losing the L1 cache would hurt even if all your code/data were in internal L2 memory.

    In general, as your code base on the DSP grows larger, you will get better performance by putting your most commonly accessed code and data sections into internal memory with as much cache still enabled as possible. However, if external code/data is laid out in such a way that the L1 cache takes it all in efficiently, then the difference can be minimal.
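To illustrate the thrashing scenario described above, here is a minimal sketch of a toy direct-mapped cache. The 16 KB cache size, 32-byte line size, and the two addresses are assumptions chosen for illustration, not measured OMAP3530 parameters, and `count_misses` and `alternating` are hypothetical helper names.

```python
def count_misses(addresses, cache_bytes=16 * 1024, line_bytes=32):
    """Count misses in a toy direct-mapped cache (illustrative sizes only)."""
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines          # one stored tag per cache line
    misses = 0
    for addr in addresses:
        line_addr = addr // line_bytes
        index = line_addr % num_lines  # cache line this address maps to
        tag = line_addr // num_lines   # identifies which memory block is held
        if tags[index] != tag:
            misses += 1
            tags[index] = tag          # evict whatever was there before
    return misses


def alternating(addr_a, addr_b, iterations):
    """Access pattern of a loop that touches two locations in turn."""
    trace = []
    for _ in range(iterations):
        trace.append(addr_a)
        trace.append(addr_b)
    return trace


# Hypothetical placement: the second address is exactly one cache size away
# from the first, so both map to the same line and evict each other.
conflicting = count_misses(alternating(0x0000, 0x4000, 100))

# Shifting the second address by one line removes the conflict.
shifted = count_misses(alternating(0x0000, 0x4000 + 32, 100))

print(conflicting, shifted)
```

In this model the conflicting placement misses on every access, while the shifted placement misses only on the first touch of each location. A change in where the linker places a section can move real code or data between these two situations.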

  • Thanks for your reply.

    As you indicated, most of my code and data is in external memory, and what I am trying to do is put the most cycle-consuming parts (data and program) in internal memory.

    Initially the full 32K of L1P was configured as cache. I then placed some code in internal memory and reduced the L1P cache to 16K, but this degraded the performance.

     

    Does this mean that having the full 32K of L1P as cache gives better performance than using 16K as cache and the remaining 16K as internal RAM? Is this the expected behaviour?

    I experienced the same thing with L1D.

    Thanks,

    Manoj

  • Manoj R said:
    Does this mean that having the full 32K of L1P as cache gives better performance than using 16K as cache and the remaining 16K as internal RAM? Is this the expected behaviour?

    This is very system dependent, but if much of your code/data accesses DDR, then you are correct: having the 32K L1 as cache can improve performance over 16K of L1 cache and 16K of L1 RAM. It really depends on how often you access DDR locations relative to how often you access what you put in the 16K of L1 RAM.
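The trade-off described above can be sketched as a back-of-the-envelope average-cost model. Every number below (hit rates, the 50-cycle miss penalty, the fraction of accesses served from L1 RAM) is a made-up assumption for illustration, not a measurement, and `average_cycles` is a hypothetical helper.

```python
def average_cycles(ram_fraction, cache_hit_rate,
                   ram_cycles=1.0, hit_cycles=1.0, miss_cycles=50.0):
    """Average cycles per access for a split L1 RAM / L1 cache setup.

    ram_fraction:   share of accesses that go to buffers placed in L1 RAM
    cache_hit_rate: L1 hit rate for the remaining (DDR-backed) accesses
    All cycle costs are illustrative assumptions.
    """
    cached = cache_hit_rate * hit_cycles + (1.0 - cache_hit_rate) * miss_cycles
    return ram_fraction * ram_cycles + (1.0 - ram_fraction) * cached


# All 32K as cache: nothing in L1 RAM, assumed 98% hit rate.
full_cache = average_cycles(ram_fraction=0.0, cache_hit_rate=0.98)

# 16K cache + 16K RAM, but only 30% of accesses hit the RAM and the
# smaller cache is assumed to drop to a 95% hit rate.
split_low_use = average_cycles(ram_fraction=0.3, cache_hit_rate=0.95)

# Same split, but 80% of accesses hit what was placed in L1 RAM.
split_high_use = average_cycles(ram_fraction=0.8, cache_hit_rate=0.95)

print(full_cache, split_low_use, split_high_use)
```

Under these assumed numbers, the split configuration is slower than the full cache when only 30% of accesses go to L1 RAM and faster when 80% do. Profiling the real application is the only way to find out which side of that line it falls on.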