AWR1843: The best setting for L1P and L1D.

Part Number: AWR1843

Hi Experts,

Could you please let me know the way to optimize the cache size setting for L1P and L1D in AWR1843?

Is there any tool to check the cache hit number with a customer's application?

Thank you and best regards,
Hitoshi

  • Hi,

    Optimizing cache use has always been a concern on DSP-based platforms. Usually, the larger the cache, the better the performance of the system.

    However, there are code and buffers that have to be allocated in the fastest code and data memory.

    For the program cache, the size should be the maximum the implementation can afford. If there is code that is required to be in the fastest memory (L1P), then part of L1P should be allocated as memory. The rest should be allocated as cache. The larger the cache, the better the performance.

    For the data cache, there are processing buffers which must be allocated in the fastest data memory (L1D). Usually these are the FFT processing buffers. The rest of L1D should be allocated as cache.
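
    As a rough illustration of the split described above, here is a minimal C sketch that picks the largest cache size still leaving room for the code or buffers that must live in L1 memory. It assumes a 32 KB L1P or L1D and that the cache can only be configured as 0, 4, 8, 16 or 32 KB; both figures should be checked against the C674x documentation for the device. The function name `pick_l1_cache_kb` is hypothetical and is not part of any SDK.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Assumed selectable L1 cache sizes in KB, largest first.
     * Verify against the C674x cache documentation for your device. */
    static const unsigned l1_cache_options_kb[] = { 32u, 16u, 8u, 4u, 0u };

    /* Hypothetical helper: return the largest cache size (KB) that still
     * leaves sram_needed_kb of the L1 memory available as plain SRAM. */
    unsigned pick_l1_cache_kb(unsigned l1_total_kb, unsigned sram_needed_kb)
    {
        size_t i;
        unsigned free_kb;
        const size_t n = sizeof l1_cache_options_kb / sizeof l1_cache_options_kb[0];

        /* The SRAM requirement cannot exceed the whole L1 memory. */
        assert(sram_needed_kb <= l1_total_kb);
        free_kb = l1_total_kb - sram_needed_kb;

        for (i = 0; i < n; ++i) {
            if (l1_cache_options_kb[i] <= free_kb) {
                return l1_cache_options_kb[i];
            }
        }
        return 0u; /* not reached: the option list ends with 0 */
    }
    ```

    For example, with FFT buffers needing 16 KB of a 32 KB L1D, `pick_l1_cache_kb(32, 16)` leaves a 16 KB cache; with 20 KB of buffers the cache drops to 8 KB and 4 KB of L1D is left unused, which may be a reason to trim the buffers instead.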

    I am checking with the CCS/Emulation team whether there are capabilities to detect cache misses at run time.

    Thank you
    Cesar
  • The optimization that you are referring to is about cycles, which is less of a problem unless you are putting code or data into L3 memory. L3 is slower than L2, i.e. the cache miss penalty to L3 is much higher than the miss penalty to L2 [L2 is part of the C674x megamodule, L3 is external to the megamodule].

    For example, in the 2.1 SDK oob demo on 16xx, we don't put any real-time code in L3, and all data in L3 (radar cube, detection matrix, etc.) is paged in and out of L2 SRAM (or L1D SRAM if part of L1D is allocated as SRAM), so all L1D cache accesses go to L2, not L3. If your customer wants to put any data in L3 RAM and access it through the cache [instead of using EDMA to L2; sometimes this is convenient (less complex code) in inter-frame processing], then make sure to set the MAR register correctly [locate section "MARs ( DSP/C674x) " in the SDK user guide].

    In the demo, we did not configure full-sized L1 caches because we were tight on L2 (and L3) memory and wanted to free up some extra memory. By using only half of L1D and L1P as caches, we get an extra 16 KB of data memory and 16 KB of program memory. The code and data allocated to the 16 KB L1D and L1P SRAMs were chosen to recover the cycles lost by using the smaller caches. We mention this in the SDK user guide in section "Guidelines on optimizing memory usage" [item #6]. However, the same split cache vs. SRAM strategy may actually improve cycles compared to the full-cache case, depending on the access patterns in the code and any specific areas of code you are interested in optimizing. See also the "Discussion on Cache Strategy" section in www.ti.com/.../swra552.pdf.
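
    To make the MAR remark above more concrete, here is a minimal C sketch of how an address maps to its MAR register. It assumes the usual C674x megamodule layout: one MAR per 16 MB region, the MAR register file starting at 0x01848000, and bit 0 (PC) enabling caching for the region. Please verify all three against the megamodule reference guide and the SDK user guide. The helper names are hypothetical, and the sketch only computes indices and addresses; on the target, the actual register write would normally go through the RTOS cache module or the SDK configuration.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Assumed C674x values; verify against the megamodule reference guide. */
    #define MAR_BASE_ADDR     0x01848000u  /* address of MAR0                 */
    #define MAR_REGION_SHIFT  24u          /* each MAR covers 16 MB = 2^24 B  */
    #define MAR_PC_BIT        0x1u         /* set: region may be cached       */

    /* Hypothetical helper: index of the MAR that controls 'addr'. */
    static uint32_t mar_index(uint32_t addr)
    {
        return addr >> MAR_REGION_SHIFT;
    }

    /* Hypothetical helper: address of the MAR register that controls 'addr'. */
    static uint32_t mar_reg_addr(uint32_t addr)
    {
        return MAR_BASE_ADDR + 4u * mar_index(addr);
    }

    /* Hypothetical helper: how many MARs a buffer [start, start + size) spans.
     * Every one of them must have its PC bit set for the whole buffer to be
     * cacheable. */
    static uint32_t mar_count(uint32_t start, uint32_t size_bytes)
    {
        uint32_t last;

        assert(size_bytes > 0u);
        last = start + (size_bytes - 1u);
        assert(last >= start); /* the buffer must not wrap around 4 GB */

        return mar_index(last) - mar_index(start) + 1u;
    }
    ```

    For instance, if L3 RAM is seen by the DSP at 0x20000000 (an assumed address, to be checked in the device memory map), `mar_index(0x20000000u)` gives MAR 32, so that is the register whose PC bit decides whether L3 accesses are cached.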
  • Hi Cesar and team,
    Thank you for the kind explanation.
    It helps a lot.
    I will keep studying.

    Best regards,
    Hitoshi