AWR1843: The best setting for L1P and L1D.

Hitoshi Sugawara

Part Number: AWR1843

Hi Experts,

Could you please let me know how to optimize the cache size settings for L1P and L1D on the AWR1843?

Is there a tool to measure the number of cache hits in a customer's application?

Thank you and best regards,
Hitoshi

over 5 years ago

0 Cesar over 5 years ago

TI__Guru*** 138329 points

Hi,

Optimizing cache use has always been a concern on DSP-based platforms. In general, the larger the cache, the better the performance of the system.

However, some code and buffers have to be placed in the fastest program and data memory.

For the program cache, the size should be the maximum the implementation can afford. If some code is required to be in the fastest memory (L1P), then part of L1P should be allocated as memory and the rest as cache. The larger the cache, the better the performance.

For the data cache, there are processing buffers that must be placed in the fastest data memory (L1D). Usually these are the FFT processing buffers. The rest of L1D should be allocated as cache.
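
As a rough illustration of the sizing rule above (reserve the SRAM you must have, then give everything else to cache), here is a minimal host-side sketch in C. It assumes the L1 cache can only be configured as 0, 4, 8, 16 or 32 KB, and the function name and the 32 KB total used in the usage note are illustrative assumptions, not SDK definitions. Please check the sizes against the device documentation before relying on it.

```c
#include <stddef.h>

/* Assumed set of configurable L1 cache sizes in KB, largest first. */
static const unsigned kL1CacheSizesKb[] = { 32, 16, 8, 4, 0 };

/*
 * Hypothetical helper: return the largest L1 cache size (in KB) that still
 * leaves at least sram_needed_kb of the total_kb L1 memory usable as SRAM.
 * Returns -1 if the SRAM request is larger than the whole L1 memory.
 */
int largest_l1_cache_kb(unsigned total_kb, unsigned sram_needed_kb)
{
    size_t i;
    size_t count = sizeof kL1CacheSizesKb / sizeof kL1CacheSizesKb[0];

    if (sram_needed_kb > total_kb) {
        return -1;
    }
    for (i = 0; i < count; ++i) {
        unsigned cache_kb = kL1CacheSizesKb[i];
        /* Accept the first (largest) size that leaves enough SRAM. */
        if (cache_kb <= total_kb && total_kb - cache_kb >= sram_needed_kb) {
            return (int)cache_kb;
        }
    }
    return -1;
}
```

For example, with an assumed 32 KB of L1D and 10 KB of FFT buffers that must sit in L1D SRAM, `largest_l1_cache_kb(32, 10)` picks a 16 KB cache, leaving 16 KB as SRAM.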

I am checking with the CCS/Emulation team whether there are capabilities to detect cache misses at run time.

Thank you
Cesar

0 Piyush_ over 5 years ago

TI__Expert 5235 points

The optimization you are referring to concerns cycles, which is less of a problem unless you put code or data into L3 memory, which is slower than L2; i.e., the cache miss penalty to L3 is much higher than the miss penalty to L2 [L2 is part of the C674x megamodule, L3 is external to the megamodule]. For example, in the 2.1 SDK out-of-box (OOB) demo on 16xx, we don't put any real-time code in L3, and all data in L3 (radar cube, detection matrix, etc.) is paged in and out of L2 SRAM (or L1D SRAM if part of L1D is allocated as SRAM), so every L1D access goes to L2, not L3. If your customer wants to put any data in L3 RAM and access it through the cache [instead of using EDMA to L2; this is sometimes convenient (less complex code) in inter-frame processing], then make sure to set the MAR registers correctly [locate the section "MARs ( DSP/C674x) " in the SDK user guide].

In the demo, we did not configure full-sized L1 caches because we were tight on L2 (and L3) memory and wanted to free up some extra memory. By using only half of L1D and L1P as cache, we get an extra 16 KB of data memory and 16 KB of program memory. The code and data allocated to the 16 KB L1D and L1P SRAMs were chosen to recover the cycles lost by using smaller caches. We mention this in the SDK user guide, in the section "Guidelines on optimizing memory usage" [item #6].

However, the same split cache vs. SRAM strategy may actually improve cycles compared to the full-cache case, depending on the access patterns in the code and the specific areas of code you are interested in optimizing. See also the "Discussion on Cache Strategy" section in www.ti.com/.../swra552.pdf.
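
To make the MAR point above a bit more concrete, here is a small host-side sketch in C that works out which MAR registers cover a given address range. It rests on assumptions to verify against the C674x megamodule documentation and the SDK user guide: MAR0 sits at 0x01848000, each MAR covers a 16 MB region, and bit 0 is the "permit caching" bit. The L3 base address in the usage note is likewise an assumption, and all function names are hypothetical.

```c
#include <stdint.h>

/* Assumed C674x values; verify against the device documentation. */
#define MAR_BASE_ADDR    0x01848000u  /* assumed address of MAR0 */
#define MAR_REGION_SHIFT 24u          /* assumed: one MAR per 16 MB region */
#define MAR_PC_BIT       0x1u         /* assumed: bit 0 permits caching */

/* Index of the MAR register that covers the given address. */
unsigned mar_index(uint32_t addr)
{
    return (unsigned)(addr >> MAR_REGION_SHIFT);
}

/* Address of the MAR register with the given index. */
uint32_t mar_reg_addr(unsigned index)
{
    return MAR_BASE_ADDR + 4u * (uint32_t)index;
}

/*
 * First and last MAR index covering [base, base + size_bytes).
 * size_bytes must be non-zero.
 */
void mar_range(uint32_t base, uint32_t size_bytes,
               unsigned *first, unsigned *last)
{
    *first = mar_index(base);
    *last = mar_index(base + size_bytes - 1u);
}

/*
 * On the target only (not on a host PC), caching for a region could then
 * be enabled with a write such as:
 *   *(volatile uint32_t *)mar_reg_addr(i) |= MAR_PC_BIT;
 */
```

If L3 is assumed to start at 0x20000000 from the DSP's point of view, `mar_index(0x20000000u)` gives 32, so under these assumptions a 1 MB L3 buffer would be governed by a single MAR register.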

0 Hitoshi Sugawara over 4 years ago in reply to Piyush_

TI__Mastermind 18650 points

Hi Cesar and team,
Thank you for the kind explanation.
It helps a lot.
I will keep studying.

Best regards,
Hitoshi
