
AWR1642: How to place code in L3 and run efficiently?

Part Number: AWR1642

Hi champion,
   In the mmw demo in the SDK, all the runtime code runs from L2. Only the initialization code (section .overlay) runs once from L3 and is then overwritten by the L3 radar cube. In the customer's real use case, L2 is not large enough for all the code, so they need to place some runtime code in L3. Could you help check how to support this? In particular, how can code be run efficiently from L3? Is it necessary to enable the L2 cache and change the MAR registers to make the code section in L3 cacheable?

Thank you,
Adam

  • Hi,

    Enabling cacheability of L3 will definitely help.

    Profiling the code and allocating the most frequently executed sections to L1 or L2 will help as well.

    Thank you
    Cesar
  • Hi Cesar,
    If we do not enable the L2 cache but do enable cacheability for the L3 memory, can L1P cache code from L3 directly?

    Thank you,
    Adam
  • Yes

    Thank you
    Cesar
  • Hi Cesar,
       Thanks a lot. I am clear now.

    Best regards,
    Adam

  • Hi Adam,
    As mentioned in the section "EDMA versus Cache based Processing" of the mmwave demo doxygen, the MAR corresponding to the L3 range is not enabled for cacheability, i.e., it is left in its chip-default state. You can enable it for higher program-execution efficiency from L3, but this also means that any data in L3 will get cached as well. Any L3 data the DSP accesses directly (versus using EDMA) may then need cache write-back/invalidate operations when the DSP is writing to or reading from any non-DSP entity (e.g., mailbox, EDMA, UART), i.e., the software must take care of cache coherency operations where needed.

    In the current demo there probably are no such cases (I haven't evaluated this thoroughly, hence "probably"), because we access L3 using EDMA and the DSP only accesses L1/L2 for data. Even the payloads sent from L3 to UART that way are probably OK, because the DSP simply passes the pointer and does not do read-modify-write on the L3 data. I am just making you aware in case the customer's application code has different usage patterns for data in L3.
  • Hi Piyush,
       Thanks a lot for your reply. I am clear now:)

    Best regards,
    Adam

  • Hi Adam,

     I dug a bit more into this. Because MAR is a register in the L2 configuration, I was curious whether L1D is also influenced by MAR; my recollection was that it is, but I was not sure. The megamodule document (http://www.ti.com/lit/ug/sprufk5a/sprufk5a.pdf) clearly states, as quoted below, that L1D is also influenced by MAR:

    --------------

    4.3.7.3 L1 Interaction
    When L1P or L1D makes a request to L2 for an address that is not held in L2 RAM or L2 cache, the L2
    controller queries the corresponding MAR register for that address. If the permit copies (PC) bit in the
    MAR register is 0, the L2 cache controller treats this as a non-cacheable access and initiates a
    long-distance access. If the access is a long distance read, the CPU stalls until the read data returns and
    the L1D will write-back dirty data if present in the LRU cache set that matches the non-cacheable memory
    address.
    Concerning L1D long distance requests, the net result of the PC bit in the MAR is to prevent
    non-cacheable data from being stored in the L2 and L1D caches. Thus, when PC = 0 in a given MAR
    register, neither the L1D nor the L2 cache retains a copy of data accessed within the address range
    covered by that MAR.
    The MAR registers have no effect on L1P. If L1P is enabled, it will always cache program fetches
    regardless of MAR configuration.

    ----------------------------------

    Note also the last statement: if you only care about efficient program access (not data access) from L3, with no L2 cache configured (as you know, the demo does not enable the L2 cache in order to maximize SRAM), then it is already efficient, i.e., there is no need to program MAR. And from the first statement ("When L1P or L1D makes a request to L2..."), we can see that MAR does affect program access with respect to L2: if cacheability is not set and a non-zero L2 cache is configured, program accesses will not be cached in L2; they will be cached only in L1P.

  • Hi Piyush,
       So if the MAR PC bit is not enabled for the L3 memory address range,

    1. data on L3 will not be cached by either L1P or L2 cache.
    2. code on L3 will be cached by L1P, but not by L2 cache.

       Do I understand it correctly?

    Thank you,
    Adam

  • Correct, except for a minor correction in point #1: you meant "L1D or L2 cache", not "L1P or L2 cache".

    Also, a small nuance in word usage: we often use the word "code" to mean program (instruction) accesses, but code can also include read-only (.const) data tables, which are data accesses; such tables are generally considered part of the code. If MAR is not enabled, this read-only data will not get cached (because it goes through L1D, not L1P) and may cause performance degradation depending on the access patterns. While we need not worry about read-only data in terms of coherency operations (no write-back or invalidate is needed), it will suffer a performance penalty if MAR is not set, unlike instruction accesses. So it is preferable to say "instruction" instead of "code" when referring strictly to program accesses, but I also use "code" casually to mean instructions, so I need to follow my own advice first :-).

  • Hi Piyush,
     Yes, that was a typo in point #1.
     Great to know the details about code :-) Very helpful.

    Thanks and best regards,
    Adam