Which area of data in L1D will be written back if the size of potential cached data in MSMC is larger than L1D?



Hello

Suppose a "Matrix multiplication" function computes on InBuffer and OutBuffer. Both buffers reside in MSMC memory, which is configured as SL2. I am aware that in this configuration data in SL2 is cached in L1D, and that software must maintain cache coherency between L1D and MSMC. Suppose OutBuffer is 50KB, which is larger than L1D. When I call the CSL function CACHE_InvL2(OutBuffer, sizeof(OutBuffer), CACHE_WAIT);, does the hardware determine which lines need to be written back and write back only the lines that belong to OutBuffer? Or does it simply write back the entire L1D, because OutBuffer is larger than L1D?

Or maybe the data in OutBuffer is not cached at all, because L1D is read-allocate?

A similar question: what will be invalidated if InBuffer is larger than L1D? Note: InBuffer may already be cached in L1D, because the following three lines of code are inside a "for" loop.

CACHE_invL1d(InBuffer, sizeof(InBuffer), CACHE_WAIT);
Matrix_multiplication(InBuffer, OutBuffer);
CACHE_wbL1d(OutBuffer, sizeof(OutBuffer), CACHE_WAIT);

Thanks.

Xining Yu

  • You ask an interesting question. First, if you look at the User's Guide, you will see that L1D allocates lines only on reads, not on writes. From the User's Guide:

     

    The L1D cache is a read-allocate-only cache. This means that the L1D cache will fetch a complete line of 64 bytes only on a read miss. Write misses are sent directly to L2 through the L1D write buffer. The replacement strategy calls for the least-recently-used (LRU) L1D line to be replaced with the new line. This keeps the most recently-accessed data in the L1D cache at all times.

     

    Now, I am not sure whether your code reads OutBuffer or only writes to it. If it reads OutBuffer, part of the buffer was allocated in L1D. Obviously, at 50KB not all of OutBuffer can be in L1D (32KB), and depending on your algorithm the cache may hold other data as well. In general, the eviction policy is that the least recently used line is evicted first, out of the two cache lines that can hold a given address (L1D is a two-way set-associative cache, so each address can reside in one of two locations in the cache).

    A write-back invalidate (or a plain write-back) writes back to MSMC memory only the lines that are part of OutBuffer and are currently in the L1D cache.

     

    Does that make sense? If you have more questions, ask me; otherwise please close the thread.

     

    Ran
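
To make the reply above concrete, here is a minimal host-side model (not TI code) of a 32KB, two-way set-associative, read-allocate L1D with LRU replacement. The names L1dModel, l1d_read, l1d_write and l1d_wb_range are hypothetical, and the model tracks line addresses only, so treat it as a sketch of the behaviour described rather than of the real hardware. It illustrates that a block write-back over a 50KB range touches only lines that are both cached and inside that range.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 64u
#define NUM_WAYS   2u
#define NUM_SETS   256u              /* 32 KB / 64 B per line / 2 ways */

typedef struct { uint32_t line; int valid; int dirty; } Line;

typedef struct {
    Line     way[NUM_SETS][NUM_WAYS];
    unsigned lru[NUM_SETS];          /* least recently used way of each set */
} L1dModel;

void l1d_reset(L1dModel *c) { memset(c, 0, sizeof *c); }

/* Return the way holding this line, or -1 on a miss. */
static int find_way(const L1dModel *c, uint32_t line) {
    uint32_t set = line % NUM_SETS;
    for (unsigned w = 0; w < NUM_WAYS; ++w)
        if (c->way[set][w].valid && c->way[set][w].line == line)
            return (int)w;
    return -1;
}

/* Read: a miss allocates the line in the LRU way of its set.
   (A dirty victim would be written back by hardware; not modelled.) */
void l1d_read(L1dModel *c, uint32_t addr) {
    uint32_t line = addr / LINE_BYTES, set = line % NUM_SETS;
    int w = find_way(c, line);
    if (w < 0) {
        w = (int)c->lru[set];
        c->way[set][w] = (Line){ line, 1, 0 };
    }
    c->lru[set] = 1u - (unsigned)w;  /* the other way is now LRU */
}

/* Write: returns 1 on a hit (line becomes dirty), 0 on a miss
   (the write bypasses L1D and no line is allocated). */
int l1d_write(L1dModel *c, uint32_t addr) {
    uint32_t line = addr / LINE_BYTES, set = line % NUM_SETS;
    int w = find_way(c, line);
    if (w < 0) return 0;
    c->way[set][w].dirty = 1;
    c->lru[set] = 1u - (unsigned)w;
    return 1;
}

/* Block write-back over [addr, addr + bytes): only dirty lines that are
   cached AND inside the range are written back. Returns their count. */
unsigned l1d_wb_range(L1dModel *c, uint32_t addr, uint32_t bytes) {
    assert(bytes > 0);
    uint32_t first = addr / LINE_BYTES;
    uint32_t last  = (addr + bytes - 1u) / LINE_BYTES;
    unsigned n = 0;
    for (unsigned s = 0; s < NUM_SETS; ++s)
        for (unsigned w = 0; w < NUM_WAYS; ++w) {
            Line *l = &c->way[s][w];
            if (l->valid && l->dirty && l->line >= first && l->line <= last) {
                l->dirty = 0;
                ++n;
            }
        }
    return n;
}
```

For instance, after a reset, reading and then writing one address inside OutBuffer and one address outside it leaves two dirty lines in the model, yet l1d_wb_range over the 50KB buffer reports only the one that belongs to it. A write to an OutBuffer address that was never read returns 0, which mirrors the read-allocate behaviour quoted from the User's Guide.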

     

  • Hi Ran

    Thanks for your reply. You have answered my first question.

    I know the part of the user guide you referenced, and it is another reason I asked the question above. The user guide says that write misses in L1D are sent directly to L2, and MSMC is treated as external memory, so I want to confirm that such write misses are sent to MSMC and not to L2.

    Regards
    Xining

  • Xining

    There is no difference between MSMC memory and DDR in the following sense: if a write request arrives at the L2 memory controller, and the address is outside the core and IS NOT in the L2 cache, then the request is sent on to the MSMC system and is written to MSMC memory (if the address is an MSMC memory address) or to DDR (if the address is a DDR address).

    Please close the thread if this is enough

    Best Regards

    Ran
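
The routing rule in the reply above can be written down as a small decision function. This is only an illustrative sketch: the function and enum names are hypothetical, and the address ranges are assumptions modelled on a C6678-style memory map (local L2 SRAM at 0x00800000, MSMC SRAM at 0x0C000000, DDR from 0x80000000), so check the data manual of your device before relying on them.

```c
#include <stdint.h>

typedef enum {
    DEST_L2_SRAM,     /* address is in the core's own L2 SRAM            */
    DEST_L2_CACHE,    /* external address that currently hits L2 cache   */
    DEST_MSMC_SRAM,   /* forwarded to the MSMC system, lands in MSMC RAM */
    DEST_DDR,         /* forwarded to the MSMC system, lands in DDR      */
    DEST_OTHER        /* anything else (peripherals, reserved, ...)      */
} WriteDest;

/* ASSUMED address ranges (C6678-style); not taken from the thread. */
#define L2_SRAM_BASE  0x00800000u
#define L2_SRAM_SIZE  0x00080000u   /* 512 KB */
#define MSMC_BASE     0x0C000000u
#define MSMC_SIZE     0x00400000u   /* 4 MB   */
#define DDR_BASE      0x80000000u

/* Unsigned wrap-around makes this a single-compare range check. */
static int in_range(uint32_t addr, uint32_t base, uint32_t size) {
    return (addr - base) < size;
}

/* Where does an L1D write miss end up once it reaches the L2 controller? */
WriteDest route_l1d_write_miss(uint32_t addr, int hits_in_l2_cache) {
    if (in_range(addr, L2_SRAM_BASE, L2_SRAM_SIZE)) return DEST_L2_SRAM;
    if (hits_in_l2_cache)                           return DEST_L2_CACHE;
    if (in_range(addr, MSMC_BASE, MSMC_SIZE))       return DEST_MSMC_SRAM;
    if (addr >= DDR_BASE)                           return DEST_DDR;
    return DEST_OTHER;
}
```

With these assumed ranges, a write miss to 0x0C000100 that is not held in L2 cache is routed to MSMC SRAM, which is the case asked about.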
  • Last question.

    In the C66x Cache User Guide, the pseudo-code of Example 2-4 is shown below.

    It implements ping-pong buffering in external memory, similar to mine. In our application the "DMA_transfer" function is performed by EDMA. My question is: is CACHE_WAIT necessary when invalidating InBuff? As I see it, "process" does not use the invalidated buffer until the next call to "process". So if invalidating InBuff takes less time than the interval between issuing the invalidate and the call to "process" that uses the invalidated InBuff, I can use CACHE_NOWAIT. Is that correct?

    Thanks.

    Xining Yu

  • You are right, but you should be careful as well.

    Without getting into the details of the example: if you are using buffer A, which was moved previously, you do not need to wait until buffer B has been completely moved. In that sense you are right, BUT

    you need to guarantee that the move of buffer A has completely finished before you start processing buffer A. Yes, it was moved during the previous processing step, but you need to be sure. There are several ways to ensure that buffer A is completely in memory, so pick one (you can use an EDMA event, or play other tricks).

    If you have any other question do not hesitate to submit it, but please submit it in a separate posting, so we can close this one

    Best regards

    Ran
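
One way to read the advice above is: issue the invalidate with CACHE_NOWAIT, but put an explicit wait in front of the "process" call that consumes the buffer. The sketch below runs on a host PC, so the cache and DMA calls are stand-ins that only record which buffer still has an operation in flight. CACHE_invL1d and CACHE_invL1dWait mimic the CSL names (on the target you would include the CSL cache header instead, and should verify the exact signatures there), while dma_start, dma_wait and run_pingpong are hypothetical helpers standing in for the EDMA completion check.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_LEN 64

typedef enum { CACHE_NOWAIT = 0, CACHE_WAIT = 1 } CACHE_Wait;

/* Host-side stand-ins: remember which buffer has an operation in flight. */
static const void *inv_pending = NULL;   /* block invalidate not finished */
static const void *dma_pending = NULL;   /* DMA transfer not finished     */

static void CACHE_invL1dWait(void) { inv_pending = NULL; }
static void CACHE_invL1d(void *blockPtr, unsigned byteCnt, CACHE_Wait wait) {
    (void)byteCnt;
    inv_pending = blockPtr;
    if (wait == CACHE_WAIT) CACHE_invL1dWait();
}
static void dma_start(void *dst) { dma_pending = dst; }   /* hypothetical */
static void dma_wait(void)       { dma_pending = NULL; }  /* hypothetical */

/* Processing a buffer whose invalidate or DMA is still in flight is the
   hazard the reply warns about, so the model traps it. */
static void process(const short *in) {
    assert((const void *)in != inv_pending);
    assert((const void *)in != dma_pending);
}

/* Ping-pong loop: returns the number of blocks processed. */
int run_pingpong(int nblocks) {
    static short InBuff[2][BUF_LEN];
    int processed = 0;

    /* Prime the first buffer; nothing to overlap with yet, so wait. */
    CACHE_invL1d(InBuff[0], sizeof InBuff[0], CACHE_WAIT);
    dma_start(InBuff[0]);

    for (int i = 0; i < nblocks; ++i) {
        short *cur  = InBuff[i & 1];
        short *next = InBuff[(i + 1) & 1];

        dma_wait();            /* cur has completely arrived in memory    */
        CACHE_invL1dWait();    /* the no-wait invalidate of cur is done   */

        if (i + 1 < nblocks) { /* overlap the next fill with processing   */
            CACHE_invL1d(next, sizeof InBuff[0], CACHE_NOWAIT);
            dma_start(next);
        }
        process(cur);
        ++processed;
    }
    return processed;
}
```

The point of the structure is that the two waits sit immediately before process(cur), so correctness does not depend on the invalidate or the transfer happening to finish in time.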
  • Thanks Ran.