L1D and LL2 cache coherency when using EDMA3

Processor used: C6472.

We suspect we may be hitting an L1D-to-L2 cache coherency issue, but we are not sure.

We use all of L1D as cache. We have a large temporary processing buffer in LL2, and the final output buffer is in DDR2. We use EDMA3 to transfer the processed output from LL2 to DDR2 and then clear the LL2 buffer for the next round of processing.

We are seeing an intermittent artifact. What we are not sure about is whether L1D could somehow be overwriting the cleared LL2 buffer, that is, whether stale L1D contents get written back over LL2 when L1D needs to allocate that line for something else.

However, reading SPRU871, it doesn't look like that should be happening.

This is what I am referring to in SPRU871.

3.3.6 Cache Coherence Protocol

The C64x+ L1D cache remains coherent with respect to DMA activity in L2 RAM. To support this
paradigm, the L1D cache accepts cache coherence commands arriving from L2.

3.3.6.1 L2 to L1D Cache Coherence Protocol

To support L1D cache coherence with respect to DMA/IDMA traffic in L2 RAM, the L1D controller supports
two cache coherence commands arriving from L2: snoop-read (SNPR) and snoop-write (SNPW). The L2
only sends these snoop commands, when necessary, in response to DMA and IDMA activity in L2 RAM.
Snoop-read is sent to L1D when L2 detects that the L1D cache holds the requested line, and that the line
is dirty. L1D responds by returning the requested data.
Snoop-write is sent to L1D when L2 detects that the L1D holds the requested line. It does not matter if the
line is modified within L1D. The L1D updates its contents accordingly.
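As a toy model of the two snoop commands quoted above, the following sketch tracks a single L1D line and the L2 RAM line behind it. It is not TI code and does not touch any hardware: all names (`ToySystem`, `dma_read`, `dma_write`, and so on) are hypothetical, and the model assumes that a snoop leaves the line's dirty bit unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 64u /* C64x+ L1D line size in bytes */

/* Toy model: a single L1D line caching a single line of L2 RAM. */
typedef struct {
    bool    valid;
    bool    dirty;
    uint8_t data[LINE_BYTES];
} L1dLine;

typedef struct {
    uint8_t l2_ram[LINE_BYTES]; /* backing line in L2 RAM */
    L1dLine l1d;                /* L1D copy of that line, if any */
} ToySystem;

void toy_init(ToySystem *s) { memset(s, 0, sizeof *s); }

/* CPU read miss: L1D allocates the line from L2 RAM. */
void cpu_read_miss(ToySystem *s)
{
    memcpy(s->l1d.data, s->l2_ram, LINE_BYTES);
    s->l1d.valid = true;
    s->l1d.dirty = false;
}

/* CPU write hit: only the L1D copy changes, so L2 RAM goes stale. */
void cpu_write_hit(ToySystem *s, unsigned off, uint8_t value)
{
    assert(s->l1d.valid && off < LINE_BYTES);
    s->l1d.data[off] = value;
    s->l1d.dirty = true;
}

/* DMA read of L2 RAM: snoop-read is needed only for a held, dirty line. */
uint8_t dma_read(const ToySystem *s, unsigned off)
{
    assert(off < LINE_BYTES);
    if (s->l1d.valid && s->l1d.dirty)
        return s->l1d.data[off];  /* SNPR: L1D returns the data */
    return s->l2_ram[off];
}

/* DMA write to L2 RAM: snoop-write whenever L1D holds the line. */
void dma_write(ToySystem *s, unsigned off, uint8_t value)
{
    assert(off < LINE_BYTES);
    s->l2_ram[off] = value;
    if (s->l1d.valid)
        s->l1d.data[off] = value; /* SNPW: L1D updates its copy */
}

/* Eviction: a dirty victim is written back to L2 RAM (model assumption:
 * the dirty bit was not changed by any earlier snoop). */
void l1d_evict(ToySystem *s)
{
    if (s->l1d.valid && s->l1d.dirty)
        memcpy(s->l2_ram, s->l1d.data, LINE_BYTES);
    s->l1d.valid = false;
    s->l1d.dirty = false;
}
```

In this model, a byte written by the CPU is visible to a DMA read through the snoop-read path, and a later DMA write reaches both L2 RAM and the L1D copy, so a subsequent eviction writes back the DMA-written value rather than the old CPU value.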

I would appreciate any help you can provide in clarifying the above.

Thanks, 

Somnath Banik

  • If you read further into the document (Section 4.3.8.1), you'll see that a snoop read returns the data held in L1D to L2 ahead of the L2 -> DDR EDMA transfer, but it does not update the dirty/valid/LRU state of the cache line. So if the line was dirty and is later evicted to make room for something else, it would still be written back to the L2 space.

    I'd suggest performing a writeback prior to the EDMA transfer; that way the line is no longer dirty and won't be written back if it is evicted.

    Best Regards,

    Chad
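A block writeback acts on whole L1D lines (64 bytes on C64x+), so before issuing one ahead of the EDMA transfer it can help to see exactly which lines a buffer spans. The sketch below only computes that line-aligned range on plain integers. `l1d_line_range` and `LineRange` are hypothetical names, and the actual writeback call (whatever block-writeback function your CSL or BIOS cache API provides) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define L1D_LINE_BYTES 64u /* C64x+ L1D line size in bytes */

/* Line-aligned range that a block cache operation would cover. */
typedef struct {
    uintptr_t start; /* address of the first line touched */
    size_t    bytes; /* length, a multiple of L1D_LINE_BYTES (0 if empty) */
} LineRange;

/* Hypothetical helper: expand [addr, addr + len) to whole L1D lines. */
LineRange l1d_line_range(uintptr_t addr, size_t len)
{
    const uintptr_t mask = ~(uintptr_t)(L1D_LINE_BYTES - 1u);
    LineRange r = { addr, 0 };

    if (len == 0)
        return r;                         /* nothing to write back */
    assert(addr + (len - 1u) >= addr);    /* no address wrap-around */

    uintptr_t first = addr & mask;                /* line of first byte */
    uintptr_t last  = (addr + (len - 1u)) & mask; /* line of last byte  */

    r.start = first;
    r.bytes = (size_t)(last - first) + L1D_LINE_BYTES;
    return r;
}
```

If the returned range is larger than the buffer itself, the buffer shares a cache line with neighbouring data, and a writeback of the buffer will also write back whatever dirty neighbours live in those lines.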

  • Somnath,

    My suspicion is that dirty L1D lines could be getting written back to L2. So my main question is whether the CPU is in fact altering the data in L1D, which would create dirty lines that need to be written back to keep L2 coherent. I realize you are "clearing" the buffer, presumably with the CPU, but does the CPU modify the L1D data in any other way?

    What I am trying to understand is how TRM section 3.3.6.2 applies to your case:

    3.3.6.2 L1D to L2 Cache Coherence Protocol

    In order to reduce excessive snoop traffic to L1D, L2 filters the snoops so that unnecessary snoops are not sent to L1D. L2 keeps a shadow copy of L1D's tag memory. L2 consults its local copy of the L1D tags to decide whether a snoop command to L1D is warranted. L2 primarily updates its shadow tags in response to L1D read miss requests, and secondarily in response to L1D victim writebacks. When L1D issues a read request, it also indicates whether or not the line is allocated within L1D; and if so, what way within the set the line is allocated in. L2 can update the corresponding set in its shadow tags from this information. In addition to tracking which addresses are present in L1D cache, L2 also tracks whether or not those lines are dirty in the C64x+ DSP.

    Regards,

    --Gunter
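The shadow-tag filtering described in 3.3.6.2 can be pictured with a small host-side model. This is only a sketch under stated assumptions: it assumes a 32 KB, 2-way L1D with 64-byte lines (256 sets), all names are hypothetical, and `shadow_mark_dirty` is an invented hook, since the quoted text does not say how L2 learns that a line has become dirty. Dropping the entry on a victim writeback is likewise a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 64u  /* C64x+ L1D line size */
#define NUM_WAYS   2u   /* C64x+ L1D is 2-way set-associative */
#define NUM_SETS   256u /* assumes 32 KB of L1D cache: 32768 / (64 * 2) */

typedef struct { bool valid; bool dirty; uint32_t tag; } ShadowEntry;
typedef struct { ShadowEntry e[NUM_SETS][NUM_WAYS]; } ShadowTags;
typedef enum { SNOOP_NONE, SNOOP_READ, SNOOP_WRITE } Snoop;

static uint32_t set_of(uint32_t addr) { return (addr / LINE_BYTES) % NUM_SETS; }
static uint32_t tag_of(uint32_t addr) { return addr / (LINE_BYTES * NUM_SETS); }

/* Returns the way holding addr's line, or -1 if L1D does not hold it. */
static int find_way(const ShadowTags *t, uint32_t addr)
{
    for (unsigned w = 0; w < NUM_WAYS; ++w) {
        const ShadowEntry *e = &t->e[set_of(addr)][w];
        if (e->valid && e->tag == tag_of(addr))
            return (int)w;
    }
    return -1;
}

void shadow_init(ShadowTags *t) { memset(t, 0, sizeof *t); }

/* L1D read miss: L1D reports which way it allocated the line in. */
void shadow_on_read_miss(ShadowTags *t, uint32_t addr, unsigned way)
{
    assert(way < NUM_WAYS);
    ShadowEntry *e = &t->e[set_of(addr)][way];
    e->valid = true;
    e->dirty = false;
    e->tag   = tag_of(addr);
}

/* Hypothetical hook: record that the CPU has modified the line. */
void shadow_mark_dirty(ShadowTags *t, uint32_t addr)
{
    int w = find_way(t, addr);
    if (w >= 0)
        t->e[set_of(addr)][w].dirty = true;
}

/* L1D victim writeback: the line has left L1D (model simplification). */
void shadow_on_victim(ShadowTags *t, uint32_t addr)
{
    int w = find_way(t, addr);
    if (w >= 0)
        t->e[set_of(addr)][w].valid = false;
}

/* DMA read of L2 RAM: snoop-read only for a held, dirty line. */
Snoop snoop_for_dma_read(const ShadowTags *t, uint32_t addr)
{
    int w = find_way(t, addr);
    return (w >= 0 && t->e[set_of(addr)][w].dirty) ? SNOOP_READ : SNOOP_NONE;
}

/* DMA write to L2 RAM: snoop-write for any held line, dirty or not. */
Snoop snoop_for_dma_write(const ShadowTags *t, uint32_t addr)
{
    return (find_way(t, addr) >= 0) ? SNOOP_WRITE : SNOOP_NONE;
}
```

The point of the model is that a DMA access to a line L1D does not hold generates no snoop at all, which is how the filter keeps snoop traffic down.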

  • Reading 4.3.8.1 I get the following understanding. The EDMA transfer (Read from L2) would get the consistent data (Snoop Read in Table 4.9). But then when I use EDMA to clear the L2, Table 4.9 suggests that Snoop Write happens and L1D gets updated too.

    So if L1D gets evicted later, L2 memory should still be consistent.

    If that is correct, then it doesn't agree with your statement "So if it was dirty, and then the cache line was evicted to be used by something else, it would be written back to the L2 space."

    Am I missing something? Is there a limit to how much of the data written to L2 also gets copied to L1D? What is the significance of "up to 256 bits of new data is sent from L2 to L1D" (in the snoop-write description in the same Table 4.9)?

  • I read section 3.3.6.2 again and I find it confusing. Based on 3.3.6.1, 3.3.6.2, and 4.3.8.1, my understanding is that we don't have to worry about cache coherency here: the snoop-read and snoop-write protocol takes care of it all.

    I hope the word "DMA" used in the document applies to EDMA3 too.

    Somnath

  • I had assumed you were talking about this occurring in the time between the EDMA L2 -> DDR2 transfer and the EDMA ?? -> L2 transfer over that space, as that's the only time it should be able to occur as you've described it?

    You indicated that you cleared it with an EDMA transfer. Can you provide some more specific details on this?

    Best Regards,

    Chad

  • The transfer is from LL2 -> DDR2, followed by clearing the LL2 buffer. Clearing happens by copying from another, smaller LL2 buffer (prefilled with zeros) to this LL2 buffer.

    We use a chained transfer to accomplish this.

    The first PaRAM set sets up the copy, and the chained PaRAM set sets up the clearing.
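A copy-then-clear chain of this kind might be described with two PaRAM sets along the following lines. This is a host-side sketch, not the poster's actual configuration: it only fills in structures in ordinary memory, the helper names (`make_copy`, `make_clear`) are hypothetical, and it assumes AB-synchronized transfers with the buffer split into `count` chunks of `chunk` bytes. The clear reuses one zero-filled chunk by setting the source B-index to 0.

```c
#include <assert.h>
#include <stdint.h>

/* EDMA3 PaRAM set layout: eight 32-bit words. */
typedef struct {
    uint32_t opt;
    uint32_t src;
    uint32_t a_b_cnt;      /* BCNT[31:16]    | ACNT[15:0]    */
    uint32_t dst;
    uint32_t src_dst_bidx; /* DSTBIDX[31:16] | SRCBIDX[15:0] */
    uint32_t link_bcntrld; /* BCNTRLD[31:16] | LINK[15:0]    */
    uint32_t src_dst_cidx; /* DSTCIDX[31:16] | SRCCIDX[15:0] */
    uint32_t ccnt;         /* CCNT[15:0]                     */
} ParamSet;

#define OPT_SYNCDIM_AB (1u << 2)  /* AB-synchronized transfer          */
#define OPT_TCC(n)     (((uint32_t)(n) & 0x3Fu) << 12)
#define OPT_TCINTEN    (1u << 20) /* interrupt on final completion     */
#define OPT_TCCHEN     (1u << 22) /* chain on final completion         */
#define LINK_NULL      0xFFFFu    /* no linked PaRAM set               */

/* Shared builder: `count` chunks of `chunk` bytes moved in one trigger. */
static ParamSet make_2d(uint32_t src, uint32_t dst, uint16_t chunk,
                        uint16_t count, uint16_t src_bidx, uint32_t opt)
{
    ParamSet p;
    assert(chunk > 0 && chunk <= 0x7FFF); /* BIDX is a signed 16-bit field */
    assert(count > 0);
    p.opt          = opt | OPT_SYNCDIM_AB;
    p.src          = src;
    p.a_b_cnt      = ((uint32_t)count << 16) | chunk;
    p.dst          = dst;
    p.src_dst_bidx = ((uint32_t)chunk << 16) | src_bidx;
    p.link_bcntrld = LINK_NULL;
    p.src_dst_cidx = 0;
    p.ccnt         = 1;
    return p;
}

/* Copy LL2 -> DDR2; on completion, chain to channel `chained_ch`. */
ParamSet make_copy(uint32_t src_ll2, uint32_t dst_ddr, uint16_t chunk,
                   uint16_t count, unsigned chained_ch)
{
    return make_2d(src_ll2, dst_ddr, chunk, count, chunk,
                   OPT_TCCHEN | OPT_TCC(chained_ch));
}

/* Clear LL2 from a zero-filled chunk: SRCBIDX = 0 re-reads the same
 * source chunk for every destination chunk. */
ParamSet make_clear(uint32_t zero_buf, uint32_t dst_ll2, uint16_t chunk,
                    uint16_t count, unsigned done_tcc)
{
    return make_2d(zero_buf, dst_ll2, chunk, count, 0,
                   OPT_TCINTEN | OPT_TCC(done_tcc));
}
```

The addresses passed in are assumed to already be in the form the EDMA3 controller expects, and the resulting structures would still have to be written to the PaRAM entries of the two channels by whatever driver or CSL layer is in use.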

  • We just found the source of the problem we have been seeing. The source is another EDMA transfer, not this copy-and-fill.

    I think I can conclude that cache coherency between L1D and LL2 is not an issue when using EDMA.

    Somnath.

  • Thanks, I was scratching my head over this :-)

    Best Regards,

    Chad

  • Thanks, Chad and Gunter, for the help.

    It helped clarify a long-standing question in my mind about cache coherence between L1D and LL2 when EDMA is used.

    Somnath