Cache coherence between L2SRAM and L1D cache

Hello,


I am working on the 6670 EVM. I have configured L1D as cache and set the L2 cache size to 0.


I have a big buffer in L2SRAM of around 32000 words. The first 8000 words are written by DMA and read by the CPU. After processing, the CPU writes its output starting at index 8000 of this buffer. This works fine initially, but after some time, at random, the CPU does not read the correct input data written by the DMA. Analysing the situation, I found that the CPU is instead reading values written in the previous run. But when I inspect this buffer after the CPU processing, its content is correctly updated, as it should be.

I have configured the DMA in polling mode, where I keep waiting until the DMA transfer completion event is triggered (done through the CSL_EDMA3_QUERY_INTRPEND query). So I presume that the CPU processing occurs after the DMA transfer has completed.

So could it be related to the EDMA configuration, or is it a cache coherency issue? Can somebody help me understand this behaviour? I will appreciate any help you can provide in clarifying this situation.

Regards

Naveen



  • Hi,

    There is cache coherency between L2 and L1D cache, so whenever you write to L2 the affected cache-lines of L1D are invalidated. We are using C6678 and this works as expected.

    Could it be your DMA polling sometimes succeeds although the transfer has not finished?

    Regards, Clemens

  • Hello Clemens,

    How can I ensure that it is not a cache coherency issue? Is there some way I can check whether the DMA transfer has completed?

    Regards

    Naveen

  • Hi,

    When used as L2 memory, the L2SRAM is cacheable by default, so I suppose you have a cache coherence issue.

    To verify, you can try to disable the L1D cache (by means of the L1DCFG register).

    Also, with L1D enabled, you should see the caching status in a CCS memory view window (look at the colours and the L1D check box selector). Break somewhere in your processing and look at the memory location colours.

    See SPRUGY8 for cache coherence operation.

  • Naveen,

    What's the TCCMODE setting used when configuring the OPT portion of the EDMA transfer? It should be set to 0 for Normal Completion. If it's set to 1 for Early Completion, then you may be accessing the data while it's in flight: Early Completion gives the TCC after the final TR is submitted, while Normal Completion waits for the data to finish writing.

    Best Regards,
    Chad
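
For reference, a minimal sketch of where TCCMODE sits in the OPT word. The bit positions follow the EDMA3 PaRAM OPT layout (TCCMODE at bit 11, TCC at bits 17:12, TCINTEN at bit 20); the helper names are hypothetical and are not CSL functions, so check the positions against the EDMA3 user guide for your device.

```c
#include <assert.h>
#include <stdint.h>

/* EDMA3 PaRAM OPT bit positions (verify against your device's user guide). */
#define OPT_TCCMODE_SHIFT 11u   /* 0 = normal completion, 1 = early completion */
#define OPT_TCC_SHIFT     12u   /* 6-bit transfer completion code */
#define OPT_TCC_MASK      0x3Fu
#define OPT_TCINTEN_SHIFT 20u   /* final transfer-complete interrupt enable */

/* Hypothetical helper: build an OPT word that reports completion code
 * `tcc` with normal completion (TCCMODE left at 0) and TCINTEN set. */
static uint32_t opt_make_normal(uint32_t tcc)
{
    assert(tcc <= OPT_TCC_MASK);
    return (tcc << OPT_TCC_SHIFT) | (1u << OPT_TCINTEN_SHIFT);
}

/* Hypothetical helper: returns 1 if the OPT word selects early completion. */
static int opt_is_early_completion(uint32_t opt)
{
    return (int)((opt >> OPT_TCCMODE_SHIFT) & 1u);
}

/* Hypothetical helper: extract the completion code from an OPT word. */
static uint32_t opt_get_tcc(uint32_t opt)
{
    return (opt >> OPT_TCC_SHIFT) & OPT_TCC_MASK;
}
```

Reading back the OPT word of the PaRAM set actually in use and passing it to `opt_is_early_completion()` is one way to confirm which completion mode the transfer runs in.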

  • Hi Chad,

    Thanks for replying.

    I have checked the TCC mode in my configuration; it is set to normal mode only. Even if it were configured in early mode, the mismatch should be in the last block, but I see the mismatch in the starting words. I have configured the EDMA channel in incrementing address mode.

    Regards

    Naveen

  • Alberto Chessa said:

    Hi,

    When used as L2 memory, the L2SRAM is cacheable by default, so I suppose you have a cache coherence issue.

    To verify, you can try to disable the L1D cache (by means of the L1DCFG register).

    Also, with L1D enabled, you should see the caching status in a CCS memory view window (look at the colours and the L1D check box selector). Break somewhere in your processing and look at the memory location colours.

    See SPRUGY8 for cache coherence operation.

    Hi,

    By setting the L2 cache size to zero, I meant that no part of L2SRAM has been configured as L2 cache. All of L2SRAM is being used as L2 memory only.

    Yes, I agree that when L2SRAM is used as L2 memory, it is cacheable by default. But in that case cache coherency is maintained by the processor itself. I cannot break in the middle of my processing, as this issue is very random in nature and most of the time it works fine.

  • Hi,

    As of now, I am stuck on this issue. Please give me some direction in which I can move forward.

    Regards

    Naveen

  • Naveen,

    Are multiple cores accessing this data or just one core and the DMA?

    Best Regards,
    Chad

  • Chad,


    The DMA and the software processing are both on the same core.

    Regards,

    Naveen

  • Naveen,

    How are you observing this issue? Do you have code reading and verifying this, or is this by memory map reads?

    Can you provide the code if it's small enough to be read through (i.e. not integrated into an existing large project)? If not, then code snippets showing your DMA transfer kickoff, your polling of the TCC, your code operating on the data, your checking, etc.

    Best Regards,

    Chad

  • Chad,

    To observe the issue, I am feeding the same input repeatedly and checking the corresponding output, which is known to me and should be the same every time. I made an array into which, every time, I copy some initial words from the L2 buffer through memcpy(); similarly, I copy this L2 buffer into another buffer through DMA as well. I put a breakpoint where my output is not as expected and then compare these two buffers, and I find that they are not the same.

    It is not possible to share the code, but I can provide a snippet.

    int    a[5000];
    int    b[N][1000];    /* IN DDR Memory */
    int    c[4];
    int    d[1000];
    CSL_Edma3CmdIntr        regionIntr;

    for (i = 0; i < N; i++)
    {
          DMA_COPY(&a[1000], b[i], 1000, 4);
          memcpy(c, &a[1000], 16);
          DMA_COPY(d, &a[1000], 1000, 4);

          /* Here I perform a correlation between two signals of 1000 words, one starting at a[0] and the other at a[1000];
             the output is placed starting at a[2000]. I am using the ccmpyr1 intrinsic to perform this. */

          /* Here I have put a breakpoint on the condition where my correlation result does not match the expected one.
             Then I view the buffers c and d, which are not the same. Interestingly, the data at a[1000] is correct here,
             i.e. a[1000] and d are the same, but the c buffer contains wrong data. */

    }

    void DMA_COPY(int* destPointer, int* srcPointer, int aCnt, int bCnt)
    {
           CSL_Edma3ParamSetup dataParams;
           dataParams.option = CSL_FINSR(pDataParams->option, 20, 20, CSL_EDMA3_TCINT_EN);
           dataParams.srcAddr    = (uint32_t)srcPointer;
           dataParams.aCntbCnt   = CSL_EDMA3_CNT_MAKE(bCnt, aCnt);
           dataParams.dstAddr    = (uint32_t)destPointer;
           dataParams.srcDstBidx = CSL_EDMA3_BIDX_MAKE(bCnt,bCnt);
           dataParams.linkBcntrld= CSL_EDMA3_LINKBCNTRLD_MAKE(0xFFFF, 0);
           dataParams.srcDstCidx=CSL_EDMA3_CIDX_MAKE(0, 0);
           dataParams.cCnt=1;

           /*hDataTransferParam is Handle to data transfer param setup at init time of system*/
           CSL_edma3ParamSetup(hDataTransferParam, dataParams);

           /*hDataTransferCh is Handle to EDMA Channel opened at init time of system and edmaObj is edma object handle*/
           CSL_edma3HwChannelControl(hDataTransferCh,
                                                CSL_EDMA3_CMD_CHANNEL_SET, NULL);

            regionIntr.region = -1;
            do
            {
                CSL_edma3GetHwStatus((CSL_Edma3Handle)&edmaObj,
                                                                CSL_EDMA3_QUERY_INTRPEND, &regionIntr);
                ullTemp = _itoll(regionIntr.intrh, regionIntr.intr);
            }
            while(!(ullTemp & (1ULL << dmaChanNum)));     /* generalized poll technique for 64-bit register */

            /* Clear interrupt bit corresponding to channel */
            regionIntr.intr     = _loll(pEdma3Struct->chaNumMASK);
            regionIntr.intrh    = _hill(pEdma3Struct->chaNumMASK);
           CSL_edma3HwControl((CSL_Edma3Handle)&edmaObj,
                                             CSL_EDMA3_CMD_INTRPEND_CLEAR,
                                             &regionIntr);

    }

    Let me know in case you need any further information on this issue.

    Regards
    Naveen
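
A note on polling a 64-bit pending mask: the constant being shifted must itself be 64 bits wide, because shifting a plain `int` 1 by 32 or more is undefined behaviour in C. The sketch below shows one safe way to combine the two 32-bit halves and test a completion bit. The function names are hypothetical; on the C6000 compiler the `_itoll` intrinsic does the combining step.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the high and low 32-bit halves of an interrupt-pending
 * register pair into one 64-bit mask (portable stand-in for _itoll). */
static uint64_t ipr_combine(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | (uint64_t)low;
}

/* Return 1 if the completion bit `tcc` (0..63) is set in `pending`.
 * Shifting the 64-bit value right avoids the 32-bit shift pitfall. */
static int ipr_is_pending(uint64_t pending, unsigned tcc)
{
    assert(tcc < 64u);
    return (int)((pending >> tcc) & 1u);
}
```

Note also that the pending bits are indexed by the transfer completion code (TCC) programmed in OPT, which often equals the channel number but does not have to.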

  • Hi,

    Please suggest some direction to move further on this issue.

  • Naveen,

    Can you provide a dump of the a, b, c and d arrays? Please put them in separate text files and attach the text files.

    -Chad

  • Not sure it is related, but I suggest trying to invalidate the prefetch buffer (by means of XPFCMD) after a DMA read.

    Also try to align the buffers to a multiple of the cache line size (128 bytes).

    Also try with the L1D disabled.
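
The alignment suggestion can be sketched in portable C11 as follows. This is only an illustration under stated assumptions: the 128-byte line size is the value quoted above, the buffer name and word count are hypothetical, and with the TI compiler the usual mechanism is `#pragma DATA_ALIGN` rather than `alignas`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 128u   /* line size quoted in the thread (assumption) */
#define DMA_WORDS        1000u  /* hypothetical payload size in 32-bit words */

/* Payload size rounded up to a whole number of cache lines. */
#define DMA_BUF_BYTES \
    (((DMA_WORDS * 4u + CACHE_LINE_BYTES - 1u) / CACHE_LINE_BYTES) * CACHE_LINE_BYTES)

/* Buffer starts on a line boundary and occupies whole lines, so no other
 * variable shares a cache line with DMA-written data. */
static alignas(CACHE_LINE_BYTES) int32_t dma_in[DMA_BUF_BYTES / sizeof(int32_t)];

/* Round a byte count up to a multiple of the cache line size. */
static size_t round_up_to_line(size_t nbytes)
{
    return (nbytes + CACHE_LINE_BYTES - 1u) & ~(size_t)(CACHE_LINE_BYTES - 1u);
}

/* Return 1 if the pointer lies on a cache line boundary. */
static int is_line_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE_BYTES) == 0u;
}
```

Padding the size as well as aligning the start matters when cache invalidate or writeback operations are applied to the buffer, since those operate on whole lines.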

  • Hi Chad,

    I found the issue in my EDMA configuration itself. Another task was also using EDMA, and sometimes both tasks triggered a submission simultaneously, causing this issue.


    Regards

    Naveen
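
For readers who hit the same problem, one possible remedy is to make the whole "program PaRAM, trigger, poll, clear" sequence a critical section, so that two tasks cannot interleave on the same channel. The thread does not say how it was actually fixed; the sketch below is an assumption, with hypothetical names and `memcpy` standing in for the hardware transfer so that it runs on a host.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lock guarding one EDMA channel / PaRAM set shared by tasks. */
static atomic_flag dma_lock = ATOMIC_FLAG_INIT;

static void dma_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&dma_lock, memory_order_acquire)) {
        /* Spin. Under an RTOS, block on a semaphore or gate instead. */
    }
}

static void dma_lock_release(void)
{
    atomic_flag_clear_explicit(&dma_lock, memory_order_release);
}

/* Stand-in for: write PaRAM, set the channel event, poll the pending bit,
 * clear the pending bit. memcpy replaces the real hardware transfer. */
static void dma_submit_and_wait(void *dst, const void *src, size_t nbytes)
{
    memcpy(dst, src, nbytes);
}

/* The full submit-poll-clear sequence runs under the lock, so a second
 * task cannot reprogram the channel or consume the completion bit. */
static void dma_copy_exclusive(void *dst, const void *src, size_t nbytes)
{
    dma_lock_acquire();
    dma_submit_and_wait(dst, src, nbytes);
    dma_lock_release();
}
```

A spin lock is used only to keep the example self-contained. On a single core with priority-based preemption it can deadlock, so a blocking primitive from the RTOS is the safer choice there. Giving each task its own channel and completion code is another option that avoids the lock entirely.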