This thread has been locked.

How to disable/invalidate cache holding MSMCSRAM

Other Parts Discussed in Thread: TMS320C6678

In my application, a device writes to the C6678 PCIe inbound memory. This inbound memory is configured by software to be located in the MSMCSRAM.

There are 2 buffers in MSMCSRAM. In my test, the device's writes alternate between the buffers: if the device writes a value of n to buffer 0 on Dolby frame n, it will write a value of (n+1) to buffer 1 on Dolby frame (n+1), a value of (n+2) to buffer 0 on Dolby frame (n+2), and so on. In my test app, the C6678 software reads from both buffers and reports when it finds a value of n in one of the buffers. It only finds value n when I put a printf() near the memory read. It never finds value n if the printf() is commented out.

Also, from the CCS debugger, I have noticed that the memory browser reports increasing values when the "Go" button is clicked only when the "L1D", "L1P", and "L2" boxes are NOT checked.

I strongly suspect that the cache is the problem.

How do I disable caching of the MSMCSRAM? How do I invalidate the cache before reading from MSMCSRAM?

  • As a first step, if you're looking for a value to change on a variable located in memory, I'd suggest declaring that variable as volatile to make sure it isn't optimized to be read only once.  Tossing a printf in front of the read may simply shift the timing of when the location is read from MSMC, such that the value has already been written by PCIe before the CorePac first reads it, whereas without the printf it hadn't been written yet.

    You can use the Memory Attribute Register (MAR) to disable caching for various memory range sizes, including the MSMC range if you wish.  This is covered in the TMS320C66x CorePac User Guide, which can be found in the user guide area of the TMS320C6678 product page.  You may also want to take a look at the TMS320C66x DSP Cache User Guide located there.

    Best Regards,

    Chad

  • I tried to use the "volatile" attribute. This had no effect. My code is now:

    #pragma DATA_SECTION(pcieBuf, ".pcieBufSec")
    /* Cache coherence: Align must be a multiple of cache line size (L2=128 bytes, L1=64 bytes) to operate with cache enabled. */
    /* Aligning to 256 bytes because the PCIe inbound offset register masks the last 8bits of the buffer address  */
    #pragma DATA_ALIGN(pcieBuf, 256)
    volatile unsigned char pcieBuf[8][2][0x20000];

    ...

    void main(void){

       /* Initialize PCIe */

       for (;;) {
          unsigned long base = (unsigned long) &pcieBuf[core][0][0];
          volatile unsigned long rd;
          base += 4;
          rd = *((unsigned long *) base);
          //printf("0:%p\n", (unsigned long *) base);
          if (rd == 6) {
             printf("Found 0\n");
          }
          base = (unsigned long) &pcieBuf[core][1][0];
          base += 4;
          rd = *((unsigned long *) base);
          //printf("1:%p\n", (unsigned long *) base);
          if (rd == 6) {
             printf("Found 1\n");
          }
       }

    }

    I have read in the forums that I have to invalidate the cache before accessing pcieBuf. How do I do this?

  • I have tried to set MAR 12 bit 0 to 0 using CACHE_disableCaching(12). This did not work: when I then read hCache->MAR[12], I ended up with a value of 0xd.

    According to sprugw0, this bit can only be set in supervisor mode. How does one put the processor in supervisor mode?

  • By default you're operating in Supervisor mode.  That said, looking through the CorePac UG, I see footnote #1, which points to Table 4-22: the PC bit of the MAR for ranges 12-15 (the MSMC range) is read-only, so you can't set the MAR to non-cacheable.  I missed this.  You would have to disable caching altogether to do what you wanted.

    Looking at the code, I see you're effectively polling on what's written by PCIe.  I would suggest having the PCIe notify you instead of polling on the data, then perform an invalidate and then do your reads.

    You could do this by writing directly into the L2 of the CorePac you're notifying and polling that location.  When L2 SRAM is written via an external master, the cached copy of that location is marked invalid, and the CorePac will fetch the new data the next time the location is accessed.  There are other ways to do the notification, such as interrupts via the PCIe, but this is the simplest polling-based approach.

    Best Regards,

    Chad

  • I only showed one of my test applications. I was trying to find the problem by simplifying the code as much as possible. I am using this code for debug purposes only.

    In my real application, the device uses PCIe MSI interrupts to notify the C6678 when data is available. In the PCIe MSI interrupt service routine, I look at the first 32-bit word in the buffer to make sure that the buffer provided by the device is valid.

    Right now, only core 0 is being used. The other cores will eventually run the same application as core 0.

    In my PCIe MSI based test app, I had problems doing cache invalidates. I used the CACHE_invL1d() function and found that if the number of bytes passed to this function (the 2nd argument) did not match the number of bytes read in the "for" loop, I would see invalid data in the buffer. In the real application, I do not know how the buffer is accessed by the function that does the work, because this function is only available to me in binary form, so I cannot read its source code. Also, in the real application the buffer is too large to be held in L2 SRAM; that is why I put it in MSMC SRAM.

    The PCIe MSI based test function is the same as the one shown before with the addition of code to acknowledge the interrupt.

  • Geoffrey,

    You would need to invalidate the full address range that has new data written to via PCIe, including in L2 cache if L2 cache is used.

    Best Regards,
    Chad 

  • This is the part I don't understand.

    I found that if the device wrote 17 32-bit words, and I invalidated all 17 32-bit words, and I read all 17 32-bit words, the C6678 would receive data correctly.

    But, if the device wrote 17 32-bit words, and I invalidated all 17 32-bit words, but only read the first 16 32-bit words, the C6678 would receive corrupt data.

  • I'd suggest opening the CCS memory window, checking the L1D and L2 boxes, and seeing where in memory the correct values reside for this.

    I would look at this before and after the Invalidate operation for the 17 words of data you're specifically looking at.

    Can you provide a screenshot of this before and after, as well as a snapshot of the code used to invalidate?

  • The code used to invalidate is:

          unsigned int key;

          /* Disable Interrupts */
          key = _disable_interrupts();

          /*  Cleanup the prefetch buffer also. */
          CSL_XMC_invalidatePrefetchBuffer();    

          CACHE_invL1d ((void *)&pcieBuf[core][buf][0],  17*4, CACHE_FENCE_WAIT);
          //CACHE_invL2  ((void *)&pcieBuf[core][buf][0],  16*4, CACHE_FENCE_WAIT);

          /* Reenable Interrupts. */
          _restore_interrupts(key);

    It was copied from the PCIe sample application in the PDK_C6678 directory.

    pcieBuf is defined as follows:

       extern unsigned char pcieBuf[8][2][0x20000];

    This is before the call to CACHE_invL1d():

    This is after the call to CACHE_invL1d():

    Please note that the device writes constant data to the buffer, then triggers a PCIe MSI. The device then increments the data value by 1 and fills the buffer with this new value. This continues while the C6678 is at a breakpoint.

  • Was there a read error after this?  It would appear that the cache invalidate worked; the fact that there is no tag RAM entry marking it as cached causes the emulation to show it as white (instead of blue, as it was when the location was in L1D cache at that time).

    It looks like you have MSMC set up as an L2, so you don't need to do the L2 invalidate.  (If MSMC were set up as an L3, which can be done, you would have needed to invalidate it as well.)

    It does look like you've picked up the newest data from the PCIe write on the previous execution.  Are you certain everything had been read out prior to the next PCIe transfer?  Where you see the 0x2282, that's the first location of a new cache line, 0x0C020040.  If that line happened to have been evicted before the read, and the PCIe had already updated it (which it looks like it had, with the next set of data), then this is what would have been cached in.  That would make sense given that the value is what you expect the PCIe to write the next time, correct?

    Best Regards,
    Chad 

  • The problem is I cannot stop the device from writing data to MSMC SRAM via PCIe, so it is entirely possible that when I click "Go", the device is in the middle of writing to the middle of the buffer.

    The MSMC should be set as L2. I use the default setting which I believe is L2. You'll notice that the L2 cache invalidate has been commented out.

    This means I cannot tell if there was a read error after the memory browser was captured.

    I am relatively certain that everything has been read out prior to the next PCIe transfer. The device writes data at the Dolby frame rate. I can't imagine that reading 17 32-bit words and comparing against a value would take more than a Dolby frame.

    I have noticed that if I change the cache invalidate size from 17*4 to 32*4, the rate at which I receive errors changes. If I use 17*4, 1 out of 5 power ups will result in software reporting bad data. If I use 32*4, 3 out of 5 power ups will result in software reporting bad data.

  • I think you need to go back and look at the 'bad data', and you'll find that it's always the future data, and that it's always a whole cache line that contains that future data.  This is a PCIe write timing issue.

    When you're changing the cache invalidate size, you're changing the relative timing to the PCIe, just like adding printf's would do.  You need to provide some sort of useful signalling that the PCIe transfer is complete before invalidating and processing.

  • You are correct. The bad data is always future data.

    When you say it is a PCIe write timing issue, exactly what do you mean?

    Are you saying that the cache invalidate is taking too much time to execute, and the PCIe MSI needs to happen immediately after the PCIe transfer completes? Currently, the PCIe MSI is triggered by the device 20us after the transfer completes.

    You said in an earlier post, " You would have to disable caching altogether". I would like to try to do this as a test. How do I disable data caching completely?
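
    For reference while I try this, a CSL-based sketch of what disabling the data caches entirely might look like (function and enum names are taken from ti/csl/csl_cacheAux.h in the C6678 PDK and should be checked against your CSL version; unverified on hardware):

```c
#include <ti/csl/csl_cacheAux.h>

/* Sketch only: shrink L1D and L2 cache to 0K so that every access goes
 * straight to memory. This removes the coherence problem, at a large
 * performance cost. */
void disable_data_caching(void)
{
    /* Write back and invalidate anything currently cached first */
    CACHE_wbInvAllL1d(CACHE_WAIT);
    CACHE_wbInvAllL2(CACHE_WAIT);

    /* Then configure all of L1D and L2 as addressable SRAM, no cache */
    CACHE_setL1DSize(CACHE_L1_0KCACHE);
    CACHE_setL2Size(CACHE_0KCACHE);
}
```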

  • I mean that the PCIe is writing new data over the buffer before you've finished processing it.  It appears to be happening in the middle of your attempt to process the data.

    I don't have your timing requirements, so it's hard to say.  It may be best to implement a ping-pong buffer system if you have limited control over when the PCIe will write the data.  That way you're processing one buffer while the device has the ability to write the other; this is a common methodology to ensure you don't run into these situations.

    Otherwise you must be able to be notified and process the data before the next write to the buffer starts.

    Best Regards,
    Chad 

  • It appears that the cache invalidate causes me to miss my timing requirements. Is it possible to make MSMC SRAM non cacheable?

  • Geoffrey,

    Yes, it is possible. It's not straightforward, as it involves memory address translation.

    You need to configure the XMC's address translation (MPAX) registers.

    There are threads on the forum about that.

    Clement