
AM5718: different behaviour of memory buffer on DDR vs OCMC

Part Number: AM5718


Hello experts!!

I've experienced some weird behaviour and I would like to ask for any kind of hint about where to look.

I had a buffer in DDR memory that was written by the A15 (Linux) and read asynchronously by the C66x (TI-RTOS). Most of the time the data read by the C66x was right, but sometimes (if the C66x read the buffer shortly after it was written by the A15) it happened to contain wrong data. I've seen it completely corrupt, partially corrupt (the size of the corrupted part being a multiple of 16 bytes) or right.

We implemented a notification from the C66x to the IPU1 (TI-RTOS) to check the same memory address the C66x was reading, and we found that the IPU1 was seeing the data as wrong as the C66x did; but if we asked the IPU1 to read the same memory positions again a few seconds later, the data appeared to be right.

We thought it could be some kind of issue with the DMA, so we added a sanity signature right before and after the buffer (both written/updated with new values by the A15 every time the buffer was updated; a simple hash). We saw that most of the time both signatures were OK, sometimes one of the signatures was not properly updated, and in rare cases neither of them was.
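For reference, the guard layout was conceptually like the sketch below (the names, sizes and the plain equality check are illustrative, not our exact code):

    #include <stdint.h>

    /* Illustrative guard layout: one signature before and one after the
     * payload, both refreshed by the A15 on every update. A reader that
     * sees mismatched signatures knows the payload is torn or stale. */
    #define BUF_WORDS 256  /* placeholder payload size */

    typedef struct {
        volatile uint32_t head_sig;             /* refreshed on every update */
        volatile uint32_t payload[BUF_WORDS];
        volatile uint32_t tail_sig;             /* must match head_sig */
    } guarded_buf_t;

    /* Consumer-side consistency check. */
    static int buf_is_consistent(const guarded_buf_t *b)
    {
        return b->head_sig == b->tail_sig;
    }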

We also thought that the memory could be cached on the Linux-A15 side, but it seems that /dev/mem (used with mmap) is not cached. Is this true, or is there any kind of cache on /dev/mem?

Neither the C66x (MAR registers) nor the IPU1 (AMMU) caches the memory area where the buffer is located.
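(For completeness, on the C66x side this is done through the MAR bits; in SYS/BIOS it can be expressed with something like the sketch below, assuming the ti.sysbios.family.c64p.Cache module, which also serves the C66x. The base address and size are placeholders; MAR granularity is 16 MB.)

    #include <xdc/std.h>
    #include <ti/sysbios/family/c64p/Cache.h>

    #define SHARED_BASE ((Ptr)0xA0000000)      /* placeholder DDR window */
    #define SHARED_SIZE (16 * 1024 * 1024)     /* MAR granularity: 16 MB */

    /* Clear the MAR bit(s) covering the shared buffer so the C66x L2
     * controller never caches that address range. */
    Void shared_region_disable_cache(Void)
    {
        Cache_setMar(SHARED_BASE, SHARED_SIZE, Cache_Mar_DISABLE);
    }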

So, the final somersault was to move the buffer from DDR to OCMC and it works perfectly!

Would you be so kind as to tell me anything I could check/investigate to understand why it works in OCMC but not in DDR? Any baseline? Any idea? Cache? DMA? Something else? Which differences between EMIF-DDR and OCMC should I focus on?

Thank you in advance for any help!

  • Hello Ro,

    I will inquire about this internally. 

    Do you see this with multiple devices? 

    Did it at any point work with a certain amount of delay?

    -Josue

  • Hello Josue,

    I have to say that this time I only tried the functionality on my development unit, as when we faced the issue we were in a rush to deploy the software version to a production cell. That's also why I decided to relocate the buffer to OCMC, as we already had some free space left. But I can tell you that the DDR of my unit is working properly, as it's being used by all the cores for other system purposes, and I tried different addresses.

    The point is that I'm a little concerned, because this may (and probably will) happen again whenever we are forced to place something similar in DDR.

    Actually, I "suffered" almost the same effect some time ago (maybe a year) on another unit of the same device HW. That time I was developing another functionality whose buffer finally had to be located in non-volatile memory. My first approach was to place the buffer in DDR as a PoC, but when I got stuck due to a similar weird behaviour I finally relocated the buffer to its final destination, the non-volatile RAM, and it worked properly, much the same as this recent time.

    After that "old" issue my guess was that Linux was caching the DDR region where I had originally located the buffer, but I reached a dead end without finding the reason and I couldn't invest any more time in it.

    That's why this time I thought of cache too, but our Linux OS teammates tell me that /dev/mem is not cached, so I tried to find some other reason for the weird behaviour, such as the DMA behaving differently for DDR than for OCMC.

    I'm pretty sure it must be something I'm not aware of, some kind of "special care" that needs to be taken for DDR. That's why I think in terms of cache, DMA, or maybe the fact that DDR is accessed much more heavily by more cores at the "same" time and some access protection is missing...

    Ro

  • Hello Ro,

    Looking deeper into this, are you using your own IPC solution? Would this not be a good candidate for using the existing IPC infrastructure for AM57?
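    For reference, the MessageQ path looks roughly like the sketch below on the Linux side (a minimal sketch; the queue name "DSP_MSGQ" and heap ID 0 are placeholders, and error handling is trimmed):

        #include <ti/ipc/Ipc.h>
        #include <ti/ipc/MessageQ.h>

        /* Minimal sketch: send one message to a queue created by the DSP.
         * IPC performs the required cache maintenance internally. */
        int send_one(void)
        {
            MessageQ_QueueId qid;
            MessageQ_Msg     msg;

            if (Ipc_start() < 0)
                return -1;
            if (MessageQ_open("DSP_MSGQ", &qid) < 0)
                return -1;

            msg = MessageQ_alloc(0, sizeof(MessageQ_MsgHeader));
            if (msg == NULL)
                return -1;

            return MessageQ_put(qid, msg);
        }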

    The A15 core and the C66x core are not cache-coherent; the writes have to be done to a non-cached buffer. The issue you are describing sounds like a cache coherency issue.

    See the following graphics from a Bootlin training slide deck:

    This seems like the case in the third graphic, where a write is not fully flushed and the reader gets stale data.

    I believe these issues are taken care of within our IPC stack.

    Please see the first question in the FAQ at the following link: https://software-dl.ti.com/processor-sdk-linux/esd/AM57X/09_02_00_133/exports/docs/linux/Foundational_Components_IPC.html#frequently-asked-questions

    -Josue 

  • Hello Josue,

    I agree that it sounds like a cache coherency issue, as in the case you pointed out (BTW, thank you very much for the Bootlin training and the support!)... and checking again with our Linux OS colleagues (that's why I took some time to answer), we've seen that the internal code of the open function checks a special flag for (non)cacheability on /dev/mem. First tests seem to work properly with the buffer in DDR, so yes, it seems it was a cache coherency issue.
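    For anyone finding this thread later: if I got it right, the flag in question is O_SYNC, so the mapping ends up looking roughly like the sketch below (PHYS_ADDR and BUF_SIZE are placeholders, and error handling is trimmed):

        #include <fcntl.h>
        #include <stdint.h>
        #include <sys/mman.h>
        #include <unistd.h>

        #define PHYS_ADDR 0xA0000000UL   /* placeholder buffer address    */
        #define BUF_SIZE  0x1000UL       /* placeholder size (page mult.) */

        /* Opening /dev/mem with O_SYNC asks the kernel for a non-cached
         * mapping on ARM; without it the mapping may be cacheable and
         * another core can then observe stale DDR contents. */
        volatile uint32_t *map_uncached(void)
        {
            int fd = open("/dev/mem", O_RDWR | O_SYNC);
            if (fd < 0)
                return NULL;

            void *p = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, PHYS_ADDR);
            close(fd);   /* the mapping survives the close() */

            return (p == MAP_FAILED) ? NULL : (volatile uint32_t *)p;
        }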

    Regarding the use of the existing IPC infrastructure, we tried it in the past but faced some "challenges", such as the MessageQ_put function sometimes getting stuck for 80-100 ms in a call from a TI-RTOS SWI for no apparent reason (my guess was an internal GateMP), compilation errors when moving a shared region from DDR to OCMC, and the fact that we didn't find a clear way to share a CMEM region among several different A15-Linux binaries (I mean, different binaries accessing the same shared region together with the IPUs and the C66x). Moreover, as far as I know, CMEM has been deprecated.

    The "issues" with CMEM were the most compelling reasons to switch to mmap and we already use this approach in several communication "lanes" among cores through non-volatile memory and OCMC too. Even through DDR but slow data and from IPU to A15.

    Thanks and best regards

  • Ro,

    Since this is a non-TI implementation, it falls outside the scope of our support.

    The best I can do is encourage you to do an Invalidate before Read and a Flush after Write.
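    In SYS/BIOS that pattern looks roughly like the sketch below (using the hal Cache API; buf and len are placeholders):

        #include <xdc/std.h>
        #include <ti/sysbios/hal/Cache.h>

        /* Invalidate before Read: drop stale cached lines so the next
         * read fetches fresh data from DDR. */
        Void shared_buf_before_read(Ptr buf, SizeT len)
        {
            Cache_inv(buf, len, Cache_Type_ALL, TRUE);
        }

        /* Flush after Write: write dirty lines back to DDR so the other
         * core sees the new data before it is notified. */
        Void shared_buf_after_write(Ptr buf, SizeT len)
        {
            Cache_wb(buf, len, Cache_Type_ALL, TRUE);
        }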

    Best

    Josue