I'm trying to understand the behaviour of our TI-RTOS (Processor SDK RTOS v06.01) application running on a single A53 core.
Background: By mistake, our application called CacheP_Inv() on a buffer in DDR memory that the CPU had written just before. The CPU then reads the same buffer and copies it to PRU memory. I would expect this sequence to go wrong almost all of the time, since invalidating the cache for these addresses should discard the freshly written, still dirty lines, so that old, stale data is read back from DDR. On the R5F our application does fail as a result (force write-through is not set). On the A53, on the other hand, the application reliably works fine, which is what I don't understand.
The buffer's memory region is mapped with attribute index 7, which should mean normal, non-transient, inner and outer write-back cacheable memory. I verified the page tables and the result of the address translation and the memory is indeed mapped correctly.
I've looked at the TI-RTOS implementation of CacheP_Inv(), and that function calls Cache_inv() with Cache_Type_ALL. Cache_inv() is implemented in bios.../family/arm/v8a/Cache.c and calls Cache_invL1p() and Cache_invL1d(), which are assembly functions that use "ic ivau" and "dc ivac", respectively.
According to the A53 TRM:
"dc ivac" is "Data cache invalidate by VA to PoC", but the "point of coherence" is outside of the processor system and depends on the external memory system.
"ic ivau" is "Instruction cache invalidate by virtual address (VA) to PoU", and the "point of unification" apparently depends on a configuration signal BROADCASTINNER.
Later the TRM says "If the data is dirty within the cluster then a clean is performed before the invalidate.", which would explain why the sequence described above is working on the A53.
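To make clear what difference I mean, here is a minimal sketch of a toy model in plain C. It is not TI-RTOS or A53 code: the single-line "cache", the ToyCache type and all function names are hypothetical and only illustrate the two possible interpretations of "invalidate" for the write / invalidate / read sequence described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical): one DDR word backed by one write-back cache line. */
typedef struct {
    uint32_t ddr;   /* contents of the backing memory */
    uint32_t line;  /* contents of the cache line */
    bool valid;     /* line holds a copy of the word */
    bool dirty;     /* line is newer than the backing memory */
} ToyCache;

/* CPU store: write-allocate, write-back, so only the cache line is updated. */
void cpu_write(ToyCache *c, uint32_t value)
{
    c->line = value;
    c->valid = true;
    c->dirty = true;
}

/* CPU load: refill from backing memory on a miss. */
uint32_t cpu_read(ToyCache *c)
{
    if (!c->valid) {
        c->line = c->ddr;
        c->valid = true;
        c->dirty = false;
    }
    return c->line;
}

/* Interpretation 1: a pure invalidate that discards the line, dirty or not. */
void inv_discard(ToyCache *c)
{
    c->valid = false;
    c->dirty = false;
}

/* Interpretation 2: dirty data is cleaned (written back) before the invalidate. */
void inv_clean_first(ToyCache *c)
{
    if (c->valid && c->dirty) {
        c->ddr = c->line;
    }
    c->valid = false;
    c->dirty = false;
}

/* The sequence from my application: CPU write, invalidate, CPU read. */
uint32_t run_sequence(void (*invalidate)(ToyCache *))
{
    ToyCache c = { .ddr = 0x11111111u, .line = 0u, .valid = false, .dirty = false };
    cpu_write(&c, 0x22222222u);
    invalidate(&c);
    return cpu_read(&c);
}
```

In this model, run_sequence(inv_discard) returns the stale value 0x11111111, which is what I expected to see, while run_sequence(inv_clean_first) returns the new value 0x22222222, which matches what I observe on the A53.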
I believe the situation is further complicated by the MSMC, which is involved in coherency and might act as an L3 cache. The default configuration for the AM65x with TI-RTOS seems to configure all of the MSMC memory as SRAM, though.
- Can someone tell me what the inner and outer shareability domains for the A53s in the AM65x are?
- Where is the "point of coherence" and the "point of unification"?
- Is the invalidate operation on the A53 really a clean-and-invalidate?
- If the invalidate really cleans any dirty data, does that mean I have to invalidate twice in order to manually maintain coherency? Once before a buffer might be written by DMA (to clean any potentially dirty lines), and once before I try to read it (to ensure the latest data is fetched)?
- Is any configuration of the MSMC (or other parts of the memory subsystem) necessary to achieve coherency? There's an old thread (https://e2e.ti.com/support/processors/f/791/t/741291) that says "Yes enabling cache coherency [...] is supported". If it "can be enabled", does that also mean that it starts out disabled?
- The MSMC apparently only takes care of coherency for DDR and MSMC_SRAM (?). How about data in MCU_SRAM or one of the PRU RAMs?
Regards,
Dominic