[FAQ] TMS570LC4357: How to solve cache coherency issues when accessing shared memory with the CPU and DMA?

Part Number: TMS570LC4357

After the CPU has updated the data buffer, the DMA still transfers the old content of the buffer instead of the updated one.

  • The TMS570LC43x and RM57Lx devices have a DMA controller and a Data Cache (DCACHE). The DMA can be used for data transfers between a peripheral and memory, or between memory and memory. DMA transfers do not go through the CPU DCACHE; the DMA reads and writes main memory directly. The SRAM is protected by the MPU, and its cache policy and allocation attributes are programmable.

    Write-back with write-allocate (WBWA) provides the best performance. Write hits update only the cache memory. A write miss allocates a cache line and copies the data from main memory into the cache, so subsequent accesses result in cache hits. Read-allocate is always enabled.

    In the failing scenario, the cache is enabled for the SRAM and the cacheability attribute is set to write-back with write-allocate (WBWA). The CPU has previously read the DMA buffer, so the same data is also held in the cache memory due to the read-allocate policy.

    1. Cache Coherency Issue - DMA Writes to SRAM

    • The DMA reads the data from the peripheral and updates the receive buffer in the SRAM.
    • When the CPU reads the receive buffer, it reads the stale data present in the cache and not the new data available in the SRAM.

    2. Cache Coherency Issue - DMA Reads from SRAM

    • The CPU updates the data to be transmitted in a transmit buffer. Because the cache policy is set to WBWA, only the cache is updated, not the main memory.
    • When the DMA reads the transmit buffer, it reads the old value present in the main memory and not the latest value written by the CPU, which is still only in the cache.

    There are two mechanisms to maintain coherency:

    1. Disable caching

    This is the simplest mechanism but can cost significant MCU performance. The CPU is pipelined to run fast and to execute from caches, which offer very low latency. Caching data that is accessed multiple times increases performance significantly and reduces SRAM accesses, so marking the shared data as non-cacheable can noticeably impact performance. One way to keep only the DMA buffers out of the cache is sketched below.
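
    The following is a minimal sketch of this approach: the DMA buffers are placed in a dedicated linker section that is mapped to an MPU region configured as Normal, non-cacheable memory. The section name ".dmaBuffers", the buffer sizes, and the HALCoGen sys_common.h types are assumptions; the section placement is done in the linker command file and the MPU region settings (for example in the HALCoGen MPU configuration).

        /* Sketch: DMA buffers in a dedicated section that the MPU maps as
         * Normal, non-cacheable memory. Section name, sizes and the TI
         * compiler #pragma are assumptions; with GCC, use
         * __attribute__((section(".dmaBuffers"))) instead.                  */
        #include "sys_common.h"   /* HALCoGen types: uint16, uint32 */

        #define DMA_BUF_SIZE 64U  /* example size */

        #pragma DATA_SECTION(rx_buffer, ".dmaBuffers")
        uint16 rx_buffer[DMA_BUF_SIZE];   /* DMA receive buffer  */

        #pragma DATA_SECTION(tx_buffer, ".dmaBuffers")
        uint16 tx_buffer[DMA_BUF_SIZE];   /* DMA transmit buffer */

    With the buffers in a non-cacheable region no clean or invalidate operations are needed, at the cost of every CPU access to those buffers going out to the SRAM.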

    2. Software managed coherency

    Software managed coherency is the traditional solution to the data-sharing problem. The software must clean (flush) dirty data from the cache, and invalidate stale data, to enable sharing with the other bus master (CPU or DMA).

       1. When the DMA writes to SRAM:

    • The DMA writes data to the rx_buffer[].
    • Invalidate the cached rx_buffer[]:

                coreInvalidateDCByAddress(uint32 u32Address, uint32 u32Size);

    • The CPU then reads the rx_buffer[], which results in a cache miss because rx_buffer[] was invalidated in the previous step.
    • Due to the read-allocate policy, a cache line is allocated and the data is copied from the rx_buffer[] in the SRAM into the allocated cache line.
    • The CPU reads from the cache are then coherent with the SRAM. A sketch of the full receive sequence is shown below.
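
    A minimal sketch of this receive sequence, assuming the buffer size, the dmaRxComplete() completion check, and the HALCoGen sys_common.h types; coreInvalidateDCByAddress() is the routine named above:

        #include "sys_common.h"

        #define RX_BUF_SIZE 64U                      /* example size       */
        extern uint16 rx_buffer[RX_BUF_SIZE];        /* DMA receive buffer */

        /* Prototype as given above (typically provided by the device support
         * code); dmaRxComplete() is a placeholder for the application's own
         * DMA block-transfer-complete check (flag or interrupt).            */
        extern void coreInvalidateDCByAddress(uint32 u32Address, uint32 u32Size);
        extern uint32 dmaRxComplete(void);

        void processReceivedData(void)
        {
            uint16 value;

            /* 1. Wait until the DMA has finished writing rx_buffer[] in SRAM. */
            while (dmaRxComplete() == 0U)
            {
            }

            /* 2. Invalidate the cached copy of rx_buffer[] so the stale
             *    cache lines are discarded.                                   */
            coreInvalidateDCByAddress((uint32)rx_buffer, sizeof(rx_buffer));

            /* 3. The next CPU read misses in the cache; read-allocate fills
             *    the line from SRAM, so the CPU sees the data the DMA wrote.  */
            value = rx_buffer[0];
            (void)value;
        }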

       2. When the DMA reads from SRAM:

    • The CPU initially accessed the transmit buffer (tx_buffer[]), and cached it in the D-Cache.
    • The CPU writes data to the tx_buffer[] which will be transmitted by the DMA.
    • A cache clean operation is performed to flush the cached tx_buffer[] into the SRAM before enabling the DMA transfer.

                coreCleanDCByAddress(uint32 u32Address, uint32 u32Size);

    • The DMA reads from the SRAM are now coherent. A sketch of the full transmit sequence is shown below.
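
    A minimal sketch of this transmit sequence, assuming the buffer size, the startDmaTxTransfer() placeholder for the application's DMA channel setup and enable, and the HALCoGen sys_common.h types; coreCleanDCByAddress() is the routine named above:

        #include "sys_common.h"

        #define TX_BUF_SIZE 64U                      /* example size        */
        extern uint16 tx_buffer[TX_BUF_SIZE];        /* DMA transmit buffer */

        /* Prototype as given above; startDmaTxTransfer() is a placeholder for
         * the application's DMA channel configuration and enable.            */
        extern void coreCleanDCByAddress(uint32 u32Address, uint32 u32Size);
        extern void startDmaTxTransfer(void);

        void sendData(const uint16 *data, uint32 count)
        {
            uint32 i;

            /* 1. CPU fills tx_buffer[]; with WBWA only the cache is updated. */
            for (i = 0U; i < count; i++)
            {
                tx_buffer[i] = data[i];
            }

            /* 2. Clean (write back) the cached tx_buffer[] so the SRAM copy
             *    holds the new data.                                          */
            coreCleanDCByAddress((uint32)tx_buffer, sizeof(tx_buffer));

            /* 3. Only now enable the DMA transfer; the DMA reads coherent
             *    data from the SRAM.                                          */
            startDmaTxTransfer();
        }

    Note that cache maintenance by address operates on whole cache lines, so aligning the shared buffers to the cache-line size and padding them to a multiple of it avoids cleaning or invalidating unrelated data that happens to share a line with the buffers.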