
TDA4VM: About cache settings when using DMA-BUF Heap for allocating memory under Linux

Part Number: TDA4VM

Hi TI,

I want to allocate memory from a physical address range in Linux. I am using the suggested API from vision_apps, i.e. appMemInit() and appMemAlloc() from app_mem_linux_dma_heap.c. The relevant section of my device tree overlay looks like this:

vision_apps_shared_region: vision_apps_shared-memories {
compatible = "dma-heap-carveout";
reg = <0x00 0xb8000000 0x00 0x20000000>;
no-map;
};

This approach works fine and I can write and read from the allocated memory region.
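
For completeness: as far as I understand, appMemAlloc() wraps the standard Linux DMA-BUF heap uAPI, so the same carveout heap can also be exercised directly. Below is a minimal sketch; the heap node name under /dev/dma_heap/ is an assumption derived from the device tree node above and should be verified on the target:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/dma-heap.h>   /* DMA_HEAP_IOCTL_ALLOC */

int main(void)
{
    /* Assumed heap node name, derived from the DT node above; check /dev/dma_heap/. */
    int heap_fd = open("/dev/dma_heap/vision_apps_shared-memories", O_RDWR | O_CLOEXEC);
    if (heap_fd < 0) { perror("open heap"); return 1; }

    struct dma_heap_allocation_data alloc = {
        .len      = 64 * 1024,          /* 64 kByte, as in the measurements below */
        .fd_flags = O_RDWR | O_CLOEXEC,
    };
    if (ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc) < 0) { perror("alloc"); return 1; }

    /* Map the returned dma-buf fd; whether this mapping is cached is decided
     * by the heap driver, which is exactly the question in this thread. */
    void *va = mmap(NULL, alloc.len, PROT_READ | PROT_WRITE, MAP_SHARED, alloc.fd, 0);
    if (va == MAP_FAILED) { perror("mmap"); return 1; }

    memset(va, 0xA5, alloc.len);        /* read/write works as expected */

    munmap(va, alloc.len);
    close(alloc.fd);
    close(heap_fd);
    return 0;
}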

However, the missing piece in my understanding is how to deal with cache settings. For comparison, on the Cortex-R5F (running FreeRTOS) on the TDA4VM I can explicitly configure a region as Strongly-Ordered, Device, or cached memory via the MPU settings.

On the Cortex-A72 (running Linux) I have not been able to find an equivalent setting yet. The only thing I found while debugging carveout-heap.c (the Linux kernel driver) is that the no-map property from the device tree section above sets the "cached" member of struct carveout_dma_heap_buffer to FALSE. The following table summarizes the effect of these properties on the "cached" member of carveout_dma_heap_buffer:

Device Tree Property    buffer->cached
no-map                  FALSE
map                     TRUE
reusable                TRUE

I also measured the following differences in read/write speed, using memset and memcpy on a 64 kByte buffer:

Device Tree Property    Memset (MBytes/s)    Memcpy (MBytes/s)
no-map                  2872.243             102.097
map                     12926.577            8316.700
reusable                12980.270            8491.848
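
For reference, the measurement loop looks roughly like the sketch below; the measure() helper name and the iteration count are illustrative, and buf/src point into the region mapped from the heap:

#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Illustrative throughput measurement over an already-mapped buffer. */
static double mbytes_per_s(size_t bytes, struct timespec t0, struct timespec t1)
{
    double s = (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return ((double)bytes / (1024.0 * 1024.0)) / s;
}

void measure(void *buf, const void *src, size_t size, int iterations)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++)
        memset(buf, 0xA5, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("Memset: Speed in MBytes/s: %.3f\n",
           mbytes_per_s((size_t)iterations * size, t0, t1));

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++)
        memcpy(buf, src, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("Memcpy: Speed in MBytes/s: %.3f\n",
           mbytes_per_s((size_t)iterations * size, t0, t1));
}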

Apparently, this device tree property has a huge impact on read/write performance. I now have the following questions:

1. Is manipulating the above-mentioned device tree property the official way to make memory allocated from physical memory cacheable/non-cacheable under Linux?

2. What exactly is set when cached=FALSE? Is this comparable to Strongly-Ordered on the R5F?

3. What exactly is set when cached=TRUE? Is this comparable to Write-Through/No Write-Allocate, Write-Back/Write-Allocate, or something else?

Thanks for your help and best regards,

Felix

  • Hi TI,

    I would like to kindly remind you of this query.

    Thanks and best regards,

    Felix

  • Hi Felix,

    Sorry for the long delay.

    1. Is manipulating the above-mentioned device tree property the official way to make memory allocated from physical memory cacheable/non-cacheable under Linux?

    Yes.

    2. What exactly is set when cached=FALSE? Is this comparable to Strongly-Ordered on the R5F?

    Yes.

    3. What exactly is set when cached=TRUE? Is this comparable to Write-Through/No Write-Allocate, Write-Back/Write-Allocate, or something else?

    It will use the normal RAM cache policy, which on arm64 Linux is Normal memory: inner-shareable, write-back, write-allocate.
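
    For reference, the mmap path of a carveout-style heap driver typically selects the mapping attributes from that flag. The following is only a sketch of the general pattern, not the exact TI carveout-heap.c source; the "paddr" field is an assumption:

    #include <linux/dma-buf.h>
    #include <linux/mm.h>
    #include <linux/pfn.h>

    /* Sketch: how a carveout heap's mmap() can honor buffer->cached. */
    static int carveout_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
    {
        struct carveout_dma_heap_buffer *buffer = dmabuf->priv;

        if (!buffer->cached)
            /* Request a non-cached mapping of the buffer pages. */
            vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
        /* else: keep the default vm_page_prot, i.e. the normal cached
         * RAM attributes described above. */

        return remap_pfn_range(vma, vma->vm_start, PFN_DOWN(buffer->paddr),
                               vma->vm_end - vma->vm_start, vma->vm_page_prot);
    }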