
TDA4VM: About cache settings when using DMA-BUF Heap for allocating memory under Linux

Part Number: TDA4VM

Hi TI,

I want to allocate memory from a physical address range in Linux. I am using the suggested API from vision_apps, i.e. appMemInit() and appMemAlloc() from app_mem_linux_dma_heap.c. The relevant section of my device tree overlay looks like this:

vision_apps_shared_region: vision_apps_shared-memories {
compatible = "dma-heap-carveout";
reg = <0x00 0xb8000000 0x00 0x20000000>;
no-map;
};

This approach works fine and I can write and read from the allocated memory region.
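
For completeness: as far as I understand, appMemAlloc() wraps the standard Linux DMA-BUF heap uAPI, so the same carveout heap can also be exercised directly. Below is a minimal sketch; the heap node name under /dev/dma_heap/ is an assumption derived from the device tree node above and should be verified on the target:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/dma-heap.h>   /* DMA_HEAP_IOCTL_ALLOC */

int main(void)
{
    /* Assumed heap node name, derived from the DT node above; check /dev/dma_heap/. */
    int heap_fd = open("/dev/dma_heap/vision_apps_shared-memories", O_RDWR | O_CLOEXEC);
    if (heap_fd < 0) { perror("open heap"); return 1; }

    struct dma_heap_allocation_data alloc = {
        .len      = 64 * 1024,          /* 64 kByte, as in the measurements below */
        .fd_flags = O_RDWR | O_CLOEXEC,
    };
    if (ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc) < 0) { perror("alloc"); return 1; }

    /* Map the returned dma-buf fd; whether this mapping is cached is decided
     * by the heap driver, which is exactly the question in this thread. */
    void *va = mmap(NULL, alloc.len, PROT_READ | PROT_WRITE, MAP_SHARED, alloc.fd, 0);
    if (va == MAP_FAILED) { perror("mmap"); return 1; }

    memset(va, 0xA5, alloc.len);        /* read/write works as expected */

    munmap(va, alloc.len);
    close(alloc.fd);
    close(heap_fd);
    return 0;
}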

However, the missing piece in my understanding is how to deal with cache settings. For comparison, on the Cortex-R5F (running FreeRTOS) on the TDA4VM I can explicitly configure a region as Strongly-Ordered, Device, or cached memory via the MPU settings.

On the Cortex-A72 (running Linux) I have not been able to find an equivalent setting yet. The only thing I found while debugging carveout-heap.c (the Linux kernel driver) is that the no-map property from the device tree section above sets the "cached" member of struct carveout_dma_heap_buffer to FALSE. The following table summarizes the effect of these properties on the "cached" member of carveout_dma_heap_buffer:

Device Tree Property    buffer->cached
no-map                  FALSE
map                     TRUE
reusable                TRUE

I also measured the following differences in read/write speed, using memset and memcpy on a 64 kByte buffer:

Device Tree Property    Memset (MBytes/s)    Memcpy (MBytes/s)
no-map                  2872.243             102.097
map                     12926.577            8316.700
reusable                12980.270            8491.848
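
For reference, the measurement loop looks roughly like the sketch below; the measure() helper name and the iteration count are illustrative, and buf/src point into the region mapped from the heap:

#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Illustrative throughput measurement over an already-mapped buffer. */
static double mbytes_per_s(size_t bytes, struct timespec t0, struct timespec t1)
{
    double s = (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return ((double)bytes / (1024.0 * 1024.0)) / s;
}

void measure(void *buf, const void *src, size_t size, int iterations)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++)
        memset(buf, 0xA5, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("Memset: Speed in MBytes/s: %.3f\n",
           mbytes_per_s((size_t)iterations * size, t0, t1));

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++)
        memcpy(buf, src, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("Memcpy: Speed in MBytes/s: %.3f\n",
           mbytes_per_s((size_t)iterations * size, t0, t1));
}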

Apparently, this device tree property has a huge impact on read/write performance. I now have the following questions:

1. Is manipulating the above-mentioned device tree property the official way to make memory allocated from physical memory cacheable/non-cacheable under Linux?

2. What exactly is set when cached=FALSE? Is this comparable to Strongly-Ordered on the R5F?

3. What exactly is set when cached=TRUE? Is this comparable to Write-Through/No Write-Allocate, Write-Back/Write-Allocate, or something else?

Thanks for your help and best regards,

Felix

  • Hi TI,

    I would like to kindly remind you of this query.

    Thanks and best regards,

    Felix

  • Hi Felix,

    Sorry for the long delay.

    1. Is manipulating the above-mentioned device tree property the official way to make memory allocated from physical memory cacheable/non-cacheable under Linux?

    Yes.

    2. What exactly is set when cached=FALSE? Is this comparable to Strongly-Ordered on the R5F?

    Yes.

    3. What exactly is set when cached=TRUE? Is this comparable to Write-Through/No Write-Allocate, Write-Back/Write-Allocate, or something else?

    It will use the normal RAM cache policy, which on arm64 Linux is Normal memory: inner-shareable, write-back, write-allocate.
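
    For reference, the mmap path of a carveout-style heap driver typically selects the mapping attributes from that flag. The following is only a sketch of the general pattern, not the exact TI carveout-heap.c source; the "paddr" field is an assumption:

    #include <linux/dma-buf.h>
    #include <linux/mm.h>
    #include <linux/pfn.h>

    /* Sketch: how a carveout heap's mmap() can honor buffer->cached. */
    static int carveout_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
    {
        struct carveout_dma_heap_buffer *buffer = dmabuf->priv;

        if (!buffer->cached)
            /* Request a non-cached mapping of the buffer pages. */
            vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
        /* else: keep the default vm_page_prot, i.e. the normal cached
         * RAM attributes described above. */

        return remap_pfn_range(vma, vma->vm_start, PFN_DOWN(buffer->paddr),
                               vma->vm_end - vma->vm_start, vma->vm_page_prot);
    }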