TMDXIDK5718: IPC questions

Christian Leeb

Part Number: TMDXIDK5718
Other Parts Discussed in Thread: AM5718

Hi,

IDK572, Linux SDK6.0.0.07

regarding software-dl.ti.com/.../Foundational_Components_IPC.html

"Overall Linux Memory Map" mentions that CMEM allocates memory here a0000000-abffffff : CMEM, defined in linux/arch/arm/boot/dts/am57xx-evm-cmem.dtsi

Later CMA Carveouts are mentioned

Memory Section	Physical Address	Size
IPU2 CMA	0x95800000	56 MB
DSP1 CMA	0x99000000	64 MB
IPU1 CMA	0x9d000000	32 MB
DSP2 CMA	0x9f000000	8 MB
Default CMA	0xfe400000	24 MB

They are defined in (e.g.) am572x-idk-common.dtsi

----------------

Question 1: How is CMEM related to CMA? The CMEM-allocated memory is disjoint from the CMA carveouts.

Question 2: Are the CMA carveouts used for the IPC message queues?

Question 2: regarding the IPC example "ex02_messageq", ti/ipc_3_50_03_05/examples/DRA7XX_linux_elf/ex02_messageq/shared/config.bld

In the structure below, which is used for BOTH DSPs, I see overlap with the CMA carveouts defined above.

var evmDRA7XX_ExtMemMapDsp = {
    EXT_CODE: {
        name: "EXT_CODE",
        base: 0x95000000,
        len: 0x00100000,
        space: "code",
        access: "RWX"
    },
    EXT_DATA: {
        name: "EXT_DATA",
        base: 0x95100000,
        len: 0x00100000,
        space: "data",
        access: "RW"
    },
    EXT_HEAP: {
        name: "EXT_HEAP",
        base: 0x95200000,
        len: 0x00300000,
        space: "data",
        access: "RW"
    },
    TRACE_BUF: {
        name: "TRACE_BUF",
        base: 0x9F000000,
        len: 0x00060000,
        space: "data",
        access: "RW"
    },
    EXC_DATA: {
        name: "EXC_DATA",
        base: 0x9F060000,
        len: 0x00010000,
        space: "data",
        access: "RW"
    },
    PM_DATA: {
        name: "PM_DATA",
        base: 0x9F070000,
        len: 0x00020000,
        space: "data",
        access: "RWX" /* should this have execute perm? */
    },
};

Build.platformTable["ti.platforms.evmDRA7XX:dsp2"] =
    Build.platformTable["ti.platforms.evmDRA7XX:dsp1"];

Question 3: In section 4.4.6.3, "MessageQ Module", of http://software-dl.ti.com/processor-sdk-rtos/esd/docs/latest/rtos/index_Foundational_Components.html#ipc I found:

Supports zero-copy transfers (BIOS only)

Does this suggest that in Linux, copying is involved?

Background is that we would like to offload decoding of specific Ethernet packets to a DSP. For this we need an efficient IPC mechanism.

BR, Chris

over 4 years ago

Nick Saulnier over 4 years ago

Hello Chris,

Let me take a look at this and get back to you in a day or so.

Regards,

Nick

Nick Saulnier over 4 years ago in reply to Nick Saulnier

Hello Chris,

I am sorry for the long wait. I will do my best to provide answers with this post, but I am still learning about this subject. If anything does not sound right, please challenge me on it so we can make sure we get everything correct!

1) CMA vs CMEM:

The CMA carveouts are in DDR by default. This is where RemoteProc places your DSP / IPU code and data. RPMsg resources (the vring buffers) will also go in the CMA carveout. It is not currently stated in our documentation, but RPMsg is the "backend" for IPC message transfer between the ARM and the DSP / IPU. So that means that yes, IPC message queues are in the CMA carveout of the associated core.

CMA size: why are the carveouts different sizes by default? I am not sure - I might have an answer later.

When would I allocate a separate CMEM block? If your use case required a large buffer, you could create a CMEM allocation for it. I think you would also be able to create that buffer in the original CMA carveout as well (as long as you made the right changes to all resource tables, etc). I am not sure when you would put extra data in a CMEM allocation instead of a CMA allocation. E.g., if you wanted to use OCMC RAM, I'm not sure if it has to be allocated through CMEM.
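To make the memory map from the question concrete, here is a minimal sketch that computes the end address of each region and checks whether any of them intersect. The bases and sizes are the ones quoted in the question; the helper names (`overlaps`, `find_overlaps`) are made up for this illustration and are not part of any TI tool.

```python
MB = 1024 * 1024

# (name, base, size) taken from the memory map quoted in the question.
REGIONS = [
    ("IPU2 CMA", 0x95800000, 56 * MB),
    ("DSP1 CMA", 0x99000000, 64 * MB),
    ("IPU1 CMA", 0x9D000000, 32 * MB),
    ("DSP2 CMA", 0x9F000000, 8 * MB),
    # CMEM is quoted as the inclusive range a0000000-abffffff.
    ("CMEM", 0xA0000000, 0xABFFFFFF - 0xA0000000 + 1),
    ("Default CMA", 0xFE400000, 24 * MB),
]


def overlaps(a, b):
    """Return True if two (name, base, size) regions share any address."""
    _, a_base, a_size = a
    _, b_base, b_size = b
    return a_base < b_base + b_size and b_base < a_base + a_size


def find_overlaps(regions):
    """Return the name pairs whose address ranges intersect."""
    return [
        (a[0], b[0])
        for i, a in enumerate(regions)
        for b in regions[i + 1:]
        if overlaps(a, b)
    ]


if __name__ == "__main__":
    for name, base, size in REGIONS:
        end = base + size - 1
        print(f"{name:12s} 0x{base:08X}-0x{end:08X} ({size // MB} MB)")
    print("overlapping pairs:", find_overlaps(REGIONS))
```

With these numbers the four remote-core carveouts sit back to back from 0x95800000 up to 0x9F7FFFFF, and the CMEM block starts above them at 0xA0000000, so no pair intersects.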

2) config.bld addresses

Why is every core in shared/config.bld given a "physical memory address" around 0x9F00_0000 if DSP2 is the only core that is actually at physical address 0x9F00_0000? Apparently this is related to the IOMMU. Each DSP thinks it is using physical address 0x9F00_0000, and the IOMMU then translates that address to the actual physical memory address that the ARM side sees. I'm not quite sure how this works, or how you would rewrite your files if you wanted a core's CMA allocation to start at a different memory address.

3) Zero-copy data transfer with an RTOS ARM core, but not Linux ARM core

Yes, using our IPC solution with a Linux ARM core involves some copying. That means that IPC will be faster when communicating with an RTOS ARM core than it would be with a Linux ARM core. That is because the IPC message goes through a Linux kernel driver, so it at least involves copying the message into kernel space, and then back out into user space for your application to access. RTOS does not have the same kernel space / user space division.

Please let me know if there are any follow-up questions,

Nick

Sahin Okur over 4 years ago in reply to Nick Saulnier

Hi Chris,

Adding to Nick's response,

CMA is a Linux mechanism that allows static allocation of big physically contiguous memory blocks. In the Processor SDK it is used to allocate static memory regions that are accessible from Linux as well as from the DSP and IPU cores. CMA memory pools are used to store DSP and IPU application code (loaded by Linux during SoC initialization) as well as IPC buffers. A detailed description can be found at the following location: A deep dive into CMA.

CMEM is a kernel module developed by TI that allows for dynamic creation and management of one or more blocks of contiguous memory for exchanging data buffers between Linux running on A15 and SYS/BIOS running on DSP or IPU. CMEM enables users to avoid memory fragmentation and ensures large physically contiguous memory blocks are available by using pool-based configuration of CMEM. In the Processor SDK for the AM57x family, CMEM allocates buffers for data that the A15 sends to the DSP or IPU for processing. Specifically, CMEM is used in the Big Data IPC example to store the large buffers. A detailed description of CMEM can be found at the CMEM Overview page.

More details around this can be found in the following application note.

AM57x Processor SDK Linux®: Customizing Multicore Applications to Run on New Platforms

The addresses seen in the config.bld file are virtual addresses, i.e., the addresses seen by each DSP. The actual location (physical address) where the DSP code/text/data resides is within the CMA carveout.
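As a rough illustration of why the numbers look like they collide, the sketch below compares the config.bld sections from the question against the Linux-side CMA carveouts purely as numbers. The values are the ones quoted in the thread; `numeric_matches` is a hypothetical helper written for this example. A numeric match only means the two ranges share the same values in two different address spaces (DSP virtual versus physical), not that the memory itself is shared.

```python
MB = 1024 * 1024

# DSP-side sections from the config.bld quoted in the question:
# (name, base, length). These are addresses as seen by the DSP.
DSP_SECTIONS = [
    ("EXT_CODE", 0x95000000, 0x00100000),
    ("EXT_DATA", 0x95100000, 0x00100000),
    ("EXT_HEAP", 0x95200000, 0x00300000),
    ("TRACE_BUF", 0x9F000000, 0x00060000),
    ("EXC_DATA", 0x9F060000, 0x00010000),
    ("PM_DATA", 0x9F070000, 0x00020000),
]

# Linux-side CMA carveouts from the question: (name, base, size).
# These are physical addresses.
CARVEOUTS = [
    ("IPU2 CMA", 0x95800000, 56 * MB),
    ("DSP1 CMA", 0x99000000, 64 * MB),
    ("IPU1 CMA", 0x9D000000, 32 * MB),
    ("DSP2 CMA", 0x9F000000, 8 * MB),
]


def numeric_matches(section, carveouts):
    """Names of carveouts whose numeric range intersects the section's range."""
    _, base, length = section
    return [
        name
        for name, c_base, c_size in carveouts
        if base < c_base + c_size and c_base < base + length
    ]


if __name__ == "__main__":
    for section in DSP_SECTIONS:
        hits = numeric_matches(section, CARVEOUTS) or ["(none)"]
        print(f"{section[0]:10s} 0x{section[1]:08X} -> {', '.join(hits)}")
```

With these values, only TRACE_BUF, EXC_DATA and PM_DATA coincide numerically with a carveout (DSP2 CMA), while the EXT_* sections end at 0x954FFFFF, below the first carveout.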

The "bridge" between the Linux CMA carveouts and the DSP side config.bld is the resource table. The resource table is a Linux construct that informs the Linux kernel remoteproc driver about the available resources of the remote processor, and typically refers to memory and local peripheral registers. When a remote processor image is loaded, the remoteproc driver will parse the system resources defined in the resource table, which is linked into the remote processor image. Also, remoteproc allocates rpmsg vring buffers, trace buffers, and configures MMUs according to the resource table.
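The idea of the resource table as a bridge can be sketched as a lookup from a DSP-side (device) address to a physical address. This is only a conceptual model: the `da`, `pa` and `length` fields are named after the device-address / physical-address / length fields of remoteproc carveout entries, but the `Mapping` type, the `translate` helper, and above all the physical addresses in `DSP1_MAPPINGS` are hypothetical values chosen to fall inside the DSP1 CMA carveout. They are not taken from rsc_table_vayu_dsp.h.

```python
from typing import NamedTuple, Optional, Sequence


class Mapping(NamedTuple):
    name: str
    da: int      # device address: what the DSP sees (from config.bld)
    pa: int      # physical address: HYPOTHETICAL, for illustration only
    length: int  # size of the mapped range in bytes


# Hypothetical mappings for DSP1. The device addresses follow the
# config.bld in the question; the physical addresses are invented and
# merely placed inside the DSP1 CMA carveout (0x99000000, 64 MB).
DSP1_MAPPINGS = [
    Mapping("ipc_data", da=0x9F000000, pa=0x99000000, length=0x00090000),
    Mapping("code_data_heap", da=0x95000000, pa=0x99100000, length=0x00500000),
]


def translate(da: int, mappings: Sequence[Mapping]) -> Optional[int]:
    """Return the physical address for a device address, or None if unmapped."""
    for m in mappings:
        if m.da <= da < m.da + m.length:
            return m.pa + (da - m.da)
    return None


if __name__ == "__main__":
    for da in (0x95000000, 0x95100000, 0x9F060000, 0x80000000):
        pa = translate(da, DSP1_MAPPINGS)
        shown = f"0x{pa:08X}" if pa is not None else "unmapped"
        print(f"DSP address 0x{da:08X} -> {shown}")
```

In this model, a second DSP could use exactly the same device addresses with a different set of physical addresses, which is consistent with one config.bld memory map being shared by both DSPs.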

The DSP resource table for AM57xx is located in the Processor SDK RTOS IPC folder at packages/ti/ipc/remoteproc/rsc_table_vayu_dsp.h

When the A15 is running BIOS, a shared memory transport is used, which places the message buffer in shared memory and passes only the location of the message; the message contents are not copied. With Linux, the hardware mailbox is used in addition to what Nick explained above.
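The difference between passing the location of a message and copying it can be illustrated with a small, purely conceptual sketch. It does not use the TI IPC API; the `bytearray` stands in for a shared memory region, and all function names are invented for this example.

```python
# Stand-in for a shared memory region visible to both sender and receiver.
shared = bytearray(64)


def send_by_reference(region, offset, payload):
    """Write the payload in place and return an (offset, length) descriptor."""
    region[offset:offset + len(payload)] = payload
    return offset, len(payload)


def receive_by_reference(region, descriptor):
    """Zero-copy style: return a view onto the same underlying bytes."""
    offset, length = descriptor
    return memoryview(region)[offset:offset + length]


def receive_by_copy(region, descriptor):
    """Copying style: duplicate the bytes into a separate buffer."""
    offset, length = descriptor
    return bytes(region[offset:offset + length])


if __name__ == "__main__":
    descriptor = send_by_reference(shared, 8, b"packet")
    view = receive_by_reference(shared, descriptor)
    copy = receive_by_copy(shared, descriptor)

    # Modify the shared region after "delivery".
    shared[8] = ord("P")

    print(bytes(view))  # the view follows the shared region
    print(copy)         # the copy keeps the bytes from delivery time
```

Only the small descriptor travels in the by-reference case, whereas the copying path moves every byte of the payload, which is the extra cost Nick describes for the Linux kernel / user space boundary.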

How fast are you looking to pass messages? We've posted some benchmarks for other devices at the following page. You can run the messageQ benchmarking application that comes with Processor SDK on your AM5718 to get exact numbers - details on this are at the link below.

IPC Benchmarking

Regards,
Sahin
