Tool/software: Linux
Hi,
I am trying to evaluate the IPC options between Linux on the ARM cores and TI-RTOS on the DSP core of a Keystone 2 66AK2E05. I started with the ex02_messageq example and modified it to explore the performance and functionality limits. The most important modifications I made are:
- the Linux side allocates and sends messages as fast as it can instead of waiting for each reply from the DSP.
- the length of the MessageQ message is increased to the maximum allowed, 469 bytes (including the MessageQ header). I had to find this limit by trial and error, since the only relevant figure I found in any documentation was the 512-byte buffer limit of rpmsg.
- added an option to disable the DSP->ARM communication in order to test the maximum unidirectional performance
With these changes I have found that, if nothing limits the rate at which the ARM/Linux application allocates and puts new messages to the DSP's queue, the IPC fails when MessageQ_put() is called, with the following message on the ARM side:
TransportRpmsg_put: send failed: 512 (Unknown error 512)
On the DSP trace I see that execution fails after that:
[ 1.398] [t=0x74b78c74] xdc.runtime.Memory: ERROR: line 52: out of memory: heap=0x87cdc0, size=496
[ 1.398] xdc.runtime.Memory: line 52: out of memory: heap=0x87cdc0, size=496
[ 1.398] [t=0x74b96e10] ti.sdo.ipc.MessageQ: ERROR: line 503: assertion failure: A_invalidMsg: Invalid message
[ 1.398] ti.sdo.ipc.MessageQ: line 503: assertion failure: A_invalidMsg: Invalid message
[ 1.398] xdc.runtime.Error.raise: terminating execution
Moreover, by artificially limiting the message rate I managed to reach a maximum transfer rate of a little over 20 MB/s, using the maximum message size and with only the ARM side generating traffic. With smaller messages or bidirectional traffic the rate drops significantly.
The above means that it is not hard at all for an application burst to make MessageQ fail ungracefully! Please note that I have taken care to check the return status of all MessageQ-related functions, but I get no information that would let me avoid the failure. I was hoping that if the rpmsg/vring buffers are full, MessageQ_put() would fail gracefully and allow the receiving end some time to empty the buffers, or that I could at least check for the required space before actually calling MessageQ_put(), but neither worked.
Also, the maximum throughput of 20-25 MB/s seems rather low, given that I am working with high-performance ARM and DSP cores with shared RAM. Moreover, when reaching this limit one of the ARM cores sits at 100% usage. I understand that this could be due to the new MessageQ/IPC implementation using rpmsg rather than shared memory, so that the Linux kernel has to copy the data from ARM memory to DSP memory, but I was still expecting higher performance and more robust behavior at the limit. Is there any option to improve performance, using shared memory or otherwise? Do I have to resort to a custom shared-memory implementation using cmem for higher performance?
Is it "normal" for the DSP application to fail like this when the MessageQ heap is full? I could increase the heap size (e.g. by placing the heap in DDR), but I would still like to know that if the heap becomes full my application will not fail, just slow down until some messages are processed and make room in the heap.
As others have noted, the information about IPC and memory management is scattered and incomplete; I believe a short IPC fact sheet describing the basic operation, limitations, performance expectations, etc. would be very useful.