
AM6422: IPC delay

Part Number: AM6422

Hello,

    I have two questions about IPC.

  1. First Question:
    We are using IPC on the AM6422 to implement communication between the A53 and R5F, following the RPMSG-SIMPLE example program. The A53 sends data (less than 100 bytes), and the R5F parses the data and sends a 496-byte acknowledgment back. When measuring the time interval from the A53 sending data to receiving the R5F response, we observed a maximum latency of over 800 ms. Is this level of latency normal?

  2. Second Question:
    "Can we use the zero-copy shared memory functionality on top of the existing rpmsg-simple example implementation?"

Looking forward to your reply!

  • Hello Wanglili,

    First question - benchmarking RPMsg performance 

    What version of Linux are you running?

    If you care about controlling latency, you should use RT Linux, not "regular" Linux. Even then, please remember that RT Linux is NOT completely deterministic - you can say that RT Linux is statistically likely to meet a real-time requirement, but you can never guarantee that Linux will meet that need. For more discussion about Linux vs RT Linux, refer to
    AM64x academy > Multicore > Operating systems
    https://dev.ti.com/tirex/explore/node?node=A__AZmYmYcoWo.KGrq4wf-oPQ__AM64-ACADEMY__WI1KRXP__LATEST 

    Linux kernel 6.6 is the release where we rewrote the low-level mailbox driver beneath RPMsg so that it can be given a higher priority. This means that on RT Linux, starting with kernel 6.6, you can get more control over your RPMsg latency by raising the priority of the RPMsg communication path.

    I have written a dedicated benchmarking example that will be pushed to the public ti-rpmsg-char repository in the next couple of weeks. I can provide the current version of the source code if you're interested. For now, I'll attach the binary so you can run tests with it:

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/rpmsg_5F00_char_5F00_benchmark

    Usage guide:

    rpmsg_char_benchmark
    ====================

    rpmsg_char_benchmark can be used to calculate average and worst-case round-trip
    latencies for RPMsg messages between Linux and a non-Linux core.
    
    Usage:
      rpmsg_char_benchmark [-r <rproc_id>] [-n <num_msgs>] [-m <msg_length>] [-d <rpmsg_dev_name>] [-p <remote_endpt>] [-l <local_endpt>]
    
      Where:
        -r <rproc_id>        remote processor id to be used.
                             Valid values are 0 to RPROC_ID_MAX
        -n <num_msgs>        Number of messages to exchange (default 100)
        -m <msg_length>      Number of characters to send per message (default 100)
        -d <rpmsg_dev_name>  rpmsg device name
                             (defaults to NULL, translates to rpmsg_chrdev)
        -p <remote_endpt>    remote end-point address of the rpmsg device
                             (default 14 based on current example firmwares)
        -l <local_endpt>     local end-point address of the rpmsg device
                             (default RPMSG_ADDR_ANY)
                             (If manually set, must be greater than
                             RPMSG_RESERVED_ADDRESSES = 1024)
    
    Examples:
      rpmsg_char_benchmark -r 4 -n 1000000 -m 1
         Runs the example using default rpmsg device "rpmsg_chrdev",
         remote port 14 with rproc_id value of 4 (R5F_MAIN1_0), exchanges
         1,000,000 messages that are 1 character long
    
      stress-ng --cpu-method=all -c 4 & rpmsg_char_benchmark -r 4 -n 1000000 -m 496 & chrt -f -p 80 $!
         Use with RT Linux
         Runs the example using default rpmsg device "rpmsg_chrdev",
         remote port 14 with rproc_id value of 4 (R5F_MAIN1_0), exchanges
         1,000,000 messages that are 496 characters long.
         Add a background load on Linux with stress-ng.
         Set the rpmsg_char_benchmark application priority to 80.
    

    You can see test results that I generated on Linux kernel 6.6 with earlier versions of the code here:
    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1410313/am6442-communication-latency-issues-between-a53-and-r5-in-a-linux-rt-system/5434861#5434861
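
    If you would rather instrument your own application instead of running the
    benchmark binary, the core of the measurement is just a timed write/read pair
    on the endpoint's character device. Here is a minimal sketch of that idea (it
    is not the benchmark source; the device path is a placeholder for whatever
    endpoint node your setup creates, and error handling is trimmed down):

    /*
     * Minimal round-trip timing sketch (not the benchmark source).
     * Assumes an rpmsg char endpoint device node already exists (the path
     * below is a placeholder) and the remote firmware replies to every
     * message it receives, like the rpmsg-simple / echo examples do.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/rpmsg_char_example", O_RDWR);   /* placeholder path */
        if (fd < 0) {
            perror("open");
            return 1;
        }

        char tx[100] = "ping from A53";
        char rx[512];
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (write(fd, tx, sizeof(tx)) < 0)            /* send to the R5F */
            perror("write");
        ssize_t n = read(fd, rx, sizeof(rx));         /* block until the reply arrives */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        long us = (t1.tv_sec - t0.tv_sec) * 1000000L +
                  (t1.tv_nsec - t0.tv_nsec) / 1000L;
        printf("round trip: %ld us (%zd bytes received)\n", us, n);

        close(fd);
        return 0;
    }

    Run that in a loop and keep the maximum value to capture worst-case latency;
    that is essentially what the benchmark application does, plus statistics.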

  • Second question - can we use zero-copy shared memory alongside "regular" RPMsg? 

    Absolutely.

    Based on your question, I assume you have already found the zerocopy project:
    https://git.ti.com/cgit/rpmsg/rpmsg_char_zerocopy/

    This project uses RPMsg to send a pointer to a shared memory region. Each processor then reads and writes the shared memory region directly, instead of sending that data across many RPMsg messages.

    You could set up multiple RPMsg endpoints, and use different endpoints for different tasks (e.g., endpoint 14 for shared memory, endpoint 15 for other communication). You can find an example of adding multiple RPMsg endpoints to an MCU+ project in the AM64x academy here:
    multicore > IPC > How to add multiple RPMsg endpoints to an MCU+ project
    https://dev.ti.com/tirex/explore/node?node=A__Ae7iN576eTKQcrPcBcMrog__AM64-ACADEMY__WI1KRXP__LATEST
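
    On the MCU+ side, each additional endpoint is just one more RPMessage object
    constructed with a different local endpoint number. As a rough sketch (the
    endpoint numbers are examples, and I am going from memory on the exact
    parameter names, so please double-check against your SDK version):

    /* Sketch only: two endpoints on the R5F, one for shared-memory signaling
     * and one for everything else. Endpoint numbers are examples. */
    #include <drivers/ipc_rpmsg.h>

    #define SHM_ENDPT   14u
    #define CTRL_ENDPT  15u

    RPMessage_Object gShmMsgObj;
    RPMessage_Object gCtrlMsgObj;

    void ipc_endpoints_create(void)
    {
        RPMessage_CreateParams createParams;

        RPMessage_CreateParams_init(&createParams);
        createParams.localEndPt = SHM_ENDPT;
        RPMessage_construct(&gShmMsgObj, &createParams);   /* shared-memory signaling */

        RPMessage_CreateParams_init(&createParams);
        createParams.localEndPt = CTRL_ENDPT;
        RPMessage_construct(&gCtrlMsgObj, &createParams);  /* other communication */
    }

    Each endpoint can then be serviced by its own task calling RPMessage_recv(),
    so traffic on one endpoint does not block behind the other.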

    You could also keep a single endpoint and have your code inspect the content of each RPMsg message to determine whether it is shared-memory signaling or some other kind of data (sketched below).
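
    To illustrate the single-endpoint approach on the Linux side: bulk data is
    written into the shared region, and only a small descriptor travels over
    RPMsg, with a type field that the R5F can use to tell shared-memory signaling
    apart from other traffic. This is only a sketch of the pattern, not the
    zerocopy project's actual code; the carveout address, the struct layout, and
    the /dev/mem mapping are placeholder assumptions for illustration.

    /*
     * Sketch of the signaling pattern only (not taken from rpmsg_char_zerocopy).
     * Bulk data goes through a shared-memory carveout; only a small descriptor
     * is sent over RPMsg. The address and struct layout are placeholders.
     */
    #include <fcntl.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define MSG_TYPE_SHM    1u   /* descriptor for data placed in shared memory */
    #define MSG_TYPE_OTHER  2u   /* any other message on the same endpoint */

    struct shm_desc {            /* the small message actually sent over RPMsg */
        uint32_t type;
        uint32_t offset;         /* payload offset inside the shared region */
        uint32_t length;         /* number of valid bytes */
    };

    int send_via_shared_mem(int rpmsg_fd, const void *data, uint32_t len)
    {
        const off_t  shm_phys = 0xA5000000;  /* placeholder reserved-memory address */
        const size_t shm_size = 0x100000;

        int memfd = open("/dev/mem", O_RDWR | O_SYNC);  /* O_SYNC: uncached mapping */
        if (memfd < 0)
            return -1;

        uint8_t *shm = mmap(NULL, shm_size, PROT_READ | PROT_WRITE,
                            MAP_SHARED, memfd, shm_phys);
        close(memfd);
        if (shm == MAP_FAILED)
            return -1;

        memcpy(shm, data, len);              /* bulk data: shared memory, not RPMsg */

        struct shm_desc desc = { MSG_TYPE_SHM, 0u, len };
        ssize_t ret = write(rpmsg_fd, &desc, sizeof(desc));  /* small descriptor over RPMsg */

        munmap(shm, shm_size);
        return (ret == (ssize_t)sizeof(desc)) ? 0 : -1;
    }

    On the R5F side, the receive loop would check desc.type and only touch the
    shared region for MSG_TYPE_SHM messages (with cache maintenance if the region
    is cached on the R5F).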

    Regards,

    Nick

  • I am using SDK 8.6, and the Linux version is 5.10. This version does not support the priority control and low-latency settings you mentioned, right? If we do not plan to change the Linux version, what can we do to reduce the latency?

  • Hello Wanglili,

    The best way to control latency in Linux is to use RT Linux, and raise the priority of the code that you care about.

    On earlier kernel versions, the mailbox interrupts are handled at the same priority as everything else, so there is no way to elevate the priority of the mailbox code execution.

    You could try backporting the mailbox code changes to Linux kernel 5.10. I can point you to the kernel commits if you want to try it, but we will not be able to support you in the backporting process, and I cannot guarantee that the code will work the same on kernel 5.10 once it is actually backported.

    One other note, if you are using the zerocopy project on Linux kernel 5.10, please use the ti-linux-6.1 branch instead of the master branch:
    https://git.ti.com/cgit/rpmsg/rpmsg_char_zerocopy/

    Regards,

    Nick

  • Hello,

        I plan to port the interrupt priority handling you mentioned to 5.10, and I hope you can provide technical support. Thank you!

  • Hello Wanglili,

    The Linux patches are here:
    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1388960/sk-am64b-rpmsg-between-a53-and-r5-performance-update/5315748#5315748

    I will not be able to help with backporting that code to Linux kernel 5.10.

    Regards,

    Nick

  • Hello,

    I am encountering problems during inter-core communication and would appreciate your technical assistance:

    1. Non-Real-Time Core (A53 Core) Sending Data Issue:
      • When the non-real-time core sends data, if the real-time core (RT Core) does not receive it in time, an error occurs after 25 or 26 successful sends:
        Error Log:
        [509.586118] rpmsg device ti.ipc4.ping-pong platform 78000000.r5f: failed to send mailbox message, status = -105
        Suggestion from Log:
        omap-mailbox 29020000.mailbox: Try increasing MBOX_TX_QUEUE_LEN
      • Observation:
        • The function send_msg is used for message transmission.
        • Expected Behavior:
          The function should return an error when the mailbox queue is full (to prevent data loss).
        • Actual Behavior:
          No error is returned, and the message is silently discarded (or the sender is unaware of the failure).
    2. Real-Time Core (R5F Core) Reception Deadlock Issue:
      • When the RT Core fails to read messages in a timely manner, and more than 16 messages accumulate in the reception queue:
        • The program becomes stuck in an infinite loop within the RPMessage_notifyCallback function.
        • Impact:
          • Other tasks on the RT Core are starved and cannot execute.
        • Attempted Solution:
          • Setting Bit 0 of register 0x29020010 to 1 (likely a hardware flow control or interrupt enable bit) did not resolve the issue.
  • Hello Wanglili,

    Number of mailboxes? 

    The mailbox hardware on the processor has a hardware FIFO that can only hold up to 4 mailbox messages. However, the Linux mailbox driver has a software queue that stores outgoing mailbox messages until they can be sent. That queue has an arbitrary depth of 20 messages. If you wanted, you could increase it to 256 messages or greater, and then you would be limited by the 256 messages in the VIRTIO buffer instead. For more information, refer to
    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1179393/am6442-problem-detecting-rpmsg-mailbox-full-when-communicating-to-r5-core/4618458#4618458

    MCU+ core lockup after the receive queue gets full? 

    This is a known bug. You can find the patches to fix it in MCU+ SDK 10.1 (as well as much more discussion about the bug) on this thread:
    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1365362/processor-sdk-am62x-multicore-development-ipc-process-multiple-endpoints-reserved-memory-purpose/5532041#5532041

    Regards,
    Nick

  • Thank you for your reply. We are currently using SDK 8.6. Is it possible to backport the bug fixes for the R5F RTOS-related issues (e.g., the infinite loop problem in RPMessage) to SDK 8.6? If so, what modifications would be required?

  • Regarding the warning message "omap-mailbox 29020000.mailbox: Try increasing MBOX_TX_QUEUE_LEN" that Linux prints when more than 20 messages are queued: can simply changing the macro definition MBOX_TX_QUEUE_LEN from 20 to 256 in include/linux/mailbox_controller.h in the Linux source tree resolve the issue?

  • Hello Wanglili,

    Linux software mailbox queue

    Yes, MBOX_TX_QUEUE_LEN is the variable that you would want to update.
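
    For reference, it is a single macro in include/linux/mailbox_controller.h that
    sizes the software queue, something along these lines (the exact line position
    varies between kernel versions, and a kernel rebuild is needed afterwards since
    the macro sizes an array inside the mailbox framework's channel structure):

    /* include/linux/mailbox_controller.h: software TX queue depth per channel */
    #define MBOX_TX_QUEUE_LEN   256   /* default is 20 */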

    MCU+ SDK TX buffer lockup

    I am not sure what would be involved in backporting the bug fixes. I pointed to the actual commits in that linked thread, but I am not sure if there are additional changes between SDK 8.6 and SDK 10.1 that would be required to apply those commits. Unfortunately, I cannot support backporting code to previous software releases.

    Regards,

    Nick