
AM6422: IPC delay

Part Number: AM6422

Hello,

    I have two questions about IPC.

  1. First Question:
    We are using IPC on the AM6422 to implement communication between the A53 and R5F, following the RPMSG-SIMPLE example program. The A53 sends data (less than 100 bytes), and the R5F parses the data and sends a 496-byte acknowledgment back. When measuring the time interval from the A53 sending data to receiving the R5F response, we observed a maximum latency of over 800 ms. Is this level of latency normal?

  2. Second Question:
    "Can we use the zero-copy shared memory functionality on top of the existing rpmsg-simple example implementation?"

Looking forward to your reply!

  • Hello Wanglili,

    First question - benchmarking RPMsg performance 

    What version of Linux are you running?

    If you care about controlling latency, you should use RT Linux, not "regular" Linux. Even then, please remember that RT Linux is NOT completely deterministic - you can say that RT Linux is statistically likely to meet a real-time requirement, but you can never guarantee that Linux will meet that need. For more discussion about Linux vs RT Linux, refer to
    AM64x academy > Multicore > Operating systems
    https://dev.ti.com/tirex/explore/node?node=A__AZmYmYcoWo.KGrq4wf-oPQ__AM64-ACADEMY__WI1KRXP__LATEST 

    Linux kernel 6.6 is the release where we rewrote the low-level mailbox driver beneath RPMsg so that it can be given a higher priority. This means that on RT Linux, starting with kernel 6.6, you can get more control over your RPMsg latency by raising the priority of the RPMsg communication path.

    I have written a dedicated benchmarking example that will be pushed to the public ti-rpmsg-char repository in the next couple of weeks. I can provide the current version of the source code if you're interested. For now, I'll attach the binary so you can run tests with it:

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/rpmsg_5F00_char_5F00_benchmark

    Usage guide:

    rpmsg_char_benchmark
    ====================

    rpmsg_char_benchmark can be used to calculate average and worst-case round-trip
    latencies for RPMsg messages between Linux and a non-Linux core.
    
    Usage:
      rpmsg_char_benchmark [-r <rproc_id>] [-n <num_msgs>] [-m <msg_length>] [-d <rpmsg_dev_name>] [-p <remote_endpt>] [-l <local_endpt>]
    
      Where:
        -r <rproc_id>        remote processor id to be used.
                             Valid values are 0 to RPROC_ID_MAX
        -n <num_msgs>        Number of messages to exchange (default 100)
        -m <msg_length>      Number of characters to send per message (default 100)
        -d <rpmsg_dev_name>  rpmsg device name
                             (defaults to NULL, translates to rpmsg_chrdev)
        -p <remote_endpt>    remote end-point address of the rpmsg device
                             (default 14 based on current example firmwares)
        -l <local_endpt>     local end-point address of the rpmsg device
                             (default RPMSG_ADDR_ANY)
                             (If manually set, must be greater than
                             RPMSG_RESERVED_ADDRESSES = 1024)
    
    Examples:
      rpmsg_char_benchmark -r 4 -n 1000000 -m 1
         Runs the example using default rpmsg device "rpmsg_chrdev",
         remote port 14 with rproc_id value of 4 (R5F_MAIN1_0), exchanges
         1,000,000 messages that are 1 character long
    
      stress-ng --cpu-method=all -c 4 & rpmsg_char_benchmark -r 4 -n 1000000 -m 496 & chrt -f -p 80 $!
         Use with RT Linux
         Runs the example using default rpmsg device "rpmsg_chrdev",
         remote port 14 with rproc_id value of 4 (R5F_MAIN1_0), exchanges
         1,000,000 messages that are 496 characters long.
         Add a background load on Linux with stress-ng.
         Set the rpmsg_char_benchmark application priority to 80.
    

    You can see test results that I generated on Linux kernel 6.6 with earlier versions of the code here:
    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1410313/am6442-communication-latency-issues-between-a53-and-r5-in-a-linux-rt-system/5434861#5434861
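
    If you would rather instrument your own application instead of running the
    benchmark binary, the core of the measurement is just a timed write/read pair
    on the endpoint's character device. Here is a minimal sketch of that idea (it
    is not the benchmark source; the device path is a placeholder for whatever
    endpoint node your setup creates, and error handling is trimmed down):

    /*
     * Minimal round-trip timing sketch (not the benchmark source).
     * Assumes an rpmsg char endpoint device node already exists (the path
     * below is a placeholder) and the remote firmware replies to every
     * message it receives, like the rpmsg-simple / echo examples do.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/rpmsg_char_example", O_RDWR);   /* placeholder path */
        if (fd < 0) {
            perror("open");
            return 1;
        }

        char tx[100] = "ping from A53";
        char rx[512];
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (write(fd, tx, sizeof(tx)) < 0)            /* send to the R5F */
            perror("write");
        ssize_t n = read(fd, rx, sizeof(rx));         /* block until the reply arrives */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        long us = (t1.tv_sec - t0.tv_sec) * 1000000L +
                  (t1.tv_nsec - t0.tv_nsec) / 1000L;
        printf("round trip: %ld us (%zd bytes received)\n", us, n);

        close(fd);
        return 0;
    }

    Run that in a loop and keep the maximum value to capture worst-case latency;
    that is essentially what the benchmark application does, plus statistics.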

  • Second question - can we use zero-copy shared memory alongside "regular" RPMsg? 

    Absolutely.

    Based on your question, I assume you have already found the zerocopy project:
    https://git.ti.com/cgit/rpmsg/rpmsg_char_zerocopy/

    This project uses RPMsg to send a pointer to a shared memory region. Each processor then reads and writes the shared memory region directly, instead of sending that data across many RPMsg messages.

    You could set up multiple RPMsg endpoints, and use different endpoints for different tasks (e.g., endpoint 14 for shared memory, endpoint 15 for other communication). You can find an example of adding multiple RPMsg endpoints to an MCU+ project in the AM64x academy here:
    multicore > IPC > How to add multiple RPMsg endpoints to an MCU+ project
    https://dev.ti.com/tirex/explore/node?node=A__Ae7iN576eTKQcrPcBcMrog__AM64-ACADEMY__WI1KRXP__LATEST
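
    On the MCU+ side, each additional endpoint is just one more RPMessage object
    constructed with a different local endpoint number. As a rough sketch (the
    endpoint numbers are examples, and I am going from memory on the exact
    parameter names, so please double-check against your SDK version):

    /* Sketch only: two endpoints on the R5F, one for shared-memory signaling
     * and one for everything else. Endpoint numbers are examples. */
    #include <drivers/ipc_rpmsg.h>

    #define SHM_ENDPT   14u
    #define CTRL_ENDPT  15u

    RPMessage_Object gShmMsgObj;
    RPMessage_Object gCtrlMsgObj;

    void ipc_endpoints_create(void)
    {
        RPMessage_CreateParams createParams;

        RPMessage_CreateParams_init(&createParams);
        createParams.localEndPt = SHM_ENDPT;
        RPMessage_construct(&gShmMsgObj, &createParams);   /* shared-memory signaling */

        RPMessage_CreateParams_init(&createParams);
        createParams.localEndPt = CTRL_ENDPT;
        RPMessage_construct(&gCtrlMsgObj, &createParams);  /* other communication */
    }

    Each endpoint can then be serviced by its own task calling RPMessage_recv(),
    so traffic on one endpoint does not block behind the other.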

    You could also keep a single endpoint and have your code inspect the content of each RPMsg message to determine whether it is shared-memory signaling or some other kind of data (sketched below).
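
    To illustrate the single-endpoint approach on the Linux side: bulk data is
    written into the shared region, and only a small descriptor travels over
    RPMsg, with a type field that the R5F can use to tell shared-memory signaling
    apart from other traffic. This is only a sketch of the pattern, not the
    zerocopy project's actual code; the carveout address, the struct layout, and
    the /dev/mem mapping are placeholder assumptions for illustration.

    /*
     * Sketch of the signaling pattern only (not taken from rpmsg_char_zerocopy).
     * Bulk data goes through a shared-memory carveout; only a small descriptor
     * is sent over RPMsg. The address and struct layout are placeholders.
     */
    #include <fcntl.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define MSG_TYPE_SHM    1u   /* descriptor for data placed in shared memory */
    #define MSG_TYPE_OTHER  2u   /* any other message on the same endpoint */

    struct shm_desc {            /* the small message actually sent over RPMsg */
        uint32_t type;
        uint32_t offset;         /* payload offset inside the shared region */
        uint32_t length;         /* number of valid bytes */
    };

    int send_via_shared_mem(int rpmsg_fd, const void *data, uint32_t len)
    {
        const off_t  shm_phys = 0xA5000000;  /* placeholder reserved-memory address */
        const size_t shm_size = 0x100000;

        int memfd = open("/dev/mem", O_RDWR | O_SYNC);  /* O_SYNC: uncached mapping */
        if (memfd < 0)
            return -1;

        uint8_t *shm = mmap(NULL, shm_size, PROT_READ | PROT_WRITE,
                            MAP_SHARED, memfd, shm_phys);
        close(memfd);
        if (shm == MAP_FAILED)
            return -1;

        memcpy(shm, data, len);              /* bulk data: shared memory, not RPMsg */

        struct shm_desc desc = { MSG_TYPE_SHM, 0u, len };
        ssize_t ret = write(rpmsg_fd, &desc, sizeof(desc));  /* small descriptor over RPMsg */

        munmap(shm, shm_size);
        return (ret == (ssize_t)sizeof(desc)) ? 0 : -1;
    }

    On the R5F side, the receive loop would check desc.type and only touch the
    shared region for MSG_TYPE_SHM messages (with cache maintenance if the region
    is cached on the R5F).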

    Regards,

    Nick

  • I am using SDK 8.6, and the Linux version is 5.10. This version does not support the priority control and low-latency settings you mentioned, right? If we do not plan to change the Linux version, what can we do to reduce the latency?

  • Hello Wanglili,

    The best way to control latency in Linux is to use RT Linux, and raise the priority of the code that you care about.

    On earlier kernel versions, the mailbox interrupts are handled at the same priority as everything else, so there is no way to elevate the priority of the mailbox code execution.

    You could try backporting the mailbox code changes to Linux kernel 5.10. I can point you to the kernel commits if you want to try it, but we will not be able to support you in the backporting process, and I cannot guarantee that the code will work the same on kernel 5.10 once it is actually backported.

    One other note, if you are using the zerocopy project on Linux kernel 5.10, please use the ti-linux-6.1 branch instead of the master branch:
    https://git.ti.com/cgit/rpmsg/rpmsg_char_zerocopy/

    Regards,

    Nick

  • Hello,

        I plan to port the interrupt priority handling you mentioned to 5.10, and I hope you can provide technical support. Thank you!

  • Hello Wanglili,

    The Linux patches are here:
    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1388960/sk-am64b-rpmsg-between-a53-and-r5-performance-update/5315748#5315748

    I will not be able to help with backporting that code to Linux kernel 5.10.

    Regards,

    Nick

  • Hello,

    I am encountering problems during inter-core communication and would appreciate your technical assistance:

    1. Non-Real-Time Core (A53 Core) Sending Data Issue:
      • When the non-real-time core sends data, if the real-time core (RT Core) does not receive it in time, an error occurs after 25 or 26 successful sends:
        Error Log:
        [509.586118] rpmsg device ti.ipc4.ping-pong platform 78000000.r5f: failed to send mailbox message, status = -105
        Suggestion from Log:
        omap-mailbox 29020000.mailbox: Try increasing MBOX_TX_QUEUE_LEN
      • Observation:
        • The function send_msg is used for message transmission.
        • Expected Behavior:
          The function should return an error when the mailbox queue is full (to prevent data loss).
        • Actual Behavior:
          No error is returned, and the message is silently discarded (or the sender is unaware of the failure).
    2. Real-Time Core (R5F Core) Reception Deadlock Issue:
      • When the RT Core fails to read messages in a timely manner, and more than 16 messages accumulate in the reception queue:
        • The program becomes stuck in an infinite loop within the RPMessage_notifyCallback function.
        • Impact:
          • Other tasks on the RT Core are starved and cannot execute.
        • Attempted Solution:
          • Setting Bit 0 of register 0x29020010 to 1 (likely a hardware flow control or interrupt enable bit) did not resolve the issue.
  • Hello Wanglili,

    Number of mailboxes? 

    The mailbox hardware on the processor has a hardware FIFO that can only hold up to 4 mailbox messages. However, the Linux mailbox driver has a software queue that stores outgoing mailbox messages until they can be sent. That queue has an arbitrary depth of 20 messages. If you wanted, you could increase it to 256 messages or greater, and then you would be limited by the 256 messages in the VIRTIO buffer instead. For more information, refer to
    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1179393/am6442-problem-detecting-rpmsg-mailbox-full-when-communicating-to-r5-core/4618458#4618458

    MCU+ core lockup after the receive queue gets full? 

    This is a known bug. You can find the patches to fix it in MCU+ SDK 10.1 (as well as much more discussion about the bug) on this thread:
    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1365362/processor-sdk-am62x-multicore-development-ipc-process-multiple-endpoints-reserved-memory-purpose/5532041#5532041

    Regards,
    Nick

  • Thank you for your reply. We are currently using SDK 8.6. Is it possible to backport the bug fixes for the R5F RTOS-related issues (e.g., the infinite loop problem in RPMessage) to SDK 8.6? If so, what modifications would be required?

  • Regarding the warning message "omap-mailbox 29020000.mailbox: Try increasing MBOX_TX_QUEUE_LEN" that Linux prints when more than 20 messages are queued: can simply changing the macro definition MBOX_TX_QUEUE_LEN from 20 to 256 in include/linux/mailbox_controller.h in the Linux source tree resolve the issue?

  • Hello Wanglili,

    Linux software mailbox queue

    Yes, MBOX_TX_QUEUE_LEN is the variable that you would want to update.
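
    For reference, it is a single macro in include/linux/mailbox_controller.h that
    sizes the software queue, something along these lines (the exact line position
    varies between kernel versions, and a kernel rebuild is needed afterwards since
    the macro sizes an array inside the mailbox framework's channel structure):

    /* include/linux/mailbox_controller.h: software TX queue depth per channel */
    #define MBOX_TX_QUEUE_LEN   256   /* default is 20 */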

    MCU+ SDK TX buffer lockup

    I am not sure what would be involved in backporting the bug fixes. I pointed to the actual commits in that linked thread, but I am not sure if there are additional changes between SDK 8.6 and SDK 10.1 that would be required to apply those commits. Unfortunately, I cannot support backporting code to previous software releases.

    Regards,

    Nick