AM6442: RPMsg/IPC: How to implement non-blocking or timeout send for RPMsg char device in Linux?

Part Number: AM6442


Tool/software:

Hi,

I’m working on a multi-core IPC system using RPMsg on a TI SoC. From user space I send messages to remote cores via the RPMsg character device (e.g. /dev/rpmsgX). The problem: write() blocks indefinitely when the remote endpoint is not ready.

I tried calling select() / poll() on the fd before write(), but select() always times out (errno 110 / ETIMEDOUT) even though a plain blocking write() succeeds when the remote becomes ready shortly after. It looks like the RPMsg char device might not expose write-readiness the same way sockets/pipes do.

  • Is there a recommended way to implement a non-blocking or timeout send for RPMsg char devices on Linux?

  • Do standard RPMsg char drivers implement poll()/select() for write readiness? If not, are there known patches or driver options to enable that?

  • Any alternative techniques to avoid indefinite blocking on write() if the remote endpoint is not ready? (e.g., driver changes, kernel API, user-space pattern)

  • Any TI-specific APIs, examples, or best practices for robust send-with-timeout in RPMsg-based apps?

Regards,

Mary

  • Hello Mary,

    I have not done testing myself with these functions at this point in time. But my general understanding is that calling open() with O_NONBLOCK or O_NDELAY will allow you to do a non-blocking write, and poll() will allow you to poll until an RPMsg is received. You can set a timeout value with poll(). I am not sure if there is a way to implement a timeout with open().

    We should support all "standard" RPMsg features. We have not implemented any TI-specific APIs for interacting with the RPMsg driver.

    More information here:
    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1248401/processor-sdk-am62x-question-of-ti-rpmsg-char-scripts-specifically-the-recv_msg-method/4731803#4731803
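    To make the poll()-with-timeout idea concrete, here is a minimal sketch of a receive helper (my own untested code, not a TI reference implementation; `recv_with_timeout` is a made-up name, and the fd can be any RPMsg char endpoint descriptor):

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * recv_with_timeout: wait up to timeout_ms for the fd to become readable,
 * then read() one message. Returns the number of bytes read, -ETIMEDOUT
 * if nothing arrived in time, or -errno on any other failure.
 */
static ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
        int ret = poll(&pfd, 1, timeout_ms);

        if (ret < 0)
                return -errno;          /* poll() itself failed */
        if (ret == 0)
                return -ETIMEDOUT;      /* no message within timeout_ms */
        if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
                return -EIO;            /* endpoint error/hangup */

        ssize_t n = read(fd, buf, len); /* one RPMsg payload per read() */
        return (n < 0) ? -errno : n;
}
```

    On an RPMsg char fd this would replace a plain blocking read(); the caller can back off and retry on -ETIMEDOUT.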

    Please keep me updated as you run experiments and figure things out! Eventually I would like to add information about writing and reading RPMsg from Linux userspace to the Multicore academy: https://dev.ti.com/tirex/explore/node?node=A__AeQbQhHUWUoNbbj86zUSgg__AM64-ACADEMY__WI1KRXP__LATEST

    Regards,

    Nick

  • Hi,

    I am working on RPMsg transfer between different cores. I need to implement timeout logic for sending/writing data to the created fd. By timeout logic I mean: if the core receiving the data is not ready, sleep and then retry the write. I have tried select(), poll(), setsockopt(), etc., but none of them work. Is there another method that supports adding this timeout logic?

    Thanks and Regards

    Shiny K George

  • Hello Shiny K George,

    Are you working with Mary on this topic? If so I will move this discussion over to her thread so that we can keep the conversation in one place:
    AM6442: RPMsg/IPC: How to implement non-blocking or timeout send for RPMsg char device in Linux?

    I gave an initial response on that thread.

    Regards,

    Nick

  • Hello Nick,

    Thanks for the clarification and the references. I’ll try this out on my side and will keep you updated on the results.

    Regards,
    Mary

  • Thanks, Nick, for your immediate reply. Yes, we are working together.

  • Hi Nick,

    Based on previous suggestions, I tried to implement timeout logic using poll() by setting the endpoint to non-blocking mode. However, the implementation is not working as expected.

    I tried changing:

    rpmsg_char_dev_t *rcdev;
    int flags = O_NONBLOCK; // <--- ADDED O_NONBLOCK HERE
    // ...
    rcdev = rpmsg_char_open(rproc_id, dev_name, local_endpt, remote_endpt, eptdev_name, flags);
    // ...

    Then I used a poll()-based send function:

    static int send_with_poll_deadline(int fd, const void *buf, size_t len, int deadline_ms)
    {
            int ret;
            struct pollfd pfd;

            if (len == 0)
                    return 0;

            pfd.fd = fd;
            pfd.events = POLLOUT;
            pfd.revents = 0;

            // Wait for the descriptor to become writable
            ret = poll(&pfd, 1, deadline_ms);
            if (ret < 0) {
                    perror("poll error in send_with_poll_deadline");
                    return -errno;
            }
            if (ret == 0)
                    return -ETIMEDOUT; // Timeout

            // Check for writability and errors
            if (!(pfd.revents & POLLOUT)) {
                    if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
                            return -EIO;
                    return -EAGAIN; // Should not happen after a successful poll
            }

            // Attempt the write now that poll indicates it is safe
            ret = send_msg(fd, (char *)buf, (int)len);
            if (ret < 0)
                    return -errno; // Failure in send_msg
            if ((size_t)ret != len)
                    return -EIO; // Partial write

            return 0; // Success
    }

    The application immediately fails with a timeout (-ETIMEDOUT) from poll(), even though the R5F is running and ready to receive messages; it then retries the send and times out again.

    I have also tried setting the flag with fcntl():

    int fd = rcdev->fd; // Get the file descriptor

    int current_flags = fcntl(fd, F_GETFL, 0);
    if (current_flags == -1) {
            perror("fcntl F_GETFL");
            return -1;
    }

    if (fcntl(fd, F_SETFL, current_flags | O_NONBLOCK) == -1) {
            perror("fcntl F_SETFL O_NONBLOCK");
            return -1;
    }

    I am expecting poll() to return successfully within SEND_DEADLINE_MS (500 ms) when the R5F core is running and ready, allowing the message to be sent, or to return -ETIMEDOUT if the R5F fails to respond or the buffer is perpetually full. Please help me implement this feature.

    Thanks and Regards.

    Shiny

  • For future readers, Mary and Shiny K George are working together on this project. Since there are 2 threads asking the same question, I am going to merge both threads here so that we can have a single discussion. If new questions pop up, we can split those off into separate threads.

  • Hello Shiny,

    I would expect that poll() is waiting to receive an RPMsg message from the R5F. So is the R5F programmed to send the first RPMsg?

    If the R5F is running the IPC_Echo example, then the R5F is also waiting for an RPMsg to be sent. In that case, I would expect you to test like this:

    Linux userspace
    open() <-- send RPMsg trigger
    poll() <-- wait for R5F to echo back the RPMsg

    Regards,

    Nick

  • Hello Nick,

    Thank you for the clarification. I am using the IPC_Echo Linux example, and when we use the default send message function without timeout logic, it works successfully.

    In my current setup, I am executing the exact steps you mentioned: Linux sends the first message (the "trigger"). However, I am finding that the poll() call on the Linux side still blocks indefinitely (or until timeout), and the initial message is not reaching the R5F (the Linux → R5F path is failing).

    Could you please suggest what I should check next on either side?

    Regards,
    Shiny

  • Hello Shiny,

    Please share a snippet of the latest test code that you are using on the Linux side. I would expect your code to look like this:
    https://git.ti.com/cgit/rpmsg/ti-rpmsg-char/tree/examples/rpmsg_char_simple.c

    int send_msg(int fd, char *msg, int len)
    {
    // no changes. You still send a message with write()
    }
    
    int recv_msg(int fd, int len, char *reply_msg, int *reply_len)
    {
    // this is where the poll would go
    }

    You can use the trace log to check to see if the R5F is receiving the message. For more information, refer to AM64x academy:
    Multicore > Application Development on Remote Cores > Debug the remote core through the Linux terminal
    https://dev.ti.com/tirex/explore/node?node=A__AbUQ5KHRh.Fi4sFvN-aleA__AM64-ACADEMY__WI1KRXP__LATEST 

    Regards,

    Nick

  • Hello Nick,

    Thank you for the suggestions. I wanted to clarify my approach: I have added a poll() mechanism to the send path as well. The idea is that if the R-core is not ready to receive messages, the Linux core should attempt to send, but if it cannot proceed immediately, it should timeout, back off for a certain duration, and then retry.

    The same logic applies to the receive side: if no message is available yet, Linux should wait up to the timeout, then retry after a backoff period. This ensures that Linux does not block indefinitely while waiting for the R-core to be ready, and it allows both send and receive paths to handle the R-core's readiness gracefully. Below is a snippet of my current Linux-side test code that illustrates this behavior:

    static int send_with_poll_deadline(int fd, const void *buf, size_t len, int deadline_ms)
    {
            int ret;
            struct pollfd pfd;
            size_t remaining = len;
            const char *ptr = (const char *)buf;
    
            if (len == 0)
                    return 0; // Nothing to send.
    
            pfd.fd = fd;
            pfd.events = POLLOUT; // We want to wait for the file descriptor to be ready for writing.
            pfd.revents = 0;
    
            // Use poll() to wait until the file descriptor is ready for writing or the timeout expires.
            ret = poll(&pfd, 1, deadline_ms);
    
            if (ret < 0) {
                    // Error during poll()
    #ifdef DEBUG
                    perror("poll error in send_with_poll_deadline");
    #endif
                    return -errno;
            }
    
            if (ret == 0) {
                    // Timeout occurred (poll returned 0)
    #ifdef DEBUG
                    printf("poll timeout (%d ms) in send_with_poll_deadline\n", deadline_ms);
    #endif
                    return -ETIMEDOUT;
            }
    
            // ret > 0: check revents
            if (!(pfd.revents & POLLOUT)) {
                    // If POLLOUT isn't set, something else happened (e.g., error/hangup)
    #ifdef DEBUG
                    fprintf(stderr, "poll returned revents 0x%x but not POLLOUT\n", pfd.revents);
    #endif
                    if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL)) {
                            // Handle actual error/hangup events
                            return -EIO; // Or a more specific error
                    }
                    return -EAGAIN; // Try again (though this shouldn't happen with a timeout set)
            }
    
            // File descriptor is ready to write. Call your existing send_msg.
            // Note: send_msg/write might return a short count, but for RPMsg messages,
            // they usually send the full packet or fail with EAGAIN/EWOULDBLOCK.
            // Since we polled, a successful write should occur.
            ret = send_msg(fd, (char*)ptr, (int)len);
    
            if (ret < 0) {
                    // Handle failure in send_msg
                    int e = errno;
    #ifdef DEBUG
                    if (e == EAGAIN || e == EWOULDBLOCK) {
                            printf("send_msg failed with EAGAIN/EWOULDBLOCK after poll, should not happen\n");
                    } else {
                            perror("send_msg error in send_with_poll_deadline");
                    }
    #endif
                    return -e;
            }
    
            if ((size_t)ret != len) {
                    // Partial write (unlikely for RPMsg-char unless a very large packet is sent)
    #ifdef DEBUG
                    printf("Partial write: sent %d/%zu bytes\n", ret, len);
    #endif
                    // You might need more logic here if partial writes are possible/expected.
                    return -EIO; // Treat partial write as a failure for a simple packet
            }
            #ifdef DEBUG
            printf("Send success using poll fn\n");
    #endif
            return 0; // Success
    }

    I am calling this function from here...


    int rpmsg_start(int rproc_id, char *dev_name, unsigned int local_endpt, unsigned int remote_endpt,
                    int num_msgs)
    {
            int ret = 0;
            int i = 0;
            char eptdev_name[64] = { 0 };
            char packet_buf[512] = { 0 };
            rpmsg_char_dev_t *rcdev;
            int flags = 0;
            //int flags = O_NONBLOCK;
            int iret;
            size_t packet_len;
            timestamp_t ts;
            size_t got = 0;
            uint64_t local_gtc;
            uint64_t gtc_diff_ms;
            int total_msec_toset = 0;
            int status;
            /*
             * Open the remote rpmsg device identified by dev_name and bind the
             * device to a local end-point used for receiving messages from
             * remote processor
             */
            sprintf(eptdev_name, "rpmsg-char-%d-%d", rproc_id, getpid());
            rcdev = rpmsg_char_open(rproc_id, dev_name, local_endpt, remote_endpt,
                            eptdev_name, flags);
            if (!rcdev) {
    #ifdef DEBUG
                    perror("Can't create an endpoint device");
    #endif
                    return -EPERM;
            }
    #ifdef DEBUG
            printf("Created endpt device %s, fd = %d port = %d\n", eptdev_name,
                            rcdev->fd, rcdev->endpt);
            printf("Exchanging messages with rpmsg device %s on rproc id %d ...\n\n",
                            eptdev_name, rproc_id);
    #endif
    #if 0 // Tested, not successful
            int fd = rcdev->fd; // Get the file descriptor
            int current_flags = fcntl(fd, F_GETFL, 0);
            if (current_flags == -1) {
                    perror("fcntl F_GETFL");
                    return -1;
            }
    
            if (fcntl(fd, F_SETFL, current_flags | O_NONBLOCK) == -1) {
                    perror("fcntl F_SETFL O_NONBLOCK");
                    return -1;
            }
    #endif
            while(1) {
    #if 1
                    memset(packet_buf, 0, sizeof(packet_buf));
                    snprintf(packet_buf, sizeof(packet_buf), "Ready!");
                    packet_len = strlen(packet_buf);
                    iret = send_with_poll_deadline(rcdev->fd, packet_buf, packet_len, SEND_DEADLINE_MS);



    Thanks and Regards

    Shiny

  • Hello Shiny,

    I am not going to have time to run tests on my end this week, but I will still brainstorm ideas with you. Please keep me updated as you figure things out!

    Fundamental #1: Linux does not need to "wait for R5F to be ready to receive a message"

    After the RPMsg & VIRTIO infrastructure gets set up, Linux can immediately start writing messages to the VIRTIO buffer. If Linux is writing messages to the buffer and the R5F is not reading messages from the buffer, then eventually Linux will be blocked from writing more messages. I will discuss that more in APPENDIX. But at the start, Linux should be able to immediately write to the RPMsg buffer.

    I have not tried using poll() with POLLOUT, but it looks like POLLOUT is specifically looking for an event to occur that says "ok, the file was not writable before, but NOW the file is writable". Since the file is already writable to begin with, I suspect that you will never get an event indicating that the file descriptor has changed to become writable.

    Please try with a non-blocking write 

    Like this:
    open the file descriptor with non-blocking mode
    returnValue = write()
    if (returnValue == -1) {
            // error: we did not successfully write to the RPMsg buffer
            // now try polling with POLLOUT until we get an event
    }

    You might try it by sending back-to-back messages from Linux. On the R5F side, you may want to wait for something like 5 seconds before reading, to give Linux time to fill up the buffers and hit a returnValue of -1.
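    That sequence could look something like the sketch below (my own code under the assumption that the fd was opened with, or switched via fcntl() to, O_NONBLOCK; `send_with_deadline` is a made-up helper name, not a TI API):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

/*
 * send_with_deadline: attempt a non-blocking write() first; only if the
 * transport is full (EAGAIN) fall back to poll(POLLOUT) with a timeout,
 * then retry. Returns 0 on success, -ETIMEDOUT if no space opened up in
 * time, or -errno on any other failure.
 */
static int send_with_deadline(int fd, const void *buf, size_t len, int timeout_ms)
{
        for (;;) {
                ssize_t n = write(fd, buf, len);

                if (n == (ssize_t)len)
                        return 0;               /* whole message accepted */
                if (n >= 0)
                        return -EIO;            /* partial write: treat as failure */
                if (errno != EAGAIN && errno != EWOULDBLOCK)
                        return -errno;          /* real error, not "buffer full" */

                /* Buffer full: wait (bounded) for space, then retry the write. */
                struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
                int ret = poll(&pfd, 1, timeout_ms);

                if (ret < 0)
                        return -errno;
                if (ret == 0)
                        return -ETIMEDOUT;      /* caller can back off and retry */
                if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
                        return -EIO;
        }
}
```

    Note that the write() comes first: poll(POLLOUT) is only consulted once write() has actually reported EAGAIN, which matches the observation that the fd starts out writable.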

    APPENDIX - how many RPMsg messages can Linux write before it is blocked?

    You have a limited number of spaces in the VIRTIO buffer. By default, we allocate 256 buffers in each direction, but there are other limitations like the Linux SW mailbox FIFO. Refer to this thread for more:
    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1500812/am6422-ipc-delay/5791025#5791025
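    One way to run the back-to-back experiment from user space (again my own sketch; `count_sends_until_full` is a hypothetical helper and assumes a non-blocking fd):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * count_sends_until_full: keep issuing non-blocking write()s until the
 * transport reports EAGAIN, and return how many messages were accepted.
 * On an RPMsg char fd this gives a rough measure of how many buffers
 * Linux can queue before the remote core has to drain them.
 */
static int count_sends_until_full(int fd, const void *msg, size_t len)
{
        int count = 0;

        for (;;) {
                ssize_t n = write(fd, msg, len);

                if (n == (ssize_t)len) {
                        count++;
                        continue;
                }
                if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
                        return count;           /* transport is full */
                return (n < 0) ? -errno : -EIO; /* unexpected error or partial write */
        }
}
```

    If the R5F sleeps for a few seconds before reading, the returned count should reflect the VIRTIO/mailbox limits discussed in the linked thread.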

    Regards,

    Nick

  • Hi Nick,

    Thank you for the guidance. It is working now.

    I implemented non-blocking writes and fall back to poll(POLLOUT) only when write() returns -1. With this approach, Linux is able to transmit messages immediately after RPMsg/VIRTIO initialization.

    I’m now encountering the expected buffer-exhaustion condition and would value your guidance on safely increasing the available buffering/capacity on AM64x. Are there any concerns or trade-offs when enlarging the queue size and related parameters?

    Thanks & Regards

    Shiny

  • Hello Shiny,

    Glad to hear that you are able to make progress. My responses about the queue size will be followup to the link in the "APPENDIX" of my previous response (the April 25, 2025 response on this thread).

    SDK check: are you using MCU+ SDK 10.1 or later? 

    There is a known bug on AM64x MCU+ SDK 10.0 and earlier, where if the MCU+ core's TX buffers are all full, then the core enters an infinite loop and is unable to pop any messages out of the TX buffer.

    Is it safe to increase the size of the TX software mailbox going in the direction from Linux to MCU+ core? 

    I would expect to see 2 different behaviors in these 2 different cases:

    CASE 1: the software mailbox is full, but the VIRTIO buffers are not all full (e.g., SW mailbox size of 20, and 256 TX buffers)

    CASE 2: the software mailbox is NOT full, but the VIRTIO buffers are full (e.g., SW mailbox size of 270, and 256 TX buffers)

    There are 4 hardware FIFO slots per mailbox. So I am not sure if we transition from CASE 1 to CASE 2 when SW mailbox size = number of TX buffers, or when (SW mailbox size + HW mailbox size) = number of TX buffers.

    It has been a couple of years since I looked into CASE 1 and CASE 2. Please update MBOX_TX_QUEUE_LEN as per the link in the APPENDIX response, rebuild the kernel drivers and load the new kernel & kernel modules onto your EVM, and show me how your EVM behavior changes between CASE 1 and CASE 2 if you let the TX buffers fill up. Please capture the terminal output for both cases.
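    For reference, in the upstream kernels I have looked at, the macro is defined in include/linux/mailbox_controller.h (please verify the exact location in your SDK's kernel tree; the value 270 below is just the CASE 2 example from above, not a recommendation):

```
/* include/linux/mailbox_controller.h (location may vary between kernel versions) */
#define MBOX_TX_QUEUE_LEN	270	/* upstream default is 20 */
```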

    Regards,

    Nick