AM6442: Problem detecting RPMsg mailbox full when communicating with the R5 core

Part Number: AM6442

We are currently in an early development cycle where the firmware is frequently reloaded or paused for debugging. Under these conditions, an A53 application is still attempting to send messages to the R5 core via RPMsg. Eventually all of the mailboxes fill up and the A53 application can no longer send RPMsg messages to the R5.

The problem is that the RPMsg write call does not fail, and there is no other indication of failure that I can see in the A53 application. I only see the error message on the terminal console: "failed to send mailbox message...Try increasing MBOX_TX_QUEUE_LEN".

This shouldn't happen in production or when everything is working correctly. However, it is necessary to detect this condition in the A53 application and log it, so that we know, for example, whether the system is sending messages too fast or sending too many of them. I could write a script that watches dmesg, looks for this error message, and then signals the app (sketched below), but I hope that is unnecessary. Is there some way to detect this error condition in a Linux application that is attempting to communicate with a previously connected R5 application?
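
For illustration, the kind of watcher I have in mind would just follow the kernel log and match on that message, roughly like the sketch below (the exact error text to match and the way the application gets signalled are assumptions):

/* Sketch only: follow /dev/kmsg and flag the mailbox-full error to the app.
 * The matched string and the "signal the app" step are assumptions. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char rec[1024];
    int fd = open("/dev/kmsg", O_RDONLY);   /* each read() returns one kernel log record */
    if (fd < 0) {
        perror("open /dev/kmsg");
        return 1;
    }
    lseek(fd, 0, SEEK_END);                 /* skip the backlog, watch only new records */
    for (;;) {
        ssize_t n = read(fd, rec, sizeof(rec) - 1);
        if (n <= 0)
            continue;                       /* e.g. records overwritten; just keep going */
        rec[n] = '\0';
        if (strstr(rec, "failed to send mailbox message"))
            fprintf(stderr, "mailbox full detected\n");   /* signal the A53 app here */
    }
}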

On a closely related note, is there a way to communicate with the remoteproc API directly? Or is the sysfs interface the only supported interface? Is there example code communicating directly with the remoteproc API?

  • Hello Brian,

    Just to make sure you have seen the RPMsg documentation here:
    https://software-dl.ti.com/processor-sdk-linux/esd/AM64X/08_04_01_04/exports/docs/linux/Foundational_Components_IPC64x.html#rpmsg-char-driver

    and the RPMsg userspace library & example here:
    https://git.ti.com/cgit/rpmsg/ti-rpmsg-char/tree/examples/rpmsg_char_simple.c
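
    For orientation, that example boils down to creating an endpoint through the rpmsg_ctrl device and then reading/writing the resulting character device. Below is a minimal sketch against the kernel UAPI; the /dev node names and the remote port 14 are assumptions for illustration, and the ti-rpmsg-char library discovers those details for you:

    /* Sketch: create an RPMsg endpoint via the rpmsg_ctrl device and send one
     * message. /dev/rpmsg_ctrl0, /dev/rpmsg0 and dst = 14 are assumptions. */
    #include <fcntl.h>
    #include <linux/rpmsg.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
        struct rpmsg_endpoint_info info = {
            .name = "ti.ipc4.ping-pong",
            .src  = 0xFFFFFFFF,             /* RPMSG_ADDR_ANY: let the kernel pick */
            .dst  = 14,                     /* remote endpoint announced by the R5 */
        };
        int ctrl = open("/dev/rpmsg_ctrl0", O_RDWR);
        if (ctrl < 0 || ioctl(ctrl, RPMSG_CREATE_EPT_IOCTL, &info) < 0) {
            perror("create endpoint");
            return 1;
        }
        int ept = open("/dev/rpmsg0", O_RDWR);   /* device created by the ioctl above */
        if (ept < 0) {
            perror("open endpoint");
            return 1;
        }
        const char msg[] = "hello there!";
        printf("write returned %zd\n", write(ept, msg, sizeof(msg)));
        close(ept);
        close(ctrl);
        return 0;
    }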

    Off the top of my head, I am not familiar with a method to check how many VIRTIO buffers are still available. I'll check with some other team members. Please ping the thread if I don't respond within a couple of business days.

    Regards,

    Nick

  • Thanks Nick. Any comment on the other question at the bottom of my post, "...is there a way to communicate with the remoteproc API directly? Or is the sysfs interface the only supported interface? Is there example code communicating directly with the remoteproc API?"

  • Hello Brian,

    Checking for available VIRTIO buffers 

    The developer says there is not a way to check whether there is an available VIRTIO buffer before attempting to send an RPMsg.

    To double-check the userspace behavior: does the userspace RPMsg send call block (or time out with an error) if there is no VIRTIO buffer space, or is it just silently failing?

    RPMsg example code

    The RPMsg examples that I am aware of are:

    1) https://git.ti.com/cgit/rpmsg/ti-rpmsg-char (linked above)
    2) https://git.ti.com/cgit/rpmsg/rpmsg_char_zerocopy (brand new shared memory example, if you have any constructive feedback for us around it please let me know)
    3) the standard Linux example in the Linux kernel source under samples/rpmsg/rpmsg_client_sample.c

    Regards,

    Nick

  • Hi Nick,

    The userspace RPMsg send call is silently failing: it returns the correct number of bytes, as if everything had been written successfully. I believe this to be a bug.

    The only other indication of failure is in the kernel ring buffer, which can be retrieved using dmesg.
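
    In other words, a check like the sketch below never trips in this failure mode, even while the kernel is logging the error (fd, msg and len stand in for the reproducer's endpoint and payload):

    #include <stdio.h>
    #include <unistd.h>

    /* Sketch of what the sending side sees. 'fd' is an already-open RPMsg
     * endpoint. Even while the kernel logs "failed to send mailbox message",
     * write() still returns the full length and gives no error. */
    static int send_checked(int fd, const void *msg, size_t len)
    {
        ssize_t n = write(fd, msg, len);
        if (n < 0) {
            perror("rpmsg write");          /* never reached in this failure mode */
            return -1;
        }
        if ((size_t)n != len) {
            fprintf(stderr, "short write: %zd of %zu bytes\n", n, len);  /* also never reached */
            return -1;
        }
        return 0;   /* reports success even when the mailbox send actually failed */
    }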

    Thank you for the links. I'll take a look at rpmsg_char_zerocopy.

    Regards,

    Brian

  • Hello Brian,

    I am double-checking on what the expected behavior is. Please ping the thread if I have not replied within several business days.

    Regards,

    Nick

  • Hello Brian,

    It sounds like we expect the send to be a blocking call. Can you attach a sample application for us to look at?

    Regards,

    Nick

  • Hi Nick. Please see attached.

    mailbox_full.zip

    Below is sample output:

    root@am64xx-evm:~# ./rpmsg_mailbox_full -r 2 -n 100 -p 16 -d rpmsg_chrdev
    Created endpt device rpmsg-char-2-4220, fd = 3 port = 1024
    Exchanging 100 messages with rpmsg device ti.ipc4.ping-pong on rproc id 2 ...
    
    Sending message #0: hello there 0!
    Sending message #1: hello there 1!
    Sending message #2: hello there 2!
    Sending message #3: hello there 3!
    Sending message #4: hello there 4!
    Sending message #5: hello there 5!
    Sending message #6: hello there 6!
    Sending message #7: hello there 7!
    Sending message #8: hello there 8!
    Sending message #9: hello there 9!
    Sending message #10: hello there 10!
    Sending message #11: hello there 11!
    Sending message #12: hello there 12!
    Sending message #13: hello there 13!
    Sending message #14: hello there 14!
    Sending message #15: hello there 15!
    Sending message #16: hello there 16!
    Sending message #17: hello there 17!
    Sending message #18: hello there 18!
    Sending message #19: hello there 19!
    Sending message #20: hello there 20!
    Sending message #21: hello there 21!
    Sending message #22: hello there 22!
    Sending message #23: hello there 23!
    Sending message #24: hello there 24!
    Sending message #25: hello there 25![ 5386.118752] omap-mailbox 29020000.mailbox: Try increasing MBOX_TX_QUEUE_LEN
    
    [ 5386.118778] platform 78000000.r5f: failed to send mailbox message, status = -105
    Sending message #26: hello there 26!
    [ 5387.119039] omap-mailbox 29020000.mailbox: Try increasing MBOX_TX_QUEUE_LEN
    [ 5387.119064] platform 78000000.r5f: failed to send mailbox message, status = -105
    Sending message #27: hello there 27!
    [ 5388.119458] omap-mailbox 29020000.mailbox: Try increasing MBOX_TX_QUEUE_LEN
    [ 5388.119485] platform 78000000.r5f: failed to send mailbox message, status = -105
    
    ^C
    Clean up and exit while handling signal 2
    Application did not close some rpmsg_char devices
    root@am64xx-evm:~#

    Please note that the error messages are coming from the kernel, not from the application; they appear interleaved with the application's output only because I ran the reproducer on the serial console, where kernel messages are also printed.

  • Hello Brian,

    I am checking with the developer. Please ping the thread if I have not provided a response by Friday.

    Regards,

    Nick

  • Hi Nick. Any update from the developer?

    Thanks,
    Brian

  • Not yet. Trying again.

    -Nick

  • Hello Brian,

    I was told to expect a response by Friday.

    -Nick

  • Hi Nick. Any update?

  • Hello Brian,

    Thanks for the ping. It looks like the mailbox queue (queue length of 20) is smaller than the number of VRING buffers (256 buffers per VRING by default in the resource table). So the send silently fails when the mailbox queue fills up, instead of blocking when the VRING buffers fill up.

    So we would expect to see different behavior when you reduce the number of vring buffers to something like 16.

    Linux gets the number of VRING buffers from the remote processor's resource table.

    The MCU+ Linux IPC driver automatically sets the number of buffers to 256, so I modified the MCU+ SDK driver file to use 16 buffers instead. However, so far it is not working for me. I am checking with the developer.
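
    For context, the value in question is the "num" field of the vring entries in the firmware's resource table. Here is a sketch using the upstream remoteproc layout (struct fw_rsc_vdev_vring from include/linux/remoteproc.h); the MCU+ SDK wraps this in its own generated structures, so the names there differ, and the address/notify values below are placeholders:

    #include <stdint.h>

    /* Upstream remoteproc vring descriptor (include/linux/remoteproc.h). */
    struct fw_rsc_vdev_vring {
        uint32_t da;        /* device address of the vring (0xFFFFFFFF = any) */
        uint32_t align;     /* vring alignment */
        uint32_t num;       /* number of buffers in this vring */
        uint32_t notifyid;  /* notification/kick ID */
        uint32_t reserved;
    };

    /* The TI firmware publishes num = 256 per direction by default; the
     * experiment here reduces it to 16 so it no longer exceeds the mailbox
     * queue length of 20. Addresses and notify IDs are placeholders. */
    static const struct fw_rsc_vdev_vring vring0 = {
        .da = 0xFFFFFFFF, .align = 4096, .num = 16, .notifyid = 1,
    };
    static const struct fw_rsc_vdev_vring vring1 = {
        .da = 0xFFFFFFFF, .align = 4096, .num = 16, .notifyid = 2,
    };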

    If you want to run experiments on your side, here are the modifications I made to the AM62x MCU+ SDK 8.5. Your AM64x project should be pretty similar, if not identical. Note that these changes are NOT working for me yet, so additional modifications may be required:

    0001-modify-ipc-driver-to-use-16-VIRTIOs.patch

    And here are the commands to build after you have your MCU+ SDK fully set up (after following the instructions in the getting started page https://software-dl.ti.com/mcu-plus-sdk/esd/AM64X/08_05_00_24/exports/docs/api_guide_am64x/SDK_DOWNLOAD_PAGE.html ). Again, these were run with AM62x but the same concept applies:

    /mcu_plus_sdk_am62x_08_05_00_14$ make -s -C examples/drivers/ipc/ipc_rpmsg_echo_linux/am62x-sk/m4fss0-0_freertos/ti-arm-clang/ clean
    /mcu_plus_sdk_am62x_08_05_00_14$ make -s -C examples/drivers/ipc/ipc_rpmsg_echo_linux/am62x-sk/m4fss0-0_freertos/ti-arm-clang/

    Regards,

    Nick

    Hmm, ok, so it seems like the VRING size of 256 buffers in each direction is hardcoded somewhere, because whenever we reduce the number of buffers the code breaks. We are not yet sure where that value is hardcoded.

    I am planning to provide another update in a couple of days. Please ping the thread if I have not replied by Thursday.

    Regards,

    Nick

  • Hi Nick. Any update on this?

  • Brian,

    Nick is out of the office until the middle of next week, so please allow a couple of days for him to get back and continue this discussion. Thanks for your patience.

    Regards, Andreas

  • Hi Nick. Any update on this?

  • Hello Brian,

    Apologies for all the delays here.

    Summarizing the current status:

    There are the Linux RPMsg VIRTIO buffers, which have a size of 256 buffers by default.

    The RPMsg driver uses mailboxes to communicate. There is a software mailbox queue that has a size of 20 by default (i.e., the mailbox driver assumes that there will never be more than 20 waiting messages that it has to keep track of).

    (Note: I am saying "software mailbox queue" to differentiate it from the physical FIFO inside the mailbox, which has a fixed size; the software queue is just a software construct whose size can be changed by rewriting the kernel code.)

    If you try to send a message after all of the RPMsg VIRTIO buffers have been filled, the send call blocks, which is the expected behavior. However, if you try to send a message after the software mailbox queue has been filled (i.e., after 20 RPMsg messages are waiting to be processed by the other core, with the default value), it fails silently and we see the behavior you are observing.

    Ok, is there currently a way to get around the behavior? 

    As long as the number of VIRTIO buffers is no larger than the size of the software mailbox queue, you don't see the issue.

    a) If you increase the value of MBOX_TX_QUEUE_LEN in include/linux/mailbox_controller.h to 256, then we observe the expected "blocking call" behavior (a sketch of this change follows below).

    b) In our initial experiments, we were not able to decrease the number of RPMsg VIRTIO buffers created by Linux RPMsg to less than 256. This was just an initial look, so there may be a way to do this easily.
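
    For reference, option (a) amounts to a one-line change to the kernel header, followed by a kernel rebuild (a sketch, with the surrounding context of the header omitted):

    /* include/linux/mailbox_controller.h (sketch, surrounding context omitted).
     * The default is 20; raising it to match the 256 VIRTIO buffers restores
     * the expected blocking behavior, but this header is shared by every
     * mailbox client in the kernel, so treat it as a workaround. */
    #define MBOX_TX_QUEUE_LEN 256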

    What can you expect from us next? 

    mailbox_controller.h is used in multiple contexts as far as I can tell, not just in Linux RPMsg. So the above fixes feel more like hacks to me than an actual "fix". The developer is tied up in some work preparing for the next SDK, but I have a meeting with them scheduled for Friday May 12 to discuss further. Please ping the thread if I have not provided another update by Monday May 15.

    Regards,

    Nick

  • Update:

    The developer would expect that the "right" fix here would be to simply reduce the number of VIRTIO buffers that the MCU+ firmware defines in its resource table to fewer than 20, as discussed in this earlier response: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1179393/am6442-problem-detecting-rpmsg-mailbox-full-when-communicating-to-r5-core/4489518#4489518

    They will run experiments over the next couple of days, and have told me that they will get back to us by Wednesday May 17.

    Regards,

    Nick

    Thank you, Nick, for staying on top of this.

  • Hello Brian,

    So far the Linux developer has been able to replicate my non-working behavior, but does not see anything on the Linux side that should limit things to 256 buffers in each direction (in fact, the same RPMsg framework used by other processor companies ships with different default buffer counts than ours, so we do not expect the Linux framework itself to be the issue). When I dug through the MCU+ driver code, I didn't see anything that looked suspicious there either.

    I am reaching out to the MCU+ developers to see if they have any input.

    Regards,

    Nick