AM6548: RP_msg problem

Part Number: AM6548

I am using an AM65xx IDK eval card with version 7.3 of the Linux SDK, tools, and RTOS.

Testing the IPC/rpmsg ping-pong example (rpmsg_client_sample).

It works as expected when run unchanged.

I wanted to test several aspects of message throughput using rpmsg between Linux and the R5 processor.

When reading the description of the function rpmsg_send() in rpmsg_core.c:

“In case there are no TX buffers available, the function will block until one becomes available, or a timeout of 15 seconds elapses. When the latter happens, -ERESTARTSYS is returned.”

I tried putting rpmsg_send() in a for-loop to check the buffer size and whether it blocks.

But the function always returns 0, even when it is not actually sending the message.

I then replaced the call with rpmsg_trysend() instead.

But when I put it in a for-loop, it signals no problems until about 300 calls.

Yet when I look at the R5 side, the buffer appears to be about 20 entries.

So my question is: why does it not block or signal a full buffer until around 300 function calls, when it should have blocked long before?

And how to fix it?

Can someone give me some help?

Espen

  • Hi Espen,

    What scenario/behavior are you trying to test/emulate here? Can you give the steps you modified for your test scenario?

    The virtio_rpmsg bus uses the virtio-ring transport, so it relies on a ring of buffers (configured as 256 in each direction in the current examples). rpmsg_send() will only block when all 256 buffers have been used for transmission and none of them have been processed by the remote processor. A typical sequence has the remote processor process all pending ring buffers upon receiving the virtio kick interrupt and add the buffers back to the available ring. So in normal usage it is very unlikely to run into this situation unless the remote processor has hung or is running very slowly and cannot keep up with the Tx side.

    The difference between rpmsg_trysend() and rpmsg_send() is that the former will return right away when there are no available buffers for Tx.

    regards

    Suman 

  • Thanks for the answer.

    Comments directly to your response:

    I think I understood what you wrote. (But I did not know the buffer size.)

    I was using the rpmsg_trysend() function to get an immediate response.

    But even if this is unlikely to happen, I feel I cannot trust rpmsg_trysend() if it does not return EBUFFERFULL and the message doesn't go through.

    I could possibly put in a delay, but how do I know it is large enough?

     

    But what I have tested:  

    My task was only to get familiar with the rpmsg system used between Linux and the R5, and learn how to use it.

    Therefore, I thought I could play a little with the examples and do some tests:

    1. Verify what happens at the sender when the receiver stops receiving or is slow/overloaded,
       and try to figure out what the buffer size is when that happens.
    2. Measure throughput (in either/both directions; sending as often as I could, or ping-pong).

    Test program:

    I am using the rpmsg_client_sample module at the Linux end as a base:

    In the probe function I have replaced the rpmsg_send() call with rpmsg_trysend().

    And I have put the rpmsg_trysend() call in a for-loop with 1,000 iterations to check when the send buffer becomes full.

    The regular callback function in the kernel module is changed to only print incoming messages and return, instead of sending a message back to the R5.

    -

    At the R5 end I connected the debugger and put a breakpoint after receiving (pulling) messages.

    ---------

    What happens, as one can see in the kernel-log printout (included below):

    1. Messages 1-25 are sent according to the return value of rpmsg_trysend(), as shown in the printout.
      Then, for messages 26 to 256:
           - omap-mailbox 31f80000.mailbox: Try increasing MBOX_TX_QUEUE_LEN
           - platform 41000000.r5f: failed to send mailbox message, status = -105
      This looks like a “magic” error from the system (it does not show up in the return value of rpmsg_trysend());
      judging by that return value, the function still appears to be sending.
    2. After 257 messages, rpmsg_trysend() returns -12, which I take to be -EBUFFERFULL.
      This continues for as long as one keeps sending (1,000 messages).

    3. Then I turn to the R5 and look via the debugger, which has stopped at a breakpoint
      upon receiving the first message.
      I start single-stepping through the messages and sending them back.
      But it looks like I am only able to pull out 17 messages, which are returned from the R5 to Linux and written in the kernel log.

     So I feel I have 3 different numbers for the buffer size:

    1. When the messages
           - omap-mailbox 31f80000.mailbox: Try increasing MBOX_TX_QUEUE_LEN
           - platform 41000000.r5f: failed to send mailbox message, status = -105
        first appear: at message 26.
    2. The sending error (-EBUFFERFULL) is signalled by rpmsg_trysend(): 256 msgs
    3. Messages actually received at the R5: 17 msgs

    I feel I cannot trust the -EBUFFERFULL from rpmsg_trysend() to determine how often one can send.

    How do I know I am not overloading the sender?

    Can you please explain this to me? It gives me the feeling that one should send one message at a time and have it acknowledged before sending the next.

    Espen

    ———————

    Summary of kernel log:

    (Line format:

    modprobe: Sending went through 1 0

    The numbers (1, 0) are the msg seq-no and the return value from rpmsg_trysend().)

    ———

    [  166.206188] virtio_rpmsg_bus virtio0: creating channel ti.ipc4.ping-pong addr 0xd               

    [  174.222385] virtio_rpmsg_bus virtio0: creating channel rpmsg_chrdev addr 0xe                                       

    [  197.451413] rpmsg_client_sample virtio0.ti.ipc4.ping-pong.-1.13: new channel: 0x400 -> 0xd!                        

    [  197.459839] Start time: 197334712020                                                                               

    [  197.463472] modprobe: Sending went through 1 0                                                                     

    [  197.467962] modprobe: Sending went through 2 0                                                                     

    [  197.472432] modprobe: Sending went through 3 0                                                                                                                                                                                           

    ....

    [Removed some similar lines]

    ....                                                                                                                                                                                                                                                                                                                 

    [  197.563016] modprobe: Sending went through 23 0                                                                    

    [  197.567561] modprobe: Sending went through 24 0                                                                    

    [  197.572107] modprobe: Sending went through 25 0                                                                    

    [  197.576658] omap-mailbox 31f80000.mailbox: Try increasing MBOX_TX_QUEUE_LEN                                        

    [  197.583640] platform 41000000.r5f: failed to send mailbox message, status = -105                                   

    [  197.591062] modprobe: Sending went through 26 0                                                                    

    [  197.595622] omap-mailbox 31f80000.mailbox: Try increasing MBOX_TX_QUEUE_LEN                                        

    [  197.602604] platform 41000000.r5f: failed to send mailbox message, status = -105                                   

    [  197.610017] modprobe: Sending went through 27 0 

    ...

    [Similar lines removed]

    ....                                           

    [  201.910479] omap-mailbox 31f80000.mailbox: Try increasing MBOX_TX_QUEUE_LEN                                        

    [  201.917460] platform 41000000.r5f: failed to send mailbox message, status = -105                                   

    [  201.924875] modprobe: Sending went through 254 0                                                                   

    [  201.929522] omap-mailbox 31f80000.mailbox: Try increasing MBOX_TX_QUEUE_LEN                                        

    [  201.936503] platform 41000000.r5f: failed to send mailbox message, status = -105                                   

    [  201.943914] modprobe: Sending went through 255 0                                                                   

    [  201.948550] omap-mailbox 31f80000.mailbox: Try increasing MBOX_TX_QUEUE_LEN                                        

    [  201.955525] platform 41000000.r5f: failed to send mailbox message, status = -105                                   

    [  201.962928] modprobe: Sending went through 256 0                                                                   

    [  201.967574] omap-mailbox 31f80000.mailbox: Try increasing MBOX_TX_QUEUE_LEN                                        

    [  201.974553] platform 41000000.r5f: failed to send mailbox message, status = -105                                   

    [  201.981961] modprobe: Sending went through 257 0                                                                   

    [  201.986590] modprobe: Sending error for msg 258,-12                                                                

    [  201.991486] modprobe: Sending error for msg 259,-12                                                                                                                                                                                           

    .....

    [Similar lines removed]

    .....                                                                                                                                                                                                                                 

    [  205.604939] modprobe: Sending error for msg 997,-12                                                                

    [  205.609828] modprobe: Sending error for msg 998,-12                                                                

    [  205.614722] modprobe: Sending error for msg 999,-12

    ....

    [Now I start manually single stepping the R5 and pull out the messages and return them to the A53 such they show up in the kernel log.]

    ......                                                           

    [  366.518019] rpmsg_client_sample virtio0.ti.ipc4.ping-pong.-1.13: incoming msg 1 (src: 0xd)                         

    [  366.526311] Recv msg_cb hello world!                                                                               

    [  383.251554] rpmsg_client_sample virtio0.ti.ipc4.ping-pong.-1.13: incoming msg 2 (src: 0xd)                         

    [  383.259848] Recv msg_cb hello world!                                                                               

    [  402.879652] rpmsg_client_sample virtio0.ti.ipc4.ping-pong.-1.13: incoming msg 3 (src: 0xd)                         

    ....

    [Similar lines removed]

    ......                                                                               

    [  523.625096] Recv msg_cb hello world!                                                                               

    [  532.820564] rpmsg_client_sample virtio0.ti.ipc4.ping-pong.-1.13: incoming msg 15 (src: 0xd)                        

    [  532.828974] Recv msg_cb hello world!                                                                               

    [  540.753502] rpmsg_client_sample virtio0.ti.ipc4.ping-pong.-1.13: incoming msg 16 (src: 0xd)                        

    [  540.761885] Recv msg_cb hello world!                                                                               

    [  549.565718] rpmsg_client_sample virtio0.ti.ipc4.ping-pong.-1.13: incoming msg 17 (src: 0xd)                        

    [  549.574106] Recv msg_cb hello world!                                                                               

  • Hi Espen,

    Thanks for giving all the details about your test environment. Here are some further clarifications:

    The rpmsg vring transport has two main variables: number of vring buffers and buffer size.

    1. The number of vring buffers is supplied through the resource table, and can be adjusted via the vdev resource type. The Tx and Rx numbers should be the same. The Linux side is flexible, but the RTOS-side code mostly hard-codes this in the core transport layer.

    You can look up the configured number of buffers by reading the resource_table in debugfs (cat /sys/kernel/debug/remoteproc/remoteprocX/resource_table).  

    2. The vring transport buffer size is fixed at 512 bytes, with a maximum payload of 496 bytes. This is hard-coded on the Linux side as well. The transport is processor-copy based; the 512-byte size was chosen to cover the most common usages without incurring a large memcpy penalty.

    The actual error returned when no Tx buffer can be found is ENOMEM (-12, see include/uapi/asm-generic/errno-base.h).

    ---

    The issue you are running into is a failure at the Mailbox subsystem level, which unfortunately is masked at various levels at present. There are Tx buffers available, but the virtqueue_kick() in rpmsg_send_offchannel_raw() is failing. This is masked first in k3_r5_rproc_kick() (the rproc kick op returns void), and then rproc_virtio_notify() in drivers/remoteproc/remoteproc_virtio.c always returns true. This is not a catastrophic failure in general, since all the pending buffers in the vring get processed even when handling a single interrupt.

    The Mailbox subsystem provides the infrastructure for all mailbox controllers, and it has its own queue (MBOX_TX_QUEUE_LEN, defined as 20 in include/linux/mailbox_controller.h). The OMAP Mailbox driver has a bigger queue (256, as defined by OMAP_MBOX_KFIFO_SIZE), but the Mailbox Tx state machine is limited and controlled by the MBOX_TX_QUEUE_LEN value.

    On the RTOS side, you don't have unlimited Rx buffers, unlike on the Linux side. The application provides the number of buffers at RPMessage_create time, and the current example has about 256 buffers. So I would expect you to receive a bit more than 17 messages. Of course, the timing goes off once you put in a breakpoint; you should receive more if you skip the breakpoint and let the R5F firmware process messages, especially since you are flooding them from the Tx side.

    Also, please run your test with a delay between the rpmsg_trysend() calls in your loop.

    Traditional application usage is generally two-way communication. The virtio-ring layer does provide some notification suppression for flow control, but it is not used by the virtio-rpmsg layer.

    regards

    Suman

  • Wow!

    Thanks for a really good, in-depth, and understandable answer :)

    It was really interesting and I learned a lot from it!

    Thanks for the suggestion for how to mitigate it.

    I was actually experimenting with it when your post appeared (no breakpoint, and a delay between sends), and observed the behaviour you explain.

    But, your good explanation made things much clearer to me.

    Of course, flooding with messages, without an acknowledge for each message, is not the natural way to use it.

    I just wanted to test what throughput to expect, and discovered some strange behaviour which did not seem to match the description of the rpmsg_trysend() function.

     

    But are there any plans to fix the error soon?

    Thanks,

    Espen

  • Hi Espen,

    I am glad that you found the explanation very useful, thank you :)

    In summary, rpmsg_send() or rpmsg_trysend() return failure only when there are no Tx buffers available. The only difference is that rpmsg_send() will wait for 15 seconds (currently hard-coded timeout value in drivers/rpmsg/virtio_rpmsg_bus.c) while the rpmsg_trysend() will return immediately.

    Your tests exposed a hole in the kick mechanism. You can mitigate the kick issue by increasing the MBOX_TX_QUEUE_LEN value; I would recommend using double the number of vring buffers (a separate kick is sent for the Rx and Tx vrings). That is quite a lot, since we normally don't expect interrupts to go unprocessed; that condition usually means a terminal failure for the use case.

    Btw, how many messages were you able to receive without the breakpoint?

    At the end of the day, uni-directional flooding with continuous Tx and no delays in between will eventually be limited by the number of buffers available, both in the transport and in the interrupt/callback mechanism. The receiving remote processor's speed and interrupt latencies mean such cases require flow control.

    This problem is SoC-agnostic within the virtio-rpmsg layer. I can follow up directly with the upstream community on this. I am not expecting an immediate resolution, since this is not a conventional/typical use case.

    regards

    Suman

  • Thanks for the further explanation.
    I know this is not the usual way to use it, and I was not going to use it this way. One should ack each message.

    How many ping-pong messages I got varied with several parameters. I think I had to put in approximately a 30 µs delay between send commands to make a long run complete without losing messages when sending from the kernel driver. Moving from a kernel driver to userland would probably also eliminate some of the need for waiting.

    The result was also dependent on the message length. 

    I think I ended up with approximately 17k ping-pong messages/sec (and 13k when waiting for each message before sending the next).

    (I don't remember the exact numbers any more.)

    But I did not really try to make it faster; there is no need.

    As long as one does not write to the kernel log too often, it is fast enough. ;)

    Espen