
TMDS64EVM: A way to achieve a data transfer of 1500 bytes in 1 millisecond from A53 to R5F and back.

Part Number: TMDS64EVM

I have a project where an EtherCAT master on the A53 needs to send data (1500 bytes) every 1 millisecond to the R5F core, and vice versa (both ways).
I'm running Linux with the RT patch and have tried both examples: rpmsg_simple_char and rpmsg_char_zerocopy.
rpmsg_char_zerocopy is very slow and doesn't meet our requirements (~30 millisecond round trip).
rpmsg_simple_char gives on average a ~60 microsecond round trip, but about 0.1% of the time it shows a large jitter of 3-9 milliseconds, which is still not acceptable.
I don't know why I have this problem, whether it simply can't handle this data rate or I'm doing something wrong.

The idea is that the A53 core sends data every 1 millisecond to the R5F, and the R5F also sends data every 1 millisecond to the A53 core.
Both sides will have two tasks, one for sending and one for receiving data (not blocking on receive like in the examples).
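For reference, the send task I have in mind looks roughly like this (just a sketch; fd is an rpmsg_char endpoint that has already been opened, and in practice the 1500-byte payload has to be split into RPMsg-sized messages):

    /* Sketch of the 1 ms periodic send task on the A53 side (RT Linux).
     * Assumes fd is an already-opened rpmsg_char endpoint. */
    #include <time.h>
    #include <unistd.h>

    static void *send_task(void *arg)
    {
        int fd = *(int *)arg;
        char payload[1500] = { 0 };
        struct timespec next;

        clock_gettime(CLOCK_MONOTONIC, &next);
        for (;;) {
            /* advance the absolute wake-up time by 1 ms */
            next.tv_nsec += 1000000;
            if (next.tv_nsec >= 1000000000) {
                next.tv_nsec -= 1000000000;
                next.tv_sec += 1;
            }
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);

            /* in practice this must be split into RPMsg-sized chunks */
            write(fd, payload, sizeof(payload));
        }
        return NULL;
    }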

I would be happy to know if there is a way to achieve this.
Thank you for your time.

  • Hello Elad,

    To verify exactly what you are testing:

    1) did you modify the rpmsg_char_zerocopy example to use only 1500 Bytes? Or does that ~30 msec round trip involve something else, like copying the default amount of 1MB of data in each direction?
    https://git.ti.com/cgit/rpmsg/rpmsg_char_zerocopy/tree/linux/src/rpmsg_char_zerocopy.c#n58

    2) I assume with the rpmsg_char average round trip of ~60 usec you are just timing a single RPMsg message (512 bytes total, 496 bytes of payload per message)? Keep in mind that each RPMsg message can only carry 496 bytes of information, so you would need to send 4 RPMsg messages to transmit 1500 bytes total.
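
    For illustration, chunking a 1500-byte buffer into RPMsg-sized writes could look like this (a sketch only; fd is an already-opened rpmsg_char endpoint and the helper name is made up):

        /* Illustrative only: split a 1500-byte buffer into RPMsg-sized writes. */
        #include <stdint.h>
        #include <unistd.h>

        #define RPMSG_PAYLOAD_MAX 496   /* usable bytes per RPMsg message */

        static int send_chunked(int fd, const uint8_t *buf, size_t len)
        {
            size_t off = 0;

            while (off < len) {
                size_t chunk = len - off;

                if (chunk > RPMSG_PAYLOAD_MAX)
                    chunk = RPMSG_PAYLOAD_MAX;
                if (write(fd, buf + off, chunk) != (ssize_t)chunk)
                    return -1;
                off += chunk;   /* 1500 bytes -> 4 messages */
            }
            return 0;
        }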

    Have you checked your interrupt response time yet for RT Linux? 

    I would start by running cyclictest to get a feel for what your RT Linux interrupt response time is. You can find more information about that here:
    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1183526/faq-linux-how-do-i-test-the-real-time-performance-of-an-am3x-am4x-am6x-soc
    and
    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1172055/faq-am625-how-to-measure-interrupt-latency-on-multicore-sitara-devices-using-cyclictest
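
    For reference, a typical invocation looks something like this (options vary with the rt-tests version; this is just an example):

        # one measurement thread per core, SCHED_FIFO priority 98, 1 ms interval, 10 minutes
        cyclictest -m -S -p98 -i1000 -D10m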

    Also check this FAQ for more information about ensuring that computations happen in a specific amount of time: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1085663/faq-sitara-multicore-system-design-how-to-ensure-computations-occur-within-a-set-cycle-time

    Regards,

    Nick

  • Thank you for your response,
    1. I didn't modify the examples' data; I only ran them in an RT thread and took a timestamp before sending and after receiving on the A53 core.
    The data for rpmsg_simple_char is just a message index.
    For rpmsg_char_zerocopy, I used the default data and data size.
    The goal was to find out the fastest rate I can achieve.

    2. I'm aware I will need 4 RPMsg messages for 1500 bytes.

    3. I ran cyclictest (for now without isolcpus) and got a 100 us worst case for CPU0 and 150 us for CPU1 (no stress load).

    I don't understand why, 1-3 times per 1000 RPMsg messages, the receive side takes 3-9 ms. What could be the reason, and what can I do to improve it?
    Is there a different way to move data from the A53 to the R5F?


  • Hello Elad,
    Hello Nick,

    we've seen this issue in the past, too.

    From what I remember the mailbox driver uses a workqueue to handle processing, which is something you can't prioritize, even in an RT setup. I've just checked the latest version available in TI's ti-rt-linux-6.1.y and that still uses the workqueue.

    We have a patch that replaces the workqueue with a dedicated thread, but that required further changes in generic rpmsg code, too. Unfortunately we haven't had time to get these patches in a suitable state for getting them upstream yet.

    Maybe Nick could get in touch with TI's developers to check whether they can confirm our assumption about the workqueue being an issue.

    Regards,

    Dominic

  • Hello Elad,
    One of my teammates should be running RPMsg benchmarks within the next week or so, and I will make sure to check to see if we can replicate your observation of occasional multi-millisecond latencies.

    Hey there Dominic,

    Thanks for commenting! Interesting note about the workqueue and inability to prioritize in an RT setup. I will check with the developer. If I have not responded by Monday, please ping the thread.

    Regards,

    Nick

  • Hello Nick,

     have you already had a chance to check with your developers?

    Regards, 

    Dominic

  • Hello Elad & Dominic,

    Apologies for the delayed response here, and thank you for the pings. No reply yet from the developer between vacations and some other high priority work. I am asking the developer again to take a look and give us their thoughts.

    Regards,

    Nick

  • Hello y'all,

    Ok, at this point the developer has not traced the flow of the RPMsg driver and communication to see if there are any parts of the code that could add latency in specific edge cases. They are focused on some other near-term tasks, so unfortunately we will not be able to get them to take a closer look for another couple of weeks.

    Still waiting on my team member who wants to run RPMsg benchmarks, but we'll see if we can replicate your results.

    As for the question "Is there a different way to move data from A53 to R5F?", you have a couple of options if you are using a shared memory region to pass that data back and forth.

    If you want the data passing to be interrupt based, RPMsg is the SW solution that TI currently provides for messaging between Linux and the remote cores. Two MCU+ remote cores talking between themselves can use mailboxes instead of the full RPMsg stack, but we do not currently enable a mailbox solution that is exposed up to Linux userspace.

    You could also use polling methods, something like the following (a rough sketch is shown after the list):
    1) tell Linux userspace to wait for X usec (note that context switching itself takes a certain amount of time that adds on to this amount)
    2) Linux checks the first byte of the shared memory to see if a "ready for read" value has been written there
    3) If a "ready for read" value is there, read the shared memory, else wait a bit longer and check again
    4) write a "ready for write" value to the first byte of shared memory
    5) wait for X usec
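
    On the Linux side, that polling loop could look roughly like this (illustrative only; the flag values and layout would have to be agreed with the R5F side, and real code also needs to handle cache maintenance/barriers for the shared region):

        /* Rough sketch of steps 1-5 above; values and layout are examples only. */
        #include <stdint.h>
        #include <string.h>
        #include <time.h>

        #define READY_FOR_READ   0xA5
        #define READY_FOR_WRITE  0x5A

        static void poll_cycle(volatile uint8_t *shm, uint8_t *dst, size_t len,
                               long wait_ns)
        {
            struct timespec t = { 0, wait_ns };

            nanosleep(&t, NULL);                       /* 1) wait for X usec       */
            while (shm[0] != READY_FOR_READ)           /* 2) check the flag byte   */
                nanosleep(&t, NULL);                   /* 3) not ready: wait again */
            memcpy(dst, (const void *)(shm + 1), len); /* 3) ready: read the data  */
            shm[0] = READY_FOR_WRITE;                  /* 4) hand the buffer back  */
        }                                              /* 5) caller waits, repeats */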

    Regards,

    Nick

  • Thank you for the response, I will look into it.

  • Hello Elad & Dominic,

    Circling back around to this.

    1) Apologies for not running worst-case tests on this side yet. If I get some time over the next 3 weeks I'll take a look myself, since my team member didn't get to it over the summer, but I'll be taking most of October off, so the next couple of weeks are pretty packed trying to get everything else addressed.

    2) Dominic, is there any additional information I should pass along? We're trying to evaluate whether the current RPMsg implementation provides a baseline feature set we are satisfied with, or whether we need to take another look at either reworking RPMsg or offering an alternative messaging implementation like a Linux-side IPC_Notify. I would definitely be curious to see any of your test results/code, the patch you mentioned, etc., if you are interested in sharing.

    Regards,

    Nick

  • I am getting some additional input from developers:

    "Workqueue priority should be configurable. They will show-up as kworker/ threads. On RT-Linux almost every priority should be configurable. Even hard irqs are handled as threads unless kernel passes a flag not to do so. Need to understand what was tried for priority setting and what did not work."

    From someone else, "On the networking benchmarks we run ksoftirq’s as best effort, often just putting them as RT has a big effect on performance. I’ve not tried out similar with rpmsg, but I’d think there is an interrupt involved."
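
    For reference, inspecting and raising those thread priorities would look roughly like this (illustrative only; whether it actually helps here is the open question):

        # list kernel threads with their scheduling class and RT priority
        ps -eLo pid,class,rtprio,comm | grep -E 'kworker|ksoftirqd|irq/'
        # move one of them to SCHED_FIFO priority 80 (pick the right PID)
        chrt -f -p 80 <pid>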

    Any additional thoughts from y'all's side?

    Thanks,

    Nick

  • Hello Nick,

    "Workqueue priority should be configurable. They will show-up as kworker/ threads. On RT-Linux almost every priority should be configurable. Even hard irqs are handled as threads unless kernel passes a flag not to do so. Need to understand what was tried for priority setting and what did not work."

    I don't think that's true for workqueues. There was a patch on LKML last year (https://lore.kernel.org/lkml/20220323145600.2156689-1-linux@rasmusvillemoes.dk/), but it seems the idea was rejected. From what I understand, the preferred approach was rather to have dedicated workers where they are needed.

    Additionally, the workqueue used by the omap mailbox driver via schedule_work() is the kernel's global workqueue, i.e. one that "anything" could end up being processed on. Even if that workqueue's priority were raised, you could still end up behind a lot of lower-priority work.

    Other mailbox drivers call mbox_chan_received_data() from their hard IRQs. Doing this also for the omap mailbox driver caused issues further down the line in rpmsg.

    We ended up implementing a dedicated worker for rpmsg and calling mbox_chan_received_data() directly from __mbox_rx_interrupt, but we weren't confident enough of our change in rpmsg to get this upstream.
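
    To sketch the general idea (illustration only, not our actual patch, which also needed the rpmsg-side changes mentioned above):

        /* Illustration of the dedicated-worker idea: a kthread_worker whose
         * priority can be set, queued from the mailbox RX path instead of
         * schedule_work() on the system workqueue. */
        #include <linux/err.h>
        #include <linux/kthread.h>
        #include <linux/sched.h>

        static struct kthread_worker *rx_worker;
        static struct kthread_work rx_work;

        static void rx_work_fn(struct kthread_work *work)
        {
            /* push the received message up the stack, e.g. towards rpmsg */
        }

        static int rx_worker_init(void)
        {
            rx_worker = kthread_create_worker(0, "mbox_rx_worker");
            if (IS_ERR(rx_worker))
                return PTR_ERR(rx_worker);
            sched_set_fifo(rx_worker->task);   /* now it can be prioritized */
            kthread_init_work(&rx_work, rx_work_fn);
            return 0;
        }

        /* in the RX interrupt path, instead of schedule_work(&work): */
        /* kthread_queue_work(rx_worker, &rx_work); */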

    Best Regards,

    Dominic

  • Hello Dominic,

    Thanks again for jumping on this thread to begin with, and continuing to engage with us throughout. Your input is helping me move the discussion forward on our side.

    I don't have any firm updates, but thanks to your latest response I have the buy-in to create a requirement for us to look into ways to put an upper bound on the RPMsg latency in an RT Linux system.

    Regards,

    Nick