Part Number: BEAGLEBN
I've been digging through the documentation, the forum, and numerous sources on the internet, but I can't find a definitive answer on how to most efficiently let the PRU access larger chunks of global memory (DDR). Use cases would be a framebuffer (PRU reads from DDR) or a data capture device (PRU writes to DDR).
There are lots of examples of how to do this prior to RemoteProc and RPMsg. Is it as trivial as just passing a memory pointer value from Linux to a PRU? (That would be rather unsafe, but if that's what it takes to achieve speed, then so be it.) Or is there some mechanism in RemoteProc or RPMsg?
Any hint or pointer to documentation would be appreciated!
In reply to Biser Gatchev-XID:
While meandering through the forum I found this entry: e2e.ti.com/.../642081 The question seems to imply that sharing host memory via RPMsg does not allow the highest possible throughput. Is that really so? Specifically, where exactly does the overhead lie? For the use case of highest-possible throughput to shared host memory, say a fixed buffer of several MBytes, is RPMsg nonetheless the way to go despite its overhead? The alternative, reserving host memory via the device tree, seems safe and reasonably straightforward. Or does that still require /dev/mem with root access? Are there any pointers to demo code I could work with? I would really appreciate some guidance.
In reply to Markus Mayer53:
RPMsg is used to communicate between the ARM and the PRU, and only allows you to send 492-byte messages. It takes a bit of time for a message to get from the PRU to the ARM, as detailed here.
The PRU can read the entire memory space, so no special protocol is required for the PRU to simply read large chunks of memory.
I can give you more detail if we know your use case. E.g., is the PRU simply reading DDR? Are you trying to read a certain amount of data at a certain rate? Are you trying to pass information from one core to another? Are you trying to stream information, e.g. audio?
In reply to Nick Saulnier:
I really have two use cases in mind, the ones I already mentioned: a framebuffer (PRU reads from DDR) and a data capture device (PRU writes to DDR). They are the canonical examples where maximum throughput is required.
In both cases control functions will be necessary, of course, but let's leave those aside and focus on maximum throughput.
As far as I understand, this is straightforward on the PRU side once the address range is known. How to do it on the Linux side is less clear to me; Linux memory management introduces many layers of abstraction. What I've been thinking of is allocating a fixed amount of RAM via the device tree and then creating a memory-mapped device over it. Or something like that ...
Are there any examples out there of how best to do this?
In reply to Schuyler Patton:
A similar question was asked on the beagleboard forum: "RemoteProc DDR Access: Transferring images from PRUs to ARM".
I think it's fair to say that there's broader interest in this question.
RPMsg/RemoteProc does not currently have any built-in method for moving large amounts of data between cores. It does make a difference whether you are trying to access PRU data from the kernel or from userspace.
Some example ways to attack the problem of moving data:
1) (not your use case) If the amount of information per transfer is less than 8k or 12k, then the PRU cores could load data into RAM local to the ICSS (the benefit is that the PRU has much faster access to this memory than to DDR; I am not sure of the latency for the ARM). Linux would not have to reserve that memory space as long as that ICSS is not implementing Ethernet.
2) The PRU cores could load data into a different internal memory, or into external DDR memory. Keep in mind that memory external to the processor will always be slower than memory internal to the processor. Linux could reserve that DDR memory space in the device tree. Take a look at the TIDA-01555 project and code for a sample ping-pong buffer communication between the PRU and the ARM; that example shows PRU firmware (look at the firmware for the controller PRU), userspace code, and the device tree. I have also been told about using the CMEM API to reserve DDR memory, but I have not looked into that option extensively.
3) DMA may offer speed benefits over using memcpy (which should compile to the LBBO assembly instruction), but I cannot say for sure; I have not yet looked into DMA myself.
In terms of signaling, you could use RPMsg to tell one core that a buffer is ready to be read. If you do not need to send a message (i.e., you just need the interrupt), you could use INTC system events to communicate, which should be faster than RPMsg. UIO offers another means of communication, but TI does not support UIO in this context.