Linux/BEAGLEBN: DDR access from PRU

Part Number: BEAGLEBN

Tool/software: Linux

I've been digging through the documentation, the forum, and numerous sources on the internet, but I can't find any definitive answer on how to most efficiently let the PRU access larger chunks of global memory (DDR). A use case would be a framebuffer (PRU reads from DDR), or a data capture device (PRU writes to DDR).

There are lots of examples of how to do this prior to RemoteProc and RPMsg. Is it as trivial as just passing a memory pointer value from Linux to a PRU? (That would be rather unsafe, but if that's what it takes to achieve speed, then so be it.) Or is there some mechanism in RemoteProc or RPMsg for this?

Any hint or pointer to documentation would be appreciated!

  • Hi,

    I have asked the PRU experts to comment. Meanwhile, here is the RemoteProc documentation link: processors.wiki.ti.com/.../PRU-ICSS_Remoteproc_and_RPMsg
  • While meandering through the forum I found this entry: e2e.ti.com/.../642081

    The question seems to imply that sharing host memory via RPMsg does not allow the largest possible throughput. Is that really so? Specifically, where exactly does the overhead lie? For the use case of highest possible throughput over shared host memory, say a fixed buffer of several MBytes, is RPMsg nonetheless the way to go, despite its overhead?

    The alternative, i.e. reserving host memory via the device tree, seems to be safe and reasonably straightforward - or does that still require /dev/mem with root access? Are there any pointers to demo code I could work with?
    I really would appreciate some guidance.

  • Hello Markus,

    RPMsg is used to communicate between the ARM and the PRU, and only allows you to send 492-byte messages. It takes a bit of time for a message to get from PRU to ARM, as detailed here.
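
    Something like this rough, untested sketch is what that message path looks like on the Linux side, assuming your PRU firmware announces the usual rpmsg_pru channel so that the driver creates /dev/rpmsg_pru30 (the device name depends on your channel and PRU core):

        /* Minimal sketch: exchanging a small message with PRU0 through the
         * rpmsg_pru character device. */
        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        int main(void)
        {
            char reply[512];
            int fd = open("/dev/rpmsg_pru30", O_RDWR);
            if (fd < 0) { perror("open /dev/rpmsg_pru30"); return 1; }

            write(fd, "ping", 4);                       /* small message to the PRU */

            ssize_t n = read(fd, reply, sizeof(reply)); /* blocks until the PRU answers */
            if (n > 0)
                printf("PRU replied with %zd bytes\n", n);

            close(fd);
            return 0;
        }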

    The PRU has access to read the entire memory space, so it does not require any special protocols to allow the PRU to simply read large chunks of memory.
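
    As a rough, untested sketch of what that looks like in PRU C code (the DDR address below is purely illustrative; in practice it would come from a reserved-memory node or be passed in by the ARM):

        #include <stdint.h>
        #include <pru_cfg.h>   /* CT_CFG register overlay from the PRU Software Support Package */

        #define DDR_BUF_ADDR  0x9E000000u   /* illustrative physical address only */
        #define DDR_BUF_WORDS 1024u

        void main(void)
        {
            volatile uint32_t *ddr = (volatile uint32_t *)DDR_BUF_ADDR;
            uint32_t sum = 0;
            uint32_t i;

            /* Enable the OCP master port so the PRU can reach memory outside the ICSS */
            CT_CFG.SYSCFG_bit.STANDBY_INIT = 0;

            /* Ordinary C loads; the compiler turns these into LBBO accesses to DDR */
            for (i = 0; i < DDR_BUF_WORDS; i++)
                sum += ddr[i];

            __halt();
        }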

    I can give you more detail if we know your use case. For example: is the PRU simply reading DDR? Are you trying to read a certain amount of data at a certain rate? Are you trying to pass information from one core to another? Are you trying to stream information, e.g. audio?

    Regards, 

    Nick

  • Hi Nick,

    I really have two use cases in mind (I already mentioned them); they are the canonical examples where maximum throughput is required:

    1. Framebuffer: The PRU sequentially reads several MB of host RAM (DDR) and generates the appropriate output for some display.
    2. Data acquisition: The PRU sequentially writes several MB of data, from an ADC, say, to host RAM (DDR).

    In both cases control functions will be necessary, of course, but let's leave those aside and focus on max. throughput.

    As far as I understand, on the PRU side this is straightforward once the address range is known. How to do this on the Linux side is less clear to me; Linux memory management introduces many layers of abstraction. What I've been thinking of is allocating a fixed amount of RAM via the device tree and then creating a memory-mapped device over it. Or something like that ...

    Are there any examples out there of how best to do this?

    Regards,

    Markus

  • Hi,
    The person handling this thread is out of the office today; they will return next week and respond then.
    Best Regards,
    Schuyler
  • A similar question was asked on the beagleboard forum: "RemoteProc DDR Access: Transferring images from PRUs to ARM".

    I think it's fair to say that there's broader interest in this question.

  • Hello Markus,

    RPMsg/RemoteProc does not currently have any built-in method to move large amounts of data between cores. It does make a difference whether you are trying to access PRU data from the kernel or from userspace.

    Some example ways to attack the problem of moving data:

    1) (not your use case) If the amount of information per transfer is less than 8k or 12k, then the PRU cores could load data into RAM local to the ICSS (the benefit is that the PRU has much faster access to this memory than to DDR; I am not sure of the latency for the ARM). Linux would not have to reserve that memory space as long as that ICSS was not implementing Ethernet.

    2) PRU cores could load data into a different internal memory or external DDR memory. Keep in mind that memory external to the processor will always be slower than memory internal to the processor. Linux could reserve that DDR memory space in the device tree; a minimal userspace sketch of mapping such a region is included after this list. Take a look at the TIDA-01555 project and code for a sample ping-pong buffer communication between PRU and ARM - that example shows PRU firmware (look at the firmware for the controller PRU), userspace code, and the device tree. I have also been told about using the CMEM API to reserve DDR memory, but I have not looked into that option extensively.

    3) DMA may offer speed benefits over using memcpy (which should use the LBBO assembly instruction), but I cannot say for sure. I have not yet looked into DMA myself.
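
    To illustrate option 2 on the Linux side, here is a rough, untested userspace sketch that maps a DDR region reserved in the device tree through /dev/mem (so it needs root, as Markus suspected). The base address and size below are illustrative only and must match your reserved-memory node:

        #include <fcntl.h>
        #include <stdint.h>
        #include <stdio.h>
        #include <sys/mman.h>
        #include <unistd.h>

        #define SHARED_PHYS 0x9E000000u   /* must match the reserved-memory node (illustrative) */
        #define SHARED_SIZE 0x00100000u   /* 1 MB, illustrative */

        int main(void)
        {
            int fd = open("/dev/mem", O_RDWR | O_SYNC);
            if (fd < 0) { perror("open /dev/mem"); return 1; }

            volatile uint32_t *buf = mmap(NULL, SHARED_SIZE, PROT_READ | PROT_WRITE,
                                          MAP_SHARED, fd, SHARED_PHYS);
            if (buf == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

            /* Read back whatever the PRU last wrote to the start of the buffer */
            printf("buf[0] = 0x%08x\n", buf[0]);

            munmap((void *)buf, SHARED_SIZE);
            close(fd);
            return 0;
        }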

    In terms of signalling, you could use RPMsg to tell one core that a buffer is ready to be read. If you do not need to send a message (i.e. you just need the interrupt), you could use INTC system events to communicate, which should be faster than RPMsg. UIO offers another means of communication, but TI does not support UIO in this context.
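
    For the RPMsg-based signalling, a rough, untested sketch of the PRU side might look like this, assuming the transport has already been initialized the way the PRU_RPMsg_Echo examples in the PRU Software Support Package do it:

        #include <stdint.h>
        #include <rsc_types.h>
        #include <pru_rpmsg.h>   /* both headers come with the PRU Software Support Package */

        /* 'transport', 'src' and 'dst' are assumed to have been filled in during the
         * usual RPMsg channel setup (pru_rpmsg_init / pru_rpmsg_channel). */
        void notify_buffer_ready(struct pru_rpmsg_transport *transport,
                                 uint16_t src, uint16_t dst, uint32_t buf_index)
        {
            /* Only a small control message goes over RPMsg; the bulk data stays in DDR */
            while (pru_rpmsg_send(transport, src, dst, &buf_index, sizeof(buf_index))
                   != PRU_RPMSG_SUCCESS)
                ;   /* retry if no RPMsg buffer is currently free */
        }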

    Regards, 

    Nick

  • Hello Markus,

    I am marking this resolved. Please reply if you have any more questions!

    Regards,
    Nick