AM623: using PRUSS to transfer data

Part Number: AM623

Hi,

We are exploring the use of the PRUSS on AM623 to transfer data from an FPGA into system memory at a rate of 1M samples per second, where each sample is 64 bits.

So, the effective data rate will be 64 Mbits per second. The data in system memory will then be consumed by a Linux application running on the A53 core.

We are not sure whether the PRUSS can support this data rate over a serial port, or whether it needs a parallel port for this rate.

If yes, how do we get started? What does the hardware connection look like?

Any example will be helpful ...

rgds,

kc wong

  • Hello KC,

    Good question. Let's talk about the use case a bit:

    1) Would this transfer happen with a standard communication protocol, or a custom serial / parallel protocol?

    2) What transfer rates does your FPGA support? Does it allow for parallel ports?

    3) Would the PRU need to do any processing of the data before making it available to Linux?

    4) What kind of throughput would the system need? Is this a constant throughput of 64Mbits/sec, or would this be sporadic data (that is sent at 1Msamples/sec when there is data to send)?

    5) What is Linux doing with this data? (I totally understand that the use case may be sensitive, so feel free to talk about this in vague terms or send it to me in a direct message.)

    FYI: AM62x PRU Academy & OpenPRU is coming! 

    I am currently working on writing the AM62x version of PRU Academy (AM64x PRU Academy is here), and adding support for AM62x to the new OpenPRU repo. Let me know if you want a notification when the AM62x is officially added!

    Regards,

    Nick

  • Hi Nick,

    1) Actually, we are currently trying to use OSPI in indirect mode to transfer the data in A53 Linux, but 64Mbits/sec may be beyond the OSPI limit.

    2) Yes, it allows parallel ports as we are trying OSPI now.

    3) Most likely not. We just need the PRUSS to arrange the data in a ring buffer in memory and update the next write index. The A53 Linux application will poll this write index to determine whether there is new data in the ring buffer to be processed. Basically, we want the PRUSS to function like a DMA for the A53.

    4) 64Mbits/sec is the maximum throughput, and is a constant rate from the front end. The rate can be less than that with different configuration.

    5) A53 Linux application will get the data from the ring buffer and apply the measurement algorithm to each sample, and push to the reading memory one by one.
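
    The ring buffer and write index described in 3) can be sketched as follows. This is a minimal single-producer / single-consumer model in Python, not PRU firmware: the names `Ring`, `produce`, and `consume` and the slot count are hypothetical, the producer stands in for the PRUSS, and the consumer stands in for the polling A53 Linux application. It does not model cache coherency or write ordering between the cores, which a real design would have to handle.

```python
# Hypothetical model of a single-producer / single-consumer ring buffer.
# All names and sizes here are illustrative, not taken from any TI example.
RING_SLOTS = 8  # number of 64-bit sample slots; a real design would use far more


class Ring:
    def __init__(self, slots=RING_SLOTS):
        self.slots = [0] * slots
        self.write_index = 0  # advanced only by the producer (PRUSS role)
        self.read_index = 0   # advanced only by the consumer (Linux role)

    def produce(self, sample):
        """Store one sample, then publish the new write index."""
        nxt = (self.write_index + 1) % len(self.slots)
        if nxt == self.read_index:
            return False  # ring is full: the consumer has fallen behind
        self.slots[self.write_index] = sample & 0xFFFFFFFFFFFFFFFF  # keep 64 bits
        self.write_index = nxt  # publish only after the data is in place
        return True

    def consume(self):
        """Poll the write index and drain every sample that is new."""
        out = []
        while self.read_index != self.write_index:
            out.append(self.slots[self.read_index])
            self.read_index = (self.read_index + 1) % len(self.slots)
        return out


ring = Ring(slots=4)
accepted = [ring.produce(s) for s in (1, 2, 3, 4)]
print(accepted)        # the fourth sample is rejected because the ring is full
print(ring.consume())  # the consumer drains the samples in order
```

    One slot is deliberately left unused so that a full ring can be told apart from an empty one using only the two indices.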

    Yes, you can notify me when the AM62x is officially added.

    Attached is a proposal that I came up with using ChatGPT. I am just not sure whether it is viable, or whether the information is totally incorrect.

    Or you can share your idea on this based on your understanding of the PRUSS capability ... 

    Or any other better alternative.


    rgds,

    kc Wong

    Proposal_to_use_PRUSS_64Mbit_FPGA_ingress_RDY_NEXT.pdf

  • Hello KC Wong,

    Sounds good. Send me a direct message with your email and I'll notify you when there are updates around PRU academy / OpenPRU.

    Are you already using the AM62x's OSPI peripheral, or is that available for the PRU to control?

    The PRU can either control existing peripherals, or the PRU can bit-bang the protocol directly (i.e., the PRU writes to the PRU GPI / PRU GPO pins, as discussed here). Typically it is easier to write the code to control an existing peripheral than to manually bit-bang a protocol, so I would usually suggest evaluating that option first.

    I did not have time today to look at the ChatGPT proposal.

    Regards,

    Nick

  • Hi Nick,

    Ya, we have been trying to use the OSPI peripheral in A53 Linux at a rate of 6.4Mbits/sec, but have not been successful.

    Now, we have just realized that the throughput requirement is actually 10x higher.

    The A53 may need to spend a lot of CPU cycles moving data at a rate of 64Mbits/sec.

    That's why we are moving away from A53 Linux to the PRUSS.

    If the PRUSS can operate the OSPI peripheral at a rate of 64Mbits/sec, that will be great!!!

    My understanding is that only the A53 and M4F can operate the OSPI peripheral, not the PRUSS.

    rgds,

    kc Wong

  • Hello KC Wong,

    So disclaimer, either of the options we are discussing would require development from yall's side. Unfortunately I don't have a prebuilt TI software solution I can give you.

    I have not dealt with OSPI before, so please be patient if I get anything wrong here. Seems like it is generally SPI protocol with 8 data lines instead of 1 data line. Off the top of my head I am not sure how the PHY comes into it.

    8 bit parallel interface at 64Mbits/sec --> 8Mbit/sec per data line (i.e., an 8MHz transfer clock at one bit per line per clock).

    Bitbanging quick check

    If the PRU did not need to do any manipulation of the data, and if the PHY component does not require much additional programming or maintenance during runtime, 8Mbit/sec per data line feels doable from a bitbanging perspective (based on the example SPI code here, where we can get over 30MHz: https://github.com/TexasInstruments/open-pru/tree/main/examples/spi_loopback). You could have one PRU core controlling the pins, and the other PRU core moving data somewhere Linux could get to it.
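
    As a rough check on the bit-banging budget, the sketch below estimates how many PRU clock cycles are available per 8-bit transfer. The PRU core clock used here (200 MHz) is an assumed placeholder, not a figure from this thread, so check the AM62x datasheet for the real value. The function names are illustrative only.

```python
# Rough cycle budget for bit-banging a parallel interface from a PRU core.
ASSUMED_PRU_CLOCK_HZ = 200_000_000  # assumption: replace with the datasheet value
TARGET_BITS_PER_SEC = 64_000_000    # 1M samples/sec x 64 bits (from the thread)
DATA_LINES = 8                      # 8-bit wide interface (from the thread)


def transfers_per_second(total_bits_per_sec, data_lines):
    """Number of bus transfers per second when each transfer moves data_lines bits."""
    return total_bits_per_sec // data_lines


def cycles_per_transfer(core_clock_hz, total_bits_per_sec, data_lines):
    """Core clock cycles available for each bus transfer."""
    return core_clock_hz / transfers_per_second(total_bits_per_sec, data_lines)


print(transfers_per_second(TARGET_BITS_PER_SEC, DATA_LINES))
print(cycles_per_transfer(ASSUMED_PRU_CLOCK_HZ, TARGET_BITS_PER_SEC, DATA_LINES))
```

    Under that assumed clock, the budget works out to 25 cycles per byte, which has to cover toggling the clock pin, sampling the data pins, and handing the byte to the other core.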

    Hardware peripheral quick check 

    We do not have a driver already written for PRU to control OSPI, but that does not mean that the PRU core is unable to control OSPI. PRU has access to the full system memory map, so it can interact with the OSPI registers and memory.

    You may be able to look at the MCU+ SDK drivers for M4F to interact with the OSPI for guidance on how to configure the OSPI peripheral. Linux drivers are another option, but sometimes it can be a bit harder to figure out the exact register reads from the Linux drivers.

    https://github.com/TexasInstruments/mcupsdk-core/tree/next/source/drivers/ospi/v0 

    https://software-dl.ti.com/mcu-plus-sdk/esd/AM62X/11_01_00_16/exports/docs/api_guide_am62x/DRIVERS_OSPI_PAGE.html

    The MCU+ SDK docs mention a tuning algorithm, I am not sure if that would apply to this usecase.

    I am sending your thread to a member of our hardware team just to give us a reality check on any hardware interface limitations that we might be missing.

    Regards,

    Nick

  • Hi Nick,

    My understanding is OSPI is able to support 64Mbits/sec rate in direct mode with DMA support if FPGA can mimic a NOR flash.

    But, in indirect mode without DMA support, it will be too much CPU load for A53 to use interrupt for data transfer.

    As for using PRUSS to control OSPI, below is the answer from ChatGPT. Of course, I need TI's help to check on that.


    No—PRUSS doesn’t directly “access” or operate the Cadence QSPI/OSPI controller on AM62x.

    • Pin ownership: The OSPI pads are hard-wired to the Cadence OSPI controller, not to the PRU-ICSSG. PRU fast I/O (R30/R31) is a separate pin bank; you can’t mux those pins over to PRU or vice-versa.

    • Register access/control: The Cadence OSPI is a SoC peripheral managed by the A53/M4F (and by Linux via spi-cadence-quadspi). The PRU cores don’t sit on the same peripheral bus to safely/officially program those registers. Under Linux (remoteproc), the PRU also doesn’t get mappings to manipulate OSPI.

    • DMA path: OSPI DMA/indirect engines are driven by the CPU and system DMA—not by PRU.


    rgds,

    kc Wong

  • Hi Kiung Chung Wong,

    We are going to discuss internally the possibilities for the AM623 PRUSS to access the OSPI host, and the choice of an OSPI interface mode that allows a 64 Mbit/sec transfer rate between the FPGA and AM623 system memory through the AM623 OSPI host.

    Please ping the thread if you do not get a response early next week. 

    Thanks for your patience !

    Best Regards

    Anastas Yordanov

  • Ok, thanks Anastas.

    If not PRUSS + OSPI, I would also appreciate any other suggestion that can meet the requirement of 64Mbits/sec data transfer from the FPGA.

    rgds,

    kc Wong

  • Hi Kiung Chung Wong,

    One commonly used option to interchange data with FPGA is the GPMC parallel memory controller. It can transfer 16 bits at a time at clock rates up to 100 MHz. I'm not sure however how GPMC can be used with PRUSS.

    Regards,

    Stan

  • Hello KC Wong,

    ChatGPT is incorrect.

    "Pin ownership" is correct. The OSPI signals going to the OSPI pins are only routed to the OSPI peripheral. If PRU wants to bitbang the OSPI protocol, PRU should use the PRU GPI / PRU GPO signals.

    "Register access" is wrong. The PRU can access the full system memory map. I see OSPI registers showing up in the TRM "MAIN Memory Map", which means that PRU can interact with them. You would need to make sure that Linux and PRU were not trying to read and write the same registers at the same time (I call this avoiding a "resource allocation conflict"), but that is just a step the system designer needs to take during the design process, not a fundamental limitation.

    Regards,

    Nick

  • Hi Nick,

    Great!!!

    In this case, can PRUSS + OSPI achieve the rate of 64Mbits/sec?

    Does it come with DMA support?

    Is there OSPI interrupt support? For example, I know there is no interrupt routed to M4F core for main domain UART.

    Also, what are the pros and cons of M4F+OSPI vs PRUSS+OSPI?

    rgds,

    kc Wong

  • Hi Stan,

    If not PRUSS, do you have an example for either A53+GPMC or M4F+GPMC?

    And what are the pros and cons of the 2 options, using GPMC from the A53 or from the M4F?

    I asked ChatGPT to make a comparison table between OSPI vs GPMC as attached in the pdf file.

    Need TI's help to validate the information given by ChatGPT so that we can make the right decision. 

    Now, it seems to me that GPMC is better suited than OSPI to our use case.

    The advantages of using GPMC are:

    1. The FPGA does not need to emulate the NOR protocol
    2. It supports reads and writes to the FPGA, while OSPI XIP mode is read-only
    3. There is no speculative prefetch issue


    rgds,

    kc Wong

    6835.OSPI_vs_GPMC_64Mbit_FPGA.pdf

  • Hi  KC Wong,

    GPMC is a very basic parallel memory interface, at the expense of the 20 to 30 pins it requires, depending on the data and address bus widths.

    1. Correct

    2. Random writes will also be supported, since the FPGA will emulate RAM instead of flash memory

    3. Yes, overall GPMC is much simpler. You will simply need to read/write from/to memory locations, just like with internal RAM.

    You can refer to this thread for the memory mapped configuration for GPMC, as well as many other threads.

     SK-AM62P-LP: Interfacing AM62P GPMC with FPGA IP Core: Device Tree, Kernel Drivers, and Configuration Steps 

    Regards,

    Stan

  • Hi Stan,

    Below are the pin counts and theoretical throughput for both the 8-bit and 16-bit bus widths from ChatGPT.

    According to ChatGPT, 32-bit bus width is not supported on AM62x.

    Hope you can help to verify the information below.

    Or point us to some application note or documentation that talks more about the GPMC.

    In particular, can an 8-bit bus width really meet the 64Mbits/sec throughput?

    rgds,

    kc Wong

  • Hi KC Wong,

    The GPMC description can be found in the TRM, in section 12.4.3 General-Purpose Memory Controller (GPMC).

    I will be able to check the speed calculations tomorrow.

    Thanks,

    Stan

  • Hi Stan,

    Ok, thanks.

    Another question: let's say the FPGA registers have been memory mapped. Can the same location be accessed from both the A53 and the M4F core?

    rgds,

    kc wong

  • Hi Kiung Chung Wong,

    Stan is currently out of office for the Christmas holidays. Please expect a response between the 26th of December 2025 and the 6th of January 2026.

    We appreciate your patience !

    Kind Regards,

    Anastas Yordanov

  • "1) Actually currently we are trying to use OSPI in the indirect mode to transfer the data in A53 Linux, but 64Mbits/sec maybe out of the OSPI limit."

    I am surprised by this, though: OSPI with 8 data lines can hit 64 Mbps with an 8 MHz clock, and afaik you can do 25 MHz without enabling the OSPI PHY.
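
    Those figures can be cross-checked with a small calculation. This sketch assumes one bit per data line per clock (single data rate) and ignores command, address, and dummy-cycle overhead, so it gives raw bus numbers only. The function names are illustrative.

```python
# Raw bus arithmetic for a multi-line serial interface such as OSPI.
# Assumption: single data rate, and no protocol overhead is accounted for.


def raw_bus_mbps(clock_mhz, data_lines, bits_per_line_per_clock=1):
    """Raw bus throughput in Mbit/s for a given clock and bus width."""
    return clock_mhz * data_lines * bits_per_line_per_clock


def required_clock_mhz(target_mbps, data_lines, bits_per_line_per_clock=1):
    """Minimum clock in MHz needed to carry target_mbps of raw data."""
    return target_mbps / (data_lines * bits_per_line_per_clock)


needed = required_clock_mhz(64, 8)   # clock needed for 64 Mbit/s on 8 lines
raw_at_25 = raw_bus_mbps(25, 8)      # raw rate at the 25 MHz mentioned above
print(needed, raw_at_25, raw_at_25 / 64)
```

    On these raw numbers a 25 MHz clock leaves roughly 3x headroom over 64 Mbps, although per-transaction overhead and software latency will eat into that.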

    "Most likely not, just need PRUSS to arrange the data in a ring buffer in memory, and update the next write index. A53 Linux application will poll and look at this write index to determine if there is new data in the ring buffer to be processed. Basically, want PRUSS to function like a DMA to A53."

    open-pru/examples/pru_emif at main · TexasInstruments/open-pru is a parallel interface (EMIF client) emulation example using the PRU.
    open-pru/examples/spi_loopback at main · TexasInstruments/open-pru is a serial interface emulation example using the PRU. Based on the numbers listed there, the only way to achieve 64 Mbps here is to use multiple data lines.

    GPMC throughput also depends on the FPGA-side implementation, but performance on the order of 100 MB/s can be achieved, depending on the GPMC CLK: AM5728: GPMC throughput with FPGA

  • Ya, as much as I want to use GPMC to interface with the FPGA so that the FPGA registers can be memory mapped into the A53 address space, unfortunately our processor board does not route the GPMC lines to the external connector that interfaces with the FPGA. Some of the pins are also already used as boot pins and for the display parallel port.

    So, we are falling back to the OSPI option. Even if the OSPI raw speed can achieve 64 Mbps, I believe it is still challenging for A53 Linux to sustain that rate without DMA.

    Below are the previous discussions on the same project, for which the target throughput was 100k samples per second (6.4Mbps) at that time. I think the team has done a lot of fine tuning of the custom Linux kernel driver to try to meet 6.4Mbps, but now the throughput requirement is 1M samples per second (64 Mbps).

    AM623: AM623 DMA - Processors forum - Processors - TI E2E support forums
    AM623: Memory Map an external device to OSPI for faster reads - Processors forum - Processors - TI E2E support forums 

    Thus, we probably want to drop the A53+OSPI option to avoid Linux, and turn to either the M4F+OSPI or the PRUSS+OSPI option to interface with the FPGA.

    But, unfortunately, the page below mentions "Please note that this driver is supported only on DM R5(WKUP R5) as part SBL examples. It is not supported on MCU-M4."
    https://software-dl.ti.com/mcu-plus-sdk/esd/AM62X/11_01_00_16/exports/docs/api_guide_am62x/DRIVERS_OSPI_PAGE.html

                            M4F+OSPI         PRUSS+OSPI
    MCU+ SDK support        no               no
    RTOS                    yes, FreeRTOS    ?
    OSPI register access    yes              yes
    interrupt               ?                ?
    DMA support             ?                ?
    example available       ?                ?


    We need help filling in the missing information to determine whether either the M4F+OSPI or the PRUSS+OSPI option is viable for our project.

    Below is a brief description of our use case.

    The FPGA will generate 1M samples per second (8 bytes per sample) continuously until the user stops it, and we need to transfer those samples continuously from the FPGA into memory that A53 Linux can access.

    The front end will fill the FIFO at a rate of 1M samples per second (8 bytes per sample) continuously, and A53 Linux will need to pull the samples into memory by reading the data register continuously for further processing.

    Instead of using A53 Linux, we are thinking of using a real-time core like the M4F or the PRUSS to pull the samples over the OSPI interface.
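
    To get a feel for how much buffering this use case needs, the sketch below estimates how long the reader can stall before a FIFO overflows, and how much memory a buffer needs to absorb a given stall. The FIFO depth (1024 samples) and the stall time (10 ms) are made-up example values, not figures from this thread, and the FIFO is assumed to be empty when the stall begins.

```python
# Buffering arithmetic for a continuous 1M samples/sec stream of 8-byte samples.
SAMPLE_RATE_HZ = 1_000_000  # from the use case description
BYTES_PER_SAMPLE = 8        # 64-bit samples, from the use case description


def stall_budget_us(fifo_depth_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Microseconds the reader may stall before an initially empty FIFO overflows."""
    return fifo_depth_samples * 1_000_000 / sample_rate_hz


def buffer_bytes_for_stall(stall_ms, sample_rate_hz=SAMPLE_RATE_HZ,
                           bytes_per_sample=BYTES_PER_SAMPLE):
    """Bytes of buffer needed to hold everything produced during a stall."""
    return sample_rate_hz * bytes_per_sample * stall_ms // 1000


print(stall_budget_us(1024))       # hypothetical 1024-sample FIFO
print(buffer_bytes_for_stall(10))  # hypothetical 10 ms stall of the consumer
```

    At this sample rate every sample of FIFO depth buys one microsecond of tolerance, which is why the scheduling latency of whichever core does the reading matters so much.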


  • Hello Kiung Chung Wong,

    The topic owner comes back to the office today.

    Please ping this thread if you don't get a response in a couple of days.

    Thanks for your patience

    Kind Regards,

    Anastas Yordanov

  • Hello KC,

    Apologies for the delayed responses here.

    For future readers, KC is discussing the possibility of using DMA to enable Linux to interact with GPMC on this other thread:
    AM623: AM623: uisng GPMC to interface with FPGA

    Let's summarize potential options 

    There have been a lot of responses to this thread. This reply is a continuation of my response on December 14.

    Transferring 64Mbit/sec is absolutely doable in hardware. However, we do not have an out-of-the-box solution for you. All of these options will require software development.

    1) DMA + GPMC: 8 bit parallel interface + trigger signal to tell DMA to read in the 8 bit parallel interface at 8MHz. Requires Linux driver development

    2) PRU bitbanging a SW OSPI (i.e., PRU emulating OSPI): 8 bit parallel interface + OSPI control signals. Requires PRU assembly development. Defining a shared memory region to pass the data back to Linux userspace should not be an issue, as I just had another customer verify the zerocopy shared memory example between Linux + PRU 

    3) PRU controlling the HW OSPI peripheral: This is probably doable, but I have not looked at the design to see if read / write latency between PRU and OSPI peripheral would be a concern. Could use MCU+ driver as a reference to write PRU firmware. May be able to write the PRU code in C instead of assembly, and use the same zerocopy concept to send the data to Linux for consumption

    I misspoke about the M4F controlling OSPI, apologies. The MCU+ docs clarify that the DM R5F supports OSPI, but not the M4F core:
    https://software-dl.ti.com/mcu-plus-sdk/esd/AM62X/11_01_00_16/exports/docs/api_guide_am62x/DRIVERS_OSPI_PAGE.html

    So option 4) could be that you write custom DM R5F code to interact with the OSPI, as discussed in the AM62x multicore academy here: 
    https://dev.ti.com/tirex/explore/node?isTheia=false&node=A__AZNhqJdyJ3LM.YBw-Z2UAw__AM62-ACADEMY__uiYMDcq__LATEST
    and best practices here:
    [FAQ] How to avoid crashing the DM R5F when writing custom DM R5F firmware

    If you want to discuss potential DM R5F development, please create a new e2e thread for that topic.

    Regards,

    Nick

  • Hi  Nick,

    Our processor board currently routes only the OSPI lines to the external connector that can be used to interface with the FPGA. If OSPI is able to work, then we do not need to go to the GPMC interface, which requires a hardware board change.

    Ok, M4F+OSPI is no longer an option. The PRU+OSPI option seems to have even more unknowns.

    I understand that the primary role of the Device Management (DM) R5F core is to run the DM task.

    So if we add a new task to read 1 sample (8 bytes) per microsecond from the OSPI interface, will it cause problems for the DM task?

    The read will be continuous until the user stops it. The last thing we want to see is the DM task and the sample reading task stepping on each other.

    If that will not cause problems for the DM task, then I am happy to start a new e2e thread to discuss the R5F+OSPI option.

    rgds,

    kc Wong

  • Hello KC,

    If you set your task priorities correctly, then the DM task should be the only one with the highest priority. So the DM task will not be impacted by the OSPI read, but the DM task may interrupt the OSPI read if the rest of the system makes a request. The amount of time the DM R5F spends on the DM task is hugely usecase-dependent (e.g., is there dynamic power scaling that requires frequent power management requests that go through the DM?). I do not know if there is a good way to benchmark how long the DM R5F spends in the DM task before returning to other tasks, but I am also not an MCU+ programming specialist. That may be a good discussion for a new thread.

    Regards,

    Nick

  • Kiung Chung Wong, Nick,

    Thanks. Stan is currently out of office. I will let him know early next week.

    Thanks

    Kind Regards

    Anastas