AM335x GPMC performance single read

Hi,

We are having problems with the performance of our GPMC bus. I recently posted a similar question regarding access speed in general, and we are now wondering if this sets an absolute limit on our throughput using the GPMC.

This is the former post:

http://e2e.ti.com/support/arm/sitara_arm/f/791/t/331214.aspx

We are doing consecutive reads from an FPGA connected to the GPMC bus. We are alternating between data and status registers, which means we can't do burst reads.

The GPMC module is therefore configured for single-byte reads, using the following parameters:

WRAPBURST                 0
READMULTIPLE              0
READTYPE                  1
WRITEMULTIPLE             0
WRITETYPE                 1
CLKACTIVATIONTIME         0
ATTACHEDDEVICEPAGELENGTH  0
WAITREADMONITORING        0
WAITWRITEMONITORING       0
WAITMONITORINGTIME        0
WAITPINSELECT             0
DEVICESIZE                0
DEVICETYPE                0
MUXADDDATA                0
TIMEPARAGRANULARITY       0
GPMCFCLKDIVIDER           1
CSWROFFTIME               5
CSRDOFFTIME               11
CSEXTRADELAY              0
CSONTIME                  1
ADVAADMUXWROFFTIME        0
ADVAADMUXRDOFFTIME        0
ADVWROFFTIME              3
ADVRDOFFTIME              3
ADVEXTRADELAY             0
ADVAADMUXONTIME           0
ADVONTIME                 1
WEOFFTIME                 5
WEEXTRADELAY              0
WEONTIME                  1
OEAADMUXOFFTIME           0
OEOFFTIME                 11
OEEXTRADELAY              0
OEAADMUXONTIME            0
OEONTIME                  4
PAGEBURSTACCESSTIME       0
RDACCESSTIME              10
WRCYCLETIME               5
RDCYCLETIME               11
WRACCESSTIME              0
WRDATAONADMUXBUS          0
CYCLE2CYCLEDELAY          0
CYCLE2CYCLESAMECSEN       0
CYCLE2CYCLEDIFFCSEN       0
BUSTURNAROUND             0

Above, the read operation is configured to take eleven 100-MHz cycles (110 ns). The chip select is configured to be held low during the last ten of those.
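
For reference, the arithmetic behind those numbers is just cycle counts times the 10 ns GPMC_FCLK period (a minimal sketch; the constants mirror the register table above):

    /* Sanity check of the GPMC timing fields above, assuming a 100-MHz
     * GPMC_FCLK (10 ns per cycle). Pure arithmetic, not register code. */
    #include <stdio.h>

    int main(void)
    {
        const int fclk_ns   = 10;   /* 100-MHz GPMC_FCLK period */
        const int cs_on     = 1;    /* CSONTIME                 */
        const int cs_rd_off = 11;   /* CSRDOFFTIME              */
        const int rd_cycle  = 11;   /* RDCYCLETIME              */

        printf("CS low:     %d ns\n", (cs_rd_off - cs_on) * fclk_ns); /* 100 ns */
        printf("Read cycle: %d ns\n", rd_cycle * fclk_ns);            /* 110 ns */
        return 0;
    }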

On the oscilloscope we can see that the chip select is indeed held low for exactly 100 ns, as seen below.

The problem is the long delay until the next read. After the chip select goes high at the end of the read cycle, there is a delay of 260 ns before the next read cycle starts, limiting us to about 370 ns per byte.

The above test was run from u-boot. An objdump of the code reads:

                val = GPMC_READ(reg);
80103824:       e5d32000        ldrb    r2, [r3]
                val = GPMC_READ(reg);
80103828:       e5d32000        ldrb    r2, [r3]
                val = GPMC_READ(reg);
8010382c:       e5d32000        ldrb    r2, [r3]
                val = GPMC_READ(reg);
80103830:       e5d32000        ldrb    r2, [r3]
                val = GPMC_READ(reg);
80103834:       e5d32000        ldrb    r2, [r3]
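
For completeness, GPMC_READ() is essentially just a volatile byte read from the chip-select window; a stripped-down sketch of the test (the base address shown here is a placeholder):

    /* Stripped-down version of the u-boot test behind the objdump above.
     * Each GPMC_READ() is a volatile byte read, i.e. one ldrb and one
     * single-byte GPMC read cycle on the bus. */
    #define FPGA_CS_BASE   0x01000000UL   /* placeholder CS base address */
    #define GPMC_READ(reg) (*(volatile unsigned char *)(FPGA_CS_BASE + (reg)))

    static void gpmc_read_test(void)
    {
        volatile unsigned char val;

        val = GPMC_READ(0);
        val = GPMC_READ(0);
        val = GPMC_READ(0);
        val = GPMC_READ(0);
        val = GPMC_READ(0);
        (void)val;
    }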

Since the FPGA doesn't care about the clock, I have noticed that I can cut 20 ns by setting the read type to 0 (async), but it is still too slow.

Is this the best performance we can hope for, due to delays on the internal interconnect?

Thank you for your time,

Rickard Åberg

  • Hi Rickard,

    I will ask someone from the factory team to take a look at this.

  • Rickard,

    The problem with the GPMC is the internal bus arbitration time. If you use single-byte reads, you won't get any further.

    The GPMC is also used to do NAND flash I/O, and there is a prefetch unit and a DMA unit attached to the GPMC to get the required throughput for NAND I/O. Without prefetch and DMA, you will get only about 2MBytes/sec.

    This arbitration time is typical for this sort of CPU. I am not aware of any CPU that is much better.


    So what can you do?

    You might change the address mapping so that the status and data registers are on adjacent addresses. Then you can do one 16-bit read to fetch both at the same time, so you pay for only one arbitration instead of two (see the sketch below).
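
    A minimal sketch of that idea, assuming the FPGA maps the status register at offset 0 and the data register at offset 1 of the chip-select window (base address, offsets and byte order are assumptions to be matched to the FPGA):

        /* One 16-bit GPMC access returns status and data together, so the
         * interconnect arbitration is paid once instead of twice. */
        #include <stdint.h>

        #define FPGA_CS_BASE  0x01000000UL    /* placeholder GPMC CS base */

        static inline void fpga_read_status_data(uint8_t *status, uint8_t *data)
        {
            uint16_t v = *(volatile uint16_t *)(FPGA_CS_BASE + 0x0);

            *status = (uint8_t)(v & 0xffu);   /* byte at offset 0: status */
            *data   = (uint8_t)(v >> 8);      /* byte at offset 1: data   */
        }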

    And you can add a sort of FIFO inside the FPGA, reading more than one data byte at a time, using DMA.

    Using an FPGA with a status byte and a data byte is not a workable solution if high speed is a requirement.

    regards

    Wolfgang

  • Wolfgang,

    Thank you for your answer.

    We will try to optimize the communication using DMA. The problem so far has been that the status register tells us which register to read the next byte from, because data comes in from several external sources. We may, however, gain time by reading several status registers at once, and certainly by reading the data whenever there is more than a byte or two waiting in the FPGA's FIFO.

    The main problem isn't the actual speed of the communication, but the relative time that the CPU is busy doing GPMC accesses. If the load is still too high using DMA, we will take a look at the PRU to see if we can use it for the GPMC communication.

    Best regards,

    Rickard

  • What I would do is:

    a) Use an external event line from your FPGA to the CPU, so you can trigger the DMA controller directly from the FPGA. No polling needed, no wasted time.

    b) Use ONE FIFO inside the FPGA. Each FIFO entry consists of two pieces of information:

    1) the data byte

    2) the channel number to which the data byte belongs.

    The DMA will dump this data into memory, and in your software you can read the channel numbers and sort the data to the right place (see the sketch below).
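
    In software, the demux step could look something like this (a sketch only; the entry layout, channel count and buffer sizes are assumptions and must match what the FPGA packs into the FIFO):

        /* Sort a DMA-filled buffer of (channel, data) FIFO entries into
         * per-channel buffers. */
        #include <stddef.h>
        #include <stdint.h>

        #define NUM_CHANNELS   8              /* assumed channel count    */
        #define CHAN_BUF_SIZE  256            /* assumed per-channel size */

        struct fifo_entry {
            uint8_t channel;                  /* source the byte belongs to */
            uint8_t data;                     /* the data byte itself       */
        };

        static void demux_dma_buffer(const struct fifo_entry *buf, size_t count,
                                     uint8_t out[NUM_CHANNELS][CHAN_BUF_SIZE],
                                     size_t fill[NUM_CHANNELS])
        {
            for (size_t i = 0; i < count; i++) {
                uint8_t ch = buf[i].channel;

                if (ch < NUM_CHANNELS && fill[ch] < CHAN_BUF_SIZE)
                    out[ch][fill[ch]++] = buf[i].data;
            }
        }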

    regards

    Wolfgang

  • Hi

    I have a similar problem (access to CPLD registers via the GPMC) and arrived at the same point: I measured 50 + 250 ns for the chip select with an appropriate GPMC configuration. Unlike Rickard's situation, my problem is not only the CPU load but also the data rate: 300 ns per byte will be too slow. My question is: will DMA access make the GPMC faster, or will I still see the same timing? Any experience? (Our PRUs are busy with EtherCAT.)
  • Hi Wolfgang et al.,

    Wolfgang Muees1 said:
    The problem with the GPMC is the internal bus arbitration time. If you use single-byte reads, you won't get any further.

    maybe the following is interesting for you:

    As written in my former posting, I found this to be true with the older XDC 3.25.3.72 / BIOS 6.35.4.50 (a 250 ns delay between memory cycles). But with XDCTools 3.305.60 / BIOS 6.41.0.26, this delay surprisingly vanished completely: I get memory access cycles of 70 ns in asynchronous single-write mode (AD muxed), exactly as programmed in the GPMC registers, with no additional time due to "internal bus arbitration".

    For details, see:

    Maybe you have an explanation for that. So far I haven't gotten one from the TI guys.

    Thanks,

    Frank

  • Hi All,

    We are also trying to optimize the write delay between the GPMC and an FPGA. Our system needs to write a stream of data from a buffer to a FIFO in the FPGA. We are running Linux on a TI81XX-series SoC and are doing single writes with standard writel() calls (a sketch follows below).
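
    For reference, the write path boils down to something like this (a simplified sketch; the ioremap()'d FIFO address and the buffer handling are left out):

        /* Each writel() is one single GPMC write cycle; the gap between
         * successive cycles is the overhead discussed in this thread. */
        #include <linux/io.h>
        #include <linux/types.h>

        static void fpga_fifo_write(void __iomem *fifo_reg, const u32 *buf,
                                    size_t words)
        {
            size_t i;

            for (i = 0; i < words; i++)
                writel(buf[i], fifo_reg);
        }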

    We are observing a gap between two successive writes, similar to the problem cited in the posts above. As in the XDCTools & BIOS case, can we get the same write-delay optimization within Linux too?

    This would solve our limited-bandwidth issue between the processor and the FPGA for now. We are also trying to use DMA to counter the internal bus arbitration time and the CPU utilization.

    Thanks,

    Srinivasu Manne.

  • Hi Srinivasu Manne,

    see the latest (not really good) news here:

    Regards,

    Frank

  • Hi Rickard Åberg,
    We are trying to access the GPMC bus, but even after configuring it as you posted, we don't see the GPMC clock pulse. Can you give us any ideas?