SK-AM62: Excessive Dead Time Between Consecutive U16 GPMC Reads

Part Number: SK-AM62
Other Parts Discussed in Thread: AM4376

Hello,

We have the GPMC interface connected to our ASIC via a 16-bit bus with wait acknowledgment. Everything appears to work great except for one minor issue: excessive time between two U16 reads.

Data transfers that work well are as follows:

When the CPU requests a single U32 read or a U32 write, then the bus has 2 U16 transactions separated by less than 40 ns.

When the CPU requests a U16 write followed by a U16 write, then the bus has 2 U16 transactions separated by less than 50 ns.

The problem is that when the CPU does a U16 read followed by another U16 read, there is almost 250 ns of dead time between the accesses. Is there any way to improve this? I would expect the two U16 reads to be separated by approximately the same 50 ns as the two separate U16 writes.

Thanks,
Victor

  • Hello Victor,

    Please help us understand more about the GPMC setup.

    What type of memory does your ASIC emulate: NAND flash, NOR flash, or SRAM?
    What 16-bit GPMC access mode have you configured:
    synchronous/asynchronous, multiplexed or non-multiplexed?
    If multiplexed, is it address/data multiplexed or AAD (address/address/data) multiplexed mode?
    Are these two consecutive reads burst reads or two consecutive single reads?
    Is the WAIT signal (input to GPMC) asserted between the two consecutive 16-bit reads?
    Can you show timing oscillograms of the WAIT, REn/OEn, and Data0 signals?
    Can you dump registers: CS_CS_GPMC_CONFIG1_J_J as per the relevant chipselect?
    Can you show a dump of registers: CS_CS_GPMC_CONFIG1_J_J ...CS_CS_GPMC_CONFIG7_J_J values for your ASIC-relevant chipselect (index j) during the two consecutive reads?
    We are interested to know the values of some bitfields like:
    RDACCESSTIME
    RDCYCLETIME
    PAGEBURSTACCESSTIME
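Once a raw register value is available, the three bitfields listed above can be pulled out with a few shifts and masks. The following is only a sketch: it assumes all three fields live in GPMC_CONFIG5 at the bit positions used by the Linux omap-gpmc driver, which should be checked against the AM62x TRM. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Timing fields of GPMC_CONFIG5, expressed in GPMC clock ticks.
 * ASSUMPTION: bit positions follow the Linux omap-gpmc driver;
 * verify them against the AM62x TRM before relying on them. */
struct gpmc_config5_fields {
    unsigned rd_cycle_time;          /* RDCYCLETIME,         bits 4:0   */
    unsigned wr_cycle_time;          /* WRCYCLETIME,         bits 12:8  */
    unsigned rd_access_time;         /* RDACCESSTIME,        bits 20:16 */
    unsigned page_burst_access_time; /* PAGEBURSTACCESSTIME, bits 27:24 */
};

/* Extract bits [hi:lo] from a 32-bit register value. */
static unsigned gpmc_field(uint32_t reg, unsigned hi, unsigned lo)
{
    return (reg >> lo) & ((1u << (hi - lo + 1u)) - 1u);
}

/* Split a raw GPMC_CONFIG5 value into its timing fields. */
static struct gpmc_config5_fields gpmc_decode_config5(uint32_t reg)
{
    struct gpmc_config5_fields f;

    f.rd_cycle_time          = gpmc_field(reg, 4, 0);
    f.wr_cycle_time          = gpmc_field(reg, 12, 8);
    f.rd_access_time         = gpmc_field(reg, 20, 16);
    f.page_burst_access_time = gpmc_field(reg, 27, 24);
    return f;
}
```

For example, a raw value of 0x00020404 would decode to a read cycle time of 4 ticks, a write cycle time of 4 ticks, a read access time of 2 ticks, and a page burst access time of 0.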

    Thanks,

    Stan

  • Hi Victor,

    This delay between back-to-back U16 reads is typical of the delay between sequential read requests coming from the CPU: command latency from CPU to GPMC + GPMC configuration time + read data latency back to the CPU. I've observed this with both R5F and A53 ARM cores. In simulation of the internal signals, we can see the CPU issue a read request on the interconnect and wait until the data returns before requesting another CPU read.

    The U32 read has minimal dead time between 2 back to back U16 transactions because it is issued as a single CPU read command on the internal interconnect.
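If the two 16-bit registers happen to sit at adjacent addresses on a 32-bit boundary, and reading them has no side effects (neither is confirmed in this thread), one software-side option that follows from this observation is to fetch both with a single 32-bit load and split the result. A minimal C sketch, where `read_u16_pair` is a hypothetical helper and little-endian ordering is assumed:

```c
#include <stdint.h>

/* Read two adjacent 16-bit registers with one 32-bit load.
 * ASSUMPTIONS (not confirmed in the thread):
 *  - both registers share one 32-bit aligned word,
 *  - the CPU is little-endian, so the register at the lower
 *    address lands in the low half of the loaded word,
 *  - reading either register has no side effects. */
static void read_u16_pair(const volatile uint32_t *reg32,
                          uint16_t *lo, uint16_t *hi)
{
    uint32_t word = *reg32;           /* one CPU read request */

    *lo = (uint16_t)(word & 0xFFFFu); /* register at the lower address */
    *hi = (uint16_t)(word >> 16);     /* register at the higher address */
}
```

Whether this helps in a given ISR depends on the ASIC register map, so it is a suggestion to evaluate rather than a confirmed fix.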

    I'm not too familiar with the specifics of the ARM architecture or compiler, but I suppose it's to prevent data hazards in the ARM data pipeline. DMA can get around this latency, but I'm not sure how to force the CPU to issue multiple reads before the first read data appears at the CPU. I will ask internally if there is some optimization or data block like calling asm(" DMB"); between reads...

    Writes get pipelined/queued and can maximize throughput. You could disable write pipeline with optimization -o2 (to see similar latencies as reads).

    Hope this helps,
    Mark

  • Hello,

    Sorry for the delay in replying.

    NOR Flash, Synchronous, non-multiplexed, consecutive 16-bit single reads.

    We do use burst 64-bit reads, with the four 16-bit reads coming in within a 3 or 4 clock window. I do not know if there is excessive dead time between burst 64-bit reads. Since the majority of the burst 64-bit reads are done via DMA, it is not an issue if there is excessive dead time between consecutive 64-bit reads.

    The WAIT signal is asserted for each 16-bit read.

    "Can you show timing oscillograms of the WAIT, REn/OEn, Data0 signals"

    Maybe; I have to ask the H/W guys to put it on the scope. However, I doubt you will see anything strange there, since the U32 read works so well. I do know that the WAIT signal will tristate once the CS deasserts and gets pulled up by a 1K ohm resistor. The other lines that you are inquiring about are driven by the CPU and will be ignored by the ASIC once CS deasserts.

    "Can you dump registers: CS_CS_GPMC_CONFIG1_J_J as per the relevant chipselect"

    Maybe; I have to write software to do it.

    "Can you show dump of registers: CS_CS_GPMC_CONFIG1_J_J ...CS_CS_GPMC_CONFIG7_J_J values for your ASIC-relevant chipselect (index j) during the two consecutive reads ?"

    Maybe; I have to write software to do it.
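The software needed for such a dump can be quite small. The sketch below only covers the register indexing: it assumes the per-chip-select register bank starts at offset 0x60 with a stride of 0x30 bytes, as in the Linux omap-gpmc driver, and that the GPMC configuration space has already been mapped by the caller (for example through /dev/mem on Linux). The base address itself should come from the TRM or the board device tree. All names here are hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: offsets as used by the Linux omap-gpmc driver;
 * verify them against the AM62x TRM. */
#define GPMC_CS0_CONFIG1_OFFSET 0x60u /* GPMC_CONFIG1 of chip-select 0 */
#define GPMC_CS_STRIDE          0x30u /* spacing between chip-select banks */

/* Byte offset of GPMC_CONFIGn (n = 1..7) for chip-select cs. */
static uint32_t gpmc_config_offset(unsigned cs, unsigned n)
{
    return GPMC_CS0_CONFIG1_OFFSET + GPMC_CS_STRIDE * cs + 4u * (n - 1u);
}

/* Copy CONFIG1..CONFIG7 of one chip-select into out[0..6].
 * gpmc_cfg must point at the start of the already mapped GPMC
 * configuration space; how it is mapped is left to the caller. */
static void gpmc_dump_cs_config(const volatile uint32_t *gpmc_cfg,
                                unsigned cs, uint32_t out[7])
{
    for (unsigned n = 1; n <= 7; n++)
        out[n - 1] = gpmc_cfg[gpmc_config_offset(cs, n) / 4u];
}
```

Under these assumptions, CONFIG1 of chip-select 1 would sit at byte offset 0x90 from the start of the configuration space, and CONFIG7 at 0xA8.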

    Below is the configuration for the gpmc in the .dts file.

    ////////////////////////////////////////////////////////////////////////
    //// Single Read Sync Mode:
    //
        gpmc,sync-clk-ps = <10000>;
        gpmc,cs-extra-delay;
        gpmc,cs-on-ns = <0>;
        gpmc,cs-rd-off-ns = <30>;
        gpmc,cs-wr-off-ns = <30>;
        gpmc,oe-on-ns = <0>;
        gpmc,oe-off-ns = <30>;
        gpmc,we-on-ns = <0>;
        gpmc,we-off-ns = <30>;
        gpmc,rd-cycle-ns = <40>;
        gpmc,wr-cycle-ns = <40>;
        gpmc,access-ns = <20>;
        gpmc,wr-access-ns = <20>;
        gpmc,cycle2cycle-samecsen;
        gpmc,cycle2cycle-diffcsen;
        gpmc,cycle2cycle-delay-ns = <30>;
        gpmc,num-waitpins = <1>;
        gpmc,wait-pin = <0>;
        gpmc,wait-on-read;
        gpmc,wait-on-write;
        gpmc,wait-monitoring-ns = <0>;
        gpmc,device-width= <1>;
        gpmc,sync-write;
        gpmc,sync-read;
        gpmc,adv-on-ns = <0>;
        gpmc,adv-rd-off-ns = <10>;
        gpmc,adv-wr-off-ns = <10>;

            };

    Let me know if you find anything incorrect with our configuration that can cause this excessive dead time.

    Thanks,

    Victor

  • Hello,

    The software does these consecutive U16 reads in an ISR. So, having this 250 ns delay is killing our performance, especially since the ISR runs at 2 kHz. Our previous generation of products, which used an AM4376, did not have this problem.

    Hopefully, there is a register setting that was not configured correctly and is causing this problem. Please investigate this further, since most likely I am not the only one using the GPMC controller to access an ASIC.

    Thanks,

    Victor

  • Hello Victor,

    The GPMC expert is currently out of office. Please expect an answer early next week.

    Thanks for your patience !

    Kind Regards,

    Thanks

    Anastas Yordanov

  • According to our documentation, the typical dead time between consecutive U16 GPMC reads is around 10-15 clock cycles. Are you using single reads or burst reads?

    To enable burst reads, you need to:

    1. Set the READMULTIPLE field (bit 30) in the GPMC_CONFIG1 register to 1.
    2. Set the ATTACHEDDEVICEPAGELENGTH field (bits 24-23) in the GPMC_CONFIG1 register to the desired burst length (4, 8, or 16 words).
    3. Set the WRAPBURST field (bit 31) in the GPMC_CONFIG1 register if the device uses wrapping bursts.

    Morten 

  • Hello,

    The problem is only with consecutive single U16 reads.

    We are using a 100 MHz clock for the GPMC, so 10-15 clocks would be 100 to 150 ns. We are seeing 250 ns.
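The conversion behind these numbers can be written down as a small sketch. The helper names are hypothetical, and the clock period is passed in picoseconds to match the `gpmc,sync-clk-ps = <10000>` value from the .dts, which corresponds to 100 MHz.

```c
#include <stdint.h>

/* Duration of a number of clock cycles in nanoseconds,
 * for a clock period given in picoseconds. */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t period_ps)
{
    return (cycles * period_ps) / 1000u;
}

/* Number of whole clock cycles needed to cover a time in
 * nanoseconds, rounded up. */
static uint32_t ns_to_cycles(uint32_t ns, uint32_t period_ps)
{
    return (ns * 1000u + period_ps - 1u) / period_ps;
}
```

With a 10 ns period, `cycles_to_ns(10, 10000)` and `cycles_to_ns(15, 10000)` give 100 and 150, while `ns_to_cycles(250, 10000)` gives 25, so the observed gap corresponds to about 25 GPMC clocks.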

    We do have burst read working and we are satisfied with its performance, since it is mainly used via DMA.

    Thanks,

    Victor

  • Hi Victor,

    "WAIT signal asserted for each 16-read reads."

    For how long is WAIT asserted by the ASIC? Could this be the reason for the longer time between transfers?

    Thanks,

    Stan

  • No, the WAIT gets deasserted and pulled up by the ASIC once the GPMC deasserts the CS. If WAIT were the issue, we would see the same issue on U32 reads.

  • Hi Victor,

    "We have the GPMC interface connected to our ASIC via a 16 bit bus with wait acknowledgment."

    From the dtsi it seems that WAIT0 pin monitoring is enabled for GPMC synchronous reads and writes to the ASIC:

        gpmc,cycle2cycle-diffcsen;
        gpmc,cycle2cycle-delay-ns = <30>;
        gpmc,num-waitpins = <1>;
        gpmc,wait-pin = <0>;
        gpmc,wait-on-read;
        gpmc,wait-on-write;
        gpmc,wait-monitoring-ns = <0>;
        gpmc,device-width= <1>;
        gpmc,sync-write;
        gpmc,sync-read;

    "I do know that the WAIT signal will tristate once the CS de-asserts and gets pulled up by a 1K ohm resistor."

    "No, the WAIT gets de-asserted and pulled up by ASIC once GPMC de-asserts the CS."

    Q1: Do I understand correctly that during the 1 x 32-bit access and during the two consecutive 16-bit single read accesses (in your ISR), the WAIT0 signal is constantly kept asserted (active-low) by the ASIC while CS0 is held active low by the AM62?

    According to the section External Signals / subsection WAIT Pin Monitoring Control of the AM62x TRM, the GPMC WAIT signal must be de-asserted for the GPMC controller to be able to capture valid data during a data transfer stage. CS0 of course remains asserted during the entire data transfer.

    I have presented the two data transfer phases on the diagram below: GPMC and NOR Flash — Synchronous Burst Read — 4x16-bit (GPMCFCLKDIVIDER = 0), from the section GPMC and NOR Flash — Synchronous Mode of the AM62x Datasheet.

    Q2: Is there some chance that your AM62SK software disables WAIT0 pin monitoring at run-time, so that it is disregarded?

    Q3: Have you appropriately configured the SoC WAIT0 pin polarity (according to the ASIC's active wait level)?

    "If WAIT is the issue, then we see same issue on U32 reads."

    I think these are not identical use cases. For U32 reads you have a burst of 2 x 16-bit halfwords without intermediate deactivation of the WAIT signal (HIGH) during the "valid data transfer stage" (so if this is the case, this is normal to expect). For two consecutive 16-bit reads you will have two separate read transactions. There should be a de-assertion of the WAIT signal not only between the two entire 16-bit transfers but also during the data stage of each of the transfers.

    Kind Regards

    Thanks,

    Anastas Yordanov

  • Hello,

    The GPMC controller controls the CS. For the U32 read, it asserts the CS twice, as shown in the scope capture below. Since the CS deasserts between the two U16 reads, the WAIT signal also deasserts while CS is not asserted. Our ASIC does not know whether it is a U32 read or a single U16 read, since all of the control signals from the GPMC are the same. Note that the dead time between the two CS assertions is 40 ns, which is what we expected. This read was generated using a devmem command on the debug console.

    This second scope capture is a U32 write and two U16 reads.  The U32 write works just fine, same as U32 read.  However, note the dead time after the write and between the two U16 reads.  The data transfers were generated by CPU running debug code.  I have not reviewed the code but I have to assume it is written in "C" and while probably not the most efficient, it can't possibly be the cause of the excessive dead time.

    Thanks,

    Victor

  • Hi Victor,

    Thank you for the CS oscillograms.

    "We do have burst read working and we are satisfied with its performance since it is mainly used via DMA."

    As per your confirmation and the waveforms, the CS is de-asserted by the GPMC for each 16-bit data halfword produced from the 32-bit U32 word write or read transactions. With a burst transfer, CS shall not be de-asserted between the 16-bit halfwords. Are burst (multiple) reads and writes currently enabled in the SoC GPMC in your setup?

    "So, having this 250 ns delay is killing our performance especially when the ISR operates at 2 kHz. Our previous generation of products which utilized an AM4376 did not have this problem."

    Are you running an OS on the SK-AM62, or is it bare-metal code? If it is TI example code, please specify the SDK version.

    Thanks

    Kind Regards,

    Anastas Yordanov 

  • Hello,

    Yes, that is correct for the burst mode CS. Just to be clear, we use one CS for the single synchronous U16 reads and another CS for the burst mode. The burst mode CS stays asserted for the entire 4 cycles of U16 reads. The WAIT asserts at the beginning of the first U16 read and stays asserted until the CS de-asserts. As I mentioned before, the burst mode works perfectly, and if there is excessive dead time between each burst read, it doesn't matter since DMA is the primary user.

    We are using Linux on our SK-AM62 platforms.  So, the .dts setting in this previous post was used to configure the GPMC.

    I am having the excessive dead time between two single U16 reads retested. The previous oscillogram with the U32 write followed by two consecutive single U16 reads may not have been produced with 3 consecutive assembly language instructions. So, we will make sure the test code does that and see if there is any improvement in the excessive dead time.

    Thanks,

    Victor

  • Hello Victor,

    Thank you for the additional inputs.

    I'll be waiting for your confirmation !

    Kind Regards,

    Anastas Yordanov

  • Sorry, the firmware guy forgot to run this test. Hopefully, it will be done in a week or so.

  • Hi Victor,

    Let us know the status once the test is ready.

    Thank you !

    Kind Regards,

    Anastas Yordanov