This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

AM623: Memory Map an external device to OSPI for faster reads

Part Number: AM623


Tool/software:

Hi TI Support team,

I previously had run into an issue where I was not able to read data fast enough from the Cadence OSPI controller. I have used the OSPI controller driver to perform register reads.

The FPGA had a FIFO and was pumping in data and the AM62x processor based Linux board was reading from it. I was about reading a 32bit register at about 33 - 35us using readl() from the spi-cadence-qspi.c OSPI Linux Driver. While the application demands me to complete a 64 bit read in under 7us. 

Is there a way that I can memory map FPGA's FIFO to Cadence conroller to achieve faster reads?

Any insights provided to achieve this throughput is much appreciated.

I hope I hear back from you guys soon.

Thanks & Regards,
Maneesh N

  • Hi Maneesh,

    I am routing your query to our SPI expert for comments.

  • Hi Maneesh, Cadence OSPI controller can be configured in DAC mode for Memory mapping.

    Other performance considerations:

    - DTR: This should be enabled by default, but worthy to double check is not disabled 

    - OSPI PHY enabled: https://www.ti.com/lit/ug/spruiv7b/spruiv7b.pdf#page=1380 

    Hope this helps, Thank you,

    Paula

  • Hi Paula,

    Thank you for sharing some insights, I was experimenting with the linux/drivers/spi/spi-cadence-quadspi.c driver,

    I just wanted to find out the time taken by individual functions while reading the OSPI data.
    For burst reads of 1024 samples that I read using a for loop, I found that after removing a function 'cqspi_wait_for_bit()' the read times of the data had significantly reduced.

      for(counter = 0; counter < 1024; counter++)
      {
        start = ktime_get();
        writel(offset, cqspi->iobase + CQSPI_REG_CMDADDRESS);
        writel(0x00bb0001, cqspi->iobase + CQSPI_REG_CMDCTRL);

                  /* commenting this function out
                  cqspi_wait_for_bit(cqspi->iobase + CQSPI_REG_CMDCTRL,
                                     CQSPI_REG_CMDCTRL_INPROGRESS_MASK, 1);
    */

        reg = readl(cqspi->iobase + CQSPI_REG_CMDREADDATALOWER);
        end = ktime_get();
              duration_ns = ktime_to_ns(ktime_sub(end, start));
        printk("reg: %x\tt: %lli\n", reg, duration_ns);    
      }

    After commenting  cqspi_wait_for_bit function out, I am able to achieve read times of about 1.6 - 1.7 us.

    Is it safe to use a readl skipping the use of 'cqspi_wait_for_bit()'? Are there some safety concerns that I should be worried about?

  • Hi Maneesh, without "cqspi_wait_for_bit" there is a risk reading control registers haven't finished to be setup. Did you verify the data read? is OK?

    One thing could be to add "busyway" as true to this function and see if time is better

    from I see from below code, that when busyway is false, then regular Polling is used with a "10" probably us as delay between reads

    linux/drivers/spi/spi-cadence-quadspi.c at master · torvalds/linux · GitHub

    static int cqspi_wait_for_bit(const struct cqspi_driver_platdata *ddata,
    			      void __iomem *reg, const u32 mask, bool clr,
    			      bool busywait)
    {
    	u64 timeout_us = CQSPI_TIMEOUT_MS * USEC_PER_MSEC;
    	u32 val;
    
    	if (busywait) {
    		int ret = readl_relaxed_poll_timeout(reg, val,
    						     (((clr ? ~val : val) & mask) == mask),
    						     0, CQSPI_BUSYWAIT_TIMEOUT_US);
    
    		if (ret != -ETIMEDOUT)
    			return ret;
    
    		timeout_us -= CQSPI_BUSYWAIT_TIMEOUT_US;
    	}
    
    	return readl_relaxed_poll_timeout(reg, val,
    					  (((clr ? ~val : val) & mask) == mask),
    					  10, timeout_us);
    }
     

    suggested change

    for (counter = 0; counter < 1024; counter++) {
    	start = ktime_get();
    	writel(offset, cqspi->iobase + CQSPI_REG_CMDADDRESS);
    	writel(0x00bb0001, cqspi->iobase + CQSPI_REG_CMDCTRL);
    
    	cqspi_wait_for_bit(cqspi->iobase + CQSPI_REG_CMDCTRL,
    			   CQSPI_REG_CMDCTRL_INPROGRESS_MASK, 1, true);
    
    	reg = readl(cqspi->iobase + CQSPI_REG_CMDREADDATALOWER);
    	end = ktime_get();
    	duration_ns = ktime_to_ns(ktime_sub(end, start));
    	printk("reg: %x\tt: %lli\n", reg, duration_ns);
    }

  • Hi Paula,

    I'm reading the data as expected for now. I need to experiment the suggested change and recheck the timing.

    Thanks,
    Maneesh N

  • Hi Paula,

    Sorry for replying this late.
    Like you told me I encountered issues when I tried to read this register with different addresses consecutively.

    I have made the following change, but not utilizing the busywait method as you have suggested, but something like this


    static int cqspi_wait_for_bit(void __iomem *reg, const u32 mask, bool clr)
    {
    	u32 val;
    
    	return readl_relaxed_poll_timeout(reg, val,
    					  (((clr ? ~val : val) & mask) == mask),
    					  0, CQSPI_TIMEOUT_MS * 1000);
    }

    just made delay_us field 0 instead of 10.
    Can you please help me understand if this method is ok? Or is it due to any particular reason that you wanted me to add busywait?

  • Hi Maneesh, what would be the reason for not using the "busywait"?

    From linux/drivers/spi/spi-cadence-quadspi.c at master · torvalds/linux · GitHub

    I see current implementation of "cqspi_wait_for_bit" , if "busywait" is enabled it would do a similar aggressive polling as your snipped code, but it also has a fallback, more relaxed polling with 10us dalys, after the aggressive polling times out.

    So probably this is a good balance already implemented in the driver..

    thank you,

    Paula