AM623: Memory Map an external device to OSPI for faster reads

Maneesh N

Part Number: AM623

Tool/software:

Hi TI Support team,

I previously had run into an issue where I was not able to read data fast enough from the Cadence OSPI controller. I have used the OSPI controller driver to perform register reads.

The FPGA had a FIFO and was pumping in data and the AM62x processor based Linux board was reading from it. I was about reading a 32bit register at about 33 - 35us using readl() from the spi-cadence-qspi.c OSPI Linux Driver. While the application demands me to complete a 64 bit read in under 7us.

Is there a way that I can memory map FPGA's FIFO to Cadence conroller to achieve faster reads?

Any insights provided to achieve this throughput is much appreciated.

I hope I hear back from you guys soon.

Thanks & Regards,
Maneesh N

3 months ago

0 Bin Liu 3 months ago

TI__Guru**** 170301 points

Hi Maneesh,

I am routing your query to our SPI expert for comments.

0 Paula Carrillo 3 months ago in reply to Bin Liu

TI__Mastermind 40100 points

Hi Maneesh, Cadence OSPI controller can be configured in DAC mode for Memory mapping.

From linux/drivers/spi/spi-cadence-quadspi.c at master · torvalds/linux · GitHub. I don't see any ".quirks = CQSPI_DISABLE_DAC_MODE" used for AM62x platform so it should be enabled by default.
From AM62x TRM we have that OSPI data region can be mapped into the CPU's address space (0x4_0000_0000). https://www.ti.com/lit/ug/spruiv7b/spruiv7b.pdf#page=668
- This memory region would need to be configured in Device Tree Configuration. https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/arch/arm64/boot/dts/ti

Other performance considerations:

- DTR: This should be enabled by default, but worthy to double check is not disabled

- OSPI PHY enabled: https://www.ti.com/lit/ug/spruiv7b/spruiv7b.pdf#page=1380

Hope this helps, Thank you,

Paula

0 Maneesh N 3 months ago in reply to Paula Carrillo

Prodigy 20 points

Hi Paula,

Thank you for sharing some insights, I was experimenting with the linux/drivers/spi/spi-cadence-quadspi.c driver,

I just wanted to find out the time taken by individual functions while reading the OSPI data.
For burst reads of 1024 samples that I read using a for loop, I found that after removing a function 'cqspi_wait_for_bit()' the read times of the data had significantly reduced.

for(counter = 0; counter < 1024; counter++)
{
start = ktime_get();
writel(offset, cqspi->iobase + CQSPI_REG_CMDADDRESS);
writel(0x00bb0001, cqspi->iobase + CQSPI_REG_CMDCTRL);

              /* commenting this function out
              cqspi_wait_for_bit(cqspi->iobase + CQSPI_REG_CMDCTRL,
                                 CQSPI_REG_CMDCTRL_INPROGRESS_MASK, 1);
*/

reg = readl(cqspi->iobase + CQSPI_REG_CMDREADDATALOWER);
end = ktime_get();
duration_ns = ktime_to_ns(ktime_sub(end, start));
printk("reg: %x\tt: %lli\n", reg, duration_ns);
}

After commenting cqspi_wait_for_bit function out, I am able to achieve read times of about 1.6 - 1.7 us.

Is it safe to use a readl skipping the use of 'cqspi_wait_for_bit()'? Are there some safety concerns that I should be worried about?

0 Paula Carrillo 3 months ago in reply to Maneesh N

TI__Mastermind 40100 points

Hi Maneesh, without "cqspi_wait_for_bit" there is a risk reading control registers haven't finished to be setup. Did you verify the data read? is OK?

One thing could be to add "busyway" as true to this function and see if time is better

from I see from below code, that when busyway is false, then regular Polling is used with a "10" probably us as delay between reads

linux/drivers/spi/spi-cadence-quadspi.c at master · torvalds/linux · GitHub

static int cqspi_wait_for_bit(const struct cqspi_driver_platdata *ddata,
			      void __iomem *reg, const u32 mask, bool clr,
			      bool busywait)
{
	u64 timeout_us = CQSPI_TIMEOUT_MS * USEC_PER_MSEC;
	u32 val;

	if (busywait) {
		int ret = readl_relaxed_poll_timeout(reg, val,
						     (((clr ? ~val : val) & mask) == mask),
						     0, CQSPI_BUSYWAIT_TIMEOUT_US);

		if (ret != -ETIMEDOUT)
			return ret;

		timeout_us -= CQSPI_BUSYWAIT_TIMEOUT_US;
	}

	return readl_relaxed_poll_timeout(reg, val,
					  (((clr ? ~val : val) & mask) == mask),
					  10, timeout_us);
}

suggested change

for (counter = 0; counter < 1024; counter++) {
	start = ktime_get();
	writel(offset, cqspi->iobase + CQSPI_REG_CMDADDRESS);
	writel(0x00bb0001, cqspi->iobase + CQSPI_REG_CMDCTRL);

	cqspi_wait_for_bit(cqspi->iobase + CQSPI_REG_CMDCTRL,
			   CQSPI_REG_CMDCTRL_INPROGRESS_MASK, 1, true);

	reg = readl(cqspi->iobase + CQSPI_REG_CMDREADDATALOWER);
	end = ktime_get();
	duration_ns = ktime_to_ns(ktime_sub(end, start));
	printk("reg: %x\tt: %lli\n", reg, duration_ns);
}

0 Maneesh N 3 months ago in reply to Paula Carrillo

Prodigy 20 points

Hi Paula,

I'm reading the data as expected for now. I need to experiment the suggested change and recheck the timing.

Thanks,
Maneesh N

0 Maneesh N 2 months ago in reply to Maneesh N

Prodigy 20 points

Hi Paula,

Sorry for replying this late.
Like you told me I encountered issues when I tried to read this register with different addresses consecutively.

I have made the following change, but not utilizing the busywait method as you have suggested, but something like this

static int cqspi_wait_for_bit(void __iomem *reg, const u32 mask, bool clr)
{
	u32 val;

	return readl_relaxed_poll_timeout(reg, val,
					  (((clr ? ~val : val) & mask) == mask),
					  0, CQSPI_TIMEOUT_MS * 1000);
}

just made delay_us field 0 instead of 10.
Can you please help me understand if this method is ok? Or is it due to any particular reason that you wanted me to add busywait?

0 Paula Carrillo 2 months ago in reply to Maneesh N

TI__Mastermind 40100 points

Hi Maneesh, what would be the reason for not using the "busywait"?

From linux/drivers/spi/spi-cadence-quadspi.c at master · torvalds/linux · GitHub

I see current implementation of "cqspi_wait_for_bit" , if "busywait" is enabled it would do a similar aggressive polling as your snipped code, but it also has a fallback, more relaxed polling with 10us dalys, after the aggressive polling times out.

So probably this is a good balance already implemented in the driver..

thank you,

Paula

Processors

Processors forum

AM623: Memory Map an external device to OSPI for faster reads