DRA821U-Q1: DRA821 MCSPI1 transaction to transaction delay issue

Syed N Akhtar

Part Number: DRA821U-Q1
Other Parts Discussed in Thread: DRA821, DRA829

This is a continuation of a previous E2E post: DRA821U: DRA821 MCSPI1 transaction to transaction delay time is too long.

We have two customers reporting the same issue. They cannot use DMA for this application, and they have already increased the data throughput to the maximum.

The customers still see a 3.5 us delay between consecutive SPI transactions on the link between the Hydra board's DRA821 processor and the carrier FPGA.

We plan to test the SPI bus on the DRA829 Eval board to confirm that the VxWorks SPI driver is not the limiting factor. However, I hope the TI team can share any SPI test data and scope captures they have, to show that such delays are not expected and are not caused by a limitation of the DRA821 internal bus structure.

I would appreciate your team's support on this issue, as it is impacting the customers' integration schedule. Can the TI team help perform an SPI test on the DRA829 Eval board?

30 days ago

Takuma Fujiwara 30 days ago

Hi Syed,

How is the McSPI interface being used? Is it one large block of data being transferred, or multiple small transactions that are sent frequently? (and if small, how large is each transaction?)

Regards,

Takuma

Luke 28 days ago in reply to Takuma Fujiwara

Hi Takuma,

For reference, we have two different use cases. In both cases, the DRA821 SPI interacts with memory on an FPGA.

Bulk memory update:
Data is being written to contiguous memory locations and the total amount of data to be written is large.
For this use case, both teams have already implemented the suggested workaround of using long data frames to reduce overhead.
Asynchronous memory update:
Small amounts of data (4 to 8 bytes) are written to different addresses. There is little scope to optimize these further.

To address the second use case, the team modified the driver so that it does not reconfigure the SPI peripheral for back-to-back transactions. With the changes in place, only a few registers (I think 2; I can check) are written between two transactions. This still leads to a 3-3.5 us delay between transactions (measured between chip selects).
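To put the 3-3.5 us gap in perspective, the C sketch below estimates bus utilization for these short writes. It assumes a hypothetical 50 MHz SPI clock (the customers' actual clock rate is not given in this thread) and treats the measured chip-select-to-chip-select gap as pure overhead added after the time spent clocking data. The function names are illustrative only.

```c
#include <assert.h>

/* Time spent clocking `bytes` bytes at `sclk_hz`, in nanoseconds. */
static double spi_wire_time_ns(unsigned bytes, double sclk_hz)
{
    assert(sclk_hz > 0.0);
    return (double)bytes * 8.0 * 1e9 / sclk_hz;
}

/* Fraction of each transaction period spent actually moving data, assuming
 * a fixed inter-transaction gap of `gap_ns` follows every transaction. */
static double spi_bus_efficiency(unsigned bytes, double sclk_hz, double gap_ns)
{
    double wire_ns = spi_wire_time_ns(bytes, sclk_hz);
    return wire_ns / (wire_ns + gap_ns);
}
```

Under these assumptions an 8-byte write clocks for 1.28 us, so a 3.5 us gap leaves the bus busy only about 27% of the time (about 15% for a 4-byte write). Raising the SPI clock shrinks only the 1.28 us part, which is why the fixed gap dominates this use case.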

Are other customers facing similar challenges? Are there any mitigation actions/workarounds we can look into?

Best,

Luke

Luke 23 days ago in reply to Luke

Hi Takuma, any update here?

Takuma Fujiwara 23 days ago in reply to Luke

Hi Luke,

Still looking into this.

At least on Linux, what I have found is that McSPI seems to enter a suspend state when not in use, so writing small messages with some delay in between causes the omap2_mcspi_setup function to take a significant amount of time, regardless of how large the message is.

Example:

root@j7200-evm:~#  /opt/spidev_test -v -D /dev/spidev1.0 -p "HELLOWORLD"
[  319.018335] DEBUG: omap_mcspi_runtime_resume takes 1815 ns
[  319.023836] DEBUG: omap2_mcspi_setup takes 5581885 ns
[  319.028909] DEBUG: omap2_mcspi_setup takes 1735 ns
[  319.033709] DEBUG: omap2_mcspi_setup takes 1515 ns
spi mode: 0x0
[  319.038856] DEBUG: omap2_mcspi_transfer_one takes 198670 ns
bits per word: 8
max speed: 500000 Hz (500 kHz)
TX | 48 45 4C 4C 4F 57 4F 52 4C 44 __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __  |HELLOWORLD|
RX | 00 00 00 00 00 00 00 00 00 00 __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __  |..........|

I think the customer is using VxWorks, but if they referenced the Linux device driver, do you happen to know whether they also implemented some sort of power management feature for McSPI?
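On Linux, one way to test whether runtime suspend is behind the slow first omap2_mcspi_setup call is to keep the controller powered through the standard runtime-PM sysfs attribute `power/control` (writing "on" disables runtime suspend for that device; "auto" restores it). The C sketch below does that. The helper names are hypothetical, and the device path is left to the caller because the sysfs node for a given McSPI instance depends on the board's device tree. This is a diagnostic experiment, not a confirmed fix.

```c
#include <assert.h>
#include <stdio.h>

/* Write `value` to a sysfs attribute. Returns 0 on success, -1 on error. */
static int sysfs_write_attr(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = (fputs(value, f) != EOF);
    /* With stdio buffering, the actual write (and any error returned by
     * the sysfs store) may only happen when the stream is flushed here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Disable runtime suspend for one device via its power/control attribute
 * ("on" = keep active, "auto" = allow runtime PM). */
static int mcspi_keep_active(const char *power_control_path)
{
    return sysfs_write_attr(power_control_path, "on");
}
```

If the first-call delay disappears with `power/control` set to "on", and the driver uses autosuspend, raising `power/autosuspend_delay_ms` would be a less blunt alternative than disabling runtime PM entirely.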

Regards,

Takuma

Vivek Thota 21 days ago in reply to Takuma Fujiwara

Hello Takuma,

I wanted to provide some more information about the scenario.

Yes, we are using VxWorks. The SPI driver has been modified and optimized to suit our application better (it accesses only the registers it needs and uses polled mode). Our VxWorks SPI driver has no suspend/resume or other power-optimization features.

Very little happens in the application and driver between two transfers:

setting up the transmit data (application)
driver object lookup (driver)
setting up the SPI channel (driver) (CHCTRL, CHCONF, CHSTAT & Tx buffer)
wait for transfer to finish (driver)

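For reference, a polled, transmit-only transaction along the lines of the steps above might look like the C sketch below. The register offsets and bit positions are taken from the Linux spi-omap2-mcspi driver (with its 0x100 register offset for OMAP4-style instances applied) and should be checked against the DRA821 TRM. `mcspi_polled_write` is a hypothetical name, and the sketch assumes the channel was configured once at init for transmit-only, single-channel mode with manual chip-select control (FORCE), so CHCONF is not reprogrammed per transaction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets/bits as in Linux spi-omap2-mcspi.c (+0x100 regs offset).
 * Verify against the DRA821 TRM before use. */
#define MCSPI_CH0CONF    0x12Cu
#define MCSPI_CH0STAT    0x130u
#define MCSPI_CH0CTRL    0x134u
#define MCSPI_TX0        0x138u
#define MCSPI_CH_STRIDE  0x14u       /* per-channel register spacing */

#define CHSTAT_TXS    (1u << 1)      /* TX register empty */
#define CHSTAT_EOT    (1u << 2)      /* end of transfer */
#define CHCTRL_EN     (1u << 0)      /* channel enable */
#define CHCONF_FORCE  (1u << 20)     /* manual CS assertion (single-channel mode) */

static volatile uint32_t *mcspi_reg(volatile uint8_t *base, unsigned ch, uint32_t off)
{
    return (volatile uint32_t *)(base + off + ch * MCSPI_CH_STRIDE);
}

static void wait_bit(volatile uint32_t *reg, uint32_t mask)
{
    while ((*reg & mask) == 0)
        ;   /* polled mode: every iteration is an interconnect read */
}

/* Hypothetical polled TX-only transaction: CS assert, n words, CS deassert. */
static void mcspi_polled_write(volatile uint8_t *base, unsigned ch,
                               const uint32_t *words, size_t n)
{
    volatile uint32_t *conf = mcspi_reg(base, ch, MCSPI_CH0CONF);
    volatile uint32_t *stat = mcspi_reg(base, ch, MCSPI_CH0STAT);
    volatile uint32_t *ctrl = mcspi_reg(base, ch, MCSPI_CH0CTRL);
    volatile uint32_t *tx   = mcspi_reg(base, ch, MCSPI_TX0);

    assert(words != NULL && n > 0);
    *conf |= CHCONF_FORCE;             /* read + write: assert CS */
    *ctrl |= CHCTRL_EN;                /* read + write: enable channel */
    for (size_t i = 0; i < n; i++) {
        wait_bit(stat, CHSTAT_TXS);    /* wait for room in TX register */
        *tx = words[i];                /* write: queue next word */
    }
    wait_bit(stat, CHSTAT_TXS);        /* last word moved to shift register */
    wait_bit(stat, CHSTAT_EOT);        /* last word fully shifted out */
    *ctrl &= ~CHCTRL_EN;               /* read + write: disable channel */
    *conf &= ~CHCONF_FORCE;            /* read + write: deassert CS */
}
```

Counting accesses in this sketch (each `|=` or `&=` is a read followed by a write, and each poll is at least one read) gives a rough per-transaction access budget to compare against the measured gap. Keeping a RAM shadow copy of CHCONF would be one possible way to turn its read-modify-writes into plain writes.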
With these optimizations, we still see 3-4 us between two chip-select assertions. I am currently timing these operations individually to find where most of the time is spent.
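A minimal way to get that breakdown is to timestamp the end of each step with POSIX `clock_gettime(CLOCK_MONOTONIC)`, as in the C sketch below. The struct and function names are hypothetical. On VxWorks the usable resolution may depend on how the clock is configured, so a free-running hardware timer may be needed to resolve sub-microsecond steps.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define MAX_STEPS 8

/* Hypothetical per-transaction profiler: t[0] is the start time and
 * t[i + 1] is the time at which step i (named name[i]) finished. */
struct step_profile {
    struct timespec t[MAX_STEPS + 1];
    const char *name[MAX_STEPS];
    unsigned n;
};

static void prof_start(struct step_profile *p)
{
    p->n = 0;
    clock_gettime(CLOCK_MONOTONIC, &p->t[0]);
}

/* Record that `step` has just finished. */
static void prof_mark(struct step_profile *p, const char *step)
{
    assert(p->n < MAX_STEPS);
    p->name[p->n] = step;
    p->n++;
    clock_gettime(CLOCK_MONOTONIC, &p->t[p->n]);
}

/* Duration of step i in nanoseconds. */
static int64_t prof_step_ns(const struct step_profile *p, unsigned i)
{
    assert(i < p->n);
    return (int64_t)(p->t[i + 1].tv_sec - p->t[i].tv_sec) * 1000000000LL
         + (p->t[i + 1].tv_nsec - p->t[i].tv_nsec);
}
```

A usage pattern would be `prof_start()` before setting up the transmit data, then `prof_mark(&p, "tx data")`, `prof_mark(&p, "lookup")`, `prof_mark(&p, "channel setup")` and `prof_mark(&p, "wait done")` after each of the four steps. The timestamp calls cost time themselves, so measuring two back-to-back marks gives an estimate of that overhead to subtract.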

Additionally, we plan to evaluate speeding up the CBASS interconnect to see whether that reduces register-access time. Do you have any comments or ideas about that?

Our observation in VxWorks matches yours: the setup time is the same or similar regardless of transfer size (for data length < buffer length).

Let me know if you need any further information about our environment/use case.

Thanks,

Vivek

Vivek Thota 15 days ago in reply to Vivek Thota

Hello Takuma, any update on this?

Takuma Fujiwara 15 days ago in reply to Vivek Thota

Hi Vivek,

I am getting our internal SDK team's help on this one. It seems this requires more in-depth debugging.

However, one finding I do have is that older SDKs had an extra device config option for McSPI that was removed in newer SDKs by this commit: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/commit/include/linux/platform_data/spi-omap2-mcspi.h?h=ti-linux-6.12.y&id=67bb37c05a6b56e0e1f804706145a52f655af3f1. It deals with a feature that toggles CS after every word is transmitted.

Although this is for Linux rather than VxWorks, maybe there is a similar feature in the VxWorks driver if the TI driver was referenced?

Regards,

Takuma

Takuma Fujiwara 14 days ago in reply to Takuma Fujiwara

Hi Vivek,

As a further update, there is a recent commit adding a "turbo" mode that decreases the interword gap: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/commit/?h=ti-linux-6.12.y-cicd&id=d445576c3b7f234acbdff168a5923f0f196dcfeb

This is again for Linux, but please check whether it can be used as a reference for porting to VxWorks and whether it improves performance.

Regards,

Takuma

Vivek Thota 3 days ago in reply to Takuma Fujiwara

Hello Takuma,

Is there any update on this?

We do not see any issue with CS toggling: CS is asserted when the transfer begins and de-asserted after the data transfer.
We discussed turbo mode internally; as we understand it, it would help sequential transfers.
As mentioned above, we have 2 different use cases:
- Bulk update: with increased data size, team is satisfied with the performance and we are not looking to optimize it further.
- Asynchronous memory update (address + 1 or 2 words of data): this is still a concern, and as we understand it, turbo mode will not help here.

Let me know if this makes sense and if you need further information from us.

Thanks,

Vivek

Takuma Fujiwara 3 days ago in reply to Vivek Thota

Hi Vivek,

No further update. Due to the Thanksgiving holiday, responses will be delayed until next week.

Regards,

Takuma

Vivek Thota 2 days ago in reply to Takuma Fujiwara

Takuma, I was able to measure the access time of MCSPI_CHCONF. By my measurements, each write takes about 35 ns. Is that in line with what TI expects?
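A rough way to reproduce a per-write figure and relate it to the gap is sketched below in C. `avg_write_ns` times a burst of 32-bit writes to one register, and `access_share` computes what fraction of the gap a given number of accesses would explain. Both names are hypothetical; on target, `reg` would point at a mapped McSPI register that is safe to rewrite with its current value. Writes to device memory may be buffered, so this measures issue cost rather than round-trip latency, and reads such as CHSTAT polling may take longer; timing a read loop the same way would show that.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Average cost of one 32-bit write to `reg` over `iters` writes.
 * The volatile pointer keeps the compiler from merging the writes. */
static double avg_write_ns(volatile uint32_t *reg, uint32_t value, unsigned iters)
{
    assert(iters > 0);
    int64_t t0 = now_ns();
    for (unsigned i = 0; i < iters; i++)
        *reg = value;
    int64_t t1 = now_ns();
    return (double)(t1 - t0) / iters;
}

/* Share of the inter-transaction gap explained by register accesses. */
static double access_share(unsigned accesses, double ns_per_access, double gap_ns)
{
    assert(gap_ns > 0.0);
    return accesses * ns_per_access / gap_ns;
}
```

With the numbers from this thread, 10 accesses at 35 ns each come to 350 ns, about 10% of a 3.5 us gap. The access count of 10 is an assumption for illustration; counting the actual accesses in the driver (including polling reads) would make the comparison concrete.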

We want to know whether this is expected and whether there is anything we can do to speed up the access.

Have a great thanksgiving!

Suman Anna 1 day ago in reply to Vivek Thota

Hi Vivek,

I will let Takuma get back to you on this after the US Thanksgiving holidays.

regards

Suman