DRA821U-Q1: DRA821 MCSPI1 transaction to transaction delay issue

Part Number: DRA821U-Q1
Other Parts Discussed in Thread: DRA821, DRA829

This is a continuation of a previous E2E post: DRA821U: DRA821 MCSPI1 transaction to transaction delay time is too long

We have two customers complaining about the same issue. They can't use DMA for this application. They have increased the data throughput to the max already.

The customers are still seeing a 3.5 us delay between consecutive SPI transactions between the Hydra board's DRA821 processor and the carrier FPGA.

We plan to test the SPI bus on the DRA829 Eval board to confirm that the limiting factor is not the VxWorks SPI driver. However, I am hoping the TI team can share any SPI test data and scope captures (if available) showing that such delays are not expected and are not due to a limitation of the DRA821 internal bus structure.

I would appreciate it if your team could provide support on this issue, as it is impacting the customers' integration schedule. Can the TI team help perform an SPI test on the DRA829 Eval board?

  • Hi Syed,

    How is the McSPI interface being used? Is it one large block of data being transferred, or multiple small transactions that are sent frequently? (and if small, how large is each transaction?)

    Regards,

    Takuma

  • Hi Takuma,

    For information, we have 2 different use cases. In both cases, the DRA821 SPI is interacting with memory on an FPGA.

    1. Bulk memory update:
      Data is being written to contiguous memory locations and the total amount of data to be written is large.
      For this use case, both teams have already implemented the suggested workaround of reducing the overhead by using long data frames.
    2. Asynchronous memory update:
      Small amounts of data being written to different addresses, 4 to 8 bytes. There is not a lot of scope to optimize these further.

    To address the second use case, the team was able to modify the driver so that it does not re-configure the SPI peripheral for back-to-back transactions. With the changes in place, only a few (I think 2, I can check) registers are written between 2 transactions. This still leads to a 3-3.5 us delay between transactions (measured between chip selects).

    Are other customers facing similar challenges? Are there any mitigation actions/workarounds we can look into?

    Best,

    Luke

  • Hi Takuma, any update here?

  • Hi Luke,

    Still looking into this.

    At least for Linux, what I have found is that McSPI seems to go into a suspend mode when not in use. So when small messages are written with some delay in between, the omap2_mcspi_setup function can take a significant amount of time regardless of how large the message to transfer is.

    Example:

    root@j7200-evm:~#  /opt/spidev_test -v -D /dev/spidev1.0 -p "HELLOWORLD"
    [  319.018335] DEBUG: omap_mcspi_runtime_resume takes 1815 ns
    [  319.023836] DEBUG: omap2_mcspi_setup takes 5581885 ns
    [  319.028909] DEBUG: omap2_mcspi_setup takes 1735 ns
    [  319.033709] DEBUG: omap2_mcspi_setup takes 1515 ns
    spi mode: 0x0[  319.038856] DEBUG: omap2_mcspi_transfer_one takes 198670 ns
    
    bits per word: 8
    max speed: 500000 Hz (500 kHz)
    TX | 48 45 4C 4C 4F 57 4F 52 4C 44 __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __  |HELLOWORLD|
    RX | 00 00 00 00 00 00 00 00 00 00 __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __  |..........|
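The timings in the capture above are easier to compare once converted to a common unit. The following is a minimal sketch that parses only the "DEBUG: ... takes N ns" lines shown in this post; the helper name `parse_durations` is hypothetical, and the log text is copied from the capture rather than read from a live system.

```python
import re

# Kernel debug lines copied from the console capture in this thread.
LOG = """\
[  319.018335] DEBUG: omap_mcspi_runtime_resume takes 1815 ns
[  319.023836] DEBUG: omap2_mcspi_setup takes 5581885 ns
[  319.028909] DEBUG: omap2_mcspi_setup takes 1735 ns
[  319.033709] DEBUG: omap2_mcspi_setup takes 1515 ns
spi mode: 0x0[  319.038856] DEBUG: omap2_mcspi_transfer_one takes 198670 ns
"""

PATTERN = re.compile(r"DEBUG: (\w+) takes (\d+) ns")


def parse_durations(text):
    """Return (function_name, duration_ns) pairs in log order (hypothetical helper)."""
    return [(name, int(ns)) for name, ns in PATTERN.findall(text)]


durations = parse_durations(LOG)
setup_ns = [ns for name, ns in durations if name == "omap2_mcspi_setup"]

for name, ns in durations:
    print(f"{name:28s} {ns / 1000:12.3f} us")

# Compare the first (slow) setup call with the slowest of the later calls.
print(f"first setup vs later setup: {setup_ns[0] / max(setup_ns[1:]):.0f}x slower")
```

Read this way, the first omap2_mcspi_setup call is in the millisecond range while the following calls are under 2 us, which is the pattern described above.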

    I think the customer is using VxWorks, but if they referenced the Linux device driver, do you happen to know whether they also implemented some sort of power management feature for McSPI?

    Regards,

    Takuma

  • Hello Takuma,

    I wanted to provide some more information about the scenario.

    Yes, we are using VxWorks. The SPI driver has been modified/optimized to suit our application better (only access the registers needed and use polled mode). We do not have suspend/resume or any power optimization features in the VxWorks SPI driver.

    There are very few things happening in the application & driver between 2 transfers:

    • setting up the transmit data (application)
    • driver object lookup (driver)
    • setting up the SPI channel (driver) (CHCTRL, CHCONF, CHSTAT & Tx buffer)
    • wait for transfer to finish (driver)
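The steps listed above can be sketched as a toy model that counts register accesses per transaction. This is not the VxWorks driver: the class `FakeMcspiChannel`, the function `polled_transfer`, the order of the writes, the placeholder register values, and the number of status polls are all assumptions made for illustration. Only the register names (CHCONF, CHCTRL, CHSTAT, Tx buffer) come from the list.

```python
class FakeMcspiChannel:
    """Toy stand-in for one McSPI channel (hypothetical, not real hardware behaviour)."""

    def __init__(self, polls_per_word=3):
        self.regs = {"CHCONF": 0, "CHCTRL": 0, "CHSTAT": 0, "TX": 0}
        self.access_log = []              # (kind, register) for every bus access
        self._polls_per_word = polls_per_word
        self._polls_until_done = 0

    def write(self, reg, value):
        self.access_log.append(("write", reg))
        self.regs[reg] = value
        if reg == "TX":
            # Assumption: the word needs a few status polls before it is done.
            self._polls_until_done = self._polls_per_word

    def read(self, reg):
        self.access_log.append(("read", reg))
        if reg == "CHSTAT":
            if self._polls_until_done > 0:
                self._polls_until_done -= 1
                return 0                  # still busy
            return 1                      # placeholder "transfer done" flag
        return self.regs[reg]


def polled_transfer(chan, chconf, word):
    """One short transaction in polled mode, following the listed steps."""
    chan.write("CHCONF", chconf)          # set up the channel
    chan.write("CHCTRL", 1)               # enable the channel (placeholder value)
    chan.write("TX", word)                # load the Tx buffer
    while chan.read("CHSTAT") == 0:       # wait for the transfer to finish
        pass


chan = FakeMcspiChannel()
polled_transfer(chan, chconf=0, word=0xA5)
writes = [reg for kind, reg in chan.access_log if kind == "write"]
reads = [reg for kind, reg in chan.access_log if kind == "read"]
print(f"writes: {len(writes)}, CHSTAT polls: {len(reads)}")
```

Counting accesses like this is one way to separate the fixed setup cost from the polling cost when timing the individual operations.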

    With these optimizations, we still see a 3-4 us delay between two chip select assertions. I am currently timing these operations and breaking them down to see where the majority of the time is being consumed.

    Additionally, we are planning to evaluate speeding up the CBASS interconnect to see whether that would help speed up register access. Do you have any comments/ideas about that?

    Our observation in VxWorks is similar to what you see: the setup time is the same or similar (for data length < buffer length).

    Let me know if you need any further information about our environment/use case.

    Thanks,

    Vivek

  • Hello Takuma, any update on this?

  • Hi Vivek,

    Getting our internal SDK team's help on this one. This requires a bit more in-depth debug it seems. 

    However, one finding I do have is that in the older SDK, it seems there was an extra device config for McSPI that has been removed in the newer SDK in this commit: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/commit/include/linux/platform_data/spi-omap2-mcspi.h?h=ti-linux-6.12.y&id=67bb37c05a6b56e0e1f804706145a52f655af3f1. This deals with a CS toggle feature after every word is transmitted.

    Although it is for Linux instead of VxWorks, maybe there is a similar feature on VxWorks if the TI driver was referenced?

    Regards,

    Takuma

  • Hi Vivek,

    To update more, there was a recent commit for "turbo" mode that decreases interword gap: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/commit/?h=ti-linux-6.12.y-cicd&id=d445576c3b7f234acbdff168a5923f0f196dcfeb

    This is again on Linux, but see if this can be referenced for porting over to VxWorks and whether the performance improvement can be seen.

    Regards,

    Takuma

  • Hello Takuma,

    Is there any update on this?

    • We do not see any issue with CS toggle, it is asserted when the transfer begins and de-asserted after data transfer.
    • We discussed turbo mode internally; as per our understanding, it would be helpful for sequential transfers.
      As mentioned above, we have 2 different use cases:
      • Bulk update: with increased data size, team is satisfied with the performance and we are not looking to optimize it further.
      • Asynchronous memory update: (address + 1 or 2 words of data) this is still a concern and based on our understanding turbo mode will not be of help.

    Let me know if this makes sense and if you need further information from us.

    Thanks,

    Vivek

  • Hi Vivek,

    No further update. Due to Thanksgiving holiday, responses will be delayed until next week.

    Regards,

    Takuma

  • Takuma, I was able to measure the access time of MCSPI_CHCONF. As per my observations, each write takes about 35 ns. Is that in line with what TI expects?

    We want to see if this is as expected and if there is something we can do to speed up the access.
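For context, the measured write time can be set against the reported gap between chip selects. The following is a rough back-of-the-envelope sketch, assuming every register access costs the same 35 ns as the measured MCSPI_CHCONF write (reads may well cost more) and using the 3 to 3.5 us gap reported earlier in this thread. The access counts tried are illustrative, not measured, and `register_share` is a hypothetical helper.

```python
WRITE_NS = 35            # measured MCSPI_CHCONF write time from this thread
GAP_NS = (3000, 3500)    # reported chip-select to chip-select gap, 3 to 3.5 us


def register_share(num_accesses, gap_ns, access_ns=WRITE_NS):
    """Fraction of the gap spent on register accesses (hypothetical helper)."""
    return num_accesses * access_ns / gap_ns


# Illustrative access counts; the real number per transaction is not confirmed here.
for n in (2, 4, 10):
    low = register_share(n, GAP_NS[1])
    high = register_share(n, GAP_NS[0])
    print(f"{n:2d} accesses: {n * WRITE_NS:4d} ns, {low:.1%} to {high:.1%} of the gap")
```

Under these assumptions, two writes account for only about 2% of the gap, which suggests looking beyond raw register write latency for the remaining time.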

    Have a great Thanksgiving!

  • Hi Vivek,

    I will let Takuma come back on this after the US Thanksgiving Holidays. 

    regards

    Suman