PROCESSOR-SDK-AM437X: SPI transfer delays

Part Number: PROCESSOR-SDK-AM437X

Hi!

I am struggling to get a decent speed from an SPI transfer. I need to transfer about 2k in smaller chunks using a 24 MHz SPI clock as master.

Within a chunk (maybe 32 to 128 bytes), the CS must stay active the whole time. Before the first byte and after the last byte there should be about 1 to 4 clock periods to the CS edge. I expect the bytes within a chunk to be transferred back-to-back.

I am using TI-RTOS and the pdk_am437x_1_0_13 (I haven't found any difference to the latest pdk in SPI code).

I am using DMA. In the cfg file I added Spi.Settings.useDma = "true";

like this:

/* Load the spi package */
var socType          = "am437x";
var Spi              = xdc.loadPackage('ti.drv.spi');
Spi.Settings.socType = socType;
Spi.Settings.useDma = "true";

I actually need to use 2 SPI units (not channels), but that does not matter here. My init code looks like this:

static EDMA3_RM_Handle edma_handle;

void spi_preinit(void)
{
    EDMA3_DRV_Result edmaResult;
    SPI_v1_HWAttrs spi_cfg;
    uint32_t i;
    edma_handle = (EDMA3_RM_Handle)edma3init(0, &edmaResult);
    if (edmaResult != EDMA3_DRV_SOK) {
        UART_printf("EDMA init failed!\r\n");
    }
    for(i=0;i<2;i++) {
        SPI_socGetInitCfg(i, &spi_cfg);     /* Get the default SPI init configuration */
        spi_cfg.chnCfg[0].csPolarity=MCSPI_CS_POL_LOW;
        spi_cfg.chnCfg[0].dataLineCommMode = MCSPI_DATA_LINE_COMM_MODE_6;
        spi_cfg.chnCfg[0].tcs = MCSPI_CH0CONF_TCS0_ZEROCYCLEDLY;
        spi_cfg.chnCfg[0].trMode=MCSPI_TX_RX_MODE;
        spi_cfg.edmaHandle = edma_handle;
        spi_cfg.dmaMode = true;
        spi_cfg.chNum = MCSPI_CHANNEL_0;
        spi_cfg.chMode = MCSPI_SINGLE_CH;
        spi_cfg.rxTrigLvl = 8;
        spi_cfg.txTrigLvl = 8;
        SPI_socSetInitCfg(i, &spi_cfg);     /* Set the SPI init configuration */
    }
}

The actual transfer is started like this:

    SPI_Transaction transaction;         /* SPI transaction */
    transaction.count = len;
    transaction.txBuf = &spi_txbuf[0];
    transaction.rxBuf = &spi_rxbuf[0];
    SPI_transfer(spi_handle[num], &transaction);

What I see is that every byte transferred is followed by an idle period of the same length in which nothing is transferred. That alone cuts the transfer speed in half, regardless of the transfer length.

Then, after CS goes low and before the first byte is transferred, there is a huge delay of about 1.5us.

And after the last byte is transferred, before CS goes high, there is a gigantic delay of about 4us.
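
To put numbers on these observations, here is a rough sketch of the arithmetic, not a measurement. It assumes 8-bit words and takes the inter-byte gap and the two CS delays as parameters (the values mentioned are the scope readings quoted above). The function names are made up for illustration and are not part of any TI API.

```c
#include <assert.h>
#include <stdint.h>

/* Time on the wire for one 8-bit word at the given SPI clock, in ns. */
static double spi_byte_time_ns(double sclk_hz)
{
    assert(sclk_hz > 0.0);
    return 8.0 * 1e9 / sclk_hz;
}

/* Estimated duration of one CS-framed chunk, in ns.
 * gap_ns:      idle time between two consecutive bytes
 * cs_lead_ns:  delay from CS going active to the first clock edge
 * cs_trail_ns: delay from the last clock edge to CS going inactive */
static double spi_chunk_time_ns(uint32_t nbytes, double sclk_hz,
                                double gap_ns, double cs_lead_ns,
                                double cs_trail_ns)
{
    /* There is one gap fewer than there are bytes. */
    double gaps = (nbytes > 1u) ? (double)(nbytes - 1u) * gap_ns : 0.0;

    return cs_lead_ns + (double)nbytes * spi_byte_time_ns(sclk_hz)
           + gaps + cs_trail_ns;
}

/* Fraction of the chunk time during which data is actually clocked. */
static double spi_chunk_efficiency(uint32_t nbytes, double sclk_hz,
                                   double gap_ns, double cs_lead_ns,
                                   double cs_trail_ns)
{
    assert(nbytes > 0u);
    return (double)nbytes * spi_byte_time_ns(sclk_hz)
           / spi_chunk_time_ns(nbytes, sclk_hz, gap_ns,
                               cs_lead_ns, cs_trail_ns);
}
```

For a 64-byte chunk at 24 MHz with a gap equal to one byte time, a 1.5us lead and a 4us trail, `spi_chunk_efficiency(64, 24e6, spi_byte_time_ns(24e6), 1500.0, 4000.0)` comes out a little under 0.45, so less than half of the chunk time carries data.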

What am I doing wrong or is this already the best this processor can do, despite using DMA?

I already tried a couple of things. For example, using MCSPI_MULTI_CH decreases the CS delay, but then the CS goes away in between the bytes, and it does not reduce the gap between them either. Without DMA it is much worse, something like what I found in an older post where some poor guy is having a similar problem. Although that topic is marked as "solved", there is no help or solution there.

https://e2e.ti.com/support/processors/f/791/p/763016/2821395

Best regards,

 Manuel Köppen

In the scope picture, the upper plot is CS, the lower is CLK.

  • Manuel,

    Can you also indicate what your SPI_Params look like? Are you using blocking mode or callback mode? I am wondering if you have tried setting the trigger level to 1. Your use case of a 24 MHz SPI clock with continuous transfer is what the main_mcspi_dma_serial_flash_read_write example aims to demonstrate, using a single channel connected to the SPI flash.

    pdk_am57xx_1_0_xx\packages\ti\drv\spi\test\mcspi_serial_flash\src\main_mcspi_dma_serial_flash_read_write

    This allows the driver to read and write a sector (256 bytes) of the flash.

    The issue reported in the other E2E thread is unrelated to the SPI driver: the AM437x McSPI bootloader in StarterWare does not use DMA-based SPI reads and is a completely polling-based implementation. That issue was resolved offline by leveraging a DMA implementation.

    Regards,

    Rahul

  • The example wasn't helpful. I guess the flash is also much slower than it should be.

    When I tried to recreate the picture in my first post I found out that the actual picture was shot with MCSPI_CH0CONF_TCS0_THREECYCLEDLY and not MCSPI_CH0CONF_TCS0_ZEROCYCLEDLY setting (tried a lot of combinations back then). I thought this setting only affects the delay between CS and data but it also adds idle time between the bytes! When I use ZEROCYCLEDLY the space between the bytes is halved (not great but an improvement).

    I wasn't able to improve the CS being held for so long after the transfer. Setting trigger levels to 1 has no effect.

    BTW: I am using blocking mode (=default):

        SPI_Params_init(&spiParams);
        spiParams.frameFormat  = SPI_POL1_PHA0;
        spiParams.bitRate = 24000000;
        spiParams.dataSize = 8; /* wordlen */
        ascspi_handle[0] = SPI_open(0, &spiParams);

    From studying the TI code, I think the long delay is caused by the CS being set inactive in software. That seems to be what the hardware requires for multi-word transfers with CS held (ouch). The DMA completion interrupt also does a lot of things before finally deasserting the CS.

    So I decided that the driver is unusable for now and tried to use the SPI unit directly.

    After some playing around I got it working. A few quirks I ran into: who would have thought that the FORCE bit is what really starts the transfer? You cannot send a byte in single mode without it, and it also clears the FIFO, so you cannot pre-load any data into the FIFO before CS. Another quirk is the RX FIFO empty flag, which gets active when only 1 byte is left in the FIFO, so you have to check the "Receiver Register FULL" flag as well. I'm also using the turbo mode, which cannot be reached from the SPI_xxx API.
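
The two quirks described above can be captured in a few small helpers. This is only a sketch of the logic, not the code actually used in this post. The bit positions (RXS at bit 0 and RXFFE at bit 5 of the channel status register, FORCE at bit 20 of the channel configuration register) are assumptions taken from my reading of the McSPI register description and should be checked against the TRM. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; verify against the TRM for your device. */
#define CHSTAT_RXS    (1u << 0)   /* receiver register full */
#define CHSTAT_RXFFE  (1u << 5)   /* RX FIFO empty          */
#define CHCONF_FORCE  (1u << 20)  /* manual CS assertion    */

/* True if at least one received word can be read.
 * Per the observation above, the FIFO-empty flag can already be set
 * while one byte is still pending, so the receiver-register-full flag
 * is checked as well. */
static inline bool rx_word_available(uint32_t chstat)
{
    return ((chstat & CHSTAT_RXFFE) == 0u) || ((chstat & CHSTAT_RXS) != 0u);
}

/* Channel configuration value with CS forced active. */
static inline uint32_t chconf_cs_assert(uint32_t chconf)
{
    return chconf | CHCONF_FORCE;
}

/* Channel configuration value with CS released again. */
static inline uint32_t chconf_cs_release(uint32_t chconf)
{
    return chconf & ~CHCONF_FORCE;
}
```

Because setting FORCE reportedly clears the FIFO, a transfer loop built on these helpers would write the first TX word only after the `chconf_cs_assert()` value has been written to the register.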

    Now without TI code and without DMA it works much better than the TI code with DMA :)

    I also had to replace the McSPIxxxx function calls with direct writes to the registers to get some decent speed. These function calls eat up a lot of CPU power (3 calls per byte transmitted!!! McSPITransmitData, McSPIReceiveData, McSPIChannelStatusGet). A macro-based solution would have been much more efficient.
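
A macro-based register access of the kind mentioned above could look roughly like the following. This is a minimal sketch under stated assumptions: the per-channel offsets (0x130 for status, 0x138 for TX, 0x13C for RX, with a stride of 0x14 per channel) are what I believe the McSPI register map uses and must be verified against the TRM, and the `SPIREG_` names are invented for this example.

```c
#include <stdint.h>

/* Access a 32-bit memory-mapped register at an absolute address. */
#define REG32(addr)  (*(volatile uint32_t *)(uintptr_t)(addr))

/* Assumed per-channel register offsets; verify against the TRM.
 * 'base' is the module base address as uintptr_t, 'ch' the channel. */
#define SPIREG_CHSTAT(base, ch)  REG32((base) + 0x130u + 0x14u * (ch))
#define SPIREG_TX(base, ch)      REG32((base) + 0x138u + 0x14u * (ch))
#define SPIREG_RX(base, ch)      REG32((base) + 0x13Cu + 0x14u * (ch))
```

Each macro expands to a single volatile load or store, so `SPIREG_TX(base, 0) = byte;` and `byte = SPIREG_RX(base, 0);` avoid the call overhead of one function per register access.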

    The picture shows the result of my code running. If anybody knows how to do this using the TI API I would gladly use the API but I don't have much hope.

    Best regards,

     Manuel