TM4C129ENCPDT: uDMA with SSI sometimes delayed by 32 bits (4 bytes) on receive

Brian Willoughby18

Part Number: TM4C129ENCPDT

My application is communicating with a SPI Flash Memory chip. I have code working flawlessly when directly accessing the SSI peripheral in a tight, optimized loop. I have attempted to improve performance by changing the data transfer from direct SSI access to uDMA. I've kept the Flash Memory command byte and address bytes (usually 4-Byte address) in direct SSI access, so that only the longer Flash data transfer is handled by uDMA.

I have the uDMA Channel Control set to UDMA_SIZE_8 | UDMA_ARB_4 for efficiency. Channel Transfer is UDMA_MODE_PINGPONG. Channel Attribute has UDMA_ATTR_USEBURST set, and I believe that I am careful to wait for the SSI Busy flag to be de-asserted and then to program the uDMA size only to a multiple of 4 bytes.

Reading from SPI Flash works flawlessly right after a System Reset, no matter how many times I repeat the command. Since these commands involve alternating between direct SSI access for command and address versus uDMA for data, it seems that my code is written correctly. However, if I write to the SPI Flash (using direct access only, no uDMA), the next uDMA read from SPI Flash suffers from the issue that there is an extra word received at the start, followed by the expected data, one word too late. Looking at the actual pins with a logic analyzer, it's apparent that the SPI Flash chip is sending the right data at the right time, but the uDMA is delivering an extra word. Once the system gets into this state, repeating the uDMA read always has that extra word at the start of each new transfer.

Other than looking for general help, based on the above issues, I'm also wondering whether there is some way to flush or reset the uDMA peripheral that I'm missing. Since there is an extra word coming from uDMA, I initially suspected that my write code left unread data in the SSI FIFO, but that doesn't seem possible since I empty the FIFO.

On a related note: Is it sufficient to loop until the SSI peripheral is not Busy before starting uDMA? Or, would I need to also check that the FIFO is Empty?

over 6 years ago

0 Ralph Jacobi over 6 years ago

TI__Guru*** 135355 points

Hello Brian,

I spent time looking into this, but I have not been able to uncover anything that would explain this behavior. I'm thinking at this point the most useful thing I can do is review relevant sections of your code to see how you are:

Configuring SPI/uDMA

Writing to the SPI Flash

What kind of reset operations are being done to try and resolve the bad state when it occurs.

Another question I would have is when the extra word shows up, is it completely nonsensical data, or is there any indication that it was related to the previous transaction or previous data loaded into the uDMA?

0 Ralph Jacobi over 6 years ago

TI__Guru*** 135355 points

Hello Brian,

Also, what clock source are you using? Want to make sure Errata item SSI#04 isn't applicable to this.

0 Brian Willoughby18 over 6 years ago in reply to Ralph Jacobi

Intellectual 935 points

Ralph Jacobi said:
I'm thinking at this point the most useful thing I can do is review relevant sections of your code to see how you are:

Configuring SPI/uDMA

SSI:

    MAP_SysCtlPeripheralEnable(SYSCTL_PERIPH_SSI0);
    MAP_SysCtlPeripheralEnable(SYSCTL_PERIPH_GPIOA);
    MAP_GPIOPinConfigure(GPIO_PA2_SSI0CLK);
    // software control of FSS requires skipping GPIO_PA3_SSI0FSS
    // for hardware control of FSS, un-comment this line
    //  GPIOPinConfigure(GPIO_PA3_SSI0FSS);
    MAP_GPIOPinConfigure(GPIO_PA4_SSI0XDAT0);
    MAP_GPIOPinConfigure(GPIO_PA5_SSI0XDAT1);
    // GPIO pin-type calls take the GPIO port base, not the SSI base
    MAP_GPIOPinTypeSSI(GPIO_PORTA_BASE, GPIO_PIN_2 | GPIO_PIN_4 | GPIO_PIN_5);
    MAP_GPIOPinTypeGPIOOutput(GPIO_PORTA_BASE, GPIO_PIN_3);
    HWREG(SSI0_BASE + SSI_O_CR1) = SSI_CR1_MS | SSI_CR1_SSE | SSI_ADV_MODE_READ_WRITE;
    HWREG(SSI0_BASE + SSI_O_CPSR) = 2;
    HWREG(SSI0_BASE + SSI_O_CR0) = 0x0307;

uDMA:

    MAP_SysCtlPeripheralEnable(SYSCTL_PERIPH_UDMA);
    MAP_SysCtlPeripheralSleepEnable(SYSCTL_PERIPH_UDMA);
    MAP_uDMAEnable();
    MAP_uDMAControlBaseSet(udma_control_table);
// begin transaction:
    ROM_uDMAChannelAttributeDisable(UDMA_CHANNEL_SSI0RX, UDMA_ATTR_ALTSELECT | UDMA_ATTR_HIGH_PRIORITY | UDMA_ATTR_REQMASK);
    ROM_uDMAChannelAttributeEnable(UDMA_CHANNEL_SSI0RX, UDMA_ATTR_USEBURST);
    ROM_uDMAChannelControlSet(UDMA_CHANNEL_SSI0RX | UDMA_PRI_SELECT, UDMA_SIZE_8 | UDMA_SRC_INC_NONE | UDMA_DST_INC_8 | UDMA_ARB_4);
    ROM_uDMAChannelTransferSet(UDMA_CHANNEL_SSI0RX | UDMA_PRI_SELECT, UDMA_MODE_PINGPONG, SSI0_BASE + SSI_O_DR, dstPri, size);
    ROM_uDMAChannelControlSet(UDMA_CHANNEL_SSI0RX | UDMA_ALT_SELECT, UDMA_SIZE_8 | UDMA_SRC_INC_NONE | UDMA_DST_INC_8 | UDMA_ARB_4);
    ROM_uDMAChannelTransferSet(UDMA_CHANNEL_SSI0RX | UDMA_ALT_SELECT, UDMA_MODE_PINGPONG, SSI0_BASE + SSI_O_DR, dstAlt, size);
    ROM_uDMAChannelEnable(UDMA_CHANNEL_SSI0RX);
    ROM_uDMAChannelAttributeDisable(UDMA_CHANNEL_SSI0TX, UDMA_ATTR_ALTSELECT | UDMA_ATTR_HIGH_PRIORITY | UDMA_ATTR_REQMASK);
    ROM_uDMAChannelAttributeEnable(UDMA_CHANNEL_SSI0TX, UDMA_ATTR_USEBURST);
    ROM_uDMAChannelControlSet(UDMA_CHANNEL_SSI0TX | UDMA_PRI_SELECT, UDMA_SIZE_8 | UDMA_SRC_INC_8 | UDMA_DST_INC_NONE | UDMA_ARB_4);
    ROM_uDMAChannelTransferSet(UDMA_CHANNEL_SSI0TX | UDMA_PRI_SELECT, UDMA_MODE_PINGPONG, srcPri, SSI0_BASE + SSI_O_DR, size);
    ROM_uDMAChannelControlSet(UDMA_CHANNEL_SSI0TX | UDMA_ALT_SELECT, UDMA_SIZE_8 | UDMA_SRC_INC_8 | UDMA_DST_INC_NONE | UDMA_ARB_4);
    ROM_uDMAChannelTransferSet(UDMA_CHANNEL_SSI0TX | UDMA_ALT_SELECT, UDMA_MODE_PINGPONG, srcAlt, SSI0_BASE + SSI_O_DR, size);
    ROM_uDMAChannelEnable(UDMA_CHANNEL_SSI0TX);
    ROM_SSIIntEnable(SSI0_BASE, SSI_DMATX | SSI_DMARX);
    ROM_IntEnable(INT_SSI0);

Writing to the SPI Flash

    MAP_SSIAdvModeSet(SSI0_BASE, SSI_ADV_MODE_READ_WRITE);
    _select(nPort);
    // send command byte in standard SPI mode (actually Advanced R/W)
    MAP_SSIDataPut(SSI0_BASE, COMMAND_2READ);
    MAP_SSIDataGet(SSI0_BASE, &ui32);
    MAP_SSIDataPut(SSI0_BASE, address >> 24);
    MAP_SSIDataGet(SSI0_BASE, &ui32);
    MAP_SSIDataPut(SSI0_BASE, address >> 16);
    MAP_SSIDataPut(SSI0_BASE, address >> 8);
    MAP_SSIDataPut(SSI0_BASE, address);
    // 8-bit data dummy cycle on SIO1 & SIO0 a.k.a. 4 ADD Cycle
    MAP_SSIDataPut(SSI0_BASE, 0x00);
    // read back the remaining four received frames
    for (i = 0; i < 4; i++)
        MAP_SSIDataGet(SSI0_BASE, &ui32);
    while (MAP_SSIBusy(SSI0_BASE))
        ;
    MAP_SSIAdvModeSet(SSI0_BASE, SSI_ADV_MODE_BI_READ);
    while (HWREG(SSI0_BASE + SSI_O_SR) & SSI_SR_BSY)
        ;
    // ensure that the read FIFO is empty
    while (HWREG(SSI0_BASE + SSI_O_SR) & SSI_SR_RNE)
        data = HWREG(SSI0_BASE + SSI_O_DR);
    // see uDMA "begin transaction" above

What kind of reset operations are being done to try and resolve the bad state when it occurs.

So far, only the System Reset menu item in CCS under the Reset icon seems to work. I have looked for some other method to reset the uDMA, but haven't found it yet. All manner of waiting for SSI Busy to de-assert and emptying the Rx FIFO seem to have no effect on uDMA. I thought of resetting the uDMA peripheral as a whole before each transaction, but it might be in use by some other peripheral, so I did not even try that heavy hammer.

Another question I would have is when the extra word shows up, is it completely nonsensical data, or is there any indication that it was related to the previous transaction or previous data loaded into the uDMA?

There is a pattern, in that when the issue first appears, the extra word is one value. After that, the extra word has a different value on the second transaction and then stays the same for every subsequent repetition. This seems to indicate that the SSI or SSI Rx FIFO has some stale data from a previous transaction, but I have code that clears the Rx FIFO before starting uDMA, so I don't know what would be leaving the stale data.

Also note that before the bug is triggered, uDMA transactions to READ the SPI Flash can be repeated multiple times without any stale data. i.e. the first word is correct, as are all subsequent words. After the bug is triggered, though, the extra word is always present (until System Reset), no matter how many PIO versus DMA accesses are interleaved. Also, note that PIO access to the SPI Flash (without uDMA) is always successful, even when interleaved with uDMA transactions after the bug has been triggered. Nothing seems to clear the issue once it appears, not even successful PIO transactions on the same SSI port!

NOTE: I have concatenated many subroutines into a linear code flow above, so there might be slight errors. The redundant Busy wait that you see is there because it appears in one function before calling another subroutine that also has a wait.

0 Brian Willoughby18 over 6 years ago in reply to Ralph Jacobi

Intellectual 935 points

Not using the ALTCLK. This code is using the System clock because we need rates like 60 MHz, or at least 15 MHz. I don't think ALTCLK can go above 16 MHz.
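
For reference, the bit rates mentioned here follow from the SSI clock formula SSIClk = SysClk / (CPSDVSR * (1 + SCR)), where CPSDVSR is an even value from 2 to 254 and SCR ranges from 0 to 255. The following is a minimal sketch of a divisor search. The helper name `ssi_find_divisors` and the 120 MHz system clock used in the usage note are assumptions for illustration; the thread does not state the actual system clock frequency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Divisors for SSIClk = sysclk / (cpsdvsr * (1 + scr)). */
typedef struct {
    uint32_t cpsdvsr;   /* even prescaler, 2..254 (SSICPSR)         */
    uint32_t scr;       /* serial clock rate field, 0..255 (SSICR0) */
    uint32_t actual_hz; /* bit rate these divisors produce          */
} ssi_divisors_t;

/* Hypothetical helper: find the divisors that give the fastest bit rate
 * not exceeding target_hz. Returns false if no combination qualifies. */
static bool ssi_find_divisors(uint32_t sysclk_hz, uint32_t target_hz,
                              ssi_divisors_t *out)
{
    assert(out != NULL);
    bool found = false;

    for (uint32_t cps = 2; cps <= 254; cps += 2) {
        for (uint32_t scr = 0; scr <= 255; scr++) {
            uint32_t hz = sysclk_hz / (cps * (scr + 1));
            if (hz == 0 || hz > target_hz)
                continue;               /* too slow to represent, or too fast */
            if (!found || hz > out->actual_hz) {
                out->cpsdvsr = cps;
                out->scr = scr;
                out->actual_hz = hz;
                found = true;
            }
        }
    }
    return found;
}
```

Under the assumed 120 MHz system clock, a 15 MHz target yields CPSDVSR = 2 and SCR = 3, which agrees with the SSICPSR and SSICR0 values written in the configuration code posted earlier, and a 60 MHz target yields CPSDVSR = 2 and SCR = 0.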

0 Ralph Jacobi over 6 years ago in reply to Brian Willoughby18

TI__Guru*** 135355 points

Hello Brian,

Looking through your SSI configuration, I think you may be missing a few parameters, as you are not using the recommended TivaWare APIs. In particular, I am not seeing the settings to ensure you have the right Mode and data size set. Note that for Advanced mode, you must use Motorola Mode 0 per:

//! When using an advanced mode of operation, the SSI module must have been
//! configured for eight data bits and the \b SSI_FRF_MOTO_MODE_0 protocol.
//! The advanced mode operation that is selected applies only to data newly
//! written into the FIFO; the data that is already present in the FIFO is
//! handled using the advanced mode of operation in effect when that data was
//! written.

Beyond that, the only other lead I have come across is an erratum that doesn't sound like it matches your case, but you may want to review it and compare it to the behavior of your system: SSI#05 from www.ti.com/.../spmz850g.pdf

0 Brian Willoughby18 over 6 years ago in reply to Ralph Jacobi

Intellectual 935 points

Ralph Jacobi said:
Looking through your SSI configuration, I think you may be missing a few parameters, as you are not using the recommended TivaWare APIs. In particular, I am not seeing the settings to ensure you have the right Mode and data size set. Note that for Advanced mode, you must use Motorola Mode 0

Thanks for reviewing, Ralph. Note that the final line of the SSI configuration is:

HWREG(SSI0_BASE + SSI_O_CR0) = 0x0307;

... which should force FRF=0x0 (Freescale SPI) and DSS=7 (8-bit data) in all cases.

Is there a prerequisite other than this?

Also note that writing to the SPI Flash is preceded by SSI configuration. So, SSIAdvModeSet(SSI_BASE, SSI_ADV_MODE_BI_READ) is called after the peripheral is already in Motorola Mode 0.
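
The field breakdown of 0x0307 described above can be checked with a small decoder. This is a sketch based on the SSICR0 layout for this device family (DSS in bits 3:0, FRF in bits 5:4, SPO in bit 6, SPH in bit 7, SCR in bits 15:8); the struct and function names are hypothetical and are not part of TivaWare.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded view of the SSICR0 register. */
typedef struct {
    uint32_t dss; /* bits 3:0, data size select; 0x7 means 8-bit frames   */
    uint32_t frf; /* bits 5:4, frame format; 0x0 is Freescale SPI         */
    uint32_t spo; /* bit 6, serial clock polarity                         */
    uint32_t sph; /* bit 7, serial clock phase                            */
    uint32_t scr; /* bits 15:8, serial clock rate divisor field           */
} ssi_cr0_fields_t;

/* Hypothetical helper: split a raw SSICR0 value into its fields. */
static ssi_cr0_fields_t ssi_decode_cr0(uint32_t cr0)
{
    assert((cr0 >> 16) == 0); /* the upper 16 bits of SSICR0 are reserved */

    ssi_cr0_fields_t f;
    f.dss = cr0 & 0xFu;
    f.frf = (cr0 >> 4) & 0x3u;
    f.spo = (cr0 >> 6) & 0x1u;
    f.sph = (cr0 >> 7) & 0x1u;
    f.scr = (cr0 >> 8) & 0xFFu;
    return f;
}
```

Decoding 0x0307 this way gives DSS = 7, FRF = 0, SPO = 0, SPH = 0 and SCR = 3, that is, 8-bit frames in Freescale SPI format with clock polarity and phase both 0, which is the Motorola Mode 0 prerequisite quoted above.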

0 Brian Willoughby18 over 6 years ago in reply to Ralph Jacobi

Intellectual 935 points

Ralph Jacobi said:
the only other lead I have come across is an erratum that doesn't sound like it matches your case, but you may want to review it and compare it to the behavior of your system: SSI#05 from www.ti.com/.../spmz850g.pdf

You are correct - I do not think that SSI#05 applies. This issue appears whether I use Bi-Mode SSI or regular 1-bit Advanced Mode.

New Information:

I hadn't run the 1-bit READ tests in a while, so I coded up a test case and found some new behavior. When using pure 1-bit SPI Flash Memory commands (and data), this issue appears in the middle of a long chain of reads, without any other (unrelated) commands.

Background:

When running my Bi-Mode tests, whose configuration code is shown above, about 64 MB of data is read flawlessly (after a System Reset). Since the TM4C129E SRAM is only 256 KB, the uDMA is writing to only 16 KB (0x004000) per command, then looping to continue at the next address. This test performs over 4,000 repetitions (non-DMA command and address phase followed by uDMA data phase) without error. I have repeated the command as many as five times in a row, and it always succeeds.

The new 1-bit READ test is basically the same, reading about 64 MB of data from SPI Flash in commands of 16 KB chunk sizes. The first run showed the extra word at the beginning of chunk number 1,411. A subsequent run showed the extra word at the beginning of chunk number 289. The spurious word was 0xBFBFBFBF the first time and 0x3F3F3F3F the second time.

I'm not sure what this tells us, but now I know that it's not related to the non-DMA ERASE / PROGRAM commands that were triggering the Bi-Mode errors. The 1-bit mode read will trigger this problem in the middle of the loop. Thus, the data written on SSI is almost exactly the same for each iteration of the loop, except that the SPI Memory address increments by 16 KB for each repetition of the loop. I don't see how that could trigger the bug since the same addresses are accessed in Bi-Mode.
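
When running long read loops like this, it can help to detect automatically the chunk at which the misalignment first appears. The following is a minimal sketch of such a check, assuming a reference copy of the expected chunk is available for comparison; the function name `find_rx_lag` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the number of bytes by which `received`
 * lags `expected` (0 = aligned, 4 = one extra 32-bit word at the start),
 * or -1 if no lag in the range 0..max_lag makes the buffers line up. */
static int find_rx_lag(const uint8_t *expected, const uint8_t *received,
                       size_t len, size_t max_lag)
{
    assert(expected != NULL && received != NULL);

    for (size_t lag = 0; lag <= max_lag && lag < len; lag++) {
        /* Compare the received data after `lag` bytes against the
         * start of the expected data. */
        if (memcmp(received + lag, expected, len - lag) == 0)
            return (int)lag;
    }
    return -1;
}
```

A result of 4 on a chunk corresponds to the symptom described in this thread. Note that the check is only meaningful when the expected data is not a constant fill pattern, since a constant buffer matches at every lag.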

0 Ralph Jacobi over 6 years ago in reply to Brian Willoughby18

TI__Guru*** 135355 points

Hi Brian,

No, there aren't other steps that are needed. As I only joined the team after we moved to supporting TivaWare only, I missed that the line in question made those settings while cross-checking our API against your code.

I am going to ask around to see if anyone else has ideas on what could be causing this issue, as I am not finding any lead on my end right now that would suggest further debugging steps.

0 Brian Willoughby18 over 6 years ago in reply to Ralph Jacobi

Intellectual 935 points

Thank you for continuing to look into this.

Some additional information:

I have set the FIFO to deal in 32-bit chunks (burst), and have written the firmware carefully to ensure that uDMA transfer sizes are always a multiple of 4 bytes. Any non-uDMA transfers with the SSI peripheral are cleared by waiting on the busy bit, and ensuring that the receive FIFO is empty before turning on uDMA. This should guarantee that all uDMA transfers involve a whole number of 32-bit words.

That said, when I turned off BURST, my code stopped working. I have not found an explanation for that, so I turned it back on, both to get things working and for performance reasons. I'm still wary that the 32-bit anomaly is somehow related to burst mode.

On that note, I'm not sure what the expected behavior would be in burst mode when the data size is not a multiple of 4 bytes. Would an extra word be delivered? ... or would the odd data simply persist in the FIFO until additional data is added? i.e. I don't believe that the symptoms I'm seeing could result from misuse of burst mode, but I wanted to mention the possibility.
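
The size constraints described in this post can be enforced in one place before each channel is armed. The sketch below assumes the uDMA limit of 1024 items per control structure and an arbitration size of 4 items (UDMA_ARB_4); the helper names are hypothetical. It does not answer the open question of what the hardware does with a remainder that is not a multiple of 4; it only rejects such sizes up front.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Maximum number of items one uDMA control structure can transfer. */
#define UDMA_MAX_ITEMS 1024u

/* Hypothetical helper: true if `items` is a legal size for one
 * burst-mode transfer with the given arbitration size. */
static bool udma_burst_size_ok(size_t items, size_t arb_items)
{
    assert(arb_items != 0);
    return items != 0 &&
           items <= UDMA_MAX_ITEMS &&
           (items % arb_items) == 0;
}

/* Hypothetical helper: number of primary/alternate reloads needed to
 * move `total_items` in segments of `seg_items` each. */
static size_t udma_segments_needed(size_t total_items, size_t seg_items)
{
    assert(seg_items != 0);
    return (total_items + seg_items - 1) / seg_items;
}
```

For example, a 16 KB read with 8-bit items split into 1024-item segments needs 16 segments, each of which passes the check with an arbitration size of 4.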

0 Ralph Jacobi over 6 years ago in reply to Brian Willoughby18

TI__Guru*** 135355 points

Hi Brian,

I need to discuss this DMA issue, and the other one you posted, further with a colleague, but won't be able to until Wednesday of next week.

0 Brian Willoughby18 over 6 years ago in reply to Ralph Jacobi

Intellectual 935 points

Thank you for the status update.

FYI: The issues have been reproduced with multiple chip vendors - Macronix and Micron - as well as different sizes (128 Mb and 1 Gb).

0 Bob Crosby over 6 years ago in reply to Brian Willoughby18

TI__Guru 72500 points

Hi Brian,
I suggest that you put the RX FIFO flush at the beginning of your routine to write the flash instead of at the end. Here is why: In the routine to write the flash you follow each SSIDataPut() with an SSIDataGet(). Six puts and six gets. The RX FIFO should be empty at the end if it started empty. Assume there were 4 bytes in the RX FIFO at the start of the routine to write the flash. You would end up with 4 bytes that would need to be flushed at the end. However, because of the TX FIFO and the delay between starting a transmit and finishing a receive, your test for receive empty may happen before the last four bytes finish transmitting.
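
The timing hazard described above can be illustrated with a small software model. This is only a sketch: it models the SSI as a pair of counters, with one frame moving from the TX FIFO to the RX FIFO per frame time, and all names are hypothetical. It shows why a flush that only tests "receive FIFO not empty" can return while frames are still in flight; it does not claim that this is the cause of the behavior reported in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FIFO_DEPTH 8u /* SSI FIFOs on this device are 8 entries deep */

/* Simplified SSI model: counts only, no data values. */
typedef struct {
    unsigned tx_pending; /* frames written but not yet shifted out */
    unsigned rx_count;   /* frames waiting in the RX FIFO          */
} ssi_model_t;

/* One frame time: a frame leaves TX and its response lands in RX. */
static void model_tick(ssi_model_t *s)
{
    assert(s != NULL);
    if (s->tx_pending > 0) {
        s->tx_pending--;
        if (s->rx_count < FIFO_DEPTH)
            s->rx_count++;
    }
}

/* Equivalent of the Busy flag in this model. */
static bool model_busy(const ssi_model_t *s)
{
    assert(s != NULL);
    return s->tx_pending > 0;
}

/* Flush that only drains what is in RX right now. Returns the number
 * of frames discarded. Frames still in TX are not seen. */
static unsigned flush_rx_only(ssi_model_t *s)
{
    assert(s != NULL);
    unsigned discarded = s->rx_count;
    s->rx_count = 0;
    return discarded;
}

/* Flush that first waits for the port to go idle, then drains RX. */
static unsigned flush_after_idle(ssi_model_t *s)
{
    assert(s != NULL);
    while (model_busy(s))
        model_tick(s);
    return flush_rx_only(s);
}
```

In this model, calling `flush_rx_only()` with four frames still pending in TX discards nothing, and those four frames later arrive in RX as stale data. Calling `flush_after_idle()` in the same state discards all four and leaves both FIFOs empty.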

0 Brian Willoughby18 over 6 years ago in reply to Bob Crosby

Intellectual 935 points

Bob Crosby said:
I suggest that you put the RX FIFO flush at the beginning of your routine to write the flash instead of at the end. Here is why: In the routine to write the flash you follow each SSIDataPut() with an SSIDataGet(). Six puts and six gets. The RX FIFO should be empty at the end if it started empty. Assume there were 4 bytes in the RX FIFO at the start of the routine to write the flash. You would end up with 4 bytes that would need to be flushed at the end. However, because of the TX FIFO and the delay between starting a transmit and finishing a receive, your test for receive empty may happen before the last four bytes finish transmitting.

Thank you for the suggestion, Bob. I believe that my code is already handling this in the _select() function at the beginning of the routine.

static inline void _select(unsigned nPort)
{
    spi_xfer_flush_rx(nPort);
    MAP_GPIOPinWrite(g_ssi[nPort].port_base, g_ssi[nPort].fss_pin, 0);
}
spi_xfer_error_t spi_xfer_flush_rx(unsigned nPort)
{
    uint32_t ui32DataRx;

    if (nPort >= SSI_PORTS)
        return kSPIXferErrorInvalidIndex;
    while (MAP_SSIDataGetNonBlocking(g_ssi[nPort].ssi_base, &ui32DataRx))
        ;
    return kSPIXferErrorNone;
}

This ensures that no data remains in either FIFO before the SPI Flash Memory Chip Select is asserted.

0 Ralph Jacobi over 6 years ago in reply to Brian Willoughby18

TI__Guru*** 135355 points

Hello Brian,

Honestly at this point it's really hard for us to gauge what is going wrong as we don't have a comparable setup in hand. Simplifying things may help uncover the point of failure though.

To start simplifying, is it possible for you to use legacy mode instead? We aren't really familiar with uDMA + Advanced mode; it hasn't really come up before for us.

Also, you mention that you are using the MCU to send the commands to SPI Flash, but you also have TX code for the uDMA. It's possible that something got triggered by that, so commenting out the TX code for uDMA would be good for now.

If the issue continues to persist once those steps are taken, we can try some debugging ideas with the simplified setup.

Also, for the moment, there is a decent chance this issue and the other are related, so let's address these one at a time. If you see the corrupted bit behavior go away at some point, please note that.

0 Brian Willoughby18 over 6 years ago in reply to Ralph Jacobi

Intellectual 935 points

Ralph Jacobi said:
Simplifying things may help uncover the point of failure though.

To start simplifying, is it possible for you to use legacy mode instead? We aren't really familiar with uDMA + Advanced mode; it hasn't really come up before for us.

I can try using Legacy Mode.

Ralph Jacobi said:
Also, you mention that you are using the MCU to send the commands to SPI Flash, but you also have TX code for the uDMA. It's possible that something got triggered by that, so commenting out the TX code for uDMA would be good for now.

I have compiler directives that remove all Tx code for uDMA. The Rx uDMA issues still occur even when all Tx code is PIO (non-DMA).

0 Ralph Jacobi over 6 years ago in reply to Brian Willoughby18

TI__Guru*** 135355 points

Hello Brian,

Any luck with the attempt to use Legacy Mode?

0 Brian Willoughby18 over 6 years ago in reply to Ralph Jacobi

Intellectual 935 points

Ralph Jacobi said:
Any luck with the attempt to use Legacy Mode?

Thank you for following up.

No, I did not have any luck. Legacy Mode results in the exact same errors that I was seeing before.

0 Brian Willoughby18 over 6 years ago in reply to Brian Willoughby18

Intellectual 935 points

Has there been any progress trying to find someone with uDMA + SSI experience? I get the impression that I'm exploring brand new territory here.
On that note, are there any examples that use uDMA to read and/or write SPI Memory? That might be a good start. If I can run those demos, then I might be able to see whether I'm doing something differently, or perhaps missing a step.

0 Bob Crosby over 6 years ago in reply to Brian Willoughby18

TI__Guru 72500 points

Brian, I independently created a project using SSI2 and uDMA. I did not run into either of the problems you mentioned. You might look at the example and see if there are differences that might account for the behavior you see.

/cfs-file/__key/communityserver-discussions-components-files/908/2318.udma_2D00_ssi2_5F00_demo.zip

0 Bob Crosby over 6 years ago in reply to Bob Crosby

TI__Guru 72500 points

Version using BUFFERED UARTprintf to avoid missing validation of some transfers. Connect PD0 to PD1 so SSI2 receives what it transmits.

/cfs-file/__key/communityserver-discussions-components-files/908/7462.udma_2D00_ssi2_5F00_demo.zip

0 Brian Willoughby18 over 6 years ago in reply to Bob Crosby

Intellectual 935 points

Thanks, Bob.

Your code runs fine for me, but it does not perform the equivalent of SPI Memory commands (where non-uDMA and uDMA transfers occur in between each other). In other words, your example code might be too simple to trigger the problem that I am seeing.

As I mentioned, my code experiences correct SSI/uDMA data alignment right after a System Reset, and I can repeat certain uDMA-based commands as long as I want without error. However, certain sequences of SPI commands - possibly odd numbers of bytes - trigger the 32-bit misalignment issue, and that issue persists until the next System Reset.

I have yet to find out the exact trigger - there are multiple - and I have yet to find any way that is less drastic than a System Reset to restore alignment between the SSI FIFO and uDMA peripherals.

0 Brian Willoughby18 over 6 years ago in reply to Brian Willoughby18

Intellectual 935 points

Would it be possible for this loopback-based SSI+FIFO+uDMA example code to be expanded so that it performs full SPI commands with an actual external memory chip?

I suspect that the issue we're seeing here will be revealed with a more complex interleaving of non-uDMA and uDMA transfers.

I see that the TM4C129X Development Board has an SPI memory chip of some kind. Its connection conflicts with one of our firmware peripheral assignments, so it is difficult for me to run our code there. Would it be possible for this Texas Instruments example code to be expanded to communicate with that SPI chip on the TM4C129X board? It should be sufficient to check SPI chip status and then perform a long uDMA read and/or write command, using non-uDMA for the SPI Command and Address phase and uDMA for the Data phase.

The good news is that I see that the example code sets up the peripheral once, and does not reset any peripheral between iterations of the test. That's at least similar to our firmware, where setup is performed just once, and everything works until certain interactive command sequences are initiated. I suspect there might be a peripheral state that I need to reset. Anything short of the full System Reset that I'm currently doing to recover functionality would be helpful, since it's obviously not possible to reset the entire chip when multiple peripherals are active at the same time.

0 Ralph Jacobi over 6 years ago in reply to Brian Willoughby18

TI__Guru*** 135355 points

Hello Brian,

Bob and I discussed your request, but the amount of effort required to expand that example to that level is large and therefore that request is beyond what we can provide.

If the suspicion is that interleaving non-uDMA and uDMA transfers is the root cause, then the best way to isolate the issue is to simplify the process until you can identify which step, either in configuration or in the application-level handoff between these commands, triggers it. Another idea is to check whether the interleaving is responsible by using only uDMA transfers (if I remember right, you have already verified that non-uDMA-only access works).

When simplifying the process, perhaps you can work from the example provided and replicate the issue on a TI EVM. That would be the quickest way for us to deep dive into the issue.

Overall, very specific issues like this one, which have not come up before, really need to be simplified to the point where we can home in on the APIs or sequence of actions that trigger the issue. Then we can do further analysis to understand what went wrong and attempt to resolve it.
