TM4C123 SSI insidious "feature" causes spurious IRQs / data corruption

Patrick Herborn

Hello All,

Whilst trying to implement an efficient SSI driver, I stumbled upon a rather insidious "feature" of the SSI core in the TM4C123 devices.

By "efficient" I mean minimising the execution cycles needed. To this end, it would be helpful not to have to perform certain checks on the hardware, before hitting it. For example, not having to inspect the RNE bit in the SSISR before doing a fetch from the RX FIFO - it would be better if you inherently *knew* there was data there. One way of achieving this is to use the EOT mode. Since RX data is synchronous with the TX data, for any given burst of data, it should be sufficient to know that the TX FIFO has become empty to also know that there must now be an equal number of entries in the RX FIFO as were originally in the TX FIFO. For up to 8 items you can just pump the FIFO in code, for more items you can set up a uDMA transfer (and use RX channel completion to move on).
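A rough sketch of the "pump without polling" idea described above, written so it can be exercised on a host. The function names and the `ssidr` pointer parameter are made up for illustration; only the 8-entry FIFO depth and the 8-item threshold come from the description. It assumes the TX FIFO is known to be empty before pumping and that the EOT interrupt has fired before draining.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SSI_FIFO_DEPTH 8u /* TM4C123 SSI FIFOs are 8 entries deep */

typedef enum { SSI_XFER_NONE, SSI_XFER_PUMP, SSI_XFER_UDMA } ssi_xfer_mode;

/* Up to one FIFO's worth is pumped in code; anything longer goes to uDMA. */
static inline ssi_xfer_mode ssi_pick_mode(size_t n_items)
{
    if (n_items == 0u) {
        return SSI_XFER_NONE;
    }
    return (n_items <= SSI_FIFO_DEPTH) ? SSI_XFER_PUMP : SSI_XFER_UDMA;
}

/* Write n words to the data register without polling TNF.
 * Only valid when the TX FIFO is known to be empty and n fits in it. */
static inline void ssi_pump(volatile uint32_t *ssidr, const uint16_t *src, size_t n)
{
    assert(n <= SSI_FIFO_DEPTH);
    for (size_t i = 0; i < n; i++) {
        *ssidr = src[i];
    }
}

/* Intended for the EOT interrupt: read back exactly n words without
 * polling RNE, relying on RX being synchronous with TX. */
static inline void ssi_drain_exact(volatile uint32_t *ssidr, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = (uint16_t)*ssidr;
    }
}
```

On real hardware `ssidr` would point at the SSI data register; the saving is simply that neither loop touches SSISR.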

The "issue" became apparent when switching between uDMA and IRQ modes. Occasionally some "extra" entries would appear in the RX FIFO. In this application the uDMA uses the same memory for TX and RX, so those leftover entries ended up corrupting the TX data: the RX FIFO still had data in it, AND the RX channel has higher priority in the uDMA core than the TX channel.
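To illustrate why a leftover RX entry is so damaging when TX and RX share a buffer, here is a deliberately crude host-side model. It is not how the uDMA actually schedules transfers: it assumes a zero-latency loopback link and that RX is always serviced before TX. All names (`fifo16`, `inplace_exchange`, `wire`) are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 8u

typedef struct {
    uint16_t slot[FIFO_DEPTH];
    size_t head;
    size_t count;
} fifo16;

static inline int fifo_push(fifo16 *f, uint16_t v)
{
    if (f->count == FIFO_DEPTH) {
        return 0; /* full: entry is dropped (overrun) */
    }
    f->slot[(f->head + f->count) % FIFO_DEPTH] = v;
    f->count++;
    return 1;
}

static inline int fifo_pop(fifo16 *f, uint16_t *v)
{
    if (f->count == 0u) {
        return 0;
    }
    *v = f->slot[f->head];
    f->head = (f->head + 1u) % FIFO_DEPTH;
    f->count--;
    return 1;
}

/* In-place exchange of n words over a modelled loopback link.
 * RX wins whenever it has data; wire[] records what was really sent. */
static inline void inplace_exchange(uint16_t *buf, uint16_t *wire, size_t n, fifo16 *rx)
{
    size_t tx_i = 0, rx_i = 0;
    for (;;) {
        uint16_t v;
        if (rx_i < n && fifo_pop(rx, &v)) {
            buf[rx_i++] = v;                /* RX channel writes the buffer */
        } else if (tx_i < n) {
            wire[tx_i] = buf[tx_i];         /* TX channel reads the buffer  */
            (void)fifo_push(rx, buf[tx_i]); /* loopback into the RX FIFO    */
            tx_i++;
        } else {
            break;
        }
    }
}
```

In this model, starting with an empty RX FIFO leaves the buffer untouched, while a single stale entry is written over `buf[0]` before TX has read it, so the stale value goes out on the wire and one entry is left behind in the RX FIFO for the next transfer.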

This was behaving as a Heisenbug some of the time and as a Bohrbug other times. Changing the SSI clock speed could fix or break things. At one point, the change from a read-modify-write of the SSIIM to a simple write even caused a Watchdog Timeout (despite this REDUCING the instruction count)! Very weird!

It seemed to break when switching from uDMA to IRQ, so I re-inspected that code. The uDMA mode completion handler checks that the RX channel has finished before moving on. Interestingly, upon entry to the handler, the UDMA CHIS "always" had both RX and TX bits set. It would stand to reason that you should get called twice, once for TX and then once for RX. By definition there must be more than 8 items, and by definition the TX FIFO is empty when we start, so there will be a uDMA burst (which happens when there are at least 4 entries free in the FIFO, hence the DMA ARB size of 4), followed by another burst (since the first one only half filled the TX FIFO). Depending on the SSI baud rate, the first word might not even have been clocked out of the SSI by the time the TX FIFO is full. It stands to reason, then, that the TX transfer will be ahead of the RX, so we should "expect" the TX channel to finish first. But as I said, the read of CHIS "always" had both TX and RX bits set, and that read was the very first thing in the handler! I'm still not sure how or why that happens. Perhaps it is related to the uDMA "Wait" mode, which is enabled for the SSI, but I suspect that is "supposed" to allow the RX FIFO to accumulate enough entries for a burst request rather than just doing single requests. I digress.
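For reference, the CHIS check at the top of the completion handler can be reduced to something like the sketch below. The channel numbers are an assumption (10 and 11 are the default SSI0 RX/TX assignments as far as I know; the post does not say which SSI module is in use), and the enum and function names are made up.

```c
#include <stdint.h>

/* Assumed uDMA channel numbers for SSI0 RX/TX; adjust for your module. */
#define SSI_RX_DMA_CH 10u
#define SSI_TX_DMA_CH 11u

typedef enum {
    SSI_DMA_NONE,    /* neither channel flagged */
    SSI_DMA_TX_ONLY, /* TX finished, RX still outstanding */
    SSI_DMA_RX_ONLY, /* RX finished, TX still outstanding */
    SSI_DMA_BOTH     /* both flagged on this entry */
} ssi_dma_done;

/* Classify a snapshot of UDMA CHIS taken at handler entry. */
static inline ssi_dma_done ssi_dma_classify(uint32_t chis)
{
    const uint32_t rx = 1u << SSI_RX_DMA_CH;
    const uint32_t tx = 1u << SSI_TX_DMA_CH;

    if ((chis & (rx | tx)) == (rx | tx)) {
        return SSI_DMA_BOTH;
    }
    if (chis & rx) {
        return SSI_DMA_RX_ONLY;
    }
    if (chis & tx) {
        return SSI_DMA_TX_ONLY;
    }
    return SSI_DMA_NONE;
}
```

The reasoning above predicts an `SSI_DMA_TX_ONLY` entry followed by a second entry for RX; what was actually observed corresponds to `SSI_DMA_BOTH` on every entry.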

This is where it gets insidious. We already know that we are dealing with BOTH channels, which means we won't(*) get called again - we are just tidying up. So we do that, then we update the vector table with the IRQ mode handler, pump the FIFO and enable the EOT interrupt (TXIM). And then it breaks, even if we checked that the TXIS is clear in the RIS (but note the SSI#07 erratum) and even if we checked that the TFE bit in the SSISR is clear (i.e. there IS data in the FIFO, so we are WAITING for it to become empty, and it should be safe to unmask the TXIM now since we HAVE NOT finished yet). The IRQ handler enters and the RNE bit in the SSISR shows there is NO data in the RX FIFO. Errr, say what? How is that even possible? The TXIS and TFE bits were clear before we enabled the TXIM, so there's no(*) way we could have tail-chained back into the IRQ handler. What is going on here? We *are* back in the IRQ handler for some reason and we haven't even shifted a single word out yet!
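The two pre-checks and the "impossible" handler entry can be written down as predicates on register snapshots. This is only a sketch: the bit positions are the ones I believe the TM4C123 uses for SSISR and SSIRIS (the datasheet calls the raw TX status bit TXRIS), and the function names are invented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as I understand them for the TM4C123 SSI block. */
#define SSI_SR_TFE    (1u << 0) /* SSISR: TX FIFO empty        */
#define SSI_SR_RNE    (1u << 2) /* SSISR: RX FIFO not empty    */
#define SSI_RIS_TXRIS (1u << 3) /* SSIRIS: raw TX/EOT status   */
#define SSI_IM_TXIM   (1u << 3) /* SSIIM: TX/EOT interrupt mask */

/* The checks made before unmasking TXIM: no raw TX interrupt is
 * latched and the TX FIFO still holds data. */
static inline bool ssi_txim_unmask_looks_safe(uint32_t ris, uint32_t sr)
{
    return ((ris & SSI_RIS_TXRIS) == 0u) && ((sr & SSI_SR_TFE) == 0u);
}

/* In EOT mode a genuine entry implies received data is waiting, so an
 * entry with RNE clear did not come from the transfer just started. */
static inline bool ssi_eot_entry_is_spurious(uint32_t sr)
{
    return (sr & SSI_SR_RNE) == 0u;
}
```

The point of the paragraph above is that `ssi_txim_unmask_looks_safe()` can return true and the very next handler entry can still satisfy `ssi_eot_entry_is_spurious()`.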

* - My current theory (and I do have to do more testing) is that we really do get a double-tap from the uDMA core (both taps arrive on the same IRQ / vector) due to both TX and RX transfers finishing, BUT that there is a race condition. Sometimes they both happen before the handler is entered and are "coalesced" into a single, joint IRQ: BOTH channels are done. But sometimes, when we are unlucky, the M4F core / NVIC is sufficiently far into the process of entering the IRQ handler (but, crucially, not *IN* it yet, so we haven't read the CHIS yet) that the second tap is no longer coalesced and instead puts us into an IRQ ACTIVE *AND* PENDING situation. By the time we are actually in the handler, the UDMA CHIS is showing BOTH channels, so we proceed believing it is a fully coalesced call when in reality it is not (which would have been evident had only one bit in CHIS been set).
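The theory can be expressed as a tiny model of one NVIC interrupt line. This is a simplification written for illustration, not a description of the silicon: it only tracks the pending and active bits, and every name in it is made up.

```c
#include <stdbool.h>

typedef struct {
    bool pending;     /* request latched, handler not yet started   */
    bool active;      /* handler currently running                  */
    unsigned entries; /* how many times the handler has been entered */
} irq_line;

/* A completion pulse from the uDMA: latches pending. Two pulses that
 * arrive while pending is already set collapse into one. */
static inline void irq_raise(irq_line *l)
{
    l->pending = true;
}

/* Exception entry: pending is consumed, active is set. */
static inline bool irq_try_enter(irq_line *l)
{
    if (l->pending && !l->active) {
        l->pending = false;
        l->active = true;
        l->entries++;
        return true;
    }
    return false;
}

/* Exception return: a pulse that arrived while active is still
 * pending, so the handler will be entered again (tail-chained). */
static inline void irq_return(irq_line *l)
{
    l->active = false;
}
```

With raise, raise, enter, return the handler runs once. With raise, enter, raise, return it runs twice, and if CHIS already showed both channels on the first entry, the second entry lands in whatever handler is installed by then.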

I have confirmed that reading the NVIC UNPEND register after pumping the TX FIFO (having switched from uDMA to IRQ mode) does indeed show that there is a pending interrupt (presumably for the RX channel of the previous transfer) under some conditions. It may seem like a tad late in the game to be checking (and I should probably move that check to earlier) but as tested (2 x 16 bit transfers at 1us/bit) there is no way that the TX FIFO could have been drained between being pumped and the UNPEND being checked (we're talking only a few cycles, vs 1900 to clear the TX FIFO).
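For anyone wanting to repeat the check, the sketch below shows how the UNPEND word and bit for a given interrupt can be worked out. The base address is the standard Cortex-M clear-pending register block; the IRQ numbers are assumptions (7 for SSI0, 34 for SSI1 as far as I know), as are the helper names. Note that uDMA completion for a peripheral channel is signalled on the peripheral's own vector, so it is the SSI interrupt that needs checking.

```c
#include <stdbool.h>
#include <stdint.h>

#define NVIC_UNPEND_BASE 0xE000E280u /* UNPEND0; one word per 32 IRQs */
#define IRQ_SSI0 7u                  /* assumed IRQ number for SSI0   */
#define IRQ_SSI1 34u                 /* assumed IRQ number for SSI1   */

/* Address of the UNPENDn word that covers this IRQ. */
static inline uint32_t nvic_unpend_addr(uint32_t irq)
{
    return NVIC_UNPEND_BASE + 4u * (irq / 32u);
}

/* Bit within that word. */
static inline uint32_t nvic_irq_mask(uint32_t irq)
{
    return 1u << (irq % 32u);
}

/* Reading UNPENDn returns the pending state; writing 1 clears it. */
static inline bool nvic_is_pending(uint32_t unpend_value, uint32_t irq)
{
    return (unpend_value & nvic_irq_mask(irq)) != 0u;
}
```

On target, `unpend_value` would come from a volatile read of the address returned by `nvic_unpend_addr()`.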

As a saving grace this "feature" is only really noticeable in mixed IRQ/uDMA mode drivers. If you are using purely IRQ then you are fine. If you are using purely uDMA *and* you are gating your decisions on UDMA CHIS then the second call will just exit (since you will have cleared the CHIS flags on your first call), albeit wasting a few cycles in the process.

My suggestion is: *if* you are using uDMA to feed your SSI FIFOs, then do not rely solely on the UDMA CHIS register in your handler; also check the NVIC UNPEND register. You can gate that check on CHIS having both bits set: if only one is set then don't check UNPEND, since you might legitimately need to re-enter.
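One way to read that suggestion is as the decision table sketched below. The action names are invented, and what to do when the interrupt turns out to be pending is an assumption on my part: clearing it by writing 1 to UNPEND before switching handlers is one plausible response, not something confirmed above. When the pending bit is sampled is left to the caller, and the EDIT that follows suggests the timing matters.

```c
#include <stdbool.h>

typedef enum {
    SSI_ACT_IGNORE,             /* no SSI channel flagged: nothing to do   */
    SSI_ACT_WAIT_OTHER,         /* one channel done: expect a second entry */
    SSI_ACT_SWITCH,             /* both done, nothing pending: switch mode */
    SSI_ACT_UNPEND_THEN_SWITCH  /* both done but IRQ pending: clear first  */
} ssi_dma_action;

/* rx_done / tx_done come from UDMA CHIS; irq_pending from NVIC UNPEND.
 * UNPEND is only consulted when CHIS shows both channels complete. */
static inline ssi_dma_action ssi_dma_decide(bool rx_done, bool tx_done, bool irq_pending)
{
    if (!rx_done && !tx_done) {
        return SSI_ACT_IGNORE;
    }
    if (rx_done != tx_done) {
        return SSI_ACT_WAIT_OTHER; /* a legitimate re-entry is still due */
    }
    return irq_pending ? SSI_ACT_UNPEND_THEN_SWITCH : SSI_ACT_SWITCH;
}
```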

EDIT: As per the suspicion above that checking the UNPEND register just after pumping the FIFO (pretty much at the end of the handler) is perhaps not the best place to do so, I moved the check to just after clearing CHIS (which happens just after reading CHIS at the entry) and it broke again. This suggests that the RX completion IRQ is happening later than I thought (as one might expect based on the above explanation) but that for some reason the CHIS is still reporting RX complete even when the RX IRQ has not arrived yet. This leads me to think that there is some delay between the uDMA updating CHIS and the NVIC seeing the RX completion signal.

Hope this info might be of use to someone,

Pat.

over 1 year ago

Charles Tsai (TI Guru), over 1 year ago

Hi Pat,

Again, thank you for your tips and findings. I'm sure they will be beneficial to anyone mixing uDMA with IRQ mode. I will also bookmark this post.

Patrick Herborn, over 1 year ago, in reply to Charles Tsai

Hiya Charles!

Again you're most welcome and I can only hope that it helps someone else out - it was a challenge to find because the hardware was, in effect, lying about the state of the interrupt.

Since there is a solution present I will mark this as resolved.

Thanks again,

Pat.
