Irregular missing SPI byte on slave MISO

Jarrett Gersten

Other Parts Discussed in Thread: TM4C1290NCPDT

In my setup the host master device sends many SPI commands to the Tiva (especially on start-up). Irregularly, the Tiva will reply with 0x0 when it should be replying with an ack (0xA5). I have tried the following tests without correcting the issue:

1. I have scoped the MISO line when this issue occurs and have seen that the line is actively driven (pulled low) for the byte in question, but it does not attempt to send the 0xA5 pattern. This rules out the possibility of another device dominating the MISO line.

2. Instead of using the API call SSIDataPut(), wrote directly to the data register: HWREG(SSI0_BASE + SSI_O_DR)

3. Hardcoded an 0xA5 return value (rather than a variable) to rule out a RAM issue

4. Added a while (SPI busy) wait before calling SSIDataPut()

5. Changed compiler optimization settings (default=2)

I have toggled an LED before and after the SSIDataPut() call to verify that the function is being executed at the expected time.

TM4C1290NCPDT w/ CCS 6.1

SPI is configured as:

- SSI_MODE_SLAVE

- SSI_FRF_MOTO_MODE_1

- 3.5 MHz master clock (48 MHz internal Tiva clock)

- 8 data bits
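For context on the 3.5 MHz figure, the sketch below computes how much margin this configuration leaves. It assumes the TM4C slave-mode rule that the SSI module clock must be at least 12 times the incoming SSInClk (the check TivaWare's SSIConfigSetExpClk() makes in slave mode, as we understand it); confirm the ratio against your device's datasheet. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed slave-mode rule: SSI module clock >= 12 x SSInClk.
   Verify the ratio against the datasheet for your device. */
#define SLAVE_MIN_CLOCK_RATIO 12u

/* Highest SSInClk rate a slave can accept for a given module clock. */
uint32_t max_slave_bit_rate(uint32_t ssi_clk_hz)
{
    return ssi_clk_hz / SLAVE_MIN_CLOCK_RATIO;
}

/* Margin of the allowed maximum over the actual bit rate, in whole percent
   (truncated toward zero). Negative means the configuration is out of spec. */
int32_t slave_rate_margin_pct(uint32_t ssi_clk_hz, uint32_t bit_rate_hz)
{
    if (bit_rate_hz == 0u) {
        return INT32_MAX;  /* no clock at all: treat as unlimited margin */
    }
    int64_t max_rate = (int64_t)max_slave_bit_rate(ssi_clk_hz);
    int64_t diff = max_rate - (int64_t)bit_rate_hz;
    return (int32_t)((diff * 100) / (int64_t)bit_rate_hz);
}
```

With a 48 MHz clock the ceiling is 4 MHz, so slave_rate_margin_pct(48000000u, 3500000u) returns 14: the link runs within about 14% of the limit. Dropping the master to 2 MHz would raise that margin to 100%.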

Wondering if anyone has any ideas as to what I can look at?

Thanks for your input,

Jarrett

over 9 years ago

cb1 over 9 years ago

You've supplied much thoughtful data - good that. That said - your issue qualifies as an "Intermittent one" does it not - and these are (always) the most challenging. When intermittents arrive - we need even more data!

J: "Irregularly, the Tiva will reply with 0x0..."

How do you define, "Irregular?" Every 10th byte, 100th, 1Kth?

J: "line is active (pulled low) for the byte in question, but does not attempt to send an 0xA5 pattern. This rules out the condition of another device dominating the MISO"

How do you know your TM4C is "active & pulling down MISO"? Your writing does not - convincingly - support that. I don't agree that "other device domination" has been ruled out.

J: "anyone has any ideas as to what I can look at?"

Indeed - most always in such intermittent conditions - slowing the data rate proves helpful. You can always "dial up" (after) the issue's resolved.

Undescribed is the connection sanctity between TM4C and controlling master. That's been checked - by more than one person - and judged proper? Power to all boards w/in your scheme has been measured during operation - and (again) judged proper?

You're silent as to the number of boards which so suffer. You've more than one (we hope); do all behave the same? Single board anomaly - never fun - never really worthwhile.

Devil in such detail - especially when intermittents emerge.

Robert Adsett over 9 years ago in reply to cb1

One other question: what protocol are you using? You mention a reply but not how it's generated and when it must be sent.

Likewise not convinced you've ruled out other devices interfering. The only real method of doing that is to isolate the other devices in some fashion.

Robert

Jarrett Gersten over 9 years ago in reply to cb1

cb1,

Thanks for the thoughtful reply.

The problem is neither regular nor highly predictable, so I cannot reliably reproduce the issue (which will make proving a fix challenging).

This issue has occurred on other boards, so I have ruled out a single board abnormality.

The segment of code where the acknowledgement write takes place is the same for every processed command, i.e., a switch() statement handles the master command before calling a common function that sends the MISO reply. The 'missed' ACK occurs in this common send function; this process is the same for all commands.

To address your questions/concerns:

cb1: How do you know your TM4C is, "active & pulling down MISO?"

In this capture, 04-Channel4 is the analog version of the 03-MISO line. The 06-Channel6 pulse marks when I write the 0xA5 ack byte to the transmit buffer. I expect the next byte out on the MISO line to be 0xA5 (and usually it is). In this affected example, 0xA5 is not transmitted on the next byte, nor on subsequent bytes. If the Tiva were attempting to output 0xA5 on the MISO line, I would expect to see activity on 04-Channel4 corresponding to the 0xA5 bit pattern. Because the pulling down and releasing of the 04-Channel4 MISO line is regular and consistent with all other MISO transmissions, I conclude that the Tiva is in control of the line and an external device is not dominating it.

To speak to the connection between the host and the Tiva slave: it includes some complexity. The host drives many devices on this SPI bus. In my world there are 2 Tiva devices: TM4C129 (primary) and TM4C123 (secondary). They are routed through an FPGA to the host, which resides on another board.

I am currently running tests to answer 2 questions:

1. Is the FPGA dominating/mis-routing the signal(s)

2. Does the fact that the Tiva's transmit buffer gets exhausted during command processing influence the issue? I.e., the host sends a command, then continues to send 'reads' (0x00) until the ACK is received. While the Tiva is processing the command, reads continue to come in, and the Tiva, whose transmit buffer is potentially empty, sends the last transmitted value, 0x00, until the ACK is written to the transmit buffer.
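Question 2 can be explored off-target. The sketch below is a host-side simulation, not TM4C driver code: it models an 8-entry transmit FIFO (the depth of the TM4C SSI FIFOs) and assumes, as described above, that a frame clocked while the FIFO is empty repeats the last transmitted value. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TX_FIFO_DEPTH 8u /* TM4C SSI transmit FIFO depth, in entries */

/* Hypothetical model of a slave transmit FIFO. */
typedef struct {
    uint8_t  data[TX_FIFO_DEPTH];
    unsigned head;       /* index of the oldest queued byte */
    unsigned count;      /* number of queued bytes */
    uint8_t  last_sent;  /* repeated on underrun (assumed behavior) */
    unsigned underruns;  /* frames clocked out with nothing queued */
} tx_fifo_t;

void tx_init(tx_fifo_t *f)
{
    f->head = 0u;
    f->count = 0u;
    f->last_sent = 0x00u;
    f->underruns = 0u;
}

/* Queue a byte; returns false (and drops the byte) if the FIFO is full. */
bool tx_put(tx_fifo_t *f, uint8_t v)
{
    if (f->count == TX_FIFO_DEPTH) {
        return false;
    }
    f->data[(f->head + f->count) % TX_FIFO_DEPTH] = v;
    f->count++;
    return true;
}

/* Called once per 8-bit frame clocked by the master; returns the MISO byte. */
uint8_t tx_clock_frame(tx_fifo_t *f)
{
    if (f->count == 0u) {
        f->underruns++;      /* nothing queued: repeat the last value */
        return f->last_sent;
    }
    f->last_sent = f->data[f->head];
    f->head = (f->head + 1u) % TX_FIFO_DEPTH;
    f->count--;
    return f->last_sent;
}
```

Queuing one 0x00 and clocking five polling reads yields 0x00 five times, four of them underruns. Queuing 0xA5 then makes it appear on the next frame, and every underrun after that repeats 0xA5 unless a filler byte is queued behind it. In this idealized model the ACK always arrives, so the model does not by itself explain the missed byte; it shows how much of the exchange depends on the underrun behavior.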

At start-up, the master device sends between 20 and 40 commands. Later, when activated, the master sends another 10 to 30 commands.

This is what I have observed:

1. the issue can happen at either stage: start-up or activation. If it happens at start-up, we stop the process and don't try activation.

2. the issue tends to 'pop up': I can go a dozen resets without seeing it, then for unknown reasons it will happen on every reset 5 times in a row, or once every 4 resets (these example numbers are not precise; they vary widely, hence 'irregular'). It has been difficult to identify any pattern, other than that once the issue starts happening, it is more likely to happen again.

3. the issue tends to happen on the same command in the start-up/activation sequence.

E.g., during start-up the issue occurred on command sequence #10. After a system restart, the issue occurs again on command sequence #10.

4. altering the execution timing changes the behavior

E.g., after adding some unrelated pin toggling before the SPI write, the start-up issue may occur on command sequence #9 or #11.

5. the SPI peripheral is not 'hung' as it can process and correctly output subsequent commands/acks

Another fun fact: I have seen this issue happen very rarely on the TM4C123 (secondary) device as well (it uses the same command processing software structure as the TM4C129).

Jarrett Gersten over 9 years ago in reply to Robert Adsett

Hi Robert,

Protocol:
1. There is a 0x00 queued in the transmit buffer
2. Host sends a command and continues to send reads (0x00) until a response is received
3. Tiva interrupts on the host command & disables SPI interrupt (host is continuing to send reads)
4. Tiva processes host command and queues a response in the transmit buffer
5. Tiva queues a 0x00 in the transmit buffer
6. SPI interrupt is re-enabled
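Steps 1 and 3-6 can be sketched as slave-side code. This is a minimal illustration, not the poster's firmware: the slave_hal_t hooks, the handler type, and the stub used for off-target testing are all hypothetical; on a TM4C the hooks would wrap calls such as SSIDataPut() and SSIIntDisable()/SSIIntEnable().

```c
#include <stdbool.h>
#include <stdint.h>

#define FILLER 0x00u  /* placeholder byte the host sees while polling */
#define ACK    0xA5u

/* Hypothetical hardware hooks; on target these might wrap TivaWare calls. */
typedef struct {
    void (*tx_put)(uint8_t byte);
    void (*rx_irq_enable)(bool on);
} slave_hal_t;

/* Hypothetical command processor: returns the reply byte to send. */
typedef uint8_t (*cmd_handler_t)(uint8_t cmd);

/* Step 1: prime the transmit FIFO with a filler byte, then listen. */
void slave_init(const slave_hal_t *hal)
{
    hal->tx_put(FILLER);
    hal->rx_irq_enable(true);
}

/* Steps 3-6, called from the SSI receive interrupt once a command byte
   (step 2) has arrived. */
void slave_on_command(const slave_hal_t *hal, cmd_handler_t handle, uint8_t cmd)
{
    hal->rx_irq_enable(false);    /* 3: mask further SPI interrupts */
    uint8_t reply = handle(cmd);  /* 4: process the command... */
    hal->tx_put(reply);           /*    ...and queue the response */
    hal->tx_put(FILLER);          /* 5: queue a trailing filler byte */
    hal->rx_irq_enable(true);     /* 6: unmask the SPI interrupt */
}

/* Off-target stub that records what the protocol code does. */
static uint8_t  stub_tx_log[16];
static unsigned stub_tx_len;
static bool     stub_irq_on;
static bool     stub_irq_on_in_handler;

static void stub_tx_put(uint8_t byte)
{
    if (stub_tx_len < sizeof stub_tx_log) {
        stub_tx_log[stub_tx_len++] = byte;
    }
}

static void stub_irq_enable(bool on) { stub_irq_on = on; }

static uint8_t stub_ack_handler(uint8_t cmd)
{
    (void)cmd;
    stub_irq_on_in_handler = stub_irq_on;  /* was the IRQ masked here? */
    return ACK;
}

const slave_hal_t stub_hal = { stub_tx_put, stub_irq_enable };
```

Note that masking the interrupt in step 3 does not stop the hardware: while step 4 runs, the host's polling reads keep clocking bytes out of the transmit FIFO and into the receive FIFO, and the 8-entry receive FIFO can overrun if processing outlasts eight polled bytes. Whatever is left in the FIFOs when step 6 unmasks the interrupt therefore matters, and that window is where a race of the kind Robert raises below would live.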

Interference:
Currently testing some ideas around the FPGA's behavior; the FPGA routes this Tiva and another Tiva on the board to the host.

Robert Adsett over 9 years ago in reply to Jarrett Gersten

A couple of thoughts

You might want to 'scope the clock line as well to make sure it's well formed.

I'd try adding a pause before reading the status; you could be seeing a race condition. Disabling the interrupt wouldn't prevent a race. You are also vulnerable in that, in the case of a race, the acknowledgement would be lost, since you only queue it once. Another check against that would be to continuously queue the ACK for as long as you are in the same transaction. If either the pause or the continuous ACK fill worked, I'd strongly suspect a race. A numbered ACK might be useful as part of the solution in that case.
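Robert's two suggestions (keep refilling the ACK for the whole transaction, and number the ACKs) might look like the sketch below. The put function follows the convention of TivaWare's SSIDataPutNonBlocking(), which returns 0 when the transmit FIFO is full; the numbered-ACK encoding (0xA in the upper nibble, a 4-bit sequence number in the lower) and all other names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical non-blocking put: returns 1 if the byte was queued, 0 if the
   transmit FIFO was full (the convention SSIDataPutNonBlocking() follows). */
typedef int32_t (*tx_put_nb_t)(uint8_t byte);

/* Hypothetical numbered ACK: upper nibble fixed at 0xA, lower nibble is the
   sequence number mod 16, so the byte can never be 0x00 or 0xFF. */
uint8_t numbered_ack(unsigned seq)
{
    return (uint8_t)(0xA0u | (seq & 0x0Fu));
}

/* Host side: does this byte acknowledge command number `seq`? */
bool is_ack_for(uint8_t byte, unsigned seq)
{
    return byte == numbered_ack(seq);
}

/* Slave side: top up the transmit FIFO with the ACK until it is full (or
   `max_copies` have been queued). Calling this again on each receive
   interrupt for the rest of the transaction keeps the host reading the ACK
   rather than a stale filler byte. Returns the number of copies queued. */
unsigned fill_with_ack(tx_put_nb_t put_nb, uint8_t ack, unsigned max_copies)
{
    unsigned queued = 0u;
    while (queued < max_copies && put_nb(ack) != 0) {
        queued++;
    }
    return queued;
}

/* Off-target stub: an 8-entry FIFO that only tracks its occupancy. */
static unsigned stub_fifo_count;

static int32_t stub_put_nb(uint8_t byte)
{
    (void)byte;
    if (stub_fifo_count >= 8u) {
        return 0;
    }
    stub_fifo_count++;
    return 1;
}
```

Because the FIFO may then hold several copies of the same ACK, the host should treat repeated ACKs carrying the same number as one; the number is what lets it tell a fresh ACK from a stale one left over from the previous command.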

Robert

cb1 over 9 years ago in reply to Jarrett Gersten

May I memorialize your most recent two postings as, "Top cabin!" Greatly detailed - well introduced - and generally chronologic in presentation.

Have to properly digest this new data - yet one comment bubbles up: firm/I seek to avoid the use of 0x00 and/or 0xFF as key/critical commands. We justify this as either of those (solid) signal levels may be the (predictable) result of a signal line's receiving (unwanted) "pull" to Gnd or VDD. Any other 8 bit command escapes this predicament. (And I know that in certain applications/design sectors both of those "Command Values" are "illegal/disallowed" for the reason I've detailed.)
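cb1's rule can be checked mechanically. The sketch below (hypothetical helper names) measures how many bit flips separate a code from 0x00 and 0xFF, the patterns a line pulled to Gnd or VDD produces, and rejects a code table whose entries come too close. Measuring distance in bit flips is our extension of the rule, not part of cb1's post.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Number of set bits in a byte. */
static unsigned popcount8(uint8_t v)
{
    unsigned n = 0u;
    while (v != 0u) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}

/* Bit flips needed to turn `code` into 0x00 or 0xFF, whichever is closer. */
unsigned distance_from_rails(uint8_t code)
{
    unsigned ones = popcount8(code);
    return ones < 8u - ones ? ones : 8u - ones;
}

/* True if every code in the table is at least `min_dist` flips away from
   both all-low (0x00) and all-high (0xFF). */
bool codes_avoid_rails(const uint8_t *codes, size_t n, unsigned min_dist)
{
    for (size_t i = 0; i < n; i++) {
        if (distance_from_rails(codes[i]) < min_dist) {
            return false;
        }
    }
    return true;
}
```

0xA5 has four bits set and four clear, so distance_from_rails(0xA5) returns 4, the maximum possible for a byte: it takes at least four bit errors to mistake it for either rail.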

Believe poster/friend Robert and I vote for your best efforts in isolating (at least minimizing) any intrusive effect brought on by (other) bus accessing devices. As a simplistic (first) test - cannot you run that (essentially) identical code across TWO Eval boards - one a 129 - the other 123 - alone? You want to determine that - in the most minimal "incarnation" the problem does - or does not - occur...

That's "first order" KISS - which rises (greatly) in value - when issues such as yours dawn...

Jarrett Gersten over 9 years ago

I thought I would post a follow-up with where we ended up with this issue.
We were using a polling-type communication scheme where the host would send a command to the Tiva and continually ping it with read commands until it received an answer. This had two negative effects: 1. it depleted the Tiva's output FIFO, and 2. it created a lot of overhead and bus traffic, since the Tiva had to deal with all the read commands while trying to process the original command.
While I have no conclusive proof, I suspect the missed ACK was a result of the depleted transmit FIFO. When the Tiva receives a SPI byte and has nothing readily queued in its output FIFO, it sends a previously sent value. We were not keeping the Tiva's transmit FIFO continuously populated and relying on this 'previously sent value' as a placeholder response to the host until the Tiva had an actual response to send. I believe there can be some undefined behavior related to this operation.
We ended up re-architecting the host/Tiva communication protocol. The host now sends SPI commands/data in 4-byte chunks (more in line with how the Tiva's SPI interrupt is triggered) and waits for the Tiva to interrupt back to the host to say it has finished its work and has a response waiting. The host then sends down a read command to get the Tiva's reply. This eliminates the exhaustion of the Tiva's transmit FIFO and reduces SPI traffic, at the cost of the added complexity of an extra interrupt line to the host.
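The reworked exchange can be sketched from the host side as below. The 4-byte chunk length comes from the post; the host_io_t hooks, the reply length, the bounded wait, and the stub slave are hypothetical, since the actual frame format and read command are not described.

```c
#include <stdbool.h>
#include <stdint.h>

#define CHUNK_LEN 4u

typedef enum { XFER_OK, XFER_TIMEOUT } xfer_status_t;

/* Hypothetical host-side hooks; names and signatures are illustrative only. */
typedef struct {
    void    (*spi_write_chunk)(const uint8_t chunk[CHUNK_LEN]);
    bool    (*slave_ready)(void);     /* level of the slave's "response ready" line */
    uint8_t (*spi_read_reply)(void);  /* issues the read command, returns the reply */
} host_io_t;

/* One transaction: send a 4-byte chunk, wait for the slave to assert its
   ready line (instead of polling the bus with reads), then fetch the reply.
   `max_polls` bounds the wait so a silent slave cannot hang the host. */
xfer_status_t host_transact(const host_io_t *io, const uint8_t chunk[CHUNK_LEN],
                            uint8_t *reply, unsigned max_polls)
{
    io->spi_write_chunk(chunk);
    for (unsigned i = 0; i < max_polls; i++) {
        if (io->slave_ready()) {
            *reply = io->spi_read_reply();
            return XFER_OK;
        }
    }
    return XFER_TIMEOUT;
}

/* Off-target stub slave: asserts ready after a fixed number of polls. */
static unsigned stub_polls_until_ready;
static unsigned stub_bytes_on_bus;

static void stub_write_chunk(const uint8_t chunk[CHUNK_LEN])
{
    (void)chunk;
    stub_bytes_on_bus += CHUNK_LEN;
}

static bool stub_ready(void)
{
    if (stub_polls_until_ready == 0u) {
        return true;
    }
    stub_polls_until_ready--;
    return false;
}

static uint8_t stub_read_reply(void)
{
    stub_bytes_on_bus += 2u;  /* assumed: read command byte + reply byte */
    return 0xA5u;
}

const host_io_t stub_io = { stub_write_chunk, stub_ready, stub_read_reply };
```

Unlike the original polling scheme, no SPI traffic is generated while the host waits: with the stub, a transaction that becomes ready after 50 polls still puts only 6 bytes on the bus (the 4-byte chunk plus an assumed 2-byte read). The max_polls bound is a design choice so that a missing interrupt surfaces as a timeout rather than a hang.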
We have seen very positive results since the change as the SPI communication is more robust and less error prone than the previous implementation.