AM3359 (AM335x) UART0: First FIFO fill occasionally sends out garbage when TX_FIFO_TRIG >1

Hello,

We have a case where the BBB / AM3359 UART0 is used in FIFO interrupt mode. The driver is based on the StarterWare code, which has been modified for our purposes.

TX_FIFO_TRIG is set to a value >1 (e.g. 8, 16 etc., using either granularity 1 or 4), FIFO operation is enabled etc.

Whenever the THR interrupt occurs, the SW writes a maximum of Y (e.g. Y = 63-TX_FIFO_TRIG) bytes into the TX FIFO. There is a guard mechanism in place to stop writing when TXFIFO_LVL exceeds e.g. 62. (A TXFIFOFULL check worked equally well as the guard.)
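The fill step described above could look roughly like the following. This is only a sketch for illustration, not the actual driver: the `uart_tx_ops` hooks, the `fake_fifo` simulation and all function names are hypothetical, and the guard value 62 is the example figure from the text.

```c
#include <stddef.h>
#include <stdint.h>

#define UART_FIFO_DEPTH 64u  /* TX FIFO size in bytes */
#define TX_LEVEL_GUARD  62u  /* stop writing once the FIFO level exceeds this */

/* Hypothetical hardware hooks, so the logic can be exercised off-target. */
typedef struct {
    uint32_t (*tx_fifo_level)(void *ctx);        /* would read TXFIFO_LVL */
    void     (*put_byte)(void *ctx, uint8_t b);  /* would write THR */
    void     *ctx;
} uart_tx_ops;

/* Called from the THR interrupt: writes at most max_burst bytes from buf,
 * stops early if the guard trips, and returns the number of bytes written. */
size_t uart_tx_fill(const uart_tx_ops *ops, const uint8_t *buf,
                    size_t len, size_t max_burst)
{
    size_t n = 0;
    while (n < len && n < max_burst &&
           ops->tx_fifo_level(ops->ctx) <= TX_LEVEL_GUARD) {
        ops->put_byte(ops->ctx, buf[n]);
        n++;
    }
    return n;
}

/* Simulated FIFO that never drains, standing in for the real hardware. */
typedef struct {
    uint32_t level;
    uint8_t  data[UART_FIFO_DEPTH];
} fake_fifo;

uint32_t fake_level(void *ctx)
{
    return ((fake_fifo *)ctx)->level;
}

void fake_put(void *ctx, uint8_t b)
{
    fake_fifo *f = (fake_fifo *)ctx;
    if (f->level < UART_FIFO_DEPTH) {
        f->data[f->level++] = b;
    }
}
```

With the simulated FIFO, a first call with `max_burst = 47` on an empty FIFO writes 47 bytes, and a second call is cut short by the guard once the level reaches 63.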

When there is nothing to send, the THR interrupt is disabled. It is enabled again when we have something to send. (I also had the interrupt disabled for the duration of writing to the FIFO like they had in StarterWare example, but it did not have any impact on this issue or anything else).

This approach and the driver is verified as follows:

- it can perform hundreds of thousands of transfers (overnight) of different lengths over UART at 115200 bps, 1 stop bit, no flow control, showing no fails. In two different test cases, these transfers are either back to back or separated by a short duration (of RX over UART0).

- debug code shows that the approach works, i.e. the FIFO is being fed so that it neither underflows nor overflows. (I write a special character into the stream each time the FIFO is written, so I can see how often I am feeding it. This is indirect evidence, but as the pattern makes sense I would say it has some value in proving it works. With the FIFO empty at the start, the first write contains more bytes; while feeding the UART, the number of bytes written depends on where I have set the trigger; and the final write can of course be shorter again.)

- by removing the guard mechanism (TXFIFO_LVL ) I can make UART0 send out garbage (this is supposed to prove that it does not overflow normally)

However, the problem starts when the driver is used for sending out buffers periodically, with a pause of approximately one second between transmissions. The line is silent for ~1 second and then we start the transmission.

Occasionally, but not always, the first transfer attempt after a 1 second pause results in sending out garbage. The only difference between this write and the rest of the writes is that it is the first one in the row. (It used to consist of more bytes than the rest because the FIFO was empty initially, but I reduced the length of the first FIFO fill to the level of the typical later fills, so this is also ruled out.) The subsequent FIFO writes again yield fully successful transmissions. The frequency of this failure varies, but generally there is a fail every few seconds, e.g. 1 out of 5 buffer writes, which happen at 1 second intervals, fails.

My colleague had a look into this situation with a scope and:

- UART0 output on the BBB (before going to USB cable)

- Timing and voltages look OK. 

- there appear to be extra bits between characters at the beginning of a packet

- The bits are well-defined with regard to timing and voltage as if the UART is really sending them.

Several things were tried out to rule out possible root causes:

- Writing just one byte at a time using the UARTCharPut (and not UARTFIFOCharPut), it still failed (this time, only the first byte after the 1 second pause was garbage). This was a very surprising result as this approach effectively means that we don't write to the FIFO until it is completely empty.

- different combinations of TX_FIFO_TRIG and TXFIFO_LVL (as well as TXFIFOFULL, and other limitations to number of bytes written) were tried out

Finally, it turns out that setting TX_FIFO_TRIG to 0 (!) or 1 removes this instability from the system. (Also value 2 appears to work but this has not been verified properly). 3 and above, it starts failing again.

The only difference to the trial with UARTCharPut (as described above) is the TX_FIFO_TRIG setting, and the fact that in the UARTCharPut case we (potentially) come to polling the FIFO empty bit (also tried the shift register empty condition) a bit earlier.

Based on this, the TX_FIFO_TRIG values >1 (or maybe 2) behave in a way that I do not yet understand.

(Or else the system does not tolerate accesses to the UART0 HW whenever the FIFO is in the process of writing into the transmit shift register and/or out of the chip.)

Would you have a theory what could have caused this behaviour? 

There is the advisory 1.0.12 UART: Extra Assertion of FIFO Transmit DMA Request, UARTi_DMA_TX.

Also, some Linux discussions which I found via Google indicate that another issue might be present:

<citation starts>

At least on AM335x the following problem exists: Even if the TX FIFO is
empty and a TX transfer is programmed (and started) the UART does not
trigger the DMA transfer.
After $TRESHOLD number of bytes have been written to the FIFO manually the
UART reevaluates the whole situation and decides that now there is enough
room in the FIFO and so the transfer begins.
This problem has not been seen on DRA7 or beaglebone (OMAP3). I am not
sure if this is UART-IP core specific or DMA engine.

The workaround is to use a threshold of one byte, program the DMA
transfer minus one byte and then to put the first byte into the FIFO to
kick start the transfer.

<citation ends>

These are both related to using DMA, which we do not use. However, the latter case bears some similarity to our case, with the FIFO being empty and having difficulties in making a successful transmission. (In our case, if for some reason the FIFO did not start running correctly when we first wrote to it, we could actually overflow the system by the subsequent writes, and by overflowing it on purpose I can make it behave similarly. However, I emphasise that this is all speculation based on superficial similarities.)

Do you have any further information on any possibility of issues in TX FIFO interrupt generation also in non-DMA use case?

  • Hi,

    What software are you using?

  • I'll admit I only skimmed through your post since I'm a bit busy atm, but I'd like to point you to this thread where I helped someone with UART issues and along the way explained many ugly details about the UART, made a diagram of all modes of the uart register interface, and also discovered that the AM335x StarterWare uart driver does some pretty idiotic things (and the echo example compiles to a deadloop when using gcc with -O2 or higher).

    The erratum you mentioned only applies when using DMA.  As far as I can tell, the UART "works fine" in interrupt mode for all combinations of settings I tested (if you don't consider the messy register interface and irq reporting with a few decades of historical baggage to be a defect).  However, do see my comment about the omap4/omap5 i202 erratum in that thread, I don't know whether it applies to the am335x but it doesn't hurt to perform the workaround anyway.

    (I've also started working on making a proper uart example, but other stuff has been taking priority so far.)

  • I should note that the init procedure I give early in the thread is still more complicated than necessary; as my understanding of the uart progressed further during and after the thread it is now:

    • reset module, wait until done, set sysconfig to desired value (typically 0x11)
    • set LCR to 0xBF
    • configure bitrate, EFR, and if applicable xon/xoff registers as desired. Always set EFR.4.
    • set LCR to its final value (i.e. the desired frame format settings)
    • set MCR.6 to enable access to fifo threshold levels (also a good time to set MCR.5 to desired value)
    • configure and reset fifos (SCR, TCR, TLR, FCR)
    • take serdes out of reset (MDR1)
    • apply i202 workaround: wait briefly, then reset fifos again (FCR)

    uart is now operational, you can enable irqs as desired.  I'm leaving MCR.6 enabled since it can actually be quite useful to change those thresholds on-the-fly (and based on my testing the uart is perfectly okay with it). You can't access MSR as a result, but almost nobody cares about it anyway.  (if you do care, you can toggle MCR.6 to access it, but to be honest its job is actually better performed using GPIO)

    note that setting LCR.7 is forbidden during uart operation, so if you need to change any settings that you cannot access in operational mode the best thing is just wait until the transmitter is idle and then fully reinitialize the uart using the same procedure.
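The init steps listed above can be sketched in C as follows. Treat this as a hedged illustration rather than a verified driver: the register byte offsets and the SOFTRESET/RESETDONE bit positions are as I recall them from the AM335x TRM and should be checked there, and the `uart_bus` hooks, `uart_cfg` fields and `uart_log` recorder are hypothetical names introduced only so that the write ordering can be checked off-target.

```c
#include <stdint.h>

/* Register byte offsets (assumed; verify against the TRM). */
enum {
    UART_DLL = 0x00, UART_DLH = 0x04, UART_EFR = 0x08, UART_FCR = 0x08,
    UART_LCR = 0x0C, UART_MCR = 0x10, UART_TCR = 0x18, UART_TLR = 0x1C,
    UART_MDR1 = 0x20, UART_SCR = 0x40, UART_SYSC = 0x54, UART_SYSS = 0x58
};

/* Hypothetical bus hooks: register write, register read, short delay. */
typedef struct {
    void     (*wr)(void *ctx, uint32_t off, uint32_t val);
    uint32_t (*rd)(void *ctx, uint32_t off);
    void     (*delay)(void *ctx);
    void     *ctx;
} uart_bus;

/* Caller-chosen settings; fcr holds only the threshold bits. */
typedef struct {
    uint16_t divisor;
    uint8_t  lcr, scr, tcr, tlr, fcr;
} uart_cfg;

void uart_init(const uart_bus *b, const uart_cfg *c)
{
    uint32_t fcr = (uint32_t)c->fcr | 0x07u;            /* FIFO enable + clear RX/TX */

    b->wr(b->ctx, UART_SYSC, 0x02);                     /* reset module */
    while ((b->rd(b->ctx, UART_SYSS) & 0x01u) == 0) { } /* wait until done */
    b->wr(b->ctx, UART_SYSC, 0x11);                     /* sysconfig */
    b->wr(b->ctx, UART_LCR, 0xBF);                      /* LCR = 0xBF */
    b->wr(b->ctx, UART_DLL, c->divisor & 0xFFu);        /* bitrate */
    b->wr(b->ctx, UART_DLH, (uint32_t)c->divisor >> 8);
    b->wr(b->ctx, UART_EFR, 0x10);                      /* always set EFR.4 */
    b->wr(b->ctx, UART_LCR, c->lcr & 0x7Fu);            /* final frame format, LCR.7 = 0 */
    b->wr(b->ctx, UART_MCR, 0x40);                      /* MCR.6; clears other MCR bits */
    b->wr(b->ctx, UART_SCR, c->scr);
    b->wr(b->ctx, UART_TCR, c->tcr);
    b->wr(b->ctx, UART_TLR, c->tlr);
    b->wr(b->ctx, UART_FCR, fcr);                       /* configure and reset FIFOs */
    b->wr(b->ctx, UART_MDR1, 0x00);                     /* leave disabled mode */
    b->delay(b->ctx);                                   /* i202 workaround: wait briefly */
    b->wr(b->ctx, UART_FCR, fcr);                       /* ...then reset FIFOs again */
}

/* Off-target recorder: logs every write and reports reset as complete. */
typedef struct {
    uint32_t off[32], val[32];
    int n;
} uart_log;

void log_wr(void *ctx, uint32_t off, uint32_t val)
{
    uart_log *l = (uart_log *)ctx;
    if (l->n < 32) {
        l->off[l->n] = off;
        l->val[l->n] = val;
        l->n++;
    }
}

uint32_t log_rd(void *ctx, uint32_t off)
{
    (void)ctx;
    (void)off;
    return 0x01u;
}

void log_delay(void *ctx)
{
    (void)ctx;
}
```

On real hardware the hooks would wrap volatile accesses to the UART base address. Since FCR is write-only, the full FCR value is written again in the final step rather than read back and modified.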

  • Matthijs,

    Thank You for taking the time and pointing me to that earlier thread. I was following many of the same paths and also spotted the errors in the uartEcho example code (which I however did not use). I was also suspicious enough to go through the fifoConfig code and all the register bit fields bit by bit.

    However, I was not suspicious enough to question the fundamental approach in their interrupt enable function, which you criticised for accessing the LCR and going to configuration mode. I now have strong reason to believe that this was the root cause. I simplified my interrupt disable and enable to only modify bit 1 (1<<1), and the malfunction disappears. I have been running the code for an hour, with TX_FIFO_TRIG set at 57, and haven't seen a fail.
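A minimal sketch of such a simplified enable/disable pair is shown below. It assumes that bit 1 of IER is the THR interrupt enable and that IER is reachable in operational mode without touching LCR; the function and macro names are hypothetical, and the caller passes in the address of the IER register.

```c
#include <stdint.h>

#define UART_IER_THR_IT (1u << 1)  /* assumed: THR interrupt enable is IER bit 1 */

/* Enable the THR interrupt without touching LCR or entering config mode. */
void uart_thr_irq_enable(volatile uint32_t *ier)
{
    *ier |= UART_IER_THR_IT;
}

/* Disable the THR interrupt, leaving all other IER bits unchanged. */
void uart_thr_irq_disable(volatile uint32_t *ier)
{
    *ier &= ~UART_IER_THR_IT;
}
```

The read-modify-write is not atomic, so if other contexts also modify IER the caller would need to protect the access.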

    Once more: thanks for your help!

    Thanks to this, we now also have a thread on this issue on Sitara / AM335x side.

    Additional remarks to TI:

    Regarding spruh73k.pdf, I get a strong feeling that the individual who wrote the spec had the same misconception regarding the operation of TX_FIFO_TRIG that I had earlier. Going through Matthijs's messages in the other thread, I once again checked the spec for these, and there are two places in the spec where they seem to determine the TX_FIFO_TRIG operation:

    - Figure 19-6 "TX Fifo Interrupt Request Generation", clearly suggests that the interrupt request is generated based on the FIFO threshold value and not on the "number of spaces" value as it should. All the wordings suggest we are triggering based on the FIFO threshold value.  

    - Chapter 19.3.6.2 FIFO Interrupt Mode. "These interrupts are raised when the RX/TX FIFO threshold (the UARTi.UART_TLR[7:4] RX_FIFO_TRIG_DMA and UARTi.UART_TLR[3:0] TX_FIFO_TRIG_DMA bit fields or the UARTi.UART_FCR[7:6] RX_FIFO_TRIG and UARTi.UART_FCR[5:4] TX_FIFO_TRIG bit fields, respectively) is reached."

  • Yeah, as the garbled final character in my screenshot shows, setting bit 7 of LCR causes the UART to cease operation even if it's in the middle of transmitting a character, so that interrupt-enable function in StarterWare's driver is playing Russian roulette with the data stream.

    The silly thing is that it only seems to be doing this because of the insistence of keeping bit 4 of EFR cleared most of the time, even though there is absolutely no benefit in doing so as far as I can tell.  The only effect clearing EFR.4 has is decreasing the amount of accessible functionality of the UART, and I'm pretty sure the only reason the bit exists at all and is 0 by default is for backward compatibility reasons.  (With extra emphasis on backward, like a decade or two or so. Pretty much all functionality of this UART except CIR was already present in the OMAP1 version.)

    The trigger level confusion is why I proposed (and consistently used) the replacement terminology of "read/write burst size" for the parameters programmed via TLR and the upper half of FCR, since the official names are quite misleading: the actual transmit fifo trigger level is 64 minus the write burst size (the programmed value).  The DMA suffix is of course also bogus as they also apply to non-dma operation, and in fact for transmit dma the trigger level needs to be programmed in a separate register due to an erratum (in this case it's the actual fifo level, not the number of free spaces).
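As a small worked example of that relationship, assuming a 64-byte TX FIFO and a write burst size already expressed in bytes (the function name is hypothetical):

```c
#include <stdint.h>

#define UART_FIFO_DEPTH 64u  /* TX FIFO size in bytes */

/* Fill level at or below which the TX interrupt is raised, given the
 * programmed write burst size (number of free spaces) in bytes. */
uint32_t tx_irq_fill_level(uint32_t write_burst)
{
    return UART_FIFO_DEPTH - write_burst;
}
```

With the value 57 mentioned earlier in the thread, the interrupt would fire once the FIFO has drained to 7 bytes or fewer, at which point at least 57 spaces are free and up to 57 bytes can be written in one burst.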