RTOS/EK-TM4C1294XL: NDK in TI-RTOS for TivaC 2.16.1.14 can get into a state where unable to transmit packets

Chester Gillon

Part Number: EK-TM4C1294XL

Tool/software: TI-RTOS

Following RTOS/TM4C129XNCZAD: Is there a way to reset Ethernet Phy and NDK?, which describes an intermittent failure of Ethernet communication, I created an Ethernet stress test of TI-RTOS for TivaC 2.16.1.14 using a modified tcpEcho_EK_TM4C1294XL_TI_TivaTM4C1294NCPDT example, changed to allow the data rate to be increased. When running four instances of the tcpSendReceive program with a data size of 4380 and zero delay, the Tiva Ethernet transmits and receives about 23 Mbits/second, but after anywhere from several seconds to tens of minutes the test stops because the Tiva is unable to transmit Ethernet packets. Watching the 'EMACSnow.c'::EMACSnow_private EMAC driver statistics in the CCS Expressions view shows that Ethernet packets can still be received (rxCount and isrCount increment) but cannot be transmitted (txSent doesn't change and txDropped increments).

The EMACSnow.c driver in TI-RTOS handles interrupts by:

a) EMACSnow_hwiIntFxn() calls EMACIntStatus() to read the interrupt status, and then EMACIntClear() to clear the interrupt status.

b) EMACSnow_hwiIntFxn() disables the interrupts.

c) EMACSnow_hwiIntFxn() stores the interrupt status in the global variable g_ulStatus.

d) EMACSnow_hwiIntFxn() calls Swi_post() to have the Swi handle the incoming packets.

e) EMACSnow_handlePackets(), which is the Swi handler, reads g_ulStatus to determine whether to process the transmit and/or receive interrupts.

f) EMACSnow_handlePackets() re-enables the interrupts.

Investigating the cause of the test failure, I found a condition under which EMACSnow_hwiIntFxn() can be called twice before EMACSnow_handlePackets() runs, which can lead to the failure. For example, the sequence:

1) All four transmit descriptors are in use.

2) EMACSnow_hwiIntFxn() is called and sets g_ulStatus to 0x10041 meaning transmit and receive interrupts are pending.

3) EMACSnow_hwiIntFxn() is called again and there are no pending interrupts resulting in g_ulStatus being set to zero.

4) EMACSnow_handlePackets() runs, but since g_ulStatus is zero it concludes that no transmit or receive interrupts need to be processed.

5) The transmit descriptors have completed their transmission, and the EMAC peripheral doesn't generate any more transmit interrupts.

6) Since the pending transmit interrupt from 2) has been lost EMACSnow_processTransmitted() is not called.

7) The Tiva NDK is stuck thinking all transmit descriptors are in use and waiting for transmission to complete, so all further attempts to queue pbufs for transmission fail, with EMACSnow_private.txDropped being incremented.

The following capture from Data Variable Tracing on the g_ulStatus shows the problem of a lost transmit interrupt occurring:

In normal use, for each interrupt handled there should be one write to g_ulStatus from EMACSnow_hwiIntFxn() and two reads of g_ulStatus from EMACSnow_handlePackets (the first read tests for transmit interrupts and the second read tests for receive interrupts). The highlighted line in the trace above is EMACSnow_hwiIntFxn() over-writing the previous value indicating transmit and receive interrupts are pending with a zero, before the interrupts have been handled by EMACSnow_handlePackets().

The modified TCP echo example which showed the failure is attached, and the readme file describes the changes made and the investigation into the failure mode.

7080.tcpEcho_EK_TM4C1294XL_TI_TivaTM4C1294NCPDT.zip

over 7 years ago

0 Chester Gillon over 7 years ago

Guru 92251 points

Chester Gillon said:
By investigating the cause of the test failure found a condition under which EMACSnow_hwiIntFxn() can be called twice before EMACSnow_handlePackets() is run which can lead to the failure.

Rather than using a global variable to communicate the pending interrupts from the Hwi to the Swi, a "trigger" can be used to pass the pending interrupts. Using Swi_or() instead of Swi_post() ensures that if the Hwi gets called more than once before the Swi runs, no pending interrupts are lost.

The changes to the TI-RTOS EMACSnow.c are:

a) Delete the g_ulStatus global variable.

b) Make EMACSnow_handlePackets() call Swi_getTrigger() to get the pending interrupts, rather than reading a global variable.

c) Make EMACSnow_hwiIntFxn() call Swi_or() to set the pending interrupts passed to the Swi.

d) The disabling of interrupts in EMACSnow_hwiIntFxn() and the re-enabling of interrupts in EMACSnow_handlePackets() can also be deleted, because the use of the Swi trigger prevents pending interrupts from being lost.

With these changes, the test which previously failed in about 10 minutes has now been running for 5.5 hours without failure, having echoed 52 Gbytes of data over TCP.

The updated example is attached, which includes the modified EMACSnow.c:

1033.tcpEcho_EK_TM4C1294XL_TI_TivaTM4C1294NCPDT.zip

Compared to the EMACSnow.c from TI-RTOS 2.16.1.14 the changes are:

a) Those described above.

b) The fix for TivaC NDK TCP: Can't receive large packets (>=1460 bytes)

c) Added some diagnostics to EMACSnow_private to record which types of abnormal interrupts have occurred (rather than just the count of abnormal interrupts). This shows that in a 5.5 hour stress test there have been 120 abnormal interrupts due to EMAC_INT_RX_NO_BUFFER. Given that there are four receive descriptors and the test is trying to echo data as quickly as possible over four TCP sockets, this is not a problem, just an occasional receive packet loss which TCP is designed to cope with.

0 ToddMullanix over 7 years ago in reply to Chester Gillon

TI__Guru* 96960 points

Hi Chester,

Thanks for the analysis. We'll look into it on Monday.

Todd

0 Gerardo Gomez Martinez over 7 years ago in reply to Chester Gillon

TI__Expert 5825 points

Hi Chester,

Thank you for bringing this to our attention. A bug has been filed internally to address this.

Thanks,
Gerardo

0 Chester Gillon over 7 years ago in reply to Gerardo Gomez Martinez

Guru 92251 points

Gerardo Gomez Martinez said:
A bug has been filed internally to address this.

Thanks for raising the bug.

While my suggested modification to the Tiva EMACSnow.c changed how the Hwi and Swi communicated, comparing the tirtos_tivac_2_16_01_14\products\tidrivers_tivac_2_16_01_13\packages\ti\drivers\emac\ and simplelink_msp432e4_sdk_2_10_00_17\source\ti\drivers\emac\EMACMSP432E4.c files shows that the MSP432E4 SDK:

a) Handles the EMAC interrupts in the Hwi.

b) Doesn't create a Swi.

Therefore, the MSP432E4 EMAC driver shouldn't suffer the Tiva TI-RTOS bug identified in this thread.

As far as I can tell, the MSP432E devices are a subset of the TM4C129 devices, sharing the same peripherals. Therefore, the MSP432E4 SDK may contain bug fixes and improvements which could be ported to the Tiva TI-RTOS.
