Tool/software: TI-RTOS
Following the thread "RTOS/TM4C129XNCZAD: Is there a way to reset Ethernet Phy and NDK?", which describes an intermittent failure of Ethernet communication, I created an Ethernet stress test for TI-RTOS for TivaC 2.16.1.14, using a modified tcpEcho_EK_TM4C1294XL_TI_TivaTM4C1294NCPDT example with changes that allow the data rate to be increased. When running four instances of the tcpSendReceive program with a data size of 4380 and zero delay, the Tiva Ethernet transmits and receives about 23 Mbits/second, but after anywhere from several seconds to tens of minutes the test stops because the Tiva can no longer transmit Ethernet packets. Watching the 'EMACSnow.c'::EMACSnow_private EMAC driver statistics in the CCS Expressions view shows that Ethernet packets are still being received (rxCount and isrCount increment) but cannot be transmitted (txSent doesn't change and txDropped increments).
The EMACSnow.c driver in TI-RTOS handles interrupts by:
a) EMACSnow_hwiIntFxn() calls EMACIntStatus() to read the interrupt status, and then EMACIntClear() to clear the interrupt status.
b) EMACSnow_hwiIntFxn() disables the interrupts.
c) EMACSnow_hwiIntFxn() stores the interrupt status in the global variable g_ulStatus.
d) EMACSnow_hwiIntFxn() calls Swi_post() to have the Swi handle the incoming packets.
e) EMACSnow_handlePackets(), which is the Swi handler, reads g_ulStatus to determine whether to process the transmit and/or receive interrupts.
f) EMACSnow_handlePackets() re-enables the interrupts.
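The steps above can be sketched as a small host-side model. This is not the EMACSnow.c source: the functions model_hwi() and model_swi(), the variables other than g_ulStatus, and the two bit masks are stand-ins assumed for illustration (the masks are chosen so that transmit plus receive fits the 0x10041 value quoted later in this post).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define INT_TRANSMIT 0x00000001u
#define INT_RECEIVE  0x00000040u

/* Host-side stand-ins for hardware and kernel state (all hypothetical). */
static uint32_t hw_pending;          /* models the EMAC interrupt status  */
static bool     ints_enabled = true; /* models the EMAC interrupt enable  */
static bool     swi_posted;          /* models a pending Swi              */
static uint32_t g_ulStatus;          /* same name as the driver's global  */
static unsigned tx_processed;
static unsigned rx_processed;

/* Steps a) to d): models what EMACSnow_hwiIntFxn() is described as doing. */
void model_hwi(void)
{
    uint32_t status = hw_pending;    /* a) read the interrupt status ...  */
    hw_pending = 0;                  /*    ... and clear it               */
    ints_enabled = false;            /* b) disable the interrupts         */
    g_ulStatus = status;             /* c) plain store: replaces old value */
    swi_posted = true;               /* d) post the Swi                   */
}

/* Steps e) and f): models what EMACSnow_handlePackets() is described as doing. */
void model_swi(void)
{
    if (g_ulStatus & INT_TRANSMIT) { /* e) first read: transmit work?     */
        tx_processed++;
    }
    if (g_ulStatus & INT_RECEIVE) {  /* e) second read: receive work?     */
        rx_processed++;
    }
    swi_posted = false;
    ints_enabled = true;             /* f) re-enable the interrupts       */
}
```

Note that step c) is a plain assignment, so whatever the Hwi stored on a previous call is replaced rather than merged.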
While investigating the cause of the test failure, I found a condition under which EMACSnow_hwiIntFxn() can be called twice before EMACSnow_handlePackets() runs, which can lead to the failure. For example, the sequence:
1) All four transmit descriptors are in use.
2) EMACSnow_hwiIntFxn() is called and sets g_ulStatus to 0x10041 meaning transmit and receive interrupts are pending.
3) EMACSnow_hwiIntFxn() is called again and there are no pending interrupts resulting in g_ulStatus being set to zero.
4) EMACSnow_handlePackets() runs, but since g_ulStatus is zero it concludes that no transmit or receive interrupts need to be processed.
5) The transmit descriptors have completed their transmission, and the EMAC peripheral doesn't generate any more transmit interrupts.
6) Since the pending transmit interrupt from 2) has been lost EMACSnow_processTransmitted() is not called.
7) The Tiva NDK is stuck thinking all transmit descriptors are in use and waiting for transmission to complete, so all further attempts to queue pbufs for transmission fail and EMACSnow_private.txDropped is incremented.
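Steps 2) to 4) of this sequence can be replayed with a few lines of C. This is a minimal sketch rather than driver code: latch_status(), replay_double_hwi(), transmit_seen() and the store_policy enum are hypothetical names, and INT_TRANSMIT is an assumed bit position. The STORE_ACCUMULATE policy is included only as a contrast to show that the loss comes from the overwrite; it is not taken from TI-RTOS and is not a tested fix.

```c
#include <stdbool.h>
#include <stdint.h>

#define INT_TRANSMIT 0x00000001u /* assumed bit position, for illustration */

typedef enum {
    STORE_OVERWRITE,  /* saved = fresh, as described for the Hwi      */
    STORE_ACCUMULATE  /* saved |= fresh, a hypothetical contrast case */
} store_policy;

/* How one Hwi call updates the saved status under a given policy. */
static uint32_t latch_status(uint32_t saved, uint32_t fresh, store_policy p)
{
    return (p == STORE_ACCUMULATE) ? (saved | fresh) : fresh;
}

/* Replays two Hwi calls with no Swi run in between and returns the
 * status value the Swi would then read. */
uint32_t replay_double_hwi(store_policy p)
{
    uint32_t saved = 0;
    saved = latch_status(saved, 0x10041u, p); /* step 2: tx + rx pending */
    saved = latch_status(saved, 0x0u, p);     /* step 3: nothing pending */
    return saved;                             /* step 4: what Swi sees   */
}

/* True if the Swi would still see the transmit interrupt from step 2. */
bool transmit_seen(store_policy p)
{
    return (replay_double_hwi(p) & INT_TRANSMIT) != 0;
}
```

With STORE_OVERWRITE the Swi reads zero and the transmit event from step 2) is gone; with STORE_ACCUMULATE it still reads 0x10041. An accumulating scheme would also need the Swi to read and clear the saved status atomically with respect to the Hwi, which this sketch does not model.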
The following capture from Data Variable Tracing on g_ulStatus shows a lost transmit interrupt occurring:
In normal use, for each interrupt handled there should be one write to g_ulStatus from EMACSnow_hwiIntFxn() and two reads of g_ulStatus from EMACSnow_handlePackets() (the first read tests for transmit interrupts and the second read tests for receive interrupts). The highlighted line in the trace above is EMACSnow_hwiIntFxn() overwriting the previous value, which indicated that transmit and receive interrupts were pending, with a zero before those interrupts have been handled by EMACSnow_handlePackets().
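The one-write, two-reads pattern lends itself to an automated check over an exported trace. The sketch below assumes a hypothetical record format (the access struct and access_kind enum are not a CCS export format) and a hypothetical function, find_lost_status(), that flags a write replacing a nonzero value before two reads have occurred.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trace record: one access to g_ulStatus. */
typedef enum { ACC_WRITE, ACC_READ } access_kind;

typedef struct {
    access_kind kind;
    uint32_t    value; /* value written or read */
} access;

/* Returns the index of the first write that replaces a nonzero status
 * before the Swi has performed its two reads, or -1 if there is none. */
int find_lost_status(const access *trace, size_t n)
{
    uint32_t last_written = 0;
    unsigned reads_since_write = 2; /* nothing outstanding at the start */

    for (size_t i = 0; i < n; i++) {
        if (trace[i].kind == ACC_READ) {
            reads_since_write++;
            continue;
        }
        /* A write: was the previous nonzero status fully consumed? */
        if (reads_since_write < 2 && last_written != 0) {
            return (int)i;
        }
        last_written = trace[i].value;
        reads_since_write = 0;
    }
    return -1;
}
```

Applied to a trace of the form write 0x10041, write 0, read, read, the function reports index 1, which corresponds to the overwriting write described above.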
The modified TCP echo example which showed the failure is attached, and the readme file describes the changes made and the investigation into the failure mode.
7080.tcpEcho_EK_TM4C1294XL_TI_TivaTM4C1294NCPDT.zip