TM4C129ENCPDT: Ethernet port becomes unreliable

Part Number: TM4C129ENCPDT

Hi,

We have an Ethernet network product that uses the TM4C129ENCPDT. There is an issue whose cause we need to find.

From time to time the device becomes unreliable at receiving and transmitting Ethernet packets.

We have a testing device that constantly sends test packets to another device over Ethernet. The tested device simply bounces each test packet back so that the testing device can compare the number of packets sent with the number received. If the two numbers don't match, packets have been lost.

When the issue happens, we see many lost packets during testing.
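The loopback test described above amounts to comparing two counters. A minimal sketch of that bookkeeping follows; the type and function names are hypothetical and are not taken from the actual test device firmware.

```c
#include <stdint.h>

/* Hypothetical counters kept by the testing device. */
typedef struct {
    uint32_t sent;      /* test packets transmitted */
    uint32_t received;  /* echoed packets that came back */
} echo_stats_t;

/* Call once for every test packet handed to the Ethernet driver. */
void echo_on_send(echo_stats_t *s)
{
    s->sent++;
}

/* Call once for every echoed packet received from the tested device. */
void echo_on_receive(echo_stats_t *s)
{
    s->received++;
}

/* Packets not yet echoed back. Only meaningful as "lost" once the test
 * has stopped sending and in-flight packets have had time to return. */
uint32_t echo_lost(const echo_stats_t *s)
{
    return (s->sent >= s->received) ? (s->sent - s->received) : 0u;
}
```

A nonzero `echo_lost()` result after the test has gone idle corresponds to the mismatch described above.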

One interesting thing is that, without resetting the device (i.e., without a firmware reset), we can trigger a procedure in the device from outside through a command. This procedure resets the Ethernet port. After it runs, the issue clears and no lost packets are detected.

It looks like the issue is related not to the firmware but to the Ethernet controller hardware. My question is: what could go wrong inside the Ethernet MAC, and what can we do in firmware to prevent it?

The procedure that can cure the issue is as follows:

void
InitializeEthernet()
{
    //
    // Enable and reset the Ethernet modules.
    //
    SysCtlPeripheralEnable(SYSCTL_PERIPH_EMAC0);
    SysCtlPeripheralEnable(SYSCTL_PERIPH_EPHY0);
    SysCtlPeripheralReset(SYSCTL_PERIPH_EMAC0);
    SysCtlPeripheralReset(SYSCTL_PERIPH_EPHY0);
    //
    // Wait for the MAC to be ready.
    //
    while(!SysCtlPeripheralReady(SYSCTL_PERIPH_EMAC0)) {
        WdogTemporyFeed();
    }
    //
    // Configure for use with the internal PHY.
    //
    EMACPHYConfigSet(EMAC0_BASE, (EMAC_PHY_TYPE_INTERNAL | EMAC_PHY_INT_MDIX_EN | EMAC_PHY_AN_100B_T_FULL_DUPLEX));
    //
    // Reset the MAC to latch the PHY configuration.
    //
    EMACReset(EMAC0_BASE);
    //
    // Initialize the MAC and set the DMA mode.
    //
    EMACInit(EMAC0_BASE, SysClock, EMAC_BCONFIG_MIXED_BURST | EMAC_BCONFIG_PRIORITY_FIXED, 4, 4, 0);
    //
    // Set MAC configuration options.
    //
    EMACConfigSet(EMAC0_BASE, ( EMAC_CONFIG_FULL_DUPLEX | EMAC_CONFIG_CHECKSUM_OFFLOAD | EMAC_CONFIG_7BYTE_PREAMBLE |
                                EMAC_CONFIG_IF_GAP_96BITS | EMAC_CONFIG_USE_MACADDR0 | EMAC_CONFIG_SA_FROM_DESCRIPTOR |
                                EMAC_CONFIG_BO_LIMIT_1024),
                              ( EMAC_MODE_RX_STORE_FORWARD | EMAC_MODE_TX_STORE_FORWARD | EMAC_MODE_TX_THRESHOLD_64_BYTES |
                                EMAC_MODE_RX_THRESHOLD_64_BYTES), 0);
    //
    // Initialize the Ethernet DMA descriptors.
    //
    InitDescriptors(EMAC0_BASE);
    //
    // Program the hardware with its MAC address (for filtering).
    //
    EMACAddrSet(EMAC0_BASE, 0, MacAddr);

    //
    // Enable the Ethernet RX Packet interrupt source.
    //
    EMACIntEnable(EMAC0_BASE, EMAC_INT_RECEIVE | EMAC_INT_PHY);

    SysCtlPeripheralEnable(SYSCTL_PERIPH_GPIOF);
    GPIOPinTypeEthernetLED(GPIO_PORTF_BASE, GPIO_PIN_0 | GPIO_PIN_1);
    GPIOPinConfigure(GPIO_PF0_EN0LED0);
    GPIOPinConfigure(GPIO_PF1_EN0LED2);

    EMACPHYExtendedWrite(EMAC0_BASE,0,EPHY_LEDCR,EPHY_LEDCR_BLINKRATE_20HZ);
    EMACPHYExtendedWrite(EMAC0_BASE,0,EPHY_LEDCFG,EPHY_LEDCFG_LED0_LINK | EPHY_LEDCFG_LED2_RXTX);

    //
    // The driver library is not correct here: this bit actually controls
    // the polarity of the LED indicator.
    //
    HWREG(EMAC0_BASE + EMAC_O_CC) |= EMAC_CC_ECEXT;
}

  • Hi,

      I suppose InitializeEthernet() is the same routine both when the EMAC is first initialized and when you call it again through an external command. Is that correct?

      Can you reproduce the issue on a LaunchPad running the same firmware? Do you see the lost packets?

      If the LaunchPad works, can you check whether the schematic for the Ethernet interface differs in any way? Did you follow section 4.1 of the TM4C129 system design guidelines? https://www.ti.com/lit/pdf/spma056

     

  • Hi Charles,

    Yes, it is the same procedure that is called at startup and through the external command.

    I cannot run the firmware on a LaunchPad since the hardware is too different. I plan to experiment with the procedure invoked by the command to find out whether this is related to the MAC or the PHY. Since the issue rarely happens, it may take some time to get results.

    In the meantime, an answer to the following question would be helpful:

    Is it possible that the MAC/PHY is stuck in some kind of error state that shows such a symptom?

    Some updates:

    It looks like the issue is more related to the Ethernet transmit side. The device has an audio streaming feature. When the issue occurs, audio streamed from this device and played on another network device crackles, meaning some of the audio packets are lost (not sent out).

    Regards,

    Tianlei

  • Hi Tianlei,

      I'm not really sure why packets are lost. What does Wireshark show? 

      Just to make sure it is not a network issue, can you try a different network, with a different switch or router? Do you see the same issue? How about connecting the MCU to your audio device directly using a crossover cable? Do you see the same issue then?

    Regarding your question of whether the MAC/PHY could be stuck in some kind of error state: you are just losing packets, right? In other words, after some packets are lost, the EMAC continues to send more packets. Is that correct? If so, the MAC/PHY is not stuck. If it were stuck, I think transmission would stop completely.

  • Hi Charles,

    The issue shows up on different networks. Also, we have several products that sell in large numbers, but this issue happens on only one of them. So I am sure that the packets are lost inside the device, not in the network. The crackling sound means packets are sent continuously, with a few missing from time to time.

    Audio packets are scheduled by a timer. Since the issue can be fixed by resetting only the MAC or PHY, it is not a scheduling issue.

    My plan is to isolate whether it is the MAC or the PHY that is in error. I will keep you updated.

    Regards,

    Tianlei

  • Hi Tianlei,

      Yes, please keep me posted with your investigation. Can you confirm that once you send the command to reinitialize EMAC/PHY the transmission will continue indefinitely without dropping packets?

  • The device comes back to normal after re-initialization of EMAC/PHY.

  • OK. Please use Wireshark to examine any difference in network traffic between the first initialization and the second. Can you also reproduce the issue on another board of the same design? This is to determine whether the problem occurs on only one particular board.

  • We saw the issue on multiple boards. The customer reported that this kind of device sometimes went missing from the network. This is probably the same issue, caused by lost replies to the monitoring packets.

  • Hi Charles,

    I have made some progress on the issue. It is not related to the PHY: I used the command to reset the PHY while the issue was present, and it did not help.

    It is very likely that the issue is related to Tx descriptor usage. Although it is hard to reproduce the issue in real operation, I found a way to trigger a fault with almost the same symptoms. It is quite possible that the root cause is the same.

    What I did to trigger the issue was to disrupt the Tx descriptor index. The regular usage of the index (g_ui32TxDescIndex) is as follows:

    #define NUM_TX_DESCRIPTORS 16

    tEMACDMADescriptor g_psTxDescriptor[NUM_TX_DESCRIPTORS] __attribute__ ((aligned (16)));

    int32_t
    PacketTransmit(uint8_t *pui8Buf, int32_t i32BufLen)
    {
        uint32_t next=g_ui32TxDescIndex + 1;
        if(next == NUM_TX_DESCRIPTORS) {
            next=0;
        }

        tEMACDMADescriptor *desc=g_psTxDescriptor + next;

        // Wait until the DMA has released this descriptor.
        while(desc->ui32CtrlStatus & DES0_TX_CTRL_OWN)
        {
        }

        desc->ui32Count = (uint32_t)i32BufLen > 68 ? i32BufLen : 68;
        desc->pvBuffer1 = pui8Buf;
        desc->ui32CtrlStatus = (DES0_TX_CTRL_LAST_SEG | DES0_TX_CTRL_FIRST_SEG |
                                DES0_TX_CTRL_INTERRUPT | DES0_TX_CTRL_IP_ALL_CKHSUMS |
                                DES0_TX_CTRL_CHAINED | DES0_TX_CTRL_OWN);

        g_ui32TxDescIndex=next;

        EMACTxDMAPollDemand(EMAC0_BASE);
        return(i32BufLen);
    }

    I disrupt the index by calling the following procedure through a remote command:

    void
    TxDescriptorIndexChange()
    {
        g_ui32TxDescIndex += 15;
        if(g_ui32TxDescIndex >= NUM_TX_DESCRIPTORS) {
            g_ui32TxDescIndex -= NUM_TX_DESCRIPTORS;
        }
    }

    The above procedure intentionally forces the index to go backwards by 1. As a result, the next packet is put into the wrong descriptor, one that the DMA controller is not looking at, so the packet stays stuck there. However, the next 14 packets are sent normally since the descriptors they use match the DMA controller's position. The 15th packet is sent, followed by the first stuck packet, since the index has wrapped around.

    If my understanding above is right, the issue should recover by itself after 15 packets are sent. But that is not the case: the issue persists indefinitely after the error is injected.

    However, I can clear the issue by triggering the following procedure:

    void
    TxDescriptorReset()
    {
        uint32_t ui32Loop;
        tEMACDMADescriptor *desc;
        for(ui32Loop = 0; ui32Loop < NUM_TX_DESCRIPTORS; ui32Loop++) {
            desc=g_psTxDescriptor + ui32Loop;
            desc->ui32CtrlStatus = (DES0_TX_CTRL_LAST_SEG | DES0_TX_CTRL_FIRST_SEG |
                                    DES0_TX_CTRL_INTERRUPT | DES0_TX_CTRL_CHAINED |
                                    DES0_TX_CTRL_IP_ALL_CKHSUMS);
        }
    }

    At this moment I cannot figure out how the Tx descriptor index gets disrupted in real operation. Interrupts are always disabled whenever PacketTransmit() is called. However, if I can find a simple way to recover after the fault occurs, that may also be a solution for us. So the answer to the following question is important to me:

    Why doesn't the injected error (caused by disrupting the Tx descriptor index) recover by itself?
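One way to test the assumption that PacketTransmit() is never re-entered is to wrap it in a small diagnostic guard that counts nested entries. The sketch below is only an illustration: the names `tx_guard_enter`, `tx_guard_exit` and `tx_guard_reentries` are hypothetical, and the check-and-set is not atomic, so a re-entry that lands exactly between the test and the assignment would go unnoticed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical diagnostic state; not part of the original firmware. */
static volatile bool     g_tx_busy = false;
static volatile uint32_t g_tx_reentry_count = 0;

/* Call at the top of the protected function.
 * Returns true if the caller may proceed, false if the function
 * was already active (i.e. it has been re-entered). */
bool tx_guard_enter(void)
{
    if (g_tx_busy) {
        g_tx_reentry_count++;   /* record the violation for later inspection */
        return false;
    }
    g_tx_busy = true;
    return true;
}

/* Call on every exit path of the protected function. */
void tx_guard_exit(void)
{
    g_tx_busy = false;
}

/* Number of re-entries detected so far. */
uint32_t tx_guard_reentries(void)
{
    return g_tx_reentry_count;
}
```

If the counter ever becomes nonzero during normal operation, the function is being entered from a second context (for example an interrupt handler) while a first call is still in progress.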

    Thanks

  • Hi Tianlei,

      Can you please tell me if you are using the TI-RTOS NDK? In which file is PacketTransmit defined? Is it your own code? If you are using the NDK, can you take a look at this post and see if it applies to you? It seems to describe a problem very similar to yours.

    https://e2e.ti.com/support/microcontrollers/arm-based-microcontrollers-group/arm-based-microcontrollers/f/arm-based-microcontrollers-forum/710149/rtos-tm4c1290ncpdt-ndk-in-ti-rtos-works-intermittently-after-being-shutdown-and-restarted

  • Hi Charles,

    I am not using TI-RTOS NDK. It's a bare metal project. The post does not seem to be related to our issue.

  • Hi Tianlei,

      Are you using any TCP/IP stack? If yes, which one?  Is PacketTransmit your own code? 

  • Hi Charles,

    We do not use a TCP/IP stack. PacketTransmit was copied from somewhere; I can't recall where, probably from one of the TivaWare sample projects.

    I have made more progress now and can more or less explain the behavior.

    The root of the issue (confirmed for both the real issue and the induced issue in my previous post) is that a few unreleased Tx DMA descriptors (DES0_TX_CTRL_OWN always set) are stuck in the descriptor chain. The cause of the unreleased descriptors is unknown.

    There are 16 Tx descriptors in total, chained in a loop. Suppose the 10th descriptor is unreleased and g_ui32TxDescIndex is currently in sync with the DMA controller's current Tx descriptor pointer. Packet transmission is fine until the 9th descriptor is used. At that moment, the new packet is sent, and the packet stuck in the 10th descriptor is sent as well. After this the DMA controller points to the 11th descriptor, but g_ui32TxDescIndex is incremented only once, so it points to the 10th descriptor. The next packet uses the 10th descriptor, but since that position does not match the DMA's, the packet is not sent, and the 10th descriptor becomes stuck again. As a result, any packet that uses the 10th descriptor is delayed by 16 packets and, even worse, by the time it is sent its data has probably been altered by other parts of the code.

    This is why the condition never recovers, and why it shows up as lost packets in testing and as crackling sound when streaming audio.
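The mechanism described above can be checked with a small host-side model of the ring. This is a simplified sketch, not EMAC driver code: it assumes the DMA, on each poll demand, sends every consecutive descriptor it owns starting from its current position and then stops. All names (`ring_model_t`, `ring_transmit`, `ring_inject_fault`, and so on) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_TX_DESCRIPTORS 16

/* Hypothetical host-side model of the Tx ring; none of this is TivaWare API. */
typedef struct {
    bool     own[NUM_TX_DESCRIPTORS];  /* true while the modelled DMA owns the descriptor */
    uint32_t seq[NUM_TX_DESCRIPTORS];  /* sequence number of the packet in each descriptor */
    uint32_t sw_index;                 /* last descriptor used by software */
    uint32_t dma_index;                /* descriptor the DMA examines next */
    uint32_t next_seq;                 /* sequence number for the next queued packet */
    uint32_t highest_sent;             /* highest sequence number sent so far */
    uint32_t sent;                     /* total packets that left the ring */
    uint32_t late;                     /* packets sent after a newer packet */
} ring_model_t;

void ring_init(ring_model_t *r)
{
    memset(r, 0, sizeof(*r));
    r->sw_index = NUM_TX_DESCRIPTORS - 1;  /* first packet goes to descriptor 0 */
    r->next_seq = 1;
}

/* Model of a poll demand: send every consecutive descriptor the DMA owns. */
static void dma_poll(ring_model_t *r)
{
    while (r->own[r->dma_index]) {
        uint32_t s = r->seq[r->dma_index];
        if (s < r->highest_sent) {
            r->late++;                 /* a newer packet already went out */
        } else {
            r->highest_sent = s;
        }
        r->own[r->dma_index] = false;  /* DMA releases the descriptor */
        r->sent++;
        r->dma_index = (r->dma_index + 1) % NUM_TX_DESCRIPTORS;
    }
}

/* Model of PacketTransmit(): queue into the next descriptor, then poll. */
bool ring_transmit(ring_model_t *r)
{
    uint32_t next = (r->sw_index + 1) % NUM_TX_DESCRIPTORS;
    if (r->own[next]) {
        return false;                  /* the real code would spin here */
    }
    r->seq[next] = r->next_seq++;
    r->own[next] = true;
    r->sw_index = next;
    dma_poll(r);
    return true;
}

/* Same disturbance as TxDescriptorIndexChange(): step the index back by one. */
void ring_inject_fault(ring_model_t *r)
{
    r->sw_index = (r->sw_index + NUM_TX_DESCRIPTORS - 1) % NUM_TX_DESCRIPTORS;
}
```

In this model, every packet is sent in order until `ring_inject_fault()` is called. Afterwards, one packet in every 16 is held back and leaves only when the ring wraps around, and the condition never clears on its own, which matches the behavior described above.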

    For now I have a passive workaround for the issue. Every time a packet is sent, the software checks whether the index is out of sync with the DMA pointer and, if so, corrects the index.

    int32_t
    PacketTransmit(uint8_t *pui8Buf, int32_t i32BufLen)
    {
        uint32_t next=g_ui32TxDescIndex + 1;
        if(next == NUM_TX_DESCRIPTORS) {
            next=0;
        }

        tEMACDMADescriptor *desc=g_psTxDescriptor + next;

        // Passive workaround for an unreleased Tx descriptor
        tEMACDMADescriptor *current=EMACTxDMACurrentDescriptorGet(EMAC0_BASE);
        if(desc->DES3.pLink == current) {
            g_ui32TxDescIndex=next;
            return (0);
        }

        while(desc->ui32CtrlStatus & DES0_TX_CTRL_OWN)
        {
        }

        desc->ui32Count = (uint32_t)i32BufLen > 68 ? i32BufLen : 68;
        desc->pvBuffer1 = pui8Buf;
        desc->ui32CtrlStatus = (DES0_TX_CTRL_LAST_SEG | DES0_TX_CTRL_FIRST_SEG |
                                DES0_TX_CTRL_IP_ALL_CKHSUMS |
                                DES0_TX_CTRL_CHAINED | DES0_TX_CTRL_OWN);

        g_ui32TxDescIndex=next;

        EMACTxDMAPollDemand(EMAC0_BASE);
        return(i32BufLen);
    }

    I will continue to investigate what is causing the descriptor not to be released.

    Thanks

  • Hi Tianlei,

      Earlier, I searched for PacketTransmit and couldn't find any reference to it in the TivaWare library. Normally, a TCP/IP stack like lwIP is used to abstract the hardware and to manage data transfers between connected hosts per the TCP/IP protocol. If you are managing the descriptors yourself, the problem is very hard to diagnose because you are dealing directly with EMAC hardware details such as the chained descriptors. There is probably a race condition between the CPU and the DMA over who owns the descriptor. I think that is what you are most likely facing.

  • Hi Charles,

    Is there standard EMAC handling code (not bundled with a TCP/IP stack) that I can use as a reference?

  • Hi Tianlei,

      I suggest you look at the file C:\ti\TivaWare_C_Series-2.2.0.295\third_party\lwip-1.4.1\ports\tiva-tm4c129\netif\tiva_tm4c129.c. It handles both TX and RX descriptor management and includes the driver routines that receive and transmit Ethernet frames.

  • Hi Charles,

    The cause of the issue was found. We have a macro that is meant to disable interrupts by setting the BASEPRI register to a certain value. However, the value we used happened to be zero, which disables the masking function, so the macro had no effect. This left PacketTransmit unprotected, and it was re-entered from time to time.
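The BASEPRI pitfall can be illustrated with two pure helper functions that can be run on a host. This is a sketch under stated assumptions: it assumes the device implements 3 priority bits held in the top bits of the 8-bit priority field, and the helper names `basepri_from_level` and `basepri_blocks` are hypothetical. On Cortex-M, a BASEPRI value of zero disables masking, while a nonzero value blocks interrupts whose priority value is numerically equal to or greater than it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 3 implemented priority bits, stored in bits 7:5. */
#define PRIORITY_BITS 3u

/* Convert a priority level (0 = highest) to the value written to BASEPRI
 * or to an interrupt priority register. */
uint8_t basepri_from_level(uint8_t level)
{
    uint8_t mask = (uint8_t)((1u << PRIORITY_BITS) - 1u);
    return (uint8_t)((level & mask) << (8u - PRIORITY_BITS));
}

/* Returns true if an interrupt with the given encoded priority is blocked
 * by the given BASEPRI value. */
bool basepri_blocks(uint8_t basepri, uint8_t irq_priority)
{
    if (basepri == 0u) {
        return false;                /* zero means "masking disabled" */
    }
    return irq_priority >= basepri;  /* same or lower urgency is blocked */
}
```

With these helpers, `basepri_from_level(0)` yields zero and therefore blocks nothing, which is the situation described above. Masking from level 1 downward blocks every interrupt except those at level 0; masking everything, including level 0, would need PRIMASK instead.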

  • Hi Tianlei,

      Super! Really glad that you resolved the issue on your own. 

  • Hi Charles, thanks for your help.