TMS570LC4357: Ethernet controller (EMAC) and new undocumented race condition on receive

Part Number: TMS570LC4357
Other Parts Discussed in Thread: HALCOGEN

I have characterized the same EMAC hardware race condition seen in this thread.

If you use LwIP in your testing, you won't see this issue, because the LwIP EMAC driver implements a workaround: it waits for EOQ to be set before adding to the chain.

In practice this delay consumes a significant amount of time (too much for our real-time system). The workaround we settled on is to enable the packet completion interrupt whenever a new CPPI descriptor is chained to the active CPPI descriptor. Then, when the packet completes, the interrupt fires and the ISR restarts the EMAC if it has stalled.
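
A minimal sketch of the interrupt-based workaround described above, written against a simulated channel rather than the real EMAC registers. Every name here (`tx_desc_t`, `tx_channel_t`, `tx_append`, `tx_complete_isr`, `SKETCH_EOQ`) is hypothetical and is not taken from HALCoGen or from our driver. It also assumes one descriptor per packet and that the ISR runs exactly once per completed packet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bit only; a real driver would use EMAC_BUF_DESC_EOQ. */
#define SKETCH_EOQ (1u << 28)

typedef struct tx_desc {
    struct tx_desc *volatile next;  /* next descriptor, NULL at end of chain */
    volatile uint32_t flags;        /* EOQ is set by the EMAC when it stops here */
} tx_desc_t;

typedef struct {
    tx_desc_t *volatile hdp;   /* stand-in for the TX head descriptor pointer */
    bool tx_irq_enabled;       /* stand-in for the completion interrupt enable */
    tx_desc_t *active_tail;    /* last descriptor software has queued */
    unsigned restarts;         /* how often the ISR had to restart the queue */
} tx_channel_t;

/* Queue a chain [head..tail] without spinning on EOQ.
 * In a real driver this must not be interrupted by tx_complete_isr(). */
void tx_append(tx_channel_t *ch, tx_desc_t *head, tx_desc_t *tail)
{
    if (ch->active_tail == NULL) {
        ch->hdp = head;                /* queue idle: start the DMA directly */
    } else {
        ch->active_tail->next = head;  /* patch the chain; the EMAC may miss this */
        ch->tx_irq_enabled = true;     /* let the ISR detect and repair a stall */
    }
    ch->active_tail = tail;
}

/* Body of the packet completion ISR for the descriptor that just finished. */
void tx_complete_isr(tx_channel_t *ch, tx_desc_t *done)
{
    /* EOQ set although software linked a successor: the EMAC read "next"
     * as NULL before the patch landed, so the appended chain was missed. */
    if ((done->flags & SKETCH_EOQ) != 0u && done->next != NULL) {
        ch->hdp = done->next;          /* resubmit the missed part of the chain */
        ch->restarts++;
    }
    if (done == ch->active_tail) {     /* nothing left in flight */
        ch->active_tail = NULL;
        ch->tx_irq_enabled = false;
    }
}
```

The point of the design is that the append path never blocks: the cost of the race is moved into the completion interrupt, which only writes the head pointer again when EOQ shows that the chain really stalled.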

  • The LwIP code mentioned above.

    HALCoGen EMAC Driver with lwIP Demonstration/v00.03.00/TMS570LC43x/HALCoGen-TMS570LC43x/source/HL_emac.c:1427

        /* Wait for the EOQ bit is set */
        /*SAFETYMCUSW 28 D MR:NA <APPROVED> "Hardware status bit read check" */
    	/*SAFETYMCUSW 134 S MR:12.2 <APPROVED> "LDRA Tool issue" */
    	/*SAFETYMCUSW 45 D MR:21.1 <APPROVED> "Valid non NULL input parameters are assigned in this driver" */	 
        while (EMAC_BUF_DESC_EOQ != (EMACSwizzleData(curr_bd->flags_pktlen) & EMAC_BUF_DESC_EOQ))
    	{
    	}

  • Hello Stephen,

    This demo code doesn't append the new descriptor to the existing list. It waits until EOQ is set and then starts a new list.
  • I agree with your analysis. My question is why the LwIP driver is implemented this way, since the technical reference makes no mention of requiring this behavior. If EOQ is instead checked only once (as the technical reference recommends), does the demo continue to work? In my experiments there is a race condition around reading EOQ: EOQ can read as unset at the moment of the check but become set shortly afterwards, and the chain then stalls.

  • Hello Stephen,

    The TRM says that the EMAC supports appending packets to the existing list. To avoid the race condition, the software should check the buffer descriptor flags.

    There is a potential race condition where the EMAC may read the “next” pointer of a descriptor as NULL in the instant before an application appends additional descriptors to the list by patching the pointer. This case is handled by the software application always examining the buffer descriptor flags of all EOP packets, looking for a special flag called end of queue (EOQ).

    When the software application sees the EOQ flag set, the application may at that time submit the new list, or the portion of the appended list that was missed by writing the new list pointer to the same HDP that started the process.
  • QJ Wang said:
    When the software application sees the EOQ flag set, the application may at that time submit the new list, or the portion of the appended list that was missed by writing the new list pointer to the same HDP that started the process.

    The problem is "when" the software application can see that the EOQ flag is set. The LwIP solution is to halt until EOQ is set, but this delay is too great for our real-time application. The wait appears to be characterized by the time to transmit the data, which for a 1500-byte frame at 100 Mbps would be 120 us.

    The only solution we've found is to check EOQ in the packet transmission interrupt handler.
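
    The 120 us figure above is plain arithmetic: 1 Mbps is one bit per microsecond, so the wire time in microseconds is the bit count divided by the link speed in Mbps. A small sketch, using a hypothetical helper `frame_wire_time_us` that ignores the preamble and inter-frame gap:

    ```c
    #include <stdint.h>

    /* Time on the wire in microseconds for frame_bytes at link_mbps.
     * Hypothetical helper; preamble and inter-frame gap are not counted. */
    uint32_t frame_wire_time_us(uint32_t frame_bytes, uint32_t link_mbps)
    {
        return (frame_bytes * 8u) / link_mbps;
    }
    ```

    For 1500 bytes at 100 Mbps this gives 12000 bits / 100 = 120 us, which is the upper end of how long the busy-wait on EOQ can block the caller per frame.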

  • Hello Stephen,

    I tested the EMAC with the LWIP demo code generated by HALCoGen. Can you share your code so I can test it on my bench?
  • I've attached a patch file.

    7838.patch.txt

  • Hi Stephen,

    In the LWIP demo, the functions in hdkif.c are used rather than the functions in hl_emac.c. Did you change the code in hdkif.c too? Is this the only change needed to make it work? Thanks
  • Hmm, I'm not sure why your project is using hdkif.c; when I use RM57x/Build-RM57x, it uses hl_emac.c. In any case, here is the hdkif.c info:

    The notes in hdkif.c indicate that the code I'm concerned about is a known workaround (see the comment on line 329).

    hdkif.c @ line 327

      else {
        curr_bd = txch->active_tail;
        /* TODO: (This is a workaround) Wait for the EOQ bit is set */
        while (EMAC_BUF_DESC_EOQ != (hdkif_swizzle_data(curr_bd->flags_pktlen) & EMAC_BUF_DESC_EOQ));
        /* TODO: (This is a workaround) Don't write to TXHDP0 until it turns to zero */
      	while (0 != *((uint32 *)0xFCF78600));
        curr_bd->next = hdkif_swizzle_txp(active_head);
        if (EMAC_BUF_DESC_EOQ == (hdkif_swizzle_data(curr_bd->flags_pktlen) & EMAC_BUF_DESC_EOQ)) {
          /* Write the Header Descriptor Pointer and start DMA */
          EMACTxHdrDescPtrWrite(hdkif->emac_base, (unsigned int)(active_head), 0);
        }
    

    PATCH:

    --- "HALCoGen EMAC Driver with lwIP Demonstration/v00.03.00/lwip-1.4.1/ports/hdk/netif/hdkif.c"	2018-11-14 13:55:21.865364850 -0800
    +++ "HALCoGen EMAC Driver with lwIP Demonstration/v00.03.00/lwip-1.4.1/ports/hdk/netif/hdkif.new.c"	2018-11-14 13:54:49.910000000 -0800
    @@ -324,10 +324,8 @@
        * Chain the bd's. If the DMA engine, already reached the end of the chain, 
        * the EOQ will be set. In that case, the HDP shall be written again.
        */
    -  else {
    +  else if (EMAC_BUF_DESC_EOQ == (hdkif_swizzle_data(curr_bd->flags_pktlen) & EMAC_BUF_DESC_EOQ)) {
         curr_bd = txch->active_tail;
    -    /* TODO: (This is a workaround) Wait for the EOQ bit is set */
    -    while (EMAC_BUF_DESC_EOQ != (hdkif_swizzle_data(curr_bd->flags_pktlen) & EMAC_BUF_DESC_EOQ));
         /* TODO: (This is a workaround) Don't write to TXHDP0 until it turns to zero */
       	while (0 != *((uint32 *)0xFCF78600));
         curr_bd->next = hdkif_swizzle_txp(active_head);
    
  • Warning! There are two different EOQ race conditions in the EMAC:
    one in transmit and a similar one in the receive data fragment queue.
    See my original post: e2e.ti.com/.../2576160
    We are able to reproduce both cases.

    The problem is that our driver is heavily modified. We don't use the original interrupt functions (they are too heavy for our interrupt function limits, so the interrupts are mapped to RTOS process events). Our driver also supports multiple memory fragment queues for multiple packet priorities (802.1Q tags).
  • Jiri Dobry said:
    Warning! there are two different EOQ race conditions in EMAC.

    I agree, both the RX and TX chains have this race condition. I was concentrating on the TX case first as it's easier to induce.

  • Thanks Jiri, Stephen

    Our SW team is planning to study the code and update the driver in 1Q 2019. I will let you know whenever the new driver is available.
  • Sorry, but we don't need a new driver. We have our own, because the default one does not support all the features we need (e.g. zero-copy data flow and multiple RX/TX queues for different priorities).
    What we need is confirmation of these two bugs, an analysis, and a recommended workaround.