RM46 LWIP UDP stuck in hdkif_rx_inthandler when multiple UDP packets received

Other Parts Discussed in Thread: HALCOGEN

Hi,

I originally posted this as a reply to a question at https://e2e.ti.com/support/microcontrollers/hercules/f/312/t/445955.

I am having a problem with the code getting locked up in hdkif_rx_inthandler on the RM46L852PGET package, typically within this section of code:

  while(hdkif_swizzle_data(curr_bd->flags_pktlen) & EMAC_BUF_DESC_SOP)

In HALCoGen 4.05 the only drivers/pinmux options enabled are MIBSPI3, MII, and SCI, with the MIBSPI3NCS_1 conflict cleared and MIBSPI3NENA cleared (as opposed to CS_5). MDIO and MDCLK are checked. The VIM RAM is set as expected, and the clocks are configured the same as in the lwIP Ethernet demo. The Ethernet demo is used, but modified for UDP instead of HTTP.

I am using UDP mode in lwIP. If the host computer sends one UDP packet per cycle (i.e., the packet gets handled in a timely fashion), there are no issues and everything works. If two UDP packets are sent back to back without being processed by the ISR in between, the code gets stuck in the ISR, typically at hdkif_swizzle_data(curr_bd->flags_pktlen) & EMAC_BUF_DESC_SOP.

At that point flags_pktlen holds a large value: 3942645758 (0xEAFFFFFE).

Any thoughts?  It's frustrating not being able to handle the receipt of multiple UDP packets without locking up the microcontroller.

Josh Karch

  • Hi Joshua,

    Is there any information in the MAC Network Statistics Registers that would help us figure out what's going on?

    These registers should begin at 0xFCF7_8200.

  • Anthony, here is some memory information from the memory browser. FYI, the problem also exists on the HDK with the ZWT part; this capture is from the HDK, so the issue shows up on known hardware:


    Register                 Value
    EMAC_RXGOODFRAMES        0x00000CD0
    EMAC_RXBCASTFRAMES       0x0000095A
    EMAC_RXMCASTFRAMES       0x00000000
    EMAC_RXPAUSEFRAMES       0x00000000
    EMAC_RXCRCERRORS         0x00000000
    EMAC_RXALIGNCODEERRORS   0x00000000
    EMAC_RXOVERSIZED         0x00000000
    EMAC_RXJABBER            0x00000000
    EMAC_RXUNDERSIZED        0x00000000
    EMAC_RXFRAGMENTS         0x00000000
    EMAC_RXFILTERED          0x000006E9
    EMAC_RXQOSFILTERED       0x00000000
    EMAC_RXOCTETS            0x00047A98
    EMAC_TXGOODFRAMES        0x00000004
    EMAC_TXBCASTFRAMES       0x00000003
    EMAC_TXMCASTFRAMES       0x00000000
    EMAC_TXPAUSEFRAMES       0x00000000
    EMAC_TXDEFERRED          0x00000000
    EMAC_TXCOLLISION         0x00000000
    EMAC_TXSINGLECOLL        0x00000000
    EMAC_TXMULTICOLL         0x00000000
    EMAC_TXEXCESSIVECOLL     0x00000000
    EMAC_TXLATECOLL          0x00000000
    EMAC_TXUNDERRUN          0x00000000
    EMAC_TXCARRIERSENSE      0x00000000
    EMAC_TXOCTETS            0x00000344
    EMAC_FRAME64             0x00000F14
    EMAC_FRAME65T127         0x0000090E
    EMAC_FRAME128T255        0x00000319
    EMAC_FRAME256T511        0x00000093
    EMAC_FRAME512T1023       0x00000000
    EMAC_FRAME1024TUP        0x00000000
    EMAC_NETOCTETS           0x0019C3A7
    EMAC_RXSOFOVERRUNS       0x00001B16
    EMAC_RXMOFOVERRUNS       0x00000000
    EMAC_RXDMAOVERRUNS       0x0000190E

    Words following EMAC_RXDMAOVERRUNS in the memory browser:
    0x0016ACC4 0x0000190E 0x00000000 0x0000190E 0x0016ACC4 0x0000190E 0x00000000
    0x0000190E 0x0016ACC4 0x0000190E 0x00000000 0x0000190E 0x00000000 0x00000000 0x00000000
    0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000344 0x00000DF0
    0x00000877 0x000002DC 0x00000083 0x00000000 0x00000000 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5
    0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5
    0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5
    0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5
    0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5
    0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5
    0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5
    0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5
    0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5 0x5BBC0DE5
  • Thank you, by the way, for your quick response!
    -Josh
  • Hi Josh,

    I'm not really an expert on the EMAC, but from these statistics it looks like quite a few RX overruns are being recorded (see the EMAC_RXSOFOVERRUNS and EMAC_RXDMAOVERRUNS counts).

    So I would think the options are either making the receive address matching more selective or increasing the number of DMA buffer resources (which means using more RAM), or possibly looking at latency issues.

    Maybe the first thing to do is to check that you don't have an unreasonably small number of receive buffers for the task at hand. I'd think this number would be written into the RXnFREEBUFFER register somewhere early in the initialization code if you can't find it elsewhere.

    Then it might be good to check the filtering. The EMAC chapter of the TRM explains the available filtering options; see the Receive Channel Enabling and Receive Address Matching sections.

    Also, you might want to take a quick look at section 31.2.13 regarding latency. If your descriptors point to buffers in off-chip RAM, especially slow asynchronous SRAM, you may have a latency problem: not only will it take the DMA longer to write each buffer, it will also take the CPU longer to process the buffers and return them to the queue.

    Let me know if this gives you some things to check or if you've already checked these things and think something else may be going on.

    Best Regards,
    Anthony
  • Anthony, HALCoGen allocates ten 1500-byte buffers automatically. They should be in on-chip RAM, because my RM46L852PGE has no off-chip memory, so I don't think that's the issue. It just seems that two separate messages of under 100 bytes each, sent back to back, are killing the task. The second message has a payload of four bytes.

    So the question is: what is causing the RX overruns? Is there someone at TI who could run a straight test on the RM46 example, replacing the HTTP server with UDP mode and sending two back-to-back messages? That's basically where I'm at right now.

    Regards,

    Josh
  • Hi Josh,

    Some other community members have tried lwIP/UDP with Hercules, and this post seems to mention some success: e2e.ti.com/.../1468982

    Maybe there's something in that post.

    If you post your code here, someone might be able to test it out. I'd like to myself, but I won't be able to until the end of next week.
  • Anthony,

    So it appears that, in the end, the issue is that multiple messages come in and overrun the receive buffers before pbuf_free has the opportunity to be called. I refactored the code so that the callback function quickly determines which message was received, copies the buffer (memcpy) into the right location, and then frees the buffer. I think we're almost there. The only other issue I seem to have from time to time is getting a clean Ethernet connection on power-up; I sometimes have to press the reset button to reset the PHY and MAC.
    Thank you, I think we are good for now. That buffer-overrun lead was great and led me to rethink how we were handling the interrupt!

    Cheers,

    Josh
  • Hello Josh,

    I have the same problem now.

    If it is possible for you, can you please post your reworked code here? You will help me a lot.

    Thank you in advance

    Best Regards

    Christian

  • Christian,

    I think it's best if Josh posts his solution, but in the meantime you can try increasing the number of receive pbufs through HalCoGen. There is a control on the EMAC Global screen. It defaults to 10, which is low, and you can increase it.
  • Christian, the issue we had was not processing and releasing packets quickly enough. The interrupt handler needs to release buffers as soon as possible after packets are received in order to prevent this lockup.

    Best Regards,

    Josh Karch

  • Hi Josh,

    To fix the issue though - were you able to simply increase the number of pBUF on the HalCoGen EMAC tab so that it would take longer to run out of buffers?
  • Anthony,

    I actually kept the pBUF count the same in my application; I just handled packets faster, with fewer delays.

    Best Regards,

    Josh
  • Hi,

    thanks for your answers.

    Increasing the pBuf count is not possible because RAM is low. In one test case the PC sends 500 ARP requests in one second, so no buffer can be big enough to handle this.

    Faster handling is also difficult, because the EMAC RX ISR only posts the buffer pointer to an RTOS queue. A task then makes the call into the Ethernet stack.

    My idea now is to stop the EMAC from receiving when there are no free buffers, and to enable it again once the task has freed buffers. So far, though, my modified software doesn't work as expected. Is this a viable way to handle the issue?

    Best regards

    Christian

  • Christian,

    The EMAC has flow control capability; see section 31.2.10.1.3 of the TRM.

    Can you make use of this? I think it essentially does what you are asking, but in hardware by itself.